[macstl-dev] Re: Question

Glen Low glen.low at pixelglow.com
Tue May 24 22:19:16 WST 2005

  • Previous message: [macstl-dev] Re: Question
  • Next message: [macstl-dev] Fwd: My changes
  • Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]


On 18/05/2005, at 2:51 AM, Ilya Lipovsky wrote:
>>
>> Now element_cast <> isn't as bad as you think. Consider that the
>> valarray expression template engine I wrote can actually
>> reconfigure expressions at compile time for efficiency, e.g.
>>
>>     (a * b) + c
>>
>> actually recomposes the expression to use something like madd (c,
>> a, b), i.e. what looks like two separate operations in the
>> expression can be merged into a single one.
>>
>> That means
>>
>>     element_cast <complex> (a) * b
>>
>> need not actually unpack the float array a into a complex array and
>> then multiply by b, but could invoke some sort of merged operation
>> that multiplies 2 complex values by 2 floats at a time. (The only
>> limitation I see is that the iterator would need to step through 2
>> complex values at a time, so the float vector may have to be loaded
>> twice; it would take a smart loop unroller in the compiler to see
>> that double load and optimize it to a single one.)
>>
>>
>
> What bothers me is this:
>
> Is the chunking mechanism going to iterate correctly through the
> arrays? E.g., if we're stepping through 2 complex entries at a time,
> aren't we in danger of stepping through 4 real entries? Or are my
> fears completely unfounded, and the mechanism only iterates by the
> least possible number of elements instead of the 128-bit atom?
> Could you check?

The chunking works off expression template objects. Each term in an
expression becomes an expression template (ET) object, and compound
terms are made up of simpler terms, all the way down to the leaf
terms, which are either literals or arrays (valarrays). An operator
overload (or function overload) on 1 or 2 ET objects yields another
(arbitrary) ET object; that's how the composition happens.
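
To make the composition concrete, here is a minimal scalar sketch in
C++. All the names (leaf, mul_expr, add_expr, madd_expr, evaluate) are
hypothetical; this is not macstl's real class layout. It shows how
overloading operator+ on a product term lets the compiler build a
single fused multiply-add node for (a * b) + c instead of an add node
wrapped around a multiply node.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical scalar sketch, not macstl's real classes: each term of an
// expression is a small object, and operators combine terms into bigger ones.

// Leaf term: refers to an existing array without copying it.
struct leaf {
    const std::vector<float>* v;
    float operator[](std::size_t i) const { return (*v)[i]; }
    std::size_t size() const { return v->size(); }
};

// Product term, built by operator*.
template <typename A, typename B>
struct mul_expr {
    A a;
    B b;
    float operator[](std::size_t i) const { return a[i] * b[i]; }
    std::size_t size() const { return a.size(); }
};

// Sum term, built by operator+ for leaf + leaf.
template <typename A, typename B>
struct add_expr {
    A a;
    B b;
    float operator[](std::size_t i) const { return a[i] + b[i]; }
    std::size_t size() const { return a.size(); }
};

// Fused term: one node computing a * b + c. On AltiVec this is the
// place where a single vec_madd per chunk could be issued.
template <typename A, typename B, typename C>
struct madd_expr {
    A a;
    B b;
    C c;
    float operator[](std::size_t i) const { return a[i] * b[i] + c[i]; }
    std::size_t size() const { return a.size(); }
};

inline mul_expr<leaf, leaf> operator*(leaf a, leaf b) { return {a, b}; }
inline add_expr<leaf, leaf> operator+(leaf a, leaf b) { return {a, b}; }

// The recomposition: (a * b) + c selects this overload, so the compiler
// builds madd_expr instead of add_expr<mul_expr<...>, leaf>.
template <typename A, typename B>
madd_expr<A, B, leaf> operator+(mul_expr<A, B> m, leaf c) {
    return {m.a, m.b, c};
}

// Stand-in for assignment: walk the finished expression once.
template <typename E>
std::vector<float> evaluate(const E& e) {
    std::vector<float> out(e.size());
    for (std::size_t i = 0; i < out.size(); ++i) out[i] = e[i];
    return out;
}
```

Here a * b + c has type madd_expr <leaf, leaf, leaf>, while a + c stays
a plain add_expr. A real engine would evaluate chunk by chunk rather
than one float at a time.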

Now each expression template object declares a chunk_begin that is an
STL-style iterator over the chunks produced by the expression
template object. This chunk_iterator (and const_chunk_iterator) must
yield vector elements whose scalar elements are the same type as the
expression template object's value type. So for an expression
template of value type float, its chunk iterators must yield vec
<float, n>.

Thus a hypothetical combination of a float valarray and a complex
float valarray should have a chunk iterator that yields vec
<complex <float>, n>, since complex <float> is the type of the
expression. Now vec <xxx> must fit into a vector register, and each
complex <float> takes 8 of the 16 bytes in an AltiVec register, so
n == 2 for AltiVec.
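
A sketch of how n could fall out of the register size, assuming a
16-byte (128-bit) AltiVec register. vec and chunk_width here are
illustrative stand-ins, not the macstl declarations.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>

// Assumption: one AltiVec vector register is 128 bits = 16 bytes.
constexpr std::size_t register_bytes = 16;

// How many scalars of type T fit in one register.
template <typename T>
struct chunk_width {
    static constexpr std::size_t value = register_bytes / sizeof(T);
};

// Hypothetical chunk type: n scalars of type T filling one register.
template <typename T, std::size_t n = chunk_width<T>::value>
struct vec {
    static_assert(n * sizeof(T) == register_bytes,
                  "a vec must exactly fill one vector register");
    T elements[n];
};

// float is 4 bytes, so 4 per chunk; complex<float> is 8 bytes, so 2.
static_assert(chunk_width<float>::value == 4, "vec <float, 4>");
static_assert(chunk_width<std::complex<float>>::value == 2,
              "vec <complex <float>, 2>");
```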

A binary function ET simply takes the chunk iterators of its two
constituent ETs and yields its own iterator. Therefore, it must take
the float valarray chunk iterator, which yields vec <float, 4>, and
the complex float valarray chunk iterator, which yields
vec <complex <float>, 2>, and somehow produce a vec <complex <float>, 2>.
A typical way to do this is for the iterator to track whether it's on
an even or odd index: if even, take the first 2 floats of the
vec <float, 4> and combine them with the vec <complex <float>, 2>; if
odd, take the last 2 floats instead. When you increment the binary
function iterator, it increments the complex valarray iterator every
time, but increments the float valarray iterator only every other
time, flipping the even/odd flag accordingly. So the float side never
overruns: it advances by one 4-float chunk for every two 2-complex
chunks. Hope that makes sense...
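
The even/odd scheme can be sketched with scalar stand-ins for the
vector types. Everything here (float_chunk, complex_chunk,
mixed_mul_iterator) is hypothetical, and the multiplication is done
element by element rather than with AltiVec permutes; the point is only
the stepping logic.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>

// Hypothetical scalar stand-ins for the AltiVec chunk types.
using float_chunk = std::array<float, 4>;                 // vec <float, 4>
using complex_chunk = std::array<std::complex<float>, 2>; // vec <complex <float>, 2>

// Chunk iterator for (float array) * (complex float array). The result is
// complex, so it yields complex_chunk; each float chunk therefore feeds
// two consecutive result chunks.
class mixed_mul_iterator {
public:
    mixed_mul_iterator(const float_chunk* f, const complex_chunk* c)
        : f_(f), c_(c), odd_(false) {}

    complex_chunk operator*() const {
        // Even index: use the first 2 floats; odd index: the last 2.
        const std::size_t base = odd_ ? 2 : 0;
        complex_chunk result;
        for (std::size_t k = 0; k < 2; ++k)
            result[k] = (*c_)[k] * (*f_)[base + k];
        return result;
    }

    mixed_mul_iterator& operator++() {
        ++c_;            // complex side advances every time
        if (odd_) ++f_;  // float side advances every other time
        odd_ = !odd_;
        return *this;
    }

private:
    const float_chunk* f_;
    const complex_chunk* c_;
    bool odd_;
};
```

With one float chunk and two complex chunks, the first dereference pairs
the complex values with floats 0 and 1, the second with floats 2 and 3,
and the float pointer only moves on the second increment.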
>
>> The issue then becomes whether it is convenient for users to use  
>> element_cast. The valarray expression engine works on identical  
>> types and has no notion of type promotion (yet). Some people have  
>> said element_cast is rather clunky and would rather have automatic
>> promotions as in regular C (i.e. float -> complex float, integer ->
>> float etc.). My worry from a syntactic point of view is that these  
>> conversions aren't free, more so with SIMD architectures, so  
>> there's a need to highlight expensive conversions. What do you think?
>>
>>
>
> I side with the people [who want automatic promotion] in this case.
> Just because a conversion is done in SIMD doesn't mean it's more
> expensive than the scalar conversions C already does implicitly.
> Consider integer->float. I don't know anything about x86, but on PPC
> you have to store the GPR to the stack and then reload the data into
> an FPU register. Cheap? I don't think it's cheaper than converting a
> float into a complex float through the VPU, which needs only 1
> temporary register:
>
> vxor vtemp, vtemp, vtemp /* zeroes vtemp: +0.0 in every word */
>
> and then finish with
>
> vmrghw vdest, vdest, vtemp /* you get the original first 2 floats in
>                   complex format with the other 2 original
>                   floats discarded */
>
> All of this is going to be cheaper on the VPU (at least on the G4)
> than on scalar hardware. Converting from float to int is even more
> trivial, requiring only one AltiVec instruction (vctsxs).
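
The vxor/vmrghw trick can be modelled in plain C++ to check the data
movement. This is a scalar simulation assuming AltiVec's big-endian
element order (element 0 is the most significant word), not real
AltiVec code; vmrglw is the companion instruction that picks up the 2
floats vmrghw discards.

```cpp
#include <array>
#include <cassert>
#include <complex>

// Scalar model of one 128-bit AltiVec register holding 4 floats.
// Element 0 is the first (most significant, big-endian) word.
using v4f = std::array<float, 4>;

// vmrghw: interleave the first two words of a and b -> {a0, b0, a1, b1}.
v4f merge_high(const v4f& a, const v4f& b) { return {{a[0], b[0], a[1], b[1]}}; }

// vmrglw: interleave the last two words of a and b -> {a2, b2, a3, b3}.
v4f merge_low(const v4f& a, const v4f& b) { return {{a[2], b[2], a[3], b[3]}}; }

// Widen 4 floats into 4 complex<float> (two registers' worth) by merging
// with a zeroed register; vxor of a register with itself clears every bit,
// which is +0.0 for each float.
std::array<std::complex<float>, 4> widen(const v4f& f) {
    const v4f zero = {{0.0f, 0.0f, 0.0f, 0.0f}};
    const v4f hi = merge_high(f, zero);  // {f0, 0, f1, 0}: f0 + 0i, f1 + 0i
    const v4f lo = merge_low(f, zero);   // {f2, 0, f3, 0}: f2 + 0i, f3 + 0i
    return {{{hi[0], hi[1]}, {hi[2], hi[3]}, {lo[0], lo[1]}, {lo[2], lo[3]}}};
}
```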

The conversion is cheap, but not free. People might still blithely
write i + f, where i is an int and f a float, expecting it to do the
right thing without realizing that it has a cost. If they realized
that, they might be able to choose an algorithm that used only ints
or only floats. However, I'm beginning to come around to your
position, mainly because that's how (fortunately or unfortunately) C
already works, and even the supposedly more type-safe descendants
like Java and C# do the same.
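
One way the promotion rules could be written down is a traits class
that maps a pair of value types to the value type of the combined ET.
The promote template and promoted_add function below are hypothetical
illustrations of the C-style rules discussed (int with float gives
float, float with complex float gives complex float), not existing
macstl code.

```cpp
#include <cassert>
#include <complex>
#include <type_traits>

// Hypothetical promotion trait: maps the value types of two ET operands to
// the value type of the combined ET. Only the pairs discussed are covered.
template <typename A, typename B> struct promote;

// Identical types need no promotion.
template <typename T> struct promote<T, T> { typedef T type; };

// int mixed with float promotes to float, as in C.
template <> struct promote<int, float> { typedef float type; };
template <> struct promote<float, int> { typedef float type; };

// float mixed with complex<float> promotes to complex<float>.
template <> struct promote<float, std::complex<float>> { typedef std::complex<float> type; };
template <> struct promote<std::complex<float>, float> { typedef std::complex<float> type; };

// Per-element addition with implicit promotion; each R (x) below is a real
// conversion that runs for every element (cheap, but not free).
template <typename A, typename B>
typename promote<A, B>::type promoted_add(A a, B b) {
    typedef typename promote<A, B>::type R;
    return R(a) + R(b);
}
```

Each conversion appears as an explicit R (x), which is where the
"cheap, but not free" cost would be paid in a vectorized version.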

The remaining issue is a hairy one, though. ETs are composed of
sub-ETs, all the way up to the assignment operator, which then
actually "does" something (the actual copy). So, with c complex and
f float, we could say c + f is always complex, c * f is always
complex, etc. But what about:

c = f

There will have to be some rewiring around the assignment operator,
which in C++ must be a member function, so you don't get the same
flexibility as with the binary function ETs, which work through free
functions.
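
A sketch of what that rewiring might look like: since operator= must be
a member, the array class can declare it as a member template that
accepts an expression of any value type and converts element by
element. simple_array is a hypothetical stand-in for a valarray, and
the expression type is only assumed to provide value_type, size() and
operator[].

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for a valarray. Any expression passed to operator=
// is assumed to provide value_type, size() and operator[].
template <typename T>
class simple_array {
public:
    typedef T value_type;

    explicit simple_array(std::size_t n) : data_(n) {}

    std::size_t size() const { return data_.size(); }
    const T& operator[](std::size_t i) const { return data_[i]; }
    T& operator[](std::size_t i) { return data_[i]; }

    // operator= has to be a member, so make it a member template: it accepts
    // an expression of any value type that converts implicitly to T
    // (e.g. float -> complex<float>) and converts element by element.
    // Same-type assignment still uses the implicit copy assignment.
    template <typename E>
    simple_array& operator=(const E& expr) {
        static_assert(std::is_convertible<typename E::value_type, T>::value,
                      "no implicit conversion from the expression's value type");
        data_.resize(expr.size());
        for (std::size_t i = 0; i < data_.size(); ++i) data_[i] = T(expr[i]);
        return *this;
    }

private:
    std::vector<T> data_;
};
```

With this, c = f on a complex array and a float array compiles and
widens each element, while the binary ETs can keep using free functions.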




More information about the macstl-dev mailing list