[macstl-dev] Proposal for mixed complex and real arithmetic

Glen Low glen.low at pixelglow.com
Tue Jul 19 08:20:20 WST 2005

  • Previous message: [macstl-dev] Proposal for mixed complex and real arithmetic
  • Next message: [macstl-dev] Interleave/transpose function -- opinions wanted
  • Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]


On 19/07/2005, at 1:20 AM, Ilya Lipovsky wrote:

>>
>> The other possibility is to engineer a vec <complex <float>, 4>  
>> that contains 2 __vector floats, and restructure valarray <complex  
>> <float> > to use this. However, I recall that gcc 3.3 had a  
>> terrible time optimizing structs that contained more than 1 field,  
>> as vec <complex <float>, 4> would. If you do a test of this  
>> structure on 3.4 and it works acceptably, we can then change  
>> valarray <complex <float> > to use this.
>>
>> Quickly declare a simple vecComplexFloat4 struct with 2 __vector  
>> floats. Declare an operator+ that adds two vecComplexFloat4s. Then  
>> try this:
>>
>> vecComplexFloat4 a, b;
>> vecComplexFloat4 c = (a + a + a) + (b + b + b);
>>
>> On 3.3, the usual thing would be that the compiler did a store and  
>> then a redundant load for each of the temps a + a etc. Try it on  
>> 3.4 and 4.0 to see if that has changed. If you get positive  
>> results on 3.4 and 4.0, we can adopt that approach instead -- A  
>> and B would be unchanged, but C would be simpler, at the cost of  
>> having to define most of the vec <complex <float>, 4> operators.
>>
>
> And do you think it should lead to run-time overhead? I mean, if  
> you're working with only 2 floats at a time, you have to use lvsl/ 
> lvsr as you're grabbing the same vector value twice in a loop.
>
> -Ilya

It might have to be a trade-off.

There are three alternatives, as I see it. We may have to do some  
exploratory code to see which one is best.

(1)    Even/odd iterators into vec <float, 4>, with double loads. lvsl/lvsr
would be needed; the double loads might be avoided through compiler
CSE (common subexpression elimination), but this needs checking.
(2)    Moving to vec <complex <float>, 4>. A lot of changes. Whether it
pays off depends on temporary generation, e.g. does f + f + f cause each
temp to be stored to memory and loaded back?
(3)    Trying out some sort of templated loop unrolling not based on
iterator dereference return values. Probably the most efficient, but
I have to think hard about how to fit it into the existing framework.
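
To make option (1) concrete, here is a minimal scalar sketch of even/odd access into interleaved complex storage (re0, im0, re1, im1, ...). It only illustrates the indexing idea and is not macstl code: StridedView, real_parts, imag_parts and add_real are hypothetical names, and plain float indexing stands in for the vec <float, 4> loads and lvsl/lvsr permutes a real AltiVec version would need.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stride-2 view over interleaved complex storage
// (re0, im0, re1, im1, ...). Not a macstl type.
class StridedView {
public:
    StridedView(float* base, std::size_t offset, std::size_t count)
        : base_(base), offset_(offset), count_(count) {}

    // Element i of the view is element 2*i + offset of the storage.
    float& operator[](std::size_t i) const { return base_[2 * i + offset_]; }
    std::size_t size() const { return count_; }

private:
    float* base_;
    std::size_t offset_;
    std::size_t count_;
};

// The "even" view visits the real parts.
inline StridedView real_parts(std::vector<float>& interleaved) {
    assert(interleaved.size() % 2 == 0);
    return StridedView(interleaved.data(), 0, interleaved.size() / 2);
}

// The "odd" view visits the imaginary parts.
inline StridedView imag_parts(std::vector<float>& interleaved) {
    assert(interleaved.size() % 2 == 0);
    return StridedView(interleaved.data(), 1, interleaved.size() / 2);
}

// Mixed complex + real addition: only the real (even) elements change,
// the imaginary (odd) elements are left untouched.
inline void add_real(std::vector<float>& interleaved,
                     const std::vector<float>& reals) {
    StridedView re = real_parts(interleaved);
    assert(re.size() == reals.size());
    for (std::size_t i = 0; i < re.size(); ++i) {
        re[i] += reals[i];
    }
}
```

In a vectorized version each vec <float, 4> holds two complex values, so the even and odd views would both touch the same vector. That is the double load that compiler CSE might or might not remove.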

I tried making a simple object with two float fields to see how a
temporary would be created, and even on 4.0 it does the redundant
load and store, i.e. it doesn't properly enregister a two-field struct.
Perhaps you can do better: do try some experiments with (1) and (2)
first and I'll explore option (3), since it would help with Andrew's
proposed interleave function.
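
For anyone repeating the experiment, a test case along these lines should do. This is a sketch under assumptions, not the exact code I compiled: Float4 is a portable stand-in for one __vector float (on an AltiVec build you would use the native type and vec_add instead of the loop), and the field names lo and hi and the function sum_of_temps are made up for illustration.

```cpp
#include <cassert>

// Portable stand-in for one __vector float. Assumption: on AltiVec this
// would be the native vector type and the loop below would be vec_add.
struct Float4 {
    float lane[4];
};

inline Float4 operator+(const Float4& x, const Float4& y) {
    Float4 r;
    for (int i = 0; i < 4; ++i) {
        r.lane[i] = x.lane[i] + y.lane[i];
    }
    return r;
}

// The two-field struct under discussion. Field names are hypothetical.
struct vecComplexFloat4 {
    Float4 lo;
    Float4 hi;
};

inline vecComplexFloat4 operator+(const vecComplexFloat4& x,
                                  const vecComplexFloat4& y) {
    vecComplexFloat4 r;
    r.lo = x.lo + y.lo;
    r.hi = x.hi + y.hi;
    return r;
}

// The expression to examine: each a + a and b + b produces a temporary.
// Kept out of line so its code is easy to find in the assembly listing.
vecComplexFloat4 sum_of_temps(const vecComplexFloat4& a,
                              const vecComplexFloat4& b) {
    return (a + a + a) + (b + b + b);
}
```

Compile with optimization and assembly output (for example g++ -O2 -S) and look at sum_of_temps: if each temporary is stored to the stack and then loaded straight back, the struct is not being kept in registers.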

Cheers, Glen Low


---
pixelglow software | simply brilliant stuff
www.pixelglow.com
aim: pixglen

