[macstl-dev] Proposal for mixed complex and real arithmetic
glen.low at pixelglow.com
Tue Jul 19 08:20:20 WST 2005
On 19/07/2005, at 1:20 AM, Ilya Lipovsky wrote:
>> The other possibility is to engineer a vec <complex <float>, 4>
>> that contains 2 __vector floats, and restructure valarray <complex
>> <float> > to use this. However, I remember that gcc 3.3 had a
>> terrible time optimizing structs that contained more than 1 field,
>> as vec <complex <float>, 4> would. If you do a test of this
>> structure on 3.4 and it works acceptably, we can then change
>> valarray <complex <float> > to use this.
>> Quickly declare a simple vecComplexFloat4 struct with 2 __vector
>> floats. Declare an operator+ that adds two vecComplexFloat4's. Then
>> try this:
>> vecComplexFloat4 a, b;
>> vecComplexFloat4 c = (a + a + a) + (b + b + b);
>> On 3.3, the usual thing would be that the compiler did a store and
>> then a redundant load for each of the temps a + a etc. Try it on
>> 3.4 and 4.0 to see if that has changed. If you get positive
>> results on 3.4 and 4.0, we can adopt that approach instead -- A
>> and B would be unchanged, but C would be simpler, at the cost of
>> having to define most of the vec <complex <float>, 4> operators.
> And do you think it should lead to run-time overhead? I mean, if
> you're working with only 2 floats at a time, you have to use lvsl/
> lvsr as you're grabbing the same vector value twice in a loop.
It might have to be a trade-off.
There are three alternatives, as I see it. We may have to do some
exploratory code to see which one is best.
(1) Even/odd iterators into vec <float, 4> and double loads. lvsl/lvsr
need to be used; double loads might be avoided through compiler CSE
(common subexpression elimination) -- need to check this.
(2) Moving to vec <complex <float>, 4>. A lot of changes. Depends
on temporary generation e.g. does f + f + f cause the temp to load
and store back into memory?
(3) Trying out some sort of templated loop unrolling not based on
iterator dereference return values. Probably the most efficient, but
I have to think hard about how to fit it into the existing framework.
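To make option (1) concrete, here is a rough scalar sketch of the even/odd idea, not macstl code. It assumes the complex data is stored interleaved (re, im, re, im, ...) and that the real operand has at least as many elements as the complex one. StridedFloatIter and scale_by_real are hypothetical names, and a real version would step through vec <float, 4> with lvsl/lvsr permutes rather than single floats.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical scalar stand-in for an even/odd iterator: it visits every
// second float of an interleaved (re, im, re, im, ...) buffer.
class StridedFloatIter
{
public:
    // parity 0 selects the real parts, parity 1 the imaginary parts
    StridedFloatIter(float* base, std::size_t parity)
        : base_(base), index_(parity) {}

    float& operator*() const { return base_[index_]; }
    StridedFloatIter& operator++() { index_ += 2; return *this; }

private:
    float* base_;
    std::size_t index_;
};

// Mixed complex/real multiply, in place: z[i] *= r[i].
// Assumes r.size() >= z.size().
void scale_by_real(std::vector<std::complex<float> >& z,
                   const std::vector<float>& r)
{
    if (z.empty())
        return;

    // View the complex array as interleaved floats.
    float* raw = reinterpret_cast<float*>(&z[0]);
    StridedFloatIter even(raw, 0);  // real parts
    StridedFloatIter odd(raw, 1);   // imaginary parts

    for (std::size_t i = 0; i < z.size(); ++i, ++even, ++odd)
    {
        *even *= r[i];  // each real operand is read for both halves
        *odd *= r[i];
    }
}
```

Each r[i] is used twice per iteration here, which is the scalar analogue of the double load that CSE would have to remove.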
I tried making a simple object with two float fields and seeing how a
temporary would be created, and even on 4.0 it does the redundant
load & store i.e. it doesn't properly enregister a two-field struct.
Perhaps you can do better -- do try some experiments with (1) and (2)
first and I'll explore option (3), since it would help with Andrew's
proposed interleave function.
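A minimal sketch of the two-field experiment described above, for anyone who wants to repeat it. It uses plain float fields as in that test, so it builds on any target; TwoFloats and sum_of_temps are hypothetical names, and the real vecComplexFloat4 would hold 2 __vector floats instead.

```cpp
// Hypothetical stand-in for vecComplexFloat4: two plain float fields
// rather than two __vector floats.
struct TwoFloats
{
    float re;
    float im;
};

// Field-wise addition; returns a temporary by value.
inline TwoFloats operator+(const TwoFloats& lhs, const TwoFloats& rhs)
{
    TwoFloats result = { lhs.re + rhs.re, lhs.im + rhs.im };
    return result;
}

// The expression under test lives in its own function so that its
// generated assembly is easy to locate.
TwoFloats sum_of_temps(const TwoFloats& a, const TwoFloats& b)
{
    return (a + a + a) + (b + b + b);
}
```

Compiling with something like g++ -O2 -S and reading the code for sum_of_temps shows whether the temporaries a + a etc. stay in registers or are stored and reloaded.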
Cheers, Glen Low
pixelglow software | simply brilliant stuff