[macstl-dev] not all rosey in the gcc-4.0.0 land

Glen Low glen.low at pixelglow.com
Sun Jul 3 13:46:51 WST 2005



Ilya:

On 03/07/2005, at 3:10 AM, Ilya Lipovsky wrote:

>>> Could you please try that with valarray <stdext::complex <float> >?
>>> Because this is where my code fails.
>>> -Ilya
>>>
>>
>> Sure thing.
>>
>>        using namespace stdext;
>>        valarray <complex <float> > v1 (complex <float> (1.0f, 2.0f), 100);
>>        valarray <complex <float> > v2 (complex <float> (3.0f, 4.0f), 100);
>>        std::cout << (v1 * v2).sum ();
>>
>> produces
>>
>> (-500,1000)
>> benchmark has exited with status 0.
>>
>> on my Mac as expected.
>>
>> Perhaps the error is with particular values of v1, v2 etc.? (BTW,  
>> the complex multiply then sum is also optimized to use some  
>> combination of vectorized fma, from recollection, so any error  
>> would start at valarray_altivec.h:154 -- test that is involved by  
>> inserting a std::cout << "x" in the static "call" function.) You  
>> can do a random search of the problem space by looking at  
>> exhaustive.cpp and configuring it with the right functor template,  
>> stdext::accumulator <stdext::plus>.
>>
>>
>
> I'd like to reiterate that the expression above works just fine  
> with my code as well. It works well with -O0 and -O1 but *not* with  
> -O2 and -O3. Again, I disassembled the code and ran it instruction  
> by instruction to see its flow. Even with some C++ code  
> rearrangement inside complex_fma the same code is generated with O2  
> and O3: a ppc-decrement-counter-and-branch into itself
> (<label+offset>: bdnz label+offset) -- i.e. an empty loop that goes on
> decrementing the counter until it's zero. Afterwards it fp-loads  
> the supposedly calculated value and fp-stores it into my variable.  
> And it contains gibberish.
>
> With -O1 the loop looks & works perfectly normal (I'd say, even  
> beautiful).
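
For comparing results across -O0 to -O3, a plain scalar reference can
help separate a wrong answer from the vectorized path from a wrong
answer in the test itself. A minimal sketch, assuming standard
std::complex and std::vector rather than the stdext types; the name
reference_sum is made up for illustration and is not part of macstl:

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper (not macstl code): multiply two sequences
// element by element and sum the products with a plain scalar loop.
std::complex<float> reference_sum(const std::vector<std::complex<float> >& a,
                                  const std::vector<std::complex<float> >& b)
{
    assert(a.size() == b.size());
    std::complex<float> total(0.0f, 0.0f);
    for (std::size_t i = 0; i < a.size(); ++i)
        total += a[i] * b[i];   // scalar complex multiply, then accumulate
    return total;
}
```

With 100 copies of (1,2) and 100 copies of (3,4) this gives (-500,1000),
the same value as the test quoted above, so it can serve as the expected
value when checking (v1 * v2).sum () at each optimization level.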

The accumulating loop is found in valarray_altivec.h:154. Some of the  
things I would try that you may or may not have tried already:

1.    Throw a spanner into the optimizer's works. The optimizer usually
cannot optimize across an output statement or a volatile memory
write, so you can try either. E.g. create a global volatile static
int, then write to it inside the loop and at any other places you
suspect are being overoptimized. The place where the write stops the
overoptimization gives you a clue as to what level it's occurring at.

2.    If you're getting this error only with sum () and not with regular
assigns e.g. vr = v1 * v2 or vr = v1 * v2 + v3, then it's a pretty
good bet it has something to do with the init parameter of the
accumulating loop at valarray_altivec.h:154. Try changing the parameter
declaration there from T init to const T& init, and copying init to a
private init_copy within the function. Try making it volatile etc.

3.    The code at line 154 is called from valarray_algorithm.h:60, so
that is another place to try 1, 2 and other things to see if this is
where the overoptimization happens. This is where the valarray is
examined so that only the initial sequence is vectorized, while the
leftover tail elements use a scalar loop (called tail).
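
To make 1 to 3 concrete, here is a rough sketch of the kind of
instrumentation I mean. It is not the macstl code: accumulate_probe and
g_probe are made-up names, and the chunk-of-4 loop only stands in for the
vectorized part. It shows the volatile write inside both loops, init
taken by const reference and copied to a private init_copy, and the
head/tail split:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical probe (item 1): writes to a volatile object are
// observable, so the compiler cannot drop them or the loop around them.
static volatile int g_probe = 0;

// Hypothetical stand-in for the accumulating loop, not macstl's own.
template <typename T>
T accumulate_probe(const T* data, std::size_t n, const T& init)
{
    assert(data != 0 || n == 0);

    // Item 2: init arrives by const reference; work on a private copy.
    T init_copy = init;

    // Item 3: the head is processed in chunks of 4 (standing in for the
    // vectorized part); whatever is left over goes to the scalar tail.
    const std::size_t chunk = 4;
    const std::size_t head = n - n % chunk;

    for (std::size_t i = 0; i < head; i += chunk) {
        init_copy += data[i] + data[i + 1] + data[i + 2] + data[i + 3];
        g_probe = static_cast<int>(i);   // item 1: probe in the head loop
    }
    for (std::size_t i = head; i < n; ++i) {
        init_copy += data[i];
        g_probe = static_cast<int>(i);   // item 1: probe in the tail loop
    }
    return init_copy;
}
```

Removing the probe writes one at a time and rebuilding at -O2 should
narrow down which loop the optimizer is mangling. The probe only shows
where the problem is; it is not meant to stay in as a fix.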

The FSF gcc 4.0 release is dated 20 April 2005, and I suspect Apple put
in a lot of effort over and beyond that to get it working with
Altivec code for Tiger and Xcode 2.1 -- more's the pity they seem to
be all for switching to Intel. So if we can't resolve the
overoptimization, we may be better off waiting for 4.0.1 and leaving
3.4.x as the only compiler supported for YDL at all optimization
levels. According to the gcc.gnu.org site, the 4.0 branch has been
frozen as of 13 June in preparation for the 4.0.1 release.

Cheers, Glen Low


---
pixelglow software | simply brilliant stuff
www.pixelglow.com
aim: pixglen
