[macstl-dev] not all rosy in the gcc-4.0.0 land
lipovsky at skycomputers.com
Sun Jul 3 03:10:01 WST 2005
>> Could you please try that with valarray<stdext::complex <float> >? That is
>> where my code fails.
> Sure thing.
> using namespace stdext;
> valarray <complex <float> > v1 (complex <float> (1.0f,2.0f), 100);
> valarray <complex <float> > v2 (complex <float> (3.0f,4.0f), 100);
> std::cout << (v1 * v2).sum ();
> benchmark has exited with status 0.
> on my Mac as expected.
> Perhaps the error shows up only with particular values of v1, v2, etc.? (BTW,
> if I recall correctly, the complex multiply-then-sum is also optimized into
> some combination of vectorized fma, so any error would start at
> valarray_altivec.h:154; you can check that this path is involved by inserting
> a std::cout << "x" in the static "call" function.) You can do a random search
> of the problem space by looking at exhaustive.cpp and configuring it with the
> right functor template, stdext::accumulator <stdext::plus>.
I'd like to reiterate that the expression above works just fine with my
code as well. My real code, however, only works with -O0 and -O1; it fails
with -O2 and -O3.
Again, I disassembled the code and stepped through it instruction by instruction
to follow its flow. Even after rearranging some of the C++ code inside complex_fma,
-O2 and -O3 generate the same code: a PPC decrement-counter-and-branch into itself
(<label+offset>: bdnz label+offset), i.e. an empty loop that keeps decrementing
the counter until it reaches zero. Afterwards it does a floating-point load of the
supposedly calculated value and a floating-point store into my variable. And it
With -O1 the loop looks and works perfectly normally (I'd say, even
I tried rewriting the multiplies_plus functor in several different ways, but
all to no avail. I'm having trouble even pinpointing what exactly triggers the
over-optimization. Still investigating.