[macstl-dev] not all rosey in the gcc-4.0.0 land
glen.low at pixelglow.com
Sat Jul 2 09:46:52 WST 2005
On 02/07/2005, at 7:24 AM, Ilya Lipovsky wrote:
> Looks like the expression
> (u * v).sum()
> produces bad code in gcc 4.0.0, if inline is NOT #defined as
> "inline __attribute__ ((always_inline))".
> If, however, inline IS #defined as "inline __attribute__
> ((always_inline))", then the code is correct and the run-time
> results are OK.
That is strange indeed. On Apple gcc 4.0 (as of Xcode 2.1), the
following

    using namespace stdext;
    valarray <float> v1 (10.f, 100);
    valarray <float> v2 (20.f, 100);
    std::cout << (v1 * v2).sum ();

produces 20000 as expected. Do you not get this result with Yellow
Dog Linux?
You may have to check carefully where the problem occurs in YDL,
since this is explicitly one of the expressions optimized to use
vmaddfp through valarray_altivec.h:154.
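
As a portable baseline for the check above, here is a minimal sketch that uses std::valarray as a stand-in for stdext::valarray. It does not exercise macstl's AltiVec path, so it only confirms the expected arithmetic (100 elements of 10 * 20). The helper name product_sum is hypothetical and not part of macstl.

```cpp
#include <cstddef>
#include <valarray>

// Baseline using the standard library only. std::valarray is an assumption
// standing in for stdext::valarray, so no vmaddfp code path is involved.
float product_sum(float a, float b, std::size_t n)
{
    std::valarray<float> v1(a, n);   // n copies of a
    std::valarray<float> v2(b, n);   // n copies of b
    return (v1 * v2).sum();          // element-wise product, then reduction
}
```

Calling product_sum(10.f, 20.f, 100) at -O1 and -O3 and comparing the result with the stdext version may help narrow down whether the problem is in the optimizer or in the AltiVec-specific code.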
> This is a bug in the compiler's optimizer :( . This time I have
> rechecked myself by running gdb and looking at the actual
> instruction sequences. When reducing my -O3 option down to -O1
> everything is *fine*!!!
> Without redefining inline, gcc's O3/O2-level optimizer "thinks" that
> the operands can be safely taken out of the loop, thus producing an
> empty loop (i.e. a bdnz onto itself!).
> Another thing, since we're on the "inline" subject: it looks like
> inline almost never helps to produce more efficient code. This is
> in contrast to the situation I faced on the 3.3 compiler. Actually
> sometimes it drastically reduces performance (for example in the
> operator/ case).
That too is unusual. When I switched from Apple gcc 3.3 to 4.0, the
main change in performance was that it didn't like the Barton-Nackman
trick used in the vec template -- I had vec inherit from vec_base
where I put all the common functionality, and while 3.3 optimized
this correctly, 4.0 didn't. So flattening out vec totally (hence the
new DEFINE_VEC_CLASS_GUTS) significantly improved performance and
actually helped with speeding up other bits of code like the trig tests.
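
To illustrate the two layouts described above, here is a rough sketch. The actual contents of vec_base and DEFINE_VEC_CLASS_GUTS are not shown in this message, so every name below (vec_base_sketch, SKETCH_VEC_GUTS, layered_vec4f, flat_vec4f) is hypothetical and only mirrors the general idea: shared members in a base class template versus the same members pasted into each class by a macro.

```cpp
#include <cstddef>

// Layout A: common functionality lives in a base class template that reaches
// into the derived class (the inheritance approach described above).
template <typename Derived, typename T, std::size_t N>
struct vec_base_sketch
{
    T sum() const
    {
        const Derived& self = static_cast<const Derived&>(*this);
        T total = T();
        for (std::size_t i = 0; i != N; ++i)
            total += self.data[i];
        return total;
    }
};

struct layered_vec4f : vec_base_sketch<layered_vec4f, float, 4>
{
    float data[4];
};

// Layout B: the same members are expanded directly into each class by a
// macro, so there is no base class for the optimizer to see through.
#define SKETCH_VEC_GUTS(T, N)                    \
    T data[N];                                   \
    T sum() const                                \
    {                                            \
        T total = T();                           \
        for (std::size_t i = 0; i != N; ++i)     \
            total += data[i];                    \
        return total;                            \
    }

struct flat_vec4f
{
    SKETCH_VEC_GUTS(float, 4)
};
```

Both classes expose the same interface. Whether the flattened form is actually faster depends on the compiler version, as the gcc 3.3 versus 4.0 experience above suggests.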
I had to adjust the inline options to avoid overly long compile times
while still getting good performance. Do try these flags:

    -falign-loops=16 -falign-jumps=16 -falign-functions=16 -finline-

and for benchmarks, these additional flags:

    --param large-function-growth=50000 --param inline-unit-growth=50000
I don't like using __attribute__ ((always_inline)) since it's not
standard C++ and has to appear everywhere, and moreover we would have to
tag EVERYTHING, including the member functions within classes as
such. Then work out the equivalent attribute and its placement for
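
For reference, a minimal sketch of what such tagging could look like if it were hidden behind a macro. The macro name SKETCH_FORCE_INLINE and the example functions are hypothetical and not part of macstl. Only the GCC attribute is handled here, and other compilers simply fall back to plain inline.

```cpp
// Hypothetical wrapper macro: GCC-compatible compilers get always_inline,
// everything else gets the standard keyword only.
#if defined(__GNUC__)
#define SKETCH_FORCE_INLINE inline __attribute__((always_inline))
#else
#define SKETCH_FORCE_INLINE inline
#endif

struct scaled
{
    float factor;

    // Member functions defined in-class are implicitly inline, but would
    // still need the tag if forced inlining is wanted.
    SKETCH_FORCE_INLINE float apply(float x) const { return factor * x; }
};

// Free functions need the same tag at every definition.
SKETCH_FORCE_INLINE float multiply_add(float a, float b, float c)
{
    return a * b + c;
}
```

This shows the maintenance cost mentioned above: every function, member or free, has to carry the tag.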
Cheers, Glen Low
pixelglow software | simply brilliant stuff