[macstl-dev] not all rosey in the gcc-4.0.0 land

Glen Low glen.low at pixelglow.com
Sat Jul 2 09:46:52 WST 2005


On 02/07/2005, at 7:24 AM, Ilya Lipovsky wrote:

> Glen,
> Looks like the expression
>
> (u * v).sum()
>
> produces bad code in gcc 4.0.0, if inline is NOT #defined as
> "inline __attribute__ ((always_inline))".
>
> If, however, inline IS #defined as "inline __attribute__
> ((always_inline))", then the code is correct and the run-time
> results are OK.

That is strange indeed. On Apple gcc 4.0 (as of Xcode 2.1), the  
following:

         using namespace stdext;
         valarray <float> v1 (10.f, 100);
         valarray <float> v2 (20.f, 100);
         std::cout << (v1 * v2).sum ();

produces 20000 as expected. Do you not get this result with Yellow  
Dog Linux?
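
To narrow down where things go wrong on YDL, it may help to compare the valarray expression against a plain scalar loop. The following is only a sketch: it uses std::valarray as a stand-in for stdext::valarray (an assumption, so it will not exercise the AltiVec path unless you swap the type), and the function names product_sum and product_sum_reference are made up for illustration.

```cpp
#include <cstddef>
#include <valarray>

// Sum of the element-wise product of two arrays filled with constants.
// Assumption: std::valarray stands in for stdext::valarray here.
float product_sum(float a, float b, std::size_t n)
{
    std::valarray<float> v1(a, n);   // n copies of a
    std::valarray<float> v2(b, n);   // n copies of b
    return (v1 * v2).sum();          // the expression under suspicion
}

// The same quantity computed with a scalar loop, for comparison.
float product_sum_reference(float a, float b, std::size_t n)
{
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        total += a * b;
    return total;
}
```

Calling both with (10.f, 20.f, 100) at -O1 and again at -O3 should show whether the two ever disagree, and at which optimization level.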

You may have to check carefully where the problem occurs in YDL,  
since this is explicitly one of the expressions optimized to use  
vmaddfp through valarray_altivec.h:154.

> This is a bug in the compiler's optimizer :( . This time I have  
> rechecked myself by running gdb and looking at the actual  
> instruction sequences. When reducing my -O3 option down to -O1  
> everything is *fine*!!!
>
> Without redefining inline gcc's O3/O2-level optimizer "thinks" that  
> the operands can be safely taken out of the loop, thus producing an  
> empty loop (i.e. a bdnz onto itself!).
>
> Another thing, since we're on the "inline" subject: it looks like  
> inline almost never helps to produce more efficient code. This is  
> in contrast to the situation I faced on the 3.3 compiler. Actually  
> sometimes it drastically reduces performance (for example in the  
> operator/ case).

That too is unusual. When I switched from Apple gcc 3.3 to 4.0, the
main change in performance was that 4.0 didn't like the Barton-Nackman
trick used in the vec template. I had vec inherit from vec_base,
where I put all the common functionality, and while 3.3 optimized
this correctly, 4.0 didn't. So flattening out vec totally (hence the
new DEFINE_VEC_CLASS_GUTS) significantly improved performance and
actually helped speed up other bits of code like the trig tests.
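
For anyone unfamiliar with the two layouts, here is a rough sketch of the difference. All names in it (vec_base_sketch, layered_vec4, flat_vec4, make_ramp) are hypothetical and are not the actual macstl classes or the contents of DEFINE_VEC_CLASS_GUTS; it only illustrates a base template supplying a shared operator versus each class carrying its own copy.

```cpp
// Layered form: the common operation lives in a base template that is
// parameterized on the derived type and injects a friend operator.
template <typename Derived>
struct vec_base_sketch
{
    friend Derived operator+(const Derived& a, const Derived& b)
    {
        Derived r;
        for (int i = 0; i < Derived::length; ++i)
            r.data[i] = a.data[i] + b.data[i];
        return r;
    }
};

struct layered_vec4 : vec_base_sketch<layered_vec4>
{
    static const int length = 4;
    float data[4];
};

// Flattened form: no base class, the operation is repeated in the class.
struct flat_vec4
{
    static const int length = 4;
    float data[4];

    friend flat_vec4 operator+(const flat_vec4& a, const flat_vec4& b)
    {
        flat_vec4 r;
        for (int i = 0; i < length; ++i)
            r.data[i] = a.data[i] + b.data[i];
        return r;
    }
};

// Helper for trying either form: fills with start, start + 1, ...
template <typename V>
V make_ramp(float start)
{
    V v;
    for (int i = 0; i < V::length; ++i)
        v.data[i] = start + i;
    return v;
}
```

Both forms compute the same result; the point is only that the flattened one gives the optimizer a single class to look through instead of a base and a derived class.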

I had to adjust the inlining options to avoid overly long compile
times while still getting good performance. Do try these options:

-falign-loops=16 -falign-jumps=16 -falign-functions=16
-finline-limit=10000 -mcpu=G5

and for benchmarking, these additional flags:

  --param large-function-growth=50000 --param inline-unit-growth=50000

I don't like using __attribute__ ((always_inline)) since it isn't
standard C++ and has to appear everywhere; moreover, we would have to
tag EVERYTHING with it, including the member functions within classes.
Then we would have to work out the equivalent attribute and its
placement for Visual C++.
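
To give an idea of what that tagging would involve, here is a minimal sketch of a per-compiler macro. The macro name SKETCH_FORCE_INLINE and the example functions are hypothetical and nothing like this exists in macstl; __forceinline is the Visual C++ keyword I would expect to use as the counterpart, but treat the placement as something still to be verified there.

```cpp
// Pick a forced-inline spelling per compiler; fall back to plain inline.
#if defined(_MSC_VER)
    #define SKETCH_FORCE_INLINE __forceinline
#elif defined(__GNUC__)
    #define SKETCH_FORCE_INLINE inline __attribute__((always_inline))
#else
    #define SKETCH_FORCE_INLINE inline
#endif

// Every free function would need the tag...
SKETCH_FORCE_INLINE float multiply_add(float a, float b, float c)
{
    return a * b + c;
}

// ...and so would every member function defined inside a class.
struct accumulator
{
    float total;

    SKETCH_FORCE_INLINE void add(float x) { total += x; }
};
```

Even in this tiny example the macro shows up on every definition, which is the maintenance cost described above.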

Cheers, Glen Low


---
pixelglow software | simply brilliant stuff
www.pixelglow.com
aim: pixglen
