[macstl-dev] Good news (finally) about gcc 4.0 problems
lipovsky at skycomputers.com
Sat Sep 24 08:02:26 WST 2005
(I just saved a bunch of money on my car insurance... (Just kidding,
that was just taken from a U.S. commercial... :-D ) )
Seriously, the good news is that it strongly appears that I was able to
solve the problems with FSF's gcc 4.0 generating bad code on Linux PPC.
I am very thankful to Andrew Pinski (a gcc maintainer) for his suggestion.
Essentially all I needed to do was to modify altivec.h to change all
instances of simple "inline" into "INLINE" defined as "inline
__attribute__ ((always_inline))", a familiar macro.
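To make the change concrete, here is a minimal sketch of the macro and how it
would be applied. It is not the actual altivec.h: madd_wrapper is a
hypothetical stand-in using plain floats so that it compiles on any machine,
whereas the real header functions take vector arguments.

```c
/* The macro described above: ask gcc to inline the function even in
   cases where its own heuristics would decline to. */
#define INLINE inline __attribute__ ((always_inline))

/* Hypothetical stand-in for one of the wrapper functions in altivec.h.
   Before the change this would have been declared with a plain "inline". */
static INLINE float madd_wrapper (float a, float b, float c)
{
  return a * b + c;
}
```

With the attribute in place gcc treats inlining as mandatory rather than as a
hint, so a call such as madd_wrapper (2.0f, 3.0f, 4.0f) should expand in
place instead of becoming an out-of-line call.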
I know, I know: I should've noticed it earlier. Sometimes you miss
things that, as you think later on, were staring you in the face (but
who would doubt a standard header?).
So, to reiterate, the stuff that previously ran so badly (in terms of
generating correct results) now runs *well and fast*.
I am going to implement similar changes in the altivec.h file that
belongs to gcc 3.4. Also, I am going to add some macro logic to the
config.h file to conditionally include the appropriate versions of the
manually changed altivec.h. We should not use the standard altivec.h header.
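The config.h logic could look roughly like the sketch below. The macro name
MACSTL_ALTIVEC_HEADER and both patched header file names are assumptions made
up for illustration, and the check on __APPLE_CC__ assumes that Apple's gcc
should keep using its own header.

```c
/* Pick a patched altivec.h by compiler version (FSF gcc only).
   All header file names below are hypothetical. */
#if defined (__GNUC__) && !defined (__APPLE_CC__)
#  if __GNUC__ == 4 && __GNUC_MINOR__ == 0
#    define MACSTL_ALTIVEC_HEADER "altivec_gcc40_inline.h"
#  elif __GNUC__ == 3 && __GNUC_MINOR__ == 4
#    define MACSTL_ALTIVEC_HEADER "altivec_gcc34_inline.h"
#  endif
#endif

/* Any other compiler or version falls back to the stock header. */
#ifndef MACSTL_ALTIVEC_HEADER
#  define MACSTL_ALTIVEC_HEADER "altivec.h"
#endif

/* On a PowerPC build with AltiVec enabled, config.h would then do
     #include MACSTL_ALTIVEC_HEADER
   which is left out here so the sketch compiles on any machine. */

/* Helper for inspecting which header was selected. */
static const char *altivec_header_name (void)
{
  return MACSTL_ALTIVEC_HEADER;
}
```

Keeping the choice in a single macro means the rest of the code includes one
name and never has to know which gcc version is in use.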
Andrew Pinski mentioned in a later email to me that gcc 4.0.3 will
possibly have the fast header. In an earlier email he also said that
the future FSF gcc 4.1 should expand the vec_* routines into
builtins directly, avoiding going through the definitions in altivec.h
(just like what Apple gcc does). We'll see.
At some point I still want to submit a bug report about gcc 4.0. As it
stands right now, however, the "INLINE" macro is a very successful hack.
Glen Low wrote:
> On 20/09/2005, at 10:46 AM, Ilya Lipovsky wrote:
>> On Tue, 20 Sep 2005, Glen Low wrote:
>>> Another alternative that won't lose you the performance is to
>>> #define NO_CHUNKING_ITERATOR in config.h:40 for your gcc. While the
>>> Apple gcc 4.0 build doesn't define it by default, I did test
>>> compiling with it on and it seems to be OK on Apple gcc 4.0. What
>>> this does is force macstl to access __vector float through a type
>>> that can alias anything else.
>> That changed the execution behavior. Now it executes with different
>> results, which are neither NaN nor inf. They are all different
>> floating point numbers, which clearly are a function of the input(s)
>> but *totally off*!!
>> In essence, it's a pretty nasty gcc bug. That's all I know for now.
>> I attached here the assembly code for the cases of:
>> #define'd NO_CHUNKING_ITERATOR (glen_without_chunking_iter.s)
>> the regular case (glen_with_chunking_iter.s)
> I had a quick look at the .s files and they look really weird. Mainly,
> the intrinsic calls are not being inlined e.g. vec_madd, vec_cmpeq,
> vec_cmpb, vec_and and so on. These appear to be the GNU ones not the
> macstl ones (the macstl ones never have the vec_ prefix and "and" is
> called "vand") -- which indicates to me something's wrong with the
> declaration in <altivec.h> that's needed to make Altivec work on FSF gcc.
> Are these the ones produced by -O1 instead of -O3?
> Can you email me the line in <altivec.h> that declares vec_madd etc.?
> May be barking up the wrong tree here but I know at some stage gcc had
> problems moving vector arguments around, this would be exacerbated by
> the lack of inlining intrinsics.
> Cheers, Glen Low
> pixelglow software | simply brilliant stuff
> aim: pixglen