[macstl-dev] Re: Question
lipovsky at skycomputers.com
Tue Jul 19 00:15:12 WST 2005
>> #define inline inline __attribute__ ((always_inline))
>> The last directive is essential due to the compiler being stubbornly
>> lazy about inlining certain nested template functions (such as the
>> complex fused multiply-add when used as optimization in certain
> You are right about this, although I'd been stubbornly resisting this
> for a while.
> The main reasons why this will be good:
> 1. Sometimes the compiler is stubborn about inlining functions,
> even at insanely high inlining levels (as you and some others have
> said) that cause the compiler to have heart attacks on slow machines.
> 2. Clients may not want to inline the rest of their code.
> 3. When using -faltivec without -maltivec on Apple gcc, the
> compiler refuses to inline code that contains Altivec ops, presumably
> to allow running on a non-Altivec machine (given the right runtime
> detection). Only the directive seems to overcome it.
> 4. Existing practice. The ppc_intrinsics.h etc. all use
> always_inline too.
One additional very important reason: the optimizer bug in gcc 4.0.0 and
4.0.1 in the case of (u*v).sum does not rear its ugly head with this
> Thus 0.3.1 or 0.3.2 will have the always_inline turned on, but in this
> A. We'll define a macro called ALWAYS_INLINE (any other
> suggestions?) in config.h and define it appropriately for gcc and
> Visual C++ (and possibly ICC). Hopefully the attribute should work in
> the same syntactic position as inline already does in C++. (I don't
> want to redefine the inbuilt "inline" because it would conflict with
> client code at the least, and I feel it's dangerous to redefine
> B. We'll have to tag ALL candidate functions with this, even the
> ones within classes and class templates that are implicitly inline, in
> order to force them to be inlined. In general this should be OK since
> a large part of them are generated via private macros anyway.
> C. A clean compile with no explicit inline tuning at the compiler
> level should still provide the same performance level as the current
> high inlining levels.
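For concreteness, here is a minimal sketch of how points A and B might look. Only the name ALWAYS_INLINE and the gcc attribute come from the discussion above. The use of __forceinline for Visual C++, the plain "inline" fallback, and the vec2_sketch type are assumptions made for illustration, not actual macstl code.

```cpp
// Hypothetical config.h fragment (a sketch, not the real macstl header).
#if defined(_MSC_VER)
   // Assumption: Visual C++ spells forced inlining as __forceinline.
#  define ALWAYS_INLINE __forceinline
#elif defined(__GNUC__)
   // gcc: keep the inline keyword and add the attribute from the thread.
#  define ALWAYS_INLINE inline __attribute__((always_inline))
#else
   // Unknown compiler: fall back to an ordinary inline hint.
#  define ALWAYS_INLINE inline
#endif

// Hypothetical class template standing in for a library vector type.
template <typename T>
struct vec2_sketch {
    T x, y;

    // Point B: tagged explicitly even though an in-class definition
    // is already implicitly inline.
    ALWAYS_INLINE T sum() const { return x + y; }
};

// Free function template tagged in the same position as inline.
template <typename T>
ALWAYS_INLINE vec2_sketch<T> operator*(const vec2_sketch<T>& u,
                                       const vec2_sketch<T>& v)
{
    vec2_sketch<T> r = { u.x * v.x, u.y * v.y };
    return r;
}
```

With this in place, an expression shaped like (u*v).sum() goes through two tagged functions, and client code keeps its own meaning of "inline" because nothing is redefined.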
Seems like a good idea to me. One note, however: *some* of my benchmarks
run slightly slower with always_inline in place of regular inline (e.g.
54 ms as opposed to 50 ms for about 1K points). I don't know exactly how
the compiler optimizes things, but it seems to me that always_inline is
applied before any optimizations take place, which makes the optimizer's
job harder because, I think, the original structure is harder to
discern. For gcc 3.4 there is practically no difference. gcc 4.0 is
pickier (nevertheless, it produces correct results!!). It may be using
some kind of heuristic to decide when "enough is enough" for the code's
size, and then avoids any further implicit inlining. I surmise this can
be avoided if we implement the ALWAYS_INLINE macro the way you've
proposed, thus forcing the compiler to inline everything.
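A rough sketch of how the two variants could be compared side by side. All names here (FORCE_INLINE, madd_plain, madd_forced, dot_plain, dot_forced, time_ms) are hypothetical and this is not the benchmark I actually ran. It only shows the shape of the measurement, and any timings will depend on the compiler version and flags.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Assumption: the gcc attribute where available, plain inline elsewhere.
#if defined(__GNUC__)
#  define FORCE_INLINE inline __attribute__((always_inline))
#else
#  define FORCE_INLINE inline
#endif

// Same multiply-add, once as a plain inline hint, once force-inlined.
inline double madd_plain(double a, double b, double c) { return a * b + c; }
FORCE_INLINE double madd_forced(double a, double b, double c) { return a * b + c; }

// Dot product accumulated through the plain-inline multiply-add.
inline double dot_plain(const std::vector<double>& u, const std::vector<double>& v)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < u.size(); ++i)
        sum = madd_plain(u[i], v[i], sum);
    return sum;
}

// Dot product accumulated through the force-inlined multiply-add.
inline double dot_forced(const std::vector<double>& u, const std::vector<double>& v)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < u.size(); ++i)
        sum = madd_forced(u[i], v[i], sum);
    return sum;
}

// Runs f() reps times, stores the last result, and returns elapsed milliseconds.
template <typename F>
double time_ms(F f, int reps, double& result)
{
    std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r)
        result = f();
    std::chrono::steady_clock::time_point stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Timing both dot_plain and dot_forced over the same 1K-point inputs, with enough repetitions to get above timer resolution, and checking that the results agree, would show whether a given compiler treats the two differently.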