[macstl-dev] Re: Question

Glen Low glen.low at pixelglow.com
Tue Jul 19 07:59:56 WST 2005

  • Previous message: [macstl-dev] Re: Question
  • Next message: [macstl-dev] Proposal for mixed complex and real arithmetic
  • Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]


Ilya:

On 19/07/2005, at 12:15 AM, Ilya Lipovsky wrote:

>> Thus 0.3.1 or 0.3.2 will have the always_inline turned on, but in  
>> this fashion:
>>
>> A.    We'll define a macro called ALWAYS_INLINE (any other  
>> suggestions?) in config.h and define it appropriately for gcc and  
>> Visual C++ (and possibly ICC). Hopefully the attribute should work  
>> in the same syntactic position as inline already does in C++. (I  
>> don't want to redefine the inbuilt "inline" because it would  
>> conflict with client code at the least, and I feel it's dangerous  
>> to redefine built-ins.)
>> B.    We'll have to tag ALL candidate functions with this, even  
>> the ones within classes and class templates that are implicitly  
>> inline, in order to force them to be inlined. In general this  
>> should be OK since a large part of them are generated via private  
>> macros anyway.
>> C.    A clean compile with no explicit inline tuning at the  
>> compiler level should still provide the same performance level as  
>> the current high inlining levels.
>>
>>
>
> Seems like a good idea to me. One note, however: *some* of my  
> benchmarks run slightly (e.g. 54 as opposed to 50 millisecs for  
> about 1K points)  slower with the always_inline in place of regular  
> inline. I don't exactly know how the compiler optimizes things, but  
> it seems to me that "always_inline" is invoked before any  
> optimizations take place, thus making it harder for the optimizer  
> to do its job, because, I think, it is harder for it to discern the
> original structure. For gcc 3.4 there is practically no difference.  
> gcc 4.0 is pickier (nevertheless, it produces correct results!!).  
> It may be using some kind of heuristic to determine when "enough is  
> enough" for the code's mass, and avoids any further implicit  
> inlining. This can be avoided, I surmise, if we implement the  
> ALWAYS_INLINE macro the way you've proposed, thus forcing the  
> compiler to inline everything.
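
As a rough sketch of point A above: only the name ALWAYS_INLINE comes from the proposal. The compiler checks, the fallback branch and the two helpers (plus_term, twice) are assumptions made up for illustration, not what config.h will necessarily contain.

```cpp
// Hypothetical config.h fragment. __forceinline is the Visual C++ keyword,
// __attribute__((always_inline)) is the gcc attribute (ICC in gcc-compatible
// mode also defines __GNUC__). Anything else falls back to plain inline.
#if defined(_MSC_VER)
#define ALWAYS_INLINE __forceinline
#elif defined(__GNUC__)
#define ALWAYS_INLINE inline __attribute__((always_inline))
#else
#define ALWAYS_INLINE inline
#endif

// Hypothetical functor: even a member defined inside the class body
// (implicitly inline) carries the tag, as in point B.
template <typename T>
struct plus_term
{
    ALWAYS_INLINE T operator()(const T& a, const T& b) const { return a + b; }
};

// Hypothetical free function tagged in the same position as "inline".
ALWAYS_INLINE float twice(float x)
{
    return plus_term<float>()(x, x);
}
```

The macro sits exactly where inline would, so the built-in keyword is left alone and client code is unaffected.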

Interesting, I've been having similar problems. gcc 4.0 has very good
CSE (common subexpression elimination) powers but they tend to tire
out when the chain of expressions is too long or when always_inline
interrupts it. It's been fairly hit and miss so far, though
I'm narrowing in on a couple of solutions. At the least you'd get

valarray <float> f;
f + f

using only one lvx (the AltiVec vector load), versus two lvx's if the
compiler cannot detect the common subexpression "f".
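
To make the one-load versus two-load difference concrete, here is a scalar stand-in. It uses std::valarray rather than the macstl one, and each float read plays the role of an lvx. The function names are made up for illustration.

```cpp
#include <cstddef>
#include <valarray>

// What you get when the common subexpression "f" is NOT detected:
// the same element is read twice per iteration.
inline std::valarray<float> sum_two_loads(const std::valarray<float>& f)
{
    std::valarray<float> r(f.size());
    for (std::size_t i = 0; i != f.size(); ++i)
        r[i] = f[i] + f[i];
    return r;
}

// What successful CSE amounts to: one read, reused for both operands.
inline std::valarray<float> sum_one_load(const std::valarray<float>& f)
{
    std::valarray<float> r(f.size());
    for (std::size_t i = 0; i != f.size(); ++i)
    {
        const float x = f[i];   // single load
        r[i] = x + x;
    }
    return r;
}
```

Both give the same values. The second is just the form a good optimizer should reach by itself from f + f.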

The two things that have helped so far are (1) turning -maltivec or
-mcpu=G5 off (this even impacts CSE of scalar code for some strange
reason, although I'm not sure whether the FSF version has the same
bug/feature), and (2) having array_term store elements as __vector
float instead of vec <float, 4>, and having the chunk_iterator
convert them into vec <float, 4> on the fly.
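
The shape of (2), very roughly, in portable C++. Here raw4 stands in for __vector float and vec4 for vec <float, 4>. The class layouts are assumptions for illustration only and are not the real macstl definitions.

```cpp
#include <cstddef>

// Stand-in for the raw __vector float: a plain aggregate with no operators.
struct raw4 { float lane[4]; };

// Stand-in for vec <float, 4>: a wrapper that adds the operators.
class vec4
{
public:
    explicit vec4(const raw4& r) : data_(r) {}
    float operator[](std::size_t i) const { return data_.lane[i]; }

    friend vec4 operator+(const vec4& a, const vec4& b)
    {
        raw4 r;
        for (std::size_t i = 0; i != 4; ++i)
            r.lane[i] = a.data_.lane[i] + b.data_.lane[i];
        return vec4(r);
    }

private:
    raw4 data_;
};

// Hypothetical chunk iterator: storage stays raw, and the wrapper is only
// created on dereference, so operator* has to return by value.
class chunk_iterator
{
public:
    explicit chunk_iterator(const raw4* p) : p_(p) {}
    vec4 operator*() const { return vec4(*p_); }
    chunk_iterator& operator++() { ++p_; return *this; }
    bool operator!=(const chunk_iterator& o) const { return p_ != o.p_; }

private:
    const raw4* p_;
};
```

The by-value return from operator* is the conversion that has to be inlined away for this arrangement to cost nothing.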

always_inline seems to sometimes kill the CSE, but it's unavoidable  
for the solution in (2), since the chunk_iterator needs to return a  
vectorized value.




Cheers, Glen Low


---
pixelglow software | simply brilliant stuff
www.pixelglow.com
aim: pixglen
