[macstl-dev] macstl 0.3.0.32

Glen Low glen.low at pixelglow.com
Fri Aug 12 23:56:08 WST 2005

  • Previous message: [macstl-dev] macstl 0.3.0.32
  • Next message: [macstl-dev] macstl 0.3.1 soon out; time's up for "Hell Freezes Over" promotion
  • Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]


On 11/08/2005, at 11:14 PM, Glen Low wrote:

> Dear All
>
> Just committed the new version of macstl to the Subversion
> repository. This has extensive but largely transparent changes to
> support better CSE and inlining, especially when compiling
> -faltivec without -maltivec on Apple gcc 4.0. I have optimized
> literal terms in valarray expressions as well as element access to
> valarray.

A little elaboration on the improvements in 0.3.0.32

1.    Better CSE (common subexpression elimination). When compiling
with -faltivec but without -maltivec on Apple gcc 4.0, expressions
like the following now need only the minimum number of loads:

valarray <float> v1, v2, vr;

vr = v1 + v1 + v1 + v1;        // 1 load, 3 adds in the inner loop instead of 4 loads, 3 adds
vr = (v1 + v2) + (v1 + v2);    // 2 loads, 2 adds in the inner loop instead of 4 loads, 3 adds

With -faltivec and -maltivec both on, you will get slightly worse
results.
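As a rough illustration of those counts, here is a plain scalar sketch (not macstl code; the function names are hypothetical) of what the inner loop for vr = (v1 + v2) + (v1 + v2) does per element, first as written and then after common subexpression elimination:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// As written, before any CSE: v1 [i] and v2 [i] are each loaded twice,
// giving 4 loads and 3 adds per element.
void add_twice_naive (const std::vector<float>& v1, const std::vector<float>& v2,
                      std::vector<float>& vr)
{
    assert (v1.size () == v2.size () && v1.size () == vr.size ());
    for (std::size_t i = 0; i != vr.size (); ++i)
        vr [i] = (v1 [i] + v2 [i]) + (v1 [i] + v2 [i]);
}

// After CSE: each operand is loaded once and (v1 + v2) is computed once,
// giving 2 loads and 2 adds per element.
void add_twice_cse (const std::vector<float>& v1, const std::vector<float>& v2,
                    std::vector<float>& vr)
{
    assert (v1.size () == v2.size () && v1.size () == vr.size ());
    for (std::size_t i = 0; i != vr.size (); ++i)
    {
        const float a = v1 [i];    // load 1
        const float b = v2 [i];    // load 2
        const float s = a + b;     // add 1
        vr [i] = s + s;            // add 2
    }
}
```

Both produce the same values; the difference is only in how much work the inner loop does per element.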

2.    Inlining. macstl no longer requires turning inline limits up to
the wazoo on gcc, Visual C++ and ICC; instead it uses the minimum of
forced inlining needed to get inner loops to compile as one code
path. Besides helping compile times, this also helps with -faltivec
without -maltivec, which otherwise won't inline vector code.
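The post doesn't say how the forced inlining is spelled. A minimal sketch, assuming the usual compiler extensions: gcc's __attribute__ ((always_inline)) (which ICC also accepts where it defines __GNUC__) and Visual C++'s __forceinline. The FORCE_INLINE macro, plus_expr and evaluate are hypothetical names; the idea shown is forcing inlining only on the tiny per-element accessor so the loop compiles as one code path, while leaving everything else to the compiler's normal limits.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical macro: force inlining only where it matters.
#if defined (_MSC_VER)
#define FORCE_INLINE __forceinline
#elif defined (__GNUC__)
#define FORCE_INLINE inline __attribute__ ((always_inline))
#else
#define FORCE_INLINE inline
#endif

// Hypothetical expression node for lhs + rhs over raw float arrays.
struct plus_expr
{
    const float* lhs;
    const float* rhs;

    // The per-element accessor is tiny and called once per iteration,
    // so this is the one place inlining is forced.
    FORCE_INLINE float operator[] (std::size_t i) const { return lhs [i] + rhs [i]; }
};

// The loop itself is an ordinary function; the compiler's normal
// heuristics decide whether to inline it into its caller.
void evaluate (const plus_expr& expr, float* result, std::size_t n)
{
    assert (n == 0 || (result != nullptr && expr.lhs != nullptr && expr.rhs != nullptr));
    for (std::size_t i = 0; i != n; ++i)
        result [i] = expr [i];
}
```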

3.    Literal terms. Expressions containing literal terms should be
faster, both in the inner loop (especially with -faltivec without
-maltivec) and in the prolog/epilog sections, e.g.

vr = 3.0 + v1;    // 3.0 is a literal term
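One way a literal term can be made cheap, sketched under assumptions (chunk4 stands in for an AltiVec vector float, literal_term and add_literal are hypothetical, and the alignment prolog is omitted): splat the scalar into a chunk once when the term is built, so the inner loop just reuses it, while leftover epilog elements use the plain scalar.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical 4-wide chunk standing in for an AltiVec vector float.
struct chunk4 { float v [4]; };

// Hypothetical literal term: the scalar is splatted into a chunk once,
// when the term is built, not on every trip through the inner loop.
class literal_term
{
public:
    explicit literal_term (float value): value_ (value), splat_ {{value, value, value, value}} { }

    float operator[] (std::size_t) const { return value_; }       // leftover (epilog) elements
    const chunk4& chunk (std::size_t) const { return splat_; }    // inner-loop chunks

private:
    float value_;
    chunk4 splat_;
};

// Evaluate vr = lit + v1: whole chunks in the inner loop, leftovers one by one.
void add_literal (const literal_term& lit, const float* v1, float* vr, std::size_t n)
{
    assert (n == 0 || (v1 != nullptr && vr != nullptr));
    const std::size_t chunks = n / 4;
    for (std::size_t c = 0; c != chunks; ++c)
    {
        const chunk4& k = lit.chunk (c);    // reuses the precomputed splat
        for (std::size_t j = 0; j != 4; ++j)
            vr [c * 4 + j] = k.v [j] + v1 [c * 4 + j];
    }
    for (std::size_t i = chunks * 4; i != n; ++i)    // epilog
        vr [i] = lit [i] + v1 [i];
}
```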

4.    Element access to valarray. Previous versions generated poor
code for element access to chunked valarrays, so v1 [0] was slow, and
if a valarray was chunked but the expression as a whole wasn't,
evaluating that expression was much slower than the equivalent scalar
code. Thanks to gcc 4.0's better support for proxies and temporaries,
I was able to rearrange the iterators so that element access is now as
fast as C element access, and unchunked expressions should evaluate
almost as fast as the C equivalent. All this rework is still fully
aliasing compliant, so you can still mix element access and chunked
operations without worrying that the compiler will reorder them
wrongly.

Hint: to check whether a particular expression is chunked, look for
the chunk_begin member, e.g.

((v1 + v2) + v3).chunk_begin ();    // compiles because (v1 + v2) + v3 is chunked
atan (v1).chunk_begin ();           // doesn't compile because atan (v1) isn't chunked;
                                    // no vectorized version of atan is available yet.
                                    // In 0.3 and earlier, atan (v1) used to be slower than
                                    // the implied hand-coded loop, but now it should be
                                    // almost the same speed.

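Or, if you're using gcc or a compatible compiler, you can use __typeof
and look for the const_chunk_iterator typedef, e.g.

typedef __typeof ((v1 + v2) + v3) chunked_type;
typedef chunked_type::const_chunk_iterator it1;      // exists
typedef __typeof (atan (v1)) unchunked_type;
typedef unchunked_type::const_chunk_iterator it2;    // doesn't exist, so this doesn't compile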
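Probing by hand means an expression either compiles or doesn't. If you'd rather get a compile-time bool, a detection trait can test for the same typedef. This is a sketch in C++11 (decltype is the standardized counterpart of __typeof); has_const_chunk_iterator and the two stand-in expression types are hypothetical, not part of macstl:

```cpp
#include <type_traits>

// Hypothetical trait: true if T declares a nested const_chunk_iterator type.
template <typename T>
class has_const_chunk_iterator
{
    // Chosen when U::const_chunk_iterator names a type (SFINAE removes it otherwise).
    template <typename U> static std::true_type test (typename U::const_chunk_iterator*);
    // Fallback for everything else.
    template <typename U> static std::false_type test (...);

public:
    static constexpr bool value = decltype (test<T> (nullptr))::value;
};

// Hypothetical stand-ins for a chunked and an unchunked expression type.
struct chunked_expr { typedef const float* const_chunk_iterator; };
struct unchunked_expr { };

static_assert (has_const_chunk_iterator<chunked_expr>::value, "chunked");
static_assert (!has_const_chunk_iterator<unchunked_expr>::value, "unchunked");
```

With macstl itself this would presumably be used as has_const_chunk_iterator<decltype ((v1 + v2) + v3)>::value, assuming its expression types expose const_chunk_iterator as described above.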

Hint: __typeof is also handy if you want to store an expression for
later (re)evaluation, rather than evaluating it down to a valarray or
statarray, e.g.

__typeof (v1 + v2) temp = v1 + v2;
vr = temp + temp;

is more efficient than

valarray <float> temp = v1 + v2;
vr = temp + temp;

Not as good as having CSE, but at least you don't pay for extraneous  
stores and temp memory.
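To show why storing the unevaluated expression avoids the extra stores, here is a minimal expression template sketch. The types (expr, vec, plus_node) and assign are hypothetical stand-ins for macstl's machinery, and auto is the C++11 equivalent of writing __typeof (v1 + v2):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CRTP base tagging every expression type, so operator+ only matches expressions.
template <typename D>
struct expr
{
    const D& self () const { return static_cast<const D&> (*this); }
};

// Leaf: owns its elements.
struct vec : expr<vec>
{
    std::vector<float> data;
    float operator[] (std::size_t i) const { return data [i]; }
    std::size_t size () const { return data.size (); }
};

// Leaves are held by reference (their elements are never copied); nodes by value.
template <typename T> struct operand_of { typedef T type; };
template <> struct operand_of<vec> { typedef const vec& type; };

// Node for lhs + rhs: computes nothing until an element is requested.
template <typename L, typename R>
struct plus_node : expr<plus_node<L, R> >
{
    typename operand_of<L>::type lhs;
    typename operand_of<R>::type rhs;

    plus_node (const L& l, const R& r): lhs (l), rhs (r) { }
    float operator[] (std::size_t i) const { return lhs [i] + rhs [i]; }
    std::size_t size () const { assert (lhs.size () == rhs.size ()); return lhs.size (); }
};

template <typename L, typename R>
plus_node<L, R> operator+ (const expr<L>& lhs, const expr<R>& rhs)
{
    return plus_node<L, R> (lhs.self (), rhs.self ());
}

// The only place elements get stored: evaluate into a fresh buffer
// (so result may also appear inside e), then swap it in.
template <typename E>
void assign (vec& result, const expr<E>& e)
{
    const E& x = e.self ();
    const std::size_t n = x.size ();
    std::vector<float> out (n);
    for (std::size_t i = 0; i != n; ++i)
        out [i] = x [i];
    result.data.swap (out);
}
```

Usage: auto temp = v1 + v2; assign (vr, temp + temp); In this sketch temp only holds references to v1 and v2, so no float array is allocated or stored for it, but v1 and v2 must outlive temp, and any change to them shows up the next time temp is evaluated.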

5.    -faltivec without -maltivec. Code compiled with these options
should be comparable to, if not faster than, code compiled with
-faltivec and -maltivec, since several spurious non-vectorized memcpy
calls were eliminated.

Cheers, Glen Low


---
pixelglow software | simply brilliant stuff
www.pixelglow.com
aim: pixglen

