[macstl-dev] Re: [altivec] CSE and the single programmer
glen.low at pixelglow.com
Wed Jul 20 18:46:04 WST 2005
On 20/07/2005, at 9:24 AM, Daniel Berlin wrote:
> On Wed, 2005-07-20 at 08:54 +0800, Glen Low wrote:
>> Hi All
>> Apple gcc 4.0 is leaps and bounds better than 3.3 for CSE (common
>> subexpression elimination). For example in macstl 0.3.x I can now
>> reliably do:
>> valarray <float> a, b;
>> b = a + a + a;
>> and have one lvx (or lfs) within the inner loop, instead of 3 lvx
>> when the compiler can't track the identical origins of a in the
> You're welcome :)
> (I'm responsible for the tree level PRE implementation in GCC)
Cool! Glad to have got you on the line in one of the mailing lists I
>> Since it looks like the compiler gives up CSE at a certain length of
>> expression rather than with a definite combination of options/usage,
>> it feels like there's some sort of "maximum length of CSE'able
>> expression" flag in gcc.
> It does no reassociation though until 4.1 (where we have trivial
> reassociation), so depending on the phase of the moon, and how the
> gimplifier split up your expression, we may not see the common
Ah well. In that case,
1. Can I influence or control how much of a chunk the gimplifier
takes of my source code?
2. Does __attribute__((pure)) or __attribute__((const)) affect the
CSE when the candidate function is already inlined?
3. Any general hints about how to keep the RTL or whatever
intermediate code size down so that the CSE module sees as much as
I had to eliminate a lot of temporaries and uglify the code somewhat
even to get a + a + a to work... for example, in C++ it's a common
idiom to use a functor object that has an operator() and no data, and
just instantiate it anyway -- gcc 3.x+ have been good in eliminating
the redundant construct and destruct -- but it definitely affects the
CSE in 4.0+ ...
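To make the functor idiom concrete, here is a minimal sketch. The names plus_op and add3 are made up for illustration and are not taken from macstl. The functor is empty, so each plus_op () temporary should cost nothing once inlined, but every one of them is still an extra object the optimizer has to see through before it can notice that a [i] is read three times.

```cpp
#include <cstddef>

// Hypothetical stateless functor: an operator() and no data members.
struct plus_op
{
    float operator() (float x, float y) const { return x + y; }
};

// Computes out [i] = a [i] + a [i] + a [i], instantiating a throwaway
// plus_op for each addition. Whether the three reads of a [i] become a
// single load is left entirely to the compiler.
inline void add3 (const float* a, float* out, std::size_t n)
{
    for (std::size_t i = 0; i != n; ++i)
        out [i] = plus_op () (plus_op () (a [i], a [i]), a [i]);
}
```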
> This is an area we want to improve, but it's not easy to get good
> results, though what we have now is better than nothing :)
Indeed. We need good CSE to achieve better high-level support for
SIMD operations without compromising performance. E.g. in macstl, a
valarray expression becomes a fairly complicated (but invisible to
the client) expression template, and so the vectorization/loop fusion
cannot bridge different expressions/statements, thus I can't do my
own CSE with temporaries.
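As a rough illustration of why the CSE has to come from the compiler, here is a much simplified expression template. The names leaf, sum and assign are hypothetical and far simpler than what macstl actually does. The whole of a + a + a is evaluated in one fused loop, but each occurrence of a is a separate leaf, so the loop body names the same element three times.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical leaf node: refers to the underlying storage.
struct leaf
{
    const std::vector <float>& data;
    float operator[] (std::size_t i) const { return data [i]; }
};

// Hypothetical interior node: element-wise sum of two sub-expressions,
// evaluated lazily one element at a time.
template <typename L, typename R> struct sum
{
    L left;
    R right;
    float operator[] (std::size_t i) const { return left [i] + right [i]; }
};

inline sum <leaf, leaf> operator+ (leaf l, leaf r)
{
    return sum <leaf, leaf> {l, r};
}

template <typename L, typename R>
sum <sum <L, R>, leaf> operator+ (sum <L, R> l, leaf r)
{
    return sum <sum <L, R>, leaf> {l, r};
}

// One fused loop for the whole expression. For a + a + a, e [i] reaches
// a [i] through three separate leaf objects.
template <typename Expr>
void assign (std::vector <float>& out, const Expr& e)
{
    for (std::size_t i = 0; i != out.size (); ++i)
        out [i] = e [i];
}
```

Since the expression type is built and consumed within a single statement, there is no point at which client code could hoist a [i] into a named temporary by hand.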
Cheers, Glen Low
pixelglow software | simply brilliant stuff