[macstl-dev] not all rosy in the gcc-4.0.0 land
lipovsky at skycomputers.com
Tue Jul 5 07:34:27 WST 2005
> The accumulating loop is found in valarray_altivec.h:154. Some of the things
> I would try that you may or may not have tried already:
> 1. Throw a spanner into the optimizer works. Usually the optimizer cannot
> optimize around an output statement or a volatile memory write, so you can
> try either. E.g. create a global volatile static int, then write to it inside
> the loop and at various places you think might be overoptimized. The place
> where a write successfully breaks the overoptimization would give you a clue
> as to what level it's occurring at.
With so many templates, the problem is determining the levels.
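As a rough illustration of suggestion 1, here is a minimal sketch assuming a plain scalar accumulating loop. The names `optimizer_spanner` and `sum_with_spanner` are hypothetical and not part of macstl; the sketch only shows the probing technique, not the real valarray expression-template path.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical probe for bisecting an over-optimization; the names are
// illustrative and do not come from macstl.
//
// Every write to a volatile object is an observable side effect that the
// compiler must actually perform, in program order with other volatile
// accesses. Placing such a write inside a loop therefore limits how far the
// optimizer can restructure that loop.
static volatile int optimizer_spanner = 0;

// Scalar stand-in for an accumulating loop. Move the volatile write (or add
// more of them in the callers) and recompile at -O2/-O3 until the wrong
// result goes away; the location that "fixes" it narrows down the level at
// which the optimizer goes astray.
float sum_with_spanner(const float* data, std::size_t n, float init)
{
    assert(data != 0 || n == 0);
    float acc = init;
    for (std::size_t i = 0; i != n; ++i)
    {
        acc += data[i];
        optimizer_spanner = static_cast<int>(i);  // the "spanner"
    }
    return acc;
}
```

Summing {1, 2, 3, 4, 5} with an init of 0 should give 15 at any optimization level; in the real code the equivalent write would be moved up through the template layers one at a time.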
> 2. If you're getting this error only with sum () and not regular assigns
> e.g. vr = v1 * v2 or vr = v1 * v2 + v3, then it's a pretty good bet it has
> something to do with the init parameter in the above. Try changing the
> parameter declaration there from T init to const T& init, and copying the
> init to a private init_copy within the function. Try making it volatile etc.
> 3. The code at line 154 is called from valarray_algorithm.h:60, which is
> another place to try 1, 2 and other things to see whether this is where the
> overoptimization happens. This is where the valarray is examined so that only
> the initial sequence is vectorized, while the left-over elements at the end
> use a scalar loop (called tail).
Line 60 of valarray_algorithm.h falls inside a comment. I would very much
appreciate it if you could give me the actual call chain. Right now I
know one thing for sure: the problem does not happen in
valarray_altivec.h:154. I commented out the whole structure and the
overoptimization still occurs (i.e. it happens somewhere earlier in the chain).
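Suggestions 2 and 3 can be pictured together with a small sketch, assuming a 4-wide vector (as with an Altivec vector float) emulated by plain arrays so that it compiles anywhere. `accumulate_head_tail` is a hypothetical name and this is not macstl's implementation; it only shows the vectorized-head/scalar-tail split and the `const T& init` plus local-copy change.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch of the shape described for sum(): a vectorized "head"
// over whole chunks plus a scalar "tail" for the left-over elements. Plain
// arrays of width 4 stand in for an Altivec register (a vector float holds
// 4 floats); this is not macstl's actual code.
//
// It also applies suggestion 2: init is taken by const reference and copied
// into a private local instead of being passed by value.
template <typename T>
T accumulate_head_tail(const T* data, std::size_t n, const T& init)
{
    assert(data != 0 || n == 0);
    const std::size_t width = 4;              // elements per "vector"
    const std::size_t head = n - n % width;   // largest multiple of width <= n

    T init_copy = init;  // private copy; could also be declared volatile

    // Head: four independent partial sums, one per vector lane.
    T lane[width] = {T(), T(), T(), T()};
    for (std::size_t i = 0; i != head; i += width)
        for (std::size_t k = 0; k != width; ++k)
            lane[k] += data[i + k];

    // Horizontal reduction of the lanes, then fold in the initial value.
    T acc = init_copy;
    for (std::size_t k = 0; k != width; ++k)
        acc += lane[k];

    // Tail: scalar loop over the n % width left-over elements.
    for (std::size_t i = head; i != n; ++i)
        acc += data[i];

    return acc;
}
```

For 7 elements the head covers indices 0-3 and the tail 4-6. If a result were correct with the tail loop alone but wrong once the head runs, that would point at the vectorized part of the chain.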
> FSF gcc 4.0 release is dated 20 April 2005, and I suspect Apple put in a lot
> of effort over and beyond that to get it working with Altivec code for Tiger
> and Xcode 2.1 -- more's the pity they seem to be all for switching to Intel.
> So we may be better off waiting for 4.0.1 if we can't resolve the
> overoptimization, and leave 3.4.x as the only supported compiler for YDL at
> all optimization levels -- according to the gcc.gnu.org site the 4.0 branch
> has been frozen as of 13 June in preparation for the 4.0.1 release.
I am afraid that we'll have to.