[macstl-dev] Re: macstl on Linux, redux
glen.low at pixelglow.com
Fri Sep 2 08:13:59 WST 2005
Thanks for the results and for all your efforts to get macstl stable
and performant on Linux PPC and x86. I think the rocket fuel quote is
worthy to be in the next press release, if it's OK by you!
Just for sanity too, see if you get significantly different results
if you comment out all the other benchmarks except the fastest
speedup, to see if the compiler is stealing cycles here and there.
Cheers, Glen Low
pixelglow software | simply brilliant stuff
On 02/09/2005, at 4:40 AM, Ilya Lipovsky wrote:
> Hi Glen,
> Everything appears to be compiling and running smoothly. I
> benchmarked on a 7447 (G4-equivalent in Apple-speak) system with
> a Linux 2.6 kernel installed on it.
> With gcc 3.4.2 compiled code I am getting:
> Benching multiply add: 2.74074*std 2.14815*raw.
> Benching inner product: 2.08333*std 2.08333*raw.
> Benching polynomial: 1.94521*std 1.67123*raw.
> Benching hypotenuse: 9.6*std 8.50476*raw.
> Benching complex multiply add: 6.27559*std 1.6378*raw.
> Benching predicate: 24.7895*std 2.89474*raw.
> Benching slicing: 0.65534*std 0.645631*raw.
> Benching power: 7.125*std 6.92096*raw.
> Benching trigonometric: 75.3491*std 75.3926*raw.
> With gcc 4.0.0 compiled code I am getting:
> Benching inner product: 4.63636*std 4.54545*raw.
> Benching polynomial: 4.71429*std 4.32143*raw.
> Benching hypotenuse: 14.3881*std 12.8806*raw.
> Benching complex multiply add: 3.30159*std 2.79365*raw.
> Benching predicate: 20.5556*std 2.88889*raw.
> Benching slicing: 0.711957*std 0.706522*raw.
> Benching power: 13.864*std 13.6544*raw.
> Benching trigonometric: 450.697*std 450.72*raw.
> I am satisfied with the solidity of the above results. Or, more
> informally speaking, if G4 were a rocket with data processing its
> mission, then MacSTL would be its fuel :-). Great job, Glen!
> P.S. I withheld the absolute timing measurements mostly because
> they are dependent on the clock rate of the processor.
> Glen Low wrote:
>> Ilya, Rene:
>> Finally isolated the Linux x86 performance problems. I have Cygwin
>> installed on my Windows XP machine and finally got macstl
>> compiling on it and found the same terrible performance on the
>> benchmarks as Rene did, so you're not imagining things! Strangely
>> enough the solution seemed to be -fno-unit-at-a-time for x86
>> (directly contradicting needing -funit-at-a-time on PPC...), and
>> it happens because the compiler tries to aggressively move
>> (macstl-vectorized?) transcendental function calls away from the
>> benchmarked loop. For example you can do without -fno-unit-at-a-time
>> and comment out the trig test and the performance is back to
>> Windows levels.
>> Here's a zip of the latest, including a new Makefile by Ilya --
>> I'm still mildly worried about how -funit-at-a-time is affecting
>> benchmark results overall, but haven't been able to stop code
>> being rearranged willy-nilly otherwise.
>> Cheers, Glen Low
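The code-motion problem described above (calls being moved away from the benchmarked loop) can also be attacked inside the benchmark itself rather than with compiler flags. What follows is only a minimal sketch, assuming GCC or Clang and C++11 or later. It is not macstl's benchmark code: keep_alive, sum_of_sines and time_sum_of_sines are hypothetical names made up for illustration. The idea is an empty asm statement with a memory clobber, which the optimizer has to treat as reading and possibly changing memory, so it is much less free to hoist work out of the timed loop or drop it.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of macstl. The empty asm statement receives
// the address of `value` and clobbers memory, so GCC and Clang must assume
// the value is read here and that memory may have changed afterwards.
template <typename T>
inline void keep_alive(T const& value) {
    asm volatile("" : : "g"(&value) : "memory");
}

// Sums sin(x) over the input; stands in for a "trigonometric" benchmark body.
inline float sum_of_sines(std::vector<float> const& in) {
    float total = 0.0f;
    for (std::size_t i = 0; i < in.size(); ++i)
        total += std::sin(in[i]);
    return total;
}

// Times `repeats` passes of sum_of_sines and returns the elapsed seconds.
inline double time_sum_of_sines(std::vector<float> const& in, int repeats) {
    assert(repeats > 0);
    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) {
        keep_alive(in);                   // input treated as possibly changed on each pass
        float result = sum_of_sines(in);
        keep_alive(result);               // result treated as used, so the pass is not discarded
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Timing a plain loop and a vectorized loop this way and dividing one elapsed time by the other gives a speedup ratio. Whether the barrier changes the numbers on a given compiler is something to measure, not assume.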