CodeWarrior vs. gcc vs. macstl

It’s a three-way fight this September, with Mac stalwart Metrowerks CodeWarrior 9 entering the fray with its own Standard C++ Library implementation, MSL C++. Apple’s August 2003 gcc Updater brings further optimizations for the Power Mac G4 and G5, and a new compiler flag, -fast. And of course, macstl returns to defend its crown with version 0.1.4.

Operations per Clock Tick (larger = faster)
operation                CW9    CW9 + macstl   gcc 3.3   gcc 3.3 + macstl
inline arithmetic        361    3623           940       3649
inline transcendental    76     1030           80        934
outline transcendental   80     39             94        39
inline scalarization     1474   1779           1485      4048
inline predication       294    4761           251       4347
inline slice             1751   503            3921      4000
unchunked apply          176    235            404       467
unchunked shift          149    440            1587      1333
unchunked mask           129    174            300       173
unchunked indirect       209    555            578       649

CodeWarrior’ll be Back

macstl is particularly valuable for CodeWarrior developers, since the fast, efficient compiler is hampered by the slow, basic valarray implementation within MSL C++. Under CodeWarrior, macstl is 10.0x faster than MSL C++ on inline arithmetic, 13.6x faster on inline transcendental and 16.2x faster on inline predication, thanks to the killer left-right hook of expression templates and Altivec optimization.
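To make the expression-template half of that hook concrete, here is a minimal sketch using only the standard std::valarray interface. The function names are made up for illustration and this is not the benchmark code behind the figures above. The first function writes a compound expression the valarray way; the second is the single hand-written loop that an expression-template implementation aims to match, with no temporary array for the intermediate product.

```cpp
#include <cstddef>
#include <valarray>

// Hypothetical helper: element-wise a * b + c as one valarray expression.
// Assumes a, b and c all have the same size.
// A basic valarray may materialize a temporary for a * b before adding c;
// an expression-template valarray can evaluate the whole right-hand side
// in one pass over the data.
std::valarray<float> multiply_add(const std::valarray<float>& a,
                                  const std::valarray<float>& b,
                                  const std::valarray<float>& c)
{
    std::valarray<float> result = a * b + c;
    return result;
}

// Hypothetical helper: the equivalent single loop, written by hand.
std::valarray<float> multiply_add_loop(const std::valarray<float>& a,
                                       const std::valarray<float>& b,
                                       const std::valarray<float>& c)
{
    std::valarray<float> result(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        result[i] = a[i] * b[i] + c[i];
    return result;
}
```

Both functions produce the same values; the difference a benchmark like "inline arithmetic" would expose is how much extra memory traffic the library adds on the way there.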

CodeWarrior reigns in compile speed, except when tackling macstl’s 6-megabyte altivec_constant.h. However, the compiler seems to have trouble hoisting vector invariants out of loops, punishing its macstl performance relative to gcc on inline scalarization (1779 vs. 4048) and inline slice (503 vs. 4000).
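For readers unfamiliar with the term, hoisting a vector invariant looks roughly like the sketch below. It is portable C++ rather than real Altivec code: Vec4, splat and mul are hypothetical stand-ins for a four-float vector register, a scalar-to-vector broadcast and a vector multiply, and none of them is part of macstl or the Altivec API. The broadcast of the scalar k does not change between iterations, so a compiler that hoists it performs it once before the loop instead of once per element.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a four-float vector register.
typedef std::array<float, 4> Vec4;

// Hypothetical broadcast: copy one scalar into all four lanes.
Vec4 splat(float s)
{
    Vec4 r = {{s, s, s, s}};
    return r;
}

// Hypothetical lane-wise multiply.
Vec4 mul(const Vec4& x, const Vec4& y)
{
    Vec4 r;
    for (std::size_t i = 0; i < 4; ++i)
        r[i] = x[i] * y[i];
    return r;
}

// What the source code effectively says: broadcast k on every iteration.
void scale_unhoisted(std::vector<Vec4>& data, float k)
{
    for (std::size_t i = 0; i < data.size(); ++i)
        data[i] = mul(data[i], splat(k));
}

// What a hoisting compiler turns it into: broadcast k once, reuse it.
void scale_hoisted(std::vector<Vec4>& data, float k)
{
    const Vec4 kv = splat(k);  // loop invariant, computed once
    for (std::size_t i = 0; i < data.size(); ++i)
        data[i] = mul(data[i], kv);
}
```

The two functions compute identical results; only the amount of work inside the loop differs, which is the kind of gap that scalar-times-array benchmarks are sensitive to.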

To gcc or not to gcc

The August 2003 update to gcc 3.3 shows some minor improvements across the board, due to better instruction scheduling and a touch of loop unrolling. With compiler options set to -O3 -faltivec -fast -mcpu=G4, macstl is 3.9x faster than gcc’s own valarray on inline arithmetic, 11.7x faster on inline transcendental, 2.7x faster on inline scalarization and 17.3x faster on inline predication.

However, the -fast option seems to introduce reliability problems: valarray<double>, which is not optimized for Altivec, fails to minimize under the new unit test regime.

Unit Tests

Speaking of which, the unit tests are included from version 0.1.3 onwards. The tests run a battery of operators and functions on valarrays of arbitrary size and compare the results to those from C arrays holding the same values. Effectively, they compare the results from Altivec algorithms against their scalar equivalents.
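The idea behind such a test can be sketched in a few lines. This is an illustration of the approach, not the test suite shipped with macstl: the names close_enough and valarray_matches_scalar are invented here, std::vector stands in for the plain C arrays, and the small tolerance is an assumption to allow for rounding differences between a vectorized and a scalar code path.

```cpp
#include <cmath>
#include <cstddef>
#include <valarray>
#include <vector>

// Hypothetical helper: relative comparison with a small tolerance.
bool close_enough(float x, float y, float tolerance = 1e-5f)
{
    return std::fabs(x - y) <= tolerance * (1.0f + std::fabs(y));
}

// Hypothetical test: run a few valarray operations on n elements and
// check each result against the same operation done one scalar at a time.
bool valarray_matches_scalar(std::size_t n)
{
    std::valarray<float> va(n), vb(n);
    std::vector<float> ca(n), cb(n);  // plain arrays with the same values
    for (std::size_t i = 0; i < n; ++i) {
        va[i] = ca[i] = static_cast<float>(i) + 1.0f;
        vb[i] = cb[i] = 0.5f * static_cast<float>(i) + 2.0f;
    }

    // Whole-array operations, as a valarray user would write them.
    const std::valarray<float> sum = va + vb;
    const std::valarray<float> product = va * vb;
    const std::valarray<float> root = std::sqrt(va);

    // Scalar reference, element by element.
    for (std::size_t i = 0; i < n; ++i) {
        if (!close_enough(sum[i], ca[i] + cb[i])) return false;
        if (!close_enough(product[i], ca[i] * cb[i])) return false;
        if (!close_enough(root[i], std::sqrt(ca[i]))) return false;
    }
    return true;
}
```

Calling valarray_matches_scalar with sizes such as 1, 3, 4, 5 and 17 exercises lengths that are and are not multiples of a four-element vector, which is where a vectorized implementation is most likely to slip.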

I found them very useful in checking the correctness of macstl algorithms as well as vouching for the reliability of new compilers and compiler settings. Go and see for yourself!

Mon, 29 Sep 2003. © Pixelglow Software.