The fight you've all been waiting for, after that last battle of the libraries. In the red corner, the latest gcc 3.3 libstdc++, courtesy of Apple's December 2002 gcc Updater, replacing the old gcc 3.1 codebase. And in the blue corner, a newly retuned macstl 0.1.2. The ground remains the same: a dual-processor Power Macintosh G4. But I have added a couple of new benchmarks to exercise our participants.
| operation | gcc 3.3 libstdc++ | macstl 0.1.2, Altivec off | macstl 0.1.2, Altivec on |
The inline predication benchmark tests relational min and max expressions of the form (v1 == v2).min (), which macstl 0.1.2 optimizes. The inline slice benchmark is the same as the old unchunked slice benchmark; since slicing is now chunked and inlined, it has been renamed. The unchunked shift benchmark has been in the source code since macstl 0.1. It crashed the gcc 3.1 libstdc++, but it runs fine under 3.3.
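For readers who haven't used std::valarray, the three kinds of benchmarked operations map onto standard calls roughly as in the C++ sketch below. It uses only the standard <valarray> interface, not macstl's internals, and the helper names (all_equal, any_equal, every_other, shift_left) are hypothetical illustrations rather than the benchmark's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <valarray>

// Predication with min(): true only if every v1[i] == v2[i].
// valarray::min() is undefined on an empty array, so require non-empty input.
bool all_equal(const std::valarray<float>& v1, const std::valarray<float>& v2) {
    assert(v1.size() == v2.size() && v1.size() > 0);
    std::valarray<bool> eq = (v1 == v2);  // element-wise comparison
    return eq.min();                      // false < true, so min() means "all true"
}

// Predication with max(): true if at least one v1[i] == v2[i].
bool any_equal(const std::valarray<float>& v1, const std::valarray<float>& v2) {
    assert(v1.size() == v2.size() && v1.size() > 0);
    std::valarray<bool> eq = (v1 == v2);
    return eq.max();                      // max() means "any true"
}

// Slicing: std::slice(start, size, stride) selects `size` elements `stride` apart,
// here every second element starting at index 0.
std::valarray<float> every_other(const std::valarray<float>& v) {
    return v[std::slice(0, (v.size() + 1) / 2, 2)];
}

// Shifting: shift(n) with n > 0 moves elements n places toward the front
// and zero-fills the vacated positions at the end.
std::valarray<float> shift_left(const std::valarray<float>& v, int n) {
    return v.shift(n);
}
```

The standard permits (v1 == v2) to return an expression-template proxy instead of a concrete valarray<bool>. Binding it to a named valarray<bool>, as here, forces evaluation and keeps the sketch portable, at the cost of the temporary that an inlined expression avoids.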
The biggest jump in gcc performance is in inline arithmetic: 81% faster than the previous version. Even so, macstl without Altivec keeps its lead, running 10% faster. And with Altivec, it pulls away, running 4.2x to 18.5x faster than gcc on all inline tests except slicing.
macstl 0.1.2 specifically targets chunked relational min and max expressions, using Altivec predicates to run 18.5x faster than gcc in the inline predication test. It even speeds up unchunked bool-valued min and max, running 5.8x faster than gcc.
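The idea behind chunked predication can be sketched in portable C++: compare four floats at a time (one 128-bit Altivec register's worth), collapse each comparison into a lane mask, and answer "all true" (min) or "any true" (max) one chunk at a time with an early exit. This is a scalar emulation under stated assumptions; it does not use Altivec intrinsics such as vec_all_eq or vec_any_eq, and the names chunk_eq_mask, chunked_all_equal and chunked_any_equal are hypothetical, not macstl's.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical chunk width: four floats fill one 128-bit Altivec register.
constexpr std::size_t kLanes = 4;
constexpr unsigned kAllLanes = (1u << kLanes) - 1;  // 0b1111

// Stands in for a vector compare plus predicate: bit i is set when a[i] == b[i].
unsigned chunk_eq_mask(const float* a, const float* b) {
    unsigned mask = 0;
    for (std::size_t i = 0; i < kLanes; ++i)
        if (a[i] == b[i]) mask |= 1u << i;
    return mask;
}

// Chunked counterpart of (v1 == v2).min(): true only if every element matches.
// Each full chunk is decided by one "all lanes set" test, exiting on the first failure.
// Returns true for empty input (where valarray::min() would be undefined).
bool chunked_all_equal(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    std::size_t i = 0;
    for (; i + kLanes <= n; i += kLanes)
        if (chunk_eq_mask(&a[i], &b[i]) != kAllLanes) return false;
    for (; i < n; ++i)  // scalar tail for the leftover elements
        if (a[i] != b[i]) return false;
    return true;
}

// Chunked counterpart of (v1 == v2).max(): true if any element matches.
// Returns false for empty input.
bool chunked_any_equal(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    std::size_t i = 0;
    for (; i + kLanes <= n; i += kLanes)
        if (chunk_eq_mask(&a[i], &b[i]) != 0) return true;
    for (; i < n; ++i)  // scalar tail
        if (a[i] == b[i]) return true;
    return false;
}
```

With the early exit, a mismatch in the first chunk ends chunked_all_equal after a single mask test, whereas materializing a valarray<bool> first computes every comparison before min() runs.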
The new slicing algorithms, based on Altivec permutes, also come out on top. The gains over scalar code are less dramatic, though: in the inline slice test, macstl is just 37% faster than gcc and 41% faster than itself without Altivec.
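A permute-based stride-2 slice can be sketched as follows, assuming a hypothetical element-granular stand-in for Altivec's vec_perm (the real instruction selects individual bytes from the 32-byte concatenation of its two inputs). Two four-float loads feed one permute, which yields four slice elements at once, and a scalar loop handles the tail. The names Vec4, perm and stride2_slice are illustrative only and not taken from macstl.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Vec4 = std::array<float, 4>;  // stands in for one 128-bit register of floats

// Element-granular stand-in for vec_perm: each index (low 3 bits) picks one lane
// from the 8-lane concatenation a:b. The real vec_perm indexes 32 bytes instead.
Vec4 perm(const Vec4& a, const Vec4& b, const std::array<unsigned, 4>& idx) {
    Vec4 r{};
    for (std::size_t i = 0; i < 4; ++i) {
        const unsigned k = idx[i] & 7u;
        r[i] = k < 4 ? a[k] : b[k - 4];
    }
    return r;
}

// Stride-2 slice starting at index 0: each pair of loads feeds one permute
// that produces four slice elements; a scalar loop finishes the tail.
std::vector<float> stride2_slice(const std::vector<float>& src) {
    const std::array<unsigned, 4> evens = {0, 2, 4, 6};
    std::vector<float> out;
    out.reserve((src.size() + 1) / 2);
    std::size_t i = 0;
    for (; i + 8 <= src.size(); i += 8) {
        const Vec4 lo = {src[i], src[i + 1], src[i + 2], src[i + 3]};
        const Vec4 hi = {src[i + 4], src[i + 5], src[i + 6], src[i + 7]};
        const Vec4 r = perm(lo, hi, evens);
        out.insert(out.end(), r.begin(), r.end());
    }
    for (; i < src.size(); i += 2)  // i is even here, so this keeps even indices
        out.push_back(src[i]);
    assert(out.size() == (src.size() + 1) / 2);
    return out;
}
```

Strides other than 2, or slices that start off a vector boundary, would need different permute patterns and shifted loads; this sketch handles only the simplest case.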