A high-performance implementation of valarray and the associated lib.numerics classes, optimized for SIMD architectures like the PowerPC G4 and G5 AltiVec, and the Pentium 3, Pentium 4 and Itanium MMX/SSE/SSE2/SSE3.
The valarray lets you access the raw parallel processing power of these machines without worrying about the nitty-gritty of new opcodes and hand scheduling.
You write ordinary C++ operators and functions: what could be simpler than sin (v1) * cos (v2) + sin (v2) * cos (v1)? Yet this will still run 3.6x to 16.2x faster than a hand-coded scalar loop, depending on the architecture.
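To make the comparison concrete, here is a minimal sketch of that expression next to the kind of hand-coded scalar loop it is measured against. It is written against the standard <valarray> header only, since this description does not name the library's own headers or namespaces; the function names sine_of_sum and sine_of_sum_scalar are hypothetical, chosen for illustration. With a plain standard library the two forms compute the same values, and any speedup would come from substituting an optimized valarray implementation.

```cpp
#include <cmath>
#include <cstddef>
#include <valarray>

// Whole-array form: every operator and function applies element by element,
// so no explicit loop appears in user code.
// (Function name is hypothetical, not part of any library.)
std::valarray<float> sine_of_sum(const std::valarray<float>& v1,
                                 const std::valarray<float>& v2)
{
    return std::sin(v1) * std::cos(v2) + std::sin(v2) * std::cos(v1);
}

// The hand-coded scalar loop that the whole-array form replaces.
// Assumes v1 and v2 have the same length.
std::valarray<float> sine_of_sum_scalar(const std::valarray<float>& v1,
                                        const std::valarray<float>& v2)
{
    std::valarray<float> result(v1.size());
    for (std::size_t i = 0; i < v1.size(); ++i)
        result[i] = std::sin(v1[i]) * std::cos(v2[i])
                  + std::sin(v2[i]) * std::cos(v1[i]);
    return result;
}
```

Both functions evaluate the identity sin (v1 + v2) one element at a time, so their results should agree to within floating-point rounding. The whole-array version leaves the loop structure to the valarray implementation, which is what gives a SIMD-aware implementation room to process several elements per instruction.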
The implementation is built on top of the portable SIMD vec classes, so support for new SIMD architectures can be swapped in easily.