[macstl-dev] [ANN] macstl 0.3.1 -- Extensively re-optimized,
runs 450x faster than G4 scalar code
glen.low at pixelglow.com
Tue Sep 6 18:09:49 WST 2005
macstl is a portable SIMD (single instruction multiple data) toolkit
that massively accelerates array-based code. It features fast
transcendental and integer division functions, complex number
arithmetic and cross-platform programming, all in an easy-to-use
syntax. After extensive re-optimization, the new 0.3.1 version
features new Linux x86 and Cygwin support, a contributed complex
conjugate function, a much-requested refarray class, optimizations
for SSE2 and lots more. For Apple developers, 0.3.1 now runs even
faster when compiled with -faltivec but without -maltivec, and has
improved macstlizer Altivec-to-SSE conversions for the PowerPC-Intel
transition.
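To give a rough idea of what "array-based code" means here, the sketch
below contrasts a scalar loop with a single whole-array expression. It
uses only the standard std::valarray, not macstl itself; the function
names scale_sin_scalar and scale_sin_array are made up for
illustration, and the assumption is that a SIMD-aware valarray would
accept the same kind of expression.

```cpp
#include <cmath>
#include <cstddef>
#include <valarray>

// Scalar version: one element is processed per loop iteration.
void scale_sin_scalar(const float* x, float* out, std::size_t n, float gain) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = gain * std::sin(x[i]);
}

// Array version: the whole computation is written as one valarray
// expression, which leaves the library free to choose how to evaluate it.
// (Hypothetical helper; plain std::valarray, not a macstl type.)
std::valarray<float> scale_sin_array(const std::valarray<float>& x, float gain) {
    return gain * std::sin(x);
}
```

Both functions compute the same values; only the second expresses the
work as an operation on the array as a whole.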
macstl is rocket fuel for your data processing code -- the Opteron
on Windows x64 cruises in at 9.8x faster than scalar code, and the G4
on Linux blasts forward at 450x faster than scalar code. No, it’s not
a misprint. Here, I’ll spell it out -- four-hundred-and-freaking-fifty
times faster than scalar!!
Opteron on Windows x64: http://www.pixelglow.com/lists/archive/macstl-
G4 on Linux: http://www.pixelglow.com/lists/archive/macstl-dev/2005-
macstl requires Mac OS X 10.3 or 10.4, Windows 2000, XP or Server
2003, Linux PPC or x86, or Cygwin 1.5. The library is open-source and
free when derived code is reciprocated; otherwise it is $99 for a
Personal license, $499 for a Corporate license and $2499 for a
List of New Features
- Fixed class scope vector typedefs, missing PowerPC intrinsics
header, vector initializer syntax for FSF 3.4 [ILi*].
- Added complex conj function for vec and valarray [ILi*].
- Improved valarray expression performance: v1 [slice].
- Improved valarray code generation: CSE, inlining limits, literal
terms, array term elements, statarray construction, compiling
-faltivec without -maltivec for Apple gcc 4.0.
- Added refarray class [PBa].
- Fixed buffer overflow in integral valarrays for SSE2; added
optimizations for valarray expressions: v1 >> k and v1 << k for SSE2
- Fixed accumulate array dispatch, integer constant overflow, literal
benchmark test for SSE2; fixed chunking iterator pessimization for
gcc 3.3/4 [ILi, RBe].
- Added makefile for Linux x86 [ILi*].
- Added support for FSF gcc 3.4 on Cygwin 1.5.
- Added differently typed valarray construct and assign from terms,
valarrays of sized booleans, select with sized booleans [ILi].
- Fixed unix makefile directory.
- Added macstlizer conversions: abs, abss, cmpeq, max, min.
- Improved readme file.
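
Several items in the list above concern valarray expressions: slices,
shifts by a constant (v1 >> k and v1 << k) and complex conjugates. The
sketch below shows what such expressions look like with the standard
std::valarray only. It is not macstl code: the helper names are
hypothetical, and since the standard library has no conj overload for
valarray, the conjugate is written as a plain loop rather than guessing
at the signature of the contributed macstl function.

```cpp
#include <complex>
#include <cstddef>
#include <valarray>

// Every second element of v, starting at index 0 (a slice expression).
std::valarray<int> even_elements(const std::valarray<int>& v) {
    return v[std::slice(0, (v.size() + 1) / 2, 2)];
}

// v << k: shift every element left by the same constant k.
std::valarray<unsigned> shift_left(const std::valarray<unsigned>& v, unsigned k) {
    return v << k;
}

// v >> k: shift every element right by the same constant k.
std::valarray<unsigned> shift_right(const std::valarray<unsigned>& v, unsigned k) {
    return v >> k;
}

// Element-wise complex conjugate, written as an explicit loop because
// std::valarray does not provide a conj overload of its own.
std::valarray<std::complex<float>>
conj_each(const std::valarray<std::complex<float>>& v) {
    std::valarray<std::complex<float>> out(v.size());
    for (std::size_t i = 0; i < v.size(); ++i)
        out[i] = std::conj(v[i]);
    return out;
}
```
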
Thanks especially go to Ilya Lipovsky (SKY Computers) and Rene Bertin
for their immense help, testing and code contribution with the full
Linux port. That's what open source is all about, folks!
Cheers, Glen Low
pixelglow software | simply brilliant stuff