[macstl-dev] Proposal for mixed complex and real arithmetic
Glen Low
glen.low at pixelglow.com
Fri Jul 15 21:27:11 WST 2005
Ilya, All:
On 15/07/2005, at 8:57 PM, Glen Low wrote:
> Hi Ilya, All
>
> You offered about 1.5 months ago to do an extension of complex
> arithmetic in macstl. If you're still willing and have the time to
> do it, we'd welcome it as the first major non-Pixelglow extension of
> macstl and an important test of the extensibility of the
> fundamental programming.
>
> Brief outline of scope:
>
> The following mixed arithmetic operations should be made possible,
> where r is a valarray of real numbers (int, float, double etc.) and
> c is the complex equivalent (std::complex <int>, stdext::complex
> <float>, stdext::complex <double>). Then,
>
> r + c -> c
> c + r -> c
> r - c -> c
> c - r -> c
> r * c -> c
> c * r -> c
> r / c -> c
> c / r -> c
>
> E.g. valarray <float> + valarray <complex <float> > should yield an
> expression that can be treated as a valarray <complex <float> >.
>
> Of the above, only valarrays of float and valarrays of complex float
> will be optimized using Altivec.
>
> Things to do:
>
> Based on macstl 0.3 (which has new differently-typed argument
> functionality to support the above), here are the changes to be done:
>
> A. [EASY] functional.h needs to have the arithmetic functors
> specialized to accept the above combinations. Later we may consider
> removing this in favor of a generalized functor that works with any
> 2 different types, but this would mean we'd need moderately
> complicated typeof simulation to figure out the result type. Once A
> is done, you should then be able to compile the expressions above
> and run them, but without optimization.
>
> B. [MODERATE] valarray_altivec.h needs to have a chunker
> specialization defined on the right sort of expression, so that it
> inserts const_chunk_iterator and chunk_begin for the optimization.
> See valarray_altivec.h:251 etc. for hints. Once you do this, with
> the body left empty, you can detect whether the optimization
> would be called -- e.g. by putting in a simple destructor with a
> std::cout << "hi I'm here" message, then compiling and running the
> expressions from stage A.
>
> C. [DIFFICULT] We need to define a sensible const_chunk_iterator
> for the above. Ideally it should be generalizable for all complex/
> real or real/complex combinations above, and be passed a template
> template parameter which is the required operation -- see
> valarray_function.h:529 for a hint. Ideally it should also be
> random access when its two sub-iterators are random access too, but
> we may have to check code generation to see if a forward iterator
> makes more sense. This iterator should yield a vec <complex
> <float>, 2> and so its complex subiterator is incremented whenever
> it is incremented, but its real subiterator is incremented every
> other time. Presumably a high/low indicator will be held in the
> iterator so that it knows which 2 parts of the real subiterator
> need to be accessed, and an appropriate lvsl/lvsr/vperm applied.
> Finally the operator* and operator[] should implement the operation
> -- you might get away with defining it in terms of the (real, real)
> function.
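As a rough illustration of the stepping described in C, here is a scalar C++ model, not Altivec code. The names real_chunk, complex_chunk, mixed_chunk_iterator and multiplies_rc are made up for this sketch and are not macstl identifiers. std::array stands in for the vec types, plain indexing stands in for the lvsl/lvsr/vperm selection, and the operation is passed as an ordinary type parameter rather than a template template parameter to keep it short.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>

// Hypothetical stand-ins for 16-byte chunks: 4 reals, or 2 complex values.
using real_chunk = std::array<float, 4>;
using complex_chunk = std::array<std::complex<float>, 2>;

// Hypothetical forward iterator over (real, complex) chunk pairs.
// Op is any functor taking (float, std::complex<float>).
template <typename Op>
class mixed_chunk_iterator {
public:
    mixed_chunk_iterator(const real_chunk* r, const complex_chunk* c)
        : real_(r), complex_(c), high_(false) {
        assert(r != nullptr && c != nullptr);
    }

    // Combine the selected half of the real chunk with the complex chunk.
    complex_chunk operator*() const {
        const std::size_t base = high_ ? 2 : 0;  // low pair or high pair
        Op op;
        complex_chunk out;
        out[0] = op((*real_)[base], (*complex_)[0]);
        out[1] = op((*real_)[base + 1], (*complex_)[1]);
        return out;
    }

    mixed_chunk_iterator& operator++() {
        ++complex_;          // complex subiterator moves on every increment
        if (high_) ++real_;  // real subiterator moves every other increment
        high_ = !high_;      // flip the high/low indicator
        return *this;
    }

private:
    const real_chunk* real_;
    const complex_chunk* complex_;
    bool high_;
};

// Example operation for r * c -> c.
struct multiplies_rc {
    std::complex<float> operator()(float r, const std::complex<float>& c) const {
        return r * c;
    }
};
```

Dereferencing, incrementing once and dereferencing again consumes first the low pair and then the high pair of a single real chunk, while walking across two complex chunks.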
Instead of the iterator in C, the other possibility is to engineer a
vec <complex <float>, 4> that contains 2 __vector floats, and
restructure valarray <complex <float> > to use this. However, I
remember gcc 3.3 having a terrible time optimizing structs that
contained more than 1 field, as vec <complex <float>, 4> would. If you
test this structure on 3.4 and it works acceptably, we can then change
valarray <complex <float> > to use it.
Quickly declare a simple vecComplexFloat4 struct with 2 __vector
floats. Declare an operator+ that adds two vecComplexFloat4's. Then
try this:
vecComplexFloat4 a, b;
vecComplexFloat4 c = (a + a + a) + (b + b + b);
On 3.3, the compiler would typically do a store and then a redundant
load for each of the temporaries (a + a etc.). Try it on 3.4 and 4.0
to see if that has changed. If you get positive results on 3.4 and
4.0, we can adopt that approach instead -- A and B would be unchanged,
but C would be simpler, at the cost of having to define most of the
vec <complex <float>, 4> operators.
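For concreteness, the test struct might look something like the sketch below. It makes two assumptions: the GCC/Clang vector_size extension is used as a portable stand-in for __vector float, so it compiles without Altivec, and the field names, the float4 typedef and the sum_of_triples wrapper are invented for illustration. How the real and imaginary parts are laid out across the two vectors is left open.

```cpp
// Assumption: generic 16-byte float vector standing in for __vector float.
typedef float float4 __attribute__((vector_size(16)));

// Simple struct holding 2 vectors, i.e. 4 complex floats in total.
struct vecComplexFloat4 {
    float4 first;   // hypothetical field name
    float4 second;  // hypothetical field name
};

// Adds two vecComplexFloat4's field by field.
inline vecComplexFloat4 operator+(const vecComplexFloat4& lhs,
                                  const vecComplexFloat4& rhs) {
    vecComplexFloat4 result;
    result.first = lhs.first + rhs.first;
    result.second = lhs.second + rhs.second;
    return result;
}

// The test expression, wrapped in a function so that its generated
// code is easy to find in the assembly listing.
vecComplexFloat4 sum_of_triples(const vecComplexFloat4& a,
                                const vecComplexFloat4& b) {
    return (a + a + a) + (b + b + b);
}
```

Compiling with optimization on and reading the assembly for sum_of_triples (for example with g++ -O2 -S) should show whether each temporary is stored and then reloaded, or kept in vector registers throughout.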
Cheers, Glen Low
---
pixelglow software | simply brilliant stuff
www.pixelglow.com
aim: pixglen