[macstl-dev] Re: Question

Glen Low glen.low at pixelglow.com
Tue May 17 08:29:40 WST 2005

  • Previous message: [macstl-dev] gcc 3.3 on YellowDogLinux
  • Next message: [macstl-dev] Re: Question
  • Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]


Ilya

On 17/05/2005, at 4:51 AM, Ilya Lipovsky wrote:
Hi Glen,

I think your element_cast <> may not be such a bad idea. However, I
am not sure it is the best one, either. I think the right idea in this
case is to actually expand the chunking mechanism to accommodate
varying types, so that they can be combined natively within one loop.

Why do I believe it to be a better idea? Because, for example, in the
case provided in my previous email, element_cast <> will convert a
compact representation into a strided one. This will require 2 extra
vmrghw instructions + 1 load per iteration (to make a <float> into
<complex <float>>), as opposed to simply loading the 4 floats
natively and multiplying them with the 2 registers that contain 4
complex<float>'s. I am not even counting the wasted vperm's of
operator*<complex<float> >. We don't need the extra vperm's and
vmaddfp's in a natively implemented operator*<complex<float>, float>
case. The operator should be implementable as follows:

   template <> struct multiplies <macstl::vec <float, 4>,
                                  macstl::vec <stdext::complex <float>, 4> >
   {
     typedef macstl::vec <float, 4> first_argument_type;
     typedef macstl::vec <stdext::complex <float>, 4> second_argument_type;
     typedef macstl::vec <stdext::complex <float>, 4> result_type;

     result_type operator() (const first_argument_type& lhs,
                             const second_argument_type& rhs) const
     {
       using namespace macstl;

       return ..... ; // this is nontrivial ;-)
     }
   };

The problem is that macstl::vec (on Altivec) is defined as a 128-bit
quantity corresponding exactly to one vector register. In practice
stuffing anything more into it spoils gcc 3.3's ability to enregister
macstl::vec -- we need to ensure that it only ever contains one field
of native type in order to get gcc to keep it in registers only.

Therefore vec <complex <float>, 4> can't work -- a complex float is  
64 bit and therefore such a beast would be 256 bit.
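
To make the arithmetic concrete, here is a minimal sketch of the size
check. It is not macstl code: std::complex stands in for
stdext::complex, and kRegisterBits, lanes_per_register and
registers_needed are hypothetical names made up for illustration.

```cpp
#include <complex>
#include <cstddef>

// Assumption: one AltiVec vector register is 128 bits wide.
constexpr std::size_t kRegisterBits = 128;

// How many whole elements of type T fit in one register.
template <typename T>
constexpr std::size_t lanes_per_register() {
    return kRegisterBits / (sizeof(T) * 8);
}

// How many registers are needed to hold n elements of type T (rounded up).
template <typename T>
constexpr std::size_t registers_needed(std::size_t n) {
    return (n * sizeof(T) * 8 + kRegisterBits - 1) / kRegisterBits;
}

// The sketch assumes the usual 32-bit float and 64-bit complex<float>.
static_assert(sizeof(float) == 4, "sketch assumes 32-bit float");
static_assert(sizeof(std::complex<float>) == 8,
              "sketch assumes 64-bit complex<float>");

// Four floats fill one register, but only two complex floats do,
// so four complex floats need two registers (256 bits).
static_assert(lanes_per_register<float>() == 4, "4 floats per register");
static_assert(lanes_per_register<std::complex<float> >() == 2,
              "2 complex floats per register");
static_assert(registers_needed<std::complex<float> >(4) == 2,
              "4 complex floats span 2 registers");
```

So a single-register vec can carry at most two complex floats, which
is why a four-element complex vec does not fit the one-native-field
rule above.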

We have to tackle the multiplication at a higher level, at the  
valarray expression level.

Now element_cast <> isn't as bad as you think. Consider that the  
valarray expression template engine I wrote can actually reconfigure  
expressions at compile time for efficiency e.g.

(a * b) + c

actually recomposes the expression to use something like madd (c, a,
b) i.e. what looks like two separate operations in the expression can
be merged into a single one.
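
For readers who have not seen this kind of rewriting, the following is
a rough sketch of the general technique, not the actual macstl engine.
Every name in it (the sketch namespace, Term, Mul, Add, MulAdd, the
fused flag) is hypothetical, and the scalar a[i] * b[i] + c[i] merely
stands in for a real vector multiply-add. The point is that overload
resolution picks the fused node purely from the expression's type.

```cpp
#include <cstddef>

namespace sketch {  // hypothetical namespace; nothing here is macstl API

// Leaf of an expression: a read-only view over existing floats.
struct Term {
    const float* data;
    constexpr float operator[](std::size_t i) const { return data[i]; }
};

// Element-wise product node.
template <typename L, typename R>
struct Mul {
    L lhs;
    R rhs;
    constexpr float operator[](std::size_t i) const { return lhs[i] * rhs[i]; }
};

// Plain element-wise sum node, used when no fusion applies.
template <typename L, typename R>
struct Add {
    static constexpr bool fused = false;
    L lhs;
    R rhs;
    constexpr float operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
};

// Fused node: one step per element; a vector back end could emit a
// single multiply-add here instead of a multiply followed by an add.
template <typename A, typename B, typename C>
struct MulAdd {
    static constexpr bool fused = true;
    A a;
    B b;
    C c;
    constexpr float operator[](std::size_t i) const { return a[i] * b[i] + c[i]; }
};

template <typename L, typename R>
constexpr Mul<L, R> operator*(L lhs, R rhs) { return {lhs, rhs}; }

// General case: build an ordinary Add node.
template <typename L, typename R>
constexpr Add<L, R> operator+(L lhs, R rhs) { return {lhs, rhs}; }

// More specialized overload: (a * b) + c is rebuilt as MulAdd at compile time.
template <typename A, typename B, typename C>
constexpr MulAdd<A, B, C> operator+(Mul<A, B> product, C addend) {
    return {product.lhs, product.rhs, addend};
}

}  // namespace sketch
```

With three sketch::Term values a, b and c, the type of a * b + c is
MulAdd <Term, Term, Term> rather than Add <Mul <Term, Term>, Term>.
This sketch only catches the product on the left-hand side; c + a * b
would need one more overload.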

That means

element_cast <complex> (a) * b

need not actually unpack the float a into a complex then multiply by  
b, but invoke some sort of merged operation which multiplies 2  
complex by 2 float at a time. (The only limitation I see is that the  
iterator would need to step through 2 complex at a time, so the float  
vector may have to be loaded twice -- it would take a smart loop  
unroller in the compiler to see that double load and optimize to a  
single one...)
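
As a rough illustration of what such a merged operation would compute,
here is a portable scalar model, not AltiVec code and not anything in
macstl. It assumes std::complex in place of stdext::complex, models a
register as four float lanes, and uses made-up names (Lanes,
duplicate_pair, multiply, multiply_real_by_complex). A real times a
complex scales the real and imaginary parts by the same factor, so no
cross terms are needed; each float just has to be duplicated to line
up with its (re, im) pair.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

typedef std::complex<float> cfloat;

// One 128-bit register modeled as four float lanes (hypothetical stand-in).
typedef std::array<float, 4> Lanes;

// Duplicate two reals so they line up with two (re, im) pairs:
// {f0, f1} -> {f0, f0, f1, f1}.
inline Lanes duplicate_pair(float f0, float f1) {
    Lanes r = {{f0, f0, f1, f1}};
    return r;
}

// Lane-wise multiply, the portable equivalent of one vector multiply.
inline Lanes multiply(const Lanes& x, const Lanes& y) {
    Lanes r;
    for (std::size_t i = 0; i < 4; ++i) r[i] = x[i] * y[i];
    return r;
}

// out[i] = a[i] * b[i] for real a and complex b, with no unpacking of a
// into complex values. Precondition: a and b have the same size.
std::vector<cfloat> multiply_real_by_complex(const std::vector<float>& a,
                                             const std::vector<cfloat>& b) {
    assert(a.size() == b.size());
    std::vector<cfloat> out(b.size());
    std::size_t i = 0;
    // One chunk of 4 reals pairs with two chunks of 2 complex values each,
    // so the same group of 4 reals is consulted twice.
    for (; i + 4 <= a.size(); i += 4) {
        for (std::size_t half = 0; half < 2; ++half) {
            const std::size_t k = i + 2 * half;
            const Lanes reals = duplicate_pair(a[k], a[k + 1]);
            const Lanes cplx = {{b[k].real(), b[k].imag(),
                                 b[k + 1].real(), b[k + 1].imag()}};
            const Lanes prod = multiply(reals, cplx);
            out[k] = cfloat(prod[0], prod[1]);
            out[k + 1] = cfloat(prod[2], prod[3]);
        }
    }
    // Scalar tail for sizes that are not a multiple of 4.
    for (; i < a.size(); ++i) out[i] = a[i] * b[i];
    return out;
}
```

The inner loop over half is where the double use of the float chunk
shows up: both halves read from the same four reals, which is the load
a good unroller would hopefully hoist.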

The issue then becomes whether it is convenient for users to use  
element_cast. The valarray expression engine works on identical types  
and has no notion of type promotion (yet). Some people have said
element_cast is rather clunky and would rather have automatic
promotions as in regular C (i.e. float -> complex float, integer ->
float etc.).
My worry from a syntactic point of view is that these conversions  
aren't free, more so with SIMD architectures, so there's a need to  
highlight expensive conversions. What do you think?

This conversation is interesting, so I'm going to suggest we continue
it on the mailing list.

http://www.pixelglow.com/lists/listinfo/macstl-dev/


The question, then, is how hard it is to implement such a beast. What
is your opinion?

I don't mind doing some coding as long as my manager(s) approve. I am  
just a soldier ;).

-Ilya


OK thanks for the offer. Once we thrash out what you need and what  
the others are happy with, we can work something out.


Cheers, Glen Low


---
pixelglow software | simply brilliant stuff
www.pixelglow.com
aim: pixglen


