[macstl-dev] Opinions wanted: tied expressions (multi-valued expressions and manual loop fusion)

Glen Low glen.low at pixelglow.com
Wed Dec 7 08:48:23 WST 2005

  • Previous message: [macstl-dev] Opinions wanted: tied expressions (multi-valued expressions and manual loop fusion)
  • Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]


Ilya, All:

On 06/12/2005, at 2:03 AM, Ilya Lipovsky wrote:

>>
>> 1. Should tie only work at the top level or should we allow ties at
>> the subexpression level? If so, what impact on performance? e.g.
>>
>> tie (e, f) = tie (a * b) + tie (g * h);
>>
>
> What does the above expression represent? I don't understand the right
> side.

If we allow subexpression ties, then tie (a * b) + tie (g * h)  
evaluates to tie (a * b + g * h).

It's a bad example, a better one would be:

tie (a * b, c * d) + tie (g * h, i * j) evaluates to tie (a * b + g *  
h, c * d, i * j)

To restate the proposal, should tie expressions obey normal  
(valarray) rules of arithmetic? If we allow this, then we will have  
to allow scalar ops as well (ooo -- my head hurts) -- e.g. tie (a *  
b, c * d) + k evaluates to tie (a * b + k, c * d + k).

On the implementation side, we have to decide whether tie + tie  
actually results in another tie object, or a some sort of tie  
expression template. I suspect that tie expressions could simply be  
injected into the whole expression template machinery we have with  
minimal problems -- they are just terms of tuple type.

Off hand, I can think of two real uses for this:

1.	May allow ties to be more succinctly expressed if they have  
repeated portions.
2.	May help with functions which return tuples (as opposed to  
constructed with ties) like the proposed deinterleave.

>> 2. How should we spec it so that we don't invalidate multiproc work?
>> (one of the other projects that might be sponsored, to get macstl to
>> scale for multiprocs transparently.) We should say something like:
>> logical parallel sections of the expressions may be evaluated in any
>> order (since they might be in a multiproc situation).
>>
>
> I don't have an opinion on that yet. What mutiprocessor  
> architectures do
> you mean, Cell? Anything else? Do you want to invoke something like  
> MPI
> behind the scene?

I was originally thinking of supporting OpenMP, but our inner loops  
don't need the generalization of OpenMP and we can probably write our  
own pthread/winthread implementation more straightforwardly. We may  
then be able to add MPI support to the mix. I haven't nutted it out  
entirely, one possibility is to have different kinds of assign e.g.

v1.mp_assign (v2 * v3 + v4);

which automatically sections the incoming stream into as many  
sections as there are cores then processes them all in parallel in  
different threads, then rejoins all the threads at the end. If we  
dovetail this with the tied expression work, then developers can do a  
large amount of work before needing to resynchronize at the end.

In terms of succinctness, I would love to have scoped assigns e.g.

{
	using namespace mp;
	v1 = v2 * v3 + v4;	// uses mp_assign
}
{
	using namespace spu;
	v1 = v2 * v3 + v4; // uses spu_assign
}

but I'm not sure it's at all possible using C++, since operator= has  
to be a member function.

As for the Cell, I do have a proposal in the works. I have to examine  
the implications of the DMA/local-store SPU model a little more.

>> 3. What about dependent expressions e.g.
>>
>> tie (e, f) = tie ((a + b) * d, e * h);	// f now depends on e
>>
>
> Excellent case!! And very relevant, too. Yes, this should  
> definitely be
> supported. However,
>
> tie (f, g) = tie (g * h, (a + b) * d)		//e depends on f, an
> 						//uninitialized array
>
> Should be left without any further analysis on macstl's part. The  
> user is
> responsible for correctly ordering statements inside the tie ()  
> functor.

Yes, for sure.

>
>> 4. Further to point 3, should we introduce a special "temp"
>> placeholder which would work akin to a manual CSE? If so what impact
>> on 2? How to implement in C++?
>>
>
> The only reasonable way (that I see) to implement it in C++ is have  
> a bank
> of prototyped valarray/refarray tie_temp<float/int/whatever>  
> vectors. For
> example
>
> valarray<int> tie_temp0, tie_temp1, tie_temp2, ...;
> ....
> valarray<char> tie_temp0, tie_temp1, .... ;

I think the temp placeholder should be a distinct type. It cannot be  
a real array, otherwise it will need to store away all the values it  
receives during a tie expression evaluation -- killing the caches.  
Rather, it should only "store" the immediate chunk it receives in the  
inner loop.

An explication:

valarray a, b, c, d, e;

then tie (a, d) = tie (b * c, a * e)
performs this:

for each i:
a [i] = b [i] * c [i];
d [i] = a [i] * e [i];

notice that the temp valarray a has to store away all i values even  
though it's only a temp.

Now with the placeholder:

placeholder A;
valarray b, c, d, e;

then tie (A, d) = tie (b * c, A * e)
performs this:

for each i:
A = b [i] * c [i];
d [i] = A * e [i];

here the placeholder A only gets the last value of i wherever it is  
in the loop.

> There is a lot of impact on 2, that's why, I think it's best to  
> focus on
> on one processor. People can always compile different pieces of  
> code based
> on macstl and make them work together on multiprocessor systems  
> manually.

The hope is to get some sort of transparent upscaling for multiproc  
systems for those developers that don't want to be bothered with  
rethreading their code. Remember the number of cores is only going to  
go up with Moore's Law, custom threading that utilizes all cores  
effectively will become more and more difficult.

>> 5. Do we do automatic or manual tie flattening e.g.
>>
>> should tie (tie (a, b, c), d, e, f) become tie (a, b, c, d, e, f)
>> automatically or some other function do this?
>>
>
> I think that tie (tie (a, b, c), d, e, f) should automatically.  
> Become tie
> (a, b, c, d, e, f).

Was afraid you'd say that! :-) I tend to agree since nested tuples  
seem useless to me, but then the tie function has to be particularly  
smart about what it does. (or the tuple that it produces will have to  
be the smart one.) On the other hand -- playing devil's advocate here  
-- it's different from the list syntax of PERL/PHP, and some C++  
advocates might be uncomfortable about the loss of type information.  
I'd like some further opinion on this before we crystallize our thought.

>> 6. How to incorporate multiple output syntax into the proposal. E.g.
>> a deinterleave function that deinterleaves 1 stream into 4 might
>> logically produce a 4-tuple so that
>>
>> tie (a, b, c, d) = deinterleave (e);
>>
>> But then what should the syntax be for applying operations between
>> the deinterleaved streams i.e. what would be the succinct  
>> expression of
>>
>> tie (a, b, c, d) = deinterelave (e);
>> f = (a * b) + (c * d);
>>
>> I suppose if we allowed temp placeholders and tie flattening, then
>>
>> tie (a, b, c, d, f) = tie (deinterleave (e), (a * b) + (c *d));
>>
>> could be one such syntax.
>>
>
> The latter syntax looks really nice!

Yes, it's a generalization of the untied case and probably how a PERL/ 
PHP junkie would do it.

If we didn't allow automatic tie flattening, we can still do

tie (tie (a, b, c, d), f) = tie (deinterleave (e), (a*b) + (c*d));

It occurs to me it's somewhat unnice that the tie expressions are  
separated by a =, since when you are tying a large number of  
expressions it could be confusing for the developer to match up what  
goes with what.  But the tie expressions as formulated is closest to  
existing language usage and probably affords the most flexibility.

> One note though: is "deinterleave"
> the right term for that? Maybe there's something else from, say, set
> theory to denote a set containing the same element multiple times.

Any suggestions anyone? It seems logical since there will be a  
corresponding "interleave" function.

Note that deinterleave really does that e.g. if we had a stream (1,  
2, 3, 4, 5, 6, 7, 8, 9...) then deinterleave <4> would produce 4  
streams (1, 5...), (2, 6...), (3, 7...), (4, 8...)

If we allowed ties to scalar (see my thots on subexpression ties),  
then that would handle the case of "set containing the same element  
multiple times".




Cheers, Glen Low


---
pixelglow software | simply brilliant stuff
www.pixelglow.com
aim: pixglen




  • Previous message: [macstl-dev] Opinions wanted: tied expressions (multi-valued expressions and manual loop fusion)
  • Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

More information about the macstl-dev mailing list