[macstl-dev] Opinions wanted: tied expressions (multi-valued
expressions and manual loop fusion)
glen.low at pixelglow.com
Wed Dec 7 08:48:23 WST 2005
On 06/12/2005, at 2:03 AM, Ilya Lipovsky wrote:
>> 1. Should tie only work at the top level or should we allow ties at
>> the subexpression level? If so, what impact on performance? e.g.
>> tie (e, f) = tie (a * b) + tie (g * h);
> What does the above expression represent? I don't understand the right
If we allow subexpression ties, then tie (a * b) + tie (g * h)
evaluates to tie (a * b + g * h).
It's a bad example, a better one would be:
tie (a * b, c * d) + tie (g * h, i * j) evaluates to tie (a * b + g *
h, c * d, i * j)
To restate the proposal, should tie expressions obey normal
(valarray) rules of arithmetic? If we allow this, then we will have
to allow scalar ops as well (ooo -- my head hurts) -- e.g. tie (a *
b, c * d) + k evaluates to tie (a * b + k, c * d + k).
On the implementation side, we have to decide whether tie + tie
actually results in another tie object, or a some sort of tie
expression template. I suspect that tie expressions could simply be
injected into the whole expression template machinery we have with
minimal problems -- they are just terms of tuple type.
Off hand, I can think of two real uses for this:
1. May allow ties to be more succinctly expressed if they have
2. May help with functions which return tuples (as opposed to
constructed with ties) like the proposed deinterleave.
>> 2. How should we spec it so that we don't invalidate multiproc work?
>> (one of the other projects that might be sponsored, to get macstl to
>> scale for multiprocs transparently.) We should say something like:
>> logical parallel sections of the expressions may be evaluated in any
>> order (since they might be in a multiproc situation).
> I don't have an opinion on that yet. What mutiprocessor
> architectures do
> you mean, Cell? Anything else? Do you want to invoke something like
> behind the scene?
I was originally thinking of supporting OpenMP, but our inner loops
don't need the generalization of OpenMP and we can probably write our
own pthread/winthread implementation more straightforwardly. We may
then be able to add MPI support to the mix. I haven't nutted it out
entirely, one possibility is to have different kinds of assign e.g.
v1.mp_assign (v2 * v3 + v4);
which automatically sections the incoming stream into as many
sections as there are cores then processes them all in parallel in
different threads, then rejoins all the threads at the end. If we
dovetail this with the tied expression work, then developers can do a
large amount of work before needing to resynchronize at the end.
In terms of succinctness, I would love to have scoped assigns e.g.
using namespace mp;
v1 = v2 * v3 + v4; // uses mp_assign
using namespace spu;
v1 = v2 * v3 + v4; // uses spu_assign
but I'm not sure it's at all possible using C++, since operator= has
to be a member function.
As for the Cell, I do have a proposal in the works. I have to examine
the implications of the DMA/local-store SPU model a little more.
>> 3. What about dependent expressions e.g.
>> tie (e, f) = tie ((a + b) * d, e * h); // f now depends on e
> Excellent case!! And very relevant, too. Yes, this should
> definitely be
> supported. However,
> tie (f, g) = tie (g * h, (a + b) * d) //e depends on f, an
> //uninitialized array
> Should be left without any further analysis on macstl's part. The
> user is
> responsible for correctly ordering statements inside the tie ()
Yes, for sure.
>> 4. Further to point 3, should we introduce a special "temp"
>> placeholder which would work akin to a manual CSE? If so what impact
>> on 2? How to implement in C++?
> The only reasonable way (that I see) to implement it in C++ is have
> a bank
> of prototyped valarray/refarray tie_temp<float/int/whatever>
> vectors. For
> valarray<int> tie_temp0, tie_temp1, tie_temp2, ...;
> valarray<char> tie_temp0, tie_temp1, .... ;
I think the temp placeholder should be a distinct type. It cannot be
a real array, otherwise it will need to store away all the values it
receives during a tie expression evaluation -- killing the caches.
Rather, it should only "store" the immediate chunk it receives in the
valarray a, b, c, d, e;
then tie (a, d) = tie (b * c, a * e)
for each i:
a [i] = b [i] * c [i];
d [i] = a [i] * e [i];
notice that the temp valarray a has to store away all i values even
though it's only a temp.
Now with the placeholder:
valarray b, c, d, e;
then tie (A, d) = tie (b * c, A * e)
for each i:
A = b [i] * c [i];
d [i] = A * e [i];
here the placeholder A only gets the last value of i wherever it is
in the loop.
> There is a lot of impact on 2, that's why, I think it's best to
> focus on
> on one processor. People can always compile different pieces of
> code based
> on macstl and make them work together on multiprocessor systems
The hope is to get some sort of transparent upscaling for multiproc
systems for those developers that don't want to be bothered with
rethreading their code. Remember the number of cores is only going to
go up with Moore's Law, custom threading that utilizes all cores
effectively will become more and more difficult.
>> 5. Do we do automatic or manual tie flattening e.g.
>> should tie (tie (a, b, c), d, e, f) become tie (a, b, c, d, e, f)
>> automatically or some other function do this?
> I think that tie (tie (a, b, c), d, e, f) should automatically.
> Become tie
> (a, b, c, d, e, f).
Was afraid you'd say that! :-) I tend to agree since nested tuples
seem useless to me, but then the tie function has to be particularly
smart about what it does. (or the tuple that it produces will have to
be the smart one.) On the other hand -- playing devil's advocate here
-- it's different from the list syntax of PERL/PHP, and some C++
advocates might be uncomfortable about the loss of type information.
I'd like some further opinion on this before we crystallize our thought.
>> 6. How to incorporate multiple output syntax into the proposal. E.g.
>> a deinterleave function that deinterleaves 1 stream into 4 might
>> logically produce a 4-tuple so that
>> tie (a, b, c, d) = deinterleave (e);
>> But then what should the syntax be for applying operations between
>> the deinterleaved streams i.e. what would be the succinct
>> expression of
>> tie (a, b, c, d) = deinterelave (e);
>> f = (a * b) + (c * d);
>> I suppose if we allowed temp placeholders and tie flattening, then
>> tie (a, b, c, d, f) = tie (deinterleave (e), (a * b) + (c *d));
>> could be one such syntax.
> The latter syntax looks really nice!
Yes, it's a generalization of the untied case and probably how a PERL/
PHP junkie would do it.
If we didn't allow automatic tie flattening, we can still do
tie (tie (a, b, c, d), f) = tie (deinterleave (e), (a*b) + (c*d));
It occurs to me it's somewhat unnice that the tie expressions are
separated by a =, since when you are tying a large number of
expressions it could be confusing for the developer to match up what
goes with what. But the tie expressions as formulated is closest to
existing language usage and probably affords the most flexibility.
> One note though: is "deinterleave"
> the right term for that? Maybe there's something else from, say, set
> theory to denote a set containing the same element multiple times.
Any suggestions anyone? It seems logical since there will be a
corresponding "interleave" function.
Note that deinterleave really does that e.g. if we had a stream (1,
2, 3, 4, 5, 6, 7, 8, 9...) then deinterleave <4> would produce 4
streams (1, 5...), (2, 6...), (3, 7...), (4, 8...)
If we allowed ties to scalar (see my thots on subexpression ties),
then that would handle the case of "set containing the same element
Cheers, Glen Low
pixelglow software | simply brilliant stuff
More information about the macstl-dev