[macstl-dev] macstl wrapper
glen.low at pixelglow.com
Fri Jun 3 09:15:40 WST 2005
On 03/06/2005, at 3:41 AM, Simon P wrote:
>> Also, operator* of NumericArray2 has to return thru
>> constant reference as well.
> I think this is where the problem is. It does an extra
> copy, because I can't return through the const
You've encountered the classic problem with usual implementations of
std::valarray that macstl's valarray (and gcc's valarray) were
designed to solve. In C++ each operation should return a temporary,
but if the temporary is merely the same type as the original, you end
up having to allocate and deallocate memory for it e.g.
In a * b + c * d,
a * b is a temporary
c * d is a temporary
and the result is a temporary
The compiler can only (possibly) remove the last temporary. The
situation is worse with C, since you would have to explicitly
allocate memory for the temporaries.
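
To make the cost concrete, here is a minimal sketch of the naive approach, assuming a hypothetical NaiveArray type and a hypothetical g_allocations counter (neither is part of macstl or the standard library). Every operator returns a full array of the same type, so every operator creates a new buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical counter: ticks once per array-sized buffer created below.
std::size_t g_allocations = 0;

// Hypothetical naive array whose operators return full arrays by value.
struct NaiveArray {
    std::vector<float> data;
    NaiveArray(std::size_t n, float v) : data(n, v) { ++g_allocations; }
};

NaiveArray operator*(const NaiveArray& l, const NaiveArray& r) {
    assert(l.data.size() == r.data.size());
    NaiveArray out(l.data.size(), 0.0f);  // a whole new buffer per operation
    for (std::size_t i = 0; i < out.data.size(); ++i)
        out.data[i] = l.data[i] * r.data[i];
    return out;
}

NaiveArray operator+(const NaiveArray& l, const NaiveArray& r) {
    assert(l.data.size() == r.data.size());
    NaiveArray out(l.data.size(), 0.0f);  // and another one here
    for (std::size_t i = 0; i < out.data.size(); ++i)
        out.data[i] = l.data[i] + r.data[i];
    return out;
}
```

With four arrays a, b, c and d already built, evaluating a * b + c * d in this sketch creates three buffers and runs three separate loops: one per product and one for the sum.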
You can't remove the language's need for temporaries, but you can use
a different "proxy" type for them, which is the essence of the
Expression Template technique. Each temporary is then a proxy which
references the two underlying objects, so there's no great
allocation/deallocation overhead. Better still for modern CPU
architectures, it allows loop fusion, i.e. a single loop that performs 3
ops per iteration, instead of 3 loops that perform 1 op per iteration.
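
A minimal sketch of the proxy idea, not macstl's actual implementation: the names et_sketch, BinaryExpr, Array and g_buffers are all hypothetical. Each operator returns a small proxy that only refers to its operands, and the one loop runs when the expression is finally converted to an Array.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

namespace et_sketch {

std::size_t g_buffers = 0;  // hypothetical counter: array-sized buffers created

// Proxy for "left op right". It only refers to its operands, so building one
// allocates nothing. It must be consumed within the same full expression,
// because it may refer to other temporary proxies.
template <class L, class R, class Op>
struct BinaryExpr {
    const L& left;
    const R& right;
    std::size_t size() const { return left.size(); }
    float operator[](std::size_t i) const { return Op()(left[i], right[i]); }
};

struct Array {
    std::vector<float> data;
    Array(std::size_t n, float v) : data(n, v) { ++g_buffers; }

    // Evaluate a whole expression tree: one buffer, one fused loop.
    template <class L, class R, class Op>
    Array(const BinaryExpr<L, R, Op>& e) : data(e.size()) {
        ++g_buffers;
        for (std::size_t i = 0; i < data.size(); ++i) data[i] = e[i];
    }

    std::size_t size() const { return data.size(); }
    float operator[](std::size_t i) const { return data[i]; }
};

// Found only through argument-dependent lookup on et_sketch types.
template <class L, class R>
BinaryExpr<L, R, std::multiplies<float> > operator*(const L& l, const R& r) {
    assert(l.size() == r.size());
    return BinaryExpr<L, R, std::multiplies<float> >{l, r};
}

template <class L, class R>
BinaryExpr<L, R, std::plus<float> > operator+(const L& l, const R& r) {
    assert(l.size() == r.size());
    return BinaryExpr<L, R, std::plus<float> >{l, r};
}

}  // namespace et_sketch
```

In this sketch, initializing an Array from a * b + c * d creates a single buffer (the result) and each loop iteration performs two multiplies and one add.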
That means in order to wrap macstl's valarrays properly to avoid
allocation/deallocation of full valarrays and get the speed thereof,
you'll have to closely follow its use of Expression Templates.
Each function that consumes a valarray should instead take the ET
base class, stdext::impl::term, by const reference, e.g. replace const
valarray <T>& with const term <T, Term>&. See valarray_base.h:69.
Each function that produces a valarray should instead produce the
appropriate leaf ET class by value e.g. for operator+ it is const
binary_term <LTerm, RTerm, std::plus>. See valarray_function:647.
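
The shape of those two rules can be sketched as follows. The names term and binary_term are borrowed from the text above, but these definitions are simplified stand-ins written for illustration, not macstl's real classes; leaf_array, that() and sum() are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

namespace term_sketch {

// Stand-in ET base class: T is the element type, Term the derived type.
template <class T, class Term>
struct term {
    typedef T value_type;
    const Term& that() const { return static_cast<const Term&>(*this); }
};

// Stand-in leaf that owns storage (the role valarray plays above).
template <class T>
struct leaf_array : term<T, leaf_array<T> > {
    std::vector<T> data;
    leaf_array(std::size_t n, T v) : data(n, v) {}
    std::size_t size() const { return data.size(); }
    T operator[](std::size_t i) const { return data[i]; }
};

// Stand-in branch term: refers to both operands, computes elements on demand.
template <class LTerm, class RTerm, template <class> class Op>
struct binary_term
    : term<typename LTerm::value_type, binary_term<LTerm, RTerm, Op> > {
    typedef typename LTerm::value_type value_type;
    const LTerm& left;
    const RTerm& right;
    binary_term(const LTerm& l, const RTerm& r) : left(l), right(r) {
        assert(l.size() == r.size());
    }
    std::size_t size() const { return left.size(); }
    value_type operator[](std::size_t i) const {
        return Op<value_type>()(left[i], right[i]);
    }
};

// Producer: returns the branch term by value, never a full array.
template <class T, class LTerm, class RTerm>
const binary_term<LTerm, RTerm, std::plus>
operator+(const term<T, LTerm>& l, const term<T, RTerm>& r) {
    return binary_term<LTerm, RTerm, std::plus>(l.that(), r.that());
}

// Consumer: accepts any term (leaf or branch) through the base class.
template <class T, class Term>
T sum(const term<T, Term>& t) {
    const Term& e = t.that();
    T total = T();
    for (std::size_t i = 0; i < e.size(); ++i) total += e[i];
    return total;
}

}  // namespace term_sketch
```

Because sum() takes the base class by const reference, it can be called on a leaf_array or directly on a + b + c, and in the second case no intermediate array is ever built.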
Therefore one possibility for a 2D/3D wrapper class is a NumericTerm
<Term> wrapping the ET:
NumericTerm <binary_term <LTerm, RTerm, std::plus> >
operator+ (const NumericTerm <LTerm>& left, const NumericTerm <RTerm>& right)
class NumericTerm <Term>
Term term_; // whenever NumericTerm is copied, the underlying
// Term is also copied -- but this is inexpensive except for Term ==
You can then define an appropriate subclass which just wraps the
valarray alone, so that the underlying valarray is hidden from the
client. (It's really just a convenience so that clients don't have to
declare NumericTerm <valarray <float> >, they can just do
class NumericArray <T>: public NumericTerm <valarray <T> >
Note that, just like macstl, the operator+ defined above will take
leaf terms (NumericArrays) as well as branch terms
(NumericTerms wrapping macstl branch terms).
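
Putting the wrapper pieces together, a compilable sketch might look like the following. It makes several assumptions: std::valarray <float> stands in for macstl's valarray, the element type is fixed to float (so NumericArray is not a template here), binary_term is a simplified stand-in rather than macstl's class, and term() and evaluate() are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <valarray>

namespace wrap_sketch {

// Stand-in for the library's branch term (not macstl's real binary_term).
template <class LTerm, class RTerm, template <class> class Op>
class binary_term {
public:
    binary_term(const LTerm& l, const RTerm& r) : left_(l), right_(r) {
        assert(l.size() == r.size());
    }
    std::size_t size() const { return left_.size(); }
    float operator[](std::size_t i) const {
        return Op<float>()(left_[i], right_[i]);
    }
private:
    const LTerm& left_;   // references only, so copies are cheap
    const RTerm& right_;
};

// Wrapper around any term; copying it copies the wrapped Term.
template <class Term>
class NumericTerm {
public:
    explicit NumericTerm(const Term& t) : term_(t) {}
    const Term& term() const { return term_; }
    std::size_t size() const { return term_.size(); }
private:
    Term term_;
};

// Leaf convenience class that hides the valarray from clients.
class NumericArray : public NumericTerm<std::valarray<float> > {
public:
    NumericArray(float value, std::size_t n)
        : NumericTerm<std::valarray<float> >(std::valarray<float>(value, n)) {}
};

// Accepts leaves (NumericArray) and branches (any NumericTerm) alike.
template <class LTerm, class RTerm>
NumericTerm<binary_term<LTerm, RTerm, std::plus> >
operator+(const NumericTerm<LTerm>& left, const NumericTerm<RTerm>& right) {
    return NumericTerm<binary_term<LTerm, RTerm, std::plus> >(
        binary_term<LTerm, RTerm, std::plus>(left.term(), right.term()));
}

// Hypothetical helper: evaluates a wrapped expression in one loop. Call it
// in the same full expression that builds the term, since branch terms
// refer to temporaries.
template <class Term>
std::valarray<float> evaluate(const NumericTerm<Term>& e) {
    std::valarray<float> out(e.size());
    for (std::size_t i = 0; i < out.size(); ++i) out[i] = e.term()[i];
    return out;
}

}  // namespace wrap_sketch
```

Here a + b + c on three NumericArrays builds only reference-holding wrappers, and evaluate(a + b + c) produces the single result valarray.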
Also note that, as in macstl, all terms other than valarray are inexpensive
to copy since they merely hold references to the underlying valarray(s).
Only the valarray leaf term is expensive to copy because it will
allocate/deallocate memory, but your expressions all consist of non-
Another hint: see if you can use the valarray subsetting operations to
implement some of the 2D/3D functionality. Currently only slicing is
Altivec-optimized (and not yet as well as the scalar ops, as the
benchmarks will show), but it's likely that more of them will be
rewritten to use Altivec as the project proceeds.
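
As an illustration of the slicing hint, here is a small sketch using the standard std::valarray and std::slice; it assumes macstl's valarray offers the same subsetting interface. Matrix2D and its row() and column() helpers are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <valarray>

// Hypothetical row-major 2D layout over one flat valarray.
struct Matrix2D {
    std::valarray<float> data;
    std::size_t rows, cols;

    Matrix2D(std::size_t r, std::size_t c)
        : data(0.0f, r * c), rows(r), cols(c) {}

    // Row r: cols contiguous elements starting at index r * cols.
    std::slice row(std::size_t r) const {
        assert(r < rows);
        return std::slice(r * cols, cols, 1);
    }

    // Column c: rows elements starting at index c, spaced cols apart.
    std::slice column(std::size_t c) const {
        assert(c < cols);
        return std::slice(c, rows, cols);
    }
};
```

With this layout, m.data[m.row(1)] = 5.0f fills a whole row in one statement, and std::valarray<float> col(m.data[m.column(2)]) copies a column out into its own valarray.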
Cheers, Glen Low
pixelglow software | simply brilliant stuff