[macstl-dev] Re: [AGG] Re: [Fwd: Multiplication?]
glen.low at pixelglow.com
Tue Mar 1 08:11:54 WST 2005
On 01/03/2005, at 12:38 AM, Maxim Shemanarev wrote:
>> vec <unsigned, 4>::operator* is currently undefined but planned. As
>> Tony has said, this would require the PMULUDQ, but because PMULUDQ
>> produces a 64-bit element result, I would have to shuffle, do two
>> PMULUDQ's and then do an unpack (pack?). So if it's important to you
>> guys, I will explore this in 0.2.2, or alternatively you could
>> contribute code to that effect (hint: vec_mmx.h:2640).
>> Altivec is similar, it requires several instructions to get the same
> AFAICS Altivec is much better for alpha-blend operations. In general
> what we need is 4x32 unsigned multiplications (16-bit arguments and
> 32-bit result); in Altivec it's vmulouh. In SSE2 there is no such
> instruction. The use of
> doesn't make much sense because it allows parallelizing only 2 color
> components (Red, Green); then you have to calculate Blue separately.
> Alpha is
> differently. That would be the perfect fit for both 8- and
> 16-bit-per-component colors (for 8bpc we could calculate up to 4
> pixels in
> However, for 8-bit-per-component colors we can use MMX and PMULLW; 16
> bits is enough here. But there's no instruction to multiply 4 16-bit
> values and get 4 32-bit ones, alas.
I'm a little confused here: do you want a half multiply, e.g. 32 bit x
32 bit = 32 bit or 16 bit x 16 bit = 16 bit (based on your liking for
the PMULLW opcode, and my guess that you want to stuff the result back
into a similarly sized pixel), or a full multiply, 16 bit x 16 bit =
32 bit (based on your liking for the vmulouh opcode)? The former is
planned, but the latter isn't for macstl -- although I can certainly
consider it, since the implementation has the flexibility for it.
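To make the distinction concrete, here is a rough sketch in plain C++ with
SSE2 intrinsics. It is not macstl code and not necessarily how macstl will
do it; the names U16x4, U32x4, half_mul16, half_mul32 and full_mul16 are
made up for illustration. The 32-bit half multiply follows the "two
PMULUDQ's plus shuffle and unpack" idea mentioned above, and the full
multiply combines PMULLW and PMULHUW. A scalar fallback is used when SSE2
is not available.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Hypothetical helper types: four lanes held in plain arrays.
using U16x4 = std::array<std::uint16_t, 4>;
using U32x4 = std::array<std::uint32_t, 4>;

// Half multiply, 16 x 16 -> low 16 bits per lane (what PMULLW yields).
U16x4 half_mul16(const U16x4& a, const U16x4& b) {
    U16x4 r{};
    for (std::size_t i = 0; i < 4; ++i)
        r[i] = static_cast<std::uint16_t>(std::uint32_t{a[i]} * b[i]);
    return r;
}

// Half multiply, 32 x 32 -> low 32 bits per lane.
U32x4 half_mul32(const U32x4& a, const U32x4& b) {
    U32x4 r{};
#if defined(__SSE2__)
    __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a.data()));
    __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b.data()));
    // PMULUDQ multiplies lanes 0 and 2 into two 64-bit products.
    __m128i even = _mm_mul_epu32(va, vb);
    // Shift lanes 1 and 3 down into positions 0 and 2, then repeat.
    __m128i odd = _mm_mul_epu32(_mm_srli_epi64(va, 32),
                                _mm_srli_epi64(vb, 32));
    // Gather the low 32 bits of each product into the bottom two lanes.
    even = _mm_shuffle_epi32(even, _MM_SHUFFLE(3, 1, 2, 0));
    odd = _mm_shuffle_epi32(odd, _MM_SHUFFLE(3, 1, 2, 0));
    // Interleave back into lane order 0, 1, 2, 3.
    __m128i prod = _mm_unpacklo_epi32(even, odd);
    _mm_storeu_si128(reinterpret_cast<__m128i*>(r.data()), prod);
#else
    for (std::size_t i = 0; i < 4; ++i)
        r[i] = a[i] * b[i];  // unsigned arithmetic wraps modulo 2^32
#endif
    return r;
}

// Full multiply, 16 x 16 -> 32 bits per lane.
U32x4 full_mul16(const U16x4& a, const U16x4& b) {
    U32x4 r{};
#if defined(__SSE2__)
    __m128i va = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(a.data()));
    __m128i vb = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(b.data()));
    __m128i lo = _mm_mullo_epi16(va, vb);  // PMULLW: low halves
    __m128i hi = _mm_mulhi_epu16(va, vb);  // PMULHUW: high halves
    // PUNPCKLWD pairs each low half with its high half.
    __m128i prod = _mm_unpacklo_epi16(lo, hi);
    _mm_storeu_si128(reinterpret_cast<__m128i*>(r.data()), prod);
#else
    for (std::size_t i = 0; i < 4; ++i)
        r[i] = std::uint32_t{a[i]} * b[i];
#endif
    return r;
}
```

For example, with a lane holding 1000 and 70, half_mul16 gives 4464
(70000 modulo 65536) while full_mul16 gives the whole 70000.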
So, the missing types in macstl::vec<> are quite explainable.
> I honestly do not understand why they abandoned such useful operations
> 4x32 bits. PMULLD and PMULHD would be perfect.
>> Indeed, you can use macstl's valarray
>> implementation to do calculations on entire arrays transparently and
>> efficiently -- you declare valarrays of the scalar element, but the
>> implementation transparently upscales using the equivalent SIMD
>> if available.
> Sorry for a possibly silly question.
> Can I map a raw memory area that has logical structure R-G-B-A (one
> byte per component) to a valarray? Of course, the starting address
> must be aligned to 16
> The problem is we have an RGBA frame buffer in memory. Each row can be
> aligned to 16 bytes, and the width of the buffer can be aligned too.
> But the problem is we need to blend, say, 20 pixels starting from the
> 10th. In this case we blend the first two pixels as usual (to get the
> aligned start from the 12th pixel) and perform a 4-pixel parallel
> blend operation. Then we process the unaligned end.
> Is this trick possible with valarray?
Some (Paul Baxter) have asked for similar things, I'm contemplating how
best to get this functionality. For now, the easiest way to get what
you want is to derive from the stdext::impl::array_term class, which
lets you wrap any arbitrary piece of memory, although it needs vector
type memory instead of scalar type because of aliasing and alignment
issues. This transparently handles the use of scalar ops for the
Your resulting type can then participate in expressions largely
interchangeably with valarray and friends -- they are all built upon
I'm contemplating making this easier by implementing a refarray, which
will allow you simply to pass in a raw memory area as you suggest.
For severely non-contiguous access, you can try deriving from
stdext::impl::term instead, but then you have to supply your own
iterator that steps through the memory in chunks.
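Independently of macstl, the head / aligned body / tail split described in
the question can be sketched in plain C++ as below. This is only an
illustration of the bookkeeping, not macstl API; BlendSplit and split_range
are hypothetical names, and the sketch assumes each row starts on a 16-byte
boundary with 4-byte RGBA pixels, so 4 pixels fill one vector.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result type: how a run of pixels divides up.
struct BlendSplit {
    std::size_t head;    // pixels blended one at a time before alignment
    std::size_t blocks;  // whole aligned blocks blended in parallel
    std::size_t tail;    // leftover pixels blended one at a time
};

// first:     index of the first pixel, counted from the aligned row start
// count:     number of pixels to blend
// per_block: pixels per vector (4 for 16-byte vectors of 4-byte pixels)
BlendSplit split_range(std::size_t first, std::size_t count,
                       std::size_t per_block = 4) {
    assert(per_block > 0);
    // Pixels needed to reach the next block boundary.
    std::size_t misalign = first % per_block;
    std::size_t head = (misalign == 0) ? 0 : per_block - misalign;
    if (head > count)
        head = count;  // the run ends before reaching a boundary
    std::size_t rest = count - head;
    return BlendSplit{head, rest / per_block, rest % per_block};
}
```

With the numbers from the question, split_range(10, 20) gives a head of 2
pixels (10 and 11), 4 aligned blocks (pixels 12 to 27) and a tail of 2
pixels (28 and 29).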
Cheers, Glen Low
pixelglow software | simply brilliant stuff