[macstl-dev] Re: [AGG] Re: [Fwd: Multiplication?]
glen.low at pixelglow.com
Wed Mar 2 09:14:04 WST 2005
On 02/03/2005, at 7:19 AM, Maxim Shemanarev wrote:
> It's a very common thing. I need to use 8 bit source values and
> produce 8 bit
> results. But for intermediate results there must be at least 16 bits.
> AFAIU, U8x16 will truncate the result of the multiplication to 8 bits,
> so, the
> only correct way is to use U16x8, keeping the range of all input values
> 0...255. An example is as follows:
> r = (int8u)((((cr - r) * (int16u)alpha) + (r << 8)) >> 8);
> g = (int8u)((((cg - g) * (int16u)alpha) + (g << 8)) >> 8);
> b = (int8u)((((cb - b) * (int16u)alpha) + (b << 8)) >> 8);
> Here all values are unsigned bytes, but as soon as you perform the
> multiplication there will be overflow (this is why "alpha" is converted
> to int16u, to make the
> compiler use int16u for the whole expression).
> The same is for 16 bit input values and results (U32x4):
> r = (int16u)((((cr - r) * (int32u)alpha) + (r << 16)) >> 16);
> g = (int16u)((((cg - g) * (int32u)alpha) + (g << 16)) >> 16);
> b = (int16u)((((cb - b) * (int32u)alpha) + (b << 16)) >> 16);
> The problem is that SSE2 (unlike Altivec) doesn't have such commands to do
> four 16*16=32 or just 32*32=32; there's only two 32*32=64, so it will
> need 2
> consecutive commands, which will definitely be slower.
Hmm... given your specific expression above, it seems you really want
the high byte (or word) of the multiplication, as with the SSE PMULHUW
instruction:
mulhi (cr - r, alpha) + r
That could be arranged...
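
To make the rearrangement concrete, here is a minimal scalar sketch. It is not macstl code and uses no SSE2 intrinsics; it only models the 8-bit case with plain ints. The names floor_div256, blend_shift and blend_mulhi are hypothetical, and treating cr - r as a signed difference is an assumption of the sketch.

```cpp
#include <cstdint>

// Floor of x / 256 that also works for negative x.
// Hypothetical helper: it avoids right-shifting negative values,
// which is implementation-defined before C++20.
inline int floor_div256(int x) {
    return (x >= 0) ? x / 256 : -((-x + 255) / 256);
}

// The quoted form: add (r << 8) first, then drop the low byte.
inline std::uint8_t blend_shift(std::uint8_t r, std::uint8_t cr, std::uint8_t alpha) {
    int d = int(cr) - int(r);  // signed difference (assumption)
    return static_cast<std::uint8_t>(floor_div256(d * alpha + (int(r) << 8)));
}

// The rearranged form: take the high part of the product, then add r.
inline std::uint8_t blend_mulhi(std::uint8_t r, std::uint8_t cr, std::uint8_t alpha) {
    int d = int(cr) - int(r);  // signed difference (assumption)
    return static_cast<std::uint8_t>(floor_div256(d * alpha) + r);
}
```

The two forms agree because adding a multiple of 256 before the floor division is the same as adding r after it.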
On the other hand, C (unsigned) integer arithmetic rules work modulo
n+1, where n is the largest unsigned integer for that type. So
a * b
where a and b are 8 bit integers may indeed overflow, but if you work
"conservatively" with the result it doesn't matter e.g.
mullo (a, b) + c
== (a * b) mod 256 + c
== (a * b + c) mod 256
which is the result you'd get anyway if you ignored the overflow of a *
b in the first place.
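
A small scalar sketch of that identity, with everything reduced modulo 256. The function names mullo8, mul_add_wrapped and mul_add_wide are hypothetical stand-ins for a per-lane operation, not macstl or SSE2 functions.

```cpp
#include <cstdint>

// Low byte of the 8-bit product: the scalar analogue of a per-lane mullo.
inline std::uint8_t mullo8(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(static_cast<unsigned>(a) * b);
}

// Truncate the product to 8 bits first, then add c and truncate again.
inline std::uint8_t mul_add_wrapped(std::uint8_t a, std::uint8_t b, std::uint8_t c) {
    return static_cast<std::uint8_t>(mullo8(a, b) + c);
}

// Compute a * b + c at full width, then reduce modulo 256 once at the end.
inline std::uint8_t mul_add_wide(std::uint8_t a, std::uint8_t b, std::uint8_t c) {
    return static_cast<std::uint8_t>((static_cast<unsigned>(a) * b + c) % 256u);
}
```

Both functions return the same byte for every input, which is the sense in which the overflow of a * b "doesn't matter" as long as only the low 8 bits of the final result are kept.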
> The problem here is with dependencies. If I use valarray, I'll make
> the very
> basic things in my library (and as the result, the whole library)
> dependent on
> macstl. This will make the library useless on SGI, Sun, HP, IBM, etc.
> If
> I could map raw memory (preserving the alignment rules of course),
> it's only
> a single file, like agg_pixfmt_rgba_vec.h. So, if the application uses
> this file it depends on macstl, but the other parts of the library don't.
> If you for
> some reason can't use macstl (sacrificing the speed), you just don't
> use the
> parts dependent on it.
There are a couple of ways of managing dependencies. You could do as
you say, through a single file that brings in macstl for the people who
want to use it. Alternatively, the macstl implementation of valarray is
largely a simple superset of std::valarray, which of course is
available in the C++ standard library. Thus a simple statement like
using namespace stdext;
is sufficient to switch between macstl's stdext::valarray and the
hosted std::valarray. Of course from tests I see that std::valarray
fares much worse than hand-coded scalar loops on all tested compilers
except gcc 3.3, where Gabriel Dos Reis's excellent ET implementation
achieves close to hand-coded scalar loop performance.
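
As an illustration of what valarray-based code for the blend might look like, here is a sketch written against std::valarray only. The function name blend and the alias u16 are hypothetical, and whether the same source compiles unchanged against stdext::valarray is not verified here. The 8-bit channel values are held in 16-bit elements so the intermediate product has room, as in the quoted scalar expression.

```cpp
#include <cstdint>
#include <valarray>

using u16 = std::uint16_t;  // hypothetical alias: 16-bit lanes holding 0..255 values

// Element-wise ((cr - r) * alpha + (r << 8)) >> 8 over whole arrays.
// Each element is reduced to 16 bits after every operation; the final sum
// cr*alpha + r*(256 - alpha) fits in 16 bits for inputs in 0..255.
inline std::valarray<u16> blend(const std::valarray<u16>& r,
                                const std::valarray<u16>& cr,
                                u16 alpha) {
    const u16 shift = 8;
    std::valarray<u16> out = ((cr - r) * alpha + (r << shift)) >> shift;
    return out;
}
```

The scalar arguments are passed as u16 rather than int so that they match the element type of the arrays.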
Also, I'd like to see macstl compile and work correctly on as many
compilers as possible, so it would be good to see results on those
compilers you mentioned -- can someone try compiling macstl on those
other compilers? The main requirement is a close-to-standard C++98
compiler, e.g. VC 6.0 and 7.0 fare badly but 7.1 is OK.
There might be licensing issues as well. macstl is RPL and I originally
thought AGG was BSD, but the SF page lists you as CPL. We'd have to
work something out, if you're the owner of AGG, so that including macstl
is OK but anyone who wants the optimizations of macstl still respects
its RPL rules.
I'll work on a refarray for 0.2.2.
Cheers, Glen Low
pixelglow software | simply brilliant stuff