[macstl-dev] Re: [AGG] Re: [Fwd: Multiplication?]

Glen Low glen.low at pixelglow.com
Wed Mar 2 09:14:04 WST 2005



Maxim:

On 02/03/2005, at 7:19 AM, Maxim Shemanarev wrote:
>
> It's a very common thing. I need to use 8 bit source values and produce 8 bit
> results. But for intermediate results there must be at least 16 bits.
> AFAIU, U8x16 will truncate the result of the multiplication to 8 bits, so, the
> only correct way is to use U16x8, keeping the range of all input values
> 0...255. An example is as follows:
> r = (int8u)((((cr - r) * (int16u)alpha) + (r << 8)) >> 8);
> g = (int8u)((((cg - g) * (int16u)alpha) + (g << 8)) >> 8);
> b = (int8u)((((cb - b) * (int16u)alpha) + (b << 8)) >> 8);
>
> Here all values are unsigned bytes, but as soon as you perform (cr-r)*alpha
> there will be overflow (this is why "alpha" is converted int16u, to make the
> compiler use int16u for the whole expression).
>
> The same is for 16 bit input values and results (U32x4):
> r = (int16u)((((cr - r) * (int32u)alpha) + (r << 16)) >> 16);
> g = (int16u)((((cg - g) * (int32u)alpha) + (g << 16)) >> 16);
> b = (int16u)((((cb - b) * (int32u)alpha) + (b << 16)) >> 16);
>
> The problem is SSE2 (unlike Altivec) doesn't have such commands, to multiply
> four 16*16=32 or just 32*32=32, there's only two 32*32=64, so, it will need 2
> consecutive commands, which will definitely be slower.

Hmm... given your specific expression above, it seems what you really 
want is the high byte (or word) of the multiplication, as the SSE 
PMULHUW opcode gives for unsigned 16-bit operands?

Something like

mulhi (cr - r, alpha) + r

That could be arranged...
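
To make the mulhi idea concrete, here is a minimal scalar sketch in C++. The names mulhi_u16, blend_reference and blend_mulhi are made up for illustration and are not macstl or AGG functions. It emulates one lane of an unsigned 16-bit high multiply and compares it with the scalar blend from the quoted message. It only covers the case cr >= r; a negative difference would need a signed multiply or extra handling that is not shown here.

```cpp
#include <cstdint>

// One lane of an unsigned 16-bit "mulhi": the high 16 bits of the
// 32-bit product (hypothetical helper, emulating PMULHUW per lane).
inline std::uint16_t mulhi_u16(std::uint16_t a, std::uint16_t b)
{
    return static_cast<std::uint16_t>((static_cast<std::uint32_t>(a) * b) >> 16);
}

// The scalar blend from the quoted message, computed in plain int
// so that no intermediate value is truncated.
inline std::uint8_t blend_reference(std::uint8_t r, std::uint8_t cr, std::uint8_t alpha)
{
    int wide = (int(cr) - int(r)) * int(alpha) + (int(r) << 8);
    return static_cast<std::uint8_t>(wide >> 8);
}

// mulhi (cr - r, alpha) + r, with the difference pre-shifted into the
// high byte of a 16-bit lane. ASSUMPTION: cr >= r.
inline std::uint8_t blend_mulhi(std::uint8_t r, std::uint8_t cr, std::uint8_t alpha)
{
    std::uint16_t diff = static_cast<std::uint16_t>((cr - r) << 8);
    return static_cast<std::uint8_t>(mulhi_u16(diff, alpha) + r);
}

// Exhaustively compares both formulations over all inputs with cr >= r.
inline bool mulhi_matches_reference_when_cr_ge_r()
{
    for (int r = 0; r < 256; ++r)
        for (int cr = r; cr < 256; ++cr)
            for (int alpha = 0; alpha < 256; ++alpha)
                if (blend_mulhi(static_cast<std::uint8_t>(r),
                                static_cast<std::uint8_t>(cr),
                                static_cast<std::uint8_t>(alpha))
                    != blend_reference(static_cast<std::uint8_t>(r),
                                       static_cast<std::uint8_t>(cr),
                                       static_cast<std::uint8_t>(alpha)))
                    return false;
    return true;
}
```

The pre-shift works because ((cr - r) * 256 * alpha) >> 16 is the same as ((cr - r) * alpha) >> 8 when the difference is non-negative.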

On the other hand, C (unsigned) integer arithmetic rules work modulo 
n+1, where n is the largest unsigned integer for that type. So 
something like

a * b

where a and b are 8 bit integers may indeed overflow, but if you work 
"conservatively" with the result it doesn't matter e.g.

mullo (a,  b) + c

== ((a * b) mod 256 + c) mod 256

== (a * b + c) mod 256

which is the result you'd get anyway if you ignored the overflow of a * 
b in the first place.
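
The modular identity above can be checked exhaustively for 8-bit values. The following C++ sketch uses hypothetical helper names (mullo_u8, mullo_add_u8, wide_then_wrap) that emulate single SIMD lanes in scalar code; it assumes the add wraps at 8 bits just like the multiply does.

```cpp
#include <cstdint>

// One lane of an 8-bit "mullo": keep only the low byte of the product.
inline std::uint8_t mullo_u8(std::uint8_t a, std::uint8_t b)
{
    return static_cast<std::uint8_t>(static_cast<unsigned>(a) * b);
}

// mullo (a, b) + c, where the add also wraps at 8 bits.
inline std::uint8_t mullo_add_u8(std::uint8_t a, std::uint8_t b, std::uint8_t c)
{
    return static_cast<std::uint8_t>(mullo_u8(a, b) + c);
}

// Reference: compute a * b + c without truncation, then reduce mod 256 once.
inline std::uint8_t wide_then_wrap(std::uint8_t a, std::uint8_t b, std::uint8_t c)
{
    return static_cast<std::uint8_t>((static_cast<unsigned>(a) * b + c) & 0xFFu);
}

// Checks the identity for every combination of three byte values.
inline bool identity_holds_for_all_bytes()
{
    for (unsigned a = 0; a < 256; ++a)
        for (unsigned b = 0; b < 256; ++b)
            for (unsigned c = 0; c < 256; ++c)
                if (mullo_add_u8(static_cast<std::uint8_t>(a),
                                 static_cast<std::uint8_t>(b),
                                 static_cast<std::uint8_t>(c))
                    != wide_then_wrap(static_cast<std::uint8_t>(a),
                                      static_cast<std::uint8_t>(b),
                                      static_cast<std::uint8_t>(c)))
                    return false;
    return true;
}
```

Note that the identity only says the low 8 bits agree. It does not recover the high bits that a later right shift would need, which is the case the quoted blend expression is concerned with.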

> The problem here is with dependencies. If I use valarray, I'll make 
> the very
> basic things in my library (and as the result, the whole library) 
> dependent on
> macstl. This will make the library useless on SGI, Sun, HP, IBM, etc. 
> In case
> if I could map raw memory (preserving the alignment rules of course), 
> it's only
> a single file, like agg_pixfmt_rgba_vec.h. So, if the application uses 
> this
> file it depends on macstl, but the other parts of the library don't. 
> If you for
> some reason can't use macstl (sacrificing the speed), you just don't 
> use the
> parts dependent on it.

There are a couple of ways of managing dependencies. You could do as 
you say, through a single file that brings in macstl for the people who 
want to use it. Alternatively, the macstl implementation of valarray is 
largely a simple superset of std::valarray, which of course is 
available in the C++ standard library. Thus a simple statement like 
this:

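#ifdef USE_MACSTL
// Use macstl's stdext::valarray and make it visible under std::
#include <macstl/valarray.h>
namespace std
	{
		using namespace stdext;
	}
#else
// Fall back to the valarray of the hosted standard library
#include <valarray>
#endif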

is sufficient to switch between macstl's stdext::valarray and the 
hosted std::valarray. Of course, from my tests I see that std::valarray 
performs much worse than hand-coded scalar loops on all tested 
compilers except gcc 3.3, where Gabriel Dos Reis's excellent ET 
implementation achieves close to hand-coded scalar loop performance.
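
As an illustration of code that would sit on top of such a switch, here is a minimal C++ sketch of the blend from the quoted message written against valarray. The function name blend and the u16 typedef are hypothetical. It uses only the compound-assignment members of std::valarray and assumes that macstl's valarray accepts the same calls, which has not been verified here. It also assumes alpha stays within 0...255, as in the quoted message.

```cpp
#ifdef USE_MACSTL
#include <macstl/valarray.h>
namespace std
{
    using namespace stdext;
}
#else
#include <valarray>
#endif
#include <cstdint>

typedef std::uint16_t u16;

// Blends src over dst with a single alpha in 0...255. Inputs hold
// 8-bit channel values widened to 16 bits, so intermediates wrap
// modulo 65536 instead of being truncated to 8 bits.
inline std::valarray<u16> blend(const std::valarray<u16>& dst,
                                const std::valarray<u16>& src,
                                u16 alpha)
{
    std::valarray<u16> out = src;
    out -= dst;               // (c - d) mod 65536, may wrap when c < d
    out *= alpha;             // (c - d) * alpha mod 65536

    std::valarray<u16> base = dst;
    base <<= u16(8);          // d << 8

    out += base;              // the wrapped terms cancel: sum is in 0...65280
    out >>= u16(8);           // keep the high byte as the result
    return out;
}
```

For example, blending dst = {0, 255, 100} with src = {255, 0, 200} at alpha = 128 gives {127, 127, 150}. The second element shows a negative difference wrapping and then cancelling out once d << 8 is added.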

Also, I'd like to see macstl compile and work correctly on as many 
compilers as possible, so it would be good to see results on those 
compilers you mentioned -- can someone try compiling macstl on them? 
The main requirement is a close-to-standard C++98 compiler, e.g. 
VC 6.0 and 7.0 fare badly but 7.1 is OK.

There might be licensing issues as well. macstl is RPL and I originally 
thought AGG was BSD, but the SF page lists you as CPL. If you're the 
owner of AGG, we'd have to work something out so that including macstl 
is OK, while anyone who wants the optimizations of macstl still 
respects its RPL rules.

I'll work on a refarray for 0.2.2.


Cheers, Glen Low


---
pixelglow software | simply brilliant stuff
www.pixelglow.com



