[macstl-dev] Questions on valarray use : non-aligned buffers
Glen Low
glen.low at pixelglow.com
Thu Sep 22 07:23:57 WST 2005
Arman, all:
On 21/09/2005, at 9:25 PM, Arman Garakani wrote:
>
> On Sep 21, 2005, at 7:16 AM, Glen Low wrote:
>
>> Stéphane:
>>
>> On 20/09/2005, at 11:50 PM, Stéphane Letz wrote:
>>
>>> It seems that valarray functions do not support non-aligned
>>> buffers. Is it a feature that could make sense to add in a future
>>> version?
>>>
>>> (Apple vecLib code uses scalar code in the case of non-aligned buffers.)
>>>
>>> http://developer.apple.com/documentation/Performance/Conceptual/vDSP/ref_chap/chapter_4.1_section_244.html
>>
>> macstl gets its insane speed partly from not needing to check the
>> buffers for alignment. valarray and statarray were defined first
>> and since they encapsulate the details of element storage, I could
>> then guarantee their alignment. Now that I've defined the refarray
>> class, I cannot guarantee the alignment of data beforehand -- in
>> the next version I'll likely insert an assert to that effect.
>>
>> Rather than slow down all macstl code with a runtime check for
>> alignment, I'd suggest that you check for alignment yourself and
>> act accordingly. E.g. in pseudo-code:
>>
>> if (a is aligned and b is aligned) then use macstl;
>> else if (a and b are relatively aligned) then peel off the initial
>> sequence, and use macstl on the rest;
>> else use regular arithmetic;
>>
>> In your case you may find one or two of the above situations never
>> arise, so you lose less speed checking for them.
>>
>> If you need to minimize duplication of arithmetic, you might be
>> able to put the refarray code in its own module, and compile it
>> with and without Altivec (or SSE).
>>
>
> Not so fast,
>
> Byte arrays are still common in image data, more so than most
> realize. The problem with the 3 cases you outlined is that while the
> first and the second differ only slightly in speed, the third
> suffers significantly. To illustrate, imagine processing a
> user-specified window on a single- or multi-plane byte image.
> Processing the image data inside the window under the 3 cases above
> will, depending on the operation, vary by small percentages between
> the first and second cases but quite possibly by an order of
> magnitude in the third. So from a user interface point of view, the
> user will get a radically different "feel" for the performance
> depending on where the window is! Checking for alignment is not
> slow: its cost may degrade the best-case performance a bit, but the
> average and worst cases improve. A good approach is to copy into
> aligned buffer(s) in the third case. Certainly for algorithm
> designers, designing your data size and flow so that the data is
> stored properly aligned is critical.
It occurs to me that I could design a new "unaligned_array" class, to
which you could pass an arbitrary piece of memory and which would
automatically do the re-alignment. On the PowerPC it would use
lvsr/lvsl and vperm to adjust the input stream, and on Intel it would
use loadu or lddqu (SSE3). And in the default non-SIMD case, it would
just return element by element, as refarray does. The advantage would
be that you needn't copy the stuff to another buffer and bash your
cache and LSU.
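
To make the Intel and non-SIMD paths a little more concrete, here is a
rough sketch. It is not macstl code: add_unaligned is a hypothetical
free function, and the example assumes float data and a compiler that
defines __SSE2__ when SSE2 is available. It reads both inputs with
unaligned loads where it can and falls back to element-by-element
arithmetic otherwise, so nothing is copied to a temporary buffer.

```cpp
#include <cstddef>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 header; also provides the SSE float intrinsics
#endif

// Hypothetical helper (not part of macstl): out[i] = a[i] + b[i] for
// buffers with no particular alignment.
void add_unaligned(const float* a, const float* b, float* out, std::size_t n)
{
    std::size_t i = 0;
#if defined(__SSE2__)
    // _mm_loadu_ps and _mm_storeu_ps do not require 16-byte alignment.
    for (; i + 4 <= n; i += 4) {
        const __m128 va = _mm_loadu_ps(a + i);
        const __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    // Scalar loop: handles the tail, or everything when SIMD is unavailable.
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

A PowerPC version would presumably replace the unaligned loads with
aligned loads plus a permute, which this sketch does not attempt.
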
With such a class you might still want to do some runtime checking
before using it, so that the always aligned folk don't pay in
performance for the sometimes aligned folk's flexibility. Given that
the iterator would do something different from the regular refarray
iterator, and that I don't think I can vary runtime behavior without
using slower non-inline virtual member functions, it would have to be
an entirely different class instead of just an improvement to
refarray.
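
The runtime check itself could be as small as the sketch below. The
names (Alignment, classify, peel_count) are made up for illustration
and are not macstl APIs, and the 16-byte width is an assumption that
matches Altivec and SSE vectors. It sorts a pair of buffers into the
three cases from the earlier pseudo-code and, for the relatively
aligned case, reports how many leading elements to peel off.

```cpp
#include <cstddef>
#include <cstdint>

enum class Alignment { Aligned, RelativelyAligned, Unaligned };

// Hypothetical helper: classify two buffers against a vector width in bytes.
inline Alignment classify(const void* a, const void* b, std::size_t width = 16)
{
    const std::uintptr_t pa = reinterpret_cast<std::uintptr_t>(a);
    const std::uintptr_t pb = reinterpret_cast<std::uintptr_t>(b);
    if (pa % width == 0 && pb % width == 0)
        return Alignment::Aligned;            // use the vector path directly
    if (pa % width == pb % width)
        return Alignment::RelativelyAligned;  // peel, then use the vector path
    return Alignment::Unaligned;              // scalar or unaligned-load path
}

// Number of leading elements to handle with scalar code before p reaches
// the next width-byte boundary. Assumes p is aligned to sizeof(T) and that
// sizeof(T) divides width.
template <typename T>
std::size_t peel_count(const T* p, std::size_t width = 16)
{
    const std::uintptr_t misalign = reinterpret_cast<std::uintptr_t>(p) % width;
    return misalign == 0 ? 0 : (width - misalign) / sizeof(T);
}
```

The caller would branch on the result once per operation, so the
always aligned path costs only a couple of integer tests up front.
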
Now all I need is a snazzy name, like its brothers valarray,
statarray and refarray :-) and you'll see it in 0.3.2.
Cheers, Glen Low
---
pixelglow software | simply brilliant stuff
www.pixelglow.com
aim: pixglen