[macstl-dev] Questions on valarray use : non-aligned buffers
Arman Garakani
arman at alum.mit.edu
Wed Sep 21 21:39:53 WST 2005
On Sep 21, 2005, at 7:16 AM, Glen Low wrote:
Stéphane:
On 20/09/2005, at 11:50 PM, Stéphane Letz wrote:
It seems that valarray functions do not support non-aligned buffers.
Is this a feature that would make sense to add in a future version?
(Apple's vecLib code uses scalar code in the case of non-aligned buffers.)
http://developer.apple.com/documentation/Performance/Conceptual/vDSP/ref_chap/chapter_4.1_section_244.html
macstl gets its insane speed partly from not needing to check the
buffers for alignment. valarray and statarray were defined first and
since they encapsulate the details of element storage, I could then
guarantee their alignment. Now that I've defined the refarray class,
I cannot guarantee the alignment of data beforehand -- in the next
version I'll likely insert an assert to that effect.
Rather than slow down all macstl code with a runtime check for
alignment, I'd suggest that you check for alignment yourself and act
accordingly. E.g. in pseudo-code:
if (a is aligned and b is aligned) then use macstl;
else if (a and b are relatively aligned) then peel off the initial
sequence, and use macstl on the rest;
else use regular arithmetic;
In your case you may find one or two of the above situations never
arise, so you lose less speed checking for them.
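The three-way check in the pseudo-code above could be sketched in C++ roughly as follows. This is only an illustration, not macstl code: the names (kVectorBytes, Path, misalignment, choose_path, peel_count) are hypothetical, and a 16-byte vector width is assumed, which matches Altivec and SSE registers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed vector width in bytes (16 for Altivec and SSE registers).
constexpr std::size_t kVectorBytes = 16;

// The three cases from the pseudo-code; hypothetical names.
enum class Path { Vector, PeelThenVector, Scalar };

// Byte offset of p past the previous 16-byte boundary.
inline std::size_t misalignment(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % kVectorBytes;
}

// Decide which of the three code paths applies to buffers a and b.
inline Path choose_path(const void* a, const void* b) {
    const std::size_t ma = misalignment(a);
    const std::size_t mb = misalignment(b);
    if (ma == 0 && mb == 0) return Path::Vector;  // both aligned
    if (ma == mb) return Path::PeelThenVector;    // relatively aligned
    return Path::Scalar;                          // no common alignment
}

// Number of leading elements to process with scalar code before p
// reaches a 16-byte boundary. Assumes each element is itself
// naturally aligned, so the gap is a whole number of elements.
inline std::size_t peel_count(const void* p, std::size_t elem_size) {
    const std::size_t m = misalignment(p);
    if (m == 0) return 0;
    assert((kVectorBytes - m) % elem_size == 0);
    return (kVectorBytes - m) / elem_size;
}
```

A caller would switch on choose_path(a, b) once per operation, so the check costs a couple of integer operations per call rather than per element.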
If you need to minimize duplication of arithmetic, you might be able
to put the refarray code in its own module, and compile with and
without Altivec (or SSE),
Not so fast.
Byte arrays are still common in image data, more so than most people
realize. The problem with the three cases you outlined is that while
the first and second differ only slightly in speed, the third suffers
significantly. To illustrate, imagine processing a user-specified
window on a single- or multi-plane byte image. Depending on the
operation you are performing, the time to process the image data
inside the window will vary by a small percentage between the first
and second cases, but quite possibly by an order of magnitude in the
third. So from a user interface point of view, the user will get a
radically different "feel" for the performance depending on where the
window is! Checking for alignment is not slow. Its cost may degrade
the best-case performance a bit, but the average and worst cases
improve. A good approach is to copy into aligned buffer(s) in the
third case.
Certainly, for algorithm designers, designing your data size and flow
so that the data is stored properly aligned is critical.
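As a rough sketch of the copy-to-aligned-buffer idea for the third case, one row of a byte image could be staged through small aligned scratch buffers as below. Everything here is an assumption for illustration: aligned_kernel is a scalar stand-in for whatever vectorized routine (macstl or otherwise) requires 16-byte alignment, and the names and the 256-byte tile size are made up.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Scratch tile size in bytes; assumed value, a multiple of 16.
constexpr std::size_t kTileBytes = 256;

// Stand-in for a vectorized kernel that requires 16-byte aligned
// buffers. Here it is plain scalar code: saturating add of a constant.
inline void aligned_kernel(const std::uint8_t* src, std::uint8_t* dst,
                           std::size_t n, std::uint8_t add) {
    assert(reinterpret_cast<std::uintptr_t>(src) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(dst) % 16 == 0);
    for (std::size_t i = 0; i < n; ++i) {
        const unsigned sum = static_cast<unsigned>(src[i]) + add;
        dst[i] = static_cast<std::uint8_t>(sum > 255u ? 255u : sum);
    }
}

// Process a row whose src/dst pointers have arbitrary alignment by
// copying each chunk into aligned scratch buffers, running the
// aligned kernel there, and copying the result back out.
inline void process_unaligned_row(const std::uint8_t* src, std::uint8_t* dst,
                                  std::size_t n, std::uint8_t add) {
    alignas(16) std::uint8_t in[kTileBytes];
    alignas(16) std::uint8_t out[kTileBytes];
    for (std::size_t done = 0; done < n; done += kTileBytes) {
        const std::size_t chunk = std::min(kTileBytes, n - done);
        std::memcpy(in, src + done, chunk);
        aligned_kernel(in, out, chunk, add);
        std::memcpy(dst + done, out, chunk);
    }
}
```

The two memcpy calls add memory traffic, so whether this beats plain scalar arithmetic depends on how much work the kernel does per byte; that would need measuring for a given operation.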
my 2 cents