[macstl-dev] LDDQU vs MOVDQU
Glen Low
glen.low at pixelglow.com
Fri Oct 28 16:24:49 WST 2005
Hi all
I'm not sure which list to post to in this brave new post-PowerPC
world at Apple, but here goes...
LDDQU is the load unaligned op in SSE3 that has the same interface as
the old MOVDQU of SSE2.
Questions:
1. Is the difference purely an implementation detail? If so, why have
a distinctly different op rather than "upgrade" the older op when SSE3
came out?
2. All the (sparse) online docs say not to use LDDQU in a store-load
forwarding situation and to use MOVDQU instead. I presume that if the
intent is pure streaming, i.e. reading from x and storing into a
distinctly different y (fire and forget), then LDDQU is the
appropriate op?
3. The (sparse) online docs also say that LDDQU works better
across cache lines because it is 2 aligned loads + a realign, rather
than 2 part loads like MOVDQU. Why?
4. When I use LDDQU in a streaming sequential load, do I end up
with double the number of memory accesses (due to the implicit 2
aligned loads), or is the Intel wizardry savvy enough to factor out
the repeated loads?
I'm implementing cross-platform unaligned loads in macstl and want to
do The Right Thing (TM).
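To make the streaming case in question 2 concrete, here is a minimal sketch of the kind of copy loop I mean. It is only an illustration, not macstl code: the helper name copy_unaligned16 is hypothetical, the choice of LDDQU whenever SSE3 is available is exactly the assumption I am asking about, and the memcpy fallback is a stand-in for non-SSE targets. The intrinsics _mm_lddqu_si128 and _mm_loadu_si128 are the C spellings of LDDQU and MOVDQU, and they take the same argument and return the same type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE3__)
#include <pmmintrin.h>   /* SSE3: _mm_lddqu_si128 (also pulls in SSE2) */
#elif defined(__SSE2__)
#include <emmintrin.h>   /* SSE2: _mm_loadu_si128, _mm_storeu_si128 */
#endif

/* Hypothetical helper: copy n bytes (n must be a multiple of 16) from
   src to dst. Neither pointer needs to be 16-byte aligned, and the two
   buffers are assumed not to overlap (pure streaming, x into y). */
static void copy_unaligned16(uint8_t *dst, const uint8_t *src, size_t n)
{
    assert(n % 16 == 0);
    for (size_t i = 0; i < n; i += 16) {
#if defined(__SSE3__)
        /* LDDQU: assumed appropriate here because nothing just stored
           is being read back (no store-load forwarding involved). */
        __m128i v = _mm_lddqu_si128((const __m128i *)(src + i));
        _mm_storeu_si128((__m128i *)(dst + i), v);
#elif defined(__SSE2__)
        /* MOVDQU: same interface, SSE2 only. */
        __m128i v = _mm_loadu_si128((const __m128i *)(src + i));
        _mm_storeu_si128((__m128i *)(dst + i), v);
#else
        /* Portable stand-in for targets without SSE2. */
        memcpy(dst + i, src + i, 16);
#endif
    }
}
```

Calling copy_unaligned16(dst + 1, src + 1, 32) on two separate byte buffers exercises the misaligned path on all three branches, and the only line that differs between the SSE3 and SSE2 branches is the load.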
Cheers, Glen Low
---
pixelglow software | simply brilliant stuff
www.pixelglow.com
aim: pixglen