[macstl-dev] LDDQU vs MOVDQU
glen.low at pixelglow.com
Fri Oct 28 16:24:49 WST 2005
Not sure which list to post to in this brave new world of post-
PowerPC at Apple, but here goes...
LDDQU is the SSE3 unaligned load op that has the same interface as
the old SSE2 MOVDQU.
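For reference, C/C++ code reaches the two instructions through compiler intrinsics: `_mm_loadu_si128` (MOVDQU, `<emmintrin.h>`) and `_mm_lddqu_si128` (LDDQU, `<pmmintrin.h>`). Both take a `const __m128i *` with no alignment requirement. The sketch below assumes an x86 target built with GCC or Clang. It uses the `target("sse3")` function attribute so the file builds without `-msse3`, and it checks that both loads return the same 16 bytes at every misalignment. The helper names are hypothetical.

```c
#include <emmintrin.h>  /* SSE2: _mm_loadu_si128 (MOVDQU), _mm_storeu_si128 */
#include <pmmintrin.h>  /* SSE3: _mm_lddqu_si128 (LDDQU) */
#include <string.h>

/* Hypothetical helper: 16-byte unaligned load via MOVDQU, copied out. */
static void load_movdqu(const unsigned char *src, unsigned char out[16])
{
    __m128i v = _mm_loadu_si128((const __m128i *)src);
    _mm_storeu_si128((__m128i *)out, v);
}

/* Hypothetical helper: the same load via LDDQU. The target attribute
   (a GCC/Clang extension) enables SSE3 for this function only. */
__attribute__((target("sse3")))
static void load_lddqu(const unsigned char *src, unsigned char out[16])
{
    __m128i v = _mm_lddqu_si128((const __m128i *)src);
    _mm_storeu_si128((__m128i *)out, v);
}

/* Returns 1 if both loads agree with the source bytes at offsets 0..15. */
static int loads_agree(void)
{
    unsigned char buf[48], a[16], b[16];
    for (int i = 0; i < 48; i++)
        buf[i] = (unsigned char)(i * 7 + 3);
    for (int off = 0; off < 16; off++) {
        load_movdqu(buf + off, a);
        load_lddqu(buf + off, b);
        if (memcmp(a, b, 16) != 0 || memcmp(a, buf + off, 16) != 0)
            return 0;
    }
    return 1;
}
```

At the C level the two are interchangeable, so any difference between them has to show up in performance, not in results.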
1. Is this purely an implementation detail? If so, why have a
distinctly different op rather than "upgrade" the older op when SSE3
was introduced?
2. All the (sparse) online docs say not to use LDDQU in a store-load
forwarding situation, and to use MOVDQU instead. I presume that if the
intent is pure streaming, i.e. reading from x and storing into a
distinctly different y (fire and forget), then LDDQU is the better choice?
3. The (sparse) online docs also say that LDDQU works better
across cache lines because it does 2 aligned loads + a realign, rather
than 2 partial loads like MOVDQU. Why?
4. When I use LDDQU in a streaming sequential load, do I end up
with double the number of memory accesses (due to the implicit 2
aligned loads), or is the Intel wizardry savvy enough to factor out
the repeated loads?
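Questions 3 and 4 can be explored with a plain-C model of the "2 aligned loads + a realign" scheme. The model rounds the source address down to a 16-byte boundary, fetches aligned 16-byte blocks, and extracts 16 bytes at the misalignment offset. In a sequential stream, each aligned block serves two consecutive vectors. If the model keeps the previous block around (the same trick as AltiVec's `vec_ld` + `vec_perm` idiom), it needs about `n_vectors + 1` aligned loads rather than `2 * n_vectors`. This sketches the software analogue only and says nothing about what LDDQU does internally. The function names are hypothetical, and the 64-byte line size in `crosses_line` is an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 64   /* assumed cache-line size in bytes */

/* Returns 1 if a 16-byte access starting at p spans two cache lines. */
static int crosses_line(const void *p)
{
    return ((uintptr_t)p % LINE_SIZE) > LINE_SIZE - 16;
}

/* Model of "2 aligned loads + realign" over a sequential stream: copies
   n_vectors 16-byte vectors from a possibly misaligned src into dst,
   fetching each aligned 16-byte block of src at most once.
   Returns the number of aligned block loads performed.
   Assumes the aligned blocks enclosing the source range are readable. */
static size_t stream_realign_model(const unsigned char *src,
                                   unsigned char *dst, size_t n_vectors)
{
    size_t off = (size_t)((uintptr_t)src & 15);   /* misalignment in bytes */
    const unsigned char *block = src - off;        /* aligned block holding src */
    unsigned char pair[32];                        /* current + next block */
    size_t loads;

    if (n_vectors == 0)
        return 0;

    memcpy(pair, block, 16);                       /* first aligned load */
    loads = 1;
    for (size_t v = 0; v < n_vectors; v++) {
        block += 16;
        /* A misaligned vector also needs the next block; an aligned
           stream only fetches it when another vector follows. */
        if (off != 0 || v + 1 < n_vectors) {
            memcpy(pair + 16, block, 16);
            loads++;
        }
        memcpy(dst + 16 * v, pair + off, 16);      /* realign: extract 16 bytes */
        if (v + 1 < n_vectors)
            memcpy(pair, pair + 16, 16);           /* reuse next block as current */
    }
    return loads;
}
```

With a source misaligned by 5 bytes, four vectors cost five aligned loads in this model. A 16-byte-aligned source costs four.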
I'm implementing cross-platform unaligned loads in macstl and want to
do The Right Thing (TM).
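One way to structure a cross-platform unaligned load is compile-time dispatch on the feature macros that GCC and Clang predefine (`__SSE3__`, `__SSE2__`), with a portable `memcpy` fallback for other targets; an AltiVec path would slot into that fallback. The sketch also shows the "fire and forget" copy from question 2, assuming `x` and `y` do not overlap. `load_unaligned_16` and `stream_copy` are hypothetical names, not macstl API. Preferring LDDQU whenever SSE3 is enabled is itself an assumption that the questions above leave open.

```c
#include <stddef.h>
#include <string.h>
#if defined(__SSE3__)
#include <pmmintrin.h>   /* _mm_lddqu_si128; also provides the SSE2 intrinsics */
#elif defined(__SSE2__)
#include <emmintrin.h>   /* _mm_loadu_si128, _mm_storeu_si128 */
#endif

/* Hypothetical helper: copy 16 bytes from an arbitrarily aligned src to dst
   using the best unaligned load the build targets. */
static void load_unaligned_16(const void *src, void *dst)
{
#if defined(__SSE3__)
    __m128i v = _mm_lddqu_si128((const __m128i *)src);   /* LDDQU */
    _mm_storeu_si128((__m128i *)dst, v);
#elif defined(__SSE2__)
    __m128i v = _mm_loadu_si128((const __m128i *)src);   /* MOVDQU */
    _mm_storeu_si128((__m128i *)dst, v);
#else
    memcpy(dst, src, 16);   /* portable fallback; an AltiVec path would go here */
#endif
}

/* Hypothetical streaming copy from x into a distinct y (question 2).
   Assumes x and y do not overlap, so no load reads a just-stored value. */
static void stream_copy(unsigned char *y, const unsigned char *x, size_t n)
{
    size_t i = 0;
    for (; i + 16 <= n; i += 16)
        load_unaligned_16(x + i, y + i);
    memcpy(y + i, x + i, n - i);   /* tail of fewer than 16 bytes */
}
```

A default x86-64 build defines only `__SSE2__`, so it takes the MOVDQU path. Building with `-msse3` switches it to LDDQU.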
Cheers, Glen Low
pixelglow software | simply brilliant stuff