[macstl-dev] LDDQU vs MOVDQU

Glen Low glen.low at pixelglow.com
Fri Oct 28 16:24:49 WST 2005

  • Previous message: [macstl-dev] Fwd: running on intel emt64
  • Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]


Hi all

Not sure which list to post to in this brave new world of
post-PowerPC at Apple, but here goes...

LDDQU is the load unaligned op in SSE3 that has the same interface as  
the old MOVDQU of SSE2.

Questions:

1.    Is this purely an implementation detail? If so why have a  
distinctly different op rather than "upgrade" the older op when SSE3  
came out?
2.    All the (sparse) online docs say not to use LDDQU in a
store-load forwarding situation, and to use MOVDQU instead. I presume
that if the intent is to do pure streaming, i.e. reading from x and
storing into a distinctly different y (fire and forget), then LDDQU is
the appropriate op?
3.    The (sparse) online docs also say that LDDQU works better
across cache lines because it is 2 aligned loads + a realign, rather
than 2 part loads like MOVDQU. Why?
4.    When I use LDDQU in a streaming sequential load, do I end up
with double the number of memory accesses (due to the implicit 2
aligned loads) or is the Intel wizardry savvy enough to factor out
the repeated loads?
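
To make questions 3 and 4 concrete, here is a rough software model of the
"2 aligned loads + a realign" description from the docs. It is only an
illustration of that wording, not a claim about how any CPU implements
LDDQU. The names bytes16 and model_two_aligned_loads are made up for this
sketch, and it assumes the caller guarantees that both aligned 16-byte
blocks starting at the aligned address at or below p are readable.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical 16-byte value type, used only in this sketch.
struct bytes16 { std::uint8_t b[16]; };

// Software model of "two aligned loads + realign" (assumption: the 32 bytes
// starting at the aligned address at or below p are all readable).
inline bytes16 model_two_aligned_loads(const std::uint8_t* p)
{
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    const std::size_t offset = addr & 15;        // misalignment within a 16-byte block
    const std::uint8_t* base = p - offset;       // 16-byte-aligned address at or below p

    std::uint8_t window[32];
    std::memcpy(window,      base,      16);     // first aligned 16-byte load
    std::memcpy(window + 16, base + 16, 16);     // second aligned 16-byte load

    bytes16 out;
    std::memcpy(out.b, window + offset, 16);     // realign: pick 16 bytes at the offset
    return out;
}
```

In this model, stepping p forward by 16 makes the second aligned block of
one call the first aligned block of the next, which is the repeated access
question 4 is asking about.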

I'm implementing cross-platform unaligned loads in macstl and want to  
do The Right Thing (TM).
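
For reference, a minimal sketch of the kind of wrapper I have in mind. The
names block16 and load_unaligned_16 are hypothetical and not part of macstl.
It assumes a compiler that defines __SSE3__ / __SSE2__ (as GCC and Clang do)
and falls back to memcpy elsewhere. The intrinsics _mm_lddqu_si128 (LDDQU)
and _mm_loadu_si128 (MOVDQU) take the same argument and return the same
type, so the choice is a one-line switch.

```cpp
#include <cstdint>
#include <cstring>

#if defined(__SSE3__)
  #include <pmmintrin.h>   // SSE3: _mm_lddqu_si128 (also pulls in the SSE2 intrinsics)
#elif defined(__SSE2__)
  #include <emmintrin.h>   // SSE2: _mm_loadu_si128
#endif

// Hypothetical 16-byte result type, used only in this sketch.
struct block16 { std::uint8_t bytes[16]; };

// Load 16 bytes from a possibly unaligned address.
inline block16 load_unaligned_16(const void* p)
{
    block16 out;
#if defined(__SSE3__)
    // LDDQU path: same interface as the SSE2 load below.
    __m128i v = _mm_lddqu_si128(static_cast<const __m128i*>(p));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out.bytes), v);
#elif defined(__SSE2__)
    // MOVDQU path.
    __m128i v = _mm_loadu_si128(static_cast<const __m128i*>(p));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out.bytes), v);
#else
    // Portable fallback for targets without these SSE definitions.
    std::memcpy(out.bytes, p, 16);
#endif
    return out;
}
```

Which path is right for a store-load forwarding case versus pure streaming
is exactly what the questions above are asking, so the selection here is
by availability only.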

Cheers, Glen Low


---
pixelglow software | simply brilliant stuff
www.pixelglow.com
aim: pixglen

