[macstl-dev] Re: Windows build without SSE2/3 (e.g. athlon XP) vectest and benchmark

Glen Low glen.low at pixelglow.com
Thu Feb 3 08:07:14 WST 2005



Paul:

> For both projects:
> Adjust the project defines to __MMX__ and __SSE__ only (no __SSE2__)  
> and set the project build arch to SSE (rather than SSE2).
>
>
> Tests of vectest project:
>
> Compilation problems in vec_mmx.h
>
> From line 3779 there is a section for SSE/MMX that has some problems
> (the definition of <unsigned short,8> for the maximum function).
> Replacing the 8 with 4 (including the subsequent <short,8> inside the
> function) allows compilation and passes the unsigned short tests.
>
> The same applies to minimum at line 3858 onwards.
>
> (test_func() is OK but test_accum() is noted as undefined, due (I
> think) to the lack of
> template <> struct accumulator <maximum <macstl::vec <unsigned short, 4> > >
> )

I'll look into it.

> Other points:
> vec<float,4> fails at min and max in both test_func and test_acc
> Don't know why?? QNAN handling? maybe different handling in scalar and  
> vector code?

Yes, it's NaN handling. C89 and C++98 are silent about NaN handling in
max and min, but C99 says that fmax and fmin are supposed to ignore NaN
rather than propagate it ("C99-style max"). However, both Altivec and
MMX/SSE use Java-style max, which propagates the NaN. So for the common
interface I've adopted C99-style max, which is fairly easy to get on
Altivec and requires a little more thought on SSE. It would devolve to
the faster Java-style max when finite math optimization is on (is there
a macro or option for this on VC++?).
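
To make the "little more thought on SSE" concrete, here is a minimal
sketch of a C99-style max built from SSE1 intrinsics. It is not the
macstl implementation; the names c99_max_ps and c99_max_demo are
hypothetical. It relies on the documented MAXPS behavior of returning
its second operand in any lane where either input is NaN, so only the
case of a NaN in the second operand needs fixing up.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <xmmintrin.h>   // SSE1 intrinsics only, no SSE2 needed

// Hypothetical helper, not part of macstl: lane-wise max that ignores
// a NaN when the other operand is a number (C99 fmax semantics).
inline __m128 c99_max_ps(__m128 a, __m128 b)
{
    // MAXPS yields its second operand (b) in lanes where either input is NaN.
    __m128 m = _mm_max_ps(a, b);
    // All-ones mask in lanes where b is NaN (b is unordered with itself).
    __m128 b_nan = _mm_cmpunord_ps(b, b);
    // Where b is NaN take a, otherwise keep the MAXPS result.
    return _mm_or_ps(_mm_and_ps(b_nan, a), _mm_andnot_ps(b_nan, m));
}

// Small self-check covering all four NaN combinations.
inline void c99_max_demo()
{
    const float nan = std::numeric_limits<float>::quiet_NaN();
    float out[4];
    _mm_storeu_ps(out, c99_max_ps(_mm_setr_ps(1.0f, nan, 3.0f, nan),
                                  _mm_setr_ps(2.0f, 5.0f, nan, nan)));
    assert(out[0] == 2.0f);       // no NaN: ordinary max
    assert(out[1] == 5.0f);       // NaN in a is ignored
    assert(out[2] == 3.0f);       // NaN in b is ignored
    assert(std::isnan(out[3]));   // NaN only when both are NaN
}
```

The extra compare and blend is the cost over a bare _mm_max_ps, which is
why falling back to the plain instruction is attractive when the inputs
are known to be finite.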

>
> Benchmark test:
>
> With __MMX__, __SSE__ and arch set to SSE
>
> Program crashes at entry to main
>
> Changing the denormal activation to
> #ifdef __SSE__
>
> // _mm_setcsr (_mm_getcsr () | 0x8040); // on Intel, treat denormals as zero for full speed
>
> _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
>
> #endif
>
> as per recommendations some way down the thread at
> http://softwareforums.intel.com/ids/board/message?board.id=16&message.id=183
> then allowed the program to execute on my AthlonXP (which I then
> showed to be equivalent to changing the bit mask to 0x8000, i.e. just
> the FTZ bit).
> I also tried 0x40 (DAZ bit 6, undocumented) as recommended in the same
> thread, but suspect that this further (P4-specific?) assistance may not
> be available for Athlons, as it repeated the crash.
>
> Querying _mm_getcsr() after the revised function call on the Athlon
> gives me a value of 0x98F0 which looks like the FTZ bit(15) is set but  
> not the DAZ bit 6.
>
> It would be interesting to know whether the function call is  
> equivalent to a mask of 0x8040 on P4's and 0x8000 on Athlons  
> automatically. Perhaps you can try it on a P4 and let me know?  
> Otherwise you could consider the function at
> http://ccrma-mail.stanford.edu/pipermail/planetccrma/2005-January/007558.html
> This code sets bit 15 FTZ and then queries cpuid before deciding on  
> bit 6.

Yes, I'm considering making it part of the common interface, i.e. a
single function that sets or clears denormal handling on all
processors.
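
A rough sketch of what such a single function might look like on x86
follows. The names denormal_mxcsr and set_denormal_flush are
hypothetical, and the has_daz flag is an assumption: the caller is
expected to determine DAZ support separately (for example with a CPU
check like the one in the linked post). Writing a reserved MXCSR bit
faults, which would be consistent with the crash seen when 0x0040 is
set on a processor without DAZ.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>   // _mm_getcsr, _mm_setcsr
#define SKETCH_HAS_MXCSR 1
#endif

// MXCSR control bits discussed in this thread.
const std::uint32_t kFlushToZero      = 0x8000; // FTZ, bit 15
const std::uint32_t kDenormalsAreZero = 0x0040; // DAZ, bit 6

// Hypothetical helper: compute the new MXCSR value from the old one.
// DAZ is only touched when the caller says the CPU supports it.
inline std::uint32_t denormal_mxcsr(std::uint32_t csr, bool flush, bool has_daz)
{
    const std::uint32_t bits = kFlushToZero | (has_daz ? kDenormalsAreZero : 0u);
    return flush ? (csr | bits) : (csr & ~bits);
}

// Hypothetical single entry point: set or clear denormal flushing.
inline void set_denormal_flush(bool flush, bool has_daz)
{
#ifdef SKETCH_HAS_MXCSR
    _mm_setcsr(denormal_mxcsr(_mm_getcsr(), flush, has_daz));
    // Sanity check: the FTZ bit now reflects the request.
    assert(((_mm_getcsr() & kFlushToZero) != 0) == flush);
#else
    (void) flush; (void) has_daz;   // no MXCSR on this target: nothing to do
#endif
}
```

Starting from the default MXCSR of 0x1F80, this yields 0x9F80 with FTZ
only and 0x9FC0 with both FTZ and DAZ, matching the 0x8000 and 0x8040
masks tried above. An Altivec branch would sit behind the same function.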


Cheers, Glen Low


---
pixelglow software | simply brilliant stuff
www.pixelglow.com




