[macstl-dev] Re: Windows build without SSE2/3 (e.g. athlon XP) vectest and benchmark

Paul Baxter pauljbaxter at hotmail.com
Wed Feb 2 23:58:42 WST 2005


>> The benchmark tests for windows compile OK but throw unhandled
>> exceptions when run. I haven't had time to chase this yet (10 hrs at work
>> + a kid is limiting my spare 'fun' time right now :) ) but I'm doing this
>> at home on my Athlon XP which is lacking SSE2/3.
>
> Yes, the library is currently tuned for SSE2. You can selectively disable
> support by not defining the appropriate macro: __MMX__, __SSE__, __SSE2__,
> __SSE3__ in the VS.NET project level

For both projects:
Define only __MMX__ and __SSE__ in the project (no __SSE2__) and set
the project build arch to SSE (rather than SSE2).


Tests of vectest project:

Compilation problems in vec_mmx.h

From line 3779 there is a section for SSE/MMX that has some problems:
maximum is defined for <unsigned short,8>. Replacing the 8 with 4
(including the subsequent <short,8> inside the function) allows compilation,
and the unsigned short tests then pass.

The same applies to minimum from line 3858 onwards.
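
My reading of why 4 is the right count here: with only MMX/SSE enabled, the integer registers are 64 bits wide, so a vector of 16-bit elements has four lanes; eight lanes would need a 128-bit SSE2 register. A minimal sketch of that arithmetic follows. The helper name lanes is made up for illustration and is not part of macstl.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not macstl API): how many elements of type T
// fit in a SIMD register that is register_bits wide.
template <typename T>
constexpr std::size_t lanes(std::size_t register_bits)
{
    return register_bits / (sizeof(T) * CHAR_BIT);
}

// 64-bit MMX register: four 16-bit lanes.
static_assert(lanes<std::uint16_t>(64) == 4, "MMX holds 4 x 16-bit");
// 128-bit XMM register: eight 16-bit lanes (integer use needs SSE2).
static_assert(lanes<std::uint16_t>(128) == 8, "XMM holds 8 x 16-bit");
```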

(test_func() is OK, but test_accum() is reported as undefined, due (I think)
to the lack of a
template <> struct accumulator <maximum <macstl::vec <unsigned short, 4> > >
specialization.)

Other points:
vec<float,4> fails at min and max in both test_func and test_accum.
I don't know why. QNAN handling? Maybe NaNs are handled differently in the
scalar and vector code?
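
One possible explanation, which I have not confirmed against the macstl source: the SSE MAXPS/MINPS instructions return the second operand whenever either input is a NaN, whereas std::max and std::min return the first argument in that case. The sketch below models the SSE rule in plain scalar C++ (the sse_style_* names are mine, for illustration only) so the two behaviours can be compared without any intrinsics.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>

// Scalar model of one MAXPS lane: any comparison involving a NaN is
// false, so the second operand is returned if either input is a NaN.
inline float sse_style_max(float a, float b) { return a > b ? a : b; }

// Scalar model of one MINPS lane, same rule.
inline float sse_style_min(float a, float b) { return a < b ? a : b; }

// std::max(a, b) is specified as (a < b) ? b : a, so it returns the
// FIRST argument when either input is a NaN: the opposite choice.
```

With a NaN in the data, the two rules disagree on which operand survives, so a scalar reference and a vector result can differ in exactly the way the vectest output shows (one side a QNAN, the other an ordinary number).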


Benchmark test:

With __MMX__, __SSE__ and arch set to SSE

Program crashes at entry to main

Changing the denormal activation to

#ifdef __SSE__
// _mm_setcsr (_mm_getcsr () | 0x8040); // on Intel, treat denormals as zero for full speed
_MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
#endif


as per the recommendations some way down the thread at
http://softwareforums.intel.com/ids/board/message?board.id=16&message.id=183
then allowed the program to execute on my Athlon XP. (I then showed this to
be equivalent to changing the bit mask to 0x8000, i.e. just the FTZ bit.)
I also tried 0x40 (the DAZ bit, bit 6, undocumented) as recommended in the
same thread, but the crash came back, so I suspect that this further
(P4-specific?) assistance is not available on Athlons.

Querying _mm_getcsr() after the revised function call on the Athlon gives me a
value of 0x98F0, which looks like the FTZ bit (15) is set but not the DAZ bit
(6).

It would be interesting to know whether the function call is automatically
equivalent to a mask of 0x8040 on P4s and 0x8000 on Athlons. Perhaps you can
try it on a P4 and let me know? Otherwise you could consider the function at
http://ccrma-mail.stanford.edu/pipermail/planetccrma/2005-January/007558.html
This code sets bit 15 (FTZ) and then queries cpuid before deciding on bit 6
(DAZ).
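
To make the idea concrete, here is a rough sketch of the mask selection only. It is not the code from the linked post: the helper name with_denormals_off and the daz_supported flag are hypothetical, and how that flag gets detected (cpuid or otherwise) is left out.

```cpp
#include <cassert>
#include <cstdint>

const std::uint32_t MXCSR_FTZ = 0x8000; // bit 15: flush results to zero
const std::uint32_t MXCSR_DAZ = 0x0040; // bit 6: treat denormal inputs as zero

// Hypothetical helper: given the current MXCSR value, return the value
// to write back. FTZ is always set; DAZ is set only when the caller says
// the CPU supports it, since writing an unsupported bit faults.
inline std::uint32_t with_denormals_off(std::uint32_t mxcsr, bool daz_supported)
{
    mxcsr |= MXCSR_FTZ;
    if (daz_supported)
        mxcsr |= MXCSR_DAZ;
    return mxcsr;
}
```

In the benchmark this would be used as _mm_setcsr (with_denormals_off (_mm_getcsr (), has_daz)), where has_daz stands for whatever detection is chosen. Passing false reproduces the 0x8000 mask that worked on the Athlon XP; passing true reproduces the original 0x8040.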

Regards

Paul Baxter

-------------------------

Partial output of vectest.exe (with above mods added to vec_mmx.h) attached
at the end of email

<extract relating to my mod of maximum<unsigned short, 4>>

vec <unsigned short, 4> defined:
sum defined:
  sum OK.
max undefined.
min undefined.
operator- defined:
< snipped>
max defined:
  max OK.
min defined:
  min OK.
pow undefined.
<snipped>
---------------------------------------
<extract relating to <float,4> problem>

vec <float, 4> defined:
sum defined:
  sum OK.
max defined:
10000000: -1.#QNAN != 2.752831445473174e-031 == max (2.752831445473174e-031 -3.2907964123623209e-019 -1.3662074820492266e-017 -1.#QNAN).
min defined:
10000000: -1.#QNAN != -4.5765854009174442e-027 == min (30269796477465809000000 -4.5765854009174442e-027 503096195153920 -1.#QNAN).
operator- defined:
  operator- OK.
<snipped>
log undefined.
max defined:
10000000: -1.#QNAN != 6.7629492218561597e+032 == max (6.7629492218561597e+032, -1.#QNAN).
min defined:
10000000: 1.#QNAN != -2.2030078907976132e-016 == min (-2.2030078907976132e-016, 1.#QNAN).
pow undefined.
<snipped > 



