[macstl-dev] Re: Windows build without SSE2/3 (e.g. athlon XP)
vectest and benchmark
Paul Baxter
pauljbaxter at hotmail.com
Wed Feb 2 23:58:42 WST 2005
>> The benchmark tests for windows compile OK but throw unhandled
>> exceptions when run. I haven't had time to chase this yet (10 hrs at work
>> + a kid is limiting my spare 'fun' time right now :) ) but I'm doing this
>> at home on my Athlon XP which is lacking SSE2/3.
>
> Yes, the library is currently tuned for SSE2. You can selectively disable
> support by not defining the appropriate macro: __MMX__, __SSE__, __SSE2__,
> __SSE3__ in the VS.NET project level
For both projects:
Adjust the project defines to __MMX__ and __SSE__ only (no __SSE2__) and set
the project build arch to SSE (rather than SSE2).
Tests of vectest project:
Compilation problems in vec_mmx.h
From line 3779 there is a section for SSE/MMX that has some problems
(definition of <unsigned short,8> for the maximum function). Replacing the 8
with 4 (including the subsequent <short,8> inside the function) allows
compilation and passes the unsigned short tests.
The same applies to minimum at line 3858 onwards.
(test_func() is OK but test_accum() is reported as undefined, due (I think) to
the lack of
template <> struct accumulator <maximum <macstl::vec <unsigned short, 4> > >
)
Other points:
vec<float,4> fails at min and max in both test_func and test_accum.
I don't know why. QNaN handling? Maybe NaNs are handled differently in the
scalar and vector code?
Benchmark test:
With __MMX__, __SSE__ and arch set to SSE
Program crashes at entry to main
Changing the denormal activation to
#ifdef __SSE__
// _mm_setcsr (_mm_getcsr () | 0x8040); // on Intel, treat denormals as zero for full speed
_MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
#endif
as per recommendations some way down the thread at
http://softwareforums.intel.com/ids/board/message?board.id=16&message.id=183
then allowed the program to execute on my Athlon XP (which I then showed to
be equivalent to changing the bit mask to 0x8000, i.e. just the FTZ bit).
I also tried 0x40 (DAZ, bit 6, undocumented) as recommended in the same thread
but suspect that this further (P4-specific?) assistance may not be available
on Athlons, as it repeated the crash.
Querying _mm_getcsr() after the revised function call on the Athlon gives me a
value of 0x98F0 which looks like the FTZ bit (15) is set but not the DAZ bit
(6).
It would be interesting to know whether the function call is equivalent to a
mask of 0x8040 on P4's and 0x8000 on Athlons automatically. Perhaps you can
try it on a P4 and let me know? Otherwise you could consider the function at
http://ccrma-mail.stanford.edu/pipermail/planetccrma/2005-January/007558.html
This code sets bit 15 FTZ and then queries cpuid before deciding on bit 6.
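The bit arithmetic behind that approach can be sketched as follows. This is not the code from the linked post: the helper names are made up for illustration, and the CPU check for DAZ support is left as a plain bool parameter rather than an actual cpuid query.

```cpp
#include <cstdint>

// MXCSR control bits discussed above.
constexpr std::uint32_t kMxcsrDaz = 0x0040; // bit 6: denormals-are-zero
constexpr std::uint32_t kMxcsrFtz = 0x8000; // bit 15: flush-to-zero

// Hypothetical helper: always request FTZ, add DAZ only if the CPU has it.
constexpr std::uint32_t denormal_mask(bool daz_supported) {
    return kMxcsrFtz | (daz_supported ? kMxcsrDaz : 0u);
}

// Returns the value to write back, given the current MXCSR contents.
constexpr std::uint32_t with_denormals_off(std::uint32_t csr, bool daz_supported) {
    return csr | denormal_mask(daz_supported);
}

// Decode helpers for inspecting a value read back from the register.
constexpr bool has_ftz(std::uint32_t csr) { return (csr & kMxcsrFtz) != 0; }
constexpr bool has_daz(std::uint32_t csr) { return (csr & kMxcsrDaz) != 0; }
```

In real use the result of with_denormals_off(_mm_getcsr(), daz_supported) would be passed to _mm_setcsr, giving a mask of 0x8040 where DAZ is available and 0x8000 where it is not.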
Regards
Paul Baxter
-------------------------
Partial output of vectest.exe (with above mods added to vec_mmx.h) attached
at the end of email
<extract relating to my mod of maximum<unsigned short, 4>>
vec <unsigned short, 4> defined:
sum defined:
sum OK.
max undefined.
min undefined.
operator- defined:
< snipped>
max defined:
max OK.
min defined:
min OK.
pow undefined.
<snipped>
---------------------------------------
<extract relating to <float,4> problem>
vec <float, 4> defined:
sum defined:
sum OK.
max defined:
10000000: -1.#QNAN != 2.752831445473174e-031 == max
(2.752831445473174e-031 -3.2907964123623209e-019 -1.3662074820492266e-017 -1.#QNAN).
min defined:
10000000: -1.#QNAN != -4.5765854009174442e-027 == min
(30269796477465809000000 -4.5765854009174442e-027 503096195153920 -1.#QNAN).
operator- defined:
operator- OK.
<snipped>
log undefined.
max defined:
10000000: -1.#QNAN != 6.7629492218561597e+032 == max
(6.7629492218561597e+032, -1.#QNAN).
min defined:
10000000: 1.#QNAN != -2.2030078907976132e-016 == min
(-2.2030078907976132e-016, 1.#QNAN).
pow undefined.
<snipped >