Monday, July 27, 2009

Windows float to integer compiler woes...

Got my Number Theoretic Transforms to run faster than the FFTs from FFTW on Linux via GCC. When I then ran the same code on Windows x64 to benchmark it and confirm the result, I found it ran extremely slowly. :(

This was very distressing. What I found was that the truncation of floating point numbers to integers is rerouted to a compiler-specific function called __ftol (that's float to long), which isn't fast enough.

I found some good links explaining this, like Intel's Notes on Floating Point-to-Integer Latency and Herf's Know your FPU page. These pages sum up the design flaw, which hits the MSVC 2008 x64 compiler hardest, as most workarounds can be done painlessly only on the x86 version of the compiler.

Apparently they've decided that this function is fast enough, and have already marked the compiler flag '/QIfist' (which stops __ftol from being used in the x86 compiler) for deprecation!
"No compiler option is needed. The compiler has made significant improvements in float to int conversion speed."
Sigh. I guess this is true for regular use, but my transforms need to run in tens to hundreds of milliseconds... and __ftol makes a significant difference. Incidentally, the compiler flag '/fp:fast' did help speed up the program, but it is still 2-3 times slower than the GCC version.

I also tried the code
int ftol_ambient(double d) {
    int i;

    __asm {
        fld d
        fistp i
    }
    return i;
}
Turns out the x64 compiler doesn't support inline assembly.... grrr....

Might have to implement the float casting using SSE (SIMD) intrinsics. Looks like I may have to switch compilers or stick to Linux.

Cheers
Shakes - L3mming

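EDIT: The x64 compiler doesn't support inline assembly because it has been replaced with intrinsics. A fast conversion can be done with the following code. Note that _mm_cvtsd_si32 rounds according to the current rounding mode (round-to-nearest by default), hence the function name; for true truncation, use _mm_cvttsd_si32 instead.
#include <emmintrin.h>  /* SSE2 intrinsics */

static inline int round (double const x)
{
    return _mm_cvtsd_si32(_mm_load_sd(&x));
}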
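
For reference, here is a minimal sketch of how the truncating and rounding conversions might sit side by side, plus a batch version that converts two doubles per instruction. The function names (trunc_to_int, round_to_int, trunc_array) are hypothetical and not from the original code; the intrinsics themselves are the standard SSE2 ones from emmintrin.h. It assumes an x86/x64 target with SSE2, and older MSVC versions compiling as C may need __inline in place of inline.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>     /* size_t */

/* Truncate toward zero, matching a C cast (int)x. Uses CVTTSD2SI. */
static inline int trunc_to_int(double x)
{
    return _mm_cvttsd_si32(_mm_load_sd(&x));
}

/* Round using the current MXCSR rounding mode (round-to-nearest-even
   unless it has been changed). Uses CVTSD2SI. */
static inline int round_to_int(double x)
{
    return _mm_cvtsd_si32(_mm_load_sd(&x));
}

/* Truncate n doubles to ints, two at a time, with a scalar tail
   for odd n. Unaligned loads are used, so src needs no alignment. */
static void trunc_array(const double *src, int *dst, size_t n)
{
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        /* Two doubles in, two 32-bit ints out in the low 64 bits. */
        __m128i v = _mm_cvttpd_epi32(_mm_loadu_pd(src + i));
        _mm_storel_epi64((__m128i *)(dst + i), v);
    }
    for (; i < n; ++i)
        dst[i] = trunc_to_int(src[i]);
}
```

With the default rounding mode, trunc_to_int(2.7) gives 2 while round_to_int(2.7) gives 3, so which one is appropriate depends on whether the surrounding code relied on cast semantics.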
