SIGFPE

2020-05-20T21:56:52

Every programmer at one point will encounter the arcane semantics of IEEE-754 floating-point arithmetic. If for you that point is right now, allow me to list the worst offenders of the principle of least surprise:

Because of the semantics of integer division, it may be very surprising to the uninitiated that nonsensical floating-point operations actually "succeed" and control flow proceeds uninterrupted. While that might at first sound like a good thing, NaNs and infinitys quickly spread through any data that come in contact with them, because they propagate if they appear in any of the operands of most operations (NaN + 1 yields NaN, infinity + 1 yields infinity, etc.).

Take for example the following code that attempts to find the vector of penetration of a point p in a sphere with center c and radius r (the precursor to this epopoeia):

const auto d = c - p;
if(const auto l2 = dot(d, d); l2 > r2) {
    d *= r / std::sqrt(l2) - 1;
    // …
}

The code itself is not very interesting other than the fact that if the point happens to lie exactly on the circle, d will be a vector of length 0 (the conditional was supposed to be "greater or equal"). This is easily verifiable by observing that the division will equal 1 and the subtraction, 0.

If subsequent operations try to use that length in a division (which of course they did), NaNs and infinity will very soon appear and spread quickly without generating an easily identifiable error. The result is spectacular, but not ideal if one wants a stable simulation:

spectacular simulation

Enter SIGFPE

POSIX platforms have a special signal for arithmetic operation errors: SIGFPE (short for "floating-point exception"). If integer division by zero generates a signal, this is the one that is generated. Because of the aforementioned well-defined rules for IEEE-754 floating-point arithmetic, however, this signal is not generated by default for the operations we are interested in.

The good news is this can be changed, albeit with a bit of platform-specific code. For Linux systems using glibc, there is feenableexcept:

feenableexcept(FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW);

This will cause all (IEEE-754) exceptions to send a SIGFPE to the process. Crucially, the signal will be delivered at the exact point where the floating-point operation is performed, yielding either a core dump (the default signal action for SIGFPE) or a precise breakpoint in the debugger.

There is an implementation on the internet for OS X and the cppreference page mentions a similar solution for Windows, although I have not tested any of those.

Even the raw x86 assembly instructions that actually set the FPU and SSE modes are very simple, v. the glibc implementation (slightly formatted):

int feenableexcept (int excepts) {
    unsigned short int new_exc, old_exc;
    unsigned int new;
    excepts &= FE_ALL_EXCEPT;
    /* Get the current control word of the x87 FPU. */
    __asm__ ("fstcw %0" : "=m" (*&new_exc));
    old_exc = (~new_exc) & FE_ALL_EXCEPT;
    new_exc &= ~excepts;
    __asm__ ("fldcw %0" : : "m" (*&new_exc));
    /* And now the same for the SSE MXCSR register. */
    __asm__ ("stmxcsr %0" : "=m" (*&new));
    /* The SSE exception masks are shifted by 7 bits. */
    new &= ~(excepts << 7);
    __asm__ ("ldmxcsr %0" : : "m" (*&new));
    return old_exc;
}

Bonus

For bonus points, this can be put in a shared library so that FPE signals can be enabled dynamically by either loading or LD_PRELOADing the library (modulo static initialization order) without recompiling the program:

$ cat fpe.cpp
#include <fenv.h>
static int _=[]{return feenableexcept(FE_DIVBYZERO|FE_INVALID|FE_OVERFLOW);}();
$ g++ -c -fpic fpe.cpp && g++ -shared -o fpe.so fpe.o
$ cat test.cpp
int main() { return 1.0f / 0.0f; }
$ g++ -o test test.cpp
$ ./test
$ LD_PRELOAD=./fpe.so ./test
Floating point exception (core dumped)

(credits for Jonathan Dursi for the blog post that got me started with this rabbit hole)

assembly c debugging linux math programming unix