SIGFPE
2020-05-20T21:56:52
Every programmer at one point will encounter the arcane semantics of IEEE-754 floating-point arithmetic. If for you that point is right now, allow me to list the worst offenders of the principle of least surprise:
-
Division by zero is a (semi-)valid operation that returns
±infinity. -
Division of zero by zero returns
NaN(not-a-number). -
Even invalid operations are also (semi-)valid operations that return
NaN. - In what has to be considered the clear winner, despite the items above not generating an exception by default, an integer division-by-zero will generate a floating-point exception (unrelated to exceptions in languages such as C++) in POSIX-compliant operating systems — if it does generate an exception at all, which is usually the case.
Because of the semantics of integer division, it may be very surprising to the
uninitiated that nonsensical floating-point operations actually "succeed" and
control flow proceeds uninterrupted. While that might at first sound like a
good thing, NaNs and infinitys quickly spread through
any data that come in contact with them, because they propagate if they appear
in any of the operands of most operations (NaN + 1 yields
NaN, infinity + 1 yields infinity,
etc.).
Take for example the following code that attempts to find the vector of
penetration of a point p in a sphere with center c
and radius r (the precursor to this epopoeia):
const auto d = c - p;
if(const auto l2 = dot(d, d); l2 > r2) {
d *= r / std::sqrt(l2) - 1;
// …
}
The code itself is not very interesting other than the fact that if the point
happens to lie exactly on the circle, d will be a vector of length
0 (the conditional was supposed to be "greater or equal"). This is
easily verifiable by observing that the division will equal 1 and the
subtraction, 0.
If subsequent operations try to use that length in a division (which of course
they did), NaNs and infinity will very soon appear
and spread quickly without generating an easily identifiable error.
The result is spectacular, but not ideal if one wants a stable simulation:
Enter SIGFPE
POSIX platforms have a special signal for arithmetic operation errors:
SIGFPE (short for "floating-point exception"). If integer
division by zero generates a signal, this is the one that is generated.
Because of the aforementioned well-defined rules for IEEE-754 floating-point
arithmetic, however, this signal is not generated by default for the operations
we are interested in.
The good news is this can be changed, albeit with a bit of platform-specific
code. For Linux systems using glibc, there is feenableexcept:
feenableexcept(FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW);
This will cause all (IEEE-754) exceptions to send a SIGFPE to the
process. Crucially, the signal will be delivered at the exact point where the
floating-point operation is performed, yielding either a core dump (the default
signal action for SIGFPE) or a precise breakpoint in the debugger.
There is an implementation on the internet for OS X and the cppreference page mentions a similar solution for Windows, although I have not tested any of those.
Even the raw x86 assembly instructions that actually set the FPU
and SSE modes are very simple, v. the
glibc implementation
(slightly formatted):
int feenableexcept (int excepts) {
unsigned short int new_exc, old_exc;
unsigned int new;
excepts &= FE_ALL_EXCEPT;
/* Get the current control word of the x87 FPU. */
__asm__ ("fstcw %0" : "=m" (*&new_exc));
old_exc = (~new_exc) & FE_ALL_EXCEPT;
new_exc &= ~excepts;
__asm__ ("fldcw %0" : : "m" (*&new_exc));
/* And now the same for the SSE MXCSR register. */
__asm__ ("stmxcsr %0" : "=m" (*&new));
/* The SSE exception masks are shifted by 7 bits. */
new &= ~(excepts << 7);
__asm__ ("ldmxcsr %0" : : "m" (*&new));
return old_exc;
}
Bonus
For bonus points, this can be put in a shared library so that FPE signals can
be enabled dynamically by either loading or LD_PRELOADing the
library (modulo static initialization order) without recompiling the program:
$ cat fpe.cpp
#include <fenv.h>
static int _=[]{return feenableexcept(FE_DIVBYZERO|FE_INVALID|FE_OVERFLOW);}();
$ g++ -c -fpic fpe.cpp && g++ -shared -o fpe.so fpe.o
$ cat test.cpp
int main() { return 1.0f / 0.0f; }
$ g++ -o test test.cpp
$ ./test
$ LD_PRELOAD=./fpe.so ./test
Floating point exception (core dumped)
(credits for Jonathan Dursi for the blog post that got me started with this rabbit hole)