SIGFPE
2020-05-20T21:56:52
Every programmer at one point will encounter the arcane semantics of IEEE-754 floating-point arithmetic. If for you that point is right now, allow me to list the worst offenders of the principle of least surprise:
-
Division by zero is a (semi-)valid operation that returns
±infinity
. -
Division of zero by zero returns
NaN
(not-a-number). -
Even invalid operations are also (semi-)valid operations that return
NaN
. - In what has to be considered the clear winner, despite the items above not generating an exception by default, an integer division-by-zero will generate a floating-point exception (unrelated to exceptions in languages such as C++) in POSIX-compliant operating systems — if it does generate an exception at all, which is usually the case.
Because of the semantics of integer division, it may be very surprising to the
uninitiated that nonsensical floating-point operations actually "succeed" and
control flow proceeds uninterrupted. While that might at first sound like a
good thing, NaN
s and infinity
s quickly spread through
any data that come in contact with them, because they propagate if they appear
in any of the operands of most operations (NaN + 1
yields
NaN
, infinity + 1
yields infinity
,
etc.).
Take for example the following code that attempts to find the vector of
penetration of a point p
in a sphere with center c
and radius r
(the precursor to this epopoeia):
const auto d = c - p;
if(const auto l2 = dot(d, d); l2 > r2) {
d *= r / std::sqrt(l2) - 1;
// …
}
The code itself is not very interesting other than the fact that if the point
happens to lie exactly on the circle, d
will be a vector of length
0 (the conditional was supposed to be "greater or equal"). This is
easily verifiable by observing that the division will equal 1 and the
subtraction, 0.
If subsequent operations try to use that length in a division (which of course
they did), NaN
s and infinity
will very soon appear
and spread quickly without generating an easily identifiable error.
The result is spectacular, but not ideal if one wants a stable simulation:
Enter SIGFPE
POSIX platforms have a special signal for arithmetic operation errors:
SIGFPE
(short for "floating-point exception"). If integer
division by zero generates a signal, this is the one that is generated.
Because of the aforementioned well-defined rules for IEEE-754 floating-point
arithmetic, however, this signal is not generated by default for the operations
we are interested in.
The good news is this can be changed, albeit with a bit of platform-specific
code. For Linux systems using glibc
, there is feenableexcept
:
feenableexcept(FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW);
This will cause all (IEEE-754) exceptions to send a SIGFPE
to the
process. Crucially, the signal will be delivered at the exact point where the
floating-point operation is performed, yielding either a core dump (the default
signal action for SIGFPE
) or a precise breakpoint in the debugger.
There is an implementation on the internet for OS X and the cppreference page mentions a similar solution for Windows, although I have not tested any of those.
Even the raw x86
assembly instructions that actually set the FPU
and SSE modes are very simple, v. the
glibc
implementation
(slightly formatted):
int feenableexcept (int excepts) {
unsigned short int new_exc, old_exc;
unsigned int new;
excepts &= FE_ALL_EXCEPT;
/* Get the current control word of the x87 FPU. */
__asm__ ("fstcw %0" : "=m" (*&new_exc));
old_exc = (~new_exc) & FE_ALL_EXCEPT;
new_exc &= ~excepts;
__asm__ ("fldcw %0" : : "m" (*&new_exc));
/* And now the same for the SSE MXCSR register. */
__asm__ ("stmxcsr %0" : "=m" (*&new));
/* The SSE exception masks are shifted by 7 bits. */
new &= ~(excepts << 7);
__asm__ ("ldmxcsr %0" : : "m" (*&new));
return old_exc;
}
Bonus
For bonus points, this can be put in a shared library so that FPE signals can
be enabled dynamically by either loading or LD_PRELOAD
ing the
library (modulo static initialization order) without recompiling the program:
$ cat fpe.cpp
#include <fenv.h>
static int _=[]{return feenableexcept(FE_DIVBYZERO|FE_INVALID|FE_OVERFLOW);}();
$ g++ -c -fpic fpe.cpp && g++ -shared -o fpe.so fpe.o
$ cat test.cpp
int main() { return 1.0f / 0.0f; }
$ g++ -o test test.cpp
$ ./test
$ LD_PRELOAD=./fpe.so ./test
Floating point exception (core dumped)
(credits for Jonathan Dursi for the blog post that got me started with this rabbit hole)