Hisseo

Hisseo is a collaborative research project to explore the issues related to the compilation of floating-point programs.

Abstract

Thanks to widespread hardware support, IEEE 754 floating-point computations are increasingly finding their way into embedded C code, and in particular into critical embedded C code.

Floating-point computations are notorious for the traps they lay on the programmer's path. The programmer who uses them should be aware that the result of a floating-point operation is (depending on the rounding mode and on the operation itself) only one of the real numbers that can be represented as a floating-point number, one that approximates more or less closely the actual result (as computed in the reals). In the worst cases, these approximations may accumulate and compound each other to the point that the final result obtained with floating-point computations is meaningless.

The Hisseo project (which started in October 2008) does not intend to tackle these issues, which are being addressed elsewhere. The Hisseo project will focus on the problems related to the treatment of floating-point computations in the compilation process, especially in the case of the compilation of critical C code.

Trusting compiled floating-point computations?

The compilation of IEEE 754 computations in a C program was not defined unambiguously until the C99 standard, and as of 2008, many compilers still implement the loose semantics of the previous standard. The previous standard allows reordering floating-point operations as if they were associative, when they are not; the C99 standard mandates left-to-right ordering. Compiler makers focus on performance, and are not in a hurry to implement a subpart of the standard that removes optimization opportunities.

This situation is unlikely to improve quickly, because the majority of programmers are not concerned with details at this level.

Instruction sets of several popular architectures create performance penalties for trying to do the right thing. Processor designers try to get an edge over the competition by introducing "improved" floating-point instructions in their architectures. Architectures evolve very slowly (new models of processors come out all the time, but they have to be compatible with previous models). The historical Intel 8087 floating-point coprocessor introduced an 80-bit format that causes "double rounding" problems for the float and double C types. The PowerPC introduced a Floating Multiply-Add instruction which does not always produce the same result as a multiplication followed by an addition.

The processor designers mean well! 80 bits is more precise than 32 or 64. The fmadd instruction introduces less rounding than a multiplication followed by an addition. But, because it is impossible to guess whether the compiler will be able to keep the results in 80-bit registers or use the special instructions, it becomes impossible to predict what the binary will compute by looking only at the source code.

The choice of a compiler for critical embedded C code is already quite constrained as it is. It is not always feasible to choose the compiler that implements the C99 version of the standard with respect to floating-point computations, even assuming that one can be found.

To be acceptable to producers of critical C code, a solution may need to take the reverse-engineering path, and obtain the necessary information from the output of the compiler.

Three complementary solutions

The Hisseo project will be the opportunity to study three different approaches to the problem of trusting the compiled floating-point computations. The three solutions will face some of the same challenges and will also benefit from each other's results.

Making a correct compiler
In this subtask, formal semantics for floating-point computations will be defined and put to use in the formal verification of a C compiler.
Analyzing the assembly code
At the assembly level, almost all the sources of non-determinism in the computations have disappeared, but some of the information regarding the structure of the program remains. This is the ideal point at which to verify, by static analysis, the translation done by an off-the-shelf C compiler.
Testing the uninstrumented binary
A proven approach to the verification of run-time properties is to insert sanity checks within the source code. Unfortunately, in the case of floating-point computations, any modification of the source code may change other, unrelated computations because optimization opportunities may appear or disappear with the modifications. Hence, the tested binary computes differently from the binary finally executed. The tests may fail to find a problem that appears only in the latter.
One way to improve the trustworthiness of tests is to make sure that the tested binary is the same as the finally executed binary.

The C compilation chain

In green, the places in which the Hisseo project plans to intervene.