Two weeks ago, I came across an interesting bug. The convert() function below returns 0x80000001 when p points to 0x01, 0x00, 0x00, 0x80, but the expected return value is 0x00000001 instead.
int32_t convert(const uint8_t *restrict p) {
  uint32_t x = (                  p[0] +
                            256 * p[1] +
                      256 * 256 * p[2] +
                256 * 256 * 256 * p[3]);
  if (x > INT32_MAX) {
    return (x - INT32_MAX) - 1;
  } else {
    return (((int32_t)x + (int32_t)-INT32_MAX) - 1);
  }
}
According to the bug report, this function worked in the past but broke after the compiler toolchain was upgraded. That sounded like undefined behavior in the code, but I could not spot any integer overflow or underflow in the if-else statement (even though it looks suspicious).
Although I found the root cause by disassembling the binary, I feel this is a great example to showcase the power of Clang's Undefined Behavior Sanitizer (UBSan).
Undefined Behavior Sanitizer
Clang has a built-in Undefined Behavior Sanitizer (UBSan). UBSan instruments the code with several run-time checks and prints an error message whenever one of the checked undefined behaviors occurs.
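To give a feel for what a run-time check means here, the following is a conceptual sketch only, not the code Clang actually emits. It assumes a compiler that provides the `__builtin_mul_overflow` builtin (GCC and Clang do) and a 32-bit `int`; `checked_mul` and `check_checked_mul` are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of UBSan: multiplies two ints and reports
 * an overflow instead of silently invoking undefined behavior.
 * Returns 1 and stores the product in *out on success, 0 on overflow. */
int checked_mul(int a, int b, int *out) {
  if (__builtin_mul_overflow(a, b, out)) {
    fprintf(stderr,
            "signed integer overflow: %d * %d cannot be represented "
            "in type 'int'\n", a, b);
    return 0;
  }
  return 1;
}

/* Small self-check, assuming a 32-bit int. */
void check_checked_mul(void) {
  int r = 0;
  assert(checked_mul(16777216, 127, &r) == 1 && r == 2130706432);
  assert(checked_mul(16777216, 128, &r) == 0); /* 2^31 does not fit */
}
```

A sanitizer places checks of roughly this kind around operations that can go wrong, which is why an instrumented binary is larger and slower than a normal build.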
To instrument a program with UBSan, add -fsanitize=undefined to the compiler options (both CFLAGS and LDFLAGS):
$ clang input.c -fsanitize=undefined
To test the convert() function, a main() function is added to input.c. It reads the user input and prints the returned value of convert():
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int32_t convert(const uint8_t *restrict p) {
  uint32_t x = (                  p[0] +
                            256 * p[1] +
                      256 * 256 * p[2] +
                256 * 256 * 256 * p[3]);
  if (x > INT32_MAX) {
    return (x - INT32_MAX) - 1;
  } else {
    return (((int32_t)x + (int32_t)-INT32_MAX) - 1);
  }
}

int main() {
  uint32_t value;
  uint8_t buf[sizeof(uint32_t)];
  while (scanf("%" SCNx32, &value) == 1) {
    memcpy(buf, &value, sizeof(buf));
    printf("%08" PRIx32 "\n", convert(buf));
  }
  return 0;
}
Then, compile the program with clang -fsanitize=undefined:
$ clang input.c -fsanitize=undefined
Run the executable and enter 00000000 and 80000001:
$ ./a.out
00000000
80000000
80000001
input.c:10:33: runtime error: signed integer overflow: 16777216 * 128 cannot be represented in type 'int'
00000001
In response to the first input 00000000, the program prints the expected 80000000. However, when 80000001 is entered, UBSan detects an error and prints an error message. It points out the signed integer overflow in 256 * 256 * 256 * p[3].
This error message deserves some elaboration. p[3] is a uint8_t, i.e. an unsigned char. Before the multiplication, it is promoted to a signed int holding a value from 0 to 255. This signed int is then multiplied by 256 * 256 * 256, which is itself a signed int (16777216). With a 32-bit int, the product overflows whenever p[3] is 128 or larger. According to the C and C++ standards, signed integer overflow is undefined behavior.
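The promotion can be observed directly. The sketch below assumes a C11 compiler (for `_Generic`) and a 32-bit `int`; `TYPE_NAME`, `type_of_original_term`, `high_byte_overflows_int` and `check_promotion` are hypothetical names introduced for illustration. The exact product is computed in 64 bits, so the sketch itself never overflows.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Names the type of an expression without evaluating it.
 * Only the types relevant to this sketch are listed. */
#define TYPE_NAME(e) _Generic((e), \
    int: "int",                    \
    unsigned int: "unsigned int",  \
    default: "other")

/* Type of the problematic term: the uint8_t operand is promoted to int,
 * so the whole product has type int, not uint32_t. */
const char *type_of_original_term(void) {
  uint8_t b = 1;
  (void)b; /* only used in an unevaluated context below */
  return TYPE_NAME(256 * 256 * 256 * b);
}

/* Would 256 * 256 * 256 * b exceed INT_MAX? Computed in int64_t so that
 * the check itself is free of overflow. */
int high_byte_overflows_int(uint8_t b) {
  int64_t exact = INT64_C(16777216) * b;
  return exact > INT_MAX;
}

/* Small self-check, assuming a 32-bit int. */
void check_promotion(void) {
  assert(type_of_original_term()[0] == 'i'); /* "int" */
  assert(!high_byte_overflows_int(127));     /* 2130706432 still fits */
  assert(high_byte_overflows_int(128));      /* 2147483648 does not */
}
```

Under these assumptions, every input whose most significant byte has its top bit set (0x80 to 0xff) triggers the overflow, which is exactly the half of the input range that the then block was meant to handle.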
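In fact, some Clang optimizations exploit this undefined behavior. In a program without signed integer overflow, the sum assigned to x can never exceed INT32_MAX, so the optimizer treats x > INT32_MAX as always false and removes the then block of the if-else statement. Clang generates the following assembly for the ARM architecture: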
; clang -target armv7-linux-gnueabi -mthumb -S -O2 input.c
ldr r0, [r0]
orr r0, r0, #-2147483648
bx lr
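Read back as C, those three instructions amount to the sketch below. Only the else branch survives, and for any x in the range 0 to INT32_MAX, subtracting 2^31 produces the same bit pattern as setting the top bit, which is why a single orr is enough. `convert_as_compiled` and `check_compiled_behavior` are hypothetical names; the sketch assumes a little-endian target and returns the raw 32-bit pattern as uint32_t to keep the comparison simple.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: a C rendering of the three instructions above. */
uint32_t convert_as_compiled(const uint8_t *p) {
  /* ldr r0, [r0]: load four bytes as one little-endian word */
  uint32_t x = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
               ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
  /* orr r0, r0, #-2147483648: force the top bit on */
  return x | UINT32_C(0x80000000);
}

/* Small self-check using the two inputs from the session above. */
void check_compiled_behavior(void) {
  const uint8_t zero[4] = {0x00, 0x00, 0x00, 0x00};
  const uint8_t bad[4] = {0x01, 0x00, 0x00, 0x80};
  /* Matches the intended result. */
  assert(convert_as_compiled(zero) == UINT32_C(0x80000000));
  /* The reported bug: 0x80000001 instead of the expected 0x00000001. */
  assert(convert_as_compiled(bad) == UINT32_C(0x80000001));
}
```

This reproduces the value from the bug report: when the top bit of the input is already set, the orr leaves it untouched and the input comes back unchanged.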
There are several ways to avoid this undefined behavior. The simplest solution is to cast each byte to uint32_t and replace the multiplication expressions with more idiomatic shift expressions, so that all of the arithmetic is done on unsigned values:
int32_t convert(const uint8_t *restrict p) {
  uint32_t x = (((uint32_t)p[0]       ) |
                ((uint32_t)p[1] <<  8u) |
                ((uint32_t)p[2] << 16u) |
                ((uint32_t)p[3] << 24u));
  if (x > INT32_MAX) {
    return (x - INT32_MAX) - 1;
  } else {
    return (((int32_t)x + (int32_t)-INT32_MAX) - 1);
  }
}
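As one of the other possible fixes, the multiplications can be kept as long as they are carried out in an unsigned type, where arithmetic wraps instead of overflowing. The sketch below is an illustration rather than the fix that was actually applied; `convert_unsigned_mul` and `check_convert_unsigned_mul` are hypothetical names, and the code assumes that `unsigned int` is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical variant of convert(): the constants are unsigned, so each
 * promoted byte is converted to an unsigned type before multiplying and
 * no signed overflow can occur. */
int32_t convert_unsigned_mul(const uint8_t *p) {
  uint32_t x = (p[0] +
                UINT32_C(256) * p[1] +
                UINT32_C(65536) * p[2] +
                UINT32_C(16777216) * p[3]);
  if (x > INT32_MAX) {
    return (x - INT32_MAX) - 1;
  } else {
    return (((int32_t)x + (int32_t)-INT32_MAX) - 1);
  }
}

/* Small self-check with the inputs discussed in this post. */
void check_convert_unsigned_mul(void) {
  const uint8_t zero[4] = {0x00, 0x00, 0x00, 0x00};
  const uint8_t bad[4] = {0x01, 0x00, 0x00, 0x80};
  const uint8_t ones[4] = {0xff, 0xff, 0xff, 0xff};
  assert(convert_unsigned_mul(zero) == INT32_MIN);
  assert(convert_unsigned_mul(bad) == 1); /* the case from the bug report */
  assert(convert_unsigned_mul(ones) == INT32_MAX);
}
```

The shift version is still easier to read, since the bit position of each byte is visible at a glance.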
Conclusion
Undefined behaviors are dangerous. Every C/C++ programmer must avoid them at all costs. However, some undefined behaviors are subtle and difficult to spot. Undefined Behavior Sanitizer (UBSan) helps programmers find undefined behaviors in their programs. Add -fsanitize=undefined to the compiler options if you are investigating a suspected miscompilation or debugging a program that used to work.