Find Undefined Behavior with Clang UBSan

Two weeks ago, I came across an interesting bug. The convert() function below returns 0x80000001 when p points to the bytes 0x01, 0x00, 0x00, 0x80, but the expected return value is 0x00000001.

int32_t convert(const uint8_t *restrict p) {
  uint32_t x = (                  p[0] +
                256 *             p[1] +
                256 * 256 *       p[2] +
                256 * 256 * 256 * p[3]);

  if (x > INT32_MAX) {
    return (x - INT32_MAX) - 1;
  } else {
    return (((int32_t)x + (int32_t)-INT32_MAX) - 1);
  }
}

According to the bug report, this function used to work but broke after the compiler toolchain was upgraded. That sounds like undefined behavior in the code, but I could not spot any integer overflow or underflow in the if-else statement (even though it looks suspicious).

Although I found the root cause by disassembling the binary, I feel this is a great example to showcase the power of Clang Undefined Behavior Sanitizer (UBSan).

Undefined Behavior Sanitizer

Clang has a built-in Undefined Behavior Sanitizer (UBSan). UBSan instruments the input source code with several run-time checks and prints an error message when undefined behavior occurs.
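To give a feel for what a run-time check means here, the sketch below hand-writes a check around a single int multiplication. It is only an illustration and not how UBSan is actually implemented: checked_mul and ubsan_sketch_reports are hypothetical names, the message merely imitates UBSan's wording, and __builtin_mul_overflow is a builtin that assumes Clang or GCC.

```c
#include <stdio.h>

/* Hypothetical counter: how many times the hand-written check has fired. */
unsigned ubsan_sketch_reports = 0;

/* Hypothetical stand-in for an instrumented `a * b` on ints.
 * __builtin_mul_overflow (Clang/GCC) stores the wrapped product in `result`
 * and returns true when the exact product does not fit in an int. */
int checked_mul(int a, int b) {
  int result;
  if (__builtin_mul_overflow(a, b, &result)) {
    fprintf(stderr,
            "runtime error: signed integer overflow: "
            "%d * %d cannot be represented in type 'int'\n", a, b);
    ++ubsan_sketch_reports;
  }
  /* Like UBSan's default mode, report the problem and keep running. */
  return result;
}
```

Calling checked_mul(256, 256) returns 65536 silently, while checked_mul(16777216, 128) prints a report on a platform with a 32-bit int.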

To instrument a program with UBSan, add -fsanitize=undefined to the compiler options (both CFLAGS and LDFLAGS):

$ clang input.c -fsanitize=undefined

To test the convert() function, a main() function is added to input.c. It reads the user input and prints the returned value of convert():

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int32_t convert(const uint8_t *restrict p) {
  uint32_t x = (                  p[0] +
                256 *             p[1] +
                256 * 256 *       p[2] +
                256 * 256 * 256 * p[3]);

  if (x > INT32_MAX) {
    return (x - INT32_MAX) - 1;
  } else {
    return (((int32_t)x + (int32_t)-INT32_MAX) - 1);
  }
}

int main() {
  uint32_t value;
  uint8_t buf[sizeof(uint32_t)];
  while (scanf("%" SCNx32, &value) == 1) {
    memcpy(buf, &value, sizeof(buf));
    printf("%08" PRIx32 "\n", convert(buf));
  }
  return 0;
}

Then, compile the program with clang -fsanitize=undefined:

$ clang input.c -fsanitize=undefined

Run the executable and enter 00000000 and 80000001:

$ ./a.out
00000000
80000000
80000001
input.c:10:33: runtime error: signed integer overflow: 16777216 * 128 cannot be represented in type 'int'
00000001

In response to the first input 00000000, the program prints the expected 80000000. However, when 80000001 is entered, UBSan detects an error and prints an error message. It points out the signed integer overflow in 256 * 256 * 256 * p[3].

This error message deserves some elaboration. p[3] is a uint8_t (an unsigned char), so before the multiplication it is promoted to a signed int with a value between 0 and 255. That signed int is then multiplied by 256 * 256 * 256, which is also a signed int. With a 32-bit int, the product exceeds INT_MAX whenever p[3] is 128 or greater, which is a signed integer overflow. According to the C and C++ standards, signed integer overflow is undefined behavior.
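The promotion can be checked directly. The sketch below is an illustration that assumes a C11 compiler and a 32-bit int; HAS_TYPE_INT and the three helper functions are hypothetical names introduced for this example. _Generic inspects the type of an expression without evaluating it, so the probe itself cannot overflow.

```c
#include <limits.h>
#include <stdint.h>

/* Evaluates to 1 when the expression has type int, 0 otherwise.
 * The operand of _Generic is never evaluated. */
#define HAS_TYPE_INT(e) _Generic((e), int: 1, default: 0)

/* Is the last term of the original sum computed in signed int? */
int last_term_is_int(uint8_t b) {
  return HAS_TYPE_INT(256 * 256 * 256 * b);
}

/* Is the same term still a signed int once the byte is cast to uint32_t? */
int cast_term_is_int(uint8_t b) {
  return HAS_TYPE_INT((uint32_t)b << 24);
}

/* Does the mathematically exact product exceed INT_MAX?
 * The multiplication is done in int64_t, so it cannot overflow here. */
int last_term_exceeds_int_max(uint8_t b) {
  return (int64_t)256 * 256 * 256 * b > INT_MAX;
}
```

Under these assumptions, last_term_is_int() returns 1 for every byte, cast_term_is_int() returns 0, and last_term_exceeds_int_max() returns 1 exactly when the byte is 0x80 or greater.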

In fact, some Clang optimizations exploit this undefined behavior and remove the then block of the if-else statement. Clang generates the following assembly for the ARM architecture:

; clang -target armv7-linux-gnueabi -mthumb -S -O2 input.c
ldr r0, [r0]
orr r0, r0, #-2147483648
bx  lr
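For readers who prefer C to assembly, here is a rough rendering of those three instructions. Once the compiler assumes the signed arithmetic never overflows, the sum is a non-negative int, so x > INT32_MAX can never be true and only the else block remains. Subtracting 2^31 from a value whose top bit is clear is the same as setting that bit. The name convert_as_optimized is hypothetical, and the sketch assumes a little-endian target and a compiler such as Clang or GCC, where converting an out-of-range uint32_t to int32_t wraps.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical C rendering of the optimized ARM code; not compiler output. */
int32_t convert_as_optimized(const uint8_t *p) {
  uint32_t x;
  memcpy(&x, p, sizeof x);    /* ldr r0, [r0]  (little-endian 32-bit load) */
  x |= UINT32_C(0x80000000);  /* orr r0, r0, #-2147483648 */
  /* bx lr: return the bit pattern. The conversion is implementation-defined
   * in C; Clang and GCC wrap modulo 2^32. */
  return (int32_t)x;
}
```

For the bytes 0x01, 0x00, 0x00, 0x80 this returns the bit pattern 0x80000001, which matches the bug report.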

There are several ways to avoid this undefined behavior. The simplest solution is to cast each byte to uint32_t and replace the multiplication expressions with more idiomatic shift expressions, so that all of the arithmetic is done on unsigned integers:

int32_t convert(const uint8_t *restrict p) {
  uint32_t x = (((uint32_t)p[0]       ) |
                ((uint32_t)p[1] <<  8u) |
                ((uint32_t)p[2] << 16u) |
                ((uint32_t)p[3] << 24u));

  if (x > INT32_MAX) {
    return (x - INT32_MAX) - 1;
  } else {
    return (((int32_t)x + (int32_t)-INT32_MAX) - 1);
  }
}
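Another of those ways, sketched here as an illustration, keeps the multiplications but makes the leading constant unsigned. The name convert_unsigned_mul is hypothetical, and the sketch assumes a 32-bit int, so that uint32_t is unsigned int and each promoted byte is converted to unsigned before it is multiplied.

```c
#include <stdint.h>

/* Hypothetical variant of convert(): same logic, unsigned multiplication. */
int32_t convert_unsigned_mul(const uint8_t *restrict p) {
  /* UINT32_C(256) is unsigned, and multiplication associates left to right,
   * so every product below is computed in unsigned arithmetic. */
  uint32_t x = (                            p[0] +
                UINT32_C(256) *             p[1] +
                UINT32_C(256) * 256 *       p[2] +
                UINT32_C(256) * 256 * 256 * p[3]);

  if (x > INT32_MAX) {
    /* x - 2^31 is in [0, INT32_MAX], so the conversion to int32_t is exact. */
    return (x - INT32_MAX) - 1;
  } else {
    /* x is in [0, INT32_MAX], so the result is in [INT32_MIN, -1]. */
    return (((int32_t)x + (int32_t)-INT32_MAX) - 1);
  }
}
```

With this variant, the bytes 0x01, 0x00, 0x00, 0x80 produce the expected 0x00000001, and no signed multiplication is left for UBSan to complain about.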

Conclusion

Undefined behavior is dangerous. Every C/C++ programmer must avoid it at all costs. However, some undefined behavior is subtle and difficult to spot. Undefined Behavior Sanitizer (UBSan) helps programmers find undefined behavior in their programs. Add -fsanitize=undefined to the compiler options if you are investigating a suspected miscompilation or debugging a program that used to work.