LLVM Bugpoint

In this post, I would like to introduce bugpoint, a command-line tool for automatic test case reduction that helps us generate a minimal test case.

As a compiler developer, the first step in debugging is to create a minimal test case that still reproduces the bug. Unfortunately, preprocessed C++ source code often runs to more than 10,000 lines, and to make the test case understandable we have to cut it down to fewer than 100 lines. To be honest, this is a tedious task that I don't like doing by hand. Fortunately, bugpoint can automate it.

Convert to LLVM Assembly

Bugpoint is a reduction tool for LLVM assembly. In other words, it takes LLVM assembly as input and generates LLVM bitcode as output. Thus, we first have to convert the C/C++ program to LLVM assembly with clang. The easiest way is to replace -emit-obj with -emit-llvm in the cc1 invocation command. For example:

$ clang -cc1 -emit-llvm input.cpp  # ... other options ...
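If you do not have the cc1 command at hand, the clang driver prints it (without running it) when given the -### option. The following is a minimal sketch of the flag swap described above; swap_emit_flag is a hypothetical helper name, and it only rewrites the text of a command line that it reads from standard input.

```shell
#!/bin/bash

# swap_emit_flag is a hypothetical helper, not part of clang or LLVM.
# It reads a command line on stdin and replaces the first -emit-obj
# with -emit-llvm, leaving every other option untouched.
swap_emit_flag() {
  sed 's/-emit-obj/-emit-llvm/'
}

# Example (commented out so the sketch has no side effects).
# clang -### writes the commands to stderr, hence the 2>&1:
# clang -### -c input.cpp 2>&1 | grep -- '-cc1' | swap_emit_flag
```

The printed command still names the original object file after -o, so the output path has to be adjusted by hand (for example to input.ll) before running it.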

If clang crashes in this step, you are probably facing a front-end bug. In that case, you may wish to use C-Reduce or another tool that works directly on C/C++ source code.

If the output input.ll is generated without any problems, then we can continue with the llc command (which generates either a machine assembly file or a relocatable object file):

$ llc input.ll  # ... other options ... (e.g. -O3 -mtriple=...)

The llc command should crash in this step. If it does not, try adding common optimization flags such as -O3 to the command line.
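To tell a real crash apart from an ordinary error, it can help to look at the exit status. The following is a minimal sketch, assuming the usual shell convention that a process killed by a signal (for example an assertion failure that aborts) is reported with a status above 128; classify_status is a hypothetical helper name.

```shell
#!/bin/bash

# classify_status is a hypothetical helper, not part of LLVM.
# It runs the given command, hides its output, and prints one word:
#   ok     - exit status 0
#   error  - non-zero exit status up to 128 (a diagnosed error)
#   crash  - exit status above 128 (by convention, killed by a signal)
classify_status() {
  local ret=0
  "$@" > /dev/null 2>&1 || ret="$?"
  if [ "${ret}" -eq 0 ]; then
    echo "ok"
  elif [ "${ret}" -gt 128 ]; then
    echo "crash"
  else
    echo "error"
  fi
}

# Example (commented out so the sketch has no side effects):
# classify_status llc -O3 input.ll
```

If this reports ok, the bug is not reproduced yet and the llc options need another look before moving on to bugpoint.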

Reduce the Test Case

Now we can reduce the test case with the bugpoint command. Since I am cross-compiling the source code in this case, I am using -llc-safe to test the compiler without the interpreter. In addition, the arguments to be passed to llc can be specified with the -safe-tool-args option.

$ bugpoint input.ll -llc-safe -safe-tool-args -mtriple=armv7-linux-gnueabi

If everything goes well, then bugpoint-reduced-simplified.bc will be created. You can disassemble the output file with:

$ llvm-dis bugpoint-reduced-simplified.bc

The output, bugpoint-reduced-simplified.ll, is the reduced test case.

Reduce the Test Case with Custom Compile Script

You may wish to customize the compiler pipeline to reproduce the bug. To do so, use the -compile-custom option instead and specify the test script with -compile-command. For example,

$ bugpoint input.ll -compile-custom -compile-command ./test.sh

Here's the test script:

#!/bin/bash

# Create a temporary file for the test command
logfile="$(mktemp)"

# Run your test command (and redirect the output messages)
llc "$@" > "${logfile}" 2>&1
ret="$?"

# Print messages when error occurs
if [ "${ret}" != 0 ]; then
  echo "test failed"  # must print something on failure
  cat "${logfile}"
fi

# Cleanup the temporary file
rm "${logfile}"

exit "${ret}"

Note

The test script MUST print some message when the command fails, and it should not print anything when the command succeeds. Otherwise, the bugpoint command won't work properly.
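Before handing a script to bugpoint, it may be worth checking this rule by hand. The following is a minimal sketch; check_contract is a hypothetical helper name, and it only checks the two conditions stated in the note (silent on success, some output on failure).

```shell
#!/bin/bash

# check_contract is a hypothetical helper, not part of bugpoint.
# It runs the given command, captures stdout and stderr together,
# and returns 0 only if the command follows the rule from the note.
check_contract() {
  local out ret=0
  out="$("$@" 2>&1)" || ret="$?"
  if [ "${ret}" -eq 0 ] && [ -n "${out}" ]; then
    echo "violation: succeeded but printed output"
    return 1
  fi
  if [ "${ret}" -ne 0 ] && [ -z "${out}" ]; then
    echo "violation: failed without printing anything"
    return 1
  fi
  echo "contract ok (exit status ${ret})"
}

# Example (commented out so the sketch has no side effects):
# check_contract ./test.sh input.ll
```

Running it once on the original input (which should fail loudly) and once on a file that compiles cleanly (which should stay silent) covers both halves of the rule.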

Strip the Symbols

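Sometimes the reduced file still contains long symbol names and dead function declarations. We can strip them with opt (the flags below use the legacy pass syntax; recent LLVM releases with the new pass manager should accept -passes=strip,strip-dead-prototypes instead):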

$ opt -S -strip -strip-dead-prototypes \
    bugpoint-reduced-simplified.ll > strip.ll

Conclusion

After these steps, we should have a minimal test case that is suitable for debugging. We can then find the exact pass causing the problem with:

$ opt -print-before-all -print-after-all -O2 strip.ll > debug.txt 2>&1
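When opt crashes, one way to read debug.txt is to look at the last dump banner, since the pass named there was the one about to run (or the last one to finish) when opt stopped. The following is a minimal sketch, assuming the banners contain the text "IR Dump", which may vary across LLVM versions; last_ir_dump is a hypothetical helper name.

```shell
#!/bin/bash

# last_ir_dump is a hypothetical helper, not part of LLVM.
# Assumption: every banner printed by -print-before-all and
# -print-after-all contains the text "IR Dump".
# It prints the last such banner found in the given log file.
last_ir_dump() {
  grep 'IR Dump' "$1" | tail -n 1
}

# Example (commented out so the sketch has no side effects):
# last_ir_dump debug.txt
```

For a miscompilation rather than a crash, the last banner says little; comparing the IR dumped before and after each suspicious pass is the more useful approach there.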

In this post, I have introduced the basic usage of bugpoint for reducing test cases for code generation bugs. With bugpoint we can automate the test case reduction process, and as creative programmers we can focus on more challenging tasks. For further information, please refer to How to submit an LLVM bug report.