Debug Multithreaded Program with GDB

GDB is a powerful debugger. Without a doubt, we can debug multi-threaded programs with gdb. In particular, we can switch between threads, inspect the stack, and dump the registers. In this post, I would like to start with the Dining Philosophers Problem, which was first coined by Edsger Dijkstra.

Here is the problem statement for Dining Philosophers Problem:

Several philosophers are sitting in a round table. There is exactly one fork between each pair of neighboring philosophers. Only if a philosopher grabs both forks at his left and right, can he start eating. Can we guarantee that all of the philosophers can finish their dishes eventually?

This is the first attempt to solve the problem:

#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

constexpr unsigned NUM_PHILOSOPHERS = 3;

std::mutex forks[NUM_PHILOSOPHERS];

void eat(unsigned idx) {
  unsigned first = idx;
  unsigned second = (idx + 1) % NUM_PHILOSOPHERS;

  // Grab the forks.
  std::lock_guard<std::mutex> first_fork(forks[first]);
  std::this_thread::sleep_for(std::chrono::seconds(1));
  std::lock_guard<std::mutex> second_fork(forks[second]);

  printf("Philosopher %u is eating.\n", idx);
  std::this_thread::sleep_for(std::chrono::seconds(1));
}

int main() {
  std::vector<std::thread> philosophers(NUM_PHILOSOPHERS);

  for (unsigned i = 0; i < NUM_PHILOSOPHERS; ++i) {
    philosophers[i] = std::thread(eat, i);
  }

  for (unsigned i = 0; i < NUM_PHILOSOPHERS; ++i) {
    philosophers[i].join();
  }
}

As we can see, there are two std::lock_guard in eat(), which guarantees that the dining philosopher has grabbed two forks. Besides, the forks will be returned to the table after the philosopher finishes his dish.

Let's compile this code snippet:

$ g++ -std=c++11 -pthread philosopher.cpp

And then, with high probability, the program will hang forever:

$ ./a.out
^C

Let's debug it with gdb. First, we have to re-compile the program with -g option. Second, we have to load the program with gdb:

# Recompile the program with debug option.
$ g++ -std=c++11 -pthread -g philosopher.cpp

# Launch the debugger.
$ gdb --args ./a.out
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
... skipped ...
Reading symbols from ./a.out...done.
(gdb)

Let's start our program with run:

(gdb) run
Starting program: /home/logan/a.out
... skipped ...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff6fd6700 (LWP 17406)]
[New Thread 0x7ffff67d5700 (LWP 17407)]
[New Thread 0x7ffff5fd4700 (LWP 17408)]

Apparently, the program get stuck again. Let's interrupt the program with Ctrl-C:

^C
Program received signal SIGINT, Interrupt.
0x00007ffff76ab66b in pthread_join (threadid=140737337190144,
    thread_return=0x0) at pthread_join.c:92
92  pthread_join.c: No such file or directory.
(gdb)

We get back to the gdb command line again. Let's see the backtrace of current thread:

(gdb) bt
#0  0x00007ffff76ab66b in pthread_join (threadid=140737337190144,
    thread_return=0x0) at pthread_join.c:92
#1  0x00007ffff7b87837 in std::thread::join() ()
   from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#2  0x00000000004010c4 in main () at philosopher.cpp:31
(gdb) up 2
#2  0x00000000004010c4 in main () at philosopher.cpp:31
31      philosophers[i].join();
(gdb) l
26    for (unsigned i = 0; i < NUM_PHILOSOPHERS; ++i) {
27      philosophers[i] = std::thread(eat, i);
28    }
(gdb) p i
$1 = 0

It seems that the main thread is waiting for the philosophers to finish their dishes, which is the expected behavior. Let's have a look on the other threads:

(gdb) info threads
  Id   Target Id         Frame
  4    Thread 0x7ffff5fd4700 (LWP 17408) "a.out" __lll_lock_wait ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
  3    Thread 0x7ffff67d5700 (LWP 17407) "a.out" __lll_lock_wait ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
  2    Thread 0x7ffff6fd6700 (LWP 17406) "a.out" __lll_lock_wait ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
* 1    Thread 0x7ffff7fcf780 (LWP 17402) "a.out" 0x00007ffff76ab66b in
       pthread_join (threadid=140737337190144, thread_return=0x0) at
       pthread_join.c:92

We can list all threads with info threads. Each thread will be associated with an Id. The thread marked with the asterisk is the thread we are inspecting currently. From the output of info threads, it is obvious that all of the philosophers are waiting to grab another lock, i.e. they are trying to grab a fork which was grabbed by the other. It seems to be a deadlock.

Let's have a look on the philosopher threads. We can switch to those threads with thread [Id] command.

(gdb) thread 2
[Switching to thread 2 (Thread 0x7ffff6fd6700 (LWP 17406))]
#0  __lll_lock_wait ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135

After switching the threads, we can check their backtrace with bt and dump their registers with info registers.

(gdb) bt
#0  __lll_lock_wait ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
... skipped ...
#5  0x00000000004015b2 in std::lock_guard<std::mutex>::lock_guard (
    this=0x7ffff6fd5e00, __m=...) at /usr/include/c++/4.8/mutex:414
#6  0x0000000000400f87 in eat (idx=0) at philosopher.cpp:17
... skipped ...
---Type <return> to continue, or q <return> to quit---q
 at pthread_create.c:Quit

It seems that there are deadlocks in our eat() function. Let's have a closer look:

(gdb) up 6
#6  0x0000000000400f87 in eat (idx=0) at philosopher.cpp:17
17    std::lock_guard<std::mutex> second_fork(forks[second]);
(gdb) l
12    unsigned second = (idx + 1) % NUM_PHILOSOPHERS;
13
14    // Grab the forks.
15    std::lock_guard<std::mutex> first_fork(forks[first]);
16    std::this_thread::sleep_for(std::chrono::seconds(1));
17    std::lock_guard<std::mutex> second_fork(forks[second]);
18
19    printf("Philosopher %u is eating.\n", idx);
20    std::this_thread::sleep_for(std::chrono::seconds(1));
21  }

OK. Now we know that thread 2 is waiting on line 17, which is trying to grab the second fork.

After inspecting the other threads, it will be clear that every philosophers grabbed their first fork but they couldn't grab their second fork. It is a classical deadlock caused by circular waiting.

We can break the tie by changing following lines:

unsigned first = idx;
unsigned second = (idx + 1) % NUM_PHILOSOPHERS;

to:

unsigned first, second;
if (idx == NUM_PHILOSOPHERS - 1) {
  first = 0;
  second = idx;
} else {
  first = idx;
  second = idx + 1;
}

In the other words, the last philosopher will grab the fork in a different order. There will be no circular waiting after implementing this strategy. The program can run without problems now!

$ ./a.out
Philosopher 1 is eating.
Philosopher 0 is eating.
Philosopher 2 is eating.

Conclusion

To sum up, there are two important gdb commands that can be used to debug multithreaded programs:

  1. info threads -- This command will list all threads and print the thread Ids.
  2. thread [id] -- This command will switch the context to another thread, so that thread-specific commands such as bt and info registers can work without problems.

That's all. Hope you enjoy this post. See you.