Thoughts on the Development of the PyPy RISC-V JIT Backend

After eight years of development, my patch implementing a RISC-V JIT backend for PyPy was merged upstream in 2024. The work is included in PyPy v7.3.17, released in August of that year. I decided to write this article to share the journey and what I learned along the way.

A brief introduction to PyPy and RPython

PyPy is a Python implementation that uses a Just-in-Time (JIT) compiler to accelerate Python programs. Its benchmarks show a speedup of roughly 4.4 times over CPython.

PyPy is a Python interpreter implemented in RPython, a statically typed subset of the Python programming language. The RPython toolchain can compile RPython code to native machine code, enabling direct execution on hardware. However, the compiled PyPy interpreter on its own, without the JIT, is still slower than CPython.

To achieve performance superior to CPython, PyPy employs a Meta-Tracing Just-In-Time (JIT) compiler. To understand this, let's break down the components:

  • Just-In-Time (JIT) Compilation: A JIT compiler compiles code fragments at runtime, a common technique in dynamic language implementations (e.g., JavaScript, Ruby). An Ahead-of-Time (AOT) compiler for a dynamic language often has to generate multiple specialized versions of the code alongside a slower, general version, which increases code size. A JIT compiler can instead generate more efficient code by specializing on the types it actually observes at runtime. (Note: AOT's code bloat can make it less competitive than interpreters or JIT compilers in some scenarios.)

  • Tracing JIT vs. Method JIT: JIT compilers can be categorized by the scope of code they compile. Tracing JITs use an interpreter to identify frequently executed code sequences or loops (hot spots) and then compile these hot spots into machine code. Method JITs, on the other hand, compile entire methods. PyPy/RPython uses a Tracing JIT. Method JITs are often said to offer greater optimization potential because of their larger scope, but I believe this isn't always the case in practice. Performance depends heavily on the specific benchmarks and on the implementation quality of the Tracing or Method JIT in question, which is a topic for another discussion.

  • Meta-Tracing JIT: This is a variant of Tracing JIT. An interpreter typically consists of a single interpreter loop and several opcode handlers. If a Tracing JIT were to trace the interpreter directly, the opcode handlers themselves would likely become hot spots, albeit short ones. Even in the best case, this would only achieve performance comparable to a C/C++ implementation of the interpreter, failing to leverage the type specialization benefits of JIT compilation.

    Meta-Tracing JIT addresses this by tracing the interpreted program rather than the interpreter itself. This is achieved by annotating the interpreter. These annotations allow the Meta-Tracing JIT to identify and compile hot spots within the executed program's logic. The PyPy interpreter is, of course, annotated. Meta-Tracing JIT is a core technology of PyPy/RPython. For further details, see Tracing the meta-level: PyPy's tracing JIT compiler.
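To make the idea of "hot spots within the executed program" concrete, here is a minimal sketch in plain Python. It is not RPython and does not use PyPy's real annotation API; the toy bytecode, the `run` function, and `HOT_THRESHOLD` are all names I made up for illustration. The point is only that the counter is keyed by a position in the interpreted program, not by a location in the interpreter.

```python
from collections import Counter

HOT_THRESHOLD = 3  # hypothetical value, kept small for the demo


def run(program, acc):
    """Interpret a toy bytecode and report which loop headers became hot.

    `program` is a list of (opcode, argument) pairs. Hotness is counted per
    program counter of the *interpreted* program, which is the kind of
    information interpreter annotations give a meta-tracing JIT.
    """
    pc = 0
    back_jumps = Counter()
    hot_loops = []
    while pc < len(program):
        op, arg = program[pc]
        if op == "DEC":
            acc -= 1
            pc += 1
        elif op == "JUMP_IF_NONZERO":
            if acc != 0:
                if arg <= pc:  # a backward jump closes a loop
                    back_jumps[arg] += 1
                    if back_jumps[arg] == HOT_THRESHOLD:
                        # A real tracing JIT would start recording a trace here.
                        hot_loops.append(arg)
                pc = arg
            else:
                pc += 1
        else:
            raise ValueError("unknown opcode: %r" % (op,))
    return acc, hot_loops


# A countdown loop: decrement until the accumulator reaches zero.
countdown = [("DEC", None), ("JUMP_IF_NONZERO", 0)]
result, hot = run(countdown, 10)
```

With a starting value of 10 the loop header at position 0 crosses the threshold and is reported as hot; with a starting value of 2 the loop finishes before it ever gets there.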

RISC-V JIT backend

My work focused on the JIT backend, which is responsible for compiling the hot spots identified by meta-tracing into machine code. Porting a JIT backend to a new platform typically follows a standard procedure.

The RPython JIT backend is structured as a two-pass compiler: the first pass allocates machine registers, and the second translates RPython intermediate opcodes into native machine code. Although some utility functions are shared across backends, their internal organization differs significantly. My porting effort was based on the ARM/AArch64 backend, with which I had prior experience. To fully grasp the semantics of the RPython opcodes, I meticulously studied the ARM/AArch64 backend's code and development logs. When the ARM/AArch64 implementation was unclear to me, I consulted the x86 backend for additional insights.
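The two-pass structure can be sketched in a few lines of Python. This is a deliberately naive toy under my own assumptions: the tuple-based trace format and the function names `allocate_registers` and `emit` are hypothetical, the operation names are only borrowed for flavor, and there is no liveness analysis or spilling, both of which a real backend needs.

```python
def allocate_registers(ops, free_regs):
    """Pass 1: assign a machine register to every variable in the trace."""
    free = list(free_regs)
    mapping = {}
    for _op, result, args in ops:
        for name in args + (result,):
            if name not in mapping:
                if not free:
                    # A real allocator would spill a value to the stack here.
                    raise RuntimeError("out of registers")
                mapping[name] = free.pop(0)
    return mapping


def emit(ops, mapping):
    """Pass 2: translate each operation into RISC-V assembly text."""
    mnemonics = {"int_add": "add", "int_sub": "sub"}
    lines = []
    for op, result, args in ops:
        lines.append("%s %s, %s, %s" % (
            mnemonics[op], mapping[result], mapping[args[0]], mapping[args[1]]))
    return lines


# Each entry is (operation, result variable, argument variables).
trace = [
    ("int_add", "i2", ("i0", "i1")),
    ("int_sub", "i3", ("i2", "i0")),
]
mapping = allocate_registers(trace, ["t0", "t1", "t2", "t3"])
asm = emit(trace, mapping)
```

Here `asm` holds one line of assembly text per operation in the trace. Separating the passes keeps the emitter simple, since by the time it runs every operand already has a location.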

During the porting process, I found Test-Driven Development (TDD) invaluable. PyPy's complexity, its numerous abstraction layers, and the informally defined nature of RPython (a Python-like language with distinct semantics), combined with the RPython toolchain's cryptic error messages, initially hindered my progress. After several years of slow progress, I adopted TDD two years ago. I began by creating a RISC-V assembler and writing comprehensive test cases to ensure its correctness. Then, following the AArch64 backend's development order, I ported RPython opcodes incrementally, enabling the corresponding test case after each implementation. This TDD approach allowed me to make steady progress and also gave me an opportunity to learn pytest and Hypothesis.
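To give a flavor of what testing an assembler looks like, here is a small self-contained sketch. It is not the code from my patch; the helper names (`encode_r_type`, `encode_i_type`, `ADD`, `ADDI`) are made up for this article. The bit layouts follow the base RISC-V instruction formats, and the expected words in the checks are the standard encodings of `add a0, a1, a2` and `addi a0, a0, 1`.

```python
def encode_r_type(opcode, rd, funct3, rs1, rs2, funct7):
    """R-type layout: funct7 | rs2 | rs1 | funct3 | rd | opcode."""
    return ((funct7 << 25) | (rs2 << 20) | (rs1 << 15)
            | (funct3 << 12) | (rd << 7) | opcode)


def encode_i_type(opcode, rd, funct3, rs1, imm):
    """I-type layout: imm[11:0] | rs1 | funct3 | rd | opcode."""
    if not -2048 <= imm <= 2047:
        raise ValueError("immediate does not fit in 12 signed bits")
    return (((imm & 0xFFF) << 20) | (rs1 << 15)
            | (funct3 << 12) | (rd << 7) | opcode)


# Register numbers of a0..a2 (x10..x12).
A0, A1, A2 = 10, 11, 12


def ADD(rd, rs1, rs2):
    return encode_r_type(0x33, rd, 0b000, rs1, rs2, 0b0000000)


def ADDI(rd, rs1, imm):
    return encode_i_type(0x13, rd, 0b000, rs1, imm)


# Checks in the style of a unit test, one known encoding per instruction.
assert ADD(A0, A1, A2) == 0x00C58533   # add  a0, a1, a2
assert ADDI(A0, A0, 1) == 0x00150513   # addi a0, a0, 1
```

A property-based tool such as Hypothesis can go further than fixed cases like these, for example by generating random operands and comparing the output against a reference assembler.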

Notably, the RPython test suite provides excellent coverage. Despite being platform-independent, it uncovered a bug in my backend related to a misunderstanding of the RISC-V ABI. According to the ABI, if all floating-point argument registers are occupied but integer argument registers remain available, floating-point arguments must be passed in integer registers. I had incorrectly spilled the argument to the call stack. I was impressed by the test suite's ability to catch this subtle error.
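The rule is easier to see in code. The sketch below is a simplification under stated assumptions: a 64-bit hard-float ABI with eight integer and eight floating-point argument registers, scalar non-variadic arguments only, and 8-byte stack slots. The function name `assign_args` and the `"stack+N"` notation are mine, not from the backend.

```python
INT_ARG_REGS = ["a%d" % i for i in range(8)]    # a0..a7
FP_ARG_REGS = ["fa%d" % i for i in range(8)]    # fa0..fa7


def assign_args(kinds):
    """Return a location for each argument; kinds holds 'int' or 'float'."""
    next_int = 0
    next_fp = 0
    stack_offset = 0
    locations = []
    for kind in kinds:
        if kind not in ("int", "float"):
            raise ValueError("unknown argument kind: %r" % (kind,))
        if kind == "float" and next_fp < len(FP_ARG_REGS):
            locations.append(FP_ARG_REGS[next_fp])
            next_fp += 1
        elif next_int < len(INT_ARG_REGS):
            # Integers land here, and so do floats once fa0..fa7 are used up.
            # This second case is the one I originally got wrong.
            locations.append(INT_ARG_REGS[next_int])
            next_int += 1
        else:
            locations.append("stack+%d" % stack_offset)
            stack_offset += 8
    return locations


nine_floats = assign_args(["float"] * 9)
```

In this sketch the ninth float goes to `a0` rather than to the stack. An argument is spilled to the stack only when the integer registers are exhausted as well.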

At the end of 2023, I dedicated my year-end vacation to tackling the most challenging aspects of the port. Within two weeks, I implemented the opcodes for function calls, guards, compiler bridges, the garbage collector write barrier, and the garbage collector write barrier for arrays, among others.

The garbage collector write barrier for arrays requires marking the card table whenever an array element is updated. Specifically, array indices from k * 128 to (k + 1) * 128 - 1 map to the same bit in the card table. During the garbage collector's mark phase, this allows the GC to skip checking 128 elements if the corresponding bit is not set, thus reducing pause times. Implementing this opcode brought back fond memories of discussing garbage collection papers with my classmates.
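The index-to-bit mapping can be sketched as follows. This is a simplified model and not how PyPy actually lays out its card table in memory: the `bytearray` representation and the function names are assumptions made for illustration. Only the 128-elements-per-bit mapping comes from the description above.

```python
CARD_SHIFT = 7                      # 2**7 == 128 array elements per card
CARD_SIZE = 1 << CARD_SHIFT


def new_card_table(length):
    """Allocate one bit per card, rounded up to whole bytes."""
    num_cards = (length + CARD_SIZE - 1) >> CARD_SHIFT
    return bytearray((num_cards + 7) // 8)


def mark_card(cards, index):
    """What the write barrier does when array[index] is updated."""
    card = index >> CARD_SHIFT      # indices k*128 .. (k+1)*128-1 share card k
    cards[card >> 3] |= 1 << (card & 7)


def marked_ranges(cards, length):
    """Return the (start, stop) index ranges the GC still has to scan."""
    ranges = []
    num_cards = (length + CARD_SIZE - 1) >> CARD_SHIFT
    for card in range(num_cards):
        if cards[card >> 3] & (1 << (card & 7)):
            start = card << CARD_SHIFT
            ranges.append((start, min(start + CARD_SIZE, length)))
    return ranges


cards = new_card_table(1000)
mark_card(cards, 300)               # falls in card 2, indices 256..383
mark_card(cards, 383)               # same card, so no new bit is set
to_scan = marked_ranges(cards, 1000)
```

After these two writes only the range 256 to 383 needs to be scanned; the other seven cards of the 1000-element array are skipped.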

By January 2024, I had implemented all necessary opcodes and enabled the compilation of the full PyPy interpreter with the RPython toolchain. I announced this accomplishment on the PyPy mailing list with a post titled "Contribute a RISC-V 64 JIT backend." Following Matti Picus's suggestion, I ran several test suites and measured the benchmark performance on a SiFive Unmatched development board. I then submitted a Pull Request, which was merged on August 14th. Instructions for building PyPy with the RISC-V JIT backend can be found in "Cross-Translating for RISC-V."

Thoughts

I am delighted to have completed this eight-year side project in 2024, fulfilling two long-held aspirations.

  • First, I achieved my goal of porting a compiler to a new platform, gaining valuable insights into compiler backend details such as register allocation, function calls, constant pools, and trampolines, as well as learning the RISC-V ISA and ABI.
  • Second, I was able to contribute to the PyPy project, a source of inspiration since I first learned about it. PyPy's unconventional technical decisions broadened my research perspective, motivating me to persevere with this side project. I am truly grateful to have finally achieved this goal.

Finally, I would like to express my sincere gratitude to my undergraduate classmate X, whose enthusiastic introduction to PyPy years ago set this entire journey in motion.