The Compiler Backend

All of the preceding chapters of this guide have one thing in common: we never generated any executable machine code at all! With this chapter, all of that changes.

It's often useful to think of compilers as being composed of a frontend and a backend (though in rustc, there's not a sharp line between frontend and backend). The frontend is responsible for taking raw source code, checking it for correctness, and getting it into a format usable by the backend. For rustc, this format is the MIR. The backend refers to the parts of the compiler that turn rustc's MIR into actual executable code (e.g. an ELF or EXE binary) that can run on a processor. All of the previous chapters deal with rustc's frontend.

rustc's backend does the following:

  1. First, we need to collect the set of things to generate code for. In particular, we need to find out which concrete types to substitute for generic ones, because machine code can only be generated for concrete types. Emitting a separate copy of the code for each concrete type is called monomorphization, so the process of collecting all the concrete types is called monomorphization collection.
  2. Next, we need to actually lower the MIR to a codegen IR (usually LLVM IR) for each concrete type we collected.
  3. Finally, we need to invoke the codegen backend (e.g. LLVM or Cranelift), which runs a series of optimization passes, generates executable code, and links everything together into an executable binary.
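To make the first step more concrete, here is a minimal sketch of what monomorphization means from the user's side. The function `largest` is a hypothetical example written for this illustration, not something from rustc. It is type-checked once as generic code, but because it is called with both `i32` and `f64`, the collector would need to record two concrete instances for the backend to generate code for.

```rust
// A generic function: checked once by the frontend, but the backend needs
// one concrete copy for each type it is actually used with.
fn largest<T: PartialOrd + Copy>(items: &[T]) -> Option<T> {
    let mut iter = items.iter().copied();
    // An empty slice has no largest element.
    let first = iter.next()?;
    Some(iter.fold(first, |best, x| if x > best { x } else { best }))
}

fn main() {
    // Two distinct instantiations are used here:
    // `largest::<i32>` and `largest::<f64>`.
    let a = largest(&[3, 7, 2]);
    let b = largest(&[1.5, 0.25]);
    println!("{:?} {:?}", a, b);
}
```

If `largest` were never called, there would be no concrete type to substitute, and no machine code would need to be generated for it.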

The codegen code is fairly complex, for a few reasons:

  • Support for multiple backends (LLVM and Cranelift). We try to share as much backend code between them as possible, so a lot of it is generic over the codegen implementation. This means that there are often a lot of layers of abstraction.
  • Codegen happens asynchronously in another thread for performance.
  • The actual codegen is done by a third-party library (either LLVM or Cranelift).
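The second point, codegen running in another thread, can be sketched with standard-library channels. This is only an illustration of the general pattern, assuming a single worker thread. `WorkItem`, `Compiled`, `compile`, and `run_pipeline` are hypothetical names invented for this sketch and do not correspond to rustc's actual types or functions.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-in for a unit of work handed to the backend.
struct WorkItem {
    name: String,
    ir: String,
}

// Hypothetical stand-in for the backend's output for one work item.
#[derive(Debug, PartialEq)]
struct Compiled {
    name: String,
    bytes: usize,
}

// Pretend "codegen": a real backend would optimize and emit machine code.
// Here we only record the length of the IR text.
fn compile(item: WorkItem) -> Compiled {
    Compiled { name: item.name, bytes: item.ir.len() }
}

// The main thread keeps producing work while the worker thread compiles it.
fn run_pipeline(items: Vec<WorkItem>) -> Vec<Compiled> {
    let (work_tx, work_rx) = mpsc::channel::<WorkItem>();
    let (done_tx, done_rx) = mpsc::channel::<Compiled>();

    let worker = thread::spawn(move || {
        // Runs until every sender for the work channel has been dropped.
        for item in work_rx {
            done_tx.send(compile(item)).expect("result receiver dropped");
        }
    });

    for item in items {
        work_tx.send(item).expect("worker thread exited early");
    }
    // Closing the channel tells the worker that no more work is coming.
    drop(work_tx);

    // Ends once the worker finishes and drops its sender.
    let results: Vec<Compiled> = done_rx.iter().collect();
    worker.join().expect("worker thread panicked");
    results
}

fn main() {
    let items = vec![
        WorkItem { name: "foo".to_string(), ir: "ret void".to_string() },
        WorkItem { name: "bar".to_string(), ir: "ret i32 0".to_string() },
    ];
    for compiled in run_pipeline(items) {
        println!("{:?}", compiled);
    }
}
```

The design point is that the producer never waits for an individual item to finish compiling. It only blocks at the end, when it collects the results.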

Generally, the rustc_codegen_ssa crate contains backend-agnostic code (i.e. independent of LLVM or Cranelift), while the rustc_codegen_llvm crate contains code specific to LLVM codegen.
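The split between backend-agnostic and backend-specific code can be pictured as shared logic written against a trait, with one implementation per backend. The sketch below is a toy under that assumption. The `Backend` trait, `ToyLlvm`, `ToyCranelift`, and `lower_sum` are hypothetical names made up for this example; they are not the real interfaces of rustc_codegen_ssa or rustc_codegen_llvm, and the emitted strings only imitate each backend's textual style.

```rust
// Hypothetical trait standing in for the interface that shared code is
// written against. Not the real rustc_codegen_ssa API.
trait Backend {
    /// Emit an instruction adding two values; return the result's name.
    fn emit_add(&mut self, lhs: &str, rhs: &str) -> String;
    /// The instructions emitted so far.
    fn output(&self) -> &[String];
}

#[derive(Default)]
struct ToyLlvm {
    lines: Vec<String>,
}

#[derive(Default)]
struct ToyCranelift {
    lines: Vec<String>,
}

impl Backend for ToyLlvm {
    fn emit_add(&mut self, lhs: &str, rhs: &str) -> String {
        let result = format!("%t{}", self.lines.len());
        self.lines.push(format!("{} = add i32 {}, {}", result, lhs, rhs));
        result
    }
    fn output(&self) -> &[String] {
        &self.lines
    }
}

impl Backend for ToyCranelift {
    fn emit_add(&mut self, lhs: &str, rhs: &str) -> String {
        let result = format!("v{}", self.lines.len());
        self.lines.push(format!("{} = iadd {}, {}", result, lhs, rhs));
        result
    }
    fn output(&self) -> &[String] {
        &self.lines
    }
}

// Backend-agnostic logic: sum the operands left to right. It is written
// once and works with any type that implements `Backend`.
fn lower_sum<B: Backend>(backend: &mut B, operands: &[&str]) -> Option<String> {
    let (first, rest) = operands.split_first()?;
    let mut acc = first.to_string();
    for op in rest {
        acc = backend.emit_add(&acc, op);
    }
    Some(acc)
}

fn main() {
    let mut llvm = ToyLlvm::default();
    lower_sum(&mut llvm, &["%a", "%b", "%c"]);
    println!("{:#?}", llvm.output());

    let mut clif = ToyCranelift::default();
    lower_sum(&mut clif, &["v10", "v11", "v12"]);
    println!("{:#?}", clif.output());
}
```

Here `lower_sum` plays the role of the shared code, while the two `impl` blocks play the role of the backend-specific crates. The cost of this arrangement is the extra layer of indirection mentioned above.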

At a very high level, the entry point is rustc_codegen_ssa::base::codegen_crate. This function starts the process discussed in the rest of this chapter.