Closure Expansion in rustc

This section describes how rustc handles closures. Closures in Rust are effectively "desugared" into structs that contain the values they use (or references to the values they use) from their creator's stack frame. rustc has the job of figuring out which values a closure uses and how, so it can decide whether to capture a given variable by shared reference, mutable reference, or by move. rustc also has to figure out which of the closure traits (Fn, FnMut, or FnOnce) a closure is capable of implementing.
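To make the three traits concrete, here is a small self-contained sketch showing which trait bound each kind of closure satisfies. The helper names call_fn, call_fn_mut, and call_fn_once are made up for illustration. The sketch relies only on the standard fact that Fn is a subtrait of FnMut, which is in turn a subtrait of FnOnce.

```rust
// Hypothetical helpers: each accepts a closure through one of the three closure traits.
fn call_fn<F: Fn() -> i32>(f: F) -> i32 {
    f() + f() // Fn: callable any number of times through a shared reference
}

fn call_fn_mut<F: FnMut() -> i32>(mut f: F) -> i32 {
    f() + f() // FnMut: callable repeatedly, but each call needs mutable access
}

fn call_fn_once<F: FnOnce() -> Vec<i32>>(f: F) -> Vec<i32> {
    f() // FnOnce: may be called only once
}

fn main() {
    let x = 10;
    let read = || x; // only reads `x`: captured by shared reference, implements Fn
    assert_eq!(call_fn(read), 20);
    assert_eq!(call_fn_mut(read), 20); // every Fn closure is also FnMut

    let mut counter = 0;
    let bump = || {
        counter += 1; // mutates `counter`: captured by mutable reference, FnMut but not Fn
        counter
    };
    assert_eq!(call_fn_mut(bump), 3); // 1 + 2
    assert_eq!(counter, 2);

    let v = vec![21];
    let take = || v; // moves `v` out of the body: captured by value, FnOnce only
    assert_eq!(call_fn_once(take), vec![21]);
}
```

Passing `bump` to call_fn, or calling `take` twice, would be rejected at compile time.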

Let's start with a few examples:

Example 1

To start, let's take a look at how the closure in the following example is desugared:

fn closure(f: impl Fn()) {
    f();
}

fn main() {
    let x: i32 = 10;
    closure(|| println!("Hi {}", x));  // The closure just reads x.
    println!("Value of x after return {}", x);
}

Let's say the above is the content of a file called immut.rs. We compile immut.rs using the following command; the -Z dump-mir=all flag causes rustc to generate the MIR and dump it to a directory called mir_dump.

> rustc +stage1 immut.rs -Z dump-mir=all

After we run this command, we will see a newly generated directory called mir_dump in our current working directory, containing several files. If we look at the file rustc.main.-------.mir_map.0.mir, we will find, among other things, these lines:

_4 = &_1;
_3 = [closure@immut.rs:7:13: 7:36] { x: move _4 };

Note that in the MIR examples in this chapter, _1 is x.

Here, in the first line, _4 = &_1;, the MIR dump tells us that x was borrowed as an immutable reference, and the second line stores that reference in the closure's x field. This is what we would hope, as our closure just reads x.
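One way to picture this is a hand-written equivalent of the Example 1 closure. This is only a sketch: the struct ReadClosure and its call method are hypothetical names, and real closures implement the Fn trait itself rather than an inherent method.

```rust
// Hypothetical hand-written version of the Example 1 closure.
// The field holds a shared reference, mirroring `_4 = &_1;` in the MIR.
struct ReadClosure<'a> {
    x: &'a i32,
}

impl ReadClosure<'_> {
    // Takes `&self`, so, like an Fn closure, it can be called any number of times.
    fn call(&self) -> String {
        format!("Hi {}", self.x)
    }
}

fn main() {
    let x: i32 = 10;
    let closure = ReadClosure { x: &x };
    println!("{}", closure.call());
    println!("{}", closure.call());
    // `x` was only borrowed, so it is still usable here.
    println!("Value of x after return {}", x);
}
```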

Example 2

Here is another example:

fn closure(mut f: impl FnMut()) {
    f();
}

fn main() {
    let mut x: i32 = 10;
    closure(|| {
        x += 10;  // The closure mutates the value of x
        println!("Hi {}", x)
    });
    println!("Value of x after return {}", x);
}

If we compile this as mut.rs, the MIR dump contains:

_4 = &mut _1;
_3 = [closure@mut.rs:7:13: 10:6] { x: move _4 };

This time around, in the line _4 = &mut _1;, we see that the borrow has changed to a mutable borrow. Fair enough! The closure increments x by 10.
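The same kind of hand-written sketch for Example 2 swaps the shared reference for a mutable one. Again, MutClosure and call_mut are hypothetical names chosen to echo FnMut; they are not rustc's actual lowering.

```rust
// Hypothetical hand-written version of the Example 2 closure.
// The field holds a mutable reference, mirroring `_4 = &mut _1;`.
struct MutClosure<'a> {
    x: &'a mut i32,
}

impl MutClosure<'_> {
    // Takes `&mut self`, so, like an FnMut closure, each call needs unique access.
    fn call_mut(&mut self) -> String {
        *self.x += 10;
        format!("Hi {}", self.x)
    }
}

fn main() {
    let mut x: i32 = 10;
    {
        let mut closure = MutClosure { x: &mut x };
        println!("{}", closure.call_mut()); // prints "Hi 20"
    } // the mutable borrow of `x` ends with `closure`
    println!("Value of x after return {}", x); // prints "Value of x after return 20"
}
```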

Example 3

One more example:

fn closure(f: impl FnOnce()) {
    f();
}

fn main() {
    let x = vec![21];
    closure(|| {
        drop(x);  // Makes x unusable after the fact.
    });
    // println!("Value of x after return {:?}", x);
}

If we compile this as move.rs, the MIR dump contains:

_6 = [closure@move.rs:7:13: 9:6] { x: move _1 }; // bb16[3]: scope 1 at move.rs:7:13: 9:6

Here, x is moved directly into the closure, so any access to it after the closure is not permitted; this is why the final println! in the example is commented out.
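For Example 3, a hand-written sketch owns the vector outright and consumes itself when called. MoveClosure and call_once are hypothetical names; returning the length is only there so the call has an observable result.

```rust
// Hypothetical hand-written version of the Example 3 closure.
// The field owns the vector, mirroring `{ x: move _1 }`.
struct MoveClosure {
    x: Vec<i32>,
}

impl MoveClosure {
    // Takes `self` by value, so, like an FnOnce closure, calling it consumes it.
    fn call_once(self) -> usize {
        let len = self.x.len();
        drop(self.x); // the captured vector is dropped here
        len
    }
}

fn main() {
    let x = vec![21];
    let closure = MoveClosure { x }; // `x` is moved in
    closure.call_once();
    // closure.call_once(); // would not compile: `closure` was moved by the first call
    // println!("{:?}", x); // would not compile: `x` was moved into `closure`
}
```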

Inferences in the compiler

Now let's dive into rustc code and see how all these inferences are done by the compiler.

Let's start by defining a term that we will use quite a bit in the rest of the discussion: upvar. An upvar is a variable that is local to the function where the closure is defined. So, in the above examples, x is an upvar of the closure. Upvars are also sometimes referred to as free variables, meaning they are not bound within the closure itself. compiler/rustc_passes/src/upvars.rs defines a query called upvars_mentioned for this purpose.

Other than lazy invocation, one other thing that distinguishes a closure from a normal function is that it can use upvars. It borrows these upvars from its surrounding context, so the compiler has to determine each upvar's borrow type. The compiler starts by assigning an immutable borrow type and escalates it as needed based on the usage, from immutable borrow to mutable borrow to move.

In Example 1 above, the closure only uses the variable for printing and does not modify it in any way; therefore, in the MIR dump, we find the borrow type for the upvar x to be immutable. In Example 2, however, the closure modifies x by incrementing it. Because of this mutation, the compiler, which started off treating x as an immutable reference, has to adjust it to a mutable reference. Likewise, in Example 3, the closure drops the vector, which requires the variable x to be moved into the closure.

In these examples, the capture modes line up with the closure traits: Fn for an immutable borrow, FnMut for a mutable borrow, and FnOnce for move semantics. Strictly speaking, the trait is determined by what the closure body does with its upvars rather than by the capture mode alone; for example, a move closure that only reads its captured values still implements Fn.
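The "start immutable, then escalate" rule can be modeled as taking the maximum over an ordered set of capture modes. This is a deliberately simplified sketch, not rustc's implementation: the enums Capture, UpvarUse, and ClosureKind and both functions are hypothetical, and each upvar use is reduced to a single tag.

```rust
// Hypothetical capture modes, ordered from weakest to strongest so that
// `max` keeps whichever capture the most demanding use requires.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Capture {
    ImmBorrow,
    MutBorrow,
    ByValue,
}

// Hypothetical summary of one use of an upvar inside the closure body.
#[derive(Debug, Clone, Copy)]
enum UpvarUse {
    Read,
    Mutate,
    Move,
}

#[derive(Debug, PartialEq)]
enum ClosureKind {
    Fn,
    FnMut,
    FnOnce,
}

// Start from an immutable borrow and only ever escalate.
fn infer_capture(uses: &[UpvarUse]) -> Capture {
    uses.iter().fold(Capture::ImmBorrow, |acc, u| {
        let needed = match u {
            UpvarUse::Read => Capture::ImmBorrow,
            UpvarUse::Mutate => Capture::MutBorrow,
            UpvarUse::Move => Capture::ByValue,
        };
        acc.max(needed)
    })
}

// In this model ByValue only arises from a use that consumes the upvar, so it maps to FnOnce.
fn closure_kind(capture: Capture) -> ClosureKind {
    match capture {
        Capture::ImmBorrow => ClosureKind::Fn,
        Capture::MutBorrow => ClosureKind::FnMut,
        Capture::ByValue => ClosureKind::FnOnce,
    }
}

fn main() {
    let examples = [
        ("Example 1", vec![UpvarUse::Read]),
        ("Example 2", vec![UpvarUse::Mutate, UpvarUse::Read]),
        ("Example 3", vec![UpvarUse::Move]),
    ];
    for (name, uses) in &examples {
        let capture = infer_capture(uses);
        println!("{}: {:?} -> {:?}", name, capture, closure_kind(capture));
    }
}
```

Because the fold never lowers the mode, the order in which uses appear in the body does not matter.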

Most of the code related to closures is in the compiler/rustc_hir_typeck/src/upvar.rs file, and the data structures are declared in compiler/rustc_middle/src/ty/mod.rs.

Before we go any further, let's discuss how we can examine the flow of control through the rustc codebase. For closures specifically, set the RUSTC_LOG env variable as below and collect the output in a file:

> RUSTC_LOG=rustc_hir_typeck::upvar rustc +stage1 -Z dump-mir=all \
    <.rs file to compile> 2> <file where the output will be dumped>

This uses the stage1 compiler and enables debug! logging for the rustc_hir_typeck::upvar module.

The other option is to step through the code using lldb or gdb.

  1. rust-lldb build/host/stage1/bin/rustc test.rs
  2. In lldb:
    1. b upvar.rs:134 // Set a breakpoint on a certain line in the upvar.rs file
    2. r // Run the program until it hits the breakpoint

Let's start with upvar.rs. This file uses euv::ExprUseVisitor, which walks the source of the closure and invokes a callback for each upvar that is borrowed, mutated, or moved.

fn main() {
    let mut x = vec![21];
    let _cl = || {
        let y = x[0];  // 1.
        x[0] += 1;  // 2.
    };
}

In the above example, our visitor will be called twice, for the lines marked 1 and 2: once for a shared borrow and once for a mutable borrow. It will also tell us what was borrowed.

The callbacks are defined by implementing the Delegate trait. The InferBorrowKind type implements Delegate and keeps a map that records, for each upvar, which mode of capture was required. The modes of capture can be ByValue (moved) or ByRef (borrowed). For ByRef borrows, the possible BorrowKinds are ImmBorrow, UniqueImmBorrow, and MutBorrow, as defined in compiler/rustc_middle/src/ty/mod.rs.

Delegate defines a few different methods (the different callbacks): consume for a move of a variable, borrow for a borrow of some kind (shared or mutable), and mutate when we see an assignment to something.
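The division of labor between the visitor and the delegate can be sketched with a toy version. Everything here is hypothetical and much simpler than rustc: upvars are identified by name instead of a cmt, capture modes are a three-variant enum, and InferCaptures merely imitates the role of InferBorrowKind by keeping the strongest mode seen per upvar.

```rust
use std::collections::HashMap;

// Hypothetical, simplified capture modes, ordered weakest to strongest.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum CaptureMode {
    ImmBorrow,
    MutBorrow,
    ByValue,
}

// Hypothetical callback trait loosely modeled on Delegate; upvars are plain names here.
trait Delegate {
    fn consume(&mut self, upvar: &str);
    fn borrow(&mut self, upvar: &str, mutable: bool);
    fn mutate(&mut self, upvar: &str);
}

// Hypothetical analogue of InferBorrowKind: one capture mode per upvar.
#[derive(Default)]
struct InferCaptures {
    modes: HashMap<String, CaptureMode>,
}

impl InferCaptures {
    // Escalate the recorded mode for `upvar`; never downgrade it.
    fn adjust(&mut self, upvar: &str, needed: CaptureMode) {
        let mode = self
            .modes
            .entry(upvar.to_string())
            .or_insert(CaptureMode::ImmBorrow);
        *mode = (*mode).max(needed);
    }
}

impl Delegate for InferCaptures {
    fn consume(&mut self, upvar: &str) {
        self.adjust(upvar, CaptureMode::ByValue);
    }
    fn borrow(&mut self, upvar: &str, mutable: bool) {
        let needed = if mutable { CaptureMode::MutBorrow } else { CaptureMode::ImmBorrow };
        self.adjust(upvar, needed);
    }
    fn mutate(&mut self, upvar: &str) {
        self.adjust(upvar, CaptureMode::MutBorrow);
    }
}

fn main() {
    // Replay the two callbacks for the vec![21] example by hand:
    // line 1 borrows `x` immutably, line 2 borrows it mutably.
    let mut delegate = InferCaptures::default();
    delegate.borrow("x", false);
    delegate.borrow("x", true);
    println!("{:?}", delegate.modes["x"]); // prints "MutBorrow"
}
```

In rustc, the calls in main would come from ExprUseVisitor walking the closure body rather than being written out by hand.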

All of these callbacks have a common argument, cmt, which stands for Category, Mutability, and Type, and is defined in compiler/rustc_middle/src/middle/mem_categorization.rs. Borrowing from the code comments, "cmt is a complete categorization of a value indicating where it originated and how it is located, as well as the mutability of the memory in which the value is stored". Based on the callback (consume, borrow, etc.), we call the relevant adjust_upvar_borrow_kind_for_<something> function and pass the cmt along. Once the borrow type is adjusted, we store it in the table, which records what borrows were made for each closure.

self.tables
    .borrow_mut()
    .upvar_capture_map
    .extend(delegate.adjust_upvar_captures);
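The fragment above calls `.borrow_mut()` on `self.tables` before extending the map, which suggests the tables sit behind a RefCell-like wrapper so they can be updated through a shared `&self`. The sketch below shows that pattern in isolation under that assumption; Tables, Ctxt, record_captures, and the `(closure id, upvar name)` key are all hypothetical.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// Hypothetical key: (closure id, upvar name).
type UpvarKey = (u32, &'static str);

// Hypothetical stand-in for the results table.
#[derive(Default)]
struct Tables {
    upvar_capture_map: HashMap<UpvarKey, &'static str>,
}

// Hypothetical context that reaches its tables through `borrow_mut()` on `&self`.
struct Ctxt {
    tables: RefCell<Tables>,
}

impl Ctxt {
    fn record_captures(&self, adjust_upvar_captures: HashMap<UpvarKey, &'static str>) {
        // The RefMut returned by borrow_mut() lives only for this statement.
        // extend() inserts every entry, replacing any value already stored for a key.
        self.tables
            .borrow_mut()
            .upvar_capture_map
            .extend(adjust_upvar_captures);
    }
}

fn main() {
    let cx = Ctxt { tables: RefCell::new(Tables::default()) };
    cx.record_captures(HashMap::from([((0, "x"), "ImmBorrow")]));
    cx.record_captures(HashMap::from([((1, "x"), "MutBorrow")]));
    println!("{}", cx.tables.borrow().upvar_capture_map.len()); // prints "2"
}
```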