About this guide

This guide is meant to help document how rustc – the Rust compiler – works, as well as to help new contributors get involved in rustc development.

There are seven parts to this guide:

  1. Building and debugging rustc: Contains information that should be useful no matter how you are contributing, about building, debugging, profiling, etc.
  2. Contributing to rustc: Contains information that should be useful no matter how you are contributing, about procedures for contribution, stabilizing features, etc.
  3. High-Level Compiler Architecture: Discusses the high-level architecture of the compiler and stages of the compile process.
  4. Source Code Representation: Describes the process of taking raw source code from the user and transforming it into various forms that the compiler can work with easily.
  5. Analysis: Discusses the analyses that the compiler uses to check various properties of the code and inform later stages of the compile process (e.g., type checking).
  6. From MIR to Binaries: How linked executable machine code is generated.
  7. Appendices at the end with useful reference information. There are a few of these with different information, including a glossary.

The Guide itself is of course open-source as well, and the sources can be found at the GitHub repository. If you find any mistakes in the guide, please file an issue about it, or even better, open a PR with a correction!

Other places to find information

You might also find the following sites useful:

  • rustc API docs -- rustdoc documentation for the compiler
  • Forge -- contains documentation about rust infrastructure, team procedures, and more
  • compiler-team -- the home-base for the rust compiler team, with description of the team procedures, active working groups, and the team calendar.

Getting Started

This documentation is not intended to be comprehensive; it is meant to be a quick guide for the most useful things. For more information, see this chapter on how to build and run the compiler.

Asking Questions

The compiler team (or t-compiler) usually hangs out in Zulip in this "stream"; it will be easiest to get questions answered there.

Please ask questions! A lot of people report feeling that they are "wasting expert time", but nobody on t-compiler feels this way. Contributors are important to us.

Also, if you feel comfortable, prefer public topics, as this means others can see the questions and answers, and perhaps even integrate them back into this guide :)

Experts

Not all t-compiler members are experts on all parts of rustc; it's a pretty large project. To find out who has expertise on different parts of the compiler, consult this "experts map".

It's not perfectly complete, though, so please also feel free to ask questions even if you can't figure out who to ping.

Etiquette

We do ask that you be mindful to include as much useful information as you can in your question, but we recognize this can be hard if you are unfamiliar with contributing to Rust.

Just pinging someone without providing any context can be a bit annoying and just create noise, so we ask that you be mindful of the fact that the t-compiler folks get a lot of pings in a day.

Cloning and Building

The main repository is rust-lang/rust. This contains the compiler, the standard library (including core, alloc, test, proc_macro, etc), and a bunch of tools (e.g. rustdoc, the bootstrapping infrastructure, etc).

There are also a bunch of submodules for things like LLVM, clippy, miri, etc. You don't need to clone these immediately, but the build tool will automatically clone and sync them (more later).

Take a look at the "Suggested Workflows" chapter for some helpful advice.

System Requirements

See this chapter for detailed software requirements. Most notably, you will need Python 2 or 3 to run x.py.

There are no hard hardware requirements, but building the compiler is computationally expensive, so a beefier machine will help, and I wouldn't recommend trying to build on a Raspberry Pi :P

  • Recommended >=30GB of free disk space; otherwise, you will have to keep clearing incremental caches. More space is better; the compiler is a bit of a hog, and it's a problem we are aware of.
  • Recommended >=8GB RAM.
  • Recommended >=2 cores; more cores really helps.
  • You will need an internet connection to build; the bootstrapping process involves updating git submodules and downloading a beta compiler. It doesn't need to be super fast, but that can help.

Building the compiler takes more than half an hour on my moderately powerful laptop. The first time you build the compiler, LLVM will also be built unless you use your system's LLVM (see below).

Like cargo, the build system will use as many cores as possible. Sometimes this can cause you to run low on memory. You can use -j to adjust the number of concurrent jobs. If a full build takes more than ~45 minutes to an hour, you are probably spending most of the time swapping memory in and out; try using -j1.
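
For example, to cap the build at four parallel jobs (the job count here is just illustrative):

./x.py build -j 4 library/std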

On a slow machine, the build times for rustc are very painful. Consider using ./x.py check instead of a full build and letting the automated tests run when you push to GitHub.

If you don't have too much free disk space, you may want to turn off incremental compilation (see below). This will make compilation take longer (especially after a rebase), but will save a ton of space from the incremental caches.
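
A minimal config.toml tweak for this, using the same rust.incremental setting shown later in this guide:

[rust]
# Disable incremental compilation to save disk space
incremental = false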

Cloning

You can just do a normal git clone:

git clone https://github.com/rust-lang/rust.git

You don't need to clone the submodules at this time. But if you want to, you can do the following:

# first time
git submodule update --init --recursive

# subsequent times (to pull new commits)
git submodule update

Configuring the Compiler

The compiler has a configuration file which contains a ton of settings. We will provide some recommendations here that should work for most, but check out this chapter for more info.

In the top level of the repo:

$ x.py setup

This will walk you through an interactive setup for x.py that looks like this:

$ x.py setup
Welcome to the Rust project! What do you want to do with x.py?
a) Contribute to the standard library
b) Contribute to the compiler
c) Contribute to the compiler, and also modify LLVM or codegen
d) Install Rust from source
Please choose one (a/b/c/d): a
`x.py` will now use the configuration at /home/joshua/rustc2/src/bootstrap/defaults/config.toml.library
To get started, try one of the following commands:
- `x.py check`
- `x.py build`
- `x.py test library/std`
- `x.py doc`
For more suggestions, see https://rustc-dev-guide.rust-lang.org/building/suggested.html

You may also want to set up system LLVM to avoid building LLVM from source.

./x.py Intro

rustc is a bootstrapping compiler, which means that it is written in Rust and thus needs to be compiled by itself. So where do you get the original compiler from? We use the current beta compiler to build a new compiler. Then, we use that compiler to build itself. Thus, rustc has a 2-stage build. You can read more about bootstrapping here, but you don't need to know much more to contribute.

We have a special tool ./x.py that drives this process. It is used for building the compiler, the standard libraries, and rustdoc. It is also used for driving CI and building the final release artifacts.

Unfortunately, a proper 2-stage build takes a long time depending on your hardware, but it is the only correct way to build everything (e.g. it's what the CI and release processes use). However, in most cases, you can get by without a full 2-stage build. In the following section, we give instructions for how to do "the correct thing", but then we also give various tips to speed things up.

Building and Testing rustc

Here is a summary of the different commands for reference, but you probably should still read the rest of the section:

  • x.py check -- Quick check to see if things compile; rust-analyzer can run this automatically for you
  • x.py build --stage 0 [library/std] -- Build only the standard library, without building the compiler
  • x.py build library/std -- Build just the 1st stage of the compiler, along with the standard library; this is faster than building stage 2 and usually good enough
  • x.py build --keep-stage 1 library/std -- Build the 1st stage of the compiler and skip rebuilding the standard library; this is useful after you've done an ordinary stage1 build to save compilation time, but it can cause weird problems. (Just do a regular build to resolve.)
  • x.py test [--keep-stage 1] -- Run the test suite using the stage1 compiler
  • x.py test --bless [--keep-stage 1] -- Run the test suite using the stage1 compiler and update expected test output
  • x.py build --stage 2 compiler/rustc -- Do a full 2-stage build; you almost never want to do this
  • x.py test --stage 2 -- Do a full 2-stage build and run all tests; you almost never want to do this

To do a full 2-stage build of the whole compiler, you should run this (after updating config.toml as mentioned above):

./x.py build --stage 2 compiler/rustc

In the process, this will also necessarily build the standard libraries, and it will build rustdoc (which doesn't take too long).

To build and test everything:

./x.py test

For most contributions, you only need to build stage 1, which saves a lot of time:

# Build the compiler (stage 1)
./x.py build library/std

# Subsequent builds
./x.py build --keep-stage 1 library/std

This will take a while, especially the first time. Be wary of accidentally touching or formatting the compiler, as ./x.py will try to recompile it.

NOTE: The --keep-stage 1 will assume that the stage 0 standard library does not need to be rebuilt, which is usually true and will save some time. However, if you are changing certain parts of the compiler, this may lead to weird errors. Feel free to ask on Zulip if you are running into issues.

This runs a ton of tests and takes a long time to complete. If you are working on rustc, you can usually get by with only the UI tests. These tests are mostly for the frontend of the compiler, so if you are working on LLVM or codegen, this shortcut will not test your changes. You can read more about the different test suites in this chapter.

# First build
./x.py test src/test/ui

# Subsequent builds
./x.py test src/test/ui --keep-stage 1

If your changes impact test output, you can use --bless to automatically update the .stderr files of the affected tests:

./x.py test src/test/ui --keep-stage 1 --bless

While working on the compiler, it can be helpful to see if the code just compiles (similar to cargo check) without actually building it. You can do this with:

./x.py check

This command is really fast (relative to the other commands). It usually completes in a couple of minutes on my laptop. A common workflow when working on the compiler is to make changes and repeatedly check with ./x.py check. Then, run the tests as shown above when you think things should work.

Finally, the CI ensures that the codebase is using consistent style. To format the code:

# Actually format
./x.py fmt

# Just check formatting, exit with error
./x.py fmt --check

Note: we don't use stable rustfmt; we use a pinned version with a special config, so this may result in different style from normal rustfmt if you have format-on-save turned on. It's a good habit to run ./x.py fmt before every commit, as this reduces conflicts later. The pinned version is built under build/<target>/stage0/bin/rustfmt, so if you want, you can use it for a single file or for format-on-save in your editor, which can be faster than ./x.py fmt.

One last thing: you can use RUSTC_LOG=XXX to get debug logging. Read more here. Notice the C in RUSTC_LOG. Other than that, it uses normal env_logger syntax.
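
For example, assuming you have linked a stage1 toolchain as described later in this guide (the module filter is illustrative; any env_logger-style filter works):

# Show debug-level logging from the name resolution module only
RUSTC_LOG=rustc_resolve=debug rustc +stage1 some_file.rs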

Building and Testing std/core/alloc/test/proc_macro/etc.

As before, technically the proper way to build one of these libraries is to use the stage-2 compiler, which of course requires a 2-stage build, described above (./x.py build).

In practice, though, you don't need to build the compiler unless you are planning to use a recently added nightly feature. Instead, you can just build stage 0, which uses the current beta compiler.

./x.py build --stage 0
./x.py test --stage 0 library/std

(The same works for library/alloc, library/core, etc.)

Building and Testing rustdoc

rustdoc uses rustc internals (and, of course, the standard library), so you will have to build the compiler and std once before you can build rustdoc. As before, you can use ./x.py build to do this. The first time you build, the stage-1 compiler will also be built.

# First build
./x.py build

# Subsequent builds
./x.py build --keep-stage 1

As with the compiler, you can do a fast check build:

./x.py check

Rustdoc has two types of tests: content tests and UI tests.

# Content tests
./x.py test src/test/rustdoc

# UI tests
./x.py test src/test/rustdoc-ui

# Both at once
./x.py test src/test/rustdoc src/test/rustdoc-ui

Contributing code to other Rust projects

There are a bunch of other projects that you can contribute to outside of the rust-lang/rust repo, including clippy, miri, chalk, and many others.

These repos might have their own contributing guidelines and procedures. Many of them are owned by working groups (e.g. chalk is largely owned by WG-traits). For more info, see the documentation in those repos' READMEs.

Other ways to contribute

There are a bunch of other ways you can contribute, especially if you don't feel comfortable jumping straight into the large rust-lang/rust codebase.

The following tasks are doable without much background knowledge but are incredibly helpful:

  • Cleanup crew: find minimal reproductions of ICEs, bisect regressions, etc. This is a way of helping that saves others a ton of time when they fix the error later.
  • Writing documentation: if you are feeling a bit more intrepid, you could try to read a part of the code and write doc comments for it. This will help you to learn some part of the compiler while also producing a useful artifact!
  • Working groups: there are a bunch of working groups on a wide variety of rust-related things.

Contributor Procedures

There are some official procedures to know about. This is a tour of the highlights, but there are a lot more details, which we will link to below.

Code Review

When you open a PR on the rust-lang/rust repo, a bot called @rust-highfive will automatically assign a reviewer to the PR. The reviewer is the person that will approve the PR to be tested and merged. If you want a specific reviewer (e.g. a team member you've been working with), you can specifically request them by writing r? @user (e.g. r? @eddyb) in either the original post or a followup comment (you can see this comment for example).

Please note that the reviewers are humans, who for the most part work on rustc in their free time. This means that they can take some time to respond and review your PR. It also means that reviewers can miss some PRs that are assigned to them.

To try to move PRs forward, the Triage WG regularly goes through all PRs that are waiting for review and haven't been discussed for at least 2 weeks. If you don't get a review within 2 weeks, feel free to ask the Triage WG on Zulip (#t-release/triage). They have knowledge of when to ping, who might be on vacation, etc.

The reviewer may request some changes using the GitHub code review interface. They may also request special procedures (such as a crater run; see below) for some PRs.

When the PR is ready to be merged, the reviewer will issue a command to @bors, the CI bot. Usually, this is @bors r+ or @bors r=user to approve a PR (there are a few other commands, but they are less relevant here). You can see this comment for example. This puts the PR in bors's queue to be tested and merged. Be patient; this can take a while and the queue can sometimes be long. PRs are never merged by hand.

Bug Fixes or "Normal" code changes

For most PRs, no special procedures are needed. You can just open a PR, and it will be reviewed, approved, and merged. This includes most bug fixes, refactorings, and other user-invisible changes. The next few sections talk about exceptions to this rule.

Also, note that it is perfectly acceptable to open WIP PRs or GitHub Draft PRs. Some people prefer to do this so they can get feedback along the way or share their code with a collaborator. Others do this so they can utilize the CI to build and test their PR (e.g. if you are developing on a laptop).

New Features

Rust has strong backwards-compatibility guarantees. Thus, new features can't just be implemented directly in stable Rust. Instead, we have 3 release channels: stable, beta, and nightly.

  • Stable: this is the latest stable release for general usage.
  • Beta: this is the next release (will be stable within 6 weeks).
  • Nightly: follows the master branch of the repo. This is the only channel where unstable, incomplete, or experimental features are usable with feature gates.

In order to implement a new feature, usually you will need to go through the RFC process to propose a design, have discussions, etc. In some cases, small features can be added with only an FCP (see below). If in doubt, ask the compiler, language, or libs team (whichever is most relevant).

After a feature is approved to be added, a tracking issue is created on the rust-lang/rust repo, which tracks the progress towards the implementation of the feature, any bugs reported, and eventually stabilization.

The feature then needs to be implemented behind a feature gate, which prevents it from being accidentally used.
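
For instance, nightly users must opt in explicitly before a gated feature can be used (the feature name here is hypothetical):

// At the top of the user's crate root
#![feature(my_new_feature)]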

Finally, somebody may propose stabilizing the feature in an upcoming version of Rust. This requires a Final Comment Period (see below) to get the approval of the relevant teams.

After that, the feature gate can be removed and the feature turned on for all users.

For more details on this process, see this chapter on implementing new features.

Breaking Changes

As mentioned above, Rust has strong backwards-compatibility guarantees. To this end, we are reluctant to make breaking changes. However, sometimes they are needed to correct compiler bugs (e.g. code that compiled but should not) or make progress on some features.

Depending on the scale of the breakage, there are a few different actions that can be taken. If the reviewer believes the breakage is very minimal (i.e. very unlikely to be actually encountered by users), they may just merge the change. More often, they will request a Final Comment Period (FCP), which calls for rough consensus among the members of a relevant team. The team members can discuss the issue and either accept, reject, or request changes on the PR.

If the scale of breakage is large, a deprecation warning may be needed. This is a warning that the compiler will display to users whose code will break in the future. After some time, an FCP can be used to move forward with the actual breakage.

If the scale of breakage is unknown, a team member or contributor may request a crater run. This is a bot that will compile all crates.io crates and many public GitHub repos with the compiler with your changes. A report will then be generated listing the crates that ceased to compile, or began to compile, with your changes. Crater runs can take a few days to complete.

Major Changes

The compiler team has a special process for large changes, whether or not they cause breakage. This process is called a Major Change Proposal (MCP). MCP is a relatively lightweight mechanism for getting feedback on large changes to the compiler (as opposed to a full RFC or a design meeting with the team).

Examples of things that might require an MCP include major refactorings, changes to important types, important changes to how the compiler does something, and smaller user-facing changes.

When in doubt, ask on Zulip. It would be a shame to put a lot of work into a PR that ends up not getting merged! See this document for more info on MCPs.

Performance

Compiler performance is important. We have put a lot of effort over the last few years into gradually improving it.

If you suspect that your change may cause a performance regression (or improvement), you can request a "perf run" (your reviewer may also request one before approving). This is yet another bot that will compile a collection of benchmarks on a compiler with your changes. The numbers are reported here, and you can see a comparison of your changes against the latest master.

How to Build and Run the Compiler

The compiler is built using a tool called x.py. You will need to have Python installed to run it. But before we get to that, if you're going to be hacking on rustc, you'll want to tweak the configuration of the compiler. The default configuration is oriented towards running the compiler as a user, not a developer.

For instructions on how to install Python and other prerequisites, see the next page.

Get the source code

The very first step to work on rustc is to clone the repository:

git clone https://github.com/rust-lang/rust.git
cd rust

Create a config.toml

To start, copy config.toml.example to config.toml:

cp config.toml.example config.toml

Then you will want to open up the file and change the following settings (and possibly others, such as llvm.ccache):

[llvm]
# Indicates whether the LLVM assertions are enabled or not
assertions = true

[rust]
# Whether or not to leave debug! and trace! calls in the rust binary.
# Overrides the `debug-assertions` option, if defined.
#
# Defaults to rust.debug-assertions value
#
# If you see a message from `tracing` saying
# `max_level_info` is enabled and means logging won't be shown,
# set this value to `true`.
debug-logging = true

# Whether to always use incremental compilation when building rustc
incremental = true

If you have already built rustc, then you may have to execute rm -rf build for subsequent configuration changes to take effect. Note that ./x.py clean will not cause a rebuild of LLVM, so if your configuration change affects LLVM, you will need to manually rm -rf build/ before rebuilding.

What is x.py?

x.py is the script used to orchestrate the tooling in the rustc repository. It is the script that can build docs, run tests, and compile rustc. It is now the preferred way to build rustc, and it replaces the old makefiles from before. Below are the different ways to utilize x.py in order to effectively deal with the repo for various common tasks.

This chapter focuses on the basics to be productive, but if you want to learn more about x.py, read its README.md here. To read more about the bootstrap process and why x.py is necessary, read this chapter.

Running x.py slightly more conveniently

There is a binary that wraps x.py called x in src/tools/x. All it does is run x.py, but it can be installed system-wide and run from any subdirectory of a checkout. It also looks up the appropriate version of python to use.

You can install it with cargo install --path src/tools/x.
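
Once installed, you can invoke it from any subdirectory of the checkout, for example:

x check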

Building the Compiler

To build a compiler, run ./x.py build. This will build up to the stage1 compiler, including rustdoc, producing a usable compiler toolchain from the source code you have checked out.

Note that building will require a relatively large amount of storage space. You may want to have upwards of 10 or 15 gigabytes available to build the compiler.

There are many flags you can pass to the build command of x.py that can help cut down compile times or fit other things you might need to change. They are:

Options:
    -v, --verbose       use verbose output (-vv for very verbose)
    -i, --incremental   use incremental compilation
        --config FILE   TOML configuration file for build
        --build BUILD   build target of the stage0 compiler
        --host HOST     host targets to build
        --target TARGET target targets to build
        --on-fail CMD   command to run on failure
        --stage N       stage to build
        --keep-stage N  stage to keep without recompiling
        --src DIR       path to the root of the rust checkout
    -j, --jobs JOBS     number of jobs to run in parallel
    -h, --help          print this help message

For hacking, often building the stage 1 compiler is enough, but for final testing and release, the stage 2 compiler is used.

./x.py check is a really fast way to check whether the compiler builds. It is particularly useful when you're doing some kind of "type-based refactoring", like renaming a method or changing the signature of some function.

Once you've created a config.toml, you are now ready to run x.py. There are a lot of options here, but let's start with what is probably the best "go to" command for building a local rust:

./x.py build -i library/std

This may look like it only builds std, but that is not the case. What this command does is the following:

  • Build std using the stage0 compiler (using incremental)
  • Build rustc using the stage0 compiler (using incremental)
    • This produces the stage1 compiler
  • Build std using the stage1 compiler (cannot use incremental)

This final product (stage1 compiler + libs built using that compiler) is what you need to build other rust programs (unless you use #![no_std] or #![no_core]).

The command includes the -i switch which enables incremental compilation. This will be used to speed up the first two steps of the process: in particular, if you make a small change, we ought to be able to use your old results to make producing the stage1 compiler faster.

Unfortunately, incremental cannot be used to speed up making the stage1 libraries. This is because incremental only works when you run the same compiler twice in a row. In this case, we are building a new stage1 compiler every time. Therefore, the old incremental results may not apply. As a result, you will probably find that building the stage1 std is a bottleneck for you -- but fear not, there is a (hacky) workaround. See the section on "recommended workflows" below.

Note that this whole command just gives you a subset of the full rustc build. The full rustc build (what you get if you say ./x.py build --stage 2 compiler/rustc) has quite a few more steps:

  • Build rustc with the stage1 compiler.
    • The resulting compiler here is called the "stage2" compiler.
  • Build std with stage2 compiler.
  • Build librustdoc and a bunch of other things with the stage2 compiler.

Build specific components

  • Build only the core library
./x.py build --stage 0 library/core
  • Build only the core and proc_macro libraries
./x.py build --stage 0 library/core library/proc_macro

Sometimes you might just want to test if the part you're working on can compile. Using these commands, you can test that it compiles before doing a bigger build to make sure it works with the compiler. As shown before, you can also pass flags at the end, such as --stage.

Creating a rustup toolchain

Once you have successfully built rustc, you will have created a bunch of files in your build directory. In order to actually run the resulting rustc, we recommend creating rustup toolchains. The first one will run the stage1 compiler (which we built above). The second will execute the stage2 compiler (which we did not build, but which you will likely need to build at some point; for example, if you want to run the entire test suite).

rustup toolchain link stage1 build/<host-triple>/stage1
rustup toolchain link stage2 build/<host-triple>/stage2

The <host-triple> would typically be one of the following:

  • Linux: x86_64-unknown-linux-gnu
  • Mac: x86_64-apple-darwin
  • Windows: x86_64-pc-windows-msvc

Now you can run the rustc you built with. If you run with -vV, you should see a version number ending in -dev, indicating a build from your local environment:

$ rustc +stage1 -vV
rustc 1.48.0-dev
binary: rustc
commit-hash: unknown
commit-date: unknown
host: x86_64-unknown-linux-gnu
release: 1.48.0-dev
LLVM version: 11.0

Other x.py commands

Here are a few other useful x.py commands. We'll cover some of them in detail in other sections:

  • Building things:
    • ./x.py build – builds everything using the stage 1 compiler, not just up to std
    • ./x.py build --stage 2 – builds the stage2 compiler
  • Running tests (see the section on running tests for more details):
    • ./x.py test library/std – runs the #[test] tests from std
    • ./x.py test src/test/ui – runs the ui test suite
    • ./x.py test src/test/ui/const-generics - runs all the tests in the const-generics/ subdirectory of the ui test suite
    • ./x.py test src/test/ui/const-generics/const-types.rs - runs the single test const-types.rs from the ui test suite

Cleaning out build directories

Sometimes you need to start fresh, but this is normally not the case. If you need to run this, then rustbuild is most likely not acting right, and you should file a bug about what is going wrong. If you do need to clean everything up, then you only need to run one command!

./x.py clean

rm -rf build works too, but then you have to rebuild LLVM, which can take a long time even on fast computers.

Prerequisites

Dependencies

Before building the compiler, you need the following things installed:

  • python 3 or 2.7 (under the name python; python2 or python3 will not work)
  • curl
  • git
  • ssl which comes in libssl-dev or openssl-devel
  • pkg-config if you are compiling on Linux and targeting Linux

If building LLVM from source (the default), you'll need additional tools:

  • g++ 5.1 or later, clang++ 3.5 or later, or MSVC 2017 or later.
  • ninja, or GNU make 3.81 or later (ninja is recommended, especially on Windows)
  • cmake 3.4.3 or later

Otherwise, you'll need LLVM installed and llvm-config in your path. See this section for more info.

Windows

winget is a Windows package manager that makes package installation easy on Windows.

Run the following in a terminal:

winget install python
winget install cmake

If any of those are already installed, winget will detect them. Then edit your system's PATH variable and add: C:\Program Files\CMake\bin.

For more information about building on Windows, see the rust-lang/rust README.

Hardware

These are not so much requirements as recommendations:

  • ~15GB of free disk space (~25GB or more if doing incremental builds).
  • >= 8GB RAM
  • >= 4 cores
  • Internet access

Beefier machines will lead to much faster builds. If your machine is not very powerful, a common strategy is to only use ./x.py check on your local machine and let the CI build test your changes when you push to a PR branch.

rustc and toolchain installation

Follow the installation instructions given in the Rust book to install a working rustc and the necessary C/C++ toolchain on your platform.

Suggested Workflows

The full bootstrapping process takes quite a while. Here are some suggestions to make your life easier.

Installing a pre-commit hook

CI will automatically fail your build if it doesn't pass tidy, our internal tool for ensuring code quality. If you'd like, you can install a Git hook that will automatically run x.py test tidy --bless on each commit, to ensure your code is up to par. If you decide later that this behavior is undesirable, you can delete the pre-commit file in .git/hooks.

A prebuilt git hook lives at src/etc/pre-commit.sh which can be copied into your .git/hooks folder as pre-commit (without the .sh extension!).
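
For example, from the root of your checkout:

cp src/etc/pre-commit.sh .git/hooks/pre-commit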

You can also install the hook as a step of running x.py setup!

Configuring rust-analyzer for rustc

rust-analyzer can help you check and format your code whenever you save a file. By default, rust-analyzer runs the cargo check and rustfmt commands, but you can override these commands to use versions of these tools better suited to hacking on rustc. For example, for Visual Studio Code, you can write:

{
    "rust-analyzer.checkOnSave.overrideCommand": [
        "./x.py",
        "check",
        "--json-output"
    ],
    "rust-analyzer.rustfmt.overrideCommand": [
      "./build/TARGET_TRIPLE/stage0/bin/rustfmt"
    ],
    "editor.formatOnSave": true
}

in your .vscode/settings.json file. This will ask rust-analyzer to use x.py check to check the sources, and the stage 0 rustfmt to format them.

If running x.py check on save is inconvenient, in VS Code you can use a Build Task instead:

// .vscode/tasks.json
{
    "version": "2.0.0",
    "tasks": [
        {
            "label": "./x.py check",
            "command": "./x.py check",
            "type": "shell",
            "problemMatcher": "$rustc",
            "presentation": { "clear": true },
            "group": { "kind": "build", "isDefault": true }
        }
    ]
}

Check, check, and check again

When doing simple refactorings, it can be useful to run ./x.py check continuously. If you set up rust-analyzer as described above, this will be done for you every time you save a file. Here you are just checking that the compiler can build, but often that is all you need (e.g., when renaming a method). You can then run ./x.py build when you actually need to run tests.

In fact, it is sometimes useful to put off tests even when you are not 100% sure the code will work. You can then keep building up refactoring commits and only run the tests at some later time. You can then use git bisect to track down precisely which commit caused the problem. A nice side-effect of this style is that you are left with a fairly fine-grained set of commits at the end, all of which build and pass tests. This often helps reviewing.

Configuring rustup to use nightly

Some parts of the bootstrap process use pinned nightly versions of tools like rustfmt. To make things like cargo fmt work correctly in your repo, run

cd <path to rustc repo>
rustup override set nightly

after installing a nightly toolchain with rustup. Don't forget to do this for all directories for which you have set up a worktree. You may need to use the pinned nightly version from src/stage0.txt, but often the normal nightly channel will work.
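
If it doesn't, you can pin the dated nightly instead (the date below is purely illustrative; use the one listed in src/stage0.txt):

rustup override set nightly-2020-08-23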

Note: see the section on rust-analyzer above for how to configure it with the real rustfmt that x.py uses, and the section on rustup for how to set up a rustup toolchain for your bootstrapped compiler.

Note: this does not allow you to build rustc with cargo directly; you still have to use x.py to work on the compiler or standard library. This just lets you use cargo fmt.

Incremental builds with --keep-stage

Sometimes just checking whether the compiler builds is not enough. A common example is that you need to add a debug! statement to inspect the value of some state or better understand the problem. In that case, you really need a full build. By leveraging incremental, though, you can often get these builds to complete very fast (e.g., around 30 seconds). The only catch is this requires a bit of fudging and may produce compilers that don't work (but that is easily detected and fixed).

The sequence of commands you want is as follows:

  • Initial build: ./x.py build -i library/std
    • As documented previously, this will build a functional stage1 compiler as part of running all stage0 commands (which include building a std compatible with the stage1 compiler) as well as the first few steps of the "stage 1 actions" up to "stage1 (sysroot stage1) builds std".
  • Subsequent builds: ./x.py build -i library/std --keep-stage 1
    • Note that we added the --keep-stage 1 flag here

As mentioned, the effect of --keep-stage 1 is that we just assume that the old standard library can be re-used. If you are editing the compiler, this is almost always true: you haven't changed the standard library, after all. But sometimes, it's not true: for example, if you are editing the "metadata" part of the compiler, which controls how the compiler encodes types and other states into the rlib files, or if you are editing things that wind up in the metadata (such as the definition of the MIR).

The TL;DR is that you might get weird behavior from a compile when using --keep-stage 1 -- for example, strange ICEs or other panics. In that case, you should simply remove the --keep-stage 1 from the command and rebuild. That ought to fix the problem.

You can also use --keep-stage 1 when running tests. Something like this:

  • Initial test run: ./x.py test -i src/test/ui
  • Subsequent test run: ./x.py test -i src/test/ui --keep-stage 1

Fine-tuning optimizations

Setting optimize = false makes the compiler too slow for tests. However, to improve the test cycle, you can disable optimizations selectively only for the crates you'll have to rebuild (source). For example, when working on rustc_mir_build, the rustc_mir_build and rustc_driver crates take the most time to incrementally rebuild. You could therefore set the following in the root Cargo.toml:

[profile.release.package.rustc_mir_build]
opt-level = 0
[profile.release.package.rustc_driver]
opt-level = 0

Working on multiple branches at the same time

Working on multiple branches in parallel can be a little annoying, since building the compiler on one branch will cause the old build and the incremental compilation cache to be overwritten. One solution would be to have multiple clones of the repository, but that would mean storing the Git metadata multiple times, and having to update each clone individually.

Fortunately, Git has a better solution called worktrees. This lets you create multiple "working trees", which all share the same Git database. Moreover, because all of the worktrees share the same object database, if you update a branch (e.g. master) in any of them, you can use the new commits from any of the worktrees. One caveat, though, is that submodules do not get shared. They will still be cloned multiple times.

Given you are inside the root directory for your rust repository, you can create a "linked working tree" in a new "rust2" directory by running the following command:

git worktree add ../rust2

Creating a new worktree for a new branch based on master looks like:

git worktree add -b my-feature ../rust2 master

You can then use that rust2 folder as a separate workspace for modifying and building rustc!

Skipping LLVM Build

By default, LLVM is built from source, and that takes a significant amount of time. One way to avoid that is to add this to config.toml:

[llvm]
download-ci-llvm = true

Downloading LLVM from CI is still experimental though, and might not be available on all platforms. Otherwise, we'd make it a default!

Another alternative is to use LLVM already installed on your computer. This is specified in the target section of config.toml:

[target.x86_64-unknown-linux-gnu]
llvm-config = "/path/to/llvm/llvm-7.0.1/bin/llvm-config"

We have observed the following paths before, which may be different from your system:

  • /usr/bin/llvm-config-8
  • /usr/lib/llvm-8/bin/llvm-config

Note that you need to have the LLVM FileCheck tool installed, which is used for codegen tests. This tool is normally built with LLVM, but if you use your own preinstalled LLVM, you will need to provide FileCheck in some other way. On Debian-based systems, you can install the llvm-N-tools package (where N is the LLVM version number, e.g. llvm-8-tools). Alternatively, you can specify the path to FileCheck with the llvm-filecheck config item in config.toml, or you can disable codegen tests with the codegen-tests item in config.toml.
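
As a sketch of what those last two options look like in config.toml (the path and target triple are illustrative; pick one option or the other):

[target.x86_64-unknown-linux-gnu]
# Point the build at an externally provided FileCheck binary
llvm-filecheck = "/usr/lib/llvm-8/bin/FileCheck"

[rust]
# Or skip the codegen tests entirely
codegen-tests = false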

Build distribution artifacts

You might want to build and package up the compiler for distribution. You’ll want to run this command to do it:

./x.py dist

Install distribution artifacts

If you’ve built a distribution artifact you might want to install it and test that it works on your target system. You’ll want to run this command:

./x.py install

Note: If you are testing out a modification to a compiler, you might want to use it to compile some project. Usually, you do not want to use ./x.py install for testing. Rather, you should create a toolchain as discussed in here.

For example, if the toolchain you created is called foo, you would then invoke it with rustc +foo ... (where ... represents the rest of the arguments).

Documenting rustc

You might want to build documentation of the various components available, like the standard library. There are two ways to go about this. You can run rustdoc directly on the file to make sure the HTML is correct, which is fast. Alternatively, you can build the documentation as part of the build process through x.py. Both are viable methods, since documentation is more about the content.

Document everything

This uses the beta rustdoc, which usually but not always has the same output as stage 1 rustdoc.

./x.py doc

If you want to be sure that the links behave the same as on CI:

./x.py doc --stage 1

First, the compiler and rustdoc get built to make sure everything is okay, and then it documents the files.

Document specific components

./x.py doc src/doc/book
./x.py doc src/doc/nomicon
./x.py doc src/doc/book library/std

Much like individual tests or building certain components you can build only the documentation you want.

Document internal rustc items

Compiler documentation is not built by default. To enable it, modify config.toml:

[build]
compiler-docs = true

Note that when enabled, documentation for internal compiler items will also be built.

Compiler Documentation

The documentation for the rust components is found at rustc doc.

The walking tour of rustdoc

Rustdoc actually uses the rustc internals directly. It lives in-tree with the compiler and standard library. This chapter is about how it works. For information about Rustdoc's features and how to use them, see the Rustdoc book.

Rustdoc is implemented entirely within the crate librustdoc. It runs the compiler up to the point where we have an internal representation of a crate (HIR) and the ability to run some queries about the types of items. HIR and queries are discussed in the linked chapters.

librustdoc performs two major steps after that to render a set of documentation:

  • "Clean" the AST into a form that's more suited to creating documentation (and slightly more resistant to churn in the compiler).
  • Use this cleaned AST to render a crate's documentation, one page at a time.

Naturally, there's more than just this, and those descriptions simplify out lots of details, but that's the high-level overview.

(Side note: librustdoc is a library crate! The rustdoc binary is created using the project in src/tools/rustdoc. Note that literally all that does is call the main() that's in this crate's lib.rs, though.)

Cheat sheet

  • Use ./x.py build to make a usable rustdoc you can run on other projects.
    • Add library/test to be able to use rustdoc --test.
    • If you've used rustup toolchain link local /path/to/build/$TARGET/stage1 previously, then after the previous build command, cargo +local doc will Just Work.
  • Use ./x.py doc --stage 1 library/std to use this rustdoc to generate the standard library docs.
    • The completed docs will be available in build/$TARGET/doc/std, though the bundle is meant to be used as though you would copy out the doc folder to a web server, since that's where the CSS/JS and landing page are.
  • Use x.py test src/test/rustdoc* to run the tests using a stage1 rustdoc.
  • Most of the HTML printing code is in html/format.rs and html/render.rs. It's in a bunch of fmt::Display implementations and supplementary functions.
  • The types that got Display impls above are defined in clean/mod.rs, right next to the custom Clean trait used to process them out of the rustc HIR.
  • The bits specific to using rustdoc as a test harness are in test.rs.
  • The Markdown renderer is loaded up in html/markdown.rs, including functions for extracting doctests from a given block of Markdown.
  • The tests on rustdoc output are located in src/test/rustdoc, where they're handled by the test runner of rustbuild and the supplementary script src/etc/htmldocck.py.
  • Tests on search index generation are located in src/test/rustdoc-js, as a series of JavaScript files that encode queries on the standard library search index and expected results.

See also

For more details about how rustdoc works, see the page on rustdoc internals.

ctags

One of the challenges with rustc is that the RLS can't handle it, since it's a bootstrapping compiler. This makes code navigation difficult. One solution is to use ctags.

ctags has a long history and several variants. Exuberant Ctags seems to be quite commonly distributed but it does not have out-of-box Rust support. Some distributions seem to use Universal Ctags, which is a maintained fork and does have built-in Rust support.

The following script can be used to set up Exuberant Ctags: https://github.com/nikomatsakis/rust-etags.

ctags integrates into emacs and vim quite easily. The following can then be used to build and generate tags:

$ rust-ctags src/lib* && ./x.py build <something>

This allows you to do "jump-to-def" with whatever functions were around when you last built, which is ridiculously useful.

Adding a new target

These are a set of steps to add support for a new target. There are numerous end states and paths to get there, so not all sections may be relevant to your desired goal.

Specifying a new LLVM

For very new targets, you may need to use a different fork of LLVM than what is currently shipped with Rust. In that case, navigate to the src/llvm-project git submodule (you might need to run x.py check at least once so the submodule is updated), check out the appropriate commit for your fork, then commit that new submodule reference in the main Rust repository.

An example would be:

cd src/llvm-project
git remote add my-target-llvm some-llvm-repository
git checkout my-target-llvm/my-branch
cd ..
git add llvm-project
git commit -m 'Use my custom LLVM'

If you have a local LLVM checkout that is already built, you may be able to configure Rust to treat your build as the system LLVM to avoid redundant builds.

Creating a target specification

You should start with a target JSON file. You can see the specification for an existing target using --print target-spec-json:

rustc -Z unstable-options --target=wasm32-unknown-unknown --print target-spec-json

Save that JSON to a file and modify it as appropriate for your target.

Adding a target specification

Once you have filled out a JSON specification and been able to compile somewhat successfully, you can copy the specification into the compiler itself.

You will need to add a line to the big table inside of the supported_targets macro in the rustc_target::spec module. You will then add a corresponding file for your new target containing a target function.

Look for existing targets to use as examples.

Patching crates

You may need to make changes to crates that the compiler depends on, such as libc or cc. If so, you can use Cargo's [patch] ability. For example, if you want to use an unreleased version of libc, you can add it to the top-level Cargo.toml file:

diff --git a/Cargo.toml b/Cargo.toml
index be15e50e2bc..4fb1248ba99 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -66,10 +66,11 @@ cargo = { path = "src/tools/cargo" }
 [patch.crates-io]
 # Similar to Cargo above we want the RLS to use a vendored version of `rustfmt`
 # that we're shipping as well (to ensure that the rustfmt in RLS and the
 # `rustfmt` executable are the same exact version).
 rustfmt-nightly = { path = "src/tools/rustfmt" }
+libc = { git = "https://github.com/rust-lang/libc", rev = "0bf7ce340699dcbacabdf5f16a242d2219a49ee0" }

 # See comments in `src/tools/rustc-workspace-hack/README.md` for what's going on
 # here
 rustc-workspace-hack = { path = 'src/tools/rustc-workspace-hack' }

After this, run cargo update -p libc to update the lockfiles.

Cross-compiling

Once you have a target specification in JSON and in the code, you can cross-compile rustc:

DESTDIR=/path/to/install/in \
./x.py install -i --stage 1 --host aarch64-apple-darwin.json --target aarch64-apple-darwin \
compiler/rustc library/std

If your target specification is already available in the bootstrap compiler, you can use it instead of the JSON file for both arguments.

Promoting a target from tier 2 (target) to tier 2 (host)

There are two levels of tier 2 targets:

  • Targets that are only cross-compiled (rustup target add)
  • Targets that have a native toolchain (rustup toolchain install)

For an example of promoting a target from cross-compiled to native, see #75914.

The compiler testing framework

The Rust project runs a wide variety of different tests, orchestrated by the build system (x.py test). The main test harness for testing the compiler itself is a tool called compiletest (located in the src/tools/compiletest directory). This section gives a brief overview of how the testing framework is set up, and then gets into some of the details on how to run tests as well as how to add new tests.

Compiletest test suites

The compiletest tests are located in the tree in the src/test directory. Immediately within you will see a series of subdirectories (e.g. ui, run-make, and so forth). Each of those directories is called a test suite – they house a group of tests that are run in a distinct mode.

Here is a brief summary of the test suites and what they mean. In some cases, the test suites are linked to parts of the manual that give more details.

  • ui – tests that check the exact stdout/stderr from compilation and/or running the test
  • run-pass-valgrind – tests that ought to run with valgrind
  • run-fail – tests that are expected to compile but then panic during execution
  • compile-fail – tests that are expected to fail compilation.
  • parse-fail – tests that are expected to fail to parse
  • pretty – tests targeting the Rust "pretty printer", which generates valid Rust code from the AST
  • debuginfo – tests that run in gdb or lldb and query the debug info
  • codegen – tests that compile and then test the generated LLVM code to make sure that the optimizations we want are taking effect. See LLVM docs for how to write such tests.
  • assembly – similar to codegen tests, but verifies assembly output to make sure LLVM target backend can handle provided code.
  • mir-opt – tests that check parts of the generated MIR to make sure we are building things correctly or doing the optimizations we expect.
  • incremental – tests for incremental compilation, checking that when certain modifications are performed, we are able to reuse the results from previous compilations.
  • run-make – tests that basically just execute a Makefile; the ultimate in flexibility but quite annoying to write.
  • rustdoc – tests for rustdoc, making sure that the generated files contain the expected documentation.
  • *-fulldeps – same as above, but indicates that the test depends on things other than std (and hence those things must be built)
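
Each suite is run by passing its src/test subdirectory to x.py test, following the same pattern as the ui suite shown earlier. For example, to run only the MIR optimization tests:

./x.py test src/test/mir-opt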

Other Tests

The Rust build system handles running tests for various other things, including:

  • Tidy – This is a custom tool used for validating source code style and formatting conventions, such as rejecting long lines. There is more information in the section on coding conventions.

    Example: ./x.py test tidy

  • Formatting – Rustfmt is integrated with the build system to enforce uniform style across the compiler. In the CI, we check that the formatting is correct. The formatting check is also automatically run by the Tidy tool mentioned above.

    Example: ./x.py fmt --check checks formatting and exits with an error if formatting is needed.

    Example: ./x.py fmt runs rustfmt on the codebase.

    Example: ./x.py test tidy --bless does formatting before doing other tidy checks.

  • Unit tests – The Rust standard library and many of the Rust packages include typical Rust #[test] unit tests. Under the hood, x.py will run cargo test on each package to run all the tests.

    Example: ./x.py test library/std

  • Doc tests – Example code embedded within Rust documentation is executed via rustdoc --test. Examples:

    ./x.py test src/doc – Runs rustdoc --test for all documentation in src/doc.

    ./x.py test --doc library/std – Runs rustdoc --test on the standard library.

  • Link checker – A small tool for verifying href links within documentation.

    Example: ./x.py test src/tools/linkchecker

  • Dist check – This verifies that the source distribution tarball created by the build system will unpack, build, and run all tests.

    Example: ./x.py test distcheck

  • Tool tests – Packages that are included with Rust have all of their tests run as well (typically by running cargo test within their directory). This includes things such as cargo, clippy, rustfmt, rls, miri, bootstrap (testing the Rust build system itself), etc.

  • Cargo test – This is a small tool which runs cargo test on a few significant projects (such as servo, ripgrep, tokei, etc.) just to ensure there aren't any significant regressions.

    Example: ./x.py test src/tools/cargotest

Testing infrastructure

When a Pull Request is opened on GitHub, GitHub Actions will automatically launch a build that will run all tests on some configurations (x86_64-gnu-llvm-8 linux, x86_64-gnu-tools linux, mingw-check linux). In essence, it runs ./x.py test after building for each of them.

The integration bot bors is used for coordinating merges to the master branch. When a PR is approved, it goes into a queue where merges are tested one at a time on a wide set of platforms using GitHub Actions (currently over 50 different configurations). Due to the limit on the number of parallel jobs, we run CI under the rust-lang-ci organization except for PRs. Most platforms only run the build steps, some run a restricted set of tests, only a subset run the full suite of tests (see Rust's platform tiers).

Testing with Docker images

The Rust tree includes Docker image definitions for the platforms used on GitHub Actions in src/ci/docker. The script src/ci/docker/run.sh is used to build the Docker image, run it, build Rust within the image, and run the tests.

You can run these images on your local development machine. This can be helpful to test environments different from your local system. First you will need to install Docker on a Linux, Windows, or macOS system (typically Linux will be much faster than Windows or macOS because the latter use virtual machines to emulate a Linux environment). To enter interactive mode, which will start a bash shell in the container, run src/ci/docker/run.sh --dev <IMAGE> where <IMAGE> is one of the directory names in src/ci/docker (for example x86_64-gnu is a fairly standard Ubuntu environment).

The docker script will mount your local rust source tree in read-only mode, and an obj directory in read-write mode. All of the compiler artifacts will be stored in the obj directory. The shell will start out in the obj directory. From there, you can run ../src/ci/run.sh which will run the build as defined by the image.

Alternatively, you can run individual commands to do specific tasks. For example, you can run python3 ../x.py test src/test/ui to just run UI tests. Note that there is some configuration in the src/ci/run.sh script that you may need to recreate. Particularly, set submodules = false in your config.toml so that it doesn't attempt to modify the read-only directory.
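
That setting lives in the [build] section of config.toml; a minimal sketch:

[build]
# Prevent x.py from trying to update submodules in the read-only mount
submodules = false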

Some additional notes about using the Docker images:

  • Some of the std tests require IPv6 support. Docker on Linux seems to have it disabled by default. Run the commands in enable-docker-ipv6.sh to enable IPv6 before creating the container. This only needs to be done once.
  • The container will be deleted automatically when you exit the shell, however the build artifacts persist in the obj directory. If you are switching between different Docker images, the artifacts from previous environments stored in the obj directory may confuse the build system. Sometimes you will need to delete parts or all of the obj directory before building inside the container.
  • The container is bare-bones, with only a minimal set of packages. You may want to install some things like apt install less vim.
  • You can open multiple shells in the container. First you need the container name (a short hash), which is displayed in the shell prompt, or you can run docker container ls outside of the container to list the available containers. With the container name, run docker exec -it <CONTAINER> /bin/bash where <CONTAINER> is the container name like 4ba195e95cef.

Running tests on a remote machine

Tests may be run on a remote machine (e.g. to test builds for a different architecture). This is done using remote-test-client on the build machine to send test programs to remote-test-server running on the remote machine. remote-test-server executes the test programs and sends the results back to the build machine. remote-test-server provides unauthenticated remote code execution so be careful where it is used.

To do this, first build remote-test-server for the remote machine, e.g. for RISC-V:

./x.py build src/tools/remote-test-server --target riscv64gc-unknown-linux-gnu

The binary will be created at ./build/$HOST_ARCH/stage2-tools/$TARGET_ARCH/release/remote-test-server. Copy this over to the remote machine.

On the remote machine, run the remote-test-server with the remote argument (and optionally -v for verbose output). Output should look like this:

$ ./remote-test-server -v remote
starting test server
listening on 0.0.0.0:12345!

You can test if the remote-test-server is working by connecting to it and sending ping\n. It should reply pong:

$ nc $REMOTE_IP 12345
ping
pong

To run tests using the remote runner, set the TEST_DEVICE_ADDR environment variable then use x.py as usual. For example, to run ui tests for a RISC-V machine with the IP address 1.2.3.4 use

export TEST_DEVICE_ADDR="1.2.3.4:12345"
./x.py test src/test/ui --target riscv64gc-unknown-linux-gnu

If remote-test-server was run with the verbose flag, output on the test machine may look something like this:

[...]
run "/tmp/work/test1007/a"
run "/tmp/work/test1008/a"
run "/tmp/work/test1009/a"
run "/tmp/work/test1010/a"
run "/tmp/work/test1011/a"
run "/tmp/work/test1012/a"
run "/tmp/work/test1013/a"
run "/tmp/work/test1014/a"
run "/tmp/work/test1015/a"
run "/tmp/work/test1016/a"
run "/tmp/work/test1017/a"
run "/tmp/work/test1018/a"
[...]

Tests are built on the machine running x.py, not on the remote machine. Tests which fail to build unexpectedly (or ui tests producing incorrect build output) may fail without ever running on the remote machine.

Testing on emulators

Some platforms are tested via an emulator for architectures that aren't readily available. For architectures where the standard library is well supported and the host operating system supports TCP/IP networking, see the above instructions for testing on a remote machine (in this case the remote machine is emulated).

There is also a set of tools for orchestrating running the tests within the emulator. Platforms such as arm-android and arm-unknown-linux-gnueabihf are set up to automatically run the tests under emulation on GitHub Actions. The following describes how a target's tests are run under emulation.

The Docker image for armhf-gnu includes QEMU to emulate the ARM CPU architecture. Included in the Rust tree are the tools remote-test-client and remote-test-server, which are programs for sending test programs and libraries to the emulator, running the tests within the emulator, and reading the results. The Docker image is set up to launch remote-test-server, and the build tools use remote-test-client to communicate with the server to coordinate running tests (see src/bootstrap/test.rs).

TODO: Is there any support for using an iOS emulator?

It's also unclear to me how the wasm or asm.js tests are run.

Crater

Crater is a tool for compiling and running tests for every crate on crates.io (and a few on GitHub). It is mainly used for checking the extent of breakage when implementing potentially breaking changes, and for ensuring a lack of breakage by running the beta vs. stable compiler versions.

When to run Crater

You should request a crater run if your PR makes large changes to the compiler or could cause breakage. If you are unsure, feel free to ask your PR's reviewer.

Requesting Crater Runs

The rust team maintains a few machines that can be used for running crater runs on the changes introduced by a PR. If your PR needs a crater run, leave a comment for the triage team in the PR thread. Please inform the team whether you require a "check-only" crater run, a "build-only" crater run, or a "build-and-test" crater run. The difference is primarily in time; the conservative (if you're not sure) option is to go for the build-and-test run. If making changes that will only have an effect at compile-time (e.g., implementing a new trait) then you only need a check run.

Your PR will be enqueued by the triage team and the results will be posted when they are ready. Check runs take around 3-4 days; the other two take 5-6 days on average.

While crater is really useful, it is also important to be aware of a few caveats:

  • Not all code is on crates.io! There is a lot of code in repos on GitHub and elsewhere. Also, companies may not wish to publish their code. Thus, a successful crater run is not a magic green light that there will be no breakage; you still need to be careful.

  • Crater only runs Linux builds on x86_64. Thus, other architectures and platforms are not tested. Critically, this includes Windows.

  • Many crates are not tested. This could be for a lot of reasons, including that the crate doesn't compile any more (e.g. used old nightly features), has broken or flaky tests, requires network access, or other reasons.

  • Before crater can be run, @bors try needs to succeed in building artifacts. This means that if your code doesn't compile, you cannot run crater.

Perf runs

A lot of work is put into improving the performance of the compiler and preventing performance regressions. A "perf run" is used to compare the performance of the compiler in different configurations for a large collection of popular crates. Different configurations include "fresh builds", builds with incremental compilation, etc.

The result of a perf run is a comparison between two versions of the compiler (by their commit hashes).

You should request a perf run if your PR may affect performance, especially if it can affect performance adversely.

Further reading

The following blog posts may also be of interest:

Running tests

You can run the tests using x.py. The most basic command – which you will almost never want to use! – is as follows:

./x.py test

This will build the stage 1 compiler and then run the whole test suite. You probably don't want to do this very often, because it takes a very long time, and anyway bors / GitHub Actions will do it for you. (Often, I will run this command in the background after opening a PR that I think is done, but rarely otherwise. -nmatsakis)

The test results are cached and previously successful tests are ignored during testing. The stdout/stderr contents as well as a timestamp file for every test can be found under build/ARCH/test/. To force-rerun a test (e.g. in case the test runner fails to notice a change) you can simply remove the timestamp file.
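
For example, to force a single test to rerun, you might do something like the following (a hypothetical sketch; the exact directory layout under build/ depends on your host triple and the test suite):

# Remove the cached timestamp for one test, then run it again.
rm build/x86_64-unknown-linux-gnu/test/ui/issues/issue-1234/stamp
./x.py test src/test/ui --test-args issue-1234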

Note that some tests require a Python-enabled gdb. You can test if your gdb install supports Python by using the python command from within gdb. Once invoked, you can type some Python code (e.g. print("hi")) followed by return and then CTRL+D to execute it. If you are building gdb from source, you will need to configure with --with-python=<path-to-python-binary>.

Running a subset of the test suites

When working on a specific PR, you will usually want to run a smaller set of tests. For example, a good "smoke test" that can be used after modifying rustc to see if things are generally working correctly would be the following:

./x.py test src/test/{ui,compile-fail}

This will run the ui and compile-fail test suites. Of course, the choice of test suites is somewhat arbitrary, and may not suit the task you are doing. For example, if you are hacking on debuginfo, you may be better off with the debuginfo test suite:

./x.py test src/test/debuginfo

If you only need to test a specific subdirectory of tests for any given test suite, you can pass that directory to x.py test:

./x.py test src/test/ui/const-generics

Likewise, you can test a single file by passing its path:

./x.py test src/test/ui/const-generics/const-test.rs

Run only the tidy script

./x.py test tidy

Run tests on the standard library

./x.py test --stage 0 library/std

Run the tidy script and tests on the standard library

./x.py test --stage 0 tidy library/std

Run tests on the standard library using a stage 1 compiler

./x.py test library/std

By listing which test suites you want to run you avoid having to run tests for components you did not change at all.

Warning: Note that bors only runs the tests with the full stage 2 build; therefore, while the tests usually work fine with stage 1, there are some limitations.

Running an individual test

Another common thing that people want to do is to run an individual test, often the test they are trying to fix. As mentioned earlier, you may pass the full file path to achieve this, or alternatively you may invoke x.py with the --test-args option:

./x.py test src/test/ui --test-args issue-1234

Under the hood, the test runner invokes the standard rust test runner (the same one you get with #[test]), so this command would wind up filtering for tests that include "issue-1234" in the name. (Thus --test-args is a good way to run a collection of related tests.)

Editing and updating the reference files

If you have changed the compiler's output intentionally, or you are making a new test, you can pass --bless to the test subcommand. E.g. if some tests in src/test/ui are failing, you can run

./x.py test src/test/ui --bless

to automatically adjust the .stderr, .stdout or .fixed files of all tests. Of course you can also target just specific tests with the --test-args your_test_name flag, just like when running the tests.

Passing --pass $mode

Pass UI tests now have three modes: check-pass, build-pass, and run-pass. When --pass $mode is passed, these tests will be forced to run under the given $mode unless the directive // ignore-pass exists in the test file. For example, you can run all the tests in src/test/ui as check-pass:

./x.py test src/test/ui --pass check

By passing --pass $mode, you can reduce the testing time. For each mode, please see here.

Using incremental compilation

You can further enable the --incremental flag to save additional time in subsequent rebuilds:

./x.py test src/test/ui --incremental --test-args issue-1234

If you don't want to include the flag with every command, you can enable it in the config.toml:

[rust]
incremental = true

Note that incremental compilation will use more disk space than usual. If disk space is a concern for you, you might want to check the size of the build directory from time to time.

Running tests with different "compare modes"

UI tests may have different output depending on certain "modes" that the compiler is in. For example, when in "non-lexical lifetimes" (NLL) mode a test foo.rs will first look for expected output in foo.nll.stderr, falling back to the usual foo.stderr if not found. To run the UI test suite in NLL mode, one would use the following:

./x.py test src/test/ui --compare-mode=nll

Other examples of compare-modes are "noopt", "migrate", and revisions.

Running tests manually

Sometimes it's easier and faster to just run the test by hand. Most tests are just .rs files, so you can do something like

rustc +stage1 src/test/ui/issue-1234.rs

This is much faster, but doesn't always work. For example, some tests include directives that specify particular compiler flags, or that rely on other crates, and they may not run the same without those options.
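
For instance, if a test's header contains a // compile-flags directive, you would need to pass those flags by hand (a sketch; the test path and flag are illustrative):

rustc +stage1 -Coverflow-checks=off src/test/ui/issue-1234.rs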

Adding new tests

In general, we expect every PR that fixes a bug in rustc to come accompanied by a regression test of some kind. This test should fail in master but pass after the PR. These tests are really useful for preventing us from repeating the mistakes of the past.

To add a new test, the first thing you generally do is to create a file, typically a Rust source file. Test files have a particular structure:

  • They should have some kind of comment explaining what the test is about (see the section on test comments below);
  • next, they can have one or more header commands, which are special comments that the test interpreter knows how to interpret (see the header commands section below);
  • finally, they have the Rust source. This may have various error annotations which indicate expected compile errors or warnings (see the error annotations section below).

Depending on the test suite, there may be some other details to be aware of:

  • For the ui test suite, you need to generate reference output files (see the guide to the UI tests below).

What kind of test should I add?

It can be difficult to know what kind of test to use. Here are some rough heuristics:

  • Some tests have specialized needs:
    • need to run gdb or lldb? use the debuginfo test suite
    • need to inspect LLVM IR or MIR? use the codegen or mir-opt test suites
    • need to run rustdoc? Prefer a rustdoc or rustdoc-ui test. Occasionally you'll need rustdoc-js as well.
    • need to inspect the resulting binary in some way? Then use run-make
  • Library tests should go in library/${crate}/tests (where ${crate} is usually core, alloc, or std). Library tests include:
    • tests that an API behaves properly, including accepting various types or having some runtime behavior
    • tests where any compiler warnings are not relevant to the test
    • tests that a use of an API gives a compile error, where the exact error message is not relevant to the test. These should have an error number (E0XXX) in the code block to make sure it's the correct error.
  • For most other things, a ui (or ui-fulldeps) test is to be preferred:
    • ui tests subsume run-pass, compile-fail, and parse-fail tests
    • in the case of warnings or errors, ui tests capture the full output, which makes it easier to review but also helps prevent "hidden" regressions in the output

Naming your test

We have not traditionally had a lot of structure in the names of tests. Moreover, for a long time, the rustc test runner did not support subdirectories (it now does), so test suites like src/test/ui have a huge mess of files in them. This is not considered an ideal setup.

For regression tests – basically, some random snippet of code that came in from the internet – we often name the test after the issue plus a short description. Ideally, the test should be added to a directory that helps identify what piece of code is being tested (e.g., src/test/ui/borrowck/issue-54597-reject-move-out-of-borrow-via-pat.rs). If you've tried and cannot find a more relevant place, the test may be added to src/test/ui/issues/. Still, do include the issue number somewhere.

When writing a new feature, create a subdirectory to store your tests. For example, if you are implementing RFC 1234 ("Widgets"), then it might make sense to put the tests in a directory like src/test/ui/rfc1234-widgets/.

In other cases, there may already be a suitable directory. (The proper directory structure to use is actually an area of active debate.)

Comment explaining what the test is about

When you create a test file, include a comment summarizing the point of the test at the start of the file. This should highlight which parts of the test are more important, and what the bug was that the test is fixing. Citing an issue number is often very helpful.

This comment doesn't have to be super extensive. Just something like "Regression test for #18060: match arms were matching in the wrong order." might already be enough.

These comments are very useful to others later on when your test breaks, since they often can highlight what the problem is. They are also useful if for some reason the tests need to be refactored, since they let others know which parts of the test were important (often a test must be rewritten because it no longer tests what it was meant to test, and then it's useful to know what it was meant to test exactly).
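
For example, the top of a test file might look like this (a minimal sketch; the test body here is hypothetical):

// Regression test for #18060: match arms were matching in the wrong order.

fn main() {
    assert_eq!(3, match 2 { 1..=5 => 3, _ => 4 });
}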

Header commands: configuring rustc

Header commands are special comments that the test runner knows how to interpret. They must appear before the Rust source in the test. They are normally put after the short comment that explains the point of this test. For example, this test uses the // compile-flags command to specify a custom flag to give to rustc when the test is compiled:

// Test the behavior of `0 - 1` when overflow checks are disabled.

// compile-flags: -Coverflow-checks=off

fn main() {
    let x = 0 - 1;
    ...
}

Ignoring tests

These are used to ignore the test in some situations, which means the test won't be compiled or run.

  • ignore-X where X is a target detail or stage will ignore the test accordingly (see below)
  • only-X is like ignore-X, but will only run the test on that target or stage
  • ignore-pretty will not compile the pretty-printed test (this is done to test the pretty-printer, but might not always work)
  • ignore-test always ignores the test
  • ignore-lldb and ignore-gdb will skip a debuginfo test on that debugger.
  • ignore-gdb-version can be used to ignore the test when certain gdb versions are used

Some examples of X in ignore-X:

  • Architecture: aarch64, arm, asmjs, mips, wasm32, x86_64, x86, ...
  • OS: android, emscripten, freebsd, ios, linux, macos, windows, ...
  • Environment (fourth word of the target triple): gnu, msvc, musl.
  • Pointer width: 32bit, 64bit.
  • Stage: stage0, stage1, stage2.
  • When cross compiling: cross-compile
  • When remote testing is used: remote
  • When debug-assertions are enabled: debug
  • When particular debuggers are being tested: cdb, gdb, lldb
  • Specific compare modes: compare-mode-nll, compare-mode-polonius
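
For example, a test that should run only on 64-bit Linux, and be skipped when gdb is being tested, might start like this (a sketch combining values from the lists above):

// only-linux
// only-64bit
// ignore-gdb

fn main() {}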

Other Header Commands

Here is a list of other header commands. This list is not exhaustive. Header commands can generally be found by browsing the TestProps structure found in header.rs from the compiletest source.

  • run-rustfix for UI tests, indicates that the test produces structured suggestions. The test writer should create a .fixed file, which contains the source with the suggestions applied. When the test is run, compiletest first checks that the correct lint/warning is generated. Then, it applies the suggestion and compares against .fixed (they must match). Finally, the fixed source is compiled, and this compilation is required to succeed. The .fixed file can also be generated automatically with the --bless option, described in this section.
  • min-gdb-version specifies the minimum gdb version required for this test; see also ignore-gdb-version
  • min-lldb-version specifies the minimum lldb version required for this test
  • rust-lldb causes the lldb part of the test to only be run if the lldb in use contains the Rust plugin
  • no-system-llvm causes the test to be ignored if the system llvm is used
  • min-llvm-version specifies the minimum llvm version required for this test
  • min-system-llvm-version specifies the minimum system llvm version required for this test; the test is ignored if the system llvm is in use and it doesn't meet the minimum version. This is useful when an llvm feature has been backported to rust-llvm
  • ignore-llvm-version can be used to skip the test when certain LLVM versions are used. This takes one or two arguments; the first argument is the first version to ignore. If no second argument is given, all subsequent versions are ignored; otherwise, the second argument is the last version to ignore.
  • build-pass for UI tests, indicates that the test is supposed to successfully compile and link, as opposed to the default where the test is supposed to error out.
  • compile-flags passes extra command-line args to the compiler, e.g. // compile-flags: -g, which forces debuginfo to be enabled.
  • edition controls the edition the test should be compiled with (defaults to 2015). Example usage: // edition:2018.
  • should-fail indicates that the test should fail; used for "meta testing", where we test the compiletest program itself to check that it will generate errors in appropriate scenarios. This header is ignored for pretty-printer tests.
  • gate-test-X where X is a feature marks the test as "gate test" for feature X. Such tests are supposed to ensure that the compiler errors when usage of a gated feature is attempted without the proper #![feature(X)] tag. Each unstable lang feature is required to have a gate test.
  • needs-profiler-support - a profiler runtime is required, i.e., profiler = true in rustc's config.toml.
  • needs-sanitizer-support - a sanitizer runtime is required, i.e., sanitizers = true in rustc's config.toml.
  • needs-sanitizer-{address,leak,memory,thread} - indicates that the test requires a target with support for AddressSanitizer, LeakSanitizer, MemorySanitizer or ThreadSanitizer respectively.
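
As an example, here is a sketch of how a few of these commands might combine in a single UI test (the lint and code are illustrative, not taken from a real test):

// run-rustfix
// edition:2018

#![deny(unused_parens)]

fn main() {
    let _x = (1 + 2); //~ ERROR unnecessary parentheses
}

The corresponding .fixed file would contain the same source with the parentheses removed, i.e. let _x = 1 + 2;.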

Error annotations

Error annotations specify the errors that the compiler is expected to emit. They are "attached" to the line in source where the error is located. Error annotations are considered during tidy lints of line length and should be formatted according to tidy requirements. You may use an error message prefix sub-string if necessary to meet line length requirements. Make sure that the text is long enough for the error message to be self-documenting.

The association between an error annotation and the source line it applies to is defined with the following set of idioms:

  • ~: Associates the following error level and message with the current line
  • ~|: Associates the following error level and message with the same line as the previous comment
  • ~^: Associates the following error level and message with the previous error annotation line. Each caret (^) that you add adds a line to this, so ~^^^ is three lines above the error annotation line.

Error annotation examples

Here are examples of error annotations on different lines of UI test source.

Positioned on error line

Use the //~ ERROR idiom:

fn main() {
    let x = (1, 2, 3);
    match x {
        (_a, _x @ ..) => {} //~ ERROR `_x @` is not allowed in a tuple
        _ => {}
    }
}

Positioned below error line

Use the //~^ idiom, with the number of carets indicating the number of lines above the annotation where the error occurs. In the example below, the error line is four lines above the error annotation line, so four carets are included in the annotation.

fn main() {
    let x = (1, 2, 3);
    match x {
        (_a, _x @ ..) => {}  // <- the error is on this line
        _ => {}
    }
}
//~^^^^ ERROR `_x @` is not allowed in a tuple

Use same error line as defined on error annotation line above

Use the //~| idiom to define the same error line as the error annotation line above:

struct Binder(i32, i32, i32);

fn main() {
    let x = Binder(1, 2, 3);
    match x {
        Binder(_a, _x @ ..) => {}  // <- the error is on this line
        _ => {}
    }
}
//~^^^^ ERROR `_x @` is not allowed in a tuple struct
//~| ERROR this pattern has 1 field, but the corresponding tuple struct has 3 fields [E0023]

The error levels that you can have are:

  1. ERROR
  2. WARNING
  3. NOTE
  4. HELP and SUGGESTION*

* Note: SUGGESTION must follow immediately after HELP.

Revisions

Certain classes of tests support "revisions" (as of the time of this writing, this includes compile-fail, run-fail, and incremental, though incremental tests are somewhat different). Revisions allow a single test file to be used for multiple tests. This is done by adding a special header at the top of the file:


// revisions: foo bar baz

This will result in the test being compiled (and tested) three times, once with --cfg foo, once with --cfg bar, and once with --cfg baz. You can therefore use #[cfg(foo)] etc within the test to tweak each of these results.

You can also customize headers and expected error messages to a particular revision. To do this, add [foo] (or bar, baz, etc) after the // comment, like so:


// A flag to pass in only for cfg `foo`:
//[foo]compile-flags: -Z verbose

#[cfg(foo)]
fn test_foo() {
    let x: usize = 32_u32; //[foo]~ ERROR mismatched types
}

Note that not all headers have meaning when customized to a revision. For example, the ignore-test header (and all "ignore" headers) currently only apply to the test as a whole, not to particular revisions. The only headers that are intended to really work when customized to a revision are error patterns and compiler flags.

Guide to the UI tests

The UI tests are intended to capture the compiler's complete output, so that we can test all aspects of the presentation. They work by compiling a file (e.g., ui/hello_world/main.rs), capturing the output, and then applying some normalization (see below). This normalized result is then compared against reference files named ui/hello_world/main.stderr and ui/hello_world/main.stdout. If either of those files doesn't exist, the output must be empty (that is actually the case for this particular test). If the test run fails, we will print out the current output, but it is also saved in build/<target-triple>/test/ui/hello_world/main.stdout (this path is printed as part of the test failure message), so you can run diff and so forth.

Tests that do not result in compile errors

By default, a UI test is expected not to compile (in which case, it should contain at least one //~ ERROR annotation). However, you can also make UI tests where compilation is expected to succeed, and you can even run the resulting program. Just add one of the following header commands:

  • // check-pass - compilation should succeed but skip codegen (which is expensive and isn't supposed to fail in most cases)
  • // build-pass – compilation and linking should succeed but do not run the resulting binary
  • // run-pass – compilation should succeed and we should run the resulting binary
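
For example, a minimal run-pass test might look like this:

// run-pass
// Check that basic arithmetic still works (a trivial sketch).

fn main() {
    assert_eq!(2 + 2, 4);
}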

Normalization

The compiler output is normalized to eliminate output differences between platforms, mainly differences in filenames.

The following strings replace their corresponding values:

  • $DIR: The directory where the test is defined.
    • Example: /path/to/rust/src/test/ui/error-codes
  • $SRC_DIR: The root source directory.
    • Example: /path/to/rust/src
  • $TEST_BUILD_DIR: The base directory where the test's output goes.
    • Example: /path/to/rust/build/x86_64-unknown-linux-gnu/test/ui

Additionally, the following changes are made:

  • Line and column numbers for paths in $SRC_DIR are replaced with LL:COL. For example, /path/to/rust/library/core/src/clone.rs:122:8 is replaced with $SRC_DIR/core/src/clone.rs:LL:COL.

    Note: The line and column numbers for --> lines pointing to the test are not normalized, and left as-is. This ensures that the compiler continues to point to the correct location, and keeps the stderr files readable. Ideally all line/column information would be retained, but small changes to the source cause large diffs, and more frequent merge conflicts and test errors. See also -Z ui-testing below, which applies additional line number normalization.

  • \t is replaced with an actual tab character.

  • Error line annotations like //~ ERROR some message are removed.

  • Backslashes (\) are converted to forward slashes (/) within paths (using a heuristic). This helps normalize differences with Windows-style paths.

  • CRLF newlines are converted to LF.

Additionally, the compiler is run with the -Z ui-testing flag which causes the compiler itself to apply some changes to the diagnostic output to make it more suitable for UI testing. For example, it will anonymize line numbers in the output (line numbers prefixing each source line are replaced with LL). In extremely rare situations, this mode can be disabled with the header command // compile-flags: -Z ui-testing=no.

Sometimes these built-in normalizations are not enough. In such cases, you may provide custom normalization rules using the header commands, e.g.


// normalize-stdout-test: "foo" -> "bar"
// normalize-stderr-32bit: "fn\(\) \(32 bits\)" -> "fn\(\) \($$PTR bits\)"
// normalize-stderr-64bit: "fn\(\) \(64 bits\)" -> "fn\(\) \($$PTR bits\)"

This tells the test that, on 32-bit platforms, whenever the compiler writes fn() (32 bits) to stderr, it should be normalized to read fn() ($PTR bits) instead. Similarly for 64-bit. The replacement is performed by regexes, using the default regex flavor provided by the regex crate.

The corresponding reference file will use the normalized output to test both 32-bit and 64-bit platforms:

...
   |
   = note: source type: fn() ($PTR bits)
   = note: target type: u16 (16 bits)
...

Please see ui/transmute/main.rs and main.stderr for a concrete usage example.

Besides normalize-stderr-32bit and -64bit, one may use any target information or stage supported by ignore-X here as well (e.g. normalize-stderr-windows or simply normalize-stderr-test for unconditional replacement).

Using compiletest commands to control test execution

Introduction

compiletest is the main test harness of the Rust test suite. It allows test authors to organize large numbers of tests (the Rust compiler has many thousands), to execute tests efficiently (parallel execution is supported), and to configure the behavior and expected results of both individual tests and groups of tests.

compiletest tests may check test code for success, for runtime failure, or for compile-time failure. Tests are typically organized as a Rust source file with annotations in comments before and/or within the test code, which serve to direct compiletest on whether or how to run the test, what behavior to expect, and more. If you are unfamiliar with the compiler testing framework, see this chapter for additional background.

The tests themselves are typically (but not always) organized into "suites" – for example, run-fail, a folder holding tests that should compile successfully but return a failure (non-zero status) at runtime; compile-fail, a folder holding tests that should fail to compile; and many more. The various suites are defined in src/tools/compiletest/src/common.rs in the pub enum Mode declaration. A good introduction to the different suites of compiler tests, along with details about them, can be found in Adding new tests.

Adding a new test file

Briefly: create your new test in the appropriate location under src/test. No registration of test files is necessary, as compiletest will scan the src/test subfolder recursively and will execute any Rust source files it finds as tests. See Adding new tests for a complete guide on how to add new tests.

Header Commands

Source file annotations which appear in comments near the top of the source file before any test code are known as header commands. These commands can instruct compiletest to ignore this test, set expectations on whether it is expected to succeed at compiling, or what the test's return code is expected to be. Header commands and inline //~ ERROR commands are described more fully here.

Adding a new header command

Header commands are defined in the TestProps struct in src/tools/compiletest/src/header.rs. At a high level, there are dozens of test properties defined here, all set to default values in the TestProps struct's impl block. Any test can override this default value by specifying the property in question as a header command in a comment (//) in the test source file, before any source code.

Using a header command

Here is an example, specifying the must-compile-successfully header command, which takes no arguments, followed by the failure-status header command, which takes a single argument (which, in this case, is 1). failure-status is instructing compiletest to expect a failure status of 1 (rather than the current Rust default of 101). The header command and the argument list (if present) are typically separated by a colon:

// must-compile-successfully
// failure-status: 1

#![feature(termination_trait)]

use std::io::{Error, ErrorKind};

fn main() -> Result<(), Box<Error>> {
    Err(Box::new(Error::new(ErrorKind::Other, "returned Box<Error> from main()")))
}

Adding a new header command property

One would add a new header command if there is a need to define some test property or behavior on an individual, test-by-test basis. A header command property serves as the header command's backing store (holds the command's current value) at runtime.

To add a new header command property:

  1. Look for the pub struct TestProps declaration in src/tools/compiletest/src/header.rs and add the new public property to the end of the declaration.
  2. Look for the impl TestProps implementation block immediately following the struct declaration and initialize the new property to its default value.

Adding a new header command parser

When compiletest encounters a test file, it parses the file a line at a time by calling every parser defined in the Config struct's implementation block, also in src/tools/compiletest/src/header.rs (note that the Config struct's declaration block is found in src/tools/compiletest/src/common.rs). TestProps's load_from() method will try passing the current line of text to each parser, which, in turn, typically checks to see if the line begins with a particular commented (//) header command such as // must-compile-successfully or // failure-status. Whitespace after the comment marker is optional.

Parsers will override a given header command property's default value merely by being specified in the test file as a header command or by having a parameter value specified in the test file, depending on the header command.

Parsers defined in impl Config are typically named parse_<header_command> (note kebab-case <header-command> transformed to snake-case <header_command>). impl Config also defines several 'low-level' parsers which make it simple to parse common patterns like simple presence or not (parse_name_directive()), header-command:parameter(s) (parse_name_value_directive()), optional parsing only if a particular cfg attribute is defined (has_cfg_prefix()) and many more. The low-level parsers are found near the end of the impl Config block; be sure to look through them and their associated parsers immediately above to see how they are used to avoid writing additional parsing code unnecessarily.

As a concrete example, here is the implementation for the parse_failure_status() parser, in src/tools/compiletest/src/header.rs:

@@ -232,6 +232,7 @@ pub struct TestProps {
     // customized normalization rules
     pub normalize_stdout: Vec<(String, String)>,
     pub normalize_stderr: Vec<(String, String)>,
+    pub failure_status: i32,
 }

 impl TestProps {
@@ -260,6 +261,7 @@ impl TestProps {
             run_pass: false,
             normalize_stdout: vec![],
             normalize_stderr: vec![],
+            failure_status: 101,
         }
     }

@@ -383,6 +385,10 @@ impl TestProps {
             if let Some(rule) = config.parse_custom_normalization(ln, "normalize-stderr") {
                 self.normalize_stderr.push(rule);
             }
+
+            if let Some(code) = config.parse_failure_status(ln) {
+                self.failure_status = code;
+            }
         });

         for key in &["RUST_TEST_NOCAPTURE", "RUST_TEST_THREADS"] {
@@ -488,6 +494,13 @@ impl Config {
         self.parse_name_directive(line, "pretty-compare-only")
     }

+    fn parse_failure_status(&self, line: &str) -> Option<i32> {
+        match self.parse_name_value_directive(line, "failure-status") {
+            Some(code) => code.trim().parse::<i32>().ok(),
+            _ => None,
+        }
+    }

Implementing the behavior change

When a test invokes a particular header command, it is expected that some behavior will change as a result. What behavior, obviously, will depend on the purpose of the header command. In the case of failure-status, the behavior that changes is that compiletest expects the failure code defined by the header command invoked in the test, rather than the default value.

Although this is specific to failure-status (every header command will have a different implementation in order to invoke its behavior change), it is perhaps helpful to see the behavior change implementation of one case, simply as an example. To implement failure-status, the check_correct_failure_status() function found in the TestCx implementation block, located in src/tools/compiletest/src/runtest.rs, was modified as per below:

@@ -295,11 +295,14 @@ impl<'test> TestCx<'test> {
     }

     fn check_correct_failure_status(&self, proc_res: &ProcRes) {
-        // The value the rust runtime returns on failure
-        const RUST_ERR: i32 = 101;
-        if proc_res.status.code() != Some(RUST_ERR) {
+        let expected_status = Some(self.props.failure_status);
+        let received_status = proc_res.status.code();
+
+        if expected_status != received_status {
             self.fatal_proc_rec(
-                &format!("failure produced the wrong error: {}", proc_res.status),
+                &format!("Error: expected failure status ({:?}) but received status {:?}.",
+                         expected_status,
+                         received_status),
                 proc_res,
             );
         }
@@ -320,7 +323,6 @@ impl<'test> TestCx<'test> {
         );

         let proc_res = self.exec_compiled_test();
-
         if !proc_res.status.success() {
             self.fatal_proc_rec("test run failed!", &proc_res);
         }
@@ -499,7 +501,6 @@ impl<'test> TestCx<'test> {
                 expected,
                 actual
             );
-            panic!();
         }
     }

Note the use of self.props.failure_status to access the header command property. In tests which do not specify the failure status header command, self.props.failure_status will evaluate to the default value of 101 at the time of this writing. But for a test which specifies a header command of, for example, // failure-status: 1, self.props.failure_status will evaluate to 1, as parse_failure_status() will have overridden the TestProps default value, for that test specifically.

Debugging the compiler

This chapter contains a few tips to debug the compiler. These tips aim to be useful no matter what you are working on. Some of the other chapters have advice about specific parts of the compiler (e.g. the Queries Debugging and Testing chapter or the LLVM Debugging chapter).

-Z flags

The compiler has a bunch of -Z flags. These are unstable flags that are only enabled on nightly. Many of them are useful for debugging. To get a full listing of -Z flags, use -Z help.

One useful flag is -Z verbose, which generally enables printing more info that could be useful for debugging.

Getting a backtrace

When you have an ICE (panic in the compiler), you can set RUST_BACKTRACE=1 to get the stack trace of the panic! like in normal Rust programs. IIRC backtraces don't work on MinGW, sorry. If you have trouble or the backtraces are full of unknowns, you might want to find some way to use Linux, Mac, or MSVC on Windows.

In the default configuration, you don't have line numbers enabled, so the backtrace looks like this:

stack backtrace:
   0: std::sys::imp::backtrace::tracing::imp::unwind_backtrace
   1: std::sys_common::backtrace::_print
   2: std::panicking::default_hook::{{closure}}
   3: std::panicking::default_hook
   4: std::panicking::rust_panic_with_hook
   5: std::panicking::begin_panic
   (~~~~ LINES REMOVED BY ME FOR BREVITY ~~~~)
  32: rustc_typeck::check_crate
  33: <std::thread::local::LocalKey<T>>::with
  34: <std::thread::local::LocalKey<T>>::with
  35: rustc::ty::context::TyCtxt::create_and_enter
  36: rustc_driver::driver::compile_input
  37: rustc_driver::run_compiler

If you want line numbers for the stack trace, you can enable debug = true in your config.toml and rebuild the compiler (debuginfo-level = 1 will also add line numbers, but debug = true gives full debuginfo). Then the backtrace will look like this:

stack backtrace:
   (~~~~ LINES REMOVED BY ME FOR BREVITY ~~~~)
             at /home/user/rust/compiler/rustc_typeck/src/check/cast.rs:110
   7: rustc_typeck::check::cast::CastCheck::check
             at /home/user/rust/compiler/rustc_typeck/src/check/cast.rs:572
             at /home/user/rust/compiler/rustc_typeck/src/check/cast.rs:460
             at /home/user/rust/compiler/rustc_typeck/src/check/cast.rs:370
   (~~~~ LINES REMOVED BY ME FOR BREVITY ~~~~)
  33: rustc_driver::driver::compile_input
             at /home/user/rust/compiler/rustc_driver/src/driver.rs:1010
             at /home/user/rust/compiler/rustc_driver/src/driver.rs:212
  34: rustc_driver::run_compiler
             at /home/user/rust/compiler/rustc_driver/src/lib.rs:253

Getting a backtrace for errors

If you want to get a backtrace to the point where the compiler emits an error message, you can pass -Z treat-err-as-bug=n, which will make the compiler panic on the nth error or delay_span_bug. If you leave off =n, the compiler will assume 1 for n and thus panic on the first error it encounters.

This can also help when debugging delay_span_bug calls - it will make the first delay_span_bug call panic, which will give you a useful backtrace.

For example:

$ cat error.rs
fn main() {
    1 + ();
}
$ ./build/x86_64-unknown-linux-gnu/stage1/bin/rustc error.rs
error[E0277]: the trait bound `{integer}: std::ops::Add<()>` is not satisfied
 --> error.rs:2:7
  |
2 |     1 + ();
  |       ^ no implementation for `{integer} + ()`
  |
  = help: the trait `std::ops::Add<()>` is not implemented for `{integer}`

error: aborting due to previous error

$ # Now, where does the error above come from?
$ RUST_BACKTRACE=1 \
    ./build/x86_64-unknown-linux-gnu/stage1/bin/rustc \
    error.rs \
    -Z treat-err-as-bug
error[E0277]: the trait bound `{integer}: std::ops::Add<()>` is not satisfied
 --> error.rs:2:7
  |
2 |     1 + ();
  |       ^ no implementation for `{integer} + ()`
  |
  = help: the trait `std::ops::Add<()>` is not implemented for `{integer}`

error: internal compiler error: unexpected panic

note: the compiler unexpectedly panicked. this is a bug.

note: we would appreciate a bug report: https://github.com/rust-lang/rust/blob/master/CONTRIBUTING.md#bug-reports

note: rustc 1.24.0-dev running on x86_64-unknown-linux-gnu

note: run with `RUST_BACKTRACE=1` for a backtrace

thread 'rustc' panicked at 'encountered error with `-Z treat_err_as_bug',
/home/user/rust/compiler/rustc_errors/src/lib.rs:411:12
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose
backtrace.
stack backtrace:
  (~~~ IRRELEVANT PART OF BACKTRACE REMOVED BY ME ~~~)
   7: rustc::traits::error_reporting::<impl rustc::infer::InferCtxt<'a, 'tcx>>
             ::report_selection_error
             at /home/user/rust/compiler/rustc_middle/src/traits/error_reporting.rs:823
   8: rustc::traits::error_reporting::<impl rustc::infer::InferCtxt<'a, 'tcx>>
             ::report_fulfillment_errors
             at /home/user/rust/compiler/rustc_middle/src/traits/error_reporting.rs:160
             at /home/user/rust/compiler/rustc_middle/src/traits/error_reporting.rs:112
   9: rustc_typeck::check::FnCtxt::select_obligations_where_possible
             at /home/user/rust/compiler/rustc_typeck/src/check/mod.rs:2192
  (~~~ IRRELEVANT PART OF BACKTRACE REMOVED BY ME ~~~)
  36: rustc_driver::run_compiler
             at /home/user/rust/compiler/rustc_driver/src/lib.rs:253
$ # Cool, now I have a backtrace for the error

Getting logging output

These crates are used in the compiler for logging:

  • log
  • env_logger

The compiler has a lot of debug! calls, which print out logging information at many points. These are very useful to at least narrow down the location of a bug if not to find it entirely, or just to orient yourself as to why the compiler is doing a particular thing.

To see the logs, you need to set the RUSTC_LOG environment variable to your log filter, e.g. to get the logs for a specific module, you can run the compiler as RUSTC_LOG=module::path rustc my-file.rs. All debug! output will then appear in standard error.

If you are developing rustdoc, use RUSTDOC_LOG instead.

See the env-logger doc for more info on the full syntax. (Note: unlike the compiler, the env-logger crate and its examples use the RUST_LOG env variable.)

Note that unless you use a very strict filter, the logger will emit a lot of output, so use the most specific module(s) you can (comma-separated if multiple). It's typically a good idea to pipe standard error to a file and look at the log output with a text editor.

So, to put it together:

# This puts the output of all debug calls in `rustc_middle/src/traits` into
# standard error, which might fill your console backscroll.
$ RUSTC_LOG=rustc_middle::traits rustc +local my-file.rs

# This puts the output of all debug calls in `rustc_middle/src/traits` in
# `traits-log`, so you can then see it with a text editor.
$ RUSTC_LOG=rustc_middle::traits rustc +local my-file.rs 2>traits-log

# Not recommended. This will show the output of all `debug!` calls
# in the Rust compiler, and there are a *lot* of them, so it will be
# hard to find anything.
$ RUSTC_LOG=debug rustc +local my-file.rs 2>all-log

# This will show the output of all `info!` calls in `rustc_trans`.
#
# There's an `info!` statement in `trans_instance` that outputs
# every function that is translated. This is useful to find out
# which function triggers an LLVM assertion, and this is an `info!`
# log rather than a `debug!` log so it will work on the official
# compilers.
$ RUSTC_LOG=rustc_trans=info rustc +local my-file.rs

# This will show the output of all `info!` calls made by rustdoc or any rustc library it calls.
$ RUSTDOC_LOG=info rustdoc +local my-file.rs

# This will only show `debug!` calls made by rustdoc directly, not any `rustc*` crate.
$ RUSTDOC_LOG=rustdoc rustdoc +local my-file.rs

How to keep or remove debug! and trace! calls from the resulting binary

While calls to error!, warn! and info! are included in every build of the compiler, calls to debug! and trace! are only included in the program if debug-logging=true is turned on in config.toml (it is turned off by default). So if you don't see DEBUG logs, especially if you run the compiler with RUSTC_LOG=rustc rustc some.rs and only see INFO logs, make sure that debug-logging=true is turned on in your config.toml.

In some cases, just setting it will not trigger a rebuild, so if you changed it and you already have a compiler built, you might want to call x.py clean to force one.
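
The setting looks like this in config.toml (a minimal sketch):

[rust]
debug-logging = true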

Logging etiquette and conventions

Because calls to debug! are removed by default, in most cases, don't worry about adding "unnecessary" calls to debug! and leaving them in code you commit - they won't slow down the performance of what we ship, and if they helped you pin down a bug, they will probably help someone else with a different one.

A loosely followed convention is to use debug!("foo(...)") at the start of a function foo and debug!("foo: ...") within the function. Another loosely followed convention is to use the {:?} format specifier for debug logs.
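
As a self-contained illustration of these conventions, here is a sketch using the ordinary log and env_logger crates (assumed as dependencies) rather than rustc's internal logging setup; the function is hypothetical:

use log::debug;

fn foo(x: u32) -> u32 {
    debug!("foo(x={:?})", x);           // entering `foo`
    let result = x * 2;
    debug!("foo: result={:?}", result); // progress within `foo`
    result
}

fn main() {
    env_logger::init();
    foo(21);
}

Run it with RUST_LOG=debug to see the output (as noted above, the env-logger crate uses RUST_LOG rather than RUSTC_LOG).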

One thing to be careful of is expensive operations in logs.

If in the module rustc::foo you have a statement

debug!("{:?}", random_operation(tcx));

Then, if someone runs a debug build of rustc with RUSTC_LOG=rustc::bar, random_operation() will still run.

This means that you should not put anything too expensive or likely to crash there - that would annoy anyone who wants to use logging for their own module. No one will know about it until someone tries to use logging to find another bug.
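
One way to avoid the problem, sketched again with the log crate outside of rustc, is to guard expensive work behind an explicit log_enabled! check:

use log::{debug, log_enabled, Level};

// Stand-in for something costly or likely to crash.
fn random_operation() -> Vec<u32> {
    (0..1_000_000).collect()
}

fn main() {
    env_logger::init();
    // `random_operation` only runs when debug logging is actually
    // enabled for this module, not merely in any debug build.
    if log_enabled!(Level::Debug) {
        debug!("{:?}", random_operation());
    }
}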

Formatting Graphviz output (.dot files)

Some compiler options for debugging specific features yield graphviz graphs - e.g. the #[rustc_mir(borrowck_graphviz_postflow="suffix.dot")] attribute dumps various borrow-checker dataflow graphs.

These all produce .dot files. To view these files, install graphviz (e.g. apt-get install graphviz) and then run the following commands:

$ dot -T pdf maybe_init_suffix.dot > maybe_init_suffix.pdf
$ firefox maybe_init_suffix.pdf # Or your favorite pdf viewer

Viewing Spanview output (.html files)

In addition to graphviz output, MIR debugging flags include an option to generate a MIR representation called Spanview that uses HTML to highlight code regions in the original source code and display compiler metadata associated with each region. -Zdump-mir-spanview, for example, highlights spans associated with each MIR Statement, Terminator, and/or BasicBlock.

These .html files use CSS features to dynamically expand spans obscured by overlapping spans, and native tooltips (based on the HTML title attribute) to reveal the actual MIR elements, as text.

To view these files, simply use a modern browser, or a CSS-capable HTML preview feature in a modern IDE. (The default HTML preview pane in VS Code is known to work, for instance.)

Narrowing (Bisecting) Regressions

The cargo-bisect-rustc tool can be used as a quick and easy way to find exactly which PR caused a change in rustc behavior. It automatically downloads rustc PR artifacts and tests them against a project you provide until it finds the regression. You can then look at the PR to get more context on why it was changed. See this tutorial on how to use it.

Downloading Artifacts from Rust's CI

The rustup-toolchain-install-master tool by kennytm can be used to download the artifacts produced by Rust's CI for a specific SHA1 -- this basically corresponds to the successful landing of some PR -- and then sets them up for your local use. This also works for artifacts produced by @bors try. This is helpful when you want to examine the resulting build of a PR without doing the build yourself.

Debugging type layouts

The (permanently) unstable #[rustc_layout] attribute can be used to dump the Layout of the type it is attached to. For example:


#![feature(rustc_attrs)]

#[rustc_layout(debug)]
type T<'a> = &'a u32;

This will emit the following:

error: layout_of(&'a u32) = Layout {
    fields: Primitive,
    variants: Single {
        index: 0,
    },
    abi: Scalar(
        Scalar {
            value: Pointer,
            valid_range: 1..=18446744073709551615,
        },
    ),
    largest_niche: Some(
        Niche {
            offset: Size {
                raw: 0,
            },
            scalar: Scalar {
                value: Pointer,
                valid_range: 1..=18446744073709551615,
            },
        },
    ),
    align: AbiAndPrefAlign {
        abi: Align {
            pow2: 3,
        },
        pref: Align {
            pow2: 3,
        },
    },
    size: Size {
        raw: 8,
    },
}
 --> src/lib.rs:4:1
  |
4 | type T<'a> = &'a u32;
  | ^^^^^^^^^^^^^^^^^^^^^

error: aborting due to previous error

Profiling the compiler

This section talks about how to profile the compiler and find out where it spends its time.

Depending on what you're trying to measure, there are several different approaches:

  • If you want to see if a PR improves or regresses compiler performance:

    • The rustc-perf project makes this easy and can be triggered to run on a PR via the @rust-timer bot.
  • If you want a medium-to-high level overview of where rustc is spending its time:

    • The -Zself-profile flag and measureme tools offer a query-based approach to profiling. See their docs for more information.
  • If you want function level performance data or even just more details than the above approaches:

    • Consider using a native code profiler such as perf
    • or tracy for a nanosecond-precision, full-featured graphical interface.
  • If you want a nice visual representation of the compile times of your crate graph, you can use cargo's -Ztimings flag, e.g. cargo -Ztimings build. You can use this flag on the compiler itself with CARGOFLAGS="-Ztimings" ./x.py build

Optimizing rustc's bootstrap times with cargo-llvm-lines

Using cargo-llvm-lines you can count the number of lines of LLVM IR across all instantiations of a generic function. Since most of the time compiling rustc is spent in LLVM, the idea is that by reducing the amount of code passed to LLVM, compiling rustc gets faster.

Example usage:

cargo install cargo-llvm-lines
# On a normal crate you could now run `cargo llvm-lines`, but x.py isn't normal :P

# Do a clean before every run, to not mix in the results from previous runs.
./x.py clean
RUSTFLAGS="--emit=llvm-ir" ./x.py build --stage 0 compiler/rustc

# Single crate, eg. rustc_middle
cargo llvm-lines --files ./build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/debug/deps/rustc_middle-a539a639bdab6513.ll > llvm-lines-middle.txt
# Specify all crates of the compiler. (Relies on the glob support of your shell.)
cargo llvm-lines --files ./build/x86_64-unknown-linux-gnu/stage0-rustc/x86_64-unknown-linux-gnu/debug/deps/*.ll > llvm-lines.txt

Example output:

  Lines            Copies        Function name
  -----            ------        -------------
  11802479 (100%)  52848 (100%)  (TOTAL)
   1663902 (14.1%)   400 (0.8%)  rustc_query_system::query::plumbing::get_query_impl::{{closure}}
    683526 (5.8%)  10579 (20.0%) core::ptr::drop_in_place
    568523 (4.8%)    528 (1.0%)  rustc_query_system::query::plumbing::get_query_impl
    472715 (4.0%)   1134 (2.1%)  hashbrown::raw::RawTable<T>::reserve_rehash
    306782 (2.6%)   1320 (2.5%)  rustc_middle::ty::query::plumbing::<impl rustc_query_system::query::QueryContext for rustc_middle::ty::context::TyCtxt>::start_query::{{closure}}::{{closure}}::{{closure}}
    212800 (1.8%)    514 (1.0%)  rustc_query_system::dep_graph::graph::DepGraph<K>::with_task_impl
    194813 (1.7%)    124 (0.2%)  rustc_query_system::query::plumbing::force_query_impl
    158488 (1.3%)      1 (0.0%)  rustc_middle::ty::query::<impl rustc_middle::ty::context::TyCtxt>::alloc_self_profile_query_strings
    119768 (1.0%)    418 (0.8%)  core::ops::function::FnOnce::call_once
    119644 (1.0%)      1 (0.0%)  rustc_target::spec::load_specific
    104153 (0.9%)      7 (0.0%)  rustc_middle::ty::context::_DERIVE_rustc_serialize_Decodable_D_FOR_TypeckResults::<impl rustc_serialize::serialize::Decodable<__D> for rustc_middle::ty::context::TypeckResults>::decode::{{closure}}
     81173 (0.7%)      1 (0.0%)  rustc_middle::ty::query::stats::query_stats
     80306 (0.7%)   2029 (3.8%)  core::ops::function::FnOnce::call_once{{vtable.shim}}
     78019 (0.7%)   1611 (3.0%)  stacker::grow::{{closure}}
     69720 (0.6%)   3286 (6.2%)  <&T as core::fmt::Debug>::fmt
     56327 (0.5%)    186 (0.4%)  rustc_query_system::query::plumbing::incremental_verify_ich
     49714 (0.4%)     14 (0.0%)  rustc_mir::dataflow::framework::graphviz::BlockFormatter<A>::write_node_label

Since this doesn't seem to work with incremental compilation or x.py check, you will be compiling rustc a lot. I recommend changing a few settings in config.toml to make it bearable:

[rust]
# A debug build takes _a third_ as long on my machine, 
# but compiling more than stage0 rustc becomes unbearably slow.
optimize = false

# We can't use incremental anyway, so we disable it for a little speed boost.
incremental = false
# We won't be running it, so no point in compiling debug checks.
debug = false

# Using a single codegen unit gives less output, but is slower to compile.
codegen-units = 0  # num_cpus

The llvm-lines output is affected by several options. optimize = false increases it from 2.1GB to 3.5GB, and codegen-units = 0 increases it further to 4.1GB.

MIR optimizations have little impact. Compared to the default RUSTFLAGS="-Zmir-opt-level=1", level 0 adds 0.3GB and level 2 removes 0.2GB. Inlining currently only happens in LLVM, but this might change in the future.

Profiling with perf

This is a guide for how to profile rustc with perf.

Initial steps

  • Get a clean checkout of rust-lang/master, or whatever it is you want to profile.
  • Set the following settings in your config.toml:
    • debuginfo-level = 1 - enables line debuginfo
    • jemalloc = false - lets you do memory use profiling with valgrind
    • leave everything else the defaults
  • Run ./x.py build to get a full build
  • Make a rustup toolchain pointing to that result

Gathering a perf profile

perf is an excellent tool on Linux that can be used to gather and analyze all kinds of information. Mostly it is used to figure out where a program spends its time. It can also be used for other sorts of events, though, like cache misses and so forth.

The basics

The basic perf command is this:

perf record -F99 --call-graph dwarf XXX

The -F99 tells perf to sample at 99 Hz, which avoids generating too much data for longer runs (why 99 Hz you ask? It is often chosen because it is unlikely to be in lockstep with other periodic activity). The --call-graph dwarf tells perf to get call-graph information from debuginfo, which is accurate. The XXX is the command you want to profile. So, for example, you might do:

perf record -F99 --call-graph dwarf cargo +<toolchain> rustc

to run cargo -- here <toolchain> should be the name of the toolchain you made in the beginning. But there are some things to be aware of:

  • You probably don't want to profile the time spent building dependencies. So something like cargo build; cargo clean -p $C may be helpful (where $C is the crate name)
    • Though usually I just do touch src/lib.rs and rebuild instead. =)
  • You probably don't want incremental messing about with your profile. So something like CARGO_INCREMENTAL=0 can be helpful.
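
Putting those points together, a typical session might look like this (a sketch; $C is your crate's name and <toolchain> is the toolchain made earlier):

cargo +<toolchain> build          # build dependencies once
touch src/lib.rs                  # force only the leaf crate to rebuild
CARGO_INCREMENTAL=0 perf record -F99 --call-graph dwarf cargo +<toolchain> build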

Gathering a perf profile from a perf.rust-lang.org test

Often we want to analyze a specific test from perf.rust-lang.org. To do that, the first step is to clone the rustc-perf repository:

git clone https://github.com/rust-lang/rustc-perf

Doing it the easy way

Once you've cloned the repo, you can use the collector executable to do profiling for you! You can find instructions in the rustc-perf readme.

For example, to measure the clap-rs test, you might do:

./target/release/collector                                      \
  --output-repo /path/to/place/output                           \
  profile perf-record                                           \
  --rustc /path/to/rustc/executable/from/your/build/directory   \
  --cargo `which cargo`                                         \
  --filter clap-rs                                              \
  --builds Check

You can also use that same command to use cachegrind or other profiling tools.

Doing it the hard way

If you prefer to run things manually, that is also possible. You first need to find the source for the test you want. Sources for the tests are found in the collector/benchmarks directory. So let's go into the directory of a specific test; we'll use clap-rs as an example:

cd collector/benchmarks/clap-rs

In this case, let's say we want to profile the cargo check performance. In that case, I would first run some basic commands to build the dependencies:

# Setup: first clean out any old results and build the dependencies:
cargo +<toolchain> clean
CARGO_INCREMENTAL=0 cargo +<toolchain> check

(Again, <toolchain> should be replaced with the name of the toolchain we made in the first step.)

Next: we want to record the execution time for just the clap-rs crate, running cargo check. I tend to use cargo rustc for this, since it also allows me to add explicit flags, which we'll do later on.

touch src/lib.rs
CARGO_INCREMENTAL=0 perf record -F99 --call-graph dwarf cargo rustc --profile check --lib

Note that final command: it's a doozy! It uses the cargo rustc command, which executes rustc with (potentially) additional options; the --profile check and --lib options specify that we are doing a cargo check execution, and that this is a library (not a binary).

At this point, we can use perf tooling to analyze the results. For example:

perf report

will open up an interactive TUI program. In simple cases, that can be helpful. For more detailed examination, the perf-focus tool can be helpful; it is covered below.

A note of caution. Each of the rustc-perf tests is its own special snowflake. In particular, some of them are not libraries, in which case you would want to do touch src/main.rs and avoid passing --lib. I'm not sure how best to tell which test is which, to be honest.
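For a binary benchmark, the equivalent sketch would look something like this (hedged: <name> stands for the binary target's name, which varies from benchmark to benchmark):

touch src/main.rs
CARGO_INCREMENTAL=0 perf record -F99 --call-graph dwarf cargo rustc --profile check --bin <name>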

Gathering NLL data

If you want to profile an NLL run, you can just pass extra options to the cargo rustc command, like so:

touch src/lib.rs
CARGO_INCREMENTAL=0 perf record -F99 --call-graph dwarf cargo rustc --profile check --lib -- -Zborrowck=mir

Analyzing a perf profile with perf focus

Once you've gathered a perf profile, we want to get some information about it. For this, I personally use perf focus. It's a simple but useful tool that lets you answer queries like:

  • "how much time was spent in function F" (no matter where it was called from)
  • "how much time was spent in function F when it was called from G"
  • "how much time was spent in function F excluding time spent in G"
  • "what functions does F call and how much time does it spend in them"

To understand how it works, you have to know just a bit about perf. Basically, perf works by sampling your process on a regular basis (or whenever some event occurs). For each sample, perf gathers a backtrace. perf focus lets you write a regular expression that tests which functions appear in that backtrace, and then tells you which percentage of samples had a backtrace that met the regular expression. It's probably easiest to explain by walking through how I would analyze NLL performance.

Installing perf-focus

You can install perf-focus using cargo install:

cargo install perf-focus

Example: How much time is spent in MIR borrowck?

Let's say we've gathered the NLL data for a test. We'd like to know how much time it is spending in the MIR borrow-checker. The "main" function of the MIR borrowck is called do_mir_borrowck, so we can do this command:

$ perf focus '{do_mir_borrowck}'
Matcher    : {do_mir_borrowck}
Matches    : 228
Not Matches: 542
Percentage : 29%

The '{do_mir_borrowck}' argument is called the matcher. It specifies the test to be applied on the backtrace. In this case, the {X} indicates that there must be some function on the backtrace that meets the regular expression X. In this case, that regex is just the name of the function we want (in fact, it's a subset of the name; the full name includes a bunch of other stuff, like the module path). In this mode, perf-focus just prints out the percentage of samples where do_mir_borrowck was on the stack: in this case, 29%.

A note about c++filt. To get the data from perf, perf focus currently executes perf script (perhaps there is a better way...). I've sometimes found that perf script outputs C++ mangled names. This is annoying. You can tell by running perf script | head yourself — if you see names like 5rustc6middle instead of rustc::middle, then you have the same problem. You can solve this by doing:

perf script | c++filt | perf focus --from-stdin ...

This will pipe the output from perf script through c++filt and should mostly convert those names into a more friendly format. The --from-stdin flag to perf focus tells it to get its data from stdin, rather than executing perf script itself. We should make this more convenient (at worst, maybe add a c++filt option to perf focus, or just always use it — it's pretty harmless).

Example: How much time does MIR borrowck spend solving traits?

Perhaps we'd like to know how much time MIR borrowck spends in the trait checker. We can ask this using a more complex regex:

$ perf focus '{do_mir_borrowck}..{^rustc::traits}'
Matcher    : {do_mir_borrowck},..{^rustc::traits}
Matches    : 12
Not Matches: 1311
Percentage : 0%

Here we used the .. operator to ask "how often do we have do_mir_borrowck on the stack and then, later, some function whose name begins with rustc::traits?" (basically, code in that module). It turns out the answer is "almost never" — only 12 samples fit that description (if you ever see no samples, that often indicates your query is messed up).

If you're curious, you can find out exactly which samples matched by using the --print-match option. This will print out the full backtrace for each sample. The | at the front of the line indicates the part that the regular expression matched.
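For example, to dump the matching backtraces from the previous query (a sketch, reusing the matcher from above):

perf focus '{do_mir_borrowck}..{^rustc::traits}' --print-match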

Example: Where does MIR borrowck spend its time?

Often we want to do more "explorational" queries. Like, we know that MIR borrowck is 29% of the time, but where does that time get spent? For that, the --tree-callees option is often the best tool. You usually also want to give --tree-min-percent or --tree-max-depth. The result looks like this:

$ perf focus '{do_mir_borrowck}' --tree-callees --tree-min-percent 3
Matcher    : {do_mir_borrowck}
Matches    : 577
Not Matches: 746
Percentage : 43%

Tree
| matched `{do_mir_borrowck}` (43% total, 0% self)
: | rustc_mir::borrow_check::nll::compute_regions (20% total, 0% self)
: : | rustc_mir::borrow_check::nll::type_check::type_check_internal (13% total, 0% self)
: : : | core::ops::function::FnOnce::call_once (5% total, 0% self)
: : : : | rustc_mir::borrow_check::nll::type_check::liveness::generate (5% total, 3% self)
: : : | <rustc_mir::borrow_check::nll::type_check::TypeVerifier<'a, 'b, 'tcx> as rustc::mir::visit::Visitor<'tcx>>::visit_mir (3% total, 0% self)
: | rustc::mir::visit::Visitor::visit_mir (8% total, 6% self)
: | <rustc_mir::borrow_check::MirBorrowckCtxt<'cx, 'tcx> as rustc_mir::dataflow::DataflowResultsConsumer<'cx, 'tcx>>::visit_statement_entry (5% total, 0% self)
: | rustc_mir::dataflow::do_dataflow (3% total, 0% self)

What happens with --tree-callees is that

  • we find each sample matching the regular expression
  • we look at the code that occurs after the regex match and try to build up a call tree

The --tree-min-percent 3 option says "only show me things that take more than 3% of the time." Without this, the tree often gets really noisy and includes random stuff like the innards of malloc. --tree-max-depth can be useful too; it just limits how many levels we print.

For each line, we display the percent of time in that function altogether ("total") and the percent of time spent in just that function and not some callee of that function (self). Usually "total" is the more interesting number, but not always.

Relative percentages

By default, all percentages in perf-focus are relative to the total program execution. This is useful to help you keep perspective — often as we drill down to find hot spots, we can lose sight of the fact that, in terms of overall program execution, this "hot spot" is actually not important. It also ensures that percentages between different queries are easily compared against one another.

That said, sometimes it's useful to get relative percentages, so perf focus offers a --relative option. In this case, the percentages are listed only for samples that match (vs all samples). So for example we could get our percentages relative to the borrowck itself like so:

$ perf focus '{do_mir_borrowck}' --tree-callees --relative --tree-max-depth 1 --tree-min-percent 5
Matcher    : {do_mir_borrowck}
Matches    : 577
Not Matches: 746
Percentage : 100%

Tree
| matched `{do_mir_borrowck}` (100% total, 0% self)
: | rustc_mir::borrow_check::nll::compute_regions (47% total, 0% self) [...]
: | rustc::mir::visit::Visitor::visit_mir (19% total, 15% self) [...]
: | <rustc_mir::borrow_check::MirBorrowckCtxt<'cx, 'tcx> as rustc_mir::dataflow::DataflowResultsConsumer<'cx, 'tcx>>::visit_statement_entry (13% total, 0% self) [...]
: | rustc_mir::dataflow::do_dataflow (8% total, 1% self) [...]

Here you see that compute_regions came up as "47% total" — that means that 47% of do_mir_borrowck is spent in that function. Before, we saw 20% — that's because do_mir_borrowck itself is only 43% of the total time (and .47 * .43 = .20).

crates.io Dependencies

The rust compiler supports building with some dependencies from crates.io. For example, log and env_logger come from crates.io.

In general, you should avoid adding dependencies to the compiler for several reasons:

  • The dependency may not be high quality or well-maintained, whereas we want the compiler to be high-quality.
  • The dependency may not be using a compatible license.
  • The dependency may have transitive dependencies that have one of the above problems.

TODO: what is the vetting process?

Permitted dependencies

The tidy tool has a list of crates that are allowed. To add a dependency that is not already in the compiler, you will need to add it to the list.
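For instance, after adding the new crate to the relevant Cargo.toml, you can run tidy locally to check whether the dependency is accepted (a sketch; if the crate is not on the allowed list, tidy should fail with a message naming the offending crate):

./x.py test tidy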

Contributing to Rust

Thank you for your interest in contributing to Rust! There are many ways to contribute, and we appreciate all of them.

If you have questions, please make a post on internals.rust-lang.org or hop on the Rust Discord server or Rust Zulip server.

As a reminder, all contributors are expected to follow our Code of Conduct.

If this is your first time contributing, the Getting Started and walkthrough chapters can give you a good example of how a typical contribution would go.

Feature Requests

Feature requests need to go through a process to be approved by the relevant teams. Usually this requires a Final Comment Period (FCP) or even a Request for Comments (RFC). See Getting Started for more information about these processes.

Bug Reports

While bugs are unfortunate, they're a reality in software. We can't fix what we don't know about, so please report liberally. If you're not sure if something is a bug or not, feel free to file a bug anyway.

If you believe reporting your bug publicly represents a security risk to Rust users, please follow our instructions for reporting security vulnerabilities.

If you're using the nightly channel, please check if the bug exists in the latest toolchain before filing your bug. It might be fixed already.

If you have the chance, before reporting a bug, please search existing issues, as it's possible that someone else has already reported your error. This doesn't always work, and sometimes it's hard to know what to search for, so consider this extra credit. We won't mind if you accidentally file a duplicate report.

Similarly, to help others who encountered the bug find your issue, consider filing an issue with a descriptive title, which contains information that might be unique to it. This can be the language or compiler feature used, the conditions that trigger the bug, or part of the error message if there is any. An example could be: "impossible case reached" on lifetime inference for impl Trait in return position.

Opening an issue is as easy as following this link and filling out the fields in the appropriate provided template.

Pull Requests

Pull requests (or PRs for short) are the primary mechanism we use to change Rust. GitHub itself has some great documentation on using the Pull Request feature. We use the "fork and pull" model described here, where contributors push changes to their personal fork and create pull requests to bring those changes into the source repository.

All pull requests are reviewed by another person. We have a bot, @rust-highfive, that will automatically assign a random person to review your request.

If you want to request that a specific person reviews your pull request, you can add an r? to the pull request description. For example, Steve usually reviews documentation changes. So if you were to make a documentation change, add

r? @steveklabnik

to the end of the pull request description, and @rust-highfive will assign @steveklabnik instead of a random person. This is entirely optional.

In addition to being reviewed by a human, pull requests are automatically tested thanks to continuous integration (CI). Basically, every time you open and update a pull request, CI builds the compiler and tests it against the compiler test suite, and also performs other tests such as checking that your pull request is in compliance with Rust's style guidelines.

Running continuous integration tests allows PR authors to catch mistakes early without going through a first review cycle, and also helps reviewers stay aware of the status of a particular pull request.

Rust has plenty of CI capacity, and you should never have to worry about wasting computational resources each time you push a change. It is also perfectly fine (and even encouraged!) to use the CI to test your changes if it can help your productivity. In particular, we don't recommend running the full x.py test suite locally, since it takes a very long time to execute.

After someone has reviewed your pull request, they will leave an annotation on the pull request with an r+. It will look something like this:

@bors r+

This tells @bors, our lovable integration bot, that your pull request has been approved. The PR then enters the merge queue, where @bors will run all the tests on every platform we support. If it all works out, @bors will merge your code into master and close the pull request.

Depending on the scale of the change, you may see a slightly different form of r+:

@bors r+ rollup

The additional rollup tells @bors that this change should always be "rolled up". Changes that are rolled up are tested and merged alongside other PRs, to speed the process up. Typically only small changes that are expected not to conflict with one another are marked as "always roll up".

Opening a PR

So, you are ready to file a pull request? Great! Here are a few points you should be aware of.

All pull requests should be filed against the master branch, except in very particular scenarios. Unless you know for sure that you should target another branch, master will be the right choice (it's also the default).

Make sure your pull request is in compliance with Rust's style guidelines by running

$ ./x.py test tidy --bless

We recommend making this check before every pull request (and every new commit in a pull request); you can add a git hook that runs it before every push so you never forget to make this check. The CI will also run tidy and will fail if tidy fails.
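For example, a minimal pre-push hook might look like this (a sketch: save it as .git/hooks/pre-push in your checkout and mark it executable with chmod +x; git runs pre-push hooks from the root of the working tree):

#!/bin/sh
# abort the push if tidy finds any style violations
./x.py test tidy || exit 1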

Rust follows a no merge-commit policy, meaning, when you encounter merge conflicts you are expected to always rebase instead of merging. E.g. always use rebase when bringing the latest changes from the master branch to your feature branch. Also, please make sure that fixup commits are squashed into other related commits with meaningful commit messages.

If you encounter merge conflicts, your PR will get marked as S-waiting-on-author. When you resolve them, you should use @rustbot to mark it as S-waiting-on-review. See this chapter for more details.

GitHub allows closing issues using keywords. This feature should be used to keep the issue tracker tidy. However, it is generally preferred to put the "closes #123" text in the PR description rather than in a commit message; particularly during rebasing, citing the issue number in a commit can "spam" the issue in question.

External Dependencies (subtree)

As a contributor to this repository, you don't have to treat the following external projects differently from other crates that are directly in this repo:

  • Clippy

They are just regular files and directories. This is in contrast to submodule dependencies (see below for those). Only tool authors will actually use any operations here.

Synchronizing a subtree

There are two synchronization directions: subtree push and subtree pull.

git subtree push -P src/tools/clippy git@github.com:your-github-name/rust-clippy sync-from-rust

takes all the changes that happened to the copy in this repo and creates commits on the remote repo that match the local changes. Every local commit that touched the subtree causes a commit on the remote repo, but is modified to move the files from the specified directory to the tool repo root.

Make sure not to push to the master branch of the tool repo; pushing to a separate branch (like sync-from-rust above) lets you open a normal PR against the tool repo to merge that subtree push.

git subtree pull -P src/tools/clippy https://github.com/rust-lang/rust-clippy master

takes all changes since the last subtree pull from the tool repo and adds these commits to the rustc repo, plus a merge commit that moves the tool changes into the specified directory in the rust repository.

It is recommended that you always do a push first and get that merged to the tool master branch. Then, when you do a pull, the merge works without conflicts. While it's definitely possible to resolve conflicts during a pull, you may have to redo the conflict resolution if your PR doesn't get merged fast enough and there are new conflicts. Do not try to rebase the result of a git subtree pull; rebasing merge commits is a bad idea in general.

You always need to specify the -P prefix to the subtree directory and the corresponding remote repository. If you specify the wrong directory or repository, you'll get very fun merges that try to push the wrong directory to the wrong remote repository. Luckily, you can just abort this without any consequences by throwing away either the pulled commits in rustc or the pushed branch on the remote and trying again. It is usually fairly obvious that this is happening because you suddenly get thousands of commits that want to be synchronized.

Creating a new subtree dependency

If you want to create a new subtree dependency from an existing repository, call (from this repository's root directory!)

git subtree add -P src/tools/clippy https://github.com/rust-lang/rust-clippy.git master

This will create a new commit, which you may not rebase under any circumstances! Delete the commit and redo the operation if you need to rebase.

Now you're done, the src/tools/clippy directory behaves as if Clippy were part of the rustc monorepo, so no one but you (or others that synchronize subtrees) actually needs to use git subtree.

External Dependencies (submodules)

Currently building Rust will also build the following external projects:

We allow breakage of these tools in the nightly channel. Maintainers of these projects will be notified of the breakages and should fix them as soon as possible.

After the external project is fixed, one could add the changes with

git add path/to/submodule

outside the submodule.

In order to prepare your tool-fixing PR, you can run the build locally by doing ./x.py build src/tools/TOOL. If you will be editing the sources there, you may wish to set submodules = false in the config.toml to prevent x.py from resetting to the original branch.
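Concretely, that setting lives in the [build] section of your config.toml:

[build]
submodules = false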

Breakage is not allowed in the beta and stable channels, and must be addressed before the PR is merged.

Breaking Tools Built With The Compiler

Rust's build system builds a number of tools that make use of the internals of the compiler. This includes RLS and rustfmt. If these tools break because of your changes, you may run into a sort of "chicken and egg" problem. These tools rely on the latest compiler to be built so you can't update them to reflect your changes to the compiler until those changes are merged into the compiler. At the same time, you can't get your changes merged into the compiler because the rust-lang/rust build won't pass until those tools build and pass their tests.

That means that, in the default state, you can't update the compiler without first fixing rustfmt, rls and the other tools that the compiler builds.

Luckily, a feature was added to Rust's build to make all of this easy to handle. The idea is that we allow these tools to be "broken", so that the rust-lang/rust build passes without trying to build them, then land the change in the compiler, wait for a nightly, and go update the tools that you broke. Once you're done and the tools are working again, you go back in the compiler and update the tools so they can be distributed again.

This should avoid a bunch of synchronization dances and is also much easier on contributors as there's no need to block on rls/rustfmt/other tools changes going upstream.

Here are those same steps in detail:

  1. (optional) First, if it doesn't exist already, create a config.toml by copying config.toml.example in the root directory of the Rust repository. Set submodules = false in the [build] section. This will prevent x.py from resetting to the original branch after you make your changes. If you need to update any submodules to their latest versions, see the section of this file about that for more information.
  2. (optional) Run ./x.py test src/tools/rustfmt (substituting the submodule that broke for rustfmt). Fix any errors in the submodule (and possibly others).
  3. (optional) Make commits for your changes and send them to upstream repositories as a PR.
  4. (optional) Maintainers of these submodules will not merge the PR. The PR can't be merged because CI will be broken. You'll want to write a message on the PR referencing your change, and how the PR should be merged once your change makes it into a nightly.
  5. Wait for your PR to merge.
  6. Wait for a nightly
  7. (optional) Help land your PR on the upstream repository now that your changes are in nightly.
  8. (optional) Send a PR to rust-lang/rust updating the submodule.

Updating submodules

These instructions are specific to updating rustfmt, however they may apply to the other submodules as well. Please help by improving these instructions if you find any discrepancies or special cases that need to be addressed.

To update the rustfmt submodule, start by running the appropriate git submodule command. For example, to update to the latest commit on the remote master branch, you may want to run:

git submodule update --remote src/tools/rustfmt

If you run ./x.py build now, and you are lucky, it may just work. If you see an error message about patches that did not resolve to any crates, you will need to complete a few more steps which are outlined with their rationale below.

(This error may change in the future to include more information.)

error: failed to resolve patches for `https://github.com/rust-lang/rustfmt`

Caused by:
  patch for `rustfmt-nightly` in `https://github.com/rust-lang/rustfmt` did not resolve to any crates
failed to run: ~/rust/build/x86_64-unknown-linux-gnu/stage0/bin/cargo build --manifest-path ~/rust/src/bootstrap/Cargo.toml

The [patch] section of Cargo.toml can be very useful for testing. In addition to that, you should read the Overriding dependencies section of the documentation.

Specifically, the following section in Overriding dependencies reveals what the problem is:

Next up we need to ensure that our lock file is updated to use this new version of uuid so our project uses the locally checked out copy instead of one from crates.io. The way [patch] works is that it'll load the dependency at ../path/to/uuid and then whenever crates.io is queried for versions of uuid it'll also return the local version.

This means that the version number of the local checkout is significant and will affect whether the patch is used. Our manifest declared uuid = "1.0" which means we'll only resolve to >= 1.0.0, < 2.0.0, and Cargo's greedy resolution algorithm also means that we'll resolve to the maximum version within that range. Typically this doesn't matter as the version of the git repository will already be greater or match the maximum version published on crates.io, but it's important to keep this in mind!

This says that when we updated the submodule, the version number in our src/tools/rustfmt/Cargo.toml changed. The new version is different from the version in Cargo.lock, so the build can no longer continue.

To resolve this, we need to update Cargo.lock. Luckily, cargo provides a command to do this easily.

$ cargo update -p rustfmt-nightly

This should change the version listed in Cargo.lock to the new version you updated the submodule to. Running ./x.py build should work now.

Writing Documentation

Documentation improvements are very welcome. The source of doc.rust-lang.org is located in src/doc in the tree, and standard API documentation is generated from the source code itself (e.g. lib.rs). Documentation pull requests function in the same way as other pull requests.

To find documentation-related issues, sort by the T-doc label.

You can find documentation style guidelines in RFC 1574.

In many cases, you don't need a full ./x.py doc --stage 2, which will build the entire stage 2 compiler and compile the various books published on doc.rust-lang.org. When updating documentation for the standard library, first try ./x.py doc library/std. If that fails, or if you need to see the output from the latest version of rustdoc, add --stage 1. Results should appear in build/$TARGET/doc.

You can also use rustdoc directly to check small fixes. For example, rustdoc src/doc/reference.md will render the reference to doc/reference.html. The CSS might be messed up, but you can verify that the HTML is right.

Additionally, contributions to the rustc-dev-guide are always welcome. Contributions can be made directly at the rust-lang/rustc-dev-guide repo. The issue tracker in that repo is also a great way to find things that need doing. There are issues for beginners and advanced compiler devs alike!

Issue Triage

Sometimes, an issue will stay open, even though the bug has been fixed. And sometimes, the original bug may go stale because something has changed in the meantime.

It can be helpful to go through older bug reports and make sure that they are still valid. Load up an older issue, double check that it's still true, and leave a comment letting us know if it is or is not. The least recently updated sort is good for finding issues like this.

Thanks to @rustbot, anyone can help triage issues by adding appropriate labels to issues that haven't been triaged yet:

  • Yellow, A-prefixed labels state which area of the project an issue relates to.

  • Magenta, B-prefixed labels identify bugs which are blockers.

  • Dark blue, beta- labels track changes which need to be backported into the beta branches.

  • Light purple, C-prefixed labels represent the category of an issue.

  • Green, E-prefixed labels explain the level of experience necessary to fix the issue.

  • The dark blue final-comment-period label marks bugs that are using the RFC signoff functionality of rfcbot and are currently in the final comment period.

  • Red, I-prefixed labels indicate the importance of the issue. The I-nominated label indicates that an issue has been nominated for prioritizing at the next triage meeting. Similarly, the I-prioritize indicates that an issue has been requested to be prioritized by the appropriate team.

  • The purple metabug label marks lists of bugs collected by other categories.

  • Purple gray, O-prefixed labels are the operating system or platform that this issue is specific to.

  • Orange, P-prefixed labels indicate a bug's priority. These labels are only assigned during triage meetings, and replace the I-prioritize label.

  • The gray proposed-final-comment-period label marks bugs that are using the RFC signoff functionality of rfcbot and are currently awaiting signoff of all team members in order to enter the final comment period.

  • Pink, regression-prefixed labels track regressions from stable to the release channels.

  • The light orange relnotes label marks issues that should be documented in the release notes of the next release.

  • Gray, S-prefixed labels are used for tracking the status of pull requests.

  • Blue, T-prefixed bugs denote which team the issue belongs to.

If you're looking for somewhere to start, check out the E-easy tag.

Out-of-tree Contributions

There are a number of other ways to contribute to Rust that don't deal with rust-lang/rust:

Helpful Links and Information

For people new to Rust, and just starting to contribute, or even for more seasoned developers, some useful places to look for information are:

  • This guide contains information about how various parts of the compiler work and how to contribute to the compiler
  • Rust Forge contains additional documentation, including write-ups of how to achieve common tasks
  • The Rust Internals forum, a place to ask questions and discuss Rust's internals
  • The generated documentation for Rust's compiler
  • The Rust reference, even though it doesn't specifically talk about Rust's internals, is a great resource nonetheless
  • Although out of date, Tom Lee's great blog article is very helpful
  • rustaceans.org is helpful, but mostly dedicated to IRC
  • The Rust Compiler Testing Docs
  • For @bors, this cheat sheet is helpful
  • Google is always helpful when programming. You can search all Rust documentation (the standard library, the compiler, the books, the references, and the guides) to quickly find information about the language and compiler.
  • You can also use Rustdoc's built-in search feature to find documentation on types and functions within the crates you're looking at. You can also search by type signature! For example, searching for * -> vec should find all functions that return a Vec<T>. Hint: Find more tips and keyboard shortcuts by typing ? on any Rustdoc page!
  • Don't be afraid to ask! The Rust community is friendly and helpful.

About the compiler team

rustc is maintained by the Rust compiler team. The people who belong to this team collectively work to track regressions and implement new features. Members of the Rust compiler team are people who have made significant contributions to rustc and its design.

Discussion

Currently the compiler team chats in Zulip:

  • Team chat occurs in the t-compiler stream on the Zulip instance
  • There are also a number of other associated Zulip streams, such as t-compiler/help, where people can ask for help with rustc development, or t-compiler/meetings, where the team holds their weekly triage and steering meetings.

Expert map

If you're interested in figuring out who can answer questions about a particular part of the compiler, or you'd just like to know who works on what, check out our experts directory. It contains a listing of the various parts of the compiler and a list of people who are experts on each one.

Rust compiler meeting

The compiler team has a weekly meeting where we do triage and try to generally stay on top of new bugs, regressions, and other things. They are held on Zulip. It works roughly as follows:

  • Review P-high bugs: P-high bugs are those that are sufficiently important for us to actively track progress. P-high bugs should ideally always have an assignee.
  • Look over new regressions: we then look for new cases where the compiler broke previously working code in the wild. Regressions are almost always marked as P-high; the major exception would be bug fixes (though even there we often aim to give warnings first).
  • Check I-nominated issues: These are issues where feedback from the team is desired.
  • Check for beta nominations: These are nominations of things to backport to beta.
  • Possibly WG checking: A WG may give an update at this point, if there is time.

The meeting currently takes place on Thursdays at 10am Boston time (UTC-4 typically, but daylight savings time sometimes makes things complicated).

The meeting is held over a "chat medium", currently on Zulip.

Team membership

Membership in the Rust team is typically offered when someone has been making significant contributions to the compiler for some time. Membership is both a recognition and an obligation: compiler team members are generally expected to help with upkeep as well as doing reviews and other work.

If you are interested in becoming a compiler team member, the first thing to do is to start fixing some bugs, or get involved in a working group. One good way to find bugs is to look for open issues tagged with E-easy or E-mentor.

r+ rights

Once you have made a number of individual PRs to rustc, we will often offer r+ privileges. This means that you have the right to instruct "bors" (the robot that manages which PRs get landed into rustc) to merge a PR (here are some instructions for how to talk to bors).

The guidelines for reviewers are as follows:

  • You are always welcome to review any PR, regardless of who it is assigned to. However, do not r+ PRs unless:
    • You are confident in that part of the code.
    • You are confident that nobody else wants to review it first.
      • For example, sometimes people will express a desire to review a PR before it lands, perhaps because it touches a particularly sensitive part of the code.
  • Always be polite when reviewing: you are a representative of the Rust project, so it is expected that you will go above and beyond when it comes to the Code of Conduct.

high-five

Once you have r+ rights, you can also be added to the high-five rotation. high-five is the bot that assigns incoming PRs to reviewers. If you are added, you will be randomly selected to review PRs. If you find you are assigned a PR that you don't feel comfortable reviewing, you can also leave a comment like r? @so-and-so to assign to someone else — if you don't know who to request, just write r? @nikomatsakis for reassignment and @nikomatsakis will pick someone for you.

Getting on the high-five list is much appreciated as it lowers the review burden for all of us! However, if you don't have time to give people timely feedback on their PRs, it may be better that you don't get on the list.

Full team membership

Full team membership is typically extended once someone has made many contributions to the Rust compiler over time, ideally (but not necessarily) to multiple areas. Sometimes this might be implementing a new feature, but it is also important — perhaps more important! — to have time and willingness to help out with general upkeep such as bugfixes, tracking regressions, and other less glamorous work.

Using Git

The Rust project uses Git to manage its source code. In order to contribute, you'll need some familiarity with its features so that your changes can be incorporated into the compiler.

The goal of this page is to cover some of the more common questions and problems new contributors face. Although some Git basics will be covered here, if you find that this is still a little too fast for you, it might make sense to first read some introductions to Git, such as the Beginner and Getting started sections of this tutorial from Atlassian. GitHub also provides documentation and guides for beginners, or you can consult the more in depth book from Git.

Prerequisites

We'll assume that you've installed Git, forked rust-lang/rust, and cloned the forked repo to your PC. We'll use the command line interface to interact with Git; there are also a number of GUIs and IDE integrations that can generally do the same things.

If you've cloned your fork, then you will be able to reference it with origin in your local repo. It may be helpful to also set up a remote for the official rust-lang/rust repo via

git remote add upstream https://github.com/rust-lang/rust.git

if you're using HTTPS, or

git remote add upstream git@github.com:rust-lang/rust.git

if you're using SSH.

Standard Process

Below is the normal procedure that you're likely to use for most minor changes and PRs:

  1. Ensure that you're making your changes on top of master: git checkout master.
  2. Get the latest changes from the Rust repo: git pull upstream master.
  3. Make a new branch for your change: git checkout -b issue-12345-fix.
  4. Make some changes to the repo and test them.
  5. Stage your changes via git add src/changed/file.rs src/another/change.rs and then commit them with git commit. Of course, making intermediate commits may be a good idea as well. Avoid git add ., as it makes it too easy to unintentionally commit changes that should not be committed, such as submodule updates. You can use git status to check if there are any files you forgot to stage.
  6. Push your changes to your fork: git push --set-upstream origin issue-12345-fix.
  7. Open a PR from your fork to rust-lang/rust's master branch.

If your reviewer requests changes, the procedure for those changes looks much the same, with some steps skipped:

  1. Ensure that you're making changes to the most recent version of your code: git checkout issue-12345-fix.
  2. Make, stage, and commit your additional changes just like before.
  3. Push those changes to your fork: git push.

Troubleshooting git issues

You don't need to clone rust-lang/rust from scratch if it's out of date! Even if you think you've messed it up beyond repair, there are ways to fix the git state that don't require downloading the whole repository again. Here are some common issues you might run into:

I deleted my fork on GitHub!

This is not a problem from git's perspective. If you run git remote -v, it will say something like this:

$ git remote -v
origin	https://github.com/rust-lang/rust (fetch)
origin	https://github.com/rust-lang/rust (push)
personal	https://github.com/jyn514/rust (fetch)
personal	https://github.com/jyn514/rust (push)

If you renamed your fork, you can change the URL like this:

git remote set-url personal <URL>

where <URL> is the URL of your new fork.

I see 'Untracked Files: src/stdarch'?

This is left over from the move to the library/ directory. Unfortunately, git rebase does not follow renames for submodules, so you have to delete the directory yourself:

rm -r src/stdarch

I see <<< HEAD?

You were probably in the middle of a rebase or merge conflict. See Conflicts for how to fix the conflict. If you don't care about the changes and just want to get a clean copy of the repository back, you can use git reset:

# WARNING: this throws out any local changes you've made! Consider resolving the conflicts instead.
git reset --hard master

Quick note about submodules

When updating your local repository with git pull, you may notice that sometimes Git says you have modified some files that you have never edited. For example, running git status gives you something like (note the new commits mention):

On branch master
Your branch is up to date with 'origin/master'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   src/tools/cargo (new commits)
	modified:   src/tools/rls (new commits)
	modified:   src/tools/rustfmt (new commits)

no changes added to commit (use "git add" and/or "git commit -a")

These changes are not changes to files: they are changes to submodules (more on this later). To get rid of those, run git submodule update (or run any x.py command, which will automatically update the submodules). Note that there is currently a bug if you use worktrees, submodules, and x.py in a commit hook. If you run into an error like:

error: failed to read `/home/joshua/rustc-worktree/src/tools/miri/cargo-miri/Cargo.toml`

Caused by:
  No such file or directory (os error 2)

it's not anything you did wrong. There is a workaround at #77620.

Conflicts

When you edit your code locally, you are making changes to the version of rust-lang/rust that existed when you created your feature branch. As such, when you submit your PR it is possible that some of the changes that have been made to rust-lang/rust since then are in conflict with the changes you've made.

When this happens, you need to resolve the conflicts before your changes can be merged. First, get a local copy of the conflicting changes: Checkout your local master branch with git checkout master, then git pull upstream master to update it with the most recent changes.

Rebasing

You're now ready to start the rebasing process. Checkout the branch with your changes and execute git rebase master.

When you rebase a branch on master, all the changes on your branch are reapplied to the most recent version of master. In other words, Git tries to pretend that the changes you made to the old version of master were instead made to the new version of master. During this process, you should expect to encounter at least one "rebase conflict." This happens when Git's attempt to reapply the changes fails because your changes conflicted with other changes that have been made. You can tell that this happened because you'll see lines in the output that look like

CONFLICT (content): Merge conflict in file.rs

When you open these files, you'll see sections of the form

<<<<<<< HEAD
Original code
=======
Your code
>>>>>>> 8fbf656... Commit fixes 12345

This represents the lines in the file that Git could not figure out how to rebase. The section between <<<<<<< HEAD and ======= has the code from master, while the other side has your version of the code. You'll need to decide how to deal with the conflict. You may want to keep your changes, keep the changes on master, or combine the two.

Generally, resolving the conflict consists of two steps: First, fix the particular conflict. Edit the file to make the changes you want and remove the <<<<<<<, ======= and >>>>>>> lines in the process. Second, check the surrounding code. If there was a conflict, it's likely there are some logical errors lying around too! It's a good idea to run x.py check here to make sure there are no glaring errors.

Once you're all done fixing the conflicts, you need to stage the files that had conflicts in them via git add. Afterwards, run git rebase --continue to let Git know that you've resolved the conflicts and it should finish the rebase. Once the rebase has succeeded, you'll want to update the associated branch on your fork with git push --force-with-lease.
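Put together, finishing the rebase might look like this (a sketch; the file path is illustrative):

git add src/some/conflicted_file.rs   # stage each file you resolved
git rebase --continue                 # let git finish the rebase
git push --force-with-lease           # update the branch on your fork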

Note that a plain git push will not work properly; it will say something like this:

 ! [rejected]        issue-xxxxx -> issue-xxxxx (non-fast-forward)
error: failed to push some refs to 'https://github.com/username/rust.git'
hint: Updates were rejected because the tip of your current branch is behind
hint: its remote counterpart. Integrate the remote changes (e.g.
hint: 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.

The advice this gives is incorrect! Because of Rust's "no-merge" policy the merge commit created by git pull will not be allowed in the final PR, in addition to defeating the point of the rebase! Use git push --force-with-lease instead.

Advanced Rebasing

If your branch contains multiple consecutive rewrites of the same code, or if the rebase conflicts are extremely severe, you can use git rebase --interactive master to gain more control over the process. This allows you to choose to skip commits, edit the commits that you do not skip, change the order in which they are applied, or "squash" them into each other.

Alternatively, you can sacrifice the commit history like this:

# squash all the changes into one commit so you only have to worry about conflicts once
git rebase -i $(git merge-base master HEAD)  # and squash all changes along the way
git rebase master
# fix all merge conflicts
git rebase --continue

"Squashing" commits into each other causes them to be merged into a single commit. Both the upside and downside of this is that it simplifies the history. On the one hand, you lose track of the steps in which changes were made, but the history becomes easier to work with.

You also may want to squash just the last few commits together, possibly because they only represent "fixups" and not real changes. For example, git rebase --interactive HEAD~2 will allow you to edit the two commits only.

No-Merge Policy

The rust-lang/rust repo uses what is known as a "rebase workflow." This means that merge commits in PRs are not accepted. As a result, if you are running git merge locally, chances are good that you should be rebasing instead. Of course, this is not always true; if your merge will just be a fast-forward, like the merges that git pull usually performs, then no merge commit is created and you have nothing to worry about. Running git config merge.ff only (a one-time setup) will ensure that all the merges you perform are of this type, so that you cannot make a mistake.

There are a number of reasons for this decision and like all others, it is a tradeoff. The main advantage is the generally linear commit history. This greatly simplifies bisecting and makes the history and commit log much easier to follow and understand.

Git submodules

NOTE: submodules are a nice thing to know about, but they aren't an absolute prerequisite to contribute to rustc. If you are using Git for the first time, you might want to get used to the main concepts of Git before reading this section.

The rust-lang/rust repository uses Git submodules as a way to use other Rust projects from within the rust repo. Examples include Rust's fork of llvm-project and many devtools such as cargo, rust-analyzer and rustfmt.

Those projects are developed and maintained in a separate Git (and GitHub) repository, and they have their own Git history/commits, issue tracker and PRs. Submodules allow us to create some sort of embedded sub-repository inside the rust repository and use them as if they were directories in the rust repository.

Take miri for example. miri is maintained in the rust-lang/miri repository, but it is used in rust-lang/rust by the compiler for const evaluation. We bring it into rust as a submodule, in the src/tools/miri folder.

The contents of submodules are ignored by Git: submodules are in some sense isolated from the rest of the repository. However, if you try to cd src/tools/miri and then run git status:

HEAD detached at 3fafb835
nothing to commit, working tree clean

As far as git is concerned, you are no longer in the rust repo, but in the miri repo. You will notice that we are in "detached HEAD" state, i.e. not on a branch but on a particular commit.

This is because, like any dependency, we want to be able to control which version to use. Submodules allow us to do just that: every submodule is "pinned" to a certain commit, which doesn't change unless modified manually. If you use git checkout <commit> in the miri directory and then go back to the rust directory, you can stage this change like any other change. Updating the pinned commit is usually done by the maintainers of the project, and looks like this.
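As a sketch (the commit hash is a placeholder), bumping the miri submodule and staging the new pin could look like this:

cd src/tools/miri
git checkout <commit>      # move the submodule to the desired commit
cd ../../..
git add src/tools/miri     # stage the updated pin in the rust repo
git commit -m "Update miri"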

Git submodules take some time to get used to, so don't worry if it isn't perfectly clear yet. You will rarely have to use them directly and, again, you don't need to know everything about submodules to contribute to Rust. Just know that they exist and that they correspond to some sort of embedded subrepository dependency that Git can nicely and fairly conveniently handle for us.

Mastering @rustbot

@rustbot (also known as triagebot) is a utility robot that is mostly used to allow any contributor to achieve certain tasks that would normally require GitHub membership to the rust-lang organization. Its most interesting features for contributors to rustc are issue claiming and relabeling.

Issue claiming

@rustbot exposes a command that allows anyone to assign an issue to themselves. If you see an issue you want to work on, you can send the following message as a comment on the issue at hand:

@rustbot claim

This will tell @rustbot to assign the issue to you if it has no assignee yet. Note that because of some GitHub restrictions, you may be assigned indirectly, i.e. @rustbot will assign itself as a placeholder and edit the top comment to reflect the fact that the issue is now assigned to you.

If you want to unassign from an issue, @rustbot has a different command:

@rustbot release-assignment

Issue relabeling

Changing labels for an issue or PR is also normally reserved for members of the organization. However, @rustbot allows you to relabel an issue yourself, with only a few restrictions. This is mostly useful in two cases:

Helping with issue triage: Rust's issue tracker has more than 5,000 open issues at the time of this writing, so labels are the most powerful tool that we have to keep it as tidy as possible. You don't need to spend hours in the issue tracker to triage issues, but if you open an issue, you should feel free to label it if you are comfortable with doing it yourself.

Updating the status of a PR: We use "status labels" to reflect the status of PRs. For example, if your PR has merge conflicts, it will automatically be assigned the S-waiting-on-author label, and reviewers might not review it until you rebase your PR. Once you do rebase your branch, you should change the labels yourself to remove the S-waiting-on-author label and add back S-waiting-on-review. In this case, the @rustbot command will look like this:

@rustbot label -S-waiting-on-author +S-waiting-on-review

The syntax for this command is pretty loose, so there are other variants of this command invocation. For more details, see the wiki page about labeling.

Other commands

If you are interested in seeing what @rustbot is capable of, check out its wiki, which is meant as a reference for the bot and should be kept up to date every time the bot gets an upgrade.

@rustbot is maintained by the Release team. If you have any feedback regarding existing commands or suggestions for new commands, feel free to reach out on Zulip or file an issue in the triagebot repository.

Walkthrough: a typical contribution

There are a lot of ways to contribute to the rust compiler, including fixing bugs, improving performance, helping design features, providing feedback on existing features, etc. This chapter does not claim to cover them all. Instead, it walks through the design and implementation of a new feature. Not all of the steps and processes described here are needed for every contribution, and I will try to point those out as they arise.

In general, if you are interested in making a contribution and aren't sure where to start, please feel free to ask!

Overview

The feature I will discuss in this chapter is the ? Kleene operator for macros. Basically, we want to be able to write something like this:

macro_rules! foo {
    ($arg:ident $(, $optional_arg:ident)?) => {
        println!("{}", $arg);

        $(
            println!("{}", $optional_arg);
        )?
    }
}

fn main() {
    let x = 0;
    foo!(x); // ok! prints "0"
    foo!(x, x); // ok! prints "0 0"
}

So basically, the $(pat)? matcher in the macro means "this pattern can occur 0 or 1 times", similar to other regex syntaxes.

There were a number of steps to go from an idea to a stable rust feature. Here is a quick list. We will go through each of these in order below. As I mentioned before, not all of these are needed for every type of contribution.

  • Idea discussion/Pre-RFC A Pre-RFC is an early draft or design discussion of a feature. This stage is intended to flesh out the design space a bit and get a grasp on the different merits and problems with an idea. It's a great way to get early feedback on your idea before presenting it to a wider audience. You can find the original discussion here.
  • RFC This is when you formally present your idea to the community for consideration. You can find the RFC here.
  • Implementation Implement your idea unstably in the compiler. You can find the original implementation here.
  • Possibly iterate/refine As the community gets experience with your feature on the nightly compiler and in std, there may be additional feedback about design choices that might be adjusted. This particular feature went through a number of iterations.
  • Stabilization When your feature has baked enough, a rust team member may propose to stabilize it. If there is consensus, this is done.
  • Relax Your feature is now a stable rust feature!

Pre-RFC and RFC

NOTE: In general, if you are not proposing a new feature or substantial change to rust or the ecosystem, you don't need to follow the RFC process. Instead, you can just jump to implementation.

You can find the official guidelines for when to open an RFC here.

An RFC is a document that describes the feature or change you are proposing in detail. Anyone can write an RFC; the process is the same for everyone, including rust team members.

To open an RFC, open a PR on the rust-lang/rfcs repo on GitHub. You can find detailed instructions in the README.

Before opening an RFC, you should do the research to "flesh out" your idea. Hastily-proposed RFCs tend not to be accepted. You should generally have a good description of the motivation, impact, disadvantages, and potential interactions with other features.

If that sounds like a lot of work, it's because it is. But no fear! Even if you're not a compiler hacker, you can get great feedback by doing a pre-RFC. This is an informal discussion of the idea. The best place to do this is internals.rust-lang.org. Your post doesn't have to follow any particular structure. It doesn't even need to be a cohesive idea. Generally, you will get tons of feedback that you can integrate back to produce a good RFC.

(Another pro-tip: try searching the RFCs repo and internals for prior related ideas. A lot of times an idea has already been considered and was either rejected or postponed to be tried again later. This can save you and everybody else some time)

In the case of our example, a participant in the pre-RFC thread pointed out a syntax ambiguity and a potential resolution. Also, the overall feedback seemed positive. In this case, the discussion converged pretty quickly, but for some ideas, a lot more discussion can happen (e.g. see this RFC which received a whopping 684 comments!). If that happens, don't be discouraged; it means the community is interested in your idea, but it perhaps needs some adjustments.

The RFC for our ? macro feature did receive some discussion on the RFC thread too. As with most RFCs, there were a few questions that we couldn't answer by discussion: we needed experience using the feature to decide. Such questions are listed in the "Unresolved Questions" section of the RFC. Also, over the course of the RFC discussion, you will probably want to update the RFC document itself to reflect the course of the discussion (e.g. new alternatives or prior work may be added or you may decide to change parts of the proposal itself).

In the end, when the discussion seems to reach a consensus and die down a bit, a rust team member may propose to move to "final comment period" (FCP) with one of three possible dispositions. This means that they want the other members of the appropriate teams to review and comment on the RFC. More discussion may ensue, which may result in more changes or unresolved questions being added. At some point, when everyone is satisfied, the RFC enters the FCP, which is the last chance for people to bring up objections. When the FCP is over, the disposition is adopted. Here are the three possible dispositions:

  • Merge: accept the feature. Here is the proposal to merge for our ? macro feature.
  • Close: this feature in its current form is not a good fit for rust. Don't be discouraged if this happens to your RFC, and don't take it personally. This is not a reflection on you, but rather a community decision that rust will go a different direction.
  • Postpone: there is interest in going this direction but not at the moment. This happens most often because the appropriate rust team doesn't have the bandwidth to shepherd the feature through the process to stabilization. Often this is the case when the feature doesn't fit into the team's roadmap. Postponed ideas may be revisited later.

When an RFC is merged, the PR is merged into the RFCs repo. A new tracking issue is created in the rust-lang/rust repo to track progress on the feature and discuss unresolved questions, implementation progress and blockers, etc. Here is the tracking issue for our ? macro feature.

Implementation

To make a change to the compiler, open a PR against the rust-lang/rust repo.

Depending on the feature/change/bug fix/improvement, implementation may be relatively-straightforward or it may be a major undertaking. You can always ask for help or mentorship from more experienced compiler devs. Also, you don't have to be the one to implement your feature; but keep in mind that if you don't it might be a while before someone else does.

For the ? macro feature, I needed to go understand the relevant parts of macro expansion in the compiler. Personally, I find that improving the comments in the code is a helpful way of making sure I understand it, but you don't have to do that if you don't want to.

I then implemented the original feature, as described in the RFC. When a new feature is implemented, it goes behind a feature gate, which means that you have to use #![feature(my_feature_name)] to use the feature. The feature gate is removed when the feature is stabilized.
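
As a sketch, opting in on a nightly compiler looks like this at the crate root (my_feature_name is a placeholder, not a real gate name):

// nightly only: enable the (hypothetical) gated feature for this crate
#![feature(my_feature_name)]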

Most bug fixes and improvements don't require a feature gate. You can just make your changes/improvements.

When you open a PR on rust-lang/rust, a bot will assign your PR to a reviewer. If there is a particular rust team member you are working with, you can request that reviewer by leaving a comment on the thread with r? @reviewer-github-id (e.g. r? @eddyb). If you don't know who to request, don't request anyone; the bot will assign someone automatically.

The reviewer may request changes before they approve your PR. Feel free to ask questions or discuss things you don't understand or disagree with. However, recognize that the PR won't be merged unless someone on the rust team approves it.

When your reviewer approves the PR, it will go into a queue for yet another bot called @bors. @bors manages the CI build/merge queue. When your PR reaches the head of the @bors queue, @bors will test out the merge by running all tests against your PR on GitHub Actions. This takes a lot of time to finish. If all tests pass, the PR is merged and becomes part of the next nightly compiler!

There are a couple of things that may happen for some PRs during the review process:

  • If the change is substantial enough, the reviewer may request an FCP on the PR. This gives all members of the appropriate team a chance to review the changes.
  • If the change may cause breakage, the reviewer may request a crater run. This compiles the compiler with your changes and then attempts to compile all crates on crates.io with your modified compiler. This is a great smoke test to check if you introduced a change to compiler behavior that affects a large portion of the ecosystem.
  • If the diff of your PR is large or the reviewer is busy, your PR may have some merge conflicts with other PRs that happen to get merged first. You should fix these merge conflicts using the normal git procedures.

If you are not doing a new feature or something like that (e.g. if you are fixing a bug), then that's it! Thanks for your contribution :)

Refining your implementation

As people get experience with your new feature on nightly, slight changes may be proposed and unresolved questions may become resolved. Updates/changes go through the same process for implementing any other changes, as described above (i.e. submit a PR, go through review, wait for @bors, etc).

Some changes may be major enough to require an FCP and some review by rust team members.

For the ? macro feature, we went through a few different iterations after the original implementation: 1, 2, 3.

Along the way, we decided that ? should not take a separator, which was previously an unresolved question listed in the RFC. We also changed the disambiguation strategy: we decided to remove the ability to use ? as a separator token for other repetition operators (e.g. + or *). However, since this was a breaking change, we decided to do it over an edition boundary. Thus, the new feature can be enabled only in edition 2018. These deviations from the original RFC required another FCP.

Stabilization

Finally, after the feature had baked for a while on nightly, a language team member moved to stabilize it.

A stabilization report needs to be written that includes

  • brief description of the behavior and any deviations from the RFC
  • which edition(s) are affected and how
  • links to a few tests to show the interesting aspects

The stabilization report for our feature is here.

After this, a PR is made to remove the feature gate, enabling the feature by default (on the 2018 edition). A note is added to the Release notes about the feature.

Steps to stabilize the feature can be found at Stabilizing Features.

Rustc Bug Fix Procedure

This page defines the best practices procedure for making bug fixes or soundness corrections in the compiler that can cause existing code to stop compiling. This text is based on RFC 1589.

Motivation

From time to time, we encounter the need to make a bug fix, soundness correction, or other change in the compiler which will cause existing code to stop compiling. When this happens, it is important that we handle the change in a way that gives users of Rust a smooth transition. What we want to avoid is that existing programs suddenly stop compiling with opaque error messages: we would prefer to have a gradual period of warnings, with clear guidance as to what the problem is, how to fix it, and why the change was made. This RFC describes the procedure that we have been developing for handling breaking changes that aims to achieve that kind of smooth transition.

One of the key points of this policy is that (a) warnings should be issued initially rather than hard errors if at all possible and (b) every change that causes existing code to stop compiling will have an associated tracking issue. This issue provides a point to collect feedback on the results of that change. Sometimes changes have unexpectedly large consequences or there may be a way to avoid the change that was not considered. In those cases, we may decide to change course and roll back the change, or find another solution (if warnings are being used, this is particularly easy to do).

What qualifies as a bug fix?

Note that this RFC does not try to define when a breaking change is permitted. That is already covered under RFC 1122. This document assumes that the change being made is in accordance with those policies. Here is a summary of the conditions from RFC 1122:

  • Soundness changes: Fixes to holes uncovered in the type system.
  • Compiler bugs: Places where the compiler is not implementing the specified semantics found in an RFC or lang-team decision.
  • Underspecified language semantics: Clarifications to grey areas where the compiler behaves inconsistently and no formal behavior had been previously decided.

Please see the RFC for full details!

Detailed design

The procedure for making a breaking change is as follows (each of these steps is described in more detail below):

  1. Do a crater run to assess the impact of the change.
  2. Make a special tracking issue dedicated to the change.
  3. Do not report an error right away. Instead, issue forwards-compatibility lint warnings.
    • Sometimes this is not straightforward. See the text below for suggestions on different techniques we have employed in the past.
    • For cases where warnings are infeasible:
      • Report errors, but make every effort to give a targeted error message that directs users to the tracking issue
      • Submit PRs to all known affected crates that fix the issue
        • or, at minimum, alert the owners of those crates to the problem and direct them to the tracking issue
  4. Once the change has been in the wild for at least one cycle, we can stabilize the change, converting those warnings into errors.

Finally, for changes to rustc_ast that will affect plugins, the general policy is to batch these changes. That is discussed below in more detail.

Tracking issue

Every breaking change should be accompanied by a dedicated tracking issue for that change. The main text of this issue should describe the change being made, with a focus on what users must do to fix their code. The issue should be approachable and practical; it may make sense to direct users to an RFC or some other issue for the full details. The issue also serves as a place where users can comment with questions or other concerns.

A template for these breaking-change tracking issues can be found below. An example of how such an issue should look can be found here.

The issue should be tagged with (at least) B-unstable and T-compiler.

Tracking issue template

This is a template to use for tracking issues:

This is the **summary issue** for the `YOUR_LINT_NAME_HERE`
future-compatibility warning and other related errors. The goal of
this page is describe why this change was made and how you can fix
code that is affected by it. It also provides a place to ask questions
or register a complaint if you feel the change should not be made. For
more information on the policy around future-compatibility warnings,
see our [breaking change policy guidelines][guidelines].

[guidelines]: LINK_TO_THIS_RFC

#### What is the warning for?

*Describe the conditions that trigger the warning and how they can be
fixed. Also explain why the change was made.*

#### When will this warning become a hard error?

At the beginning of each 6-week release cycle, the Rust compiler team
will review the set of outstanding future compatibility warnings and
nominate some of them for **Final Comment Period**. Toward the end of
the cycle, we will review any comments and make a final determination
whether to convert the warning into a hard error or remove it
entirely.

Issuing future compatibility warnings

The best way to handle a breaking change is to begin by issuing future-compatibility warnings. These are a special category of lint warning. Adding a new future-compatibility warning can be done as follows.


// 1. Define the lint in `compiler/rustc_middle/src/lint/builtin.rs`:
declare_lint! {
    pub YOUR_ERROR_HERE,
    Warn,
    "illegal use of foo bar baz"
}

// 2. Add to the list of HardwiredLints in the same file:
impl LintPass for HardwiredLints {
    fn get_lints(&self) -> LintArray {
        lint_array!(
            ..,
            YOUR_ERROR_HERE
        )
    }
}

// 3. Register the lint in `compiler/rustc_lint/src/lib.rs`:
store.register_future_incompatible(sess, vec![
    ...,
    FutureIncompatibleInfo {
        id: LintId::of(YOUR_ERROR_HERE),
        reference: "issue #1234", // your tracking issue here!
    },
]);

// 4. Report the lint:
tcx.lint_node(
    lint::builtin::YOUR_ERROR_HERE,
    path_id,
    binding.span,
    format!("some helper message here"));

Helpful techniques

It can often be challenging to filter out new warnings from older, pre-existing errors. One technique that has been used in the past is to run the older code unchanged and collect the errors it would have reported. You can then issue warnings for any errors you would give which do not appear in that original set. Another option is to abort compilation after the original code completes if errors are reported: then you know that your new code will only execute when there were no errors before.
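
As a rough sketch of the first technique (hypothetical names and types, not actual compiler APIs): collect the diagnostics the unchanged code would already have reported as hard errors, and downgrade only the genuinely new findings to warnings.

use std::collections::HashSet;

// Hypothetical sketch: `old_errors` holds the diagnostics the unchanged
// code would have reported as hard errors.
fn report_findings(old_errors: &HashSet<String>, new_findings: &[String]) {
    for finding in new_findings {
        if old_errors.contains(finding) {
            // was already a hard error before the change: keep it an error
            eprintln!("error: {}", finding);
        } else {
            // previously-accepted code: issue a future-compatibility warning
            eprintln!("future-compatibility warning: {}", finding);
        }
    }
}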

Crater and crates.io

We should always do a crater run to assess impact. It is polite and considerate to at least notify the authors of affected crates of the breaking change. If we can submit PRs to fix the problem, so much the better.

Is it ever acceptable to go directly to issuing errors?

Changes that are believed to have negligible impact can go directly to issuing an error. One rule of thumb would be to check against crates.io: if fewer than 10 total affected projects are found (not root errors), we can move straight to an error. In such cases, we should still make the "breaking change" page as before, and we should ensure that the error directs users to this page. In other words, everything should be the same except that users are getting an error, and not a warning. Moreover, we should submit PRs to the affected projects (ideally before the PR implementing the change lands in rustc).

If the impact is not believed to be negligible (e.g., more than 10 crates are affected), then warnings are required (unless the compiler team agrees to grant a special exemption in some particular case). If implementing warnings is not feasible, then we should adopt an aggressive strategy of migrating crates before we land the change so as to lower the number of affected crates. Here are some techniques for approaching this scenario:

  1. Issue warnings for subparts of the problem, and reserve the new errors for the smallest set of cases you can.
  2. Try to give a very precise error message that suggests how to fix the problem and directs users to the tracking issue.
  3. It may also make sense to layer the fix:
    • First, add warnings where possible and let those land before proceeding to issue errors.
    • Work with authors of affected crates to ensure that corrected versions are available before the fix lands, so that downstream users can use them.

Stabilization

After a change is made, we will stabilize the change using the same process that we use for unstable features:

  • After a new release is made, we will go through the outstanding tracking issues corresponding to breaking changes and nominate some of them for final comment period (FCP).

  • The FCP for such issues lasts for one cycle. In the final week or two of the cycle, we will review comments and make a final determination:

    • Convert to error: the change should be made into a hard error.
    • Revert: we should remove the warning and continue to allow the older code to compile.
    • Defer: can't decide yet, wait longer, or try other strategies.

Ideally, breaking changes should have landed on the stable branch of the compiler before they are finalized.

Removing a lint

Once we have decided to make a "future warning" into a hard error, we need a PR that removes the custom lint. As an example, here are the steps required to remove the overlapping_inherent_impls compatibility lint. First, convert the name of the lint to uppercase (OVERLAPPING_INHERENT_IMPLS) and ripgrep through the source for that string. We will basically be converting each place where this lint name is mentioned (in the compiler, we use the upper-case name, and a macro automatically generates the lower-case string; so searching for overlapping_inherent_impls would not find much).

NOTE: these exact files don't exist anymore, but the procedure is still the same.

Remove the lint.

The first reference you will likely find is the lint definition in rustc_session/src/lint/builtin.rs that resembles this:


declare_lint! {
    pub OVERLAPPING_INHERENT_IMPLS,
    Deny, // this may also say Warning
    "two overlapping inherent impls define an item with the same name were erroneously allowed"
}

This declare_lint! macro creates the relevant data structures. Remove it. You will also find that there is a mention of OVERLAPPING_INHERENT_IMPLS later in the file as part of a lint_array!; remove it too.

Next, you will see a reference to OVERLAPPING_INHERENT_IMPLS in rustc_lint/src/lib.rs. This defines the lint as a "future compatibility lint":


FutureIncompatibleInfo {
    id: LintId::of(OVERLAPPING_INHERENT_IMPLS),
    reference: "issue #36889 <https://github.com/rust-lang/rust/issues/36889>",
},

Remove this too.

Add the lint to the list of removed lints.

In compiler/rustc_lint/src/lib.rs there is a list of "renamed and removed lints". You can add this lint to the list:


store.register_removed("overlapping_inherent_impls", "converted into hard error, see #36889");

where #36889 is the tracking issue for your lint.

Update the places that issue the lint

Finally, the last class of references you will see are the places that actually trigger the lint itself (i.e., what causes the warnings to appear). These you do not want to delete. Instead, you want to convert them into errors. In this case, the add_lint call looks like this:


self.tcx.sess.add_lint(lint::builtin::OVERLAPPING_INHERENT_IMPLS,
                       node_id,
                       self.tcx.span_of_impl(item1).unwrap(),
                       msg);

We want to convert this into an error. In some cases, there may be an existing error for this scenario. In others, we will need to allocate a fresh diagnostic code. Instructions for allocating a fresh diagnostic code can be found here. You may want to mention in the extended description that the compiler behavior changed on this point, and include a reference to the tracking issue for the change.

Let's say that we've adopted E0592 as our code. Then we can change the add_lint() call above to something like:


struct_span_err!(self.tcx.sess, self.tcx.span_of_impl(item1).unwrap(), E0592, msg)
    .emit();

Update tests

Finally, run the test suite. There should be some tests that used to reference the overlapping_inherent_impls lint; those will need to be updated. In general, if the test used to have #[deny(overlapping_inherent_impls)], that can just be removed.

./x.py test
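
For illustration, a test exercising this lint might have looked roughly like the sketch below before the change (using compiletest's //~ ERROR annotations; this is not an actual test from the suite). After the lint becomes a hard error, the deny attribute is removed and the expected diagnostic is updated.

#![deny(overlapping_inherent_impls)]

struct Foo;

impl Foo {
    fn method(&self) {}
}

impl Foo {
    fn method(&self) {} //~ ERROR duplicate definitions
}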

All done!

Open a PR. =)

Implement New Feature

When you want to implement a new significant feature in the compiler, you need to go through this process to make sure everything goes smoothly.

The @rfcbot (p)FCP process

When the change is small and uncontroversial, it can be done by just writing a PR and getting an r+ from someone who knows that part of the code. However, if the change is potentially controversial, it would be a bad idea to push it without consensus from the rest of the team (both in the "distributed system" sense to make sure you don't break anything you don't know about, and in the social sense to avoid PR fights).

If such a change seems to be too small to require a full formal RFC process (e.g. a big refactoring of the code, or a "technically-breaking" change, or a "big bugfix" that basically amounts to a small feature) but is still too controversial or big to get by with a single r+, you can start a pFCP (or, if you don't have r+ rights, ask someone who has them to start one - and unless they have a concern themselves, they should). pFCP stands for "proposed final comment period".

Again, the pFCP process is only needed if you need consensus - if you don't think anyone would have a problem with your change, it's ok to get by with only an r+. For example, it is OK to add or modify unstable command-line flags or attributes without a pFCP for compiler development or standard library use, as long as you don't expect them to be in wide use in the nightly ecosystem.

You don't need to have the implementation fully ready for r+ to ask for a pFCP, but it is generally a good idea to have at least a proof of concept so that people can see what you are talking about.

When a pFCP is started, it requires all members of the team to sign off on the FCP. After they all do so, there's a 10-day "final comment period" where everybody can comment, and if no new concerns are raised, the PR/issue gets FCP approval.

The logistics of writing features

There are a few "logistic" hoops you might need to go through in order to implement a feature in a working way.

Warning Cycles

In some cases, a feature or bugfix might break some existing programs in some edge cases. In that case, you might want to do a crater run to assess the impact and possibly add a future-compatibility lint, similar to those used for edition-gated lints.

Stability

We value the stability of Rust. Code that works and runs on stable should (mostly) not break. Because of that, we don't want to release a feature to the world with only team consensus and code review - we want to gain real-world experience on using that feature on nightly, and we might want to change the feature based on that experience.

To allow for that, we must make sure users don't accidentally depend on that new feature - otherwise, especially if experimentation takes time or is delayed and the feature takes the trains to stable, it would end up de facto stable and we would not be able to make changes to it without breaking people's code.

The way we do that is to make sure all new features are feature gated - they can't be used without enabling a feature gate (#![feature(foo)]), which can't be done in a stable/beta compiler. See the stability in code section for the technical details.

Eventually, after we gain enough experience using the feature, make the necessary changes, and are satisfied, we expose it to the world using the stabilization process described here. Until then, the feature is not set in stone: every part of the feature can be changed, or the feature might be completely rewritten or removed. Features are not supposed to gain tenure by being unstable and unchanged for a year.

Tracking Issues

To keep track of the status of an unstable feature, the experience we get while using it on nightly, and of the concerns that block its stabilization, every feature-gate needs a tracking issue.

General discussions about the feature should be done on the tracking issue.

For features that have an RFC, you should use the RFC's tracking issue for the feature.

For other features, you'll have to make a tracking issue for that feature. The issue title should be "Tracking issue for YOUR FEATURE".

For tracking issues for features (as opposed to future-compat warnings), I don't think the description has to contain anything specific. Generally we put the list of items required for stabilization in a checklist, e.g.,

**Steps:**

- [ ] Implement the RFC. (CC @rust-lang/compiler -- can anyone write
      up mentoring instructions?)
- [ ] Adjust the documentation. ([See instructions on rustc-dev-guide.](https://rustc-dev-guide.rust-lang.org/stabilization_guide.html#documentation-prs))
- [ ] Stabilize the feature. ([See instructions on rustc-dev-guide.](https://rustc-dev-guide.rust-lang.org/stabilization_guide.html#stabilization-pr))

Stability in code

The following steps need to be followed in order to implement a new unstable feature:

  1. Open a tracking issue - if you have an RFC, you can use the tracking issue for the RFC.

    The tracking issue should be labeled with at least C-tracking-issue. For a language feature, a label F-feature_name should be added as well.

  2. Pick a name for the feature gate (for RFCs, use the name in the RFC).

  3. Add a feature gate declaration to rustc_feature/src/active.rs in the active declare_features block. See here for detailed instructions.

  4. Prevent usage of the new feature unless the feature gate is set. You can check it in most places in the compiler using the expression tcx.features().$feature_name (or sess.features_untracked().$feature_name if the tcx is unavailable); see the sketch after this list.

    If the feature gate is not set, you should either maintain the pre-feature behavior or raise an error, depending on what makes sense.

    For features introducing new syntax, pre-expansion gating should be used instead. To do so, extend the GatedSpans struct, add spans to it during parsing, and then finally feature-gate all the spans in rustc_ast_passes::feature_gate::check_crate.

  5. Add a test to ensure the feature cannot be used without a feature gate, by creating feature-gate-$feature_name.rs and feature-gate-$feature_name.stderr files under the directory where the other tests for your feature reside.

  6. Add a section to the unstable book, in src/doc/unstable-book/src/language-features/$feature_name.md.

  7. Write a lot of tests for the new feature. PRs without tests will not be accepted!

  8. Get your PR reviewed and land it. You have now successfully implemented a feature in Rust!
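
As referenced in step 4 above, here is a minimal sketch of a feature-gate check (my_feature_name is a placeholder; feature_err is the helper in rustc_session::parse for emitting the standard gating diagnostic):

// sketch: gate some operation at `span` behind the (placeholder) feature
if !tcx.features().my_feature_name {
    rustc_session::parse::feature_err(
        &tcx.sess.parse_sess,
        sym::my_feature_name,
        span,
        "`my_feature_name` is experimental",
    )
    .emit();
}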

Stability attributes

This section is about the stability attributes and schemes that allow stable APIs to use unstable APIs internally in the rustc standard library.

For instructions on stabilizing a language feature see Stabilizing Features.

unstable

The #[unstable(feature = "foo", issue = "1234", reason = "lorem ipsum")] attribute explicitly marks an item as unstable. Items that are marked as "unstable" cannot be used without a corresponding #![feature] attribute on the crate, even on a nightly compiler. This restriction only applies across crate boundaries; unstable items may be used within the crate that defines them.

The issue field specifies the associated GitHub issue number. This field is required and all unstable features should have an associated tracking issue. In rare cases where there is no sensible value, issue = "none" is used.

The unstable attribute infects all sub-items, so the attribute doesn't have to be reapplied to them. If you apply it to a module, all items in the module will be unstable.

You can make specific sub-items stable by using the #[stable] attribute on them. The stability scheme works similarly to how pub works: you can have public functions in nonpublic modules, and you can have stable functions in unstable modules or vice versa.
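
For example, a sketch with hypothetical feature names (not real standard library items):

// an unstable module...
#[unstable(feature = "foo_internals", issue = "1234", reason = "new API")]
pub mod foo_internals {
    // ...containing a stabilized function, mirroring how `pub` nests
    #[stable(feature = "foo_make", since = "1.0.0")]
    pub fn make() {}
}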

Note, however, that due to a rustc bug, stable items inside unstable modules are available to stable code in that location! So, for example, stable code can import core::intrinsics::transmute even though intrinsics is an unstable module. Thus, this kind of nesting should be avoided when possible.

The unstable attribute may also have the soft value, which makes it a future-incompatible deny-by-default lint instead of a hard error. This is used by the bench attribute which was accidentally accepted in the past. This prevents breaking dependencies by leveraging Cargo's lint capping.

stable

The #[stable(feature = "foo", since = "1.420.69")] attribute explicitly marks an item as stabilized. To do this, follow the instructions in Stabilizing Features.

Note that stable functions may use unstable things in their body.

rustc_const_unstable

The #[rustc_const_unstable(feature = "foo", issue = "1234", reason = "lorem ipsum")] attribute has the same interface as the unstable attribute. It is used to mark a const fn's constness as unstable. This allows you to make a function stable without stabilizing its constness, or even to mark an existing stable function as const fn without instantly stabilizing its const-ness.

Furthermore this attribute is needed to mark an intrinsic as const fn, because there's no way to add const to functions in extern blocks for now.

rustc_const_stable

The #[rustc_const_stable(feature = "foo", since = "1.420.69")] attribute explicitly marks a const fn as having its constness be stable. This attribute can make sense even on an unstable function, if that function is called from another rustc_const_stable function.

Furthermore this attribute is needed to mark an intrinsic as callable from rustc_const_stable functions.

allow_internal_unstable

Macros, compiler desugarings and const fns expose their bodies to the call site. To work around not being able to use unstable things in the standard library's macros, there's the #[allow_internal_unstable(feature1, feature2)] attribute that allows the given features to be used in stable macros or const fns.
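
A sketch with hypothetical names (internal_helpers standing in for a real feature gate, internal_helper for an unstable function):

// a stable, exported macro whose expansion calls an unstable function;
// stable callers may invoke the macro even though they could not call
// the helper directly
#[allow_internal_unstable(internal_helpers)]
#[stable(feature = "do_thing_macro", since = "1.0.0")]
#[macro_export]
macro_rules! do_thing {
    () => {
        $crate::internal_helper()
    };
}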

Note that const fns are even more special in this regard. You can't just allow any feature; the features need an implementation in qualify_min_const_fn.rs. For example, the const_fn_union feature gate allows accessing fields of unions inside stable const fns. The rule for when it's OK to use such a feature gate is that the behavior must match the runtime behavior of the same code (see also this blog post). This means that you may not create a const fn that e.g. transmutes a memory address to an integer, because the addresses of things are nondeterministic and often unknown at compile-time.

Always ping @oli-obk, @RalfJung, and @Centril if you are adding more allow_internal_unstable attributes to any const fn.

staged_api

Any crate that uses the stable, unstable, or rustc_deprecated attributes must include the #![feature(staged_api)] attribute on the crate.

rustc_deprecated

The deprecation system shares the same infrastructure as the stable/unstable attributes. The rustc_deprecated attribute is similar to the deprecated attribute. It was previously called deprecated, but was split off when deprecated was stabilized. The deprecated attribute cannot be used in a staged_api crate, rustc_deprecated must be used instead. The deprecated item must also have a stable or unstable attribute.

rustc_deprecated has the following form:

#[rustc_deprecated(
    since = "1.38.0",
    reason = "explanation for deprecation",
    suggestion = "other_function"
)]

The suggestion field is optional. If given, it should be a string that can be used as a machine-applicable suggestion to correct the warning. This is typically used when the identifier is renamed, but no other significant changes are necessary.

Another difference from the deprecated attribute is that the since field is actually checked against the current version of rustc. If since is in a future version, then the deprecated_in_future lint is triggered which is default allow, but most of the standard library raises it to a warning with #![warn(deprecated_in_future)].

-Zforce-unstable-if-unmarked

The -Zforce-unstable-if-unmarked flag has a variety of purposes to help enforce that the correct crates are marked as unstable. It was introduced primarily to allow rustc and the standard library to link to arbitrary crates on crates.io which do not themselves use staged_api. rustc also relies on this flag to mark all of its crates as unstable with the rustc_private feature so that each crate does not need to be carefully marked with unstable.

This flag is automatically applied to all of rustc and the standard library by the bootstrap scripts. This is needed because the compiler and all of its dependencies are shipped in the sysroot to all users.

This flag has the following effects:

  • Marks the crate as "unstable" with the rustc_private feature if it is not itself marked as stable or unstable.
  • Allows these crates to access other forced-unstable crates without any need for attributes. Normally a crate would need a #![feature(rustc_private)] attribute to use other unstable crates. However, that would make it impossible for a crate from crates.io to access its own dependencies since that crate won't have a feature(rustc_private) attribute, but everything is compiled with -Zforce-unstable-if-unmarked.

Code which does not use -Zforce-unstable-if-unmarked should include the #![feature(rustc_private)] crate attribute to access these force-unstable crates. This is needed for things that link rustc, such as miri, rls, or clippy.
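
For example, a tool linking the compiler would start with something like:

// a crate that links rustc must opt in to the force-unstable crates
#![feature(rustc_private)]

extern crate rustc_driver;
extern crate rustc_middle;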

Request for stabilization

Once an unstable feature has been well-tested with no outstanding concerns, anyone may push for its stabilization. The process involves the following steps:

  • Documentation PRs
  • Write a stabilization report
  • FCP
  • Stabilization PR

Documentation PRs

If any documentation for this feature exists, it should be in the Unstable Book, located at src/doc/unstable-book. The page for the feature gate should be removed.

If there was documentation there, it needs to be integrated into the existing documentation.

If there wasn't documentation there, it needs to be added.

Places that may need updated documentation:

  • The Reference: This must be updated, in full detail.
  • The Book: This may or may not need updating, depending on the feature. If you're not sure, please open an issue on this repository and it can be discussed.
  • standard library documentation: As needed. Language features often don't need this, but if it's a feature that changes how good examples are written, such as when ? was added to the language, updating examples is important.
  • Rust by Example: As needed.

Prepare PRs to update documentation involving this new feature for repositories mentioned above. Maintainers of these repositories will keep these PRs open until the whole stabilization process has completed. Meanwhile, we can proceed to the next step.

Write a stabilization report

Find the tracking issue of the feature, and create a short stabilization report. Essentially this would be a brief summary of the feature plus some links to test cases showing it works as expected, along with a list of edge cases that came up and were considered. This is a minimal "due diligence" that we do before stabilizing.

The report should contain:

  • A summary, showing examples (e.g. code snippets) of what is enabled by this feature.
  • Links to test cases in our test suite regarding this feature, and a description of the feature's behavior on encountering edge cases.
  • Links to the documentation (the PRs we have made in the previous steps).
  • Any other relevant information (examples of such reports can be found in rust-lang/rust#44494 and rust-lang/rust#28237).
  • The resolutions of any unresolved questions if the stabilization is for an RFC.

FCP

If any member of the team responsible for tracking this feature agrees with stabilizing this feature, they will start the FCP (final-comment-period) process by commenting

@rfcbot fcp merge

The rest of the team members will review the proposal. If the final decision is to stabilize, we proceed to do the actual code modification.

Stabilization PR

Once we have decided to stabilize a feature, we need to have a PR that actually makes that stabilization happen. These kinds of PRs are a great way to get involved in Rust, as they take you on a little tour through the source code.

Here is a general guide to how to stabilize a feature -- every feature is different, of course, so some features may require steps beyond what this guide talks about.

Note: Before we stabilize any feature, it's the rule that it should appear in the documentation.

Updating the feature-gate listing

There is a central listing of feature-gates in compiler/rustc_feature. Search for the declare_features! macro. There should be an entry for the feature you are aiming to stabilize, something like this (the example is taken from rust-lang/rust#32409):

// pub(restricted) visibilities (RFC 1422)
(active, pub_restricted, "1.9.0", Some(32409)),

The above line should be moved down to the area for "accepted" features, declared below in a separate call to declare_features!. When it is done, it should look like:

// pub(restricted) visibilities (RFC 1422)
(accepted, pub_restricted, "1.31.0", Some(32409)),
// note that we changed this

Note that the version number is updated to be the version number of the stable release where this feature will appear. This can be found by consulting the forge, which will tell you the next stable release number. You want to add 1 to that, because the code that lands today will go into beta on that date, and then become stable after that. So, at the time of this writing, the next stable release (i.e. what was then in beta) was 1.30.0, hence I wrote 1.31.0 above.

Removing existing uses of the feature-gate

Next, search for the feature string (in this case, pub_restricted) in the codebase to find where it appears. Change uses of #![feature(XXX)] from std and any rustc crates to #![cfg_attr(bootstrap, feature(XXX))]. This includes the feature-gate only for stage0, which is built using the current beta (this is needed because the feature is still unstable in the current beta).
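
Illustratively, for our example feature:

// before stabilization, in std or a rustc crate:
#![feature(pub_restricted)]

// after stabilization: keep the gate only for the stage0 (beta) build
#![cfg_attr(bootstrap, feature(pub_restricted))]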

Also, remove those strings from any tests. If there are tests specifically targeting the feature-gate (i.e., testing that the feature-gate is required to use the feature, but nothing else), simply remove the test.

Do not require the feature-gate to use the feature

Most importantly, remove the code which flags an error if the feature-gate is not present (since the feature is now considered stable). If the feature can be detected because it employs some new syntax, then a common place for that code to be is compiler/rustc_ast_passes/src/feature_gate.rs. For example, you might see code like this:

gate_feature_post!(&self, pub_restricted, span,
                   "`pub(restricted)` syntax is experimental");

This gate_feature_post! macro prints an error if the pub_restricted feature is not enabled. It is not needed now that pub(restricted) is stable.

For more subtle features, you may find code like this:

if self.tcx.sess.features.borrow().pub_restricted { /* XXX */ }

This pub_restricted field (obviously named after the feature) would ordinarily be false if the feature flag is not present and true if it is. So transform the code to assume that the field is true. In this case, that would mean removing the if and leaving just the /* XXX */.

if self.tcx.sess.features.borrow().pub_restricted { /* XXX */ }
becomes
/* XXX */

if self.tcx.sess.features.borrow().pub_restricted && something { /* XXX */ }
becomes
if something { /* XXX */ }

Feature Gates

This chapter is intended to provide basic help for adding, removing, and modifying feature gates.

Adding a feature gate

See "Stability in code" for help with adding a new feature; this section just covers how to add the feature gate declaration.

Add a feature gate declaration to rustc_feature/src/active.rs in the active declare_features block:

/// description of feature
(active, $feature_name, "$current_nightly_version", Some($tracking_issue_number), $edition)

where $edition has the type Option<Edition>, and is typically just None.

For example:

/// Allows defining identifiers beyond ASCII.
(active, non_ascii_idents, "1.0.0", Some(55467), None),

When added, the current version should be the one for the current nightly. Once the feature is moved to accepted.rs, the version is changed to that nightly version.

Removing a feature gate

To remove a feature gate, follow these steps:

  1. Remove the feature gate declaration in rustc_feature/src/active.rs. It will look like this:

    /// description of feature
    (active, $feature_name, "$version", Some($tracking_issue_number), $edition)
    
  2. Add a modified version of the feature gate declaration that you just removed to rustc_feature/src/removed.rs:

    /// description of feature
    (removed, $old_feature_name, "$version", Some($tracking_issue_number), $edition,
     Some("$why_it_was_removed"))
    

Renaming a feature gate

To rename a feature gate, follow these steps (the first two are the same steps to follow when removing a feature gate):

  1. Remove the old feature gate declaration in rustc_feature/src/active.rs. It will look like this:

    /// description of feature
    (active, $old_feature_name, "$version", Some($tracking_issue_number), $edition)
    
  2. Add a modified version of the old feature gate declaration that you just removed to rustc_feature/src/removed.rs:

    /// description of feature
    /// Renamed to `$new_feature_name`
    (removed, $old_feature_name, "$version", Some($tracking_issue_number), $edition,
     Some("renamed to `$new_feature_name`"))
    
  3. Add a feature gate declaration with the new name to rustc_feature/src/active.rs. It should look very similar to the old declaration:

    /// description of feature
    (active, $new_feature_name, "$version", Some($tracking_issue_number), $edition)
    

Stabilizing a feature

See "Updating the feature-gate listing" in the "Stabilizing Features" chapter for instructions. There are additional steps you will need to take beyond just updating the declaration!

Coding conventions

This chapter offers some tips on the coding conventions for rustc. It covers formatting, coding for correctness, using crates from crates.io, and some tips on structuring your PR for easy review.

Formatting and the tidy script

rustc is moving towards the Rust standard coding style. This is enforced by the "tidy" script and can be mostly automated using ./x.py fmt.

As the output of rustfmt is not completely stable, formatting this repository using cargo fmt is not recommended.

The tidy script runs automatically when you do ./x.py test and can be run in isolation with ./x.py test tidy.

Copyright notice

In the past, files began with a copyright and license notice. Please omit this notice for new files licensed under the standard terms (dual MIT/Apache-2.0).

All of the copyright notices should be gone by now, but if you come across one in the rust-lang/rust repo, feel free to open a PR to remove it.

Line length

Lines should be at most 100 characters. It's even better if you can keep things to 80.

Ignoring the line length limit. Sometimes – in particular for tests – it can be necessary to exempt yourself from this limit. In that case, you can add a comment towards the top of the file like so:


// ignore-tidy-linelength

Tabs vs spaces

Prefer 4-space indent.

Coding for correctness

Beyond formatting, there are a few other tips that are worth following.

Prefer exhaustive matches

Using _ in a match is convenient, but it means that when new variants are added to the enum, they may not get handled correctly. Ask yourself: if a new variant were added to this enum, what's the chance that it would want to use the _ code, versus having some other treatment? Unless the answer is "low", then prefer an exhaustive match. (The same advice applies to if let and while let, which are effectively tests for a single variant.)
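
For instance (illustrative enum, not compiler code):

enum Mode {
    Read,
    Write,
}

fn describe(mode: Mode) -> &'static str {
    match mode {
        Mode::Read => "read",
        Mode::Write => "write",
        // no `_` arm: if a new `Mode` variant is added, this match stops
        // compiling, forcing a conscious decision instead of a silent default
    }
}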

Use "TODO" comments for things you don't want to forget

As a useful tool to yourself, you can insert a // TODO comment for something that you want to get back to before you land your PR:

fn do_something() {
    if something_else {
        unimplemented!(); // TODO write this
    }
}

The tidy script will report an error for a // TODO comment, so this code would not be able to land until the TODO is fixed (or removed).

This can also be useful in a PR as a way to signal from one commit that you are leaving a bug that a later commit will fix:

if foo {
    return true; // TODO wrong, but will be fixed in a later commit
}

Using crates from crates.io

It is allowed to use crates from crates.io, though external dependencies should not be added gratuitously. All such crates must have a suitably permissive license. There is an automatic check which inspects the Cargo metadata to ensure this.

How to structure your PR

How you prepare the commits in your PR can make a big difference for the reviewer. Here are some tips.

Isolate "pure refactorings" into their own commit. For example, if you rename a method, then put that rename into its own commit, along with the renames of all the uses.

More commits is usually better. If you are doing a large change, it's almost always better to break it up into smaller steps that can be independently understood. The one thing to be aware of is that if you introduce some code following one strategy, then change it dramatically (versus adding to it) in a later commit, that 'back-and-forth' can be confusing.

Format liberally. While only the final commit of a PR must be correctly formatted, it is both easier to review and less noisy to format each commit individually using ./x.py fmt.

No merges. We do not allow merge commits into our history, other than those by bors. If you get a merge conflict, rebase instead via a command like git rebase -i rust-lang/master (presuming you use the name rust-lang for your remote).

Individual commits do not have to build (but it's nice). We do not require that every intermediate commit successfully builds – we only expect to be able to bisect at a PR level. However, if you can make individual commits build, that is always helpful.

Naming conventions

Apart from normal Rust style/naming conventions, there are also some specific to the compiler.

  • cx tends to be short for "context" and is often used as a suffix. For example, tcx is a common name for the Typing Context.

  • 'tcx is used as the lifetime name for the Typing Context.

  • Because crate is a keyword, if you need a variable to represent something crate-related, often the spelling is changed to krate.
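
For example (illustrative, not actual compiler code):

struct Crate {
    name: String,
}

// `crate` is a keyword, so the conventional variable name is `krate`
fn crate_name(krate: &Crate) -> &str {
    &krate.name
}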

Notification groups

The notification groups are an easy way to help out with rustc in a "piecemeal" fashion, without committing to a larger project. Notification groups are easy to join (just submit a PR!) and joining does not entail any particular commitment.

Once you join a notification group, you will be added to a list that receives pings on github whenever a new issue is found that fits the notification group's criteria. If you are interested, you can then claim the issue and start working on it.

Of course, you don't have to wait for new issues to be tagged! If you prefer, you can use the Github label for a notification group to search for existing issues that haven't been claimed yet.

List of notification groups

Here's the list of the notification groups:

  • ARM
  • Cleanup Crew
  • LLVM
  • RISC-V
  • Windows

What issues are a good fit for notification groups?

Notification groups tend to get pinged on isolated bugs, particularly those of middle priority:

  • By isolated, we mean that we do not expect large-scale refactoring to be required to fix the bug.
  • By middle priority, we mean that we'd like to see the bug fixed, but it's not such a burning problem that we are dropping everything else to fix it. The danger with such bugs, of course, is that they can accumulate over time, and the role of the notification group is to try and stop that from happening!

Joining a notification group

To join a notification group, you just have to open a PR adding your Github username to the appropriate file in the Rust team repository. See the "example PRs" below to get a precise idea and to identify the file to edit.

Also, if you are not already a member of a Rust team then -- in addition to adding your name to the file -- you have to check out the repository and run the following command:

cargo run add-person $your_user_name

Example PRs:

Tagging an issue for a notification group

To tag an issue as appropriate for a notification group, you give rustbot a ping command with the name of the notification group. For example:

@rustbot ping llvm
@rustbot ping cleanup-crew
@rustbot ping windows
@rustbot ping arm

To make some commands shorter and easier to remember, there are aliases, defined in the triagebot.toml file. For example, all of these commands are equivalent and will ping the Cleanup Crew:

@rustbot ping cleanup
@rustbot ping bisect
@rustbot ping reduce

Keep in mind that these aliases are meant to make humans' life easier. They might be subject to change. If you need to ensure that a command will always be valid, prefer the full invocations over the aliases.

Note though that this should only be done by compiler team members or contributors, and is typically done as part of compiler team triage.

ARM notification group

Github Label: O-ARM

This list will be used to ask for help both in diagnosing and testing ARM-related issues as well as suggestions on how to resolve interesting questions regarding our ARM support.

The group also has an associated Zulip stream (#t-compiler/arm) where people can go to pose questions and discuss ARM-specific topics.

So, if you are interested in participating, please sign up for the ARM group! To do so, open a PR against the rust-lang/team repository. Just follow this example, but change the username to your own!

Cleanup Crew

Github Label: ICEBreaker-Cleanup-Crew

The "Cleanup Crew" are focused on improving bug reports. Specifically, the goal is to try to ensure that every bug report has all the information that will be needed for someone to fix it:

  • a minimal, standalone example that shows the problem
  • links to duplicates or related bugs
  • if the bug is a regression (something that used to work, but no longer does), then a bisection to the PR or nightly that caused the regression

This kind of cleanup is invaluable in getting bugs fixed. Better still, it can be done by anybody who knows Rust, without any particularly deep knowledge of the compiler.

Let's look a bit at the workflow for doing "cleanup crew" actions.

Finding a minimal, standalone example

Here the ultimate goal is to produce an example that reproduces the same problem but without relying on any external crates. Such a test ought to contain as little code as possible, as well. This will make it much easier to isolate the problem.

However, even if the "ultimate minimal test" cannot be achieved, it's still useful to post incremental minimizations. For example, if you can eliminate some of the external dependencies, that is helpful, and so forth.

It's particularly useful to reduce to an example that works in the Rust playground, rather than requiring people to check out a cargo build.

There are many resources for how to produce minimized test cases. Here are a few:

  • The rust-reduce tool can try to reduce code automatically.
    • The C-reduce tool also works on Rust code, though it requires that you start from a single file. (XXX link to some post explaining how to do it?)
  • pnkfelix's Rust Bug Minimization Patterns blog post
    • This post focuses on "heavy bore" techniques, where you are starting with a large, complex cargo project that you wish to narrow down to something standalone.

Links to duplicate or related bugs

If you are on the "Cleanup Crew", you will sometimes see multiple bug reports that seem very similar. You can link one to the other just by mentioning the other bug number in a Github comment. Sometimes it is useful to close duplicate bugs. But if you do so, you should always copy any test case from the bug you are closing to the other bug that remains open, as sometimes duplicate-looking bugs will expose different facets of the same problem.

Bisecting regressions

For regressions (something that used to work, but no longer does), it is super useful if we can figure out precisely when the code stopped working. The gold standard is to be able to identify the precise PR that broke the code, so we can ping the author, but even narrowing it down to a nightly build is helpful, especially as that then gives us a range of PRs. (One other challenge is that we sometimes land "rollup" PRs, which combine multiple PRs into one.)

cargo-bisect-rustc

To help in figuring out the cause of a regression we have a tool called cargo-bisect-rustc. It will automatically download and test various builds of rustc. For recent regressions, it is even able to use the builds from our CI to track down the regression to a specific PR; for older regressions, it will simply identify a nightly.

To learn to use cargo-bisect-rustc, check out this blog post, which gives a quick introduction to how it works. You can also ask questions at the Zulip stream #t-compiler/cargo-bisect-rustc, or help in improving the tool.

identifying the range of PRs in a nightly

If the regression occurred more than 90 days ago, then cargo-bisect-rustc will not be able to identify the particular PR that caused the regression, just the nightly build. In that case, we can identify the set of PRs that this corresponds to by using the git history.

The command rustc +nightly -vV will cause rustc to output a number of useful bits of version info, including the commit-hash. Given the commit-hashes of two nightly versions, you can find all of the PRs that have landed in between by taking the following steps:

  1. Go to an updated checkout of the rust-lang/rust repository
  2. Execute the command git log --author=bors --format=oneline SHA1..SHA2
    • This will list out all of the commits by bors, which is our merge bot
    • Each commit corresponds to one PR, and information about the PR should be in the description
  3. Copy and paste that information into the bug report

Often, just eye-balling the PR descriptions (which are included in the commit messages) will give you a good idea which one likely caused the problem. But if you're unsure, feel free to just ping the compiler team (@rust-lang/compiler) or else ping the authors of the PR themselves.

LLVM Notification group

Github Label: A-LLVM

The "LLVM Notification Group" are focused on bugs that center around LLVM. These bugs often arise because of LLVM optimizations gone awry, or as the result of an LLVM upgrade. The goal here is:

  • to determine whether the bug is a result of us generating invalid LLVM IR, or LLVM misoptimizing;
  • if the former, to fix our IR;
  • if the latter, to try and file a bug on LLVM (or identify an existing bug).

The group may also be asked to weigh in on other sorts of LLVM-focused questions.

Helpful tips and options

The "Debugging LLVM" section of the rustc-dev-guide gives a step-by-step process for how to help debug bugs caused by LLVM. In particular, it discusses how to emit LLVM IR, run the LLVM IR optimization pipelines, and so forth. You may also find it useful to look at the various codegen options listed under -Chelp and the internal options under -Zhelp -- there are a number that pertain to LLVM (just search for LLVM).

If you do narrow to an LLVM bug

The "Debugging LLVM" section also describes what to do once you've identified the bug.

RISC-V notification group

Github Label: O-riscv

This list will be used to ask for help both in diagnosing and testing RISC-V-related issues as well as suggestions on how to resolve interesting questions regarding our RISC-V support.

The group also has an associated Zulip stream (#t-compiler/risc-v) where people can go to pose questions and discuss RISC-V-specific topics.

So, if you are interested in participating, please sign up for the RISC-V group! To do so, open a PR against the rust-lang/team repository. Just follow this example, but change the username to your own!

Windows notification group

Github Label: O-Windows

This list will be used to ask for help both in diagnosing and testing Windows-related issues as well as suggestions on how to resolve interesting questions regarding our Windows support.

The group also has an associated Zulip stream (#t-compiler/windows) where people can go to pose questions and discuss Windows-specific topics.

To get a better idea for what the group will do, here are some examples of the kinds of questions where we would have reached out to the group for advice in determining the best course of action:

  • Which versions of MinGW should we support?
  • Should we remove the legacy InnoSetup GUI installer? #72569
  • What names should we use for static libraries on Windows? #29520

So, if you are interested in participating, please sign up for the Windows group! To do so, open a PR against the rust-lang/team repository. Just follow this example, but change the username to your own!

rust-lang/rust Licenses

The rustc compiler source and standard library are dual licensed under the Apache License v2.0 and the MIT License unless otherwise specified.

Detailed licensing information is available in the COPYRIGHT document of the rust-lang/rust repository.

Part 2: High-Level Compiler Architecture

The remaining parts of this guide discuss how the compiler works. They go through everything from high-level structure of the compiler to how each stage of compilation works. They should be friendly to both readers interested in the end-to-end process of compilation and readers interested in learning about a specific system they wish to contribute to. If anything is unclear, feel free to file an issue on the rustc-dev-guide repo or contact the compiler team, as detailed in this chapter from Part 1.

In this part, we will look at the high-level architecture of the compiler. In particular, we will look at three overarching design choices that impact the whole compiler: the query system, incremental compilation, and interning.

Overview of the Compiler

This chapter is about the overall process of compiling a program -- how everything fits together.

The rust compiler is special in two ways: it does things to your code that other compilers don't do (e.g. borrow checking) and it has a lot of unconventional implementation choices (e.g. queries). We will talk about these in turn in this chapter, and in the rest of the guide, we will look at all the individual pieces in more detail.

What the compiler does to your code

So first, let's look at what the compiler does to your code. For now, we will avoid mentioning how the compiler implements these steps except as needed; we'll talk about that later.

  • The compile process begins when a user writes a Rust source program in text and invokes the rustc compiler on it. The work that the compiler needs to perform is defined by command-line options. For example, it is possible to enable nightly features (-Z flags), perform check-only builds, or emit LLVM-IR rather than executable machine code. rustc is often invoked indirectly through cargo.

  • Command line argument parsing occurs in the rustc_driver crate. This crate defines the compile configuration that is requested by the user and passes it to the rest of the compilation process as a rustc_interface::Config.

  • The raw Rust source text is analyzed by a low-level lexer located in rustc_lexer. At this stage, the source text is turned into a stream of atomic source code units known as tokens. The lexer supports the Unicode character encoding.

  • The token stream passes through a higher-level lexer located in rustc_parse to prepare for the next stage of the compile process. The StringReader struct is used at this stage to perform a set of validations and turn strings into interned symbols (interning is discussed later). String interning is a way of storing only one immutable copy of each distinct string value.

  • The lexer has a small interface and doesn't depend directly on the diagnostic infrastructure in rustc. Instead it provides diagnostics as plain data which are emitted in rustc_parse::lexer::mod as real diagnostics.

  • The lexer preserves full fidelity information for both IDEs and proc macros.

  • The parser translates the token stream from the lexer into an Abstract Syntax Tree (AST). It uses a recursive descent (top-down) approach to syntax analysis. The crate entry points for the parser are the Parser::parse_crate_mod() and Parser::parse_mod() methods found in rustc_parse::parser::item. The external module parsing entry point is rustc_expand::module::parse_external_mod. And the macro parser entry point is Parser::parse_nonterminal().

  • Parsing is performed with a set of Parser utility methods including fn bump, fn check, fn eat, fn expect, fn look_ahead (a minimal sketch of this style appears just after this list).

  • Parsing is organized by the semantic construct that is being parsed. Separate parse_* methods can be found in rustc_parse parser directory. The source file name follows the construct name. For example, the following files are found in the parser:

    • expr.rs
    • pat.rs
    • ty.rs
    • stmt.rs
  • This naming scheme is used across many compiler stages. You will find either a file or directory with the same name across the parsing, lowering, type checking, THIR lowering, and MIR building sources.

  • Macro expansion, AST validation, name resolution, and early linting take place during this stage of the compile process.

  • The parser uses the standard DiagnosticBuilder API for error handling, but we try to recover, parsing a superset of Rust's grammar, while also emitting an error.

  • rustc_ast::ast::{Crate, Mod, Expr, Pat, ...} AST nodes are returned from the parser.

  • We then take the AST and convert it to High-Level Intermediate Representation (HIR). This is a compiler-friendly representation of the AST. This involves a lot of desugaring of things like loops and async fn.

  • We use the HIR to do type inference. This is the process of automatic detection of the type of an expression.

  • TODO: Maybe some other things are done here? I think initial type checking happens here? And trait solving?

  • The HIR is then lowered to Mid-Level Intermediate Representation (MIR).

    • Along the way, we construct the THIR, which is an even more desugared HIR. THIR is used for pattern and exhaustiveness checking. It is also more convenient to convert into MIR than HIR is.
  • The MIR is used for borrow checking.

  • We (want to) do many optimizations on the MIR because it is still generic and that improves the code we generate later, improving compilation speed too.

    • MIR is a higher level (and generic) representation, so it is easier to do some optimizations at MIR level than at LLVM-IR level. For example LLVM doesn't seem to be able to optimize the pattern the simplify_try mir opt looks for.
  • Rust code is monomorphized, which means making copies of all the generic code with the type parameters replaced by concrete types. To do this, we need to collect a list of what concrete types to generate code for. This is called monomorphization collection.

  • We then begin what is vaguely called code generation or codegen.

    • The code generation stage (codegen) is when higher level representations of source are turned into an executable binary. rustc uses LLVM for code generation. The first step is to convert the MIR to LLVM Intermediate Representation (LLVM IR). This is where the MIR is actually monomorphized, according to the list we created in the previous step.
    • The LLVM IR is passed to LLVM, which runs many more optimizations on it and then emits machine code (e.g. an ELF object or wasm), which is basically assembly code with additional low-level types and annotations.
    • The different libraries/binaries are linked together to produce the final binary.
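
Here is the promised sketch of the parser's recursive-descent style and its bump/check/eat/expect utilities. The types and method bodies below are invented for illustration and are not rustc's real API:

#[derive(Clone, PartialEq, Debug)]
enum Token { Ident(String), Plus, Eof }

struct Parser { tokens: Vec<Token>, pos: usize }

impl Parser {
    // Advance to the next token.
    fn bump(&mut self) { self.pos += 1; }

    // Peek without consuming: is the current token `t`?
    fn check(&self, t: &Token) -> bool { self.tokens.get(self.pos) == Some(t) }

    // Consume the current token if it equals `t`; report whether we did.
    fn eat(&mut self, t: &Token) -> bool {
        let present = self.check(t);
        if present { self.bump(); }
        present
    }

    // Consume `t` or fail. The real parser emits a diagnostic and tries
    // to recover rather than returning a plain error.
    fn expect(&mut self, t: &Token) -> Result<(), String> {
        if self.eat(t) { Ok(()) } else { Err(format!("expected {:?}", t)) }
    }
}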

How it does it

Ok, so now that we have a high-level view of what the compiler does to your code, let's take a high-level view of how it does all that stuff. There are a lot of constraints and conflicting goals that the compiler needs to satisfy/optimize for. For example,

  • Compilation speed: how fast is it to compile a program? More/better compile-time analyses often mean compilation is slower.
    • Also, we want to support incremental compilation, so we need to take that into account. How can we keep track of what work needs to be redone and what can be reused if the user modifies their program?
      • Also we can't store too much stuff in the incremental cache because it would take a long time to load from disk and it could take a lot of space on the user's system...
  • Compiler memory usage: while compiling a program, we don't want to use more memory than we need.
  • Program speed: how fast is your compiled program? More/better compile-time analyses often mean the compiler can do better optimizations.
  • Program size: how large is the compiled binary? Similar to the previous point.
  • Compiler compilation speed: how long does it take to compile the compiler? This impacts contributors and compiler maintenance.
  • Implementation complexity: building a compiler is one of the hardest things a person/group can do, and Rust is not a very simple language, so how do we make the compiler's code base manageable?
  • Compiler correctness: the binaries produced by the compiler should do what the input program says they do, and should continue to do so despite the tremendous amount of change constantly going on.
  • Integration: a number of other tools need to use the compiler in various ways (e.g. cargo, clippy, miri, RLS) that must be supported.
  • Compiler stability: the compiler should not crash or fail ungracefully on the stable channel.
  • Rust stability: the compiler must respect Rust's stability guarantees by not breaking programs that previously compiled despite the many changes that are always going on to its implementation.
  • Limitations of other tools: rustc uses LLVM in its backend, and LLVM has some strengths we leverage and some limitations/weaknesses we need to work around.

So, as you read through the rest of the guide, keep these things in mind. They will often inform decisions that we make.

Constant change

Keep in mind that rustc is a real production-quality product. As such, it has its fair share of codebase churn and technical debt. A lot of the designs discussed throughout this guide are idealized designs that are not fully realized yet. And things keep changing so that it is hard to keep this guide completely up to date on everything!

The compiler definitely has rough edges, but because of its design it is able to keep up with the requirements above.

Intermediate representations

As with most compilers, rustc uses some intermediate representations (IRs) to facilitate computations. In general, working directly with the source code is extremely inconvenient and error-prone. Source code is designed to be human-friendly while at the same time being unambiguous, but it's less convenient for doing something like, say, type checking.

Instead most compilers, including rustc, build some sort of IR out of the source code which is easier to analyze. rustc has a few IRs, each optimized for different purposes:

  • Token stream: the lexer produces a stream of tokens directly from the source code. This stream of tokens is easier for the parser to deal with than raw text.
  • Abstract Syntax Tree (AST): the abstract syntax tree is built from the stream of tokens produced by the lexer. It represents pretty much exactly what the user wrote. It helps to do some syntactic sanity checking (e.g. checking that a type is expected where the user wrote one).
  • High-level IR (HIR): This is a sort of desugared AST. It's still close to what the user wrote syntactically, but it includes some implicit things such as some elided lifetimes, etc. This IR is amenable to type checking.
  • Typed HIR (THIR): This is an intermediate between HIR and MIR, and used to be called High-level Abstract IR (HAIR). It is like the HIR but it is fully typed and a bit more desugared (e.g. method calls and implicit dereferences are made fully explicit). Moreover, it is easier to lower to MIR from THIR than from HIR.
  • Middle-level IR (MIR): This IR is basically a Control-Flow Graph (CFG). A CFG is a type of diagram that shows the basic blocks of a program and how control flow can go between them. Likewise, MIR also has a bunch of basic blocks with simple typed statements inside them (e.g. assignment, simple computations, etc) and control flow edges to other basic blocks (e.g., calls, dropping values). MIR is used for borrow checking and other important dataflow-based checks, such as checking for uninitialized values. It is also used for a series of optimizations and for constant evaluation (via MIRI). Because MIR is still generic, we can do a lot of analyses here more efficiently than after monomorphization.
  • LLVM IR: This is the standard form of all input to the LLVM compiler. LLVM IR is a sort of typed assembly language with lots of annotations. It's a standard format that is used by all compilers that use LLVM (e.g. the clang C compiler also outputs LLVM IR). LLVM IR is designed to be easy for other compilers to emit and also rich enough for LLVM to run a bunch of optimizations on it.

One other thing to note is that many values in the compiler are interned. This is a performance and memory optimization in which we allocate the values in a special allocator called an arena. Then, we pass around references to the values allocated in the arena. This allows us to make sure that identical values (e.g. types in your program) are only allocated once and can be compared cheaply by comparing pointers. Many of the intermediate representations are interned.
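
As a rough sketch of the idea (rustc's real interners and arenas are more sophisticated and tie allocations to the 'tcx lifetime rather than leaking them), here is a toy interner in which each distinct string is stored exactly once and equality becomes a pointer comparison:

use std::collections::HashMap;

// A toy interner: each distinct string is stored exactly once, and every
// caller gets a reference to that single copy. Leaking is only for
// simplicity here; rustc uses arenas instead.
#[derive(Default)]
struct Interner { map: HashMap<String, &'static str> }

impl Interner {
    fn intern(&mut self, s: &str) -> &'static str {
        if let Some(&interned) = self.map.get(s) {
            return interned;
        }
        let interned: &'static str = Box::leak(s.to_owned().into_boxed_str());
        self.map.insert(s.to_owned(), interned);
        interned
    }
}

fn main() {
    let mut interner = Interner::default();
    let a = interner.intern("u32");
    let b = interner.intern("u32");
    assert!(std::ptr::eq(a, b)); // same allocation: comparison is O(1)
}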

Queries

The first big implementation choice is the query system. Unlike most textbook compilers, which are organized as a series of passes over the code that execute sequentially, the Rust compiler uses a query system. The compiler does this to make incremental compilation possible -- that is, if the user makes a change to their program and recompiles, we want to do as little redundant work as possible to produce the new binary.

In rustc, all the major steps above are organized as a bunch of queries that call each other. For example, there is a query to ask for the type of something and another to ask for the optimized MIR of a function. These queries can call each other and are all tracked through the query system. The results of the queries are cached on disk so that we can tell which queries' results changed from the last compilation and only redo those. This is how incremental compilation works.

In principle, for the query-fied steps, we do each of the above for each item individually. For example, we will take the HIR for a function and use queries to ask for the LLVM IR for that HIR. This drives the generation of optimized MIR, which drives the borrow checker, which drives the generation of MIR, and so on.

... except that this is very over-simplified. In fact, some queries are not cached on disk, and some parts of the compiler have to run for all code anyway for correctness even if the code is dead code (e.g. the borrow checker). For example, currently the mir_borrowck query is first executed on all functions of a crate. Then the codegen backend invokes the collect_and_partition_mono_items query, which first recursively requests the optimized_mir for all reachable functions, which in turn runs mir_borrowck for that function and then creates codegen units. This kind of split will need to remain to ensure that unreachable functions still have their errors emitted.

Moreover, the compiler wasn't originally built to use a query system; the query system has been retrofitted into the compiler, so parts of it are not query-fied yet. Also, LLVM isn't our code, so that isn't querified either. The plan is to eventually query-fy all of the steps listed in the previous section, but as of this writing, only the steps between HIR and LLVM-IR are query-fied. That is, lexing and parsing are done all at once for the whole program.

One other thing to mention here is the all-important "typing context", TyCtxt, which is a giant struct that is at the center of all things. (Note that the name is mostly historic. This is not a "typing context" in the sense of Γ or Δ from type theory. The name is retained because that's what the name of the struct is in the source code.) All queries are defined as methods on the TyCtxt type, and the in-memory query cache is stored there too. In the code, there is usually a variable called tcx which is a handle on the typing context. You will also see lifetimes with the name 'tcx, which means that something is tied to the lifetime of the TyCtxt (usually it is stored or interned there).
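
For instance, a helper written against the typing context conventionally looks something like this (signatures simplified; consult the rustc_middle API docs for the real ones):

use rustc_hir::def_id::DefId;
use rustc_middle::ty::{Ty, TyCtxt};

// Illustrative only: the `tcx` handle is passed in, and the returned `Ty`
// is interned in the `TyCtxt`, which is why it carries the `'tcx` lifetime.
fn is_unit<'tcx>(tcx: TyCtxt<'tcx>, def_id: DefId) -> bool {
    let ty: Ty<'tcx> = tcx.type_of(def_id);
    ty.is_unit()
}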

ty::Ty

Types are really important in Rust, and they form the core of a lot of compiler analyses. The main type (in the compiler) that represents types (in the user's program) is rustc_middle::ty::Ty. This is so important that we have a whole chapter on ty::Ty, but for now, we just want to mention that it exists and is the way rustc represents types!

Also note that the rustc_middle::ty module defines the TyCtxt struct we mentioned before.

Parallelism

Compiler performance is a problem that we would like to improve on (and are always working on). One aspect of that is parallelizing rustc itself.

Currently, there is only one part of rustc that is already parallel: codegen. During monomorphization, the compiler will split up all the code to be generated into smaller chunks called codegen units. These are then generated by independent instances of LLVM. Since they are independent, we can run them in parallel. At the end, the linker is run to combine all the codegen units together into one binary.

However, the rest of the compiler is still not yet parallel. There have been lots of efforts spent on this, but it is generally a hard problem. The current approach is to turn RefCells into Mutexes -- that is, we switch to thread-safe internal mutability. However, there are ongoing challenges with lock contention, maintaining query-system invariants under concurrency, and the complexity of the code base. One can try out the current work by enabling parallel compilation in config.toml. It's still early days, but there are already some promising performance improvements.
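
If you want to experiment, the relevant knob in config.toml looks roughly like the following (the exact option name may change; check config.toml.example in the repository):

# config.toml (hypothetical excerpt)
[rust]
parallel-compiler = true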

Bootstrapping

rustc itself is written in Rust. So how do we compile the compiler? We use an older compiler to compile the newer compiler. This is called bootstrapping.

Bootstrapping has a lot of interesting implications. For example, it means that one of the major users of Rust is the Rust compiler, so we are constantly testing our own software ("eating our own dogfood").

For more details on bootstrapping, see the bootstrapping section of the guide.

Unresolved Questions

  • Does LLVM ever do optimizations in debug builds?
  • How do I explore phases of the compile process in my own sources (lexer, parser, HIR, etc)? - e.g., cargo rustc -- -Zunpretty=hir-tree allows you to view HIR representation
  • What is the main source entry point for X?
  • Where do phases diverge for cross-compilation to machine code across different platforms?

References

High-level overview of the compiler source

NOTE: The structure of the repository is going through a lot of transitions. In particular, we want to get to a point eventually where the top-level directory has separate directories for the compiler, build-system, std libs, etc, rather than one huge src/ directory.

As of this writing, the std libs have been moved to library/ and the crates that make up the rustc compiler itself have been moved to compiler/.

Now that we have seen what the compiler does, let's take a look at the structure of the contents of the rust-lang/rust repo.

Workspace structure

The rust-lang/rust repository consists of a single large cargo workspace containing the compiler, the standard libraries (core, alloc, std, proc_macro, etc), and rustdoc, along with the build system and a bunch of tools and submodules for building a full Rust distribution.

As of this writing, this structure is gradually undergoing some transformation to make it a bit less monolithic and more approachable, especially to newcomers.

The repository consists of a src directory, under which there live many crates, which are the source for the compiler, build system, tools, etc. This directory is currently being broken up to be less monolithic. There is also a library/ directory, where the standard libraries (core, alloc, std, proc_macro, etc) live.

Standard library

The standard library crates are all in library/. They have intuitive names like std, core, alloc, etc. There is also proc_macro, test, and other runtime libraries.

This code is fairly similar to most other Rust crates except that it must be built in a special way because it can use unstable features.

Compiler

You may find it helpful to read The Overview Chapter first, which gives an overview of how the compiler works. The crates mentioned in this section implement the compiler, and are underneath compiler/

The compiler/ crates all have names starting with rustc_*. These are a collection of around 50 interdependent crates ranging in size from tiny to huge. There is also the rustc crate which is the actual binary (i.e. the main function); it doesn't actually do anything besides calling the rustc_driver crate, which drives the various parts of compilation in other crates.

The dependency structure of these crates is complex, but roughly it is layered, as sketched in the "Big picture" section below.

You can see the exact dependencies by reading the Cargo.toml for the various crates, just like a normal Rust crate.

One final thing: src/llvm-project is a submodule for our fork of LLVM. During bootstrapping, LLVM is built and the src/librustc_llvm and src/rustllvm crates contain rust wrappers around LLVM (which is written in C++), so that the compiler can interface with it.

Most of this book is about the compiler, so we won't have any further explanation of these crates here.

Big picture

The dependency structure is influenced strongly by two main factors:

  1. Organization. The compiler is a huge codebase; it would be an impossibly large crate. In part, the dependency structure reflects the code structure of the compiler.
  2. Compile time. By breaking the compiler into multiple crates, we can take better advantage of incremental/parallel compilation using cargo. In particular, we try to have as few dependencies between crates as possible so that we don't have to rebuild as many crates if you change one.

At the very bottom of the dependency tree are a handful of crates that are used by the whole compiler (e.g. rustc_span). The very early parts of the compilation process (e.g. parsing and the AST) depend on only these.

Pretty soon after the AST is constructed, the compiler's query system gets set up. The query system is set up in a clever way using function pointers. This allows us to break dependencies between crates, allowing more parallel compilation.

However, since the query system is defined in rustc_middle, nearly all subsequent parts of the compiler depend on this crate. It is a really large crate, leading to long compile times. Some efforts have been made to move stuff out of it with limited success. Another unfortunate side effect is that sometimes related functionality gets scattered across different crates. For example, linting functionality is scattered across earlier parts of the crate, rustc_lint, rustc_middle, and other places.

More generally, in an ideal world, it seems like there would be fewer, more cohesive crates, with incremental and parallel compilation making sure compile times stay reasonable. However, our incremental and parallel compilation haven't gotten good enough for that yet, so breaking things into separate crates has been our solution so far.

At the top of the dependency tree are the rustc_interface and rustc_driver crates. rustc_interface is an unstable wrapper around the query system that helps to drive the various stages of compilation. Other consumers of the compiler may use this interface in different ways (e.g. rustdoc or maybe eventually rust-analyzer). The rustc_driver crate first parses command line arguments and then uses rustc_interface to drive the compilation to completion.

rustdoc

The bulk of rustdoc is in librustdoc. However, the rustdoc binary itself is src/tools/rustdoc, which does nothing except call rustdoc::main.

There is also javascript and CSS for the rustdocs in src/tools/rustdoc-js and src/tools/rustdoc-themes.

You can read more about rustdoc in this chapter.

Tests

The test suite for all of the above is in src/test/. You can read more about the test suite in this chapter.

The test harness itself is in src/tools/compiletest.

Build System

There are a number of tools in the repository just for building the compiler, standard library, rustdoc, etc, along with testing, building a full Rust distribution, etc.

One of the primary tools is src/bootstrap. You can read more about bootstrapping in this chapter. The process may also use other tools from src/tools/, such as tidy or compiletest.

Other

There are a lot of other things in the rust-lang/rust repo that are related to building a full rust distribution. Most of the time you don't need to worry about them.

These include:

  • src/ci: The CI configuration. This is actually quite extensive because we run a lot of tests on a lot of platforms.
  • src/doc: Various documentation, including submodules for a few books.
  • src/etc: Miscellaneous utilities.
  • src/tools/rustc-workspace-hack, and others: Various workarounds to make cargo work with bootstrapping.
  • And more...

Bootstrapping the Compiler

This subchapter is about the bootstrapping process.

What is bootstrapping? How does it work?

Bootstrapping is the process of using a compiler to compile itself. More accurately, it means using an older compiler to compile a newer version of the same compiler.

This raises a chicken-and-egg paradox: where did the first compiler come from? It must have been written in a different language. In Rust's case it was written in OCaml. However, that original compiler was abandoned long ago, and the only way to build a modern version of rustc is with a slightly less modern version of rustc.

This is exactly how x.py works: it downloads the current beta release of rustc, then uses it to compile the new compiler.

Stages of bootstrapping

Compiling rustc is done in stages:

  • Stage 0: the stage0 compiler is usually (you can configure x.py to use something else) the current beta rustc compiler and its associated dynamic libraries (which x.py will download for you). This stage0 compiler is then used only to compile rustbuild, std, and rustc. When compiling rustc, this stage0 compiler uses the freshly compiled std. There are two concepts at play here: a compiler (with its set of dependencies) and its 'target' or 'object' libraries (std and rustc). Both are staged, but in a staggered manner.
  • Stage 1: the code in your clone (for the new version) is then compiled with the stage0 compiler to produce the stage1 compiler. However, it was built with an older compiler (stage0), so to optimize the stage1 compiler we go to the next stage.
    • In theory, the stage1 compiler is functionally identical to the stage2 compiler, but in practice there are subtle differences. In particular, the stage1 compiler itself was built by stage0 and hence not by the source in your working directory: this means that the symbol names used in the compiler source may not match the symbol names that would have been made by the stage1 compiler. This is important when using dynamic linking and the lack of ABI compatibility between versions. This primarily manifests when tests try to link with any of the rustc_* crates or use the (now deprecated) plugin infrastructure. These tests are marked with ignore-stage1.
  • Stage 2: we rebuild our stage1 compiler with itself to produce the stage2 compiler (i.e. it builds itself) to have all the latest optimizations. (By default, we copy the stage1 libraries for use by the stage2 compiler, since they ought to be identical.)
  • (Optional) Stage 3: to sanity check our new compiler, we can build the libraries with the stage2 compiler. The result ought to be identical to before, unless something has broken.

The stage2 compiler is the one distributed with rustup and all other install methods. However, it takes a very long time to build because one must first build the new compiler with an older compiler and then use that to build the new compiler with itself. For development, you usually only want the stage1 compiler: x.py build library/std.

Default stages

x.py tries to be helpful and pick the stage you most likely meant for each subcommand. These defaults are as follows:

  • doc: --stage 0
  • build: --stage 1
  • test: --stage 1
  • dist: --stage 2
  • install: --stage 2
  • bench: --stage 2

You can always override the stage by passing --stage N explicitly.

For more information about stages, see below.

Complications of bootstrapping

Since the build system uses the current beta compiler to build the stage-1 bootstrapping compiler, the compiler source code can't use some features until they reach beta (because otherwise the beta compiler doesn't support them). On the other hand, for compiler intrinsics and internal features, the features have to be used. Additionally, the compiler makes heavy use of nightly features (#![feature(...)]). How can we resolve this problem?

There are two methods used:

  1. The build system sets --cfg bootstrap when building with stage0, so we can use cfg(not(bootstrap)) to only use features when built with stage1. This is useful for e.g. features that were just stabilized, which require #![feature(...)] when built with stage0, but not with stage1; a sketch follows this list.
  2. The build system sets RUSTC_BOOTSTRAP=1. This special variable means to break the stability guarantees of rust: Allow using #![feature(...)] with a compiler that's not nightly. This should never be used except when bootstrapping the compiler.
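
Here is what method 1 looks like in practice; the feature name below is a placeholder, not a real one:

// In a crate built by both stage0 and stage1. With the (older) stage0
// compiler the gate is still required; with stage1 the feature is
// already stable, so no gate is applied.
#![cfg_attr(bootstrap, feature(some_recently_stabilized_feature))]

Method 2 is what lets the build system compile the compiler crates, which use #![feature(...)] liberally, with the downloaded beta: it exports RUSTC_BOOTSTRAP=1 into the environment of the cargo invocations it spawns.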

Contributing to bootstrap

When you use the bootstrap system, you'll call it through x.py. However, most of the code lives in src/bootstrap. bootstrap has a difficult problem: it is written in Rust, but it is run before the Rust compiler is built! To work around this, there are two components of bootstrap: the main one written in Rust, and bootstrap.py. bootstrap.py is what gets run by x.py. It takes care of downloading the stage0 compiler, which will then build the bootstrap binary written in Rust.

Because there are two separate codebases behind x.py, they need to be kept in sync. In particular, both bootstrap.py and the bootstrap binary parse config.toml and read the same command line arguments. bootstrap.py keeps these in sync by setting various environment variables, and the programs sometimes have to add arguments that are explicitly ignored, to be read by the other.

Adding a setting to config.toml

This section is a work in progress. In the meantime, you can see an example contribution here.

Understanding stages of bootstrap

Overview

This is a detailed look into the separate bootstrap stages.

The convention x.py uses is that:

  • A --stage N flag means to run the stage N compiler (stageN/rustc).
  • A "stage N artifact" is a build artifact that is produced by the stage N compiler.
  • The "stage (N+1) compiler" is assembled from "stage N artifacts". This process is called uplifting.

Build artifacts

Anything you can build with x.py is a build artifact. Build artifacts include, but are not limited to:

  • binaries, like stage0-rustc/rustc-main
  • shared objects, like stage0-sysroot/rustlib/libstd-6fae108520cf72fe.so
  • rlib files, like stage0-sysroot/rustlib/libstd-6fae108520cf72fe.rlib
  • HTML files generated by rustdoc, like doc/std

Assembling the compiler

There is a separate step between building the compiler and making it possible to run it. This step is called assembling or uplifting the compiler. It copies all the necessary build artifacts from build/stageN-sysroot to build/stage(N+1), which allows you to use build/stage(N+1) as a toolchain with rustup toolchain link.

There is no way to trigger this step on its own, but x.py will perform it automatically any time you build with stage N+1.
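
For example, once stage1 has been assembled you can register it as a rustup toolchain and run it directly (the host triple here is hypothetical; substitute your own):

$ rustup toolchain link stage1 build/x86_64-unknown-linux-gnu/stage1
$ rustc +stage1 -vV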

Examples

  • x.py build --stage 0 means to build with the beta rustc.
  • x.py doc --stage 0 means to document using the beta rustdoc.
  • x.py test --stage 0 library/std means to run tests on the standard library without building rustc from source ('build with stage 0, then test the artifacts'). If you're working on the standard library, this is normally the test command you want.
  • x.py test src/test/ui means to build the stage 1 compiler and run compiletest on it. If you're working on the compiler, this is normally the test command you want.

Examples of what not to do

  • x.py test --stage 0 src/test/ui is not meaningful: it runs tests on the beta compiler and doesn't build rustc from source. Use test src/test/ui instead, which builds stage 1 from source.
  • x.py test --stage 0 compiler/rustc builds the compiler but runs no tests: it's running cargo test -p rustc, but cargo doesn't understand Rust's tests. You shouldn't need to use this, use test instead (without arguments).
  • x.py build --stage 0 compiler/rustc builds the compiler, but does not assemble it. Use x.py build library/std instead, which puts the compiler in stage1/rustc.

Building vs. Running

Note that build --stage N compiler/rustc does not build the stage N compiler: instead it builds the stage N+1 compiler using the stage N compiler.

In short, stage 0 uses the stage0 compiler to create stage0 artifacts which will later be uplifted to be the stage1 compiler.

In each stage, two major steps are performed:

  1. std is compiled by the stage N compiler.
  2. That std is linked to programs built by the stage N compiler, including the stage N artifacts (stage (N+1) compiler).

This is somewhat intuitive if one thinks of the stage N artifacts as "just" another program we are building with the stage N compiler: build --stage N compiler/rustc is linking the stage N artifacts to the std built by the stage N compiler.

Here is a chart of a full build using x.py:

[A diagram of the rustc compilation phases]

Keep in mind this diagram is a simplification: e.g., rustdoc can be built at different stages, the process is a bit different when passing flags such as --keep-stage, and things differ when there are non-host targets.

The stage 2 compiler is what is shipped to end-users.

Stages and std

Note that there are two std libraries in play here:

  1. The library linked to stageN/rustc, which was built by stage N-1 (stage N-1 std)
  2. The library used to compile programs with stageN/rustc, which was built by stage N (stage N std).

Stage N std is pretty much necessary for any useful work with the stage N compiler. Without it, you can only compile programs with #![no_core] -- not terribly useful!

The reason these need to be different is because they aren't necessarily ABI-compatible: there could be new layout optimizations, changes to MIR, or other changes to Rust metadata on nightly that aren't present in beta.

This is also where --keep-stage 1 library/std comes into play. Since most changes to the compiler don't actually change the ABI, once you've produced a std in stage 1, you can probably just reuse it with a different compiler. If the ABI hasn't changed, you're good to go, no need to spend time recompiling that std. --keep-stage simply assumes the previous compile is fine and copies those artifacts into the appropriate place, skipping the cargo invocation.

Cross-compiling

Building stage2 std is different depending on whether you are cross-compiling or not (see in the table how stage2 only builds non-host std targets). This is because x.py uses a trick: if HOST and TARGET are the same, it will reuse stage1 std for stage2! This is sound because stage1 std was compiled with the stage1 compiler, i.e. a compiler using the source code you currently have checked out. So it should be identical (and therefore ABI-compatible) to the std that stage2/rustc would compile.

However, when cross-compiling, stage1 std will only run on the host. So the stage2 compiler has to recompile std for the target.

Why does only libstd use cfg(bootstrap)?

The rustc generated by the stage0 compiler is linked to the freshly-built std, which means that for the most part only std needs to be cfg-gated, so that rustc can use features added to std immediately after their addition, without need for them to get into the downloaded beta.

Note this is different from any other Rust program: stage1 rustc is built by the beta compiler, but using the master version of libstd!

The only time rustc uses cfg(bootstrap) is when it adds internal lints that use diagnostic items. This happens very rarely.

What is a 'sysroot'?

When you build a project with cargo, the build artifacts for dependencies are normally stored in target/debug/deps. This only contains dependencies cargo knows about; in particular, it doesn't have the standard library. Where do std or proc_macro come from? They come from the sysroot, the root of a number of directories where the compiler loads build artifacts at runtime. The sysroot doesn't just store the standard library, though - it includes anything that needs to be loaded at runtime. That includes (but is not limited to):

  • libstd/libtest/libproc_macro
  • The compiler crates themselves, when using rustc_private. In-tree these are always present; out of tree, you need to install rustc-dev with rustup.
  • libLLVM.so, the shared object file for the LLVM project. In-tree this is either built from source or downloaded from CI; out-of-tree, you need to install llvm-tools-preview with rustup.

All the artifacts listed so far are compiler runtime dependencies. You can see them with rustc --print sysroot:

$ ls $(rustc --print sysroot)/lib
libchalk_derive-0685d79833dc9b2b.so  libstd-25c6acf8063a3802.so
libLLVM-11-rust-1.50.0-nightly.so    libtest-57470d2aa8f7aa83.so
librustc_driver-4f0cc9f50e53f0ba.so  libtracing_attributes-e4be92c35ab2a33b.so
librustc_macros-5f0ec4a119c6ac86.so  rustlib

There are also runtime dependencies for the standard library! These are in lib/rustlib, not lib/ directly.

$ ls $(rustc --print sysroot)/lib/rustlib/x86_64-unknown-linux-gnu/lib | head -n 5
libaddr2line-6c8e02b8fedc1e5f.rlib
libadler-9ef2480568df55af.rlib
liballoc-9c4002b5f79ba0e1.rlib
libcfg_if-512eb53291f6de7e.rlib
libcompiler_builtins-ef2408da76957905.rlib

rustlib includes libraries like hashbrown and cfg_if, which are not part of the public API of the standard library, but are used to implement it. rustlib is part of the search path for linkers, but lib will never be part of the search path.

Since rustlib is part of the search path, it means we have to be careful about which crates are included in it. In particular, all crates except for the standard library are built with the flag -Z force-unstable-if-unmarked, which means that you have to use #![feature(rustc_private)] in order to load it (as opposed to the standard library, which is always available).

There is more discussion about sysroots elsewhere in this guide.

Directories and artifacts generated by x.py

The following tables indicate the outputs of various stage actions:

Stage 0 Action | Output
beta extracted | build/HOST/stage0
stage0 builds bootstrap | build/bootstrap
stage0 builds test/std | build/HOST/stage0-std/TARGET
copy stage0-std (HOST only) | build/HOST/stage0-sysroot/lib/rustlib/HOST
stage0 builds rustc with stage0-sysroot | build/HOST/stage0-rustc/HOST
copy stage0-rustc (except executable) | build/HOST/stage0-sysroot/lib/rustlib/HOST
build llvm | build/HOST/llvm
stage0 builds codegen with stage0-sysroot | build/HOST/stage0-codegen/HOST
stage0 builds rustdoc, clippy, miri, with stage0-sysroot | build/HOST/stage0-tools/HOST

--stage=0 stops here.

Stage 1 Action | Output
copy (uplift) stage0-rustc executable to stage1 | build/HOST/stage1/bin
copy (uplift) stage0-codegen to stage1 | build/HOST/stage1/lib
copy (uplift) stage0-sysroot to stage1 | build/HOST/stage1/lib
stage1 builds test/std | build/HOST/stage1-std/TARGET
copy stage1-std (HOST only) | build/HOST/stage1/lib/rustlib/HOST
stage1 builds rustc | build/HOST/stage1-rustc/HOST
copy stage1-rustc (except executable) | build/HOST/stage1/lib/rustlib/HOST
stage1 builds codegen | build/HOST/stage1-codegen/HOST

--stage=1 stops here.

Stage 2 Action | Output
copy (uplift) stage1-rustc executable | build/HOST/stage2/bin
copy (uplift) stage1-sysroot | build/HOST/stage2/lib and build/HOST/stage2/lib/rustlib/HOST
stage2 builds test/std (not HOST targets) | build/HOST/stage2-std/TARGET
copy stage2-std (not HOST targets) | build/HOST/stage2/lib/rustlib/TARGET
stage2 builds rustdoc, clippy, miri | build/HOST/stage2-tools/HOST
copy rustdoc | build/HOST/stage2/bin

--stage=2 stops here.

Passing stage-specific flags to rustc

x.py allows you to pass stage-specific flags to rustc when bootstrapping. The RUSTFLAGS_BOOTSTRAP environment variable is passed as RUSTFLAGS to the bootstrap stage (stage0), and RUSTFLAGS_NOT_BOOTSTRAP is passed when building artifacts for later stages.
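
For example, to pass an extra flag only when building with the stage0 compiler, something like the following should work (-Cdebuginfo=1 is just an arbitrary example flag):

$ RUSTFLAGS_BOOTSTRAP="-Cdebuginfo=1" ./x.py build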

Environment Variables

During bootstrapping, there are a bunch of compiler-internal environment variables that are used. If you are trying to run an intermediate version of rustc, sometimes you may need to set some of these environment variables manually. Otherwise, you get an error like the following:

thread 'main' panicked at 'RUSTC_STAGE was not set: NotPresent', library/core/src/result.rs:1165:5

If ./stageN/bin/rustc gives an error about environment variables, that usually means something is quite wrong -- or you're trying to compile e.g. rustc or std or something that depends on environment variables. In the unlikely case that you actually need to invoke rustc in such a situation, you can find the environment variable values by adding the following flag to your x.py command: --on-fail=print-env.

Queries: demand-driven compilation

As described in the high-level overview of the compiler, the Rust compiler is currently transitioning from a traditional "pass-based" setup to a "demand-driven" system. The Compiler Query System is the key to our new demand-driven organization. The idea is pretty simple. You have various queries that compute things about the input – for example, there is a query called type_of(def_id) that, given the def-id of some item, will compute the type of that item and return it to you.

Query execution is memoized – so the first time you invoke a query, it will go do the computation, but the next time, the result is returned from a hashtable. Moreover, query execution fits nicely into incremental computation; the idea is roughly that, when you do a query, the result may be returned to you by loading stored data from disk (but that's a separate topic we won't discuss further here).

The overall vision is that, eventually, the entire compiler control-flow will be query driven. There will effectively be one top-level query ("compile") that will run compilation on a crate; this will in turn demand information about that crate, starting from the end. For example:

  • This "compile" query might demand to get a list of codegen-units (i.e. modules that need to be compiled by LLVM).
  • But computing the list of codegen-units would invoke some subquery that returns the list of all modules defined in the Rust source.
  • That query in turn would invoke something asking for the HIR.
  • This keeps going further and further back until we wind up doing the actual parsing.

However, that vision is not fully realized. Still, big chunks of the compiler (for example, generating MIR) work exactly like this.

Incremental Compilation in Detail

The Incremental Compilation in Detail chapter gives a more in-depth description of what queries are and how they work. If you intend to write a query of your own, this is a good read.

Invoking queries

Invoking a query is simple. The tcx ("type context") offers a method for each defined query. So, for example, to invoke the type_of query, you would just do this:

let ty = tcx.type_of(some_def_id);

How the compiler executes a query

So you may be wondering what happens when you invoke a query method. The answer is that, for each query, the compiler maintains a cache. If your query has already been executed, the answer is simple: we clone the return value out of the cache and return it (therefore, you should try to ensure that the return types of queries are cheaply cloneable; insert an Rc if necessary).

Providers

If, however, the query is not in the cache, then the compiler will try to find a suitable provider. A provider is a function that has been defined and linked into the compiler somewhere that contains the code to compute the result of the query.

Providers are defined per-crate. The compiler maintains, internally, a table of providers for every crate, at least conceptually. Right now, there are really two sets: the providers for queries about the local crate (that is, the one being compiled) and providers for queries about external crates (that is, dependencies of the local crate). Note that what determines the crate that a query is targeting is not the kind of query, but the key. For example, when you invoke tcx.type_of(def_id), that could be a local query or an external query, depending on what crate the def_id is referring to (see the self::keys::Key trait for more information on how that works).

Providers always have the same signature:

fn provider<'tcx>(
    tcx: TyCtxt<'tcx>,
    key: QUERY_KEY,
) -> QUERY_RESULT {
    ...
}

Providers take two arguments: the tcx and the query key. They return the result of the query.

How providers are set up

When the tcx is created, it is given the providers by its creator using the Providers struct. This struct is generated by the macros here, but it is basically a big list of function pointers:

struct Providers {
    type_of: for<'tcx> fn(TyCtxt<'tcx>, DefId) -> Ty<'tcx>,
    ...
}

At present, we have one copy of the struct for local crates, and one for external crates, though the plan is that we may eventually have one per crate.

These Providers structs are ultimately created and populated by rustc_driver, but it does this by distributing the work throughout the other rustc_* crates. This is done by invoking various provide functions. These functions tend to look something like this:

pub fn provide(providers: &mut Providers) {
    *providers = Providers {
        type_of,
        ..*providers
    };
}

That is, they take an &mut Providers and mutate it in place. Usually we use the formulation above just because it looks nice, but you could as well do providers.type_of = type_of, which would be equivalent. (Here, type_of would be a top-level function, defined as we saw before.) So, if we want to add a provider for some other query, let's call it fubar, into the crate above, we might modify the provide() function like so:

pub fn provide(providers: &mut Providers) {
    *providers = Providers {
        type_of,
        fubar,
        ..*providers
    };
}

fn fubar<'tcx>(tcx: TyCtxt<'tcx>, key: DefId) -> Fubar<'tcx> { ... }

N.B. Most of the rustc_* crates only provide local providers. Almost all extern providers wind up going through the rustc_metadata crate, which loads the information from the crate metadata. But in some cases there are crates that provide queries for both local and external crates, in which case they define both a provide and a provide_extern function, through provide_both, that rustc_driver can invoke.

Adding a new kind of query

So suppose you want to add a new kind of query, how do you do so? Well, defining a query takes place in two steps:

  1. first, you have to specify the query name and arguments; and then,
  2. you have to supply query providers where needed.

To specify the query name and arguments, you simply add an entry to the big macro invocation in compiler/rustc_middle/src/query/mod.rs, which looks something like:

rustc_queries! {
    Other {
        /// Records the type of every item.
        query type_of(key: DefId) -> Ty<'tcx> {
            cache { key.is_local() }
        }
    }

    ...
}

Queries are grouped into categories (Other, Codegen, TypeChecking, etc.). Each group contains one or more queries. Each query definition is broken up like this:

query type_of(key: DefId) -> Ty<'tcx> { ... }
^^    ^^^^^^^      ^^^^^     ^^^^^^^^   ^^^
|     |            |         |          |
|     |            |         |          query modifiers
|     |            |         result type of query
|     |            query key type
|     name of query
query keyword

Let's go over them one by one:

  • Query keyword: indicates a start of a query definition.
  • Name of query: the name of the query method (tcx.type_of(..)). Also used as the name of a struct (ty::queries::type_of) that will be generated to represent this query.
  • Query key type: the type of the argument to this query. This type must implement the ty::query::keys::Key trait, which defines (for example) how to map it to a crate, and so forth.
  • Result type of query: the type produced by this query. This type should (a) not use RefCell or other interior mutability and (b) be cheaply cloneable. Interning or using Rc or Arc is recommended for non-trivial data types.
    • The one exception to those rules is the ty::steal::Steal type, which is used to cheaply modify MIR in place. See the definition of Steal for more details. New uses of Steal should not be added without alerting @rust-lang/compiler.
  • Query modifiers: various flags and options that customize how the query is processed (mostly with respect to incremental compilation).

So, to add a query:

  • Add an entry to rustc_queries! using the format above.
  • Link the provider by modifying the appropriate provide method; or add a new one if needed and ensure that rustc_driver is invoking it.

Query structs and descriptions

For each kind, the rustc_queries macro will generate a "query struct" named after the query. This struct is a kind of placeholder describing the query. Each such struct implements the self::config::QueryConfig trait, which has associated types for the key/value of that particular query. Basically the code generated looks something like this:

// Dummy struct representing a particular kind of query:
pub struct type_of<'tcx> { data: PhantomData<&'tcx ()> }

impl<'tcx> QueryConfig for type_of<'tcx> {
  type Key = DefId;
  type Value = Ty<'tcx>;

  const NAME: QueryName = QueryName::type_of;
  const CATEGORY: ProfileCategory = ProfileCategory::Other;
}

There is an additional trait that you may wish to implement called self::config::QueryDescription. This trait is used during cycle errors to give a "human readable" name for the query, so that we can summarize what was happening when the cycle occurred. Implementing this trait is optional if the query key is DefId, but if you don't implement it, you get a pretty generic error ("processing foo..."). You can put new impls into the config module. They look something like this:

impl<'tcx> QueryDescription for queries::type_of<'tcx> {
    fn describe(tcx: TyCtxt, key: DefId) -> String {
        format!("computing the type of `{}`", tcx.def_path_str(key))
    }
}

Another option is to add a desc modifier:

rustc_queries! {
    Other {
        /// Records the type of every item.
        query type_of(key: DefId) -> Ty<'tcx> {
            desc { |tcx| "computing the type of `{}`", tcx.def_path_str(key) }
        }
    }
}

The rustc_queries macro will generate an appropriate impl automatically.

The Query Evaluation Model in Detail

This chapter provides a deeper dive into the abstract model queries are built on. It does not go into implementation details but tries to explain the underlying logic. The examples here, therefore, have been stripped down and simplified and don't directly reflect the compiler's internal APIs.

What is a query?

Abstractly we view the compiler's knowledge about a given crate as a "database" and queries are the way of asking the compiler questions about it, i.e. we "query" the compiler's "database" for facts.

However, there's something special to this compiler database: It starts out empty and is filled on-demand when queries are executed. Consequently, a query must know how to compute its result if the database does not contain it yet. For doing so, it can access other queries and certain input values that the database is pre-filled with on creation.

A query thus consists of the following things:

  • A name that identifies the query
  • A "key" that specifies what we want to look up
  • A result type that specifies what kind of result it yields
  • A "provider" which is a function that specifies how the result is to be computed if it isn't already present in the database.

As an example, the name of the type_of query is type_of, its query key is a DefId identifying the item we want to know the type of, the result type is Ty<'tcx>, and the provider is a function that, given the query key and access to the rest of the database, can compute the type of the item identified by the key.

So in some sense a query is just a function that maps the query key to the corresponding result. However, we have to apply some restrictions in order for this to be sound:

  • The key and result must be immutable values.
  • The provider function must be a pure function, that is, for the same key it must always yield the same result.
  • The only parameters a provider function takes are the key and a reference to the "query context" (which provides access to the rest of the "database").

The database is built up lazily by invoking queries. The query providers will invoke other queries, for which the result is either already cached or computed by calling another query provider. These query provider invocations conceptually form a directed acyclic graph (DAG) at the leaves of which are input values that are already known when the query context is created.

Caching/Memoization

Results of query invocations are "memoized" which means that the query context will cache the result in an internal table and, when the query is invoked with the same query key again, will return the result from the cache instead of running the provider again.

This caching is crucial for making the query engine efficient. Without memoization the system would still be sound (that is, it would yield the same results) but the same computations would be done over and over again.

Memoization is one of the main reasons why query providers have to be pure functions. If calling a provider function could yield different results for each invocation (because it accesses some global mutable state) then we could not memoize the result.
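
To make the mechanism concrete, here is a toy memoization table in the spirit of the query cache (invented types, not the compiler's actual data structures):

use std::collections::HashMap;
use std::hash::Hash;

// Toy illustration of query memoization: each key's provider runs at
// most once; later invocations return the cached (cloned) result.
struct QueryCache<K, V> { cache: HashMap<K, V> }

impl<K: Hash + Eq + Clone, V: Clone> QueryCache<K, V> {
    fn get_or_compute(&mut self, key: K, provider: impl FnOnce(&K) -> V) -> V {
        if let Some(v) = self.cache.get(&key) {
            return v.clone(); // cache hit: the provider is not re-run
        }
        let v = provider(&key);
        self.cache.insert(key, v.clone());
        v
    }
}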

Input data

When the query context is created, it is still empty: No queries have been executed, no results are cached. But the context already provides access to "input" data, i.e. pieces of immutable data that were computed before the context was created and that queries can access to do their computations. Currently this input data consists mainly of the HIR map, upstream crate metadata, and the command-line options the compiler was invoked with. In the future, inputs will just consist of command-line options and a list of source files -- the HIR map will itself be provided by a query which processes these source files.

Without inputs, queries would live in a void without anything to compute their result from (remember, query providers only have access to other queries and the context but not any other outside state or information).

For a query provider, input data and results of other queries look exactly the same: It just tells the context "give me the value of X". Because input data is immutable, the provider can rely on it being the same across different query invocations, just as is the case for query results.

An example execution trace of some queries

How does this DAG of query invocations come into existence? At some point the compiler driver will create the, as yet empty, query context. It will then, from outside of the query system, invoke the queries it needs to perform its task. This looks something like the following:

fn compile_crate() {
    let cli_options = ...;
    let hir_map = ...;

    // Create the query context `tcx`
    let tcx = TyCtxt::new(cli_options, hir_map);

    // Do type checking by invoking the type check query
    tcx.type_check_crate();
}

The type_check_crate query provider would look something like the following:

fn type_check_crate_provider(tcx, _key: ()) {
    let list_of_hir_items = tcx.hir_map.list_of_items();

    for item_def_id in list_of_hir_items {
        tcx.type_check_item(item_def_id);
    }
}

We see that the type_check_crate query accesses input data (tcx.hir_map.list_of_items()) and invokes other queries (type_check_item). The type_check_item invocations will themselves access input data and/or invoke other queries, so that in the end the DAG of query invocations will be built up backwards from the node that was initially executed:

         (2)                                                 (1)
  list_of_all_hir_items <----------------------------- type_check_crate()
                                                               |
    (5)             (4)                  (3)                   |
  Hir(foo) <--- type_of(foo) <--- type_check_item(foo) <-------+
                                      |                        |
                    +-----------------+                        |
                    |                                          |
    (7)             v  (6)                  (8)                |
  Hir(bar) <--- type_of(bar) <--- type_check_item(bar) <-------+

// (x) denotes invocation order

We also see that often a query result can be read from the cache: type_of(bar) was computed for type_check_item(foo) so when type_check_item(bar) needs it, it is already in the cache.

Query results stay cached in the query context as long as the context lives. So if the compiler driver invoked another query later on, the above graph would still exist and already executed queries would not have to be re-done.

Cycles

Earlier we stated that query invocations form a DAG. However, it would be easy to form a cyclic graph by, for example, having a query provider like the following:

fn cyclic_query_provider(tcx, key) -> u32 {
  // Invoke the same query with the same key again
  tcx.cyclic_query(key)
}

Since query providers are regular functions, this would behave much as expected: Evaluation would get stuck in an infinite recursion. A query like this would not be very useful either. However, sometimes certain kinds of invalid user input can result in queries being called in a cyclic way. The query engine includes a check for cyclic invocations and, because cycles are an irrecoverable error, will abort execution with a "cycle error" message that tries to be human readable.

At some point the compiler had a notion of "cycle recovery", that is, one could "try" to execute a query and if it ended up causing a cycle, proceed in some other fashion. However, this was later removed because it is not entirely clear what the theoretical consequences of this are, especially regarding incremental compilation.

"Steal" Queries

Some queries have their result wrapped in a Steal<T> struct. These queries behave exactly the same as regular queries, with one exception: Their result is expected to be "stolen" out of the cache at some point, meaning some other part of the program is taking ownership of it and the result cannot be accessed anymore.

This stealing mechanism exists purely as a performance optimization because some result values are too costly to clone (e.g. the MIR of a function). It seems like result stealing would violate the condition that query results must be immutable (after all we are moving the result value out of the cache) but it is OK as long as the mutation is not observable. This is achieved by two things:

  • Before a result is stolen, we make sure to eagerly run all queries that might ever need to read that result. This has to be done manually by calling those queries.
  • Whenever a query tries to access a stolen result, we make the compiler ICE so that such a condition cannot go unnoticed.

This is not an ideal setup because of the manual intervention needed, so it should be used sparingly and only when it is well known which queries might access a given result. In practice, however, stealing has not turned out to be much of a maintenance burden.

To summarize: "Steal queries" break some of the rules in a controlled way. There are checks in place that make sure that nothing can go silently wrong.
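As a sketch, the wrapper could look something like the following. This is a simplified stand-in for the real type (which lives in rustc_data_structures and uses a synchronized lock rather than RefCell); the details here are illustrative:

use std::cell::RefCell;

pub struct Steal<T> {
    value: RefCell<Option<T>>,
}

impl<T> Steal<T> {
    pub fn new(value: T) -> Self {
        Steal { value: RefCell::new(Some(value)) }
    }

    // Read access; panics (an ICE, in compiler terms) if the value has
    // already been stolen.
    pub fn borrow(&self) -> std::cell::Ref<'_, T> {
        std::cell::Ref::map(self.value.borrow(), |opt| {
            opt.as_ref().expect("attempted to read from a stolen value")
        })
    }

    // Take ownership of the value, leaving `None` behind so that any later
    // read fails loudly instead of silently observing the mutation.
    pub fn steal(&self) -> T {
        self.value.borrow_mut().take().expect("value already stolen")
    }
}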

Parallel Query Execution

The query model has some properties that make it actually feasible to evaluate multiple queries in parallel without too much effort:

  • All data a query provider can access is accessed via the query context, so the query context can take care of synchronizing access.
  • Query results are required to be immutable so they can safely be used by different threads concurrently.

The nightly compiler already implements parallel query evaluation as follows (a pseudocode sketch of this protocol appears after the list):

When a query foo is evaluated, the cache table for foo is locked.

  • If there already is a result, we can clone it, release the lock and we are done.
  • If there is no cache entry and no other active query invocation computing the same result, we mark the key as being "in progress", release the lock and start evaluating.
  • If there is another query invocation for the same key in progress, we release the lock, and just block the thread until the other invocation has computed the result we are waiting for. This cannot deadlock because, as mentioned before, query invocations form a DAG. Some thread will always make progress.
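In pseudocode, following the style of the provider examples above (all helper names here are hypothetical), the protocol looks roughly like this:

fn evaluate_query(tcx, key) -> QueryResult {
    // Lock the cache table for this query.
    let mut cache = tcx.query_cache.lock();

    match cache.get(&key) {
        Completed(result) => {
            // Someone already computed the result: clone it and we are done.
            return result.clone();
        }
        InProgress(waiters) => {
            // Another thread is computing this result right now. Block until
            // it is finished. This cannot deadlock because query invocations
            // form a DAG, so some thread always makes progress.
            let waiter = waiters.register_current_thread();
            drop(cache); // release the lock before blocking
            return waiter.wait_for_result();
        }
        NotStarted => {
            // Mark the key as "in progress" and release the lock.
            cache.mark_in_progress(key);
        }
    }
    drop(cache);

    // Evaluate the provider without holding the lock.
    let result = run_provider(tcx, key);

    // Store the result and wake any threads blocked on this key.
    tcx.query_cache.lock().complete(key, result.clone());
    result
}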

Incremental compilation

The incremental compilation scheme is, in essence, a surprisingly simple extension to the overall query system. We'll start by describing a slightly simplified variant of the real thing – the "basic algorithm" – and then describe some possible improvements.

The basic algorithm

The basic algorithm is called the red-green algorithm [1]. The high-level idea is that, after each run of the compiler, we will save the results of all the queries that we do, as well as the query DAG. The query DAG is a DAG that indexes which queries executed which other queries. So, for example, there would be an edge from a query Q1 to another query Q2 if computing Q1 required computing Q2 (note that because queries cannot depend on themselves, this results in a DAG and not a general graph).

On the next run of the compiler, then, we can sometimes reuse these query results to avoid re-executing a query. We do this by assigning every query a color:

  • If a query is colored red, that means that its result during this compilation has changed from the previous compilation.
  • If a query is colored green, that means that its result is the same as the previous compilation.

There are two key insights here:

  • First, if all the inputs to query Q are colored green, then the query Q must result in the same value as last time and hence need not be re-executed (or else the compiler is not deterministic).
  • Second, even if some inputs to a query change, it may be that it still produces the same result as the previous compilation. In particular, the query may only use part of its input.
    • Therefore, after executing a query, we always check whether it produced the same result as the previous time. If it did, we can still mark the query as green, and hence avoid re-executing dependent queries.

The try-mark-green algorithm

At the core of incremental compilation is an algorithm called "try-mark-green". It has the job of determining the color of a given query Q (which must not have yet been executed). In cases where Q has red inputs, determining Q's color may involve re-executing Q so that we can compare its output, but if all of Q's inputs are green, then we can conclude that Q must be green without re-executing it or inspecting its value at all. In the compiler, this allows us to avoid deserializing the result from disk when we don't need it, and in fact enables us to sometimes skip serializing the result as well (see the refinements section below).

Try-mark-green works as follows:

  • First check if the query Q was executed during the previous compilation.
    • If not, we can just re-execute the query as normal, and assign it the color of red.
  • If yes, then load the reads(Q) vector from the query DAG. The "reads" is the set of queries that Q executed during its execution.
    • For each query R in reads(Q), we recursively demand the color of R using try-mark-green.
      • Note: it is important that we visit each node in reads(Q) in the same order as they occurred in the original compilation. See the section on the query DAG below.
      • If any of the nodes in reads(Q) wind up colored red, then Q is dirty.
        • We re-execute Q and compare the hash of its result to the hash of the result from the previous compilation.
        • If the hash has not changed, we can mark Q as green and return.
      • Otherwise, if all of the nodes in reads(Q) end up green, we can color Q as green and return.

The query DAG

The query DAG code is stored in compiler/rustc_middle/src/dep_graph. Construction of the DAG is done by instrumenting the query execution.

One key point is that the query DAG also tracks ordering; that is, for each query Q, we not only track the queries that Q reads, we track the order in which they were read. This allows try-mark-green to walk those queries back in the same order. This is important because once a subquery comes back as red, we can no longer be sure that Q will continue along the same path as before. That is, imagine a query like this:

fn main_query(tcx) {
    if tcx.subquery1() {
        tcx.subquery2()
    } else {
        tcx.subquery3()
    }
}

Now imagine that in the first compilation, main_query starts by executing subquery1, and this returns true. In that case, the next query main_query executes will be subquery2, and subquery3 will not be executed at all.

But now imagine that in the next compilation, the input has changed such that subquery1 returns false. In this case, subquery2 would never execute. If try-mark-green were to visit reads(main_query) out of order, however, it might visit subquery2 before subquery1, and hence execute it. This can lead to ICEs and other problems in the compiler.

Improvements to the basic algorithm

In the description of the basic algorithm, we said that at the end of compilation we would save the results of all the queries that were performed. In practice, this can be quite wasteful – many of those results are very cheap to recompute, and serializing and deserializing them is not a particular win. In practice, what we would do is to save the hashes of all the subqueries that we performed. Then, in select cases, we also save the results.

This is why the incremental algorithm separates computing the color of a node, which often does not require its value, from computing the result of a node. Computing the result is done via a simple algorithm like so (a pseudocode sketch follows the list):

  • Check if a saved result for Q is available. If so, compute the color of Q. If Q is green, deserialize and return the saved result.
  • Otherwise, execute Q.
    • We can then compare the hash of the result and color Q as green if it did not change.
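In pseudocode (with hypothetical helper names, in the style of the examples in this guide), that algorithm might look like:

fn get_query_result(tcx, q) -> QueryResult {
    if tcx.on_disk_cache.has_saved_result(q) && try_mark_green(tcx, q) {
        // Q is green, so the saved result is known to be up to date.
        return tcx.on_disk_cache.load_result(q);
    }

    // Otherwise, execute Q. Comparing the hash of the new result against
    // the saved hash colors Q green if nothing changed, red otherwise.
    let result = tcx.run_query_for(q);
    result
}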

Resources

The initial design document can be found here, which expands on the memoization details, provides more high-level overview and motivation for this system.

Footnotes

[1] I have long wanted to rename it to the Salsa algorithm, but it never caught on. -@nikomatsakis

Incremental Compilation In Detail

The incremental compilation scheme is, in essence, a surprisingly simple extension to the overall query system. It relies on the fact that:

  1. queries are pure functions -- given the same inputs, a query will always yield the same result, and
  2. the query model structures compilation in an acyclic graph that makes dependencies between individual computations explicit.

This chapter will explain how we can use these properties for making things incremental and then goes on to discuss various implementation issues.

A Basic Algorithm For Incremental Query Evaluation

As explained in the query evaluation model primer, query invocations form a directed-acyclic graph. Here's the example from the previous chapter again:

  list_of_all_hir_items <----------------------------- type_check_crate()
                                                               |
                                                               |
  Hir(foo) <--- type_of(foo) <--- type_check_item(foo) <-------+
                                      |                        |
                    +-----------------+                        |
                    |                                          |
                    v                                          |
  Hir(bar) <--- type_of(bar) <--- type_check_item(bar) <-------+

Since every access from one query to another has to go through the query context, we can record these accesses and thus actually build this dependency graph in memory. With dependency tracking enabled, when compilation is done, we know which queries were invoked (the nodes of the graph) and for each invocation, which other queries or input has gone into computing the query's result (the edges of the graph).
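A sketch of how this recording could work (the names here are made up; the real implementation lives in the dep_graph module):

fn with_dependency_tracking(tcx, node, provider) -> QueryResult {
    // Remember which query is currently executing.
    let parent_task = tcx.dep_graph.start_task(node);

    // Every read the provider performs goes through the query context,
    // which records an edge from `node` to the thing being read.
    let result = provider(tcx);

    tcx.dep_graph.finish_task(node, parent_task);
    result
}

fn on_query_read(tcx, dependency) {
    // Called for every query/input access: record the edge
    // "current task reads `dependency`" in the in-memory graph.
    let current = tcx.dep_graph.current_task();
    tcx.dep_graph.record_edge(current, dependency);
}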

Now suppose we change the source code of our program so that HIR of bar looks different than before. Our goal is to only recompute those queries that are actually affected by the change while re-using the cached results of all the other queries. Given the dependency graph, we can do exactly that: for a given query invocation, the graph tells us exactly what data has gone into computing its result, so we just have to follow the edges until we reach something that has changed. If we don't encounter anything that has changed, we know that the query still would evaluate to the same result we already have in our cache.

Taking the type_of(foo) invocation from above as an example, we can check whether the cached result is still valid by following the edges to its inputs. The only edge leads to Hir(foo), an input that has not been affected by the change. So we know that the cached result for type_of(foo) is still valid.

The story is a bit different for type_check_item(foo): We again walk the edges and already know that type_of(foo) is fine. Then we get to type_of(bar) which we have not checked yet, so we walk the edges of type_of(bar) and encounter Hir(bar) which has changed. Consequently the result of type_of(bar) might yield a different result than what we have in the cache and, transitively, the result of type_check_item(foo) might have changed too. We thus re-run type_check_item(foo), which in turn will re-run type_of(bar), which will yield an up-to-date result because it reads the up-to-date version of Hir(bar).

The Problem With The Basic Algorithm: False Positives

If you read the previous paragraph carefully you'll notice that it says that type_of(bar) might have changed because one of its inputs has changed. There's also the possibility that it might still yield exactly the same result even though its input has changed. Consider an example with a simple query that just computes the sign of an integer:

  IntValue(x) <---- sign_of(x) <--- some_other_query(x)

Let's say that IntValue(x) starts out as 1000 and then is set to 2000. Even though IntValue(x) is different in the two cases, sign_of(x) yields the result + in both cases.

If we follow the basic algorithm, however, some_other_query(x) would have to (unnecessarily) be re-evaluated because it transitively depends on a changed input. Change detection yields a "false positive" in this case because it has to conservatively assume that some_other_query(x) might be affected by that changed input.
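Written as a provider in the style used earlier (everything here is hypothetical), the query would look like this:

fn sign_of_provider(tcx, x) -> Sign {
    // The result depends only on the sign of the input, so most changes
    // to IntValue(x) leave the result unchanged.
    if tcx.int_value(x) >= 0 { Sign::Plus } else { Sign::Minus }
}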

Unfortunately it turns out that the actual queries in the compiler are full of examples like this, and small changes to the input can potentially affect very large parts of the output binaries. As a consequence, we had to make the change detection system smarter and more accurate.

Improving Accuracy: The red-green Algorithm

The "false positives" problem can be solved by interleaving change detection and query re-evaluation. Instead of walking the graph all the way to the inputs when trying to find out if some cached result is still valid, we can check if a result has actually changed after we were forced to re-evaluate it.

We call this algorithm the red-green algorithm because nodes in the dependency graph are assigned the color green if we were able to prove that their cached result is still valid, and the color red if the result has turned out to be different after re-evaluating it.

The meat of red-green change tracking is implemented in the try-mark-green algorithm, that, you've guessed it, tries to mark a given node as green:

fn try_mark_green(tcx, current_node) -> bool {

    // Fetch the inputs to `current_node`, i.e. get the nodes that the direct
    // edges from `current_node` lead to.
    let dependencies = tcx.dep_graph.get_dependencies_of(current_node);

    // Now check all the inputs for changes
    for dependency in dependencies {

        match tcx.dep_graph.get_node_color(dependency) {
            Green => {
                // This input has already been checked before and it has not
                // changed; so we can go on to check the next one
            }
            Red => {
                // We found an input that has changed. We cannot mark
                // `current_node` as green without re-running the
                // corresponding query.
                return false
            }
            Unknown => {
                // This is the first time we look at this node. Let's try
                // to mark it green by calling try_mark_green() recursively.
                if try_mark_green(tcx, dependency) {
                    // We successfully marked the input as green, on to the
                    // next.
                } else {
                    // We could *not* mark the input as green. This means we
                    // don't know if its value has changed. In order to find
                    // out, we re-run the corresponding query now!
                    tcx.run_query_for(dependency);

                    // Fetch and check the node color again. Running the query
                    // has forced it to either red (if it yielded a different
                    // result than we have in the cache) or green (if it
                    // yielded the same result).
                    match tcx.dep_graph.get_node_color(dependency) {
                        Red => {
                            // The input turned out to be red, so we cannot
                            // mark `current_node` as green.
                            return false
                        }
                        Green => {
                            // Re-running the query paid off! The result is the
                            // same as before, so this particular input does
                            // not invalidate `current_node`.
                        }
                        Unknown => {
                            // There is no way a node has no color after
                            // re-running the query.
                            panic!("unreachable")
                        }
                    }
                }
            }
        }
    }

    // If we have gotten through the entire loop, it means that all inputs
    // have turned out to be green. If all inputs are unchanged, it means
    // that the query result corresponding to `current_node` cannot have
    // changed either.
    tcx.dep_graph.mark_green(current_node);

    true
}

// Note: The actual implementation can be found in
//       compiler/rustc_middle/src/dep_graph/graph.rs

By using red-green marking we can avoid the devastating cumulative effect of having false positives during change detection. Whenever a query is executed in incremental mode, we first check if it's already green. If not, we run try_mark_green() on it. If it still isn't green after that, then we actually invoke the query provider to re-compute the result.
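Putting the pieces together, the evaluation of a query in incremental mode can be sketched like this (re-using the hypothetical helpers from the pseudocode above):

fn execute_query_incremental(tcx, node) -> QueryResult {
    // 1. Already known to be green? Then the cached result is valid.
    if tcx.dep_graph.get_node_color(node) == Green {
        return tcx.load_cached_result(node);
    }

    // 2. Try to prove the node green by checking its dependencies.
    if try_mark_green(tcx, node) {
        return tcx.load_cached_result(node);
    }

    // 3. Otherwise, actually invoke the query provider.
    tcx.run_query_for(node)
}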

The Real World: How Persistence Makes Everything Complicated

The sections above describe the underlying algorithm for incremental compilation. However, since the compiler process exits after it has finished compiling, taking the query context and its result cache with it into oblivion, we have to persist data to disk so that the next compilation session can make use of it. This comes with a whole new set of implementation challenges:

  • Query results are stored to disk, so they are not readily available for change comparison.
  • A subsequent compilation session will start off with new version of the code that has arbitrary changes applied to it. All kinds of IDs and indices that are generated from a global, sequential counter (e.g. NodeId, DefId, etc) might have shifted, making the persisted results on disk not immediately usable anymore because the same numeric IDs and indices might refer to completely new things in the new compilation session.
  • Persisting things to disk comes at a cost, so not every tiny piece of information should be actually cached in between compilation sessions. Fixed-size, plain-old-data is preferred to complex things that need to run through an expensive (de-)serialization step.

The following sections describe how the compiler currently solves these issues.

A Question Of Stability: Bridging The Gap Between Compilation Sessions

As noted before, various IDs (like DefId) are generated by the compiler in a way that depends on the contents of the source code being compiled. ID assignment is usually deterministic, that is, if the exact same code is compiled twice, the same things will end up with the same IDs. However, if something changes, e.g. a function is added in the middle of a file, there is no guarantee that anything will have the same ID as it had before.

As a consequence we cannot represent the data in our on-disk cache the same way it is represented in memory. For example, if we just stored a piece of type information like TyKind::FnDef(DefId, &'tcx Substs<'tcx>) (as we do in memory) and then the contained DefId points to a different function in a new compilation session we'd be in trouble.

The solution to this problem is to find "stable" forms for IDs which remain valid in between compilation sessions. For the most important case, DefIds, these are the so-called DefPaths. Each DefId has a corresponding DefPath but in place of a numeric ID, a DefPath is based on the path to the identified item, e.g. std::collections::HashMap. The advantage of an ID like this is that it is not affected by unrelated changes. For example, one can add a new function to std::collections but std::collections::HashMap would still be std::collections::HashMap. A DefPath is "stable" across changes made to the source code while a DefId isn't.

There is also the DefPathHash which is just a 128-bit hash value of the DefPath. The two contain the same information and we mostly use the DefPathHash because it is simpler to handle, being Copy and self-contained.

This principle of stable identifiers is used to make the data in the on-disk cache resilient to source code changes. Instead of storing a DefId, we store the DefPathHash and when we deserialize something from the cache, we map the DefPathHash to the corresponding DefId in the current compilation session (which is just a simple hash table lookup).
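Conceptually, the translation at the cache boundary looks like the following sketch (TyCtxt does expose the DefPathHash of a DefId, but treat the exact function names here as approximations):

fn encode_def_id(tcx, def_id: DefId) -> DefPathHash {
    // The stable form is what gets written to the on-disk cache.
    tcx.def_path_hash(def_id)
}

fn decode_def_id(tcx, hash: DefPathHash) -> DefId {
    // A simple hash table lookup in the current compilation session.
    tcx.def_path_hash_to_def_id(hash)
}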

The HirId, used for identifying HIR components that don't have their own DefId, is another such stable ID. It is (conceptually) a pair of a DefPath and a LocalId, where the LocalId identifies something (e.g. a hir::Expr) locally within its "owner" (e.g. a hir::Item). If the owner is moved around, the LocalIds within it are still the same.

Checking Query Results For Changes: HashStable And Fingerprints

In order to do red-green-marking we often need to check if the result of a query has changed compared to the result it had during the previous compilation session. There are two performance problems with this though:

  • We'd like to avoid having to load the previous result from disk just for doing the comparison. We already computed the new result and will use that. Also loading a result from disk will "pollute" the interners with data that is unlikely to ever be used.
  • We don't want to store each and every result in the on-disk cache. For example, it would be wasted effort to persist things to disk that are already available in upstream crates.

The compiler avoids these problems by using so-called Fingerprints. Each time a new query result is computed, the query engine will compute a 128 bit hash value of the result. We call this hash value "the Fingerprint of the query result". The hashing is (and has to be) done "in a stable way". This means that whenever something is hashed that might change in between compilation sessions (e.g. a DefId), we instead hash its stable equivalent (e.g. the corresponding DefPath). That's what the whole HashStable infrastructure is for. This way Fingerprints computed in two different compilation sessions are still comparable.
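The idea can be illustrated as follows: when a DefId is hashed stably, what actually goes into the hasher is its DefPathHash. This is a simplified rendition of the real trait implementation, not the exact source:

impl HashStable<StableHashingContext<'_>> for DefId {
    fn hash_stable(&self, hcx: &mut StableHashingContext<'_>, hasher: &mut StableHasher) {
        // Hash the stable DefPathHash instead of the session-dependent
        // crate and definition indices.
        hcx.def_path_hash(*self).hash_stable(hcx, hasher);
    }
}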

The next step is to store these fingerprints along with the dependency graph. This is cheap since fingerprints are just bytes to be copied. It's also cheap to load the entire set of fingerprints together with the dependency graph.

Now, when red-green-marking reaches the point where it needs to check if a result has changed, it can just compare the (already loaded) previous fingerprint to the fingerprint of the new result.

This approach works rather well but it's not without flaws:

  • There is a small possibility of hash collisions. That is, two different results could have the same fingerprint and the system would erroneously assume that the result hasn't changed, leading to a missed update.

    We mitigate this risk by using a high-quality hash function and a 128 bit wide hash value. Due to these measures the practical risk of a hash collision is negligible.

  • Computing fingerprints is quite costly. It is the main reason why incremental compilation can be slower than non-incremental compilation. We are forced to use a good and thus expensive hash function, and we have to map things to their stable equivalents while doing the hashing.

A Tale Of Two DepGraphs: The Old And The New

The initial description of dependency tracking glosses over a few details that quickly become a head scratcher when actually trying to implement things. In particular it's easy to overlook that we are actually dealing with two dependency graphs: The one we built during the previous compilation session and the one that we are building for the current compilation session.

When a compilation session starts, the compiler loads the previous dependency graph into memory as an immutable piece of data. Then, when a query is invoked, it will first try to mark the corresponding node in the graph as green. More precisely, we try to mark as green the node in the previous dep-graph that corresponds to the query key in the current session. How do we do this mapping between current query key and previous DepNode? The answer is again Fingerprints: Nodes in the dependency graph are identified by a fingerprint of the query key. Since fingerprints are stable across compilation sessions, computing one in the current session allows us to find a node in the dependency graph from the previous session. If we don't find a node with the given fingerprint, it means that the query key refers to something that did not yet exist in the previous session.

So, having found the dep-node in the previous dependency graph, we can look up its dependencies (i.e. also dep-nodes in the previous graph) and continue with the rest of the try-mark-green algorithm. The next interesting thing happens when we successfully mark the node as green. At that point we copy the node and the edges to its dependencies from the old graph into the new graph. We have to do this because the new dep-graph cannot acquire the node and edges via the regular dependency tracking. The tracking system can only record edges while actually running a query -- but running the query, although we have the result already cached, is exactly what we want to avoid.

Once the compilation session has finished, all the unchanged parts have been copied over from the old into the new dependency graph, while the changed parts have been added to the new graph by the tracking system. At this point, the new graph is serialized out to disk, alongside the query result cache, and can act as the previous dep-graph in a subsequent compilation session.

Didn't You Forget Something?: Cache Promotion

The system described so far has a somewhat subtle property: If all inputs of a dep-node are green then the dep-node itself can be marked as green without computing or loading the corresponding query result. Applying this property transitively often leads to the situation that some intermediate results are never actually loaded from disk, as in the following example:

   input(A) <-- intermediate_query(B) <-- leaf_query(C)

The compiler might need the value of leaf_query(C) in order to generate some output artifact. If it can mark leaf_query(C) as green, it will load the result from the on-disk cache. The result of intermediate_query(B) is never loaded though. As a consequence, when the compiler persists the new result cache by writing all in-memory query results to disk, intermediate_query(B) will not be in memory and thus will be missing from the new result cache.

If there subsequently is another compilation session that actually needs the result of intermediate_query(B) it will have to be re-computed even though we had a perfectly valid result for it in the cache just before.

In order to prevent this from happening, the compiler does something called "cache promotion": Before emitting the new result cache it will walk all green dep-nodes and make sure that their query result is loaded into memory. That way the result cache doesn't unnecessarily shrink again.
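In pseudocode (hypothetical helper names again), cache promotion amounts to something like:

fn promote_cached_results(tcx) {
    // Runs just before the new result cache is serialized.
    for node in tcx.dep_graph.previous_session_nodes() {
        if tcx.dep_graph.get_node_color(node) == Green {
            // Loading the result pulls it into the in-memory cache, so it
            // will be written out again as part of the new result cache.
            tcx.load_cached_result(node);
        }
    }
}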

Incremental Compilation and the Compiler Backend

The compiler backend, the part involving LLVM, uses the query system but is not itself implemented in terms of queries. As a consequence it does not automatically partake in dependency tracking. However, the manual integration with the tracking system is pretty straightforward. The compiler simply tracks what queries get invoked when generating the initial LLVM version of each codegen unit, which results in a dep-node for each of them. In subsequent compilation sessions it then tries to mark the dep-node for a CGU as green. If it succeeds it knows that the corresponding object and bitcode files on disk are still valid. If it doesn't succeed, the entire codegen unit has to be recompiled.

This is the same approach that is used for regular queries. The main differences are:

  • that we cannot easily compute a fingerprint for LLVM modules (because they are opaque C++ objects),

  • that the logic for dealing with cached values is rather different from regular queries because here we have bitcode and object files instead of serialized Rust values in the common result cache file, and

  • that the operations around LLVM are so expensive in terms of computation time and memory consumption that we need tight control over what is executed when, and what stays in memory for how long.

The query system could probably be extended with general purpose mechanisms to deal with all of the above but so far that seemed like more trouble than it would save.

Query Modifiers

The query system allows for applying modifiers to queries. These modifiers affect certain aspects of how the system treats the query with respect to incremental compilation (a sketch of how they might appear in query definitions follows this list):

  • eval_always - A query with the eval_always attribute is re-executed unconditionally during incremental compilation. I.e. the system will not even try to mark the query's dep-node as green. This attribute has two use cases:

    • eval_always queries can read inputs (from files, global state, etc). They can also produce side effects like writing to files and changing global state.

    • Some queries are very likely to be re-evaluated because their result depends on the entire source code. In this case eval_always can be used as an optimization because the system can skip recording dependencies in the first place.

  • no_hash - Applying no_hash to a query tells the system to not compute the fingerprint of the query's result. This has two consequences:

    • Not computing the fingerprint can save quite a bit of time because fingerprinting is expensive, especially for large, complex values.

    • Without the fingerprint, the system has to unconditionally assume that the result of the query has changed. As a consequence anything depending on a no_hash query will always be re-executed.

  • cache_on_disk_if - This attribute is what determines which query results are persisted in the incremental compilation query result cache. The attribute takes an expression that allows per query invocation decisions. For example, it makes no sense to store values from upstream crates in the cache because they are already available in the upstream crate's metadata.

  • anon - This attribute makes the system use "anonymous" dep-nodes for the given query. An anonymous dep-node is not identified by the corresponding query key, instead its ID is computed from the IDs of its dependencies. This allows the red-green system to do its change detection even if there is no query key available for a given dep-node -- something which is needed for handling trait selection because it is not based on queries.
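To give a feel for how these modifiers are used, here is a sketch loosely modeled on rustc's query definition macro; the concrete queries and bodies are illustrative rather than copied from the actual source:

rustc_queries! {
    // Depends on the entire crate and would be expensive to fingerprint:
    // always re-execute and never hash the result.
    query hir_crate(_: ()) -> &'tcx Crate<'tcx> {
        eval_always
        no_hash
    }

    // Only cache results for local definitions; upstream crates already
    // provide this information in their metadata.
    query type_of(key: DefId) -> Ty<'tcx> {
        cache_on_disk_if { key.is_local() }
    }
}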

The Projection Query Pattern

It's interesting to note that eval_always and no_hash can be used together in the so-called "projection query" pattern. It is often the case that there is one query that depends on the entirety of the compiler's input (e.g. the indexed HIR) and another query that projects individual values out of this monolithic value (e.g. a HIR item with a certain DefId). These projection queries allow for building change propagation "firewalls" because even if the result of the monolithic query changes (which it is very likely to do) the small projections can still mostly be marked as green.

  +------------+
  |            |           +---------------+           +--------+
  |            | <---------| projection(x) | <---------| foo(a) |
  |            |           +---------------+           +--------+
  |            |
  | monolithic |           +---------------+           +--------+
  |   query    | <---------| projection(y) | <---------| bar(b) |
  |            |           +---------------+           +--------+
  |            |
  |            |           +---------------+           +--------+
  |            | <---------| projection(z) | <---------| baz(c) |
  |            |           +---------------+           +--------+
  +------------+

Let's assume that the result of monolithic_query changes in such a way that the result of projection(x) also changes, i.e. both their dep-nodes are marked as red. As a consequence foo(a) needs to be re-executed; but bar(b) and baz(c) can be marked as green. However, if foo, bar, and baz had directly depended on monolithic_query, then all of them would have had to be re-evaluated.

This pattern works even without eval_always and no_hash but the two modifiers can be used to avoid unnecessary overhead. If the monolithic query is likely to change at any minor modification of the compiler's input it makes sense to mark it as eval_always, thus getting rid of its dependency tracking cost. And it always makes sense to mark the monolithic query as no_hash because we have the projections to take care of keeping things green as much as possible.

Shortcomings of the Current System

There are many things that still can be improved.

Incrementality of on-disk data structures

The current system is not able to update on-disk caches and the dependency graph in-place. Instead it has to rewrite each file entirely in each compilation session. The overhead of doing so is a few percent of total compilation time.

Unnecessary data dependencies

Data structures used as query results could be factored in a way that removes edges from the dependency graph. In particular, "span" information is very volatile, so including it in a query result increases the chance that the result won't be reusable. See https://github.com/rust-lang/rust/issues/47389 for more information.

Debugging and Testing Dependencies

Testing the dependency graph

There are various ways to write tests against the dependency graph. The simplest mechanisms are the #[rustc_if_this_changed] and #[rustc_then_this_would_need] annotations. These are used in ui tests to test whether the expected set of paths exist in the dependency graph. As an example, see src/test/ui/dep-graph/dep-graph-caller-callee.rs.

The idea is that you can annotate a test like:

#[rustc_if_this_changed]
fn foo() { }

#[rustc_then_this_would_need(TypeckTables)] //~ ERROR OK
fn bar() { foo(); }

#[rustc_then_this_would_need(TypeckTables)] //~ ERROR no path
fn baz() { }

This will check whether there is a path in the dependency graph from Hir(foo) to TypeckTables(bar). An error is reported for each #[rustc_then_this_would_need] annotation that indicates whether a path exists. //~ ERROR annotations can then be used to test if a path is found (as demonstrated above).

Debugging the dependency graph

Dumping the graph

The compiler is also capable of dumping the dependency graph for your debugging pleasure. To do so, pass the -Z dump-dep-graph flag. The graph will be dumped to dep_graph.{txt,dot} in the current directory. You can override the filename with the RUST_DEP_GRAPH environment variable.

Frequently, though, the full dep graph is quite overwhelming and not particularly helpful. Therefore, the compiler also allows you to filter the graph. You can filter in three ways:

  1. All edges originating in a particular set of nodes (usually a single node).
  2. All edges reaching a particular set of nodes.
  3. All edges that lie between given start and end nodes.

To filter, use the RUST_DEP_GRAPH_FILTER environment variable, which should look like one of the following:

source_filter     // nodes originating from source_filter
-> target_filter  // nodes that can reach target_filter
source_filter -> target_filter // nodes in between source_filter and target_filter

source_filter and target_filter are &-separated lists of strings. A node is considered to match a filter if all of those strings appear in its label. So, for example:

RUST_DEP_GRAPH_FILTER='-> TypeckTables'

would select the predecessors of all TypeckTables nodes. Usually though you want the TypeckTables node for some particular fn, so you might write:

RUST_DEP_GRAPH_FILTER='-> TypeckTables & bar'

This will select only the predecessors of TypeckTables nodes for functions with bar in their name.

Perhaps you are finding that when you change foo you need to re-type-check bar, but you don't think you should have to. In that case, you might do:

RUST_DEP_GRAPH_FILTER='Hir & foo -> TypeckTables & bar'

This will dump out all the nodes that lead from Hir(foo) to TypeckTables(bar), from which you can (hopefully) see the source of the erroneous edge.

Tracking down incorrect edges

Sometimes, after you dump the dependency graph, you will find some path that should not exist, but you will not be quite sure how it came to be. When the compiler is built with debug assertions, it can help you track that down. Simply set the RUST_FORBID_DEP_GRAPH_EDGE environment variable to a filter. Every edge created in the dep-graph will be tested against that filter – if it matches, a bug! is reported, so you can easily see the backtrace (RUST_BACKTRACE=1).

The syntax for these filters is the same as described in the previous section. However, note that this filter is applied to every edge and doesn't handle longer paths in the graph, unlike the previous section.

Example:

You find that there is a path from the Hir of foo to the type check of bar and you don't think there should be. You dump the dep-graph as described in the previous section and open dep-graph.txt to see something like:

Hir(foo) -> Collect(bar)
Collect(bar) -> TypeckTables(bar)

That first edge looks suspicious to you. So you set RUST_FORBID_DEP_GRAPH_EDGE to Hir&foo -> Collect&bar, re-run, and then observe the backtrace. Voila, bug fixed!

Profiling Queries

In an effort to support incremental compilation, the latest design of the Rust compiler consists of a query-based model.

The details of this model are (currently) outside the scope of this document, however, we explain some background of this model, in an effort to explain how we profile its performance. We intend this profiling effort to address issue 42678.

Quick Start

0. Enable debug assertions

./configure --enable-debug-assertions

1. Compile rustc

Compile the compiler, up to at least stage 1:

./x.py build library/std

2. Run rustc, with flags

Run the compiler on a source file, supplying two additional debugging flags with -Z:

rustc -Z profile-queries -Z incremental=cache foo.rs

Regarding the two additional parameters:

  • -Z profile-queries tells the compiler to run a separate thread that profiles the queries made by the main compiler thread(s).
  • -Z incremental=cache tells the compiler to "cache" various files that describe the compilation dependencies, in the subdirectory cache.

This command will generate the following files:

  • profile_queries.html consists of an HTML-based representation of the trace of queries.
  • profile_queries.counts.txt consists of a histogram, where each histogram "bucket" is a query provider.

3. Run rustc, with -Z time-passes:

  • This additional flag will add all timed passes to the output files mentioned above, in step 2. As described below, these passes appear visually distinct from the queries in the HTML output (they currently appear as green boxes, via CSS).

4. Inspect the output

  • 4(a). Open the HTML file (profile_queries.html) with a browser. See this section for an explanation of this file.
  • 4(b). Open the data file (profile_queries.counts.txt) with a text editor, or spreadsheet. See this section for an explanation of this file.

Interpret the HTML Output

Example 0

The following image gives some example output, from tracing the queries of hello_world.rs (a single main function, that prints "hello world" via the macro println!). This image only shows a short prefix of the total output; the actual output is much longer.

Example HTML output (view full HTML output). Note: it could take up to a second to properly render, depending on your browser.

Here is the corresponding text output.

Example 0 explanation

The trace of the queries has a formal structure; see Trace of Queries for details.

We style this formal structure as follows:

  • Timed passes: Green boxes, when present (via -Z time-passes), represent timed passes in the compiler. In future versions, these passes may be replaced by queries, explained below.
  • Labels: Some green and red boxes are labeled with text. Where they are present, the labels give the following information:
    • The query's provider, sans its key and its result, which are often too long to include in these labels.
    • The duration of the provider, as a fraction of the total time (for the entire trace). This fraction includes the query's entire extent (that is, the sum total of all of its sub-queries).
  • Query hits: Blue dots represent query hits. They consist of leaves in the trace's tree. (CSS class: hit).
  • Query misses: Red boxes represent query misses. They consist of internal nodes in the trace's tree. (CSS class: miss).
  • Nesting structure: Many red boxes contain nested boxes and dots. This nesting structure reflects that some providers depend on results from other providers, which consist of their nested children.
  • Some red boxes are labeled with text, and have highlighted borders (light red, and bolded). (See heuristics for details).

Heuristics

Heuristics-based CSS Classes:

  • important -- Trace nodes are important if they have an extent of 6 (or more), or they have a duration fraction of one percent (or more). These numbers are simple heuristics (currently hard-coded, but easy to modify). Important nodes are styled with textual labels, and highlighted borders (light red, and bolded).

  • frac-50, -40, ... -- Trace nodes whose total duration (self and children) takes a large fraction of the total duration, at or above 50%, 40%, and so on. We style these nodes with larger font and padding.

Interpret the Data Output

The file profile_queries.counts.txt contains a table of information about the queries, organized around their providers.

For each provider (or timed pass, when -Z time-passes is present), we produce:

  • A total count --- the total number of times this provider was queried

  • A total duration --- the total number of seconds spent running this provider, including all providers it may depend on. To get a sense of this dependency structure, and inspect a more fine-grained view of these durations, see this section.

These rows are sorted by total duration, in descending order.

Counts: Example 0

The following example profile_queries.counts.txt file results from running on a hello world program (a single main function that uses println to print "hello world").

As explained above, the columns consist of provider/pass, count, duration:

translation,1,0.891
symbol_name,2658,0.733
def_symbol_name,2556,0.268
item_attrs,5566,0.162
type_of,6922,0.117
generics_of,8020,0.084
serialize dep graph,1,0.079
relevant_trait_impls_for,50,0.063
def_span,24875,0.061
expansion,1,0.059
const checking,1,0.055
adt_def,1141,0.048
trait_impls_of,32,0.045
is_copy_raw,47,0.045
is_foreign_item,2638,0.042
fn_sig,2172,0.033
adt_dtorck_constraint,2,0.023
impl_trait_ref,2434,0.023
typeck_tables_of,29,0.022
item-bodies checking,1,0.017
typeck_item_bodies,1,0.017
is_default_impl,2320,0.017
borrow checking,1,0.014
borrowck,4,0.014
mir_validated,4,0.013
adt_destructor,10,0.012
layout_raw,258,0.010
load_dep_graph,1,0.007
item-types checking,1,0.005
mir_const,2,0.005
name resolution,1,0.004
is_object_safe,35,0.003
is_sized_raw,89,0.003
parsing,1,0.003
is_freeze_raw,11,0.001
privacy checking,1,0.001
privacy_access_levels,5,0.001
resolving dependency formats,1,0.001
adt_sized_constraint,9,0.001
wf checking,1,0.001
liveness checking,1,0.001
compute_incremental_hashes_map,1,0.001
match checking,1,0.001
type collecting,1,0.001
param_env,31,0.000
effect checking,1,0.000
trait_def,140,0.000
lowering ast -> hir,1,0.000
predicates_of,70,0.000
extern_crate,319,0.000
lifetime resolution,1,0.000
is_const_fn,6,0.000
intrinsic checking,1,0.000
translation item collection,1,0.000
impl_polarity,15,0.000
creating allocators,1,0.000
language item collection,1,0.000
crate injection,1,0.000
early lint checks,1,0.000
indexing hir,1,0.000
maybe creating a macro crate,1,0.000
coherence checking,1,0.000
optimized_mir,6,0.000
is_panic_runtime,33,0.000
associated_item_def_ids,7,0.000
needs_drop_raw,10,0.000
lint checking,1,0.000
complete gated feature checking,1,0.000
stability index,1,0.000
region_maps,11,0.000
super_predicates_of,8,0.000
coherent_trait,2,0.000
AST validation,1,0.000
loop checking,1,0.000
static item recursion checking,1,0.000
variances_of,11,0.000
associated_item,5,0.000
plugin loading,1,0.000
looking for plugin registrar,1,0.000
stability checking,1,0.000
describe_def,15,0.000
variance testing,1,0.000
codegen unit partitioning,1,0.000
looking for entry point,1,0.000
checking for inline asm in case the target doesn't support it,1,0.000
inherent_impls,1,0.000
crate_inherent_impls,1,0.000
trait_of_item,7,0.000
crate_inherent_impls_overlap_check,1,0.000
attribute checking,1,0.000
internalize symbols,1,0.000
impl wf inference,1,0.000
death checking,1,0.000
reachability checking,1,0.000
reachable_set,1,0.000
is_exported_symbol,3,0.000
is_mir_available,2,0.000
unused lib feature checking,1,0.000
maybe building test harness,1,0.000
recursion limit,1,0.000
write allocator module,1,0.000
assert dep graph,1,0.000
plugin registration,1,0.000
write metadata,1,0.000

Background

We give some background about the query model of the Rust compiler.

Def IDs

In the query model, many queries have a key that consists of a Def ID. The Rust compiler uses Def IDs to distinguish definitions in the input Rust program.

From the compiler source code (compiler/rustc_span/src/def_id.rs):

/// A DefId identifies a particular *definition*, by combining a crate
/// index and a def index.
#[derive(Clone, Eq, Ord, PartialOrd, PartialEq, RustcEncodable, RustcDecodable, Hash, Copy)]
pub struct DefId {
    pub krate: CrateNum,
    pub index: DefIndex,
}

Queries

A query relates a key to a result, either by invoking a provider that computes this result, or by reusing a cached result that was provided earlier. We explain each term in more detail:

  • Query Provider: Each kind of query has a pre-defined provider, which refers to the compiler behavior that provides an answer to the query. These providers may nest; see trace of queries for more information about this nesting structure. Example providers:
    • typeck -- Typecheck a Def ID; produce "tables" of type information.
    • borrowck -- Borrow-check a Def ID.
    • optimized_mir -- Generate an optimized MIR for a Def ID; produce MIR.
    • For more examples, see Example 0.
  • Query Key: The input/arguments to the provider. Often, this consists of a particular Def ID.
  • Query Result: The output of the provider.

Trace of Queries

Formally, a trace of the queries consists of a tree, where sub-trees represent sub-traces. In particular, the nesting structure of the trace of queries describes how the queries depend on one another.

Even more precisely, this tree represents a directed acyclic graph (DAG), where shared sub-graphs consist of tree nodes that occur multiple times in the tree, first as "cache misses" and later as "cache hits".

Cache hits and misses. The trace is a tree with the following possible tree nodes:

  • Query, with cache miss: The query's result is unknown, and its provider runs to compute it. In this case, the dynamic extent of the query's trace consists of the traced behavior of its provider.
  • Query, with cache hit: The query's result is known, and is reused; its provider does not rerun. These nodes are leaves in the trace, since they have no dynamic extent. These leaves also represent where the tree, represented as a DAG, would share a sub-graph (namely, the sub-graph of the query that was reused from the cache).

Tree node metrics. To help determine how to style this tree, we define the following tree node metrics:

  • Depth: The number of ancestors of the node in its path from the tree root.
  • Extent: The number of immediate children of the node.

Intuitively, a dependency tree is "good" for incremental caching when the depth and extent of each node is relatively small. It is pathological when either of these metrics grows too large. For instance, a tree node whose extent consists of 1M immediate children means that if and when this node is re-computed, all 1M children must be re-queried, at the very least (and some may require recomputation, too).

How Salsa works

This chapter is based on the explanation given by Niko Matsakis in this video about Salsa. To find out more you may want to watch Salsa In More Depth, also by Niko Matsakis.

Salsa is not used directly in rustc, but it is used extensively for rust-analyzer and may be integrated into the compiler in the future.

What is Salsa?

Salsa is a library for incremental recomputation. This means it allows reusing computations that were already done in the past to increase the efficiency of future computations.

The objectives of Salsa are:

  • Provide that functionality in an automatic way, so reusing old computations is done automatically by the library
  • Doing so in a "sound", or "correct", way, therefore leading to the same results as if it had been done from scratch

Salsa's actual model is much richer, allowing many kinds of inputs and many different outputs. For example, integrating Salsa with an IDE could mean that the inputs could be the manifest (Cargo.toml), entire source files (foo.rs), snippets and so on; the outputs of such an integration could range from a binary executable, to lints, types (for example, if a user selects a certain variable and wishes to see its type), completions, etc.

How does it work?

The first thing that Salsa has to do is identify the "base inputs" that are not something computed but given as input.

Salsa also has to identify intermediate, "derived" values, which are things that the library produces; for each derived value there is a "pure" function that computes it.

For example, there might be a function ast(x: Path) -> AST. The produced AST isn't a final value, it's an intermediate value that the library would use for the computation.

This means that when you try to compute with the library, Salsa is going to compute various derived values, and eventually read the input and produce the result for the asked computation.

In the course of computing, Salsa tracks which inputs were accessed and which values are derived. This information is used to determine what's going to happen when the inputs change: are the derived values still valid?

This doesn't necessarily mean that each computation downstream from the input is going to be checked, which could be costly. Salsa only needs to check each downstream computation until it finds one that isn't changed. At that point, it won't check other derived computations since they wouldn't need to change.

It is helpful to think about this as a graph with nodes. Each derived value has a dependency on other values, which could themselves be either base or derived. Base values don't have dependencies.

I <- A <- C ...
          |
J <- B <--+

When an input I changes, the derived value A could change. The derived value B, which does not depend on I, A, or any value derived from A or I, is not subject to change. Therefore, Salsa can reuse the computation done for B in the past, without having to compute it again.

The computation could also terminate early. Keeping the same graph as before, say that input I has changed in some way (and input J hasn't) but, when computing A again, it's found that A hasn't changed from the previous computation. This leads to an "early termination", because there's no need to check if C needs to change, since both of C's direct inputs, A and B, haven't changed.

Key Salsa concepts

Query

A query is some value that Salsa can access in the course of computation. Each query can have a number of keys (from 0 to many), and all queries have a result, akin to functions. 0-key queries are called "input" queries.

Database

The database is basically the context for the entire computation, it's meant to store Salsa's internal state, all intermediate values for each query, and anything else that the computation might need. The database must know all the queries that the library is going to do before it can be built, but they don't need to be specified in the same place.

After the database is formed, it can be accessed with queries that are very similar to functions. Since each query's result is stored in the database, when a query is invoked N times, it will return N cloned results, without having to recompute the query (unless the input has changed in such a way that it warrants recomputation).

For each input query (0-key), a "set" method is generated, allowing the user to change the output of such query, and trigger previous memoized values to be potentially invalidated.

Query Groups

A query group is a set of queries which have been defined together as a unit. The database is formed by combining query groups. Query groups are akin to "Salsa modules".

A set of queries in a query group are just a set of methods in a trait.

To create a query group a trait annotated with a specific attribute (#[salsa::query_group(...)]) has to be created.

An argument must also be provided to said attribute as it will be used by Salsa to create a struct to be used later when the database is created.

Example input query group:

/// This attribute will process this tree, produce this tree as output, and produce
/// a bunch of intermediate stuff that Salsa also uses. One of these things is a
/// "StorageStruct", whose name we have specified in the attribute.
///
/// This query group is a bunch of **input** queries, that do not rely on any
/// derived input.
#[salsa::query_group(InputsStorage)]
pub trait Inputs {
    /// This attribute (`#[salsa::input]`) indicates that this query is a base
    /// input, therefore `set_manifest` is going to be auto-generated
    #[salsa::input]
    fn manifest(&self) -> Manifest;

    #[salsa::input]
    fn source_text(&self, name: String) -> String;
}

To create a derived query group, one must specify which other query groups this one depends on by specifying them as supertraits, as seen in the following example:

/// This query group is going to contain queries that depend on derived
/// values. A query group can access another query group's queries by
/// specifying the dependency as a supertrait. Query groups can be stacked
/// as much as needed using that pattern.
#[salsa::query_group(ParserStorage)]
pub trait Parser: Inputs {
    /// This query `ast` is not an input query, it's a derived query; this
    /// means that a definition is necessary.
    fn ast(&self, name: String) -> String;
}

When creating a derived query the implementation of said query must be defined outside the trait. The definition must take a database parameter as an impl Trait (or dyn Trait), where Trait is the query group that the definition belongs to, in addition to the other keys.

/// This is the definition of the `ast` query in the `Parser` trait.
/// So, when the query `ast` is invoked, and it needs to be recomputed, Salsa
/// is going to call this function and give it the database as `impl Parser`.
/// The function doesn't need to be aware of all the queries of all the query
/// groups.
fn ast(db: &impl Parser, name: String) -> String {
    // Note: `impl Parser` is used here, but `dyn Parser` works just as well.
    // Because the database is passed as an `impl Parser` (which is also an
    // `Inputs`), we can access the input queries too:
    let source_text = db.source_text(name);
    /* do the actual parsing */
    return ast;
}

Eventually, after all the query groups have been defined, the database can be created by declaring a struct.

To specify which query groups are going to be part of the database an attribute (#[salsa::database(...)]) must be added. The argument of said attribute is a list of identifiers, specifying the query groups storages.

/// This attribute specifies which query groups are going to be part of the
/// database.
#[salsa::database(InputsStorage, ParserStorage)]
#[derive(Default)] // optional!
struct MyDatabase {
    /// You also need this one field.
    runtime: salsa::Runtime<MyDatabase>,
}

/// And this trait has to be implemented.
impl salsa::Database for MyDatabase {
    fn salsa_runtime(&self) -> &salsa::Runtime<MyDatabase> {
        &self.runtime
    }
}

Example usage:

fn main() {
    let mut db = MyDatabase::default();
    db.set_manifest(...);
    db.set_source_text(...);
    loop {
        db.ast(...); //will reuse results
        db.set_source_text(...);
    }
}

Memory Management in Rustc

Rustc tries to be pretty careful about how it manages memory. The compiler allocates a lot of data structures throughout compilation, and if we are not careful, it will take a lot of time and space to do so.

One of the main ways the compiler manages this is by using arenas and interning.

Arenas and Interning

We create a LOT of data structures during compilation. For performance reasons, we allocate them from a global memory pool; they are each allocated once from a long-lived arena. This is called arena allocation. This system reduces allocations/deallocations of memory. It also allows for easy comparison of types for equality: for each interned type X, we implement PartialEq for X, so we can just compare pointers. The CtxtInterners type contains a bunch of maps of interned types and the arena itself.

Example: ty::TyS

Take the example of ty::TyS, which represents a type in the compiler (you can read more here). Each time we want to construct a type, the compiler doesn't naively allocate from the buffer. Instead, we check if that type was already constructed. If it was, we just get the same pointer we had before; otherwise we make a fresh pointer. With this scheme, if we want to know whether two types are the same, all we need to do is compare the pointers, which is efficient. TyS is carefully set up so you never construct one on the stack. You always allocate it from this arena and you always intern it, so it is unique.

At the beginning of the compilation we make a buffer and each time we need to allocate a type we use some of this memory buffer. If we run out of space we get another one. The lifetime of that buffer is 'tcx. Our types are tied to that lifetime, so when compilation finishes all the memory related to that buffer is freed and our 'tcx references would be invalid.
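
To make the interning idea concrete, here is a minimal sketch. It uses Box::leak in place of a real arena and a string payload in place of real type data, so everything here is illustrative rather than rustc's actual machinery; the real compiler interns into the 'tcx arena via CtxtInterners.

use std::collections::HashSet;

// Stand-in for the real type data; `Hash`/`Eq` compare the contents.
#[derive(Hash, PartialEq, Eq)]
struct Ty(String);

struct Interner {
    set: HashSet<&'static Ty>,
}

impl Interner {
    fn intern(&mut self, ty: Ty) -> &'static Ty {
        // If an equal type was interned before, hand back the old pointer.
        if let Some(&existing) = self.set.get(&ty) {
            return existing;
        }
        // Otherwise allocate it once; `Box::leak` plays the role of the arena.
        let fresh: &'static Ty = Box::leak(Box::new(ty));
        self.set.insert(fresh);
        fresh
    }
}

fn main() {
    let mut interner = Interner { set: HashSet::new() };
    let a = interner.intern(Ty("i32".to_string()));
    let b = interner.intern(Ty("i32".to_string()));
    // Because interned values are unique, equality is just pointer equality.
    assert!(std::ptr::eq(a, b));
}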

In addition to types, there are a number of other arena-allocated data structures that you can allocate, and which are found in this module. Here are a few examples:

  • Substs, allocated with mk_substs – this will intern a slice of types, often used to specify the values to be substituted for generics (e.g. HashMap<i32, u32> would be represented as a slice &'tcx [tcx.types.i32, tcx.types.u32]).
  • TraitRef, typically passed by value – a trait reference consists of a reference to a trait along with its various type parameters (including Self), like i32: Display (here, the def-id would reference the Display trait, and the substs would contain i32). Note that def-id is defined and discussed in depth in the AdtDef and DefId section.
  • Predicate defines something the trait system has to prove (see traits module).

The tcx and how it uses lifetimes

The tcx ("typing context") is the central data structure in the compiler. It is the context that you use to perform all manner of queries. The struct TyCtxt defines a reference to this shared context:

tcx: TyCtxt<'tcx>
//          ----
//          |
//          arena lifetime

As you can see, the TyCtxt type takes a lifetime parameter. When you see a reference with a lifetime like 'tcx, you know that it refers to arena-allocated data (or data that lives as long as the arenas, anyhow).

A Note On Lifetimes

The Rust compiler is a fairly large program containing lots of big data structures (e.g. the AST, HIR, and the type system) and as such, arenas and references are heavily relied upon to minimize unnecessary memory use. This manifests itself in the way people can plug into the compiler (i.e. the driver), preferring a "push"-style API (callbacks) instead of the more Rust-ic "pull" style (think the Iterator trait).

Thread-local storage and interning are used a lot throughout the compiler to reduce duplication, while also preventing a lot of the ergonomic issues caused by many pervasive lifetimes. The rustc_middle::ty::tls module is used to access these thread-locals, although you should rarely need to touch it.

Serialization in Rustc

Rustc has to serialize and deserialize various data during compilation. Specifically:

  • "Crate metadata", mainly query outputs, are serialized in a binary format into rlib and rmeta files that are output when compiling a library crate, these are then deserialized by crates that depend on that library.
  • Certain query outputs are serialized in a binary format to persist incremental compilation results.
  • The -Z ast-json and -Z ast-json-noexpand flags serialize the AST to json and output the result to stdout.
  • CrateInfo is serialized to json when the -Z no-link flag is used, and deserialized from json when the -Z link-only flag is used.

The Encodable and Decodable traits

The rustc_serialize crate defines two traits for types which can be serialized:

pub trait Encodable<S: Encoder> {
    fn encode(&self, s: &mut S) -> Result<(), S::Error>;
}

pub trait Decodable<D: Decoder>: Sized {
    fn decode(d: &mut D) -> Result<Self, D::Error>;
}

It also defines implementations of these for integer types, floating point types, bool, char, str and various common standard library types.

For types that are constructed from those types, Encodable and Decodable are usually implemented by derives. These generate implementations that forward serialization and deserialization to the fields of the struct or enum. For a struct, those impls look something like this:

#![feature(rustc_private)]
extern crate rustc_serialize;
use rustc_serialize::{Decodable, Decoder, Encodable, Encoder};

struct MyStruct {
    int: u32,
    float: f32,
}

impl<E: Encoder> Encodable<E> for MyStruct {
    fn encode(&self, s: &mut E) -> Result<(), E::Error> {
        s.emit_struct("MyStruct", 2, |s| {
            s.emit_struct_field("int", 0, |s| self.int.encode(s))?;
            s.emit_struct_field("float", 1, |s| self.float.encode(s))
        })
    }
}
impl<D: Decoder> Decodable<D> for MyStruct {
    fn decode(s: &mut D) -> Result<MyStruct, D::Error> {
        s.read_struct("MyStruct", 2, |d| {
            let int = d.read_struct_field("int", 0, Decodable::decode)?;
            let float = d.read_struct_field("float", 1, Decodable::decode)?;

            Ok(MyStruct { int, float })
        })
    }
}

Encoding and Decoding arena allocated types

Rustc has a lot of arena allocated types. Deserializing these types isn't possible without access to the arena that they need to be allocated on. The TyDecoder and TyEncoder traits are supertraits of Decoder and Encoder that allow access to a TyCtxt.

Types which contain arena allocated types can then bound the type parameter of their Encodable and Decodable implementations with these traits. For example:

impl<'tcx, D: TyDecoder<'tcx>> Decodable<D> for MyStruct<'tcx> {
    /* ... */
}

The TyEncodable and TyDecodable derive macros will expand to such an implementation.
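
In other words, instead of the manual impl bounded by TyDecoder shown above, one would typically write something like the following (a sketch; it assumes a crate with access to rustc_macros and rustc_middle):

use rustc_macros::{TyDecodable, TyEncodable};
use rustc_middle::ty::Ty;

#[derive(TyEncodable, TyDecodable)]
struct MyStruct<'tcx> {
    // The derives generate impls bounded by `TyEncoder`/`TyDecoder`,
    // which is what the arena-allocated `Ty<'tcx>` field requires.
    value: Ty<'tcx>,
}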

Decoding the actual arena allocated type is harder, because some of the implementations can't be written due to the orphan rules. To work around this, the RefDecodable trait is defined in rustc_middle. This can then be implemented for any type. The TyDecodable macro will call RefDecodable to decode references, but various generic code needs types to actually be Decodable with a specific decoder.

For interned types, rather than manually implementing RefDecodable, it may be simpler to use a newtype wrapper (like ty::Predicate) and manually implement Encodable and Decodable for that wrapper.

Derive macros

The rustc_macros crate defines various derives to help implement Decodable and Encodable.

  • The Encodable and Decodable macros generate implementations that apply to all Encoders and Decoders. These should be used in crates that don't depend on rustc_middle, or that have to be serialized by a type that does not implement TyEncoder.
  • MetadataEncodable and MetadataDecodable generate implementations that only allow decoding by rustc_metadata::rmeta::encoder::EncodeContext and rustc_metadata::rmeta::decoder::DecodeContext. These are used for types that contain rustc_metadata::rmeta::Lazy.
  • TyEncodable and TyDecodable generate implementations that apply to any TyEncoder or TyDecoder. These should be used for types that are only serialized in crate metadata and/or the incremental cache, which is most serializable types in rustc_middle.

Shorthands

Ty can be deeply recursive; if each Ty were encoded naively, then crate metadata would be very large. To handle this, each TyEncoder has a cache of locations in its output where it has serialized types. If a type being encoded is in the cache, then instead of serializing the type as usual, the byte offset within the file being written is encoded instead. A similar scheme is used for ty::Predicate.
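
As a toy model of that scheme (strings stand in for types, and the marker byte and layout are invented for illustration, not rustc's actual encoding):

use std::collections::HashMap;

struct Encoder {
    out: Vec<u8>,
    // Where each value was first encoded; the real cache keys on `Ty<'tcx>`.
    shorthands: HashMap<String, usize>,
}

impl Encoder {
    fn encode_ty(&mut self, ty: &str) {
        if let Some(&offset) = self.shorthands.get(ty) {
            // Already encoded once: emit a marker plus the saved byte offset.
            self.out.push(0xFF);
            self.out.extend_from_slice(&(offset as u32).to_le_bytes());
        } else {
            // First occurrence: remember where it starts, then encode in full.
            self.shorthands.insert(ty.to_string(), self.out.len());
            self.out.extend_from_slice(ty.as_bytes());
        }
    }
}

fn main() {
    let mut enc = Encoder { out: Vec::new(), shorthands: HashMap::new() };
    enc.encode_ty("Vec<u32>");
    enc.encode_ty("Vec<u32>"); // second time: 5 bytes of shorthand, not the type
}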

Lazy<T>

Crate metadata is initially loaded before the TyCtxt<'tcx> is created, so some deserialization needs to be deferred from the initial loading of metadata. The Lazy<T> type wraps the (relative) offset in the crate metadata where a T has been serialized.

The Lazy<[T]> and Lazy<Table<I, T>> types provide some functionality over Lazy<Vec<T>> and Lazy<HashMap<I, T>>:

  • It's possible to encode a Lazy<[T]> directly from an iterator, without first collecting into a Vec<T>.
  • Indexing into a Lazy<Table<I, T>> does not require decoding entries other than the one being read.

note: Lazy<T> does not cache its value after being deserialized the first time. Instead the query system is the main way of caching these results.
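
The core idea can be sketched like so (a toy model: a real Lazy stores a position relative to the crate metadata blob and decodes with the full machinery above, not a hand-rolled length-prefixed string):

use std::marker::PhantomData;

// Metadata stores only a byte position; nothing is decoded until asked for.
struct Lazy<T> {
    position: usize,
    _marker: PhantomData<fn() -> T>,
}

impl Lazy<String> {
    // Toy decode: read a length-prefixed string at `position`.
    fn decode(&self, blob: &[u8]) -> String {
        let len = blob[self.position] as usize;
        let start = self.position + 1;
        String::from_utf8(blob[start..start + len].to_vec()).unwrap()
    }
}

fn main() {
    let blob = b"\x05hello";
    let lazy: Lazy<String> = Lazy { position: 0, _marker: PhantomData };
    // The value is only materialized here, at the point of use.
    assert_eq!(lazy.decode(blob), "hello");
}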

Specialization

A few types, most notably DefId, need to have different implementations for different Encoders. This is currently handled by ad-hoc specializations: DefId has a default implementation of Encodable<E> and a specialized one for Encodable<CacheEncoder>.

Parallel Compilation

Most of the compiler is not parallel. This represents an opportunity for improving compiler performance. Much effort has been put into parallelizing rustc, but it is still pretty early days for this work. There is a lot of design and correctness work that needs to be done.

One can try out the current parallel compiler work by enabling it in the config.toml.

There are a few basic ideas in this effort:

  • There are a lot of loops in the compiler that just iterate over all items in a crate. These can possibly be parallelized.
  • We can use (a custom fork of) rayon to run tasks in parallel. The custom fork allows the execution of DAGs of tasks, not just trees.
  • There are currently a lot of global data structures that need to be made thread-safe. A key strategy here has been converting interior-mutable data structures (e.g. Cell) into their thread-safe siblings (e.g. Mutex); a toy version of this conversion is sketched below.
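
As a toy version of that last conversion (the compiler itself uses wrapper types in rustc_data_structures::sync, which switch representation depending on whether the parallel compiler is enabled):

use std::sync::Mutex;

// Single-threaded version (fine on one thread):
//     struct Counter { hits: std::cell::Cell<u64> }
// Thread-safe sibling: the same interior mutability, now behind a lock.
struct Counter {
    hits: Mutex<u64>,
}

impl Counter {
    fn bump(&self) {
        *self.hits.lock().unwrap() += 1;
    }
}

fn main() {
    let counter = Counter { hits: Mutex::new(0) };
    std::thread::scope(|s| {
        for _ in 0..4 {
            s.spawn(|| counter.bump());
        }
    });
    assert_eq!(*counter.hits.lock().unwrap(), 4);
}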

As of this writing, much of this effort is on hold due to lack of manpower. We have a working prototype with promising performance gains in many cases. However, there are two blockers:

  • It's not clear what invariants need to be upheld that might not hold in the face of concurrency. An auditing effort was underway, but seems to have stalled at some point.

  • There is a lot of lock contention, which actually degrades performance as the number of threads increases beyond 4.


Rustdoc internals

This page describes rustdoc's passes and modes. For an overview of rustdoc, see rustdoc.

From crate to clean

In core.rs are two central items: the DocContext struct, and the run_core function. The latter is where rustdoc calls out to rustc to compile a crate to the point where rustdoc can take over. The former is a state container used when crawling through a crate to gather its documentation.

The main process of crate crawling is done in clean/mod.rs through several implementations of the Clean trait defined within. This is a conversion trait, which defines one method:

pub trait Clean<T> {
    fn clean(&self, cx: &DocContext) -> T;
}

clean/mod.rs also defines the types for the "cleaned" AST used later on to render documentation pages. Each usually accompanies an implementation of Clean that takes some AST or HIR type from rustc and converts it into the appropriate "cleaned" type. "Big" items like modules or associated items may have some extra processing in their Clean implementations, but for the most part these impls are straightforward conversions. The "entry point" to this module is the impl Clean<Crate> for visit_ast::RustdocVisitor, which is called by run_core above.
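
For a flavor of what such a conversion looks like, here is a hypothetical, heavily simplified impl; the types and fields are stand-ins for illustration, not rustdoc's actual definitions:

// Hypothetical stand-ins for rustdoc's real types.
struct DocContext; // the real one carries a TyCtxt and plenty of state
struct HirConstItem { name: String, body: String }
struct CleanConstItem { name: String, expr: String }

trait Clean<T> {
    fn clean(&self, cx: &DocContext) -> T;
}

// Most impls are straightforward field-by-field conversions like this.
impl Clean<CleanConstItem> for HirConstItem {
    fn clean(&self, _cx: &DocContext) -> CleanConstItem {
        CleanConstItem {
            name: self.name.clone(),
            expr: self.body.clone(),
        }
    }
}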

You see, I actually lied a little earlier: There's another AST transformation that happens before the events in clean/mod.rs. In visit_ast.rs is the type RustdocVisitor, which actually crawls a rustc_hir::Crate to get the first intermediate representation, defined in doctree.rs. This pass is mainly to get a few intermediate wrappers around the HIR types and to process visibility and inlining. This is where #[doc(inline)], #[doc(no_inline)], and #[doc(hidden)] are processed, as well as the logic for whether a pub use should get the full page or a "Reexport" line in the module page.

The other major thing that happens in clean/mod.rs is the collection of doc comments and #[doc=""] attributes into a separate field of the Attributes struct, present on anything that gets hand-written documentation. This makes it easier to collect this documentation later in the process.

The primary output of this process is a clean::Crate with a tree of Items which describe the publicly-documentable items in the target crate.

Hot potato

Before moving on to the next major step, a few important "passes" occur over the documentation. These do things like combine the separate "attributes" into a single string and strip leading whitespace to make the document easier on the markdown parser, or drop items that are not public or deliberately hidden with #[doc(hidden)]. These are all implemented in the passes/ directory, one file per pass. By default, all of these passes are run on a crate, but the ones regarding dropping private/hidden items can be bypassed by passing --document-private-items to rustdoc. Note that unlike the previous set of AST transformations, the passes happen on the cleaned crate.

(Strictly speaking, you can fine-tune the passes run and even add your own, but we're trying to deprecate that. If you need finer-grained control over these passes, please let us know!)

Here is the current (as of this writing) list of passes:

  • propagate-doc-cfg - propagates #[doc(cfg(...))] to child items.
  • collapse-docs concatenates all document attributes into one document attribute. This is necessary because each line of a doc comment is given as a separate doc attribute, and this will combine them into a single string with line breaks between each attribute.
  • unindent-comments removes excess indentation on comments in order for markdown to like it. This is necessary because the convention for writing documentation is to provide a space between the /// or //! marker and the text, and stripping that leading space will make the text easier to parse by the Markdown parser. (In the past, the markdown parser used was not CommonMark-compliant, which caused annoyances with extra whitespace, but this seems to be less of an issue today.)
  • strip-priv-imports strips all private import statements (use, extern crate) from a crate. This is necessary because rustdoc will handle public imports by either inlining the item's documentation to the module or creating a "Reexports" section with the import in it. The pass ensures that all of these imports are actually relevant to documentation.
  • strip-hidden and strip-private strip all doc(hidden) and private items from the output. strip-private implies strip-priv-imports. Basically, the goal is to remove items that are not relevant for public documentation.

From clean to crate

This is where the "second phase" in rustdoc begins. This phase primarily lives in the html/ folder, and it all starts with run() in html/render.rs. This code is responsible for setting up the Context, SharedContext, and Cache which are used during rendering, copying out the static files which live in every rendered set of documentation (things like the fonts, CSS, and JavaScript that live in html/static/), creating the search index, and printing out the source code rendering, before beginning the process of rendering all the documentation for the crate.

Several functions implemented directly on Context take the clean::Crate and set up some state between rendering items or recursing on a module's child items. From here the "page rendering" begins, via an enormous write!() call in html/layout.rs. The parts that actually generate HTML from the items and documentation occur within a series of std::fmt::Display implementations and functions that pass around a &mut std::fmt::Formatter. The top-level implementation that writes out the page body is the impl<'a> fmt::Display for Item<'a> in html/render.rs, which switches out to one of several item_* functions based on the kind of Item being rendered.

Depending on what kind of rendering code you're looking for, you'll probably find it either in html/render.rs for major items like "what sections should I print for a struct page" or html/format.rs for smaller component pieces like "how should I print a where clause as part of some other item".

Whenever rustdoc comes across an item that should print hand-written documentation alongside, it calls out to html/markdown.rs which interfaces with the Markdown parser. This is exposed as a series of types that wrap a string of Markdown, and implement fmt::Display to emit HTML text. It takes special care to enable certain features like footnotes and tables and add syntax highlighting to Rust code blocks (via html/highlight.rs) before running the Markdown parser. There's also a function in here (find_testable_code) that specifically scans for Rust code blocks so the test-runner code can find all the doctests in the crate.

From soup to nuts

(alternate title: "An unbroken thread that stretches from those first Cells to us")

It's important to note that the AST cleaning can ask the compiler for information (crucially, DocContext contains a TyCtxt), but page rendering cannot. The clean::Crate created within run_core is passed outside the compiler context before being handed to html::render::run. This means that a lot of the "supplementary data" that isn't immediately available inside an item's definition, like which trait the language uses as Deref, needs to be collected during cleaning, stored in the DocContext, and passed along to the SharedContext during HTML rendering. This manifests as a bunch of shared state, context variables, and RefCells.

Also of note is that some items that come from "asking the compiler" don't go directly into the DocContext - for example, when loading items from a foreign crate, rustdoc will ask about trait implementations and generate new Items for the impls based on that information. This goes directly into the returned Crate rather than roundabout through the DocContext. This way, these implementations can be collected alongside the others, right before rendering the HTML.

Other tricks up its sleeve

All this describes the process for generating HTML documentation from a Rust crate, but there are a couple of other major modes that rustdoc runs in. It can also be run on a standalone Markdown file, or it can run doctests on Rust code or standalone Markdown files. For the former, it shortcuts straight to html/markdown.rs, optionally including a mode which inserts a Table of Contents to the output HTML.

For the latter, rustdoc runs a similar partial-compilation to get relevant documentation in test.rs, but instead of going through the full clean and render process, it runs a much simpler crate walk to grab just the hand-written documentation. Combined with the aforementioned "find_testable_code" in html/markdown.rs, it builds up a collection of tests to run before handing them off to the test runner. One notable location in test.rs is the function make_test, which is where hand-written doctests get transformed into something that can be executed.

Some extra reading about make_test can be found here.

Dotting i's and crossing t's

So that's rustdoc's code in a nutshell, but there are more things in the repo that deal with it. Since we have the full compiletest suite at hand, there's a set of tests in src/test/rustdoc that make sure the final HTML is what we expect in various situations. These tests use a supplementary script, src/etc/htmldocck.py, that allows them to look through the final HTML using XPath notation to get a precise look at the output. The full description of all the commands available to rustdoc tests (e.g. @has and @matches) is in htmldocck.py.

To use multiple crates in a rustdoc test, add // aux-build:filename.rs to the top of the test file. filename.rs should be placed in an auxiliary directory relative to the test file with the comment. If you need to build docs for the auxiliary file, use // build-aux-docs.
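
Putting those commands together, a test might look like the following sketch (the file names, crate names, and item are hypothetical):

// aux-build:my-dep.rs
// build-aux-docs

// `my-dep.rs` lives in the `auxiliary` directory next to this test.
extern crate my_dep;

// @has my_crate/struct.Wrapper.html
pub struct Wrapper(pub my_dep::Inner);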

In addition, there are separate tests for the search index and rustdoc's ability to query it. The files in src/test/rustdoc-js each contain a different search query and the expected results, broken out by search tab. These files are processed by a script in src/tools/rustdoc-js and the Node.js runtime. These tests don't have as thorough of a writeup, but a broad example that features results in all tabs can be found in basic.js. The basic idea is that you match a given QUERY with a set of EXPECTED results, complete with the full item path of each item.

Testing locally

Some features of the generated HTML documentation might require local storage to be used across pages, which doesn't work well without an HTTP server. To test these features locally, you can run a local HTTP server, like this:

$ ./x.py doc library/std --stage 1
# The documentation has been generated into `build/[YOUR ARCH]/doc`.
$ python3 -m http.server -d build/[YOUR ARCH]/doc

Now you can browse your documentation just as you would if it were hosted on the internet. For example, the URL for std will be `/std/`.

Part 3: Source Code Representation

This part describes the process of taking raw source code from the user and transforming it into various forms that the compiler can work with easily. These are called intermediate representations (IRs).

This process starts with the compiler understanding what the user has asked for: parsing the command line arguments given and determining what it is to compile. After that, the compiler transforms the user input into a series of IRs that look progressively less like what the user wrote.

Command-line Arguments

Command-line flags are documented in the rustc book. All stable flags should be documented there. Unstable flags should be documented in the unstable book.

See the forge guide for new options for details on the procedure for adding a new command-line argument.

Guidelines

  • Flags should be orthogonal to each other. For example, if we had a JSON-emitting variant of multiple actions foo and bar, an additional --json flag would be better than adding --foo-json and --bar-json.
  • Avoid flags with the no- prefix. Instead, use the parse_bool function, such as -C embed-bitcode=no.
  • Consider the behavior if the flag is passed multiple times. In some situations, the values should be accumulated (in order!). In other situations, subsequent flags should override previous flags (for example, the lint-level flags). And some flags (like -o) should generate an error if it is too ambiguous what multiple flags would mean.
  • Always give options a long descriptive name, if only for more understandable compiler scripts.
  • The --verbose flag is for adding verbose information to rustc output. For example, using it with the --version flag gives information about the hashes of the compiler code.
  • Experimental flags and options must be guarded behind the -Z unstable-options flag.

The Rustc Driver and Interface

The rustc_driver is essentially rustc's main() function. It acts as the glue for running the various phases of the compiler in the correct order, using the interface defined in the rustc_interface crate.

The rustc_interface crate provides external users with an (unstable) API for running code at particular times during the compilation process, allowing third parties to effectively use rustc's internals as a library for analysing a crate or emulating the compiler in-process (e.g. the RLS or rustdoc).

For those using rustc as a library, the rustc_interface::run_compiler() function is the main entrypoint to the compiler. It takes a configuration for the compiler and a closure that takes a Compiler. run_compiler creates a Compiler from the configuration and passes it to the closure. Inside the closure, you can use the Compiler to drive queries to compile a crate and get the results. This is what the rustc_driver does too. You can see a minimal example of how to use rustc_interface here.

You can see what queries are currently available through the rustdocs for Compiler. You can see an example of how to use them by looking at the rustc_driver implementation, specifically the rustc_driver::run_compiler function (not to be confused with rustc_interface::run_compiler). The rustc_driver::run_compiler function takes a bunch of command-line args and some other configurations and drives the compilation to completion.

rustc_driver::run_compiler also takes a Callbacks, a trait that allows for custom compiler configuration, as well as allowing some custom code to run after different phases of the compilation.

Warning: By its very nature, the internal compiler APIs are always going to be unstable. That said, we do try not to break things unnecessarily.

Example: Type checking through rustc_interface

rustc_interface allows you to interact with Rust code at various stages of compilation.

Getting the type of an expression

To get the type of an expression, use the global_ctxt to get a TyCtxt:

// See https://github.com/rust-lang/rustc-dev-guide/blob/master/examples/rustc-driver-interacting-with-the-ast.rs for complete program.
let config = rustc_interface::Config {
    input: config::Input::Str {
        name: source_map::FileName::Custom("main.rs".to_string()),
        input: "fn main() { let message = \"Hello, world!\"; println!(\"{}\", message); }"
            .to_string(),
    },
    /* other config */
};
rustc_interface::run_compiler(config, |compiler| {
    compiler.enter(|queries| {
        // Analyze the crate and inspect the types under the cursor.
        queries.global_ctxt().unwrap().take().enter(|tcx| {
            // Every compilation contains a single crate.
            let krate = tcx.hir().krate();
            // Iterate over the top-level items in the crate, looking for the main function.
            for (_, item) in &krate.items {
                // Use pattern-matching to find a specific node inside the main function.
                if let rustc_hir::ItemKind::Fn(_, _, body_id) = item.kind {
                    let expr = &tcx.hir().body(body_id).value;
                    if let rustc_hir::ExprKind::Block(block, _) = expr.kind {
                        if let rustc_hir::StmtKind::Local(local) = block.stmts[0].kind {
                            if let Some(expr) = local.init {
                                let hir_id = expr.hir_id; // hir_id identifies the string "Hello, world!"
                                let def_id = tcx.hir().local_def_id(item.hir_id); // def_id identifies the main function
                                let ty = tcx.typeck(def_id).node_type(hir_id);
                                println!("{:?}: {:?}", expr, ty); // prints expr(HirId { owner: DefIndex(3), local_id: 4 }: "Hello, world!"): &'static str
                            }
                        }
                    }
                }
            }
        })
    });
});

Example: Getting diagnostics through rustc_interface

rustc_interface allows you to intercept diagnostics that would otherwise be printed to stderr.

Getting diagnostics

To get diagnostics from the compiler, configure rustc_interface::Config to output diagnostics to a buffer, and run TyCtxt.analysis:


// See https://github.com/rust-lang/rustc-dev-guide/blob/master/examples/rustc-driver-getting-diagnostics.rs for complete program.
let buffer = sync::Arc::new(sync::Mutex::new(Vec::new()));
let config = rustc_interface::Config {
    opts: config::Options {
        // Configure the compiler to emit diagnostics in compact JSON format.
        error_format: config::ErrorOutputType::Json {
            pretty: false,
            json_rendered: rustc_errors::emitter::HumanReadableErrorType::Default(
                rustc_errors::emitter::ColorConfig::Never,
            ),
        },
        /* other config */
    },
    // Redirect the diagnostic output of the compiler to a buffer.
    diagnostic_output: rustc_session::DiagnosticOutput::Raw(Box::from(DiagnosticSink(
        buffer.clone(),
    ))),
    /* other config */
};
rustc_interface::run_compiler(config, |compiler| {
    compiler.enter(|queries| {
        queries.global_ctxt().unwrap().take().enter(|tcx| {
            // Run the analysis phase on the local crate to trigger the type error.
            tcx.analysis(rustc_hir::def_id::LOCAL_CRATE);
        });
    });
});
// Read buffered diagnostics.
let diagnostics = String::from_utf8(buffer.lock().unwrap().clone()).unwrap();

Syntax and the AST

Working directly with source code is very inconvenient and error-prone. Thus, before we do anything else, we convert raw source code into an AST. It turns out that doing even this involves a lot of work, including lexing, parsing, macro expansion, name resolution, conditional compilation, feature-gate checking, and validation of the AST. In this chapter, we take a look at all of these steps.

Notably, there isn't always a clean ordering between these tasks. For example, macro expansion relies on name resolution to resolve the names of macros and imports. And parsing requires macro expansion, which in turn may require parsing the output of the macro.

Lexing and Parsing

The parser and lexer are currently undergoing a lot of refactoring, so parts of this chapter may be out of date.

The very first thing the compiler does is take the program (in Unicode characters) and turn it into something the compiler can work with more conveniently than strings. This happens in two stages: Lexing and Parsing.

Lexing takes strings and turns them into streams of tokens. For example, a.b + c would be turned into the tokens a, ., b, +, and c. The lexer lives in rustc_lexer.
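
As a small demonstration of what the lexer produces (this assumes rustc_private and the rustc_lexer API of this era, where a Token is just a kind plus a length in bytes):

#![feature(rustc_private)]
extern crate rustc_lexer;

fn main() {
    let src = "a.b + c";
    let mut pos = 0;
    // `tokenize` yields raw tokens with no spans and no interning.
    for token in rustc_lexer::tokenize(src) {
        let text = &src[pos..pos + token.len];
        println!("{:?}: {:?}", token.kind, text);
        pos += token.len;
    }
}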

Parsing then takes streams of tokens and turns them into a structured form which is easier for the compiler to work with, usually called an Abstract Syntax Tree (AST). An AST mirrors the structure of a Rust program in memory, using a Span to link a particular AST node back to its source text.

The AST is defined in rustc_ast, along with some definitions for tokens and token streams, data structures/traits for mutating ASTs, and shared definitions for other AST-related parts of the compiler (like the lexer and macro-expansion).

The parser is defined in rustc_parse, along with a high-level interface to the lexer and some validation routines that run after macro expansion. In particular, the rustc_parse::parser contains the parser implementation.

The main entrypoint to the parser is via the various parse_* functions and others in the parser crate. They let you do things like turn a SourceFile (e.g. the source in a single file) into a token stream, create a parser from the token stream, and then execute the parser to get a Crate (the root AST node).

To minimise the amount of copying that is done, both the StringReader and Parser have lifetimes which bind them to the parent ParseSess. This contains all the information needed while parsing, as well as the SourceMap itself.

Note that while parsing, we may encounter macro definitions or invocations. We set these aside to be expanded (see this chapter). Expansion may itself require parsing the output of the macro, which may reveal more macros to be expanded, and so on.

More on Lexical Analysis

Code for lexical analysis is split between two crates:

  • The rustc_lexer crate is responsible for breaking a &str into chunks constituting tokens. Although it is popular to implement lexers as generated finite state machines, the lexer in rustc_lexer is hand-written.

  • StringReader from rustc_ast integrates rustc_lexer with rustc specific data structures. Specifically, it adds Span information to tokens returned by rustc_lexer and interns identifiers.

Macro expansion

rustc_ast, rustc_expand, and rustc_builtin_macros are all undergoing refactoring, so some of the links in this chapter may be broken.

Rust has a very powerful macro system. In the previous chapter, we saw how the parser sets aside macros to be expanded (it temporarily uses placeholders). This chapter is about the process of expanding those macros iteratively until we have a complete AST for our crate with no unexpanded macros (or a compile error).

First, we will discuss the algorithm that expands and integrates macro output into ASTs. Next, we will take a look at how hygiene data is collected. Finally, we will look at the specifics of expanding different types of macros.

Many of the algorithms and data structures described below are in rustc_expand, with basic data structures in rustc_expand::base.

Also of note, cfg and cfg_attr are treated specially, unlike other macros, and are handled in rustc_expand::config.

Expansion and AST Integration

First of all, expansion happens at the crate level. Given the raw source code for a crate, the compiler will produce a massive AST with all macros expanded, all modules inlined, etc. The primary entry point for this process is the MacroExpander::fully_expand_fragment method. With few exceptions, we use this method on the whole crate (see "Eager Expansion" below for a more detailed discussion of edge-case expansion issues).

At a high level, fully_expand_fragment works in iterations. We keep a queue of unresolved macro invocations (that is, macros we haven't found the definition of yet). We repeatedly try to pick a macro from the queue, resolve it, expand it, and integrate it back. If we can't make progress in an iteration, this represents a compile error. Here is the algorithm:

  1. Initialize a queue of unresolved macro invocations.
  2. Repeat until queue is empty (or we make no progress, which is an error):
    1. Resolve imports in our partially built crate as much as possible.
    2. Collect as many macro Invocations as possible from our partially built crate (fn-like, attributes, derives) and add them to the queue.
    3. Dequeue the first element, and attempt to resolve it.
    4. If it's resolved:
      1. Run the macro's expander function that consumes a TokenStream or AST and produces a TokenStream or AstFragment (depending on the macro kind). (A TokenStream is a collection of TokenTrees, each of which is a token (punctuation, identifier, or literal) or a delimited group (anything inside ()/[]/{})).
        • At this point, we know everything about the macro itself and can call set_expn_data to fill in its properties in the global data; that is the hygiene data associated with ExpnId. (See the "Hygiene" section below).
      2. Integrate that piece of AST into the big existing partially built AST. This is essentially where the "token-like mass" becomes a proper set-in-stone AST with side-tables. It happens as follows:
        • If the macro produces tokens (e.g. a proc macro), we parse into an AST, which may produce parse errors.
        • During expansion, we create SyntaxContexts (hierarchy 2). (See the "Hygiene" section below)
        • A few further passes then happen one after another on every AST fragment freshly expanded from a macro.
      3. After expanding a single macro and integrating its output, continue to the next iteration of fully_expand_fragment.
    5. If it's not resolved:
      1. Put the macro back in the queue
      2. Continue to the next iteration...

Error Recovery

If we make no progress in an iteration, then we have reached a compilation error (e.g. an undefined macro). We attempt to recover from failures (unresolved macros or imports) for the sake of diagnostics. This allows compilation to continue past the first error, so that we can report more errors at a time. Recovery can't cause compilation to succeed. We know that it will fail at this point. The recovery happens by expanding unresolved macros into ExprKind::Err.

Name Resolution

Notice that name resolution is involved here: we need to resolve imports and macro names in the above algorithm. This is done in rustc_resolve::macros, which resolves macro paths, validates those resolutions, and reports various errors (e.g. "not found" or "found, but it's unstable" or "expected x, found y"). However, we don't try to resolve other names yet. This happens later, as we will see in the next chapter.

Eager Expansion

Eager expansion means that we expand the arguments of a macro invocation before the macro invocation itself. This is implemented only for a few special built-in macros that expect literals; expanding arguments first for some of these macros results in a smoother user experience. As an example, consider the following:

macro bar($i: ident) { $i }
macro foo($i: ident) { $i }

foo!(bar!(baz));

A lazy expansion would expand foo! first. An eager expansion would expand bar! first.

Eager expansion is not a generally available feature of Rust. Implementing eager expansion more generally would be challenging, but we implement it for a few special built-in macros for the sake of user experience. The built-in macros are implemented in rustc_builtin_macros, along with some other early code generation facilities like injection of standard library imports or generation of the test harness. There are some additional helpers for building their AST fragments in rustc_expand::build. Eager expansion generally performs a subset of the things that lazy (normal) expansion does. It is done by invoking fully_expand_fragment on only a part of a crate (as opposed to the whole crate, as we normally do).

Other Data Structures

Here are some other notable data structures involved in expansion and integration:

  • ResolverExpand - a trait used to break crate dependencies. This allows the resolver services to be used in rustc_ast, despite rustc_resolve and pretty much everything else depending on rustc_ast.
  • ExtCtxt/ExpansionData - various intermediate data kept and used by expansion infrastructure in the process of its work
  • Annotatable - a piece of AST that can be an attribute target; almost the same thing as AstFragment, except for types and patterns, which can be produced by macros but cannot be annotated with attributes
  • MacResult - a "polymorphic" AST fragment, something that can turn into a different AstFragment depending on its AstFragmentKind - item, or expression, or pattern etc.

Hygiene and Hierarchies

If you have ever used C/C++ preprocessor macros, you know that there are some annoying and hard-to-debug gotchas! For example, consider the following C code:

#define DEFINE_FOO struct Bar {int x;}; struct Foo {Bar bar;};

// Then, somewhere else
struct Bar {
    ...
};

DEFINE_FOO

Most people avoid writing C like this – and for good reason: it doesn't compile. The struct Bar defined by the macro clashes with the struct Bar defined in the code. Consider also the following example:

#define DO_FOO(x) {\
    int y = 0;\
    foo(x, y);\
    }

// Then elsewhere
int y = 22;
DO_FOO(y);

Do you see the problem? We wanted to generate a call foo(22, 0), but instead we got foo(0, 0) because the macro defined its own y!

These are both examples of macro hygiene issues. Hygiene relates to how to handle names defined within a macro. In particular, a hygienic macro system prevents errors due to names introduced within a macro. Rust macros are hygienic in that they do not allow one to write the sorts of bugs above.

At a high level, hygiene within the rust compiler is accomplished by keeping track of the context where a name is introduced and used. We can then disambiguate names based on that context. Future iterations of the macro system will allow greater control to the macro author to use that context. For example, a macro author may want to introduce a new name to the context where the macro was called. Alternately, the macro author may be defining a variable for use only within the macro (i.e. it should not be visible outside the macro).

The context is attached to AST nodes. All AST nodes generated by macros have context attached. Additionally, there may be other nodes that have context attached, such as some desugared syntax (non-macro-expanded nodes are considered to just have the "root" context, as described below). Throughout the compiler, we use rustc_span::Spans to refer to code locations. This struct also has hygiene information attached to it, as we will see later.

Because macro invocations and definitions can be nested, the syntax context of a node must form a hierarchy. For example, if we expand a macro and there is another macro invocation or definition in the generated output, then the syntax context should reflect the nesting.

However, it turns out that there are actually a few types of context we may want to track for different purposes. Thus, there are not just one but three expansion hierarchies that together comprise the hygiene information for a crate.

All of these hierarchies need some sort of "macro ID" to identify individual elements in the chain of expansions. This ID is ExpnId. All macros receive an integer ID, assigned continuously starting from 0 as we discover new macro calls. All hierarchies start at ExpnId::root(), which is its own parent.

rustc_span::hygiene contains all of the hygiene-related algorithms (with the exception of some hacks in Resolver::resolve_crate_root) and structures related to hygiene and expansion that are kept in global data.

The actual hierarchies are stored in HygieneData. This is a global piece of data containing hygiene and expansion info that can be accessed from any Ident without any context.

The Expansion Order Hierarchy

The first hierarchy tracks the order of expansions, i.e., when a macro invocation is in the output of another macro.

Here, the children in the hierarchy will be the "innermost" tokens. The ExpnData struct itself contains a subset of properties from both macro definition and macro call available through global data. ExpnData::parent tracks the child -> parent link in this hierarchy.

For example,

macro_rules! foo { () => { println!(); } }

fn main() { foo!(); }

In this code, the AST nodes that are finally generated would have the following hierarchy:

root
    expn_id_foo
        expn_id_println

The Macro Definition Hierarchy

The second hierarchy tracks the order of macro definitions, i.e., when we are expanding one macro and another macro definition is revealed in its output. This one is a bit tricky and more complex than the other two hierarchies.

SyntaxContext represents a whole chain in this hierarchy via an ID. SyntaxContextData contains data associated with the given SyntaxContext; mostly it is a cache for results of filtering that chain in different ways. SyntaxContextData::parent is the child -> parent link here, and SyntaxContextData::outer_expns are individual elements in the chain. The "chaining operator" is SyntaxContext::apply_mark in compiler code.

A Span, mentioned above, is actually just a compact representation of a code location and SyntaxContext. Likewise, an Ident is just an interned Symbol + Span (i.e. an interned string + hygiene data).

For built-in macros, we use the context: SyntaxContext::empty().apply_mark(expn_id), and such macros are considered to be defined at the hierarchy root. We do the same for proc-macros because we haven't implemented cross-crate hygiene yet.

If the token had context X before being produced by a macro then after being produced by the macro it has context X -> macro_id. Here are some examples:

Example 0:

macro m() { ident }

m!();

Here ident originally has context SyntaxContext::root(). ident has context ROOT -> id(m) after it's produced by m.

Example 1:

macro m() { macro n() { ident } }

m!();
n!();

In this example the ident has context ROOT originally, then ROOT -> id(m) after the first expansion, then ROOT -> id(m) -> id(n).

Example 2:

Note that these chains are not entirely determined by their last element, in other words ExpnId is not isomorphic to SyntaxContext.

macro m($i: ident) { macro n() { ($i, bar) } }

m!(foo);

After all expansions, foo has context ROOT -> id(n) and bar has context ROOT -> id(m) -> id(n).

Finally, one last thing to mention is that currently, this hierarchy is subject to the "context transplantation hack". Basically, the more modern (and experimental) macro macros have stronger hygiene than the older MBE system, but this can result in weird interactions between the two. The hack is intended to make things "just work" for now.

The Call-site Hierarchy

The third and final hierarchy tracks the location of macro invocations.

In this hierarchy ExpnData::call_site is the child -> parent link.

Here is an example:

macro bar($i: ident) { $i }
macro foo($i: ident) { $i }

foo!(bar!(baz));

For the baz AST node in the final output, the first hierarchy is ROOT -> id(foo) -> id(bar) -> baz, while the third hierarchy is ROOT -> baz.

Macro Backtraces

Macro backtraces are implemented in rustc_span using the hygiene machinery in rustc_span::hygiene.

Producing Macro Output

Above, we saw how the output of a macro is integrated into the AST for a crate, and we also saw how the hygiene data for a crate is generated. But how do we actually produce the output of a macro? It depends on the type of macro.

There are two types of macros in Rust: macro_rules! macros (a.k.a. "Macros By Example" (MBE)) and procedural macros (or "proc macros"; including custom derives). During the parsing phase, the normal Rust parser will set aside the contents of macros and their invocations. Later, macros are expanded using these portions of the code.

Some important data structures/interfaces here:

  • SyntaxExtension - a lowered macro representation, contains its expander function, which transforms a TokenStream or AST into another TokenStream or AST + some additional data like stability, or a list of unstable features allowed inside the macro.
  • SyntaxExtensionKind - expander functions may have several different signatures (take one token stream, or two, or a piece of AST, etc). This is an enum that lists them.
  • ProcMacro/TTMacroExpander/AttrProcMacro/MultiItemModifier - traits representing the expander function signatures.

Macros By Example

MBEs have their own parser distinct from the normal Rust parser. When macros are expanded, we may invoke the MBE parser to parse and expand a macro. The MBE parser, in turn, may call the normal Rust parser when it needs to bind a metavariable (e.g. $my_expr) while parsing the contents of a macro invocation. The code for macro expansion is in compiler/rustc_expand/src/mbe/.

Example

It's helpful to have an example to refer to. For the remainder of this chapter, whenever we refer to the "example definition", we mean the following:

macro_rules! printer {
    (print $mvar:ident) => {
        println!("{}", $mvar);
    };
    (print twice $mvar:ident) => {
        println!("{}", $mvar);
        println!("{}", $mvar);
    };
}

$mvar is called a metavariable. Unlike normal variables, rather than binding to a value in a computation, a metavariable binds at compile time to a tree of tokens. A token is a single "unit" of the grammar, such as an identifier (e.g. foo) or punctuation (e.g. =>). There are also other special tokens, such as EOF, which indicates that there are no more tokens. A token tree results from paired parentheses-like characters ((...), [...], and {...}) – it includes the open and close delimiters and all the tokens in between (we do require that parentheses-like characters be balanced). Having macro expansion operate on token streams rather than the raw bytes of a source file abstracts away a lot of complexity. The macro expander (and much of the rest of the compiler) doesn't really care that much about the exact line and column of some syntactic construct in the code; it cares about what constructs are used in the code. Using tokens allows us to care about what without worrying about where. For more information about tokens, see the Parsing chapter of this book.

Whenever we refer to the "example invocation", we mean the following snippet:

printer!(print foo); // Assume `foo` is a variable defined somewhere else...

The process of expanding the macro invocation into the syntax tree println!("{}", foo) and then expanding that into a call to Display::fmt is called macro expansion, and it is the topic of this chapter.

The MBE parser

There are two parts to MBE expansion: parsing the definition and parsing the invocations. Interestingly, both are done by the macro parser.

Basically, the MBE parser is like an NFA-based regex parser. It uses an algorithm similar in spirit to the Earley parsing algorithm. The macro parser is defined in compiler/rustc_expand/src/mbe/macro_parser.rs.

The interface of the macro parser is as follows (this is slightly simplified):

fn parse_tt(
    parser: &mut Cow<Parser>,
    ms: &[TokenTree],
) -> NamedParseResult

We use these items in the macro parser:

  • parser is a reference to the state of a normal Rust parser, including the token stream and parsing session. The token stream is what we are about to ask the MBE parser to parse. We will consume the raw stream of tokens and output a binding of metavariables to corresponding token trees. The parsing session can be used to report parser errors.
  • ms is a matcher. This is a sequence of token trees that we want to match the token stream against.

In the analogy of a regex parser, the token stream is the input and we are matching it against the pattern ms. Using our examples, the token stream could be the stream of tokens containing the inside of the example invocation print foo, while ms might be the sequence of token (trees) print $mvar:ident.

The output of the parser is a NamedParseResult, which indicates which of three cases has occurred:

  • Success: the token stream matches the given matcher ms, and we have produced a binding from metavariables to the corresponding token trees.
  • Failure: the token stream does not match ms. This results in an error message such as "No rule expected token blah".
  • Error: some fatal error has occurred in the parser. For example, this happens if more than one pattern matches, since that indicates the macro is ambiguous.

The full interface is defined here.

The macro parser does pretty much exactly the same as a normal regex parser with one exception: in order to parse different types of metavariables, such as ident, block, expr, etc., the macro parser must sometimes call back to the normal Rust parser.

As mentioned above, both definitions and invocations of macros are parsed using the macro parser. This is extremely non-intuitive and self-referential. The code to parse macro definitions is in compiler/rustc_expand/src/mbe/macro_rules.rs. It defines the pattern for matching a macro definition as $( $lhs:tt => $rhs:tt );+. In other words, a macro_rules definition should have in its body at least one occurrence of a token tree followed by => followed by another token tree. When the compiler comes to a macro_rules definition, it uses this pattern to match the two token trees per rule in the definition of the macro using the macro parser itself. In our example definition, the metavariable $lhs would match the patterns of both arms: (print $mvar:ident) and (print twice $mvar:ident). And $rhs would match the bodies of both arms: { println!("{}", $mvar); } and { println!("{}", $mvar); println!("{}", $mvar); }. The parser would keep this knowledge around for when it needs to expand a macro invocation.

When the compiler comes to a macro invocation, it parses that invocation using the same NFA-based macro parser that is described above. However, the matcher used is the first token tree ($lhs) extracted from the arms of the macro definition. Using our example, we would try to match the token stream print foo from the invocation against the matchers print $mvar:ident and print twice $mvar:ident that we previously extracted from the definition. The algorithm is exactly the same, but when the macro parser comes to a place in the current matcher where it needs to match a non-terminal (e.g. $mvar:ident), it calls back to the normal Rust parser to get the contents of that non-terminal. In this case, the Rust parser would look for an ident token, which it finds (foo) and returns to the macro parser. Then, the macro parser proceeds in parsing as normal. Also, note that exactly one of the matchers from the various arms should match the invocation; if there is more than one match, the parse is ambiguous, while if there are no matches at all, there is a syntax error.

For more information about the macro parser's implementation, see the comments in compiler/rustc_expand/src/mbe/macro_parser.rs.

macros and Macros 2.0

There is an old and mostly undocumented effort to improve the MBE system, give it more hygiene-related features, better scoping and visibility rules, etc. There hasn't been a lot of work on this recently, unfortunately. Internally, macro macros use the same machinery as today's MBEs; they just have additional syntactic sugar and are allowed to be in namespaces.

Procedural Macros

Procedural macros are also expanded during parsing, as mentioned above. However, they use a rather different mechanism. Rather than having a parser in the compiler, procedural macros are implemented as custom, third-party crates. The compiler compiles the proc macro crate and calls the specially annotated functions in it (i.e. the proc macros themselves), passing them a stream of tokens.

The proc macro can then transform the token stream and output a new token stream, which is synthesized into the AST.
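
From the macro author's side, this is the familiar stable interface (a minimal function-like proc macro, using only the public proc_macro API):

// In a crate with `proc-macro = true` in its Cargo.toml.
use proc_macro::TokenStream;

#[proc_macro]
pub fn answer(_input: TokenStream) -> TokenStream {
    // The compiler hands us a token stream; whatever stream we return is
    // parsed and spliced back into the caller's AST.
    "fn answer() -> u32 { 42 }".parse().unwrap()
}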

It's worth noting that the token stream type used by proc macros is stable, so rustc does not use it internally (since our internal data structures are unstable). The compiler's token stream is rustc_ast::tokenstream::TokenStream, as described previously. This is converted into the stable proc_macro::TokenStream and back in rustc_expand::proc_macro and rustc_expand::proc_macro_server. Because the Rust ABI is unstable, we use the C ABI for this conversion.

TODO: more here.

Custom Derive

Custom derives are a special type of proc macro.

TODO: more?

Name resolution

In the previous chapters, we saw how the AST is built with all macros expanded. We saw how doing that requires doing some name resolution to resolve imports and macro names. In this chapter, we show how this is actually done and more.

In fact, we don't do full name resolution during macro expansion -- we only resolve imports and macros at that time. This is required to know what to even expand. Later, after we have the whole AST, we do full name resolution to resolve all names in the crate. This happens in rustc_resolve::late. Unlike during macro expansion, in this late resolution we only need to try to resolve a name once, since no new names can be added. If we fail to resolve a name now, then it is a compiler error.

Name resolution can be complex. There are a few different namespaces (e.g. macros, values, types, lifetimes), and names may be valid at different (nested) scopes. Also, different types of names can fail to be resolved differently, and failures can happen differently at different scopes. For example, for a module scope, failure means no unexpanded macros and no unresolved glob imports in that module. On the other hand, in a function body, failure requires that a name be absent from the block we are in, all outer scopes, and the global scope.

Basics

In our programs we can refer to variables, types, functions, etc, by giving them a name. These names are not always unique. For example, take this valid Rust program:


type x = u32;
let x: x = 1;
let y: x = 2;

How do we know on line 3 whether x is a type (u32) or a value (1)? These conflicts are resolved during name resolution. In this specific case, name resolution defines that type names and variable names live in separate namespaces and therefore can co-exist.

The name resolution in Rust is a two-phase process. In the first phase, which runs during macro expansion, we build a tree of modules and resolve imports. Macro expansion and name resolution communicate with each other via the ResolverAstLowering trait.

The input to the second phase is the syntax tree, produced by parsing input files and expanding macros. This phase produces links from all the names in the source to relevant places where the name was introduced. It also generates helpful error messages, like typo suggestions, traits to import or lints about unused items.

A successful run of the second phase (Resolver::resolve_crate) creates a kind of index that the rest of the compilation may use to ask about names (through the hir::lowering::Resolver interface).

The name resolution lives in the rustc_resolve crate, with the meat in lib.rs and some helpers or symbol-type specific logic in the other modules.

Namespaces

Different kinds of symbols live in different namespaces ‒ e.g. types don't clash with variables. Conflicts usually don't arise in practice, because variables start with a lower-case letter while types start with an upper-case one, but this is only a convention. This is legal Rust code that'll compile (with warnings):


type x = u32;
let x: x = 1;
let y: x = 2; // See? x is still a type here.

To cope with this, and with slightly different scoping rules for these namespaces, the resolver keeps them separated and builds separate structures for them.

In other words, when the code talks about namespaces, it doesn't mean the module hierarchy, it's types vs. values vs. macros.

Scopes and ribs

A name is visible only in a certain area of the source code. This forms a hierarchical structure, but not necessarily a simple one ‒ if one scope is part of another, a name visible in the outer one is not necessarily visible in the inner one, and if it is, it may not refer to the same thing.

To cope with that, the compiler introduces the concept of Ribs. A rib is an abstraction of a scope. Every time the set of visible names potentially changes, a new rib is pushed onto a stack. The places where this can happen include, for example:

  • The obvious places ‒ curly braces enclosing a block, function boundaries, modules.
  • Introducing a let binding ‒ this can shadow another binding with the same name.
  • Macro expansion border ‒ to cope with macro hygiene.

When searching for a name, the stack of ribs is traversed from the innermost outwards. This helps to find the closest meaning of the name (the one not shadowed by anything else). The transition to an outer rib may also change the rules for what names are usable ‒ if there are nested functions (not closures), the inner one can't access parameters and local bindings of the outer one, even though they would be visible by ordinary scoping rules. An example:


fn do_something<T: Default>(val: T) { // <- New rib in both types and values (1)
    // `val` is accessible, as is the helper function
    // `T` is accessible
    let helper = || { // New rib on `helper` (2) and another on the block (3)
        // `val` is accessible here
    }; // End of (3)
    // `val` is accessible, `helper` variable shadows `helper` function
    fn helper() { // <- New rib in both types and values (4)
        // `val` is not accessible here; (4) is not transparent for locals
        // `T` is not accessible here
    } // End of (4)
    let val = T::default(); // New rib (5)
    // `val` is the variable, not the parameter here
} // End of (5), (2) and (1)

Because the rules for different namespaces are a bit different, each namespace has its own independent rib stack that is constructed in parallel to the others. In addition, there's also a rib stack for local labels (e.g. names of loops or blocks), which isn't a full namespace in its own right.

Overall strategy

To perform the name resolution of the whole crate, the syntax tree is traversed top-down and every encountered name is resolved. This works for most kinds of names, because at the point of use of a name it is already introduced in the Rib hierarchy.

There are some exceptions to this. Items are a bit tricky, because they can be used even before they are encountered ‒ therefore every block needs to be scanned for items first, to fill in its Rib.

Other, even more problematic cases are imports, which need recursive fixed-point resolution, and macros, which need to be resolved and expanded before the rest of the code can be processed.

Therefore, the resolution is performed in multiple stages.

Speculative crate loading

To give useful errors, rustc suggests importing paths into scope if they're not found. How does it do this? It looks through every module of every crate and looks for possible matches. This even includes crates that haven't yet been loaded!

Loading crates for import suggestions that haven't yet been loaded is called speculative crate loading, because any errors it encounters shouldn't be reported: resolve decided to load them, not the user. The function that does this is lookup_import_candidates and lives in rustc_resolve/src/diagnostics.rs.

To tell the difference between speculative loads and loads initiated by the user, resolve passes around a record_used parameter, which is false when the load is speculative.

TODO:

This is a result of the first pass of learning the code. It is definitely incomplete and not detailed enough. It also might be inaccurate in places. Still, it probably provides a useful first guidepost to what happens in there.

  • What exactly does it link to and how is that published and consumed by following stages of compilation?
  • Who calls it and how is it actually used?
  • Is it a pass and then the result is only used, or can it be computed incrementally (e.g. for RLS)?
  • The overall strategy description is a bit vague.
  • Where does the name Rib come from?
  • Does this thing have its own tests, or is it tested only as part of some e2e testing?

The #[test] attribute

Today, Rust programmers rely on a built-in attribute called #[test]. All you have to do is mark a function as a test and include some asserts, like so:

#[test]
fn my_test() {
    assert!(2+2 == 4);
}

When this program is compiled using rustc --test or cargo test, it will produce an executable that can run this, and any other test function. This method of testing allows tests to live alongside code in an organic way. You can even put tests inside private modules:

mod my_priv_mod {
    fn my_priv_func() -> bool { true }

    #[test]
    fn test_priv_func() {
        assert!(my_priv_func());
    }
}

Private items can thus be easily tested without worrying about how to expose them to any sort of external testing apparatus. This is key to the ergonomics of testing in Rust. Semantically, however, it's rather odd. How does any sort of main function invoke these tests if they're not visible? What exactly is rustc --test doing?

#[test] is implemented as a syntactic transformation inside the compiler's rustc_ast crate. Essentially, it's a fancy macro that rewrites the crate in 3 steps:

Step 1: Re-Exporting

As mentioned earlier, tests can exist inside private modules, so we need a way of exposing them to the main function, without breaking any existing code. To that end, rustc_ast will create local modules called __test_reexports that recursively reexport tests. This expansion translates the above example into:

mod my_priv_mod {
    fn my_priv_func() -> bool { true }

    pub fn test_priv_func() {
        assert!(my_priv_func());
    }

    pub mod __test_reexports {
        pub use super::test_priv_func;
    }
}

Now, our test can be accessed as my_priv_mod::__test_reexports::test_priv_func. For deeper module structures, __test_reexports will reexport modules that contain tests, so a test at a::b::my_test becomes a::__test_reexports::b::__test_reexports::my_test. While this process seems pretty safe, what happens if there is an existing __test_reexports module? The answer: nothing.

To explain, we need to understand how the AST represents identifiers. The name of every function, variable, module, etc. is not stored as a string, but rather as an opaque Symbol which is essentially an ID number for each identifier. The compiler keeps a separate hashtable that allows us to recover the human-readable name of a Symbol when necessary (such as when printing a syntax error). When the compiler generates the __test_reexports module, it generates a new Symbol for the identifier, so while the compiler-generated __test_reexports may share a name with your hand-written one, it will not share a Symbol. This technique prevents name collision during code generation and is the foundation of Rust's macro hygiene.
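
As a conceptual sketch of the ID-plus-side-table idea (this is not rustc's actual Symbol API; rustc's real machinery additionally supports the fresh, non-colliding identifiers described above):

use std::collections::HashMap;

// Identifiers are stored once; everyone else passes around a cheap index.
struct Interner {
    names: Vec<String>,
    ids: HashMap<String, u32>,
}

impl Interner {
    fn intern(&mut self, name: &str) -> u32 {
        if let Some(&id) = self.ids.get(name) {
            return id;
        }
        let id = self.names.len() as u32;
        self.names.push(name.to_string());
        self.ids.insert(name.to_string(), id);
        id
    }

    // Recover the human-readable name, e.g. when printing a syntax error.
    fn resolve(&self, id: u32) -> &str {
        &self.names[id as usize]
    }
}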

Step 2: Harness Generation

Now that our tests are accessible from the root of our crate, we need to do something with them. rustc_ast generates a module like so:

#[main]
pub fn main() {
    extern crate test;
    test::test_main_static(&[&path::to::test1, /*...*/]);
}

where path::to::test1 is a constant of type test::TestDescAndFn.

While this transformation is simple, it gives us a lot of insight into how tests are actually run. The tests are aggregated into an array and passed to a test runner called test_main_static. We'll come back to exactly what TestDescAndFn is, but for now, the key takeaway is that there is a crate called test that is part of Rust core and implements all of the runtime for testing. test's interface is unstable, so the only stable way to interact with it is through the #[test] macro.

Step 3: Test Object Generation

If you've written tests in Rust before, you may be familiar with some of the optional attributes available on test functions. For example, a test can be annotated with #[should_panic] if we expect the test to cause a panic. It looks something like this:

#[test]
#[should_panic]
fn foo() {
    panic!("intentional");
}

This means our tests are more than just simple functions; they have configuration information as well. test encodes this configuration data into a struct called TestDesc. For each test function in a crate, rustc_ast will parse its attributes and generate a TestDesc instance. It then combines the TestDesc and test function into the predictably named TestDescAndFn struct that test_main_static operates on. For a given test, the generated TestDescAndFn instance looks like so:

self::test::TestDescAndFn{
  desc: self::test::TestDesc{
    name: self::test::StaticTestName("foo"),
    ignore: false,
    should_panic: self::test::ShouldPanic::Yes,
    allow_fail: false,
  },
  testfn: self::test::StaticTestFn(||
    self::test::assert_test_result(::crate::__test_reexports::foo())),
}

Once we've constructed an array of these test objects, they're passed to the test runner via the harness generated in step 2.

Inspecting the generated code

On nightly Rust, there's an unstable flag called unpretty that you can use to print out the module source after macro expansion:

$ rustc my_mod.rs -Z unpretty=hir

Panicking in Rust

Step 1: Invocation of the panic! macro.

There are actually two panic! macros - one defined in core, and one defined in std. This is because code in core can also panic. core is built before std, but we want panics to use the same machinery at runtime, whether they originate in core or std.

core definition of panic!

The core panic! macro eventually makes the following call (in library/core/src/panicking.rs):


// NOTE This function never crosses the FFI boundary; it's a Rust-to-Rust call
extern "Rust" {
    #[lang = "panic_impl"]
    fn panic_impl(pi: &PanicInfo<'_>) -> !;
}

let pi = PanicInfo::internal_constructor(Some(&fmt), location);
unsafe { panic_impl(&pi) }

Actually resolving this goes through several layers of indirection:

  1. In compiler/rustc_middle/src/middle/weak_lang_items.rs, panic_impl is declared as a 'weak lang item', with the symbol rust_begin_unwind. This is used in rustc_typeck/src/collect.rs to set the actual symbol name to rust_begin_unwind.

    Note that panic_impl is declared in an extern "Rust" block, which means that core will attempt to call a foreign symbol called rust_begin_unwind (to be resolved at link time).

  2. In library/std/src/panicking.rs, we have this definition:


/// Entry point of panic from the core crate.
#[cfg(not(test))]
#[panic_handler]
#[unwind(allowed)]
pub fn begin_panic_handler(info: &PanicInfo<'_>) -> ! {
    ...
}

The special panic_handler attribute is resolved via compiler/rustc_middle/src/middle/lang_items. The extract function converts the panic_handler attribute to a panic_impl lang item.

Now, we have a matching panic_handler lang item in std. This function goes through the same process as the extern { fn panic_impl } definition in core, ending up with a symbol name of rust_begin_unwind. At link time, the symbol reference in core will be resolved to the definition in std (the function called begin_panic_handler in the Rust source).

Thus, control flow will pass from core to std at runtime. This allows panics from core to go through the same infrastructure that other panics use (panic hooks, unwinding, etc).

std implementation of panic!

This is where the actual panic-related logic begins. In library/std/src/panicking.rs, control passes to rust_panic_with_hook. This method is responsible for invoking the global panic hook, and checking for double panics. Finally, we call __rust_start_panic, which is provided by the panic runtime.

The call to __rust_start_panic is very weird - it is passed a *mut &mut dyn BoxMeUp, converted to a usize. Let's break this type down:

  1. BoxMeUp is an internal trait. It is implemented for PanicPayload (a wrapper around the user-supplied payload type), and has a method fn box_me_up(&mut self) -> *mut (dyn Any + Send). This method takes the user-provided payload (T: Any + Send), boxes it, and converts the box to a raw pointer.

  2. When we call __rust_start_panic, we have an &mut dyn BoxMeUp. However, this is a fat pointer (twice the size of a usize). To pass this to the panic runtime across an FFI boundary, we take a mutable reference to this mutable reference (&mut &mut dyn BoxMeUp), and convert it to a raw pointer (*mut &mut dyn BoxMeUp). The outer raw pointer is a thin pointer, since it points to a Sized type (a mutable reference). Therefore, we can convert this thin pointer into a usize, which is suitable for passing across an FFI boundary.

Finally, we call __rust_start_panic with this usize. We have now entered the panic runtime.
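
A minimal sketch of the double-indirection trick described above, using a stand-in trait instead of the real BoxMeUp:

// `Payload` stands in for `BoxMeUp`; the pointer gymnastics are the point.
trait Payload {}
struct P;
impl Payload for P {}

fn main() {
    let mut p = P;
    let mut obj: &mut dyn Payload = &mut p;     // fat pointer: two usizes
    let thin: *mut &mut dyn Payload = &mut obj; // thin pointer: one usize
    let as_usize = thin as usize;               // fits across an FFI boundary
    // On the other side, the runtime reverses the conversion:
    let back = as_usize as *mut &mut dyn Payload;
    let _payload: &mut dyn Payload = unsafe { &mut **back };
}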

Step 2: The panic runtime

Rust provides two panic runtimes: panic_abort and panic_unwind. The user chooses between them at build time via their Cargo.toml.

panic_abort is extremely simple: its implementation of __rust_start_panic just aborts, as you would expect.

panic_unwind is the more interesting case.

In its implementation of __rust_start_panic, we take the usize, convert it back to a *mut &mut dyn BoxMeUp, dereference it, and call box_me_up on the &mut dyn BoxMeUp. At this point, we have a raw pointer to the payload itself (a *mut (dyn Send + Any)): that is, a raw pointer to the actual value provided by the user who called panic!.

At this point, the platform-independent code ends. We now call into platform-specific unwinding logic (e.g. unwind). This code is responsible for unwinding the stack, running any 'landing pads' associated with each frame (currently, running destructors), and transferring control to the catch_unwind frame.

Note that all panics either abort the process or get caught by some call to catch_unwind: in library/std/src/rt.rs, the call to the user-provided main function is wrapped in catch_unwind.

AST Validation

AST validation is the process of checking various correctness properties about the AST after macro expansion.

TODO: write this chapter.

Feature Gate Checking

TODO: this chapter

The HIR

The HIR – "High-Level Intermediate Representation" – is the primary IR used in most of rustc. It is a compiler-friendly representation of the abstract syntax tree (AST) that is generated after parsing, macro expansion, and name resolution (see Lowering for how the HIR is created). Many parts of HIR resemble Rust surface syntax quite closely, with the exception that some of Rust's expression forms have been desugared away. For example, for loops are converted into a loop and do not appear in the HIR. This makes HIR more amenable to analysis than a normal AST.

This chapter covers the main concepts of the HIR.

You can view the HIR representation of your code by passing the -Zunpretty=hir-tree flag to rustc:

cargo rustc -- -Zunpretty=hir-tree

Out-of-band storage and the Crate type

The top-level data-structure in the HIR is the Crate, which stores the contents of the crate currently being compiled (we only ever construct HIR for the current crate). Whereas in the AST the crate data structure basically just contains the root module, the HIR Crate structure contains a number of maps and other things that serve to organize the content of the crate for easier access.

For example, the contents of individual items (e.g. modules, functions, traits, impls, etc) in the HIR are not immediately accessible in the parents. So, for example, if there is a module item foo containing a function bar():


mod foo {
    fn bar() { }
}

then in the HIR the representation of module foo (the Mod struct) would only have the ItemId I of bar(). To get the details of the function bar(), we would look up I in the items map.

One nice result from this representation is that one can iterate over all items in the crate by iterating over the key-value pairs in these maps (without the need to trawl through the whole HIR). There are similar maps for things like trait items and impl items, as well as "bodies" (explained below).

The other reason to set up the representation this way is for better integration with incremental compilation. This way, if you gain access to an &rustc_hir::Item (e.g. for the mod foo), you do not immediately gain access to the contents of the function bar(). Instead, you only gain access to the id for bar(), and you must invoke some function to lookup the contents of bar() given its id; this gives the compiler a chance to observe that you accessed the data for bar(), and then record the dependency.

Identifiers in the HIR

There are a bunch of different identifiers to refer to other nodes or definitions in the HIR. In short:

  • A DefId refers to a definition in any crate.
  • A LocalDefId refers to a definition in the currently compiled crate.
  • A HirId refers to any node in the HIR.

For more detailed information, check out the chapter on identifiers.

The HIR Map

Most of the time when you are working with the HIR, you will do so via the HIR Map, accessible in the tcx via tcx.hir() (and defined in the hir::map module). The HIR map contains a number of methods to convert between IDs of various kinds and to look up data associated with a HIR node.

For example, if you have a DefId, and you would like to convert it to a NodeId, you can use tcx.hir().as_local_node_id(def_id). This returns an Option<NodeId> – this will be None if the def-id refers to something outside of the current crate (since then it has no HIR node), but otherwise returns Some(n) where n is the node-id of the definition.

Similarly, you can use tcx.hir().find(n) to look up the node for a NodeId. This returns an Option<Node<'tcx>>, where Node is an enum defined in the map; by matching on this you can find out what sort of node the node-id referred to and also get a pointer to the data itself. Often, you know what sort of node n is – e.g. if you know that n must be some HIR expression, you can do tcx.hir().expect_expr(n), which will extract and return the &hir::Expr, panicking if n is not in fact an expression.

Finally, you can use the HIR map to find the parents of nodes, via calls like tcx.hir().get_parent_node(n).
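
As a hedged sketch of these lookups chained together (this assumes rustc's internal crates and the method names given above; exact signatures have shifted between compiler versions):

use rustc_hir::def_id::DefId;
use rustc_middle::ty::TyCtxt;

fn inspect(tcx: TyCtxt<'_>, def_id: DefId) {
    // `as_local_node_id` returns None for definitions from other crates.
    if let Some(node_id) = tcx.hir().as_local_node_id(def_id) {
        if let Some(_node) = tcx.hir().find(node_id) {
            // Match on the Node to see what kind of HIR node this is.
        }
        let _parent = tcx.hir().get_parent_node(node_id);
    }
}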

HIR Bodies

A rustc_hir::Body represents some kind of executable code, such as the body of a function/closure or the definition of a constant. Bodies are associated with an owner, which is typically some kind of item (e.g. an fn() or const), but could also be a closure expression (e.g. |x, y| x + y). You can use the HIR map to find the body associated with a given def-id (maybe_body_owned_by) or to find the owner of a body (body_owner_def_id).

Lowering

The lowering step converts AST to HIR. This means many structures are removed if they are irrelevant for type analysis or similar syntax-agnostic analyses. Examples of such structures include, but are not limited to:

  • Parentheses
    • Removed without replacement, the tree structure makes order explicit
  • for loops and while (let) loops
    • Converted to loop + match and some let bindings
  • if let
    • Converted to match
  • Universal impl Trait
    • Converted to generic arguments (but with some flags, to know that the user didn't write them)
  • Existential impl Trait
    • Converted to a virtual existential type declaration

Lowering needs to uphold several invariants in order to not trigger the sanity checks in compiler/rustc_middle/src/hir/map/hir_id_validator.rs:

  1. A HirId must be used if created. So if you use lower_node_id, you must use the resulting NodeId or HirId (either is fine, since any NodeIds in the HIR are checked for existing HirIds)
  2. Lowering a HirId must be done in the scope of the owning item. This means you need to use with_hir_id_owner if you are creating parts of an item other than the one being currently lowered. This happens for example during the lowering of existential impl Trait
  3. A NodeId that will be placed into a HIR structure must be lowered, even if its HirId is unused. Calling let _ = self.lower_node_id(node_id); is perfectly legitimate.
  4. If you are creating new nodes that didn't exist in the AST, you must create new ids for them. This is done by calling the next_id method, which produces both a new NodeId as well as automatically lowering it for you so you also get the HirId.

If you are creating new DefIds, since each DefId needs to have a corresponding NodeId, it is advisable to add these NodeIds to the AST so you don't have to generate new ones during lowering. This has the advantage of creating a way to find the DefId of something via its NodeId. If lowering needs this DefId in multiple places, you can't generate a new NodeId in all those places because you'd also get a new DefId then. With a NodeId from the AST this is not an issue.

Having the NodeId also allows the DefCollector to generate the DefIds instead of lowering having to do it on the fly. Centralizing the DefId generation in one place makes it easier to refactor and reason about.

HIR Debugging

The -Zunpretty=hir-tree flag will dump out the HIR.

If you are trying to correlate NodeIds or DefIds with source code, the --pretty expanded,identified flag may be useful.

TODO: anything else?

The MIR (Mid-level IR)

MIR is Rust's Mid-level Intermediate Representation. It is constructed from HIR. MIR was introduced in RFC 1211. It is a radically simplified form of Rust that is used for certain flow-sensitive safety checks – notably the borrow checker! – and also for optimization and code generation.

If you'd like a very high-level introduction to MIR, as well as some of the compiler concepts that it relies on (such as control-flow graphs and desugaring), you may enjoy the rust-lang blog post that introduced MIR.

Introduction to MIR

MIR is defined in the compiler/rustc_middle/src/mir/ module, but much of the code that manipulates it is found in compiler/rustc_mir.

Some of the key characteristics of MIR are:

  • It is based on a control-flow graph.
  • It does not have nested expressions.
  • All types in MIR are fully explicit.

Key MIR vocabulary

This section introduces the key concepts of MIR, summarized here:

  • Basic blocks: units of the control-flow graph, consisting of:
    • statements: actions with one successor
    • terminators: actions with potentially multiple successors; always at the end of a block
    • (if you're not familiar with the term basic block, see the background chapter)
  • Locals: Memory locations allocated on the stack (conceptually, at least), such as function arguments, local variables, and temporaries. These are identified by an index, written with a leading underscore, like _1. There is also a special "local" (_0) allocated to store the return value.
  • Places: expressions that identify a location in memory, like _1 or _1.f.
  • Rvalues: expressions that produce a value. The "R" stands for the fact that these are the "right-hand side" of an assignment.
    • Operands: the arguments to an rvalue, which can either be a constant (like 22) or a place (like _1).

You can get a feeling for how MIR is structured by translating simple programs into MIR and reading the pretty printed output. In fact, the playground makes this easy, since it supplies a MIR button that will show you the MIR for your program. Try putting this program into the playground and then clicking the "MIR" button at the top:

fn main() {
    let mut vec = Vec::new();
    vec.push(1);
    vec.push(2);
}

You should see something like:

// WARNING: This output format is intended for human consumers only
// and is subject to change without notice. Knock yourself out.
fn main() -> () {
    ...
}

This is the MIR format for the main function.

Variable declarations. If we drill in a bit, we'll see it begins with a bunch of variable declarations. They look like this:

let mut _0: ();                      // return place
let mut _1: std::vec::Vec<i32>;      // in scope 0 at src/main.rs:2:9: 2:16
let mut _2: ();
let mut _3: &mut std::vec::Vec<i32>;
let mut _4: ();
let mut _5: &mut std::vec::Vec<i32>;

You can see that variables in MIR don't have names, they have indices, like _0 or _1. We also intermingle the user's variables (e.g., _1) with temporary values (e.g., _2 or _3). You can tell apart user-defined variables because they have debuginfo associated to them (see below).

User variable debuginfo. Below the variable declarations, we find the only hint that _1 represents a user variable:

scope 1 {
    debug vec => _1;                 // in scope 1 at src/main.rs:2:9: 2:16
}

Each debug <Name> => <Place>; annotation describes a named user variable, and where (i.e. the place) a debugger can find the data of that variable. Here the mapping is trivial, but optimizations may complicate the place, or lead to multiple user variables sharing the same place. Additionally, closure captures are described using the same system, and so they're complicated even without optimizations, e.g.: debug x => (*((*_1).0: &T));.

The "scope" blocks (e.g., scope 1 { .. }) describe the lexical structure of the source program (which names were in scope when), so any part of the program annotated with // in scope 0 would be missing vec, if you were stepping through the code in a debugger, for example.

Basic blocks. Reading further, we see our first basic block (naturally it may look slightly different when you view it, and I am ignoring some of the comments):

bb0: {
    StorageLive(_1);
    _1 = const <std::vec::Vec<T>>::new() -> bb2;
}

A basic block is defined by a series of statements and a final terminator. In this case, there is one statement:

StorageLive(_1);

This statement indicates that the variable _1 is "live", meaning that it may be used later – this will persist until we encounter a StorageDead(_1) statement, which indicates that the variable _1 is done being used. These "storage statements" are used by LLVM to allocate stack space.

The terminator of the block bb0 is the call to Vec::new:

_1 = const <std::vec::Vec<T>>::new() -> bb2;

Terminators are different from statements because they can have more than one successor – that is, control may flow to different places. Function calls like the call to Vec::new are always terminators because of the possibility of unwinding, although in the case of Vec::new we are able to see that indeed unwinding is not possible, and hence we list only one successor block, bb2.

If we look ahead to bb2, we will see it looks like this:

bb2: {
    StorageLive(_3);
    _3 = &mut _1;
    _2 = const <std::vec::Vec<T>>::push(move _3, const 1i32) -> [return: bb3, unwind: bb4];
}

Here there are two statements: another StorageLive, introducing the _3 temporary, and then an assignment:

_3 = &mut _1;

Assignments in general have the form:

<Place> = <Rvalue>

A place is an expression like _3, _3.f or *_3 – it denotes a location in memory. An Rvalue is an expression that creates a value: in this case, the rvalue is a mutable borrow expression, which looks like &mut <Place>. So we can kind of define a grammar for rvalues like so:

<Rvalue>  = & (mut)? <Place>
          | <Operand> + <Operand>
          | <Operand> - <Operand>
          | ...

<Operand> = Constant
          | copy Place
          | move Place

As you can see from this grammar, rvalues cannot be nested – they can only reference places and constants. Moreover, when you use a place, we indicate whether we are copying it (which requires that the place have a type T where T: Copy) or moving it (which works for a place of any type). So, for example, if we had the expression x = a + b + c in Rust, that would get compiled to two statements and a temporary:

TMP1 = a + b
x = TMP1 + c

(Try it and see, though you may want to do release mode to skip over the overflow checks.)

MIR data types

The MIR data types are defined in the compiler/rustc_middle/src/mir/ module. Each of the key concepts mentioned in the previous section maps in a fairly straightforward way to a Rust type.

The main MIR data type is Mir. It contains the data for a single function (along with sub-instances of Mir for "promoted constants", but you can read about those below).

  • Basic blocks: The basic blocks are stored in the field basic_blocks; this is a vector of BasicBlockData structures. Nobody ever references a basic block directly: instead, we pass around BasicBlock values, which are newtype'd indices into this vector.
  • Statements are represented by the type Statement.
  • Terminators are represented by the Terminator.
  • Locals are represented by a newtype'd index type Local. The data for a local variable is found in the Mir (the local_decls vector). There is also a special constant RETURN_PLACE identifying the special "local" representing the return value.
  • Places are identified by the enum Place. There are a few variants:
    • Local variables like _1
    • Static variables FOO
    • Projections, which are fields or other things that "project out" from a base place. So e.g. the place _1.f is a projection, with f being the "projection element" and _1 being the base path. *_1 is also a projection, with the * being represented by the ProjectionElem::Deref element.
  • Rvalues are represented by the enum Rvalue.
  • Operands are represented by the enum Operand.

Representing constants

to be written

THIR and MIR construction

The lowering of HIR to MIR occurs for the following (probably incomplete) list of items:

  • Function and closure bodies
  • Initializers of static and const items
  • Initializers of enum discriminants
  • Glue and shims of any kind
    • Tuple struct initializer functions
    • Drop code (the Drop::drop function is not called directly)
    • Drop implementations of types without an explicit Drop implementation

The lowering is triggered by calling the mir_built query. There is an intermediate representation between HIR and MIR called the THIR that is only used during lowering. THIR means "Typed HIR" and used to be called "HAIR (High-level Abstract IR)". The THIR's most important feature is that the various adjustments (which happen without explicit syntax) like coercions, autoderef, autoref and overloaded method calls have become explicit casts, deref operations, reference expressions or concrete function calls.

The THIR is a shallow wrapper around HIR, with datatypes that mirror the HIR datatypes. For example, instead of -x being a thir::ExprKind::Neg(thir::Expr) (a deep copy), it is a thir::ExprKind::Neg(hir::Expr) (a shallow copy). This shallowness enables the THIR to represent all datatypes that HIR has, but without having to create an in-memory copy of the entire HIR. MIR lowering will first convert the topmost expression from HIR to THIR (in rustc_mir_build::thir::cx::expr) and then process the THIR expressions recursively.

The lowering creates local variables for every argument as specified in the signature. Next, it creates local variables for every binding specified in the argument patterns; e.g. (a, b): (i32, String) produces 3 locals: one for the argument itself, and two for the bindings a and b. It then generates field accesses that read the fields from the argument and write the values to the binding variables.

With this initialization out of the way, the lowering triggers a recursive call to a function that generates the MIR for the body (a Block expression) and writes the result into the RETURN_PLACE.

unpack! all the things

Functions that generate MIR tend to fall into one of two patterns. First, if the function generates only statements, then it will take a basic block as argument onto which those statements should be appended. It can then return a result as normal:

fn generate_some_mir(&mut self, block: BasicBlock) -> ResultType {
   ...
}

But there are other functions that may generate new basic blocks as well. For example, lowering an expression like if foo { 22 } else { 44 } requires generating a small "diamond-shaped graph". In this case, the functions take a basic block where their code starts and return a (potentially) new basic block where the code generation ends. The BlockAnd type is used to represent this:

fn generate_more_mir(&mut self, block: BasicBlock) -> BlockAnd<ResultType> {
    ...
}

When you invoke these functions, it is common to have a local variable block that is effectively a "cursor". It represents the point at which we are adding new MIR. When you invoke generate_more_mir, you want to update this cursor. You can do this manually, but it's tedious:

let mut block;
let v = match self.generate_more_mir(..) {
    BlockAnd { block: new_block, value: v } => {
        block = new_block;
        v
    }
};

For this reason, we offer a macro that lets you write let v = unpack!(block = self.generate_more_mir(...)). It simply extracts the new block and overwrites the variable block that you named in the unpack!.
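
As a self-contained sketch of how such an unpack! macro can work (BasicBlock here is a stand-in index type; the real macro in the MIR builder differs in details):

type BasicBlock = usize; // stand-in for the real newtype'd index

struct BlockAnd<T> {
    block: BasicBlock,
    value: T,
}

macro_rules! unpack {
    ($block:ident = $expr:expr) => {{
        let BlockAnd { block: new_block, value } = $expr;
        $block = new_block; // advance the "cursor"
        value
    }};
}

// A generator that ends in a (potentially) different block.
fn generate_more_mir(block: BasicBlock) -> BlockAnd<u32> {
    BlockAnd { block: block + 1, value: 42 }
}

fn main() {
    let mut block: BasicBlock = 0;
    let v = unpack!(block = generate_more_mir(block));
    assert_eq!((block, v), (1, 42));
}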

Lowering expressions into the desired MIR

There are essentially four kinds of representations one might want of an expression:

  • Place refers to a (or part of a) preexisting memory location (local, static, promoted)
  • Rvalue is something that can be assigned to a Place
  • Operand is an argument to e.g. a + operation or a function call
  • a temporary variable containing a copy of the value

A figure in the original guide depicts a general overview of the interactions between these representations; the paragraph below walks through the same cycle.

We start out with lowering the function body to an Rvalue so we can create an assignment to RETURN_PLACE. This Rvalue lowering will in turn trigger lowering to Operand for its arguments (if any). Operand lowering either produces a const operand, or moves/copies out of a Place, thus triggering a Place lowering. An expression being lowered to a Place can in turn trigger a temporary to be created if the expression being lowered contains operations. This is where the snake bites its own tail and we need to trigger an Rvalue lowering for the expression to be written into the local.

Operator lowering

Operators on builtin types are not lowered to function calls (which would end up in infinite recursion, because the trait impls just contain the operation itself again). Instead there are Rvalues for binary and unary operators and index operations. These Rvalues later get codegened to LLVM primitive operations or LLVM intrinsics.

Operators on all other types get lowered to a function call to their impl of the operator's corresponding trait.

Regardless of the lowering kind, the arguments to the operator are lowered to Operands. This means all arguments are either constants, or refer to an already existing value somewhere in a local or static.

Method call lowering

Method calls are lowered to the same TerminatorKind that function calls are. In MIR there is no difference between method calls and function calls anymore.

Conditions

if conditions and match statements for enums without variants with fields are lowered to TerminatorKind::SwitchInt. Each possible value (so 0 and 1 for if conditions) has a corresponding BasicBlock to which the code continues. The argument being branched on is (again) an Operand representing the value of the if condition.
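
For example, an if on a boolean local might lower to something like the following (a simplified sketch in the dump format shown earlier, not exact compiler output):

bb0: {
    // _2 holds the `if` condition; each value gets its own target block.
    switchInt(move _2) -> [false: bb2, otherwise: bb1];
}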

Pattern matching

match statements for enums with variants that have fields are lowered to TerminatorKind::SwitchInt, too, but the Operand refers to a Place where the discriminant of the value can be found. This often involves reading the discriminant to a new temporary variable.

Aggregate construction

Aggregate values of any kind (e.g. structs or tuples) are built via Rvalue::Aggregate. All fields are lowered to Operands. This is essentially equivalent to one assignment statement per aggregate field, plus an assignment to the discriminant in the case of enums.

MIR visitor

The MIR visitor is a convenient tool for traversing the MIR and either looking for things or making changes to it. The visitor traits are defined in the rustc_middle::mir::visit module – there are two of them, generated via a single macro: Visitor (which operates on a &Mir and gives back shared references) and MutVisitor (which operates on a &mut Mir and gives back mutable references).

To implement a visitor, you have to create a type that represents your visitor. Typically, this type wants to "hang on" to whatever state you will need while processing MIR:

struct MyVisitor<...> {
    tcx: TyCtxt<'tcx>,
    ...
}

and you then implement the Visitor or MutVisitor trait for that type:

impl<'tcx> MutVisitor<'tcx> for NoLandingPads {
    fn visit_foo(&mut self, ...) {
        ...
        self.super_foo(...);
    }
}

As shown above, within the impl, you can override any of the visit_foo methods (e.g., visit_terminator) in order to write some code that will execute whenever a foo is found. If you want to recursively walk the contents of the foo, you then invoke the super_foo method. (NB. You never want to override super_foo.)

A very simple example of a visitor can be found in NoLandingPads. That visitor doesn't even require any state: it just visits all terminators and removes their unwind successors.
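
As a hedged sketch of such a visitor (this assumes rustc's internal crates; the exact method signatures of the Visitor trait vary across compiler versions):

use rustc_middle::mir::visit::Visitor;
use rustc_middle::mir::{Location, Terminator};

// Counts every terminator in the MIR it visits; no mutation needed.
struct CountTerminators {
    count: usize,
}

impl<'tcx> Visitor<'tcx> for CountTerminators {
    fn visit_terminator(&mut self, terminator: &Terminator<'tcx>, location: Location) {
        self.count += 1;
        // Recurse into the terminator's contents, per the pattern above.
        self.super_terminator(terminator, location);
    }
}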

Traversal

In addition to the visitor, the rustc_middle::mir::traversal module contains useful functions for walking the MIR CFG in different standard orders (e.g. pre-order, reverse post-order, and so forth).

MIR passes

If you would like to get the MIR for a function (or constant, etc), you can use the optimized_mir(def_id) query. This will give you back the final, optimized MIR. For foreign def-ids, we simply read the MIR from the other crate's metadata. But for local def-ids, the query will construct the MIR and then iteratively optimize it by applying a series of passes. This section describes how those passes work and how you can extend them.

To produce the optimized_mir(D) for a given def-id D, the MIR passes through several suites of optimizations, each represented by a query. Each suite consists of multiple optimizations and transformations. These suites represent useful intermediate points where we want to access the MIR for type checking or other purposes:

  • mir_build(D) – not a query, but this constructs the initial MIR
  • mir_const(D) – applies some simple transformations to make MIR ready for constant evaluation;
  • mir_validated(D) – applies some more transformations, making MIR ready for borrow checking;
  • optimized_mir(D) – the final state, after all optimizations have been performed.

Implementing and registering a pass

A MirPass is some bit of code that processes the MIR, typically – but not always – transforming it along the way somehow. For example, it might perform an optimization. The MirPass trait itself is found in the rustc_mir::transform module, and it basically consists of one method, run_pass, that simply gets an &mut Mir (along with the tcx and some information about where it came from). The MIR is therefore modified in place (which helps to keep things efficient).

A good example of a basic MIR pass is NoLandingPads, which walks the MIR and removes all edges that are due to unwinding – this is used when configured with panic=abort, which never unwinds. As you can see from its source, a MIR pass is defined by first defining a dummy type, a struct with no fields, something like:


struct MyPass;

for which you then implement the MirPass trait. You can then insert this pass into the appropriate list of passes found in a query like optimized_mir, mir_validated, etc. (If this is an optimization, it should go into the optimized_mir list.)
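
A hedged sketch of what such a pass can look like (again assuming rustc's internal crates; the body type is written Body here, its later name for what the text above calls Mir, and the exact run_pass signature differs between compiler versions):

use rustc_middle::mir::Body;
use rustc_middle::ty::TyCtxt;
use rustc_mir::transform::MirPass;

pub struct MyPass;

impl<'tcx> MirPass<'tcx> for MyPass {
    fn run_pass(&self, _tcx: TyCtxt<'tcx>, body: &mut Body<'tcx>) {
        // The MIR is modified in place; e.g. walk `body` here and rewrite
        // statements or terminators as the optimization requires.
        let _ = body;
    }
}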

If you are writing a pass, there's a good chance that you are going to want to use a MIR visitor. MIR visitors are a handy way to walk all the parts of the MIR, either to search for something or to make small edits.

Stealing

The intermediate queries mir_const() and mir_validated() yield up a &'tcx Steal<Mir<'tcx>>, allocated using tcx.alloc_steal_mir(). This indicates that the result may be stolen by the next suite of optimizations – this is an optimization to avoid cloning the MIR. Attempting to use a stolen result will cause a panic in the compiler. Therefore, it is important that you do not read directly from these intermediate queries except as part of the MIR processing pipeline.

Because of this stealing mechanism, some care must also be taken to ensure that, before the MIR at a particular phase in the processing pipeline is stolen, anyone who may want to read from it has already done so. Concretely, this means that if you have some query foo(D) that wants to access the result of mir_const(D) or mir_validated(D), you need to have the successor pass "force" foo(D) using ty::queries::foo::force(...). This will force a query to execute even though you don't directly require its result.

As an example, consider MIR const qualification. It wants to read the result produced by the mir_const() suite. However, that result will be stolen by the mir_validated() suite. If nothing was done, then mir_const_qualif(D) would succeed if it came before mir_validated(D), but fail otherwise. Therefore, mir_validated(D) will force mir_const_qualif before it actually steals, thus ensuring that the reads have already happened (remember that queries are memoized, so executing a query twice simply loads from a cache the second time):

mir_const(D) --read-by--> mir_const_qualif(D)
     |                       ^
  stolen-by                  |
     |                    (forces)
     v                       |
mir_validated(D) ------------+

This mechanism is a bit dodgy. There is a discussion of more elegant alternatives in rust-lang/rust#41710.

Identifiers in the Compiler

If you have read the few previous chapters, you now know that rustc uses many different intermediate representations to perform different kinds of analyses. However, like in every data structure, you need a way to traverse the structure and refer to other elements. In this chapter, you will find information on the different identifiers rustc uses for each intermediate representation.

In the AST

A NodeId is an identifier number that uniquely identifies an AST node within a crate. Every node in the AST has its own NodeId, including top-level items such as structs, but also individual statements and expressions.

However, because they are absolute within a crate, adding or removing a single node in the AST causes all the subsequent NodeIds to change. This renders NodeIds pretty much useless for incremental compilation, where you want as few things as possible to change.

NodeIds are used in all the rustc bits that operate directly on the AST, like macro expansion and name resolution.

In the HIR

The HIR uses a bunch of different identifiers that coexist and serve different purposes.

  • A DefId, as the name suggests, identifies a particular definition, or top-level item, in a given crate. It is composed of two parts: a CrateNum which identifies the crate the definition comes from, and a DefIndex which identifies the definition within the crate. Unlike NodeIds, there isn't a DefId for every expression, which makes them more stable across compilations.
  • A LocalDefId is basically a DefId that is known to come from the current crate. This allows us to drop the CrateNum part, and use the type system to ensure that only local definitions are passed to functions that expect a local definition.
  • A HirId uniquely identifies a node in the HIR of the current crate. It is composed of two parts: an owner and a local_id that is unique within the owner. This combination makes for more stable values which are helpful for incremental compilation. Unlike DefIds, a HirId can refer to fine-grained entities like expressions, but stays local to the current crate.
  • A BodyId identifies a HIR Body in the current crate. It is currently only a wrapper around a HirId. For more info about HIR bodies, please refer to the HIR chapter.

These identifiers can be converted into one another through the HIR map. See the HIR chapter for more detailed information.
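
A conceptual sketch of the shapes of these identifiers (field names are illustrative, close to but not necessarily identical to rustc's definitions):

struct CrateNum(u32);
struct DefIndex(u32);
struct ItemLocalId(u32);

struct DefId {
    krate: CrateNum, // which crate the definition comes from
    index: DefIndex, // which definition within that crate
}

struct LocalDefId {
    local_def_index: DefIndex, // the CrateNum is implicitly the local crate
}

struct HirId {
    owner: LocalDefId,     // the owning item
    local_id: ItemLocalId, // unique within that owner
}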

In the MIR

TODO

Closure Expansion in rustc

This section describes how rustc handles closures. Closures in Rust are effectively "desugared" into structs that contain the values they use (or references to the values they use) from their creator's stack frame. rustc has the job of figuring out which values a closure uses and how, so it can decide whether to capture a given variable by shared reference, mutable reference, or by move. rustc also has to figure out which of the closure traits (Fn, FnMut, or FnOnce) a closure is capable of implementing.

Let's start with a few examples:

Example 1

To start, let's take a look at how the closure in the following example is desugared:

fn closure(f: impl Fn()) {
    f();
}

fn main() {
    let x: i32 = 10;
    closure(|| println!("Hi {}", x));  // The closure just reads x.
    println!("Value of x after return {}", x);
}

Let's say the above is the content of a file called immut.rs. We can compile immut.rs using the following command; the -Zdump-mir=all flag will cause rustc to generate and dump the MIR to a directory called mir_dump.

> rustc +stage1 immut.rs -Zdump-mir=all

After we run this command, we will see a newly generated directory in our current working directory called mir_dump, which will contain several files. If we look at file rustc.main.-------.mir_map.0.mir, we will find, among other things, these lines:

_4 = &_1;
_3 = [closure@immut.rs:7:13: 7:36] { x: move _4 };

Note that in the MIR examples in this chapter, _1 is x.

Here, the first line _4 = &_1; tells us that x was borrowed as an immutable reference. This is what we would hope, as our closure just reads x.

Example 2

Here is another example:

fn closure(mut f: impl FnMut()) {
    f();
}

fn main() {
    let mut x: i32 = 10;
    closure(|| {
        x += 10;  // The closure mutates the value of x
        println!("Hi {}", x)
    });
    println!("Value of x after return {}", x);
}

If we dump the MIR for this program as before, we find:

_4 = &mut _1;
_3 = [closure@mut.rs:7:13: 10:6] { x: move _4 };

This time, in the line _4 = &mut _1;, we see that the borrow has changed to a mutable borrow. Fair enough! The closure increments x by 10.

Example 3

One more example:

fn closure(f: impl FnOnce()) {
    f();
}

fn main() {
    let x = vec![21];
    closure(|| {
        drop(x);  // Makes x unusable after the fact.
    });
    // println!("Value of x after return {:?}", x);
}

Dumping the MIR again, we find:

_6 = [closure@move.rs:7:13: 9:6] { x: move _1 }; // bb16[3]: scope 1 at move.rs:7:13: 9:6

Here, x is directly moved into the closure, and access to it will not be permitted after the closure.

Inferences in the compiler

Now let's dive into rustc code and see how all these inferences are done by the compiler.

Let's start by defining a term that we will use quite a bit in the rest of the discussion - upvar. An upvar is a variable that is local to the function where the closure is defined. So, in the above examples, x will be an upvar to the closure. They are also sometimes referred to as free variables, meaning they are not bound to the context of the closure. compiler/rustc_middle/src/ty/query/mod.rs defines a query called upvars_mentioned for this purpose.

Other than lazy invocation, one other thing that distinguishes a closure from a normal function is that it can use the upvars. It borrows these upvars from its surrounding context; therefore the compiler has to determine the upvar's borrow type. The compiler starts by assigning an immutable borrow type and lowers the restriction (that is, changes it from immutable to mutable to move) as needed, based on the usage. In Example 1 above, the closure only uses the variable for printing but does not modify it in any way and therefore, in the mir_dump, we find the borrow type for the upvar x to be immutable. In Example 2, however, the closure modifies x and increments it by some value. Because of this mutation, the compiler, which started off assigning x as an immutable reference type, has to adjust it to a mutable reference. Likewise in the third example, the closure drops the vector, and this requires the variable x to be moved into the closure. Depending on the borrow kind, the closure has to implement the appropriate trait: Fn for immutable borrow, FnMut for mutable borrow, and FnOnce for move semantics. A sketch of this logic follows.
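
The "start with an immutable borrow and strengthen only as needed" idea can be sketched like this (names are illustrative, not rustc's):

// Ordered from least to most restrictive capture.
#[derive(Clone, Copy, PartialEq, PartialOrd)]
enum CaptureKind {
    ImmBorrow,
    MutBorrow,
    ByValue,
}

// Each use of an upvar may demand a stronger capture; we never weaken one.
fn adjust(current: &mut CaptureKind, required: CaptureKind) {
    if required > *current {
        *current = required;
    }
}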

Most of the code related to the closure is in the compiler/rustc_typeck/src/check/upvar.rs file and the data structures are declared in the file compiler/rustc_middle/src/ty/mod.rs.

Before we go any further, let's discuss how we can examine the flow of control through the rustc codebase. For closures specifically, set the RUST_LOG env variable as below and collect the output in a file:

> RUST_LOG=rustc_typeck::check::upvar rustc +stage1 -Zdump-mir=all \
    <.rs file to compile> 2> <file where the output will be dumped>

This uses the stage1 compiler and enables debug! logging for the rustc_typeck::check::upvar module.

The other option is to step through the code using lldb or gdb.

  1. rust-lldb build/x86_64-apple-darwin/stage1/bin/rustc test.rs
  2. In lldb:
    1. b upvar.rs:134 // Set a breakpoint on a certain line in the upvar.rs file
    2. r // Run the program until it hits the breakpoint

Let's start with upvar.rs. This file has something called the euv::ExprUseVisitor which walks the source of the closure and invokes a callback for each upvar that is borrowed, mutated, or moved.

fn main() {
    let mut x = vec![21];
    let _cl = || {
        let y = x[0];  // 1.
        x[0] += 1;  // 2.
    };
}

In the above example, our visitor will be called twice, for the lines marked 1 and 2, once for a shared borrow and another one for a mutable borrow. It will also tell us what was borrowed.

The callbacks are defined by implementing the Delegate trait. The InferBorrowKind type implements Delegate and keeps a map that records, for each upvar, which mode of capture was required. The modes of capture can be ByValue (moved) or ByRef (borrowed). For ByRef borrows, the possible BorrowKinds are ImmBorrow, UniqueImmBorrow, and MutBorrow, as defined in compiler/rustc_middle/src/ty/mod.rs.

Delegate defines a few different methods (the different callbacks): consume for move of a variable, borrow for a borrow of some kind (shared or mutable), and mutate when we see an assignment of something.

All of these callbacks have a common argument cmt which stands for Category, Mutability and Type and is defined in compiler/rustc_middle/src/middle/mem_categorization.rs. Borrowing from the code comments, "cmt is a complete categorization of a value indicating where it originated and how it is located, as well as the mutability of the memory in which the value is stored". Based on the callback (consume, borrow, etc.), we call the relevant adjust_upvar_borrow_kind_for_ method and pass the cmt along. Once the borrow type is adjusted, we store it in the table, which basically says what borrows were made for each closure.

self.tables
    .borrow_mut()
    .upvar_capture_map
    .extend(delegate.adjust_upvar_captures);

Part 4: Analysis

This part discusses the many analyses that the compiler uses to check various properties of the code and to inform later stages. Typically, this is what people mean when they talk about "Rust's type system". This includes the representation, inference, and checking of types, the trait system, and the borrow checker. These analyses do not happen as one big pass or set of contiguous passes. Rather, they are spread out throughout various parts of the compilation process and use different intermediate representations. For example, type checking happens on the HIR, while borrow checking happens on the MIR. Nonetheless, for the sake of presentation, we will discuss all of these analyses in this part of the guide.

The ty module: representing types

The ty module defines how the Rust compiler represents types internally. It also defines the typing context (tcx or TyCtxt), which is the central data structure in the compiler.

ty::Ty

When we talk about how rustc represents types, we usually refer to a type called Ty. There are quite a few modules and types for Ty in the compiler (Ty documentation).

The specific Ty we are referring to is rustc_middle::ty::Ty (and not rustc_hir::Ty). The distinction is important, so we will discuss it first before going into the details of ty::Ty.

rustc_hir::Ty vs ty::Ty

The HIR in rustc can be thought of as the high-level intermediate representation. It is more or less the AST (see this chapter) as it represents the syntax that the user wrote, and is obtained after parsing and some desugaring. It has a representation of types, but in reality it reflects more of what the user wrote, that is, what they wrote so as to represent that type.

In contrast, ty::Ty represents the semantics of a type, that is, the meaning of what the user wrote. For example, rustc_hir::Ty would record the fact that a user used the name u32 twice in their program, but the ty::Ty would record the fact that both usages refer to the same type.

Example: fn foo(x: u32) -> u32 { x }. In this function, we see that u32 appears twice. We know that it is the same type, i.e. the function takes an argument and returns an argument of the same type, but from the point of view of the HIR there would be two distinct type instances, because they occur in two different places in the program. That is, they have two different Spans (locations).

Example: fn foo(x: &u32) -> &u32. In addition, HIR might have information left out. This type &u32 is incomplete, since in the full Rust type there is actually a lifetime, but we didn't need to write those lifetimes. There are also some elision rules that insert information. The result may look like fn foo<'a>(x: &'a u32) -> &'a u32.

In the HIR level, these things are not spelled out and you can say the picture is rather incomplete. However, at the ty::Ty level, these details are added and it is complete. Moreover, we will have exactly one ty::Ty for a given type, like u32, and that ty::Ty is used for all u32s in the whole program, not a specific usage, unlike rustc_hir::Ty.

Here is a summary:

| rustc_hir::Ty | ty::Ty |
|---------------|--------|
| Describes the syntax of a type: what the user wrote (with some desugaring). | Describes the semantics of a type: the meaning of what the user wrote. |
| Each rustc_hir::Ty has its own spans corresponding to the appropriate place in the program. | Doesn't correspond to a single place in the user's program. |
| rustc_hir::Ty has generics and lifetimes; however, some of those lifetimes are special markers like LifetimeName::Implicit. | ty::Ty has the full type, including generics and lifetimes, even if the user left them out. |
| fn foo(x: u32) -> u32 { } - Two rustc_hir::Tys representing each usage of u32, each with its own Spans, etc. rustc_hir::Ty doesn't tell us that both are the same type. | fn foo(x: u32) -> u32 { } - One ty::Ty for all instances of u32 throughout the program. ty::Ty tells us that both usages of u32 mean the same type. |
| fn foo(x: &u32) -> &u32 - Two rustc_hir::Tys again. Lifetimes for the references show up in the rustc_hir::Tys using a special marker, LifetimeName::Implicit. | fn foo(x: &u32) -> &u32 - A single ty::Ty. The ty::Ty has the hidden lifetime param. |

Order

HIR is built directly from the AST, so this happens before any ty::Ty is produced. After the HIR is built, some basic type inference and type checking is done. During type inference, we figure out what the ty::Ty of everything is, and we also check whether the type of anything is ambiguous. The ty::Ty is then used for type checking, making sure everything has the expected type. The astconv module is where the code responsible for converting a rustc_hir::Ty into a ty::Ty is located. This occurs during the type-checking phase, but also in other parts of the compiler that want to ask questions like "what argument types does this function expect?"

How semantics drive the two instances of Ty

You can think of HIR as the perspective on the type information that assumes the least. We assume two things are distinct until they are proven to be the same thing. In other words, we know less about them, so we should assume less about them.

They are syntactically two strings: "u32" at line N column 20 and "u32" at line N column 35. We don’t know that they are the same yet. So, in the HIR we treat them as if they are different. Later, we determine that they semantically are the same type and that’s the ty::Ty we use.

Consider another example: fn foo<T>(x: T) -> u32. Suppose that someone invokes foo::<u32>(0). This means that T and u32 (in this invocation) actually turns out to be the same type, so we would eventually end up with the same ty::Ty in the end, but we have distinct rustc_hir::Ty. (This is a bit over-simplified, though, since during type checking, we would check the function generically and would still have a T distinct from u32. Later, when doing code generation, we would always be handling "monomorphized" (fully substituted) versions of each function, and hence we would know what T represents (and specifically that it is u32).)

Here is one more example:


mod a {
    type X = u32;
    pub fn foo(x: X) -> u32 { 22 }
}
mod b {
    type X = i32;
    pub fn foo(x: X) -> i32 { x }
}

Here the type X will vary depending on context, clearly. If you look at the rustc_hir::Ty, you will get back that X is an alias in both cases (though it will be mapped via name resolution to distinct aliases). But if you look at the ty::Ty signature, it will be either fn(u32) -> u32 or fn(i32) -> i32 (with type aliases fully expanded).

ty::Ty implementation

rustc_middle::ty::Ty is actually a type alias to &TyS. This type, whose name is short for "Type Structure", is where the main functionality is located. You can generally ignore the TyS struct; you will basically never access it explicitly. We always pass it by reference using the Ty alias. The only exception is defining inherent methods on types.

In particular, TyS has a kind field of type TyKind, which represents the key type information. TyKind is a big enum with variants to represent many different Rust types (e.g. primitives, references, abstract data types, generics, lifetimes, etc). TyS also has two more fields, flags and outer_exclusive_binder. They are convenient hacks for efficiency and summarize information about the type that we may want to know, but they don't come into the picture as much here.

Finally, ty::TySs are interned, so that ty::Ty can be a thin pointer-like type. This allows us to do cheap comparisons for equality, along with the other benefits of interning.

Allocating and working with types

To allocate a new type, you can use the various mk_ methods defined on the tcx. These have names that correspond mostly to the various kinds of types. For example:

let array_ty = tcx.mk_array(elem_ty, len * 2);

These methods all return a Ty<'tcx> – note that the lifetime you get back is the lifetime of the arena that this tcx has access to. Types are always canonicalized and interned (so we never allocate exactly the same type twice).

N.B. Because types are interned, it is possible to compare them for equality efficiently using == – however, this is almost never what you want to do unless you happen to be hashing and looking for duplicates. This is because often in Rust there are multiple ways to represent the same type, particularly once inference is involved. If you are going to be testing for type equality, you probably need to start looking into the inference code to do it right.
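To see concretely why interning enables cheap equality checks, here is a toy interner in plain Rust. This is only a sketch of the idea (the Interner type below is hypothetical; rustc's real interner lives in the tcx's arenas and interns types, not strings): once each distinct value is allocated exactly once, "are these equal?" reduces to a pointer comparison.

use std::collections::HashSet;

struct Interner {
    set: HashSet<&'static str>,
}

impl Interner {
    // Return the canonical copy of `s`, allocating it only the first time.
    fn intern(&mut self, s: &str) -> &'static str {
        if let Some(&interned) = self.set.get(s) {
            return interned;
        }
        let leaked: &'static str = Box::leak(s.to_owned().into_boxed_str());
        self.set.insert(leaked);
        leaked
    }
}

fn main() {
    let mut interner = Interner { set: HashSet::new() };
    let a = interner.intern("u32");
    let b = interner.intern("u32");
    // Both refer to the same allocation, so equality is a pointer comparison.
    assert!(std::ptr::eq(a, b));
}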

You can also find various common types in the tcx itself by accessing its fields: tcx.types.bool, tcx.types.char, etc. (See CommonTypes for more.)

ty::TyKind Variants

Note: TyKind is NOT the functional programming concept of Kind.

Whenever working with a Ty in the compiler, it is common to match on the kind of type:

fn foo(x: Ty<'tcx>) {
  match x.kind {
    ty::Array(elem_ty, len) => { /* ... */ }
    ty::Ref(region, pointee_ty, mutbl) => { /* ... */ }
    _ => { /* ... */ }
  }
}

The kind field is of type TyKind<'tcx>, which is an enum defining all of the different kinds of types in the compiler.

N.B. inspecting the kind field on types during type inference can be risky, as there may be inference variables and other things to consider, or sometimes types are not yet known and will become known later.

There are a lot of related types, and we’ll cover them in time (e.g regions/lifetimes, “substitutions”, etc).

There are many variants on the TyKind enum, which you can see by looking at its documentation. Here is a sampling:

  • Algebraic Data Types (ADTs) An algebraic data type is a struct, enum or union. Under the hood, struct, enum and union are actually implemented the same way: they are all ty::TyKind::Adt. It’s basically a user defined type. We will talk more about these later.
  • Foreign Corresponds to extern type T.
  • Str Is the type str. When the user writes &str, Str is how we represent the str part of that type.
  • Slice Corresponds to [T].
  • Array Corresponds to [T; n].
  • RawPtr Corresponds to *mut T or *const T.
  • Ref Ref stands for safe references, &'a mut T or &'a T. Ref has some associated parts: Ty<'tcx> is the type that the reference points to, Region<'tcx> is the lifetime or region of the reference, and Mutability indicates whether the reference is mutable.
  • Param Represents a type parameter (e.g. the T in Vec<T>).
  • Error Represents a type error somewhere so that we can print better diagnostics. We will discuss this more later.
  • And many more...

Import conventions

Although there is no hard and fast rule, the ty module tends to be used like so:

use ty::{self, Ty, TyCtxt};

In particular, since they are so common, the Ty and TyCtxt types are imported directly. Other types are often referenced with an explicit ty:: prefix (e.g. ty::TraitRef<'tcx>). But some modules choose to import a larger or smaller set of names explicitly.

ADTs Representation

Let's consider the example of a type like MyStruct<u32>, where MyStruct is defined like so:

struct MyStruct<T> { x: u32, y: T }

The type MyStruct<u32> would be an instance of TyKind::Adt:

Adt(&'tcx AdtDef, SubstsRef<'tcx>)
//  ------------  ---------------
//  (1)            (2)
//
// (1) represents the `MyStruct` part
// (2) represents the `<u32>`, or "substitutions" / generic arguments

There are two parts:

  • The AdtDef references the struct/enum/union but without the values for its type parameters. In our example, this is the MyStruct part without the argument u32. (Note that in the HIR, structs, enums and unions are represented differently, but in ty::Ty, they are all represented using TyKind::Adt.)
  • The SubstsRef is an interned list of values that are to be substituted for the generic parameters. In our example of MyStruct<u32>, we would end up with a list like [u32]. We’ll dig more into generics and substitutions in a little bit.

AdtDef and DefId

For every type defined in the source code, there is a unique DefId (see this chapter). This includes ADTs and generics. In the MyStruct<T> definition we gave above, there are two DefIds: one for MyStruct and one for T. Notice that the code above does not generate a new DefId for u32 because it is not defined in that code (it is only referenced).

AdtDef is more or less a wrapper around DefId with lots of useful helper methods. There is essentially a one-to-one relationship between AdtDef and DefId. You can get the AdtDef for a DefId with the tcx.adt_def(def_id) query. AdtDefs are all interned, as shown by the 'tcx lifetime.

Type errors

There is a TyKind::Error that is produced when the user makes a type error. The idea is that we would propagate this type and suppress other errors that come up due to it so as not to overwhelm the user with cascading compiler error messages.

There is an important invariant for TyKind::Error. The compiler should never produce Error unless we know that an error has already been reported to the user. This is usually because (a) you just reported it right there or (b) you are propagating an existing Error type (in which case the error should've been reported when that error type was produced).

It's important to maintain this invariant because the whole point of the Error type is to suppress other errors -- i.e., we don't report them. If we were to produce an Error type without actually emitting an error to the user, then this could cause later errors to be suppressed, and the compilation might inadvertently succeed!

Sometimes there is a third case. You believe that an error has been reported, but you believe it would've been reported earlier in the compilation, not locally. In that case, you can invoke delay_span_bug. This will make a note that you expect compilation to yield an error -- if, however, compilation should succeed, then it will trigger a compiler bug report.

For added safety, it's not actually possible to produce a TyKind::Error value outside of rustc_middle::ty; there is a private member of TyKind::Error that prevents it from being constructed elsewhere. Instead, one should use the TyCtxt::ty_error or TyCtxt::ty_error_with_message methods. These methods automatically call delay_span_bug before returning an interned Ty of kind Error. If you were already planning to use delay_span_bug, then you can just pass the span and message to ty_error_with_message instead, to avoid delaying a redundant span bug.

Question: Why not substitute “inside” the AdtDef?

Recall that we represent a generic struct with (AdtDef, substs). So why bother with this scheme?

Well, the alternative way we could have chosen to represent types would be to always create a new, fully-substituted form of the AdtDef where all the types are already substituted. This seems like less of a hassle. However, the (AdtDef, substs) scheme has some advantages over this.

First, the (AdtDef, substs) scheme has an efficiency win:

struct MyStruct<T> {
  ... 100s of fields ...
}

// Want to do: MyStruct<A> ==> MyStruct<B>

In an example like this, we can subst from MyStruct<A> to MyStruct<B> (and so on) very cheaply: we just replace the one reference to A with B. But if we eagerly substituted all the fields, that could be a lot more work because we might have to go through all of the fields in the AdtDef and update all of their types.

A bit more deeply, this corresponds to structs in Rust being nominal types — which means that they are defined by their name (and that their contents are then indexed from the definition of that name, and not carried along “within” the type itself).

Generics and substitutions

Given a generic type MyType<A, B, …>, we may want to swap out the generics A, B, … for some other types (possibly other generics or concrete types). We do this a lot while doing type inference, type checking, and trait solving. Conceptually, during these routines, we may find out that one type is equal to another type and want to swap one out for the other and then swap that out for another type and so on until we eventually get some concrete types (or an error).

In rustc this is done using the SubstsRef that we mentioned above ("substs" = "substitutions"). Conceptually, you can think of SubstsRef as a list of types that are to be substituted for the generic type parameters of the ADT.

SubstsRef is a type alias of List<GenericArg<'tcx>> (see List rustdocs). GenericArg is essentially a space-efficient wrapper around GenericArgKind, which is an enum indicating what kind of generic the type parameter is (type, lifetime, or const). Thus, SubstsRef is conceptually like a &'tcx [GenericArgKind<'tcx>] slice (but it is actually a List).

So why do we use this List type instead of making it really a slice? It has the length "inline", so &List is a thin pointer (a single machine word), not a (pointer, length) pair like an ordinary slice reference. As a consequence, it cannot be "subsliced" (that only works if the length is out of line).

This also implies that you can check two Lists for equality via == (which would not be possible for ordinary slices). This is precisely because they never represent a "sub-list", only the complete List, which has been hashed and interned.

So pulling it all together, let’s go back to our example above:

struct MyStruct<T>
  • There would be an AdtDef (and corresponding DefId) for MyStruct.
  • There would be a TyKind::Param (and corresponding DefId) for T (more later).
  • There would be a SubstsRef containing the list [GenericArgKind::Type(Ty(T))]
    • The Ty(T) here is shorthand for the entire other ty::Ty that has TyKind::Param, which we mentioned in the previous point.
  • Putting it together, there would be a TyKind::Adt containing the AdtDef of MyStruct together with the SubstsRef above.
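In the shorthand notation this chapter uses (illustrative notation, not real compiler syntax), the whole thing looks like this:

MyStruct<T>   ==>  Adt(MyStruct, &[Param(0)])
MyStruct<u32> ==>  Adt(MyStruct, &[u32])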

Finally, we will quickly mention the Generics type. It is used to give information about the type parameters of a type.

Unsubstituted Generics

So above, recall that in our example the MyStruct struct had a generic type T. When we are (for example) type checking functions that use MyStruct, we will need to be able to refer to this type T without actually knowing what it is. In general, this is true inside all generic definitions: we need to be able to work with unknown types. This is done via TyKind::Param (which we mentioned in the example above).

Each TyKind::Param contains two things: the name and the index. In general, the index fully defines the parameter and is used by most of the code; the name is included only for debug print-outs. There are two reasons for this. First, the index is convenient: when substituting, it allows you to index directly into the list of generic arguments. Second, the index is more robust. For example, you could in principle have two distinct type parameters that use the same name, e.g. impl<A> Foo<A> { fn bar<A>() { .. } }, although the rules against shadowing make this difficult (but those language rules could change in the future).

The index of the type parameter is an integer indicating its order in the list of the type parameters. Moreover, we consider the list to include all of the type parameters from outer scopes. Consider the following example:

struct Foo<A, B> {
  // A would have index 0
  // B would have index 1

  .. // some fields
}
impl<X, Y> Foo<X, Y> {
  fn method<Z>() {
    // inside here, X, Y and Z are all in scope
    // X has index 0
    // Y has index 1
    // Z has index 2
  }
}

When we are working inside the generic definition, we will use TyKind::Param just like any other TyKind; it is just a type after all. However, if we want to use the generic type somewhere, then we will need to do substitutions.

For example suppose that the Foo<A, B> type from the previous example has a field that is a Vec<A>. Observe that Vec is also a generic type. We want to tell the compiler that the type parameter of Vec should be replaced with the A type parameter of Foo<A, B>. We do that with substitutions:

struct Foo<A, B> { // Adt(Foo, &[Param(0), Param(1)])
  x: Vec<A>, // Adt(Vec, &[Param(0)])
  ..
}

fn bar(foo: Foo<u32, f32>) { // Adt(Foo, &[u32, f32])
  let y = foo.x; // Vec<Param(0)> => Vec<u32>
}

This example has a few different substitutions:

  • In the definition of Foo, in the type of the field x, we replace Vec's type parameter with Param(0), the first parameter of Foo<A, B>, so that the type of x is Vec<A>.
  • In the function bar, we specify that we want a Foo<u32, f32>. This means that we will substitute Param(0) and Param(1) with u32 and f32.
  • In the body of bar, we access foo.x, which has type Vec<Param(0)>, but Param(0) has been substituted for u32, so foo.x has type Vec<u32>.

Let's look a bit more closely at that last substitution to see why we use indexes. If we want to find the type of foo.x, we can get the generic type of x, which is Vec<Param(0)>. Now we can take the index 0 and use it to find the right type substitution: looking at Foo's SubstsRef, we have the list [u32, f32]. Since we want to replace index 0, we take the 0-th element of this list, which is u32. Voila!

You may have a couple of followup questions…

type_of

How do we get the "generic type of x"? You can get the type of pretty much anything with the tcx.type_of(def_id) query. In this case, we would pass the DefId of the field x. The type_of query always returns the definition with the generics that are in scope of the definition. For example, tcx.type_of(def_id_of_foo) would return the "self-view" of Foo: Adt(Foo, &[Param(0), Param(1)]).

subst

How do we actually do the substitutions? There is a function for that too! You use subst to apply a SubstsRef to a value, replacing each Param with the corresponding entry in the list.

Here is an example of actually using subst in the compiler. The exact details are not too important, but in this piece of code, we happen to be converting from the rustc_hir::Ty to a real ty::Ty. You can see that we first get some substitutions (substs). Then we call type_of to get a type and call ty.subst(substs) to get a new version of ty with the substitutions made.

Note on indices: It is possible for the indices in Param to not match with what we expect. For example, the index could be out of bounds or it could be the index of a lifetime when we were expecting a type. These sorts of errors would be caught earlier in the compiler when translating from a rustc_hir::Ty to a ty::Ty. If they occur later, that is a compiler bug.

TypeFoldable and TypeFolder

How is this subst operation actually implemented? As you can imagine, we might want to do substitutions on a lot of different things. For example, we might want to do a substitution directly on a type like we did with Vec above. But we might also have a more complex type with other types nested inside that also need substitutions.

The answer is a couple of traits: TypeFoldable and TypeFolder.

  • TypeFoldable is implemented by types that embed type information. It allows you to recursively process the contents of the TypeFoldable and do stuff to them.
  • TypeFolder defines what you want to do with the types you encounter while processing the TypeFoldable.

For example, the TypeFolder trait has a method fold_ty that takes a type as input and returns a new type as a result. TypeFoldable invokes the TypeFolder's fold_foo methods on itself, giving the TypeFolder access to its contents (the types, regions, etc. that are contained within).

You can think of it with this analogy to the iterator combinators we have come to love in Rust:

vec.iter().map(|e| foo(e)).collect()
//             ^^^^^^^^^^ analogous to `TypeFolder`
//         ^^^ analogous to `TypeFoldable`

So to reiterate:

  • TypeFolder is a trait that defines a “map” operation.
  • TypeFoldable is a trait that is implemented by things that embed types.

In the case of subst, we can see that it is implemented as a TypeFolder: SubstFolder. Looking at its implementation, we see where the actual substitutions are happening.

However, you might also notice that the implementation calls this super_fold_with method. What is that? It is a method of TypeFoldable. Consider the following TypeFoldable type MyFoldable:

struct MyFoldable<'tcx> {
  def_id: DefId,
  ty: Ty<'tcx>,
}

The TypeFolder can call super_fold_with on MyFoldable if it just wants to replace some of the fields of MyFoldable with new values. If it instead wants to replace the whole MyFoldable with a different one, it would call fold_with instead (a different method on TypeFoldable).

In almost all cases, we don’t want to replace the whole struct; we only want to replace ty::Tys in the struct, so usually we call super_fold_with. A typical implementation that MyFoldable could have might do something like this:

// Given a value and a substitution to apply:
//
//   my_foldable: MyFoldable<'tcx>
//   my_foldable.subst(..., subst)

impl<'tcx> TypeFoldable<'tcx> for MyFoldable<'tcx> {
  fn super_fold_with(&self, folder: &mut impl TypeFolder<'tcx>) -> Self {
    MyFoldable {
      def_id: self.def_id.fold_with(folder),
      ty: self.ty.fold_with(folder),
    }
  }

  fn super_visit_with(..) { }
}

Notice that here, we implement super_fold_with to go over the fields of MyFoldable and call fold_with on them. That is, a folder may replace def_id and ty, but not the whole MyFoldable struct.

Here is another example to put things together: suppose we have a type like Vec<Vec<X>>. The ty::Ty would look like: Adt(Vec, &[Adt(Vec, &[Param(X)])]). If we want to do subst(X => u32), then we would first look at the overall type. We would see that there are no substitutions to be made at the outer level, so we would descend one level and look at Adt(Vec, &[Param(X)]). There are still no substitutions to be made here, so we would descend again. Now we are looking at Param(X), which can be substituted, so we replace it with u32. We can’t descend any more, so we are done, and the overall result is Adt(Vec, &[Adt(Vec, &[u32])]).
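That walk is exactly a fold. Here is a toy model of it in plain Rust (a sketch, not compiler code: the little Ty enum below stands in for ty::Ty, and the Adt case plays the role of super_fold_with recursing into the contained types):

#[derive(Debug, Clone, PartialEq)]
enum Ty {
    U32,
    Param(usize),
    Adt(&'static str, Vec<Ty>),
}

fn subst(ty: &Ty, substs: &[Ty]) -> Ty {
    match ty {
        Ty::U32 => Ty::U32,
        // The interesting case: a parameter indexes into the substitution list.
        Ty::Param(i) => substs[*i].clone(),
        // Otherwise, recurse into the contained types.
        Ty::Adt(name, args) => {
            Ty::Adt(*name, args.iter().map(|a| subst(a, substs)).collect())
        }
    }
}

fn main() {
    // Vec<Vec<X>>, with X as Param(0), as in the example above.
    let ty = Ty::Adt("Vec", vec![Ty::Adt("Vec", vec![Ty::Param(0)])]);
    let result = subst(&ty, &[Ty::U32]);
    // subst(X => u32) yields Vec<Vec<u32>>.
    assert_eq!(result, Ty::Adt("Vec", vec![Ty::Adt("Vec", vec![Ty::U32])]));
}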

One last thing to mention: often when folding over a TypeFoldable, we don't want to change most things. We only want to do something when we reach a type. That means there may be a lot of TypeFoldable types whose implementations basically just forward to their fields' TypeFoldable implementations. Such implementations of TypeFoldable tend to be pretty tedious to write by hand. For this reason, there is a derive macro that allows you to #[derive(TypeFoldable)]. It is defined here.

subst

In the case of substitutions, the actual folder does the indexing we've already mentioned. We define a folder (SubstFolder) and call fold_with on the TypeFoldable to process it. The fold_ty method, which processes each type, looks for ty::Param types; for those, it substitutes in an entry from the list of substitutions, and otherwise it recursively processes the type. To do the replacement, it calls ty_for_param, and all that does is index into the list of substitutions with the index of the Param.

Generic arguments

A ty::subst::GenericArg<'tcx> represents some entity in the type system: a type (Ty<'tcx>), lifetime (ty::Region<'tcx>) or constant (ty::Const<'tcx>). GenericArg is used to perform substitutions of generic parameters for concrete arguments, such as when calling a function with generic parameters explicitly with type arguments. Substitutions are represented using the Subst type as described below.

Subst

ty::subst::Subst<'tcx> is intuitively simply a slice of GenericArg<'tcx>s, acting as an ordered list of substitutions from generic parameters to concrete arguments (such as types, lifetimes and consts).

For example, given a HashMap<K, V> with two type parameters, K and V, an instantiation of the parameters, for example HashMap<i32, u32>, would be represented by the substitution &'tcx [tcx.types.i32, tcx.types.u32].

Subst provides various convenience methods to instantiate substitutions given item definitions, which should generally be used rather than explicitly constructing such substitution slices.

GenericArg

The actual GenericArg struct is optimised for space, storing the type, lifetime or const as an interned pointer containing a tag identifying its kind (in the lowest 2 bits). Unless you are working with the Subst implementation specifically, you should generally not have to deal with GenericArg and instead make use of the safe GenericArgKind abstraction.

GenericArgKind

As GenericArg itself is not type-safe, the GenericArgKind enum provides a more convenient and safe interface for dealing with generic arguments. A GenericArgKind can be converted to a raw GenericArg using GenericArg::from() (or simply .into() when the context is clear). As mentioned earlier, substitution lists store raw GenericArgs, so before dealing with them, it is preferable to convert them to GenericArgKinds first. This is done by calling the .unpack() method.

// An example of unpacking and packing a generic argument.
fn deal_with_generic_arg<'tcx>(generic_arg: GenericArg<'tcx>) -> GenericArg<'tcx> {
    // Unpack a raw `GenericArg` to deal with it safely.
    let new_generic_arg: GenericArgKind<'tcx> = match generic_arg.unpack() {
        GenericArgKind::Type(ty) => { /* ... process `ty` ... */ GenericArgKind::Type(ty) }
        GenericArgKind::Lifetime(lt) => { /* ... */ GenericArgKind::Lifetime(lt) }
        GenericArgKind::Const(ct) => { /* ... */ GenericArgKind::Const(ct) }
    };
    // Pack the `GenericArgKind` to store it in a substitution list.
    new_generic_arg.into()
}

Type inference

Type inference is the process of automatic detection of the type of an expression.

It is what allows Rust to work with fewer or no type annotations, making things easier for users:

fn main() {
    let mut things = vec![];
    things.push("thing");
}

Here, the type of things is inferred to be Vec<&str> because of the value we push into things.
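Inference also flows "backwards" from annotations into expressions. For example, collect can build many different collection types, and it is the annotation that picks one:

fn main() {
    // The annotation on `squares` determines the result type of `collect`,
    // and from there the type of `n` and of `n * n`.
    let squares: Vec<u32> = (1..4).map(|n| n * n).collect();
    assert_eq!(squares, vec![1, 4, 9]);
}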

Type inference is based on the standard Hindley-Milner (HM) type inference algorithm, but extended in various ways to accommodate subtyping, region inference, and higher-ranked types.

A note on terminology

We use the notation ?T to refer to inference variables, also called existential variables.

We use the terms "region" and "lifetime" interchangeably. Both refer to the 'a in &'a T.

The term "bound region" refers to a region that is bound in a function signature, such as the 'a in for<'a> fn(&'a u32). A region is "free" if it is not bound.

Creating an inference context

You create and "enter" an inference context by doing something like the following:

tcx.infer_ctxt().enter(|infcx| {
    // Use the inference context `infcx` here.
})

Within the closure, infcx has the type InferCtxt<'cx, 'tcx> for some fresh 'cx, while 'tcx is the same as outside the inference context. (Again, see the ty chapter for more details on this setup.)

The tcx.infer_ctxt method actually returns a builder, which means there are some kinds of configuration you can do before the infcx is created. See InferCtxtBuilder for more information.

Inference variables

The main purpose of the inference context is to house a bunch of inference variables – these represent types or regions whose precise value is not yet known, but will be uncovered as we perform type-checking.

If you're familiar with the basic ideas of unification from H-M type systems, or logic languages like Prolog, this is the same concept. If you're not, you might want to read a tutorial on how H-M type inference works, or perhaps this blog post on unification in the Chalk project.

All told, the inference context stores four kinds of inference variables as of this writing:

  • Type variables, which come in three varieties:
    • General type variables (the most common). These can be unified with any type.
    • Integral type variables, which can only be unified with an integral type, and arise from an integer literal expression like 22.
    • Float type variables, which can only be unified with a float type, and arise from a float literal expression like 22.0.
  • Region variables, which represent lifetimes, and arise all over the place.
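For example, here is how the three varieties of type variables arise in plain Rust source (region variables arise from the references and borrows in such code):

fn main() {
    let x = 22;             // integral type variable: can only become an
                            // integer type (i32 by the default fallback)
    let y = 22.0;           // float type variable (f64 by the default fallback)
    let mut v = Vec::new(); // Vec<?T>, with a general type variable ?T
    v.push(x);              // unifies ?T with the type of x
    let _: (i32, f64, Vec<i32>) = (x, y, v);
}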

All the type variables work in much the same way: you can create a new type variable, and what you get is Ty<'tcx> representing an unresolved type ?T. Then later you can apply the various operations that the inferencer supports, such as equality or subtyping, and it will possibly instantiate (or bind) that ?T to a specific value as a result.
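If you have not seen unification before, here is a toy model of it in plain Rust (a sketch, not compiler code): a variable starts unbound, is instantiated the first time it is equated with something, and a later conflicting equation fails.

#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Bool,
    U32,
    Var(usize), // an inference variable: ?0, ?1, ...
}

// Follow bindings until we reach a concrete type or an unbound variable.
fn resolve(ty: &Ty, bindings: &[Option<Ty>]) -> Ty {
    match ty {
        Ty::Var(v) => match &bindings[*v] {
            Some(bound) => resolve(bound, bindings),
            None => ty.clone(),
        },
        _ => ty.clone(),
    }
}

fn unify(a: &Ty, b: &Ty, bindings: &mut Vec<Option<Ty>>) -> Result<(), String> {
    let (a, b) = (resolve(a, bindings), resolve(b, bindings));
    match (a, b) {
        // Already equal after resolution: nothing to do.
        (a, b) if a == b => Ok(()),
        // An unbound variable unifies with anything: record the binding.
        (Ty::Var(v), other) | (other, Ty::Var(v)) => {
            bindings[v] = Some(other);
            Ok(())
        }
        (a, b) => Err(format!("cannot unify {:?} with {:?}", a, b)),
    }
}

fn main() {
    let mut bindings = vec![None]; // one fresh variable: ?0
    let t = Ty::Var(0);
    unify(&t, &Ty::U32, &mut bindings).unwrap(); // instantiates ?0 := u32
    assert_eq!(resolve(&t, &bindings), Ty::U32);
    // Equating ?0 with bool now fails: it is already bound to u32.
    assert!(unify(&t, &Ty::Bool, &mut bindings).is_err());
}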

The region variables work somewhat differently, and are described below in a separate section.

Enforcing equality / subtyping

The most basic operation you can perform in the type inferencer is equality, which forces two types T and U to be the same. The recommended way to add an equality constraint is to use the at method, roughly like so:

infcx.at(...).eq(t, u);

The first at() call provides a bit of context, i.e. why you are doing this unification, and in what environment, and the eq method performs the actual equality constraint.

When you equate things, you force them to be precisely equal. Equating returns an InferResult – if it returns Err(err), then equating failed, and the enclosing TypeError will tell you what went wrong.

The success case is perhaps more interesting. The "primary" return type of eq is () – that is, when it succeeds, it doesn't return a value of any particular interest. Rather, it is executed for its side-effects of constraining type variables and so forth. However, the actual return type is not (), but rather InferOk<()>. The InferOk type is used to carry extra trait obligations – your job is to ensure that these are fulfilled (typically by enrolling them in a fulfillment context). See the trait chapter for more background on that.

You can similarly enforce subtyping through infcx.at(..).sub(..). The same basic concepts as above apply.

"Trying" equality

Sometimes you would like to know if it is possible to equate two types without error. You can test that with infcx.can_eq (or infcx.can_sub for subtyping). If this returns Ok, then equality is possible – but in all cases, any side-effects are reversed.

Be aware, though, that the success or failure of these methods is always modulo regions. That is, two types &'a u32 and &'b u32 will return Ok for can_eq, even if 'a != 'b. This falls out from the "two-phase" nature of how we solve region constraints.

Snapshots

As described in the previous section on can_eq, often it is useful to be able to do a series of operations and then roll back their side-effects. This is done for various reasons: one of them is to be able to backtrack, trying out multiple possibilities before settling on which path to take. Another is in order to ensure that a series of smaller changes take place atomically or not at all.

To allow for this, the inference context supports a snapshot method. When you call it, it will start recording changes that occur from the operations you perform. When you are done, you can either invoke rollback_to, which will undo those changes, or else confirm, which will make them permanent. Snapshots can be nested as long as you follow a stack-like discipline.

Rather than use snapshots directly, it is often helpful to use the methods like commit_if_ok or probe that encapsulate higher-level patterns.

Subtyping obligations

One thing worth discussing is subtyping obligations. When you force two types to be a subtype, like ?T <: i32, we can often convert those into equality constraints. This follows from Rust's rather limited notion of subtyping: so, in the above case, ?T <: i32 is equivalent to ?T = i32.

However, in some cases we have to be more careful. For example, when regions are involved. So if you have ?T <: &'a i32, what we would do is to first "generalize" &'a i32 into a type with a region variable: &'?b i32, and then unify ?T with that (?T = &'?b i32). We then relate this new variable with the original bound:

&'?b i32 <: &'a i32

This will result in a region constraint (see below) of '?b: 'a.

One final interesting case is relating two unbound type variables, like ?T <: ?U. In that case, we can't make progress, so we enqueue an obligation Subtype(?T, ?U) and return it via the InferOk mechanism. You'll have to try again when more details about ?T or ?U are known.

Region constraints

Regions are inferred somewhat differently from types. Rather than eagerly unifying things, we simply collect constraints as we go, but make (almost) no attempt to solve regions. These constraints have the form of an "outlives" constraint:

'a: 'b

Actually the code tends to view them as a subregion relation, but it's the same idea:

'b <= 'a

(There are various other kinds of constraints, such as "verifys"; see the region_constraints module for details.)

There is one case where we do some amount of eager unification. If you have an equality constraint between two regions

'a = 'b

we will record that fact in a unification table. You can then use opportunistic_resolve_var to convert 'b to 'a (or vice versa). This is sometimes needed to ensure termination of fixed-point algorithms.

Extracting region constraints

Ultimately, region constraints are only solved at the very end of type-checking, once all other constraints are known. There are two ways to solve region constraints right now: lexical and non-lexical. Eventually there will only be one.

To solve lexical region constraints, you invoke resolve_regions_and_report_errors. This "closes" the region constraint process and invokes the lexical_region_resolve code. Once this is done, any further attempt to equate or create a subtyping relationship will yield an ICE.

Non-lexical region constraints are not handled within the inference context. Instead, the NLL solver (actually, the MIR type-checker) invokes take_and_reset_region_constraints periodically. This extracts all of the outlives constraints from the region solver, but leaves the set of variables intact. This is used to get just the region constraints that resulted from some particular point in the program, since the NLL solver needs to know not just what regions were subregions, but also where. Finally, the NLL solver invokes take_region_var_origins, which "closes" the region constraint process in the same way as normal solving.

Lexical region resolution

Lexical region resolution is done by initially assigning each region variable to an empty value. We then process each outlives constraint repeatedly, growing region variables until a fixed-point is reached. Region variables can be grown using a least-upper-bound relation on the region lattice in a fairly straightforward fashion.
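Here is a toy model of that fixed-point computation in plain Rust (a sketch, not compiler code): regions are modeled as sets of program points, and an outlives constraint 'a: 'b is processed by growing 'a until it includes everything in 'b.

use std::collections::BTreeSet;

// Each constraint (a, b) encodes 'a: 'b, i.e. region a must grow to cover
// at least the points of region b.
fn solve(values: &mut Vec<BTreeSet<u32>>, constraints: &[(usize, usize)]) {
    let mut changed = true;
    while changed {
        changed = false;
        for &(a, b) in constraints {
            let missing: BTreeSet<u32> =
                values[b].difference(&values[a]).cloned().collect();
            if !missing.is_empty() {
                values[a].extend(missing);
                changed = true;
            }
        }
    }
}

fn main() {
    // Three region variables; '0 starts out containing program point 1.
    let mut values = vec![BTreeSet::from([1]), BTreeSet::new(), BTreeSet::new()];
    // '1: '0 and '2: '1, so both '1 and '2 must grow to include point 1.
    solve(&mut values, &[(1, 0), (2, 1)]);
    assert_eq!(values[1], BTreeSet::from([1]));
    assert_eq!(values[2], BTreeSet::from([1]));
}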

Trait resolution (old-style)

This chapter describes the general process of trait resolution and points out some non-obvious things.

Note: This chapter (and its subchapters) describe how the trait solver currently works. However, we are in the process of designing a new trait solver. If you'd prefer to read about that, see this subchapter.

Major concepts

Trait resolution is the process of pairing up an impl with each reference to a trait. So, for example, if there is a generic function like:

fn clone_slice<T:Clone>(x: &[T]) -> Vec<T> { ... }

and then a call to that function:

let v: Vec<isize> = clone_slice(&[1, 2, 3]);

it is the job of trait resolution to figure out whether there exists an impl of (in this case) isize : Clone.

Note that in some cases, like generic functions, we may not be able to find a specific impl, but we can figure out that the caller must provide an impl. For example, consider the body of clone_slice:

fn clone_slice<T: Clone>(x: &[T]) -> Vec<T> {
    let mut v = Vec::new();
    for e in x {
        v.push((*e).clone()); // (*)
    }
    v
}

The line marked (*) is only legal if T (the type of *e) implements the Clone trait. Naturally, since we don't know what T is, we can't find the specific impl; but based on the bound T:Clone, we can say that there exists an impl which the caller must provide.

We use the term obligation to refer to a trait reference in need of an impl. Basically, the trait resolution system resolves an obligation by proving that an appropriate impl does exist.

During type checking, we do not store the results of trait selection. We simply wish to verify that trait selection will succeed. Then later, at trans time, when we have all concrete types available, we can repeat the trait selection to choose an actual implementation, which will then be generated in the output binary.

Overview

Trait resolution consists of three major parts:

  • Selection: Deciding how to resolve a specific obligation. For example, selection might decide that a specific obligation can be resolved by employing an impl which matches the Self type, or by using a parameter bound (e.g. T: Trait). In the case of an impl, selecting one obligation can create nested obligations because of where clauses on the impl itself. It may also require evaluating those nested obligations to resolve ambiguities.

  • Fulfillment: The fulfillment code is what tracks that obligations are completely fulfilled. Basically it is a worklist of obligations to be selected: once selection is successful, the obligation is removed from the worklist and any nested obligations are enqueued.

  • Coherence: The coherence checks are intended to ensure that there are never overlapping impls, where two impls could be used with equal precedence.

Selection

Selection is the process of deciding whether an obligation can be resolved and, if so, how it is to be resolved (via impl, where clause, etc). The main interface is the select() function, which takes an obligation and returns a SelectionResult. There are three possible outcomes:

  • Ok(Some(selection)) – yes, the obligation can be resolved, and selection indicates how. If the impl was resolved via an impl, then selection may also indicate nested obligations that are required by the impl.

  • Ok(None) – we are not yet sure whether the obligation can be resolved or not. This happens most commonly when the obligation contains unbound type variables.

  • Err(err) – the obligation definitely cannot be resolved due to a type error or because there are no impls that could possibly apply.

The basic algorithm for selection is broken into two big phases: candidate assembly and confirmation.

Note that because of how lifetime inference works, it is not possible to give back immediate feedback as to whether a unification or subtype relationship between lifetimes holds or not. Therefore, lifetime matching is not considered during selection. This is reflected in the fact that subregion assignment is infallible. This may yield lifetime constraints that will later be found to be in error (in contrast, the non-lifetime-constraints have already been checked during selection and can never cause an error, though naturally they may lead to other errors downstream).

Candidate assembly

Searches for impls/where-clauses/etc that might possibly be used to satisfy the obligation. Each of those is called a candidate. To avoid ambiguity, we want to find exactly one candidate that is definitively applicable. In some cases, we may not know whether an impl/where-clause applies or not – this occurs when the obligation contains unbound inference variables.

The subroutines that decide whether a particular impl/where-clause/etc applies to a particular obligation are collectively referred to as the process of matching. At the moment, this amounts to unifying the Self types, but in the future we may also recursively consider some of the nested obligations, in the case of an impl.

TODO: what does "unifying the Self types" mean? The Self of the obligation with that of an impl?

The basic idea for candidate assembly is to do a first pass in which we identify all possible candidates. During this pass, all that we do is try and unify the type parameters. (In particular, we ignore any nested where clauses.) Presuming that this unification succeeds, the impl is added as a candidate.

Once this first pass is done, we can examine the set of candidates. If it is a singleton set, then we are done: this is the only impl in scope that could possibly apply. Otherwise, we can winnow down the set of candidates by using where clauses and other conditions. If this reduced set yields a single, unambiguous entry, we're good to go, otherwise the result is considered ambiguous.

The basic process: Inferring based on the impls we see

This process is easier if we work through some examples. Consider the following trait:

trait Convert<Target> {
    fn convert(&self) -> Target;
}

This trait just has one method. It's about as simple as it gets. It converts from the (implicit) Self type to the Target type. If we wanted to permit conversion between isize and usize, we might implement Convert like so:

impl Convert<usize> for isize { ... } // isize -> usize
impl Convert<isize> for usize { ... } // usize -> isize

Now imagine there is some code like the following:

let x: isize = ...;
let y = x.convert();

The call to convert will generate a trait reference Convert<$Y> for isize, where $Y is the type variable representing the type of y. Of the two impls we can see, the only one that matches is Convert<usize> for isize. Therefore, we can select this impl, which will cause the type of $Y to be unified to usize. (Note that while assembling candidates, we do the initial unifications in a transaction, so that they don't affect one another.)

TODO: The example says we can "select" the impl, but this section is talking specifically about candidate assembly. Does this mean we can sometimes skip confirmation? Or is this poor wording? TODO: Is the unification of $Y part of trait resolution or type inference? Or is this not the same type of "inference variable" as in type inference?

Winnowing: Resolving ambiguities

But what happens if there are multiple impls where all the types unify? Consider this example:

trait Get {
    fn get(&self) -> Self;
}

impl<T: Copy> Get for T {
    fn get(&self) -> T {
        *self
    }
}

impl<T: Get> Get for Box<T> {
    fn get(&self) -> Box<T> {
        Box::new(<T>::get(self))
    }
}

// The caller discussed below; it is not shown in the original snippet, but a
// plausible definition is:
fn get_it<T: Get>(t: &T) -> T {
    t.get()
}

What happens when we invoke get_it(&Box::new(1_u16)), for example? In this case, the Self type is Box<u16> – that unifies with both impls, because the first applies to all types T, and the second to all Box<T>. In order for this to be unambiguous, the compiler does a winnowing pass that considers where clauses and attempts to remove candidates. In this case, the first impl only applies if Box<u16> : Copy, which doesn't hold. After winnowing, then, we are left with just one candidate, so we can proceed.

where clauses

Besides an impl, the other major way to resolve an obligation is via a where clause. The selection process is always given a parameter environment which contains a list of where clauses, which are basically obligations that we can assume are satisfiable. We will iterate over that list and check whether our current obligation can be found in that list. If so, it is considered satisfied. More precisely, we want to check whether there is a where-clause obligation that is for the same trait (or some subtrait) and which can match against the obligation.

Consider this simple example:

trait A1 {
    fn do_a1(&self);
}
trait A2 : A1 { ... }

trait B {
    fn do_b(&self);
}

fn foo<X:A2+B>(x: X) {
    x.do_a1(); // (*)
    x.do_b();  // (#)
}

In the body of foo, clearly we can use methods of A1, A2, or B on variable x. The line marked (*) will incur an obligation X: A1, while the line marked (#) will incur an obligation X: B. Meanwhile, the parameter environment will contain two where-clauses: X : A2 and X : B. For each obligation, then, we search this list of where-clauses. The obligation X: B trivially matches against the where-clause X: B. To resolve an obligation X:A1, we would note that X:A2 implies that X:A1.

Confirmation

Confirmation unifies the output type parameters of the trait with the values found in the obligation, possibly yielding a type error.

Suppose we have the following variation of the Convert example in the previous section:

trait Convert<Target> {
    fn convert(&self) -> Target;
}

impl Convert<usize> for isize { ... } // isize -> usize
impl Convert<isize> for usize { ... } // usize -> isize

let x: isize = ...;
let y: char = x.convert(); // NOTE: `y: char` now!

Confirmation is where an error would be reported because the impl specified that Target would be usize, but the obligation reported char. Hence the result of selection would be an error.

Note that the candidate impl is chosen based on the Self type, but confirmation is done based on (in this case) the Target type parameter.

Selection during translation

As mentioned above, during type checking, we do not store the results of trait selection. At trans time, we repeat the trait selection to choose a particular impl for each method call. In this second selection, we do not consider any where-clauses to be in scope because we know that each resolution will resolve to a particular impl.

One interesting twist has to do with nested obligations. In general, in trans, we only need to do a "shallow" selection for an obligation. That is, we wish to identify which impl applies, but we do not (yet) need to decide how to select any nested obligations. Nonetheless, we do currently do a complete resolution, and that is because it can sometimes inform the results of type inference. That is, we do not have the full substitutions in terms of the type variables of the impl available to us, so we must run trait selection to figure everything out.

TODO: is this still talking about trans?

Here is an example:

trait Foo { ... }
impl<U, T:Bar<U>> Foo for Vec<T> { ... }

impl Bar<usize> for isize { ... }

After one shallow round of selection for an obligation like Vec<isize> : Foo, we would know which impl we want, and we would know that T=isize, but we do not know the type of U. We must select the nested obligation isize : Bar<U> to find out that U=usize.

It would be good to only do just as much nested resolution as necessary. Currently, though, we just do a full resolution.

Early and Late Bound Variables

In Rust, item definitions (like fn) can often have generic parameters, which are always universally quantified. That is, if you have a function like


fn foo<T>(x: T) { }

this function is defined "for all T" (not "for some specific T", which would be existentially quantified).

While Rust items can be quantified over types, lifetimes, and constants, the types of values in Rust are only ever quantified over lifetimes. So you can have a type like for<'a> fn(&'a u32), which represents a function pointer that takes a reference with any lifetime, or for<'a> dyn Trait<'a>, which is a dyn trait for a trait implemented for any lifetime; but we have no type like for<T> fn(T), which would be a function that takes a value of any type as a parameter. This is a consequence of monomorphization -- to support a value of type for<T> fn(T), we would need a single function pointer that can be used for a parameter of any type, but in Rust we generate customized code for each parameter type.

One consequence of this asymmetry is a weird split in how we represent some generic parameters: early- and late-bound parameters. Basically, if we cannot represent a type (e.g. a universally quantified type), we have to bind it early so that the unrepresentable type never arises.

Consider the following example:

fn foo<'a, 'b, T>(x: &'a u32, y: &'b T) where T: 'b { ... }

We cannot treat 'a, 'b, and T in the same way. Types in Rust can't have for<T> { .. }, only for<'a> { .. }, so whenever you reference foo, the type you get back can't be for<'a, 'b, T> fn(&'a u32, &'b T). Instead, T (and 'b, because of the T: 'b bound) must be substituted early. In particular, you have:

let x = foo; // T, 'b have to be substituted here
x(...);      // 'a substituted here, at the point of call
x(...);      // 'a substituted here with a different value
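The same split can be seen in runnable surface Rust (a hypothetical foo; the point is only when each parameter gets fixed):

fn foo<'a, T>(x: &'a u32, y: T) -> &'a u32 {
    let _ = y;
    x
}

fn main() {
    // T is early-bound: it is fixed at the point where we name `foo` as a value.
    let f = foo::<String>;
    // 'a is late-bound: each call can substitute a different lifetime.
    let a = 1;
    let _r1 = f(&a, String::new());
    {
        let b = 2;
        let _r2 = f(&b, String::new()); // a different (shorter) lifetime here
    }
}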

Early-bound parameters

Early-bound parameters in rustc are identified by an index, stored in the ParamTy struct for types or the EarlyBoundRegion struct for lifetimes. The index counts from the outermost declaration in scope. This means that as you add more binders inside, the index doesn't change.

For example,

trait Foo<T> {
  type Bar<U> = (Self, T, U);
}

Here, the type (Self, T, U) would be ($0, $1, $2), where $N means a ParamTy with the index of N.

In rustc, the Generics structure carries this information. So the Generics for Bar above would list just U, and it would point to the 'parent' generics of Foo, which declares Self and T. You can read more in this chapter.

Late-bound parameters

Late-bound parameters in rustc are handled quite differently (they are also specialized to lifetimes since, right now, only late-bound lifetimes are supported, though with GATs that will have to change). We indicate their potential presence by a Binder type. The Binder doesn't know how many variables there are at that binding level; this can only be determined by walking the type itself and collecting them. So a type like for<'a, 'b> ('a, 'b) would be for (^0.a, ^0.b). Here, we just write for because we don't know the names of the things bound within.

Moreover, a reference to a late-bound lifetime is written ^0.a:

  • The 0 is the index; it identifies that this lifetime is bound in the innermost binder (the for).
  • The a is the "name"; late-bound lifetimes in rustc are identified by a "name" -- the BoundRegion struct. This struct can contain a DefId or it might have various "anonymous" numbered names. The latter arise from types like fn(&u32, &u32), which are equivalent to something like for<'a, 'b> fn(&'a u32, &'b u32), but the names of those lifetimes must be generated.

This setup of not knowing the full set of variables at a binding level has some advantages and some disadvantages. The disadvantage is that you must walk the type to find out what is bound at the given level and so forth. The advantage is primarily that, when constructing types from Rust syntax, if we encounter anonymous regions like in fn(&u32), we just create a fresh index and don't have to update the binder.

Higher-ranked trait bounds

One of the more subtle concepts in trait resolution is higher-ranked trait bounds. An example of such a bound is for<'a> MyTrait<&'a isize>. Let's walk through how selection on higher-ranked trait references works.

Basic matching and placeholder leaks

Suppose we have a trait Foo:


trait Foo<X> {
    fn foo(&self, x: X) { }
}

Let's say we have a function want_hrtb that wants a type which implements Foo<&'a isize> for any 'a:

fn want_hrtb<T>() where T : for<'a> Foo<&'a isize> { ... }

Now we have a struct AnyInt that implements Foo<&'a isize> for any 'a:

struct AnyInt;
impl<'a> Foo<&'a isize> for AnyInt { }

And the question is, does AnyInt : for<'a> Foo<&'a isize>? We want the answer to be yes. The algorithm for figuring it out is closely related to the subtyping for higher-ranked types (which is described here and also in a paper by SPJ. If you wish to understand higher-ranked subtyping, we recommend you read the paper). There are a few parts:

  1. Replace bound regions in the obligation with placeholders.
  2. Match the impl against the placeholder obligation.
  3. Check for placeholder leaks.

So let's work through our example.

  1. The first thing we would do is to replace the bound region in the obligation with a placeholder, yielding AnyInt : Foo<&'0 isize> (here '0 represents placeholder region #0). Note that we now have no quantifiers; in terms of the compiler types, this changes from a ty::PolyTraitRef to a TraitRef. We would then create the TraitRef from the impl, using fresh variables for its bound regions (and thus getting Foo<&'$a isize>, where '$a is the inference variable for 'a).

  2. Next we relate the two trait refs, yielding a graph with the constraint that '0 == '$a.

  3. Finally, we check for placeholder "leaks" – a leak is basically any attempt to relate a placeholder region to another placeholder region, or to any region that pre-existed the impl match. The leak check is done by searching from the placeholder region to find the set of regions that it is related to in any way. This is called the "taint" set. To pass the check, that set must consist solely of itself and region variables from the impl. If the taint set includes any other region, then the match is a failure. In this case, the taint set for '0 is {'0, '$a}, and hence the check will succeed.

Let's consider a failure case. Imagine we also have a struct

struct StaticInt;
impl Foo<&'static isize> for StaticInt { }

We want the obligation StaticInt : for<'a> Foo<&'a isize> to be considered unsatisfied. The check begins just as before. 'a is replaced with a placeholder '0 and the impl trait reference is instantiated to Foo<&'static isize>. When we relate those two, we get a constraint like 'static == '0. This means that the taint set for '0 is {'0, 'static}, which fails the leak check.
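Both outcomes can be checked directly against the compiler. Here is a self-contained version of this section's definitions (uncommenting the last line reproduces the failure):

trait Foo<X> {
    fn foo(&self, _x: X) { }
}

fn want_hrtb<T>() where T: for<'a> Foo<&'a isize> { }

struct AnyInt;
impl<'a> Foo<&'a isize> for AnyInt { }

struct StaticInt;
impl Foo<&'static isize> for StaticInt { }

fn main() {
    want_hrtb::<AnyInt>(); // OK: AnyInt implements Foo<&'a isize> for any 'a.
    // want_hrtb::<StaticInt>(); // ERROR: implementation is not general enough.
}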

TODO: This is because 'static is not a region variable but is in the taint set, right?

Higher-ranked trait obligations

Once the basic matching is done, we get to another interesting topic: how to deal with impl obligations. I'll work through a simple example here. Imagine we have the traits Foo and Bar and an associated impl:


trait Foo<X> {
    fn foo(&self, x: X) { }
}

trait Bar<X> {
    fn bar(&self, x: X) { }
}

impl<X, F> Foo<X> for F
    where F: Bar<X>
{
}

Now let's say we have an obligation Baz: for<'a> Foo<&'a isize> and we match this impl. What obligation is generated as a result? We want to get Baz: for<'a> Bar<&'a isize>, but how does that happen?

After the matching, we are in a position where we have a placeholder substitution like X => &'0 isize. If we apply this substitution to the impl obligations, we get F : Bar<&'0 isize>. Obviously this is not directly usable because the placeholder region '0 cannot leak out of our computation.

What we do is to create an inverse mapping from the taint set of '0 back to the original bound region ('a, here) that '0 resulted from. (This is done in higher_ranked::plug_leaks). We know that the leak check passed, so this taint set consists solely of the placeholder region itself plus various intermediate region variables. We then walk the trait-reference and convert every region in that taint set back to a late-bound region, so in this case we'd wind up with Baz: for<'a> Bar<&'a isize>.

Caching and subtle considerations therewith

In general, we attempt to cache the results of trait selection. This is a somewhat complex process. Part of the reason for this is that we want to be able to cache results even when all the types in the trait reference are not fully known. In that case, it may happen that the trait selection process is also influencing type variables, so we have to be able to not only cache the result of the selection process, but replay its effects on the type variables.

An example

The high-level idea of how the cache works is that we first replace all unbound inference variables with placeholder versions. Therefore, if we had a trait reference usize : Foo<$t>, where $t is an unbound inference variable, we might replace it with usize : Foo<$0>, where $0 is a placeholder type. We would then look this up in the cache.

If we found a hit, the hit would tell us the immediate next step to take in the selection process (e.g. apply impl #22, or apply where clause X : Foo<Y>).

On the other hand, if there is no hit, we need to go through the selection process from scratch. Suppose, we come to the conclusion that the only possible impl is this one, with def-id 22:

impl Foo<isize> for usize { ... } // Impl #22

We would then record in the cache usize : Foo<$0> => ImplCandidate(22). Next we would confirm ImplCandidate(22), which would (as a side-effect) unify $t with isize.

Now, at some later time, we might come along and see a usize : Foo<$u>. When replaced with a placeholder, this would yield usize : Foo<$0>, just as before, and hence the cache lookup would succeed, yielding ImplCandidate(22). We would confirm ImplCandidate(22) which would (as a side-effect) unify $u with isize.

Where clauses and the local vs global cache

One subtle interaction is that the results of trait lookup will vary depending on what where clauses are in scope. Therefore, we actually have two caches, a local and a global cache. The local cache is attached to the ParamEnv, and the global cache is attached to the tcx. We use the local cache whenever the result might depend on the where clauses that are in scope. The determination of which cache to use is done by the method pick_candidate_cache in select.rs. At the moment, we use a very simple, conservative rule: if there are any where-clauses in scope, then we use the local cache. We used to try to draw finer-grained distinctions, but that led to a series of annoying and weird bugs like #22019 and #18290. This simple rule seems to be pretty clearly safe and also still retains a very high hit rate (~95% when compiling rustc).

TODO: it looks like pick_candidate_cache no longer exists. In general, is this section still accurate at all?

Specialization

TODO: where does Chalk fit in? Should we mention/discuss it here?

Defined in the specialize module.

The basic strategy is to build up a specialization graph during coherence checking (recall that coherence checking looks for overlapping impls). Insertion into the graph locates the right place to put an impl in the specialization hierarchy; if there is no right place (due to partial overlap but no containment), you get an overlap error. Specialization is consulted when selecting an impl (of course), and the graph is consulted when propagating defaults down the specialization hierarchy.

You might expect that the specialization graph would be used during selection – i.e. when actually performing specialization. This is not done for two reasons:

  • It's merely an optimization: given a set of candidates that apply, we can determine the most specialized one by comparing them directly for specialization, rather than consulting the graph. Given that we also cache the results of selection, the benefit of this optimization is questionable.

  • To build the specialization graph in the first place, we need to use selection (because we need to determine whether one impl specializes another). Dealing with this reentrancy would require some additional mode switch for selection. Given that there seems to be no strong reason to use the graph anyway, we stick with a simpler approach in selection, and use the graph only for propagating default implementations.

Trait impl selection can succeed even when multiple impls can apply, as long as they are part of the same specialization family. In that case, it returns a single impl on success – this is the most specialized impl known to apply. However, if there are any inference variables in play, the returned impl may not be the actual impl we will use at trans time. Thus, we take special care to avoid projecting associated types unless either (1) the associated type does not use default and thus cannot be overridden or (2) all input types are known concretely.
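
To get a concrete feel for a "specialization family", here is a minimal sketch using the unstable specialization feature on nightly (the trait and impls are invented for illustration):

#![feature(specialization)]

trait Print {
    fn print(&self);
}

// A blanket impl providing a `default` (i.e. specializable) method.
impl<T> Print for T {
    default fn print(&self) { println!("something"); }
}

// A more specialized impl; it is fully contained by the blanket impl's
// scope, so insertion into the specialization graph succeeds.
impl Print for i32 {
    fn print(&self) { println!("i32: {}", self); }
}

fn main() {
    // Both impls apply to `i32`, but they form a specialization family,
    // so selection succeeds and picks the most specialized one.
    42_i32.print();
}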

Additional Resources

This talk by @sunjay may be useful. Keep in mind that the talk only gives a broad overview of the problem and the solution (it was presented about halfway through @sunjay's work). Also, it was given in June 2018, and some things may have changed by the time you watch it.

Chalk-based trait solving

Chalk is an experimental trait solver for Rust that is currently under development by the Traits Working Group. Its goal is to enable a lot of trait system features and bug fixes that are currently hard to implement (e.g. GATs or specialization). If you would like to help in hacking on the new solver, you will find instructions for getting involved in the Traits Working Group tracking issue.

The new-style trait solver is based on the work done in chalk. Chalk recasts Rust's trait system explicitly in terms of logic programming. It does this by "lowering" Rust code into a kind of logic program we can then execute queries against.

The key observation here is that the Rust trait system is basically a kind of logic, and it can be mapped onto standard logical inference rules. We can then look for solutions to those inference rules in a very similar fashion to how e.g. a Prolog solver works. It turns out that we can't quite use Prolog rules (also called Horn clauses) but rather need a somewhat more expressive variant.

You can read more about chalk itself in the Chalk book section.

Ongoing work

The design of the new-style trait solving currently happens in two places:

chalk. The chalk repository is where we experiment with new ideas and designs for the trait system.

rustc. Once we are happy with the logical rules, we proceed to implementing them in rustc. We map our struct, trait, and impl declarations into logical inference rules in the lowering module in rustc.

Lowering to logic


Rust traits and logic

One of the first observations is that the Rust trait system is basically a kind of logic. As such, we can map our struct, trait, and impl declarations into logical inference rules. For the most part, these are basically Horn clauses, though we'll see that to capture the full richness of Rust – and in particular to support generic programming – we have to go a bit further than standard Horn clauses.

To see how this mapping works, let's start with an example. Imagine we declare a trait and a few impls, like so:


trait Clone { }
impl Clone for usize { }
impl<T> Clone for Vec<T> where T: Clone { }

We could map these declarations to some Horn clauses, written in a Prolog-like notation, as follows:

Clone(usize).
Clone(Vec<?T>) :- Clone(?T).

// The notation `A :- B` means "A is true if B is true".
// Or, put another way, B implies A.

In Prolog terms, we might say that Clone(Foo) – where Foo is some Rust type – is a predicate that represents the idea that the type Foo implements Clone. These rules are program clauses; they state the conditions under which that predicate can be proven (i.e., considered true). So the first rule just says "Clone is implemented for usize". The next rule says "for any type ?T, Clone is implemented for Vec<?T> if Clone is implemented for ?T". So e.g. if we wanted to prove that Clone(Vec<Vec<usize>>), we would do so by applying the rules recursively:

  • Clone(Vec<Vec<usize>>) is provable if:
    • Clone(Vec<usize>) is provable if:
      • Clone(usize) is provable. (Which it is, so we're all good.)

But now suppose we tried to prove that Clone(Vec<Bar>). This would fail (after all, I didn't give an impl of Clone for Bar):

  • Clone(Vec<Bar>) is provable if:
    • Clone(Bar) is provable. (But it is not, as there are no applicable rules.)

We can easily extend the example above to cover generic traits with more than one input type. So imagine the Eq<T> trait, which declares that Self is equatable with a value of type T:

trait Eq<T> { ... }
impl Eq<usize> for usize { }
impl<T: Eq<U>> Eq<Vec<U>> for Vec<T> { }

That could be mapped as follows:

Eq(usize, usize).
Eq(Vec<?T>, Vec<?U>) :- Eq(?T, ?U).

So far so good.

Type-checking normal functions

OK, now that we have defined some logical rules that are able to express when traits are implemented and to handle associated types, let's turn our focus a bit towards type-checking. Type-checking is interesting because it is what gives us the goals that we need to prove. That is, everything we've seen so far has been about how we derive the rules by which we can prove goals from the traits and impls in the program; but we are also interested in how to derive the goals that we need to prove, and those come from type-checking.

Consider type-checking the function foo() here:

fn foo() { bar::<usize>() }
fn bar<U: Eq<U>>() { }

This function is very simple, of course: all it does is to call bar::<usize>(). Now, looking at the definition of bar(), we can see that it has one where-clause U: Eq<U>. So, that means that foo() will have to prove that usize: Eq<usize> in order to show that it can call bar() with usize as the type argument.

If we wanted, we could write a Prolog predicate that defines the conditions under which bar() can be called. We'll say that those conditions are called being "well-formed":

barWellFormed(?U) :- Eq(?U, ?U).

Then we can say that foo() type-checks if the reference to bar::<usize> (that is, bar() applied to the type usize) is well-formed:

fooTypeChecks :- barWellFormed(usize).

If we try to prove the goal fooTypeChecks, it will succeed:

  • fooTypeChecks is provable if:
    • barWellFormed(usize), which is provable if:
      • Eq(usize, usize), which is provable because of an impl.

Ok, so far so good. Let's move on to type-checking a more complex function.

Type-checking generic functions: beyond Horn clauses

In the last section, we used standard Prolog Horn clauses (augmented with Rust's notion of type equality) to type-check some simple Rust functions. But that only works when we are type-checking non-generic functions. If we want to type-check a generic function, it turns out we need a stronger notion of goal than what Prolog can provide. To see what I'm talking about, let's revamp our previous example to make foo generic:

fn foo<T: Eq<T>>() { bar::<T>() }
fn bar<U: Eq<U>>() { }

To type-check the body of foo, we need to be able to hold the type T "abstract". That is, we need to check that the body of foo is type-safe for all types T, not just for some specific type. We might express this like so:

fooTypeChecks :-
  // for all types T...
  forall<T> {
    // ...if we assume that Eq(T, T) is provable...
    if (Eq(T, T)) {
      // ...then we can prove that `barWellFormed(T)` holds.
      barWellFormed(T)
    }
  }.

This notation I'm using here is the notation I've been using in my prototype implementation; it's similar to standard mathematical notation but a bit Rustified. Anyway, the problem is that standard Horn clauses don't allow universal quantification (forall) or implication (if) in goals (though many Prolog engines do support them, as an extension). For this reason, we need to accept something called "first-order hereditary Harrop" (FOHH) clauses – this long name basically means "standard Horn clauses with forall and if in the body". But it's nice to know the proper name, because there is a lot of work describing how to efficiently handle FOHH clauses; see for example Gopalan Nadathur's excellent "A Proof Procedure for the Logic of Hereditary Harrop Formulas" in the bibliography of the Chalk Book.

It turns out that supporting FOHH is not really all that hard. And once we are able to do that, we can easily describe the type-checking rule for generic functions like foo in our logic.

Source

This page is a lightly adapted version of a blog post by Nicholas Matsakis.

Goals and clauses

In logic programming terms, a goal is something that you must prove and a clause is something that you know is true. As described in the lowering to logic chapter, Rust's trait solver is based on an extension of hereditary Harrop (HH) clauses, which extend traditional Prolog Horn clauses with a few new superpowers.

Goals and clauses meta structure

In Rust's solver, goals and clauses have the following forms (note that the two definitions reference one another):

Goal = DomainGoal           // defined in the section below
        | Goal && Goal
        | Goal || Goal
        | exists<K> { Goal }   // existential quantification
        | forall<K> { Goal }   // universal quantification
        | if (Clause) { Goal } // implication
        | true                 // something that's trivially true
        | ambiguous            // something that's never provable

Clause = DomainGoal
        | Clause :- Goal     // if can prove Goal, then Clause is true
        | Clause && Clause
        | forall<K> { Clause }

K = <type>     // a "kind"
    | <lifetime>

The proof procedure for these sorts of goals is actually quite straightforward. Essentially, it's a form of depth-first search. The paper "A Proof Procedure for the Logic of Hereditary Harrop Formulas" gives the details.

In terms of code, these types are defined in rustc_middle/src/traits/mod.rs in rustc, and in chalk-ir/src/lib.rs in chalk.

Domain goals

Domain goals are the atoms of the trait logic. As can be seen in the definitions given above, general goals basically consist of a combination of domain goals.

Moreover, flattening a bit the definition of clauses given previously, one can see that clauses are always of the form:

forall<K1, ..., Kn> { DomainGoal :- Goal }

hence domain goals are in fact clauses' LHS. That is, at the most granular level, domain goals are what the trait solver will end up trying to prove.

To define the set of domain goals in our system, we need to first introduce a few simple formulations. A trait reference consists of the name of a trait along with a suitable set of inputs P0..Pn:

TraitRef = P0: TraitName<P1..Pn>

So, for example, u32: Display is a trait reference, as is Vec<T>: IntoIterator. Note that Rust surface syntax also permits some extra things, like associated type bindings (Vec<T>: IntoIterator<Item = T>), that are not part of a trait reference.

A projection consists of an associated item reference along with its inputs P0..Pm:

Projection = <P0 as TraitName<P1..Pn>>::AssocItem<Pn+1..Pm>

Given these, we can define a DomainGoal as follows:

DomainGoal = Holds(WhereClause)
            | FromEnv(TraitRef)
            | FromEnv(Type)
            | WellFormed(TraitRef)
            | WellFormed(Type)
            | Normalize(Projection -> Type)

WhereClause = Implemented(TraitRef)
            | ProjectionEq(Projection = Type)
            | Outlives(Type: Region)
            | Outlives(Region: Region)

WhereClause refers to a where clause that a Rust user would actually be able to write in a Rust program. This abstraction exists only as a convenience as we sometimes want to only deal with domain goals that are effectively writable in Rust.
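
To illustrate, each WhereClause variant corresponds to a bound that can literally be written in surface Rust (a sketch; the function is invented):

fn example<'a, 'b, T>()
where
    T: Clone,               // Implemented(T: Clone)
    T: Iterator<Item = u8>, // ProjectionEq(<T as Iterator>::Item = u8)
    T: 'a,                  // Outlives(T: 'a)
    'a: 'b,                 // Outlives('a: 'b)
{
}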

Let's break down each one of these, one-by-one.

Implemented(TraitRef)

e.g. Implemented(i32: Copy)

True if the given trait is implemented for the given input types and lifetimes.

ProjectionEq(Projection = Type)

e.g. ProjectionEq(<T as Iterator>::Item = u8)

The given associated type Projection is equal to Type; this can be proved with either normalization or using placeholder associated types. See the section on associated types in Chalk Book.

Normalize(Projection -> Type)

e.g. Normalize(<T as Iterator>::Item -> u8)

The given associated type Projection can be normalized to Type.

As discussed in the section on associated types in Chalk Book, Normalize implies ProjectionEq, but not vice versa. In general, proving Normalize(<T as Trait>::Item -> U) also requires proving Implemented(T: Trait).

FromEnv(TraitRef)

e.g. FromEnv(Self: Add<i32>)

True if the inner TraitRef is assumed to be true, that is, if it can be derived from the in-scope where clauses.

For example, given the following function:


fn loud_clone<T: Clone>(stuff: &T) -> T {
    println!("cloning!");
    stuff.clone()
}

Inside the body of our function, we would have FromEnv(T: Clone). In-scope where clauses nest, so a function body inside an impl body inherits the impl body's where clauses, too.
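
As a quick illustration of that nesting (a hypothetical type, for illustration only):

struct Wrapper<T: Clone>(T);

impl<T: Clone> Wrapper<T> {
    // Inside this body we have `FromEnv(T: Clone)` inherited from the
    // impl's where clauses, in addition to any bounds the method itself
    // declares.
    fn duplicate(&self) -> (T, T) {
        (self.0.clone(), self.0.clone())
    }
}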

This and the next rule are used to implement implied bounds. As we'll see in the section on lowering, FromEnv(TraitRef) implies Implemented(TraitRef), but not vice versa. This distinction is crucial to implied bounds.

FromEnv(Type)

e.g. FromEnv(HashSet<K>)

True if the inner Type is assumed to be well-formed, that is, if it is an input type of a function or an impl.

For example, given the following code:

struct HashSet<K> where K: Hash { ... }

fn loud_insert<K>(set: &mut HashSet<K>, item: K) {
    println!("inserting!");
    set.insert(item);
}

HashSet<K> is an input type of the loud_insert function. Hence, we assume it to be well-formed, so we would have FromEnv(HashSet<K>) inside the body of our function. As we'll see in the section on lowering, FromEnv(HashSet<K>) implies Implemented(K: Hash) because the HashSet declaration was written with a K: Hash where clause. Hence, we don't need to repeat that bound on the loud_insert function: we rather automatically assume that it is true.

WellFormed(Item)

These goals imply that the given item is well-formed.

We can talk about different types of items being well-formed:

  • Types, like WellFormed(Vec<i32>), which is true in Rust, or WellFormed(Vec<str>), which is not (because str is not Sized.)

  • TraitRefs, like WellFormed(Vec<i32>: Clone).

Well-formedness is important to implied bounds. In particular, the reason it is okay to assume FromEnv(T: Clone) in the loud_clone example is that we also verify WellFormed(T: Clone) for each call site of loud_clone. Similarly, it is okay to assume FromEnv(HashSet<K>) in the loud_insert example because we will verify WellFormed(HashSet<K>) for each call site of loud_insert.

Outlives(Type: Region), Outlives(Region: Region)

e.g. Outlives(&'a str: 'b), Outlives('a: 'static)

True if the given type or region on the left outlives the right-hand region.

Coinductive goals

Most goals in our system are "inductive". In an inductive goal, circular reasoning is disallowed. Consider this example clause:

    Implemented(Foo: Bar) :-
        Implemented(Foo: Bar).

Considered inductively, this clause is useless: if we are trying to prove Implemented(Foo: Bar), we would then recursively have to prove Implemented(Foo: Bar), and that cycle would continue ad infinitum (the trait solver will terminate here; it would simply consider that Implemented(Foo: Bar) is not known to be true).

However, some goals are co-inductive. Simply put, this means that cycles are OK. So, if Bar were a co-inductive trait, then the rule above would be perfectly valid, and it would indicate that Implemented(Foo: Bar) is true.

Auto traits are one example in Rust where co-inductive goals are used. Consider the Send trait, and imagine that we have this struct:


struct Foo {
    next: Option<Box<Foo>>
}

The default rules for auto traits say that Foo is Send if the types of its fields are Send. Therefore, we would have a rule like

Implemented(Foo: Send) :-
    Implemented(Option<Box<Foo>>: Send).

As you can probably imagine, proving that Option<Box<Foo>>: Send is going to wind up circularly requiring us to prove that Foo: Send again. So this would be an example where we wind up in a cycle – but that's OK: we do consider Foo: Send to hold, even though it references itself.

In general, co-inductive traits are used in Rust trait solving when we want to enumerate a fixed set of possibilities. In the case of auto traits, we are enumerating the set of reachable types from a given starting point (i.e., Foo can reach values of type Option<Box<Foo>>, which implies it can reach values of type Box<Foo>, and then of type Foo, and then the cycle is complete).
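
You can watch this cycle succeed in real code: the following compiles precisely because auto-trait goals are co-inductive (`assert_send` is a hypothetical helper):

struct Foo {
    next: Option<Box<Foo>>
}

fn assert_send<T: Send>() {}

fn main() {
    // Proving `Foo: Send` requires `Option<Box<Foo>>: Send`, which cycles
    // back to `Foo: Send`; the cycle counts as success.
    assert_send::<Foo>();
}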

In addition to auto traits, WellFormed predicates are co-inductive. These are used to achieve a similar "enumerate all the cases" pattern, as described in the section on implied bounds.

Incomplete chapter

Some topics yet to be written:

  • Elaborate on the proof procedure
  • SLG solving – introduce negative reasoning

Canonical queries

The "start" of the trait system is the canonical query (these are both queries in the more general sense of the word – something you would like to know the answer to – and in the rustc-specific sense). The idea is that the type checker or other parts of the system, may in the course of doing their thing want to know whether some trait is implemented for some type (e.g., is u32: Debug true?). Or they may want to normalize some associated type.

This section covers queries at a fairly high level of abstraction. The subsections look a bit more closely at how these ideas are implemented in rustc.

The traditional, interactive Prolog query

In a traditional Prolog system, when you start a query, the solver will run off and start supplying you with every possible answer it can find. So given something like this:

?- Vec<i32>: AsRef<?U>

The solver might answer:

Vec<i32>: AsRef<[i32]>
    continue? (y/n)

This continue bit is interesting. The idea in Prolog is that the solver is finding all possible instantiations of your query that are true. In this case, if we instantiate ?U = [i32], then the query is true (note that a traditional Prolog interface does not, directly, tell us a value for ?U, but we can infer one by unifying the response with our original query – Rust's solver gives back a substitution instead). If we were to hit y, the solver might then give us another possible answer:

Vec<i32>: AsRef<Vec<i32>>
    continue? (y/n)

This answer derives from the fact that there is a reflexive impl (impl<T> AsRef<T> for T) for AsRef. If we were to hit y again, then we might get back a negative response:

no

Naturally, in some cases, there may be no possible answers, and hence the solver will just give me back no right away:

?- Box<i32>: Copy
    no

In some cases, there might be an infinite number of responses. So for example if I gave this query, and I kept hitting y, then the solver would never stop giving me back answers:

?- Vec<?U>: Clone
    Vec<i32>: Clone
        continue? (y/n)
    Vec<Box<i32>>: Clone
        continue? (y/n)
    Vec<Box<Box<i32>>>: Clone
        continue? (y/n)
    Vec<Box<Box<Box<i32>>>>: Clone
        continue? (y/n)

As you can imagine, the solver will gleefully keep adding another layer of Box until we ask it to stop, or it runs out of memory.

Another interesting thing is that queries might still have variables in them. For example:

?- Rc<?T>: Clone

might produce the answer:

Rc<?T>: Clone
    continue? (y/n)

After all, Rc<?T>: Clone is true no matter what type ?T is.

A trait query in rustc

The trait queries in rustc work somewhat differently. Instead of trying to enumerate all possible answers for you, they are looking for an unambiguous answer. In particular, when they tell you the value for a type variable, that means that this is the only possible instantiation that you could use, given the current set of impls and where-clauses, that would be provable.

The response to a trait query in rustc is typically a Result<QueryResult<T>, NoSolution> (where the T will vary a bit depending on the query itself). The Err(NoSolution) case indicates that the query was false and had no answers (e.g., Box<i32>: Copy). Otherwise, the QueryResult gives back information about the possible answer(s) we did find. It consists of four parts:

  • Certainty: tells you how sure we are of this answer. It can have two values:
    • Proven means that the result is known to be true.
      • This might be the result for trying to prove Vec<i32>: Clone, say, or Rc<?T>: Clone.
    • Ambiguous means that there were things we could not yet prove to be either true or false, typically because more type information was needed. (We'll see an example shortly.)
      • This might be the result for trying to prove Vec<?T>: Clone.
  • Var values: Values for each of the unbound inference variables (like ?T) that appeared in your original query. (Remember that in Prolog, we had to infer these.)
    • As we'll see in the example below, we can get back var values even for Ambiguous cases.
  • Region constraints: these are relations that must hold between the lifetimes that you supplied as inputs. We'll ignore these here.
  • Value: The query result also comes with a value of type T. For some specialized queries – like normalizing associated types – this is used to carry back an extra result, but it's often just ().
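
Schematically, and not to be confused with rustc's real types, the shape of such a result is something like this (all names and representations here are hypothetical):

enum Certainty {
    Proven,
    Ambiguous,
}

struct QueryResult<T> {
    certainty: Certainty,
    // One entry per inference variable in the original query (e.g. `?T`).
    var_values: Vec<String>,
    // Outlives relations between the input lifetimes; ignored in this chapter.
    region_constraints: Vec<String>,
    // Extra payload; `()` for plain trait queries, e.g. the normalized type
    // for normalization queries.
    value: T,
}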

Examples

Let's work through an example query to see what all the parts mean. Consider the Borrow trait. This trait has a number of impls; among them, there are these two (for clarity, I've written the Sized bounds explicitly):

impl<T> Borrow<T> for T where T: ?Sized
impl<T> Borrow<[T]> for Vec<T> where T: Sized

Example 1. Imagine we are type-checking this (rather artificial) bit of code:

fn foo<A, B>(a: A, vec_b: Option<B>) where A: Borrow<B> { }

fn main() {
    let mut t: Vec<_> = vec![]; // Type: Vec<?T>
    let mut u: Option<_> = None; // Type: Option<?U>
    foo(t, u); // Example 1: requires `Vec<?T>: Borrow<?U>`
    ...
}

As the comments indicate, we first create two variables t and u; t is an empty vector and u is a None option. Both of these variables have unbound inference variables in their type: ?T represents the elements in the vector t and ?U represents the value stored in the option u. Next, we invoke foo; comparing the signature of foo to its arguments, we wind up with A = Vec<?T> and B = ?U. Therefore, the where clause on foo requires that Vec<?T>: Borrow<?U>. This is thus our first example trait query.

There are many possible solutions to the query Vec<?T>: Borrow<?U>; for example:

  • ?U = Vec<?T>,
  • ?U = [?T],
  • ?T = u32, ?U = [u32]
  • and so forth.

Therefore, the result we get back would be as follows (I'm going to ignore region constraints and the "value"):

  • Certainty: Ambiguous – we're not sure yet if this holds
  • Var values: [?T = ?T, ?U = ?U] – we learned nothing about the values of the variables

In short, the query result says that it is too soon to say much about whether this trait is proven. During type-checking, this is not an immediate error: instead, the type checker would hold on to this requirement (Vec<?T>: Borrow<?U>) and wait. As we'll see in the next example, it may happen that ?T and ?U wind up constrained from other sources, in which case we can try the trait query again.

Example 2. We can now extend our previous example a bit, and assign a value to u:

fn foo<A, B>(a: A, vec_b: Option<B>) where A: Borrow<B> { }

fn main() {
    // What we saw before:
    let mut t: Vec<_> = vec![]; // Type: Vec<?T>
    let mut u: Option<_> = None; // Type: Option<?U>
    foo(t, u); // `Vec<?T>: Borrow<?U>` => ambiguous

    // New stuff:
    u = Some(vec![]); // ?U = Vec<?V>
}

As a result of this assignment, the type of u is forced to be Option<Vec<?V>>, where ?V represents the element type of the vector. This in turn implies that ?U is unified to Vec<?V>.

Let's suppose that the type checker decides to revisit the "as-yet-unproven" trait obligation we saw before, Vec<?T>: Borrow<?U>. ?U is no longer an unbound inference variable; it now has a value, Vec<?V>. So, if we "refresh" the query with that value, we get:

Vec<?T>: Borrow<Vec<?V>>

This time, there is only one impl that applies, the reflexive impl:

impl<T> Borrow<T> for T where T: ?Sized

Therefore, the trait checker will answer:

  • Certainty: Proven
  • Var values: [?T = ?T, ?V = ?T]

Here, it is saying that we have indeed proven that the obligation holds, and we also know that ?T and ?V are the same type (but we don't know what that type is yet!).

(In fact, as the function ends here, the type checker would give an error at this point, since the element types of t and u are still not yet known, even though they are known to be the same.)

The lowering module in rustc

This work is ongoing. This section will be filled in once some of it has landed in rustc.

Type checking

The rustc_typeck crate contains the source for "type collection" and "type checking", as well as a few other bits of related functionality. (It draws heavily on type inference and trait solving.)

Type collection

Type "collection" is the process of converting the types found in the HIR (hir::Ty), which represent the syntactic things that the user wrote, into the internal representation used by the compiler (Ty<'tcx>) – we also do similar conversions for where-clauses and other bits of the function signature.

To try and get a sense for the difference, consider this function:

struct Foo { }
fn foo(x: Foo, y: self::Foo) { ... }
//        ^^^     ^^^^^^^^^

Those two parameters x and y each have the same type: but they will have distinct hir::Ty nodes. Those nodes will have different spans, and of course they encode the path somewhat differently. But once they are "collected" into Ty<'tcx> nodes, they will be represented by the exact same internal type.

Collection is defined as a bundle of queries for computing information about the various functions, traits, and other items in the crate being compiled. Note that each of these queries is concerned with interprocedural things – for example, for a function definition, collection will figure out the type and signature of the function, but it will not visit the body of the function in any way, nor examine type annotations on local variables (that's the job of type checking).
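
For instance, the results of collection can be read back through queries on the tcx without ever touching a function body (a sketch; it assumes you already have a TyCtxt and the function's DefId in hand):

use rustc_hir::def_id::DefId;
use rustc_middle::ty::TyCtxt;

fn describe_fn<'tcx>(tcx: TyCtxt<'tcx>, def_id: DefId) {
    let sig = tcx.fn_sig(def_id);           // the collected signature
    let generics = tcx.generics_of(def_id); // the collected generics
    println!("{:?}: signature {:?}, {} generic parameters",
             def_id, sig, generics.count());
}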

For more details, see the collect module.

TODO: actually talk about type checking...

Method lookup

Method lookup can be rather complex due to the interaction of a number of factors, such as self types, autoderef, trait lookup, etc. This chapter provides an overview of the process. More detailed notes are in the code itself, naturally.

One way to think of method lookup is that we convert an expression of the form:

receiver.method(...)

into a more explicit UFCS form:

Trait::method(ADJ(receiver), ...) // for a trait call
ReceiverType::method(ADJ(receiver), ...) // for an inherent method call

Here ADJ is some kind of adjustment, which is typically a series of autoderefs and then possibly an autoref (e.g., &**receiver). However we sometimes do other adjustments and coercions along the way, in particular unsizing (e.g., converting from [T; n] to [T]).

Method lookup is divided into two major phases:

  1. Probing (probe.rs). The probe phase is when we decide what method to call and how to adjust the receiver.
  2. Confirmation (confirm.rs). The confirmation phase "applies" this selection, updating the side-tables, unifying type variables, and otherwise doing side-effectful things.

One reason for this division is to be more amenable to caching. The probe phase produces a "pick" (probe::Pick), which is designed to be cacheable across method-call sites. Therefore, it does not include inference variables or other information.

The Probe phase

Steps

The first thing that the probe phase does is to create a series of steps. This is done by progressively dereferencing the receiver type until it cannot be deref'd anymore, as well as applying an optional "unsize" step. So if the receiver has type Rc<Box<[T; 3]>>, this might yield:

Rc<Box<[T; 3]>>
Box<[T; 3]>
[T; 3]
[T]

Candidate assembly

We then search along those steps to create a list of candidates. A Candidate is a method item that might plausibly be the method being invoked. For each candidate, we'll derive a "transformed self type" that takes into account explicit self.

Candidates are grouped into two kinds, inherent and extension.

Inherent candidates are those that are derived from the type of the receiver itself. So, if you have a receiver of some nominal type Foo (e.g., a struct), any methods defined within an impl like impl Foo are inherent methods. Nothing needs to be imported to use an inherent method; they are associated with the type itself (note that inherent impls can only be defined in the same crate as the type itself).

FIXME: Inherent candidates are not always derived from impls. If you have a trait object, such as a value of type Box<ToString>, then the trait methods (to_string(), in this case) are inherently associated with it. Another case is type parameters, in which case the methods of their bounds are inherent. However, this part of the rules is subject to change: when DST's "impl Trait for Trait" is complete, trait object dispatch could be subsumed into trait matching, and the type parameter behavior should be reconsidered in light of where clauses.

TODO: Is this FIXME still accurate?

Extension candidates are derived from imported traits. If I have the trait ToString imported, and I call to_string() on a value of type T, then we will go off to find out whether there is an impl of ToString for T. These kinds of method calls are called "extension methods". They can be defined in any module, not only the one that defined T. Furthermore, you must import the trait to call such a method.

So, let's continue our example. Imagine that we were calling a method foo with the receiver Rc<Box<[T; 3]>> and there is a trait Foo that defines it with &self for the type Rc<U>, as well as an inherent method on the type Box that defines foo but with &mut self. Then we might have two candidates:

&Rc<Box<[T; 3]>> from the impl of `Foo` for `Rc<U>` where `U=Box<[T; 3]>`
&mut Box<[T; 3]> from the inherent impl on `Box<U>` where `U=[T; 3]`

Candidate search

Finally, to actually pick the method, we will search down the steps, trying to match the receiver type against the candidate types. At each step, we also consider an auto-ref and auto-mut-ref to see whether that makes any of the candidates match. We pick the first step where we find a match.

In the case of our example, the first step is Rc<Box<[T; 3]>>, which does not itself match any candidate. But when we autoref it, we get the type &Rc<Box<[T; 3]>> which does match. We would then recursively consider all where-clauses that appear on the impl: if those match (or we cannot rule out that they do), then this is the method we would pick. Otherwise, we would continue down the series of steps.
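
Mapping this back to real code, here is a hypothetical setup mirroring the example above (the trait and method are invented):

use std::rc::Rc;

trait Foo {
    fn foo(&self);
}

impl<U> Foo for Rc<U> {
    fn foo(&self) { println!("trait impl for Rc<U>"); }
}

fn main() {
    let receiver: Rc<Box<[i32; 3]>> = Rc::new(Box::new([1, 2, 3]));
    // Steps: Rc<Box<[i32; 3]>>, Box<[i32; 3]>, [i32; 3], [i32]. The very
    // first step matches once we autoref to `&Rc<Box<[i32; 3]>>`, so the
    // `Foo` impl for `Rc<U>` is picked with `U = Box<[i32; 3]>`.
    receiver.foo();
}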

Variance of type and lifetime parameters

For a more general background on variance, see the background appendix.

During type checking we must infer the variance of type and lifetime parameters. The algorithm is taken from Section 4 of the paper "Taming the Wildcards: Combining Definition- and Use-Site Variance" published in PLDI'11 and written by Altidor et al., and hereafter referred to as The Paper.

This inference is explicitly designed not to consider the uses of types within code. To determine the variance of type parameters defined on type X, we only consider the definition of the type X and the definitions of any types it references.

We only infer variance for type parameters found on data types like structs and enums. In these cases, there is a fairly straightforward explanation for what variance means. The variance of the type or lifetime parameters defines whether T<A> is a subtype of T<B> (resp. T<'a> and T<'b>) based on the relationship of A and B (resp. 'a and 'b).

We do not infer variance for type parameters found on traits, functions, or impls. Variance on trait parameters can indeed make sense (and we used to compute it) but it is actually rather subtle in meaning and not that useful in practice, so we removed it. See the addendum for some details. Variances on function/impl parameters, on the other hand, don't make sense because these parameters are instantiated and then forgotten; they don't persist in types or compiled byproducts.

Notation

We use the notation of The Paper throughout this chapter:

  • + is covariance.
  • - is contravariance.
  • * is bivariance.
  • o is invariance.

The algorithm

The basic idea is quite straightforward. We iterate over the types defined and, for each use of a type parameter X, accumulate a constraint indicating that the variance of X must be valid for the variance of that use site. We then iteratively refine the variance of X until all constraints are met. There is always a solution, because at the limit we can declare all type parameters to be invariant and all constraints will be satisfied.

As a simple example, consider:

enum Option<A> { Some(A), None }
enum OptionalFn<B> { Some(fn(B)), None }
enum OptionalMap<C> { Some(fn(C) -> C), None }

Here, we will generate the constraints:

1. V(A) <= +
2. V(B) <= -
3. V(C) <= +
4. V(C) <= -

These indicate that (1) the variance of A must be at most covariant; (2) the variance of B must be at most contravariant; and (3, 4) the variance of C must be at most covariant and contravariant. All of these results are based on a variance lattice defined as follows:

   *      Top (bivariant)
-     +
   o      Bottom (invariant)

Based on this lattice, the solution V(A)=+, V(B)=-, V(C)=o is the optimal solution. Note that there is always a naive solution which just declares all variables to be invariant.
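
These variances can be observed directly with fn pointer types, which put their parameters in exactly these positions (a minimal sketch):

struct Covariant<A>(fn() -> A);   // A only in return position: V(A) = +
struct Contravariant<B>(fn(B));   // B only in argument position: V(B) = -
struct Invariant<C>(fn(C) -> C);  // C in both positions: V(C) = o

// OK precisely because `Covariant` is covariant in `A` and
// `&'static u8 <: &'a u8`; with `Invariant` this would not compile.
fn shrink<'a>(x: Covariant<&'static u8>) -> Covariant<&'a u8> {
    x
}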

You may be wondering why fixed-point iteration is required. The reason is that the variance of a use site may itself be a function of the variance of other type parameters. In full generality, our constraints take the form:

V(X) <= Term
Term := + | - | * | o | V(X) | Term x Term

Here the notation V(X) indicates the variance of a type/region parameter X with respect to its defining class. Term x Term represents the "variance transform" as defined in the paper:

If the variance of a type variable X in type expression E is V2 and the definition-site variance of the corresponding type parameter of a class C is V1, then the variance of X in the type expression C<E> is V3 = V1.xform(V2).

Constraints

If I have a struct or enum with where clauses:

struct Foo<T: Bar> { ... }

you might wonder whether the variance of T with respect to Bar affects the variance of T with respect to Foo. I claim no. The reason: assume that T is invariant with respect to Bar but covariant with respect to Foo. And then we have a Foo<X> that is upcast to Foo<Y>, where X <: Y. However, while X : Bar holds, Y : Bar does not. In that case, the upcast will be illegal, not because of a variance failure, but rather because the target type Foo<Y> is itself just not well-formed. Basically we get to assume well-formedness of all types involved before considering variance.

Dependency graph management

Because variance is a whole-crate inference, its dependency graph can become quite muddled if we are not careful. To resolve this, we refactor into two queries:

  • crate_variances computes the variance for all items in the current crate.
  • variances_of accesses the variance for an individual item; it works by requesting crate_variances and extracting the relevant data.

If you limit yourself to reading variances_of, your code will then depend only on the inference of that particular item.

Ultimately, this setup relies on the red-green algorithm. In particular, every variance query effectively depends on all type definitions in the entire crate (through crate_variances), but since most changes will not result in a change to the actual results from variance inference, the variances_of query will wind up being considered green after it is re-evaluated.

Addendum: Variance on traits

As mentioned above, we used to permit variance on traits. This was computed based on the appearance of trait type parameters in method signatures and was used to represent the compatibility of vtables in trait objects (and also the "virtual" vtables or dictionaries used for trait bounds). One complication was that variance for associated types is less obvious, since they can be projected out and put to myriad uses, so it's not clear when it is safe to allow X<A>::Bar to vary (or indeed just what that means). Moreover (as covered below) all inputs on any trait with an associated type had to be invariant, limiting the applicability. Finally, the annotations (MarkerTrait, PhantomFn) needed to ensure that all trait type parameters had a variance were confusing and annoying for little benefit.

Just for historical reference, I am going to preserve some text indicating how one could interpret variance and trait matching.

Variance and object types

Just as with structs and enums, we can decide the subtyping relationship between two object types &Trait<A> and &Trait<B> based on the relationship of A and B. Note that for object types we ignore the Self type parameter – it is unknown, and the nature of dynamic dispatch ensures that we will always call a function that is expected the appropriate Self type. However, we must be careful with the other type parameters, or else we could end up calling a function that is expecting one type but provided another.

To see what I mean, consider a trait like so:


trait ConvertTo<A> {
    fn convertTo(&self) -> A;
}

Intuitively, if we had one object O=&ConvertTo<Object> and another S=&ConvertTo<String>, then S <: O because String <: Object (presuming Java-like "string" and "object" types, my go-to examples for subtyping). The actual algorithm would be to compare the (explicit) type parameters pairwise respecting their variance: here, the type parameter A is covariant (it appears only in a return position), and hence we require that String <: Object.

You'll note though that we did not consider the binding for the (implicit) Self type parameter: in fact, it is unknown, so that's good. The reason we can ignore that parameter is precisely because we don't need to know its value until a call occurs, and at that time the dynamic nature of virtual dispatch means the code we run will be correct for whatever value Self happens to be bound to for the particular object whose method we called. Self is thus different from A, because the caller requires that A be known in order to know the return type of the method convertTo(). (As an aside, we have rules preventing methods where Self appears outside of the receiver position from being called via an object.)

Trait variance and vtable resolution

But traits aren't only used with objects. They're also used when deciding whether a given impl satisfies a given trait bound. To set the scene here, imagine I had a function:

fn convertAll<A,T:ConvertTo<A>>(v: &[T]) { ... }

Now imagine that I have an implementation of ConvertTo for Object:

impl ConvertTo<i32> for Object { ... }

And I want to call convertAll on an array of strings. Suppose further that for whatever reason I specifically supply the value of String for the type parameter T:

let mut vector = vec!["string", ...];
convertAll::<i32, String>(vector);

Is this legal? To put another way, can we apply the impl for Object to the type String? The answer is yes, but to see why we have to expand out what will happen:

  • convertAll will create a pointer to one of the entries in the vector, which will have type &String

  • It will then call the impl of convertTo() that is intended for use with objects. This has the type fn(self: &Object) -> i32.

    It is OK to provide a value for self of type &String because &String <: &Object.

OK, so intuitively we want this to be legal, so let's bring this back to variance and see whether we are computing the correct result. We must first figure out how to phrase the question "is an impl for Object,i32 usable where an impl for String,i32 is expected?"

Maybe it's helpful to think of a dictionary-passing implementation of type classes. In that case, convertAll() takes an implicit parameter representing the impl. In short, we have an impl of type:

V_O = ConvertTo<i32> for Object

and the function prototype expects an impl of type:

V_S = ConvertTo<i32> for String

As with any argument, this is legal if the type of the value given (V_O) is a subtype of the type expected (V_S). So is V_O <: V_S? The answer will depend on the variance of the various parameters. In this case, because the Self parameter is contravariant and A is covariant, it means that:

V_O <: V_S iff
    i32 <: i32
    String <: Object

These conditions are satisfied and so we are happy.

Variance and associated types

Traits with associated types – or at minimum projection expressions – must be invariant with respect to all of their inputs. To see why this makes sense, consider what subtyping for a trait reference means:

<T as Trait> <: <U as Trait>

means that if I know that T as Trait, I also know that U as Trait. Moreover, if you think of it as dictionary passing style, it means that a dictionary for <T as Trait> is safe to use where a dictionary for <U as Trait> is expected.

The problem is that when you can project types out from <T as Trait>, the relationship to types projected out of <U as Trait> is completely unknown unless T==U (see #21726 for more details). Making Trait invariant ensures that this is true.

Another related reason is that if we didn't make traits with associated types invariant, then projection is no longer a function with a single result. Consider:

trait Identity { type Out; fn foo(&self); }
impl<T> Identity for T { type Out = T; ... }

Now if I have <&'static () as Identity>::Out, this can be validly derived as &'a () for any 'a:

<&'a () as Identity> <: <&'static () as Identity>
if &'static () <: &'a ()    -- Identity is contravariant in Self
if 'static : 'a             -- Subtyping rules for relations

This change, on the other hand, means that <&'static () as Identity>::Out is always &'static () (which might then be upcast to &'a (), separately). This was helpful in solving #21750.

Opaque types (type alias impl Trait)

Opaque types are syntax to declare an opaque type alias that only exposes a specific set of traits as its interface; the concrete type in the background is inferred from a certain set of use sites of the opaque type.

This is expressed by using impl Trait within type aliases, for example:

type Foo = impl Bar;

This declares an opaque type named Foo, of which the only information is that it implements Bar. Therefore, any of Bar's interface can be used on a Foo, but nothing else (regardless of whether it implements any other traits).

Since there needs to be a concrete background type, you can currently express that type by using the opaque type in a "defining use site".

struct Struct;
impl Bar for Struct { /* stuff */ }
fn foo() -> Foo {
    Struct
}

Any other "defining use site" needs to produce the exact same type.

Defining use site(s)

Currently only the return value of a function can be a defining use site of an opaque type (and only if the return type of that function contains the opaque type).

The defining use of an opaque type can be any code within the parent of the opaque type definition. This includes any siblings of the opaque type and all children of the siblings.

The initiative for "not causing fatal brain damage to developers due to accidentally running infinite loops in their brain while trying to comprehend what the type system is doing" has decided to disallow children of opaque types to be defining use sites.

Associated opaque types

Associated opaque types can be defined by any other associated item on the same trait impl or a child of these associated items. For instance:

trait Baz {
    type Foo;
    fn foo() -> Self::Foo;
}

struct Quux;

impl Baz for Quux {
    type Foo = impl Bar;
    fn foo() -> Self::Foo { ... }
}

Pattern and Exhaustiveness Checking

In Rust, pattern matching and bindings have a few very helpful properties. The compiler will check that bindings are irrefutable when made and that match arms are exhaustive.

Pattern usefulness

The central question that usefulness checking answers is: "in this match expression, is that branch reachable?". More precisely, it boils down to computing whether, given a list of patterns we have already seen, a given new pattern might match any new value.

For example, in the following match expression, we ask in turn whether each pattern might match something that wasn't matched by the patterns above it. Here we see the 4th pattern is redundant with the 1st; that branch will get an "unreachable" warning. The 3rd pattern may or may not be useful, depending on whether Foo has other variants than Bar. Finally, we can ask whether the whole match is exhaustive by asking whether the wildcard pattern (_) is useful relative to the list of all the patterns in that match. Here we can see that _ is useful (it would catch (false, None)); this expression would therefore get a "non-exhaustive match" error.


// x: (bool, Option<Foo>)
match x {
    (true, _) => {} // 1
    (false, Some(Foo::Bar)) => {} // 2
    (false, Some(_)) => {} // 3
    (true, None) => {} // 4
}

Thus usefulness is used for two purposes: detecting unreachable code (which is useful to the user), and ensuring that matches are exhaustive (which is important for soundness, because a match expression can return a value).

Where it happens

This check is done to any expression that desugars to a match expression in MIR. That includes actual match expressions, but also anything that looks like pattern matching, including if let, destructuring let, and similar expressions.


// `match`
// Usefulness can detect unreachable branches and forbid non-exhaustive matches.
match foo() {
    Ok(x) => x,
    Err(_) => panic!(),
}

// `if let`
// Usefulness can detect unreachable branches.
if let Some(x) = foo() {
    // ...
}

// `while let`
// Usefulness can detect infinite loops and dead loops.
while let Some(x) = it.next() {
    // ...
}

// Destructuring `let`
// Usefulness can forbid non-exhaustive patterns.
let Foo::Bar(x, y) = foo();

// Destructuring function arguments
// Usefulness can forbid non-exhaustive patterns.
fn foo(Foo { x, y }: Foo) {
    // ...
}

The algorithm

Exhaustiveness checking is implemented in check_match. The core of the algorithm is in usefulness. That file contains a detailed description of the algorithm.

Dataflow Analysis

If you work on the MIR, you will frequently come across various flavors of dataflow analysis. rustc uses dataflow to find uninitialized variables, determine what variables are live across a generator yield statement, and compute which Places are borrowed at a given point in the control-flow graph. Dataflow analysis is a fundamental concept in modern compilers, and knowledge of the subject will be helpful to prospective contributors.

However, this documentation is not a general introduction to dataflow analysis. It is merely a description of the framework used to define these analyses in rustc. It assumes that the reader is familiar with the core ideas as well as some basic terminology, such as "transfer function", "fixpoint" and "lattice". If you're unfamiliar with these terms, or if you want a quick refresher, Static Program Analysis by Anders Møller and Michael I. Schwartzbach is an excellent, freely available textbook. For those who prefer audiovisual learning, the Goethe University Frankfurt has published a series of short lectures on YouTube in English that are very approachable.

Defining a Dataflow Analysis

The interface for dataflow analyses is split into three traits. The first is AnalysisDomain, which must be implemented by all analyses. In addition to the type of the dataflow state, this trait defines the initial value of that state at entry to each block, as well as the direction of the analysis, either forward or backward. The domain of your dataflow analysis must be a lattice (strictly speaking a join-semilattice) with a well-behaved join operator. See documentation for the lattice module, as well as the JoinSemiLattice trait, for more information.

You must then provide either a direct implementation of the Analysis trait or an implementation of the proxy trait GenKillAnalysis. The latter is for so-called "gen-kill" problems, which have a simple class of transfer function that can be applied very efficiently. Analyses whose domain is not a BitSet of some index type, or whose transfer functions cannot be expressed through "gen" and "kill" operations, must implement Analysis directly, and will run slower as a result. All implementers of GenKillAnalysis also implement Analysis automatically via a default impl.

 AnalysisDomain
       ^
       |          | = has as a supertrait
       |          . = provides a default impl for
       |
   Analysis
     ^   ^
     |   .
     |   .
     |   .
 GenKillAnalysis

Transfer Functions and Effects

The dataflow framework in rustc allows each statement (and terminator) inside a basic block to define its own transfer function. For brevity, these individual transfer functions are known as "effects". Each effect is applied successively in dataflow order, and together they define the transfer function for the entire basic block. It's also possible to define an effect for particular outgoing edges of some terminators (e.g. apply_call_return_effect for the success edge of a Call terminator). Collectively, these are referred to as "per-edge effects".

The only meaningful difference (besides the "apply" prefix) between the methods of the GenKillAnalysis trait and the Analysis trait is that an Analysis has direct, mutable access to the dataflow state, whereas a GenKillAnalysis only sees an implementer of the GenKill trait, which only allows the gen and kill operations for mutation.

"Before" Effects

Observant readers of the documentation may notice that there are actually two possible effects for each statement and terminator, the "before" effect and the unprefixed (or "primary") effect. The "before" effects are applied immediately before the unprefixed effect regardless of the direction of the analysis. In other words, a backward analysis will apply the "before" effect and then the "primary" effect when computing the transfer function for a basic block, just like a forward analysis.

The vast majority of analyses should use only the unprefixed effects: Having multiple effects for each statement makes it difficult for consumers to know where they should be looking. However, the "before" variants can be useful in some scenarios, such as when the effect of the right-hand side of an assignment statement must be considered separately from the left-hand side.

Convergence

Your analysis must converge to "fixpoint", otherwise it will run forever. Converging to fixpoint is just another way of saying "reaching equilibrium". In order to reach equilibrium, your analysis must obey some laws. One of the laws it must obey is that the bottom value[1] joined with some other value equals the second value. Or, as an equation:

bottom join x = x

Another law is that your analysis must have a "top value" such that

top join x = top

Having a top value ensures that your semilattice has a finite height, and the law stated above ensures that once the dataflow state reaches top, it will no longer change (the fixpoint will be top).

[1]: The bottom value's primary purpose is as the initial dataflow state. Each basic block's entry state is initialized to bottom before the analysis starts.

A Brief Example

This section provides a brief example of a simple data-flow analysis at a high level. It doesn't explain everything you need to know, but hopefully it will make the rest of this page clearer.

Let's say we want to do a simple analysis to find if mem::transmute may have been called by a certain point in the program. Our analysis domain will just be a bool that records whether transmute has been called so far. The bottom value will be false, since by default transmute has not been called. The top value will be true, since our analysis is done as soon as we determine that transmute has been called. Our join operator will just be the boolean OR (||) operator. We use OR and not AND because of this case:

let x = if some_cond {
    std::mem::transmute::<i32, u32>(0_i32); // transmute was called!
} else {
    1_u32; // transmute was not called
};

// Has transmute been called by this point? We conservatively approximate that
// as yes, and that is why we use the OR operator.
println!("x: {}", x);

Inspecting the Results of a Dataflow Analysis

Once you have constructed an analysis, you must pass it to an Engine, which is responsible for finding the steady-state solution to your dataflow problem. You should use the into_engine method defined on the Analysis trait for this, since it will use the more efficient Engine::new_gen_kill constructor when possible.

Calling iterate_to_fixpoint on your Engine will return a Results, which contains the dataflow state at fixpoint upon entry of each block. Once you have a Results, you can inspect the dataflow state at fixpoint at any point in the CFG. If you only need the state at a few locations (e.g., each Drop terminator) use a ResultsCursor. If you need the state at every location, a ResultsVisitor will be more efficient.

                         Analysis
                            |
                            | into_engine(…)
                            |
                          Engine
                            |
                            | iterate_to_fixpoint()
                            |
                         Results
                         /     \
 into_results_cursor(…) /       \  visit_with(…)
                       /         \
               ResultsCursor  ResultsVisitor

For example, the following code uses a ResultsVisitor...

// Assuming `MyVisitor` implements `ResultsVisitor<FlowState = MyAnalysis::Domain>`...
let mut my_visitor = MyVisitor::new();

// inspect the fixpoint state for every location within every block in RPO.
let results = MyAnalysis::new()
    .into_engine(tcx, body, def_id)
    .iterate_to_fixpoint()
    .visit_in_rpo_with(body, &mut my_visitor);

whereas this code uses ResultsCursor:

let mut results = MyAnalysis::new()
    .into_engine(tcx, body, def_id)
    .iterate_to_fixpoint()
    .into_results_cursor(body);

// Inspect the fixpoint state immediately before each `Drop` terminator.
for (bb, block) in body.basic_blocks().iter_enumerated() {
    if let TerminatorKind::Drop { .. } = block.terminator().kind {
        results.seek_before_primary_effect(body.terminator_loc(bb));
        let state = results.get();
        println!("state before drop: {:#?}", state);
    }
}

Graphviz Diagrams

When the results of a dataflow analysis are not what you expect, it often helps to visualize them. This can be done with the -Zdump-mir flags described in Debugging MIR. Start with -Zdump-mir=F -Zdump-mir-dataflow, where F is either "all" or the name of the MIR body you are interested in.

These .dot files will be saved in your mir_dump directory and will have the NAME of the analysis (e.g. maybe_inits) as part of their filename. Each visualization will display the full dataflow state at entry and exit of each block, as well as any changes that occur in each statement and terminator. See the example below:

A graphviz diagram for a dataflow analysis

MIR borrow check

The borrow check is Rust's "secret sauce" – it is tasked with enforcing a number of properties:

  • That all variables are initialized before they are used.
  • That you can't move the same value twice.
  • That you can't move a value while it is borrowed.
  • That you can't access a place while it is mutably borrowed (except through the reference).
  • That you can't mutate a place while it is immutably borrowed.
  • etc

The borrow checker operates on the MIR. An older implementation operated on the HIR. Doing borrow checking on MIR has several advantages:

  • The MIR is far less complex than the HIR; the radical desugaring helps prevent bugs in the borrow checker.
  • Even more importantly, using the MIR enables "non-lexical lifetimes", which are regions derived from the control-flow graph.

Major phases of the borrow checker

The borrow checker source is found in the rustc_mir::borrow_check module. The main entry point is the mir_borrowck query.

  • We first create a local copy of the MIR. In the coming steps, we will modify this copy in place, updating the types and other parts of the body to reference the new regions that we are computing.
  • We then invoke replace_regions_in_mir to modify our local MIR. Among other things, this function will replace all of the regions in the MIR with fresh inference variables.
  • Next, we perform a number of dataflow analyses that compute what data is moved and when.
  • We then do a second type check across the MIR: the purpose of this type check is to determine all of the constraints between different regions.
  • Next, we do region inference, which computes the values of each region — basically, the points in the control-flow graph where each lifetime must be valid according to the constraints we collected.
  • At this point, we can compute the "borrows in scope" at each point.
  • Finally, we do a second walk over the MIR, looking at the actions it performs and reporting errors. For example, if we see a statement like *a + 1, then we would check that the variable a is initialized and that it is not mutably borrowed, as either of those would require an error to be reported. Doing this check requires the results of all the previous analyses.

Tracking moves and initialization

Part of the borrow checker's job is to track which variables are "initialized" at any given point in time -- this also requires figuring out where moves occur and tracking those.

Initialization and moves

From a user's perspective, initialization -- giving a variable some value -- and moves -- transferring ownership to another place -- might seem like distinct topics. Indeed, our borrow checker error messages often talk about them differently. But within the borrow checker, they are not nearly as separate. Roughly speaking, the borrow checker tracks the set of "initialized places" at any point in the source code. Assigning to a previously uninitialized local variable adds it to that set; moving from a local variable removes it from that set.

Consider this example:

fn foo() {
    let a: Vec<u32>;
    
    // a is not initialized yet
    
    a = vec![22];
    
    // a is initialized here
    
    std::mem::drop(a); // a is moved here
    
    // a is no longer initialized here

    let l = a.len(); //~ ERROR
}

Here you can see that a starts off as uninitialized; once it is assigned, it becomes initialized. But when drop(a) is called, that moves a into the call, and hence it becomes uninitialized again.

Subsections

To make it easier to peruse, this section is broken into a number of subsections:

  • Move paths: the move path concept that we use to track which local variables (or parts of local variables, in some cases) are initialized.
  • TODO Rest not yet written =)

Move paths

In reality, it's not enough to track initialization at the granularity of local variables. Rust also allows us to do moves and initialization at the field granularity:

fn foo() {
    let a: (Vec<u32>, Vec<u32>) = (vec![22], vec![44]);
    
    // a.0 and a.1 are both initialized
    
    let b = a.0; // moves a.0
    
    // a.0 is not initialized, but a.1 still is

    let c = a.0; // ERROR
    let d = a.1; // OK
}

To handle this, we track initialization at the granularity of a move path. A MovePath represents some location that the user can initialize, move, etc. So e.g. there is a move-path representing the local variable a, and there is a move-path representing a.0. Move paths roughly correspond to the concept of a Place from MIR, but they are indexed in ways that enable us to do move analysis more efficiently.

Move path indices

Although there is a MovePath data structure, they are never referenced directly. Instead, all the code passes around indices of type MovePathIndex. If you need to get information about a move path, you use this index with the move_paths field of the MoveData. For example, to convert a MovePathIndex mpi into a MIR Place, you might access the MovePath::place field like so:

move_data.move_paths[mpi].place
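
To illustrate this index-based pattern outside of rustc, here is a hedged, self-contained sketch that uses a plain Vec in place of rustc's IndexVec and a String in place of the MIR Place; the names mirror the ones described above but are simplified stand-ins:

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct MovePathIndex(usize);

struct MovePath {
    parent: Option<MovePathIndex>, // e.g., the path for `a.0` has parent `a`
    place: String,                 // stand-in for the MIR `Place`
}

struct MoveData {
    move_paths: Vec<MovePath>, // indexed by `MovePathIndex`
}

fn main() {
    let move_data = MoveData {
        move_paths: vec![
            MovePath { parent: None, place: "a".to_string() },
            MovePath { parent: Some(MovePathIndex(0)), place: "a.0".to_string() },
        ],
    };
    let mpi = MovePathIndex(1);
    // The analogue of `move_data.move_paths[mpi].place`:
    assert_eq!(move_data.move_paths[mpi.0].place, "a.0");
    assert_eq!(move_data.move_paths[mpi.0].parent, Some(MovePathIndex(0)));
}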

Building move paths

One of the first things we do in the MIR borrow check is to construct the set of move paths. This is done as part of the MoveData::gather_moves function. This function uses a MIR visitor called Gatherer to walk the MIR and look at how each Place within is accessed. For each such Place, it constructs a corresponding MovePathIndex. It also records when/where that particular move path is moved/initialized, but we'll get to that in a later section.

Illegal move paths

We don't actually create a move-path for every Place that gets used. In particular, if it is illegal to move from a Place, then there is no need for a MovePathIndex. Some examples:

  • You cannot move from a static variable, so we do not create a MovePathIndex for static variables.
  • You cannot move an individual element of an array, so if we have e.g. foo: [String; 3], there would be no move-path for foo[1].
  • You cannot move from inside of a borrowed reference, so if we have e.g. foo: &String, there would be no move-path for *foo.

These rules are enforced by the move_path_for function, which converts a Place into a MovePathIndex -- in error cases like those just discussed, the function returns an Err. This in turn means we don't have to bother tracking whether those places are initialized (which lowers overhead).

Looking up a move-path

If you have a Place and you would like to convert it to a MovePathIndex, you can do that using the MovePathLookup structure found in the rev_lookup field of MoveData. There are two different methods:

  • find_local, which takes a mir::Local representing a local variable. This is the easier method, because we always create a MovePathIndex for every local variable.
  • find, which takes an arbitrary Place. This method is a bit more annoying to use, precisely because we don't have a MovePathIndex for every Place (as we just discussed in the "illegal move paths" section). Therefore, find returns a LookupResult indicating the closest path it was able to find that exists (e.g., for foo[1], it might return just the path for foo).

Cross-references

As we noted above, move-paths are stored in a big vector and referenced via their MovePathIndex. However, within this vector, they are also structured into a tree. So for example if you have the MovePathIndex for a.b.c, you can go to its parent move-path a.b. You can also iterate over all children paths: so, from a.b, you might iterate to find the path a.b.c (here you are iterating just over the paths that are actually referenced in the source, not all possible paths that could have been referenced). These references are used for example in the find_in_move_path_or_its_descendants function, which determines whether a move-path (e.g., a.b) or any child of that move-path (e.g., a.b.c) matches a given predicate.

The MIR type-check

A key component of the borrow check is the MIR type-check. This check walks the MIR and does a complete "type check" -- the same kind you might find in any other language. In the process of doing this type-check, we also uncover the region constraints that apply to the program.

TODO -- elaborate further? Maybe? :)

Region inference (NLL)

The MIR-based region checking code is located in the rustc_mir::borrow_check module.

The MIR-based region analysis consists of two major functions:

  • replace_regions_in_mir, invoked first, has two jobs:
    • First, it finds the set of regions that appear within the signature of the function (e.g., 'a in fn foo<'a>(&'a u32) { ... }). These are called the "universal" or "free" regions – in particular, they are the regions that appear free in the function body.
    • Second, it replaces all the regions from the function body with fresh inference variables. This is because (presently) those regions are the results of lexical region inference and hence are not of much interest. The intention is that – eventually – they will be "erased regions" (i.e., no information at all), since we won't be doing lexical region inference at all.
  • compute_regions, invoked second: this is given as argument the results of move analysis. It has the job of computing values for all the inference variables that replace_regions_in_mir introduced.
    • To do that, it first runs the MIR type checker. This is basically a normal type-checker but specialized to MIR, which is much simpler than full Rust, of course. Running the MIR type checker will however create various constraints between region variables, indicating their potential values and relationships to one another.
    • After this, we perform constraint propagation by creating a RegionInferenceContext and invoking its solve method.
    • The NLL RFC also includes fairly thorough (and hopefully readable) coverage.

Universal regions

The UniversalRegions type represents a collection of universal regions corresponding to some MIR DefId. It is constructed in replace_regions_in_mir when we replace all regions with fresh inference variables. UniversalRegions contains indices for all the free regions in the given MIR along with any relationships that are known to hold between them (e.g. implied bounds, where clauses, etc.).

For example, given the MIR for the following function:


fn foo<'a>(x: &'a u32) {
    // ...
}

we would create a universal region for 'a and one for 'static. There may also be some complications for handling closures, but we will ignore those for the moment.

TODO: write about how these regions are computed.

Region variables

The value of a region can be thought of as a set. This set contains all points in the MIR where the region is valid along with any regions that are outlived by this region (e.g. if 'a: 'b, then end('b) is in the set for 'a); we call the domain of this set a RegionElement. In the code, the value for all regions is maintained in the rustc_mir::borrow_check::nll::region_infer module. For each region we maintain a set storing what elements are present in its value (to make this efficient, we give each kind of element an index, the RegionElementIndex, and use sparse bitsets).

The kinds of region elements are as follows:

  • Each location in the MIR control-flow graph: a location is just the pair of a basic block and an index. This identifies the point on entry to the statement with that index (or the terminator, if the index is equal to statements.len()).
  • There is an element end('a) for each universal region 'a, corresponding to some portion of the caller's (or caller's caller, etc) control-flow graph.
  • Similarly, there is an element denoted end('static) corresponding to the remainder of program execution after this function returns.
  • There is an element !1 for each placeholder region !1. This corresponds (intuitively) to some unknown set of other elements – for details on placeholders, see the section placeholders and universes.

Constraints

Before we can infer the value of regions, we need to collect constraints on the regions. The full set of constraints is described in the section on constraint propagation, but the two most common sorts of constraints are:

  1. Outlives constraints. These are constraints that one region outlives another (e.g. 'a: 'b). Outlives constraints are generated by the MIR type checker.
  2. Liveness constraints. Each region needs to be live at points where it can be used. These constraints are collected by generate_constraints.

Inference Overview

So how do we compute the contents of a region? This process is called region inference. The high-level idea is pretty simple, but there are some details we need to take care of.

Here is the high-level idea: we start off each region with the MIR locations we know must be in it from the liveness constraints. From there, we use all of the outlives constraints computed from the type checker to propagate the constraints: for each region 'a, if 'a: 'b, then we add all elements of 'b to 'a, including end('b). This all happens in propagate_constraints.
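
The following is a hedged, self-contained sketch of that naive propagation loop (the real implementation is cleverer, as we'll see); regions are just indices into a vector of element sets, and the seed values come from the foo example worked through below:

use std::collections::BTreeSet;

// (a, b) encodes the outlives constraint names[a]: names[b].
fn propagate(
    mut values: Vec<BTreeSet<String>>,
    names: &[&str],
    outlives: &[(usize, usize)],
) -> Vec<BTreeSet<String>> {
    let mut changed = true;
    while changed {
        changed = false;
        for &(a, b) in outlives {
            // "add all elements of 'b to 'a, including end('b)"
            let mut to_add: Vec<String> = values[b].iter().cloned().collect();
            to_add.push(format!("end({})", names[b]));
            for elem in to_add {
                changed |= values[a].insert(elem);
            }
        }
    }
    values
}

fn main() {
    // Regions 0, 1, 2 stand for '#1, '#2, '#3 from the example below.
    let names = ["'#1", "'#2", "'#3"];
    let mut values: Vec<BTreeSet<String>> = vec![BTreeSet::new(); 3];
    values[1].insert("L1".to_string()); // liveness constraints
    values[2].insert("L1".to_string());
    let values = propagate(values, &names, &[(0, 1), (1, 2)]); // '#1: '#2, '#2: '#3
    assert!(values[0].contains("end('#2)") && values[0].contains("end('#3)"));
}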

Then, we will check for errors. We first check that type tests are satisfied by calling check_type_tests. This checks constraints like T: 'a. Second, we check that universal regions are not "too big". This is done by calling check_universal_regions. This checks that for each region 'a if 'a contains the element end('b), then we must already know that 'a: 'b holds (e.g. from a where clause). If we don't already know this, that is an error... well, almost. There is some special handling for closures that we will discuss later.

Example

Consider the following example:

fn foo<'a, 'b>(x: &'a usize) -> &'b usize {
    x
}

Clearly, this should not compile because we don't know if 'a outlives 'b (if it doesn't then the return value could be a dangling reference).

Let's back up a bit. We need to introduce some free inference variables (as is done in replace_regions_in_mir). This example doesn't use the exact regions produced, but it (hopefully) is enough to get the idea across.

fn foo<'a, 'b>(x: &'a /* '#1 */ usize) -> &'b /* '#3 */ usize {
    x // '#2, location L1
}

Some notation: '#1, '#3, and '#2 represent the universal regions for the argument, return value, and the expression x, respectively. Additionally, I will call the location of the expression x L1.

So now we can use the liveness constraints to get the following starting points:

Region    Contents
'#1
'#2       L1
'#3       L1

Now we use the outlives constraints to expand each region. Specifically, we know that '#2: '#3 ...

Region    Contents
'#1       L1
'#2       L1, end('#3)              // add contents of '#3 and end('#3)
'#3       L1

... and '#1: '#2, so ...

Region    Contents
'#1       L1, end('#2), end('#3)    // add contents of '#2 and end('#2)
'#2       L1, end('#3)
'#3       L1

Now, we need to check that no regions were too big (we don't have any type tests to check in this case). Notice that '#1 now contains end('#3), but we have no where clause or implied bound to say that 'a: 'b... that's an error!

Some details

The RegionInferenceContext type contains all of the information needed to do inference, including the universal regions from replace_regions_in_mir and the constraints computed for each region. It is constructed just after we compute the liveness constraints.

Here are some of the fields of the struct:

  • constraints: contains all the outlives constraints.
  • liveness_constraints: contains all the liveness constraints.
  • universal_regions: contains the UniversalRegions returned by replace_regions_in_mir.
  • universal_region_relations: contains relations known to be true about universal regions. For example, if we have a where clause that 'a: 'b, that relation is assumed to be true while borrow checking the implementation (it is checked at the caller), so universal_region_relations would contain 'a: 'b.
  • type_tests: contains some constraints on types that we must check after inference (e.g. T: 'a).
  • closure_bounds_mapping: used for propagating region constraints from closures back out to the creator of the closure.

TODO: should we discuss any of the other fields? What about the SCCs?

Ok, now that we have constructed a RegionInferenceContext, we can do inference. This is done by calling the solve method on the context. This is where we call propagate_constraints and then check the resulting type tests and universal regions, as discussed above.

Constraint propagation

The main work of the region inference is constraint propagation, which is done in the propagate_constraints function. There are three sorts of constraints that are used in NLL, and we'll explain how propagate_constraints works by "layering" those sorts of constraints on one at a time (each of them is fairly independent from the others):

  • liveness constraints (R live at E), which arise from liveness;
  • outlives constraints (R1: R2), which arise from subtyping;
  • member constraints (member R_m of [R_c...]), which arise from impl Trait.

In this chapter, we'll explain the "heart" of constraint propagation, covering both liveness and outlives constraints.

Notation and high-level concepts

Conceptually, region inference is a "fixed-point" computation. It is given some set of constraints {C} and it computes a set of values Values: R -> {E} that maps each region R to a set of elements {E} (see here for more notes on region elements):

  • Initially, each region is mapped to an empty set, so Values(R) = {} for all regions R.
  • Next, we process the constraints repeatedly until a fixed-point is reached:
    • For each constraint C:
      • Update Values as needed to satisfy the constraint

As a simple example, if we have a liveness constraint R live at E, then we can apply Values(R) = Values(R) union {E} to satisfy the constraint. Similarly, if we have an outlives constraint R1: R2, we can apply Values(R1) = Values(R1) union Values(R2). (Member constraints are more complex and we discuss them in this section.)

In practice, however, we are a bit more clever. Instead of applying the constraints in a loop, we can analyze the constraints and figure out the correct order to apply them, so that we only have to apply each constraint once in order to find the final result.

Similarly, in the implementation, the Values set is stored in the scc_values field, but they are indexed not by a region but by a strongly connected component (SCC). SCCs are an optimization that avoids a lot of redundant storage and computation. They are explained in the section on outlives constraints.

Liveness constraints

A liveness constraint arises when some variable whose type includes a region R is live at some point P. This simply means that the value of R must include the point P. Liveness constraints are computed by the MIR type checker.

A liveness constraint R live at E is satisfied if E is a member of Values(R). So to "apply" such a constraint to Values, we just have to compute Values(R) = Values(R) union {E}.

The liveness values are computed in the type-check and passed to the region inference upon creation in the liveness_constraints argument. These are not represented as individual constraints like R live at E though; instead, we store a (sparse) bitset per region variable (of type LivenessValues). This way we only need a single bit for each liveness constraint.

One thing that is worth mentioning: All lifetime parameters are always considered to be live over the entire function body. This is because they correspond to some portion of the caller's execution, and that execution clearly includes the time spent in this function, since the caller is waiting for us to return.

Outlives constraints

An outlives constraint 'a: 'b indicates that the value of 'a must be a superset of the value of 'b. That is, an outlives constraint R1: R2 is satisfied if Values(R1) is a superset of Values(R2). So to "apply" such a constraint to Values, we just have to compute Values(R1) = Values(R1) union Values(R2).

One observation that follows from this is that if you have R1: R2 and R2: R1, then R1 = R2 must be true. Similarly, if you have:

R1: R2
R2: R3
R3: R4
R4: R1

then R1 = R2 = R3 = R4 follows. We take advantage of this to make things much faster, as described shortly.

In the code, the set of outlives constraints is given to the region inference context on creation in a parameter of type OutlivesConstraintSet. The constraint set is basically just a list of 'a: 'b constraints.

The outlives constraint graph and SCCs

In order to work more efficiently with outlives constraints, they are converted into the form of a graph, where the nodes of the graph are region variables ('a, 'b) and each constraint 'a: 'b induces an edge 'a -> 'b. This conversion happens in the RegionInferenceContext::new function that creates the inference context.

When using a graph representation, we can detect regions that must be equal by looking for cycles. That is, if you have a constraint like

'a: 'b
'b: 'c
'c: 'd
'd: 'a

then this will correspond to a cycle in the graph containing the elements 'a...'d.

Therefore, one of the first things that we do in propagating region values is to compute the strongly connected components (SCCs) in the constraint graph. The result is stored in the constraint_sccs field. You can then easily find the SCC that a region r is a part of by invoking constraint_sccs.scc(r).

Working in terms of SCCs allows us to be more efficient: if we have a set of regions 'a...'d that are part of a single SCC, we don't have to compute/store their values separately. We can just store one value for the SCC, since they must all be equal.

If you look over the region inference code, you will see that a number of fields are defined in terms of SCCs. For example, the scc_values field stores the values of each SCC. To get the value of a specific region 'a then, we first figure out the SCC that the region is a part of, and then find the value of that SCC.

When we compute SCCs, we not only figure out which regions are a member of each SCC, we also figure out the edges between them. So for example consider this set of outlives constraints:

'a: 'b
'b: 'a

'a: 'c

'c: 'd
'd: 'c

Here we have two SCCs: S0 contains 'a and 'b, and S1 contains 'c and 'd. But these SCCs are not independent: because 'a: 'c, that means that S0: S1 as well. That is -- the value of S0 must be a superset of the value of S1. One crucial thing is that this graph of SCCs is always a DAG -- that is, it never has cycles. This is because all the cycles have been removed to form the SCCs themselves.

Applying liveness constraints to SCCs

The liveness constraints that come in from the type-checker are expressed in terms of regions -- that is, we have a map like Liveness: R -> {E}. But we want our final result to be expressed in terms of SCCs -- we can integrate these liveness constraints very easily just by taking the union:

for each region R:
  let S be the SCC that contains R
  Values(S) = Values(S) union Liveness(R)

In the region inferencer, this step is done in RegionInferenceContext::new.

Applying outlives constraints

Once we have computed the DAG of SCCs, we use that to structure our entire computation. If we have an edge S1 -> S2 between two SCCs, that means that Values(S1) >= Values(S2) must hold. So, to compute the value of S1, we first compute the values of each successor S2. Then we simply union all of those values together. To use a quasi-iterator-like notation:

Values(S1) =
  s1.successors()
    .map(|s2| Values(s2))
    .union()
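
Here is a hedged, self-contained sketch of that scheme: since the SCC graph is a DAG, we can compute each SCC's value by first (recursively) computing its successors' values and unioning them in, memoizing so each SCC is processed only once. The element and SCC numbering here are made up for illustration:

use std::collections::BTreeSet;

fn visit(s: usize, succs: &[Vec<usize>], values: &mut [BTreeSet<u32>], done: &mut [bool]) {
    if done[s] {
        return; // already processed
    }
    done[s] = true;
    for i in 0..succs[s].len() {
        let t = succs[s][i];
        visit(t, succs, values, done); // make sure the successor's value is final
        let add: BTreeSet<u32> = values[t].clone();
        values[s].extend(add); // union the successor's value into ours
    }
}

fn main() {
    // Two SCCs as in the earlier example: S0 = {'a, 'b}, S1 = {'c, 'd}, S0 -> S1.
    let succs = vec![vec![1], vec![]];
    // Values start from the liveness seeds (element numbers are arbitrary).
    let mut values = vec![BTreeSet::from([0u32]), BTreeSet::from([1u32, 2u32])];
    let mut done = vec![false, false];
    for s in 0..succs.len() {
        visit(s, &succs, &mut values, &mut done);
    }
    assert_eq!(values[0], BTreeSet::from([0, 1, 2])); // Values(S0) ⊇ Values(S1)
}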

In the code, this work starts in the propagate_constraints function, which iterates over all the SCCs. For each SCC S1, we compute its value by first computing the value of its successors. Since SCCs form a DAG, we don't have to be concerned about cycles, though we do need to keep a set around to track whether we have already processed a given SCC or not. For each successor S2, once we have computed S2's value, we can union those elements into the value for S1 (although we have to be careful in this process to properly handle higher-ranked placeholders). Note that the value for S1 already contains the liveness constraints, since they were added in RegionInferenceContext::new.

Once that process is done, we now have the "minimal value" for S1, taking into account all of the liveness and outlives constraints. However, in order to complete the process, we must also consider member constraints, which are described in a later section.

Universal regions

"Universal regions" is the name that the code uses to refer to "named lifetimes" -- e.g., lifetime parameters and 'static. The name derives from the fact that such lifetimes are "universally quantified" (i.e., we must make sure the code is true for all values of those lifetimes). It is worth spending a bit of discussing how lifetime parameters are handled during region inference. Consider this example:

fn foo<'a, 'b>(x: &'a u32, y: &'b u32) -> &'b u32 {
  x
}

This example is intended not to compile, because we are returning x, which has type &'a u32, but our signature promises that we will return a &'b u32 value. But how are lifetimes like 'a and 'b integrated into region inference, and how does this error wind up being detected?

Universal regions and their relationships to one another

Early on in region inference, one of the first things we do is to construct a UniversalRegions struct. This struct tracks the various universal regions in scope on a particular function. We also create a UniversalRegionRelations struct, which tracks their relationships to one another. So if you have e.g. where 'a: 'b, then the UniversalRegionRelations struct would track that 'a: 'b is known to hold (which can be tested with the outlives function).

Everything is a region variable

One important aspect of how NLL region inference works is that all lifetimes are represented as numbered variables. This means that the only variant of ty::RegionKind that we use is the ReVar variant. These region variables are broken into two major categories, based on their index:

  • 0..N: universal regions -- the ones we are discussing here. In this case, the code must be correct with respect to any value of those variables that meets the declared relationships.
  • N..M: existential regions -- inference variables where the region inferencer is tasked with finding some suitable value.

In fact, the universal regions can be further subdivided based on where they were brought into scope (see the RegionClassification type). These subdivisions are not important for the topics discussed here, but become important when we consider closure constraint propagation, so we discuss them there.

Universal lifetimes as the elements of a region's value

As noted previously, the value that we infer for each region is a set {E}. The elements of this set can be points in the control-flow graph, but they can also be an element end('a) corresponding to each universal lifetime 'a. If the value for some region R0 includes end('a), then this implies that R0 must extend until the end of 'a in the caller.

The "value" of a universal region

During region inference, we compute a value for each universal region in the same way as we compute values for other regions. This value represents, effectively, the lower bound on that universal region -- the things that it must outlive. We now describe how we use this value to check for errors.

Liveness and universal regions

All universal regions have an initial liveness constraint that includes the entire function body. This is because lifetime parameters are defined in the caller and must include the entirety of the function call that invokes this particular function. In addition, each universal region 'a includes itself (that is, end('a)) in its liveness constraint (i.e., 'a must extend until the end of itself). In the code, these liveness constraints are set up in init_free_and_bound_regions.

Propagating outlives constraints for universal regions

So, consider the first example of this section:

fn foo<'a, 'b>(x: &'a u32, y: &'b u32) -> &'b u32 {
  x
}

Here, returning x requires that &'a u32 <: &'b u32, which gives rise to an outlives constraint 'a: 'b. Combined with our default liveness constraints we get:

'a live at {B, end('a)} // B represents the "function body"
'b live at {B, end('b)}
'a: 'b

When we process the 'a: 'b constraint, therefore, we will add end('b) into the value for 'a, resulting in a final value of {B, end('a), end('b)}.

Detecting errors

Once we have finished constraint propagation, we then enforce a constraint that if some universal region 'a includes an element end('b), then 'a: 'b must be declared in the function's bounds. If not, as in our example, that is an error. This check is done in the check_universal_regions function, which simply iterates over all universal regions, inspects their final value, and tests against the declared UniversalRegionRelations.
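
A hedged, self-contained sketch of that final check, with regions and elements modeled as plain strings and the declared bounds as a list of known 'a: 'b pairs:

fn check_universal_region(
    region: &str,                       // the region being checked, e.g. "'a"
    value: &[&str],                     // its final value, e.g. ["B", "end('a)", "end('b)"]
    declared_outlives: &[(&str, &str)], // ("'a", "'b") means `'a: 'b` is declared
) -> Result<(), String> {
    for elem in value {
        if let Some(other) = elem.strip_prefix("end(").and_then(|s| s.strip_suffix(')')) {
            let justified = other == region
                || declared_outlives.iter().any(|&(a, b)| a == region && b == other);
            if !justified {
                return Err(format!("{region} must outlive {other}, but no bound says so"));
            }
        }
    }
    Ok(())
}

fn main() {
    // Our example: 'a wound up containing end('b), but 'a: 'b was never declared.
    let result = check_universal_region("'a", &["B", "end('a)", "end('b)"], &[]);
    assert!(result.is_err());
}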

Member constraints

A member constraint 'm member of ['c_1..'c_N] expresses that the region 'm must be equal to some choice regions 'c_i (for some i). These constraints cannot be expressed by users, but they arise from impl Trait due to its lifetime capture rules. Consider a function such as the following:

fn make<'a, 'b>(a: &'a u32, b: &'b u32) -> impl Trait<'a, 'b> { .. }

Here, the true return type (often called the "hidden type") is only permitted to capture the lifetimes 'a or 'b. You can kind of see this more clearly by desugaring that impl Trait return type into its more explicit form:

type MakeReturn<'x, 'y> = impl Trait<'x, 'y>;
fn make<'a, 'b>(a: &'a u32, b: &'b u32) -> MakeReturn<'a, 'b> { .. }

Here, the idea is that the hidden type must be some type that could have been written in place of the impl Trait<'x, 'y> -- but clearly such a type can only reference the regions 'x or 'y (or 'static!), as those are the only names in scope. This limitation is then translated into a restriction to only access 'a or 'b because we are returning MakeReturn<'a, 'b>, where 'x and 'y have been replaced with 'a and 'b respectively.

Detailed example

To help us explain member constraints in more detail, let's spell out the make example in a bit more detail. First off, let's assume that you have some dummy trait:

trait Trait<'a, 'b> { }
impl<T> Trait<'_, '_> for T { }

and this is the make function (in desugared form):

type MakeReturn<'x, 'y> = impl Trait<'x, 'y>;
fn make<'a, 'b>(a: &'a u32, b: &'b u32) -> MakeReturn<'a, 'b> {
  (a, b)
}

What happens in this case is that the return type will be (&'0 u32, &'1 u32), where '0 and '1 are fresh region variables. We will have the following region constraints:

'0 live at {L}
'1 live at {L}
'a: '0
'b: '1
'0 member of ['a, 'b, 'static]
'1 member of ['a, 'b, 'static]

Here the "liveness set" {L} corresponds to that subset of the body where '0 and '1 are live -- basically the point from where the return tuple is constructed to where it is returned (in fact, '0 and '1 might have slightly different liveness sets, but that's not very interesting to the point we are illustrating here).

The 'a: '0 and 'b: '1 constraints arise from subtyping. When we construct the (a, b) value, it will be assigned type (&'0 u32, &'1 u32) -- the region variables reflect that the lifetimes of these references could be made smaller. For this value to be created from a and b, however, we do require that:

(&'a u32, &'b u32) <: (&'0 u32, &'1 u32)

which means in turn that &'a u32 <: &'0 u32 and hence that 'a: '0 (and similarly that &'b u32 <: &'1 u32, 'b: '1).

Note that if we ignore member constraints, the value of '0 would be inferred to some subset of the function body (from the liveness constraints, which we did not write explicitly). It would never become 'a, because there is no need for it to -- we have a constraint that 'a: '0, but that just puts a "cap" on how large '0 can grow to become. Since we compute the minimal value that we can, we are happy to leave '0 as being just equal to the liveness set. This is where member constraints come in.

Choices are always lifetime parameters

At present, the "choice" regions from a member constraint are always lifetime parameters from the current function. This falls out from the placement of impl Trait, though in the future it may not be the case. We take some advantage of this fact, as it simplifies the current code. In particular, we don't have to consider a case like '0 member of ['1, 'static], in which the value of both '0 and '1 are being inferred and hence changing. See rust-lang/rust#61773 for more information.

Applying member constraints

Member constraints are a bit more complex than other forms of constraints. This is because they have an "or" quality to them -- that is, they describe multiple choices that we must select from. E.g., in our example constraint '0 member of ['a, 'b, 'static], it might be that '0 is equal to 'a, 'b, or 'static. How can we pick the correct one? What we currently do is to look for a minimal choice -- if we find one, then we will grow '0 to be equal to that minimal choice. To find that minimal choice, we take two factors into consideration: lower and upper bounds.

Lower bounds

The lower bounds are those lifetimes that '0 must outlive -- i.e., that '0 must be larger than. In fact, when it comes time to apply member constraints, we've already computed the lower bounds of '0 because we computed its minimal value (or at least, the lower bounds considering everything but member constraints).

Let LB be the current value of '0. We know then that '0: LB must hold, whatever the final value of '0 is. Therefore, we can rule out any choice 'choice where 'choice: LB does not hold.

Unfortunately, in our example, this is not very helpful. The lower bound for '0 will just be the liveness set {L}, and we know that all the lifetime parameters outlive that set. So we are left with the same set of choices here. (But in other examples, particularly those with different variance, lower bound constraints may be relevant.)

Upper bounds

The upper bounds are those lifetimes that must outlive '0 -- i.e., that '0 must be smaller than. In our example, this would be 'a, because we have the constraint that 'a: '0. In more complex examples, the chain may be more indirect.

We can use upper bounds to rule out members in a very similar way to lower bounds. If UB is some upper bound, then we know that UB: '0 must hold, so we can rule out any choice 'choice where UB: 'choice does not hold.

In our example, we would be able to reduce our choice set from ['a, 'b, 'static] to just ['a]. This is because '0 has an upper bound of 'a, and neither 'a: 'b nor 'a: 'static is known to hold.

(For notes on how we collect upper bounds in the implementation, see the section below.)
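
Putting the two kinds of bounds together, here is a hedged, self-contained sketch of the narrowing step; the outlives closure is a crude stand-in for the real "is 'x: 'y known to hold?" query:

fn narrow_choices<'r>(
    choices: &[&'r str],
    lower_bounds: &[&str], // the member region must outlive these
    upper_bounds: &[&str], // these must outlive the member region
    outlives: impl Fn(&str, &str) -> bool,
) -> Vec<&'r str> {
    choices
        .iter()
        .copied()
        .filter(|&c| lower_bounds.iter().all(|&lb| outlives(c, lb)))
        .filter(|&c| upper_bounds.iter().all(|&ub| outlives(ub, c)))
        .collect()
}

fn main() {
    // Crude facts for the example: every region outlives itself and the
    // liveness set {L}, and 'static outlives everything.
    let outlives = |a: &str, b: &str| a == b || a == "'static" || b == "{L}";
    // The lower bound {L} rules nothing out; the upper bound 'a leaves just ['a].
    let choices = narrow_choices(&["'a", "'b", "'static"], &["{L}"], &["'a"], outlives);
    assert_eq!(choices, ["'a"]);
}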

Minimal choice

After applying lower and upper bounds, we can still sometimes have multiple possibilities. For example, imagine a variant of our example using types with the opposite variance. In that case, we would have the constraint '0: 'a instead of 'a: '0. Hence the current value of '0 would be {L, 'a}. Using this as a lower bound, we would be able to narrow down the member choices to ['a, 'static] because 'b: 'a is not known to hold (but 'a: 'a and 'static: 'a do hold). We would not have any upper bounds, so that would be our final set of choices.

In that case, we apply the minimal choice rule -- basically, if one of our choices is smaller than the others, we can use that. In this case, we would opt for 'a (and not 'static).

This choice is consistent with the general 'flow' of region propagation, which always aims to compute a minimal value for the region being inferred. However, it is somewhat arbitrary.

Collecting upper bounds in the implementation

In practice, computing upper bounds is a bit inconvenient, because our data structures are set up for the opposite. What we do is to compute the reverse SCC graph (we do this lazily and cache the result) -- that is, a graph where 'a: 'b induces an edge SCC('b) -> SCC('a). Like the normal SCC graph, this is a DAG. We can then do a depth-first search starting from SCC('0) in this graph. This will take us to all the SCCs that must outlive '0.

One wrinkle is that, as we walk the "upper bound" SCCs, their values will not yet have been fully computed. However, we have already applied their liveness constraints, so we have some information about their value. In particular, for any regions representing lifetime parameters, their value will contain themselves (i.e., the initial value for 'a includes 'a and the value for 'b contains 'b). So we can collect all of the lifetime parameters that are reachable, which is precisely what we are interested in.

Placeholders and universes

From time to time we have to reason about regions that we can't concretely know. For example, consider this program:

// A function that needs a static reference
fn foo(x: &'static u32) { }

fn bar(f: for<'a> fn(&'a u32)) {
       // ^^^^^^^^^^^^^^^^^^^ a function that can accept **any** reference
    let x = 22;
    f(&x);
}

fn main() {
    bar(foo);
}

This program ought not to type-check: foo needs a static reference for its argument, and bar wants to be given a function that accepts any reference (so it can call it with something on its stack, for example). But how do we reject it and why?

Subtyping and Placeholders

When we type-check main, and in particular the call bar(foo), we are going to wind up with a subtyping relationship like this one:

fn(&'static u32) <: for<'a> fn(&'a u32)
----------------    -------------------
the type of `foo`   the type `bar` expects

We handle this sort of subtyping by taking the variables that are bound in the supertype and replacing them with universally quantified representatives, denoted like !1 here. We call these regions "placeholder regions" – they represent, basically, "some unknown region".

Once we've done that replacement, we have the following relation:

fn(&'static u32) <: fn(&'!1 u32)

The key idea here is that this unknown region '!1 is not related to any other regions. So if we can prove that the subtyping relationship is true for '!1, then it ought to be true for any region, which is what we wanted.

So let's work through what happens next. To check if two functions are subtypes, we check if their arguments have the desired relationship (fn arguments are contravariant, so we swap the left and right here):

&'!1 u32 <: &'static u32

According to the basic subtyping rules for a reference, this will be true if '!1: 'static. That is – if "some unknown region !1" outlives 'static. Now, this might be true – after all, '!1 could be 'static – but we don't know that it's true. So this should yield up an error (eventually).

What is a universe?

In the previous section, we introduced the idea of a placeholder region, and we denoted it !1. We call this number 1 the universe index. The idea of a "universe" is that it is a set of names that are in scope within some type or at some point. Universes are formed into a tree, where each child extends its parents with some new names. So the root universe conceptually contains global names, such as the lifetime 'static or the type i32. In the compiler, we also put generic type parameters into this root universe (in this sense, there is not just one root universe, but one per item). So consider this function bar:

struct Foo { }

fn bar<'a, T>(t: &'a T) {
    ...
}

Here, the root universe would consist of the lifetimes 'static and 'a. In fact, although we're focused on lifetimes here, we can apply the same concept to types, in which case the types Foo and T would be in the root universe (along with other global types, like i32). Basically, the root universe contains all the names that appear free in the body of bar.

Now let's extend bar a bit by adding a variable x:

fn bar<'a, T>(t: &'a T) {
    let x: for<'b> fn(&'b u32) = ...;
}

Here, the name 'b is not part of the root universe. Instead, when we "enter" into this for<'b> (e.g., by replacing it with a placeholder), we will create a child universe of the root, let's call it U1:

U0 (root universe)
│
└─ U1 (child universe)

The idea is that this child universe U1 extends the root universe U0 with a new name, which we are identifying by its universe number: !1.

Now let's extend bar a bit by adding one more variable, y:

fn bar<'a, T>(t: &'a T) {
    let x: for<'b> fn(&'b u32) = ...;
    let y: for<'c> fn(&'c u32) = ...;
}

When we enter this type, we will again create a new universe, which we'll call U2. Its parent will be the root universe, and U1 will be its sibling:

U0 (root universe)
│
├─ U1 (child universe)
│
└─ U2 (child universe)

This implies that, while in U2, we can name things from U0 or U2, but not U1.

Giving existential variables a universe. Now that we have this notion of universes, we can use it to extend our type-checker and things to prevent illegal names from leaking out. The idea is that we give each inference (existential) variable – whether it be a type or a lifetime – a universe. That variable's value can then only reference names visible from that universe. So for example if a lifetime variable is created in U0, then it cannot be assigned a value of !1 or !2, because those names are not visible from the universe U0.

Representing universes with just a counter. You might be surprised to see that the compiler doesn't keep track of a full tree of universes. Instead, it just keeps a counter – and, to determine if one universe can see another one, it just checks if the index is greater. For example, U2 can see U0 because 2 >= 0. But U0 cannot see U2, because 0 >= 2 is false.
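
A minimal sketch of that counter representation (rustc has a similar UniverseIndex type; this simplified stand-in just wraps the counter):

#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
struct UniverseIndex(u32);

impl UniverseIndex {
    const ROOT: Self = UniverseIndex(0);

    // `self` can name things from `other` exactly when other <= self.
    fn can_name(self, other: UniverseIndex) -> bool {
        other <= self
    }
}

fn main() {
    let (u0, u1, u2) = (UniverseIndex::ROOT, UniverseIndex(1), UniverseIndex(2));
    assert!(u2.can_name(u0));  // 2 >= 0
    assert!(!u0.can_name(u2)); // 0 >= 2 is false
    assert!(u2.can_name(u1));  // allowed by the counter -- see below
}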

How can we get away with this? Doesn't this mean that we would allow U2 to also see U1? The answer is that, yes, we would, if that question ever arose. But because of the structure of our type checker etc, there is no way for that to happen. In order for something happening in the universe U1 to "communicate" with something happening in U2, they would have to have a shared inference variable X in common. And because everything in U1 is scoped to just U1 and its children, that inference variable X would have to be in U0. And since X is in U0, it cannot name anything from U1 (or U2). This is perhaps easiest to see by using a kind of generic "logic" example:

exists<X> {
   forall<Y> { ... /* Y is in U1 ... */ }
   forall<Z> { ... /* Z is in U2 ... */ }
}

Here, the only way for the two foralls to interact would be through X, but neither Y nor Z are in scope when X is declared, so its value cannot reference either of them.

Universes and placeholder region elements

But where does that error come from? The way it happens is like this. When we are constructing the region inference context, we can tell from the type inference context how many placeholder variables exist (the InferCtxt has an internal counter). For each of those, we create a corresponding universal region variable !n and a "region element" placeholder(n). This corresponds to "some unknown set of other elements". The value of !n is {placeholder(n)}.

At the same time, we also give each existential variable a universe (also taken from the InferCtxt). This universe determines which placeholder elements may appear in its value: For example, a variable in universe U3 may name placeholder(1), placeholder(2), and placeholder(3), but not placeholder(4). Note that the universe of an inference variable controls what region elements can appear in its value; it does not say region elements will appear.

Placeholders and outlives constraints

In the region inference engine, outlives constraints have the form:

V1: V2 @ P

where V1 and V2 are region indices, and hence map to some region variable (which may be universally or existentially quantified). The P here is a "point" in the control-flow graph; it's not important for this section. Each of these variables will have a universe, so let's call those universes U(V1) and U(V2) respectively. (Actually, the only one we are going to care about is U(V1).)

When we encounter this constraint, the ordinary procedure is to start a DFS from P. We keep walking so long as the nodes we are walking are present in value(V2) and we add those nodes to value(V1). If we reach a return point, we add in any end(X) elements. That part remains unchanged.

But then, after that, we want to iterate over the placeholder(x) elements in V2 (each of those must be visible to U(V2), but we can just assume that is true without checking it). We have to ensure that value(V1) outlives each of those placeholder elements.

Now there are two ways that could happen. First, if U(V1) can see the universe x (i.e., x <= U(V1)), then we can just add placeholder(x) to value(V1) and be done. But if not, then we have to approximate: we may not know what set of elements placeholder(x) represents, but we should be able to compute some sort of upper bound B for it – some region B that outlives placeholder(x). For now, we'll just use 'static for that (since it outlives everything) – in the future, we can sometimes be smarter here (and in fact we have code for doing this already in other contexts). Moreover, since 'static is in the root universe U0, we know that all variables can see it – so basically if we find that value(V2) contains placeholder(x) for some universe x that V1 can't see, then we force V1 to 'static.
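
A hedged, self-contained sketch of that rule; elements and universes are simplified, and "force V1 to 'static" is abbreviated to adding an end('static) element:

#[derive(Clone, PartialEq, Debug)]
enum Element {
    End(String),      // end('x)
    Placeholder(u32), // placeholder(x), where x is a universe index
}

// Add V2's placeholder elements to V1's value, approximating up to 'static
// whenever V1's universe cannot see the placeholder's universe.
fn absorb_placeholders(v1_universe: u32, v2: &[Element], v1: &mut Vec<Element>) {
    for elem in v2 {
        if let Element::Placeholder(x) = elem {
            if *x <= v1_universe {
                v1.push(elem.clone()); // visible: add it directly
            } else {
                // Not visible: approximate with an upper bound, i.e. 'static.
                // (The real code would add all the CFG and end(..) elements.)
                v1.push(Element::End("'static".to_string()));
            }
        }
    }
}

fn main() {
    let v2 = vec![Element::Placeholder(2)];
    let mut v1 = Vec::new();
    absorb_placeholders(0, &v2, &mut v1); // V1 is in U0, cannot see universe 2
    assert_eq!(v1, vec![Element::End("'static".to_string())]);
}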

Extending the "universal regions" check

After all constraints have been propagated, the NLL region inference has one final check, where it goes over the values that wound up being computed for each universal region and checks that they did not get 'too large'. In our case, we will go through each placeholder region and check that it contains only the placeholder(u) element it is known to outlive. (Later, we might be able to know that there are relationships between two placeholder regions and take those into account, as we do for universal regions from the fn signature.)

Put another way, the "universal regions" check can be considered to be checking constraints like:

{placeholder(1)}: V1

where {placeholder(1)} is like a constant set, and V1 is the variable we made to represent the !1 region.

Back to our example

OK, so far so good. Now let's walk through what would happen with our first example:

fn(&'static u32) <: fn(&'!1 u32) @ P  // this point P is not imp't here

The region inference engine will create a region element domain like this:

{ CFG; end('static); placeholder(1) }
  ---  ------------  ------- from the universe `!1`
  |    'static is always in scope
  all points in the CFG; not especially relevant here

It will always create two universal variables, one representing 'static and one representing '!1. Let's call them Vs and V1. They will have initial values like so:

Vs = { CFG; end('static) } // it is in U0, so can't name anything else
V1 = { placeholder(1) }

From the subtyping constraint above, we would have an outlives constraint like

'!1: 'static @ P

To process this, we would grow the value of V1 to include all of Vs:

Vs = { CFG; end('static) }
V1 = { CFG; end('static), placeholder(1) }

At that point, constraint propagation is complete, because all the outlives relationships are satisfied. Then we would go to the "check universal regions" portion of the code, which would test that no universal region grew too large.

In this case, V1 did grow too large – it is not known to outlive end('static), nor any of the CFG – so we would report an error.

Another example

What about this subtyping relationship?

for<'a> fn(&'a u32, &'a u32)
    <:
for<'b, 'c> fn(&'b u32, &'c u32)

Here we would replace the bound region in the supertype with a placeholder, as before, yielding:

for<'a> fn(&'a u32, &'a u32)
    <:
fn(&'!1 u32, &'!2 u32)

then we instantiate the bound region on the left-hand side with an existential variable in universe U2, yielding the following (?n is a notation for an existential variable):

fn(&'?3 u32, &'?3 u32)
    <:
fn(&'!1 u32, &'!2 u32)

Then we break this down further:

&'!1 u32 <: &'?3 u32
&'!2 u32 <: &'?3 u32

and even further, yield up our region constraints:

'!1: '?3
'!2: '?3

Note that, in this case, both '!1 and '!2 have to outlive the variable '?3, but the variable '?3 is not forced to outlive anything else. Therefore, it simply starts and ends as the empty set of elements, and hence the type-check succeeds here.

(This should surprise you a little. It surprised me when I first realized it. We are saying that if we are a fn that needs both of its arguments to have the same region, we can accept being called with arguments with two distinct regions. That seems intuitively unsound. But in fact, it's fine, as I tried to explain in this issue on the Rust issue tracker long ago. The reason is that even if we get called with arguments of two distinct lifetimes, those two lifetimes have some intersection (the call itself), and that intersection can be our value of 'a that we use as the common lifetime of our arguments. -nmatsakis)

Final example

Let's look at one last example. We'll extend the previous one to have a return type:

for<'a> fn(&'a u32, &'a u32) -> &'a u32
    <:
for<'b, 'c> fn(&'b u32, &'c u32) -> &'b u32

Despite seeming very similar to the previous example, this case is going to get an error. That's good: the problem is that we've gone from a fn that promises to return one of its two arguments, to a fn that is promising to return the first one. That is unsound. Let's see how it plays out.

First, we replace the bound region in the supertype with a placeholder:

for<'a> fn(&'a u32, &'a u32) -> &'a u32
    <:
fn(&'!1 u32, &'!2 u32) -> &'!1 u32

Then we instantiate the subtype with existentials (in U2):

fn(&'?3 u32, &'?3 u32) -> &'?3 u32
    <:
fn(&'!1 u32, &'!2 u32) -> &'!1 u32

And now we create the subtyping relationships:

&'!1 u32 <: &'?3 u32 // arg 1
&'!2 u32 <: &'?3 u32 // arg 2
&'?3 u32 <: &'!1 u32 // return type

And finally the outlives relationships. Here, let V1, V2, and V3 be the variables we assign to !1, !2, and ?3 respectively:

V1: V3
V2: V3
V3: V1

Those variables will have these initial values:

V1 in U1 = {placeholder(1)}
V2 in U2 = {placeholder(2)}
V3 in U2 = {}

Now because of the V3: V1 constraint, we have to add placeholder(1) into V3 (and indeed it is visible from V3), so we get:

V3 in U2 = {placeholder(1)}

then we have this constraint V2: V3, so we wind up having to enlarge V2 to include placeholder(1) (which it can also see):

V2 in U2 = {placeholder(1), placeholder(2)}

Now constraint propagation is done, but when we check the outlives relationships, we find that V2 includes this new element placeholder(1), so we report an error.

Propagating closure constraints

When we are checking the type tests and universal regions, we may come across a constraint that we can't prove yet if we are in a closure body! However, the necessary constraints may actually hold (we just don't know it yet). Thus, if we are inside a closure, we just collect all the constraints we can't prove yet and return them. Later, when we borrow check the MIR node that created the closure, we can also check that these constraints hold. At that time, if we can't prove they hold, we report an error.

Reporting region errors

TODO: we should discuss how to generate errors from the results of these analyses.

Two-phase borrows

Two-phase borrows are a more permissive version of mutable borrows that allow nested method calls such as vec.push(vec.len()). Such borrows first act as shared borrows in a "reservation" phase and can later be "activated" into a full mutable borrow.

Only certain implicit mutable borrows can be two-phase; any &mut or ref mut in the source code is never a two-phase borrow. The cases where we generate a two-phase borrow are:

  1. The autoref borrow when calling a method with a mutable reference receiver.
  2. A mutable reborrow in function arguments.
  3. The implicit mutable borrow in an overloaded compound assignment operator.

To give some examples:


// In the source code

// Case 1:
let mut v = Vec::new();
v.push(v.len());
let r = &mut Vec::new();
r.push(r.len());

// Case 2:
std::mem::replace(r, vec![1, r.len()]);

// Case 3:
let mut x = std::num::Wrapping(2);
x += x;

Expanding these enough to show the two-phase borrows:

// Case 1:
let mut v = Vec::new();
let temp1 = &two_phase v;
let temp2 = v.len();
Vec::push(temp1, temp2);
let r = &mut Vec::new();
let temp3 = &two_phase *r;
let temp4 = r.len();
Vec::push(temp3, temp4);

// Case 2:
let temp5 = &two_phase *r;
let temp6 = vec![1, r.len()];
std::mem::replace(temp5, temp6);

// Case 3:
let mut x = std::num::Wrapping(2);
let temp7 = &two_phase x;
let temp8 = x;
std::ops::AddAssign::add_assign(temp7, temp8);

Whether a borrow can be two-phase is tracked by a flag on the AutoBorrow after type checking, which is then converted to a BorrowKind during MIR construction.

Each two-phase borrow is assigned to a temporary that is only used once. As such we can define:

  • The point where the temporary is assigned to is called the reservation point of the two-phase borrow.
  • The point where the temporary is used, which is effectively always a function call, is called the activation point.

The activation points are found using the GatherBorrows visitor. The BorrowData then holds both the reservation and activation points for the borrow.

Checking two-phase borrows

Two-phase borrows are treated as if they were mutable borrows with the following exceptions:

  1. At every location in the MIR we check if any two-phase borrows are activated at this location. If a live two-phase borrow is activated at a location, then we check that there are no borrows that conflict with the two-phase borrow.
  2. At the reservation point we error if there are conflicting live mutable borrows. And lint if there are any conflicting shared borrows.
  3. Between the reservation and the activation point, the two-phase borrow acts as a shared borrow. We determine (in is_active) if we're at such a point by using the Dominators for the MIR graph; see the sketch after this list.
  4. After the activation point, the two-phase borrow acts as a mutable borrow.
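
The dominator-based phase test can be sketched as follows; this is a hedged, self-contained model in which locations are plain integers and dominates is an assumed oracle for the MIR dominator tree:

#[derive(PartialEq, Debug)]
enum Phase {
    Reserved, // acts as a shared borrow
    Active,   // acts as a mutable borrow
}

// A two-phase borrow is in its mutable phase at `location` iff its
// activation point dominates that location.
fn phase_at(location: u32, activation: u32, dominates: impl Fn(u32, u32) -> bool) -> Phase {
    if dominates(activation, location) {
        Phase::Active
    } else {
        Phase::Reserved
    }
}

fn main() {
    // In a straight-line CFG, location a dominates b iff a <= b.
    let dominates = |a: u32, b: u32| a <= b;
    assert_eq!(phase_at(3, 5, dominates), Phase::Reserved); // before activation
    assert_eq!(phase_at(7, 5, dominates), Phase::Active);   // at/after activation
}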

Parameter Environment

When working with associated and/or generic items (types, constants, functions/methods) it is often relevant to have more information about the Self or generic parameters. Trait bounds and similar information is encoded in the ParamEnv. Often this is not enough information to obtain things like the type's Layout, but you can do all kinds of other checks on it (e.g. whether a type implements Copy) or you can evaluate an associated constant whose value does not depend on anything from the parameter environment.

For example if you have a function


fn foo<T: Copy>(t: T) { ... }

the parameter environment for that function is [T: Copy]. This means any evaluation within this function will, when accessing the type T, know about its Copy bound via the parameter environment.

You can get the parameter environment for a def_id using the param_env query. However, this ParamEnv can be too generic for your use case. Using the ParamEnv from the surrounding context can allow you to evaluate more things. For example, suppose we had something like the following:


trait Foo {
    type Assoc;
}

trait Bar { }

trait Baz {
    fn stuff() -> bool;
}

fn foo<T>(t: T)
where
    T: Foo,
    <T as Foo>::Assoc: Bar
{
   bar::<T::Assoc>()
}

fn bar<T: Baz>() {
    if T::stuff() { mep() } else { mop() }
}

We may know some things inside bar that we wouldn't know if we just fetched bar's param env because of the <T as Foo>::Assoc: Bar bound in foo. This is a contrived example that makes no sense in our existing analyses, but we may run into similar cases when doing analyses with associated constants on generic traits or traits with assoc types.

Bundling

Another great thing about ParamEnv is that you can use it to bundle the thing depending on generic parameters (e.g. a Ty) by calling the and method. This will produce a ParamEnvAnd<Ty>, making clear that you should probably not be using the inner value without taking care to also use the ParamEnv.
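
As a rough sketch of what that bundling looks like (this is rustc-internal code, so it is a fragment rather than a runnable example; it assumes some tcx, def_id, and ty already in scope, and the ParamEnvAnd field names as of this writing):

let param_env = tcx.param_env(def_id); // ParamEnv<'tcx>
let bundled = param_env.and(ty);       // ParamEnvAnd<'tcx, Ty<'tcx>>
// Downstream code can recover both pieces together:
let (param_env, ty) = (bundled.param_env, bundled.value);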

Errors and Lints

A lot of effort has been put into making rustc have great error messages. This chapter is about how to emit compile errors and lints from the compiler.

Diagnostic structure

The main parts of a diagnostic error are the following:

error[E0000]: main error message
  --> file.rs:LL:CC
   |
LL | <code>
   | -^^^^- secondary label
   |  |
   |  primary label
   |
   = note: note without a `Span`, created with `.note`
note: sub-diagnostic message for `.span_note`
  --> file.rs:LL:CC
   |
LL | more code
   |      ^^^^

  • Description (error, warning, etc.).
  • Code (for example, for "mismatched types", it is E0308). It helps users get more information about the current error through an extended description of the problem in the error code index.
  • Message. It is the main description of the problem. It should be general and able to stand on its own, so that it can make sense even in isolation.
  • Diagnostic window. This contains several things:
    • The path, line number and column of the beginning of the primary span.
    • The user's affected code and its surroundings.
    • Primary and secondary spans underlining the user's code. These spans can optionally contain one or more labels.
      • Primary spans should have enough text to describe the problem in such a way that if it were the only thing being displayed (for example, in an IDE) it would still make sense. Because it is "spatially aware" (it points at the code), it can generally be more succinct than the error message.
      • If cluttered output can be foreseen in cases when multiple span labels overlap, it is a good idea to tweak the output appropriately. For example, the if/else arms have incompatible types error uses different spans depending on whether the arms are all in the same line, if one of the arms is empty and if none of those cases applies.
  • Sub-diagnostics. Any error can have multiple sub-diagnostics that look similar to the main part of the error. These are used for cases where the order of the explanation might not correspond with the order of the code. If the order of the explanation can be "order free", leveraging secondary labels in the main diagnostic is preferred, as it is typically less verbose.

The text should be matter-of-fact and avoid capitalization and periods, unless multiple sentences are needed:

error: the fobrulator needs to be krontrificated

When code or an identifier must appear in a message or label, it should be surrounded with backticks (`).

Error explanations

Some errors include long form descriptions. They may be viewed with the --explain flag, or via the error index. Each explanation comes with an example of how to trigger it and advice on how to fix it.

Please read RFC 1567 for details on how to format and write long error codes.

The descriptions are written in Markdown, and all of them are linked in the rustc_error_codes crate.

As a general rule, give an error a code (with an associated explanation) if the explanation would give more information than the error itself. A lot of the time it's better to put all the information in the emitted error itself. However, sometimes that would make the error verbose or there are too many possible triggers to include useful information for all cases in the error, in which case it's a good idea to add an explanation.1 As always, if you are not sure, just ask your reviewer!

1: This rule of thumb was suggested by @estebank here.

Lints versus fixed diagnostics

Some messages are emitted via lints, where the user can control the level. Most diagnostics are hard-coded such that the user cannot control the level.

Usually it is obvious whether a diagnostic should be "fixed" or a lint, but there are some grey areas.

Here are a few examples:

  • Borrow checker errors: these are fixed errors. The user cannot adjust the level of these diagnostics to silence the borrow checker.
  • Dead code: this is a lint. While the user probably doesn't want dead code in their crate, making this a hard error would make refactoring and development very painful.
  • safe_packed_borrows future compatibility warning: this is a silenceable lint related to safety. It was judged that making this a hard (fixed) error would cause too much breakage, so instead a warning is emitted that will eventually be turned into a hard error.

Hard-coded warnings (those using the span_warn methods) should be avoided for normal code, preferring to use lints instead. Some cases, such as warnings with CLI flags, will require the use of hard-coded warnings.

See the deny lint level below for guidelines when to use an error-level lint instead of a fixed error.

Diagnostic output style guide

  • Write in plain simple English. If your message, when shown on a – possibly small – screen (which hasn't been cleaned for a while), cannot be understood by a normal programmer, who just came out of bed after a night partying, it's too complex.
  • Error, Warning, Note, and Help messages start with a lowercase letter and do not end with punctuation.
  • Error messages should be succinct. Users will see these error messages many times, and more verbose descriptions can be viewed with the --explain flag. That said, don't make it so terse that it's hard to understand.
  • The word "illegal" is illegal. Prefer "invalid" or a more specific word instead.
  • Errors should document the span of code where they occur – the rustc_errors::diagnostic_builder::DiagnosticBuilder span_* methods make this easy. Also note other spans that have contributed to the error if the span isn't too large.
  • When emitting a message with a span, try to reduce the span to the smallest amount possible that still signifies the issue.
  • Try not to emit multiple error messages for the same error. This may require detecting duplicates.
  • When the compiler has too little information for a specific error message, consult with the compiler team to add new attributes for library code that allow adding more information. For example see #[rustc_on_unimplemented]. Use these annotations when available!
  • Keep in mind that Rust's learning curve is rather steep, and that the compiler messages are an important learning tool.
  • When talking about the compiler, call it the compiler, not Rust or rustc.

Lint naming

From RFC 0344, lint names should be consistent, with the following guidelines:

The basic rule is: the lint name should make sense when read as "allow lint-name" or "allow lint-name items". For example, "allow deprecated items" and "allow dead_code" makes sense, while "allow unsafe_block" is ungrammatical (should be plural).

  • Lint names should state the bad thing being checked for, e.g. deprecated, so that #[allow(deprecated)] (items) reads correctly. Thus ctypes is not an appropriate name; improper_ctypes is.

  • Lints that apply to arbitrary items (like the stability lints) should just mention what they check for: use deprecated rather than deprecated_items. This keeps lint names short. (Again, think "allow lint-name items".)

  • If a lint applies to a specific grammatical class, mention that class and use the plural form: use unused_variables rather than unused_variable. This makes #[allow(unused_variables)] read correctly.

  • Lints that catch unnecessary, unused, or useless aspects of code should use the term unused, e.g. unused_imports, unused_typecasts.

  • Use snake case in the same way you would for function names.

Diagnostic levels

Guidelines for different diagnostic levels:

  • error: emitted when the compiler detects a problem that makes it unable to compile the program, either because the program is invalid or the programmer has decided to make a specific warning into an error.

  • warning: emitted when the compiler detects something odd about a program. Care should be taken when adding warnings to avoid warning fatigue, and avoid false-positives where there really isn't a problem with the code. Some examples of when it is appropriate to issue a warning:

    • A situation where the user should take action, such as swap out a deprecated item, or use a Result, but otherwise doesn't prevent compilation.
    • Unnecessary syntax that can be removed without affecting the semantics of the code. For example, unused code, or unnecessary unsafe.
    • Code that is very likely to be incorrect, dangerous, or confusing, but the language technically allows, and is not ready or confident enough to make an error. For example unused_comparisons (out of bounds comparisons) or bindings_with_variant_name (the user likely did not intend to create a binding in a pattern).
    • Future-incompatible lints, where something was accidentally or erroneously accepted in the past, but rejecting would cause excessive breakage in the ecosystem.
    • Stylistic choices. For example, camel or snake case, or the dyn trait warning in the 2018 edition. These have a high bar to be added, and should only be used in exceptional circumstances. Other stylistic choices should either be allow-by-default lints, or part of other tools like Clippy or rustfmt.
  • help: emitted following an error or warning to give additional information to the user about how to solve their problem. These messages often include a suggestion string and rustc_errors::Applicability confidence level to guide automated source fixes by tools. See the Suggestions section for more details.

    The error or warning portion should not suggest how to fix the problem; only the "help" sub-diagnostic should.

  • note: emitted to identify additional circumstances and parts of the code that caused the warning or error. For example, the borrow checker will note any previous conflicting borrows.

Not to be confused with lint levels, whose guidelines are:

  • forbid: Lints should never default to forbid.

  • deny: Equivalent to error diagnostic level. Some examples:

    • A future-incompatible or edition-based lint that has graduated from the warning level.
    • Something that we have extremely high confidence is incorrect, but for which we still want an escape hatch to allow it to pass.
  • warn: Equivalent to the warning diagnostic level. See warning above for guidelines.

  • allow: Examples of the kinds of lints that should default to allow:

    • The lint has too high a false-positive rate.
    • The lint is too opinionated.
    • The lint is experimental.
    • The lint is used for enforcing something that is not normally enforced. For example, the unsafe_code lint can be used to prevent usage of unsafe code.

More information about lint levels can be found in the rustc book and the reference.

Helpful tips and options

Finding the source of errors

There are two main ways to find where a given error is emitted:

  • grep for either a sub-part of the error message/label or error code. This usually works well and is straightforward, but there are some cases where the error emitting code is removed from the code where the error is constructed behind a relatively deep call-stack. Even then, it is a good way to get your bearings.
  • Invoking rustc with the nightly-only flag -Ztreat-err-as-bug=1, which will treat the first error being emitted as an Internal Compiler Error. This allows you to use the environment variable RUST_BACKTRACE=full to get a stack trace at the point the error was emitted. Change the 1 to something else if you wish to trigger on a later error. This approach has some limitations: some calls get elided from the stack trace because they get inlined in the compiled rustc, and, as with the prior approach, the construction of the error can be far away from where it is emitted. In some cases we buffer multiple errors in order to emit them in order. An example invocation is shown below.
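
For example, a debugging session using that flag might look like this (the file name is illustrative):

$ RUST_BACKTRACE=full rustc mycode.rs -Ztreat-err-as-bug=1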

The regular development practices apply: judicious use of debug!() statements and use of a debugger to trigger break points in order to figure out in what order things are happening.

Span

Span is the primary data structure in rustc used to represent a location in the code being compiled. Spans are attached to most constructs in HIR and MIR, allowing for more informative error reporting.

A Span can be looked up in a SourceMap to get a "snippet" useful for displaying errors with span_to_snippet and other similar methods on the SourceMap.

Error messages

The rustc_errors crate defines most of the utilities used for reporting errors.

Session and ParseSess have methods (or fields with methods) that allow reporting errors. These methods usually have names like span_err or struct_span_err or span_warn, etc... There are lots of them; they emit different types of "errors", such as warnings, errors, fatal errors, suggestions, etc.

In general, there are two classes of such methods: ones that emit an error directly and ones that allow finer control over what to emit. For example, span_err emits the given error message at the given Span, but struct_span_err instead returns a DiagnosticBuilder.

DiagnosticBuilder allows you to add related notes and suggestions to an error before emitting it by calling the emit method. (Failing to either emit or cancel a DiagnosticBuilder will result in an ICE.) See the docs for more info on what you can do.

// Get a DiagnosticBuilder. This does _not_ emit an error yet.
let mut err = sess.struct_span_err(sp, "oh no! this is an error!");

// In some cases, you might need to check if `sp` is generated by a macro to
// avoid printing weird errors about macro-generated code.

if let Ok(snippet) = sess.source_map().span_to_snippet(sp) {
    // Use the snippet to generate a suggested fix
    err.span_suggestion(suggestion_sp, "try using a qux here", format!("qux {}", snippet));
} else {
    // If we weren't able to generate a snippet, then emit a "help" message
    // instead of a concrete "suggestion". In practice this is unlikely to be
    // reached.
    err.span_help(suggestion_sp, "you could use a qux here instead");
}

// emit the error
err.emit();

Alternatively, for less-complex diagnostics, the SessionDiagnostic derive macro can be used -- see Creating Errors With SessionDiagnostic.

Suggestions

In addition to telling the user exactly why their code is wrong, it's often also possible to tell them how to fix it. To this end, DiagnosticBuilder offers a structured suggestions API, which formats code suggestions pleasingly in the terminal, or (when the --error-format json flag is passed) as JSON for consumption by tools, most notably the Rust Language Server and rustfix.

Not all suggestions should be applied mechanically; they carry a degree of confidence in the suggested code, from high (Applicability::MachineApplicable) to low (Applicability::MaybeIncorrect). Be conservative when choosing the level. Use the span_suggestion method of DiagnosticBuilder to make a suggestion. The last argument provides a hint to tools whether the suggestion is mechanically applicable or not.

Suggestions point to one or more spans with corresponding code that will replace their current content.

The message that accompanies them should be understandable in the following contexts:

  • shown as an independent sub-diagnostic (this is the default output)
  • shown as a label pointing at the affected span (this is done automatically if some heuristics for verbosity are met)
  • shown as a help sub-diagnostic with no content (used for cases where the suggestion is obvious from the text, but we still want to allow tools to apply it)
  • not shown (used for very obvious cases, but we still want to allow tools to apply them)

For example, to make our qux suggestion machine-applicable, we would do:

let mut err = sess.struct_span_err(sp, "oh no! this is an error!");

if let Ok(snippet) = sess.source_map().span_to_snippet(sp) {
    err.span_suggestion(
        suggestion_sp,
        "try using a qux here",
        format!("qux {}", snippet),
        Applicability::MachineApplicable,
    );
} else {
    err.span_help(suggestion_sp, "you could use a qux here instead");
}

err.emit();

This might emit an error like

$ rustc mycode.rs
error[E0999]: oh no! this is an error!
 --> mycode.rs:3:5
  |
3 |     sad()
  |     ^ help: try using a qux here: `qux sad()`

error: aborting due to previous error

For more information about this error, try `rustc --explain E0999`.

In some cases, like when the suggestion spans multiple lines or when there are multiple suggestions, the suggestions are displayed on their own:

error[E0999]: oh no! this is an error!
 --> mycode.rs:3:5
  |
3 |     sad()
  |     ^
help: try using a qux here:
  |
3 |     qux sad()
  |     ^^^

error: aborting due to previous error

For more information about this error, try `rustc --explain E0999`.

The possible values of Applicability are:

  • MachineApplicable: Can be applied mechanically.
  • HasPlaceholders: Cannot be applied mechanically because it has placeholder text in the suggestions. For example, "Try adding a type: `let x: <type>`".
  • MaybeIncorrect: Cannot be applied mechanically because the suggestion may or may not be a good one.
  • Unspecified: Cannot be applied mechanically because we don't know which of the above cases it falls into.

Suggestion Style Guide

  • Suggestions should not be a question. In particular, language like "did you mean" should be avoided. Sometimes, it's unclear why a particular suggestion is being made. In these cases, it's better to be upfront about what the suggestion is.

    Compare "did you mean: Foo" vs. "there is a struct with a similar name: Foo".

  • The message should not contain any phrases like "the following", "as shown", etc. Use the span to convey what is being talked about.

  • The message may contain further instruction such as "to do xyz, use" or "to do xyz, use abc".

  • The message may contain a name of a function, variable, or type, but avoid whole expressions.

Lints

The compiler linting infrastructure is defined in the rustc_middle::lint module.

Declaring a lint

The built-in compiler lints are defined in the rustc_lint crate.

Every lint is implemented via a struct that implements the LintPass trait (you also implement one of the more specific lint pass traits, either EarlyLintPass or LateLintPass). The trait implementation allows you to check certain syntactic constructs as the linter walks the source code. You can then choose to emit lints in a very similar way to compile errors.

You also declare the metadata of a particular lint via the declare_lint! macro. This includes the name, the default level, a short description, and some more details.

Note that the lint and the lint pass must be registered with the compiler.

For example, the following lint checks for uses of while true { ... } and suggests using loop { ... } instead.

// Declare a lint called `WHILE_TRUE`
declare_lint! {
    WHILE_TRUE,

    // warn-by-default
    Warn,

    // This string is the lint description
    "suggest using `loop { }` instead of `while true { }`"
}

// This declares a struct and a lint pass, providing a list of associated lints. The
// compiler currently doesn't use the associated lints directly (e.g., to not
// run the pass or otherwise check that the pass emits the appropriate set of
// lints). However, it's good to be accurate here as it's possible that we're
// going to register the lints via the get_lints method on our lint pass (that
// this macro generates).
declare_lint_pass!(WhileTrue => [WHILE_TRUE]);

// Helper function for `WhileTrue` lint.
// Traverse through any amount of parentheses and return the first non-parens expression.
fn pierce_parens(mut expr: &ast::Expr) -> &ast::Expr {
    while let ast::ExprKind::Paren(sub) = &expr.kind {
        expr = sub;
    }
    expr
}

// `EarlyLintPass` has lots of methods. We only override the definition of
// `check_expr` for this lint because that's all we need, but you could
// override other methods for your own lint. See the rustc docs for a full
// list of methods.
impl EarlyLintPass for WhileTrue {
    fn check_expr(&mut self, cx: &EarlyContext<'_>, e: &ast::Expr) {
        if let ast::ExprKind::While(cond, ..) = &e.kind {
            if let ast::ExprKind::Lit(ref lit) = pierce_parens(cond).kind {
                if let ast::LitKind::Bool(true) = lit.kind {
                    if !lit.span.from_expansion() {
                        let msg = "denote infinite loops with `loop { ... }`";
                        let condition_span = cx.sess.source_map().guess_head_span(e.span);
                        cx.struct_span_lint(WHILE_TRUE, condition_span, |lint| {
                            lint.build(msg)
                                .span_suggestion_short(
                                    condition_span,
                                    "use `loop`",
                                    "loop".to_owned(),
                                    Applicability::MachineApplicable,
                                )
                                .emit();
                        })
                    }
                }
            }
        }
    }
}

Edition-gated Lints

Sometimes we want to change the behavior of a lint in a new edition. To do this, we just add the transition to our invocation of declare_lint!:

declare_lint! {
    pub ANONYMOUS_PARAMETERS,
    Allow,
    "detects anonymous parameters",
    Edition::Edition2018 => Warn,
}

This makes the ANONYMOUS_PARAMETERS lint allow-by-default in the 2015 edition but warn-by-default in the 2018 edition.

A future-incompatible lint should be declared with the @future_incompatible additional "field":

declare_lint! {
    pub ANONYMOUS_PARAMETERS,
    Allow,
    "detects anonymous parameters",
    @future_incompatible = FutureIncompatibleInfo {
        reference: "issue #41686 <https://github.com/rust-lang/rust/issues/41686>",
        edition: Some(Edition::Edition2018),
    };
}

If you need a combination of options that's not supported by the declare_lint! macro, you can always define your own static with a type of &Lint, but this is currently linted against in the compiler tree.

Guidelines for creating a future incompatibility lint

  • Create a lint defaulting to warn as normal, with ideally the same error message you would normally give.
  • Add a suitable reference, typically an RFC or tracking issue. Go ahead and include the full URL, sorting items in ascending order of issue numbers.
  • Later, change lint to error.
  • Eventually, remove lint.

Renaming or removing a lint

A lint can be renamed or removed, which will trigger a warning if a user tries to use the old lint name. To declare a rename/remove, add a line with store.register_renamed or store.register_removed to the code of the register_builtins function.

store.register_renamed("single_use_lifetime", "single_use_lifetimes");
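
Removals look similar; register_removed takes the old name and a reason that is shown to users (the lint name and reason here are illustrative):

store.register_removed("some_removed_lint", "converted into a hard error");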

Lint Groups

Lints can be turned on in groups. These groups are declared in the register_builtins function in rustc_lint::lib. The add_lint_group! macro is used to declare a new group.

For example,

add_lint_group!(sess,
    "nonstandard_style",
    NON_CAMEL_CASE_TYPES,
    NON_SNAKE_CASE,
    NON_UPPER_CASE_GLOBALS);

This defines the nonstandard_style group which turns on the listed lints. A user can turn on these lints with a #![warn(nonstandard_style)] attribute in the source code, or by passing -W nonstandard-style on the command line.

Linting early in the compiler

On occasion, you may need to define a lint that runs before the linting system has been initialized (e.g. during parsing or macro expansion). This is problematic because we need to have computed lint levels to know whether we should emit a warning or an error or nothing at all.

To solve this problem, we buffer the lints until the linting system has been initialized. Session and ParseSess both have buffer_lint methods that allow you to buffer a lint for later. The linting system automatically takes care of handling the buffered lints once it is ready.

Thus, to define a lint that runs early in the compilation, one defines a lint like normal but invokes the lint with buffer_lint.
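
A sketch of such a call site (the lint, span, and node id are assumed to already be in scope; the exact signature of buffer_lint has varied between versions):

self.sess.buffer_lint(
    MY_EARLY_LINT,                   // declared with `declare_lint!` as usual
    span,                            // where to report the lint
    node_id,                         // the AST node the lint is attached to
    "message for the buffered lint", // emitted once lint levels are known
);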

Linting even earlier in the compiler

The parser (rustc_ast) is interesting in that it cannot have dependencies on any of the other rustc* crates. In particular, it cannot depend on rustc_middle::lint or rustc_lint, where all of the compiler linting infrastructure is defined. That's troublesome!

To solve this, rustc_ast defines its own buffered lint type, which ParseSess::buffer_lint uses. After macro expansion, these buffered lints are then dumped into the Session::buffered_lints used by the rest of the compiler.

JSON diagnostic output

The compiler accepts an --error-format json flag to output diagnostics as JSON objects (for the benefit of tools such as cargo fix or the RLS). It looks like this:

$ rustc json_error_demo.rs --error-format json
{"message":"cannot add `&str` to `{integer}`","code":{"code":"E0277","explanation":"\nYou tried to use a type which doesn't implement some trait in a place which\nexpected that trait. Erroneous code example:\n\n```compile_fail,E0277\n// here we declare the Foo trait with a bar method\ntrait Foo {\n    fn bar(&self);\n}\n\n// we now declare a function which takes an object implementing the Foo trait\nfn some_func<T: Foo>(foo: T) {\n    foo.bar();\n}\n\nfn main() {\n    // we now call the method with the i32 type, which doesn't implement\n    // the Foo trait\n    some_func(5i32); // error: the trait bound `i32 : Foo` is not satisfied\n}\n```\n\nIn order to fix this error, verify that the type you're using does implement\nthe trait. Example:\n\n```\ntrait Foo {\n    fn bar(&self);\n}\n\nfn some_func<T: Foo>(foo: T) {\n    foo.bar(); // we can now use this method since i32 implements the\n               // Foo trait\n}\n\n// we implement the trait on the i32 type\nimpl Foo for i32 {\n    fn bar(&self) {}\n}\n\nfn main() {\n    some_func(5i32); // ok!\n}\n```\n\nOr in a generic context, an erroneous code example would look like:\n\n```compile_fail,E0277\nfn some_func<T>(foo: T) {\n    println!(\"{:?}\", foo); // error: the trait `core::fmt::Debug` is not\n                           //        implemented for the type `T`\n}\n\nfn main() {\n    // We now call the method with the i32 type,\n    // which *does* implement the Debug trait.\n    some_func(5i32);\n}\n```\n\nNote that the error here is in the definition of the generic function: Although\nwe only call it with a parameter that does implement `Debug`, the compiler\nstill rejects the function: It must work with all possible input types. In\norder to make this example compile, we need to restrict the generic type we're\naccepting:\n\n```\nuse std::fmt;\n\n// Restrict the input type to types that implement Debug.\nfn some_func<T: fmt::Debug>(foo: T) {\n    println!(\"{:?}\", foo);\n}\n\nfn main() {\n    // Calling the method is still fine, as i32 implements Debug.\n    some_func(5i32);\n\n    // This would fail to compile now:\n    // struct WithoutDebug;\n    // some_func(WithoutDebug);\n}\n```\n\nRust only looks at the signature of the called function, as such it must\nalready specify all requirements that will be used for every type parameter.\n"},"level":"error","spans":[{"file_name":"json_error_demo.rs","byte_start":50,"byte_end":51,"line_start":4,"line_end":4,"column_start":7,"column_end":8,"is_primary":true,"text":[{"text":"    a + b","highlight_start":7,"highlight_end":8}],"label":"no implementation for `{integer} + &str`","suggested_replacement":null,"suggestion_applicability":null,"expansion":null}],"children":[{"message":"the trait `std::ops::Add<&str>` is not implemented for `{integer}`","code":null,"level":"help","spans":[],"children":[],"rendered":null}],"rendered":"error[E0277]: cannot add `&str` to `{integer}`\n --> json_error_demo.rs:4:7\n  |\n4 |     a + b\n  |       ^ no implementation for `{integer} + &str`\n  |\n  = help: the trait `std::ops::Add<&str>` is not implemented for `{integer}`\n\n"}
{"message":"aborting due to previous error","code":null,"level":"error","spans":[],"children":[],"rendered":"error: aborting due to previous error\n\n"}
{"message":"For more information about this error, try `rustc --explain E0277`.","code":null,"level":"","spans":[],"children":[],"rendered":"For more information about this error, try `rustc --explain E0277`.\n"}

Note that the output is a series of lines, each of which is a JSON object, but the series of lines taken together is, unfortunately, not valid JSON, thwarting tools and tricks (such as piping to python3 -m json.tool) that require such. (One speculates that this was intentional for LSP performance purposes, so that each line/object can be sent to RLS as it is flushed?)
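
A tool consuming this output therefore has to parse each line on its own; a minimal sketch in Rust (assuming the serde_json crate as a dependency):

use std::io::{self, BufRead};

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    // Each line is a complete JSON object, but the lines taken together are
    // not a single JSON value, so we parse line by line.
    for line in stdin.lock().lines() {
        let diag: serde_json::Value = serde_json::from_str(&line?)
            .expect("each line should be a valid JSON object");
        if let Some(rendered) = diag.get("rendered").and_then(|r| r.as_str()) {
            eprint!("{}", rendered);
        }
    }
    Ok(())
}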

Also note the "rendered" field, which contains the "human" output as a string; this was introduced so that UI tests could both make use of the structured JSON and see the "human" output (well, sans colors) without having to compile everything twice.

The "human" readable and the json format emitter can be found under rustc_errors, both were moved from the rustc_ast crate to the rustc_errors crate.

The JSON emitter defines its own Diagnostic struct (and sub-structs) for the JSON serialization. Don't confuse this with errors::Diagnostic!

#[rustc_on_unimplemented(...)]

The #[rustc_on_unimplemented] attribute allows trait definitions to add specialized notes to error messages when an implementation was expected but not found. You can refer to the trait's generic arguments by name and to the resolved type using Self.

For example:

#![feature(rustc_attrs)]

#[rustc_on_unimplemented="an iterator over elements of type `{A}` \
    cannot be built from a collection of type `{Self}`"]
trait MyIterator<A> {
    fn next(&mut self) -> A;
}

fn iterate_chars<I: MyIterator<char>>(i: I) {
    // ...
}

fn main() {
    iterate_chars(&[1, 2, 3][..]);
}

When the user compiles this, they will see the following:

error[E0277]: the trait bound `&[{integer}]: MyIterator<char>` is not satisfied
  --> <anon>:14:5
   |
14 |     iterate_chars(&[1, 2, 3][..]);
   |     ^^^^^^^^^^^^^ an iterator over elements of type `char` cannot be built from a collection of type `&[{integer}]`
   |
   = help: the trait `MyIterator<char>` is not implemented for `&[{integer}]`
   = note: required by `iterate_chars`

rustc_on_unimplemented also supports advanced filtering for better targeting of messages, as well as modifying specific parts of the error message. You can target the text of:

  • the main error message (message)
  • the label (label)
  • an extra note (note)

For example, the following attribute

#[rustc_on_unimplemented(
    message="message",
    label="label",
    note="note"
)]
trait MyIterator<A> {
    fn next(&mut self) -> A;
}

Would generate the following output:

error[E0277]: message
  --> <anon>:14:5
   |
14 |     iterate_chars(&[1, 2, 3][..]);
   |     ^^^^^^^^^^^^^ label
   |
   = note: note
   = help: the trait `MyIterator<char>` is not implemented for `&[{integer}]`
   = note: required by `iterate_chars`

To allow more targeted error messages, it is possible to filter the application of these fields based on a variety of attributes when using on:

  • crate_local: whether the code causing the trait bound to not be fulfilled is part of the user's crate. This is used to avoid suggesting code changes that would require modifying a dependency.
  • Any of the generic arguments that can be substituted in the text can be referred to by name as well for filtering, like Rhs="i32", except for Self.
  • _Self: to filter only on a particular calculated trait resolution, like Self="std::iter::Iterator<char>". This is needed because Self is a keyword which cannot appear in attributes.
  • direct: user-specified rather than derived obligation.
  • from_method: usable both as boolean (whether the flag is present, like crate_local) or matching against a particular method. Currently used for try.
  • from_desugaring: usable both as boolean (whether the flag is present) or matching against a particular desugaring. The desugaring is identified with its variant name in the DesugaringKind enum.

For example, the Iterator trait can be annotated in the following way:

#[rustc_on_unimplemented(
    on(
        _Self="&str",
        note="call `.chars()` or `.as_bytes()` on `{Self}"
    ),
    message="`{Self}` is not an iterator",
    label="`{Self}` is not an iterator",
    note="maybe try calling `.iter()` or a similar method"
)]
pub trait Iterator {}

Which would produce the following outputs:

error[E0277]: `Foo` is not an iterator
 --> src/main.rs:4:16
  |
4 |     for foo in Foo {}
  |                ^^^ `Foo` is not an iterator
  |
  = note: maybe try calling `.iter()` or a similar method
  = help: the trait `std::iter::Iterator` is not implemented for `Foo`
  = note: required by `std::iter::IntoIterator::into_iter`

error[E0277]: `&str` is not an iterator
 --> src/main.rs:5:16
  |
5 |     for foo in "" {}
  |                ^^ `&str` is not an iterator
  |
  = note: call `.chars()` or `.as_bytes()` on `&str`
  = help: the trait `std::iter::Iterator` is not implemented for `&str`
  = note: required by `std::iter::IntoIterator::into_iter`

If you need to filter on multiple attributes, you can use all, any or not in the following way:

#[rustc_on_unimplemented(
    on(
        all(_Self="&str", T="std::string::String"),
        note="you can coerce a `{T}` into a `{Self}` by writing `&*variable`"
    )
)]
pub trait From<T>: Sized { /* ... */ }

Creating Errors With SessionDiagnostic

The SessionDiagnostic derive macro provides an alternative to the DiagnosticBuilder API for defining and emitting errors. It allows a struct to be annotated with information which allows it to be transformed and emitted as a Diagnostic.

As an example, we'll take a look at how the "field already declared" diagnostic is actually defined in the compiler (see the definition here and usage here):

#[derive(SessionDiagnostic)]
#[error = "E0124"]
pub struct FieldAlreadyDeclared {
    pub field_name: Ident,
    #[message = "field `{field_name}` is already declared"]
    #[label = "field already declared"]
    pub span: Span,
    #[label = "`{field_name}` first declared here"]
    pub prev_span: Span,
}
// ...
tcx.sess.emit_err(FieldAlreadyDeclared {
    field_name: f.ident,
    span: f.span,
    prev_span,
});

We see that using SessionDiagnostic is relatively straightforward. The #[error = "..."] attribute is used to supply the error code for the diagnostic. We then annotate fields in the struct with various information on how to convert an instance of the struct into a rendered diagnostic. The attributes above produce code which is roughly equivalent to the following (in pseudo-Rust):

impl SessionDiagnostic for FieldAlreadyDeclared {
    fn into_diagnostic(self, sess: &'_ rustc_session::Session) -> DiagnosticBuilder<'_> {
        let mut diag = sess.struct_err_with_code("", rustc_errors::DiagnosticId::Error("E0124"));
        diag.set_span(self.span);
        diag.set_primary_message(format!("field `{field_name}` is already declared", field_name = self.field_name));
        diag.span_label(self.span, "field already declared");
        diag.span_label(self.prev_span, format!("`{field_name}` first declared here", field_name = self.field_name));
        diag
    }
}

The generated code draws attention to a number of features. First, we see that within the strings passed to each attribute, field names can be referenced without needing to be passed explicitly into the format string -- in this example, #[message = "field `{field_name}` is already declared"] produces a call to format! with the appropriate arguments to format self.field_name into the string. This applies to strings passed to all attributes.

We also see that labelling Span fields in the struct produces calls which pass that Span to the produced diagnostic. In the example above, we see that putting the #[message = "..."] attribute on a Span leads to the primary span of the diagnostic being set to that Span, while applying the #[label = "..."] attribute on a Span will simply set the span for that label. Each attribute has different requirements for what it can be applied to, differing on position (on the struct, or on a specific field), type (if it's applied on a field), and whether or not the attribute is optional.

Attributes Listing

Below is a listing of all the currently-available attributes that #[derive(SessionDiagnostic)] understands:

| Attribute | Applied to | Mandatory | Behaviour |
|---|---|---|---|
| #[code = "..."] | Struct | Yes | Sets the Diagnostic's error code |
| #[message = "..."] | Struct / Span fields | Yes | Sets the Diagnostic's primary message. If on a Span field, also sets the Diagnostic's span. |
| #[label = "..."] | Span fields | No | Equivalent to calling span_label with that Span and message. |
| #[suggestion(message = "...", code = "...")] | (Span, Applicability) or Span fields | No | Equivalent to calling span_suggestion. Note code is optional. |
| #[suggestion_short(message = "...", code = "...")] | (Span, Applicability) or Span fields | No | Equivalent to calling span_suggestion_short. Note code is optional. |
| #[suggestion_hidden(message = "...", code = "...")] | (Span, Applicability) or Span fields | No | Equivalent to calling span_suggestion_hidden. Note code is optional. |
| #[suggestion_verbose(message = "...", code = "...")] | (Span, Applicability) or Span fields | No | Equivalent to calling span_suggestion_verbose. Note code is optional. |

Optional Diagnostic Attributes

There may be some cases where you want one of the decoration attributes to be applied optionally; for example, if a suggestion can only be generated sometimes. In this case, simply wrap the field's type in an Option. At runtime, if the field is set to None, the attribute for that field won't be used in creating the diagnostic. For example:

#[derive(SessionDiagnostic)]
#[code = "E0123"]
struct SomeKindOfError {
    ...
    #[suggestion(message = "informative error message")]
    opt_sugg: Option<(Span, Applicability)>,
    ...
}

Lints

This page documents some of the machinery around lint registration and how we run lints in the compiler.

The LintStore is the central piece of infrastructure, around which everything rotates. It's not available during the early parts of compilation (i.e., before TyCtxt) in most code, as we need to fill it in with all of the lints, which can only happen after plugin registration.

Lints vs. lint passes

There are two parts to the linting mechanism within the compiler: lints and lint passes. Unfortunately, a lot of the documentation we have refers to both of these as just "lints."

First, we have the lint declarations themselves: this is where the name and default lint level and other metadata come from. These are normally defined by way of the declare_lint! macro, which boils down to a static with type &rustc_session::lint::Lint. We lint against direct declarations without the use of the macro today (though this may change in the future, as the macro is somewhat unwieldy to add new fields to, like all macros by example).

Lint declarations don't carry any "state" - they are merely global identifiers and descriptions of lints. We assert at runtime that they are not registered twice (by lint name).

Lint passes are the meat of any lint. Notably, there is not a one-to-one relationship between lints and lint passes; a lint might not have any lint pass that emits it, it could have many, or just one -- the compiler doesn't track whether a pass is in any way associated with a particular lint, and frequently lints are emitted as part of other work (e.g., type checking, etc.).

Registration

High-level overview

The lint store is created and all lints are registered during plugin registration, in rustc_interface::register_plugins. There are three 'sources' of lints: the internal lints, plugin lints, and rustc_interface::Config register_lints. All are registered here, in register_plugins.

Once the registration is complete, we "freeze" the lint store by placing it in an Lrc. Later in the driver, it's passed into the GlobalCtxt constructor where it lives in an immutable form from then on.

Lints are registered via the LintStore::register_lint function. This should happen just once for any lint, or an ICE will occur.

Lint passes are registered separately into one of the categories (pre-expansion, early, late, late module). Passes are registered as a closure -- i.e., impl Fn() -> Box<dyn X>, where dyn X is either an early or late lint pass trait object. When we run the lint passes, we run the closure and then invoke the lint pass methods, which take &mut self -- lint passes can keep track of state internally.

Internal lints

Note that these include both rustc-internal lints and the traditional lints, such as the dead code lint.

These are primarily described in two places: rustc_session::lint::builtin and rustc_lint::builtin. The first provides the definitions for the lints themselves, and the latter provides the lint pass definitions (and implementations).

The internal lint registration happens in the rustc_lint::register_builtins function, along with the rustc_lint::register_internals function. More generally, the way to get a LintStore in the compiler is the "constructor" function rustc_lint::new_lint_store (you should not construct a LintStore directly); it calls the registration functions.

Plugin lints

This is one of the primary use cases remaining for plugins/drivers. Plugins are given access to the mutable LintStore during registration to call any functions they need on the LintStore, just like rustc code. Plugins are intended to declare lints with the plugin field set to true (e.g., by way of the declare_tool_lint! macro), but this is purely for diagnostics and help text; otherwise plugin lints are mostly just as first class as rustc builtin lints.

Driver lints

These are the lints provided by drivers via the rustc_interface::Config register_lints field, which is a callback. If a driver finds the field already set, it should call the previously-set function from within the callback it adds, as sketched below. The best way for drivers to get access to this is by overriding the Callbacks::config function, which gives them direct access to the Config structure.
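
A sketch of that pattern (MyCallbacks, MY_LINT, and MyLintPass are hypothetical; the exact types may differ between versions):

impl rustc_driver::Callbacks for MyCallbacks {
    fn config(&mut self, config: &mut rustc_interface::Config) {
        // Preserve whatever callback an earlier layer may have installed.
        let previous = config.register_lints.take();
        config.register_lints = Some(Box::new(move |sess, store| {
            if let Some(previous) = &previous {
                // Call the previously-set function first, as recommended.
                previous(sess, store);
            }
            // Then register our own lints and lint passes.
            store.register_lints(&[MY_LINT]);
            store.register_late_pass(|| Box::new(MyLintPass));
        }));
    }
}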

Compiler lint passes are combined into one pass

Within the compiler, for performance reasons, we usually do not register dozens of lint passes. Instead, we have a single lint pass of each variety (e.g. BuiltinCombinedModuleLateLintPass) which will internally call all of the individual lint passes; this is because then we get the benefits of static over dynamic dispatch for each of the (often empty) trait methods.

Ideally, we'd not have to do this, since it certainly adds to the complexity of understanding the code. However, with the current type-erased lint store approach, it is beneficial to do so for performance reasons.

New lints being added likely want to join one of the existing declarations like late_lint_mod_passes in rustc_lint/src/lib.rs, which would then auto-propagate into the other.

Diagnostic Codes

We generally try to assign each error message a unique code like E0123. These codes are defined in the compiler in the diagnostics.rs files found in each crate, which basically consist of macros. The codes come in two varieties: those that have an extended write-up, and those that do not. Whenever possible, if you are making a new code, you should write an extended write-up.

Allocating a fresh code

Error codes are stored in compiler/rustc_error_codes.

To create a new error, you first need to find the next available code. You can find it with tidy:

./x.py test tidy

This will invoke the tidy script, which generally checks that your code obeys our coding conventions. One of those jobs is to check that diagnostic codes are indeed unique. Once it is finished with that, tidy will print out the lowest unused code:

...
tidy check (x86_64-apple-darwin)
* 470 error codes
* highest error code: E0591
...

Here we see the highest error code in use is E0591, so we probably want E0592. To be sure, run rg E0592 and check; you should see no references.

Ideally, you will write an extended description for your error, which will go in rustc_error_codes/src/error_codes/E0592.md. To register the error, open rustc_error_codes/src/error_codes.rs and add the code (in its proper numerical order) into the register_diagnostics! macro, like this:


register_diagnostics! {
    ...
    E0592: include_str!("./error_codes/E0592.md"),
}

But you can also add it without an extended description:


register_diagnostics! {
    ...
    E0592, // put a description here
}

To actually issue the error, you can use the struct_span_err! macro:


struct_span_err!(self.tcx.sess, // some path to the session here
                 span, // whatever span in the source you want
                 E0592, // your new error code
                 &format!("text of the error"))
    .emit() // actually issue the error

If you want to add notes or other snippets, you can invoke methods before you call .emit():


struct_span_err!(...)
    .span_label(another_span, "something to label in the source")
    .span_note(another_span, "some separate note, probably avoid these")
    .emit()

For an example of a PR adding an error code, see #76143.

From MIR to Binaries

All of the preceding chapters of this guide have one thing in common: we never generated any executable machine code at all! With this chapter, all of that changes.

So far, we've shown how the compiler can take raw source code in text format and transform it into MIR. We have also shown how the compiler does various analyses on the code to detect things like type or lifetime errors. Now, we will finally take the MIR and produce some executable machine code.

NOTE: This part of a compiler is often called the backend. The term is a bit overloaded because in the compiler source, it usually refers to the "codegen backend" (i.e. LLVM or Cranelift). Usually, when you see the word "backend" in this part, we are referring to the "codegen backend".

So what do we need to do?

  1. First, we need to collect the set of things to generate code for. In particular, we need to find out which concrete types to substitute for generic ones, since we need to generate code for the concrete types. Generating code for the concrete types (i.e. emitting a copy of the code for each concrete type) is called monomorphization, so the process of collecting all the concrete types is called monomorphization collection.
  2. Next, we need to actually lower the MIR to a codegen IR (usually LLVM IR) for each concrete type we collected.
  3. Finally, we need to invoke LLVM or Cranelift, which runs a bunch of optimization passes, generates executable code, and links together an executable binary.

The code for codegen is actually a bit complex due to a few factors:

  • Support for multiple codegen backends (LLVM and Cranelift). We try to share as much backend code between them as possible, so a lot of it is generic over the codegen implementation. This means that there are often a lot of layers of abstraction.
  • Codegen happens asynchronously in another thread for performance.
  • The actual codegen is done by a third-party library (either LLVM or Cranelift).

Generally, the rustc_codegen_ssa crate contains backend-agnostic code (i.e. independent of LLVM or Cranelift), while the rustc_codegen_llvm crate contains code specific to LLVM codegen.

At a very high level, the entry point is rustc_codegen_ssa::base::codegen_crate. This function starts the process discussed in the rest of this chapter.

MIR optimizations

MIR optimizations are optimizations run on the MIR to produce better MIR before codegen. This is important for two reasons: first, it makes the final generated executable code better, and second, it means that LLVM has less work to do, so compilation is faster. Note that since MIR is generic (not monomorphized yet), these optimizations are particularly effective; we can optimize the generic version, so all of the monomorphizations are cheaper!

MIR optimizations run after borrow checking. We run a series of optimization passes over the MIR to improve it. Some passes are required to run on all code, some passes don't actually do optimizations but only check stuff, and some passes are only turned on in release mode.

The optimized_mir query is called to produce the optimized MIR for a given DefId. This query makes sure that the borrow checker has run and that some validation has occurred. Then, it steals the MIR, optimizes it, and returns the improved MIR.

Defining optimization passes

The list of passes run and the order in which they are run is defined by the run_optimization_passes function. It contains an array of &dyn MirPass trait objects, one per pass to run; each pass is a struct that implements the MirPass trait. Typically, a pass is implemented in its own submodule of the rustc_mir::transform module.
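
A skeleton of a new pass might look like the following (a sketch; the exact MirPass signature has changed over time):

pub struct MyPass;

impl<'tcx> MirPass<'tcx> for MyPass {
    fn run_pass(&self, tcx: TyCtxt<'tcx>, body: &mut Body<'tcx>) {
        // Inspect and rewrite the basic blocks of `body` in place here.
    }
}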

Some examples of passes are:

  • CleanupNonCodegenStatements: remove some of the info that is only needed for analyses, rather than codegen.
  • ConstProp: Does constant propagation

You can see the "Implementors" section of the MirPass rustdocs for more examples.

MIR optimization levels

MIR optimizations can come in various levels of readiness. Experimental optimizations may cause miscompilations, or slow down compile times. These passes are still included in nightly builds to gather feedback and make it easier to modify the pass. To enable working with slow or otherwise experimental optimization passes, you can specify the -Z mir-opt-level debug flag. You can find the definitions of the levels in the compiler MCP. If you are developing a MIR pass and want to query whether your optimization pass should run, you can check the current level using tcx.sess.opts.debugging_opts.mir_opt_level.
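
For example, the experimental parts of a pass might be gated like this (a sketch; the exact path to the option has moved around between versions):

// Only run the experimental part of the pass at higher MIR opt levels.
if tcx.sess.opts.debugging_opts.mir_opt_level < 2 {
    return;
}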

Optimization fuel

Optimization fuel is a compiler option (-Z fuel=<crate>=<value>) that allows for fine-grained control over which optimizations can be applied during compilation: each optimization reduces fuel by 1, and when fuel reaches 0 no more optimizations are applied. The primary use of fuel is debugging optimizations that may be incorrect or misapplied. By changing the fuel value, you can bisect a compilation session down to the exact incorrect optimization (this behaves like a kind of binary search through the optimizations).

MIR optimizations respect fuel, and in general each pass should check fuel by calling tcx.consider_optimizing and skipping the optimization if fuel is empty. There are a few considerations:

  1. If the pass is considered "guaranteed" (for example, it should always be run because it is needed for correctness), then fuel should not be used. An example of this is PromoteTemps.
  2. In some cases, an initial pass is performed to gather candidates, which are then iterated to perform optimizations. In these situations, we should allow for the initial gathering pass and then check fuel as close to the mutation as possible. This allows for the best debugging experience, because we can determine where in the list of candidates an optimization may have been misapplied. Examples of this are InstCombine and ConstantPropagation.
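
In line with the second consideration, a pass might check fuel immediately before each mutation (a sketch; consider_optimizing takes a closure producing a human-readable description for the fuel log):

for candidate in candidates {
    // Stop transforming as soon as the session runs out of fuel.
    if !tcx.consider_optimizing(|| format!("MyPass {:?}", candidate)) {
        return;
    }
    // ... apply the optimization to this candidate ...
}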

MIR Debugging

The -Zdump-mir flag can be used to dump a text representation of the MIR. The following optional flags, used in combination with -Zdump-mir, enable additional output formats, including:

  • -Zdump-mir-graphviz - dumps a .dot file that represents MIR as a control-flow graph
  • -Zdump-mir-dataflow - dumps a .dot file showing the dataflow state at each point in the control-flow graph
  • -Zdump-mir-spanview - dumps an .html file that highlights the source spans associated with MIR elements (including mouse-over actions to reveal elements obscured by overlaps, and tooltips to view the MIR statements). This flag takes an optional value: statement (the default), terminator, or block, to generate span highlights with different levels of granularity.

-Zdump-mir=F is a handy compiler option that lets you view the MIR for each function at each stage of compilation. -Zdump-mir takes a filter F which allows you to control which functions and which passes you are interested in. For example:

> rustc -Zdump-mir=foo ...

This will dump the MIR for any function whose name contains foo; it will dump the MIR both before and after every pass. Those files will be created in the mir_dump directory. There will likely be quite a lot of them!

> cat > foo.rs
fn main() {
    println!("Hello, world!");
}
^D
> rustc -Zdump-mir=main foo.rs
> ls mir_dump/* | wc -l
     161

The files have names like rustc.main.000-000.CleanEndRegions.after.mir. These names have a number of parts:

rustc.main.000-000.CleanEndRegions.after.mir
      ---- --- --- --------------- ----- either before or after
      |    |   |   name of the pass
      |    |   index of dump within the pass (usually 0, but some passes dump intermediate states)
      |    index of the pass
      def-path to the function etc being dumped

You can also make more selective filters. For example, main & CleanEndRegions will select for things that reference both main and the pass CleanEndRegions:

> rustc -Zdump-mir='main & CleanEndRegions' foo.rs
> ls mir_dump
rustc.main.000-000.CleanEndRegions.after.mir	rustc.main.000-000.CleanEndRegions.before.mir

Filters can also have | parts to combine multiple sets of &-filters. For example main & CleanEndRegions | main & NoLandingPads will select either main and CleanEndRegions or main and NoLandingPads:

> rustc -Zdump-mir='main & CleanEndRegions | main & NoLandingPads' foo.rs
> ls mir_dump
rustc.main-promoted[0].002-000.NoLandingPads.after.mir
rustc.main-promoted[0].002-000.NoLandingPads.before.mir
rustc.main-promoted[0].002-006.NoLandingPads.after.mir
rustc.main-promoted[0].002-006.NoLandingPads.before.mir
rustc.main-promoted[1].002-000.NoLandingPads.after.mir
rustc.main-promoted[1].002-000.NoLandingPads.before.mir
rustc.main-promoted[1].002-006.NoLandingPads.after.mir
rustc.main-promoted[1].002-006.NoLandingPads.before.mir
rustc.main.000-000.CleanEndRegions.after.mir
rustc.main.000-000.CleanEndRegions.before.mir
rustc.main.002-000.NoLandingPads.after.mir
rustc.main.002-000.NoLandingPads.before.mir
rustc.main.002-006.NoLandingPads.after.mir
rustc.main.002-006.NoLandingPads.before.mir

(Here, the main-promoted[0] files refer to the MIR for "promoted constants" that appeared within the main function.)

TODO: anything else?

Constant Evaluation

Constant evaluation is the process of computing values at compile time. For a specific item (constant/static/array length) this happens after the MIR for the item is borrow-checked and optimized. In many cases trying to const evaluate an item will trigger the computation of its MIR for the first time.

Prominent examples are:

  • The initializer of a static
  • Array length
    • needs to be known to reserve stack or heap space
  • Enum variant discriminants
    • needs to be known to prevent two variants from having the same discriminant
  • Patterns
    • need to be known to check for overlapping patterns

Additionally, constant evaluation can be used to reduce the workload or binary size at runtime by precomputing complex operations at compile time and only storing the result.

Constant evaluation can be done by calling the const_eval_* functions of TyCtxt. They're the wrappers of the const_eval query.

The const_eval_* functions take a ParamEnv describing the environment in which the constant is evaluated (e.g. the function within which the constant is used) and a GlobalId. The GlobalId is made up of an Instance referring to a constant or static, or of an Instance of a function and an index into the function's Promoted table.

Constant evaluation returns an EvalToConstValueResult with either the error, or a representation of the constant. static initializers are always represented as miri virtual memory allocations (via ConstValue::ByRef). Other constants get represented as ConstValue::Scalar or ConstValue::Slice if possible. This means that the const_eval_* functions cannot be used to create miri-pointers to the evaluated constant. If you need the value of a constant inside Miri, you need to directly work with eval_const_to_op.

Miri

Miri (MIR Interpreter) is a virtual machine for executing MIR without compiling to machine code. It is usually invoked via tcx.const_eval_* functions.

If you start out with a constant:


const FOO: usize = 1 << 12;

rustc doesn't actually invoke anything until the constant is either used or placed into metadata.

Once you have a use-site like:

type Foo = [u8; FOO - 42];

The compiler needs to figure out the length of the array before being able to create items that use the type (locals, constants, function arguments, ...).

To obtain the (in this case empty) parameter environment, one can call let param_env = tcx.param_env(length_def_id);. The GlobalId needed is

let gid = GlobalId {
    promoted: None,
    instance: Instance::mono(length_def_id),
};

Invoking tcx.const_eval(param_env.and(gid)) will now trigger the creation of the MIR of the array length expression. The MIR will look something like this:

Foo::{{constant}}#0: usize = {
    let mut _0: usize;
    let mut _1: (usize, bool);

    bb0: {
        _1 = CheckedSub(const FOO, const 42usize);
        assert(!move (_1.1: bool), "attempt to subtract with overflow") -> bb1;
    }

    bb1: {
        _0 = move (_1.0: usize);
        return;
    }
}

Before the evaluation, a virtual memory location (in this case essentially a vec![u8; 4] or vec![u8; 8]) is created for storing the evaluation result.

At the start of the evaluation, _0 and _1 are Operand::Immediate(Immediate::Scalar(ScalarMaybeUndef::Undef)). This is quite a mouthful: Operand can represent either data stored somewhere in the interpreter memory (Operand::Indirect), or (as an optimization) immediate data stored in-line. And Immediate can either be a single (potentially uninitialized) scalar value (integer or thin pointer), or a pair of two of them. In our case, the single scalar value is not (yet) initialized.
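
A simplified standalone sketch of those shapes (the real definitions in the interpreter carry more information, such as sizes and pointer provenance):

#![allow(dead_code)] // sketch of the interpreter's value representation

/// A single (potentially uninitialized) scalar: an integer or thin pointer.
enum ScalarMaybeUndef {
    Undef,
    Scalar(u128),
}

/// Immediate data stored in-line rather than in interpreter memory.
enum Immediate {
    Scalar(ScalarMaybeUndef),
    /// A pair of scalars, e.g. (value, overflow flag) for a checked op.
    ScalarPair(ScalarMaybeUndef, ScalarMaybeUndef),
}

/// Either in-line immediate data (an optimization), or a reference to
/// data stored somewhere in the interpreter's memory.
enum Operand {
    Immediate(Immediate),
    Indirect(u64), // stand-in for a place in interpreter memory
}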

When the initialization of _1 is invoked, the value of the FOO constant is required, and triggers another call to tcx.const_eval_*, which will not be shown here. If the evaluation of FOO is successful, 42 will be subtracted from its value 4096 and the result stored in _1 as Operand::Immediate(Immediate::ScalarPair(Scalar::Raw { data: 4054, .. }, Scalar::Raw { data: 0, .. })). The first part of the pair is the computed value, the second part is a bool that's true if an overflow happened. A Scalar::Raw also stores the size (in bytes) of this scalar value; we are eliding that here.

The next statement asserts that said boolean is 0. In case the assertion fails, its error message is used for reporting a compile-time error.

Since it does not fail, Operand::Immediate(Immediate::Scalar(Scalar::Raw { data: 4054, .. })) is stored in the virtual memory that was allocated before the evaluation. _0 always refers to that location directly.

After the evaluation is done, the return value is converted from Operand to ConstValue by op_to_const: the former representation is geared towards what is needed during const evaluation, while ConstValue is shaped by the needs of the remaining parts of the compiler that consume the results of const evaluation. As part of this conversion, for types with scalar values, even if the resulting Operand is Indirect, it will return an immediate ConstValue::Scalar(computed_value) (instead of the usual ConstValue::ByRef). This makes using the result much more efficient and also more convenient, as no further queries need to be executed in order to get at something as simple as a usize.

Future evaluations of the same constants will not actually invoke Miri, but just use the cached result.

Datastructures

Miri's outside-facing datastructures can be found in rustc_middle/src/mir/interpret. This is mainly the error enum and the ConstValue and Scalar types. A ConstValue can be either Scalar (a single Scalar, i.e., integer or thin pointer), Slice (to represent byte slices and strings, as needed for pattern matching) or ByRef, which is used for anything else and refers to a virtual allocation. These allocations can be accessed via the methods on tcx.interpret_interner. A Scalar is either some Raw integer or a pointer; see the next section for more on that.

If you are expecting a numeric result, you can use eval_usize (panics on anything that can't be represented as a u64) or try_eval_usize, which returns an Option<u64> yielding the value if possible.
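
As a hedged sketch of how this might look inside the compiler (paths and signatures vary across rustc versions, so treat this as illustrative):

use rustc_middle::ty::{self, TyCtxt};

/// Illustrative only: evaluate an array-length constant to a u64 if possible.
fn array_len<'tcx>(
    tcx: TyCtxt<'tcx>,
    param_env: ty::ParamEnv<'tcx>,
    len: &'tcx ty::Const<'tcx>,
) -> Option<u64> {
    // Returns None if the value can't be evaluated or doesn't fit a u64;
    // `eval_usize` would panic in those cases instead.
    len.try_eval_usize(tcx, param_env)
}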

Memory

To support any kind of pointers, Miri needs to have a "virtual memory" that the pointers can point to. This is implemented in the Memory type. In the simplest model, every global variable, stack variable and every dynamic allocation corresponds to an Allocation in that memory. (Actually using an allocation for every MIR stack variable would be very inefficient; that's why we have Operand::Immediate for stack variables that are both small and never have their address taken. But that is purely an optimization.)

Such an Allocation is basically just a sequence of u8 storing the value of each byte in this allocation. (Plus some extra data, see below.) Every Allocation has a globally unique AllocId assigned in Memory. With that, a Pointer consists of a pair of an AllocId (indicating the allocation) and an offset into the allocation (indicating which byte of the allocation the pointer points to). It may seem odd that a Pointer is not just an integer address, but remember that during const evaluation, we cannot know at which actual integer address the allocation will end up -- so we use AllocId as symbolic base addresses, which means we need a separate offset. (As an aside, it turns out that pointers at run-time are more than just integers, too.)

These allocations exist so that references and raw pointers have something to point to. There is no global linear heap in which things are allocated, but each allocation (be it for a local variable, a static or a (future) heap allocation) gets its own little memory with exactly the required size. So if you have a pointer to an allocation for a local variable a, there is no possible (no matter how unsafe) operation that you can do that would ever change said pointer to a pointer to a different local variable b. Pointer arithmetic on a will only ever change its offset; the AllocId stays the same.

This, however, causes a problem when we want to store a Pointer into an Allocation: we cannot turn it into a sequence of u8 of the right length! AllocId and offset together are twice as big as a pointer "seems" to be. This is what the relocation field of Allocation is for: the byte offset of the Pointer gets stored as a bunch of u8, while its AllocId gets stored out-of-band. The two are reassembled when the Pointer is read from memory. The other bit of extra data an Allocation needs is undef_mask for keeping track of which of its bytes are initialized.
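
A simplified standalone model of these structures (illustrative only; the real types live in rustc_middle/src/mir/interpret):

#![allow(dead_code)] // simplified model of Miri's memory, as described above

/// Globally unique, symbolic base address of one allocation.
#[derive(Clone, Copy)]
struct AllocId(u64);

/// A pointer is a symbolic base plus a byte offset -- not a raw integer,
/// since the final address is unknown during const evaluation.
struct Pointer {
    alloc_id: AllocId,
    offset: u64,
}

/// One allocation: raw bytes plus the extra data the bytes alone cannot carry.
struct Allocation {
    bytes: Vec<u8>,
    /// Out-of-band AllocIds for pointers stored in this allocation,
    /// keyed by the byte offset where the pointer's offset bytes live.
    relocations: Vec<(u64, AllocId)>,
    /// Which bytes have been initialized.
    undef_mask: Vec<bool>,
}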

Global memory and exotic allocations

Memory exists only during the Miri evaluation; it gets destroyed when the final value of the constant is computed. In case that constant contains any pointers, those get "interned" and moved to a global "const eval memory" that is part of TyCtxt. These allocations stay around for the remaining computation and get serialized into the final output (so that dependent crates can use them).

Moreover, to also support function pointers, the global memory in TyCtxt can also contain "virtual allocations": instead of an Allocation, these contain an Instance. That allows a Pointer to point to either normal data or a function, which is needed to be able to evaluate casts from function pointers to raw pointers.

Finally, the GlobalAlloc type used in the global memory also contains a variant Static that points to a particular const or static item. This is needed to support circular statics, where we need to have a Pointer to a static for which we cannot yet have an Allocation as we do not know the bytes of its value.

Pointer values vs Pointer types

One common cause of confusion in Miri is that being a pointer value and having a pointer type are entirely independent properties. By "pointer value", we refer to a Scalar::Ptr containing a Pointer and thus pointing somewhere into Miri's virtual memory. This is in contrast to Scalar::Raw, which is just some concrete integer.

However, a variable of pointer or reference type, such as *const T or &T, does not have to have a pointer value: it could be obtained by casting or transmuting an integer to a pointer (currently that is hard to do in const eval, but eventually transmute will be stable as a const fn). And similarly, when casting or transmuting a reference to some actual allocation to an integer, we end up with a pointer value (Scalar::Ptr) at integer type (usize). This is a problem because we cannot meaningfully perform integer operations such as division on pointer values.
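
As a standalone (runtime) illustration of the second case: the cast below produces a value of integer type, but during const evaluation the underlying value would still be a pointer value (Scalar::Ptr), because the final address is not known:

fn main() {
    let x = 42u8;
    // Integer *type*, pointer *value*: at compile time Miri cannot turn
    // this into a concrete integer, so arithmetic like division on it
    // would not be meaningful there.
    let addr = &x as *const u8 as usize;
    println!("{:#x}", addr);
}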

Interpretation

Although the main entry point to constant evaluation is the tcx.const_eval_* functions, there are additional functions in rustc_mir/src/const_eval.rs that allow accessing the fields of a ConstValue (ByRef or otherwise). You should never have to access an Allocation directly except for translating it to the compilation target (at the moment just LLVM).

Miri starts by creating a virtual stack frame for the current constant that is being evaluated. There's essentially no difference between a constant and a function with no arguments, except that constants do not allow local (named) variables at the time of writing this guide.

A stack frame is defined by the Frame type in rustc_mir/src/interpret/eval_context.rs and contains all the local variables' memory (None at the start of evaluation). Each frame refers to the evaluation of either the root constant or subsequent calls to const fn. The evaluation of another constant simply calls tcx.const_eval_*, which produces an entirely new and independent stack frame.

The frames are just a Vec<Frame>; there's no way to actually refer to a Frame's memory, even if horrible shenanigans are done via unsafe code. The only memory that can be referred to are Allocations.

Miri now calls the step method (in rustc_mir/src/interpret/step.rs) until it either returns an error or has no further statements to execute. Each statement will now initialize or modify the locals or the virtual memory referred to by a local. This might require evaluating other constants or statics, which just recursively invokes tcx.const_eval_*.
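
The contract of that driver loop can be modeled with a tiny standalone sketch (illustrative names only; the real types live in rustc_mir::interpret):

/// Toy stand-in for the interpreter: `step` executes one statement.
struct Interp {
    remaining_statements: u32,
}

impl Interp {
    /// Ok(true): executed a step; Ok(false): nothing left; Err: eval error.
    fn step(&mut self) -> Result<bool, String> {
        if self.remaining_statements == 0 {
            return Ok(false);
        }
        self.remaining_statements -= 1;
        Ok(true)
    }
}

fn main() -> Result<(), String> {
    let mut interp = Interp { remaining_statements: 3 };
    // Keep stepping until done or until an error is returned.
    while interp.step()? {}
    Ok(())
}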

Monomorphization

As you probably know, Rust has a very expressive type system that has extensive support for generic types. But of course, assembly is not generic, so we need to figure out the concrete types of all the generics before the code can execute.

Different languages handle this problem differently. For example, in some languages, such as Java, we may not know the most precise type of a value until runtime. In the case of Java, this is ok because (almost) all variables are reference values anyway (i.e. pointers to a heap-allocated object). This flexibility comes at the cost of performance, since all accesses to an object must dereference a pointer.

Rust takes a different approach: it monomorphizes all generic types. This means that the compiler stamps out a different copy of the code of a generic function for each concrete type needed. For example, if I use a Vec<u64> and a Vec<String> in my code, then the generated binary will have two copies of the generated code for Vec: one for Vec<u64> and another for Vec<String>. The result is fast programs, but it comes at the cost of compile time (creating all those copies can take a while) and binary size (all those copies might take a lot of space).
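
A standalone illustration: each distinct instantiation below gets its own compiled copy of the generic function.

fn push_twice<T: Clone>(v: &mut Vec<T>, x: T) {
    v.push(x.clone());
    v.push(x);
}

fn main() {
    let mut ints: Vec<u64> = Vec::new();
    let mut strs: Vec<String> = Vec::new();
    push_twice(&mut ints, 1);                // instantiates push_twice::<u64>
    push_twice(&mut strs, "hi".to_string()); // instantiates push_twice::<String>
}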

Monomorphization is the first step in the backend of the rust compiler.

Collection

First, we need to figure out what concrete types we need for all the generic things in our program. This is called collection, and the code that does this is called the monomorphization collector.

Take this example:

fn peach<T>() {}

fn banana() {
    peach::<u64>();
}

fn main() {
    banana();
}

The monomorphization collector will give you a list of [main, banana, peach::<u64>]. These are the functions that will have machine code generated for them. The collector will also add things like statics to that list.

See the collector rustdocs for more info.

The monomorphization collector is run just before MIR lowering and codegen. rustc_codegen_ssa::base::codegen_crate calls the collect_and_partition_mono_items query, which does monomorphization collection and then partitions the collected items into codegen units.

Codegen Unit (CGU) partitioning

For better incremental build times, the CGU partitioner creates two CGUs for each source-level module. One is for "stable" i.e. non-generic code, and the other is for more volatile code, i.e. monomorphized/specialized instances.

For dependencies, consider Crate A and Crate B, such that Crate B depends on Crate A. The following lists different scenarios for a function in Crate A that might be used by one or more modules in Crate B.

  • Non-generic function: the Crate A function doesn't appear in any codegen unit of Crate B.
  • Non-generic #[inline] function: the Crate A function appears within a single CGU of Crate B, and exists even after the post-inlining stage.
  • Generic function: regardless of inlining, all monomorphized (specialized) functions from Crate A appear within a single codegen unit for Crate B. The codegen unit exists even after the post-inlining stage.
  • Generic #[inline] function: same behavior as a plain generic function.

For more details about the partitioner, read the module-level documentation.

Polymorphization

As mentioned above, monomorphization produces fast code, but it comes at the cost of compile time and binary size. MIR optimizations can help a bit with this.

In addition to MIR optimizations, rustc attempts to determine when fewer copies of functions are necessary and avoid making those copies - known as "polymorphization". When a function-like item is found during monomorphization collection, the rustc_mir::monomorphize::polymorphize::unused_generic_params query is invoked, which traverses the MIR of the item to determine which of its generic parameters are unused and therefore need not be duplicated.

Currently, polymorphization only looks for unused generic parameters. These are relatively rare in functions, but closures inherit the generic parameters of their parent function and it is common for closures to not use those inherited parameters. Without polymorphization, a copy of these closures would be created for each copy of the parent function. By creating fewer copies, less LLVM IR is generated and needs to be processed.
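
For instance (an illustrative example), the closure below inherits T from its parent but never uses it, so a single copy can serve every instantiation of the parent:

fn parent<T>() -> u32 {
    // This closure inherits `T` but does not use it; polymorphization
    // avoids duplicating it for every instantiation of `parent`.
    let f = || 1u32;
    f()
}

fn main() {
    assert_eq!(parent::<u8>(), 1);
    assert_eq!(parent::<String>(), 1);
}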

unused_generic_params returns a FiniteBitSet<u64> where a bit is set if the generic parameter of the corresponding index is unused. Any parameters after the first sixty-four are considered used.
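
Conceptually, the returned bits are read like this (a sketch of the encoding only, not the actual FiniteBitSet API):

/// Bit `i` set means generic parameter `i` is unused.
fn is_unused(bits: u64, param_index: u32) -> bool {
    // Parameters beyond the first sixty-four are conservatively
    // considered used.
    param_index < 64 && (bits >> param_index) & 1 == 1
}

fn main() {
    let bits = 0b01; // parameter 0 unused, parameter 1 used
    assert!(is_unused(bits, 0));
    assert!(!is_unused(bits, 1));
}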

The results of polymorphization analysis are used in the Instance::polymorphize function to replace the Instance's substitutions for the unused generic parameters with their identity substitutions.

Consider the example below:

fn foo<A, B>() {
    let x: Option<B> = None;
}

fn main() {
    foo::<u16, u32>();
    foo::<u64, u32>();
}

During monomorphization collection, foo will be collected with the substitutions [u16, u32] and [u64, u32] (from its invocations in main). foo has the identity substitutions [A, B] (or [ty::Param(0), ty::Param(1)]).

Polymorphization will identify A as being unused and it will be replaced in the substitutions with the identity parameter before being added to the set of collected items - thereby reducing the copies from two ([u16, u32] and [u64, u32]) to one ([A, u32]).

unused_generic_params will also be invoked during code generation, when the symbol name for foo is being computed for use in the call sites of foo (which have the regular substitutions present; otherwise there would be a symbol mismatch between the caller and the function).

As a result of polymorphization, items collected during monomorphization cannot be assumed to be monomorphic.

It is intended that polymorphization be extended to more advanced cases, such as where only the size/alignment of a generic parameter are required.

More details on polymorphization are available in the master's thesis associated with polymorphization's initial implementation.

Lowering MIR to a Codegen IR

Now that we have a list of symbols to generate from the collector, we need to generate some sort of codegen IR. In this chapter, we will assume LLVM IR, since that's what rustc usually uses. The actual monomorphization is performed as we go, while we do the translation.

Recall that the backend is started by rustc_codegen_ssa::base::codegen_crate. Eventually, this reaches rustc_codegen_ssa::mir::codegen_mir, which does the lowering from MIR to LLVM IR.

The code is split into modules which handle particular MIR primitives:

  • rustc_codegen_ssa::mir::block translates blocks and their terminators, including the code for function calls and the necessary unwinding handling
  • rustc_codegen_ssa::mir::statement translates MIR statements
  • rustc_codegen_ssa::mir::operand translates MIR operands
  • rustc_codegen_ssa::mir::place translates MIR place references
  • rustc_codegen_ssa::mir::rvalue translates MIR rvalues

Before a function is translated, a number of simple and primitive analysis passes will run to help us generate simpler and more efficient LLVM IR. An example of such an analysis pass would be figuring out which variables are SSA-like, so that we can translate them to SSA directly rather than relying on LLVM's mem2reg for those variables. The analysis can be found in rustc_codegen_ssa::mir::analyze.

Usually a single MIR basic block will map to a LLVM basic block, with very few exceptions: intrinsic or function calls and less basic MIR statements like assert can result in multiple basic blocks. This is a perfect lede into the non-portable LLVM-specific part of the code generation. Intrinsic generation is fairly easy to understand as it involves very few abstraction levels in between and can be found in rustc_codegen_llvm::intrinsic.

Everything else will use the builder interface. This is the code that gets called in the rustc_codegen_ssa::mir::* modules discussed above.

TODO: discuss how constants are generated

Code generation

Code generation or "codegen" is the part of the compiler that actually generates an executable binary. Usually, rustc uses LLVM for code generation; there is also support for Cranelift. The key is that rustc doesn't implement codegen itself. It's worth noting, though, that in the rust source code, many parts of the backend have codegen in their names (there are no hard boundaries).

NOTE: If you are looking for hints on how to debug code generation bugs, please see this section of the debugging chapter.

What is LLVM?

LLVM is "a collection of modular and reusable compiler and toolchain technologies". In particular, the LLVM project contains a pluggable compiler backend (also called "LLVM"), which is used by many compiler projects, including the clang C compiler and our beloved rustc.

LLVM takes input in the form of LLVM IR. It is basically assembly code with additional low-level types and annotations added. These annotations are helpful for doing optimizations on the LLVM IR and the resulting machine code. The end result of all this is (at long last) something executable (e.g. an ELF object, an EXE, or wasm).

There are a few benefits to using LLVM:

  • We don't have to write a whole compiler backend. This reduces implementation and maintenance burden.
  • We benefit from the large suite of advanced optimizations that the LLVM project has been collecting.
  • We can automatically compile Rust to any of the platforms for which LLVM has support. For example, as soon as LLVM added support for wasm, voila! rustc, clang, and a bunch of other languages were able to compile to wasm! (Well, there was some extra stuff to be done, but we were 90% there anyway).
  • We and other compiler projects benefit from each other. For example, when the Spectre and Meltdown security vulnerabilities were discovered, only LLVM needed to be patched.

Running LLVM, linking, and metadata generation

Once LLVM IR for all of the functions and statics, etc. is built, it is time to start running LLVM and its optimization passes. LLVM IR is grouped into "modules". Multiple "modules" can be codegened at the same time to aid in multi-core utilization. These "modules" are what we refer to as codegen units. These units were established way back during the monomorphization collection phase.

Once LLVM produces objects from these modules, these objects are passed to the linker, optionally along with the metadata object, and an archive or an executable is produced.

It is not necessarily the codegen phase described above that runs the optimizations. With certain kinds of LTO, the optimization might happen at link time instead. It is also possible for some optimizations to happen before objects are passed on to the linker and some to happen during the linking.

This all happens towards the very end of compilation. The code for this can be found in rustc_codegen_ssa::back and rustc_codegen_llvm::back. Sadly, this piece of code is not really well-separated from LLVM-dependent code; rustc_codegen_ssa contains a fair amount of code specific to the LLVM backend.

Once these components are done with their work you end up with a number of files in your filesystem corresponding to the outputs you have requested.

Updating LLVM

The Rust compiler uses LLVM as its primary codegen backend today, and naturally we want to at least occasionally update this dependency! Currently we do not have a strict policy about when to update LLVM or what it can be updated to, but a few guidelines are applied:

  • We try to always support the latest released version of LLVM
  • We try to support the "last few" versions of LLVM (how many is changing over time)
  • We allow moving to arbitrary commits during development.
  • Strongly prefer to upstream all patches to LLVM before including them in rustc.

This policy may change over time (or may actually start to exist as a formal policy!), but for now these are rough guidelines!

Why update LLVM?

There are a few reasons nowadays that we want to update LLVM in one way or another:

  • A bug could have been fixed! Often we find bugs in the compiler and fix them upstream in LLVM. We'll want to pull fixes back to the compiler itself as they're merged upstream.

  • A new feature may be available in LLVM that we want to use in rustc, but we don't want to wait for a full LLVM release to test it out.

  • LLVM itself may have a new release and we'd like to update to this LLVM release.

Each of these reasons has a different strategy for updating LLVM, and we'll go over them in detail here.

Bugfix Updates

For updates of LLVM that are to fix a small bug, we cherry-pick the bugfix to the branch we're already using. The steps for this are:

  1. Make sure the bugfix is in upstream LLVM.
  2. Identify the branch that rustc is currently using. The src/llvm-project submodule is always pinned to a branch of the rust-lang/llvm-project repository.
  3. Fork the rust-lang/llvm-project repository.
  4. Check out the appropriate branch (typically named rustc/a.b-yyyy-mm-dd).
  5. Cherry-pick the upstream commit onto the branch.
  6. Push this branch to your fork.
  7. Send a Pull Request to rust-lang/llvm-project to the same branch as before. Be sure to reference the Rust and/or LLVM issue that you're fixing in the PR description.
  8. Wait for the PR to be merged.
  9. Send a PR to rust-lang/rust updating the src/llvm-project submodule with your bugfix. This can typically be done locally with git submodule update --remote src/llvm-project.
  10. Wait for that PR to be merged.

The tl;dr is that we can cherry-pick bugfixes at any time and pull them back into the rust-lang/llvm-project branch that we're using, and getting the fix into the compiler is just a matter of updating the submodule via a PR!

Example PRs look like: #59089

Feature updates

Note that this is all information as of the time of writing. The process for updating LLVM changes with practically all LLVM updates, so this may be out of date!

Unlike bugfixes, updating to pick up a new feature of LLVM typically requires a lot more work. This is where we can't reasonably cherry-pick commits backwards so we need to do a full update. There's a lot of stuff to do here, so let's go through each in detail.

  1. Create a new branch in the rust-lang/llvm-project repository. This branch should be named rustc/a.b-yyyy-mm-dd where a.b is the current version number of LLVM in-tree at the time of the branch and the remaining part is today's date. Move this branch to the commit in LLVM that you'd like, which for this is probably the current LLVM HEAD.

  2. Apply Rust-specific patches to the llvm-project repository. All features and bugfixes are upstream, but there's often some weird build-related patches that don't make sense to upstream, which we have on our repositories. These patches are typically the latest patches on the rust-lang/llvm-project branch that rustc is currently using.

  3. Build the new LLVM in the rust repository. To do this you'll want to update the src/llvm-project repository to your branch and the revision you've created. It's also typically a good idea to update .gitmodules with the new branch name of the LLVM submodule. Make sure you've committed changes to src/llvm-project to ensure submodule updates aren't reverted. Some commands you should execute are:

    • ./x.py build src/llvm - test that LLVM still builds
    • ./x.py build src/tools/lld - same for LLD
    • ./x.py build - build the rest of rustc

    You'll likely need to update src/rustllvm/*.cpp to compile with updated LLVM bindings. Note that you should use #ifdef and such to ensure that the bindings still compile on older LLVM versions.

  4. Test for regressions across other platforms. LLVM often has at least one bug for non-tier-1 architectures, so it's good to do some more testing before sending this to bors! If you're low on resources you can send the PR as-is now to bors, though, and it'll get tested anyway.

    Ideally, build LLVM and test it on a few platforms:

    • Linux
    • OSX
    • Windows

    and afterwards run some docker containers that CI also does:

    • ./src/ci/docker/run.sh wasm32-unknown
    • ./src/ci/docker/run.sh arm-android
    • ./src/ci/docker/run.sh dist-various-1
    • ./src/ci/docker/run.sh dist-various-2
    • ./src/ci/docker/run.sh armhf-gnu
  5. Prepare a PR to rust-lang/rust. Work with maintainers of rust-lang/llvm-project to get your commit in a branch of that repository, and then you can send a PR to rust-lang/rust. You'll change at least src/llvm-project and will likely also change src/rustllvm/* as well.

For prior art, previous LLVM updates look like #55835 #47828 #62474 #62592. Note that sometimes it's easiest to land src/rustllvm/* compatibility as a PR before actually updating src/llvm-project. This way while you're working through LLVM issues others interested in trying out the new LLVM can benefit from work you've done to update the C++ bindings.

Caveats and gotchas

Ideally the above instructions are pretty smooth, but here are some caveats to keep in mind while going through them:

  • LLVM bugs are hard to find, don't hesitate to ask for help! Bisection is definitely your friend here (yes LLVM takes forever to build, yet bisection is still your friend)
  • If you've got general questions, @alexcrichton can help you out.
  • Creating branches is a privileged operation on GitHub, so you'll need someone with write access to create the branches for you most likely.

New LLVM Release Updates

Updating to a new release of LLVM is very similar to the "feature updates" section above. The release process for LLVM is often months-long, though, and we like to ensure compatibility ASAP. The main tweak relative to the "feature updates" process is generally around branch naming. The sequence of events typically looks like:

  1. LLVM announces that its latest release version has branched. This will show up as a branch in https://github.com/llvm/llvm-project typically named release/$N.x where $N is the version of LLVM that's being released.

  2. We then follow the "feature updates" section above to create a new branch of LLVM in our rust-lang/llvm-project repository. This follows the same naming convention of branches as usual, except that a.b is the new version. This update is eventually landed in the rust-lang/rust repository.

  3. Over the next few months, LLVM will continually push commits to its release/a.b branch. Often those are bug fixes we'd like to have as well. The merge process for that is to use git merge itself to merge LLVM's release/a.b branch with the branch created in step 2. This is typically done multiple times when necessary while LLVM's release branch is baking.

  4. LLVM then announces the release of version a.b.

  5. After LLVM's official release, we follow the "feature update" section again to create a new branch in the rust-lang/llvm-project repository, this time with a new date. The commit history should look much cleaner as just a few Rust-specific commits stacked on top of stock LLVM's release branch.

Debugging LLVM

NOTE: If you are looking for info about code generation, please see this chapter instead.

This section is about debugging compiler bugs in code generation (e.g. why the compiler generated some piece of code or crashed in LLVM). LLVM is a big project on its own that probably needs to have its own debugging document (not that I could find one). But here are some tips that are important in a rustc context:

As a general rule, compilers generate lots of information from analyzing code. Thus, a useful first step is usually to find a minimal example. One way to do this is to

  1. create a new crate that reproduces the issue (e.g. adding whatever crate is at fault as a dependency, and using it from there)

  2. minimize the crate by removing external dependencies; that is, moving everything relevant to the new crate

  3. further minimize the issue by making the code shorter (there are tools that help with this like creduce)

The official compilers (including nightlies) have LLVM assertions disabled, which means that LLVM assertion failures can show up as compiler crashes (not ICEs but "real" crashes) and other sorts of weird behavior. If you are encountering these, it is a good idea to try using a compiler with LLVM assertions enabled - either an "alt" nightly or a compiler you build yourself by setting [llvm] assertions=true in your config.toml - and see whether anything turns up.

The rustc build process builds the LLVM tools into ./build/<host-triple>/llvm/bin. They can be called directly.

The default rustc compilation pipeline has multiple codegen units, which is hard to replicate manually and means that LLVM is called multiple times in parallel. If you can get away with it (i.e. if it doesn't make your bug disappear), passing -C codegen-units=1 to rustc will make debugging easier.

For rustc to generate LLVM IR, you need to pass the --emit=llvm-ir flag. If you are building via cargo, use the RUSTFLAGS environment variable (e.g. RUSTFLAGS='--emit=llvm-ir'). This causes rustc to spit out LLVM IR into the target directory.

cargo llvm-ir [options] path spits out the LLVM IR for a particular function at path. (cargo install cargo-asm installs cargo asm and cargo llvm-ir). --build-type=debug emits code for debug builds. There are also other useful options. Also, debug info in LLVM IR can clutter the output a lot: RUSTFLAGS="-C debuginfo=0" is really useful.

RUSTFLAGS="-C save-temps" outputs LLVM bitcode (not the same as IR) at different stages during compilation, which is sometimes useful. One just needs to convert the bitcode files to .ll files using llvm-dis which should be in the target local compilation of rustc.

If you want to play with the optimization pipeline, you can use the opt tool from ./build/<host-triple>/llvm/bin/ with the LLVM IR emitted by rustc. Note that rustc emits different IR depending on whether -O is enabled, even without LLVM's optimizations, so if you want to play with the IR rustc emits, you should:

$ rustc +local my-file.rs --emit=llvm-ir -O -C no-prepopulate-passes \
    -C codegen-units=1
$ OPT=./build/$TRIPLE/llvm/bin/opt
$ $OPT -S -O2 < my-file.ll > my

If you just want to get the LLVM IR during the LLVM pipeline, to e.g. see which IR causes an optimization-time assertion to fail, or to see when LLVM performs a particular optimization, you can pass the rustc flag -C llvm-args=-print-after-all, and possibly add -C llvm-args='-filter-print-funcs=EXACT_FUNCTION_NAME' (e.g. -C llvm-args='-filter-print-funcs=_ZN11collections3str21_$LT$impl$u20$str$GT$7replace17hbe10ea2e7c809b0bE').

That produces a lot of output into standard error, so you'll want to pipe that to some file. Also, if you are using neither -filter-print-funcs nor -C codegen-units=1, then, because the multiple codegen units run in parallel, the printouts will mix together and you won't be able to read anything.

If you want just the IR for a specific function (say, you want to see why it causes an assertion or doesn't optimize