NOTE: The structure of the repository is going through a lot of transitions. In particular, we eventually want the top-level directory to have separate directories for the compiler, build system, standard libraries, etc., rather than one huge directory.
As of January 2021, the standard libraries have been moved to library/ and the crates that make up the rustc compiler itself have been moved to compiler/.
Now that we have seen what the compiler does, let's take a look at the structure of the contents of the rust-lang/rust repo.
The rust-lang/rust repository consists of a single large cargo workspace containing the compiler, the standard libraries (std, alloc, proc_macro, etc.), and rustdoc, along with the build system and a bunch of tools and submodules for building a full Rust distribution.
The repository consists of three main directories:
- compiler/ contains the source code for rustc. It consists of many crates that together make up the compiler.
- library/ contains the standard libraries (std, alloc, proc_macro, test, etc.), as well as the Rust runtime.
- src/ contains the source code for rustdoc, clippy, cargo, the build system, language docs, etc.
The standard library crates are all in library/. They have intuitive names like std and alloc. There are also proc_macro, test, and other runtime libraries.
This code is fairly similar to most other Rust crates except that it must be built in a special way because it can use unstable features.
You may find it helpful to read the Overview chapter first, which gives an overview of how the compiler works. The crates mentioned in this section implement the compiler and live under compiler/. These crates all have names starting with rustc_*. They are a collection of around 50 interdependent crates ranging in size from tiny to huge. There is also the rustc crate, which is the actual binary (i.e. the main function); it doesn't actually do anything besides calling the rustc_driver crate, which drives the various parts of compilation in other crates.
The dependency structure of these crates is complex, but roughly it is something like this:
- rustc (the binary) calls into rustc_driver.
  - rustc_driver depends on a lot of other crates, but the main one is rustc_interface.
    - rustc_interface depends on most of the other compiler crates. It is a fairly generic interface for driving the whole compilation.
      - Most of the other rustc_* crates depend on rustc_middle, which defines a lot of central data structures in the compiler.
You can see the exact dependencies by reading the
Cargo.toml for the various
crates, just like a normal Rust crate.
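To make the layering more concrete, here is a toy, single-file sketch in which modules stand in for crates. The module names mirror the real crates, but every type and function inside them (Span, Item, run_compiler, main_with_args) is hypothetical and drastically simplified. The only point it tries to show is that each layer uses the layers below it and never the ones above.

```rust
// Modules stand in for crates; names mirror real crates, but every type and
// function here is a hypothetical, drastically simplified stand-in.

mod rustc_span {
    // Bottom of the tree: small types used by the whole compiler.
    #[derive(Clone, Copy, Debug, PartialEq)]
    pub struct Span {
        pub lo: usize,
        pub hi: usize,
    }
}

mod rustc_middle {
    // Central data structures; uses only the bottom layer.
    use crate::rustc_span::Span;

    #[derive(Debug)]
    pub struct Item {
        pub name: String,
        pub span: Span,
    }
}

mod rustc_interface {
    // Drives the whole (toy) compilation using the layers below.
    use crate::rustc_middle::Item;
    use crate::rustc_span::Span;

    pub fn run_compiler(source: &str) -> Vec<Item> {
        // Stand-in for parsing and analysis: each whitespace-separated word
        // becomes an `Item` whose span is its byte range in `source`.
        let mut items = Vec::new();
        let mut pos = 0;
        for word in source.split_whitespace() {
            let lo = pos + source[pos..].find(word).expect("word comes from source");
            let hi = lo + word.len();
            items.push(Item { name: word.to_string(), span: Span { lo, hi } });
            pos = hi;
        }
        items
    }
}

mod rustc_driver {
    // Reads the command line, then hands off to rustc_interface.
    pub fn main_with_args(args: &[String]) -> Result<usize, String> {
        // Hypothetical CLI: the first argument is the source text itself.
        let source = args.get(1).ok_or("usage: rustc <source>")?;
        let items = crate::rustc_interface::run_compiler(source);
        for item in &items {
            println!("{} at {:?}", item.name, item.span);
        }
        Ok(items.len())
    }
}

// The `rustc` binary: does nothing but call into rustc_driver.
fn main() {
    let args: Vec<String> = std::env::args().collect();
    if let Err(msg) = rustc_driver::main_with_args(&args) {
        eprintln!("{msg}");
    }
}
```

Note that main only knows about rustc_driver, and rustc_span knows about nobody; in the real compiler, Cargo.toml is what enforces this direction.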
One final thing: src/llvm-project is a submodule for our fork of LLVM. During bootstrapping, LLVM is built, and a compiler crate provides Rust wrappers around LLVM (which is written in C++) so that the compiler can interface with it.
Most of this book is about the compiler, so we won't have any further explanation of these crates here.
The dependency structure is influenced strongly by two main factors:
- Organization. The compiler is a huge codebase; as a single crate, it would be impossibly large. In part, the dependency structure reflects the code structure of the compiler.
- Compile time. By breaking the compiler into multiple crates, we can take better advantage of incremental/parallel compilation using cargo. In particular, we try to have as few dependencies between crates as possible so that changing one crate forces as few other crates as possible to be rebuilt.
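The compile-time argument can be illustrated with a small calculation: when a crate changes, that crate and every crate that transitively depends on it must be rebuilt. The sketch below computes that set for a hypothetical, heavily simplified dependency graph; the edges in example_graph are illustrative assumptions, not the real graph.

```rust
use std::collections::{BTreeSet, HashMap};

// Hypothetical, heavily simplified crate graph: crate -> crates it depends on.
fn example_graph() -> HashMap<&'static str, Vec<&'static str>> {
    HashMap::from([
        ("rustc_span", vec![]),
        ("rustc_parse", vec!["rustc_span"]),
        ("rustc_middle", vec!["rustc_span"]),
        ("rustc_interface", vec!["rustc_parse", "rustc_middle"]),
        ("rustc_driver", vec!["rustc_interface"]),
    ])
}

// Returns the crates that must be rebuilt after `changed` is edited: the
// crate itself plus everything that transitively depends on it.
fn rebuild_set<'a>(deps: &HashMap<&'a str, Vec<&'a str>>, changed: &'a str) -> BTreeSet<&'a str> {
    let mut dirty = BTreeSet::from([changed]);
    // Iterate to a fixed point: mark any crate that has a dirty dependency.
    loop {
        let before = dirty.len();
        for (krate, krate_deps) in deps {
            if krate_deps.iter().any(|d| dirty.contains(d)) {
                dirty.insert(*krate);
            }
        }
        if dirty.len() == before {
            return dirty;
        }
    }
}

fn main() {
    let deps = example_graph();
    for changed in ["rustc_span", "rustc_parse", "rustc_middle", "rustc_driver"] {
        println!("{changed}: rebuild {:?}", rebuild_set(&deps, changed));
    }
}
```

In this toy graph, editing rustc_span dirties all five crates, while editing rustc_parse leaves rustc_span and rustc_middle untouched. Fewer dependency edges mean smaller rebuild sets.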
At the very bottom of the dependency tree are a handful of crates that are used
by the whole compiler (e.g.
rustc_span). The very early parts of the
compilation process (e.g. parsing and the AST) depend on only these.
Pretty soon after the AST is constructed, the compiler's query system gets set up. The query system is built in a clever way using function pointers, which lets us break dependencies between crates and allows more parallel compilation.
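The function-pointer idea can be sketched as follows, under simplifying assumptions: a low-level module (standing in for rustc_middle) declares a table of query function pointers, the modules that implement queries are plugged into that table, and only the top-level module knows about all of them. Providers, TyCtxt, type_of, and is_sized here are hypothetical simplifications, not the real query-system API.

```rust
// Modules stand in for crates. `Providers`, `TyCtxt`, `type_of`, and
// `is_sized` are hypothetical simplifications of the real query system.

mod middle {
    // Everyone depends on this module. It declares the queries as
    // function pointers but does not implement them.
    pub struct Providers {
        pub type_of: fn(&TyCtxt<'_>, u32) -> String,
        pub is_sized: fn(&TyCtxt<'_>, u32) -> bool,
    }

    pub struct TyCtxt<'a> {
        pub providers: &'a Providers,
    }

    impl TyCtxt<'_> {
        // Each query dispatches through the provider table at run time.
        pub fn type_of(&self, id: u32) -> String {
            (self.providers.type_of)(self, id)
        }

        pub fn is_sized(&self, id: u32) -> bool {
            (self.providers.is_sized)(self, id)
        }
    }
}

mod typeck {
    use crate::middle::TyCtxt;

    // Implements `type_of`; depends only on `middle`.
    pub fn type_of(_tcx: &TyCtxt<'_>, id: u32) -> String {
        // Hypothetical rule: item 0 is a `str`, everything else a `u32`.
        if id == 0 { "str".to_string() } else { "u32".to_string() }
    }
}

mod layout {
    use crate::middle::TyCtxt;

    // Implements `is_sized` by invoking the `type_of` query through `tcx`,
    // with no compile-time dependency on `typeck`.
    pub fn is_sized(tcx: &TyCtxt<'_>, id: u32) -> bool {
        tcx.type_of(id) != "str"
    }
}

mod interface {
    use crate::middle::{Providers, TyCtxt};

    // Top of the tree: the only module that knows every provider.
    pub fn run<R>(f: impl FnOnce(&TyCtxt<'_>) -> R) -> R {
        let providers = Providers {
            type_of: crate::typeck::type_of,
            is_sized: crate::layout::is_sized,
        };
        let tcx = TyCtxt { providers: &providers };
        f(&tcx)
    }
}

fn main() {
    interface::run(|tcx| {
        for id in 0..2 {
            println!("item {id}: type {}, sized: {}", tcx.type_of(id), tcx.is_sized(id));
        }
    });
}
```

Here layout calls type_of without naming typeck at all: the call goes through the table at run time, so the two modules only need to see middle and could, as separate crates, be compiled in parallel.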
However, since the query system is defined in
rustc_middle, nearly all
subsequent parts of the compiler depend on this crate. It is a really large
crate, leading to long compile times. Some efforts have been made to move code
out of it, with limited success. Another unfortunate side effect is that related
functionality sometimes gets scattered across different crates. For example,
linting functionality is scattered across earlier parts of the compiler,
rustc_middle, and other places.
More generally, in an ideal world, it seems like there would be fewer, more cohesive crates, with incremental and parallel compilation making sure compile times stay reasonable. However, our incremental and parallel compilation haven't gotten good enough for that yet, so breaking things into separate crates has been our solution so far.
At the top of the dependency tree are the rustc_interface and rustc_driver crates.
rustc_interface is an unstable wrapper around the
query system that helps to drive the various stages of compilation. Other
consumers of the compiler may use this interface in different ways (e.g.
rustdoc or maybe eventually rust-analyzer). The
rustc_driver crate first
parses command line arguments and then uses
rustc_interface to drive the
compilation to completion.
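As a rough illustration of why a separate interface layer helps, the sketch below has one generic run_compiler function used by two consumers: a driver-like one that builds its configuration from command-line arguments, and a rustdoc-like one that stops after analysis. Config, Callbacks, run_compiler, parse_args, and the --no-codegen flag are all hypothetical; the real rustc_interface API is unstable and much larger.

```rust
// Hypothetical sketch: one generic compilation entry point, two consumers.

/// What to compile and whether to generate code.
struct Config {
    input: String,
    emit_code: bool,
}

/// Hooks a consumer can implement to observe or cut short compilation.
trait Callbacks {
    /// Called after analysis; returning `false` stops before code generation.
    fn after_analysis(&mut self, _items: &[String]) -> bool {
        true
    }
}

/// Runs the stages in order and returns the names of the stages that ran.
fn run_compiler(config: &Config, callbacks: &mut dyn Callbacks) -> Vec<&'static str> {
    let mut ran = vec!["parse"];
    // Stand-in for parsing: each whitespace-separated word is an "item".
    let items: Vec<String> = config.input.split_whitespace().map(str::to_string).collect();
    ran.push("analysis");
    if callbacks.after_analysis(&items) && config.emit_code {
        ran.push("codegen");
    }
    ran
}

/// Driver-like consumer: default callbacks plus command-line parsing.
struct DefaultCallbacks;

impl Callbacks for DefaultCallbacks {}

fn parse_args(args: &[String]) -> Config {
    // Hypothetical flag: `--no-codegen` stops after analysis.
    let emit_code = !args.iter().any(|a| a == "--no-codegen");
    // Every other argument after the program name is treated as source text.
    let input = args
        .iter()
        .skip(1)
        .filter(|a| !a.starts_with("--"))
        .cloned()
        .collect::<Vec<_>>()
        .join(" ");
    Config { input, emit_code }
}

/// Rustdoc-like consumer: wants analysis results, never code generation.
struct DocCallbacks {
    documented: Vec<String>,
}

impl Callbacks for DocCallbacks {
    fn after_analysis(&mut self, items: &[String]) -> bool {
        self.documented = items.to_vec();
        false
    }
}

fn main() {
    let args: Vec<String> = std::env::args().collect();
    let config = parse_args(&args);

    let driver_stages = run_compiler(&config, &mut DefaultCallbacks);
    println!("driver ran {driver_stages:?}");

    let mut doc = DocCallbacks { documented: Vec::new() };
    let doc_stages = run_compiler(&config, &mut doc);
    println!("doc ran {doc_stages:?}, documented {:?}", doc.documented);
}
```

In this sketch the interface owns the stage ordering, while each consumer only decides how far to go and what to look at along the way, which is roughly the kind of reuse described above for rustdoc.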
You can read more about rustdoc in the rustdoc chapter.
The test harness itself is in
There are a number of tools in the repository just for building the compiler, the standard library, rustdoc, etc., as well as for testing and for building a full Rust distribution.
There are a lot of other things in the rust-lang/rust repo that are related to building a full Rust distribution. Most of the time you don't need to worry about them. These include:
- src/ci: The CI configuration. This is actually quite extensive because we run a lot of tests on a lot of platforms.
- src/doc: Various documentation, including submodules for a few books.
- src/etc: Miscellaneous utilities.
- src/tools/rustc-workspace-hack, and others: Various workarounds to make cargo work with bootstrapping.
- And more...