What Bootstrapping does
Bootstrapping is the process of using a compiler to compile itself. More accurately, it means using an older compiler to compile a newer version of the same compiler.
This raises a chicken-and-egg paradox: where did the first compiler come from?
It must have been written in a different language. In Rust's case it was
written in OCaml. However, that OCaml compiler was abandoned long ago, and the
only way to build a modern version of rustc is with a slightly less modern version.
This is exactly how ./x.py works: it downloads the current beta release of
rustc, then uses it to compile the new compiler.
Note that this documentation mostly covers user-facing information. See bootstrap/README.md to read about bootstrap internals.
Stages of bootstrapping
Overview
- Stage 0: the pre-compiled compiler and standard library
- Stage 1: from current code, by an earlier compiler
- Stage 2: the truly current compiler
- Stage 3: the same-result test
Compiling rustc is done in stages. Here's a diagram, adapted from Jynn
Nelson's talk on bootstrapping at RustConf 2022, with
detailed explanations below.
The A, B, C, and D show the ordering of the stages of bootstrapping.
Blue nodes are downloaded, yellow nodes are built with the stage0 compiler,
and green nodes are built with the stage1 compiler.
graph TD
s0c["stage0 compiler (1.86.0-beta.1)"]:::downloaded -->|A| s0l("stage0 std (1.86.0-beta.1)"):::downloaded;
s0c & s0l --- stepb[ ]:::empty;
stepb -->|B| s0ca["stage0 compiler artifacts (1.87.0-dev)"]:::with-s0c;
s0ca -->|copy| s1c["stage1 compiler (1.87.0-dev)"]:::with-s0c;
s1c -->|C| s1l("stage1 std (1.87.0-dev)"):::with-s1c;
s1c & s1l --- stepd[ ]:::empty;
stepd -->|D| s1ca["stage1 compiler artifacts (1.87.0-dev)"]:::with-s1c;
s1ca -->|copy| s2c["stage2 compiler"]:::with-s1c;
classDef empty width:0px,height:0px;
classDef downloaded fill: lightblue;
classDef with-s0c fill: yellow;
classDef with-s1c fill: lightgreen;
Stage 0: the pre-compiled compiler
The stage0 compiler is, by default, the current beta rustc compiler and its
associated dynamic libraries, which ./x.py will download for you. (You can
also configure ./x.py to change stage0 to something else.)
The precompiled stage0 compiler is then used only to compile src/bootstrap and compiler/rustc
with precompiled stage0 std.
Note that to build the stage1 compiler we use the precompiled stage0 compiler and std. Therefore, to use a compiler with a std that is freshly built from the tree, you need to build the stage2 compiler.
There are two concepts at play here: a compiler (with its set of dependencies) and its
'target' or 'object' libraries (std and rustc). Both are staged, but in a staggered manner.
Stage 1: from current code, by an earlier compiler
The rustc source code is then compiled with the stage0 compiler to produce the
stage1 compiler.
Stage 2: the truly current compiler
We then rebuild the compiler using the stage1 compiler with in-tree std to produce the stage2
compiler.
The stage1 compiler itself was built by the precompiled stage0 compiler and std
and hence not by the source in your working directory. This means that the ABI
generated by the stage0 compiler may not match the ABI that would have been made
by the stage1 compiler, which can cause problems for dynamic libraries, tests
and tools using rustc_private.
Note that the proc_macro crate avoids this issue with a C FFI layer called
proc_macro::bridge, allowing it to be used with stage1.
The stage2 compiler is the one distributed with rustup and all other install
methods. However, it takes a very long time to build because one must first
build the new compiler with an older compiler and then use that to build the new
compiler with itself.
For development, you usually only want to use the --stage 1 flag to build things.
See Building the compiler.
Stage 3: the same-result test
Stage 3 is optional. To sanity check our new compiler we can build the libraries
with the stage2 compiler. The result ought to be identical to before, unless
something has broken.
Building the stages
The script ./x tries to be helpful and pick the stage you most likely meant
for each subcommand. Here are some x commands with their default stages:
- check: --stage 1
- clippy: --stage 1
- doc: --stage 1
- build: --stage 1
- test: --stage 1
- dist: --stage 2
- install: --stage 2
- bench: --stage 2
You can always override the stage by passing --stage N explicitly.
For more information about stages, see below.
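As a rough illustration, the default-stage list above can be written as a small lookup. The function names `default_stage` and `effective_stage` are hypothetical and exist only for this sketch; this is not how bootstrap itself is implemented.

```rust
/// Illustrative lookup of the default stage for an `x` subcommand, copied
/// from the list in this section. Hypothetical helper, not bootstrap's code.
fn default_stage(subcommand: &str) -> Option<u32> {
    match subcommand {
        "check" | "clippy" | "doc" | "build" | "test" => Some(1),
        "dist" | "install" | "bench" => Some(2),
        // Subcommands not listed in this section are left unspecified here.
        _ => None,
    }
}

/// An explicit `--stage N` always takes precedence over the default.
fn effective_stage(subcommand: &str, explicit: Option<u32>) -> Option<u32> {
    explicit.or_else(|| default_stage(subcommand))
}
```

With this model, `effective_stage("test", Some(2))` corresponds to running `./x test --stage 2`, while `effective_stage("test", None)` falls back to stage 1.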
Complications of bootstrapping
Since the build system uses the current beta compiler to build a stage1
bootstrapping compiler, the compiler source code can't use some features until
they reach beta (because otherwise the beta compiler doesn't support them). On
the other hand, for compiler intrinsics and internal features, the
features have to be used. Additionally, the compiler makes heavy use of
nightly features (#![feature(...)]). How can we resolve this problem?
There are two methods used:
- The build system sets --cfg bootstrap when building with stage0, so we can use cfg(not(bootstrap)) to only use features when built with stage1. Setting --cfg bootstrap in this way is used for features that were just stabilized, which require #![feature(...)] when built with stage0, but not for stage1.
- The build system sets RUSTC_BOOTSTRAP=1. This special variable means to break the stability guarantees of Rust: allowing use of #![feature(...)] with a compiler that's not nightly. Setting RUSTC_BOOTSTRAP=1 should never be used except when bootstrapping the compiler.
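The first method can be sketched with ordinary conditional compilation. The function `built_by` and the feature name `hypothetical_feature` are made up for illustration; only the `bootstrap` cfg name comes from the text above.

```rust
// Compiled only when the build passes `--cfg bootstrap`, i.e. when the
// crate is being built by the stage0 (beta) compiler.
#[cfg(bootstrap)]
fn built_by() -> &'static str {
    "stage0 compiler"
}

// Compiled in every other build, i.e. when built by stage1 or later.
#[cfg(not(bootstrap))]
fn built_by() -> &'static str {
    "stage1 or later compiler"
}

// For a feature that was just stabilized, the crate root would carry a
// conditional feature gate instead. `hypothetical_feature` is a made-up name:
//
//     #![cfg_attr(bootstrap, feature(hypothetical_feature))]
```

Compiled without `--cfg bootstrap`, only the second definition exists, so both stage0-only and stage1-only code paths can live in the same source file.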
Understanding stages of bootstrap
Overview
This is a detailed look into the separate bootstrap stages.
The convention ./x uses is that:
- A --stage N flag means to run the stage N compiler (stageN/rustc).
- A "stage N artifact" is a build artifact that is produced by the stage N compiler.
- The stage N+1 compiler is assembled from stage N artifacts. This process is called uplifting.
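The convention can be modeled in a few lines. The type `CompilerOrigin` and the function `compiler_origin` are hypothetical names used only in this sketch; it restates the naming rule and says nothing about how bootstrap is structured internally.

```rust
/// Where a stage N compiler comes from under the convention above.
#[derive(Debug, PartialEq)]
enum CompilerOrigin {
    /// Stage 0 is downloaded (by default, the current beta).
    Downloaded,
    /// Stage N (for N >= 1) is assembled ("uplifted") from the artifacts
    /// produced by the compiler of the given earlier stage.
    UpliftedFrom { artifact_stage: u32 },
}

fn compiler_origin(stage: u32) -> CompilerOrigin {
    match stage {
        0 => CompilerOrigin::Downloaded,
        n => CompilerOrigin::UpliftedFrom { artifact_stage: n - 1 },
    }
}
```

For example, `compiler_origin(2)` reports that the stage2 compiler is uplifted from stage1 artifacts, which were themselves produced by the stage1 compiler.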
Build artifacts
Anything you can build with ./x is a build artifact. Build artifacts
include, but are not limited to:
- binaries, like stage0-rustc/rustc-main
- shared objects, like stage0-sysroot/rustlib/libstd-6fae108520cf72fe.so
- rlib files, like stage0-sysroot/rustlib/libstd-6fae108520cf72fe.rlib
- HTML files generated by rustdoc, like doc/std
Examples
- ./x test tests/ui means to build the stage1 compiler and run compiletest on it. If you're working on the compiler, this is normally the test command you want.
- ./x test --stage 0 library/std means to run tests on the standard library without building rustc from source ('build with stage0, then test the artifacts'). If you're working on the standard library, this is normally the test command you want.
- ./x build --stage 0 means to build with the stage0 rustc.
- ./x doc --stage 1 means to document using the stage0 rustdoc.
Examples of what not to do
- ./x test --stage 0 tests/ui is not useful: it runs tests on the beta compiler and doesn't build rustc from source. Use test tests/ui instead, which builds stage1 from source.
- ./x test --stage 0 compiler/rustc builds the compiler but runs no tests: it's running cargo test -p rustc, but cargo doesn't understand Rust's tests. You shouldn't need to use this, use test instead (without arguments).
- ./x build --stage 0 compiler/rustc builds the compiler, but does not build libstd or even libcore. Most of the time, you'll want ./x build library instead, which allows compiling programs without needing to define lang items.
Building vs. running
In short, stage 0 uses the stage0 compiler to create stage0 artifacts which
will later be uplifted to be the stage1 compiler.
In each stage besides 0, two major steps are performed:
- std is compiled by the stage N compiler.
- That std is linked to programs built by the stage N compiler, including the stage N artifacts (stage N+1 compiler).
This is somewhat intuitive if one thinks of the stage N artifacts as "just"
another program we are building with the stage N compiler: build --stage N compiler/rustc is linking the stage N artifacts to the std built by the stage
N compiler.
Stages and std
Note that there are two std libraries in play here:
- The library linked to stageN/rustc, which was built by stage N-1 (stage N-1 std)
- The library used to compile programs with stageN/rustc, which was built by stage N (stage N std).
Stage N std is pretty much necessary for any useful work with the stage N
compiler. Without it, you can only compile programs with #![no_core] -- not
terribly useful!
The reason these need to be different is because they aren't necessarily
ABI-compatible: there could be new layout optimizations, changes to MIR, or
other changes to Rust metadata on nightly that aren't present in beta.
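To keep the two libraries apart, it can help to write the naming rule down explicitly. The struct `StdStages` and the function `std_stages` are hypothetical names for this sketch, which only encodes the stage numbering described in this section.

```rust
/// The two std builds associated with `stageN/rustc`, for N >= 1.
#[derive(Debug, PartialEq)]
struct StdStages {
    /// Stage of the std that `stageN/rustc` itself is linked against.
    linked_into_compiler: u32,
    /// Stage of the std used when `stageN/rustc` compiles other programs.
    used_for_programs: u32,
}

fn std_stages(compiler_stage: u32) -> Option<StdStages> {
    // Stage 0 is downloaded together with its own std, so the "N-1" rule
    // does not apply to it; this sketch returns `None` in that case.
    let previous = compiler_stage.checked_sub(1)?;
    Some(StdStages {
        linked_into_compiler: previous,
        used_for_programs: compiler_stage,
    })
}
```

For instance, `std_stages(1)` says that `stage1/rustc` is linked against stage0 std but compiles programs against stage1 std, which is why the two may not be ABI-compatible.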
This is also where --keep-stage 1 library/std comes into play. Since most
changes to the compiler don't actually change the ABI, once you've produced a
std in stage1, you can probably just reuse it with a different compiler. If
the ABI hasn't changed, you're good to go, no need to spend time recompiling
that std. The flag --keep-stage simply instructs the build script to assume
the previous compile is fine and copy those artifacts into the appropriate
place, skipping the cargo invocation.
Cross-compiling rustc
Cross-compiling is the process of compiling code that will run on another
architecture. For instance, you might want to build an ARM version of rustc
using an x86 machine. Building stage2 std is different when you are
cross-compiling.
This is because ./x uses the following logic: if HOST and TARGET are the
same, it will reuse stage1 std for stage2! This is sound because stage1
std was compiled with the stage1 compiler, i.e. a compiler using the source
code you currently have checked out. So it should be identical (and therefore
ABI-compatible) to the std that stage2/rustc would compile.
However, when cross-compiling, stage1 std will only run on the host. So the
stage2 compiler has to recompile std for the target.
(See in the table how stage2 only builds non-host std targets).
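The decision described above reduces to a comparison of the two triples. The names `Stage2Std` and `stage2_std` are hypothetical, and the sketch deliberately ignores every other factor bootstrap may take into account.

```rust
/// What happens to std at stage 2, per the logic described above.
#[derive(Debug, PartialEq)]
enum Stage2Std {
    /// HOST == TARGET: the stage1 std is reused for stage 2.
    ReuseStage1,
    /// Cross-compiling: std has to be recompiled for the target.
    Rebuild,
}

fn stage2_std(host: &str, target: &str) -> Stage2Std {
    if host == target {
        Stage2Std::ReuseStage1
    } else {
        Stage2Std::Rebuild
    }
}
```

Building on `x86_64-unknown-linux-gnu` for `aarch64-unknown-linux-gnu` would therefore take the `Rebuild` branch in this model.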
What is a 'sysroot'?
When you build a project with cargo, the build artifacts for dependencies are
normally stored in target/debug/deps. This only contains dependencies cargo
knows about; in particular, it doesn't have the standard library. Where do std
or proc_macro come from? They come from the sysroot, the root of a number
of directories where the compiler loads build artifacts at runtime. The
sysroot doesn't just store the standard library, though - it includes anything
that needs to be loaded at runtime. That includes (but is not limited to):
- Libraries libstd/libtest/libproc_macro.
- Compiler crates themselves, when using rustc_private. In-tree these are always present; out of tree, you need to install rustc-dev with rustup.
- Shared object file libLLVM.so for the LLVM project. In-tree this is either built from source or downloaded from CI; out-of-tree, you need to install llvm-tools-preview with rustup.
All the artifacts listed so far are compiler runtime dependencies. You can see
them with rustc --print sysroot:
$ ls $(rustc --print sysroot)/lib
libchalk_derive-0685d79833dc9b2b.so libstd-25c6acf8063a3802.so
libLLVM-11-rust-1.50.0-nightly.so libtest-57470d2aa8f7aa83.so
librustc_driver-4f0cc9f50e53f0ba.so libtracing_attributes-e4be92c35ab2a33b.so
librustc_macros-5f0ec4a119c6ac86.so rustlib
There are also runtime dependencies for the standard library! These are in
lib/rustlib/, not lib/ directly.
$ ls $(rustc --print sysroot)/lib/rustlib/x86_64-unknown-linux-gnu/lib | head -n 5
libaddr2line-6c8e02b8fedc1e5f.rlib
libadler-9ef2480568df55af.rlib
liballoc-9c4002b5f79ba0e1.rlib
libcfg_if-512eb53291f6de7e.rlib
libcompiler_builtins-ef2408da76957905.rlib
Directory lib/rustlib/ includes libraries like hashbrown and cfg_if, which
are not part of the public API of the standard library, but are used to
implement it. Also lib/rustlib/ is part of the search path for linkers, but
lib will never be part of the search path.
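The same directory layout can be explored programmatically. In this sketch the helper names `rustlib_dir` and `query_sysroot` are hypothetical; the only external behavior relied on is `rustc --print sysroot`, and `query_sysroot` assumes a `rustc` binary is available on `PATH`.

```rust
use std::path::{Path, PathBuf};
use std::process::Command;

/// Directory holding the standard library and its private dependencies
/// for `target`, following the lib/rustlib/<target>/lib layout shown above.
fn rustlib_dir(sysroot: &Path, target: &str) -> PathBuf {
    sysroot.join("lib").join("rustlib").join(target).join("lib")
}

/// Asks the `rustc` found on `PATH` for its sysroot.
fn query_sysroot() -> std::io::Result<PathBuf> {
    let output = Command::new("rustc").args(["--print", "sysroot"]).output()?;
    // The path is printed on a single line followed by a newline.
    let text = String::from_utf8_lossy(&output.stdout);
    Ok(PathBuf::from(text.trim()))
}
```

Combining the two, `rustlib_dir(&query_sysroot()?, "x86_64-unknown-linux-gnu")` yields the directory listed in the second `ls` example.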
-Z force-unstable-if-unmarked
Since lib/rustlib/ is part of the search path we have to be careful about
which crates are included in it. In particular, all crates except for the
standard library are built with the flag -Z force-unstable-if-unmarked, which
means that you have to use #![feature(rustc_private)] in order to load it (as
opposed to the standard library, which is always available).
The -Z force-unstable-if-unmarked flag has a variety of purposes to help
enforce that the correct crates are marked as unstable. It was introduced
primarily to allow rustc and the standard library to link to arbitrary crates on
crates.io which do not themselves use staged_api. rustc also relies on this
flag to mark all of its crates as unstable with the rustc_private feature so
that each crate does not need to be carefully marked with unstable.
This flag is automatically applied to all of rustc and the standard library by
the bootstrap scripts. This is needed because the compiler and all of its
dependencies are shipped in sysroot to all users.
This flag has the following effects:
- Marks the crate as "unstable" with the rustc_private feature if it is not itself marked as stable or unstable.
- Allows these crates to access other forced-unstable crates without any need for attributes. Normally a crate would need a #![feature(rustc_private)] attribute to use other unstable crates. However, that would make it impossible for a crate from crates.io to access its own dependencies since that crate won't have a feature(rustc_private) attribute, but everything is compiled with -Z force-unstable-if-unmarked.
Code which does not use -Z force-unstable-if-unmarked should include the
#![feature(rustc_private)] crate attribute to access these forced-unstable
crates. This is needed for things which link to rustc itself, such as Miri or
Clippy.
You can find more discussion about sysroots in:
- The rustdoc PR explaining why it uses extern crate for dependencies loaded from sysroot
- Discussions about sysroot on Zulip
- Discussions about building rustdoc out of tree
Passing flags to commands invoked by bootstrap
Conveniently ./x allows you to pass stage-specific flags to rustc and
cargo when bootstrapping. The RUSTFLAGS_BOOTSTRAP environment variable is
passed as RUSTFLAGS to the bootstrap stage (stage0), and
RUSTFLAGS_NOT_BOOTSTRAP is passed when building artifacts for later stages.
RUSTFLAGS will work, but also affects the build of bootstrap itself, so it
will be rare to want to use it. Finally, MAGIC_EXTRA_RUSTFLAGS bypasses the
cargo cache to pass flags to rustc without recompiling all dependencies.
- RUSTDOCFLAGS, RUSTDOCFLAGS_BOOTSTRAP and RUSTDOCFLAGS_NOT_BOOTSTRAP are analogous to RUSTFLAGS, but for rustdoc.
- CARGOFLAGS will pass arguments to cargo itself (e.g. --timings). CARGOFLAGS_BOOTSTRAP and CARGOFLAGS_NOT_BOOTSTRAP work analogously to RUSTFLAGS_BOOTSTRAP.
- --test-args will pass arguments through to the test runner. For tests/ui, this is compiletest. For unit tests and doc tests this is the libtest runner.
Most test runners accept --help,
which you can use to find out the options accepted by the runner.
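The stage-specific variable names follow a regular pattern, which the sketch below spells out. The function `staged_flags_var` is a hypothetical helper for illustration; it only computes the name of the variable described in this section and is not part of bootstrap.

```rust
/// Name of the stage-specific environment variable for a base variable
/// such as "RUSTFLAGS", "RUSTDOCFLAGS" or "CARGOFLAGS".
fn staged_flags_var(base: &str, stage: u32) -> String {
    // Stage 0 is the bootstrap stage; every later stage uses NOT_BOOTSTRAP.
    let suffix = if stage == 0 { "BOOTSTRAP" } else { "NOT_BOOTSTRAP" };
    format!("{base}_{suffix}")
}
```

So flags meant only for artifacts built by the stage1 compiler would go into the variable named by `staged_flags_var("RUSTFLAGS", 1)`, that is, `RUSTFLAGS_NOT_BOOTSTRAP`.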
Environment Variables
During bootstrapping, there are a bunch of compiler-internal environment
variables that are used. If you are trying to run an intermediate version of
rustc, sometimes you may need to set some of these environment variables
manually. Otherwise, you get an error like the following:
thread 'main' panicked at 'RUSTC_STAGE was not set: NotPresent', library/core/src/result.rs:1165:5
If ./stageN/bin/rustc gives an error about environment variables, that usually
means something is quite wrong -- such as you're trying to compile rustc or
std or something which depends on environment variables. In the unlikely case
that you actually need to invoke rustc in such a situation, you can tell the
bootstrap shim to print all env variables by adding -vvv to your x
command.
Finally, bootstrap makes use of the cc-rs crate which has its own
method of configuring C compilers and C flags via environment
variables.
Clarification of build command's stdout
In this part, we will investigate the build command's stdout in action
(similar to the topic above, but with more detailed and complete documentation).
When you execute the x build --dry-run command, the build output will be something
like the following:
Building stage0 library artifacts (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
Copying stage0 library from stage0 (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu / x86_64-unknown-linux-gnu)
Building stage0 compiler artifacts (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
Copying stage0 rustc from stage0 (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu / x86_64-unknown-linux-gnu)
Assembling stage1 compiler (x86_64-unknown-linux-gnu)
Building stage1 library artifacts (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
Copying stage1 library from stage1 (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu / x86_64-unknown-linux-gnu)
Building stage1 tool rust-analyzer-proc-macro-srv (x86_64-unknown-linux-gnu)
Building rustdoc for stage1 (x86_64-unknown-linux-gnu)
Building stage0 {std,compiler} artifacts
These steps use the provided (downloaded, usually) compiler to compile the local Rust source into libraries we can use.
Copying stage0 {std,rustc}
This copies the library and compiler artifacts from cargo into
stage0-sysroot/lib/rustlib/{target-triple}/lib
Assembling stage1 compiler
This copies the libraries we built in "building stage0 ... artifacts" into the
stage1 compiler's lib/ directory. These are the host libraries that the
compiler itself uses to run. These aren't actually used by artifacts the new
compiler generates. This step also copies the rustc and rustdoc binaries we
generated into build/$HOST/stage/bin.
The stage1/bin/rustc is a fully functional compiler built with stage0 (precompiled) compiler and std.
To use a compiler built entirely from source with the in-tree compiler and std, you need to build the
stage2 compiler, which is compiled using the stage1 (in-tree) compiler and std.