Memory Management in Rustc
Rustc is careful about how it manages memory. The compiler allocates a lot of data structures throughout compilation, and if we are not careful, doing so costs a lot of time and space.
One of the main ways the compiler manages this is by using arenas and interning.
Arenas and Interning
We create a LOT of data structures during compilation. For performance reasons,
we allocate them from a global memory pool; they are each allocated once from a
long-lived arena. This is called arena allocation. This system reduces
allocations/deallocations of memory. It also allows for easy comparison of
types for equality: for each interned type X, we implemented PartialEq for X,
so we can just compare pointers. The CtxtInterners type contains a bunch of
maps of interned types and the arena itself.
Example: ty::TyKind
Take ty::TyKind as an example. It represents a type in the compiler (you
can read more here). Each time we want to construct a type, the
compiler doesn’t naively allocate from the buffer. Instead, we check if that
type was already constructed. If it was, we just get the same pointer we had
before; otherwise we make a fresh pointer. With this scheme, if we want to know
if two types are the same, all we need to do is compare the pointers, which is
efficient. TyKind should never be constructed on the stack, and it would be
unusable if it were.
You always allocate them from this arena and you always intern them so they are
unique.
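The "check first, allocate only if new" step can be sketched in plain Rust. This is a minimal illustration and not rustc's implementation: the TyKind variants, the Ty wrapper, and the Interner type are hypothetical, and Rc stands in for arena references so that the sketch stays safe and needs only the standard library.

```rust
use std::collections::HashSet;
use std::hash::{Hash, Hasher};
use std::rc::Rc;

// Hypothetical, heavily simplified stand-in for rustc's `ty::TyKind`.
#[derive(Debug, PartialEq, Eq, Hash)]
enum TyKind {
    Int,
    Bool,
    Ref(Ty),
}

// An interned type: a pointer to the single canonical `TyKind` value.
// (Assumption: `Rc` replaces the arena reference used by the real compiler.)
#[derive(Debug, Clone)]
struct Ty(Rc<TyKind>);

// Equality and hashing of interned types look only at the pointer.
impl PartialEq for Ty {
    fn eq(&self, other: &Self) -> bool {
        Rc::ptr_eq(&self.0, &other.0)
    }
}
impl Eq for Ty {}
impl Hash for Ty {
    fn hash<H: Hasher>(&self, state: &mut H) {
        Rc::as_ptr(&self.0).hash(state);
    }
}

#[derive(Default)]
struct Interner {
    types: HashSet<Rc<TyKind>>,
}

impl Interner {
    // Return the existing pointer if this kind was seen before,
    // otherwise allocate it once and remember it.
    fn intern(&mut self, kind: TyKind) -> Ty {
        if let Some(existing) = self.types.get(&kind) {
            return Ty(Rc::clone(existing));
        }
        let fresh = Rc::new(kind);
        self.types.insert(Rc::clone(&fresh));
        Ty(fresh)
    }
}

// Returns (equal types share a pointer, distinct types compare unequal).
fn demo() -> (bool, bool) {
    let mut interner = Interner::default();
    let a = interner.intern(TyKind::Int);
    let b = interner.intern(TyKind::Int);
    let c = interner.intern(TyKind::Bool);
    let ref_a = interner.intern(TyKind::Ref(a.clone()));
    let ref_b = interner.intern(TyKind::Ref(b.clone()));
    (a == b && ref_a == ref_b, a == c)
}
```

Because nested types are themselves interned, hashing and comparing a TyKind::Ref only touches one pointer rather than walking the whole type.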
At the beginning of the compilation we make a buffer and each time we need to allocate a type we use
some of this memory buffer. If we run out of space we get another one. The lifetime of that buffer
is 'tcx. Our types are tied to that lifetime, so when compilation finishes all the memory related
to that buffer is freed and our 'tcx references would be invalid.
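To make the lifetime relationship concrete, here is a small sketch of an arena whose references cannot outlive it. It is an illustration under simplifying assumptions, not rustc's arena: the Arena type and sum_in_arena function are hypothetical, and each value is boxed individually instead of being carved out of large buffers.

```rust
use std::cell::RefCell;

// Hypothetical minimal arena: values are freed only when the arena is dropped.
struct Arena<T> {
    allocations: RefCell<Vec<*mut T>>,
}

impl<T> Arena<T> {
    fn new() -> Self {
        Arena { allocations: RefCell::new(Vec::new()) }
    }

    // The returned reference borrows from `&self`, so the borrow checker
    // rejects any use of it after the arena is gone.
    fn alloc(&self, value: T) -> &T {
        let ptr = Box::into_raw(Box::new(value));
        self.allocations.borrow_mut().push(ptr);
        // SAFETY: `ptr` comes from `Box::into_raw`, is never exposed mutably,
        // and is freed only in `Drop`, which cannot run while a `&T`
        // borrowed from `&self` is still alive.
        unsafe { &*ptr }
    }
}

impl<T> Drop for Arena<T> {
    fn drop(&mut self) {
        for ptr in self.allocations.get_mut().drain(..) {
            // SAFETY: each pointer was produced by `Box::into_raw` exactly once.
            unsafe { drop(Box::from_raw(ptr)) };
        }
    }
}

fn sum_in_arena() -> u32 {
    let arena = Arena::new();
    let a: &u32 = arena.alloc(1);
    let b: &u32 = arena.alloc(2);
    // Both references are valid here because `arena` is still alive.
    *a + *b
}
```

Returning a or b from sum_in_arena instead of the sum would fail to compile, which is the same guarantee that the 'tcx lifetime gives for arena-allocated compiler data.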
In addition to types, there are a number of other arena-allocated data structures that you can allocate, and which are found in this module. Here are a few examples:

- GenericArgs, allocated with mk_args: this will intern a slice of types, often used to specify the values to be substituted for generic args (e.g. HashMap<i32, u32> would be represented as a slice &'tcx [tcx.types.i32, tcx.types.u32]).
- TraitRef, typically passed by value: a trait reference consists of a reference to a trait along with its various type parameters (including Self), like i32: Display (here, the def-id would reference the Display trait, and the args would contain i32). Note that def-id is defined and discussed in depth in the AdtDef and DefId section.
- Predicate defines something the trait system has to prove (see the traits module).
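As a rough picture of how these pieces fit together, the sketch below models a trait reference as a def-id plus a slice of args. All of the types here are hypothetical simplifications for illustration; the real DefId, GenericArgs, and TraitRef in rustc carry much more information and are built through the tcx.

```rust
// Hypothetical stand-in: identifies a definition such as the `Display` trait.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct DefId(u32);

// Hypothetical stand-in for interned types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Ty {
    I32,
    U32,
}

// Stand-in for interned generic args: a slice that lives as long as `'tcx`.
type GenericArgs<'tcx> = &'tcx [Ty];

// Small and `Copy`, so it can be passed around by value.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct TraitRef<'tcx> {
    def_id: DefId,           // which trait is referenced
    args: GenericArgs<'tcx>, // its type parameters, including `Self`
}

// Builds the equivalent of `i32: Display` when given the trait's def-id
// and an args slice containing `i32`.
fn trait_ref<'tcx>(trait_def_id: DefId, args: GenericArgs<'tcx>) -> TraitRef<'tcx> {
    TraitRef { def_id: trait_def_id, args }
}
```

Passing TraitRef by value is cheap in this model because it only holds an id and a reference to the args slice.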
The tcx and how it uses lifetimes
The tcx ("typing context") is the central data structure in the compiler. It is the context that
you use to perform all manner of queries. The struct TyCtxt defines a reference to this shared
context:
tcx: TyCtxt<'tcx>
//          ----
//          |
//          arena lifetime
As you can see, the TyCtxt type takes a lifetime parameter. When you see a reference with a
lifetime like 'tcx, you know that it refers to arena-allocated data (or data that lives as long as
the arenas, anyhow).
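The shape of such a context handle can be sketched as a small copyable struct that wraps a reference to the shared data. The names GlobalCtxt, Ctxt, crate_name, and name_len below are hypothetical and chosen for illustration only; they are not the real TyCtxt API.

```rust
// Hypothetical stand-in for the shared data that lives for the whole compilation.
struct GlobalCtxt {
    crate_name: String,
}

// A cheap, copyable handle in the spirit of `TyCtxt<'tcx>`.
#[derive(Clone, Copy)]
struct Ctxt<'tcx> {
    gcx: &'tcx GlobalCtxt,
}

impl<'tcx> Ctxt<'tcx> {
    // Data handed out by the context carries the same `'tcx` lifetime,
    // so it may outlive this particular copy of the handle.
    fn crate_name(self) -> &'tcx str {
        self.gcx.crate_name.as_str()
    }
}

// Functions take the handle by value, the way compiler code takes `tcx`.
fn name_len(tcx: Ctxt<'_>) -> usize {
    tcx.crate_name().len()
}
```

Since the handle is only a reference, copying it into every function that needs the context costs no more than passing a pointer.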
A Note On Lifetimes
The Rust compiler is a fairly large program containing lots of big data
structures (e.g. the AST, HIR, and the type system) and as such, arenas and
references are heavily relied upon to minimize unnecessary memory use. This
manifests itself in the way people can plug into the compiler (i.e. the
driver), preferring a "push"-style API (callbacks) instead
of the more Rust-ic "pull" style (think the Iterator trait).
Thread-local storage and interning are used a lot throughout the compiler to
reduce duplication while also preventing a lot of the ergonomic issues due to
many pervasive lifetimes. The rustc_middle::ty::tls module is used to access
these thread-locals, although you should rarely need to touch it.
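The general pattern of reaching a context through a thread-local, instead of threading a parameter through every call, can be sketched as follows. This shows the idea only and does not reproduce the tls module: CURRENT_CRATE, enter, and with are hypothetical names, and the stored context is just a string.

```rust
use std::cell::RefCell;

thread_local! {
    // Hypothetical per-thread slot holding the "current context".
    static CURRENT_CRATE: RefCell<Option<String>> = RefCell::new(None);
}

// Installs a context for the duration of `f`, then restores the previous one.
// (Simplification: the previous value is not restored if `f` panics.)
fn enter<R>(name: &str, f: impl FnOnce() -> R) -> R {
    let previous = CURRENT_CRATE.with(|slot| slot.replace(Some(name.to_owned())));
    let result = f();
    CURRENT_CRATE.with(|slot| *slot.borrow_mut() = previous);
    result
}

// Gives code deep in the call stack access to the context without
// passing it as an argument.
fn with<R>(f: impl FnOnce(&str) -> R) -> R {
    let name = CURRENT_CRATE
        .with(|slot| slot.borrow().clone())
        .expect("no context entered on this thread");
    f(&name)
}
```

A callback-style entry point like enter also matches the "push" style described above: the context exists only while the closure runs.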