Queries: demand-driven compilation
As described in the high-level overview of the compiler, the Rust compiler
is still (as of July 2021) transitioning from a
traditional "pass-based" setup to a "demand-driven" system. The compiler query
system is the key to rustc's demand-driven organization.
The idea is pretty simple. Instead of entirely independent passes
(parsing, type-checking, etc.), a set of function-like queries
compute information about the input source. For example,
there is a query called type_of
that, given the DefId
of
some item, will compute the type of that item and return it to you.
Query execution is memoized. The first time you invoke a query, it will go do the computation, but the next time, the result is returned from a hashtable. Moreover, query execution fits nicely into incremental computation; the idea is roughly that, when you invoke a query, the result may be returned to you by loading stored data from disk.1
Eventually, we want the entire compiler
control-flow to be query driven. There will effectively be one
top-level query (compile
) that will run compilation on a crate; this
will in turn demand information about that crate, starting from the
end. For example:
- The
compile
query might demand to get a list of codegen-units (i.e. modules that need to be compiled by LLVM). - But computing the list of codegen-units would invoke some subquery that returns the list of all modules defined in the Rust source.
- That query in turn would invoke something asking for the HIR.
- This keeps going further and further back until we wind up doing the actual parsing.
Although this vision is not fully realized, large sections of the compiler (for example, generating MIR) currently work exactly like this.
The "Incremental Compilation in Detail chapter gives a more in-depth description of what queries are and how they work. If you intend to write a query of your own, this is a good read.
Invoking queries
Invoking a query is simple. The TyCtxt
("type context") struct offers a method
for each defined query. For example, to invoke the type_of
query, you would just do this:
let ty = tcx.type_of(some_def_id);
How the compiler executes a query
So you may be wondering what happens when you invoke a query
method. The answer is that, for each query, the compiler maintains a
cache – if your query has already been executed, then, the answer is
simple: we clone the return value out of the cache and return it
(therefore, you should try to ensure that the return types of queries
are cheaply cloneable; insert an Rc
if necessary).
Providers
If, however, the query is not in the cache, then the compiler will try to find a suitable provider. A provider is a function that has been defined and linked into the compiler somewhere that contains the code to compute the result of the query.
Providers are defined per-crate. The compiler maintains,
internally, a table of providers for every crate, at least
conceptually. Right now, there are really two sets: the providers for
queries about the local crate (that is, the one being compiled)
and providers for queries about external crates (that is,
dependencies of the local crate). Note that what determines the crate
that a query is targeting is not the kind of query, but the key.
For example, when you invoke tcx.type_of(def_id)
, that could be a
local query or an external query, depending on what crate the def_id
is referring to (see the self::keys::Key
trait for more
information on how that works).
Providers always have the same signature:
fn provider<'tcx>(
tcx: TyCtxt<'tcx>,
key: QUERY_KEY,
) -> QUERY_RESULT {
...
}
Providers take two arguments: the tcx
and the query key.
They return the result of the query.
How providers are setup
When the tcx is created, it is given the providers by its creator using
the Providers
struct. This struct is generated by
the macros here, but it is basically a big list of function pointers:
struct Providers {
type_of: for<'tcx> fn(TyCtxt<'tcx>, DefId) -> Ty<'tcx>,
...
}
At present, we have one copy of the struct for local crates, and one for external crates, though the plan is that we may eventually have one per crate.
These Providers
structs are ultimately created and populated by
rustc_driver
, but it does this by distributing the work
throughout the other rustc_*
crates. This is done by invoking
various provide
functions. These functions tend to look
something like this:
pub fn provide(providers: &mut Providers) {
*providers = Providers {
type_of,
..*providers
};
}
That is, they take an &mut Providers
and mutate it in place. Usually
we use the formulation above just because it looks nice, but you could
as well do providers.type_of = type_of
, which would be equivalent.
(Here, type_of
would be a top-level function, defined as we saw
before.) So, if we want to add a provider for some other query,
let's call it fubar
, into the crate above, we might modify the provide()
function like so:
pub fn provide(providers: &mut Providers) {
*providers = Providers {
type_of,
fubar,
..*providers
};
}
fn fubar<'tcx>(tcx: TyCtxt<'tcx>, key: DefId) -> Fubar<'tcx> { ... }
N.B. Most of the rustc_*
crates only provide local
providers. Almost all extern providers wind up going through the
rustc_metadata
crate, which loads the information
from the crate metadata. But in some cases there are crates that
provide queries for both local and external crates, in which case
they define both a provide
and a provide_extern
function, through
wasm_import_module_map
, that rustc_driver
can invoke.
Adding a new query
How do you add a new query? Defining a query takes place in two steps:
- Declare the query name, its arguments and description.
- Supply query providers where needed.
To declare the query name and arguments, you simply add an entry to
the big macro invocation in compiler/rustc_middle/src/query/mod.rs
.
Then you need to add a documentation comment to it with some internal description.
Then, provide the desc
attribute which contains a user-facing description of the query.
The desc
attribute is shown to the user in query cycles.
This looks something like:
rustc_queries! {
/// Records the type of every item.
query type_of(key: DefId) -> Ty<'tcx> {
cache_on_disk_if { key.is_local() }
desc { |tcx| "computing the type of `{}`", tcx.def_path_str(key) }
}
...
}
A query definition has the following form:
query type_of(key: DefId) -> Ty<'tcx> { ... }
^^^^^ ^^^^^^^ ^^^^^ ^^^^^^^^ ^^^
| | | | |
| | | | query modifiers
| | | result type
| | query key type
| name of query
query keyword
Let's go over these elements one by one:
- Query keyword: indicates a start of a query definition.
- Name of query: the name of the query method
(
tcx.type_of(..)
). Also used as the name of a struct (ty::queries::type_of
) that will be generated to represent this query. - Query key type: the type of the argument to this query.
This type must implement the
ty::query::keys::Key
trait, which defines (for example) how to map it to a crate, and so forth. - Result type of query: the type produced by this query. This type
should (a) not use
RefCell
or other interior mutability and (b) be cheaply cloneable. Interning or usingRc
orArc
is recommended for non-trivial data types.2 - Query modifiers: various flags and options that customize how the query is processed (mostly with respect to incremental compilation).
So, to add a query:
- Add an entry to
rustc_queries!
using the format above. - Link the provider by modifying the appropriate
provide
method; or add a new one if needed and ensure thatrustc_driver
is invoking it.
The one exception to those rules is the ty::steal::Steal
type,
which is used to cheaply modify MIR in place. See the definition
of Steal
for more details. New uses of Steal
should not be
added without alerting @rust-lang/compiler
.
External links
Related design ideas, and tracking issues:
- Design document: On-demand Rustc incremental design doc
- Tracking Issue: "Red/Green" dependency tracking in compiler
More discussion and issues: