The HIR
The HIR – "High-Level Intermediate Representation" – is the primary IR used
in most of rustc. It is a compiler-friendly representation of the abstract
syntax tree (AST) that is generated after parsing, macro expansion, and name
resolution (see Lowering for how the HIR is created).
Many parts of HIR resemble Rust surface syntax quite closely, with
the exception that some of Rust's expression forms have been desugared away.
For example, for loops are desugared into a plain loop, so the for construct
itself does not appear in the HIR. This makes HIR more amenable to analysis
than a normal AST.
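To make the desugaring concrete, the sketch below writes the same computation twice in ordinary Rust: once with for, and once in roughly the shape that remains after for has been rewritten into loop. The function names are made up for illustration, and the real lowering in rustc differs in detail (it operates on the syntax tree and introduces compiler-internal bindings), so treat this as an approximation rather than the exact output.

```rust
/// Sums a slice using the surface-level `for` syntax.
fn sum_with_for(values: &[i32]) -> i32 {
    let mut total = 0;
    for v in values {
        total += *v;
    }
    total
}

/// Approximately what the loop above means once `for` is desugared:
/// an explicit iterator driven by `loop` and `match`.
fn sum_desugared(values: &[i32]) -> i32 {
    let mut total = 0;
    let mut iter = IntoIterator::into_iter(values);
    loop {
        match Iterator::next(&mut iter) {
            Some(v) => total += *v,
            None => break,
        }
    }
    total
}
```

Because later analyses only ever see the second shape, they need to handle loop and match but no separate for case.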
This chapter covers the main concepts of the HIR.
You can view the HIR representation of your code by passing the
-Z unpretty=hir-tree flag to rustc:
cargo rustc -- -Z unpretty=hir-tree
You can also use the -Z unpretty=hir option to print the HIR in a form
that is closer to the original source code:
cargo rustc -- -Z unpretty=hir
Out-of-band storage and the Crate type
The top-level data structure in the HIR is the Crate, which stores
the contents of the crate currently being compiled (we only ever
construct HIR for the current crate). Whereas in the AST the crate
data structure basically just contains the root module, the HIR
Crate structure contains a number of maps and other things that
serve to organize the content of the crate for easier access.
For example, the contents of individual items (e.g. modules,
functions, traits, impls, etc.) in the HIR are not immediately
accessible from their parents. If there is a module item
foo containing a function bar():
mod foo {
    fn bar() { }
}
then in the HIR the representation of module foo (the Mod
struct) would only have the ItemId I of bar(). To get the
details of the function bar(), we would look up I in the
items map.
One nice result from this representation is that one can iterate over all items in the crate by iterating over the key-value pairs in these maps (without the need to trawl through the whole HIR). There are similar maps for things like trait items and impl items, as well as "bodies" (explained below).
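The following is a minimal sketch of this out-of-band layout, not rustc's actual definitions. The types ItemId, Mod, Item, and Crate here are simplified stand-ins invented for the example, and the numeric ids are arbitrary. It shows a module that holds only the ids of its children, with the contents stored in a flat map on the crate.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for an item identifier; not rustc's `ItemId`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct ItemId(u32);

/// A module stores only the ids of its children, not their contents.
#[derive(Debug)]
struct Mod {
    item_ids: Vec<ItemId>,
}

#[derive(Debug)]
enum ItemKind {
    Mod(Mod),
    Fn,
}

#[derive(Debug)]
struct Item {
    name: &'static str,
    kind: ItemKind,
}

/// The crate owns every item in one flat map, keyed by id.
#[derive(Debug, Default)]
struct Crate {
    items: BTreeMap<ItemId, Item>,
}

/// Builds the toy crate for `mod foo { fn bar() { } }`.
fn build_example() -> Crate {
    let mut krate = Crate::default();
    let bar = ItemId(1);
    krate.items.insert(bar, Item { name: "bar", kind: ItemKind::Fn });
    krate.items.insert(
        ItemId(0),
        Item { name: "foo", kind: ItemKind::Mod(Mod { item_ids: vec![bar] }) },
    );
    krate
}

/// Names of a module's direct children, resolved through the items map.
fn child_names(krate: &Crate, module: ItemId) -> Vec<&'static str> {
    match &krate.items[&module].kind {
        ItemKind::Mod(m) => m.item_ids.iter().map(|id| krate.items[id].name).collect(),
        ItemKind::Fn => Vec::new(),
    }
}

/// Every item in the crate, without walking the module tree.
fn all_item_names(krate: &Crate) -> Vec<&'static str> {
    krate.items.values().map(|item| item.name).collect()
}
```

Note that all_item_names never inspects a Mod: iterating the map is enough to reach bar even though it is nested inside foo.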
The other reason to set up the representation this way is for better
integration with incremental compilation. This way, if you gain access
to an &rustc_hir::Item (e.g. for the mod foo), you do not immediately
gain access to the contents of the function bar(). Instead, you only
gain access to the id for bar(), and you must invoke some
function to look up the contents of bar() given its id; this gives
the compiler a chance to observe that you accessed the data for
bar(), and then record the dependency.
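As a rough illustration of why routing every access through a function helps, here is a toy context that logs each lookup. Ctxt, its fields, and example_ctxt are hypothetical names for this sketch only; rustc's real dependency tracking is built on its query system and is considerably more involved.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Hypothetical item id; a stand-in, not rustc's type.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct ItemId(u32);

/// Toy context: owns the item contents and logs every lookup.
struct Ctxt {
    items: HashMap<ItemId, String>,
    /// Ids whose contents have been read so far.
    reads: RefCell<Vec<ItemId>>,
}

impl Ctxt {
    fn new(items: HashMap<ItemId, String>) -> Self {
        Ctxt { items, reads: RefCell::new(Vec::new()) }
    }

    /// The only way to reach an item's contents; records the access first.
    fn item(&self, id: ItemId) -> &str {
        self.reads.borrow_mut().push(id);
        &self.items[&id]
    }

    /// The dependencies observed so far, in access order.
    fn recorded_reads(&self) -> Vec<ItemId> {
        self.reads.borrow().clone()
    }
}

/// Builds a context holding `mod foo` (id 0) and `fn bar` (id 1).
fn example_ctxt() -> Ctxt {
    let mut items = HashMap::new();
    items.insert(ItemId(0), "mod foo".to_string());
    items.insert(ItemId(1), "fn bar() {}".to_string());
    Ctxt::new(items)
}
```

Holding ItemId(1) alone leaves the log empty; only the call to item adds an entry, which is the point at which a dependency on bar would be recorded.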
Identifiers in the HIR
The HIR uses a bunch of different identifiers that coexist and serve different purposes.
- A DefId, as the name suggests, identifies a particular definition, or top-level item, in a given crate. It is composed of two parts: a CrateNum which identifies the crate the definition comes from, and a DefIndex which identifies the definition within the crate. Unlike HirIds, there isn't a DefId for every expression, which makes them more stable across compilations.
- A LocalDefId is basically a DefId that is known to come from the current crate. This allows us to drop the CrateNum part, and use the type system to ensure that only local definitions are passed to functions that expect a local definition.
- A HirId uniquely identifies a node in the HIR of the current crate. It is composed of two parts: an owner and a local_id that is unique within the owner. This combination makes for more stable values which are helpful for incremental compilation. Unlike DefIds, a HirId can refer to fine-grained entities like expressions, but stays local to the current crate.
- A BodyId identifies a HIR Body in the current crate. It is currently only a wrapper around a HirId. For more info about HIR bodies, see the HIR Bodies section below.
These identifiers can be converted into one another through the TyCtxt.
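The relationships between these identifiers can be sketched with simplified models. The struct layouts below are assumptions chosen to mirror the prose (for instance, the owner of a HirId is modeled as a LocalDefId and local_id as a plain u32), and the conversions are written as ordinary methods, whereas in rustc they go through the TyCtxt.

```rust
/// Simplified models of the identifier types described above. The field
/// layouts are assumptions for illustration, not rustc's definitions.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct CrateNum(u32);

/// In this sketch, crate number 0 stands for the crate being compiled.
const LOCAL_CRATE: CrateNum = CrateNum(0);

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct DefIndex(u32);

/// A definition in any crate: which crate, plus an index within it.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct DefId {
    krate: CrateNum,
    index: DefIndex,
}

/// A definition known to be local, so the crate number is dropped.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct LocalDefId {
    index: DefIndex,
}

/// A node in the current crate's HIR: an owner plus an id within it.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct HirId {
    owner: LocalDefId,
    local_id: u32,
}

/// A body id is modeled as a thin wrapper around a `HirId`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct BodyId {
    hir_id: HirId,
}

impl LocalDefId {
    /// Widening is always possible: re-attach the local crate number.
    fn to_def_id(self) -> DefId {
        DefId { krate: LOCAL_CRATE, index: self.index }
    }
}

impl DefId {
    /// Narrowing succeeds only for definitions from the local crate.
    fn as_local(self) -> Option<LocalDefId> {
        if self.krate == LOCAL_CRATE {
            Some(LocalDefId { index: self.index })
        } else {
            None
        }
    }
}
```

The asymmetry is the useful part: a LocalDefId always widens to a DefId, while going the other way yields an Option, so a function that takes a LocalDefId cannot be handed a definition from another crate by accident.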
HIR Operations
Most of the time when you are working with the HIR, you will do so via
TyCtxt. It contains a number of methods, defined in the hir::map module and
mostly prefixed with hir_, to convert between IDs of various kinds and to
look up data associated with a HIR node.
For example, if you have a LocalDefId, and you would like to convert it
to a HirId, you can use tcx.local_def_id_to_hir_id(def_id).
You need a LocalDefId, rather than a DefId, since only local items have HIR nodes.
Similarly, you can use tcx.hir_node(n) to look up the node for a
HirId. This returns a Node<'tcx>, where Node is an enum
defined in rustc_hir. By matching on this, you can find out what sort of
node the HirId referred to and also get a pointer to the data
itself. Often, you know what sort of node n is – e.g. if you know
that n must be some HIR expression, you can do
tcx.hir_expect_expr(n), which will extract and return the
&hir::Expr, panicking if n is not in fact an expression.
Finally, you can find the parents of nodes, via
calls like tcx.parent_hir_node(n).
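The lookup patterns above (matching on a node, expecting a particular kind, and walking to a parent) can be imitated with a small self-contained model. ToyHir, its Node enum, and its methods are hypothetical and much simpler than the real TyCtxt API; in particular, this toy returns Option for the parent lookup, which is a choice made for the sketch and not a claim about rustc's signature.

```rust
use std::collections::HashMap;

/// Hypothetical node id for this sketch; not rustc's `HirId`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct HirId(u32);

/// A tiny node enum: each variant carries a textual stand-in for its data.
#[derive(Debug, PartialEq)]
enum Node {
    Item(&'static str),
    Expr(&'static str),
}

struct ToyHir {
    nodes: HashMap<HirId, Node>,
    parents: HashMap<HirId, HirId>,
}

impl ToyHir {
    /// Looks up the node for an id; callers match on the result.
    fn node(&self, id: HirId) -> &Node {
        &self.nodes[&id]
    }

    /// Returns the expression's data, panicking for any other node kind.
    fn expect_expr(&self, id: HirId) -> &'static str {
        match self.node(id) {
            Node::Expr(text) => *text,
            other => panic!("expected expression, found {:?}", other),
        }
    }

    /// Returns the parent node, or `None` for a node without a parent.
    fn parent_node(&self, id: HirId) -> Option<&Node> {
        self.parents.get(&id).map(|parent| self.node(*parent))
    }
}

/// Models `fn bar() { 1 + 2 }`: item 0 is `bar`, node 1 is its expression.
fn example_hir() -> ToyHir {
    let mut nodes = HashMap::new();
    let mut parents = HashMap::new();
    nodes.insert(HirId(0), Node::Item("bar"));
    nodes.insert(HirId(1), Node::Expr("1 + 2"));
    parents.insert(HirId(1), HirId(0));
    ToyHir { nodes, parents }
}
```

Calling expect_expr on HirId(0) would panic here, just as the prose describes for a node that is not in fact an expression.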
HIR Bodies
A rustc_hir::Body represents some kind of executable code, such as the body
of a function/closure or the definition of a constant. Bodies are
associated with an owner, which is typically some kind of item
(e.g. an fn() or const), but could also be a closure expression
(e.g. |x, y| x + y). You can use the TyCtxt to find the body
associated with a given def-id (hir_maybe_body_owned_by) or to find
the owner of a body (hir_body_owner_def_id).
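A minimal sketch of the owner-to-body relationship follows. ToyBodies and its methods are invented for this example and only loosely echo the two lookups named above; the ids and source strings are placeholders. The point is that an owner may or may not have a body, while every body has exactly one owner.

```rust
use std::collections::HashMap;

/// Hypothetical id of something that can own a body (an item or closure).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct OwnerId(u32);

/// Hypothetical body id: an index into the body table below.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct BodyId(u32);

/// Stand-in for executable code; the real `Body` holds a HIR expression.
#[derive(Debug, PartialEq)]
struct Body {
    source: &'static str,
}

#[derive(Default)]
struct ToyBodies {
    /// Each body is stored together with its single owner.
    bodies: Vec<(OwnerId, Body)>,
    by_owner: HashMap<OwnerId, BodyId>,
}

impl ToyBodies {
    /// Registers a body for `owner` and returns its id.
    fn add_body(&mut self, owner: OwnerId, source: &'static str) -> BodyId {
        let id = BodyId(self.bodies.len() as u32);
        self.bodies.push((owner, Body { source }));
        self.by_owner.insert(owner, id);
        id
    }

    /// The body owned by `owner`, if it has one (a struct would not).
    fn maybe_body_owned_by(&self, owner: OwnerId) -> Option<&Body> {
        let id = self.by_owner.get(&owner)?;
        Some(&self.bodies[id.0 as usize].1)
    }

    /// The owner of a body; every body has exactly one.
    fn body_owner(&self, id: BodyId) -> OwnerId {
        self.bodies[id.0 as usize].0
    }
}
```

In this model a function and a closure each register one body, while an owner that never calls add_body (standing in for something like a struct) simply yields None.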