The parser and lexer are currently undergoing a lot of refactoring, so parts of this chapter may be out of date.
The very first thing the compiler does is take the program (in Unicode characters) and turn it into something the compiler can work with more conveniently than strings. This happens in two stages: Lexing and Parsing.
Lexing takes strings and turns them into streams of tokens. For example,
a.b + c would be turned into the tokens
The lexer lives in
Parsing then takes streams of tokens and turns them into a structured
form which is easier for the compiler to work with, usually called an Abstract
Syntax Tree (AST). An AST mirrors the structure of a Rust program in memory,
Span to link a particular AST node back to its source text.
The AST is defined in
librustc_ast, along with some definitions for
tokens and token streams, data structures/traits for mutating ASTs, and shared
definitions for other AST-related parts of the compiler (like the lexer and
The parser is defined in
librustc_parse, along with a
high-level interface to the lexer and some validation routines that run after
macro expansion. In particular, the
the parser implementation.
The main entrypoint to the parser is via the various
parse_* functions in the
parser. They let you do things like turn a
(e.g. the source in a single file) into a token stream, create a parser from
the token stream, and then execute the parser to get a
Crate (the root AST
To minimise the amount of copying that is done, both the
Parser have lifetimes which bind them to the parent
ParseSess. This contains
all the information needed while parsing, as well as the
Code for lexical analysis is split between two crates:
rustc_lexercrate is responsible for breaking a
&strinto chunks constituting tokens. Although it is popular to implement lexers as generated finite state machines, the lexer in