Rustdoc Internals
- From Crate to Clean
- From Clean To HTML
- Other Tricks Up Its Sleeve
- Dotting i's And Crossing t's
- Testing Locally
- See Also
This page describes rustdoc's passes and modes. For an overview of rustdoc,
see the "Rustdoc overview" chapter.
From Crate to Clean
In core.rs are two central items: the rustdoc::core::DocContext struct, and the
rustdoc::core::run_global_ctxt function. The latter is where rustdoc calls out
to rustc to compile a crate to the point where rustdoc can take over. The
former is a state container used when crawling through a crate to gather its
documentation.
The main process of crate crawling is done in clean/mod.rs through several
functions with names that start with clean_. Each function accepts an hir or ty
data structure, and outputs a clean structure used by rustdoc. For example,
this function converts lifetimes:
fn clean_lifetime<'tcx>(lifetime: &hir::Lifetime, cx: &mut DocContext<'tcx>) -> Lifetime {
    if let Some(
        rbv::ResolvedArg::EarlyBound(did)
        | rbv::ResolvedArg::LateBound(_, _, did)
        | rbv::ResolvedArg::Free(_, did),
    ) = cx.tcx.named_bound_var(lifetime.hir_id)
        && let Some(lt) = cx.args.get(&did).and_then(|arg| arg.as_lt())
    {
        return lt.clone();
    }
    Lifetime(lifetime.ident.name)
}
Also, clean/mod.rs defines the types for the "cleaned" Abstract Syntax Tree
(AST) used later to render documentation pages. Each usually accompanies a
clean_* function that takes some AST or High-Level Intermediate Representation
(HIR) type from rustc and converts it into the appropriate "cleaned" type.
"Big" items like modules or associated items may have some extra processing in
their clean functions, but for the most part these impls are straightforward
conversions. The "entry point" to this module is clean::utils::krate, which is
called by run_global_ctxt.
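As a rough illustration of the clean_* pattern described above, the following
sketch converts a toy compiler-side structure into a toy cleaned one. The
toy_hir and toy_clean modules and every type in them are hypothetical stand-ins
invented for this example; they are not rustc or rustdoc types.

```rust
// Hypothetical stand-ins for compiler-side data (not real rustc types).
mod toy_hir {
    pub struct Field {
        pub name: String,
        pub ty: String,
    }

    pub struct Struct {
        pub name: String,
        pub fields: Vec<Field>,
    }
}

// Hypothetical stand-ins for the "cleaned" types that rendering would consume.
mod toy_clean {
    #[derive(Debug, PartialEq)]
    pub struct Field {
        pub name: String,
        pub ty: String,
    }

    #[derive(Debug, PartialEq)]
    pub struct Struct {
        pub name: String,
        pub fields: Vec<Field>,
    }
}

// One `clean_*` function per input type; each is a plain conversion.
fn clean_field(field: &toy_hir::Field) -> toy_clean::Field {
    toy_clean::Field { name: field.name.clone(), ty: field.ty.clone() }
}

// "Bigger" items delegate to the functions for their parts.
fn clean_struct(item: &toy_hir::Struct) -> toy_clean::Struct {
    toy_clean::Struct {
        name: item.name.clone(),
        fields: item.fields.iter().map(clean_field).collect(),
    }
}
```

The point of the sketch is only the shape: after cleaning, nothing downstream
needs to know which compiler data structure an item came from.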
The first step in clean::utils::krate is to invoke visit_ast::RustdocVisitor to
process the module tree into an intermediate visit_ast::Module. This is the
step that actually crawls the rustc_hir::Crate, normalizing various aspects of
name resolution, such as:
- handling #[doc(inline)] and #[doc(no_inline)]
- handling import globs and cycles, so there are no duplicates or infinite directory trees
- inlining public use exports of private items, or showing a "Reexport" line in the module page
- inlining items with #[doc(hidden)] if the base item is hidden but the reexport is not
- showing #[macro_export]-ed macros at the crate root, regardless of where they're defined
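One way to picture the cycle handling in the list above is a walk that
remembers which modules it has already visited. This is only a sketch: the
graph representation and the collect_reachable function are hypothetical and do
not mirror how RustdocVisitor is actually implemented.

```rust
use std::collections::{HashMap, HashSet};

/// Walks a toy re-export graph (module name -> modules it glob-imports)
/// and returns each reachable module once, in first-visit order.
fn collect_reachable<'a>(
    graph: &HashMap<&'a str, Vec<&'a str>>,
    root: &'a str,
) -> Vec<&'a str> {
    let mut seen = HashSet::new();
    let mut order = Vec::new();
    let mut stack = vec![root];
    while let Some(module) = stack.pop() {
        // `insert` returns false for a module we already visited, which is
        // what breaks cycles such as `a -> b -> a`.
        if !seen.insert(module) {
            continue;
        }
        order.push(module);
        if let Some(children) = graph.get(module) {
            // Push in reverse so the first listed child is visited first.
            stack.extend(children.iter().rev().copied());
        }
    }
    order
}
```

With a graph where a imports b and b imports both a and c, the walk terminates
and reports each module a single time.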
After this step, clean::utils::krate invokes clean_doc_module, which actually
converts the HIR items to the cleaned AST. This is also the step where
cross-crate inlining is performed, which requires converting rustc_middle data
structures into the cleaned AST.
The other major thing that happens in clean/mod.rs is the collection of doc
comments and #[doc=""] attributes into a separate field of the Attributes
struct, present on anything that gets hand-written documentation. This makes it
easier to collect this documentation later in the process.
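The idea of splitting documentation out of the general attribute list can be
sketched as follows. ToyAttr and ToyAttributes are hypothetical types made up
for this example; the real Attributes struct holds richer data than plain
strings.

```rust
/// Toy attribute model: a doc comment or `#[doc = "..."]` carries text,
/// everything else is kept by name only. Hypothetical, not rustc's AST.
#[derive(Debug, Clone, PartialEq)]
enum ToyAttr {
    Doc(String),
    Other(String),
}

/// Toy counterpart of a cleaned attribute set with the docs split out.
#[derive(Debug, Default, PartialEq)]
struct ToyAttributes {
    doc_strings: Vec<String>,
    other_attrs: Vec<String>,
}

impl ToyAttributes {
    /// Sorts each attribute into the doc field or the "everything else" field.
    fn from_attrs(attrs: &[ToyAttr]) -> Self {
        let mut cleaned = ToyAttributes::default();
        for attr in attrs {
            match attr {
                ToyAttr::Doc(text) => cleaned.doc_strings.push(text.clone()),
                ToyAttr::Other(name) => cleaned.other_attrs.push(name.clone()),
            }
        }
        cleaned
    }

    /// Joins the fragments into the Markdown that would later be rendered.
    fn collapsed_doc(&self) -> String {
        self.doc_strings.join("\n")
    }
}
```

Later stages can then ask for the collapsed documentation without scanning the
attribute list again.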
The primary output of this process is a clean::types::Crate with a tree of
Items which describe the publicly-documentable items in the target crate.
Passes Anything But a Gas Station (or: Hot Potato)
Before moving on to the next major step, a few important "passes" occur over
the cleaned AST. Several of these passes are lints and reports, but some of
them mutate or generate new items.
These are all implemented in the librustdoc/passes directory, one file per pass.
By default, all of these passes are run on a crate, but the ones
regarding dropping private/hidden items can be bypassed by passing
--document-private-items to rustdoc. Note that unlike the previous set of AST
transformations, the passes are run on the cleaned crate.
Here is the list of passes as of March 2023:
- calculate-doc-coverage calculates information used for the --show-coverage flag.
- check-doc-test-visibility runs doctest visibility-related lints. This pass runs before strip-private, which is why it needs to be separate from run-lints.
- collect-intra-doc-links resolves intra-doc links.
- collect-trait-impls collects trait impls for each item in the crate. For example, if we define a struct that implements a trait, this pass will note that the struct implements that trait.
- propagate-doc-cfg propagates #[doc(cfg(...))] to child items.
- run-lints runs some of rustdoc's lints, defined in passes/lint. This is the last pass to run.
  - bare_urls detects links that are not linkified, e.g., in Markdown such as "Go to https://example.com/." It suggests wrapping the link with angle brackets, as in "Go to <https://example.com/>.", to linkify it. This is the code behind the rustdoc::bare_urls lint.
  - check_code_block_syntax validates syntax inside Rust code blocks (```rust).
  - html_tags detects invalid HTML (like an unclosed <span>) in doc comments.
- strip-hidden and strip-private strip all doc(hidden) and private items from the output. strip-private implies strip-priv-imports. Basically, the goal is to remove items that are not relevant for public documentation. This pass is skipped when --document-hidden-items is passed.
- strip-priv-imports strips all private import statements (use, extern crate) from a crate. This is necessary because rustdoc will handle public imports by either inlining the item's documentation to the module or creating a "Reexports" section with the import in it. The pass ensures that all of these imports are actually relevant to documentation. It is technically only run when --document-private-items is passed, but strip-private accomplishes the same thing.
- strip-private strips all private items from a crate which cannot be seen externally. This pass is skipped when --document-private-items is passed.
There is also a stripper module in librustdoc/passes, but it is a
collection of utility functions for the strip-* passes and is not a pass
itself.
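To make the effect of the strip-* passes more concrete, here is a minimal
sketch of a pass that drops private and hidden items from a toy item tree.
DocItem, StripOptions, and strip_item are hypothetical names for this example
only; the real passes work on the cleaned crate and handle many more cases.

```rust
/// Toy documented item. Hypothetical; not rustdoc's `clean::Item`.
#[derive(Debug, Clone, PartialEq)]
struct DocItem {
    name: String,
    is_pub: bool,
    is_hidden: bool,
    children: Vec<DocItem>,
}

/// Toy mirror of the two command-line switches that bypass stripping.
struct StripOptions {
    document_private: bool,
    document_hidden: bool,
}

/// Returns `None` when the item should be removed, otherwise the item with
/// its children stripped recursively.
fn strip_item(item: DocItem, opts: &StripOptions) -> Option<DocItem> {
    if !item.is_pub && !opts.document_private {
        return None;
    }
    if item.is_hidden && !opts.document_hidden {
        return None;
    }
    let DocItem { name, is_pub, is_hidden, children } = item;
    let children = children
        .into_iter()
        .filter_map(|child| strip_item(child, opts))
        .collect();
    Some(DocItem { name, is_pub, is_hidden, children })
}
```

Setting document_private in this sketch plays the role that
--document-private-items plays for the real passes: the private items simply
stay in the tree.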
From Clean To HTML
This is where the "second phase" in rustdoc begins. This phase primarily lives
in the librustdoc/formats and librustdoc/html folders, and it all starts with
formats::renderer::run_format. This code is responsible for setting up a type
that implements FormatRenderer, which for HTML is Context.
This structure contains methods that get called by run_format to drive the
doc rendering, which includes:
- init generates static.files, as well as search index and src/
- item generates the item HTML files themselves
- after_krate generates other global resources like all.html
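The driver-and-renderer split described above can be sketched with a toy
trait. ToyRenderer, run_toy_format, Node, and LogRenderer are hypothetical
names for this example; the real FormatRenderer trait has more methods and
different signatures.

```rust
/// Toy documentation tree. Hypothetical; not rustdoc's cleaned crate.
struct Node {
    name: String,
    children: Vec<Node>,
}

/// Toy renderer interface with the three phases named in the list above.
trait ToyRenderer: Sized {
    fn init(krate_name: &str) -> Self;
    fn item(&mut self, node: &Node);
    fn after_krate(&mut self);
}

/// Toy driver: set up the renderer, visit every item depth-first, then let
/// the renderer emit its crate-wide output.
fn run_toy_format<R: ToyRenderer>(krate_name: &str, root: &Node) -> R {
    let mut renderer = R::init(krate_name);
    let mut work = vec![root];
    while let Some(node) = work.pop() {
        renderer.item(node);
        // Reverse so children are visited in declaration order.
        work.extend(node.children.iter().rev());
    }
    renderer.after_krate();
    renderer
}

/// A renderer that only records which callbacks ran, in order.
struct LogRenderer {
    log: Vec<String>,
}

impl ToyRenderer for LogRenderer {
    fn init(krate_name: &str) -> Self {
        LogRenderer { log: vec![format!("init:{krate_name}")] }
    }

    fn item(&mut self, node: &Node) {
        self.log.push(format!("item:{}", node.name));
    }

    fn after_krate(&mut self) {
        self.log.push("after_krate".to_string());
    }
}
```

Because the driver only talks to the trait, a second output format could in
principle plug in by providing another implementation, which is the design the
text attributes to run_format.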
In item, the "page rendering" occurs, via a mixture of Askama templates
and manual write!() calls, starting in html/layout.rs. The parts that have
not been converted to templates occur within a series of std::fmt::Display
implementations and functions that pass around a &mut std::fmt::Formatter.
The parts that actually generate HTML from the items and documentation start
with print_item defined in html/render/print_item.rs, which switches out
to one of several item_* functions based on the kind of Item being rendered.
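The dispatch-by-kind structure, combined with manual write!() calls, looks
roughly like the sketch below. The types and the HTML emitted are invented for
illustration, the functions only borrow the print_item and item_* naming, and
the sketch performs no HTML escaping, which real rendering code needs.

```rust
use std::fmt::{self, Write};

/// Toy item kinds. Hypothetical; far smaller than rustdoc's real set.
enum ToyItemKind {
    Struct { fields: Vec<String> },
    Function { signature: String },
}

struct ToyItem {
    name: String,
    kind: ToyItemKind,
}

fn item_struct(w: &mut impl Write, name: &str, fields: &[String]) -> fmt::Result {
    write!(w, "<h1>Struct {name}</h1><ul>")?;
    for field in fields {
        write!(w, "<li>{field}</li>")?;
    }
    write!(w, "</ul>")
}

fn item_function(w: &mut impl Write, name: &str, signature: &str) -> fmt::Result {
    write!(w, "<h1>Function {name}</h1><pre>{signature}</pre>")
}

/// Entry point: pick the per-kind function based on what is being rendered.
fn print_item(w: &mut impl Write, item: &ToyItem) -> fmt::Result {
    match &item.kind {
        ToyItemKind::Struct { fields } => item_struct(w, &item.name, fields),
        ToyItemKind::Function { signature } => item_function(w, &item.name, signature),
    }
}
```

Since String implements std::fmt::Write, passing a &mut String as the writer
is enough to try the sketch out.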
Depending on what kind of rendering code you're looking for, you'll probably
find it either in html/render/mod.rs for major items like "what sections
should I print for a struct page" or html/format.rs for smaller component
pieces like "how should I print a where clause as part of some other item".
Whenever rustdoc comes across an item that should print hand-written
documentation alongside, it calls out to html/markdown.rs which interfaces
with the Markdown parser. This is exposed as a series of types that wrap a
string of Markdown, and implement fmt::Display to emit HTML text. It takes
special care to enable certain features like footnotes and tables and add
syntax highlighting to Rust code blocks (via html/highlight.rs) before
running the Markdown parser. There's also a function find_codes which is
called by find_testable_code that specifically scans for Rust code blocks so
the test-runner code can find all the doctests in the crate.
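The scan for testable code can be approximated by a line-based walk over
fenced blocks. This is a simplified sketch, not how html/markdown.rs works:
rustdoc goes through a real Markdown parser and understands info strings such
as should_panic or no_run, whereas the hypothetical find_rust_code_blocks below
only recognizes an empty info string (which rustdoc treats as Rust) or a plain
"rust" tag.

```rust
/// Three backticks, spelled with escapes so this sketch can itself live
/// inside a fenced block.
const FENCE: &str = "\u{60}\u{60}\u{60}";

/// Returns the contents of every fenced block that is untagged or tagged
/// `rust`. Other fenced blocks are skipped.
fn find_rust_code_blocks(markdown: &str) -> Vec<String> {
    let mut blocks = Vec::new();
    // `Some` while we are inside a Rust block and accumulating its lines.
    let mut current: Option<String> = None;
    // True while we are inside a fenced block in some other language.
    let mut in_other_fence = false;
    for line in markdown.lines() {
        if let Some(info) = line.trim_start().strip_prefix(FENCE) {
            let info = info.trim();
            if let Some(code) = current.take() {
                // Closing fence of a Rust block.
                blocks.push(code);
            } else if in_other_fence {
                in_other_fence = false;
            } else if info.is_empty() || info == "rust" {
                current = Some(String::new());
            } else {
                in_other_fence = true;
            }
        } else if let Some(code) = current.as_mut() {
            code.push_str(line);
            code.push('\n');
        }
    }
    blocks
}
```

Each returned string is the raw body of one candidate doctest, ready to be
handed to whatever turns it into a compilable program.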
From Soup to Nuts (or: "An Unbroken Thread Stretches From Those First Cells To Us")
It's important to note that rustdoc can ask the compiler for type information
directly, even during HTML generation. This didn't use to be the case, and
a lot of rustdoc's architecture was designed around not doing that, but a
TyCtxt is now passed to formats::renderer::run_format, which is used to
run generation for both HTML and the
(unstable as of March 2023) JSON format.
This change has allowed other changes to remove data from the "clean" AST
that can be easily derived from TyCtxt queries, and we'll usually accept
PRs that remove fields from "clean" (it's been soft-deprecated), but this
is complicated by two other constraints that rustdoc runs under:
- Docs can be generated for crates that don't actually pass type checking. This is used for generating docs that cover mutually-exclusive platform configurations, such as libstd having a single package of docs that cover all supported operating systems. This means rustdoc has to be able to generate docs from HIR.
- Docs can inline across crates. Since crate metadata doesn't contain HIR, it must be possible to generate inlined docs from the rustc_middle data.
The "clean" AST acts as a common output format for both input formats. There
is also some data in clean that doesn't correspond directly to HIR, such as
synthetic impls for auto traits and blanket impls generated by the
collect-trait-impls pass.
Some additional data is stored in
html::render::context::{Context, SharedContext}. These two types serve as
ways to segregate rustdoc's data for an eventual future with multithreaded doc
generation, as well as just keeping things organized:
- Context stores data used for generating the current page, such as its path, a list of HTML IDs that have been used (to avoid duplicate id=""), and the pointer to SharedContext.
- SharedContext stores data that does not vary by page, such as the tcx pointer, and a list of all types.
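The split between per-page and shared state, including the bookkeeping that
avoids duplicate id="" values, can be sketched like this. ToyContext,
ToySharedContext, and derive_id are hypothetical and greatly simplified; the
"-1", "-2" suffix scheme is an assumption chosen for the example.

```rust
use std::collections::HashSet;
use std::rc::Rc;

/// Data that does not vary by page. Hypothetical stand-in.
struct ToySharedContext {
    crate_name: String,
    all_types: Vec<String>,
}

/// Data for the page currently being generated. Hypothetical stand-in.
struct ToyContext {
    current_path: Vec<String>,
    used_ids: HashSet<String>,
    shared: Rc<ToySharedContext>,
}

impl ToyContext {
    fn new(shared: Rc<ToySharedContext>, current_path: Vec<String>) -> Self {
        ToyContext { current_path, used_ids: HashSet::new(), shared }
    }

    /// Returns `candidate` if it is unused on this page, otherwise the first
    /// free `candidate-N`, and records the result as used.
    fn derive_id(&mut self, candidate: &str) -> String {
        let mut id = candidate.to_string();
        let mut suffix = 0;
        while !self.used_ids.insert(id.clone()) {
            suffix += 1;
            id = format!("{candidate}-{suffix}");
        }
        id
    }
}
```

Each page gets a fresh ToyContext with its own set of used IDs, while all
pages point at the same ToySharedContext.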
Other Tricks Up Its Sleeve
All this describes the process for generating HTML documentation from a Rust
crate, but there are a couple of other major modes that rustdoc runs in. It can
also be run on a standalone Markdown file, or it can run doctests on Rust code
or standalone Markdown files. For the former, it shortcuts straight to
html/markdown.rs, optionally including a mode which inserts a Table of
Contents into the output HTML.
For the latter, rustdoc runs a similar partial-compilation to get relevant
documentation in test.rs, but instead of going through the full clean and
render process, it runs a much simpler crate walk to grab just the
hand-written documentation. Combined with the aforementioned
find_testable_code in html/markdown.rs, it builds up a collection of
tests to run before handing them off to the test runner. One notable location
in test.rs is the function make_test, which is where hand-written
doctests get transformed into something that can be executed.
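As a very rough picture of what "transformed into something that can be
executed" means, the sketch below wraps a doctest body in a main function when
the author did not write one. make_toy_test is a hypothetical, much-reduced
stand-in: it uses a naive substring check and ignores everything else the real
make_test has to deal with.

```rust
/// Wraps `doctest` in `fn main() { ... }` unless it already appears to define
/// `main`. The substring check is deliberately naive (it would be fooled by
/// "fn main" inside a comment or string).
fn make_toy_test(doctest: &str) -> String {
    if doctest.contains("fn main") {
        return doctest.to_string();
    }
    let mut program = String::from("fn main() {\n");
    for line in doctest.lines() {
        program.push_str("    ");
        program.push_str(line);
        program.push('\n');
    }
    program.push_str("}\n");
    program
}
```

The output is a complete program text that could be passed to a compiler,
which is the property the test runner depends on.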
Some extra reading about make_test can be found here.
Dotting i's And Crossing t's
So that's rustdoc's code in a nutshell, but there are more things in the
compiler that deal with it. Since we have the full compiletest suite at hand,
there's a set of tests in tests/rustdoc that make sure the final HTML is
what we expect in various situations. These tests also use a supplementary
script, src/etc/htmldocck.py, that lets them look through the final HTML
using XPath notation to get a precise look at the output. The full
description of all the commands available to rustdoc tests (e.g. @has and
@matches) is in htmldocck.py.
To use multiple crates in a rustdoc test, add // aux-build:filename.rs
to the top of the test file. filename.rs should be placed in an auxiliary
directory relative to the test file with the comment. If you need to build
docs for the auxiliary file, use // build-aux-docs.
In addition, there are separate tests for the search index and rustdoc's
ability to query it. The files in tests/rustdoc-js each contain a
different search query and the expected results, broken out by search tab.
These files are processed by a script in src/tools/rustdoc-js and the Node.js
runtime. These tests don't have as thorough of a writeup, but a broad example
that features results in all tabs can be found in basic.js. The basic idea is
that you match a given QUERY with a set of EXPECTED results, complete with
the full item path of each item.
Testing Locally
Some features of the generated HTML documentation might require local
storage to be used across pages, which doesn't work well without an HTTP
server. To test these features locally, you can run a local HTTP server, like
this:
$ ./x doc library
# The documentation has been generated into `build/[YOUR ARCH]/doc`.
$ python3 -m http.server -d build/[YOUR ARCH]/doc
Now you can browse your documentation just like you would if it were hosted
on the internet. For example, the URL for std will be rust/std/.