LLVM Source-Based Code Coverage

rustc supports detailed source-based code and test coverage analysis with a command line option (-Z instrument-coverage) that instruments Rust libraries and binaries with additional instructions and data, at compile time.

The coverage instrumentation injects calls to the LLVM intrinsic instruction llvm.instrprof.increment at code branches (based on a MIR-based control flow analysis), and LLVM converts these to instructions that increment static counters, when executed. The LLVM coverage instrumentation also requires a Coverage Map that encodes source metadata, mapping counter IDs--directly and indirectly--to the file locations (with start and end line and column).

Rust libraries, with or without coverage instrumentation, can be linked into instrumented binaries. When the program is executed and cleanly terminates, LLVM libraries write the final counter values to a file (default.profraw or a custom file set through environment variable LLVM_PROFILE_FILE).

Developers use existing LLVM coverage analysis tools to decode .profraw files, with corresponding Coverage Maps (from matching binaries that produced them), and generate various reports for analysis, for example:

Screenshot of sample `llvm-cov show` result, for function add_quoted_string


Detailed instructions and examples are documented in the Rust Unstable Book (under source-based-code-coverage).

Rust symbol mangling

-Z instrument-coverage automatically enables Rust symbol mangling v0 (as if the user specified -Z symbol-mangling-version=v0 option when invoking rustc) to ensure consistent and reversible name mangling. This has two important benefits:

  1. LLVM coverage tools can analyze coverage over multiple runs, including some changes to source code; so mangled names must be consistent across compilations.
  2. LLVM coverage reports can report coverage by function, and even separates out the coverage counts of each unique instantiation of a generic function, if invoked with multiple type substitution variations.

Components of LLVM Coverage Instrumentation in rustc

LLVM Runtime Dependency

Coverage data is only generated by running the executable Rust program. rustc statically links coverage-instrumented binaries with LLVM runtime code (compiler-rt) that implements program hooks (such as an exit hook) to write the counter values to the .profraw file.

In the rustc source tree, library/profiler_builtins bundles the LLVM compiler-rt code into a Rust library crate. (When building rustc, the profiler_builtins library is only included when profiler = true is set in rustc's config.toml.)

When compiling with -Z instrument-coverage, CrateLoader::postprocess() dynamically loads the profiler_builtins library by calling inject_profiler_runtime().

MIR Pass: InstrumentCoverage

Coverage instrumentation is performed on the MIR with a MIR pass called InstrumentCoverage. This MIR pass analyzes the control flow graph (CFG)--represented by MIR BasicBlocks--to identify code branches, and injects additional Coverage statements into the BasicBlocks.

A MIR Coverage statement is a virtual instruction that indicates a counter should be incremented when its adjacent statements are executed, to count a span of code (CodeRegion). It counts the number of times a branch is executed, and also specifies the exact location of that code span in the Rust source code.

Note that many of these Coverage statements will not be converted into physical counters (or any other executable instructions) in the final binary. Some of them will be (see CoverageKind::Counter), but other counters can be computed on the fly, when generating a coverage report, by mapping a CodeRegion to a CoverageKind::Expression.

As an example:


#![allow(unused)]
fn main() {
fn some_func(flag: bool) {
    // increment Counter(1)
    ...
    if flag {
        // increment Counter(2)
        ...
    } else {
        // count = Expression(1) = Counter(1) - Counter(2)
        ...
    }
    // count = Expression(2) = Counter(1) + Zero
    //     or, alternatively, Expression(2) = Counter(2) + Expression(1)
    ...
}
}

In this example, four contiguous code regions are counted while only incrementing two counters.

CFG analysis is used to not only determine where the branches are, for conditional expressions like if, else, match, and loop, but also to determine where expressions can be used in place of physical counters.

The advantages of optimizing coverage through expressions are more pronounced with loops. Loops generally include at least one conditional branch that determines when to break out of a loop (a while condition, or an if or match with a break). In MIR, this is typically lowered to a SwitchInt, with one branch to stay in the loop, and another branch to break out of the loop. The branch that breaks out will almost always execute less often, so InstrumentCoverage chooses to add a Counter to that branch, and an Expression(continue) = Counter(loop) - Counter(break) to the branch that continues.

The InstrumentCoverage MIR pass is documented in more detail below.

Counter Injection and Coverage Map Pre-staging

When the compiler enters the Codegen phase, with a coverage-enabled MIR, codegen_statement() converts each MIR Statement into some backend-specific action or instruction. codegen_statement() forwards Coverage statements to codegen_coverage():


#![allow(unused)]
fn main() {
    pub fn codegen_statement(&mut self, mut bx: Bx, statement: &mir::Statement<'tcx>) -> Bx {
        ...
        match statement.kind {
            ...
            mir::StatementKind::Coverage(box ref coverage) => {
                self.codegen_coverage(&mut bx, coverage.clone());
                bx
            }
}

codegen_coverage() handles each CoverageKind as follows:

  • For all CoverageKinds, Coverage data (counter ID, expression equation and ID, and code regions) are passed to the backend's Builder, to populate data structures that will be used to generate the crate's "Coverage Map". (See the FunctionCoverage struct.)
  • For CoverageKind::Counters, an instruction is injected in the backend IR to increment the physical counter, by calling the BuilderMethod instrprof_increment().

#![allow(unused)]
fn main() {
    pub fn codegen_coverage(&self, bx: &mut Bx, coverage: Coverage) {
        let Coverage { kind, code_region } = coverage;
        match kind {
            CoverageKind::Counter { function_source_hash, id } => {
                if let Some(code_region) = code_region {
                    bx.add_coverage_counter(self.instance, id, code_region);
                }
                ...
                bx.instrprof_increment(fn_name, hash, num_counters, index);
            }
            CoverageKind::Expression { id, lhs, op, rhs } => {
                bx.add_coverage_counter_expression(self.instance, id, lhs, op, rhs, code_region);
            }
            CoverageKind::Unreachable => {
                ...
}

code snippet trimmed for brevity

The function name instrprof_increment() is taken from the LLVM intrinsic call of the same name (llvm.instrprof.increment), and uses the same arguments and types; but note that, up to and through this stage (even though modeled after LLVM's implementation for code coverage instrumentation), the data and instructions are not strictly LLVM-specific.

But since LLVM is the only Rust-supported backend with the tooling to process this form of coverage instrumentation, the backend for Coverage statements is only implemented for LLVM, at this time.

Coverage Map Generation

With the instructions to increment counters now implemented in LLVM IR, the last remaining step is to inject the LLVM IR variables that hold the static data for the coverage map.

rustc_codegen_llvm's compile_codegen_unit() calls coverageinfo_finalize(), which delegates its implementation to the rustc_codegen_llvm::coverageinfo::mapgen module.

For each function Instance (code-generated from MIR, including multiple instances of the same MIR for generic functions that have different type substitution combinations), mapgen's finalize() method queries the Instance-associated FunctionCoverage for its Counters, Expressions, and CodeRegions; and calls LLVM codegen APIs to generate properly-configured variables in LLVM IR, according to very specific details of the LLVM Coverage Mapping Format (Version 4).1

1

The Rust compiler (as of

January 2021) supports LLVM Coverage Mapping Format Version 4 (the most up-to-date version of the format, at the time of this writing) for improved compatibility with other LLVM-based compilers (like Clang), and to take advantage of some format optimizations. Version 4 was introduced in LLVM 11, which is currently the default LLVM version for Rust. Note that the Rust compiler optionally supports some earlier LLVM versions, prior to LLVM 11. If rustc is configured to use an incompatible version of LLVM, compiling with -Z instrument-coverage will generate an error message.


#![allow(unused)]
fn main() {
pub fn finalize<'ll, 'tcx>(cx: &CodegenCx<'ll, 'tcx>) {
    let mut function_coverage_map = match cx.coverage_context() {
        Some(ctx) => ctx.take_function_coverage_map(),
        None => return,
    };
    ...
    add_unreachable_coverage(tcx, &mut function_coverage_map);

    let mut mapgen = CoverageMapGenerator::new();

    for (instance, function_coverage) in function_coverage_map {
        ...
        let coverage_mapping_buffer = llvm::build_byte_buffer(|coverage_mapping_buffer| {
            mapgen.write_coverage_mapping(expressions, counter_regions, coverage_mapping_buffer);
        });
}

code snippet trimmed for brevity

One notable step, performed by mapgen::finalize() before processing the Instances and their FunctionCoverages, is the call to add_unreachable_functions().

When finalizing the coverage map, FunctionCoverage only has the CodeRegions and counters for the functions that went through codegen; such as public functions and "used" functions (functions referenced by other "used" or public items). Any other functions (considered unused or "Unreachable") were still parsed and processed through the MIR stage.

The set of unreachable functions is computed via the set difference of all MIR DefIds (tcx query mir_keys) minus the codegenned DefIds (tcx query collect_and_partition_mono_items). add_unreachable_functions() computes the set of unreachable functions, queries the tcx for the previously-computed CodeRegions, for each unreachable MIR, and adds those code regions to one of the non-generic codegenned functions (non-generic avoids potentially injecting the unreachable coverage multiple times for multiple instantiations).

Testing LLVM Coverage

Coverage instrumentation in the MIR is validated by a mir-opt test: instrument-coverage.

More complete testing of end-to-end coverage instrumentation and reports are done in the run-make-fulldeps tests, with sample Rust programs (to be instrumented) in the coverage directory, and the actual tests and expected results in coverage-reports.

In addition to testing the final result, two intermediate results are also validated to catch potential regression errors early: Minimum CoverageSpans computed during the InstrumentCoverage MIR pass are saved in mir_dump Spanview files and compared to expected results in coverage-spanview.

Finally, the coverage-llvmir test compares compiles a simple Rust program with -Z instrument-coverage and compares the compiled program's LLVM IR to expected LLVM IR instructions and structured data for a coverage-enabled program, including various checks for Coverage Map-related metadata and the LLVM intrinsic calls to increment the runtime counters.

Expected results for both the mir-opt tests and the coverage* tests under run-make-fulldeps can be refreshed by running:

$ ./x.py test src/test/<test-type> --blessed

Implementation Details of the InstrumentCoverage MIR Pass

The bulk of the implementation of the InstrumentCoverage MIR pass is performed by the Instrumentor. For each MIR (each non-const, non-inlined function, generic, or closure), the Instrumentor's constructor prepares a CoverageGraph and then executes inject_counters().


#![allow(unused)]
fn main() {
        Instrumentor::new(&self.name(), tcx, mir_body).inject_counters();
}

The CoverageGraph is a coverage-specific simplification of the MIR control flow graph (CFG). Its nodes are BasicCoverageBlocks, which encompass one or more sequentially-executed MIR BasicBlocks (with no internal branching), plus a CoverageKind counter (to be added, via coverage analysis), and an optional set of additional counters to count incoming edges (if there are more than one).

The Instrumentor's inject_counters() uses the CoverageGraph to compute the best places to inject coverage counters, as MIR Statements, with the following steps:

  1. Depending on the debugging configurations in rustc's, config.toml, and rustc command line flags, various debugging features may be enabled to enhance debug!() messages in logs, and to generate various "dump" files, to help developers understand the MIR transformation process for coverage. Most of the debugging features are implemented in the debug sub-module.
  2. generate_coverage_spans() computes the minimum set of distinct, non-branching code regions, from the MIR. These CoverageSpans represent a span of code that must be counted.
  3. make_bcb_counters() generates CoverageKind::Counters and CoverageKind::Expressions for each CoverageSpan, plus additional intermediate_expressions2, not associated with any CodeRegion, but are required to compute a final Expression value for a CodeRegion.
  4. Inject the new counters into the MIR, as new StatementKind::Coverage statements. This is done by three distinct functions:
    • inject_coverage_span_counters()
    • inject_indirect_counters()
    • inject_intermediate_expression(), called for each intermediate expression returned from make_bcb_counters()
2

Intermediate expressions are sometimes required because Expressions are limited to binary additions or subtractions. For example, A + (B - C) might represent an Expression count computed from three other counters, A, B, and C, but computing that value requires an intermediate expression for B - C.

The CoverageGraph

The CoverageGraph is derived from the MIR (mir::Body).


#![allow(unused)]
fn main() {
        let basic_coverage_blocks = CoverageGraph::from_mir(mir_body);
}

Like mir::Body, the CoverageGraph is also a DirectedGraph. Both graphs represent the function's fundamental control flow, with many of the same graph traits, supporting start_node(), num_nodes(), successors(), predecessors(), and is_dominated_by().

For anyone that knows how to work with the MIR, as a CFG, the CoverageGraph will be familiar, and can be used in much the same way. The nodes of the CoverageGraph are BasicCoverageBlocks (BCBs), which index into an IndexVec of BasicCoverageBlockData. This is analogous to the MIR CFG of BasicBlocks that index BasicBlockData.

Each BasicCoverageBlockData captures one or more MIR BasicBlocks, exclusively, and represents the maximal-length sequence of BasicBlocks without conditional branches.

compute_basic_coverage_blocks() builds the CoverageGraph as a coverage-specific simplification of the MIR CFG. In contrast with the SimplifyCfg MIR pass, this step does not alter the MIR itself, because the CoverageGraph aggressively simplifies the CFG, and ignores nodes that are not relevant to coverage. For example:

  • The BCB CFG ignores (excludes) branches considered not relevant to the current coverage solution. It excludes unwind-related code3 that is injected by the Rust compiler but has no physical source code to count, which allows a Call-terminated BasicBlock to be merged with its successor, within a single BCB.
  • A Goto-terminated BasicBlock can be merged with its successor as long as it has the only incoming edge to the successor BasicBlock.
  • Some BasicBlock terminators support Rust-specific concerns--like borrow-checking--that are not relevant to coverage analysis. FalseUnwind, for example, can be treated the same as a Goto (potentially merged with its successor into the same BCB).
3

(Note, however, that Issue #78544 considers providing future support for coverage of programs that intentionally panic, as an option, with some non-trivial cost.)

The BCB CFG is critical to simplifying the coverage analysis by ensuring graph path-based queries (is_dominated_by(), predecessors, successors, etc.) have branch (control flow) significance.

To visualize the CoverageGraph, you can generate a graphviz *.dot file with the following rustc flags:4

4

This image also applies -Z graphviz-dark-mode, to produce a Graphviz document with "dark mode" styling. If you use a dark mode or theme in your development environment, you will probably want to use this option so you can review the graphviz output without straining your vision.

$ rustc -Z instrument-coverage -Z dump-mir=InstrumentCoverage \
    -Z dump-mir-graphviz some_rust_source.rs

The -Z dump-mir flag requests MIR debugging output (generating *.mir files, by default). -Z dump-mir-graphviz additionally generates *.dot files. -Z dump-mir=InstrumentCoverage restricts these files to the InstrumentCoverage pass. All files are written to the ./mir_dump/ directory, by default.

Files with names ending in .-------.InstrumentCoverage.0.dot contain the graphviz representations of a CoverageGraph (one for each MIR, that is, for each function and closure):

cropped image of a sample CoverageGraph in graphviz format


This image shows each BasicCoverageBlock as a rectangular node, with directional edges (the arrows) leading from each node to its successors(). The nodes contain information in sections:

  1. The gray header has a label showing the BCB ID (or index for looking up its BasicCoverageBlockData).
  2. The first content section shows the assigned Counter or Expression for each contiguous section of code. (There may be more than one Expression incremented by the same Counter for discontiguous sections of code representing the same sequential actions.) Note the code is represented by the line and column ranges (for example: 52:28-52:33, representing the original source line 52, for columns 28-33). These are followed by the MIR Statement or Terminator represented by that source range. (How these coverage regions are determined is discussed in the following section.)
  3. The final section(s) show the MIR BasicBlocks (by ID/index and its TerminatorKind) contained in this BCB. The last BCB is separated out because its successors() determine the edges leading out of the BCB, and into the leading_bb() (first BasicBlock) of each successor BCB.

Note, to find the BasicCoverageBlock from a final BCB Terminator's successor BasicBlock, there is an index and helper function--bcb_from_bb()--to look up a BasicCoverageBlock from any contained BasicBlock.

CoverageSpans

The struct CoverageSpans builds and refines a final set of CoverageSpans, each representing the largest contiguous Span of source within a single BCB. By definition--since each Span falls within a BCB, the Span is also non-branching; so if any code in that Span has executed, all code in the Span will have executed, the same number of times.

CoverageSpans::generate_coverage_spans() constructs an initial set of CoverageSpans from the Spans associated with each MIR Statement and Terminator.

The final stage of generate_coverage_spans() is handled by to_refined_spans(), which iterates through the CoverageSpans, merges and de-duplicates them, and returns an optimal, minimal set of CoverageSpans that can be used to assign coverage Counters or Expressions, one-for-one.

An visual, interactive representation of the final CoverageSpans can be generated with the following rustc flags:

$ rustc -Z instrument-coverage -Z dump-mir=InstrumentCoverage \
    -Z dump-mir-spanview some_rust_source.rs

These flags request Spanview output for the InstrumentCoverage pass, and the resulting files (one for each MIR, that is, for each function or closure) can be found in the mir_dump directory (by default), with the extension: .-------.InstrumentCoverage.0.html.

cropped image of a sample Spanview in a browser


The image above shows one such example. The orange and blue backgrounds highlight alternating CoverageSpans from the refined set. Hovering over a line expands the output on that line to show the MIR BasicBlock IDs covered by each CoverageSpan. While hovering, the CoverageSpan under the pointer also has a tooltip block of text, showing even more detail, including the MIR Statements and Terminators contributing to the CoverageSpan, and their individual Spans (which should be encapsulated within the code region of the refined CoverageSpan)

make_bcb_counters()

make_bcb_counters() traverses the CoverageGraph and adds a Counter or Expression to every BCB. It uses Control Flow Analysis to determine where an Expression can be used in place of a Counter. Expressions have no runtime overhead, so if a viable expression (adding or subtracting two other counters or expressions) can compute the same result as an embedded counter, an Expression is preferred.

TraverseCoverageGraphWithLoops provides a traversal order that ensures all BasicCoverageBlock nodes in a loop are visited before visiting any node outside that loop. The traversal state includes a context_stack, with the current loop's context information (if in a loop), as well as context for nested loops.

Within loops, nodes with multiple outgoing edges (generally speaking, these are BCBs terminated in a SwitchInt) can be optimized when at least one branch exits the loop and at least one branch stays within the loop. (For an if or while, there are only two branches, but a match may have more.)

A branch that does not exit the loop should be counted by Expression, if possible. Note that some situations require assigning counters to BCBs before they are visited by traversal, so the counter_kind (CoverageKind for a Counter or Expression) may have already been assigned, in which case one of the other branches should get the Expression.

For a node with more than two branches (such as for more than two match patterns), only one branch can be optimized by Expression. All others require a Counter (unless its BCB counter_kind was previously assigned).

A branch expression is derived from the equation:

Counter(branching_node) = SUM(Counter(branches))

It's important to be aware that the branches in this equation are the outgoing edges from the branching_node, but a branch's target node may have other incoming edges. Given the following graph, for example, the count for B is the sum of its two incoming edges:

Example graph with multiple incoming edges to a branch node


In this situation, BCB node B may require an edge counter for its "edge from A", and that edge might be computed from an Expression, Counter(A) - Counter(C). But an expression for the BCB node B would be the sum of all incoming edges:

Expression((Counter(A) - Counter(C)) + SUM(Counter(remaining_edges)))

Note that this is only one possible configuration. The actual choice of Counter vs. Expression also depends on the order of counter assignments, and whether a BCB or incoming edge counter already has its Counter or Expression.

Injecting counters into a MIR BasicBlock

With the refined CoverageSpans, and after all Counters and Expressions are created, the final step is to inject the StatementKind::Coverage statements into the MIR. There are three distinct sources, handled by the following functions:

These three functions inject the Coverage statements into the MIR. Counters and Expressions with CoverageSpans add Coverage statements to a corresponding BasicBlock, with a CodeRegion computed from the refined Span and current SourceMap.

All other Coverage statements have a CodeRegion of None, but they still must be injected because they contribute to other Expressions.

Finally, edge's with a CoverageKind::Counter require a new BasicBlock, so the counter is only incremented when traversing the branch edge.

Additional Debugging Support

See the crate documentation for rustc_mir::transform::coverage::debug for a detailed description of the debug output, logging, and configuration options available to developers working on the InstrumentCoverage pass.