Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Rust Codegen

The first phase in debug info generation requires Rust to inspect the MIR of the program and communicate it to LLVM. This is primarily done in rustc_codegen_llvm/debuginfo, though some type-name processing exists in rustc_codegen_ssa/debuginfo. Rust communicates to LLVM via the DIBuilder API - a thin wrapper around LLVM’s internals that exists in rustc_llvm.

Type Information

Type information typically consists of the type name, size, alignment, as well as things like fields, generic parameters, and storage modifiers if they are relevant. Much of this work happens in rustc_codegen_llvm/src/debuginfo/metadata.

It is important to keep in mind that the goal is not necessarily “represent types exactly how they appear in Rust”, rather it is to represent them in a way that allows debuggers to most accurately reconstruct the data during debugging. This distinction is vital to understanding the core work that occurs on this layer; many changes made here will be for the purpose of working around debugger limitations when no other option will work.

Quirks

Rust’s generated DI nodes “pretend” to be C/C++ for both CDB and LLDB’s sake. This can result in some unintuitive and non-idiomatic debug info.

Pointers and Reference

Wide pointers/references/Box are treated as a struct with 2 fields: data_ptr and length.

All non-wide pointers, references, and Box pointers are output as pointer nodes, and no distinction is made between mut and non-mut. Several attempts have been made to rectify this, but unfortunately there is not a straightforward solution. Using the reference DI nodes of the respective formats has pitfalls. There is a semantic difference between C++ references and Rust references that is unreconcilable.

From cppreference:

References are not objects; they do not necessarily occupy storage, although the compiler may allocate storage if it is necessary to implement the desired semantics (e.g. a non-static data member of reference type usually increases the size of the class by the amount necessary to store a memory address).

Because references are not objects, there are no arrays of references, no pointers to references, and no references to references

The current proposed solution is to simply typedef the pointer nodes.

Using the const qualifier to denote non-mut poses potential issues due to LLDB’s internal optimizations. In short, LLDB attempts to cache the child-values of variables (e.g. struct fields, array elements) when stepping through code. A heuristic is used to determine which values are safely cache-able, and const is part of that heuristic. Research has not been done into how this would interact with things like Rust’s interior mutability constructs.

DWARF vs PDB

While most of the type information is fairly straight forward, one notable issue is the debug info format of the target. Each format has different semantics and limitations, as such they require slightly different debug info in some cases. This is gated by calls to cpp_like_debuginfo.

Naming

Rust attempts to communicate type names as accurately as possible, but debuggers and debug info formats do not always respect that.

Due to limitations in MSVC’s expression parser, the following name transformations are made for PDB debug info:

Rust nameMSVC name
&str/&mut strref$<str$>/ref_mut$<str$>
&[T]/&mut [T]ref$<slice$<T> >/ref_mut$<slice$<T> >1
[T; N]array$<T, N>
RustEnumenum2$<RustEnum>
(T1, T2)tuple$<T1, T2>
*const Tptr_const$<T>
*mut Tptr_mut$<T>
usizesize_t2
isizeptrdiff_t2
uNunsigned __intN2
iN__intN2
f32float2
f64double2
f128fp1282

Generics

Rust outputs generic type information (T in ArrayVec<T, N: usize>), but not generic value information (N in ArrayVec<T, N: usize>).

CodeView does not have a leaf node for generics/C++ templates, so all generic information is lost when generating PDB debug info. There are workarounds that allow the debugger to retrieve the generic arguments via the type name, but it is fragile solution at best. Efforts are being made to contact Microsoft to correct this deficiency, and/or to use one of the unused CodeView node types as a suitable equivalent.

Type aliases

Rust outputs typedef nodes in several cases to help account for debugger limitiations, but it does not currently output nodes for type aliases in the source code.

Enums

Enum DI nodes are generated in rustc_codegen_llvm/src/debuginfo/metadata/enums

DWARF

DWARF has a dedicated node for discriminated unions: DW_TAG_variant. It is a container that references DW_TAG_variant_part nodes that may or may not contain a discriminant value. The hierarchy looks as follows:

DW_TAG_structure_type      (top-level type for the coroutine)
  DW_TAG_variant_part      (variant part)
    DW_AT_discr            (reference to discriminant DW_TAG_member)
    DW_TAG_member          (discriminant member)
    DW_TAG_variant         (variant 1)
    DW_TAG_variant         (variant 2)
    DW_TAG_variant         (variant 3)
  DW_TAG_structure_type    (type of variant 1)
  DW_TAG_structure_type    (type of variant 2)
  DW_TAG_structure_type    (type of variant 3)

PDB

PDB does not have a dedicated node, so it generates the C equivalent of a discriminated union:

union enum2$<RUST_ENUM_NAME> {
    enum VariantNames {
        First,
        Second
    };
    struct Variant0 {
        struct First {
            // fields
        };
        static const enum2$<RUST_ENUM_NAME>::VariantNames NAME;
        static const unsigned long DISCR_EXACT;
        enum2$<RUST_ENUM_NAME>::Variant0::First value;
    };
    struct Variant1 {
        struct Second {
            // fields
        };
        static enum2$<RUST_ENUM_NAME>::VariantNames NAME;
        static unsigned long DISCR_EXACT;
        enum2$<RUST_ENUM_NAME>::Variant1::Second value;
    };
    enum2$<RUST_ENUM_NAME>::Variant0 variant0;
    enum2$<RUST_ENUM_NAME>::Variant1 variant1;
    unsigned long tag;
}

An important note is that due to limitations in LLDB, the DISCR_* value generated is always a u64 even if the value is not #[repr(u64)]. This is largely a non-issue for LLDB because the DISCR_* value and the tag are read into uint64_t values regardless of their type.

Source Information

TODO


  1. MSVC’s expression parser will treat >> as a right shift. It is necessary to separate consecutive >’s with a space (> >) in type names.

  2. While these type names are generated as part of the debug info node (which is then wrapped in a typedef node with the Rust name), once the LLVM-IR node is converted to a CodeView node, the type name information is lost. This is because CodeView has special shorthand nodes for primitive types, and those shorthand nodes to not have a “name” field. ↩2 ↩3 ↩4 ↩5 ↩6 ↩7