Rust Codegen
The first phase in debug info generation requires Rust to inspect the MIR of the program and
communicate it to LLVM. This is primarily done in rustc_codegen_llvm/debuginfo, though
some type-name processing exists in rustc_codegen_ssa/debuginfo. Rust communicates to
LLVM via the DIBuilder API - a thin wrapper around LLVM’s internals that exists in
rustc_llvm.
Type Information
Type information typically consists of the type name, size, alignment, as well as things like fields, generic parameters, and storage modifiers if they are relevant. Much of this work happens in rustc_codegen_llvm/src/debuginfo/metadata.
It is important to keep in mind that the goal is not necessarily to “represent types exactly how they appear in Rust”; rather, it is to represent them in a way that lets debuggers reconstruct the data as accurately as possible during debugging. This distinction is vital to understanding the core work that occurs on this layer: many changes made here exist to work around debugger limitations when no other option will work.
Quirks
Rust’s generated DI nodes “pretend” to be C/C++ for both CDB and LLDB’s sake. This can result in some unintuitive and non-idiomatic debug info.
Pointers and References
Wide pointers/references/Box are treated as a struct with 2 fields: data_ptr and length.
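As a rough illustration of the two-field shape described above, the following sketch mirrors a slice reference with a plain struct. `WideSlicePtr` and `split_wide` are hypothetical names invented for this example; they are not part of rustc, and the size checks reflect how current compilers lay out wide pointers in practice rather than a language guarantee.

```rust
use std::mem::size_of;

/// Hypothetical mirror of the struct-shaped debug info node for `&[T]`.
/// The field names follow the ones used in the debug info.
#[repr(C)]
struct WideSlicePtr<T> {
    data_ptr: *const T,
    length: usize,
}

/// Splits a slice reference into the two values a debugger would see.
fn split_wide<T>(slice: &[T]) -> WideSlicePtr<T> {
    WideSlicePtr {
        data_ptr: slice.as_ptr(),
        length: slice.len(),
    }
}

fn main() {
    let values = [10u32, 20, 30];
    let wide = split_wide(&values[..]);

    assert_eq!(wide.length, 3);
    assert_eq!(wide.data_ptr, values.as_ptr());

    // In practice a wide pointer occupies two machine words,
    // while a thin reference occupies one.
    assert_eq!(size_of::<&[u32]>(), size_of::<WideSlicePtr<u32>>());
    assert_eq!(size_of::<&u32>(), size_of::<usize>());

    println!("data_ptr = {:p}, length = {}", wide.data_ptr, wide.length);
}
```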
All non-wide pointers, references, and Box pointers are output as pointer nodes, and no
distinction is made between mut and non-mut. Several attempts have been made to rectify this,
but unfortunately there is not a straightforward solution. Using the reference DI nodes of the
respective formats has pitfalls. There is a semantic difference between C++ references and Rust
references that is irreconcilable.
From cppreference:
References are not objects; they do not necessarily occupy storage, although the compiler may allocate storage if it is necessary to implement the desired semantics (e.g. a non-static data member of reference type usually increases the size of the class by the amount necessary to store a memory address).
Because references are not objects, there are no arrays of references, no pointers to references, and no references to references
The current proposed solution is to simply typedef the pointer nodes.
Using the const qualifier to denote non-mut poses potential issues due to LLDB’s internal
optimizations. In short, LLDB attempts to cache the child-values of variables (e.g. struct fields,
array elements) when stepping through code. A heuristic is used to determine which values are safely
cache-able, and const is part of that heuristic. Research has not been done into how this would
interact with things like Rust’s interior mutability constructs.
DWARF vs PDB
While most of the type information is fairly straightforward, one notable issue is the debug info
format of the target. Each format has different semantics and limitations; as such, they require
slightly different debug info in some cases. This is gated by calls to
cpp_like_debuginfo.
Naming
Rust attempts to communicate type names as accurately as possible, but debuggers and debug info formats do not always respect that.
Due to limitations in MSVC’s expression parser, the following name transformations are made for PDB debug info:
| Rust name | MSVC name |
|---|---|
| &str/&mut str | ref$<str$>/ref_mut$<str$> |
| &[T]/&mut [T] | ref$<slice$<T> >/ref_mut$<slice$<T> > [1] |
| [T; N] | array$<T, N> |
| RustEnum | enum2$<RustEnum> |
| (T1, T2) | tuple$<T1, T2> |
| *const T | ptr_const$<T> |
| *mut T | ptr_mut$<T> |
| usize | size_t [2] |
| isize | ptrdiff_t [2] |
| uN | unsigned __intN [2] |
| iN | __intN [2] |
| f32 | float [2] |
| f64 | double [2] |
| f128 | fp128 [2] |
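The wrapper-style names in the table can be sketched with a small helper. This is a minimal illustration, not rustc's actual implementation: `wrap` and `close_angle` are hypothetical names, and the only behavior modeled is the `prefix<args>` shape plus the rule that consecutive `>` characters are separated by a space.

```rust
/// Appends the closing `>` of a generic argument list, inserting a space
/// first if the name already ends in `>`, so that `>>` never appears.
fn close_angle(out: &mut String) {
    if out.ends_with('>') {
        out.push(' ');
    }
    out.push('>');
}

/// Builds a name such as `array$<T, N>` from a prefix and its arguments.
/// Hypothetical helper for illustration only.
fn wrap(prefix: &str, args: &[&str]) -> String {
    let mut out = format!("{}<{}", prefix, args.join(", "));
    close_angle(&mut out);
    out
}

fn main() {
    let slice = wrap("slice$", &["T"]);
    let slice_ref = wrap("ref$", &[slice.as_str()]);

    assert_eq!(slice, "slice$<T>");
    assert_eq!(slice_ref, "ref$<slice$<T> >");
    assert_eq!(wrap("tuple$", &["T1", "T2"]), "tuple$<T1, T2>");

    println!("{}", slice_ref);
}
```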
Generics
Rust outputs generic type information (T in ArrayVec<T, const N: usize>), but not generic value
information (N in ArrayVec<T, const N: usize>).
CodeView does not have a leaf node for generics/C++ templates, so all generic information is lost when generating PDB debug info. There are workarounds that allow the debugger to retrieve the generic arguments via the type name, but it is a fragile solution at best. Efforts are being made to contact Microsoft to correct this deficiency, and/or to use one of the unused CodeView node types as a suitable equivalent.
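To give a sense of what recovering generic arguments from a type name involves, here is a minimal sketch that splits the top-level arguments out of a name by tracking angle-bracket depth. The function `generic_args` is hypothetical and written for this example; real debugger-side workarounds may look quite different.

```rust
/// Returns the top-level generic arguments found in `name`, e.g.
/// `tuple$<T1, T2>` yields `["T1", "T2"]`. Hypothetical helper.
fn generic_args(name: &str) -> Vec<&str> {
    let Some(open) = name.find('<') else { return Vec::new() };
    let Some(close) = name.rfind('>') else { return Vec::new() };
    if close <= open {
        return Vec::new();
    }

    let inner = &name[open + 1..close];
    let mut args = Vec::new();
    let mut depth = 0usize;
    let mut start = 0usize;

    for (i, ch) in inner.char_indices() {
        match ch {
            '<' => depth += 1,
            '>' => depth = depth.saturating_sub(1),
            // Only commas outside nested argument lists separate arguments.
            ',' if depth == 0 => {
                args.push(inner[start..i].trim());
                start = i + 1;
            }
            _ => {}
        }
    }

    let last = inner[start..].trim();
    if !last.is_empty() {
        args.push(last);
    }
    args
}

fn main() {
    let args = generic_args("tuple$<ref$<slice$<T> >, array$<T, N> >");
    assert_eq!(args, vec!["ref$<slice$<T> >", "array$<T, N>"]);
    println!("{:?}", args);
}
```

The sketch also hints at why the approach is fragile: it relies entirely on `<`, `>`, and `,` appearing in the name only as generic-list punctuation, and any name that breaks that assumption would be split incorrectly.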
Type aliases
Rust outputs typedef nodes in several cases to help account for debugger limitations, but it does not currently output nodes for type aliases in the source code.
Enums
Enum DI nodes are generated in rustc_codegen_llvm/src/debuginfo/metadata/enums.
DWARF
DWARF has dedicated nodes for discriminated unions: DW_TAG_variant_part is a container that
holds DW_TAG_variant nodes, each of which may or may not carry a discriminant value. The
hierarchy looks as follows:
DW_TAG_structure_type              (top-level type for the enum)
  DW_TAG_variant_part              (variant part)
    DW_AT_discr                    (reference to discriminant DW_TAG_member)
    DW_TAG_member                  (discriminant member)
    DW_TAG_variant                 (variant 1)
    DW_TAG_variant                 (variant 2)
    DW_TAG_variant                 (variant 3)
  DW_TAG_structure_type            (type of variant 1)
  DW_TAG_structure_type            (type of variant 2)
  DW_TAG_structure_type            (type of variant 3)
PDB
PDB does not have a dedicated node, so it generates the C equivalent of a discriminated union:
union enum2$<RUST_ENUM_NAME> {
    enum VariantNames {
        First,
        Second
    };
    struct Variant0 {
        struct First {
            // fields
        };
        static const enum2$<RUST_ENUM_NAME>::VariantNames NAME;
        static const unsigned long DISCR_EXACT;
        enum2$<RUST_ENUM_NAME>::Variant0::First value;
    };
    struct Variant1 {
        struct Second {
            // fields
        };
        static const enum2$<RUST_ENUM_NAME>::VariantNames NAME;
        static const unsigned long DISCR_EXACT;
        enum2$<RUST_ENUM_NAME>::Variant1::Second value;
    };
    enum2$<RUST_ENUM_NAME>::Variant0 variant0;
    enum2$<RUST_ENUM_NAME>::Variant1 variant1;
    unsigned long tag;
};
An important note is that due to limitations in LLDB, the DISCR_* value generated is always a
u64 even if the enum is not #[repr(u64)]. This is largely a non-issue for LLDB because the
DISCR_* value and the tag are read into uint64_t values regardless of their type.
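One plausible way a debugger-side consumer might use this layout is to widen the tag to 64 bits and compare it against each variant's DISCR_EXACT. The sketch below assumes exactly that; `VariantInfo`, `widen_tag`, and `active_variant` are hypothetical names, and the little-endian zero-extension is an assumption of this example rather than something the layout above specifies.

```rust
/// Hypothetical debugger-side view of one `VariantN` struct from the union.
#[derive(Debug, PartialEq)]
struct VariantInfo {
    name: &'static str,
    discr_exact: u64,
}

/// Zero-extends the raw little-endian tag bytes to a u64, mirroring the
/// idea that the tag is read into a 64-bit value regardless of its type.
fn widen_tag(raw: &[u8]) -> u64 {
    let mut buf = [0u8; 8];
    let n = raw.len().min(8);
    buf[..n].copy_from_slice(&raw[..n]);
    u64::from_le_bytes(buf)
}

/// Finds the variant whose DISCR_EXACT matches the tag, if any.
fn active_variant(variants: &[VariantInfo], tag: u64) -> Option<&VariantInfo> {
    variants.iter().find(|v| v.discr_exact == tag)
}

fn main() {
    let variants = [
        VariantInfo { name: "First", discr_exact: 0 },
        VariantInfo { name: "Second", discr_exact: 1 },
    ];

    // A 4-byte tag holding the value 1.
    let tag = widen_tag(&[1, 0, 0, 0]);
    let active = active_variant(&variants, tag);

    assert_eq!(active.map(|v| v.name), Some("Second"));
    println!("{:?}", active);
}
```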
Source Information
TODO
[1] MSVC’s expression parser will treat >> as a right shift. It is necessary to separate consecutive >’s with a space (> >) in type names.
[2] While these type names are generated as part of the debug info node (which is then wrapped in a typedef node with the Rust name), once the LLVM-IR node is converted to a CodeView node, the type name information is lost. This is because CodeView has special shorthand nodes for primitive types, and those shorthand nodes do not have a “name” field.