LLDB Internals

LLDB’s debug info processing relies on a set of extensible interfaces largely defined in lldb/source/Plugins. These are meant to allow third-party compiler developers to add language support that LLDB loads at run-time, but at the time of writing (Nov 2025) the public API has not been settled on, so plugins exist either in LLDB itself or in standalone forks of LLDB.

Typically, language support will be written as a pipeline of these plugins: *ASTParser -> TypeSystem -> ExpressionParser/Language.

Here are some existing implementations of LLDB’s plugin API:

Rust Support and TypeSystemClang

As mentioned in the debug info overview, LLDB has partial Rust support. To clarify, Rust uses the plugin pipeline that was built for C/C++ (though it contains some helpers for Rust enum types), which relies directly on the clang compiler’s representation of types. This heavily restricts how much we can change when LLDB’s output doesn’t match what we want. Some workarounds can help, but ultimately Rust’s needs are secondary to keeping C and C++ compilation and debugging working correctly.

LLDB is receptive to adding a TypeSystemRust, but it is a massive undertaking. This section serves to not only document how we currently interact with TypeSystemClang, but also as light guidance on implementing a TypeSystemRust in the future.

It is worth noting that a TypeSystem directly interacting with the target language’s compiler is the intention, but it is not a requirement. One can create all the necessary supporting types within their plugin implementation.

Note: LLDB’s documentation, including comments in the source code, is pretty sparse. Trying to understand how language support works by reading TypeSystemClang’s implementation is somewhat difficult due to the added requirement of understanding the clang compiler’s internals. It is recommended to look at the 2 TypeSystemRust implementations listed above, as they are written “from scratch” without leveraging a compiler’s type representation. They are relatively close to the minimum necessary to implement language support.

DWARF vs PDB

LLDB is unique in being able to handle both DWARF and PDB debug information. This does come with some added complexity. To complicate matters further, PDB support is split between dia, which relies on the msdia140.dll library distributed with Visual Studio, and native, which is written from scratch using publicly available information about the PDB format.

Note: dia was the default up to LLDB version 21. native is the new default as of LLDB 22’s release. There are plans to deprecate and completely remove the dia-based plugins. As such, only native parsing will be discussed below. For progress, please see this discourse thread and the relevant tracking issue.

The native reader can be toggled via the plugin.symbol-file.pdb.reader setting added in LLDB 22, or by setting the environment variable LLDB_USE_NATIVE_PDB_READER to 0 or 1.

Debug Node Parsing

The first step is to process the raw debug nodes into something usable. This primarily occurs in the DWARFASTParser and PdbAstBuilder classes. These classes are fed a deserialized form of the debug info generated from SymbolFileDWARF and SymbolFileNativePDB respectively. The SymbolFile implementers make almost no transformations to the underlying debug info before passing it to the parsers. For both PDB and DWARF, the debug info is read using LLVM’s debug info handlers.

The parsers translate the nodes into more convenient formats for LLDB’s purposes. For clang, these formats are clang::QualType, clang::Decl, and clang::DeclContext, which are the types clang uses internally when compiling C and C++. Again, using the compiler’s representation of types is not a requirement, but the plugin system was built with it as a possibility.

Note: The above types will be referred to language-agnostically as LangType, Decl, and DeclContext when the specific implementation details of TypeSystemClang are not relevant.

LangType represents a type. This includes information such as the name of the type, the size and alignment, its classification (e.g. struct, primitive, pointer), its qualifiers (e.g. const, volatile), template arguments, function argument and return types, etc. Here is an example of what a RustType might look like.
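To make the list above concrete, here is a minimal sketch of how a plugin that does not lean on a compiler's type representation might model this information. All names (`LangType`, `TypeKind`, `Field`, `Qualifiers`) are assumptions made for illustration and are not taken from LLDB or from any existing plugin. The pointer size is hard-coded to 8 bytes purely to keep the example short.

```rust
// Hypothetical sketch: none of these names are taken from LLDB.

/// Broad classification of a type, with kind-specific data.
#[derive(Debug, Clone, PartialEq)]
pub enum TypeKind {
    Primitive,
    Struct { fields: Vec<Field> },
    Pointer { pointee: Box<LangType> },
    Function { args: Vec<LangType>, ret: Box<LangType> },
}

/// A named member of a struct, located by its byte offset.
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub byte_offset: u64,
    pub ty: LangType,
}

#[derive(Debug, Clone, Copy, PartialEq, Default)]
pub struct Qualifiers {
    pub is_const: bool,
    pub is_volatile: bool,
}

/// Everything the type system knows about one type.
#[derive(Debug, Clone, PartialEq)]
pub struct LangType {
    pub name: String,
    pub byte_size: u64,
    pub byte_align: u64,
    pub kind: TypeKind,
    pub qualifiers: Qualifiers,
    pub template_args: Vec<LangType>,
}

impl LangType {
    /// Builds a primitive whose alignment equals its size (an assumption).
    pub fn primitive(name: &str, byte_size: u64) -> Self {
        LangType {
            name: name.to_string(),
            byte_size,
            byte_align: byte_size,
            kind: TypeKind::Primitive,
            qualifiers: Qualifiers::default(),
            template_args: Vec::new(),
        }
    }

    /// Synthesizes a pointer-to-T, assuming a 64-bit target.
    pub fn pointer_to(pointee: LangType) -> Self {
        LangType {
            name: format!("*mut {}", pointee.name),
            byte_size: 8,
            byte_align: 8,
            kind: TypeKind::Pointer { pointee: Box::new(pointee) },
            qualifiers: Qualifiers::default(),
            template_args: Vec::new(),
        }
    }
}
```

A real implementation would likely intern these objects rather than clone them, since the TypeSystem is expected to own them and hand out pointers.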

Decl represents any kind of declaration. It could be a type, a variable, a static field of a struct, the value that a static or const is initialized with, etc.

DeclContext more or less represents a scope. DeclContexts typically contain Decls and other DeclContexts, though the relationship isn’t that straightforward. For example, a function can be both a Decl (because function signatures are types) and a DeclContext (because functions contain variable declarations, nested function declarations, etc.).
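The overlap between declarations and scopes can be sketched as follows. This is an illustrative model only: the `Decl` and `DeclContext` definitions here are hypothetical stand-ins, not clang's or LLDB's types, and the lookup deliberately ignores parent scopes.

```rust
// Hypothetical sketch: these types are illustrative, not LLDB's or clang's.

#[derive(Debug)]
pub enum Decl {
    Variable { name: String },
    /// A function is a declaration that also introduces a scope.
    Function { name: String, body: DeclContext },
    Module { name: String, scope: DeclContext },
}

/// A scope: an ordered collection of the declarations it contains.
#[derive(Debug, Default)]
pub struct DeclContext {
    pub decls: Vec<Decl>,
}

impl Decl {
    pub fn name(&self) -> &str {
        match self {
            Decl::Variable { name } => name,
            Decl::Function { name, .. } => name,
            Decl::Module { name, .. } => name,
        }
    }

    /// Returns the scope this declaration introduces, if it has one.
    pub fn as_decl_context(&self) -> Option<&DeclContext> {
        match self {
            Decl::Variable { .. } => None,
            Decl::Function { body, .. } => Some(body),
            Decl::Module { scope, .. } => Some(scope),
        }
    }
}

impl DeclContext {
    /// Looks a name up in this scope only; parent scopes are not searched.
    pub fn find(&self, name: &str) -> Option<&Decl> {
        self.decls.iter().find(|d| d.name() == name)
    }
}

/// Builds `mod demo { fn main() { let x; } }` as a declaration tree.
pub fn sample_module() -> Decl {
    let x = Decl::Variable { name: "x".to_string() };
    let main_fn = Decl::Function {
        name: "main".to_string(),
        body: DeclContext { decls: vec![x] },
    };
    Decl::Module {
        name: "demo".to_string(),
        scope: DeclContext { decls: vec![main_fn] },
    }
}
```

Here `main` is found as a Decl inside the module's scope, and the same object yields a DeclContext in which `x` is found.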

The translation process can be quite verbose, but is usually straightforward. Much of the work here is dependent on the exact information needed to fill out LangType, Decl, and DeclContext.

Once a node is translated, a pointer to it is type-erased (void*) and wrapped in CompilerType, CompilerDecl, or CompilerDeclContext. These wrappers associate the pointer with the TypeSystem that owns it. Methods on these objects delegate to the TypeSystem, which casts the void* back to the appropriate LangType*/Decl*/DeclContext* and operates on the internals. In Rust terms, the relationship looks something like this:

struct CompilerType {
    inner_type: *mut c_void,
    type_system: Arc<dyn TypeSystem>,
}

impl CompilerType {
    pub fn get_byte_size(&self) -> usize {
        // Delegate to the owning TypeSystem, passing the type-erased pointer
        self.type_system.get_byte_size(self.inner_type)
    }
}

...

impl TypeSystem for TypeSystemLang {
    fn get_byte_size(&self, lang_type: *mut c_void) -> usize {
        // Recover the concrete type from the type-erased pointer
        let lang_type = lang_type as *mut LangType;

        // Operate on the internals of the LangType to
        // determine its size
        ...
    }
}

Type Systems

The TypeSystem interface has 3 major purposes:

  1. Act as the “sole authority” of a language’s types. This allows the type system to be added to LLDB’s “pool” of type systems. When an executable is loaded, the target language is determined, and the pool is queried to find a TypeSystem that claims it can handle the language. One can also use the TypeSystem to retrieve the backing SymbolFile, search for types, and synthesize basic types that might not exist in the debug info (e.g. primitives, arrays-of-T, pointers-to-T).
  2. Manage the lifetimes of the LangType, Decl, and DeclContext objects
  3. Customize the “defaults” of how those types appear and how they can be interacted with.
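The pool lookup described in the first item can be sketched roughly as below. The `TypeSystemPool`, `LanguageKind`, and `ClangLike` names, along with the `supports_language` method, are assumptions invented for this example and do not mirror LLDB's actual interface. `ClangLike` claims Rust only to reflect the current situation described earlier, where Rust goes through the C/C++ pipeline.

```rust
// Hypothetical sketch: names and signatures are illustrative, not LLDB's API.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LanguageKind {
    C,
    CPlusPlus,
    Rust,
    Unknown,
}

pub trait TypeSystem {
    fn name(&self) -> &'static str;
    /// Whether this type system claims it can handle `language`.
    fn supports_language(&self, language: LanguageKind) -> bool;
}

/// Stand-in for a clang-backed type system that also accepts Rust.
pub struct ClangLike;

impl TypeSystem for ClangLike {
    fn name(&self) -> &'static str {
        "clang-like"
    }

    fn supports_language(&self, language: LanguageKind) -> bool {
        matches!(
            language,
            LanguageKind::C | LanguageKind::CPlusPlus | LanguageKind::Rust
        )
    }
}

/// The "pool": every registered type system, in registration order.
#[derive(Default)]
pub struct TypeSystemPool {
    systems: Vec<Box<dyn TypeSystem>>,
}

impl TypeSystemPool {
    pub fn register(&mut self, type_system: Box<dyn TypeSystem>) {
        self.systems.push(type_system);
    }

    /// Returns the first registered type system that claims `language`.
    pub fn find_for(&self, language: LanguageKind) -> Option<&dyn TypeSystem> {
        for type_system in &self.systems {
            if type_system.supports_language(language) {
                return Some(type_system.as_ref());
            }
        }
        None
    }
}
```

Resolving by registration order is a simplification chosen for the sketch; how LLDB prioritizes competing type systems is not covered here.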

The first two purposes are fairly straightforward, so we will focus on the third.

Many of the functions in the TypeSystem interface will look familiar if you have worked with the visualizer scripts. These functions underpin the SBType and SBValue functions with matching names. For example, TypeSystem::GetFormat returns the default format for the type if no custom formatter has been applied to it.

Of particular note are GetIndexOfChildWithName and GetNumChildren. The TypeSystem versions of these functions operate on a type, not a value like the SBValue versions. The values returned from the TypeSystem functions dictate what parts of the struct can be interacted with at all by the rest of LLDB. If a field is omitted, that field effectively no longer exists to LLDB.

Additionally, since they do not work with objects, there is no underlying memory to inspect or interpret. Essentially, this means these functions do not have the same purpose as their equivalent SyntheticProvider functions. There is no way to determine how many elements a Vec has or what address those elements live at. It is also not possible to determine the value of the discriminant of a sum-type.
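The following sketch illustrates the distinction with a hypothetical struct description. The function names echo GetNumChildren and GetIndexOfChildWithName, but the signatures, the `StructType` and `FieldInfo` types, and the `exposed` flag are all assumptions made for illustration. Note that the functions receive only a type description: there is no memory to read, so nothing here could report how many elements a buffer currently holds.

```rust
// Hypothetical sketch: signatures and types are illustrative, not LLDB's API.

pub struct FieldInfo {
    pub name: &'static str,
    /// Whether the type system chooses to report this field at all.
    pub exposed: bool,
}

pub struct StructType {
    pub name: &'static str,
    pub fields: Vec<FieldInfo>,
}

/// Type-level child count: only exposed fields are visible to callers.
pub fn get_num_children(ty: &StructType) -> usize {
    ty.fields.iter().filter(|f| f.exposed).count()
}

/// Type-level lookup by name; a hidden field cannot be found.
pub fn get_index_of_child_with_name(ty: &StructType, name: &str) -> Option<usize> {
    ty.fields
        .iter()
        .filter(|f| f.exposed)
        .position(|f| f.name == name)
}

/// A buffer-like struct with one field the type system omits.
pub fn sample_buffer_type() -> StructType {
    StructType {
        name: "Buffer",
        fields: vec![
            FieldInfo { name: "ptr", exposed: true },
            FieldInfo { name: "_marker", exposed: false },
            FieldInfo { name: "len", exposed: true },
        ],
    }
}
```

For `sample_buffer_type()`, the child count is 2 and `_marker` cannot be looked up, which mirrors how an omitted field stops existing for everything built on top of these queries.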

Ideally, the TypeSystem should expose types as they appear in the debug info with as few alterations as possible. LLDB’s synthetics and frontend can handle making the type pretty. If some piece of information is useless, the Rust compiler should be altered to not output that debug info in the first place.

Expression Parsing

The TypeSystem is typically written to have a counterpart that can handle expression parsing. It requires implementing a few extra functions in the TypeSystem interface. The bulk of the expression parsing code should live in lldb/source/Plugins/ExpressionParser.

There isn’t too much of note about the parser. It requires implementing a simple interpreter that can handle (possibly simplified) Rust syntax. The interpreter operates on lldb::ValueObjects, which are the objects that underpin SBValue.
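As a rough idea of the smallest possible interpreter, the sketch below evaluates only field-access paths such as `point.x` against a map of named values. The `Value` enum is a deliberately simplified stand-in for a value object, and `evaluate` and `sample_frame` are hypothetical names; a real expression parser would handle far more syntax and would work with LLDB's own value representation.

```rust
// Hypothetical sketch: `Value` is a simplified stand-in for a value object.
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Int(i64),
    Struct(HashMap<String, Value>),
}

/// Evaluates a dotted path like `point.x` against the variables in `frame`.
pub fn evaluate(expr: &str, frame: &HashMap<String, Value>) -> Result<Value, String> {
    let mut parts = expr.split('.').map(str::trim);

    // The first segment names a variable in the current frame.
    let root = parts
        .next()
        .filter(|s| !s.is_empty())
        .ok_or("empty expression")?;
    let mut current = frame
        .get(root)
        .ok_or_else(|| format!("unknown variable `{}`", root))?;

    // Each remaining segment selects a field of the current value.
    for field in parts {
        current = match current {
            Value::Struct(fields) => fields
                .get(field)
                .ok_or_else(|| format!("no field `{}`", field))?,
            Value::Int(_) => {
                return Err(format!("cannot access `{}` on an integer", field));
            }
        };
    }
    Ok(current.clone())
}

/// A frame containing one variable: `point = { x: 1, y: 2 }`.
pub fn sample_frame() -> HashMap<String, Value> {
    let mut point = HashMap::new();
    point.insert("x".to_string(), Value::Int(1));
    point.insert("y".to_string(), Value::Int(2));

    let mut frame = HashMap::new();
    frame.insert("point".to_string(), Value::Struct(point));
    frame
}
```

With `sample_frame()`, evaluating `point.x` yields `Value::Int(1)`, while `point.z` and `point.x.y` produce errors.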

Language

The Language plugins are the C++ equivalent to the Python visualizer scripts. They operate on SBValue objects for the same purpose: creating synthetic children and pretty-printing. The CPlusPlusLanguage’s implementations for the LibCxx types are great resources to learn how visualizers should be written.

These plugins can access LLDB’s private internals (including the underlying TypeSystem), so synthetic/summary providers written as a Language plugin can provide higher quality output than their Python equivalent.

While debug node parsing, type systems, and expression parsing are all closely tied to each other, the Language plugin is more encapsulated and thus can be written “standalone” for any language that an existing type system supports. Due to the lower barrier to entry, a RustLanguage plugin may be a good stepping stone towards full language support in LLDB.

Visualizers

WIP