Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

LLDB - Python Providers

NOTE: LLDB’s C++<->Python FFI expects a version of python designated at the time LLDB was compiled. LLDB is careful to correspond this version to the minimum in typical Linux and macOS distributions, but on Windows there is no easy solution. If you recieve an import error regarding _lldb not existing, a mismatched Python version is likely the cause.

LLDB is considering solutions this issue. For updates, see this discussion and this github issue

NOTE: Currently (Nov 2025), LLDB’s minimum supported Python version is 3.8 with plans to update it to 3.9 or 3.10 depending on several outside factors. Scripts should ideally be written with only the features available in the minimum supported Python version. Please see this discussion for more info.

NOTE: The path to LLDB’s python package can be located via the CLI command lldb -P

LLDB provides 3 mechanisms for customizing output:

  • Formats
  • Synthetic providers
  • Summary providers

Formats

The official documentation is here. In short, formats allow one to set the default print format for primitive types (e.g. print 25u8 as decimal 25, hex 0x19, or binary 00011001).

Rust will almost always need to override unsigned char, signed char, char, u8, and i8, to (unsigned) decimal format.

Synthetic Providers

The official documentation is here, but some information is vague, outdated, or entirely missing.

Nearly all interaction the user has with variables will be through LLDB’s SBValue objects which are used both in the Python API, and internally via LLDB’s plugins and CLI.

A Synthetic Provider is a Python class, written with a specific interface, that is associated with one or more Rust types. The Synthetic Provider wraps SBValue objects and LLDB will call our class’s functions when inspecting the variable.

The wrapped value is still an SBValue, but when calling e.g. SBValue.GetChildAtIndex, it will internally call SyntheticProvider.get_child_at_index. You can check if a value has a synthetic provider via SBValue.IsSynthetic(), and which synthetic it is via SBValue.GetTypeSynthetic(). If you want to interact with the underlying non-synthetic value, you can call SBValue.GetNonSyntheticValue().

The expected interface is as follows:

class SyntheticProvider:
    def __init__(self, valobj: SBValue, _lldb_internal): ...

    # optional
    def update(self) -> bool: ...

    # optional
    def has_children(self) -> bool: ...

    # optional
    def num_children(self, max_children: int) -> int: ...

    def get_child_index(self, name: str) -> int: ...

    def get_child_at_index(self, index: int) -> SBValue: ...

    # optional
    def get_type_name(self) -> str: ...

    # optional
    def get_value(self) -> SBValue: ...

Below are explanations of the methods, their quirks, and how they should generally be used. If a method overrides an SBValue method, that method will be listed.

__init__

This function is called once per object, and must store the valobj in the python class so that it is accessible elsewhere. Very little else should be done here.

(optional) update

This function is called prior to LLDB interacting with a variable, but after __init__. LLDB tracks whether update has already been called. If it has been, and if it is not possible for the variable to have changed (e.g. inspecting the same variable a second time without stepping), it will omit the call to update.

This function has 2 purposes:

  • Store/update any information that may have changed since the last time update was run
  • Inform LLDB if there were changes to the children such that it should flush the child cache.

Typical operations include storing the heap pointer, length, capacity, and element type of a Vec, determining an enum variable’s variant, or checking which slots of a HashMap are occupied.

The bool returned from this function is somewhat complicated, see: update caching below for more info. When in doubt, return False/None. Currently (Nov 2025), none of the visualizers return True, but that may change as the debug info test suite is improved.

update caching

LLDB attempts to cache values when possible, including child values. This cache is effectively the number of child objects, and the addresses of the underlying debugee memory that the child object represents. By returning True, you indicate to LLDB that the number of children and the addresses of those children have not changed since the last time update was run, meaning it can reuse the cached children.

Returning True in the wrong circumstances will result in the debugger outputting incorrect information.

Returning False indicates that there have been changes, the cache will be flushed, and the children will be fetched from scratch. It is the safer option if you are unsure.

The only relationship that matters is parent-to-child. Grandchildren depend on the update function of their direct parent, not that of the grandparent.

It is important to view the child cache as pointers-to-memory. For example, if a slice’s data_ptr value and length have not changed, returning True is appropriate. Even if the slice is mutable and elements of it are overwritten (e.g. slice[0] = 15), because the child cache consists of pointers, they will reflect the new data at that memory location.

Conversely, if data_ptr has changed, that means it is pointing to a new location in memory, the child pointers are invalid, and the cache must be flushed. If the length has changed, we need to flush the cache to reflect the new number of children. If length has changed but data_ptr has not, it is possible to store the old children in the SyntheticProvider itself (e.g. list[SBValue]) and dole those out rather than generating them from scratch, only creating new children if they do not already exist in the SyntheticProvider’s list.

For further clarification, see this discussion

NOTE: when testing the caching behavior, do not rely on LLDB’s heuristic to persist variables when stepping. Instead, store the variable in a python object (e.g. v = lldb.frame.var("var_name")), step forward, and then inspect the stored variable.

(optional) has_children

Overrides SBValue.MightHaveChildren

This is a shortcut used by LLDB to check if the value has children at all, without doing potentially expensive computations to determine how many children there are (e.g. linked list). Often, this will be a one-liner of return True/return False or return self.valobj.MightHaveChildren().

(optional) num_children

Overrides SBValue.GetNumChildren

Returns the total number of children that LLDB should try to access when printing the type. This number does not need to match to total number of synthetic children.

The max_children argument can be returned if calculating the number of children can be expensive (e.g. linked list). If this is not a consideration, max_children can be omitted from the function signature.

Additionally, fields can be intentionally “hidden” from LLDB while still being accessible to the user. For example, one might want a vec![1, 2, 3] to display only its elements, but still have the len and capacity values accessible on request. By returning 3 from num_children, one can restrict LLDB to only displaying [1, 2, 3], while users can still directly access v.len and v.capacity. See: Example Provider: Vec<T> to see an implementation of this.

get_child_index

Overrides SBValue.GetIndexOfChildWithName

Affects SBValue.GetChildMemberWithName

Given a name, returns the index that the child should be accessed at. It is expected that the return value of this function is passed directly to get_child_at_index. As with num_children, the values returned here can be arbitrary, so long as they are properly coordinated with get_child_at_index.

One special value is $$dereference$$. Accounting for this pseudo-field will allow LLDB to use the SBValue returned from get_child_at_index as the result of a dereference via LLDB’s expression parser (e.g. *val and val->field)

get_child_at_index

Overrides SBValue.GetChildAtIndex

Given an index, returns a child SBValue. Often these are generated via SBValue.CreateValueFromAddress, but less commonly SBValue.CreateChildAtOffset, SBValue.CreateValueFromExpression, and SBValue.CreateValueFromData. These functions can be a little finicky, so you may need to fiddle with them to get the output you want.

In some cases, SBValue.Clone is appropriate. It creates a new child that is an exact copy of an existing child, but with a new name. This is useful for cases like tuples, which have field names of the style __0, __1, … when we would prefer they were named 0, 1, …

Small alterations can be made to the resulting child before it is returned. This is useful for &str/String, where we would prefer if the children were displayed as lldb.eFormatBytesWithASCII rather than just as a decimal value.

(optional) get_type_name

Overrides SBValue.GetDisplayTypeName

Overrides the displayed name of a type. For a synthetic SBValue whose type name is overridden, the original type name can still be retrieved via SBValue.GetTypeName() and SBValue.GetType().GetName()

This can be helpful in shortening the name of common standard library types (e.g. std::collections::hash::map::HashMap<K, V, std::hash::random::RandomState> -> HashMap<K, V>), or in normalizing MSVC type names (e.g. ref$<str$> -> &str).

The string manipulation can be a little tricky, especially on MSVC where we cannot conveniently access the generic parameters of the type.

(optional) get_value

Overrides SBValue.GetValue(), SBValue.GetValueAsUnsigned(), SBValue.GetValueAsSigned(), SBValue.GetValueAsAddress(),

The SBValue returned is expected to be a primitive type or pointer, and is treated as the value of the variable in expressions.

IMPORTANT: The SBValue returned must be stored in the SyntheticProvider. There is currently (Nov 2025) a bug where if the SBValue is acquired within get_value and not stored anywhere, Python will segfault when LLDB attempts to access the value.

Summary Providers

Summary providers are python functions of the following form:

def SummaryProvider(valobj: SBValue, _lldb_internal) -> str: ...

Where the returned string is passed verbatim to the user. If the returned value isn’t a string, it is naively convered to a string (e.g. return None prints "None", not an empty string).

If the SBValue passed in is of a type that has a Synthetic Provider, valobj.IsSynthetic() will return True, and the synthetic’s corresponding functions will be used. If this is undesirable, the original value can be retrieved via valobj.GetNonSyntheticValue(). This can be helpful in cases like String, where individually calling GetChildAtIndex in a loop is much slower than accessing the heap pointer, reading the whole byte array directly from the debugee’s memory, and using Python’s bytes.decode().

Instance Summaries

Regular SummaryProvider functions take an opaque SBValue. That SBValue will reflect the type’s SyntheticProvider if one exists, but we cannot access the SyntheticProvider instance itself, or any of its internal implementation details. This is deterimental in cases where we need some of those internal details to help complete the summary. Currently (Nov 2025), in the synthetic we just run the non-synthetic value through the synthetic provider (synth = SyntheticProvider(valobj.GetNonSyntheticValue(), _dict)), but this is obviously suboptimal and there are plans to use the method outlined below.

Instead, we can leverage the Python module’s state to allow for instance summaries. Prior art for this technique exists in the old CodeLLDB Rust visualizer scripts.

In short: every Synthetic Provider’s __init__ function stores a unique ID and a weak reference to self in a global dictionary. The Synthetic Provider class also implements a get_summary function. The type’s SummaryProvider is a function that looks up the unique ID in this dictionary, then calls a get_summary on the instance it retrieves.

import weakref

SYNTH_BY_ID = weakref.WeakValueDictionary()

class SyntheticProvider:
    valobj: SBValue

    # slots requires opting-in to __weakref__
    __slots__ = ("valobj", "__weakref__")

    def __init__(valobj: SBValue, _dict):
        SYNTH_BY_ID[valobj.GetID()] = self
        self.valobj = valobj

    def get_summary(self) -> str:
        ...

def InstanceSummaryProvider(valobj: SBValue, _dict) -> str:
    # GetNonSyntheticValue should never fail as InstanceSummaryProvider implies an instance of a
    # `SyntheticProvider`. No non-synthetic types should ever have this summary assigned to them
    # We use GetNonSyntheticValue because the synthetic vaobj has its own unique ID
    return SYNTH_BY_ID[valobj.GetNonSyntheticValue().GetID()].get_summary()

For example, one might use this for the Enum synthetic provider. The summary would like to access the variant name, but there isn’t a convenient way to reflect this via the type name or child-values of the synthetic. By implementing an instance summary, we can retrieve the variant name via self.variant.GetTypeName() and some string manipulation.

Writing Visualizer Scripts

IMPORTANT: Unlike GDB and CDB, LLDB can debug executables with either DWARF or PDB debug info. Visualizers must be written to account for both formats whenever possible. See: rust-codegen for an overview of the differences

Scripts are injected into LLDB via the CLI command command script import <path-to-script>.py. Once injected, classes and functions can be added to the synthetic/summary pool with type synthetic add and type summary add respectively. The summaries and synthetics can be associated with a “category”, which is typically named after the language the providers are intended for. The category we use will be called Rust.

TIP: all LLDB commands can be prefixed with help (e.g. help type synthetic add) for a brief description, list of arguments, and examples.

Currently (Nov 2025) we use command source ..., which executes a series of CLI commands from the file lldb_commands to add providers. This file is somewhat unwieldy, and will soon be supplanted by the Python API equivalent outlined below.

__lldb_init_module

This is an optional function of the form:

def __lldb_init_module(debugger: SBDebugger, _lldb_internal) -> None: ...

This function is called at the end of command script import ..., but before control returns back to the CLI. It allows the script to initialize its own state.

Crucially, it is passed a reference to the debugger itself. This allows us to create the Rust category and add providers to it. It can also allow us to conditionally change which providers we use depending on what version of LLDB the script detects. This is vital for backwards compatibility once we begin using recognizer functions, as recognizers were added in lldb 19.0.

Visualizer Resolution

The order that visualizers resolve in is listed here. In short:

  • If there is an exact match (non-regex name, recognizer function, or type already matched to provider), use that
  • If the object is a pointer/reference, try to use the dereferenced type’s formatter
  • If the object is a typedef, check the underlying type for a formatter
  • If none of the above work, iterate through the regex type matchers

Within each of those steps, iteration is done backwards to allow new commands to “override” old commands. This is important for cases like Box<str> vs Box<T>, were we want a specialized synthetic for the former, but a more generalized synthetic for the latter.

Minutiae

LLDB’s API is very powerful, but there are some “gotchas” and unintuitive behavior, some of which will be outlined below. The python implementation can be viewed at the path returned by the CLI command lldb -P in lldb\__init__.py. In addition to the examples in the lldb repo, there are also C++ visualizers that can be used as a reference (e.g. LibCxxVector, the equivalent to Vec<T>). While C++’s visualizers are written in C++ and have access to LLDB’s internals, the API and general practices are very similar.

SBValue

  • Pointer/reference SBValues will effectively “auto-deref” in some cases, acting as if the children of the pointed-to-object are its own children.
  • The non-function fields are typically property() fields that point directly to the function anyway (e.g. SBValue.type = property(GetType, None)). Accessing through these shorthands is a bit slower to access than just calling the function directly, so they should be avoided. Some of the properties return special objects with special properties (e.g. SBValue.member returns an object that acts like dict[str, SBValue] to access children). Internally, many of these special objects just allocate a new class instance and call the function on the SBValue anyway, resulting in additional performance loss (e.g. SBValue.member internally just implements __getitem__ which is the one-liner return self.valobj.GetChildMemberWithName(name))
  • SBValue.GetID returns a unique int for each value for the duration of the debug session. Synthetic SBValue’s have a different ID than their underlying SBValue. The underlying ID can be retrieved via SBValue.GetNonSyntheticValue().GetID().
  • When manually calculating an address, SBValue.GetValueAsAddress should be preferred over SBValue.GetValueAsUnsigned due to target-specific behavior
  • Getting a string representation of an SBValue can be tricky because GetSummary requires a summary provider and GetValue requires the type be representable by a primitive. In almost all cases where neither of those conditions are met, the type is a user defined struct that can be passed through StructSummaryProvider.

SBType

  • “Aggregate type” means a non-primitive struct/class/union
  • “Template” is equivalent to “Generic”
  • Types can be looked up by their name via SBTarget.FindFirstType(type_name). SBTarget can be acquired via SBValue.GetTarget
  • SBType.template_args returns None instead of an empty list if the type has no generics
  • It is sometimes necessary to transform a type into the type you want via functions like SBType.GetArrayType and SBType.GetPointerType. These functions cannot fail. They ask the underlying LLDB TypeSystem plugin for the type, bypassing the debug info completely. Even if the type does not exist in the debug info at all, these functions can create the appropriate type.
  • SBType.GetCanonicalType is effectively SBType.GetTypedefedType + SBType.GetUnqualifiedType. Unlike SBType.GetTypedefedType, it will always return a valid SBType regardless of whether or not the original SBType is a typedef.
  • SBType.GetStaticFieldWithName was added in LLDB 18. Unfortunately, backwards compatibility isn’t always possible since the static fields are otherwise completely inaccessible.

Example Provider: Vec<T>

SyntheticProvider

We start with the typical prelude, using __slots__ since we have known fields. In addition to the object itself, we also need to store the type of the elements because Vec’s heap pointer is a *mut u8, not a *mut T. Rust is a statically typed language, so the type of T will never change. That means we can store it during initialization. The heap pointer, length, and capacity can change though, and thus are default initialized here.

import lldb

class VecSyntheticProvider:
    valobj: SBValue
    data_ptr: SBValue
    len: int
    cap: int
    element_type: SBType

    __slots__ = (
        "valobj",
        "data_ptr",
        "len",
        "cap",
        "element_type",
        "__weakref__",
    )

    def __init__(valobj: SBValue, _dict) -> None:
        self.valobj = valobj
        # invalid type is a better default than `None`
        self.element_type = SBType()

        # special handling to account for DWARF/PDB differences
        if (arg := valobj.GetType().GetTemplateArgumentType(0)):
            self.element_type = arg
        else:
            arg_name = next(get_template_args(valobj.GetTypeName()))
            self.element_type = resolve_msvc_template_arg(arg_name, valobj.GetTarget())

For the implementation of get_template_args and resolve_msvc_template_arg, please see: lldb_providers.py.

Next, the update function. We check if the pointer or length have changed. We can ommit checking the capacity, as the number of children will remain the same unless len changes. If changing the capacity resulted in a reallocation, data_ptr’s address would be different.

If data_ptr and length haven’t changed, we can take advantage of LLDB’s caching and return early. If they have changed, we store the new values and tell LLDB to flush the cache.

def update(self):
    ptr = self.valobj.GetChildMemberWithName("data_ptr")
    len = self.valobj.GetChildMemberWithName("length").GetValueAsUnsigned()

    if (
        self.data_ptr.GetValueAsAddress() == ptr.GetValueAsAddress()
        and self.len == len
    ):
        # Our child address offsets and child count are still valid
        # so we can reuse cached children
        return True

    self.data_ptr = ptr
    self.len = len

    return False

has_children and num_children are both straightforward:

def has_children(self) -> bool:
    return True

def num_children(self) -> int:
    return self.len

When accessing elements, we expect values of the format [0], [1], etc. to mimic indexing. Additionally, we still want the user to be able to quickly access the length and capacity, as they can be very useful when debugging. We assign these values u32::MAX - 1 and u32::MAX - 2 respectively, as we can almost surely guarantee that they will not overlap with element values. Note that we can account for both the full and shorthand capacity name.

    def get_child_index(self, name: str) -> int:
        index = name.lstrip("[").rstrip("]")
        if index.isdigit():
            return int(index)
        if name == "len":
            return lldb.UINT32_MAX - 1
        if name == "cap" or name == "capacity":
            return lldb.UINT32_MAX - 2

        return -1

We now have to properly coordinate get_child_at_index so that the elements, length, and capacity are all accessible.

def get_child_at_index(self, index: int) -> SBValue:
    if index == UINT32_MAX - 1:
        return self.valobj.GetChildMemberWithName("len")
    if index == UINT32_MAX - 2:
        return (
            self.valobj.GetChildMemberWithName("buf")
            .GetChildMemberWithName("inner")
            .GetChildMemberWithName("cap")
            .GetChildAtIndex(0)
            .Clone("capacity")
        )
    addr = self.data_ptr.GetValueAsAddress()
    addr += index * self.element_type.GetByteSize()
    return self.valobj.CreateValueFromAddress(f"[{index}]", addr, self.element_type)

For the type’s display name, we can strip the path qualifier. User defined types named Vec will end up fully qualified, so there shouldn’t be any ambiguity. We can also remove the allocator generic, as it’s very very rarely useful. We use get_template_args instead of self.element_type.GetName() for 3 reasons:

  1. If we fail to resolve the element type for any reason, self.valobj’s type name can still let the user know what the real type of the element is
  2. Type names are not subject to the limitations of DWARF and PDB nodes, so the template type in the name will reflect things like *const/*mut and &/&mut.
  3. We do not currently (Nov 2025) normalize MSVC type names, but once we do, we will need to work with the string-names of types anyway. It’s also much easier to cache a string-to-string conversion compared to an SBType-to-string conversion.
def get_type_name(self) -> str:
    return f"Vec<{next(get_template_args(self.valobj))}>"

There isn’t an appropriate primitive value with which to represent a Vec, so we simply ommit the get_value function.

SummaryProvider

The summary provider is very simple thanks to our synthetic provider. The only real hiccup is that GetSummary only returns a value if the object’s type has a SummaryProvider. If it doesn’t, it will return an empty string which is not ideal. In a full set of visualizer scripts, we can ensure that every type that doesn’t have a GetSummary() or a GetValue() is a struct, and then delegate to a generic StructSummaryProvider. For this demonstration, I will gloss over that detail.

def VecSummaryProvider(valobj: SBValue, _lldb_internal) -> str:
    children = []
    for i in range(valobj.GetNumChildren()):
        child = valobj.GetChildAtIndex(i)
        summary = child.GetSummary()
        if summary is None:
            summary = child.GetValue()
            if summary is None:
                summary = "{...}"

        children.append(summary)

    return f"vec![{", ".join(children)}]"

Enabling the providers

Assume this synthetic is imported into lldb_lookup.py

With CLI commands:

type synthetic add -l lldb_lookup.synthetic_lookup -x "^(alloc::([a-z_]+::)+)Vec<.+>$" --category Rust
type summary add -F lldb_lookup.summary_lookup -x "^(alloc::([a-z_]+::)+)Vec<.+>$" --category Rust

With __lldb_init_module:

def __lldb_init_module(debugger: SBDebugger, _dict: LLDBOpaque):
    # Ensure the category exists and is enabled
    rust_cat = debugger.GetCategory("Rust")

    if not rust_cat.IsValid():
        rust_cat = debugger.CreateCategory("Rust")

    rust_cat.SetEnabled(True)

    # Register Vec providers
    vec_regex = r"^(alloc::([a-z_]+::)+)Vec<.+>$"
    sb_name = lldb.SBTypeNameSpecifier(vec_regex, is_regex=True)

    sb_synth = lldb.SBTypeSynthetic.CreateWithClassName("lldb_lookup.VecSyntheticProvider")
    sb_synth.SetOptions(lldb.eTypeOptionCascade)

    sb_summary = lldb.SBTypeSummary.CreateWithFunctionName("lldb_lookup.VecSummaryProvider")
    sb_summary.SetOptions(lldb.eTypeOptionCascade)

    rust_cat.AddTypeSynthetic(sb_name, sb_synth)
    rust_cat.AddSummary(sb_name, sb_summary)

Output

Without providers:

(lldb) v vec_v
(alloc::vec::Vec<int, alloc::alloc::Global>) vec_v = {
  buf = {
    inner = {
      ptr = {
        pointer = (pointer = "\n")
        _marker = {}
      }
      cap = (__0 = 5)
      alloc = {}
    }
    _marker = {}
  }
  len = 5
}
(lldb) v vec_v[0]
error: <user expression 0>:1:6: subscripted value is not an array or pointer
   1 | vec_v[0]
     | ^

With providers (v <var_name> prints the summary and then a list of all children):

(lldb) v vec_v
(Vec<int>) vec_v = vec![10, 20, 30, 40, 50] {
  [0] = 10
  [1] = 20
  [2] = 30
  [3] = 40
  [4] = 50
}
(lldb) v vec_v[0]
(int) vec_v[0] = 10

We can also confirm that the “hidden” length and capacity are still accessible:

(lldb) v vec_v.len
(unsigned long long) vec_v.len = 5
(lldb) v vec_v.capacity
(unsigned long long) vec_v.capacity = 5
(lldb) v vec_v.cap
(unsigned long long) vec_v.cap = 5