Debugging LLVM
NOTE: If you are looking for info about code generation, please see this chapter instead.
This section is about debugging compiler bugs in code generation (e.g. why the compiler generated some piece of code or crashed in LLVM). LLVM is a big project on its own that probably needs to have its own debugging document (not that I could find one). But here are some tips that are important in a rustc context:
Minimize the example
As a general rule, compilers generate lots of information from analyzing code. Thus, a useful first step is usually to find a minimal example. One way to do this is to
1. create a new crate that reproduces the issue (e.g. adding whatever crate is at fault as a dependency, and using it from there)
2. minimize the crate by removing external dependencies; that is, moving everything relevant to the new crate
3. further minimize the issue by making the code shorter (there are tools that help with this, like creduce)
For more discussion on methodology for steps 2 and 3 above, there is an epic blog post from pnkfelix specifically about Rust program minimization.
Enable LLVM internal checks
The official compilers (including nightlies) have LLVM assertions disabled,
which means that LLVM assertion failures can show up as compiler crashes (not
ICEs but "real" crashes) and other sorts of weird behavior. If you are
encountering these, it is a good idea to try using a compiler with LLVM
assertions enabled - either an "alt" nightly or a compiler you build yourself
by setting [llvm] assertions=true
in your config.toml - and see whether
anything turns up.
The rustc build process builds the LLVM tools into
./build/<host-triple>/llvm/bin
. They can be called directly.
These tools include:
- llc, which compiles bitcode (.bc files) to executable code; this can be used to replicate LLVM backend bugs.
- opt, a bitcode transformer that runs LLVM optimization passes.
- bugpoint, which reduces large test cases to small, useful ones.
- and many others, some of which are referenced in the text below.
By default, the Rust build system does not check for changes to the LLVM source code or
its build configuration settings. So, if you need to rebuild the LLVM that is linked
into rustc
, first delete the file .llvm-stamp
, which should be located
in build/<host-triple>/llvm/
.
The default rustc compilation pipeline has multiple codegen units, which is
hard to replicate manually and means that LLVM is called multiple times in
parallel. If you can get away with it (i.e. if it doesn't make your bug
disappear), passing -C codegen-units=1
to rustc will make debugging easier.
Get your hands on raw LLVM input
For rustc to generate LLVM IR, you need to pass the --emit=llvm-ir
flag. If
you are building via cargo, use the RUSTFLAGS
environment variable (e.g.
RUSTFLAGS='--emit=llvm-ir'
). This causes rustc to spit out LLVM IR into the
target directory.
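As a rough sketch of the cargo workflow: after a build with RUSTFLAGS='--emit=llvm-ir', the .ll files can be collected with a small helper. The function name list_llvm_ir is hypothetical, and the default directory assumes cargo's default target directory; the files typically end up in a deps/ subdirectory, but treat that as an assumption and search the whole tree.

```shell
# Sketch: list the textual LLVM IR files left behind by a build such as
#   RUSTFLAGS='--emit=llvm-ir' cargo build
# list_llvm_ir is a hypothetical helper name. The default directory
# "target" assumes cargo's default target directory.
list_llvm_ir() {
    find "${1:-target}" -type f -name '*.ll' | sort
}
```

Calling list_llvm_ir with no argument searches ./target; pass another directory if your build uses a custom target directory.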
cargo llvm-ir [options] path
spits out the LLVM IR for a particular function
at path
. (cargo install cargo-asm
installs cargo asm
and cargo llvm-ir
). --build-type=debug
emits code for debug builds. There are also
other useful options. Also, debug info in LLVM IR can clutter the output a lot:
RUSTFLAGS="-C debuginfo=0"
is really useful.
RUSTFLAGS="-C save-temps"
outputs LLVM bitcode (the binary encoding, as opposed to textual IR) at
different stages during compilation, which is sometimes useful. The output LLVM
bitcode will be in .bc
files in the compiler's output directory, set via the
--out-dir DIR
argument to rustc
.
- If you are hitting an assertion failure or segmentation fault from the LLVM backend when invoking rustc itself, it is a good idea to try passing each of these .bc files to the llc command, and see if you get the same failure. (LLVM developers often prefer a bug reduced to a .bc file over one that uses a Rust crate for its minimized reproduction.)
- To get human-readable versions of the LLVM bitcode, convert the bitcode (.bc) files to .ll files using llvm-dis, which should be in ./build/<host-triple>/llvm/bin of your local rustc build.
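Trying each .bc file by hand gets tedious, so a loop along the following lines may help. This is only a sketch: the function name try_llc_on_bitcode and the LLC environment variable are illustrative choices, not part of rustc or LLVM. LLC defaults to whatever llc is on PATH; for a locally built rustc you would point it at the llc under ./build/<host-triple>/llvm/bin.

```shell
# Sketch: run llc over every .bc file in a directory and report failures.
# try_llc_on_bitcode and the LLC variable are hypothetical names.
try_llc_on_bitcode() {
    dir=$1
    llc_bin=${LLC:-llc}
    failed=0
    for bc in "$dir"/*.bc; do
        # Skip the unexpanded pattern when the directory has no .bc files.
        [ -e "$bc" ] || continue
        # Discard the generated code; keep llc's diagnostics next to the input.
        if "$llc_bin" -o /dev/null "$bc" >/dev/null 2>"$bc.llc.log"; then
            echo "ok:   $bc"
        else
            echo "FAIL: $bc (see $bc.llc.log)"
            failed=$((failed + 1))
        fi
    done
    echo "$failed file(s) failed"
}
```

A file reported as FAIL is a candidate for a standalone reproduction; its log file holds whatever llc printed to standard error.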
Note that rustc emits different IR depending on whether -O
is enabled, even
without LLVM's optimizations, so if you want to play with the IR rustc emits,
you should:
$ rustc +local my-file.rs --emit=llvm-ir -O -C no-prepopulate-passes \
-C codegen-units=1
$ OPT=./build/$TRIPLE/llvm/bin/opt
$ $OPT -S -O2 < my-file.ll > my
If you just want to get the LLVM IR during the LLVM pipeline, to e.g. see which
IR causes an optimization-time assertion to fail, or to see when LLVM performs
a particular optimization, you can pass the rustc flag -C llvm-args=-print-after-all
, and possibly add -C llvm-args='-filter-print-funcs=EXACT_FUNCTION_NAME'
(e.g. -C llvm-args='-filter-print-funcs=_ZN11collections3str21_$LT$impl$u20$str$GT$7replace17hbe10ea2e7c809b0bE'
).
That writes a lot of output to standard error, so you'll want to redirect it
to a file. Also, if you are using neither -filter-print-funcs
nor -C codegen-units=1
, then, because the multiple codegen units run in parallel, the
printouts will mix together and you won't be able to read anything.
- One caveat to the aforementioned methodology: the -print family of options to LLVM only prints the IR unit that the pass runs on (e.g., just a function), and does not include any referenced declarations, globals, metadata, etc. This means you cannot in general feed the output of -print into llc to reproduce a given problem.
- Within LLVM itself, calling F.getParent()->dump() at the beginning of SafeStackLegacyPass::runOnFunction will dump the whole module, which may provide a better basis for reproduction. (However, you should be able to get that same dump from the .bc files dumped by -C save-temps.)
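Once the -print-after-all output has been redirected to a file (for example by appending 2> passes.log to the rustc invocation), a quick index of the dumps makes the log easier to navigate. The sketch below assumes that each dump starts with a banner line containing the text "IR Dump After"; that matches the LLVM versions I have seen, but the exact banner format is an assumption. The helper name list_ir_dumps is hypothetical.

```shell
# Sketch: list the pass banners in a captured -print-after-all log,
# prefixed with their line numbers in the log.
# Assumption: each dump begins with a line containing "IR Dump After".
list_ir_dumps() {
    grep -n 'IR Dump After' "$1"
}
```

The line numbers let you jump straight to the dump that follows a given pass.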
If you want just the IR for a specific function (say, you want to see why it
causes an assertion or doesn't optimize correctly), you can use llvm-extract
,
e.g.
$ ./build/$TRIPLE/llvm/bin/llvm-extract \
-func='_ZN11collections3str21_$LT$impl$u20$str$GT$7replace17hbe10ea2e7c809b0bE' \
-S \
< unextracted.ll \
> extracted.ll
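Both llvm-extract and -filter-print-funcs need the exact mangled symbol name. One way to find it, sketched below, is to list the names on the define lines of the .ll file. The helper name list_defined_functions is hypothetical, and the sed pattern is a simplification that assumes each definition sits on a single line starting with "define".

```shell
# Sketch: print the name of every function defined in a textual IR file.
# list_defined_functions is a hypothetical helper name.
# Assumption: definitions look like `define <attrs/types> @name(...)`,
# with the name optionally wrapped in double quotes.
list_defined_functions() {
    sed -n 's/^define [^@]*@"\{0,1\}\([^"(]*\)"\{0,1\}(.*/\1/p' "$1"
}
```

Piping the result through grep with a fragment of the Rust path (such as replace) usually narrows the list down to the symbol you want.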
Investigate LLVM optimization passes
If you are seeing incorrect behavior due to an optimization pass, a very handy
LLVM option is -opt-bisect-limit
, which takes an integer denoting the index
value of the highest pass to run. Index values for taken passes are stable
from run to run; by coupling this with software that automates bisecting the
search space based on the resulting program, an errant pass can be quickly
determined. When an -opt-bisect-limit
is specified, all runs are displayed
to standard error, along with their index and output indicating if the
pass was run or skipped. Setting the limit to an index of -1 (e.g.,
RUSTFLAGS="-C llvm-args=-opt-bisect-limit=-1"
) will show all passes and
their corresponding index values.
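The automated bisection mentioned above might look roughly like the following sketch. It assumes a hypothetical, user-supplied shell function is_bad that rebuilds with RUSTFLAGS="-C llvm-args=-opt-bisect-limit=N" for the N it is given and succeeds when the bug reproduces; bisect_opt_limit is likewise an illustrative name. For simplicity it also assumes a build with -C codegen-units=1, so that there is a single sequence of pass indices to search.

```shell
# Sketch: binary-search the smallest -opt-bisect-limit at which a bug appears.
# Assumptions (not from the text above):
#   * is_bad N is a hypothetical user-supplied function that rebuilds with
#     limit N and returns 0 when the bug reproduces;
#   * the bug is absent at limit $1 and present at limit $2.
# Note: is_bad must not modify the shell variables lo, hi, or mid.
bisect_opt_limit() {
    lo=$1   # known-good limit, e.g. 0
    hi=$2   # known-bad limit, e.g. the highest index shown with limit -1
    while [ $((hi - lo)) -gt 1 ]; do
        mid=$(( (lo + hi) / 2 ))
        if is_bad "$mid"; then
            hi=$mid
        else
            lo=$mid
        fi
    done
    # hi is the lowest limit that still shows the bug, so the pass with
    # index hi is the first one whose execution makes it appear.
    echo "$hi"
}
```

The pass printed next to the resulting index in the -opt-bisect-limit output is then the first suspect.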
If you want to play with the optimization pipeline, you can use the opt
tool
from ./build/<host-triple>/llvm/bin/
with the LLVM IR emitted by rustc.
When investigating the implementation of LLVM itself, you should be aware of its internal debug infrastructure. This is provided in LLVM Debug builds, which you enable for rustc LLVM builds by changing this setting in the config.toml:
[llvm]
# Indicates whether the LLVM assertions are enabled or not
assertions = true
# Indicates whether the LLVM build is a Release or Debug build
optimize = false
The quick summary is:
- Setting assertions=true enables coarse-grain debug messaging.
  - Beyond that, setting optimize=false enables fine-grain debug messaging.
- LLVM_DEBUG(dbgs() << msg) in LLVM is like debug!(msg) in rustc.
- The -debug option turns on all messaging; it is like setting the environment variable RUSTC_LOG=debug in rustc.
- The -debug-only=<pass1>,<pass2> variant is more selective; it is like setting the environment variable RUSTC_LOG=path1,path2 in rustc.
Getting help and asking questions
If you have some questions, head over to the rust-lang Zulip and
specifically the #t-compiler/wg-llvm
stream.
Compiler options to know and love
The -C help
and -Z help
compiler switches will list out a variety
of interesting options you may find useful. Here are a few of the most
common that pertain to LLVM development (some of them are employed in the
tutorial above):
- The --emit llvm-ir option emits a <filename>.ll file with LLVM IR in textual format.
  - The --emit llvm-bc option emits in bitcode format (<filename>.bc).
- Passing -C llvm-args=<foo> allows passing pretty much all the options that tools like llc and opt would accept; e.g. -C llvm-args=-print-before-all to print IR before every LLVM pass.
- The -C no-prepopulate-passes option avoids pre-populating the LLVM pass manager with a list of passes. This allows you to view the LLVM IR that rustc generates, not the LLVM IR after optimizations.
- The -C passes=val option allows you to supply a space-separated list of extra LLVM passes to run.
- The -C save-temps option saves all temporary output files during compilation.
- The -Z print-llvm-passes option prints the LLVM optimization passes being run.
- The -Z time-llvm-passes option measures the time of each LLVM pass.
- The -Z verify-llvm-ir option verifies the LLVM IR for correctness.
- The -Z no-parallel-backend option disables parallel compilation of distinct compilation units.
- The -Z llvm-time-trace option outputs a Chrome profiler compatible JSON file which contains details and timings for LLVM passes.
- The -C llvm-args=-opt-bisect-limit=<index> option allows for bisecting LLVM optimizations.
Filing LLVM bug reports
When filing an LLVM bug report, you will probably want some sort of minimal working example that demonstrates the problem. The Godbolt compiler explorer is really helpful for this.
- Once you have some LLVM IR for the problematic code (see above), you can create a minimal working example with Godbolt. Go to llvm.godbolt.org.
- Choose LLVM-IR as programming language.
- Use llc to compile the IR to a particular target as is:
  - There are some useful flags: -mattr enables target features, -march= selects the target, -mcpu= selects the CPU, etc.
  - Commands like llc -march=help output all architectures available, which is useful because sometimes the Rust arch names and the LLVM names do not match.
  - If you have compiled rustc yourself somewhere, in the target directory you have binaries for llc, opt, etc.
- If you want to optimize the LLVM-IR, you can use opt to see how the LLVM optimizations transform it.
- Once you have a Godbolt link demonstrating the issue, it is pretty easy to file an LLVM bug. Just visit their GitHub issues page.
Porting bug fixes from LLVM
Once you've identified the bug as an LLVM bug, you will sometimes find that it has already been reported and fixed in LLVM, but we haven't gotten the fix yet (or perhaps you are familiar enough with LLVM to fix it yourself).
In that case, we can sometimes opt to port the fix for the bug directly to our own LLVM fork, so that rustc can use it more easily. Our fork of LLVM is maintained in rust-lang/llvm-project. Once you've landed the fix there, you'll also need to land a PR modifying our submodule commits -- ask around on Zulip for help.