I’ve been reading about stages of Rust compilation, and wrote about some of my learnings in a previous post. As I started reading more, I realized that it might be helpful to find a way to contribute to the project, learn more about the compiler and validate my assumptions / understanding of the compiler.
Finding an issue to work on
The rustc compiler is a large codebase, and definitely intimidating at first glance. As with contributing to any new OSS project, I started by looking at the issues that were tagged as beginner friendly and found one tagged E-mentor and E-easy: #129599.
It involved adding a new std_features flag to the bootstrap phase of the compiler, which helps enable std features for compiler development. These could be features like panic-unwind or backtrace when building the compiler.
This felt like a good issue to start with. The issue itself was relatively straightforward, but I first had to understand how the compiler is bootstrapped and how the std_features flag is passed to the compiler.
I also connected with Onur Ozkan, who was the mentor for the issue, on Zulip chat, and he was very helpful and welcoming throughout the process. Thanks Onur!
What’s bootstrapping?
Bootstrapping is how the compiler compiles itself, typically by using an older version to build a newer version. For rust to bootstrap, it has 4 stages:
- Stage 0: Pre-compiled compiler. This is the current beta rustc, downloaded from the internet.
- Stage 1: Compiler built with the pre-compiled compiler from Stage 0.
- Stage 2: The stage 1 compiler rebuilt with itself; this is what gets distributed with rustup.
- Stage 3 (optional): We can build libs with this compiler to sanity check that the compiler result is the same as the stage 2 compiler.
The Rustc Dev Guide on Bootstrapping has a lot more details on bootstrapping, if you’re interested in learning more.
Getting started
I followed the steps in the Quick Start guide on my macOS machine, and I ran into my first issue:
The ./x.py setup command failed with the following error:
It looked like the build failed because the dependencies couldn’t be compiled. Initially, I thought that I had some system dependencies missing, but later found out that it was because of conflicting system dependencies. I had binutils installed via brew, so there was a strip in my path that conflicted with the strip being used by the compiler.
So I had to remove binutils from my path, and then the build was successful:
What’s that python script doing in my rust compiler?
Wait a minute - why’s there a python script to build the compiler? 🤔
x.py is a wrapper script that calls into the bootstrap tool - a cross-platform build tool backed by cargo, used specifically for the rust project. x.py by itself doesn’t do much other than checking the python version and then invoking the bootstrap.py script, which does the following:
- Parses the CLI arguments and passes them on to the bootstrap tool.
- Downloads the rust toolchain, and makes sure that the necessary build tools (rustc, cargo etc.) are available.
- Runs the bootstrap build using cargo to compile the bootstrap tool in src/bootstrap.
- Finally, runs the bootstrap tool with the CLI arguments from earlier.
So when we call ./x <subcommand>, it’s actually invoking the bootstrap CLI with the subcommand.
Setting up Zed for Compiler Development
I’ve started using Zed for all of my personal work and I’ve been liking it a lot.
./x setup has an option to add settings to configure VSCode, but not for Zed. This is important for rust_analyzer to work correctly during compiler development. The configuration for rust_analyzer is not very different from VSCode, but needs to be added in a .zed/settings.json in the root of the repo:
Making the change
The change involved a couple of steps:
- Adding a new field std_features in the config.toml file
- Parsing the (deduplicated) values and updating the internal rust features config that is passed on to the compiler bootstrap process
- Unit tests
How does bootstrap read the config.toml file?
Bootstrap uses toml-rs to parse the config.toml. Since configuration can come from both the toml file and command line arguments, with one overriding the other, the TomlConfig struct implements a Merge trait:
The config fields are not decoded into one flat structure; each section is decoded into its own struct. In my case, I only needed to add a field in the Rust struct, which corresponds to the [rust] section in the config.toml.
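To make the merging idea concrete, here is a minimal sketch of what a merge over one config section could look like. This is not the actual bootstrap code: the RustSection struct, its fields, and the "already-set values win" rule are simplified assumptions of mine for illustration.

```rust
/// Hypothetical, simplified stand-in for the `[rust]` section of config.toml.
#[derive(Debug, Default, Clone, PartialEq)]
struct RustSection {
    debug: Option<bool>,
    std_features: Option<Vec<String>>,
}

/// Hypothetical merge trait: values already set on `self` win, and
/// `other` only fills in fields that are still unset.
trait Merge {
    fn merge(&mut self, other: Self);
}

impl Merge for RustSection {
    fn merge(&mut self, other: Self) {
        if self.debug.is_none() {
            self.debug = other.debug;
        }
        if self.std_features.is_none() {
            self.std_features = other.std_features;
        }
    }
}

/// Builds a higher-priority config and merges a lower-priority one into it.
fn merged_example() -> RustSection {
    let mut high_priority = RustSection {
        debug: Some(true),
        std_features: None,
    };
    let low_priority = RustSection {
        debug: Some(false),
        std_features: Some(vec!["backtrace".to_string()]),
    };
    high_priority.merge(low_priority);
    high_priority
}

fn main() {
    // `debug` keeps the high-priority value; `std_features` is filled in.
    println!("{:?}", merged_example());
}
```

Wrapping every field in an Option is what makes this work: None means "not specified", so a lower-priority source can fill it in without clobbering an explicit value.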
Deduplicating std_features
The previous step handled merging, but since we are dealing with a list of values, there also needed to be some way to deduplicate it, because the internal features representation is built from that list.
I reached for the handy HashSet collection, but from the review I understood that the order of std_features could be important, and the current implementation works with this assumption. HashSet does not have a deterministic iteration order, so I could not use it: it is backed by a hash table that computes a hash for every element and places it in a “bucket”, and the bucket has no connection to the insertion order or to any other stable order.
So I implemented it with a BTreeSet instead, as suggested in the initial review. A BTreeSet does not remember insertion order either, but it always iterates in sorted order, so the result is the same on every run.
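Here is a small sketch of the deduplication idea using only the standard library. The helper names dedup_features and features_string are hypothetical and not taken from bootstrap; the feature names are the ones mentioned earlier in this post.

```rust
use std::collections::BTreeSet;

/// Collects feature names into a set, dropping duplicates.
/// A `BTreeSet` iterates in sorted order, so the output is stable across runs.
fn dedup_features(raw: &[&str]) -> BTreeSet<String> {
    raw.iter().map(|s| s.to_string()).collect()
}

/// Joins the features into one space-separated string (hypothetical helper).
fn features_string(features: &BTreeSet<String>) -> String {
    features
        .iter()
        .map(String::as_str)
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    // "panic-unwind" appears twice in the input but only once in the set.
    let raw = ["panic-unwind", "backtrace", "panic-unwind"];
    let features = dedup_features(&raw);
    println!("{}", features_string(&features)); // prints "backtrace panic-unwind"
}
```

Note that the output is sorted rather than in the order the values were written. If true insertion order mattered, a Vec with a manual "already seen" check would be one alternative.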
Error Handling in CI Builds
The toml parser handles parsing errors by default, so I didn’t have to do anything there. Since the std_features option can fundamentally change the output of the compiler, we need to throw an error if this config gets passed in CI builds. I really liked the err macro that they had set up for making this easy.
I usually stay away from macros (skill issue tbh), but I liked this one. I’m still contemplating if this could have been a function, but I digress.
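For readers who have not seen this pattern, here is a rough sketch of how a small local macro can cut down repetitive field checks. Everything here is an assumption for illustration: the RustConfig struct, the check_ci_compatible function, the error message, and the exact rule (fail when a field is set locally and differs from the CI value) are mine, and the real bootstrap macro may well behave differently.

```rust
/// Hypothetical, simplified config with just the fields needed here.
#[derive(Debug, Default)]
struct RustConfig {
    std_features: Option<Vec<String>>,
    debug_assertions: Option<bool>,
}

/// Compares the local config against the config used for the CI-built compiler.
fn check_ci_compatible(current: &RustConfig, ci: &RustConfig) -> Result<(), String> {
    // Hypothetical macro: for each listed field, return an error if the
    // field is set locally and its value differs from the CI config.
    macro_rules! err {
        ($($field:ident),* $(,)?) => {
            $(
                if current.$field.is_some() && current.$field != ci.$field {
                    return Err(format!(
                        "`rust.{}` differs from the CI build and cannot be used with it",
                        stringify!($field)
                    ));
                }
            )*
        };
    }

    err!(std_features, debug_assertions);
    Ok(())
}

fn main() {
    let ci = RustConfig::default();
    let local = RustConfig {
        std_features: Some(vec!["backtrace".to_string()]),
        ..Default::default()
    };
    match check_ci_compatible(&local, &ci) {
        Ok(()) => println!("config is compatible"),
        Err(e) => println!("error: {e}"),
    }
}
```

A plain function could do the comparison too, but it could not turn the field name into the error text with stringify! or return early from the caller, which is probably why a macro is convenient here.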
Merging the change - Squashing commits
I addressed all the feedback comments, and the PR was ready for merge. The rust project follows a rebase + squash workflow for contributors. I have had my own reservations about rebase in the past:
I learnt that most of my pain with rebase was probably due to not knowing about rerere, which solves a lot of the issues with repeated merge conflicts during a rebase.
However, squashing was new to me and the first time around I botched the squash and messed up the branch. Since it was also a rebase and I pushed upstream, there was no way to undo it (maybe there’s a way, but I gave up pretty quickly).
So I made a new branch, rebased it onto upstream, cherry-picked my changes from the previous branch, and then squashed all of them into a single commit after following through this article.
Finally, a couple of weeks after picking up my first issue, I got my first PR merged - https://github.com/rust-lang/rust/pull/131315. This might all sound trivial, but hey - a couple of years ago I would not have imagined being able to contribute to the rust project, so it’s good to count the small wins! https://thanks.rust-lang.org/rust/1.83.0/ now lists me as a contributor 😄
Looking forward to contributing, learning and sharing a lot more about rustc.
See you in the next one 👋