Writing a Linker Map Parser - Linkerland Devlog 001
Hello! I’m building linkerland - a tool to parse and visualize linker map files. This is a devlog series where I’ll be sharing my progress and learnings, inspired by Mitchell Hashimoto’s Devlogs on Ghostty.
What is a .map file?
A linker map file contains information about the memory layout of the program. It is effectively a map from memory address offsets to the variables and functions in a program.
A map file typically has these sections:
- Object files referenced in a program during linking
- Sections in the ELF / Mach-O / COFF Object File formats.
- The symbol table, listing all the symbols in the program.
Here’s a trimmed-down example of a map file, generated by Clang / LD for a Rust program on macOS:
The map file (generated by LLD) is divided into blocks, each block starting with a # followed by the name. The # Symbols block lists all the symbols in the program, along with their addresses and sizes. I’m calling it a “block” so as to not conflate it with “sections” in the ELF file format.
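To make the block idea concrete, here is a rough, std-only Rust sketch that splits a map file into blocks. Everything in it is an assumption made for illustration and not the linkerland implementation: the Block struct, the split_blocks function, the rule that a block header is a line starting with "# " that also contains a colon, and the SAMPLE text, which is invented rather than real linker output.

```rust
/// Invented sample text for illustration only; not real linker output.
const SAMPLE: &str = "\
# Path: target/debug/demo
# Arch: arm64
# Object files:
[  0] linker synthesized
[  1] demo.o
# Symbols:
# Address\tSize\tFile\tName
0x100000F30\t0x00000030\t[  1] _main
";

/// One "block" of a map file: the header name plus the raw lines under it.
#[derive(Debug, PartialEq)]
struct Block<'a> {
    /// Header name without the leading "# " and trailing colon, e.g. "Arch".
    name: &'a str,
    /// Text after the colon on the header line, e.g. "arm64" (may be empty).
    inline_value: &'a str,
    /// Every following line up to the next header.
    lines: Vec<&'a str>,
}

/// Assumption: a header starts with "# " and contains a ':'.
/// Column-title comments such as "# Address  Size ..." have no colon,
/// so they stay inside the current block.
fn split_blocks(input: &str) -> Vec<Block<'_>> {
    let mut blocks: Vec<Block<'_>> = Vec::new();
    for line in input.lines() {
        let header = line
            .strip_prefix("# ")
            .and_then(|rest| rest.split_once(':'));
        match header {
            Some((name, value)) => blocks.push(Block {
                name: name.trim(),
                inline_value: value.trim(),
                lines: Vec::new(),
            }),
            None => {
                // Lines before the first header are ignored in this sketch.
                if let Some(current) = blocks.last_mut() {
                    current.lines.push(line);
                }
            }
        }
    }
    blocks
}
```

Calling split_blocks(SAMPLE) yields four blocks (Path, Arch, Object files, Symbols). The colon heuristic is deliberately naive; a real parser would probably match the known block names explicitly.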
If you’ve read my previous blog on Resolving Rust Symbols, these terms might sound all too familiar. It’s okay if you haven’t read it; I talk about these terms there, so now is probably a good time to check it out!
Why am I building this?
I wanted to parse the map files and take a peek into what the linker has done. The symbol tables from .o files give us a view of the program at a file level. The map file could give us a bird’s eye view of the entire program. (At least that’s what I think 🤔)
I looked at MapFileViewer and it seems like a really good tool, but it’s Windows specific. I wanted to build something similar for macOS and Linux. So here we are!
I’ve read that firmware engineers working in embedded systems use map files for debugging. I don’t have a lot of experience with embedded systems, but maybe this tool could be useful for them too.
Writing a parser
I started by writing a parser crate for the map files. The parser reads the map file and gives us a structured representation of the file. I haven’t yet decided on the output format; it could be an AST with a JSON representation or just simple structs.
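As a rough idea of what the "simple structs" option could look like, here is a minimal sketch. The type names, field names, and the size_of_file helper are all hypothetical and chosen for illustration; the crate’s actual output format is still undecided. The sketch also assumes that each symbol refers to its object file by index.

```rust
/// Hypothetical top-level result of parsing one map file.
#[derive(Debug, PartialEq)]
struct MapFile {
    path: String,
    arch: String,
    object_files: Vec<ObjectFile>,
    sections: Vec<Section>,
    symbols: Vec<Symbol>,
}

/// An object file that took part in the link (assumed to carry an index).
#[derive(Debug, PartialEq)]
struct ObjectFile {
    index: usize,
    path: String,
}

/// A section with its start address and size in bytes.
#[derive(Debug, PartialEq)]
struct Section {
    address: u64,
    size: u64,
    name: String,
}

/// A symbol with its address, size, and the index of its object file.
#[derive(Debug, PartialEq)]
struct Symbol {
    address: u64,
    size: u64,
    file_index: usize,
    name: String,
}

impl MapFile {
    /// Total bytes attributed to symbols coming from one object file.
    fn size_of_file(&self, file_index: usize) -> u64 {
        self.symbols
            .iter()
            .filter(|symbol| symbol.file_index == file_index)
            .map(|symbol| symbol.size)
            .sum()
    }
}
```

Plain structs like these keep the parser output easy to query (for example, summing symbol sizes per object file), and a JSON representation could be layered on top later.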
As of now, I’ve started implementing the parser for macOS, and eventually will support Linux.
Parser Generators vs Combinators
The simplest way to parse a map file is to use regular expressions. The popular mapfile_parser crate uses this approach, but I wanted to avoid regex matching.
I wanted to parse the file in a more structured way, and it also felt like using regex could take a performance hit (but I didn’t really measure it tbh 🤷♂️).
There are a couple of options:
- Parser Generators - Generate the parser code from a grammar (yacc, bison, antlr, etc.)
- Parser Combinators - Write the parser code directly as high-level functions
I liked parser combinators because they are fairly easy to reason about and write. Functions written in parser combinators are composable and can be reused, and are also easy to test.
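To show what “composable” means here, below is a toy, std-only sketch of the combinator idea. It is not winnow’s or nom’s API: the literal, till_eol, and preceded functions are hand-written stand-ins, and the arch and path parsers built from them are illustrative names. A parser in this sketch is just a function from the input to Some((output, rest)) or None.

```rust
/// Matches an exact prefix and returns it together with the remaining input.
fn literal<'a>(expected: &'static str) -> impl Fn(&'a str) -> Option<(&'a str, &'a str)> {
    move |input| {
        input
            .strip_prefix(expected)
            .map(|rest| (&input[..expected.len()], rest))
    }
}

/// Takes everything up to (not including) the next '\n' or '\r'.
fn till_eol(input: &str) -> Option<(&str, &str)> {
    let end = input
        .find(|c: char| c == '\n' || c == '\r')
        .unwrap_or(input.len());
    Some((&input[..end], &input[end..]))
}

/// Runs `first`, discards its output, then runs `second` on what is left.
fn preceded<'a, A, B>(
    first: impl Fn(&'a str) -> Option<(A, &'a str)>,
    second: impl Fn(&'a str) -> Option<(B, &'a str)>,
) -> impl Fn(&'a str) -> Option<(B, &'a str)> {
    move |input| {
        let (_, rest) = first(input)?;
        second(rest)
    }
}

/// Two parsers built from the same pieces; only the prefix differs.
fn arch(input: &str) -> Option<(&str, &str)> {
    preceded(literal("# Arch: "), till_eol)(input)
}

fn path(input: &str) -> Option<(&str, &str)> {
    preceded(literal("# Path: "), till_eol)(input)
}
```

Each piece can be tested on its own, and new parsers are mostly a matter of combining existing ones, which is the property that made combinators appealing for this project.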
I also learnt that tree-sitter is a parser generator, and that it powers the syntax highlighting of many modern code editors.
I’ve worked with parser generators in the past: during a work hackathon, I built a parser for KQL that needed to work in the browser, using nearley. It was fun, but it wasn’t particularly easy to test and never saw the light of day 😅
At least for my use case, parser combinators seemed like a good-enough choice, considering that map files, unlike programming languages, don’t follow a specification.
Winnow vs Nom
TLDR: I chose winnow over nom because it’s actively maintained and faster according to benchmarks.
Using Winnow
Let’s take a snippet from the parser crate to look at how we write it using winnow.
Specifically, let’s say we want to parse just the architecture from the map file. The architecture is usually the first line in the map file, and it looks something like this.
The arch function takes an input string and returns the architecture. Every parser function in winnow should ideally return a PResult, which I think is short for “parser result” and can be either Ok(O), where O is the parsed value, or Err(ErrMode<E>).
Then we use the preceded combinator to parse the text. The preceded combinator takes two parsers as arguments, runs them sequentially, ignores the output of the first, and returns the output of the second.
In our case, the first parser is literal("# Arch: "), which matches the literal string # Arch: at the start of the line. The second parser first skips zero or more whitespace characters with multispace0 and then uses till_line_ending, which matches everything up to the end of the line.
Then we call parse_next on the parser to parse the input string. The parse_next function is a trait method that is implemented for all winnow::Parser types.
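To make the flow above easier to follow without pulling in the crate, here is a std-only stand-in that mirrors the three steps (literal prefix, optional whitespace, rest of the line). It is not winnow code: the ArchError type is a hypothetical replacement for winnow’s error types, and taking the input as &mut &str so that a successful parse advances it is an assumption modelled on how I understand parse_next to behave.

```rust
/// Why the stand-in parser failed; a hypothetical substitute for winnow's errors.
#[derive(Debug, PartialEq)]
enum ArchError {
    MissingPrefix,
}

/// Std-only stand-in for the `arch` parser described above.
/// On success the consumed text is removed from `input`, so the next parser
/// can continue from where this one stopped. On failure `input` is untouched.
fn arch<'s>(input: &mut &'s str) -> Result<&'s str, ArchError> {
    // Copy the inner reference out so we can slice it freely.
    let full: &'s str = *input;

    // Step 1: require the literal prefix and discard it.
    let rest = full
        .strip_prefix("# Arch: ")
        .ok_or(ArchError::MissingPrefix)?;

    // Step 2: skip zero or more whitespace characters.
    let rest = rest.trim_start_matches(|c: char| matches!(c, ' ' | '\t' | '\r' | '\n'));

    // Step 3: take everything up to, but not including, the line ending.
    let end = rest
        .find(|c: char| c == '\n' || c == '\r')
        .unwrap_or(rest.len());
    let (value, remaining) = rest.split_at(end);

    // Advance the caller's input past what was consumed.
    *input = remaining;
    Ok(value)
}
```

With the input "# Arch: arm64\n# Object files:\n", this returns Ok("arm64") and leaves the input pointing at "\n# Object files:\n". Leaving the input untouched on failure is a choice made for this sketch.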
Okay! Let’s see how easy it is to test our arch parser.
In a similar fashion, I’ve written parsers for other blocks in the map file like sections, symbols, path, etc. The bigger challenge is to make this work cross-platform, but that’s a problem for another day 😅
Next Steps
- Finish up the parser for macOS
- Add output formats for the parsed data
- Think about how to visualize it
- Write the Linux implementation
See you in the next devlog! 👋