Writing a Linker Map Parser - Linkerland Devlog 001 • Shriram Balaji's Blog

Hello! I’m building linkerland - a tool to parse and visualize linker map files. This is a devlog series where I’ll be sharing my progress and learnings, inspired by Mitchell Hashimoto’s devlogs on Ghostty.

Shriram Balaji

@shrirambalaji

building "linkerland" - a tool to visualize and parse linker map files. needed this for the blog, and couldn't find any reliable tools that did this well. just wrote down what I intend to do, and might share things I learn along the way here. maybe I'll use egui + ratatui for

8:27 PM · Aug 21, 2024

What is a `.map` file?

A linker map file describes the memory layout of a linked program. It is effectively a map from memory address offsets to the variables and functions in the program.

A map file typically has these sections:

Object files referenced by the program during linking
Sections in the ELF / Mach-O / COFF object file formats
The symbol table, listing all the symbols in the program

Here’s a trimmed-down example of a map file, generated by Clang / LD for a Rust program on macOS:

# Path: /target/debug/deps/sample-app
# Arch: arm64
# Object files:
[ 66] /Library/Developer/CommandLineTools/SDKs/MacOSX14.4.sdk/usr/lib/system/libunwind.tbd
# Sections:
# Address  Size      Segment  Section
0x1000007DC  0x00036FC4  __TEXT  __text
# Symbols:
# Address  Size      File  Name
0x10004C058  0x00000018  [  1] __ZN3std3sys3pal4unix17thread_local_dtor13register_dtor5DTORS17hf7230a0b661819a4E

The map file above is divided into blocks, each starting with a # followed by the block’s name. The # Symbols block lists all the symbols in the program, along with their addresses and sizes. I’m calling it a “block” so as not to conflate it with “sections” in the ELF file format.
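To make the block structure concrete, here is a minimal sketch in plain Rust (standard library only, no parser crate) that groups the data lines of a map file under their block headers. The `Block` type and `split_blocks` function are hypothetical names made up for this illustration, not part of linkerland. It assumes a block header is a line that starts with "# " and ends with ":", and it simply skips the other # lines (such as # Arch: arm64 and the column headers).

```rust
/// One block of the map file: its name and the data lines under it.
/// (Hypothetical type, used only for this sketch.)
#[derive(Debug, PartialEq)]
pub struct Block<'a> {
    pub name: &'a str,
    pub lines: Vec<&'a str>,
}

/// Groups the lines of a map file under their "# Name:" headers.
pub fn split_blocks(map: &str) -> Vec<Block<'_>> {
    let mut blocks: Vec<Block<'_>> = Vec::new();
    for line in map.lines() {
        match line.strip_prefix("# ") {
            // A line like "# Symbols:" starts a new block.
            Some(rest) if rest.ends_with(':') => blocks.push(Block {
                name: rest.trim_end_matches(':'),
                lines: Vec::new(),
            }),
            // Other '#' lines ("# Arch: arm64", column headers) are skipped here.
            Some(_) => {}
            // A data line belongs to the block currently being read.
            None => {
                if let Some(block) = blocks.last_mut() {
                    if !line.trim().is_empty() {
                        block.lines.push(line);
                    }
                }
            }
        }
    }
    blocks
}
```

Run on the trimmed example above, this would produce three blocks named Object files, Sections and Symbols, each holding one data line.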

Tip

The map file format varies between different linkers. For comparison, here’s what the map file generated by GCC’s linker (ld) looks like:

Archive member included to satisfy reference by file (symbol)

target/out/libbar.a(libbar.bar.353231859863985b-cgu.0.rcgu.o)
                              /path/target/debug/deps/learning_linkers-b617d8cd69d8a0d7.cpwahgvjsr57gus97bejfuwnj.rcgu.o (bar)

Merging program properties

Removed property 0xc0000000 to merge /tmp/rustcOZG9Nq/symbols.o (0x3) and /nix/store/p852ydpr8zlq0szh5fpvvbzzjaq2ydp5-glibc-2.39-52/lib/Scrt1.o (not found)

As-needed library included to satisfy reference by file (symbol)

libc.so.6                     target/out/libfoo.a(std-da896425a938a71e.std.a0dabd9cb6c0976d-cgu.0.rcgu.o) (nanosleep@@GLIBC_2.17)

Discarded input sections

 .rodata.cst4   0x0000000000000000        0x4 /nix/store/p852ydpr8zlq0szh5fpvvbzzjaq2ydp5-glibc-2.39-52/lib/Scrt1.o
 .data          0x0000000000000000        0x4 /nix/store/p852ydpr8zlq0szh5fpvvbzzjaq2ydp5-glibc-2.39-52/lib/Scrt1.o
 .data.rel.ro..Lanon.3a4e54b40bc4949ecbf2c1e94809000e.378
                0x0000000000000000       0x10 target/out/libfoo.a(core-077a73c34c19ca9c.core.9ed7ba0fc6c6436d-cgu.0.rcgu.o)
 .rodata..Lanon.3a4e54b40bc4949ecbf2c1e94809000e.394
                0x0000000000000000        0x2 target/out/libfoo.a(core-077a73c34c19ca9c.core.9ed7ba0fc6c6436d-cgu.0.rcgu.o)

Memory Configuration

Name             Origin             Length             Attributes
*default*        0x0000000000000000 0xffffffffffffffff

Linker script and memory map

LOAD /nix/store/p852ydpr8zlq0szh5fpvvbzzjaq2ydp5-glibc-2.39-52/lib/Scrt1.o
LOAD /nix/store/p852ydpr8zlq0szh5fpvvbzzjaq2ydp5-glibc-2.39-52/lib/crti.o
LOAD /nix/store/m1x0zcvlj5jvgzbxzl8n53qjr5kbfb0y-gcc-13.2.0/lib/gcc/aarch64-unknown-linux-gnu/13.2.0/crtbeginS.o
LOAD /tmp/rustcOZG9Nq/symbols.o
LOAD /path/target/debug/deps/learning_linkers-b617d8cd69d8a0d7.13glgoxqfrapa4422kjiylhjn.rcgu.o
LOAD /path/target/debug/deps/learning_linkers-b617d8cd69d8a0d7.5ltaynf8hmx71k4xzabglzwzv.rcgu.o
LOAD /path/target/debug/deps/learning_linkers-b617d8cd69d8a0d7.a66fw6re8gsowyur9ec2r6vhh.rcgu.o
LOAD /path/target/debug/deps/learning_linkers-b617d8cd69d8a0d7.cpwahgvjsr57gus97bejfuwnj.rcgu.o
LOAD /path/target/debug/deps/learning_linkers-b617d8cd69d8a0d7.djv3k11is87wsaj7dpfbgrbha.rcgu.o
LOAD /path/target/debug/deps/learning_linkers-b617d8cd69d8a0d7.ey1yr5g74cciy6i8m9mvw3cfs.rcgu.o
LOAD /path/target/debug/deps/learning_linkers-b617d8cd69d8a0d7.24vwyyh6wgrsmswrip7k1j013.rcgu.o
LOAD target/out/libfoo.a
LOAD target/out/libbar.a

                [!provide]                        PROVIDE (__executable_start = SEGMENT_START ("text-segment", 0x0))
                0x0000000000000270                . = (SEGMENT_START ("text-segment", 0x0) + SIZEOF_HEADERS)

.interp         0x0000000000000270       0x54
 *(.interp)
 .interp        0x0000000000000270       0x54 /nix/store/p852ydpr8zlq0szh5fpvvbzzjaq2ydp5-glibc-2.39-52/lib/Scrt1.o

.note.ABI-tag   0x00000000000002c4       0x20
 .note.ABI-tag  0x00000000000002c4       0x20 /nix/store/p852ydpr8zlq0szh5fpvvbzzjaq2ydp5-glibc-2.39-52/lib/Scrt1.o

.note.gnu.build-id
 *(.note.gnu.build-id)

.hash           0x00000000000002e8      0x1ac
 *(.hash)
 .hash          0x00000000000002e8      0x1ac /nix/store/p852ydpr8zlq0szh5fpvvbzzjaq2ydp5-glibc-2.39-52/lib/Scrt1.o

.gnu.hash       0x0000000000000498       0x1c
 *(.gnu.hash)
 .gnu.hash      0x0000000000000498       0x1c /nix/store/p852ydpr8zlq0szh5fpvvbzzjaq2ydp5-glibc-2.39-52/lib/Scrt1.o

.dynsym         0x00000000000004b8      0x660
 *(.dynsym)
 .dynsym        0x00000000000004b8      0x660 /nix/store/p852ydpr8zlq0szh5fpvvbzzjaq2ydp5-glibc-2.39-52/lib/Scrt1.o

.dynstr         0x0000000000000b18      0x447
 *(.dynstr)
 .dynstr        0x0000000000000b18      0x447 /nix/store/p852ydpr8zlq0szh5fpvvbzzjaq2ydp5-glibc-2.39-52/lib/Scrt1.o

.gnu.version    0x0000000000000f60       0x88
 *(.gnu.version)
 .gnu.version   0x0000000000000f60       0x88 /nix/store/p852ydpr8zlq0szh5fpvvbzzjaq2ydp5-glibc-2.39-52/lib/Scrt1.o

.gnu.version_d  0x0000000000000fe8        0x0
 *(.gnu.version_d)
 .gnu.version_d
                0x0000000000000fe8        0x0 /nix/store/p852ydpr8zlq0szh5fpvvbzzjaq2ydp5-glibc-2.39-52/lib/Scrt1.o

.gnu.version_r  0x0000000000000fe8       0xb0
 *(.gnu.version_r)
 .gnu.version_r
                0x0000000000000fe8       0xb0 /nix/store/p852ydpr8zlq0szh5fpvvbzzjaq2ydp5-glibc-2.39-52/lib/Scrt1.o

.rela.dyn       0x0000000000001098     0x2f10

If you’ve read my previous blog on Resolving Rust Symbols, these terms might sound all too familiar. It’s okay if you haven’t read it. I talk about these terms there, so now is probably a good time to check it out!

Why am I building this?

I wanted to parse the map files and take a peek into what the linker has done. The symbol tables from .o files give us a view of the program at a file level. The map file could give us a bird’s eye view of the entire program. (At least that’s what I think 🤔)

I looked at MapFileViewer and it seems like a really good tool, but it’s Windows specific. I wanted to build something similar for macOS and Linux. So here we are!

I’ve read that firmware engineers working on embedded systems use map files for debugging. I don’t have a lot of experience with embedded systems, but maybe this tool could be useful for them too.

Writing a parser

I started by writing a parser crate for the map files. The parser reads the map file and gives us a structured representation of it. I haven’t yet decided on the output format. It could be an AST with a JSON representation, or just simple structs.
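As a rough idea of what the “simple structs” option could look like, here is a sketch for a single line of the # Symbols block from the macOS example earlier. The `Symbol` struct, its field names and `parse_symbol_line` are all assumptions for illustration. This is hand-written standard-library Rust, not the parser crate’s actual output format.

```rust
/// A structured view of one "# Symbols" line. (Hypothetical type.)
#[derive(Debug, PartialEq)]
pub struct Symbol<'a> {
    pub address: u64,
    pub size: u64,
    /// Index into the "# Object files" block, e.g. 1 for "[  1]".
    pub file_index: u32,
    pub name: &'a str,
}

/// Parses a hex token such as "0x00000018".
fn parse_hex(token: &str) -> Option<u64> {
    u64::from_str_radix(token.strip_prefix("0x")?, 16).ok()
}

/// Parses a line like "0x10004C058  0x00000018  [  1] _name".
/// Returns None if the line does not have that shape.
pub fn parse_symbol_line(line: &str) -> Option<Symbol<'_>> {
    // Address and size are the first two whitespace-separated tokens.
    let (address, rest) = line.trim_start().split_once(char::is_whitespace)?;
    let (size, rest) = rest.trim_start().split_once(char::is_whitespace)?;
    // The object-file index is written in brackets, padded with spaces.
    let (index, name) = rest.trim_start().strip_prefix('[')?.split_once(']')?;
    Some(Symbol {
        address: parse_hex(address)?,
        size: parse_hex(size)?,
        file_index: index.trim().parse().ok()?,
        name: name.trim(),
    })
}
```

Borrowing the name as a &str keeps the sketch allocation-free. An owned String would be the easier choice if the parsed data has to outlive the file contents.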

As of now, I’ve started implementing the parser for macOS, and eventually will support Linux.

Parser Generators vs Combinators

The simplest way to parse a map file is to use regular expressions. The popular mapfile_parser crate uses this approach, but I wanted to avoid regex matching.

I wanted to parse the file in a more structured way, and it also felt like using regex could cost some performance (but I didn’t really measure it tbh 🤷‍♂️).

There are a couple of options:

Parser Generators - Generates the parser code based on a grammar (yacc, bison, antlr, etc.)
Parser Combinators - Write the parser code directly as high-level functions

I liked parser combinators because they are fairly easy to reason about and write. Functions written in parser combinators are composable and can be reused, and are also easy to test.
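To show what “composable” means here, below is a toy combinator setup written from scratch in plain Rust. It is only a sketch of the idea, not how any real combinator library is implemented. In this sketch a parser is just a function from the input to an optional (remaining input, value) pair, and `ParseResult`, `literal`, `till_line_ending` and `preceded` are hand-rolled stand-ins defined right here.

```rust
/// On success, a parser returns the remaining input plus the parsed value.
type ParseResult<'a, O> = Option<(&'a str, O)>;

/// Builds a parser that matches an exact prefix.
fn literal<'a>(expected: &'static str) -> impl Fn(&'a str) -> ParseResult<'a, &'a str> {
    move |input| {
        input
            .strip_prefix(expected)
            .map(|rest| (rest, &input[..expected.len()]))
    }
}

/// Builds a parser that takes everything up to (not including) the line ending.
fn till_line_ending<'a>() -> impl Fn(&'a str) -> ParseResult<'a, &'a str> {
    |input| {
        let end = input
            .find(|c: char| c == '\n' || c == '\r')
            .unwrap_or(input.len());
        Some((&input[end..], &input[..end]))
    }
}

/// Combinator: runs `first`, discards its output, then runs `second`.
fn preceded<'a, A, B>(
    first: impl Fn(&'a str) -> ParseResult<'a, A>,
    second: impl Fn(&'a str) -> ParseResult<'a, B>,
) -> impl Fn(&'a str) -> ParseResult<'a, B> {
    move |input| {
        let (rest, _) = first(input)?;
        second(rest)
    }
}

/// A parser for lines like "# Arch: arm64", built from the pieces above.
fn arch(input: &str) -> Option<&str> {
    preceded(literal("# Arch: "), till_line_ending())(input).map(|(_, value)| value)
}
```

Each piece is an ordinary function, so it can be unit tested on its own and reused to build parsers for the other lines of the file.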

I also learnt that tree-sitter is a parser generator, and powers the syntax highlighting of many modern code editors.

I’ve worked with parser generators in the past: during a work hackathon I built a parser for KQL using nearley, which needed to work in the browser. It was fun, but wasn’t particularly easy to test and never saw the light of day 😅

At least for my use case, parser combinators seemed like a good-enough choice, considering that map files, unlike programming languages, don’t follow a specification.

Winnow vs Nom

TLDR: I chose winnow over nom because it’s actively maintained and faster according to benchmarks.

Using Winnow

Let’s take a snippet from the parser crate to look at how we write it using winnow.

Specifically, let’s say we want to parse just the architecture from the map file. The architecture appears near the top of the map file (right after the path in the example above), and it looks something like this.

# Arch: arm64

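use winnow::{
    ascii::{multispace0, till_line_ending},
    combinator::preceded,
    token::literal,
    PResult, Parser,
};

// Parses a line like "# Arch: arm64" and returns the architecture name.
fn arch<'i>(input: &mut &'i str) -> PResult<&'i str> {
    preceded(
        // Match the fixed prefix and discard it.
        literal("# Arch: "),
        // Skip any extra whitespace, then take the rest of the line.
        preceded(multispace0, till_line_ending),
    )
    .parse_next(input)
}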

The arch function takes an input string and returns the architecture. Every parser function in winnow should ideally return a PResult, which I think is short for “parser result”. It is either Ok(O), where O is the parsed value, or Err(ErrMode<E>).

Then we use the preceded combinator to parse the text. The preceded combinator takes in two parsers as arguments, runs them sequentially, ignores the output of the first and returns the output of the second parser.

In our case, the first parser is literal("# Arch: "), which matches the literal string # Arch: . The second parser is itself a preceded: multispace0 skips zero or more whitespace characters, and till_line_ending then matches everything up to the end of the line.

Then we call parse_next on the parser to parse the input string. parse_next is a method of the winnow::Parser trait, which all winnow parsers implement.
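One detail worth spelling out is the &mut &str argument: a parser takes a mutable reference to the input slice so that it can advance it past whatever it consumed. As a rough mental model, here is a hand-written equivalent of arch using only the standard library. `arch_by_hand` is a hypothetical name, and this is an approximation of the behaviour rather than what winnow does internally (for instance, trim_start also skips non-ASCII whitespace).

```rust
/// Hand-written approximation of the `arch` parser.
/// On success it returns the architecture and advances `input`
/// to just after the consumed text.
fn arch_by_hand<'i>(input: &mut &'i str) -> Option<&'i str> {
    // Match and drop the fixed prefix (the `literal` step).
    let rest = input.strip_prefix("# Arch: ")?;
    // Skip leading whitespace (roughly the `multispace0` step).
    let rest = rest.trim_start();
    // Take everything up to the line ending (the `till_line_ending` step).
    let end = rest
        .find(|c: char| c == '\n' || c == '\r')
        .unwrap_or(rest.len());
    let (value, remaining) = rest.split_at(end);
    // Advance the caller's view of the input.
    *input = remaining;
    Some(value)
}
```

After a successful call the caller’s slice starts at the line ending, so the next parser in a sequence picks up where this one stopped.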

Okay! Let’s see how easy it is to test our arch parser.

#[test]
fn test_arch() {
    let mut input = "# Arch: x86_64";
    let result = arch(&mut input);
    assert_eq!(result.unwrap(), "x86_64");
}

In a similar fashion, I’ve written parsers for other blocks in the map file like sections, symbols, path, etc. The bigger challenge is to make this work cross-platform, but that’s a problem for another day 😅

Next Steps

Finish up the parser for macOS
Add output formats for the parsed data
Think about how to visualize it
Write the Linux implementation

See you in the next devlog! 👋
