Project 18: ELF Linker and Loader
Project 18: ELF Linker and Loader
Build a tiny static linker for a constrained subset of ELF64, so “undefined reference” stops being mysterious and relocation becomes something you can explain byte-for-byte.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Expert |
| Time Estimate | 2-3 weeks |
| Language | C (Alternatives: Rust) |
| Prerequisites | Comfortable with binary parsing, readelf/objdump, basic x86-64 asm |
| Key Topics | ELF sections/symbols, symbol resolution, relocation, layout, entry points |
| CS:APP Chapters | 7 (plus 1 for toolchain context) |
1. Learning Objectives
By completing this project, you will:
- Parse ELF64 relocatable objects: headers, section tables, symbols, and relocations
- Implement symbol resolution across multiple
.oinputs (strong/weak/common rules) - Implement a small set of x86-64 relocation types end-to-end
- Produce a valid output artifact you can inspect with
readelfand execute (in milestone form)
2. Project Specification
2.1 What You Will Build
A CLI called myld:
- Input: multiple ELF64 relocatable objects (
.o) - Output (Milestone A): a single merged relocatable (
combined.o) with resolved symbols and applied relocations where possible - Output (Milestone B, optional): a runnable ELF64 executable for a very constrained program model (no shared libs; a tiny
_start)
2.2 Functional Requirements
- Read and validate ELF64 input objects (endianness, class, machine = x86-64)
- Merge like sections (
.text,.rodata,.data,.bss) with alignment preserved - Build a global symbol table; report undefined symbols with actionable diagnostics
- Apply relocations for a small supported set (start with absolute relocations)
- Emit a new ELF file with correct section headers and string tables
2.3 Example Output
$ ./myld -o combined.o main.o util.o
inputs=2 merged_sections=.text,.rodata,.data,.bss
symbols: defined=37 undefined=0
relocations: applied=12 skipped=0
wrote combined.o
3. Solution Architecture
3.1 High-Level Design
+-----------+ +-------------------+ +-----------------+
.o ...->| ELF Reader | --> | Link State/Model | --> | ELF Writer |--> combined.o / a.out
+-----------+ | (sections/symbols) | +-----------------+
+-------------------+
|
v
+-------------------+
| Relocation Engine |
+-------------------+
3.2 Key Implementation Decisions
- Keep your internal model independent from ELF on-disk structs (avoid “parse once, mutate raw bytes”).
- Start relocations with easy ones first (e.g.,
R_X86_64_64), then add PC-relative. - Make diagnostics a first-class feature: show which object referenced which missing symbol.
4. Phased Implementation
- Parse + dump: replicate a subset of
readelf -S -s -rprogrammatically - Merge sections: layout
.text/.rodata/.data/.bsswith alignment - Resolve symbols: undefined/defined, local/global, name collisions
- Apply relocations: patch addends/addresses into the merged section bytes
- Write output ELF: correct
.shstrtab,.strtab,.symtab - (Optional) Executable: create program headers + entry
_startfor your minimal model
5. Testing Strategy
- Golden-file tests: compare your “dump” output against
readelffor known.osamples. - Differential checks:
readelf -a combined.o,objdump -drwC combined.o, and (optional) run the produced executable. - Negative tests: duplicate strong symbols, unresolved externals, unsupported relocation types.
6. Extensions
- Add support for
.rela.*for more sections and more relocation types (R_X86_64_PC32). - Emit a link map (
--map) and a symbol provenance report (--trace-symbol foo). - Implement a minimal dynamic loader experiment (read PT_DYNAMIC + relocations) as a follow-on.
7. Reference Reading
- CS:APP 3e — Ch. 7 (Linking)
- John R. Levine — Linkers and Loaders