Project 18: ELF Linker and Loader

Project 18: ELF Linker and Loader

Build a tiny static linker for a constrained subset of ELF64, so “undefined reference” stops being mysterious and relocation becomes something you can explain byte-for-byte.

Quick Reference

Attribute Value
Difficulty Expert
Time Estimate 2-3 weeks
Language C (Alternatives: Rust)
Prerequisites Comfortable with binary parsing, readelf/objdump, basic x86-64 asm
Key Topics ELF sections/symbols, symbol resolution, relocation, layout, entry points
CS:APP Chapters 7 (plus 1 for toolchain context)

1. Learning Objectives

By completing this project, you will:

  1. Parse ELF64 relocatable objects: headers, section tables, symbols, and relocations
  2. Implement symbol resolution across multiple .o inputs (strong/weak/common rules)
  3. Implement a small set of x86-64 relocation types end-to-end
  4. Produce a valid output artifact you can inspect with readelf and execute (in milestone form)

2. Project Specification

2.1 What You Will Build

A CLI called myld:

  • Input: multiple ELF64 relocatable objects (.o)
  • Output (Milestone A): a single merged relocatable (combined.o) with resolved symbols and applied relocations where possible
  • Output (Milestone B, optional): a runnable ELF64 executable for a very constrained program model (no shared libs; a tiny _start)

2.2 Functional Requirements

  1. Read and validate ELF64 input objects (endianness, class, machine = x86-64)
  2. Merge like sections (.text, .rodata, .data, .bss) with alignment preserved
  3. Build a global symbol table; report undefined symbols with actionable diagnostics
  4. Apply relocations for a small supported set (start with absolute relocations)
  5. Emit a new ELF file with correct section headers and string tables

2.3 Example Output

$ ./myld -o combined.o main.o util.o
inputs=2 merged_sections=.text,.rodata,.data,.bss
symbols: defined=37 undefined=0
relocations: applied=12 skipped=0
wrote combined.o

3. Solution Architecture

3.1 High-Level Design

        +-----------+     +-------------------+     +-----------------+
.o ...->| ELF Reader | --> | Link State/Model  | --> | ELF Writer      |--> combined.o / a.out
        +-----------+     | (sections/symbols) |     +-----------------+
                           +-------------------+
                                   |
                                   v
                           +-------------------+
                           | Relocation Engine |
                           +-------------------+

3.2 Key Implementation Decisions

  • Keep your internal model independent from ELF on-disk structs (avoid “parse once, mutate raw bytes”).
  • Start relocations with easy ones first (e.g., R_X86_64_64), then add PC-relative.
  • Make diagnostics a first-class feature: show which object referenced which missing symbol.

4. Phased Implementation

  1. Parse + dump: replicate a subset of readelf -S -s -r programmatically
  2. Merge sections: layout .text/.rodata/.data/.bss with alignment
  3. Resolve symbols: undefined/defined, local/global, name collisions
  4. Apply relocations: patch addends/addresses into the merged section bytes
  5. Write output ELF: correct .shstrtab, .strtab, .symtab
  6. (Optional) Executable: create program headers + entry _start for your minimal model

5. Testing Strategy

  • Golden-file tests: compare your “dump” output against readelf for known .o samples.
  • Differential checks: readelf -a combined.o, objdump -drwC combined.o, and (optional) run the produced executable.
  • Negative tests: duplicate strong symbols, unresolved externals, unsupported relocation types.

6. Extensions

  • Add support for .rela.* for more sections and more relocation types (R_X86_64_PC32).
  • Emit a link map (--map) and a symbol provenance report (--trace-symbol foo).
  • Implement a minimal dynamic loader experiment (read PT_DYNAMIC + relocations) as a follow-on.

7. Reference Reading

  • CS:APP 3e — Ch. 7 (Linking)
  • John R. Levine — Linkers and Loaders