Project 6: Minimal Dynamic Linker (Capstone)

Build a simplified dynamic linker that loads ELF binaries, resolves symbols, and runs main().

Quick Reference

Attribute      Value
Difficulty     Advanced
Time Estimate  1 month+
Language       C (Linux)
Prerequisites  ELF parsing, virtual memory, Projects 1-2
Key Topics     mmap, relocations, GOT/PLT, symbol resolution

1. Learning Objectives

By completing this project, you will:

  1. Parse ELF headers and program headers.
  2. Load segments into memory with mmap.
  3. Resolve dynamic symbols and apply relocations.
  4. Transfer control to the program entry point.

2. Theoretical Foundation

2.1 Core Concepts

  • ELF program headers: Describe loadable segments.
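  • Relocations: Records that tell the loader which addresses to patch once the load address and symbol addresses are known; this is what lets position-independent code (PIC) run at any base.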
  • GOT/PLT: Indirection tables used for dynamic calls.

2.2 Why This Matters

Dynamic linking happens before your program even reaches main(). Building a minimal linker reveals exactly how shared libraries are made to work.

2.3 Historical Context / Background

Dynamic linkers such as ld.so evolved alongside ELF to support shared code, lazy binding, and symbol versioning. This capstone builds the smallest viable subset.

2.4 Common Misconceptions

  • “The kernel loads everything”: The kernel maps only the executable and the interpreter named in PT_INTERP; the interpreter then loads the shared libraries.
  • “Relocations are optional”: Without relocations, most shared objects cannot run.
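
To make the first point concrete, the sketch below looks up the interpreter path that the kernel reads from PT_INTERP. It assumes a 64-bit ELF file already read into a buffer; the function name find_interp is hypothetical and not part of any library.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return the interpreter path stored in PT_INTERP, or NULL if the image has
 * none (e.g. a static binary) or its headers point outside the buffer.
 * `image` is the whole file, already read or mapped into memory. */
const char *find_interp(const uint8_t *image, size_t size)
{
    Elf64_Ehdr eh;
    if (size < sizeof eh)
        return NULL;
    memcpy(&eh, image, sizeof eh);            /* memcpy avoids unaligned reads */

    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64)
        return NULL;
    if (eh.e_phentsize != sizeof(Elf64_Phdr) || eh.e_phoff > size)
        return NULL;

    for (uint16_t i = 0; i < eh.e_phnum; i++) {
        uint64_t off = eh.e_phoff + (uint64_t)i * sizeof(Elf64_Phdr);
        if (off > size || size - off < sizeof(Elf64_Phdr))
            return NULL;                      /* header table runs off the file */

        Elf64_Phdr ph;
        memcpy(&ph, image + off, sizeof ph);
        if (ph.p_type != PT_INTERP)
            continue;

        /* The path must lie inside the file and be NUL-terminated. */
        if (ph.p_filesz == 0 || ph.p_offset > size ||
            size - ph.p_offset < ph.p_filesz)
            return NULL;
        if (image[ph.p_offset + ph.p_filesz - 1] != '\0')
            return NULL;
        return (const char *)(image + ph.p_offset);
    }
    return NULL;
}
```

On a typical dynamically linked x86-64 Linux binary this would return the path that readelf -l prints as the requested program interpreter; a static binary has no PT_INTERP and yields NULL.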

3. Project Specification

3.1 What You Will Build

A minimal myld loader that can:

  • parse an ELF executable,
  • load a dependent shared library,
  • resolve a small set of symbols,
  • run the target program.

3.2 Functional Requirements

  1. Parse ELF headers and identify loadable segments.
  2. Map segments into memory at correct addresses.
  3. Resolve at least one dynamic symbol from a shared library.
  4. Apply relocations needed for execution.

3.3 Non-Functional Requirements

  • Reliability: Exit cleanly on malformed binaries.
  • Performance: Not critical; clarity matters.
  • Safety: Validate all offsets and sizes.
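
One way to approach the safety requirement is to route every offset and size taken from the file through a single overflow-safe check. The helpers below are a minimal sketch; range_in_file and table_in_file are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the byte range [off, off + len) lies entirely inside a file of
 * `file_size` bytes. Written without computing off + len, which could wrap. */
bool range_in_file(uint64_t off, uint64_t len, uint64_t file_size)
{
    return off <= file_size && len <= file_size - off;
}

/* True if a table of `count` entries of `entsize` bytes starting at `off`
 * fits in the file. The multiplication is guarded against overflow. */
bool table_in_file(uint64_t off, uint64_t count, uint64_t entsize,
                   uint64_t file_size)
{
    if (entsize == 0)
        return false;
    if (count > UINT64_MAX / entsize)
        return false;
    return range_in_file(off, count * entsize, file_size);
}
```

For example, the program header table could be checked with table_in_file(e_phoff, e_phnum, e_phentsize, file_size) before any entry is read.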

3.4 Example Usage / Output

$ ./myld ./hello_world
Hello, World!

3.5 Real World Outcome

Your loader runs a dynamically linked program without the system loader doing the work:

$ ./myld ./hello_world
Hello, World!

4. Solution Architecture

4.1 High-Level Design

┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│ ELF binary   │────▶│ myld loader  │────▶│ program main │
└──────────────┘     └──────────────┘     └──────────────┘

4.2 Key Components

Component   Responsibility           Key Decisions
ELF parser  Read headers and tables  Support 64-bit ELF
Loader      mmap segments            Respect permissions
Resolver    Symbol lookup            Minimal symbol set

4.3 Data Structures

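#include <stdint.h>

/* One PT_LOAD segment, copied from its Elf64_Phdr. */
typedef struct {
    uint64_t vaddr;   /* p_vaddr: virtual address of the first byte */
    uint64_t memsz;   /* p_memsz: size in memory (may exceed filesz for .bss) */
    uint64_t filesz;  /* p_filesz: bytes backed by the file */
    uint64_t offset;  /* p_offset: file offset of the first byte */
    uint32_t flags;   /* p_flags: PF_R / PF_W / PF_X permission bits */
} segment_t;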
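
As an illustration of how this structure might be filled, the sketch below copies every PT_LOAD entry from a program header table into an array of segment_t. The function name collect_load_segments and its error convention are assumptions made for this example.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t vaddr;
    uint64_t memsz;
    uint64_t filesz;
    uint64_t offset;
    uint32_t flags;
} segment_t;

/* Copy every PT_LOAD entry from `phdrs` into `out` (capacity `max`).
 * Returns the number of segments stored, or -1 if a header is inconsistent
 * or there are more loadable segments than `out` can hold. */
int collect_load_segments(const Elf64_Phdr *phdrs, size_t phnum,
                          segment_t *out, size_t max)
{
    size_t n = 0;
    for (size_t i = 0; i < phnum; i++) {
        const Elf64_Phdr *ph = &phdrs[i];
        if (ph->p_type != PT_LOAD)
            continue;                       /* skip PT_DYNAMIC, PT_INTERP, ... */
        if (ph->p_filesz > ph->p_memsz)     /* file bytes cannot exceed memory size */
            return -1;
        if (n == max)
            return -1;
        out[n].vaddr  = ph->p_vaddr;
        out[n].memsz  = ph->p_memsz;
        out[n].filesz = ph->p_filesz;
        out[n].offset = ph->p_offset;
        out[n].flags  = ph->p_flags;
        n++;
    }
    return (int)n;
}
```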

4.4 Algorithm Overview

Key Algorithm: Minimal dynamic loading

  1. Read ELF headers.
  2. Map PT_LOAD segments.
  3. Load dependent library and symbols.
  4. Apply relocations.
  5. Jump to entry point.

Complexity Analysis:

  • Time: O(S + R) for S segments and R relocations, plus the cost of one symbol lookup per symbolic relocation.
  • Space: O(S) bookkeeping for the mapped segments.

5. Implementation Guide

5.1 Development Environment Setup

gcc --version
readelf --version

5.2 Project Structure

project-root/
├── src/
│   ├── main.c
│   ├── elf.c
│   ├── loader.c
│   ├── reloc.c
│   └── resolver.c
└── Makefile

5.3 The Core Question You’re Answering

“What actually happens between execve and main for a dynamically linked program?”

5.4 Concepts You Must Understand First

Stop and research these before coding:

  1. ELF layout
    • Program headers and dynamic section
  2. Relocations
    • RELA records and symbol references
  3. Memory mapping
    • mmap, permissions, and alignment
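
As a starting point for the dynamic section, the sketch below walks an array of Elf64_Dyn entries, which is terminated by DT_NULL. The helper names dyn_value and dyn_needed are hypothetical; the `max` parameter is an added safety bound for tables that lack a terminator.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Return the value of the first entry with tag `tag`, or `fallback` if the
 * table ends (DT_NULL or `max` entries) without one. */
uint64_t dyn_value(const Elf64_Dyn *dyn, size_t max, int64_t tag,
                   uint64_t fallback)
{
    for (size_t i = 0; i < max && dyn[i].d_tag != DT_NULL; i++)
        if (dyn[i].d_tag == tag)
            return dyn[i].d_un.d_val;
    return fallback;
}

/* Collect the string-table offsets of every DT_NEEDED entry into `out`.
 * Each offset names one required library inside the DT_STRTAB table.
 * Returns the number of offsets stored (at most `out_max`). */
size_t dyn_needed(const Elf64_Dyn *dyn, size_t max,
                  uint64_t *out, size_t out_max)
{
    size_t n = 0;
    for (size_t i = 0; i < max && dyn[i].d_tag != DT_NULL; i++)
        if (dyn[i].d_tag == DT_NEEDED && n < out_max)
            out[n++] = dyn[i].d_un.d_val;
    return n;
}
```

With these, dyn_value(dyn, max, DT_SYMTAB, 0) and dyn_value(dyn, max, DT_RELA, 0) would give the addresses of the symbol and relocation tables, and dyn_needed lists the libraries to load next.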

5.5 Questions to Guide Your Design

  1. Which relocations are mandatory for your target?
  2. How will you resolve symbols across multiple libs?
  3. What is the minimal set to run printf?

5.6 Thinking Exercise

Why must the loader run constructors (.init_array) before main()?

5.7 The Interview Questions They’ll Ask

  1. What is the role of the dynamic linker?
  2. How do relocations enable PIC?
  3. What is the PLT/GOT used for?

5.8 Hints in Layers

Hint 1: Start with static

  • Load a static binary first to simplify.

Hint 2: Minimal symbols

  • Resolve only puts or printf initially.

Hint 3: Validate mapping

  • Use readelf -l to confirm segment layout.

5.9 Books That Will Help

Topic            Book                         Chapter
ELF internals    “Practical Binary Analysis”  Ch. 2-4
Dynamic linking  “Linkers and Loaders”        Ch. 10
Memory mapping   TLPI                         Ch. 49

5.10 Implementation Phases

Phase 1: Foundation (1-2 weeks)

Goals:

  • Parse ELF and load PT_LOAD segments.

Tasks:

  1. Parse ELF headers.
  2. Map segments into memory.

Checkpoint: A static binary can run.
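
A reasonable first task in this phase is rejecting files the loader cannot handle. The sketch below assumes the target is little-endian x86-64, which the project does not mandate; check_ehdr and its return codes are hypothetical.

```c
#include <assert.h>
#include <elf.h>
#include <string.h>

/* Returns 0 if `eh` looks like a little-endian x86-64 ELF64 executable or
 * PIE, otherwise a negative code identifying the first failed check. */
int check_ehdr(const Elf64_Ehdr *eh)
{
    if (memcmp(eh->e_ident, ELFMAG, SELFMAG) != 0)
        return -1;                                   /* not an ELF file */
    if (eh->e_ident[EI_CLASS] != ELFCLASS64)
        return -2;                                   /* 32-bit ELF */
    if (eh->e_ident[EI_DATA] != ELFDATA2LSB)
        return -3;                                   /* big-endian */
    if (eh->e_ident[EI_VERSION] != EV_CURRENT)
        return -4;
    if (eh->e_type != ET_EXEC && eh->e_type != ET_DYN)
        return -5;                                   /* e.g. a .o file */
    if (eh->e_machine != EM_X86_64)
        return -6;                                   /* assumption: x86-64 only */
    if (eh->e_phentsize != sizeof(Elf64_Phdr) || eh->e_phnum == 0)
        return -7;                                   /* nothing to load */
    return 0;
}
```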

Phase 2: Core Functionality (2-3 weeks)

Goals:

  • Resolve symbols and relocations.

Tasks:

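  1. Parse .dynsym and the RELA tables (.rela.dyn and .rela.plt), located at run time through the DT_SYMTAB, DT_RELA, and DT_JMPREL entries of the dynamic section.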
  2. Apply relocations.

Checkpoint: A dynamically linked hello program runs.
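
The relocation step in this phase might look like the sketch below. It assumes x86-64 and covers only four relocation types that a simple hello program is likely to need; whether that set is sufficient depends on the toolchain and the target binary. It also assumes `base` is the load address of a position-independent image, so r_offset is relative to it. apply_rela is a hypothetical name.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>
#include <string.h>

/* Apply one RELA record to an image loaded at `base`.
 * `sym_addr` is the already-resolved address of the referenced symbol
 * (ignored for R_X86_64_RELATIVE). Returns 0 on success, or -1 for a
 * relocation type this sketch does not support (nothing is written). */
int apply_rela(uint8_t *base, const Elf64_Rela *r, uint64_t sym_addr)
{
    uint64_t value;

    switch (ELF64_R_TYPE(r->r_info)) {
    case R_X86_64_RELATIVE:                 /* B + A */
        value = (uint64_t)(uintptr_t)base + (uint64_t)r->r_addend;
        break;
    case R_X86_64_64:                       /* S + A */
        value = sym_addr + (uint64_t)r->r_addend;
        break;
    case R_X86_64_GLOB_DAT:                 /* S: GOT entry for data */
    case R_X86_64_JUMP_SLOT:                /* S: GOT entry used by the PLT */
        value = sym_addr;
        break;
    default:
        return -1;
    }

    /* memcpy because the target slot is not guaranteed to be aligned. */
    memcpy(base + r->r_offset, &value, sizeof value);
    return 0;
}
```

The symbol index for a record is ELF64_R_SYM(r->r_info); the caller would use it to find the symbol's name in .dynsym, resolve it, and pass the result as sym_addr. Writing JUMP_SLOT entries up front corresponds to eager binding, which avoids implementing the lazy PLT resolver.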

Phase 3: Polish & Edge Cases (1-2 weeks)

Goals:

  • Handle multiple libs and constructors.

Tasks:

  1. Load dependent shared libraries.
  2. Run .init_array before entry point.

Checkpoint: Simple programs with libc run.
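
Running constructors amounts to calling an array of function pointers in order. The sketch below assumes constructors that ignore their arguments; the pointer array would come from DT_INIT_ARRAY and its length from DT_INIT_ARRAYSZ divided by the pointer size. run_init_array and the two stand-in constructors are hypothetical and exist only to show the call order.

```c
#include <assert.h>
#include <stddef.h>

/* Simplifying assumption: constructors are called with no arguments. */
typedef void (*init_fn)(void);

/* Call each constructor in array order, skipping empty slots. */
void run_init_array(const init_fn *fns, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (fns[i] == NULL)
            continue;
        fns[i]();
    }
}

/* Stand-in constructors that record the order in which they ran. */
int init_trace[4];
int init_count;

void ctor_a(void) { init_trace[init_count++] = 1; }
void ctor_b(void) { init_trace[init_count++] = 2; }
```

Calling run_init_array on an array holding ctor_a followed by ctor_b leaves init_trace starting with 1, 2. When several libraries are loaded, each library's constructors should run after those of the libraries it depends on.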

5.11 Key Implementation Decisions

Decision   Options           Recommendation  Rationale
ELF class  32-bit vs 64-bit  64-bit          Modern systems
Reloc set  minimal vs full   minimal first   Scope control

6. Testing Strategy

6.1 Test Categories

Category     Purpose                Examples
ELF parsing  Validate headers       Compare with readelf
Mapping      Correct memory layout  Verify segment addresses
Execution    Program runs           ./hello_world

6.2 Critical Test Cases

  1. Static hello binary runs.
  2. Dynamic hello binary runs with printf.
  3. Multiple dependencies load without crashes.

6.3 Test Data

hello_world (static and dynamic builds)

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall                Symptom               Solution
Wrong mmap protection  SIGSEGV               Match PROT_* bits to the segment's p_flags
Missing relocations    Crash at call         Implement required relocation types
Bad symbol lookup      Unresolved functions  Verify dynsym parsing
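
For the symbol lookup pitfall, a simple linear search with bounds checks is a useful baseline to debug against. The sketch below assumes the caller already knows the number of symbols (for example from the nchain field of the DT_HASH table) and the string table size (DT_STRSZ); lookup_symbol is a hypothetical name.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Linear search of a dynamic symbol table for a defined global or weak
 * symbol called `name`. Returns the matching entry or NULL.
 * `strtab_size` bounds every st_name so that a corrupt index cannot read
 * past the end of the string table. */
const Elf64_Sym *lookup_symbol(const Elf64_Sym *symtab, size_t nsyms,
                               const char *strtab, size_t strtab_size,
                               const char *name)
{
    size_t name_len = strlen(name);

    for (size_t i = 1; i < nsyms; i++) {     /* entry 0 is the reserved null symbol */
        const Elf64_Sym *s = &symtab[i];

        if (s->st_shndx == SHN_UNDEF)        /* an import, not a definition */
            continue;

        unsigned bind = ELF64_ST_BIND(s->st_info);
        if (bind != STB_GLOBAL && bind != STB_WEAK)
            continue;                        /* local symbols are not exported */

        /* Need name_len + 1 bytes (including the NUL) inside the table. */
        if (s->st_name >= strtab_size || strtab_size - s->st_name <= name_len)
            continue;

        if (memcmp(strtab + s->st_name, name, name_len + 1) == 0)
            return s;
    }
    return NULL;
}
```

The resolved address would then be the defining library's load base plus st_value. Production loaders use the hash tables instead of a linear scan, but the scan is easy to verify against readelf --dyn-syms output.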

7.2 Debugging Strategies

  • Use strace to compare with system loader behavior.
  • Dump mapped addresses and compare to readelf -l.

7.3 Performance Traps

Not performance-critical; correctness first.


8. Extensions & Challenges

8.1 Beginner Extensions

  • Print segment and relocation tables.

8.2 Intermediate Extensions

  • Support lazy binding via PLT.

8.3 Advanced Extensions

  • Implement symbol versioning checks.

9. Real-World Connections

9.1 Industry Applications

  • Loader debugging: Understand crashes before main.
  • Security: Loader behavior impacts exploit techniques.

9.2 Related Open Source Projects

  • glibc ld.so: Reference implementation.
  • musl ldso: Smaller, readable dynamic loader.

9.3 Interview Relevance

  • Deep systems knowledge and ELF internals.

10. Resources

10.1 Essential Reading

  • System V ABI - ELF specification.
  • TLPI - Memory mappings and linking.

10.2 Video Resources

  • Search: “build a dynamic linker”.

10.3 Tools & Documentation

  • readelf and objdump.
  • man mmap, man elf.

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain how ELF segments map to memory.
  • I can describe how relocations work.

11.2 Implementation

  • My loader runs a simple dynamic binary.
  • I can resolve at least one external symbol.

11.3 Growth

  • I can reason about loader errors in real systems.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Load a static binary and run it.

Full Completion:

  • Run a dynamically linked program with printf.

Excellence (Going Above & Beyond):

  • Load multiple libraries and handle constructors.

This guide was generated from SHARED_LIBRARIES_LEARNING_PROJECTS.md. For the complete learning path, see the parent directory README.