Project 10: ELF Link Map & Interposition Toolkit
Build a tool that reveals the hidden world of symbols, relocations, and dynamic linking, then demonstrate function call hooking through library interposition.
Quick Reference
| Attribute | Value |
|---|---|
| Language | C (alt: Rust, Zig, C++) |
| Difficulty | Advanced |
| Time | 2–3 weeks |
| Chapters | 7 |
| Coolness | ★★★★☆ Hardcore Tech Flex |
| Portfolio Value | Service & Support |
Learning Objectives
By completing this project, you will:
- Parse ELF structures confidently: Read headers, section tables, symbol tables, and relocation entries
- Understand symbol resolution: Explain how the linker and loader find symbol definitions
- Master PLT/GOT mechanics: Trace exactly how a dynamically linked function call works
- Explain relocation types: Understand why different relocation types exist and when each is used
- Implement library interposition: Hook function calls at compile-time, link-time, and run-time
- Debug linking issues: Diagnose and fix common linking problems in real programs
- Reason about security implications: Understand how linking mechanisms affect program security
Deep Theoretical Foundation
The ELF Format: A Complete Tour
ELF (Executable and Linkable Format) is the standard binary format on Linux and most other Unix-like systems (macOS, which uses Mach-O, is the notable exception). Understanding it is essential for systems programming.
ELF File Structure Overview
+=====================================+
| ELF HEADER | <- Fixed size (52 or 64 bytes)
| Magic, class, endianness, type, |
| machine, entry point, offsets |
+=====================================+
| PROGRAM HEADERS (optional) | <- How to load into memory
| Segment type, offset, vaddr, | (for executables/shared libs)
| paddr, filesz, memsz, flags |
+=====================================+
| |
| SECTIONS |
| |
| .text (code) |
| .rodata (read-only data) |
| .data (initialized data) |
| .bss (uninitialized data) |
| .symtab (symbol table) |
| .strtab (string table) |
| .rela.text (relocations for .text) |
| .dynsym (dynamic symbols) |
| .dynstr (dynamic strings) |
| .plt (procedure linkage table) |
| .got (global offset table) |
| .dynamic (dynamic linking info) |
| ... |
| |
+=====================================+
| SECTION HEADERS | <- Describes all sections
| Name, type, flags, addr, offset, | (for linker/tools)
| size, link, info, align, entsize |
+=====================================+
The ELF Header
// 64-bit ELF header structure
typedef struct {
    unsigned char e_ident[16]; // Magic number and identification
    Elf64_Half e_type; // Object file type
    Elf64_Half e_machine; // Architecture
    Elf64_Word e_version; // ELF version
    Elf64_Addr e_entry; // Entry point virtual address
    Elf64_Off e_phoff; // Program header table file offset
    Elf64_Off e_shoff; // Section header table file offset
    Elf64_Word e_flags; // Processor-specific flags
    Elf64_Half e_ehsize; // ELF header size
    Elf64_Half e_phentsize; // Program header table entry size
    Elf64_Half e_phnum; // Program header table entry count
    Elf64_Half e_shentsize; // Section header table entry size
    Elf64_Half e_shnum; // Section header table entry count
    Elf64_Half e_shstrndx; // Section header string table index
} Elf64_Ehdr;
// ELF magic number: 0x7f 'E' 'L' 'F'
// e_ident[0] = 0x7f
// e_ident[1] = 'E'
// e_ident[2] = 'L'
// e_ident[3] = 'F'
// e_ident[4] = class (1=32-bit, 2=64-bit)
// e_ident[5] = data encoding (1=little, 2=big endian)
ELF Types (e_type):
| Value | Name | Description |
|---|---|---|
| 1 | ET_REL | Relocatable object file (.o) |
| 2 | ET_EXEC | Executable file |
| 3 | ET_DYN | Shared object file (.so) or PIE executable |
| 4 | ET_CORE | Core dump |
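As a starting point for the inspector, the identification bytes and the e_type values listed above can be checked with a few lines of C. This is a minimal sketch, assuming the Linux `<elf.h>` header is available; the function names `elf_ident_ok` and `elf_type_name` are hypothetical helpers, not part of any library.

```c
#include <elf.h>      /* Linux header: EI_* indices, ELFMAG, ET_* constants */
#include <stddef.h>
#include <string.h>

/* Returns 1 if buf starts with a plausible ELF identification block. */
int elf_ident_ok(const unsigned char *buf, size_t len)
{
    if (len < EI_NIDENT)
        return 0;                                /* too short to hold e_ident */
    if (memcmp(buf, ELFMAG, SELFMAG) != 0)
        return 0;                                /* magic is not 0x7f 'E' 'L' 'F' */
    if (buf[EI_CLASS] != ELFCLASS32 && buf[EI_CLASS] != ELFCLASS64)
        return 0;                                /* class must be 1 or 2 */
    return buf[EI_DATA] == ELFDATA2LSB || buf[EI_DATA] == ELFDATA2MSB;
}

/* Maps an e_type value to the name used in the table above. */
const char *elf_type_name(unsigned type)
{
    switch (type) {
    case ET_REL:  return "ET_REL";
    case ET_EXEC: return "ET_EXEC";
    case ET_DYN:  return "ET_DYN";
    case ET_CORE: return "ET_CORE";
    default:      return "unknown";
    }
}
```

A real tool would call `elf_ident_ok` on the first 16 bytes of the file before trusting any other header field.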
Section Headers
typedef struct {
    Elf64_Word sh_name; // Section name (index into .shstrtab)
    Elf64_Word sh_type; // Section type
    Elf64_Xword sh_flags; // Section flags
    Elf64_Addr sh_addr; // Virtual address in memory
    Elf64_Off sh_offset; // Offset in file
    Elf64_Xword sh_size; // Size in bytes
    Elf64_Word sh_link; // Link to another section
    Elf64_Word sh_info; // Additional info
    Elf64_Xword sh_addralign; // Alignment constraint
    Elf64_Xword sh_entsize; // Entry size if section has table
} Elf64_Shdr;
Key Section Types:
| Name | Value | Description |
|---|---|---|
| SHT_PROGBITS | 1 | Code or data |
| SHT_SYMTAB | 2 | Symbol table |
| SHT_STRTAB | 3 | String table |
| SHT_RELA | 4 | Relocation entries with addends |
| SHT_DYNAMIC | 6 | Dynamic linking information |
| SHT_DYNSYM | 11 | Dynamic symbol table |
Critical Sections for Linking:
+-------------------+--------------------------------------------------+
| Section | Purpose |
+-------------------+--------------------------------------------------+
| .text | Executable machine code |
| .rodata | Read-only data (string literals, constants) |
| .data | Initialized global/static variables |
| .bss | Uninitialized global/static (zero at load) |
| .symtab | Full symbol table (for debugging/linking) |
| .strtab | String table for .symtab names |
| .dynsym | Dynamic symbol table (runtime resolution) |
| .dynstr | String table for .dynsym names |
| .rel.text/.rela.* | Relocation entries |
| .plt | Procedure Linkage Table stubs |
| .got | Global Offset Table entries |
| .got.plt | GOT entries specifically for PLT |
| .dynamic | Dynamic linking control information |
| .interp | Path to dynamic linker (ld-linux.so) |
+-------------------+--------------------------------------------------+
Symbol Tables and String Tables
Symbols are the names that connect your code to definitions across files and libraries.
Symbol Table Entry Structure
typedef struct {
    Elf64_Word st_name; // Symbol name (index into string table)
    unsigned char st_info; // Type and binding
    unsigned char st_other; // Visibility
    Elf64_Half st_shndx; // Section index
    Elf64_Addr st_value; // Symbol value (address or offset)
    Elf64_Xword st_size; // Size of the symbol
} Elf64_Sym;
// Macros to extract binding and type from st_info
#define ELF64_ST_BIND(info) ((info) >> 4)
#define ELF64_ST_TYPE(info) ((info) & 0xf)
#define ELF64_ST_INFO(bind, type) (((bind) << 4) + ((type) & 0xf))
Symbol Binding (visibility)
+-------+----------------+------------------------------------------+
| Value | Name | Meaning |
+-------+----------------+------------------------------------------+
| 0 | STB_LOCAL | Not visible outside this object file |
| 1 | STB_GLOBAL | Visible to all; one definition must exist|
| 2 | STB_WEAK | Like global, but can be overridden |
+-------+----------------+------------------------------------------+
Key insight: Local symbols (static functions/variables in C) cannot be referenced from other files. This is why static provides encapsulation.
Symbol Type
+-------+----------------+------------------------------------------+
| Value | Name | Meaning |
+-------+----------------+------------------------------------------+
| 0 | STT_NOTYPE | Type not specified |
| 1 | STT_OBJECT | Data object (variable) |
| 2 | STT_FUNC | Function |
| 3 | STT_SECTION | Section symbol |
| 4 | STT_FILE | Source file name |
+-------+----------------+------------------------------------------+
Special Section Indices
+--------+----------------+------------------------------------------+
| Value | Name | Meaning |
+--------+----------------+------------------------------------------+
| 0 | SHN_UNDEF | Undefined (needs resolution) |
| 0xfff1 | SHN_ABS | Absolute value, not affected by reloc |
| 0xfff2 | SHN_COMMON | Common block (tentative definition) |
+--------+----------------+------------------------------------------+
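The three tables above (binding, type, special section index) are what a symbol dump has to decode for each entry. The following sketch shows one way to turn the raw fields into the labels that tools such as readelf print. It assumes the Linux `<elf.h>` header; the helper names are hypothetical.

```c
#include <elf.h>
#include <stdio.h>

/* Binding lives in the high nibble of st_info. */
const char *sym_bind_name(unsigned char info)
{
    switch (ELF64_ST_BIND(info)) {
    case STB_LOCAL:  return "LOCAL";
    case STB_GLOBAL: return "GLOBAL";
    case STB_WEAK:   return "WEAK";
    default:         return "OTHER";
    }
}

/* Type lives in the low nibble of st_info. */
const char *sym_type_name(unsigned char info)
{
    switch (ELF64_ST_TYPE(info)) {
    case STT_NOTYPE:  return "NOTYPE";
    case STT_OBJECT:  return "OBJECT";
    case STT_FUNC:    return "FUNC";
    case STT_SECTION: return "SECTION";
    case STT_FILE:    return "FILE";
    default:          return "OTHER";
    }
}

/* Special indices get a label; ordinary indices are printed into buf. */
const char *sym_ndx_name(Elf64_Half shndx, char *buf, size_t n)
{
    switch (shndx) {
    case SHN_UNDEF:  return "UND";
    case SHN_ABS:    return "ABS";
    case SHN_COMMON: return "COM";
    default:
        snprintf(buf, n, "%u", (unsigned)shndx);
        return buf;
    }
}
```

For example, a global function packs to `st_info == 0x12` (binding 1 in the high nibble, type 2 in the low nibble).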
Reading symbols with nm:
$ nm hello.o
U printf # Undefined, needs linking
0000000000000000 T main # Text (code), defined here
0000000000000000 D global_var # Data, initialized
0000000000000004 C uninit_var # Common (uninitialized)
String Tables
String tables are simple: just null-terminated strings packed together. Symbol names are stored as offsets into the string table:
Offset: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
Data: | \0| m | a | i | n | \0| p | r | i | n | t | f | \0| x | \0|...|
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
Symbol "main" -> st_name = 1
Symbol "printf" -> st_name = 6
Symbol "x" -> st_name = 13
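Because st_name comes straight from the file, a robust parser should bounds-check it before treating it as a C string. Here is a small sketch of such a lookup; `strtab_get` is a hypothetical helper name, and the table contents in the usage note mirror the diagram above.

```c
#include <stddef.h>
#include <string.h>

/* Returns the string stored at byte offset `off` in a string table of
 * `size` bytes, or NULL if the offset is out of range or the string is
 * not NUL-terminated inside the table (a sign of a corrupt file). */
const char *strtab_get(const char *tab, size_t size, size_t off)
{
    if (off >= size)
        return NULL;
    if (memchr(tab + off, '\0', size - off) == NULL)
        return NULL;
    return tab + off;
}
```

With the table `"\0main\0printf\0x"` from the diagram, offsets 1, 6, and 13 yield `main`, `printf`, and `x`.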
Relocation: Patching Addresses
When the compiler generates code, it doesn’t know where symbols will be placed in memory. Relocation entries tell the linker how to patch these addresses.
Relocation Entry Structure
// Relocation entry with addend (most common on x86-64)
typedef struct {
    Elf64_Addr r_offset; // Location to patch
    Elf64_Xword r_info; // Symbol index and relocation type
    Elf64_Sxword r_addend; // Addend for computation
} Elf64_Rela;
#define ELF64_R_SYM(info) ((info) >> 32)
#define ELF64_R_TYPE(info) ((info) & 0xffffffff)
Common Relocation Types (x86-64)
+----------------------+------------------------------------------------+
| Relocation Type | Computation |
+----------------------+------------------------------------------------+
| R_X86_64_64 | S + A (absolute 64-bit address) |
| R_X86_64_PC32 | S + A - P (PC-relative 32-bit) |
| R_X86_64_PLT32 | L + A - P (PLT entry, 32-bit PC-relative) |
| R_X86_64_GOTPCREL | G + GOT + A - P (GOT entry, PC-relative) |
| R_X86_64_GLOB_DAT | S (GOT entry for data) |
| R_X86_64_JUMP_SLOT | S (GOT entry for function/PLT) |
+----------------------+------------------------------------------------+
Where:
S = Value of the symbol
A = Addend from relocation entry
P = Place (address being relocated)
L = Address of PLT entry
G = Offset into GOT
GOT = Address of GOT
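To make the S, A, and P formulas concrete, here is a sketch that applies two of the relocation types from the table to a byte buffer. It is an illustration under stated assumptions, not how any particular linker is written: `apply_reloc` is a hypothetical name, only two types are handled, and the host is assumed to be little-endian like the x86-64 target.

```c
#include <elf.h>
#include <stdint.h>
#include <string.h>

/* Patches one relocation site.
 *   place: bytes to patch        P: runtime address of those bytes
 *   S: symbol value              A: addend from the Elf64_Rela entry
 * Returns 0 on success, -1 for an unhandled type or a value that does not fit. */
int apply_reloc(unsigned char *place, uint64_t P, uint64_t S, int64_t A,
                uint32_t type)
{
    switch (type) {
    case R_X86_64_64: {                       /* S + A, full 64 bits */
        uint64_t v = S + (uint64_t)A;
        memcpy(place, &v, sizeof v);
        return 0;
    }
    case R_X86_64_PC32: {                     /* S + A - P, signed 32 bits */
        int64_t v = (int64_t)(S + (uint64_t)A - P);
        if (v < INT32_MIN || v > INT32_MAX)
            return -1;                        /* relocation overflow */
        int32_t v32 = (int32_t)v;
        memcpy(place, &v32, sizeof v32);
        return 0;
    }
    default:
        return -1;
    }
}
```

For instance, with S = 0x401000, A = -4, and P = 0x400ff0, the PC32 case stores 0xc. The -4 addend compensates for the CPU measuring the displacement from the end of the 4-byte field rather than its start.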
Why PC-Relative Addressing?
Position-Independent Code (PIC) uses PC-relative addressing to work regardless of where it’s loaded:
Absolute: CALL 0x401234 ; Only works if the target really is at that address
PC-Relative: CALL rel32 ; Target = address of next instruction + rel32, works at any load address
PC-relative formula: target = current_address + offset
If code moves, both current_address and target move by the same amount,
so the offset stays correct!
Static Linking: Resolving at Build Time
Static linking happens when you create an executable from object files and static libraries (.a).
The Static Linker’s Job
+===========================================================================+
| STATIC LINKER WORKFLOW |
+===========================================================================+
| |
| INPUT: |
| main.o ────────┐ |
| helper.o ───────┼───────────> LINKER (ld) ──────> EXECUTABLE |
| libfoo.a ───────┘ │ |
| │ |
| PROCESS: ▼ |
| |
| 1. SYMBOL RESOLUTION |
| - Collect all defined symbols from input files |
| - For each undefined symbol, find a defining module |
| - Error if undefined symbol has no definition |
| - Error if multiple strong definitions exist |
| |
| 2. RELOCATION |
| - Merge sections (.text from all inputs → one .text) |
| - Assign runtime addresses to all symbols |
| - Patch relocation entries with final addresses |
| |
| 3. OUTPUT |
| - Write ELF executable with program headers for loader |
| |
+===========================================================================+
Symbol Resolution Rules
// Strong symbols: functions and initialized global variables
int x = 5; // Strong (initialized)
int foo() { return 1; } // Strong (function)
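// Weak symbols: uninitialized global variables (only when compiled with -fcommon)
int y; // Weak (tentative definition, emitted as a COMMON symbol)
/* RESOLUTION RULES:
 * 1. Multiple strong definitions → ERROR
 * 2. One strong + multiple weak → Use strong
 * 3. Multiple weak only → Pick one (usually largest)
 * NOTE: GCC 10 and later default to -fno-common, which makes `int y;` a
 * strong definition, so cases 2 and 3 then end in a "multiple definition" error.
 */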
// Example that causes problems:
// file1.c: int x = 5; // Strong
// file2.c: int x = 10; // Strong → LINKER ERROR!
// This "works" but is dangerous:
// file1.c: int x = 5; // Strong
// file2.c: int x; // Weak → Uses file1's definition
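The three rules can be expressed as a tiny decision function that a link map analyzer might run each time it sees another definition of a name. This is a sketch with hypothetical names (`resolve`, `enum strength`, `enum verdict`); for two weak definitions it simply keeps the first, whereas a real linker may prefer the largest.

```c
/* How strongly a name is currently defined, or is being defined. */
enum strength { SYM_NONE, SYM_WEAK, SYM_STRONG };

/* What to do with an incoming definition of an already-seen name. */
enum verdict { KEEP_OLD, TAKE_NEW, DUPLICATE_ERROR };

/* `have` is the best definition seen so far (SYM_NONE if there is none);
 * `incoming` is the new definition and must be SYM_WEAK or SYM_STRONG. */
enum verdict resolve(enum strength have, enum strength incoming)
{
    if (have == SYM_STRONG && incoming == SYM_STRONG)
        return DUPLICATE_ERROR;        /* rule 1: two strong definitions */
    if (incoming == SYM_STRONG)
        return TAKE_NEW;               /* rule 2: strong replaces weak or none */
    if (have == SYM_NONE)
        return TAKE_NEW;               /* first definition of any kind */
    return KEEP_OLD;                   /* rule 2 or 3: keep what we already have */
}
```

Feeding the definitions of `x` from the two-file examples above through `resolve` gives `DUPLICATE_ERROR` for strong plus strong, and `KEEP_OLD` when the second file only has a tentative definition.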
Static Library Scanning
Static libraries (.a) are archives of object files. The linker scans them left-to-right:
# Order matters!
gcc main.o -L. -lfoo -lbar # Search for undefined symbols in order
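# If main.o needs a symbol from libbar.a, and libbar.a needs a symbol from
# libfoo.a, this will fail:
gcc main.o -lfoo -lbar # Wrong order! libfoo.a is scanned before anything needs it
# Correct: each library comes after the code that depends on it
gcc main.o -lbar -lfoo # For circular dependencies, repeat a library: -lfoo -lbar -lfoo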
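The single left-to-right pass is easy to model. In the sketch below, each symbol is one bit in a mask and each archive is simplified to a single member (real archives are scanned member by member); `scan_archives` and `struct archive` are hypothetical names. An archive is pulled in only if it defines something that is undefined at the moment it is scanned, which is why a library must appear after the objects that need it.

```c
#include <stddef.h>

struct archive {
    unsigned defines;   /* bitmask of symbols this archive can supply */
    unsigned needs;     /* bitmask of symbols its code references */
};

/* Runs one left-to-right pass over `libs`, starting from the symbols the
 * object files left undefined. Returns the symbols still undefined at the end. */
unsigned scan_archives(unsigned undefined, const struct archive *libs, size_t n)
{
    unsigned defined = 0;
    for (size_t i = 0; i < n; i++) {
        if ((libs[i].defines & undefined) == 0)
            continue;                 /* nothing wanted yet: archive is skipped */
        defined |= libs[i].defines;
        undefined = (undefined | libs[i].needs) & ~defined;
    }
    return undefined;
}
```

With `main.o` needing a symbol from libbar and libbar needing one from libfoo, scanning bar then foo leaves nothing undefined, while scanning foo then bar leaves libfoo's symbol unresolved.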
Dynamic Linking: Resolution at Load Time
Dynamic linking defers symbol resolution until the program runs. This is more complex but has significant advantages.
Why Dynamic Linking?
+----------------------------+----------------------------+
| STATIC LINKING | DYNAMIC LINKING |
+----------------------------+----------------------------+
| + Self-contained binary | + Smaller executables |
| + No runtime dependencies | + Shared memory for libs |
| + Slightly faster startup | + Update libs without |
| - Larger file size | recompiling |
| - Duplicate code in memory | + Required for plugins |
| - Can't update libs | - Runtime overhead |
| without recompile | - Dependency management |
+----------------------------+----------------------------+
The Dynamic Linker (ld-linux.so)
+===========================================================================+
| PROGRAM LOADING WITH DYNAMIC LINKING |
+===========================================================================+
| |
| 1. KERNEL LOADS EXECUTABLE |
| - Read ELF header, create memory mappings |
| - Find PT_INTERP segment (the .interp section: path to ld-linux.so) |
| - Load dynamic linker into address space |
| - Transfer control to dynamic linker |
| |
| 2. DYNAMIC LINKER INITIALIZATION |
| - Load shared libraries listed in DT_NEEDED entries |
| - Recursively load dependencies |
| - Process relocations (patch GOT entries) |
| - Run initialization functions (.init, .ctors) |
| |
| 3. TRANSFER TO APPLICATION |
| - Jump to program's entry point (e_entry) |
| - __libc_start_main calls main() |
| |
+===========================================================================+
The .dynamic Section
The .dynamic section contains tags that control dynamic linking:
typedef struct {
    Elf64_Sxword d_tag; // Type of entry
    union {
        Elf64_Xword d_val; // Integer value
        Elf64_Addr d_ptr; // Address value
    } d_un;
} Elf64_Dyn;
Key Dynamic Tags:
+-----------+--------------------------------------------------+
| Tag | Meaning |
+-----------+--------------------------------------------------+
| DT_NEEDED | Name of required shared library |
| DT_SONAME | Shared object name |
| DT_SYMTAB | Address of dynamic symbol table |
| DT_STRTAB | Address of dynamic string table |
| DT_PLTREL | Type of PLT relocations |
| DT_JMPREL | Address of PLT relocations |
| DT_PLTGOT | Address of GOT (for PLT) |
| DT_RELA | Address of relocation table |
| DT_RELASZ | Size of relocation table |
| DT_INIT | Address of initialization function |
| DT_FINI | Address of finalization function |
+-----------+--------------------------------------------------+
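The .dynamic section is just an array of these tag/value pairs terminated by a DT_NULL entry, so reading it is a linear scan. A minimal sketch, assuming the Linux `<elf.h>` header; `dyn_lookup` and `dyn_count_needed` are hypothetical helper names.

```c
#include <elf.h>
#include <stddef.h>

/* Returns the value of the first entry with the given tag, or `fallback`
 * if the DT_NULL terminator is reached without finding it. */
Elf64_Xword dyn_lookup(const Elf64_Dyn *dyn, Elf64_Sxword tag,
                       Elf64_Xword fallback)
{
    for (; dyn->d_tag != DT_NULL; dyn++)
        if (dyn->d_tag == tag)
            return dyn->d_un.d_val;
    return fallback;
}

/* Counts DT_NEEDED entries, i.e. the number of directly required libraries. */
size_t dyn_count_needed(const Elf64_Dyn *dyn)
{
    size_t n = 0;
    for (; dyn->d_tag != DT_NULL; dyn++)
        if (dyn->d_tag == DT_NEEDED)
            n++;
    return n;
}
```

Note that the value of a DT_NEEDED entry is an offset into the table located by DT_STRTAB, not a pointer to the name itself.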
PLT and GOT: The Heart of Dynamic Linking
The Procedure Linkage Table (PLT) and Global Offset Table (GOT) work together to enable lazy binding of dynamically linked functions.
High-Level Overview
+===========================================================================+
| PLT/GOT MECHANISM |
+===========================================================================+
| |
| YOUR CODE PLT GOT |
| --------- --- --- |
| |
| call printf@plt ───> [PLT entry] ───> [GOT entry] ───> printf |
| (stub) (address) (in libc) |
| |
| |
| FIRST CALL: |
| PLT stub jumps to resolver, which: |
| 1. Finds printf's actual address in libc.so |
| 2. Patches the GOT entry with that address |
| 3. Jumps to printf |
| |
| SUBSEQUENT CALLS: |
| PLT stub jumps through GOT directly to printf |
| (no resolver involvement) |
| |
+===========================================================================+
Detailed PLT Structure
+===========================================================================+
| PLT ENTRY ANATOMY |
+===========================================================================+
| |
| PLT[0] - Special resolver entry: |
| ┌───────────────────────────────────────────────────────────────────┐ |
| │ push [GOT+8] ; Push link_map pointer │ |
| │ jmp [GOT+16] ; Jump to _dl_runtime_resolve │ |
| └───────────────────────────────────────────────────────────────────┘ |
| |
| PLT[1] - Entry for printf (typical example): |
| ┌───────────────────────────────────────────────────────────────────┐ |
| │ jmp [GOT+24] ; Jump to address in GOT entry │ |
| │ push 0 ; Push relocation index (0 = first) │ |
| │ jmp PLT[0] ; Jump to resolver │ |
| └───────────────────────────────────────────────────────────────────┘ |
| |
| PLT[2] - Entry for malloc: |
| ┌───────────────────────────────────────────────────────────────────┐ |
| │ jmp [GOT+32] ; Jump to address in GOT entry │ |
| │ push 1 ; Push relocation index (1 = second) │ |
| │ jmp PLT[0] ; Jump to resolver │ |
| └───────────────────────────────────────────────────────────────────┘ |
| |
+===========================================================================+
GOT Structure
+===========================================================================+
| GOT LAYOUT |
+===========================================================================+
| |
| GOT[0] = Address of .dynamic section |
| GOT[1] = Pointer to link_map (struct for this shared object) |
| GOT[2] = Address of _dl_runtime_resolve function |
| GOT[3] = Entry for printf (initially points to PLT[1]+6) |
| GOT[4] = Entry for malloc (initially points to PLT[2]+6) |
| ... |
| |
| Before resolution: |
| GOT[3] -> PLT[1]+6 (instruction after jmp, the push instruction) |
| |
| After resolution: |
| GOT[3] -> 0x7fff... (actual address of printf in libc.so) |
| |
+===========================================================================+
Complete Lazy Binding Sequence
+===========================================================================+
| LAZY BINDING STEP-BY-STEP |
+===========================================================================+
| |
| FIRST CALL TO printf: |
| ───────────────────── |
| |
| 1. Code executes: call printf@plt |
| ↓ |
| 2. Jump to PLT[1]: jmp [GOT+24] |
| GOT+24 initially contains address of PLT[1]+6 |
| ↓ |
| 3. Fall through to: push 0 (relocation index) |
| ↓ |
| 4. Jump to PLT[0]: jmp PLT[0] |
| ↓ |
| 5. PLT[0] executes: |
| - push [GOT+8] ; Push link_map |
| - jmp [GOT+16] ; Jump to _dl_runtime_resolve |
| ↓ |
| 6. _dl_runtime_resolve: |
| - Uses relocation index (0) to find symbol name ("printf") |
| - Searches loaded libraries for printf |
| - Finds printf at 0x7fff12345678 in libc.so |
| - Writes 0x7fff12345678 to GOT+24 |
| - Jumps to printf (not returns!) |
| |
| SECOND CALL TO printf: |
| ────────────────────── |
| |
| 1. Code executes: call printf@plt |
| ↓ |
| 2. Jump to PLT[1]: jmp [GOT+24] |
| GOT+24 now contains 0x7fff12345678 |
| ↓ |
| 3. Direct jump to printf - done! |
| |
+===========================================================================+
┌─────────────────────────────────────┐
│ ASCII DIAGRAM │
└─────────────────────────────────────┘
Your Program PLT GOT libc.so
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ │ │ PLT[0] │ │ GOT[0] │ │ │
│ call │───>│ resolver │ │ .dynamic │ │ │
│ printf │ │ stub │ │ │ │ │
│ @plt │ ├──────────┤ ├──────────┤ │ │
│ │ │ PLT[1] │ │ GOT[1] │ │ │
│ │ │ jmp [GOT]│────────>│link_map │ │ │
│ │ │ push 0 │ │ │ │ │
│ │ │ jmp PLT0 │ ├──────────┤ │ │
│ │ │ │ │ GOT[2] │ │ │
│ │ ├──────────┤ │ resolver │ │ │
│ │ │ PLT[2] │ ├──────────┤ ├──────────┤
│ │ │ ... │ │ GOT[3] │ ────>│ printf │
│ │ │ │ │ printf │ │ code │
└──────────┘ └──────────┘ └──────────┘ └──────────┘
First call: PLT → GOT (has PLT+6) → back to PLT → resolver → writes GOT
Later calls: PLT → GOT (has printf addr) → printf directly
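The same patch-on-first-use idea can be imitated in plain C with a function pointer standing in for the GOT slot. This is only an analogy for building intuition: the names are hypothetical, and the real resolver identifies the target from the pushed relocation index rather than having it hard-coded.

```c
typedef int (*fn_t)(int);

/* Stands in for the real library function (printf in the diagrams). */
static int real_double(int x) { return 2 * x; }

static int resolver_stub(int x);

/* The "GOT entry": initially points back at the resolver path. */
static fn_t got_slot = resolver_stub;
static int resolve_count = 0;

/* Plays the role of PLT[0] plus _dl_runtime_resolve. */
static int resolver_stub(int x)
{
    resolve_count++;
    got_slot = real_double;     /* patch the slot with the real address */
    return got_slot(x);         /* then continue into the real function */
}

/* Plays the role of the PLT stub: always an indirect jump through the slot. */
int call_via_plt(int x) { return got_slot(x); }

/* Number of times the slow resolver path has run. */
int lazy_resolutions(void) { return resolve_count; }
```

Calling `call_via_plt` twice runs the resolver only once; the second call goes straight through the patched slot.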
Immediate Binding vs Lazy Binding
# Lazy binding (default): Resolve symbols on first use
./my_program
# Immediate binding: Resolve all symbols at load time
LD_BIND_NOW=1 ./my_program
# Or link with -z now:
gcc -Wl,-z,now -o my_program main.c
When to use immediate binding:
- Security (RELRO + BIND_NOW = no writable GOT after startup)
- Debugging (fail-fast if symbol missing)
- Real-time systems (no unpredictable latency from resolution)
Position Independent Code (PIC)
Shared libraries must work when loaded at any address. PIC achieves this using PC-relative addressing.
How PIC Accesses Global Data
+===========================================================================+
| PIC DATA ACCESS |
+===========================================================================+
| |
| WITHOUT PIC (absolute addressing): |
| ┌──────────────────────────────────────────────────────────────────┐ |
| │ mov rax, [0x601020] ; Load from absolute address │ |
| └──────────────────────────────────────────────────────────────────┘ |
| Problem: Only works if loaded at expected address! |
| |
| WITH PIC (PC-relative through GOT): |
| ┌──────────────────────────────────────────────────────────────────┐ |
| │ mov rax, [rip + global_var@GOTPCREL] ; Get GOT entry address │ |
| │ mov rax, [rax] ; Load actual value │ |
| └──────────────────────────────────────────────────────────────────┘ |
| Works at any address because: |
| 1. GOT is always at fixed offset from code |
| 2. Dynamic linker fills GOT with actual addresses at load time |
| |
+===========================================================================+
Compiling with PIC
# For shared libraries, PIC is required on most systems
gcc -fPIC -shared -o libfoo.so foo.c
# For executables, PIE (Position Independent Executable) is optional
gcc -fPIE -pie -o my_program main.c # PIE executable
gcc -no-pie -o my_program main.c # Traditional executable
Library Interposition
Interposition lets you intercept calls to library functions. There are three approaches:
Compile-Time Interposition
Replace function at compile time using macros:
// mymalloc.c - compile-time interposition wrapper
#ifdef COMPILETIME
#include <stdio.h>
#include <malloc.h>
// Intercept malloc
void *mymalloc(size_t size) {
    void *ptr = malloc(size);
    printf("malloc(%zu) = %p\n", size, ptr);
    return ptr;
}
// Intercept free
void myfree(void *ptr) {
    printf("free(%p)\n", ptr);
    free(ptr);
}
#endif
// malloc.h - redefine malloc/free
#define malloc(size) mymalloc(size)
#define free(ptr) myfree(ptr)
void *mymalloc(size_t size);
void myfree(void *ptr);
# Compile with interposition
gcc -DCOMPILETIME -c mymalloc.c
gcc -I. -o my_program main.c mymalloc.o
Link-Time Interposition
Use linker’s --wrap option:
// mymalloc.c - link-time interposition
#include <stdio.h>
// __real_malloc is the actual malloc
void *__real_malloc(size_t size);
// __wrap_malloc intercepts calls to malloc
void *__wrap_malloc(size_t size) {
    void *ptr = __real_malloc(size);
    printf("malloc(%zu) = %p\n", size, ptr);
    return ptr;
}
void __real_free(void *ptr);
void __wrap_free(void *ptr) {
    printf("free(%p)\n", ptr);
    __real_free(ptr);
}
gcc -c mymalloc.c
gcc -Wl,--wrap,malloc -Wl,--wrap,free -o my_program main.c mymalloc.o
Run-Time Interposition (LD_PRELOAD)
The most powerful approach - intercept at runtime without recompiling:
// mymalloc.c - run-time interposition
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>
// Function pointers to the original malloc and free
static void *(*real_malloc)(size_t) = NULL;
static void (*real_free)(void *) = NULL;
// Per-thread re-entrancy guard: fprintf may itself call malloc or free
static __thread int in_hook = 0;
// Initialize pointers to real functions
static void init(void) {
    real_malloc = dlsym(RTLD_NEXT, "malloc");
    real_free = dlsym(RTLD_NEXT, "free");
    if (!real_malloc || !real_free) {
        fprintf(stderr, "Error loading symbols: %s\n", dlerror());
        exit(1);
    }
}
// Interpose malloc
void *malloc(size_t size) {
    if (!real_malloc) init();
    void *ptr = real_malloc(size);
    if (!in_hook) { // Log only the outermost call to avoid infinite recursion
        in_hook = 1;
        fprintf(stderr, "malloc(%zu) = %p\n", size, ptr);
        in_hook = 0;
    }
    return ptr;
}
// Interpose free
void free(void *ptr) {
    if (!real_free) init();
    if (!in_hook) {
        in_hook = 1;
        fprintf(stderr, "free(%p)\n", ptr);
        in_hook = 0;
    }
    real_free(ptr);
}
# Build as shared library
gcc -fPIC -shared -o mymalloc.so mymalloc.c -ldl
# Use with any program - no recompilation needed!
LD_PRELOAD=./mymalloc.so ls
LD_PRELOAD=./mymalloc.so /bin/cat file.txt
LD_PRELOAD Search Order
+===========================================================================+
| DYNAMIC LINKER SYMBOL SEARCH ORDER |
+===========================================================================+
| |
| When resolving a symbol, the dynamic linker searches: |
| |
| 1. Executable itself |
| 2. LD_PRELOAD libraries (ahead of every other shared library!) |
| 3. DT_NEEDED libraries in order |
| 4. Libraries loaded by DT_NEEDED libraries (BFS) |
| |
| Example: |
| LD_PRELOAD=mymalloc.so ./program |
| |
| Search for "malloc": |
| 1. ./program - does not define malloc, keep looking |
| 2. mymalloc.so - FOUND! Uses this malloc |
| 3. (Never reaches libc.so.6 which has the "real" malloc) |
| |
+===========================================================================+
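"First definition in scope order wins" is the whole mechanism behind preloading, and it can be modelled in a few lines. The sketch below uses hypothetical names (`struct object`, `first_definer`) and takes the scope as an ordered array supplied by the caller; on glibc the global scope starts with the executable, followed by preloaded libraries, then the dependencies.

```c
#include <stddef.h>
#include <string.h>

struct object {
    const char *name;               /* e.g. "mymalloc.so" */
    const char *const *exports;     /* NULL-terminated list of defined symbols */
};

/* Walks the scope in order and returns the name of the first object that
 * defines `sym`, or NULL if no object in the scope defines it. */
const char *first_definer(const struct object *scope, size_t n, const char *sym)
{
    for (size_t i = 0; i < n; i++)
        for (const char *const *e = scope[i].exports; *e != NULL; e++)
            if (strcmp(*e, sym) == 0)
                return scope[i].name;
    return NULL;
}
```

With a scope of `./program`, `mymalloc.so`, `libc.so.6`, a lookup of `malloc` stops at `mymalloc.so`, while `printf` still resolves to `libc.so.6`. Starting the walk just after the current object is, in spirit, what `RTLD_NEXT` asks for.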
RTLD_NEXT Magic
// dlsym(RTLD_NEXT, "symbol") returns the NEXT definition of symbol
// after the current library in the search order
// In our malloc wrapper:
void *malloc(size_t size) {
    // RTLD_NEXT skips mymalloc.so and finds libc's malloc
    void *(*real_malloc)(size_t) = dlsym(RTLD_NEXT, "malloc");
    void *ptr = real_malloc(size); // Call the real one
    fprintf(stderr, "malloc(%zu) = %p\n", size, ptr); // Log to stderr
    return ptr;
}
Security Implications
Linking mechanisms have significant security implications:
RELRO (RELocation Read-Only)
# Partial RELRO (default): .got is read-only, .got.plt is writable
gcc -Wl,-z,relro -o program main.c
# Full RELRO: All GOT entries read-only after binding
gcc -Wl,-z,relro,-z,now -o program main.c
Partial RELRO:
┌─────────────┐
│ .got │ Read-Only (non-PLT globals)
├─────────────┤
│ .got.plt │ Writable (function pointers - can be hijacked!)
└─────────────┘
Full RELRO:
┌─────────────┐
│ .got │ Read-Only
├─────────────┤
│ .got.plt │ Read-Only (resolved at startup)
└─────────────┘
GOT/PLT Attacks and Defenses
ATTACK: Overwrite GOT entry with address of malicious code
Next call to that function jumps to attacker's code
DEFENSE:
1. Full RELRO - GOT becomes read-only after startup
2. PIE + ASLR - Attacker can't predict GOT address
3. Stack canaries - Detect stack buffer overflows, a common first step toward a GOT overwrite
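An inspector can report which RELRO level a binary was linked with by looking at two things: whether a PT_GNU_RELRO program header exists, and whether the dynamic section requests immediate binding. The sketch below follows that common heuristic; it assumes the Linux `<elf.h>` header, and `relro_level` is a hypothetical helper name.

```c
#include <elf.h>
#include <stddef.h>

enum relro { RELRO_NONE, RELRO_PARTIAL, RELRO_FULL };

/* ph/phnum: program header table; dyn: DT_NULL-terminated dynamic array
 * (may be NULL for a static binary). */
enum relro relro_level(const Elf64_Phdr *ph, size_t phnum, const Elf64_Dyn *dyn)
{
    int has_relro = 0, bind_now = 0;

    for (size_t i = 0; i < phnum; i++)
        if (ph[i].p_type == PT_GNU_RELRO)
            has_relro = 1;               /* linker was given -z relro */

    for (; dyn != NULL && dyn->d_tag != DT_NULL; dyn++) {
        if (dyn->d_tag == DT_BIND_NOW)
            bind_now = 1;
        if (dyn->d_tag == DT_FLAGS && (dyn->d_un.d_val & DF_BIND_NOW))
            bind_now = 1;
        if (dyn->d_tag == DT_FLAGS_1 && (dyn->d_un.d_val & DF_1_NOW))
            bind_now = 1;                /* any of these means -z now */
    }

    if (!has_relro)
        return RELRO_NONE;
    return bind_now ? RELRO_FULL : RELRO_PARTIAL;
}
```

Immediate binding without a PT_GNU_RELRO segment is reported as `RELRO_NONE`, because the GOT is then resolved early but never remapped read-only.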
Project Specification
What You Will Build
A comprehensive toolkit that:
- ELF Inspector: Parse and display symbol tables, sections, and relocations from ELF files
- Link Map Analyzer: Show how symbols are resolved between object files
- PLT/GOT Tracer: Trace dynamic symbol resolution at runtime
- Interposition Toolkit: Demonstrate all three interposition techniques
Functional Requirements
Component 1: ELF Inspector (elfinspect)
# Basic usage
./elfinspect <elf-file>
# Options
./elfinspect --header hello.o # Show ELF header
./elfinspect --sections hello.o # List sections
./elfinspect --symbols hello.o # List symbols
./elfinspect --relocations hello.o # Show relocations
./elfinspect --dynamic /bin/ls # Show dynamic linking info
./elfinspect --all hello # Full analysis
Component 2: Link Map Analyzer (linkmap)
# Analyze how symbols resolve across files
./linkmap main.o helper.o libfoo.a
# Show what each symbol needs and provides
./linkmap --deps main.o helper.o
Component 3: PLT/GOT Tracer (pltrace)
# Trace PLT/GOT activity during program execution
./pltrace ./hello_world
# Output: When each symbol is resolved and its final address
Component 4: Interposition Demos
# Compile-time demo
make compiletime-demo
# Link-time demo
make linktime-demo
# Runtime demo
make runtime-demo
Example Output
ELF Inspector Output
$ ./elfinspect --all hello.o
=== ELF HEADER ===
Class: ELF64
Data: 2's complement, little endian
Type: REL (Relocatable file)
Machine: Advanced Micro Devices X86-64
Entry point: 0x0
Section headers: 13 sections at offset 0x2d8
Program headers: 0 entries
=== SECTION TABLE ===
[Nr] Name Type Address Offset Size Flags
[ 0] NULL 0x0 0x000000 0x0
[ 1] .text PROGBITS 0x0 0x000040 0x28 AX
[ 2] .rela.text RELA 0x0 0x000210 0x48 I
[ 3] .data PROGBITS 0x0 0x000068 0x0 WA
[ 4] .bss NOBITS 0x0 0x000068 0x0 WA
[ 5] .rodata PROGBITS 0x0 0x000068 0x0e A
[ 6] .comment PROGBITS 0x0 0x000076 0x27 MS
[ 7] .note.GNU-stack PROGBITS 0x0 0x00009d 0x0
[ 8] .eh_frame PROGBITS 0x0 0x0000a0 0x38 A
[ 9] .rela.eh_frame RELA 0x0 0x000258 0x18 I
[10] .symtab SYMTAB 0x0 0x0000d8 0x120
[11] .strtab STRTAB 0x0 0x0001f8 0x13
[12] .shstrtab STRTAB 0x0 0x000270 0x61
Flags: A=Alloc, W=Write, X=Execute, M=Merge, S=Strings, I=Info
=== SYMBOL TABLE (.symtab) ===
Num: Value Size Type Bind Vis Ndx Name
0: 0x0 0 NOTYPE LOCAL DEFAULT UND
1: 0x0 0 FILE LOCAL DEFAULT ABS hello.c
2: 0x0 0 SECTION LOCAL DEFAULT 1 .text
3: 0x0 0 SECTION LOCAL DEFAULT 3 .data
4: 0x0 0 SECTION LOCAL DEFAULT 4 .bss
5: 0x0 0 SECTION LOCAL DEFAULT 5 .rodata
6: 0x0 40 FUNC GLOBAL DEFAULT 1 main
7: 0x0 0 NOTYPE GLOBAL DEFAULT UND printf
Summary: 8 symbols (1 function, 1 undefined, 6 other)
=== RELOCATION TABLE (.rela.text) ===
Offset Type Symbol Addend
0x00000009 R_X86_64_PC32 .rodata -0x4
0x00000013 R_X86_64_PLT32 printf -0x4
RELOCATION EXPLANATION:
- At .text+0x09: Reference to string literal in .rodata (PC-relative)
- At .text+0x13: Call to printf via PLT (PC-relative to PLT entry)
Link Map Analysis Output
$ ./linkmap main.o helper.o -lm
=== SYMBOL DEPENDENCY ANALYSIS ===
main.o:
DEFINES: main (FUNC, GLOBAL)
REQUIRES: printf (libc.so.6)
helper (helper.o)
sin (libm.so.6)
helper.o:
DEFINES: helper (FUNC, GLOBAL)
helper_data (DATA, GLOBAL)
REQUIRES: malloc (libc.so.6)
free (libc.so.6)
=== RESOLUTION RESULT ===
Symbol Defined In Address
------ ---------- -------
main main.o 0x401126
helper helper.o 0x401168
helper_data helper.o 0x404020
printf libc.so.6 <runtime>
malloc libc.so.6 <runtime>
free libc.so.6 <runtime>
sin libm.so.6 <runtime>
All symbols resolved successfully.
PLT/GOT Trace Output
$ ./pltrace ./test_program
=== PLT/GOT RESOLUTION TRACE ===
PID: 12345
Executable: ./test_program
[LOAD] Program loaded at base address: 0x555555554000
[LOAD] libc.so.6 loaded at: 0x7ffff7dc2000
[LOAD] libm.so.6 loaded at: 0x7ffff7b9e000
[RESOLVE] First call to printf:
PLT entry: 0x555555555030
GOT entry: 0x555555558018
Before: 0x555555555036 (PLT+6)
After: 0x7ffff7e45040 (printf in libc.so.6)
Elapsed: 0.043ms
[RESOLVE] First call to malloc:
PLT entry: 0x555555555040
GOT entry: 0x555555558020
Before: 0x555555555046 (PLT+6)
After: 0x7ffff7e6e0f0 (malloc in libc.so.6)
Elapsed: 0.021ms
[CALL] printf called 5 more times (no resolution, direct jump)
[CALL] malloc called 3 more times (no resolution, direct jump)
=== SUMMARY ===
Total PLT calls: 10
Lazy resolutions: 2
Direct GOT jumps: 8
Interposition Demo Output
$ make runtime-demo
Building malloc tracer...
gcc -fPIC -shared -o malloc_trace.so malloc_trace.c -ldl
Running test program with interposition:
LD_PRELOAD=./malloc_trace.so ./test_program
=== MALLOC TRACE ===
[malloc_trace] malloc(24) = 0x55a3bc8f12a0 [from main+0x1a]
[malloc_trace] malloc(100) = 0x55a3bc8f12c0 [from main+0x2f]
[malloc_trace] malloc(50) = 0x55a3bc8f1330 [from helper+0x12]
[malloc_trace] free(0x55a3bc8f12c0) [from main+0x58]
[malloc_trace] free(0x55a3bc8f12a0) [from main+0x62]
[malloc_trace] free(0x55a3bc8f1330) [from helper+0x1f]
=== SUMMARY ===
Total allocations: 3
Total frees: 3
Peak memory: 174 bytes
Memory leaked: 0 bytes
Real World Outcome
When you complete this project, you will have a comprehensive ELF analysis toolkit. Here is exactly what running your tools will look like:
ELF Header Analysis
$ ./elfinspect --header /bin/ls
╔══════════════════════════════════════════════════════════════════╗
║ ELF HEADER ANALYSIS ║
║ /bin/ls ║
╚══════════════════════════════════════════════════════════════════╝
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
(ELF64, Little Endian, System V ABI)
Type: DYN (Shared object file) - Position Independent Executable
Machine: x86-64
Version: 1 (current)
Entry point: 0x6ab0
Program headers: 13 entries at offset 0x40 (56 bytes each)
Section headers: 31 entries at offset 0x22a78 (64 bytes each)
Flags: 0x0
Header size: 64 bytes
Section name string table: section 30
Symbol Table Analysis
$ ./elfinspect --symbols /bin/ls | head -30
╔══════════════════════════════════════════════════════════════════╗
║ SYMBOL TABLE ANALYSIS ║
╚══════════════════════════════════════════════════════════════════╝
.dynsym: 125 entries (dynamic symbols - used at runtime)
.symtab: [stripped - not present]
Dynamic Symbols:
─────────────────────────────────────────────────────────────────────
Num Value Size Type Bind Vis Ndx Name
─────────────────────────────────────────────────────────────────────
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __ctype_toupper_loc
2: 0000000000000000 0 FUNC GLOBAL DEFAULT UND getenv
3: 0000000000000000 0 FUNC GLOBAL DEFAULT UND sigprocmask
4: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __snprintf_chk
5: 0000000000000000 0 FUNC GLOBAL DEFAULT UND raise
6: 0000000000000000 0 FUNC GLOBAL DEFAULT UND free
7: 0000000000000000 0 FUNC GLOBAL DEFAULT UND abort
8: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __errno_location
9: 0000000000000000 0 FUNC GLOBAL DEFAULT UND strncmp
10: 0000000000000000 0 FUNC WEAK DEFAULT UND _ITM_deregisterTM...
...
Symbol Statistics:
FUNC: 89 (71.2%) NOTYPE: 15 (12.0%)
OBJECT: 18 (14.4%) TLS: 3 ( 2.4%)
GLOBAL: 95 (76.0%) WEAK: 20 (16.0%)
LOCAL: 10 ( 8.0%)
Undefined (UND): 78 (62.4%) - resolved at runtime from shared libraries
Relocation Analysis
$ ./elfmap --relocs /bin/ls
╔══════════════════════════════════════════════════════════════════╗
║ RELOCATION TABLE ANALYSIS ║
╚══════════════════════════════════════════════════════════════════╝
.rela.dyn: 192 entries (data relocations - resolved at load time)
.rela.plt: 102 entries (PLT relocations - lazy binding)
─────────────────────────────────────────────────────────────────────
.rela.dyn (Data Relocations):
─────────────────────────────────────────────────────────────────────
Offset Type Symbol + Addend
0000000022fc8 R_X86_64_RELATIVE +0x13f20
0000000022fd0 R_X86_64_RELATIVE +0x13ee0
0000000023050 R_X86_64_RELATIVE +0x13f10
0000000022f88 R_X86_64_GLOB_DAT __ctype_toupper_loc + 0
0000000022f90 R_X86_64_GLOB_DAT __ctype_b_loc + 0
0000000022f98 R_X86_64_GLOB_DAT optind + 0
...
─────────────────────────────────────────────────────────────────────
.rela.plt (PLT/GOT Relocations - Lazy Binding):
─────────────────────────────────────────────────────────────────────
Offset Type Symbol
0000000023088 R_X86_64_JUMP_SLOT getenv
0000000023090 R_X86_64_JUMP_SLOT sigprocmask
0000000023098 R_X86_64_JUMP_SLOT raise
00000000230a0 R_X86_64_JUMP_SLOT free
00000000230a8 R_X86_64_JUMP_SLOT abort
...
Relocation Statistics:
R_X86_64_RELATIVE: 180 (61.2%) - PIE base address fixups
R_X86_64_GLOB_DAT: 12 ( 4.1%) - global data pointers
R_X86_64_JUMP_SLOT: 102 (34.7%) - PLT function pointers
PLT/GOT Tracing
$ ./elfmap --pltgot ./test_program
╔══════════════════════════════════════════════════════════════════╗
║ PLT/GOT ANALYSIS ║
╚══════════════════════════════════════════════════════════════════╝
PLT (Procedure Linkage Table):
Address: 0x1060
Size: 256 bytes
Entries: 15 stubs
GOT (Global Offset Table):
Address: 0x3f70
Size: 168 bytes
Entries: 21 pointers
PLT → GOT Mapping:
─────────────────────────────────────────────────────────────────────
PLT Entry GOT Entry Symbol State
─────────────────────────────────────────────────────────────────────
0x1070 0x3f90 puts UNRESOLVED
0x1080 0x3f98 strlen UNRESOLVED
0x1090 0x3fa0 __libc_start_main UNRESOLVED
0x10a0 0x3fa8 malloc UNRESOLVED
0x10b0 0x3fb0 printf UNRESOLVED
...
PLT Stub Disassembly (printf@plt):
0x10b0: endbr64
0x10b4: bnd jmp QWORD PTR [rip+0x2ef5] # 0x3fb0 <printf@GLIBC_2.2.5>
0x10bb: nop DWORD PTR [rax+rax*1+0x0]
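The `# 0x3fb0` annotation in the disassembly is derived arithmetic: a RIP-relative operand is resolved against the address of the next instruction, not the current one. A minimal sketch of that calculation follows. `rip_relative_target` is a hypothetical helper name, and the 7-byte length is an assumption based on the `bnd jmp` spanning 0x10b4 to 0x10bb in the listing.

```c
#include <stdint.h>

/* Hypothetical helper: resolve a RIP-relative memory operand.
 * When the operand is evaluated, RIP already points at the NEXT instruction,
 * so the target is instruction address + instruction length + displacement. */
static uint64_t rip_relative_target(uint64_t insn_addr, unsigned insn_len,
                                    int32_t disp)
{
    return insn_addr + insn_len + (uint64_t)(int64_t)disp;
}
```

Feeding it the values from the stub (`0x10b4`, length 7, displacement `0x2ef5`) gives the GOT slot address `0x3fb0` listed in the PLT → GOT mapping.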
Library Interposition Demo
$ cat > malloc_trace.c << 'EOF'
#define _GNU_SOURCE
#include <stdio.h>
#include <dlfcn.h>
void *malloc(size_t size) {
static void *(*real_malloc)(size_t) = NULL;
if (!real_malloc) real_malloc = dlsym(RTLD_NEXT, "malloc");
void *ptr = real_malloc(size);
fprintf(stderr, "malloc(%zu) = %p\n", size, ptr);
return ptr;
}
EOF
$ gcc -shared -fPIC -o libmalloc_trace.so malloc_trace.c -ldl
$ LD_PRELOAD=./libmalloc_trace.so ls
malloc(472) = 0x5555557a3010
malloc(120) = 0x5555557a31f0
malloc(1024) = 0x5555557a3270
malloc(13) = 0x5555557a3680
...
Desktop Documents Downloads Pictures test_program
Interposition Comparison
$ ./interpose_demo
╔══════════════════════════════════════════════════════════════════╗
║ LIBRARY INTERPOSITION DEMONSTRATION ║
╚══════════════════════════════════════════════════════════════════╝
Testing three interposition techniques with malloc():
1. COMPILE-TIME (wrapper function):
─────────────────────────────────────────────────────────────
Technique: #define malloc(s) my_malloc(s)
Pros: Zero runtime overhead, catches all calls in our code
Cons: Requires source access, doesn't affect libraries
Result: malloc(1024) -> my_malloc captured, real_malloc returned 0x12340000
2. LINK-TIME (--wrap flag):
─────────────────────────────────────────────────────────────
Technique: gcc -Wl,--wrap,malloc
Pros: No source changes needed, can wrap any symbol
Cons: Only rewrites references in the objects being linked (not calls made inside shared libraries), must relink
Build: gcc -Wl,--wrap,malloc -o prog prog.o wrap.o
Result: __wrap_malloc called, forwarded to __real_malloc
3. RUN-TIME (LD_PRELOAD):
─────────────────────────────────────────────────────────────
Technique: LD_PRELOAD=./libhook.so
Pros: Works on any binary, no recompilation
Cons: Only affects dynamically linked calls, slight overhead
Result: Interposed malloc() called 47 times during program execution
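As a rough illustration of technique 1, the sketch below wraps `malloc` with a macro inside a single translation unit. The names `my_malloc`, `my_malloc_calls`, and `demo_compile_time` are hypothetical, and the extra `__FILE__`/`__LINE__` arguments are an addition for the demo, not part of the technique line shown above.

```c
#include <stdio.h>
#include <stdlib.h>

static size_t my_malloc_calls = 0;   /* hypothetical counter for the demo */

/* The wrapper is defined BEFORE the macro, so the malloc() call inside it
 * still refers to the real libc function. */
static void *my_malloc(size_t size, const char *file, int line)
{
    void *ptr = malloc(size);
    my_malloc_calls++;
    fprintf(stderr, "%s:%d: malloc(%zu) = %p\n", file, line, size, ptr);
    return ptr;
}

/* From here on, every textual malloc(...) in this file is rewritten. */
#define malloc(s) my_malloc((s), __FILE__, __LINE__)

static int demo_compile_time(void)
{
    char *buf = malloc(32);   /* expands to my_malloc((32), __FILE__, __LINE__) */
    if (!buf) return -1;
    free(buf);
    return 0;
}
```

The macro only affects code compiled after it, which is why this approach cannot see allocations made inside already-compiled libraries.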
Solution Architecture
High-Level Design
+===========================================================================+
| ELF LINK MAP TOOLKIT |
+===========================================================================+
| |
| ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ ┌───────────┐ |
| │ elfinspect │ │ linkmap │ │ pltrace │ │ interpose │ |
| │ (ELF Parser) │ │ (Analyzer) │ │ (Tracer) │ │ (Demos) │ |
| └───────┬───────┘ └───────┬───────┘ └───────┬───────┘ └─────┬─────┘ |
| │ │ │ │ |
| └──────────────────┼──────────────────┼────────────────┘ |
| │ │ |
| ▼ ▼ |
| ┌───────────────────────────────────┐ |
| │ Common Library │ |
| │ │ |
| │ elf_parser.c - ELF reading │ |
| │ symbol.c - Symbol handling │ |
| │ reloc.c - Relocations │ |
| │ format.c - Output format │ |
| │ util.c - Utilities │ |
| │ │ |
| └───────────────────────────────────┘ |
| |
+===========================================================================+
Key Data Structures
// elf_types.h
#include <elf.h>
// Parsed ELF file representation
typedef struct {
int fd; // File descriptor
void *map; // Memory-mapped file
size_t size; // File size
Elf64_Ehdr *ehdr; // ELF header
Elf64_Shdr *shdrs; // Section headers
Elf64_Phdr *phdrs; // Program headers (if any)
char *shstrtab; // Section name string table
char *strtab; // Symbol string table
char *dynstr; // Dynamic string table
Elf64_Sym *symtab; // Symbol table
size_t symtab_count;
Elf64_Sym *dynsym; // Dynamic symbol table
size_t dynsym_count;
Elf64_Rela *rela_text; // .rela.text relocations
size_t rela_text_count;
Elf64_Rela *rela_plt; // .rela.plt relocations
size_t rela_plt_count;
Elf64_Dyn *dynamic; // .dynamic section
size_t dynamic_count;
} ElfFile;
// Symbol with resolved information
typedef struct {
char *name;
Elf64_Addr value;
Elf64_Xword size;
unsigned char type;
unsigned char bind;
uint16_t shndx;
const char *section_name;
int is_defined;
} Symbol;
// Relocation with resolved information
typedef struct {
Elf64_Addr offset;
uint32_t type;
uint32_t sym_idx;
char *symbol_name;
Elf64_Sxword addend;
const char *type_name;
char *explanation;
} Relocation;
Module Structure
elf-toolkit/
├── include/
│ ├── elf_parser.h # ELF parsing functions
│ ├── symbol.h # Symbol handling
│ ├── reloc.h # Relocation handling
│ ├── format.h # Output formatting
│ └── util.h # Utilities
│
├── src/
│ ├── common/
│ │ ├── elf_parser.c # Core ELF parsing
│ │ ├── symbol.c # Symbol table operations
│ │ ├── reloc.c # Relocation processing
│ │ ├── format.c # Pretty-printing
│ │ └── util.c # Memory, error handling
│ │
│ ├── elfinspect/
│ │ └── main.c # elfinspect entry point
│ │
│ ├── linkmap/
│ │ ├── main.c # linkmap entry point
│ │ └── resolver.c # Symbol resolution logic
│ │
│ ├── pltrace/
│ │ ├── main.c # pltrace entry point
│ │ └── tracer.c # PLT/GOT tracing
│ │
│ └── interpose/
│ ├── compile_time.c # Compile-time wrapper
│ ├── link_time.c # Link-time wrapper
│ └── runtime.c # LD_PRELOAD library
│
├── tests/
│ ├── test_elf_parser.c
│ ├── test_symbols.c
│ ├── test_reloc.c
│ ├── samples/ # Test ELF files
│ │ ├── hello.c
│ │ ├── multifile/
│ │ └── dynamic/
│ └── expected/ # Expected outputs
│
├── Makefile
└── README.md
Algorithm Overview
ELF Parsing Algorithm
1. Open file and memory-map it
2. Validate ELF magic number and class (32/64 bit)
3. Parse ELF header to get section header location
4. Load section headers into array
5. Find .shstrtab section (section name strings)
6. For each section:
- If .symtab: Parse symbol table, find .strtab
- If .dynsym: Parse dynamic symbols, find .dynstr
- If .rela.*: Parse relocation entries
- If .dynamic: Parse dynamic section entries
7. Cross-reference symbols with sections and strings
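Step 2 can be sketched in isolation, before any header fields are trusted. The helper name `elf_ident_class` is an assumption for illustration. It works on a raw byte buffer so it can be tested without a file.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper for step 2: returns ELFCLASS32 or ELFCLASS64 for a
 * plausible e_ident, or ELFCLASSNONE if the buffer is not usable. */
static int elf_ident_class(const unsigned char *buf, size_t len)
{
    if (len < EI_NIDENT) return ELFCLASSNONE;                  /* too short  */
    if (memcmp(buf, ELFMAG, SELFMAG) != 0) return ELFCLASSNONE; /* bad magic */
    if (buf[EI_DATA] != ELFDATA2LSB && buf[EI_DATA] != ELFDATA2MSB)
        return ELFCLASSNONE;                                   /* byte order */
    if (buf[EI_CLASS] == ELFCLASS32 || buf[EI_CLASS] == ELFCLASS64)
        return buf[EI_CLASS];
    return ELFCLASSNONE;
}
```

Checking the length first matters: a truncated file would otherwise be read past its end as soon as `e_ident` is indexed.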
Symbol Resolution Algorithm
For each input object file:
Collect defined symbols (GLOBAL or WEAK, shndx != UND)
Collect undefined symbols (shndx == UND)
Build global symbol table:
For each file:
For each defined symbol:
If already in table:
Check for multiple strong definitions (error)
Apply strong/weak resolution rules
Else:
Add to table with source file
For each undefined symbol:
Search global table for definition
If not found:
Search libraries in order
If still not found: Report unresolved
Implementation Guide
Development Environment Setup
# Install required tools
sudo apt-get install build-essential binutils elfutils libelf-dev
# Verify installations
readelf --version
objdump --version
# Create project structure
mkdir -p elf-toolkit/{include,src/{common,elfinspect,linkmap,pltrace,interpose},tests/samples}
cd elf-toolkit
Phase 1: ELF Parser Foundation (Days 1-4)
Goals:
- Memory-map ELF files
- Parse ELF header
- Navigate section headers
Implementation:
// src/common/elf_parser.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <elf.h>
#include "elf_parser.h"
ElfFile *elf_open(const char *path) {
ElfFile *elf = calloc(1, sizeof(ElfFile));
if (!elf) return NULL;
// Open file
elf->fd = open(path, O_RDONLY);
if (elf->fd < 0) {
perror("open");
free(elf);
return NULL;
}
// Get file size
struct stat st;
if (fstat(elf->fd, &st) < 0) {
perror("fstat");
close(elf->fd);
free(elf);
return NULL;
}
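elf->size = st.st_size;
// Reject files too small to hold an ELF header before reading from the mapping
if (elf->size < sizeof(Elf64_Ehdr)) {
fprintf(stderr, "File too small to be an ELF file\n");
close(elf->fd);
free(elf);
return NULL;
}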
// Memory map the file
elf->map = mmap(NULL, elf->size, PROT_READ, MAP_PRIVATE, elf->fd, 0);
if (elf->map == MAP_FAILED) {
perror("mmap");
close(elf->fd);
free(elf);
return NULL;
}
// Validate ELF magic
unsigned char *ident = (unsigned char *)elf->map;
if (ident[0] != 0x7f || ident[1] != 'E' ||
ident[2] != 'L' || ident[3] != 'F') {
fprintf(stderr, "Not an ELF file\n");
elf_close(elf);
return NULL;
}
// Check class (32 vs 64 bit)
if (ident[EI_CLASS] != ELFCLASS64) {
fprintf(stderr, "Only 64-bit ELF supported\n");
elf_close(elf);
return NULL;
}
// Parse ELF header
elf->ehdr = (Elf64_Ehdr *)elf->map;
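// Get section headers (validate that the table lies inside the file first)
if (elf->ehdr->e_shoff == 0 || elf->ehdr->e_shnum == 0 ||
elf->ehdr->e_shstrndx >= elf->ehdr->e_shnum ||
elf->ehdr->e_shoff > elf->size ||
elf->ehdr->e_shnum * sizeof(Elf64_Shdr) > elf->size - elf->ehdr->e_shoff) {
fprintf(stderr, "Missing or out-of-bounds section header table\n");
elf_close(elf);
return NULL;
}
elf->shdrs = (Elf64_Shdr *)((char *)elf->map + elf->ehdr->e_shoff);
// Get section name string table
Elf64_Shdr *shstrtab_hdr = &elf->shdrs[elf->ehdr->e_shstrndx];
elf->shstrtab = (char *)elf->map + shstrtab_hdr->sh_offset;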
// Get program headers (if present)
if (elf->ehdr->e_phnum > 0) {
elf->phdrs = (Elf64_Phdr *)((char *)elf->map + elf->ehdr->e_phoff);
}
return elf;
}
void elf_close(ElfFile *elf) {
if (!elf) return;
if (elf->map && elf->map != MAP_FAILED) {
munmap(elf->map, elf->size);
}
if (elf->fd >= 0) {
close(elf->fd);
}
free(elf);
}
const char *elf_section_name(ElfFile *elf, int idx) {
if (idx < 0 || idx >= elf->ehdr->e_shnum) return NULL;
return elf->shstrtab + elf->shdrs[idx].sh_name;
}
Elf64_Shdr *elf_find_section(ElfFile *elf, const char *name) {
for (int i = 0; i < elf->ehdr->e_shnum; i++) {
if (strcmp(elf_section_name(elf, i), name) == 0) {
return &elf->shdrs[i];
}
}
return NULL;
}
Checkpoint: Parse hello.o and print ELF header fields correctly.
Phase 2: Symbol Table Parsing (Days 5-7)
Goals:
- Parse .symtab and .dynsym
- Resolve symbol names from string tables
- Categorize symbols by type and binding
Implementation:
// src/common/symbol.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <elf.h>
#include "elf_parser.h"
#include "symbol.h"
int elf_load_symbols(ElfFile *elf) {
// Find .symtab section
Elf64_Shdr *symtab_hdr = elf_find_section(elf, ".symtab");
if (symtab_hdr) {
elf->symtab = (Elf64_Sym *)((char *)elf->map + symtab_hdr->sh_offset);
elf->symtab_count = symtab_hdr->sh_size / sizeof(Elf64_Sym);
// Get associated string table
Elf64_Shdr *strtab_hdr = &elf->shdrs[symtab_hdr->sh_link];
elf->strtab = (char *)elf->map + strtab_hdr->sh_offset;
}
// Find .dynsym section
Elf64_Shdr *dynsym_hdr = elf_find_section(elf, ".dynsym");
if (dynsym_hdr) {
elf->dynsym = (Elf64_Sym *)((char *)elf->map + dynsym_hdr->sh_offset);
elf->dynsym_count = dynsym_hdr->sh_size / sizeof(Elf64_Sym);
// Get associated string table
Elf64_Shdr *dynstr_hdr = &elf->shdrs[dynsym_hdr->sh_link];
elf->dynstr = (char *)elf->map + dynstr_hdr->sh_offset;
}
return 0;
}
const char *symbol_binding_name(unsigned char bind) {
switch (bind) {
case STB_LOCAL: return "LOCAL";
case STB_GLOBAL: return "GLOBAL";
case STB_WEAK: return "WEAK";
default: return "UNKNOWN";
}
}
const char *symbol_type_name(unsigned char type) {
switch (type) {
case STT_NOTYPE: return "NOTYPE";
case STT_OBJECT: return "OBJECT";
case STT_FUNC: return "FUNC";
case STT_SECTION: return "SECTION";
case STT_FILE: return "FILE";
case STT_COMMON: return "COMMON";
case STT_TLS: return "TLS";
default: return "UNKNOWN";
}
}
void print_symbol_table(ElfFile *elf, int use_dynamic) {
Elf64_Sym *symtab = use_dynamic ? elf->dynsym : elf->symtab;
size_t count = use_dynamic ? elf->dynsym_count : elf->symtab_count;
char *strtab = use_dynamic ? elf->dynstr : elf->strtab;
if (!symtab || count == 0) {
printf("No symbol table found.\n");
return;
}
printf("\n=== SYMBOL TABLE (%s) ===\n",
use_dynamic ? ".dynsym" : ".symtab");
printf("%6s: %-16s %5s %-7s %-6s %-8s %3s %s\n",
"Num", "Value", "Size", "Type", "Bind", "Vis", "Ndx", "Name");
for (size_t i = 0; i < count; i++) {
Elf64_Sym *sym = &symtab[i];
const char *name = strtab + sym->st_name;
printf("%6zu: %016lx %5lu %-7s %-6s %-8s ",
i,
(unsigned long)sym->st_value,
(unsigned long)sym->st_size,
symbol_type_name(ELF64_ST_TYPE(sym->st_info)),
symbol_binding_name(ELF64_ST_BIND(sym->st_info)),
"DEFAULT"); // Simplified visibility
// Print section index
if (sym->st_shndx == SHN_UNDEF) {
printf("%3s ", "UND");
} else if (sym->st_shndx == SHN_ABS) {
printf("%3s ", "ABS");
} else if (sym->st_shndx == SHN_COMMON) {
printf("%3s ", "COM");
} else {
printf("%3d ", sym->st_shndx);
}
printf("%s\n", name);
}
}
Checkpoint: Display symbol table matching readelf -s output.
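The listing above prints a hard-coded "DEFAULT" for visibility. To match `readelf -s` on symbols that are hidden or protected, the value can be decoded from `st_other`. This is a minimal sketch; `symbol_visibility_name` is a hypothetical name chosen to mirror the helpers above.

```c
#include <elf.h>

/* Hypothetical companion to symbol_binding_name(): visibility lives in the
 * low two bits of st_other and is extracted with ELF64_ST_VISIBILITY(). */
static const char *symbol_visibility_name(unsigned char st_other)
{
    switch (ELF64_ST_VISIBILITY(st_other)) {
    case STV_DEFAULT:   return "DEFAULT";
    case STV_INTERNAL:  return "INTERNAL";
    case STV_HIDDEN:    return "HIDDEN";
    case STV_PROTECTED: return "PROTECTED";
    default:            return "UNKNOWN";
    }
}
```

In `print_symbol_table`, the literal `"DEFAULT"` argument could then be replaced by `symbol_visibility_name(sym->st_other)`.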
Phase 3: Relocation Handling (Days 8-10)
Goals:
- Parse relocation sections
- Explain each relocation type
- Show what needs patching
Implementation:
// src/common/reloc.c
#include <stdio.h>
#include <string.h>
#include <elf.h>
#include "elf_parser.h"
#include "reloc.h"
typedef struct {
uint32_t type;
const char *name;
const char *formula;
} RelocInfo;
static const RelocInfo reloc_types[] = {
{R_X86_64_NONE, "R_X86_64_NONE", "None"},
{R_X86_64_64, "R_X86_64_64", "S + A"},
{R_X86_64_PC32, "R_X86_64_PC32", "S + A - P"},
{R_X86_64_GOT32, "R_X86_64_GOT32", "G + A"},
{R_X86_64_PLT32, "R_X86_64_PLT32", "L + A - P"},
{R_X86_64_COPY, "R_X86_64_COPY", "Copy symbol"},
{R_X86_64_GLOB_DAT, "R_X86_64_GLOB_DAT", "S (GOT entry)"},
{R_X86_64_JUMP_SLOT, "R_X86_64_JUMP_SLOT", "S (PLT/GOT)"},
{R_X86_64_RELATIVE, "R_X86_64_RELATIVE", "B + A"},
{R_X86_64_GOTPCREL, "R_X86_64_GOTPCREL", "G + GOT + A - P"},
{0, NULL, NULL}
};
const char *reloc_type_name(uint32_t type) {
for (int i = 0; reloc_types[i].name; i++) {
if (reloc_types[i].type == type)
return reloc_types[i].name;
}
return "UNKNOWN";
}
const char *reloc_formula(uint32_t type) {
for (int i = 0; reloc_types[i].name; i++) {
if (reloc_types[i].type == type)
return reloc_types[i].formula;
}
return "?";
}
int elf_load_relocations(ElfFile *elf) {
// Find .rela.text
Elf64_Shdr *rela_text = elf_find_section(elf, ".rela.text");
if (rela_text) {
elf->rela_text = (Elf64_Rela *)((char *)elf->map + rela_text->sh_offset);
elf->rela_text_count = rela_text->sh_size / sizeof(Elf64_Rela);
}
// Find .rela.plt
Elf64_Shdr *rela_plt = elf_find_section(elf, ".rela.plt");
if (rela_plt) {
elf->rela_plt = (Elf64_Rela *)((char *)elf->map + rela_plt->sh_offset);
elf->rela_plt_count = rela_plt->sh_size / sizeof(Elf64_Rela);
}
return 0;
}
void print_relocations(ElfFile *elf, const char *section_name) {
Elf64_Rela *rela;
size_t count;
if (strcmp(section_name, ".rela.text") == 0) {
rela = elf->rela_text;
count = elf->rela_text_count;
} else if (strcmp(section_name, ".rela.plt") == 0) {
rela = elf->rela_plt;
count = elf->rela_plt_count;
} else {
return;
}
if (!rela || count == 0) {
printf("No relocations in %s\n", section_name);
return;
}
printf("\n=== RELOCATION TABLE (%s) ===\n", section_name);
printf("%-16s %-20s %-20s %s\n",
"Offset", "Type", "Symbol", "Addend");
for (size_t i = 0; i < count; i++) {
uint32_t sym_idx = ELF64_R_SYM(rela[i].r_info);
uint32_t type = ELF64_R_TYPE(rela[i].r_info);
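const char *sym_name = "";
// .rela.plt indexes .dynsym; .rela.text (in object files) indexes .symtab
int use_dyn = strcmp(section_name, ".rela.plt") == 0;
Elf64_Sym *syms = use_dyn ? elf->dynsym : elf->symtab;
size_t nsyms = use_dyn ? elf->dynsym_count : elf->symtab_count;
const char *names = use_dyn ? elf->dynstr : elf->strtab;
if (syms && names && sym_idx < nsyms) {
sym_name = names + syms[sym_idx].st_name;
}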
printf("%016lx %-20s %-20s %ld\n",
(unsigned long)rela[i].r_offset,
reloc_type_name(type),
sym_name,
(long)rela[i].r_addend);
}
// Print explanation
printf("\nExplanation:\n");
for (size_t i = 0; i < count; i++) {
uint32_t type = ELF64_R_TYPE(rela[i].r_info);
printf(" - Offset 0x%lx: %s (%s)\n",
(unsigned long)rela[i].r_offset,
reloc_type_name(type),
reloc_formula(type));
}
}
Checkpoint: Show relocations with human-readable explanations.
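To make the formula column concrete, the sketch below evaluates two of the formulas from the table, with S the symbol address, A the addend, and P the address of the field being patched. `reloc_value` is a hypothetical helper and handles only these two types.

```c
#include <elf.h>
#include <stdint.h>

/* Hypothetical helper: compute the value to store for a relocation.
 * Returns 0 on success, -1 for types this sketch does not handle. */
static int reloc_value(uint32_t type, uint64_t S, int64_t A, uint64_t P,
                       uint64_t *out)
{
    switch (type) {
    case R_X86_64_64:   *out = S + (uint64_t)A;     return 0;  /* S + A     */
    case R_X86_64_PC32: *out = S + (uint64_t)A - P; return 0;  /* S + A - P */
    default:            return -1;
    }
}
```

For example, a `call` whose 4-byte operand sits at 0x112a, targeting a function at 0x1136 with addend -4, gets the displacement 0x1136 - 4 - 0x112a = 0x8. A real linker would also truncate PC32 results to 32 bits and check for overflow, which this sketch omits.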
Phase 4: Link Map Analyzer (Days 11-13)
Goals:
- Analyze multiple object files together
- Show symbol dependencies
- Simulate resolution
// src/linkmap/resolver.c
typedef struct {
char *name;
char *source_file;
Elf64_Addr value;
int is_strong; // Strong vs weak
int is_defined;
} GlobalSymbol;
typedef struct {
GlobalSymbol *symbols;
size_t count;
size_t capacity;
} GlobalSymTable;
int resolve_symbols(const char **files, int file_count) {
GlobalSymTable global_table = {0};
// Phase 1: Collect all symbols from all files
for (int i = 0; i < file_count; i++) {
ElfFile *elf = elf_open(files[i]);
if (!elf) continue;
elf_load_symbols(elf);
for (size_t j = 0; j < elf->symtab_count; j++) {
Elf64_Sym *sym = &elf->symtab[j];
unsigned char bind = ELF64_ST_BIND(sym->st_info);
// Only process global and weak symbols
if (bind != STB_GLOBAL && bind != STB_WEAK) continue;
const char *name = elf->strtab + sym->st_name;
if (!name || !*name) continue;
int is_defined = sym->st_shndx != SHN_UNDEF;
int is_strong = bind == STB_GLOBAL && is_defined;
// Add to global table with resolution rules.
// add_symbol() must copy name: it points into the mapping that elf_close() unmaps below.
add_symbol(&global_table, name, files[i],
sym->st_value, is_strong, is_defined);
}
elf_close(elf);
}
// Phase 2: Check for unresolved symbols
for (size_t i = 0; i < global_table.count; i++) {
GlobalSymbol *sym = &global_table.symbols[i];
if (!sym->is_defined) {
printf("UNRESOLVED: %s (needed by %s)\n",
sym->name, sym->source_file);
}
}
return 0;
}
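The listing calls `add_symbol()` without defining it. One possible shape is sketched below, assuming the usual rules: two strong definitions conflict, a strong definition beats a weak one, and the first weak definition is kept among several. The return codes and the `dup_string` helper are assumptions for illustration. The struct definitions are repeated only so the block compiles on its own.

```c
#include <elf.h>
#include <stdlib.h>
#include <string.h>

/* Same layout as the structs in resolver.c, repeated for self-containment. */
typedef struct {
    char *name;
    char *source_file;
    Elf64_Addr value;
    int is_strong;
    int is_defined;
} GlobalSymbol;

typedef struct {
    GlobalSymbol *symbols;
    size_t count;
    size_t capacity;
} GlobalSymTable;

/* Hypothetical helper: heap copy of a string (the caller's buffer may vanish). */
static char *dup_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy) memcpy(copy, s, n);
    return copy;
}

/* Returns 0 on success, 1 on a duplicate strong definition, -1 on OOM. */
static int add_symbol(GlobalSymTable *t, const char *name, const char *file,
                      Elf64_Addr value, int is_strong, int is_defined)
{
    for (size_t i = 0; i < t->count; i++) {
        GlobalSymbol *s = &t->symbols[i];
        if (strcmp(s->name, name) != 0) continue;

        if (!is_defined) return 0;               /* another reference: no change */
        if (s->is_defined && s->is_strong && is_strong) return 1;   /* conflict */
        if (s->is_defined && (s->is_strong || !is_strong)) return 0; /* keep old */

        /* New definition wins: it replaces an undefined or weak entry. */
        char *src = dup_string(file);
        if (!src) return -1;
        free(s->source_file);
        s->source_file = src;
        s->value = value;
        s->is_strong = is_strong;
        s->is_defined = 1;
        return 0;
    }

    if (t->count == t->capacity) {               /* grow geometrically */
        size_t cap = t->capacity ? t->capacity * 2 : 64;
        GlobalSymbol *p = realloc(t->symbols, cap * sizeof *p);
        if (!p) return -1;
        t->symbols = p;
        t->capacity = cap;
    }
    GlobalSymbol *s = &t->symbols[t->count];
    s->name = dup_string(name);
    s->source_file = dup_string(file);
    if (!s->name || !s->source_file) {
        free(s->name);
        free(s->source_file);
        return -1;
    }
    s->value = value;
    s->is_strong = is_strong;
    s->is_defined = is_defined;
    t->count++;
    return 0;
}
```

The linear scan keeps the sketch short. A hash table keyed by name would be the natural replacement once inputs grow beyond a few hundred symbols.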
Phase 5: PLT/GOT Tracer (Days 14-16)
Goals:
- Use ptrace or LD_AUDIT to trace PLT calls
- Show GOT resolution in real-time
This phase is more advanced - you can use either:
- LD_AUDIT: A less invasive approach using the audit interface
- ptrace: Full control but more complex
// src/pltrace/tracer.c - Using LD_AUDIT approach
// Create an audit library that logs resolutions
// rtld-audit.c (built into rtld-audit.so below)
#define _GNU_SOURCE
#include <link.h>
#include <stdio.h>
unsigned int la_version(unsigned int version) {
return version;
}
unsigned int la_objopen(struct link_map *map, Lmid_t lmid,
uintptr_t *cookie) {
fprintf(stderr, "[LOAD] %s at %p\n",
map->l_name, (void *)map->l_addr);
return LA_FLG_BINDTO | LA_FLG_BINDFROM;
}
uintptr_t la_symbind64(Elf64_Sym *sym, unsigned int ndx,
uintptr_t *refcook, uintptr_t *defcook,
unsigned int *flags, const char *symname) {
fprintf(stderr, "[BIND] %s -> %p\n",
symname, (void *)sym->st_value);
return sym->st_value;
}
# Build and use
gcc -fPIC -shared -o rtld-audit.so rtld-audit.c
LD_AUDIT=./rtld-audit.so ./test_program
Phase 6: Interposition Toolkit (Days 17-19)
Implement all three interposition techniques with demonstration programs.
Runtime Interposition (most important):
// src/interpose/runtime.c - Comprehensive malloc tracer
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>
#include <execinfo.h>
#include <pthread.h>
static void *(*real_malloc)(size_t) = NULL;
static void (*real_free)(void *) = NULL;
static void *(*real_realloc)(void *, size_t) = NULL;
static void *(*real_calloc)(size_t, size_t) = NULL;
static size_t total_allocated = 0;
static size_t total_freed = 0;
static size_t current_allocated = 0;
static size_t peak_allocated = 0;
static size_t alloc_count = 0;
static size_t free_count = 0;
static pthread_mutex_t stats_lock = PTHREAD_MUTEX_INITIALIZER;
static int initialized = 0;
static int in_init = 0;
static void init(void) {
if (initialized || in_init) return;
in_init = 1;
real_malloc = dlsym(RTLD_NEXT, "malloc");
real_free = dlsym(RTLD_NEXT, "free");
real_realloc = dlsym(RTLD_NEXT, "realloc");
real_calloc = dlsym(RTLD_NEXT, "calloc");
if (!real_malloc || !real_free) {
fprintf(stderr, "Error loading malloc/free: %s\n", dlerror());
_exit(1);
}
initialized = 1;
in_init = 0;
}
static void print_caller(void) {
void *bt[3];
int n = backtrace(bt, 3);
char **syms = backtrace_symbols(bt, n);
if (syms && n > 2) {
fprintf(stderr, " [from %s]", syms[2]);
}
free(syms);
}
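// Thread-local reentrancy guard: backtrace_symbols() and fprintf() may allocate,
// which would re-enter these hooks and recurse without bound.
static __thread int in_hook = 0;
void *malloc(size_t size) {
if (!initialized) init();
void *ptr = real_malloc(size);
if (in_hook) return ptr; // Nested call from our own logging: don't trace it
in_hook = 1;
pthread_mutex_lock(&stats_lock);
total_allocated += size;
current_allocated += size;
alloc_count++;
if (current_allocated > peak_allocated) {
peak_allocated = current_allocated;
}
pthread_mutex_unlock(&stats_lock);
fprintf(stderr, "[malloc_trace] malloc(%zu) = %p", size, ptr);
print_caller();
fprintf(stderr, "\n");
in_hook = 0;
return ptr;
}
void free(void *ptr) {
if (!initialized) init();
if (!ptr) return;
if (in_hook) { // Nested call (e.g. free(syms) in print_caller): just forward
real_free(ptr);
return;
}
in_hook = 1;
pthread_mutex_lock(&stats_lock);
free_count++;
// Note: We can't easily track the size of freed memory without extra bookkeeping,
// so current_allocated never shrinks and "peak" equals total bytes requested
pthread_mutex_unlock(&stats_lock);
fprintf(stderr, "[malloc_trace] free(%p)", ptr);
print_caller();
fprintf(stderr, "\n");
in_hook = 0;
real_free(ptr);
}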
__attribute__((destructor))
void print_stats(void) {
fprintf(stderr, "\n=== MALLOC TRACE SUMMARY ===\n");
fprintf(stderr, "Total allocations: %zu\n", alloc_count);
fprintf(stderr, "Total frees: %zu\n", free_count);
fprintf(stderr, "Bytes allocated: %zu\n", total_allocated);
fprintf(stderr, "Peak memory: %zu bytes\n", peak_allocated);
if (alloc_count > free_count) {
fprintf(stderr, "WARNING: Potential memory leak (%zu unfreed allocs)\n",
alloc_count - free_count);
}
}
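The comment in `free()` notes that freed sizes are not tracked, so the peak figure can only grow. One way to close that gap on glibc, sketched here as an assumption rather than as part of the tracer above, is to ask the allocator for each block's size with `malloc_usable_size()`. The counter and function names are hypothetical, and the reported size may exceed what the caller requested.

```c
#include <malloc.h>   /* malloc_usable_size: glibc extension, not ISO C */
#include <stddef.h>

static size_t live_bytes = 0;   /* hypothetical counters for this sketch */
static size_t peak_bytes = 0;

/* Call after a successful allocation. */
static void note_alloc(void *ptr)
{
    if (!ptr) return;
    live_bytes += malloc_usable_size(ptr);
    if (live_bytes > peak_bytes) peak_bytes = live_bytes;
}

/* Call BEFORE handing the pointer to the real free(). */
static void note_free(void *ptr)
{
    if (!ptr) return;
    live_bytes -= malloc_usable_size(ptr);
}
```

Inside the tracer, both calls would need to run while holding `stats_lock`, since these counters are not thread-safe on their own.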
Phase 7: Integration and Polish (Days 20-21)
Goals:
- Combine all tools
- Add comprehensive error handling
- Create demonstration scripts
Testing Strategy
Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | Test parsing functions | ELF header validation |
| Integration Tests | Test full tool output | Compare with readelf |
| Regression Tests | Ensure fixes don’t break | Known-good outputs |
| Edge Cases | Handle unusual inputs | Stripped binaries, malformed ELF |
Test Cases
// Test 1: Simple object file
// hello.c
#include <stdio.h>
int main() {
printf("Hello\n");
return 0;
}
// Expected: 1 undefined (printf, or puts when GCC rewrites the newline-terminated call), 1 defined (main)
// Test 2: Multiple symbols
// multi.c
int global_init = 42;
int global_uninit;
static int local_var = 10;
static void local_func(void) {}
void global_func(void) { local_func(); }
int main() { return global_init + global_uninit; }
// Expected: 4 global (global_init, global_uninit, global_func, main), 2 local, proper categorization
// Test 3: External dependencies
// extern.c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
int main() {
printf("%f\n", sin(1.0));
char *p = malloc(100);
free(p);
return 0;
}
// Expected: printf, sin, malloc, free as undefined (compile with -fno-builtin; otherwise GCC may constant-fold sin(1.0) and drop the reference)
Validation Against Standard Tools
# Compare your output with standard tools
./elfinspect --symbols test.o > my_output.txt
readelf -s test.o > readelf_output.txt
diff my_output.txt readelf_output.txt
# Verify relocations
./elfinspect --relocations test.o
readelf -r test.o
Common Pitfalls and Debugging
Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Wrong endianness | Garbage values | Check EI_DATA in ELF header |
| 32/64 bit confusion | Segfault | Check EI_CLASS, use correct structures |
| String table offset | Wrong symbol names | Verify sh_link field |
| Relocation addend | Wrong addresses | Use Rela not Rel on x86-64 |
| PLT vs GOT confusion | Wrong addresses traced | Study the PLT structure carefully |
| Thread safety in interposition | Crashes/deadlocks | Use thread-local storage or locks |
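For the endianness row, a common approach is to decode fields byte by byte according to `EI_DATA` instead of casting to the `Elf64_*` structs. The helper names below are hypothetical. This is a sketch of the idea, not a full reader.

```c
#include <elf.h>
#include <stdint.h>

/* Hypothetical helpers: decode a field according to the file's EI_DATA,
 * so the result does not depend on the host's own byte order. */
static uint16_t read_u16(const unsigned char *p, int ei_data)
{
    return ei_data == ELFDATA2MSB
        ? (uint16_t)((p[0] << 8) | p[1])
        : (uint16_t)((p[1] << 8) | p[0]);
}

static uint32_t read_u32(const unsigned char *p, int ei_data)
{
    return ei_data == ELFDATA2MSB
        ? ((uint32_t)read_u16(p, ei_data) << 16) | read_u16(p + 2, ei_data)
        : ((uint32_t)read_u16(p + 2, ei_data) << 16) | read_u16(p, ei_data);
}
```

With these, `e_machine` would be read as `read_u16(map + 18, ident[EI_DATA])`, since that field sits at byte offset 18 in both ELF32 and ELF64 headers.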
Debugging Strategies
# Compare with standard tools
readelf -a binary > reference.txt
./elfinspect --all binary > mine.txt
diff reference.txt mine.txt
# Hexdump to verify parsing
hexdump -C binary | head -100
# GDB for ELF parsing issues
gdb ./elfinspect
(gdb) break elf_open
(gdb) run test.o
(gdb) print *elf->ehdr
# ltrace for interposition issues
ltrace ./test_program 2>&1 | grep malloc
Common GOT/PLT Debugging
# Examine PLT entries
objdump -d -j .plt binary
# Examine GOT entries
objdump -s -j .got.plt binary
# Watch GOT changes with GDB
gdb ./program
(gdb) break main
(gdb) run
(gdb) x/gx &'printf@got.plt' # Before first call
(gdb) call printf("test\n")
(gdb) x/gx &'printf@got.plt' # After resolution
Extensions and Challenges
Beginner Extensions
- JSON output: Machine-readable output format
- Symbol search: Find symbol by name across files
- Section hexdump: Show raw bytes of any section
- Dependency graph: DOT format for visualization
Intermediate Extensions
- DWARF debug info: Parse .debug_* sections for source mapping
- Version scripts: Handle symbol versioning
- Weak symbol handling: Full weak/strong resolution
- Archive support: Handle .a static libraries
Advanced Extensions
- Binary patching: Modify GOT entries at runtime
- Full LD_PRELOAD profiler: Track all allocations with size
- Cross-architecture: Support ARM64 ELF files
- Security scanner: Check RELRO, stack canary, PIE
Real-World Connections
Industry Applications
| Application | How This Project Helps |
|---|---|
| Debugging | Understand symbol resolution failures |
| Profiling | Interpose to measure function timing |
| Security | Analyze binary protections |
| Reverse Engineering | Understand program structure |
| Build Systems | Debug linking issues |
| Containers | Understand dynamic library loading |
Related Tools
- ldd: List shared library dependencies
- nm: List symbols
- readelf: Display ELF file information
- objdump: Disassemble and display
- patchelf: Modify ELF files
- ltrace/strace: Trace library/system calls
- Ghidra/IDA: Advanced binary analysis
Interview Relevance
This project prepares you to answer:
- “Explain how dynamic linking works”
- “What happens when you call a function in a shared library?”
- “How would you intercept all malloc calls in a program?”
- “Explain the PLT and GOT”
- “How does LD_PRELOAD work?”
- “What is position-independent code?”
Resources
Essential Reading
- CS:APP Chapter 7: “Linking” - Core concepts
- “Linkers and Loaders” by John Levine - Definitive reference
- ELF Specification: Official format documentation
- System V ABI: x86-64 supplement for relocations
Documentation
- man elf: ELF format overview
- man dlopen: Dynamic loading API
- man rtld-audit: Runtime linker audit interface
- man ld.so: Dynamic linker documentation
Related Projects in This Series
- Previous: P9 (Cache Lab++) - Memory hierarchy understanding
- Foundation: P1 (Toolchain Explorer) - Basic linking concepts
- Next: P11 (Signals + Processes) - Process execution context
Self-Assessment Checklist
Understanding
- I can explain the ELF file structure (header, sections, segments)
- I understand the difference between .symtab and .dynsym
- I can explain each common relocation type and when it’s used
- I understand why PC-relative addressing is used in shared libraries
- I can trace through a PLT/GOT call step by step
- I understand lazy vs immediate binding
- I can explain all three interposition techniques
- I understand the security implications of GOT/PLT
Implementation
- My ELF parser correctly reads headers and sections
- Symbol table output matches readelf -s
- Relocation output matches readelf -r
- Link map analyzer identifies undefined symbols
- PLT/GOT tracer shows resolution events
- All three interposition demos work correctly
- Tools handle edge cases gracefully
Practical Skills
- I can debug linking errors using these tools
- I can profile a program using interposition
- I can explain why a symbol failed to resolve
- I can analyze a binary’s dynamic dependencies
- I can use readelf, objdump, nm, ldd fluently
The Core Question You’re Answering
“How does a collection of separately compiled object files become a running program, and how does the operating system resolve symbols across shared libraries at runtime?”
This project demystifies the “magic” that happens between compilation and execution. Understanding linking is essential for debugging mysterious undefined reference errors, creating plugins, hooking library calls for debugging or security work, and writing code that plays well with shared libraries.
Concepts You Must Understand First
Before starting this project, ensure you have a solid grasp of these foundational concepts:
| Concept | Where to Learn | Why It Matters |
|---|---|---|
| Compilation process (preprocessor, compiler, assembler) | CS:APP 7.1 | Understanding what object files contain |
| Virtual memory basics | CS:APP 9.1-9.3 | How program sections map to memory |
| C pointers and memory layout | CS:APP 3.8-3.9 | Parsing binary structures, pointer arithmetic |
| Hexadecimal and binary | CS:APP 2.1 | Reading ELF byte patterns |
| File I/O in C (fopen, fread, mmap) | K&R Ch. 8 | Reading binary files efficiently |
| Static vs dynamic libraries | CS:APP 7.6-7.7 | Why linking works differently for each |
| Position-Independent Code (PIC) | CS:APP 7.12 | How shared libraries can load anywhere |
| x86-64 calling conventions | CS:APP 3.7 | Understanding function calls in PLT |
Questions to Guide Your Design
Work through these questions before writing any code:
- File Mapping: Should you use read()/fread() or mmap() to access the ELF file? What are the tradeoffs for a tool that needs to jump around the file?
- Endianness: The ELF header tells you the file’s endianness. How will you handle reading multi-byte fields on a machine with different endianness?
- String Tables: Symbol names are stored as offsets into string tables. How will you safely convert an offset to a string pointer without buffer overflows?
- Section vs Segment: Sections are for the linker, segments are for the loader. When would you iterate sections vs segments?
- Symbol Resolution: Given an undefined symbol in your program, how would you find which shared library provides it? What data structures enable this lookup?
- GOT Modification: For runtime interposition, you might patch the GOT directly. What memory protection issues will you encounter? How can ptrace or /proc/pid/mem help?
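For the string-table question, one defensive pattern is to refuse any offset that does not lead to a NUL terminator inside the table. The sketch below assumes the caller knows the table's size from `sh_size`. `strtab_get` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return strtab + off only if the offset is in range
 * and a NUL terminator exists before the end of the table; else NULL. */
static const char *strtab_get(const char *strtab, size_t strtab_size, size_t off)
{
    if (!strtab || off >= strtab_size) return NULL;
    if (!memchr(strtab + off, '\0', strtab_size - off)) return NULL;
    return strtab + off;
}
```

Callers then treat a NULL result as a malformed file instead of printing whatever bytes follow the table.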
Thinking Exercise
Before coding, trace through what happens when this program runs:
// main.c
#include <stdio.h>
int main() {
printf("Hello\n");
printf("World\n");
return 0;
}
Compiled with: gcc -o hello main.c
Questions to answer by hand:
- When main calls printf the first time, what address does the call instruction target?
- What code executes at that address?
- How does the dynamic linker find printf in libc.so?
- What gets written to the GOT?
- When main calls printf the second time, what’s different?
Draw a diagram showing the PLT stub, GOT entry, and libc’s printf for both the first and second calls.
Solution
First call to printf:
- call printf@plt jumps to PLT stub at fixed offset (e.g., 0x1050)
- PLT stub: jmp *GOT[printf] - but GOT initially points back to PLT+6
- PLT stub pushes relocation index, jumps to PLT[0] (resolver)
- Resolver calls _dl_runtime_resolve(link_map, reloc_index)
- Dynamic linker searches loaded libraries for “printf” symbol
- Finds printf at 0x7ffff7a62840 in libc.so
- Patches GOT[printf] = 0x7ffff7a62840
- Jumps to printf
Second call to printf:
- call printf@plt jumps to same PLT stub
- PLT stub: jmp *GOT[printf] - now contains 0x7ffff7a62840
- Jumps directly to printf in libc - no resolver!
FIRST CALL:
  main:
    call 0x1050
        │
        ▼
  PLT[printf] (0x1050):
    jmp *GOT[printf]        GOT[printf] = 0x1056 (points back here, to the next instruction)
    push reloc_index        <- 0x1056
    jmp PLT[0]
        │
        ▼
  PLT[0]:
    push &link_map
    jmp _dl_runtime_resolve
        │
        ▼
    Searches libc.so for "printf"
    Patches GOT[printf] = &printf
    Jumps to printf

SECOND CALL:
  main:
    call 0x1050
        │
        ▼
  PLT[printf] (0x1050):
    jmp *GOT[printf]        GOT[printf] = 0x7fff...
        │
        ▼
  libc.so:
    printf()
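The lazy-binding flow traced above can be imitated in plain C with a function pointer standing in for the GOT slot. This is only a toy model under stated assumptions: `got_slot`, `plt_resolver_stub`, `call_via_plt`, and `real_double` are hypothetical names, and the real mechanism uses PLT machine code and the dynamic linker rather than C functions.

```c
#include <stdio.h>

typedef int (*fn_t)(int);

/* Stands in for the function that lives in the shared library. */
static int real_double(int x) { return 2 * x; }

static int resolver_calls;            /* how often the slow path ran */
static int plt_resolver_stub(int x);

/* The "GOT slot": initially routes to the resolver path, as GOT[printf]
 * initially points back into the PLT stub. */
static fn_t got_slot = plt_resolver_stub;

/* Models the push/jmp PLT[0]/_dl_runtime_resolve path: look up the target,
 * patch the slot, then transfer control to the target. */
static int plt_resolver_stub(int x)
{
    resolver_calls++;
    got_slot = real_double;           /* "Patches GOT[printf] = &printf" */
    return got_slot(x);               /* "Jumps to printf" */
}

/* Models the first PLT instruction: jmp *GOT[slot]. */
static int call_via_plt(int x)
{
    return got_slot(x);
}
```

Calling `call_via_plt` twice shows the same asymmetry as the diagram: the first call goes through the resolver and patches the slot, while the second call reaches `real_double` directly and `resolver_calls` stays at 1.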
Hints in Layers
Use these hints progressively if you get stuck.
Hint Layer 1: Getting Started
- Use mmap() to map the ELF file, then cast pointers: Elf64_Ehdr *ehdr = (Elf64_Ehdr *)mapped_file;
- Include <elf.h> for all the structure definitions and macros
- Start with just printing the ELF header fields; don’t try to parse everything at once
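As a starting point for this hint, the following sketch maps a file and prints a handful of header fields. The function name `print_elf_header` is an assumption for illustration, and the code assumes a 64-bit ELF file whose byte order matches the host; a complete tool would also handle ELFCLASS32 and foreign endianness.

```c
#include <elf.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map `path` read-only and print a few ELF header
 * fields. Returns 0 on success, -1 if the file cannot be mapped or is not
 * a 64-bit ELF file. */
int print_elf_header(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size < (off_t)sizeof(Elf64_Ehdr)) {
        close(fd);
        return -1;
    }

    size_t size = (size_t)st.st_size;
    void *map = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                        /* the mapping remains valid after close */
    if (map == MAP_FAILED)
        return -1;

    const Elf64_Ehdr *ehdr = map;
    int rc = -1;
    /* Check the magic bytes and class before trusting any other field. */
    if (memcmp(ehdr->e_ident, ELFMAG, SELFMAG) == 0 &&
        ehdr->e_ident[EI_CLASS] == ELFCLASS64) {
        printf("type=%u machine=%u entry=0x%llx shoff=%llu shnum=%u\n",
               (unsigned)ehdr->e_type, (unsigned)ehdr->e_machine,
               (unsigned long long)ehdr->e_entry,
               (unsigned long long)ehdr->e_shoff,
               (unsigned)ehdr->e_shnum);
        rc = 0;
    }

    munmap(map, size);
    return rc;
}
```

On Linux, `print_elf_header("/proc/self/exe")` is a convenient first target because it is always a valid ELF file for the running architecture.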
Hint Layer 2: Navigating Sections
- Section headers start at file_base + ehdr->e_shoff
- Section name string table is section number ehdr->e_shstrndx
- To get a section name: strtab + shdr->sh_name where strtab is the shstrtab section’s data
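The three steps in this hint could be combined into a lookup like the one below. The name `find_section` is hypothetical, and the sketch assumes a host-endian 64-bit image that is already in memory; it deliberately ignores the extended numbering scheme used when a file has very many sections.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: find a section header by name inside a mapped ELF
 * image of `size` bytes. Returns NULL if the image is malformed or the
 * section does not exist. */
const Elf64_Shdr *find_section(const uint8_t *base, size_t size,
                               const char *name)
{
    if (size < sizeof(Elf64_Ehdr))
        return NULL;
    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)base;

    /* The section header table must lie entirely inside the image. */
    if (eh->e_shentsize != sizeof(Elf64_Shdr) || eh->e_shoff > size ||
        (size - eh->e_shoff) / sizeof(Elf64_Shdr) < eh->e_shnum)
        return NULL;
    if (eh->e_shstrndx >= eh->e_shnum)
        return NULL;

    const Elf64_Shdr *sh = (const Elf64_Shdr *)(base + eh->e_shoff);
    const Elf64_Shdr *str = &sh[eh->e_shstrndx];

    /* The section name string table must also lie inside the image. */
    if (str->sh_offset > size || str->sh_size > size - str->sh_offset)
        return NULL;
    const char *tab = (const char *)base + str->sh_offset;

    for (size_t i = 0; i < eh->e_shnum; i++) {
        size_t off = sh[i].sh_name;
        if (off >= str->sh_size)
            continue;                 /* name offset outside the table */
        if (memchr(tab + off, '\0', str->sh_size - off) == NULL)
            continue;                 /* unterminated name */
        if (strcmp(tab + off, name) == 0)
            return &sh[i];
    }
    return NULL;
}
```

With the file mapped at `base`, `find_section(base, size, ".dynsym")` would return the header whose `sh_offset` and `sh_size` locate the dynamic symbol table.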
Hint Layer 3: Symbol Resolution
- Symbol table entries are in .dynsym section (type SHT_DYNSYM)
- Symbol names are in .dynstr section (linked via sh_link field)
- For each symbol: name = dynstr + sym->st_name
- Use ELF64_ST_BIND() and ELF64_ST_TYPE() macros on st_info
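A symbol listing built on these hints might look like the sketch below. The function name `print_dynsyms` and its parameter layout are assumptions; in a real tool `syms` and `count` would come from the .dynsym section (`sh_size / sh_entsize`) and `dynstr` from the section named by its `sh_link` field.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: print one line per dynamic symbol and return how
 * many of them are undefined (imported from another object). */
size_t print_dynsyms(const Elf64_Sym *syms, size_t count,
                     const char *dynstr, size_t dynstr_size)
{
    size_t undefined = 0;

    /* Index 0 is the reserved null symbol, so start at 1. */
    for (size_t i = 1; i < count; i++) {
        const Elf64_Sym *sym = &syms[i];

        /* Resolve the name only if the offset and terminator are in range. */
        const char *name = "<bad name offset>";
        if (sym->st_name < dynstr_size &&
            memchr(dynstr + sym->st_name, '\0',
                   dynstr_size - sym->st_name) != NULL)
            name = dynstr + sym->st_name;

        unsigned bind = ELF64_ST_BIND(sym->st_info);
        unsigned type = ELF64_ST_TYPE(sym->st_info);
        int is_undef = (sym->st_shndx == SHN_UNDEF);
        if (is_undef)
            undefined++;

        printf("%-24s bind=%s type=%s %s\n", name,
               bind == STB_GLOBAL ? "GLOBAL" :
               bind == STB_WEAK   ? "WEAK"   : "LOCAL/other",
               type == STT_FUNC   ? "FUNC"   :
               type == STT_OBJECT ? "OBJECT" : "other",
               is_undef ? "UND" : "defined");
    }
    return undefined;
}
```

The undefined entries are the ones the dynamic linker must resolve from other libraries, which makes them the natural starting set for the symbol resolution part of the tool.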
Hint Layer 4: Interposition
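- For LD_PRELOAD: define a function with the same signature as the target
- Use dlsym(RTLD_NEXT, "function_name") to get the real function (define _GNU_SOURCE before including <dlfcn.h>)
- Remember to compile with -fPIC -shared and link with -ldl (required on glibc older than 2.34, harmless on newer versions)
- For link-time: gcc -Wl,--wrap,malloc resolves undefined references to malloc to __wrap_malloc, and references to __real_malloc to the original malloc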
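A run-time interposer following the first three hints could be sketched as below. It wraps `puts` instead of `malloc` to avoid the re-entrancy problems that arise when the hook itself allocates; the counter `hook_calls` and the file name `hook.c` are hypothetical, and the log format is only an example.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE                   /* needed for RTLD_NEXT on glibc */
#endif
#include <dlfcn.h>
#include <stdio.h>

int hook_calls;                       /* hypothetical counter for inspection */

/* Same signature as the libc function, so the dynamic linker binds callers
 * to this definition when the library is preloaded. */
int puts(const char *s)
{
    static int (*real_puts)(const char *);

    /* Look up the next definition of puts in search order (libc's). */
    if (real_puts == NULL)
        real_puts = (int (*)(const char *))dlsym(RTLD_NEXT, "puts");
    if (real_puts == NULL)
        return EOF;                   /* could not find the real function */

    hook_calls++;
    fprintf(stderr, "[hook] puts(\"%s\")\n", s);
    return real_puts(s);              /* forward to the original */
}
```

Assuming the file is saved as hook.c, it could be built with `gcc -fPIC -shared -o libhook.so hook.c -ldl` and activated with `LD_PRELOAD=./libhook.so ./hello`, after which each call to puts in the target program is logged to stderr before being forwarded.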
The Interview Questions They’ll Ask
After completing this project, you should be able to confidently answer these questions:
- “Explain the difference between .symtab and .dynsym. When is each used?”
  - .symtab is for static linking and debugging (often stripped in production)
  - .dynsym is for dynamic linking at runtime (always present in shared libs)
- “Walk me through what happens when you call a dynamically linked function like printf().”
  - Must cover PLT stub, GOT indirection, lazy binding, resolver
- “What is the difference between RTLD_NEXT and RTLD_DEFAULT in dlsym()?”
  - RTLD_NEXT finds the next occurrence of the symbol in the search order after the object that called dlsym()
  - RTLD_DEFAULT uses the default global search order, starting with the main executable
- “Why do position-independent executables (PIE) need special relocation types?”
  - PIE can load at any address; R_X86_64_RELATIVE relocations adjust pointers by the load base
- “How would you intercept all malloc() calls in a program you didn’t compile?”
  - LD_PRELOAD, ptrace, or GOT patching; discuss tradeoffs
- “What security implications arise from the PLT/GOT mechanism?”
  - GOT overwrite attacks, RELRO (RELocation Read-Only), ASLR
Books That Will Help
| Topic | Book | Specific Chapters |
|---|---|---|
| ELF format and linking fundamentals | CS:APP (3rd ed.) | Chapter 7 (entire chapter) |
| Static linking details | CS:APP (3rd ed.) | Sections 7.5-7.6 |
| Dynamic linking and PLT/GOT | CS:APP (3rd ed.) | Sections 7.7-7.12 |
| Library interposition | CS:APP (3rd ed.) | Section 7.13 |
| Advanced linker topics | Linkers and Loaders (Levine) | Chapters 1-4, 8-10 |
| ELF specification | TIS ELF Specification v1.2 | Entire document |
| Linux dynamic linker internals | The Linux Programming Interface (Kerrisk) | Chapters 41-42 |
| Binary analysis and security | Practical Binary Analysis (Andriesse) | Chapters 1-5 |
Submission / Completion Criteria
Minimum Viable Completion:
- ELF parser reads headers, sections, symbols
- Symbol table display works
- Basic relocation display works
- One interposition technique demonstrated
Full Completion:
- All four tool components working
- Comprehensive symbol analysis
- PLT/GOT tracing with explanations
- All three interposition techniques
- Clean error handling
Excellence:
- DWARF debug info parsing
- Cross-reference with source code
- Security analysis features
- Comprehensive test suite
- Production-quality documentation
This guide was expanded from CSAPP_3E_DEEP_LEARNING_PROJECTS.md. For the complete learning path, see the project index.