Project 10: ELF Link Map & Interposition Toolkit
Build a tool that reveals the hidden world of symbols, relocations, and dynamic linking, then demonstrate function call hooking through library interposition.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Advanced |
| Time Estimate | 2-3 weeks |
| Language | C (Alternatives: Rust, Zig, C++) |
| Prerequisites | Linux environment, basic binary tooling, P1 (Toolchain Explorer) |
| Key Topics | ELF format, symbol resolution, relocation, PLT/GOT, dynamic linking, interposition |
| CS:APP Chapters | 7 (Linking) |
1. Learning Objectives
By completing this project, you will:
- Parse ELF structures confidently: Read headers, section tables, symbol tables, and relocation entries
- Understand symbol resolution: Explain how the linker and loader find symbol definitions
- Master PLT/GOT mechanics: Trace exactly how a dynamically linked function call works
- Explain relocation types: Understand why different relocation types exist and when each is used
- Implement library interposition: Hook function calls at compile-time, link-time, and run-time
- Debug linking issues: Diagnose and fix common linking problems in real programs
- Reason about security implications: Understand how linking mechanisms affect program security
2. Deep Theoretical Foundation
2.1 The ELF Format: A Complete Tour
ELF (Executable and Linkable Format) is the standard binary format on Linux and most other Unix-like systems (macOS, which uses Mach-O, is a notable exception). Understanding it is essential for systems programming.
ELF File Structure Overview
+=====================================+
| ELF HEADER | <- Fixed size (52 or 64 bytes)
| Magic, class, endianness, type, |
| machine, entry point, offsets |
+=====================================+
| PROGRAM HEADERS (optional) | <- How to load into memory
| Segment type, offset, vaddr, | (for executables/shared libs)
| paddr, filesz, memsz, flags |
+=====================================+
| |
| SECTIONS |
| |
| .text (code) |
| .rodata (read-only data) |
| .data (initialized data) |
| .bss (uninitialized data) |
| .symtab (symbol table) |
| .strtab (string table) |
| .rela.text (relocations for .text) |
| .dynsym (dynamic symbols) |
| .dynstr (dynamic strings) |
| .plt (procedure linkage table) |
| .got (global offset table) |
| .dynamic (dynamic linking info) |
| ... |
| |
+=====================================+
| SECTION HEADERS | <- Describes all sections
| Name, type, flags, addr, offset, | (for linker/tools)
| size, link, info, align, entsize |
+=====================================+
The ELF Header
// 64-bit ELF header structure
typedef struct {
unsigned char e_ident[16]; // Magic number and identification
Elf64_Half e_type; // Object file type
Elf64_Half e_machine; // Architecture
Elf64_Word e_version; // ELF version
Elf64_Addr e_entry; // Entry point virtual address
Elf64_Off e_phoff; // Program header table file offset
Elf64_Off e_shoff; // Section header table file offset
Elf64_Word e_flags; // Processor-specific flags
Elf64_Half e_ehsize; // ELF header size
Elf64_Half e_phentsize; // Program header table entry size
Elf64_Half e_phnum; // Program header table entry count
Elf64_Half e_shentsize; // Section header table entry size
Elf64_Half e_shnum; // Section header table entry count
Elf64_Half e_shstrndx; // Section header string table index
} Elf64_Ehdr;
// ELF magic number: 0x7f 'E' 'L' 'F'
// e_ident[0] = 0x7f
// e_ident[1] = 'E'
// e_ident[2] = 'L'
// e_ident[3] = 'F'
// e_ident[4] = class (1=32-bit, 2=64-bit)
// e_ident[5] = data encoding (1=little, 2=big endian)
ELF Types (e_type):
| Value | Name | Description |
|---|---|---|
| 1 | ET_REL | Relocatable object file (.o) |
| 2 | ET_EXEC | Executable file |
| 3 | ET_DYN | Shared object file (.so) or PIE executable |
| 4 | ET_CORE | Core dump |
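As a small illustration of the identification bytes and e_type values above, the following sketch checks e_ident and names the file type. The function names `elf_ident_is_elf64_le` and `elf_type_name` are hypothetical helpers, not part of any library; the constants come from `<elf.h>`.

```c
#include <elf.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if e_ident describes a 64-bit
 * little-endian ELF file, 0 otherwise. */
int elf_ident_is_elf64_le(const unsigned char ident[EI_NIDENT]) {
    if (memcmp(ident, ELFMAG, SELFMAG) != 0)
        return 0;                           /* missing 0x7f 'E' 'L' 'F' */
    return ident[EI_CLASS] == ELFCLASS64 &&
           ident[EI_DATA] == ELFDATA2LSB;
}

/* Hypothetical helper: maps e_type to the names in the table above. */
const char *elf_type_name(uint16_t e_type) {
    switch (e_type) {
    case ET_REL:  return "ET_REL";
    case ET_EXEC: return "ET_EXEC";
    case ET_DYN:  return "ET_DYN";
    case ET_CORE: return "ET_CORE";
    default:      return "unknown";
    }
}
```

A parser would typically call the first helper on the first 16 bytes of the file before trusting any other header field.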
Section Headers
typedef struct {
Elf64_Word sh_name; // Section name (index into .shstrtab)
Elf64_Word sh_type; // Section type
Elf64_Xword sh_flags; // Section flags
Elf64_Addr sh_addr; // Virtual address in memory
Elf64_Off sh_offset; // Offset in file
Elf64_Xword sh_size; // Size in bytes
Elf64_Word sh_link; // Link to another section
Elf64_Word sh_info; // Additional info
Elf64_Xword sh_addralign; // Alignment constraint
Elf64_Xword sh_entsize; // Entry size if section has table
} Elf64_Shdr;
Key Section Types:
| Name | Value | Description |
|---|---|---|
| SHT_PROGBITS | 1 | Code or data |
| SHT_SYMTAB | 2 | Symbol table |
| SHT_STRTAB | 3 | String table |
| SHT_RELA | 4 | Relocation entries with addends |
| SHT_DYNAMIC | 6 | Dynamic linking information |
| SHT_DYNSYM | 11 | Dynamic symbol table |
Critical Sections for Linking:
+-------------------+--------------------------------------------------+
| Section | Purpose |
+-------------------+--------------------------------------------------+
| .text | Executable machine code |
| .rodata | Read-only data (string literals, constants) |
| .data | Initialized global/static variables |
| .bss | Uninitialized global/static (zero at load) |
| .symtab | Full symbol table (for debugging/linking) |
| .strtab | String table for .symtab names |
| .dynsym | Dynamic symbol table (runtime resolution) |
| .dynstr | String table for .dynsym names |
| .rel.text/.rela.* | Relocation entries |
| .plt | Procedure Linkage Table stubs |
| .got | Global Offset Table entries |
| .got.plt | GOT entries specifically for PLT |
| .dynamic | Dynamic linking control information |
| .interp | Path to dynamic linker (ld-linux.so) |
+-------------------+--------------------------------------------------+
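Locating one of these sections by name means walking the section header table and resolving each sh_name through .shstrtab. The sketch below assumes the whole file is already in memory (for example via mmap) and that the host is a 64-bit platform; `find_section` is a hypothetical helper name.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: finds a section header by name in an in-memory
 * ELF64 image. Returns NULL if the name is absent or if a table lies
 * outside the image. */
const Elf64_Shdr *find_section(const void *image, size_t size, const char *name) {
    const unsigned char *base = image;
    const Elf64_Ehdr *eh = image;

    if (size < sizeof *eh || eh->e_shoff > size ||
        eh->e_shentsize != sizeof(Elf64_Shdr) ||
        eh->e_shnum > (size - eh->e_shoff) / sizeof(Elf64_Shdr) ||
        eh->e_shstrndx >= eh->e_shnum)
        return NULL;

    const Elf64_Shdr *sh = (const Elf64_Shdr *)(base + eh->e_shoff);
    const Elf64_Shdr *names = &sh[eh->e_shstrndx];
    if (names->sh_offset > size || names->sh_size > size - names->sh_offset)
        return NULL;
    const char *shstrtab = (const char *)(base + names->sh_offset);

    for (size_t i = 0; i < eh->e_shnum; i++) {
        /* sh_name is a byte offset into .shstrtab; skip offsets past its end. */
        if (sh[i].sh_name >= names->sh_size)
            continue;
        size_t room = names->sh_size - sh[i].sh_name;
        const char *candidate = shstrtab + sh[i].sh_name;
        /* Only compare if the name is NUL-terminated inside the table. */
        if (memchr(candidate, '\0', room) != NULL && strcmp(candidate, name) == 0)
            return &sh[i];
    }
    return NULL;
}
```

With such a helper, `find_section(map, size, ".symtab")` yields the header whose sh_offset and sh_size delimit the symbol table. Files with more than 0xff00 sections use an extended encoding that this sketch does not handle.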
2.2 Symbol Tables and String Tables
Symbols are the names that connect your code to definitions across files and libraries.
Symbol Table Entry Structure
typedef struct {
Elf64_Word st_name; // Symbol name (index into string table)
unsigned char st_info; // Type and binding
unsigned char st_other; // Visibility
Elf64_Half st_shndx; // Section index
Elf64_Addr st_value; // Symbol value (address or offset)
Elf64_Xword st_size; // Size of the symbol
} Elf64_Sym;
// Macros to extract binding and type from st_info
#define ELF64_ST_BIND(info) ((info) >> 4)
#define ELF64_ST_TYPE(info) ((info) & 0xf)
#define ELF64_ST_INFO(bind, type) (((bind) << 4) + ((type) & 0xf))
Symbol Binding (linkage scope; distinct from the visibility stored in st_other)
+-------+----------------+------------------------------------------+
| Value | Name | Meaning |
+-------+----------------+------------------------------------------+
| 0 | STB_LOCAL | Not visible outside this object file |
| 1 | STB_GLOBAL | Visible to all; one definition must exist|
| 2 | STB_WEAK | Like global, but can be overridden |
+-------+----------------+------------------------------------------+
Key insight: Local symbols (static functions/variables in C) cannot be referenced from other files. This is why static provides encapsulation.
Symbol Type
+-------+----------------+------------------------------------------+
| Value | Name | Meaning |
+-------+----------------+------------------------------------------+
| 0 | STT_NOTYPE | Type not specified |
| 1 | STT_OBJECT | Data object (variable) |
| 2 | STT_FUNC | Function |
| 3 | STT_SECTION | Section symbol |
| 4 | STT_FILE | Source file name |
+-------+----------------+------------------------------------------+
Special Section Indices
+--------+----------------+------------------------------------------+
| Value | Name | Meaning |
+--------+----------------+------------------------------------------+
| 0 | SHN_UNDEF | Undefined (needs resolution) |
| 0xfff1 | SHN_ABS | Absolute value, not affected by reloc |
| 0xfff2 | SHN_COMMON | Common block (tentative definition) |
+--------+----------------+------------------------------------------+
Reading symbols with nm:
$ nm hello.o
U printf # Undefined, needs linking
0000000000000000 T main # Text (code), defined here
0000000000000000 D global_var # Data, initialized
0000000000000004 C uninit_var # Common (uninitialized)
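The letters printed by nm can be derived from the symbol fields described above plus the type and flags of the section the symbol lives in. The following is a simplified sketch: `nm_letter` is a hypothetical helper, and weak symbols and several rarer classes that the real nm distinguishes are deliberately left out.

```c
#include <ctype.h>
#include <elf.h>
#include <stdint.h>

/* Hypothetical helper: simplified nm-style letter for one symbol.
 * sh_type and sh_flags describe the section named by st_shndx and are
 * ignored for the special section indices. */
char nm_letter(const Elf64_Sym *sym, uint32_t sh_type, uint64_t sh_flags) {
    char c;
    if (sym->st_shndx == SHN_UNDEF)
        return 'U';                                     /* needs resolution */
    if (sym->st_shndx == SHN_ABS)          c = 'A';
    else if (sym->st_shndx == SHN_COMMON)  c = 'C';
    else if (sh_type == SHT_NOBITS)        c = 'B';     /* .bss */
    else if (sh_flags & SHF_EXECINSTR)     c = 'T';     /* .text */
    else if (sh_flags & SHF_WRITE)         c = 'D';     /* .data */
    else if (sh_flags & SHF_ALLOC)         c = 'R';     /* .rodata */
    else                                   c = '?';
    /* nm prints local symbols in lowercase. */
    if (ELF64_ST_BIND(sym->st_info) == STB_LOCAL)
        c = (char)tolower((unsigned char)c);
    return c;
}
```

For the `main` entry in the listing above, the section is .text (SHF_ALLOC | SHF_EXECINSTR) and the binding is global, which gives `T`.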
String Tables
String tables are simple: just null-terminated strings packed together. Symbol names are stored as offsets into the string table:
Offset: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
Data: | \0| m | a | i | n | \0| p | r | i | n | t | f | \0| x | \0|...|
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
Symbol "main" -> st_name = 1
Symbol "printf" -> st_name = 6
Symbol "x" -> st_name = 13
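Because st_name is just a byte offset, a lookup is a pointer addition plus a bounds check. The sketch below uses a hypothetical helper named `strtab_get`; the bounds check matters because offsets in a malformed file can point anywhere.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the NUL-terminated name at byte offset
 * `off`, or NULL if the offset is out of range or the string runs past
 * the end of the table. */
const char *strtab_get(const char *strtab, size_t size, uint32_t off) {
    if (off >= size)
        return NULL;
    if (memchr(strtab + off, '\0', size - off) == NULL)
        return NULL;
    return strtab + off;
}
```

Applied to the 15-byte table drawn above, offsets 1, 6, and 13 yield "main", "printf", and "x".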
2.3 Relocation: Patching Addresses
When the compiler generates code, it doesn't know where symbols will be placed in memory. Relocation entries tell the linker how to patch these addresses.
Relocation Entry Structure
// Relocation entry with addend (most common on x86-64)
typedef struct {
Elf64_Addr r_offset; // Location to patch
Elf64_Xword r_info; // Symbol index and relocation type
Elf64_Sxword r_addend; // Addend for computation
} Elf64_Rela;
#define ELF64_R_SYM(info) ((info) >> 32)
#define ELF64_R_TYPE(info) ((info) & 0xffffffff)
Common Relocation Types (x86-64)
+----------------------+------------------------------------------------+
| Relocation Type | Computation |
+----------------------+------------------------------------------------+
| R_X86_64_64 | S + A (absolute 64-bit address) |
| R_X86_64_PC32 | S + A - P (PC-relative 32-bit) |
| R_X86_64_PLT32 | L + A - P (PLT entry, 32-bit PC-relative) |
| R_X86_64_GOTPCREL | G + GOT + A - P (GOT entry, PC-relative) |
| R_X86_64_GLOB_DAT | S (GOT entry for data) |
| R_X86_64_JUMP_SLOT | S (GOT entry for function/PLT) |
+----------------------+------------------------------------------------+
Where:
S = Value of the symbol
A = Addend from relocation entry
P = Place (address being relocated)
L = Address of PLT entry
G = Offset into GOT
GOT = Address of GOT
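To make the formulas concrete, here is a minimal sketch of how a linker-like tool might apply two of these relocation types to a section buffer. `apply_rela` is a hypothetical helper; it assumes a little-endian host and handles only R_X86_64_64 and R_X86_64_PC32.

```c
#include <elf.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: patches one relocation inside a section buffer.
 * section_addr is the address the section will occupy, so
 * P = section_addr + r_offset; sym_value is S. Returns 0 on success and
 * -1 if the type is unsupported, the patch would fall outside the buffer,
 * or the result does not fit in the field. */
int apply_rela(uint8_t *section, size_t section_size, uint64_t section_addr,
               const Elf64_Rela *rel, uint64_t sym_value) {
    uint64_t P = section_addr + rel->r_offset;
    uint64_t S_plus_A = sym_value + (uint64_t)rel->r_addend;

    switch (ELF64_R_TYPE(rel->r_info)) {
    case R_X86_64_64:                       /* S + A, 64-bit absolute */
        if (rel->r_offset > section_size || section_size - rel->r_offset < 8)
            return -1;
        memcpy(section + rel->r_offset, &S_plus_A, 8);
        return 0;
    case R_X86_64_PC32: {                   /* S + A - P, signed 32-bit */
        if (rel->r_offset > section_size || section_size - rel->r_offset < 4)
            return -1;
        int64_t value = (int64_t)(S_plus_A - P);
        if (value < INT32_MIN || value > INT32_MAX)
            return -1;                      /* relocation overflow */
        int32_t v32 = (int32_t)value;
        memcpy(section + rel->r_offset, &v32, 4);
        return 0;
    }
    default:
        return -1;
    }
}
```

Worked example: with the section at 0x401000, r_offset 0x13, addend -4, and S = 0x401030, the place is P = 0x401013 and the patched value is 0x401030 - 4 - 0x401013 = 0x19. The addend of -4 accounts for the CPU measuring the displacement from the end of the 4-byte field.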
Why PC-Relative Addressing?
Position-Independent Code (PIC) uses PC-relative addressing to work regardless of where it's loaded:
Absolute: MOV RAX, 0x401234 / CALL RAX ; Only works if the target is at that address
PC-Relative: CALL rel32 ; Encoded as an offset from the next instruction; works at any address
PC-relative formula: target = address_of_next_instruction + offset
If the whole module moves, both the call site and the target move by the same amount,
so the offset stays correct!
2.4 Static Linking: Resolving at Build Time
Static linking happens when you create an executable from object files and static libraries (.a).
The Static Linker's Job
+===========================================================================+
| STATIC LINKER WORKFLOW |
+===========================================================================+
| |
| INPUT: |
|   main.o ─────────┐ |
|   helper.o ───────┼──────────> LINKER (ld) ──────> EXECUTABLE |
|   libfoo.a ───────┘                │ |
|                                    │ |
| PROCESS:                           ▼ |
| |
| 1. SYMBOL RESOLUTION |
| - Collect all defined symbols from input files |
| - For each undefined symbol, find a defining module |
| - Error if undefined symbol has no definition |
| - Error if multiple strong definitions exist |
| |
| 2. RELOCATION |
| - Merge sections (.text from all inputs → one .text) |
| - Assign runtime addresses to all symbols |
| - Patch relocation entries with final addresses |
| |
| 3. OUTPUT |
| - Write ELF executable with program headers for loader |
| |
+===========================================================================+
Symbol Resolution Rules
// Strong symbols: functions and initialized global variables
int x = 5; // Strong (initialized)
int foo() { return 1; } // Strong (function)
// Weak symbols (CS:APP terminology): uninitialized global variables
int y; // Weak (tentative definition, emitted as a COMMON symbol)
// Note: GCC 10 and later default to -fno-common, which turns `int y;` into a
// strong definition in .bss; build with -fcommon to observe rules 2 and 3.
/* RESOLUTION RULES:
 * 1. Multiple strong definitions → ERROR
 * 2. One strong + multiple weak → Use strong
 * 3. Multiple weak only → Pick one (usually largest)
 */
// Example that causes problems:
// file1.c: int x = 5; // Strong
// file2.c: int x = 10; // Strong → LINKER ERROR!
// This "works" but is dangerous:
// file1.c: int x = 5; // Strong
// file2.c: int x; // Weak → Uses file1's definition
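The three rules can be written down as a small decision function. This is a toy model, not how ld is implemented: `Definition`, `DefStrength`, and `resolve_symbol` are hypothetical names, and the function looks at all definitions of a single symbol name at once.

```c
#include <stddef.h>

typedef enum { DEF_WEAK, DEF_STRONG } DefStrength;

typedef struct {
    const char *file;       /* object file providing the definition */
    DefStrength strength;
    size_t size;            /* object size, used to choose among weak definitions */
} Definition;

/* Applies the three rules to every definition of ONE symbol name.
 * Returns the index of the chosen definition, -1 for a
 * multiple-definition error, or -2 if there is no definition at all. */
int resolve_symbol(const Definition *defs, size_t n) {
    int strong = -1, largest_weak = -1;
    for (size_t i = 0; i < n; i++) {
        if (defs[i].strength == DEF_STRONG) {
            if (strong >= 0)
                return -1;                              /* rule 1 */
            strong = (int)i;
        } else if (largest_weak < 0 ||
                   defs[i].size > defs[largest_weak].size) {
            largest_weak = (int)i;
        }
    }
    if (strong >= 0)
        return strong;                                  /* rule 2 */
    return largest_weak >= 0 ? largest_weak : -2;       /* rule 3 */
}
```

A linkmap-style analyzer could run this per symbol name after collecting definitions from every input object, reporting -1 as a duplicate-definition error and -2 as an undefined reference.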
Static Library Scanning
Static libraries (.a) are archives of object files. The linker scans them left-to-right:
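# Order matters: each archive is scanned once, in command-line order, and only
# members that define a currently undefined symbol are pulled in.
# If main.o needs a symbol from libbar.a, and libbar.a needs a symbol from
# libfoo.a, this fails because libfoo.a is scanned before anything references it:
gcc main.o -L. -lfoo -lbar # Wrong order!
# Correct: put each library after everything that depends on it
gcc main.o -L. -lbar -lfoo # For circular dependencies, repeat a library (e.g. -lfoo -lbar -lfoo)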
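The effect of ordering can be reproduced with a toy model of the scan. The sketch assumes a heavily simplified linker: every symbol is one bit, every archive has exactly one member, and each archive is visited once. `Archive` and `scan_archives` are hypothetical names.

```c
#include <stddef.h>

/* Toy model: each symbol is one bit, each archive has a single member. */
typedef struct {
    unsigned defines;   /* symbols this archive can provide */
    unsigned needs;     /* symbols its member references */
} Archive;

/* Scans the archives once, left to right, starting from the symbols the
 * object files leave undefined. Returns the set of symbols still undefined
 * at the end (0 means the link would succeed). */
unsigned scan_archives(unsigned undefined, const Archive *libs, size_t n) {
    unsigned defined = 0;
    for (size_t i = 0; i < n; i++) {
        if ((libs[i].defines & undefined) == 0)
            continue;           /* nothing needed yet: the member is skipped */
        defined |= libs[i].defines;
        undefined = (undefined | libs[i].needs) & ~defined;
    }
    return undefined;
}
```

If main.o needs a function from libbar and libbar needs one from libfoo, scanning libfoo first skips it (nothing references it yet) and leaves the libfoo symbol unresolved, while scanning libbar first resolves everything.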
2.5 Dynamic Linking: Resolution at Load Time
Dynamic linking defers symbol resolution until the program runs. This is more complex but has significant advantages.
Why Dynamic Linking?
+----------------------------+----------------------------+
| STATIC LINKING | DYNAMIC LINKING |
+----------------------------+----------------------------+
| + Self-contained binary | + Smaller executables |
| + No runtime dependencies | + Shared memory for libs |
| + Slightly faster startup | + Update libs without |
| - Larger file size | recompiling |
| - Duplicate code in memory | + Required for plugins |
| - Can't update libs | - Runtime overhead |
| without recompile | - Dependency management |
+----------------------------+----------------------------+
The Dynamic Linker (ld-linux.so)
+===========================================================================+
| PROGRAM LOADING WITH DYNAMIC LINKING |
+===========================================================================+
| |
| 1. KERNEL LOADS EXECUTABLE |
| - Read ELF header, create memory mappings |
| - Find PT_INTERP segment (.interp section: path to ld-linux.so) |
| - Load dynamic linker into address space |
| - Transfer control to dynamic linker |
| |
| 2. DYNAMIC LINKER INITIALIZATION |
| - Load shared libraries listed in DT_NEEDED entries |
| - Recursively load dependencies |
| - Process relocations (patch GOT entries) |
| - Run initialization functions (.init, .ctors) |
| |
| 3. TRANSFER TO APPLICATION |
| - Jump to program's entry point (e_entry) |
| - __libc_start_main calls main() |
| |
+===========================================================================+
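Some of what the kernel hands over in step 1 can be observed from inside a running process through the auxiliary vector. The sketch below uses `getauxval` from `<sys/auxv.h>` (available in glibc and musl); the `LoadInfo` struct and both function names are hypothetical.

```c
#include <stdio.h>
#include <sys/auxv.h>

/* Hypothetical container for a few auxiliary vector entries. */
typedef struct {
    unsigned long interp_base;  /* AT_BASE: where the dynamic linker was mapped (0 if static) */
    unsigned long entry;        /* AT_ENTRY: the executable's entry point in memory */
    unsigned long phdr;         /* AT_PHDR: address of the program headers in memory */
    unsigned long phnum;        /* AT_PHNUM: number of program headers */
} LoadInfo;

/* Reads the values the kernel passed to this process at exec time. */
LoadInfo read_load_info(void) {
    LoadInfo li = {
        getauxval(AT_BASE), getauxval(AT_ENTRY),
        getauxval(AT_PHDR), getauxval(AT_PHNUM)
    };
    return li;
}

void print_load_info(const LoadInfo *li) {
    printf("dynamic linker base: %#lx\n", li->interp_base);
    printf("entry point:         %#lx\n", li->entry);
    printf("program headers:     %lu at %#lx\n", li->phnum, li->phdr);
}
```

The same values can be printed for any dynamically linked program by running it with the glibc environment variable `LD_SHOW_AUXV=1`.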
The .dynamic Section
The .dynamic section contains tags that control dynamic linking:
typedef struct {
Elf64_Sxword d_tag; // Type of entry
union {
Elf64_Xword d_val; // Integer value
Elf64_Addr d_ptr; // Address value
} d_un;
} Elf64_Dyn;
Key Dynamic Tags:
+-----------+--------------------------------------------------+
| Tag | Meaning |
+-----------+--------------------------------------------------+
| DT_NEEDED | Name of required shared library |
| DT_SONAME | Shared object name |
| DT_SYMTAB | Address of dynamic symbol table |
| DT_STRTAB | Address of dynamic string table |
| DT_PLTREL | Type of PLT relocations |
| DT_JMPREL | Address of PLT relocations |
| DT_PLTGOT | Address of GOT (for PLT) |
| DT_RELA | Address of relocation table |
| DT_RELASZ | Size of relocation table |
| DT_INIT | Address of initialization function |
| DT_FINI | Address of finalization function |
+-----------+--------------------------------------------------+
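Reading these tags is a linear walk over the Elf64_Dyn array until DT_NULL. As a sketch, the hypothetical helper `collect_needed` below gathers the DT_NEEDED names; it assumes the caller has already located .dynstr and does not validate the string offsets.

```c
#include <elf.h>
#include <stddef.h>

/* Hypothetical helper: collects DT_NEEDED library names from a .dynamic
 * array. dynstr must be the string table named by DT_STRTAB. Stops at
 * DT_NULL or after `count` entries and returns how many names were
 * stored in out[]. Offsets are NOT bounds-checked here. */
size_t collect_needed(const Elf64_Dyn *dyn, size_t count, const char *dynstr,
                      const char **out, size_t max_out) {
    size_t found = 0;
    for (size_t i = 0; i < count && dyn[i].d_tag != DT_NULL; i++) {
        if (dyn[i].d_tag == DT_NEEDED && found < max_out)
            out[found++] = dynstr + dyn[i].d_un.d_val;  /* d_val is a .dynstr offset */
    }
    return found;
}
```

When parsing a file on disk, `count` would be the .dynamic section's sh_size divided by sh_entsize.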
2.6 PLT and GOT: The Heart of Dynamic Linking
The Procedure Linkage Table (PLT) and Global Offset Table (GOT) work together to enable lazy binding of dynamically linked functions.
High-Level Overview
+===========================================================================+
| PLT/GOT MECHANISM |
+===========================================================================+
| |
| YOUR CODE PLT GOT |
| --------- --- --- |
| |
| call printf@plt ───> [PLT entry] ───> [GOT entry] ───> printf |
| (stub) (address) (in libc) |
| |
| |
| FIRST CALL: |
| PLT stub jumps to resolver, which: |
| 1. Finds printf's actual address in libc.so |
| 2. Patches the GOT entry with that address |
| 3. Jumps to printf |
| |
| SUBSEQUENT CALLS: |
| PLT stub jumps through GOT directly to printf |
| (no resolver involvement) |
| |
+===========================================================================+
Detailed PLT Structure
+===========================================================================+
| PLT ENTRY ANATOMY |
+===========================================================================+
| |
| PLT[0] - Special resolver entry: |
| ┌───────────────────────────────────────────────────────────────────┐ |
| │ push [GOT+8] ; Push link_map pointer │ |
| │ jmp [GOT+16] ; Jump to _dl_runtime_resolve │ |
| └───────────────────────────────────────────────────────────────────┘ |
| |
| PLT[1] - Entry for printf (typical example): |
| ┌───────────────────────────────────────────────────────────────────┐ |
| │ jmp [GOT+24] ; Jump to address in GOT entry │ |
| │ push 0 ; Push relocation index (0 = first) │ |
| │ jmp PLT[0] ; Jump to resolver │ |
| └───────────────────────────────────────────────────────────────────┘ |
| |
| PLT[2] - Entry for malloc: |
| ┌───────────────────────────────────────────────────────────────────┐ |
| │ jmp [GOT+32] ; Jump to address in GOT entry │ |
| │ push 1 ; Push relocation index (1 = second) │ |
| │ jmp PLT[0] ; Jump to resolver │ |
| └───────────────────────────────────────────────────────────────────┘ |
| |
+===========================================================================+
GOT Structure
+===========================================================================+
| GOT LAYOUT |
+===========================================================================+
| |
| GOT[0] = Address of .dynamic section |
| GOT[1] = Pointer to link_map (struct for this shared object) |
| GOT[2] = Address of _dl_runtime_resolve function |
| GOT[3] = Entry for printf (initially points to PLT[1]+6) |
| GOT[4] = Entry for malloc (initially points to PLT[2]+6) |
| ... |
| |
| Before resolution: |
| GOT[3] -> PLT[1]+6 (instruction after jmp, the push instruction) |
| |
| After resolution: |
| GOT[3] -> 0x7fff... (actual address of printf in libc.so) |
| |
+===========================================================================+
Complete Lazy Binding Sequence
+===========================================================================+
| LAZY BINDING STEP-BY-STEP |
+===========================================================================+
| |
| FIRST CALL TO printf: |
| ───────────────────── |
| |
| 1. Code executes: call printf@plt |
| ↓ |
| 2. Jump to PLT[1]: jmp [GOT+24] |
| GOT+24 initially contains address of PLT[1]+6 |
| ↓ |
| 3. Fall through to: push 0 (relocation index) |
| ↓ |
| 4. Jump to PLT[0]: jmp PLT[0] |
| ↓ |
| 5. PLT[0] executes: |
| - push [GOT+8] ; Push link_map |
| - jmp [GOT+16] ; Jump to _dl_runtime_resolve |
| ↓ |
| 6. _dl_runtime_resolve: |
| - Uses relocation index (0) to find symbol name ("printf") |
| - Searches loaded libraries for printf |
| - Finds printf at 0x7fff12345678 in libc.so |
| - Writes 0x7fff12345678 to GOT+24 |
| - Jumps to printf (not returns!) |
| |
| SECOND CALL TO printf: |
| ────────────────────── |
| |
| 1. Code executes: call printf@plt |
| ↓ |
| 2. Jump to PLT[1]: jmp [GOT+24] |
| GOT+24 now contains 0x7fff12345678 |
| ↓ |
| 3. Direct jump to printf - done! |
| |
+===========================================================================+
 Your Program      PLT               GOT               libc.so
┌────────────┐    ┌────────────┐    ┌────────────┐    ┌────────────┐
│            │    │ PLT[0]     │    │ GOT[0]     │    │            │
│ call       │───>│ resolver   │    │ .dynamic   │    │            │
│ printf     │    │ stub       │    │            │    │            │
│ @plt       │    ├────────────┤    ├────────────┤    │            │
│            │    │ PLT[1]     │    │ GOT[1]     │    │            │
│            │    │ jmp [GOT]  │───>│ link_map   │    │            │
│            │    │ push 0     │    │            │    │            │
│            │    │ jmp PLT0   │    ├────────────┤    │            │
│            │    │            │    │ GOT[2]     │    │            │
│            │    ├────────────┤    │ resolver   │    │            │
│            │    │ PLT[2]     │    ├────────────┤    ├────────────┤
│            │    │ ...        │    │ GOT[3]     │───>│ printf     │
│            │    │            │    │ printf     │    │ code       │
└────────────┘    └────────────┘    └────────────┘    └────────────┘
First call: PLT → GOT (has PLT+6) → back to PLT → resolver → writes GOT
Later calls: PLT → GOT (has printf addr) → printf directly
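The patch-on-first-use idea can be imitated in plain C with a function pointer that initially points at a resolver. This is only an analogy under simplifying assumptions: `got_slot` stands in for a GOT entry, `resolver_stub` for PLT[0] plus _dl_runtime_resolve, and `real_square` for the library function; none of these names exist in a real toolchain.

```c
typedef int (*func_t)(int);

/* Stands in for the function that lives in the shared library. */
static int real_square(int x) { return x * x; }

static int resolver_stub(int x);
static func_t got_slot = resolver_stub;   /* "GOT entry": starts out pointing at the stub */
static int resolve_count = 0;

/* Stands in for PLT[0] + _dl_runtime_resolve: look up the target,
 * patch the slot, then continue into the real function. */
static int resolver_stub(int x) {
    resolve_count++;
    got_slot = real_square;
    return got_slot(x);
}

/* The "PLT entry": always an indirect call through the slot. */
int call_square(int x) { return got_slot(x); }

/* Number of times the slow resolution path has run. */
int resolutions(void) { return resolve_count; }
```

Calling `call_square` twice runs the resolver only once; the second call goes straight through the patched slot, which is the behavior the two traces above describe.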
Immediate Binding vs Lazy Binding
# Lazy binding (default): Resolve symbols on first use
./my_program
# Immediate binding: Resolve all symbols at load time
LD_BIND_NOW=1 ./my_program
# Or compile with -z now:
gcc -Wl,-z,now -o my_program main.c
When to use immediate binding:
- Security (RELRO + BIND_NOW = no writable GOT after startup)
- Debugging (fail-fast if symbol missing)
- Real-time systems (no unpredictable latency from resolution)
2.7 Position Independent Code (PIC)
Shared libraries must work when loaded at any address. PIC achieves this using PC-relative addressing.
How PIC Accesses Global Data
+===========================================================================+
| PIC DATA ACCESS |
+===========================================================================+
| |
| WITHOUT PIC (absolute addressing): |
| ┌──────────────────────────────────────────────────────────────────┐ |
| │ mov rax, [0x601020] ; Load from absolute address │ |
| └──────────────────────────────────────────────────────────────────┘ |
| Problem: Only works if loaded at expected address! |
| |
| WITH PIC (PC-relative through GOT): |
| ┌──────────────────────────────────────────────────────────────────┐ |
| │ mov rax, [rip + global_var@GOTPCREL] ; Load variable's address from its GOT entry │ |
| │ mov rax, [rax] ; Load actual value │ |
| └──────────────────────────────────────────────────────────────────┘ |
| Works at any address because: |
| 1. GOT is always at fixed offset from code |
| 2. Dynamic linker fills GOT with actual addresses at load time |
| |
+===========================================================================+
Compiling with PIC
# For shared libraries, PIC is required on most systems
gcc -fPIC -shared -o libfoo.so foo.c
# For executables, PIE (Position Independent Executable) is optional
gcc -fPIE -pie -o my_program main.c # PIE executable
gcc -no-pie -o my_program main.c # Traditional executable
2.8 Library Interposition
Interposition lets you intercept calls to library functions. There are three approaches:
Compile-Time Interposition
Replace function at compile time using macros:
// mymalloc.c - compile-time interposition wrapper
#ifdef COMPILETIME
#include <stdio.h>
#include <malloc.h>
// Intercept malloc
void *mymalloc(size_t size) {
void *ptr = malloc(size);
printf("malloc(%zu) = %p\n", size, ptr);
return ptr;
}
// Intercept free
void myfree(void *ptr) {
printf("free(%p)\n", ptr);
free(ptr);
}
#endif
// malloc.h - local header that redefines malloc/free (found first via -I.)
#include <stddef.h> // size_t
#define malloc(size) mymalloc(size)
#define free(ptr) myfree(ptr)
void *mymalloc(size_t size);
void myfree(void *ptr);
# Compile with interposition
gcc -DCOMPILETIME -c mymalloc.c
gcc -I. -o my_program main.c mymalloc.o
Link-Time Interposition
Use the linker's --wrap option:
// mymalloc.c - link-time interposition
#include <stdio.h>
// __real_malloc is the actual malloc
void *__real_malloc(size_t size);
// __wrap_malloc intercepts calls to malloc
void *__wrap_malloc(size_t size) {
void *ptr = __real_malloc(size);
printf("malloc(%zu) = %p\n", size, ptr);
return ptr;
}
void __real_free(void *ptr);
void __wrap_free(void *ptr) {
printf("free(%p)\n", ptr);
__real_free(ptr);
}
gcc -c mymalloc.c
gcc -Wl,--wrap,malloc -Wl,--wrap,free -o my_program main.c mymalloc.o
Run-Time Interposition (LD_PRELOAD)
The most powerful approach - intercept at runtime without recompiling:
// mymalloc.c - run-time interposition
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>
// Function pointer to original malloc
static void *(*real_malloc)(size_t) = NULL;
static void (*real_free)(void *) = NULL;
// Initialize pointers to real functions
static void init(void) {
real_malloc = dlsym(RTLD_NEXT, "malloc");
real_free = dlsym(RTLD_NEXT, "free");
if (!real_malloc || !real_free) {
fprintf(stderr, "Error loading symbols: %s\n", dlerror());
exit(1);
}
}
// Interpose malloc
void *malloc(size_t size) {
if (!real_malloc) init();
void *ptr = real_malloc(size);
fprintf(stderr, "malloc(%zu) = %p\n", size, ptr);
return ptr;
}
// Interpose free
void free(void *ptr) {
if (!real_free) init();
fprintf(stderr, "free(%p)\n", ptr);
real_free(ptr);
}
# Build as shared library
gcc -fPIC -shared -o mymalloc.so mymalloc.c -ldl
# Use with any program - no recompilation needed!
LD_PRELOAD=./mymalloc.so ls
LD_PRELOAD=./mymalloc.so /bin/cat file.txt
LD_PRELOAD Search Order
+===========================================================================+
| DYNAMIC LINKER SYMBOL SEARCH ORDER |
+===========================================================================+
| |
| When resolving a symbol, the dynamic linker searches: |
| |
| 1. The executable itself |
| 2. LD_PRELOAD libraries (ahead of every other shared library) |
| 3. DT_NEEDED libraries in order |
| 4. Libraries loaded by DT_NEEDED libraries (BFS) |
| |
| Example: |
| LD_PRELOAD=mymalloc.so ./program |
| |
| Search for "malloc": |
| 1. ./program - not defined there |
| 2. mymalloc.so - FOUND! Uses this malloc |
| 3. (Never reaches libc.so.6 which has the "real" malloc) |
| |
+===========================================================================+
RTLD_NEXT Magic
// dlsym(RTLD_NEXT, "symbol") returns the NEXT definition of symbol
// after the current library in the search order
// In our malloc wrapper (needs _GNU_SOURCE, <dlfcn.h>, and <stdio.h>):
void *malloc(size_t size) {
    // RTLD_NEXT skips mymalloc.so and finds libc's malloc
    void *(*real_malloc)(size_t) = dlsym(RTLD_NEXT, "malloc");
    void *ptr = real_malloc(size); // Call the real one
    fprintf(stderr, "malloc(%zu) = %p\n", size, ptr);
    return ptr;
}
2.9 Security Implications
Linking mechanisms have significant security implications:
RELRO (RELocation Read-Only)
# Partial RELRO (default): .got is read-only, .got.plt is writable
gcc -Wl,-z,relro -o program main.c
# Full RELRO: All GOT entries read-only after binding
gcc -Wl,-z,relro,-z,now -o program main.c
Partial RELRO:
┌─────────────┐
│ .got        │ Read-Only (non-PLT globals)
├─────────────┤
│ .got.plt    │ Writable (function pointers - can be hijacked!)
└─────────────┘
Full RELRO:
┌─────────────┐
│ .got        │ Read-Only
├─────────────┤
│ .got.plt    │ Read-Only (resolved at startup)
└─────────────┘
GOT/PLT Attacks and Defenses
ATTACK: Overwrite a GOT entry with the address of malicious code
The next call to that function jumps to the attacker's code
DEFENSE:
1. Full RELRO - GOT becomes read-only after startup
2. PIE + ASLR - Attacker can't predict the GOT address without an information leak
3. Stack canaries - Detect stack buffer overflows before they can be used to hijack control flow
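A tool can estimate which RELRO level a binary uses from data it already parses: a PT_GNU_RELRO program header indicates at least partial RELRO, and a bind-now marker in .dynamic upgrades that to full. The sketch below follows that common heuristic; `RelroLevel` and `relro_level` are hypothetical names.

```c
#include <elf.h>
#include <stddef.h>

typedef enum { RELRO_NONE, RELRO_PARTIAL, RELRO_FULL } RelroLevel;

/* Hypothetical helper: classifies a binary from its program headers and
 * its .dynamic entries (the walk stops at DT_NULL). */
RelroLevel relro_level(const Elf64_Phdr *ph, size_t phnum,
                       const Elf64_Dyn *dyn, size_t dyncount) {
    int has_relro = 0, bind_now = 0;

    for (size_t i = 0; i < phnum; i++)
        if (ph[i].p_type == PT_GNU_RELRO)
            has_relro = 1;

    /* -z now can show up as DT_BIND_NOW, DF_BIND_NOW, or DF_1_NOW. */
    for (size_t i = 0; i < dyncount && dyn[i].d_tag != DT_NULL; i++) {
        if (dyn[i].d_tag == DT_BIND_NOW)
            bind_now = 1;
        else if (dyn[i].d_tag == DT_FLAGS && (dyn[i].d_un.d_val & DF_BIND_NOW))
            bind_now = 1;
        else if (dyn[i].d_tag == DT_FLAGS_1 && (dyn[i].d_un.d_val & DF_1_NOW))
            bind_now = 1;
    }

    if (!has_relro)
        return RELRO_NONE;
    return bind_now ? RELRO_FULL : RELRO_PARTIAL;
}
```

The same information is visible with `readelf -l` (look for GNU_RELRO) and `readelf -d` (look for BIND_NOW or NOW flags).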
3. Project Specification
3.1 What You Will Build
A comprehensive toolkit that:
- ELF Inspector: Parse and display symbol tables, sections, and relocations from ELF files
- Link Map Analyzer: Show how symbols are resolved between object files
- PLT/GOT Tracer: Trace dynamic symbol resolution at runtime
- Interposition Toolkit: Demonstrate all three interposition techniques
3.2 Functional Requirements
Component 1: ELF Inspector (elfinspect)
# Basic usage
./elfinspect <elf-file>
# Options
./elfinspect --header hello.o # Show ELF header
./elfinspect --sections hello.o # List sections
./elfinspect --symbols hello.o # List symbols
./elfinspect --relocations hello.o # Show relocations
./elfinspect --dynamic /bin/ls # Show dynamic linking info
./elfinspect --all hello # Full analysis
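One way to handle these flags is `getopt_long` with a bitmask of requested views. This is a sketch, not a required design: the `SHOW_*` constants and `parse_options` are hypothetical, and treating "no option" as "show everything" is an assumption the specification does not state.

```c
#include <getopt.h>
#include <stddef.h>

enum {
    SHOW_HEADER   = 1 << 0,
    SHOW_SECTIONS = 1 << 1,
    SHOW_SYMBOLS  = 1 << 2,
    SHOW_RELOCS   = 1 << 3,
    SHOW_DYNAMIC  = 1 << 4,
    SHOW_ALL      = SHOW_HEADER | SHOW_SECTIONS | SHOW_SYMBOLS |
                    SHOW_RELOCS | SHOW_DYNAMIC
};

/* Hypothetical helper: parses the long options listed above. Returns the
 * selected SHOW_* mask (SHOW_ALL when no option is given, an assumption),
 * or -1 on an unknown option. Afterwards argv[optind] is the first
 * non-option argument, i.e. the ELF file. */
int parse_options(int argc, char **argv) {
    static const struct option longopts[] = {
        {"header",      no_argument, NULL, SHOW_HEADER},
        {"sections",    no_argument, NULL, SHOW_SECTIONS},
        {"symbols",     no_argument, NULL, SHOW_SYMBOLS},
        {"relocations", no_argument, NULL, SHOW_RELOCS},
        {"dynamic",     no_argument, NULL, SHOW_DYNAMIC},
        {"all",         no_argument, NULL, SHOW_ALL},
        {NULL, 0, NULL, 0}
    };
    int mask = 0, opt;

    optind = 1;     /* restart scanning so the function can be called again */
    while ((opt = getopt_long(argc, argv, "", longopts, NULL)) != -1) {
        if (opt == '?')
            return -1;          /* getopt_long already printed a message */
        mask |= opt;            /* each option's val is its SHOW_* bit */
    }
    return mask ? mask : SHOW_ALL;
}
```

None of the SHOW_* values collides with `'?'` (63), which is what lets the option values double as return codes.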
Component 2: Link Map Analyzer (linkmap)
# Analyze how symbols resolve across files
./linkmap main.o helper.o libfoo.a
# Show what each symbol needs and provides
./linkmap --deps main.o helper.o
Component 3: PLT/GOT Tracer (pltrace)
# Trace PLT/GOT activity during program execution
./pltrace ./hello_world
# Output: When each symbol is resolved and its final address
Component 4: Interposition Demos
# Compile-time demo
make compiletime-demo
# Link-time demo
make linktime-demo
# Runtime demo
make runtime-demo
3.3 Example Output
ELF Inspector Output
$ ./elfinspect --all hello.o
=== ELF HEADER ===
Class: ELF64
Data: 2's complement, little endian
Type: REL (Relocatable file)
Machine: Advanced Micro Devices X86-64
Entry point: 0x0
Section headers: 13 sections at offset 0x2d8
Program headers: 0 entries
=== SECTION TABLE ===
[Nr] Name Type Address Offset Size Flags
[ 0] NULL 0x0 0x000000 0x0
[ 1] .text PROGBITS 0x0 0x000040 0x28 AX
[ 2] .rela.text RELA 0x0 0x000210 0x48 I
[ 3] .data PROGBITS 0x0 0x000068 0x0 WA
[ 4] .bss NOBITS 0x0 0x000068 0x0 WA
[ 5] .rodata PROGBITS 0x0 0x000068 0x0e A
[ 6] .comment PROGBITS 0x0 0x000076 0x27 MS
[ 7] .note.GNU-stack PROGBITS 0x0 0x00009d 0x0
[ 8] .eh_frame PROGBITS 0x0 0x0000a0 0x38 A
[ 9] .rela.eh_frame RELA 0x0 0x000258 0x18 I
[10] .symtab SYMTAB 0x0 0x0000d8 0x120
[11] .strtab STRTAB 0x0 0x0001f8 0x13
[12] .shstrtab STRTAB 0x0 0x000270 0x61
Flags: A=Alloc, W=Write, X=Execute, M=Merge, S=Strings, I=Info
=== SYMBOL TABLE (.symtab) ===
Num: Value Size Type Bind Vis Ndx Name
0: 0x0 0 NOTYPE LOCAL DEFAULT UND
1: 0x0 0 FILE LOCAL DEFAULT ABS hello.c
2: 0x0 0 SECTION LOCAL DEFAULT 1 .text
3: 0x0 0 SECTION LOCAL DEFAULT 3 .data
4: 0x0 0 SECTION LOCAL DEFAULT 4 .bss
5: 0x0 0 SECTION LOCAL DEFAULT 5 .rodata
6: 0x0 40 FUNC GLOBAL DEFAULT 1 main
7: 0x0 0 NOTYPE GLOBAL DEFAULT UND printf
Summary: 8 symbols (1 function, 1 undefined, 6 other)
=== RELOCATION TABLE (.rela.text) ===
Offset Type Symbol Addend
0x00000009 R_X86_64_PC32 .rodata -0x4
0x00000013 R_X86_64_PLT32 printf -0x4
RELOCATION EXPLANATION:
- At .text+0x09: Reference to string literal in .rodata (PC-relative)
- At .text+0x13: Call to printf via PLT (PC-relative to PLT entry)
Link Map Analysis Output
$ ./linkmap main.o helper.o -lm
=== SYMBOL DEPENDENCY ANALYSIS ===
main.o:
DEFINES: main (FUNC, GLOBAL)
REQUIRES: printf (libc.so.6)
helper (helper.o)
sin (libm.so.6)
helper.o:
DEFINES: helper (FUNC, GLOBAL)
helper_data (DATA, GLOBAL)
REQUIRES: malloc (libc.so.6)
free (libc.so.6)
=== RESOLUTION RESULT ===
Symbol Defined In Address
------ ---------- -------
main main.o 0x401126
helper helper.o 0x401168
helper_data helper.o 0x404020
printf libc.so.6 <runtime>
malloc libc.so.6 <runtime>
free libc.so.6 <runtime>
sin libm.so.6 <runtime>
All symbols resolved successfully.
PLT/GOT Trace Output
$ ./pltrace ./test_program
=== PLT/GOT RESOLUTION TRACE ===
PID: 12345
Executable: ./test_program
[LOAD] Program loaded at base address: 0x555555554000
[LOAD] libc.so.6 loaded at: 0x7ffff7dc2000
[LOAD] libm.so.6 loaded at: 0x7ffff7b9e000
[RESOLVE] First call to printf:
PLT entry: 0x555555555030
GOT entry: 0x555555558018
Before: 0x555555555036 (PLT+6)
After: 0x7ffff7e45040 (printf in libc.so.6)
Elapsed: 0.043ms
[RESOLVE] First call to malloc:
PLT entry: 0x555555555040
GOT entry: 0x555555558020
Before: 0x555555555046 (PLT+6)
After: 0x7ffff7e6e0f0 (malloc in libc.so.6)
Elapsed: 0.021ms
[CALL] printf called 5 more times (no resolution, direct jump)
[CALL] malloc called 3 more times (no resolution, direct jump)
=== SUMMARY ===
Total PLT calls: 12
Lazy resolutions: 4
Direct GOT jumps: 8
Interposition Demo Output
$ make runtime-demo
Building malloc tracer...
gcc -fPIC -shared -o malloc_trace.so malloc_trace.c -ldl
Running test program with interposition:
LD_PRELOAD=./malloc_trace.so ./test_program
=== MALLOC TRACE ===
[malloc_trace] malloc(24) = 0x55a3bc8f12a0 [from main+0x1a]
[malloc_trace] malloc(100) = 0x55a3bc8f12c0 [from main+0x2f]
[malloc_trace] malloc(50) = 0x55a3bc8f1330 [from helper+0x12]
[malloc_trace] free(0x55a3bc8f12c0) [from main+0x58]
[malloc_trace] free(0x55a3bc8f12a0) [from main+0x62]
[malloc_trace] free(0x55a3bc8f1330) [from helper+0x1f]
=== SUMMARY ===
Total allocations: 3
Total frees: 3
Peak memory: 174 bytes
Memory leaked: 0 bytes
4. Solution Architecture
4.1 High-Level Design
+===========================================================================+
| ELF LINK MAP TOOLKIT |
+===========================================================================+
| |
| ┌───────────────┐  ┌───────────────┐  ┌───────────────┐  ┌───────────┐ |
| │  elfinspect   │  │    linkmap    │  │    pltrace    │  │ interpose │ |
| │ (ELF Parser)  │  │  (Analyzer)   │  │   (Tracer)    │  │  (Demos)  │ |
| └───────┬───────┘  └───────┬───────┘  └───────┬───────┘  └─────┬─────┘ |
|         │                  │                  │                │       |
|         └──────────────────┼──────────────────┼────────────────┘       |
|                            │                  │                        |
|                            ▼                  ▼                        |
|                   ┌───────────────────────────────────┐                |
|                   │          Common Library           │                |
|                   │                                   │                |
|                   │  elf_parser.c - ELF reading       │                |
|                   │  symbol.c     - Symbol handling   │                |
|                   │  reloc.c      - Relocations       │                |
|                   │  format.c     - Output format     │                |
|                   │  util.c       - Utilities         │                |
|                   │                                   │                |
|                   └───────────────────────────────────┘                |
| |
+===========================================================================+
4.2 Key Data Structures
// elf_types.h
#include <elf.h>
// Parsed ELF file representation
typedef struct {
int fd; // File descriptor
void *map; // Memory-mapped file
size_t size; // File size
Elf64_Ehdr *ehdr; // ELF header
Elf64_Shdr *shdrs; // Section headers
Elf64_Phdr *phdrs; // Program headers (if any)
char *shstrtab; // Section name string table
char *strtab; // Symbol string table
char *dynstr; // Dynamic string table
Elf64_Sym *symtab; // Symbol table
size_t symtab_count;
Elf64_Sym *dynsym; // Dynamic symbol table
size_t dynsym_count;
Elf64_Rela *rela_text; // .rela.text relocations
size_t rela_text_count;
Elf64_Rela *rela_plt; // .rela.plt relocations
size_t rela_plt_count;
Elf64_Dyn *dynamic; // .dynamic section
size_t dynamic_count;
} ElfFile;
// Symbol with resolved information
typedef struct {
char *name;
Elf64_Addr value;
Elf64_Xword size;
unsigned char type;
unsigned char bind;
uint16_t shndx;
const char *section_name;
int is_defined;
} Symbol;
// Relocation with resolved information
typedef struct {
Elf64_Addr offset;
uint32_t type;
uint32_t sym_idx;
char *symbol_name;
Elf64_Sxword addend;
const char *type_name;
char *explanation;
} Relocation;
4.3 Module Structure
elf-toolkit/
├── include/
│   ├── elf_parser.h          # ELF parsing functions
│   ├── symbol.h              # Symbol handling
│   ├── reloc.h               # Relocation handling
│   ├── format.h              # Output formatting
│   └── util.h                # Utilities
│
├── src/
│   ├── common/
│   │   ├── elf_parser.c      # Core ELF parsing
│   │   ├── symbol.c          # Symbol table operations
│   │   ├── reloc.c           # Relocation processing
│   │   ├── format.c          # Pretty-printing
│   │   └── util.c            # Memory, error handling
│   │
│   ├── elfinspect/
│   │   └── main.c            # elfinspect entry point
│   │
│   ├── linkmap/
│   │   ├── main.c            # linkmap entry point
│   │   └── resolver.c        # Symbol resolution logic
│   │
│   ├── pltrace/
│   │   ├── main.c            # pltrace entry point
│   │   └── tracer.c          # PLT/GOT tracing
│   │
│   └── interpose/
│       ├── compile_time.c    # Compile-time wrapper
│       ├── link_time.c       # Link-time wrapper
│       └── runtime.c         # LD_PRELOAD library
│
├── tests/
│   ├── test_elf_parser.c
│   ├── test_symbols.c
│   ├── test_reloc.c
│   ├── samples/              # Test ELF files
│   │   ├── hello.c
│   │   ├── multifile/
│   │   └── dynamic/
│   └── expected/             # Expected outputs
│
├── Makefile
└── README.md
4.4 Algorithm Overview
ELF Parsing Algorithm
1. Open file and memory-map it
2. Validate ELF magic number and class (32/64 bit)
3. Parse ELF header to get section header location
4. Load section headers into array
5. Find .shstrtab section (section name strings)
6. For each section:
- If .symtab: Parse symbol table, find .strtab
- If .dynsym: Parse dynamic symbols, find .dynstr
- If .rela.*: Parse relocation entries
- If .dynamic: Parse dynamic section entries
7. Cross-reference symbols with sections and strings
Symbol Resolution Algorithm
For each input object file:
Collect defined symbols (GLOBAL or WEAK binding, section index != UND)
Collect undefined symbols (section index == UND)
Build global symbol table:
For each file:
For each defined symbol:
If already in table:
Check for multiple strong definitions (error)
Apply strong/weak resolution rules
Else:
Add to table with source file
For each undefined symbol:
Search global table for definition
If not found:
Search libraries in order
If still not found: Report unresolved
5. Implementation Guide
5.1 Development Environment Setup
# Install required tools
sudo apt-get install build-essential binutils elfutils libelf-dev
# Verify installations
readelf --version
objdump --version
# Create project structure
mkdir -p elf-toolkit/{include,src/{common,elfinspect,linkmap,pltrace,interpose},tests/samples}
cd elf-toolkit
5.2 Phase 1: ELF Parser Foundation (Days 1-4)
Goals:
- Memory-map ELF files
- Parse ELF header
- Navigate section headers
Implementation:
// src/common/elf_parser.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <elf.h>
#include "elf_parser.h"
ElfFile *elf_open(const char *path) {
ElfFile *elf = calloc(1, sizeof(ElfFile));
if (!elf) return NULL;
// Open file
elf->fd = open(path, O_RDONLY);
if (elf->fd < 0) {
perror("open");
free(elf);
return NULL;
}
// Get file size
struct stat st;
if (fstat(elf->fd, &st) < 0) {
perror("fstat");
close(elf->fd);
free(elf);
return NULL;
}
elf->size = st.st_size;
// Memory map the file
elf->map = mmap(NULL, elf->size, PROT_READ, MAP_PRIVATE, elf->fd, 0);
if (elf->map == MAP_FAILED) {
perror("mmap");
close(elf->fd);
free(elf);
return NULL;
}
// Validate ELF magic
unsigned char *ident = (unsigned char *)elf->map;
if (ident[0] != 0x7f || ident[1] != 'E' ||
ident[2] != 'L' || ident[3] != 'F') {
fprintf(stderr, "Not an ELF file\n");
elf_close(elf);
return NULL;
}
// Check class (32 vs 64 bit)
if (ident[EI_CLASS] != ELFCLASS64) {
fprintf(stderr, "Only 64-bit ELF supported\n");
elf_close(elf);
return NULL;
}
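// Parse ELF header
elf->ehdr = (Elf64_Ehdr *)elf->map;
// Reject files that are too short or whose section header table is
// missing or lies outside the mapping (this parser relies on sections)
if (elf->size < sizeof(Elf64_Ehdr) ||
elf->ehdr->e_shnum == 0 ||
elf->ehdr->e_shstrndx >= elf->ehdr->e_shnum ||
elf->ehdr->e_shoff > elf->size ||
(elf->size - elf->ehdr->e_shoff) / sizeof(Elf64_Shdr) < elf->ehdr->e_shnum) {
fprintf(stderr, "Missing or truncated section header table\n");
elf_close(elf);
return NULL;
}
// Get section headers
elf->shdrs = (Elf64_Shdr *)((char *)elf->map + elf->ehdr->e_shoff);
// Get section name string table
Elf64_Shdr *shstrtab_hdr = &elf->shdrs[elf->ehdr->e_shstrndx];
elf->shstrtab = (char *)elf->map + shstrtab_hdr->sh_offset;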
// Get program headers (if present)
if (elf->ehdr->e_phnum > 0) {
elf->phdrs = (Elf64_Phdr *)((char *)elf->map + elf->ehdr->e_phoff);
}
return elf;
}
void elf_close(ElfFile *elf) {
if (!elf) return;
if (elf->map && elf->map != MAP_FAILED) {
munmap(elf->map, elf->size);
}
if (elf->fd >= 0) {
close(elf->fd);
}
free(elf);
}
const char *elf_section_name(ElfFile *elf, int idx) {
if (idx < 0 || idx >= elf->ehdr->e_shnum) return NULL;
return elf->shstrtab + elf->shdrs[idx].sh_name;
}
Elf64_Shdr *elf_find_section(ElfFile *elf, const char *name) {
for (int i = 0; i < elf->ehdr->e_shnum; i++) {
if (strcmp(elf_section_name(elf, i), name) == 0) {
return &elf->shdrs[i];
}
}
return NULL;
}
Checkpoint: Parse hello.o and print ELF header fields correctly.
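For unit-testing this checkpoint without touching the filesystem, the identification checks can be factored into a function that works on a byte buffer. The following is a minimal sketch, not part of the parser above: elf_check_ident and elf_type_name are hypothetical helper names, and the type strings only approximate what readelf prints.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 0 if buf looks like a 64-bit ELF header,
 * or a negative code naming the first check that failed. */
int elf_check_ident(const unsigned char *buf, size_t len) {
    if (len < sizeof(Elf64_Ehdr)) return -1;             /* too short for a header */
    if (memcmp(buf, ELFMAG, SELFMAG) != 0) return -2;    /* magic is not \x7fELF */
    if (buf[EI_CLASS] != ELFCLASS64) return -3;          /* not a 64-bit file */
    if (buf[EI_DATA] != ELFDATA2LSB && buf[EI_DATA] != ELFDATA2MSB)
        return -4;                                       /* unknown byte order */
    return 0;
}

/* Hypothetical helper: maps e_type to a readable name. */
const char *elf_type_name(unsigned type) {
    switch (type) {
    case ET_REL:  return "REL (Relocatable file)";
    case ET_EXEC: return "EXEC (Executable file)";
    case ET_DYN:  return "DYN (Shared object file)";
    case ET_CORE: return "CORE (Core file)";
    default:      return "UNKNOWN";
    }
}
```

Keeping the checks separate from mmap makes it easy to feed the function deliberately malformed headers in tests/test_elf_parser.c.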
5.3 Phase 2: Symbol Table Parsing (Days 5-7)
Goals:
- Parse .symtab and .dynsym
- Resolve symbol names from string tables
- Categorize symbols by type and binding
Implementation:
// src/common/symbol.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <elf.h>
#include "elf_parser.h"
#include "symbol.h"
int elf_load_symbols(ElfFile *elf) {
// Find .symtab section
Elf64_Shdr *symtab_hdr = elf_find_section(elf, ".symtab");
if (symtab_hdr) {
elf->symtab = (Elf64_Sym *)((char *)elf->map + symtab_hdr->sh_offset);
elf->symtab_count = symtab_hdr->sh_size / sizeof(Elf64_Sym);
// Get associated string table
Elf64_Shdr *strtab_hdr = &elf->shdrs[symtab_hdr->sh_link];
elf->strtab = (char *)elf->map + strtab_hdr->sh_offset;
}
// Find .dynsym section
Elf64_Shdr *dynsym_hdr = elf_find_section(elf, ".dynsym");
if (dynsym_hdr) {
elf->dynsym = (Elf64_Sym *)((char *)elf->map + dynsym_hdr->sh_offset);
elf->dynsym_count = dynsym_hdr->sh_size / sizeof(Elf64_Sym);
// Get associated string table
Elf64_Shdr *dynstr_hdr = &elf->shdrs[dynsym_hdr->sh_link];
elf->dynstr = (char *)elf->map + dynstr_hdr->sh_offset;
}
return 0;
}
const char *symbol_binding_name(unsigned char bind) {
switch (bind) {
case STB_LOCAL: return "LOCAL";
case STB_GLOBAL: return "GLOBAL";
case STB_WEAK: return "WEAK";
default: return "UNKNOWN";
}
}
const char *symbol_type_name(unsigned char type) {
switch (type) {
case STT_NOTYPE: return "NOTYPE";
case STT_OBJECT: return "OBJECT";
case STT_FUNC: return "FUNC";
case STT_SECTION: return "SECTION";
case STT_FILE: return "FILE";
default: return "UNKNOWN";
}
}
void print_symbol_table(ElfFile *elf, int use_dynamic) {
Elf64_Sym *symtab = use_dynamic ? elf->dynsym : elf->symtab;
size_t count = use_dynamic ? elf->dynsym_count : elf->symtab_count;
char *strtab = use_dynamic ? elf->dynstr : elf->strtab;
if (!symtab || count == 0) {
printf("No symbol table found.\n");
return;
}
printf("\n=== SYMBOL TABLE (%s) ===\n",
use_dynamic ? ".dynsym" : ".symtab");
printf("%6s: %-16s %5s %-7s %-6s %-8s %3s %s\n",
"Num", "Value", "Size", "Type", "Bind", "Vis", "Ndx", "Name");
for (size_t i = 0; i < count; i++) {
Elf64_Sym *sym = &symtab[i];
const char *name = strtab + sym->st_name;
printf("%6zu: %016lx %5lu %-7s %-6s %-8s ",
i,
(unsigned long)sym->st_value,
(unsigned long)sym->st_size,
symbol_type_name(ELF64_ST_TYPE(sym->st_info)),
symbol_binding_name(ELF64_ST_BIND(sym->st_info)),
"DEFAULT"); // Simplified visibility
// Print section index
if (sym->st_shndx == SHN_UNDEF) {
printf("%3s ", "UND");
} else if (sym->st_shndx == SHN_ABS) {
printf("%3s ", "ABS");
} else if (sym->st_shndx == SHN_COMMON) {
printf("%3s ", "COM");
} else {
printf("%3d ", sym->st_shndx);
}
printf("%s\n", name);
}
}
Checkpoint: Display symbol table matching readelf -s output.
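The listing above hardcodes "DEFAULT" for visibility, which will not match readelf -s for hidden or protected symbols. One possible way to close that gap is sketched below; symbol_visibility_name is a hypothetical name chosen to sit alongside symbol_type_name, and it relies only on the ELF64_ST_VISIBILITY macro and STV_* constants from <elf.h>.

```c
#include <elf.h>

/* Hypothetical companion to symbol_type_name(): names the visibility
 * stored in the low two bits of st_other. */
const char *symbol_visibility_name(unsigned char other) {
    switch (ELF64_ST_VISIBILITY(other)) {
    case STV_DEFAULT:   return "DEFAULT";
    case STV_INTERNAL:  return "INTERNAL";
    case STV_HIDDEN:    return "HIDDEN";
    case STV_PROTECTED: return "PROTECTED";
    default:            return "UNKNOWN";
    }
}
```

In print_symbol_table, the literal "DEFAULT" argument could then be replaced with symbol_visibility_name(sym->st_other).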
5.4 Phase 3: Relocation Handling (Days 8-10)
Goals:
- Parse relocation sections
- Explain each relocation type
- Show what needs patching
Implementation:
// src/common/reloc.c
#include <stdio.h>
#include <stdint.h>
#include <string.h> // strcmp
#include <elf.h>
#include "elf_parser.h"
#include "reloc.h"
typedef struct {
uint32_t type;
const char *name;
const char *formula;
} RelocInfo;
static const RelocInfo reloc_types[] = {
{R_X86_64_NONE, "R_X86_64_NONE", "None"},
{R_X86_64_64, "R_X86_64_64", "S + A"},
{R_X86_64_PC32, "R_X86_64_PC32", "S + A - P"},
{R_X86_64_GOT32, "R_X86_64_GOT32", "G + A"},
{R_X86_64_PLT32, "R_X86_64_PLT32", "L + A - P"},
{R_X86_64_COPY, "R_X86_64_COPY", "Copy symbol"},
{R_X86_64_GLOB_DAT, "R_X86_64_GLOB_DAT", "S (GOT entry)"},
{R_X86_64_JUMP_SLOT, "R_X86_64_JUMP_SLOT", "S (PLT/GOT)"},
{R_X86_64_RELATIVE, "R_X86_64_RELATIVE", "B + A"},
{R_X86_64_GOTPCREL, "R_X86_64_GOTPCREL", "G + GOT + A - P"},
{0, NULL, NULL}
};
const char *reloc_type_name(uint32_t type) {
for (int i = 0; reloc_types[i].name; i++) {
if (reloc_types[i].type == type)
return reloc_types[i].name;
}
return "UNKNOWN";
}
const char *reloc_formula(uint32_t type) {
for (int i = 0; reloc_types[i].name; i++) {
if (reloc_types[i].type == type)
return reloc_types[i].formula;
}
return "?";
}
int elf_load_relocations(ElfFile *elf) {
// Find .rela.text
Elf64_Shdr *rela_text = elf_find_section(elf, ".rela.text");
if (rela_text) {
elf->rela_text = (Elf64_Rela *)((char *)elf->map + rela_text->sh_offset);
elf->rela_text_count = rela_text->sh_size / sizeof(Elf64_Rela);
}
// Find .rela.plt
Elf64_Shdr *rela_plt = elf_find_section(elf, ".rela.plt");
if (rela_plt) {
elf->rela_plt = (Elf64_Rela *)((char *)elf->map + rela_plt->sh_offset);
elf->rela_plt_count = rela_plt->sh_size / sizeof(Elf64_Rela);
}
return 0;
}
void print_relocations(ElfFile *elf, const char *section_name) {
Elf64_Rela *rela;
size_t count;
if (strcmp(section_name, ".rela.text") == 0) {
rela = elf->rela_text;
count = elf->rela_text_count;
} else if (strcmp(section_name, ".rela.plt") == 0) {
rela = elf->rela_plt;
count = elf->rela_plt_count;
} else {
return;
}
if (!rela || count == 0) {
printf("No relocations in %s\n", section_name);
return;
}
printf("\n=== RELOCATION TABLE (%s) ===\n", section_name);
printf("%-16s %-20s %-20s %s\n",
"Offset", "Type", "Symbol", "Addend");
for (size_t i = 0; i < count; i++) {
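uint32_t sym_idx = ELF64_R_SYM(rela[i].r_info);
uint32_t type = ELF64_R_TYPE(rela[i].r_info);
// .rela.plt indexes .dynsym (see its sh_link); .rela.text indexes .symtab
int use_dyn = strcmp(section_name, ".rela.plt") == 0;
Elf64_Sym *syms = use_dyn ? elf->dynsym : elf->symtab;
size_t nsyms = use_dyn ? elf->dynsym_count : elf->symtab_count;
const char *strs = use_dyn ? elf->dynstr : elf->strtab;
const char *sym_name = "";
if (syms && strs && sym_idx < nsyms) {
sym_name = strs + syms[sym_idx].st_name;
}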
printf("%016lx %-20s %-20s %ld\n",
(unsigned long)rela[i].r_offset,
reloc_type_name(type),
sym_name,
(long)rela[i].r_addend);
}
// Print explanation
printf("\nExplanation:\n");
for (size_t i = 0; i < count; i++) {
uint32_t type = ELF64_R_TYPE(rela[i].r_info);
printf(" - Offset 0x%lx: %s (%s)\n",
(unsigned long)rela[i].r_offset,
reloc_type_name(type),
reloc_formula(type));
}
}
Checkpoint: Show relocations with human-readable explanations.
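To make the formulas in the table concrete, the sketch below evaluates two of them numerically, with S the symbol address, A the addend, and P the address of the field being patched. reloc_compute is a hypothetical helper covering only R_X86_64_64 and the 32-bit PC-relative types; it treats R_X86_64_PLT32 like PC32, which assumes the call is bound directly to the symbol rather than to a PLT entry.

```c
#include <elf.h>
#include <stdint.h>

/* Hypothetical helper: computes the value a linker would store for a
 * relocation. Returns 0 on success, -1 for types this sketch does not handle. */
int reloc_compute(uint32_t type, uint64_t S, int64_t A, uint64_t P,
                  uint64_t *out) {
    switch (type) {
    case R_X86_64_64:       /* absolute 64-bit field: S + A */
        *out = S + (uint64_t)A;
        return 0;
    case R_X86_64_PC32:     /* 32-bit PC-relative field: S + A - P */
    case R_X86_64_PLT32:    /* L + A - P, assuming L == S (no PLT entry used) */
        *out = (uint32_t)(S + (uint64_t)A - P);
        return 0;
    default:
        return -1;
    }
}
```

For example, a call whose 4-byte displacement field sits at P = 0x4004df, targeting a function at S = 0x4004e8 with addend A = -4, yields 0x4004e8 - 4 - 0x4004df = 0x5. The -4 compensates for the CPU computing the target relative to the end of the displacement field.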
5.5 Phase 4: Link Map Analyzer (Days 11-13)
Goals:
- Analyze multiple object files together
- Show symbol dependencies
- Simulate resolution
// src/linkmap/resolver.c
#include <stdio.h>
#include <elf.h>
#include "elf_parser.h"
#include "symbol.h"
typedef struct {
char *name;
char *source_file;
Elf64_Addr value;
int is_strong; // Strong vs weak
int is_defined;
} GlobalSymbol;
typedef struct {
GlobalSymbol *symbols;
size_t count;
size_t capacity;
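} GlobalSymTable;
// Merges one symbol into the table, applying the strong/weak rules.
// Its definition is not shown in this listing.
int add_symbol(GlobalSymTable *table, const char *name, const char *source_file,
Elf64_Addr value, int is_strong, int is_defined);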
int resolve_symbols(const char **files, int file_count) {
GlobalSymTable global_table = {0};
// Phase 1: Collect all symbols from all files
for (int i = 0; i < file_count; i++) {
ElfFile *elf = elf_open(files[i]);
if (!elf) continue;
elf_load_symbols(elf);
for (size_t j = 0; j < elf->symtab_count; j++) {
Elf64_Sym *sym = &elf->symtab[j];
unsigned char bind = ELF64_ST_BIND(sym->st_info);
// Only process global and weak symbols
if (bind != STB_GLOBAL && bind != STB_WEAK) continue;
const char *name = elf->strtab + sym->st_name;
if (!name || !*name) continue;
int is_defined = sym->st_shndx != SHN_UNDEF;
int is_strong = bind == STB_GLOBAL && is_defined;
// Add to global table with resolution rules
add_symbol(&global_table, name, files[i],
sym->st_value, is_strong, is_defined);
}
elf_close(elf);
}
// Phase 2: Check for unresolved symbols
for (size_t i = 0; i < global_table.count; i++) {
GlobalSymbol *sym = &global_table.symbols[i];
if (!sym->is_defined) {
printf("UNRESOLVED: %s (needed by %s)\n",
sym->name, sym->source_file);
}
}
return 0;
}
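The resolver above calls add_symbol without showing it. One possible implementation is sketched here under stated assumptions: a linear search instead of a hash table, names and file names copied with strdup, and return codes (0, -1 for a duplicate strong definition, -2 for allocation failure) that are choices of this sketch rather than anything the project prescribes. The struct definitions are repeated so the block compiles on its own.

```c
#define _POSIX_C_SOURCE 200809L   /* for strdup */
#include <elf.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char *name;
    char *source_file;
    Elf64_Addr value;
    int is_strong;
    int is_defined;
} GlobalSymbol;

typedef struct {
    GlobalSymbol *symbols;
    size_t count;
    size_t capacity;
} GlobalSymTable;

int add_symbol(GlobalSymTable *t, const char *name, const char *file,
               Elf64_Addr value, int is_strong, int is_defined) {
    for (size_t i = 0; i < t->count; i++) {
        GlobalSymbol *s = &t->symbols[i];
        if (strcmp(s->name, name) != 0) continue;
        if (!is_defined) return 0;            /* a reference never overrides */
        if (s->is_defined && s->is_strong && is_strong)
            return -1;                        /* two strong definitions: error */
        if (!s->is_defined || (is_strong && !s->is_strong)) {
            /* first definition seen, or strong replacing weak */
            char *f = strdup(file);
            if (!f) return -2;
            free(s->source_file);
            s->source_file = f;
            s->value = value;
            s->is_strong = is_strong;
            s->is_defined = 1;
        }
        return 0;                             /* weak vs weak: keep the first */
    }
    if (t->count == t->capacity) {            /* grow the array geometrically */
        size_t cap = t->capacity ? t->capacity * 2 : 16;
        GlobalSymbol *p = realloc(t->symbols, cap * sizeof *p);
        if (!p) return -2;
        t->symbols = p;
        t->capacity = cap;
    }
    GlobalSymbol *s = &t->symbols[t->count];
    s->name = strdup(name);
    s->source_file = strdup(file);
    if (!s->name || !s->source_file) {
        free(s->name);
        free(s->source_file);
        return -2;
    }
    s->value = value;
    s->is_strong = is_strong;
    s->is_defined = is_defined;
    t->count++;
    return 0;
}
```

Because an undefined reference is recorded with is_defined = 0 and later upgraded when a definition arrives, the Phase 2 loop only reports symbols that no input file ever defined.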
5.6 Phase 5: PLT/GOT Tracer (Days 14-16)
Goals:
- Use ptrace or LD_AUDIT to trace PLT calls
- Show GOT resolution in real-time
This phase is more advanced - you can use either:
- LD_AUDIT: A less invasive approach using the audit interface
- ptrace: Full control but more complex
// rtld-audit.c - the LD_AUDIT variant of src/pltrace/tracer.c
// An audit library (built as rtld-audit.so) that logs library loads
// and symbol bindings as the dynamic linker performs them
#define _GNU_SOURCE
#include <link.h>
#include <stdio.h>
unsigned int la_version(unsigned int version) {
return version;
}
unsigned int la_objopen(struct link_map *map, Lmid_t lmid,
uintptr_t *cookie) {
fprintf(stderr, "[LOAD] %s at %p\n",
map->l_name, (void *)map->l_addr);
return LA_FLG_BINDTO | LA_FLG_BINDFROM;
}
uintptr_t la_symbind64(Elf64_Sym *sym, unsigned int ndx,
uintptr_t *refcook, uintptr_t *defcook,
unsigned int *flags, const char *symname) {
fprintf(stderr, "[BIND] %s -> %p\n",
symname, (void *)sym->st_value);
return sym->st_value;
}
# Build and use
gcc -fPIC -shared -o rtld-audit.so rtld-audit.c
LD_AUDIT=./rtld-audit.so ./test_program
5.7 Phase 6: Interposition Toolkit (Days 17-19)
Implement all three interposition techniques with demonstration programs.
Runtime Interposition (most important):
// src/interpose/runtime.c - Comprehensive malloc tracer
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>
#include <execinfo.h>
#include <pthread.h>
static void *(*real_malloc)(size_t) = NULL;
static void (*real_free)(void *) = NULL;
static void *(*real_realloc)(void *, size_t) = NULL;
static void *(*real_calloc)(size_t, size_t) = NULL;
static size_t total_allocated = 0;
static size_t total_freed = 0;
static size_t current_allocated = 0;
static size_t peak_allocated = 0;
static size_t alloc_count = 0;
static size_t free_count = 0;
static pthread_mutex_t stats_lock = PTHREAD_MUTEX_INITIALIZER;
static int initialized = 0;
static int in_init = 0;
static void init(void) {
if (initialized || in_init) return;
in_init = 1;
real_malloc = dlsym(RTLD_NEXT, "malloc");
real_free = dlsym(RTLD_NEXT, "free");
real_realloc = dlsym(RTLD_NEXT, "realloc");
real_calloc = dlsym(RTLD_NEXT, "calloc");
if (!real_malloc || !real_free) {
fprintf(stderr, "Error loading malloc/free: %s\n", dlerror());
_exit(1);
}
initialized = 1;
in_init = 0;
}
static void print_caller(void) {
void *bt[3];
int n = backtrace(bt, 3);
char **syms = backtrace_symbols(bt, n);
if (syms && n > 2) {
fprintf(stderr, " [from %s]", syms[2]);
}
free(syms);
}
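// Per-thread reentrancy guard: fprintf() and backtrace_symbols() may allocate,
// and print_caller() calls free(), so without this the wrappers would re-enter
// themselves and recurse without bound. (__thread is a GCC/Clang extension.)
static __thread int in_hook = 0;
void *malloc(size_t size) {
if (!initialized) init();
void *ptr = real_malloc(size);
if (in_hook) return ptr; // Nested call from our own logging: do not trace
in_hook = 1;
pthread_mutex_lock(&stats_lock);
total_allocated += size;
current_allocated += size;
alloc_count++;
if (current_allocated > peak_allocated) {
peak_allocated = current_allocated;
}
pthread_mutex_unlock(&stats_lock);
fprintf(stderr, "[malloc_trace] malloc(%zu) = %p", size, ptr);
print_caller();
fprintf(stderr, "\n");
in_hook = 0;
return ptr;
}
void free(void *ptr) {
if (!initialized) init();
if (!ptr) return;
if (in_hook) { // Nested call from our own logging: free silently
real_free(ptr);
return;
}
in_hook = 1;
pthread_mutex_lock(&stats_lock);
free_count++;
// Note: We can't easily track the size of freed memory without extra bookkeeping
pthread_mutex_unlock(&stats_lock);
fprintf(stderr, "[malloc_trace] free(%p)", ptr);
print_caller();
fprintf(stderr, "\n");
in_hook = 0;
real_free(ptr);
}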
__attribute__((destructor))
void print_stats(void) {
fprintf(stderr, "\n=== MALLOC TRACE SUMMARY ===\n");
fprintf(stderr, "Total allocations: %zu\n", alloc_count);
fprintf(stderr, "Total frees: %zu\n", free_count);
fprintf(stderr, "Bytes allocated: %zu\n", total_allocated);
fprintf(stderr, "Peak memory: %zu bytes\n", peak_allocated);
if (alloc_count > free_count) {
fprintf(stderr, "WARNING: Potential memory leak (%zu unfreed allocs)\n",
alloc_count - free_count);
}
}
5.8 Phase 7: Integration and Polish (Days 20-21)
Goals:
- Combine all tools
- Add comprehensive error handling
- Create demonstration scripts
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | Test parsing functions | ELF header validation |
| Integration Tests | Test full tool output | Compare with readelf |
| Regression Tests | Ensure fixes don't break existing behavior | Known-good outputs |
| Edge Cases | Handle unusual inputs | Stripped binaries, malformed ELF |
6.2 Test Cases
// Test 1: Simple object file
// hello.c
#include <stdio.h>
int main() {
printf("Hello\n");
return 0;
}
// Expected: 1 undefined (printf, or puts if GCC rewrites the call), 1 defined (main)
// Test 2: Multiple symbols
// multi.c
int global_init = 42;
int global_uninit;
static int local_var = 10;
static void local_func(void) {}
void global_func(void) { local_func(); }
int main() { return global_init + global_uninit; }
// Expected: 4 global (global_init, global_uninit, global_func, main),
// 2 local (local_var, local_func), proper categorization
// Test 3: External dependencies
// extern.c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
int main() {
printf("%f\n", sin(1.0));
char *p = malloc(100);
free(p);
return 0;
}
// Expected: printf, sin, malloc, free as undefined
// (compile with -fno-builtin; otherwise GCC may fold sin(1.0) into a constant)
6.3 Validation Against Standard Tools
# Compare your output with standard tools
./elfinspect --symbols test.o > my_output.txt
readelf -s test.o > readelf_output.txt
diff my_output.txt readelf_output.txt
# Verify relocations
./elfinspect --relocations test.o
readelf -r test.o
7. Common Pitfalls and Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Wrong endianness | Garbage values | Check EI_DATA in ELF header |
| 32/64 bit confusion | Segfault | Check EI_CLASS, use correct structures |
| String table offset | Wrong symbol names | Verify sh_link field |
| Relocation addend | Wrong addresses | Use Rela not Rel on x86-64 |
| PLT vs GOT confusion | Wrong addresses traced | Study the PLT structure carefully |
| Thread safety in interposition | Crashes/deadlocks | Use thread-local storage or locks |
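For the endianness pitfall in the table above, one common approach is to decode multi-byte fields byte by byte according to EI_DATA instead of casting the mapped file to a struct. The sketch below illustrates the idea; elf_read_u16 and elf_read_u32 are hypothetical helper names, and any ei_data value other than ELFDATA2MSB is treated as little-endian.

```c
#include <elf.h>
#include <stdint.h>

/* Hypothetical helpers: the result does not depend on the host's byte
 * order, only on the ei_data value taken from e_ident[EI_DATA]. */
uint16_t elf_read_u16(const unsigned char *p, unsigned char ei_data) {
    return ei_data == ELFDATA2MSB
        ? (uint16_t)((p[0] << 8) | p[1])     /* big-endian file */
        : (uint16_t)((p[1] << 8) | p[0]);    /* little-endian file */
}

uint32_t elf_read_u32(const unsigned char *p, unsigned char ei_data) {
    uint32_t v = 0;
    for (int i = 0; i < 4; i++) {
        /* Walk from the most significant byte to the least significant. */
        int idx = (ei_data == ELFDATA2MSB) ? i : 3 - i;
        v = (v << 8) | p[idx];
    }
    return v;
}
```

With these, the e_machine bytes 3e 00 of a little-endian x86-64 file decode to 62 (EM_X86_64) on any host.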
7.2 Debugging Strategies
# Compare with standard tools
readelf -a binary > reference.txt
./elfinspect --all binary > mine.txt
diff reference.txt mine.txt
# Hexdump to verify parsing
hexdump -C binary | head -100
# GDB for ELF parsing issues
gdb ./elfinspect
(gdb) break elf_open
(gdb) run test.o
(gdb) print *elf->ehdr
# ltrace for interposition issues
ltrace ./test_program 2>&1 | grep malloc
7.3 Common GOT/PLT Debugging
# Examine PLT entries
objdump -d -j .plt binary
# Examine GOT entries
objdump -s -j .got.plt binary # -s dumps raw contents; the GOT holds data, not code
# Watch GOT changes with GDB
gdb ./program
(gdb) break main
(gdb) run
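(gdb) x/gx &'printf@got.plt' # Before first call: points back into the PLT stub
(gdb) next # Repeat until the program's own first printf call has returned
(gdb) x/gx &'printf@got.plt' # After resolution: an address inside libc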
8. Extensions and Challenges
8.1 Beginner Extensions
- JSON output: Machine-readable output format
- Symbol search: Find symbol by name across files
- Section hexdump: Show raw bytes of any section
- Dependency graph: DOT format for visualization
8.2 Intermediate Extensions
- DWARF debug info: Parse .debug_* sections for source mapping
- Version scripts: Handle symbol versioning
- Weak symbol handling: Full weak/strong resolution
- Archive support: Handle .a static libraries
8.3 Advanced Extensions
- Binary patching: Modify GOT entries at runtime
- Full LD_PRELOAD profiler: Track all allocations with size
- Cross-architecture: Support ARM64 ELF files
- Security scanner: Check RELRO, stack canary, PIE
9. Real-World Connections
9.1 Industry Applications
| Application | How This Project Helps |
|---|---|
| Debugging | Understand symbol resolution failures |
| Profiling | Interpose to measure function timing |
| Security | Analyze binary protections |
| Reverse Engineering | Understand program structure |
| Build Systems | Debug linking issues |
| Containers | Understand dynamic library loading |
9.2 Related Tools
- ldd: List shared library dependencies
- nm: List symbols
- readelf: Display ELF file information
- objdump: Disassemble and display
- patchelf: Modify ELF files
- ltrace/strace: Trace library/system calls
- Ghidra/IDA: Advanced binary analysis
9.3 Interview Relevance
This project prepares you to answer:
- "Explain how dynamic linking works"
- "What happens when you call a function in a shared library?"
- "How would you intercept all malloc calls in a program?"
- "Explain the PLT and GOT"
- "How does LD_PRELOAD work?"
- "What is position-independent code?"
10. Resources
10.1 Essential Reading
- CS:APP Chapter 7: "Linking" - Core concepts
- "Linkers and Loaders" by John Levine - Definitive reference
- ELF Specification: Official format documentation
- System V ABI: x86-64 supplement for relocations
10.2 Documentation
- man elf - ELF format overview
- man dlopen - Dynamic loading API
- man rtld-audit - Runtime linker audit interface
- man ld.so - Dynamic linker documentation
10.3 Online Resources
10.4 Related Projects in This Series
- Previous: P9 (Cache Lab++) - Memory hierarchy understanding
- Foundation: P1 (Toolchain Explorer) - Basic linking concepts
- Next: P11 (Signals + Processes) - Process execution context
11. Self-Assessment Checklist
Understanding
- I can explain the ELF file structure (header, sections, segments)
- I understand the difference between .symtab and .dynsym
- I can explain each common relocation type and when it's used
- I understand why PC-relative addressing is used in shared libraries
- I can trace through a PLT/GOT call step by step
- I understand lazy vs immediate binding
- I can explain all three interposition techniques
- I understand the security implications of GOT/PLT
Implementation
- My ELF parser correctly reads headers and sections
- Symbol table output matches readelf -s
- Relocation output matches readelf -r
- Link map analyzer identifies undefined symbols
- PLT/GOT tracer shows resolution events
- All three interposition demos work correctly
- Tools handle edge cases gracefully
Practical Skills
- I can debug linking errors using these tools
- I can profile a program using interposition
- I can explain why a symbol failed to resolve
- I can analyze a binary's dynamic dependencies
- I can use readelf, objdump, nm, ldd fluently
13. Real World Outcome
When you complete this project, you will have a comprehensive ELF analysis toolkit. Here is exactly what running your tools will look like:
ELF Header Analysis
$ ./elfmap --header /bin/ls
╔════════════════════════════════════════════════════════════════╗
║                      ELF HEADER ANALYSIS                       ║
║                            /bin/ls                             ║
╚════════════════════════════════════════════════════════════════╝
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
(ELF64, Little Endian, System V ABI)
Type: DYN (Shared object file) - Position Independent Executable
Machine: x86-64
Version: 1 (current)
Entry point: 0x6ab0
Program headers: 13 entries at offset 0x40 (56 bytes each)
Section headers: 31 entries at offset 0x22a78 (64 bytes each)
Flags: 0x0
Header size: 64 bytes
Section name string table: section 30
Symbol Table Analysis
$ ./elfmap --symbols /bin/ls | head -30
╔════════════════════════════════════════════════════════════════╗
║                     SYMBOL TABLE ANALYSIS                      ║
╚════════════════════════════════════════════════════════════════╝
.dynsym: 125 entries (dynamic symbols - used at runtime)
.symtab: [stripped - not present]
Dynamic Symbols:
────────────────────────────────────────────────────────────────────
Num Value Size Type Bind Vis Ndx Name
────────────────────────────────────────────────────────────────────
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __ctype_toupper_loc
2: 0000000000000000 0 FUNC GLOBAL DEFAULT UND getenv
3: 0000000000000000 0 FUNC GLOBAL DEFAULT UND sigprocmask
4: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __snprintf_chk
5: 0000000000000000 0 FUNC GLOBAL DEFAULT UND raise
6: 0000000000000000 0 FUNC GLOBAL DEFAULT UND free
7: 0000000000000000 0 FUNC GLOBAL DEFAULT UND abort
8: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __errno_location
9: 0000000000000000 0 FUNC GLOBAL DEFAULT UND strncmp
10: 0000000000000000 0 FUNC WEAK DEFAULT UND _ITM_deregisterTM...
...
Symbol Statistics:
FUNC: 89 (71.2%) NOTYPE: 15 (12.0%)
OBJECT: 18 (14.4%) TLS: 3 ( 2.4%)
GLOBAL: 95 (76.0%) WEAK: 20 (16.0%)
LOCAL: 10 ( 8.0%)
Undefined (UND): 78 (62.4%) - resolved at runtime from shared libraries
Relocation Analysis
$ ./elfmap --relocs /bin/ls
╔════════════════════════════════════════════════════════════════╗
║                   RELOCATION TABLE ANALYSIS                    ║
╚════════════════════════════════════════════════════════════════╝
.rela.dyn: 192 entries (data relocations - resolved at load time)
.rela.plt: 102 entries (PLT relocations - lazy binding)
────────────────────────────────────────────────────────────────────
.rela.dyn (Data Relocations):
────────────────────────────────────────────────────────────────────
Offset Type Symbol + Addend
0000000022fc8 R_X86_64_RELATIVE +0x13f20
0000000022fd0 R_X86_64_RELATIVE +0x13ee0
0000000023050 R_X86_64_RELATIVE +0x13f10
0000000022f88 R_X86_64_GLOB_DAT __ctype_toupper_loc + 0
0000000022f90 R_X86_64_GLOB_DAT __ctype_b_loc + 0
0000000022f98 R_X86_64_GLOB_DAT optind + 0
...
────────────────────────────────────────────────────────────────────
.rela.plt (PLT/GOT Relocations - Lazy Binding):
────────────────────────────────────────────────────────────────────
Offset Type Symbol
0000000023088 R_X86_64_JUMP_SLOT getenv
0000000023090 R_X86_64_JUMP_SLOT sigprocmask
0000000023098 R_X86_64_JUMP_SLOT raise
00000000230a0 R_X86_64_JUMP_SLOT free
00000000230a8 R_X86_64_JUMP_SLOT abort
...
Relocation Statistics:
R_X86_64_RELATIVE: 90 (46.9%) - PIE base address fixups
R_X86_64_GLOB_DAT: 12 ( 6.3%) - global data pointers
R_X86_64_JUMP_SLOT: 102 (53.1%) - PLT function pointers
PLT/GOT Tracing
$ ./elfmap --pltgot ./test_program
╔════════════════════════════════════════════════════════════════╗
║                        PLT/GOT ANALYSIS                        ║
╚════════════════════════════════════════════════════════════════╝
PLT (Procedure Linkage Table):
Address: 0x1060
Size: 256 bytes
Entries: 15 stubs
GOT (Global Offset Table):
Address: 0x3f70
Size: 168 bytes
Entries: 21 pointers
PLT → GOT Mapping:
────────────────────────────────────────────────────────────────────
PLT Entry GOT Entry Symbol State
────────────────────────────────────────────────────────────────────
0x1070 0x3f90 puts UNRESOLVED
0x1080 0x3f98 strlen UNRESOLVED
0x1090 0x3fa0 __libc_start_main UNRESOLVED
0x10a0 0x3fa8 malloc UNRESOLVED
0x10b0 0x3fb0 printf UNRESOLVED
...
PLT Stub Disassembly (printf@plt):
0x10b0: endbr64
0x10b4: bnd jmp QWORD PTR [rip+0x2ef5] # 0x3fb0 <printf@GLIBC_2.2.5>
0x10bb: nop DWORD PTR [rax+rax*1+0x0]
Library Interposition Demo
$ cat > malloc_trace.c << 'EOF'
#define _GNU_SOURCE
#include <stdio.h>
#include <dlfcn.h>
void *malloc(size_t size) {
static void *(*real_malloc)(size_t) = NULL;
if (!real_malloc) real_malloc = dlsym(RTLD_NEXT, "malloc");
void *ptr = real_malloc(size);
fprintf(stderr, "malloc(%zu) = %p\n", size, ptr);
return ptr;
}
EOF
$ gcc -shared -fPIC -o libmalloc_trace.so malloc_trace.c -ldl
$ LD_PRELOAD=./libmalloc_trace.so ls
malloc(472) = 0x5555557a3010
malloc(120) = 0x5555557a31f0
malloc(1024) = 0x5555557a3270
malloc(13) = 0x5555557a3680
...
Desktop Documents Downloads Pictures test_program
Interposition Comparison
$ ./interpose_demo
╔════════════════════════════════════════════════════════════════╗
║              LIBRARY INTERPOSITION DEMONSTRATION               ║
╚════════════════════════════════════════════════════════════════╝
Testing three interposition techniques with malloc():
1. COMPILE-TIME (wrapper function):
────────────────────────────────────────────────────────────
Technique: #define malloc(s) my_malloc(s)
Pros: Zero runtime overhead, catches all calls in our code
Cons: Requires source access, doesn't affect libraries
Result: malloc(1024) -> my_malloc captured, real_malloc returned 0x12340000
2. LINK-TIME (--wrap flag):
────────────────────────────────────────────────────────────
Technique: gcc -Wl,--wrap,malloc
Pros: No source changes needed, can wrap any symbol
Cons: Only rewrites references in the objects being linked (calls inside prebuilt shared libraries are untouched), must relink
Build: gcc -Wl,--wrap,malloc -o prog prog.o wrap.o
Result: __wrap_malloc called, forwarded to __real_malloc
3. RUN-TIME (LD_PRELOAD):
────────────────────────────────────────────────────────────
Technique: LD_PRELOAD=./libhook.so
Pros: Works on any binary, no recompilation
Cons: Only affects dynamically linked calls, slight overhead
Result: Interposed malloc() called 47 times during program execution
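The compile-time technique from the comparison above can be sketched in a single translation unit. This is an illustrative variant, not the project's compile_time.c: my_malloc takes extra file and line arguments (an addition of this sketch), and my_malloc_calls and make_buffer are hypothetical names used only for the demonstration.

```c
#include <stdio.h>
#include <stdlib.h>

size_t my_malloc_calls = 0;   /* hypothetical counter for the demo */

/* Defined BEFORE the macro below, so the malloc call inside the wrapper
 * still reaches the real allocator. */
void *my_malloc(size_t size, const char *file, int line) {
    void *ptr = malloc(size);
    my_malloc_calls++;
    fprintf(stderr, "[compile-time] malloc(%zu) = %p at %s:%d\n",
            size, ptr, file, line);
    return ptr;
}

/* From this point on, the preprocessor rewrites every malloc(...) call
 * in this translation unit to go through the wrapper. */
#define malloc(s) my_malloc((s), __FILE__, __LINE__)

/* Example caller: it allocates through the macro without knowing it. */
char *make_buffer(size_t n) {
    return malloc(n);
}
```

In a real project the #define would usually live in a header that is force-included (for example via a local malloc.h on the include path), which is why this technique needs source access and cannot see calls made inside libraries.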
14. The Core Question You're Answering
"How does a collection of separately compiled object files become a running program, and how does the dynamic linker resolve symbols across shared libraries at runtime?"
This project demystifies the "magic" that happens between compilation and execution. Understanding linking is essential for debugging mysterious undefined reference errors, creating plugins, hooking library calls for debugging/security, and writing code that plays well with shared libraries.
15. Concepts You Must Understand First
Before starting this project, ensure you have a solid grasp of these foundational concepts:
| Concept | Where to Learn | Why It Matters |
|---|---|---|
| Compilation process (preprocessor, compiler, assembler) | CS:APP 7.1 | Understanding what object files contain |
| Virtual memory basics | CS:APP 9.1-9.3 | How program sections map to memory |
| C pointers and memory layout | CS:APP 3.8-3.9 | Parsing binary structures, pointer arithmetic |
| Hexadecimal and binary | CS:APP 2.1 | Reading ELF byte patterns |
| File I/O in C (fopen, fread, mmap) | K&R Ch. 8 | Reading binary files efficiently |
| Static vs dynamic libraries | CS:APP 7.6-7.7 | Why linking works differently for each |
| Position-Independent Code (PIC) | CS:APP 7.12 | How shared libraries can load anywhere |
| x86-64 calling conventions | CS:APP 3.7 | Understanding function calls in PLT |
16. Questions to Guide Your Design
Work through these questions before writing any code:
- File Mapping: Should you use read()/fread() or mmap() to access the ELF file? What are the tradeoffs for a tool that needs to jump around the file?
- Endianness: The ELF header tells you the file's endianness. How will you handle reading multi-byte fields on a machine with different endianness?
- String Tables: Symbol names are stored as offsets into string tables. How will you safely convert an offset to a string pointer without buffer overflows?
- Section vs Segment: Sections are for the linker, segments are for the loader. When would you iterate sections vs segments?
- Symbol Resolution: Given an undefined symbol in your program, how would you find which shared library provides it? What data structures enable this lookup?
- GOT Modification: For runtime interposition, you might patch the GOT directly. What memory protection issues will you encounter? How can ptrace or /proc/pid/mem help?
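As one possible answer to the String Tables question, a lookup can check both that the offset lies inside the table and that a terminating NUL exists before the table ends. The sketch below assumes the caller passes the table's size (the sh_size of the string table section); strtab_get is a hypothetical helper name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the NUL-terminated name at `offset` inside
 * a string table of `size` bytes, or NULL if the offset is out of range
 * or the string would run past the end of the table. */
const char *strtab_get(const char *strtab, size_t size, size_t offset) {
    if (!strtab || offset >= size) return NULL;
    /* memchr inspects at most the bytes that remain in the table. */
    if (!memchr(strtab + offset, '\0', size - offset)) return NULL;
    return strtab + offset;
}
```

Callers can then print a placeholder such as "<corrupt>" when NULL comes back, instead of handing an unchecked pointer to printf.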
17. Thinking Exercise
Before coding, trace through what happens when this program runs:
// main.c
#include <stdio.h>
int main() {
printf("Hello\n");
printf("World\n");
return 0;
}
Compiled with: gcc -o hello main.c
Questions to answer by hand:
- When main calls printf the first time, what address does the call instruction target?
- What code executes at that address?
- How does the dynamic linker find printf in libc.so?
- What gets written to the GOT?
- When main calls printf the second time, what's different?
Draw a diagram showing the PLT stub, GOT entry, and libc's printf for both the first and second calls.
Solution
First call to printf:
- call printf@plt jumps to PLT stub at fixed offset (e.g., 0x1050)
- PLT stub: jmp *GOT[printf] - but GOT initially points back to PLT+6
- PLT stub pushes relocation index, jumps to PLT[0] (resolver)
- Resolver calls _dl_runtime_resolve(link_map, reloc_index)
- Dynamic linker searches loaded libraries for "printf" symbol
- Finds printf at 0x7ffff7a62840 in libc.so
- Patches GOT[printf] = 0x7ffff7a62840
- Jumps to printf
Second call to printf:
- call printf@plt jumps to same PLT stub
- PLT stub: jmp *GOT[printf] - now contains 0x7ffff7a62840
- Jumps directly to printf in libc - no resolver!
FIRST CALL:                             SECOND CALL:

main:                                   main:
  call 0x1050                             call 0x1050
      |                                       |
      v                                       v
PLT[printf] (0x1050):                   PLT[printf] (0x1050):
  jmp *GOT[printf]                        jmp *GOT[printf]
      |                                       |
      v                                       v
GOT[printf]:                            GOT[printf]:
  0x1056 (points back into the stub)      0x7fff... (address of printf)
      |                                       |
      v                                       v
PLT[printf]+6 (0x1056):                 libc.so:
  push reloc                              printf()
  jmp PLT[0]
      |
      v
PLT[0]:
  push &link_map
  jmp _dl_runtime_resolve
      |
      v
  Searches libc.so for "printf"
  Patches GOT[printf] = &printf
  Jumps to printf
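The first-call and second-call behavior can be imitated in plain C with a function pointer that starts out pointing at a resolver. This is only a toy model under stated assumptions: `got_slot`, `lazy_stub`, `real_target`, and `call_via_plt` are hypothetical names, and the real mechanism lives in PLT machine code and the dynamic linker, not in C.

```c
#include <stdio.h>

typedef int (*fn_t)(const char *);

static int resolver_calls;              /* how often the slow path ran */

/* Stands in for the real function in libc. */
static int real_target(const char *s)
{
    return puts(s);
}

static int lazy_stub(const char *s);

/* Models GOT[printf]: it initially points back at the resolver path. */
static fn_t got_slot = lazy_stub;

/* Models the push/jmp PLT[0] path: "resolve" the symbol, patch the
 * slot, then forward the original call to the real function. */
static int lazy_stub(const char *s)
{
    resolver_calls++;
    got_slot = real_target;             /* Patches GOT[printf] = &printf */
    return got_slot(s);
}

/* Models the PLT stub: always an indirect jump through the slot. */
int call_via_plt(const char *s)
{
    return got_slot(s);
}
```

Calling `call_via_plt("Hello")` and then `call_via_plt("World")` runs the resolver path once; the second call goes straight through the patched slot, which is the difference the exercise asks you to draw.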
18. The Interview Questions They'll Ask
After completing this project, you should be able to confidently answer these questions:
- "Explain the difference between .symtab and .dynsym. When is each used?"
  - .symtab is for static linking and debugging (often stripped in production)
  - .dynsym is for dynamic linking at runtime (always present in shared libs)
- "Walk me through what happens when you call a dynamically linked function like printf()."
  - Must cover PLT stub, GOT indirection, lazy binding, resolver
- "What is the difference between RTLD_NEXT and RTLD_DEFAULT in dlsym()?"
  - RTLD_NEXT searches libraries loaded after the current one
  - RTLD_DEFAULT searches all libraries in load order
- "Why do position-independent executables (PIE) need special relocation types?"
  - PIE can load at any address; R_X86_64_RELATIVE relocations adjust pointers
- "How would you intercept all malloc() calls in a program you didn't compile?"
  - LD_PRELOAD, ptrace, or GOT patching; discuss tradeoffs
- "What security implications arise from the PLT/GOT mechanism?"
  - GOT overwrite attacks, RELRO (RELocation Read-Only), ASLR
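To make the PIE answer concrete, the sketch below applies R_X86_64_RELATIVE entries the way a loader conceptually does: the stored value becomes load base plus addend. It is a simplification under assumptions: `apply_relative_relocs` is a hypothetical helper, and `image` is a plain buffer standing in for the object's memory starting at virtual address 0, whereas a real loader writes to `load_base + r_offset` in the mapped segments.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Apply every R_X86_64_RELATIVE entry in `rela` to `image`, as if the
 * object had been loaded at `load_base`. Other relocation types and
 * out-of-range offsets are skipped. Returns the number applied. */
size_t apply_relative_relocs(unsigned char *image, size_t image_size,
                             uint64_t load_base,
                             const Elf64_Rela *rela, size_t count)
{
    size_t applied = 0;

    for (size_t i = 0; i < count; i++) {
        if (ELF64_R_TYPE(rela[i].r_info) != R_X86_64_RELATIVE)
            continue;
        if (image_size < sizeof(uint64_t) ||
            rela[i].r_offset > image_size - sizeof(uint64_t))
            continue;

        /* No symbol lookup is needed: value = base + addend. */
        uint64_t value = load_base + (uint64_t)rela[i].r_addend;
        memcpy(image + rela[i].r_offset, &value, sizeof value);
        applied++;
    }
    return applied;
}
```

For example, with a load base of 0x555555554000 and an addend of 0x1139, the patched slot holds 0x555555555139. Because no symbol is involved, these relocations are cheap, which is why a PIE binary typically contains many of them.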
19. Hints in Layers
Use these hints progressively if you get stuck.
Hint Layer 1: Getting Started
- Use mmap() to map the ELF file, then cast pointers: Elf64_Ehdr *ehdr = (Elf64_Ehdr *)mapped_file;
- Include <elf.h> for all the structure definitions and macros
- Start with just printing the ELF header fields; don't try to parse everything at once
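A minimal sketch of this first layer might look like the following. The function names `map_file` and `elf64_header` are hypothetical, the code assumes a Linux system with <elf.h>, and it only handles 64-bit files.

```c
#include <elf.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a file read-only. Returns the base address, or NULL on failure.
 * On success *size_out receives the file size in bytes. */
void *map_file(const char *path, size_t *size_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    void *base = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid after close */
    if (base == MAP_FAILED)
        return NULL;

    *size_out = (size_t)st.st_size;
    return base;
}

/* Return the header if the buffer starts with a valid 64-bit ELF
 * identification, otherwise NULL. */
const Elf64_Ehdr *elf64_header(const void *base, size_t size)
{
    if (size < sizeof(Elf64_Ehdr))
        return NULL;

    const Elf64_Ehdr *ehdr = base;
    if (memcmp(ehdr->e_ident, ELFMAG, SELFMAG) != 0)
        return NULL;
    if (ehdr->e_ident[EI_CLASS] != ELFCLASS64)
        return NULL;
    return ehdr;
}
```

Validating the size and magic before touching any other field keeps later casts from reading past the end of a truncated or non-ELF file.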
Hint Layer 2: Navigating Sections
- Section headers start at file_base + ehdr->e_shoff
- Section name string table is section number ehdr->e_shstrndx
- To get a section name: strtab + shdr->sh_name where strtab is the shstrtab section's data
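The three steps above can be combined into one bounds-checked lookup. This is a sketch under assumptions: `section_name` is a hypothetical helper, the ELF magic and class are assumed to have been validated already, `base` is assumed suitably aligned (as an mmap'd file is), and the extended numbering used by files with very many sections (SHN_XINDEX) is not handled.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Return the name of section `idx`, or NULL if any offset or index
 * would fall outside the `size` bytes mapped at `base`. */
const char *section_name(const unsigned char *base, size_t size, size_t idx)
{
    const Elf64_Ehdr *ehdr = (const Elf64_Ehdr *)base;

    if (size < sizeof *ehdr || ehdr->e_shentsize != sizeof(Elf64_Shdr))
        return NULL;
    if (idx >= ehdr->e_shnum || ehdr->e_shstrndx >= ehdr->e_shnum)
        return NULL;

    /* The whole section header table must lie inside the file. */
    if (ehdr->e_shoff > size ||
        (size - ehdr->e_shoff) / sizeof(Elf64_Shdr) < ehdr->e_shnum)
        return NULL;

    const Elf64_Shdr *shdrs = (const Elf64_Shdr *)(base + ehdr->e_shoff);
    const Elf64_Shdr *strsec = &shdrs[ehdr->e_shstrndx];

    /* So must the section name string table. */
    if (strsec->sh_offset > size || strsec->sh_size > size - strsec->sh_offset)
        return NULL;

    Elf64_Word off = shdrs[idx].sh_name;
    if (off >= strsec->sh_size)
        return NULL;

    const char *strtab = (const char *)base + strsec->sh_offset;
    if (memchr(strtab + off, '\0', strsec->sh_size - off) == NULL)
        return NULL;
    return strtab + off;
}
```

Looping `idx` from 0 to `e_shnum - 1` and printing each result gives a rough equivalent of the name column in `readelf -S`.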
Hint Layer 3: Symbol Resolution
- Symbol table entries are in .dynsym section (type SHT_DYNSYM)
- Symbol names are in .dynstr section (linked via sh_link field)
- For each symbol: name = dynstr + sym->st_name
- Use ELF64_ST_BIND() and ELF64_ST_TYPE() macros on st_info
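As a sketch of how these pieces fit together, the code below decodes `st_info` and walks a symbol array. The helper names (`sym_bind_name`, `sym_type_name`, `dump_symbols`) are hypothetical, the caller is assumed to have located .dynsym and .dynstr already, and the string table is assumed to end with a NUL byte as the ELF specification requires.

```c
#include <elf.h>
#include <stddef.h>
#include <stdio.h>

/* Map the binding half of st_info to a readable label. */
const char *sym_bind_name(unsigned char st_info)
{
    switch (ELF64_ST_BIND(st_info)) {
    case STB_LOCAL:  return "LOCAL";
    case STB_GLOBAL: return "GLOBAL";
    case STB_WEAK:   return "WEAK";
    default:         return "OTHER";
    }
}

/* Map the type half of st_info to a readable label. */
const char *sym_type_name(unsigned char st_info)
{
    switch (ELF64_ST_TYPE(st_info)) {
    case STT_NOTYPE:  return "NOTYPE";
    case STT_OBJECT:  return "OBJECT";
    case STT_FUNC:    return "FUNC";
    case STT_SECTION: return "SECTION";
    case STT_FILE:    return "FILE";
    default:          return "OTHER";
    }
}

/* Print one line per symbol and return how many named symbols are
 * undefined (SHN_UNDEF), i.e. must be supplied by another object. */
size_t dump_symbols(const Elf64_Sym *syms, size_t count,
                    const char *strtab, size_t strtab_size)
{
    size_t undefined = 0;

    for (size_t i = 0; i < count; i++) {
        if (syms[i].st_name >= strtab_size)
            continue;               /* skip entries with a bad name offset */

        const char *name = strtab + syms[i].st_name;
        int is_undef = syms[i].st_shndx == SHN_UNDEF && name[0] != '\0';

        printf("%-7s %-7s %s%s\n",
               sym_bind_name(syms[i].st_info),
               sym_type_name(syms[i].st_info),
               name, is_undef ? " (undefined)" : "");
        if (is_undef)
            undefined++;
    }
    return undefined;
}
```

The undefined entries are the ones worth cross-referencing against the libraries listed in the dynamic section, since those are the imports the dynamic linker has to resolve.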
Hint Layer 4: Interposition
- For LD_PRELOAD: define a function with the same signature as the target
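- Use dlsym(RTLD_NEXT, "function_name") to get the real function (define _GNU_SOURCE before including <dlfcn.h>, otherwise RTLD_NEXT is not declared)
- Remember to compile with -fPIC -shared and link with -ldl (glibc 2.34 and later provide dlsym in libc itself, so -ldl is only required on older versions)
- For link-time: gcc -Wl,--wrap,malloc resolves references to malloc to __wrap_malloc, and references to __real_malloc to the original malloc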
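A run-time interposer following the LD_PRELOAD hints could be sketched as below. It hooks `puts` purely as an illustration; the counter and the accessor `interposed_puts_calls` are hypothetical additions for observing the hook, and the file and library names in the usage note are assumptions.

```c
#define _GNU_SOURCE             /* must precede all includes for RTLD_NEXT */
#include <dlfcn.h>
#include <stdio.h>

static unsigned long puts_calls;    /* hypothetical counter, for illustration */

/* Hypothetical accessor so callers can observe how often the hook ran. */
unsigned long interposed_puts_calls(void)
{
    return puts_calls;
}

/* Same name and signature as the libc function being interposed. */
int puts(const char *s)
{
    static int (*real_puts)(const char *);

    if (real_puts == NULL) {
        /* Look up the next definition of puts after this object. */
        real_puts = (int (*)(const char *))dlsym(RTLD_NEXT, "puts");
        if (real_puts == NULL)
            return EOF;
    }

    puts_calls++;
    fprintf(stderr, "[hook] puts(\"%s\")\n", s);   /* log, then forward */
    return real_puts(s);
}
```

Assuming the source is saved as hook.c, it could be built with `gcc -shared -fPIC -o libhook.so hook.c -ldl` and activated with `LD_PRELOAD=./libhook.so ./hello`. GCC commonly compiles a `printf` of a constant string ending in a newline into a call to `puts`, so this hook will often fire for such programs; check the disassembly of your own binary rather than relying on that. Logging with `fprintf` to stderr avoids calling `puts` from inside the hook, which would recurse.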
20. Books That Will Help
| Topic | Book | Specific Chapters |
|---|---|---|
| ELF format and linking fundamentals | CS:APP (3rd ed.) | Chapter 7 (entire chapter) |
| Static linking details | CS:APP (3rd ed.) | Sections 7.5-7.6 |
| Dynamic linking and PLT/GOT | CS:APP (3rd ed.) | Sections 7.7-7.12 |
| Library interposition | CS:APP (3rd ed.) | Section 7.13 |
| Advanced linker topics | Linkers and Loaders (Levine) | Chapters 1-4, 8-10 |
| ELF specification | TIS ELF Specification v1.2 | Entire document |
| Linux dynamic linker internals | The Linux Programming Interface (Kerrisk) | Chapters 41-42 |
| Binary analysis and security | Practical Binary Analysis (Andriesse) | Chapters 1-5 |
12. Submission / Completion Criteria
Minimum Viable Completion:
- ELF parser reads headers, sections, symbols
- Symbol table display works
- Basic relocation display works
- One interposition technique demonstrated
Full Completion:
- All four tool components working
- Comprehensive symbol analysis
- PLT/GOT tracing with explanations
- All three interposition techniques
- Clean error handling
Excellence:
- DWARF debug info parsing
- Cross-reference with source code
- Security analysis features
- Comprehensive test suite
- Production-quality documentation
This guide was expanded from CSAPP_3E_DEEP_LEARNING_PROJECTS.md. For the complete learning path, see the project index.