Project 6: Symbol Table Analyzer
Build a tool that parses ELF files and displays symbol table information, revealing how the linker resolves references between object files.
Quick Reference
| Attribute | Value |
|---|---|
| Language | C |
| Difficulty | Level 4 (Advanced) |
| Time | 1 Week (20-30 hours) |
| Book Reference | CS:APP Ch. 7, Advanced C and C++ Compiling Ch. 3-4 |
| Coolness | 4/5 - Linker Archaeology |
| Portfolio Value | High - Systems Programming Essential |
Learning Objectives
By completing this project, you will:
- Parse ELF binary format - Read and interpret ELF headers, section headers, and symbol tables
- Understand symbol binding - Distinguish LOCAL, GLOBAL, and WEAK symbols
- Decode symbol types - Identify FUNC, OBJECT, NOTYPE, FILE, and SECTION symbols
- Interpret symbol visibility - Understand DEFAULT, HIDDEN, PROTECTED visibility
- Read relocation entries - See how the linker patches addresses
- Debug linker errors systematically - Use symbol knowledge to solve undefined references
- Connect compilation to linking - See how source code becomes symbols
- Use ELF tools effectively - Complement nm, readelf, objdump with deep understanding
The Core Question You’re Answering
“When you compile a C program, what exactly is in the symbol table, and how does the linker use it to combine object files into an executable?”
Every C program you write becomes a collection of symbols - names that represent functions, variables, and other entities. The linker’s job is to:
- Collect all symbols from all object files
- Match every undefined symbol reference to exactly one defined symbol
- Patch (relocate) all addresses to their final locations
Understanding symbol tables transforms linker errors from mysterious incantations into straightforward debugging:
BEFORE THIS PROJECT:
───────────────────────────────────────────────────────────
$ gcc main.o helper.o -o program
undefined reference to `calculate'
collect2: error: ld returned 1 exit status
You: "Why doesn't it find calculate? It's right there in helper.c!"
*random modifications until it works*
AFTER THIS PROJECT:
───────────────────────────────────────────────────────────
$ nm main.o | grep calculate
U calculate <- UNDEFINED, needs resolution
$ nm helper.o | grep calculate
U calculate <- Also UNDEFINED! Not defined here either!
$ nm mathlib.o | grep calculate
0000000000000040 T calculate <- DEFINED (T=text), here it is!
You: "Aha! I need to link mathlib.o"
*one precise fix*
Theoretical Foundation
2.1 The ELF File Format
ELF (Executable and Linkable Format) is the standard binary format on Linux, BSD, and many other Unix systems. Every .o, .so, and executable follows this structure:
ELF FILE STRUCTURE
══════════════════════════════════════════════════════════════════════════════
┌──────────────────────────────────────────────────────────────────────────────┐
│ ELF HEADER │
│ ┌────────────────────────────────────────────────────────────────────────┐ │
│ │ Magic number: 0x7f 'E' 'L' 'F' │ │
│ │ Class: 32-bit or 64-bit │ │
│ │ Endianness: Little or Big │ │
│ │ Type: Relocatable (.o), Executable, Shared object (.so), Core dump │ │
│ │ Entry point address (for executables) │ │
│ │ Program header table offset │ │
│ │ Section header table offset │ │
│ │ Number of program headers │ │
│ │ Number of section headers │ │
│ └────────────────────────────────────────────────────────────────────────┘ │
├──────────────────────────────────────────────────────────────────────────────┤
│ PROGRAM HEADER TABLE │
│ (Used by loader at runtime - describes segments) │
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
│ │ LOAD segment: .text + .rodata (rx permissions) │ │
│ │ LOAD segment: .data + .bss (rw permissions) │ │
│ │ DYNAMIC segment: dynamic linking info │ │
│ │ ... │ │
│ └─────────────────────────────────────────────────────────────────────────┘ │
├──────────────────────────────────────────────────────────────────────────────┤
│ SECTIONS │
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
│ │ .text - Executable machine code │ │
│ │ .rodata - Read-only data (string literals, constants) │ │
│ │ .data - Initialized global/static variables │ │
│ │ .bss - Uninitialized global/static variables (zero-initialized) │ │
│ │ .symtab - Symbol table (this project's focus!) │ │
│ │ .strtab - String table for symbol names │ │
│ │ .rel.text - Relocations for .text section │ │
│ │ .rel.data - Relocations for .data section │ │
│ │ .dynsym - Dynamic symbol table (for shared libraries) │ │
│ │ .dynstr - Dynamic string table │ │
│ │ .debug_* - DWARF debugging information │ │
│ │ ... │ │
│ └─────────────────────────────────────────────────────────────────────────┘ │
├──────────────────────────────────────────────────────────────────────────────┤
│ SECTION HEADER TABLE │
│ (Used by linker - describes each section) │
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
│ │ Section 0: NULL (reserved) │ │
│ │ Section 1: .text - offset, size, flags (ALLOC|EXECINSTR) │ │
│ │ Section 2: .data - offset, size, flags (ALLOC|WRITE) │ │
│ │ Section 3: .symtab - offset, size, link to .strtab │ │
│ │ ... │ │
│ └─────────────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────────────┘
2.2 Symbol Table Entry Structure
Each symbol in the .symtab section has this structure (64-bit ELF):
SYMBOL TABLE ENTRY (Elf64_Sym)
══════════════════════════════════════════════════════════════════════════════
┌──────────────────────────────────────────────────────────────────────────────┐
│ struct Elf64_Sym { │
│ Elf64_Word st_name; /* Offset into string table (4 bytes) */ │
│ unsigned char st_info; /* Type (4 bits) + Binding (4 bits) */ │
│ unsigned char st_other; /* Visibility (2 bits) + reserved */ │
│ Elf64_Half st_shndx; /* Section header index (2 bytes) */ │
│ Elf64_Addr st_value; /* Symbol value/address (8 bytes) */ │
│ Elf64_Xword st_size; /* Symbol size in bytes (8 bytes) */ │
│ }; /* Total: 24 bytes per symbol */ │
└──────────────────────────────────────────────────────────────────────────────┘
FIELD BREAKDOWN:
────────────────
st_name: Index into .strtab section
0 means no name (anonymous symbol)
st_info: Packed byte containing:
┌─────────────────────────────────────────────┐
│ High 4 bits: BINDING │
│ STB_LOCAL (0) - Not visible outside file│
│ STB_GLOBAL (1) - Visible to all files │
│ STB_WEAK (2) - Lower priority global │
│ │
│ Low 4 bits: TYPE │
│ STT_NOTYPE (0) - Unspecified │
│ STT_OBJECT (1) - Data object (variable) │
│ STT_FUNC (2) - Function │
│ STT_SECTION(3) - Section symbol │
│ STT_FILE (4) - Source file name │
└─────────────────────────────────────────────┘
st_other: Visibility (bits 0-1):
STV_DEFAULT (0) - Normal visibility
STV_HIDDEN (2) - Not visible outside .so
STV_PROTECTED (3) - Visible but not preemptable
st_shndx: Section index where symbol is defined:
SHN_UNDEF (0) - Undefined (reference)
SHN_ABS (0xfff1) - Absolute value
SHN_COMMON(0xfff2) - Common symbol (tentative)
1, 2, 3... - Index into section headers
st_value: For defined symbols: offset within section
For undefined: 0
For COMMON: alignment requirement
st_size: Size of data object or function (0 if unknown)
2.3 Symbol Binding and Resolution
The linker resolves symbols based on binding:
SYMBOL RESOLUTION RULES
══════════════════════════════════════════════════════════════════════════════
┌──────────────────────────────────────────────────────────────────────────────┐
│ STRONG vs WEAK SYMBOLS │
├──────────────────────────────────────────────────────────────────────────────┤
│ │
│ STRONG SYMBOLS (STB_GLOBAL with definition): │
│ ───────────────────────────────────────────── │
│ • Functions │
│ • Initialized global variables │
│ │
│ WEAK SYMBOLS (STB_WEAK): │
│ ──────────────────────── │
│ • Uninitialized global variables (tentative definitions) │
│ • Explicitly marked with __attribute__((weak)) │
│ │
├──────────────────────────────────────────────────────────────────────────────┤
│ RESOLUTION RULES │
├──────────────────────────────────────────────────────────────────────────────┤
│ │
│ Rule 1: Multiple strong definitions of same symbol │
│ → LINKER ERROR: "multiple definition of X" │
│ │
│ Rule 2: One strong + multiple weak definitions │
│ → Strong wins (silently, no warning!) │
│ │
│ Rule 3: Multiple weak definitions, no strong │
│ → Linker picks one (typically first encountered) │
│ │
│ Rule 4: Undefined symbol with no definition │
│ → LINKER ERROR: "undefined reference to X" │
│ │
├──────────────────────────────────────────────────────────────────────────────┤
│ EXAMPLES │
├──────────────────────────────────────────────────────────────────────────────┤
│ │
│ // file1.c // file2.c │
│ int x = 1; /* STRONG */ int x = 2; /* STRONG */ │
│ │
│ → ERROR: multiple definition of 'x' │
│ │
│ ─────────────────────────────────────────────────────────────────────────── │
│ │
│ // file1.c // file2.c │
│ int x = 1; /* STRONG */ int x; /* WEAK (uninitialized) */ │
│ │
│ → OK: x = 1 (strong wins) │
│ │
│ ─────────────────────────────────────────────────────────────────────────── │
│ │
│ // file1.c // file2.c │
│ int x; /* WEAK */ int x; /* WEAK */ │
│ │
│ → OK: x = 0 (one weak chosen, both initialized to 0) │
│ │
└──────────────────────────────────────────────────────────────────────────────┘
2.4 Relocations: The Linker’s Patch List
Object files contain references to symbols whose final addresses aren’t known at compile time. Relocations tell the linker where and how to patch these references:
RELOCATION ENTRIES
══════════════════════════════════════════════════════════════════════════════
┌──────────────────────────────────────────────────────────────────────────────┐
│ struct Elf64_Rela { │
│ Elf64_Addr r_offset; /* Location to patch (8 bytes) */ │
│ Elf64_Xword r_info; /* Symbol index (32 bits) + Type (32 bits) */ │
│ Elf64_Sxword r_addend; /* Constant to add (8 bytes) */ │
│ }; │
└──────────────────────────────────────────────────────────────────────────────┘
COMMON RELOCATION TYPES (x86-64):
────────────────────────────────
R_X86_64_64 Absolute 64-bit address
S + A
(Symbol value + Addend)
R_X86_64_PC32 PC-relative 32-bit
S + A - P
(Symbol + Addend - Place)
R_X86_64_PLT32 Function call through PLT
L + A - P
(PLT entry + Addend - Place)
R_X86_64_GOTPCREL Data access through GOT
G + A - P
(GOT entry + Addend - Place)
EXAMPLE: How a function call gets relocated
═══════════════════════════════════════════
Source code:
────────────
extern int helper(int x);
int main() {
return helper(42);
}
Object file (main.o):
────────────────────
.text:
main:
push rbp
mov rbp, rsp
mov edi, 42
call 0x00000000 ← Placeholder! (to be filled by linker)
pop rbp
ret
.rela.text:
r_offset: 0x0a (offset of the call target in .text)
r_info: helper (symbol) + R_X86_64_PLT32 (type)
r_addend: -4
After linking (executable):
──────────────────────────
.text:
main:
push rbp
mov rbp, rsp
mov edi, 42
call 0x401040 ← Patched with helper's address!
pop rbp
ret
2.5 Why This Matters
Understanding symbol tables unlocks:
- Debugging Linker Errors: Know exactly why “undefined reference” occurs
- Library Design: Control symbol visibility for clean APIs
- Optimization: Understand inlining, LTO, dead code elimination
- Security: Symbol interposition, LD_PRELOAD hooks
- Reverse Engineering: Analyze binaries without source code
- Build Systems: Understand incremental linking, shared libraries
2.6 Historical Context: Before ELF
BINARY FORMAT EVOLUTION
══════════════════════════════════════════════════════════════════════════════
1970s: a.out (Assembler OUTput)
- Simple format from Unix
- Limited to one code + one data section
- No dynamic linking
1980s: COFF (Common Object File Format)
- Multiple sections
- Still limited symbol information
- Used by early Windows, Unix System V
1990s: ELF (Executable and Linkable Format)
- Created for System V Release 4
- Rich symbol tables, dynamic linking
- Became standard on Linux, BSD, Solaris
Other formats still in use:
- Mach-O: macOS and iOS
- PE/COFF: Windows executables (.exe, .dll)
- WebAssembly: Browser binaries
2.7 Common Misconceptions
| Misconception | Reality |
|---|---|
| “Static variables aren’t in symbol table” | They ARE, but with LOCAL binding |
| “Symbol table is only for debugging” | Linker REQUIRES it for symbol resolution |
| “All symbols have addresses” | UNDEFINED symbols have no address |
| “Function names are stored as-is” | C++ mangles names; C doesn’t |
| “Weak symbols cause warnings” | Strong silently overrides weak |
| “Symbol size is always accurate” | Compiler may not know size of extern |
Project Specification
3.1 What You Will Build
A mini nm/readelf clone called symtab that parses ELF files and displays symbol table information:
$ ./symtab main.o
================================================================================
SYMBOL TABLE ANALYZER
File: main.o
Type: Relocatable object file
================================================================================
SYMBOL TABLE (.symtab) - 12 entries
────────────────────────────────────────────────────────────────────────────────
Num Value Size Type Bind Vis Ndx Name
────────────────────────────────────────────────────────────────────────────────
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS main.c
2: 0000000000000000 0 SECTION LOCAL DEFAULT 1 .text
3: 0000000000000000 0 SECTION LOCAL DEFAULT 3 .data
4: 0000000000000000 0 SECTION LOCAL DEFAULT 4 .bss
5: 0000000000000000 0 SECTION LOCAL DEFAULT 5 .rodata
6: 0000000000000000 4 OBJECT LOCAL DEFAULT 3 local_static
7: 0000000000000000 42 FUNC GLOBAL DEFAULT 1 main
8: 0000000000000000 4 OBJECT GLOBAL DEFAULT 3 global_var
9: 0000000000000004 4 OBJECT GLOBAL DEFAULT 4 uninit_global
10: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND printf
11: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND helper_function
UNDEFINED SYMBOLS (need resolution):
────────────────────────────────────────────────────────────────────────────────
printf - standard library function
helper_function - user-defined, must be linked
RELOCATIONS (.rela.text) - 3 entries
────────────────────────────────────────────────────────────────────────────────
Offset Info Type Sym. Value Sym. Name + Addend
────────────────────────────────────────────────────────────────────────────────
00000000001a 000a00000004 R_X86_64_PLT32 0000000000000000 printf - 4
000000000024 000b00000004 R_X86_64_PLT32 0000000000000000 helper_function - 4
00000000002f 000500000002 R_X86_64_PC32 0000000000000000 .rodata - 4
SYMBOL SUMMARY:
────────────────────────────────────────────────────────────────────────────────
Total symbols: 12
Defined (GLOBAL): 3 (main, global_var, uninit_global)
Defined (LOCAL): 2 (local_static, main.c)
Undefined: 2 (printf, helper_function)
Section symbols: 4
================================================================================
3.2 Functional Requirements
- ELF Header Parsing:
- Validate ELF magic number
- Determine 32-bit vs 64-bit
- Identify file type (relocatable, executable, shared)
- Parse endianness
- Section Header Parsing:
- Locate all sections
- Find
.symtaband.strtab - Find
.dynsymand.dynstr(for shared libs) - Find relocation sections (
.rela.*,.rel.*)
- Symbol Table Display:
- List all symbols with full information
- Decode binding (LOCAL, GLOBAL, WEAK)
- Decode type (FUNC, OBJECT, NOTYPE, FILE, SECTION)
- Show section index or special value (UND, ABS, COM)
- Display symbol value and size
- Relocation Display:
- Show all relocations
- Decode relocation type
- Show target symbol
- Display addend
- Analysis Features:
- List undefined symbols
- Identify potential linking issues
- Compare two object files
- Export/import analysis
3.3 Non-Functional Requirements
- Portability: Handle both 32-bit and 64-bit ELF
- Robustness: Gracefully handle malformed files
- Performance: Efficiently mmap large files
- Accuracy: Match output with
nmandreadelf - Educational: Clear output with explanations
3.4 Example Usage and Output
# Basic symbol table dump
$ ./symtab main.o
# Show only undefined symbols
$ ./symtab --undefined main.o
Undefined symbols in main.o:
printf (libc function)
helper_function (user function)
# Show only relocations
$ ./symtab --reloc main.o
# Compare what file1.o exports vs what file2.o needs
$ ./symtab --match file1.o file2.o
Symbols exported by file1.o:
helper_function FUNC GLOBAL .text
shared_data OBJECT GLOBAL .data
Symbols needed by file2.o:
helper_function -> RESOLVED by file1.o
printf -> UNRESOLVED (needs libc)
# Analyze a shared library
$ ./symtab --dynamic /lib/x86_64-linux-gnu/libc.so.6 | head -20
# Verbose mode with educational explanations
$ ./symtab --verbose main.o
[INFO] Reading ELF header...
Magic: 0x7f 'E' 'L' 'F' (valid ELF file)
Class: 64-bit
Endian: Little endian
Type: Relocatable (ET_REL) - object file before linking
[INFO] Scanning section headers...
Found 13 sections
.symtab at offset 0x2a8, 288 bytes (12 symbols)
.strtab at offset 0x3c8, 95 bytes
.rela.text at offset 0x430, 72 bytes (3 relocations)
...
3.5 Real World Outcome
After building this tool, you can:
# Scenario: "undefined reference to calculate"
# Step 1: Check what main.o needs
$ ./symtab --undefined main.o
calculate NOTYPE GLOBAL UND <- Undefined!
# Step 2: Find where calculate is defined
$ ./symtab --defined mathlib.o | grep calculate
calculate FUNC GLOBAL .text <- Found!
# Step 3: Verify the fix works
$ gcc main.o mathlib.o -o program # Success!
Solution Architecture
4.1 High-Level Design
SYMTAB TOOL ARCHITECTURE
══════════════════════════════════════════════════════════════════════════════
┌──────────────────────────────────────────────────────────────────────────────┐
│ USER INTERFACE │
│ ┌────────────────────────────────────────────────────────────────────────┐ │
│ │ CLI Parser: --symbols, --reloc, --undefined, --verbose, --match │ │
│ └────────────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────────────────┐
│ ELF PARSER CORE │
│ ┌────────────────────────────────────────────────────────────────────────┐ │
│ │ elf_parser.c │ │
│ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │
│ │ │ ELF Header │ │ Section Hdrs │ │ Program Hdrs │ │ │
│ │ │ Parser │ │ Parser │ │ Parser │ │ │
│ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │
│ └────────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────────────────────────┐ │
│ │ section_reader.c │ │
│ │ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────────┐ │ │
│ │ │ .symtab Reader │ │ .strtab Reader │ │ .rel/.rela Reader │ │ │
│ │ │ (Symbol Table) │ │ (String Table) │ │ (Relocations) │ │ │
│ │ └──────────────────┘ └──────────────────┘ └──────────────────────┘ │ │
│ └────────────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────────────────┐
│ ANALYSIS ENGINE │
│ ┌────────────────────────────────────────────────────────────────────────┐ │
│ │ symbol_analyzer.c │ │
│ │ ┌────────────────┐ ┌────────────────┐ ┌────────────────────────┐ │ │
│ │ │ Classify │ │ Find Undefined │ │ Match Exports/Imports │ │ │
│ │ │ Symbols │ │ References │ │ Between Files │ │ │
│ │ └────────────────┘ └────────────────┘ └────────────────────────┘ │ │
│ └────────────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────────────────┐
│ OUTPUT FORMATTER │
│ ┌────────────────────────────────────────────────────────────────────────┐ │
│ │ display.c │ │
│ │ ┌────────────────┐ ┌────────────────┐ ┌────────────────────────┐ │ │
│ │ │ Table Format │ │ Verbose Mode │ │ JSON Export │ │ │
│ │ │ (nm-style) │ │ (Educational) │ │ (Machine-readable) │ │ │
│ │ └────────────────┘ └────────────────┘ └────────────────────────┘ │ │
│ └────────────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────────────┘
4.2 Key Components
| Component | File | Responsibility |
|---|---|---|
| Main/CLI | main.c |
Command-line parsing, orchestration |
| ELF Parser | elf_parser.c |
Read and validate ELF headers |
| Section Reader | section_reader.c |
Read specific sections |
| Symbol Table | symtab.c |
Parse and store symbol entries |
| Relocations | reloc.c |
Parse relocation entries |
| Analyzer | analyzer.c |
Symbol resolution analysis |
| Display | display.c |
Output formatting |
4.3 Data Structures
/* ELF file representation */
typedef struct {
int fd; /* File descriptor */
void *map; /* Memory-mapped file */
size_t size; /* File size */
int is_64bit; /* 64-bit or 32-bit */
int is_little_endian; /* Endianness */
uint16_t type; /* ET_REL, ET_EXEC, ET_DYN */
/* Header pointers (points into mmap'd region) */
union {
Elf32_Ehdr *ehdr32;
Elf64_Ehdr *ehdr64;
};
/* Section header table */
void *shdr; /* Section headers */
int shnum; /* Number of sections */
char *shstrtab; /* Section string table */
/* Symbol table */
void *symtab; /* Symbol table */
int symcount; /* Number of symbols */
char *strtab; /* Symbol string table */
/* Dynamic symbol table (for .so files) */
void *dynsym;
int dynsymcount;
char *dynstr;
} ElfFile;
/* Parsed symbol (normalized for 32/64 bit) */
typedef struct {
char *name; /* Symbol name (pointer into strtab) */
uint64_t value; /* Symbol value/address */
uint64_t size; /* Symbol size */
uint8_t type; /* STT_FUNC, STT_OBJECT, etc. */
uint8_t binding; /* STB_LOCAL, STB_GLOBAL, STB_WEAK */
uint8_t visibility; /* STV_DEFAULT, STV_HIDDEN, etc. */
uint16_t section_index; /* Section or special (SHN_UNDEF, etc.) */
const char *section_name; /* Section name (or "UND", "ABS", "COM") */
} Symbol;
/* Parsed relocation */
typedef struct {
uint64_t offset; /* Location to patch */
uint32_t type; /* Relocation type */
uint32_t symbol_index; /* Symbol table index */
int64_t addend; /* Constant addend */
char *symbol_name; /* Resolved symbol name */
} Relocation;
/* Analysis result */
typedef struct {
Symbol *defined; /* Defined symbols */
int defined_count;
Symbol *undefined; /* Undefined symbols */
int undefined_count;
Symbol *weak; /* Weak symbols */
int weak_count;
} SymbolAnalysis;
4.4 Algorithm Overview
ELF Parsing Algorithm:
PARSE ELF FILE
══════════════════════════════════════════════════════════════════════════════
1. OPEN AND MAP FILE
────────────────────────────────────────────────────────────────────────────
fd = open(filename, O_RDONLY)
stat(fd, &st)
map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0)
2. VALIDATE ELF HEADER
────────────────────────────────────────────────────────────────────────────
Check magic: map[0..3] == {0x7f, 'E', 'L', 'F'}
Determine class: map[4] == 1 (32-bit) or 2 (64-bit)
Determine endian: map[5] == 1 (little) or 2 (big)
Read type: ehdr->e_type
3. LOCATE SECTION HEADER TABLE
────────────────────────────────────────────────────────────────────────────
shoff = ehdr->e_shoff // Offset to section headers
shnum = ehdr->e_shnum // Number of sections
shentsize = ehdr->e_shentsize // Size of each header
shstrndx = ehdr->e_shstrndx // Index of section name string table
shdr = map + shoff
shstrtab = map + shdr[shstrndx].sh_offset
4. FIND SYMBOL TABLE SECTION
────────────────────────────────────────────────────────────────────────────
for each section:
if shdr[i].sh_type == SHT_SYMTAB:
symtab = map + shdr[i].sh_offset
symcount = shdr[i].sh_size / shdr[i].sh_entsize
strtab_idx = shdr[i].sh_link // Link to string table
strtab = map + shdr[strtab_idx].sh_offset
5. PARSE EACH SYMBOL
────────────────────────────────────────────────────────────────────────────
for i = 0 to symcount:
sym = symtab[i]
name = strtab + sym.st_name
type = ELF_ST_TYPE(sym.st_info)
binding = ELF_ST_BIND(sym.st_info)
visibility = ELF_ST_VISIBILITY(sym.st_other)
section = sym.st_shndx
6. FIND AND PARSE RELOCATIONS
────────────────────────────────────────────────────────────────────────────
for each section:
if shdr[i].sh_type == SHT_RELA or SHT_REL:
Parse relocation entries
Link to symbol table via sh_link
Implementation Guide
5.1 Development Environment Setup
# Required tools
sudo apt-get install gcc gdb binutils build-essential
# Verify installations
gcc --version
readelf --version
nm --version
objdump --version
# Create project structure
mkdir -p symtab/{src,include,test,docs}
cd symtab
5.2 Project Structure
symtab/
├── src/
│ ├── main.c # Entry point, CLI parsing
│ ├── elf_parser.c # ELF file opening and header parsing
│ ├── section_reader.c # Section location and reading
│ ├── symtab.c # Symbol table parsing
│ ├── reloc.c # Relocation parsing
│ ├── analyzer.c # Symbol analysis (undefined, matching)
│ └── display.c # Output formatting
├── include/
│ ├── symtab.h # Main header with types
│ ├── elf_parser.h # ELF parsing declarations
│ └── display.h # Display function declarations
├── test/
│ ├── test_basic.c # Simple test file
│ ├── test_multi.c # Multiple files test
│ └── run_tests.sh # Test script
├── Makefile
└── README.md
5.3 The Core Question You’re Answering
“When you compile a C program, what exactly is in the symbol table, and how does the linker use it to combine object files into an executable?”
5.4 Concepts You Must Understand First
Before implementing, ensure you can answer these questions:
- What is the difference between DEFINED and UNDEFINED symbols?
- DEFINED: Symbol has a definition (code or data) in this file
- UNDEFINED: Symbol is referenced but defined elsewhere
- How: Check
st_shndx- ifSHN_UNDEF(0), it’s undefined - CS:APP Ch. 7.5
- What is symbol binding (LOCAL vs GLOBAL)?
- LOCAL: Only visible within the compilation unit (file)
- GLOBAL: Visible to other files during linking
- How: Extract from
st_infousingELF_ST_BIND() - CS:APP Ch. 7.5
- What are WEAK symbols and when are they used?
- Lower priority than GLOBAL - can be overridden
- Uninitialized globals are tentatively defined as WEAK
- How:
STB_WEAKbinding, linker resolution rules - CS:APP Ch. 7.6.1
- What is a relocation and why is it needed?
- Object files don’t know final addresses
- Relocations are “fix-up” instructions for the linker
- How:
.rel.textand.rela.textsections - CS:APP Ch. 7.7
- What is the string table and how does it work?
- Symbol names are stored separately in
.strtab st_nameis an offset into the string table- Allows variable-length names without wasting space
- ELF specification
- Symbol names are stored separately in
5.5 Questions to Guide Your Design
Work through these before coding:
- How will you handle both 32-bit and 64-bit ELF?
- Header structures differ in size
- Consider using unions or conditional parsing
- Test with both architectures
- How will you access the string table efficiently?
- Memory-map the file for direct pointer access
- Or read string table into memory separately
- How will you decode relocation types?
- Types are architecture-specific (x86-64, ARM, etc.)
- Consider creating a type-to-string mapping table
- How will you match symbol imports/exports between files?
- Build a hash table of defined symbols
- Check each undefined against the hash table
- How will you handle malformed ELF files?
- Validate magic number, bounds check offsets
- Fail gracefully with informative errors
5.6 Thinking Exercise
Before writing code, trace through this example manually:
/* main.c */
extern int helper(int x);
int global_var = 42;
int main(void) {
return helper(global_var);
}
/* helper.c */
int helper(int x) {
return x * 2;
}
Exercise:
- List all symbols that will be in
main.o’s symbol table - Classify each as DEFINED/UNDEFINED, LOCAL/GLOBAL
- What relocations will
main.ohave? - After linking, how are the relocations resolved?
Expected Answers:
main.o symbols:
main.c FILE LOCAL ABS <- Source file name
main FUNC GLOBAL .text <- Defined function
global_var OBJECT GLOBAL .data <- Defined variable
helper NOTYPE GLOBAL UND <- Undefined reference!
main.o relocations:
call helper R_X86_64_PLT32 <- Needs helper's address
mov global_var R_X86_64_PC32 <- Needs global_var's address (PC-relative)
5.7 Hints in Layers
Hint 1: Starting with ELF Header
Use the system headers for ELF structures:
#include <elf.h>
/* Check magic number */
int is_elf(unsigned char *data) {
return data[0] == 0x7f &&
data[1] == 'E' &&
data[2] == 'L' &&
data[3] == 'F';
}
/* Determine 32 vs 64 bit */
int is_64bit(unsigned char *data) {
return data[EI_CLASS] == ELFCLASS64;
}
Hint 2: Memory Mapping the File
Memory mapping is more efficient than read():
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
ElfFile *open_elf(const char *filename) {
ElfFile *elf = calloc(1, sizeof(ElfFile));
elf->fd = open(filename, O_RDONLY);
if (elf->fd < 0) return NULL;
struct stat st;
fstat(elf->fd, &st);
elf->size = st.st_size;
elf->map = mmap(NULL, elf->size, PROT_READ, MAP_PRIVATE, elf->fd, 0);
if (elf->map == MAP_FAILED) return NULL;
return elf;
}
Hint 3: Finding the Symbol Table
Loop through section headers to find SHT_SYMTAB:
void find_symtab(ElfFile *elf) {
Elf64_Ehdr *ehdr = elf->map;
Elf64_Shdr *shdr = elf->map + ehdr->e_shoff;
/* First, get section name string table */
Elf64_Shdr *shstrtab_hdr = &shdr[ehdr->e_shstrndx];
char *shstrtab = elf->map + shstrtab_hdr->sh_offset;
for (int i = 0; i < ehdr->e_shnum; i++) {
char *name = shstrtab + shdr[i].sh_name;
if (shdr[i].sh_type == SHT_SYMTAB) {
elf->symtab = elf->map + shdr[i].sh_offset;
elf->symcount = shdr[i].sh_size / shdr[i].sh_entsize;
/* sh_link points to associated string table */
Elf64_Shdr *strtab_hdr = &shdr[shdr[i].sh_link];
elf->strtab = elf->map + strtab_hdr->sh_offset;
}
}
}
Hint 4: Decoding Symbol Info
Use the ELF macros to extract type and binding:
void print_symbol(Elf64_Sym *sym, char *strtab) {
char *name = strtab + sym->st_name;
/* Extract binding and type from st_info */
unsigned char binding = ELF64_ST_BIND(sym->st_info);
unsigned char type = ELF64_ST_TYPE(sym->st_info);
unsigned char vis = ELF64_ST_VISIBILITY(sym->st_other);
const char *bind_str[] = {"LOCAL", "GLOBAL", "WEAK"};
const char *type_str[] = {"NOTYPE", "OBJECT", "FUNC", "SECTION", "FILE"};
printf("%016lx %5lu %-7s %-6s %s\n",
sym->st_value,
sym->st_size,
type_str[type < 5 ? type : 0],
bind_str[binding < 3 ? binding : 0],
name);
}
Hint 5: Section Index Special Values
Handle special section indices:
const char *get_section_name(uint16_t shndx, ElfFile *elf) {
switch (shndx) {
case SHN_UNDEF: return "UND"; /* Undefined */
case SHN_ABS: return "ABS"; /* Absolute */
case SHN_COMMON: return "COM"; /* Common (unallocated) */
default:
if (shndx < elf->shnum) {
return elf->shstrtab + elf->shdr[shndx].sh_name;
}
return "???";
}
}
Hint 6: Reading Relocations
Relocations are in .rela.text or .rel.text:
void read_relocations(ElfFile *elf, Elf64_Shdr *rela_shdr) {
Elf64_Rela *rela = elf->map + rela_shdr->sh_offset;
int count = rela_shdr->sh_size / rela_shdr->sh_entsize;
/* sh_link points to symbol table, sh_info to target section */
Elf64_Shdr *symtab_hdr = &elf->shdr[rela_shdr->sh_link];
Elf64_Sym *symtab = elf->map + symtab_hdr->sh_offset;
for (int i = 0; i < count; i++) {
uint32_t sym_idx = ELF64_R_SYM(rela[i].r_info);
uint32_t type = ELF64_R_TYPE(rela[i].r_info);
Elf64_Sym *sym = &symtab[sym_idx];
char *sym_name = elf->strtab + sym->st_name;
printf("Offset: %lx, Type: %d, Symbol: %s, Addend: %ld\n",
rela[i].r_offset, type, sym_name, rela[i].r_addend);
}
}
5.8 The Interview Questions They’ll Ask
After completing this project, you’ll be ready for:
- “What is the symbol table and what does it contain?”
- Table of all named entities (functions, variables)
- Each entry has: name, type, binding, section, value, size
- Linker uses it to resolve references between files
- “Explain the difference between LOCAL and GLOBAL symbols”
- LOCAL: Only visible within the file (like
staticin C) - GLOBAL: Visible to other files during linking
- LOCAL symbols can have duplicate names across files
- LOCAL: Only visible within the file (like
- “What is a WEAK symbol and when would you use one?”
- Lower priority than GLOBAL during resolution
- Uninitialized globals are tentatively WEAK
- Useful for providing default implementations
__attribute__((weak))for intentional weak symbols
- “How does the linker resolve undefined references?”
- Collect all symbols from all object files
- For each undefined, search for matching defined symbol
- Multiple definitions of same symbol? Apply strong/weak rules
- Still undefined after all files? Linker error
- “What is a relocation entry?”
- Location in code/data that needs patching
- References symbol by index
- Type determines how to compute final value
- Addend for offset calculations
- “Why do we need separate .symtab and .strtab sections?”
- Symbol entries are fixed-size (24 bytes on 64-bit)
- Names are variable length
- Storing names separately allows efficient symbol table scanning
- Multiple symbols can share common prefixes
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Symbol tables | CS:APP | Ch. 7.5 |
| Symbol resolution | CS:APP | Ch. 7.6 |
| Relocations | CS:APP | Ch. 7.7 |
| ELF format | Advanced C and C++ Compiling | Ch. 3-4 |
| Dynamic linking | CS:APP | Ch. 7.10-7.12 |
| Linker scripts | Linkers and Loaders | Ch. 3 |
5.10 Implementation Phases
Phase 1: ELF Header Parsing (Day 1)
Goals:
- Open and memory-map ELF files
- Validate ELF magic number
- Parse ELF header (32-bit and 64-bit)
- Display basic file information
Deliverable:
$ ./symtab main.o
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 ...
Class: ELF64
Data: Little endian
Type: REL (Relocatable file)
Machine: x86-64
Phase 2: Section Headers (Day 2)
Goals:
- Parse section header table
- List all sections with attributes
- Find
.symtaband.strtab
Deliverable:
$ ./symtab --sections main.o
Section Headers:
[Nr] Name Type Addr Off Size Flags
[ 0] NULL 00000000 00000000 00000000
[ 1] .text PROGBITS 00000000 00000040 00000042 AX
[ 2] .rela.text RELA 00000000 000002a0 00000048 I
[ 3] .data PROGBITS 00000000 00000084 00000004 WA
...
Phase 3: Symbol Table Parsing (Days 3-4)
Goals:
- Parse all symbols from
.symtab - Decode type, binding, visibility
- Display in nm-style format
Deliverable:
$ ./symtab main.o
Symbol Table (.symtab):
Num Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 42 FUNC GLOBAL DEFAULT 1 main
2: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND helper
Phase 4: Relocations (Day 5)
Goals:
- Parse
.rela.*and.rel.*sections - Decode relocation types
- Show target symbols
Deliverable:
$ ./symtab --reloc main.o
Relocations (.rela.text):
Offset Type Sym Addend
000000000012 R_X86_64_PLT32 helper -4
Phase 5: Analysis Features (Days 6-7)
Goals:
- List undefined symbols
- Match exports/imports between files
- Summary statistics
Deliverable:
$ ./symtab --undefined main.o
Undefined symbols:
helper NOTYPE GLOBAL
printf NOTYPE GLOBAL
$ ./symtab --match main.o helper.o
Resolution analysis:
helper: RESOLVED by helper.o (FUNC, .text)
printf: UNRESOLVED (external library)
5.11 Key Implementation Decisions
-
Memory mapping vs read(): Use
mmap()for efficiency and direct pointer access -
32-bit vs 64-bit: Use unions or separate code paths; structures differ in size
-
Endianness: Most systems are little-endian; consider byte-swapping for portability
-
Error handling: Validate all offsets against file size before dereferencing
-
Output format: Match
nmandreadelfoutput for easy comparison
Testing Strategy
Test Against System Tools
# Compare your output with nm
$ ./symtab main.o > my_output.txt
$ nm main.o > nm_output.txt
$ diff my_output.txt nm_output.txt
# Compare with readelf
$ ./symtab --verbose main.o > my_output.txt
$ readelf -s main.o > readelf_output.txt
Test Cases
| Test | Input | Expected |
|---|---|---|
| Valid relocatable | main.o |
All symbols displayed |
| Valid executable | /bin/ls |
Symbols + program headers |
| Shared library | libc.so.6 |
Dynamic symbols shown |
| Stripped binary | stripped |
No .symtab, only .dynsym |
| Invalid file | random.txt |
“Not an ELF file” error |
| 32-bit ELF | hello32.o |
Correct parsing |
| Missing sections | nosymtab.o |
Graceful “no symbols” message |
Validation Script
#!/bin/bash
# test/run_tests.sh
echo "Testing symbol table analyzer..."
# Test 1: Basic object file
gcc -c test/test_basic.c -o test/test_basic.o
./symtab test/test_basic.o > /dev/null && echo "PASS: Basic parsing" || echo "FAIL"
# Test 2: Compare with nm
nm test/test_basic.o | grep -v "^$" > /tmp/nm_out.txt
./symtab test/test_basic.o | grep -E "FUNC|OBJECT" > /tmp/my_out.txt
# Compare symbol names (basic check)
# Test 3: Undefined symbols
echo 'extern void undefined(void); int main() { undefined(); }' > /tmp/undef.c
gcc -c /tmp/undef.c -o /tmp/undef.o
./symtab --undefined /tmp/undef.o | grep -q "undefined" && \
echo "PASS: Undefined detection" || echo "FAIL"
# Test 4: Relocations
./symtab --reloc /tmp/undef.o | grep -q "R_X86_64" && \
echo "PASS: Relocation parsing" || echo "FAIL"
echo "Tests complete!"
Common Pitfalls & Debugging
| Pitfall | Symptom | Solution |
|---|---|---|
| Wrong struct size | Garbage values | Check 32 vs 64 bit |
| Off-by-one in sections | Crash or wrong section | Section 0 is NULL |
| Endian mismatch | Huge/wrong numbers | Check EI_DATA byte |
| String table offset | Empty names | Verify sh_link points to .strtab |
| Bounds check missing | Segfault on bad file | Validate all offsets |
| Missing SHN_UNDEF check | “UND” shows wrong section | Handle special indices |
Debugging Tips
# Verify your understanding with readelf
readelf -h main.o # ELF header
readelf -S main.o # Section headers
readelf -s main.o # Symbol table
readelf -r main.o # Relocations
# Hex dump specific sections
objdump -s -j .symtab main.o
# See raw bytes
hexdump -C main.o | head -100
Extensions & Challenges
Beginner Extensions
- Color-coded output (undefined in red, defined in green)
- JSON output format for scripting
- Filter by symbol type (–functions, –variables)
Intermediate Extensions
- Dynamic symbol table (
.dynsym) parsing - Version symbols parsing (
.gnu.version) - Support for macOS Mach-O format
- Demangle C++ symbol names
Advanced Extensions
- DWARF debug info parsing (function line numbers)
- Build a symbol dependency graph
- Detect ABI compatibility issues
- Compare symbol tables from different compiler versions
Real-World Connections
Industry Applications
- Build Systems: Symbol analysis for incremental linking
- Package Managers: Dependency detection (what libraries needed)
- Security Tools: Finding exported functions for fuzzing
- Reverse Engineering: IDA/Ghidra use symbol tables for analysis
- Performance: LTO (Link-Time Optimization) uses symbols
Related Open Source Projects
- binutils (nm, readelf, objdump): The standard ELF tools
- LLVM tools (llvm-nm, llvm-readelf): Modern alternatives
- patchelf: Modify ELF files post-compilation
- elfutils: Library for ELF manipulation
- radare2: Reverse engineering framework
Resources
ELF Specification
- ELF-64 Object File Format - Official specification
- System V ABI - x86-64 supplement
Tool Documentation
man elf # ELF format overview
man nm # Symbol table lister
man readelf # ELF file analysis
man objdump # Object file analysis
Headers to Study
#include <elf.h> /* ELF structure definitions */
/* Key structures:
* Elf64_Ehdr - ELF header
* Elf64_Shdr - Section header
* Elf64_Sym - Symbol entry
* Elf64_Rela - Relocation with addend
*/
Self-Assessment Checklist
Understanding
- I can explain what information is in a symbol table entry
- I know the difference between LOCAL, GLOBAL, and WEAK binding
- I understand why undefined symbols exist and how they’re resolved
- I can explain what a relocation entry does
- I know the difference between .symtab and .dynsym
Implementation
- Tool correctly parses ELF headers (32 and 64-bit)
- All symbols are displayed with correct attributes
- Undefined symbols are identified correctly
- Relocations are parsed and displayed
- Output matches nm/readelf for test files
Growth
- I can debug linker errors by examining symbol tables
- I understand how source code becomes symbols
- I can analyze unknown binaries using these techniques
Submission / Completion Criteria
Minimum Viable Completion
- Parses ELF header and validates magic
- Finds and parses
.symtabsection - Displays all symbols with name, type, binding
- Identifies undefined vs defined symbols
Full Completion
- Handles both 32-bit and 64-bit ELF
- Parses relocations and displays them
- Provides analysis features (–undefined, –match)
- Output format matches nm/readelf
- Handles error cases gracefully
Excellence (Going Above & Beyond)
- Dynamic symbol table support
- C++ name demangling
- Dependency graph generation
- Web interface for visual exploration
- Support for Mach-O (macOS) format
This guide was expanded from EXPERT_C_PROGRAMMING_DEEP_DIVE.md. For the complete learning path, see the project index.