Project 6: Symbol Table Analyzer

Build a tool that parses ELF files and displays symbol table information, revealing how the linker resolves references between object files.


Quick Reference

Attribute Value
Language C
Difficulty Level 4 (Advanced)
Time 1 Week (20-30 hours)
Book Reference CS:APP Ch. 7, Advanced C and C++ Compiling Ch. 3-4
Coolness 4/5 - Linker Archaeology
Portfolio Value High - Systems Programming Essential

Learning Objectives

By completing this project, you will:

  1. Parse ELF binary format - Read and interpret ELF headers, section headers, and symbol tables
  2. Understand symbol binding - Distinguish LOCAL, GLOBAL, and WEAK symbols
  3. Decode symbol types - Identify FUNC, OBJECT, NOTYPE, FILE, and SECTION symbols
  4. Interpret symbol visibility - Understand DEFAULT, HIDDEN, PROTECTED visibility
  5. Read relocation entries - See how the linker patches addresses
  6. Debug linker errors systematically - Use symbol knowledge to solve undefined references
  7. Connect compilation to linking - See how source code becomes symbols
  8. Use ELF tools effectively - Complement nm, readelf, objdump with deep understanding

The Core Question You’re Answering

“When you compile a C program, what exactly is in the symbol table, and how does the linker use it to combine object files into an executable?”

Every C program you write becomes a collection of symbols - names that represent functions, variables, and other entities. The linker’s job is to:

  1. Collect all symbols from all object files
  2. Match every undefined symbol reference to exactly one defined symbol
  3. Patch (relocate) all addresses to their final locations

Understanding symbol tables transforms linker errors from mysterious incantations into straightforward debugging:

BEFORE THIS PROJECT:
───────────────────────────────────────────────────────────
$ gcc main.o helper.o -o program
undefined reference to `calculate'
collect2: error: ld returned 1 exit status

You: "Why doesn't it find calculate? It's right there in helper.c!"
     *random modifications until it works*

AFTER THIS PROJECT:
───────────────────────────────────────────────────────────
$ nm main.o | grep calculate
                 U calculate    <- UNDEFINED, needs resolution

$ nm helper.o | grep calculate
                 U calculate    <- Also UNDEFINED! Not defined here either!

$ nm mathlib.o | grep calculate
0000000000000040 T calculate    <- DEFINED (T=text), here it is!

You: "Aha! I need to link mathlib.o"
     *one precise fix*

Theoretical Foundation

2.1 The ELF File Format

ELF (Executable and Linkable Format) is the standard binary format on Linux, BSD, and many other Unix systems. Every .o, .so, and executable follows this structure:

ELF FILE STRUCTURE
══════════════════════════════════════════════════════════════════════════════

┌──────────────────────────────────────────────────────────────────────────────┐
│                              ELF HEADER                                       │
│  ┌────────────────────────────────────────────────────────────────────────┐  │
│  │ Magic number: 0x7f 'E' 'L' 'F'                                         │  │
│  │ Class: 32-bit or 64-bit                                                │  │
│  │ Endianness: Little or Big                                              │  │
│  │ Type: Relocatable (.o), Executable, Shared object (.so), Core dump    │  │
│  │ Entry point address (for executables)                                  │  │
│  │ Program header table offset                                            │  │
│  │ Section header table offset                                            │  │
│  │ Number of program headers                                              │  │
│  │ Number of section headers                                              │  │
│  └────────────────────────────────────────────────────────────────────────┘  │
├──────────────────────────────────────────────────────────────────────────────┤
│                         PROGRAM HEADER TABLE                                  │
│  (Used by loader at runtime - describes segments)                            │
│  ┌─────────────────────────────────────────────────────────────────────────┐ │
│  │ LOAD segment: .text + .rodata (rx permissions)                         │ │
│  │ LOAD segment: .data + .bss (rw permissions)                            │ │
│  │ DYNAMIC segment: dynamic linking info                                   │ │
│  │ ...                                                                     │ │
│  └─────────────────────────────────────────────────────────────────────────┘ │
├──────────────────────────────────────────────────────────────────────────────┤
│                              SECTIONS                                         │
│  ┌─────────────────────────────────────────────────────────────────────────┐ │
│  │ .text      - Executable machine code                                    │ │
│  │ .rodata    - Read-only data (string literals, constants)               │ │
│  │ .data      - Initialized global/static variables                        │ │
│  │ .bss       - Uninitialized global/static variables (zero-initialized)  │ │
│  │ .symtab    - Symbol table (this project's focus!)                      │ │
│  │ .strtab    - String table for symbol names                              │ │
│  │ .rel.text  - Relocations for .text section                             │ │
│  │ .rel.data  - Relocations for .data section                             │ │
│  │ .dynsym    - Dynamic symbol table (for shared libraries)               │ │
│  │ .dynstr    - Dynamic string table                                       │ │
│  │ .debug_*   - DWARF debugging information                                │ │
│  │ ...                                                                     │ │
│  └─────────────────────────────────────────────────────────────────────────┘ │
├──────────────────────────────────────────────────────────────────────────────┤
│                         SECTION HEADER TABLE                                  │
│  (Used by linker - describes each section)                                   │
│  ┌─────────────────────────────────────────────────────────────────────────┐ │
│  │ Section 0: NULL (reserved)                                              │ │
│  │ Section 1: .text  - offset, size, flags (ALLOC|EXECINSTR)              │ │
│  │ Section 2: .data  - offset, size, flags (ALLOC|WRITE)                  │ │
│  │ Section 3: .symtab - offset, size, link to .strtab                     │ │
│  │ ...                                                                     │ │
│  └─────────────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────────────┘

2.2 Symbol Table Entry Structure

Each symbol in the .symtab section has this structure (64-bit ELF):

SYMBOL TABLE ENTRY (Elf64_Sym)
══════════════════════════════════════════════════════════════════════════════

┌──────────────────────────────────────────────────────────────────────────────┐
│ struct Elf64_Sym {                                                           │
│     Elf64_Word    st_name;    /* Offset into string table (4 bytes)      */ │
│     unsigned char st_info;    /* Type (4 bits) + Binding (4 bits)        */ │
│     unsigned char st_other;   /* Visibility (2 bits) + reserved          */ │
│     Elf64_Half    st_shndx;   /* Section header index (2 bytes)          */ │
│     Elf64_Addr    st_value;   /* Symbol value/address (8 bytes)          */ │
│     Elf64_Xword   st_size;    /* Symbol size in bytes (8 bytes)          */ │
│ };  /* Total: 24 bytes per symbol */                                         │
└──────────────────────────────────────────────────────────────────────────────┘

FIELD BREAKDOWN:
────────────────

st_name:   Index into .strtab section
           0 means no name (anonymous symbol)

st_info:   Packed byte containing:
           ┌─────────────────────────────────────────────┐
           │  High 4 bits: BINDING                       │
           │    STB_LOCAL  (0) - Not visible outside file│
           │    STB_GLOBAL (1) - Visible to all files    │
           │    STB_WEAK   (2) - Lower priority global   │
           │                                             │
           │  Low 4 bits: TYPE                           │
           │    STT_NOTYPE (0) - Unspecified             │
           │    STT_OBJECT (1) - Data object (variable)  │
           │    STT_FUNC   (2) - Function                │
           │    STT_SECTION(3) - Section symbol          │
           │    STT_FILE   (4) - Source file name        │
           └─────────────────────────────────────────────┘

st_other:  Visibility (bits 0-1):
           STV_DEFAULT   (0) - Normal visibility
           STV_HIDDEN    (2) - Not visible outside .so
           STV_PROTECTED (3) - Visible but not preemptable

st_shndx:  Section index where symbol is defined:
           SHN_UNDEF (0)      - Undefined (reference)
           SHN_ABS   (0xfff1) - Absolute value
           SHN_COMMON(0xfff2) - Common symbol (tentative)
           1, 2, 3...        - Index into section headers

st_value:  For defined symbols: offset within section
           For undefined: 0
           For COMMON: alignment requirement

st_size:   Size of data object or function (0 if unknown)

2.3 Symbol Binding and Resolution

The linker resolves symbols based on binding:

SYMBOL RESOLUTION RULES
══════════════════════════════════════════════════════════════════════════════

┌──────────────────────────────────────────────────────────────────────────────┐
│                        STRONG vs WEAK SYMBOLS                                 │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  STRONG SYMBOLS (STB_GLOBAL with definition):                                │
│  ─────────────────────────────────────────────                               │
│  • Functions                                                                 │
│  • Initialized global variables                                              │
│                                                                              │
│  WEAK SYMBOLS (STB_WEAK):                                                    │
│  ────────────────────────                                                    │
│  • Uninitialized global variables (tentative definitions)                    │
│  • Explicitly marked with __attribute__((weak))                              │
│                                                                              │
├──────────────────────────────────────────────────────────────────────────────┤
│                        RESOLUTION RULES                                       │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Rule 1: Multiple strong definitions of same symbol                          │
│          → LINKER ERROR: "multiple definition of X"                          │
│                                                                              │
│  Rule 2: One strong + multiple weak definitions                              │
│          → Strong wins (silently, no warning!)                               │
│                                                                              │
│  Rule 3: Multiple weak definitions, no strong                                │
│          → Linker picks one (typically first encountered)                    │
│                                                                              │
│  Rule 4: Undefined symbol with no definition                                 │
│          → LINKER ERROR: "undefined reference to X"                          │
│                                                                              │
├──────────────────────────────────────────────────────────────────────────────┤
│                              EXAMPLES                                         │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  // file1.c                      // file2.c                                  │
│  int x = 1;  /* STRONG */        int x = 2;  /* STRONG */                   │
│                                                                              │
│  → ERROR: multiple definition of 'x'                                         │
│                                                                              │
│  ───────────────────────────────────────────────────────────────────────────  │
│                                                                              │
│  // file1.c                      // file2.c                                  │
│  int x = 1;  /* STRONG */        int x;      /* WEAK (uninitialized) */     │
│                                                                              │
│  → OK: x = 1 (strong wins)                                                   │
│                                                                              │
│  ───────────────────────────────────────────────────────────────────────────  │
│                                                                              │
│  // file1.c                      // file2.c                                  │
│  int x;      /* WEAK */          int x;      /* WEAK */                      │
│                                                                              │
│  → OK: x = 0 (one weak chosen, both initialized to 0)                        │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘

2.4 Relocations: The Linker’s Patch List

Object files contain references to symbols whose final addresses aren’t known at compile time. Relocations tell the linker where and how to patch these references:

RELOCATION ENTRIES
══════════════════════════════════════════════════════════════════════════════

┌──────────────────────────────────────────────────────────────────────────────┐
│ struct Elf64_Rela {                                                          │
│     Elf64_Addr  r_offset;   /* Location to patch (8 bytes)               */ │
│     Elf64_Xword r_info;     /* Symbol index (32 bits) + Type (32 bits)   */ │
│     Elf64_Sxword r_addend;  /* Constant to add (8 bytes)                 */ │
│ };                                                                           │
└──────────────────────────────────────────────────────────────────────────────┘

COMMON RELOCATION TYPES (x86-64):
────────────────────────────────

R_X86_64_64        Absolute 64-bit address
                   S + A
                   (Symbol value + Addend)

R_X86_64_PC32      PC-relative 32-bit
                   S + A - P
                   (Symbol + Addend - Place)

R_X86_64_PLT32     Function call through PLT
                   L + A - P
                   (PLT entry + Addend - Place)

R_X86_64_GOTPCREL  Data access through GOT
                   G + A - P
                   (GOT entry + Addend - Place)


EXAMPLE: How a function call gets relocated
═══════════════════════════════════════════

Source code:
────────────
    extern int helper(int x);
    int main() {
        return helper(42);
    }

Object file (main.o):
────────────────────
    .text:
    main:
        push   rbp
        mov    rbp, rsp
        mov    edi, 42
        call   0x00000000    ← Placeholder! (to be filled by linker)
        pop    rbp
        ret

    .rela.text:
        r_offset: 0x0a       (offset of the call target in .text)
        r_info:   helper (symbol) + R_X86_64_PLT32 (type)
        r_addend: -4

After linking (executable):
──────────────────────────
    .text:
    main:
        push   rbp
        mov    rbp, rsp
        mov    edi, 42
        call   0x401040      ← Patched with helper's address!
        pop    rbp
        ret

2.5 Why This Matters

Understanding symbol tables unlocks:

  1. Debugging Linker Errors: Know exactly why “undefined reference” occurs
  2. Library Design: Control symbol visibility for clean APIs
  3. Optimization: Understand inlining, LTO, dead code elimination
  4. Security: Symbol interposition, LD_PRELOAD hooks
  5. Reverse Engineering: Analyze binaries without source code
  6. Build Systems: Understand incremental linking, shared libraries

2.6 Historical Context: Before ELF

BINARY FORMAT EVOLUTION
══════════════════════════════════════════════════════════════════════════════

1970s: a.out (Assembler OUTput)
       - Simple format from Unix
       - Limited to one code + one data section
       - No dynamic linking

1980s: COFF (Common Object File Format)
       - Multiple sections
       - Still limited symbol information
       - Used by early Windows, Unix System V

1990s: ELF (Executable and Linkable Format)
       - Created for System V Release 4
       - Rich symbol tables, dynamic linking
       - Became standard on Linux, BSD, Solaris

Other formats still in use:
       - Mach-O: macOS and iOS
       - PE/COFF: Windows executables (.exe, .dll)
       - WebAssembly: Browser binaries

2.7 Common Misconceptions

Misconception Reality
“Static variables aren’t in symbol table” They ARE, but with LOCAL binding
“Symbol table is only for debugging” Linker REQUIRES it for symbol resolution
“All symbols have addresses” UNDEFINED symbols have no address
“Function names are stored as-is” C++ mangles names; C doesn’t
“Weak symbols cause warnings” Strong silently overrides weak
“Symbol size is always accurate” Compiler may not know size of extern

Project Specification

3.1 What You Will Build

A mini nm/readelf clone called symtab that parses ELF files and displays symbol table information:

$ ./symtab main.o
================================================================================
                    SYMBOL TABLE ANALYZER
                    File: main.o
                    Type: Relocatable object file
================================================================================

SYMBOL TABLE (.symtab) - 12 entries
────────────────────────────────────────────────────────────────────────────────
  Num    Value            Size Type    Bind   Vis      Ndx Name
────────────────────────────────────────────────────────────────────────────────
    0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
    1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS main.c
    2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1 .text
    3: 0000000000000000     0 SECTION LOCAL  DEFAULT    3 .data
    4: 0000000000000000     0 SECTION LOCAL  DEFAULT    4 .bss
    5: 0000000000000000     0 SECTION LOCAL  DEFAULT    5 .rodata
    6: 0000000000000000     4 OBJECT  LOCAL  DEFAULT    3 local_static
    7: 0000000000000000    42 FUNC    GLOBAL DEFAULT    1 main
    8: 0000000000000000     4 OBJECT  GLOBAL DEFAULT    3 global_var
    9: 0000000000000004     4 OBJECT  GLOBAL DEFAULT    4 uninit_global
   10: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND printf
   11: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND helper_function

UNDEFINED SYMBOLS (need resolution):
────────────────────────────────────────────────────────────────────────────────
  printf          - standard library function
  helper_function - user-defined, must be linked

RELOCATIONS (.rela.text) - 3 entries
────────────────────────────────────────────────────────────────────────────────
  Offset          Info           Type                Sym. Value    Sym. Name + Addend
────────────────────────────────────────────────────────────────────────────────
  00000000001a  000a00000004 R_X86_64_PLT32          0000000000000000 printf - 4
  000000000024  000b00000004 R_X86_64_PLT32          0000000000000000 helper_function - 4
  00000000002f  000500000002 R_X86_64_PC32           0000000000000000 .rodata - 4

SYMBOL SUMMARY:
────────────────────────────────────────────────────────────────────────────────
  Total symbols:     12
  Defined (GLOBAL):  3  (main, global_var, uninit_global)
  Defined (LOCAL):   2  (local_static, main.c)
  Undefined:         2  (printf, helper_function)
  Section symbols:   4
================================================================================

3.2 Functional Requirements

  1. ELF Header Parsing:
    • Validate ELF magic number
    • Determine 32-bit vs 64-bit
    • Identify file type (relocatable, executable, shared)
    • Parse endianness
  2. Section Header Parsing:
    • Locate all sections
    • Find .symtab and .strtab
    • Find .dynsym and .dynstr (for shared libs)
    • Find relocation sections (.rela.*, .rel.*)
  3. Symbol Table Display:
    • List all symbols with full information
    • Decode binding (LOCAL, GLOBAL, WEAK)
    • Decode type (FUNC, OBJECT, NOTYPE, FILE, SECTION)
    • Show section index or special value (UND, ABS, COM)
    • Display symbol value and size
  4. Relocation Display:
    • Show all relocations
    • Decode relocation type
    • Show target symbol
    • Display addend
  5. Analysis Features:
    • List undefined symbols
    • Identify potential linking issues
    • Compare two object files
    • Export/import analysis

3.3 Non-Functional Requirements

  • Portability: Handle both 32-bit and 64-bit ELF
  • Robustness: Gracefully handle malformed files
  • Performance: Efficiently mmap large files
  • Accuracy: Match output with nm and readelf
  • Educational: Clear output with explanations

3.4 Example Usage and Output

# Basic symbol table dump
$ ./symtab main.o

# Show only undefined symbols
$ ./symtab --undefined main.o
Undefined symbols in main.o:
  printf          (libc function)
  helper_function (user function)

# Show only relocations
$ ./symtab --reloc main.o

# Compare what file1.o exports vs what file2.o needs
$ ./symtab --match file1.o file2.o
Symbols exported by file1.o:
  helper_function  FUNC    GLOBAL  .text
  shared_data      OBJECT  GLOBAL  .data

Symbols needed by file2.o:
  helper_function  -> RESOLVED by file1.o
  printf           -> UNRESOLVED (needs libc)

# Analyze a shared library
$ ./symtab --dynamic /lib/x86_64-linux-gnu/libc.so.6 | head -20

# Verbose mode with educational explanations
$ ./symtab --verbose main.o
[INFO] Reading ELF header...
       Magic: 0x7f 'E' 'L' 'F' (valid ELF file)
       Class: 64-bit
       Endian: Little endian
       Type: Relocatable (ET_REL) - object file before linking

[INFO] Scanning section headers...
       Found 13 sections
       .symtab at offset 0x2a8, 288 bytes (12 symbols)
       .strtab at offset 0x3c8, 95 bytes
       .rela.text at offset 0x430, 72 bytes (3 relocations)
...

3.5 Real World Outcome

After building this tool, you can:

# Scenario: "undefined reference to calculate"

# Step 1: Check what main.o needs
$ ./symtab --undefined main.o
  calculate    NOTYPE  GLOBAL  UND    <- Undefined!

# Step 2: Find where calculate is defined
$ ./symtab --defined mathlib.o | grep calculate
  calculate    FUNC    GLOBAL  .text  <- Found!

# Step 3: Verify the fix works
$ gcc main.o mathlib.o -o program  # Success!

Solution Architecture

4.1 High-Level Design

SYMTAB TOOL ARCHITECTURE
══════════════════════════════════════════════════════════════════════════════

┌──────────────────────────────────────────────────────────────────────────────┐
│                              USER INTERFACE                                   │
│  ┌────────────────────────────────────────────────────────────────────────┐  │
│  │  CLI Parser: --symbols, --reloc, --undefined, --verbose, --match      │  │
│  └────────────────────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────────────────┘
                                     │
                                     ▼
┌──────────────────────────────────────────────────────────────────────────────┐
│                             ELF PARSER CORE                                   │
│  ┌────────────────────────────────────────────────────────────────────────┐  │
│  │                        elf_parser.c                                    │  │
│  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐                 │  │
│  │  │ ELF Header   │  │ Section Hdrs │  │ Program Hdrs │                 │  │
│  │  │ Parser       │  │ Parser       │  │ Parser       │                 │  │
│  │  └──────────────┘  └──────────────┘  └──────────────┘                 │  │
│  └────────────────────────────────────────────────────────────────────────┘  │
│                                     │                                        │
│                                     ▼                                        │
│  ┌────────────────────────────────────────────────────────────────────────┐  │
│  │                        section_reader.c                                │  │
│  │  ┌──────────────────┐  ┌──────────────────┐  ┌──────────────────────┐ │  │
│  │  │ .symtab Reader   │  │ .strtab Reader   │  │ .rel/.rela Reader    │ │  │
│  │  │ (Symbol Table)   │  │ (String Table)   │  │ (Relocations)        │ │  │
│  │  └──────────────────┘  └──────────────────┘  └──────────────────────┘ │  │
│  └────────────────────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────────────────┘
                                     │
                                     ▼
┌──────────────────────────────────────────────────────────────────────────────┐
│                           ANALYSIS ENGINE                                     │
│  ┌────────────────────────────────────────────────────────────────────────┐  │
│  │  symbol_analyzer.c                                                     │  │
│  │  ┌────────────────┐  ┌────────────────┐  ┌────────────────────────┐   │  │
│  │  │ Classify       │  │ Find Undefined │  │ Match Exports/Imports  │   │  │
│  │  │ Symbols        │  │ References     │  │ Between Files          │   │  │
│  │  └────────────────┘  └────────────────┘  └────────────────────────┘   │  │
│  └────────────────────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────────────────┘
                                     │
                                     ▼
┌──────────────────────────────────────────────────────────────────────────────┐
│                            OUTPUT FORMATTER                                   │
│  ┌────────────────────────────────────────────────────────────────────────┐  │
│  │  display.c                                                             │  │
│  │  ┌────────────────┐  ┌────────────────┐  ┌────────────────────────┐   │  │
│  │  │ Table Format   │  │ Verbose Mode   │  │ JSON Export            │   │  │
│  │  │ (nm-style)     │  │ (Educational)  │  │ (Machine-readable)     │   │  │
│  │  └────────────────┘  └────────────────┘  └────────────────────────┘   │  │
│  └────────────────────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────────────────┘

4.2 Key Components

Component File Responsibility
Main/CLI main.c Command-line parsing, orchestration
ELF Parser elf_parser.c Read and validate ELF headers
Section Reader section_reader.c Read specific sections
Symbol Table symtab.c Parse and store symbol entries
Relocations reloc.c Parse relocation entries
Analyzer analyzer.c Symbol resolution analysis
Display display.c Output formatting

4.3 Data Structures

/* ELF file representation */
typedef struct {
    int fd;                     /* File descriptor */
    void *map;                  /* Memory-mapped file */
    size_t size;                /* File size */
    int is_64bit;               /* 64-bit or 32-bit */
    int is_little_endian;       /* Endianness */
    uint16_t type;              /* ET_REL, ET_EXEC, ET_DYN */

    /* Header pointers (points into mmap'd region) */
    union {
        Elf32_Ehdr *ehdr32;
        Elf64_Ehdr *ehdr64;
    };

    /* Section header table */
    void *shdr;                 /* Section headers */
    int shnum;                  /* Number of sections */
    char *shstrtab;             /* Section string table */

    /* Symbol table */
    void *symtab;               /* Symbol table */
    int symcount;               /* Number of symbols */
    char *strtab;               /* Symbol string table */

    /* Dynamic symbol table (for .so files) */
    void *dynsym;
    int dynsymcount;
    char *dynstr;

} ElfFile;

/* Parsed symbol (normalized for 32/64 bit) */
typedef struct {
    char *name;                 /* Symbol name (pointer into strtab) */
    uint64_t value;             /* Symbol value/address */
    uint64_t size;              /* Symbol size */
    uint8_t type;               /* STT_FUNC, STT_OBJECT, etc. */
    uint8_t binding;            /* STB_LOCAL, STB_GLOBAL, STB_WEAK */
    uint8_t visibility;         /* STV_DEFAULT, STV_HIDDEN, etc. */
    uint16_t section_index;     /* Section or special (SHN_UNDEF, etc.) */
    const char *section_name;   /* Section name (or "UND", "ABS", "COM") */
} Symbol;

/* Parsed relocation */
typedef struct {
    uint64_t offset;            /* Location to patch */
    uint32_t type;              /* Relocation type */
    uint32_t symbol_index;      /* Symbol table index */
    int64_t addend;             /* Constant addend */
    char *symbol_name;          /* Resolved symbol name */
} Relocation;

/* Analysis result */
typedef struct {
    Symbol *defined;            /* Defined symbols */
    int defined_count;
    Symbol *undefined;          /* Undefined symbols */
    int undefined_count;
    Symbol *weak;               /* Weak symbols */
    int weak_count;
} SymbolAnalysis;

4.4 Algorithm Overview

ELF Parsing Algorithm:

PARSE ELF FILE
══════════════════════════════════════════════════════════════════════════════

1. OPEN AND MAP FILE
   ────────────────────────────────────────────────────────────────────────────
   fd = open(filename, O_RDONLY)
   stat(fd, &st)
   map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0)

2. VALIDATE ELF HEADER
   ────────────────────────────────────────────────────────────────────────────
   Check magic: map[0..3] == {0x7f, 'E', 'L', 'F'}
   Determine class: map[4] == 1 (32-bit) or 2 (64-bit)
   Determine endian: map[5] == 1 (little) or 2 (big)
   Read type: ehdr->e_type

3. LOCATE SECTION HEADER TABLE
   ────────────────────────────────────────────────────────────────────────────
   shoff = ehdr->e_shoff        // Offset to section headers
   shnum = ehdr->e_shnum        // Number of sections
   shentsize = ehdr->e_shentsize // Size of each header
   shstrndx = ehdr->e_shstrndx  // Index of section name string table

   shdr = map + shoff
   shstrtab = map + shdr[shstrndx].sh_offset

4. FIND SYMBOL TABLE SECTION
   ────────────────────────────────────────────────────────────────────────────
   for each section:
       if shdr[i].sh_type == SHT_SYMTAB:
           symtab = map + shdr[i].sh_offset
           symcount = shdr[i].sh_size / shdr[i].sh_entsize
           strtab_idx = shdr[i].sh_link  // Link to string table
           strtab = map + shdr[strtab_idx].sh_offset

5. PARSE EACH SYMBOL
   ────────────────────────────────────────────────────────────────────────────
   for i = 0 to symcount:
       sym = symtab[i]
       name = strtab + sym.st_name
       type = ELF_ST_TYPE(sym.st_info)
       binding = ELF_ST_BIND(sym.st_info)
       visibility = ELF_ST_VISIBILITY(sym.st_other)
       section = sym.st_shndx

6. FIND AND PARSE RELOCATIONS
   ────────────────────────────────────────────────────────────────────────────
   for each section:
       if shdr[i].sh_type == SHT_RELA or SHT_REL:
           Parse relocation entries
           Link to symbol table via sh_link

Implementation Guide

5.1 Development Environment Setup

# Required tools
sudo apt-get install gcc gdb binutils build-essential

# Verify installations
gcc --version
readelf --version
nm --version
objdump --version

# Create project structure
mkdir -p symtab/{src,include,test,docs}
cd symtab

5.2 Project Structure

symtab/
├── src/
│   ├── main.c              # Entry point, CLI parsing
│   ├── elf_parser.c        # ELF file opening and header parsing
│   ├── section_reader.c    # Section location and reading
│   ├── symtab.c            # Symbol table parsing
│   ├── reloc.c             # Relocation parsing
│   ├── analyzer.c          # Symbol analysis (undefined, matching)
│   └── display.c           # Output formatting
├── include/
│   ├── symtab.h            # Main header with types
│   ├── elf_parser.h        # ELF parsing declarations
│   └── display.h           # Display function declarations
├── test/
│   ├── test_basic.c        # Simple test file
│   ├── test_multi.c        # Multiple files test
│   └── run_tests.sh        # Test script
├── Makefile
└── README.md

5.3 The Core Question You’re Answering

“When you compile a C program, what exactly is in the symbol table, and how does the linker use it to combine object files into an executable?”

5.4 Concepts You Must Understand First

Before implementing, ensure you can answer these questions:

  1. What is the difference between DEFINED and UNDEFINED symbols?
    • DEFINED: Symbol has a definition (code or data) in this file
    • UNDEFINED: Symbol is referenced but defined elsewhere
    • How: Check st_shndx - if SHN_UNDEF (0), it’s undefined
    • CS:APP Ch. 7.5
  2. What is symbol binding (LOCAL vs GLOBAL)?
    • LOCAL: Only visible within the compilation unit (file)
    • GLOBAL: Visible to other files during linking
    • How: Extract from st_info using ELF_ST_BIND()
    • CS:APP Ch. 7.5
  3. What are WEAK symbols and when are they used?
    • Lower priority than GLOBAL - can be overridden
    • Uninitialized globals are tentatively defined as WEAK
    • How: STB_WEAK binding, linker resolution rules
    • CS:APP Ch. 7.6.1
  4. What is a relocation and why is it needed?
    • Object files don’t know final addresses
    • Relocations are “fix-up” instructions for the linker
    • How: .rel.text and .rela.text sections
    • CS:APP Ch. 7.7
  5. What is the string table and how does it work?
    • Symbol names are stored separately in .strtab
    • st_name is an offset into the string table
    • Allows variable-length names without wasting space
    • ELF specification

5.5 Questions to Guide Your Design

Work through these before coding:

  1. How will you handle both 32-bit and 64-bit ELF?
    • Header structures differ in size
    • Consider using unions or conditional parsing
    • Test with both architectures
  2. How will you access the string table efficiently?
    • Memory-map the file for direct pointer access
    • Or read string table into memory separately
  3. How will you decode relocation types?
    • Types are architecture-specific (x86-64, ARM, etc.)
    • Consider creating a type-to-string mapping table
  4. How will you match symbol imports/exports between files?
    • Build a hash table of defined symbols
    • Check each undefined against the hash table
  5. How will you handle malformed ELF files?
    • Validate magic number, bounds check offsets
    • Fail gracefully with informative errors

5.6 Thinking Exercise

Before writing code, trace through this example manually:

/* main.c */
extern int helper(int x);
int global_var = 42;

int main(void) {
    return helper(global_var);
}

/* helper.c */
int helper(int x) {
    return x * 2;
}

Exercise:

  1. List all symbols that will be in main.o’s symbol table
  2. Classify each as DEFINED/UNDEFINED, LOCAL/GLOBAL
  3. What relocations will main.o have?
  4. After linking, how are the relocations resolved?

Expected Answers:

main.o symbols:
  main.c      FILE    LOCAL   ABS     <- Source file name
  main        FUNC    GLOBAL  .text   <- Defined function
  global_var  OBJECT  GLOBAL  .data   <- Defined variable
  helper      NOTYPE  GLOBAL  UND     <- Undefined reference!

main.o relocations:
  call helper    R_X86_64_PLT32   <- Needs helper's address
  mov global_var R_X86_64_PC32    <- Needs global_var's address (PC-relative)

5.7 Hints in Layers

Hint 1: Starting with ELF Header

Use the system headers for ELF structures:

#include <elf.h>

/* Check magic number */
int is_elf(unsigned char *data) {
    return data[0] == 0x7f &&
           data[1] == 'E' &&
           data[2] == 'L' &&
           data[3] == 'F';
}

/* Determine 32 vs 64 bit */
int is_64bit(unsigned char *data) {
    return data[EI_CLASS] == ELFCLASS64;
}
Hint 2: Memory Mapping the File

Memory mapping is more efficient than read():

#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>

ElfFile *open_elf(const char *filename) {
    ElfFile *elf = calloc(1, sizeof(ElfFile));

    elf->fd = open(filename, O_RDONLY);
    if (elf->fd < 0) return NULL;

    struct stat st;
    fstat(elf->fd, &st);
    elf->size = st.st_size;

    elf->map = mmap(NULL, elf->size, PROT_READ, MAP_PRIVATE, elf->fd, 0);
    if (elf->map == MAP_FAILED) return NULL;

    return elf;
}
Hint 3: Finding the Symbol Table

Loop through section headers to find SHT_SYMTAB:

void find_symtab(ElfFile *elf) {
    Elf64_Ehdr *ehdr = elf->map;
    Elf64_Shdr *shdr = elf->map + ehdr->e_shoff;

    /* First, get section name string table */
    Elf64_Shdr *shstrtab_hdr = &shdr[ehdr->e_shstrndx];
    char *shstrtab = elf->map + shstrtab_hdr->sh_offset;

    for (int i = 0; i < ehdr->e_shnum; i++) {
        char *name = shstrtab + shdr[i].sh_name;

        if (shdr[i].sh_type == SHT_SYMTAB) {
            elf->symtab = elf->map + shdr[i].sh_offset;
            elf->symcount = shdr[i].sh_size / shdr[i].sh_entsize;

            /* sh_link points to associated string table */
            Elf64_Shdr *strtab_hdr = &shdr[shdr[i].sh_link];
            elf->strtab = elf->map + strtab_hdr->sh_offset;
        }
    }
}
Hint 4: Decoding Symbol Info

Use the ELF macros to extract type and binding:

void print_symbol(Elf64_Sym *sym, char *strtab) {
    char *name = strtab + sym->st_name;

    /* Extract binding and type from st_info */
    unsigned char binding = ELF64_ST_BIND(sym->st_info);
    unsigned char type = ELF64_ST_TYPE(sym->st_info);
    unsigned char vis = ELF64_ST_VISIBILITY(sym->st_other);

    const char *bind_str[] = {"LOCAL", "GLOBAL", "WEAK"};
    const char *type_str[] = {"NOTYPE", "OBJECT", "FUNC", "SECTION", "FILE"};

    printf("%016lx %5lu %-7s %-6s %s\n",
           sym->st_value,
           sym->st_size,
           type_str[type < 5 ? type : 0],
           bind_str[binding < 3 ? binding : 0],
           name);
}
Hint 5: Section Index Special Values

Handle special section indices:

const char *get_section_name(uint16_t shndx, ElfFile *elf) {
    switch (shndx) {
        case SHN_UNDEF:  return "UND";    /* Undefined */
        case SHN_ABS:    return "ABS";    /* Absolute */
        case SHN_COMMON: return "COM";    /* Common (unallocated) */
        default:
            if (shndx < elf->shnum) {
                return elf->shstrtab + elf->shdr[shndx].sh_name;
            }
            return "???";
    }
}
Hint 6: Reading Relocations

Relocations are in .rela.text or .rel.text:

void read_relocations(ElfFile *elf, Elf64_Shdr *rela_shdr) {
    Elf64_Rela *rela = elf->map + rela_shdr->sh_offset;
    int count = rela_shdr->sh_size / rela_shdr->sh_entsize;

    /* sh_link points to symbol table, sh_info to target section */
    Elf64_Shdr *symtab_hdr = &elf->shdr[rela_shdr->sh_link];
    Elf64_Sym *symtab = elf->map + symtab_hdr->sh_offset;

    for (int i = 0; i < count; i++) {
        uint32_t sym_idx = ELF64_R_SYM(rela[i].r_info);
        uint32_t type = ELF64_R_TYPE(rela[i].r_info);

        Elf64_Sym *sym = &symtab[sym_idx];
        char *sym_name = elf->strtab + sym->st_name;

        printf("Offset: %lx, Type: %d, Symbol: %s, Addend: %ld\n",
               rela[i].r_offset, type, sym_name, rela[i].r_addend);
    }
}

5.8 The Interview Questions They’ll Ask

After completing this project, you’ll be ready for:

  1. “What is the symbol table and what does it contain?”
    • Table of all named entities (functions, variables)
    • Each entry has: name, type, binding, section, value, size
    • Linker uses it to resolve references between files
  2. “Explain the difference between LOCAL and GLOBAL symbols”
    • LOCAL: Only visible within the file (like static in C)
    • GLOBAL: Visible to other files during linking
    • LOCAL symbols can have duplicate names across files
  3. “What is a WEAK symbol and when would you use one?”
    • Lower priority than GLOBAL during resolution
    • Uninitialized globals are tentatively WEAK
    • Useful for providing default implementations
    • __attribute__((weak)) for intentional weak symbols
  4. “How does the linker resolve undefined references?”
    • Collect all symbols from all object files
    • For each undefined, search for matching defined symbol
    • Multiple definitions of same symbol? Apply strong/weak rules
    • Still undefined after all files? Linker error
  5. “What is a relocation entry?”
    • Location in code/data that needs patching
    • References symbol by index
    • Type determines how to compute final value
    • Addend for offset calculations
  6. “Why do we need separate .symtab and .strtab sections?”
    • Symbol entries are fixed-size (24 bytes on 64-bit)
    • Names are variable length
    • Storing names separately allows efficient symbol table scanning
    • Multiple symbols can share common prefixes

5.9 Books That Will Help

Topic Book Chapter
Symbol tables CS:APP Ch. 7.5
Symbol resolution CS:APP Ch. 7.6
Relocations CS:APP Ch. 7.7
ELF format Advanced C and C++ Compiling Ch. 3-4
Dynamic linking CS:APP Ch. 7.10-7.12
Linker scripts Linkers and Loaders Ch. 3

5.10 Implementation Phases

Phase 1: ELF Header Parsing (Day 1)

Goals:

  • Open and memory-map ELF files
  • Validate ELF magic number
  • Parse ELF header (32-bit and 64-bit)
  • Display basic file information

Deliverable:

$ ./symtab main.o
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 ...
  Class:   ELF64
  Data:    Little endian
  Type:    REL (Relocatable file)
  Machine: x86-64

Phase 2: Section Headers (Day 2)

Goals:

  • Parse section header table
  • List all sections with attributes
  • Find .symtab and .strtab

Deliverable:

$ ./symtab --sections main.o
Section Headers:
  [Nr] Name          Type      Addr     Off      Size     Flags
  [ 0]               NULL      00000000 00000000 00000000
  [ 1] .text         PROGBITS  00000000 00000040 00000042 AX
  [ 2] .rela.text    RELA      00000000 000002a0 00000048 I
  [ 3] .data         PROGBITS  00000000 00000084 00000004 WA
  ...

Phase 3: Symbol Table Parsing (Days 3-4)

Goals:

  • Parse all symbols from .symtab
  • Decode type, binding, visibility
  • Display in nm-style format

Deliverable:

$ ./symtab main.o
Symbol Table (.symtab):
  Num    Value          Size Type    Bind   Vis      Ndx Name
    0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
    1: 0000000000000000    42 FUNC    GLOBAL DEFAULT    1 main
    2: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND helper

Phase 4: Relocations (Day 5)

Goals:

  • Parse .rela.* and .rel.* sections
  • Decode relocation types
  • Show target symbols

Deliverable:

$ ./symtab --reloc main.o
Relocations (.rela.text):
  Offset          Type                Sym     Addend
  000000000012    R_X86_64_PLT32      helper  -4

Phase 5: Analysis Features (Days 6-7)

Goals:

  • List undefined symbols
  • Match exports/imports between files
  • Summary statistics

Deliverable:

$ ./symtab --undefined main.o
Undefined symbols:
  helper          NOTYPE  GLOBAL
  printf          NOTYPE  GLOBAL

$ ./symtab --match main.o helper.o
Resolution analysis:
  helper: RESOLVED by helper.o (FUNC, .text)
  printf: UNRESOLVED (external library)

5.11 Key Implementation Decisions

  1. Memory mapping vs read(): Use mmap() for efficiency and direct pointer access

  2. 32-bit vs 64-bit: Use unions or separate code paths; structures differ in size

  3. Endianness: Most systems are little-endian; consider byte-swapping for portability

  4. Error handling: Validate all offsets against file size before dereferencing

  5. Output format: Match nm and readelf output for easy comparison


Testing Strategy

Test Against System Tools

# Compare your output with nm
$ ./symtab main.o > my_output.txt
$ nm main.o > nm_output.txt
$ diff my_output.txt nm_output.txt

# Compare with readelf
$ ./symtab --verbose main.o > my_output.txt
$ readelf -s main.o > readelf_output.txt

Test Cases

Test Input Expected
Valid relocatable main.o All symbols displayed
Valid executable /bin/ls Symbols + program headers
Shared library libc.so.6 Dynamic symbols shown
Stripped binary stripped No .symtab, only .dynsym
Invalid file random.txt “Not an ELF file” error
32-bit ELF hello32.o Correct parsing
Missing sections nosymtab.o Graceful “no symbols” message

Validation Script

#!/bin/bash
# test/run_tests.sh

echo "Testing symbol table analyzer..."

# Test 1: Basic object file
gcc -c test/test_basic.c -o test/test_basic.o
./symtab test/test_basic.o > /dev/null && echo "PASS: Basic parsing" || echo "FAIL"

# Test 2: Compare with nm
nm test/test_basic.o | grep -v "^$" > /tmp/nm_out.txt
./symtab test/test_basic.o | grep -E "FUNC|OBJECT" > /tmp/my_out.txt
# Compare symbol names (basic check)

# Test 3: Undefined symbols
echo 'extern void undefined(void); int main() { undefined(); }' > /tmp/undef.c
gcc -c /tmp/undef.c -o /tmp/undef.o
./symtab --undefined /tmp/undef.o | grep -q "undefined" && \
    echo "PASS: Undefined detection" || echo "FAIL"

# Test 4: Relocations
./symtab --reloc /tmp/undef.o | grep -q "R_X86_64" && \
    echo "PASS: Relocation parsing" || echo "FAIL"

echo "Tests complete!"

Common Pitfalls & Debugging

Pitfall Symptom Solution
Wrong struct size Garbage values Check 32 vs 64 bit
Off-by-one in sections Crash or wrong section Section 0 is NULL
Endian mismatch Huge/wrong numbers Check EI_DATA byte
String table offset Empty names Verify sh_link points to .strtab
Bounds check missing Segfault on bad file Validate all offsets
Missing SHN_UNDEF check “UND” shows wrong section Handle special indices

Debugging Tips

# Verify your understanding with readelf
readelf -h main.o      # ELF header
readelf -S main.o      # Section headers
readelf -s main.o      # Symbol table
readelf -r main.o      # Relocations

# Hex dump specific sections
objdump -s -j .symtab main.o

# See raw bytes
hexdump -C main.o | head -100

Extensions & Challenges

Beginner Extensions

  • Color-coded output (undefined in red, defined in green)
  • JSON output format for scripting
  • Filter by symbol type (–functions, –variables)

Intermediate Extensions

  • Dynamic symbol table (.dynsym) parsing
  • Version symbols parsing (.gnu.version)
  • Support for macOS Mach-O format
  • Demangle C++ symbol names

Advanced Extensions

  • DWARF debug info parsing (function line numbers)
  • Build a symbol dependency graph
  • Detect ABI compatibility issues
  • Compare symbol tables from different compiler versions

Real-World Connections

Industry Applications

  • Build Systems: Symbol analysis for incremental linking
  • Package Managers: Dependency detection (what libraries needed)
  • Security Tools: Finding exported functions for fuzzing
  • Reverse Engineering: IDA/Ghidra use symbol tables for analysis
  • Performance: LTO (Link-Time Optimization) uses symbols
  • binutils (nm, readelf, objdump): The standard ELF tools
  • LLVM tools (llvm-nm, llvm-readelf): Modern alternatives
  • patchelf: Modify ELF files post-compilation
  • elfutils: Library for ELF manipulation
  • radare2: Reverse engineering framework

Resources

ELF Specification

Tool Documentation

man elf          # ELF format overview
man nm           # Symbol table lister
man readelf      # ELF file analysis
man objdump      # Object file analysis

Headers to Study

#include <elf.h>     /* ELF structure definitions */

/* Key structures:
 * Elf64_Ehdr - ELF header
 * Elf64_Shdr - Section header
 * Elf64_Sym  - Symbol entry
 * Elf64_Rela - Relocation with addend
 */

Self-Assessment Checklist

Understanding

  • I can explain what information is in a symbol table entry
  • I know the difference between LOCAL, GLOBAL, and WEAK binding
  • I understand why undefined symbols exist and how they’re resolved
  • I can explain what a relocation entry does
  • I know the difference between .symtab and .dynsym

Implementation

  • Tool correctly parses ELF headers (32 and 64-bit)
  • All symbols are displayed with correct attributes
  • Undefined symbols are identified correctly
  • Relocations are parsed and displayed
  • Output matches nm/readelf for test files

Growth

  • I can debug linker errors by examining symbol tables
  • I understand how source code becomes symbols
  • I can analyze unknown binaries using these techniques

Submission / Completion Criteria

Minimum Viable Completion

  • Parses ELF header and validates magic
  • Finds and parses .symtab section
  • Displays all symbols with name, type, binding
  • Identifies undefined vs defined symbols

Full Completion

  • Handles both 32-bit and 64-bit ELF
  • Parses relocations and displays them
  • Provides analysis features (–undefined, –match)
  • Output format matches nm/readelf
  • Handles error cases gracefully

Excellence (Going Above & Beyond)

  • Dynamic symbol table support
  • C++ name demangling
  • Dependency graph generation
  • Web interface for visual exploration
  • Support for Mach-O (macOS) format

This guide was expanded from EXPERT_C_PROGRAMMING_DEEP_DIVE.md. For the complete learning path, see the project index.