Project 6: Deconstructing a Stripped Binary Crash

Debug crashes in production binaries where all debug symbols have been removed.

Quick Reference

Attribute       Value
Difficulty      Advanced
Time Estimate   1-2 weeks
Language        C
Prerequisites   Project 2, basic assembly knowledge
Key Topics      Symbol stripping, disassembly, address mapping, reverse engineering

1. Learning Objectives

By completing this project, you will:

  • Understand what debug symbols are and why they’re removed in production
  • Learn to analyze crashes when GDB shows only memory addresses (??)
  • Master the disassemble command to view assembly code at crash sites
  • Use objdump and address mapping to correlate stripped binaries with debug builds
  • Develop techniques for debugging production binaries without source access
  • Understand ELF symbol tables and how stripping affects them

2. Theoretical Foundation

2.1 Core Concepts

What Are Debug Symbols?

Debug symbols are metadata embedded in an executable that map:

  • Memory addresses → Function names
  • Memory addresses → Source file names and line numbers
  • Variable names → Stack offsets and memory locations
  • Type information → Data structure layouts

When you compile with gcc -g, the compiler generates DWARF debug information stored in special sections of the ELF file (.debug_info, .debug_line, .debug_abbrev, etc.).

Why Strip Binaries?

Production binaries are stripped for several reasons:

┌──────────────────────────────────────────────────────────────┐
│                    REASONS TO STRIP                          │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  1. SIZE REDUCTION                                           │
│     Debug symbols can add 5-10x to binary size               │
│     Example: 1MB binary → 10MB with full debug info          │
│                                                              │
│  2. SECURITY                                                 │
│     Symbols reveal internal structure to attackers           │
│     Function names expose implementation details             │
│                                                              │
│  3. INTELLECTUAL PROPERTY                                    │
│     Symbol names reveal proprietary algorithm names          │
│     Makes reverse engineering more difficult                 │
│                                                              │
│  4. DEPLOYMENT SPEED                                         │
│     Smaller binaries transfer faster                         │
│     Faster container image pulls                             │
│                                                              │
└──────────────────────────────────────────────────────────────┘

The Symbol Table

An unstripped ELF binary has a symbol table (.symtab) that maps addresses to names:

┌─────────────────────────────────────────────────────────────┐
│                    ELF SYMBOL TABLE                          │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  Address          Size    Type      Name                     │
│  ─────────────────────────────────────────────────────────── │
│  0x0000000001129   45     FUNC      main                     │
│  0x0000000001156   120    FUNC      process_data             │
│  0x00000000011ce   89     FUNC      validate_input           │
│  0x0000000001227   200    FUNC      write_output             │
│                                                              │
│  After stripping:                                            │
│  ─────────────────────────────────────────────────────────── │
│  (empty or minimal)                                          │
│                                                              │
└─────────────────────────────────────────────────────────────┘

The strip Command

The strip command removes debug information and symbols:

# View symbol table before stripping
nm my_program
# Output:
# 0000000000001129 T main
# 0000000000001156 T process_data
# 00000000000011ce T validate_input

# Strip the binary
strip my_program

# View symbol table after stripping
nm my_program
# Output:
# nm: my_program: no symbols
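
An nm listing like the one above is easy to load into a script. The following is a minimal sketch, assuming the default three-column nm output (address, type letter, name); the helper name parse_nm_output is hypothetical and not part of any tool.

```python
def parse_nm_output(text):
    """Parse default `nm` output into sorted (address, type, name) tuples.

    Lines that do not have exactly three fields (undefined symbols such as
    'U puts', or the 'no symbols' message) are skipped.
    """
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue
        addr, sym_type, name = parts
        try:
            symbols.append((int(addr, 16), sym_type, name))
        except ValueError:
            continue  # first field was not a hex address
    return sorted(symbols)


# Sample text copied from the listing above, plus one undefined symbol.
sample = """\
0000000000001129 T main
0000000000001156 T process_data
00000000000011ce T validate_input
                 U puts
"""

for addr, sym_type, name in parse_nm_output(sample):
    print(f"{addr:#x} {sym_type} {name}")
```

Run against a stripped binary's output, the same function returns an empty list, which is a convenient check for a build script.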

2.2 Why This Matters

In the real world, you’ll often debug crashes from:

  • Production deployments (usually stripped)
  • Third-party libraries (no debug builds available)
  • Customer-provided binaries (only have the stripped version)
  • Legacy systems (debug builds long lost)

Without symbols, GDB’s backtrace shows only raw addresses:

(gdb) bt
#0  0x000055555555513d in ?? ()
#1  0x00007ffff7de8b25 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00005555555550a0 in ?? ()

This project teaches you to make sense of ??.

2.3 Historical Context

Debug symbols have been part of Unix since its early days. The DWARF format was developed in the 1980s and is now the standard debug format on Unix and Linux systems. The name began as a pun on ELF (Executable and Linkable Format); the expansion “Debugging With Arbitrary Record Formats” is a later backronym.

The practice of stripping production binaries became standard as:

  1. Disk space was expensive (1970s-1990s)
  2. Security became a concern (1990s-present)
  3. Container deployments prioritized size (2010s-present)

2.4 Common Misconceptions

Misconception 1: “Stripped binaries are impossible to debug”

  • Reality: They’re harder, but you can still examine assembly, memory, and registers

Misconception 2: “You need the source code to debug a crash”

  • Reality: You can often identify the problem from assembly and memory state

Misconception 3: “Stripping removes all useful information”

  • Reality: The actual code remains; only the metadata is removed

Misconception 4: “You can’t recover function names from stripped binaries”

  • Reality: You can often infer them from the unstripped debug build or library symbols

3. Project Specification

3.1 What You Will Build

A complete workflow for debugging stripped binary crashes:

  1. A C program with multiple functions that crashes
  2. A stripped version of this program
  3. A core dump from the stripped version
  4. Documentation of the analysis process using GDB and objdump
  5. A script that automates the address-to-function mapping

3.2 Functional Requirements

  1. Create a multi-function crashing program
    • At least 4-5 functions in the call chain
    • The crash should occur in a nested function call
    • Use a mix of local variables and pointers
  2. Generate both debug and stripped versions
    • Debug version compiled with -g
    • Stripped version with all symbols removed
  3. Analyze the stripped crash
    • Get a backtrace showing only addresses
    • Disassemble the crashing instruction
    • Map addresses back to functions using the debug build
  4. Create a mapping script
    • Input: Address from stripped binary
    • Output: Corresponding function name and approximate source location

3.3 Non-Functional Requirements

  • The program should crash deterministically
  • The binary should be built as a position-independent executable (PIE), so its load address varies as it does in real deployments
  • Documentation should be clear enough for someone else to follow

3.4 Example Usage / Output

Analyzing a stripped binary crash:

# Generate the crash with stripped binary
$ ./crashing_program_stripped
Segmentation fault (core dumped)

# Load in GDB - shows useless backtrace
$ gdb ./crashing_program_stripped core.5678
(gdb) bt
#0  0x000055555555513d in ?? ()
#1  0x00005555555551a8 in ?? ()
#2  0x0000555555555210 in ?? ()
#3  0x00007ffff7de8b25 in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x00005555555550a0 in ?? ()

# Disassemble around the crash location. Without symbols GDB has no function
# boundaries, so give an explicit address range (or use x/8i $pc)
(gdb) disassemble 0x0000555555555129,0x0000555555555144
Dump of assembler code from 0x555555555129 to 0x555555555144:
   0x0000555555555129:  push   %rbp
   0x000055555555512a:  mov    %rsp,%rbp
   ...
=> 0x000055555555513d:  movl   $0x2a,(%rax)  # The crash is here!
   ...
End of assembler dump.

# The disassembly shows we're writing to an address in %rax
(gdb) info registers rax
rax            0x0                 0

# %rax is 0 (NULL)! We found the problem.

Using objdump to map addresses:

# Get the function layout from the debug build
$ objdump -d crashing_program_debug | grep -A 5 "vulnerable_function"
0000000000001129 <vulnerable_function>:
    1129:       55                      push   %rbp
    112a:       48 89 e5                mov    %rsp,%rbp
    ...

# Subtract the load base (0x555555554000 here) from the crash address:
# 0x55555555513d - 0x555555554000 = 0x113d
# In the debug build, 0x113d - 0x1129 = 0x14,
# so the crash is 20 bytes into vulnerable_function!
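
The arithmetic in this example can be checked in a few lines of Python. This is only a worked sketch: the load base 0x555555554000 is an assumption (the address GDB typically uses for a PIE binary when it disables ASLR), and the constant names are made up for illustration.

```python
# Values from the example above. LOAD_BASE is an assumption, not something
# the backtrace itself reports.
CRASH_ADDR = 0x000055555555513D   # frame #0 in the stripped backtrace
LOAD_BASE = 0x0000555555554000    # assumed start of the binary's first mapping
FUNC_START = 0x1129               # vulnerable_function in the debug build

file_addr = CRASH_ADDR - LOAD_BASE     # address as objdump prints it
func_offset = file_addr - FUNC_START   # distance into the function

print(f"file address:    {file_addr:#x}")
print(f"function offset: {func_offset:#x} ({func_offset} bytes)")
```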

3.5 Real World Outcome

You’ll be able to take a customer’s crash report showing only hex addresses and:

  1. Identify which function crashed
  2. Understand the assembly instructions that failed
  3. Determine the likely cause from register state
  4. Provide a meaningful bug report

4. Solution Architecture

4.1 High-Level Design

┌─────────────────────────────────────────────────────────────────┐
│                    STRIPPED BINARY ANALYSIS                      │
└─────────────────────────────────────────────────────────────────┘
                              │
         ┌────────────────────┼────────────────────┐
         ▼                    ▼                    ▼
┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
│  Stripped       │  │  Debug Build    │  │  Core Dump      │
│  Binary         │  │  (Reference)    │  │  File           │
│  ────────────   │  │  ────────────   │  │  ────────────   │
│  No symbols     │  │  Full symbols   │  │  Memory state   │
│  Just code      │  │  DWARF info     │  │  Register vals  │
└────────┬────────┘  └────────┬────────┘  └────────┬────────┘
         │                    │                    │
         └────────────────────┼────────────────────┘
                              │
                              ▼
                 ┌────────────────────────┐
                 │      ANALYSIS          │
                 │  ──────────────────    │
                 │  1. Get crash address  │
                 │  2. Disassemble code   │
                 │  3. Map to debug build │
                 │  4. Find function name │
                 │  5. Examine registers  │
                 └────────────────────────┘
                              │
                              ▼
                 ┌────────────────────────┐
                 │      DIAGNOSIS         │
                 │  ──────────────────    │
                 │  Function: X at line Y │
                 │  Cause: NULL pointer   │
                 └────────────────────────┘

4.2 Key Components

  1. Test Program: Multi-function C program with intentional crash
  2. Build System: Makefile producing both debug and stripped builds
  3. Analysis Script: Python/Bash script to automate address mapping
  4. Documentation: Step-by-step analysis walkthrough

4.3 Data Structures

Understanding ELF structure is key:

ELF File Structure
┌────────────────────────────┐
│      ELF Header            │  ← File type, architecture, entry point
├────────────────────────────┤
│      Program Headers       │  ← Memory layout for execution
├────────────────────────────┤
│      .text section         │  ← Executable code (PRESERVED)
├────────────────────────────┤
│      .data section         │  ← Initialized globals (PRESERVED)
├────────────────────────────┤
│      .rodata section       │  ← Read-only data (PRESERVED)
├────────────────────────────┤
│      .symtab section       │  ← Symbol table (STRIPPED)
├────────────────────────────┤
│      .strtab section       │  ← String table (STRIPPED)
├────────────────────────────┤
│      .debug_* sections     │  ← DWARF info (STRIPPED)
├────────────────────────────┤
│      Section Headers       │  ← Section metadata
└────────────────────────────┘
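
To make the ELF header row of the diagram concrete, here is a minimal sketch that decodes a few header fields with Python's struct module. It handles only little-endian ELF64, and the function name read_elf_header and the returned dictionary keys are hypothetical choices for this illustration.

```python
import struct
import sys

# Little-endian ELF64 header: e_ident[16], e_type, e_machine, e_version,
# e_entry, e_phoff, e_shoff, e_flags, then six 16-bit size/count fields.
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")
ET_EXEC, ET_DYN = 2, 3


def read_elf_header(data):
    """Decode the ELF64 header fields that matter for this project."""
    if len(data) < ELF64_HEADER.size or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch handles only little-endian ELF64")
    (_ident, e_type, _machine, _version, e_entry, _phoff, e_shoff,
     _flags, _ehsize, _phentsize, _phnum, _shentsize, e_shnum,
     _shstrndx) = ELF64_HEADER.unpack_from(data)
    kinds = {ET_EXEC: "EXEC (fixed address)", ET_DYN: "DYN (PIE or shared object)"}
    return {
        "type": kinds.get(e_type, hex(e_type)),
        "entry": e_entry,
        "section_header_offset": e_shoff,
        "section_count": e_shnum,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        print(read_elf_header(f.read(ELF64_HEADER.size)))
```

Running it on the debug and stripped builds should show the same type and entry point but a smaller section count for the stripped file, since the symbol and debug sections are gone.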

4.4 Algorithm Overview

Address Mapping Algorithm:

  1. Calculate base address offset:
    • PIE binaries load at random addresses
    • Subtract runtime base from crash address
    • The result is the address that objdump, nm, and addr2line report (called the file offset in this guide)
  2. Find function containing offset:
    • Parse debug build’s symbol table
    • Find function whose range includes offset
    • Return function name and relative offset
  3. Get source line (if debug build available):
    • Use addr2line with the file offset
    • Returns source file and line number
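
Steps 1 and 2 of the algorithm can be sketched as a binary search over function ranges. This is one possible shape, not a required design: the function name find_function is hypothetical, and the (start, size, name) tuples are assumed to come from the debug build, for example from nm -S output. The sample data is the symbol table shown in section 2.1.

```python
import bisect


def find_function(runtime_addr, load_base, symbols):
    """Map a runtime address to (name, offset_in_function).

    symbols: list of (start, size, name) tuples sorted by start address.
    Returns None if no function range contains the address.
    """
    file_addr = runtime_addr - load_base          # step 1: undo PIE relocation
    starts = [start for start, _size, _name in symbols]
    idx = bisect.bisect_right(starts, file_addr) - 1
    if idx < 0:
        return None                               # below the first function
    start, size, name = symbols[idx]
    if file_addr >= start + size:
        return None                               # in a gap between functions
    return name, file_addr - start                # step 2: name + offset


# Symbol data from the table in section 2.1
SYMBOLS = [
    (0x1129, 45, "main"),
    (0x1156, 120, "process_data"),
    (0x11CE, 89, "validate_input"),
    (0x1227, 200, "write_output"),
]

print(find_function(0x5555555551A3, 0x555555554000, SYMBOLS))
```

For step 3, the same file address can be passed to addr2line with -e pointing at the debug build.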

5. Implementation Guide

5.1 Development Environment Setup

# Install required tools
sudo apt-get install build-essential gdb binutils

# Verify objdump is available
objdump --version

# Create project directory
mkdir -p stripped_crash_project
cd stripped_crash_project

5.2 Project Structure

stripped_crash_project/
├── src/
│   └── crashing_program.c      # The test program
├── bin/
│   ├── crashing_program_debug  # Debug build
│   └── crashing_program_stripped # Stripped build
├── scripts/
│   └── addr2func.py            # Address mapping script
├── analysis/
│   └── analysis_notes.md       # Your investigation notes
└── Makefile

5.3 The Core Question You’re Answering

“How can you determine what function crashed and why when the binary has no debug symbols?”

The answer involves:

  1. Using assembly analysis to understand the failing instruction
  2. Correlating addresses between stripped and debug builds
  3. Reading register state to understand the immediate cause
  4. Using the debug build as a “Rosetta Stone” for the stripped binary

5.4 Concepts You Must Understand First

Before starting, verify you can answer:

  1. What is the difference between static and dynamic linking?
    • Reference: “Computer Systems: A Programmer’s Perspective” Ch. 7
  2. What are the main sections of an ELF file?
    • Reference: “Practical Binary Analysis” Ch. 2
  3. What is Position Independent Executable (PIE)?
    • Reference: GCC documentation on -fPIE
  4. How does the x86-64 calling convention work?
    • Reference: System V AMD64 ABI
  5. What do common x86-64 instructions like MOV, PUSH, CALL mean?
    • Reference: Intel x86-64 manual or “Low-Level Programming” by Zhirkov

5.5 Questions to Guide Your Design

About the test program:

  • How many functions deep should the crash be to make it interesting?
  • Should you use function pointers to make the analysis harder?
  • How will you verify both builds have identical code layout?

About the analysis process:

  • How will you handle ASLR (Address Space Layout Randomization)?
  • What information can you get from the dynamic symbol table (.dynsym)?
  • How do you calculate the base address of a loaded binary?

About automation:

  • How will your script find the corresponding debug build?
  • Should the script parse ELF directly or use tools like objdump?
  • How will you handle addresses in shared libraries?

5.6 Thinking Exercise

Before writing any code, work through this scenario on paper:

Given:

  • A crash at address 0x5555555551a3
  • The binary’s base address is 0x555555554000
  • objdump of the debug build shows:
    0000000000001140 <helper_function>:
        1140:  push   %rbp
        ...
        11a3:  movl   $0x0,(%rax)  <- crash here
        ...
        11bc:  ret
    

Questions to answer:

  1. What is the file offset of the crash?
  2. What function contains this offset?
  3. What is the crashing instruction?
  4. What could cause this instruction to fail?

Exercise: Draw the complete memory layout showing:

  • Where the binary is loaded
  • The relationship between file offsets and runtime addresses
  • How PIE affects address calculations

5.7 Hints in Layers

Hint 1 - Getting Started: Create a simple program with a crash buried 3-4 function calls deep. The key is that when stripped, you won’t know which function crashed just from the backtrace.

Hint 2 - Address Calculation: For PIE binaries, the file offset = runtime address - base address. You can find the base address in GDB with info proc mappings or by looking at the first address in the memory map.

Hint 3 - Using objdump:

# Get function boundaries
objdump -d binary | grep -E '^[0-9a-f]+ <.*>:'

# Get detailed disassembly
objdump -d binary | less

# Get symbol table
objdump -t binary

Hint 4 - Verification Approach: Compare the disassembly at the crash address in both stripped and debug builds. The actual machine code bytes should be identical—only the annotations differ.
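
The byte comparison in Hint 4 can be automated by parsing two objdump -d listings. The sketch below assumes objdump's usual tab-separated instruction lines (address, hex bytes, mnemonic); the function names are hypothetical.

```python
import re

# Matches instruction lines such as "    1129:<TAB>55 <TAB>push %rbp".
# Function label lines ("0000... <name>:") do not match.
INSN_LINE = re.compile(r"^\s*([0-9a-f]+):\t((?:[0-9a-f]{2} )+)")


def instruction_bytes(objdump_text):
    """Return {address: bytes} for each instruction line in `objdump -d` text."""
    result = {}
    for line in objdump_text.splitlines():
        m = INSN_LINE.match(line)
        if m:
            result[int(m.group(1), 16)] = bytes.fromhex(m.group(2))
    return result


def first_mismatch(debug_text, stripped_text):
    """Return the lowest address whose bytes differ, or None if all match."""
    debug = instruction_bytes(debug_text)
    stripped = instruction_bytes(stripped_text)
    for addr in sorted(debug.keys() | stripped.keys()):
        if debug.get(addr) != stripped.get(addr):
            return addr
    return None


# Two lines from the debug-build listing in section 3.4
debug_dump = (
    "0000000000001129 <vulnerable_function>:\n"
    "    1129:\t55                   \tpush   %rbp\n"
    "    112a:\t48 89 e5             \tmov    %rsp,%rbp\n"
)
print(instruction_bytes(debug_dump))
```

A result of None from first_mismatch means the two listings contain identical machine code at every address, which is the property the address mapping relies on.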

5.8 The Interview Questions They’ll Ask

  1. “A customer sends you a crash with only hex addresses. How do you debug it?”
    • Expected: Explain the address mapping process, using debug builds, disassembly
  2. “What’s the difference between the .symtab and .dynsym sections?”
    • Expected: .symtab has all symbols (stripped away), .dynsym has only dynamic linking symbols (preserved)
  3. “How does ASLR affect crash dump analysis?”
    • Expected: Addresses are randomized, need to calculate offsets from base address
  4. “What information survives stripping?”
    • Expected: Code, data, PLT/GOT entries, dynamic symbols for shared libs
  5. “How would you debug a crash in a statically-linked, stripped binary?”
    • Expected: Harder—no library symbols remain. Need signature matching or original debug build
  6. “What’s the difference between a symbol file and a debug build?”
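    • Expected: A debug build is a complete executable compiled with -g. A symbol file (.debug) holds only the symbol and DWARF sections, split out of that build (for example with objcopy --only-keep-debug), and is loaded alongside the stripped binary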

5.9 Books That Will Help

Topic                Book                                              Chapter(s)
ELF Format           “Practical Binary Analysis” - Andriesse           Ch. 2: The ELF Format
x86 Assembly         “Low-Level Programming” - Zhirkov                 Ch. 1-3
GDB Disassembly      “The Art of Debugging” - Matloff & Salzman        Ch. 5
Linking & Symbols    “Computer Systems: A Programmer’s Perspective”    Ch. 7
Reverse Engineering  “Reverse Engineering for Beginners” - Yurichev    Part I

5.10 Implementation Phases

Phase 1: Create the Test Program (Day 1)

  • Write a C program with 4-5 functions
  • Include local variables and pointer operations
  • Ensure crash occurs in a nested function

Phase 2: Build System (Day 1-2)

  • Create Makefile with debug and release targets
  • Verify both builds produce crashes
  • Understand size difference between builds

Phase 3: Manual Analysis (Day 2-4)

  • Analyze stripped crash manually in GDB
  • Document the address mapping process
  • Write up findings in analysis notes

Phase 4: Automation Script (Day 5-7)

  • Create script to automate address→function mapping
  • Handle PIE address calculation
  • Support shared library addresses

Phase 5: Documentation (Day 7-10)

  • Document the complete workflow
  • Create a cheat sheet for common tasks
  • Add example analysis walkthrough

5.11 Key Implementation Decisions

  1. PIE vs Non-PIE: Use PIE binaries (default on modern systems) to learn real-world scenarios

  2. Static vs Dynamic Linking: Use dynamic linking to also learn about library symbol handling

  3. Script Language: Python is recommended for its subprocess and string handling capabilities

  4. Address Representation: Use hex consistently (0x…) to avoid confusion


6. Testing Strategy

Unit Tests

  • Verify stripped binary has no symbols (nm binary reports “no symbols”)
  • Verify debug binary has symbols (nm binary shows functions)
  • Verify both binaries crash at the same code location

Integration Tests

  • Test address mapping script with known addresses
  • Verify script handles both PIE and non-PIE binaries
  • Test with addresses in shared libraries

Verification Checklist

  • Stripped binary is significantly smaller than debug build
  • GDB shows ?? for all user functions in stripped binary
  • Disassembly in both binaries shows identical machine code
  • Address mapping correctly identifies crashing function

7. Common Pitfalls & Debugging

Pitfall 1: ASLR Confusion

Problem: Addresses in core dump don’t match objdump output

Symptom:

(gdb) bt
#0  0x00005623a8c0113d in ?? ()

$ objdump -d binary | grep 113d
    113d:  movl   ...  # But this doesn't match!

Solution: Calculate file offset = runtime address - base address

(gdb) info proc mappings
# Find the binary's base address (usually first entry)
# Subtract base from crash address
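
The subtraction can be scripted once the mappings are saved as text. This is a sketch under assumptions: the sample output below is illustrative (the path and library addresses are made up), the column layout varies between GDB versions, and load_base is a hypothetical helper name. The parser only relies on the first field being a hex start address and the last field being the file path.

```python
def load_base(mappings_text, binary_path):
    """Return the lowest start address mapped from binary_path, or None.

    mappings_text is the text printed by GDB's `info proc mappings`.
    Extra columns between the start address and the path are tolerated.
    """
    starts = []
    for line in mappings_text.splitlines():
        fields = line.split()
        if (len(fields) >= 2 and fields[0].startswith("0x")
                and fields[-1] == binary_path):
            starts.append(int(fields[0], 16))
    return min(starts) if starts else None


# Illustrative output only; addresses and paths are invented for this sketch.
SAMPLE = """\
          Start Addr           End Addr       Size     Offset objfile
      0x5623a8c00000     0x5623a8c01000     0x1000        0x0 /tmp/crashing_program_stripped
      0x5623a8c01000     0x5623a8c02000     0x1000     0x1000 /tmp/crashing_program_stripped
      0x7f3c1a200000     0x7f3c1a228000    0x28000        0x0 /usr/lib/x86_64-linux-gnu/libc.so.6
"""

base = load_base(SAMPLE, "/tmp/crashing_program_stripped")
crash = 0x00005623A8C0113D  # frame #0 from the backtrace in this pitfall
print(f"base {base:#x}, file address {crash - base:#x}")
```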

Pitfall 2: Wrong Binary Loaded

Problem: Disassembly doesn’t make sense

Symptom: GDB shows assembly that doesn’t match the crash context

Solution: Verify you’re using the exact binary that created the core dump

# Check which program produced the core dump
file core.1234
# Compare build IDs: a stripped copy keeps the build ID of the binary it was stripped from
readelf -n crashing_program_stripped crashing_program_debug | grep "Build ID"

Pitfall 3: Compiler Optimizations Differ

Problem: Debug and release builds have different code layout

Symptom: Addresses don’t map correctly between builds

Solution: Use the same optimization level for both builds (or build once with -g and produce the release binary by stripping a copy):

CFLAGS_DEBUG = -g -O2   # Same optimization as release
CFLAGS_RELEASE = -O2 -s # Same optimization as debug

Pitfall 4: Library Addresses

Problem: Crash is in a library, not your code

Symptom: Address starts with 0x7f… (typical libc location)

Solution: Install debug symbols for libraries:

# Debian/Ubuntu
sudo apt-get install libc6-dbg

# RHEL/CentOS
sudo debuginfo-install glibc

8. Extensions & Challenges

Extension 1: Automated Symbol File Matching

Build a tool that:

  • Stores debug builds with their build IDs
  • Automatically retrieves the right debug build for a given core dump
  • Uses GNU Build ID to match binaries

Extension 2: Signature-Based Function Detection

When no debug build exists:

  • Identify common patterns (function prologues/epilogues)
  • Use library function signatures to identify calls
  • Apply ML techniques to recognize function boundaries
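
As a starting point for the prologue idea, the sketch below scans raw code bytes for the frame-pointer prologue seen earlier in this guide (55 48 89 e5, that is push %rbp followed by mov %rsp,%rbp). It is a heuristic only, and the function name is hypothetical: optimized code often omits the frame pointer, and the pattern can also appear inside other instructions or data.

```python
PROLOGUE = bytes.fromhex("554889e5")  # push %rbp; mov %rsp,%rbp


def candidate_function_starts(code, base_addr=0):
    """Return addresses where the classic frame-pointer prologue appears.

    code is the raw content of an executable section; base_addr is the
    address of its first byte. Results are candidates, not proof.
    """
    hits = []
    pos = code.find(PROLOGUE)
    while pos != -1:
        hits.append(base_addr + pos)
        pos = code.find(PROLOGUE, pos + 1)
    return hits


# Synthetic bytes: two tiny "functions" separated by ret and nop.
text = bytes.fromhex("554889e5" "c3" "90" "554889e5" "5dc3")
print([hex(a) for a in candidate_function_starts(text, 0x1129)])
```

Builds with control-flow protection may place an endbr64 instruction before the push, so a fuller version would accept that prefix as well.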

Extension 3: Cross-Architecture Analysis

  • Analyze ARM64 stripped binaries on x86-64
  • Learn QEMU for cross-architecture GDB
  • Compare calling conventions across architectures

Extension 4: Stripped Kernel Analysis

  • Analyze kernel modules without symbols
  • Use kallsyms as a partial symbol source
  • Map kernel addresses to kernel source

9. Real-World Connections

Industry Practice: Separate Debug Files

Many companies store debug files separately:

/usr/lib/debug/
├── usr/
│   └── bin/
│       └── myapp.debug   # Debug symbols only

GDB finds these automatically, either by path under /usr/lib/debug or by build ID:

# Check build ID
readelf -n myapp

# GDB looks in /usr/lib/debug/.build-id/<xx>/<yyyyyyy>.debug
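
The build-ID path layout can be computed from readelf -n output. This is a minimal sketch: the helper names are hypothetical, the build ID value in the sample is invented, and the default debug root is simply the directory named above.

```python
import re


def parse_build_id(readelf_notes):
    """Extract the hex build ID from `readelf -n` output, or None if absent."""
    m = re.search(r"Build ID:\s*([0-9a-f]+)", readelf_notes)
    return m.group(1) if m else None


def debug_file_path(build_id, debug_root="/usr/lib/debug"):
    """Path under debug_root where a build-ID-indexed debug file would live.

    The first two hex digits name the directory; the rest name the file.
    """
    if len(build_id) < 3:
        raise ValueError("build ID too short")
    return f"{debug_root}/.build-id/{build_id[:2]}/{build_id[2:]}.debug"


# Illustrative note text; the build ID value is made up.
notes = "    Build ID: 4f3a9c0de1b2a3948576c0ffee1234567890abcd\n"
print(debug_file_path(parse_build_id(notes)))
```

The same two functions are a reasonable core for Extension 1: index stored debug builds by build ID and look them up with the ID read from the crashing binary.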

Mozilla/Google Breakpad Symbol Server

Large projects maintain symbol servers:

  • Binaries are released stripped
  • Debug symbols uploaded to symbol server
  • Crash reporters download symbols on demand

Production Debugging Workflow

Production Crash → Core Dump → Upload to Analysis System
                                      │
Symbol Server → Matching Symbols ─────┤
                                      │
                              Automated Analysis
                                      │
                               Bug Report

10. Resources

Tools

  • objdump - Disassembler and ELF analyzer
  • nm - Symbol table viewer
  • readelf - ELF file analyzer
  • addr2line - Address to source line translator
  • strip - Symbol removal tool

11. Self-Assessment Checklist

Before You Start

  • Can explain what debug symbols contain
  • Understand basic x86-64 assembly (MOV, PUSH, CALL, RET)
  • Know how to use objdump to view disassembly
  • Understand PIE and ASLR concepts

After Completion

  • Can take a stripped binary crash and identify the function
  • Can read assembly to understand crash cause
  • Can calculate file offsets from runtime addresses
  • Can use objdump/nm/readelf to analyze binaries
  • Can explain the stripping process and what’s removed
  • Can debug crashes without access to source code

12. Submission / Completion Criteria

Your project is complete when you can demonstrate:

  1. Reproducible Crash
    • Stripped binary crashes deterministically
    • Core dump is generated
  2. Analysis Walkthrough
    • Documented process of analyzing stripped crash
    • Showed address mapping to function names
    • Identified crash cause from assembly
  3. Working Script
    • Script takes crash address and returns function name
    • Handles PIE address calculation
    • Works with your test program
  4. Understanding Demonstration
    • Can explain each step of the analysis
    • Can answer interview questions in section 5.8
    • Can apply technique to a new unknown binary

Next: Project 7: The Minidump Parser - Parse Google Breakpad minidump files