Project 6: Deconstructing a Stripped Binary Crash
Debug crashes in production binaries where all debug symbols have been removed.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Advanced |
| Time Estimate | 1-2 weeks |
| Language | C |
| Prerequisites | Project 2, basic assembly knowledge |
| Key Topics | Symbol stripping, disassembly, address mapping, reverse engineering |
1. Learning Objectives
By completing this project, you will:
- Understand what debug symbols are and why they’re removed in production
- Learn to analyze crashes when GDB shows only memory addresses (??)
- Master the disassemble command to view assembly code at crash sites
- Use objdump and address mapping to correlate stripped binaries with debug builds
- Develop techniques for debugging production binaries without source access
- Understand ELF symbol tables and how stripping affects them
2. Theoretical Foundation
2.1 Core Concepts
What Are Debug Symbols?
Debug symbols are metadata embedded in an executable that map:
- Memory addresses → Function names
- Memory addresses → Source file names and line numbers
- Variable names → Stack offsets and memory locations
- Type information → Data structure layouts
When you compile with gcc -g, the compiler generates DWARF debug information stored in special sections of the ELF file (.debug_info, .debug_line, .debug_abbrev, etc.).
Why Strip Binaries?
Production binaries are stripped for several reasons:
┌──────────────────────────────────────────────────────────────┐
│ REASONS TO STRIP │
├──────────────────────────────────────────────────────────────┤
│ │
│ 1. SIZE REDUCTION │
│ Debug info can make a binary 5-10x larger │
│ Example: 1MB binary → 10MB with full debug info │
│ │
│ 2. SECURITY │
│ Symbols reveal internal structure to attackers │
│ Function names expose implementation details │
│ │
│ 3. INTELLECTUAL PROPERTY │
│ Symbol names reveal proprietary algorithm names │
│ Makes reverse engineering more difficult │
│ │
│ 4. DEPLOYMENT SPEED │
│ Smaller binaries transfer faster │
│ Faster container image pulls │
│ │
└──────────────────────────────────────────────────────────────┘
The Symbol Table
An unstripped ELF binary normally carries a symbol table (.symtab) that maps addresses to names:
┌─────────────────────────────────────────────────────────────┐
│ ELF SYMBOL TABLE │
├─────────────────────────────────────────────────────────────┤
│ │
│ Address Size Type Name │
│ ─────────────────────────────────────────────────────────── │
│ 0x0000000001129 45 FUNC main │
│ 0x0000000001156 120 FUNC process_data │
│ 0x00000000011ce 89 FUNC validate_input │
│ 0x0000000001227 200 FUNC write_output │
│ │
│ After stripping: │
│ ─────────────────────────────────────────────────────────── │
│ (.symtab removed; only .dynsym, if present, remains) │
│ │
└─────────────────────────────────────────────────────────────┘
The strip Command
The strip command removes debug information and symbols:
# View symbol table before stripping
nm my_program
# Output:
# 0000000000001129 T main
# 0000000000001156 T process_data
# 00000000000011ce T validate_input
# Strip the binary
strip my_program
# View symbol table after stripping
nm my_program
# Output:
# nm: my_program: no symbols
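The check nm performs can be reproduced by looking for the .symtab section directly. A minimal sketch in C using the Elf64_* types from glibc's <elf.h>, assuming a 64-bit ELF file; elf_has_section is a hypothetical helper, not part of any tool, and it has no main so it can be dropped into a larger program:

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if the 64-bit ELF file at `path` has a section named `name`,
 * 0 if it does not, and -1 if the file cannot be read or is not ELF64. */
int elf_has_section(const char *path, const char *name)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    int result = -1;
    Elf64_Ehdr eh;
    Elf64_Shdr *shdrs = NULL;
    const Elf64_Shdr *names = NULL;
    char *shstrtab = NULL;

    /* Validate the ELF header: magic bytes, 64-bit class, sane indices. */
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_shentsize != sizeof(Elf64_Shdr) ||
        eh.e_shstrndx >= eh.e_shnum)
        goto done;

    /* Load the section header table (strip removes sections, keeps this). */
    shdrs = calloc(eh.e_shnum, sizeof *shdrs);
    if (shdrs == NULL ||
        fseek(f, (long)eh.e_shoff, SEEK_SET) != 0 ||
        fread(shdrs, sizeof *shdrs, eh.e_shnum, f) != eh.e_shnum)
        goto done;

    /* Load .shstrtab, the string table holding the section names. */
    names = &shdrs[eh.e_shstrndx];
    shstrtab = malloc(names->sh_size + 1);
    if (shstrtab == NULL ||
        fseek(f, (long)names->sh_offset, SEEK_SET) != 0 ||
        fread(shstrtab, 1, names->sh_size, f) != names->sh_size)
        goto done;
    shstrtab[names->sh_size] = '\0';

    /* Compare every section's name against the requested one. */
    result = 0;
    for (int i = 0; i < eh.e_shnum; i++) {
        if (shdrs[i].sh_name < names->sh_size &&
            strcmp(shstrtab + shdrs[i].sh_name, name) == 0) {
            result = 1;
            break;
        }
    }

done:
    free(shstrtab);
    free(shdrs);
    fclose(f);
    return result;
}
```

On a binary processed by strip, asking for ".symtab" or ".debug_info" should return 0, while ".dynsym" still returns 1 for a dynamically linked program. The lookup keeps working after stripping because the section header table itself survives.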
2.2 Why This Matters
In the real world, you’ll often debug crashes from:
- Production deployments (always stripped)
- Third-party libraries (no debug builds available)
- Customer-provided binaries (only have the stripped version)
- Legacy systems (debug builds long lost)
Without symbols, GDB shows useless output:
(gdb) bt
#0 0x000055555555513d in ?? ()
#1 0x00007ffff7de8b25 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x00005555555550a0 in ?? ()
This project teaches you to make sense of ??.
2.3 Historical Context
Debug symbols have been a part of Unix since the early days. The DWARF format (Debugging With Arbitrary Record Formats) was developed in the 1980s and is now the standard for Unix/Linux systems. The name is a pun on ELF (Executable and Linkable Format).
The practice of stripping production binaries became standard as:
- Disk space was expensive (1970s-1990s)
- Security became a concern (1990s-present)
- Container deployments prioritized size (2010s-present)
2.4 Common Misconceptions
Misconception 1: “Stripped binaries are impossible to debug”
- Reality: They’re harder, but you can still examine assembly, memory, and registers
Misconception 2: “You need the source code to debug a crash”
- Reality: You can often identify the problem from assembly and memory state
Misconception 3: “Stripping removes all useful information”
- Reality: The actual code remains; only the metadata is removed
Misconception 4: “You can’t recover function names from stripped binaries”
- Reality: You can often infer them from the unstripped debug build or library symbols
3. Project Specification
3.1 What You Will Build
A complete workflow for debugging stripped binary crashes:
- A C program with multiple functions that crashes
- A stripped version of this program
- A core dump from the stripped version
- Documentation of the analysis process using GDB and objdump
- A script that automates the address-to-function mapping
3.2 Functional Requirements
- Create a multi-function crashing program
- At least 4-5 functions in the call chain
- The crash should occur in a nested function call
- Use a mix of local variables and pointers
- Generate both debug and stripped versions
- Debug version compiled with -g
- Stripped version with all symbols removed
- Analyze the stripped crash
- Get a backtrace showing only addresses
- Disassemble the crashing instruction
- Map addresses back to functions using the debug build
- Create a mapping script
- Input: Address from stripped binary
- Output: Corresponding function name and approximate source location
3.3 Non-Functional Requirements
- The program should crash deterministically
- Build the program as a position-independent executable (PIE) so its load address is randomized, as in real deployments
- Documentation should be clear enough for someone else to follow
3.4 Example Usage / Output
Analyzing a stripped binary crash:
# Generate the crash with stripped binary
$ ./crashing_program_stripped
Segmentation fault (core dumped)
# Load in GDB - shows useless backtrace
$ gdb ./crashing_program_stripped core.5678
(gdb) bt
#0 0x000055555555513d in ?? ()
#1 0x00005555555551a8 in ?? ()
#2 0x0000555555555210 in ?? ()
#3 0x00007ffff7de8b25 in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x00005555555550a0 in ?? ()
# Disassemble the crash location
(gdb) disassemble 0x000055555555513d
No function contains specified address.
# Without symbols GDB cannot find function boundaries, so decode instructions at $pc instead
(gdb) x/3i $pc
=> 0x55555555513d: movl $0x2a,(%rax) # The crash is here!
 ...
# The disassembly shows we're writing to an address in %rax
(gdb) info registers rax
rax 0x0 0
# %rax is 0 (NULL)! We found the problem.
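Reading the faulting instruction back into C: 0x2a is 42, and (%rax) is the memory that %rax points to, so the source most likely stored 42 through a pointer that turned out to be NULL. A hypothetical function with that shape is below; the exact instruction sequence depends on the compiler and optimization level, so the comment describes typical GCC -O0 output rather than a guarantee:

```c
/* Hypothetical example: a store through a pointer argument. At -O0, GCC on
 * x86-64 typically loads the pointer into %rax and then emits
 *     movl   $0x2a,(%rax)
 * so calling this with slot == NULL faults with %rax == 0. */
void store_answer(int *slot)
{
    *slot = 42;   /* 42 == 0x2a, the immediate operand of the movl */
}
```

Calling store_answer(NULL) would fault on the store with %rax holding 0, matching the register state in the session above.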
Using objdump to map addresses:
# Get the function layout from the debug build
$ objdump -d crashing_program_debug | grep -A 5 "vulnerable_function"
0000000000001129 <vulnerable_function>:
1129: 55 push %rbp
112a: 48 89 e5 mov %rsp,%rbp
...
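# Assuming the binary was loaded at 0x555555554000 (the usual PIE base when ASLR
# is disabled; confirm with info proc mappings), the crash's file offset is
# 0x55555555513d - 0x555555554000 = 0x113d = 0x1129 + 0x14
# This is inside vulnerable_function!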
3.5 Real World Outcome
You’ll be able to take a customer’s crash report showing only hex addresses and:
- Identify which function crashed
- Understand the assembly instructions that failed
- Determine the likely cause from register state
- Provide a meaningful bug report
4. Solution Architecture
4.1 High-Level Design
┌─────────────────────────────────────────────────────────────────┐
│ STRIPPED BINARY ANALYSIS │
└─────────────────────────────────────────────────────────────────┘
│
┌────────────────────┼────────────────────┐
▼ ▼ ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Stripped │ │ Debug Build │ │ Core Dump │
│ Binary │ │ (Reference) │ │ File │
│ ──────────── │ │ ──────────── │ │ ──────────── │
│ No symbols │ │ Full symbols │ │ Memory state │
│ Just code │ │ DWARF info │ │ Register vals │
└────────┬────────┘ └────────┬────────┘ └────────┬────────┘
│ │ │
└────────────────────┼────────────────────┘
│
▼
┌────────────────────────┐
│ ANALYSIS │
│ ────────────────── │
│ 1. Get crash address │
│ 2. Disassemble code │
│ 3. Map to debug build │
│ 4. Find function name │
│ 5. Examine registers │
└────────────────────────┘
│
▼
┌────────────────────────┐
│ DIAGNOSIS │
│ ────────────────── │
│ Function: X at line Y │
│ Cause: NULL pointer │
└────────────────────────┘
4.2 Key Components
- Test Program: Multi-function C program with intentional crash
- Build System: Makefile producing both debug and stripped builds
- Analysis Script: Python/Bash script to automate address mapping
- Documentation: Step-by-step analysis walkthrough
4.3 Data Structures
Understanding ELF structure is key:
ELF File Structure
┌────────────────────────────┐
│ ELF Header │ ← File type, architecture, entry point
├────────────────────────────┤
│ Program Headers │ ← Memory layout for execution
├────────────────────────────┤
│ .text section │ ← Executable code (PRESERVED)
├────────────────────────────┤
│ .data section │ ← Initialized globals (PRESERVED)
├────────────────────────────┤
│ .rodata section │ ← Read-only data (PRESERVED)
├────────────────────────────┤
│ .symtab section │ ← Symbol table (STRIPPED)
├────────────────────────────┤
│ .strtab section │ ← String table (STRIPPED)
├────────────────────────────┤
│ .debug_* sections │ ← DWARF info (STRIPPED)
├────────────────────────────┤
│ Section Headers │ ← Section metadata
└────────────────────────────┘
4.4 Algorithm Overview
Address Mapping Algorithm:
- Calculate base address offset:
- PIE binaries load at random addresses
- Subtract runtime base from crash address
- Get the file offset
- Find function containing offset:
- Parse debug build’s symbol table
- Find function whose range includes offset
- Return function name and relative offset
- Get source line (if debug build available):
- Use addr2line with the file offset (for example, addr2line -f -e crashing_program_debug 0x113d)
- addr2line returns the source file and line number
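The first two steps of the algorithm fit in a few lines of C. A minimal sketch, assuming the function table (addresses and sizes) has already been extracted from the debug build, for example from nm --print-size or objdump -t output; struct func_sym, find_function, and map_crash_address are hypothetical names:

```c
#include <stddef.h>

/* One function symbol from the debug build (e.g. read from nm or objdump -t). */
struct func_sym {
    unsigned long addr;   /* link-time address as printed by objdump */
    unsigned long size;   /* function size in bytes */
    const char *name;
};

/* Step 2: return the symbol whose [addr, addr + size) range contains `off`,
 * or NULL if no function covers it. A linear scan keeps the sketch short. */
const struct func_sym *find_function(const struct func_sym *syms, size_t n,
                                     unsigned long off)
{
    for (size_t i = 0; i < n; i++) {
        if (off >= syms[i].addr && off < syms[i].addr + syms[i].size)
            return &syms[i];
    }
    return NULL;
}

/* Steps 1 and 2: undo the PIE load offset, then look the result up.
 * On success, *delta receives the offset from the start of the function. */
const struct func_sym *map_crash_address(unsigned long runtime_addr,
                                         unsigned long base,
                                         const struct func_sym *syms, size_t n,
                                         unsigned long *delta)
{
    unsigned long off = runtime_addr - base;   /* step 1: file offset */
    const struct func_sym *f = find_function(syms, n, off);
    if (f != NULL)
        *delta = off - f->addr;
    return f;
}
```

With the illustrative symbol table from section 2.1 and a base of 0x555555554000, the runtime address 0x55555555513d maps to main with a delta of 0x14, which a caller could print as main+0x14 before handing 0x113d to addr2line for step 3. For large binaries, sorting the table and using binary search would be the natural upgrade.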
5. Implementation Guide
5.1 Development Environment Setup
# Install required tools
sudo apt-get install build-essential gdb binutils
# Verify objdump is available
objdump --version
# Create project directory
mkdir -p stripped_crash_project
cd stripped_crash_project
5.2 Project Structure
stripped_crash_project/
├── src/
│ └── crashing_program.c # The test program
├── bin/
│ ├── crashing_program_debug # Debug build
│ └── crashing_program_stripped # Stripped build
├── scripts/
│ └── addr2func.py # Address mapping script
├── analysis/
│ └── analysis_notes.md # Your investigation notes
└── Makefile
5.3 The Core Question You’re Answering
“How can you determine what function crashed and why when the binary has no debug symbols?”
The answer involves:
- Using assembly analysis to understand the failing instruction
- Correlating addresses between stripped and debug builds
- Reading register state to understand the immediate cause
- Using the debug build as a “Rosetta Stone” for the stripped binary
5.4 Concepts You Must Understand First
Before starting, verify you can answer:
- What is the difference between static and dynamic linking?
- Reference: “Computer Systems: A Programmer’s Perspective” Ch. 7
- What are the main sections of an ELF file?
- Reference: “Practical Binary Analysis” Ch. 2
- What is Position Independent Executable (PIE)?
- Reference: GCC documentation on -fPIE
- How does the x86-64 calling convention work?
- Reference: System V AMD64 ABI
- What do common x86-64 instructions like MOV, PUSH, CALL mean?
- Reference: Intel x86-64 manual or “Low-Level Programming” by Zhirkov
5.5 Questions to Guide Your Design
About the test program:
- How many functions deep should the crash be to make it interesting?
- Should you use function pointers to make the analysis harder?
- How will you verify both builds have identical code layout?
About the analysis process:
- How will you handle ASLR (Address Space Layout Randomization)?
- What information can you get from the dynamic symbol table (.dynsym)?
- How do you calculate the base address of a loaded binary?
About automation:
- How will your script find the corresponding debug build?
- Should the script parse ELF directly or use tools like objdump?
- How will you handle addresses in shared libraries?
5.6 Thinking Exercise
Before writing any code, work through this scenario on paper:
Given:
- A crash at address 0x5555555551a3
- The binary’s base address is 0x555555554000
- objdump of the debug build shows:
0000000000001140 <helper_function>:
 1140: push %rbp
 ...
 11a3: movl $0x0,(%rax) <- crash here
 ...
 11bc: ret
Questions to answer:
- What is the file offset of the crash?
- What function contains this offset?
- What is the crashing instruction?
- What could cause this instruction to fail?
Exercise: Draw the complete memory layout showing:
- Where the binary is loaded
- The relationship between file offsets and runtime addresses
- How PIE affects address calculations
5.7 Hints in Layers
Hint 1 - Getting Started: Create a simple program with a crash buried 3-4 function calls deep. The key is that when stripped, you won’t know which function crashed just from the backtrace.
Hint 2 - Address Calculation:
For PIE binaries, file offset = runtime address - base address. The base address is the start of the binary's lowest mapping (the one with file offset 0), which you can read from info proc mappings in GDB; it is not necessarily the start of the executable (r-xp) mapping.
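A sketch of this hint in C, assuming the mappings are available as text in the kernel's /proc/<pid>/maps format (GDB's info proc mappings reports the same start, end, and offset information in a different column layout). Instead of relying on a single base, it finds the mapping that contains the address and adds that mapping's file offset, which also works for shared-library addresses; struct mapping, find_mapping, and mapping_file_offset are hypothetical names:

```c
#include <stdio.h>
#include <string.h>

/* One line of /proc/<pid>/maps: "start-end perms offset dev inode [path]". */
struct mapping {
    unsigned long start, end, offset;
    char path[256];
};

/* Parses a single maps line. Returns 1 on success, 0 on malformed input. */
static int parse_maps_line(const char *line, struct mapping *m)
{
    int path_pos = -1;
    if (sscanf(line, "%lx-%lx %*s %lx %*s %*s %n",
               &m->start, &m->end, &m->offset, &path_pos) != 3)
        return 0;
    m->path[0] = '\0';
    if (path_pos >= 0) {   /* anonymous mappings have no path */
        strncpy(m->path, line + path_pos, sizeof m->path - 1);
        m->path[sizeof m->path - 1] = '\0';
    }
    return 1;
}

/* Finds the mapping that contains `addr` in a maps-format text buffer.
 * Returns 1 and fills *out on success, 0 if no mapping contains it. */
int find_mapping(const char *maps_text, unsigned long addr, struct mapping *out)
{
    const char *line = maps_text;
    while (line != NULL && *line != '\0') {
        const char *nl = strchr(line, '\n');
        size_t len = nl ? (size_t)(nl - line) : strlen(line);
        char buf[512];
        if (len >= sizeof buf)
            len = sizeof buf - 1;
        memcpy(buf, line, len);
        buf[len] = '\0';
        if (parse_maps_line(buf, out) && addr >= out->start && addr < out->end)
            return 1;
        line = nl ? nl + 1 : NULL;
    }
    return 0;
}

/* File offset of `addr`: distance into the mapping plus the file offset
 * at which that mapping starts. */
unsigned long mapping_file_offset(const struct mapping *m, unsigned long addr)
{
    return addr - m->start + m->offset;
}
```

For the binary's own code this agrees with simple base subtraction: if 0x55555555513d falls in an r-xp mapping that starts at 0x555555555000 with file offset 0x1000, its file offset is 0x13d + 0x1000 = 0x113d, the same as 0x55555555513d - 0x555555554000. Addresses in the 0x7f… range resolve the same way against the libc mapping they fall in.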
Hint 3 - Using objdump:
# Get function boundaries
objdump -d binary | grep -E '^[0-9a-f]+ <.*>:'
# Get detailed disassembly
objdump -d binary | less
# Get symbol table
objdump -t binary
Hint 4 - Verification Approach: Compare the disassembly at the crash address in both stripped and debug builds. The actual machine code bytes should be identical—only the annotations differ.
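The same verification can be done without a disassembler by comparing raw bytes at the crash's file offset in both files. A minimal sketch, assuming the stripped binary was produced by running strip on a copy of the debug build, so the loadable code typically stays at the same file offsets; same_bytes_at is a hypothetical helper:

```c
#include <stdio.h>
#include <string.h>

/* Compares `len` bytes starting at file offset `offset` in two files.
 * Returns 1 if identical, 0 if they differ, and -1 on I/O errors,
 * including when either file ends before offset + len. */
int same_bytes_at(const char *path_a, const char *path_b, long offset, size_t len)
{
    unsigned char a[256], b[256];
    if (len == 0 || len > sizeof a)
        return -1;

    FILE *fa = fopen(path_a, "rb");
    FILE *fb = fopen(path_b, "rb");
    int result = -1;
    if (fa != NULL && fb != NULL &&
        fseek(fa, offset, SEEK_SET) == 0 && fseek(fb, offset, SEEK_SET) == 0 &&
        fread(a, 1, len, fa) == len && fread(b, 1, len, fb) == len)
        result = memcmp(a, b, len) == 0;

    if (fa != NULL)
        fclose(fa);
    if (fb != NULL)
        fclose(fb);
    return result;
}
```

For the example in section 3.4, comparing 16 bytes at offset 0x113d in the stripped and debug builds should report a match if both came from the same build.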
5.8 The Interview Questions They’ll Ask
- “A customer sends you a crash with only hex addresses. How do you debug it?”
- Expected: Explain the address mapping process, using debug builds, disassembly
- “What’s the difference between the .symtab and .dynsym sections?”
- Expected: .symtab has all symbols (stripped away), .dynsym has only dynamic linking symbols (preserved)
- “How does ASLR affect crash dump analysis?”
- Expected: Addresses are randomized, need to calculate offsets from base address
- “What information survives stripping?”
- Expected: Code, data, PLT/GOT entries, dynamic symbols for shared libs
- “How would you debug a crash in a statically-linked, stripped binary?”
- Expected: Harder—no library symbols remain. Need signature matching or original debug build
- “What’s the difference between a symbol file and a debug build?”
- Expected: Some build systems create separate .debug files that can be loaded alongside stripped binaries
5.9 Books That Will Help
| Topic | Book | Chapter(s) |
|---|---|---|
| ELF Format | “Practical Binary Analysis” - Andriesse | Ch. 2: The ELF Format |
| x86 Assembly | “Low-Level Programming” - Zhirkov | Ch. 1-3 |
| GDB Disassembly | “The Art of Debugging” - Matloff & Salzman | Ch. 5 |
| Linking & Symbols | “Computer Systems: A Programmer’s Perspective” | Ch. 7 |
| Reverse Engineering | “Reverse Engineering for Beginners” - Yurichev | Part I |
5.10 Implementation Phases
Phase 1: Create the Test Program (Day 1)
- Write a C program with 4-5 functions
- Include local variables and pointer operations
- Ensure crash occurs in a nested function
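One possible starting point for Phase 1, offered as a sketch rather than the required design: the function names, the account table, and the planted bug are all hypothetical. __attribute__((noinline)) is a GCC/Clang extension used here so each function keeps its own stack frame when optimization is enabled. The block has no main so the non-crashing paths can be tested on their own:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record type for this demo. */
struct account {
    const char *name;
    int *score;              /* NULL means "never allocated": the planted bug */
};

static int alice_score = 0;
static struct account accounts[] = {
    { "alice", &alice_score },
    { "bob",   NULL },       /* deliberately missing storage */
};
static int calls_completed = 0;

/* Deepest frame and crash site: writes through an unchecked pointer. */
__attribute__((noinline)) void store_score(struct account *acct, int value)
{
    *acct->score = value;    /* SIGSEGV when acct->score == NULL */
}

/* Linear search over the table; returns NULL for unknown names. */
__attribute__((noinline)) struct account *find_account(const char *name)
{
    for (size_t i = 0; i < sizeof accounts / sizeof accounts[0]; i++) {
        if (strcmp(accounts[i].name, name) == 0)
            return &accounts[i];
    }
    return NULL;
}

/* Middle frame: checks the account pointer but not the score pointer. */
__attribute__((noinline)) int update_account(const char *name, int value)
{
    struct account *acct = find_account(name);
    if (acct == NULL)
        return -1;
    store_score(acct, value);
    return *acct->score;
}

/* Top frame called from main. The global update after the call keeps the
 * compiler from turning it into a tail call that would hide this frame. */
__attribute__((noinline)) int run_pipeline(const char *name)
{
    int local_value = 42;
    int result = update_account(name, local_value);
    calls_completed++;
    return result;
}
```

To produce the crashing executable, add a main such as int main(void) { return run_pipeline("bob") != 0; }, build it with -g, and create the stripped version by running strip on a copy. The call chain at the crash is then main → run_pipeline → update_account → store_score, with the fault inside store_score.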
Phase 2: Build System (Day 1-2)
- Create Makefile with debug and release targets
- Verify both builds produce crashes
- Understand size difference between builds
Phase 3: Manual Analysis (Day 2-4)
- Analyze stripped crash manually in GDB
- Document the address mapping process
- Write up findings in analysis notes
Phase 4: Automation Script (Day 5-7)
- Create script to automate address→function mapping
- Handle PIE address calculation
- Support shared library addresses
Phase 5: Documentation (Day 7-10)
- Document the complete workflow
- Create a cheat sheet for common tasks
- Add example analysis walkthrough
5.11 Key Implementation Decisions
- PIE vs Non-PIE: Use PIE binaries (default on modern systems) to learn real-world scenarios
- Static vs Dynamic Linking: Use dynamic linking to also learn about library symbol handling
- Script Language: Python is recommended for its subprocess and string handling capabilities
- Address Representation: Use hex consistently (0x…) to avoid confusion
6. Testing Strategy
Unit Tests
- Verify the stripped binary has no symbols (nm binary reports "no symbols")
- Verify the debug binary has symbols (nm binary shows functions)
- Verify both binaries crash at the same code location
Integration Tests
- Test address mapping script with known addresses
- Verify script handles both PIE and non-PIE binaries
- Test with addresses in shared libraries
Verification Checklist
- Stripped binary is significantly smaller than debug build
- GDB shows ?? for all user functions in the stripped binary
- Disassembly in both binaries shows identical machine code
- Address mapping correctly identifies crashing function
7. Common Pitfalls & Debugging
Pitfall 1: ASLR Confusion
Problem: Addresses in core dump don’t match objdump output
Symptom:
(gdb) bt
#0 0x00005623a8c0113d in ?? ()
$ objdump -d binary | grep 113d
113d: movl ... # But this doesn't match!
Solution: Calculate file offset = runtime address - base address
(gdb) info proc mappings
# Find the binary's base address (usually first entry)
# Subtract base from crash address
Pitfall 2: Wrong Binary Loaded
Problem: Disassembly doesn’t make sense
Symptom: GDB shows assembly that doesn’t match the crash context
Solution: Verify you’re using the exact binary that created the core dump
# Check core dump's binary path
file core.1234
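# Compare GNU build IDs: a stripped copy of the debug build keeps the same ID,
# while whole-file MD5/SHA hashes always differ (the debug build has extra sections)
readelf -n crashing_program_stripped crashing_program_debug | grep "Build ID"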
Pitfall 3: Compiler Optimizations Differ
Problem: Debug and release builds have different code layout
Symptom: Addresses don’t map correctly between builds
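Solution: Use the same optimization level for both builds. (Simpler still: build once with -g and produce the release binary by stripping a copy, so the machine code is byte-for-byte identical.)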
CFLAGS_DEBUG = -g -O2 # Same optimization as release
CFLAGS_RELEASE = -O2 -s # Same optimization as debug
Pitfall 4: Library Addresses
Problem: Crash is in a library, not your code
Symptom: Address starts with 0x7f… (typical libc location)
Solution: Install debug symbols for libraries:
# Debian/Ubuntu
sudo apt-get install libc6-dbg
# RHEL/CentOS
sudo debuginfo-install glibc
8. Extensions & Challenges
Extension 1: Automated Symbol File Matching
Build a tool that:
- Stores debug builds with their build IDs
- Automatically retrieves the right debug build for a given core dump
- Uses GNU Build ID to match binaries
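The core of this extension is reading the build ID, which readelf -n prints. A sketch of doing it programmatically in C, assuming a 64-bit ELF file and using the NT_GNU_BUILD_ID note type from glibc's <elf.h>; read_file and read_build_id are hypothetical names, and the whole file is loaded into memory for simplicity:

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reads a whole file into a malloc'd buffer; returns NULL on failure. */
static unsigned char *read_file(const char *path, size_t *len)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;
    unsigned char *buf = NULL;
    long size = -1;
    if (fseek(f, 0, SEEK_END) == 0 && (size = ftell(f)) > 0 &&
        fseek(f, 0, SEEK_SET) == 0 && (buf = malloc((size_t)size)) != NULL &&
        fread(buf, 1, (size_t)size, f) != (size_t)size) {
        free(buf);
        buf = NULL;
    }
    fclose(f);
    *len = buf ? (size_t)size : 0;
    return buf;
}

/* Writes the GNU build ID of a 64-bit ELF file into `out` as lowercase hex.
 * Returns 0 on success, 1 if no build-ID note exists, -1 on error. */
int read_build_id(const char *path, char *out, size_t outlen)
{
    size_t len = 0;
    unsigned char *img = read_file(path, &len);
    if (img == NULL)
        return -1;

    int rc = -1;
    Elf64_Ehdr eh;
    if (len < sizeof eh)
        goto done;
    memcpy(&eh, img, sizeof eh);
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_shoff + (size_t)eh.e_shnum * sizeof(Elf64_Shdr) > len)
        goto done;

    rc = 1;
    for (size_t i = 0; i < eh.e_shnum; i++) {
        Elf64_Shdr sh;
        memcpy(&sh, img + eh.e_shoff + i * sizeof sh, sizeof sh);
        if (sh.sh_type != SHT_NOTE || sh.sh_offset + sh.sh_size > len)
            continue;
        /* Each note: header, name padded to 4 bytes, descriptor padded to 4. */
        size_t pos = sh.sh_offset, end = sh.sh_offset + sh.sh_size;
        while (pos + sizeof(Elf64_Nhdr) <= end) {
            Elf64_Nhdr nh;
            memcpy(&nh, img + pos, sizeof nh);
            size_t name_off = pos + sizeof nh;
            size_t desc_off = name_off + ((nh.n_namesz + 3u) & ~3u);
            size_t next = desc_off + ((nh.n_descsz + 3u) & ~3u);
            if (next > end)
                break;
            if (nh.n_type == NT_GNU_BUILD_ID && nh.n_namesz == 4 &&
                memcmp(img + name_off, "GNU", 4) == 0) {
                if (outlen < 2 * (size_t)nh.n_descsz + 1) {
                    rc = -1;
                    goto done;
                }
                for (size_t k = 0; k < nh.n_descsz; k++)
                    sprintf(out + 2 * k, "%02x", img[desc_off + k]);
                out[2 * (size_t)nh.n_descsz] = '\0';
                rc = 0;
                goto done;
            }
            pos = next;
        }
    }
done:
    free(img);
    return rc;
}
```

Because strip keeps the .note.gnu.build-id section, a debug build and a stripped copy of it should yield the same ID, which makes the ID a natural key for the tool's lookup table.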
Extension 2: Signature-Based Function Detection
When no debug build exists:
- Identify common patterns (function prologues/epilogues)
- Use library function signatures to identify calls
- Apply ML techniques to recognize function boundaries
Extension 3: Cross-Architecture Analysis
- Analyze ARM64 stripped binaries on x86-64
- Learn QEMU for cross-architecture GDB
- Compare calling conventions across architectures
Extension 4: Stripped Kernel Analysis
- Analyze kernel modules without symbols
- Use kallsyms as a partial symbol source
- Map kernel addresses to kernel source
9. Real-World Connections
Industry Practice: Separate Debug Files
Many companies store debug files separately:
/usr/lib/debug/
├── usr/
│ └── bin/
│ └── myapp.debug # Debug symbols only
GDB automatically finds these via build-id:
# Check build ID
readelf -n myapp
# GDB looks in /usr/lib/debug/.build-id/<xx>/<yyyyyyy>.debug
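Here <xx> is the first two hex digits of the build ID and <yyyyyyy> is the rest. A small, hypothetical C helper that assembles that path from a hex build ID string such as the one readelf -n prints:

```c
#include <stdio.h>
#include <string.h>

/* Builds "/usr/lib/debug/.build-id/<first 2 hex>/<rest>.debug" from a hex
 * build ID. Returns 0 on success, -1 if the ID is too short or `out` is
 * too small to hold the whole path. */
int build_id_debug_path(const char *build_id_hex, char *out, size_t outlen)
{
    if (strlen(build_id_hex) < 3)
        return -1;
    int written = snprintf(out, outlen, "/usr/lib/debug/.build-id/%.2s/%s.debug",
                           build_id_hex, build_id_hex + 2);
    return (written < 0 || (size_t)written >= outlen) ? -1 : 0;
}
```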
Mozilla/Google Breakpad Symbol Server
Large projects maintain symbol servers:
- Binaries are released stripped
- Debug symbols uploaded to symbol server
- Crash reporters download symbols on demand
Production Debugging Workflow
Production Crash → Core Dump → Upload to Analysis System
│
Symbol Server → Matching Symbols ─────┤
│
Automated Analysis
│
Bug Report
10. Resources
Tools
- objdump - Disassembler and ELF analyzer
- nm - Symbol table viewer
- readelf - ELF file analyzer
- addr2line - Address to source line translator
- strip - Symbol removal tool
11. Self-Assessment Checklist
Before You Start
- Can explain what debug symbols contain
- Understand basic x86-64 assembly (MOV, PUSH, CALL, RET)
- Know how to use objdump to view disassembly
- Understand PIE and ASLR concepts
After Completion
- Can take a stripped binary crash and identify the function
- Can read assembly to understand crash cause
- Can calculate file offsets from runtime addresses
- Can use objdump/nm/readelf to analyze binaries
- Can explain the stripping process and what’s removed
- Can debug crashes without access to source code
12. Submission / Completion Criteria
Your project is complete when you can demonstrate:
- Reproducible Crash
- Stripped binary crashes deterministically
- Core dump is generated
- Analysis Walkthrough
- Documented process of analyzing stripped crash
- Showed address mapping to function names
- Identified crash cause from assembly
- Working Script
- Script takes crash address and returns function name
- Handles PIE address calculation
- Works with your test program
- Understanding Demonstration
- Can explain each step of the analysis
- Can answer interview questions in section 5.8
- Can apply technique to a new unknown binary
Next: Project 7: The Minidump Parser - Parse Google Breakpad minidump files