Project 4: The Automated Crash Detective
Build a Python script that automates initial crash dump triage, extracting backtraces, registers, and crash signals from core files to generate concise analysis reports.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Intermediate |
| Time Estimate | 1-2 weeks |
| Language | Python |
| Prerequisites | Project 2 (GDB Backtrace), Basic Python scripting |
| Key Topics | GDB batch mode, GDB Python API, subprocess management, crash automation |
1. Learning Objectives
By completing this project, you will:
- Understand GDB’s batch mode and how to run non-interactive debugging sessions
- Master the GDB Python API for programmatic access to debugging information
- Learn subprocess management for orchestrating external tools from Python
- Build robust text parsing to extract structured data from GDB output
- Design reusable automation scripts that handle multiple crash scenarios
- Develop practical SRE/DevOps skills for crash triage at scale
2. Theoretical Foundation
2.1 Core Concepts
GDB Batch Mode: Non-Interactive Debugging
GDB’s batch mode allows you to run debugging commands without human interaction. This is the foundation of automated crash analysis.
Interactive GDB Session Batch Mode Execution
┌───────────────────────────┐ ┌───────────────────────────┐
│ $ gdb ./app core.1234 │ │ $ gdb --batch --quiet \ │
│ (gdb) bt │ │ --command=cmds.gdb \ │
│ #0 main() at app.c:10 │ │ ./app core.1234 │
│ (gdb) info registers │ │ │
│ rax 0x0 ... │ │ #0 main() at app.c:10 │
│ (gdb) quit │ │ rax 0x0 ... │
└───────────────────────────┘ └───────────────────────────┘
│ │
▼ ▼
Human types commands Script reads stdout output
Human reads output Script parses and processes
Key batch mode flags:
- `--batch`: Exit after processing commands (implies `--quiet`)
- `--quiet` or `-q`: Suppress introductory and copyright messages
- `--command=FILE` or `-x FILE`: Execute GDB commands from FILE
- `--eval-command=COMMAND` or `-ex COMMAND`: Execute a single GDB command
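Since the analyzer will assemble these flags in code, it helps to see the argv list built programmatically. A minimal sketch; `build_batch_argv` is a hypothetical helper name, not part of GDB or the project spec:

```python
# Sketch: assembling a non-interactive GDB invocation from -ex commands.
# build_batch_argv is an illustrative helper name, not a GDB API.

def build_batch_argv(executable, core_file, commands):
    """Build an argv list for a batch-mode GDB session."""
    argv = ["gdb", "--batch", "--quiet"]
    for cmd in commands:                 # one -ex flag per GDB command
        argv += ["-ex", cmd]
    argv += [executable, core_file]      # positionals: program, then core
    return argv

argv = build_batch_argv("./my_app", "core.1234", ["bt", "info registers"])
print(" ".join(argv))
```

Passing this list to `subprocess.run` (rather than a shell string) avoids quoting problems when paths contain spaces.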
GDB Python API: Structured Access to Debugging Data
GDB embeds a Python interpreter that provides programmatic access to debugging information. Unlike parsing text output, the Python API gives you structured data.
┌─────────────────────────────────────────────────────────────────┐
│ GDB Architecture │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌─────────────────┐ ┌────────────────┐ │
│ │ Core Dump │───▶│ GDB Core │───▶│ Command Line │ │
│ │ + Executable │ │ (symbol table, │ │ Interface │ │
│ └──────────────┘ │ stack unwinder,│ └────────────────┘ │
│ │ expression │ │ │
│ │ evaluator) │ ▼ │
│ └────────┬─────────┘ ┌────────────────┐ │
│ │ │ Text Output │ │
│ │ └────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────┐ │
│ │ Python API │ │
│ │ (gdb module) │ │
│ └────────┬────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────┐ │
│ │ Your Script │ │
│ │ (analyzer.py) │ │
│ └─────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Essential GDB Python API functions:
| Function | Purpose | Example |
|---|---|---|
| `gdb.execute(cmd)` | Run a GDB command; with `to_string=True` the output is returned as a string | `gdb.execute("bt", to_string=True)` |
| `gdb.parse_and_eval(expr)` | Evaluate an expression, return a typed `gdb.Value` | `gdb.parse_and_eval("$rip")` |
| `gdb.selected_frame()` | Get the currently selected stack frame | `frame = gdb.selected_frame()` |
| `gdb.selected_inferior()` | Get the current inferior (the debugged process) | `inf = gdb.selected_inferior()` |
| `gdb.newest_frame()` | Get the innermost (newest) frame | `top = gdb.newest_frame()` |
Subprocess Management: Orchestrating External Tools
Python’s subprocess module allows your main script to invoke GDB as a child process and capture its output.
┌─────────────────────────────────────────────────────────────────┐
│ Two-Process Architecture │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Process 1: Your Python Script (auto_analyzer.py) │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ executable = sys.argv[1] │ │
│ │ core_file = sys.argv[2] │ │
│ │ │ │
│ │ result = subprocess.run( │ │
│ │ ["gdb", "--batch", "-x", "analyzer.py", │ │
│ │ executable, core_file], │ │
│ │ capture_output=True, text=True │ │
│ │ ) │ │
│ │ │ │
│ │ report = parse_output(result.stdout) │ │
│ │ │ │
│ └──────────────────────────┬──────────────────────────────┘ │
│ │ │
│ │ spawns │
│ ▼ │
│ Process 2: GDB with Python Extension │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ GDB loads core dump and executable │ │
│ │ GDB runs analyzer.py inside its Python interpreter │ │
│ │ analyzer.py uses gdb.execute() and gdb.parse_and_eval()│ │
│ │ Output goes to stdout → captured by Process 1 │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
2.2 Why This Matters
The scale problem:
- A single server might generate 1-5 crashes per day during development
- A fleet of 1000 servers might generate 100+ crashes per day
- A mobile app with 1 million users might generate 10,000+ crash reports per day
Manual analysis does not scale. Every SRE team, every crash reporting service (Sentry, Crashlytics, Breakpad), and every serious debugging workflow relies on automation.
What automation enables:
- Immediate triage: Categorize crashes by type within seconds
- Deduplication: Group identical crashes to focus engineering effort
- Alerting: Notify on-call engineers when new crash types appear
- Trending: Track crash rates over time to detect regressions
- Integration: Connect crash data to CI/CD, ticketing, and monitoring systems
2.3 Historical Context
Before automation (pre-2000s):
- Engineers would manually run `gdb` on each core dump
- No systematic tracking of which crashes had been analyzed
- “I’ll look at that later” often meant “never”
- Major crashes slipped through the cracks
The evolution of crash automation:
- Shell scripts (1990s): Simple `gdb --batch` wrappers
- GDB command files (2000s): Reusable `.gdb` scripts
- GDB Python API (GDB 7.0, 2009): Structured programmatic access
- Crash reporting services (2010s): Sentry, Crashlytics, Raygun
- Modern pipelines (2020s): Integration with observability platforms
Why the GDB Python API was a game-changer (GDB 7.0, 2009):
- Before: Parse text output with fragile regex
- After: Access typed values, iterate frames, query symbols programmatically
- Made complex automation reliable and maintainable
2.4 Common Misconceptions
Misconception 1: “Batch mode means no interactivity, so it’s limited”
Reality: Batch mode gives you the SAME capabilities as interactive mode. You can:
- Set breakpoints, watchpoints, and catchpoints
- Evaluate any expression
- Examine any memory
- Walk the stack
The only difference is input comes from a script instead of a keyboard.
Misconception 2: “The GDB Python API is just for writing GDB extensions”
Reality: The API is equally useful for:
- One-off analysis scripts
- CI/CD integration
- Automated crash reports
- Custom debugging tools
You don’t need to modify GDB or create plugins.
Misconception 3: “Parsing GDB’s text output is good enough”
Reality: Text parsing is fragile because:
- Output format changes between GDB versions
- Localization can change output language
- Edge cases produce unexpected formatting
The Python API provides stable, typed access to the same data.
Misconception 4: “This is only useful for C/C++ crashes”
Reality: GDB (and your automation) can analyze:
- C and C++ programs
- Rust programs (with debug symbols)
- Go programs (with limitations)
- Any language that produces ELF executables with DWARF debug info
3. Project Specification
3.1 What You Will Build
A Python script (auto_analyzer.py) that:
- Takes an executable path and a core dump file as arguments
- Programmatically invokes GDB to load the crash
- Extracts key crash information (signal, backtrace, registers)
- Produces a formatted summary report
┌─────────────────────────────────────────────────────────────────┐
│ auto_analyzer.py │
├─────────────────────────────────────────────────────────────────┤
│ │
│ INPUT: │
│ ┌─────────────┐ ┌─────────────┐ │
│ │ Executable │ │ Core Dump │ │
│ │ (./my_app) │ │ (core.1234) │ │
│ └──────┬──────┘ └──────┬──────┘ │
│ │ │ │
│ └────────┬─────────┘ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ GDB + Python Script │ │
│ │ │ │
│ │ 1. Load core dump │ │
│ │ 2. Extract signal information │ │
│ │ 3. Get backtrace │ │
│ │ 4. Read register values │ │
│ │ 5. Identify crash location │ │
│ │ │ │
│ └──────────────────────────┬──────────────────────────────┘ │
│ │ │
│ ▼ │
│ OUTPUT: │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Crash Analysis Report │ │
│ │ │ │
│ │ Executable: ./my_app │ │
│ │ Core File: core.1234 │ │
│ │ Signal: SIGSEGV (Segmentation fault) │ │
│ │ Crashing IP: 0x55555555513d │ │
│ │ │ │
│ │ --- Backtrace --- │ │
│ │ #0 main () at crashing_program.c:4 │ │
│ │ │ │
│ │ --- Registers --- │ │
│ │ RAX: 0x0 │ │
│ │ RBX: 0x0 │ │
│ │ ... │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
3.2 Functional Requirements
| ID | Requirement | Priority |
|---|---|---|
| FR-1 | Accept executable and core file paths as command-line arguments | Must Have |
| FR-2 | Validate that both files exist and are readable | Must Have |
| FR-3 | Extract the signal that caused the crash (SIGSEGV, SIGABRT, etc.) | Must Have |
| FR-4 | Extract the crash address (if applicable) | Must Have |
| FR-5 | Extract the full backtrace | Must Have |
| FR-6 | Extract key register values (RIP, RSP, RAX, RBX, etc.) | Must Have |
| FR-7 | Handle core dumps without debug symbols gracefully | Should Have |
| FR-8 | Output a human-readable report to stdout | Must Have |
| FR-9 | Support different crash types (SIGSEGV, SIGFPE, SIGABRT, SIGBUS) | Should Have |
| FR-10 | Provide JSON output option for machine processing | Nice to Have |
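FR-1, FR-2, and the FR-10 flag map directly onto a small `argparse` front end. A minimal sketch (function names are illustrative; `validate` would run before any GDB invocation):

```python
# Minimal CLI skeleton covering FR-1, FR-2, and the FR-10 --json flag.
import argparse
import os
import sys

def build_parser():
    p = argparse.ArgumentParser(description="Automated crash dump triage")
    p.add_argument("executable", help="path to the crashed binary")      # FR-1
    p.add_argument("core_file", help="path to the core dump")            # FR-1
    p.add_argument("--json", action="store_true",
                   help="emit machine-readable JSON (FR-10)")
    return p

def validate(args):
    """FR-2: fail early with a clear message if either file is missing."""
    for label, path in (("Executable", args.executable),
                        ("Core file", args.core_file)):
        if not os.path.isfile(path):
            sys.exit(f"ERROR: {label} not found: {path}")

args = build_parser().parse_args(["./my_app", "core.1234", "--json"])
print(args.executable, args.core_file, args.json)
```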
3.3 Non-Functional Requirements
| ID | Requirement | Metric |
|---|---|---|
| NFR-1 | Analysis completes quickly | < 5 seconds for typical core dump |
| NFR-2 | Works with GDB 8.0+ | Tested on Ubuntu 20.04, 22.04 |
| NFR-3 | Error messages are clear and actionable | User can resolve issues without docs |
| NFR-4 | Script is portable | Works on any Linux distribution |
| NFR-5 | No external Python dependencies (stdlib only) | Easy deployment |
3.4 Example Usage / Output
Basic usage:
$ python3 auto_analyzer.py ./my_app core.1234
--- Crash Analysis Report ---
Executable: ./my_app
Core File: core.1234
Signal: SIGSEGV (Segmentation fault) at 0x0
Crashing IP (RIP): 0x55555555513d
--- Backtrace ---
#0 0x000055555555513d in main () at crashing_program.c:4
--- Registers ---
RAX: 0x0
RBX: 0x0
RCX: 0x7ffff7f9aa80
RDX: 0x7fffffffe528
RSI: 0x7fffffffe518
RDI: 0x1
RBP: 0x7fffffffe420
RSP: 0x7fffffffe420
RIP: 0x55555555513d
With JSON output:
$ python3 auto_analyzer.py --json ./my_app core.1234
{
"executable": "./my_app",
"core_file": "core.1234",
"signal": {
"name": "SIGSEGV",
"description": "Segmentation fault",
"address": "0x0"
},
"crash_ip": "0x55555555513d",
"backtrace": [
{
"frame": 0,
"address": "0x000055555555513d",
"function": "main",
"file": "crashing_program.c",
"line": 4
}
],
"registers": {
"rax": "0x0",
"rbx": "0x0",
...
}
}
Error handling:
$ python3 auto_analyzer.py ./missing_app core.1234
ERROR: Executable not found: ./missing_app
$ python3 auto_analyzer.py ./my_app core.wrong
ERROR: Core file not found: core.wrong
$ python3 auto_analyzer.py ./wrong_app core.1234
WARNING: Core file was not generated by this executable.
Expected: ./wrong_app
Actual: ./my_app
3.5 Real World Outcome
After completing this project, you will have:
- A reusable crash analysis tool that can be dropped into any project
- Foundation for a crash reporting pipeline (add HTTP upload, database storage)
- Skills transferable to commercial tools like Sentry, Datadog, or custom SRE tooling
- Understanding of how tools like `coredumpctl` work internally
How this connects to production systems:
Your Script Production Evolution
┌──────────────┐ ┌─────────────────────────────────────────┐
│auto_analyzer │ ──▶ │ Crash Pipeline │
│ .py │ │ │
└──────────────┘ │ ┌─────────┐ ┌──────┐ ┌────────────┐ │
│ │ Collect │──│Analyze│──│ Store/Alert│ │
│ └─────────┘ └──────┘ └────────────┘ │
│ │
│ • systemd-coredump captures crashes │
│ • Your script analyzes automatically │
│ • Results go to Elasticsearch/Splunk │
│ • PagerDuty alerts for new crash types │
└─────────────────────────────────────────┘
4. Solution Architecture
4.1 High-Level Design
There are two main approaches to implementing this project. You should understand both:
Approach A: GDB Batch File (Simpler)
┌────────────────────────────────────────────────────────────────┐
│ Approach A: Batch Commands │
├────────────────────────────────────────────────────────────────┤
│ │
│ auto_analyzer.py │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ 1. Create commands.gdb file: │ │
│ │ set pagination off │ │
│ │ bt │ │
│ │ info registers │ │
│ │ quit │ │
│ │ │ │
│ │ 2. Run: gdb --batch -x commands.gdb ./app core.1234 │ │
│ │ │ │
│ │ 3. Parse stdout text output │ │
│ │ │ │
│ │ 4. Generate report │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │
│ Pros: Simple, no Python inside GDB │
│ Cons: Fragile text parsing, limited flexibility │
│ │
└────────────────────────────────────────────────────────────────┘
Approach B: GDB Python API (More Robust)
┌────────────────────────────────────────────────────────────────┐
│ Approach B: Python API │
├────────────────────────────────────────────────────────────────┤
│ │
│ auto_analyzer.py (wrapper) │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ 1. Run: gdb --batch -x gdb_script.py ./app core.1234 │ │
│ │ │ │
│ │ 2. Capture stdout │ │
│ │ │ │
│ │ 3. Present report (already formatted by gdb_script.py) │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │
│ gdb_script.py (runs inside GDB) │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ import gdb │ │
│ │ │ │
│ │ # Use API for structured access │ │
│ │ rip = gdb.parse_and_eval("$rip") │ │
│ │ frame = gdb.selected_frame() │ │
│ │ bt = gdb.execute("bt", to_string=True) │ │
│ │ │ │
│ │ # Print formatted output │ │
│ │ print(f"RIP: {rip}") │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │
│ Pros: Structured data, reliable, extensible │
│ Cons: Two-file setup, requires understanding GDB Python │
│ │
└────────────────────────────────────────────────────────────────┘
Recommended approach: Start with Approach A to understand the basics, then refactor to Approach B for robustness.
4.2 Key Components
┌─────────────────────────────────────────────────────────────────┐
│ Component Architecture │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ CLI Interface (main) │ │
│ │ • Parse command-line arguments │ │
│ │ • Validate inputs │ │
│ │ • Handle --json flag │ │
│ │ • Print final report │ │
│ └────────────────────────────┬────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ GDB Invoker │ │
│ │ • Build GDB command line │ │
│ │ • Manage subprocess │ │
│ │ • Handle GDB errors │ │
│ │ • Return raw output │ │
│ └────────────────────────────┬────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ GDB Script (runs inside GDB) │ │
│ │ • Extract signal info │ │
│ │ • Generate backtrace │ │
│ │ • Read registers │ │
│ │ • Format output │ │
│ └────────────────────────────┬────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ Report Formatter │ │
│ │ • Parse GDB output │ │
│ │ • Build report structure │ │
│ │ • Output text or JSON │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
4.3 Data Structures
# Crash analysis data model
from dataclasses import dataclass
from typing import List, Optional, Dict
@dataclass
class StackFrame:
"""Represents a single frame in the call stack."""
frame_number: int # #0, #1, #2, ...
address: str # 0x55555555513d
function: Optional[str] # main (or None if no symbols)
file: Optional[str] # crashing_program.c (or None)
line: Optional[int] # 4 (or None)
@dataclass
class SignalInfo:
"""Information about the terminating signal."""
name: str # SIGSEGV
description: str # Segmentation fault
fault_address: Optional[str] # 0x0 (address that caused fault)
@dataclass
class RegisterState:
"""CPU register values at crash time."""
registers: Dict[str, str] # {"rax": "0x0", "rbx": "0x7f...", ...}
@dataclass
class CrashReport:
"""Complete crash analysis report."""
executable: str
core_file: str
signal: SignalInfo
crash_address: str # RIP value
backtrace: List[StackFrame]
registers: RegisterState
has_symbols: bool # True if debug symbols present
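One payoff of modeling the report as dataclasses is that the JSON output (FR-10) falls out almost for free via `dataclasses.asdict`. A sketch, repeating two of the classes above so the snippet is self-contained:

```python
# JSON output (FR-10) via dataclasses.asdict. StackFrame and SignalInfo are
# repeated from the data model above so this snippet runs on its own.
import json
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class StackFrame:
    frame_number: int
    address: str
    function: Optional[str]
    file: Optional[str]
    line: Optional[int]

@dataclass
class SignalInfo:
    name: str
    description: str
    fault_address: Optional[str]

report = {
    "signal": asdict(SignalInfo("SIGSEGV", "Segmentation fault", "0x0")),
    "backtrace": [asdict(StackFrame(0, "0x000055555555513d", "main",
                                    "crashing_program.c", 4))],
}
print(json.dumps(report, indent=2))
```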
4.4 Algorithm Overview
Main Analysis Flow:
┌─────────────────────────────────────────────────────────────────┐
│ Analysis Algorithm │
├─────────────────────────────────────────────────────────────────┤
│ │
│ 1. VALIDATE INPUTS │
│ ├─ Check executable exists and is ELF │
│ ├─ Check core file exists and is ELF core dump │
│ └─ Verify core matches executable (optional warning) │
│ │
│ 2. PREPARE GDB SESSION │
│ ├─ Create temporary command/script file │
│ └─ Build subprocess command line │
│ │
│ 3. INVOKE GDB │
│ ├─ Run: gdb --batch --quiet -x script executable core │
│ ├─ Capture stdout and stderr │
│ └─ Check return code │
│ │
│ 4. EXTRACT INFORMATION (inside GDB script) │
│ ├─ Signal: Parse "Program terminated with signal" message │
│ ├─ RIP: gdb.parse_and_eval("$rip") or "info registers" │
│ ├─ Backtrace: gdb.execute("bt") or "bt" command │
│ └─ Registers: gdb.parse_and_eval("$rax") or "info regs" │
│ │
│ 5. FORMAT REPORT │
│ ├─ Structure data into CrashReport │
│ ├─ Format as text (default) or JSON (--json flag) │
│ └─ Print to stdout │
│ │
│ 6. CLEANUP │
│ └─ Remove temporary files │
│ │
└─────────────────────────────────────────────────────────────────┘
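Step 1's ELF checks are cheap: both the executable and a core dump are ELF files, so reading the first four magic bytes (`b"\x7fELF"`) catches most wrong-file mistakes. A sketch, demonstrated against a throwaway file standing in for a real binary:

```python
# Step 1 of the algorithm: validate inputs with a cheap ELF magic check.
# A core dump is itself an ELF file (type ET_CORE), so the same check applies.
import os
import tempfile

def looks_like_elf(path):
    """Return True if the file exists and starts with the ELF magic bytes."""
    if not os.path.isfile(path):
        return False
    with open(path, "rb") as f:
        return f.read(4) == b"\x7fELF"

# Demo with a throwaway file standing in for a real executable/core:
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\x7fELF" + b"\x00" * 12)
    fake = f.name
print(looks_like_elf(fake), looks_like_elf("/no/such/file"))
os.unlink(fake)
```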
5. Implementation Guide
5.1 Development Environment Setup
Required packages:
# Ubuntu/Debian
sudo apt-get update
sudo apt-get install gdb python3 build-essential
# Verify GDB has Python support
gdb --batch --eval-command="python print('Python works!')"
# Should output: Python works!
# Check GDB version (need 8.0+)
gdb --version
Create a test crash program:
// crashing_program.c
#include <stdio.h>
void inner_function(int *ptr) {
*ptr = 42; // Crash here if ptr is NULL
}
void outer_function(int *ptr) {
printf("About to crash...\n");
inner_function(ptr);
}
int main(int argc, char **argv) {
int *ptr = NULL; // NULL pointer
outer_function(ptr);
return 0;
}
Generate a core dump:
# Compile with debug symbols
gcc -g -o crashing_program crashing_program.c
# Enable core dumps
ulimit -c unlimited
# On some systems, configure core_pattern
# Check current setting:
cat /proc/sys/kernel/core_pattern
# Run and crash
./crashing_program
# Output: Segmentation fault (core dumped)
# Find the core file (location depends on core_pattern)
ls -la core* /var/lib/apport/coredump/ /var/crash/
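Your analyzer can also interpret `core_pattern` itself rather than making the user hunt for the file. A small sketch (helper name is illustrative): a pattern starting with `|` means the kernel pipes the dump to a handler program instead of writing a file:

```python
# Interpret /proc/sys/kernel/core_pattern to hint where cores end up.
# describe_core_pattern is an illustrative helper name.
def describe_core_pattern(pattern):
    """Say where core dumps go for a given kernel.core_pattern value."""
    if pattern.startswith("|"):
        handler = pattern[1:].split()[0]    # kernel pipes the dump here
        return f"piped to handler: {handler}"
    return f"written to file pattern: {pattern}"

print(describe_core_pattern("|/usr/lib/systemd/systemd-coredump %P %u %g"))
print(describe_core_pattern("core.%p"))
```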
5.2 Project Structure
auto_crash_analyzer/
├── auto_analyzer.py # Main entry point (wrapper script)
├── gdb_analyzer.py # GDB Python script (runs inside GDB)
├── test_programs/
│ ├── null_deref.c # NULL pointer dereference
│ ├── stack_overflow.c # Stack buffer overflow
│ ├── div_by_zero.c # Division by zero (SIGFPE)
│ ├── abort_call.c # Explicit abort() (SIGABRT)
│ └── Makefile # Build all test programs
├── test_cores/ # Generated core dumps for testing
├── tests/
│ ├── test_analyzer.py # Unit tests
│ └── test_integration.py # Integration tests
└── README.md
5.3 The Core Question You’re Answering
“How can I programmatically extract meaningful information from a crash dump without manual intervention?”
This question underpins all crash automation. The answer involves:
- Understanding what data GDB can extract
- Knowing how to invoke GDB non-interactively
- Structuring output for both human and machine consumption
- Handling edge cases (no symbols, corrupted dumps, etc.)
5.4 Concepts You Must Understand First
Before starting implementation, verify you understand:
| Concept | Self-Check Question | Resource if Unsure |
|---|---|---|
| GDB basics | Can you run `bt`, `info registers`, and `p <var>` in GDB? | Project 2 of this series |
| Python subprocess | Can you capture stdout from a shell command in Python? | Python docs: `subprocess` module |
| ELF format | Can you use the `file` command to identify ELF executables and core dumps? | `man elf`, `man file` |
| Unix signals | Can you list 5 common signals and when they occur? | `man 7 signal` |
| Debug symbols | What's the difference between compiling with and without `-g`? | Project 2 of this series |
5.5 Questions to Guide Your Design
Input handling:
- How will you verify the executable is the correct one for the core dump?
- What should happen if the files don’t exist?
- Should you support absolute and relative paths?
GDB interaction:
- Will you use a command file, inline `-ex` commands, or a Python script?
- How will you handle GDB errors (e.g., "No stack" or "Cannot access memory")?
- Should your script work with both Python 2 and Python 3 in GDB?
Information extraction:
- What if there's no signal info (e.g., the core was generated by `gcore`)?
- How many stack frames should you show by default?
- Which registers are most important to include?
Output format:
- Should the text output be colorized?
- How will you handle very long backtraces?
- What metadata should the JSON output include?
5.6 Thinking Exercise
Before writing code, trace through this scenario manually:
Exercise: You have a core dump from a multi-threaded program. Thread 2 crashed with SIGSEGV. Thread 1 was waiting in select(). Thread 3 was in malloc().
- Draw a diagram showing what information GDB will show for each thread.
- List the GDB commands needed to examine each thread.
- Design your output format: How will you represent multiple threads?
- What should happen if the user’s core dump has 100 threads?
This exercise prepares you for the multi-threaded extension in a later project.
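As a starting point for the exercise, per-thread stacks come from standard GDB commands, which batch mode can run from a command file like this:

```
set pagination off
info threads
thread apply all bt
quit
```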
5.7 Hints in Layers
Hint 1 - Starting Point (Conceptual Direction):
Start with the simplest possible implementation: a command file with bt and info registers, invoked via subprocess. Get basic extraction working before adding Python API usage.
Hint 2 - Next Level (More Specific Guidance):
Create commands.gdb:
set pagination off
set print pretty on
bt
info registers
quit
Invoke with: subprocess.run(["gdb", "--batch", "-q", "-x", "commands.gdb", exe, core], capture_output=True)
Hint 3 - Technical Details (Approach/Pseudocode):
# Wrapper script structure
import subprocess
import sys
import tempfile

def main():
    exe, core = parse_args()
    validate_inputs(exe, core)
    # Create temp command file; keep it open so the name stays valid
    with tempfile.NamedTemporaryFile(mode='w', suffix='.gdb') as f:
        f.write("set pagination off\n")
        f.write("bt\n")
        f.write("info registers\n")
        f.write("quit\n")
        f.flush()
        result = subprocess.run(
            ["gdb", "--batch", "-q", "-x", f.name, exe, core],
            capture_output=True, text=True, timeout=30
        )
    report = parse_gdb_output(result.stdout, result.stderr)
    print_report(report)
Hint 4 - Tools/Debugging (Verification Methods):
- Test with a known-good core dump first
- Print raw GDB output before parsing to see exact format
- Use `--eval-command="show version"` to verify GDB is being invoked correctly
- Check `result.returncode`, but note that GDB may return 0 even if the core is corrupt
5.8 The Interview Questions They’ll Ask
- “How does GDB load a core dump?”
- Expected: GDB reads the ELF core file which contains memory segments, register values, and metadata. It uses the executable for symbol information.
- “What’s the difference between `gdb.execute()` and `gdb.parse_and_eval()`?”
  - Expected: `execute()` runs a command and returns its text output (with `to_string=True`). `parse_and_eval()` evaluates an expression and returns a typed `gdb.Value` object.
- “How would you handle a core dump from a stripped binary?”
  - Expected: The backtrace will show addresses instead of function names. You can still examine registers and memory. If you have the original debug symbols separately, you can load them with `symbol-file`.
- “How would you scale this to analyze 1000 crashes per hour?”
- Expected: Parallelize analysis, deduplicate by backtrace signature, cache symbol files, use a queue system for crash files, store results in a database.
- “What are the security implications of automated crash analysis?”
- Expected: Core dumps may contain sensitive data (passwords in memory, encryption keys). Analysis should happen in isolated environments. Results should be redacted before storage.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| GDB basics | “The Art of Debugging with GDB” - Matloff & Salzman | Ch. 1-3 |
| GDB scripting | “Debugging with GDB” - GNU Manual | Ch. 23 (Python) |
| Process memory | “Computer Systems: A Programmer’s Perspective” - Bryant & O’Hallaron | Ch. 9 (Virtual Memory) |
| Automation mindset | “Black Hat Python” - Justin Seitz | Ch. 1-2 |
| Crash reporting systems | “Site Reliability Engineering” - Google | Ch. 15 (Postmortems) |
5.10 Implementation Phases
Phase 1: Basic Extraction (Days 1-3)
Goals:
- Create test crash programs
- Invoke GDB via subprocess
- Capture backtrace output
Deliverable: Script that prints raw GDB output for any core dump.
Phase 2: Structured Parsing (Days 3-6)
Goals:
- Parse backtrace into stack frames
- Extract signal information
- Parse register values
Deliverable: Script that prints formatted report with sections.
Phase 3: GDB Python API Migration (Days 6-9)
Goals:
- Rewrite extraction using `gdb.execute()` and `gdb.parse_and_eval()`
- Improve reliability of data extraction
- Handle edge cases (no symbols, truncated stacks)
Deliverable: Robust analyzer using Python API.
Phase 4: Polish and Testing (Days 9-14)
Goals:
- Add JSON output option
- Add comprehensive error handling
- Test with multiple crash types
- Document usage
Deliverable: Production-ready analyzer script.
5.11 Key Implementation Decisions
Decision 1: Command file vs. Python API
| Factor | Command File | Python API |
|---|---|---|
| Simplicity | Easier to start | More code |
| Reliability | Fragile parsing | Structured data |
| Flexibility | Limited | High |
| Debugging | Harder | Easier |
Recommendation: Start with command file, migrate to Python API.
Decision 2: Single script vs. two scripts
| Factor | Single Script | Two Scripts |
|---|---|---|
| Deployment | One file | Two files |
| Complexity | Higher (embed GDB script) | Lower (separation) |
| Maintainability | Harder | Easier |
| Testing | Harder | Can test independently |
Recommendation: Two scripts (wrapper + GDB script).
Decision 3: Text parsing vs. structured output
For the command file approach, you must parse text. Key patterns:
# Backtrace line pattern
#0 0x000055555555513d in main () at crashing_program.c:4
^ ^ ^ ^ ^
| | | | +-- line number
| | | +-- file name
| | +-- function name
| +-- address
+-- frame number
# Register pattern
rax 0x0 0
^ ^ ^
| | +-- decimal value
| +-- hex value
+-- register name
# Signal pattern
Program terminated with signal SIGSEGV, Segmentation fault.
^ ^
| +-- description
+-- signal name
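The three patterns above translate into parsers like the following sketch. The function names line up with the unit tests in section 6, but treat the regexes as starting points, since GDB's exact formatting varies across versions:

```python
# Sketch parsers for the three text patterns above. GDB's exact output
# varies by version, so treat these regexes as starting points.
import re
from collections import namedtuple

Frame = namedtuple("Frame", "frame_number address function file line")
Signal = namedtuple("Signal", "name description")

FRAME_RE = re.compile(
    r'^#(\d+)\s+'                      # frame number
    r'(?:(0x[0-9a-fA-F]+)\s+in\s+)?'   # address (absent on some frames)
    r'(\S+)\s*\([^)]*\)'               # function name and argument list
    r'(?:\s+at\s+(\S+):(\d+))?'        # file:line (absent without symbols)
)

def parse_backtrace_line(line):
    m = FRAME_RE.match(line)
    if not m:
        return None
    num, addr, func, fname, lineno = m.groups()
    func = None if func == "??" else func       # stripped binary marker
    return Frame(int(num), addr, func, fname,
                 int(lineno) if lineno else None)

def parse_register_line(line):
    m = re.match(r'^(\w+)\s+(0x[0-9a-fA-F]+)\s+(.+)$', line)
    return (m.group(1), m.group(2)) if m else None

def parse_signal_line(line):
    m = re.search(r'Program terminated with signal (\w+), (.+?)\.', line)
    return Signal(m.group(1), m.group(2)) if m else None
```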
6. Testing Strategy
Unit Tests
# tests/test_analyzer.py
import unittest
from auto_analyzer import parse_backtrace_line, parse_register_line, parse_signal_line
class TestBacktraceParsing(unittest.TestCase):
def test_full_frame_with_symbols(self):
line = "#0 0x000055555555513d in main () at crashing_program.c:4"
frame = parse_backtrace_line(line)
self.assertEqual(frame.frame_number, 0)
self.assertEqual(frame.address, "0x000055555555513d")
self.assertEqual(frame.function, "main")
self.assertEqual(frame.file, "crashing_program.c")
self.assertEqual(frame.line, 4)
def test_frame_without_symbols(self):
line = "#0 0x000055555555513d in ?? ()"
frame = parse_backtrace_line(line)
self.assertEqual(frame.function, None)
self.assertEqual(frame.file, None)
def test_frame_with_args(self):
line = "#1 0x0000555555555160 in foo (x=42, y=0x7fff) at test.c:10"
frame = parse_backtrace_line(line)
self.assertEqual(frame.function, "foo")
class TestRegisterParsing(unittest.TestCase):
def test_standard_register(self):
line = "rax 0x0 0"
name, value = parse_register_line(line)
self.assertEqual(name, "rax")
self.assertEqual(value, "0x0")
def test_register_with_large_value(self):
line = "rsp 0x7fffffffe420 140737488348192"
name, value = parse_register_line(line)
self.assertEqual(name, "rsp")
self.assertEqual(value, "0x7fffffffe420")
class TestSignalParsing(unittest.TestCase):
def test_sigsegv(self):
line = "Program terminated with signal SIGSEGV, Segmentation fault."
signal = parse_signal_line(line)
self.assertEqual(signal.name, "SIGSEGV")
self.assertEqual(signal.description, "Segmentation fault")
def test_sigabrt(self):
line = "Program terminated with signal SIGABRT, Aborted."
signal = parse_signal_line(line)
self.assertEqual(signal.name, "SIGABRT")
Integration Tests
#!/bin/bash
# tests/test_integration.sh
set -e
SCRIPT_DIR=$(dirname "$0")
ANALYZER="$SCRIPT_DIR/../auto_analyzer.py"
TEST_PROGS="$SCRIPT_DIR/../test_programs"
TEST_CORES="$SCRIPT_DIR/../test_cores"
# Build test programs
make -C "$TEST_PROGS"
# Generate core dumps
ulimit -c unlimited
cd "$TEST_CORES"
for prog in null_deref stack_overflow div_by_zero abort_call; do
echo "Testing $prog..."
# Generate core (will crash)
"$TEST_PROGS/$prog" 2>/dev/null || true
# Find core file (name depends on system)
CORE=$(ls -t core* 2>/dev/null | head -1)
if [ -z "$CORE" ]; then
echo "ERROR: No core file generated for $prog"
exit 1
fi
# Run analyzer
OUTPUT=$(python3 "$ANALYZER" "$TEST_PROGS/$prog" "$CORE")
# Verify output contains expected sections
echo "$OUTPUT" | grep -q "Crash Analysis Report" || { echo "Missing report header"; exit 1; }
echo "$OUTPUT" | grep -q "Backtrace" || { echo "Missing backtrace"; exit 1; }
echo "$OUTPUT" | grep -q "Registers" || { echo "Missing registers"; exit 1; }
# Verify signal detection
case "$prog" in
null_deref) echo "$OUTPUT" | grep -q "SIGSEGV" ;;
div_by_zero) echo "$OUTPUT" | grep -q "SIGFPE" ;;
abort_call) echo "$OUTPUT" | grep -q "SIGABRT" ;;
esac
echo " PASSED"
rm -f "$CORE"
done
echo "All integration tests passed!"
Test with Different Scenarios
| Scenario | How to Create | What to Verify |
|---|---|---|
| Simple SIGSEGV | Dereference NULL | Signal detected, address is 0x0 |
| SIGFPE | Divide by zero | Signal is SIGFPE |
| SIGABRT | Call abort() |
Signal is SIGABRT |
| Deep stack | Recursive function | Many frames shown |
| No symbols | Strip executable | Shows ?? for functions |
| Multi-threaded | pthread crash | Thread info included |
| Corrupted core | Truncate core file | Graceful error message |
7. Common Pitfalls & Debugging
Pitfall 1: GDB Not Finding Python
Symptom:
$ gdb --batch -x script.py exe core
Python scripting is not supported in this copy of GDB.
Cause: GDB was compiled without Python support.
Fix:
# Check GDB Python support
gdb --batch --eval-command="python print('test')"
# If unsupported, install GDB with Python:
# Ubuntu/Debian
sudo apt-get install gdb
# From source
./configure --with-python
Pitfall 2: Core Pattern Redirects Core Dumps
Symptom: Core dump not appearing in current directory.
Cause: System’s core_pattern redirects to a different location.
Fix:
# Check current pattern
cat /proc/sys/kernel/core_pattern
# Common patterns and where to find cores:
# |/usr/share/apport/apport ... -> /var/crash/ (Ubuntu)
# |/usr/lib/systemd/systemd-coredump ... -> journalctl (systemd)
# core.%p -> ./core.<pid>
# For testing, set simple pattern (requires root):
sudo sysctl kernel.core_pattern=core.%p
# Or use coredumpctl on systemd systems:
coredumpctl list
coredumpctl dump <pid> > core.file
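The pattern's meaning follows a few fixed rules (a leading `|` pipes the core to a helper program; otherwise the value is a filename template), so a small helper can translate it into a hint about where to look. `describe_core_pattern` and its wording are this sketch's own invention:

```python
def describe_core_pattern(pattern: str) -> str:
    """Give a human-readable hint for a kernel.core_pattern value.

    Pass the contents of /proc/sys/kernel/core_pattern,
    stripped of the trailing newline.
    """
    if pattern.startswith("|"):
        # A pipe pattern hands the core to a userspace helper.
        handler = pattern[1:].split()[0]
        if "apport" in handler:
            return "piped to apport; look under /var/crash/"
        if "systemd-coredump" in handler:
            return "piped to systemd-coredump; use coredumpctl"
        return f"piped to helper program: {handler}"
    if "/" in pattern:
        return f"written to a path template: {pattern}"
    return f"written to the crashing process's cwd as: {pattern}"
```

Usage: `print(describe_core_pattern(open("/proc/sys/kernel/core_pattern").read().strip()))`.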
Pitfall 3: GDB Output Format Changes
Symptom: Parser works on one system, fails on another.
Cause: Different GDB versions format output differently.
Fix: Use Python API instead of text parsing, or be defensive:
# Fragile (exact format)
frame_match = re.match(r'^#(\d+)\s+0x([0-9a-f]+)\s+in\s+(\S+)\s+\(\)', line)
# More robust (flexible whitespace, optional parts)
frame_match = re.match(
r'^#(\d+)\s+' # Frame number
r'(?:0x)?([0-9a-f]+)\s+' # Address (optional 0x prefix)
r'in\s+(\S+)\s*' # Function name
r'(?:\([^)]*\))?\s*' # Arguments (optional)
r'(?:at\s+(\S+):(\d+))?', # File:line (optional)
line
)
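As a sanity check, the flexible pattern can be exercised against a couple of representative frame lines. These samples mimic common x86-64 GDB output; exact spacing varies by version:

```python
import re

# The flexible frame pattern from above, precompiled.
FRAME_RE = re.compile(
    r'^#(\d+)\s+'                 # Frame number
    r'(?:0x)?([0-9a-f]+)\s+'      # Address (optional 0x prefix)
    r'in\s+(\S+)\s*'              # Function name
    r'(?:\([^)]*\))?\s*'          # Arguments (optional)
    r'(?:at\s+(\S+):(\d+))?'      # File:line (optional)
)

def parse_frame(line):
    """Return (num, addr, func, file, line) groups, or None on no match."""
    m = FRAME_RE.match(line)
    return m.groups() if m else None
```

Frames without source info leave the file and line groups as `None`, which is exactly what you want for stripped binaries.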
Pitfall 4: subprocess Timeout
Symptom: Script hangs when analyzing large core dump.
Cause: GDB can take a long time for large or complex cores.
Fix:
try:
result = subprocess.run(
gdb_command,
capture_output=True,
text=True,
timeout=60 # 60 second timeout
)
except subprocess.TimeoutExpired:
print("ERROR: GDB analysis timed out. Core file may be too large.")
sys.exit(1)
Pitfall 5: Mismatched Executable and Core
Symptom: GDB shows wrong symbols or “warning: core file may not match”.
Cause: Core was generated by a different build of the executable.
Fix:
# Check before analysis
import os
import subprocess

def verify_executable_matches_core(exe_path, core_path):
    """Verify the core was generated by this executable."""
    exe_name = os.path.basename(exe_path)
    # Extract the mapped file paths from the core file's NT_FILE note
    result = subprocess.run(
        ["eu-readelf", "-n", core_path],
        capture_output=True, text=True
    )
    if result.returncode != 0:
        return True  # Can't verify, proceed anyway
    # Absence of the name is only a hint, so warn rather than fail hard
    if exe_name not in result.stdout:
        print(f"WARNING: {exe_name} not found in core's NT_FILE note")
    return True  # Default to proceeding
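A stricter check than name matching is to compare GNU build IDs. Whether a core's notes carry build IDs depends on your kernel and elfutils versions (`eu-unstrip -n --core=<core>` is an alternative source), so treat this as a sketch; the `Build ID:` line is what recent `eu-readelf -n` prints:

```python
import re
import subprocess

def extract_build_ids(notes_text):
    """Pull GNU build-id hex strings out of `eu-readelf -n` output."""
    return set(re.findall(r'Build ID:\s*([0-9a-f]+)', notes_text))

def builds_match(exe_path, core_path):
    """True if the executable's build ID appears among the core's build IDs.

    Errs on the side of proceeding when either side can't be read.
    """
    def notes(path):
        try:
            out = subprocess.run(["eu-readelf", "-n", path],
                                 capture_output=True, text=True)
        except FileNotFoundError:  # eu-readelf not installed
            return ""
        return out.stdout if out.returncode == 0 else ""

    exe_ids = extract_build_ids(notes(exe_path))
    core_ids = extract_build_ids(notes(core_path))
    if not exe_ids or not core_ids:
        return True  # can't verify, proceed anyway
    return bool(exe_ids & core_ids)
```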
Debugging Techniques
1. Print raw GDB output:
result = subprocess.run(gdb_command, capture_output=True, text=True)
print("=== STDOUT ===")
print(result.stdout)
print("=== STDERR ===")
print(result.stderr)
print("=== RETURN CODE ===")
print(result.returncode)
2. Interactive debugging:
# Run the same commands interactively to see what GDB shows
gdb ./exe core
(gdb) bt
(gdb) info registers
3. Test Python API in GDB:
gdb -q ./exe core
(gdb) python
>>> import gdb
>>> print(gdb.parse_and_eval("$rip"))
>>> frame = gdb.selected_frame()
>>> print(frame.name())
>>> end
8. Extensions & Challenges
Extension 1: JSON Output
Add --json flag for machine-readable output:
def format_as_json(report: CrashReport) -> str:
return json.dumps({
"executable": report.executable,
"core_file": report.core_file,
"signal": {
"name": report.signal.name,
"description": report.signal.description,
"address": report.signal.fault_address
},
"crash_ip": report.crash_address,
"backtrace": [
{
"frame": f.frame_number,
"address": f.address,
"function": f.function,
"file": f.file,
"line": f.line
}
for f in report.backtrace
],
"registers": report.registers.registers,
"has_symbols": report.has_symbols,
        "analyzed_at": datetime.now(timezone.utc).isoformat()  # utcnow() is deprecated
}, indent=2)
Extension 2: Crash Signature Generation
Generate a stable “fingerprint” for crash deduplication:
def generate_crash_signature(report: CrashReport) -> str:
"""Generate a stable signature for crash deduplication."""
    # Use up to 3 significant frames from the top 5, skipping internal "__" frames
significant_frames = []
for frame in report.backtrace[:5]:
if frame.function and not frame.function.startswith("__"):
significant_frames.append(f"{frame.function}:{frame.file}")
if len(significant_frames) >= 3:
break
# Include signal type
signature_parts = [report.signal.name] + significant_frames
# Hash for compact representation
signature_string = "|".join(signature_parts)
return hashlib.sha256(signature_string.encode()).hexdigest()[:16]
Extension 3: Batch Processing
Analyze multiple cores at once:
$ python3 auto_analyzer.py --batch /var/crash/*.core
Processing 15 core files...
[1/15] core.app1.1234 - SIGSEGV in main()
[2/15] core.app1.1235 - SIGSEGV in main() (duplicate of #1)
[3/15] core.app2.5678 - SIGABRT in abort()
...
Summary:
Total crashes: 15
Unique signatures: 3
Most common: SIGSEGV in main() (12 occurrences)
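The summary block at the end of that transcript can be produced with a `collections.Counter` over the signatures. The `(core_name, signature, description)` tuple shape here is an assumption of this sketch, not part of the analyzer above:

```python
from collections import Counter

def summarize_batch(results):
    """Build the batch-mode summary footer.

    `results` is a list of (core_name, signature, short_description)
    tuples, e.g. one per analyzed core file.
    """
    sig_counts = Counter(sig for _, sig, _ in results)
    descriptions = {sig: desc for _, sig, desc in results}
    top_sig, top_count = sig_counts.most_common(1)[0]
    return (
        f"Total crashes: {len(results)}\n"
        f"Unique signatures: {len(sig_counts)}\n"
        f"Most common: {descriptions[top_sig]} ({top_count} occurrences)"
    )
```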
Extension 4: Memory Analysis
Add memory examination to find what was being accessed:
# In GDB Python script
def analyze_crash_memory(fault_address):
    """Try to understand what memory was being accessed."""
    try:
        # Grab the process mappings so the report can show what was mapped
        maps = gdb.execute("info proc mappings", to_string=True)
        # Try to read memory around the fault address
        if fault_address != "0x0":
            nearby = gdb.execute(f"x/8wx {fault_address}", to_string=True)
            return maps + "\n" + nearby
    except gdb.error:
        pass
    return None
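The mappings check can also be made explicit: on Linux, `info proc mappings` prints start and end addresses in its first two columns, so a small parser (a sketch against that layout) can tell whether the fault address was mapped at all:

```python
def address_is_mapped(mappings_text, fault_address):
    """Check whether an address falls inside any region of
    `info proc mappings` output.

    The first two columns of each mapping line are taken as the
    start and end addresses; unparseable lines are skipped.
    """
    for line in mappings_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].startswith("0x"):
            try:
                start, end = int(parts[0], 16), int(parts[1], 16)
            except ValueError:
                continue
            if start <= fault_address < end:
                return True
    return False
```

An unmapped fault address strongly suggests a wild pointer or use-after-unmap rather than a plain NULL dereference.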
Extension 5: Source Context
Show source code around the crash:
# In GDB Python script
def get_source_context(frame, context_lines=3):
    """Get source code around the crashing line."""
    if frame.file and frame.line:
        start = max(1, frame.line - context_lines)  # avoid asking for line 0
        try:
            output = gdb.execute(
                f"list {frame.file}:{start},{frame.line + context_lines}",
                to_string=True
            )
            return output
        except gdb.error:
            return "Source not available"
    return None
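When GDB can't list source but you know the file and line from the backtrace, a plain-Python fallback can read the file directly. This helper and its `>` marker convention are this sketch's own invention:

```python
def read_source_context(path, line, context=3):
    """Read source lines around `line` directly from the file.

    Returns the surrounding lines with a '>' marker on the target
    line, or None if the file can't be read.
    """
    try:
        with open(path, "r", encoding="utf-8", errors="replace") as f:
            lines = f.readlines()
    except OSError:
        return None
    start = max(0, line - 1 - context)      # 0-based slice start
    end = min(len(lines), line + context)   # 0-based slice end (exclusive)
    out = []
    for i in range(start, end):
        marker = ">" if i == line - 1 else " "
        out.append(f"{marker} {i + 1}: {lines[i].rstrip()}")
    return "\n".join(out)
```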
9. Real-World Connections
How This Relates to Production Systems
systemd-coredump:
Your Script systemd-coredump
┌────────────────┐ ┌─────────────────────────────────┐
│ Takes core │ │ Intercepts ALL core dumps │
│ as argument │ │ via core_pattern │
├────────────────┤ ├─────────────────────────────────┤
│ Invokes GDB │ │ Compresses and stores in │
│ manually │ │ /var/lib/systemd/coredump/ │
├────────────────┤ ├─────────────────────────────────┤
│ Parses output │ │ Indexes by executable, PID, │
│ │ │ timestamp │
├────────────────┤ ├─────────────────────────────────┤
│ Prints report │ │ coredumpctl provides interface │
└────────────────┘ └─────────────────────────────────┘
Sentry/Crashlytics:
Your Script Sentry
┌────────────────┐ ┌─────────────────────────────────┐
│ Single core │ │ Millions of crashes per day │
│ local analysis │ │ from thousands of users │
├────────────────┤ ├─────────────────────────────────┤
│ GDB Python API │ │ Custom parsers for minidumps, │
│ │ │ symbolication servers │
├────────────────┤ ├─────────────────────────────────┤
│ Text/JSON │ │ Web UI with graphs, trends, │
│ output │ │ alerts, integrations │
├────────────────┤ ├─────────────────────────────────┤
│ Manual run │ │ SDK in app, automatic upload │
└────────────────┘ └─────────────────────────────────┘
Industry Use Cases
- Game Development: Every crash from QA testers is analyzed automatically. New crash types page engineers.
- Mobile Apps: Crashlytics/Bugsnag receive millions of crash reports, deduplicate, and show trends.
- Cloud Infrastructure: AWS/GCP/Azure automate analysis of their internal service crashes.
- Automotive: Safety-critical systems require automated post-crash analysis for regulatory compliance.
- Financial Services: Every trading system crash is analyzed to determine if it caused incorrect trades.
10. Resources
Books
| Book | Relevant Chapters |
|---|---|
| “The Art of Debugging with GDB” - Matloff & Salzman | Ch. 1-3 (GDB basics), Ch. 7 (Scripting) |
| “Debugging with GDB” - GNU Manual | Ch. 23 (Python Extensions) |
| “Black Hat Python” - Seitz | Ch. 1-2 (Automation mindset) |
| “The Linux Programming Interface” - Kerrisk | Ch. 22 (Signals) |
Related Tools
| Tool | Purpose |
|---|---|
| `coredumpctl` | systemd interface for core dumps |
| `eu-readelf` | Examine ELF files (including cores) |
| `gcore` | Generate core dump of a running process |
| `pstack` | Print stack of a running process |
| `minidump_stackwalk` | Google Breakpad minidump analyzer |
11. Self-Assessment Checklist
Functionality
- Script accepts executable and core file as arguments
- Script validates that both files exist
- Script extracts signal name and description
- Script extracts crash address (fault address)
- Script generates full backtrace
- Script extracts key register values
- Script handles cores without debug symbols
- Script outputs formatted report
Robustness
- Script handles missing files gracefully
- Script handles GDB errors
- Script has timeout protection
- Script works with SIGSEGV, SIGFPE, SIGABRT
- Script output is consistent across GDB versions
Code Quality
- Code is organized into functions
- Functions have docstrings
- Error messages are clear
- No hardcoded paths
Extensions (Optional)
- JSON output option implemented
- Crash signature generation implemented
- Batch processing implemented
12. Submission / Completion Criteria
You have successfully completed this project when:
- Basic Analysis Works:
  $ python3 auto_analyzer.py ./crashing_program core.1234
  # Produces report with signal, backtrace, and registers
- Error Handling Works:
  $ python3 auto_analyzer.py ./nonexistent core.1234
  ERROR: Executable not found: ./nonexistent
- Multiple Crash Types Work:
  - SIGSEGV (segmentation fault)
  - SIGFPE (floating point exception)
  - SIGABRT (abort)
- Works Without Symbols:
  $ strip ./crashing_program
  $ ./crashing_program   # generates core
  $ python3 auto_analyzer.py ./crashing_program core.xxx
  # Shows addresses instead of function names, but doesn't crash
- Tests Pass:
  $ python3 -m pytest tests/
  # All tests pass
Stretch Goals:
- JSON output option (`--json` flag)
- Analysis of multi-threaded core dumps
- Crash signature for deduplication
- Integration with a simple web interface
What’s Next?
After completing this project, you’re ready for:
- Project 5: Multi-threaded Mayhem - Apply your automation skills to complex concurrent crashes
- Project 7: The Minidump Parser - Build a parser for a different crash dump format
- Project 10: Building a Centralized Crash Reporter - Scale this to a full crash pipeline
The automation skills you’ve built here are the foundation of production crash analysis systems. Every technique you’ve learned—batch mode, subprocess management, output parsing—appears in real-world crash reporting infrastructure.