Project 4: The Automated Crash Detective

Build a Python script that automates initial crash dump triage, extracting backtraces, registers, and crash signals from core files to generate concise analysis reports.

Quick Reference

Attribute Value
Difficulty Intermediate
Time Estimate 1-2 weeks
Language Python
Prerequisites Project 2 (GDB Backtrace), Basic Python scripting
Key Topics GDB batch mode, GDB Python API, subprocess management, crash automation

1. Learning Objectives

By completing this project, you will:

  1. Understand GDB’s batch mode and how to run non-interactive debugging sessions
  2. Master the GDB Python API for programmatic access to debugging information
  3. Learn subprocess management for orchestrating external tools from Python
  4. Build robust text parsing to extract structured data from GDB output
  5. Design reusable automation scripts that handle multiple crash scenarios
  6. Develop practical SRE/DevOps skills for crash triage at scale

2. Theoretical Foundation

2.1 Core Concepts

GDB Batch Mode: Non-Interactive Debugging

GDB’s batch mode allows you to run debugging commands without human interaction. This is the foundation of automated crash analysis.

Interactive GDB Session                Batch Mode Execution
┌───────────────────────────┐         ┌───────────────────────────┐
│ $ gdb ./app core.1234     │         │ $ gdb --batch --quiet \   │
│ (gdb) bt                  │         │     --command=cmds.gdb \  │
│ #0  main() at app.c:10    │         │     ./app core.1234       │
│ (gdb) info registers      │         │                           │
│ rax  0x0 ...              │         │ #0  main() at app.c:10    │
│ (gdb) quit                │         │ rax  0x0 ...              │
└───────────────────────────┘         └───────────────────────────┘
        │                                       │
        ▼                                       ▼
  Human types commands               Script reads stdout output
  Human reads output                 Script parses and processes

Key batch mode flags:

  • --batch: Exit after processing commands (implies --quiet)
  • --quiet or -q: Suppress introductory and copyright messages
  • --command=FILE or -x FILE: Execute GDB commands from FILE
  • --eval-command=COMMAND or -ex COMMAND: Execute a single GDB command
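
For example, a one-off triage run can be driven entirely from the command line with -ex, using the same example executable and core file names as the rest of this guide:

$ gdb --batch --quiet -ex "bt" -ex "info registers" ./my_app core.1234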

GDB Python API: Structured Access to Debugging Data

GDB embeds a Python interpreter that provides programmatic access to debugging information. Unlike parsing text output, the Python API gives you structured data.

┌─────────────────────────────────────────────────────────────────┐
│                     GDB Architecture                             │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌──────────────┐    ┌─────────────────┐    ┌────────────────┐ │
│  │ Core Dump    │───▶│  GDB Core       │───▶│ Command Line   │ │
│  │ + Executable │    │  (symbol table, │    │ Interface      │ │
│  └──────────────┘    │   stack unwinder,│    └────────────────┘ │
│                      │   expression     │              │         │
│                      │   evaluator)     │              ▼         │
│                      └────────┬─────────┘    ┌────────────────┐ │
│                               │              │ Text Output    │ │
│                               │              └────────────────┘ │
│                               │                                  │
│                               ▼                                  │
│                      ┌─────────────────┐                        │
│                      │ Python API      │                        │
│                      │ (gdb module)    │                        │
│                      └────────┬────────┘                        │
│                               │                                  │
│                               ▼                                  │
│                      ┌─────────────────┐                        │
│                      │ Your Script     │                        │
│                      │ (analyzer.py)   │                        │
│                      └─────────────────┘                        │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Essential GDB Python API functions:

Function Purpose Example
gdb.execute(cmd) Run a GDB command, return output as string gdb.execute("bt")
gdb.parse_and_eval(expr) Evaluate expression, return gdb.Value gdb.parse_and_eval("$rip")
gdb.selected_frame() Get current stack frame frame = gdb.selected_frame()
gdb.selected_inferior() Get current inferior (debugged process) inf = gdb.selected_inferior()
gdb.newest_frame() Get the innermost (newest) frame top = gdb.newest_frame()
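
A minimal sketch of these calls in use, written as a script that GDB executes (for example via gdb --batch -x snippet.py ./my_app core.1234, where snippet.py is a placeholder name):

# snippet.py - executed by GDB's embedded Python interpreter
import gdb

frame = gdb.newest_frame()                     # innermost frame at the point of the crash
rip = gdb.parse_and_eval("$rip")               # instruction pointer as a typed gdb.Value
backtrace = gdb.execute("bt", to_string=True)  # same output as the bt command, as a string

print(f"Crashed in {frame.name()} at {rip}")
print(backtrace)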

Subprocess Management: Orchestrating External Tools

Python’s subprocess module allows your main script to invoke GDB as a child process and capture its output.

┌─────────────────────────────────────────────────────────────────┐
│                 Two-Process Architecture                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  Process 1: Your Python Script (auto_analyzer.py)               │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │                                                          │   │
│  │  executable = sys.argv[1]                               │   │
│  │  core_file = sys.argv[2]                                │   │
│  │                                                          │   │
│  │  result = subprocess.run(                                │   │
│  │      ["gdb", "--batch", "-x", "analyzer.py",            │   │
│  │       executable, core_file],                           │   │
│  │      capture_output=True, text=True                     │   │
│  │  )                                                       │   │
│  │                                                          │   │
│  │  report = parse_output(result.stdout)                   │   │
│  │                                                          │   │
│  └──────────────────────────┬──────────────────────────────┘   │
│                              │                                   │
│                              │ spawns                            │
│                              ▼                                   │
│  Process 2: GDB with Python Extension                           │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │                                                          │   │
│  │  GDB loads core dump and executable                     │   │
│  │  GDB runs analyzer.py inside its Python interpreter     │   │
│  │  analyzer.py uses gdb.execute() and gdb.parse_and_eval()│   │
│  │  Output goes to stdout → captured by Process 1          │   │
│  │                                                          │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
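
A minimal runnable sketch of Process 1 using inline -ex commands instead of a separate script file; a real wrapper would parse result.stdout into a report rather than printing it verbatim:

# wrapper sketch: invoke GDB in batch mode and capture its output
import subprocess
import sys

def main():
    executable, core_file = sys.argv[1], sys.argv[2]

    result = subprocess.run(
        ["gdb", "--batch", "--quiet",
         "-ex", "bt", "-ex", "info registers",
         executable, core_file],
        capture_output=True, text=True, timeout=60,
    )

    print(result.stdout)  # raw GDB output for now

if __name__ == "__main__":
    main()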

2.2 Why This Matters

The scale problem:

  • A single server might generate 1-5 crashes per day during development
  • A fleet of 1000 servers might generate 100+ crashes per day
  • A mobile app with 1 million users might generate 10,000+ crash reports per day

Manual analysis does not scale. Every SRE team, every crash reporting service (Sentry, Crashlytics, Breakpad), and every serious debugging workflow relies on automation.

What automation enables:

  1. Immediate triage: Categorize crashes by type within seconds
  2. Deduplication: Group identical crashes to focus engineering effort
  3. Alerting: Notify on-call engineers when new crash types appear
  4. Trending: Track crash rates over time to detect regressions
  5. Integration: Connect crash data to CI/CD, ticketing, and monitoring systems

2.3 Historical Context

Before automation (pre-2000s):

  • Engineers would manually run gdb on each core dump
  • No systematic tracking of which crashes had been analyzed
  • “I’ll look at that later” often meant “never”
  • Major crashes slipped through the cracks

The evolution of crash automation:

  1. Shell scripts (1990s): Simple gdb --batch wrappers
  2. GDB command files (2000s): Reusable .gdb scripts
  3. GDB Python API (2009): Structured programmatic access
  4. Crash reporting services (2010s): Sentry, Crashlytics, Raygun
  5. Modern pipelines (2020s): Integration with observability platforms

Why the GDB Python API was a game-changer (GDB 7.0, 2009):

  • Before: Parse text output with fragile regex
  • After: Access typed values, iterate frames, query symbols programmatically
  • Made complex automation reliable and maintainable

2.4 Common Misconceptions

Misconception 1: “Batch mode means no interactivity, so it’s limited”

Reality: Batch mode gives you the SAME capabilities as interactive mode. You can:

  • Set breakpoints, watchpoints, and catchpoints
  • Evaluate any expression
  • Examine any memory
  • Walk the stack

The only difference is input comes from a script instead of a keyboard.

Misconception 2: “The GDB Python API is just for writing GDB extensions”

Reality: The API is equally useful for:

  • One-off analysis scripts
  • CI/CD integration
  • Automated crash reports
  • Custom debugging tools

You don’t need to modify GDB or create plugins.

Misconception 3: “Parsing GDB’s text output is good enough”

Reality: Text parsing is fragile because:

  • Output format changes between GDB versions
  • Localization can change output language
  • Edge cases produce unexpected formatting

The Python API provides stable, typed access to the same data.

Misconception 4: “This is only useful for C/C++ crashes”

Reality: GDB (and your automation) can analyze:

  • C and C++ programs
  • Rust programs (with debug symbols)
  • Go programs (with limitations)
  • Any language that produces ELF executables with DWARF debug info

3. Project Specification

3.1 What You Will Build

A Python script (auto_analyzer.py) that:

  1. Takes an executable path and a core dump file as arguments
  2. Programmatically invokes GDB to load the crash
  3. Extracts key crash information (signal, backtrace, registers)
  4. Produces a formatted summary report
┌─────────────────────────────────────────────────────────────────┐
│                    auto_analyzer.py                              │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  INPUT:                                                          │
│  ┌─────────────┐    ┌─────────────┐                            │
│  │ Executable  │    │ Core Dump   │                            │
│  │ (./my_app)  │    │ (core.1234) │                            │
│  └──────┬──────┘    └──────┬──────┘                            │
│         │                  │                                     │
│         └────────┬─────────┘                                    │
│                  ▼                                               │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │                  GDB + Python Script                     │   │
│  │                                                          │   │
│  │  1. Load core dump                                       │   │
│  │  2. Extract signal information                          │   │
│  │  3. Get backtrace                                        │   │
│  │  4. Read register values                                 │   │
│  │  5. Identify crash location                             │   │
│  │                                                          │   │
│  └──────────────────────────┬──────────────────────────────┘   │
│                              │                                   │
│                              ▼                                   │
│  OUTPUT:                                                         │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │                 Crash Analysis Report                    │   │
│  │                                                          │   │
│  │  Executable: ./my_app                                   │   │
│  │  Core File:  core.1234                                  │   │
│  │  Signal:     SIGSEGV (Segmentation fault)               │   │
│  │  Crashing IP: 0x55555555513d                            │   │
│  │                                                          │   │
│  │  --- Backtrace ---                                      │   │
│  │  #0  main () at crashing_program.c:4                    │   │
│  │                                                          │   │
│  │  --- Registers ---                                      │   │
│  │  RAX: 0x0                                               │   │
│  │  RBX: 0x0                                               │   │
│  │  ...                                                    │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

3.2 Functional Requirements

ID Requirement Priority
FR-1 Accept executable and core file paths as command-line arguments Must Have
FR-2 Validate that both files exist and are readable Must Have
FR-3 Extract the signal that caused the crash (SIGSEGV, SIGABRT, etc.) Must Have
FR-4 Extract the crash address (if applicable) Must Have
FR-5 Extract the full backtrace Must Have
FR-6 Extract key register values (RIP, RSP, RAX, RBX, etc.) Must Have
FR-7 Handle core dumps without debug symbols gracefully Should Have
FR-8 Output a human-readable report to stdout Must Have
FR-9 Support different crash types (SIGSEGV, SIGFPE, SIGABRT, SIGBUS) Should Have
FR-10 Provide JSON output option for machine processing Nice to Have

3.3 Non-Functional Requirements

ID Requirement Metric
NFR-1 Analysis completes quickly < 5 seconds for typical core dump
NFR-2 Works with GDB 8.0+ Tested on Ubuntu 20.04, 22.04
NFR-3 Error messages are clear and actionable User can resolve issues without docs
NFR-4 Script is portable Works on any Linux distribution
NFR-5 No external Python dependencies (stdlib only) Easy deployment

3.4 Example Usage / Output

Basic usage:

$ python3 auto_analyzer.py ./my_app core.1234

--- Crash Analysis Report ---
Executable: ./my_app
Core File:  core.1234
Signal:     SIGSEGV (Segmentation fault) at 0x0
Crashing IP (RIP): 0x55555555513d

--- Backtrace ---
#0  0x000055555555513d in main () at crashing_program.c:4

--- Registers ---
RAX: 0x0
RBX: 0x0
RCX: 0x7ffff7f9aa80
RDX: 0x7fffffffe528
RSI: 0x7fffffffe518
RDI: 0x1
RBP: 0x7fffffffe420
RSP: 0x7fffffffe420
RIP: 0x55555555513d

With JSON output:

$ python3 auto_analyzer.py --json ./my_app core.1234
{
  "executable": "./my_app",
  "core_file": "core.1234",
  "signal": {
    "name": "SIGSEGV",
    "description": "Segmentation fault",
    "address": "0x0"
  },
  "crash_ip": "0x55555555513d",
  "backtrace": [
    {
      "frame": 0,
      "address": "0x000055555555513d",
      "function": "main",
      "file": "crashing_program.c",
      "line": 4
    }
  ],
  "registers": {
    "rax": "0x0",
    "rbx": "0x0",
    ...
  }
}

Error handling:

$ python3 auto_analyzer.py ./missing_app core.1234
ERROR: Executable not found: ./missing_app

$ python3 auto_analyzer.py ./my_app core.wrong
ERROR: Core file not found: core.wrong

$ python3 auto_analyzer.py ./wrong_app core.1234
WARNING: Core file was not generated by this executable.
         Provided executable:   ./wrong_app
         Core was generated by: ./my_app

3.5 Real World Outcome

After completing this project, you will have:

  1. A reusable crash analysis tool that can be dropped into any project
  2. Foundation for a crash reporting pipeline (add HTTP upload, database storage)
  3. Skills transferable to commercial tools like Sentry, Datadog, or custom SRE tooling
  4. Understanding of how tools like coredumpctl work internally

How this connects to production systems:

Your Script              Production Evolution
┌──────────────┐        ┌─────────────────────────────────────────┐
│auto_analyzer │   ──▶  │ Crash Pipeline                          │
│   .py        │        │                                         │
└──────────────┘        │ ┌─────────┐  ┌──────┐  ┌────────────┐  │
                        │ │ Collect │──│Analyze│──│ Store/Alert│  │
                        │ └─────────┘  └──────┘  └────────────┘  │
                        │                                         │
                        │ • systemd-coredump captures crashes    │
                        │ • Your script analyzes automatically   │
                        │ • Results go to Elasticsearch/Splunk   │
                        │ • PagerDuty alerts for new crash types │
                        └─────────────────────────────────────────┘

4. Solution Architecture

4.1 High-Level Design

There are two main approaches to implementing this project. You should understand both:

Approach A: GDB Batch File (Simpler)

┌────────────────────────────────────────────────────────────────┐
│                    Approach A: Batch Commands                   │
├────────────────────────────────────────────────────────────────┤
│                                                                 │
│  auto_analyzer.py                                               │
│  ┌───────────────────────────────────────────────────────────┐ │
│  │ 1. Create commands.gdb file:                              │ │
│  │    set pagination off                                     │ │
│  │    bt                                                      │ │
│  │    info registers                                         │ │
│  │    quit                                                    │ │
│  │                                                            │ │
│  │ 2. Run: gdb --batch -x commands.gdb ./app core.1234       │ │
│  │                                                            │ │
│  │ 3. Parse stdout text output                               │ │
│  │                                                            │ │
│  │ 4. Generate report                                         │ │
│  └───────────────────────────────────────────────────────────┘ │
│                                                                 │
│  Pros: Simple, no Python inside GDB                            │
│  Cons: Fragile text parsing, limited flexibility              │
│                                                                 │
└────────────────────────────────────────────────────────────────┘

Approach B: GDB Python API (More Robust)

┌────────────────────────────────────────────────────────────────┐
│                    Approach B: Python API                       │
├────────────────────────────────────────────────────────────────┤
│                                                                 │
│  auto_analyzer.py (wrapper)                                     │
│  ┌───────────────────────────────────────────────────────────┐ │
│  │ 1. Run: gdb --batch -x gdb_script.py ./app core.1234      │ │
│  │                                                            │ │
│  │ 2. Capture stdout                                          │ │
│  │                                                            │ │
│  │ 3. Present report (already formatted by gdb_script.py)    │ │
│  └───────────────────────────────────────────────────────────┘ │
│                                                                 │
│  gdb_script.py (runs inside GDB)                                │
│  ┌───────────────────────────────────────────────────────────┐ │
│  │ import gdb                                                 │ │
│  │                                                            │ │
│  │ # Use API for structured access                           │ │
│  │ rip = gdb.parse_and_eval("$rip")                          │ │
│  │ frame = gdb.selected_frame()                               │ │
│  │ bt = gdb.execute("bt", to_string=True)                    │ │
│  │                                                            │ │
│  │ # Print formatted output                                   │ │
│  │ print(f"RIP: {rip}")                                       │ │
│  └───────────────────────────────────────────────────────────┘ │
│                                                                 │
│  Pros: Structured data, reliable, extensible                   │
│  Cons: Two-file setup, requires understanding GDB Python       │
│                                                                 │
└────────────────────────────────────────────────────────────────┘

Recommended approach: Start with Approach A to understand the basics, then refactor to Approach B for robustness.

4.2 Key Components

┌─────────────────────────────────────────────────────────────────┐
│                    Component Architecture                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │              CLI Interface (main)                        │   │
│  │  • Parse command-line arguments                          │   │
│  │  • Validate inputs                                       │   │
│  │  • Handle --json flag                                    │   │
│  │  • Print final report                                    │   │
│  └────────────────────────────┬────────────────────────────┘   │
│                               │                                  │
│                               ▼                                  │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │              GDB Invoker                                  │   │
│  │  • Build GDB command line                                │   │
│  │  • Manage subprocess                                     │   │
│  │  • Handle GDB errors                                     │   │
│  │  • Return raw output                                     │   │
│  └────────────────────────────┬────────────────────────────┘   │
│                               │                                  │
│                               ▼                                  │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │              GDB Script (runs inside GDB)                │   │
│  │  • Extract signal info                                   │   │
│  │  • Generate backtrace                                    │   │
│  │  • Read registers                                        │   │
│  │  • Format output                                         │   │
│  └────────────────────────────┬────────────────────────────┘   │
│                               │                                  │
│                               ▼                                  │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │              Report Formatter                             │   │
│  │  • Parse GDB output                                      │   │
│  │  • Build report structure                                │   │
│  │  • Output text or JSON                                   │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

4.3 Data Structures

# Crash analysis data model

from dataclasses import dataclass
from typing import List, Optional, Dict

@dataclass
class StackFrame:
    """Represents a single frame in the call stack."""
    frame_number: int           # #0, #1, #2, ...
    address: str               # 0x55555555513d
    function: Optional[str]    # main (or None if no symbols)
    file: Optional[str]        # crashing_program.c (or None)
    line: Optional[int]        # 4 (or None)

@dataclass
class SignalInfo:
    """Information about the terminating signal."""
    name: str                  # SIGSEGV
    description: str           # Segmentation fault
    fault_address: Optional[str]  # 0x0 (address that caused fault)

@dataclass
class RegisterState:
    """CPU register values at crash time."""
    registers: Dict[str, str]  # {"rax": "0x0", "rbx": "0x7f...", ...}

@dataclass
class CrashReport:
    """Complete crash analysis report."""
    executable: str
    core_file: str
    signal: SignalInfo
    crash_address: str         # RIP value
    backtrace: List[StackFrame]
    registers: RegisterState
    has_symbols: bool          # True if debug symbols present

4.4 Algorithm Overview

Main Analysis Flow:

┌─────────────────────────────────────────────────────────────────┐
│                    Analysis Algorithm                            │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  1. VALIDATE INPUTS                                              │
│     ├─ Check executable exists and is ELF                       │
│     ├─ Check core file exists and is ELF core dump              │
│     └─ Verify core matches executable (optional warning)        │
│                                                                  │
│  2. PREPARE GDB SESSION                                          │
│     ├─ Create temporary command/script file                     │
│     └─ Build subprocess command line                            │
│                                                                  │
│  3. INVOKE GDB                                                   │
│     ├─ Run: gdb --batch --quiet -x script executable core       │
│     ├─ Capture stdout and stderr                                │
│     └─ Check return code                                        │
│                                                                  │
│  4. EXTRACT INFORMATION (inside GDB script)                      │
│     ├─ Signal: Parse "Program terminated with signal" message   │
│     ├─ RIP: gdb.parse_and_eval("$rip") or "info registers"     │
│     ├─ Backtrace: gdb.execute("bt") or "bt" command            │
│     └─ Registers: gdb.parse_and_eval("$rax") or "info regs"    │
│                                                                  │
│  5. FORMAT REPORT                                                │
│     ├─ Structure data into CrashReport                          │
│     ├─ Format as text (default) or JSON (--json flag)          │
│     └─ Print to stdout                                          │
│                                                                  │
│  6. CLEANUP                                                      │
│     └─ Remove temporary files                                   │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
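
A sketch of step 4 as it could look inside the GDB-side script for Approach B. The "Program terminated with signal ..." line is printed by GDB itself while loading the core, so the wrapper can read it from captured stdout; the script below only emits the backtrace and a fixed register list:

# gdb_analyzer.py sketch - run via: gdb --batch -q -x gdb_analyzer.py ./my_app core.1234
import gdb

REGISTERS = ["rip", "rsp", "rbp", "rax", "rbx", "rcx", "rdx", "rsi", "rdi"]

print("--- Backtrace ---")
print(gdb.execute("bt", to_string=True), end="")

print("--- Registers ---")
for reg in REGISTERS:
    try:
        value = gdb.parse_and_eval(f"${reg}")  # typed gdb.Value; str() gives a printable form
        print(f"{reg.upper()}: {value}")
    except gdb.error:
        print(f"{reg.upper()}: <unavailable>")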

5. Implementation Guide

5.1 Development Environment Setup

Required packages:

# Ubuntu/Debian
sudo apt-get update
sudo apt-get install gdb python3 build-essential

# Verify GDB has Python support
gdb --batch --eval-command="python print('Python works!')"
# Should output: Python works!

# Check GDB version (need 8.0+)
gdb --version

Create a test crash program:

// crashing_program.c
#include <stdio.h>

void inner_function(int *ptr) {
    *ptr = 42;  // Crash here if ptr is NULL
}

void outer_function(int *ptr) {
    printf("About to crash...\n");
    inner_function(ptr);
}

int main(int argc, char **argv) {
    int *ptr = NULL;  // NULL pointer
    outer_function(ptr);
    return 0;
}

Generate a core dump:

# Compile with debug symbols
gcc -g -o crashing_program crashing_program.c

# Enable core dumps
ulimit -c unlimited

# On some systems, configure core_pattern
# Check current setting:
cat /proc/sys/kernel/core_pattern

# Run and crash
./crashing_program
# Output: Segmentation fault (core dumped)

# Find the core file (location depends on core_pattern)
ls -la core* /var/lib/apport/coredump/ /var/crash/

5.2 Project Structure

auto_crash_analyzer/
├── auto_analyzer.py        # Main entry point (wrapper script)
├── gdb_analyzer.py         # GDB Python script (runs inside GDB)
├── test_programs/
│   ├── null_deref.c        # NULL pointer dereference
│   ├── stack_overflow.c    # Stack buffer overflow
│   ├── div_by_zero.c       # Division by zero (SIGFPE)
│   ├── abort_call.c        # Explicit abort() (SIGABRT)
│   └── Makefile            # Build all test programs
├── test_cores/             # Generated core dumps for testing
├── tests/
│   ├── test_analyzer.py    # Unit tests
│   └── test_integration.sh # Integration tests (shell)
└── README.md

5.3 The Core Question You’re Answering

“How can I programmatically extract meaningful information from a crash dump without manual intervention?”

This question underpins all crash automation. The answer involves:

  1. Understanding what data GDB can extract
  2. Knowing how to invoke GDB non-interactively
  3. Structuring output for both human and machine consumption
  4. Handling edge cases (no symbols, corrupted dumps, etc.)

5.4 Concepts You Must Understand First

Before starting implementation, verify you understand:

Concept Self-Check Question Resource if Unsure
GDB basics Can you run bt, info registers, and p <var> in GDB? Project 2 of this series
Python subprocess Can you capture stdout from a shell command in Python? Python docs: subprocess module
ELF format Can you use file command to identify ELF executables and core dumps? man elf, man file
Unix signals Can you list 5 common signals and when they occur? man 7 signal
Debug symbols What’s the difference between compiling with and without -g? Project 2 of this series

5.5 Questions to Guide Your Design

Input handling:

  • How will you verify the executable is the correct one for the core dump?
  • What should happen if the files don’t exist?
  • Should you support absolute and relative paths?

GDB interaction:

  • Will you use a command file, inline -ex commands, or a Python script?
  • How will you handle GDB errors (e.g., “No stack” or “Cannot access memory”)?
  • Should your script work with both Python 2 and Python 3 in GDB?

Information extraction:

  • What if there’s no signal info (e.g., the core was generated by gcore)?
  • How many stack frames should you show by default?
  • Which registers are most important to include?

Output format:

  • Should the text output be colorized?
  • How will you handle very long backtraces?
  • What metadata should the JSON output include?

5.6 Thinking Exercise

Before writing code, trace through this scenario manually:

Exercise: You have a core dump from a multi-threaded program. Thread 2 crashed with SIGSEGV. Thread 1 was waiting in select(). Thread 3 was in malloc().

  1. Draw a diagram showing what information GDB will show for each thread.
  2. List the GDB commands needed to examine each thread.
  3. Design your output format: How will you represent multiple threads?
  4. What should happen if the user’s core dump has 100 threads?

This exercise prepares you for the multi-threaded extension in a later project.

5.7 Hints in Layers

Hint 1 - Starting Point (Conceptual Direction): Start with the simplest possible implementation: a command file with bt and info registers, invoked via subprocess. Get basic extraction working before adding Python API usage.

Hint 2 - Next Level (More Specific Guidance): Create commands.gdb:

set pagination off
set print pretty on
bt
info registers
quit

Invoke with: subprocess.run(["gdb", "--batch", "-q", "-x", "commands.gdb", exe, core], capture_output=True)

Hint 3 - Technical Details (Approach/Pseudocode):

# Wrapper script structure
def main():
    exe, core = parse_args()
    validate_inputs(exe, core)

    # Create temp command file
    with tempfile.NamedTemporaryFile(mode='w', suffix='.gdb') as f:
        f.write("set pagination off\n")
        f.write("bt\n")
        f.write("info registers\n")
        f.write("quit\n")
        f.flush()

        result = subprocess.run(
            ["gdb", "--batch", "-q", "-x", f.name, exe, core],
            capture_output=True, text=True, timeout=30
        )

    report = parse_gdb_output(result.stdout, result.stderr)
    print_report(report)

Hint 4 - Tools/Debugging (Verification Methods):

  • Test with a known-good core dump first
  • Print raw GDB output before parsing to see exact format
  • Use --eval-command="show version" to verify GDB is being invoked correctly
  • Check result.returncode - GDB returns 0 even if core is corrupt

5.8 The Interview Questions They’ll Ask

  1. “How does GDB load a core dump?”
    • Expected: GDB reads the ELF core file which contains memory segments, register values, and metadata. It uses the executable for symbol information.
  2. “What’s the difference between gdb.execute() and gdb.parse_and_eval()?”
    • Expected: execute() runs a command and returns text output. parse_and_eval() evaluates an expression and returns a typed gdb.Value object.
  3. “How would you handle a core dump from a stripped binary?”
    • Expected: The backtrace will show addresses instead of function names. You can still examine registers and memory. If you have the original debug symbols separately, you can load them with symbol-file.
  4. “How would you scale this to analyze 1000 crashes per hour?”
    • Expected: Parallelize analysis, deduplicate by backtrace signature, cache symbol files, use a queue system for crash files, store results in a database.
  5. “What are the security implications of automated crash analysis?”
    • Expected: Core dumps may contain sensitive data (passwords in memory, encryption keys). Analysis should happen in isolated environments. Results should be redacted before storage.

5.9 Books That Will Help

Topic Book Chapter
GDB basics “The Art of Debugging with GDB” - Matloff & Salzman Ch. 1-3
GDB scripting “Debugging with GDB” - GNU Manual Ch. 23 (Python)
Process memory “Computer Systems: A Programmer’s Perspective” - Bryant & O’Hallaron Ch. 9 (Virtual Memory)
Automation mindset “Black Hat Python” - Justin Seitz Ch. 1-2
Crash reporting systems “Site Reliability Engineering” - Google Ch. 15 (Postmortems)

5.10 Implementation Phases

Phase 1: Basic Extraction (Days 1-3)

Goals:

  • Create test crash programs
  • Invoke GDB via subprocess
  • Capture backtrace output

Deliverable: Script that prints raw GDB output for any core dump.

Phase 2: Structured Parsing (Days 3-6)

Goals:

  • Parse backtrace into stack frames
  • Extract signal information
  • Parse register values

Deliverable: Script that prints formatted report with sections.

Phase 3: GDB Python API Migration (Days 6-9)

Goals:

  • Rewrite extraction using gdb.execute() and gdb.parse_and_eval()
  • Improve reliability of data extraction
  • Handle edge cases (no symbols, truncated stacks)

Deliverable: Robust analyzer using Python API.

Phase 4: Polish and Testing (Days 9-14)

Goals:

  • Add JSON output option
  • Add comprehensive error handling
  • Test with multiple crash types
  • Document usage

Deliverable: Production-ready analyzer script.

5.11 Key Implementation Decisions

Decision 1: Command file vs. Python API

Factor Command File Python API
Simplicity Easier to start More code
Reliability Fragile parsing Structured data
Flexibility Limited High
Debugging Harder Easier

Recommendation: Start with command file, migrate to Python API.

Decision 2: Single script vs. two scripts

Factor Single Script Two Scripts
Deployment One file Two files
Complexity Higher (embed GDB script) Lower (separation)
Maintainability Harder Easier
Testing Harder Can test independently

Recommendation: Two scripts (wrapper + GDB script).

Decision 3: Text parsing vs. structured output

For the command file approach, you must parse text. Key patterns:

# Backtrace line pattern
#0  0x000055555555513d in main () at crashing_program.c:4
^   ^                     ^          ^                   ^
|   |                     |          |                   +-- line number
|   |                     |          +-- file name
|   |                     +-- function name
|   +-- address
+-- frame number

# Register pattern
rax            0x0                      0
^              ^                        ^
|              |                        +-- decimal value
|              +-- hex value
+-- register name

# Signal pattern
Program terminated with signal SIGSEGV, Segmentation fault.
                               ^        ^
                               |        +-- description
                               +-- signal name
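
A sketch of parsers for these patterns, assuming the StackFrame and SignalInfo dataclasses from Section 4.3 are defined in the same module; the regular expressions cover the common cases shown above, not every format GDB can emit:

# Assumes StackFrame and SignalInfo (Section 4.3) are defined in this module.
import re
from typing import Optional, Tuple

BACKTRACE_RE = re.compile(
    r'^#(?P<num>\d+)\s+'                          # frame number
    r'(?:(?P<addr>0x[0-9a-f]+)\s+in\s+)?'         # address (GDB omits it for some frames)
    r'(?P<func>[^\s(]+)\s*\([^)]*\)'              # function name and argument list
    r'(?:\s+at\s+(?P<file>\S+):(?P<line>\d+))?'   # file:line, only with debug symbols
)

def parse_backtrace_line(line: str) -> Optional[StackFrame]:
    m = BACKTRACE_RE.match(line.strip())
    if not m:
        return None
    func = m.group('func')
    return StackFrame(
        frame_number=int(m.group('num')),
        address=m.group('addr') or "",
        function=None if func == "??" else func,
        file=m.group('file'),
        line=int(m.group('line')) if m.group('line') else None,
    )

def parse_register_line(line: str) -> Optional[Tuple[str, str]]:
    # "rax            0x0                      0" -> ("rax", "0x0")
    m = re.match(r'^(\w+)\s+(0x[0-9a-f]+)\s+', line)
    return (m.group(1), m.group(2)) if m else None

def parse_signal_line(line: str) -> Optional[SignalInfo]:
    # "Program terminated with signal SIGSEGV, Segmentation fault."
    m = re.search(r'terminated with signal (\w+), (.+?)\.?$', line)
    if not m:
        return None
    return SignalInfo(name=m.group(1), description=m.group(2), fault_address=None)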

6. Testing Strategy

Unit Tests

# tests/test_analyzer.py

import unittest
from auto_analyzer import parse_backtrace_line, parse_register_line, parse_signal_line

class TestBacktraceParsing(unittest.TestCase):
    def test_full_frame_with_symbols(self):
        line = "#0  0x000055555555513d in main () at crashing_program.c:4"
        frame = parse_backtrace_line(line)
        self.assertEqual(frame.frame_number, 0)
        self.assertEqual(frame.address, "0x000055555555513d")
        self.assertEqual(frame.function, "main")
        self.assertEqual(frame.file, "crashing_program.c")
        self.assertEqual(frame.line, 4)

    def test_frame_without_symbols(self):
        line = "#0  0x000055555555513d in ?? ()"
        frame = parse_backtrace_line(line)
        self.assertEqual(frame.function, None)
        self.assertEqual(frame.file, None)

    def test_frame_with_args(self):
        line = "#1  0x0000555555555160 in foo (x=42, y=0x7fff) at test.c:10"
        frame = parse_backtrace_line(line)
        self.assertEqual(frame.function, "foo")

class TestRegisterParsing(unittest.TestCase):
    def test_standard_register(self):
        line = "rax            0x0                      0"
        name, value = parse_register_line(line)
        self.assertEqual(name, "rax")
        self.assertEqual(value, "0x0")

    def test_register_with_large_value(self):
        line = "rsp            0x7fffffffe420           140737488348192"
        name, value = parse_register_line(line)
        self.assertEqual(name, "rsp")
        self.assertEqual(value, "0x7fffffffe420")

class TestSignalParsing(unittest.TestCase):
    def test_sigsegv(self):
        line = "Program terminated with signal SIGSEGV, Segmentation fault."
        signal = parse_signal_line(line)
        self.assertEqual(signal.name, "SIGSEGV")
        self.assertEqual(signal.description, "Segmentation fault")

    def test_sigabrt(self):
        line = "Program terminated with signal SIGABRT, Aborted."
        signal = parse_signal_line(line)
        self.assertEqual(signal.name, "SIGABRT")

Integration Tests

#!/bin/bash
# tests/test_integration.sh

set -e

SCRIPT_DIR=$(dirname "$0")
ANALYZER="$SCRIPT_DIR/../auto_analyzer.py"
TEST_PROGS="$SCRIPT_DIR/../test_programs"
TEST_CORES="$SCRIPT_DIR/../test_cores"

# Build test programs
make -C "$TEST_PROGS"

# Generate core dumps
ulimit -c unlimited
cd "$TEST_CORES"

for prog in null_deref stack_overflow div_by_zero abort_call; do
    echo "Testing $prog..."

    # Generate core (will crash)
    "$TEST_PROGS/$prog" 2>/dev/null || true

    # Find core file (name depends on system)
    CORE=$(ls -t core* 2>/dev/null | head -1)

    if [ -z "$CORE" ]; then
        echo "ERROR: No core file generated for $prog"
        exit 1
    fi

    # Run analyzer
    OUTPUT=$(python3 "$ANALYZER" "$TEST_PROGS/$prog" "$CORE")

    # Verify output contains expected sections
    echo "$OUTPUT" | grep -q "Crash Analysis Report" || { echo "Missing report header"; exit 1; }
    echo "$OUTPUT" | grep -q "Backtrace" || { echo "Missing backtrace"; exit 1; }
    echo "$OUTPUT" | grep -q "Registers" || { echo "Missing registers"; exit 1; }

    # Verify signal detection
    case "$prog" in
        null_deref)    echo "$OUTPUT" | grep -q "SIGSEGV" ;;
        div_by_zero)   echo "$OUTPUT" | grep -q "SIGFPE" ;;
        abort_call)    echo "$OUTPUT" | grep -q "SIGABRT" ;;
    esac

    echo "  PASSED"
    rm -f "$CORE"
done

echo "All integration tests passed!"

Test with Different Scenarios

Scenario How to Create What to Verify
Simple SIGSEGV Dereference NULL Signal detected, address is 0x0
SIGFPE Divide by zero Signal is SIGFPE
SIGABRT Call abort() Signal is SIGABRT
Deep stack Recursive function Many frames shown
No symbols Strip executable Shows ?? for functions
Multi-threaded pthread crash Thread info included
Corrupted core Truncate core file Graceful error message

7. Common Pitfalls & Debugging

Pitfall 1: GDB Not Finding Python

Symptom:

$ gdb --batch -x script.py exe core
Python scripting is not supported in this copy of GDB.

Cause: GDB was compiled without Python support.

Fix:

# Check GDB Python support
gdb --batch --eval-command="python print('test')"

# If unsupported, install GDB with Python:
# Ubuntu/Debian
sudo apt-get install gdb

# From source
./configure --with-python

Pitfall 2: Core Pattern Redirects Core Dumps

Symptom: Core dump not appearing in current directory.

Cause: System’s core_pattern redirects to a different location.

Fix:

# Check current pattern
cat /proc/sys/kernel/core_pattern

# Common patterns and where to find cores:
# |/usr/share/apport/apport ... -> /var/crash/ (Ubuntu)
# |/usr/lib/systemd/systemd-coredump ... -> /var/lib/systemd/coredump/, via coredumpctl
# core.%p -> ./core.<pid>

# For testing, set simple pattern (requires root):
sudo sysctl kernel.core_pattern=core.%p

# Or use coredumpctl on systemd systems:
coredumpctl list
coredumpctl dump <pid> > core.file

Pitfall 3: GDB Output Format Changes

Symptom: Parser works on one system, fails on another.

Cause: Different GDB versions format output differently.

Fix: Use Python API instead of text parsing, or be defensive:

# Fragile (exact format)
frame_match = re.match(r'^#(\d+)\s+0x([0-9a-f]+)\s+in\s+(\S+)\s+\(\)', line)

# More robust (flexible whitespace, optional parts)
frame_match = re.match(
    r'^#(\d+)\s+'               # Frame number
    r'(?:0x)?([0-9a-f]+)\s+'    # Address (optional 0x prefix)
    r'in\s+(\S+)\s*'            # Function name
    r'(?:\([^)]*\))?\s*'        # Arguments (optional)
    r'(?:at\s+(\S+):(\d+))?',   # File:line (optional)
    line
)

Pitfall 4: subprocess Timeout

Symptom: Script hangs when analyzing large core dump.

Cause: GDB can take a long time for large or complex cores.

Fix:

try:
    result = subprocess.run(
        gdb_command,
        capture_output=True,
        text=True,
        timeout=60  # 60 second timeout
    )
except subprocess.TimeoutExpired:
    print("ERROR: GDB analysis timed out. Core file may be too large.")
    sys.exit(1)

Pitfall 5: Mismatched Executable and Core

Symptom: GDB shows wrong symbols or “warning: core file may not match”.

Cause: Core was generated by a different build of the executable.

Fix:

# Check before analysis (eu-readelf is part of elfutils)
import os
import subprocess

def verify_executable_matches_core(exe_path, core_path):
    """Best-effort check that the core was generated by this executable."""
    exe_name = os.path.basename(exe_path)

    # The core's NT_FILE note lists the files that were mapped into the process
    result = subprocess.run(
        ["eu-readelf", "-n", core_path],
        capture_output=True, text=True
    )

    if result.returncode != 0:
        return True  # Can't verify, proceed anyway

    # If the executable's name appears among the mapped files, assume a match
    for line in result.stdout.splitlines():
        if exe_name in line:
            return True

    return False  # Executable name not found in the core's file mappings

Debugging Techniques

1. Print raw GDB output:

result = subprocess.run(gdb_command, capture_output=True, text=True)
print("=== STDOUT ===")
print(result.stdout)
print("=== STDERR ===")
print(result.stderr)
print("=== RETURN CODE ===")
print(result.returncode)

2. Interactive debugging:

# Run the same commands interactively to see what GDB shows
gdb ./exe core
(gdb) bt
(gdb) info registers

3. Test Python API in GDB:

gdb -q ./exe core
(gdb) python
>>> import gdb
>>> print(gdb.parse_and_eval("$rip"))
>>> frame = gdb.selected_frame()
>>> print(frame.name())
>>> end

8. Extensions & Challenges

Extension 1: JSON Output

Add --json flag for machine-readable output:

import json
from datetime import datetime

def format_as_json(report: CrashReport) -> str:
    return json.dumps({
        "executable": report.executable,
        "core_file": report.core_file,
        "signal": {
            "name": report.signal.name,
            "description": report.signal.description,
            "address": report.signal.fault_address
        },
        "crash_ip": report.crash_address,
        "backtrace": [
            {
                "frame": f.frame_number,
                "address": f.address,
                "function": f.function,
                "file": f.file,
                "line": f.line
            }
            for f in report.backtrace
        ],
        "registers": report.registers.registers,
        "has_symbols": report.has_symbols,
        "analyzed_at": datetime.utcnow().isoformat()
    }, indent=2)

Extension 2: Crash Signature Generation

Generate a stable “fingerprint” for crash deduplication:

import hashlib

def generate_crash_signature(report: CrashReport) -> str:
    """Generate a stable signature for crash deduplication."""
    # Use top 3 frames (excluding library frames)
    significant_frames = []
    for frame in report.backtrace[:5]:
        if frame.function and not frame.function.startswith("__"):
            significant_frames.append(f"{frame.function}:{frame.file}")
        if len(significant_frames) >= 3:
            break

    # Include signal type
    signature_parts = [report.signal.name] + significant_frames

    # Hash for compact representation
    signature_string = "|".join(signature_parts)
    return hashlib.sha256(signature_string.encode()).hexdigest()[:16]

Extension 3: Batch Processing

Analyze multiple cores at once:

$ python3 auto_analyzer.py --batch /var/crash/*.core

Processing 15 core files...
[1/15] core.app1.1234 - SIGSEGV in main()
[2/15] core.app1.1235 - SIGSEGV in main() (duplicate of #1)
[3/15] core.app2.5678 - SIGABRT in abort()
...

Summary:
  Total crashes: 15
  Unique signatures: 3
  Most common: SIGSEGV in main() (12 occurrences)
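
A sketch of the batch loop, assuming a hypothetical analyze_core() wrapper around the single-core analysis and the generate_crash_signature() helper from Extension 2:

from collections import Counter

def analyze_batch(executable, core_paths):
    seen = Counter()
    for i, core in enumerate(core_paths, 1):
        report = analyze_core(executable, core)   # hypothetical: returns a CrashReport
        sig = generate_crash_signature(report)
        top = report.backtrace[0].function if report.backtrace else "??"
        dup = " (duplicate)" if seen[sig] else ""
        seen[sig] += 1
        print(f"[{i}/{len(core_paths)}] {core} - {report.signal.name} in {top}(){dup}")

    print("\nSummary:")
    print(f"  Total crashes: {sum(seen.values())}")
    print(f"  Unique signatures: {len(seen)}")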

Extension 4: Memory Analysis

Add memory examination to find what was being accessed:

# In GDB Python script
def analyze_crash_memory(fault_address):
    """Try to understand what memory was being accessed."""
    try:
        # The mapping table shows whether the fault address lies in any mapped region
        maps = gdb.execute("info proc mappings", to_string=True)

        # Try to read memory around the fault address
        if fault_address != "0x0":
            nearby = gdb.execute(f"x/8wx {fault_address}", to_string=True)
            return nearby
        return maps  # NULL dereference: return the mapping table for context instead
    except gdb.error:
        return None

Extension 5: Source Context

Show source code around the crash:

# In GDB Python script
def get_source_context(frame, context_lines=3):
    """Get source code around the crashing line."""
    if frame.file and frame.line:
        try:
            output = gdb.execute(
                f"list {frame.file}:{frame.line - context_lines},"
                f"{frame.line + context_lines}",
                to_string=True
            )
            return output
        except gdb.error:
            return "Source not available"
    return None

9. Real-World Connections

How This Relates to Production Systems

systemd-coredump:

Your Script                     systemd-coredump
┌────────────────┐             ┌─────────────────────────────────┐
│ Takes core     │             │ Intercepts ALL core dumps       │
│ as argument    │             │ via core_pattern                │
├────────────────┤             ├─────────────────────────────────┤
│ Invokes GDB    │             │ Compresses and stores in        │
│ manually       │             │ /var/lib/systemd/coredump/     │
├────────────────┤             ├─────────────────────────────────┤
│ Parses output  │             │ Indexes by executable, PID,     │
│                │             │ timestamp                       │
├────────────────┤             ├─────────────────────────────────┤
│ Prints report  │             │ coredumpctl provides interface  │
└────────────────┘             └─────────────────────────────────┘

Sentry/Crashlytics:

Your Script                     Sentry
┌────────────────┐             ┌─────────────────────────────────┐
│ Single core    │             │ Millions of crashes per day     │
│ local analysis │             │ from thousands of users         │
├────────────────┤             ├─────────────────────────────────┤
│ GDB Python API │             │ Custom parsers for minidumps,   │
│                │             │ symbolication servers          │
├────────────────┤             ├─────────────────────────────────┤
│ Text/JSON      │             │ Web UI with graphs, trends,     │
│ output         │             │ alerts, integrations           │
├────────────────┤             ├─────────────────────────────────┤
│ Manual run     │             │ SDK in app, automatic upload    │
└────────────────┘             └─────────────────────────────────┘

Industry Use Cases

  1. Game Development: Every crash from QA testers is analyzed automatically. New crash types page engineers.

  2. Mobile Apps: Crashlytics/Bugsnag receive millions of crash reports, deduplicate, and show trends.

  3. Cloud Infrastructure: AWS/GCP/Azure automate analysis of their internal service crashes.

  4. Automotive: Safety-critical systems require automated post-crash analysis for regulatory compliance.

  5. Financial Services: Every trading system crash is analyzed to determine if it caused incorrect trades.


10. Resources

Official Documentation
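
  • “Debugging with GDB” (the official GDB manual), including the Python API chapters: https://sourceware.org/gdb/documentation/
  • Python standard library subprocess documentation: https://docs.python.org/3/library/subprocess.html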

Books

Book Relevant Chapters
“The Art of Debugging with GDB” - Matloff & Salzman Ch. 1-3 (GDB basics), Ch. 7 (Scripting)
“Debugging with GDB” - GNU Manual Ch. 23 (Python Extensions)
“Black Hat Python” - Seitz Ch. 1-2 (Automation mindset)
“The Linux Programming Interface” - Kerrisk Ch. 22 (Signals)

Online Resources

Tool Purpose
coredumpctl systemd interface for core dumps
eu-readelf Examine ELF files (including cores)
gcore Generate core dump of running process
pstack Print stack of running process
minidump_stackwalk Google Breakpad minidump analyzer

11. Self-Assessment Checklist

Functionality

  • Script accepts executable and core file as arguments
  • Script validates that both files exist
  • Script extracts signal name and description
  • Script extracts crash address (fault address)
  • Script generates full backtrace
  • Script extracts key register values
  • Script handles cores without debug symbols
  • Script outputs formatted report

Robustness

  • Script handles missing files gracefully
  • Script handles GDB errors
  • Script has timeout protection
  • Script works with SIGSEGV, SIGFPE, SIGABRT
  • Script output is consistent across GDB versions

Code Quality

  • Code is organized into functions
  • Functions have docstrings
  • Error messages are clear
  • No hardcoded paths

Extensions (Optional)

  • JSON output option implemented
  • Crash signature generation implemented
  • Batch processing implemented

12. Submission / Completion Criteria

You have successfully completed this project when:

  1. Basic Analysis Works:
    $ python3 auto_analyzer.py ./crashing_program core.1234
    # Produces report with signal, backtrace, and registers
    
  2. Error Handling Works:
    $ python3 auto_analyzer.py ./nonexistent core.1234
    ERROR: Executable not found: ./nonexistent
    
  3. Multiple Crash Types Work:
    • SIGSEGV (segmentation fault)
    • SIGFPE (floating point exception)
    • SIGABRT (abort)
  4. Works Without Symbols:
    $ strip ./crashing_program
    $ ./crashing_program  # generates core
    $ python3 auto_analyzer.py ./crashing_program core.xxx
    # Shows addresses instead of function names, but doesn't crash
    
  5. Tests Pass:
    $ python3 -m pytest tests/
    # All tests pass
    

Stretch Goals:

  • JSON output option (--json flag)
  • Analysis of multi-threaded core dumps
  • Crash signature for deduplication
  • Integration with a simple web interface

What’s Next?

After completing this project, you’re ready for:

  • Project 5: Multi-threaded Mayhem - Apply your automation skills to complex concurrent crashes
  • Project 7: The Minidump Parser - Build a parser for a different crash dump format
  • Project 10: Building a Centralized Crash Reporter - Scale this to a full crash pipeline

The automation skills you’ve built here are the foundation of production crash analysis systems. Every technique you’ve learned—batch mode, subprocess management, output parsing—appears in real-world crash reporting infrastructure.