Project 6: GDB Python Scripting

Extend GDB with Python to automate debugging and build custom commands.

Quick Reference

Attribute     | Value
--------------|---------------------------------------------
Difficulty    | Advanced
Time Estimate | 1-2 weeks
Language      | Python (inside GDB)
Prerequisites | Python basics, Projects 1-3
Key Topics    | gdb Python API, breakpoints, custom commands

1. Learning Objectives

By completing this project, you will:

  1. Load and run a Python script inside GDB.
  2. Create a custom GDB command with gdb.Command.
  3. Create automated breakpoints with gdb.Breakpoint.
  4. Read memory and registers from Python.

2. Theoretical Foundation

2.1 Core Concepts

  • GDB Python API: Exposes breakpoints, frames, and memory to Python.
  • Inferior process: The program being debugged under GDB's control, accessible via gdb.selected_inferior().
  • Automation: Use Python to capture patterns and reduce repetitive manual steps.

2.2 Why This Matters

Scripting transforms GDB from a manual tool into a programmable debugging platform. This is how you build reusable debugging utilities for a codebase.

2.3 Historical Context / Background

GDB gained Python scripting in version 7.0 (2009). Before that, users relied on GDB's own command language (user-defined commands and command files) and on graphical front ends such as DDD.

2.4 Common Misconceptions

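  • “Python scripts are slow”: They can be, because every breakpoint hit runs Python code. You can limit the cost by making breakpoints conditional and non-stopping, and by doing little work per hit.
  • “You must stop on every breakpoint”: Returning False from the stop() method of a gdb.Breakpoint subclass lets you log the hit and resume the program without pausing at the prompt.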
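The stop() return value can be illustrated with a small sketch. The names wants_stop and CountingBreakpoint are illustrative and not part of the GDB API; the import guard exists only so the pure helper can be exercised outside GDB, where the gdb module is unavailable.

```python
try:
    import gdb  # provided only by GDB's embedded interpreter
except ImportError:
    gdb = None  # lets the helper below be tested outside GDB


def wants_stop(hits, stop_after):
    """True once `hits` reaches `stop_after`; None means never pause."""
    return stop_after is not None and hits >= stop_after


if gdb is not None:
    class CountingBreakpoint(gdb.Breakpoint):
        """Logs every hit; pauses only when wants_stop() says so."""

        def __init__(self, spec, stop_after=None):
            super().__init__(spec)
            self.hits = 0  # our own counter, separate from gdb's hit_count
            self.stop_after = stop_after

        def stop(self):
            self.hits += 1
            print(f"[{self.location} #{self.hits}]")
            # False tells GDB to resume the inferior immediately;
            # True drops back to the (gdb) prompt as a normal breakpoint would.
            return wants_stop(self.hits, self.stop_after)
```

Inside GDB, CountingBreakpoint("printf") would log forever without pausing, while CountingBreakpoint("printf", stop_after=3) would pause on the third call.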

3. Project Specification

3.1 What You Will Build

A Python script that logs every call to printf, extracts the format string, and prints a formatted trace in real time.

3.2 Functional Requirements

  1. Create a script file and load it with source.
  2. Set an internal breakpoint on printf.
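  3. Read the first argument (the format string pointer) from rdi, where the x86-64 System V calling convention places it, and print the string it points to.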
  4. Continue execution without stopping on every call.

3.3 Non-Functional Requirements

  • Performance: Keep the work done per call small; in particular, cap each memory read (this design uses 256 bytes).
  • Reliability: Script should handle invalid memory gracefully.
  • Usability: Output should be readable and consistent.

3.4 Example Usage / Output

(gdb) source debug_printf_tracer.py
(gdb) run
[printf #1] Format: "Hello for the %dth time!"

3.5 Real World Outcome

You will see live tracing output while the program runs:

$ gdb ./target
(gdb) source debug_printf_tracer.py
(gdb) run
[printf #1] Format: "Hello for the %dth time!"
[printf #2] Format: "Hello for the %dth time!"
[Inferior 1 (process 14567) exited normally]

4. Solution Architecture

4.1 High-Level Design

┌──────────────┐     ┌───────────────────┐     ┌─────────────────┐
│ debug target │────▶│ GDB Python script │────▶│ trace output    │
└──────────────┘     └───────────────────┘     └─────────────────┘

4.2 Key Components

Component        | Responsibility                  | Key Decisions
-----------------|---------------------------------|----------------------------
Python script    | Define breakpoints and commands | Use non-stopping breakpoint
Breakpoint class | Capture calls                   | Subclass gdb.Breakpoint
Memory reader    | Extract format string           | Limit reads to 256 bytes

4.3 Data Structures

class TraceState:
    def __init__(self):
        self.call_count = 0

4.4 Algorithm Overview

Key Algorithm: Non-stopping trace

  1. Register a breakpoint on printf.
  2. On each hit, read the pointer in rdi, then read the string it points to from the inferior's memory.
  3. Print a log line and return False so the program continues.
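The three steps above can be sketched as one script. This is a minimal sketch, assuming x86-64 Linux (first argument in rdi); the helper names c_string and trace_line are illustrative. The import guard only makes the pure helpers testable outside GDB.

```python
try:
    import gdb  # provided only by GDB's embedded interpreter
except ImportError:
    gdb = None

MAX_READ = 256  # read cap taken from the Key Components table


def c_string(raw):
    """Decode bytes up to the first NUL; undecodable bytes become U+FFFD."""
    return bytes(raw).split(b"\0", 1)[0].decode("utf-8", errors="replace")


def trace_line(count, fmt):
    """Build one log line; newlines are escaped to keep one line per call."""
    return f'[printf #{count}] Format: "{fmt}"'.replace("\n", "\\n")


if gdb is not None:
    class PrintfTracer(gdb.Breakpoint):
        def __init__(self):
            # internal=True keeps the breakpoint out of "info breakpoints".
            super().__init__("printf", internal=True)
            self.calls = 0

        def stop(self):
            self.calls += 1
            # Assumption: x86-64 System V ABI, so argument 1 is in rdi.
            reg = gdb.selected_frame().read_register("rdi")
            addr = int(reg) & 0xFFFFFFFFFFFFFFFF
            try:
                # A fixed-size read can fail if the string sits near the
                # end of a mapped page; that case is reported, not fatal.
                raw = gdb.selected_inferior().read_memory(addr, MAX_READ)
                print(trace_line(self.calls, c_string(raw)))
            except gdb.MemoryError:
                print(f"[printf #{self.calls}] unreadable string at {addr:#x}")
            return False  # never pause; just log and continue

    PrintfTracer()
```

Keeping the decoding and formatting in plain functions means they can be checked with ordinary Python tests, while only the thin stop() method depends on a live debugging session.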

Complexity Analysis:

  • Time: O(1) per breakpoint hit, because each memory read is capped at 256 bytes.
  • Space: O(1).

5. Implementation Guide

5.1 Development Environment Setup

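gdb --version
python3 --version
# GDB embeds its own interpreter; this prints the Python version GDB was built with
gdb -batch -ex 'python import sys; print(sys.version)'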

5.2 Project Structure

project-root/
├── target.c
├── debug_printf_tracer.py
└── README.md

5.3 The Core Question You’re Answering

“How do I automate debugging work so I can focus on analysis, not repetition?”

5.4 Concepts You Must Understand First

Stop and research these before coding:

  1. GDB Python API basics
    • gdb.Breakpoint, gdb.Command
  2. Calling convention
    • Where arguments live on x86-64 System V (rdi, rsi, rdx, rcx, r8, r9, then the stack)
  3. Memory reading
    • gdb.selected_inferior().read_memory(address, length)
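As a rough sketch of items 2 and 3, the helpers below map an argument index to its register and read a bounded C string. The function names are illustrative, and the register table assumes the x86-64 System V ABI used on Linux; other platforms (for example Windows x64 or AArch64) use different registers. read_int_arg and read_c_string only work inside GDB with a selected frame.

```python
try:
    import gdb  # provided only by GDB's embedded interpreter
except ImportError:
    gdb = None

# Integer/pointer argument registers, in order, for the x86-64 System V ABI.
SYSV_X86_64_ARG_REGS = ("rdi", "rsi", "rdx", "rcx", "r8", "r9")


def arg_register(index):
    """Map a zero-based integer-argument index to its register name."""
    if not 0 <= index < len(SYSV_X86_64_ARG_REGS):
        raise IndexError("arguments beyond the sixth are passed on the stack")
    return SYSV_X86_64_ARG_REGS[index]


def read_int_arg(index):
    """Read an integer argument at function entry (inside GDB only)."""
    value = gdb.selected_frame().read_register(arg_register(index))
    return int(value) & 0xFFFFFFFFFFFFFFFF  # treat the register as unsigned


def read_c_string(addr, limit=256):
    """Read at most `limit` bytes at addr and cut at the first NUL."""
    raw = gdb.selected_inferior().read_memory(addr, limit)
    return bytes(raw).split(b"\0", 1)[0].decode("utf-8", errors="replace")
```

The register values are only meaningful as arguments at function entry; once the callee has run for a while, it is free to overwrite them.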

5.5 Questions to Guide Your Design

  1. Should the breakpoint stop or just log?
  2. How will you handle invalid memory reads?
  3. How much output is too much?

5.6 Thinking Exercise

Design a script that tracks all malloc and free calls to detect double free.
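One possible starting point for the bookkeeping side of this exercise is sketched below; AllocTracker and its status strings are hypothetical names. It deliberately leaves out the GDB wiring: on x86-64 Linux you would read the argument of free from rdi at entry, and capture the return value of malloc once the call finishes (for example with gdb.FinishBreakpoint and its return_value attribute). calloc and realloc are not handled.

```python
class AllocTracker:
    """Tracks live and freed addresses to flag suspicious free() calls."""

    def __init__(self):
        self.live = set()   # addresses returned by malloc and not yet freed
        self.freed = set()  # addresses freed and not yet handed out again

    def on_malloc(self, addr):
        if addr:  # malloc may return NULL on failure
            self.live.add(addr)
            self.freed.discard(addr)  # the allocator reused this address

    def on_free(self, addr):
        if addr == 0:
            return "noop"  # free(NULL) is defined to do nothing
        if addr in self.live:
            self.live.remove(addr)
            self.freed.add(addr)
            return "ok"
        if addr in self.freed:
            return "double-free"
        return "unknown"  # never seen: allocated before tracing began?
```

Clearing an address from the freed set when malloc returns it again matters: allocators reuse addresses quickly, and without that step a legitimate second free would be reported as a double free.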

5.7 The Interview Questions They’ll Ask

  1. How do you create a custom command in GDB?
  2. How do you access registers in Python?
  3. What is the performance cost of scripted breakpoints?

5.8 Hints in Layers

Hint 1: Minimal script

  • Define a gdb.Command and gdb.Breakpoint.

Hint 2: Read memory

  • inferior.read_memory(addr, 256)

Hint 3: Non-stopping

  • Return False from stop().

5.9 Books That Will Help

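Topic           | Book                                              | Chapter
----------------|---------------------------------------------------|-------------
GDB Python API  | GDB Manual                                        | “Python API”
Debug scripting | “The Art of Debugging with GDB, DDD, and Eclipse” | Ch. 7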

5.10 Implementation Phases

Phase 1: Foundation (1-2 days)

Goals:

  • Load a Python script inside GDB.

Tasks:

  1. Create debug_printf_tracer.py.
  2. Use source to load it.

Checkpoint: GDB accepts your script without errors.

Phase 2: Core Functionality (3-5 days)

Goals:

  • Build a working breakpoint tracer.

Tasks:

  1. Implement PrintfTracer breakpoint.
  2. Read and print the format string.

Checkpoint: You see trace output on every printf.

Phase 3: Polish & Edge Cases (3-5 days)

Goals:

  • Make the tool robust.

Tasks:

  1. Add error handling for invalid addresses.
  2. Add a custom command to enable/disable tracing.

Checkpoint: Tracer survives bad reads without crashing.
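Both Phase 3 tasks might look roughly like this. The command name printf-trace and the helpers parse_toggle and safe_read are illustrative choices; the command expects to be given the tracer breakpoint object you created in Phase 2.

```python
try:
    import gdb  # provided only by GDB's embedded interpreter
except ImportError:
    gdb = None


def parse_toggle(arg):
    """Turn 'on'/'off' (any case, surrounding spaces ignored) into a bool."""
    word = arg.strip().lower()
    if word in ("on", "off"):
        return word == "on"
    raise ValueError("usage: printf-trace on|off")


def safe_read(read, addr, length, errors):
    """Call read(addr, length); return None instead of raising `errors`."""
    try:
        return bytes(read(addr, length))
    except errors:
        return None


if gdb is not None:
    class PrintfTraceToggle(gdb.Command):
        """printf-trace on|off: enable or disable the tracer breakpoint."""

        def __init__(self, breakpoint):
            super().__init__("printf-trace", gdb.COMMAND_USER)
            self.breakpoint = breakpoint

        def invoke(self, arg, from_tty):
            try:
                self.breakpoint.enabled = parse_toggle(arg)
            except ValueError as exc:
                # GdbError is shown as a plain message, without a traceback.
                raise gdb.GdbError(str(exc))
```

Inside stop(), the read would then be safe_read(gdb.selected_inferior().read_memory, addr, 256, gdb.MemoryError), with a None result logged as an unreadable address. After creating the tracer, PrintfTraceToggle(tracer) registers the command.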

5.11 Key Implementation Decisions

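Decision        | Options            | Recommendation | Rationale
----------------|--------------------|----------------|-----------------------------
Breakpoint type | internal vs normal | internal       | Avoid cluttering GDB output
Stop behavior   | stop vs log        | log            | Keep program running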

6. Testing Strategy

6.1 Test Categories

Category       | Purpose            | Examples
---------------|--------------------|---------------------------
Script load    | Ensure Python runs | source script.py succeeds
Trace output   | Verify logging     | printf lines appear
Error handling | Robustness         | Invalid memory is caught

6.2 Critical Test Cases

  1. Breakpoint triggers at every printf.
  2. Output includes the format string.
  3. Program continues without manual continue.
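These cases can be checked automatically by running GDB in batch mode and parsing its output. The sketch below assumes the trace format shown in section 3.4 and the file names from section 5.2; gdb_argv, parse_trace, and run_trace are illustrative names. run_trace needs gdb and a built target, so only the parsing is exercised without them.

```python
import re
import subprocess

TRACE_RE = re.compile(r'^\[printf #(\d+)\] Format: "(.*)"$')


def gdb_argv(script, target):
    """Command line that loads the script, runs the target, then exits."""
    return ["gdb", "-batch", "-x", script, "-ex", "run", target]


def parse_trace(output):
    """Return (call number, format string) pairs found in GDB's output."""
    pairs = []
    for line in output.splitlines():
        match = TRACE_RE.match(line.strip())
        if match:
            pairs.append((int(match.group(1)), match.group(2)))
    return pairs


def run_trace(script="debug_printf_tracer.py", target="./target"):
    """Run the tracer end to end; requires gdb and a compiled target."""
    done = subprocess.run(gdb_argv(script, target), capture_output=True,
                          text=True, timeout=60)
    return parse_trace(done.stdout)
```

If batch mode finishes without hanging, case 3 holds, since nothing typed continue. One caveat: the target's own output shares the stream with the trace, so a target that prints without a trailing newline can glue its text onto a trace line and defeat the regular expression.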

6.3 Test Data

printf("Hello %d", 1)

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

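Pitfall            | Symptom                                                 | Solution
-------------------|---------------------------------------------------------|--------------------------------------------------
Python disabled    | “Python scripting is not supported in this copy of GDB” | Install a GDB build with Python support
Bad register usage | Garbage strings                                         | Confirm the calling convention for your platform
Excessive output   | Slow debugging                                          | Add conditional filters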

7.2 Debugging Strategies

  • Use print (or gdb.write) inside your Python code to inspect values while developing the script.
  • Use set pagination off so that long traces do not pause at the pager prompt.

7.3 Performance Traps

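Every breakpoint hit stops the inferior, hands control to GDB, and runs your Python stop() method before the program resumes. Tracing a high-frequency function this way can slow the program significantly.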
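One way to keep the output and the per-hit Python work bounded is to sample. The Sampler class below is a hypothetical helper, not a GDB feature: it lets the first few hits through and then only every Nth one.

```python
class Sampler:
    """Allow the first `burst` events, then only every `every`-th one."""

    def __init__(self, burst=10, every=100):
        self.burst = burst
        self.every = every
        self.seen = 0

    def allow(self):
        self.seen += 1
        return self.seen <= self.burst or self.seen % self.every == 0
```

Inside stop(), checking the sampler first and returning False when it declines skips the memory read and the print. It does not remove the cost of the trap itself, so for very hot functions a narrower breakpoint location or condition is still the bigger win.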


8. Extensions & Challenges

8.1 Beginner Extensions

  • Add a hello_gdb custom command.
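A minimal sketch of such a command follows. The greeting text and the helper name greeting are arbitrary; the import guard only makes the helper testable outside GDB.

```python
try:
    import gdb  # provided only by GDB's embedded interpreter
except ImportError:
    gdb = None


def greeting(arg):
    """Build the message; an empty argument greets GDB itself."""
    name = arg.strip() or "GDB"
    return f"Hello, {name}!"


if gdb is not None:
    class HelloGdb(gdb.Command):
        """hello_gdb [NAME]: print a greeting."""

        def __init__(self):
            # The first argument is the command name typed at the prompt.
            super().__init__("hello_gdb", gdb.COMMAND_USER)

        def invoke(self, arg, from_tty):
            # `arg` is the raw text typed after the command name.
            gdb.write(greeting(arg) + "\n")

    HelloGdb()  # instantiating the class registers the command
```

After sourcing the script, typing hello_gdb world at the prompt should print the greeting, and help hello_gdb shows the class docstring.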

8.2 Intermediate Extensions

  • Build a memdump command.
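A possible shape for this command is sketched below. The output layout, the 64-byte default, and the name hexdump are arbitrary choices made for illustration. The formatter is plain Python, so it can be tested without a debugging session.

```python
try:
    import gdb  # provided only by GDB's embedded interpreter
except ImportError:
    gdb = None


def hexdump(data, base=0, width=16):
    """Format bytes as lines of: address, hex bytes, printable ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hexpart = " ".join(f"{b:02x}" for b in chunk)
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{base + offset:#018x}  {hexpart:<{width * 3 - 1}}  {text}")
    return lines


if gdb is not None:
    class MemDump(gdb.Command):
        """memdump ADDRESS [LENGTH]: hex-dump inferior memory (default 64)."""

        def __init__(self):
            super().__init__("memdump", gdb.COMMAND_DATA)

        def invoke(self, arg, from_tty):
            argv = gdb.string_to_argv(arg)
            if not 1 <= len(argv) <= 2:
                raise gdb.GdbError("usage: memdump ADDRESS [LENGTH]")
            # parse_and_eval accepts expressions such as $rsp or &variable.
            addr = int(gdb.parse_and_eval(argv[0])) & 0xFFFFFFFFFFFFFFFF
            length = int(gdb.parse_and_eval(argv[1])) if len(argv) == 2 else 64
            if length <= 0:
                raise gdb.GdbError("LENGTH must be positive")
            try:
                data = bytes(gdb.selected_inferior().read_memory(addr, length))
            except gdb.MemoryError as exc:
                raise gdb.GdbError(f"cannot read {length} bytes at {addr:#x}: {exc}")
            gdb.write("\n".join(hexdump(data, addr)) + "\n")

    MemDump()
```

With a running program, memdump $rsp 32 would dump the top of the stack.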

8.3 Advanced Extensions

  • Create a pretty-printer for a custom struct.
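As a rough sketch, assume the target defines a hypothetical C type struct point { int x; int y; }. The printer class only needs a to_string method; the registration uses the gdb.printing helper module that ships with GDB. The collection name project6 is arbitrary.

```python
try:
    import gdb  # provided only by GDB's embedded interpreter
    import gdb.printing
except ImportError:
    gdb = None


class PointPrinter:
    """Pretty-printer for the hypothetical struct point { int x; int y; }."""

    def __init__(self, val):
        self.val = val  # a gdb.Value when called by GDB

    def to_string(self):
        # Fields are looked up by name, as with a dictionary.
        return f"point(x={int(self.val['x'])}, y={int(self.val['y'])})"


if gdb is not None:
    printers = gdb.printing.RegexpCollectionPrettyPrinter("project6")
    # The regular expression is matched against the type's tag name.
    printers.add_printer("point", "^point$", PointPrinter)
    # current_objfile() is None when the script is sourced by hand,
    # which registers the printer globally; replace=True allows re-sourcing.
    gdb.printing.register_pretty_printer(
        gdb.current_objfile(), printers, replace=True)
```

Once loaded, print on a struct point variable should show the to_string result instead of the default brace-delimited field list.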

9. Real-World Connections

9.1 Industry Applications

  • Automation: Build reusable debugging tools for your team.
  • Observability: Instrument behavior at runtime without changing code.
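
9.2 Related Open Source Projects

  • libstdc++: ships GDB pretty-printers for its standard library types.
  • rr: record-and-replay debugging built around GDB, with its own tooling and extensions.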

9.3 Interview Relevance

  • Demonstrates ability to automate and extend tooling.

10. Resources

10.1 Essential Reading

  • GDB Manual - Python API.
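  • “The Art of Debugging with GDB, DDD, and Eclipse” (Matloff and Salzman) - general GDB technique. It was published in 2008, before GDB gained its Python API, so rely on the GDB Manual for scripting details.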

10.2 Video Resources

  • Search: “gdb python scripting”.

10.3 Tools & Documentation

  • GDB: https://sourceware.org/gdb/
  • Python: https://www.python.org/doc/

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain how GDB exposes Python hooks.
  • I can read registers from Python.

11.2 Implementation

  • My tracer logs printf without stopping.
  • My script handles invalid memory gracefully.

11.3 Growth

  • I can imagine a reusable tool for my own codebase.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • A script that logs the printf format string on each call without stopping the program.

Full Completion:

  • A command to enable/disable tracing and robust error handling.

Excellence (Going Above & Beyond):

  • A pretty-printer or allocation tracker that works across multiple files.

This guide was generated from LEARN_GDB_DEEP_DIVE.md. For the complete learning path, see the parent directory README.