Project 6: GDB Python Scripting

Extend GDB with Python to automate debugging and build custom commands.

Quick Reference

Attribute	Value
Difficulty	Advanced
Time Estimate	1-2 weeks
Language	Python (inside GDB)
Prerequisites	Python basics, Projects 1-3
Key Topics	`gdb` Python API, breakpoints, custom commands

1. Learning Objectives

By completing this project, you will:

Load and run a Python script inside GDB.
Create a custom GDB command with gdb.Command.
Create automated breakpoints with gdb.Breakpoint.
Read memory and registers from Python.

2. Theoretical Foundation

2.1 Core Concepts

GDB Python API: Exposes breakpoints, frames, and memory to Python.
Inferior process: The program being debugged, which GDB calls the inferior. The currently selected one is accessible via gdb.selected_inferior().
Automation: Use Python to capture patterns and reduce repetitive manual steps.

2.2 Why This Matters

Scripting transforms GDB from a manual tool into a programmable debugging platform. This is how you build reusable debugging utilities for a codebase.

2.3 Historical Context / Background

GDB gained Python scripting in version 7.0 (2009). Before that, users relied on GDB's built-in command language and on front ends such as DDD.

2.4 Common Misconceptions

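“Python scripts are slow”: They can be, but non-stopping breakpoints and conditions that filter hits keep the cost manageable.
“You must stop on every breakpoint”: Returning False from the stop() method of a gdb.Breakpoint subclass lets the script log the hit and resume the program without pausing.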

3. Project Specification

3.1 What You Will Build

A Python script that logs every call to printf, extracts the format string, and prints a formatted trace in real time.

3.2 Functional Requirements

Create a script file and load it with source.
Set an internal breakpoint on printf.
Read the first argument (the format string pointer) from rdi, assuming the x86-64 System V calling convention, and print the string it points to.
Continue execution without stopping on every call.
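
The requirements above can be sketched as a single script. This is a minimal sketch rather than a reference solution: it assumes an x86-64 Linux target (System V calling convention, so the first argument is in rdi), and the names decode_c_string and MAX_FORMAT_BYTES are illustrative. The try/except around import gdb exists only so the pure helper can be exercised outside GDB.

```python
try:
    import gdb  # importable only when the script runs inside GDB
except ImportError:
    gdb = None  # lets decode_c_string be tried in plain Python

MAX_FORMAT_BYTES = 256  # read cap taken from the design; tune as needed


def decode_c_string(raw):
    """Return the text before the first NUL byte, replacing undecodable bytes."""
    return bytes(raw).split(b"\x00", 1)[0].decode("utf-8", errors="replace")


if gdb is not None:

    class PrintfTracer(gdb.Breakpoint):
        """Logs the format string of every printf call without pausing."""

        def __init__(self):
            # internal=True keeps the breakpoint out of "info breakpoints".
            super().__init__("printf", internal=True)
            self.call_count = 0

        def stop(self):
            self.call_count += 1
            try:
                # Assumption: x86-64 System V, first argument lives in rdi.
                addr = int(gdb.parse_and_eval("$rdi"))
                raw = gdb.selected_inferior().read_memory(addr, MAX_FORMAT_BYTES)
                text = decode_c_string(raw)
            except gdb.error as exc:  # gdb.MemoryError is a subclass
                text = "<unreadable: {}>".format(exc)
            print('[printf #{}] Format: "{}"'.format(self.call_count, text))
            return False  # resume the inferior instead of stopping

    PrintfTracer()
```

Reading a fixed 256 bytes can fail when a short string sits near the end of a mapped page. The except branch reports that case in the trace instead of aborting the script.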

3.3 Non-Functional Requirements

Performance: Avoid heavy memory reads per call.
Reliability: Script should handle invalid memory gracefully.
Usability: Output should be readable and consistent.

3.4 Example Usage / Output

(gdb) source debug_printf_tracer.py
(gdb) run
[printf #1] Format: "Hello for the %dth time!"

3.5 Real World Outcome

You will see live tracing output while the program runs:

$ gdb ./target
(gdb) source debug_printf_tracer.py
(gdb) run
[printf #1] Format: "Hello for the %dth time!"
[printf #2] Format: "Hello for the %dth time!"
[Inferior 1 (process 14567) exited normally]

4. Solution Architecture

4.1 High-Level Design

┌──────────────┐     ┌───────────────────┐     ┌─────────────────┐
│ debug target │────▶│ GDB Python script │────▶│ trace output    │
└──────────────┘     └───────────────────┘     └─────────────────┘

4.2 Key Components

Component	Responsibility	Key Decisions
Python script	Define breakpoints and commands	Use non-stopping breakpoint
Breakpoint class	Capture calls	Subclass `gdb.Breakpoint`
Memory reader	Extract format string	Limit reads to 256 bytes

4.3 Data Structures

class TraceState:
    def __init__(self):
        self.call_count = 0

4.4 Algorithm Overview

Key Algorithm: Non-stopping trace

Register a breakpoint on printf.
On each hit, read the pointer in rdi and the string it points to (at most 256 bytes).
Print a log line and return False from stop() so the program continues.

Complexity Analysis:

Time: O(1) per breakpoint hit.
Space: O(1).

5. Implementation Guide

5.1 Development Environment Setup

gdb --version
python3 --version

5.2 Project Structure

project-root/
├── target.c
├── debug_printf_tracer.py
└── README.md

5.3 The Core Question You’re Answering

“How do I automate debugging work so I can focus on analysis, not repetition?”

5.4 Concepts You Must Understand First

Stop and research these before coding:

GDB Python API basics
- gdb.Breakpoint, gdb.Command
Calling convention
- Where arguments live on x86-64 System V (rdi, rsi, rdx, rcx, r8, r9)
Memory reading
- gdb.selected_inferior().read_memory(address, length)
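
As a rough illustration of the last two concepts, the helpers below map an argument index to its register and copy bytes out of the inferior. The function names are hypothetical, and the register table assumes the x86-64 System V convention. Other platforms (Windows x64, AArch64) use different registers.

```python
try:
    import gdb  # importable only inside GDB
except ImportError:
    gdb = None  # the mapping helper still works in plain Python

# Integer/pointer argument registers in the x86-64 System V calling convention.
SYSV_ARG_REGISTERS = ("rdi", "rsi", "rdx", "rcx", "r8", "r9")


def arg_register_name(index):
    """Map a zero-based argument index to its register name."""
    if not 0 <= index < len(SYSV_ARG_REGISTERS):
        raise IndexError("argument {} is passed on the stack".format(index))
    return SYSV_ARG_REGISTERS[index]


def read_int_arg(index):
    """Read one integer argument; meaningful only while stopped at function entry."""
    frame = gdb.selected_frame()
    return int(frame.read_register(arg_register_name(index)))


def read_bytes(address, length):
    """Copy `length` bytes of inferior memory into a Python bytes object."""
    return bytes(gdb.selected_inferior().read_memory(address, length))
```

read_int_arg and read_bytes need a live process, so call them only from inside GDB, for example from a breakpoint's stop() method.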

5.5 Questions to Guide Your Design

Should the breakpoint stop or just log?
How will you handle invalid memory reads?
How much output is too much?

5.6 Thinking Exercise

Design a script that tracks all malloc and free calls to detect double free.
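
One possible starting point is to separate the bookkeeping from the GDB hooks. The class below is a hypothetical sketch of the bookkeeping only. Wiring on_malloc and on_free to breakpoints (and capturing malloc's return value) is left as the exercise.

```python
class AllocationTracker:
    """Tracks live and freed heap addresses to flag suspicious free() calls."""

    def __init__(self):
        self.live = set()   # returned by malloc and not yet freed
        self.freed = set()  # freed and not handed out again since

    def on_malloc(self, address):
        if address == 0:
            return  # failed allocation, nothing to track
        self.freed.discard(address)  # the allocator may reuse a freed address
        self.live.add(address)

    def on_free(self, address):
        """Return a short diagnosis for this free() call."""
        if address == 0:
            return "ok"  # free(NULL) is a no-op
        if address in self.live:
            self.live.remove(address)
            self.freed.add(address)
            return "ok"
        if address in self.freed:
            return "double free"
        return "unknown pointer"


tracker = AllocationTracker()
tracker.on_malloc(0x1000)
print(tracker.on_free(0x1000))  # ok
print(tracker.on_free(0x1000))  # double free
```

Keeping this logic free of gdb calls means it can be unit-tested in plain Python before it is attached to breakpoints.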

5.7 The Interview Questions They’ll Ask

How do you create a custom command in GDB?
How do you access registers in Python?
What is the performance cost of scripted breakpoints?

5.8 Hints in Layers

Hint 1: Minimal script

Define one subclass of gdb.Command and one subclass of gdb.Breakpoint.

Hint 2: Read memory

gdb.selected_inferior().read_memory(addr, 256)

Hint 3: Non-stopping

Return False from stop().

5.9 Books That Will Help

Topic	Book	Chapter
GDB Python API	GDB Manual	“Python API”
Debug scripting	“The Art of Debugging with GDB, DDD, and Eclipse”	Ch. 7

5.10 Implementation Phases

Phase 1: Foundation (1-2 days)

Goals:

Load a Python script inside GDB.

Tasks:

Create debug_printf_tracer.py.
Use source to load it.

Checkpoint: GDB accepts your script without errors.

Phase 2: Core Functionality (3-5 days)

Goals:

Build a working breakpoint tracer.

Tasks:

Implement PrintfTracer breakpoint.
Read and print the format string.

Checkpoint: You see trace output on every printf.

Phase 3: Polish & Edge Cases (3-5 days)

Goals:

Make the tool robust.

Tasks:

Add error handling for invalid addresses.
Add a custom command to enable/disable tracing.

Checkpoint: Tracer survives bad reads without crashing.
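
The enable/disable task in Phase 3 might look like the sketch below. The command name printf-trace and the helper parse_toggle are hypothetical. The command expects to be given the tracer breakpoint object that your script already created.

```python
try:
    import gdb  # importable only inside GDB
except ImportError:
    gdb = None  # lets parse_toggle be tried in plain Python


def parse_toggle(argument):
    """Translate 'on'/'off' into True/False; raise ValueError otherwise."""
    word = argument.strip().lower()
    if word in ("on", "off"):
        return word == "on"
    raise ValueError("usage: printf-trace on|off")


if gdb is not None:

    class PrintfTraceCommand(gdb.Command):
        """Enable or disable the tracer breakpoint: printf-trace on|off"""

        def __init__(self, tracer):
            super().__init__("printf-trace", gdb.COMMAND_USER)
            self.tracer = tracer  # a gdb.Breakpoint instance

        def invoke(self, argument, from_tty):
            try:
                self.tracer.enabled = parse_toggle(argument)
            except ValueError as exc:
                # gdb.GdbError prints the message without a Python traceback.
                raise gdb.GdbError(str(exc))
            state = "on" if self.tracer.enabled else "off"
            print("printf tracing " + state)

    # Register once the tracer exists, for example:
    # PrintfTraceCommand(tracer)
```

Toggling the breakpoint's enabled attribute keeps the tracer object and its call counter alive, which deleting and recreating the breakpoint would not.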

5.11 Key Implementation Decisions

Decision	Options	Recommendation	Rationale
Breakpoint type	internal vs normal	internal	Keeps the tracer out of `info breakpoints` and other user-visible listings
Stop behavior	stop vs log	log	Keep program running

6. Testing Strategy

6.1 Test Categories

Category	Purpose	Examples
Script load	Ensure Python runs	`source script.py` succeeds
Trace output	Verify logging	`printf` lines appear
Error handling	Robustness	Invalid memory is caught

6.2 Critical Test Cases

Breakpoint triggers at every printf.
Output includes the format string.
Program continues without manual continue.

6.3 Test Data

printf("Hello %d", 1)

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall	Symptom	Solution
Python disabled	GDB reports that Python scripting is not supported	Install a GDB build with Python support
Bad register usage	Garbage strings	Confirm calling convention
Excessive output	Slow debugging	Add conditional filters

7.2 Debugging Strategies

Use print inside Python to debug your script.
Use set pagination off to reduce output friction.

7.3 Performance Traps

Tracing high-frequency functions can slow the program significantly.
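
One way to bound the output is to log only every Nth hit. The Sampler class below is a hypothetical helper, written in plain Python so it can be tested outside GDB.

```python
class Sampler:
    """Decides which breakpoint hits to log so output stays bounded."""

    def __init__(self, every):
        if every < 1:
            raise ValueError("every must be >= 1")
        self.every = every
        self.seen = 0

    def should_log(self):
        """Count one hit; return True for hits 1, 1+every, 1+2*every, ..."""
        self.seen += 1
        return (self.seen - 1) % self.every == 0


sampler = Sampler(every=3)
print([sampler.should_log() for _ in range(7)])
# [True, False, False, True, False, False, True]
```

Inside a breakpoint's stop() method, this would guard the memory read and the print call. Sampling trims the script's own work and output, but GDB still intercepts every hit, so some slowdown remains.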

8. Extensions & Challenges

8.1 Beginner Extensions

Add a hello_gdb custom command.

8.2 Intermediate Extensions

Build a memdump command.
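
A possible shape for this extension is sketched below. The output layout and the two-argument syntax (address expression, then length) are assumptions, not a fixed specification.

```python
try:
    import gdb  # importable only inside GDB
except ImportError:
    gdb = None  # lets format_hexdump be tried in plain Python


def format_hexdump(data, base_address, width=16):
    """Render bytes as 'address: hex bytes  |ascii|' lines."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join("{:02x}".format(b) for b in chunk)
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append("{:#018x}: {:<{}}  |{}|".format(
            base_address + offset, hex_part, width * 3 - 1, ascii_part))
    return "\n".join(lines)


if gdb is not None:

    class MemDump(gdb.Command):
        """memdump ADDRESS-EXPRESSION LENGTH: hex dump of inferior memory."""

        def __init__(self):
            super().__init__("memdump", gdb.COMMAND_DATA)

        def invoke(self, argument, from_tty):
            args = gdb.string_to_argv(argument)
            if len(args) != 2:
                raise gdb.GdbError("usage: memdump ADDRESS-EXPRESSION LENGTH")
            address = int(gdb.parse_and_eval(args[0]))
            length = int(gdb.parse_and_eval(args[1]))
            try:
                data = bytes(gdb.selected_inferior().read_memory(address, length))
            except gdb.MemoryError as exc:
                raise gdb.GdbError("cannot read memory: {}".format(exc))
            print(format_hexdump(data, address))

    MemDump()
```

With the script loaded, a call such as memdump $rsp 64 would dump 64 bytes starting at the stack pointer.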

8.3 Advanced Extensions

Create a pretty-printer for a custom struct.
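
As an illustration, the sketch below registers a printer for a hypothetical C type, struct point { int x; int y; }. The struct, the collection name project6, and the output format are all assumptions made for the example.

```python
try:
    import gdb  # importable only inside GDB
    import gdb.printing
except ImportError:
    gdb = None  # lets format_point be tried in plain Python


def format_point(x, y):
    """Text shown by GDB in place of the raw struct fields."""
    return "point(x={}, y={})".format(x, y)


if gdb is not None:

    class PointPrinter:
        """Pretty-printer for the hypothetical `struct point`."""

        def __init__(self, val):
            self.val = val  # gdb.Value holding the struct

        def to_string(self):
            return format_point(int(self.val["x"]), int(self.val["y"]))

    def build_printers():
        collection = gdb.printing.RegexpCollectionPrettyPrinter("project6")
        # The regular expression is matched against the type's tag name.
        collection.add_printer("point", "^point$", PointPrinter)
        return collection

    # current_objfile() is None when the script is sourced by hand, which
    # registers the printer globally; replace=True allows re-sourcing.
    gdb.printing.register_pretty_printer(
        gdb.current_objfile(), build_printers(), replace=True)
```

After loading, print on a variable of that type would show the to_string() result instead of the field-by-field default.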

9. Real-World Connections

9.1 Industry Applications

Automation: Build reusable debugging tools for your team.
Observability: Instrument behavior at runtime without changing code.

9.2 Related Open Source Projects

GDB pretty-printers used in libstdc++.
rr tooling and extensions.

9.3 Interview Relevance

Demonstrates ability to automate and extend tooling.

10. Resources

10.1 Essential Reading

GDB Manual - Python API.
“The Art of Debugging with GDB, DDD, and Eclipse” - Python chapter.

10.2 Video Resources

Search: “gdb python scripting”.

10.3 Tools & Documentation

GDB: https://sourceware.org/gdb/
Python: https://www.python.org/doc/

10.4 Related Projects in This Series

Build a Mini-Debugger

11. Self-Assessment Checklist

11.1 Understanding

I can explain how GDB exposes Python hooks.
I can read registers from Python.

11.2 Implementation

My tracer logs printf without stopping.
My script handles invalid memory gracefully.

11.3 Growth

I can imagine a reusable tool for my own codebase.

12. Submission / Completion Criteria

Minimum Viable Completion:

A script that logs the printf format string on each call.

Full Completion:

A command to enable/disable tracing and robust error handling.

Excellence (Going Above & Beyond):

A pretty-printer or allocation tracker that works across multiple files.

This guide was generated from LEARN_GDB_DEEP_DIVE.md. For the complete learning path, see the parent directory README.
