Project 6: GDB Python Scripting
Extend GDB with Python to automate debugging and build custom commands.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Advanced |
| Time Estimate | 1-2 weeks |
| Language | Python (inside GDB) |
| Prerequisites | Python basics, Projects 1-3 |
| Key Topics | gdb Python API, breakpoints, custom commands |
1. Learning Objectives
By completing this project, you will:
- Load and run a Python script inside GDB.
- Create a custom GDB command with `gdb.Command`.
- Create automated breakpoints with `gdb.Breakpoint`.
- Read memory and registers from Python.
2. Theoretical Foundation
2.1 Core Concepts
- GDB Python API: Exposes breakpoints, frames, and memory to Python.
- Inferior process: The debugged program inside GDB, accessible via `gdb.selected_inferior()`.
- Automation: Use Python to capture patterns and reduce repetitive manual steps.
2.2 Why This Matters
Scripting transforms GDB from a manual tool into a programmable debugging platform. This is how you build reusable debugging utilities for a codebase.
2.3 Historical Context / Background
GDB added Python scripting in version 7.0 (2009). Before that, users relied on GDB’s own command language (for example, user-defined commands written with `define`) and on graphical front ends such as DDD.
2.4 Common Misconceptions
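- “Python scripts are slow”: Python adds some overhead, but every breakpoint hit already costs a stop and a resume of the inferior. Keep the work inside `stop()` light and avoid breakpoints on very hot functions.
- “You must stop on every breakpoint”: Returning `False` from a breakpoint’s `stop()` method logs the hit without pausing the program.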
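A minimal sketch of the non-stopping pattern follows. `CountingBreakpoint`, `HitCounter`, and the `stop_every` knob are hypothetical names for illustration; the GDB pieces used are subclassing `gdb.Breakpoint`, overriding `stop()`, the `location` attribute, and `gdb.write`. The `try`/`except ImportError` guard is an assumption of this sketch, not a GDB requirement: the `gdb` module exists only inside GDB, and the guard lets the pure counting logic run under plain `python3`.

```python
try:
    import gdb  # only importable inside GDB's embedded Python
except ImportError:
    gdb = None  # plain python3: only the pure logic below is usable


class HitCounter:
    """Counts breakpoint hits and decides whether a given hit should pause."""

    def __init__(self, stop_every=0):
        self.hits = 0
        self.stop_every = stop_every  # 0 means never pause

    def should_stop(self):
        self.hits += 1
        return self.stop_every > 0 and self.hits % self.stop_every == 0


if gdb is not None:
    class CountingBreakpoint(gdb.Breakpoint):
        """Logs every hit; pauses only on every Nth hit if stop_every > 0."""

        def __init__(self, spec, stop_every=0):
            super().__init__(spec)
            self.counter = HitCounter(stop_every)

        def stop(self):
            pause = self.counter.should_stop()
            gdb.write("hit %d of %s\n" % (self.counter.hits, self.location))
            # False tells GDB to resume the inferior immediately.
            return pause
```

After `source`-ing the file, `python CountingBreakpoint("printf")` logs each hit and never pauses, while `python CountingBreakpoint("printf", stop_every=100)` pauses on every 100th hit.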
3. Project Specification
3.1 What You Will Build
A Python script that logs every call to printf, extracts the format string, and prints a formatted trace in real time.
3.2 Functional Requirements
- Create a script file and load it with `source`.
- Set an internal breakpoint on `printf`.
- Read the first argument from `rdi` (x86-64 System V calling convention) and print it.
- Continue execution without stopping on every call.
3.3 Non-Functional Requirements
- Performance: Avoid heavy memory reads per call.
- Reliability: Script should handle invalid memory gracefully.
- Usability: Output should be readable and consistent.
3.4 Example Usage / Output
```
(gdb) source debug_printf_tracer.py
(gdb) run
[printf #1] Format: "Hello for the %dth time!"
```
3.5 Real World Outcome
You will see live tracing output while the program runs:
```
$ gdb ./target
(gdb) source debug_printf_tracer.py
(gdb) run
[printf #1] Format: "Hello for the %dth time!"
[printf #2] Format: "Hello for the %dth time!"
[Inferior 1 (process 14567) exited normally]
```
4. Solution Architecture
4.1 High-Level Design
```
┌──────────────┐     ┌───────────────────┐     ┌─────────────────┐
│ debug target │────▶│ GDB Python script │────▶│  trace output   │
└──────────────┘     └───────────────────┘     └─────────────────┘
```
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Python script | Define breakpoints and commands | Use non-stopping breakpoint |
| Breakpoint class | Capture calls | Subclass gdb.Breakpoint |
| Memory reader | Extract format string | Limit reads to 256 bytes |
4.3 Data Structures
```python
class TraceState:
    def __init__(self):
        self.call_count = 0
```
4.4 Algorithm Overview
Key Algorithm: Non-stopping trace
- Register a breakpoint on `printf`.
- In the breakpoint’s `stop()` method, read `rdi` and the string it points to.
- Print a log line and return `False` so execution continues.
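Putting the three steps together, a minimal `debug_printf_tracer.py` might look like the sketch below. It assumes x86-64 Linux with the System V calling convention (format-string pointer in `rdi`) and that `rdi` still holds the argument at the address where GDB places the `printf` breakpoint. `TraceState.record`, `decode_c_string`, and `MAX_FMT_BYTES` are illustrative helpers, not GDB API; the GDB calls used are `gdb.Breakpoint(..., internal=True)`, `Frame.read_register`, `Inferior.read_memory`, `gdb.MemoryError`, and `gdb.write`. A fixed 256-byte read can fail for a valid string that ends close to an unmapped page; in that case this sketch logs a placeholder instead of crashing.

```python
try:
    import gdb  # only importable inside GDB's embedded Python
except ImportError:
    gdb = None  # lets the pure helpers be tested with plain python3

MAX_FMT_BYTES = 256  # per-call read cap from the component table


class TraceState:
    def __init__(self):
        self.call_count = 0

    def record(self, fmt):
        """Count one call and build its log line."""
        self.call_count += 1
        # Escape control characters such as a trailing newline.
        shown = fmt.encode("unicode_escape").decode("ascii")
        return '[printf #%d] Format: "%s"' % (self.call_count, shown)


def decode_c_string(raw, limit=MAX_FMT_BYTES):
    """Cut raw bytes at the first NUL and decode them leniently."""
    raw = bytes(raw[:limit])
    end = raw.find(b"\0")
    if end != -1:
        raw = raw[:end]
    return raw.decode("utf-8", errors="replace")


if gdb is not None:
    class PrintfTracer(gdb.Breakpoint):
        def __init__(self, state):
            # internal=True keeps the breakpoint out of "info breakpoints".
            super().__init__("printf", internal=True)
            self.state = state

        def stop(self):
            # Assumes x86-64 System V: first pointer argument is in rdi.
            addr = int(gdb.selected_frame().read_register("rdi"))
            try:
                raw = gdb.selected_inferior().read_memory(addr, MAX_FMT_BYTES)
                fmt = decode_c_string(raw)
            except gdb.MemoryError:
                fmt = "<unreadable 0x%x>" % addr
            gdb.write(self.state.record(fmt) + "\n")
            return False  # log only; GDB resumes the program immediately

    PrintfTracer(TraceState())
```

Format strings usually end in `\n`, so `record` escapes control characters to keep exactly one log line per call.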
Complexity Analysis:
- Time: O(1) per breakpoint hit, since each memory read is capped at 256 bytes.
- Space: O(1).
5. Implementation Guide
5.1 Development Environment Setup
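```shell
gdb --version
python3 --version
# GDB embeds its own Python; this prints the version GDB actually uses
gdb -batch -ex "python import sys; print(sys.version)"
```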
5.2 Project Structure
```
project-root/
├── target.c
├── debug_printf_tracer.py
└── README.md
```
5.3 The Core Question You’re Answering
“How do I automate debugging work so I can focus on analysis, not repetition?”
5.4 Concepts You Must Understand First
Stop and research these before coding:
- GDB Python API basics
  - `gdb.Breakpoint`, `gdb.Command`
- Calling convention
  - Where arguments live (`rdi`, `rsi`, etc.)
- Memory reading
  - `selected_inferior().read_memory()`
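For the memory-reading item, one way to respect a read budget without tripping over page boundaries is to read the string in small aligned chunks. This is a sketch under assumptions: `read_c_string` and `inferior_read` are hypothetical helpers, and it assumes the page size is a multiple of the 16-byte chunk (true for common 4 KiB pages), so no single read straddles two pages. `read` is any callable returning bytes, which keeps the logic testable without GDB.

```python
try:
    import gdb  # only importable inside GDB's embedded Python
except ImportError:
    gdb = None


def read_c_string(read, addr, limit=256, chunk=16):
    """Read a NUL-terminated string through read(addr, n) -> bytes.

    Each read ends at the next chunk-aligned address, so (assuming the page
    size is a multiple of chunk) no single read crosses a page boundary.
    Returns (text, complete); complete is False if limit was reached first.
    """
    data = b""
    pos = addr
    while len(data) < limit:
        n = min(chunk - pos % chunk, limit - len(data))
        piece = bytes(read(pos, n))
        end = piece.find(b"\0")
        if end != -1:
            return (data + piece[:end]).decode("utf-8", "replace"), True
        data += piece
        pos += n
    return data.decode("utf-8", "replace"), False


if gdb is not None:
    def inferior_read(addr, n):
        """Adapt Inferior.read_memory to the read(addr, n) shape."""
        return bytes(gdb.selected_inferior().read_memory(addr, n))
```

Inside a breakpoint’s `stop()`, call `read_c_string(inferior_read, addr)` and catch `gdb.MemoryError`, which then fires only when a page the string touches is actually unmapped (for example, a bad pointer).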
5.5 Questions to Guide Your Design
- Should the breakpoint stop or just log?
- How will you handle invalid memory reads?
- How much output is too much?
5.6 Thinking Exercise
Design a script that tracks all `malloc` and `free` calls to detect double frees.
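A possible starting point for the exercise is to keep the bookkeeping in plain Python and feed it events from breakpoints. `AllocTracker` is a hypothetical name, and the sketch ignores `calloc`, `realloc`, and allocations made before tracing starts, which will show up as “untracked” frees. Capturing `malloc`’s return value needs a stop after the function returns (for example via `gdb.FinishBreakpoint`); the GDB manual warns against adding breakpoints from inside `stop()`, so that wiring is left as part of the exercise.

```python
class AllocTracker:
    """Bookkeeping for malloc/free events reported by breakpoints.

    Ignores calloc/realloc and anything allocated before tracing began.
    """

    def __init__(self):
        self.live = {}      # address -> requested size, for blocks in use
        self.freed = set()  # addresses freed and not handed out again since

    def on_malloc(self, addr, size):
        if addr == 0:
            return  # malloc failed and returned NULL; nothing to track
        self.live[addr] = size
        self.freed.discard(addr)  # the allocator may legitimately reuse an address

    def on_free(self, addr):
        """Record a free; return a warning string if it looks wrong, else None."""
        if addr == 0:
            return None  # free(NULL) is a no-op in C
        if addr in self.live:
            del self.live[addr]
            self.freed.add(addr)
            return None
        if addr in self.freed:
            return "double free of 0x%x" % addr
        return "free of untracked pointer 0x%x" % addr
```

Keeping the state machine separate from GDB lets you test the detection logic on synthetic event sequences before attaching it to a real program.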
5.7 The Interview Questions They’ll Ask
- How do you create a custom command in GDB?
- How do you access registers in Python?
- What is the performance cost of scripted breakpoints?
5.8 Hints in Layers
Hint 1: Minimal script
- Define a `gdb.Command` and a `gdb.Breakpoint`.
Hint 2: Read memory
```python
inferior.read_memory(addr, 256)
```
Hint 3: Non-stopping
- Return `False` from `stop()`.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| GDB Python API | GDB Manual | “Python API” |
| Debug scripting | “The Art of Debugging with GDB, DDD, and Eclipse” | Ch. 7 |
5.10 Implementation Phases
Phase 1: Foundation (1-2 days)
Goals:
- Load a Python script inside GDB.
Tasks:
- Create `debug_printf_tracer.py`.
- Use `source` to load it.
Checkpoint: GDB accepts your script without errors.
Phase 2: Core Functionality (3-5 days)
Goals:
- Build a working breakpoint tracer.
Tasks:
- Implement the `PrintfTracer` breakpoint class.
- Read and print the format string.
Checkpoint: You see trace output on every printf.
Phase 3: Polish & Edge Cases (3-5 days)
Goals:
- Make the tool robust.
Tasks:
- Add error handling for invalid addresses.
- Add a custom command to enable/disable tracing.
Checkpoint: Tracer survives bad reads without crashing.
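For the Phase 3 toggle, a custom command can flip the tracer breakpoint’s `enabled` attribute. The command name `trace-printf`, the accepted words, and `parse_toggle` are hypothetical choices; `gdb.Command`, `gdb.COMMAND_USER`, `Breakpoint.enabled`, and `gdb.GdbError` are part of GDB’s Python API. Raising `gdb.GdbError` reports the message as a normal GDB error without a Python traceback.

```python
try:
    import gdb  # only importable inside GDB's embedded Python
except ImportError:
    gdb = None


def parse_toggle(argument):
    """Map 'on'/'off' (and a few synonyms) to a bool; reject anything else."""
    word = argument.strip().lower()
    if word in ("on", "enable", "1"):
        return True
    if word in ("off", "disable", "0"):
        return False
    raise ValueError("usage: trace-printf on|off")


if gdb is not None:
    class TracePrintfCommand(gdb.Command):
        """Turn printf tracing on or off: trace-printf on|off"""

        def __init__(self, tracer):
            super().__init__("trace-printf", gdb.COMMAND_USER)
            self.tracer = tracer  # the gdb.Breakpoint that does the tracing

        def invoke(self, argument, from_tty):
            try:
                self.tracer.enabled = parse_toggle(argument)
            except ValueError as exc:
                raise gdb.GdbError(str(exc)) from None
            gdb.write("printf tracing %s\n" % ("on" if self.tracer.enabled else "off"))
```

Register it with your tracer instance, e.g. `TracePrintfCommand(tracer)` where `tracer` is the `printf` breakpoint object your script created; `(gdb) trace-printf off` then silences tracing without deleting the breakpoint.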
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Breakpoint type | internal vs. normal | internal | Keeps it out of `info breakpoints` and the user’s breakpoint numbering |
| Stop behavior | stop vs log | log | Keep program running |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Script load | Ensure Python runs | source script.py succeeds |
| Trace output | Verify logging | printf lines appear |
| Error handling | Robustness | Invalid memory is caught |
6.2 Critical Test Cases
- Breakpoint triggers at every `printf` call.
- Output includes the format string.
- Program continues without a manual `continue`.
6.3 Test Data
```c
printf("Hello %d", 1)
```
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Python disabled | `python` and `source file.py` fail with “Python scripting is not supported in this copy of GDB.” | Install or build GDB with Python support |
| Bad register usage | Garbage strings | Confirm calling convention |
| Excessive output | Slow debugging | Add conditional filters |
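For the “add conditional filters” fix, one option is a small pure-Python predicate that `stop()` consults before writing a line. `should_log` and its `include`/`exclude` parameters are hypothetical. This filters output only: the breakpoint still fires on every call, so it does not remove the per-hit cost.

```python
def should_log(fmt, include=None, exclude=()):
    """Return True if a format string passes the user's filters.

    include: substring that must appear (None disables this check).
    exclude: substrings that suppress the line if any of them appears.
    """
    if include is not None and include not in fmt:
        return False
    return not any(word in fmt for word in exclude)
```

Inside `stop()`, wrap the `gdb.write(...)` call in `if should_log(fmt, exclude=("DEBUG",)):` and still `return False` either way.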
7.2 Debugging Strategies
- Use `print` inside your Python code to debug the script itself.
- Use `set pagination off` so long traces do not stop at GDB’s pager prompt.
7.3 Performance Traps
Tracing high-frequency functions can slow the program significantly.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add a `hello_gdb` custom command.
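A minimal sketch of the beginner extension, assuming the command should greet an optional name (the greeting text and the `greeting` helper are made up for illustration). Instantiating a `gdb.Command` subclass registers the command, and the class docstring becomes its `help` text.

```python
try:
    import gdb  # only importable inside GDB's embedded Python
except ImportError:
    gdb = None


def greeting(argument):
    """Build the reply; an empty argument greets the world."""
    name = argument.strip() or "world"
    return "Hello, %s!" % name


if gdb is not None:
    class HelloGdb(gdb.Command):
        """Print a greeting: hello_gdb [NAME]"""

        def __init__(self):
            super().__init__("hello_gdb", gdb.COMMAND_USER)

        def invoke(self, argument, from_tty):
            gdb.write(greeting(argument) + "\n")

    HelloGdb()  # creating the instance registers the command
```

After `source`-ing the file, `(gdb) hello_gdb` prints `Hello, world!` and `(gdb) help hello_gdb` shows the docstring.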
8.2 Intermediate Extensions
- Build a `memdump` command.
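A sketch of the intermediate extension: `memdump ADDR LENGTH` reads inferior memory and prints a hex/ASCII dump. The command name and output layout are assumptions; `gdb.string_to_argv`, `gdb.parse_and_eval`, `Inferior.read_memory`, and `gdb.GdbError` are GDB Python API calls. Because arguments go through `parse_and_eval`, expressions such as `$rsp` or `&buf` work as addresses.

```python
try:
    import gdb  # only importable inside GDB's embedded Python
except ImportError:
    gdb = None


def hexdump(data, base, width=16):
    """Format bytes as 'address  hex bytes  ascii' lines."""
    lines = []
    for off in range(0, len(data), width):
        row = data[off:off + width]
        hexpart = " ".join("%02x" % b for b in row)
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in row)
        # Pad the hex column so the ASCII column lines up on short rows.
        lines.append("0x%016x  %-*s  %s" % (base + off, width * 3 - 1, hexpart, text))
    return "\n".join(lines)


if gdb is not None:
    class MemDump(gdb.Command):
        """Hex-dump inferior memory: memdump ADDR LENGTH"""

        def __init__(self):
            super().__init__("memdump", gdb.COMMAND_DATA)

        def invoke(self, argument, from_tty):
            argv = gdb.string_to_argv(argument)
            if len(argv) != 2:
                raise gdb.GdbError("usage: memdump ADDR LENGTH")
            addr = int(gdb.parse_and_eval(argv[0]))
            length = int(gdb.parse_and_eval(argv[1]))
            try:
                data = bytes(gdb.selected_inferior().read_memory(addr, length))
            except gdb.MemoryError as exc:
                raise gdb.GdbError(str(exc)) from None
            gdb.write(hexdump(data, addr) + "\n")

    MemDump()
```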
8.3 Advanced Extensions
- Create a pretty-printer for a custom struct.
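A sketch of the advanced extension for a hypothetical `struct point { int x; int y; };` in the target program. `RegexpCollectionPrettyPrinter` and `register_pretty_printer` come from GDB’s `gdb.printing` module; the printer name `project6` and the class `PointPrinter` are illustrative. `PointPrinter` only needs `val["x"]`-style field access, so it can be exercised with a plain dict outside GDB.

```python
try:
    import gdb  # only importable inside GDB's embedded Python
    import gdb.printing
except ImportError:
    gdb = None  # outside GDB: PointPrinter can still be tested with a dict


class PointPrinter:
    """Pretty-printer for the hypothetical struct point { int x; int y; }."""

    def __init__(self, val):
        self.val = val  # a gdb.Value for the struct (or any mapping in tests)

    def to_string(self):
        return "point(x=%d, y=%d)" % (int(self.val["x"]), int(self.val["y"]))


def build_pretty_printer():
    pp = gdb.printing.RegexpCollectionPrettyPrinter("project6")
    # The regexp is matched against the struct's tag name ("point").
    pp.add_printer("point", "^point$", PointPrinter)
    return pp


if gdb is not None:
    # current_objfile() is None when the file is loaded with "source",
    # in which case the printer is registered globally.
    gdb.printing.register_pretty_printer(gdb.current_objfile(), build_pretty_printer())
```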
9. Real-World Connections
9.1 Industry Applications
- Automation: Build reusable debugging tools for your team.
- Observability: Instrument behavior at runtime without changing code.
9.2 Related Open Source Projects
- GDB pretty-printers used in libstdc++.
- rr tooling and extensions.
9.3 Interview Relevance
- Demonstrates ability to automate and extend tooling.
10. Resources
10.1 Essential Reading
- GDB Manual - Python API.
- “The Art of Debugging with GDB, DDD, and Eclipse” - Python chapter.
10.2 Video Resources
- Search: “gdb python scripting”.
10.3 Tools & Documentation
- GDB: https://sourceware.org/gdb/
- Python: https://www.python.org/doc/
10.4 Related Projects in This Series
11. Self-Assessment Checklist
11.1 Understanding
- I can explain how GDB exposes Python hooks.
- I can read registers from Python.
11.2 Implementation
- My tracer logs `printf` without stopping.
- My script handles invalid memory gracefully.
11.3 Growth
- I can imagine a reusable tool for my own codebase.
12. Submission / Completion Criteria
Minimum Viable Completion:
- A script that logs `printf` arguments on each call.
Full Completion:
- A command to enable/disable tracing and robust error handling.
Excellence (Going Above & Beyond):
- A pretty-printer or allocation tracker that works across multiple files.
This guide was generated from LEARN_GDB_DEEP_DIVE.md. For the complete learning path, see the parent directory README.