Project 4: GDB Debugging Deep Dive
Expanded deep-dive guide for Project 4 from the Binary Analysis sprint.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 2: Intermediate |
| Time Estimate | 1-2 weeks |
| Main Programming Language | C (for targets), GDB commands |
| Alternative Programming Languages | Python (GDB scripting) |
| Coolness Level | Level 3: Genuinely Clever |
| Business Potential | 1. The “Resume Gold” |
| Knowledge Area | Debugging / Dynamic Analysis |
| Software or Tool | GDB, pwndbg/GEF, GCC |
| Main Book | “The Art of Debugging with GDB” by Matloff & Salzman |
1. Learning Objectives
- Build a working implementation with reproducible outputs.
- Justify key design choices with binary-analysis principles.
- Produce an evidence-backed report of findings and limitations.
- Document hardening or next-step improvements.
2. All Theory Needed (Per-Concept Breakdown)
This project depends on concepts from the main sprint primer: loader semantics, control/data-flow recovery, runtime observation, and mitigation-aware vulnerability reasoning. Before implementation, restate the project’s core assumptions in your own words and define how you will validate them.
3. Project Specification
3.1 What You Will Build
A series of increasingly complex debugging exercises, culminating in a GDB Python extension for automated analysis.
3.2 Functional Requirements
- Accept the target binary/input and validate format assumptions.
- Produce analyzable outputs (console report and/or artifacts).
- Handle malformed inputs safely with explicit errors.
3.3 Non-Functional Requirements
- Reproducibility: same input should produce equivalent findings.
- Safety: unknown samples run only in isolated lab contexts.
- Clarity: separate facts, hypotheses, and inferred conclusions.
3.4 Expanded Project Brief
- File: P04-gdb-debugging-deep-dive.md
What you’ll build: A series of increasingly complex debugging exercises, culminating in a GDB Python extension for automated analysis.
Why it teaches binary analysis: Debugging is the most direct way to understand program behavior. GDB is the standard open-source debugger on Linux, and it supports scripting, remote targets, and many architectures.
Core challenges you’ll face:
- Setting breakpoints → maps to controlling execution
- Examining memory → maps to understanding data layout
- Stepping through code → maps to following control flow
- Scripting with Python → maps to automating analysis
Resources for key challenges:
- Reversing a Binary with GDB
- GDB Tutorial (GitHub)
- pwndbg - Enhanced GDB for exploit development
Key Concepts:
- Breakpoints and Watchpoints: GDB documentation
- Memory Examination: “The Art of Debugging” Ch. 3
- Python GDB API: GDB Python documentation
Difficulty: Intermediate
Time estimate: 1-2 weeks
Prerequisites: Basic C, assembly basics
Real World Outcome
Deliverables:
- Analysis output or tooling scripts
- Report with control/data flow notes
Validation checklist:
- Parses sample binaries correctly
- Findings are reproducible in debugger
- No unsafe execution outside lab
$ gdb ./target_binary
(gdb) break main
(gdb) run
(gdb) disassemble
(gdb) info registers
(gdb) x/20x $rsp           # Examine stack
(gdb) x/s 0x402000         # Examine string
(gdb) set $rax = 0x1337    # Modify register
(gdb) python
>gdb.execute("info registers")
>frame = gdb.selected_frame()
>print(frame.read_register("rip"))
>end
(gdb) continue
Hints in Layers
Essential GDB commands to master:
# Execution control
run [args] # Start program
continue (c) # Continue execution
stepi (si) # Step one instruction
nexti (ni) # Step over calls
finish # Run until function returns
# Breakpoints
break *0x401000 # Break at address
break main # Break at function
watch *0x7ffd1234 # Break on memory write
catch syscall write # Break on syscall
# Examination
disassemble main # Show assembly
info registers # All registers
x/10i $rip # 10 instructions at RIP
x/20wx $rsp # 20 words at stack
x/s 0x402000 # String at address
info proc mappings # Memory layout
# Modification
set $rax = 0 # Change register
set *(int*)0x401000 = 0x90909090 # Patch memory
Create exercises:
- Find a hidden password in a crackme
- Trace a function’s execution
- Modify a return value to bypass a check
- Write a GDB script to log all function calls
Learning milestones:
- Basic debugging → Set breakpoints, step, examine
- Memory analysis → Understand stack and heap layout
- Modify execution → Change registers and memory
- Python scripting → Automate repetitive tasks
The Core Question You Are Answering
How do you observe and manipulate a running program’s state without modifying its source code, and why is interactive debugging more powerful than static analysis for understanding complex behavior?
Debugging bridges the gap between theory and reality. Static analysis shows what code could do. Dynamic analysis with GDB shows what it actually does—with real data, real timing, and real state.
Concepts You Must Understand First
1. Process Memory Layout and Address Space
When you debug a program, you’re inspecting its virtual memory: code, data, heap, stack, and libraries.
Guiding questions:
- What’s the difference between the stack and the heap?
- Why do local variables live at high addresses and code at low addresses?
- How does GDB access another process’s memory?
Key reading: “Computer Systems: A Programmer’s Perspective” Ch. 9 (Virtual Memory), “Hacking: The Art of Exploitation” Ch. 2 (Programming - Memory Segments)
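To make the layout concrete, here is a minimal sketch that parses lines in the Linux `/proc/<pid>/maps` format, which is the same information `info proc mappings` summarizes. The `Mapping`, `parse_maps_line`, and `classify` names are hypothetical helpers introduced for illustration, and the classification rules are a deliberate simplification.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    start: int
    end: int
    perms: str
    path: str


def parse_maps_line(line: str) -> Mapping:
    """Parse one line of Linux /proc/<pid>/maps into a Mapping."""
    # Fields: address-range perms offset dev inode [pathname]
    fields = line.split(None, 5)
    start, end = (int(part, 16) for part in fields[0].split("-"))
    path = fields[5].strip() if len(fields) > 5 else ""
    return Mapping(start, end, fields[1], path)


def classify(mapping: Mapping) -> str:
    """Rough label for a mapping (simplified heuristic, not GDB's logic)."""
    if mapping.path in ("[stack]", "[heap]"):
        return mapping.path.strip("[]")
    if "x" in mapping.perms:
        return "code"
    if "w" in mapping.perms:
        return "data"
    return "read-only"
```

Feeding it the lines of `/proc/self/maps` (or of a debuggee's maps file) and sorting by `start` shows where the stack sits relative to the code mappings.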
2. Breakpoints: Software vs. Hardware
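Software breakpoints replace the first byte of the target instruction with int3 (0xCC on x86) and keep the original byte so it can be restored. Hardware breakpoints use CPU debug registers and leave the code bytes untouched.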
Guiding questions:
- How does GDB set a software breakpoint without permanently modifying the binary?
- What are the limits on hardware breakpoints? (Typically 4 on x86)
- When would you use a hardware breakpoint instead of software?
Key reading: “The Art of Debugging with GDB, DDD, and Eclipse” Ch. 2 (Breakpoints), Intel SDM Volume 3 Ch. 17 (Debug Registers)
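The save, patch, and restore cycle behind a software breakpoint can be modeled on a plain byte buffer. This is a toy sketch, not how GDB is implemented: `BreakpointTable` is a hypothetical class, and the "memory" is a `bytearray` rather than a traced process.

```python
INT3 = 0xCC  # x86 one-byte breakpoint opcode


class BreakpointTable:
    """Toy model of software breakpoints over a byte buffer."""

    def __init__(self, memory: bytearray, base: int):
        self.memory = memory
        self.base = base      # address of memory[0]
        self.saved = {}       # breakpoint address -> original byte

    def insert(self, addr: int) -> None:
        """Remember the original byte, then overwrite it with int3."""
        if addr not in self.saved:
            offset = addr - self.base
            self.saved[addr] = self.memory[offset]
            self.memory[offset] = INT3

    def remove(self, addr: int) -> None:
        """Put the original byte back."""
        self.memory[addr - self.base] = self.saved.pop(addr)

    def resume_pc(self, pc_after_trap: int) -> int:
        """After int3 runs, the PC is one byte past it; rewind by one."""
        addr = pc_after_trap - 1
        if addr not in self.saved:
            raise KeyError("trap was not caused by one of our breakpoints")
        return addr
```

A real debugger performs the same steps through `ptrace` reads and writes, and it must also rewind the program counter before re-executing the original instruction, which `resume_pc` illustrates.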
3. The Call Stack and Stack Frames
The stack grows with each function call. Each frame contains local variables, saved registers, and the return address.
Guiding questions:
- How does GDB’s backtrace command work?
- What’s stored in the base pointer (RBP) and stack pointer (RSP)?
- How can you inspect a caller’s variables from a deeper function?
Key reading: “Computer Systems: A Programmer’s Perspective” Ch. 3.7 (Procedures), “Hacking: The Art of Exploitation” Ch. 3 (Exploitation - Stack Overflows)
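One way to see what `backtrace` does is to walk the frame chain yourself. The sketch below follows `older()` links from the newest frame, which is the interface `gdb.Frame` exposes; `walk_frames`, `format_backtrace`, and the `py-bt` command name are hypothetical names chosen for illustration. The `import gdb` guard only exists so the pure-Python part can be exercised outside GDB.

```python
try:
    import gdb  # present only inside GDB's embedded Python
except ImportError:
    gdb = None


def walk_frames(newest):
    """Yield (level, name, pc) from the newest frame outward.

    Works with any object exposing older(), name(), and pc().
    """
    level, frame = 0, newest
    while frame is not None:
        yield level, frame.name() or "??", frame.pc()
        frame = frame.older()
        level += 1


def format_backtrace(newest):
    """Render frames in a backtrace-like '#N 0xPC in name' form."""
    return ["#{} {:#x} in {}".format(level, pc, name)
            for level, name, pc in walk_frames(newest)]


if gdb is not None:
    class PyBacktrace(gdb.Command):
        """py-bt: print a backtrace using the Python frame API."""

        def __init__(self):
            super().__init__("py-bt", gdb.COMMAND_STACK)

        def invoke(self, arg, from_tty):
            for line in format_backtrace(gdb.newest_frame()):
                gdb.write(line + "\n")

    PyBacktrace()
```

Inside GDB, loading the file with `source` and running `py-bt` at a breakpoint should print one line per frame, which you can compare against the built-in `backtrace`.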
4. Symbols and Debug Information (DWARF)
Stripped binaries have no local function or variable names; only dynamic symbols (in .dynsym) survive. Binaries compiled with -g contain DWARF debug info mapping addresses to source lines, variables, and types.
Guiding questions:
- What’s the difference between a stripped and non-stripped binary?
- How does GDB find variable names and types?
- Can you debug a stripped binary? What do you lose?
Key reading: “Practical Binary Analysis” Ch. 5.3 (Symbols and Stripped Binaries), DWARF Debugging Standard documentation
5. Watchpoints: Breaking on Data, Not Code
Watchpoints trigger when memory is read, written, or changes value. Crucial for finding “who modified this variable?”
Guiding questions:
- How are watchpoints implemented? (Hint: hardware debug registers)
- What’s the performance cost of watchpoints?
- Can you watch a range of addresses or only individual locations?
Key reading: “The Art of Debugging with GDB” Ch. 3 (Watchpoints and Catchpoints), GDB Documentation (Watchpoints section)
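The range question has a concrete answer on x86: each of the four debug address registers covers 1, 2, 4, or 8 bytes, aligned to that size, so a larger or misaligned range has to be split. The sketch below shows one greedy way to plan such a split; `plan_watch_regions` and `fits_in_hardware` are hypothetical helpers, and GDB's own allocation logic may differ.

```python
DEBUG_REGISTERS = 4          # DR0-DR3 on x86
REGION_SIZES = (8, 4, 2, 1)  # lengths a debug register can cover (8 needs 64-bit mode)


def plan_watch_regions(addr: int, length: int):
    """Split [addr, addr+length) into size-aligned regions, largest first."""
    regions = []
    remaining = length
    while remaining > 0:
        # Pick the largest size that is aligned at addr and fits in what is left.
        size = next(s for s in REGION_SIZES if addr % s == 0 and s <= remaining)
        regions.append((addr, size))
        addr += size
        remaining -= size
    return regions


def fits_in_hardware(addr: int, length: int) -> bool:
    """True if the range can be covered by the available debug registers."""
    return len(plan_watch_regions(addr, length)) <= DEBUG_REGISTERS
```

An aligned 8-byte variable needs one register, while the same 8 bytes starting at an odd address need four, which is why a watchpoint on a large buffer can silently fall back to slow software single-stepping.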
6. GDB’s Python API and Automation
GDB embeds Python for scripting. You can automate tasks, write custom commands, and analyze program state programmatically.
Guiding questions:
- How do you access registers from Python in GDB?
- Can you set breakpoints from a Python script?
- How would you log every function call automatically?
Key reading: GDB Python API documentation, “The Art of Debugging with GDB” Ch. 8 (Scripting)
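As a starting point for the logging question, the sketch below subclasses `gdb.Breakpoint` and logs from its `stop` method, returning `False` so the program keeps running. The `CallLogger` class, the `log-calls` command name, and the `format_call` helper are hypothetical; the guard around `import gdb` is only there so the formatting helper can be checked outside GDB.

```python
try:
    import gdb  # present only inside GDB's embedded Python
except ImportError:
    gdb = None


def format_call(name, pc):
    """One log line per call: function name and entry address."""
    return "call {} @ {:#x}".format(name, pc)


if gdb is not None:
    class CallLogger(gdb.Breakpoint):
        """Breakpoint that logs the hit and never halts the program."""

        def stop(self):
            frame = gdb.newest_frame()
            gdb.write(format_call(frame.name() or "??", frame.pc()) + "\n")
            return False  # False tells GDB not to stop here

    class LogCalls(gdb.Command):
        """log-calls FUNC [FUNC...]: log each entry to the named functions."""

        def __init__(self):
            super().__init__("log-calls", gdb.COMMAND_USER)

        def invoke(self, arg, from_tty):
            for name in gdb.string_to_argv(arg):
                CallLogger(name)

    LogCalls()
```

After `source`-ing the file, something like `log-calls add calculate` followed by `run` should print a line per call without interrupting execution. Logging every function in a binary this way is possible but slow, so start with a short list.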
7. Debugging Multi-Threaded Programs
Threads share memory but have separate stacks and registers. Debugging threads requires understanding concurrency.
Guiding questions:
- How do you switch between threads in GDB?
- What happens when one thread hits a breakpoint—do others stop?
- How do you debug race conditions?
Key reading: “Computer Systems: A Programmer’s Perspective” Ch. 12 (Concurrent Programming), “The Art of Debugging with GDB” Ch. 6 (Debugging Multi-threaded Programs)
8. Remote Debugging and Embedded Systems
GDB can debug programs on remote systems or embedded devices using the GDB Remote Serial Protocol.
Guiding questions:
- How does gdbserver communicate with GDB?
- Can you debug a program on a different architecture?
- What’s the difference between native and remote debugging?
Key reading: GDB Documentation (Remote Debugging), “Embedded Systems Architecture” by Tammy Noergaard (GDB sections)
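The Remote Serial Protocol frames each command as `$payload#checksum`, where the checksum is the sum of the payload bytes modulo 256 written as two hex digits. The sketch below shows only that framing; `rsp_frame` and `rsp_unframe` are hypothetical helper names, and escaping, run-length encoding, and the `+`/`-` acknowledgements are left out.

```python
def rsp_checksum(payload: bytes) -> int:
    """Modulo-256 sum of the payload bytes."""
    return sum(payload) % 256


def rsp_frame(payload: bytes) -> bytes:
    """Wrap a payload as $<payload>#<two hex digits>."""
    return b"$" + payload + b"#" + b"%02x" % rsp_checksum(payload)


def rsp_unframe(packet: bytes) -> bytes:
    """Validate framing and checksum, then return the payload."""
    if len(packet) < 4 or not packet.startswith(b"$") or packet[-3:-2] != b"#":
        raise ValueError("not a framed RSP packet")
    payload = packet[1:-3]
    if int(packet[-2:], 16) != rsp_checksum(payload):
        raise ValueError("checksum mismatch")
    return payload
```

Running `set debug remote 1` in GDB while connected to `gdbserver` prints the real packets, which you can compare against what `rsp_frame` produces for simple commands such as `g` (read registers) or `?` (stop reason).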
Questions to Guide Your Design
- What exercises will teach you the most? Simple “hello world” debugging is boring. What about reversing a password checker? Analyzing a buffer overflow? Tracing a complex data structure?
- How will you structure your learning progression? Start with basic commands, then breakpoints, then memory examination, then modification, then Python scripting?
- Will you use GDB plugins (pwndbg, GEF, peda)? These add powerful features for exploit development. When should you learn vanilla GDB vs. enhanced versions?
- What real-world scenarios will you practice? Debugging a segfault? Finding a memory leak? Analyzing a crackme? Reverse engineering a proprietary binary?
- How will you document your GDB knowledge? Build a cheat sheet? Create a reference of common commands? Write GDB scripts you can reuse?
- Will you learn GDB’s TUI mode? The Text User Interface shows code, registers, and assembly simultaneously. It’s powerful but has a learning curve.
- What target binaries will you debug? Toy programs you write, existing open-source software, CTF challenges, or malware samples?
- How will you practice without source code? Debugging stripped binaries is a critical skill for reverse engineering.
Thinking Exercise
Before writing Python scripts, master these manual exercises:
Exercise 1: Follow a Function Call Chain
Compile this with gcc -g:
#include <stdio.h>
int add(int a, int b) { return a + b; }
int calculate(int x) { return add(x, 10); }
int main() {
int result = calculate(5);
printf("Result: %d\n", result);
return 0;
}
In GDB:
- Set breakpoint on main
- Run and step into calculate (use step, not next)
- Step into add
- At each frame, use backtrace to see the call stack
- Use frame 1 to inspect calculate’s local variables
- Use up and down to navigate frames
Exercise 2: Find Where a Variable Changes
#include <stdio.h>

int main(void) {
    int secret = 100;
    secret += 20;
    secret *= 2;
    secret -= 50;
    printf("Secret: %d\n", secret);
    return 0;
}
Use a watchpoint:
- Break at first line of main
- Run to breakpoint
- watch secret (sets watchpoint on the variable)
- continue repeatedly, noting when and where secret changes
- Examine the assembly at each trigger point
Exercise 3: Modify Execution Flow
Compile a password checker:
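#include <stdio.h>
#include <string.h>

/* Returns 1 when pass matches the hard-coded password, 0 otherwise. */
int check_password(const char *pass) {
    return strcmp(pass, "letmein") == 0;
}

int main(void) {
    char input[50];
    if (fgets(input, sizeof input, stdin) == NULL) {
        return 1;
    }
    input[strcspn(input, "\n")] = '\0';  /* drop the newline fgets keeps */
    if (check_password(input)) {
        printf("Access granted!\n");
    } else {
        printf("Access denied!\n");
    }
    return 0;
}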
In GDB, bypass the check:
- Break on check_password, enter a wrong password, then use finish to run until the function returns
- Examine $rax (return value of check_password)
- Use set $rax = 1 to force success
- continue and see “Access granted” despite wrong password
Exercise 4: Examine Data Structures
struct person {
char name[20];
int age;
float salary;
};
int main() {
struct person p = {"Alice", 30, 75000.0};
return 0;
}
In GDB:
- Break after struct initialization
- print p (shows entire structure)
- print p.name
- print &p (shows address)
- x/28xb &p (examine raw bytes; 28 is sizeof(p) on typical x86-64 builds)
- ptype p (shows structure definition)
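To predict what the raw byte dump should look like before you run `x`, you can mirror the struct with `ctypes`. This is a sketch under the assumption that your platform uses a 4-byte `int` and `float` with natural alignment, which holds on typical x86-64 Linux builds; `Person` and `layout` are illustrative names.

```python
import ctypes


class Person(ctypes.Structure):
    """Mirror of `struct person` from the exercise."""
    _fields_ = [
        ("name", ctypes.c_char * 20),
        ("age", ctypes.c_int),
        ("salary", ctypes.c_float),
    ]


def layout(struct_type):
    """Return (field, offset, size) for each member of a ctypes struct."""
    return [
        (name, getattr(struct_type, name).offset, getattr(struct_type, name).size)
        for name, _ctype in struct_type._fields_
    ]


p = Person(b"Alice", 30, 75000.0)
raw = bytes(p)  # the same bytes GDB would show for &p, in host byte order

for field, offset, size in layout(Person):
    print("{:<6} offset={:>2} size={:>2}".format(field, offset, size))
```

Comparing `raw.hex()` against the debugger's byte dump is a quick check that you are reading the name, age, and salary fields at the offsets you expect.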
Exercise 5: Reverse Engineering a Stripped Binary
Compile without -g and strip:
gcc -O2 -o mystery mystery.c
strip mystery
Now debug it:
- gdb mystery
- disassemble main fails (no symbol table), so find the entry point instead
- info files to see entry point
- break *0x... (break at address, not function name)
- Step through assembly, figuring out what the program does
This is real reverse engineering.
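The entry point that `info files` reports comes from the ELF file header, and reading it yourself is a useful cross-check. The sketch below handles only little-endian ELF64 files; `read_elf64_header` is a hypothetical helper, and for a PIE binary (`ET_DYN`) the value is an offset from the load base rather than a final runtime address.

```python
import struct

# e_ident, e_type, e_machine, e_version, e_entry, e_phoff, e_shoff, e_flags,
# e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")
ET_NAMES = {2: "ET_EXEC (fixed address)", 3: "ET_DYN (PIE or shared object)"}


def read_elf64_header(data: bytes) -> dict:
    """Extract the file type, machine, and entry point from an ELF64 header."""
    if len(data) < ELF64_HEADER.size:
        raise ValueError("need at least 64 bytes of header")
    fields = ELF64_HEADER.unpack_from(data)
    ident, e_type, e_machine, _version, e_entry = fields[:5]
    if ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if ident[4] != 2 or ident[5] != 1:
        raise ValueError("this sketch handles only little-endian ELF64")
    return {
        "type": ET_NAMES.get(e_type, hex(e_type)),
        "machine": e_machine,
        "entry": e_entry,
    }
```

A typical use is `read_elf64_header(open("mystery", "rb").read(64))`. If the type is `ET_DYN`, stop the program first (for example with `starti`) and add the entry offset to the base shown by `info proc mappings` before setting an address breakpoint.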
The Interview Questions They’ll Ask
- “How does GDB implement software breakpoints?”
- GDB saves the original instruction byte at the breakpoint address and replaces it with int3 (0xCC on x86). When int3 executes, the kernel stops the process with SIGTRAP and notifies GDB, which restores the saved byte when the breakpoint is removed or stepped over.
- “What’s the difference between step and next?”
- step (stepi/si at the instruction level) steps into function calls. next (nexti/ni) steps over them, treating each call as a single step.
- “How can you find what caused a segmentation fault?”
- Run the program in GDB. When it crashes, use backtrace to see the call stack, info registers to see register values, and x/i $rip to see the faulting instruction. If a pointer register used by that instruction (often $rdi or $rsi) holds 0, the fault is a NULL dereference.
- “Explain how watchpoints work.”
- Watchpoints use hardware debug registers (DR0-DR3 on x86) to trigger exceptions when memory is accessed. Limited to 4 simultaneous watchpoints. Software watchpoints exist but are very slow (single-step execution).
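- “How do you debug a program that immediately crashes?”
- Use starti to stop at the very first instruction, before the dynamic loader and main run, then step forward from there. Alternatively, run the program under GDB and let it stop at the fatal signal, then use backtrace to see how far startup got.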
- “What’s the purpose of ASLR and how do you handle it in GDB?”
- Address Space Layout Randomization places code/libraries at random addresses for security. On Linux, GDB disables ASLR by default for programs it starts (set disable-randomization on), which keeps breakpoint addresses consistent across runs. Use set disable-randomization off to reproduce randomized layouts. A process you attach to keeps the layout it already has.
- “How do you debug a running process without restarting it?”
- Use gdb -p <PID> to attach to a running process. Attaching stops the process (SIGSTOP), lets you set breakpoints, then you continue.
- “What’s the difference between a core dump and live debugging?”
- A core dump is a snapshot of memory at crash time. You can debug it with gdb program core, but it’s read-only (no execution). Live debugging lets you run, modify, and restart.
- “How would you automatically log every function call?”
- Write a Python script using GDB’s Python API. One option is to subclass gdb.Breakpoint for each function of interest and log from its stop method, returning False so execution continues. Another is to hook gdb.events.stop while single-stepping, check whether the current instruction is a call, and log the target from symbols or by disassembling; this is far slower.
- “What information is lost when debugging a stripped binary?”
- Function names, variable names, type information, source line mappings. You only have addresses, raw assembly, and sometimes dynamic symbols (from .dynsym).
Books That Will Help
| Topic | Book | Chapter/Section |
|---|---|---|
| GDB Basics | “The Art of Debugging with GDB, DDD, and Eclipse” by Matloff & Salzman | Ch. 1-3: GDB Fundamentals |
| Memory Layout | “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron | Ch. 9: Virtual Memory |
| Stack and Calling Conventions | “Computer Systems: A Programmer’s Perspective” | Ch. 3.7: Procedures |
| Breakpoints Internals | “The Art of Debugging with GDB” | Ch. 2: Breakpoints |
| Watchpoints | “The Art of Debugging with GDB” | Ch. 3: Watchpoints and Catchpoints |
| GDB Python API | “The Art of Debugging with GDB” | Ch. 8: Other GDB Topics (Scripting) |
| Debugging Multi-threaded Programs | “The Art of Debugging with GDB” | Ch. 6: Debugging Multi-threaded Programs |
| Symbols and DWARF | “Practical Binary Analysis” by Dennis Andriesse | Ch. 5.3: Symbols and Stripped Binaries |
| Dynamic Analysis | “Practical Malware Analysis” by Sikorski & Honig | Ch. 3: Basic Dynamic Analysis |
| Reverse Engineering with GDB | “Practical Binary Analysis” | Ch. 5: Basic Binary Analysis in Linux |
| Exploitation and GDB | “Hacking: The Art of Exploitation” by Jon Erickson | Ch. 3: Exploitation (Using GDB) |
| Stack Smashing | “Hacking: The Art of Exploitation” | Ch. 3.3: Stack-Based Buffer Overflows |
| CPU Debug Registers | Intel 64/IA-32 SDM Volume 3 | Ch. 17: Debug, Branch Profile, TSC, and Quality of Service |
| Remote Debugging | GDB Documentation (official) | Remote Debugging section |
| Core Dumps | “The Art of Debugging with GDB” | Ch. 4: Core Files |
ASCII Diagram: GDB Process Interaction
+----------------------+ ptrace() system call +--------------------+
| | <------------------------------------- | |
| Target Process | | GDB Debugger |
| (Your Program) | --------------------------------------> | (Controller) |
| | Memory read/write | |
+----------------------+ Register access +--------------------+
| Set breakpoints |
| |
| |
v v
+-------------------+ +-----------------+
| Virtual Memory | | GDB Commands |
| +---------------+ | | - break |
| | Stack | | <-- GDB can read/write | - run |
| | (local vars) | | any of this memory | - step/next |
| +---------------+ | | - print |
| | Heap | | | - x (examine) |
| | (malloc'd) | | | - set |
| +---------------+ | | - backtrace |
| | .data | | | - disassemble |
| | (globals) | | +-----------------+
| +---------------+ |
| | .text | |
| | (code) | | <-- Software breakpoint: int3 (0xCC)
| | ... | | Hardware breakpoint: DR0-DR3 registers
| | 0x401000: RET | |
| +---------------+ |
+-------------------+
Breakpoint Mechanism:
Original: 0x401000: 55 (push rbp)
GDB sets: 0x401000: CC (int3 trap instruction)
When hit: Kernel sends SIGTRAP to GDB
GDB: Restores original byte (55), rewinds RIP to 0x401000
Shows user the breakpoint hit
User can inspect/modify state
Continue: Executes real instruction (55)
Re-inserts breakpoint (CC) if persistent
GDB Command Categories
Execution Control:
run (r) - Start program
continue (c) - Resume execution
step (s) - Step into (source line)
stepi (si) - Step into (instruction)
next (n) - Step over (source line)
nexti (ni) - Step over (instruction)
finish - Run until function returns
until <location> - Run until location
Breakpoints:
break <where> - Set breakpoint
break main
break *0x401000
break file.c:42
watch <expr> - Break on write
rwatch <expr> - Break on read
awatch <expr> - Break on access
catch <event> - Break on event
catch syscall write
info breakpoints - List all breakpoints
delete <n> - Delete breakpoint
Examination:
print <expr> - Print value
print $rax
print myvar
print/x $rsp (hex format)
x/<n><f><u> <addr> - Examine memory
x/10i $rip (10 instructions)
x/20xw $rsp (20 words in hex)
x/s 0x402000 (string)
info registers - Show all registers
info frame - Current stack frame
backtrace (bt) - Call stack
disassemble <where> - Show assembly
Modification:
set <var> = <value> - Change variable
set $rax = 0
set myvar = 100
set *(int*)0x401000 = 0x90909090
Process Info:
info proc mappings - Memory map
info sharedlibrary - Loaded libraries
info threads - List threads
thread <n> - Switch to thread
Python Scripting:
python <code> - Execute Python
python-interactive - Python REPL
source script.py - Run script
Key Insight: GDB isn’t just for finding bugs. It is also a general-purpose reverse engineering tool. Combined with scripting, you can automate complex analysis: trace all heap allocations, log every comparison against a password, or build a complete call graph. Once you are fluent in GDB, you can observe what almost any binary you are able to run actually does.
Common Pitfalls and Debugging
Problem 1: “Your interpretation does not match runtime behavior”
- Why: Static analysis can hide runtime-resolved addresses, lazy binding, and input-dependent branches.
- Fix: Reproduce the path with debugger or tracer, then compare static assumptions against live register/memory state.
- Quick test: Run the same sample through both your static workflow and a debugger transcript, and confirm control-flow decisions align.
Problem 2: “Tool output is inconsistent across machines”
- Why: ASLR, tool version drift, and different binary build flags (PIE, RELRO, symbols stripped) change observed addresses and metadata.
- Fix: Pin tool versions, capture checksec/metadata, and document environment assumptions in your report.
- Quick test: Re-run analysis in a container or VM with pinned tools and compare hashes of generated outputs.
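Comparing hashes only works if run-specific noise is removed first. The sketch below hashes a report after replacing every hexadecimal literal with a placeholder, so that two runs differing only in load addresses produce the same digest. `normalized_digest` is a hypothetical helper, and masking all hex literals is a blunt assumption that also hides meaningful constants, so tighten the pattern for real reports.

```python
import hashlib
import re

# Matches any 0x-prefixed hex literal; deliberately broad for this sketch.
ADDRESS = re.compile(r"0x[0-9a-fA-F]+")


def normalized_digest(report: str) -> str:
    """SHA-256 of the report with hex literals masked out."""
    normalized = ADDRESS.sub("0xADDR", report)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

Store the digest next to the tool versions and checksec output in your report, then recompute it on the second machine to confirm the findings match.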
Problem 3: “Analysis accidentally executes unsafe code”
- Why: Dynamic workflows run binaries in host context without sufficient isolation.
- Fix: Use disposable snapshots, no-network execution, and non-privileged users for all unknown samples.
- Quick test: Validate isolation controls first (network disabled, snapshot active, unprivileged user), then execute sample.
Definition of Done
- Core functionality works on reference inputs
- Edge cases are tested and documented
- Results are reproducible (same binary, same tools, same report output)
- Analysis notes clearly separate observations, assumptions, and conclusions
- Lab safety controls were applied for any dynamic execution
4. Solution Architecture
Input Artifact -> Parse/Decode -> Analysis Engine -> Validation Layer -> Report
Design each stage so intermediate artifacts are inspectable (JSON/text/notes), which makes debugging and peer review much easier.
5. Implementation Phases
Phase 1: Foundation
- Define input assumptions and format checks.
- Produce a minimal golden output on one known sample.
Phase 2: Core Functionality
- Implement full analysis pass for normal cases.
- Add validation against an external ground-truth tool.
Phase 3: Hard Cases and Reporting
- Add malformed/edge-case handling.
- Finalize report template and reproducibility notes.
6. Testing Strategy
- Unit-level checks for parser/decoder helpers.
- Integration checks against known binaries/challenges.
- Regression tests for previously failing cases.
7. Extensions & Challenges
- Add automation for batch analysis and comparative reports.
- Add confidence scoring for each major finding.
- Add export formats suitable for CI/security pipelines.
8. Production Reflection
Map your project output to a production analogue: what reliability, observability, and security controls would be required to run this continuously in an engineering organization?