Project 4: GDB Debugging Deep Dive

Expanded deep-dive guide for Project 4 from the Binary Analysis sprint.

Quick Reference

Attribute | Value
Difficulty | Level 2: Intermediate
Time Estimate | 1-2 weeks
Main Programming Language | C (for targets), GDB commands
Alternative Programming Languages | Python (GDB scripting)
Coolness Level | Level 3: Genuinely Clever
Business Potential | 1. The “Resume Gold”
Knowledge Area | Debugging / Dynamic Analysis
Software or Tool | GDB, pwndbg/GEF, GCC
Main Book | “The Art of Debugging with GDB” by Matloff & Salzman

1. Learning Objectives

  1. Build a working implementation with reproducible outputs.
  2. Justify key design choices with binary-analysis principles.
  3. Produce an evidence-backed report of findings and limitations.
  4. Document hardening or next-step improvements.

2. All Theory Needed (Per-Concept Breakdown)

This project depends on concepts from the main sprint primer: loader semantics, control/data-flow recovery, runtime observation, and mitigation-aware vulnerability reasoning. Before implementation, restate the project’s core assumptions in your own words and define how you will validate them.

3. Project Specification

3.1 What You Will Build

A series of increasingly complex debugging exercises, culminating in a GDB Python extension for automated analysis.

3.2 Functional Requirements

  1. Accept the target binary/input and validate format assumptions.
  2. Produce analyzable outputs (console report and/or artifacts).
  3. Handle malformed inputs safely with explicit errors.

3.3 Non-Functional Requirements

  • Reproducibility: same input should produce equivalent findings.
  • Safety: unknown samples run only in isolated lab contexts.
  • Clarity: separate facts, hypotheses, and inferred conclusions.

3.4 Expanded Project Brief

  • File: P04-gdb-debugging-deep-dive.md

  • Main Programming Language: C (for targets), GDB commands
  • Alternative Programming Languages: Python (GDB scripting)
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Debugging / Dynamic Analysis
  • Software or Tool: GDB, pwndbg/GEF, GCC
  • Main Book: “The Art of Debugging with GDB” by Matloff & Salzman

What you’ll build: A series of increasingly complex debugging exercises, culminating in a GDB Python extension for automated analysis.

Why it teaches binary analysis: Debugging is the most direct way to observe program behavior, and GDB is the standard open-source debugger on Linux, scriptable in Python and extensible through plugins such as pwndbg and GEF.

Core challenges you’ll face:

  • Setting breakpoints → maps to controlling execution
  • Examining memory → maps to understanding data layout
  • Stepping through code → maps to following control flow
  • Scripting with Python → maps to automating analysis

Resources for key challenges:

Key Concepts:

  • Breakpoints and Watchpoints: GDB documentation
  • Memory Examination: “The Art of Debugging” Ch. 3
  • Python GDB API: GDB Python documentation

Difficulty: Intermediate
Time estimate: 1-2 weeks
Prerequisites: Basic C, assembly basics

Real World Outcome

Deliverables:

  • Analysis output or tooling scripts
  • Report with control/data flow notes

Validation checklist:

  • Parses sample binaries correctly
  • Findings are reproducible in debugger
  • No unsafe execution outside lab

Example session (addresses are illustrative):

    $ gdb ./target_binary
    (gdb) break main
    (gdb) run
    (gdb) disassemble
    (gdb) info registers
    (gdb) x/20x $rsp           # Examine stack
    (gdb) x/s 0x402000         # Examine string
    (gdb) set $rax = 0x1337    # Modify register
    (gdb) python
    >gdb.execute("info registers")
    >frame = gdb.selected_frame()
    >print(frame.read_register("rip"))
    >end
    (gdb) continue
    

Hints in Layers

Essential GDB commands to master:

# Execution control
run [args]           # Start program
continue (c)         # Continue execution
stepi (si)           # Step one instruction
nexti (ni)           # Step one instruction, stepping over calls
finish               # Run until function returns

# Breakpoints
break *0x401000      # Break at address
break main           # Break at function
watch *0x7ffd1234    # Break on memory write
catch syscall write  # Break on syscall

# Examination
disassemble main     # Show assembly
info registers       # All registers
x/10i $rip           # 10 instructions at RIP
x/20wx $rsp          # 20 words at stack
x/s 0x402000         # String at address
info proc mappings   # Memory layout

# Modification
set $rax = 0         # Change register
set *(int*)0x401000 = 0x90909090  # Patch memory

Create exercises:

  1. Find a hidden password in a crackme
  2. Trace a function’s execution
  3. Modify a return value to bypass a check
  4. Write a GDB script to log all function calls

Learning milestones:

  1. Basic debugging → Set breakpoints, step, examine
  2. Memory analysis → Understand stack and heap layout
  3. Modify execution → Change registers and memory
  4. Python scripting → Automate repetitive tasks

The Core Question You Are Answering

How do you observe and manipulate a running program’s state without modifying its source code, and why is interactive debugging more powerful than static analysis for understanding complex behavior?

Debugging bridges the gap between theory and reality. Static analysis shows what code could do. Dynamic analysis with GDB shows what it actually does, with real data, real timing, and real state.

Concepts You Must Understand First

1. Process Memory Layout and Address Space

When you debug a program, you’re inspecting its virtual memory: code, data, heap, stack, and libraries.

Guiding questions:

  • What’s the difference between the stack and the heap?
  • Why do local variables live at high addresses and code at low addresses?
  • How does GDB access another process’s memory?

Key reading: “Computer Systems: A Programmer’s Perspective” Ch. 9 (Virtual Memory), “Hacking: The Art of Exploitation” Ch. 2 (Programming - Memory Segments)
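To make the layout concrete, here is a minimal sketch that parses text in the format of Linux's /proc/<pid>/maps, which holds the same information `info proc mappings` displays. The sample lines, the `Mapping` type, and the helper names are illustrative assumptions, not output from a real process.

```python
from typing import NamedTuple, Optional


class Mapping(NamedTuple):
    start: int
    end: int
    perms: str
    path: str


# Illustrative sample only; real addresses differ per run and per machine.
SAMPLE = """\
00400000-00401000 r--p 00000000 08:01 131 /home/user/target
00401000-00402000 r-xp 00001000 08:01 131 /home/user/target
01c3a000-01c5b000 rw-p 00000000 00:00 0 [heap]
7ffd1a2b0000-7ffd1a2d1000 rw-p 00000000 00:00 0 [stack]
"""


def parse_maps(text: str) -> list:
    """Turn maps-style text into a list of Mapping records."""
    mappings = []
    for line in text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        start, end = (int(part, 16) for part in fields[0].split("-"))
        path = fields[5].strip() if len(fields) > 5 else ""
        mappings.append(Mapping(start, end, fields[1], path))
    return mappings


def region_of(mappings: list, addr: int) -> Optional[Mapping]:
    """Return the mapping that contains addr, or None if it is unmapped."""
    for m in mappings:
        if m.start <= addr < m.end:
            return m
    return None
```

Feeding a pointer value from `info registers` to `region_of` tells you whether it points into code, heap, or stack, which is often the first question when reading an unfamiliar register dump.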

2. Breakpoints: Software vs. Hardware

Software breakpoints replace instruction bytes with int3 (0xCC on x86). Hardware breakpoints use CPU debug registers.

Guiding questions:

  • How does GDB set a software breakpoint without permanently modifying the binary?
  • What are the limits on hardware breakpoints? (Typically 4 on x86)
  • When would you use a hardware breakpoint instead of software?

Key reading: “The Art of Debugging with GDB, DDD, and Eclipse” Ch. 2 (Breakpoints), Intel SDM Volume 3 Ch. 17 (Debug Registers)
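The save/patch/restore cycle behind a software breakpoint can be sketched on a plain byte buffer. This is a simulation under stated assumptions: `BreakpointTable` is a hypothetical helper, and a real debugger writes to the target's memory through ptrace rather than to a local bytearray.

```python
INT3 = 0xCC  # x86 one-byte breakpoint opcode


class BreakpointTable:
    """Remembers original bytes so every patch can be undone (hypothetical helper)."""

    def __init__(self, memory: bytearray, base: int):
        self.memory = memory  # stands in for the target's code bytes
        self.base = base      # address of memory[0]
        self.saved = {}       # breakpoint address -> original byte

    def insert(self, addr: int) -> None:
        offset = addr - self.base
        if addr not in self.saved:  # never record 0xCC as the "original"
            self.saved[addr] = self.memory[offset]
        self.memory[offset] = INT3

    def remove(self, addr: int) -> None:
        self.memory[addr - self.base] = self.saved.pop(addr)

    def address_of_trap(self, pc_after_trap: int) -> int:
        """After int3 runs, the PC is one byte past the patched address."""
        addr = pc_after_trap - 1
        if addr not in self.saved:
            raise KeyError("trap did not come from one of our breakpoints")
        return addr


# push rbp; mov rbp, rsp; pop rbp; ret
code = bytearray(b"\x55\x48\x89\xe5\x5d\xc3")
table = BreakpointTable(code, base=0x401000)
table.insert(0x401000)
```

The binary on disk is never touched: only the in-memory copy is patched, and the saved byte is what lets the debugger show you the original disassembly while the breakpoint is active.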

3. The Call Stack and Stack Frames

The stack grows with each function call. Each frame contains local variables, saved registers, and the return address.

Guiding questions:

  • How does GDB’s backtrace command work?
  • What’s stored in the base pointer (RBP) and stack pointer (RSP)?
  • How can you inspect a caller’s variables from a deeper function?

Key reading: “Computer Systems: A Programmer’s Perspective” Ch. 3.7 (Procedures), “Hacking: The Art of Exploitation” Ch. 3 (Exploitation - Stack Overflows)
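One way to picture a backtrace is as a walk along the chain of saved base pointers. The sketch below assumes x86-64 code built with frame pointers, where [RBP] holds the caller's RBP and [RBP+8] holds the return address. GDB itself normally relies on DWARF call frame information rather than this chain, so treat this as a simplified model; the stack contents and addresses are made up.

```python
def walk_frames(read_u64, rbp: int, max_depth: int = 64) -> list:
    """Collect return addresses by following saved-RBP links.

    read_u64 is any callable that returns the 8-byte value at an address.
    """
    return_addresses = []
    while rbp != 0 and len(return_addresses) < max_depth:
        saved_rbp = read_u64(rbp)
        return_addresses.append(read_u64(rbp + 8))
        # The stack grows down, so each caller's frame must sit at a higher
        # address. Anything else means the chain is corrupt; stop walking.
        if saved_rbp != 0 and saved_rbp <= rbp:
            break
        rbp = saved_rbp
    return return_addresses


# Made-up stack for main -> calculate -> add, stopped inside add.
STACK = {
    0x7FFC00001000: 0x7FFC00001020,  # add's frame: saved RBP (calculate's)
    0x7FFC00001008: 0x401150,        # return address inside calculate
    0x7FFC00001020: 0x7FFC00001040,  # calculate's frame: saved RBP (main's)
    0x7FFC00001028: 0x401170,        # return address inside main
    0x7FFC00001040: 0x0,             # main's frame: end of chain in this model
    0x7FFC00001048: 0x7FFFF7C29D90,  # return address inside libc startup code
}

frames = walk_frames(STACK.__getitem__, 0x7FFC00001000)
```

Each collected return address corresponds to one line of `backtrace` output once it is mapped back to a function name.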

4. Symbols and Debug Information (DWARF)

Stripped binaries have no function names. Binaries compiled with -g contain DWARF debug info mapping addresses to source lines.

Guiding questions:

  • What’s the difference between a stripped and non-stripped binary?
  • How does GDB find variable names and types?
  • Can you debug a stripped binary? What do you lose?

Key reading: “Practical Binary Analysis” Ch. 5.3 (Symbols and Stripped Binaries), DWARF Debugging Standard documentation

5. Watchpoints: Breaking on Data, Not Code

Watchpoints trigger when memory is read, written, or changes value. Crucial for finding “who modified this variable?”

Guiding questions:

  • How are watchpoints implemented? (Hint: hardware debug registers)
  • What’s the performance cost of watchpoints?
  • Can you watch a range of addresses or only individual locations?

Key reading: “The Art of Debugging with GDB” Ch. 3 (Watchpoints and Catchpoints), GDB Documentation (Watchpoints section)
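The question about watching a range has a concrete hardware side: each x86 debug register covers 1, 2, 4, or 8 bytes, and the watched address must be aligned to that length. The sketch below splits a region into such chunks and checks whether four registers are enough. It illustrates the constraint only and is not GDB's actual allocation algorithm; the function names are hypothetical.

```python
MAX_DEBUG_REGISTERS = 4  # DR0-DR3 on x86


def split_watch_region(addr: int, length: int, max_len: int = 8) -> list:
    """Cover [addr, addr + length) with aligned chunks of 8, 4, 2, or 1 bytes."""
    chunks = []
    end = addr + length
    while addr < end:
        size = max_len
        # Shrink until the chunk is aligned and stays inside the region.
        while size > 1 and (addr % size != 0 or addr + size > end):
            size //= 2
        chunks.append((addr, size))
        addr += size
    return chunks


def fits_in_hardware(addr: int, length: int) -> bool:
    """True if the region can be watched with the available debug registers."""
    return len(split_watch_region(addr, length)) <= MAX_DEBUG_REGISTERS
```

Under this model an aligned 28-byte struct needs exactly four chunks (8 + 8 + 8 + 4), while a misaligned 4-byte value already needs three. Regions that do not fit are where a debugger has to fall back to slow software watchpoints.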

6. GDB’s Python API and Automation

GDB embeds Python for scripting. You can automate tasks, write custom commands, and analyze program state programmatically.

Guiding questions:

  • How do you access registers from Python in GDB?
  • Can you set breakpoints from a Python script?
  • How would you log every function call automatically?

Key reading: GDB Python API documentation, “The Art of Debugging with GDB” Ch. 8 (Scripting)
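A minimal sketch of a call logger built on the Python API follows. It subclasses `gdb.Breakpoint` and returns `False` from `stop()` so the program keeps running after each hit. The function names `calculate` and `add` and the file name `log_calls.py` are assumptions for illustration. The `gdb` module exists only inside GDB, so the import is guarded to keep the formatting helpers usable elsewhere.

```python
try:
    import gdb  # provided by GDB's embedded interpreter only
except ImportError:
    gdb = None  # allows the pure helpers below to be imported outside GDB


def format_call(depth: int, name: str, pc: int) -> str:
    """Indent by call depth so nested calls read like a tree."""
    return "{}{} @ {:#x}".format("  " * depth, name, pc)


def frame_depth(frame) -> int:
    """Count how many caller frames sit above the given frame."""
    depth = 0
    older = frame.older()
    while older is not None:
        depth += 1
        older = older.older()
    return depth


def install(function_names):
    """Create one logging breakpoint per function name (must run inside GDB)."""

    class CallLogger(gdb.Breakpoint):
        def stop(self):
            frame = gdb.newest_frame()
            name = frame.name() or "??"
            gdb.write(format_call(frame_depth(frame), name, frame.pc()) + "\n")
            return False  # False means "do not stop"; execution continues

    return [CallLogger(name) for name in function_names]


if gdb is not None:
    # Assumed targets: the functions from a small test program.
    install(["calculate", "add"])
```

Inside GDB, load it with `source log_calls.py` before `run`. Each hit prints one line and the program continues without prompting.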

7. Debugging Multi-Threaded Programs

Threads share memory but have separate stacks and registers. Debugging threads requires understanding concurrency.

Guiding questions:

  • How do you switch between threads in GDB?
  • What happens when one thread hits a breakpoint—do others stop?
  • How do you debug race conditions?

Key reading: “Computer Systems: A Programmer’s Perspective” Ch. 12 (Concurrent Programming), “The Art of Debugging with GDB” Ch. 6 (Debugging Multi-threaded Programs)

8. Remote Debugging and Embedded Systems

GDB can debug programs on remote systems or embedded devices using the GDB Remote Serial Protocol.

Guiding questions:

  • How does gdbserver communicate with GDB?
  • Can you debug a program on a different architecture?
  • What’s the difference between native and remote debugging?

Key reading: GDB Documentation (Remote Debugging), “Embedded Systems Architecture” by Tammy Noergaard (GDB sections)
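The Remote Serial Protocol frames each packet as `$<payload>#<checksum>`, where the checksum is the sum of the payload bytes modulo 256, written as two hex digits. The sketch below implements only that framing. It deliberately ignores escaping, run-length encoding, and acknowledgements, and the helper names are hypothetical.

```python
def rsp_checksum(payload: bytes) -> int:
    """Sum of payload bytes, modulo 256."""
    return sum(payload) % 256


def rsp_frame(payload: bytes) -> bytes:
    """Wrap a payload as $payload#xx."""
    return b"$" + payload + b"#" + b"%02x" % rsp_checksum(payload)


def rsp_unframe(packet: bytes) -> bytes:
    """Validate framing and checksum, then return the payload."""
    if len(packet) < 4 or not packet.startswith(b"$") or packet[-3:-2] != b"#":
        raise ValueError("not a framed packet")
    payload = packet[1:-3]
    if int(packet[-2:], 16) != rsp_checksum(payload):
        raise ValueError("checksum mismatch")
    return payload


# 'g' asks the stub for all general registers; 'm401000,4' reads 4 bytes of memory.
read_registers = rsp_frame(b"g")
read_memory = rsp_frame(b"m401000,4")
```

Because the protocol is plain text over a byte stream, the same GDB front end can drive gdbserver, an emulator, or a hardware debug probe, as long as the other side speaks these packets.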

Questions to Guide Your Design

  1. What exercises will teach you the most? Simple “hello world” debugging is boring. What about reversing a password checker? Analyzing a buffer overflow? Tracing a complex data structure?

  2. How will you structure your learning progression? Start with basic commands, then breakpoints, then memory examination, then modification, then Python scripting?

  3. Will you use GDB plugins (pwndbg, GEF, peda)? These add powerful features for exploit development. When should you learn vanilla GDB vs. enhanced versions?

  4. What real-world scenarios will you practice? Debugging a segfault? Finding a memory leak? Analyzing a crackme? Reverse engineering a proprietary binary?

  5. How will you document your GDB knowledge? Build a cheat sheet? Create a reference of common commands? Write GDB scripts you can reuse?

  6. Will you learn GDB’s TUI mode? The Text User Interface shows code, registers, and assembly simultaneously. It’s powerful but has a learning curve.

  7. What target binaries will you debug? Toy programs you write, existing open-source software, CTF challenges, or malware samples?

  8. How will you practice without source code? Debugging stripped binaries is a critical skill for reverse engineering.

Thinking Exercise

Before writing Python scripts, master these manual exercises:

Exercise 1: Follow a Function Call Chain

Compile this with gcc -g:

#include <stdio.h>
int add(int a, int b) { return a + b; }
int calculate(int x) { return add(x, 10); }
int main() {
    int result = calculate(5);
    printf("Result: %d\n", result);
    return 0;
}

In GDB:

  1. Set breakpoint on main
  2. Run and step into calculate (use step, not next)
  3. Step into add
  4. At each frame, use backtrace to see the call stack
  5. Use frame 1 to select calculate’s frame, then info args to inspect its argument x (calculate has no local variables)
  6. Use up and down to navigate frames

Exercise 2: Find Where a Variable Changes

#include <stdio.h>
int main() {
    int secret = 100;
    secret += 20;   /* 120 */
    secret *= 2;    /* 240 */
    secret -= 50;   /* 190 */
    printf("Secret: %d\n", secret);
    return 0;
}

Use a watchpoint (compile with gcc -g -O0 so the arithmetic is not folded into a constant):

  1. Break at first line of main
  2. Run to breakpoint
  3. watch secret (sets watchpoint on the variable)
  4. continue repeatedly, noting when and where secret changes
  5. Examine the assembly at each trigger point

Exercise 3: Modify Execution Flow

Compile a password checker:

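#include <string.h>
#include <stdio.h>
int check_password(char *pass) {
    return strcmp(pass, "letmein") == 0;
}
int main() {
    char input[50];
    if (fgets(input, sizeof input, stdin) == NULL) {
        return 1;                        /* no input available */
    }
    input[strcspn(input, "\n")] = '\0';  /* fgets keeps the newline; drop it */
    if (check_password(input)) {
        printf("Access granted!\n");
    } else {
        printf("Access denied!\n");
    }
    return 0;
}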

In GDB, bypass the check:

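  1. Break on check_password (build with -O0 so the call is not inlined), run, and type a wrong password
  2. Use finish to return to main; $rax now holds the return value of check_password (0)
  3. Use set $rax = 1 to force success
  4. continue and see “Access granted” despite the wrong password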

Exercise 4: Examine Data Structures

struct person {
    char name[20];
    int age;
    float salary;
};

int main() {
    struct person p = {"Alice", 30, 75000.0};
    return 0;
}

In GDB:

  1. Break on the return 0 line, after the struct is initialized
  2. print p (shows entire structure)
  3. print p.name
  4. print &p (shows address)
  5. x/28xb &p (examine all raw bytes; print sizeof(p) confirms the size, typically 28 on x86-64)
  6. ptype p (shows structure definition)
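To check your reading of the raw bytes, you can decode them by hand. The sketch below uses Python's `struct` module and assumes the usual little-endian x86-64 layout (name at offset 0, age at 20, salary at 24, 28 bytes total). The packed buffer stands in for bytes you would copy out of the `x/...xb &p` output.

```python
import struct

# "<20sif": little-endian, char[20], int32, float32, no padding inserted.
PERSON = struct.Struct("<20sif")

# Stand-in for the bytes GDB shows for p in this exercise.
raw = PERSON.pack(b"Alice", 30, 75000.0)


def decode_person(data: bytes):
    """Return (name, age, salary) from the raw struct bytes."""
    name, age, salary = PERSON.unpack(data)
    return name.split(b"\x00", 1)[0].decode("ascii"), age, salary


def hexdump(data: bytes, width: int = 8) -> list:
    """Format bytes in rows, similar to the byte view of x/...xb."""
    return [
        " ".join("%02x" % b for b in data[i:i + width])
        for i in range(0, len(data), width)
    ]


for row in hexdump(raw):
    print(row)
print(decode_person(raw))
```

The first row is the ASCII for "Alice" followed by zero padding, and the bytes at offset 20 are `1e 00 00 00`, which is 30 stored least-significant byte first.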

Exercise 5: Reverse Engineering a Stripped Binary

Compile without -g and strip:

gcc -O2 -o mystery mystery.c
strip mystery

Now debug it:

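  1. gdb ./mystery
  2. disassemble main fails because the symbol table is gone, so locate the entry point instead
  3. info files to see the entry point (for a PIE binary this is only an offset until the program is running; use starti first to see runtime addresses)
  4. break *0x... (break at address, not function name)
  5. Step through assembly with stepi/nexti, figuring out what the program does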

This is real reverse engineering.

The Interview Questions They’ll Ask

  1. “How does GDB implement software breakpoints?”
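    • GDB saves the original instruction byte at the breakpoint address and replaces it with int3 (0xCC on x86). When int3 executes, the kernel stops the traced process with SIGTRAP and GDB is notified through ptrace/waitpid. GDB rewinds the program counter by one byte; to resume, it typically restores the original byte, single-steps that instruction, and re-inserts 0xCC. The original byte is restored for good when the breakpoint is deleted.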
  2. “What’s the difference between step and next?”
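    • step and next advance by one source line: step enters called functions (when they have debug info), while next runs the call to completion and stops on the following line. stepi (si) and nexti (ni) do the same at the level of single instructions.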
  3. “How can you find what caused a segmentation fault?”
    • Run the program in GDB. When it crashes, use backtrace to see the call stack, info registers to see register values, and x/i $rip to see the faulting instruction. Check the register used as the address in that instruction; a value of 0 or close to 0 points to a NULL dereference.
  4. “Explain how watchpoints work.”
    • Watchpoints use hardware debug registers (DR0-DR3 on x86) to trigger exceptions when memory is accessed. Limited to 4 simultaneous watchpoints. Software watchpoints exist but are very slow (single-step execution).
  5. “How do you debug a program that immediately crashes?”
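    • Use starti to stop at the very first instruction, before main and before the startup code runs. To stop when the program replaces itself with a new image, use catch exec (the underlying syscall is named execve, so catch syscall execve also works).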
  6. “What’s the purpose of ASLR and how do you handle it in GDB?”
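    • Address Space Layout Randomization places code/libraries at random addresses for security. GDB disables ASLR by default for programs it starts (set disable-randomization on), which keeps breakpoint addresses consistent between runs. Use set disable-randomization off to reproduce randomized behavior. A process you attach to keeps the layout it already has.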
  7. “How do you debug a running process without restarting it?”
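    • Use gdb -p <PID> to attach to a running process. GDB attaches with ptrace, which stops the process; you set breakpoints and then continue. On Linux, attaching to a process you did not start may be blocked by the kernel.yama.ptrace_scope setting unless you run as root.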
  8. “What’s the difference between a core dump and live debugging?”
    • A core dump is a snapshot of memory at crash time. You can debug it with gdb program core, but it’s read-only (no execution). Live debugging lets you run, modify, and restart.
  9. “How would you automatically log every function call?”
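    • Write a Python script using GDB’s Python API. Subclass gdb.Breakpoint, override stop() to log the function name and return False so execution continues, and place one such breakpoint on each function of interest (rbreak can set ordinary breakpoints on every function matching a regex). Hooking gdb.events.stop and single-stepping to look for call instructions also works, but is far slower.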
  10. “What information is lost when debugging a stripped binary?”
    • Function names, variable names, type information, source line mappings. You only have addresses, raw assembly, and sometimes dynamic symbols (from .dynsym).

Books That Will Help

Topic | Book | Chapter/Section
GDB Basics | “The Art of Debugging with GDB, DDD, and Eclipse” by Matloff & Salzman | Ch. 1-3: GDB Fundamentals
Memory Layout | “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron | Ch. 9: Virtual Memory
Stack and Calling Conventions | “Computer Systems: A Programmer’s Perspective” | Ch. 3.7: Procedures
Breakpoints Internals | “The Art of Debugging with GDB” | Ch. 2: Breakpoints
Watchpoints | “The Art of Debugging with GDB” | Ch. 3: Watchpoints and Catchpoints
GDB Python API | “The Art of Debugging with GDB” | Ch. 8: Other GDB Topics (Scripting)
Debugging Multi-threaded Programs | “The Art of Debugging with GDB” | Ch. 6: Debugging Multi-threaded Programs
Symbols and DWARF | “Practical Binary Analysis” by Dennis Andriesse | Ch. 5.3: Symbols and Stripped Binaries
Dynamic Analysis | “Practical Malware Analysis” by Sikorski & Honig | Ch. 3: Basic Dynamic Analysis
Reverse Engineering with GDB | “Practical Binary Analysis” | Ch. 5: Basic Binary Analysis in Linux
Exploitation and GDB | “Hacking: The Art of Exploitation” by Jon Erickson | Ch. 3: Exploitation (Using GDB)
Stack Smashing | “Hacking: The Art of Exploitation” | Ch. 3.3: Stack-Based Buffer Overflows
CPU Debug Registers | Intel 64/IA-32 SDM Volume 3 | Ch. 17: Debug, Branch Profile, TSC, and Quality of Service
Remote Debugging | GDB Documentation (official) | Remote Debugging section
Core Dumps | “The Art of Debugging with GDB” | Ch. 4: Core Files

ASCII Diagram: GDB Process Interaction

+----------------------+          ptrace() system call          +--------------------+
|                      | <------------------------------------- |                    |
|   Target Process     |                                        |    GDB Debugger    |
|   (Your Program)     | --------------------------------------> |    (Controller)    |
|                      |          Memory read/write             |                    |
+----------------------+          Register access               +--------------------+
         |                        Set breakpoints                        |
         |                                                                |
         |                                                                |
         v                                                                v
+-------------------+                                            +-----------------+
| Virtual Memory    |                                            | GDB Commands    |
| +---------------+ |                                            | - break         |
| | Stack         | |  <-- GDB can read/write                    | - run           |
| | (local vars)  | |      any of this memory                    | - step/next     |
| +---------------+ |                                            | - print         |
| | Heap          | |                                            | - x (examine)   |
| | (malloc'd)    | |                                            | - set           |
| +---------------+ |                                            | - backtrace     |
| | .data         | |                                            | - disassemble   |
| | (globals)     | |                                            +-----------------+
| +---------------+ |
| | .text         | |
| | (code)        | |  <-- Software breakpoint: int3 (0xCC)
| | ...           | |      Hardware breakpoint: DR0-DR3 registers
| | 0x401000: 55  | |
| +---------------+ |
+-------------------+

Breakpoint Mechanism:
  Original: 0x401000: 55        (push rbp)
  GDB sets: 0x401000: CC        (int3 trap instruction)
  When hit: Kernel stops the target with SIGTRAP; GDB learns of it via waitpid
  GDB:      Restores original byte (55), rewinds RIP to 0x401000
            Shows user the breakpoint hit
            User can inspect/modify state
  Continue: Single-steps the real instruction (push rbp)
            Re-inserts breakpoint (CC) if persistent

GDB Command Categories

Execution Control:
  run (r)              - Start program
  continue (c)         - Resume execution
  step (s)             - Step into (source line)
  stepi (si)           - Step into (instruction)
  next (n)             - Step over (source line)
  nexti (ni)           - Step over (instruction)
  finish               - Run until function returns
  until <location>     - Run until location

Breakpoints:
  break <where>        - Set breakpoint
    break main
    break *0x401000
    break file.c:42
  watch <expr>         - Break on write
  rwatch <expr>        - Break on read
  awatch <expr>        - Break on access
  catch <event>        - Break on event
    catch syscall write
  info breakpoints     - List all breakpoints
  delete <n>           - Delete breakpoint

Examination:
  print <expr>         - Print value
    print $rax
    print myvar
    print/x $rsp      (hex format)
  x/<n><f><u> <addr>   - Examine memory
    x/10i $rip        (10 instructions)
    x/20xw $rsp       (20 words in hex)
    x/s 0x402000      (string)
  info registers       - Show all registers
  info frame           - Current stack frame
  backtrace (bt)       - Call stack
  disassemble <where>  - Show assembly

Modification:
  set <var> = <value>  - Change variable
    set $rax = 0
    set myvar = 100
    set *(int*)0x401000 = 0x90909090

Process Info:
  info proc mappings   - Memory map
  info sharedlibrary   - Loaded libraries
  info threads         - List threads
  thread <n>           - Switch to thread

Python Scripting:
  python <code>        - Execute Python
  python-interactive   - Python REPL
  source script.py     - Run script
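The `x/<n><f><u>` specifier packs three things into a few characters: a repeat count, a display format, and a unit size. The sketch below decodes it. The `x` and `w` defaults match GDB's initial defaults, but GDB actually remembers the last format and unit you used, which this simplified parser does not model. `parse_x_format` is a hypothetical name.

```python
import re

UNIT_SIZES = {"b": 1, "h": 2, "w": 4, "g": 8}  # byte, halfword, word, giant
FORMATS = set("xduotacfsiz")                    # hex, decimal, ..., string, instruction


def parse_x_format(spec: str):
    """Parse the text after 'x/', e.g. '20xw' -> (20, 'x', 4).

    The unit size is None for 's' and 'i', where the item length
    comes from the data itself rather than from a fixed unit.
    """
    match = re.fullmatch(r"(\d*)([a-z]*)", spec)
    if match is None:
        raise ValueError("bad format spec: %r" % spec)
    count = int(match.group(1)) if match.group(1) else 1
    fmt, unit = "x", "w"
    for ch in match.group(2):
        if ch in UNIT_SIZES:
            unit = ch
        elif ch in FORMATS:
            fmt = ch
        else:
            raise ValueError("unknown letter %r in %r" % (ch, spec))
    size = None if fmt in ("s", "i") else UNIT_SIZES[unit]
    return count, fmt, size
```

Format and unit letters can appear in either order, so `x/20xw` and `x/20wx` mean the same thing: twenty 4-byte words shown in hex, 80 bytes in total.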

Key Insight: GDB isn’t just for finding bugs. It is also a general-purpose reverse engineering tool. Combined with scripting, you can automate complex analysis: trace all heap allocations, log every comparison against a password, or build a call graph of the paths that actually executed. Mastering GDB lets you examine the runtime behavior of binaries you have no source for.

Common Pitfalls and Debugging

Problem 1: “Your interpretation does not match runtime behavior”

  • Why: Static analysis can hide runtime-resolved addresses, lazy binding, and input-dependent branches.
  • Fix: Reproduce the path with debugger or tracer, then compare static assumptions against live register/memory state.
  • Quick test: Run the same sample through both your static workflow and a debugger transcript, and confirm control-flow decisions align.

Problem 2: “Tool output is inconsistent across machines”

  • Why: ASLR, tool version drift, and different binary build flags (PIE, RELRO, symbols stripped) change observed addresses and metadata.
  • Fix: Pin tool versions, capture checksec/metadata, and document environment assumptions in your report.
  • Quick test: Re-run analysis in a container or VM with pinned tools and compare hashes of generated outputs.
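One way to make findings comparable across machines is to record addresses as offsets from the module's load base instead of as absolute values. The sketch below shows the arithmetic. The base addresses and the offset 0x1139 are illustrative, and the function names are hypothetical.

```python
def to_offset(runtime_addr: int, load_base: int) -> int:
    """Convert an address seen in the debugger into a module-relative offset."""
    if runtime_addr < load_base:
        raise ValueError("address lies below the module's load base")
    return runtime_addr - load_base


def to_runtime(offset: int, load_base: int) -> int:
    """Convert a module-relative offset back into an address for this run."""
    return load_base + offset


# Illustrative values: the same function observed in two differently
# randomized runs of a PIE binary.
run_a = to_runtime(0x1139, 0x555555554000)
run_b = to_runtime(0x1139, 0x7F3A1C200000)
```

Writing `target+0x1139` in a report stays valid on any machine; the load base for a given run can be read from `info proc mappings` and added back when setting a breakpoint.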

Problem 3: “Analysis accidentally executes unsafe code”

  • Why: Dynamic workflows run binaries in host context without sufficient isolation.
  • Fix: Use disposable snapshots, no-network execution, and non-privileged users for all unknown samples.
  • Quick test: Validate isolation controls first (network disabled, snapshot active, unprivileged user), then execute sample.

Definition of Done

  • Core functionality works on reference inputs
  • Edge cases are tested and documented
  • Results are reproducible (same binary, same tools, same report output)
  • Analysis notes clearly separate observations, assumptions, and conclusions
  • Lab safety controls were applied for any dynamic execution

4. Solution Architecture

Input Artifact -> Parse/Decode -> Analysis Engine -> Validation Layer -> Report

Design each stage so intermediate artifacts are inspectable (JSON/text/notes), which makes debugging and peer review much easier.

5. Implementation Phases

Phase 1: Foundation

  • Define input assumptions and format checks.
  • Produce a minimal golden output on one known sample.

Phase 2: Core Functionality

  • Implement full analysis pass for normal cases.
  • Add validation against an external ground-truth tool.

Phase 3: Hard Cases and Reporting

  • Add malformed/edge-case handling.
  • Finalize report template and reproducibility notes.

6. Testing Strategy

  • Unit-level checks for parser/decoder helpers.
  • Integration checks against known binaries/challenges.
  • Regression tests for previously failing cases.

7. Extensions & Challenges

  • Add automation for batch analysis and comparative reports.
  • Add confidence scoring for each major finding.
  • Add export formats suitable for CI/security pipelines.

8. Production Reflection

Map your project output to a production analogue: what reliability, observability, and security controls would be required to run this continuously in an engineering organization?