Project 7: Build a Mini-Debugger (ptrace)

Implement a tiny debugger in C that can launch a process, set breakpoints, read registers, and single-step.

Quick Reference

Attribute       Value
Difficulty      Expert
Time Estimate   1-2 weeks
Language        C (Linux)
Prerequisites   Projects 1-5, Linux syscalls, basic assembly
Key Topics      ptrace, breakpoints, process control, registers

1. Learning Objectives

By completing this project, you will:

  1. Use ptrace to start and control a child process.
  2. Implement software breakpoints by patching in int 3 (opcode 0xCC).
  3. Read registers with PTRACE_GETREGS and write them with PTRACE_SETREGS.
  4. Build a tiny REPL that mirrors essential GDB behavior.

2. Theoretical Foundation

2.1 Core Concepts

  • ptrace: Kernel API that allows one process to observe and control another.
  • Software breakpoints: Overwrite the first byte of an instruction with 0xCC (int 3) and restore the original byte later.
  • Process control: waitpid informs the debugger when the tracee stops.
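As a rough illustration of the last bullet, the status value that waitpid fills in can be decoded with the standard W* macros from <sys/wait.h>. The helper name describe_wait_status is hypothetical and not part of any library; this is a sketch, not a required design.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>

/* Hypothetical helper: summarize a waitpid() status in a caller-supplied
 * buffer. Returns 1 if the tracee is stopped (and can be inspected with
 * ptrace), 0 if it has terminated or the status is unrecognized. */
int describe_wait_status(int status, char *buf, size_t len)
{
    if (WIFSTOPPED(status)) {
        int sig = WSTOPSIG(status);
        /* SIGTRAP is what a breakpoint, a single-step, or the stop after
         * exec normally reports. */
        snprintf(buf, len, "stopped by signal %d%s", sig,
                 sig == SIGTRAP ? " (SIGTRAP)" : "");
        return 1;
    }
    if (WIFEXITED(status)) {
        snprintf(buf, len, "exited with code %d", WEXITSTATUS(status));
    } else if (WIFSIGNALED(status)) {
        snprintf(buf, len, "killed by signal %d", WTERMSIG(status));
    } else {
        snprintf(buf, len, "unknown status 0x%x", (unsigned)status);
    }
    return 0;
}
```

A debugger loop would typically call waitpid(pid, &status, 0) after every PTRACE_CONT or PTRACE_SINGLESTEP and only issue further ptrace requests when this kind of check reports a stop.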

2.2 Why This Matters

When you build a mini-debugger, you stop treating GDB as magic. You understand the OS-level mechanics that enable debugging.

2.3 Historical Context / Background

ptrace dates back to early UNIX systems and remains the foundation for tools such as GDB, strace, and rr.

2.4 Common Misconceptions

  • “Breakpoints are metadata”: They are real instruction byte patches.
  • “You need full symbols”: Basic debugging works with addresses alone.

3. Project Specification

3.1 What You Will Build

A minimal debugger executable that:

  • launches a target program,
  • sets breakpoints by address,
  • continues and single-steps,
  • prints register state.

3.2 Functional Requirements

  1. Launch a child process with fork, PTRACE_TRACEME in the child, then exec.
  2. Implement break <addr> to install a breakpoint.
  3. Implement continue, step, and regs commands.
  4. Restore instructions when resuming from a breakpoint.

3.3 Non-Functional Requirements

  • Reliability: Breakpoints should be reversible.
  • Performance: Acceptable for small debugging sessions.
  • Usability: Simple REPL with clear output.

3.4 Example Usage / Output

mini_gdb> break 0x40100a
mini_gdb> continue
Stopped at breakpoint 1: 0x40100a
mini_gdb> regs
rax: 0x5
rip: 0x40100a

3.5 Real World Outcome

When you run your debugger, you can control a target:

$ ./mini_gdb ./my_program
mini_gdb> break 0x40100a
Breakpoint set at 0x40100a
mini_gdb> continue
Stopped at breakpoint 1: 0x40100a
mini_gdb> regs
rax: 0x5
rbx: 0x0
rip: 0x40100a
mini_gdb> step
Stopped at 0x40100b
mini_gdb> quit

4. Solution Architecture

4.1 High-Level Design

┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│ mini_gdb     │────▶│ ptrace + wait│────▶│ tracee proc  │
└──────────────┘     └──────────────┘     └──────────────┘

4.2 Key Components

Component            Responsibility               Key Decisions
REPL                 Parse commands               Simple string parser
Breakpoint manager   Patch and restore int 3      Store original byte
ptrace interface     Read/write regs and memory   PTRACE_GETREGS/PTRACE_SETREGS for registers, PTRACE_PEEKDATA/PTRACE_POKEDATA for memory
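The "simple string parser" for the REPL could look roughly like the sketch below. The enum and the function name parse_command are assumptions made for illustration; only the command words themselves (break, continue, step, regs, quit) come from this guide.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical command set matching the REPL commands named in this guide. */
enum CommandKind { CMD_UNKNOWN, CMD_BREAK, CMD_CONTINUE, CMD_STEP, CMD_REGS, CMD_QUIT };

/* Split a line into a command word and an optional argument.
 * On return *arg points into `line` at the first argument character, or at
 * the terminating NUL when there is no argument. A trailing newline after
 * the argument is left in place for the caller to handle. */
enum CommandKind parse_command(const char *line, const char **arg)
{
    static const struct { const char *name; enum CommandKind kind; } table[] = {
        { "break", CMD_BREAK }, { "continue", CMD_CONTINUE },
        { "step", CMD_STEP },   { "regs", CMD_REGS }, { "quit", CMD_QUIT },
    };

    while (isspace((unsigned char)*line))
        line++;                                   /* skip leading blanks */
    size_t word_len = strcspn(line, " \t\r\n");   /* length of the command word */

    const char *rest = line + word_len;
    while (isspace((unsigned char)*rest))
        rest++;                                   /* advance to the argument */
    *arg = rest;

    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strlen(table[i].name) == word_len &&
            strncmp(line, table[i].name, word_len) == 0)
            return table[i].kind;
    }
    return CMD_UNKNOWN;
}
```

Matching whole words only keeps the parser predictable; GDB-style abbreviations such as b or c would be a later addition.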

4.3 Data Structures

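#include <stddef.h>     /* size_t */
#include <sys/types.h>  /* pid_t */

/* One software breakpoint. */
struct Breakpoint {
    unsigned long addr;           /* address patched with 0xCC */
    unsigned char original_byte;  /* byte that 0xCC replaced */
    int enabled;                  /* nonzero while 0xCC is in place */
};

/* Debugger-wide state for a single tracee. */
struct DebuggerState {
    pid_t tracee;               /* PID of the traced child */
    struct Breakpoint bps[32];  /* fixed-size breakpoint table */
    size_t bp_count;            /* number of entries in use */
};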
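One possible way to manage this table is a pair of small bookkeeping helpers. The names bp_find and bp_add and the MAX_BREAKPOINTS constant are assumptions for illustration; the structs are repeated so the sketch compiles on its own. Neither helper touches the tracee.

```c
#include <stddef.h>
#include <sys/types.h>

#define MAX_BREAKPOINTS 32

struct Breakpoint {
    unsigned long addr;
    unsigned char original_byte;
    int enabled;
};

struct DebuggerState {
    pid_t tracee;
    struct Breakpoint bps[MAX_BREAKPOINTS];
    size_t bp_count;
};

/* Linear search: adequate for a table capped at 32 entries. */
struct Breakpoint *bp_find(struct DebuggerState *st, unsigned long addr)
{
    for (size_t i = 0; i < st->bp_count; i++)
        if (st->bps[i].addr == addr)
            return &st->bps[i];
    return NULL;
}

/* Record a breakpoint. Returns its 1-based number, or -1 if the table is
 * full or the address is already present. */
int bp_add(struct DebuggerState *st, unsigned long addr, unsigned char original_byte)
{
    if (st->bp_count >= MAX_BREAKPOINTS || bp_find(st, addr) != NULL)
        return -1;
    struct Breakpoint *bp = &st->bps[st->bp_count++];
    bp->addr = addr;
    bp->original_byte = original_byte;
    bp->enabled = 1;
    return (int)st->bp_count;
}
```

Rejecting a duplicate address matters: saving the "original" byte a second time would record 0xCC and make the breakpoint impossible to remove cleanly.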

4.4 Algorithm Overview

Key Algorithm: Software breakpoint handling

  1. Read original byte at address.
  2. Write 0xCC to set breakpoint.
  3. On stop, restore the original byte, set RIP back by 1 (to the breakpoint address), single-step over the original instruction, and re-insert 0xCC.

Complexity Analysis:

  • Time: O(1) per breakpoint hit.
  • Space: O(B) for number of breakpoints.

5. Implementation Guide

5.1 Development Environment Setup

uname -a
man ptrace

5.2 Project Structure

project-root/
├── src/
│   ├── main.c
│   ├── repl.c
│   ├── breakpoints.c
│   └── ptrace_wrap.c
├── include/
│   └── debugger.h
└── Makefile

5.3 The Core Question You’re Answering

“How does a debugger actually control another process?”

5.4 Concepts You Must Understand First

Stop and research these before coding:

  1. ptrace lifecycle
    • PTRACE_TRACEME, PTRACE_CONT, waitpid
  2. Breakpoints
    • int 3 opcode and RIP adjustment
  3. Registers
    • struct user_regs_struct
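To make the register item concrete, here is a minimal sketch of reading and formatting registers on x86-64 Linux. struct user_regs_struct and PTRACE_GETREGS are real; the function names read_regs and format_regs, and the choice of which three registers to print, are assumptions chosen to match the sample regs output in this guide.

```c
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>

/* Fetch the tracee's general-purpose registers. The tracee must be stopped.
 * Returns 0 on success, -1 on failure. */
int read_regs(pid_t pid, struct user_regs_struct *regs)
{
    return ptrace(PTRACE_GETREGS, pid, NULL, regs) == -1 ? -1 : 0;
}

/* Format a few registers in the "name: 0xvalue" style of the regs command.
 * Returns the snprintf result: characters needed, excluding the NUL. */
int format_regs(const struct user_regs_struct *regs, char *buf, size_t len)
{
    return snprintf(buf, len,
                    "rax: 0x%llx\nrbx: 0x%llx\nrip: 0x%llx\n",
                    (unsigned long long)regs->rax,
                    (unsigned long long)regs->rbx,
                    (unsigned long long)regs->rip);
}
```

Keeping the formatting separate from the ptrace call makes the output testable without a live tracee.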

5.5 Questions to Guide Your Design

  1. How will you map breakpoints to original bytes?
  2. What happens if a breakpoint is hit twice?
  3. How do you avoid corrupting instruction streams?

5.6 Thinking Exercise

If the breakpoint is set on a multi-byte instruction, why is writing a single 0xCC still valid?

5.7 The Interview Questions They’ll Ask

  1. How does GDB set a breakpoint under the hood?
  2. What does PTRACE_SINGLESTEP do?
  3. Why must you decrement RIP after a breakpoint hit?

5.8 Hints in Layers

Hint 1: Tracee setup

  • Child calls PTRACE_TRACEME then exec.
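A minimal sketch of this hint is shown below, assuming Linux with glibc. The function name launch_tracee is hypothetical. It relies on the documented behavior that a process which has called PTRACE_TRACEME receives SIGTRAP after a successful exec.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that asks to be traced and then execs `path`.
 * Returns the child's PID once it has stopped at exec, or -1 on failure. */
pid_t launch_tracee(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Child: become a tracee of the parent, then replace the image. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(126);
        execv(path, argv);
        _exit(127);  /* only reached if execv failed */
    }

    /* Parent: wait for the exec stop before issuing any other request. */
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    if (!WIFSTOPPED(status) || WSTOPSIG(status) != SIGTRAP)
        return -1;   /* child terminated early, e.g. because exec failed */
    return pid;
}
```

After this returns, the tracee is paused before its first instruction runs, which is the natural moment to install breakpoints and then resume with PTRACE_CONT.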

Hint 2: Breakpoint patching

  • Read a word with PTRACE_PEEKDATA, change its low byte to 0xCC, and write it back with PTRACE_POKEDATA.

Hint 3: Breakpoint hit

  • Restore the original byte, set RIP back by one, single-step, then re-insert 0xCC.

5.9 Books That Will Help

Topic                Book                                     Chapter
ptrace               The Linux Programming Interface (TLPI)   Ch. 19
Debugger internals   The Linux Programming Interface (TLPI)   Ch. 20
Assembly and traps   CSAPP                                    Ch. 3

5.10 Implementation Phases

Phase 1: Foundation (2-3 days)

Goals:

  • Start and stop a child process.

Tasks:

  1. Implement fork + PTRACE_TRACEME.
  2. Wait for initial stop and print PID.

Checkpoint: Child stops at exec and you can continue it.

Phase 2: Core Functionality (4-6 days)

Goals:

  • Add breakpoints and register reads.

Tasks:

  1. Implement breakpoint set/clear.
  2. Implement regs command.

Checkpoint: Breakpoints stop execution at correct address.

Phase 3: Polish & Edge Cases (3-5 days)

Goals:

  • Handle repeated hits and multiple breakpoints.

Tasks:

  1. Fix RIP adjustments.
  2. Support multiple breakpoints.

Checkpoint: No crashes after repeated break/continue.

5.11 Key Implementation Decisions

Decision             Options             Recommendation   Rationale
Breakpoint storage   array vs hashmap    array            Simplicity for small count
Address input        hex only vs mixed   hex only         Avoid parsing ambiguity
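For the hex-only address decision, one way to parse the argument strictly is sketched below. The function name parse_hex_addr is an assumption; strtoul with base 16 is standard C and accepts an optional 0x prefix.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Parse a hexadecimal address such as "0x40100a" or "40100a".
 * Returns 0 and stores the value on success, -1 on malformed input.
 * Surrounding whitespace (e.g. the newline left by fgets) is tolerated. */
int parse_hex_addr(const char *text, unsigned long *out)
{
    char *end;

    while (isspace((unsigned char)*text))
        text++;
    if (*text == '\0' || *text == '-' || *text == '+')
        return -1;                  /* strtoul would otherwise accept a sign */

    errno = 0;
    unsigned long value = strtoul(text, &end, 16);
    if (end == text || errno == ERANGE)
        return -1;                  /* no hex digits, or value out of range */

    while (isspace((unsigned char)*end))
        end++;
    if (*end != '\0')
        return -1;                  /* trailing junk such as "40100aZ" */

    *out = value;
    return 0;
}
```

Rejecting anything that is not pure hex means a mistyped symbol name fails loudly instead of silently becoming an address.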

6. Testing Strategy

6.1 Test Categories

Category          Purpose                  Examples
Process control   Verify launch/continue   Child runs after continue
Breakpoints       Verify stop              rip equals address
Registers         Validate reads           regs prints values

6.2 Critical Test Cases

  1. Breakpoint hit pauses at correct address.
  2. step advances exactly one instruction.
  3. Multiple breakpoints do not conflict.

6.3 Test Data

Target program with a known function address

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall               Symptom                    Solution
Wrong RIP after hit   Breakpoints skip or loop   Decrement RIP by 1
Missing waitpid       Race conditions            Wait for stop after each action
Bad memory writes     Crashes                    Restore original byte correctly

7.2 Debugging Strategies

  • Use strace on your debugger to validate ptrace calls.
  • Print debug logs for each ptrace request.
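The logging suggestion could be implemented with a thin wrapper like the sketch below, assuming Linux with glibc. The names ptrace_request_name and logged_ptrace and the log format are hypothetical; only the PTRACE_* constants come from <sys/ptrace.h>.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>

/* Map the requests used in this project to printable names. */
const char *ptrace_request_name(int request)
{
    switch (request) {
    case PTRACE_TRACEME:    return "PTRACE_TRACEME";
    case PTRACE_PEEKDATA:   return "PTRACE_PEEKDATA";
    case PTRACE_POKEDATA:   return "PTRACE_POKEDATA";
    case PTRACE_CONT:       return "PTRACE_CONT";
    case PTRACE_SINGLESTEP: return "PTRACE_SINGLESTEP";
    case PTRACE_GETREGS:    return "PTRACE_GETREGS";
    case PTRACE_SETREGS:    return "PTRACE_SETREGS";
    default:                return "PTRACE_(other)";
    }
}

/* Issue the request, then log it and its outcome to stderr.
 * errno is preserved so callers can still tell a PEEKDATA result of -1
 * apart from a real failure. */
long logged_ptrace(int request, pid_t pid, void *addr, void *data)
{
    errno = 0;
    long ret = ptrace(request, pid, addr, data);
    int saved = errno;                       /* fprintf may clobber errno */
    fprintf(stderr, "[ptrace] %s pid=%d addr=%p -> %ld%s%s\n",
            ptrace_request_name(request), (int)pid, addr, ret,
            saved ? " errno=" : "", saved ? strerror(saved) : "");
    errno = saved;
    return ret;
}
```

Routing every call through one wrapper also gives a single place to compare against what strace reports for the debugger.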

7.3 Performance Traps

Single-stepping every instruction is slow; use it sparingly.


8. Extensions & Challenges

8.1 Beginner Extensions

  • Add a disassemble command that shells out to objdump.

8.2 Intermediate Extensions

  • Add symbol resolution with libelf.

8.3 Advanced Extensions

  • Add software watchpoints by single-stepping and checking the watched memory after each step.

9. Real-World Connections

9.1 Industry Applications

  • Debuggers: GDB, LLDB, rr all use the same fundamentals.
  • Security: Exploit development and reverse engineering rely on ptrace.

9.2 Related Open Source Projects

  • gdb: Full-featured debugger.
  • rr: Record and replay debugging.

9.3 Interview Relevance

  • Demonstrates OS internals knowledge and low-level tooling skills.

10. Resources

10.1 Essential Reading

  • TLPI - Process tracing and signals.
  • ptrace(2) man page.

10.2 Video Resources

  • Search: “build a debugger ptrace”.

10.3 Tools & Documentation

  • GDB: https://sourceware.org/gdb/
  • man 2 ptrace

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain how ptrace works.
  • I can describe how a breakpoint is implemented.

11.2 Implementation

  • My debugger can set and hit a breakpoint.
  • My debugger can read registers.

11.3 Growth

  • I can explain GDB internals at a high level.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Launch a target and single-step it.

Full Completion:

  • Set and handle breakpoints correctly.

Excellence (Going Above & Beyond):

  • Add symbol lookup and source line mapping.

This guide was generated from LEARN_GDB_DEEP_DIVE.md. For the complete learning path, see the parent directory README.