Project 2: The Syscall Tracer (Mini-strace)
Build a minimal tracer that attaches to a process and prints each syscall with its return value.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Advanced |
| Time Estimate | 1 week |
| Main Programming Language | C |
| Alternative Programming Languages | Rust, Go |
| Coolness Level | See REFERENCE.md (Level 4) |
| Business Potential | See REFERENCE.md (Level 2) |
| Prerequisites | Process control, syscalls, registers |
| Key Topics | ptrace, syscall ABI, signal delivery |
1. Learning Objectives
By completing this project, you will:
- Explain how a tracer pauses and resumes a tracee.
- Identify syscall entry and exit points reliably.
- Interpret return values and error codes deterministically.
- Build a minimal tool that validates kernel behavior.
2. All Theory Needed (Per-Concept Breakdown)
Syscall Tracing with ptrace
Fundamentals Syscall tracing is the practice of observing the exact system calls a process makes. On Linux, this is commonly done using ptrace, a debugging interface that allows one process (the tracer) to control another (the tracee). The tracer can pause the tracee at syscall boundaries, read registers to decode arguments, and resume execution. Understanding this interface gives you visibility into the kernel boundary and lets you turn vague questions like “why is it slow” into precise observations about which kernel services are used and how they behave.
Deep Dive ptrace is a process control interface that allows a tracer to observe and manipulate a tracee. When a tracer attaches to a running tracee with PTRACE_ATTACH, the kernel sends the tracee a SIGSTOP and the tracer learns about the stop through waitpid. The tracer can then resume the tracee with the PTRACE_SYSCALL request, which makes it stop again at the next syscall entry or exit. Each stop provides an opportunity to inspect registers and memory. The most reliable pattern is to alternate between entry and exit stops, capturing syscall numbers and arguments on entry, and results on exit.
System calls are identified by a numeric ID defined by the ABI. The tracer must know where that ID is stored in registers and where arguments are placed. These details differ by architecture, which is why strace has architecture-specific decoders. On x86-64 Linux, for example, the syscall number is in rax (exposed to the tracer as orig_rax), the arguments are in rdi, rsi, rdx, r10, r8, and r9, and the result comes back in rax. For a learning tool, it is acceptable to decode only a small subset of arguments and focus on syscall number and return value. The raw return value indicates success or failure: a value from -4095 to -1 is a negated errno code, and anything else is a successful result. This provides a deterministic way to identify failures without guessing.
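The return-value rule can be sketched as a small pure function. This is a minimal illustration, not part of any library: the names `sys_result` and `decode_return` are hypothetical, and the only fact it relies on is the Linux convention that raw values from -4095 to -1 are negated errno codes.

```c
#include <stdbool.h>

/* Decoded view of the raw value found in the return register at an exit stop.
 * (Hypothetical type for illustration.) */
struct sys_result {
    bool failed;  /* true when the raw value encodes an error */
    long value;   /* -1 on failure, otherwise the raw return value */
    int  err;     /* errno code on failure, 0 on success */
};

/* Linux reserves the raw range -4095..-1 for negated errno codes. */
struct sys_result decode_return(long raw)
{
    struct sys_result r = { false, raw, 0 };
    if (raw >= -4095 && raw <= -1) {
        r.failed = true;
        r.value = -1;
        r.err = (int)-raw;
    }
    return r;
}
```

For example, a raw value of -2 decodes to a failure with errno code 2 (ENOENT), while a raw value of 3 from openat is simply file descriptor 3.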
Signals complicate tracing. When a tracee receives a signal, the tracer sees it as a stop. The tracer must decide whether to deliver the signal to the tracee or suppress it. If you always suppress signals, you will alter the program’s behavior. If you blindly re-inject whatever signal number each stop reports, you risk delivering the SIGTRAP that merely marks a syscall stop and breaking the tracer’s entry/exit state machine. A robust tracer handles both by tracking why the process stopped, then resuming with the appropriate signal (or none). Setting the PTRACE_O_TRACESYSGOOD option helps: syscall stops then report SIGTRAP|0x80, which no real signal can produce.
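One way to track why the process stopped is to classify the wait status before deciding how to resume. The sketch below assumes the tracer has set PTRACE_O_TRACESYSGOOD, so syscall stops report SIGTRAP|0x80. The enum and function names are hypothetical, and treating every plain SIGTRAP as a tracing artifact is a simplification.

```c
#include <signal.h>
#include <sys/wait.h>

enum stop_kind {
    STOP_EXITED,   /* tracee is gone: exited or killed by a signal */
    STOP_SYSCALL,  /* syscall entry or exit stop */
    STOP_SIGNAL,   /* signal-delivery stop: forward the signal on resume */
    STOP_OTHER     /* anything else: resume without injecting a signal */
};

/* Classify a waitpid() status. On return, *forward_sig holds the signal to
 * pass to the next PTRACE_SYSCALL request (0 means "inject nothing").
 * Assumes PTRACE_O_TRACESYSGOOD is in effect. */
enum stop_kind classify_stop(int status, int *forward_sig)
{
    *forward_sig = 0;
    if (WIFEXITED(status) || WIFSIGNALED(status))
        return STOP_EXITED;
    if (!WIFSTOPPED(status))
        return STOP_OTHER;

    int sig = WSTOPSIG(status);
    if (sig == (SIGTRAP | 0x80))
        return STOP_SYSCALL;
    if (sig == SIGTRAP)        /* exec or ptrace-event trap (simplification) */
        return STOP_OTHER;

    *forward_sig = sig;        /* a real signal: deliver it unchanged */
    return STOP_SIGNAL;
}
```

Keeping this decision in one pure function makes it easy to unit-test with hand-built status values, without running a tracee at all.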
Tracing introduces overhead, so it should be deterministic and minimal. To keep results stable, use fixed commands and avoid timing-sensitive tests. A good tracer prints syscalls in a consistent order with clear formatting. For example, it can show syscall name (or number), file path for open-like calls, and return values. The goal is not full-featured strace; it is to understand the syscall boundary and confirm your mental model with direct evidence.
The tracer is also a window into library behavior. A single command like ls may invoke dozens of syscalls for dynamic linking, locale loading, and directory traversal. This teaches an important lesson: performance is often dominated by implicit library behavior rather than your own code. By tracing, you can see those hidden steps and optimize or debug based on reality, not assumptions.
How this fits into the project You will apply this concept in §3.1 to define tracer behavior, in §4.1 to design the control loop, and in §6.2 to create test cases. It also supports P03-build-your-own-shell.md by verifying fork/exec behavior.
Definitions & key terms
- ptrace: Kernel interface that allows one process to observe/control another.
- Tracee: The process being traced.
- Tracer: The process controlling the tracee.
- Syscall stop: A trap at syscall entry or exit.
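- errno: Error code describing why a syscall failed. The kernel returns it negated in the return register; the C library stores it in errno and returns -1.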
Mental model diagram
Tracer
  |
  | attach
  v
Tracee --(syscall entry)--> stop
  ^                          |
  | resume <-----------------+
How it works
- Start or attach to a target process.
- Wait for a stop event.
- Inspect registers and determine stop type.
- Log syscall entry or exit data.
- Resume tracee and repeat.
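The steps above can be sketched as a launch helper plus a loop. This is a minimal sketch under stated assumptions, not a reference implementation: it targets x86-64 Linux only (it reads `orig_rax` and `rax`), traces a single process, prints syscall numbers rather than names, and the function names `spawn_traced` and `trace_loop` are hypothetical.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef __x86_64__
#error "This sketch reads x86-64 registers; adapt the register names for other ABIs."
#endif

/* Fork and exec argv with tracing requested; the child stops with SIGTRAP
 * as soon as exec succeeds. Returns the child's pid, or -1 if fork fails. */
pid_t spawn_traced(char *const argv[])
{
    pid_t pid = fork();
    if (pid == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        execvp(argv[0], argv);
        _exit(127);                      /* exec failed */
    }
    return pid;
}

/* Wait, inspect, log, resume. Returns the number of completed syscalls,
 * or -1 if the child never stopped under trace or a ptrace call failed. */
long trace_loop(pid_t pid, FILE *out)
{
    int status;
    int at_exit = 0;                     /* 0: next syscall stop is an entry */
    int sig = 0;                         /* signal to inject on resume */
    long completed = 0;

    if (waitpid(pid, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;
    ptrace(PTRACE_SETOPTIONS, pid, NULL, (void *)(long)PTRACE_O_TRACESYSGOOD);

    for (;;) {
        if (ptrace(PTRACE_SYSCALL, pid, NULL, (void *)(long)sig) < 0)
            return -1;
        if (waitpid(pid, &status, 0) < 0)
            return -1;
        if (WIFEXITED(status) || WIFSIGNALED(status)) {
            if (at_exit)
                fputc('\n', out);        /* e.g. exit_group never returns */
            return completed;
        }
        sig = 0;
        if (!WIFSTOPPED(status))
            continue;
        if (WSTOPSIG(status) == (SIGTRAP | 0x80)) {
            struct user_regs_struct regs;
            if (ptrace(PTRACE_GETREGS, pid, NULL, &regs) < 0)
                return -1;
            if (!at_exit) {
                fprintf(out, "[pid %d] syscall_%lld(...)",
                        (int)pid, (long long)regs.orig_rax);
            } else {
                fprintf(out, " -> %lld\n", (long long)regs.rax);
                completed++;
            }
            at_exit = !at_exit;
        } else if (WSTOPSIG(status) != SIGTRAP) {
            sig = WSTOPSIG(status);      /* forward real signals unchanged */
        }
    }
}
```

A `main` would check the pid returned by `spawn_traced(argv + 1)` and, if it is valid, call `trace_loop(pid, stdout)`. The exit-stop value is printed raw here; converting it to the `-1 (ERRNO)` form is a separate formatting step.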
Minimal concrete example
Trace log (conceptual):
[pid 32010] openat("/etc/hostname") -> fd=3
[pid 32010] read(fd=3, bytes=64) -> 12
[pid 32010] close(fd=3) -> 0
Common misconceptions
- “Tracing shows only my code.” It shows all syscalls, including library activity.
- “Syscall numbers are stable across architectures.” They are not.
- “Signals are separate from tracing.” Signals and tracing are deeply intertwined.
Check-your-understanding questions
- Why does a tracer see two stops per syscall?
- How does the tracer know a syscall failed?
- Why can a signal stop be confused with a syscall stop?
- What makes tracing slow?
Check-your-understanding answers
- One stop at entry to read arguments, one at exit to read return values.
- At the exit stop, the raw return value falls in the range -4095 to -1; negating it gives the errno code.
- Both are delivered as stops and require inspection to distinguish.
- Every syscall produces two stops, and each stop costs context switches between tracee, kernel, and tracer plus additional ptrace calls to read registers.
Real-world applications
- Debugging permission and file access errors.
- Profiling syscall-heavy workloads.
- Understanding program startup behavior.
Where you’ll apply it
- See §3.1 What You Will Build and §4.1 High-Level Design.
- Also used in: P03-build-your-own-shell.md
References
- syscall(2) man page: https://man7.org/linux/man-pages/man2/syscall.2.html
- ptrace(2) man page: https://man7.org/linux/man-pages/man2/ptrace.2.html
- “The Linux Programming Interface” - process control chapters
Key insights Tracing converts invisible kernel behavior into a deterministic log.
Summary A syscall tracer is a microscope for the user-kernel boundary.
Homework/Exercises to practice the concept
- Trace a simple command and count unique syscalls.
- Identify which syscalls correspond to file operations.
Solutions to the homework/exercises
- Expect a small set of repeated syscalls.
- Look for open, read, write, close, and stat-like calls.
3. Project Specification
3.1 What You Will Build
A command-line tool that launches a target program and prints a deterministic log of its syscalls, including return values and error codes. It focuses on clarity over completeness and supports a small set of decoded syscalls.
3.2 Functional Requirements
- Attach and trace: Launch or attach to a target process.
- Syscall logging: Print syscall name/number and return value.
- Signal safety: Continue tracing even when signals arrive.
3.3 Non-Functional Requirements
- Performance: Acceptable for short-lived commands.
- Reliability: Trace completes without hanging.
- Usability: Output is readable and in chronological order.
3.4 Example Usage / Output
$ ./mytrace /bin/echo hello
[pid 4021] write(1, "hello\n") -> 6
[pid 4021] exit_group(0)
3.5 Data Formats / Schemas / Protocols
- Syscall log line: [pid] name(args) -> result.
- Error formatting: -> -1 (EPERM).
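A formatter for these two line shapes might look like the sketch below. The helper names `errno_name` and `format_line` are hypothetical, the errno table is deliberately tiny, and the `[pid N]` prefix follows the example output in §3.4.

```c
#include <errno.h>
#include <stdio.h>

/* Map a few common errno codes to their symbolic names.
 * Returns NULL for codes this small table does not cover. */
const char *errno_name(int err)
{
    switch (err) {
    case EPERM:  return "EPERM";
    case ENOENT: return "ENOENT";
    case EBADF:  return "EBADF";
    case EACCES: return "EACCES";
    default:     return NULL;
    }
}

/* Write one log line into buf. err == 0 means success and prints result;
 * otherwise the line ends in "-> -1 (NAME)". Returns snprintf's result. */
int format_line(char *buf, size_t cap, int pid, const char *name,
                const char *args, long result, int err)
{
    if (err == 0)
        return snprintf(buf, cap, "[pid %d] %s(%s) -> %ld",
                        pid, name, args, result);

    const char *sym = errno_name(err);
    if (sym != NULL)
        return snprintf(buf, cap, "[pid %d] %s(%s) -> -1 (%s)",
                        pid, name, args, sym);
    return snprintf(buf, cap, "[pid %d] %s(%s) -> -1 (errno %d)",
                    pid, name, args, err);
}
```

Formatting into a caller-supplied buffer, rather than printing directly, keeps the function easy to unit-test against expected strings.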
3.6 Edge Cases
- Tracee exits before first stop.
- Signals delivered during syscall tracing.
- Unsupported syscall argument types.
3.7 Real World Outcome
3.7.1 How to Run (Copy/Paste)
- Build in project-root and run ./mytrace /bin/ls.
- Requires permissions to trace child processes.
3.7.2 Golden Path Demo (Deterministic)
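Trace a fixed command like /bin/true for stable output. The transcript shows only the final line; a real run typically also logs startup syscalls (dynamic loading, memory setup) before it.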
3.7.3 If CLI: Exact terminal transcript
$ ./mytrace /bin/true
[pid 5012] exit_group(0)
# exit code: 0
Failure demo (deterministic):
$ ./mytrace /bin/does-not-exist
error: exec failed (ENOENT)
# exit code: 2
Exit codes:
- 0 success
- 2 invalid command or exec failure
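To report an exec failure as in the failure demo, the tracer needs to learn the child's errno. One common technique, sketched here as an assumption rather than the required design, is a close-on-exec pipe: a successful exec closes the pipe silently, while a failed one writes errno to it. The names `launch_traced` and `TRACE_EXIT_*` are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum { TRACE_EXIT_OK = 0, TRACE_EXIT_EXEC_FAILED = 2 };

/* Launch argv under trace. On success returns the child's pid (the child is
 * stopped at its post-exec SIGTRAP). On failure returns -1 and stores the
 * reason in *exec_errno, e.g. ENOENT for a missing binary. */
pid_t launch_traced(char *const argv[], int *exec_errno)
{
    int fds[2];
    *exec_errno = 0;
    if (pipe(fds) < 0) {
        *exec_errno = errno;
        return -1;
    }
    fcntl(fds[1], F_SETFD, FD_CLOEXEC);  /* write end vanishes on exec */

    pid_t pid = fork();
    if (pid < 0) {
        *exec_errno = errno;
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == 0)
            execvp(argv[0], argv);
        int child_err = errno;           /* only reached on failure */
        ssize_t w = write(fds[1], &child_err, sizeof child_err);
        (void)w;
        _exit(127);
    }

    close(fds[1]);
    int reported = 0;
    ssize_t n = read(fds[0], &reported, sizeof reported);
    close(fds[0]);
    if (n == (ssize_t)sizeof reported) { /* child told us exec failed */
        waitpid(pid, NULL, 0);
        *exec_errno = reported;
        return -1;
    }
    return pid;                          /* EOF: exec succeeded */
}
```

A caller that gets -1 back would print the error line and exit with `TRACE_EXIT_EXEC_FAILED`, matching exit code 2 in the list above.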
4. Solution Architecture
4.1 High-Level Design
Tracer -> wait loop -> inspect registers -> log -> resume
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Attach logic | Start/attach to tracee | Use child tracing for determinism |
| Stop decoder | Identify syscall vs signal stop | Inspect wait status |
| Logger | Format output consistently | Minimal decoding for clarity |
4.3 Data Structures (No Full Code)
- Trace state: per-PID current phase (entry/exit).
- Syscall record: number, args, return value, errno.
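As a rough shape for these two structures (a sketch only; all type and function names are hypothetical, and the fixed-size table is a simplification of the O(p) per-process state):

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

enum phase { PHASE_ENTRY, PHASE_EXIT };   /* which syscall stop comes next */

struct syscall_record {
    long number;            /* syscall number read at the entry stop */
    unsigned long args[6];  /* raw argument registers */
    long retval;            /* raw return value read at the exit stop */
    int err;                /* decoded errno, 0 on success */
};

struct trace_state {
    pid_t pid;
    bool in_use;
    enum phase next;        /* phase expected at the next syscall stop */
    struct syscall_record current;
};

#define MAX_TRACEES 64
static struct trace_state states[MAX_TRACEES];

/* Find the state for pid, creating it on first use.
 * Returns NULL when the table is full. */
struct trace_state *state_for(pid_t pid)
{
    struct trace_state *free_slot = NULL;
    for (size_t i = 0; i < MAX_TRACEES; i++) {
        if (states[i].in_use && states[i].pid == pid)
            return &states[i];
        if (!states[i].in_use && free_slot == NULL)
            free_slot = &states[i];
    }
    if (free_slot != NULL) {
        free_slot->pid = pid;
        free_slot->in_use = true;
        free_slot->next = PHASE_ENTRY;
    }
    return free_slot;
}

/* Report which phase the current syscall stop is, then flip the expectation. */
enum phase advance_phase(struct trace_state *s)
{
    enum phase seen = s->next;
    s->next = (seen == PHASE_ENTRY) ? PHASE_EXIT : PHASE_ENTRY;
    return seen;
}
```

Keeping the phase per PID matters once more than one process is traced, because entry and exit stops from different tracees interleave.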
4.4 Algorithm Overview
Key Algorithm: syscall tracing loop
- Wait for stop event.
- Determine stop type.
- Read syscall number or return value.
- Log and resume.
Complexity Analysis:
- Time: O(n) for n syscalls.
- Space: O(p) for p traced processes.
5. Implementation Guide
5.1 Development Environment Setup
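# Install a C compiler and strace for comparison (Debian/Ubuntu shown as an example; use your distribution's package manager)
sudo apt-get install gcc strace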
5.2 Project Structure
project-root/
├── src/
│ ├── tracer.c
│ └── decode.c
├── tests/
│ └── trace_tests.sh
└── README.md
5.3 The Core Question You’re Answering
“What does the kernel actually see when my program runs?”
5.4 Concepts You Must Understand First
- Syscall ABI
- Where do syscall numbers and args live?
- Book Reference: “The Linux Programming Interface” - Ch. 3-4
- Process stops
- How does a tracer receive stop events?
- Book Reference: “Advanced Programming in the UNIX Environment” - process control
5.5 Questions to Guide Your Design
- How will you pair syscall entry and exit stops?
- How will you display unsupported syscalls?
5.6 Thinking Exercise
Tracing State Machine
Draw the state machine for a tracee alternating between entry and exit stops.
5.7 The Interview Questions They’ll Ask
- “How does strace work internally?”
- “What is ptrace used for besides tracing?”
- “How do you detect syscall failure?”
- “Why is tracing slow?”
5.8 Hints in Layers
Hint 1: Start with syscall numbers Log only numbers first, then add names.
Hint 2: Alternate stops Track whether you are at entry or exit for each PID.
Hint 3: Signals Pass signals through to avoid changing program behavior.
Hint 4: Debugging
Compare output with strace -f on the same command.
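For Hint 1, the step from numbers to names can start as a small lookup table built from the `SYS_*` constants in `<sys/syscall.h>`, which already carry the right numbers for the architecture being compiled. The table contents and the function names below are illustrative choices, not a complete list.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>

struct sys_name { long nr; const char *name; };

/* A deliberately small table; unknown numbers fall back to "syscall_N". */
static const struct sys_name known[] = {
    { SYS_read,       "read"       },
    { SYS_write,      "write"      },
    { SYS_openat,     "openat"     },
    { SYS_close,      "close"      },
    { SYS_execve,     "execve"     },
    { SYS_exit_group, "exit_group" },
};
static const size_t known_count = sizeof known / sizeof known[0];

/* Return the name for nr, or format "syscall_<nr>" into buf as a fallback. */
const char *syscall_name(long nr, char *buf, size_t cap)
{
    for (size_t i = 0; i < known_count; i++)
        if (known[i].nr == nr)
            return known[i].name;
    snprintf(buf, cap, "syscall_%ld", nr);
    return buf;
}

/* Reverse lookup, useful for a name-based filter. Returns -1 if unknown. */
long syscall_number(const char *name)
{
    for (size_t i = 0; i < known_count; i++)
        if (strcmp(known[i].name, name) == 0)
            return known[i].nr;
    return -1;
}
```

The fallback keeps the log complete for syscalls the table does not cover, which also answers the design question about displaying unsupported syscalls.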
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Syscalls | “The Linux Programming Interface” | Ch. 3-6 |
| Process control | “Advanced Programming in the UNIX Environment” | Ch. 8 |
5.10 Implementation Phases
Phase 1: Foundation (1-2 days)
Goals:
- Launch a child process under trace.
- Log syscall numbers.
Tasks:
- Implement attach and wait loop.
- Print syscall number at each stop.
Checkpoint: ./mytrace /bin/true prints at least one syscall.
Phase 2: Core Functionality (2-3 days)
Goals:
- Decode return values and errors.
Tasks:
- Detect entry vs exit stops.
- Format output with return values.
Checkpoint: Log shows success and failure codes.
Phase 3: Polish & Edge Cases (1-2 days)
Goals:
- Handle signals and process exit.
Tasks:
- Forward signals properly.
- Exit cleanly on tracee termination.
Checkpoint: Tracer never hangs after tracee exits.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Trace mode | Attach vs launch | Launch child | Deterministic output |
| Decoding depth | Full decode vs minimal | Minimal | Keep scope manageable |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | Validate formatting | Parse log line output |
| Integration Tests | Trace real commands | /bin/true, /bin/ls |
| Edge Case Tests | Signal handling | Send SIGINT during trace |
6.2 Critical Test Cases
- Trace /bin/true: prints a deterministic exit syscall.
- Trace missing binary: returns exec error and exit code 2.
- Signal delivery: trace continues after SIGINT.
6.3 Test Data
Expected exit syscall for /bin/true: exit_group(0)
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Wrong stop handling | Missing syscalls | Track entry/exit per PID |
| Signal suppression | Tracee behavior changes | Forward signals |
| Unsupported arch | Nonsense output | Verify ABI for your arch |
7.2 Debugging Strategies
- Compare with strace: Use the same command to confirm.
- Print raw registers: Confirm syscall numbers when unsure.
7.3 Performance Traps
Tracing adds overhead; avoid using it in tight performance benchmarks.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add syscall name lookup for common calls.
- Filter by syscall number.
8.2 Intermediate Extensions
- Decode open paths and file descriptors.
- Follow child processes.
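Following children relies on ptrace options that make the kernel auto-attach new processes and report an event stop in the parent. The sketch below shows the option mask and how to recognize such a stop from the wait status; the helper names are hypothetical. After a new-child event, PTRACE_GETEVENTMSG retrieves the new pid.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

/* Options to pass to PTRACE_SETOPTIONS when following children. */
long follow_options(void)
{
    return PTRACE_O_TRACESYSGOOD | PTRACE_O_TRACEFORK |
           PTRACE_O_TRACEVFORK | PTRACE_O_TRACECLONE;
}

/* Extract the PTRACE_EVENT_* code from a wait status.
 * Event stops report SIGTRAP with the event code in bits 16 and up.
 * Returns 0 when the status is not an event stop. */
int stop_event(int status)
{
    if (!WIFSTOPPED(status) || WSTOPSIG(status) != SIGTRAP)
        return 0;
    return (status >> 16) & 0xffff;
}

/* True when the stop announces a new traced child. */
int is_new_child_event(int status)
{
    int ev = stop_event(status);
    return ev == PTRACE_EVENT_FORK ||
           ev == PTRACE_EVENT_VFORK ||
           ev == PTRACE_EVENT_CLONE;
}
```

With several tracees, the wait call changes from a single pid to waiting on any child, and the per-PID entry/exit state becomes essential.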
8.3 Advanced Extensions
- Add time spent per syscall.
- Output in JSON for tooling integration.
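Both advanced extensions reduce to two small helpers: a duration computed from timestamps taken at the entry and exit stops (for example with `clock_gettime(CLOCK_MONOTONIC, ...)`), and a one-object-per-line JSON formatter. The JSON field names here are assumptions chosen for illustration, and the sketch assumes the syscall name needs no JSON escaping.

```c
#include <stdio.h>
#include <time.h>

/* Nanoseconds elapsed from *start to *end. */
long long elapsed_ns(const struct timespec *start, const struct timespec *end)
{
    return (long long)(end->tv_sec - start->tv_sec) * 1000000000LL
         + (long long)(end->tv_nsec - start->tv_nsec);
}

/* Format one record as a JSON object on a single line.
 * Field names are illustrative; name must not need JSON escaping. */
int format_json(char *buf, size_t cap, int pid, const char *name,
                long result, long long duration_ns)
{
    return snprintf(buf, cap,
        "{\"pid\":%d,\"syscall\":\"%s\",\"result\":%ld,\"duration_ns\":%lld}",
        pid, name, result, duration_ns);
}
```

Note that the measured time includes the tracer's own stop-handling overhead, so it is best read as a relative figure rather than the syscall's true cost.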
9. Real-World Connections
9.1 Industry Applications
- Debugging tools: strace, ltrace, and profilers.
- Security: auditing unexpected syscalls.
9.2 Related Open Source Projects
- strace: https://strace.io/ - full-featured syscall tracer.
- gdb: https://www.gnu.org/software/gdb/ - uses ptrace under the hood.
9.3 Interview Relevance
Tracing and syscall knowledge are common systems interview topics.
10. Resources
10.1 Essential Reading
- “The Linux Programming Interface” - syscalls and process control
- ptrace(2) man page
10.2 Video Resources
- “Linux syscalls and tracing” - conference talks (search title)
10.3 Tools & Documentation
- strace: https://strace.io/ - reference output
- man7.org: https://man7.org/linux/man-pages/man2/ptrace.2.html
10.4 Related Projects in This Series
- P03-build-your-own-shell.md
11. Self-Assessment Checklist
11.1 Understanding
- I can explain how ptrace stops a process at syscall boundaries
- I can explain how errno is derived from return values
- I understand signal interactions with tracing
11.2 Implementation
- All functional requirements are met
- All test cases pass
- Output is deterministic and readable
11.3 Growth
- I can explain this project in an interview
- I documented lessons learned
- I can identify a next feature to add
12. Submission / Completion Criteria
Minimum Viable Completion:
- Trace a short command and log syscalls
- Handle tracee exit without hanging
- Produce deterministic output
Full Completion:
- All minimum criteria plus:
- Support a small set of decoded syscall names
- Include a failure demo with exit codes
Excellence (Going Above & Beyond):
- Follow child processes
- Export structured output for analysis