Project 2: The Syscall Tracer (Mini-strace)

Build a minimal tracer that attaches to a process and prints each syscall with its return value.

Quick Reference

Attribute	Value
Difficulty	Advanced
Time Estimate	1 week
Main Programming Language	C
Alternative Programming Languages	Rust, Go
Coolness Level	See REFERENCE.md (Level 4)
Business Potential	See REFERENCE.md (Level 2)
Prerequisites	Process control, syscalls, registers
Key Topics	ptrace, syscall ABI, signal delivery

1. Learning Objectives

By completing this project, you will:

Explain how a tracer pauses and resumes a tracee.
Identify syscall entry and exit points reliably.
Interpret return values and error codes deterministically.
Build a minimal tool that validates kernel behavior.

2. All Theory Needed (Per-Concept Breakdown)

Syscall Tracing with ptrace

Fundamentals

Syscall tracing is the practice of observing the exact system calls a process makes. On Linux, this is commonly done using ptrace, a debugging interface that allows one process (the tracer) to control another (the tracee). The tracer can pause the tracee at syscall boundaries, read registers to decode arguments, and resume execution. Understanding this interface gives you visibility into the kernel boundary and lets you turn vague questions like “why is it slow” into precise observations about which kernel services are used and how they behave.

Deep Dive

ptrace is a process control interface that allows a tracer to observe and manipulate a tracee. When a tracer attaches to a tracee, the kernel forces the tracee to stop and notifies the tracer. The tracer can then choose to resume the tracee with special flags that cause stops on syscall entry and syscall exit. Each stop provides an opportunity to inspect registers and memory. The most reliable pattern is to alternate between entry and exit stops, capturing syscall numbers and arguments on entry, and results on exit.
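The attach, stop, and resume cycle described above can be sketched in a few lines of C. This is a minimal illustration rather than a finished tracer: the helper name count_syscall_stops is hypothetical, it launches the target as a child instead of attaching to a running process, and it only counts syscall stops instead of decoding them.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv under ptrace and return the number of
 * syscall stops seen (entry and exit stops both count), or -1 on error. */
long count_syscall_stops(char *const argv[])
{
    pid_t child = fork();
    if (child < 0)
        return -1;

    if (child == 0) {
        /* Tracee: ask to be traced by the parent, then exec the target.
         * A successful exec stops the child with SIGTRAP. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(127);
        execvp(argv[0], argv);
        _exit(127); /* exec failed */
    }

    int status;
    if (waitpid(child, &status, 0) < 0)
        return -1;
    if (!WIFSTOPPED(status))
        return -1; /* child ended before the first stop, e.g. exec failed */

    /* Make syscall stops report SIGTRAP|0x80 so they are unambiguous. */
    if (ptrace(PTRACE_SETOPTIONS, child, NULL,
               (void *)(long)PTRACE_O_TRACESYSGOOD) == -1) {
        kill(child, SIGKILL);
        waitpid(child, NULL, 0);
        return -1;
    }

    long stops = 0;
    int sig = 0; /* signal to deliver on resume; 0 means none */
    for (;;) {
        /* Resume until the next syscall entry or exit (or other stop). */
        if (ptrace(PTRACE_SYSCALL, child, NULL, (void *)(long)sig) == -1)
            break;
        if (waitpid(child, &status, 0) < 0)
            break;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            break; /* tracee is gone */

        sig = 0;
        if (WIFSTOPPED(status)) {
            if (WSTOPSIG(status) == (SIGTRAP | 0x80))
                stops++;                /* syscall entry or exit stop */
            else
                sig = WSTOPSIG(status); /* real signal: pass it through */
        }
    }
    return stops;
}
```

Calling count_syscall_stops with an argv such as {"true", NULL} should return a positive number. The count is usually odd because exit_group produces an entry stop but never an exit stop.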

System calls are identified by a numeric ID defined by the ABI. The tracer must know where that ID is stored in registers and where arguments are placed. These details differ by architecture, which is why strace has architecture-specific decoders. For a learning tool, it is acceptable to decode only a small subset of arguments and focus on syscall number and return value. The return value indicates success or failure: on x86-64, a raw return value in the range -4095 to -1 is a negated errno code, and any other value is a success result. This provides a deterministic way to identify failures without guessing.
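The mapping from a raw return value to an error code can be written as a small pure function. The sketch below assumes the x86-64 Linux convention, where the kernel reports errors in-band as small negative numbers. The helper name raw_ret_to_errno is hypothetical.

```c
#include <errno.h>

/* Linux reserves raw return values in [-4095, -1] for error codes. */
#define MAX_ERRNO 4095

/* Hypothetical helper: return the errno encoded in a raw syscall return
 * value, or 0 if the value denotes success. */
int raw_ret_to_errno(long raw)
{
    if (raw < 0 && raw >= -MAX_ERRNO)
        return (int)-raw;
    return 0;
}
```

The range check matters because some successful results, such as addresses returned by mmap, can look negative when read as a signed integer.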

Signals complicate tracing. When a signal is about to be delivered to the tracee, the tracer sees a stop first and must decide whether to deliver the signal or suppress it. If you always suppress signals, you will alter the program’s behavior. If you blindly forward whatever the stop reports, you risk injecting the SIGTRAP that marks a syscall stop as if it were a real signal, which breaks both the tracee and the tracer’s entry/exit state machine. A robust tracer tracks why the process stopped (setting PTRACE_O_TRACESYSGOOD makes syscall stops report SIGTRAP|0x80, so they are easy to tell apart), then resumes with the appropriate signal argument.
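One way to keep signal stops and syscall stops apart is to classify the waitpid status before doing anything else. The sketch below assumes the tracer has set PTRACE_O_TRACESYSGOOD. The enum and function names are hypothetical.

```c
#include <signal.h>
#include <sys/wait.h>

enum stop_kind {
    STOP_EXITED,   /* tracee called exit */
    STOP_KILLED,   /* tracee was terminated by a signal */
    STOP_SYSCALL,  /* syscall entry or exit stop */
    STOP_EVENT,    /* PTRACE_EVENT_* stop (fork, exec, ...) */
    STOP_SIGNAL,   /* a real signal is about to be delivered */
    STOP_OTHER
};

/* Classify a status value returned by waitpid().
 * Assumes PTRACE_O_TRACESYSGOOD is in effect. */
enum stop_kind classify_stop(int status)
{
    if (WIFEXITED(status))
        return STOP_EXITED;
    if (WIFSIGNALED(status))
        return STOP_KILLED;
    if (!WIFSTOPPED(status))
        return STOP_OTHER;
    if (WSTOPSIG(status) == (SIGTRAP | 0x80))
        return STOP_SYSCALL;
    if ((status >> 16) != 0)
        return STOP_EVENT; /* event number lives in the high bits */
    return STOP_SIGNAL;
}

/* Signal number to pass when resuming: forward real signals, and pass 0
 * for syscall and event stops so no SIGTRAP is injected. */
int signal_to_inject(int status)
{
    return classify_stop(status) == STOP_SIGNAL ? WSTOPSIG(status) : 0;
}
```

One caveat for this sketch: when the child is started with PTRACE_TRACEME and no exec event option, the first stop after exec is a plain SIGTRAP. It should be consumed before the loop starts using this classifier, otherwise it would be forwarded as a real signal.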

Tracing introduces overhead, so it should be deterministic and minimal. To keep results stable, use fixed commands and avoid timing-sensitive tests. A good tracer prints syscalls in a consistent order with clear formatting. For example, it can show syscall name (or number), file path for open-like calls, and return values. The goal is not full-featured strace; it is to understand the syscall boundary and confirm your mental model with direct evidence.

The tracer is also a window into library behavior. A single command like ls may invoke dozens of syscalls for dynamic linking, locale loading, and directory traversal. This teaches an important lesson: performance is often dominated by implicit library behavior rather than your own code. By tracing, you can see those hidden steps and optimize or debug based on reality, not assumptions.

How this fits into the project

You will apply this concept in §3.1 to define tracer behavior, in §4.1 to design the control loop, and in §6.2 to create test cases. It also supports P03-build-your-own-shell.md by verifying fork/exec behavior.

Definitions & key terms

ptrace: Kernel interface that allows one process to observe/control another.
Tracee: The process being traced.
Tracer: The process controlling the tracee.
Syscall stop: A trap at syscall entry or exit.
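errno: Error code for a failed syscall. On x86-64 the kernel returns it negated in the result register; libc stores it in errno and returns -1.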

Mental model diagram

Tracer
  |
  | attach
  v
Tracee --(syscall entry)--> stop
  ^                         |
  | resume <----------------+

How it works

Start or attach to a target process.
Wait for a stop event.
Inspect registers and determine stop type.
Log syscall entry or exit data.
Resume tracee and repeat.

Minimal concrete example

Trace log (conceptual):
[pid 32010] openat("/etc/hostname") -> fd=3
[pid 32010] read(fd=3, bytes=64) -> 12
[pid 32010] close(fd=3) -> 0

Common misconceptions

“Tracing shows only my code.” It shows all syscalls, including library activity.
“Syscall numbers are stable across architectures.” They are not.
“Signals are separate from tracing.” Signals and tracing are deeply intertwined.

Check-your-understanding questions

Why does a tracer see two stops per syscall?
How does the tracer know a syscall failed?
Why can a signal stop be confused with a syscall stop?
What makes tracing slow?

Check-your-understanding answers

One stop at entry to read arguments, one at exit to read return values.
The raw return value falls in the range -4095 to -1; negating it gives the errno code.
Both are delivered as stops and require inspection to distinguish.
Each stop requires a context switch and inspection overhead.

Real-world applications

Debugging permission and file access errors.
Profiling syscall-heavy workloads.
Understanding program startup behavior.

Where you’ll apply it

See §3.1 What You Will Build and §4.1 High-Level Design.
Also used in: P03-build-your-own-shell.md

References

syscall(2) man page: https://man7.org/linux/man-pages/man2/syscall.2.html
ptrace(2) man page: https://man7.org/linux/man-pages/man2/ptrace.2.html
“The Linux Programming Interface” - process control chapters

Key insights

Tracing converts invisible kernel behavior into a deterministic log.

Summary

A syscall tracer is a microscope for the user-kernel boundary.

Homework/Exercises to practice the concept

Trace a simple command and count unique syscalls.
Identify which syscalls correspond to file operations.

Solutions to the homework/exercises

Expect a small set of repeated syscalls.
Look for open, read, write, close, and stat-like calls.

3. Project Specification

3.1 What You Will Build

A command-line tool that launches a target program and prints a deterministic log of its syscalls, including return values and error codes. It focuses on clarity over completeness and supports a small set of decoded syscalls.

3.2 Functional Requirements

Attach and trace: Launch or attach to a target process.
Syscall logging: Print syscall name/number and return value.
Signal safety: Continue tracing even when signals arrive.

3.3 Non-Functional Requirements

Performance: Acceptable for short-lived commands.
Reliability: Trace completes without hanging.
Usability: Output is readable and printed in the order the stops are observed.

3.4 Example Usage / Output

$ ./mytrace /bin/echo hello
[pid 4021] write(1, "hello\n") -> 6
[pid 4021] exit_group(0)

3.5 Data Formats / Schemas / Protocols

Syscall log line: [pid] name(args) -> result.
Error formatting: -> -1 (EPERM).
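The two formats above can be produced by a single formatting function. This is a sketch under assumptions: the function names are hypothetical, the errno table covers only a handful of codes for illustration, and the argument string is expected to be pre-formatted by the caller.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>

/* Tiny illustrative table; a real tool would cover many more codes. */
static const char *errno_name(int err)
{
    switch (err) {
    case EPERM:  return "EPERM";
    case ENOENT: return "ENOENT";
    case EINTR:  return "EINTR";
    case EBADF:  return "EBADF";
    case EACCES: return "EACCES";
    case EINVAL: return "EINVAL";
    default:     return NULL;
    }
}

/* Format one log line as "[pid N] name(args) -> result".
 * raw is the raw syscall return value; values in [-4095, -1] are
 * printed as "-1 (ENAME)". Returns what snprintf returns. */
int format_log_line(char *buf, size_t len, pid_t pid,
                    const char *name, const char *args, long raw)
{
    if (raw < 0 && raw >= -4095) {
        const char *ename = errno_name((int)-raw);
        if (ename)
            return snprintf(buf, len, "[pid %d] %s(%s) -> -1 (%s)",
                            (int)pid, name, args, ename);
        return snprintf(buf, len, "[pid %d] %s(%s) -> -1 (errno %ld)",
                        (int)pid, name, args, -raw);
    }
    return snprintf(buf, len, "[pid %d] %s(%s) -> %ld",
                    (int)pid, name, args, raw);
}
```

For example, format_log_line(buf, sizeof buf, 4021, "close", "3", 0) writes [pid 4021] close(3) -> 0 into buf.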

3.6 Edge Cases

Tracee exits before first stop.
Signals delivered during syscall tracing.
Unsupported syscall argument types.

3.7 Real World Outcome

3.7.1 How to Run (Copy/Paste)

Build in project-root and run ./mytrace /bin/ls.
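Tracing a child that you launch yourself normally needs no extra privileges. Attaching to an already running process may be restricted, for example by the Yama ptrace_scope setting.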

3.7.2 Golden Path Demo (Deterministic)

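Trace a fixed command like /bin/true for stable output. The transcript shows only the final line; a dynamically linked binary also issues loader syscalls (execve, mmap, openat, and others) before it.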

3.7.3 If CLI: Exact terminal transcript

$ ./mytrace /bin/true
[pid 5012] exit_group(0)
# exit code: 0

Failure demo (deterministic):

$ ./mytrace /bin/does-not-exist
error: exec failed (ENOENT)
# exit code: 2

Exit codes:

0 success
2 invalid command or exec failure
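To print an error like the one in the failure demo, the tracer has to learn that exec failed in the child. One common technique, shown here as a sketch and not necessarily how this project must do it, is a close-on-exec pipe: the child writes errno to the pipe only if exec fails, and a successful exec closes the pipe silently. The helper name spawn_checked is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork and exec argv. Returns 0 and stores the
 * child's pid in *out if exec succeeded. Otherwise reaps the child and
 * returns the errno from the failing call (for example ENOENT). */
int spawn_checked(char *const argv[], pid_t *out)
{
    int fds[2];
    if (pipe(fds) == -1)
        return errno;
    /* Close both ends automatically on a successful exec. */
    fcntl(fds[0], F_SETFD, FD_CLOEXEC);
    fcntl(fds[1], F_SETFD, FD_CLOEXEC);

    pid_t child = fork();
    if (child < 0) {
        int err = errno;
        close(fds[0]);
        close(fds[1]);
        return err;
    }

    if (child == 0) {
        close(fds[0]);
        /* A tracer would call ptrace(PTRACE_TRACEME, ...) here. */
        execvp(argv[0], argv);
        int err = errno; /* only reached when exec failed */
        ssize_t w = write(fds[1], &err, sizeof err);
        (void)w;
        _exit(127);
    }

    close(fds[1]);
    int err = 0;
    ssize_t n;
    do {
        n = read(fds[0], &err, sizeof err);
    } while (n == -1 && errno == EINTR);
    close(fds[0]);

    if (n == (ssize_t)sizeof err) {
        waitpid(child, NULL, 0); /* reap the failed child */
        return err;
    }
    *out = child; /* EOF on the pipe: exec succeeded */
    return 0;
}
```

A main function built on this could print error: exec failed (ENOENT) and return 2 when spawn_checked reports ENOENT, matching the exit code table above.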

4. Solution Architecture

4.1 High-Level Design

Tracer -> wait loop -> inspect registers -> log -> resume

4.2 Key Components

Component	Responsibility	Key Decisions
Attach logic	Start/attach to tracee	Use child tracing for determinism
Stop decoder	Identify syscall vs signal stop	Inspect wait status
Logger	Format output consistently	Minimal decoding for clarity

4.3 Data Structures (No Full Code)

Trace state: per-PID current phase (entry/exit).
Syscall record: number, args, return value, errno.
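As a starting point, the two structures might look like the sketch below. All type, field, and function names are hypothetical, and the layout is only one reasonable choice.

```c
#include <stdbool.h>
#include <sys/types.h>

/* What the next syscall stop for this PID will mean. */
enum phase { PHASE_ENTRY, PHASE_EXIT };

/* One completed (or in-progress) syscall. */
struct syscall_record {
    long number;                /* syscall number read at entry */
    unsigned long long args[6]; /* raw argument registers */
    long ret;                   /* raw return value read at exit */
    int err;                    /* errno, or 0 on success */
};

/* Per-PID tracing state. */
struct trace_state {
    pid_t pid;
    enum phase next;
    struct syscall_record current;
};

void trace_state_init(struct trace_state *st, pid_t pid)
{
    st->pid = pid;
    st->next = PHASE_ENTRY; /* the first syscall stop is an entry */
    st->current = (struct syscall_record){0};
}

/* Call once per syscall stop. Returns true when the stop just handled
 * was an exit stop, meaning the record is complete and can be logged. */
bool trace_state_advance(struct trace_state *st)
{
    bool was_exit = (st->next == PHASE_EXIT);
    st->next = was_exit ? PHASE_ENTRY : PHASE_EXIT;
    return was_exit;
}
```

Keeping the phase per PID, not in a single global flag, is what makes the "Follow child processes" extension possible later.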

4.4 Algorithm Overview

Key Algorithm: syscall tracing loop

Wait for stop event.
Determine stop type.
Read syscall number or return value.
Log and resume.

Complexity Analysis:

Time: O(n) for n syscalls.
Space: O(p) for p traced processes.

5. Implementation Guide

5.1 Development Environment Setup

# Install a C compiler and strace for comparison (Debian/Ubuntu shown; adapt to your distribution)
sudo apt-get install gcc strace

5.2 Project Structure

project-root/
├── src/
│   ├── tracer.c
│   └── decode.c
├── tests/
│   └── trace_tests.sh
└── README.md

5.3 The Core Question You’re Answering

“What does the kernel actually see when my program runs?”

5.4 Concepts You Must Understand First

Syscall ABI
- Where do syscall numbers and args live?
- Book Reference: “The Linux Programming Interface” - Ch. 3-4
Process stops
- How does a tracer receive stop events?
- Book Reference: “Advanced Programming in the UNIX Environment” - process control
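For the first question, here is the answer for one architecture. The sketch assumes x86-64 Linux, where a register snapshot obtained with ptrace(PTRACE_GETREGS, pid, NULL, &regs) holds the syscall number in orig_rax, the arguments in rdi, rsi, rdx, r10, r8, r9, and the return value in rax. The struct and function names are hypothetical; other architectures use different registers.

```c
#if !defined(__x86_64__)
#error "This sketch assumes the x86-64 Linux syscall ABI"
#endif

#include <sys/user.h>

struct syscall_args {
    long number;
    unsigned long long arg[6];
};

/* Decode a register snapshot taken at a syscall-entry stop. */
void decode_entry(const struct user_regs_struct *regs,
                  struct syscall_args *out)
{
    out->number = (long)regs->orig_rax; /* rax itself is clobbered at entry */
    out->arg[0] = regs->rdi;
    out->arg[1] = regs->rsi;
    out->arg[2] = regs->rdx;
    out->arg[3] = regs->r10; /* r10, not rcx, for the syscall instruction */
    out->arg[4] = regs->r8;
    out->arg[5] = regs->r9;
}

/* Decode a register snapshot taken at a syscall-exit stop. */
long decode_return(const struct user_regs_struct *regs)
{
    return (long)regs->rax;
}
```

Keeping the decoding separate from the ptrace call makes it possible to unit test it with hand-built register values.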

5.5 Questions to Guide Your Design

How will you pair syscall entry and exit stops?
How will you display unsupported syscalls?

5.6 Thinking Exercise

Tracing State Machine

Draw the state machine for a tracee alternating between entry and exit stops.

5.7 The Interview Questions They’ll Ask

“How does strace work internally?”
“What is ptrace used for besides tracing?”
“How do you detect syscall failure?”
“Why is tracing slow?”

5.8 Hints in Layers

Hint 1: Start with syscall numbers. Log only numbers first, then add names.

Hint 2: Alternate stops. Track whether you are at entry or exit for each PID.

Hint 3: Signals. Pass real signals through to avoid changing program behavior, but do not re-inject the SIGTRAP that marks a syscall stop.

Hint 4: Debugging. Compare output with strace -f on the same command.

5.9 Books That Will Help

Topic	Book	Chapter
Syscalls	“The Linux Programming Interface”	Ch. 3-6
Process control	“Advanced Programming in the UNIX Environment”	Ch. 8

5.10 Implementation Phases

Phase 1: Foundation (1-2 days)

Goals:

Launch a child process under trace.
Log syscall numbers.

Tasks:

Implement attach and wait loop.
Print syscall number at each stop.

Checkpoint: ./mytrace /bin/true prints at least one syscall.

Phase 2: Core Functionality (2-3 days)

Goals:

Decode return values and errors.

Tasks:

Detect entry vs exit stops.
Format output with return values.

Checkpoint: Log shows success and failure codes.

Phase 3: Polish & Edge Cases (1-2 days)

Goals:

Handle signals and process exit.

Tasks:

Forward signals properly.
Exit cleanly on tracee termination.

Checkpoint: Tracer never hangs after tracee exits.

5.11 Key Implementation Decisions

Decision	Options	Recommendation	Rationale
Trace mode	Attach vs launch	Launch child	Deterministic output
Decoding depth	Full decode vs minimal	Minimal	Keep scope manageable

6. Testing Strategy

6.1 Test Categories

Category	Purpose	Examples
Unit Tests	Validate formatting	Parse log line output
Integration Tests	Trace real commands	/bin/true, /bin/ls
Edge Case Tests	Signal handling	Send SIGINT during trace

6.2 Critical Test Cases

Trace /bin/true: prints a deterministic exit syscall.
Trace missing binary: returns exec error and exit code 2.
Signal delivery: trace continues after SIGINT.

6.3 Test Data

Expected exit syscall for /bin/true: exit_group(0)

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall	Symptom	Solution
Wrong stop handling	Missing syscalls	Track entry/exit per PID
Signal suppression	Tracee behavior changes	Forward signals
Unsupported arch	Nonsense output	Verify ABI for your arch

7.2 Debugging Strategies

Compare with strace: Use the same command to confirm.
Print raw registers: Confirm syscall numbers when unsure.

7.3 Performance Traps

Tracing adds overhead; avoid using it in tight performance benchmarks.

8. Extensions & Challenges

8.1 Beginner Extensions

Add syscall name lookup for common calls.
Filter by syscall number.
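A name lookup for common calls can start as a small table built from the SYS_* constants in <sys/syscall.h>, which keeps the numbers correct for whatever architecture you compile on. The table contents and the function name syscall_name are illustrative choices, not a required design.

```c
#include <stddef.h>
#include <sys/syscall.h>

struct name_entry {
    long number;
    const char *name;
};

/* A few common calls; extend as needed. */
static const struct name_entry known_syscalls[] = {
    { SYS_read,       "read" },
    { SYS_write,      "write" },
    { SYS_openat,     "openat" },
    { SYS_close,      "close" },
    { SYS_execve,     "execve" },
    { SYS_exit_group, "exit_group" },
};

/* Return the name for a syscall number, or NULL if it is not in the
 * table (the caller can then print the raw number instead). */
const char *syscall_name(long number)
{
    size_t count = sizeof known_syscalls / sizeof known_syscalls[0];
    for (size_t i = 0; i < count; i++) {
        if (known_syscalls[i].number == number)
            return known_syscalls[i].name;
    }
    return NULL;
}
```

The same table can drive a filter: skip logging whenever the number read at the entry stop does not match the one requested on the command line.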

8.2 Intermediate Extensions

Decode open paths and file descriptors.
Follow child processes.

8.3 Advanced Extensions

Add time spent per syscall.
Output in JSON for tooling integration.
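Both advanced extensions can share one small helper pair: take a timestamp with clock_gettime(CLOCK_MONOTONIC, ...) at the entry stop and again at the exit stop, then emit the difference in a structured line. The function names and JSON field names below are hypothetical, and the duration includes tracing overhead, so treat it as a rough indicator only.

```c
#include <stdio.h>
#include <sys/types.h>
#include <time.h>

/* Nanoseconds between two timestamps, e.g. from
 * clock_gettime(CLOCK_MONOTONIC, &ts). */
long long elapsed_ns(const struct timespec *start,
                     const struct timespec *end)
{
    return (long long)(end->tv_sec - start->tv_sec) * 1000000000LL
         + (end->tv_nsec - start->tv_nsec);
}

/* Write one syscall as a single-line JSON object.
 * name must not contain characters that need JSON escaping;
 * syscall names are plain identifiers, so that holds here. */
int format_json_line(char *buf, size_t len, pid_t pid,
                     const char *name, long ret, long long duration_ns)
{
    return snprintf(buf, len,
                    "{\"pid\":%d,\"syscall\":\"%s\",\"ret\":%ld,"
                    "\"duration_ns\":%lld}",
                    (int)pid, name, ret, duration_ns);
}
```

One object per line keeps the output easy to process with line-oriented tools.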

9. Real-World Connections

9.1 Industry Applications

Debugging tools: strace, ltrace, and profilers.
Security: auditing unexpected syscalls.

9.2 Related Open Source Projects

strace: https://strace.io/ - full-featured syscall tracer.
gdb: https://www.gnu.org/software/gdb/ - uses ptrace under the hood.

9.3 Interview Relevance

Tracing and syscall knowledge are common systems interview topics.

10. Resources

10.1 Essential Reading

“The Linux Programming Interface” - syscalls and process control
ptrace(2) man page

10.2 Video Resources

“Linux syscalls and tracing” - conference talks (search title)

10.3 Tools & Documentation

strace: https://strace.io/ - reference output
man7.org: https://man7.org/linux/man-pages/man2/ptrace.2.html

11. Self-Assessment Checklist

11.1 Understanding

I can explain how ptrace stops a process at syscall boundaries
I can explain how errno is derived from return values
I understand signal interactions with tracing

11.2 Implementation

All functional requirements are met
All test cases pass
Output is deterministic and readable

11.3 Growth

I can explain this project in an interview
I documented lessons learned
I can identify a next feature to add

12. Submission / Completion Criteria

Minimum Viable Completion:

Trace a short command and log syscalls
Handle tracee exit without hanging
Produce deterministic output

Full Completion:

All minimum criteria plus:
Support a small set of decoded syscall names
Include a failure demo with exit codes

Excellence (Going Above & Beyond):

Follow child processes
Export structured output for analysis
