Project 1: Syscall Tracer (strace-lite)
Build a minimal strace-style tool that intercepts syscalls, decodes arguments, and reports return values and latency for a target process and its threads.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 3: Advanced |
| Time Estimate | 2-3 weeks |
| Main Programming Language | C |
| Alternative Programming Languages | Rust, Go |
| Coolness Level | Level 3: Clever |
| Business Potential | Level 3: Support / Observability tooling |
| Prerequisites | C pointers, POSIX processes, Linux signals, reading man pages |
| Key Topics | Syscall ABI, ptrace, register decoding, process lifecycle, signal handling |
1. Learning Objectives
By completing this project, you will:
- Explain the syscall ABI on x86-64 and map syscall numbers to names and argument registers.
- Attach to running processes and threads using ptrace and follow forks/execs safely.
- Decode syscall arguments by reading registers and user memory without crashing the traced process.
- Measure syscall latency with a monotonic clock and produce deterministic summary stats.
- Build a filtering and formatting layer that mirrors core strace features.
- Describe overhead sources and trade-offs in tracing tools.
2. All Theory Needed (Per-Concept Breakdown)
2.1 Syscall ABI and Kernel Entry Path
Fundamentals
A system call is a privileged transition from user mode to kernel mode. On x86-64 Linux, user code places a syscall number in rax, arguments in rdi, rsi, rdx, r10, r8, r9, and then executes the syscall instruction. The CPU switches privilege levels, the kernel validates the request, dispatches to the syscall table, and returns a value in rax. Understanding this contract is the baseline for decoding arguments and identifying the syscall being executed.
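The register contract above can be sketched as a small helper that turns a register snapshot into a syscall number plus six arguments. This is a minimal sketch for x86-64 Linux only; the helper name extract_syscall is hypothetical, and the snapshot is assumed to come from something like ptrace(PTRACE_GETREGS, ...).

```c
#include <sys/user.h>   /* struct user_regs_struct (x86-64 layout) */

/* Copy the syscall number and its six arguments out of a register snapshot.
 * orig_rax is used for the number because rax is overwritten with the
 * return value by the time the syscall exits. */
static long extract_syscall(const struct user_regs_struct *regs,
                            unsigned long long args[6])
{
    args[0] = regs->rdi;
    args[1] = regs->rsi;
    args[2] = regs->rdx;
    args[3] = regs->r10;   /* r10, not rcx: the syscall instruction clobbers rcx */
    args[4] = regs->r8;
    args[5] = regs->r9;
    return (long)regs->orig_rax;
}
```

Keeping this mapping in one function means that supporting another architecture later only requires a second version of this helper.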
Deep Dive into the concept
The syscall ABI is intentionally minimal: fixed registers, fixed instruction, fixed return path. The kernel entry stub saves registers, switches to a kernel stack, and routes to the appropriate syscall handler. Return values use negative numbers to encode errno (e.g., -ENOENT), which libc converts into -1 and sets errno. Syscalls can be restarted after signals (ERESTARTSYS, ERESTARTNOHAND), which is why you may see repeated syscalls in traces. Some calls are handled in the vDSO or libc without a kernel transition; your tracer only sees actual syscalls. The ABI is architecture-specific, so your decoder must be correct for the target (x86-64 here), and must be extensible if you later support other architectures.
How this fits in the project
You use the ABI to translate register values into syscall arguments, to map syscall numbers to names, and to interpret error returns in Section 3.4 and Section 5.10 Phase 1.
Definitions & key terms
- syscall ABI -> register and instruction contract for entering the kernel
- syscall number -> numeric index into the syscall table
- syscall instruction -> x86-64 instruction that triggers kernel entry
- errno -> error code set by libc when syscalls return negative values
- vDSO -> user-space shared page with fast syscalls (e.g., gettimeofday)
Mental model diagram (ASCII)
user code              CPU              kernel
---------              -----            ---------
rax=SYS_openat    ->   syscall     ->   save regs -> dispatch -> return rax
rdi=AT_FDCWD           (ring3->0)       sys_openat()
rsi="/etc/hosts"
How it works (step-by-step)
- User sets registers with syscall number + args.
- CPU executes syscall and switches to kernel mode.
- Kernel entry stub switches to the kernel stack, saves registers, and validates the syscall number.
- Kernel dispatches to the handler (e.g., sys_openat).
- Handler returns success or negative error.
- Return path restores registers and returns to user mode.
Minimal concrete example
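#include <fcntl.h>        /* AT_FDCWD, O_RDONLY */
#include <sys/syscall.h>  /* SYS_openat */
#include <unistd.h>       /* syscall() */
/* Inside a function: syscall() returns -1 and sets errno on failure. */
long fd = syscall(SYS_openat, AT_FDCWD, "/etc/hosts", O_RDONLY, 0);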
Common misconceptions
- Misconception: libc functions are always syscalls. Correction: many libc functions are wrappers or entirely user-space.
- Misconception: errno is a kernel variable. Correction: the kernel returns negative values; libc sets errno.
Check-your-understanding questions
- Which register holds the syscall number on x86-64?
- Why might gettimeofday() not appear in your trace?
- Predict what happens if rax contains an invalid syscall number.
- Explain why -ENOENT appears as -2 in raw return values.
Check-your-understanding answers
- rax.
- It may be served by vDSO without a syscall.
- The kernel returns -ENOSYS.
- ENOENT is 2; the kernel returns the negated errno value (-2), and libc negates it back before storing it in errno.
Real-world applications
- Production tracing tools (strace, perf trace)
- Sandboxing and syscall filtering (seccomp)
- Debugging stuck or misbehaving processes
Where you’ll apply it
- This project: Section 3.2 Functional Requirements, Section 4.4 Algorithm Overview, Section 5.10 Phase 1.
- Also used in: P03-process-scheduler-visualization-tool for understanding context switches and P06-userspace-thread-library-green-threads for ABI correctness.
References
- “Computer Systems: A Programmer’s Perspective” (Bryant & O’Hallaron), Ch. 3
- “The Linux Programming Interface” (Kerrisk), Syscalls chapters
Key insights
The syscall ABI is the unchanging contract that makes tracing possible.
Summary
Syscalls are a precise register-level handshake with the kernel. If you can decode that handshake, you can observe almost any program’s behavior.
Homework/Exercises to practice the concept
- Write a tiny program that makes a raw syscall and prints the raw return value.
- Use strace -e trace=openat and map register values to arguments.
Solutions to the homework/exercises
- Use syscall(SYS_getpid) and print the value; it should match getpid().
- Compare strace -v output with manual register mapping from ABI docs.
2.2 ptrace Lifecycle and Event Handling
Fundamentals
ptrace lets one process control and observe another. A tracer attaches, stops the target, inspects registers and memory, and then continues it. The tracer receives events for syscalls, signals, forks, and execs. For a syscall tracer, the basic loop is “wait for stop -> read registers -> decide -> continue”.
Deep Dive into the concept
ptrace uses waitpid()-style stops to synchronize tracer and tracee. PTRACE_SYSCALL requests two stops per syscall: entry and exit. To follow threads and forks, enable options such as PTRACE_O_TRACECLONE, PTRACE_O_TRACEFORK, and PTRACE_O_TRACEEXEC; the kernel then attaches new threads and children automatically, and the tracer must start tracking each new TID as it appears. Threads that already exist when you attach to a running process are not covered by these options and must be attached individually. Signals complicate the event stream: a signal stop is distinct from a syscall stop, and you must forward the signal or suppress it. The tracer must avoid deadlocks by always continuing stopped tracees and by handling waitpid for every thread. If you forget to resume a thread, the target hangs.
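Classifying a wait status is the step where most of this complexity shows up, so a sketch may help. It assumes PTRACE_O_TRACESYSGOOD is set and deliberately ignores group-stops and PTRACE_SEIZE-specific events; the enum and function names are hypothetical.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

enum stop_kind {
    STOP_EXITED,    /* tracee is gone (normal exit or fatal signal) */
    STOP_SYSCALL,   /* syscall entry or exit (needs PTRACE_O_TRACESYSGOOD) */
    STOP_EVENT,     /* PTRACE_EVENT_* stop: clone, fork, exec, ... */
    STOP_SIGNAL,    /* signal stop: forward the signal on resume */
    STOP_UNKNOWN
};

/* Classify a waitpid() status. For STOP_EVENT, *detail is the event code;
 * for STOP_SIGNAL, it is the signal number; otherwise it is 0. */
static enum stop_kind classify_stop(int status, int *detail)
{
    *detail = 0;
    if (WIFEXITED(status) || WIFSIGNALED(status))
        return STOP_EXITED;
    if (!WIFSTOPPED(status))
        return STOP_UNKNOWN;

    int sig = WSTOPSIG(status);
    if (sig == (SIGTRAP | 0x80))
        return STOP_SYSCALL;

    int event = (int)((unsigned)status >> 16);
    if (sig == SIGTRAP && event != 0) {
        *detail = event;
        return STOP_EVENT;
    }
    *detail = sig;
    return STOP_SIGNAL;
}
```

For STOP_SIGNAL, passing the signal number as the last argument of the next PTRACE_SYSCALL call forwards it to the tracee; passing 0 suppresses it.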
How this fits in the project
This is the control loop for the tracer core in Section 5.10 Phase 2 and Section 6 testing.
Definitions & key terms
- tracer -> process that uses ptrace to control another
- tracee -> target process being traced
- stop -> event where tracee is paused and tracer is notified
- PTRACE_SYSCALL -> stop on syscall entry and exit
- PTRACE_O_TRACECLONE -> option to receive thread creation events
Mental model diagram (ASCII)
tracer kernel tracee
| ptrace attach -> stop tracee <- signal/stop
| waitpid <- status
| getregs ->
| ptrace syscall -> resume -> executes
How it works (step-by-step)
- Attach (PTRACE_ATTACH) or launch (PTRACE_TRACEME).
- Wait for initial stop and set options.
- Loop: wait for stop -> classify stop -> read regs.
- On syscall entry, decode args; on exit, decode return.
- Resume with PTRACE_SYSCALL, forwarding signals as needed.
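The steps above can be sketched as a launch-mode loop for a single-threaded child. This is an illustration rather than a full tracer: it only counts syscall stops, does not read registers, and does not follow clones or forks. The function name count_syscall_stops is hypothetical, and PTRACE_O_EXITKILL is added as an assumption so the child does not outlive a crashed tracer.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Launch argv[0] under ptrace and count syscall stops (two per completed
 * syscall). Returns the count, or -1 if the child could not be traced. */
static long count_syscall_stops(char *const argv[])
{
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(127);
        raise(SIGSTOP);              /* let the tracer set options before exec */
        execvp(argv[0], argv);
        _exit(127);
    }

    int status;
    if (waitpid(child, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;                   /* child died before the initial stop */
    ptrace(PTRACE_SETOPTIONS, child, NULL,
           (void *)(long)(PTRACE_O_TRACESYSGOOD | PTRACE_O_EXITKILL));

    long stops = 0;
    long inject = 0;                 /* signal to deliver on the next resume */
    for (;;) {
        if (ptrace(PTRACE_SYSCALL, child, NULL, (void *)inject) == -1)
            break;
        inject = 0;
        if (waitpid(child, &status, 0) < 0)
            break;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            break;
        int sig = WSTOPSIG(status);
        if (sig == (SIGTRAP | 0x80))
            stops++;                 /* syscall entry or exit */
        else if (sig != SIGTRAP)
            inject = sig;            /* signal stop: forward it */
    }
    return stops;
}
```

Plain SIGTRAP stops (such as the one reported after a successful execve when PTRACE_O_TRACEEXEC is not set) are swallowed here rather than forwarded, since delivering SIGTRAP would normally kill the child.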
Minimal concrete example
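#include <sys/ptrace.h>
#include <sys/wait.h>
int status;                                                  /* pid is the target process ID */
ptrace(PTRACE_ATTACH, pid, 0, 0);                            /* tracee receives SIGSTOP */
waitpid(pid, &status, 0);                                    /* wait for the attach stop */
ptrace(PTRACE_SETOPTIONS, pid, 0, PTRACE_O_TRACESYSGOOD);    /* mark syscall stops with SIGTRAP | 0x80 */
ptrace(PTRACE_SYSCALL, pid, 0, 0);                           /* run to the next syscall entry or exit */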
Common misconceptions
- Misconception: PTRACE_SYSCALL only stops once per syscall. Correction: it stops twice, at entry and at exit.
- Misconception: tracing a PID traces all threads automatically. Correction: you must attach to each thread or use clone events.
Check-your-understanding questions
- Why does PTRACE_O_TRACESYSGOOD matter?
- How do you distinguish syscall stops from signal stops?
- What happens if you fail to resume a tracee?
Check-your-understanding answers
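- It sets bit 7 of the reported stop signal (SIGTRAP | 0x80), so a syscall stop cannot be confused with a real SIGTRAP.
- Check that WIFSTOPPED(status) is true and compare WSTOPSIG(status) with SIGTRAP | 0x80; any other stop signal is a signal stop or a ptrace event.
- The tracee stays stopped; the program appears hung.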
Real-world applications
- Debuggers (gdb, lldb)
- Sandboxing and tracing security tools
- Process introspection in profilers
Where you’ll apply it
- This project: Section 4.4 Algorithm Overview, Section 5.10 Phase 2.
- Also used in: P07-interrupt-latency-profiler for tracing event pipelines.
References
- ptrace(2) man page
- “The Linux Programming Interface”, tracing chapters
Key insights
Tracing is a synchronized stop-and-resume dance: miss a step and the target freezes.
Summary
ptrace provides full control over a process, but only if you handle the event stream correctly.
Homework/Exercises to practice the concept
- Write a tracer that only prints syscall numbers for ls.
- Attach to a running sleep process and verify it stops and resumes.
Solutions to the homework/exercises
- Use PTRACE_SYSCALL, read regs.orig_rax, print it.
- Attach, waitpid, then resume; verify ps state changes.
2.3 Safe Argument Decoding and User Memory Reads
Fundamentals
Syscall arguments live in registers, but many are pointers to user memory (strings, buffers, structs). You cannot dereference those pointers directly in the tracer; you must read from the tracee’s memory using process_vm_readv or ptrace(PTRACE_PEEKDATA).
Deep Dive into the concept
Decoding arguments safely requires bounding reads and handling faults. Strings are NUL-terminated, but you must cap reads to avoid unbounded memory access. For arrays and structs, you need size information from syscall prototypes. On x86-64, argument values are in registers; the tracer interprets those raw values as addresses. If the tracee unmaps memory between entry and exit, a read can fail, so your tracer must degrade gracefully (print ? or <fault>). You must also consider that some arguments are in/out (e.g., read writes into a buffer) and can only be safely printed on syscall exit.
How this fits in the project
This is required to decode openat, read, write, execve in Section 3.2 and Section 5.10 Phase 2.
Definitions & key terms
- user pointer -> address in tracee’s virtual memory
- process_vm_readv -> efficient user memory read syscall
- PTRACE_PEEKDATA -> word-at-a-time read through ptrace
- in/out parameter -> argument updated by kernel (e.g., read buffer)
Mental model diagram (ASCII)
Tracer addr space Tracee addr space
------------------ -----------------
ptr=0x7f... --read--> ["/etc/hosts\0"]
How it works (step-by-step)
- Read register holding pointer argument.
- Use process_vm_readv to copy bytes into tracer buffer.
- Cap size (e.g., 256 bytes) and ensure NUL termination.
- If read fails, print <fault> and continue.
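A bounded string reader following these steps might look like the sketch below. One detail it adds as an assumption: each read is limited to the current page, because a fixed-size read that runs past a short string into an unmapped page can fail as a whole. The helper names bytes_in_page and read_tracee_string are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* exposes process_vm_readv in <sys/uio.h> */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Bytes from addr up to the end of its page, capped at `want`. */
static size_t bytes_in_page(unsigned long addr, size_t want, size_t page_size)
{
    size_t left = page_size - (addr & (page_size - 1));
    return want < left ? want : left;
}

/* Copy a NUL-terminated string from the tracee into out[cap].
 * Returns the string length (truncated to cap - 1 if needed), or -1 if
 * nothing could be read. */
static ssize_t read_tracee_string(pid_t pid, unsigned long addr,
                                  char *out, size_t cap)
{
    if (cap == 0)
        return -1;
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = 0;
    while (len < cap - 1) {
        size_t chunk = bytes_in_page(addr + len, cap - 1 - len, page);
        struct iovec local  = { .iov_base = out + len, .iov_len = chunk };
        struct iovec remote = { .iov_base = (void *)(addr + len), .iov_len = chunk };
        ssize_t n = process_vm_readv(pid, &local, 1, &remote, 1, 0);
        if (n <= 0) {
            if (len == 0)
                return -1;      /* fault on the first byte: caller prints <fault> */
            break;              /* keep the partial string */
        }
        char *nul = memchr(out + len, '\0', (size_t)n);
        if (nul)
            return nul - out;   /* terminator already copied into out */
        len += (size_t)n;
    }
    out[len] = '\0';
    return (ssize_t)len;
}
```

On a return of -1 the caller can print <fault> and continue, which keeps a bad pointer in the tracee from affecting the tracer.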
Minimal concrete example
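#define _GNU_SOURCE                 /* must precede all includes; exposes process_vm_readv */
#include <sys/uio.h>
char buf[256];                      /* pid and addr come from the tracer: target TID and pointer argument */
struct iovec local = { .iov_base = buf, .iov_len = sizeof buf };
struct iovec remote = { .iov_base = (void *)addr, .iov_len = sizeof buf };
ssize_t n = process_vm_readv(pid, &local, 1, &remote, 1, 0);   /* bytes copied, or -1 on fault */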
Common misconceptions
- Misconception: you can read user pointers directly. Correction: the pointer is in another address space.
- Misconception: you can always read buffers on syscall entry. Correction: output buffers are only valid after syscall exit.
Check-your-understanding questions
- Why is process_vm_readv safer than repeated PTRACE_PEEKDATA?
- When should you print the buffer for read(fd, buf, n)?
- What should you do if a read fails?
Check-your-understanding answers
- It is faster and handles bulk reads without word-by-word loops.
- On syscall exit, after the kernel has filled the buffer.
- Print a placeholder and continue; do not crash.
Real-world applications
- Debuggers that show memory buffers
- Security tools inspecting arguments for policy enforcement
Where you’ll apply it
- This project: Section 3.2 Functional Requirements, Section 5.10 Phase 2.
- Also used in: P02-memory-allocator-malloc-free-from-scratch for inspecting heap layouts.
References
- process_vm_readv(2) man page
- “The Linux Programming Interface”, memory access chapters
Key insights
Your tracer is only as trustworthy as your argument decoder.
Summary
Safe decoding means bounded reads, correct timing (entry vs exit), and graceful failure.
Homework/Exercises to practice the concept
- Implement string decoding for openat and cap at 128 bytes.
- Print read buffers on syscall exit and compare with actual file contents.
Solutions to the homework/exercises
- Use process_vm_readv and stop at NUL or 128 bytes.
- Store entry args; on exit, read the buffer and print hex + ASCII.
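For the second solution, the printing half can be sketched independently of the memory read. The helper below renders a byte buffer as a quoted string with \xNN escapes for non-printable bytes, similar in spirit to how strace shows buffers; the name format_buffer and the exact escaping rules are assumptions, not a required format.

```c
#include <stddef.h>
#include <stdio.h>

/* Render up to `cap` bytes of buf as a quoted string: printable ASCII is
 * copied, everything else becomes \xNN. "..." is appended when bytes were
 * left out. Returns the number of characters written (excluding the NUL). */
static size_t format_buffer(const unsigned char *buf, size_t n, size_t cap,
                            char *out, size_t outsz)
{
    if (outsz < 8) {                 /* not enough room for even "\"\"..." */
        if (outsz)
            out[0] = '\0';
        return 0;
    }
    size_t shown = n < cap ? n : cap;
    size_t pos = 0;
    size_t i = 0;
    out[pos++] = '"';
    /* Reserve room for one escape (4), the closing quote, "..." and NUL. */
    for (; i < shown && pos + 8 < outsz; i++) {
        unsigned char c = buf[i];
        if (c >= 0x20 && c < 0x7f && c != '"' && c != '\\')
            out[pos++] = (char)c;
        else
            pos += (size_t)snprintf(out + pos, outsz - pos, "\\x%02x", c);
    }
    out[pos++] = '"';
    if (i < n)
        pos += (size_t)snprintf(out + pos, outsz - pos, "...");
    out[pos] = '\0';
    return pos;
}
```

Because the cap is applied before formatting, the cost per event stays bounded even when the tracee reads or writes very large buffers.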
3. Project Specification
3.1 What You Will Build
A CLI tool, syscall-tracer, that attaches to a PID or launches a program and emits a live stream of syscall events with arguments, return values, and latency. It supports filtering (by syscall name, PID, thread), summary reports, and deterministic output mode for testing.
3.2 Functional Requirements
- Attach or launch: -p <pid> to attach, -- <cmd> to launch.
- Syscall stream: print syscall entry/exit with name, args, return, latency.
- Thread-aware: trace all threads and fork/exec children.
- Filtering: --filter=openat,read and --pid=....
- Summary mode: counts, error rates, total/avg latency.
- Output formats: text (default) and JSON (--json).
- Deterministic mode: --fixed-ts uses monotonic timestamps and fixed seed.
- Graceful shutdown: Ctrl-C detaches and resumes tracees.
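As one possible reading of the filtering requirement, the value of --filter can be kept as the raw comma-separated string and checked per event. The sketch below assumes exact, case-sensitive name matching; the function name filter_matches is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if `name` appears as a whole item in the comma-separated
 * list `csv` (e.g. "openat,read"). A NULL or empty list matches everything. */
static bool filter_matches(const char *csv, const char *name)
{
    if (csv == NULL || *csv == '\0')
        return true;
    size_t want = strlen(name);
    for (const char *p = csv; *p != '\0'; ) {
        size_t len = strcspn(p, ",");      /* length of the current item */
        if (len == want && strncmp(p, name, len) == 0)
            return true;
        p += len;
        if (*p == ',')
            p++;
    }
    return false;
}
```

Comparing whole items avoids the substring trap where a filter of read would also match readv or pread64. For high syscall rates, resolving the names to syscall numbers once at startup would avoid string comparisons on the hot path.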
3.3 Non-Functional Requirements
- Performance: must handle 5k+ syscalls/sec with <10% added latency in a synthetic test.
- Reliability: never crash the target process; detaches cleanly.
- Usability: clear error messages, consistent output fields, documented filters.
3.4 Example Usage / Output
$ sudo ./syscall-tracer --fixed-ts -p 4242
[000000.001234] pid=4242 tid=4242 openat(AT_FDCWD, "/etc/hosts", O_RDONLY) = 3 (0.000081s)
[000000.001400] pid=4242 tid=4242 read(3, 0x7ffc... , 64) = 64 (0.000020s)
[000000.001450] pid=4242 tid=4242 close(3) = 0 (0.000004s)
3.5 Data Formats / Schemas / Protocols
Text line format
[timestamp] pid=<pid> tid=<tid> <syscall>(<args>) = <ret> (<latency>s)
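A sketch of a formatter for this line layout is shown below. It assumes timestamps and latencies are stored in nanoseconds (as in the JSON fields ts_ns and latency_ns) and printed as seconds with microsecond precision; the function name format_text_line is hypothetical, and the argument string is expected to be decoded already.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Render one event in the text line format. Returns what snprintf returns:
 * the length the full line needs, which may exceed outsz on truncation. */
static int format_text_line(char *out, size_t outsz, uint64_t ts_ns,
                            int pid, int tid, const char *name,
                            const char *args, long ret, uint64_t latency_ns)
{
    return snprintf(out, outsz,
                    "[%06" PRIu64 ".%06" PRIu64 "] pid=%d tid=%d %s(%s) = %ld "
                    "(%" PRIu64 ".%06" PRIu64 "s)",
                    ts_ns / 1000000000u, (ts_ns / 1000u) % 1000000u,
                    pid, tid, name, args, ret,
                    latency_ns / 1000000000u, (latency_ns / 1000u) % 1000000u);
}
```

Using integer arithmetic instead of floating point keeps the output byte-for-byte stable, which matters for golden-file tests.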
JSON format
{
"ts_ns": 1234000,
"pid": 4242,
"tid": 4242,
"syscall": "openat",
"args": ["AT_FDCWD", "/etc/hosts", "O_RDONLY"],
"ret": 3,
"errno": 0,
"latency_ns": 81000
}
3.6 Edge Cases
- Target process exits while tracing.
- Multi-threaded process spawns threads during attach.
- Syscall arguments are invalid pointers.
- Syscalls interrupted by signals and restarted.
- Large buffers for read/write.
3.7 Real World Outcome
3.7.1 How to Run (Copy/Paste)
make
sudo ./syscall-tracer --fixed-ts -p 4242
sudo ./syscall-tracer --json --filter=openat,read -- ./demo
3.7.2 Golden Path Demo (Deterministic)
- Use --fixed-ts and --seed 42 to freeze timestamps and ordering for tests.
3.7.3 CLI Transcript (Success + Failure)
$ sudo ./syscall-tracer --fixed-ts --seed 42 -- ./demo
[000000.000100] pid=9001 tid=9001 execve("./demo", ["./demo"], ...) = 0 (0.000200s)
[000000.000500] pid=9001 tid=9001 openat(AT_FDCWD, "/etc/hosts", O_RDONLY) = 3 (0.000080s)
[000000.000700] pid=9001 tid=9001 close(3) = 0 (0.000004s)
Summary: openat=1 close=1 errors=0
$ sudo ./syscall-tracer -p 999999
error: pid 999999 not found
exit code: 2
3.7.4 Exit Codes
- 0: success
- 1: internal error (decode/IO)
- 2: invalid arguments / pid not found
4. Solution Architecture
4.1 High-Level Design
+------------------+
| CLI / Filters |
+--------+---------+
|
v
+--------+---------+ +------------------+
| ptrace Controller|<--->| Tracee Threads |
+--------+---------+ +------------------+
|
v
+--------+---------+
| Decoder & Format |
+--------+---------+
|
v
+------------------+
| Summary Engine |
+------------------+
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Controller | attach/launch, wait/continue loop | use PTRACE_SYSCALL + options |
| Decoder | map syscall numbers and args | x86-64 ABI only for v1 |
| Formatter | text/JSON output | stable field order for tests |
| Summary | aggregate counts/latency | per-syscall hash map |
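The Summary row can be illustrated with a small aggregation sketch. It swaps the per-syscall hash map for an array indexed by syscall number, which is simpler and works as long as numbers stay below a fixed bound; MAX_SYSNO and all names here are assumptions for illustration.

```c
#include <stdint.h>

#define MAX_SYSNO 1024   /* assumption: comfortably above current x86-64 syscall numbers */

struct sys_stat {
    uint64_t calls;
    uint64_t errors;
    uint64_t total_ns;
};

static struct sys_stat summary[MAX_SYSNO];

/* Record one completed syscall. raw_ret is the raw kernel return value,
 * so values in [-4095, -1] count as errors. */
static void summary_add(long sysno, long raw_ret, uint64_t latency_ns)
{
    if (sysno < 0 || sysno >= MAX_SYSNO)
        return;
    struct sys_stat *s = &summary[sysno];
    s->calls++;
    if (raw_ret < 0 && raw_ret >= -4095)
        s->errors++;
    s->total_ns += latency_ns;
}

/* Average latency in nanoseconds; 0 when the syscall was never seen. */
static uint64_t summary_avg_ns(long sysno)
{
    if (sysno < 0 || sysno >= MAX_SYSNO || summary[sysno].calls == 0)
        return 0;
    return summary[sysno].total_ns / summary[sysno].calls;
}
```

Iterating the array in index order also gives a stable report order for free, which helps the deterministic output mode.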
4.3 Data Structures (No Full Code)
struct SyscallEvent {
pid_t pid;
pid_t tid;
long sysno;
long args[6];
long ret;
long errno_val;
uint64_t ts_enter_ns;
uint64_t ts_exit_ns;
};
4.4 Algorithm Overview
Key Algorithm: Syscall Event Pairing
- On syscall-entry stop, record registers + timestamp in a map keyed by TID.
- On syscall-exit stop, fetch entry, compute latency, decode return.
- Emit formatted event and update summary.
Complexity Analysis:
- Time: O(1) per syscall event
- Space: O(T) where T is number of traced threads
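The pairing algorithm can be sketched with a fixed-size table keyed by TID. Note the simplification: lookups here are a linear scan, so each event costs O(T) rather than O(1); a hash map keyed by TID would restore the stated bound. MAX_THREADS and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

#define MAX_THREADS 256   /* assumption: fixed capacity keeps the sketch short */

struct pending {
    pid_t tid;            /* 0 marks a free slot */
    long sysno;
    uint64_t ts_enter_ns;
};

static struct pending pending_tab[MAX_THREADS];

static struct pending *find_slot(pid_t tid)
{
    for (int i = 0; i < MAX_THREADS; i++)
        if (pending_tab[i].tid == tid)
            return &pending_tab[i];
    return NULL;
}

/* Syscall-entry stop: remember what this thread started and when. */
static bool pair_on_entry(pid_t tid, long sysno, uint64_t now_ns)
{
    if (tid <= 0)
        return false;
    struct pending *p = find_slot(tid);
    if (!p)
        p = find_slot(0);             /* take a free slot */
    if (!p)
        return false;                 /* table full */
    p->tid = tid;
    p->sysno = sysno;
    p->ts_enter_ns = now_ns;
    return true;
}

/* Syscall-exit stop: returns true and fills the outputs if an entry was
 * pending for this thread, then frees the slot. */
static bool pair_on_exit(pid_t tid, uint64_t now_ns,
                         long *sysno, uint64_t *latency_ns)
{
    struct pending *p = tid > 0 ? find_slot(tid) : NULL;
    if (!p)
        return false;
    *sysno = p->sysno;
    *latency_ns = now_ns - p->ts_enter_ns;
    p->tid = 0;
    return true;
}
```

The same table doubles as the entry/exit discriminator: if a thread has a pending record when a syscall stop arrives, the stop is an exit; otherwise it is an entry.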
5. Implementation Guide
5.1 Development Environment Setup
sudo apt install build-essential gcc make
5.2 Project Structure
syscall-tracer/
|-- src/
| |-- main.c
| |-- tracer.c
| |-- decoder.c
| `-- format.c
|-- include/
| `-- tracer.h
|-- tests/
| `-- test_decoder.c
`-- Makefile
5.3 The Core Question You’re Answering
“How can I observe the exact user->kernel boundary without modifying the kernel?”
5.4 Concepts You Must Understand First
- Syscall ABI on your architecture.
- ptrace lifecycle and event handling.
- Safe decoding of user pointers and buffers.
5.5 Questions to Guide Your Design
- How will you distinguish entry vs exit stops?
- What arguments do you decode fully vs abbreviate?
- How will you deal with threads and forks?
5.6 Thinking Exercise
Sketch the trace sequence for open("/etc/hosts"): which syscalls appear before main() runs and why?
5.7 The Interview Questions They’ll Ask
- Why do syscalls return negative values in registers?
- How do you avoid stopping only the main thread?
- What are the main sources of overhead in ptrace tracing?
5.8 Hints in Layers
- Hint 1: Start with PTRACE_TRACEME and trace a child.
- Hint 2: Add PTRACE_O_TRACESYSGOOD and check status bits.
- Hint 3: Implement a syscall number->name table.
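For Hint 3, one compact approach is an array with designated initializers indexed by the SYS_* constants from <sys/syscall.h>. The sketch below lists only a handful of syscalls; a real table would be generated from the kernel headers. The names syscall_names and syscall_name are hypothetical.

```c
#include <stddef.h>
#include <sys/syscall.h>

/* Sparse table: entries not listed here are NULL. */
static const char *const syscall_names[] = {
    [SYS_read]   = "read",
    [SYS_write]  = "write",
    [SYS_close]  = "close",
    [SYS_openat] = "openat",
    [SYS_execve] = "execve",
};

/* Map a syscall number to its name, or "unknown" if it is not in the table. */
static const char *syscall_name(long nr)
{
    size_t n = sizeof syscall_names / sizeof syscall_names[0];
    if (nr < 0 || (size_t)nr >= n || syscall_names[nr] == NULL)
        return "unknown";
    return syscall_names[nr];
}
```

Because the indices come from the headers of the build machine, the table matches the architecture the tracer is compiled for, which fits the x86-64-only scope of v1.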
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Syscalls | TLPI | Syscall chapters |
| ABI | CS:APP | Ch. 3 |
| Tracing | TLPI | Debugging/Tracing |
5.10 Implementation Phases
Phase 1: Minimal tracer (3-4 days)
- Attach to a child and print syscall numbers.
- Checkpoint: you see syscalls for ls.
Phase 2: Decode + format (1 week)
- Map numbers to names, decode basic args.
- Checkpoint: output resembles basic strace.
Phase 3: Threads + summary (1 week)
- Handle clone/exec, add summary report.
- Checkpoint: multi-threaded program traces correctly.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Event pairing | per-pid vs per-tid | per-tid | syscalls interleave per thread |
| Memory reads | PTRACE_PEEKDATA vs process_vm_readv | process_vm_readv | faster + fewer syscalls |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit | Decoder correctness | syscall number mapping |
| Integration | End-to-end trace | trace true, cat |
| Edge | Faulted pointers | invalid address arg |
6.2 Critical Test Cases
- Trace a single-threaded program and compare with strace output.
- Trace a multi-threaded program and verify all TIDs appear.
- Decode a string arg from an unmapped address -> prints <fault>.
6.3 Test Data
program: demo
syscalls: openat, read, close
expected: three events + summary
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Forgetting to resume | target hangs | always PTRACE_SYSCALL after each stop |
| Wrong ABI | garbage args | confirm register order for x86-64 |
| Missing threads | incomplete trace | handle clone events |
7.2 Debugging Strategies
- Mirror with strace: run both tools on the same process and diff outputs.
- Log wait statuses: print raw waitpid status during development.
7.3 Performance Traps
- Excessive string decoding on every syscall can dominate overhead; cap lengths and sample.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add a --count-only mode.
- Add basic colorized output.
8.2 Intermediate Extensions
- Decode sockaddr structures for connect.
- Add JSON output with schema validation.
8.3 Advanced Extensions
- Add eBPF-based fast path for high-frequency syscalls.
- Implement per-cgroup tracing filters.
9. Real-World Connections
9.1 Industry Applications
- Debugging production incidents: trace slow I/O or failing syscalls.
- Security monitoring: detect unexpected syscalls in sandboxed apps.
9.2 Related Open Source Projects
- strace: canonical syscall tracer
- perf: tracing with lower overhead
9.3 Interview Relevance
- System call path and context switch discussion
- Debugging and observability tooling questions
10. Resources
10.1 Essential Reading
- TLPI (Kerrisk), Syscall and ptrace chapters
- CS:APP (Bryant/O’Hallaron), Ch. 3 (machine-level code)
10.2 Video Resources
- Linux syscall ABI talks (LWN / conference recordings)
10.3 Tools & Documentation
- ptrace(2), process_vm_readv(2) man pages
10.4 Related Projects in This Series
- P02-memory-allocator-malloc-free-from-scratch - memory inspection skills
- P03-process-scheduler-visualization-tool - tracing system behavior
11. Self-Assessment Checklist
11.1 Understanding
- I can explain the syscall ABI and register mapping.
- I can describe how ptrace events are delivered.
- I can explain why syscall tracing adds overhead.
11.2 Implementation
- All functional requirements are met.
- Summary output is correct and deterministic.
- Edge cases do not crash the tracee.
11.3 Growth
- I can explain my design in an interview.
- I can identify one improvement for v2.
12. Submission / Completion Criteria
Minimum Viable Completion:
- Attach/launch tracing works.
- Syscall name + return value printed.
- Clean detach on exit.
Full Completion:
- All minimum criteria plus:
- Argument decoding for strings and buffers.
- Summary statistics mode.
Excellence (Going Above & Beyond):
- JSON output + schema.
- Low-overhead mode or sampling.
13. Determinism Notes
- Use --fixed-ts and --seed 42 in tests.
- Avoid wall-clock timestamps in golden outputs.