Project 1: Syscall Tracer (strace-lite)

Build a minimal strace-style tool that intercepts syscalls, decodes arguments, and reports return values and latency for a target process and its threads.

Quick Reference

Attribute	Value
Difficulty	Level 3: Advanced
Time Estimate	2-3 weeks
Main Programming Language	C
Alternative Programming Languages	Rust, Go
Coolness Level	Level 3: Clever
Business Potential	Level 3: Support / Observability tooling
Prerequisites	C pointers, POSIX processes, Linux signals, reading man pages
Key Topics	Syscall ABI, ptrace, register decoding, process lifecycle, signal handling

1. Learning Objectives

By completing this project, you will:

Explain the syscall ABI on x86-64 and map syscall numbers to names and argument registers.
Attach to running processes and threads using ptrace and follow forks/execs safely.
Decode syscall arguments by reading registers and user memory without crashing the traced process.
Measure syscall latency with a monotonic clock and produce deterministic summary stats.
Build a filtering and formatting layer that mirrors core strace features.
Describe overhead sources and trade-offs in tracing tools.

2. All Theory Needed (Per-Concept Breakdown)

2.1 Syscall ABI and Kernel Entry Path

Fundamentals

A system call is a privileged transition from user mode to kernel mode. On x86-64 Linux, user code places a syscall number in rax, arguments in rdi, rsi, rdx, r10, r8, r9, and then executes the syscall instruction. The CPU switches privilege levels, the kernel validates the request, dispatches to the syscall table, and returns a value in rax. Understanding this contract is the baseline for decoding arguments and identifying the syscall being executed.

Deep Dive into the concept

The syscall ABI is intentionally minimal: fixed registers, fixed instruction, fixed return path. The kernel entry stub saves registers, switches to a kernel stack, and routes to the appropriate syscall handler. The syscall instruction itself overwrites rcx and r11, which is why the fourth argument travels in r10 rather than rcx. A raw return value in the range -4095 to -1 encodes an error as a negated errno (e.g., -ENOENT); libc converts that into a -1 return and stores the positive code in errno. Syscalls can be restarted after signals (ERESTARTSYS, ERESTARTNOHAND), which is why you may see repeated syscalls in traces. Some calls are handled in the vDSO or libc without a kernel transition; your tracer only sees actual syscalls. The ABI is architecture-specific, so your decoder must be correct for the target (x86-64 here) and should be extensible if you later support other architectures.

How this fits in the project

You use the ABI to translate register values into syscall arguments, to map syscall numbers to names, and to interpret error returns in Section 3.4 and Section 5.10 Phase 1.

Definitions & key terms

syscall ABI -> register and instruction contract for entering the kernel
syscall number -> numeric index into the syscall table
syscall instruction -> x86-64 instruction that triggers kernel entry
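errno -> per-thread error code that libc sets when a syscall returns a raw value in the range -4095 to -1
vDSO -> kernel-provided shared object mapped into every process; it implements some calls (e.g., gettimeofday) without entering the kernel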

Mental model diagram (ASCII)

user code             CPU             kernel
---------            -----           ---------
rax=SYS_openat   ->  syscall  ->  save regs -> dispatch -> return rax
rdi=AT_FDCWD         (ring3->0)       sys_openat()
rsi="/etc/hosts"

How it works (step-by-step)

User sets registers with syscall number + args.
CPU executes syscall, switches to kernel mode, and loads kernel stack.
Kernel entry stub saves registers and validates syscall number.
Kernel dispatches to the handler (e.g., sys_openat).
Handler returns success or negative error.
Return path restores registers and returns to user mode.
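
The register contract in the steps above can be sketched as a small helper. This is an illustrative sketch for x86-64 Linux only: struct raw_syscall and regs_to_syscall are hypothetical names, and the input is assumed to be a register snapshot taken with PTRACE_GETREGS at a syscall-entry stop. At that stop the syscall number is read from orig_rax, because rax has already been set to -ENOSYS.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/user.h>   /* struct user_regs_struct, x86-64 Linux layout */

/* Hypothetical container for one syscall-entry snapshot. */
struct raw_syscall {
    long nr;       /* syscall number */
    long args[6];  /* arguments in ABI order */
};

/* Extracts the syscall number and arguments from registers captured
 * with PTRACE_GETREGS at a syscall-entry stop. */
void regs_to_syscall(const struct user_regs_struct *regs, struct raw_syscall *out)
{
    assert(regs != NULL && out != NULL);
    out->nr      = (long)regs->orig_rax;  /* rax already holds -ENOSYS here */
    out->args[0] = (long)regs->rdi;
    out->args[1] = (long)regs->rsi;
    out->args[2] = (long)regs->rdx;
    out->args[3] = (long)regs->r10;       /* r10, not rcx: syscall clobbers rcx */
    out->args[4] = (long)regs->r8;
    out->args[5] = (long)regs->r9;
}
```

Keeping this mapping in one function makes it the only place that would need to change if another architecture were added later.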

Minimal concrete example

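#include <fcntl.h>        /* AT_FDCWD, O_RDONLY */
#include <sys/syscall.h>  /* SYS_openat */
#include <unistd.h>       /* syscall() */
/* Inside a function: fd is the new descriptor, or -1 with errno set by libc. */
long fd = syscall(SYS_openat, AT_FDCWD, "/etc/hosts", O_RDONLY, 0);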

Common misconceptions

Misconception: libc functions are always syscalls. Correction: many libc functions are wrappers or entirely user-space.
Misconception: errno is a kernel variable. Correction: the kernel returns negative values; libc sets errno.
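
The split between the kernel's raw return value and libc's errno can be illustrated with a short helper. This is a sketch, not the tracer's required design: decode_return and MAX_ERRNO are names chosen here, and the -4095..-1 error range follows the Linux kernel convention.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ERRNO 4095  /* raw values in -4095..-1 are negated errno codes */

/* Splits a raw return register into the (return value, errno) pair
 * that libc would report to the caller. */
long decode_return(long raw, int *err)
{
    assert(err != NULL);
    if (raw < 0 && raw >= -MAX_ERRNO) {
        *err = (int)-raw;   /* e.g. -2 becomes ENOENT (2) */
        return -1;
    }
    *err = 0;
    return raw;             /* success, including large pointer-like values */
}
```

The range check matters because a successful call such as mmap can return a value that looks negative when read as a signed long.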

Check-your-understanding questions

Which register holds the syscall number on x86-64?
Why might gettimeofday() not appear in your trace?
Predict what happens if rax contains an invalid syscall number.
Explain why -ENOENT appears as -2 in raw return values.

Check-your-understanding answers

rax.
It may be served by vDSO without a syscall.
The kernel returns -ENOSYS.
ENOENT is 2. The kernel returns the negated code (-2) in rax; libc turns that into a -1 return and stores 2 in errno.

Real-world applications

Production tracing tools (strace, perf trace)
Sandboxing and syscall filtering (seccomp)
Debugging stuck or misbehaving processes

Where you’ll apply it

This project: Section 3.2 Functional Requirements, Section 4.4 Algorithm Overview, Section 5.10 Phase 1.
Also used in: P03-process-scheduler-visualization-tool for understanding context switches and P06-userspace-thread-library-green-threads for ABI correctness.

References

“Computer Systems: A Programmer’s Perspective” (Bryant & O’Hallaron), Ch. 3
“The Linux Programming Interface” (Kerrisk), Syscalls chapters

Key insights

The syscall ABI is the unchanging contract that makes tracing possible.

Summary

Syscalls are a precise register-level handshake with the kernel. If you can decode that handshake, you can observe almost any program’s behavior.

Homework/Exercises to practice the concept

Write a tiny program that makes a raw syscall and prints the raw return value.
Use strace -e trace=openat and map register values to arguments.

Solutions to the homework/exercises

Use syscall(SYS_getpid) and print the value; it should match getpid().
Compare strace -v output with manual register mapping from ABI docs.

2.2 ptrace Lifecycle and Event Handling

Fundamentals

ptrace lets one process control and observe another. A tracer attaches, stops the target, inspects registers and memory, and then continues it. The tracer receives events for syscalls, signals, forks, and execs. For a syscall tracer, the basic loop is “wait for stop -> read registers -> decide -> continue”.

Deep Dive into the concept

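ptrace uses waitpid()-style stops to synchronize tracer and tracee. Resuming with PTRACE_SYSCALL produces two stops per syscall: one at entry and one at exit. To follow threads and child processes, set PTRACE_O_TRACECLONE, PTRACE_O_TRACEFORK, PTRACE_O_TRACEVFORK, and PTRACE_O_TRACEEXEC; the kernel then attaches new threads and children automatically and reports each one through an event stop, so the tracer only needs to start tracking the new TID. When attaching to an already-running multi-threaded process, attach to every thread listed in /proc/<pid>/task, because PTRACE_ATTACH affects a single thread. Signals complicate the event stream: a signal-delivery stop is distinct from a syscall stop, and on resume you must either forward the signal or suppress it. The tracer must avoid deadlocks by resuming every stopped tracee and by calling waitpid for every traced thread (for example, waitpid(-1, &status, __WALL)). If you forget to resume a thread, the target hangs.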

How this fits in the project

This is the control loop for the tracer core in Section 5.10 Phase 2 and Section 6 testing.

Definitions & key terms

tracer -> process that uses ptrace to control another
tracee -> target process being traced
stop -> event where tracee is paused and tracer is notified
PTRACE_SYSCALL -> stop on syscall entry and exit
PTRACE_O_TRACECLONE -> option to receive thread creation events

Mental model diagram (ASCII)

tracer                kernel                tracee
  |  ptrace attach  ->  stop tracee  <-  signal/stop
  |  waitpid        <-  status
  |  getregs        ->
  |  ptrace syscall ->  resume      ->  executes

How it works (step-by-step)

Attach (PTRACE_ATTACH) or launch (PTRACE_TRACEME).
Wait for initial stop and set options.
Loop: wait for stop -> classify stop -> read regs.
On syscall entry, decode args; on exit, decode return.
Resume with PTRACE_SYSCALL, forwarding signals as needed.
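
The "classify stop" step in the loop above can be sketched as a pure function over the waitpid status word. The enum and function names are hypothetical, and the sketch assumes PTRACE_O_TRACESYSGOOD has been set so that syscall stops report SIGTRAP | 0x80.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

enum stop_kind {
    STOP_EXITED,   /* detail = exit code */
    STOP_KILLED,   /* detail = terminating signal */
    STOP_SYSCALL,  /* syscall entry or exit stop */
    STOP_EVENT,    /* detail = PTRACE_EVENT_* code */
    STOP_SIGNAL,   /* detail = signal to forward on resume */
    STOP_UNKNOWN
};

/* Classifies a waitpid() status; assumes PTRACE_O_TRACESYSGOOD is set. */
enum stop_kind classify_status(int status, int *detail)
{
    assert(detail != NULL);
    *detail = 0;
    if (WIFEXITED(status)) {
        *detail = WEXITSTATUS(status);
        return STOP_EXITED;
    }
    if (WIFSIGNALED(status)) {
        *detail = WTERMSIG(status);
        return STOP_KILLED;
    }
    if (!WIFSTOPPED(status))
        return STOP_UNKNOWN;

    int sig = WSTOPSIG(status);
    if (sig == (SIGTRAP | 0x80))
        return STOP_SYSCALL;

    int event = status >> 16;          /* nonzero only for ptrace event stops */
    if (sig == SIGTRAP && event != 0) {
        *detail = event;
        return STOP_EVENT;
    }
    /* A plain SIGTRAP here is usually the post-exec trap that appears
     * when PTRACE_O_TRACEEXEC is not set; callers should not forward it. */
    *detail = sig;
    return STOP_SIGNAL;
}
```

Because the function takes only an int, it can be unit-tested with hand-built status words and without a live tracee.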

Minimal concrete example

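int status;
ptrace(PTRACE_ATTACH, pid, 0, 0);                          /* sends SIGSTOP to the tracee */
waitpid(pid, &status, 0);                                  /* wait for the initial stop */
ptrace(PTRACE_SETOPTIONS, pid, 0, PTRACE_O_TRACESYSGOOD);  /* mark syscall stops */
ptrace(PTRACE_SYSCALL, pid, 0, 0);                         /* run to the next syscall stop */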
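
For the launch path (PTRACE_TRACEME), a complete minimal loop might look like the sketch below. It is one possible shape, not the reference implementation: trace_child is a hypothetical name, the code is x86-64 Linux only, it tracks a single thread, and it tells entry from exit by toggling a flag, which is a simplification.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs argv under ptrace and returns how many syscalls it entered,
 * or -1 if tracing could not be set up. Syscall numbers are written
 * to log when log is non-NULL. */
long trace_child(char *const argv[], FILE *log)
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
            _exit(126);              /* tracing not permitted */
        raise(SIGSTOP);              /* hand control to the tracer before exec */
        execvp(argv[0], argv);
        _exit(127);
    }

    int status;
    if (waitpid(child, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;
    ptrace(PTRACE_SETOPTIONS, child, NULL,
           (void *)(long)(PTRACE_O_TRACESYSGOOD | PTRACE_O_EXITKILL));

    long entries = 0;
    int in_syscall = 0;   /* toggles between entry and exit stops */
    int deliver = 0;      /* signal to inject on the next resume */

    for (;;) {
        if (ptrace(PTRACE_SYSCALL, child, NULL, (void *)(long)deliver) < 0)
            break;
        if (waitpid(child, &status, 0) < 0)
            break;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            break;                   /* tracee is gone */
        deliver = 0;
        if (!WIFSTOPPED(status))
            continue;

        int sig = WSTOPSIG(status);
        if (sig == (SIGTRAP | 0x80)) {
            in_syscall = !in_syscall;
            if (in_syscall) {
                struct user_regs_struct regs;
                entries++;
                if (log != NULL &&
                    ptrace(PTRACE_GETREGS, child, NULL, &regs) == 0)
                    fprintf(log, "syscall %lld\n", (long long)regs.orig_rax);
            }
        } else if (sig != SIGTRAP) {
            deliver = sig;           /* forward real signals; drop the post-exec SIGTRAP */
        }
    }
    return entries;
}
```

A call such as trace_child(argv, stdout) with argv = {"ls", NULL} prints one number per syscall. The toggle works here because tracing starts outside any syscall; a multi-threaded tracer would keep this flag per TID.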

Common misconceptions

Misconception: PTRACE_SYSCALL only stops once per syscall. Correction: it stops twice, at entry and at exit.
Misconception: tracing a PID traces all threads automatically. Correction: you must attach to each thread or use clone events.

Check-your-understanding questions

Why does PTRACE_O_TRACESYSGOOD matter?
How do you distinguish syscall stops from signal stops?
What happens if you fail to resume a tracee?

Check-your-understanding answers

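It makes syscall stops report SIGTRAP | 0x80 as the stop signal, so they cannot be confused with a genuine SIGTRAP.
Check WIFSTOPPED(status), then compare WSTOPSIG(status) with SIGTRAP | 0x80 (this requires PTRACE_O_TRACESYSGOOD).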
The tracee stays stopped; the program appears hung.

Real-world applications

Debuggers (gdb, lldb)
Sandboxing and tracing security tools
Process introspection in profilers

Where you’ll apply it

This project: Section 4.4 Algorithm Overview, Section 5.10 Phase 2.
Also used in: P07-interrupt-latency-profiler for tracing event pipelines.

References

ptrace(2) man page
“The Linux Programming Interface”, tracing chapters

Key insights

Tracing is a synchronized stop-and-resume dance: miss a step and the target freezes.

Summary

ptrace provides full control over a process, but only if you handle the event stream correctly.

Homework/Exercises to practice the concept

Write a tracer that only prints syscall numbers for ls.
Attach to a running sleep process and verify it stops and resumes.

Solutions to the homework/exercises

Use PTRACE_SYSCALL, read regs.orig_rax, print it.
Attach, waitpid, then resume; verify ps state changes.

2.3 Safe Argument Decoding and User Memory Reads

Fundamentals

Syscall arguments live in registers, but many are pointers to user memory (strings, buffers, structs). You cannot dereference those pointers directly in the tracer; you must read from the tracee’s memory using process_vm_readv or ptrace(PTRACE_PEEKDATA).

Deep Dive into the concept

Decoding arguments safely requires bounding reads and handling faults. Strings are NUL-terminated, but you must cap reads to avoid unbounded memory access. For arrays and structs, you need size information from syscall prototypes. On x86-64, argument values are in registers; the tracer interprets those raw values as addresses. If the tracee unmaps memory between entry and exit, a read can fail, so your tracer must degrade gracefully (print ? or <fault>). You must also consider that some arguments are in/out (e.g., read writes into a buffer) and can only be safely printed on syscall exit.

How this fits in the project

This is required to decode openat, read, write, execve in Section 3.2 and Section 5.10 Phase 2.

Definitions & key terms

user pointer -> address in tracee’s virtual memory
process_vm_readv -> efficient user memory read syscall
PTRACE_PEEKDATA -> reads one machine word from the tracee per ptrace call
in/out parameter -> argument updated by kernel (e.g., read buffer)

Mental model diagram (ASCII)

Tracer addr space        Tracee addr space
------------------       -----------------
ptr=0x7f...  --read-->   ["/etc/hosts\0"]

How it works (step-by-step)

Read register holding pointer argument.
Use process_vm_readv to copy bytes into tracer buffer.
Cap size (e.g., 256 bytes) and ensure NUL termination.
If read fails, print <fault> and continue.

Minimal concrete example

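#define _GNU_SOURCE       /* glibc exposes process_vm_readv under _GNU_SOURCE */
#include <sys/uio.h>
char buf[256];
struct iovec local = {buf, sizeof buf};
struct iovec remote = {(void *)addr, sizeof buf};
ssize_t n = process_vm_readv(pid, &local, 1, &remote, 1, 0);  /* bytes copied, or -1 */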
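
The fragment above reads a fixed 256 bytes, which can fail when the string sits near the end of a mapped page. A bounded string reader that never crosses a page boundary in a single request might be sketched as follows; read_tracee_string is a hypothetical name, and the choice to truncate silently at the cap is an assumption of this sketch.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE       /* for process_vm_readv */
#endif
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Copies a NUL-terminated string from the tracee into out (capacity cap).
 * Returns the string length, or -1 if the first byte is unreadable.
 * out is always NUL-terminated; long strings are truncated at cap - 1. */
ssize_t read_tracee_string(pid_t pid, uintptr_t addr, char *out, size_t cap)
{
    assert(out != NULL && cap > 0);
    out[0] = '\0';
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = 0;

    while (len < cap - 1) {
        /* Stay inside the current page: the next one may be unmapped. */
        size_t to_page_end = page - (size_t)((addr + len) % page);
        size_t want = cap - 1 - len;
        if (want > to_page_end)
            want = to_page_end;

        struct iovec local = { out + len, want };
        struct iovec remote = { (void *)(addr + len), want };
        ssize_t got = process_vm_readv(pid, &local, 1, &remote, 1, 0);
        if (got <= 0) {
            if (len == 0)
                return -1;           /* caller prints <fault> */
            break;                   /* keep what was read so far */
        }

        char *nul = memchr(out + len, '\0', (size_t)got);
        if (nul != NULL)
            return (ssize_t)(nul - out);
        len += (size_t)got;
    }
    out[len] = '\0';
    return (ssize_t)len;
}
```

A caller would print the placeholder <fault> when the function returns -1 and the decoded string otherwise. Passing the tracer's own PID is a convenient way to exercise the function in unit tests.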

Common misconceptions

Misconception: you can read user pointers directly. Correction: the pointer is in another address space.
Misconception: you can always read buffers on syscall entry. Correction: output buffers are only valid after syscall exit.

Check-your-understanding questions

Why is process_vm_readv preferable to repeated PTRACE_PEEKDATA?
When should you print the buffer for read(fd, buf, n)?
What should you do if a read fails?

Check-your-understanding answers

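It copies a whole buffer in one syscall instead of one word per call, and its errors are unambiguous; PTRACE_PEEKDATA returns the data word itself, so a result of -1 is only an error if errno says so.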
On syscall exit, after the kernel has filled the buffer.
Print a placeholder and continue; do not crash.

Real-world applications

Debuggers that show memory buffers
Security tools inspecting arguments for policy enforcement

Where you’ll apply it

This project: Section 3.2 Functional Requirements, Section 5.10 Phase 2.
Also used in: P02-memory-allocator-malloc-free-from-scratch for inspecting heap layouts.

References

process_vm_readv(2) man page
“The Linux Programming Interface”, memory access chapters

Key insights

Your tracer is only as trustworthy as your argument decoder.

Summary

Safe decoding means bounded reads, correct timing (entry vs exit), and graceful failure.

Homework/Exercises to practice the concept

Implement string decoding for openat and cap at 128 bytes.
Print read buffers on syscall exit and compare with actual file contents.

Solutions to the homework/exercises

Use process_vm_readv and stop at NUL or 128 bytes.
Store entry args; on exit, read the buffer and print hex + ASCII.

3. Project Specification

3.1 What You Will Build

A CLI tool, syscall-tracer, that attaches to a PID or launches a program and emits a live stream of syscall events with arguments, return values, and latency. It supports filtering (by syscall name, PID, thread), summary reports, and deterministic output mode for testing.

3.2 Functional Requirements

Attach or launch: -p <pid> to attach, -- <cmd> to launch.
Syscall stream: print syscall entry/exit with name, args, return, latency.
Thread-aware: trace all threads and fork/exec children.
Filtering: --filter=openat,read and --pid=....
Summary mode: counts, error rates, total/avg latency.
Output formats: text (default) and JSON (--json).
Deterministic mode: --fixed-ts together with --seed <n> freezes timestamps and event ordering for tests.
Graceful shutdown: Ctrl-C detaches and resumes tracees.
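
The summary-mode requirement (counts, error rates, total/avg latency) can be met with a flat per-syscall accumulator. The sketch below is one possible layout: struct summary, summary_record, summary_avg_ns, and the MAX_SYSNO bound are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SYSNO 512  /* assumed upper bound on syscall numbers for this sketch */

struct sys_stat {
    uint64_t calls;
    uint64_t errors;
    uint64_t total_ns;
};

/* Must start zero-initialized (e.g. static storage or memset). */
struct summary {
    struct sys_stat by_nr[MAX_SYSNO];
};

/* Records one completed syscall; out-of-range numbers are ignored. */
void summary_record(struct summary *s, long nr, long raw_ret, uint64_t latency_ns)
{
    assert(s != NULL);
    if (nr < 0 || nr >= MAX_SYSNO)
        return;
    struct sys_stat *st = &s->by_nr[nr];
    st->calls++;
    if (raw_ret < 0 && raw_ret >= -4095)  /* kernel error range */
        st->errors++;
    st->total_ns += latency_ns;
}

/* Average latency in nanoseconds, or 0 when the syscall was never seen. */
uint64_t summary_avg_ns(const struct summary *s, long nr)
{
    assert(s != NULL);
    if (nr < 0 || nr >= MAX_SYSNO || s->by_nr[nr].calls == 0)
        return 0;
    return s->by_nr[nr].total_ns / s->by_nr[nr].calls;
}
```

Indexing by syscall number keeps each update O(1) and makes the final report a single pass over the array.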

3.3 Non-Functional Requirements

Performance: must handle 5k+ syscalls/sec with <10% added latency in a synthetic test.
Reliability: never crash the target process; detaches cleanly.
Usability: clear error messages, consistent output fields, documented filters.

3.4 Example Usage / Output

$ sudo ./syscall-tracer --fixed-ts -p 4242
[000000.001234] pid=4242 tid=4242 openat(AT_FDCWD, "/etc/hosts", O_RDONLY) = 3 (0.000081s)
[000000.001400] pid=4242 tid=4242 read(3, 0x7ffc... , 64) = 64 (0.000020s)
[000000.001450] pid=4242 tid=4242 close(3) = 0 (0.000004s)

3.5 Data Formats / Schemas / Protocols

Text line format

[timestamp] pid=<pid> tid=<tid> <syscall>(<args>) = <ret> (<latency>s)
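
A formatter for this line might be sketched as below. The field widths (six-digit seconds, six-digit microseconds) are inferred from the sample output rather than specified anywhere, and format_event_line is a hypothetical name; the argument list is assumed to be pre-rendered as a string.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

#define NS_PER_SEC UINT64_C(1000000000)

/* Renders one event in the text line format. Returns the snprintf result:
 * the length the full line needs, which exceeds cap - 1 on truncation. */
int format_event_line(char *out, size_t cap, uint64_t ts_ns,
                      pid_t pid, pid_t tid, const char *name,
                      const char *args, long ret, uint64_t latency_ns)
{
    assert(out != NULL && name != NULL && args != NULL);
    return snprintf(out, cap,
                    "[%06" PRIu64 ".%06" PRIu64 "] pid=%d tid=%d %s(%s) = %ld "
                    "(%" PRIu64 ".%06" PRIu64 "s)",
                    ts_ns / NS_PER_SEC, (ts_ns % NS_PER_SEC) / 1000,
                    (int)pid, (int)tid, name, args, ret,
                    latency_ns / NS_PER_SEC, (latency_ns % NS_PER_SEC) / 1000);
}
```

Keeping formatting separate from decoding lets the same event feed either this text renderer or a JSON renderer.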

JSON format

{
  "ts_ns": 1234000,
  "pid": 4242,
  "tid": 4242,
  "syscall": "openat",
  "args": ["AT_FDCWD", "/etc/hosts", "O_RDONLY"],
  "ret": 3,
  "errno": 0,
  "latency_ns": 81000
}

3.6 Edge Cases

Target process exits while tracing.
Multi-threaded process spawns threads during attach.
Syscall arguments are invalid pointers.
Syscalls interrupted by signals and restarted.
Large buffers for read/write.

3.7 Real World Outcome

3.7.1 How to Run (Copy/Paste)

make
sudo ./syscall-tracer --fixed-ts -p 4242
sudo ./syscall-tracer --json --filter=openat,read -- ./demo

3.7.2 Golden Path Demo (Deterministic)

Use --fixed-ts and --seed 42 to freeze timestamps and ordering for tests.

3.7.3 CLI Transcript (Success + Failure)

$ sudo ./syscall-tracer --fixed-ts --seed 42 -- ./demo
[000000.000100] pid=9001 tid=9001 execve("./demo", ["./demo"], ...) = 0 (0.000200s)
[000000.000500] pid=9001 tid=9001 openat(AT_FDCWD, "/etc/hosts", O_RDONLY) = 3 (0.000080s)
[000000.000700] pid=9001 tid=9001 close(3) = 0 (0.000004s)
Summary: openat=1 close=1 errors=0

$ sudo ./syscall-tracer -p 999999
error: pid 999999 not found
exit code: 2

3.7.4 Exit Codes

0 success
1 internal error (decode/IO)
2 invalid arguments / pid not found

4. Solution Architecture

4.1 High-Level Design

+------------------+
| CLI / Filters    |
+--------+---------+
         |
         v
+--------+---------+     +------------------+
| ptrace Controller|<--->| Tracee Threads   |
+--------+---------+     +------------------+
         |
         v
+--------+---------+
| Decoder & Format |
+--------+---------+
         |
         v
+------------------+
| Summary Engine   |
+------------------+

4.2 Key Components

4.3 Data Structures (No Full Code)

struct SyscallEvent {
    pid_t pid;
    pid_t tid;
    long sysno;
    long args[6];
    long ret;
    long errno_val;
    uint64_t ts_enter_ns;
    uint64_t ts_exit_ns;
};

4.4 Algorithm Overview

Key Algorithm: Syscall Event Pairing

On syscall-entry stop, record registers + timestamp in a map keyed by TID.
On syscall-exit stop, fetch entry, compute latency, decode return.
Emit formatted event and update summary.

Complexity Analysis:

Time: O(1) per syscall event
Space: O(T) where T is number of traced threads
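
The pairing algorithm can be sketched with a fixed-size hash table keyed by TID. Everything here is illustrative: the names, the 1024-slot capacity, and the use of linear probing are assumptions, and slots are never freed, which a long-running tracer would need to handle when threads exit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

#define SLOTS 1024u  /* assumed capacity; must be a power of two */

struct pending {
    pid_t tid;             /* 0 marks a free slot */
    int in_syscall;        /* 1 between entry and exit */
    long sysno;
    uint64_t ts_enter_ns;
};

/* Must start zero-initialized. */
struct pending_table {
    struct pending slot[SLOTS];
};

/* Finds or claims the slot for tid using linear probing. */
static struct pending *slot_for(struct pending_table *t, pid_t tid)
{
    for (unsigned i = 0; i < SLOTS; i++) {
        struct pending *p = &t->slot[((unsigned)tid + i) & (SLOTS - 1u)];
        if (p->tid == tid)
            return p;
        if (p->tid == 0) {
            p->tid = tid;
            p->in_syscall = 0;
            return p;
        }
    }
    return NULL;  /* table full */
}

/* Records a syscall-entry stop. Returns 0, or -1 if the table is full. */
int pair_on_entry(struct pending_table *t, pid_t tid, long sysno, uint64_t now_ns)
{
    assert(t != NULL && tid > 0);
    struct pending *p = slot_for(t, tid);
    if (p == NULL)
        return -1;
    p->in_syscall = 1;
    p->sysno = sysno;
    p->ts_enter_ns = now_ns;
    return 0;
}

/* Completes a syscall-exit stop. Returns 0 and fills the outputs,
 * or -1 when no matching entry was recorded for tid. */
int pair_on_exit(struct pending_table *t, pid_t tid, uint64_t now_ns,
                 long *sysno, uint64_t *latency_ns)
{
    assert(t != NULL && tid > 0 && sysno != NULL && latency_ns != NULL);
    struct pending *p = slot_for(t, tid);
    if (p == NULL || !p->in_syscall)
        return -1;
    p->in_syscall = 0;
    *sysno = p->sysno;
    *latency_ns = now_ns - p->ts_enter_ns;
    return 0;
}
```

An unmatched exit (return value -1) is expected in practice, for example for the first stop seen after attaching to a thread that was already inside a syscall.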

5. Implementation Guide

5.1 Development Environment Setup

sudo apt install build-essential gcc make

5.2 Project Structure

syscall-tracer/
|-- src/
|   |-- main.c
|   |-- tracer.c
|   |-- decoder.c
|   `-- format.c
|-- include/
|   `-- tracer.h
|-- tests/
|   `-- test_decoder.c
`-- Makefile

5.3 The Core Question You’re Answering

“How can I observe the exact user->kernel boundary without modifying the kernel?”

5.4 Concepts You Must Understand First

Syscall ABI on your architecture.
ptrace lifecycle and event handling.
Safe decoding of user pointers and buffers.

5.5 Questions to Guide Your Design

How will you distinguish entry vs exit stops?
What arguments do you decode fully vs abbreviate?
How will you deal with threads and forks?

5.6 Thinking Exercise

Sketch the trace sequence for open("/etc/hosts"): which syscalls appear before main() runs and why?

5.7 The Interview Questions They’ll Ask

Why do syscalls return negative values in registers?
How do you avoid stopping only the main thread?
What are the main sources of overhead in ptrace-based tracing?

5.8 Hints in Layers

Hint 1: Start with PTRACE_TRACEME and trace a child.
Hint 2: Add PTRACE_O_TRACESYSGOOD and check status bits.
Hint 3: Implement a syscall number->name table.
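
For Hint 3, a lookup table built from the SYS_* constants in <sys/syscall.h> is one simple starting point. The sketch below covers only a handful of syscalls and uses hypothetical function names; a fuller table would more likely be generated from the kernel headers than typed by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/syscall.h>

struct sys_entry {
    long nr;
    const char *name;
};

/* Hand-written subset for illustration only. */
static const struct sys_entry SYSCALLS[] = {
    { SYS_read,       "read" },
    { SYS_write,      "write" },
    { SYS_close,      "close" },
    { SYS_openat,     "openat" },
    { SYS_execve,     "execve" },
    { SYS_exit_group, "exit_group" },
};
#define N_SYSCALLS (sizeof SYSCALLS / sizeof SYSCALLS[0])

/* Returns the name for nr, or NULL if the number is not in the table. */
const char *syscall_name(long nr)
{
    for (size_t i = 0; i < N_SYSCALLS; i++)
        if (SYSCALLS[i].nr == nr)
            return SYSCALLS[i].name;
    return NULL;
}

/* Reverse lookup, e.g. for parsing --filter=openat,read. Returns -1 if unknown. */
long syscall_number(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < N_SYSCALLS; i++)
        if (strcmp(SYSCALLS[i].name, name) == 0)
            return SYSCALLS[i].nr;
    return -1;
}
```

When syscall_name returns NULL, printing the raw number (for example syscall_257) keeps the trace usable while the table is still incomplete.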

5.9 Books That Will Help

5.10 Implementation Phases

Phase 1: Minimal tracer (3-4 days)

Attach to a child and print syscall numbers.
Checkpoint: you see syscalls for ls.

Phase 2: Decode + format (1 week)

Map numbers to names, decode basic args.
Checkpoint: output resembles basic strace.

Phase 3: Threads + summary (1 week)

Handle clone/exec, add summary report.
Checkpoint: multi-threaded program traces correctly.

5.11 Key Implementation Decisions

6. Testing Strategy

6.1 Test Categories

6.2 Critical Test Cases

Trace a single-threaded program and compare with strace output.
Trace a multi-threaded program and verify all TIDs appear.
Decode a string arg from an unmapped address -> prints <fault>.

6.3 Test Data

program: demo
syscalls: openat, read, close
expected: three events + summary

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

7.2 Debugging Strategies

Mirror with strace: run both tools on the same process and diff outputs.
Log wait statuses: print raw waitpid status during development.

7.3 Performance Traps

Decoding strings on every syscall can dominate overhead; cap string lengths and, for high-frequency syscalls, decode only a sample of events.

8. Extensions & Challenges

8.1 Beginner Extensions

Add a --count-only mode.
Add basic colorized output.

8.2 Intermediate Extensions

Decode sockaddr structures for connect.
Add JSON output with schema validation.

8.3 Advanced Extensions

Add eBPF-based fast path for high-frequency syscalls.
Implement per-cgroup tracing filters.

9. Real-World Connections

9.1 Industry Applications

Debugging production incidents: trace slow I/O or failing syscalls.
Security monitoring: detect unexpected syscalls in sandboxed apps.

9.2 Related Tools

strace: canonical syscall tracer
perf trace: syscall tracing with lower overhead than ptrace

9.3 Interview Relevance

System call path and context switch discussion
Debugging and observability tooling questions

10. Resources

10.1 Essential Reading

TLPI (Kerrisk), Syscall and ptrace chapters
CS:APP (Bryant/O’Hallaron), Ch. 3 (machine-level code)

10.2 Video Resources

Linux syscall ABI talks (LWN / conference recordings)

10.3 Tools & Documentation

ptrace(2), process_vm_readv(2) man pages

10.4 Related Projects

P02-memory-allocator-malloc-free-from-scratch - memory inspection skills
P03-process-scheduler-visualization-tool - tracing system behavior

11. Self-Assessment Checklist

11.1 Understanding

I can explain the syscall ABI and register mapping.
I can describe how ptrace events are delivered.
I can explain why syscall tracing adds overhead.

11.2 Implementation

All functional requirements are met.
Summary output is correct and deterministic.
Edge cases do not crash the tracee.

11.3 Growth

I can explain my design in an interview.
I can identify one improvement for v2.

12. Submission / Completion Criteria

Minimum Viable Completion:

Attach/launch tracing works.
Syscall name + return value printed.
Clean detach on exit.

Full Completion:

All minimum criteria plus:
Argument decoding for strings and buffers.
Summary statistics mode.

Excellence (Going Above & Beyond):

JSON output + schema.
Low-overhead mode or sampling.

13. Determinism Notes

Use --fixed-ts and --seed 42 in tests.
Avoid wall-clock timestamps in golden outputs.