Project 1: Syscall Tracer (strace-lite)

Build a minimal strace-style tool that intercepts syscalls, decodes arguments, and reports return values and latency for a target process and its threads.

Quick Reference

| Attribute | Value |
|-----------|-------|
| Difficulty | Level 3: Advanced |
| Time Estimate | 2-3 weeks |
| Main Programming Language | C |
| Alternative Programming Languages | Rust, Go |
| Coolness Level | Level 3: Clever |
| Business Potential | Level 3: Support / Observability tooling |
| Prerequisites | C pointers, POSIX processes, Linux signals, reading man pages |
| Key Topics | Syscall ABI, ptrace, register decoding, process lifecycle, signal handling |

1. Learning Objectives

By completing this project, you will:

  1. Explain the syscall ABI on x86-64 and map syscall numbers to names and argument registers.
  2. Attach to running processes and threads using ptrace and follow forks/execs safely.
  3. Decode syscall arguments by reading registers and user memory without crashing the traced process.
  4. Measure syscall latency with a monotonic clock and produce deterministic summary stats.
  5. Build a filtering and formatting layer that mirrors core strace features.
  6. Describe overhead sources and trade-offs in tracing tools.

2. All Theory Needed (Per-Concept Breakdown)

2.1 Syscall ABI and Kernel Entry Path

Fundamentals

A system call is a privileged transition from user mode to kernel mode. On x86-64 Linux, user code places a syscall number in rax, arguments in rdi, rsi, rdx, r10, r8, r9, and then executes the syscall instruction. The CPU switches privilege levels, the kernel validates the request, dispatches to the syscall table, and returns a value in rax. Understanding this contract is the baseline for decoding arguments and identifying the syscall being executed.

Deep Dive into the concept

The syscall ABI is intentionally minimal: fixed registers, a fixed instruction, and a fixed return path. The kernel entry stub saves registers, switches to a kernel stack, and routes to the appropriate syscall handler. Errors come back as negative values in the range -4095 to -1, each the negation of an errno code (e.g., -ENOENT); libc converts these into a return of -1 and sets errno. Syscalls interrupted by signals can be restarted (the kernel uses internal codes such as ERESTARTSYS and ERESTARTNOHAND for this), which is why you may see the same syscall repeated in a trace. Some calls are served by the vDSO or by libc without any kernel transition; your tracer only sees real syscalls. The ABI is architecture-specific, so your decoder must be correct for the target (x86-64 here) and structured so it can be extended if you later support other architectures.
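The error convention above fits in a small helper. This is a minimal sketch, assuming Linux on x86-64; `raw_ret_errno` is a hypothetical name, not part of any library. It applies the same [-4095, -1] rule that glibc uses to decide whether a raw return value is an error.

```c
#include <errno.h>

/* Linux reserves raw return values in [-4095, -1] for -errno.
 * Any other value is a successful result. Returns the errno encoded
 * in a raw return value, or 0 if the call succeeded. */
long raw_ret_errno(long raw)
{
    return (raw >= -4095 && raw <= -1) ? -raw : 0;
}
```

At a syscall-exit stop you would pass the raw rax value: -2 decodes to ENOENT, while 3 (a new fd) or -4096 are treated as results.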

How this fits on projects

You use the ABI to translate register values into syscall arguments, to map syscall numbers to names, and to interpret error returns in Section 3.4 and Section 5.10 Phase 1.

Definitions & key terms

  • syscall ABI -> register and instruction contract for entering the kernel
  • syscall number -> numeric index into the syscall table
  • syscall instruction -> x86-64 instruction that triggers kernel entry
  • errno -> error code set by libc when syscalls return negative values
  • vDSO -> kernel-provided shared object mapped into every process; it implements a few calls (e.g., gettimeofday, clock_gettime) in user space, without a syscall

Mental model diagram (ASCII)

user code               CPU               kernel
---------              -----              ---------
rax=SYS_openat    ->   syscall    ->  save regs -> dispatch -> return in rax
rdi=AT_FDCWD           (ring 3->0)        sys_openat()
rsi="/etc/hosts"

How it works (step-by-step)

  1. User sets registers with syscall number + args.
  2. CPU executes syscall, switches to ring 0, and jumps to the kernel entry point.
  3. Kernel entry stub switches to the kernel stack, saves registers, and validates the syscall number.
  4. Kernel dispatches to the handler (e.g., sys_openat).
  5. Handler returns success or negative error.
  6. Return path restores registers and returns to user mode.

Minimal concrete example

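#include <fcntl.h>        /* AT_FDCWD, O_RDONLY */
#include <sys/syscall.h>  /* SYS_openat */
#include <unistd.h>       /* syscall() */

/* Returns the new fd, or -1 with errno set: libc's syscall() translates the raw -errno. */
long open_hosts(void) { return syscall(SYS_openat, AT_FDCWD, "/etc/hosts", O_RDONLY, 0); }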

Common misconceptions

  • Misconception: libc functions are always syscalls. Correction: many libc functions are wrappers or entirely user-space.
  • Misconception: errno is a kernel variable. Correction: the kernel returns negative values; libc sets errno.

Check-your-understanding questions

  1. Which register holds the syscall number on x86-64?
  2. Why might gettimeofday() not appear in your trace?
  3. Predict what happens if rax contains an invalid syscall number.
  4. Explain why -ENOENT appears as -2 in raw return values.

Check-your-understanding answers

  1. rax.
  2. It may be served by vDSO without a syscall.
  3. The kernel returns -ENOSYS.
  4. ENOENT is 2. The kernel returns -errno in rax, so the raw value is -2; libc negates it, stores 2 in errno, and returns -1 to the caller.

Real-world applications

  • Production tracing tools (strace, perf trace)
  • Sandboxing and syscall filtering (seccomp)
  • Debugging stuck or misbehaving processes

References

  • “Computer Systems: A Programmer’s Perspective” (Bryant & O’Hallaron), Ch. 3
  • “The Linux Programming Interface” (Kerrisk), Syscalls chapters

Key insights

The syscall ABI is the unchanging contract that makes tracing possible.

Summary

Syscalls are a precise register-level handshake with the kernel. If you can decode that handshake, you can observe almost any program’s behavior.

Homework/Exercises to practice the concept

  1. Write a tiny program that makes a raw syscall and prints the raw return value.
  2. Use strace -e trace=openat and map register values to arguments.

Solutions to the homework/exercises

  1. Use syscall(SYS_getpid) and print the value; it should match getpid().
  2. Run strace -e trace=openat -e raw=openat to print undecoded (hex) argument values, and map each one to rdi, rsi, rdx, r10 per the ABI.

2.2 ptrace Lifecycle and Event Handling

Fundamentals

ptrace lets one process control and observe another. A tracer attaches, stops the target, inspects registers and memory, and then continues it. The tracer receives events for syscalls, signals, forks, and execs. For a syscall tracer, the basic loop is “wait for stop -> read registers -> decide -> continue”.

Deep Dive into the concept

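ptrace uses waitpid()-style stops to synchronize tracer and tracee. Resuming with PTRACE_SYSCALL makes the tracee stop at the next syscall entry or exit, so each syscall produces two stops. To follow threads and children, set PTRACE_O_TRACECLONE, PTRACE_O_TRACEFORK, and PTRACE_O_TRACEVFORK: the kernel then attaches new threads and children automatically, and each one starts in a stop that you must wait for and resume. Threads that already existed when you attached are not covered; attach to each TID listed in /proc/<pid>/task. PTRACE_O_TRACEEXEC replaces the legacy SIGTRAP sent after a successful execve with a PTRACE_EVENT_EXEC stop. Signals complicate the event stream: a signal-delivery stop is distinct from a syscall stop, and you forward the signal by passing its number as the data argument of the next PTRACE_SYSCALL, or suppress it by passing 0. The tracer must avoid deadlocks by resuming every stopped tracee and by collecting events for every thread (e.g., waitpid(-1, &status, __WALL)). If you forget to resume a thread, the target hangs.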
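The stop kinds described above can be told apart from the waitpid status alone. A minimal sketch, assuming PTRACE_O_TRACESYSGOOD is set; `classify_stop` and `enum stop_kind` are hypothetical names. The signal number it reports is what you would pass as the data argument of the next PTRACE_SYSCALL to forward the signal.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

/* Hypothetical categories for one waitpid() status. */
enum stop_kind {
    STOP_EXITED,   /* tracee exited or was killed */
    STOP_SYSCALL,  /* syscall-entry or syscall-exit stop */
    STOP_EVENT,    /* PTRACE_EVENT_* stop (clone, fork, exec, ...) */
    STOP_SIGNAL,   /* signal-delivery stop: forward or suppress */
    STOP_OTHER
};

/* For STOP_EVENT, *event receives the PTRACE_EVENT_* code; for STOP_SIGNAL,
 * *sig receives the signal to pass on the next resume. */
enum stop_kind classify_stop(int status, int *event, int *sig)
{
    *event = 0;
    *sig = 0;
    if (WIFEXITED(status) || WIFSIGNALED(status))
        return STOP_EXITED;
    if (!WIFSTOPPED(status))
        return STOP_OTHER;
    if (WSTOPSIG(status) == (SIGTRAP | 0x80))   /* marked by TRACESYSGOOD */
        return STOP_SYSCALL;
    if ((status >> 16) != 0) {                  /* event code in bits 16-23 */
        *event = (status >> 16) & 0xff;
        return STOP_EVENT;
    }
    *sig = WSTOPSIG(status);
    return STOP_SIGNAL;
}
```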

How this fits on projects

This is the control loop for the tracer core in Section 5.10 Phase 2 and Section 6 testing.

Definitions & key terms

  • tracer -> process that uses ptrace to control another
  • tracee -> target process being traced
  • stop -> event where tracee is paused and tracer is notified
  • PTRACE_SYSCALL -> stop on syscall entry and exit
  • PTRACE_O_TRACECLONE -> option to receive thread creation events

Mental model diagram (ASCII)

tracer                kernel                tracee
  |  ptrace attach  ->  stop tracee  <-  signal/stop
  |  waitpid        <-  status
  |  getregs        ->
  |  ptrace syscall ->  resume      ->  executes

How it works (step-by-step)

  1. Attach (PTRACE_ATTACH) or launch (PTRACE_TRACEME).
  2. Wait for initial stop and set options.
  3. Loop: wait for stop -> classify stop -> read regs.
  4. On syscall entry, decode args; on exit, decode return.
  5. Resume with PTRACE_SYSCALL, forwarding signals as needed.

Minimal concrete example

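#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
int status;
ptrace(PTRACE_ATTACH, pid, NULL, NULL);   /* sends SIGSTOP to the tracee */
waitpid(pid, &status, 0);                 /* wait until it has actually stopped */
ptrace(PTRACE_SETOPTIONS, pid, NULL, (void *)(long)PTRACE_O_TRACESYSGOOD); /* mark syscall stops */
ptrace(PTRACE_SYSCALL, pid, NULL, NULL);  /* resume until the next syscall entry or exit */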
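The fragment above assumes a running PID. A complete launch-mode tracer looks roughly like this. It is a sketch that assumes x86-64 and a single-threaded tracee, and `trace_child_count` is a hypothetical name. The child calls PTRACE_TRACEME, and the parent toggles a flag to tell entry stops from exit stops and forwards real signals. It also sets PTRACE_O_TRACEEXEC so that the post-exec SIGTRAP becomes an event stop instead of a signal that would be forwarded and kill the child.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (x86-64 only): run argv under ptrace and return how
 * many syscalls it made, or -1 on setup failure. If log is non-NULL, the
 * syscall number (orig_rax) is printed at each entry stop. */
long trace_child_count(char *const argv[], FILE *log)
{
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        /* Child: become traceable, stop so the parent can set options, exec. */
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        raise(SIGSTOP);
        execvp(argv[0], argv);
        _exit(127);
    }

    int status;
    if (waitpid(child, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;
    /* TRACESYSGOOD: syscall stops report SIGTRAP|0x80.
     * TRACEEXEC: exec reports an event stop instead of a real SIGTRAP.
     * EXITKILL: the child is killed if the tracer dies. */
    ptrace(PTRACE_SETOPTIONS, child, NULL,
           (void *)(long)(PTRACE_O_TRACESYSGOOD | PTRACE_O_TRACEEXEC |
                          PTRACE_O_EXITKILL));

    long entries = 0;
    int in_syscall = 0;   /* toggles between entry and exit stops */
    int sig = 0;          /* signal to deliver on the next resume */
    for (;;) {
        if (ptrace(PTRACE_SYSCALL, child, NULL, (void *)(long)sig) < 0)
            return -1;
        sig = 0;
        if (waitpid(child, &status, 0) < 0)
            return -1;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            break;                              /* tracee is gone */
        if (!WIFSTOPPED(status))
            continue;
        if (WSTOPSIG(status) == (SIGTRAP | 0x80)) {
            if (!in_syscall) {                  /* syscall-entry stop */
                struct user_regs_struct regs;
                if (ptrace(PTRACE_GETREGS, child, NULL, &regs) == 0 && log)
                    fprintf(log, "syscall %llu\n", regs.orig_rax);
                entries++;
            }
            in_syscall = !in_syscall;
        } else if ((status >> 16) == 0) {
            sig = WSTOPSIG(status);             /* signal-delivery stop: forward */
        }                                       /* event stops: resume with 0 */
    }
    return entries;
}
```

Calling trace_child_count(argv + 1, stdout) from a main() prints one syscall number per line for a command such as ls, which covers the first homework exercise below.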

Common misconceptions

  • Misconception: PTRACE_SYSCALL only stops once per syscall. Correction: it stops twice, once at entry and once at exit.
  • Misconception: tracing a PID traces all threads automatically. Correction: threads that exist at attach time must each be attached; new threads are traced automatically only if PTRACE_O_TRACECLONE is set.

Check-your-understanding questions

  1. Why does PTRACE_O_TRACESYSGOOD matter?
  2. How do you distinguish syscall stops from signal stops?
  3. What happens if you fail to resume a tracee?

Check-your-understanding answers

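  1. It makes syscall stops report SIGTRAP | 0x80 as the stop signal, so they cannot be confused with a genuine SIGTRAP delivered to the tracee.
  2. With PTRACE_O_TRACESYSGOOD set, a stop where WIFSTOPPED(status) is true and WSTOPSIG(status) == (SIGTRAP | 0x80) is a syscall stop; a nonzero status >> 16 marks a ptrace event stop (clone, fork, exec); any other stop is a signal-delivery stop.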
  3. The tracee stays stopped; the program appears hung.

Real-world applications

  • Debuggers (gdb, lldb)
  • Sandboxing and tracing security tools
  • Process introspection in profilers

References

  • ptrace(2) man page
  • “The Linux Programming Interface”, tracing chapters

Key insights

Tracing is a synchronized stop-and-resume dance: miss a step and the target freezes.

Summary

ptrace provides full control over a process, but only if you handle the event stream correctly.

Homework/Exercises to practice the concept

  1. Write a tracer that only prints syscall numbers for ls.
  2. Attach to a running sleep process and verify it stops and resumes.

Solutions to the homework/exercises

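  1. Loop on PTRACE_SYSCALL + waitpid; at each syscall-entry stop (every other syscall stop), read regs.orig_rax with PTRACE_GETREGS and print it. At entry, rax already holds -ENOSYS, so orig_rax is the field that keeps the syscall number.
  2. Attach, waitpid, then resume; ps should show the process in state t (stopped by the tracer) while it is held, and back in S once resumed.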

2.3 Safe Argument Decoding and User Memory Reads

Fundamentals

Syscall arguments live in registers, but many are pointers to user memory (strings, buffers, structs). You cannot dereference those pointers directly in the tracer; you must read from the tracee’s memory using process_vm_readv or ptrace(PTRACE_PEEKDATA).

Deep Dive into the concept

Decoding arguments safely requires bounding reads and handling faults. Strings are NUL-terminated, but you must cap reads to avoid unbounded memory access. For arrays and structs, you need size information from the syscall prototypes. On x86-64, arguments arrive in registers; the tracer interprets those raw values as addresses in the tracee. A pointer can be invalid from the start, or another thread of the tracee can unmap memory between entry and exit, so any read can fail and your tracer must degrade gracefully (print ? or <fault>). Some arguments are output or in/out parameters (e.g., read fills its buffer; accept updates *addrlen), and can only be printed meaningfully on syscall exit.

How this fits on projects

This is required to decode openat, read, write, execve in Section 3.2 and Section 5.10 Phase 2.

Definitions & key terms

  • user pointer -> address in tracee’s virtual memory
  • process_vm_readv -> efficient user memory read syscall
  • PTRACE_PEEKDATA -> reads one word (8 bytes on x86-64) of tracee memory per ptrace call
  • output / in-out parameter -> memory the kernel writes during the call (e.g., read's buffer, accept's addrlen)

Mental model diagram (ASCII)

Tracer addr space        Tracee addr space
------------------       -----------------
ptr=0x7f...  --read-->   ["/etc/hosts\0"]

How it works (step-by-step)

  1. Read register holding pointer argument.
  2. Use process_vm_readv to copy bytes into tracer buffer.
  3. Cap size (e.g., 256 bytes) and ensure NUL termination.
  4. If read fails, print <fault> and continue.
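The steps above can be sketched as a bounded string reader. This is an illustration, not the project's required API: `read_tracee_string` is a hypothetical name, and it assumes the tracee's memory is not changing underneath (e.g., the tracee is stopped). It reads at most one page per call because, per the process_vm_readv(2) man page, partial transfers do not split a single iovec element. A read that runs into an unmapped page can therefore fail as a whole, losing a readable string that ends just before that page.

```c
#define _GNU_SOURCE            /* process_vm_readv requires _GNU_SOURCE */
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: copy a NUL-terminated string at addr in process pid
 * into buf, reading at most cap-1 bytes (cap must be >= 2).
 * Returns the length copied (buf is always NUL-terminated), or -1 if not
 * even the first byte is readable, in which case print "<fault>". */
ssize_t read_tracee_string(pid_t pid, uintptr_t addr, char *buf, size_t cap)
{
    const size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t got = 0;

    if (cap < 2)
        return -1;
    while (got < cap - 1) {
        /* Stay within one page per read so a fault on the next page does
         * not discard the readable prefix. */
        size_t want = cap - 1 - got;
        size_t to_page_end = page - (size_t)((addr + got) % page);
        if (want > to_page_end)
            want = to_page_end;

        struct iovec local = { buf + got, want };
        struct iovec remote = { (void *)(addr + got), want };
        ssize_t n = process_vm_readv(pid, &local, 1, &remote, 1, 0);
        if (n <= 0)
            break;                        /* fault, or tracee has exited */

        char *nul = memchr(buf + got, '\0', (size_t)n);
        if (nul)
            return nul - buf;             /* terminator found: done */
        got += (size_t)n;
    }
    if (got == 0)
        return -1;
    buf[got] = '\0';                      /* capped or cut short by a fault */
    return (ssize_t)got;
}
```

When the returned length equals cap - 1 the string may have been truncated, so the formatter should mark it (strace appends "..." in that case).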

Minimal concrete example

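#define _GNU_SOURCE           /* process_vm_readv requires _GNU_SOURCE */
#include <sys/uio.h>          /* process_vm_readv, struct iovec */
struct iovec local  = { buf, 256 };           /* destination in the tracer */
struct iovec remote = { (void *)addr, 256 };  /* source address in the tracee */
ssize_t n = process_vm_readv(pid, &local, 1, &remote, 1, 0); /* -1 + errno on fault */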

Common misconceptions

  • Misconception: you can read user pointers directly. Correction: the pointer is in another address space.
  • Misconception: you can always read buffers on syscall entry. Correction: output buffers are only valid after syscall exit.

Check-your-understanding questions

  1. Why is process_vm_readv preferable to repeated PTRACE_PEEKDATA?
  2. When should you print the buffer for read(fd, buf, n)?
  3. What should you do if a read fails?

Check-your-understanding answers

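  1. It copies a whole range in one syscall instead of one ptrace call per 8-byte word, and it reports failure unambiguously through its return value, whereas PEEKDATA returns the data word itself, so -1 can be valid data and you must check errno.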
  2. On syscall exit, after the kernel has filled the buffer.
  3. Print a placeholder and continue; do not crash.
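For comparison, the PEEKDATA route copies one word per ptrace call and has to disambiguate -1. A sketch, assuming x86-64 (8-byte words) and a tracee that is already in a ptrace stop; `peek_bytes` is a hypothetical name.

```c
#include <errno.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: copy len bytes from a stopped tracee, one word per
 * ptrace call. PEEKDATA returns the word itself, so -1 is legal data:
 * clear errno first and check it afterwards. Because whole words are read,
 * the last word of a mapping can fault even if the wanted bytes are mapped. */
int peek_bytes(pid_t pid, uintptr_t addr, void *dst, size_t len)
{
    unsigned char *out = dst;
    size_t done = 0;
    while (done < len) {
        errno = 0;
        long word = ptrace(PTRACE_PEEKDATA, pid, (void *)(addr + done), NULL);
        if (word == -1 && errno != 0)
            return -1;                    /* fault: print "<fault>" */
        size_t chunk = len - done < sizeof word ? len - done : sizeof word;
        /* The word's in-memory bytes match the tracee's bytes at that address. */
        memcpy(out + done, &word, chunk);
        done += chunk;
    }
    return 0;
}
```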

Real-world applications

  • Debuggers that show memory buffers
  • Security tools inspecting arguments for policy enforcement

References

  • process_vm_readv(2) man page
  • “The Linux Programming Interface”, memory access chapters

Key insights

Your tracer is only as trustworthy as your argument decoder.

Summary

Safe decoding means bounded reads, correct timing (entry vs exit), and graceful failure.

Homework/Exercises to practice the concept

  1. Implement string decoding for openat and cap at 128 bytes.
  2. Print read buffers on syscall exit and compare with actual file contents.

Solutions to the homework/exercises

  1. Use process_vm_readv and stop at NUL or 128 bytes.
  2. Store the entry args per thread; on exit, read min(ret, cap) bytes from the buffer (the kernel filled only ret bytes) and print hex + ASCII.

3. Project Specification

3.1 What You Will Build

A CLI tool, syscall-tracer, that attaches to a PID or launches a program and emits a live stream of syscall events with arguments, return values, and latency. It supports filtering (by syscall name, PID, thread), summary reports, and deterministic output mode for testing.

3.2 Functional Requirements

  1. Attach or launch: -p <pid> to attach, -- <cmd> to launch.
  2. Syscall stream: print syscall entry/exit with name, args, return, latency.
  3. Thread-aware: trace all threads and fork/exec children.
  4. Filtering: --filter=openat,read and --pid=....
  5. Summary mode: counts, error rates, total/avg latency.
  6. Output formats: text (default) and JSON (--json).
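  7. Deterministic mode: --fixed-ts prints monotonic timestamps relative to trace start instead of wall-clock time, and --seed fixes the seed for any randomized behavior, so test output is reproducible.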
  8. Graceful shutdown: Ctrl-C detaches and resumes tracees.
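The filtering requirement can be implemented as a bitmap indexed by syscall number and checked at the entry stop, before any argument decoding. This is a minimal sketch. `struct sysfilter`, `sysfilter_parse`, and `sysfilter_match` are hypothetical names, the name table covers only four syscalls for illustration, and MAX_SYSNO = 512 is an assumption that fits current x86-64 syscall numbers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>

#define MAX_SYSNO 512   /* assumption: x86-64 syscall numbers fit below this */

/* Hypothetical filter: one bit per syscall number. */
struct sysfilter {
    bool enabled;                    /* false = trace everything */
    uint64_t bits[MAX_SYSNO / 64];
};

/* Tiny name lookup for the sketch; a real tool reuses its full table. */
static long lookup_sysno(const char *name, size_t len)
{
    static const struct { const char *name; long nr; } known[] = {
        {"read", SYS_read}, {"write", SYS_write},
        {"openat", SYS_openat}, {"close", SYS_close},
    };
    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++)
        if (strlen(known[i].name) == len && memcmp(known[i].name, name, len) == 0)
            return known[i].nr;
    return -1;
}

/* Parse a list such as "openat,read". Returns 0, or -1 on an unknown name. */
int sysfilter_parse(struct sysfilter *f, const char *list)
{
    memset(f, 0, sizeof *f);
    f->enabled = true;
    while (*list) {
        size_t len = strcspn(list, ",");
        long nr = lookup_sysno(list, len);
        if (nr < 0 || nr >= MAX_SYSNO)
            return -1;
        f->bits[nr / 64] |= UINT64_C(1) << (nr % 64);
        list += len;
        if (*list == ',')
            list++;
    }
    return 0;
}

/* Check at the entry stop; skip decoding entirely when this returns false. */
bool sysfilter_match(const struct sysfilter *f, long nr)
{
    if (!f->enabled)
        return true;
    if (nr < 0 || nr >= MAX_SYSNO)
        return false;
    return (f->bits[nr / 64] >> (nr % 64)) & 1;
}
```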

3.3 Non-Functional Requirements

  • Performance: must handle 5k+ syscalls/sec with <10% added latency in a synthetic test.
  • Reliability: never crash the target process; detaches cleanly.
  • Usability: clear error messages, consistent output fields, documented filters.

3.4 Example Usage / Output

$ sudo ./syscall-tracer --fixed-ts -p 4242
[000000.001234] pid=4242 tid=4242 openat(AT_FDCWD, "/etc/hosts", O_RDONLY) = 3 (0.000081s)
[000000.001400] pid=4242 tid=4242 read(3, 0x7ffc... , 64) = 64 (0.000020s)
[000000.001450] pid=4242 tid=4242 close(3) = 0 (0.000004s)

3.5 Data Formats / Schemas / Protocols

Text line format

[timestamp] pid=<pid> tid=<tid> <syscall>(<args>) = <ret> (<latency>s)
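A formatter for this line is mostly arithmetic on nanoseconds. A sketch, assuming timestamps are nanoseconds relative to trace start and that the argument string has already been decoded; `format_text_line` is a hypothetical name.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical formatter for:
 * [timestamp] pid=<pid> tid=<tid> <syscall>(<args>) = <ret> (<latency>s)
 * Seconds are split into integer and microsecond parts. */
int format_text_line(char *out, size_t outsz, uint64_t ts_ns, pid_t pid,
                     pid_t tid, const char *name, const char *args, long ret,
                     uint64_t latency_ns)
{
    return snprintf(out, outsz,
                    "[%06" PRIu64 ".%06" PRIu64 "] pid=%d tid=%d %s(%s) = %ld"
                    " (%" PRIu64 ".%06" PRIu64 "s)",
                    ts_ns / 1000000000u, (ts_ns % 1000000000u) / 1000u,
                    (int)pid, (int)tid, name, args, ret,
                    latency_ns / 1000000000u, (latency_ns % 1000000000u) / 1000u);
}
```

The test values reproduce the close(3) line from Section 3.4.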

JSON format

{
  "ts_ns": 1234000,
  "pid": 4242,
  "tid": 4242,
  "syscall": "openat",
  "args": ["AT_FDCWD", "/etc/hosts", "O_RDONLY"],
  "ret": 3,
  "errno": 0,
  "latency_ns": 81000
}

3.6 Edge Cases

  • Target process exits while tracing.
  • Multi-threaded process spawns threads during attach.
  • Syscall arguments are invalid pointers.
  • Syscalls interrupted by signals and restarted.
  • Large buffers for read/write.
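For the restart edge case, note that kernel-internal restart codes can appear in rax at the syscall-exit stop of an interrupted call, even though libc never returns them. A sketch of a labeling helper (hypothetical name `restart_name`), with values taken from the kernel's include/linux/errno.h. The interrupted call then typically shows up again as a new entry/exit pair, or as restart_syscall.

```c
#include <stddef.h>
#include <string.h>

/* Label kernel-internal restart codes seen in a raw return value.
 * Returns NULL for ordinary errno values and for successful results. */
const char *restart_name(long raw_ret)
{
    if (raw_ret < -516 || raw_ret > -512)
        return NULL;
    switch (-raw_ret) {
    case 512: return "ERESTARTSYS";
    case 513: return "ERESTARTNOINTR";
    case 514: return "ERESTARTNOHAND";
    case 516: return "ERESTART_RESTARTBLOCK";
    default:  return NULL;           /* 515 is ENOIOCTLCMD, not a restart */
    }
}
```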

3.7 Real World Outcome

3.7.1 How to Run (Copy/Paste)

make
sudo ./syscall-tracer --fixed-ts -p 4242
sudo ./syscall-tracer --json --filter=openat,read -- ./demo

3.7.2 Golden Path Demo (Deterministic)

  • Use --fixed-ts and --seed 42 to freeze timestamps and ordering for tests.

3.7.3 CLI Transcript (Success + Failure)

$ sudo ./syscall-tracer --fixed-ts --seed 42 -- ./demo
[000000.000100] pid=9001 tid=9001 execve("./demo", ["./demo"], ...) = 0 (0.000200s)
[000000.000500] pid=9001 tid=9001 openat(AT_FDCWD, "/etc/hosts", O_RDONLY) = 3 (0.000080s)
[000000.000700] pid=9001 tid=9001 close(3) = 0 (0.000004s)
Summary: execve=1 openat=1 close=1 errors=0

$ sudo ./syscall-tracer -p 999999
error: pid 999999 not found
exit code: 2

3.7.4 Exit Codes

  • 0 success
  • 1 internal error (decode/IO)
  • 2 invalid arguments / pid not found

4. Solution Architecture

4.1 High-Level Design

+------------------+
| CLI / Filters    |
+--------+---------+
         |
         v
+--------+---------+     +------------------+
| ptrace Controller|<--->| Tracee Threads   |
+--------+---------+     +------------------+
         |
         v
+--------+---------+
| Decoder & Format |
+--------+---------+
         |
         v
+------------------+
| Summary Engine   |
+------------------+

4.2 Key Components

| Component | Responsibility | Key Decisions |
|-----------|----------------|---------------|
| Controller | attach/launch, wait/continue loop | use PTRACE_SYSCALL + options |
| Decoder | map syscall numbers and args | x86-64 ABI only for v1 |
| Formatter | text/JSON output | stable field order for tests |
| Summary | aggregate counts/latency | per-syscall hash map |

4.3 Data Structures (No Full Code)

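#include <stdint.h>     /* uint64_t */
#include <sys/types.h>  /* pid_t */

struct SyscallEvent {
    pid_t pid;             /* thread-group ID (the process) */
    pid_t tid;             /* thread that made the call */
    long sysno;            /* orig_rax at syscall entry */
    long args[6];          /* rdi, rsi, rdx, r10, r8, r9 */
    long ret;              /* raw rax at syscall exit */
    long errno_val;        /* -ret if ret is in [-4095, -1], else 0 */
    uint64_t ts_enter_ns;  /* monotonic timestamp at the entry stop */
    uint64_t ts_exit_ns;   /* monotonic timestamp at the exit stop */
};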

4.4 Algorithm Overview

Key Algorithm: Syscall Event Pairing

  1. On syscall-entry stop, record registers + timestamp in a map keyed by TID.
  2. On syscall-exit stop, fetch entry, compute latency, decode return.
  3. Emit formatted event and update summary.

Complexity Analysis:

  • Time: O(1) per syscall event
  • Space: O(T) where T is number of traced threads
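The pairing algorithm can be sketched with a small open-addressing table keyed by TID. This is illustrative only: `struct inflight`, `on_syscall_stop`, and the fixed capacity of 1024 are assumptions, and removing entries for exited threads (which needs tombstones in open addressing) is omitted. Entry and exit are told apart by toggling a per-thread flag. On Linux 5.3 and later, PTRACE_GET_SYSCALL_INFO reports entry vs exit explicitly, which is more robust, for example after attaching to a thread that is already inside a syscall.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

#define MAX_TRACKED 1024   /* assumption: fixed capacity for the sketch */

/* Per-thread state for the syscall currently in flight (hypothetical layout). */
struct inflight {
    pid_t tid;             /* 0 marks an empty slot */
    bool in_syscall;       /* true between the entry and exit stops */
    long sysno;            /* orig_rax captured at entry */
    uint64_t ts_enter_ns;  /* monotonic timestamp at entry */
};

static struct inflight table[MAX_TRACKED];

/* Find or claim the slot for tid using linear probing; NULL if full. */
static struct inflight *slot_for(pid_t tid)
{
    size_t i = (size_t)tid % MAX_TRACKED;
    for (size_t probes = 0; probes < MAX_TRACKED; probes++) {
        if (table[i].tid == tid || table[i].tid == 0) {
            table[i].tid = tid;
            return &table[i];
        }
        i = (i + 1) % MAX_TRACKED;
    }
    return NULL;
}

/* Call at every syscall stop of tid. Returns true at an exit stop, filling in
 * the syscall number recorded at entry and the entry-to-exit latency. */
bool on_syscall_stop(pid_t tid, long sysno, uint64_t now_ns,
                     long *out_sysno, uint64_t *out_latency_ns)
{
    struct inflight *s = slot_for(tid);
    if (s == NULL)
        return false;
    if (!s->in_syscall) {          /* entry stop: remember when it started */
        s->in_syscall = true;
        s->sysno = sysno;
        s->ts_enter_ns = now_ns;
        return false;
    }
    s->in_syscall = false;         /* exit stop: pair with the stored entry */
    *out_sysno = s->sysno;
    *out_latency_ns = now_ns - s->ts_enter_ns;
    return true;
}
```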

5. Implementation Guide

5.1 Development Environment Setup

sudo apt install build-essential gcc make

5.2 Project Structure

syscall-tracer/
|-- src/
|   |-- main.c
|   |-- tracer.c
|   |-- decoder.c
|   `-- format.c
|-- include/
|   `-- tracer.h
|-- tests/
|   `-- test_decoder.c
`-- Makefile

5.3 The Core Question You’re Answering

“How can I observe the exact user->kernel boundary without modifying the kernel?”

5.4 Concepts You Must Understand First

  1. Syscall ABI on your architecture.
  2. ptrace lifecycle and event handling.
  3. Safe decoding of user pointers and buffers.

5.5 Questions to Guide Your Design

  1. How will you distinguish entry vs exit stops?
  2. What arguments do you decode fully vs abbreviate?
  3. How will you deal with threads and forks?

5.6 Thinking Exercise

Sketch the trace sequence for open("/etc/hosts"): which syscalls appear before main() runs and why?

5.7 The Interview Questions They’ll Ask

  1. Why do syscalls return negative values in registers?
  2. How do you avoid stopping only the main thread?
  3. What are the main sources of overhead in ptrace-based tracing?

5.8 Hints in Layers

  • Hint 1: Start with PTRACE_TRACEME and trace a child.
  • Hint 2: Add PTRACE_O_TRACESYSGOOD and check status bits.
  • Hint 3: Implement a syscall number->name table.
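For Hint 3, a table indexed by syscall number can be built from the SYS_* constants in <sys/syscall.h>, so it matches the ABI the tracer is compiled for. This is a partial sketch with a hypothetical `syscall_name`; a complete table would normally be generated from the kernel's arch/x86/entry/syscalls/syscall_64.tbl.

```c
#include <stddef.h>
#include <string.h>
#include <sys/syscall.h>

/* Partial number->name table; unlisted slots are zero-initialized (NULL). */
static const char *const syscall_names[] = {
    [SYS_read] = "read",
    [SYS_write] = "write",
    [SYS_close] = "close",
    [SYS_mmap] = "mmap",
    [SYS_execve] = "execve",
    [SYS_exit_group] = "exit_group",
    [SYS_openat] = "openat",
};

/* Returns the name, or NULL so the caller can print e.g. "syscall_435". */
const char *syscall_name(long nr)
{
    size_t n = sizeof syscall_names / sizeof syscall_names[0];
    if (nr < 0 || (size_t)nr >= n || syscall_names[nr] == NULL)
        return NULL;
    return syscall_names[nr];
}
```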

5.9 Books That Will Help

| Topic | Book | Chapter |
|-------|------|---------|
| Syscalls | TLPI | Syscall chapters |
| ABI | CS:APP | Ch. 3 |
| Tracing | TLPI | Debugging/Tracing |

5.10 Implementation Phases

Phase 1: Minimal tracer (3-4 days)

  • Attach to a child and print syscall numbers.
  • Checkpoint: you see syscalls for ls.

Phase 2: Decode + format (1 week)

  • Map numbers to names, decode basic args.
  • Checkpoint: output resembles basic strace.

Phase 3: Threads + summary (1 week)

  • Handle clone/exec, add summary report.
  • Checkpoint: multi-threaded program traces correctly.

5.11 Key Implementation Decisions

| Decision | Options | Recommendation | Rationale |
|----------|---------|----------------|-----------|
| Event pairing | per-pid vs per-tid | per-tid | syscalls from different threads interleave; each thread has at most one syscall in flight |
| Memory reads | PTRACE_PEEKDATA vs process_vm_readv | process_vm_readv | faster + fewer syscalls |


6. Testing Strategy

6.1 Test Categories

| Category | Purpose | Examples |
|----------|---------|----------|
| Unit | Decoder correctness | syscall number mapping |
| Integration | End-to-end trace | trace true, cat |
| Edge | Faulted pointers | invalid address arg |

6.2 Critical Test Cases

  1. Trace a single-threaded program and compare with strace output.
  2. Trace a multi-threaded program and verify all TIDs appear.
  3. Decode a string arg from an unmapped address -> prints <fault>.

6.3 Test Data

program: demo
syscalls: openat, read, close
expected: three events + summary

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

| Pitfall | Symptom | Solution |
|---------|---------|----------|
| Forgetting to resume | target hangs | always PTRACE_SYSCALL after each stop |
| Wrong ABI | garbage args | confirm x86-64 register order (the 4th syscall argument is in r10, not rcx) |
| Missing threads | incomplete trace | handle clone events |

7.2 Debugging Strategies

  • Mirror with strace: run both tools on the same process and diff outputs.
  • Log wait statuses: print raw waitpid status during development.

7.3 Performance Traps

  • Excessive string decoding on every syscall can dominate overhead; cap lengths and sample.

8. Extensions & Challenges

8.1 Beginner Extensions

  • Add a --count-only mode.
  • Add basic colorized output.

8.2 Intermediate Extensions

  • Decode sockaddr structures for connect.
  • Add JSON output with schema validation.

8.3 Advanced Extensions

  • Add eBPF-based fast path for high-frequency syscalls.
  • Implement per-cgroup tracing filters.

9. Real-World Connections

9.1 Industry Applications

  • Debugging production incidents: trace slow I/O or failing syscalls.
  • Security monitoring: detect unexpected syscalls in sandboxed apps.

9.2 Related Tools

  • strace: canonical syscall tracer
  • perf: tracing with lower overhead

9.3 Interview Relevance

  • System call path and context switch discussion
  • Debugging and observability tooling questions

10. Resources

10.1 Essential Reading

  • TLPI (Kerrisk), Syscall and ptrace chapters
  • CS:APP (Bryant/O’Hallaron), Ch. 3 (machine-level code)

10.2 Video Resources

  • Linux syscall ABI talks (LWN / conference recordings)

10.3 Tools & Documentation

  • ptrace(2), process_vm_readv(2) man pages

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain the syscall ABI and register mapping.
  • I can describe how ptrace events are delivered.
  • I can explain why syscall tracing adds overhead.

11.2 Implementation

  • All functional requirements are met.
  • Summary output is correct and deterministic.
  • Edge cases do not crash the tracee.

11.3 Growth

  • I can explain my design in an interview.
  • I can identify one improvement for v2.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Attach/launch tracing works.
  • Syscall name + return value printed.
  • Clean detach on exit.

Full Completion:

  • All minimum criteria plus:
  • Argument decoding for strings and buffers.
  • Summary statistics mode.

Excellence (Going Above & Beyond):

  • JSON output + schema.
  • Low-overhead mode or sampling.

13. Determinism Notes

  • Use --fixed-ts and --seed 42 in tests.
  • Avoid wall-clock timestamps in golden outputs.