Project 2: System Call Tracer (Build Your Own strace)

Build a tool that traces system calls made by a process, showing the syscall name, arguments, and return value—like a simplified version of strace.

Quick Reference

Attribute      | Value
---------------|----------------------------------------------
Difficulty     | Intermediate
Time Estimate  | 1-2 weeks
Language       | C (Rust, Go alternatives)
Prerequisites  | Project 1, process execution, basic assembly
Key Topics     | ptrace, syscall ABI, x86-64 calling convention

1. Learning Objectives

By completing this project, you will:

  • Understand the system call interface—the boundary between user and kernel mode
  • Learn how the ptrace API works (the foundation of all debuggers)
  • Master the x86-64 calling convention for system calls
  • Gain insight into how programs interact with the kernel

2. Theoretical Foundation

2.1 Core Concepts

System Calls: The ONLY way userspace programs can request kernel services. When a program needs to read a file, allocate memory, or create a process, it must make a system call.

ptrace: A powerful debugging interface that allows one process (the tracer) to observe and control another process (the tracee). It’s used by strace, gdb, and similar tools.

System Call Flow (x86-64):
┌─────────────────────────────────────────────────────────────┐
│                    User Space                                │
│                                                              │
│  Program:    mov rax, 1        ; syscall number (write)     │
│              mov rdi, 1        ; arg1: fd (stdout)          │
│              mov rsi, msg      ; arg2: buffer address       │
│              mov rdx, len      ; arg3: length               │
│              syscall           ; transition to kernel       │
│                    │                                         │
├────────────────────┼────────────────────────────────────────┤
│                    ▼          Kernel Space                  │
│              ┌───────────┐                                   │
│              │ syscall   │  Saves user state                │
│              │  entry    │  Looks up handler in syscall     │
│              │           │  table using rax                 │
│              └─────┬─────┘                                   │
│                    │                                         │
│              ┌─────▼─────┐                                   │
│              │ sys_write │  The actual handler              │
│              │ function  │  Does the work                   │
│              └─────┬─────┘                                   │
│                    │                                         │
│              ┌─────▼─────┐                                   │
│              │ syscall   │  Restores user state             │
│              │  exit     │  Returns to userspace            │
│              └─────┬─────┘                                   │
│                    │                                         │
├────────────────────┼────────────────────────────────────────┤
│                    ▼          User Space                    │
│              Return value in rax                             │
└─────────────────────────────────────────────────────────────┘
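To make the register roles in the diagram concrete, here is a minimal sketch that issues write() with the raw syscall instruction instead of going through libc. It assumes x86-64 Linux and GCC or Clang inline assembly; the helper name raw_write is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

// Hypothetical helper: issue write(fd, buf, len) with the raw `syscall`
// instruction (x86-64 Linux, GCC/Clang inline asm only).
// Returns what the kernel leaves in rax: bytes written, or -errno on failure.
static long raw_write(int fd, const void *buf, size_t len) {
    long ret;
    __asm__ volatile (
        "syscall"
        : "=a"(ret)                 /* rax out: return value          */
        : "a"((long)SYS_write),     /* rax in: syscall number (1)     */
          "D"((long)fd),            /* rdi: arg1, file descriptor     */
          "S"(buf),                 /* rsi: arg2, buffer address      */
          "d"(len)                  /* rdx: arg3, length              */
        : "rcx", "r11", "memory");  /* syscall overwrites rcx and r11 */
    return ret;
}
```

Calling raw_write(1, "hi\n", 3) prints hi and returns 3, while raw_write(-1, "x", 1) returns -EBADF directly instead of setting errno. That raw -errno value is exactly what a tracer finds in rax at syscall exit. The rcx and r11 clobbers are needed because the syscall instruction stores the return address and flags in those registers.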

2.2 Why This Matters

Understanding system calls is fundamental to:

  • Debugging: Why did this program fail? What file was it trying to open?
  • Security: System calls are the attack surface between user and kernel
  • Performance: Minimizing syscalls is a key optimization technique
  • Sandboxing: Sandboxes restrict or emulate programs by interposing on their syscalls (for example with seccomp or ptrace)

2.3 Historical Context

The ptrace system call has existed since early Unix. It was designed for debugging but has been repurposed for tracing, sandboxing, and even checkpointing. The Linux implementation has evolved significantly, adding requests such as PTRACE_SEIZE and PTRACE_GET_SYSCALL_INFO, as well as integration with seccomp-bpf filters (PTRACE_O_TRACESECCOMP).

2.4 Common Misconceptions

  • “System calls are slow” - Modern syscalls are quite fast (~100ns), but they’re still orders of magnitude slower than regular function calls.
  • “All libc functions are syscalls” - Many libc functions (like strlen) don’t involve the kernel at all.
  • “ptrace is only for debugging” - It’s used for tracing, sandboxing, and more.

3. Project Specification

3.1 What You Will Build

A command-line tool called mytrace that:

  • Launches a program and traces its system calls
  • Shows syscall names, arguments, and return values
  • Optionally provides summary statistics

3.2 Functional Requirements

  1. Basic Tracing:
    • Launch a program under trace
    • Intercept all system calls
    • Print syscall name, arguments, and return value
  2. Argument Decoding:
    • Decode integer arguments
    • Read string arguments from traced process memory
    • Decode common flags (O_RDONLY, etc.)
  3. Child Process Tracking (stretch goal):
    • Follow fork/clone children
    • Track multiple processes
  4. Summary Mode:
    • Count syscalls by type
    • Measure time spent in each syscall

3.3 Non-Functional Requirements

  • Minimal performance impact on traced program
  • Handle signals correctly
  • Clean up properly on exit

3.4 Example Usage / Output

$ ./mytrace ls -la
[1234] execve("/bin/ls", ["ls", "-la"], [...]) = 0
[1234] brk(NULL) = 0x55a1e2f3d000
[1234] arch_prctl(0x3001, 0x7ffd2a3f3160) = -1 EINVAL
[1234] access("/etc/ld.so.preload", R_OK) = -1 ENOENT
[1234] openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
[1234] fstat(3, {st_mode=S_IFREG|0644, st_size=87441, ...}) = 0
[1234] mmap(NULL, 87441, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f8c12345000
[1234] close(3) = 0
... (more syscalls)
[1234] write(1, "total 48\ndrwxr-xr-x 5 user...", 234) = 234
[1234] close(1) = 0
[1234] exit_group(0) = ?
+++ exited with 0 +++

$ ./mytrace -c ls    # Summary mode
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 45.23    0.000234          11        21           mmap
 22.11    0.000114           8        14           close
 15.67    0.000081           6        13           openat
  8.45    0.000044           7         6           fstat
  5.32    0.000027          27         1           execve
  3.22    0.000017           8         2         2 access
------ ----------- ----------- --------- --------- ----------------
100.00    0.000517                    57         2 total
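A minimal sketch of how the -c table could be produced from per-syscall totals. The struct summary_row and print_summary names are hypothetical, and the column widths were chosen to mirror the sample above. Sorting is left to the caller (strace sorts by time by default).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

// Hypothetical per-syscall totals collected while tracing.
struct summary_row {
    const char *name;
    long calls;
    long errors;
    long total_ns;
};

// Print rows in roughly the layout of `strace -c`, in the order given
// (sort rows by total_ns first to list the most expensive syscalls on top).
void print_summary(FILE *out, const struct summary_row *rows, size_t nrows) {
    const char *rule =
        "------ ----------- ----------- --------- --------- ----------------\n";
    long total_ns = 0, total_calls = 0, total_errors = 0;
    for (size_t i = 0; i < nrows; i++) {
        total_ns += rows[i].total_ns;
        total_calls += rows[i].calls;
        total_errors += rows[i].errors;
    }

    fprintf(out, "%% time     seconds  usecs/call     calls    errors syscall\n%s", rule);
    for (size_t i = 0; i < nrows; i++) {
        const struct summary_row *r = &rows[i];
        double pct = total_ns > 0 ? 100.0 * r->total_ns / total_ns : 0.0;
        long usecs_per_call = r->calls > 0 ? r->total_ns / 1000 / r->calls : 0;
        fprintf(out, "%6.2f %11.6f %11ld %9ld ", pct, r->total_ns / 1e9,
                usecs_per_call, r->calls);
        if (r->errors > 0)
            fprintf(out, "%9ld %s\n", r->errors, r->name);
        else
            fprintf(out, "%9s %s\n", "", r->name);   // blank errors column
    }
    fprintf(out, "%s%6.2f %11.6f %11s %9ld %9ld total\n", rule, 100.0,
            total_ns / 1e9, "", total_calls, total_errors);
}
```

The time per syscall would come from timestamps taken at the entry and exit stops. Note that this measures wall-clock time including tracer overhead, so the numbers are only roughly comparable to strace's.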

3.5 Real World Outcome

You’ll have a functional strace clone that you can use for real debugging. More importantly, you’ll deeply understand the user/kernel interface that every program uses.


4. Solution Architecture

4.1 High-Level Design

┌─────────────────────────────────────────────────────────────┐
│                      mytrace (tracer)                       │
│                                                              │
│  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐    │
│  │   Main       │   │   Syscall    │   │   Output     │    │
│  │   Loop       │   │   Decoder    │   │   Formatter  │    │
│  └──────┬───────┘   └──────┬───────┘   └──────┬───────┘    │
│         │                  │                  │             │
│         │ wait()           │ PTRACE_GETREGS   │             │
│         │ PTRACE_SYSCALL   │ PTRACE_PEEKDATA  │             │
│         │                  │                  │             │
└─────────┼──────────────────┼──────────────────┼─────────────┘
          │                  │                  │
          ▼                  ▼                  ▼
┌─────────────────────────────────────────────────────────────┐
│                    Target Process (tracee)                   │
│                                                              │
│  Stopped at:                                                 │
│    - syscall entry (before kernel)                           │
│    - syscall exit (after kernel)                             │
│                                                              │
└─────────────────────────────────────────────────────────────┘

4.2 Key Components

  1. Process Control: Fork/exec tracee, manage ptrace state
  2. Syscall Detection: Distinguish entry from exit, read registers
  3. Argument Decoding: Read memory, decode types and flags
  4. Output Formatting: Format syscalls like strace

4.3 Data Structures

// Syscall information
struct syscall_info {
    long number;              // Syscall number (from rax)
    long args[6];             // Arguments (rdi, rsi, rdx, r10, r8, r9)
    long retval;              // Return value
    int entry;                // 1 = entry, 0 = exit
};

// Syscall table entry
struct syscall_entry {
    const char *name;         // "read", "write", etc.
    int nargs;                // Number of arguments
    int arg_types[6];         // Type of each argument
};

// Argument types
enum arg_type {
    ARG_INT,          // Integer
    ARG_PTR,          // Pointer (print as hex)
    ARG_STR,          // String (read from tracee)
    ARG_FD,           // File descriptor
    ARG_FLAGS,        // Flags (decode)
    ARG_STRUCT,       // Structure (print abbreviated)
};

// Statistics tracking
struct syscall_stats {
    int count;
    int errors;
    long total_time_ns;
};
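One possible way to populate these structures, as a sketch: a sparse table indexed by x86-64 syscall number using C99 designated initializers, so numbers without an entry read back as NULL. The declarations are repeated so the block compiles on its own. get_syscall_name matches the name used in the unit tests of the testing section, while record_syscall and the stats array are hypothetical. write's buffer is typed ARG_PTR here because it is not NUL-terminated.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Restated from the declarations above so this block compiles on its own.
enum arg_type { ARG_INT, ARG_PTR, ARG_STR, ARG_FD, ARG_FLAGS, ARG_STRUCT };

struct syscall_entry {
    const char *name;
    int nargs;
    int arg_types[6];
};

struct syscall_stats {
    int count;
    int errors;
    long total_time_ns;
};

// Indexed by x86-64 syscall number (arch/x86/entry/syscalls/syscall_64.tbl).
// Only a few entries are filled in; gaps stay zero-initialized (name == NULL).
static const struct syscall_entry syscall_table[] = {
    [0]   = {"read",   3, {ARG_FD, ARG_PTR, ARG_INT}},
    [1]   = {"write",  3, {ARG_FD, ARG_PTR, ARG_INT}},   // buffer length is arg3
    [2]   = {"open",   3, {ARG_STR, ARG_FLAGS, ARG_INT}},
    [3]   = {"close",  1, {ARG_FD}},
    [257] = {"openat", 4, {ARG_FD, ARG_STR, ARG_FLAGS, ARG_INT}},
};
#define NSYSCALLS (sizeof(syscall_table) / sizeof(syscall_table[0]))

static struct syscall_stats stats[NSYSCALLS];

// Returns NULL for numbers the table doesn't know; the caller should then
// print the raw number instead of crashing.
const char *get_syscall_name(long nr) {
    if (nr < 0 || (size_t)nr >= NSYSCALLS)
        return NULL;
    return syscall_table[nr].name;
}

// Record one completed syscall; a raw return in [-4095, -1] counts as an error.
void record_syscall(long nr, long retval, long elapsed_ns) {
    if (nr < 0 || (size_t)nr >= NSYSCALLS)
        return;
    stats[nr].count++;
    if (retval < 0 && retval >= -4095)
        stats[nr].errors++;
    stats[nr].total_time_ns += elapsed_ns;
}
```

A sparse array keeps lookup O(1) at the cost of a few kilobytes of mostly-empty entries, which is negligible next to the cost of a ptrace stop.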

4.4 Algorithm Overview

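Main Tracing Loop:

1. Fork child process
2. In child: PTRACE_TRACEME, raise(SIGSTOP), then exec target
3. In parent: wait for the child's initial stop, then call PTRACE_SETOPTIONS
   with PTRACE_O_TRACESYSGOOD so syscall stops report SIGTRAP | 0x80
4. Loop:
   a. PTRACE_SYSCALL to continue until the next syscall entry or exit,
      delivering any signal saved in step e
   b. wait() for child to stop
   c. If exited or killed by a signal: break
   d. If stopped at a syscall (SIGTRAP | 0x80):
      - PTRACE_GETREGS to read registers
      - If syscall entry: save syscall number and args
      - If syscall exit: print complete syscall info
   e. If stopped by any other signal: save it so step a delivers it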
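The loop can be sketched as a single function. This is a simplified version that assumes x86-64 Linux and a single tracee (no fork/clone following). trace_program and its output format are hypothetical. It also sets PTRACE_O_TRACEEXEC so the post-exec SIGTRAP arrives as an event stop rather than as a signal that would have to be filtered out by hand.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical entry point: run argv[0] under ptrace and print one
// "syscall(NR) = RET" line per completed syscall (x86-64 registers).
// Returns the number of completed syscalls, or -1 if tracing never started.
long trace_program(char *const argv[], FILE *out) {
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
            _exit(126);                 // tracing not permitted here
        raise(SIGSTOP);                 // let the parent set options first
        execvp(argv[0], argv);
        _exit(127);                     // exec failed
    }

    int status;
    if (waitpid(child, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;
    // TRACESYSGOOD: syscall stops report SIGTRAP | 0x80.
    // TRACEEXEC: the post-exec SIGTRAP becomes a PTRACE_EVENT_EXEC stop.
    // EXITKILL: the tracee is killed if the tracer dies.
    ptrace(PTRACE_SETOPTIONS, child, NULL,
           (void *)(long)(PTRACE_O_TRACESYSGOOD | PTRACE_O_TRACEEXEC | PTRACE_O_EXITKILL));

    long count = 0, nr = -1;
    int in_syscall = 0, sig = 0;        // sig = 0 also suppresses the initial SIGSTOP
    while (ptrace(PTRACE_SYSCALL, child, NULL, (void *)(long)sig) == 0) {
        sig = 0;
        if (waitpid(child, &status, 0) < 0 || WIFEXITED(status) || WIFSIGNALED(status))
            break;                      // tracee is gone
        int stopsig = WSTOPSIG(status);
        if (stopsig == (SIGTRAP | 0x80)) {
            struct user_regs_struct regs;
            ptrace(PTRACE_GETREGS, child, NULL, &regs);
            if (!in_syscall) {
                nr = (long)regs.orig_rax;    // entry: number (args in rdi, rsi, ...)
            } else {
                fprintf(out, "syscall(%ld) = %lld\n", nr, (long long)regs.rax);
                count++;
            }
            in_syscall = !in_syscall;
        } else if ((status >> 16) == 0) {
            sig = stopsig;              // a real signal: deliver it on resume
        }                               // else: PTRACE_EVENT_* stop, nothing to deliver
    }
    return count;
}
```

With char *args[] = {"ls", "-la", NULL}, calling trace_program(args, stderr) prints one line per completed syscall. exit_group never produces an exit stop, so it is not counted. Name lookup and argument decoding would plug into the entry and exit branches.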

5. Implementation Guide

5.1 Development Environment Setup

# Standard development tools (strace is used for comparison when testing)
sudo apt install build-essential gdb strace

# Create project
mkdir mytrace && cd mytrace

5.2 Project Structure

mytrace/
├── Makefile
├── include/
│   ├── syscalls.h      # Syscall table
│   ├── decoder.h       # Argument decoder
│   └── tracer.h        # Ptrace wrappers
├── src/
│   ├── main.c
│   ├── tracer.c        # Process control
│   ├── decoder.c       # Syscall decoding
│   ├── syscalls.c      # Syscall table
│   └── output.c        # Formatting
└── tests/
    └── test_programs/

5.3 The Core Question You’re Answering

“How does the kernel transfer control between user mode and kernel mode, and how can we observe this boundary?”

5.4 Concepts You Must Understand First

  1. What is a system call at the hardware level?
    • What instruction triggers a syscall on x86-64?
    • What happens to the CPU state?
    • Reference: “The Linux Programming Interface” Chapter 3
  2. How does ptrace work?
    • What does PTRACE_TRACEME do?
    • How does the tracer get notified of events?
    • Reference: man 2 ptrace
  3. What is the x86-64 syscall calling convention?
    • Which registers hold arguments?
    • Where is the return value?
    • Reference: System V AMD64 ABI

5.5 Questions to Guide Your Design

State Management:

  • How do you know if you’re at syscall entry or exit? (Hint: count)
  • What state do you need to save between entry and exit?

Memory Access:

  • How do you read strings from the tracee’s address space?
  • What if a string is very long?

Error Handling:

  • How do you decode negative return values?
  • What’s the difference between -ENOENT and 0xFFFFFFFF?
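One way to approach the error-handling questions, as a sketch: on x86-64 Linux the kernel reports failure by returning -errno, and only values in [-4095, -1] are errors. Every other value is a successful result, which is why 0xFFFFFFFF (4294967295 in a 64-bit register) is not an error while -ENOENT is. The format_retval and errno_name helpers, and the short list of names, are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

// Hypothetical helper: map a few errno values to their symbolic names.
static const char *errno_name(int err) {
    switch (err) {
    case EPERM:  return "EPERM";
    case ENOENT: return "ENOENT";
    case EBADF:  return "EBADF";
    case EACCES: return "EACCES";
    case EINVAL: return "EINVAL";
    default:     return NULL;
    }
}

// Format a raw syscall return value (as read from rax) strace-style.
// The kernel reports failure as -errno in [-4095, -1]; any other value,
// including 0xFFFFFFFF, is a successful result.
void format_retval(long ret, char *buf, size_t len) {
    if (ret < 0 && ret >= -4095) {
        int err = (int)-ret;
        const char *name = errno_name(err);
        if (name != NULL)
            snprintf(buf, len, "-1 %s (%s)", name, strerror(err));
        else
            snprintf(buf, len, "-1 errno %d (%s)", err, strerror(err));
    } else {
        snprintf(buf, len, "%ld", ret);
    }
}
```

For example, a raw value of -2 formats as -1 ENOENT (No such file or directory), similar to the sample output. Syscalls that return pointers (brk, mmap) would instead be printed in hex, using the type information from the syscall table.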

5.6 Thinking Exercise

Before writing code, trace through what happens when you run ls:

  1. mytrace calls fork() - new process created
  2. Child calls ptrace(PTRACE_TRACEME) - requests tracing
  3. Child calls execve("/bin/ls", ...) - on success the kernel sends the traced child a SIGTRAP, so it stops before ls runs
  4. Tracer sees child stopped, calls PTRACE_SYSCALL
  5. Child runs until brk() syscall entry - stops
  6. Tracer reads registers (orig_rax=12, the brk syscall number; rax holds -ENOSYS at entry)
  7. Tracer calls PTRACE_SYSCALL
  8. Child enters kernel, brk() executes, exits - stops at syscall exit
  9. Tracer reads registers (rax=return value)
  10. … repeat for each syscall …

Question: How does the tracer distinguish step 5 from step 8?

5.7 Hints in Layers

Hint 1 - Basic Setup:

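#include <signal.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <unistd.h>

pid_t child = fork();
if (child == 0) {
    // Child: request tracing, then stop so the parent can set options
    ptrace(PTRACE_TRACEME, 0, NULL, NULL);
    raise(SIGSTOP);
    execvp(argv[1], &argv[1]);  // argv[0] is "mytrace" itself
    _exit(127);                 // reached only if exec fails
}
// Parent: consume the initial SIGSTOP, then request SIGTRAP|0x80 syscall stops
int status;
waitpid(child, &status, 0);
ptrace(PTRACE_SETOPTIONS, child, 0, PTRACE_O_TRACESYSGOOD);
// Continue with the tracing loop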

Hint 2 - Detecting Syscall Entry/Exit:

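#include <signal.h>
#include <sys/wait.h>

// ptrace stops at BOTH entry and exit of each syscall,
// so track state to know which one this is (one flag per tracee)
int in_syscall = 0;

// After waitpid(child, &status, 0). SIGTRAP | 0x80 is only reported
// for syscall stops once PTRACE_O_TRACESYSGOOD has been set.
if (WIFSTOPPED(status) && WSTOPSIG(status) == (SIGTRAP | 0x80)) {
    if (!in_syscall) {
        // Syscall entry - read number and args
        in_syscall = 1;
    } else {
        // Syscall exit - read return value
        in_syscall = 0;
    }
}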

Hint 3 - Reading Registers:

#include <sys/ptrace.h>
#include <sys/user.h>

struct user_regs_struct regs;
ptrace(PTRACE_GETREGS, child, NULL, &regs);

// On x86-64:
// regs.orig_rax = syscall number (valid at both entry and exit)
// regs.rdi = arg1, regs.rsi = arg2, regs.rdx = arg3
// regs.r10 = arg4, regs.r8 = arg5, regs.r9 = arg6
// regs.rax = return value at exit (at entry it holds -ENOSYS)

Hint 4 - Reading Strings from Tracee:

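#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>

// Read a NUL-terminated string from the tracee's memory.
// Returns a malloc'd buffer (caller frees), or NULL on error.
char *read_string(pid_t pid, unsigned long addr) {
    char *str = malloc(PATH_MAX);
    if (str == NULL)
        return NULL;

    // PATH_MAX (4096) is a multiple of sizeof(long), so memcpy stays in bounds
    for (size_t i = 0; i < PATH_MAX; i += sizeof(long)) {
        // PTRACE_PEEKDATA reads one word at a time. -1 is also valid data,
        // so clear errno first and check it afterwards.
        errno = 0;
        long word = ptrace(PTRACE_PEEKDATA, pid, (void *)(addr + i), NULL);
        if (word == -1 && errno != 0) {
            free(str);          // EFAULT or EIO: bad address
            return NULL;
        }
        memcpy(str + i, &word, sizeof(word));

        // Stop as soon as this word contains the terminator
        if (memchr(str + i, '\0', sizeof(word)) != NULL)
            return str;
    }
    str[PATH_MAX - 1] = '\0';   // truncate over-long strings
    return str;
}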

5.8 The Interview Questions They’ll Ask

  1. “Explain the syscall calling convention on x86-64”
    • Number in rax, args in rdi, rsi, rdx, r10, r8, r9
    • Return value in rax
    • syscall instruction triggers the transition
  2. “How does ptrace intercept system calls?”
    • Tracee is stopped at syscall entry and exit
    • Tracer uses wait() to detect stops
    • PTRACE_GETREGS reads the state
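  3. “What’s the performance impact of tracing?”
    • Two stops per syscall (entry + exit), each a round trip through the tracer (roughly four context switches per syscall)
    • One PTRACE_PEEKDATA call per word of string data
    • Can slow syscall-heavy programs down by 10-100x
  4. “How would you trace a multi-threaded program?”
    • Set PTRACE_O_TRACECLONE so new threads are attached automatically (reported as PTRACE_EVENT_CLONE)
    • When attaching to a running process, attach each thread listed in /proc/PID/task; PTRACE_SEIZE avoids the SIGSTOP side effects of PTRACE_ATTACH
    • Keep per-thread state and wait with waitpid(-1, &status, __WALL)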
  5. “What’s seccomp and how does it relate to ptrace?”
    • seccomp-bpf can filter syscalls before they execute
    • Lower overhead than ptrace
    • Can’t modify arguments like ptrace can

5.9 Books That Will Help

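Topic                   | Book                            | Chapter
------------------------|---------------------------------|---------------------------------------------
System call mechanism   | The Linux Programming Interface | Chapters 3, 44
ptrace interface        | Linux System Programming        | Chapter 10
x86-64 ABI              | System V AMD64 ABI              | Chapter 3; Appendix A.2 (kernel conventions)
Kernel syscall handling | Understanding the Linux Kernel  | Chapter 10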

5.10 Implementation Phases

Phase 1: Basic Tracing (Days 1-2)

  • Fork/exec with PTRACE_TRACEME
  • Main loop with PTRACE_SYSCALL
  • Print syscall numbers

Phase 2: Syscall Decoding (Days 3-4)

  • Build syscall number-to-name table
  • Decode integer arguments
  • Print basic output format

Phase 3: String Arguments (Days 5-6)

  • Implement PTRACE_PEEKDATA reader
  • Handle string arguments (filenames)
  • Truncate long strings

Phase 4: Flag Decoding (Days 7-8)

  • Decode open() flags
  • Decode mmap() protection flags
  • Decode common constants

Phase 5: Polish (Days 9-10)

  • Add summary mode (-c)
  • Handle errors and signals
  • Test with various programs

5.11 Key Implementation Decisions

Decision 1: How to track syscall entry vs exit

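  • Simple counter approach: odd = entry, even = exit (one counter per tracee; don't count signal or PTRACE_EVENT stops)
  • Heuristic on x86-64: at entry rax holds -ENOSYS (but a syscall can also return -ENOSYS)
  • Most robust (Linux 5.3+): PTRACE_GET_SYSCALL_INFO reports whether a stop is an entry or an exit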

Decision 2: How much string data to read

  • Read up to N bytes (e.g., 80)
  • Indicate truncation with “…”
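A minimal sketch of the display side of this decision: quote a string read from the tracee, escape control characters, and mark truncation with "..." after the closing quote, as strace does. format_string_arg and put are hypothetical names, and the limit is a parameter rather than a fixed N.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

// Append piece at offset n without overflowing out; returns the new offset.
static size_t put(char *out, size_t outlen, size_t n, const char *piece) {
    if (n < outlen)
        n += (size_t)snprintf(out + n, outlen - n, "%s", piece);
    return n;
}

// Quote s like strace does ("abc\n"), escaping quotes, backslashes and
// non-printable bytes. At most max_chars bytes are shown; if more remain,
// "..." follows the closing quote. out is always NUL-terminated (outlen > 0).
void format_string_arg(const char *s, size_t max_chars, char *out, size_t outlen) {
    size_t n = put(out, outlen, 0, "\"");
    size_t i;
    for (i = 0; i < max_chars && s[i] != '\0'; i++) {
        unsigned char c = (unsigned char)s[i];
        char tmp[8];
        if (c == '\n')
            strcpy(tmp, "\\n");
        else if (c == '\t')
            strcpy(tmp, "\\t");
        else if (c == '"' || c == '\\')
            snprintf(tmp, sizeof(tmp), "\\%c", c);
        else if (isprint(c))
            snprintf(tmp, sizeof(tmp), "%c", c);
        else
            snprintf(tmp, sizeof(tmp), "\\%o", c);   // octal escape, e.g. \33
        n = put(out, outlen, n, tmp);
    }
    // A byte other than NUL at the limit means the string was cut short.
    put(out, outlen, n, s[i] != '\0' ? "\"..." : "\"");
}
```

Escaping matters because filenames and write() buffers can contain newlines or terminal control bytes that would otherwise corrupt the trace output.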

Decision 3: How to handle unknown syscalls

  • Print number if name unknown
  • Don’t crash—new kernels may have new syscalls

6. Testing Strategy

Unit Tests

#include <assert.h>
#include <fcntl.h>
#include <string.h>

// Test syscall number mapping (x86-64 numbers)
void test_syscall_names() {
    assert(strcmp(get_syscall_name(0), "read") == 0);
    assert(strcmp(get_syscall_name(1), "write") == 0);
    assert(strcmp(get_syscall_name(2), "open") == 0);
}

// Test flag decoding
void test_open_flags() {
    char buf[256];
    decode_open_flags(O_RDONLY | O_CLOEXEC, buf);
    assert(strstr(buf, "O_RDONLY") != NULL);
    assert(strstr(buf, "O_CLOEXEC") != NULL);
}

Integration Tests

# Simple test
./mytrace /bin/true
# Should show execve and exit_group

# Test file operations
./mytrace cat /etc/hostname
# Should show openat, read, write, close

# Compare with real strace (mytrace, like strace, is assumed to print its trace on stderr)
strace -o strace.out ls
./mytrace ls 2> mytrace.out
# Compare outputs (won't be identical, but should contain the same syscalls)

Test Programs

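// test_programs/simple_syscalls.c
#include <fcntl.h>
#include <unistd.h>

int main(void) {
    int fd = open("/etc/passwd", O_RDONLY);
    if (fd < 0)
        return 1;
    char buf[100];
    read(fd, buf, sizeof(buf));
    close(fd);
    return 0;
}
// Expected output (after the dynamic loader's startup syscalls):
// openat, read, close, exit_group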

7. Common Pitfalls & Debugging

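Problem               | Symptom                    | Root Cause                                                               | Fix
----------------------|----------------------------|--------------------------------------------------------------------------|------------------------------------------------------------------
Tracee hangs          | Program never starts       | Initial SIGSTOP stop not handled                                         | Wait for the initial stop, then PTRACE_SYSCALL
Wrong syscall numbers | Numbers don’t match strace | Reading rax (-ENOSYS at entry); 32-bit int 0x80 calls use another table | Use orig_rax, not rax
Garbage strings       | Random characters          | Wrong address, or a failed PTRACE_PEEKDATA (-1) used as data             | Verify the address; clear and check errno, handle EFAULT/EIO
Missing syscalls      | Less output than strace -f | Not following children                                                   | Add PTRACE_O_TRACEFORK, PTRACE_O_TRACEVFORK, PTRACE_O_TRACECLONE
Tracer exits early    | Misses syscalls            | Not handling PTRACE_EVENT_* stops                                        | Check the full wait status (status >> 16 is the event)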

Quick Verification

# Run a known program and count syscalls
./mytrace -c /bin/true 2>&1 | grep "exit_group"
# Should show exactly 1 exit_group call

# Compare syscall sequence
strace -e trace=openat cat /etc/hostname 2>&1 | head -3
./mytrace cat /etc/hostname 2>&1 | grep openat | head -3
# Should show similar syscalls

8. Extensions & Challenges

Easy Extensions

  1. Add timestamp: Show time of each syscall
  2. Add pid prefix: Support tracing multiple processes
  3. Filter by syscall: Only show specific syscalls (-e option)

Medium Extensions

  1. Decode more syscalls: Add stat, ioctl, socket structures
  2. Follow children: Handle fork/clone with PTRACE_O_TRACEFORK
  3. Stack traces: Show call stack at each syscall
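For the follow-children extension, a minimal sketch of the bookkeeping involved, assuming x86-64 Linux. The fork/vfork/clone options make the kernel auto-attach new children, each of which starts with a SIGSTOP that the tracer should suppress. waitpid(-1, ..., __WALL) collects stops from every tracee, including threads, and the in_syscall flag moves into per-pid state. The fixed 64-entry table and the trace_tree name are hypothetical simplifications.

```c
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_TRACEES 64

// Hypothetical per-tracee state: entry/exit must be tracked per pid.
struct tracee { pid_t pid; int in_syscall; };
static struct tracee tracees[MAX_TRACEES];

// Find the slot for pid; find(0) returns a free slot (or NULL if full).
static struct tracee *find(pid_t pid) {
    for (int i = 0; i < MAX_TRACEES; i++)
        if (tracees[i].pid == pid)
            return &tracees[i];
    return NULL;
}

// Trace argv[0] and every process/thread it creates.
// Returns how many distinct pids were traced, or -1 on failure.
int trace_tree(char *const argv[]) {
    memset(tracees, 0, sizeof(tracees));
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
            _exit(126);
        raise(SIGSTOP);
        execvp(argv[0], argv);
        _exit(127);
    }

    int status, seen = 1;
    if (waitpid(child, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;
    tracees[0].pid = child;
    // Options are inherited by auto-attached children.
    ptrace(PTRACE_SETOPTIONS, child, NULL,
           (void *)(long)(PTRACE_O_TRACESYSGOOD | PTRACE_O_TRACEEXEC | PTRACE_O_EXITKILL |
                          PTRACE_O_TRACEFORK | PTRACE_O_TRACEVFORK | PTRACE_O_TRACECLONE));
    ptrace(PTRACE_SYSCALL, child, NULL, NULL);

    pid_t pid;
    // __WALL also waits for clone()d threads; -1 (ECHILD) means all are gone.
    while ((pid = waitpid(-1, &status, __WALL)) > 0) {
        struct tracee *t = find(pid);
        if (WIFEXITED(status) || WIFSIGNALED(status)) {
            if (t != NULL)
                t->pid = 0;             // free the slot
            continue;
        }
        int is_new = (t == NULL);
        if (is_new && (t = find(0)) != NULL) {
            t->pid = pid;
            t->in_syscall = 0;
            seen++;
        }
        int sig = 0, stopsig = WSTOPSIG(status);
        if (stopsig == (SIGTRAP | 0x80)) {
            if (t != NULL)
                t->in_syscall = !t->in_syscall;   // decode/print per pid here
        } else if ((status >> 16) != 0) {
            // PTRACE_EVENT_FORK/VFORK/CLONE/EXEC: the new child reports its own stop
        } else if (!(is_new && stopsig == SIGSTOP)) {
            sig = stopsig;              // forward genuine signals
        }                               // a new child's initial SIGSTOP is suppressed
        ptrace(PTRACE_SYSCALL, pid, NULL, (void *)(long)sig);
    }
    return seen;
}
```

With args = {"sh", "-c", "/bin/true; /bin/true", NULL}, the shell forks at least one child, so trace_tree should report two or more pids. A new child's stop can be reported before its parent's fork event, which is why "first time this pid is seen" rather than the event itself decides which SIGSTOP to suppress.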

Hard Extensions

  1. Modify syscalls: Change arguments or return values
  2. Syscall injection: Inject syscalls into running process
  3. Compare with seccomp: Implement filtering with BPF

9. Real-World Connections

How strace Actually Works

The real strace is much more sophisticated:

  • Supports many architectures (ARM, RISC-V, etc.)
  • Decodes hundreds of syscalls with full argument types
  • Handles all ptrace events
  • Can attach to running processes

Study its source code at https://github.com/strace/strace

Industry Usage

  • Debugging: “Why does my program hang?” - strace shows it’s waiting on read()
  • Security auditing: What files does this binary access?
  • Performance analysis: Which syscalls are slow?
  • Reverse engineering: Understanding unknown programs

10. Resources

Essential Documentation

  • man 2 ptrace - The ptrace system call
  • man 2 syscall - Syscall calling convention
  • /usr/include/asm/unistd_64.h - Syscall numbers (on Debian/Ubuntu: /usr/include/x86_64-linux-gnu/asm/unistd_64.h)
  • /usr/include/sys/user.h - user_regs_struct definition

Code References

  • strace source: https://github.com/strace/strace
  • syscall tables: arch/x86/entry/syscalls/syscall_64.tbl in kernel source


11. Self-Assessment Checklist

Before moving to the next project, verify:

  • I can explain what happens when syscall instruction executes
  • I understand the ptrace PTRACE_SYSCALL mechanism
  • I know which registers hold syscall arguments on x86-64
  • I can read data from another process’s memory
  • I understand why ptrace stops twice per syscall
  • My tracer handles simple programs (ls, cat, echo)
  • I can decode at least open(), read(), write(), close()
  • My tracer handles errors gracefully (tracee exits, exec fails)

12. Submission / Completion Criteria

Your project is complete when:

  1. Basic tracing works for simple programs
  2. Syscall names are decoded (at least 20 common syscalls)
  3. String arguments are read and displayed
  4. Return values show both success and error cases
  5. Summary mode counts syscalls correctly
  6. The tracer handles tracee crashes and exits gracefully

Verification Commands

# These should all work:
./mytrace /bin/true
./mytrace ls /tmp
./mytrace cat /etc/hostname
./mytrace -c ls /

# Output should be similar to (not identical):
strace ls /tmp
strace -c ls /

Next Project: P03 - Custom Kernel Build - Build and boot your own kernel to establish your development workflow.