Project 1: Cross-Platform Sandboxed Service
Build a file-watching daemon with native sandboxing on each OS, demonstrating the fundamental differences between Unix security models.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Advanced |
| Time Estimate | 2-3 weeks |
| Languages | C |
| OSes | OpenBSD, FreeBSD, Linux |
| Prerequisites | C programming, basic Unix syscalls |
| Key Topics | pledge/unveil, Capsicum, seccomp-bpf, file watching |
| Portfolio Value | High - demonstrates deep systems knowledge |
1. Learning Objectives
By completing this project, you will:
- Master three distinct Unix security models - Understand pledge/unveil (OpenBSD), Capsicum (FreeBSD), and seccomp-bpf (Linux) at an implementation level, not just conceptually.
- Understand the syscall security boundary - Learn why the kernel-userspace boundary is the natural enforcement point for security policies, and how each OS exploits this differently.
- Implement file system event monitoring - Build working implementations using inotify (Linux) and kqueue EVFILT_VNODE (BSD), understanding their fundamental architectural differences.
- Write portable systems code - Create a single codebase with conditional compilation that cleanly handles OS-specific APIs without becoming unmaintainable.
- Apply the principle of least privilege - Reduce your daemon’s attack surface to the absolute minimum required for its function, demonstrating this concretely on each platform.
- Debug sandbox violations - Learn to interpret SIGABRT (pledge), ECAPMODE (Capsicum), and SIGSYS (seccomp) errors, building intuition for what went wrong.
- Evaluate security trade-offs - Make informed decisions about which security model is appropriate for different use cases, backed by hands-on experience.
2. Theoretical Foundation
2.1 Core Concepts
System Calls as the Security Boundary
Every interaction between a user-space program and the kernel happens through system calls. When you call open(), read(), or socket(), you’re not just invoking a function–you’re crossing the most fundamental boundary in a Unix system.
User Space Kernel Space
┌────────────────────────────┐ ┌────────────────────────────┐
│ │ │ │
│ Your Application │ │ Kernel │
│ │ │ │
│ fd = open("/etc/passwd") │ │ - Validate arguments │
│ │ │ │ - Check permissions │
│ │ syscall │ │ - Allocate resources │
│ │ instruction │ │ - Return result │
│ │ │ │ │
│ └──────────────────┼─────► │
│ │ │ SECURITY ENFORCEMENT │
│ (CPU mode changes: │ │ HAPPENS HERE │
│ user → kernel) │ │ │
│ │ │ │
└────────────────────────────┘ └────────────────────────────┘
This is why all three security models (pledge, Capsicum, seccomp) operate at the syscall level–it’s the only point where the kernel has complete control and visibility.
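Key insight: A library function like fopen() eventually issues an open-family syscall (open() or openat(); glibc, for example, implements open() with the openat syscall). If you block those at the syscall level, there’s no way around it: not through libc, not through any library. The flip side is that a syscall filter must cover every syscall that reaches the same functionality, not just the one whose name matches the libc function.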
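To make the boundary concrete, here is a minimal Linux-flavored sketch (an illustration, not part of the project code) that opens a path twice: once through the libc wrapper and once by invoking the syscall directly with syscall(2). The helper names are hypothetical. Both calls arrive at the same kernel entry point, which is why a policy enforced there covers them equally.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Path 1: the ordinary libc wrapper. */
int open_via_libc(const char *path)
{
    return open(path, O_RDONLY);
}

/* Path 2: bypass the wrapper and issue the openat syscall directly.
 * A sandbox that only hooked libc would miss this; a syscall-level
 * sandbox sees it exactly like the call above. */
int open_via_raw_syscall(const char *path)
{
    return (int)syscall(SYS_openat, AT_FDCWD, path, O_RDONLY);
}
```

Under a filter that denies openat, both helpers would fail in the same way, because neither can reach the filesystem without crossing into the kernel.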
The Principle of Least Privilege
A file watcher needs to:
- Open a directory for watching
- Read file system events
- Write to a log file
- Maybe read a config file
A file watcher does NOT need to:
- Open network connections
- Execute other programs
- Mount filesystems
- Access arbitrary files
- Use ptrace to debug other processes
The principle of least privilege says: remove every capability you don’t need. Each removed capability is one less thing an attacker can exploit if they compromise your daemon.
Full Privileges Minimal Privileges (Sandboxed)
┌──────────────────────┐ ┌──────────────────────────────┐
│ ☑ Read any file │ │ ☑ Read /var/log only │
│ ☑ Write any file │ │ ☑ Write to log file only │
│ ☑ Network access │ │ ☒ Network access │
│ ☑ Execute programs │ │ ☒ Execute programs │
│ ☑ Mount filesystems │ │ ☒ Mount filesystems │
│ ☑ Debug processes │ │ ☒ Debug processes │
│ ☑ Load kernel mods │ │ ☒ Load kernel mods │
└──────────────────────┘ └──────────────────────────────┘
Attack surface: HUGE Attack surface: TINY
2.2 The Three Security Models
Each Unix variant took a different approach to solving the same problem: how do we let applications voluntarily reduce their privileges?
OpenBSD: pledge/unveil (Promise-Based)
OpenBSD’s model is elegantly simple: tell the kernel what you’ll do, then prove you meant it.
pledge() restricts which categories of system calls a process can make:
#include <err.h>    // err()
#include <unistd.h> // pledge()
// "I promise I will only do stdio and read paths"
if (pledge("stdio rpath", NULL) == -1)
    err(1, "pledge");
// From this point forward:
// - I CAN: read, write, close, fstat, lseek, poll (stdio)
// - I CAN: open with O_RDONLY, stat, access, readlink (rpath)
// - I CANNOT: open with O_WRONLY, socket, fork, exec, etc.
Common pledge promises:
| Promise | What it allows |
|---|---|
| stdio | Standard I/O operations (read/write on existing fds) |
| rpath | Open files for reading, stat, readdir |
| wpath | Open files for writing |
| cpath | Create/delete files and directories |
| flock | File locking (flock, fcntl F_SETLK) |
| inet | Internet socket operations |
| unix | Unix domain socket operations |
| dns | DNS resolution through the system resolver |
| proc | Fork, kill, wait |
| exec | Execve (execute programs) |
unveil() restricts which filesystem paths are visible:
#include <err.h>    // err()
#include <unistd.h> // unveil()
// "I can only see /var/log for reading and writing"
if (unveil("/var/log", "rw") == -1)
    err(1, "unveil");
// "I can only read /etc/myapp.conf"
if (unveil("/etc/myapp.conf", "r") == -1)
    err(1, "unveil");
// "Lock it down - no more unveil calls allowed"
if (unveil(NULL, NULL) == -1)
    err(1, "unveil");
// Now: /etc/passwd doesn't exist as far as this process knows
// open("/etc/passwd", O_RDONLY) → ENOENT (not EPERM!)
Philosophy: Simplicity enables adoption. OpenBSD has pledge/unveil’d over 90% of their base system because it’s so easy to use.
Violation behavior: If you violate a pledge, the kernel sends SIGABRT. There’s no catching this and no recovery; your process dies immediately. This is intentional: a violation means either your promise list has a bug or your code has been compromised, and in both cases stopping at once is the safe response.
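One ordering detail is easy to get wrong when combining the two calls: unveil() is itself gated by a pledge promise named "unveil", so a process that pledges without that promise can no longer call unveil(). The sketch below shows one safe order (unveil, lock, then pledge). The function name apply_openbsd_sandbox and the "rwc" permission string are assumptions chosen for illustration; on other systems the function is a no-op so the file still compiles.

```c
#include <stddef.h>
#ifdef __OpenBSD__
#include <unistd.h>
#endif

/* Returns 0 on success, -1 on failure (errno set by the failing call). */
int apply_openbsd_sandbox(const char *watch_path)
{
#ifdef __OpenBSD__
    /* 1. Reveal only the watched directory: read, write, create. */
    if (unveil(watch_path, "rwc") == -1)
        return -1;
    /* 2. Lock the unveil list so nothing can be added later. */
    if (unveil(NULL, NULL) == -1)
        return -1;
    /* 3. Only now drop to the final promise set. It omits "unveil",
     *    so calling unveil() after this point would be fatal. */
    if (pledge("stdio rpath wpath cpath", NULL) == -1)
        return -1;
    return 0;
#else
    (void)watch_path; /* no pledge/unveil here: nothing to apply */
    return 0;
#endif
}
```

Returning an error instead of calling err() lets the caller decide whether running without a sandbox is acceptable; for a security-focused daemon, exiting is usually the better choice.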
FreeBSD: Capsicum (Capability-Based)
Capsicum takes a different approach: capabilities are bound to file descriptors, and you can enter a mode where global namespaces disappear.
#include <sys/capsicum.h>
#include <sys/event.h> // kqueue()
#include <err.h>       // err()
#include <fcntl.h>     // open(), O_* flags
// Step 1: Open all file descriptors you'll need BEFORE sandboxing
int dir_fd = open("/var/log", O_RDONLY | O_DIRECTORY);
int log_fd = open("/var/log/events.log", O_WRONLY | O_APPEND | O_CREAT, 0644);
int kq = kqueue();
// Step 2: Limit what operations are allowed on each fd
cap_rights_t rights;
// dir_fd can only: read entries, receive events, lookup names
cap_rights_init(&rights, CAP_READ, CAP_EVENT, CAP_LOOKUP);
cap_rights_limit(dir_fd, &rights);
// log_fd can only: write and seek
cap_rights_init(&rights, CAP_WRITE, CAP_SEEK);
cap_rights_limit(log_fd, &rights);
// Step 3: Enter capability mode - NO WAY BACK
if (cap_enter() == -1)
    err(1, "cap_enter");
// Now: open("/etc/passwd", O_RDONLY) → ECAPMODE
// The global namespace is GONE. You can only use your existing fds.
Key Capsicum concepts:
| Concept | Meaning |
|---|---|
| cap_enter() | Enter capability mode (irreversible) |
| cap_rights_limit() | Restrict operations on a file descriptor |
| CAP_READ | Can read from this fd |
| CAP_WRITE | Can write to this fd |
| CAP_EVENT | Can use with kqueue/poll |
| CAP_LOOKUP | Can use openat() relative to this fd |
Philosophy: File descriptors are tokens of authority. By entering capability mode, you lose the ability to acquire new tokens (no more open() with paths)–you can only use what you already have.
Violation behavior: After cap_enter(), syscalls that try to access the global namespace (like open("/path")) return -1 with errno = ECAPMODE. This is a graceful failure, not a crash.
The restructuring requirement: Capsicum requires you to think about your program differently. You must open all files BEFORE entering capability mode, then work only with those file descriptors. This is more work than pledge, but it’s also more precise.
Linux: seccomp-bpf (Filter-Based)
Linux’s approach is the most flexible and the most complex: you write a BPF program that filters every syscall.
#include <seccomp.h> // libseccomp wrapper
// Initialize: default action is to KILL the process
scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_KILL);
// Whitelist the syscalls we need
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(read), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(write), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(close), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(fstat), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(inotify_init1), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(inotify_add_watch), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(inotify_rm_watch), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(exit_group), 0);
// ... many more syscalls needed
// Load the filter into the kernel
if (seccomp_load(ctx) < 0)
err(1, "seccomp_load");
seccomp_release(ctx);
// Now: socket() → SIGSYS (process killed)
Without libseccomp (raw BPF):
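#include <stddef.h>       // offsetof
#include <linux/seccomp.h>
#include <linux/filter.h>
#include <linux/audit.h>
#include <sys/prctl.h>
#include <sys/syscall.h>  // __NR_* syscall numbers
struct sock_filter filter[] = {
    // Check the architecture first: syscall numbers differ per ABI
    // (this example assumes x86-64)
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
             offsetof(struct seccomp_data, arch)),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_X86_64, 1, 0),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
    // Load syscall number
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
             offsetof(struct seccomp_data, nr)),
    // Allow read (syscall 0 on x86-64)
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_read, 0, 1),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    // Allow write (syscall 1 on x86-64)
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_write, 0, 1),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    // ... many more rules
    // Default: kill the process
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
};
struct sock_fprog prog = {
    .len = sizeof(filter) / sizeof(filter[0]),
    .filter = filter,
};
// Enable seccomp
if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == -1) // Required first
    err(1, "PR_SET_NO_NEW_PRIVS");
if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) == -1)
    err(1, "PR_SET_SECCOMP");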
Philosophy: Maximum flexibility. You can filter on syscall number, syscall arguments (with limitations), and decide per-syscall what happens (KILL, ERRNO, ALLOW, TRACE, etc.).
The path restriction problem: Unlike unveil or Capsicum’s capability mode, seccomp cannot restrict which files can be opened. A seccomp filter sees only the syscall number and the raw argument values. A path argument is just a pointer, and the filter cannot dereference userspace memory to inspect the string it points to. Even if it could, another thread could rewrite the string between the check and the kernel’s own read of it.
Violation behavior: Configurable per-syscall:
- SECCOMP_RET_KILL - Send SIGSYS, terminate process
- SECCOMP_RET_ERRNO - Return -1, set errno to specified value
- SECCOMP_RET_TRACE - Notify a tracer (for debugging)
- SECCOMP_RET_ALLOW - Allow the syscall
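The difference between the kill and errno actions is easiest to see by running one. The following sketch, written against the raw prctl() interface so it needs no libseccomp, forks a child, installs a filter that makes socket() fail with EPERM, and reports whether the child observed that error. It is a demonstration of SECCOMP_RET_ERRNO only: it allows every other syscall, which is the opposite of the allowlist a real sandbox should use. The function names and the FW_AUDIT_ARCH macro are hypothetical, and only x86-64 and AArch64 are handled.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#if defined(__x86_64__)
#define FW_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define FW_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "add the AUDIT_ARCH_* value for this architecture"
#endif

/* Install a filter: socket() fails with EPERM, everything else runs. */
static int install_socket_denial(void)
{
    struct sock_filter filter[] = {
        /* Kill on an unexpected ABI; syscall numbers would not match. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, FW_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
        /* socket -> ERRNO(EPERM); any other syscall -> ALLOW. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_socket, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == -1)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Returns 1 if a filtered child saw socket() fail with EPERM,
 * 0 if it did not, -1 if the check itself could not run. */
int socket_is_denied_in_child(void)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        if (install_socket_denial() == -1)
            _exit(2);
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        _exit((fd == -1 && errno == EPERM) ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 1 : 0;
}
```

Swapping the ERRNO return for SECCOMP_RET_KILL would make the child die from SIGSYS instead of exiting, which the parent would see through WIFSIGNALED rather than WIFEXITED.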
2.3 File System Event Notification
Your file watcher needs to know when files change. Linux and BSD have fundamentally different APIs for this.
Linux: inotify
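#include <stdio.h>
#include <sys/inotify.h>
#include <unistd.h>
// Create an inotify instance
int inotify_fd = inotify_init1(IN_NONBLOCK);
// Watch a directory for specific events
int wd = inotify_add_watch(inotify_fd, "/var/log",
    IN_CREATE | IN_DELETE | IN_MODIFY | IN_MOVED_FROM | IN_MOVED_TO);
// Read events; the buffer must be suitably aligned for struct inotify_event
char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
ssize_t len = read(inotify_fd, buf, sizeof(buf));
// With IN_NONBLOCK, read() returns -1 (errno EAGAIN) when nothing is pending,
// so poll() on inotify_fd first or retry later
// Parse events (skipped entirely when len <= 0)
for (char *ptr = buf; len > 0 && ptr < buf + len; ) {
    const struct inotify_event *event = (const struct inotify_event *)ptr;
    if (event->mask & IN_CREATE)
        printf("CREATED: %s\n", event->name);
    if (event->mask & IN_DELETE)
        printf("DELETED: %s\n", event->name);
    if (event->mask & IN_MODIFY)
        printf("MODIFIED: %s\n", event->name);
    // Events are variable-length: header plus event->len bytes of name
    ptr += sizeof(struct inotify_event) + event->len;
}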
inotify characteristics:
- Event-driven (you read events from a file descriptor)
- Events include the filename that changed
- Watch descriptor identifies which watch fired
- Scales well for watching many files
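Since the project normalizes events across platforms, it helps to isolate the inotify-specific bit masks in one small function. The sketch below maps an inotify mask to a platform-neutral event type. The function name classify_inotify_mask is an assumption, as is the choice to treat both halves of a move (IN_MOVED_FROM and IN_MOVED_TO) as a rename.

```c
#include <stdint.h>
#include <sys/inotify.h>

/* Platform-neutral event types used by the rest of the daemon. */
enum event_type {
    EVENT_CREATED,
    EVENT_MODIFIED,
    EVENT_DELETED,
    EVENT_RENAMED
};

/* Translate an inotify mask into an event type.
 * Returns 0 and fills *out on success, -1 for masks this sketch
 * does not report (for example IN_Q_OVERFLOW or IN_IGNORED). */
int classify_inotify_mask(uint32_t mask, enum event_type *out)
{
    if (mask & IN_CREATE) {
        *out = EVENT_CREATED;
        return 0;
    }
    if (mask & IN_DELETE) {
        *out = EVENT_DELETED;
        return 0;
    }
    if (mask & IN_MODIFY) {
        *out = EVENT_MODIFIED;
        return 0;
    }
    if (mask & (IN_MOVED_FROM | IN_MOVED_TO)) {
        *out = EVENT_RENAMED;
        return 0;
    }
    return -1;
}
```

Keeping this translation separate means the event loop and logger never need to include sys/inotify.h, which keeps the #ifdef surface small.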
BSD: kqueue with EVFILT_VNODE
#include <sys/event.h>
// Create a kqueue
int kq = kqueue();
// Open the directory (you need an fd)
int dir_fd = open("/var/log", O_RDONLY | O_DIRECTORY);
// Set up a vnode filter
struct kevent ev;
EV_SET(&ev, dir_fd, EVFILT_VNODE,
EV_ADD | EV_ENABLE | EV_CLEAR,
NOTE_WRITE | NOTE_DELETE | NOTE_RENAME | NOTE_EXTEND,
0, NULL);
kevent(kq, &ev, 1, NULL, 0, NULL);
// Wait for events
struct kevent events[10];
int n = kevent(kq, NULL, 0, events, 10, NULL);
for (int i = 0; i < n; i++) {
if (events[i].fflags & NOTE_WRITE)
printf("Directory modified (file created/deleted/changed)\n");
if (events[i].fflags & NOTE_DELETE)
printf("Directory deleted\n");
}
kqueue characteristics:
- Unified interface for multiple event types (files, sockets, signals, timers)
- Works on file descriptors, not paths
- NOTE_WRITE on a directory fires when contents change
- Less granular than inotify (you know something changed, not always what)
2.4 Why This Matters
For security professionals: Understanding these models helps you evaluate which is appropriate for different threat models. pledge is great for simple daemons; Capsicum is ideal for complex applications that need fine-grained control; seccomp is necessary when you need to filter syscall arguments.
For systems programmers: Writing portable security-hardened code requires understanding the idioms of each platform. Code that works beautifully on OpenBSD might need significant restructuring for Capsicum.
For interviewers: These concepts come up in security-focused interviews. Understanding the trade-offs between simplicity (pledge) and flexibility (seccomp) shows systems maturity.
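For understanding containers: Docker uses seccomp-bpf, namespaces, and cgroups. FreeBSD jails are a separate, older mechanism that isolates a whole environment from the outside, whereas Capsicum lets a single process sandbox itself from the inside. Understanding these primitives helps you understand what containers actually do.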
2.5 Historical Context
OpenBSD pledge (2015): Theo de Raadt and the OpenBSD team introduced pledge as a successor to systrace. The design goal was simplicity: if it’s hard to use, developers won’t use it. By 2018, over 90% of OpenBSD’s base system was pledge’d. unveil() followed in 2018, adding path-based restrictions that pledge lacked.
FreeBSD Capsicum (2010): Robert Watson, Jonathan Anderson, Ben Laurie, and Kris Kennaway introduced Capsicum in a 2010 USENIX Security paper, a collaboration between the University of Cambridge and Google. It builds on a long line of capability-system research. The paper used Chromium as a case study, sandboxing its renderer processes with Capsicum.
Linux seccomp (2005) / seccomp-bpf (2012): Original seccomp (2005) was extremely limited–processes could only use read, write, exit, and sigreturn. Will Drewry added BPF filtering in 2012, transforming seccomp into a general-purpose sandbox. Chrome’s Linux sandbox became the flagship use case.
2.6 Common Misconceptions
Misconception 1: “seccomp is more secure because it’s more powerful”
Reality: Complexity often reduces security. A complex seccomp filter is harder to audit and more likely to have bugs. OpenBSD’s pledge has been applied to vastly more software precisely because it’s simple.
Misconception 2: “I can use seccomp to restrict file access like unveil”
Reality: seccomp filters syscall numbers and raw argument values, but arguments like file paths are just pointers to userspace memory, and a seccomp BPF filter cannot dereference them. Path-based restrictions require additional mechanisms (namespaces, bind mounts, or Landlock on newer kernels).
Misconception 3: “Capsicum requires rewriting my whole program”
Reality: Capsicum requires thinking about capability acquisition upfront, but you don’t need to rewrite everything. The pattern is: open files before cap_enter(), then use those file descriptors. Many programs already work this way.
Misconception 4: “pledge violations are recoverable”
Reality: pledge violations result in SIGABRT, which cannot be caught, blocked, or ignored (it’s forced by the kernel). Your process dies. This is a feature: if you violate a pledge, your code is either buggy or compromised, and neither should keep running.
3. Project Specification
3.1 What You Will Build
A file-watching daemon called filewatcher that:
- Monitors a specified directory for file system changes (create, modify, delete)
- Logs events to stdout or a log file with timestamps
- Runs under the strictest sandbox available on each OS
- Demonstrates sandbox enforcement by attempting (and being blocked from) forbidden operations
- Compiles and runs on OpenBSD, FreeBSD, and Linux from a single codebase
3.2 Functional Requirements
| ID | Requirement |
|---|---|
| FR1 | Accept a directory path as command-line argument |
| FR2 | Detect file creation events and log “CREATED: filename” |
| FR3 | Detect file modification events and log “MODIFIED: filename” |
| FR4 | Detect file deletion events and log “DELETED: filename” |
| FR5 | Include timestamps in ISO 8601 format |
| FR6 | Run continuously until SIGINT/SIGTERM |
| FR7 | Apply OS-native sandboxing before entering main loop |
| FR8 | Provide a --test-sandbox flag that attempts forbidden operations |
3.3 Non-Functional Requirements
| ID | Requirement |
|---|---|
| NFR1 | Single codebase with #ifdef for OS-specific code |
| NFR2 | No external dependencies beyond libc (except libseccomp on Linux, optionally) |
| NFR3 | Clean exit on SIGINT/SIGTERM |
| NFR4 | Memory-safe: no leaks, no buffer overflows |
| NFR5 | Documented: man page or README explaining sandbox configuration |
3.4 Example Usage / Output
On OpenBSD:
$ ./filewatcher /var/log
[filewatcher] OpenBSD mode: pledge("stdio rpath wpath cpath") + unveil
[filewatcher] Revealed paths: /var/log (rw)
[filewatcher] Sandbox active. Watching /var/log...
[2024-12-29T10:15:32] CREATED: messages.1
[2024-12-29T10:15:35] MODIFIED: authlog
[2024-12-29T10:15:40] DELETED: old.log
^C
[filewatcher] Received SIGINT, shutting down.
$ ./filewatcher --test-sandbox /var/log
[filewatcher] OpenBSD mode: pledge("stdio rpath wpath cpath") + unveil
[filewatcher] Testing sandbox: attempting socket()...
Abort trap (core dumped)
On FreeBSD:
$ ./filewatcher /var/log
[filewatcher] FreeBSD mode: Capsicum capability mode
[filewatcher] FD rights: dir_fd=CAP_READ|CAP_EVENT, log=CAP_WRITE
[filewatcher] cap_enter() succeeded. Global namespace revoked.
[filewatcher] Sandbox active. Watching /var/log...
[2024-12-29T10:15:32] CREATED: messages.1
^C
[filewatcher] Received SIGINT, shutting down.
$ ./filewatcher --test-sandbox /var/log
[filewatcher] FreeBSD mode: Capsicum capability mode
[filewatcher] Testing sandbox: attempting open("/etc/passwd")...
[filewatcher] BLOCKED: open() returned ECAPMODE (94) - expected!
[filewatcher] Sandbox working correctly.
On Linux:
$ ./filewatcher /var/log
[filewatcher] Linux mode: seccomp-bpf
[filewatcher] Allowed syscalls: read, write, close, inotify_*, exit_group, ...
[filewatcher] seccomp filter installed.
[filewatcher] Sandbox active. Watching /var/log...
[2024-12-29T10:15:32] CREATED: messages.1
^C
[filewatcher] Received SIGINT, shutting down.
$ ./filewatcher --test-sandbox /var/log
[filewatcher] Linux mode: seccomp-bpf
[filewatcher] Testing sandbox: attempting socket()...
Bad system call (core dumped)
3.5 Real World Outcome
When you complete this project, you will have:
- A working security-hardened daemon that you can actually use to monitor directories on any Unix-like system.
- A portfolio piece demonstrating deep systems knowledge. Interviewers will immediately recognize the sophistication of cross-platform sandbox implementation.
- Visceral understanding of why OpenBSD’s model is simpler, why Capsicum requires restructuring, and why seccomp is powerful but complex.
- Ready answers for interview questions like “How would you sandbox a daemon?” or “What are the trade-offs between seccomp and Capsicum?”
- A template you can reuse for other security-sensitive daemons you build.
4. Solution Architecture
4.1 High-Level Design
┌─────────────────────────────────────────────────────────────────────────────┐
│ filewatcher daemon │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌───────────────┐ ┌───────────────┐ ┌────────────────────────────┐ │
│ │ main() │───▶│ parse_args() │───▶│ fs_init() │ │
│ │ │ │ │ │ (open dirs, fds) │ │
│ └───────────────┘ └───────────────┘ └─────────────┬──────────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────────────────────────┐ │
│ │ sandbox_init() │ │
│ │ ┌────────────────┬────────────────┬─────────────────────────────────┐ │ │
│ │ │ OpenBSD │ FreeBSD │ Linux │ │ │
│ │ │ pledge() │ cap_rights │ seccomp_init() │ │ │
│ │ │ unveil() │ cap_enter() │ seccomp_rule_add() │ │ │
│ │ └────────────────┴────────────────┴─────────────────────────────────┘ │ │
│ └────────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────────────────────────┐ │
│ │ event_loop() │ │
│ │ ┌────────────────────────────────┬──────────────────────────────────┐ │ │
│ │ │ Linux: inotify │ BSD: kqueue EVFILT_VNODE │ │ │
│ │ │ read(inotify_fd, ...) │ kevent(kq, ...) │ │ │
│ │ └────────────────────────────────┴──────────────────────────────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ log_event(type, filename) │ │
│ └────────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
4.2 Key Components
| Component | Responsibility |
|---|---|
| main() | Entry point, signal handling, orchestration |
| parse_args() | Command-line argument parsing |
| fs_init() | Open directory for watching, initialize inotify/kqueue |
| sandbox_init() | Apply OS-specific sandboxing |
| event_loop() | Wait for and process file system events |
| log_event() | Format and output events with timestamps |
| sandbox_test() | Optional: attempt forbidden operations |
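As one possible shape for parse_args(), the sketch below uses getopt_long(), which is available on all three target systems. The --test-sandbox flag and the positional directory argument come from the requirements; the -l/--log option for choosing a log file and the long form --verbose are assumptions added for illustration, since the specification does not name them.

```c
#include <getopt.h>
#include <stddef.h>
#include <string.h>

struct config {
    const char *watch_path;  /* Directory to watch */
    const char *log_path;    /* Log file path (NULL = stdout) */
    int test_sandbox;        /* --test-sandbox flag */
    int verbose;             /* -v flag */
};

/* Fill *cfg from argv. Returns 0 on success, -1 on a usage error. */
int parse_args(int argc, char **argv, struct config *cfg)
{
    static const struct option long_opts[] = {
        {"test-sandbox", no_argument,       NULL, 't'},
        {"log",          required_argument, NULL, 'l'}, /* assumed option */
        {"verbose",      no_argument,       NULL, 'v'}, /* assumed long form */
        {NULL, 0, NULL, 0}
    };
    int c;

    memset(cfg, 0, sizeof(*cfg));
    optind = 1;  /* allow the function to be called more than once */

    while ((c = getopt_long(argc, argv, "l:v", long_opts, NULL)) != -1) {
        switch (c) {
        case 't': cfg->test_sandbox = 1;  break;
        case 'l': cfg->log_path = optarg; break;
        case 'v': cfg->verbose = 1;       break;
        default:  return -1;              /* unknown option */
        }
    }
    /* Exactly one positional argument: the directory to watch. */
    if (optind + 1 != argc)
        return -1;
    cfg->watch_path = argv[optind];
    return 0;
}
```

Parsing has to finish before the sandbox is applied, because the paths it yields are the ones that get unveiled or opened ahead of cap_enter().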
4.3 Data Structures
// Configuration from command line
struct config {
const char *watch_path; // Directory to watch
const char *log_path; // Log file path (NULL = stdout)
int test_sandbox; // --test-sandbox flag
int verbose; // -v flag
};
// File system watcher state (OS-specific)
#ifdef __linux__
struct fs_watcher {
int inotify_fd;
int watch_descriptor;
};
#else // BSD
struct fs_watcher {
int kqueue_fd;
int dir_fd;
};
#endif
// Event types (normalized across platforms)
enum event_type {
EVENT_CREATED,
EVENT_MODIFIED,
EVENT_DELETED,
EVENT_RENAMED
};
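With the normalized event type in hand, log_event() reduces to formatting one line. The sketch below builds that line into a caller-supplied buffer so it can be unit-tested without touching a file descriptor. The function name format_event is hypothetical, and the choice of UTC with a trailing Z is an assumption; the sample output elsewhere in this guide shows timestamps without a zone designator.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

enum event_type {
    EVENT_CREATED,
    EVENT_MODIFIED,
    EVENT_DELETED,
    EVENT_RENAMED
};

static const char *event_name(enum event_type type)
{
    switch (type) {
    case EVENT_CREATED:  return "CREATED";
    case EVENT_MODIFIED: return "MODIFIED";
    case EVENT_DELETED:  return "DELETED";
    case EVENT_RENAMED:  return "RENAMED";
    }
    return "UNKNOWN";
}

/* Write "[<ISO 8601 UTC time>] <EVENT>: <filename>" into out.
 * Returns 0 on success, -1 on bad input or if out is too small. */
int format_event(char *out, size_t out_len, time_t when,
                 enum event_type type, const char *filename)
{
    struct tm tm;
    char stamp[32];

    if (filename == NULL || strlen(filename) == 0)
        return -1;
    if (gmtime_r(&when, &tm) == NULL)
        return -1;
    if (strftime(stamp, sizeof(stamp), "%Y-%m-%dT%H:%M:%SZ", &tm) == 0)
        return -1;

    int n = snprintf(out, out_len, "[%s] %s: %s",
                     stamp, event_name(type), filename);
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

Formatting into a buffer and then issuing a single write() keeps the set of syscalls used after sandboxing small and predictable, which matters most when building the seccomp allowlist.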
4.4 Algorithm Overview
1. Parse command-line arguments
- Extract watch_path
- Check for --test-sandbox flag
2. Initialize file system watcher (BEFORE sandboxing)
- Linux: inotify_init1(), inotify_add_watch()
- BSD: open(dir), kqueue(), kevent() to add filter
3. Open log file if specified (BEFORE sandboxing)
- This is critical for Capsicum!
4. Install signal handlers for SIGINT/SIGTERM
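5. Apply sandbox (point of no return)
- OpenBSD: unveil() + unveil(NULL, NULL), then pledge() (once you pledge without the "unveil" promise, further unveil() calls are fatal)
- FreeBSD: cap_rights_limit() on all fds, then cap_enter()
- Linux: build seccomp filter, seccomp_load()
6. If --test-sandbox: attempt forbidden operation; expect SIGABRT (OpenBSD), ECAPMODE (FreeBSD), or SIGSYS (Linux)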
7. Enter event loop
- Wait for events (read from inotify_fd or kevent())
- Parse events into normalized format
- Call log_event() for each
- Check for shutdown signal
8. Cleanup and exit
5. Implementation Guide
5.1 Development Environment Setup
You’ll need access to three operating systems. Options:
Option A: Virtual machines (recommended)
- VirtualBox or VMware with OpenBSD, FreeBSD, and Linux VMs
- Shared folder for source code
- Compile natively on each
Option B: Cross-compilation
- Develop on Linux
- Cross-compile for FreeBSD/OpenBSD (more complex)
Option C: Cloud instances
- DigitalOcean, Vultr, etc. offer FreeBSD VMs
- AWS has Amazon Linux; you can install OpenBSD on EC2
Required packages:
# OpenBSD (everything is in base)
# Nothing to install!
# FreeBSD
pkg install git # Everything else is in base
# Linux (Debian/Ubuntu)
apt install build-essential libseccomp-dev
# Linux (Fedora/RHEL)
dnf install gcc libseccomp-devel
5.2 Project Structure
filewatcher/
├── Makefile # OS-detection, proper CFLAGS
├── README.md # Usage documentation
├── filewatcher.c # Main source (single file for simplicity)
├── filewatcher.h # Shared declarations (optional)
├── sandbox_openbsd.c # OpenBSD-specific sandboxing
├── sandbox_freebsd.c # FreeBSD-specific sandboxing
├── sandbox_linux.c # Linux-specific sandboxing
├── fs_inotify.c # Linux inotify implementation
├── fs_kqueue.c # BSD kqueue implementation
└── tests/
├── test_basic.sh # Basic functionality tests
└── test_sandbox.sh # Sandbox enforcement tests
Alternative: single-file approach
For simplicity, you can put everything in one file with #ifdef blocks. This is common for small system utilities.
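Whichever layout you pick, the OS selection itself comes down to the compiler's predefined macros. The sketch below shows the dispatch pattern on a trivial function that reports which backend was compiled in; the function name is hypothetical, and the fallback branch is an assumption about how you might want unsupported systems to behave.

```c
#include <stddef.h>

/* Report which sandbox backend this binary was built with.
 * __OpenBSD__, __FreeBSD__ and __linux__ are predefined by the
 * compiler on the respective systems. */
const char *sandbox_backend_name(void)
{
#if defined(__OpenBSD__)
    return "pledge/unveil";
#elif defined(__FreeBSD__)
    return "Capsicum";
#elif defined(__linux__)
    return "seccomp-bpf";
#else
    return "none";  /* unsupported OS: no sandbox compiled in */
#endif
}
```

Keeping each #if chain confined to one small function, rather than scattering conditionals through the event loop, is what stops a single-file design from becoming unreadable.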
5.3 The Core Question You’re Answering
“Why do different Unix systems take such radically different approaches to application sandboxing, and what are the real-world trade-offs?”
This project forces you to confront a fundamental truth: security is a design philosophy, not just a feature list.
- OpenBSD’s pledge/unveil says “tell us what you need, we’ll kill you if you lie.”
- FreeBSD’s Capsicum says “capabilities are tokens on file descriptors.”
- Linux’s seccomp says “here’s a programmable filter–go wild.”
By implementing the same functionality on all three, you’ll viscerally understand why OpenBSD can sandbox their entire base system while Linux applications rarely use seccomp directly.
5.4 Concepts You Must Understand First
Before writing code, ensure you can answer these questions:
- System Calls
  - What’s the difference between open() (libc) and the open syscall?
  - How does the kernel know which process is making a syscall?
  - What happens in the CPU when a syscall is made?
  - Reference: Advanced Programming in the UNIX Environment (Stevens & Rago), Ch. 1-3
- Pledge Promises
  - What does “stdio” include? What about “rpath”?
  - Can you add promises after calling pledge()? (No.)
  - What happens if you call pledge() twice with different promises?
  - Reference: pledge(2) man page
- Capsicum Capabilities
  - What’s the difference between cap_rights_limit() and cap_enter()?
  - Can you exit capability mode? (No.)
  - What’s ECAPMODE?
  - Reference: Absolute FreeBSD (Michael W. Lucas), Ch. 8
- Seccomp BPF
  - What is BPF? Why is it used for syscall filtering?
  - What’s the difference between SECCOMP_RET_KILL and SECCOMP_RET_ERRNO?
  - Why must you call prctl(PR_SET_NO_NEW_PRIVS) first?
  - Reference: seccomp(2) man page
- inotify vs kqueue
  - How do you get the filename from an inotify event?
  - Does kqueue’s EVFILT_VNODE give you the filename?
  - Which is more efficient for watching many files?
  - Reference: inotify(7) and kqueue(2) man pages
5.5 Questions to Guide Your Design
Before implementing, think through:
Sandboxing strategy:
- What syscalls does your file watcher need? List them all.
- What syscalls should be blocked? Why?
- When exactly should you apply the sandbox?
Initialization order:
- What files must you open before cap_enter() on FreeBSD?
- Does order matter for pledge/unveil on OpenBSD?
- Do you need to call inotify_init() before seccomp?
Error handling:
- If pledge() fails, should you continue or exit?
- How do you handle a system without seccomp support?
- Should sandbox violations be logged? How, if you can’t write?
Testing:
- How do you verify the sandbox is working?
- What should happen when --test-sandbox is passed?
- How do you test on a system that doesn’t have these features?
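One way to approach the testing questions is to run each forbidden operation in a child process, so the parent survives to report how the child ended. The sketch below shows only that harness; all names are hypothetical. In real use the probe function would first apply the sandbox and then attempt the forbidden call. The two probes included here are stand-ins: one calls abort() to simulate a kill by SIGABRT, and one returns normally. The fork must happen before the parent sandboxes itself, since a sandboxed process is usually not allowed to fork.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run probe() in a child process.
 * Returns the number of the signal that killed the child,
 * 0 if the child exited on its own, or -1 if the harness failed. */
int run_probe_in_child(void (*probe)(void))
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        probe();     /* apply sandbox, then try something forbidden */
        _exit(0);    /* reached only if the probe was not killed */
    }

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    if (WIFSIGNALED(status))
        return WTERMSIG(status);
    return 0;
}

/* Stand-in for a probe that the kernel kills (as pledge would). */
void probe_that_aborts(void)
{
    abort();
}

/* Stand-in for a probe that is blocked gracefully (as Capsicum would). */
void probe_that_returns(void)
{
}
```

A test script can then assert on the result per platform: SIGABRT on OpenBSD, SIGSYS on Linux with a kill action, and a normal exit on FreeBSD where the probe itself checks for ECAPMODE.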
5.6 Thinking Exercise
Before coding, trace this scenario by hand on each OS:
Your file watcher needs to:
1. Open /var/log for watching
2. Read file system events
3. Write events to /var/log/filewatcher.log
4. Send an alert if a certain file is created (stretch goal)
For each OS, answer:
- What syscalls are needed for each step?
- What pledges/capabilities/seccomp rules are required?
- In what order must things happen?
- What happens if you try to do step 4 (network) without the right permissions?
Map it out:
OpenBSD:
pledge("stdio rpath wpath cpath", NULL) // Before or after opening files?
unveil("/var/log", "rw") // Before or after pledge?
unveil(NULL, NULL) // When?
FreeBSD:
int dir_fd = open("/var/log", ...) // Must be before cap_enter()
int log_fd = open("filewatcher.log", ...)// Must be before cap_enter()
cap_rights_limit(dir_fd, ...) // What rights?
cap_rights_limit(log_fd, ...) // What rights?
cap_enter() // Point of no return
Linux:
int inotify_fd = inotify_init1(...) // Before or after seccomp?
seccomp_rule_add(ctx, ..., read, ...) // Which syscalls exactly?
seccomp_load(ctx) // Point of no return
5.7 Hints in Layers
Hint 1: Start with file watching (no sandbox)
Get the core functionality working first. Don’t even think about sandboxing until you can watch a directory and print events.
// Linux
int fd = inotify_init1(IN_NONBLOCK);
int wd = inotify_add_watch(fd, path, IN_CREATE | IN_MODIFY | IN_DELETE);
// Read and print events in a loop
// BSD
int kq = kqueue();
int dir_fd = open(path, O_RDONLY | O_DIRECTORY);
struct kevent ev;
EV_SET(&ev, dir_fd, EVFILT_VNODE, EV_ADD | EV_ENABLE | EV_CLEAR,
NOTE_WRITE, 0, NULL);
kevent(kq, &ev, 1, NULL, 0, NULL);
// Wait and print events
Hint 2: Add OpenBSD sandbox first (simplest)
OpenBSD is the easiest to sandbox. Get it working there before tackling the others.
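#ifdef __OpenBSD__
// After opening the directory but before the main loop.
// unveil() must come first: once pledge() runs without the "unveil"
// promise, any further unveil() call is a fatal violation.
if (unveil(watch_path, "rw") == -1)
    err(1, "unveil");
if (unveil(NULL, NULL) == -1)
    err(1, "unveil lock");
if (pledge("stdio rpath wpath", NULL) == -1)
    err(1, "pledge");
#endif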
Hint 3: FreeBSD requires restructuring your initialization
The key insight: you must open ALL file descriptors before calling cap_enter(). Plan your initialization sequence carefully.
#ifdef __FreeBSD__
// Step 1: Open everything FIRST
int dir_fd = open(watch_path, O_RDONLY | O_DIRECTORY);
int log_fd = open(log_path, O_WRONLY | O_APPEND | O_CREAT, 0644);
int kq = kqueue();
// Step 2: Limit rights on each fd
cap_rights_t rights;
cap_rights_init(&rights, CAP_READ, CAP_EVENT, CAP_FCNTL);
if (cap_rights_limit(dir_fd, &rights) == -1)
err(1, "cap_rights_limit dir");
cap_rights_init(&rights, CAP_WRITE, CAP_SEEK);
if (cap_rights_limit(log_fd, &rights) == -1)
err(1, "cap_rights_limit log");
// Step 3: Enter capability mode
if (cap_enter() == -1)
err(1, "cap_enter");
// NO WAY BACK. open() will now fail with ECAPMODE.
#endif
Hint 4: Linux seccomp requires enumerating every syscall
This is the hard part. Use strace to find which syscalls your program actually uses, then whitelist them.
# Find which syscalls your program uses
strace -c ./filewatcher /var/log 2>&1 | head -30
#ifdef __linux__
#include <seccomp.h>
scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_KILL);
// Whitelist what you need (this list is incomplete!)
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(read), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(write), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(close), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(fstat), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(inotify_init1), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(inotify_add_watch), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(exit_group), 0);
// ... you'll need more: mmap, mprotect, brk (for malloc), etc.
if (seccomp_load(ctx) < 0)
err(1, "seccomp_load");
seccomp_release(ctx);
#endif
5.8 The Interview Questions They’ll Ask
Prepare to answer these confidently:
-
“What’s the difference between pledge, Capsicum, and seccomp?”
- pledge: Promise-based, operates on categories of syscalls, simple strings
- Capsicum: Capability-based, operates on file descriptors, fine-grained
- seccomp: Filter-based, operates on individual syscalls with BPF, most flexible
-
“Why did OpenBSD choose the pledge model?”
- Simplicity enables adoption–90%+ of OpenBSD base is pledge’d
- Human-auditable: you can read “stdio rpath” and understand it
- Fail-closed: violation = death, no recovery, no second chances
-
“What are the limitations of each approach?”
- pledge: Coarse-grained (can’t say “only read /etc/passwd” without unveil)
- Capsicum: Requires restructuring to pre-open file descriptors
- seccomp: Hard to filter syscall arguments (paths, addresses)
-
“How would you sandbox a web browser?”
- Chromium uses seccomp-bpf on Linux (plus namespaces)
- Capsicum was designed with Chromium in mind (FreeBSD port exists)
- Multi-process architecture: render processes get minimal permissions
- “What’s the attack surface reduction of each model?”
- pledge: Reduces to promised syscall categories
- Capsicum: Removes global namespace entirely
- seccomp: Reduces to explicit syscall whitelist
- “Can you escape these sandboxes?”
- All have had vulnerabilities (nothing is perfect)
- Complexity correlates with bugs (seccomp filters have had escapes)
- OpenBSD’s simplicity has security benefits
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| System calls fundamentals | Advanced Programming in the UNIX Environment (Stevens & Rago) | Ch. 1-3 |
| OpenBSD security | Absolute OpenBSD (Michael W. Lucas) | Security chapters |
| FreeBSD Capsicum | Absolute FreeBSD, 3rd Edition (Michael W. Lucas) | Ch. 8 |
| Linux seccomp | seccomp(2) and seccomp_rule_add(3) man pages (The Linux Programming Interface predates seccomp-bpf) | - |
| File watching (inotify) | The Linux Programming Interface (Michael Kerrisk) | Ch. 19 |
| BSD kqueue | The Design and Implementation of the FreeBSD Operating System (McKusick et al.) | Ch. 6 |
| Security principles | Mastering FreeBSD and OpenBSD Security (Hope, Potter & Korff) | Ch. 1-4 |
| BPF internals | BPF Performance Tools (Brendan Gregg) | Ch. 2 |
5.10 Implementation Phases
Phase 1: Core file watching (3-4 days)
Goal: Watch a directory and print events on any Unix system.
Tasks:
- Implement inotify wrapper for Linux
- Implement kqueue wrapper for BSD
- Create unified event_loop() that calls the right implementation
- Test on all three OSes
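One way to keep the unified event loop small is to translate each backend's flag bits into a shared event type as early as possible. The sketch below is only an illustration: `fw_event`, `fw_classify()`, and `fw_event_name()` are hypothetical names, not part of any API. It also shows one architectural difference: inotify names the affected entry and the operation, while a kqueue `EVFILT_VNODE` filter on a directory fd only reports that the directory itself changed.

```c
#include <stdint.h>

#if defined(__linux__)
#include <sys/inotify.h>
#else
#include <sys/types.h>
#include <sys/event.h>
#endif

/* Hypothetical OS-neutral event type shared by both backends. */
enum fw_event { FW_NONE, FW_CREATED, FW_MODIFIED, FW_DELETED };

/* Map raw backend flags (inotify mask or kevent fflags) to an event. */
enum fw_event fw_classify(uint32_t flags)
{
#if defined(__linux__)
    if (flags & IN_CREATE)
        return FW_CREATED;
    if (flags & IN_DELETE)
        return FW_DELETED;
    if (flags & (IN_MODIFY | IN_CLOSE_WRITE))
        return FW_MODIFIED;
#else
    /* On a directory fd, NOTE_WRITE only says "entries changed".
     * Telling a create from a delete requires rescanning the directory,
     * which this sketch does not attempt. */
    if (flags & NOTE_DELETE)
        return FW_DELETED;
    if (flags & (NOTE_WRITE | NOTE_EXTEND))
        return FW_MODIFIED;
#endif
    return FW_NONE;
}

/* Label used in log lines. */
const char *fw_event_name(enum fw_event ev)
{
    switch (ev) {
    case FW_CREATED:  return "CREATED";
    case FW_MODIFIED: return "MODIFIED";
    case FW_DELETED:  return "DELETED";
    default:          return "NONE";
    }
}
```

With this in place, `event_loop()` can call `fw_classify(ev->mask)` on Linux or `fw_classify(kev.fflags)` on the BSDs and hand the result to the same logging code.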
Phase 2: OpenBSD sandboxing (2-3 days)
Goal: Add pledge/unveil on OpenBSD.
Tasks:
- Add pledge() after initialization
- Add unveil() for watch path
- Test that normal operation works
- Test that forbidden operations fail (use --test-sandbox)
Phase 3: FreeBSD sandboxing (3-4 days)
Goal: Add Capsicum on FreeBSD.
Tasks:
- Restructure initialization to open all fds first
- Add cap_rights_limit() for each fd
- Add cap_enter()
- Test normal operation
- Test that forbidden operations return ECAPMODE
Phase 4: Linux sandboxing (4-5 days)
Goal: Add seccomp-bpf on Linux.
Tasks:
- Use strace to enumerate needed syscalls
- Build seccomp filter with libseccomp
- Test normal operation
- Test that forbidden syscalls cause SIGSYS
- Iterate on filter until daemon works reliably
Phase 5: Polish and documentation (2-3 days)
Goal: Production-ready code.
Tasks:
- Add proper signal handling
- Add logging with timestamps
- Write Makefile with OS detection
- Write README
- Clean up code, add comments
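For the timestamped logging task, a minimal sketch of a log-line formatter follows. It assumes UTC timestamps in ISO 8601 form and the `KIND: name` layout used in the expected output of section 6.1; `fw_format_event()` is a hypothetical helper name.

```c
#include <stdio.h>
#include <time.h>

/* Format "[YYYY-MM-DDTHH:MM:SSZ] KIND: name" into buf.
 * Returns the number of characters written, or -1 on error or truncation. */
int fw_format_event(char *buf, size_t len, time_t when,
                    const char *kind, const char *name)
{
    struct tm tm;
    char stamp[32];

    if (gmtime_r(&when, &tm) == NULL)
        return -1;
    if (strftime(stamp, sizeof(stamp), "%Y-%m-%dT%H:%M:%SZ", &tm) == 0)
        return -1;

    int n = snprintf(buf, len, "[%s] %s: %s", stamp, kind, name);
    if (n < 0 || (size_t)n >= len)
        return -1;
    return n;
}
```

Formatting into a caller-supplied buffer and then issuing a single `write()` keeps the logging path down to syscalls that every sandbox profile in this project already allows.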
5.11 Key Implementation Decisions
Decision 1: Single file vs multiple files?
For a project this size, a single file with #ifdef blocks is reasonable and makes it easy to see the OS differences side-by-side. For larger projects, separate files per OS are cleaner.
Decision 2: When to sandbox?
Apply sandboxing as early as possible, but after all initialization that requires privileges:
- After opening the watch directory
- After opening the log file
- After setting up inotify/kqueue
- Before entering the event loop
Decision 3: How to handle missing sandbox support?
Options:
- Fail: if you can’t sandbox, refuse to run
- Warn: print a warning, continue without sandbox
- Detect: check at runtime if features are available
Recommended: Warn and continue for development, fail for production.
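The three options can be combined: detect at runtime, then let a policy flag decide between warning and failing. The sketch below is one possible shape, with hypothetical names (`fw_policy`, `fw_sandbox_available()`, `fw_on_sandbox_failure()`). On Linux it relies on `prctl(PR_GET_SECCOMP)` failing with `EINVAL` when the kernel was built without seccomp. The BSD branch simply assumes support is present, which is an assumption rather than a real probe.

```c
#include <stdio.h>
#ifdef __linux__
#include <sys/prctl.h>
#endif

enum fw_policy { FW_POLICY_WARN, FW_POLICY_FAIL };

/* Returns 1 if the native sandbox looks usable, 0 otherwise. */
int fw_sandbox_available(void)
{
#if defined(__linux__)
    /* Returns 0 when seccomp is supported but not yet enabled for us. */
    return prctl(PR_GET_SECCOMP, 0, 0, 0, 0) >= 0;
#elif defined(__OpenBSD__) || defined(__FreeBSD__)
    return 1; /* assumption: pledge/Capsicum is present in the running kernel */
#else
    return 0;
#endif
}

/* Returns 0 if the daemon may continue, -1 if it must refuse to run. */
int fw_on_sandbox_failure(enum fw_policy policy, const char *what)
{
    if (policy == FW_POLICY_FAIL) {
        fprintf(stderr, "filewatcher: %s unavailable, refusing to run\n", what);
        return -1;
    }
    fprintf(stderr, "filewatcher: warning: %s unavailable, running unsandboxed\n",
            what);
    return 0;
}
```

A development build might pass `FW_POLICY_WARN` and a release build `FW_POLICY_FAIL`, for example via a compile-time define or a command-line flag.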
Decision 4: libseccomp vs raw BPF?
Use libseccomp. Raw BPF is educational but error-prone. libseccomp provides a sane C API and handles architecture differences.
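To see what libseccomp is saving you from, here is a rough sketch of the same allowlist idea built by hand as a classic BPF program. It is Linux-only and hard-codes `AUDIT_ARCH_X86_64`, which is an assumption; libseccomp handles other architectures for you. `fw_build_filter()` is a hypothetical helper that only constructs the program and does not install it.

```c
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Build "kill unless the syscall number is in allowed[]" into out[].
 * Returns the number of instructions, or 0 if out[] is too small. */
size_t fw_build_filter(struct sock_filter *out, size_t cap,
                       const int *allowed, size_t n)
{
    size_t need = 4 + 2 * n + 1;
    size_t i = 0;

    if (cap < need)
        return 0;

    /* Load the architecture and kill on anything but x86-64 (assumption). */
    out[i++] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                   offsetof(struct seccomp_data, arch));
    out[i++] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K,
                   AUDIT_ARCH_X86_64, 1, 0);
    out[i++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL);

    /* Load the syscall number and compare it against each allowed entry. */
    out[i++] = (struct sock_filter)BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                   offsetof(struct seccomp_data, nr));
    for (size_t k = 0; k < n; k++) {
        /* Match: fall through to ALLOW. No match: skip one instruction. */
        out[i++] = (struct sock_filter)BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K,
                       (unsigned int)allowed[k], 0, 1);
        out[i++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K,
                       SECCOMP_RET_ALLOW);
    }

    /* Default action for everything else. */
    out[i++] = (struct sock_filter)BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL);
    return i;
}
```

Installing the result would take a `struct sock_fprog` pointing at the array, `prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)`, and then `prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog)`. The hand-counted jump offsets are exactly the kind of detail that makes raw BPF error-prone.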
6. Testing Strategy
6.1 Functional Testing
Test 1: Basic event detection
# Terminal 1
./filewatcher /tmp/test
# Terminal 2
touch /tmp/test/newfile
echo "data" > /tmp/test/newfile
rm /tmp/test/newfile
Expected output:
[...] CREATED: newfile
[...] MODIFIED: newfile
[...] DELETED: newfile
Test 2: Signal handling
./filewatcher /tmp/test &
PID=$!
sleep 1
kill -INT $PID
# Should exit cleanly with message
6.2 Sandbox Testing
Test 3: OpenBSD pledge violation
$ ./filewatcher --test-sandbox /tmp/test
[filewatcher] Testing sandbox: attempting socket()...
Abort trap (core dumped)
$ echo $?
134 # SIGABRT
Test 4: FreeBSD Capsicum violation
$ ./filewatcher --test-sandbox /tmp/test
[filewatcher] Testing sandbox: attempting open("/etc/passwd")...
[filewatcher] BLOCKED: errno=ECAPMODE (94) - expected!
Test 5: Linux seccomp violation
$ ./filewatcher --test-sandbox /tmp/test
[filewatcher] Testing sandbox: attempting socket()...
Bad system call (core dumped)
$ echo $?
159 # 128 + SIGSYS(31)
6.3 Per-OS Test Script
#!/bin/sh
# test_sandbox.sh
OS=$(uname)
TESTDIR=$(mktemp -d)
LOG=$(mktemp)
trap 'rm -rf "$TESTDIR" "$LOG"' EXIT
echo "=== Testing on $OS ==="
# Start filewatcher in background, capturing its output
./filewatcher "$TESTDIR" >"$LOG" 2>&1 &
PID=$!
sleep 1
# Create, modify, delete
touch "$TESTDIR/testfile"
sleep 0.5
echo "data" >> "$TESTDIR/testfile"
sleep 0.5
rm "$TESTDIR/testfile"
sleep 0.5
# Clean shutdown
kill -INT $PID
wait $PID 2>/dev/null
# Verify each event was logged (assumes the output format from section 6.1)
for EVENT in CREATED MODIFIED DELETED; do
    grep -q "$EVENT: testfile" "$LOG" || { echo "FAIL: no $EVENT event"; exit 1; }
done
echo "=== Functional tests passed ==="
# Sandbox test (will crash intentionally)
echo "=== Testing sandbox enforcement ==="
./filewatcher --test-sandbox "$TESTDIR" 2>&1 || true
echo "=== Sandbox tests complete ==="
7. Common Pitfalls & Debugging
7.1 OpenBSD Pitfalls
| Problem | Cause | Fix |
|---|---|---|
| “unveil: No such file or directory” | unveil() path doesn’t exist | Verify path exists before unveil() |
| SIGABRT immediately | Pledge too restrictive | Add missing promises (use ktrace to find which) |
| “unveil” after “unveil(NULL, NULL)” fails | Can’t modify unveil after locking | Move unveil calls before the lock |
| Can’t read files after unveil | Path not revealed or wrong permissions | Check unveil() path and flags |
Debugging with ktrace:
# See what syscalls are attempted
ktrace ./filewatcher /var/log
# ... program runs or crashes ...
kdump | grep CALL
7.2 FreeBSD Pitfalls
| Problem | Cause | Fix |
|---|---|---|
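| ENOTCAPABLE from kevent() | kqueue fd limited without CAP_KQUEUE, or watched fd limited without CAP_EVENT | Include CAP_KQUEUE (kqueue fd) and CAP_EVENT (watched fds) in cap_rights_limit() |
| “Capabilities insufficient” (ENOTCAPABLE) on read | Fd doesn’t have CAP_READ | Check cap_rights_limit() call |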
| Can’t create log file after cap_enter | Must open all files before cap_enter | Restructure initialization |
| openat() fails | Directory fd missing CAP_LOOKUP | Add CAP_LOOKUP to directory fd rights |
Debugging Capsicum:
# Check capability rights on fds (-C adds capability info to the -f file view)
procstat -fC $(pgrep filewatcher)
# See which syscalls fail
truss ./filewatcher /var/log 2>&1 | grep -i cap
7.3 Linux Pitfalls
| Problem | Cause | Fix |
|---|---|---|
| SIGSYS immediately | Missing essential syscall | Run with strace, add missing syscall |
| “Operation not permitted” | seccomp blocks the syscall with ERRNO | Check filter for SCMP_ACT_ERRNO |
| Works in strace, fails without | strace uses ptrace, might allow more | Add rt_sigreturn, clone, etc. |
| Crash in libc | Blocking mmap, mprotect, or brk | Whitelist memory management syscalls |
Debugging seccomp:
# Find which syscall is blocked: it is the last call traced before the SIGSYS kill
strace -f ./filewatcher /var/log 2>&1 | tail -5
# Run with SCMP_ACT_ERRNO instead of KILL to survive:
# Change SCMP_ACT_KILL to SCMP_ACT_ERRNO(EPERM) temporarily
# Audit log (requires kernel support)
dmesg | grep audit
Common syscalls you’ll forget:
// You'll definitely need these:
SCMP_SYS(read)
SCMP_SYS(write)
SCMP_SYS(close)
SCMP_SYS(fstat)
SCMP_SYS(mmap) // malloc needs this
SCMP_SYS(mprotect) // malloc needs this
SCMP_SYS(munmap) // free needs this
SCMP_SYS(brk) // malloc needs this
SCMP_SYS(rt_sigaction) // signal handling
SCMP_SYS(rt_sigprocmask) // signal handling
SCMP_SYS(rt_sigreturn) // signal handling
SCMP_SYS(exit_group) // exit()
SCMP_SYS(clock_gettime) // timestamps
8. Extensions & Challenges
Once the basic implementation works, try these extensions:
8.1 Recursive Directory Watching
Watch subdirectories too. Challenges:
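- inotify can watch many directories through one fd, but watches are not recursive: each subdirectory needs its own inotify_add_watch() call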
- kqueue requires a file descriptor per directory (and you can’t open new fds after cap_enter!)
- On FreeBSD, you must enumerate subdirectories and open them all before sandboxing
8.2 Network Alerts
Send events over the network (e.g., to syslog or a webhook). Challenges:
- OpenBSD: need the “inet” promise, plus “dns” if you resolve hostnames
- FreeBSD: need a socket fd before cap_enter
- Linux: need socket, connect, send syscalls
8.3 Landlock on Linux
Linux 5.13+ has Landlock, which provides path-based restrictions like unveil. Implement a Landlock path and compare:
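#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/landlock.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>
// glibc provides no Landlock wrappers, so invoke the syscalls directly
// Create a ruleset
struct landlock_ruleset_attr attr = {
    .handled_access_fs = LANDLOCK_ACCESS_FS_READ_FILE |
                         LANDLOCK_ACCESS_FS_WRITE_FILE,
};
int ruleset_fd = syscall(SYS_landlock_create_ruleset, &attr, sizeof(attr), 0);
// Add a rule for /var/log
struct landlock_path_beneath_attr path_attr = {
    .allowed_access = LANDLOCK_ACCESS_FS_READ_FILE |
                      LANDLOCK_ACCESS_FS_WRITE_FILE,
    .parent_fd = open("/var/log", O_PATH | O_DIRECTORY),
};
syscall(SYS_landlock_add_rule, ruleset_fd, LANDLOCK_RULE_PATH_BENEATH, &path_attr, 0);
close(path_attr.parent_fd);
// Enforce (unprivileged processes must set no_new_privs first)
prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
syscall(SYS_landlock_restrict_self, ruleset_fd, 0);
close(ruleset_fd);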
8.4 Benchmarking
Compare file event latency across OSes:
- How long from file creation to event delivery?
- Does sandboxing add overhead?
- Which is faster: inotify or kqueue?
8.5 Configuration File
Add support for a config file specifying:
- Multiple watch paths
- Event filters (only certain file types)
- Log format
- Which sandbox features to enable
Challenge: You must read the config BEFORE sandboxing, and unveil/Capsicum it appropriately.
9. Real-World Connections
9.1 How Production Systems Do This
Chromium browser:
- Uses seccomp-bpf on Linux with a very restrictive filter
- Renderer processes can’t access the filesystem at all
- GPU process has different, more permissive filter
- Multi-process architecture: broker process has more privileges
OpenSSH on OpenBSD:
- pledge(“stdio rpath wpath cpath flock tty”) in various stages
- unveil() to limit visible paths
- Different pledges for authentication vs session stages
FreeBSD jails:
- Capsicum is used inside jails for additional isolation
- jail(2) + Capsicum = defense in depth
systemd on Linux:
- Uses seccomp filters for service sandboxing
- SystemCallFilter= in unit files
- Also uses namespaces and cgroups
9.2 Tools Built on These Primitives
| Tool | Technology | Notes |
|---|---|---|
| Docker | seccomp-bpf + namespaces | Default profile blocks around 44 of 300+ syscalls |
| Firejail | seccomp + namespaces | User-friendly Linux sandboxing |
| Bubblewrap | seccomp + namespaces | Used by Flatpak |
| OpenBSD base | pledge/unveil | Nearly everything is sandboxed |
| FreeBSD Capsicum | Capsicum | Used in many base utilities |
| CloudABI | Capsicum | Cross-platform capability-based runtime |
10. Resources
10.1 Man Pages (Primary References)
- pledge(2) - OpenBSD pledge
- unveil(2) - OpenBSD unveil
- cap_enter(2) - FreeBSD Capsicum
- cap_rights_limit(2) - FreeBSD capabilities
- seccomp(2) - Linux seccomp
- inotify(7) - Linux inotify
- kqueue(2) - BSD kqueue
10.2 Papers and Talks
- Capsicum: Practical Capabilities for UNIX - Original Capsicum paper
- Bob Beck’s pledge/unveil BSDCan 2018 talk
- seccomp-bpf in Chrome
10.3 Source Code to Study
- /usr/src/bin/cat/cat.c on OpenBSD - Simple pledge example
- Chromium’s Linux sandbox
- FreeBSD’s Capsicum examples
10.4 Related Projects
- libseccomp - seccomp wrapper library
- bpftrace - Tracing tool using BPF
- Pledge for Linux - Cosmopolitan’s Linux pledge implementation
11. Self-Assessment Checklist
Before considering this project complete, verify:
Understanding
- I can explain the difference between pledge, Capsicum, and seccomp without notes
- I understand why Capsicum requires pre-opening file descriptors
- I can list 5 syscalls blocked by my seccomp filter and explain why each is unnecessary
- I understand why pledge violations cause SIGABRT instead of EPERM
Implementation
- My daemon correctly reports CREATE, MODIFY, and DELETE events
- Timestamps are in ISO 8601 format
- Clean shutdown on SIGINT/SIGTERM
- Sandbox is applied before the event loop
- --test-sandbox demonstrates that forbidden operations are blocked
Testing
- Tested on OpenBSD with pledge/unveil
- Tested on FreeBSD with Capsicum
- Tested on Linux with seccomp-bpf
- Verified sandbox blocks network operations
- Verified sandbox blocks file access outside allowed paths
Code Quality
- Compiles without warnings on all three OSes
- No memory leaks (test with valgrind on Linux)
- Error handling for all syscalls
- Clear comments explaining OS-specific code
Documentation
- README explains how to build on each OS
- README explains what the sandbox restricts
- Usage examples provided
12. Submission / Completion Criteria
You have completed this project when:
- Functionality: Your daemon watches a directory and correctly logs CREATE, MODIFY, and DELETE events with timestamps on all three target OSes.
- Sandboxing: Each OS uses its native sandboxing:
- OpenBSD: pledge + unveil
- FreeBSD: Capsicum cap_enter
- Linux: seccomp-bpf filter
- Verification: Running with --test-sandbox demonstrates sandbox enforcement:
- OpenBSD: SIGABRT on pledge violation
- FreeBSD: ECAPMODE returned
- Linux: SIGSYS on forbidden syscall
- Code organization: Single codebase compiles on all three OSes with appropriate #ifdef blocks.
- Interview readiness: You can:
- Draw the architecture on a whiteboard
- Explain trade-offs between the three models
- Describe how you’d sandbox a different type of application
- Answer the interview questions from section 5.8
Next Project: Project 2: Event-Driven TCP Echo Server (kqueue vs epoll)
This expanded project guide is part of the BSD vs Linux & Unix Variants Learning Projects series.