← Back to all projects

BSD LINUX UNIX VARIANTS LEARNING PROJECTS

Understanding BSD vs Linux & Unix Variants: A Deep Dive Through Building

Goal: Master the fundamental architectural differences between Unix variants—BSD (FreeBSD, OpenBSD), Linux, and illumos—by building real systems that expose their distinct design philosophies. You’ll understand why these systems differ, not just how, enabling you to choose the right tool for each job and write truly portable systems code.


Why This Knowledge Matters

In 1969 at Bell Labs, Ken Thompson and Dennis Ritchie created Unix. From that single ancestor, an entire family of operating systems evolved—each branch making different design choices that echo through every system call you make today.

The professional reality:

  • Netflix runs on FreeBSD for its content delivery (their Open Connect CDN has historically served roughly a third of North American downstream traffic at peak)
  • OpenBSD pioneered security features now standard everywhere (ASLR, W^X, pledge/unveil)
  • Linux dominates servers, cloud, and containers (96%+ of top 1M web servers)
  • illumos (Solaris heritage) gave us DTrace and ZFS—technologies now ported everywhere

Understanding these systems isn’t academic—it’s understanding why your containers work the way they do, why some firewalls are easier to configure than others, and why certain security models exist.

The core question: Why did these systems evolve differently from the same ancestor?

                                 ┌────────────────────────────────────────┐
                                 │        Original Unix (1969)            │
                                 │        Bell Labs (Thompson/Ritchie)    │
                                 └─────────────────┬──────────────────────┘
                                                   │
                      ┌────────────────────────────┴────────────────────────────┐
                      │                                                         │
                      ▼                                                         ▼
        ┌─────────────────────────────┐                          ┌──────────────────────────┐
        │      BSD (1977)             │                          │    System V (AT&T)       │
        │   UC Berkeley               │                          │                          │
        │   "Academic/Research"       │                          │    "Commercial Unix"     │
        └──────────────┬──────────────┘                          └────────────┬─────────────┘
                       │                                                      │
       ┌───────────────┼───────────────┬─────────────────┐                   │
       │               │               │                 │                   │
       ▼               ▼               ▼                 ▼                   ▼
  ┌─────────┐    ┌─────────┐    ┌─────────┐      ┌────────────┐      ┌────────────────┐
  │ FreeBSD │    │ OpenBSD │    │ NetBSD  │      │  Darwin/   │      │    Solaris     │
  │ (1993)  │    │ (1995)  │    │ (1993)  │      │   macOS    │      │    (1992)      │
  │         │    │         │    │         │      │            │      │                │
  │ Focus:  │    │ Focus:  │    │ Focus:  │      │ Mach +     │      │ DTrace, ZFS    │
  │ Perf,   │    │ Security│    │ Porta-  │      │ BSD        │      │                │
  │ Features│    │ Correct │    │ bility  │      │ userland   │      └────────┬───────┘
  └─────────┘    └─────────┘    └─────────┘      └────────────┘               │
                                                                              ▼
                                                                      ┌────────────────┐
        ┌─────────────────────────────────────────────────────────┐   │    illumos     │
        │                  Linux (1991)                            │   │    (2010)      │
        │           NOT Unix descendant—Unix-LIKE                  │   │                │
        │           Reimplementation of Unix ideas                 │   │ OpenSolaris    │
        │           Linux kernel + GNU userland                    │   │ fork           │
        └─────────────────────────────────────────────────────────┘   └────────────────┘

The key insight that will click once you build these projects:

  • Linux = A kernel with userland assembled from various sources (Lego blocks)
  • BSD = Complete, integrated operating systems (Finished product)
  • illumos = Enterprise Unix with native observability (DTrace) and storage (ZFS)

This fundamental difference shapes EVERYTHING: security models, container implementations, networking APIs, and more.


The Design Philosophy Deep Dive

Linux: “The Bazaar”

Linux follows the “cathedral vs bazaar” model from Eric Raymond—many independent contributors, rapid iteration, features from everywhere. The kernel is separate from userland (GNU tools, systemd, etc.).

┌─────────────────────────────────────────────────────────────────────┐
│                         Linux System                                 │
├─────────────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌────────────┐ │
│  │   systemd   │  │GNU coreutils│  │    glibc    │  │   bash     │ │
│  │ (Lennart P.)│  │  (FSF)      │  │  (FSF)      │  │  (FSF)     │ │
│  └─────────────┘  └─────────────┘  └─────────────┘  └────────────┘ │
│                    ▲ Different projects, different maintainers      │
├────────────────────┼────────────────────────────────────────────────┤
│                    │                                                 │
│                    │  Linux Kernel (Torvalds et al.)                │
│    ┌───────────────┴──────────────────────────────────────────┐     │
│    │ Monolithic kernel with loadable modules                   │     │
│    │ syscall interface is the stable API boundary              │     │
│    └──────────────────────────────────────────────────────────┘     │
└─────────────────────────────────────────────────────────────────────┘

Implications:

  • Security features come from many sources (seccomp, SELinux, AppArmor, namespaces)
  • Containers are “assembled” from primitives (namespaces + cgroups + seccomp + …); see the sketch after this list
  • Updates can be partial (update kernel, keep userland or vice versa)
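
To make the “assembled from primitives” point concrete, here is a minimal sketch (assumptions: Linux, root or CAP_SYS_ADMIN, and a hostname chosen purely for illustration) that carves out just one namespace with unshare(2). A real container runtime layers half a dozen of these plus cgroups and seccomp:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    if (unshare(CLONE_NEWUTS) == -1) {  // new UTS (hostname) namespace
        perror("unshare");
        return 1;
    }
    sethostname("sandbox", 7);          // visible only inside this namespace
    char buf[64];
    gethostname(buf, sizeof(buf));
    printf("hostname in new namespace: %s\n", buf);
    return 0;                           // the host's hostname is untouched
}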

BSD: “The Cathedral”

BSD maintains the entire operating system as one project. Kernel, libc, core utilities, documentation—all versioned together.

┌─────────────────────────────────────────────────────────────────────┐
│                    FreeBSD/OpenBSD System                            │
├─────────────────────────────────────────────────────────────────────┤
│           Single source tree, single project                         │
│  ┌─────────────────────────────────────────────────────────────┐    │
│  │  /usr/src                                                     │    │
│  │   ├── sys/          (kernel source)                          │    │
│  │   ├── lib/          (libc, libm, etc.)                       │    │
│  │   ├── bin/          (core utilities: ls, cat, etc.)          │    │
│  │   ├── sbin/         (system utilities: mount, ifconfig)      │    │
│  │   ├── usr.bin/      (user utilities: grep, awk, etc.)        │    │
│  │   └── share/        (docs, man pages)                        │    │
│  │                                                               │    │
│  │   ALL maintained by the SAME project, versioned TOGETHER     │    │
│  └─────────────────────────────────────────────────────────────┘    │
│                                                                      │
│  Result: Tight integration, consistent coding style, unified docs   │
└─────────────────────────────────────────────────────────────────────┘

Implications:

  • Security features are built-in (jails, Capsicum on FreeBSD; pledge/unveil on OpenBSD)
  • Containers are “first-class” (jail(2) is a single system call)
  • Updates are atomic (upgrade entire base system together)

Security Model Comparison: A Critical Difference

The security philosophy differences are profound:

OpenBSD: Promise-Based Security (pledge/unveil)

// OpenBSD: Tell the kernel what you WILL do, reveal what you WILL see
#include <err.h>
#include <unistd.h>

int main(void) {
    // Only reveal these filesystem paths (unveil must come before a
    // pledge that omits the "unveil" promise)
    if (unveil("/var/log", "rw") == -1)
        err(1, "unveil");
    if (unveil(NULL, NULL) == -1)  // Lock it down
        err(1, "unveil");

    // After this, only these capabilities remain
    if (pledge("stdio rpath wpath", NULL) == -1)
        err(1, "pledge");

    // Now the program runs with minimal privileges
    // Any violation = the process is killed with an uncatchable SIGABRT
    return 0;
}

Philosophy: “Surrender capabilities at runtime. Promise what you’ll do, reveal what you’ll see.” Simple, auditable, comprehensible by mortals.

FreeBSD: Capability-Based Security (Capsicum)

// FreeBSD: Limit capabilities on file descriptors
#include <sys/capsicum.h>
#include <err.h>
#include <fcntl.h>

int main(void) {
    int fd = open("/etc/passwd", O_RDONLY);
    if (fd == -1)
        err(1, "open");

    cap_rights_t rights;
    cap_rights_init(&rights, CAP_READ, CAP_SEEK);
    if (cap_rights_limit(fd, &rights) == -1)  // This fd can now ONLY read/seek
        err(1, "cap_rights_limit");

    if (cap_enter() == -1)  // Enter capability mode - no more global namespace access
        err(1, "cap_enter");

    // fd is now the ONLY way to access that file
    // Cannot open new files, cannot access the network
    return 0;
}

Philosophy: “Capabilities are tokens attached to file descriptors.” Fine-grained control, but more complex.

Linux: Filter-Based Security (seccomp-bpf)

// Linux: Write a BPF program to filter syscalls
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>

struct sock_filter filter[] = {
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_write, 0, 1),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
};
struct sock_fprog prog = { .len = sizeof(filter) / sizeof(filter[0]), .filter = filter };
// NO_NEW_PRIVS is required before an unprivileged process may install a filter
prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);

Philosophy: “Maximum flexibility through programmability.” You write a BPF program that filters syscalls. Powerful but complex—easy to make mistakes.

┌───────────────────────────────────────────────────────────────────────────┐
│                    Security Model Comparison                               │
├────────────────────┬──────────────────────┬───────────────────────────────┤
│     OpenBSD        │      FreeBSD         │          Linux                │
│   pledge/unveil    │      Capsicum        │      seccomp-bpf              │
├────────────────────┼──────────────────────┼───────────────────────────────┤
│                    │                      │                               │
│ "I promise to only"│ "This fd can only"  │ "If syscall matches filter"  │
│                    │                      │                               │
│   Simple strings   │  Capability rights   │    BPF bytecode program      │
│   "stdio rpath"    │  CAP_READ, CAP_SEEK  │    Complex filter rules      │
│                    │                      │                               │
│   Easy to audit    │  Medium complexity   │    Hard to get right         │
│   ~10 lines code   │  ~30 lines code     │    ~100+ lines code          │
│                    │                      │                               │
└────────────────────┴──────────────────────┴───────────────────────────────┘

Isolation Architecture: Containers vs Jails vs Zones

This is where the “first-class concept” vs “building blocks” difference becomes crystal clear:

Linux: Assemble from Primitives

┌─────────────────────────────────────────────────────────────────────┐
│                    Linux "Container"                                 │
│            (NOT a kernel concept—assembled from parts)               │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  You must combine:                                                   │
│  ┌──────────────┐ ┌──────────────┐ ┌──────────────┐                 │
│  │ PID namespace│ │ mount ns     │ │ network ns   │                 │
│  │ clone(CLONE_ │ │ clone(CLONE_ │ │ clone(CLONE_ │                 │
│  │    NEWPID)   │ │    NEWNS)    │ │    NEWNET)   │                 │
│  └──────────────┘ └──────────────┘ └──────────────┘                 │
│  ┌──────────────┐ ┌──────────────┐ ┌──────────────┐                 │
│  │ UTS namespace│ │ IPC namespace│ │ user ns      │                 │
│  │   (hostname) │ │ (semaphores, │ │ (uid/gid     │                 │
│  │              │ │  msg queues) │ │  mapping)    │                 │
│  └──────────────┘ └──────────────┘ └──────────────┘                 │
│  ┌──────────────┐ ┌──────────────┐                                  │
│  │    cgroups   │ │   seccomp    │  + AppArmor/SELinux + ...       │
│  │  (resource   │ │  (syscall    │                                  │
│  │   limits)    │ │   filter)    │                                  │
│  └──────────────┘ └──────────────┘                                  │
│                                                                      │
│  Result: ~500+ lines of C code to create a container                │
└─────────────────────────────────────────────────────────────────────┘

FreeBSD: First-Class Jail

┌─────────────────────────────────────────────────────────────────────┐
│                      FreeBSD Jail                                    │
│               (First-class kernel concept)                           │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│   Single system call:                                                │
│   ┌─────────────────────────────────────────────────────────────┐   │
│   │  jail(2)                                                     │   │
│   │                                                              │   │
│   │  struct jail j = {                                           │   │
│   │      .version = JAIL_API_VERSION,                            │   │
│   │      .path = "/jails/myjail",                                │   │
│   │      .hostname = "myjail",                                   │   │
│   │      .jailname = "myjail",                                   │   │
│   │      .ip4s = 1,                                              │   │
│   │      .ip4 = &jail_ip,                                        │   │
│   │  };                                                          │   │
│   │  jail(&j);  // That's it. You're in a jail.                  │   │
│   └─────────────────────────────────────────────────────────────┘   │
│                                                                      │
│  + VNET for network virtualization                                  │
│  + rctl for resource limits                                         │
│  + ZFS clones for instant filesystem snapshots                      │
│                                                                      │
│  Result: ~100 lines of C code for equivalent isolation              │
└─────────────────────────────────────────────────────────────────────┘
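
The pseudocode in the diagram above compiles almost as-is. A hedged sketch (the jail path and TEST-NET address are placeholders; the legacy jail(2) call is shown, though modern code typically goes through jail_set(2) or libjail, and root plus a populated jail root directory are required):

#include <sys/param.h>
#include <sys/jail.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <err.h>
#include <unistd.h>

int main(void) {
    struct in_addr ip;
    inet_pton(AF_INET, "192.0.2.10", &ip);  // placeholder address

    struct jail j = {
        .version  = JAIL_API_VERSION,
        .path     = "/jails/myjail",        // must contain a userland
        .hostname = "myjail",
        .jailname = "myjail",
        .ip4s     = 1,
        .ip4      = &ip,
    };
    if (jail(&j) == -1)  // creates the jail AND imprisons this process
        err(1, "jail");

    execl("/bin/sh", "sh", (char *)NULL);   // a shell confined to /jails/myjail
    err(1, "execl");
}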

illumos: Zones with SMF Integration

┌─────────────────────────────────────────────────────────────────────┐
│                     illumos Zone                                     │
│              (Enterprise-grade isolation)                            │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  zone_create() / zonecfg + zoneadm                                  │
│  ┌─────────────────────────────────────────────────────────────┐   │
│  │                                                              │   │
│  │  - Full process isolation                                    │   │
│  │  - Delegated ZFS datasets                                    │   │
│  │  - Resource pools                                            │   │
│  │  - Network virtualization (crossbow)                         │   │
│  │  - SMF (Service Management Facility) integration             │   │
│  │  - DTrace visibility across zones                            │   │
│  │  - LX branded zones (run Linux binaries!)                    │   │
│  │                                                              │   │
│  └─────────────────────────────────────────────────────────────┘   │
│                                                                      │
│  Zones (2004) predated Docker (2013) by nearly a decade             │
└─────────────────────────────────────────────────────────────────────┘

Event-Driven I/O: kqueue vs epoll

Both solve the C10K problem (handling 10,000+ concurrent connections), but with different elegance:

┌─────────────────────────────────────────────────────────────────────┐
│                    kqueue (BSD)                                      │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  // One call to register AND wait                                   │
│  struct kevent changes[100];    // What we want to monitor          │
│  struct kevent events[100];     // What happened                    │
│                                                                      │
│  // Register 100 file descriptors in ONE system call                │
│  kevent(kq, changes, 100, events, 100, NULL);                       │
│                                                                      │
│  Benefits:                                                           │
│  ✓ Batch updates (register many fds in one syscall)                 │
│  ✓ Generic (handles files, sockets, signals, processes, timers)    │
│  ✓ Cleaner API design                                               │
│                                                                      │
│  Filter types: EVFILT_READ, EVFILT_WRITE, EVFILT_VNODE,            │
│               EVFILT_PROC, EVFILT_SIGNAL, EVFILT_TIMER             │
└─────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────┐
│                    epoll (Linux)                                     │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  // Separate calls for each modification                             │
│  for (int i = 0; i < 100; i++) {                                    │
│      epoll_ctl(epfd, EPOLL_CTL_ADD, fds[i], &event);  // 100 calls!│
│  }                                                                   │
│  epoll_wait(epfd, events, 100, -1);                                 │
│                                                                      │
│  Limitations:                                                        │
│  ✗ One syscall per modification                                     │
│  ✗ Socket-focused (need eventfd/signalfd/timerfd for other types)  │
│  ✗ More system calls under high churn                               │
│                                                                      │
│  But: Still O(1) and very fast in practice                          │
└─────────────────────────────────────────────────────────────────────┘
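
A short sketch of the “generic” claim above (BSD/macOS; the 1000 ms period and the choice of SIGINT are arbitrary): one kqueue watches a timer and a signal together, where epoll would need timerfd and signalfd alongside the sockets:

#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>
#include <signal.h>
#include <stdio.h>

int main(void) {
    int kq = kqueue();
    struct kevent chg[2], ev;

    signal(SIGINT, SIG_IGN);  // kqueue still records ignored signals
    EV_SET(&chg[0], 1, EVFILT_TIMER, EV_ADD, 0, 1000, NULL);     // fires every 1000 ms
    EV_SET(&chg[1], SIGINT, EVFILT_SIGNAL, EV_ADD, 0, 0, NULL);

    kevent(kq, chg, 2, NULL, 0, NULL);  // register BOTH in one syscall

    for (;;) {
        if (kevent(kq, NULL, 0, &ev, 1, NULL) < 1)
            break;
        printf("%s fired\n", ev.filter == EVFILT_TIMER ? "timer" : "SIGINT");
    }
    return 0;
}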

Core Concept Analysis

To truly understand BSD vs Linux (and other Unix-likes), you need to grasp these fundamental architectural differences:

| Concept Area | Linux | FreeBSD | OpenBSD | illumos |
|---|---|---|---|---|
| Design Philosophy | Modular kernel + GNU userland (pieces from everywhere) | Integrated “complete OS” (kernel + userland as one) | Security-first, minimal attack surface | Enterprise features (DTrace, ZFS native) |
| Isolation | Namespaces + cgroups (building blocks) | Jails (first-class kernel concept) | chroot + pledge/unveil | Zones (first-class containers) |
| Event I/O | epoll | kqueue | kqueue | Event ports |
| Packet Filter | nftables/iptables | pf (ported from OpenBSD) | pf (native) | IPFilter |
| Security Model | seccomp-bpf, SELinux, AppArmor | Capsicum, MAC Framework | pledge(2), unveil(2) | Privileges, zones |
| Tracing | eBPF, perf | DTrace (ported) | ktrace | DTrace (native) |
| Init System | systemd (mostly) | rc scripts | rc scripts | SMF |

Key insight: Linux is a kernel with userland assembled from various sources. BSDs are complete, integrated operating systems. This fundamental difference shapes everything else.


Concept Summary Table

| Concept Cluster | What You Need to Internalize |
|---|---|
| Design Philosophy | Linux = bazaar (components from everywhere); BSD = cathedral (integrated system). This shapes everything. |
| Security Models | OpenBSD pledge/unveil = promise-based; FreeBSD Capsicum = capability-based; Linux seccomp = filter-based. Trade-offs between simplicity and flexibility. |
| Isolation Architecture | Jails/Zones are first-class kernel concepts; Linux containers are assembled from namespaces+cgroups. Complexity vs elegance. |
| Event I/O | kqueue is more elegant (batch ops, generic); epoll is socket-focused. Both solve C10K. |
| The Unix Heritage | BSD descends from original Unix; Linux is a reimplementation. This explains API differences. |
| Observability | DTrace (Solaris/illumos native, ported to BSD) vs eBPF (Linux). Both let you instrument running kernels. |
| Networking | BSD’s TCP/IP stack is the reference implementation. pf originated on OpenBSD. |

Deep Dive Reading by Concept

Unix History & Design Philosophy

Concept Book & Chapter
Unix origins and philosophy The UNIX Programming Environment by Kernighan & Pike — Ch. 1: “UNIX for Beginners”
BSD history and development The Design and Implementation of the FreeBSD Operating System by McKusick et al. — Ch. 1
Linux kernel architecture Understanding the Linux Kernel, 3rd Edition by Bovet & Cesati — Ch. 1-2
System calls deep dive Advanced Programming in the UNIX Environment, 3rd Edition by Stevens & Rago — Ch. 1-3

Security Models

Concept Book & Chapter
OpenBSD security philosophy Absolute OpenBSD by Michael W. Lucas — Ch. 1 & security chapters
FreeBSD Capsicum Absolute FreeBSD, 3rd Edition by Michael W. Lucas — Ch. 8
Linux security mechanisms The Linux Programming Interface by Michael Kerrisk — Ch. 23 (Timers & Seccomp)
General Unix security Mastering FreeBSD and OpenBSD Security by Hope, Potter & Korff — Full book

Isolation & Containers

Concept Book & Chapter
Linux namespaces The Linux Programming Interface by Michael Kerrisk — Ch. 28-29 (Process Creation) + online resources
FreeBSD jails Absolute FreeBSD, 3rd Edition by Michael W. Lucas — Ch. 12: “Jails”
Linux cgroups How Linux Works, 3rd Edition by Brian Ward — Ch. 8
General process isolation Operating Systems: Three Easy Pieces by Arpaci-Dusseau — Part II: “Virtualization”

Networking & I/O

Concept Book & Chapter
Event-driven I/O The Linux Programming Interface by Michael Kerrisk — Ch. 63: “Alternative I/O Models”
TCP/IP fundamentals TCP/IP Illustrated, Volume 1 by W. Richard Stevens — Full book (BSD reference impl)
Socket programming UNIX Network Programming, Volume 1 by Stevens, Fenner & Rudoff — Ch. 1-6
FreeBSD networking The Design and Implementation of the FreeBSD Operating System by McKusick et al. — Ch. 12

System Tracing & Debugging

Concept Book & Chapter
DTrace fundamentals DTrace: Dynamic Tracing in Oracle Solaris, Mac OS X, and FreeBSD by Brendan Gregg — Full book
eBPF/BPF on Linux BPF Performance Tools by Brendan Gregg — Ch. 1-5
General debugging The Art of Debugging with GDB, DDD, and Eclipse by Matloff & Salzman — Ch. 1-3

The Unix Family Tree (Context)

Understanding the genealogy helps:

Original Unix (Bell Labs, 1970s)
├── BSD (Berkeley, 1977)
│   ├── FreeBSD (1993) → focus: performance, features, ZFS
│   ├── OpenBSD (1995) → focus: security, correctness, simplicity
│   ├── NetBSD (1993) → focus: portability
│   └── Darwin/macOS (2000) → Mach microkernel + BSD userland
│
├── System V (AT&T)
│   └── Solaris (Sun, 1992)
│       └── illumos (2010) → OpenSolaris fork, DTrace/ZFS native
│
└── Linux (1991) → NOT Unix lineage, but Unix-like
    └── GNU userland + Linux kernel

Linux is the “odd one out”—it’s a reimplementation of Unix ideas, not a descendant. This explains why it often does things differently.


Project 1: Cross-Platform Sandboxed Service

  • File: BSD_LINUX_UNIX_VARIANTS_LEARNING_PROJECTS.md
  • Programming Language: C
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 3: Advanced
  • Knowledge Area: OS Security / Systems Programming
  • Software or Tool: pledge / unveil / seccomp / capsicum
  • Main Book: “Advanced Programming in the UNIX Environment” by Stevens & Rago

What you’ll build: A file-watching daemon that monitors directories for changes and logs events—implemented with native sandboxing on each OS (pledge/unveil on OpenBSD, Capsicum on FreeBSD, seccomp on Linux).

Why it teaches Unix differences: You can’t abstract away the security models—you must understand each one’s philosophy. OpenBSD’s “promise what you’ll do, reveal what you’ll see” model (pledge/unveil) is fundamentally different from Linux’s “filter syscalls at BPF level” (seccomp) or FreeBSD’s capability-based approach (Capsicum).

Core challenges you’ll face:

  • Challenge 1: Understanding pledge promises (“stdio rpath wpath”) vs seccomp BPF filters (maps to security model philosophy)
  • Challenge 2: Using unveil() vs Capsicum cap_rights_limit() for filesystem restriction (maps to capability models)
  • Challenge 3: Building without libc abstractions that hide OS differences (maps to syscall interface understanding)
  • Challenge 4: Handling graceful degradation when security features aren’t available

Key Concepts:

  • Difficulty: Intermediate
  • Time estimate: 2-3 weeks
  • Prerequisites: C programming, basic Unix syscalls (open, read, write), comfort with man pages

Real world outcome:

  • A daemon that prints “CREATED: /path/to/file” or “MODIFIED: /path/to/file” to stdout/syslog
  • Running with minimal privileges on each OS—demonstrable by attempting forbidden operations and seeing them blocked
  • A single codebase with #ifdef __OpenBSD__, #ifdef __FreeBSD__, #ifdef __linux__ blocks showing the architectural differences

Learning milestones:

  1. Get basic file watching working on one OS → understand inotify (Linux) vs kqueue EVFILT_VNODE (BSD)
  2. Add sandboxing on OpenBSD with pledge/unveil → understand “promise-based” security
  3. Port sandboxing to FreeBSD Capsicum → understand capability-based security
  4. Port to Linux seccomp-bpf → understand filter-based security and why it’s “harder”
  5. Compare: which was easiest? Which is most secure? Why?

Real World Outcome

When you complete this project, you’ll have a security-hardened file-watching daemon that demonstrates the fundamental differences between Unix security models.

What you’ll see running on OpenBSD:

$ ./filewatcher /var/log

[filewatcher] Starting with pledge("stdio rpath wpath cpath") and unveil("/var/log", "rw")
[filewatcher] Security sandbox active. Attempting forbidden operation...
[filewatcher] BLOCKED: Cannot access /etc/passwd (unveil restriction)
[filewatcher] Monitoring /var/log for changes...

[2024-12-22 14:32:01] CREATED: /var/log/messages.1
[2024-12-22 14:32:05] MODIFIED: /var/log/auth.log
[2024-12-22 14:32:10] DELETED: /var/log/old.log

# If you try to violate pledge:
$ ./filewatcher_bad /var/log
[filewatcher] Starting...
[filewatcher] Attempting network connection (not pledged)...
Abort trap (core dumped)  # SIGABRT - pledge violation!

What you’ll see running on FreeBSD with Capsicum:

$ ./filewatcher /var/log

[filewatcher] Entering capability mode...
[filewatcher] File descriptor rights limited: CAP_READ, CAP_EVENT
[filewatcher] Capability mode active. Global namespace access revoked.
[filewatcher] Monitoring /var/log for changes...

[2024-12-22 14:32:01] CREATED: /var/log/messages.1

# Attempting to open new file after cap_enter():
[filewatcher] ERROR: open("/etc/passwd") failed: Not permitted in capability mode

What you’ll see running on Linux with seccomp:

$ ./filewatcher /var/log

[filewatcher] Installing seccomp-bpf filter...
[filewatcher] Allowed syscalls: read, write, inotify_add_watch, inotify_rm_watch, exit_group
[filewatcher] Filter installed. Monitoring...

[2024-12-22 14:32:01] CREATED: /var/log/messages.1

# Attempting forbidden syscall:
$ ./filewatcher_bad /var/log
[filewatcher] Attempting socket() syscall (not allowed)...
Bad system call (core dumped)  # SIGSYS - seccomp violation!

Your codebase will look like:

// Conditional compilation showing the architectural differences
#ifdef __OpenBSD__
    // ~15 lines: unveil() + pledge()
    // (unveil first: a pledge without the "unveil" promise forbids it)
    unveil(watch_path, "rw");
    unveil(NULL, NULL);
    pledge("stdio rpath wpath", NULL);
#elif defined(__FreeBSD__)
    // ~30 lines: Capsicum capability mode
    cap_rights_t rights;
    cap_rights_init(&rights, CAP_READ, CAP_EVENT, CAP_FCNTL);
    cap_rights_limit(dir_fd, &rights);
    cap_enter();
#elif defined(__linux__)
    // ~100+ lines: seccomp-bpf filter program
    struct sock_filter filter[] = { /* BPF program */ };
    // ... complex filter setup
#endif

The Core Question You’re Answering

“Why do different Unix systems take such radically different approaches to application sandboxing, and what are the real-world trade-offs?”

This project forces you to confront a fundamental truth: security is a design philosophy, not just a feature list. OpenBSD’s pledge/unveil says “tell us what you need, we’ll kill you if you lie.” FreeBSD’s Capsicum says “capabilities are tokens on file descriptors.” Linux’s seccomp says “here’s a programmable filter—go wild.”

By implementing the same functionality on all three, you’ll viscerally understand why OpenBSD can sandbox their entire base system while Linux applications rarely use seccomp directly.


Concepts You Must Understand First

Stop and research these before coding:

  1. System Calls as the Security Boundary
    • What is a system call? How does it differ from a library function?
    • Why is the syscall interface the natural place to enforce security?
    • How does the kernel know which process is making the call?
    • Book Reference: Advanced Programming in the UNIX Environment, 3rd Edition by Stevens & Rago — Ch. 1-3
  2. The Principle of Least Privilege
    • What does it mean for a program to have “minimal privileges”?
    • Why should a file watcher not have network access?
    • What’s the difference between DAC (discretionary) and MAC (mandatory) access control?
    • Book Reference: Mastering FreeBSD and OpenBSD Security by Hope, Potter & Korff — Ch. 1-2
  3. OpenBSD pledge/unveil Model
    • What is a pledge “promise”? What exactly do “stdio”, “rpath”, and “wpath” allow?
    • How does unveil() shrink the visible filesystem, and why must unveil(NULL, NULL) come last?
    • What happens when a pledged process breaks a promise?
    • Book Reference: Absolute OpenBSD by Michael W. Lucas — security chapters
  4. FreeBSD Capsicum Model
    • What is “capability mode” and why can’t you leave it?
    • How do cap_rights_t work? What’s CAP_READ vs CAP_WRITE?
    • What’s the difference between cap_rights_limit() and cap_enter()?
    • Book Reference: Absolute FreeBSD, 3rd Edition by Michael W. Lucas — Ch. 8
  5. Linux seccomp-bpf Model
    • What is BPF (Berkeley Packet Filter)? Why is it used for syscall filtering?
    • How do you write a BPF filter program?
    • What’s the difference between SECCOMP_RET_KILL, SECCOMP_RET_ERRNO, SECCOMP_RET_ALLOW?
    • Book Reference: The Linux Programming Interface by Michael Kerrisk — Ch. 23
  6. File System Event Notification
    • Linux: How does inotify work? What events can you watch?
    • BSD: How does kqueue EVFILT_VNODE work? What’s the kevent structure?
    • Why are these fundamentally different APIs?
    • Book Reference: The Linux Programming Interface by Michael Kerrisk — Ch. 19
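
As a concrete companion to item 6, here is a minimal sketch of draining inotify events on Linux (/tmp is a placeholder path). The variable-length record layout is the part that trips most people up: each event is a fixed header followed by a len-byte name, so you walk the buffer by hand:

#include <stdio.h>
#include <sys/inotify.h>
#include <unistd.h>

int main(void) {
    char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
    int fd = inotify_init1(0);
    inotify_add_watch(fd, "/tmp", IN_CREATE | IN_MODIFY | IN_DELETE);

    for (;;) {
        ssize_t len = read(fd, buf, sizeof(buf));  // blocks until events arrive
        if (len <= 0)
            break;
        for (char *p = buf; p < buf + len; ) {
            struct inotify_event *e = (struct inotify_event *)p;
            if (e->len > 0)                        // name is set for directory watches
                printf("%s: /tmp/%s\n",
                       (e->mask & IN_CREATE) ? "CREATED" :
                       (e->mask & IN_DELETE) ? "DELETED" : "MODIFIED",
                       e->name);
            p += sizeof(struct inotify_event) + e->len;
        }
    }
    return 0;
}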

Questions to Guide Your Design

Before implementing, think through these:

  1. What exactly needs sandboxing?
    • What system calls does a file watcher need? (open, read, stat, inotify_add_watch/kevent, write to log)
    • What system calls should be BLOCKED? (socket, execve, fork, ptrace, mount)
    • How do you enumerate the minimal set?
  2. How do you test the sandbox?
    • How can you verify that forbidden operations are actually blocked?
    • What happens when a sandboxed program tries a forbidden syscall?
    • How do you distinguish “sandbox blocked it” from “other error”?
  3. How do you handle initialization vs runtime?
    • Most programs need more privileges during startup (opening config files, binding ports)
    • How do pledge/Capsicum/seccomp handle the “initialize, then restrict” pattern?
    • When exactly should you “lock down”? (see the two-stage pledge sketch after this list)
  4. What about error handling?
    • If pledge() fails, should you continue without sandboxing or exit?
    • How do you write code that gracefully degrades on systems without these features?
    • How do you log sandbox violations for debugging?
  5. Cross-platform abstraction?
    • Should you create a common API that hides the OS differences?
    • Or should you embrace the differences with #ifdef?
    • What are the trade-offs of each approach?
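
One way to answer question 3 on OpenBSD, sketched below (the /var/log path is a placeholder): pledge broadly during setup, then pledge again with fewer promises for the main loop. Later pledge calls can only drop promises, never regain them:

#include <err.h>
#include <unistd.h>

int main(void) {
    // Startup: still allowed to create files and call unveil
    if (pledge("stdio rpath wpath cpath unveil", NULL) == -1)
        err(1, "pledge");

    // ... open config, open log file, register watches ...
    if (unveil("/var/log", "rw") == -1 || unveil(NULL, NULL) == -1)
        err(1, "unveil");

    // Steady state: tighten to the bare minimum for the event loop
    if (pledge("stdio rpath wpath", NULL) == -1)
        err(1, "pledge");

    return 0;
}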

Thinking Exercise

Before coding, trace this scenario by hand:

Your file watcher needs to:

  1. Open a directory for watching
  2. Read file system events
  3. Write events to a log file
  4. Optionally: send alerts over the network (for a “premium” version)

Map to each security model:

┌─────────────────────────────────────────────────────────────────────┐
│                    OpenBSD pledge/unveil                             │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  Step 1: What promises do we need?                                  │
│          stdio (read/write to fds)                                  │
│          rpath (read files)                                         │
│          wpath (write files)                                        │
│          cpath (create files - for log rotation?)                   │
│          inet (ONLY if network alerts enabled)                      │
│                                                                      │
│  Step 2: What paths do we reveal?                                   │
│          unveil("/var/log", "rw")  - watch and log here             │
│          unveil("/etc/filewatcher.conf", "r") - config file         │
│          unveil(NULL, NULL) - lock it down                          │
│                                                                      │
│  Step 3: What happens if we try socket() without "inet" promise?    │
│          → Process receives SIGABRT, core dump created              │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────┐
│                    FreeBSD Capsicum                                  │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  Step 1: Open all needed file descriptors BEFORE cap_enter()        │
│          int dir_fd = open("/var/log", O_RDONLY|O_DIRECTORY);       │
│          int log_fd = open("/var/log/filewatcher.log", O_WRONLY);   │
│          int kq = kqueue();                                          │
│                                                                      │
│  Step 2: Limit rights on each fd                                     │
│          cap_rights_limit(dir_fd, &(CAP_READ|CAP_EVENT|CAP_LOOKUP));│
│          cap_rights_limit(log_fd, &(CAP_WRITE|CAP_SEEK));           │
│                                                                      │
│  Step 3: Enter capability mode                                       │
│          cap_enter();  // No way back!                               │
│                                                                      │
│  Step 4: What happens if we try open("/etc/passwd")?                │
│          → Returns -1, errno = ECAPMODE                              │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────┐
│                    Linux seccomp-bpf                                 │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  Step 1: Enumerate all syscalls we need (this is the hard part!)    │
│          read, write, close, fstat, mmap, mprotect,                 │
│          inotify_init1, inotify_add_watch, inotify_rm_watch,        │
│          epoll_create1, epoll_ctl, epoll_wait,                      │
│          openat (with restrictions?), exit_group, ...               │
│                                                                      │
│  Step 2: Write BPF filter program                                   │
│          For each syscall: ALLOW if in whitelist, KILL otherwise    │
│          Must handle syscall arguments for openat restrictions!     │
│                                                                      │
│  Step 3: What happens if we try socket()?                           │
│          → Process receives SIGSYS, terminated                       │
│                                                                      │
│  Challenge: How do you restrict openat() to specific paths?         │
│             BPF can't easily inspect string arguments!              │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘

Key questions from this exercise:

  • Why is OpenBSD’s model so much simpler?
  • Why does Capsicum require pre-opening all file descriptors?
  • Why is Linux’s path restriction so much harder?
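
The last question has a concrete demonstration. With libseccomp (link with -lseccomp), numeric arguments can be matched, but path strings cannot, because the filter sees only raw register values and may not dereference pointers. A sketch that allows write() solely on fd 1:

#include <errno.h>
#include <seccomp.h>   /* link with -lseccomp */
#include <unistd.h>

int main(void) {
    scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_ERRNO(EPERM));

    /* Numeric argument matching works: write() allowed only when fd == 1 */
    seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(write), 1,
                     SCMP_A0(SCMP_CMP_EQ, 1));
    seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(exit_group), 0);
    seccomp_load(ctx);

    write(1, "fd 1 allowed\n", 13);                 /* succeeds */
    if (write(2, "fd 2?\n", 6) == -1 && errno == EPERM)
        write(1, "fd 2 -> EPERM\n", 14);            /* path-level rules: impossible */
    return 0;
}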

The Interview Questions They’ll Ask

Prepare to answer these:

  1. “What’s the difference between pledge, Capsicum, and seccomp?”
    • pledge: Promise-based, operates on “promise” categories, simple strings
    • Capsicum: Capability-based, operates on file descriptors, fine-grained
    • seccomp: Filter-based, operates on syscalls with BPF, most flexible but complex
  2. “Why did OpenBSD choose the pledge model?”
    • Simplicity enables adoption (90%+ of base system is pledge’d)
    • Auditable by humans (you can read “stdio rpath” and understand it)
    • Fail-closed philosophy (violation = death, no recovery)
  3. “What are the limitations of each approach?”
    • pledge: Coarse-grained (can’t say “only read /etc/passwd”)
    • Capsicum: Requires restructuring code to pre-open descriptors
    • seccomp: Hard to restrict syscall arguments (e.g., which paths for open)
  4. “How would you sandbox a web browser?”
    • Chromium uses seccomp-bpf on Linux
    • Capsicum was designed with Chromium in mind (FreeBSD port exists)
    • This is a great real-world comparison point
  5. “What’s the attack surface reduction of each model?”
    • pledge: Reduces syscall surface to promised categories
    • Capsicum: Removes global namespace entirely after cap_enter()
    • seccomp: Reduces to explicit syscall whitelist
  6. “Can you escape these sandboxes?”
    • All have had vulnerabilities (nothing is perfect)
    • Complexity = more bugs (seccomp filters have had escapes)
    • OpenBSD’s simplicity has security benefits

Hints in Layers

Hint 1: Start with file watching (no sandbox)

Get the core functionality working first:

// Linux inotify
#include <sys/inotify.h>

int fd = inotify_init1(IN_NONBLOCK);
inotify_add_watch(fd, "/var/log", IN_CREATE | IN_MODIFY | IN_DELETE);
// Read events in a loop

// BSD kqueue
#include <sys/event.h>

int kq = kqueue();
struct kevent ev;
EV_SET(&ev, dir_fd, EVFILT_VNODE, EV_ADD | EV_ENABLE | EV_CLEAR,
       NOTE_WRITE | NOTE_DELETE | NOTE_RENAME, 0, NULL);
kevent(kq, &ev, 1, NULL, 0, NULL);
// Wait for events with kevent()

Hint 2: Add OpenBSD pledge first (simplest)

#ifdef __OpenBSD__
#include <err.h>
#include <unistd.h>

// Lock down paths FIRST: unveil() is forbidden once we pledge
// without the "unveil" promise
if (unveil(watch_path, "rw") == -1)
    err(1, "unveil");
if (unveil(NULL, NULL) == -1)  // No more unveil calls allowed
    err(1, "unveil");

// Then, before the main loop, drop to the minimal promise set:
if (pledge("stdio rpath wpath", NULL) == -1)
    err(1, "pledge");
#endif

Hint 3: FreeBSD Capsicum requires restructuring

#ifdef __FreeBSD__
#include <sys/capsicum.h>
#include <err.h>
#include <fcntl.h>

// Open EVERYTHING you need FIRST
int dir_fd = open(watch_path, O_RDONLY | O_DIRECTORY);
int log_fd = open(log_path, O_WRONLY | O_APPEND | O_CREAT, 0644);
int kq = kqueue();

// Limit capabilities on each
cap_rights_t rights;
cap_rights_init(&rights, CAP_READ, CAP_EVENT, CAP_FCNTL);
cap_rights_limit(dir_fd, &rights);

cap_rights_init(&rights, CAP_WRITE, CAP_SEEK);
cap_rights_limit(log_fd, &rights);

// Enter capability mode - no turning back!
if (cap_enter() == -1)
    err(1, "cap_enter");
#endif

Hint 4: Linux seccomp is the most complex

#ifdef __linux__
#include <sys/prctl.h>
#include <linux/seccomp.h>
#include <linux/filter.h>
#include <linux/audit.h>

// Use libseccomp for a sane API (link with -lseccomp):
#include <seccomp.h>

scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_KILL);

// Whitelist needed syscalls
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(read), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(write), 0);
seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(inotify_add_watch), 0);
// ... many more

seccomp_load(ctx);
#endif

Hint 5: Test sandbox violations

#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

// Note: under pledge (OpenBSD) or SECCOMP_RET_KILL (Linux) the process
// dies on the violating syscall instead of seeing -1, so run this check
// in a child process or use SECCOMP_RET_ERRNO while testing.
void test_sandbox(void) {
    // Try something we shouldn't be able to do
    int sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock == -1) {
        printf("GOOD: socket() blocked as expected\n");
    } else {
        printf("BAD: socket() succeeded, sandbox not working!\n");
        close(sock);
    }
}

Books That Will Help

| Topic | Book | Chapter |
|---|---|---|
| System calls fundamentals | Advanced Programming in the UNIX Environment, 3rd Edition by Stevens & Rago | Ch. 1-3 |
| OpenBSD security philosophy | Absolute OpenBSD by Michael W. Lucas | Ch. 1 + security chapters |
| FreeBSD Capsicum | Absolute FreeBSD, 3rd Edition by Michael W. Lucas | Ch. 8 |
| Linux seccomp-bpf | The Linux Programming Interface by Michael Kerrisk | Ch. 23 |
| File watching (inotify) | The Linux Programming Interface by Michael Kerrisk | Ch. 19 |
| BSD kqueue | The Design and Implementation of the FreeBSD Operating System by McKusick et al. | Ch. 6 |
| Security principles | Mastering FreeBSD and OpenBSD Security by Hope, Potter & Korff | Ch. 1-4 |
| BPF internals | BPF Performance Tools by Brendan Gregg | Ch. 2 (BPF basics) |

Project 2: Event-Driven TCP Echo Server (kqueue vs epoll)

  • File: BSD_LINUX_UNIX_VARIANTS_LEARNING_PROJECTS.md
  • Programming Language: C
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 4. The “Open Core” Infrastructure
  • Difficulty: Level 3: Advanced
  • Knowledge Area: High Performance Networking
  • Software or Tool: epoll / kqueue
  • Main Book: “The Linux Programming Interface” by Michael Kerrisk

What you’ll build: A high-performance TCP echo server handling 10,000+ concurrent connections using native event APIs—kqueue on BSD, epoll on Linux—with no abstraction libraries.

Why it teaches Unix differences: The kqueue vs epoll comparison reveals deep kernel design philosophy differences. kqueue is more general (handles files, signals, processes, timers—not just sockets) and allows batch updates. epoll is socket-focused and requires one syscall per change. Building the same server on both forces you to internalize these differences.

Core challenges you’ll face:

  • Challenge 1: kevent() batch operations vs epoll_ctl() single operations (maps to API design philosophy)
  • Challenge 2: Handling EVFILT_READ, EVFILT_WRITE vs EPOLLIN, EPOLLOUT (maps to event model differences)
  • Challenge 3: Edge-triggered vs level-triggered behavior on both systems
  • Challenge 4: Scaling to C10K connections and measuring performance differences

Key Concepts:

  • Difficulty: Intermediate
  • Time estimate: 1-2 weeks
  • Prerequisites: Socket programming basics, understanding of file descriptors

Real world outcome:

  • A server that accepts connections and echoes back whatever clients send
  • Benchmark output: “Handled 10,000 concurrent connections, 50,000 req/sec on FreeBSD kqueue” vs “45,000 req/sec on Linux epoll”
  • Performance graphs comparing both implementations under load

Learning milestones:

  1. Build blocking echo server → understand why it doesn’t scale
  2. Convert to epoll on Linux → understand event-driven I/O
  3. Port to kqueue on FreeBSD/OpenBSD → notice the cleaner API
  4. Add benchmarking with wrk or custom client → quantify the differences
  5. Try macOS kqueue → understand Darwin’s BSD heritage

Real World Outcome

When you complete this project, you’ll have a high-performance TCP echo server that handles thousands of concurrent connections using native OS event APIs.

What you’ll see running on FreeBSD with kqueue:

$ ./echo_server 8080
[echo_server] kqueue() created, fd=3
[echo_server] Listening on port 8080
[echo_server] Registered listener with EVFILT_READ
[echo_server] Entering event loop...

[14:32:01] Client connected from 192.168.1.10:52341 (fd=4)
[14:32:01] Client connected from 192.168.1.11:48923 (fd=5)
[14:32:01] Received 1024 bytes from fd=4, echoing back
[14:32:01] Client connected from 192.168.1.12:39847 (fd=6)
...
[14:32:05] Active connections: 847
[14:32:10] Active connections: 2,341
[14:32:15] Active connections: 5,892
[14:32:20] Active connections: 10,003  # C10K achieved!

# Performance stats:
[echo_server] kevent() calls: 15,234
[echo_server] Events processed: 1,247,892
[echo_server] Avg events per kevent(): 81.9
[echo_server] Throughput: 52,341 req/sec

What you’ll see running on Linux with epoll:

$ ./echo_server 8080
[echo_server] epoll_create1() returned fd=3
[echo_server] Listening on port 8080
[echo_server] Added listener to epoll with EPOLLIN
[echo_server] Entering event loop...

[14:32:01] Client connected from 192.168.1.10:52341 (fd=4)
[14:32:01] epoll_ctl(EPOLL_CTL_ADD, fd=4)  # One syscall per fd!
[14:32:01] Client connected from 192.168.1.11:48923 (fd=5)
[14:32:01] epoll_ctl(EPOLL_CTL_ADD, fd=5)
...
[14:32:20] Active connections: 10,003

# Performance stats:
[echo_server] epoll_wait() calls: 18,456
[echo_server] epoll_ctl() calls: 45,234  # More syscalls than kqueue!
[echo_server] Events processed: 1,198,234
[echo_server] Throughput: 48,721 req/sec

Benchmark comparison output:

$ ./benchmark_comparison.sh

========================================
   kqueue vs epoll Performance Test
========================================

Test: 10,000 concurrent connections, 60 seconds

BSD (FreeBSD 14) - kqueue:
  Requests/sec:     52,341
  Latency avg:      1.2ms
  Latency p99:      4.8ms
  Syscalls:         15,234 kevent()

Linux (Ubuntu 24.04) - epoll:
  Requests/sec:     48,721
  Latency avg:      1.4ms
  Latency p99:      5.2ms
  Syscalls:         63,690 (epoll_wait + epoll_ctl)

Analysis:
  - kqueue batches updates: ONE kevent() call for multiple changes
  - epoll requires one epoll_ctl() per fd modification
  - Under high connection churn, kqueue has fewer syscalls
  - Both handle C10K easily, but kqueue is more elegant

Your codebase comparison:

// BSD kqueue - batch register and wait in ONE call
struct kevent changes[MAX_EVENTS];  // What to change
struct kevent events[MAX_EVENTS];   // What happened
int nchanges = 0;

// Add multiple fds to changes array
EV_SET(&changes[nchanges++], client_fd, EVFILT_READ, EV_ADD, 0, 0, NULL);
EV_SET(&changes[nchanges++], another_fd, EVFILT_READ, EV_ADD, 0, 0, NULL);

// ONE syscall does everything!
int nevents = kevent(kq, changes, nchanges, events, MAX_EVENTS, NULL);

// ─────────────────────────────────────────────────────────────────

// Linux epoll - separate calls for modification and waiting
struct epoll_event ev, events[MAX_EVENTS];

// Each fd requires its own syscall
ev.events = EPOLLIN;
ev.data.fd = client_fd;
epoll_ctl(epfd, EPOLL_CTL_ADD, client_fd, &ev);  // Syscall 1

ev.data.fd = another_fd;
epoll_ctl(epfd, EPOLL_CTL_ADD, another_fd, &ev); // Syscall 2

// Then wait
int nevents = epoll_wait(epfd, events, MAX_EVENTS, -1);  // Syscall 3

The Core Question You’re Answering

“Why is kqueue considered technically superior to epoll, and what does this teach us about API design in operating systems?”

This project reveals a truth about Unix API design: elegance matters for performance. kqueue’s ability to batch operations into a single syscall means fewer context switches under load. But epoll works “well enough” and ships with the dominant server OS.

You’ll understand why Nginx, HAProxy, and other high-performance servers have different code paths for different OSes, and why some developers prefer BSD for networking workloads.


Concepts You Must Understand First

Stop and research these before coding:

  1. The C10K Problem
    • What is the C10K problem and why was it revolutionary in 1999?
    • Why don’t traditional threading models scale to 10K connections?
    • How do event-driven architectures solve this?
    • Reference: Dan Kegel’s C10K Paper
  2. Blocking vs Non-Blocking I/O
    • What happens when you call read() on a blocking socket with no data?
    • How does O_NONBLOCK change socket behavior?
    • What does EAGAIN/EWOULDBLOCK mean?
    • Book Reference: The Linux Programming Interface by Michael Kerrisk — Ch. 63
  3. Level-Triggered vs Edge-Triggered
    • Level-triggered: “notify while condition exists”
    • Edge-triggered: “notify when condition changes”
    • Why does edge-triggered require draining the buffer completely? (see the drain sketch after this list)
    • Which is default for kqueue? For epoll?
    • Book Reference: Linux System Programming, 2nd Edition by Robert Love — Ch. 4
  4. File Descriptors and the Kernel
    • What is a file descriptor really? (index into per-process table)
    • How does the kernel track which fds to monitor?
    • Why is select() O(n) while epoll/kqueue are O(1)?
    • Book Reference: Advanced Programming in the UNIX Environment by Stevens & Rago — Ch. 3
  5. kqueue Architecture
    • What is a kevent structure?
    • What are filters? (EVFILT_READ, EVFILT_WRITE, EVFILT_VNODE, EVFILT_TIMER…)
    • Why can kqueue batch changes?
    • Reference: Kernel Queue: Complete Guide
  6. epoll Architecture
    • What do epoll_create, epoll_ctl, epoll_wait do?
    • Why separate calls for modification and waiting?
    • What is EPOLLONESHOT and when would you use it?
    • Book Reference: The Linux Programming Interface by Michael Kerrisk — Ch. 63
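
For item 3, a sketch of the edge-triggered drain loop (handle_data is a hypothetical callback, not a library function). Skip the EAGAIN check and data already buffered will sit there forever, because the edge has already passed:

#include <errno.h>
#include <unistd.h>

/* Drain one edge-triggered readiness notification.
 * Returns 0 to keep the connection, -1 if it should be closed. */
static int drain_fd(int fd, void (*handle_data)(const char *, ssize_t)) {
    char buf[4096];
    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n > 0) { handle_data(buf, n); continue; }
        if (n == 0)
            return -1;                              /* peer closed */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return 0;                               /* fully drained */
        if (errno == EINTR)
            continue;
        return -1;                                  /* real error */
    }
}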

Questions to Guide Your Design

Before implementing, think through these:

  1. Server Architecture
    • Will you use a single-threaded event loop or multiple threads with separate event loops?
    • How will you handle the accept() of new connections?
    • Should accept() be level-triggered or edge-triggered?
  2. Event Handling
    • When a read event fires, how much data should you read?
    • What if the client sends more data than your buffer size?
    • How do you handle partial writes when the send buffer is full? (see the flush sketch after this list)
  3. Connection Lifecycle
    • How do you detect client disconnection?
    • When should you remove a fd from the event set?
    • How do you avoid use-after-free when closing connections?
  4. Performance Measurement
    • How will you count syscalls to compare the APIs?
    • How will you generate load for benchmarking?
    • What metrics matter: throughput, latency, syscall count?
  5. Error Handling
    • What happens if kqueue()/epoll_create() fails?
    • How do you handle EINTR during kevent()/epoll_wait()?
    • What if a client causes an error—crash the server or just close that connection?

Thinking Exercise

Before coding, trace this scenario by hand:

You have 1000 clients connected. 100 of them send data simultaneously.

┌─────────────────────────────────────────────────────────────────────┐
│                    kqueue Event Processing                          │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  State: 1000 fds registered with EVFILT_READ                        │
│                                                                      │
│  Step 1: 100 clients send data simultaneously                       │
│                                                                      │
│  Step 2: kevent(kq, NULL, 0, events, 1000, NULL)                    │
│          Returns: 100 events (only the ready ones)                  │
│          Syscalls so far: 1                                          │
│                                                                      │
│  Step 3: Process all 100 events, read data, echo back               │
│                                                                      │
│  Step 4: 50 clients disconnect                                       │
│          We need to remove them from kqueue                          │
│          Build array: changes[50] = {EV_DELETE for each fd}         │
│                                                                      │
│  Step 5: kevent(kq, changes, 50, events, 1000, NULL)                │
│          Removes 50 fds AND waits for new events in ONE call        │
│          Syscalls so far: 2                                          │
│                                                                      │
│  Total syscalls for this cycle: 2                                    │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────┐
│                    epoll Event Processing                            │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  State: 1000 fds registered with EPOLLIN                            │
│                                                                      │
│  Step 1: 100 clients send data simultaneously                       │
│                                                                      │
│  Step 2: epoll_wait(epfd, events, 1000, -1)                         │
│          Returns: 100 events                                         │
│          Syscalls so far: 1                                          │
│                                                                      │
│  Step 3: Process all 100 events, read data, echo back               │
│                                                                      │
│  Step 4: 50 clients disconnect                                       │
│          We need to remove them from epoll                           │
│          epoll_ctl(epfd, EPOLL_CTL_DEL, fd1, NULL) // syscall 2     │
│          epoll_ctl(epfd, EPOLL_CTL_DEL, fd2, NULL) // syscall 3     │
│          ... 48 more times ...                                       │
│          Syscalls so far: 51                                         │
│                                                                      │
│  Step 5: epoll_wait() for next batch                                │
│          Syscalls so far: 52                                         │
│                                                                      │
│  Total syscalls for this cycle: 52                                   │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘

Key insight: Under high connection churn (many connects/disconnects), kqueue’s batching advantage becomes significant. Under stable connection pools, the difference is minimal.


The Interview Questions They’ll Ask

Prepare to answer these:

  1. “Why is kqueue technically superior to epoll?”
    • Batch updates: one syscall for multiple changes
    • More generic: handles files, signals, processes, not just sockets
    • Cleaner API: changes and events can be in the same call
  2. “If kqueue is better, why does everyone use Linux?”
    • epoll is “good enough” for most workloads
    • Linux has better hardware support, more developers, larger ecosystem
    • Most applications use abstraction layers (libevent, libuv) anyway
  3. “What’s the difference between level-triggered and edge-triggered?”
    • Level: kernel keeps notifying as long as fd is ready
    • Edge: kernel notifies once when state changes from not-ready to ready
    • Edge requires you to drain the buffer completely or you’ll miss data
  4. “How would you handle 1 million connections?”
    • C10K is 20+ years old; C1M is the new challenge
    • Need multiple event loops (one per core)
    • Need to think about memory per connection
    • SO_REUSEPORT helps distribute accept() load (see the sketch after this list)
  5. “What do Nginx and HAProxy use?”
    • Both have epoll and kqueue backends
    • Code is mostly the same, event API is abstracted
    • They prove the performance difference is measurable but not critical
  6. “Why didn’t Linux just implement kqueue?”
    • NIH (Not Invented Here) syndrome
    • Different kernel architecture made direct porting hard
    • By the time kqueue was proven, epoll was already deployed
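
A hedged sketch of the drain loop from question 3, assuming a non-blocking fd registered with EPOLLIN | EPOLLET and the usual includes; handle_data() is a hypothetical handler, not part of this project:

// Edge-triggered rule: read until EAGAIN, or you will never be
// re-notified about bytes already sitting in the socket buffer.
for (;;) {
    char buf[4096];
    ssize_t n = read(fd, buf, sizeof(buf));
    if (n > 0) {
        handle_data(buf, n);       // hypothetical handler
    } else if (n == 0) {
        close(fd);                 // peer closed the connection
        break;
    } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
        break;                     // drained; wait for the next edge
    } else {
        close(fd);                 // real error
        break;
    }
}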

Hints in Layers

Hint 1: Start with a blocking echo server

Understand the baseline before optimization:

char buf[4096];
int client_fd = accept(listen_fd, NULL, NULL);
while (1) {
    ssize_t n = read(client_fd, buf, sizeof(buf));
    if (n <= 0) break;         // 0 = peer closed, -1 = error
    write(client_fd, buf, n);  // Echo back
}
close(client_fd);
// Problem: only handles ONE client at a time!

Hint 2: Non-blocking sockets are essential

int flags = fcntl(fd, F_GETFL, 0);
fcntl(fd, F_SETFL, flags | O_NONBLOCK);

// Now read() returns -1 with errno=EAGAIN instead of blocking

Hint 3: kqueue skeleton

int kq = kqueue();
struct kevent ev;

// Register listener
EV_SET(&ev, listen_fd, EVFILT_READ, EV_ADD, 0, 0, NULL);
kevent(kq, &ev, 1, NULL, 0, NULL);

while (1) {
    struct kevent events[64];
    int n = kevent(kq, NULL, 0, events, 64, NULL);

    for (int i = 0; i < n; i++) {
        int fd = (int)events[i].ident;
        if (fd == listen_fd) {
            // Accept new connection, register it for read events
            int client = accept(listen_fd, NULL, NULL);
            EV_SET(&ev, client, EVFILT_READ, EV_ADD, 0, 0, NULL);
            kevent(kq, &ev, 1, NULL, 0, NULL);
        } else if (events[i].filter == EVFILT_READ) {
            // Read from client, echo back
        }
    }
    }
}
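
The skeleton registers each new client with its own kevent() call. The batching advantage counted earlier comes from queueing changes locally and submitting them together with the next wait. A minimal sketch, where the changes array and nchanges counter are additions of this sketch:

struct kevent changes[64];   // registrations queued since the last wait
int nchanges = 0;

// In the accept path, queue instead of calling kevent() immediately:
EV_SET(&changes[nchanges++], client, EVFILT_READ, EV_ADD, 0, 0, NULL);

// At the top of the loop, ONE syscall submits all changes AND fetches events:
int n = kevent(kq, changes, nchanges, events, 64, NULL);
nchanges = 0;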

Hint 4: epoll skeleton

int epfd = epoll_create1(0);
struct epoll_event ev, events[64];

// Register listener
ev.events = EPOLLIN;
ev.data.fd = listen_fd;
epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);

while (1) {
    int n = epoll_wait(epfd, events, 64, -1);

    for (int i = 0; i < n; i++) {
        int fd = events[i].data.fd;
        if (fd == listen_fd) {
            // Accept new connection
            int client = accept(listen_fd, NULL, NULL);
            ev.events = EPOLLIN;
            ev.data.fd = client;
            epoll_ctl(epfd, EPOLL_CTL_ADD, client, &ev);  // Extra syscall!
        } else {
            // Read from client, echo back
        }
    }
}

Hint 5: Benchmark with a simple load generator

# Using netcat and yes for simple load
for i in $(seq 1 1000); do
    (yes "hello" | nc localhost 8080 &)
done

# Or use wrk for HTTP if you add HTTP parsing
wrk -t4 -c10000 -d30s http://localhost:8080/

Books That Will Help

  • I/O multiplexing fundamentals: The Linux Programming Interface by Michael Kerrisk, Ch. 63: “Alternative I/O Models”
  • kqueue deep dive: The Design and Implementation of the FreeBSD Operating System by McKusick et al., Ch. 6
  • epoll internals: Linux System Programming, 2nd Edition by Robert Love, Ch. 4
  • Non-blocking sockets: UNIX Network Programming, Volume 1 by Stevens, Fenner & Rudoff, Ch. 16
  • High-performance networking: TCP/IP Illustrated, Volume 1 by W. Richard Stevens, Ch. 17-18
  • The C10K problem: Dan Kegel’s C10K paper (online), full document
  • Event loop design: Network Programming with Go by Adam Woodbeck, Ch. 3 (concepts transfer to C)

Project 3: Build Your Own Container/Jail/Zone

  • File: BSD_LINUX_UNIX_VARIANTS_LEARNING_PROJECTS.md
  • Main Programming Language: C
  • Alternative Programming Languages: Rust, Go
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: Level 1: The “Resume Gold”
  • Difficulty: Level 3: Advanced (The Engineer)
  • Knowledge Area: OS Virtualization, Namespaces
  • Software or Tool: Linux, FreeBSD, Docker
  • Main Book: “The Linux Programming Interface” by Michael Kerrisk

What you’ll build: A minimal container runtime from scratch—using Linux namespaces+cgroups, FreeBSD jails, and illumos zones—to understand how OS-level virtualization differs fundamentally across Unix systems.

Why it teaches Unix differences: As Jessie Frazelle explains, “Jails and Zones are first-class kernel concepts. Containers are NOT—they’re just a term for combining Linux namespaces and cgroups.” Building all three reveals why FreeBSD’s jail(2) is a single syscall while Linux requires orchestrating 7+ namespace types plus cgroups.

Core challenges you’ll face:

  • Challenge 1: Linux—combining mount, PID, network, user, UTS, IPC namespaces manually (maps to “building blocks” philosophy)
  • Challenge 2: FreeBSD—single jail() syscall with jailparams (maps to “first-class concept” philosophy)
  • Challenge 3: Networking inside containers—veth pairs (Linux) vs VNET jails (FreeBSD)
  • Challenge 4: Filesystem isolation—overlay/bind mounts (Linux) vs ZFS clones (FreeBSD/illumos)
  • Challenge 5: Resource limits—cgroups v2 (Linux) vs rctl (FreeBSD)

Difficulty: Advanced. Time estimate: 3-4 weeks. Prerequisites: C programming, basic understanding of processes and filesystems.

Real world outcome:

  • Run ./mycontainer /bin/sh and get an isolated shell with its own PID 1, network stack, and filesystem view
  • Demonstrate isolation: processes inside can’t see host processes; networking is separate
  • Show the difference in complexity: ~500 lines for Linux namespace container vs ~100 lines for FreeBSD jail wrapper

Learning milestones:

  1. Linux: Create PID namespace, see process isolation → understand namespace concept
  2. Linux: Add mount namespace, overlay filesystem → understand filesystem isolation
  3. Linux: Add network namespace with veth pair → understand network virtualization
  4. FreeBSD: Create jail with single syscall → notice the dramatic simplicity difference
  5. FreeBSD: Add VNET networking to jail → understand VNET architecture
  6. Compare codebase sizes and complexity → internalize the design philosophy difference

Real World Outcome

When you complete this project, you’ll have built minimal container runtimes that demonstrate the fundamental design philosophy differences between Linux and FreeBSD.

What you’ll see on Linux (your container runtime):

$ sudo ./mycontainer run /bin/sh

[mycontainer] Creating namespaces...
[mycontainer]   PID namespace: clone(CLONE_NEWPID) - PID 1 inside!
[mycontainer]   Mount namespace: clone(CLONE_NEWNS) - isolated filesystem
[mycontainer]   UTS namespace: clone(CLONE_NEWUTS) - new hostname
[mycontainer]   Network namespace: clone(CLONE_NEWNET) - isolated network
[mycontainer]   User namespace: clone(CLONE_NEWUSER) - uid mapping
[mycontainer]   IPC namespace: clone(CLONE_NEWIPC) - isolated semaphores

[mycontainer] Setting up cgroups v2...
[mycontainer]   Memory limit: 256MB
[mycontainer]   CPU shares: 512

[mycontainer] Setting up root filesystem...
[mycontainer]   pivot_root() to /containers/alpine

[mycontainer] Setting up network...
[mycontainer]   Created veth pair: veth0 <-> container0
[mycontainer]   Container IP: 10.0.0.2/24
[mycontainer]   Host bridge: 10.0.0.1/24

[mycontainer] Dropping capabilities...
[mycontainer] Entering container...

/ # hostname
container-12345
/ # ps aux
PID   USER     TIME   COMMAND
    1 root       0:00 /bin/sh     <-- We are PID 1!
    2 root       0:00 ps aux
/ # cat /proc/1/cgroup
0::/mycontainer                   <-- Our cgroup
/ # ip addr
1: lo: <LOOPBACK,UP> mtu 65536
    inet 127.0.0.1/8
2: eth0: <BROADCAST,UP> mtu 1500
    inet 10.0.0.2/24              <-- Isolated network!
/ # exit

[mycontainer] Container exited with status 0
[mycontainer] Cleaning up namespaces and cgroups...

What you’ll see on FreeBSD (your jail runtime):

$ sudo ./myjail run /bin/sh

[myjail] Creating jail...
[myjail]   jail_set(2) with:
[myjail]     path = /jails/alpine
[myjail]     hostname = jail-12345
[myjail]     ip4.addr = 10.0.0.2

[myjail] That's it. One syscall. Jail created.
[myjail] Entering jail...

$ hostname
jail-12345
$ ps aux
USER   PID  %CPU %MEM   VSZ   RSS  TT  STAT STARTED    TIME COMMAND
root     1   0.0  0.1  4788  1524  -   SJ   14:32   0:00.01 /bin/sh
root     2   0.0  0.1  4788  1496  -   R+J  14:32   0:00.00 ps aux
$ ifconfig
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
     inet 127.0.0.1 netmask 0xff000000
jail0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
     inet 10.0.0.2 netmask 0xffffff00
$ exit

[myjail] Jail exited with status 0

Comparing your codebases:

$ wc -l linux_container.c freebsd_jail.c

   547 linux_container.c    # 500+ lines for Linux namespaces + cgroups
    98 freebsd_jail.c       # ~100 lines for FreeBSD jail

# Breakdown of Linux container complexity:
$ grep -c 'clone\|unshare' linux_container.c
12    # Many namespace operations
$ grep -c 'cgroup' linux_container.c
35    # cgroup setup is verbose
$ grep -c 'veth\|netlink' linux_container.c
48    # Network namespace setup is complex

# FreeBSD jail simplicity:
$ grep -c 'jail' freebsd_jail.c
8     # jail_set, jail_attach, jailparam_*

The visual difference:

┌────────────────────────────────────────────────────────────────────────────┐
│                        Linux Container Creation                             │
├────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  clone(CLONE_NEWPID)   ─┐                                                  │
│  clone(CLONE_NEWNS)    ─┤                                                  │
│  clone(CLONE_NEWUTS)   ─┼─► "Assemble the parts"                          │
│  clone(CLONE_NEWNET)   ─┤    7+ syscalls just for namespaces              │
│  clone(CLONE_NEWUSER)  ─┤                                                  │
│  clone(CLONE_NEWIPC)   ─┤                                                  │
│  clone(CLONE_NEWCGROUP)─┘                                                  │
│           +                                                                 │
│  cgroup_create()       ─┐                                                  │
│  write(memory.max)     ─┼─► Setup resource limits                         │
│  write(cpu.weight)     ─┘    More file operations                          │
│           +                                                                 │
│  veth_create()         ─┐                                                  │
│  netlink_addaddr()     ─┼─► Network setup via netlink                     │
│  netlink_addroute()    ─┘    Complex socket programming                    │
│           +                                                                 │
│  pivot_root()          ───► Filesystem isolation                           │
│           +                                                                 │
│  seccomp_load()        ───► Optional syscall filtering                     │
│                                                                             │
│  Result: ~500 lines of C, deeply understanding 5+ subsystems               │
│                                                                             │
└────────────────────────────────────────────────────────────────────────────┘

┌────────────────────────────────────────────────────────────────────────────┐
│                        FreeBSD Jail Creation                                │
├────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  struct jailparam params[] = {                                             │
│      { "path", "/jails/myjail" },                                          │
│      { "hostname", "myjail" },                                             │
│      { "ip4.addr", "10.0.0.2" },                                           │
│      { "vnet", "new" },  // VNET for network isolation                     │
│  };                                                                         │
│                                                                             │
│  jailparam_set(params, nparams, JAIL_CREATE | JAIL_ATTACH);                │
│                                                                             │
│  // That's it. ONE syscall. You're in the jail.                            │
│                                                                             │
│  Result: ~100 lines of C, understanding ONE subsystem                      │
│                                                                             │
└────────────────────────────────────────────────────────────────────────────┘

The Core Question You’re Answering

“Why is a ‘container’ not a thing on Linux, and what are the real implications of this design choice?”

This project will burn into your brain the most important insight about Unix system design: Linux containers are a term for a combination of primitives. FreeBSD jails and Solaris zones are first-class kernel concepts.

As Jessie Frazelle famously wrote: “Jails and Zones are first-class concepts. Containers are NOT.” This difference explains:

  • Why container escapes happen on Linux
  • Why Docker needed years to become stable
  • Why FreeBSD jails were production-ready in 2000
  • Why illumos Zones can run Linux binaries (LX branded zones)

Concepts You Must Understand First

Stop and research these before coding:

  1. Process Isolation Fundamentals
    • What does a process see? (memory space, file descriptors, PID space)
    • What is chroot() and why isn’t it enough for isolation?
    • What is “escaping” a chroot and how is it done? (a classic escape is sketched after this list)
    • Book Reference: Operating Systems: Three Easy Pieces by Arpaci-Dusseau — Part II: “Virtualization”
  2. Linux Namespaces (The Building Blocks)
    • PID namespace: What does it mean to have PID 1?
    • Mount namespace: How does the filesystem view differ?
    • Network namespace: What is a network stack?
    • User namespace: How do UID/GID mappings work?
    • UTS namespace: Just the hostname, but important!
    • IPC namespace: Semaphores, message queues, shared memory
    • Book Reference: The Linux Programming Interface by Michael Kerrisk — Ch. 28-29
  3. Linux cgroups (Resource Limits)
    • What is a cgroup hierarchy?
    • cgroups v1 vs v2: Why did Linux redesign this?
    • How do you limit memory, CPU, I/O?
    • Book Reference: How Linux Works, 3rd Edition by Brian Ward — Ch. 8
  4. FreeBSD Jails (The Integrated Approach)
    • What is the jail(2) system call?
    • What’s a jailparam and how do you set them?
    • What is VNET and why does it make jails more powerful?
    • What is rctl (resource control)?
    • Book Reference: Absolute FreeBSD, 3rd Edition by Michael W. Lucas — Ch. 12: “Jails”
  5. Filesystem Isolation
    • Linux: What is pivot_root() vs chroot()?
    • Linux: What is an overlay filesystem?
    • FreeBSD: How do nullfs mounts work?
    • Both: How does ZFS make container storage better?
    • Book Reference: The Linux Programming Interface by Michael Kerrisk — Ch. 18
  6. Network Virtualization
    • Linux: What is a veth pair? What is a bridge?
    • Linux: How does netlink work?
    • FreeBSD: What is VNET? How is it different from IP-based jails?
    • Reference: FreeBSD Handbook Ch. 17: Jails
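
To make concept 1 concrete, here is the classic chroot() escape in sketch form; it assumes the process still runs as root inside the chroot, which is the standard precondition, and the directory name is arbitrary:

#include <unistd.h>
#include <sys/stat.h>

// Classic escape: re-chroot into a subdirectory so the current working
// directory ends up OUTSIDE the new root, then climb to the real root.
int escape_chroot(void) {
    mkdir("sub", 0755);            // any directory inside the current jail
    if (chroot("sub") != 0)        // needs root; cwd is now outside the root
        return -1;
    for (int i = 0; i < 64; i++)
        chdir("..");               // walk up until we hit the real /
    return chroot(".");            // chroot to the real root: escaped
}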

Questions to Guide Your Design

Before implementing, think through these:

  1. What defines “isolation”?
    • From the container’s view: what should it be unable to see/do?
    • From the host’s view: what should be protected?
    • What’s the threat model?
  2. How do you set up the root filesystem?
    • Where do you get a minimal rootfs? (Alpine, busybox)
    • Should changes persist or be discarded? (overlay vs bind mount)
    • How do you mount /proc, /sys, /dev inside?
  3. How do you handle networking?
    • Does the container need network access?
    • How does traffic get routed between container and host?
    • Do you need NAT for outbound connections?
  4. What about resource limits?
    • How much memory should the container have?
    • Should it have limited CPU?
    • What happens when limits are exceeded?
  5. How do you enter the container?
    • Linux: clone() with flags vs unshare() + fork() (both paths are sketched after this list)
    • FreeBSD: jail_attach() vs starting a new process in jail
    • What happens to the child process?
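
For question 5, a sketch of the two Linux entry paths, reusing the stack and child_func setup from the hints below; the key subtlety is that unshare(CLONE_NEWPID) affects only children created afterwards, never the calling process:

// (a) clone(): the child starts life directly inside the new namespaces
pid_t pid = clone(child_func, stack + STACK_SIZE,
                  CLONE_NEWPID | CLONE_NEWNS | SIGCHLD, NULL);

// (b) unshare() + fork(): the caller keeps its old PID; the CHILD
//     becomes PID 1 in the new PID namespace
unshare(CLONE_NEWPID | CLONE_NEWNS);
pid_t pid2 = fork();
if (pid2 == 0)
    child_func(NULL);   // we are PID 1 here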

Thinking Exercise

Before coding, trace what Docker does on Linux:

$ strace -f docker run --rm alpine echo hello 2>&1 | grep -E 'clone|unshare|mount|pivot|cgroup'

# You'll see something like:
clone(child_stack=0x..., flags=CLONE_NEWNS|CLONE_NEWPID|...)
mount("none", "/", NULL, MS_REC|MS_PRIVATE, NULL)
mount("overlay", "/var/lib/docker/.../merged", "overlay", ...)
pivot_root(".", ".")
mount("proc", "/proc", "proc", ...)
openat(AT_FDCWD, "/sys/fs/cgroup/.../memory.max", ...)
write(3, "268435456", 9)  # 256MB memory limit
clone(child_stack=0x..., flags=CLONE_NEWNET|...)

Now trace what FreeBSD does with a jail:

$ truss jail -c name=test path=/jails/test command=/bin/sh 2>&1 | grep jail

# You'll see:
jail_set(0x..., 5, 0x3)  # THAT'S IT. One syscall.

Map the complexity:

┌─────────────────────────────────────────────────────────────────────┐
│                    What Docker Does on Linux                         │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  Layer 1: Namespace creation (7 different namespaces)               │
│                                                                      │
│  Layer 2: cgroup creation and configuration                         │
│           - Create cgroup directory                                  │
│           - Write limits to pseudo-files                             │
│           - Add process to cgroup                                    │
│                                                                      │
│  Layer 3: Filesystem setup                                           │
│           - Create overlay mount                                     │
│           - pivot_root to new root                                   │
│           - Mount /proc, /sys, /dev                                  │
│           - Mask sensitive paths                                     │
│                                                                      │
│  Layer 4: Network setup                                              │
│           - Create veth pair                                         │
│           - Move one end to container namespace                      │
│           - Configure IP addresses                                   │
│           - Set up routing                                           │
│           - Configure iptables rules                                 │
│                                                                      │
│  Layer 5: Security                                                   │
│           - Drop capabilities                                        │
│           - Install seccomp filter                                   │
│           - Set up AppArmor/SELinux profile                         │
│                                                                      │
│  TOTAL: 50+ syscalls, configuration across 5+ subsystems            │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────────┐
│                    What FreeBSD Does with Jails                      │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  Layer 1: jail_set() with parameters                                 │
│           - path: root filesystem                                    │
│           - hostname: container name                                 │
│           - ip4.addr / vnet: network config                         │
│           - (optional) resource limits via rctl                     │
│                                                                      │
│  TOTAL: 1-3 syscalls, everything is integrated                      │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘

The philosophical difference:

  • Linux: “Here are Lego blocks. Assemble them yourself. Maximum flexibility!”
  • FreeBSD: “Here’s a finished product. It works. Limited flexibility, but it’s correct.”

The Interview Questions They’ll Ask

Prepare to answer these:

  1. “What’s the difference between a container and a VM?”
    • VM: Separate kernel, hardware virtualization (hypervisor)
    • Container: Shared kernel, OS-level virtualization (namespaces/jails)
    • Container is lighter but has weaker isolation
  2. “How does Docker work under the hood on Linux?”
    • Uses clone() with namespace flags
    • Uses cgroups for resource limits
    • Uses overlay filesystem for copy-on-write layers
    • Uses pivot_root() for filesystem isolation
  3. “What is a container escape?”
    • Attacker inside container gains access to host
    • Usually through kernel vulnerabilities (shared kernel!)
    • Or misconfiguration (privileged containers, mounted docker socket)
  4. “Why are FreeBSD jails considered more secure?”
    • Single, audited subsystem vs. assembled primitives
    • Less complexity = fewer bugs
    • Jails existed since 2000, battle-tested
  5. “What is the difference between Docker and LXC/LXD?”
    • Docker: Application containers, immutable images, microservices
    • LXC/LXD: System containers, more like lightweight VMs
    • Both use the same Linux primitives underneath
  6. “How would you debug a container networking issue?”
    • Check namespace with nsenter
    • Check veth pairs with ip link
    • Check routing with ip route
    • Check iptables rules (a sample session follows this list)
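
A sample session for question 6; the PID and interface names are hypothetical:

# Inspect the container's network namespace from the host (PID 12345 assumed)
nsenter -t 12345 -n ip addr      # interfaces as the container sees them
nsenter -t 12345 -n ip route     # its routing table
ip link show type veth           # host side of the veth pairs
iptables -t nat -L -n            # NAT rules that may rewrite its traffic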

Hints in Layers

Hint 1: Start with just PID namespace

The simplest isolation—see a different PID space:

// Linux: run child_func in a new PID namespace (needs root or CAP_SYS_ADMIN)
#define STACK_SIZE (1024 * 1024)
static char stack[STACK_SIZE];   // child stack; clone() takes its TOP address

int child_pid = clone(child_func, stack + STACK_SIZE,
                      CLONE_NEWPID | SIGCHLD, NULL);

// In child_func:
printf("I am PID %d\n", getpid());  // Will print "I am PID 1"!

Hint 2: Add mount namespace for filesystem isolation

// After clone with CLONE_NEWNS:
mount("none", "/", NULL, MS_REC | MS_PRIVATE, NULL);  // Keep mount changes private to us
mount("/containers/alpine", "/containers/alpine", NULL, MS_BIND, NULL);  // pivot_root needs a mount point
chdir("/containers/alpine");
syscall(SYS_pivot_root, ".", ".");  // glibc has no pivot_root() wrapper
umount2(".", MNT_DETACH);           // Unmount old root

Hint 3: FreeBSD jail is dramatically simpler

#include <sys/jail.h>
#include <jail.h>

struct jailparam params[4];
jailparam_init(&params[0], "path");
jailparam_import(&params[0], "/jails/myjail");   // jailparam_import() assigns the value
jailparam_init(&params[1], "host.hostname");
jailparam_import(&params[1], "myjail");
jailparam_init(&params[2], "ip4.addr");
jailparam_import(&params[2], "10.0.0.2");
jailparam_init(&params[3], "persist");
jailparam_import(&params[3], NULL);              // boolean parameter: NULL means true

int jid = jailparam_set(params, 4, JAIL_CREATE | JAIL_ATTACH);
// That's it! You're now in the jail.

Hint 4: Linux network namespace needs veth pair

// This requires netlink programming or shelling out to `ip`:
char cmd[128];
system("ip link add veth0 type veth peer name container0");
snprintf(cmd, sizeof(cmd), "ip link set container0 netns %d", child_pid);
system(cmd);  // Move one end into the container's network namespace
system("ip addr add 10.0.0.1/24 dev veth0");
system("ip link set veth0 up");

// Inside container namespace (run these from the child process):
system("ip addr add 10.0.0.2/24 dev container0");
system("ip link set container0 up");
system("ip route add default via 10.0.0.1");

Hint 5: cgroups v2 setup

// Create cgroup (assumes the cgroup v2 hierarchy is mounted at /sys/fs/cgroup)
mkdir("/sys/fs/cgroup/mycontainer", 0755);

// Set memory limit (256MB = 268435456 bytes)
int fd = open("/sys/fs/cgroup/mycontainer/memory.max", O_WRONLY);
write(fd, "268435456", 9);
close(fd);

// Add process to cgroup by writing its PID to cgroup.procs
fd = open("/sys/fs/cgroup/mycontainer/cgroup.procs", O_WRONLY);
char pid_str[16];
snprintf(pid_str, sizeof(pid_str), "%d", child_pid);
write(fd, pid_str, strlen(pid_str));
close(fd);

Books That Will Help

  • Linux namespaces: The Linux Programming Interface by Michael Kerrisk, Ch. 28-29 (Process Creation)
  • Linux cgroups: How Linux Works, 3rd Edition by Brian Ward, Ch. 8
  • FreeBSD jails: Absolute FreeBSD, 3rd Edition by Michael W. Lucas, Ch. 12: “Jails”
  • Container internals: Container Security by Liz Rice, full book (O’Reilly)
  • Process isolation theory: Operating Systems: Three Easy Pieces by Arpaci-Dusseau, Part II: “Virtualization”
  • Filesystem namespaces: The Linux Programming Interface by Michael Kerrisk, Ch. 18: “Directories and Links”
  • Network namespaces: Linux Network Internals by Christian Benvenuti, Ch. 1-3
  • illumos Zones: illumos documentation (online)

Project 4: Packet Filter Firewall Configuration Tool

  • File: BSD_LINUX_UNIX_VARIANTS_LEARNING_PROJECTS.md
  • Main Programming Language: C
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: Level 3: The “Service & Support” Model
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Networking / Security
  • Software or Tool: pf / nftables
  • Main Book: “Absolute OpenBSD” by Michael W. Lucas

What you’ll build: A command-line tool that generates and applies firewall rules—using pf on OpenBSD/FreeBSD and nftables on Linux—from a common configuration format.

Why it teaches Unix differences: OpenBSD’s pf (packet filter) is legendary for its clean syntax and powerful features. Linux’s nftables (replacing iptables) has different semantics. Building a tool that targets both forces you to understand network stack differences at the kernel level.

Core challenges you’ll face:

  • Challenge 1: pf’s stateful inspection model vs nftables’ table/chain/rule hierarchy
  • Challenge 2: pf anchors vs nftables sets for dynamic rules
  • Challenge 3: NAT handling differences
  • Challenge 4: Loading rules atomically vs incrementally

Key Concepts:

  • pf fundamentals: “Absolute OpenBSD” Ch. 7 - Michael W. Lucas
  • OpenBSD pf FAQ: OpenBSD official documentation
  • nftables design: Linux nftables wiki
  • BSD networking: “TCP/IP Illustrated Vol. 1” - W. Richard Stevens (the BSD reference implementation)
  • Packet filtering theory: “Mastering FreeBSD and OpenBSD Security” Ch. 4-5 - Hope, Potter & Korff

Difficulty: Intermediate. Time estimate: 2 weeks. Prerequisites: Basic networking (TCP/IP), understanding of firewalls conceptually.

Real world outcome:

  • A tool that reads YAML like allow: {port: 22, from: 10.0.0.0/8} and outputs valid pf.conf or nftables rules
  • Apply rules and demonstrate: blocked connections fail, allowed connections succeed
  • Show the same logical policy expressed in both syntaxes (sketched below)
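
For illustration, the allow: {port: 22, from: 10.0.0.0/8} policy might compile to something like the following; both snippets are sketches of standard syntax, not verified tool output:

# pf.conf (OpenBSD/FreeBSD) -- pf rules create state by default
block in all
pass in proto tcp from 10.0.0.0/8 to any port 22

# nftables (Linux) -- state tracking must be spelled out
table inet filter {
    chain input {
        type filter hook input priority 0; policy drop;
        ct state established,related accept
        ip saddr 10.0.0.0/8 tcp dport 22 accept
    }
}

That asymmetry (pf’s implicit statefulness vs nftables’ explicit ct state rule) is exactly the kind of semantic difference your tool has to encode.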

Learning milestones:

  1. Write pf rules manually on OpenBSD → understand pf syntax and concepts
  2. Write equivalent nftables rules on Linux → notice the structural differences
  3. Build parser for common config format → abstract the similarities
  4. Generate native rules for each OS → encode the differences
  5. Test with real traffic → verify correctness

Project 5: DTrace/eBPF System Tracer

  • File: BSD_LINUX_UNIX_VARIANTS_LEARNING_PROJECTS.md
  • Main Programming Language: C
  • Alternative Programming Languages: D (DTrace), Rust, Python (BCC)
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: Level 1: The “Resume Gold”
  • Difficulty: Level 3: Advanced (The Engineer)
  • Knowledge Area: System Tracing, Performance
  • Software or Tool: DTrace, eBPF, BCC
  • Main Book: “BPF Performance Tools” by Brendan Gregg

What you’ll build: A system tracing tool that shows function call latencies in running processes—using DTrace on FreeBSD/illumos/macOS and eBPF on Linux.

Why it teaches Unix differences: DTrace originated in Solaris (now illumos) and was ported to FreeBSD and macOS. Linux answered with eBPF, which grew out of the classic BSD packet filter. Both let you instrument a running kernel without rebooting, but their models differ significantly: DTrace uses the D language; eBPF uses C compiled to bytecode and checked by a strict in-kernel verifier.

Core challenges you’ll face:

  • Challenge 1: D language scripts vs eBPF C programs (maps to “language design” philosophy)
  • Challenge 2: DTrace probes (fbt, syscall, pid) vs eBPF attach points (kprobe, tracepoint, uprobe)
  • Challenge 3: DTrace aggregations (@count, @quantize) vs eBPF maps
  • Challenge 4: Safety models—DTrace’s interpreter vs eBPF’s verifier

Difficulty: Advanced. Time estimate: 2-3 weeks. Prerequisites: Understanding of kernel/userspace boundary, basic C.

Real world outcome:

  • Run ./mytrace -p <pid> and see output like: read() latency: min=1μs avg=50μs max=2ms histogram: [1-10μs: 500] [10-100μs: 200]
  • Same tool works on FreeBSD (DTrace) and Linux (eBPF) with different backends
  • Demonstrate tracing a real application (like nginx) to find performance bottlenecks

Learning milestones:

  1. Write simple DTrace one-liner on FreeBSD → understand probe concept (see the one-liners after this list)
  2. Convert to D script with aggregations → understand D language
  3. Port to eBPF/BCC on Linux → notice the complexity increase
  4. Add histogram output → understand both aggregation models
  5. Trace real application → apply knowledge practically
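
As a starting point for milestone 1 and its Linux port, here are the standard one-liners for a read(2) latency histogram; both are well-known idioms, shown as sketches rather than output of this project:

# DTrace (FreeBSD/illumos/macOS)
dtrace -n 'syscall::read:entry { self->ts = timestamp; }
           syscall::read:return /self->ts/ {
               @lat = quantize(timestamp - self->ts); self->ts = 0;
           }'

# bpftrace (Linux)
bpftrace -e 'tracepoint:syscalls:sys_enter_read { @ts[tid] = nsecs; }
             tracepoint:syscalls:sys_exit_read /@ts[tid]/ {
                 @lat = hist(nsecs - @ts[tid]); delete(@ts[tid]);
             }'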

Project Comparison Table

  • Sandboxed Service: Intermediate, 2-3 weeks, depth ⭐⭐⭐⭐⭐ (security models), fun ⭐⭐⭐, covers OpenBSD, FreeBSD, Linux
  • Event-Driven Server: Intermediate, 1-2 weeks, depth ⭐⭐⭐⭐ (I/O architecture), fun ⭐⭐⭐⭐, covers FreeBSD, Linux, macOS
  • Container/Jail/Zone: Advanced, 3-4 weeks, depth ⭐⭐⭐⭐⭐ (isolation architecture), fun ⭐⭐⭐⭐⭐, covers Linux, FreeBSD, illumos
  • Packet Filter Tool: Intermediate, 2 weeks, depth ⭐⭐⭐⭐ (networking), fun ⭐⭐⭐, covers OpenBSD, FreeBSD, Linux
  • DTrace/eBPF Tracer: Advanced, 2-3 weeks, depth ⭐⭐⭐⭐⭐ (kernel internals), fun ⭐⭐⭐⭐, covers FreeBSD, illumos, Linux

Recommendation

Given that you want to deeply understand the differences, I recommend starting with Project 3: Build Your Own Container/Jail/Zone.

Why this project first:

  1. Maximum contrast: The difference between Linux’s 7 namespace types + cgroups vs FreeBSD’s single jail() syscall is the clearest demonstration of “building blocks” vs “first-class concept” philosophy
  2. Practical relevance: Containers are everywhere; understanding them at the kernel level makes you dangerous
  3. Forces multi-OS work: You literally cannot complete it without running multiple operating systems
  4. Foundation for others: Once you understand isolation, the security sandbox project (Project 1) becomes much clearer

Setup recommendation:

  • Use VirtualBox/VMware with FreeBSD 14, OpenBSD 7.5, and Linux (any distro)
  • Or use cloud VMs (Vultr/DigitalOcean have FreeBSD; OpenBSD requires ISO install)
  • illumos: Use OmniOS or SmartOS in VM

Final Comprehensive Project: Cross-Platform Unix Compatibility Layer

What you’ll build: A userspace compatibility library that allows programs written for one Unix to run on another—implementing syscall translation, filesystem abstraction, and API shimming. Think: a minimal “Wine for BSD” or “BSD personality for Linux.”

Why it teaches everything: This project forces you to confront EVERY difference between Unix systems:

  • Different syscall numbers and semantics
  • Different ioctl interfaces
  • Different signal behaviors
  • Different filesystem layouts and conventions
  • Different library ABIs

What you’ll build specifically:

  • A preloadable shared library (LD_PRELOAD) that intercepts syscalls (a minimal shim is sketched below)
  • Translation layer for key differences (e.g., translate kqueue calls to epoll on Linux)
  • ABI compatibility for basic programs (get ls from FreeBSD running on Linux, or vice versa)
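
A minimal sketch of the interception layer, assuming Linux/glibc; it shims only open(2) and logs before forwarding, and the file name mycompat.so is illustrative:

/* Build: cc -shared -fPIC -o mycompat.so shim.c -ldl
 * Run:   LD_PRELOAD=./mycompat.so /bin/ls            */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>
#include <sys/types.h>

typedef int (*open_fn)(const char *, int, ...);

int open(const char *path, int flags, ...) {
    static open_fn real_open;
    if (!real_open)                        // resolve libc's open once
        real_open = (open_fn)dlsym(RTLD_NEXT, "open");

    mode_t mode = 0;
    if (flags & O_CREAT) {                 // mode argument exists only with O_CREAT
        va_list ap;
        va_start(ap, flags);
        mode = va_arg(ap, mode_t);
        va_end(ap);
    }
    fprintf(stderr, "[mycompat] open(\"%s\", 0x%x)\n", path, flags);
    return real_open(path, flags, mode);   // forward to the real implementation
}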

Core challenges you’ll face:

  • Challenge 1: Syscall number mapping (same name, different numbers across OSes)
  • Challenge 2: Struct layout differences (even struct stat differs)
  • Challenge 3: Signal semantics variations
  • Challenge 4: Implementing kqueue in terms of epoll, or vice versa (a filter-mapping sketch follows this list)
  • Challenge 5: Path translation (/usr/local conventions, /proc vs /compat/linux/proc)
  • Challenge 6: Dynamic linker differences
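
For challenge 4, a sketch of the filter mapping a kqueue-on-epoll shim might start from; the shim must carry its own copies of the BSD constants, since <sys/event.h> does not exist on Linux:

#include <stdint.h>
#include <sys/epoll.h>

#define EVFILT_READ  (-1)   // values as defined in FreeBSD's <sys/event.h>
#define EVFILT_WRITE (-2)

static uint32_t kq_filter_to_epoll(int16_t filter) {
    switch (filter) {
    case EVFILT_READ:  return EPOLLIN;
    case EVFILT_WRITE: return EPOLLOUT;
    default:           return 0;   // EVFILT_SIGNAL -> signalfd, EVFILT_TIMER -> timerfd, ...
    }
}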

Key Concepts:

  • Syscall interfaces: “The Linux Programming Interface” Ch. 3 - Kerrisk + BSD man pages comparison
  • ABI compatibility: “Computer Systems: A Programmer’s Perspective” Ch. 7 - Bryant & O’Hallaron
  • Dynamic linking: “Advanced Programming in the UNIX Environment” Ch. 17 - Stevens & Rago
  • FreeBSD Linux emulation: FreeBSD Handbook - Linux Binary Compatibility
  • illumos LX zones: illumos LX branded zones - how they run Linux binaries

Difficulty: Expert. Time estimate: 2-3 months. Prerequisites: Complete at least 2-3 projects above; strong C; understanding of ELF format.

Real world outcome:

  • Run a simple FreeBSD binary on Linux (or vice versa): ./mycompat /path/to/freebsd/ls -la
  • See output showing which syscalls were translated
  • Demonstrate: “This program uses kqueue, but we’re translating it to epoll on Linux”

Learning milestones:

  1. Build syscall interception framework → understand how syscalls work at machine level
  2. Implement basic syscall translation (open, read, write, close) → understand “same but different”
  3. Implement struct translation layer → understand ABI differences
  4. Port kqueue→epoll (or reverse) → deep understanding of both
  5. Get a real program running → validate your understanding is complete

Sources