Sprint: Linux and Unix Internals Mastery - Real World Projects
Goal: Build a durable mental model of how Linux and Unix systems move from power-on to user space, from a keystroke to a syscall, and from a file name to a disk block. You will learn the contracts that make the OS predictable: the syscall boundary, the process model, virtual memory, and the VFS layer. You will be able to trace behavior in /proc, reason about why a process blocks, and explain how containers are assembled from namespaces and cgroups. By the end, you can build, debug, and explain core system behaviors instead of guessing.
Introduction
- What is Linux and Unix internals? The concrete rules, data structures, and interfaces that connect firmware, kernel, and user space.
- What problem does it solve today? It turns nondeterministic “system behavior” into explainable cause-and-effect.
- What will you build across the projects? A bootable image, a shell, a syscall tracer, a process inspector, a TTY tool, a page cache lab, a FUSE filesystem, and a mini container runtime.
- In scope: boot flow, syscalls, processes, memory, VFS, signals, terminals, namespaces, cgroups.
- Out of scope: kernel hacking, device drivers, networking internals beyond sockets, and full OS design.
Big-picture mental model (from input to disk and back):
User Input
|
v
Shell -> libc wrapper -> syscall gate
| |
| v
| Kernel boundary
| |
v v
Process model -> Scheduler -> VFS -> Page cache -> Block layer -> Device
^ |
| v
+-------------------- result, errno, data ----------------+
How to Use This Guide
- Read the Theory Primer first to build the mental model for each system layer.
- Pick a learning path that matches your background and time constraints.
- After each project, verify results against the Definition of Done and compare behavior with expected CLI output.
Prerequisites & Background Knowledge
Essential Prerequisites (Must Have)
- C basics: pointers, structs, memory allocation, and string handling.
- Shell fluency: pipes, redirection, job control, and manpages.
- Basic process vocabulary: PID, file descriptor, exit code.
- Recommended Reading: “The Linux Programming Interface” by Michael Kerrisk - Ch. 1-6.
Helpful But Not Required
- Assembly basics (you will encounter this in Project 1).
- Filesystem basics (blocks, inodes, directories) - learned in Projects 4 and 8.
- Containers overview - learned in Projects 9 and 10.
Self-Assessment Questions
- What is the difference between a file descriptor and a pathname?
- Why can a child process not change the parent working directory?
- What does it mean for a signal to be pending?
Development Environment Setup
Required Tools:
- Linux host or VM (x86_64).
- C compiler (gcc 10+ or clang 12+).
- Build tools: make and ld.
- Tracing tools: strace, ltrace.
- Debugger: gdb.
Recommended Tools:
- QEMU (for boot project).
- perf or bpftrace (for performance inspection).
- tmux (for multi-pane tracing sessions).
Testing Your Setup:
$ uname -r
6.x.y
$ strace -c true
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00 0.000000 0 1 exit_group
Time Investment
- Simple projects: 4-8 hours each
- Moderate projects: 10-20 hours each
- Complex projects: 20-40 hours each
- Total sprint: 2-4 months
Important Reality Check You will hit silent failures: buffered output, missing permissions, and races. Treat them as signals to refine your mental model, not as personal failure.
Big Picture / Mental Model
Linux and Unix internals are best viewed as layered contracts. Each layer hides complexity behind a narrow interface. When a program misbehaves, it is usually violating a contract or making a wrong assumption about a layer boundary.
+-----------------------------------------------------+
| User Space |
| - Shell, apps, libraries |
+---------------------------+-------------------------+
|
v
+---------------------------+-------------------------+
| Syscall Interface (ABI) |
| - numbers, args, return values, errno |
+---------------------------+-------------------------+
|
v
+---------------------------+-------------------------+
| Kernel Core |
| - Process model, scheduler, memory, VFS |
+---------------------------+-------------------------+
|
v
+---------------------------+-------------------------+
| Devices and Firmware |
| - block I/O, terminals, timers, boot firmware |
+-----------------------------------------------------+
Theory Primer
Chapter 1: Boot and Early Userspace
Fundamentals The boot path is a chain of responsibility from firmware to kernel to user space. Firmware (BIOS or UEFI) initializes hardware and loads a bootloader, which then loads the kernel image and optional initramfs. The kernel is not the first code executed, but it is the first general-purpose code that owns the machine. Early kernel work includes setting up memory management, drivers, and the scheduler, then mounting a root filesystem. The transition to user space happens when the kernel launches PID 1, which is responsible for starting the rest of the system. Understanding this sequence matters because failures before PID 1 require a different debugging approach than failures after user space starts.
Deep Dive Boot is a pipeline with strict handoff points and explicit contracts. Firmware performs basic hardware initialization and defines the starting environment for the bootloader. In UEFI systems, the firmware provides standardized boot services and runtime services, along with tables that describe hardware and boot options. In legacy BIOS flows, the firmware loads a boot sector from disk into memory and jumps to it. Regardless of the path, the bootloader is the piece that understands how to locate and load the kernel image, often along with an initramfs that provides temporary root filesystem contents for early user space.
The Linux kernel expects a defined boot protocol on each architecture. On x86, the boot protocol defines where the kernel image is loaded, how the command line is passed, and where the bootloader communicates memory layout, video info, and other parameters. This is crucial: the kernel cannot guess. If the bootloader fails to pass correct metadata, the kernel will misinterpret memory or fail to mount the root filesystem. After decompression and early setup, the kernel configures memory zones, initializes the interrupt descriptor table, sets up the scheduler, and probes drivers needed to access the root filesystem.
The initramfs is a compressed archive that the kernel unpacks into a temporary root. Its job is to provide minimal tools and drivers to locate the real root filesystem. The initramfs init process runs as PID 1 in early user space; it can load modules, assemble storage (for example, RAID or LUKS), and then switch to the final root using a pivot operation. Once the real root is ready, the kernel (or the initramfs init) performs a root switch and executes the real PID 1, which is typically systemd or another init system. From that point onward, the system behaves like the environment most users are familiar with: services start, loggers run, and the shell becomes available.
This chain is fragile because each step depends on the previous one. If the kernel cannot mount the root filesystem, it panics because it has no place to execute user space from. If PID 1 exits, the kernel also panics because there is no supervisor left. Debugging early boot therefore relies on kernel logs, boot parameters, and visibility into the initramfs environment. You can often diagnose issues by inspecting the kernel command line, verifying that required drivers are built-in or present in initramfs, and using emergency shell hooks in the initramfs.
A key insight is that boot is not a monolithic black box. Each stage is small, testable, and replaceable. You can run a tiny bootloader in QEMU, print directly to video memory, and halt the CPU to verify that the firmware-to-bootloader handoff works. You can alter kernel command line arguments to change root mounts and debug options. You can break the initramfs on purpose to practice recovery. This conceptual clarity is what allows systems engineers to debug “machine will not boot” with confidence instead of guesswork.
How this fits into the projects Projects 1 and 10 rely on understanding early boot, PID 1 expectations, and how user space is launched.
Definitions and key terms
- Firmware: Low-level code that initializes hardware and starts the boot chain.
- Bootloader: Program that loads the kernel and passes metadata to it.
- initramfs: Temporary root filesystem used for early user space.
- PID 1: The first user space process; it must stay alive.
- Kernel command line: Boot parameters passed to the kernel.
Mental model diagram
Power on
|
v
Firmware (BIOS/UEFI)
|
v
Bootloader
|
v
Kernel + initramfs
|
v
Early user space (initramfs init)
|
v
Real root mounted
|
v
PID 1 (init/systemd)
How it works
- Firmware initializes hardware and finds a boot target.
- Bootloader loads kernel image and initramfs, passes boot params.
- Kernel initializes CPU, memory, interrupts, and essential drivers.
- Kernel mounts temporary root and starts early user space.
- Early user space mounts real root and hands off to PID 1.
Minimal concrete example
Boot log excerpt (conceptual):
[boot] firmware OK
[boot] bootloader loads kernel + initramfs
[boot] kernel cmdline: root=/dev/sda2 ro
[boot] initramfs: loading storage drivers
[boot] switch_root -> /sbin/init
[boot] PID 1 started
Common misconceptions
- “The kernel is the first code that runs.” It is not; firmware and bootloader run first.
- “initramfs is optional.” It is optional for some systems, mandatory for many modern ones.
- “PID 1 is just another process.” It has special responsibilities and failure semantics.
Check-your-understanding questions
- Why does the kernel panic if PID 1 exits?
- What information does the bootloader provide to the kernel?
- Why can a system boot without an initramfs in some cases but not others?
Check-your-understanding answers
- The kernel has no user space supervisor left; it cannot proceed safely.
- Memory layout, boot parameters, and location of command line or initramfs.
- If all necessary drivers are built into the kernel and root is simple, initramfs may be unnecessary.
Real-world applications
- Debugging boot failures in cloud images or embedded devices.
- Building minimal appliances with a custom initramfs.
- Understanding secure boot flows.
Where you’ll apply it
- Project 1: Do-Nothing Bootloader and Kernel
- Project 10: Container Runtime (PID 1 behavior)
References
- Linux x86 boot protocol: https://docs.kernel.org/arch/x86/boot.html
- UEFI specification overview: https://uefi.org/specs/UEFI/2.10/
- “How Linux Works” by Brian Ward - Ch. 1-3
Key insights Boot is a chain of contracts, not a magic moment.
Summary If you can describe the handoff from firmware to bootloader to kernel to PID 1, you can debug the hardest “it does not boot” problems.
Homework/Exercises to practice the concept
- Draw your machine’s boot chain from firmware to login prompt.
- Identify which step owns the kernel command line and where it is stored.
Solutions to the homework/exercises
- Firmware -> bootloader -> kernel + initramfs -> PID 1 -> services -> login.
- Bootloader writes it into the protocol-defined field and points the kernel to it.
Chapter 2: System Call Interface and ABI
Fundamentals The system call interface is the only legal bridge from user space to kernel space. It is defined by the ABI: which registers hold arguments, how the syscall number is specified, and how return values and errors are communicated. Libraries like libc provide wrappers so most programs do not issue raw syscalls. This boundary exists to enforce privilege separation and provide a stable contract across kernel versions. Understanding syscalls lets you map user-level behavior to kernel actions, which is the basis of tracing, debugging, and performance diagnosis.
Deep Dive A system call is a controlled privilege transition. When user space needs a service that only the kernel can provide, it triggers an architecture-specific instruction that causes a switch to kernel mode. The CPU changes privilege level, switches stacks, and jumps to a well-defined entry point. The kernel then reads the syscall number and arguments according to the ABI. The ABI is not optional: it defines the binary contract between compiled user programs and the kernel. On Linux, this is documented by kernel and libc interfaces, and the details vary by architecture. The important takeaway is that the ABI makes syscalls a stable, testable interface even as kernel internals evolve.
Most programs do not call syscalls directly. The C library provides wrappers that validate arguments, set up registers, and handle error conventions. When a syscall fails, the kernel typically returns a negative error code, which libc converts into a -1 return value with errno set. This design allows error handling to be uniform across functions. It also allows the kernel to add new syscalls without forcing every user program to know the register-level details.
The syscall boundary is visible in tools like strace, which show each syscall and its result. This makes it possible to answer questions like “why does this program block” or “what file does it open.” It also reveals hidden costs: a simple user command can produce dozens of syscalls due to library initialization, dynamic linking, and filesystem checks. That is why tracing is such a powerful diagnostic tool. You are not guessing what the program does; you are observing the kernel-level truth.
The ABI also connects to binary formats and the loader. When you call exec to start a program, the kernel inspects the executable format (ELF on Linux). It maps segments into memory, sets up the initial stack with arguments and environment variables, and then begins execution at the entry point. The ABI specifies calling conventions, data layout, and how shared libraries are resolved. This is why a binary compiled for one architecture will not run on another: the ABI assumptions are different.
Syscalls are intentionally narrow. They do not expose kernel data structures directly, and they are designed to maintain a stable contract over time. When higher-level features are needed, they are built in user space by composing syscalls. For example, shells implement pipelines by combining fork, pipe, dup, and exec. Filesystems in user space (FUSE) are built by exposing a file operation API through a device interface. Containers are built by composing namespace and cgroup syscalls. This composability is the Unix design philosophy in action.
A common failure mode is confusing library calls with syscalls. Not every function that touches the OS is a syscall. Some functions are pure library code, and others combine multiple syscalls internally. When debugging, always trace what the kernel sees instead of assuming a direct mapping from a function name to a single syscall.
How this fits into the projects Projects 2, 3, 4, 8, and 10 depend on understanding the syscall boundary and ABI conventions.
Definitions and key terms
- Syscall: A privileged transition to the kernel to request a service.
- ABI: Application Binary Interface; binary-level contract between user space and kernel.
- libc: Standard C library providing syscall wrappers and utilities.
- errno: Thread-local error indicator set by libc when a syscall fails.
Mental model diagram
User program
|
v
libc wrapper
|
v
syscall gate (CPU switches to kernel)
|
v
kernel service
|
v
return value + errno
How it works
- User code calls a libc wrapper.
- Wrapper loads syscall number and arguments.
- CPU enters kernel mode and jumps to syscall handler.
- Kernel validates, executes, and returns a result.
- libc converts result into return value and errno.
Minimal concrete example
Syscall trace excerpt (conceptual):
openat("/etc/hostname") -> fd=3
read(fd=3, bytes=64) -> 12 bytes
close(fd=3) -> 0
Common misconceptions
- “Every C function is a syscall.” Most are not.
- “Syscalls are slow because of their names.” The overhead is the mode switch and validation.
- “ABI changes break all programs.” The ABI is designed to be stable across kernel versions.
Check-your-understanding questions
- Why do syscalls have to cross a privilege boundary?
- How does libc report syscall errors to user code?
- Why can tracing reveal performance bottlenecks?
Check-your-understanding answers
- The boundary enforces protection and prevents user code from touching hardware directly.
- It returns -1 and sets errno based on the kernel error code.
- Tracing shows exact kernel operations and their frequency.
Real-world applications
- Diagnosing file access failures and permission errors.
- Understanding why a program blocks or consumes CPU.
- Building sandboxes by limiting syscalls.
Where you’ll apply it
- Project 2: Syscall Tracer
- Project 3: Build Your Own Shell
- Project 4: Filesystem Explorer
- Project 8: FUSE Mirror Filesystem
- Project 10: Container Runtime
References
- syscall(2) man page: https://man7.org/linux/man-pages/man2/syscall.2.html
- syscalls(2) overview: https://man7.org/linux/man-pages/man2/syscalls.2.html
- “The Linux Programming Interface” by Michael Kerrisk - Ch. 3-6
Key insights Syscalls are the OS contract you can observe and trust.
Summary If you can trace syscalls, you can explain system behavior precisely.
Homework/Exercises to practice the concept
- Use a syscall tracer on a simple command and count unique syscalls.
- Identify which syscalls represent file system work and which represent process control.
Solutions to the homework/exercises
- You should see a small set of file and process syscalls repeated.
- File syscalls include open, read, write; process syscalls include fork and exec.
Chapter 3: Process Model, Scheduling, and Exec
Fundamentals A process is not just “a running program”; it is a kernel data structure that holds memory mappings, open files, credentials, and scheduling state. The kernel schedules runnable threads onto CPUs based on policy and priority. Process creation usually follows a fork-exec pattern: duplicate the process state, then replace the child with a new program image. Understanding process states and scheduling helps you explain why a process is running, sleeping, or blocked, and why it may appear unresponsive even when it is not dead.
Deep Dive The kernel represents each process (and thread) using internal structures that track identity, resources, and execution state. These structures include the process ID, parent/child relationships, open file descriptors, signal handlers, and memory mappings. When you call fork, the kernel creates a new task that initially shares many resources with the parent; modern kernels use copy-on-write so that memory is not duplicated until it is modified. This is why fork is relatively cheap and why it enables efficient process creation in shells and servers.
The exec family of syscalls replaces the current process image with a new program. The kernel loads the executable (usually ELF), maps its segments into memory, sets up the stack with arguments and environment variables, and jumps to the program entry point. At this moment, the process retains its PID and some attributes (like open file descriptors that are not marked close-on-exec) but its code, data, and stack are replaced. This separation is powerful: fork preserves process identity and relationships, while exec changes the program that runs within that identity.
Scheduling is the mechanism that decides which thread runs on which CPU. Linux uses a combination of policies. The default policy (SCHED_OTHER) is a time-sharing scheduler that aims to balance responsiveness and throughput. Real-time policies (SCHED_FIFO, SCHED_RR) provide deterministic scheduling at the cost of fairness. The scheduler tracks runnable tasks, applies priorities, and preempts running threads when higher priority work arrives. Process states explain why a task is not running: it may be runnable, sleeping in an interruptible wait, blocked in uninterruptible I/O, stopped by a signal, or a zombie waiting for its parent to reap it.
The process hierarchy matters. When a parent does not wait for a child, the child becomes a zombie after exit because the kernel must retain its exit status until the parent collects it. This is why shells and service managers must call wait. Orphaned children are reparented to PID 1, which reaps them. If PID 1 fails to reap, the system accumulates zombies. These are not active processes, but they still occupy kernel table entries. Understanding this behavior makes debugging “mysterious PIDs” much easier.
Scheduling and blocking are also closely tied to I/O and synchronization. A process waiting for disk I/O may appear “D” state (uninterruptible sleep) because the kernel cannot safely interrupt that wait. A process waiting for a user input event is usually in interruptible sleep and will respond to signals. Understanding these differences is critical when diagnosing unresponsive processes and when interpreting tools like top or ps.
Finally, the process model is the foundation for isolation. Namespaces and cgroups (discussed later) operate by changing the process view of system resources or limiting their usage. But they still rely on the same core process structures and scheduler decisions. That is why a solid process model is necessary before you can understand containers.
How this fits into the projects Projects 3, 5, 9, and 10 rely directly on the fork-exec model, process states, and scheduling behavior.
Definitions and key terms
- Task: A schedulable entity (process or thread) in the kernel.
- Zombie: A terminated process waiting to be reaped.
- Fork: Create a new process by copying the parent.
- Exec: Replace the current process image with a new program.
- Scheduling policy: Rules for CPU time allocation.
Mental model diagram
Parent process
|
| fork
v
Child process (same PID tree, new task)
|
| exec
v
New program image
How it works
- Parent calls fork; kernel creates a new task with shared memory pages.
- Child optionally modifies state, then calls exec.
- Kernel loads ELF, maps segments, sets stack, and jumps to entry point.
- Scheduler chooses runnable tasks based on policy and priority.
Minimal concrete example
Process state trace (conceptual):
PID 1042 R (running)
PID 1050 S (sleeping, waiting for input)
PID 1055 Z (zombie, waiting for parent)
Common misconceptions
- “Fork copies all memory immediately.” It uses copy-on-write.
- “A zombie uses CPU.” It does not; it only occupies a table entry.
- “Exec creates a new PID.” The PID stays the same; the program changes.
Check-your-understanding questions
- Why is fork cheap on modern kernels?
- What causes a process to become a zombie?
- Why do shells need to call wait?
Check-your-understanding answers
- Copy-on-write defers memory duplication until a write occurs.
- The process exits but the parent does not reap it.
- To collect exit status and release kernel resources.
Real-world applications
- Building shells, service managers, and job control.
- Diagnosing unresponsive processes and CPU starvation.
- Understanding how containers manage process trees.
Where you’ll apply it
- Project 3: Build Your Own Shell
- Project 5: Process Psychic
- Project 9: Cgroup Resource Governor
- Project 10: Container Runtime
References
- sched(7) man page: https://man7.org/linux/man-pages/man7/sched.7.html
- procfs process state fields: https://docs.kernel.org/filesystems/proc.html
- “Operating Systems: Three Easy Pieces” - Virtualization and CPU scheduling chapters
Key insights Processes are kernel data structures first, running programs second.
Summary Understanding fork, exec, and scheduling turns “why is it stuck” into a solvable question.
Homework/Exercises to practice the concept
- Observe process states in /proc/[pid]/stat for a running process.
- Use ps to find zombie processes and explain their parent relationship.
Solutions to the homework/exercises
- The state character indicates running, sleeping, or blocked.
- Zombies show as Z; their parent must call wait to remove them.
Chapter 4: Virtual Memory and Page Cache
Fundamentals Virtual memory gives each process the illusion of a contiguous address space. The kernel uses page tables to map virtual addresses to physical frames, enabling protection and isolation. Page faults occur when a mapping is missing or invalid; the kernel resolves them by allocating memory or loading data from disk. The page cache sits between the VFS and storage, storing recently accessed file data in memory. This is why repeated file reads become fast and why I/O performance can appear unpredictable without understanding caching.
Deep Dive Virtual memory is the contract that separates a process’s view of memory from physical RAM. Each process sees a private address space that includes code, data, heap, stack, and mapped files. The kernel maintains page tables that translate virtual addresses to physical frames. When a process accesses a virtual address, the CPU checks the page tables. If the entry is missing or marked invalid, a page fault occurs and the kernel takes over.
Page faults are not always errors. A minor fault happens when the page is already in memory but the page table needs to be updated. A major fault happens when the data must be fetched from disk, which is much slower. This distinction matters when diagnosing performance: frequent major faults indicate I/O bottlenecks, while minor faults are usually acceptable. Tools that show faults can reveal whether memory pressure or slow storage is the root cause.
Memory mappings are created by both the kernel and user programs. When a program starts, its executable segments are mapped into memory. Shared libraries are mapped on demand. When you use memory-mapped files, the file contents are mapped into the process address space and the kernel treats file I/O as page faults. This is powerful because it allows the same data to be accessed via both file I/O and memory access semantics. It also means that the page cache becomes the central buffer between storage and memory access.
The page cache stores file-backed pages in memory. When you read a file, the kernel often reads entire pages into the cache and then satisfies your read from memory. Subsequent reads can be served from cache with no disk I/O. Writes are also buffered: data is written into cached pages and later flushed to disk. This improves performance but introduces complexity: a successful write does not always mean data is on disk. The kernel may delay the actual writeback, and a crash can lose buffered data unless an explicit sync occurs. Understanding this behavior explains why benchmarks can be misleading and why databases rely on fsync.
Memory pressure triggers eviction. The kernel maintains lists of active and inactive pages and chooses victims based on access patterns. File-backed pages can be dropped if the data can be reloaded from disk; anonymous pages must be swapped or reclaimed by killing processes. When memory is scarce, the kernel may swap to disk or invoke the OOM killer. This is where performance collapses if the system starts thrashing. Recognizing the symptoms of thrashing and understanding page cache effects allows you to interpret I/O latencies correctly.
Virtual memory also enforces protection. Each page has permissions (read, write, execute). Attempting to write to a read-only page or execute data will cause a fault, which the kernel converts into a signal. This is the basis of memory safety features like W^X and stack guards. The memory system is therefore both a performance layer and a security boundary.
How this fits into the projects Projects 7 and 8 rely on virtual memory behavior, page faults, and page cache visibility.
Definitions and key terms
- Page: Fixed-size unit of memory mapping.
- Page fault: Trap when a page is missing or not permitted.
- Page cache: Kernel cache of file-backed pages.
- mmap: Map a file into a process address space.
- OOM killer: Kernel mechanism to reclaim memory by killing processes.
Mental model diagram
Virtual Address -> Page Table -> Physical Frame
| |
| v
| Page cache (file-backed)
v
Process view of memory
How it works
- Process accesses a virtual address.
- CPU checks page tables.
- If missing, kernel handles fault.
- Kernel loads or allocates a page.
- Access resumes with mapping in place.
Minimal concrete example
Memory access trace (conceptual):
read file -> page fault -> page cached
read same file -> no fault, served from cache
Common misconceptions
- “More RAM always means faster I/O.” It depends on cache behavior.
- “A write is durable after write returns.” It may be buffered.
- “A page fault is always bad.” Minor faults are normal.
Check-your-understanding questions
- What is the difference between a major and minor page fault?
- Why can a read become fast after the first access?
- What causes the OOM killer to run?
Check-your-understanding answers
- Major faults require disk I/O; minor faults only update page tables.
- The data is now in the page cache.
- The kernel cannot reclaim enough memory and must free it by killing a process.
Real-world applications
- Explaining why database benchmarks change after a warm-up.
- Diagnosing slow applications caused by swapping or cache misses.
- Designing memory-mapped file workflows.
Where you’ll apply it
- Project 7: Page Cache and mmap Lab
- Project 8: FUSE Mirror Filesystem (page cache effects)
References
- “Operating Systems: Three Easy Pieces” - Memory virtualization chapters
- “The Linux Programming Interface” - Memory mapping chapter
Key insights Virtual memory is both performance infrastructure and a protection boundary.
Summary Once you understand page faults and page cache behavior, I/O latency becomes predictable.
Homework/Exercises to practice the concept
- Measure file read times before and after cache warm-up.
- Observe page fault counters for a simple program.
Solutions to the homework/exercises
- The second read should be much faster due to cache.
- Minor faults rise with memory mapping; major faults rise with disk access.
Chapter 5: VFS, Inodes, and Virtual Filesystems
Fundamentals The Virtual File System (VFS) provides a uniform interface for files regardless of the underlying filesystem. It translates pathnames into inodes and dispatches operations to the correct filesystem driver. A directory maps names to inode numbers; the inode stores metadata and pointers to file data. File descriptors are references to open file objects, not pathnames. Virtual filesystems like procfs expose kernel data structures as files, and user-space filesystems like FUSE plug into the same VFS interface.
Deep Dive When a program opens a file, the kernel does not immediately read file data. It first resolves the pathname. Path resolution walks directories, consulting dentries (directory entry caches) and inodes to locate the file. Each component in the path is looked up in the VFS cache or read from disk if missing. This is why long paths can cause multiple metadata reads. The VFS layer abstracts these operations so that the same open, read, and write syscalls work on ext4, xfs, procfs, and more.
Once a file is opened, the kernel creates an open file description that contains the current file offset and flags. A file descriptor is just an integer reference to this open file description. Two file descriptors can refer to the same open file description (for example, after a dup), sharing offsets and flags. This is why redirection works: the shell duplicates a file descriptor onto standard output, and the program writes to a file instead of the terminal without knowing the difference.
Inodes represent the identity of a file: ownership, permissions, size, timestamps, and pointers to file data. Directory entries map names to inode numbers. This separation explains why hard links can exist: multiple directory entries can point to the same inode. Deleting a file removes a directory entry, not the inode itself. The inode is removed only when the link count drops to zero and no process has the file open. This is why a file can be deleted but still consume disk space until the last handle closes.
The VFS layer defines a set of operations that filesystems must implement. These include lookup, read, write, and attribute queries. The kernel calls these operations through function pointers in the filesystem implementation. This design allows the same syscall to work across many filesystems, which is why you can mount a FUSE filesystem and still use ls, cat, and cp without modification. The VFS also handles permissions checks and caching rules, allowing filesystem developers to focus on storage-specific logic.
Virtual filesystems expose kernel data structures as file-like interfaces. Procfs is the classic example. It does not store data on disk; instead, it generates file contents on demand when read. This makes it a powerful debugging interface, but it also means the data is ephemeral and can change between reads. When you read /proc/[pid]/stat, you are seeing a snapshot of process state at that instant.
FUSE is the user-space counterpart to this design. It provides a device interface that allows user programs to implement filesystem operations in user space. The kernel forwards VFS operations to a user-space daemon, which responds with data or errors. This is slower than in-kernel filesystems but dramatically easier to implement and safer to debug. It also allows experimentation with new filesystem features without writing kernel code.
How this fits into the projects: Projects 4, 5, and 8 depend on VFS concepts, inode identity, and virtual filesystems.
Definitions and key terms
- VFS: Virtual File System layer that abstracts filesystem operations.
- Inode: File identity and metadata structure.
- Dentry: Directory entry mapping names to inodes.
- File descriptor: Integer reference to an open file description.
- procfs: Virtual filesystem exposing kernel data structures.
Mental model diagram
Pathname -> Dentry cache -> Inode -> Filesystem ops
|
v
File data blocks
How it works
- Resolve path components into dentries and inodes.
- Create an open file description with offset and flags.
- Read/write via filesystem operations defined by VFS.
- Cache metadata and data for performance.
Minimal concrete example
Directory entry -> inode number
name: "report.txt" -> inode 192341
inode 192341 -> size, permissions, data blocks
Common misconceptions
- “The filename is stored in the file.” It is stored in the directory entry.
- “A file descriptor is a path.” It is a handle to an open file object.
- “procfs files exist on disk.” They are generated by the kernel.
Check-your-understanding questions
- Why can a deleted file still take disk space?
- What does a hard link share with the original file?
- Why can a process read /proc/self without knowing its PID?
Check-your-understanding answers
- The inode remains as long as there are open file descriptions or link count > 0.
- The inode; different names point to the same file identity.
- The kernel resolves /proc/self to the calling process automatically.
Real-world applications
- Debugging permission errors and unexpected file behavior.
- Building user-space filesystems with custom behavior.
- Understanding how tools like ls, stat, and find work.
Where you’ll apply it
- Project 4: Filesystem Explorer
- Project 5: Process Psychic
- Project 8: FUSE Mirror Filesystem
References
- VFS documentation: https://docs.kernel.org/filesystems/vfs.html
- procfs documentation: https://docs.kernel.org/filesystems/proc.html
- libfuse reference: https://github.com/libfuse/libfuse
Key insights VFS separates file identity, naming, and storage so that everything looks like a file.
Summary If you can explain inodes, dentries, and file descriptors, you can explain almost any file behavior.
Homework/Exercises to practice the concept
- Create a hard link and observe how link counts change.
- Compare /proc/self and /proc/<pid> output.
Solutions to the homework/exercises
- Link count increases; both names point to the same inode.
- They show the same process data, but one is resolved dynamically.
Chapter 6: Signals, IPC, and Terminals
Fundamentals Signals are asynchronous notifications delivered to processes by the kernel. IPC (inter-process communication) allows processes to exchange data or synchronize: pipes, FIFOs, shared memory, and message queues are common forms. Terminals are special device interfaces that convert keystrokes into bytes, sometimes applying line discipline and signal generation. Understanding how signals, IPC, and TTYs interact is essential for shells, terminal UIs, and any program that handles user input directly.
Deep Dive Signals are the kernel’s way of injecting control flow into a process. Each signal has a disposition: default action, ignore, or custom handler. Signals can be blocked, pending, or delivered. When a signal is delivered, the kernel arranges for the process to run a handler in user space. This is not a kernel callback in the classic sense; the kernel simply adjusts the process state so that when it resumes execution, it executes the handler. This explains why signal handlers must be reentrant and why only async-signal-safe operations are allowed.
Signals interact with process groups and terminals. When you press Ctrl+C in a terminal, the terminal driver does not send a literal character to the program by default. Instead, it interprets that key combination and sends SIGINT to the foreground process group. This is why shells can stop or terminate programs and why they must manage process groups carefully. If a shell does not handle SIGINT correctly, it will kill itself along with the child, which is not what the user expects.
IPC mechanisms provide different guarantees and tradeoffs. Pipes are unidirectional byte streams with kernel buffering, ideal for simple producer-consumer patterns. FIFOs extend pipes to unrelated processes through the filesystem namespace. Message queues provide discrete messages with priorities and boundaries. Shared memory is the fastest IPC because it avoids copying, but it requires explicit synchronization (such as semaphores or futexes) to avoid races. Sockets generalize IPC across the network but also work locally as a flexible communication primitive.
Terminals add another layer of complexity. The terminal driver maintains a line discipline, which by default implements canonical mode: input is buffered until a newline, echoing is enabled, and special control characters generate signals. Raw mode disables these transformations so that each keypress is delivered directly. This is how text editors and full-screen terminal UIs work. Raw mode can also disable signal generation, so programs must handle interrupts explicitly. Restoring terminal state on exit is essential to avoid leaving the user in a broken terminal.
The intersection of signals, IPC, and terminals is where many subtle bugs arise. A pipeline of processes involves pipes and process groups. A signal can interrupt a blocking read from a pipe, leading to partial reads or EINTR errors. A terminal resize sends SIGWINCH, which programs must handle to redraw properly. These interactions are not optional details; they are the behavior you must expect and handle if you build shells, TUI programs, or server daemons.
How this fits into the projects: Projects 3 and 6 rely on signal handling, pipes, and raw terminal behavior.
Definitions and key terms
- Signal: Asynchronous notification delivered to a process.
- IPC: Communication or synchronization between processes.
- TTY: Terminal device interface with line discipline.
- Canonical mode: Line-buffered terminal input.
- Raw mode: Direct, byte-by-byte terminal input.
Mental model diagram
Keyboard -> TTY driver -> (signals + bytes) -> process group
|
v
line discipline
How it works
- User input arrives at TTY driver.
- Line discipline may buffer or translate input.
- Special keys trigger signals to the foreground group.
- Processes read bytes or handle signals accordingly.
Minimal concrete example
Terminal behavior (conceptual):
Ctrl+C -> SIGINT to foreground process group
Ctrl+Z -> SIGTSTP (suspend)
Common misconceptions
- “Ctrl+C sends a character to the program.” It sends a signal by default.
- “Signals always interrupt syscalls.” Some are restartable, others are not.
- “Raw mode is just no echo.” It also changes signal and line buffering behavior.
Check-your-understanding questions
- Why must signal handlers be minimal?
- What is the difference between a pipe and shared memory?
- Why does a shell need to manage process groups?
Check-your-understanding answers
- Handlers can run at any point and must avoid unsafe operations.
- Pipes copy data through the kernel; shared memory does not.
- To control which processes receive terminal-generated signals.
Real-world applications
- Building interactive shells and TUI applications.
- Designing IPC for multi-process systems.
- Handling graceful shutdown in servers.
Where you’ll apply it
- Project 3: Build Your Own Shell
- Project 6: Raw Mode Terminal
References
- signal(7) man page: https://man7.org/linux/man-pages/man7/signal.7.html
- termios(3) man page: https://man7.org/linux/man-pages/man3/termios.3.html
- “Advanced Programming in the UNIX Environment” - Signals and terminals chapters
Key insights Signals and TTYs are the control plane of Unix processes.
Summary If you understand signals, IPC, and terminal modes, you can build robust interactive programs.
Homework/Exercises to practice the concept
- Identify which signals your shell sends when you press Ctrl+C or Ctrl+Z.
- Draw a pipe diagram for a three-command pipeline.
Solutions to the homework/exercises
- Ctrl+C sends SIGINT, Ctrl+Z sends SIGTSTP to the foreground group.
- Each pipe connects stdout of one process to stdin of the next.
Chapter 7: Namespaces and Cgroups (Isolation and Resource Control)
Fundamentals Namespaces provide isolated views of system resources, such as PIDs, mounts, and hostnames. Cgroups provide hierarchical resource control, limiting CPU, memory, and I/O usage. Containers are built by combining namespaces with cgroups and a minimal filesystem. Understanding these mechanisms reveals that containers are just processes with constrained views and budgets, not virtual machines.
Deep Dive Namespaces wrap global kernel resources in per-process views. A PID namespace gives a process its own PID tree, where it can see itself as PID 1. A mount namespace provides a separate view of the filesystem mount table. A UTS namespace isolates hostname and domain name. Network namespaces provide separate network stacks. These namespaces are created by system calls that clone or unshare a process into new namespaces, and they can be joined by other processes using setns. The critical point is that namespaces do not duplicate the kernel; they partition the view of kernel data structures.
Cgroups manage resources. A cgroup is a node in a hierarchy that controls a set of processes. Controllers enforce limits, such as CPU time, memory usage, or I/O bandwidth. The cgroup v2 design provides a unified hierarchy and consistent controller semantics. When a process belongs to a cgroup, the kernel accounts for its resource usage and enforces the limits of the group. This is how a container can be restricted to a memory budget and how systemd can manage service resource usage.
Isolation and resource control are complementary. Namespaces prevent a process from seeing or interfering with unrelated system resources. Cgroups prevent it from consuming too much of a shared resource. Both rely on the core process model and scheduler. For example, CPU throttling in cgroups is implemented by scheduling decisions that delay runnable tasks. Memory limits are enforced by reclaiming pages or invoking the OOM killer within the cgroup. This means that isolation is not just about visibility; it is about enforcement.
The container runtime uses these primitives to assemble a container: it creates a new process, moves it into new namespaces, sets up a root filesystem, mounts /proc inside the namespace, and then applies cgroup limits. The process inside the container sees a new PID tree and a new filesystem root. But the kernel is the same; the isolation is a view, not a hardware separation. This is why containers start quickly and are lightweight compared to virtual machines.
Understanding cgroup delegation is important for security. Only privileged processes can create and manage cgroups by default, but delegation allows a parent to hand off a subtree to a less privileged process. The kernel enforces containment rules to prevent escape. This is why container managers have careful logic around cgroup ownership and why some operations require root privileges.
How this fits into the projects: Projects 9 and 10 rely on namespaces, cgroups, and their interaction with process management.
Definitions and key terms
- Namespace: Isolation of a global kernel resource.
- Cgroup: Control group for resource accounting and limits.
- Container: A process with isolated namespaces and limited resources.
- setns/unshare/clone: Syscalls to manipulate namespaces.
Mental model diagram
Host kernel
|
+-- Namespace A (PID, mount, UTS)
| |
| +-- Process tree (PID 1 inside)
|
+-- Namespace B (separate view)
Cgroup tree
/ (root)
/services
/web (cpu, memory limits)
How it works
- Create a new process with namespace flags.
- Set up mounts and root filesystem inside the namespace.
- Move the process into a cgroup with limits.
- Start the target program as PID 1 inside the namespace.
Minimal concrete example
Namespace view (conceptual):
Inside container: PID 1 -> /bin/sh
Outside container: PID 32450 -> /bin/sh
Common misconceptions
- “Containers are lightweight VMs.” They share the host kernel.
- “Namespaces alone are enough.” Resource limits require cgroups.
- “cgroup limits are optional.” Without them, noisy neighbors can starve the host.
Check-your-understanding questions
- What does a PID namespace change about process IDs?
- Why does a container need its own /proc mount?
- How does cgroup v2 enforce CPU limits?
Check-your-understanding answers
- It gives processes a separate PID number space; the first process inside sees itself as PID 1.
- /proc must reflect the container’s PID view, not the host’s.
- It throttles runnable tasks based on configured CPU bandwidth.
Real-world applications
- Building container runtimes and sandboxes.
- Limiting resource usage of services in production.
- Providing isolation for untrusted workloads.
Where you’ll apply it
- Project 9: Cgroup Resource Governor
- Project 10: Container Runtime
References
- namespaces(7) man page: https://man7.org/linux/man-pages/man7/namespaces.7.html
- cgroups(7) man page: https://man7.org/linux/man-pages/man7/cgroups.7.html
- cgroup v2 documentation: https://docs.kernel.org/admin-guide/cgroup-v2.html
Key insights Containers are processes with constrained views and budgets.
Summary Namespaces and cgroups explain containers without magic: isolation plus limits.
Homework/Exercises to practice the concept
- List the namespace types available on your system.
- Identify which cgroup controllers are enabled.
Solutions to the homework/exercises
- Use ls /proc/self/ns and compare entries.
- Inspect the cgroup filesystem for controller files at the root.
Glossary
- ABI: Binary contract between compiled code and the kernel or CPU.
- Cgroup: Kernel mechanism to group and limit resource usage.
- Dentry: Directory entry mapping names to inode numbers.
- ELF: Executable and Linkable Format for binaries.
- Inode: File identity and metadata structure.
- Namespace: Kernel feature that isolates a global resource view.
- PID 1: The first user space process, responsible for system init.
- Procfs: Virtual filesystem exposing kernel data structures.
- Syscall: Controlled transition to the kernel for a service.
- TTY: Terminal device interface with line discipline.
Why Linux and Unix Internals Matter
- Modern motivation and real-world use cases: cloud servers, containers, and embedded devices depend on predictable kernel behavior.
- Real-world statistics and impact:
- Linux is used by 59.8% of websites whose operating system is known (W3Techs, Jan 2, 2026).
- Unix-like systems overall account for 90.7% of known website OS usage (W3Techs, Jan 3, 2026).
- Context and evolution: Unix introduced the process model and file abstraction; Linux scaled it across hardware and cloud.
ASCII comparison (old vs modern operations model):
Then (monolithic servers) Now (containers and clouds)
+-----------------------+ +--------------------------+
| One big OS instance | | Many isolated processes |
| Manual configuration | | Declarative orchestration|
| Few services per host | | Thousands per host |
+-----------------------+ +--------------------------+
Sources:
- https://w3techs.com/technologies/comparison/os-Linux
- https://w3techs.com/technologies/history_overview/operating_system
Concept Summary Table
| Concept Cluster | What You Need to Internalize |
|---|---|
| Boot and Early Userspace | Boot is a chain of contracts from firmware to PID 1, not a single event. |
| System Call Interface and ABI | Syscalls are the only legal bridge and the ABI defines the binary contract. |
| Process Model, Scheduling, and Exec | Processes are kernel records; fork/exec and scheduling explain states. |
| Virtual Memory and Page Cache | Address spaces are contracts; page cache explains I/O behavior. |
| VFS, Inodes, and Virtual Filesystems | Names map to inodes; VFS unifies filesystems and procfs. |
| Signals, IPC, and Terminals | Signals are async control; IPC and TTYs drive interaction. |
| Namespaces and Cgroups | Isolation is a view plus limits, not a VM. |
Project-to-Concept Map
| Project | Concepts Applied |
|---|---|
| Project 1 | Boot and Early Userspace |
| Project 2 | System Call Interface and ABI; Process Model |
| Project 3 | System Call Interface and ABI; Signals, IPC, and Terminals |
| Project 4 | VFS, Inodes, and Virtual Filesystems |
| Project 5 | Process Model, Scheduling, and Exec; VFS |
| Project 6 | Signals, IPC, and Terminals |
| Project 7 | Virtual Memory and Page Cache |
| Project 8 | VFS, Inodes, and Virtual Filesystems; System Calls |
| Project 9 | Namespaces and Cgroups; Process Model |
| Project 10 | Namespaces and Cgroups; Process Model; Boot |
Deep Dive Reading by Concept
| Concept | Book and Chapter | Why This Matters |
|---|---|---|
| Boot and Early Userspace | “How Linux Works” by Brian Ward - Ch. 1-3 | Clear, practical boot narrative. |
| System Call Interface and ABI | “The Linux Programming Interface” - Ch. 1-6 | Syscall boundary and error model. |
| Process Model, Scheduling, and Exec | “Operating Systems: Three Easy Pieces” - CPU chapters | Scheduling and process model fundamentals. |
| Virtual Memory and Page Cache | “Operating Systems: Three Easy Pieces” - Memory chapters | Page faults and caching behavior. |
| VFS, Inodes, and Virtual Filesystems | “Linux Kernel Development” - VFS chapter | Kernel-level view of VFS design. |
| Signals, IPC, and Terminals | “Advanced Programming in the UNIX Environment” - Signals and terminals | Correct control-flow mental model. |
| Namespaces and Cgroups | “The Linux Programming Interface” - namespaces/cgroups sections | Practical API usage for isolation. |
Quick Start
Day 1:
- Read Theory Primer chapters 1 and 2.
- Start Project 3 and get a simple prompt running.
Day 2:
- Validate Project 3 against the Definition of Done.
- Read Chapter 6 and practice signal handling scenarios.
Recommended Learning Paths
Path 1: The Systems Newcomer
- Project 3 -> Project 4 -> Project 5 -> Project 6 -> Project 7 -> Project 8 -> Project 1 -> Project 9 -> Project 10
Path 2: The Linux Power User
- Project 2 -> Project 3 -> Project 5 -> Project 7 -> Project 8 -> Project 9 -> Project 10
Path 3: The Container-Focused Engineer
- Project 5 -> Project 9 -> Project 10 -> Project 8 -> Project 2
Success Metrics
- You can trace a CLI command to its syscalls and explain each one.
- You can explain why a process is sleeping or blocked using /proc output.
- You can build a container-like process with isolated PID and mount namespaces.
Project Overview Table
| # | Project | Core Topics | Difficulty |
|---|---|---|---|
| 1 | Do-Nothing Bootloader and Kernel | Boot chain, firmware, early init | Advanced |
| 2 | Syscall Tracer | Syscalls, ptrace, process state | Advanced |
| 3 | Build Your Own Shell | fork/exec, pipes, signals | Intermediate |
| 4 | Filesystem Explorer | VFS, inodes, directory traversal | Intermediate |
| 5 | Process Psychic | /proc, process states, CPU usage | Intermediate |
| 6 | Raw Mode Terminal | termios, TTY, signals | Advanced |
| 7 | Page Cache and mmap Lab | page faults, cache, memory | Intermediate |
| 8 | FUSE Mirror Filesystem | VFS callbacks, user space FS | Advanced |
| 9 | Cgroup Resource Governor | cgroup v2, resource limits | Advanced |
| 10 | Container Runtime | namespaces, pivot_root, PID 1 | Expert |
Project List
The following projects guide you from basic syscall tracing to container-level isolation.
Project 1: The Do-Nothing Bootloader and Kernel
- File: P01-do-nothing-bootloader-kernel.md
- Main Programming Language: Assembly and C
- Alternative Programming Languages: Rust, Zig
- Coolness Level: See REFERENCE.md (Level 4)
- Business Potential: See REFERENCE.md (Level 1)
- Difficulty: See REFERENCE.md (Level 3)
- Knowledge Area: Boot Process / Low Level
- Software or Tool: QEMU, assembler, linker
- Main Book: “How Linux Works” by Brian Ward
What you will build: A bootable image that prints a message directly to screen memory and halts.
Why it teaches Linux internals: It reveals the exact handoff between firmware, bootloader, and kernel and makes the boot protocol tangible.
Core challenges you will face:
- Boot sector constraints -> Boot protocol understanding
- Real mode limitations -> CPU mode awareness
- Binary layout -> Linker and memory placement
Real World Outcome
You will build an image os-image.bin that boots in QEMU and displays a message.
For CLI projects - exact output:
$ qemu-system-x86_64 -drive format=raw,file=os-image.bin
# A QEMU window opens with a black screen.
# Text appears:
HELLO FROM BARE METAL
# CPU halts; nothing else happens.
The Core Question You Are Answering
“What happens before the kernel has any services to rely on?”
This project shows that the boot chain is explicit, small, and testable.
Concepts You Must Understand First
- Firmware and Bootloader Handoff
- What data does the bootloader pass to the kernel?
- Book Reference: “How Linux Works” - Ch. 1
- Real Mode vs Protected Mode
- What limitations exist before protected mode is enabled?
- Book Reference: “Operating Systems: Three Easy Pieces” - Intro
Questions to Guide Your Design
- How will you guarantee the boot signature is correct?
- Where in memory will your message be written?
Thinking Exercise
Trace the First Instruction
Explain, step by step, how the CPU finds your first instruction after power-on.
Questions to answer:
- What address does the firmware jump to?
- Why does a wrong signature prevent boot?
The Interview Questions They Will Ask
- “What is the difference between firmware and a bootloader?”
- “Why does early boot code run in real mode?”
- “What is the purpose of the boot signature?”
- “Why is PID 1 special?”
Hints in Layers
Hint 1: The Image Format The first sector must include a boot signature and code the firmware can execute.
Hint 2: Video Memory Text mode can be written by placing character and color bytes into the VGA text buffer.
Hint 3: Minimal Flow Load -> write message -> halt in an infinite loop.
Hint 4: Verification If your image does not boot, check image size, signature, and start address.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Boot flow | “How Linux Works” | Ch. 1-2 |
| CPU modes | “Operating Systems: Three Easy Pieces” | Intro |
Common Pitfalls and Debugging
Problem 1: “Black screen, no text”
- Why: Wrong load address or missing boot signature.
- Fix: Verify last two bytes are 0x55 0xAA and code starts at correct offset.
- Quick test: Use hexdump to confirm the signature at the end of the sector.
Definition of Done
- Image boots in QEMU reliably
- Message appears exactly as expected
- CPU halts without reboot loop
- Boot sector size and signature are correct
Project 2: The Syscall Tracer (Mini-strace)
- File: P02-syscall-tracer.md
- Main Programming Language: C
- Alternative Programming Languages: Rust, Go
- Coolness Level: See REFERENCE.md (Level 4)
- Business Potential: See REFERENCE.md (Level 2)
- Difficulty: See REFERENCE.md (Level 3)
- Knowledge Area: Syscalls / Process Control
- Software or Tool: ptrace
- Main Book: “The Linux Programming Interface”
What you will build: A syscall tracer that attaches to a process and logs each syscall with its result.
Why it teaches Linux internals: It forces you to interact with the syscall boundary and process states directly.
Core challenges you will face:
- Process attach/detach -> ptrace semantics
- Syscall entry/exit -> register inspection
- Signal handling -> avoid losing or misordering signals
Real World Outcome
You will run a program and see a live syscall log.
$ ./mytrace /bin/ls
[pid 22101] openat("/etc/ld.so.cache") -> fd=3
[pid 22101] openat("/lib/x86_64-linux-gnu/libc.so.6") -> fd=3
[pid 22101] getdents64(".") -> 7 entries
[pid 22101] write(1, "README.md\n") -> 10
[pid 22101] exit_group(0)
The Core Question You Are Answering
“What does the kernel actually see when a program runs?”
Concepts You Must Understand First
- Syscall ABI
- How are syscall numbers and arguments passed?
- Book Reference: “The Linux Programming Interface” - Ch. 3-4
- ptrace basics
- How does a tracer control a tracee?
- Book Reference: “The Linux Programming Interface” - process control chapters
Questions to Guide Your Design
- How will you distinguish syscall entry from syscall exit?
- How will you format arguments without full type info?
Thinking Exercise
State Machine for Tracing
Draw the tracer loop: attach -> wait -> inspect -> resume -> repeat.
Questions to answer:
- What happens if the tracee exits between waits?
- How do you avoid blocking forever?
The Interview Questions They Will Ask
- “How does strace work under the hood?”
- “What is ptrace used for besides tracing?”
- “Why do tracers need to handle signals carefully?”
- “How do you know a syscall failed?”
Hints in Layers
Hint 1: Start Simple Trace only syscall numbers and return values first.
Hint 2: Entry vs Exit Alternate between entry and exit stops to capture arguments and results.
Hint 3: Errors A negative return value usually maps to errno.
Hint 4: Debugging
Compare your output with strace -f on the same command.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Syscalls | “The Linux Programming Interface” | Ch. 3-6 |
| Process control | “Advanced Programming in the UNIX Environment” | Ch. 8 |
Common Pitfalls and Debugging
Problem 1: “Tracee never starts”
- Why: You attached but did not resume the tracee.
- Fix: Ensure the tracer always issues a resume after each stop.
- Quick test: Compare your tracer control flow with strace -f behavior.
Definition of Done
- Can trace a short command like ls
- Handles process exit without crashing
- Output is deterministic and readable
Project 3: Build Your Own Shell
- File: P03-build-your-own-shell.md
- Main Programming Language: C
- Alternative Programming Languages: Rust, Go
- Coolness Level: See REFERENCE.md (Level 3)
- Business Potential: See REFERENCE.md (Level 2)
- Difficulty: See REFERENCE.md (Level 2)
- Knowledge Area: Process Management / IPC
- Software or Tool: POSIX APIs
- Main Book: “The Linux Programming Interface”
What you will build: A minimal shell with built-ins, pipelines, and redirection.
Why it teaches Linux internals: It forces you to implement the fork-exec model and manage file descriptors.
Core challenges you will face:
- Parsing input -> tokenization and quoting
- Process control -> fork/exec/wait
- Pipes and redirection -> FD manipulation
Real World Outcome
$ ./myshell
myshell> pwd
/home/user/projects
myshell> ls -l | grep .c
main.c
myshell> echo hello > out.txt
myshell> cat out.txt
hello
myshell> exit
The Core Question You Are Answering
“How does the OS run programs and connect them together?”
Concepts You Must Understand First
- fork and exec
- Why are they separate steps?
- Book Reference: “The Linux Programming Interface” - Ch. 24-27
- File descriptors
- How does redirection work?
- Book Reference: “The Linux Programming Interface” - Ch. 4-5
Questions to Guide Your Design
- How will you handle built-ins like cd and exit?
- How will you parse quoted strings safely?
Thinking Exercise
Pipeline Trace
Trace ls | grep txt using a process diagram.
Questions to answer:
- Which process owns each pipe end?
- When should unused pipe ends be closed?
The Interview Questions They Will Ask
- “Why does cd have to be a built-in?”
- “What happens if you do not wait for children?”
- “How does a pipe connect two processes?”
- “What is a zombie process?”
Hints in Layers
Hint 1: The Loop Read a line -> parse -> decide built-in or external -> execute.
Hint 2: Execution Flow Parent spawns child; child replaces itself with the target program.
Hint 3: Redirection Adjust standard FDs before starting the child program.
Hint 4: Debugging Use your Project 2 tracer to confirm expected syscalls.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Process control | “The Linux Programming Interface” | Ch. 24-27 |
| Signals | “Advanced Programming in the UNIX Environment” | Ch. 10 |
Common Pitfalls and Debugging
Problem 1: “Shell exits on Ctrl+C”
- Why: The shell and child are in the same process group.
- Fix: Put children in their own group and forward signals.
- Quick test: Run a sleep command and press Ctrl+C; shell should survive.
Definition of Done
- Built-ins (cd, exit) work
- Pipes and redirection work
- No zombie processes remain
Project 4: Filesystem Explorer (ls -R clone)
- File: P04-filesystem-explorer.md
- Main Programming Language: C
- Alternative Programming Languages: Rust, Go
- Coolness Level: See REFERENCE.md (Level 2)
- Business Potential: See REFERENCE.md (Level 1)
- Difficulty: See REFERENCE.md (Level 2)
- Knowledge Area: Filesystem / Inodes
- Software or Tool: POSIX directory APIs
- Main Book: “The Linux Programming Interface”
What you will build: A recursive directory lister that prints permissions, owners, and sizes.
Why it teaches Linux internals: It reveals how directory entries map names to inode metadata.
Core challenges you will face:
- Directory traversal -> recursion and filtering
- Metadata formatting -> permission bits and time
- Symlink handling -> avoid loops
Real World Outcome
$ ./myls -R -l .
./
-rw-r--r-- 1 user staff 512 Jan 02 12:01 main.c
lrwxrwxrwx 1 user staff 10 Jan 02 12:02 latest -> main.c
./subdir
-rw-r--r-- 1 user staff 128 Jan 02 12:03 notes.txt
The Core Question You Are Answering
“Where is the filename actually stored?”
Concepts You Must Understand First
- Inodes vs directory entries
- What is stored in each?
- Book Reference: “The Linux Programming Interface” - Ch. 15-18
- Permission bits
- How do mode bits map to rwx strings?
- Book Reference: “The Linux Programming Interface” - Ch. 15
Questions to Guide Your Design
- How will you detect and avoid symlink loops?
- How will you map uid/gid to names?
Thinking Exercise
Metadata Walk
Pick a file and list which metadata fields are stored in its inode.
Questions to answer:
- Which fields come from the inode?
- Which fields come from the directory entry?
The Interview Questions They Will Ask
- “What is the difference between a hard link and a soft link?”
- “Why is stat separate from readdir?”
- “What is the inode number used for?”
- “Why can ls be slow on network filesystems?”
Hints in Layers
Hint 1: Start with current directory List entries, then add recursion.
Hint 2: Permissions Build the rwx string by checking mode bit flags.
Hint 3: Symlinks Do not follow symlinks by default in recursive mode.
Hint 4: Debugging
Compare your output to ls -lR on the same directory.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| File metadata | “The Linux Programming Interface” | Ch. 15 |
| Directories | “The Linux Programming Interface” | Ch. 18 |
Common Pitfalls and Debugging
Problem 1: “Infinite recursion”
- Why: Following symlinks to parent directories.
- Fix: Skip symlinks or track visited inodes.
- Quick test: Run on a directory containing a symlink to itself.
Definition of Done
- Recursion works and avoids cycles
- Permissions and ownership are correct
- Output matches ls -lR for key fields
- Handles empty directories gracefully
Project 5: Process Psychic (procfs Inspector)
- File: P05-process-psychic.md
- Main Programming Language: C or Python
- Alternative Programming Languages: Go, Rust
- Coolness Level: See REFERENCE.md (Level 3)
- Business Potential: See REFERENCE.md (Level 3)
- Difficulty: See REFERENCE.md (Level 2)
- Knowledge Area: /proc, process states
- Software or Tool: procfs
- Main Book: “The Linux Programming Interface”
What you will build: A ps-like tool that reads /proc to show process state, memory, and CPU time.
Why it teaches Linux internals: It shows how the kernel exposes process state via virtual files.
Core challenges you will face:
- Enumerating processes -> scanning /proc
- Parsing state -> /proc/[pid]/stat
- Handling races -> processes that exit mid-read
Real World Outcome
$ ./myps
PID USER STATE CPU(ms) RSS(KB) CMD
1 root S 12345 4567 /sbin/init
220 user R 120 2048 ./myps
350 user S 234 5120 bash
The Core Question You Are Answering
"How does top know what is running?"
Concepts You Must Understand First
- procfs layout
- What is stored in /proc/[pid]/stat?
- Book Reference: “The Linux Programming Interface” - procfs sections
- Process states
- What do the states R, S, D, and Z mean?
- Book Reference: “Operating Systems: Three Easy Pieces” - CPU scheduling
Questions to Guide Your Design
- How will you handle a PID directory disappearing mid-read?
- How will you convert ticks to milliseconds?
Thinking Exercise
Manual ps
Read /proc/self/stat and map fields to human-readable values.
Questions to answer:
- Which fields represent CPU time?
- Which field is the process state?
The Interview Questions They Will Ask
- "What is /proc and why is it special?"
- "How do you calculate CPU usage from /proc?"
- "What does state D mean?"
- "Why can /proc reads race with process exit?"
Hints in Layers
Hint 1: Filter PIDs
Only directory names that are all digits represent processes.
Hint 2: Parsing
/proc/[pid]/stat is space-separated but the comm field is in parentheses.
Hint 3: Timing
Use the system clock tick size to convert jiffies to ms.
Hint 4: Debugging
Compare a single PID output with ps -o for the same PID.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| procfs | “The Linux Programming Interface” | procfs sections |
| Process model | “Operating Systems: Three Easy Pieces” | CPU chapters |
Common Pitfalls and Debugging
Problem 1: “Parse errors on comm field”
- Why: The command name can contain spaces inside parentheses.
- Fix: Parse until the matching closing parenthesis first.
- Quick test: Compare with processes that have spaces in their names.
Definition of Done
- Lists at least PID, state, RSS, and command
- Handles PID exit without crashing
- Output matches ps for sample PIDs
- Runs without root privileges
Project 6: Raw Mode Terminal (TUI Core)
- File: P06-raw-mode-terminal.md
- Main Programming Language: C
- Alternative Programming Languages: Rust, Go
- Coolness Level: See REFERENCE.md (Level 3)
- Business Potential: See REFERENCE.md (Level 2)
- Difficulty: See REFERENCE.md (Level 3)
- Knowledge Area: TTY / Termios
- Software or Tool: termios
- Main Book: “The Linux Programming Interface”
What you will build: A raw-mode terminal tool that reads keystrokes byte-by-byte and draws a simple screen.
Why it teaches Linux internals: It forces you to understand terminal line discipline and signals.
Core challenges you will face:
- Terminal modes -> canonical vs raw
- Escape sequences -> cursor control
- Cleanup -> restoring terminal state
Real World Outcome
$ ./rawmode
[screen clears]
RAW MODE ACTIVE - press 'q' to quit
KEY: a (97)
KEY: Ctrl+C (3) - intercepted, not exiting
KEY: ArrowUp -> ESC [ A
The Core Question You Are Answering
“Why do Backspace and Ctrl+C work in a normal terminal?”
Concepts You Must Understand First
- Canonical mode
- Why input is line-buffered by default.
- Book Reference: “The Linux Programming Interface” - terminals chapter
- Signal generation
- How terminals generate SIGINT and SIGTSTP.
- Book Reference: “Advanced Programming in the UNIX Environment” - signals
Questions to Guide Your Design
- How will you ensure the terminal restores on crash?
- How will you decode escape sequences for arrows?
Thinking Exercise
The Stuck Terminal
Predict what happens if echo is disabled and not restored.
Questions to answer:
- What commands restore the terminal?
- Why is cleanup critical?
The Interview Questions They Will Ask
- “What is canonical mode and why does it exist?”
- “How does Ctrl+C become SIGINT?”
- “What is a pseudo-terminal?”
- “Why do TUI programs need to redraw on SIGWINCH?”
Hints in Layers
Hint 1: Save and Restore
Store the original terminal settings and restore them on exit.
Hint 2: Disable Canonical Mode
Turn off line buffering and echo for raw input.
Hint 3: Escape Sequences
Arrow keys start with ESC [; parse multi-byte sequences.
Hint 4: Debugging
If the terminal is broken, use stty sane to recover.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Terminals | “The Linux Programming Interface” | Ch. 62 |
| Signals | “Advanced Programming in the UNIX Environment” | Ch. 10 |
Common Pitfalls and Debugging
Problem 1: “Terminal stays broken after exit”
- Why: Raw mode not restored.
- Fix: Ensure cleanup runs on normal exit and on signals.
- Quick test: Use stty -a before and after.
Definition of Done
- Raw input works for letters and control keys
- Terminal state is restored on exit
- Arrow keys are decoded correctly
- Screen redraw works after resize
Project 7: Page Cache and mmap Lab
- File: P07-page-cache-mmap-lab.md
- Main Programming Language: C or Python
- Alternative Programming Languages: Rust, Go
- Coolness Level: See REFERENCE.md (Level 2)
- Business Potential: See REFERENCE.md (Level 2)
- Difficulty: See REFERENCE.md (Level 2)
- Knowledge Area: Virtual Memory / Page Cache
- Software or Tool: time, sync, /proc/meminfo
- Main Book: “Operating Systems: Three Easy Pieces”
What you will build: A repeatable experiment that measures cached vs uncached file reads and page faults.
Why it teaches Linux internals: It reveals how the page cache and memory mappings change I/O behavior.
Core challenges you will face:
- Cache warm-up -> repeated measurements
- Page fault accounting -> interpreting counters
- Reproducibility -> controlling system noise
Real World Outcome
$ ./cachelab sample.dat
cold read: 1.82s
warm read: 0.06s
minor faults: +1200
major faults: +45
The Core Question You Are Answering
“Why does the second read of a file feel instant?”
Concepts You Must Understand First
- Page cache behavior
- What is cached and when?
- Book Reference: “Operating Systems: Three Easy Pieces” - memory chapters
- Page faults
- What do major and minor faults mean?
- Book Reference: “The Linux Programming Interface” - memory mapping
Questions to Guide Your Design
- How will you separate cold and warm reads?
- How will you measure faults per run?
Thinking Exercise
Predict the Trend
Predict how read times change after repeated reads.
Questions to answer:
- Why does the first read cost more?
- What happens after a sync?
The Interview Questions They Will Ask
- “What is the page cache?”
- “What is a major page fault?”
- “Why can benchmarks be misleading?”
- “How does mmap change I/O behavior?”
Hints in Layers
Hint 1: Control the Cache
Measure cold and warm reads separately.
Hint 2: Observe Counters
Read /proc/meminfo and fault counters before and after.
Hint 3: Separate Effects
Test normal read vs mmap-based access.
Hint 4: Debugging
Repeat runs and compute averages to reduce noise.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Memory | “Operating Systems: Three Easy Pieces” | Memory chapters |
| mmap | “The Linux Programming Interface” | Memory mapping chapter |
Common Pitfalls and Debugging
Problem 1: “Results vary wildly”
- Why: Background I/O or cache activity.
- Fix: Use a quiet system or repeat measurements and average.
- Quick test: Run the test multiple times and compare variance.
Definition of Done
- Cold and warm read times are measured
- Fault counts are captured and explained
- Results are reproducible across multiple runs
- You can explain the difference in a paragraph
Project 8: The Mirror Filesystem (FUSE)
- File: P08-mirror-filesystem-fuse.md
- Main Programming Language: C or Python
- Alternative Programming Languages: Rust, Go
- Coolness Level: See REFERENCE.md (Level 4)
- Business Potential: See REFERENCE.md (Level 4)
- Difficulty: See REFERENCE.md (Level 3)
- Knowledge Area: VFS / Filesystems
- Software or Tool: libfuse
- Main Book: “Linux Kernel Development”
What you will build: A FUSE filesystem that mirrors a directory while transforming file contents (for example, reversing text).
Why it teaches Linux internals: It exposes the VFS operations the kernel expects from any filesystem.
Core challenges you will face:
- Filesystem callbacks -> open, read, write, getattr
- Permissions -> pass-through metadata
- Concurrency -> multi-threaded requests
Real World Outcome
$ ./mirrorfs real_dir/ mount_point/
$ echo "hello" > mount_point/test.txt
$ cat real_dir/test.txt
olleh
The Core Question You Are Answering
“How does the kernel support many filesystems with one API?”
Concepts You Must Understand First
- VFS operations
- What does the kernel call for open/read/write?
- Book Reference: “Linux Kernel Development” - VFS chapter
- User space filesystem boundary
- How does /dev/fuse mediate requests?
- Book Reference: libfuse documentation
Questions to Guide Your Design
- What metadata should be passed through unchanged?
- How will you handle file permissions and ownership?
Thinking Exercise
Trace a cat
List the VFS operations triggered by cat file.
Questions to answer:
- Which operation supplies file size?
- Which operation returns file contents?
The Interview Questions They Will Ask
- “Why is FUSE slower than in-kernel filesystems?”
- “What is the role of the VFS layer?”
- "What does getattr represent?"
- "How do you handle concurrent writes?"
Hints in Layers
Hint 1: Start with passthrough
Implement a mirror that forwards operations without transformation.
Hint 2: Add transformation
Only modify the data path, not metadata.
Hint 3: Handle errors
Return correct error codes for missing files.
Hint 4: Debugging
Use strace on a cat command to see expected operations.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| VFS | “Linux Kernel Development” | VFS chapter |
| FUSE | libfuse documentation | Reference |
Common Pitfalls and Debugging
Problem 1: “Permissions behave strangely”
- Why: Metadata not passed through correctly.
- Fix: Mirror uid, gid, and mode values from the underlying file.
- Quick test: Compare ls -l on the mount and the real directory.
Definition of Done
- Files can be read and written through the mount
- Transformations apply only to file contents
- Permissions and timestamps are preserved
- Unmount works cleanly
Project 9: Cgroup Resource Governor
- File: P09-cgroup-resource-governor.md
- Main Programming Language: C or Python
- Alternative Programming Languages: Go, Rust
- Coolness Level: See REFERENCE.md (Level 4)
- Business Potential: See REFERENCE.md (Level 4)
- Difficulty: See REFERENCE.md (Level 3)
- Knowledge Area: Resource Control / Cgroups
- Software or Tool: cgroup v2
- Main Book: “The Linux Programming Interface”
What you will build: A tool that launches a program inside a cgroup with CPU and memory limits and reports usage.
Why it teaches Linux internals: It shows how the kernel enforces resource budgets and exposes metrics.
Core challenges you will face:
- Cgroup hierarchy -> creating and cleaning up cgroups
- Controller settings -> CPU and memory limits
- Metrics -> reading usage files
Real World Outcome
$ ./cg-run --cpu=20% --mem=200M ./stress
[info] cgroup created: /sys/fs/cgroup/lab
[info] cpu.max = 20000 100000
[info] memory.max = 209715200
[stats] cpu.usage_usec=512345
[stats] memory.current=154320128
The Core Question You Are Answering
“How does the kernel actually enforce resource limits?”
Concepts You Must Understand First
- cgroup v2 hierarchy
- How are processes attached to a cgroup?
- Book Reference: “The Linux Programming Interface” - resource limits
- Scheduler interaction
- How does CPU throttling work?
- Book Reference: “Operating Systems: Three Easy Pieces” - scheduling
Questions to Guide Your Design
- How will you create and remove cgroup directories safely?
- How will you map friendly limits to cgroup values?
Thinking Exercise
Budget Math
Translate a CPU percentage into a cgroup cpu.max pair.
Questions to answer:
- What period should you choose?
- How does burst behavior appear?
The Interview Questions They Will Ask
- “What is the difference between cgroup v1 and v2?”
- “How does CPU throttling affect latency?”
- “What happens when memory.max is exceeded?”
- “Why are cgroups critical for containers?”
Hints in Layers
Hint 1: Use cgroup v2
Verify the cgroup2 filesystem is mounted.
Hint 2: Attach a Process
Write the PID to cgroup.procs.
Hint 3: Set Limits
Update cpu.max and memory.max before starting workload.
Hint 4: Debugging
Read cpu.stat and memory.current to confirm enforcement.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Resource limits | “The Linux Programming Interface” | Resource control sections |
| Scheduling | “Operating Systems: Three Easy Pieces” | CPU scheduling chapters |
Common Pitfalls and Debugging
Problem 1: “Limits not enforced”
- Why: Controllers not enabled or cgroup v2 not mounted.
- Fix: Verify the unified hierarchy and enable controllers.
- Quick test: Read controller lists in the cgroup root.
Definition of Done
- Can place a process into a cgroup
- CPU and memory limits are enforced
- Metrics are reported accurately
- cgroup cleanup is safe and repeatable
Project 10: The Poor Man’s Docker (Container Runtime)
- File: P10-container-runtime.md
- Main Programming Language: Go or C
- Alternative Programming Languages: Rust, Python
- Coolness Level: See REFERENCE.md (Level 5)
- Business Potential: See REFERENCE.md (Level 4)
- Difficulty: See REFERENCE.md (Level 4)
- Knowledge Area: Namespaces / Cgroups / Mounts
- Software or Tool: Linux namespaces
- Main Book: “The Linux Programming Interface”
What you will build: A minimal container runtime that launches a command in isolated PID and mount namespaces with resource limits.
Why it teaches Linux internals: It ties together process creation, namespaces, cgroups, and filesystem mounts.
Core challenges you will face:
- Namespace setup -> PID, mount, UTS
- Root filesystem -> chroot or pivot_root
- Procfs -> remount /proc inside the container
Real World Outcome
$ sudo ./mycontainer run /bin/sh
container# hostname
sandbox
container# ps
PID USER CMD
1 root /bin/sh
2 root ps
container# exit
The Core Question You Are Answering
“What is a container in kernel terms?”
Concepts You Must Understand First
- Namespaces
- Which resources do they isolate?
- Book Reference: “The Linux Programming Interface” - namespaces sections
- Cgroups
- How are limits applied to a process tree?
- Book Reference: “The Linux Programming Interface” - resource control
Questions to Guide Your Design
- How will you create a PID namespace where the child is PID 1?
- How will you populate a minimal root filesystem?
Thinking Exercise
The /proc Trap
Explain what happens if you do not mount /proc inside the container.
Questions to answer:
- What does ps show?
- Why is this misleading?
The Interview Questions They Will Ask
- “How do namespaces differ from VMs?”
- “Why is PID 1 special inside a container?”
- “What is the role of cgroups in containers?”
- “What is pivot_root used for?”
- “How does Docker isolate processes?”
Hints in Layers
Hint 1: Start with unshare
Experiment with a shell created in new namespaces.
Hint 2: Minimal Root
Use a small directory tree with a shell and libraries.
Hint 3: Mount /proc
Inside the namespace, mount procfs to get accurate process views.
Hint 4: Debugging
Compare ps output inside and outside the container.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Namespaces | “The Linux Programming Interface” | namespaces sections |
| Containers | “Container Security” by Liz Rice | Ch. 2-3 |
Common Pitfalls and Debugging
Problem 1: “ps shows host processes”
- Why: /proc is not remounted inside the namespace.
- Fix: Mount procfs inside the container namespace.
- Quick test: Check that PID 1 is the container init inside /proc.
Definition of Done
- Container has isolated PID and mount namespaces
- /proc shows only container processes
- Resource limits apply via cgroups
- Container exits cleanly without host impact
Project Comparison Table
| Project | Difficulty | Time | Depth of Understanding | Fun Factor |
|---|---|---|---|---|
| 1. Bootloader | Advanced | Weekend | Boot chain | High |
| 2. Syscall Tracer | Advanced | 1 week | Syscalls and tracing | High |
| 3. Shell | Intermediate | 1 week | Process control | High |
| 4. Filesystem Explorer | Intermediate | Weekend | VFS and inodes | Medium |
| 5. Process Psychic | Intermediate | Weekend | /proc and states | Medium |
| 6. Raw Terminal | Advanced | 1 week | TTY and signals | High |
| 7. Page Cache Lab | Intermediate | Weekend | Memory and cache | Medium |
| 8. FUSE Mirror | Advanced | 1-2 weeks | VFS callbacks | High |
| 9. Cgroup Governor | Advanced | 1 week | Resource limits | High |
| 10. Container Runtime | Expert | 2 weeks | Isolation stack | Very High |
Recommendation
If you are new to Linux internals: Start with Project 3 for a practical view of fork/exec and signals.
If you are a backend engineer: Start with Project 5 to learn how /proc exposes system truth.
If you want containers: Focus on Projects 9 and 10 after reading the Namespaces chapter.
Final Overall Project
Final Overall Project: The System Supervisor
The Goal: Combine Project 3 (shell), Project 5 (process inspector), and Project 10 (container runtime) into a minimal init-like supervisor.
- Boot into a minimal system image.
- Start a containerized shell as the main service.
- Monitor the service using procfs data and restart on failure.
Success Criteria: The supervisor starts, launches the service, reports status, and restarts it after a crash.
From Learning to Production
| Your Project | Production Equivalent | Gap to Fill |
|---|---|---|
| Syscall Tracer | strace | Robust argument decoding and formatting |
| Shell | bash/zsh | Job control, scripting, globbing |
| Process Psychic | top/ps | Performance optimizations and UI |
| FUSE Mirror | sshfs/encfs | Security, caching, consistency |
| Cgroup Governor | systemd | Policy management, delegation |
| Container Runtime | runc | Full OCI runtime spec support |
Summary
This learning path covers Linux and Unix internals through 10 hands-on projects.
| # | Project Name | Main Language | Difficulty | Time Estimate |
|---|---|---|---|---|
| 1 | Bootloader | Assembly/C | Advanced | Weekend |
| 2 | Syscall Tracer | C | Advanced | 1 week |
| 3 | Shell | C | Intermediate | 1 week |
| 4 | Filesystem Explorer | C | Intermediate | Weekend |
| 5 | Process Psychic | C/Python | Intermediate | Weekend |
| 6 | Raw Terminal | C | Advanced | 1 week |
| 7 | Page Cache Lab | C/Python | Intermediate | Weekend |
| 8 | FUSE Mirror | C/Python | Advanced | 1-2 weeks |
| 9 | Cgroup Governor | C/Python | Advanced | 1 week |
| 10 | Container Runtime | Go/C | Expert | 2 weeks |
Expected Outcomes
- You can trace syscalls and map them to kernel behavior.
- You can explain file identity via inodes and VFS.
- You can build a minimal container from namespaces and cgroups.
Additional Resources and References
Standards and Specifications
- POSIX Base Specifications (Issue 7, 2018): https://pubs.opengroup.org/onlinepubs/9699919799/
- UEFI Specification: https://uefi.org/specs/UEFI/2.10/
- ELF Object File Format (gABI): https://gabi.xinuos.com/elf/
Industry Analysis
- W3Techs Linux usage statistics: https://w3techs.com/technologies/comparison/os-Linux
- W3Techs OS historical trends: https://w3techs.com/technologies/history_overview/operating_system
Books
- “The Linux Programming Interface” by Michael Kerrisk - System call and process fundamentals
- “Operating Systems: Three Easy Pieces” by Remzi and Andrea Arpaci-Dusseau - Process and memory models
- “Advanced Programming in the UNIX Environment” by Stevens and Rago - Signals, terminals, and IPC
- “How Linux Works” by Brian Ward - Boot and user space overview