
LEARN LINUX UNIX INTERNALS DEEP DIVE

Learn Linux and Unix Internals: From Boot to User Space

Goal: Deeply understand the Linux and Unix operating systems by traversing the entire stack—from the first instruction the CPU executes at boot, through the kernel’s management of processes and memory, down to the filesystem structures on disk and the shell that interprets your commands.


Why Linux & Unix Internals Matter

Linux runs the world. From the Android phone in your pocket to the servers powering the internet, and likely the embedded device in your car, the Unix philosophy and Linux kernel are ubiquitous.

Understanding internals transforms you from a “user” of APIs to a “master” of the system. You stop guessing why a process crashed or why I/O is slow. You know exactly which data structure in the kernel is bottlenecked, or which signal wasn’t handled.

You will move from:

  • “I run ls to list files.”
  • To: “I know ls calls getdents64(), reads struct linux_dirent64 entries, and uses stat() to retrieve inode metadata from the filesystem driver.”

Core Concept Analysis

1. The Rings of Power (Kernel vs. User Space)

On x86, the CPU enforces a strict boundary between “Kernel Mode” (Ring 0) and “User Mode” (Ring 3). Other architectures name their privilege levels differently, but the split is the same.

      +-----------------------------------------+
      |               USER SPACE                |
      |  (Applications: Shell, Browser, ls)     |
      |                                         |
      |   Restricted Access: No hardware I/O    |
      +--------------------+--------------------+
                           | System Calls (INT 0x80 / syscall)
                           v
      +--------------------+--------------------+
      |              KERNEL SPACE               |
      |  (Process Mgmt, VFS, Network Stack)     |
      |                                         |
      |   Full Access: Hardware, Memory, CPU    |
      +-----------------------------------------+
                           | Drivers
                           v
      +-----------------------------------------+
      |                HARDWARE                 |
      |  (CPU, RAM, Disk, Network Card, GPU)    |
      +-----------------------------------------+
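
As a small illustration of crossing that boundary, the sketch below asks the kernel for the process ID through the generic syscall() entry point instead of the dedicated libc wrapper. The helper name raw_getpid is hypothetical; syscall() and SYS_getpid are the names glibc provides on Linux.

```c
#include <sys/syscall.h>   /* SYS_getpid */
#include <unistd.h>        /* syscall() */

/* Hypothetical helper: issue the getpid system call by number.
 * The call traps into the kernel and returns the caller's PID. */
long raw_getpid(void)
{
    return syscall(SYS_getpid);
}
```

The result should match what getpid() returns; running the program under strace shows the same system call either way.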

2. The Process Life Cycle

A process isn’t just a running program. It’s a kernel data structure (task_struct in Linux).

       fork()              exec()             exit()
[Parent] ----> [Child] ------------> [New Program] ----> [Zombie]
                  |                       ^                 |
                  | (Copy of Parent)      | (Replaces Memory)|
                  |                       |                 |
                  +-----------------------+                 |
                                                            v
                                                       wait() by Parent
                                                       (Zombie Reaped)

3. The Filesystem Abstraction (VFS)

“Everything is a file” is the Unix mantra. The Virtual Filesystem (VFS) provides a unified interface.

 User App:  read(fd, buf, len)
                  |
 System Call: sys_read()
                  |
 VFS Layer: (Common Interface)
      +-----------+-----------+-----------+
      |           |           |           |
    [ext4]      [xfs]       [proc]      [devtmpfs]
      |           |           |           |
    Disk        Disk        RAM         Device
   Inodes      Inodes     Structs      Drivers

4. The Boot Sequence

How does a computer wake up?

  1. BIOS/UEFI: Initializes hardware, finds bootloader.
  2. Bootloader (GRUB): Loads Kernel image (vmlinuz) and Init RAM Disk (initrd) into memory.
  3. Kernel Start: Probes hardware, mounts root filesystem.
  4. Init Process: The first process (PID 1) starts (systemd or sysvinit).
  5. User Space: Login prompts, graphical interfaces.


Concept Summary Table

Concept Cluster What You Need to Internalize
System Calls The only way user space talks to the kernel. A function call that triggers a CPU interrupt/trap to switch rings.
Processes (PID) An instance of a program. Has memory, file descriptors, and a state. Created via fork().
File Descriptors Integers representing open files/sockets. Indices into a per-process table pointing to kernel file structs.
Inodes The identity of a file. Contains metadata (permissions, size, blocks) but not the name.
Signals Software interrupts. The kernel notifies a process of an event (Ctrl+C -> SIGINT, Segfault -> SIGSEGV).
Permissions rwx (Read/Write/Execute) for User/Group/Others. Checked by the kernel at open/exec time.

Deep Dive Reading by Concept

Foundation & Architecture

Concept Book & Chapter
Architecture “The Linux Programming Interface” by Michael Kerrisk — Ch. 1-3 (Fundamentals)
Boot Process “How Linux Works” by Brian Ward — Ch. 1 (The Big Picture)

Processes & Signals

Concept Book & Chapter
Process API “The Linux Programming Interface” by Michael Kerrisk — Ch. 24-28 (Creation, Termination, Execution)
Signals “Advanced Programming in the UNIX Environment” (APUE) — Ch. 10 (Signals)

Filesystems & I/O

Concept Book & Chapter
VFS & I/O “Linux Kernel Development” by Robert Love — Ch. 13 (The Virtual Filesystem)
Directories “C Programming: A Modern Approach” (or generic) — How dirent.h works

Essential Reading Order

  1. Foundation:
    • “How Linux Works” Ch. 1-3 (Overview)
    • “The Linux Programming Interface” Ch. 4 (File I/O)
  2. Process Mastery:
    • “The Linux Programming Interface” Ch. 24 (Process Creation)
    • “APUE” Ch. 8 (Process Control)

Project List


Project 1: The “Do-Nothing” Bootloader & Kernel

  • File: LEARN_LINUX_UNIX_INTERNALS_DEEP_DIVE.md
  • Main Programming Language: Assembly (x86) & C
  • Alternative Programming Languages: Rust
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Boot Process / Low Level
  • Software or Tool: QEMU, NASM/GAS, GCC
  • Main Book: “Operating Systems: Three Easy Pieces” (Virtualization Part)

What you’ll build: A minimal “kernel” that boots from a disk image (in QEMU), prints “Hello from Bare Metal!” to the VGA video memory, and halts. No Linux, no libraries, just you and the hardware.

Why it teaches boot process: You assume Linux “just starts.” This project forces you to be the one starting it. You’ll see the magic number 0xAA55 that marks a bootable disk, and you’ll understand what “Ring 0” actually feels like.

Core challenges you’ll face:

  • The 512-byte limit: The BIOS loads only the first 512-byte sector, and the last two bytes are reserved for the boot signature, so your code and data must fit in 510 bytes.
  • VGA Memory Mapping: Writing to 0xB8000 to make text appear on screen.
  • Linking: Understanding how code is arranged in the binary file.

Key Concepts:

  • Boot Sector: MBR format and the 0x7C00 load address.
  • Real Mode vs Protected Mode: 16-bit vs 32-bit addressing.
  • Bare Metal IO: Writing directly to memory addresses.

Difficulty: Advanced (Conceptually) / Intermediate (Lines of code) Time estimate: Weekend Prerequisites: Basic Assembly knowledge (registers, mov instructions).


Real World Outcome

You will create a file boot.bin. When you run it in an emulator, you will see your message on an otherwise black screen.

Example Output:

$ nasm -f bin boot.asm -o boot.bin
$ qemu-system-x86_64 -drive format=raw,file=boot.bin

# A QEMU window opens.
# Inside, white text on black background:
# "Hello from Bare Metal!"

The Core Question You’re Answering

“What happens before main()?”

Most programmers live inside an OS. This project answers: “How does the computer get from power-on to executing code?”

Concepts You Must Understand First

Stop and research these before coding:

  1. BIOS vs UEFI (We will use BIOS/Legacy mode for simplicity)
    • What address does BIOS jump to? (Answer: 0x7C00)
  2. VGA Text Mode
    • What is the memory address for the screen buffer?
    • How is a character + color stored? (2 bytes)
    • Reference: OSDev Wiki (“Printing to Screen”)
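
To make the two-byte cell layout concrete, here is a minimal C sketch. The function names vga_cell and vga_puts are hypothetical. The buffer is passed in as a parameter so the code can be tried on an ordinary array; on bare metal it would point at 0xB8000.

```c
#include <stdint.h>

/* Pack one VGA text-mode cell: low byte = character code,
 * high byte = attribute (foreground color in the low nibble,
 * background in the high nibble). */
uint16_t vga_cell(char ch, uint8_t attr)
{
    return (uint16_t)(((uint16_t)attr << 8) | (uint8_t)ch);
}

/* Write a NUL-terminated string into a buffer laid out like the
 * VGA text buffer, one 16-bit cell per character. */
void vga_puts(volatile uint16_t *buf, const char *s, uint8_t attr)
{
    for (int i = 0; s[i] != '\0'; i++)
        buf[i] = vga_cell(s[i], attr);
}
```

With attribute 0x0F (white on black), the letter 'A' (0x41) becomes the cell value 0x0F41.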

Questions to Guide Your Design

  1. Boot Signature: What are the last two bytes of a boot sector?
  2. Infinite Loop: Once your code finishes, what should the CPU do? (Don’t let it execute garbage memory).

Thinking Exercise

Trace the Boot

Imagine you are the CPU.

  1. Power on. Reset vector jumps to BIOS.
  2. BIOS checks disks. Finds one whose sector 0 ends with the bytes 0x55, 0xAA (the 16-bit little-endian value 0xAA55).
  3. BIOS copies that 512 bytes to 0x7C00.
  4. BIOS sets IP (Instruction Pointer) to 0x7C00.
  5. YOUR CODE STARTS HERE.

The Interview Questions They’ll Ask

  1. “What is the difference between a bootloader and a kernel?”
  2. “What is ‘memory mapped I/O’?”
  3. “Why do operating systems switch from Real Mode to Protected Mode?”

Hints in Layers

Hint 1: The Magic Number Your file must be exactly 512 bytes. The last two bytes must be 0x55 and 0xAA. Pad the rest with zeros.
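
The layout in Hint 1 can be sketched in C as a host-side helper that assembles a sector image. This is an illustration of the byte layout only, not a replacement for the assembler; build_boot_sector and SECTOR_SIZE are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512

/* Copy `len` bytes of boot code into a 512-byte sector, zero-pad the
 * rest, and stamp the signature into the last two bytes.
 * Returns 0 on success, -1 if the code does not fit in 510 bytes. */
int build_boot_sector(uint8_t sector[SECTOR_SIZE],
                      const uint8_t *code, size_t len)
{
    if (len > SECTOR_SIZE - 2)
        return -1;
    memset(sector, 0, SECTOR_SIZE);
    memcpy(sector, code, len);
    sector[510] = 0x55;   /* low byte of the word 0xAA55 */
    sector[511] = 0xAA;   /* high byte */
    return 0;
}
```

For example, passing the two bytes 0xEB 0xFE (an x86 short jump to itself) yields the smallest sector that boots and then spins forever.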

Hint 2: Writing to Screen In 16-bit Real Mode, you can use BIOS interrupts (int 0x10) OR write directly to video memory. Writing directly is “cooler”. Segment 0xB800.

Hint 3: Assembly Loop To write a string, you need a loop. Load a character, move it to video memory, increment pointer, repeat until null terminator.

Books That Will Help

Topic Book Chapter
Bootloading “Operating Systems: From 0 to 1” (Free online book) Ch. 2-3
Assembly “Programming from the Ground Up” by Jonathan Bartlett Ch. 1-3

Project 2: Build Your Own Shell (BYOS)

  • File: LEARN_LINUX_UNIX_INTERNALS_DEEP_DIVE.md
  • Main Programming Language: C
  • Alternative Programming Languages: Rust, Go
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Process Management / Syscalls
  • Software or Tool: Linux Terminal, GCC
  • Main Book: “The Linux Programming Interface” by Michael Kerrisk

What you’ll build: A functional shell (like bash or zsh, but simpler). It will show a prompt, accept commands (ls -la), handle built-ins (cd, exit), and support input/output redirection (>) and pipes (|).

Why it teaches processes: You will implement the core lifecycle: fork() a new process, exec() a command, and wait() for it to finish. You’ll learn why cd must be a shell built-in and not a program.

Core challenges you’ll face:

  • Parsing: Breaking “ls -la | grep foo” into tokens.
  • Fork/Exec pattern: The standard Unix way to run programs.
  • File Descriptors: Making > work by manipulating stdout before exec.
  • Pipes: Connecting the stdout of one process to the stdin of another.

Key Concepts:

  • Process Creation: fork(), execvp(), waitpid().
  • File Descriptors: dup2(), open(), close().
  • Signals: Handling Ctrl+C (SIGINT) without killing the shell itself.

Difficulty: Intermediate Time estimate: 1 week Prerequisites: C basics, Pointers.


Real World Outcome

You will have a program myshell.

Example Output:

$ ./myshell
myshell> ls
file1.txt  file2.c  myshell

myshell> pwd
/home/user/projects/myshell

myshell> ls -l > output.txt
myshell> cat output.txt
(lists files)

myshell> exit

The Core Question You’re Answering

“How does the OS run programs?”

And also: “What actually happens when I type a command?” It’s not magic; it’s specific syscalls.

Concepts You Must Understand First

Stop and research these before coding:

  1. Fork vs Exec
    • fork() creates a copy. exec() replaces the copy with new code. Why do we need both?
    • Book Reference: “The Linux Programming Interface” Ch. 24 & 27.
  2. File Descriptors (FDs)
    • FD 0 is Stdin, 1 is Stdout, 2 is Stderr.
    • What happens if I close FD 1 and open a file? (The file becomes FD 1).
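
The rule behind that answer is that open() always returns the lowest unused descriptor number. A minimal sketch that checks this without touching stdout (the function name reuses_lowest_fd is hypothetical, and /dev/null is assumed to exist):

```c
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if a freshly opened file reuses the descriptor number
 * that was just closed, 0 if it does not, -1 on error. */
int reuses_lowest_fd(void)
{
    int a = open("/dev/null", O_RDONLY);
    if (a < 0)
        return -1;
    int b = open("/dev/null", O_RDONLY);   /* b is the next free number */
    if (b < 0) {
        close(a);
        return -1;
    }
    close(a);                              /* frees the lower number */
    int c = open("/dev/null", O_RDONLY);   /* kernel picks the lowest free fd */
    int same = (c == a);
    if (c >= 0)
        close(c);
    close(b);
    return same;
}
```

This is why the old trick of close(1) followed by open() redirects stdout; dup2() does the same thing more explicitly.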

Questions to Guide Your Design

  1. Built-ins: Why doesn’t exec("cd") work? (Hint: A child process cannot change the parent’s directory).
  2. Zombies: What happens if you don’t call wait()?
  3. Parsing: How do you split a string by spaces in C? (strtok or manual parsing).

Thinking Exercise

Trace a Pipe: ls | grep c

  1. Parent (Shell) creates a pipe (array of 2 ints).
  2. Parent forks Child A (for ls).
  3. Parent forks Child B (for grep).
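  4. Child A: Duplicates Pipe-Write-End onto Stdout with dup2() (which closes the old Stdout first), closes both original pipe descriptors, then execs ls.
  5. Child B: Duplicates Pipe-Read-End onto Stdin with dup2(), closes both original pipe descriptors, then execs grep.
  6. Parent closes both pipe ends (otherwise grep never sees end-of-file) and waits for both children.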
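
The trace above can be sketched as a single C function. The name run_pipe is hypothetical, error handling is kept minimal, and both arguments are assumed to be NULL-terminated argv arrays.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run `left | right` and return the exit status of `right`,
 * or -1 if setup fails or `right` did not exit normally. */
int run_pipe(char *const left[], char *const right[])
{
    int fds[2];                    /* fds[0] = read end, fds[1] = write end */
    if (pipe(fds) < 0)
        return -1;

    pid_t a = fork();
    if (a < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (a == 0) {                  /* Child A: stdout -> pipe write end */
        dup2(fds[1], STDOUT_FILENO);
        close(fds[0]);
        close(fds[1]);
        execvp(left[0], left);
        _exit(127);                /* reached only if exec failed */
    }

    pid_t b = fork();
    if (b < 0) {
        close(fds[0]);
        close(fds[1]);
        waitpid(a, NULL, 0);
        return -1;
    }
    if (b == 0) {                  /* Child B: stdin <- pipe read end */
        dup2(fds[0], STDIN_FILENO);
        close(fds[0]);
        close(fds[1]);
        execvp(right[0], right);
        _exit(127);
    }

    /* The parent must close both ends, or Child B never sees EOF. */
    close(fds[0]);
    close(fds[1]);

    int status = 0;
    waitpid(a, NULL, 0);
    waitpid(b, &status, 0);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Forgetting any one of the close() calls is the classic bug here: the reader blocks forever because some process still holds the write end open.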

The Interview Questions They’ll Ask

  1. “Write a simplified implementation of system().”
  2. “What is a zombie process?”
  3. “Why does cd have to be a built-in?”

Hints in Layers

Hint 1: The Loop while (1) { print prompt; read line; parse; execute; }

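Hint 2: Executing Split the line into char **args, terminated by a NULL pointer. Call fork(). If pid == 0 (child), call execvp(args[0], args); it returns only on failure, so follow it with _exit(127). If pid > 0 (parent), call waitpid(pid, &status, 0). If pid < 0, fork failed; report the error.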
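
A minimal sketch of that fork/exec/wait sequence follows. The function name run_command is hypothetical, and returning 128 plus the signal number for a killed child is a convention borrowed from bash, not a requirement.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork, exec argv[0] in the child, and wait for it in the parent.
 * Returns the child's exit status, 128 + signal number if it was
 * killed by a signal, or -1 if fork or waitpid failed. */
int run_command(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                /* exec returns only on failure */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return -1;
}
```

Using _exit() rather than exit() in the failed child avoids flushing stdio buffers that were inherited from the parent.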

Hint 3: Redirection For ls > out.txt: in the child process, before calling exec, run fd = open("out.txt", ...); dup2(fd, STDOUT_FILENO); close(fd); in that order. The new program then writes to stdout, which now refers to the file.
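
Sketched in C, the redirection looks like this. The name run_redirected is hypothetical, and the open flags and 0644 mode are assumptions chosen to mimic what a shell does for >.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv with stdout redirected to `path` (like `cmd > path`).
 * Returns the child's exit status, or -1 on fork/wait failure. */
int run_redirected(char *const argv[], const char *path)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            _exit(126);
        if (fd != STDOUT_FILENO) {
            dup2(fd, STDOUT_FILENO);   /* fd 1 now refers to the file */
            close(fd);                 /* the extra descriptor is not needed */
        }
        execvp(argv[0], argv);
        _exit(127);                    /* reached only if exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The redirection happens in the child only, so the shell's own stdout is untouched and the next prompt still goes to the terminal.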

Books That Will Help

Topic Book Chapter
Process Control “The Linux Programming Interface” Ch. 24-27
Signals “The Linux Programming Interface” Ch. 20-22

Project 3: The Filesystem Explorer (ls -R) Clone

  • File: LEARN_LINUX_UNIX_INTERNALS_DEEP_DIVE.md
  • Main Programming Language: C
  • Alternative Programming Languages: Rust
  • Coolness Level: Level 2: Practical but Forgettable
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Filesystem / Inodes
  • Software or Tool: GCC, man pages
  • Main Book: “The Linux Programming Interface” by Michael Kerrisk

What you’ll build: A robust clone of the ls command that supports recursion (-R) and detailed listing (-l). You will read directories, query file metadata (inodes), and handle permissions formatting.

Why it teaches filesystems: You’ll learn that a directory is just a special file containing a list of names and inode numbers. You’ll understand the difference between a filename (in a directory) and the file metadata (in the inode).

Core challenges you’ll face:

  • Directory Traversal: Using opendir, readdir, closedir.
  • Stat System Call: Converting stat struct data into human-readable strings (e.g., rwxr-xr-x).
  • Time Formatting: Handling Unix timestamps.
  • Recursion: Safely walking directory trees without infinite loops (symlinks).

Key Concepts:

  • Inodes: struct stat and what it contains.
  • Directory Entries: struct dirent.
  • Bitmasks: Decoding permission bits (st_mode).

Difficulty: Intermediate Time estimate: Weekend Prerequisites: C structs, recursion.


Real World Outcome

You will have a tool myls.

Example Output:

$ ./myls -l
drwxr-xr-x  2 user group  4096 Dec 22 10:00 .
drwxr-xr-x 10 user group  4096 Dec 21 14:00 ..
-rw-r--r--  1 user group   512 Dec 22 10:01 main.c

The Core Question You’re Answering

“Where is the file name stored?”

Spoiler: NOT in the file itself. It’s stored in the directory. This project proves it.

Concepts You Must Understand First

Stop and research these before coding:

  1. Inodes vs Directory Entries
    • An Inode holds the file’s metadata (size, permissions).
    • A Directory Entry maps a “Name” to an “Inode Number”.
  2. Stat Struct
    • Look up man 2 stat. Understand st_mode, st_uid, st_size.

Questions to Guide Your Design

  1. Hidden Files: How does ls know to hide files starting with .? (It’s manual filtering in user space!).
  2. User Names: stat gives you uid (integer). How do you get “root” or “douglas”? (Hint: /etc/passwd or getpwuid).

Thinking Exercise

Struct dirent vs Struct stat

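readdir gives you a name, an inode number, and sometimes a type hint (d_type, which may be DT_UNKNOWN on some filesystems). That’s it. To get the size, you MUST call stat (or lstat, so that symlinks are not followed) on that name. Trace:

  1. Open the current directory (.).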
  2. Read entry main.c.
  3. Call stat("main.c", &mystat).
  4. Read mystat.st_size.
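
The same trace as a C sketch: iterate with readdir, then ask for the inode metadata of each name. The name list_sizes and the fixed 4096-byte path buffer are assumptions made for brevity, and lstat is used instead of stat so that symlinks are described rather than followed.

```c
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>

/* Print "size name" for each non-hidden entry in `dir`.
 * Returns the number of entries printed, or -1 if the directory
 * cannot be opened. */
int list_sizes(const char *dir, FILE *out)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    int count = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (e->d_name[0] == '.')
            continue;                  /* dotfiles are hidden in user space */

        char path[4096];
        snprintf(path, sizeof path, "%s/%s", dir, e->d_name);

        struct stat st;
        if (lstat(path, &st) != 0)
            continue;                  /* entry vanished or is unreadable */

        fprintf(out, "%10lld %s\n", (long long)st.st_size, e->d_name);
        count++;
    }
    closedir(d);
    return count;
}
```

Note that the directory entry alone never supplies the size; every line of output costs one extra system call.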

The Interview Questions They’ll Ask

  1. “What is a hard link vs a soft link?” (Hard link = same inode, new directory entry).
  2. “Why does ls take a long time on a network mount?” (Stat calls are expensive).

Hints in Layers

Hint 1: Decoding Permissions st_mode is a bitfield. (mode & S_IRUSR) checks if User has Read permission. Construct the rwxr-xr-x string character by character.

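Hint 2: Recursion If S_ISDIR(mode) is true, and the name isn’t . or .., construct the new path (current/child) and recurse. Take mode from lstat rather than stat so that a symlink to a directory is not followed; that is what prevents infinite loops.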

Books That Will Help

Topic Book Chapter
File I/O “The Linux Programming Interface” Ch. 15 (File Attributes)
Directories “The Linux Programming Interface” Ch. 18 (Directories)

Project 4: The Process Psychic (Process Inspector)

  • File: LEARN_LINUX_UNIX_INTERNALS_DEEP_DIVE.md
  • Main Programming Language: C or Python
  • Alternative Programming Languages: Go
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 3. The “Service & Support” Model (Monitoring Agent)
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Linux /proc filesystem
  • Software or Tool: Linux
  • Main Book: “Linux Kernel Development” by Robert Love

What you’ll build: A tool similar to ps or top that lists running processes, their state (Running, Sleeping, Zombie), memory usage, and command line arguments. You will do this by parsing the /proc filesystem directly.

Why it teaches Linux Internals: Linux exposes kernel statistics to user space via the /proc virtual filesystem. By parsing it, you see exactly what the kernel tracks for each process.

Core challenges you’ll face:

  • Scanning /proc: Iterating over numeric directories.
  • Parsing status files: Reading /proc/[pid]/stat or /proc/[pid]/status.
  • Calculating CPU usage: Reading /proc/stat (system wide) and process time to calculate percentages.

Key Concepts:

  • Virtual Filesystems: /proc has no backing storage on disk; the kernel generates its contents on demand when you read them.
  • Process States: R (Running), S (Sleeping), Z (Zombie).
  • UID/GID: Mapping numeric IDs to names.

Difficulty: Beginner/Intermediate Time estimate: Weekend Prerequisites: File I/O, String parsing.


Real World Outcome

You will have a CLI tool myps.

Example Output:

$ ./myps
PID    USER    STATE   CMD
1      root    S       /sbin/init
1042   douglas R       ./myps
1043   douglas S       bash

The Core Question You’re Answering

“How does top know what’s running?”

It doesn’t use magic system calls. It reads files. Everything in Unix is a file.

Concepts You Must Understand First

Stop and research these before coding:

  1. The /proc directory
    • Run ls /proc on your machine.
    • Run cat /proc/self/status. Look at the output.
  2. Ticks vs Seconds
    • Kernel counts time in “jiffies” or ticks. You need sysconf(_SC_CLK_TCK) to convert to seconds.

Questions to Guide Your Design

  1. Filtering: How do you distinguish process directories (numbers) from other info (like cpuinfo) in /proc?
  2. Race Conditions: What happens if a process dies while you are reading its /proc file? (Handle ENOENT).

Thinking Exercise

Manual ps

  1. cd /proc
  2. Find a numbered folder, e.g., 1234.
  3. cat 1234/cmdline -> See the command (null separated).
  4. cat 1234/stat -> See the raw state characters.
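
One detail worth sketching for step 4: in /proc/[pid]/stat the command name sits in parentheses and may itself contain spaces or parentheses, so splitting on spaces is unreliable. A common approach, shown here under the hypothetical name stat_state, is to search for the last ')' and read the state letter after it.

```c
#include <string.h>

/* Extract the one-letter process state from a /proc/[pid]/stat line.
 * Returns '?' if the line does not have the expected shape. */
char stat_state(const char *line)
{
    const char *end = strrchr(line, ')');   /* last ')' ends the command name */
    if (end == NULL || end[1] != ' ' || end[2] == '\0')
        return '?';
    return end[2];
}
```

The input is the text you get from reading the stat file into a buffer; the function does not open any files itself.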

The Interview Questions They’ll Ask

  1. “What is the /proc filesystem?”
  2. “How do you find a process’s memory usage without top?”

Hints in Layers

Hint 1: Iteration Use opendir("/proc"). Check if entry->d_name is all digits (isdigit).
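
The digit check from Hint 1 as a small helper (the name is_pid_name is hypothetical):

```c
#include <ctype.h>

/* Returns 1 if `name` is a non-empty string of decimal digits,
 * which is how process directories under /proc are named. */
int is_pid_name(const char *name)
{
    if (*name == '\0')
        return 0;
    for (const char *p = name; *p != '\0'; p++)
        if (!isdigit((unsigned char)*p))
            return 0;
    return 1;
}
```

The cast to unsigned char matters: passing a negative char value to isdigit is undefined behavior.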

Hint 2: Reading CMDLINE /proc/[pid]/cmdline arguments are separated by \0 (null bytes), not spaces. Replace \0 with spaces for display.

Books That Will Help

Topic Book Chapter
/proc “The Linux Programming Interface” Ch. 9 (Process Credentials) & Ch. 12 (System and Process Information)

Project 5: The Raw Mode Terminal (Text Editor Base)

  • File: LEARN_LINUX_UNIX_INTERNALS_DEEP_DIVE.md
  • Main Programming Language: C
  • Alternative Programming Languages: Rust
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 2. Micro-SaaS (Custom CLI Tools)
  • Difficulty: Level 3: Advanced
  • Knowledge Area: TTY / Termios
  • Software or Tool: termios.h
  • Main Book: “The Linux Programming Interface”

What you’ll build: A program that enters “Raw Mode”. It disables the terminal’s default behavior (echoing characters, line buffering, handling Ctrl+C). You will read input byte-by-byte and render output manually, moving the cursor using VT100 escape sequences. This is the foundation of vim or nano.

Why it teaches TTY: The terminal is a complex legacy beast. Normally, the terminal driver “cooks” input (processes Backspace, Enter, signals). To build advanced TUIs, you must talk directly to the TTY driver.

Core challenges you’ll face:

  • Termios Struct: Modifying flags ECHO, ICANON, ISIG.
  • Escape Sequences: Sending \x1b[2J to clear screen.
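  • Restoring State: If your program exits in Raw Mode without cleaning up, your terminal is left broken. Register an atexit handler that restores the saved termios settings, and remember that atexit handlers do not run when the process is killed by a signal, so fatal signals need their own handler.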

Key Concepts:

  • Cooked vs Raw Mode: Canonical vs Non-canonical input.
  • Control Characters: What Ctrl+C (3) and Enter (10/13) actually send.
  • VT100/ANSI Codes: Controlling the cursor.

Difficulty: Advanced Time estimate: 1 week Prerequisites: C basics, Bitwise operations.


Real World Outcome

A program that lets you type, and draws characters, but Ctrl+C prints “I caught Ctrl+C!” instead of exiting, and Backspace requires manual handling.

Example Output:

$ ./raw_term
(Screen clears)
[Type 'q' to quit]
You pressed: 'a' (97)
You pressed: 'Ctrl+C' (3) - Not exiting!
...

The Core Question You’re Answering

“Why does Backspace work?”

In Raw Mode, Backspace is just byte 127. It deletes nothing. You have to move the cursor back, print a space, and move back again.
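
That three-step erase is short enough to show in full. The name erase_last_char is hypothetical; the descriptor is a parameter so the function can write to a terminal or, for testing, to a pipe.

```c
#include <unistd.h>

/* Visually erase the character left of the cursor:
 * "\b" moves left, " " overwrites the glyph, "\b" moves left again.
 * Returns the number of bytes written, or -1 on error. */
ssize_t erase_last_char(int fd)
{
    static const char seq[] = "\b \b";
    return write(fd, seq, sizeof seq - 1);
}
```

Your program still has to remove the byte from its own input buffer; this only updates what the user sees.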

Concepts You Must Understand First

Stop and research these before coding:

  1. Canonical Mode (Line buffering).
    • Why input doesn’t appear until you hit Enter.
  2. termios flags
    • ECHO: Prints what you type.
    • ICANON: Enables line buffering.
    • ISIG: Handles signals (Ctrl+C/Z).

Questions to Guide Your Design

  1. Safety: How do you ensure the terminal returns to normal mode when the user quits?
  2. Mapping: Does pressing Enter send \n (10) or \r (13)? (It’s messy).

Thinking Exercise

The “Stuck” Terminal

Run stty -echo in your terminal. Type. Nothing appears. Run stty echo blindly to fix it. This is what you are manipulating programmatically.

The Interview Questions They’ll Ask

  1. “What is the difference between stdout and a TTY?”
  2. “How do text editors detect window resize events?” (SIGWINCH).

Hints in Layers

Hint 1: Disabling Echo struct termios raw; tcgetattr(STDIN_FILENO, &raw); raw.c_lflag &= ~(ECHO); ...

Hint 2: Read Byte-by-Byte Turn off ICANON and set c_cc[VMIN] = 1, c_cc[VTIME] = 0. read(STDIN_FILENO, &c, 1) then returns as soon as a single byte is available instead of waiting for Enter.
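
Putting the two hints together, a minimal sketch might look like the following. The names make_raw, enable_raw_mode, restore_terminal and saved_termios are hypothetical. Only the three flags discussed in this project are cleared; a full editor usually clears several more.

```c
#include <termios.h>
#include <unistd.h>

static struct termios saved_termios;   /* settings to restore on exit */

/* Clear the "cooked" behaviors in a copy of the terminal settings. */
void make_raw(struct termios *t)
{
    t->c_lflag &= ~(tcflag_t)(ECHO | ICANON | ISIG);
    t->c_cc[VMIN] = 1;     /* read() returns once 1 byte is available */
    t->c_cc[VTIME] = 0;    /* no timeout */
}

/* Put the terminal back the way it was; suitable for atexit(). */
void restore_terminal(void)
{
    tcsetattr(STDIN_FILENO, TCSAFLUSH, &saved_termios);
}

/* Returns 0 on success, -1 if stdin is not a terminal. */
int enable_raw_mode(void)
{
    if (tcgetattr(STDIN_FILENO, &saved_termios) < 0)
        return -1;
    struct termios raw = saved_termios;
    make_raw(&raw);
    return tcsetattr(STDIN_FILENO, TCSAFLUSH, &raw);
}
```

A typical program calls enable_raw_mode() once at startup and, if it succeeds, registers restore_terminal with atexit() before reading any input.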

Books That Will Help

Topic Book Chapter
Terminals “The Linux Programming Interface” Ch. 62 (Terminals)
TTY Driver “Build Your Own Text Editor” (Online Guide by Jeremy Ruten) All


Project 6: The “Mirror” Filesystem (FUSE)

  • File: LEARN_LINUX_UNIX_INTERNALS_DEEP_DIVE.md
  • Main Programming Language: C or Python
  • Alternative Programming Languages: Rust
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 4. Open Core Infrastructure
  • Difficulty: Level 3: Advanced
  • Knowledge Area: VFS / Filesystems
  • Software or Tool: libfuse
  • Main Book: “Linux Kernel Development” (Filesystems chapter)

What you’ll build: A user-space filesystem that mounts a folder. When you write to it, it reverses the text (or encrypts it) before saving to the underlying disk. This uses FUSE (Filesystem in Userspace) to hook into kernel VFS calls.

Why it teaches VFS: You will implement the callbacks: getattr, read, write, readdir. You will see exactly what the kernel asks for when you run cat file.txt.

Core challenges you’ll face:

  • Callback Implementation: Mapping read() requests to underlying file operations.
  • Permissions: Handling chmod/chown correctly.
  • Latency: User-space context switching overhead.

Key Concepts:

  • VFS Interface: The common API for all filesystems.
  • Mount Points: How the kernel attaches filesystems.
  • User-Kernel Bridge: How /dev/fuse works.

Difficulty: Advanced Time estimate: 1-2 weeks Prerequisites: C, Pointers, Project 3 (ls clone).


Real World Outcome

You will mount a backing directory (root_dir/) at a mount point (mount_point/).

Example Output:

$ ./mirrorfs root_dir/ mount_point/
$ echo "Hello" > mount_point/test.txt
$ cat root_dir/test.txt
olleH
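
The transformation applied in the write path can be sketched independently of FUSE. The name reverse_line is hypothetical, and leaving a single trailing newline in place is an assumption of this sketch, made so that echo "Hello" is stored as olleH followed by a newline.

```c
#include <stddef.h>

/* Reverse buf[0..len) in place, keeping one trailing '\n' (if any)
 * at the end. */
void reverse_line(char *buf, size_t len)
{
    if (len > 0 && buf[len - 1] == '\n')
        len--;                      /* leave the newline where it is */
    if (len < 2)
        return;

    size_t i = 0, j = len - 1;
    while (i < j) {
        char tmp = buf[i];
        buf[i] = buf[j];
        buf[j] = tmp;
        i++;
        j--;
    }
}
```

In a real write callback the data may arrive in several chunks at different offsets, so reversing each chunk separately is only correct for small, single writes like the one in the example.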

The Core Question You’re Answering

“How does the kernel support ext4, fat32, and ntfs simultaneously?”

It uses a standard interface (VFS). FUSE lets you be that interface.

Concepts You Must Understand First

Stop and research these before coding:

  1. FUSE Architecture
    • Kernel module -> /dev/fuse -> libfuse (User space) -> Your Code.
  2. Error Codes
    • Returning -ENOENT when file is missing.
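
The error convention is that a callback returns 0 on success or a negated errno value on failure. A sketch of the lookup that a getattr callback could delegate to, with the FUSE wiring itself omitted (backing_getattr and the 4096-byte path buffer are assumptions of this example):

```c
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>

/* Look up `path` (as seen under the mount point, e.g. "/test.txt")
 * inside the backing directory `root`.
 * Returns 0 on success or a negated errno value such as -ENOENT. */
int backing_getattr(const char *root, const char *path, struct stat *st)
{
    char full[4096];
    int n = snprintf(full, sizeof full, "%s%s", root, path);
    if (n < 0 || (size_t)n >= sizeof full)
        return -ENAMETOOLONG;
    if (lstat(full, st) == -1)
        return -errno;              /* e.g. -ENOENT for a missing file */
    return 0;
}
```

Returning the negated errno unchanged is usually what you want in a passthrough filesystem, because the caller then sees the same error the underlying filesystem reported.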

Questions to Guide Your Design

  1. Stat: When ls runs, it calls stat() on each file, and the kernel forwards that to your getattr callback. What do you return for a file that doesn’t exist yet?
  2. Thread Safety: FUSE is multithreaded. Do you need locks?

Thinking Exercise

Trace a cat

  1. cat calls open().
  2. Kernel VFS sees mount point.
  3. Kernel sends OPEN op to FUSE.
  4. Your code receives open.
  5. Your code returns Success.
  6. Kernel sends READ.

The Interview Questions They’ll Ask

  1. “Why is FUSE slower than a kernel driver?” (Context switches).
  2. “What is the VFS?”

Hints in Layers

Hint 1: Hello World FUSE Start with the “Hello World” example in libfuse documentation. It creates a virtual file that doesn’t exist on disk.

Hint 2: Passthrough First, make a “passthrough” filesystem that just forwards every call to the underlying directory. open -> open, read -> read.

Books That Will Help

Topic Book Chapter
VFS “Linux Kernel Development” Ch. 13
FUSE “FUSE Documentation” Official Wiki

Project 7: The “Poor Man’s Docker” (Container Runtime)

  • File: LEARN_LINUX_UNIX_INTERNALS_DEEP_DIVE.md
  • Main Programming Language: Go or C
  • Alternative Programming Languages: Rust, Python
  • Coolness Level: Level 5: Pure Magic
  • Business Potential: 4. Open Core Infrastructure
  • Difficulty: Level 4: Expert
  • Knowledge Area: Namespaces / Cgroups
  • Software or Tool: Linux Namespaces
  • Main Book: “The Linux Programming Interface”

What you’ll build: A program that runs a command in an isolated environment. It will have its own Process ID tree (PID 1), its own mount table, and its own hostname. It’s a mini-Docker.

Why it teaches isolation: Docker isn’t magic; it uses Linux Namespaces (CLONE_NEWPID, CLONE_NEWNS, CLONE_NEWUTS) and cgroups. You will call these directly.

Core challenges you’ll face:

  • Namespaces: Using unshare() or clone() with flags.
  • Root Filesystem: Setting up chroot or pivot_root (the “jail”).
  • ProcFS: Mounting a fresh /proc so ps inside the container only shows container processes.

Key Concepts:

  • PID Namespace: Process isolation.
  • Mount Namespace: Filesystem isolation.
  • Chroot/Pivot_root: Root directory isolation.

Difficulty: Expert Time estimate: 2 weeks Prerequisites: Project 2 (Shell), Root access.


Real World Outcome

You run a shell inside your container. ps shows only two processes.

Example Output:

$ sudo ./mycontainer run /bin/bash
container# ps aux
PID   USER     COMMAND
1     root     /bin/bash
2     root     ps aux
container# hostname
container-host
container# exit

The Core Question You’re Answering

“What IS a container?”

It is NOT a virtual machine. It is a process with a restricted view of the kernel’s data structures.

Concepts You Must Understand First

Stop and research these before coding:

  1. Namespaces
    • Read man 7 namespaces. PID, UTS, MNT, NET.
  2. chroot vs pivot_root
    • Why chroot is not enough for security.

Questions to Guide Your Design

  1. PID 1: Who handles signals inside the container? (Your starter process).
  2. Networking: By default, you have no network. Do you want to share the host’s or create a veth pair? (Share for simplicity first).

Thinking Exercise

The ps Lie

If you chroot but don’t use PID namespaces, ps will show all host processes (if /proc is mounted). If you use PID namespaces but don’t mount a fresh /proc, ps still reads the host’s /proc, so it lists host processes or fails because /proc/self no longer resolves. You need BOTH.

The Interview Questions They’ll Ask

  1. “What is the difference between a VM and a Container?” (Kernel sharing).
  2. “What are cgroups used for?” (Resource limiting).
  3. “How does Docker hide processes?” (PID Namespaces).

Hints in Layers

Hint 1: unshare Tool Play with the command line tool unshare first. sudo unshare --fork --pid --mount-proc /bin/bash.

Hint 2: Go Syscalls In Go: cmd.SysProcAttr = &syscall.SysProcAttr{Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID ...}.

Hint 3: Mounting Proc Inside the container setup: mount("proc", "/proc", "proc", 0, "").
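
For a first experiment in C, the UTS namespace is the easiest one to try because it only isolates the hostname. The sketch below forks a child, moves it into a new UTS namespace with unshare(), and changes the hostname there. The name try_private_hostname is hypothetical, and without root (or CAP_SYS_ADMIN) the kernel is expected to refuse.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* for unshare() and CLONE_NEWUTS */
#endif
#include <sched.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child, give it a private UTS namespace, and set `name` as
 * its hostname. Returns 0 if the child succeeded, 1 if it did not
 * (typically EPERM without root), -1 if fork or waitpid failed.
 * The parent's hostname is never changed. */
int try_private_hostname(const char *name)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (unshare(CLONE_NEWUTS) != 0)
            _exit(1);
        if (sethostname(name, strlen(name)) != 0)
            _exit(1);
        _exit(0);            /* a real runtime would exec the command here */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : 1;
}
```

Calling gethostname() in the parent before and after shows that the change stayed inside the child's namespace.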

Books That Will Help

Topic Book Chapter
Namespaces “The Linux Programming Interface” Ch. 28 (clone() and its flags); it covers namespaces only briefly, so pair it with man 7 namespaces
Containers “Container Security” by Liz Rice Ch. 2-3

Project Comparison Table

Project Difficulty Time Depth of Understanding Fun Factor
1. Boot Kernel ⭐⭐⭐⭐ Weekend Ring 0, BIOS, ASM ⭐⭐⭐⭐⭐
2. Build Shell ⭐⭐ 1 week Fork/Exec, Signals ⭐⭐⭐⭐
3. ls Clone ⭐⭐ Weekend Inodes, Dirents ⭐⭐
4. Process Psychic ⭐⭐ Weekend /proc, Parsing ⭐⭐⭐
5. Raw Terminal ⭐⭐⭐ 1 week TTY, Termios ⭐⭐⭐⭐
6. FUSE Filesystem ⭐⭐⭐⭐ 1-2 weeks VFS, Kernel Hooks ⭐⭐⭐⭐⭐
7. Container Runtime ⭐⭐⭐⭐⭐ 2 weeks Namespaces, Isolation ⭐⭐⭐⭐⭐

Recommendation

Where to Start

  1. For the Absolute Best Foundation: Start with Project 2 (Build Shell). It covers the most important daily concepts (fork, exec, pipes, signals). It bridges the gap between being a user and a programmer.
  2. For the Low-Level Curious: Start with Project 4 (Process Psychic). It’s easier than the Shell but shows you the “magic” behind the curtain (/proc).

The Progression Path

Shell -> ls Clone -> Process Psychic -> Container Runtime. This path takes you from managing processes, to inspecting files, to inspecting processes, to isolating all of them.


Final Overall Project: The “System-Z” (Mini-OS Supervisor)

What you’ll build: A minimal Init System (PID 1) that combines your shell, your container runtime, and your process monitor.

The Goal: You boot your Linux kernel (using init=/path/to/your/program). Your program starts. It:

  1. Mounts /proc, /sys.
  2. Reads a config file (like systemd units).
  3. Starts services (like a web server or your shell) in isolated containers (using your Project 7 code).
  4. Monitors them (restarting if they crash).
  5. Provides a command socket to query status (using your Project 4 code).

Why this is the ultimate test: You are the system. If your code crashes, the kernel panics. You handle the orphans, the signals, the mounts, and the logs. You effectively replace the entire user-space OS layer.


Summary

This learning path covers Linux and Unix Internals through 7 hands-on projects.

# Project Name Main Language Difficulty Time Estimate
1 Boot Sector Kernel Assembly/C Advanced Weekend
2 Build Your Own Shell C Intermediate 1 week
3 ls -R Clone C Intermediate Weekend
4 Process Psychic (ps) C Intermediate Weekend
5 Raw Mode Terminal C Advanced 1 week
6 Mirror Filesystem (FUSE) C Advanced 1-2 weeks
7 Container Runtime Go/C Expert 2 weeks

Expected Outcomes

After completing these projects, you will:

  • Understand the boot process by writing a boot sector.
  • Master Syscalls by calling them directly.
  • Demystify Containers by building one from scratch.
  • Visualize Filesystems by reading directory entries and inode metadata yourself.
  • Debug Processes by reading their raw kernel state.

You will no longer view Linux as a “black box” but as a collection of data structures and well-defined interfaces that you can inspect, manipulate, and reimplement.