CSAPP 3E DEEP LEARNING PROJECTS

CS:APP (3rd Edition): Deep Learning via Buildable Projects

Goal: Transform from a programmer who writes code that "happens to work" into a systems programmer who understands exactly what the machine does with every instruction. By building 17 increasingly sophisticated projects, you will internalize the complete journey from source code to running process: mastering data representation, machine-level execution, memory hierarchy, operating system abstractions, and concurrent programming. When you finish, you will debug crashes by reading registers, optimize code by reasoning about cache lines, and build robust systems that handle real-world failure modes gracefully.


Why Systems Programming Matters

The Hidden Foundation

Every program you write eventually becomes electrons flowing through silicon. Between your high-level code and those electrons lies a vast machinery of translation, optimization, and abstraction that most programmers never see. This invisible infrastructure determines whether your program runs fast or slow, crashes mysteriously or fails gracefully, consumes megabytes or gigabytes of memory.

Consider this scenario: A financial trading system processes millions of transactions per day. One day, after a routine update, trades start failing silently, but only for certain customers, only during peak hours, and only when the system has been running for exactly 47 minutes. The logs show nothing. The unit tests pass. The code review found nothing suspicious.

A programmer without systems knowledge might spend weeks adding more logging, trying random fixes, or blaming the network. A systems programmer recognizes the symptoms immediately: this is a classic memory corruption bug, likely a buffer overflow that only manifests when a specific heap layout occurs after enough allocations. They fire up GDB, examine the heap metadata, trace the corruption back to a string copy that assumed null termination, and fix it in an hour.

The difference is not intelligence; it is knowledge. Systems programming knowledge.

Real-World Impact

┌──────────────────────────────────────────────────────────────────────┐
│                    WHY SYSTEMS KNOWLEDGE MATTERS                     │
├──────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  DEBUGGING      Without systems knowledge, you guess.                │
│                 With it, you diagnose.                               │
│                                                                      │
│  PERFORMANCE    Without systems knowledge, you benchmark randomly.   │
│                 With it, you reason about cache lines and pipelines. │
│                                                                      │
│  SECURITY       Without systems knowledge, you follow checklists.    │
│                 With it, you understand attack surfaces.             │
│                                                                      │
│  ARCHITECTURE   Without systems knowledge, you copy patterns.        │
│                 With it, you design for the machine you have.        │
│                                                                      │
└──────────────────────────────────────────────────────────────────────┘

The Heartbleed Bug (2014): A missing bounds check in OpenSSL allowed attackers to read arbitrary server memory, exposing passwords, private keys, and session tokens. The bug existed for two years. Understanding buffer management and memory layout would have caught it in code review.

The Mars Climate Orbiter (1999): A $327 million spacecraft was lost because one module used imperial units while another expected metric. Understanding data representation and interface contracts (exactly what Chapter 2 teaches) would have prevented this.

Spectre and Meltdown (2018): These CPU vulnerabilities exploited speculative execution and cache timing to leak privileged memory. Understanding cache behavior and CPU pipelinesโ€”Chapters 5 and 6โ€”is essential for both exploiting and mitigating such attacks.

What This Journey Gives You

After completing these projects, you will be able to:

  1. Read a crash dump and explain exactly what happened: which instruction faulted, what the stack looked like, what memory was corrupted

  2. Profile code and explain why it is slow: whether it is memory-bound, compute-bound, or suffering from branch mispredictions

  3. Audit code for security vulnerabilities, recognizing buffer overflows, integer overflows, and use-after-free bugs from code inspection

  4. Design systems that handle failure gracefully, understanding partial I/O, signal races, and concurrency hazards

  5. Communicate with compilers, operating systems, and hardware: not as black boxes, but as partners whose behavior you can predict and influence


The Big Picture: How Programs Become Running Processes

Before diving into individual concepts, let us see the complete journey a program takes from source code to execution:

                    THE PROGRAM EXECUTION PIPELINE

SOURCE CODE
┌────────────────────────────┐
│ #include <stdio.h>         │
│                            │
│ int main() {               │
│     printf("Hello\n");     │
│     return 0;              │
│ }                          │
└────────────────────────────┘
              │
              ▼
┌──────────────────────────────────────────────────────────┐
│  STAGE 1: PREPROCESSING (cpp)                            │
│  • Expands #include directives                           │
│  • Processes #define macros                              │
│  • Handles conditional compilation (#ifdef)              │
│  • Output: hello.i (expanded C source)                   │
└──────────────────────────────────────────────────────────┘
              │
              ▼
┌──────────────────────────────────────────────────────────┐
│  STAGE 2: COMPILATION (cc1)                              │
│  • Lexical analysis → tokens                             │
│  • Parsing → AST (Abstract Syntax Tree)                  │
│  • Semantic analysis → type checking                     │
│  • Optimization passes                                   │
│  • Code generation                                       │
│  • Output: hello.s (assembly source)                     │
└──────────────────────────────────────────────────────────┘
              │
              ▼
┌──────────────────────────────────────────────────────────┐
│  STAGE 3: ASSEMBLY (as)                                  │
│  • Translates assembly to machine code                   │
│  • Creates relocatable object file                       │
│  • Records symbols and relocations                       │
│  • Output: hello.o (ELF object file)                     │
└──────────────────────────────────────────────────────────┘
              │
              │        ┌───────────────────────────┐
              │        │  LIBRARIES                │
              │        │  libc.a / libc.so         │
              │        │  (printf, malloc, etc.)   │
              │        └─────────────┬─────────────┘
              │                      │
              ▼                      ▼
┌──────────────────────────────────────────────────────────┐
│  STAGE 4: LINKING (ld)                                   │
│  • Symbol resolution (matches references to defs)        │
│  • Relocation (assigns final addresses)                  │
│  • Static: copies library code into executable           │
│  • Dynamic: records dependencies for runtime             │
│  • Output: hello (executable ELF file)                   │
└──────────────────────────────────────────────────────────┘
              │
              ▼
┌──────────────────────────────────────────────────────────┐
│  STAGE 5: LOADING (execve + ld-linux.so)                 │
│  • Kernel reads ELF headers                              │
│  • Creates new process address space                     │
│  • Maps code and data segments                           │
│  • Dynamic linker resolves shared libraries              │
│  • Sets up stack with argc/argv/envp                     │
│  • Transfers control to _start → main()                  │
└──────────────────────────────────────────────────────────┘
              │
              ▼
┌──────────────────────────────────────────────────────────┐
│  STAGE 6: EXECUTION                                      │
│  • CPU fetches, decodes, executes instructions           │
│  • Memory accesses go through cache hierarchy            │
│  • Virtual addresses translated to physical              │
│  • System calls trap to kernel                           │
│  • Signals may interrupt execution                       │
│  • Process terminates, resources cleaned up              │
└──────────────────────────────────────────────────────────┘

Every project in this curriculum touches some part of this pipeline. Project 1 makes the entire pipeline visible. Projects 2-6 focus on data representation and machine code. Projects 7-9 examine the CPU and cache. Projects 10-16 explore the operating system's role. Project 17 integrates everything.


Core Concept Analysis

Think of CS:APP as one story told through eight interconnected concept clusters. Each cluster builds on the previous ones, and mastery requires understanding both the individual concepts and their interactions.

A. Translation & Execution

Book Coverage: Chapters 1, 7

The Central Question: How does human-readable source code become a running process?

┌──────────────────────────────────────────────────────────┐
│               TRANSLATION PIPELINE DETAIL                │
├──────────────────────────────────────────────────────────┤
│                                                          │
│  SOURCE (.c)                                             │
│      │                                                   │
│      │  cpp (C Preprocessor)                             │
│      │  • Text substitution: #include, #define, #ifdef   │
│      ▼                                                   │
│  PREPROCESSED (.i)                                       │
│      │                                                   │
│      │  cc1 (C Compiler)                                 │
│      │  • Lexer → Parser → Type check →                  │
│      │    Optimize → Code gen                            │
│      ▼                                                   │
│  ASSEMBLY (.s)                                           │
│      │                                                   │
│      │  as (Assembler)                                   │
│      │  • Machine code + metadata + relocations          │
│      ▼                                                   │
│  OBJECT FILE (.o)                                        │
│      │  ELF: Header, .text, .data, .bss, .symtab, .rel.* │
│      │                                                   │
│      │  ld (Linker): Symbol resolution + Relocation      │
│      ▼                                                   │
│  EXECUTABLE                                              │
│      │                                                   │
│      │  execve() + ld-linux.so (Loader)                  │
│      ▼                                                   │
│  RUNNING PROCESS                                         │
└──────────────────────────────────────────────────────────┘

Key Insights:

  • Each stage produces a concrete artifact you can inspect
  • Symbol resolution is where โ€œundefined referenceโ€ errors occur
  • Static linking copies code; dynamic linking defers to runtime

Mastery Test: Can you predict what changes when you switch from static to dynamic linking?


B. Data Representation

Book Coverage: Chapter 2

The Central Question: How does the machine represent information, and what happens at the boundaries?

┌──────────────────────────────────────────────────────────────┐
│                    INTEGER REPRESENTATION                    │
├──────────────────────────────────────────────────────────────┤
│  UNSIGNED (8-bit):  0000 0000 ... 1111 1111  =  0 ... 255    │
│  SIGNED (8-bit):    1000 0000 ... 0111 1111  =  -128 ... 127 │
│                                                              │
│  DANGER ZONES:                                               │
│  • Overflow (signed): undefined behavior!                    │
│  • Signed/Unsigned comparison: -1 > 0U is TRUE!              │
└──────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────┐
│                  FLOATING POINT (IEEE 754)                   │
├──────────────────────────────────────────────────────────────┤
│  32-bit: [S|Exponent(8)|Mantissa(23)]                        │
│  Value = (-1)^S × 1.Mantissa × 2^(Exp-127)                   │
│                                                              │
│  Special: ±0, ±Infinity, NaN                                 │
│  WARNING: 0.1 + 0.2 != 0.3                                   │
└──────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────┐
│                        BYTE ORDERING                         │
├──────────────────────────────────────────────────────────────┤
│  0x01234567:  Little-endian (x86): 67 45 23 01               │
│               Big-endian (network): 01 23 45 67              │
└──────────────────────────────────────────────────────────────┘

Mastery Test: Can you predict the output of printf("%d", (int)(unsigned)-1)?


C. Machine-Level Programming

Book Coverage: Chapter 3

The Central Question: How does the compiler translate C into x86-64?

┌──────────────────────────────────────────────────────────────┐
│                     x86-64 REGISTER FILE                     │
├──────────────────────────────────────────────────────────────┤
│  Arguments: %rdi, %rsi, %rdx, %rcx, %r8, %r9                 │
│  Return: %rax                                                │
│  Callee-saved: %rbx, %rbp, %r12-%r15                         │
│  Stack pointer: %rsp                                         │
└──────────────────────────────────────────────────────────────┘

┌────────────────────────────────────────────────────────────────────────────┐
│                             STACK FRAME LAYOUT                             │
├────────────────────────────────────────────────────────────────────────────┤
│  High addr: [Caller's frame] [Args 7+] [Return addr] [Saved regs] [Locals] │
│  Low addr:  ← %rsp                                                         │
└────────────────────────────────────────────────────────────────────────────┘

Mastery Test: Given a crash address, can you walk the stack frames?


D. Architecture & Performance

Book Coverage: Chapters 4, 5

┌──────────────────────────────────────────────────────────────┐
│                        PIPELINED CPU                         │
├──────────────────────────────────────────────────────────────┤
│  5 stages: Fetch → Decode → Execute → Memory → Writeback     │
│  Hazards: Data (RAW), Control (misprediction ~15-20 cycles)  │
│  Optimization: ILP, loop unrolling, reduce dependencies      │
└──────────────────────────────────────────────────────────────┘

E. Memory Hierarchy & Virtual Memory

Book Coverage: Chapters 6, 9

┌────────────────────────────────────────────────────────────────────┐
│                          MEMORY HIERARCHY                          │
├────────────────────────────────────────────────────────────────────┤
│  Registers (~1KB, ~0.25ns) → L1 (32-64KB, ~1ns) → L2 (256KB, ~4ns) │
│  → L3 (8-32MB, ~12ns) → DRAM (8-64GB, ~60ns) → SSD/HDD             │
│                                                                    │
│  Cache: tag|index|offset, exploit temporal & spatial locality      │
│  VM: Virtual → Page Table → Physical, TLB caches translations      │
└────────────────────────────────────────────────────────────────────┘

F. Exceptional Control Flow & Processes

Book Coverage: Chapter 8

┌──────────────────────────────────────────────────────────────────────┐
│                        EXCEPTIONS & PROCESSES                        │
├──────────────────────────────────────────────────────────────────────┤
│  Exceptions: Interrupt (async), Trap (syscall), Fault, Abort         │
│  Processes: fork() → exec() → wait() → exit()                        │
│  Signals: SIGINT, SIGTERM, SIGSEGV, SIGCHLD (handlers must be safe!) │
└──────────────────────────────────────────────────────────────────────┘

G. System I/O & Networking

Book Coverage: Chapters 10, 11

┌──────────────────────────────────────────────────────────────────────┐
│                           I/O & NETWORKING                           │
├──────────────────────────────────────────────────────────────────────┤
│  Unix I/O: "Everything is a file" (FD 0=stdin, 1=stdout, 2=stderr)   │
│  ROBUST I/O: read()/write() may return short counts - always loop!   │
│  Sockets: socket→bind→listen→accept (server), socket→connect (client)│
└──────────────────────────────────────────────────────────────────────┘
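A robust writer in the spirit of the book's RIO package; this writen is a sketch, not the book's exact rio_writen:

```c
#include <errno.h>
#include <unistd.h>

/* Write exactly n bytes, looping over short counts. A single write()
 * may transfer fewer bytes than asked (signals, pipe buffers, socket
 * flow control); a robust writer retries from where it left off. */
ssize_t writen(int fd, const void *buf, size_t n) {
    const char *p = buf;
    size_t left = n;
    while (left > 0) {
        ssize_t nw = write(fd, p, left);
        if (nw < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;          /* real error */
        }
        p += nw;                /* advance past the bytes written */
        left -= nw;
    }
    return (ssize_t)n;
}
```

The same loop structure (retry on EINTR, advance on short counts) applies to a robust reader, with the extra wrinkle that read() returning 0 means end-of-file rather than "try again".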

H. Concurrency

Book Coverage: Chapter 12

┌──────────────────────────────────────────────────────────────────────────┐
│                               CONCURRENCY                                │
├──────────────────────────────────────────────────────────────────────────┤
│  Models: Processes (isolated) | Threads (shared) | I/O multiplex (single)│
│  Race: counter++ is NOT atomic (load-add-store interleaving)             │
│  Sync: mutex, semaphore, condition variable                              │
│  Deadlock: circular wait - prevent with consistent lock ordering         │
└──────────────────────────────────────────────────────────────────────────┘

Process Address Space Layout

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚                    PROCESS ADDRESS SPACE (Linux x86-64)                     โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚  High: KERNEL | STACK (grows โ†“) | mmap region | HEAP (grows โ†‘)              โ”‚
โ”‚  Low:  BSS | DATA | RODATA | TEXT | NULL page                               โ”‚
โ”‚                                                                             โ”‚
โ”‚  Permissions: TEXT=r-x, RODATA=r--, DATA/BSS/HEAP/STACK=rw-                 โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Concept Summary Table

Concept         Key Questions                   Danger Signs
Translation     What does each stage produce?   “Compiled but crashes”
Data Rep        Why -1 > 0U?                    Silent corruption
Machine Code    How does stack grow?            Cannot debug crashes
Architecture    What hazard is this?            Random performance
Memory          Cache hit rate?                 100x slowdown
ECF/Processes   What is a zombie?               Hangs, orphans
I/O/Networking  Short count?                    Data corruption
Concurrency     Race condition?                 Heisenbugs

Deep Dive Reading By Concept

Primary: CS:APP 3rd Ed (Bryant & Oโ€™Hallaron)

Concept         CS:APP     Supplementary
Translation     Ch. 1, 7   Practical Binary Analysis, Low-Level Programming
Data Rep        Ch. 2      Write Great Code Vol. 1, Effective C
Machine Code    Ch. 3      Hacking: The Art of Exploitation
Architecture    Ch. 4-5    Computer Organization and Design
Memory          Ch. 6, 9   OSTEP
ECF/Processes   Ch. 8      The Linux Programming Interface
I/O/Networking  Ch. 10-11  Unix Network Programming
Concurrency     Ch. 12     OSTEP, TLPI

Essential Book List: CS:APP, C Programming: A Modern Approach (King), Effective C (Seacord), OSTEP (free online), The Linux Programming Interface (Kerrisk), Low-Level Programming (Zhirkov)


Overview

The bookโ€™s scope (12 chapters) spans:

Domain                   Topics
Translation & Execution  Preprocessing, compilation, assembly, linking, loading
Data Representation      Bits/bytes, integers, floating point, endianness
Machine-Level Code       x86-64, calling conventions, stack discipline
Architecture             CPU datapaths, pipelining (Y86-64)
Performance              Loop optimization, ILP, branch prediction
Memory Hierarchy         Caches, locality, virtual memory
Operating System         Processes, signals, exceptional control flow
I/O & Networking         File descriptors, sockets, robust I/O
Concurrency              Threads, synchronization, deadlock avoidance

Project Dependency Graph

                                    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
                                    โ”‚  PROJECT 17: CAPSTONE   โ”‚
                                    โ”‚   Secure Proxy Server   โ”‚
                                    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                                                โ”‚
              โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
              โ”‚                                 โ”‚                                 โ”‚
              โ–ผ                                 โ–ผ                                 โ–ผ
    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”               โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”               โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
    โ”‚   P16: Conc.    โ”‚               โ”‚   P15: Unix     โ”‚               โ”‚   P14: Malloc   โ”‚
    โ”‚   Workbench     โ”‚               โ”‚   I/O Toolkit   โ”‚               โ”‚   Allocator     โ”‚
    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜               โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜               โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
             โ”‚                                 โ”‚                                 โ”‚
             โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค                 โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
             โ”‚                                 โ”‚                 โ”‚               โ”‚
             โ–ผ                                 โ–ผ                 โ–ผ               โ–ผ
    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”               โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
    โ”‚   P12: Shell    โ”‚               โ”‚   P11: Signals  โ”‚    โ”‚ P13: VM  โ”‚  โ”‚ P9:Cache โ”‚
    โ”‚   Job Control   โ”‚               โ”‚   + Processes   โ”‚    โ”‚ Visualiz โ”‚  โ”‚ Simulatorโ”‚
    โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜               โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ””โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”˜  โ””โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”˜
             โ”‚                                 โ”‚                  โ”‚             โ”‚
             โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜                  โ”‚             โ”‚
                           โ”‚                                      โ”‚             โ”‚
                           โ–ผ                                      โ–ผ             โ–ผ
              โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
              โ”‚                          P10: ELF Link Map                          โ”‚
              โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                                                   โ”‚
       โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
       โ”‚                                           โ”‚                                           โ”‚
       โ–ผ                                           โ–ผ                                           โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”                      โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”                         โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚   P8: Perf      โ”‚                      โ”‚   P7: Y86-64    โ”‚                         โ”‚   P6: Attack    โ”‚
โ”‚   Clinic        โ”‚                      โ”‚   CPU Simulator โ”‚                         โ”‚   Lab Workflow  โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜                      โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜                         โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
         โ”‚                                        โ”‚                                           โ”‚
         โ”‚                                        โ”‚                                           โ–ผ
         โ”‚                                        โ”‚                                  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
         โ”‚                                        โ”‚                                  โ”‚   P5: Bomb Lab  โ”‚
         โ”‚                                        โ”‚                                  โ”‚   Workflow      โ”‚
         โ”‚                                        โ”‚                                  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
         โ”‚                                        โ”‚                                           โ”‚
         โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
                                                  โ”‚                                           โ”‚
                                                  โ–ผ                                           โ–ผ
                                         โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”                         โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
                                         โ”‚   P3: Data Lab  โ”‚                         โ”‚   P4: Calling   โ”‚
                                         โ”‚   Clone         โ”‚                         โ”‚   Convention    โ”‚
                                         โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜                         โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                                                  โ”‚                                           โ”‚
                                                  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                                                                    โ”‚
                                                                    โ–ผ
                                                           โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
                                                           โ”‚   P2: Bitwise   โ”‚
                                                           โ”‚   Data Inspectorโ”‚
                                                           โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                                                                    โ”‚
                                                                    โ–ผ
                                                           โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
                                                           โ”‚   P1: Toolchain โ”‚
                                                           โ”‚   Explorer      โ”‚
                                                           โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                                                                 START

CS:APP Project Dependency Graph

Recommended Learning Paths:

Path          Focus                           Projects
Core          Essential CS:APP understanding  P1 → P2 → P4 → P11 → P12
Security      Exploitation & defense          P1 → P2 → P4 → P5 → P6
Architecture  CPU internals                   P1 → P2 → P3 → P7
Performance   Optimization mastery            P1 → P2 → P8 → P9
Systems       Full systems programmer         P1 → P2 → P4 → P11 → P12 → P15 → P16
Complete      Everything                      P1 through P17

Progress Tracker

Use this checklist to track your journey:

Phase 1: Foundation
[ ] P1  - Hello, Toolchain โ€” Build Pipeline Explorer
[ ] P2  - Bitwise Data Inspector

Phase 2: Machine-Level Mastery
[ ] P3  - Data Lab Clone
[ ] P4  - x86-64 Calling Convention Crash Cart
[ ] P5  - Bomb Lab Workflow
[ ] P6  - Attack Lab Workflow

Phase 3: Architecture & Performance
[ ] P7  - Y86-64 CPU Simulator
[ ] P8  - Performance Clinic
[ ] P9  - Cache Lab++ Simulator

Phase 4: Systems Programming
[ ] P10 - ELF Link Map & Interposition
[ ] P11 - Signals + Processes Sandbox
[ ] P12 - Unix Shell with Job Control
[ ] P13 - Virtual Memory Map Visualizer
[ ] P14 - Build Your Own Malloc
[ ] P15 - Robust Unix I/O Toolkit
[ ] P16 - Concurrency Workbench

Phase 5: Capstone
[ ] P17 - CS:APP Capstone Proxy Platform

Phase 6: Beyond CS:APP (Advanced Extensions)
[ ] P18 - ELF Linker and Loader
[ ] P19 - Virtual Memory Simulator
[ ] P20 - HTTP Web Server
[ ] P21 - Thread Pool Implementation
[ ] P22 - Signal-Safe Printf
[ ] P23 - Performance Profiler
[ ] P24 - Memory Leak Detector
[ ] P25 - Debugger (ptrace-based)
[ ] P26 - Operating System Kernel Capstone

Projects

Phase 1: Foundation (Start Here)


Project 1: โ€œHello, Toolchainโ€ โ€” Build Pipeline Explorer

Attribute        Value
Language         C (alt: Rust, Zig, C++)
Difficulty       Intermediate
Time             1–2 weeks
Chapters         1, 7
Coolness         ★★★☆☆ Genuinely Clever
Portfolio Value  Resume Gold

What youโ€™ll build: A CLI โ€œpipeline explainerโ€ that takes one small C program and produces a structured report for each stage (preprocessed C, assembly, object metadata, linked binary metadata) plus runtime observations.

Why it matters: Chapter 1 is about seeing the system as a whole; this forces you to observe every transformation and artifact, not just โ€œrun gcc and hope.โ€

Core challenges:

  • Capturing each compilation artifact deterministically (translation stages)
  • Explaining symbol tables/sections in human terms (executable structure)
  • Relating runtime behavior to the produced binary (loading + process execution)

Key concepts to master:

  • Translation system (Ch. 1)
  • Object file anatomy: sections, symbols (Ch. 7)
  • Error-handling discipline (Appendix)

Prerequisites: Basic C, comfort with build tools, basic debugging literacy.

Deliverable: A single report explaining โ€œwhat the compiler produced, what the linker stitched, and what the process looks like at runtime.โ€

Implementation hints:

  • Treat this as a report generator, not a toy script
  • Output must include: section list, symbol count by kind, stack/heap locations at runtime (via debugger)

Milestones:

  1. You can explain each pipeline stage using artifacts you produced
  2. You can predict changes between static vs dynamic linking
  3. You can map a crash address back to the right stage (source vs asm vs binary)

Real World Outcome

When complete, you will have a CLI tool that produces comprehensive build pipeline analysis:

$ ./pipeline-explorer hello.c --all

================================================================================
                    BUILD PIPELINE ANALYSIS: hello.c
================================================================================

[STAGE 1: PREPROCESSING]
--------------------------------------------------------------------------------
Input:  hello.c (45 bytes, 5 lines)
Output: hello.i (18,432 bytes, 847 lines)
Time:   0.003s

Preprocessing transformations:
  - #include <stdio.h> expanded: +842 lines from /usr/include/stdio.h
  - Header chain: stdio.h -> stddef.h -> bits/types.h -> ...
  - Macros defined: 127 (from system headers)
  - Macros used in source: 0
  - Conditional compilation: 23 #ifdef blocks evaluated

[STAGE 2: COMPILATION]
--------------------------------------------------------------------------------
Input:  hello.i (18,432 bytes)
Output: hello.s (512 bytes, 28 lines)
Time:   0.012s

Assembly characteristics:
  - Target: x86-64 (AT&T syntax)
  - Functions generated: 1 (main)
  - Instructions: 14
  - String literals: 1 ("Hello, World!\n")
  - Section directives: .text, .rodata, .note.GNU-stack

Code generation summary:
  - Stack frame: 16 bytes (aligned)
  - Callee-saved registers used: none
  - External calls: puts@PLT

[STAGE 3: ASSEMBLY]
--------------------------------------------------------------------------------
Input:  hello.s (512 bytes)
Output: hello.o (1,688 bytes)
Time:   0.002s

Object file analysis:
  Section        Size    Type           Flags
  .text          26      PROGBITS       AX (alloc, execute)
  .rodata        15      PROGBITS       A  (alloc)
  .comment       46      PROGBITS       MS (merge, strings)
  .note.GNU-s    0       NOBITS         -
  .eh_frame      56      PROGBITS       A  (alloc)

Symbol table (4 entries):
  Symbol         Type    Bind    Section   Value
  main           FUNC    GLOBAL  .text     0x0
  puts           NOTYPE  GLOBAL  UND       0x0 (undefined - needs linking)

Relocations (2 entries):
  Offset    Type              Symbol    Addend
  0x0a      R_X86_64_PC32     .rodata   -4
  0x0f      R_X86_64_PLT32    puts      -4

[STAGE 4: LINKING]
--------------------------------------------------------------------------------
Input:  hello.o + libc
Output: hello (16,696 bytes)
Time:   0.024s

Linking type: Dynamic
Interpreter: /lib64/ld-linux-x86-64.so.2

Linked binary analysis:
  Section          VMA              Size    Type
  .interp          0x0000000000400318   28    interpreter path
  .text            0x0000000000401040   147   executable code
  .rodata          0x0000000000402000   19    read-only data
  .dynamic         0x0000000000403e10   480   dynamic linking info
  .got.plt         0x0000000000404000   32    GOT for PLT
  .data            0x0000000000404020   0     initialized data
  .bss             0x0000000000404020   0     uninitialized data

Symbol resolution:
  - puts: resolved via PLT/GOT (lazy binding)
  - __libc_start_main: resolved via PLT/GOT
  - Dynamic libraries required: libc.so.6

Entry point: 0x401040 (_start, not main!)

[STAGE 5: RUNTIME OBSERVATION]
--------------------------------------------------------------------------------
Process memory map at main() entry:

Address Range                    Perms  Size    Mapping
0x00400000-0x00401000           r--p   4K      hello (ELF header)
0x00401000-0x00402000           r-xp   4K      hello (.text)
0x00402000-0x00403000           r--p   4K      hello (.rodata)
0x00403000-0x00405000           rw-p   8K      hello (.data, .bss, .got)
0x7ffff7c00000-0x7ffff7c28000   r--p   160K    libc.so.6
0x7ffff7c28000-0x7ffff7dbd000   r-xp   1620K   libc.so.6 (.text)
0x7ffff7fc3000-0x7ffff7fc7000   r--p   16K     ld-linux-x86-64.so.2
0x7ffffffde000-0x7ffffffff000   rw-p   132K    [stack]

Stack frame at main():
  RSP: 0x7fffffffe3d0
  RBP: 0x7fffffffe3e0
  Return address: 0x7ffff7c29d90 (__libc_start_call_main+128)
  argc: 1
  argv[0]: "./hello"

================================================================================
PIPELINE SUMMARY
================================================================================
Total build time: 0.041s
Size amplification: 45 bytes (source) -> 16,696 bytes (binary) = 371x
Symbol resolution: 2 external symbols resolved dynamically
Recommendation: Use -static for deployment, dynamic for development

The tool can also produce focused reports:

$ ./pipeline-explorer hello.c --symbols
$ ./pipeline-explorer hello.c --relocations
$ ./pipeline-explorer hello.c --compare-linking  # static vs dynamic comparison
$ ./pipeline-explorer hello.c --trace-symbol puts  # full resolution chain for one symbol

The Core Question Youโ€™re Answering

โ€œWhat exactly happens between typing gcc hello.c and having a running process, and why does each transformation exist?โ€

This question forces you to confront the reality that compilation is not magic - it is a deterministic pipeline where each stage produces artifacts that the next stage consumes. Understanding this pipeline is the foundation for debugging linker errors, understanding security vulnerabilities, optimizing build times, and reasoning about what code actually executes.

Concepts You Must Understand First

  1. The Translation Pipeline (Preprocessing, Compilation, Assembly, Linking)
    • What is the output of each stage and what format does it take?
    • Why does preprocessing happen before compilation?
    • What would break if you skipped the assembly stage and went directly from compiler output to object file?
    • CS:APP Ch. 1.2, Ch. 7.1-7.2
  2. Object Files and ELF Format
    • What are sections and why do .text, .data, .rodata, and .bss exist as separate concepts?
    • What is a symbol table and why does it contain both defined and undefined symbols?
    • What is a relocation entry and why canโ€™t the assembler resolve all addresses itself?
    • CS:APP Ch. 7.3-7.4
  3. Symbol Resolution and Linking
    • How does the linker decide which definition to use when multiple object files define the same symbol?
    • What is the difference between strong and weak symbols?
    • Why do static libraries and dynamic libraries resolve symbols differently?
    • CS:APP Ch. 7.5-7.7
  4. Loading and Process Creation
    • What does the loader do with the ELF file before main() runs?
    • Where do the various segments end up in virtual memory?
    • What is the role of the dynamic linker (ld-linux.so)?
    • CS:APP Ch. 7.9, Ch. 8.2
  5. Compilation and Code Generation
    • What decisions does the compiler make when translating C to assembly?
    • How do optimization levels affect the generated code?
    • What information is lost during compilation that cannot be recovered?
    • CS:APP Ch. 1.2, Ch. 3.1-3.2

Questions to Guide Your Design

  1. How will you invoke each stage of the pipeline separately? (Hint: gcc -E, gcc -S, gcc -c, gcc)

  2. How will you parse the output of tools like objdump, readelf, and nm to extract structured information?

  3. What format will your report take - plain text, JSON, or both? How will you handle reports that need to show binary data?

  4. How will you capture runtime information? Will you use GDB scripting, ptrace, or /proc filesystem parsing?

  5. How will you handle error cases - what if compilation fails? What if the input is not valid C?

  6. How will you make the tool educational? Should it explain why each transformation happened, not just what changed?

  7. How will you compare static vs dynamic linking? What metrics are meaningful to show?

Thinking Exercise

Before writing any code, trace through this program by hand:

// main.c
extern int helper(int x);
int global_var = 42;

int main(void) {
    return helper(global_var);
}

// helper.c
int helper(int x) {
    return x + 1;
}

Answer these questions on paper:

  1. Preprocessing phase: What will main.i look like? Will it be different from main.c in any meaningful way for this example?

  2. Symbol table for main.o: List every symbol. For each one, state:
    • Name
    • Type (FUNC, OBJECT, NOTYPE)
    • Binding (LOCAL, GLOBAL)
    • Section (which section, or UND if undefined)
  3. Relocations in main.o: There will be at least two relocations. What are they and why?

  4. Linking main.o + helper.o: Draw the combined symbol table. Which symbols from main.o were undefined before linking but defined after?

  5. Memory layout after loading: If the .text section of the final binary starts at 0x401000, and main is at offset 0x20 within .text, what is the absolute address of main?

  6. Dynamic linking alternative: If helper() were in a shared library instead of helper.o, what would be different about:
    • The symbol table
    • The relocations
    • The PLT/GOT sections
    • The runtime behavior on first call to helper()

The Interview Questions Theyโ€™ll Ask

  1. “Walk me through what happens when you run gcc -o hello hello.c”
    • They want: preprocessing expands includes/macros, compiler generates assembly, assembler creates object file with relocations, linker resolves symbols and creates executable
    • Bonus: mention that ld.so loads dynamic dependencies at runtime
  2. โ€œWhatโ€™s the difference between a linker error and a compiler error?โ€
    • They want: compiler errors are syntax/type errors in a single translation unit; linker errors are symbol resolution failures across multiple object files
    • Example: undefined reference vs undeclared identifier
  3. โ€œExplain static vs dynamic linking and when youโ€™d use eachโ€
    • They want: static bundles everything (larger binary, no dependencies, faster startup), dynamic shares libraries (smaller binary, security updates, slower first-call)
    • Discuss: deployment scenarios, licensing implications (LGPL)
  4. โ€œWhat is Position Independent Code (PIC) and why is it needed?โ€
    • They want: code that works regardless of load address, required for shared libraries (ASLR), uses PC-relative addressing and GOT/PLT
  5. โ€œHow would you debug a โ€˜symbol not foundโ€™ error at runtime?โ€
    • They want: ldd to check dependencies, LD_DEBUG=all to trace resolution, readelf/nm to inspect symbol tables, verify library paths
  6. โ€œWhatโ€™s in an ELF file and how does the loader use it?โ€
    • They want: ELF header, program headers (segments for loading), section headers (for linking/debugging), symbol/string tables, relocation entries

Hints in Layers

Hint 1 - Getting Started: Start by manually running each stage and saving the outputs:

gcc -E hello.c -o hello.i      # Preprocess only
gcc -S hello.c -o hello.s      # Compile to assembly
gcc -c hello.c -o hello.o      # Assemble to object file
gcc hello.o -o hello           # Link to executable

Look at each output file. What tools can parse them? (file, cat, objdump, readelf, nm)

Hint 2 - Extracting Object File Information: These commands give you structured output you can parse:

readelf -h hello.o      # ELF header
readelf -S hello.o      # Section headers
readelf -s hello.o      # Symbol table
readelf -r hello.o      # Relocations
objdump -d hello.o      # Disassembly

Consider using readelf --wide for easier parsing.

Hint 3 - Capturing Runtime Information: For the runtime stage, you can use GDB non-interactively:

gdb -batch -ex "break main" -ex "run" -ex "info registers" -ex "x/20x \$rsp" ./hello

Or parse /proc/[pid]/maps from a wrapper program.

Hint 4 - Comparing Linking Strategies: Build both versions and compare:

gcc -o hello_dynamic hello.c
gcc -static -o hello_static hello.c
ls -l hello_dynamic hello_static
ldd hello_dynamic
readelf -d hello_dynamic | grep NEEDED

Hint 5 - Tool Architecture: Structure your code as:

struct stage_result {
    char *stage_name;
    char *input_file;
    char *output_file;
    size_t input_size;
    size_t output_size;
    double elapsed_time;
    /* stage-specific data */
};

struct preprocess_result { int lines_added; int macros_expanded; ... };
struct compile_result { int instructions; int functions; ... };
struct assemble_result { struct section *sections; struct symbol *symbols; ... };
struct link_result { struct segment *segments; char *entry_point; ... };

Hint 6 - The Educational Value: Donโ€™t just report numbers - explain them:

The symbol 'puts' appears in hello.o with type NOTYPE and section UND (undefined).
This means the assembler encountered a call to puts() but has no idea where it is.
The relocation entry at offset 0x0f tells the linker: "When you find puts,
patch this location with the correct address."

After linking, puts is still not directly resolved - instead, the linker created
a PLT entry at 0x401030 and a GOT slot at 0x404018. The first call to puts()
will trigger the dynamic linker to fill in the GOT slot.

Books That Will Help

Topic                                   Book                                          Chapter
The compilation pipeline overview       Computer Systems: A Programmer’s Perspective  Ch. 1 (A Tour of Computer Systems)
Object files, symbols, and relocations  Computer Systems: A Programmer’s Perspective  Ch. 7 (Linking)
ELF format deep dive                    Practical Binary Analysis                     Ch. 2 (ELF Format)
Static and dynamic linking              Computer Systems: A Programmer’s Perspective  Ch. 7.6-7.7
Position-independent code and GOT/PLT   Computer Systems: A Programmer’s Perspective  Ch. 7.12
The C compilation model                 The C Programming Language (K&R)              Ch. 4 (Functions and Program Structure)
Separate compilation in C               C Programming: A Modern Approach              Ch. 15 (Writing Large Programs)
x86-64 assembly basics                  Computer Systems: A Programmer’s Perspective  Ch. 3.1-3.4
Process loading and execution           Computer Systems: A Programmer’s Perspective  Ch. 7.9, Ch. 8.2
Low-level executable analysis           Low-Level Programming                         Ch. 3-4 (Assembly and Linking)

Project 2: Bitwise Data Inspector

Attribute        Value
Language         C (alt: Rust, Zig, C++)
Difficulty       Intermediate
Time             Weekend–2 weeks
Chapters         2, 3
Coolness         ★★★☆☆ Genuinely Clever
Portfolio Value  Micro-SaaS/Pro Tool

What youโ€™ll build: A CLI that prints the byte-level representation of values (signed/unsigned integers and IEEE-754 floats), including inferred endianness and derived interpretations.

Why it matters: Chapter 2 becomes โ€œmuscle memoryโ€ only when you can see representations and predict overflow, truncation, and rounding.

Core challenges:

  • Correct sign extension, shifts, and casts (twoโ€™s complement)
  • Float field extraction and classification (IEEE-754)
  • Tests that catch edge-case mistakes (disciplined reasoning)

Key concepts to master:

  • Integer representations and overflow (Ch. 2)
  • Floating point, rounding, NaN/Inf (Ch. 2)
  • Data sizes and alignment (Ch. 3)

Prerequisites: Basic C operators, binary/hex comfort.

Deliverable: Paste a number; get โ€œwhat the machine storesโ€ plus why comparisons/overflows surprise people.

Implementation hints:

  • Separate parsing, bit extraction, and formatting as distinct modules
  • Make the tool explain why a conversion changed value (range, rounding, NaN propagation)

Milestones:

  1. You can predict overflow and signed/unsigned comparison outcomes
  2. You can explain subnormals and NaN behavior with your own examples
  3. You start trusting bit evidence over intuition

Real World Outcome

When complete, you will have a CLI tool that reveals the hidden bit-level truth behind numbers:

$ ./bitwise-inspector 42

================================================================================
                    BITWISE DATA INSPECTION: 42
================================================================================

[INTEGER INTERPRETATIONS]
--------------------------------------------------------------------------------
Input parsed as: decimal integer

As unsigned integers:
  uint8_t:   42  (0x2A)     Binary: 00101010
  uint16_t:  42  (0x002A)   Binary: 00000000 00101010
  uint32_t:  42  (0x0000002A)
  uint64_t:  42  (0x000000000000002A)

As signed integers (two's complement):
  int8_t:    42  (0x2A)     Binary: 00101010
  int16_t:   42  (0x002A)   Sign bit: 0 (positive)
  int32_t:   42  (0x0000002A)
  int64_t:   42  (0x000000000000002A)

Memory layout (little-endian system):
  Address:  [0]  [1]  [2]  [3]
  uint32:   2A   00   00   00

[OVERFLOW ANALYSIS]
--------------------------------------------------------------------------------
  42 + 200 as uint8_t = 242 (no overflow, still fits)
  42 + 200 as int8_t  = -14 (OVERFLOW! Wraps negative)
    Binary: 00101010 + 11001000 = 11110010 = -14 (two's complement)

[COMPARISON TRAPS]
--------------------------------------------------------------------------------
Warning: Signed/unsigned comparison hazards:
  (int8_t)42 > (uint8_t)200 is FALSE (both promote to int: 42 < 200)
  But: -1 > 200U might surprise you!
    -1 converted to unsigned int = 0xFFFFFFFF = 4294967295
    Comparison: 4294967295 > 200 = TRUE (usual arithmetic conversions)

$ ./bitwise-inspector -f 0.1

================================================================================
                    BITWISE DATA INSPECTION: 0.1 (float)
================================================================================

[IEEE-754 SINGLE PRECISION (32-bit)]
--------------------------------------------------------------------------------
Hex representation: 0x3DCCCCCD
Binary: 0 01111011 10011001100110011001101
        ^ ^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^
        |    |              |
        |    |              +-- Mantissa (23 bits): 1.10011001100110011001101
        |    +-- Exponent (8 bits): 123 - 127 = -4
        +-- Sign bit: 0 (positive)

Value computation:
  (-1)^0 * 1.10011001100110011001101 * 2^(-4)
  = 1 * 1.60000002384185791015625 * 0.0625
  = 0.10000000149011611938476562

PRECISION LOSS: You asked for 0.1, but got 0.10000000149011612
  Error: +1.49e-09 (relative error: 1.49e-08)

[IEEE-754 DOUBLE PRECISION (64-bit)]
--------------------------------------------------------------------------------
Hex representation: 0x3FB999999999999A
Binary: 0 01111111011 1001100110011001100110011001100110011001100110011010

Value: 0.10000000000000000555111512312578270211815834045410156250

PRECISION LOSS: Error from exact 0.1 = +5.55e-18

[WHY 0.1 CANNOT BE EXACT]
--------------------------------------------------------------------------------
0.1 in binary is a repeating fraction: 0.0001100110011001100110011...
Just like 1/3 = 0.333... in decimal, 1/10 is infinite in binary.
IEEE-754 truncates this, causing the representation error.

$ ./bitwise-inspector -f inf

[IEEE-754 SPECIAL VALUES]
--------------------------------------------------------------------------------
+Infinity (float):  0x7F800000  Binary: 0 11111111 00000000000000000000000
-Infinity (float):  0xFF800000  Binary: 1 11111111 00000000000000000000000
+NaN (quiet):       0x7FC00000  Binary: 0 11111111 10000000000000000000000
-0.0 (float):       0x80000000  Binary: 1 00000000 00000000000000000000000

NaN behavior:
  NaN == NaN is FALSE (NaN is not equal to anything, including itself)
  NaN != NaN is TRUE
  isnan(NaN) is TRUE

$ ./bitwise-inspector --edge-cases

[CRITICAL EDGE CASES TO REMEMBER]
--------------------------------------------------------------------------------
INT_MIN negation trap:
  -INT_MIN = INT_MIN = -2147483648 (NOT +2147483648!)
  Because +2147483648 cannot fit in int32_t (and negating INT_MIN is UB)

Signed overflow is UNDEFINED BEHAVIOR in C:
  INT_MAX + 1 = undefined (compiler may assume it never happens)
  Unsigned overflow is well-defined: wraps modulo 2^N (UINT_MAX + 1 = 0)

Float comparison epsilon:
  0.1 + 0.2 == 0.3 is FALSE
  |0.1 + 0.2 - 0.3| < epsilon is the correct approach

The Core Question You're Answering

"How does the machine actually store and manipulate numbers, and why do programmers keep getting bitten by edge cases they thought they understood?"

This project forces you to move from "I know two's complement exists" to "I can predict exactly which bit pattern will result from any operation." This is the foundation for understanding buffer overflows, integer vulnerabilities, floating-point precision issues in financial software, and why certain optimizations are (un)safe.

Concepts You Must Understand First

  1. Two's Complement Integer Representation
    • How do you convert a negative number to its two's complement representation?
    • What is the range of an N-bit two's complement integer?
    • Why is there one more negative number than positive?
    • How does negation work in two's complement? When does it fail?
    • CS:APP Ch. 2.2 (Integer Representations)
  2. Unsigned vs Signed Integer Operations
    • What happens when you cast a negative signed integer to unsigned?
    • What is "sign extension" and when does it occur?
    • How does C handle mixed signed/unsigned comparisons?
    • What is the difference between arithmetic and logical right shift?
    • CS:APP Ch. 2.2-2.3
  3. Integer Overflow and Undefined Behavior
    • What happens when signed overflow occurs in C? (Hint: undefined behavior)
    • What happens when unsigned overflow occurs? (Hint: well-defined wraparound)
    • How can compilers exploit undefined behavior for optimization?
    • CS:APP Ch. 2.3, Effective C Ch. 5
  4. IEEE-754 Floating Point Format
    • What are the three components of an IEEE-754 number (sign, exponent, mantissa)?
    • What is the "bias" in the exponent field? Why is it needed?
    • What is the implicit leading 1 in normalized numbers?
    • What are denormalized (subnormal) numbers and when do they occur?
    • CS:APP Ch. 2.4 (Floating Point)
  5. Special Floating Point Values
    • How are infinity, negative infinity, and NaN represented?
    • What operations produce NaN? What operations produce infinity?
    • Why is NaN != NaN true? How do you test for NaN?
    • What is negative zero and how does it differ from positive zero?
    • CS:APP Ch. 2.4.3-2.4.6
  6. Endianness and Memory Layout
    • What is big-endian vs little-endian?
    • How do you determine the endianness of your system?
    • How does endianness affect multi-byte integer storage?
    • CS:APP Ch. 2.1.3

Questions to Guide Your Design

  1. How will you parse different input formats (decimal, hex, binary, float literals)?

  2. How will you handle type specification - should the user specify int32 vs int64, or infer it?

  3. How will you display bit patterns - raw binary, grouped bytes, or both?

  4. How will you demonstrate overflow - show the computation, or just the result?

  5. How will you extract IEEE-754 fields - bit masking, unions, or memcpy?

  6. How will you make the output educational - just facts, or explanations of why?

  7. How will you handle invalid input or edge cases like NaN input?

Thinking Exercise

Before writing any code, work through these by hand:

// Exercise 1: Integer representation
int8_t a = -1;
uint8_t b = a;
// Question: What is the value of b? Draw the bit pattern.

// Exercise 2: Sign extension
int8_t x = -5;
int32_t y = x;
// Question: What bit pattern is y? How many 1s are in its binary representation?

// Exercise 3: Overflow
int8_t m = 127;
int8_t n = m + 1;
// Question: What is n? Is this defined behavior?

uint8_t p = 255;
uint8_t q = p + 1;
// Question: What is q? Is this defined behavior?

// Exercise 4: Signed/unsigned comparison
int x = -1;
unsigned int y = 1;
if (x < y) printf("x < y\n");
else printf("x >= y\n");
// Question: What prints and why?

// Exercise 5: Float representation
// Convert 12.375 to IEEE-754 single precision by hand:
// Step 1: Convert to binary: 12.375 = ?
// Step 2: Normalize: 1.??? x 2^?
// Step 3: Calculate biased exponent: ? + 127 = ?
// Step 4: Write final bit pattern: ? ? ?

// Exercise 6: Float precision
float a = 0.1f;
float b = 0.2f;
float c = 0.3f;
// Question: Is (a + b == c) true or false? What are the actual bit patterns?

The Interview Questions They'll Ask

  1. "Explain two's complement and why we use it instead of sign-magnitude"
    • They want: addition works the same for signed/unsigned, only one zero, simple negation (flip bits + 1), hardware efficiency
    • Know: the asymmetry (-128 to 127 for int8_t)
  2. "What happens when you compare a signed and unsigned integer in C?"
    • They want: the signed operand is converted to unsigned, which can make -1 > 1 true
    • Bonus: explain that this is a common source of security vulnerabilities
  3. "Why can't 0.1 be represented exactly in floating point?"
    • They want: 0.1 is a repeating fraction in binary, IEEE-754 has finite precision
    • Know: never compare floats with ==, use epsilon comparison
  4. "What is undefined behavior and why does signed overflow cause it?"
    • They want: the compiler can assume UB never happens, which enables optimizations
    • Example: if (x + 1 > x) can be optimized to if (true) because signed overflow is UB
  5. "How would you detect if an integer addition will overflow before it happens?"
    • They want: for signed, check if the signs match and the result sign differs; for unsigned, check if result < either operand
    • Bonus: mention compiler built-ins like __builtin_add_overflow
  6. "Explain denormalized floating point numbers"
    • They want: gradual underflow, implicit leading 0 instead of 1, fills the gap between 0 and the smallest normalized value
    • Know: they have reduced precision but prevent abrupt underflow to zero

Hints in Layers

Hint 1 - Getting Started: Start with integer display. Use a union or memcpy to view raw bytes:

#include <stdio.h>
#include <stddef.h>

void show_bytes(const void *ptr, size_t len) {
    const unsigned char *bytes = ptr;
    for (size_t i = 0; i < len; i++) {
        printf("%02x ", bytes[i]);
    }
    printf("\n");
}

int main(void) {
    int x = -1;
    show_bytes(&x, sizeof(x));  // ff ff ff ff on little-endian
    return 0;
}

Hint 2 - Extracting IEEE-754 Fields: Use bit manipulation to extract sign, exponent, and mantissa:

typedef union {
    float f;
    uint32_t u;
} float_bits;

void decompose_float(float f) {
    float_bits fb = { .f = f };
    uint32_t sign = (fb.u >> 31) & 1;
    uint32_t exponent = (fb.u >> 23) & 0xFF;
    uint32_t mantissa = fb.u & 0x7FFFFF;

    int actual_exp = exponent - 127;  // Remove bias
    printf("Sign: %u, Exp: %d (biased: %u), Mantissa: 0x%06X\n",
           sign, actual_exp, exponent, mantissa);
}

Hint 3 - Detecting Overflow: For unsigned addition, overflow occurred if result < either operand:

int unsigned_add_overflows(unsigned a, unsigned b) {
    return (a + b) < a;
}

// For signed, use compiler built-ins (GCC/Clang) or check manually:
int signed_add_overflows(int a, int b) {
    return __builtin_add_overflow(a, b, &(int){0});
}

Hint 4 - Printing Binary: Create a helper to print any integer as binary with grouping:

void print_binary(uint64_t val, int bits) {
    for (int i = bits - 1; i >= 0; i--) {
        printf("%c", (val >> i) & 1 ? '1' : '0');
        if (i > 0 && i % 8 == 0) printf(" ");
    }
    printf("\n");
}

Hint 5 - Special Float Detection: Detect special values using the bit pattern:

int is_nan(float f) {
    float_bits fb = { .f = f };
    uint32_t exp = (fb.u >> 23) & 0xFF;
    uint32_t mantissa = fb.u & 0x7FFFFF;
    return exp == 255 && mantissa != 0;
}

int is_infinity(float f) {
    float_bits fb = { .f = f };
    uint32_t exp = (fb.u >> 23) & 0xFF;
    uint32_t mantissa = fb.u & 0x7FFFFF;
    return exp == 255 && mantissa == 0;
}

int is_denormalized(float f) {
    float_bits fb = { .f = f };
    uint32_t exp = (fb.u >> 23) & 0xFF;
    return exp == 0 && f != 0.0f;
}

Hint 6 - Tool Structure: Organize your tool with clear separation:

// parser.c - parse input strings to values
// integer.c - integer analysis and display
// float.c - IEEE-754 analysis and display
// display.c - formatted output

struct inspection_result {
    enum { INT_TYPE, FLOAT_TYPE } type;
    union {
        struct {
            int64_t signed_val;
            uint64_t unsigned_val;
            int bits;
        } integer;
        struct {
            double value;
            int precision;  // 32 or 64
        } floating;
    } data;
};

Books That Will Help

Topic Book Chapter
Integer representations (two's complement) Computer Systems: A Programmer's Perspective Ch. 2.2 Integer Representations
Integer arithmetic and overflow Computer Systems: A Programmer's Perspective Ch. 2.3 Integer Arithmetic
IEEE-754 floating point Computer Systems: A Programmer's Perspective Ch. 2.4 Floating Point
Bit manipulation techniques The C Programming Language (K&R) Ch. 2.9 Bitwise Operators
Safe integer operations Effective C Ch. 5 Integer Security
Undefined behavior Effective C Ch. 2 Objects, Functions, Types
Data representation overview Write Great Code Vol. 1 Ch. 2-4 (Numeric Representation)
C type conversion rules C Programming: A Modern Approach Ch. 7 Basic Types
Pointer and integer relationships Understanding and Using C Pointers Ch. 4 Pointers and Arrays
Low-level data representation Low-Level Programming Ch. 2 Assembly Language and Computer Architecture

Phase 2: Machine-Level Mastery


Project 3: Data Lab Clone

Attribute Value
Language C (alt: Rust, Zig, C++)
Difficulty Advanced
Time 1โ€“2 weeks
Chapters 2
Coolness ★★☆☆☆ Practical
Portfolio Value Resume Gold

What you'll build: A framework that enforces restricted operator sets for exercises (e.g., only bitwise ops), runs randomized tests, and produces a scoreboard.

Why it matters: The restriction forces hardware-style thinking; the harness forces correctness under edge cases.

Core challenges:

  • Enforcing restrictions mechanically (operator semantics)
  • Property-based/randomized testing for corner cases (representation edge behavior)
  • Producing clear failure explanations (debugging discipline)

Key concepts to master:

  • Bit-level operator reasoning (Ch. 2)
  • Undefined/implementation-defined behavior awareness (Effective C reference)
  • Test-oracle thinking (Appendix)

Prerequisites: Solid C, comfort writing tests.

Deliverable: A repeatable, automated way to prove you can do "bit-twiddling under constraints" correctly.

Implementation hints:

  • Make restrictions mechanical (scan source for disallowed tokens)
  • Include adversarial values (min/max, boundaries, NaNs) in tests

Milestones:

  1. You derive bit identities without trial-and-error
  2. You can explain every failing case without "mystery"
  3. Your constraints prevent cheating, not just discourage it

Real World Outcome

When complete, you will have a testing framework that enforces bit-manipulation constraints:

$ ./datalab-runner puzzles/bitAnd.c

================================================================================
                    DATA LAB CLONE - PUZZLE VALIDATOR
================================================================================

[PUZZLE: bitAnd]
--------------------------------------------------------------------------------
Task: Compute x & y using only ~ and |
Allowed operators: ~ |
Max operations: 8
Your solution uses: 4 operations

[RESTRICTION CHECK]
--------------------------------------------------------------------------------
Scanning source for disallowed operators...
  Line 5: Found '~' - ALLOWED
  Line 5: Found '|' - ALLOWED
PASS: No disallowed operators found

[CORRECTNESS TESTS]
--------------------------------------------------------------------------------
Running exhaustive test for 8-bit inputs (65536 combinations)...
  65536/65536 tests passed

Running random 32-bit tests (10000 iterations)...
  10000/10000 tests passed

[RESULT: PASS]
================================================================================
Score: 4/4 (4 ops used, max 8 allowed)

$ ./datalab-runner --scoreboard

================================================================================
                         DATA LAB SCOREBOARD
================================================================================

Puzzle              Status    Ops Used    Max Ops    Score
--------------------------------------------------------------------------------
bitAnd              PASS           4          8       2.0
bitXor              PASS           7          8       1.5
isZero              PASS           2          2       2.0
addOK               FAIL           -         20       0.0
  -> Failed: addOK(0x7FFFFFFF, 1) expected 0, got 1

Total Score: 5.5 / 8.0

The Core Question You're Answering

"Can you think like the hardware - expressing computation using only the primitive operations a CPU actually has, while guaranteeing correctness for every possible input?"

Concepts You Must Understand First

  1. Boolean Algebra and Logic Gates
    • How can you express AND using only OR and NOT? (De Morgan's Laws)
    • CS:APP Ch. 2.1.6-2.1.8
  2. Bitwise Operations in C
    • What is the difference between & and &&? Between | and ||?
    • What is the difference between arithmetic and logical right shift?
    • CS:APP Ch. 2.1.6-2.1.8, K&R Ch. 2.9
  3. Two's Complement Arithmetic
    • How can you detect overflow using only bitwise operations?
    • What is the relationship between ~x and -x-1?
    • CS:APP Ch. 2.2-2.3
  4. Bit Manipulation Patterns
    • How do you create a mask with the lowest N bits set?
    • How do you extract a field of bits from a value?
    • Hacker's Delight Ch. 2

Questions to Guide Your Design

  1. How will you detect disallowed operators - regex, parsing, or AST analysis?
  2. What test strategy will you use - exhaustive for small inputs, random for large?
  3. How will you score solutions - just pass/fail, or reward minimal operator usage?

Thinking Exercise

Solve these puzzles by hand:

Puzzle 1: bitAnd(x, y) - Compute x & y using only ~ and |

De Morgan: a & b = ~(~a | ~b)

Puzzle 2: isNegative(x) - Return 1 if x < 0. Use only >> and &

Hint: (x >> 31) & 1 for 32-bit integers (arithmetic right shift copies the sign bit)

The Interview Questions They'll Ask

  1. "Implement XOR using only AND, OR, and NOT" - x ^ y = (x & ~y) | (~x & y)
  2. "How do you detect if adding two integers will overflow?" - Check if the signs match but the result sign differs
  3. "What is the fastest way to check if a number is a power of 2?" - x && !(x & (x-1))
  4. "How do you compute absolute value without branching?" - int mask = x >> 31; return (x ^ mask) - mask;

Hints in Layers

Hint 1: Create reference implementations and test against them

Hint 2: Test edge cases: 0, 1, -1, INT_MAX, INT_MIN, 0x55555555, 0xAAAAAAAA

Hint 3: Use TRACE macros to debug intermediate values

Books That Will Help

Topic Book Chapter
Bitwise operations Computer Systems: A Programmer's Perspective Ch. 2.1.6-2.1.8
Two's complement Computer Systems: A Programmer's Perspective Ch. 2.2-2.3
Bit manipulation tricks Hacker's Delight Ch. 2
C bitwise operators The C Programming Language (K&R) Ch. 2.9
Safe integer operations Effective C Ch. 5

Project 4: x86-64 Calling Convention Crash Cart

Attribute Value
Language C (alt: Rust, Zig, C++)
Difficulty Advanced
Time 1โ€“2 weeks
Chapters 3
Coolness ★★★☆☆ Genuinely Clever
Portfolio Value Resume Gold

What you'll build: Tiny programs plus a standardized post-mortem report format that explains how stack frames, saved registers, and return addresses caused a crash.

Why it matters: Chapter 3 becomes usable only when you can debug from registers/stack bytes back to the source-level defect.

Core challenges:

  • Mapping assembly to C constructs (code generation)
  • Explaining stack layout and argument passing (ABI)
  • Handling arrays/structs in machine terms (data layout + addressing)

Key concepts to master:

  • x86-64 instruction patterns (Ch. 3)
  • Stack discipline and procedure calls (Ch. 3)
  • Arrays/structs and pointer arithmetic (Ch. 3)

Prerequisites: Comfort using a debugger (GDB/LLDB).

Deliverable: Given a crash address and debugger snapshot, you can write a clean narrative of what happened.

Implementation hints:

  • Standardize your report: registers, stack window, disassembly window, C-source mapping
  • Intentionally create classic failures: invalid pointer, stack smash, use-after-free

Milestones:

  1. You can explain a crash without guessing
  2. You recognize compiler-generated patterns (switch tables, loops, calls)
  3. You identify vulnerability classes by assembly signature

Real World Outcome

When complete, you will have crash scenarios with detailed post-mortem analysis:

$ ./crash-cart analyze core.12345

================================================================================
              x86-64 CRASH CART - POST-MORTEM ANALYSIS
================================================================================

[CRASH SUMMARY]
Signal: SIGSEGV | Fault: 0x0 | Location: main+54 (vulnerable.c:23)
Cause: NULL pointer dereference

[REGISTERS]
RAX: 0x0000000000000000  <- NULL!
RBP: 0x7fffffffdd70 | RSP: 0x7fffffffdd50 | RIP: 0x401156

[STACK TRACE]
#0  main at vulnerable.c:23
#1  __libc_start_call_main

[STACK FRAME]
  RBP+8:  Return address
  RBP:    Saved RBP
  RBP-24: ptr = NULL

[DISASSEMBLY]
0x401156: mov (%rax),%eax  ; CRASH: deref NULL!

[ROOT CAUSE]
get_data() returned NULL, dereferenced without checking.

The Core Question Youโ€™re Answering

"Given a crashed program and a debugger, can you reconstruct what happened?"

Concepts You Must Understand First

  1. x86-64 Register Conventions - Caller/callee-saved, argument registers (CS:APP Ch. 3.7)
  2. Stack Frame Layout - Return address, saved regs, locals (CS:APP Ch. 3.7.1-3.7.4)
  3. Calling Convention - Arguments and returns (CS:APP Ch. 3.7)
  4. Memory Safety Violations - NULL deref, overflow, use-after-free (CS:APP Ch. 3.10)

Questions to Guide Your Design

  1. What crash scenarios will you create?
  2. How will you standardize your report format?

Thinking Exercise

void greet(char *name) {
    char buffer[16];
    strcpy(buffer, name);  // No bounds check!
}

Run with 32 'A's. Where is the return address relative to buffer?

The Interview Questions They'll Ask

  1. "Walk me through a function call in x86-64"
  2. "How would you debug a segfault with only a core dump?"
  3. "What is a stack buffer overflow?"
  4. "What protections exist against buffer overflows?"

Hints in Layers

Hint 1: GDB commands: info registers, x/32xg $rsp, bt full

Hint 2: Report: Summary, Registers, Stack, Disasm, Root Cause

Hint 3: Create NULL deref, stack overflow, use-after-free scenarios

Books That Will Help

Topic Book Chapter
x86-64 procedures CS:APP Ch. 3.7
Buffer overflows CS:APP Ch. 3.10
GDB debugging The Art of Debugging Ch. 1-4
Binary analysis Practical Binary Analysis Ch. 6

Project 5: Bomb Lab Workflow

Attribute Value
Language C (alt: Rust, Zig, C++)
Difficulty Advanced
Time 1โ€“2 weeks
Chapters 3
Coolness ★★★★☆ Hardcore Tech Flex
Portfolio Value Resume Gold

What you'll build: A repeatable binary-puzzle playbook and annotated solutions for at least one full bomb instance: inputs, reasoning, and the exact assembly facts used.

Why it matters: It forces fluent reading of compiler output and tool-driven reasoning under constraints.

Core challenges:

  • Extracting constraints from assembly (control flow + data movement)
  • Verifying hypotheses via debugging (disciplined experimentation)
  • Handling indirect jumps and lookup tables (machine-level control)

Key concepts to master:

  • Control flow at machine level (Ch. 3)
  • Debugger-driven reasoning (Ch. 3)
  • Defensive reading of compiled code (Ch. 3 security discussion)

Prerequisites: Project 4 (or equivalent).

Deliverable: A written "defusal dossier" that proves you can reverse engineer a real x86-64 binary methodically.

Implementation hints:

  • Write down each constraint as a testable statement before trying any input
  • Prefer "prove constraints" over "try strings"

Milestones:

  1. You solve phases without brute force
  2. You generalize patterns across different binaries
  3. You can justify each solution in assembly terms

Real World Outcome

When you complete this project, you will have a "Defusal Dossier" documenting your systematic reverse engineering of a binary bomb:

$ objdump -t bomb | grep phase
0000000000400ee0 g     F .text  000000000000002a phase_1
0000000000400efc g     F .text  0000000000000052 phase_2
0000000000400f43 g     F .text  000000000000003c phase_3

$ gdb ./bomb
(gdb) break phase_1
(gdb) run
(gdb) disas
   0x0000000000400ee4 <+4>:     mov    $0x402400,%esi
   0x0000000000400ee9 <+9>:     call   0x401338 <strings_not_equal>
   0x0000000000400eee <+14>:    test   %eax,%eax
   0x0000000000400ef0 <+16>:    je     0x400ef7 <phase_1+23>
   0x0000000000400ef2 <+18>:    call   0x40143a <explode_bomb>
(gdb) x/s 0x402400
0x402400:       "Border relations with Canada have never been better."

$ ./bomb solutions.txt
Congratulations! You've defused the bomb!

Your dossier documents each phase:

PHASE 1: String Comparison
Constraint: input must equal string at 0x402400
Evidence: mov $0x402400,%esi before strings_not_equal call
Solution: "Border relations with Canada have never been better."

The Core Question You're Answering

"How do I systematically extract program constraints from compiled machine code without source access?"

Concepts You Must Understand First

  1. x86-64 Instruction Semantics (CS:APP Ch. 3.4-3.6)
    • What does lea vs mov do? How do cmp and test set condition codes?
    • What is the difference between je, jl, jg, ja, jb?
  2. Calling Conventions (CS:APP Ch. 3.7)
    • Where are arguments? (%rdi, %rsi, %rdx, %rcx, %r8, %r9)
    • Where is the return value? (%rax)
  3. Control Flow Patterns (CS:APP Ch. 3.6)
    • How does a for loop look in assembly?
    • How does a switch compile (jump tables)?
  4. Data Access Patterns (CS:APP Ch. 3.8-3.9)
    • How is array[i] computed? How are struct fields accessed?

Questions to Guide Your Design

  1. What tools will you use first? How do you identify "interesting" functions?
  2. How do you identify the "explode" condition and work backwards?
  3. How do you test hypotheses before committing an answer?
  4. How do you recognize loop and recursive patterns?

Thinking Exercise

Trace this by hand before using GDB:

phase_mystery:
    mov    $0x4025cf,%edi
    call   sscanf
    cmp    $0x2,%eax
    jg     .L1
    call   explode_bomb
.L1:
    cmpl   $0x7,0x8(%rsp)
    ja     .L_explode
    mov    0x8(%rsp),%eax
    jmp    *0x402470(,%rax,8)

(gdb) x/s 0x4025cf
0x4025cf:       "%d %d %d"

What format does sscanf expect? What is jmp *0x402470(,%rax,8) doing?

The Interview Questions They'll Ask

  1. "Walk me through reverse engineering an unknown binary."
  2. "How do you identify a switch statement in x86-64 assembly?"
  3. "What's the difference between test %eax,%eax and cmp $0,%eax?"
  4. "How would you find a hidden function in a binary?"
  5. "What tools would you use for binary analysis?"

Hints in Layers

Layer 1: strings bomb | less and nm bomb | grep phase_

Layer 2: (gdb) break explode_bomb is your safety net

Layer 3 - String Pattern:

mov    $ADDR,%esi
call   strings_not_equal
test   %eax,%eax
je     .Lsuccess
call   explode_bomb

Books That Will Help

Topic Book Chapter
x86-64 instructions Computer Systems: A Programmer's Perspective Ch. 3.4-3.6
Reverse engineering Hacking: The Art of Exploitation Ch. 3
GDB mastery The Art of Debugging with GDB, DDD, and Eclipse Ch. 1-4
Binary formats Practical Binary Analysis Ch. 2, 4
Assembly Low-Level Programming by Igor Zhirkov Part II

Project 6: Attack Lab Workflow

Attribute Value
Language C (alt: Rust, Zig, C++)
Difficulty Expert
Time 2โ€“3 weeks
Chapters 3
Coolness ★★★★☆ Hardcore Tech Flex
Portfolio Value Resume Gold

What you'll build: A controlled "vulnerable target lab" environment plus an exploitation journal documenting (a) the bug class, (b) memory layout evidence, and (c) the exact control-flow hijack achieved - first via code injection, then via ROP.

Why it matters: It turns Chapter 3's security discussion into concrete mechanics: stack discipline, calling conventions, and why mitigations matter.

Core challenges:

  • Proving the overwrite boundary and control-flow takeover (stack layout evidence)
  • Understanding executable protections and how they change tactics (mitigations reasoning)
  • Constructing ROP chains from existing code fragments (machine-level composition)

Key concepts to master:

  • Buffer overflows and stack discipline (Ch. 3)
  • Return addresses and control transfers (Ch. 3)
  • Defensive implications and mitigations (Ch. 3)

Prerequisites: Projects 4 and 5.

Deliverable: Demonstrate (in a sandbox) a reliable hijack and explain exactly why it worked and which mitigation would block it.

Implementation hints:

  • Treat this as "learn to defend by learning to break," not as an offensive toolkit
  • Journal entries must include memory-map evidence, not just outcomes

Milestones:

  1. You can reason about stack frames as an attack surface
  2. You can explain why NX/ASLR changes the game
  3. You can "read gadgets" the way you read assembly

Real World Outcome

When you complete this project, you will have an "Exploitation Journal" documenting your control-flow hijacking techniques:

# Phase 1: Code Injection Attack
$ ./hex2raw < exploit1.txt | ./ctarget -q
Cookie: 0x59b997fa
Type string:Touch1!: You called touch1()
Valid solution for level 1 with target ctarget
PASS: Would have posted the following:
        user id bovik
        course  15213-f15
        lab     attacklab
        result  1:PASS:...

# Phase 2: ROP Attack (with NX enabled)
$ ./hex2raw < exploit5.txt | ./rtarget -q
Cookie: 0x59b997fa
Type string:Touch2!: You called touch2(0x59b997fa)
Valid solution for level 2 with target rtarget
PASS: Would have posted the following:
        user id bovik
        course  15213-f15
        lab     attacklab
        result  1:PASS:...

Your exploitation journal documents the attack methodology:

EXPLOIT 1: Code Injection - Touch1
==================================
Vulnerability: gets() has no bounds checking
Stack Layout:
  0x5561dc78: buffer start (40 bytes)
  0x5561dca0: saved %rbp
  0x5561dca8: return address <- OVERWRITE TARGET

Attack Vector:
  - 40 bytes padding + address of touch1 (0x4017c0)
  - Little-endian: c0 17 40 00 00 00 00 00

Payload: [40 bytes junk] [0x4017c0]

EXPLOIT 5: ROP Chain - Touch2
=============================
Mitigation: Stack is non-executable (NX bit)
Strategy: Chain existing code "gadgets" to set %rdi = cookie

Gadget Chain:
  0x4019cc: popq %rax; ret     # Pop cookie into %rax
  0x4019c5: movq %rax,%rdi; ret # Move to first argument
  0x4017ec: touch2             # Call target

Payload: [40 bytes] [0x4019cc] [cookie] [0x4019c5] [0x4017ec]

The Core Question You're Answering

"How do memory-safety vulnerabilities enable control-flow hijacking, and how do modern mitigations change the exploitation landscape?"

Concepts You Must Understand First

  1. Stack Frame Layout (CS:APP Ch. 3.7)
    • Where is the return address stored relative to local variables?
    • What happens when you write past the end of a buffer?
  2. Control Flow Hijacking (CS:APP Ch. 3.10.3-3.10.4)
    • How does overwriting a return address redirect execution?
    • What is the difference between code injection and ROP?
  3. Modern Mitigations (CS:APP Ch. 3.10.4)
    • What is stack canary protection? When does it detect attacks?
    • What is ASLR? How does it complicate exploitation?
    • What is NX (DEP)? Why does it require ROP?
  4. Gadget Identification (Hacking: Art of Exploitation)
    • What makes a useful gadget? (ends in ret)
    • How do you chain gadgets to achieve computation?

Questions to Guide Your Design

  1. How do you determine the exact offset from buffer start to return address?
  2. How do you construct shellcode that fits in limited space?
  3. How do you find useful gadgets in a binary?
  4. How do you chain gadgets to pass arguments to functions?

Thinking Exercise

Before crafting any exploit, analyze this vulnerable function:

void getbuf() {
    char buf[BUFFER_SIZE];
    Gets(buf);
    return;
}

// Compiled with BUFFER_SIZE = 40 (simplified):
getbuf:
    sub    $0x28,%rsp       # Allocate 40 bytes
    mov    %rsp,%rdi        # buf = %rsp
    call   Gets             # Gets(buf) - no bounds check!
    add    $0x28,%rsp
    ret

Questions:

  1. Where is buf located relative to the saved return address?
  2. How many bytes do you need to write to reach the return address?
  3. If you want to call touch1 at 0x4017c0, what bytes do you write?
  4. Why must addresses be in little-endian format?

The Interview Questions They'll Ask

  1. "Explain how a buffer overflow attack works."
  2. "What is Return-Oriented Programming and why is it necessary?"
  3. "How does ASLR protect against exploitation?"
  4. "What is a stack canary and how does it work?"
  5. "How would you defend a system against memory-safety attacks?"

Hints in Layers

Layer 1: Use GDB to find exact stack layout: (gdb) x/20gx $rsp

Layer 2: For code injection, your shellcode runs from the buffer location

Layer 3 - Finding Gadgets:

# Look for "pop; ret" patterns
objdump -d rtarget | grep -A1 "pop"
# Common useful gadgets:
# 58 c3          popq %rax; ret
# 5f c3          popq %rdi; ret
# 48 89 c7 c3    movq %rax,%rdi; ret

Layer 4 - ROP Chain Structure:

[padding to return address]
[gadget1 address]     <- first ret goes here
[value for pop]       <- popped by gadget1
[gadget2 address]     <- gadget1's ret goes here
[target function]     <- final destination

Books That Will Help

Topic Book Chapter
Buffer overflows Computer Systems: A Programmer's Perspective Ch. 3.10.3-3.10.4
Stack discipline Computer Systems: A Programmer's Perspective Ch. 3.7
Exploitation techniques Hacking: The Art of Exploitation Ch. 3, 5
ROP fundamentals Practical Binary Analysis Ch. 10
Modern mitigations Low-Level Programming by Igor Zhirkov Ch. 8-9

Phase 3: Architecture & Performance


Project 7: Y86-64 CPU Simulator

Attribute Value
Language C (alt: Rust, Zig, C++)
Difficulty Expert
Time 1 month+
Chapters 4, 5
Coolness ★★★★★ Pure Magic
Portfolio Value Resume Gold

What you'll build: A Y86-64 interpreter plus a pipelined model that can emit per-cycle traces (stage contents, hazards, stalls, bubbles).

Why it matters: Chapter 4 is execution mechanics. Modeling a pipeline forces understanding of hazards and control logic.

Core challenges:

  • Implementing ISA semantics correctly (instruction execution)
  • Modeling hazards and pipeline control (pipelining)
  • Validating equivalence between sequential and pipelined execution (correctness)

Key concepts to master:

  • Datapath and control (Ch. 4)
  • Pipelining and hazards (Ch. 4)
  • Correctness vs performance (Ch. 5)

Prerequisites: Strong C, state machine mindset, patience for verification.

Deliverable: Run Y86-64 programs and produce a cycle-by-cycle "why it stalled here" trace.

Implementation hints:

  • Start with a โ€œgoldenโ€ sequential interpreter
  • Add pipeline stages as explicit state; treat each cycle as a deterministic transition

Milestones:

  1. Sequential simulator passes a suite of programs
  2. Pipelined model matches sequential results
  3. You can explain every stall/bubble with a specific hazard rule

Real World Outcome

When you complete this project, you will have a Y86-64 simulator with cycle-accurate pipeline tracing:

$ ./y86sim -s prog.yo
Y86-64 Sequential Simulator
Loaded program: prog.yo (156 bytes, 23 instructions)

Cycle   PC        Instruction              Registers Changed
1       0x000     irmovq $0x100, %rsp      %rsp = 0x100
2       0x00a     call main                %rsp = 0x0f8
3       0x058     addq %rdi, %rax          %rax = 0xa

Execution complete: 47 cycles, status = HLT

$ ./y86sim -p prog.yo -trace
Y86-64 Pipelined Simulator (5-stage)

Cycle 5:
  Fetch:    addq %rdi, %rax
  Decode:   irmovq $0x0, %rax
  Execute:  irmovq $0xa, %rdi
  Memory:   call main
  Writeback:irmovq $0x100, %rsp

*** HAZARD: Load-use data hazard ***
  Action: STALL Fetch+Decode, BUBBLE in Execute

Summary: 52 cycles, 3 data hazards (2 stalls, 1 forwarded)

The Core Question You're Answering

"How does a pipelined processor execute instructions, and what hazards must be detected and resolved to maintain correctness?"

Concepts You Must Understand First

  1. Y86-64 ISA (CS:APP Ch. 4.1) - Instruction formats and semantics
  2. Sequential Processor (CS:APP Ch. 4.3) - Fetch, Decode, Execute, Memory, Writeback
  3. Pipelining (CS:APP Ch. 4.4) - Why it improves throughput
  4. Hazards (CS:APP Ch. 4.5) - Data (RAW) and control hazards

Questions to Guide Your Design

  1. How will you represent pipeline registers between stages?
  2. How will you detect data hazards at decode time?
  3. How will you implement forwarding?
  4. How will you handle branch mispredictions?

Thinking Exercise

0x000: irmovq $10, %rax
0x00a: irmovq $3, %rbx
0x014: addq %rax, %rbx     # Depends on both previous

When addq reaches Decode, where can it get %rax and %rbx from? Stall or forward?

The Interview Questions They'll Ask

  1. "Explain the five stages of a classic RISC pipeline."
  2. "What is a data hazard and how is it resolved?"
  3. "What is forwarding/bypassing?"
  4. "What happens on a branch misprediction?"

Hints in Layers

Layer 1: typedef struct { uint8_t icode:4; uint8_t ifun:4; ... } instruction_t;

Layer 2 - Hazard Detection: bool hazard = (D_srcA == E_dstE);

Layer 3: Load-use hazards require stalling, not just forwarding.

Books That Will Help

Topic Book Chapter
Y86-64 ISA Computer Systems: A Programmer's Perspective Ch. 4.1
Pipelining Computer Systems: A Programmer's Perspective Ch. 4.4-4.5
Hazards Computer Organization and Design (Patterson) Ch. 4.5-4.7

Project 8: Performance Clinic

Attribute Value
Language C (alt: Rust, Zig, C++)
Difficulty Advanced
Time 1–2 weeks
Chapters 5, 6, 1
Coolness ★★★☆☆ Genuinely Clever
Portfolio Value Micro-SaaS/Pro Tool

What you'll build: A benchmark suite of small kernels plus a written optimization report explaining changes in terms of ILP, branch prediction, and locality.

Why it matters: Chapter 5 is about turning "fast" into measurable, explainable transformations.

Core challenges:

  • Stable measurements (methodology)
  • Transformations that improve ILP / reduce mispredicts (CPU behavior)
  • Avoiding "faster by accident" (experimental rigor)

Key concepts to master:

  • Loop transformations and tuning (Ch. 5)
  • Bottlenecks: compute vs memory (Ch. 5–6)
  • Limits: Amdahl's Law intuition (Ch. 1)

Prerequisites: Project 1.

Deliverable: A portfolio-quality report with before/after results and a strong "why" narrative.

Implementation hints:

  • Keep kernels tiny; control the environment; log everything needed to reproduce

Milestones:

  1. Measurements become stable and repeatable
  2. You can predict when an optimization backfires
  3. You explain improvements as architecture effects, not folklore

Real World Outcome

When you complete this project, you will have a benchmark suite with detailed performance analysis:

$ ./perfclinic --kernel=dotprod --optimize
Performance Clinic: Dot Product Kernel
=======================================

BASELINE (naive implementation):
  for (i = 0; i < n; i++)
      sum += a[i] * b[i];

  Cycles: 4,892,341
  CPE (Cycles Per Element): 4.89
  Bottleneck: Loop-carried dependency on 'sum'

OPTIMIZATION 1: Loop Unrolling (4x)
  for (i = 0; i < n; i += 4) {
      sum += a[i]*b[i] + a[i+1]*b[i+1] +
             a[i+2]*b[i+2] + a[i+3]*b[i+3];
  }

  Cycles: 2,456,782
  CPE: 2.46  (1.99x speedup)
  Why: Reduced loop overhead, but still serialized on 'sum'

OPTIMIZATION 2: Multiple Accumulators
  for (i = 0; i < n; i += 4) {
      sum0 += a[i]*b[i];     sum1 += a[i+1]*b[i+1];
      sum2 += a[i+2]*b[i+2]; sum3 += a[i+3]*b[i+3];
  }
  sum = sum0 + sum1 + sum2 + sum3;

  Cycles: 1,234,567
  CPE: 1.23  (3.97x speedup over baseline)
  Why: Breaks loop-carried dependency, enables ILP
  Theoretical limit: CPE ~1.0 (FP latency = 4, throughput = 1)

OPTIMIZATION 3: SIMD (AVX)
  __m256d sum_vec = _mm256_setzero_pd();
  for (i = 0; i < n; i += 4) {
      sum_vec = _mm256_fmadd_pd(
          _mm256_load_pd(&a[i]),
          _mm256_load_pd(&b[i]), sum_vec);
  }

  Cycles: 312,456
  CPE: 0.31  (15.7x speedup over baseline)
  Why: 4 elements per SIMD operation

Performance Profile (perf stat):
  Instructions:     1,234,567
  Cycles:           312,456
  IPC:              3.95 (wide superscalar)
  L1 cache misses:  0.02%
  Branch mispred:   0.01%

The Core Question You're Answering

"How do I measure, explain, and improve program performance in terms of CPU microarchitecture effects?"

Concepts You Must Understand First

  1. Latency vs Throughput (CS:APP Ch. 5.7)
    • What is the latency of a floating-point multiply?
    • What is the throughput (operations per cycle)?
  2. Loop-Carried Dependencies (CS:APP Ch. 5.8)
    • Why does a sequential sum limit performance?
    • How do multiple accumulators help?
  3. Instruction-Level Parallelism (CS:APP Ch. 5.9)
    • How many independent operations can execute per cycle?
    • What limits ILP in practice?
  4. Branch Prediction (CS:APP Ch. 5.12)
    • What patterns are predictable?
    • How do mispredictions affect performance?
  5. Memory Hierarchy Effects (CS:APP Ch. 6)
    • When is a kernel compute-bound vs memory-bound?
    • How does cache locality affect performance?

Questions to Guide Your Design

  1. How will you ensure stable, reproducible measurements?
  2. How will you identify the bottleneck (compute, memory, branches)?
  3. How will you verify your optimization actually helps?
  4. How will you explain WHY the optimization works?

Thinking Exercise

Before optimizing, analyze this loop:

double poly(double a[], double x, int degree) {
    double result = a[0];
    double xpwr = x;
    for (int i = 1; i <= degree; i++) {
        result += a[i] * xpwr;
        xpwr *= x;
    }
    return result;
}

Questions:

  1. What is the loop-carried dependency?
  2. What is the theoretical minimum CPE?
  3. How would Horner's method change the dependency pattern?
  4. Would loop unrolling help? Why or why not?

The Interview Questions They'll Ask

  1. "How do you identify a performance bottleneck?"
  2. "Explain instruction-level parallelism."
  3. "What is loop unrolling and when does it help?"
  4. "How do branch mispredictions affect performance?"
  5. "When is a program compute-bound vs memory-bound?"

Hints in Layers

Layer 1 - Stable Measurement:

# Disable turbo boost, set governor to performance
sudo cpupower frequency-set -g performance
# Run 10 trials; sort the numeric output and take the 5th value (~median)
for i in {1..10}; do ./bench; done | sort -n | head -5 | tail -1

Layer 2 - Profiling:

perf stat -e cycles,instructions,cache-misses ./bench
perf record ./bench && perf report

Layer 3 - Multiple Accumulators:

// Transform: sum += a[i] * b[i];
// Into:
double sum0=0, sum1=0, sum2=0, sum3=0;
for (i = 0; i < n; i += 4) {
    sum0 += a[i]*b[i];   sum1 += a[i+1]*b[i+1];
    sum2 += a[i+2]*b[i+2]; sum3 += a[i+3]*b[i+3];
}
return sum0 + sum1 + sum2 + sum3;

Books That Will Help

Topic Book Chapter
Performance optimization Computer Systems: A Programmer's Perspective Ch. 5
Memory hierarchy Computer Systems: A Programmer's Perspective Ch. 6
Limits of parallelism Computer Systems: A Programmer's Perspective Ch. 5.9-5.11
CPU microarchitecture Write Great Code Vol 1 Ch. 3-4
SIMD programming Write Great Code Vol 2 Ch. 12-14
Profiling tools Linux System Programming Ch. 10

Project 9: Cache Lab++ — Cache Simulator + Locality Visualizer

Attribute Value
Language C (alt: Rust, Zig, C++)
Difficulty Advanced
Time 2–3 weeks
Chapters 6, 5
Coolness ★★★★☆ Hardcore Tech Flex
Portfolio Value Resume Gold

What you'll build: A set-associative cache simulator plus an "ASCII locality visualizer" that shows hit/miss patterns for selected code paths.

Why it matters: Chapter 6 lands only when you can simulate misses and then change code to improve locality.

Core challenges:

  • Tag/index/offset logic correctness (cache organization)
  • Replacement policy and statistics (behavior)
  • Improving a real kernel via locality (spatial/temporal locality)

Key concepts to master:

  • Cache organization and locality (Ch. 6)
  • Miss types and their causes (Ch. 6)
  • Measurement discipline (Ch. 5)

Prerequisites: Projects 2 and 8 recommended.

Deliverable: Demonstrate a miss-rate reduction with a locality explanation.

Implementation hints:

  • Produce both aggregate stats and per-access event logs
  • Use deliberately-designed access patterns to isolate compulsory/conflict/capacity misses

Milestones:

  1. Simulator matches known traces
  2. You can explain each miss type with concrete scenarios
  3. You can design data layouts to target cache behavior

Real World Outcome

$ ./csim -v -s 4 -E 2 -b 4 -t traces/matrix_multiply.trace
Cache Configuration:
  Sets: 16 (s=4), Lines per set: 2 (E=2), Block size: 16 bytes (b=4)
  Total cache size: 512 bytes

Processing trace: traces/matrix_multiply.trace
---------------------------------------------------
L 0x00601040, 8   miss  [Set 4: loaded block 0x00601040]
L 0x00601048, 8   hit   [Set 4: block 0x00601040 still valid]
S 0x00602080, 8   miss  [Set 8: loaded block 0x00602080]
L 0x00601050, 8   miss  [Set 5: loaded block 0x00601050]
L 0x00601058, 8   hit   [Set 5: block 0x00601050 still valid]
L 0x00601100, 8   miss  [Set 0: loaded block 0x00601100]
L 0x00601108, 8   hit   [Set 0: block 0x00601100 still valid]
S 0x00602088, 8   hit   [Set 8: block 0x00602080 still valid]
L 0x00601060, 8   miss  [Set 6: loaded block 0x00601060]
L 0x00601180, 8   miss  [Set 8: loaded block 0x00601180]
...

Summary:
  hits: 4,847  misses: 1,153  evictions: 641
  hit rate: 80.8%  miss rate: 19.2%
  miss breakdown: compulsory=256 capacity=512 conflict=385

$ ./csim -locality traces/matrix_multiply.trace
Locality Visualization (temporal window=8 accesses)
===================================================

Address Heat Map (most accessed blocks):
  Block 0x00601040: ████████████████████ 847 accesses (hot)
  Block 0x00601100: ████████████████     672 accesses
  Block 0x00602080: ████████████         501 accesses
  Block 0x00601180: ████████             334 accesses
  ...

Access Pattern Timeline (showing set utilization):
Time →   0    100   200   300   400   500
Set 0:   ░░░░░▓▓▓▓▓░░░░░▓▓▓▓▓░░░░░▓▓▓▓▓  (strided)
Set 4:   ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓  (temporal locality - HOT)
Set 8:   ░▓░▓░▓░▓░▓░▓░▓░▓░▓░▓░▓░▓░▓░▓░▓  (interleaved - thrashing!)

Spatial Locality Score: 0.73 (good - sequential block access)
Temporal Locality Score: 0.45 (moderate - reuse distance varies)

Recommendation: Consider blocking/tiling to improve temporal locality
  Current working set estimate: 2.3 KB
  Cache capacity: 512 bytes
  Suggested tile size: 8x8 elements (fits in cache)

The Core Question You're Answering

"How does the memory hierarchy create the illusion of fast, infinite memory, and how can I write code that exploits locality to make this illusion work in my favor?"

Concepts You Must Understand First

  1. Cache Organization (sets, lines, blocks)
    • How do you compute which set an address maps to?
    • What is the difference between direct-mapped, set-associative, and fully associative?
    • Given address 0x12345678 and a 4-way set associative cache with 64 sets and 32-byte blocks, which set does this address map to?
    • Book: CS:APP Chapter 6.4 (Cache Memories)
  2. Tag/Index/Offset Address Decomposition
    • How do you split a 64-bit address into tag, index, and offset fields?
    • What determines the size of each field?
    • Why must block size be a power of 2?
    • Book: CS:APP Chapter 6.4.1-6.4.3
  3. The Three Types of Cache Misses
    • What causes compulsory (cold) misses? Can they be eliminated?
    • What causes capacity misses? How do you detect them?
    • What causes conflict misses? Why do they occur even when cache is not full?
    • Book: CS:APP Chapter 6.4.4 (Issues with Writes) and 6.4.5 (Cache Performance)
  4. Replacement Policies
    • How does LRU (Least Recently Used) work? What data structure tracks recency?
    • What is the performance difference between LRU and random replacement?
    • Book: CS:APP Chapter 6.4.2 (Set Associative Caches)
  5. Spatial and Temporal Locality
    • What code patterns exhibit temporal locality? Spatial locality?
    • Why does row-major vs column-major iteration matter for 2D arrays?
    • How does stride length affect cache performance?
    • Book: CS:APP Chapter 6.2 (Locality) and 6.5 (Writing Cache-Friendly Code)
  6. Working Set and Cache Thrashing
    • What is a working set? How do you estimate it?
    • When does thrashing occur? What are the symptoms?
    • Book: CS:APP Chapter 6.3 (Memory Hierarchy) and OSTEP Chapter 22

Questions to Guide Your Design

  1. Data Structure Choice: How will you represent a cache line? What fields do you need (valid bit, tag, LRU counter, dirty bit)?

  2. Address Parsing: Will you use bit manipulation or arithmetic to extract tag/index/offset? Which is clearer?

  3. LRU Implementation: Will you use counters, a linked list, or bit manipulation for tracking LRU? What are the tradeoffs?

  4. Trace Format: How will you parse the Valgrind lackey trace format? What about other formats?

  5. Statistics Tracking: How will you distinguish compulsory from capacity from conflict misses?

  6. Visualization: How will you represent temporal patterns? Access heat? Set utilization?

Thinking Exercise

Consider this code and trace what happens in a direct-mapped cache with 4 sets, 16-byte blocks:

// Array A is at address 0x1000, Array B is at address 0x1100
// Each int is 4 bytes
int A[64], B[64];  // A at 0x1000, B at 0x1100

for (int i = 0; i < 64; i++) {
    A[i] = B[i] + 1;  // Load B[i], then store A[i]
}

Hand-trace questions:

  1. Address of A[0]? Of B[0]? What set does each map to?
  2. Address of A[4]? Of B[4]? (Hint: stride of 16 bytes)
  3. Do A[0] and B[0] map to the same set? What about A[4] and B[4]?
  4. On iteration i=0: What happens when loading B[0]? (miss/hit?)
  5. On iteration i=0: What happens when storing A[0]? Does it evict B[0]'s block?
  6. On iteration i=4: What happens? Do we reload B[4] or is it already cached?
  7. What is the miss rate for this loop? Can you predict it before simulating?
  8. How would you restructure this code to improve locality?

The Interview Questions They'll Ask

  1. "Explain how a CPU cache works and why it matters for performance."
    • Expected: Set/line/block organization, locality exploitation, miss penalty discussion
  2. "You're seeing poor performance in your matrix multiplication. How would you diagnose if it's a cache issue?"
    • Expected: Profiling tools (perf, cachegrind), miss rate analysis, working set estimation
  3. "What is cache thrashing and how would you fix it?"
    • Expected: Conflict misses from aliasing, solutions include padding, blocking/tiling, changing data layout
  4. "Explain the difference between temporal and spatial locality. Give code examples of each."
    • Expected: Temporal = reusing same data, Spatial = accessing nearby addresses, concrete loop examples
  5. "Why does iterating a 2D array row-by-row vs column-by-column have such different performance?"
    • Expected: Memory layout (row-major in C), spatial locality, stride analysis
  6. "Design a cache-friendly algorithm for transposing a large matrix."
    • Expected: Blocking/tiling to fit working set in cache, discussion of tile size selection

Hints in Layers

Layer 1 - Getting Started: Start by parsing the trace format and printing each access. Implement a direct-mapped cache first (E=1) before handling set-associativity.

// Trace line format: "L 0x00601040, 8" means Load address 0x601040, size 8
typedef struct {
    char op;           // 'L' load, 'S' store, 'M' modify (load+store)
    uint64_t address;
    int size;
} trace_entry_t;

Layer 2 - Address Decomposition: The key insight is that tag, index, and offset are just different bit ranges of the address:

// For a cache with s index bits, b offset bits:
uint64_t offset = address & ((1ULL << b) - 1);
uint64_t set_index = (address >> b) & ((1ULL << s) - 1);
uint64_t tag = address >> (s + b);

Layer 3 - Cache Line Structure: Think about what state you need per line:

typedef struct {
    int valid;          // Is this line holding data?
    uint64_t tag;       // Tag bits from the address
    uint64_t lru_counter; // For LRU replacement (lower = more recent; evict highest)
    // Note: You don't need to store actual data for simulation!
} cache_line_t;

Layer 4 - Miss Classification: To classify misses, track additional state:

// Compulsory: First access to this block ever (track in a set of seen blocks)
// Conflict: Cache not full, but eviction occurred
// Capacity: Would miss even with fully-associative cache of same size
// Hint: Run two simulations - one with your cache, one with "infinite" associativity

Layer 5 - LRU Implementation: For small associativity (E <= 8), a simple counter approach works:

// On cache hit or fill, update LRU counters:
for (int i = 0; i < E; i++) {
    if (set[i].lru_counter < accessed_line->lru_counter)
        set[i].lru_counter++;  // Age other lines
}
accessed_line->lru_counter = 0;  // Most recently used = 0

// For eviction: find line with highest lru_counter

Books That Will Help

Topic Book Chapter
Cache organization and design Computer Systems: A Programmer's Perspective Ch. 6.4 Cache Memories
Writing cache-friendly code Computer Systems: A Programmer's Perspective Ch. 6.5 Writing Cache-Friendly Code
Impact on matrix operations Computer Systems: A Programmer's Perspective Ch. 6.6 Cache Performance
Memory hierarchy overview Computer Systems: A Programmer's Perspective Ch. 6.1-6.3
Virtual memory and caching Operating Systems: Three Easy Pieces Ch. 19-22 (Memory Virtualization)
Cache design tradeoffs Computer Organization and Design (Patterson & Hennessy) Ch. 5.3-5.4
Practical cache analysis Linux System Programming (Robert Love) Ch. 9 Memory Management
Performance measurement Computer Systems: A Programmer's Perspective Ch. 5 Optimizing Program Performance

Phase 4: Systems Programming


Project 10: ELF Inspector + Library Interposition

Attribute Value
Language C (alt: Rust, Zig, C++)
Difficulty Advanced
Time 2–3 weeks
Chapters 7
Coolness ★★★★☆ Hardcore Tech Flex
Portfolio Value Service & Support

What you'll build: A tool that summarizes symbol/relocation info for ELF objects and demonstrates dynamic interposition (function call hooking) with evidence logs.

Why it matters: It makes symbols, relocation, and runtime resolution concrete.

Core challenges:

  • Parsing ELF structures (object file format)
  • Explaining relocation and binding (static + dynamic linking)
  • Demonstrating interposition safely (loader behavior)

Key concepts to master:

  • Relocation and symbol resolution (Ch. 7)
  • Static vs dynamic linking tradeoffs (Ch. 7)
  • Library interpositioning (Ch. 7)

Prerequisites: Linux environment (VM/container), basic binary tooling.

Deliverable: Prove and explain "why my program called that function from that library."

Implementation hints:

  • Keep introspection read-only first
  • Interposition logs must include caller, callee, and resolved address evidence

Milestones:

  1. You interpret link maps confidently
  2. You explain PLT/GOT behavior without hand-waving
  3. You use interposition to debug/profile real programs

Real World Outcome

$ ./elfmap /usr/bin/ls
ELF Analysis: /usr/bin/ls
================================================================================
Type: ELF64 Executable (dynamically linked)
Entry point: 0x00005850
Interpreter: /lib64/ld-linux-x86-64.so.2

Section Headers:
  [Nr] Name              Type            Address          Offset    Size
  [ 0]                   NULL            0000000000000000 00000000  0
  [ 1] .interp           PROGBITS        0000000000000318 00000318  28
  [11] .plt              PROGBITS        0000000000005020 00005020  1424
  [12] .plt.got          PROGBITS        0000000000005590 00005590  24
  [13] .text             PROGBITS        00000000000055b0 000055b0  73521
  [24] .got              PROGBITS        000000000021ff98 0001ff98  104
  [25] .got.plt          PROGBITS        0000000000220000 00020000  728
  [26] .data             PROGBITS        00000000002202e0 000202e0  616
  [27] .bss              NOBITS          0000000000220548 00020548  4824

Symbol Table (.dynsym) - 118 entries:
  Type     Bind   Name                          Library
  FUNC     GLOBAL printf                        libc.so.6
  FUNC     GLOBAL malloc                        libc.so.6
  FUNC     GLOBAL __libc_start_main             libc.so.6
  FUNC     GLOBAL strcmp                        libc.so.6
  FUNC     GLOBAL opendir                       libc.so.6
  FUNC     WEAK   __gmon_start__                (undefined)
  ...

Relocation Entries (.rela.plt) - 89 entries:
  Offset           Info             Type              Symbol + Addend
  0000000220018    000100000007     R_X86_64_JUMP_SLOT printf@GLIBC_2.2.5 + 0
  0000000220020    000200000007     R_X86_64_JUMP_SLOT malloc@GLIBC_2.2.5 + 0
  0000000220028    000300000007     R_X86_64_JUMP_SLOT __libc_start_main + 0

Dynamic Dependencies:
  NEEDED: libselinux.so.1
  NEEDED: libc.so.6

$ ./elfmap --plt-trace /usr/bin/ls
PLT/GOT Lazy Binding Trace:
=============================
Before first call to printf():
  GOT[printf] @ 0x220018 = 0x5026  (points to PLT stub)

[CALL] printf@plt (first call)
  -> PLT stub pushes reloc index, jumps to resolver
  -> ld.so resolves printf to 0x7f3a2c4a5c40 (libc.so.6)
  -> GOT[printf] updated: 0x5026 -> 0x7f3a2c4a5c40

After first call:
  GOT[printf] @ 0x220018 = 0x7f3a2c4a5c40  (direct to libc)

[CALL] printf (second call)
  -> Direct jump via GOT, no resolver needed

$ ./interpose malloc ./myprogram arg1 arg2
=== Interposition Library Loaded ===
Wrapping: malloc, free, calloc, realloc

[14:23:45.001] malloc(64) = 0x55a3b2c00010  [caller: 0x55a3b1a00a32 main+18]
[14:23:45.002] malloc(1024) = 0x55a3b2c00060  [caller: 0x55a3b1a00a58 main+56]
[14:23:45.003] malloc(256) = 0x55a3b2c00470  [caller: 0x55a3b1a00b12 process_data+22]
[14:23:45.004] free(0x55a3b2c00060)  [caller: 0x55a3b1a00b98 process_data+158]
[14:23:45.005] realloc(0x55a3b2c00010, 128) = 0x55a3b2c00010  [caller: 0x55a3b1a00c04 resize_buffer+12]

=== Interposition Summary ===
Total allocations: 47
Total frees: 45
Current heap usage: 384 bytes
Peak heap usage: 8,192 bytes
Potential leaks: 2 blocks (384 bytes)
  - 0x55a3b2c00470 (256 bytes) allocated at process_data+22
  - 0x55a3b2c00590 (128 bytes) allocated at process_data+98

The Core Question You're Answering

"How does a collection of separately compiled object files become a running program, and how can I observe and modify the symbol resolution process at runtime?"

Concepts You Must Understand First

  1. ELF File Format Structure
    • What are the major components of an ELF file (headers, sections, segments)?
    • What is the difference between sections and segments? When is each used?
    • What information does the ELF header contain?
    • Book: CS:APP Chapter 7.4 (Relocatable Object Files) and The Linux Programming Interface Ch. 41
  2. Symbol Tables and Symbol Resolution
    • What is a symbol? What types of symbols exist (global, local, weak)?
    • How does the linker resolve duplicate symbol definitions?
    • What happens with unresolved symbols?
    • Book: CS:APP Chapter 7.5 (Symbols and Symbol Tables) and 7.6 (Symbol Resolution)
  3. Relocation Process
    • Why is relocation necessary? What problem does it solve?
    • What information is in a relocation entry?
    • What are PC-relative vs absolute relocations?
    • Book: CS:APP Chapter 7.7 (Relocation)
  4. Static vs Dynamic Linking
    • What are the tradeoffs between static and dynamic linking?
    • When is each appropriate?
    • What is a shared library? How does it differ from a static archive?
    • Book: CS:APP Chapter 7.10 (Dynamic Linking with Shared Libraries)
  5. PLT and GOT (Lazy Binding)
    • What is the Procedure Linkage Table? The Global Offset Table?
    • How does lazy binding work? What triggers resolution?
    • Why does the first call to a library function take longer?
    • Book: CS:APP Chapter 7.12 (Position-Independent Code) and Practical Binary Analysis Ch. 2
  6. Library Interposition
    • What is function interposition? Why is it useful?
    • What are compile-time, link-time, and runtime interposition?
    • How does LD_PRELOAD work?
    • Book: CS:APP Chapter 7.13 (Library Interpositioning)

Questions to Guide Your Design

  1. ELF Parsing Strategy: Will you parse the ELF manually, use libelf, or memory-map and cast to structures?

  2. Output Format: How will you present symbol tables and relocations in a human-readable way? What groupings help understanding?

  3. Cross-Reference: How will you show which relocations reference which symbols?

  4. Dynamic Analysis: How will you trace PLT/GOT behavior at runtime? ptrace? Interposition?

  5. Interposition Library: What functions will you interpose? How will you call the original function?

  6. Evidence Logging: What information must you capture to prove "this call went through this resolution path"?

Thinking Exercise

Consider this scenario with two object files being linked:

// main.c
extern int counter;
extern void increment(void);
int main(void) {
    increment();
    return counter;
}

// lib.c
int counter = 0;
void increment(void) {
    counter++;
}

Compile to object files and examine:

gcc -c main.c -o main.o
gcc -c lib.c -o lib.o

Hand-trace questions:

  1. In main.o, what symbols are UNDEFINED? What symbols are defined?
  2. In lib.o, what symbols are defined? Are they global or local?
  3. What relocation entries does main.o have? What type are they?
  4. When the linker processes these files, how does it resolve the counter reference in main.o?
  5. If you add static to counter in lib.c, what error do you get? Why?
  6. If you add a second file with int counter = 5;, what happens? (Strong vs weak symbols)
  7. Now compile as a shared library: gcc -fPIC -shared lib.c -o libmylib.so. What changes in the relocation types?
  8. How would the GOT entry for counter get filled at runtime?

The Interview Questions They'll Ask

  1. "Explain the difference between static and dynamic linking. When would you choose each?"
    • Expected: Tradeoffs (startup time, memory sharing, updates, deployment), concrete scenarios
  2. "What happens when you call printf() for the first time in a dynamically linked program?"
    • Expected: PLT stub, GOT lookup, lazy binding, runtime linker resolution, GOT update
  3. "How would you intercept all malloc calls in a program without modifying its source?"
    • Expected: LD_PRELOAD, dlsym for RTLD_NEXT, wrapper function pattern
  4. "What is Position Independent Code and why is it needed for shared libraries?"
    • Expected: Load address independence, PC-relative addressing, GOT for data references
  5. "You're debugging a program that crashes in a library function. How do you determine which library provided that function?"
    • Expected: ldd, /proc/PID/maps, nm, readelf, examining PLT/GOT at crash time
  6. "Explain the One Definition Rule and how the linker handles multiple definitions."
    • Expected: Strong vs weak symbols, resolution rules, static keyword effect

Hints in Layers

Layer 1 - Getting Started: Use existing tools to understand the format before parsing yourself:

# See all sections
readelf -S /usr/bin/ls

# See symbol table
readelf -s /usr/bin/ls

# See relocations
readelf -r /usr/bin/ls

# See dynamic dependencies
ldd /usr/bin/ls

Layer 2 - ELF Header Parsing: The ELF header is at offset 0 and tells you where everything else is:

#include <elf.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>

// Memory-map the file
int fd = open(path, O_RDONLY);
struct stat st;
fstat(fd, &st);
void *map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);

// Cast to ELF header
Elf64_Ehdr *ehdr = (Elf64_Ehdr *)map;

// Find section headers
Elf64_Shdr *shdr = (Elf64_Shdr *)((char *)map + ehdr->e_shoff);
int num_sections = ehdr->e_shnum;

Layer 3 - Symbol Table Navigation: Symbol tables are in sections of type SHT_SYMTAB or SHT_DYNSYM:

// Find the string table for symbol names
Elf64_Shdr *strtab_section = &shdr[symtab_section->sh_link];
char *strtab = (char *)map + strtab_section->sh_offset;

// Iterate symbols
Elf64_Sym *symtab = (Elf64_Sym *)((char *)map + symtab_section->sh_offset);
int num_syms = symtab_section->sh_size / sizeof(Elf64_Sym);

for (int i = 0; i < num_syms; i++) {
    char *name = strtab + symtab[i].st_name;
    int type = ELF64_ST_TYPE(symtab[i].st_info);
    int bind = ELF64_ST_BIND(symtab[i].st_info);
    // ...
}

Layer 4 - Interposition Library: Create a shared library that wraps functions:

// malloc_wrapper.c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>

// Function pointer to real malloc
static void *(*real_malloc)(size_t) = NULL;

void *malloc(size_t size) {
    if (!real_malloc) {
        real_malloc = (void *(*)(size_t))dlsym(RTLD_NEXT, "malloc");
    }
    void *ptr = real_malloc(size);
    fprintf(stderr, "malloc(%zu) = %p\n", size, ptr);
    return ptr;
}

// Compile: gcc -fPIC -shared -o libmalloc_wrapper.so malloc_wrapper.c -ldl
// Use: LD_PRELOAD=./libmalloc_wrapper.so ./myprogram

Layer 5 - PLT/GOT Tracing: To observe lazy binding, examine the GOT before and after first call:

// Get GOT address from /proc/self/maps or by parsing ELF
// Read GOT entry before call (will point to PLT+6)
// Call function
// Read GOT entry after (will point to actual function in libc)

// Or use ptrace to single-step through PLT resolution

Books That Will Help

Topic Book Chapter
Object files and linking Computer Systems: A Programmer's Perspective Ch. 7 Linking
ELF format details The Linux Programming Interface Ch. 41 Fundamentals of Shared Libraries
Shared library mechanics The Linux Programming Interface Ch. 42 Advanced Features of Shared Libraries
Symbol resolution rules Computer Systems: A Programmer's Perspective Ch. 7.6 Symbol Resolution
Position-independent code Computer Systems: A Programmer's Perspective Ch. 7.12 Position-Independent Code
Library interposition Computer Systems: A Programmer's Perspective Ch. 7.13 Library Interpositioning
ELF internals Practical Binary Analysis Ch. 2 The ELF Format
Dynamic linking internals Linkers and Loaders (John Levine) Ch. 10 Dynamic Linking and Loading
Linker scripts and details Linkers and Loaders (John Levine) Ch. 3-7

Project 11: Signals + Processes Sandbox

Attribute Value
Language C (alt: Rust, Zig, C++)
Difficulty Advanced
Time 1โ€“2 weeks
Chapters 8
Coolness ★★★☆☆ Genuinely Clever
Portfolio Value Resume Gold

What you'll build: A harness that runs child processes in controlled modes (normal exit, crash, stop/continue, timeout) and logs exactly which ECF events occurred and why.

Why it matters: Chapter 8 is about the realities of process control and signals; observation is mandatory.

Core challenges:

  • Correct process creation/reaping (process lifecycle)
  • Async-signal-safe handler design (safe signal handling)
  • Avoiding zombies and race windows (correctness)

Key concepts to master:

  • Process lifecycle (Ch. 8)
  • Signals and handlers (Ch. 8)
  • Nonlocal control (Ch. 8)

Prerequisites: Basic OS concepts.

Deliverable: Demonstrate zombies/orphans/signal races and explain how to prevent them.

Implementation hints:

  • Treat each mode like a lab experiment; isolate one behavior per run
  • Produce a timeline log: spawn → signal → status change → reap

Milestones:

  1. You can explain why zombies happen
  2. Your signal handlers are correct, not "sometimes works"
  3. You can reason about race windows without superstition

Real World Outcome

$ ./procsandbox --mode=lifecycle
Process Lifecycle Sandbox
==========================
Demonstrating: fork, exec, wait, exit

[Parent PID=1234] Forking child...
[14:30:01.001] fork() returned 1235 in parent
[14:30:01.001] fork() returned 0 in child (PID=1235)
[Child  PID=1235] Executing /bin/echo "Hello from child"
[14:30:01.002] execve("/bin/echo", ["echo", "Hello from child"], envp)
Hello from child
[14:30:01.003] Child 1235 called exit(0)
[Parent PID=1234] waitpid() returned: child=1235, status=0x0000
  -> WIFEXITED: true, exit code: 0

Process Timeline:
  Parent [1234]: ----[fork]--------------------[wait/reap]----
  Child  [1235]:      |----[exec]----[run]----[exit]---|
                     t=0   t=1ms     t=2ms    t=3ms

$ ./procsandbox --mode=zombie
Zombie Process Demonstration
=============================

[Parent PID=1234] Creating child without reaping...
[14:30:05.001] Child 1236 created
[14:30:05.002] Child 1236 exiting immediately
[14:30:05.003] Child 1236 is now a ZOMBIE (parent hasn't called wait)

Process Status (from /proc):
  PID   PPID  STATE  COMMAND
  1234  1233  S      procsandbox
  1236  1234  Z      [procsandbox] <defunct>    <-- ZOMBIE!

[14:30:07.000] Parent now calling waitpid()...
[14:30:07.001] Zombie 1236 reaped, status=0x0000

$ ./procsandbox --mode=signals
Signal Handling Demonstration
==============================

[PID=1234] Installing handlers for SIGINT, SIGCHLD, SIGTSTP, SIGUSR1

[14:30:10.001] Forking child 1237 (will run for 5 seconds)...
[14:30:10.002] Child 1237 running: ./sleeper 5

--- Press Ctrl+C to send SIGINT ---
^C
[14:30:12.500] Received SIGINT (signal 2)
  Handler context:
    - Interrupted syscall: yes (was in read())
    - errno preserved: yes (was EINTR, restored to 0)
    - SA_RESTART set: no (syscall returns -1/EINTR)
[14:30:12.501] Forwarding SIGINT to child process group...
[14:30:12.502] Child 1237 terminated by signal 2 (SIGINT)

[14:30:12.503] SIGCHLD received
  Handler actions (async-signal-safe only):
    - Saved errno: 0
    - Called waitpid(-1, &status, WNOHANG): returned 1237
    - WIFSIGNALED(status): true, signal: 2
    - Restored errno: 0

$ ./procsandbox --mode=race
Signal Race Condition Demonstration
====================================

INCORRECT PATTERN (race window exists):

pid_t pid = fork();
if (pid == 0) {
    execve(...);  // Child runs
}
// RACE WINDOW: SIGCHLD might arrive HERE, before job added!
addjob(pid);      // Parent adds job

Running 1000 iterations with race-prone code… [Results] Failures: 47/1000 (child reaped before job added)

CORRECT PATTERN (block signals around critical section):

sigprocmask(SIG_BLOCK, &mask_chld, &prev);  // Block SIGCHLD
pid_t pid = fork();
if (pid == 0) {
    sigprocmask(SIG_SETMASK, &prev, NULL);  // Unblock in child
    execve(...);
}
addjob(pid);       // Safe: SIGCHLD blocked
sigprocmask(SIG_SETMASK, &prev, NULL);      // Unblock, handler runs

Running 1000 iterations with correct code… [Results] Failures: 0/1000 (no races detected)


The Core Question You're Answering

"How does the operating system manage the lifecycle of processes, and how can programs respond to asynchronous events (signals) correctly and safely?"

Concepts You Must Understand First

  1. Process Creation and the fork() Model
    • What does fork() return in the parent? In the child?
    • What is shared between parent and child after fork? What is copied?
    • Why does fork return twice?
    • Book: CS:APP Chapter 8.4.2 (Creating Processes) and TLPI Chapter 24
  2. The exec Family and Process Replacement
    • What happens to the calling process during exec?
    • What is preserved across exec? What is not?
    • When does exec return? What does it return?
    • Book: CS:APP Chapter 8.4.5 (Loading and Running Programs) and TLPI Chapter 27
  3. Process Termination and Reaping
    • What is a zombie process? Why do they exist?
    • What is the difference between wait() and waitpid()?
    • What do WIFEXITED, WIFSIGNALED, and WIFSTOPPED tell you?
    • Book: CS:APP Chapter 8.4.3 (Reaping Child Processes) and TLPI Chapter 26
  4. Signals: Asynchronous Events
    • What is a signal? What triggers signal delivery?
    • What is the difference between generating, delivering, and handling a signal?
    • What signals are sent by Ctrl+C, Ctrl+Z? What is their default behavior?
    • Book: CS:APP Chapter 8.5 (Signals) and TLPI Chapters 20-22
  5. Signal Handlers and Async-Signal-Safety
    • Why can't you call printf() in a signal handler?
    • What functions are async-signal-safe? Why does this matter?
    • What is the volatile sig_atomic_t type for?
    • Book: CS:APP Chapter 8.5.5 (Writing Signal Handlers) and TLPI Chapter 21.1
  6. Signal Blocking and Critical Sections
    • How do you block signals? Why would you want to?
    • What is a signal mask? How does sigprocmask work?
    • What happens to blocked signals? Are they queued?
    • Book: CS:APP Chapter 8.5.6 (Synchronizing Flows) and TLPI Chapter 20.10
  7. Process Groups and Sessions
    • What is a process group? Why do shells use them?
    • How does the kernel know which process to send SIGINT to when you press Ctrl+C?
    • What is a controlling terminal?
    • Book: CS:APP Chapter 8.5.2 (Sending Signals) and TLPI Chapter 34

Questions to Guide Your Design

  1. Test Harness Structure: How will you organize different demonstration modes (lifecycle, signals, races)?

  2. Observability: How will you log events with precise timestamps? How will you show the timeline?

  3. Signal Handler Design: How will you make handlers async-signal-safe while still logging useful information?

  4. Race Reproduction: How will you reliably reproduce race conditions for educational purposes?

  5. Process State Inspection: Will you use /proc filesystem? waitpid flags? Both?

  6. Error Handling: How will you handle EINTR from interrupted system calls?

Thinking Exercise

Consider this signal handler:

volatile sig_atomic_t got_sigchld = 0;
int child_count = 0;  // Number of children to reap

void sigchld_handler(int sig) {
    int olderrno = errno;
    pid_t pid;
    int status;

    // Reap ALL available children (might be multiple)
    while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
        child_count--;  // BUG: not async-signal-safe!
        // What if main code was in middle of reading child_count?
    }

    got_sigchld = 1;
    errno = olderrno;
}

Hand-trace questions:

  1. Why do we save and restore errno? What could corrupt it?
  2. Why use WNOHANG? What would happen without it?
  3. Why the while loop instead of a single waitpid call?
  4. The code modifies child_count - why is this problematic?
  5. What if two SIGCHLD signals arrive "simultaneously"? Are both delivered?
  6. What is volatile sig_atomic_t and why is it needed for got_sigchld?
  7. How would you fix the child_count update to be safe?
  8. Write a main loop that correctly checks got_sigchld and processes reaped children.

The Interview Questions They'll Ask

  1. "Explain what happens when you type Ctrl+C in a terminal running a program."
    • Expected: Terminal driver sends SIGINT to foreground process group, default handler terminates process
  2. "What is a zombie process and how do you prevent them?"
    • Expected: Terminated child waiting to be reaped, parent must call wait/waitpid, SIGCHLD handler for async reaping
  3. "Why can't you call printf() from inside a signal handler?"
    • Expected: printf not async-signal-safe, could deadlock on internal locks, use write() instead
  4. "How would you implement a timeout for a child process?"
    • Expected: alarm() or setitimer(), SIGALRM handler, kill() to terminate child, waitpid() to reap
  5. "Describe a race condition involving fork() and signals, and how to prevent it."
    • Expected: SIGCHLD arriving before job table updated, block signals around fork/addjob, unblock after
  6. "What is the difference between SIGTERM and SIGKILL?"
    • Expected: SIGTERM can be caught/ignored (graceful shutdown), SIGKILL cannot be caught (forced termination)

Hints in Layers

Layer 1 - Basic Process Creation: Start with a simple fork/exec/wait cycle:

pid_t pid = fork();
if (pid == 0) {
    // Child process
    execve("/bin/echo", (char *[]){"echo", "hello", NULL}, environ);
    perror("execve failed");
    exit(1);
}
// Parent process
int status;
waitpid(pid, &status, 0);
if (WIFEXITED(status)) {
    printf("Child exited with code %d\n", WEXITSTATUS(status));
}

Layer 2 - Signal Handler Installation: Use sigaction() instead of signal() for portable behavior:

struct sigaction sa;
sa.sa_handler = sigchld_handler;
sigemptyset(&sa.sa_mask);
sa.sa_flags = SA_RESTART;  // Restart interrupted syscalls

if (sigaction(SIGCHLD, &sa, NULL) < 0) {
    perror("sigaction");
    exit(1);
}

Layer 3 - Async-Signal-Safe Logging: Write your own safe logging using write():

// Safe string output in signal handler
void safe_print(const char *s) {
    write(STDERR_FILENO, s, strlen(s));
}

// Safe integer output (pre-convert to string)
void safe_print_int(int n) {
    char buf[32];
    int i = sizeof(buf) - 1;
    buf[i] = '\0';
    int neg = (n < 0);
    if (neg) n = -n;
    do {
        buf[--i] = '0' + (n % 10);
        n /= 10;
    } while (n > 0);
    if (neg) buf[--i] = '-';
    write(STDERR_FILENO, &buf[i], sizeof(buf) - 1 - i);
}

Layer 4 - Blocking Signals for Critical Sections: Protect fork/job-add sequences from SIGCHLD:

sigset_t mask, prev;
sigemptyset(&mask);
sigaddset(&mask, SIGCHLD);

// Block SIGCHLD
sigprocmask(SIG_BLOCK, &mask, &prev);

pid_t pid = fork();
if (pid == 0) {
    // Child: restore signal mask before exec
    sigprocmask(SIG_SETMASK, &prev, NULL);
    execve(argv[0], argv, environ);
    exit(1);
}

// Parent: add to job list while SIGCHLD blocked
add_job(job_list, pid, RUNNING);

// Unblock SIGCHLD - pending signal delivered now
sigprocmask(SIG_SETMASK, &prev, NULL);

Layer 5 - Detecting Process State via /proc: Read process state for educational output:

void print_process_state(pid_t pid) {
    char path[64], buf[256];
    snprintf(path, sizeof(path), "/proc/%d/stat", pid);
    int fd = open(path, O_RDONLY);
    if (fd >= 0) {
        ssize_t n = read(fd, buf, sizeof(buf) - 1);
        if (n > 0) buf[n] = '\0';   // NUL-terminate before parsing
        // Parse: pid (comm) state ppid pgrp ...
        // state: R=running, S=sleeping, Z=zombie, T=stopped
        close(fd);
    }
}

Books That Will Help

Topic Book Chapter
Process control fundamentals Computer Systems: A Programmer's Perspective Ch. 8.4 Process Control
Signal concepts and handling Computer Systems: A Programmer's Perspective Ch. 8.5 Signals
Comprehensive signal coverage The Linux Programming Interface Ch. 20-22 Signals
Process creation in depth The Linux Programming Interface Ch. 24-25 Process Creation
Process termination and waiting The Linux Programming Interface Ch. 26 Monitoring Child Processes
Signal safety and reentrancy The Linux Programming Interface Ch. 21.1 Designing Signal Handlers
Process groups and sessions The Linux Programming Interface Ch. 34 Process Groups, Sessions
Process lifecycle overview Advanced Programming in the UNIX Environment Ch. 8 Process Control
Signals in practice Advanced Programming in the UNIX Environment Ch. 10 Signals
Concurrency with processes Operating Systems: Three Easy Pieces Ch. 5-6 Process API

Project 12: Unix Shell with Job Control

View Expanded Guide - Comprehensive implementation guide with signal flow diagrams, race condition patterns, and job state machines.

Attribute Value
Language C (alt: Rust, Zig, C++)
Difficulty Advanced
Time 2โ€“3 weeks
Chapters 8, 12
Coolness ★★★★☆ Hardcore Tech Flex
Portfolio Value Resume Gold

What you'll build: An interactive shell supporting foreground/background jobs, basic built-ins, and correct handling of interrupt/stop keystrokes.

Why it matters: It integrates processes, signals, and race avoidance into one user-facing system.

Core challenges:

  • Process groups and terminal ownership (job control)
  • Signal handling without races (ECF correctness)
  • Consistent job state under async events (concurrency fundamentals)

Key concepts to master:

  • Job control and signals (Ch. 8)
  • Race avoidance patterns (Ch. 8 & 12)
  • Robust error handling (Appendix)

Prerequisites: Project 11 recommended.

Deliverable: Use your shell to run real programs with correct fg/bg behavior.

Implementation hints:

  • Define a minimal grammar first; add features only after correctness
  • Design job-state transitions on paper before coding

Milestones:

  1. Basic commands run reliably
  2. Foreground/background switching works under stress
  3. No zombies; correct behavior under repeated interrupts/stops

Real World Outcome

$ ./mysh
mysh> echo hello world
hello world
mysh> /bin/ls -la
total 48
drwxr-xr-x  5 user user 4096 Dec 26 10:00 .
drwxr-xr-x 30 user user 4096 Dec 25 09:00 ..
-rwxr-xr-x  1 user user 8432 Dec 26 10:00 mysh
-rw-r--r--  1 user user 2341 Dec 26 09:55 mysh.c
mysh> sleep 100 &
[1] (12345) sleep 100 &
mysh> sleep 200 &
[2] (12346) sleep 200 &
mysh> jobs
[1] (12345) Running    sleep 100 &
[2] (12346) Running    sleep 200 &
mysh> fg %1
sleep 100
^Z
Job [1] (12345) stopped by signal 20 (SIGTSTP)
mysh> jobs
[1] (12345) Stopped    sleep 100
[2] (12346) Running    sleep 200 &
mysh> bg %1
[1] (12345) sleep 100 &
mysh> jobs
[1] (12345) Running    sleep 100 &
[2] (12346) Running    sleep 200 &
mysh> fg %2
sleep 200
^C
Job [2] (12346) terminated by signal 2 (SIGINT)
mysh> jobs
[1] (12345) Running    sleep 100 &
mysh> kill %1
Job [1] (12345) terminated by signal 15 (SIGTERM)
mysh> jobs
mysh>

--- Signal Handling Demo ---
mysh> ./long_running_process &
[1] (12350) ./long_running_process &
mysh> ./another_process
^C
[Ctrl+C sent SIGINT to foreground job only]
[Background job 12350 continues running]
Job ./another_process terminated by signal 2
mysh> jobs
[1] (12350) Running    ./long_running_process &

--- Race Condition Prevention Demo (internal trace) ---
mysh> ./quick_exit &    # Child exits immediately
[DEBUG] sigprocmask(SIG_BLOCK, {SIGCHLD})
[DEBUG] fork() = 12355
[DEBUG] addjob(12355, "./quick_exit")
[DEBUG] sigprocmask(SIG_UNBLOCK, {SIGCHLD})
[DEBUG] SIGCHLD handler: waitpid returned 12355
[DEBUG] deletejob(12355) - job found and removed
[1] (12355) ./quick_exit &

--- Proper Terminal Control ---
mysh> vim test.txt     # Interactive program gets terminal control
[tcsetpgrp gives terminal to vim's process group]
[vim runs with full terminal control]
[After vim exits, shell reclaims terminal]
mysh>

The Core Question You're Answering

"How do shells provide the illusion of multiple concurrent programs sharing one terminal, and how do they coordinate process lifecycle, terminal control, and signal delivery without races or resource leaks?"

Concepts You Must Understand First

  1. Job Control Model
    • What is a job? How does it differ from a process?
    • What states can a job be in (foreground, background, stopped)?
    • What triggers transitions between job states?
    • Book: CS:APP Chapter 8.5 and TLPI Chapter 34
  2. Process Groups
    • What is a process group? How is it different from a job?
    • Why does the shell put each pipeline in its own process group?
    • How does setpgid() work? Who can call it?
    • Book: CS:APP Chapter 8.5.2 and TLPI Chapter 34.2
  3. Foreground Process Group and Terminal Control
    • What is the foreground process group? How is it set?
    • What is tcsetpgrp() and when must you call it?
    • What happens if a background process tries to read from the terminal?
    • Book: TLPI Chapter 34.4-34.5 and Advanced Programming in the UNIX Environment Chapter 9
  4. Signal Delivery to Process Groups
    • When you press Ctrl+C, which processes receive SIGINT?
    • How do you send a signal to an entire process group?
    • What is the difference between kill(pid, sig) and kill(-pgid, sig)?
    • Book: CS:APP Chapter 8.5.2 and TLPI Chapter 20.5
  5. Waiting for Stopped/Continued Children
    • What is WUNTRACED? WCONTINUED? When do you need them?
    • How do you distinguish a stopped child from a terminated child?
    • What signals cause a process to stop? To continue?
    • Book: CS:APP Chapter 8.4.3 and TLPI Chapter 26.1
  6. Race Conditions in Shell Implementation
    • What race exists between fork() and adding a job to the table?
    • What race exists between SIGCHLD and the main loop?
    • Why must you block signals during critical sections?
    • Book: CS:APP Chapter 8.5.6 and Chapter 12
  7. Built-in Commands vs External Commands
    • Why must some commands be built-in (cd, exit, jobs, fg, bg)?
    • How do you decide if a command is built-in?
    • What happens if you try to exec a built-in?
    • Book: TLPI Chapter 34.7 and APUE Chapter 9

Questions to Guide Your Design

  1. Job Table Structure: How will you represent jobs? What information do you need per job (pid, pgid, state, command line)?

  2. Command Parsing: How will you parse command lines? Will you handle pipes, redirects, or just simple commands first?

  3. Signal Handler Design: What will your SIGCHLD handler do? What must it NOT do?

  4. Terminal Control: When do you call tcsetpgrp()? What happens if you forget?

  5. Main Loop Architecture: How do you wait for foreground jobs? How do you handle asynchronous SIGCHLD for background jobs?

  6. Error Recovery: What happens if exec fails? If fork fails? If the command doesn't exist?

Thinking Exercise

Consider this shell main loop pseudocode:

while (1) {
    char *cmdline = readline("mysh> ");
    if (is_builtin(cmdline)) {
        do_builtin(cmdline);
    } else {
        pid_t pid = fork();
        if (pid == 0) {
            // Child
            setpgid(0, 0);  // New process group
            execve(argv[0], argv, environ);
            exit(1);
        }
        // Parent
        setpgid(pid, pid);  // Also set pgid (race with child)
        if (foreground) {
            tcsetpgrp(STDIN_FILENO, pid);  // Give terminal to child
            waitpid(pid, &status, WUNTRACED);  // Wait for fg job
            tcsetpgrp(STDIN_FILENO, getpgrp());  // Reclaim terminal
        } else {
            printf("[%d] %d\n", jobnum, pid);
        }
    }
}

Hand-trace questions:

  1. Why does both parent AND child call setpgid()? What race does this solve?
  2. What happens if the child execs before the parent calls setpgid()?
  3. Why do we call tcsetpgrp() before waitpid() for foreground jobs?
  4. What happens if we forget to reclaim the terminal after the foreground job finishes?
  5. Where should SIGCHLD handling happen? Is it missing from this pseudocode?
  6. What happens if the user types Ctrl+C while a foreground job is running?
  7. What happens if the user types Ctrl+Z? What state does the job transition to?
  8. How would you modify this to properly add jobs to a job table and handle background jobs?

The Interview Questions They'll Ask

  1. "Walk me through what happens when you type 'ls | grep foo' in a shell and press Enter."
    • Expected: Parsing, fork for each command, pipe creation, process group setup, exec, wait
  2. "How does job control work? What happens when you press Ctrl+Z?"
    • Expected: SIGTSTP to foreground process group, process stops, shell reclaims terminal, job marked stopped
  3. "Why can't 'cd' be an external command?"
    • Expected: chdir() affects calling process only, child process change doesn't affect parent shell
  4. "Describe a race condition in a naive shell implementation and how to fix it."
    • Expected: Fork/SIGCHLD race, signal blocking around critical sections
  5. "What is a process group and why do shells use them?"
    • Expected: Collection of related processes, signal delivery, terminal control, job abstraction
  6. "How would you implement the 'fg' built-in command?"
    • Expected: Find job, send SIGCONT if stopped, give it terminal via tcsetpgrp(), waitpid with WUNTRACED

Hints in Layers

Layer 1 - Basic Command Execution: Start with a shell that can only run simple foreground commands:

int main(void) {
    char cmdline[1024];
    while (1) {
        printf("mysh> ");
        fflush(stdout);  // Ensure the prompt appears before blocking on input
        if (!fgets(cmdline, sizeof(cmdline), stdin)) break;

        // Parse cmdline into argv (simple: split on whitespace)
        char *argv[64];
        parse_cmdline(cmdline, argv);
        if (argv[0] == NULL) continue;

        pid_t pid = fork();
        if (pid == 0) {
            execvp(argv[0], argv);
            perror(argv[0]);
            exit(127);
        }
        int status;
        waitpid(pid, &status, 0);
    }
    return 0;
}

Layer 2 - Job Table Data Structure: Design your job table before adding background jobs:

#define MAXJOBS 16

typedef enum { UNDEF, FG, BG, ST } job_state_t;

typedef struct {
    pid_t pid;              // Process ID
    pid_t pgid;             // Process group ID
    job_state_t state;      // FG, BG, or ST (stopped)
    int jid;                // Job ID [1], [2], etc.
    char cmdline[1024];     // Command line for display
} job_t;

job_t jobs[MAXJOBS];

// Operations: addjob, deletejob, getjobpid, getjobjid, pid2jid, listjobs

Layer 3 - SIGCHLD Handler: Handle child termination and stops asynchronously:

void sigchld_handler(int sig) {
    int olderrno = errno;
    pid_t pid;
    int status;

    // Reap ALL available children
    while ((pid = waitpid(-1, &status, WNOHANG | WUNTRACED)) > 0) {
        if (WIFEXITED(status) || WIFSIGNALED(status)) {
            // Child terminated - delete from job table
            deletejob(jobs, pid);
        } else if (WIFSTOPPED(status)) {
            // Child stopped - update job state
            job_t *job = getjobpid(jobs, pid);
            if (job) job->state = ST;
        }
    }

    errno = olderrno;
}

Layer 4 - Proper Fork with Signal Blocking: Prevent races between fork and job table updates:

void eval(char *cmdline) {
    sigset_t mask_all, mask_chld, prev_mask;
    sigfillset(&mask_all);
    sigemptyset(&mask_chld);
    sigaddset(&mask_chld, SIGCHLD);

    // Block SIGCHLD before fork
    sigprocmask(SIG_BLOCK, &mask_chld, &prev_mask);

    pid_t pid = fork();
    if (pid == 0) {
        // Child: unblock signals, set process group, exec
        sigprocmask(SIG_SETMASK, &prev_mask, NULL);
        setpgid(0, 0);
        execve(argv[0], argv, environ);
        exit(1);
    }

    // Parent: add job while SIGCHLD blocked
    setpgid(pid, pid);  // Also set in parent (race prevention)
    sigprocmask(SIG_BLOCK, &mask_all, NULL);  // Block all for job table
    addjob(jobs, pid, pid, bg ? BG : FG, cmdline);
    sigprocmask(SIG_SETMASK, &prev_mask, NULL);  // Restore (unblock SIGCHLD)

    if (!bg) {
        waitfg(pid);  // Wait for foreground job
    }
}

Layer 5 - Foreground Wait with sigsuspend: Correctly wait for foreground jobs without busy-waiting:

void waitfg(pid_t pid) {
    sigset_t mask;
    sigemptyset(&mask);

    // Wait until the job is no longer in foreground
    // SIGCHLD handler will update job state
    while (fgpid(jobs) == pid) {
        sigsuspend(&mask);  // Atomically unblock and wait
    }
}

Books That Will Help

Topic Book Chapter
Job control overview Computer Systems: A Programmer's Perspective Ch. 8.5 Signals (job control discussion)
Signal handling for shells Computer Systems: A Programmer's Perspective Ch. 8.5.5-8.5.7
Process groups and sessions The Linux Programming Interface Ch. 34 Process Groups, Sessions, and Job Control
Terminal control The Linux Programming Interface Ch. 34.4-34.6
Shell implementation details Advanced Programming in the UNIX Environment Ch. 9 Process Relationships
Job control signals Advanced Programming in the UNIX Environment Ch. 10.20 Job Control Signals
Race conditions Computer Systems: A Programmer's Perspective Ch. 8.5.6 Synchronizing Flows
Concurrent programming patterns Computer Systems: A Programmer's Perspective Ch. 12 Concurrent Programming
Process API Operating Systems: Three Easy Pieces Ch. 5 Process API
Shell history and design The Unix Programming Environment (Kernighan & Pike) Ch. 3 Using the Shell

Project 13: Virtual Memory Map Visualizer

Attribute Value
Language C (alt: Rust, Zig, C++)
Difficulty Advanced
Time 1โ€“2 weeks
Chapters 8, 9
Coolness ★★★☆☆ Genuinely Clever
Portfolio Value Micro-SaaS/Pro Tool

What you'll build: A tool that reports a process's virtual memory layout (regions, permissions, growth) and demonstrates demand paging and protection faults with controlled experiments.

Why it matters: It turns VM into observable reality: mapping, protection, faults, and locality.

Core challenges:

  • Presenting mapping info accurately (regions + permissions)
  • Controlled page-fault demonstrations (demand paging)
  • Explaining copy-on-write and sharing (fork + VM interaction)

Key concepts to master:

  • Address translation and pages (Ch. 9)
  • Memory protection and mapping (Ch. 9)
  • Process/VM interaction (Ch. 8โ€“9)

Prerequisites: Project 11 recommended.

Deliverable: Show an exact map of a process and explain why a specific access faults.

Implementation hints:

  • Start with "regions with permissions," then refine to page-level reasoning
  • Keep experiments minimal so the cause of faults is unambiguous

Milestones:

  1. You can distinguish heap/stack/mapped files by observation
  2. You can classify crashes as protection failures
  3. You reason about locality as VM + cache, not just "speed"

Real World Outcome

$ ./vmvis 12345
================================================================================
                    VIRTUAL MEMORY MAP VISUALIZER - Process: 12345 (myapp)
================================================================================

MEMORY REGIONS (from /proc/12345/maps):
--------------------------------------------------------------------------------
ADDRESS RANGE                SIZE      PERMS   PATH
--------------------------------------------------------------------------------
0x00400000-0x00452000       328 KB    r-xp    /usr/bin/myapp
0x00651000-0x00652000         4 KB    r--p    /usr/bin/myapp
0x00652000-0x00653000         4 KB    rw-p    /usr/bin/myapp
0x00653000-0x00674000       132 KB    rw-p    [heap]
0x7f8a3c000000-0x7f8a3c1bc000 1776 KB r-xp    /lib/libc-2.31.so
0x7ffc8a400000-0x7ffc8a421000 132 KB  rw-p    [stack]
0x7ffc8a5fe000-0x7ffc8a600000   8 KB  r-xp    [vdso]

REGION SUMMARY:
  Code (r-x): 2104 KB | Read-only (r--): 20 KB | Read-write (rw-): 276 KB

$ ./vmvis 12345 --page-fault-demo
================================================================================
                    PAGE FAULT DEMONSTRATION
================================================================================
[1] Allocating 16 pages (65536 bytes) without touching...
    Pages resident: 0 of 16

[2] Touching page 0 (writing 1 byte at 0x7f8a40000000)...
    >>> PAGE FAULT TRIGGERED <<<
    Fault type: MINOR (demand paging)
    Pages resident: 1 of 16

[3] Triggering protection fault (writing to code segment)...
    >>> SIGSEGV RECEIVED <<<
    si_code: SEGV_ACCERR (invalid permissions for mapped object)

$ ./vmvis --cow-demo
================================================================================
                    COPY-ON-WRITE DEMONSTRATION
================================================================================
[Parent] Allocating 1 MB, RSS before fork: 5120 KB
[Fork] Child created
  Parent RSS: 5120 KB | Child RSS: 5120 KB (pages SHARED!)

[Child writing to page 0...]
  >>> COPY-ON-WRITE FAULT <<<
  Child RSS: 5124 KB | Parent RSS: 5120 KB (unchanged)

The Core Question You're Answering

"How does virtual memory create the illusion of a large, private, contiguous address space for each process, and what are the performance and correctness implications of this abstraction?"

Concepts You Must Understand First

  1. Virtual vs Physical Addresses - What is an address space? How does the MMU translate addresses? CS:APP Ch. 9.3

  2. Pages and Page Tables - VPN/VPO division, PTE fields, TLB purpose. CS:APP Ch. 9.6

  3. Memory Mapping and Regions - Anonymous vs file-backed mappings, mmap(), fork behavior. CS:APP Ch. 9.8

  4. Page Faults - Minor vs major faults, demand paging. CS:APP Ch. 9.5

  5. Memory Protection - Permission bits (r/w/x), SIGSEGV types, ASLR. CS:APP Ch. 9.7

Questions to Guide Your Design

  1. Will you parse /proc/PID/maps directly? What edge cases exist ([heap], [vdso], deleted files)?
  2. How will you present 48-bit address space meaningfully?
  3. How will you observe page faults without being inside the target process?
  4. How do you safely trigger and catch protection faults?

Thinking Exercise

Trace these accesses with simplified page tables:

  • VPN 0x00400: PPN 0x1A000, r-x, present
  • VPN 0x00653: PPN 0x2B000, rw-, present
  • VPN 0x7f8a3: not present, backed by libc.so
  1. Instruction fetch from 0x00400ABC - success or fault?
  2. Write to 0x00653100 - success or fault?
  3. Read from 0x7f8a3000 - what happens? What changes?
  4. Write to 0x00400000 - different from #1, why?

The Interview Questions They'll Ask

  1. "Walk me through what happens when a process accesses memory that hasn't been touched since allocation." - Demand paging, minor fault, kernel allocates page, updates PTE, restarts instruction

  2. "Why is fork() so fast even for gigabytes of memory?" - Copy-on-write

  3. "What's the difference between SIGSEGV from null pointer vs writing to read-only?" - SEGV_MAPERR vs SEGV_ACCERR

  4. "How would you debug a memory leak that doesn't show in valgrind?" - /proc/PID/maps growth, RSS vs VSZ trends

Hints in Layers

Layer 1: Parse /proc/PID/maps with sscanf for start-end perms offset dev inode pathname
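
Layer 1 can be sketched as below; the struct, field names, and helper are illustrative assumptions, not a fixed API:

```c
#include <stdio.h>
#include <string.h>

typedef struct {
    unsigned long start, end;   /* Region boundaries */
    char perms[5];              /* e.g. "r-xp" */
    char path[256];             /* Pathname, [heap], [vdso], or empty */
} map_region_t;

/* Parse one line of /proc/PID/maps:
   "start-end perms offset dev inode [pathname]"
   Returns 1 on success, 0 on parse failure.
   Note: pathnames containing spaces (e.g. a " (deleted)" suffix)
   need more careful parsing than a single %s. */
int parse_maps_line(const char *line, map_region_t *r) {
    r->path[0] = '\0';
    int n = sscanf(line, "%lx-%lx %4s %*x %*s %*d %255s",
                   &r->start, &r->end, r->perms, r->path);
    return n >= 3;  /* pathname is optional (anonymous mappings omit it) */
}
```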

Layer 2: Read fault counters from /proc/PID/stat fields 10 and 12 (minflt, majflt)

Layer 3: Use sigsetjmp/siglongjmp with SIGSEGV handler to catch and recover from faults

Layer 4: Use mincore() to check page residency without faulting

Books That Will Help

Topic Book Chapter
Virtual Memory Fundamentals Computer Systems: A Programmer's Perspective Ch. 9
Page Tables and TLB Operating Systems: Three Easy Pieces Ch. 18-20
Linux Memory Management The Linux Programming Interface Ch. 48-50
mmap and Memory Mapping Advanced Programming in the UNIX Environment Ch. 14

Project 14: Build Your Own Malloc

Attribute Value
Language C (alt: Rust, Zig, C++)
Difficulty Expert
Time 1 month+
Chapters 6, 9
Coolness ★★★★★ Pure Magic
Portfolio Value Resume Gold

What you'll build: A user-space allocator implementing malloc/free (optionally realloc) plus tooling for invariants, fragmentation, and throughput.

Why it matters: This project is where data layout, locality, and VM pay off: you confront alignment, metadata design, coalescing, and policy trade-offs directly.

Core challenges:

  • Block metadata and alignment design (layout + ABI alignment)
  • Free-list policies, splitting/coalescing (fragmentation trade-offs)
  • Heap checker and performance harness (correctness + optimization)

Key concepts to master:

  • Heap layout and allocator concepts (Ch. 9)
  • Locality and performance effects (Ch. 6)
  • Invariants mindset (C Interfaces and Implementations reference)

Prerequisites: Projects 2, 9, and 13 recommended.

Deliverable: Allocate/free at scale without corruption; provide evidence on fragmentation and throughput.

Implementation hints:

  • Lock down invariants first; instrument everything
  • Every blockโ€™s header must be explainable in a dump

Milestones:

  1. Allocator passes correctness tests
  2. Fragmentation becomes measurable and improvable by policy
  3. You can explain bugs as violated invariants, not "weird behavior"

Real World Outcome

$ ./mymalloc --test-suite
================================================================================
                    MALLOC IMPLEMENTATION TEST SUITE
================================================================================

Running correctness tests...
  [PASS] Basic malloc/free cycle (1000 allocations)
  [PASS] Alignment check (all pointers 16-byte aligned)
  [PASS] Coalescing test (adjacent free blocks merged)
  [PASS] Realloc in-place when possible
  [PASS] Zero-size malloc returns NULL or unique pointer
  [PASS] Double-free detection (caught and reported)
  [PASS] Heap overflow detection (guard bytes intact)

Running stress tests...
  [PASS] Random alloc/free pattern (100000 ops, no corruption)
  [PASS] Worst-case fragmentation pattern (alternating sizes)

$ ./mymalloc --heap-dump
================================================================================
                    HEAP DUMP - Block Layout
================================================================================
Heap start: 0x555555756000  Heap end: 0x555555776000  Size: 131072 bytes

Block    Address          Size    Status    Prev    Next (free list)
--------------------------------------------------------------------------------
[  0]    0x555555756000   64      ALLOC     -       -
[  1]    0x555555756040   128     FREE      -       [3]
[  2]    0x5555557560c0   256     ALLOC     -       -
[  3]    0x5555557561c0   512     FREE      [1]     [5]
[  4]    0x5555557563c0   1024    ALLOC     -       -
[  5]    0x5555557567c0   2048    FREE      [3]     -

Free list heads (segregated):
  Class 0 (16-64):    [1] -> NULL
  Class 1 (65-256):   NULL
  Class 2 (257-1024): [3] -> NULL
  Class 3 (1025+):    [5] -> NULL

Heap utilization: 67.2%  (internal fragmentation: 8.3%)

$ ./mymalloc --benchmark
================================================================================
                    ALLOCATOR PERFORMANCE BENCHMARK
================================================================================

Workload: Synthetic (mixed sizes 16-4096, 50% alloc / 50% free)
Operations: 1,000,000

                        Throughput      Utilization   Peak Memory
--------------------------------------------------------------------------------
System malloc           847,231 ops/s   89.2%         12.4 MB
My malloc (implicit)    234,567 ops/s   71.3%         18.2 MB
My malloc (explicit)    456,789 ops/s   78.4%         15.1 MB
My malloc (segregated)  678,901 ops/s   84.1%         13.8 MB

Fragmentation Analysis:
  External fragmentation: 12.3% (free blocks too small for requests)
  Internal fragmentation: 5.7% (wasted space within allocated blocks)
  Coalescing efficiency: 94.2% (adjacent frees merged)

$ ./mymalloc --trace workload.trace
================================================================================
                    ALLOCATION TRACE ANALYSIS
================================================================================

Trace: workload.trace (real application: gcc compiling hello.c)
Operations: 47,832 malloc, 45,119 free, 2,713 realloc

Size Distribution:
  0-32 bytes:     ████████████████████ 41.2%
  33-64 bytes:    ████████████         24.1%
  65-128 bytes:   ████████             15.8%
  129-256 bytes:  ████                 9.3%
  257-1024 bytes: ██                   6.1%
  1025+ bytes:    █                    3.5%

Peak heap usage: 2.34 MB
Average allocation lifetime: 847 operations
Longest-lived allocation: 47,831 operations (probably a global)

The Core Question You're Answering

How do you efficiently manage a contiguous region of memory to satisfy arbitrary allocation requests while minimizing fragmentation and maximizing throughput?

Concepts You Must Understand First

  1. Heap Organization - What is brk/sbrk? How does the heap grow? What is the relationship between heap and mmap? CS:APP Ch. 9.9

  2. Block Structure and Metadata - Header/footer design, boundary tags, alignment requirements. CS:APP Ch. 9.9.6

  3. Free List Management - Implicit vs explicit free lists, LIFO vs address-ordered, segregated fits. CS:APP Ch. 9.9.13

  4. Splitting and Coalescing - When to split blocks? Immediate vs deferred coalescing. CS:APP Ch. 9.9.10

  5. Placement Policies - First fit, next fit, best fit tradeoffs. CS:APP Ch. 9.9.7

  6. Alignment Constraints - Why 8 or 16 byte alignment? What does ABI require? CS:APP Ch. 3.9.3

Questions to Guide Your Design

  1. Metadata size: How many bytes of overhead per block? Can you reduce it?
  2. Minimum block size: What is it? Why?
  3. Free list structure: Implicit, explicit, or segregated? Why?
  4. Coalescing strategy: Immediate or deferred? Tradeoffs?
  5. Heap checker: What invariants must always hold?

Thinking Exercise

Consider this allocation sequence:

p1 = malloc(32);   // Request 32 bytes
p2 = malloc(64);   // Request 64 bytes
p3 = malloc(32);   // Request 32 bytes
free(p2);          // Free middle block
p4 = malloc(48);   // Request 48 bytes - where does it go?
free(p1);          // Free first block
free(p3);          // Free last block - what happens?

Draw the heap state after each operation. Assume:

  • 8-byte header with size and allocated bit
  • 16-byte alignment
  • Minimum block size is 32 bytes (including header)

The Interview Questions They'll Ask

  1. "Explain how malloc works internally." - Free list, block headers, splitting/coalescing

  2. "What is memory fragmentation and how would you minimize it?" - Internal vs external, coalescing, placement policies

  3. "Why does malloc need to track block sizes?" - For free() to know how much to release, for coalescing

  4. "How would you detect memory leaks or double-frees?" - Heap checker, guard bytes, tracking allocated blocks

  5. "What's the tradeoff between throughput and utilization?" - Faster policies (first-fit) vs space-efficient (best-fit)

Hints in Layers

Layer 1 - Block Header:

typedef struct {
    size_t size;        // Block size including header (low bit = allocated)
} block_header_t;

#define GET_SIZE(hp)    ((hp)->size & ~0x7)
#define GET_ALLOC(hp)   ((hp)->size & 0x1)
#define PACK(size, alloc) ((size) | (alloc))

Layer 2 - Heap Initialization:

static char *heap_start;
static char *heap_end;

int mm_init(void) {
    heap_start = sbrk(INITIAL_HEAP_SIZE);
    if (heap_start == (void *)-1) return -1;
    heap_end = heap_start + INITIAL_HEAP_SIZE;
    // Create initial free block spanning entire heap
    return 0;
}

Layer 3 - Coalescing:

// With boundary tags, check neighbors:
// Previous block: look at footer just before current header
// Next block: look at header at (current + current_size)
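
The boundary-tag idea can be made concrete with a compilable sketch over a flat byte array; the macro and function names below are illustrative, close to but not identical to the CS:APP reference code, and `bp` points at a block's header rather than its payload:

```c
#include <stddef.h>

/* Each block: [header][payload...][footer], header == footer == PACK(size, alloc).
   Sizes are multiples of the alignment, so the low 3 bits are free for flags. */
#define WSIZE sizeof(size_t)
#define PACK(size, alloc)  ((size) | (alloc))
#define GET(p)             (*(size_t *)(p))
#define PUT(p, val)        (*(size_t *)(p) = (val))
#define SIZE(p)            (GET(p) & ~(size_t)0x7)
#define ALLOC(p)           (GET(p) & 0x1)

/* bp points at a free block's header. Merge with any free neighbors
   and return the (possibly moved) header of the merged block. */
char *coalesce(char *bp) {
    size_t size = SIZE(bp);
    char *next = bp + size;            /* Next block's header */
    char *prev_foot = bp - WSIZE;      /* Previous block's footer */

    if (!ALLOC(next))                  /* Merge forward */
        size += SIZE(next);
    if (!ALLOC(prev_foot)) {           /* Merge backward */
        size += SIZE(prev_foot);
        bp -= SIZE(prev_foot);
    }
    PUT(bp, PACK(size, 0));            /* New header */
    PUT(bp + size - WSIZE, PACK(size, 0)); /* New footer */
    return bp;
}
```

This assumes allocated sentinel blocks bound the heap so the neighbor lookups never run off either end.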

Layer 4 - Heap Checker:

int mm_check(void) {
    // Every block in the free list is marked free
    // No contiguous free blocks (coalescing worked)
    // Every free block is actually in the free list
    // Free-list pointers point to valid heap addresses
    // No allocated blocks overlap
    return 1;  // Nonzero iff every invariant holds
}

Books That Will Help

Topic Book Chapter
Dynamic Memory Allocation Computer Systems: A Programmer's Perspective Ch. 9.9
Allocator Design Patterns C Interfaces and Implementations Ch. 5-6
Memory Management Operating Systems: Three Easy Pieces Ch. 17
Real Allocator Analysis The Linux Programming Interface Ch. 7

Project 15: Robust Unix I/O Toolkit

Attribute Value
Language C (alt: Rust, Zig, C++)
Difficulty Intermediate
Time 1–2 weeks
Chapters 9, 10
Coolness ★★☆☆☆ Practical
Portfolio Value Service & Support

What you'll build: A "Unix file toolbox" that copies/tees/transforms streams while producing clear evidence of buffering behavior and syscall counts.

Why it matters: Chapter 10 is about being fluent with descriptors and the realities of I/O: partial operations, buffering, metadata, and mapping.

Core challenges:

  • Partial reads/writes and robustness (robust I/O)
  • Buffered vs unbuffered trade-offs (performance + correctness)
  • Safe memory-mapped file usage (VM + I/O interaction)

Key concepts to master:

  • Unix I/O (Ch. 10)
  • Robust I/O discipline (Ch. 10)
  • Memory-mapped files (Ch. 9โ€“10)

Prerequisites: Basic C.

Deliverable: Handle large files, pipes, and redirects without hangs or silent truncation.

Implementation hints:

  • Treat I/O as "may return less than requested," always
  • Provide a "trace mode" that logs your I/O loop decisions

Milestones:

  1. You stop assuming a single read/write is enough
  2. You can explain bufferingโ€™s performance impact with evidence
  3. You treat mmap as "VM + file backing," not magic

Real World Outcome

$ ./rio_copy --trace input.bin output.bin
================================================================================
                    ROBUST I/O TOOLKIT - File Copy with Trace
================================================================================

Source: input.bin (104857600 bytes, 100 MB)
Destination: output.bin
Buffer size: 8192 bytes

I/O Trace (showing partial operations):
--------------------------------------------------------------------------------
[   1] read(3, buf, 8192) = 8192      (complete)
[   1] write(4, buf, 8192) = 8192     (complete)
[   2] read(3, buf, 8192) = 8192      (complete)
[   2] write(4, buf, 8192) = 4096     (PARTIAL - pipe buffer full)
[   2] write(4, buf+4096, 4096) = 4096 (retry succeeded)
[   3] read(3, buf, 8192) = 8192      (complete)
...
[12800] read(3, buf, 8192) = 4096     (PARTIAL - near EOF)
[12800] read(3, buf+4096, 4096) = 0   (EOF reached)
[12800] write(4, buf, 4096) = 4096    (final write)

Summary:
  Total read syscalls:  12,801 (12,800 complete + 1 partial)
  Total write syscalls: 12,847 (12,753 complete + 94 partial retries)
  Bytes transferred:    104,857,600
  Elapsed time:         0.847 seconds
  Throughput:           118.2 MB/s

$ ./rio_copy --compare-buffering large_file.bin /dev/null
================================================================================
                    BUFFERING STRATEGY COMPARISON
================================================================================

File: large_file.bin (1073741824 bytes, 1 GB)

Strategy          Buffer    Syscalls    Time      Throughput
--------------------------------------------------------------------------------
Unbuffered        1 byte    1073741824  892.3s    1.1 MB/s
Small buffer      64 bytes  16777216    14.2s     72.1 MB/s
Default (8KB)     8192      131073      1.12s     914.3 MB/s
Large (64KB)      65536     16385       0.98s     1044.7 MB/s
Huge (1MB)        1048576   1025        0.91s     1124.9 MB/s
mmap              N/A       ~3          0.84s     1218.2 MB/s

Analysis:
  Syscall overhead at 1-byte: ~830 ns/call (context switch dominated)
  Optimal buffer size for this system: 64KB-1MB
  mmap advantage: eliminates copy to user buffer

$ ./rio_tee input.txt output1.txt output2.txt --trace
================================================================================
                    TEE WITH SYSCALL TRACE
================================================================================

Reading from: input.txt (fd=3)
Writing to: output1.txt (fd=4), output2.txt (fd=5)

[strace-style output]
read(3, "Hello, World!\nThis is...", 8192) = 847
write(4, "Hello, World!\nThis is...", 847) = 847
write(5, "Hello, World!\nThis is...", 847) = 847
read(3, "", 8192) = 0  (EOF)

$ ./rio_cat --handle-signals file.txt
================================================================================
                    SIGNAL-SAFE I/O DEMONSTRATION
================================================================================

Reading file.txt with EINTR handling...

[Simulating signal interruption]
read(3, buf, 8192) = -1, errno=EINTR (signal received during read)
  -> Automatically retrying...
read(3, buf, 8192) = 8192 (success after retry)

Signal-safe I/O pattern demonstrated:
  Total EINTR occurrences: 3
  All automatically handled by rio_readn()

The Core Question You're Answering

How do you build I/O routines that correctly handle the realities of Unix: partial operations, interrupted system calls, and the performance tradeoffs of buffering?

Concepts You Must Understand First

  1. File Descriptors - What are they? Relationship to open file table and v-node table. CS:APP Ch. 10.1-10.2

  2. Short Counts - Why does read() return less than requested? When is this normal vs error? CS:APP Ch. 10.4

  3. Buffered vs Unbuffered I/O - stdio vs Unix I/O, when to use each, mixing dangers. CS:APP Ch. 10.9

  4. EINTR Handling - What causes interrupted syscalls? How to handle correctly. TLPI Ch. 21.5

  5. Memory-Mapped I/O - mmap() for files, advantages and gotchas. CS:APP Ch. 9.8

Questions to Guide Your Design

  1. What is the contract of rio_readn()? What does it guarantee?
  2. When should you use unbuffered I/O vs buffered?
  3. How do you handle EINTR - retry or propagate?
  4. What happens if you mix printf() with write()? Why?
  5. When is mmap better than read/write?

Thinking Exercise

Consider this scenario:

int fd = open("data.bin", O_RDONLY);
char buf[1000];
int n = read(fd, buf, 1000);  // n = 847 (short count!)
  1. Is this an error? How do you know?
  2. What could cause this? (List at least 4 scenarios)
  3. How would you modify the code to guarantee reading exactly 1000 bytes (or EOF)?
  4. What if fd is a socket instead of a file? Does your answer change?

The Interview Questions They'll Ask

  1. "What does it mean when read() returns less than you asked for?" - Short count, normal for pipes/sockets/signals, check errno for errors

  2. "Why is printf() not safe to use after fork() before exec()?" - Buffered I/O, buffer might be copied, double output

  3. "How would you efficiently copy a large file?" - read/write with large buffer, or mmap + memcpy, or sendfile()

  4. "Explain the difference between Unix I/O and Standard I/O." - Buffering, portability, performance tradeoffs

  5. "What happens when you read() from a pipe with no data?" - Blocks until data or all writers close (EOF)

Hints in Layers

Layer 1 - Rio Readn:

ssize_t rio_readn(int fd, void *usrbuf, size_t n) {
    size_t nleft = n;
    char *bufp = usrbuf;

    while (nleft > 0) {
        ssize_t nread = read(fd, bufp, nleft);
        if (nread < 0) {
            if (errno == EINTR) continue;  // Retry on interrupt
            return -1;                      // Error
        } else if (nread == 0) {
            break;                          // EOF
        }
        nleft -= nread;
        bufp += nread;
    }
    return n - nleft;  // Bytes actually read
}

Layer 2 - Buffered Reader Structure:

typedef struct {
    int fd;
    int cnt;        // Unread bytes in buffer
    char *bufptr;   // Next unread byte
    char buf[8192]; // Internal buffer
} rio_t;
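
The refill logic behind that structure might look like the sketch below, written in the spirit of CS:APP's internal `rio_read` (the `rio_t` definition is repeated so the snippet stands alone): when the buffer is empty it issues one `read()` syscall, then all subsequent requests are served from memory.

```c
#include <unistd.h>
#include <string.h>
#include <errno.h>

typedef struct {            /* Same shape as the rio_t above */
    int fd;
    int cnt;                /* Unread bytes in buffer */
    char *bufptr;           /* Next unread byte */
    char buf[8192];         /* Internal buffer */
} rio_t;

/* Refill the buffer with one read() when empty, then copy out of it. */
static ssize_t rio_read(rio_t *rp, char *usrbuf, size_t n) {
    while (rp->cnt <= 0) {                    /* Buffer empty: refill */
        ssize_t nread = read(rp->fd, rp->buf, sizeof(rp->buf));
        if (nread < 0) {
            if (errno != EINTR) return -1;    /* Real error; EINTR retries */
        } else if (nread == 0) {
            return 0;                         /* EOF */
        } else {
            rp->cnt = (int)nread;
            rp->bufptr = rp->buf;             /* Reset to start of buffer */
        }
    }
    size_t cnt = (n < (size_t)rp->cnt) ? n : (size_t)rp->cnt;
    memcpy(usrbuf, rp->bufptr, cnt);          /* Serve from memory, no syscall */
    rp->bufptr += cnt;
    rp->cnt   -= (int)cnt;
    return (ssize_t)cnt;
}
```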

Layer 3 - mmap for File I/O:

void *src = mmap(NULL, file_size, PROT_READ, MAP_PRIVATE, fd, 0);
// Now src points to file contents - no read() needed!
// But: must handle SIGBUS if file truncated while mapped

Books That Will Help

Topic Book Chapter
Unix I/O Fundamentals Computer Systems: A Programmer's Perspective Ch. 10
Robust I/O Wrappers Computer Systems: A Programmer's Perspective Ch. 10.5
File I/O in Depth The Linux Programming Interface Ch. 4-5
Memory Mapping Advanced Programming in the UNIX Environment Ch. 14.8
Standard I/O Library Advanced Programming in the UNIX Environment Ch. 5

Project 16: Concurrency Workbench

Attribute Value
Language C (alt: Rust, Zig, Go)
Difficulty Expert
Time 2–3 weeks
Chapters 12
Coolness ★★★★☆ Hardcore Tech Flex
Portfolio Value Micro-SaaS/Pro Tool

What you'll build: A server framework that can switch between concurrency models (iterative, process-per-request, thread-per-request, thread pool), with a bounded-buffer work queue and stress-test harness.

Why it matters: Chapter 12 is about choosing the right concurrency model and demonstrating correctness in the presence of potential races and deadlocks.

Core challenges:

  • Correct producer/consumer queue design (synchronization)
  • Avoiding deadlocks and starvation (concurrency hazards)
  • Designing stress tests that actually expose races (verification discipline)

Key concepts to master:

  • Threads and synchronization (Ch. 12)
  • Semaphores/condition-variable patterns (Ch. 12)
  • Concurrency correctness discipline (OSTEP reference)

Prerequisites: Projects 11 and 15 recommended.

Deliverable: Demonstrate throughput gains by model, and explain every bug as a race/deadlock pattern.

Implementation hints:

  • Require "debug mode" invariants: queue length bounds, lock ordering rules
  • Log enough to prove "what happened" without relying on luck

Milestones:

  1. You can reproduce and fix at least one real race condition
  2. Your thread pool remains stable under stress (no deadlocks)
  3. You can justify which concurrency model fits which workload

Real World Outcome

$ ./concbench --mode=compare --requests=10000
================================================================================
                    CONCURRENCY MODEL COMPARISON
================================================================================

Workload: Echo server, 10000 requests, 100 concurrent clients
Request size: 1KB, think time: 0ms (stress test)

Model                  Throughput    Latency(p50)  Latency(p99)  Memory
--------------------------------------------------------------------------------
Iterative              1,247 req/s   0.8 ms        12.3 ms       2 MB
Process-per-request    3,891 req/s   2.1 ms        45.2 ms       847 MB
Thread-per-request     12,456 req/s  0.4 ms        8.7 ms        124 MB
Thread pool (8)        34,567 req/s  0.2 ms        3.2 ms        18 MB
Thread pool (32)       31,234 req/s  0.3 ms        4.1 ms        34 MB
Event-driven (epoll)   45,678 req/s  0.1 ms        2.1 ms        8 MB

Analysis:
  - Iterative: Simple but serializes all requests
  - Process-per-request: Memory explosion from fork overhead
  - Thread-per-request: Good throughput but thread creation overhead
  - Thread pool: Best balance of throughput and resource usage
  - Event-driven: Highest throughput, lowest memory, most complex

$ ./concbench --mode=race-demo
================================================================================
                    RACE CONDITION DEMONSTRATION
================================================================================

Running counter increment test (1000000 ops, 8 threads)...

WITHOUT synchronization:
  Expected final count: 1000000
  Actual final count:   847293    <-- RACE CONDITION!
  Lost updates:         152707 (15.3%)

  Race detected! Example interleaving:
    Thread 1: load counter (value: 42)
    Thread 2: load counter (value: 42)
    Thread 1: increment -> 43
    Thread 2: increment -> 43  <-- Uses stale value!
    Thread 1: store counter (43)
    Thread 2: store counter (43)  <-- Overwrites!

WITH mutex:
  Expected: 1000000, Actual: 1000000 (correct!)
  Overhead: 2.3x slower than racy version

WITH atomic operations:
  Expected: 1000000, Actual: 1000000 (correct!)
  Overhead: 1.4x slower than racy version

$ ./concbench --mode=deadlock-demo
================================================================================
                    DEADLOCK DEMONSTRATION
================================================================================

Scenario: Transfer between two accounts (A and B)
Thread 1: transfer A -> B (locks A, then B)
Thread 2: transfer B -> A (locks B, then A)

WITHOUT lock ordering:
  Running 10000 transfers...
  [DEADLOCK DETECTED at iteration 47!]

  Thread states:
    Thread 1: holding lock_A, waiting for lock_B
    Thread 2: holding lock_B, waiting for lock_A

  Cycle detected: T1 -> lock_B -> T2 -> lock_A -> T1

WITH consistent lock ordering (always lock lower address first):
  Running 10000 transfers...
  Completed successfully! No deadlocks.

$ ./concbench --mode=producer-consumer --producers=4 --consumers=4 --queue-size=16
================================================================================
                    BOUNDED BUFFER PRODUCER/CONSUMER
================================================================================

Configuration: 4 producers, 4 consumers, queue capacity: 16
Items to produce: 100000

Running...
  [Producer 0] produced 25000 items (blocked 1847 times on full queue)
  [Producer 1] produced 25000 items (blocked 1923 times)
  [Consumer 0] consumed 25000 items (blocked 2134 times on empty queue)
  [Consumer 3] consumed 25000 items (blocked 2089 times)

Summary:
  All 100000 items produced and consumed correctly
  No items lost or duplicated
  Queue utilization: 78.3% (good balance)
  Average wait time: 0.12 ms

The Core Question You're Answering

How do you write concurrent programs that are both correct (no races, no deadlocks) and performant, and how do you choose the right concurrency model for your workload?

Concepts You Must Understand First

  1. Threads vs Processes - Shared state implications, creation overhead, isolation tradeoffs. CS:APP Ch. 12.3

  2. Critical Sections and Mutual Exclusion - What needs protecting? What are the primitives? CS:APP Ch. 12.4

  3. Semaphores - Counting vs binary, wait/signal semantics, producer-consumer pattern. CS:APP Ch. 12.5

  4. Deadlock - Four conditions (mutual exclusion, hold-and-wait, no preemption, circular wait), prevention strategies. CS:APP Ch. 12.7.3

  5. Thread Safety - Reentrant functions, thread-local storage, what makes code unsafe. CS:APP Ch. 12.7

  6. Concurrency Models - Process-based, thread-based, event-driven, their tradeoffs. CS:APP Ch. 12.1-12.2

Questions to Guide Your Design

  1. What shared state exists? How will you protect it?
  2. What is your lock ordering discipline? Document it!
  3. How will you detect deadlocks during development?
  4. What invariants must hold before/after each critical section?
  5. How will you stress test for races?

Thinking Exercise

Consider this producer-consumer scenario:

int buffer[N];
int count = 0;  // Items in buffer

void producer() {
    while (1) {
        int item = produce();
        while (count == N) ;  // Spin while full
        buffer[count++] = item;  // BUG: multiple bugs here!
    }
}

void consumer() {
    while (1) {
        while (count == 0) ;  // Spin while empty
        int item = buffer[--count];  // BUG!
        consume(item);
    }
}
  1. Identify at least 3 bugs in this code
  2. What happens if two producers run simultaneously?
  3. What happens if producer and consumer race on count?
  4. Rewrite using a mutex and condition variables
  5. Rewrite using semaphores (simpler!)

The Interview Questions They'll Ask

  1. "What is a race condition and how do you prevent it?" - Unordered access to shared state, use locks/atomics

  2. "Explain deadlock and how to avoid it." - Circular wait on locks, use lock ordering or try-lock

  3. "When would you use a thread pool instead of thread-per-request?" - Thread creation overhead, resource limits, predictable memory

  4. "What's the difference between a mutex and a semaphore?" - Mutex for mutual exclusion (1 holder), semaphore for counting resources

  5. "How do you debug a race condition?" - Thread sanitizer, stress testing, code review for unprotected shared state

  6. "What makes a function thread-safe?" - No shared state, or proper synchronization, reentrant

Hints in Layers

Layer 1 - Basic Mutex Usage:

pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void safe_increment(int *counter) {
    pthread_mutex_lock(&lock);
    (*counter)++;
    pthread_mutex_unlock(&lock);
}

Layer 2 - Semaphore-based Producer/Consumer:

sem_t slots;  // Empty slots (init to N)
sem_t items;  // Items available (init to 0)
sem_t mutex;  // Buffer access (init to 1)

void producer() {
    sem_wait(&slots);  // Wait for empty slot
    sem_wait(&mutex);
    // Add to buffer
    sem_post(&mutex);
    sem_post(&items);  // Signal item available
}
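
The consumer is the mirror image of the producer above. Here is a self-contained version with an actual ring buffer; the names (`pc_init`, `produce`, `consume`) and the fixed capacity are illustrative assumptions:

```c
#include <semaphore.h>

#define NSLOTS 16
static int pcbuf[NSLOTS];
static int in_pos = 0, out_pos = 0;
static sem_t slots, items, mutex;

void pc_init(void) {
    sem_init(&slots, 0, NSLOTS);  /* NSLOTS empty slots */
    sem_init(&items, 0, 0);       /* No items yet */
    sem_init(&mutex, 0, 1);       /* Binary semaphore guarding the buffer */
}

void produce(int item) {
    sem_wait(&slots);             /* Block while buffer is full */
    sem_wait(&mutex);
    pcbuf[in_pos] = item;
    in_pos = (in_pos + 1) % NSLOTS;
    sem_post(&mutex);
    sem_post(&items);             /* One more item available */
}

int consume(void) {
    sem_wait(&items);             /* Block while buffer is empty */
    sem_wait(&mutex);
    int item = pcbuf[out_pos];
    out_pos = (out_pos + 1) % NSLOTS;
    sem_post(&mutex);
    sem_post(&slots);             /* One more empty slot */
    return item;
}
```

Note the ordering: each side waits on its counting semaphore before taking the mutex; reversing that order can deadlock with a full or empty buffer.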

Layer 3 - Thread Pool Pattern:

typedef struct {
    pthread_t *threads;
    task_queue_t queue;
    pthread_mutex_t lock;
    pthread_cond_t notify;
    int shutdown;
} threadpool_t;
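
A worker loop for a pool like this might look as follows; the fixed-size ring queue, field names, and `pool_submit` (which assumes the queue is never full) are illustrative simplifications:

```c
#include <pthread.h>

typedef struct { void (*fn)(void *); void *arg; } task_t;

#define QCAP 64
typedef struct {
    task_t tasks[QCAP];           /* Ring buffer of pending tasks */
    int head, tail, len;
    pthread_mutex_t lock;
    pthread_cond_t notify;
    int shutdown;
} pool_t;

void *worker(void *arg) {
    pool_t *p = arg;
    for (;;) {
        pthread_mutex_lock(&p->lock);
        while (p->len == 0 && !p->shutdown)
            pthread_cond_wait(&p->notify, &p->lock);  /* Drops lock while asleep */
        if (p->len == 0 && p->shutdown) {             /* Drain queue before exit */
            pthread_mutex_unlock(&p->lock);
            return NULL;
        }
        task_t t = p->tasks[p->head];
        p->head = (p->head + 1) % QCAP;
        p->len--;
        pthread_mutex_unlock(&p->lock);
        t.fn(t.arg);                                  /* Run task outside the lock */
    }
}

/* Assumes the queue is not full; a real pool would block or reject here. */
void pool_submit(pool_t *p, void (*fn)(void *), void *arg) {
    pthread_mutex_lock(&p->lock);
    p->tasks[p->tail] = (task_t){fn, arg};
    p->tail = (p->tail + 1) % QCAP;
    p->len++;
    pthread_cond_signal(&p->notify);
    pthread_mutex_unlock(&p->lock);
}

/* Tiny demo task: only ever run by a single worker thread below. */
static int tasks_done = 0;
static void demo_task(void *arg) { (void)arg; tasks_done++; }
```

Running tasks outside the lock is the key design choice: it keeps the critical section short so submitters and other workers are not serialized behind a long-running task.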

Layer 4 - Deadlock Prevention (Lock Ordering):

// ALWAYS acquire locks in consistent order (e.g., by address)
void transfer(account_t *from, account_t *to, int amount) {
    account_t *first = (from < to) ? from : to;
    account_t *second = (from < to) ? to : from;

    pthread_mutex_lock(&first->lock);
    pthread_mutex_lock(&second->lock);
    // Transfer...
    pthread_mutex_unlock(&second->lock);
    pthread_mutex_unlock(&first->lock);
}

Books That Will Help

Topic Book Chapter
Thread Programming Computer Systems: A Programmer's Perspective Ch. 12
Synchronization Patterns Operating Systems: Three Easy Pieces Ch. 26-32
POSIX Threads The Linux Programming Interface Ch. 29-33
Advanced Threading Advanced Programming in the UNIX Environment Ch. 11-12
Concurrency Patterns Unix Network Programming Vol 1 Ch. 26-30

Phase 5: Capstone


Project 17: CS:APP Capstone — Secure, Observable, High-Performance Proxy

Attribute Value
Language C (alt: Rust, Zig, Go)
Difficulty Expert
Time 2–3 months
Chapters All
Coolness ★★★★★ Pure Magic
Portfolio Value Open Core Infrastructure

What you'll build: A production-minded proxy that includes caching, configurable concurrency, robust error handling, performance instrumentation, and security hardening against common memory-safety failures.

Why it matters: It forces you to use every major idea: representation, machine-level understanding, caching/locality, linking/loading, ECF, VM, Unix I/O, networking, and concurrency.

Core challenges:

  • Correctness under partial I/O and malformed inputs (robust I/O + defensive parsing)
  • High throughput without races/deadlocks (synchronization)
  • Measurable performance wins via locality and reduced syscalls (Ch. 5–6)
  • Debuggability via symbols, interposition, and structured logs (Ch. 7)
  • Hardening and post-mortems for memory errors (Ch. 3, 8, 9)

Key concepts to master:

  • Robust systems programming discipline (Appendix)
  • Concurrency design patterns (Ch. 12)
  • Caching and locality (Ch. 6)
  • VM and mapping (Ch. 9)
  • Network programming (Ch. 11)

Prerequisites: Complete Projects 1, 2, 4, 12, 15, and 16 (or equivalents).

Deliverable: Route real browser traffic through your proxy, observe metrics, reproduce failures, and explain behavior/performance in CS:APP terms.

Implementation hints:

  • Define "done" as a checklist: correctness, load test results, metrics present, and at least one documented post-mortem of a bug you introduced and fixed

Milestones:

  1. Correct proxying + robust I/O under adverse conditions
  2. Concurrency scales with evidence and no correctness regressions
  3. You debug performance and correctness using only system evidence (symbols, traces, logs, memory maps)

Real World Outcome

$ ./proxy --port=8080 --threads=8 --cache-size=64MB
================================================================================
              CS:APP CAPSTONE PROXY - Production Mode
================================================================================
Configuration: port=8080, workers=8, cache=64MB (LRU)
[14:30:00] Proxy started, listening on port 8080

$ curl -x localhost:8080 http://example.com/page.html
[14:30:05] GET http://example.com/page.html -> Cache MISS -> 200 OK (847ms)

$ ./proxy --metrics-report
PERFORMANCE: 12,456 requests, 207.6 req/min, 1.23 GB transferred
CACHE: 67.3% hit rate, 52.3 MB used, 234 evictions
LATENCY: p50=23ms, p90=89ms, p99=234ms
CONCURRENCY: 47 active, 312 peak, 73.2% utilization

$ ./loadtest --target=localhost:8080 --concurrent=100 --duration=60s
Results: 45,678 requests, 761.3 req/s, 99.86% success
Latency: min=2ms, max=1247ms, mean=34ms

$ firefox --proxy=localhost:8080
[Browsing session through your proxy - real traffic!]

The Core Question You're Answering

How do you build a production-quality networked system integrating robust I/O, concurrency, caching, and security - debugging with systems-level tools?

Concepts You Must Understand First

This capstone integrates ALL CS:APP concepts:

  • Ch. 2: Binary protocol parsing, endianness
  • Ch. 3: Crash debugging, security
  • Ch. 5-6: Performance, cache-friendly structures
  • Ch. 7: Interposition for debugging
  • Ch. 8: Signal handling, graceful shutdown
  • Ch. 9: mmap for cache
  • Ch. 10: Robust I/O
  • Ch. 11: Sockets, HTTP, DNS
  • Ch. 12: Thread pools, synchronization

Questions to Guide Your Design

  1. Thread pool vs event-driven vs hybrid?
  2. Cache data structure and eviction policy?
  3. Handling slow clients, timeouts, keep-alive?
  4. Malformed HTTP request handling?
  5. What happens when origin is down?
  6. Metrics without hurting performance?
  7. Buffer overflow prevention?

Thinking Exercise

Design the cache on paper:

  1. Data structure for storage? (Hash table with what key?)
  2. Concurrent access? (Reader-writer lock? Per-bucket?)
  3. LRU eviction implementation?
  4. What if entry read while being evicted?
  5. Content larger than memory?

The Interview Questions They'll Ask

  1. "Walk through a request from accept() to close()"
  2. "How did you handle concurrent cache access?"
  3. "Hardest bug you encountered?"
  4. "How would you scale to 10x traffic?"
  5. "Security vulnerabilities considered?"
  6. "Explain your cache eviction policy"

Hints in Layers

Layer 1: Basic proxy loop with Accept/handle_request/Close

Layer 2: HTTP parsing with rio_readlineb and sscanf

Layer 3: Thread pool with worker dequeue pattern

Layer 4: Cache with pthread_rwlock_t and LRU doubly-linked list

Layer 5: Graceful shutdown via volatile sig_atomic_t flag

Books That Will Help

Topic Book Chapter
Network Programming CS:APP Ch. 11
Robust I/O CS:APP Ch. 10
Concurrency CS:APP Ch. 12
Sockets API Unix Network Programming Vol 1 Ch. 1-8
High-Performance Servers Unix Network Programming Vol 1 Ch. 26-30
TCP/IP TCP/IP Illustrated Vol 1 Ch. 12-24
Systems Design The Linux Programming Interface Ch. 56-63

Phase 6: Beyond CS:APP (Advanced Extensions)

These projects extend beyond the core CS:APP curriculum, building on everything you've learned.


Project 18: ELF Linker and Loader

Attribute Value
Language C (alt: Rust)
Difficulty Expert
Time 2โ€“3 weeks
Chapters 7

What you'll build: A tiny static linker (myld) for a constrained subset of ELF64 that parses relocatable objects, resolves symbols, applies relocations, and emits a merged output.

Why it matters: "Undefined reference" stops being mysterious and relocation becomes something you can explain byte-for-byte.

Core challenges:

  • Parsing ELF64 headers, section tables, symbols, and relocations
  • Implementing symbol resolution across multiple .o inputs (strong/weak rules)
  • Implementing x86-64 relocation types end-to-end

Real World Outcome

When your linker works, you'll see output like this:

$ cat main.c
extern int global_counter;
extern void increment(void);

int main(void) {
    increment();
    return global_counter;
}

$ cat lib.c
int global_counter = 0;

void increment(void) {
    global_counter++;
}

$ gcc -c main.c lib.c

$ ./myld main.o lib.o -o program
================================================================================
                    MYLD - Minimal ELF64 Static Linker
================================================================================

[PHASE 1] Reading input files...
  main.o: 6 sections, 8 symbols, 2 relocations
    .text:     40 bytes
    .data:     0 bytes
    .rodata:   0 bytes
  lib.o: 5 sections, 4 symbols, 2 relocations
    .text:     32 bytes
    .data:     4 bytes

[PHASE 2] Symbol resolution...
  Symbol Table (8 unique symbols):
  +-----------------+--------+----------+---------+------------+
  | Name            | Type   | Bind     | Section | Resolution |
  +-----------------+--------+----------+---------+------------+
  | main            | FUNC   | GLOBAL   | .text   | DEFINED    |
  | increment       | FUNC   | GLOBAL   | .text   | lib.o      |
  | global_counter  | OBJECT | GLOBAL   | .data   | lib.o      |
  +-----------------+--------+----------+---------+------------+

  Undefined symbols resolved: 2
  Strong symbol conflicts: 0

[PHASE 3] Section merging...
  .text:   0x401000  (72 bytes from 2 objects)
  .data:   0x402000  (4 bytes from 1 object)
  .rodata: 0x403000  (0 bytes)
  .bss:    0x404000  (0 bytes)

[PHASE 4] Relocation processing...
  Applying 4 relocations:
  +--------------------+------------+------------------+-------------+--------+
  | Type               | Offset     | Symbol           | Addend      | Result |
  +--------------------+------------+------------------+-------------+--------+
  | R_X86_64_PLT32     | 0x401005   | increment        | -4          | OK     |
  | R_X86_64_PC32      | 0x40100b   | global_counter   | -4          | OK     |
  | R_X86_64_PC32      | lib.o:0x06 | global_counter   | -4          | OK     |
  | R_X86_64_PC32      | lib.o:0x0f | global_counter   | -4          | OK     |
  +--------------------+------------+------------------+-------------+--------+

  Relocation calculation for R_X86_64_PLT32:
    S (symbol addr) = 0x401040 (increment)
    P (patch site)  = 0x401005
    A (addend)      = -4
    Result: S + A - P = 0x401040 + (-4) - 0x401005 = 0x37
    Bytes written: 37 00 00 00

[PHASE 5] Output generation...
  Writing ELF header at offset 0
  Writing program headers (3 segments)
  Writing section data
  Writing section headers

[OUTPUT] program: 8752 bytes written
  Entry point: 0x401000
  Segments: 3 (LOAD, LOAD, NOTE)
  Sections: 7

$ ./program; echo "Exit: $?"
Exit: 1

$ readelf -h program | grep Entry
  Entry point address:               0x401000

$ objdump -d program | head -20
program:     file format elf64-x86-64

Disassembly of section .text:

0000000000401000 <main>:
  401000:       55                      push   %rbp
  401001:       48 89 e5                mov    %rsp,%rbp
  401004:       e8 37 00 00 00          call   401040 <increment>
  401009:       8b 05 f1 0f 00 00       mov    0xff1(%rip),%eax
  40100f:       5d                      pop    %rbp
  401010:       c3                      ret

0000000000401040 <increment>:
  401040:       55                      push   %rbp
  401041:       48 89 e5                mov    %rsp,%rbp
  401044:       8b 05 b6 0f 00 00       mov    0xfb6(%rip),%eax
  40104a:       83 c0 01                add    $0x1,%eax
  40104d:       89 05 ad 0f 00 00       mov    %eax,0xfad(%rip)
  401053:       5d                      pop    %rbp
  401054:       c3                      ret

The Core Question You're Answering

"How do separate compilation units become a single executable, and what exactly happens when you see 'undefined reference to foo'?"

The linker is the final stage that transforms your mental model of separate .c files into reality. You'll understand why symbols have "linkage," why static functions can't be called from other files, and exactly which bytes get patched during relocation.

Concepts You Must Understand First

  1. ELF file format (CS:APP 7.4) - Headers, sections, segments
  2. Symbol tables and types (CS:APP 7.5) - Global/local, strong/weak, defined/undefined
  3. Relocation entries (CS:APP 7.7) - R_X86_64_PC32, R_X86_64_PLT32, R_X86_64_32S
  4. x86-64 addressing modes (CS:APP 3.4) - RIP-relative addressing
  5. Object file sections (CS:APP 7.4) - .text, .data, .bss, .rodata, .symtab, .rela.*
  6. Two-pass linking (CS:APP 7.6) - Symbol resolution then relocation

Questions to Guide Your Design

  1. How will you parse ELF headers? Read the structs from <elf.h> or define your own?
  2. What data structures hold symbol information? Hash table? Sorted array? How do you handle duplicates?
  3. How do you decide section layout? What addresses do merged sections get? How do you handle alignment?
  4. How will you track relocations? Each relocation needs: source object, offset, type, target symbol, addend
  5. What's your relocation calculation? For R_X86_64_PC32: S + A - P - where do S, A, P come from?
  6. How do you handle weak vs strong symbols? What if two objects both define foo?
  7. What output format will you produce? Minimal ELF64 executable? How many program headers?
  8. How will you test correctness? Compare against ld output? Run the executable?

Thinking Exercise

Before writing any code, trace through this relocation by hand:

// In main.o at offset 0x15 in .text section:
// e8 00 00 00 00    call <helper>    ; R_X86_64_PLT32, addend = -4

// Symbol 'helper' will be placed at address 0x401080
// The 'call' instruction is at address 0x401015 in final executable

// Relocation formula for R_X86_64_PLT32: S + A - P
// S = symbol address = 0x401080
// A = addend = -4
// P = patch location = 0x401015 + 1 = 0x401016 (byte after opcode)

// Calculate the 4-byte value to write:
// 0x401080 + (-4) - 0x401016 = 0x401080 - 4 - 0x401016 = 0x66

// The patched instruction becomes:
// e8 66 00 00 00    call 0x401080

// Verify: When CPU executes at 0x401015:
// - Reads opcode e8 (relative call)
// - Reads 4-byte displacement: 0x00000066
// - Calculates target: 0x40101a (next instruction) + 0x66 = 0x401080

Now trace this global variable access:

// In main.o at offset 0x20:
// 8b 05 00 00 00 00    mov 0x0(%rip),%eax  ; R_X86_64_PC32 to 'counter'

// 'counter' is in .data at 0x402000
// This instruction lands at 0x401020 in final executable
// The relocation patches bytes at 0x401022 (after opcode + ModR/M)

// Calculate: S + A - P
// S = 0x402000, A = -4, P = 0x401022
// Result = 0x402000 - 4 - 0x401022 = 0xfda

The Interview Questions They'll Ask

  1. "What's the difference between a section and a segment?"
    • Sections are link-time view (.text, .data, etc.), segments are load-time view (LOAD, DYNAMIC)
    • Linker works with sections, loader works with segments
    • Multiple sections can be combined into one segment
  2. "Explain R_X86_64_PC32 vs R_X86_64_32S"
    • PC32: PC-relative, 32-bit signed displacement, S + A - P
    • 32S: Absolute address, sign-extended to 64 bits, S + A
    • PC32 is position-independent, 32S requires known absolute address
  3. "Why does the linker need two passes?"
    • Pass 1: Collect all symbols, resolve references (need to know all symbols before patching)
    • Pass 2: Apply relocations (now we know every symbol's final address)
    • Single pass would require backpatching or forward references
  4. "What happens with multiple definitions of a strong symbol?"
    • Linker error: "multiple definition of 'foo'"
    • Each strong symbol can only be defined once across all objects
    • Weak symbols are overridden by strong ones
  5. "How does the linker handle common symbols (uninitialized globals)?"
    • int foo; in multiple files creates COMMON symbols
    • Linker merges them, taking the largest size
    • Final location is in .bss section
  6. "What's the purpose of the .rela.text section?"
    • Contains relocation entries for .text section
    • Each entry: offset, type, symbol index, addend
    • Tells linker which bytes to patch and how

Hints in Layers

Layer 1 - ELF Parsing Foundation:

#include <elf.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

typedef struct {
    const char *filename;
    uint8_t *data;
    size_t size;
    Elf64_Ehdr *ehdr;
    Elf64_Shdr *shdrs;
    const char *shstrtab;
} ElfFile;

int load_elf(const char *path, ElfFile *ef) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    struct stat st;
    if (fstat(fd, &st) < 0) { close(fd); return -1; }
    ef->filename = path;
    ef->size = st.st_size;
    ef->data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping stays valid after close
    if (ef->data == MAP_FAILED) return -1;
    ef->ehdr = (Elf64_Ehdr *)ef->data;
    ef->shdrs = (Elf64_Shdr *)(ef->data + ef->ehdr->e_shoff);
    ef->shstrtab = (const char *)(ef->data + ef->shdrs[ef->ehdr->e_shstrndx].sh_offset);
    return 0;
}

Layer 2 - Symbol Table Extraction:

typedef struct {
    char name[256];
    uint64_t value;
    uint8_t bind;    // STB_LOCAL, STB_GLOBAL, STB_WEAK
    uint8_t type;    // STT_FUNC, STT_OBJECT
    uint16_t shndx;  // Section index or SHN_UNDEF
    ElfFile *source;
} Symbol;

void extract_symbols(ElfFile *ef, Symbol **out, int *count) {
    for (int i = 0; i < ef->ehdr->e_shnum; i++) {
        if (ef->shdrs[i].sh_type != SHT_SYMTAB) continue;
        Elf64_Sym *syms = (Elf64_Sym *)(ef->data + ef->shdrs[i].sh_offset);
        int nsyms = ef->shdrs[i].sh_size / sizeof(Elf64_Sym);
        // Symbol names live in the string table that sh_link points to
        const char *strtab =
            (const char *)(ef->data + ef->shdrs[ef->shdrs[i].sh_link].sh_offset);
        // Process each symbol (name = strtab + syms[j].st_name)...
    }
}

Layer 3 - Symbol Resolution:

int resolve_symbols(Symbol *all_syms, int nsyms, SymEntry *global_tab) {
    for (int i = 0; i < nsyms; i++) {
        Symbol *sym = &all_syms[i];
        if (sym->bind == STB_LOCAL) continue;
        SymEntry *existing = hash_lookup(global_tab, sym->name);
        // Handle undefined, defined, strong/weak conflicts...
    }
    return 0;
}

Layer 4 - Relocation Application:

void apply_relocations(ElfFile *ef, uint8_t *output, uint64_t text_base) {
    for (int i = 0; i < ef->ehdr->e_shnum; i++) {
        if (ef->shdrs[i].sh_type != SHT_RELA) continue;
        Elf64_Rela *relas = (Elf64_Rela *)(ef->data + ef->shdrs[i].sh_offset);
        // For each relocation: calculate S + A - P and patch
    }
}
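
The elided inner loop might look like this. It is a sketch under two assumptions: pass 1 has already resolved every symbol to a final virtual address (the hypothetical sym_addrs array, indexed by symbol-table index), and r_offset is relative to a single object's .text, already copied into out_text. A real linker also adds per-object section offsets; that bookkeeping is elided.

```c
#include <elf.h>
#include <stdint.h>
#include <string.h>

void apply_rela(const Elf64_Rela *relas, int n, const uint64_t *sym_addrs,
                uint8_t *out_text, uint64_t text_vaddr)
{
    for (int i = 0; i < n; i++) {
        uint64_t S = sym_addrs[ELF64_R_SYM(relas[i].r_info)];
        int64_t  A = relas[i].r_addend;
        uint64_t P = text_vaddr + relas[i].r_offset;   // patch site
        int32_t  val;

        switch (ELF64_R_TYPE(relas[i].r_info)) {
        case R_X86_64_PC32:
        case R_X86_64_PLT32:                 // same math for a static link
            val = (int32_t)(S + A - P);      // PC-relative displacement
            break;
        case R_X86_64_32S:
            val = (int32_t)(S + A);          // absolute, sign-extended
            break;
        default:
            continue;                        // unsupported type: skip
        }
        memcpy(out_text + relas[i].r_offset, &val, 4);  // patch 4 bytes
    }
}
```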

Books That Will Help

Topic Book Chapter
Linking Overview CS:APP Ch. 7
ELF Format Details Practical Binary Analysis Ch. 2
Relocation Types CS:APP 7.7
Symbol Resolution CS:APP 7.6
ELF Specification Low-Level Programming Ch. 4
Dynamic Linking CS:APP 7.10-7.12

Common Pitfalls & Debugging

Problem 1: Relocation values are wrong

# Symptom: Segfault or jump to wrong address
# Fix: P = address of the 4-byte displacement, not instruction start
$ diff <(objdump -d reference) <(objdump -d program)

Problem 2: Section addresses overlap

# Fix: Page-align sections
text_vaddr = 0x401000;
data_vaddr = (text_vaddr + text_size + 0xfff) & ~0xfff;

Problem 3: Missing symbols from libc

# For minimal linker: restrict to self-contained code with syscalls

Problem 4: ELF validation failures

$ readelf -h program  # Compare with readelf -h /bin/ls

Project 19: Virtual Memory Simulator

Attribute Value
Language C (alt: C++, Rust)
Difficulty Advanced
Time ~2 weeks
Chapters 9

What you'll build: A CLI (vmsim) that simulates page tables, TLB, and page replacement policies on real address traces.

Why it matters: Makes virtual memory visible: you'll see page faults happen, measure TLB hit rates, and compare replacement algorithms.

Core challenges:

  • Translating virtual addresses using multi-level page tables
  • Implementing FIFO/LRU/Clock replacement policies
  • Quantifying TLB hit rates vs. page-table-walk costs

Real World Outcome

When your simulator works, you'll see output like this:

$ cat trace.txt
R 0x00007fff5fbff8a0
W 0x00007fff5fbff8a8
R 0x0000000000400540
R 0x0000000000400544
W 0x00007fff5fbff890
R 0x00007fff5fbff8a0
R 0x0000000000601040
W 0x0000000000601048

$ ./vmsim --trace trace.txt --frames 4 --policy lru --levels 4 --tlb-size 16
================================================================================
                    VMSIM - Virtual Memory Simulator
================================================================================

Configuration:
  Address bits:      48 (virtual), 36 (physical)
  Page size:         4 KB (12 offset bits)
  Page table levels: 4 (9 + 9 + 9 + 9 bits)
  Physical frames:   4
  TLB entries:       16
  Replacement:       LRU

Processing 8 memory accesses...

Access #1: READ  0x00007fff5fbff8a0
  VPN: 0x7fff5fbff  Offset: 0x8a0
  TLB: MISS
  Page Walk: L4[0x0ff] -> L3[0x1fd] -> L2[0x0fd] -> L1[0x1ff]
  Page Table: MISS (page fault)
  [PAGE FAULT] Loading VPN 0x7fff5fbff into frame 0
  Physical: 0x0000008a0
  +-------+----------+-------+--------+-------+
  | Frame | VPN      | Valid | Dirty  | LRU   |
  +-------+----------+-------+--------+-------+
  | 0     | 7fff5fbff| 1     | 0      | 0     |
  | 1     | -        | 0     | -      | -     |
  | 2     | -        | 0     | -      | -     |
  | 3     | -        | 0     | -      | -     |
  +-------+----------+-------+--------+-------+

Access #2: WRITE 0x00007fff5fbff8a8
  VPN: 0x7fff5fbff  Offset: 0x8a8
  TLB: HIT (frame 0)
  Physical: 0x0000008a8
  [DIRTY] Marking frame 0 as dirty

Access #3: READ  0x0000000000400540
  VPN: 0x000000400  Offset: 0x540
  TLB: MISS
  Page Table: MISS (page fault)
  [PAGE FAULT] Loading VPN 0x000000400 into frame 1
  Physical: 0x000001540

Access #4: READ  0x0000000000400544
  VPN: 0x000000400  Offset: 0x544
  TLB: HIT (frame 1)
  Physical: 0x000001544

... (remaining accesses)

Access #7: READ  0x0000000000601040
  VPN: 0x000000601  Offset: 0x040
  TLB: MISS
  Page Table: MISS (page fault)
  [PAGE FAULT] Loading VPN 0x000000601 into frame 2
  Physical: 0x000002040

================================================================================
                           SIMULATION SUMMARY
================================================================================

Memory Accesses:     8
  Reads:             5
  Writes:            3

TLB Statistics:
  Hits:              5 (62.5%)
  Misses:            3 (37.5%)
  Hit rate:          62.50%

Page Table Statistics:
  Hits:              0 (0.0% of TLB misses)
  Faults:            3
  Page fault rate:   37.50%

Page Replacement:
  Evictions:         0
  Dirty writebacks:  0
  Clean evictions:   0

Performance Estimate:
  TLB hit cost:      1 cycle
  Page walk cost:    ~100 cycles (4 levels * 25 cycles)
  Page fault cost:   ~10,000,000 cycles (disk access)

  Estimated cycles:  30,000,305
  If all TLB hits:   8 cycles
  Slowdown factor:   3,750,038x

$ ./vmsim --trace trace.txt --frames 4 --policy fifo --compare
================================================================================
                    POLICY COMPARISON
================================================================================

| Policy | Page Faults | Evictions | Hit Rate | Dirty Writebacks |
|--------|-------------|-----------|----------|------------------|
| FIFO   | 3           | 0         | 62.50%   | 0                |
| LRU    | 3           | 0         | 62.50%   | 0                |
| Clock  | 3           | 0         | 62.50%   | 0                |
| OPT    | 3           | 0         | 62.50%   | 0                |

(This trace touches only 3 distinct pages, so with 4 frames no policy
ever evicts; differences between policies appear on longer traces.)

Belady's Anomaly Check (FIFO with varying frames):
  2 frames: 3 faults
  3 frames: 3 faults
  4 frames: 3 faults
  5 frames: 3 faults
  No anomaly detected in this trace.

The Core Question You're Answering

"What actually happens when the CPU accesses a virtual address, and why do some programs thrash while others run smoothly?"

Virtual memory is the foundation of process isolation and memory efficiency. By building a simulator, you'll understand exactly why 4GB programs can run on 2GB machines, what the kernel does during a page fault, and why locality of reference is the most important property of programs.

Concepts You Must Understand First

  1. Virtual vs physical addresses (CS:APP 9.1) - Address spaces and the MMU
  2. Page tables and PTEs (CS:APP 9.3) - Structure of multi-level page tables
  3. TLB operation (CS:APP 9.5) - Translation lookaside buffer as a cache
  4. Page faults (CS:APP 9.3.4) - What triggers them, kernel handling
  5. Replacement policies (OSTEP Ch. 21-22) - FIFO, LRU, Clock, OPT
  6. Working set and locality (CS:APP 9.9) - Why caching works

Questions to Guide Your Design

  1. How will you represent page table entries? What fields: present, dirty, accessed, frame number?
  2. How many levels of page tables? x86-64 uses 4 levels - will you simulate all 4?
  3. What's your TLB data structure? Fully associative? Set associative? What replacement policy?
  4. How will you track LRU order? Timestamp? Doubly-linked list? Counter bits?
  5. How do you implement Clock algorithm? What's the "second chance" logic?
  6. How will you read trace files? What format: <R/W> <hex address>?
  7. What statistics will you collect? TLB hits, page faults, dirty writebacks?
  8. How will you validate correctness? Known traces with expected results?

Thinking Exercise

Before writing any code, trace through this address translation by hand:

// Virtual address: 0x00007fff5fbff8a0
// 48-bit address, 4KB pages, 4-level page table

// Break down the address (4KB = 12 offset bits):
// Binary: 0000 0000 0000 0000 0111 1111 1111 1111 0101 1111 1011 1111 1111 1000 1010 0000
//
// Bits [47:39] = L4 index = 0x0ff (255)   - index into PML4
// Bits [38:30] = L3 index = 0x1fd (509)   - index into PDPT
// Bits [29:21] = L2 index = 0x0fd (253)   - index into PD
// Bits [20:12] = L1 index = 0x1ff (511)   - index into PT
// Bits [11:0]  = Offset   = 0x8a0 (2208)  - offset within page

// Page walk:
// 1. CR3 contains physical address of PML4
// 2. Read PML4[255] -> physical address of PDPT
// 3. Read PDPT[510] -> physical address of PD
// 4. Read PD[382]   -> physical address of PT
// 5. Read PT[511]   -> PTE with frame number (or page fault if not present)
// 6. Physical address = (frame_number << 12) | 0x8a0

// TLB caches: VPN -> (frame_number, permissions)
// VPN = upper 36 bits = 0x7fff5fbff

Now trace a replacement decision:

// 4 frames, LRU policy, current state:
// Frame 0: VPN 0xABC, accessed at time 5
// Frame 1: VPN 0xDEF, accessed at time 8  (most recent)
// Frame 2: VPN 0x123, accessed at time 2  (least recent - EVICT THIS)
// Frame 3: VPN 0x456, accessed at time 6

// New access to VPN 0x789 at time 9 (page fault):
// 1. Find LRU frame: Frame 2 (time 2)
// 2. If Frame 2 is dirty, write back to disk
// 3. Load VPN 0x789 into Frame 2
// 4. Update access time to 9

The Interview Questions They'll Ask

  1. "Walk through what happens when a process accesses memory"
    • CPU generates virtual address
    • TLB lookup (fast path if hit)
    • On TLB miss: walk page table (multiple memory accesses)
    • If page not present: page fault, kernel loads from disk
    • Update TLB, return physical address
  2. "Why do we use multi-level page tables instead of single-level?"
    • Single level for 48-bit addresses would need 2^36 entries (64GB table!)
    • Multi-level allows sparse allocation
    • Only allocate tables for used regions
    • Trade-off: more memory accesses per translation
  3. "Explain the Clock page replacement algorithm"
    • Approximation of LRU with lower overhead
    • Reference bit set on access, cleared by clock hand
    • Hand sweeps, evicting first page with ref=0
    • Gives pages a "second chance" if recently used
  4. "What is thrashing and how do you detect it?"
    • Working set exceeds physical memory
    • Constant page faults, CPU mostly waiting for I/O
    • Detect: page fault rate exceeds threshold
    • Solution: reduce multiprogramming or add memory
  5. "What's Belady's anomaly and which algorithms are immune?"
    • FIFO can have MORE faults with MORE frames
    • Example: sequence 1,2,3,4,1,2,5,1,2,3,4,5 with 3 vs 4 frames
    • Stack algorithms (LRU, OPT) are immune
    • FIFO is not a stack algorithm
  6. "How does the TLB interact with context switches?"
    • TLB entries are process-specific
    • Context switch invalidates TLB (or uses ASID)
    • Cold TLB after switch causes many page walks
    • ASID allows TLB entries to persist across switches

Hints in Layers

Layer 1 - Address Parsing:

#define PAGE_SIZE 4096
#define PAGE_BITS 12
#define VPN_BITS 36
#define LEVELS 4
#define LEVEL_BITS 9

typedef struct {
    uint64_t vpn;           // Virtual page number
    uint16_t offset;        // Offset within page
    uint16_t indices[4];    // Indices for each level
} ParsedAddress;

ParsedAddress parse_address(uint64_t vaddr) {
    ParsedAddress pa;
    pa.offset = vaddr & 0xFFF;
    pa.vpn = vaddr >> PAGE_BITS;

    uint64_t temp = pa.vpn;
    for (int i = LEVELS - 1; i >= 0; i--) {
        pa.indices[i] = temp & 0x1FF;  // 9 bits per level
        temp >>= LEVEL_BITS;
    }
    return pa;
}

Layer 2 - Page Table Structure:

typedef struct {
    uint32_t frame;    // Physical frame number
    uint8_t present;   // Is page in memory?
    uint8_t dirty;     // Has page been written?
    uint8_t accessed;  // For clock algorithm
} PTE;

typedef struct PageTable {
    PTE entries[512];              // 2^9 entries per level
    struct PageTable *children[512]; // Pointers to next level
} PageTable;

PageTable *root;  // PML4

int walk_page_table(uint64_t vpn, uint32_t *frame_out) {
    PageTable *current = root;
    ParsedAddress pa = parse_address(vpn << PAGE_BITS);

    for (int level = 0; level < LEVELS - 1; level++) {
        int idx = pa.indices[level];
        if (!current->children[idx]) return -1;  // Not mapped
        current = current->children[idx];
    }

    int final_idx = pa.indices[LEVELS - 1];
    if (!current->entries[final_idx].present) return -1;

    *frame_out = current->entries[final_idx].frame;
    return 0;
}

Layer 3 - TLB Implementation:

typedef struct {
    uint64_t vpn;
    uint32_t frame;
    uint8_t valid;
    uint64_t last_access;  // For LRU
} TLBEntry;

#define TLB_SIZE 16  // matches --tlb-size in the example run

TLBEntry tlb[TLB_SIZE];
uint64_t access_counter = 0;

int tlb_lookup(uint64_t vpn, uint32_t *frame_out) {
    for (int i = 0; i < TLB_SIZE; i++) {
        if (tlb[i].valid && tlb[i].vpn == vpn) {
            tlb[i].last_access = ++access_counter;
            *frame_out = tlb[i].frame;
            return 1;  // Hit
        }
    }
    return 0;  // Miss
}

void tlb_insert(uint64_t vpn, uint32_t frame) {
    // Find empty or LRU entry
    int victim = 0;
    uint64_t oldest = UINT64_MAX;
    for (int i = 0; i < TLB_SIZE; i++) {
        if (!tlb[i].valid) { victim = i; break; }
        if (tlb[i].last_access < oldest) {
            oldest = tlb[i].last_access;
            victim = i;
        }
    }
    tlb[victim] = (TLBEntry){vpn, frame, 1, ++access_counter};
}

Layer 4 - Page Replacement Policies:

typedef struct {
    uint64_t vpn;
    uint8_t valid;
    uint8_t dirty;
    uint8_t ref_bit;       // For clock
    uint64_t load_time;    // For FIFO
    uint64_t last_access;  // For LRU
} Frame;

#define MAX_FRAMES 64

Frame frames[MAX_FRAMES];
int num_frames;      // set from --frames at startup
int clock_hand = 0;

int find_victim_lru(void) {
    int victim = -1;
    uint64_t oldest = UINT64_MAX;
    for (int i = 0; i < num_frames; i++) {
        if (!frames[i].valid) return i;
        if (frames[i].last_access < oldest) {
            oldest = frames[i].last_access;
            victim = i;
        }
    }
    return victim;
}

int find_victim_clock(void) {
    while (1) {
        if (!frames[clock_hand].ref_bit) {
            int victim = clock_hand;
            clock_hand = (clock_hand + 1) % num_frames;
            return victim;
        }
        frames[clock_hand].ref_bit = 0;  // Second chance
        clock_hand = (clock_hand + 1) % num_frames;
    }
}

Layer 5 - Main Simulation Loop:

void simulate(const char *trace_file) {
    FILE *f = fopen(trace_file, "r");
    if (!f) { perror(trace_file); return; }
    char op;
    uint64_t addr;

    while (fscanf(f, " %c %lx", &op, &addr) == 2) {
        stats.total_accesses++;
        uint64_t vpn = addr >> PAGE_BITS;
        uint32_t frame;

        // Try TLB
        if (tlb_lookup(vpn, &frame)) {
            stats.tlb_hits++;
        } else {
            stats.tlb_misses++;
            // Walk page table
            if (walk_page_table(vpn, &frame) < 0) {
                stats.page_faults++;
                frame = handle_page_fault(vpn);
            }
            tlb_insert(vpn, frame);
        }

        // Update access info
        frames[frame].last_access = ++access_counter;
        frames[frame].ref_bit = 1;
        if (op == 'W') frames[frame].dirty = 1;
    }
    fclose(f);
}

Books That Will Help

Topic Book Chapter
Virtual Memory Overview CS:APP Ch. 9
Address Translation CS:APP 9.3-9.5
Page Replacement OSTEP Ch. 21-22
TLBs CS:APP 9.5
Working Sets OSTEP Ch. 22
Linux VM Implementation TLPI Ch. 49-50

Common Pitfalls & Debugging

Problem 1: Address parsing is off by one level

# Symptom: All accesses go to wrong frame
# Check your bit extraction:
printf("VPN: 0x%lx\n", addr >> 12);
printf("L4: %d L3: %d L2: %d L1: %d\n",
       (addr >> 39) & 0x1ff, (addr >> 30) & 0x1ff,
       (addr >> 21) & 0x1ff, (addr >> 12) & 0x1ff);

Problem 2: LRU timestamps not updating

# Symptom: Same frame always evicted
# Fix: Update last_access on EVERY access, not just faults
frames[frame].last_access = ++global_counter;

Problem 3: Clock hand not wrapping

# Symptom: Array out of bounds or stuck
clock_hand = (clock_hand + 1) % num_frames;

Problem 4: Dirty bit not set on writes

# Symptom: Dirty writeback count always zero
if (op == 'W' || op == 'w') {
    frames[frame].dirty = 1;
}

Project 20: HTTP Web Server

Attribute Value
Language C (alt: Rust, Go)
Difficulty Intermediate
Time 1โ€“2 weeks
Chapters 10, 11, 8

What you'll build: A small but real HTTP server (tiny) that parses requests, serves static files, and runs simple CGI-style dynamic handlers.

Why it matters: Connects sockets, HTTP parsing, and process control into a working networked application.

Core challenges:

  • Implementing request/response loop with sockets API
  • Parsing HTTP request lines and headers defensively
  • Serving static files with correct MIME types

Real World Outcome

When your server works, you'll see output like this:

$ ./tiny 8080 ./www &
================================================================================
                    TINY - Minimal HTTP/1.1 Web Server
================================================================================
[INIT] Document root: ./www
[INIT] Listening on port 8080
[INIT] Server ready. Press Ctrl+C to shutdown.

$ curl -v http://localhost:8080/index.html
*   Trying 127.0.0.1:8080...
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /index.html HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.68.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Server: Tiny/1.0
< Date: Sat, 14 Dec 2024 15:30:42 GMT
< Content-Type: text/html
< Content-Length: 1234
< Connection: close
<
<!DOCTYPE html>
<html>...
* Closing connection 0

# Server log output:
[2024-12-14 15:30:42] 127.0.0.1:54321 "GET /index.html HTTP/1.1" 200 1234 0.002ms

$ curl -v "http://localhost:8080/cgi-bin/adder?15&213"
*   Trying 127.0.0.1:8080...
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /cgi-bin/adder?15&213 HTTP/1.1
> Host: localhost:8080
>
< HTTP/1.1 200 OK
< Server: Tiny/1.0
< Content-Type: text/html
< Transfer-Encoding: chunked
<
<html>
<head><title>Adder Result</title></head>
<body>
<h1>Welcome to adder.cgi</h1>
<p>The sum of 15 and 213 is 228</p>
</body>
</html>
* Closing connection 0

[2024-12-14 15:31:15] 127.0.0.1:54322 "GET /cgi-bin/adder?15&213 HTTP/1.1" 200 - 15.3ms (CGI)

$ curl -I http://localhost:8080/images/logo.png
HTTP/1.1 200 OK
Server: Tiny/1.0
Date: Sat, 14 Dec 2024 15:32:00 GMT
Content-Type: image/png
Content-Length: 45678
Last-Modified: Fri, 13 Dec 2024 10:00:00 GMT
Connection: close

$ curl http://localhost:8080/nonexistent.html
<!DOCTYPE html>
<html>
<head><title>404 Not Found</title></head>
<body>
<h1>404 Not Found</h1>
<p>The requested URL /nonexistent.html was not found on this server.</p>
<hr><address>Tiny/1.0 Server</address>
</body>
</html>

[2024-12-14 15:32:30] 127.0.0.1:54323 "GET /nonexistent.html HTTP/1.1" 404 312 0.001ms

$ curl -i -X POST -d "name=test" http://localhost:8080/form
HTTP/1.1 501 Not Implemented
Server: Tiny/1.0
Content-Type: text/html
Content-Length: 156

<!DOCTYPE html>
<html><body>
<h1>501 Method Not Implemented</h1>
<p>Tiny does not support the POST method.</p>
</body></html>

# Load testing with wrk:
$ wrk -t4 -c100 -d30s http://localhost:8080/index.html
Running 30s test @ http://localhost:8080/index.html
  4 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.34ms    1.12ms   45.21ms   89.32%
    Req/Sec     10.87k     1.23k   14.56k    72.15%
  1,301,245 requests in 30.01s, 1.52GB read
Requests/sec:  43,363.21
Transfer/sec:     51.89MB

The Core Question You're Answering

"How does a web server actually work at the systems level, from TCP socket to HTTP response?"

Every web developer uses web servers daily, but few understand what happens between socket() and the HTML appearing in the browser. This project demystifies the entire stack: connection handling, protocol parsing, file serving, and process-based CGI.

Concepts You Must Understand First

  1. TCP socket programming (CS:APP 11.4) - socket, bind, listen, accept, read, write
  2. HTTP protocol basics (RFC 2616, since superseded by RFC 9110) - Request/response structure, headers, status codes
  3. Unix I/O (CS:APP 10.1-10.4) - File descriptors, open, read, write, close
  4. Robust I/O (CS:APP 10.5) - rio_readlineb for buffered line reading
  5. Process control (CS:APP 8.2-8.4) - fork, exec, wait for CGI
  6. MIME types - Content-Type mapping from file extensions

Questions to Guide Your Design

  1. How will you structure the main server loop? Single process? Fork per connection? Thread pool?
  2. How do you parse HTTP requests robustly? What if request is malformed? Too long?
  3. How do you prevent directory traversal attacks? GET /../../../etc/passwd?
  4. How do you determine MIME types? Hard-coded table? /etc/mime.types?
  5. How do you implement CGI? Which environment variables? How to pass query string?
  6. How do you handle partial reads/writes? TCP doesn't guarantee message boundaries
  7. How do you handle persistent connections? Connection: keep-alive vs close?
  8. How do you handle signals? SIGPIPE when client disconnects, SIGCHLD from CGI

Thinking Exercise

Before writing any code, trace through this HTTP transaction:

// Client sends (bytes over TCP):
"GET /images/cat.jpg HTTP/1.1\r\n"
"Host: localhost:8080\r\n"
"User-Agent: Mozilla/5.0\r\n"
"Accept: image/jpeg,image/*\r\n"
"\r\n"

// Server must:
// 1. Parse request line: method="GET", uri="/images/cat.jpg", version="HTTP/1.1"
// 2. Parse headers into key-value pairs
// 3. Validate: method supported? URI safe? Version OK?
// 4. Map URI to filesystem: "./www/images/cat.jpg"
// 5. Check file exists and is readable
// 6. Determine Content-Type from extension: "image/jpeg"
// 7. Get file size with stat()
// 8. Send response:
"HTTP/1.1 200 OK\r\n"
"Server: Tiny/1.0\r\n"
"Content-Type: image/jpeg\r\n"
"Content-Length: 45678\r\n"
"\r\n"
// <45678 bytes of JPEG data>

Now trace a CGI request:

// Client sends:
"GET /cgi-bin/adder?15&213 HTTP/1.1\r\n"
"Host: localhost:8080\r\n"
"\r\n"

// Server must:
// 1. Parse URI: path="/cgi-bin/adder", query_string="15&213"
// 2. Detect CGI (path starts with /cgi-bin/)
// 3. fork() child process
// 4. In child:
//    - dup2() client socket to STDOUT
//    - setenv("QUERY_STRING", "15&213")
//    - setenv("REQUEST_METHOD", "GET")
//    - execve("./www/cgi-bin/adder", ...)
// 5. In parent: wait() for child, then continue

The Interview Questions They'll Ask

  1. "Explain the socket API calls for a TCP server"
    • socket() - create endpoint
    • bind() - attach to port
    • listen() - mark as passive (accepting connections)
    • accept() - block until client connects, return new fd
    • read()/write() - exchange data
    • close() - terminate connection
  2. "How do you handle slow clients?"
    • Problem: read() blocks if client sends slowly
    • Solutions: non-blocking I/O, select/poll/epoll, timeouts
    • For this project: accept slowness (educational focus)
    • Production: event-driven with timeouts
  3. "What's a directory traversal attack and how do you prevent it?"
    • Attack: GET /../../../etc/passwd
    • Naive: prepend doc root, but .. escapes it
    • Fix: resolve path with realpath(), check prefix matches doc root
    • Also: reject paths containing .. directly
  4. "How does CGI work?"
    • Server forks child, sets environment variables
    • Redirects child's stdout to client socket
    • Executes CGI program
    • Program writes HTTP response to stdout (goes to client)
    • Server waits for child to finish
  5. "What happens if the client disconnects mid-transfer?"
    • write() to closed socket generates SIGPIPE
    • Default: terminate process
    • Fix: signal(SIGPIPE, SIG_IGN) and check write() return value
    • Or: use send() with MSG_NOSIGNAL flag
  6. "How would you add HTTPS support?"
    • Use OpenSSL or similar TLS library
    • SSL_accept() after accept() to perform the TLS handshake
    • SSL_read()/SSL_write() instead of read()/write()
    • Handle certificate loading, cipher selection
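To make the traversal defense from question 3 concrete, here is a minimal sketch of a segment-based check. uri_is_safe is a hypothetical helper (not from CS:APP): it rejects any URI containing a ".." path segment before the path ever touches the filesystem. Note that the URI must be URL-decoded first, or an encoded %2e%2e would slip past it; a realpath()-based prefix check should still back it up.

```c
#include <assert.h>
#include <string.h>

// Hypothetical helper: reject any URI containing a ".." path segment.
// First line of defense only; pair it with a realpath() prefix check
// after the doc root is prepended.
int uri_is_safe(const char *uri) {
    const char *p = uri;
    while (*p) {
        size_t seg = strcspn(p, "/");      // Length of current segment
        if (seg == 2 && p[0] == '.' && p[1] == '.')
            return 0;                      // ".." escapes the doc root
        p += seg;
        if (*p == '/') p++;                // Skip the separator
    }
    return 1;
}
```

A filename like "c..d" is still allowed, because only an exact ".." segment is dangerous.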

Hints in Layers

Layer 1 - Socket Setup:

#include <sys/socket.h>
#include <netinet/in.h>

int open_listenfd(int port) {
    int listenfd = socket(AF_INET, SOCK_STREAM, 0);

    // Allow port reuse (avoid "Address already in use")
    int optval = 1;
    setsockopt(listenfd, SOL_SOCKET, SO_REUSEADDR, &optval, sizeof(optval));

    struct sockaddr_in addr = {
        .sin_family = AF_INET,
        .sin_port = htons(port),
        .sin_addr.s_addr = htonl(INADDR_ANY)
    };

    bind(listenfd, (struct sockaddr *)&addr, sizeof(addr));
    listen(listenfd, 1024);
    return listenfd;
}

Layer 2 - Robust I/O (from CS:APP):

typedef struct {
    int fd;
    int cnt;              // Unread bytes in buffer
    char *bufptr;         // Next unread byte
    char buf[8192];
} rio_t;

void rio_readinitb(rio_t *rp, int fd) {
    rp->fd = fd;
    rp->cnt = 0;
    rp->bufptr = rp->buf;
}

// Read a line (handles partial reads)
ssize_t rio_readlineb(rio_t *rp, void *usrbuf, size_t maxlen) {
    char *bufp = usrbuf;
    for (int n = 1; n < maxlen; n++) {
        char c;
        int rc = rio_read(rp, &c, 1);
        if (rc == 1) {
            *bufp++ = c;
            if (c == '\n') break;
        } else if (rc == 0) {
            if (n == 1) return 0;  // EOF, no data
            break;
        } else {
            return -1;  // Error
        }
    }
    *bufp = 0;
    return bufp - (char *)usrbuf;
}
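rio_readlineb above calls rio_read, the internal buffered read from CS:APP that the excerpt doesn't show. A sketch consistent with the rio_t above (the typedef is repeated so the block stands alone):

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

// rio_t and rio_readinitb, as in Layer 2
typedef struct {
    int fd;
    int cnt;              // Unread bytes in buffer
    char *bufptr;         // Next unread byte
    char buf[8192];
} rio_t;

void rio_readinitb(rio_t *rp, int fd) {
    rp->fd = fd;
    rp->cnt = 0;
    rp->bufptr = rp->buf;
}

// Internal buffered read (mirrors CS:APP's rio_read): refill buf from fd
// when empty, then copy min(n, bytes available) into usrbuf.
static ssize_t rio_read(rio_t *rp, char *usrbuf, size_t n) {
    while (rp->cnt <= 0) {                        // Buffer empty: refill
        rp->cnt = read(rp->fd, rp->buf, sizeof(rp->buf));
        if (rp->cnt < 0) {
            if (errno != EINTR)                   // Retry if a signal interrupted us
                return -1;                        // Real error
        } else if (rp->cnt == 0) {
            return 0;                             // EOF
        } else {
            rp->bufptr = rp->buf;                 // Reset to start of refilled buffer
        }
    }
    size_t cnt = (n < (size_t)rp->cnt) ? n : (size_t)rp->cnt;
    memcpy(usrbuf, rp->bufptr, cnt);
    rp->bufptr += cnt;
    rp->cnt -= cnt;
    return cnt;
}
```

The buffering is what makes rio_readlineb efficient: one read(2) syscall can serve many one-byte rio_read calls.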

Layer 3 - HTTP Request Parsing:

typedef struct {
    char method[16];
    char uri[2048];
    char version[16];
    char headers[32][2][256];  // Up to 32 headers
    int header_count;
} HttpRequest;

int parse_request(rio_t *rp, HttpRequest *req) {
    char line[2048];

    // Read request line: "GET /index.html HTTP/1.1"
    rio_readlineb(rp, line, sizeof(line));
    if (sscanf(line, "%15s %2047s %15s", req->method, req->uri, req->version) != 3)
        return -1;

    // Read headers until blank line
    req->header_count = 0;
    while (rio_readlineb(rp, line, sizeof(line)) > 0) {
        if (strcmp(line, "\r\n") == 0 || strcmp(line, "\n") == 0)
            break;
        char *colon = strchr(line, ':');
        if (colon && req->header_count < 32) {
            *colon = '\0';
            strcpy(req->headers[req->header_count][0], line);
            // Skip the colon and any spaces before the value, then trim newline
            char *value = colon + 1;
            while (*value == ' ' || *value == '\t') value++;
            value[strcspn(value, "\r\n")] = '\0';
            strcpy(req->headers[req->header_count][1], value);
            req->header_count++;
        }
    }
    return 0;
}

Layer 4 - Static File Serving:

const char *get_mime_type(const char *filename) {
    const char *ext = strrchr(filename, '.');
    if (!ext) return "application/octet-stream";
    if (strcmp(ext, ".html") == 0) return "text/html";
    if (strcmp(ext, ".css") == 0)  return "text/css";
    if (strcmp(ext, ".js") == 0)   return "application/javascript";
    if (strcmp(ext, ".jpg") == 0)  return "image/jpeg";
    if (strcmp(ext, ".png") == 0)  return "image/png";
    if (strcmp(ext, ".gif") == 0)  return "image/gif";
    return "application/octet-stream";
}

void serve_static(int fd, const char *filename) {
    struct stat sbuf;
    if (stat(filename, &sbuf) < 0) {
        send_error(fd, 404, "Not Found");
        return;
    }

    int srcfd = open(filename, O_RDONLY);
    if (srcfd < 0) {
        send_error(fd, 403, "Forbidden");
        return;
    }
    char *srcp = mmap(0, sbuf.st_size, PROT_READ, MAP_PRIVATE, srcfd, 0);
    close(srcfd);
    if (srcp == MAP_FAILED) {
        send_error(fd, 500, "Internal Server Error");
        return;
    }

    // Send headers
    char header[512];
    snprintf(header, sizeof(header),
        "HTTP/1.1 200 OK\r\n"
        "Server: Tiny/1.0\r\n"
        "Content-Type: %s\r\n"
        "Content-Length: %ld\r\n"
        "\r\n",
        get_mime_type(filename), sbuf.st_size);
    write(fd, header, strlen(header));

    // Send body
    write(fd, srcp, sbuf.st_size);
    munmap(srcp, sbuf.st_size);
}

Layer 5 - CGI Handler:

void serve_cgi(int fd, const char *program, const char *query_string) {
    pid_t pid = fork();
    if (pid == 0) {  // Child
        // Set CGI environment variables
        setenv("QUERY_STRING", query_string ? query_string : "", 1);
        setenv("REQUEST_METHOD", "GET", 1);
        setenv("GATEWAY_INTERFACE", "CGI/1.1", 1);

        // Redirect stdout to client socket
        dup2(fd, STDOUT_FILENO);
        close(fd);

        // Execute CGI program
        execl(program, program, (char *)NULL);
        exit(1);  // If exec fails
    } else {  // Parent
        int status;
        waitpid(pid, &status, 0);
    }
}
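The adder CGI program invoked in the demo output might look like this. The sketch below splits the logic (parse_two_ints, adder_body are hypothetical names; the real CS:APP adder differs in details) so it can be tested without a socket:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

// Parse a query string of the form "15&213" into two ints.
// Returns 1 on success, 0 on malformed input.
int parse_two_ints(const char *query, int *a, int *b) {
    if (!query) return 0;                  // QUERY_STRING may be unset
    return sscanf(query, "%d&%d", a, b) == 2;
}

// Format the HTML body the demo output shows.
void adder_body(const char *query, char *out, size_t outlen) {
    int a, b;
    if (parse_two_ints(query, &a, &b))
        snprintf(out, outlen, "<p>The sum of %d and %d is %d</p>", a, b, a + b);
    else
        snprintf(out, outlen, "<p>Bad query</p>");
}
```

In main(), the CGI would read getenv("QUERY_STRING"), print "Content-Type: text/html\r\n\r\n", then the body, all on stdout, which the server's dup2() has already connected to the client socket.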

Books That Will Help

Topic Book Chapter
Unix I/O CS:APP Ch. 10
Network Programming CS:APP Ch. 11
Process Control CS:APP Ch. 8
Sockets Deep Dive Unix Network Programming Vol 1 Ch. 1-8
HTTP Protocol TCP/IP Illustrated Vol 1 Ch. 14
Advanced I/O TLPI Ch. 63

Common Pitfalls & Debugging

Problem 1: "Address already in use" on restart

// Fix: Set SO_REUSEADDR before bind()
int optval = 1;
setsockopt(listenfd, SOL_SOCKET, SO_REUSEADDR, &optval, sizeof(optval));

Problem 2: Server crashes on client disconnect

// Fix: Ignore SIGPIPE
signal(SIGPIPE, SIG_IGN);
// Then check write() return value
if (write(fd, data, len) < 0) {
    // Client disconnected, clean up
}

Problem 3: Zombie CGI processes

// Fix: Reap children in a SIGCHLD handler; preserve errno, since waitpid can clobber it
void sigchld_handler(int sig) {
    int saved_errno = errno;
    while (waitpid(-1, NULL, WNOHANG) > 0);
    errno = saved_errno;
}
signal(SIGCHLD, sigchld_handler);

Problem 4: Directory traversal vulnerability

// WRONG:
char path[256];
sprintf(path, "./www%s", uri);  // uri="/../../../etc/passwd"

// FIX: resolve BOTH the requested path and the doc root, then compare prefixes
// (realpath returns an absolute path, so comparing against "./www" never matches)
char docroot[PATH_MAX], resolved[PATH_MAX];
if (!realpath("./www", docroot) || !realpath(path, resolved) ||
    strncmp(resolved, docroot, strlen(docroot)) != 0) {
    send_error(fd, 403, "Forbidden");
    return;
}

Project 21: Thread Pool Implementation

Attribute Value
Language C (alt: Rust, Go, Java)
Difficulty Advanced
Time 1โ€“2 weeks
Chapters 12

What you'll build: A reusable thread pool library with a bounded work queue, condition variables, and clean shutdown semantics.

Why it matters: The producer-consumer pattern is everywhere; this project forces you to get synchronization right.

Core challenges:

  • Implementing correct producer-consumer queue with blocking
  • Handling shutdown safely (no task loss, no deadlocks)
  • Avoiding thundering herd and backpressure issues

Real World Outcome:

$ ./threadpool --workers=4 --demo
================================================================================
                    THREAD POOL DEMONSTRATION
================================================================================
[INIT] Creating pool with 4 worker threads
[INIT] Work queue capacity: 64 tasks
[WORKER-0] Started, waiting for work...
[WORKER-1] Started, waiting for work...
[WORKER-2] Started, waiting for work...
[WORKER-3] Started, waiting for work...

[SUBMIT] Task 1: compute_fibonacci(40)
[SUBMIT] Task 2: compute_fibonacci(35)
[SUBMIT] Task 3: compress_file("data.bin")
[SUBMIT] Task 4: compute_factorial(20)
[SUBMIT] Task 5: hash_password("secret")

[WORKER-0] Executing task 1: compute_fibonacci(40)
[WORKER-1] Executing task 2: compute_fibonacci(35)
[WORKER-2] Executing task 3: compress_file("data.bin")
[WORKER-3] Executing task 4: compute_factorial(20)
[WORKER-1] Completed task 2 in 89ms (result: 9227465)
[WORKER-1] Executing task 5: hash_password("secret")
[WORKER-3] Completed task 4 in 12ms (result: 2432902008176640000)
[WORKER-1] Completed task 5 in 156ms
[WORKER-2] Completed task 3 in 423ms (compressed 1.2MB -> 340KB)
[WORKER-0] Completed task 1 in 1247ms (result: 102334155)

================================================================================
                    POOL STATISTICS
================================================================================
Total tasks submitted:  5
Total tasks completed:  5
Average wait time:      34ms
Average execution time: 385ms
Queue high watermark:   4/64

$ ./threadpool --workers=2 --stress-test --tasks=10000
================================================================================
                    STRESS TEST MODE
================================================================================
[CONFIG] Workers: 2, Tasks: 10000, Queue size: 256

[PROGRESS] 1000/10000 tasks (10.0%) - 847 tasks/sec
[PROGRESS] 2000/10000 tasks (20.0%) - 892 tasks/sec
[PROGRESS] 5000/10000 tasks (50.0%) - 921 tasks/sec
[PROGRESS] 10000/10000 tasks (100.0%) - 934 tasks/sec

[RESULT] All 10000 tasks completed successfully
[RESULT] Total time: 10.71s
[RESULT] Throughput: 934 tasks/sec
[RESULT] No deadlocks detected
[RESULT] No tasks lost during shutdown

$ ./threadpool --workers=4 --graceful-shutdown-test
================================================================================
                    GRACEFUL SHUTDOWN TEST
================================================================================
[TEST] Submitting 100 long-running tasks...
[TEST] Requesting shutdown while 87 tasks pending...
[POOL] Shutdown requested - completing in-flight tasks
[POOL] Worker-0 finishing current task, then exiting
[POOL] Worker-1 finishing current task, then exiting
[POOL] Worker-2 finishing current task, then exiting
[POOL] Worker-3 finishing current task, then exiting
[POOL] Draining remaining 83 queued tasks...
[POOL] All workers joined
[POOL] Pool destroyed cleanly

[RESULT] PASS - All 100 tasks completed
[RESULT] PASS - No memory leaks (valgrind clean)
[RESULT] PASS - Shutdown completed in 2.3s

The Core Question You're Answering: How do you safely coordinate multiple threads that share a work queue, ensuring no race conditions, no lost work, and clean shutdown?

Concepts You Must Understand First:

  • Mutex fundamentals (CS:APP 12.5.4) - Why mutual exclusion is necessary and how pthread_mutex_t provides atomicity
  • Condition variables (CS:APP 12.5.5) - The "wait and signal" pattern for efficient blocking without busy-waiting
  • Producer-consumer pattern (OSTEP Ch. 30) - The classic bounded buffer problem and its solution
  • Thread lifecycle (TLPI Ch. 29) - Creation, termination, joining, and detaching threads
  • Memory visibility (CS:APP 12.5.1) - Why compiler reordering and CPU caches can cause subtle bugs without proper synchronization
  • Deadlock prevention (OSTEP Ch. 32) - The four conditions for deadlock and how to avoid them

Questions to Guide Your Design:

  1. When a worker thread finds an empty queue, should it spin-wait, sleep, or use a condition variable? What are the tradeoffs?
  2. What happens if a producer tries to submit work when the queue is full? Block, drop, or grow the queue?
  3. How do you signal shutdown to workers? A special "poison pill" task, a shared flag, or both?
  4. Should workers check the shutdown flag before or after dequeuing a task? What's the difference?
  5. If you use a circular buffer for the queue, how do you handle the wraparound correctly with concurrent access?
  6. How do you avoid the "thundering herd" problem when multiple workers wake up for one task?
  7. What happens if a worker thread crashes? Should the pool detect this and spawn a replacement?
  8. How do you make task submission return quickly even if all workers are busy?
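For question 5, the wraparound arithmetic of a bounded circular buffer is easy to get wrong. A single-threaded sketch of just the index logic (ring_t, ring_push, ring_pop are hypothetical names), which the mutex from the hints below would then protect:

```c
#include <assert.h>
#include <stddef.h>

#define RING_CAP 4

// Fixed-capacity ring of task pointers. head = next slot to dequeue,
// size = elements stored; the enqueue slot is (head + size) % RING_CAP.
typedef struct {
    void *slots[RING_CAP];
    size_t head;
    size_t size;
} ring_t;

int ring_push(ring_t *r, void *item) {
    if (r->size == RING_CAP) return -1;              // Full: caller would block on not_full
    r->slots[(r->head + r->size) % RING_CAP] = item;
    r->size++;
    return 0;
}

void *ring_pop(ring_t *r) {
    if (r->size == 0) return NULL;                   // Empty: caller would block on not_empty
    void *item = r->slots[r->head];
    r->head = (r->head + 1) % RING_CAP;              // Wrap around
    r->size--;
    return item;
}
```

Tracking head plus size (rather than head plus tail) sidesteps the classic "is head == tail full or empty?" ambiguity.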

Thinking Exercise:

Before writing code, trace through this scenario by hand:

// Initial state: queue is EMPTY, pool has 2 workers (both waiting)

// Thread A (producer):
pthread_mutex_lock(&pool->lock);
// A acquires lock
enqueue(&pool->queue, task1);
// A signals condition variable
pthread_cond_signal(&pool->not_empty);
pthread_mutex_unlock(&pool->lock);

// Meanwhile, Thread B (producer) tries to submit:
pthread_mutex_lock(&pool->lock);  // <-- What happens here?

// Worker-0 (waiting on cond var):
// wakes up from pthread_cond_wait()
// <-- What must Worker-0 do before accessing the queue?

// Worker-1 (also waiting):
// <-- Should Worker-1 wake up? What does it do?

Draw a timeline showing which thread holds the mutex at each moment. What if you used pthread_cond_broadcast() instead of pthread_cond_signal()?

The Interview Questions They'll Ask:

  1. "What's the difference between a mutex and a semaphore? When would you use each?"
    • Expected answer: A mutex is for mutual exclusion (one thread at a time); a semaphore is for counting resources. Use a mutex for protecting shared data, a semaphore for limiting concurrent access to N resources. A binary semaphore is similar to a mutex but has different ownership semantics (any thread can signal, only the owner should unlock a mutex).
  2. "Explain why this thread pool implementation might deadlock." (They'll show buggy code)
    • Expected answer: Look for: lock ordering violations, missing unlock on error paths, waiting on a condition while holding multiple locks, or joining a thread that's waiting on a lock you hold.
  3. "How would you implement work stealing between thread pool workers?"
    • Expected answer: Each worker has its own deque. Workers push/pop from their own deque (LIFO for cache locality). When empty, steal from the tail of another worker's deque. Requires lock-free or fine-grained locking for the stealing operation.
  4. "What's the spurious wakeup problem and how do you handle it?"
    • Expected answer: Condition variable wait can return even when no signal was sent. Always wrap pthread_cond_wait in a while loop that rechecks the condition, not an if statement.
  5. "How do you choose the optimal number of threads for a thread pool?"
    • Expected answer: For CPU-bound work: number of cores. For I/O-bound work: higher (2x-10x cores) depending on the I/O wait ratio. Little's Law can help: N = arrival_rate * average_service_time. In practice, benchmark and tune.
  6. "What's the ABA problem and can it affect this implementation?"
    • Expected answer: ABA occurs in lock-free structures when a value changes A->B->A between a read and a CAS. With mutex-protected queues, ABA isn't an issue. But if you tried to make a lock-free queue, you'd need hazard pointers or epoch-based reclamation.

Hints in Layers:

Layer 1 - Core Data Structures:

typedef struct task {
    void (*function)(void *arg);
    void *arg;
    struct task *next;
} task_t;

typedef struct threadpool {
    pthread_mutex_t lock;
    pthread_cond_t not_empty;   // Signal when queue becomes non-empty
    pthread_cond_t not_full;    // Signal when queue has space (for bounded)

    task_t *queue_head;
    task_t *queue_tail;
    int queue_size;
    int queue_capacity;

    pthread_t *workers;
    int worker_count;
    int shutdown;               // 0 = running, 1 = graceful, 2 = immediate
} threadpool_t;
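The enqueue/dequeue helpers that the later layers call aren't shown in the text. A sketch matching the linked-list fields above (only the queue fields of threadpool_t are repeated here, so the block stands alone; both helpers assume the caller holds pool->lock):

```c
#include <assert.h>
#include <stddef.h>

// task_t and the queue fields of threadpool_t, repeated from Layer 1
typedef struct task {
    void (*function)(void *arg);
    void *arg;
    struct task *next;
} task_t;

typedef struct threadpool {
    task_t *queue_head;
    task_t *queue_tail;
    int queue_size;
} threadpool_t;

// Append at the tail. Caller must hold pool->lock.
void enqueue(threadpool_t *pool, task_t *task) {
    task->next = NULL;
    if (pool->queue_tail)
        pool->queue_tail->next = task;
    else
        pool->queue_head = task;          // Queue was empty
    pool->queue_tail = task;
    pool->queue_size++;
}

// Detach the head. Caller must hold pool->lock; queue must be non-empty.
task_t *dequeue(threadpool_t *pool) {
    task_t *task = pool->queue_head;
    pool->queue_head = task->next;
    if (!pool->queue_head)
        pool->queue_tail = NULL;          // Queue is now empty
    pool->queue_size--;
    return task;
}
```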

Layer 2 - Worker Thread Loop Pattern:

void *worker_thread(void *arg) {
    threadpool_t *pool = (threadpool_t *)arg;

    while (1) {
        pthread_mutex_lock(&pool->lock);

        // Wait while queue is empty AND not shutting down
        while (pool->queue_size == 0 && !pool->shutdown) {
            pthread_cond_wait(&pool->not_empty, &pool->lock);
        }

        // Check shutdown AFTER waking: exit at once on immediate shutdown (2),
        // or only after the queue drains on graceful shutdown (1)
        if (pool->shutdown == 2 ||
            (pool->shutdown && pool->queue_size == 0)) {
            pthread_mutex_unlock(&pool->lock);
            break;
        }

        task_t *task = dequeue(pool);          // Remove from queue
        pthread_cond_signal(&pool->not_full);  // Wake a submitter blocked on a full queue
        pthread_mutex_unlock(&pool->lock);

        // Execute OUTSIDE the lock!
        task->function(task->arg);
        free(task);
    }
    return NULL;
}

Layer 3 - Submit with Backpressure:

int threadpool_submit(threadpool_t *pool, void (*fn)(void*), void *arg) {
    task_t *task = malloc(sizeof(task_t));
    task->function = fn;
    task->arg = arg;
    task->next = NULL;

    pthread_mutex_lock(&pool->lock);

    // Block if queue is full (backpressure)
    while (pool->queue_size >= pool->queue_capacity && !pool->shutdown) {
        pthread_cond_wait(&pool->not_full, &pool->lock);
    }

    if (pool->shutdown) {
        pthread_mutex_unlock(&pool->lock);
        free(task);
        return -1;  // Rejected
    }

    enqueue(pool, task);
    pthread_cond_signal(&pool->not_empty);  // Wake ONE worker

    pthread_mutex_unlock(&pool->lock);
    return 0;
}

Layer 4 - Graceful Shutdown:

void threadpool_shutdown(threadpool_t *pool, int graceful) {
    pthread_mutex_lock(&pool->lock);
    pool->shutdown = graceful ? 1 : 2;
    pthread_cond_broadcast(&pool->not_empty);  // Wake ALL workers
    pthread_cond_broadcast(&pool->not_full);   // Unblock any blocked submitters
    pthread_mutex_unlock(&pool->lock);

    // Join all workers
    for (int i = 0; i < pool->worker_count; i++) {
        pthread_join(pool->workers[i], NULL);
    }

    // If immediate shutdown, drain remaining tasks
    if (!graceful) {
        while (pool->queue_size > 0) {
            task_t *t = dequeue(pool);
            free(t);  // Or call a cancellation callback
        }
    }
}

Layer 5 - Testing for Correctness:

// Test: No lost tasks under concurrent submit/shutdown
void stress_test() {
    atomic_int completed = 0;
    threadpool_t *pool = threadpool_create(4, 64);

    // Submit from multiple producer threads simultaneously
    pthread_t producers[8];
    for (int i = 0; i < 8; i++) {
        pthread_create(&producers[i], NULL, submit_1000_tasks, &completed);
    }

    // Join producers first so every task gets submitted, THEN shut down;
    // shutting down early would reject late submissions and break the count
    for (int i = 0; i < 8; i++) {
        pthread_join(producers[i], NULL);
    }
    threadpool_shutdown(pool, 1);  // Graceful: drains remaining queued tasks

    // Verify: completed should equal submitted
    assert(completed == 8000);
}
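threadpool_create is used above but never shown. A minimal self-contained sketch that wires the layers together, simplified relative to the hints (unbounded queue, graceful shutdown only; pool_t and friends are illustrative names, and pool_demo_task exists just so the example is runnable). Compile with -pthread:

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

typedef struct task { void (*function)(void *); void *arg; struct task *next; } task_t;

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  not_empty;
    task_t *head, *tail;
    pthread_t *workers;
    int worker_count;
    int shutdown;                               // 1 = graceful: drain queue, then exit
} pool_t;

static void *worker(void *arg) {
    pool_t *p = arg;
    for (;;) {
        pthread_mutex_lock(&p->lock);
        while (!p->head && !p->shutdown)        // while-loop handles spurious wakeups
            pthread_cond_wait(&p->not_empty, &p->lock);
        if (!p->head && p->shutdown) {          // Queue drained and shutting down
            pthread_mutex_unlock(&p->lock);
            return NULL;
        }
        task_t *t = p->head;                    // Dequeue head
        p->head = t->next;
        if (!p->head) p->tail = NULL;
        pthread_mutex_unlock(&p->lock);
        t->function(t->arg);                    // Execute OUTSIDE the lock
        free(t);
    }
}

pool_t *pool_create(int nworkers) {
    pool_t *p = calloc(1, sizeof(*p));
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->not_empty, NULL);
    p->workers = calloc(nworkers, sizeof(pthread_t));
    p->worker_count = nworkers;
    for (int i = 0; i < nworkers; i++)
        pthread_create(&p->workers[i], NULL, worker, p);
    return p;
}

int pool_submit(pool_t *p, void (*fn)(void *), void *arg) {
    task_t *t = malloc(sizeof(*t));
    t->function = fn; t->arg = arg; t->next = NULL;
    pthread_mutex_lock(&p->lock);
    if (p->shutdown) {                          // Reject submissions after shutdown
        pthread_mutex_unlock(&p->lock);
        free(t);
        return -1;
    }
    if (p->tail) p->tail->next = t; else p->head = t;
    p->tail = t;
    pthread_cond_signal(&p->not_empty);         // Wake ONE worker
    pthread_mutex_unlock(&p->lock);
    return 0;
}

void pool_destroy(pool_t *p) {                  // Graceful: drain, join, free
    pthread_mutex_lock(&p->lock);
    p->shutdown = 1;
    pthread_cond_broadcast(&p->not_empty);      // Wake ALL workers
    pthread_mutex_unlock(&p->lock);
    for (int i = 0; i < p->worker_count; i++)
        pthread_join(p->workers[i], NULL);
    free(p->workers);
    free(p);
}

// Demo task for the usage example: atomically bump a shared counter
atomic_int pool_demo_count;
void pool_demo_task(void *arg) { (void)arg; atomic_fetch_add(&pool_demo_count, 1); }
```

Because pool_destroy drains the queue before joining, every task submitted before it is called runs exactly once.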

Books That Will Help:

Book Chapters What Youโ€™ll Learn
CS:APP 3e 12.4-12.5 Threads, mutexes, condition variables, thread safety
OSTEP 26-32 Locks, condition variables, semaphores, common concurrency bugs
TLPI 29-33 POSIX threads, mutexes, conditions, thread cancellation
C++ Concurrency in Action 2-4 Modern patterns (applicable to C with adaptation)
APUE 3e 11-12 Threads, thread control, thread synchronization

Common Pitfalls & Debugging:

  1. Bug: Forgetting to recheck condition after waking from pthread_cond_wait
    // WRONG - spurious wakeup breaks this
    if (pool->queue_size == 0)
        pthread_cond_wait(&pool->not_empty, &pool->lock);
    task = dequeue();  // Might crash on empty queue!
    
    // RIGHT - while loop handles spurious wakeups
    while (pool->queue_size == 0 && !pool->shutdown)
        pthread_cond_wait(&pool->not_empty, &pool->lock);
    
  2. Bug: Executing task while holding the lock
    // WRONG - blocks all other workers during task execution!
    pthread_mutex_lock(&pool->lock);
    task = dequeue(pool);
    task->function(task->arg);  // Could take seconds!
    pthread_mutex_unlock(&pool->lock);
    
    // RIGHT - release lock before executing
    pthread_mutex_lock(&pool->lock);
    task = dequeue(pool);
    pthread_mutex_unlock(&pool->lock);
    task->function(task->arg);  // Other workers can proceed
    
  3. Bug: Race condition during shutdown
    // WRONG - worker might miss the shutdown signal
    if (pool->shutdown) break;  // Checked without lock!
    pthread_cond_wait(...);     // Might wait forever
    
    // RIGHT - check with lock held, use broadcast for shutdown
    pthread_mutex_lock(&pool->lock);
    while (queue_empty && !pool->shutdown) {
        pthread_cond_wait(...);
    }
    if (pool->shutdown && queue_empty) {
        pthread_mutex_unlock(&pool->lock);
        break;
    }
    
  4. Bug: Memory leak on rejected tasks during shutdown
    // WRONG - caller doesn't know task was rejected
    if (pool->shutdown) {
        pthread_mutex_unlock(&pool->lock);
        return;  // task memory leaked!
    }
    
    // RIGHT - return error code, let caller handle cleanup
    if (pool->shutdown) {
        pthread_mutex_unlock(&pool->lock);
        free(task);
        return -1;  // ESHUTDOWN
    }
    

Project 22: Signal-Safe Printf

Attribute Value
Language C (alt: Rust)
Difficulty Advanced
Time Weekend
Chapters 8, 12

What you'll build: A tiny printf-like facility (sio) that is safe to call from signal handlers using only async-signal-safe operations.

Why it matters: Forces you to understand why printf, malloc, and most libc functions are unsafe in handlers.

Core challenges:

  • Avoiding all non-async-signal-safe functions
  • Implementing integer/string formatting with only write(2)
  • Testing under high-frequency signal delivery

Real World Outcome:

$ ./sio_demo
================================================================================
                    SIGNAL-SAFE I/O (SIO) DEMONSTRATION
================================================================================
[TEST 1] Basic output from main()
sio_puts: Hello from signal-safe I/O!
sio_putl: The answer is 42
sio_puthex: Address = 0x7fff5fbff8c0

[TEST 2] Signal handler output (SIGUSR1)
$ kill -USR1 $(pgrep sio_demo)
[HANDLER] Caught signal 10 (SIGUSR1)
[HANDLER] Handler invoked 1 time(s)
[HANDLER] Current errno preserved: 0

[TEST 3] Rapid signal delivery stress test
Sending 10000 SIGALRM signals at 10000 Hz...
[HANDLER] Signal count: 1000
[HANDLER] Signal count: 2000
[HANDLER] Signal count: 5000
[HANDLER] Signal count: 10000

[RESULT] All 10000 signals handled
[RESULT] No crashes, no corruption, no deadlocks
[RESULT] Printf equivalent calls in handler: 0 (verified safe)

$ ./sio_demo --compare-with-printf
================================================================================
                    SAFETY COMPARISON: SIO vs PRINTF
================================================================================
[SETUP] Installing SIGALRM handler that prints a message
[SETUP] Handler will fire every 100 microseconds

[TEST] Main thread calling malloc() in a loop...

--- Using printf() in handler (UNSAFE) ---
[MAIN] Iteration 1000...
[MAIN] Iteration 2000...
[DEADLOCK DETECTED] Program hung after 2847 iterations
[CAUSE] printf() called from handler while main held stdio lock

--- Using sio_puts() in handler (SAFE) ---
[MAIN] Iteration 1000...
[HANDLER] tick 50
[MAIN] Iteration 2000...
[HANDLER] tick 100
[MAIN] Iteration 10000...
[HANDLER] tick 500

[RESULT] Completed 10000 iterations with 500 handler invocations
[RESULT] No deadlocks with async-signal-safe sio functions

$ ./sio_demo --format-test
================================================================================
                    FORMAT SPECIFIER TESTS
================================================================================
Testing sio_printf() format specifiers:

sio_printf("Integer: %d\n", -42)     -> Integer: -42
sio_printf("Unsigned: %u\n", 42)     -> Unsigned: 42
sio_printf("Hex: 0x%x\n", 255)       -> Hex: 0xff
sio_printf("Long: %ld\n", 1234567890123) -> Long: 1234567890123
sio_printf("String: %s\n", "hello")  -> String: hello
sio_printf("Pointer: %p\n", ptr)     -> Pointer: 0x7fff5fbff8c0
sio_printf("Percent: %%\n")          -> Percent: %
sio_printf("Width: %10d\n", 42)      -> Width:         42
sio_printf("Multiple: %s=%d\n", "x", 5) -> Multiple: x=5

[RESULT] All format specifiers working correctly
[RESULT] No malloc, no stdio, only write(2) syscalls

The Core Question You're Answering: Why can't you call printf() from a signal handler, and how do you build output functions that are safe to call from any context?

Concepts You Must Understand First:

  • Async-signal-safety (CS:APP 8.5.5) - Which functions can be safely called from signal handlers and why most cannot
  • Reentrancy (TLPI 21.1.2) - What happens when a function is interrupted and called again before completing
  • Signal delivery semantics (CS:APP 8.5) - How signals interrupt execution at arbitrary points
  • The write(2) syscall (TLPI 4.3) - The only safe way to output from a signal handler
  • Errno preservation (TLPI 21.1.3) - Why handlers must save and restore errno
  • Lock-free programming basics (TLPI 21.1.2) - Why mutexes in handlers cause deadlocks

Questions to Guide Your Design:

  1. Why is printf() not async-signal-safe? What specific resources does it use that cause problems?
  2. How do you convert an integer to a string without calling sprintf(), snprintf(), or any memory allocation?
  3. What buffer should you use for formatting? Stack-allocated? Static? What are the tradeoffs?
  4. How do you handle negative numbers in your integer-to-string conversion?
  5. Should sio functions buffer output or write immediately? What does buffering require that makes it unsafe?
  6. How do you implement hexadecimal output without using lookup tables that might not be in cache?
  7. What happens if write(2) is interrupted by another signal? How do you handle partial writes?
  8. How do you test that your implementation is truly async-signal-safe?

Thinking Exercise:

Before coding, analyze why this handler deadlocks:

pthread_mutex_t stdio_lock = PTHREAD_MUTEX_INITIALIZER;
char buffer[1024];

void safe_looking_print(const char *msg) {
    pthread_mutex_lock(&stdio_lock);
    strcpy(buffer, msg);
    printf("%s\n", buffer);
    pthread_mutex_unlock(&stdio_lock);
}

void handler(int sig) {
    safe_looking_print("Signal received!");  // <-- Why does this deadlock?
}

int main() {
    signal(SIGINT, handler);
    while (1) {
        safe_looking_print("Main loop iteration");
    }
}

Trace through: What happens if SIGINT arrives while main() is between pthread_mutex_lock and pthread_mutex_unlock?

Now consider: Would making the mutex recursive solve the problem? (Hint: What about printf's internal locks?)

The Interview Questions They'll Ask:

  1. "What makes a function async-signal-safe? Give examples of safe and unsafe functions."
    • Expected answer: A function is async-signal-safe if it can be safely called from a signal handler, even if the main program was interrupted in the middle of the same function. Safe: write(), _exit(), signal(). Unsafe: printf(), malloc(), any function using locks or global state. The key issue is reentrancy and internal locks.
  2. "Why is malloc() not async-signal-safe?"
    • Expected answer: malloc() uses internal locks to protect the heap data structures. If a signal interrupts malloc() while it holds the lock, and the handler calls malloc(), you get deadlock. Also, malloc() may be in the middle of updating heap metadata, leaving it in an inconsistent state.
  3. "How would you implement a signal handler that needs to log messages?"
    • Expected answer: Use only write(2) for output. Pre-format simple messages as string constants. For dynamic data, implement integer-to-string conversion without malloc. Consider using a pipe or signal-safe queue to defer complex logging to the main thread.
  4. "Explain the errno problem in signal handlers and how to solve it."
    • Expected answer: Many async-signal-safe functions (like write()) can set errno. If the handler modifies errno and the main code was about to check errno from its own syscall, the result is corrupted. Solution: Save errno at handler entry, restore before return.
  5. "What's the difference between reentrant and thread-safe?"
    • Expected answer: Thread-safe means safe when called concurrently from multiple threads (usually via locks). Reentrant means safe when interrupted and re-invoked before completing (no global/static state, no locks). All reentrant functions are thread-safe, but not vice versa. Async-signal-safe requires reentrancy.
  6. "How would you implement a printf-like format string parser that's async-signal-safe?"
    • Expected answer: Parse the format string character by character. For each specifier, convert the value to a string using stack-local buffers and manual conversion (repeated division for integers). Accumulate output in a stack buffer, then call write() once. No dynamic allocation, no stdio.

Hints in Layers:

Layer 1 - Core Output Primitive:

// The ONLY function we can use for output in a signal handler
ssize_t sio_write(const char *s, size_t n) {
    size_t remaining = n;
    const char *p = s;

    while (remaining > 0) {
        ssize_t written = write(STDOUT_FILENO, p, remaining);
        if (written < 0) {
            if (errno == EINTR) continue;  // Interrupted, retry
            return -1;  // Real error
        }
        remaining -= written;
        p += written;
    }
    return n;
}

// Wrapper for null-terminated strings
ssize_t sio_puts(const char *s) {
    return sio_write(s, strlen(s));
}

Layer 2 - Integer to String (No malloc!):

// Convert integer to string in caller-provided buffer
// Returns pointer to start of number within buffer
char *sio_itoa(long value, char *buf, size_t bufsize) {
    char *p = buf + bufsize - 1;
    *p = '\0';

    int negative = (value < 0);
    // Negate in unsigned arithmetic so LONG_MIN doesn't overflow
    unsigned long uval = negative ? -(unsigned long)value : (unsigned long)value;

    // Build string backwards
    do {
        *--p = '0' + (uval % 10);
        uval /= 10;
    } while (uval > 0 && p > buf);

    if (negative && p > buf) {
        *--p = '-';
    }

    return p;  // Start of the number string
}

// Output a long integer
ssize_t sio_putl(long value) {
    char buf[32];  // Stack allocated!
    char *s = sio_itoa(value, buf, sizeof(buf));
    return sio_puts(s);
}

Layer 3 - Hexadecimal Output:

ssize_t sio_puthex(unsigned long value) {
    char buf[20];
    char *p = buf + sizeof(buf) - 1;
    *p = '\0';

    if (value == 0) {
        *--p = '0';
    } else {
        while (value > 0 && p > buf) {
            int digit = value & 0xF;
            *--p = (digit < 10) ? ('0' + digit) : ('a' + digit - 10);
            value >>= 4;
        }
    }

    // Add "0x" prefix
    *--p = 'x';
    *--p = '0';

    return sio_puts(p);
}

Layer 4 - Signal Handler Pattern:

volatile sig_atomic_t signal_count = 0;

void handler(int sig) {
    // CRITICAL: Save and restore errno
    int saved_errno = errno;

    signal_count++;  // sig_atomic_t is safe to modify

    // Safe output
    sio_puts("[HANDLER] Signal ");
    sio_putl(sig);
    sio_puts(" received (count: ");
    sio_putl(signal_count);
    sio_puts(")\n");

    errno = saved_errno;  // Restore before return
}

Layer 5 - Simple Format String Parser:

#include <stdarg.h>

// Minimal printf subset: %s, %d, %ld, %x, %p, %%
void sio_printf(const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);

    char buf[32];
    const char *p = fmt;

    while (*p) {
        if (*p != '%') {
            sio_write(p, 1);
            p++;
            continue;
        }

        p++;  // Skip '%'
        if (*p == '\0') break;  // Lone '%' at end of format: stop, don't read past it
        switch (*p) {
            case 'd': {
                int val = va_arg(ap, int);
                sio_puts(sio_itoa(val, buf, sizeof(buf)));
                break;
            }
            case 'l':
                p++;
                if (*p == 'd') {
                    long val = va_arg(ap, long);
                    sio_puts(sio_itoa(val, buf, sizeof(buf)));
                }
                break;
            case 's': {
                char *s = va_arg(ap, char*);
                sio_puts(s ? s : "(null)");
                break;
            }
            case 'x': {
                unsigned val = va_arg(ap, unsigned);
                sio_puthex(val);
                break;
            }
            case 'p': {
                void *ptr = va_arg(ap, void*);
                sio_puthex((unsigned long)ptr);
                break;
            }
            case '%':
                sio_write("%", 1);
                break;
        }
        p++;
    }

    va_end(ap);
}

Books That Will Help:

Book                    Chapters             What You'll Learn
CS:APP 3e               8.5                  Signal concepts, async-signal-safety, handler design
TLPI                    21-22                Signals, signal handlers, async-signal-safe functions (comprehensive list)
APUE 3e                 10                   Signals (POSIX perspective)
OSTEP                   Ch. 5 (Process API)  Understanding how signals fit with the process model
Secure Coding in C/C++  Ch. 5                Signal handling vulnerabilities

Common Pitfalls & Debugging:

  1. Bug: Forgetting to save/restore errno
    void handler(int sig) {
        // WRONG - corrupts errno if main code is checking it
        write(STDOUT_FILENO, "signal\n", 7);  // write() might set errno
    }
    
    // RIGHT
    void handler(int sig) {
        int saved_errno = errno;
        write(STDOUT_FILENO, "signal\n", 7);
        errno = saved_errno;
    }
    
  2. Bug: Using sprintf() "because it doesn't malloc"
    // WRONG - sprintf uses stdio buffers, internal locks
    void handler(int sig) {
        char buf[64];
        sprintf(buf, "Signal %d\n", sig);  // NOT async-signal-safe!
        write(STDOUT_FILENO, buf, strlen(buf));
    }
    
    // RIGHT - manual conversion
    void handler(int sig) {
        char buf[32];
        char *p = sio_itoa(sig, buf, sizeof(buf));
        sio_puts("Signal ");
        sio_puts(p);
        sio_puts("\n");
    }
    
  3. Bug: Static buffers shared between handler and main
    // WRONG - handler might corrupt buffer while main is using it
    static char shared_buffer[256];
    
    void handler(int sig) {
        strcpy(shared_buffer, "interrupted!");  // Race condition!
    }
    
    // RIGHT - use stack-local buffers in handler
    void handler(int sig) {
        char local_buf[256];  // Each handler invocation gets its own
        // ...
    }
    
  4. Bug: Ignoring partial writes
    // WRONG - write() might not write everything
    void handler(int sig) {
        char msg[] = "Very long message...";
        write(STDOUT_FILENO, msg, sizeof(msg));  // Might only write part!
    }
    
    // RIGHT - loop until all bytes written
    void sio_write_all(const char *buf, size_t n) {
        while (n > 0) {
            ssize_t written = write(STDOUT_FILENO, buf, n);
            if (written <= 0) {
                if (errno == EINTR) continue;
                return;  // Error
            }
            buf += written;
            n -= written;
        }
    }
    

Project 23: Performance Profiler

Attribute   Value
Language    C (alt: C++, Rust)
Difficulty  Advanced
Time        1-2 weeks
Chapters    5, 8, 3

What you'll build: A sampling profiler that periodically interrupts a program, records where it is, and reports the hottest functions.

Why it matters: You'll understand what profilers can and can't tell you: sampling bias, statistical error, and observer (Heisenberg) effects.

Core challenges:

  • Implementing timer-based sampling (SIGPROF/ITIMER_PROF)
  • Capturing instruction pointers and aggregating into reports
  • Symbolizing addresses back to function names

Real World Outcome:

$ ./profiler --sample-rate=1000 -- ./target_program
================================================================================
                    SAMPLING PROFILER
================================================================================
[CONFIG] Sample rate: 1000 Hz (1ms interval)
[CONFIG] Using ITIMER_PROF (CPU time only)
[START] Profiling ./target_program (PID: 12847)

[PROGRESS] 1000 samples collected...
[PROGRESS] 5000 samples collected...
[PROGRESS] 10000 samples collected...

[END] Target exited with status 0
[STATS] Total samples: 14,293
[STATS] Unique instruction pointers: 847
[STATS] Profiling overhead: ~2.3%

================================================================================
                    FLAT PROFILE (Top 20 Functions)
================================================================================
  %time    samples   function                          source:line
 -------  --------   --------------------------------  ----------------------
  23.4%      3,345   matrix_multiply                   matrix.c:142
  18.7%      2,673   vector_dot_product                linalg.c:89
  12.1%      1,730   quicksort_partition               sort.c:67
   8.9%      1,272   hash_table_lookup                 hash.c:234
   6.2%        886   memcpy@plt                        (libc)
   4.8%        686   strcmp@plt                        (libc)
   3.7%        529   parse_json_object                 json.c:456
   2.9%        414   allocate_buffer                   buffer.c:78
   2.4%        343   compute_checksum                  crypto.c:123
   2.1%        300   read_file_chunk                   io.c:89
   1.8%        257   (unknown)                         0x7f3a2b4c5d6e
   ...
  13.0%      1,858   (other - 837 functions)

================================================================================
                    CALL GRAPH PROFILE
================================================================================
                         |--- vector_dot_product (18.7%)
matrix_multiply (23.4%) -|
                         |--- memcpy@plt (2.1% attributed)

                           |--- quicksort_partition (12.1%)
process_data (32.1%) -----|--- hash_table_lookup (8.9%)
                           |--- parse_json_object (3.7%)

$ ./profiler --flame-graph -- ./target_program > profile.svg
================================================================================
                    FLAME GRAPH GENERATION
================================================================================
[SAMPLING] Collecting call stacks at 997 Hz...
[STACKS] 8,234 unique stack traces captured
[RENDER] Generating SVG flame graph...

[OUTPUT] Flame graph written to: profile.svg (234 KB)
[TIP] Open in browser: firefox profile.svg

$ ./profiler --compare before.prof after.prof
================================================================================
                    DIFFERENTIAL PROFILE
================================================================================
Comparing: before.prof (14,293 samples) vs after.prof (13,892 samples)

Improved (faster):
  function                  before    after     delta
  ---------------------------------------------------
  matrix_multiply           23.4%     8.2%     -15.2%  (optimized!)
  vector_dot_product        18.7%    12.1%      -6.6%

Regressed (slower):
  function                  before    after     delta
  ---------------------------------------------------
  cache_lookup               1.2%     4.8%      +3.6%  (new bottleneck)
  validate_input             0.8%     2.1%      +1.3%

[SUMMARY] Overall improvement: 18.3% less CPU time in hot path

$ ./profiler --self-profile --overhead-test
================================================================================
                    PROFILER OVERHEAD ANALYSIS
================================================================================
[TEST] Running workload without profiling: 4.823s
[TEST] Running workload with profiling at 100 Hz: 4.831s (+0.17%)
[TEST] Running workload with profiling at 1000 Hz: 4.935s (+2.32%)
[TEST] Running workload with profiling at 10000 Hz: 5.647s (+17.09%)

[RECOMMENDATION] Use 1000 Hz for production profiling
[WARNING] Rates above 5000 Hz introduce significant overhead

The Core Question You're Answering: How do profilers like gprof, perf, and pprof measure where your program spends its time, and what are the limitations of statistical sampling?

Concepts You Must Understand First:

  • Timer signals (SIGPROF/SIGVTALRM) (TLPI Ch. 23) - Different timers measure wall-clock, user CPU, or system CPU time
  • Signal handlers and context (CS:APP 8.5) - How the interrupted context provides the instruction pointer
  • Program counter / instruction pointer (CS:APP 3.4) - The CPU register that tells you where execution is
  • Symbol tables and DWARF (CS:APP 7.5) - How to map addresses back to function names and line numbers
  • Statistical sampling theory (CS:APP 5.14) - Why sampling works and its inherent error margins
  • ASLR and PIE (CS:APP 7.12) - Address randomization affects address-to-symbol mapping

Questions to Guide Your Design:

  1. What's the difference between ITIMER_REAL, ITIMER_VIRTUAL, and ITIMER_PROF? Which should a CPU profiler use?
  2. How do you get the instruction pointer (RIP) from inside a signal handler? What's in the ucontext_t?
  3. If you sample at 1000 Hz and a function runs for 1ms, how many samples do you expect? What if it runs for 0.5ms?
  4. How do you aggregate samples efficiently? A hash table from IP to count? What about collisions?
  5. How do you convert an instruction pointer to a function name? What tools/libraries can help?
  6. What happens to profiling accuracy if a function is inlined? Can you still measure it?
  7. How do you capture call stacks, not just leaf functions? What are the challenges with frame pointers?
  8. Why might your profiler show different results on different runs? Is this a bug or expected behavior?

Thinking Exercise:

Before coding, analyze this sampling scenario:

Time (ms):  0    1    2    3    4    5    6    7    8    9    10
            |----|----|----|----|----|----|----|----|----|----|
Function A: ████████████████                                    (4ms, 40%)
Function B:                 ████████                            (2ms, 20%)
Function C:                         ████████████████████████    (4ms, 40%)

Sampling at 1000 Hz (every 1ms):
Sample #:   1    2    3    4    5    6    7    8    9    10
Expected:   A    A    A    A    B    B    C    C    C    C

Now consider: What if the timer fires at t=0.5, 1.5, 2.5… (phase-shifted by 0.5ms)?

  • Which functions would we sample?
  • If function B always runs at exactly t=4.0 to t=6.0, and our timer fires at t=0.5, 1.5, 2.5, 3.5, 4.5, 5.5…
  • How many samples of B would we get?

This illustrates aliasing - a real problem in sampling profilers!
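
You can make the aliasing effect concrete in a few lines. This sketch (a hypothetical workload: a function active for the first `duty` milliseconds of every 1 ms period, sampled once per millisecond) shows that the reported time depends entirely on the timer's phase:

```c
/* Count samples that land inside the function, for a workload active
   during the first `duty` ms of every 1 ms period, sampled every 1 ms
   starting at `phase` ms. */
int aliased_hits(double duty, double phase, int nsamples) {
    int hits = 0;
    for (int i = 0; i < nsamples; i++) {
        double t = phase + i;       /* sample times: phase, phase+1, ... */
        double frac = t - (long)t;  /* position within the 1 ms period */
        if (frac < duty)
            hits++;
    }
    return hits;
}
```

With duty = 0.4 (the function really uses 40% of the CPU), phase 0.0 reports 100% and phase 0.5 reports 0%: the sample clock is perfectly correlated with the workload. Real profilers mitigate this by choosing a rate that avoids round multiples of common periods (note the 997 Hz in the flame-graph run above).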

The Interview Questions They'll Ask:

  1. "How does a sampling profiler work? What are its advantages over instrumentation?"
    • Expected answer: Sampling profilers periodically interrupt the program (via timer signal) and record where it is (instruction pointer). Advantages: low overhead (constant regardless of call frequency), no code modification needed, works on release binaries. Disadvantages: statistical error, may miss short functions, can alias with periodic behavior.
  2. "Explain the difference between CPU time and wall-clock time profiling."
    • Expected answer: CPU time (ITIMER_PROF) only counts time when the CPU is executing your code - excludes I/O waits, sleeps, context switches. Wall-clock time (ITIMER_REAL) measures real elapsed time including waits. For CPU-bound code, use CPU time. For I/O-bound or concurrent code, wall-clock may be more useful.
  3. "How do you symbolize an address back to a function name?"
    • Expected answer: Use the symbol table in the ELF binary. Tools: dladdr() for runtime lookup, addr2line for static lookup, libbacktrace or libunwind for full support. Need to handle ASLR (read /proc/self/maps), stripped binaries (no symbols), and inlined functions (DWARF info).
  4. "What is the observer effect in profiling?"
    • Expected answer: Profiling changes the behavior of the program being measured. Signal handlers take CPU time, cache lines get evicted, branches may become less predictable. High sample rates increase overhead. A good profiler minimizes overhead and measures its own impact.
  5. "How would you profile a multithreaded program?"
    • Expected answer: ITIMER_PROF signals go to the thread that consumed the CPU time. Each thread needs its own sample aggregation (or lock-protected shared structure). Consider: should you sample all threads equally or proportionally to CPU usage? Thread IDs help attribute samples.
  6. "Why might a profiler miss a function entirely?"
    • Expected answer: If a function runs for less than the sampling interval (e.g., 0.1ms with 1000 Hz sampling), it may never be sampled. Also: inlined functions don't have separate addresses, leaf functions may be in registers, and short periodic functions may alias with the sample timer.

Hints in Layers:

Layer 1 - Basic Timer Signal Setup:

#define _GNU_SOURCE   // Needed for REG_RIP in <ucontext.h> on glibc
#include <sys/time.h>
#include <signal.h>
#include <ucontext.h>

volatile sig_atomic_t sample_count = 0;
static struct sample { void *ip; } samples[1000000];

void profile_handler(int sig, siginfo_t *si, void *context) {
    ucontext_t *uc = (ucontext_t *)context;

    // Get instruction pointer from interrupted context
    // Linux x86-64:
    void *ip = (void *)uc->uc_mcontext.gregs[REG_RIP];

    // macOS x86-64:
    // void *ip = (void *)uc->uc_mcontext->__ss.__rip;

    // Store sample (async-signal-safe: just array write)
    if (sample_count < 1000000) {
        samples[sample_count++].ip = ip;
    }
}

void start_profiling(int hz) {
    // Install signal handler with SA_SIGINFO to get context
    struct sigaction sa;
    sa.sa_sigaction = profile_handler;
    sa.sa_flags = SA_RESTART | SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGPROF, &sa, NULL);

    // Set up interval timer (microseconds)
    struct itimerval timer;
    timer.it_interval.tv_sec = 0;
    timer.it_interval.tv_usec = 1000000 / hz;  // e.g., 1000 for 1ms
    timer.it_value = timer.it_interval;
    setitimer(ITIMER_PROF, &timer, NULL);
}

Layer 2 - Sample Aggregation:

#include <stdlib.h>  // qsort for sort-and-count aggregation

typedef struct {
    void *ip;
    unsigned long count;
    char *symbol;  // Resolved later
} profile_entry_t;

// Simple aggregation: sort and count
static int compare_ip(const void *a, const void *b) {
    const struct sample *sa = a, *sb = b;
    return (sa->ip > sb->ip) - (sa->ip < sb->ip);
}

void aggregate_samples(void) {
    // Sort samples by IP for counting
    qsort(samples, sample_count, sizeof(struct sample), compare_ip);

    // Count consecutive duplicates
    void *current_ip = NULL;
    int current_count = 0;

    for (int i = 0; i < sample_count; i++) {
        if (samples[i].ip == current_ip) {
            current_count++;
        } else {
            if (current_ip != NULL) {
                add_to_profile(current_ip, current_count);
            }
            current_ip = samples[i].ip;
            current_count = 1;
        }
    }
    if (current_ip != NULL) {
        add_to_profile(current_ip, current_count);
    }
}

Layer 3 - Address Symbolization:

#define _GNU_SOURCE
#include <dlfcn.h>

// Runtime symbolization using dladdr
const char *symbolize(void *addr) {
    Dl_info info;
    if (dladdr(addr, &info) && info.dli_sname) {
        return info.dli_sname;
    }
    return "(unknown)";
}

// For better symbolization, use addr2line or libbacktrace
void symbolize_with_addr2line(void *addr, char *out, size_t outsize) {
    char cmd[256];
    snprintf(cmd, sizeof(cmd),
             "addr2line -f -e /proc/self/exe %p 2>/dev/null", addr);

    FILE *fp = popen(cmd, "r");
    if (fp == NULL || fgets(out, outsize, fp) == NULL) {
        // Fall back to the raw address if addr2line is unavailable
        snprintf(out, outsize, "0x%lx", (unsigned long)addr);
    }
    if (fp) {
        pclose(fp);
    }
    // Strip newline
    out[strcspn(out, "\n")] = '\0';
}

Layer 4 - Stack Trace Capture:

#include <execinfo.h>

#define MAX_STACK_DEPTH 64

typedef struct {
    void *stack[MAX_STACK_DEPTH];
    int depth;
} stack_sample_t;

stack_sample_t stack_samples[100000];
volatile sig_atomic_t stack_sample_count = 0;

void profile_handler_with_stack(int sig, siginfo_t *si, void *context) {
    if (stack_sample_count >= 100000) return;

    // Capture call stack
    // NOTE: backtrace() is not strictly async-signal-safe!
    // For production, use frame pointer walking or libunwind
    stack_sample_t *s = &stack_samples[stack_sample_count];
    s->depth = backtrace(s->stack, MAX_STACK_DEPTH);
    stack_sample_count++;
}

// Print stack traces (for debugging)
void print_stack(stack_sample_t *s) {
    char **symbols = backtrace_symbols(s->stack, s->depth);
    if (symbols == NULL) return;  // backtrace_symbols mallocs; may fail
    for (int i = 0; i < s->depth; i++) {
        printf("  %s\n", symbols[i]);
    }
    free(symbols);
}

Layer 5 - Report Generation:

void print_flat_profile(void) {
    // Sort entries by sample count (descending)
    qsort(profile_entries, entry_count, sizeof(profile_entry_t),
          compare_by_count_desc);

    printf("================================================================================\n");
    printf("                    FLAT PROFILE\n");
    printf("================================================================================\n");
    printf("  %%time    samples   function\n");
    printf(" -------  --------   --------------------------------\n");

    for (int i = 0; i < entry_count && i < 20; i++) {
        double pct = 100.0 * profile_entries[i].count / sample_count;
        printf("  %5.1f%%  %8lu   %s\n",
               pct,
               profile_entries[i].count,
               profile_entries[i].symbol);
    }
}

// Flame graph output (folded stacks format for flamegraph.pl)
void output_folded_stacks(FILE *out) {
    for (int i = 0; i < stack_sample_count; i++) {
        stack_sample_t *s = &stack_samples[i];

        // Print stack frames separated by semicolons (bottom to top)
        for (int j = s->depth - 1; j >= 0; j--) {
            if (j < s->depth - 1) fprintf(out, ";");
            fprintf(out, "%s", symbolize(s->stack[j]));
        }
        fprintf(out, " 1\n");  // Weight of 1 per sample
    }
}

Books That Will Help:

Book                         Chapters   What You'll Learn
CS:APP 3e                    5.14, 8.5  Performance measurement, signals and handlers
TLPI                         23         Timer signals (ITIMER_*), interval timers
Systems Performance (Gregg)  5-6        Profiling methodology, CPU analysis
APUE 3e                      10, 14     Signals, interval timers
BPF Performance Tools        13         CPU profiling with modern tools

Common Pitfalls & Debugging:

  1. Bug: Using wall-clock timer for CPU profiling
    // WRONG - counts time sleeping, not CPU time
    setitimer(ITIMER_REAL, &timer, NULL);
    // If program sleeps 90% of the time, you sample sleeping!
    
    // RIGHT - counts only user + system CPU time
    setitimer(ITIMER_PROF, &timer, NULL);
    
  2. Bug: Forgetting to handle ASLR
    // WRONG - addresses change each run!
    printf("Hot function at: %p\n", ip);
    // Next run, same function is at a different address
    
    // RIGHT - subtract base address or use dladdr
    Dl_info info;
    if (dladdr(ip, &info)) {
        ptrdiff_t offset = (char*)ip - (char*)info.dli_fbase;
        printf("%s+0x%lx\n", info.dli_fname, (unsigned long)offset);
    }
    
  3. Bug: Calling non-async-signal-safe functions in handler
    // WRONG - printf, malloc, dladdr are NOT async-signal-safe
    void handler(int sig, siginfo_t *si, void *ctx) {
        printf("Sample at %p\n", get_ip(ctx));  // May deadlock!
        char *sym = symbolize(get_ip(ctx));     // Calls malloc!
    }
    
    // RIGHT - only store data, process later
    void handler(int sig, siginfo_t *si, void *ctx) {
        if (sample_count < MAX_SAMPLES) {
            samples[sample_count++] = get_ip(ctx);  // Just a write
        }
    }
    
  4. Bug: High sample rate causing measurement distortion
    // WRONG - 100,000 Hz sampling
    timer.it_interval.tv_usec = 10;  // 10us interval
    // Handler overhead dominates! Measuring the profiler, not the program.
    
    // RIGHT - 100-1000 Hz is usually sufficient
    timer.it_interval.tv_usec = 1000;  // 1ms interval (1000 Hz)
    // Rule of thumb: if overhead > 5%, reduce sample rate
    

Project 24: Memory Leak Detector

Attribute   Value
Language    C (alt: C++)
Difficulty  Advanced
Time        1-2 weeks
Chapters    7, 9, 3

What you'll build: A shared library (libleakcheck.so) that interposes malloc/free at runtime, tracks allocations, and emits leak reports with stack traces.

Why it matters: Combines linking (interposition) and memory concepts into a practical debugging tool.

Core challenges:

  • Using LD_PRELOAD to intercept allocation APIs
  • Avoiding recursion pitfalls (โ€œno malloc in mallocโ€)
  • Recording useful diagnostics (sizes, call stacks)

Real World Outcome

When complete, your leak detector will produce output like this:

$ gcc -g -o leaky_app leaky_app.c
$ gcc -shared -fPIC -o libleakcheck.so leakcheck.c -ldl -lunwind

$ LD_PRELOAD=./libleakcheck.so ./leaky_app
================================================================================
                    MEMORY LEAK DETECTOR - Runtime Analysis
================================================================================
[INIT] libleakcheck.so loaded, intercepting malloc/calloc/realloc/free
[INIT] Tracking allocations with stack trace depth: 8

[ALLOC] malloc(64) = 0x55a3b2c00010      [leaky_app.c:23 in main()]
[ALLOC] malloc(128) = 0x55a3b2c00060     [leaky_app.c:24 in main()]
[ALLOC] calloc(10, 32) = 0x55a3b2c000f0  [leaky_app.c:27 in process_data()]
[ALLOC] malloc(256) = 0x55a3b2c00200     [leaky_app.c:31 in process_data()]
[FREE]  free(0x55a3b2c00010)             [leaky_app.c:45 in cleanup()]
[FREE]  free(0x55a3b2c000f0)             [leaky_app.c:46 in cleanup()]

================================================================================
                         LEAK REPORT - Program Exit
================================================================================
2 blocks leaked (384 bytes total)

Block 1: 0x55a3b2c00060 (128 bytes)
  Allocated at: leaky_app.c:24 in main()
  Call stack:
    #0  main() at leaky_app.c:24
    #1  __libc_start_call_main at libc.so.6
    #2  __libc_start_main at libc.so.6
    #3  _start

Block 2: 0x55a3b2c00200 (256 bytes)
  Allocated at: leaky_app.c:31 in process_data()
  Call stack:
    #0  process_data() at leaky_app.c:31
    #1  main() at leaky_app.c:28
    #2  __libc_start_call_main at libc.so.6
    #3  _start

--------------------------------------------------------------------------------
Summary: 4 allocations, 2 frees, 2 leaks (384 bytes)
Peak memory usage: 480 bytes at timestamp 0.003s
================================================================================

The Core Question You're Answering

"How can we transparently intercept and track every memory allocation in a running program without modifying its source code, and use this to detect memory leaks with precise source location information?"

This project teaches you that the dynamic linker is programmable infrastructure. By understanding symbol resolution order and interposition, you can inject behavior into any dynamically-linked program. The same mechanism powers profilers, sanitizers, and debugging tools used in production systems.

Concepts You Must Understand First

Before writing code, ensure you can explain:

  • Dynamic Linking & Symbol Resolution (CS:APP 7.12, TLPI Ch. 41) - LD_PRELOAD exploits the linker's symbol search order to let your library "shadow" libc functions
  • Position-Independent Code (PIC) (CS:APP 7.12) - Shared libraries must use PIC; understand GOT/PLT indirection
  • dlsym and RTLD_NEXT (TLPI 42.1) - You need to call the real malloc after your wrapper; RTLD_NEXT finds the next symbol in search order
  • Stack Unwinding (CS:APP 3.7, libunwind docs) - Capturing call stacks requires walking the frame chain or using libunwind/backtrace()
  • Thread Safety (CS:APP Ch. 12) - Your tracking data structures must handle concurrent allocations
  • Signal Safety (CS:APP 8.5.5) - Some code paths (like atexit handlers) have restrictions on what functions you can call

Questions to Guide Your Design

Answer these before writing code:

  1. How will you store allocation metadata? (Hash table keyed by address? Linked list? What are the tradeoffs?)

  2. How do you get the "real" malloc? (When does dlsym(RTLD_NEXT, "malloc") get called? What if dlsym itself calls malloc?)

  3. What happens if your tracking code calls malloc? (Design a recursion guard. How do you detect and break the cycle?)

  4. How will you capture stack traces? (backtrace() vs libunwind vs manual frame walking. Which is signal-safe?)

  5. When do you emit the leak report? (atexit handler? Destructor function? What about abnormal termination?)

  6. How do you map addresses to source lines? (Runtime: addr2line/dladdr. Or embed DWARF parsing?)

  7. What about realloc? (It can move memory. How do you track the old/new relationship?)

  8. How do you handle calloc? (It might be implemented via malloc internally in some libcs.)

Thinking Exercise: Trace This Interposition

Before implementing, trace through what happens when a program runs with your library:

// leaky.c - compile with: gcc -g -o leaky leaky.c
#include <stdlib.h>
#include <stdio.h>

void helper(void) {
    char *buf = malloc(100);  // Allocation A
    // Oops, forgot to free!
}

int main(void) {
    int *arr = malloc(40);    // Allocation B
    helper();
    free(arr);                // Free B
    return 0;
}

Trace questions:

  1. When LD_PRELOAD=./libleakcheck.so ./leaky starts, in what order are constructors called?

  2. When main() calls malloc(40), trace the symbol resolution:
    • Where does the PLT jump go first?
    • How does your interposed malloc get called?
    • How does your wrapper call the real malloc?
  3. Why is Allocation A (in helper) a leak but Allocation B is not?

  4. If your leak report runs in an atexit handler, what memory is still "live"?

Draw the state of your allocation tracking table after each call:

After malloc(40):   { 0x55...010: {size=40, caller=main:10} }
After malloc(100):  { 0x55...010: {size=40, caller=main:10},
                      0x55...080: {size=100, caller=helper:6} }
After free(arr):    { 0x55...080: {size=100, caller=helper:6} }
At exit: 1 leak detected!

The Interview Questions They'll Ask

Q1: Explain how LD_PRELOAD works and what security implications it has.

Expected answer: LD_PRELOAD tells the dynamic linker to load specified shared libraries before any others. When resolving symbols, the linker searches in order: LD_PRELOAD libraries, then the executable, then DT_NEEDED libraries. This allows "shadowing" symbols like malloc. Security implication: setuid/setgid binaries ignore LD_PRELOAD (AT_SECURE) to prevent privilege escalation. Note also that LD_PRELOAD cannot intercept statically-linked binaries, since no dynamic linker runs at all.

Q2: How would you avoid infinite recursion if your malloc wrapper needs to allocate memory?

Expected answer: Use a thread-local recursion guard. When entering the wrapper, check and set a flag. If already set, call the real malloc directly without tracking. Alternative: use a static buffer for internal allocations, or use mmap directly which doesnโ€™t go through malloc. The guard must be thread-local (using __thread) for correctness in multi-threaded programs.

Q3: Whatโ€™s the difference between using backtrace() and libunwind for stack traces?

Expected answer: backtrace() is simpler (part of glibc) but not signal-safe and may not work well with optimized code missing frame pointers. libunwind is more portable, can be configured for signal-safety, and handles various unwinding methods (DWARF, frame pointers, etc.). For a leak detector, either works, but libunwind is more robust for production use.

Q4: How would you extend this to detect double-frees and use-after-free?

Expected answer: For double-free: keep freed blocks in a "recently freed" quarantine list; if free() is called on an address in quarantine, report double-free. For use-after-free: more complex; could use guard pages (like AddressSanitizer) or probabilistic detection via canary values. Full detection requires memory poisoning and potentially page protection tricks.

Q5: Why might your leak detector report false positives?

Expected answer: (1) Intentional leaks at shutdown (global caches freed by OS). (2) Memory reachable through global pointers but not explicitly freed. (3) Custom allocators that batch-free at exit. (4) Memory still in use when atexit runs. Real leak detectors (Valgrind) do reachability analysis at exit to distinguish "definitely lost" from "still reachable."

Q6: How does Valgrindโ€™s memcheck differ from LD_PRELOAD interposition?

Expected answer: Valgrind runs the program on a synthetic CPU, instrumenting every memory access. This catches more bugs (uninitialized reads, buffer overflows) but with 10-50x slowdown. LD_PRELOAD only intercepts explicit allocation calls, so it's faster (~1.1x overhead) but misses many bug classes. They're complementary: LD_PRELOAD for production monitoring, Valgrind for thorough testing.
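
The quarantine idea from Q4 fits in a few lines. A hypothetical sketch (names and the fixed-size ring buffer are ours): remember the last N freed pointers, and report a double-free when free() sees one of them again. A real tool would also delay reuse of quarantined blocks so the allocator can't hand the address back out.

```c
#include <assert.h>
#include <stddef.h>

/* Ring buffer of recently freed pointers. */
#define QUARANTINE_SLOTS 128

static void *quarantine[QUARANTINE_SLOTS];
static int q_next = 0;

/* Returns 1 if ptr was freed recently (i.e., freeing it again is a bug). */
int is_double_free(void *ptr) {
    for (int i = 0; i < QUARANTINE_SLOTS; i++)
        if (quarantine[i] == ptr)
            return 1;
    return 0;
}

/* Record a pointer at free() time, overwriting the oldest entry. */
void quarantine_add(void *ptr) {
    quarantine[q_next] = ptr;
    q_next = (q_next + 1) % QUARANTINE_SLOTS;
}
```

Your interposed free() would call is_double_free() first, report a hit, then quarantine_add() before (or instead of) calling the real free.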

Hints in Layers

Layer 1 - The Basic Structure

Start with the wrapper skeleton. The key is dlsym(RTLD_NEXT, ...):

#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

static void* (*real_malloc)(size_t) = NULL;
static void  (*real_free)(void*) = NULL;

__attribute__((constructor))
static void init(void) {
    real_malloc = dlsym(RTLD_NEXT, "malloc");
    real_free = dlsym(RTLD_NEXT, "free");
    if (!real_malloc || !real_free) {
        fprintf(stderr, "Error: dlsym failed\n");
        _exit(1);
    }
}

void* malloc(size_t size) {
    void* ptr = real_malloc(size);
    fprintf(stderr, "[ALLOC] malloc(%zu) = %p\n", size, ptr);
    return ptr;
}

void free(void* ptr) {
    fprintf(stderr, "[FREE] free(%p)\n", ptr);
    real_free(ptr);
}

Compile: gcc -shared -fPIC -o libleakcheck.so leakcheck.c -ldl

Layer 2 - The Recursion Problem

Your fprintf calls malloc internally! Add a guard:

static __thread int in_wrapper = 0;

void* malloc(size_t size) {
    if (in_wrapper || !real_malloc) {
        // Bootstrapping or recursive call - use real malloc directly
        if (!real_malloc) {
            real_malloc = dlsym(RTLD_NEXT, "malloc");
        }
        return real_malloc(size);
    }

    in_wrapper = 1;
    void* ptr = real_malloc(size);

    // Now safe to call fprintf, etc.
    fprintf(stderr, "[ALLOC] malloc(%zu) = %p\n", size, ptr);

    in_wrapper = 0;
    return ptr;
}

Layer 3 - Tracking Allocations

Use a hash table to track live allocations:

#define HASH_SIZE 65536

typedef struct alloc_info {
    void* ptr;
    size_t size;
    void* stack[8];
    int stack_depth;
    struct alloc_info* next;
} alloc_info_t;

static alloc_info_t* hash_table[HASH_SIZE];
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static size_t ptr_hash(void* ptr) {
    return ((uintptr_t)ptr >> 4) % HASH_SIZE;
}

static void track_alloc(void* ptr, size_t size) {
    alloc_info_t* info = real_malloc(sizeof(alloc_info_t));
    info->ptr = ptr;
    info->size = size;
    info->stack_depth = backtrace(info->stack, 8);

    size_t idx = ptr_hash(ptr);
    pthread_mutex_lock(&lock);
    info->next = hash_table[idx];
    hash_table[idx] = info;
    pthread_mutex_unlock(&lock);
}

Layer 4 - Stack Trace Symbolization

Convert addresses to function names and line numbers:

#include <execinfo.h>
#include <dlfcn.h>

static void print_stack_trace(void** stack, int depth) {
    char** symbols = backtrace_symbols(stack, depth);
    for (int i = 0; i < depth; i++) {
        Dl_info info;
        if (dladdr(stack[i], &info) && info.dli_sname) {
            fprintf(stderr, "    #%d  %s + 0x%lx\n",
                    i, info.dli_sname,
                    (char*)stack[i] - (char*)info.dli_saddr);
        } else {
            fprintf(stderr, "    #%d  %s\n", i, symbols[i]);
        }
    }
    real_free(symbols);  // backtrace_symbols() malloc'd this; bypass our own wrapper
}

For source lines, shell out to addr2line or embed libdwarf.

Layer 5 - The Leak Report

Register a destructor (the __attribute__((destructor)) below; an atexit handler would also work) to print the remaining allocations:

__attribute__((destructor))
static void report_leaks(void) {
    size_t total_leaked = 0;
    size_t leak_count = 0;

    fprintf(stderr, "\n=== LEAK REPORT ===\n");

    for (int i = 0; i < HASH_SIZE; i++) {
        for (alloc_info_t* info = hash_table[i]; info; info = info->next) {
            leak_count++;
            total_leaked += info->size;
            fprintf(stderr, "Block %p (%zu bytes)\n", info->ptr, info->size);
            print_stack_trace(info->stack, info->stack_depth);
        }
    }

    fprintf(stderr, "\n%zu blocks leaked (%zu bytes total)\n",
            leak_count, total_leaked);
}

Books That Will Help

Book Chapter(s) What You'll Learn
CS:APP 7.12 Position-independent code, dynamic linking, symbol interposition
CS:APP 9.9 Dynamic memory allocation concepts
CS:APP 3.7 Stack structure, frame pointers for unwinding
TLPI 41 Fundamentals of shared libraries
TLPI 42 dlopen, dlsym, RTLD_NEXT, library interposition
Low-Level Programming Ch. 13 Shared libraries and dynamic linking on Linux
Effective C Ch. 6 Dynamic memory management best practices

Common Pitfalls & Debugging

Problem 1: Infinite recursion / stack overflow on startup

Symptom: Program crashes immediately with SIGSEGV in dlsym or printf.

Cause: dlsym or stdio functions call malloc before your constructor runs.

Fix: Use a static buffer for early allocations, or check if real_malloc is NULL:

static char early_buffer[4096];
static size_t early_offset = 0;

void* malloc(size_t size) {
    if (!real_malloc) {
        // Before constructor: bump-allocate from the static buffer
        size_t aligned = (size + 15) & ~15UL;   // keep 16-byte alignment
        if (early_offset + aligned > sizeof(early_buffer))
            return NULL;                        // early pool exhausted
        void* ptr = &early_buffer[early_offset];
        early_offset += aligned;
        return ptr;
    }
    // ... normal path
}

(free() must recognize pointers inside early_buffer and simply ignore them, since they never came from the real allocator.)

Problem 2: Deadlock in multi-threaded programs

Symptom: Program hangs when multiple threads allocate simultaneously.

Cause: Holding the lock while calling fprintf (which may call malloc).

Fix: Copy necessary data, release lock, then log:

void* malloc(size_t size) {
    void* ptr = real_malloc(size);

    pthread_mutex_lock(&lock);
    // Quick insert into hash table
    pthread_mutex_unlock(&lock);

    // Log AFTER releasing lock
    if (!in_wrapper) {
        in_wrapper = 1;
        fprintf(stderr, "[ALLOC] ...\n");
        in_wrapper = 0;
    }
    return ptr;
}

Problem 3: Incorrect leak counts (missing frees or double-counting)

Symptom: Report shows leaks for memory you know was freed.

Cause: Hash table collision handling bug, or realloc not tracked correctly.

Debug: Add verbose logging showing every insert/remove:

$ LD_DEBUG=bindings LD_PRELOAD=./libleakcheck.so ./app 2>&1 | grep -E 'malloc|free'

Problem 4: Stack traces missing function names

Symptom: Stack trace shows only addresses like 0x55a3b2c00060.

Cause: Program compiled without debug symbols, or stripped binary.

Fix: Compile with -g and -rdynamic (exports symbols for backtrace). For release builds, use addr2line:

char cmd[256];
snprintf(cmd, sizeof(cmd), "addr2line -e /proc/self/exe %p", addr);
system(cmd);   // or use popen() to capture the output instead of printing it

Project 25: Debugger (ptrace-based)

Attribute Value
Language C (alt: C++, Rust)
Difficulty Master
Time 1 month+
Chapters 3, 7, 8

What you'll build: A tiny debugger (mydb) that runs a child process under control, sets breakpoints, single-steps, and inspects registers/memory.

Why it matters: The ultimate test of understanding machine-level code, process control, and system calls.

Core challenges:

  • Controlling tracee with ptrace stop/resume semantics
  • Implementing software breakpoints (patching with int3)
  • Building a command loop (break, run, step, continue, regs, x)

Real World Outcome

When complete, your debugger will produce output like this:

$ ./mydb ./target_program
================================================================================
                         MYDB - Minimal x86-64 Debugger
================================================================================
[INFO] Loaded executable: ./target_program
[INFO] Entry point: 0x401000
[INFO] Text section: 0x401000 - 0x401fff
[INFO] Type 'help' for available commands

mydb> break main
[BREAK] Breakpoint 1 set at 0x401126 <main>

mydb> break 0x40113c
[BREAK] Breakpoint 2 set at 0x40113c <main+22>

mydb> run
[RUN] Starting program: ./target_program
[STOP] Hit breakpoint 1 at 0x401126 <main>

mydb> regs
================================================================================
                              REGISTER STATE
================================================================================
  rax = 0x0000000000000000    rbx = 0x0000000000000000
  rcx = 0x00007ffff7fa5040    rdx = 0x00007fffffffe0a8
  rsi = 0x00007fffffffe098    rdi = 0x0000000000000001
  rbp = 0x0000000000000000    rsp = 0x00007fffffffe088
  r8  = 0x0000000000000000    r9  = 0x00007ffff7fc9040
  r10 = 0x00007ffff7fc3908    r11 = 0x00007ffff7fe17c0
  r12 = 0x0000000000401000    r13 = 0x00007fffffffe090
  r14 = 0x0000000000000000    r15 = 0x0000000000000000
  rip = 0x0000000000401126    eflags = 0x00000246 [PF ZF IF]

mydb> x/8x $rsp
0x7fffffffe088:  0x00007ffff7df1b6b  0x0000000000000001
0x7fffffffe098:  0x00007fffffffe3a8  0x0000000000000000
0x7fffffffe0a8:  0x00007fffffffe3c0  0x00007fffffffe3d5
0x7fffffffe0b8:  0x00007fffffffe3f2  0x00007fffffffe410

mydb> disas main
0x401126 <main+0>:      push   rbp
0x401127 <main+1>:      mov    rbp, rsp
0x40112a <main+4>:      sub    rsp, 0x20
0x40112e <main+8>:      mov    dword ptr [rbp-0x14], edi
0x401131 <main+11>:     mov    qword ptr [rbp-0x20], rsi
0x401135 <main+15>:     mov    dword ptr [rbp-0x4], 0x2a
0x40113c <main+22>:     mov    eax, dword ptr [rbp-0x4]

mydb> step
[STEP] Single-stepped to 0x401127 <main+1>

mydb> continue
[CONTINUE] Resuming execution...
[STOP] Hit breakpoint 2 at 0x40113c <main+22>

mydb> print $rax
$rax = 42 (0x2a)

mydb> continue
[CONTINUE] Resuming execution...
[EXIT] Program exited with status 0

The Core Question You're Answering

"How does a debugger gain control over another running process, stop it at arbitrary points, inspect its internal state, and resume execution - all without modifying the program's source code?"

This project demystifies the "magic" of debuggers. You'll discover that debuggers are just programs that use operating system facilities (ptrace) to become the "parent" of another process. Every debugger command maps to specific ptrace operations: breakpoints are instruction patches, single-stepping uses CPU trap flags, and register inspection reads from kernel-managed process state.

Concepts You Must Understand First

Before writing code, ensure you can explain:

Concept Why It Matters Reference
ptrace System Call The fundamental mechanism for process tracing; allows reading/writing memory, registers, and controlling execution TLPI Ch. 26, man ptrace
x86-64 Instruction Encoding You need to understand how int3 (0xCC) works as a breakpoint trap CS:APP 3.1, Intel SDM Vol. 2
Process States & Signals Traced processes stop on signals; SIGTRAP indicates breakpoint or single-step CS:APP 8.5, TLPI Ch. 20-22
ELF Format & Symbols To set breakpoints by function name, you must parse the symbol table CS:APP 7.4-7.5
Memory Layout Understanding text/data/stack segments and how addresses map to actual memory CS:APP 7.9, 9.7
Register Conventions Knowing which registers hold arguments, return values, and the instruction pointer CS:APP 3.4

Questions to Guide Your Design

Answer these before writing code:

  1. How do you start a process under your control? (fork + ptrace(PTRACE_TRACEME) + exec? What happens if exec fails?)

  2. What's the difference between PTRACE_CONT and PTRACE_SINGLESTEP? (How does the CPU know to stop after one instruction?)

  3. How do breakpoints actually work? (What byte do you save? What byte do you write? What happens when the CPU executes it?)

  4. After hitting a breakpoint, how do you continue? (Why can't you just PTRACE_CONT? What's the "step over breakpoint" dance?)

  5. How do you distinguish breakpoint stops from other stops? (SIGTRAP can mean breakpoint, single-step, or syscall stop)

  6. How do you read the tracee's memory? (PTRACE_PEEKTEXT returns one word at a time - how do you read larger regions?)

  7. How do you map addresses to function names? (Parse ELF .symtab/.dynsym? Use libdwarf for source lines?)

  8. What happens if the tracee forks? (Does your debugger follow the child? How do you handle multi-threaded programs?)

Thinking Exercise: Trace a Breakpoint Hit

Before implementing, trace through what happens when a breakpoint is hit:

State 1: Program loaded, breakpoint set at 0x401126
  - Original instruction at 0x401126: 55 (push rbp)
  - After setting breakpoint: CC (int3)
  - Debugger waiting in waitpid()

State 2: Program runs, hits breakpoint
  - CPU fetches instruction at 0x401126
  - CPU executes 0xCC (int3)
  - CPU raises #BP exception
  - Kernel converts to SIGTRAP, stops tracee
  - Kernel wakes debugger from waitpid()
  - RIP = 0x401127 (past the int3)

State 3: Debugger inspects state
  - ptrace(PTRACE_GETREGS, ...) reads all registers
  - RIP needs adjustment: subtract 1 to point to breakpoint
  - ptrace(PTRACE_PEEKTEXT, 0x401126) reads memory

State 4: User says "continue"
  - Restore original byte: poke 0x55 at 0x401126
  - Set RIP = 0x401126 (re-execute the instruction)
  - ptrace(PTRACE_SINGLESTEP) - execute ONE instruction
  - waitpid() - tracee stops after push rbp
  - Restore breakpoint: poke 0xCC at 0x401126
  - ptrace(PTRACE_CONT) - continue normally

Draw the instruction byte at 0x401126 through each state:

[LOAD]  0x401126: 55        (push rbp - original)
[BREAK] 0x401126: CC        (int3 - breakpoint active)
[HIT]   0x401126: CC, RIP=0x401127 (stopped, RIP past int3)
[STEP]  0x401126: 55, RIP=0x401126 (restored, about to re-execute)
[AFTER] 0x401126: CC, RIP=0x401127 (breakpoint re-armed, executed push)

The Interview Questions They'll Ask

Q1: Explain how software breakpoints work at the CPU level.

Expected answer: A software breakpoint replaces the first byte of an instruction with int3 (0xCC), a single-byte instruction that triggers a breakpoint exception (#BP). When the CPU executes it, it raises the exception, the kernel translates this to SIGTRAP, and the debugger (as the tracer) is notified via waitpid(). The debugger saves the original byte and restores it when needed for continuation.

Q2: What's the "step over breakpoint" problem and how do you solve it?

Expected answer: After hitting a breakpoint, you can't just continue because the breakpoint instruction is still there. The solution: (1) restore the original instruction, (2) set RIP back to the breakpoint address, (3) single-step one instruction, (4) re-insert the breakpoint, (5) then continue normally. This ensures the original instruction executes before re-arming the breakpoint.

Q3: How does ptrace(PTRACE_SINGLESTEP) work?

Expected answer: PTRACE_SINGLESTEP sets the x86 Trap Flag (TF) in the EFLAGS register. This flag causes the CPU to generate a debug exception (#DB) after executing exactly one instruction. The kernel handles this exception, delivers SIGTRAP to the tracer, and clears TF. The debugger sees the process stopped after one instruction.

Q4: Why do debuggers need to parse ELF files?

Expected answer: To provide symbolic debugging. Without ELF parsing, you can only work with raw addresses. By reading .symtab (static symbols) and .dynsym (dynamic symbols), you can map addresses to function names and vice versa. For source-level debugging, you need DWARF debug info (.debug_* sections) to map addresses to source lines.

Q5: How would you implement conditional breakpoints?

Expected answer: A conditional breakpoint stops only when a condition is true. Implementation: (1) set a normal breakpoint, (2) when hit, evaluate the condition (parse expression, read registers/memory), (3) if false, do the step-over-breakpoint dance silently and continue, (4) if true, report the stop to the user. The overhead comes from stopping on every hit even when continuing.

Q6: What are the limitations of ptrace-based debugging?

Expected answer: (1) Only one tracer per process - can't run under two debuggers. (2) Performance overhead from context switches on every stop. (3) Can be detected by the tracee (via PTRACE_TRACEME failing or checking ppid). (4) Anti-debugging tricks can interfere (timing checks, self-modifying code). (5) Multi-threaded debugging is complex (need to stop all threads atomically).

Hints in Layers

Layer 1 - Basic Process Control

Start with launching and stopping a process:

#include <sys/ptrace.h>
#include <sys/wait.h>
#include <sys/user.h>
#include <unistd.h>

int main(int argc, char **argv) {
    pid_t child = fork();

    if (child == 0) {
        // Child: request to be traced, then exec
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        execvp(argv[1], &argv[1]);
        perror("exec failed");
        _exit(1);
    }

    // Parent: wait for child to stop at exec
    int status;
    waitpid(child, &status, 0);
    printf("Child stopped at entry point\n");

    // Continue child
    ptrace(PTRACE_CONT, child, NULL, NULL);
    waitpid(child, &status, 0);
    printf("Child exited with status %d\n", WEXITSTATUS(status));

    return 0;
}

Layer 2 - Reading Registers

Use PTRACE_GETREGS to read all registers at once:

#include <sys/user.h>

void show_registers(pid_t child) {
    struct user_regs_struct regs;
    ptrace(PTRACE_GETREGS, child, NULL, &regs);

    printf("rip = 0x%llx\n", regs.rip);
    printf("rsp = 0x%llx\n", regs.rsp);
    printf("rax = 0x%llx\n", regs.rax);
    // ... other registers
}

Layer 3 - Reading Memory

PTRACE_PEEKTEXT reads one word at a time:

#include <errno.h>

void read_memory(pid_t child, unsigned long addr, void *buf, size_t len) {
    unsigned long *ptr = buf;
    size_t i;

    for (i = 0; i < len; i += sizeof(long)) {
        errno = 0;  // -1 is a valid data word, so errno is the only error signal
        long word = ptrace(PTRACE_PEEKTEXT, child, addr + i, NULL);
        if (word == -1 && errno != 0) {
            perror("PEEKTEXT failed");
            return;
        }
        *ptr++ = word;
    }
}

Layer 4 - Setting Breakpoints

Save the original byte, write 0xCC:

typedef struct {
    unsigned long addr;
    unsigned char saved_byte;
    int enabled;
} breakpoint_t;

void set_breakpoint(pid_t child, breakpoint_t *bp, unsigned long addr) {
    long word = ptrace(PTRACE_PEEKTEXT, child, addr, NULL);
    bp->addr = addr;
    bp->saved_byte = (unsigned char)(word & 0xff);
    bp->enabled = 1;

    // Replace first byte with int3 (0xCC)
    long modified = (word & ~0xff) | 0xCC;
    ptrace(PTRACE_POKETEXT, child, addr, modified);
}

void disable_breakpoint(pid_t child, breakpoint_t *bp) {
    long word = ptrace(PTRACE_PEEKTEXT, child, bp->addr, NULL);
    long restored = (word & ~0xff) | bp->saved_byte;
    ptrace(PTRACE_POKETEXT, child, bp->addr, restored);
    bp->enabled = 0;
}

Layer 5 - The Continue Dance

When continuing from a breakpoint:

void continue_from_breakpoint(pid_t child, breakpoint_t *bp) {
    struct user_regs_struct regs;
    ptrace(PTRACE_GETREGS, child, NULL, &regs);

    // RIP points past int3; back it up
    regs.rip = bp->addr;
    ptrace(PTRACE_SETREGS, child, NULL, &regs);

    // Restore original instruction
    disable_breakpoint(child, bp);

    // Single-step one instruction
    ptrace(PTRACE_SINGLESTEP, child, NULL, NULL);
    int status;
    waitpid(child, &status, 0);

    // Re-enable breakpoint
    set_breakpoint(child, bp, bp->addr);

    // Now continue normally
    ptrace(PTRACE_CONT, child, NULL, NULL);
}

Books That Will Help

Book Chapter(s) What You'll Learn
CS:APP 3.1-3.4 x86-64 instruction formats, registers, calling conventions
CS:APP 7.4-7.5 ELF format, symbol tables for address-to-name mapping
CS:APP 8.4-8.5 Process control, signals, and how SIGTRAP works
TLPI 26 Comprehensive ptrace coverage with examples
TLPI 20-22 Signals and signal handling
Low-Level Programming Ch. 11 Practical debugging and ptrace examples
Intel SDM Vol. 2 INT instruction How int3 generates #BP exception
How Debuggers Work (Rosenberg) All Classic book on debugger implementation

Common Pitfalls & Debugging

Problem 1: Breakpoint doesnโ€™t trigger

Symptom: Program runs past breakpoint address without stopping.

Cause: Wrong address (e.g., function entry vs. first instruction), or breakpoint set on non-executable memory.

Fix: Verify the address with objdump -d. Ensure you're setting the breakpoint on actual code:

$ objdump -d target | grep -A5 "<main>:"
0000000000401126 <main>:
  401126:       55                      push   %rbp

Problem 2: SIGTRAP but not from breakpoint

Symptom: Unexpected stops at random addresses.

Cause: Single-step trap, syscall stop (if PTRACE_SYSCALL was used), or clone/fork events.

Fix: Check stop reason carefully:

if (WIFSTOPPED(status)) {
    int sig = WSTOPSIG(status);
    if (sig == SIGTRAP) {
        // Could be breakpoint, single-step, or syscall
        siginfo_t info;
        ptrace(PTRACE_GETSIGINFO, child, NULL, &info);
        if (info.si_code == SI_KERNEL || info.si_code == TRAP_BRKPT) {
            // Breakpoint
        } else if (info.si_code == TRAP_TRACE) {
            // Single-step
        }
    }
}

Problem 3: Registers show wrong values after breakpoint

Symptom: RIP points to wrong address after hitting breakpoint.

Cause: Forgetting that int3 advances RIP by 1. After hitting breakpoint at 0x401126, RIP = 0x401127.

Fix: Always subtract 1 from RIP when stopped at a breakpoint:

regs.rip--;  // Point back to the int3/original instruction

Problem 4: Cannot continue after hitting breakpoint

Symptom: Program immediately hits same breakpoint again, or crashes.

Cause: Not doing the single-step dance - you continue with int3 still in place.

Fix: Always: restore original byte, step once, re-insert breakpoint, then continue.


Project 26: Operating System Kernel Capstone

Attribute Value
Language C + x86-64 Assembly (alt: Rust)
Difficulty Master+
Time 3-6 months
Chapters All CS:APP + OSTEP

What you'll build: A minimal x86-64 kernel that boots in QEMU, enables paging, handles interrupts, and runs simple user processes.

Why it matters: An OS kernel uses every concept from CS:APP - this is the ultimate capstone.

Core challenges:

  • Booting to 64-bit long mode
  • Implementing physical/virtual memory management
  • Handling interrupts and context-switching between tasks
  • Designing a minimal syscall boundary

Real World Outcome

When complete, your kernel will boot and run in QEMU like this:

$ make
  AS    boot.S
  CC    kernel.c
  CC    mm.c
  CC    interrupt.c
  CC    process.c
  CC    syscall.c
  LD    kernel.elf
  OBJCOPY kernel.bin

$ make run
qemu-system-x86_64 -kernel kernel.bin -serial stdio -no-reboot

================================================================================
                    MINIX86 KERNEL v0.1 - x86-64 Operating System
================================================================================
[BOOT] Entered 64-bit long mode
[BOOT] Kernel loaded at 0xffffffff80000000
[BOOT] Stack at 0xffffffff80010000

[MM] Physical memory detected: 128 MB
[MM] Kernel: 0x100000 - 0x200000 (1 MB)
[MM] Free memory starts at: 0x200000
[MM] Initializing page frame allocator...
[MM] 32256 page frames available (126 MB)
[MM] Page tables initialized

[IDT] Loading Interrupt Descriptor Table...
[IDT] Exception handlers 0-31 installed
[IDT] IRQ handlers 32-47 installed
[IDT] Syscall handler at vector 0x80 installed
[PIC] 8259 PIC remapped (IRQ0 -> INT32)
[TIMER] PIT configured for 100 Hz tick

[PROC] Process subsystem initialized
[PROC] Creating init process (PID 1)...
[PROC] Loading /bin/init from initrd
[PROC] Entry point: 0x400000
[PROC] User stack: 0x7fffffffe000

================================================================================
                         SWITCHING TO USER MODE
================================================================================
[SYSCALL] init(1): write(1, "Hello from userspace!\n", 22)
Hello from userspace!
[SYSCALL] init(1): fork() = 2
[PROC] Created process 2 (parent: 1)
[SYSCALL] shell(2): write(1, "minix86> ", 9)
minix86> [SYSCALL] shell(2): read(0, buf, 256)

$ # Type commands at the kernel shell
$ echo hello
[SYSCALL] shell(2): fork() = 3
[SYSCALL] echo(3): execve("/bin/echo", ["echo", "hello"], envp)
[SYSCALL] echo(3): write(1, "hello\n", 6)
hello
[SYSCALL] echo(3): exit(0)
[PROC] Process 3 exited with status 0
[SYSCALL] shell(2): wait4(-1, &status, 0, NULL) = 3

minix86> ps
[SYSCALL] shell(2): fork() = 4
[SYSCALL] ps(4): open("/proc/self/status", O_RDONLY)
  PID  PPID  STATE    NAME
    1     0  SLEEP    init
    2     1  RUNNING  shell
    4     2  RUNNING  ps
[SYSCALL] ps(4): exit(0)

minix86> ^C
[SIGNAL] Sending SIGINT to process 2
[PROC] Shell caught SIGINT, continuing...

$ # Press Ctrl+A, X to exit QEMU
[SHUTDOWN] System halt requested
[SHUTDOWN] Syncing filesystems...
[SHUTDOWN] Goodbye!

The Core Question You're Answering

"How does a computer go from power-on to running user programs, and what does the kernel do to make this possible while keeping user programs isolated from each other and from the hardware?"

This project is the ultimate integration of everything in CS:APP. You'll build the software that sits between bare metal and applications. Every concept you've studied - memory layout, calling conventions, interrupts, virtual memory, process control - comes together here. When you understand how a kernel works, you understand how computers work.

Concepts You Must Understand First

Before writing code, ensure you can explain:

Concept Why It Matters Reference
x86-64 Boot Process Understanding real mode, protected mode, and long mode transitions OSDev Wiki, Intel SDM Vol. 3
Paging & Page Tables 4-level page tables (PML4/PDPT/PD/PT), how virtual addresses translate to physical CS:APP 9.6, OSTEP Ch. 18-20
Interrupts & Exceptions IDT setup, interrupt handlers, CPU privilege levels (rings 0-3) CS:APP 8.1, Intel SDM Vol. 3
Context Switching Saving/restoring CPU state, switching between kernel and user stacks OSTEP Ch. 6, CS:APP 8.2
System Calls The syscall/sysret mechanism, transitioning between user and kernel mode CS:APP 8.2, OSTEP Ch. 6
Memory Management Physical frame allocation, virtual memory mapping, kernel/user space split CS:APP Ch. 9, OSTEP Ch. 13-23

Questions to Guide Your Design

Answer these before writing code:

  1. How do you get from BIOS/UEFI to your kernel? (Multiboot? Custom bootloader? UEFI stub?)

  2. How do you transition from 32-bit protected mode to 64-bit long mode? (What CR registers must be set? What page tables are required?)

  3. How do you organize physical memory? (Bitmap allocator? Free list? Buddy system?)

  4. How do you set up kernel virtual memory? (Direct mapping? Higher-half kernel? What goes where?)

  5. How do you handle interrupts? (IDT format in 64-bit? How do you save CPU state? What's the interrupt stack?)

  6. How do you switch from kernel to user mode? (What registers change? How does iretq work?)

  7. How do you implement system calls? (syscall/sysret vs int 0x80? What's the calling convention?)

  8. How do you switch between processes? (When does it happen? What state must be saved/restored?)

Thinking Exercise: Trace a System Call

Before implementing, trace through what happens when a user program calls write(1, "hello", 5):

User Space (Ring 3)
-------------------
1. libc wrapper: write() function
   - Put syscall number (1) in rax
   - Put arguments in rdi=1, rsi=buf, rdx=5
   - Execute 'syscall' instruction

CPU Transition (syscall instruction)
------------------------------------
2. CPU actions (automatic, hardware):
   - Save rip to rcx
   - Save rflags to r11
   - Load rip from IA32_LSTAR MSR (your syscall entry point)
   - Load CS from IA32_STAR MSR (kernel code segment)
   - Load SS (kernel stack segment)
   - Mask rflags with IA32_FMASK MSR
   - Switch to Ring 0
   - NOTE: rsp NOT changed - you must switch stacks!

Kernel Space (Ring 0)
---------------------
3. syscall_entry (assembly):
   - swapgs (switch to kernel GS for per-CPU data)
   - Save user rsp to per-CPU storage
   - Load kernel rsp from per-CPU storage
   - Push user context (for later iretq return)
   - Call C syscall dispatcher

4. sys_write() handler:
   - Validate fd (is 1 a valid file descriptor?)
   - Validate buffer pointer (is it in user space? is it mapped?)
   - Copy data from user space (carefully!)
   - Perform the write to console/file
   - Return bytes written

5. Return to user space:
   - Pop saved context
   - swapgs (restore user GS)
   - sysretq (or iretq for more flexibility)

Back to User Space
------------------
6. After sysret:
   - CPU restores rip from rcx, rflags from r11
   - Switch back to Ring 3
   - libc wrapper returns to caller with result in rax

Draw the stack contents at step 3:

Kernel Stack (after saving context):
+------------------+  (high addresses)
|  user ss         |
|  user rsp        |
|  user rflags     |
|  user cs         |
|  user rip (rcx)  |
|  error code (0)  |  <- iretq-style interrupt frame
+------------------+
|  rax (syscall #) |
|  rbx             |
|  rcx             |
|  ...             |  <- saved general registers
+------------------+ <- kernel rsp (low addresses)

The Interview Questions They'll Ask

Q1: Explain the difference between physical and virtual addresses, and why kernels use virtual memory.

Expected answer: Physical addresses refer to actual RAM locations. Virtual addresses are what the CPU uses; they're translated by the MMU via page tables. Kernels use virtual memory for: (1) isolation between processes - each has its own address space; (2) abstraction - programs don't need to know physical memory layout; (3) demand paging - not all memory needs to be physically present; (4) shared libraries - same physical pages mapped in multiple processes.

Q2: What happens when a page fault occurs?

Expected answer: The CPU raises exception #14 (page fault), pushing an error code with bits indicating: was it a read/write, user/kernel access, page present or not. The kernel's page fault handler examines the faulting address (in CR2) and error code to determine: (1) valid access to unmapped page -> allocate and map a frame; (2) copy-on-write -> copy the page and remap; (3) stack growth -> extend the stack; (4) invalid access -> kill the process with SIGSEGV.

Q3: How does the kernel protect itself from user programs?

Expected answer: Multiple mechanisms: (1) Privilege rings - kernel runs in Ring 0, users in Ring 3; Ring 3 can't execute privileged instructions. (2) Separate page tables - user pages marked as user-accessible, kernel pages as supervisor-only. (3) SMAP/SMEP on modern CPUs - prevent kernel from executing or even accessing user memory without explicit override. (4) System call interface - only way for user code to request kernel services.

Q4: Explain context switching between two processes.

Expected answer: When switching from process A to B: (1) Save A's register state to its kernel stack or PCB; (2) Switch page tables - load B's PML4 into CR3; (3) Switch kernel stacks - change rsp to B's kernel stack; (4) Restore B's register state; (5) Return to B's code. The trigger is usually a timer interrupt (preemption) or a blocking system call (voluntary switch). TLB is flushed on CR3 change unless using PCID.

Q5: What's the difference between exceptions, interrupts, and traps?

Expected answer: All are handled via the IDT but have different sources. Exceptions: synchronous, caused by CPU (divide by zero, page fault) - faults can be restarted, traps advance past the instruction. Hardware interrupts: asynchronous, from devices (keyboard, timer) via the APIC/PIC - the interrupted instruction completes. Software traps: synchronous, explicitly triggered (int, syscall) - used for system calls. All save state and transfer to a handler.

Q6: How would you add SMP (multiprocessor) support to your kernel?

Expected answer: Key challenges: (1) Per-CPU data structures - each CPU needs its own scheduler queue, current process pointer, kernel stack. Use GS segment for per-CPU access. (2) Lock all shared data - use spinlocks for short critical sections; the scheduler needs careful locking. (3) IPI (inter-processor interrupts) - to signal other CPUs for TLB shootdown, reschedule requests. (4) AP bootstrap - secondary CPUs start in real mode; need special boot code to bring them to long mode.

Hints in Layers

Layer 1 - Multiboot Header and Entry

Start with a minimal bootable kernel:

; boot.S - Multiboot2 header and entry point
.section .multiboot
.align 8
multiboot_header:
    .long 0xE85250D6                    ; Magic
    .long 0                             ; Architecture (i386)
    .long multiboot_header_end - multiboot_header
    .long -(0xE85250D6 + 0 + (multiboot_header_end - multiboot_header))
    ; End tag
    .word 0
    .word 0
    .long 8
multiboot_header_end:

.section .bss
.align 16
stack_bottom:
    .skip 16384                         ; 16 KB stack
stack_top:

.section .text
.global _start
.code32
_start:
    mov $stack_top, %esp
    call check_multiboot
    call check_cpuid
    call check_long_mode
    call setup_page_tables
    call enable_paging
    lgdt gdt64_pointer
    jmp $0x08, $long_mode_start

.code64
long_mode_start:
    mov $0x10, %ax
    mov %ax, %ds
    mov %ax, %es
    mov %ax, %ss
    call kernel_main
    cli
1:  hlt                                 # hang here if kernel_main returns
    jmp 1b

Layer 2 - Transition to Long Mode

Set up identity-mapped page tables and enable paging:

setup_page_tables:
    # Map first 2MB with huge pages
    mov $pml4, %edi
    mov $pdpt, %eax
    or $0x03, %eax                      # Present + Writable
    mov %eax, (%edi)

    mov $pdpt, %edi
    mov $pd, %eax
    or $0x03, %eax
    mov %eax, (%edi)

    mov $pd, %edi
    mov $0x83, %eax                     # Present + Writable + Huge (2MB)
    mov %eax, (%edi)
    ret

enable_paging:
    mov $pml4, %eax
    mov %eax, %cr3                      # Load page table

    mov %cr4, %eax
    or $0x20, %eax                      # Enable PAE (CR4 bit 5)
    mov %eax, %cr4

    mov $0xC0000080, %ecx               # EFER MSR
    rdmsr
    or $0x100, %eax                     # Set LME (Long Mode Enable)
    wrmsr

    mov %cr0, %eax
    or $0x80000001, %eax                # Enable Paging + Protection
    mov %eax, %cr0
    ret

Layer 3 - Interrupt Descriptor Table

Set up exception and interrupt handlers:

// interrupt.c
#include <stdint.h>

struct idt_entry {
    uint16_t offset_low;
    uint16_t selector;
    uint8_t  ist;
    uint8_t  type_attr;
    uint16_t offset_mid;
    uint32_t offset_high;
    uint32_t zero;
} __attribute__((packed));

struct idt_entry idt[256];

void set_idt_entry(int n, uint64_t handler, uint8_t type) {
    idt[n].offset_low  = handler & 0xFFFF;
    idt[n].selector    = 0x08;  // Kernel code segment
    idt[n].ist         = 0;
    idt[n].type_attr   = type;  // 0x8E = interrupt gate, 0x8F = trap gate
    idt[n].offset_mid  = (handler >> 16) & 0xFFFF;
    idt[n].offset_high = handler >> 32;
    idt[n].zero        = 0;
}

extern void isr0(void);   // Divide error
extern void isr14(void);  // Page fault
extern void irq0(void);   // Timer

void idt_init(void) {
    set_idt_entry(0, (uint64_t)isr0, 0x8E);
    set_idt_entry(14, (uint64_t)isr14, 0x8E);
    set_idt_entry(32, (uint64_t)irq0, 0x8E);
    // ... more handlers

    struct { uint16_t size; uint64_t addr; } __attribute__((packed)) idtr;
    idtr.size = sizeof(idt) - 1;
    idtr.addr = (uint64_t)idt;
    asm volatile("lidt %0" : : "m"(idtr));
}
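The offset splitting in set_idt_entry is easy to get subtly wrong, and you will not see the bug until the first interrupt jumps into nowhere. Because the math is pure, it can be checked on the host before it ever runs in the kernel; a standalone sketch (the struct and helper names below are copies made for user-space testing, not part of the kernel code above):

```c
#include <stdint.h>

/* Host-side sanity check: a 64-bit handler address is scattered across
   three IDT fields and must reassemble exactly. */
struct idt_entry_t {
    uint16_t offset_low;
    uint16_t selector;
    uint8_t  ist;
    uint8_t  type_attr;
    uint16_t offset_mid;
    uint32_t offset_high;
    uint32_t zero;
} __attribute__((packed));

static struct idt_entry_t make_entry(uint64_t handler) {
    struct idt_entry_t e = {0};
    e.offset_low  = handler & 0xFFFF;          /* bits 15..0  */
    e.offset_mid  = (handler >> 16) & 0xFFFF;  /* bits 31..16 */
    e.offset_high = handler >> 32;             /* bits 63..32 */
    e.selector    = 0x08;                      /* kernel code segment */
    e.type_attr   = 0x8E;                      /* present, interrupt gate */
    return e;
}

static uint64_t reassemble(const struct idt_entry_t *e) {
    return (uint64_t)e->offset_low
         | ((uint64_t)e->offset_mid  << 16)
         | ((uint64_t)e->offset_high << 32);
}
```

A round trip through make_entry/reassemble on a typical higher-half kernel address should be lossless, and sizeof the packed entry must be exactly 16 bytes or the whole table is misaligned.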

Layer 4 - Physical Memory Allocator

Simple bitmap-based page frame allocator:

// mm.c
#define PAGE_SIZE 4096
#define MAX_FRAMES (128 * 1024 * 1024 / PAGE_SIZE)  // 128 MB

static uint8_t frame_bitmap[MAX_FRAMES / 8];
static uint64_t total_frames;
static uint64_t free_frames;

void pmm_init(uint64_t mem_size, uint64_t kernel_end) {
    total_frames = mem_size / PAGE_SIZE;
    free_frames = total_frames;

    // Mark all as free initially
    memset(frame_bitmap, 0, sizeof(frame_bitmap));

    // Mark kernel memory as used
    uint64_t kernel_frames = (kernel_end + PAGE_SIZE - 1) / PAGE_SIZE;
    for (uint64_t i = 0; i < kernel_frames; i++) {
        frame_bitmap[i / 8] |= (1 << (i % 8));
        free_frames--;
    }
}

uint64_t pmm_alloc_frame(void) {
    for (uint64_t i = 0; i < total_frames; i++) {
        if (!(frame_bitmap[i / 8] & (1 << (i % 8)))) {
            frame_bitmap[i / 8] |= (1 << (i % 8));
            free_frames--;
            return i * PAGE_SIZE;
        }
    }
    return 0;  // Out of memory (frame 0 is also a valid address; reserve it in pmm_init so 0 works as a sentinel)
}

void pmm_free_frame(uint64_t addr) {
    uint64_t frame = addr / PAGE_SIZE;
    frame_bitmap[frame / 8] &= ~(1 << (frame % 8));
    free_frames++;
}
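Because the allocator is pure bookkeeping with no hardware dependencies, it can be unit-tested entirely in user space before it touches real frames. A self-contained harness shrunk to 64 frames (TEST_FRAMES, alloc_frame, and free_frame are harness names, and it tracks frame numbers rather than byte addresses to keep the assertions simple):

```c
#include <stdint.h>
#include <string.h>

/* User-space harness for the bitmap allocator logic, shrunk to 64 frames. */
#define TEST_FRAMES 64
static uint8_t bitmap[TEST_FRAMES / 8];

static int64_t alloc_frame(void) {
    for (int64_t i = 0; i < TEST_FRAMES; i++) {
        if (!(bitmap[i / 8] & (1 << (i % 8)))) {
            bitmap[i / 8] |= (1 << (i % 8));   /* mark used */
            return i;                          /* frame number */
        }
    }
    return -1;                                 /* out of frames */
}

static void free_frame(int64_t f) {
    bitmap[f / 8] &= ~(1 << (f % 8));          /* clear the bit */
}
```

First-fit behavior falls out of the linear scan: after freeing a low frame, the next allocation reuses it before touching higher frames.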

Layer 5 - Process and Context Switch

Basic process structure and switching:

// process.c
struct context {
    uint64_t rsp;
    uint64_t rbp;
    uint64_t rbx;
    uint64_t r12;
    uint64_t r13;
    uint64_t r14;
    uint64_t r15;
    uint64_t rip;
};

struct process {
    int pid;
    enum { RUNNING, READY, BLOCKED, ZOMBIE } state;
    struct context context;
    uint64_t *page_table;
    uint64_t kernel_stack;
    uint64_t user_stack;
};

struct process *current;
struct process processes[MAX_PROCESSES];

// Assembly context switch (in switch.S)
// void switch_context(struct context *old, struct context *new);

void schedule(void) {
    struct process *next = find_next_runnable();
    if (next == current) return;

    struct process *prev = current;
    current = next;

    // Switch page tables
    asm volatile("mov %0, %%cr3" : : "r"(next->page_table));

    // Switch context
    switch_context(&prev->context, &next->context);
}
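The schedule() sketch leaves find_next_runnable() undefined. One plausible version is a round-robin scan that starts just after the current slot, so every READY process eventually runs. The sketch below is testable on the host; MAX_PROCESSES and the state enum mirror the structures above (with an extra UNUSED value for empty slots, and an index standing in for the current-process pointer):

```c
/* Hypothetical round-robin picker, indexed form for host testing. */
#define MAX_PROCESSES 8
enum pstate { UNUSED, RUNNING, READY, BLOCKED, ZOMBIE };
static enum pstate table[MAX_PROCESSES];

static int find_next_runnable(int cur) {
    for (int i = 1; i <= MAX_PROCESSES; i++) {
        int idx = (cur + i) % MAX_PROCESSES;   /* wrap around the table */
        if (table[idx] == READY)
            return idx;
    }
    return cur;   /* nothing else runnable: keep running the current one */
}
```

Starting the scan at cur+1 rather than 0 is what provides fairness; starting at 0 every time would starve high-numbered slots whenever a low-numbered process stays READY.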

Books That Will Help

Book Chapter(s) What You'll Learn
CS:APP Ch. 9 Virtual memory fundamentals, page tables
CS:APP Ch. 8 Exceptions, interrupts, process control
CS:APP Ch. 3 x86-64 assembly for boot code and handlers
OSTEP Ch. 4-6 Process abstraction, scheduling, context switching
OSTEP Ch. 13-23 Virtual memory, paging, swapping
OSTEP Ch. 26-32 Concurrency, locks, condition variables
Intel SDM Vol. 3 Ch. 2-6 Protected mode, paging, interrupts
OSDev Wiki Various Practical tutorials for each component
xv6 Book All Complete teaching OS with clean code

Common Pitfalls & Debugging

Problem 1: Triple fault on boot (QEMU resets immediately)

Symptom: QEMU restarts as soon as kernel loads, or immediately after enabling paging.

Cause: Invalid page tables, IDT not set up, or exception in exception handler.

Debug: Use QEMU's debug options:

$ qemu-system-x86_64 -kernel kernel.bin -d int,cpu_reset -no-reboot
# Shows interrupts and why reset occurred

$ qemu-system-x86_64 -kernel kernel.bin -s -S
# Starts paused; attach GDB: target remote :1234

Problem 2: Page fault in kernel mode

Symptom: Page fault at unexpected address, usually during initialization.

Cause: Accessing unmapped memory, or page tables not set up correctly.

Fix: Verify your page table mappings:

// Debug: print page table entries
void debug_pagewalk(uint64_t addr) {
    uint64_t *pml4 = (uint64_t *)read_cr3();
    uint64_t pml4e = pml4[(addr >> 39) & 0x1FF];
    printf("PML4[%d] = 0x%lx\n", (int)((addr >> 39) & 0x1FF), pml4e);
    // ... continue for PDPT, PD, PT
}
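The four page-table indices are just fixed bit fields of the virtual address, so extracting them in one pure function makes the walk above easy to check on the host. A sketch (struct and function names are illustrative):

```c
#include <stdint.h>

/* Split a 48-bit virtual address into its four 9-bit table indices
   plus the 12-bit page offset. */
struct pt_indices { int pml4, pdpt, pd, pt; uint64_t offset; };

static struct pt_indices split_vaddr(uint64_t addr) {
    struct pt_indices ix;
    ix.pml4   = (addr >> 39) & 0x1FF;   /* bits 47..39 */
    ix.pdpt   = (addr >> 30) & 0x1FF;   /* bits 38..30 */
    ix.pd     = (addr >> 21) & 0x1FF;   /* bits 29..21 */
    ix.pt     = (addr >> 12) & 0x1FF;   /* bits 20..12 */
    ix.offset = addr & 0xFFF;           /* bits 11..0  */
    return ix;
}
```

A useful sanity check: any higher-half kernel address (bit 47 set) must land in PML4 slot 256 or above, which is why kernels conventionally claim the top half of the PML4.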

Problem 3: Interrupts not working

Symptom: Timer interrupt never fires, keyboard doesn't respond.

Cause: IDT not loaded, PIC not configured, or interrupts disabled (CLI).

Fix: Verify interrupt setup:

// Check if interrupts enabled
uint64_t flags;
asm volatile("pushfq; pop %0" : "=r"(flags));
if (!(flags & 0x200)) {
    printf("Interrupts disabled (IF=0)!\n");
    asm volatile("sti");
}

// Verify the PIC is raising interrupts (OCW3 = 0x0A selects the IRR)
outb(0x20, 0x0A);  // write OCW3 to the master PIC command port
printf("PIC IRR: 0x%x\n", inb(0x20));

Problem 4: User program crashes immediately

Symptom: General protection fault or page fault when switching to user mode.

Cause: User page tables wrong, wrong CS/SS for user mode, or stack not set up.

Fix: Verify the iretq frame is correct:

// Stack for iretq to user mode:
// [RSP+32] SS     = 0x23 (user data | RPL=3)
// [RSP+24] RSP    = user stack pointer
// [RSP+16] RFLAGS = 0x202 (IF set)
// [RSP+8]  CS     = 0x1B (user code | RPL=3)
// [RSP+0]  RIP    = user entry point

Legacy project list, re-numbered to match the expanded guides in CSAPP_3E_DEEP_LEARNING_PROJECTS/:

P# Legacy Project Expanded guide
P02 + P03 Bit Manipulation Puzzle Solver (Data Lab) P02-bitwise-data-inspector.md, P03-data-lab-clone.md
P05 Binary Bomb Defuser P05-bomb-lab-workflow.md
P06 Buffer Overflow Exploit Lab (Attack Lab) P06-attack-lab-workflow.md
P07 Y86-64 Processor Simulator P07-y86-64-cpu-simulator.md
P09 Cache Simulator P09-cache-lab-simulator.md
P14 Dynamic Memory Allocator (Malloc Lab) P14-build-your-own-malloc.md
P11 + P12 Unix Shell Implementation P11-signals-processes-sandbox.md, P12-unix-shell-job-control.md
P18 ELF Linker and Loader P18-elf-linker-and-loader.md
P19 Virtual Memory Simulator P19-virtual-memory-simulator.md
P15 Robust I/O Library (RIO) P15-robust-unix-io-toolkit.md
P20 HTTP Web Server P20-http-web-server.md
P17 Concurrent Web Proxy P17-csapp-capstone-proxy.md
P21 Thread Pool Implementation P21-thread-pool-implementation.md
P22 Signal-Safe Printf P22-signal-safe-printf.md
P23 Performance Profiler P23-performance-profiler.md
P24 Memory Leak Detector P24-memory-leak-detector.md
P25 Debugger (ptrace-based) P25-debugger-ptrace.md
P26 Operating System Kernel Capstone P26-operating-system-kernel-capstone.md

Project Comparison Table

# Project Difficulty Time Understanding Fun
1 Toolchain Explorer Intermediate 1–2 wk ●●●○ ●●○○
2 Bitwise Data Inspector Intermediate 0.5–2 wk ●●●○ ●●○○
3 Data Lab Clone Advanced 1–2 wk ●●●○ ●●○○
4 Calling Convention Crash Cart Advanced 1–2 wk ●●●○ ●●●○
5 Bomb Lab Workflow Advanced 1–2 wk ●●●○ ●●●○
6 Attack Lab Workflow Expert 2–3 wk ●●●● ●●●○
7 Y86-64 CPU Simulator Expert 1 mo+ ●●●● ●●●●
8 Performance Clinic Advanced 1–2 wk ●●●○ ●●○○
9 Cache Simulator + Visualizer Advanced 2–3 wk ●●●● ●●●○
10 ELF Link Map + Interposition Advanced 2–3 wk ●●●● ●●●○
11 Signals + Processes Sandbox Advanced 1–2 wk ●●●○ ●●○○
12 Unix Shell with Job Control Advanced 2–3 wk ●●●● ●●●○
13 VM Map Visualizer Advanced 1–2 wk ●●●○ ●●○○
14 Build Your Own Malloc Expert 1 mo+ ●●●● ●●●○
15 Robust Unix I/O Toolkit Intermediate 1–2 wk ●●●○ ●●○○
16 Concurrency Workbench Expert 2–3 wk ●●●● ●●●○
17 Capstone Proxy Expert 2–3 mo ●●●● ●●●●
18 ELF Linker and Loader Expert 2–3 wk ●●●● ●●●○
19 Virtual Memory Simulator Expert 2–3 wk ●●●● ●●●○
20 HTTP Web Server Advanced 1–2 wk ●●●○ ●●●○
21 Thread Pool Implementation Advanced 1–2 wk ●●●○ ●●●○
22 Signal-Safe Printf Advanced Weekend ●●●○ ●●○○
23 Performance Profiler Advanced 1–2 wk ●●●● ●●●○
24 Memory Leak Detector Advanced 1–2 wk ●●●○ ●●○○
25 Debugger (ptrace-based) Expert 2–4 wk ●●●● ●●●●
26 OS Kernel Capstone Expert 2–3 mo ●●●● ●●●●

Skills Matrix

Project Ch.1 Ch.2 Ch.3 Ch.4 Ch.5 Ch.6 Ch.7 Ch.8 Ch.9 Ch.10 Ch.11 Ch.12
P1: Toolchain ●● - - - - - ●● - - - - -
P2: Bitwise - ●●● ● - - - - - - - - -
P3: Data Lab - ●●● - - - - - - - - - -
P4: Crash Cart - - ●●● - - - - - - - - -
P5: Bomb Lab - - ●●● - - - - - - - - -
P6: Attack Lab - - ●●● - - - - - - - - -
P7: Y86-64 - - - ●●● ● - - - - - - -
P8: Perf Clinic ● - - - ●●● ●● - - - - - -
P9: Cache Lab - - - - ●● ●●● - - - - - -
P10: ELF Link - - - - - - ●●● - - - - -
P11: Signals - - - - - - - ●●● - - - -
P12: Shell - - - - - - - ●●● - - - ●
P13: VM Map - - - - - - - ●● ●●● - - -
P14: Malloc - - - - - ●● - - ●●● - - -
P15: Unix I/O - - - - - - - - ●● ●●● - -
P16: Concurrency - - - - - - - - - - - ●●●
P17: Capstone ● ● ● - ● ●● ● ● ●● ●● ●● ●●
P18: ELF Linker - - - - - - ●●● ●● - - - -
P19: VM Simulator - - - - - ●● - - ●●● - - -
P20: HTTP Server - - - - - - - - - ●●● ●●● -
P21: Thread Pool - - - - - - - - - - - ●●●
P22: Signal Printf - - - - - - - ●●● - - - ●●
P23: Profiler - - ●● - ●●● - - ●● - - - -
P24: Leak Detector - - - - - - ●● - ●●● - - -
P25: Debugger - - ●●● - - - - ●●● - - - -
P26: OS Kernel ● ● ●● ●● ● ●● ●● ●● ●● ●● - ●●
Legend: ●●● = Primary focus ●● = Significant coverage ● = Touches on ("-" = not covered)

Resources

Official CS:APP Materials

Supplementary Books

  • Effective C, 2nd Edition — Robert C. Seacord (modern C practices)
  • C Interfaces and Implementations — David R. Hanson (allocator design)
  • Operating Systems: Three Easy Pieces — Arpaci-Dusseau (concurrency, VM)
  • Computer Organization and Design — Patterson & Hennessy (architecture)

Tools

  • Debuggers: GDB, LLDB
  • Disassemblers: objdump, Ghidra, Binary Ninja
  • Profilers: perf, Valgrind, cachegrind
  • Build: Make, CMake, gcc/clang

Online Resources


Summary

# Project Language
1 Hello, Toolchain โ€” Build Pipeline Explorer C
2 Bitwise Data Inspector C
3 Data Lab Clone C
4 x86-64 Calling Convention Crash Cart C
5 Bomb Lab Workflow C
6 Attack Lab Workflow C
7 Y86-64 CPU Simulator C
8 Performance Clinic C
9 Cache Lab++ C
10 ELF Link Map & Interposition Toolkit C
11 Signals + Processes Sandbox C
12 Unix Shell with Job Control C
13 Virtual Memory Map Visualizer C
14 Build Your Own Malloc C
15 Robust Unix I/O Toolkit C
16 Concurrency Workbench C
17 CS:APP Capstone Proxy Platform C
18 ELF Linker and Loader C
19 Virtual Memory Simulator C
20 HTTP Web Server C
21 Thread Pool Implementation C
22 Signal-Safe Printf C
23 Performance Profiler C
24 Memory Leak Detector C
25 Debugger (ptrace-based) C
26 Operating System Kernel Capstone C

Merged Additions (from LEARN_CSAPP_COMPUTER_SYSTEMS.md)

This file (CSAPP_3E_DEEP_LEARNING_PROJECTS.md) is the canonical "one main file + expanded project guides" path. LEARN_CSAPP_COMPUTER_SYSTEMS.md remains in the repo as a legacy snapshot, but its unique projects and learning-plan ideas are consolidated here so you have a single place to start.

The full legacy document is also included verbatim in Appendix A at the end of this file (collapsed by default).

Overlap Map (Project Equivalents)

LEARN_CSAPP_COMPUTER_SYSTEMS.md Closest match in this path Notes
Project 1: Data Lab P2 + P3 This path splits "inspect representations" and "constraints-style bit puzzles".
Project 2: Bomb Lab P5 Same lab domain; this path emphasizes a repeatable workflow + writeups.
Project 3: Attack Lab P6 Same lab domain; this path emphasizes workflow + post-mortems.
Project 4: Y86-64 Simulator P7 Same core learning objective.
Project 5: Cache Simulator P9 This path adds locality visualization and "why it's slow" instrumentation.
Project 6: Malloc Lab P14 Same domain; this path pushes allocator design further.
Project 7: Unix Shell P12 (+P11) This path explicitly builds signal/process discipline first.
Project 10: Robust I/O P15 Same domain; this path frames it as a reusable toolkit.
Project 12: Concurrent Proxy P17 (+P16) This path makes the proxy the capstone and treats thread pools as a prerequisite skill.

Bonus Projects (Build More of the Stack)

These are additional projects from LEARN_CSAPP_COMPUTER_SYSTEMS.md that are valuable but not part of the core "P1→P17" dependency graph. Each one has an expanded guide in the same folder as the original 17 projects.

# Project Expanded guide
18 ELF Linker and Loader P18-elf-linker-and-loader.md
19 Virtual Memory Simulator P19-virtual-memory-simulator.md
20 HTTP Web Server P20-http-web-server.md
21 Thread Pool Implementation P21-thread-pool-implementation.md
22 Signal-Safe Printf (Async-Signal-Safe Logging) P22-signal-safe-printf.md
23 Performance Profiler P23-performance-profiler.md
24 Memory Leak Detector P24-memory-leak-detector.md
25 Debugger (ptrace-based) P25-debugger-ptrace.md
26 Final Capstone: Operating System Kernel P26-operating-system-kernel-capstone.md

Alternate Time-Based Phases (Optional)

If you prefer a calendar-based plan (instead of dependency-based), the legacy file proposes these phases; they map cleanly onto this path:

  1. Foundation (4–6 weeks): P1–P4 (+P2/P3 depth as needed)
  2. Hardware Understanding (4–6 weeks): P7–P9
  3. System Software (6–8 weeks): P10–P14
  4. I/O and Networking (3–4 weeks): P15 + (P20 optional) + P17 basics
  5. Concurrency (4–6 weeks): P16 + P21 + P17 scaling
  6. Advanced Topics (4+ weeks): P18, P19, P22–P25
  7. Post-CS:APP Capstone (months): P26
  7. Post-CS:APP Capstone (months): P26

Last updated: December 2025