Project 19: Virtual Memory Simulator
Make virtual memory visible by simulating page tables, a TLB, and page replacement on real address traces. Understand how your program’s memory addresses become physical memory locations.
Quick Reference
| Attribute | Value |
|---|---|
| Language | C (alt: C++, Rust) |
| Difficulty | Advanced |
| Time | ~2 weeks |
| Chapters | 9 |
| Coolness | See the Invisible |
| Portfolio Value | Strong Systems Project |
Learning Objectives
By completing this project, you will:
- Master address translation: Understand how virtual addresses become physical addresses through multi-level page tables
- Implement TLB caching: Build a Translation Lookaside Buffer and measure its impact on address translation speed
- Understand page faults: Implement demand paging and see exactly when and why page faults occur
- Compare replacement algorithms: Implement FIFO, LRU, and Clock algorithms and analyze their fault rates
- Track dirty pages: Understand write-back vs write-through and count writeback costs
- Analyze working sets: Visualize how programs use memory over time
- Connect theory to practice: See how Linux/x86-64 page tables actually work
- Measure performance trade-offs: Quantify the cost of TLB misses and page faults
- Build debugging tools: Create visualizations that expose VM behavior
- Prepare for systems interviews: Answer deep questions about memory management
Deep Theoretical Foundation
Why Virtual Memory Exists
Before virtual memory, programs had direct access to physical memory. This caused numerous problems:
┌────────────────────────────────────────────────────────────────────────┐
│ THE PROBLEMS WITHOUT VIRTUAL MEMORY │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ PROBLEM 1: Programs Had to Know Physical Addresses │
│ ───────────────────────────────────────────────────── │
│ │
│ 1960s approach: │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Program A: "I'll use addresses 0x00000-0x0FFFF" │ │
│ │ Program B: "I'll use addresses 0x10000-0x1FFFF" │ │
│ │ Program C: "Uh... what's left?" │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ Compiler had to know where program would load! │
│ Moving a program = recompiling everything │
│ │
│ PROBLEM 2: No Memory Protection │
│ ───────────────────────────────── │
│ │
│ Physical Memory: │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Program A │ Program B │ OS Kernel │ Program C │ ... │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │ │
│ └─── Buggy Program A can overwrite B or kernel! │
│ │
│ PROBLEM 3: Programs Wanted More RAM Than Available │
│ ────────────────────────────────────────────────── │
│ │
│ "I have 16 MB RAM but my dataset is 64 MB" │
│ Solution: Programmer manually managed overlays (nightmare) │
│ │
│ PROBLEM 4: Fragmentation │
│ ──────────────────────── │
│ │
│ After running a while: │
│ ┌───┬────────┬───┬────────┬───┬───────────┬───┬────────────────┐ │
│ │ A │ (free) │ B │ (free) │ C │ (free) │ D │ (free) │ │
│ └───┴────────┴───┴────────┴───┴───────────┴───┴────────────────┘ │
│ 8K 12K 6K 8K 10K 20K 4K 40K │
│ │
│ Total free: 80K, but can't load 50K program (fragmented!) │
│ │
└────────────────────────────────────────────────────────────────────────┘
The Virtual Memory Solution
Virtual memory provides an elegant abstraction: every process thinks it has the entire address space to itself.
┌────────────────────────────────────────────────────────────────────────┐
│ VIRTUAL MEMORY ABSTRACTION │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ Process A's View: Process B's View: │
│ ───────────────── ───────────────── │
│ │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ 0xFFFFFFFF │ │ 0xFFFFFFFF │ │
│ │ Kernel Space │ │ Kernel Space │ │
│ ├─────────────────┤ ├─────────────────┤ │
│ │ Stack │ │ Stack │ │
│ │ ↓ │ │ ↓ │ │
│ │ │ │ │ │
│ │ ↑ │ │ ↑ │ │
│ │ Heap │ │ Heap │ │
│ ├─────────────────┤ ├─────────────────┤ │
│ │ BSS/Data │ │ BSS/Data │ │
│ ├─────────────────┤ ├─────────────────┤ │
│ │ Code (Text) │ │ Code (Text) │ │
│ │ 0x00000000 │ │ 0x00000000 │ │
│ └─────────────────┘ └─────────────────┘ │
│ │
│ BOTH processes use address 0x00401000 for their code! │
│ BOTH processes think they have 4GB (or 256TB on 64-bit) to use! │
│ │
│ Physical Reality: │
│ ───────────────── │
│ │
│ Physical RAM (maybe only 8 GB): │
│ ┌─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┐ │
│ │A:pg3│B:pg1│A:pg0│B:pg4│B:pg0│Free │A:pg7│B:pg2│ │
│ └─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┘ │
│ Pages from A and B interleaved, but each process is isolated │
│ │
└────────────────────────────────────────────────────────────────────────┘
Address Spaces and Page Size
Virtual memory divides memory into fixed-size units called pages:
┌────────────────────────────────────────────────────────────────────────┐
│ PAGES AND FRAMES │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ VIRTUAL ADDRESS SPACE PHYSICAL ADDRESS SPACE │
│ (per process) (shared hardware) │
│ │
│ ┌────────────────────┐ ┌────────────────────┐ │
│ │ Virtual Page 0 │ ──┐ │ Physical Frame 0 │ │
│ │ (4 KB) │ │ │ (4 KB) │ │
│ ├────────────────────┤ │ ├────────────────────┤ │
│ │ Virtual Page 1 │ └──→ │ Physical Frame 1 │ ← mapping! │
│ │ (4 KB) │ │ (4 KB) │ │
│ ├────────────────────┤ ├────────────────────┤ │
│ │ Virtual Page 2 │ ─────→ │ Physical Frame 2 │ ← mapping! │
│ │ (4 KB) │ │ (4 KB) │ │
│ ├────────────────────┤ ├────────────────────┤ │
│ │ Virtual Page 3 │ ┌──→ │ Physical Frame 3 │ │
│ │ (4 KB) │ │ │ (4 KB) │ │
│ ├────────────────────┤ │ ├────────────────────┤ │
│ │ Virtual Page 4 │ ───┘ │ Physical Frame 4 │ ← mapping! │
│ │ (NOT MAPPED) │ ✗ │ (4 KB) │ │
│ ├────────────────────┤ ├────────────────────┤ │
│ │ ... │ │ ... │ │
│ └────────────────────┘ └────────────────────┘ │
│ │
│ Key terms: │
│ • Virtual Page Number (VPN): Which virtual page (index) │
│ • Physical Frame Number (PFN): Which physical frame (index) │
│ • Page Offset: Position within a page (same for VPN and PFN!) │
│ │
│ Common page sizes: │
│ • 4 KB (4096 bytes) - most common on x86/x86-64 │
│ • 2 MB "huge pages" - for large datasets, fewer TLB entries │
│ • 1 GB "giant pages" - for very large mappings │
│ │
└────────────────────────────────────────────────────────────────────────┘
Virtual Address Breakdown
A virtual address is split into the Virtual Page Number (VPN) and Page Offset:
┌────────────────────────────────────────────────────────────────────────┐
│ VIRTUAL ADDRESS BREAKDOWN │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ 32-BIT ADDRESS SPACE with 4 KB pages (2^12 = 4096 bytes): │
│ ────────────────────────────────────────────────────────── │
│ │
│ 31 12 11 0 │
│ ┌────────────────────────────────────────┬─────────────────────┐ │
│ │ Virtual Page Number (VPN) │ Page Offset │ │
│ │ 20 bits │ 12 bits │ │
│ │ (1,048,576 pages max) │ (0-4095 in page) │ │
│ └────────────────────────────────────────┴─────────────────────┘ │
│ │
│ Example: Virtual address 0x12345ABC │
│ │
│ Binary: 0001 0010 0011 0100 0101 1010 1011 1100 │
│ ├──────────────────────┤├─────────────┤ │
│ VPN = 0x12345 Offset = 0xABC │
│ (74,565 in decimal) (2748 in decimal) │
│ │
│ │
│ 64-BIT ADDRESS SPACE (x86-64, actually uses 48 bits): │
│ ────────────────────────────────────────────────────── │
│ │
│ 63 48 47 12 11 0 │
│ ┌────────┬───────────────────────────────┬─────────────────────┐ │
│ │ Sign │ Virtual Page Number │ Page Offset │ │
│ │ Extend │ 36 bits │ 12 bits │ │
│ │ 16 bits│ │ │ │
│ └────────┴───────────────────────────────┴─────────────────────┘ │
│ │
│ Sign extension: bits 63-48 must all equal bit 47 │
│ Creates "canonical" addresses with a hole in the middle │
│ │
│ User space: 0x0000000000000000 - 0x00007FFFFFFFFFFF │
│ (hole) 0x0000800000000000 - 0xFFFF7FFFFFFFFFFF │
│ Kernel space: 0xFFFF800000000000 - 0xFFFFFFFFFFFFFFFF │
│ │
└────────────────────────────────────────────────────────────────────────┘
Single-Level Page Tables
The simplest translation: one big table indexed by VPN:
┌────────────────────────────────────────────────────────────────────────┐
│ SINGLE-LEVEL PAGE TABLE │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ Virtual Address │
│ ┌──────────────────────────────┬─────────────────┐ │
│ │ VPN (Virtual Page Number) │ Page Offset │ │
│ └──────────────────────────────┴─────────────────┘ │
│ │ │ │
│ │ │ │
│ ▼ │ │
│ ┌─────────────────────┐ │ │
│ │ PAGE TABLE │ │ │
│ ├─────────────────────┤ │ │
│ │ Entry 0: PFN=3, V=1 │ │ │
│ │ Entry 1: PFN=7, V=1 │ │ │
│ │ Entry 2: V=0 (inv) │←── VPN=2 ──┤ │
│ │ Entry 3: PFN=1, V=1 │ │ │
│ │ Entry 4: PFN=5, V=1 │ │ │
│ │ Entry 5: V=0 (inv) │ │ │
│ │ ... │ │ │
│ └─────────────────────┘ │ │
│ │ │ │
│ │ (if valid) │ │
│ ▼ ▼ │
│ Physical Address │
│ ┌──────────────────────────────┬─────────────────┐ │
│ │ PFN (Physical Frame Number) │ Page Offset │ │
│ │ (from page table entry) │ (unchanged!) │ │
│ └──────────────────────────────┴─────────────────┘ │
│ │
│ PAGE TABLE ENTRY (PTE) structure: │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ Present │ R/W │ User │ PWT │ PCD │ Acc │ Dirty │ ... │ PFN │ │
│ │ (V) │ │/Super│ │ │ │ │ │ │ │
│ │ 1 bit │1 bit│1 bit │1 bit│1 bit│1 bit│ 1 bit │ │ N bits │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ Key bits: │
│ • Present (V): 1 = page in memory, 0 = not present (page fault!) │
│ • R/W: 1 = read/write, 0 = read-only │
│ • User/Super: 1 = user accessible, 0 = kernel only │
│ • Accessed: Set by CPU when page is read │
│ • Dirty: Set by CPU when page is written │
│ • PFN: Physical frame number where this page lives │
│ │
│ PROBLEM: For 32-bit address space with 4KB pages: │
│ 2^20 entries × 4 bytes/entry = 4 MB per process! │
│ │
│ For 64-bit (48-bit used): 2^36 entries × 8 bytes = 512 GB! │
│ (Obviously impossible to allocate) │
│ │
└────────────────────────────────────────────────────────────────────────┘
Multi-Level Page Tables
The solution: hierarchical page tables that only allocate what’s needed:
┌────────────────────────────────────────────────────────────────────────┐
│ TWO-LEVEL PAGE TABLE (32-bit x86) │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ Virtual Address (32 bits, 4KB pages): │
│ │
│ 31 22 21 12 11 0 │
│ ┌───────────┬───────────┬─────────────────────┐ │
│ │ Dir Index │ Tbl Index │ Page Offset │ │
│ │ 10 bits │ 10 bits │ 12 bits │ │
│ │ (0-1023) │ (0-1023) │ (0-4095) │ │
│ └───────────┴───────────┴─────────────────────┘ │
│ │ │ │ │
│ │ │ │ │
│ ▼ │ │ │
│ ┌─────────┐ │ │ │
│ │ Page │ │ │ │
│ │Directory│ │ │ │
│ ├─────────┤ │ │ │
│ │Entry 0 ─┼─────────┐ │ │
│ │Entry 1 ─┼────┐ │ │ │
│ │Entry 2 │ │ │ │ │
│ │ ... │ │ │ │ │
│ │Entry N ─┼──┐ │ │ │ │
│ └─────────┘ │ │ │ │ │
│ (4 KB) │ │ │ │ │
│ │ │ │ │ │
│ ┌─────┘ │ └───────────┼───────────┐ │
│ │ │ │ │ │
│ ▼ ▼ ▼ │ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │
│ │ Page │ │ Page │ │ Page │ │ │
│ │ Table 0 │ │ Table 1 │ │ Table N │ │ │
│ ├─────────┤ ├─────────┤ ├─────────┤ │ │
│ │Entry 0 │ │Entry 0 │ │Entry 0 ─┼──┐ │ │
│ │Entry 1 │ │Entry 1 │ │Entry 1 │ │ │ │
│ │ ... │ │ ... │ │ ... │ │ │ │
│ └─────────┘ └─────────┘ └─────────┘ │ │ │
│ (4 KB) (4 KB) (4 KB) │ │ │
│ │ │ │
│ ▼ │ │
│ Physical Address │ │
│ ┌──────────────┬─────────┴─┐ │
│ │ PFN │ Offset │ │
│ │ (from PTE) │(unchanged)│ │
│ └──────────────┴───────────┘ │
│ │
│ ADVANTAGES: │
│ • Only allocate page tables for regions actually used │
│ • Most processes use tiny fraction of address space │
│ • Sparse allocation saves huge amounts of memory │
│ │
│ EXAMPLE: Process using only stack (top) and code (bottom) │
│ • Need: 1 page directory + 2 page tables = 12 KB │
│ • Instead of: 4 MB for full single-level table │
│ │
└────────────────────────────────────────────────────────────────────────┘
Four-Level Page Tables (x86-64)
Modern 64-bit systems use four levels:
┌────────────────────────────────────────────────────────────────────────┐
│ FOUR-LEVEL PAGE TABLE (x86-64) │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ Virtual Address (48 bits used of 64): │
│ │
│ 63 48 47 39 38 30 29 21 20 12 11 0 │
│ ┌────────┬────────┬────────┬────────┬────────┬──────────────────┐ │
│ │ Sign │ PML4 │ PDPT │ PD │ PT │ Page Offset │ │
│ │ Extend │ Index │ Index │ Index │ Index │ 12 bits │ │
│ │ │ 9 bits │ 9 bits │ 9 bits │ 9 bits │ │ │
│ └────────┴────────┴────────┴────────┴────────┴──────────────────┘ │
│ │ │ │ │ │ │
│ │ │ │ │ │ │
│ ▼ │ │ │ │ │
│ ┌────────┐ │ │ │ │ │
│ CR3─→│ PML4 │ │ │ │ │ │
│ │ Table │ │ │ │ │ │
│ ├────────┤ │ │ │ │ │
│ │Entry 0 │ │ │ │ │ │
│ │Entry 1─┼───┘ │ │ │ │
│ │ ... │ │ │ │ │
│ │Entry511│ │ │ │ │
│ └────────┘ │ │ │ │
│ (4 KB) │ │ │ │
│ │ │ │ │ │
│ ▼ ▼ │ │ │
│ ┌────────┐ ┌────────┐ │ │ │
│ │ PDPT │ │ Page │ │ │ │
│ │ Table │ │Directory│ │ │ │
│ ├────────┤ │ Ptr Tbl│ │ │ │
│ │ ... │ ├────────┤ │ │ │
│ │Entry──┼────→ │Entry──┼───┘ │ │
│ │ ... │ │ ... │ │ │
│ └────────┘ └────────┘ │ │
│ │ │ │
│ ▼ │ │
│ ┌────────┐ │ │
│ │ Page │ │ │
│ │Directory│ │ │
│ ├────────┤ │ │
│ │Entry──┼───────────────────┘ │
│ │ ... │ │
│ └────────┘ │
│ │ │
│ ▼ │
│ ┌────────┐ │
│ │ Page │ │
│ │ Table │ │
│ ├────────┤ │
│ │Entry──┼──→ PFN + Offset = Physical Addr │
│ │ ... │ │
│ └────────┘ │
│ │
│ NAMES: │
│ • PML4: Page Map Level 4 (512 entries, points to PDPTs) │
│ • PDPT: Page Directory Pointer Table (points to PDs) │
│ • PD: Page Directory (points to PTs) │
│ • PT: Page Table (points to physical frames) │
│ │
│ CR3 register holds physical address of current process's PML4 │
│ Context switch = change CR3 = change entire address space │
│ │
│ COST: Up to 4 memory accesses per address translation! │
│ (This is why TLB is essential) │
│ │
└────────────────────────────────────────────────────────────────────────┘
The Translation Lookaside Buffer (TLB)
The TLB is a small, fast cache of recent page table lookups:
┌────────────────────────────────────────────────────────────────────────┐
│ TRANSLATION LOOKASIDE BUFFER (TLB) │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ WHY TLB IS ESSENTIAL: │
│ ──────────────────── │
│ │
│ Without TLB: Every memory access requires 4 additional accesses │
│ (one for each page table level) │
│ Effective memory latency = 5× actual latency! │
│ │
│ With TLB: Most accesses hit in TLB (typically 95-99%) │
│ Only TLB misses pay the page table walk cost │
│ │
│ │
│ TLB STRUCTURE: │
│ ────────────── │
│ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ TLB (64-512 entries typical) │ │
│ ├──────────────────────────────────────────────────────────────────┤ │
│ │ Valid │ ASID │ VPN │ PFN │ Permissions │ │
│ ├───────┼──────┼────────────────┼────────────────┼─────────────────┤ │
│ │ 1 │ 12 │ 0x00007FFF1234 │ 0x0000000ABCD │ R/W, User │ │
│ │ 1 │ 12 │ 0x00007FFF1235 │ 0x0000000BCDE │ R/W, User │ │
│ │ 1 │ 12 │ 0x0000004001 │ 0x0000000DEF0 │ R/X, User │ │
│ │ 1 │ 7 │ 0x00007FFF1234 │ 0x000000012AB │ R/W, User │ │
│ │ 0 │ - │ - │ - │ - │ │
│ │ 1 │ 12 │ 0xFFFF8001234 │ 0x00000001234 │ R/W, Kernel │ │
│ │ ... │ ... │ ... │ ... │ ... │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │
│ ASID (Address Space ID): │
│ • Tags entries by process to avoid flushing on context switch │
│ • Without ASID: must flush TLB on every context switch │
│ • With ASID: old entries remain valid, just won't match │
│ │
│ │
│ TLB LOOKUP FLOW: │
│ ──────────────── │
│ │
│ Virtual Address │
│ │ │
│ ▼ │
│ ┌───────────────────────┐ │
│ │ Extract VPN + ASID │ │
│ └───────────────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────────┐ │
│ │ Search TLB for match │ │
│ │ (VPN + ASID must both │ │
│ │ match valid entry) │ │
│ └───────────────────────┘ │
│ │ │ │
│ HIT! │ │ MISS │
│ ▼ ▼ │
│ ┌─────────────────┐ ┌─────────────────────┐ │
│ │ Return PFN from │ │ Walk page tables │ │
│ │ TLB entry │ │ (4 memory accesses) │ │
│ │ (~1 cycle) │ │ (100s of cycles) │ │
│ └─────────────────┘ └─────────────────────┘ │
│ │ │ │
│ │ ▼ │
│ │ ┌─────────────────────┐ │
│ │ │ Insert into TLB │ │
│ │ │ (may evict old) │ │
│ │ └─────────────────────┘ │
│ │ │ │
│ └──────────────────┘ │
│ │ │
│ ▼ │
│ Physical Address │
│ │
└────────────────────────────────────────────────────────────────────────┘
TLB Replacement Policies
When TLB is full, which entry to evict?
┌────────────────────────────────────────────────────────────────────────┐
│ TLB REPLACEMENT POLICIES │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ FULLY ASSOCIATIVE TLB (most common for small TLBs): │
│ ───────────────────────────────────────────────── │
│ │
│ Any VPN can go in any TLB slot │
│ Search: Must check ALL entries in parallel (hardware) │
│ Good for small TLBs (32-64 entries) │
│ │
│ SET-ASSOCIATIVE TLB (for larger TLBs): │
│ ────────────────────────────────────── │
│ │
│ VPN determines which "set" to search │
│ Only need to check entries in that set │
│ Example: 4-way set-associative, 256 entries = 64 sets × 4 ways │
│ │
│ │
│ REPLACEMENT ALGORITHMS: │
│ ─────────────────────── │
│ │
│ RANDOM: │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ Pick random entry to evict │ │
│ │ + Very simple to implement in hardware │ │
│ │ + No state to maintain │ │
│ │ - May evict frequently-used entry │ │
│ │ Used in: Many real TLBs (ARM, MIPS) │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ │
│ LRU (Least Recently Used): │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ Evict entry unused for longest time │ │
│ │ + Optimal-like for temporal locality │ │
│ │ - Expensive to track in hardware │ │
│ │ - Need N-way comparator for N entries │ │
│ │ Used in: Some x86 TLBs (approximated with pseudo-LRU) │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ │
│ FIFO (First In First Out): │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ Evict oldest entry │ │
│ │ + Simple: just maintain circular pointer │ │
│ │ - Ignores access patterns │ │
│ │ - Can evict hot entry just because it's old │ │
│ │ Rarely used for TLB, common for page replacement │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ │
│ │
│ TLB SHOOTDOWN (Multi-core complication): │
│ ──────────────────────────────────────── │
│ │
│ When page table entry changes: │
│ 1. Must invalidate TLB entry on ALL cores that might have it │
│ 2. Send Inter-Processor Interrupt (IPI) to other cores │
│ 3. Each core flushes its TLB entry │
│ 4. Original core waits for acknowledgment │
│ │
│ This is expensive! Minimize page table changes. │
│ │
└────────────────────────────────────────────────────────────────────────┘
Page Faults and Demand Paging
A page fault occurs when the CPU accesses a page that isn’t in physical memory:
┌────────────────────────────────────────────────────────────────────────┐
│ PAGE FAULT HANDLING │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ TYPES OF PAGE FAULTS: │
│ ──────────────────── │
│ │
│ 1. MINOR FAULT (Soft fault): │
│ Page is in memory, but PTE not set up yet │
│ Example: Shared library page in another process's page table │
│ Cost: ~1000-5000 cycles (just update page tables) │
│ │
│ 2. MAJOR FAULT (Hard fault): │
│ Page must be loaded from disk │
│ Cost: ~10,000,000 cycles (milliseconds) │
│ 10,000× slower than memory access! │
│ │
│ 3. INVALID ACCESS (Segmentation fault): │
│ Access to unmapped address or permission violation │
│ Result: SIGSEGV, process terminated │
│ │
│ │
│ PAGE FAULT HANDLING FLOW: │
│ ──────────────────────── │
│ │
│ CPU attempts memory access │
│ │ │
│ ▼ │
│ ┌─────────────────┐ │
│ │ TLB lookup │ │
│ └─────────────────┘ │
│ │ Miss │
│ ▼ │
│ ┌─────────────────┐ │
│ │ Page table walk │ │
│ └─────────────────┘ │
│ │ Present bit = 0 │
│ ▼ │
│ ╔═════════════════════════════╗ │
│ ║ PAGE FAULT! ║ │
│ ║ CPU raises exception ║ │
│ ╚═════════════════════════════╝ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────┐ │
│ │ OS PAGE FAULT HANDLER │ │
│ ├─────────────────────────────────────────┤ │
│ │ 1. Check if access is valid │ │
│ │ - Is address in valid VMA? │ │
│ │ - Are permissions correct? │ │
│ │ │ │
│ │ If invalid: Send SIGSEGV ─────────────→ Process killed │
│ │ │ │
│ │ 2. If valid, need to bring in page: │ │
│ │ - Find free frame (or evict one) │ │
│ │ - Read page from disk/swap │ │
│ │ - Update page table entry │ │
│ │ - Flush TLB entry │ │
│ │ │ │
│ │ 3. Return from exception │ │
│ │ - CPU re-executes faulting instr │ │
│ └─────────────────────────────────────────┘ │
│ │
│ │
│ DEMAND PAGING: │
│ ────────────── │
│ │
│ Key insight: Don't load pages until accessed! │
│ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ Program starts: │ │
│ │ All pages marked invalid (present=0) │ │
│ │ No physical frames allocated yet │ │
│ │ │ │
│ │ First instruction fetch → page fault → load code page │ │
│ │ First stack access → page fault → allocate stack page │ │
│ │ First data access → page fault → load data page │ │
│ │ │ │
│ │ Benefits: │ │
│ │ • Fast startup (don't load entire program) │ │
│ │ • Less memory used (unused pages never loaded) │ │
│ │ • Programs can be larger than physical memory │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │
└────────────────────────────────────────────────────────────────────────┘
Page Replacement Algorithms
When physical memory is full, which page should be evicted?
┌────────────────────────────────────────────────────────────────────────┐
│ PAGE REPLACEMENT ALGORITHMS │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ THE OPTIMAL ALGORITHM (OPT/MIN): │
│ ──────────────────────────────── │
│ │
│ Replace the page that won't be used for longest time in future │
│ │
│ Problem: Requires knowing future (impossible!) │
│ Use: Benchmark to compare other algorithms against │
│ │
│ │
│ FIFO (First-In-First-Out): │
│ ────────────────────────── │
│ │
│ Replace the page that has been in memory longest │
│ │
│ Implementation: │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ Frame Queue (circular buffer): │ │
│ │ │ │
│ │ ┌─────┬─────┬─────┬─────┐ │ │
│ │ │ P3 │ P5 │ P2 │ P7 │ ← hand points to oldest │ │
│ │ └─────┴─────┴─────┴─────┘ │ │
│ │ ↑ │ │
│ │ hand (next victim) │ │
│ │ │ │
│ │ On page fault: │ │
│ │ 1. Evict page at hand position │ │
│ │ 2. Load new page into that frame │ │
│ │ 3. Advance hand │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ Pros: Simple, low overhead │
│ Cons: Ignores access patterns (may evict hot page) │
│ Belady's anomaly (more frames can mean more faults!) │
│ │
│ │
│ LRU (Least Recently Used): │
│ ────────────────────────── │
│ │
│ Replace the page that hasn't been used for longest time │
│ │
│ Implementation approaches: │
│ │
│ 1. TIMESTAMP: │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ Page │ Frame │ Last Access Time │ │
│ │ ──── │ ───── │ ──────────────── │ │
│ │ P3 │ 0 │ 1000 │ │
│ │ P5 │ 1 │ 2500 ← most recent │ │
│ │ P2 │ 2 │ 800 ← oldest = victim │ │
│ │ P7 │ 3 │ 1200 │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ On fault: Find page with smallest timestamp (O(n) scan) │
│ On access: Update timestamp │
│ │
│ 2. STACK (doubly-linked list): │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ MRU ←── P5 ←→ P7 ←→ P3 ←→ P2 ──→ LRU │ │
│ │ ↑ ↑ │ │
│ │ (front) (back = victim) │ │
│ │ │ │
│ │ On access: Move page to front (O(1) with hash table) │ │
│ │ On fault: Evict from back (O(1)) │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ Pros: Good approximation of optimal │
│ Cons: Expensive to implement perfectly in hardware │
│ │
│ │
│ CLOCK (Second Chance): │
│ ───────────────────── │
│ │
│ Approximation of LRU using reference bits │
│ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ ┌─────┐ │ │
│ │ │P3:1 │ (reference bit = 1, recently used) │ │
│ │ ┌──┴─────┴──┐ │ │
│ │ ┌────┤ ├────┐ │ │
│ │ │P7:0│ hand │P5:1│ │ │
│ │ │ │ ↓ │ │ │ │
│ │ └────┤ ├────┘ │ │
│ │ └──┬─────┬──┘ │ │
│ │ │P2:0 │ (reference bit = 0, not recently used) │ │
│ │ └─────┘ │ │
│ │ │ │
│ │ ALGORITHM: │ │
│ │ 1. Check page at hand position │ │
│ │ 2. If reference bit = 0: EVICT this page │ │
│ │ 3. If reference bit = 1: │ │
│ │ - Clear reference bit (set to 0) │ │
│ │ - Give it a "second chance" │ │
│ │ - Advance hand, go to step 1 │ │
│ │ │ │
│ │ Hardware automatically sets reference bit on access! │ │
│ │ OS periodically clears reference bits. │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ Pros: Approximates LRU with just 1 bit per page │
│ Cons: Not as accurate as true LRU │
│ │
└────────────────────────────────────────────────────────────────────────┘
Enhanced Clock (NRU - Not Recently Used)
Uses both reference and dirty bits:
┌────────────────────────────────────────────────────────────────────────┐
│ ENHANCED CLOCK / NRU ALGORITHM │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ Each page has two bits: │
│ • R (Reference): Set by hardware on read or write │
│ • D (Dirty/Modified): Set by hardware on write │
│ │
│ PRIORITY CLASSES (prefer to evict lower class): │
│ │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ Class │ R │ D │ Meaning │ │
│ ├───────┼─────┼─────┼────────────────────────────────────────────┤ │
│ │ 0 │ 0 │ 0 │ Not referenced, not modified (best victim) │ │
│ │ 1 │ 0 │ 1 │ Not referenced, modified (need writeback) │ │
│ │ 2 │ 1 │ 0 │ Referenced, not modified (might use again) │ │
│ │ 3 │ 1 │ 1 │ Referenced, modified (worst victim) │ │
│ └────────────────────────────────────────────────────────────────┘ │
│ │
│ ALGORITHM: │
│ 1. First pass: Look for class 0 (R=0, D=0) │
│ Don't clear any bits │
│ │
│ 2. Second pass: Look for class 1 (R=0, D=1) │
│ Clear R bits as you go │
│ │
│ 3. Third pass: Look for class 0 (R=0, D=0) │
│ (Some pages now have R=0 because we cleared them) │
│ │
│ 4. Fourth pass: Look for class 1 (R=0, D=1) │
│ Must find something! │
│ │
│ │
│ WHY PREFER CLEAN OVER DIRTY? │
│ ──────────────────────────── │
│ │
│ Clean page: Just evict, no I/O needed │
│ Dirty page: Must write to disk first (slow!) │
│ │
│ Cost: Clean eviction: ~0 ms (just invalidate PTE) │
│ Dirty eviction: ~10 ms (disk write required) │
│ │
└────────────────────────────────────────────────────────────────────────┘
Working Set Model
The working set is the set of pages a process actively uses:
┌────────────────────────────────────────────────────────────────────────┐
│ WORKING SET MODEL │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ DEFINITION: │
│ W(t, Δ) = set of pages referenced in time interval [t-Δ, t] │
│ │
│ Working set size varies over time: │
│ │
│ Pages in │
│ working set │
│ │ │
│ 20 │ ╭──────╮ │
│ │ ╱ ╲ │
│ 15 │ ╭────╯ ╲ ╭────╮ │
│ │ ╱ ╲ ╱ ╲ │
│ 10 │───╯ ╲╱ ╲─────── │
│ │ │
│ 5 │ │
│ │ │
│ 0 └─────────────────────────────────────────────────────→ Time │
│ │ │ │ │ │ │
│ └──init─┴─phase1─┴─trans─┴─phase2─┴─ │
│ │
│ PHASES: │
│ • Init: Loading, small working set │
│ • Phase 1: Processing data set A │
│ • Trans: Transition period (cache thrashing) │
│ • Phase 2: Processing data set B │
│ │
│ │
│ THRASHING: │
│ ────────── │
│ │
│ When working set > physical memory: │
│ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ Time: ────────────────────────────────────────────────→ │ │
│ │ │ │
│ │ CPU: [run][wait][wait][wait][run][wait][wait][wait][run]... │ │
│ │ │ │ │ │ │ │ │ │
│ │ Page faults, waiting for disk I/O │ │
│ │ │ │
│ │ Symptom: CPU utilization drops dramatically │ │
│ │ Disk constantly busy with paging │ │
│ │ System becomes unresponsive │ │
│ │ │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │
│ SOLUTION: Keep working set in memory! │
│ • Working set page replacement: Only evict pages outside working set │
│ • Load control: Reduce multiprogramming if thrashing detected │
│ │
└────────────────────────────────────────────────────────────────────────┘
Project Specification
What You Will Build
A command-line virtual memory simulator (vmsim) that:
- Reads address traces from files
- Simulates multi-level page tables
- Implements a TLB with configurable size and policy
- Implements FIFO, LRU, and Clock page replacement
- Tracks dirty pages and writebacks
- Produces detailed statistics
Functional Requirements
┌────────────────────────────────────────────────────────────────────────┐
│ FUNCTIONAL REQUIREMENTS │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ COMMAND LINE INTERFACE: │
│ ─────────────────────── │
│ │
│ ./vmsim [OPTIONS] <trace_file> │
│ │
│ Required options: │
│ --frames <n> Number of physical frames │
│ --page-size <bytes> Page size in bytes (must be power of 2) │
│ --replace <policy> Replacement policy: fifo, lru, clock │
│ │
│ Optional: │
│ --tlb <n> TLB entries (0 = no TLB, default) │
│ --tlb-policy <p> TLB replacement: fifo, lru (default: fifo) │
│ --levels <n> Page table levels (1-4, default: 1) │
│ --verbose Print each address translation │
│ --stats-interval <n> Print stats every N accesses │
│ │
│ TRACE FILE FORMAT: │
│ ────────────────── │
│ │
│ Each line: <access_type> <hex_address> │
│ │
│ Access types: │
│ R - Read access │
│ W - Write access (marks page dirty) │
│ │
│ Example trace file: │
│ ┌─────────────────────────────────────┐ │
│ │ R 0x00007FFF12345678 │ │
│ │ W 0x00007FFF12345680 │ │
│ │ R 0x0000000000401000 │ │
│ │ R 0x0000000000401008 │ │
│ │ W 0x00007FFF12345700 │ │
│ │ R 0x00007FFF12346000 │ (new page) │
│ │ ... │ │
│ └─────────────────────────────────────┘ │
│ │
│ │
│ OUTPUT REQUIREMENTS: │
│ ──────────────────── │
│ │
│ Summary statistics (always printed): │
│ • Total memory accesses │
│ • TLB hits / misses / hit rate │
│ • Page hits / page faults / fault rate │
│ • Writebacks (dirty page evictions) │
│ │
│ Verbose output (with --verbose): │
│ • Each access: VPN, result (TLB hit/miss, page hit/fault) │
│ • On eviction: which page evicted, was it dirty │
│ │
└────────────────────────────────────────────────────────────────────────┘
Example Output
$ ./vmsim --frames=4 --page-size=4096 --replace=clock --tlb=8 trace.txt
Virtual Memory Simulator
========================
Configuration:
Page size: 4096 bytes (12-bit offset)
Physical frames: 4
TLB entries: 8
Page replacement: CLOCK
TLB replacement: FIFO
Processing trace: trace.txt
Results:
========
Total accesses: 100000
TLB Statistics:
TLB hits: 87234 (87.23%)
TLB misses: 12766 (12.77%)
Page Table Statistics:
Page hits: 99012 (99.01%)
Page faults: 988 (0.99%)
Writeback Statistics:
Dirty evictions: 234
Clean evictions: 754
$ ./vmsim --frames=4 --page-size=4096 --replace=lru --verbose trace_small.txt
Virtual Memory Simulator
========================
Configuration:
Page size: 4096 bytes (12-bit offset)
Physical frames: 4
TLB entries: 0 (disabled)
Page replacement: LRU
Processing trace: trace_small.txt
Access Type Address VPN TLB Page Frame
------ ---- ------- --- --- ---- -----
1 R 0x0000000000001234 0x00001 - FAULT -> 0
2 W 0x0000000000002100 0x00002 - FAULT -> 1
3 R 0x0000000000001500 0x00001 - HIT @ 0
4 R 0x0000000000003000 0x00003 - FAULT -> 2
5 R 0x0000000000004ABC 0x00004 - FAULT -> 3
6 R 0x0000000000005000 0x00005 - FAULT -> 0 (evict VPN 0x00001, clean)
7 W 0x0000000000002200 0x00002 - HIT @ 1
8 R 0x0000000000003010 0x00003 - HIT @ 2
9 R 0x0000000000006000 0x00006 - FAULT -> 3 (evict VPN 0x00004, clean)
10 W 0x0000000000005100 0x00005 - HIT @ 0
Results:
========
Total accesses: 10
Page hits: 4 (40.00%)
Page faults: 6 (60.00%)
Dirty evictions: 0
Clean evictions: 2
Verification: Hand-Trace Example
To verify your implementation, trace this example by hand:
┌────────────────────────────────────────────────────────────────────────┐
│ HAND-TRACE EXERCISE │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ Config: 3 frames, 4KB pages, FIFO replacement │
│ │
│ Trace: │
│ 1. R 0x00001000 (VPN=1) │
│ 2. R 0x00002000 (VPN=2) │
│ 3. W 0x00003000 (VPN=3) │
│ 4. R 0x00001100 (VPN=1) │
│ 5. R 0x00004000 (VPN=4) <- frame table full! │
│ 6. W 0x00002100 (VPN=2) │
│ 7. R 0x00005000 (VPN=5) │
│ 8. R 0x00003100 (VPN=3) │
│ │
│ Step-by-step (your implementation should match): │
│ │
│ Access VPN Frame Table State Result │
│ ────── ─── ─────────────────────── ────────────────────────── │
│ 1 1 [1,-,-] FAULT, load into frame 0 │
│ 2 2 [1,2,-] FAULT, load into frame 1 │
│ 3 3 [1,2,3*] FAULT, load into frame 2 │
│ (* = dirty) (write marks dirty) │
│ 4 1 [1,2,3*] HIT (VPN 1 in frame 0) │
│ 5 4 [4,2,3*] FAULT, evict VPN 1 (FIFO) │
│ (VPN 1 was clean, no WB) │
│ 6 2 [4,2*,3*] HIT (VPN 2 in frame 1) │
│ (write marks dirty) │
│ 7 5 [4,5,3*] FAULT, evict VPN 2 │
│ WRITEBACK! (VPN 2 dirty) │
│ 8 3 [4,5,3*] HIT (VPN 3 in frame 2) │
│ │
│ EXPECTED OUTPUT: │
│ Total accesses: 8 │
│ Page faults: 5 │
│ Page hits: 3 │
│ Writebacks: 1 │
│ │
└────────────────────────────────────────────────────────────────────────┘
Solution Architecture
High-Level Design
┌────────────────────────────────────────────────────────────────────────┐
│ SIMULATOR ARCHITECTURE │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌───────────────────┐ │
│ │ Trace Reader │ │
│ │ (parse file) │ │
│ └─────────┬─────────┘ │
│ │ {type, address} │
│ ▼ │
│ ┌───────────────────┐ │
│ │ Address Decoder │ │
│ │ (extract VPN, │ │
│ │ offset) │ │
│ └─────────┬─────────┘ │
│ │ {VPN, offset, type} │
│ ▼ │
│ ┌──────────────────────────────────────────┐ │
│ │ TLB LOOKUP │ │
│ │ ┌─────────────────────────────────────┐ │ │
│ │ │ TLB: [VPN → PFN] cache │ │ │
│ │ └─────────────────────────────────────┘ │ │
│ └──────────────┬─────────────┬─────────────┘ │
│ HIT │ │ MISS │
│ │ │ │
│ ┌──────────────┘ └──────────────┐ │
│ │ │ │
│ │ ┌──────────────▼──────────────┐ │
│ │ │ PAGE TABLE WALK │ │
│ │ │ ┌────────────────────────┐ │ │
│ │ │ │ Page Table: VPN → PTE │ │ │
│ │ │ │ (sparse hash map or │ │ │
│ │ │ │ multi-level array) │ │ │
│ │ │ └────────────────────────┘ │ │
│ │ └──────────────┬──────────────┘ │
│ │ FOUND│ │NOT FOUND │
│ │ │ │ │
│ │ │ ▼ │
│ │ ┌──────────────┘ PAGE FAULT │
│ │ │ │ │
│ │ │ ▼ │
│ │ │ ┌───────────────────────┐ │
│ │ │ │ PAGE REPLACEMENT │ │
│ │ │ │ ┌─────────────────┐ │ │
│ │ │ │ │ Frame Table: │ │ │
│ │ │ │ │ [PFN → {VPN, │ │ │
│ │ │ │ │ valid, dirty, │ │ │
│ │ │ │ │ ref_bit}] │ │ │
│ │ │ │ └─────────────────┘ │ │
│ │ │ │ │ │
│ │ │ │ FIFO / LRU / Clock │ │
│ │ │ └───────────┬───────────┘ │
│ │ │ │ │
│ │ │ ▼ │
│ │ │ ┌───────────────────────┐ │
│ │ │ │ If evicted dirty: │ │
│ │ │ │ writeback_count++ │ │
│ │ │ │ Allocate frame │ │
│ │ │ │ Update page table │ │
│ │ │ │ Update TLB │ │
│ │ │ └───────────┬───────────┘ │
│ │ │ │ │
│ └───────────────┬────────────┴──────────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────┐ │
│ │ Update Stats │ │
│ │ - TLB hit/miss │ │
│ │ - page hit/fault │ │
│ │ - writebacks │ │
│ └───────────────────┘ │
│ │
└────────────────────────────────────────────────────────────────────────┘
Data Structures
/* ==================== CONSTANTS ==================== */
#define MAX_TLB_ENTRIES 256
#define MAX_FRAMES 65536
/* ==================== TYPE DEFINITIONS ==================== */
typedef enum { ACCESS_READ, ACCESS_WRITE } access_type_t;
typedef enum { REPLACE_FIFO, REPLACE_LRU, REPLACE_CLOCK } replace_policy_t;
/* ==================== TLB ENTRY ==================== */
typedef struct {
uint64_t vpn; /* Virtual page number */
uint32_t pfn; /* Physical frame number */
uint8_t valid; /* Entry is valid */
uint8_t dirty; /* Page has been written */
uint8_t ref_bit; /* Reference bit (for Clock) */
uint64_t last_access; /* Timestamp for LRU */
} tlb_entry_t;
/* ==================== PAGE TABLE ENTRY ==================== */
typedef struct {
uint32_t pfn; /* Physical frame number (if present) */
uint8_t present; /* Page is in physical memory */
uint8_t dirty; /* Page has been modified */
uint8_t ref_bit; /* Referenced since last check */
} page_table_entry_t;
/* ==================== FRAME TABLE ENTRY ==================== */
typedef struct {
uint64_t vpn; /* VPN currently in this frame */
uint8_t valid; /* Frame is in use */
uint8_t dirty; /* Page has been modified */
uint8_t ref_bit; /* Reference bit for Clock */
uint64_t load_time; /* When page was loaded (FIFO) */
uint64_t last_access; /* Last access time (LRU) */
} frame_entry_t;
/* ==================== TLB ==================== */
typedef struct {
tlb_entry_t entries[MAX_TLB_ENTRIES];
int num_entries; /* Configured TLB size */
int clock_hand; /* For Clock replacement */
replace_policy_t policy;
} tlb_t;
/* ==================== PAGE TABLE ==================== */
/* For single-level: use hash table for sparse storage */
typedef struct {
page_table_entry_t *entries; /* Hash table or array */
size_t capacity;
int num_levels;
} page_table_t;
/* ==================== FRAME TABLE ==================== */
typedef struct {
frame_entry_t *frames;
int num_frames;
int clock_hand; /* For Clock replacement */
int fifo_head; /* For FIFO replacement */
replace_policy_t policy;
} frame_table_t;
/* ==================== STATISTICS ==================== */
typedef struct {
uint64_t total_accesses;
uint64_t tlb_hits;
uint64_t tlb_misses;
uint64_t page_hits;
uint64_t page_faults;
uint64_t writebacks;
uint64_t clean_evictions;
} stats_t;
/* ==================== SIMULATOR STATE ==================== */
typedef struct {
/* Configuration */
int page_size; /* Bytes per page */
int offset_bits; /* log2(page_size) */
int num_frames;
int tlb_entries;
int page_table_levels;
replace_policy_t page_replace_policy;
replace_policy_t tlb_replace_policy;
/* Components */
tlb_t tlb;
page_table_t page_table;
frame_table_t frame_table;
/* Statistics */
stats_t stats;
/* Timing (for LRU) */
uint64_t current_time;
} vmsim_t;
Address Translation Flow
┌────────────────────────────────────────────────────────────────────────┐
│ ADDRESS TRANSLATION PSEUDOCODE │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ translate_address(vaddr, access_type): │
│ vpn = vaddr >> offset_bits │
│ offset = vaddr & ((1 << offset_bits) - 1) │
│ stats.total_accesses++ │
│ │
│ // Step 1: Check TLB │
│ tlb_result = tlb_lookup(vpn) │
│ if (tlb_result.hit): │
│ stats.tlb_hits++ │
│ pfn = tlb_result.pfn │
│ if (access_type == WRITE): │
│ tlb_set_dirty(vpn) │
│ frame_set_dirty(pfn) │
│ return (pfn << offset_bits) | offset │
│ │
│ // TLB miss │
│ stats.tlb_misses++ │
│ │
│ // Step 2: Walk page table │
│ pte = page_table_lookup(vpn) │
│ if (pte != NULL && pte.present): │
│ // Page hit │
│ stats.page_hits++ │
│ pfn = pte.pfn │
│ tlb_insert(vpn, pfn) // Add to TLB │
│ frame_update_access(pfn) // Update LRU info │
│ if (access_type == WRITE): │
│ pte.dirty = 1 │
│ frame_set_dirty(pfn) │
│ return (pfn << offset_bits) | offset │
│ │
│ // Step 3: Page fault! │
│ stats.page_faults++ │
│ │
│ // Find or create a frame │
│ pfn = get_free_frame() │
│ if (pfn == -1): │
│ // No free frame, must evict │
│ victim = select_victim() // FIFO/LRU/Clock │
│ if (frame_table[victim].dirty): │
│ stats.writebacks++ │
│ else: │
│ stats.clean_evictions++ │
│ // Invalidate old mapping │
│ old_vpn = frame_table[victim].vpn │
│ page_table_invalidate(old_vpn) │
│ tlb_invalidate(old_vpn) │
│ pfn = victim │
│ │
│ // Set up new mapping │
│ frame_table[pfn].vpn = vpn │
│ frame_table[pfn].valid = 1 │
│ frame_table[pfn].dirty = (access_type == WRITE) │
│ frame_table[pfn].ref_bit = 1 │
│ frame_table[pfn].load_time = current_time │
│ frame_table[pfn].last_access = current_time │
│ │
│ page_table_insert(vpn, pfn) │
│ tlb_insert(vpn, pfn) │
│ │
│ return (pfn << offset_bits) | offset │
│ │
└────────────────────────────────────────────────────────────────────────┘
Implementation Guide
The Core Question You’re Answering
“When my program accesses memory address 0x7FFF12345678, what actually happens? How does the CPU find the right physical memory location, and what happens if that memory isn’t available?”
This project makes the invisible visible. You’ll see exactly how:
- Virtual addresses get split into VPN and offset
- The TLB accelerates translation (and what happens on miss)
- Multi-level page tables save memory while supporting huge address spaces
- Page faults trigger the OS to bring in pages from disk
- Different replacement algorithms affect performance dramatically
After completing this project, you’ll understand why a program can use more memory than physically exists, why context switches are expensive, and why memory access patterns matter so much for performance.
Concepts You Must Understand First
Before starting, verify you understand these concepts:
| Concept | Why It Matters | Where to Learn |
|---|---|---|
| Binary and hexadecimal | You’ll parse hex addresses and extract bit fields | CS:APP Ch 2 |
| Bitwise operations | Extract VPN/offset using shifts and masks | CS:APP Ch 2.1 |
| Caching concepts | TLB is a cache; locality matters | CS:APP Ch 6 |
| Memory hierarchy | Understand why TLB and page tables exist | CS:APP Ch 6.1-6.4 |
| Hash tables | For sparse page table implementation | Any algorithms book |
| Linked lists | For LRU implementation | Any algorithms book |
Self-Assessment Questions:
- How do you extract bits 12-31 from a 32-bit integer? (Answer:
(x >> 12) & 0xFFFFF) - What’s the difference between a cache hit and miss?
- Why does temporal locality help TLB performance?
- What’s the time complexity of hash table lookup?
Questions to Guide Your Design
Work through these BEFORE writing code:
Address Decoding:
- If page size is 4096 bytes, how many bits for the offset? (Answer: log2(4096) = 12)
- For a 64-bit address with 12-bit offset, how many bits for VPN?
- How will you handle different page sizes (4KB, 2MB)?
TLB Design:
- Will your TLB be fully associative (search all entries) or set-associative?
- How will you handle TLB eviction? FIFO or LRU?
- What fields does each TLB entry need?
Page Table Design:
- Single-level array or multi-level? What about hash table for sparse mapping?
- How big would a full single-level table be for 48-bit addresses?
- What happens when a page table entry is invalidated?
Frame Table:
- How do you track which VPN is in each frame?
- What additional metadata do you need for FIFO? LRU? Clock?
- How do you efficiently find a victim for eviction?
Replacement Algorithms:
- For FIFO: What data structure tracks insertion order?
- For LRU: How do you efficiently track access order? (Hint: timestamps vs linked list)
- For Clock: How do you implement the circular scan with reference bits?
Thinking Exercise
Before writing any code, trace through this scenario by hand:
Configuration:
- 4 physical frames
- 4KB pages (12-bit offset)
- Clock replacement algorithm
- TLB with 2 entries, FIFO replacement
Initial state: All frames empty, TLB empty
Trace:
R 0x00001000 (VPN=1)
R 0x00002100 (VPN=2)
W 0x00003200 (VPN=3)
R 0x00004300 (VPN=4)
R 0x00001400 (VPN=1) <- all frames full, VPN 1 already loaded
R 0x00005000 (VPN=5) <- must evict!
W 0x00002200 (VPN=2)
R 0x00006000 (VPN=6) <- must evict!
For each access, determine:
- TLB hit or miss?
- Page hit or fault?
- If fault, which page is evicted? Is writeback needed?
- What’s in the TLB after the access?
- What are the reference bits and dirty bits in the frame table?
Write out the complete state after each access before implementing.
The Interview Questions They’ll Ask
After completing this project, you’ll confidently answer:
- “Explain how virtual memory address translation works.”
- Expected: VPN extracted, TLB checked, page table walked if miss, physical address formed
- Bonus: Discuss multi-level page tables, why each level exists, CR3 register
- “What is a TLB and why is it necessary?”
- Expected: Cache of recent translations, avoids 4 memory accesses per translation
- Bonus: Discuss ASID, TLB shootdown on multi-core, typical hit rates (95-99%)
- “What happens on a page fault?”
- Expected: CPU exception, OS handler runs, may need to evict a page, load from disk
- Bonus: Distinguish minor vs major faults, explain demand paging
- “Compare FIFO, LRU, and Clock page replacement.”
- Expected: FIFO is simple but ignores access patterns; LRU is better but expensive; Clock approximates LRU cheaply
- Bonus: Discuss Belady’s anomaly, working sets, thrashing
- “Why can a process use more memory than physically available?”
- Expected: Virtual memory + demand paging + swap space
- Bonus: Discuss overcommit, OOM killer, memory pressure
- “How does the OS maintain page table consistency across CPUs?”
- Expected: TLB shootdown via IPI when page table changes
- Bonus: Discuss INVLPG instruction, batching shootdowns, ASID
Hints in Layers
If you’re stuck, reveal hints one at a time:
Hint 1: Basic Address Decoding
Start with the address breakdown:
/*
* Extract VPN and offset from virtual address
*/
void decode_address(uint64_t vaddr, int offset_bits,
uint64_t *vpn, uint32_t *offset) {
// Offset is the low bits
*offset = vaddr & ((1ULL << offset_bits) - 1);
// VPN is everything above the offset
*vpn = vaddr >> offset_bits;
}
// Example: 4KB pages (12-bit offset)
// Address 0x7FFF12345ABC:
// offset = 0xABC (low 12 bits)
// vpn = 0x7FFF12345 (remaining bits)
Test this thoroughly before proceeding!
Hint 2: TLB Implementation
For a fully-associative TLB:
typedef struct {
uint64_t vpn;
uint32_t pfn;
uint8_t valid;
uint8_t dirty;
uint64_t last_used; // For LRU
} tlb_entry_t;
int tlb_lookup(tlb_t *tlb, uint64_t vpn, uint32_t *pfn) {
for (int i = 0; i < tlb->num_entries; i++) {
if (tlb->entries[i].valid && tlb->entries[i].vpn == vpn) {
*pfn = tlb->entries[i].pfn;
tlb->entries[i].last_used = current_time++;
return 1; // Hit
}
}
return 0; // Miss
}
void tlb_insert(tlb_t *tlb, uint64_t vpn, uint32_t pfn) {
// First, check if already present (update)
for (int i = 0; i < tlb->num_entries; i++) {
if (tlb->entries[i].valid && tlb->entries[i].vpn == vpn) {
tlb->entries[i].pfn = pfn;
tlb->entries[i].last_used = current_time++;
return;
}
}
// Find invalid entry or evict
int victim = find_tlb_victim(tlb); // Implement based on policy
tlb->entries[victim].valid = 1;
tlb->entries[victim].vpn = vpn;
tlb->entries[victim].pfn = pfn;
tlb->entries[victim].dirty = 0;
tlb->entries[victim].last_used = current_time++;
}
Hint 3: Clock Algorithm for Page Replacement
The Clock algorithm is elegant. Here’s the core:
int clock_select_victim(frame_table_t *ft) {
while (1) {
frame_entry_t *frame = &ft->frames[ft->clock_hand];
if (!frame->valid) {
// Found free frame
return ft->clock_hand;
}
if (frame->ref_bit == 0) {
// Found victim (not recently used)
return ft->clock_hand;
}
// Give second chance: clear ref bit and move on
frame->ref_bit = 0;
ft->clock_hand = (ft->clock_hand + 1) % ft->num_frames;
}
}
// After selecting victim, advance the hand
void clock_advance(frame_table_t *ft) {
ft->clock_hand = (ft->clock_hand + 1) % ft->num_frames;
}
// On every memory access, set reference bit
void frame_access(frame_table_t *ft, uint32_t pfn) {
ft->frames[pfn].ref_bit = 1;
ft->frames[pfn].last_access = current_time++;
}
The key insight: Clearing ref_bit gives pages a “second chance” before eviction.
Hint 4: LRU with Timestamps
Simple LRU using timestamps:
int lru_select_victim(frame_table_t *ft) {
int victim = -1;
uint64_t oldest_time = UINT64_MAX;
for (int i = 0; i < ft->num_frames; i++) {
if (!ft->frames[i].valid) {
return i; // Free frame
}
if (ft->frames[i].last_access < oldest_time) {
oldest_time = ft->frames[i].last_access;
victim = i;
}
}
return victim;
}
// For better performance (O(1) instead of O(n)), use a
// doubly-linked list ordered by access time:
//
// MRU <--> ... <--> ... <--> LRU
//
// On access: move node to front
// On eviction: take from back
//
// Use hash table: VPN -> list_node for O(1) lookup
Hint 5: Page Table with Hash Map
For sparse page tables (most VPNs unused), use a hash map:
#define PAGE_TABLE_SIZE 1024 // Must be power of 2
typedef struct pt_node {
uint64_t vpn;
page_table_entry_t pte;
struct pt_node *next; // Chaining for collisions
} pt_node_t;
typedef struct {
pt_node_t *buckets[PAGE_TABLE_SIZE];
} page_table_t;
uint64_t hash_vpn(uint64_t vpn) {
// Simple hash - mix the bits
return (vpn ^ (vpn >> 16)) & (PAGE_TABLE_SIZE - 1);
}
page_table_entry_t *page_table_lookup(page_table_t *pt, uint64_t vpn) {
uint64_t idx = hash_vpn(vpn);
pt_node_t *node = pt->buckets[idx];
while (node) {
if (node->vpn == vpn) {
return &node->pte;
}
node = node->next;
}
return NULL; // Not found
}
void page_table_insert(page_table_t *pt, uint64_t vpn,
uint32_t pfn, int dirty) {
uint64_t idx = hash_vpn(vpn);
// Check if exists
pt_node_t *node = pt->buckets[idx];
while (node) {
if (node->vpn == vpn) {
node->pte.pfn = pfn;
node->pte.present = 1;
node->pte.dirty = dirty;
return;
}
node = node->next;
}
// Create new node
pt_node_t *new_node = malloc(sizeof(pt_node_t));
new_node->vpn = vpn;
new_node->pte.pfn = pfn;
new_node->pte.present = 1;
new_node->pte.dirty = dirty;
new_node->next = pt->buckets[idx];
pt->buckets[idx] = new_node;
}
Testing Strategy
Unit Tests
/* Test address decoding */
void test_address_decode(void) {
uint64_t vpn;
uint32_t offset;
// 4KB pages (12-bit offset)
decode_address(0x00001234, 12, &vpn, &offset);
assert(vpn == 0x1);
assert(offset == 0x234);
decode_address(0x7FFF12345ABC, 12, &vpn, &offset);
assert(vpn == 0x7FFF12345);
assert(offset == 0xABC);
// 2MB pages (21-bit offset)
decode_address(0x00123456, 21, &vpn, &offset);
assert(vpn == 0x0);
assert(offset == 0x123456);
printf("test_address_decode PASSED\n");
}
/* Test TLB operations */
void test_tlb(void) {
tlb_t tlb;
tlb_init(&tlb, 4, REPLACE_FIFO);
uint32_t pfn;
// Miss on empty
assert(tlb_lookup(&tlb, 0x1, &pfn) == 0);
// Insert and hit
tlb_insert(&tlb, 0x1, 100);
assert(tlb_lookup(&tlb, 0x1, &pfn) == 1);
assert(pfn == 100);
// Fill TLB
tlb_insert(&tlb, 0x2, 101);
tlb_insert(&tlb, 0x3, 102);
tlb_insert(&tlb, 0x4, 103);
// Insert 5th - should evict first (FIFO)
tlb_insert(&tlb, 0x5, 104);
assert(tlb_lookup(&tlb, 0x1, &pfn) == 0); // Evicted
assert(tlb_lookup(&tlb, 0x5, &pfn) == 1);
printf("test_tlb PASSED\n");
}
/* Test page replacement policies */
void test_fifo(void) {
frame_table_t ft;
frame_table_init(&ft, 3, REPLACE_FIFO);
// Load pages 1, 2, 3
int f1 = allocate_frame(&ft, 1);
int f2 = allocate_frame(&ft, 2);
int f3 = allocate_frame(&ft, 3);
// Access page 1 (shouldn't affect FIFO order)
frame_access(&ft, f1);
// Load page 4 - should evict page 1 (oldest)
int victim = select_victim(&ft);
assert(victim == f1);
assert(ft.frames[f1].vpn == 1);
printf("test_fifo PASSED\n");
}
void test_lru(void) {
frame_table_t ft;
frame_table_init(&ft, 3, REPLACE_LRU);
// Load pages 1, 2, 3
allocate_frame(&ft, 1); // frame 0
allocate_frame(&ft, 2); // frame 1
allocate_frame(&ft, 3); // frame 2
// Access page 1 (make it recently used)
frame_access(&ft, 0);
// Page 2 is now LRU
int victim = select_victim(&ft);
assert(ft.frames[victim].vpn == 2);
printf("test_lru PASSED\n");
}
void test_clock(void) {
frame_table_t ft;
frame_table_init(&ft, 3, REPLACE_CLOCK);
// Load pages 1, 2, 3 (all ref bits = 1)
allocate_frame(&ft, 1);
allocate_frame(&ft, 2);
allocate_frame(&ft, 3);
// Clear ref bit of page 2
ft.frames[1].ref_bit = 0;
// Victim should be page 2 (ref_bit = 0)
int victim = select_victim(&ft);
assert(victim == 1);
assert(ft.frames[victim].vpn == 2);
printf("test_clock PASSED\n");
}
Integration Tests
Create test trace files:
# test_basic.trace - Simple sequential access
R 0x00001000
R 0x00002000
R 0x00003000
R 0x00004000
R 0x00001000
# test_writes.trace - Test dirty bit tracking
R 0x00001000
W 0x00001100
R 0x00002000
R 0x00003000
R 0x00004000
R 0x00005000
# Evicting page 1 should cause writeback
# test_locality.trace - Test working set
R 0x00001000
R 0x00002000
R 0x00001100
R 0x00002100
R 0x00001200
R 0x00002200
# High hit rate expected
Expected Results Verification
# Basic test with 4 frames
$ ./vmsim --frames=4 --page-size=4096 --replace=fifo test_basic.trace
# Expected: 4 faults, 1 hit
# Test writes
$ ./vmsim --frames=4 --page-size=4096 --replace=fifo test_writes.trace
# Expected: Should report 1 writeback when page 1 evicted
# Compare algorithms
$ for alg in fifo lru clock; do
echo "=== $alg ==="
./vmsim --frames=4 --page-size=4096 --replace=$alg large_trace.txt
done
# LRU should have fewer faults than FIFO on realistic traces
Common Pitfalls & Debugging
Pitfall 1: Off-by-One in Address Decoding
┌────────────────────────────────────────────────────────────────────────┐
│ BUG: Extracting wrong number of bits │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ WRONG: │
│ offset = vaddr & (page_size - 1); // Assumes page_size is power of 2│
│ // But what if someone passes page_size = 4000? │
│ │
│ RIGHT: │
│ assert((page_size & (page_size - 1)) == 0); // Verify power of 2 │
│ int offset_bits = __builtin_ctz(page_size); // Count trailing zeros│
│ offset = vaddr & ((1ULL << offset_bits) - 1); │
│ │
│ SYMPTOM: Wrong VPNs, pages appear to share addresses │
│ TEST: Verify with known addresses: │
│ Address 0x12345678 with 4KB pages: │
│ VPN = 0x12345, Offset = 0x678 │
│ │
└────────────────────────────────────────────────────────────────────────┘
Pitfall 2: Not Updating All Data Structures
┌────────────────────────────────────────────────────────────────────────┐
│ BUG: Forgetting to update TLB when page table changes │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ WRONG: │
│ void evict_page(int frame) { │
│ uint64_t vpn = frame_table[frame].vpn; │
│ page_table_invalidate(vpn); // Updated page table │
│ // FORGOT to invalidate TLB! │
│ } │
│ │
│ RIGHT: │
│ void evict_page(int frame) { │
│ uint64_t vpn = frame_table[frame].vpn; │
│ page_table_invalidate(vpn); │
│ tlb_invalidate(vpn); // Must also invalidate TLB! │
│ } │
│ │
│ SYMPTOM: Stale TLB entries cause wrong translations │
│ TEST: Sequence that reuses a VPN after eviction │
│ │
└────────────────────────────────────────────────────────────────────────┘
Pitfall 3: Clock Algorithm Not Advancing
┌────────────────────────────────────────────────────────────────────────┐
│ BUG: Clock hand doesn't advance after allocation │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ WRONG: │
│ int clock_select_victim() { │
│ // ... find victim ... │
│ return victim; │
│ // Hand stays at victim position │
│ } │
│ │
│ RIGHT: │
│ int clock_select_victim() { │
│ // ... find victim ... │
│ int result = clock_hand; │
│ clock_hand = (clock_hand + 1) % num_frames; // Advance! │
│ return result; │
│ } │
│ │
│ SYMPTOM: Same frame always evicted, poor performance │
│ TEST: After eviction, next eviction should start from next position │
│ │
└────────────────────────────────────────────────────────────────────────┘
Pitfall 4: Dirty Bit Not Propagated
┌────────────────────────────────────────────────────────────────────────┐
│ BUG: Write to cached TLB entry doesn't update frame table dirty bit │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ WRONG: │
│ if (tlb_hit) { │
│ pfn = tlb_entry.pfn; │
│ if (access_type == WRITE) │
│ tlb_entry.dirty = 1; // Only updated TLB! │
│ } │
│ │
│ RIGHT: │
│ if (tlb_hit) { │
│ pfn = tlb_entry.pfn; │
│ if (access_type == WRITE) { │
│ tlb_entry.dirty = 1; │
│ frame_table[pfn].dirty = 1; // Also update frame! │
│ page_table[vpn].dirty = 1; // And page table! │
│ } │
│ } │
│ │
│ SYMPTOM: Writebacks not counted correctly │
│ TEST: Write to page, evict it, verify writeback counter increments │
│ │
└────────────────────────────────────────────────────────────────────────┘
Pitfall 5: LRU Not Updated on TLB Hit
┌────────────────────────────────────────────────────────────────────────┐
│ BUG: TLB hit doesn't update LRU information in frame table │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ WRONG: │
│ if (tlb_hit) { │
│ pfn = tlb_entry.pfn; │
│ return translate(pfn, offset); │
│ // Frame's last_access not updated! │
│ } │
│ │
│ RIGHT: │
│ if (tlb_hit) { │
│ pfn = tlb_entry.pfn; │
│ frame_table[pfn].last_access = current_time++; │
│ frame_table[pfn].ref_bit = 1; │
│ return translate(pfn, offset); │
│ } │
│ │
│ SYMPTOM: LRU behaves like FIFO, wrong pages evicted │
│ TEST: Access pattern that exercises LRU ordering │
│ │
└────────────────────────────────────────────────────────────────────────┘
Extensions & Challenges
Beginner Extensions
- Visualization: Print ASCII art of frame table state after each access
- Statistics by VPN: Track which pages cause the most faults
- Working set tracking: Calculate working set size at intervals
Intermediate Extensions
- Multi-level page tables: Implement 2-level or 4-level page tables
- Set-associative TLB: Implement N-way set-associative TLB
- Enhanced Clock (NRU): Use both reference and dirty bits for victim selection
- Trace generator: Create synthetic traces with different locality patterns
Advanced Extensions
- Copy-on-Write: Simulate shared pages that become private on write
- Huge pages: Support 2MB and 1GB pages alongside 4KB
- NUMA simulation: Model memory access latency based on “node” distance
- Prefetching: Predict next pages and load them proactively
- Memory pressure simulation: Model what happens as physical memory fills up
Challenge: Implement Multi-Level Page Tables
/*
* 4-Level page table for x86-64 style addressing
*
* 48-bit virtual address breakdown:
* [47:39] PML4 index (9 bits)
* [38:30] PDPT index (9 bits)
* [29:21] PD index (9 bits)
* [20:12] PT index (9 bits)
* [11:0] Page offset (12 bits)
*/
typedef struct {
uint64_t entries[512]; // 512 entries per level
} page_table_level_t;
typedef struct {
page_table_level_t *pml4; // Top level (1 per process)
} multi_level_pt_t;
page_table_entry_t *multilevel_walk(multi_level_pt_t *pt, uint64_t vpn) {
uint64_t pml4_idx = (vpn >> 27) & 0x1FF;
uint64_t pdpt_idx = (vpn >> 18) & 0x1FF;
uint64_t pd_idx = (vpn >> 9) & 0x1FF;
uint64_t pt_idx = vpn & 0x1FF;
// Walk through levels, allocating as needed
// ... (exercise for the reader)
}
Books That Will Help
| Topic | Book | Chapter/Section |
|---|---|---|
| Virtual Memory Overview | CS:APP 3rd Ed | Chapter 9 “Virtual Memory” |
| Address Translation | CS:APP 3rd Ed | Section 9.3 “VM as Tool for Caching” |
| Page Tables | CS:APP 3rd Ed | Section 9.3.2 “Page Tables” |
| TLB and Translation | CS:APP 3rd Ed | Section 9.3.3 “Speeding Up with TLBs” |
| Multi-level Page Tables | CS:APP 3rd Ed | Section 9.3.4 “Multi-Level Page Tables” |
| Page Faults | CS:APP 3rd Ed | Section 9.3.5 “Putting It Together” |
| Intel/AMD Page Tables | CS:APP 3rd Ed | Section 9.7 “Case Study: x86-64” |
| Paging Mechanisms | OSTEP | Chapters 18-22 (Paging) |
| Page Replacement | OSTEP | Chapter 22 “Swapping: Policies” |
| Working Sets | OSTEP | Chapter 22.5 “Working Sets” |
| Linux VM Implementation | Linux Kernel Development | Chapter 15 |
Real-World Connections
Linux Virtual Memory
Your simulator models concepts used in the real Linux kernel:
┌────────────────────────────────────────────────────────────────────────┐
│ YOUR SIMULATOR LINUX KERNEL │
│ ────────────── ──────────── │
│ │
│ page_table_t struct mm_struct + pgd_t │
│ frame_table_t struct page array (mem_map) │
│ tlb_t CPU hardware TLB │
│ clock algorithm kswapd daemon with LRU lists │
│ page fault handler do_page_fault() in arch/x86/mm/fault.c │
│ │
│ Key difference: Linux uses multiple LRU lists: │
│ - Active list (recently accessed) │
│ - Inactive list (candidates for eviction) │
│ - Plus separate lists for anonymous vs file-backed pages │
│ │
└────────────────────────────────────────────────────────────────────────┘
Performance Impact
Understanding these numbers helps calibrate your mental model:
┌────────────────────────────────────────────────────────────────────────┐
│ OPERATION TYPICAL LATENCY │
├────────────────────────────────────────────────────────────────────────┤
│ TLB hit < 1 cycle │
│ TLB miss (page table walk) 10-100 cycles │
│ L1 cache hit 4 cycles │
│ L2 cache hit 12 cycles │
│ L3 cache hit 40 cycles │
│ Main memory access 100 cycles │
│ Page fault (in memory) ~10,000 cycles │
│ Page fault (from SSD) ~50,000 cycles │
│ Page fault (from HDD) ~10,000,000 cycles │
│ │
│ This is why TLB hit rate matters so much! │
│ 99% hit rate vs 95% hit rate = 4× more page table walks │
│ │
└────────────────────────────────────────────────────────────────────────┘
Self-Assessment Checklist
Understanding
- I can explain why virtual memory exists and what problems it solves
- I can describe how a virtual address is translated to physical
- I can explain the purpose of multi-level page tables
- I understand why TLB is essential for performance
- I can describe what happens on a page fault
- I can compare FIFO, LRU, and Clock algorithms
- I understand the working set model and thrashing
Implementation
- Address decoding correctly extracts VPN and offset
- TLB lookup and insertion work correctly
- Page table lookup returns correct PTE
- FIFO replacement evicts oldest page
- LRU replacement evicts least recently used page
- Clock algorithm correctly uses reference bits
- Dirty bit tracking and writeback counting is accurate
- Statistics are correctly accumulated
Testing
- Hand-traced example produces expected results
- Unit tests pass for all components
- Different replacement algorithms produce different fault counts
- Increasing frame count decreases fault rate
- TLB improves performance on repeated accesses
Growth
- I can debug unexpected page fault counts
- I understand when each replacement algorithm performs best
- I can explain the trade-offs in page table design
- I can discuss virtual memory in an interview setting
Submission / Completion Criteria
Minimum Viable Completion:
- Address decoding works correctly
- Single-level page table implemented
- FIFO replacement working
- Basic statistics tracking
- Passes hand-traced example
Full Completion:
- All of the above
- TLB with configurable size
- LRU and Clock replacement working
- Dirty bit tracking and writebacks
- All test cases pass
- Verbose output mode
Excellence (Going Above & Beyond):
- Multi-level page tables (2+ levels)
- Set-associative TLB
- Enhanced Clock (NRU) algorithm
- Working set visualization
- Performance comparison report
- Trace generator tool
This guide was expanded from CSAPP_3E_DEEP_LEARNING_PROJECTS.md. For the complete learning path, see the project index.