Project 13: Virtual Memory Map Visualizer

Build a tool that reveals the invisible architecture of process memory: page tables, protection bits, demand paging, and the beautiful illusion that gives every process its own private address space.

Quick Reference

Attribute        Value
Difficulty       Advanced
Time Estimate    1-2 weeks
Language         C (Alternatives: Rust, Zig, C++)
Prerequisites    Project 11 (Signals + Processes Sandbox) recommended
Key Topics       Virtual memory, page tables, address translation, memory protection, demand paging, mmap, copy-on-write
CS:APP Chapters  8, 9

Table of Contents

  1. Learning Objectives
  2. Theoretical Foundation
  3. Project Specification
  4. Solution Architecture
  5. Implementation Guide
  6. Testing Strategy
  7. Common Pitfalls
  8. Extensions
  9. Real-World Connections
  10. Resources
  11. Self-Assessment Checklist

1. Learning Objectives

By completing this project, you will:

  1. Understand virtual memory fundamentals: Explain why every process thinks it owns all of memory, and how the hardware + OS maintain this illusion
  2. Master address translation: Trace a virtual address through page tables to its physical location (or understand why it faults)
  3. Interpret /proc/[pid]/maps: Read and explain every field in a Linux memory map, connecting regions to their purpose
  4. Demonstrate demand paging: Create controlled experiments that trigger and observe page faults
  5. Explain memory protection: Understand R/W/X bits, why they exist, and what happens when violated
  6. Observe copy-on-write: Demonstrate how fork() shares memory until modification triggers a copy
  7. Connect VM to performance: Understand how TLB misses and page faults affect locality and cache behavior
  8. Debug memory-related crashes: Given a segfault address, explain exactly why the access failed

2. Theoretical Foundation

2.1 The Virtual Memory Abstraction

Every process believes it has exclusive access to a large, contiguous address space starting at 0. This is a beautiful lie maintained by hardware (MMU) and software (OS kernel):

Process A's View:              Process B's View:              Physical Reality:

0xFFFFFFFF ┌────────────┐      0xFFFFFFFF ┌────────────┐      ┌────────────────────┐
           │   Kernel   │                 │   Kernel   │      │   Physical RAM     │
           ├────────────┤                 ├────────────┤      │                    │
           │   Stack    │                 │   Stack    │      │  Shared by ALL     │
           │     |      │                 │     |      │      │  processes via     │
           │     v      │                 │     v      │      │  page tables       │
           │            │                 │            │      │                    │
           │            │                 │            │      │  + Disk (swap)     │
           │     ^      │                 │     ^      │      │                    │
           │     |      │                 │     |      │      └────────────────────┘
           │   Heap     │                 │   Heap     │
           ├────────────┤                 ├────────────┤
           │   .bss     │                 │   .bss     │
           ├────────────┤                 ├────────────┤
           │   .data    │                 │   .data    │
           ├────────────┤                 ├────────────┤
           │   .text    │                 │   .text    │
0x00000000 └────────────┘      0x00000000 └────────────┘
           "I own all of                  "I own all of
            this memory!"                  this memory!"

Why Virtual Memory?

  1. Isolation: Processes cannot corrupt each other's memory
  2. Simplicity: Programs don't need to know where they're loaded
  3. Efficiency: Memory is allocated on demand, shared when possible
  4. Protection: Different regions have different permissions (R/W/X)
  5. Convenience: Larger address spaces than physical memory (via paging to disk)

2.2 Pages: The Fundamental Unit

Memory is managed in fixed-size chunks called pages (typically 4KB on x86-64):

Virtual Address Space                    Physical Memory (RAM)

┌──────────────────────┐                 ┌──────────────────────┐
│    Page 0            │────────────────▶│  Frame 47            │
├──────────────────────┤                 ├──────────────────────┤
│    Page 1            │───┐             │  Frame 48            │
├──────────────────────┤   │             ├──────────────────────┤
│    Page 2            │   │  ┌─────────▶│  Frame 49            │
├──────────────────────┤   │  │          ├──────────────────────┤
│    Page 3 (unmapped) │   │  │          │  Frame 50            │
├──────────────────────┤   │  │          ├──────────────────────┤
│    Page 4            │   └──│─────────▶│  Frame 51            │
├──────────────────────┤      │          ├──────────────────────┤
│    ...               │      │          │  ...                 │
└──────────────────────┘      │          └──────────────────────┘
                              │
                              │          On Disk (Swap)
                              │          ┌──────────────────────┐
                              └─────────▶│  Swapped Page        │
                                         └──────────────────────┘

Page size = 4KB = 4096 bytes = 0x1000 bytes

Key Insight: Pages don't need to be contiguous in physical memory, and some may not be in memory at all!
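
To make the page arithmetic concrete, here is a minimal sketch that splits an address into a page number and an offset within that page. The helper names (page_number, page_offset, print_page_split) are illustrative and not part of the project specification; the page size is queried with sysconf(_SC_PAGESIZE) rather than hard-coded to 4096.

```c
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Page number = address / page size. */
uint64_t page_number(uint64_t addr, uint64_t page_size)
{
    return addr / page_size;
}

/* Offset within the page. page_size must be a power of two,
 * so masking with (page_size - 1) equals addr % page_size. */
uint64_t page_offset(uint64_t addr, uint64_t page_size)
{
    return addr & (page_size - 1);
}

/* Hypothetical demo helper: show the split for a stack variable's address. */
void print_page_split(void)
{
    long ps = sysconf(_SC_PAGESIZE);          /* usually 4096 on x86-64 */
    int local = 0;
    uint64_t addr = (uint64_t)(uintptr_t)&local;

    printf("page size  : %ld bytes\n", ps);
    printf("address    : 0x%llx\n", (unsigned long long)addr);
    printf("page number: 0x%llx\n",
           (unsigned long long)page_number(addr, (uint64_t)ps));
    printf("offset     : 0x%llx\n",
           (unsigned long long)page_offset(addr, (uint64_t)ps));
}
```

Calling print_page_split() from main() prints a different address on most runs (because of stack randomization), but the offset is always smaller than the page size.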

2.3 Page Tables and Address Translation

The page table maps virtual pages to physical frames. On x86-64, this is a multi-level hierarchy:

48-bit Virtual Address (x86-64 with 4-level paging):

 63    48 47    39 38    30 29    21 20    12 11        0
┌────────┬────────┬────────┬────────┬────────┬───────────┐
│ Sign   │  PML4  │  PDPT  │   PD   │   PT   │  Offset   │
│ Extend │ Index  │ Index  │ Index  │ Index  │ (12 bits) │
└────────┴────────┴────────┴────────┴────────┴───────────┘
   16b      9b       9b       9b       9b        12b

Each index selects one of 512 entries (2^9 = 512) in that level's table.
The offset selects a byte within the 4KB page (2^12 = 4096).
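
The field boundaries above translate directly into shifts and masks. The sketch below, using the hypothetical names split_va and is_canonical, extracts the four 9-bit indices and the 12-bit offset, assuming 4-level paging with 4 KB pages. For example, 0x7fffabcd1234 splits into PML4 index 255, PDPT index 510, PD index 350, PT index 209, and offset 0x234.

```c
#include <stdint.h>

/* The five fields of a 48-bit x86-64 virtual address (4-level paging). */
struct va_parts {
    unsigned pml4;    /* bits 47..39 */
    unsigned pdpt;    /* bits 38..30 */
    unsigned pd;      /* bits 29..21 */
    unsigned pt;      /* bits 20..12 */
    unsigned offset;  /* bits 11..0  */
};

struct va_parts split_va(uint64_t va)
{
    struct va_parts p;
    p.pml4   = (unsigned)((va >> 39) & 0x1FF);  /* 0x1FF keeps 9 bits */
    p.pdpt   = (unsigned)((va >> 30) & 0x1FF);
    p.pd     = (unsigned)((va >> 21) & 0x1FF);
    p.pt     = (unsigned)((va >> 12) & 0x1FF);
    p.offset = (unsigned)(va & 0xFFF);          /* 0xFFF keeps 12 bits */
    return p;
}

/* An address is canonical when bits 63..47 are all equal
 * (the "sign extend" field repeats bit 47). */
int is_canonical(uint64_t va)
{
    uint64_t top = va >> 47;                    /* the 17 bits 63..47 */
    return top == 0 || top == 0x1FFFF;
}
```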

Multi-Level Page Table Walk:

                  ┌──────────────────────────────────────┐
                  │     CR3 (Page Map Level 4 Base)      │
                  └──────────────────────────────────────┘
                                     │
                                     ▼
┌──────────────────────────────────────────────────────────────────────────┐
│                         PML4 (Page Map Level 4)                          │
│  ┌─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┐                 │
│  │  0  │  1  │  2  │ ... │ idx │ ... │ 509 │ 510 │ 511 │  512 entries    │
│  └─────┴─────┴─────┴─────┴──┬──┴─────┴─────┴─────┴─────┘                 │
└─────────────────────────────┼────────────────────────────────────────────┘
                              │ PML4[idx] contains pointer to PDPT
                              ▼
┌──────────────────────────────────────────────────────────────────────────┐
│                   PDPT (Page Directory Pointer Table)                    │
│  ┌─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┐                 │
│  │  0  │  1  │  2  │ ... │ idx │ ... │ 509 │ 510 │ 511 │  512 entries    │
│  └─────┴─────┴─────┴─────┴──┬──┴─────┴─────┴─────┴─────┘                 │
└─────────────────────────────┼────────────────────────────────────────────┘
                              │ PDPT[idx] contains pointer to PD
                              ▼
┌──────────────────────────────────────────────────────────────────────────┐
│                           PD (Page Directory)                            │
│  ┌─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┐                 │
│  │  0  │  1  │  2  │ ... │ idx │ ... │ 509 │ 510 │ 511 │  512 entries    │
│  └─────┴─────┴─────┴─────┴──┬──┴─────┴─────┴─────┴─────┘                 │
└─────────────────────────────┼────────────────────────────────────────────┘
                              │ PD[idx] contains pointer to PT
                              ▼
┌──────────────────────────────────────────────────────────────────────────┐
│                             PT (Page Table)                              │
│  ┌─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┐                 │
│  │  0  │  1  │  2  │ ... │ idx │ ... │ 509 │ 510 │ 511 │  512 entries    │
│  └─────┴─────┴─────┴─────┴──┬──┴─────┴─────┴─────┴─────┘                 │
└─────────────────────────────┼────────────────────────────────────────────┘
                              │ PT[idx] contains Physical Frame Number + flags
                              ▼
          ┌──────────────────────────────────────────────────────┐
          │              Physical Frame + Offset                 │
          │  ┌──────────────────────────┬───────────────┐        │
          │  │   Physical Frame Number  │    Offset     │        │
          │  │        (40 bits)         │   (12 bits)   │        │
          │  └──────────────────────────┴───────────────┘        │
          │                         │                            │
          │                         ▼                            │
          │              Physical Memory Address                 │
          └──────────────────────────────────────────────────────┘
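
The walk can be imitated entirely in user space. The following is a toy model, not how the hardware or the kernel stores its tables: each level is an ordinary heap-allocated array of 512 entries, "pointers" between levels are plain C pointers with bit 0 used as a present flag, and all names (toy_table, toy_map, toy_translate) are invented for this sketch. It only illustrates the control flow of a 4-level lookup and what "not present" means at each step.

```c
#include <stdint.h>
#include <stdlib.h>

#define TOY_ENTRIES 512
#define TOY_PRESENT 0x1ULL

/* One table at any level: 512 eight-byte entries, as on x86-64. */
typedef struct toy_table { uint64_t e[TOY_ENTRIES]; } toy_table;

/* Follow the entry at idx to the next-level table, optionally creating it. */
static toy_table *toy_next(toy_table *t, unsigned idx, int create)
{
    if (!(t->e[idx] & TOY_PRESENT)) {
        if (!create)
            return NULL;                        /* missing level: "fault" */
        toy_table *next = calloc(1, sizeof *next);
        if (!next)
            return NULL;
        /* heap pointers are at least 2-byte aligned, so bit 0 is free */
        t->e[idx] = (uint64_t)(uintptr_t)next | TOY_PRESENT;
    }
    return (toy_table *)(uintptr_t)(t->e[idx] & ~TOY_PRESENT);
}

/* Install va -> frame. Returns 0 on success, -1 if allocation fails. */
int toy_map(toy_table *pml4, uint64_t va, uint64_t frame)
{
    toy_table *t = pml4;
    for (int shift = 39; shift > 12; shift -= 9) {   /* PML4, PDPT, PD */
        t = toy_next(t, (unsigned)((va >> shift) & 0x1FF), 1);
        if (!t)
            return -1;
    }
    t->e[(va >> 12) & 0x1FF] = (frame << 12) | TOY_PRESENT;  /* leaf PTE */
    return 0;
}

/* Walk all four levels. Returns 0 and stores the physical address in *pa,
 * or -1 if any level is not present (the simulated page fault). */
int toy_translate(toy_table *pml4, uint64_t va, uint64_t *pa)
{
    toy_table *t = pml4;
    for (int shift = 39; shift > 12; shift -= 9) {
        t = toy_next(t, (unsigned)((va >> shift) & 0x1FF), 0);
        if (!t)
            return -1;
    }
    uint64_t pte = t->e[(va >> 12) & 0x1FF];
    if (!(pte & TOY_PRESENT))
        return -1;
    *pa = (pte & ~0xFFFULL) | (va & 0xFFF);     /* frame base + page offset */
    return 0;
}
```

Start from a zeroed root with calloc(1, sizeof(toy_table)), map one address, then translate it and a neighbor that was never mapped to see the two outcomes. The sketch never frees its tables, which is acceptable for a short demo but not for a long-running tool.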

2.4 Page Table Entry (PTE) Format

Each page table entry contains critical information:

x86-64 Page Table Entry (64 bits):

 63  62    59 58    52 51        12 11   9  8   7   6   5   4   3   2   1   0
┌───┬────────┬────────┬────────────┬──────┬───┬───┬───┬───┬───┬───┬───┬───┬───┐
│NX │  PKey  │ Avail  │  Physical  │ Avl  │ G │PAT│ D │ A │PCD│PWT│U/S│R/W│ P │
│   │        │        │Frame Number│      │   │   │   │   │   │   │   │   │   │
└───┴────────┴────────┴────────────┴──────┴───┴───┴───┴───┴───┴───┴───┴───┴───┘

Key bits:
  P   (bit 0):  Present - page is in physical memory
  R/W (bit 1):  Read/Write - 0=read-only, 1=read-write
  U/S (bit 2):  User/Supervisor - 0=kernel only, 1=user accessible
  A   (bit 5):  Accessed - set by hardware on any access (read or write)
  D   (bit 6):  Dirty - set by hardware when page is written
  NX  (bit 63): No Execute - prevents code execution (security feature)

Bits 62-59 hold the protection key when memory protection keys are enabled
and are otherwise ignored by hardware; bits 58-52 and 11-9 are available to
the OS. Physical-address bits above the CPU's supported width are reserved.
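
As a small sketch of how a visualizer might decode such an entry, the code below masks out the flag bits listed above and the frame number in bits 51..12. The macro and function names (PTE_P, pte_pfn, pte_describe) and the output format are assumptions made for this example. Note that ordinary user programs cannot read raw PTEs; the 64-bit value is assumed to come from somewhere else, such as a simulation.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define PTE_P        (1ULL << 0)    /* present */
#define PTE_RW       (1ULL << 1)    /* writable */
#define PTE_US       (1ULL << 2)    /* user accessible */
#define PTE_A        (1ULL << 5)    /* accessed */
#define PTE_D        (1ULL << 6)    /* dirty */
#define PTE_NX       (1ULL << 63)   /* no-execute */
#define PTE_PFN_MASK 0x000FFFFFFFFFF000ULL   /* bits 51..12 */

/* Physical frame number stored in the entry. */
uint64_t pte_pfn(uint64_t pte)
{
    return (pte & PTE_PFN_MASK) >> 12;
}

/* Write a compact description such as "PWUA-- pfn=0x12345" into buf.
 * Letters: P present, W writable, U user, A accessed, D dirty,
 * X executable (that is, NX is clear). A '-' means the property is absent. */
void pte_describe(uint64_t pte, char *buf, size_t n)
{
    snprintf(buf, n, "%c%c%c%c%c%c pfn=0x%llx",
             (pte & PTE_P)  ? 'P' : '-',
             (pte & PTE_RW) ? 'W' : '-',
             (pte & PTE_US) ? 'U' : '-',
             (pte & PTE_A)  ? 'A' : '-',
             (pte & PTE_D)  ? 'D' : '-',
             (pte & PTE_NX) ? '-' : 'X',
             (unsigned long long)pte_pfn(pte));
}
```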

2.5 The Translation Lookaside Buffer (TLB)

Walking the page table for every memory access would be catastrophically slow (4 memory accesses just to translate one address!). The TLB caches recent translations:

┌───────────────────────────────────────────────┐
│               CPU Memory Access               │
└───────────────────────────────────────────────┘
                        │
                        ▼
            ┌───────────────────────┐
            │  Virtual Address      │
            │  0x7fff_abcd_1234     │
            └───────────────────────┘
                        │
                        ▼
            ┌───────────────────────┐
            │  TLB Lookup           │
            │  (1-2 cycles)         │
            └───────────────────────┘
                        │
            ┌───────────┴───────────┐
         TLB hit                 TLB miss
            │                       │
            │                       ▼
            │           ┌───────────────────────┐
            │           │  Page Table Walk      │
            │           │  (~100 cycles)        │
            │           └───────────────────────┘
            │                       │
            │                       ▼
            │                Update TLB with
            │                new translation
            │                       │
            └───────────┬───────────┘
                        │
                        ▼
            ┌───────────────────────┐
            │  Physical Address     │
            │  0x0000_1234_5234     │
            └───────────────────────┘
                        │
                        ▼
            ┌───────────────────────┐
            │  L1/L2/L3 Cache or    │
            │  Main Memory Access   │
            └───────────────────────┘

TLB Facts:

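  • Typical size: 64-1024 entries (small first-level TLBs backed by a larger second-level TLB)
  • Highly associative: small first-level TLBs may be fully associative (any entry can hold any translation); larger ones are usually set-associative
  • Flushed on context switch (CR3 change) unless PCID is used; entries marked global (G bit) survive
  • TLB miss is expensive: ~100 cycles for a page table walk that misses the caches, less when the page table entries are cached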
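
Two back-of-the-envelope calculations follow from these facts: how much memory the TLB can cover without missing (entries times page size, often called TLB reach), and the average translation cost for a given hit rate. The sketch below uses hypothetical function names and takes the cycle counts as parameters, since the figures above are rough and vary by CPU.

```c
#include <stddef.h>

/* Memory covered by the TLB when every entry maps one page. */
size_t tlb_reach_bytes(size_t entries, size_t page_size)
{
    return entries * page_size;
}

/* Average cycles per translation: every access pays the lookup,
 * and a miss additionally pays for the page table walk. */
double avg_translation_cycles(double hit_rate,
                              double lookup_cycles,
                              double walk_cycles)
{
    return hit_rate * lookup_cycles
         + (1.0 - hit_rate) * (lookup_cycles + walk_cycles);
}
```

With 64 entries and 4 KB pages the reach is only 256 KB, which is why a working set that strides across many pages can be slow even when it fits in cache. With a 1-cycle lookup and a 100-cycle walk, a 99% hit rate already doubles the average cost from 1 cycle to about 2.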

2.6 Page Faults and Fault Handling

A page fault occurs when the MMU cannot complete address translation:

┌──────────────────────────────────────────────────────────────────────┐
│                           Page Fault Types                           │
├──────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  1. MINOR FAULT (Soft Fault)                                         │
│     ├── Page is in memory but PTE not set up                         │
│     ├── Example: First access to newly mapped region                 │
│     └── Resolution: Kernel updates PTE, no I/O needed                │
│                                                                      │
│  2. MAJOR FAULT (Hard Fault)                                         │
│     ├── Page must be loaded from disk                                │
│     ├── Example: Swapped-out page, memory-mapped file                │
│     └── Resolution: Kernel reads from disk (SLOW: milliseconds)      │
│                                                                      │
│  3. PROTECTION FAULT                                                 │
│     ├── Access violates page permissions                             │
│     ├── Examples:                                                    │
│     │   - Write to read-only page (R/W=0)                            │
│     │   - Execute data page (NX=1)                                   │
│     │   - User access to kernel page (U/S=0)                         │
│     └── Resolution: Usually SIGSEGV (segmentation fault)             │
│                                                                      │
│  4. INVALID FAULT                                                    │
│     ├── Address not mapped at all (P=0, no backing)                  │
│     ├── Example: NULL pointer dereference, wild pointer              │
│     └── Resolution: SIGSEGV                                          │
│                                                                      │
└──────────────────────────────────────────────────────────────────────┘

Page Fault Handling Flow:

┌──────────────────────────────────────────────────────────────────────┐
│                         Page Fault Exception                         │
│                      (CPU trap to kernel mode)                       │
└──────────────────────────────────────────────────────────────────────┘
                                   │
                                   ▼
┌──────────────────────────────────────────────────────────────────────┐
│                      Kernel Page Fault Handler                       │
│                                                                      │
│   1. Read CR2 register (contains faulting virtual address)           │
│   2. Read error code from stack (indicates fault type)               │
│   3. Find VMA (Virtual Memory Area) containing the address           │
│   4. Check if access is legal for that VMA                           │
└──────────────────────────────────────────────────────────────────────┘
                                   │
                  ┌────────────────┴────────────────┐
                  │                                 │
            Legal Access?                    Illegal Access?
                  │                                 │
                  ▼                                 ▼
    ┌───────────────────────────┐     ┌───────────────────────────┐
    │   Allocate physical page  │     │   Send SIGSEGV to process │
    │   (if needed)             │     │   (Segmentation Fault)    │
    │                           │     │                           │
    │   Read from disk/swap     │     │   Default: terminate      │
    │   (if needed)             │     │   process with core dump  │
    │                           │     │                           │
    │   Update page table entry │     └───────────────────────────┘
    │                           │
    │   Return to user mode     │
    │   (retry instruction)     │
    └───────────────────────────┘
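
The illegal-access branch is observable from user space: a SIGSEGV handler installed with SA_SIGINFO receives the faulting address in si_addr and a reason in si_code. On Linux, SEGV_MAPERR means no mapping covers the address and SEGV_ACCERR means a mapping exists but forbids the access. The sketch below is one way to capture both; the names probe_write and demo_fault_codes are invented here, and escaping the handler with siglongjmp is a demo technique rather than something production code should rely on.

```c
#define _GNU_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf recover_point;
static volatile sig_atomic_t last_si_code;
static void *volatile last_fault_addr;

/* si_addr carries the faulting virtual address the kernel obtained from CR2. */
static void on_segv(int sig, siginfo_t *info, void *ucontext)
{
    (void)sig;
    (void)ucontext;
    last_si_code = info->si_code;
    last_fault_addr = info->si_addr;
    siglongjmp(recover_point, 1);   /* skip the store instead of retrying it */
}

/* Try to write one byte to p.
 * Returns 0 if the write succeeded, the si_code of the fault otherwise,
 * or -1 if the handler could not be installed. */
int probe_write(volatile char *p)
{
    struct sigaction sa, old;
    volatile int code = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0)
        return -1;

    if (sigsetjmp(recover_point, 1) == 0)
        *p = 1;                     /* may fault */
    else
        code = last_si_code;        /* reached via the handler */

    sigaction(SIGSEGV, &old, NULL); /* restore the previous disposition */
    return code;
}

/* Returns 0 if a read-only page yields SEGV_ACCERR and the same address,
 * once unmapped, yields SEGV_MAPERR. */
int demo_fault_codes(void)
{
    size_t ps = (size_t)sysconf(_SC_PAGESIZE);
    char *page = mmap(NULL, ps, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;

    int protection = probe_write(page);   /* mapped, but not writable */
    munmap(page, ps);
    int unmapped = probe_write(page);     /* no mapping at all */

    return (protection == SEGV_ACCERR && unmapped == SEGV_MAPERR) ? 0 : 1;
}
```

After a fault, last_fault_addr holds the address that was probed, which is exactly the piece of information needed to look the address up in /proc/[pid]/maps.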

2.7 Demand Paging

Pages are not loaded until actually accessed:

Program starts:                    First access to code page:

┌───────────────────────┐          ┌───────────────────────┐
│  Virtual Address      │          │  Virtual Address      │
│  Space                │          │  Space                │
│                       │          │                       │
│  ┌─────────────────┐  │          │  ┌─────────────────┐  │
│  │ .text (code)    │  │          │  │ .text (code)    │  │
│  │ Mapped but NOT  │──┼──X       │  │ NOW IN MEMORY   │──┼──▶ Physical Frame
│  │ in memory yet   │  │          │  │ (page fault     │  │
│  └─────────────────┘  │          │  │  triggered load)│  │
│                       │          │  └─────────────────┘  │
└───────────────────────┘          └───────────────────────┘

Benefit: Fast startup, only load what's actually used
Cost: Page faults on first access to each page
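
One way to watch demand paging happen is to count minor faults around the first touch of freshly mapped memory. The sketch below maps anonymous pages, writes one byte into each, and reports the change in ru_minflt from getrusage(). The function names are made up for this example. On a typical Linux system the result is expected to be close to the number of pages touched, but the exact count depends on the kernel, transparent huge pages, and the environment, so treat it as an observation rather than a guarantee.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Minor faults taken by this process so far, or -1 on error. */
static long minor_faults(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_minflt;
}

/* Map npages of anonymous memory, touch one byte per page, and return
 * how many minor faults occurred while touching. Returns -1 on error. */
long faults_when_touching(size_t npages)
{
    size_t ps = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * ps;

    /* mmap only reserves address space; no physical frames are assigned yet */
    volatile unsigned char *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    long before = minor_faults();
    for (size_t i = 0; i < npages; i++)
        mem[i * ps] = 1;            /* first touch of each page */
    long after = minor_faults();

    munmap((void *)mem, len);
    if (before < 0 || after < 0)
        return -1;
    return after - before;
}
```

Calling faults_when_touching(64) and then touching the same pages a second time inside a modified copy of the loop makes the contrast visible: the second pass should add few or no faults because the pages are already resident.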

2.8 Memory-Mapped Files (mmap)

mmap() creates a direct mapping between virtual addresses and file contents:

┌──────────────────────────────────────────────────────────────────────┐
│                      Memory-Mapped File (mmap)                       │
└──────────────────────────────────────────────────────────────────────┘

File on disk:                         Process virtual memory:
┌────────────────────┐                ┌────────────────────────────┐
│   data.bin         │                │                            │
│   ┌────────────┐   │                │   ┌────────────────────┐   │
│   │ Page 0     │───┼────────────────┼──▶│ VA: 0x7f1234000000 │   │
│   ├────────────┤   │                │   ├────────────────────┤   │
│   │ Page 1     │───┼────────────────┼──▶│ VA: 0x7f1234001000 │   │
│   ├────────────┤   │                │   ├────────────────────┤   │
│   │ Page 2     │───┼────────────────┼──▶│ VA: 0x7f1234002000 │   │
│   ├────────────┤   │                │   ├────────────────────┤   │
│   │ ...        │   │                │   │ ...                │   │
│   └────────────┘   │                │   └────────────────────┘   │
└────────────────────┘                └────────────────────────────┘

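// Usage (needs <fcntl.h>, <stdio.h>, <sys/mman.h>; file_size is the file's length in bytes):
int fd = open("data.bin", O_RDWR);
unsigned char *ptr = mmap(NULL, file_size, PROT_READ | PROT_WRITE,
                          MAP_SHARED, fd, 0);
if (ptr == MAP_FAILED)          // mmap signals failure with MAP_FAILED, not NULL
    perror("mmap");

// Now ptr[i] accesses file byte i; the first touch of each page goes through a page fault
// Changes to memory are written back to the file by the kernel (msync() forces it)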

mmap Modes:

Flag           Behavior
MAP_PRIVATE    Copy-on-write: writes don't affect the file
MAP_SHARED     Writes are visible to other processes and saved to the file
MAP_ANONYMOUS  No file backing, just allocate zero-filled memory (like malloc)
MAP_FIXED      Map at exact address specified, replacing anything already mapped there (dangerous)
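
The difference between MAP_SHARED and MAP_PRIVATE is easy to check experimentally. The sketch below creates a one-byte temporary file containing 'A', writes 'Z' through a mapping made with the given flag, and then reads the file back with pread(). The function name and the /tmp path template are assumptions for the demo.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Write 'Z' through a mapping created with map_flags (MAP_SHARED or
 * MAP_PRIVATE), then return the first byte as seen through the file
 * descriptor. Returns -1 on any error. */
int file_byte_after_mapped_write(int map_flags)
{
    char path[] = "/tmp/mmap_demoXXXXXX";   /* hypothetical temp location */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                           /* file vanishes when fd closes */

    int result = -1;
    if (write(fd, "A", 1) == 1) {
        char *map = mmap(NULL, 1, PROT_READ | PROT_WRITE, map_flags, fd, 0);
        if (map != MAP_FAILED) {
            map[0] = 'Z';                   /* private: triggers a COW copy */
            if (map_flags & MAP_SHARED)
                msync(map, 1, MS_SYNC);     /* push the change to the file */

            char c;
            if (pread(fd, &c, 1, 0) == 1)
                result = (unsigned char)c;
            munmap(map, 1);
        }
    }
    close(fd);
    return result;
}
```

With MAP_SHARED the file now contains 'Z'; with MAP_PRIVATE the write landed in a private copy of the page and the file still contains 'A'.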

2.9 Copy-on-Write (COW)

When fork() creates a child process, it doesn't immediately copy all memory:

Before fork():

Parent Process
┌─────────────────────────┐        Physical Memory
│ Page Table              │        ┌─────────────┐
│ ┌───────────────────┐   │        │             │
│ │ VA 0x1000 ──────┬─┼───┼───────▶│  Frame A    │
│ └─────────────────┼─┘   │        │  (data)     │
└───────────────────┼─────┘        │             │
                    │              └─────────────┘
                    │
                    R/W

After fork() (before any writes):

Parent Process                     Physical Memory
┌─────────────────────────┐        ┌─────────────┐
│ Page Table              │        │             │
│ ┌───────────────────┐   │        │  Frame A    │◀──┐
│ │ VA 0x1000 ──────┬─┼───┼───────▶│  (data)     │   │
│ └─────────────────┼─┘   │        │             │   │ SHARED!
└───────────────────┼─────┘        └─────────────┘   │ (read-only
                    │                                │  in both)
                    R (was R/W)                      │
                                                     │
Child Process                                        │
┌─────────────────────────┐                          │
│ Page Table              │                          │
│ ┌───────────────────┐   │                          │
│ │ VA 0x1000 ──────┬─┼───┼──────────────────────────┘
│ └─────────────────┼─┘   │
└───────────────────┼─────┘
                    │
                    R (was R/W)

After child writes to the page:

Parent Process                     Physical Memory
┌─────────────────────────┐        ┌─────────────┐
│ Page Table              │        │  Frame A    │
│ ┌───────────────────┐   │        │  (parent's  │
│ │ VA 0x1000 ──────┬─┼───┼───────▶│   data)     │
│ └─────────────────┼─┘   │        └─────────────┘
└───────────────────┼─────┘
                    │              ┌─────────────┐
                    R/W            │  Frame B    │   COPY made
                                   │  (child's   │◀─ on write!
Child Process                      │   data)     │
┌─────────────────────────┐        └─────────────┘
│ Page Table              │               ▲
│ ┌───────────────────┐   │               │
│ │ VA 0x1000 ──────┬─┼───┼───────────────┘
│ └─────────────────┼─┘   │
└───────────────────┼─────┘
                    │
                    R/W

COW Benefits:

  • fork() is fast (just copy page tables, not data)
  • If child calls exec() immediately, no data was copied unnecessarily
  • Read-only shared library pages (code, constants) are never copied; only pages that are actually written get duplicated
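The three diagrams above can be reproduced in a few lines. The following is a minimal sketch, assuming Linux and a 4096-byte mapping; the function name `cow_write_is_private` is hypothetical and not part of the project skeleton. It checks only the observable effect (the parent's value survives the child's write), not the physical frame numbers.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1   /* expose MAP_ANONYMOUS under -std=c11 */
#endif
#include <stdint.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a child's write to a MAP_PRIVATE page
 * stayed invisible to the parent, 0 if not, -1 on setup error. */
int cow_write_is_private(void) {
    uint32_t *page = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED) return -1;
    page[0] = 0xDEADBEEF;             /* parent touches the page (Frame A) */

    pid_t pid = fork();               /* page is now shared copy-on-write */
    if (pid < 0) { munmap(page, 4096); return -1; }
    if (pid == 0) {
        page[0] = 0xCAFEBABE;         /* write fault: child gets its own copy */
        _exit(page[0] == 0xCAFEBABE ? 0 : 1);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) != pid) { munmap(page, 4096); return -1; }
    int child_ok = WIFEXITED(status) && WEXITSTATUS(status) == 0;
    int parent_unchanged = (page[0] == 0xDEADBEEF);
    munmap(page, 4096);
    return child_ok && parent_unchanged;
}
```

Calling `cow_write_is_private()` from `main()` and printing the result is enough to try it; with MAP_SHARED in place of MAP_PRIVATE the parent would see the child's value instead.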

2.10 Linux Process Memory Layout

┌────────────────────────────────────────────────────────────────────────────┐
│                     Linux x86-64 Process Address Space                     │
│                                                                            │
│  0xFFFFFFFFFFFFFFFF ┌─────────────────────────────────────────────────────┐│
│                     │                                                     ││
│                     │             Kernel Space                            ││
│                     │         (not accessible to user)                    ││
│                     │                                                     ││
│  0xFFFF800000000000 ├─────────────────────────────────────────────────────┤│
│                     │                                                     ││
│                     │           Non-canonical hole                        ││
│                     │    (addresses cause #GP fault)                      ││
│                     │                                                     ││
│  0x00007FFFFFFFFFFF ├─────────────────────────────────────────────────────┤│
│                     │             Stack                                   ││
│  (grows down) ↓     │        [stack] in /proc/[pid]/maps                  ││
│                     │   ┌────────────────────────────────┐                ││
│  ~0x7FFF...         │   │ Command-line args, environment │                ││
│                     │   │ Local variables, return addrs  │                ││
│                     │   │ Stack frames grow downward     │                ││
│                     │   └────────────────────────────────┘                ││
│                     │                  ↓                                  ││
│                     │                                                     ││
│                     │          (unmapped guard pages)                     ││
│                     │                                                     ││
│                     ├─────────────────────────────────────────────────────┤│
│  ~0x7F...           │         Memory-Mapped Region                        ││
│                     │   Shared libraries: libc.so, libm.so, etc.          ││
│                     │   mmap() allocations                                ││
│                     │   File mappings                                     ││
│                     │   (grows down toward heap)                          ││
│                     │                  ↓                                  ││
│                     ├─────────────────────────────────────────────────────┤│
│                     │                  ↑                                  ││
│  (grows up) ↑       │              Heap                                   ││
│                     │        [heap] in /proc/[pid]/maps                   ││
│                     │   ┌────────────────────────────────┐                ││
│                     │   │ malloc/free managed memory     │                ││
│                     │   │ Dynamic allocations            │                ││
│                     │   │ brk()/sbrk() region            │                ││
│                     │   └────────────────────────────────┘                ││
│                     ├─────────────────────────────────────────────────────┤│
│                     │              BSS                                    ││
│                     │   Uninitialized global/static variables             ││
│                     │   (zeroed by kernel at load time)                   ││
│                     ├─────────────────────────────────────────────────────┤│
│                     │              Data                                   ││
│                     │   Initialized global/static variables               ││
│                     │   Copied from executable at load time               ││
│                     ├─────────────────────────────────────────────────────┤│
│                     │            Read-Only Data                           ││
│                     │   String literals, const globals                    ││
│                     ├─────────────────────────────────────────────────────┤│
│                     │              Text (Code)                            ││
│                     │   Executable instructions                           ││
│                     │   Read-only, executable                             ││
│  ~0x400000 (no PIE) ├─────────────────────────────────────────────────────┤│
│                     │                                                     ││
│                     │          (unmapped, catches NULL derefs)            ││
│                     │                                                     ││
│  0x0000000000000000 └─────────────────────────────────────────────────────┘│
│                                                                            │
└────────────────────────────────────────────────────────────────────────────┘

2.11 Understanding /proc/[pid]/maps

The /proc/[pid]/maps file shows the memory map of a process:

$ cat /proc/self/maps

Address Range            Perms Offset   Dev   Inode      Pathname
─────────────────────────────────────────────────────────────────────────────
55a8e4000000-55a8e4001000 r--p 00000000 08:01 1234567    /usr/bin/cat
55a8e4001000-55a8e4005000 r-xp 00001000 08:01 1234567    /usr/bin/cat
55a8e4005000-55a8e4007000 r--p 00005000 08:01 1234567    /usr/bin/cat
55a8e4007000-55a8e4008000 r--p 00006000 08:01 1234567    /usr/bin/cat
55a8e4008000-55a8e4009000 rw-p 00007000 08:01 1234567    /usr/bin/cat
55a8e5000000-55a8e5021000 rw-p 00000000 00:00 0          [heap]
7f1234000000-7f1234022000 r--p 00000000 08:01 2345678    /lib/x86_64-linux.../libc.so.6
7f1234022000-7f12341b7000 r-xp 00022000 08:01 2345678    /lib/x86_64-linux.../libc.so.6
7f12341b7000-7f123420f000 r--p 001b7000 08:01 2345678    /lib/x86_64-linux.../libc.so.6
7f123420f000-7f1234213000 r--p 0020e000 08:01 2345678    /lib/x86_64-linux.../libc.so.6
7f1234213000-7f1234215000 rw-p 00212000 08:01 2345678    /lib/x86_64-linux.../libc.so.6
7f1234400000-7f1234401000 r--p 00000000 08:01 3456789    /lib/x86_64-linux.../ld-linux-x86-64.so.2
7fff12340000-7fff12361000 rw-p 00000000 00:00 0          [stack]
7fff12380000-7fff12384000 r--p 00000000 00:00 0          [vvar]
7fff12384000-7fff12386000 r-xp 00000000 00:00 0          [vdso]

Field explanation:
─────────────────────────────────────────────────────────────────────────────
Address Range:  Start-End virtual addresses (hex)
Permissions:    r = read, w = write, x = execute, p = private, s = shared
Offset:         Offset into the mapped file (0 for anonymous)
Device:         Major:Minor device numbers (00:00 for anonymous)
Inode:          Inode number of mapped file (0 for anonymous)
Pathname:       File path, [heap], [stack], [vdso], or empty for anon mmap
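With the fields above, answering "which region contains this address?" is a linear scan. The sketch below assumes Linux and reads only /proc/self/maps; the `struct map_entry` type and the `find_region` name are illustrative, not part of the project's headers. Note that the end address of each range is exclusive.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical record holding the fields of one /proc/[pid]/maps line. */
struct map_entry {
    unsigned long start, end, offset, inode;
    unsigned int dev_major, dev_minor;
    char perms[5];
    char path[256];
};

/* Scan /proc/self/maps for the region containing addr.
 * Returns 0 and fills *out on success, -1 if unmapped or on I/O error. */
int find_region(uintptr_t addr, struct map_entry *out) {
    FILE *f = fopen("/proc/self/maps", "r");
    if (!f) return -1;

    char line[512];
    int rc = -1;
    while (fgets(line, sizeof line, f)) {
        struct map_entry e = {0};
        int n = sscanf(line, "%lx-%lx %4s %lx %x:%x %lu %255[^\n]",
                       &e.start, &e.end, e.perms, &e.offset,
                       &e.dev_major, &e.dev_minor, &e.inode, e.path);
        if (n < 7) continue;                    /* pathname is optional */
        if (addr >= e.start && addr < e.end) {  /* end is exclusive */
            *out = e;
            rc = 0;
            break;
        }
    }
    fclose(f);
    return rc;
}
```

Passing the address of a local variable should land in a `rw-p` region (normally `[stack]`), while address 0 should report no region at all.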

2.12 Address Space Layout Randomization (ASLR)

ASLR randomizes memory layout to make exploits harder:

Without ASLR (predictable):           With ASLR (randomized):

Run 1:                                Run 1:
  Stack:  0x7fffffffe000                Stack:  0x7ffc3a2fe000
  Heap:   0x555555756000                Heap:   0x5612a8756000
  libc:   0x7ffff7c00000                libc:   0x7f2d3fc00000

Run 2:                                Run 2:
  Stack:  0x7fffffffe000  (same!)       Stack:  0x7ffcd12fe000  (different!)
  Heap:   0x555555756000  (same!)       Heap:   0x55f8b9756000  (different!)
  libc:   0x7ffff7c00000  (same!)       libc:   0x7f1a2bc00000  (different!)

Check/control ASLR:

# Check current setting
cat /proc/sys/kernel/randomize_va_space
# 0 = disabled, 1 = stack+mmap only, 2 = full (heap too)

# Disable for debugging (requires root)
echo 0 | sudo tee /proc/sys/kernel/randomize_va_space

# Disable for single command
setarch $(uname -m) -R ./your_program
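The same two checks can be made from C, which is useful if the tool wants to warn that addresses will differ between runs. This is a sketch under the assumption of Linux with glibc; both function names are made up for illustration. The first reads the system-wide sysctl shown above, the second queries the per-process ADDR_NO_RANDOMIZE personality flag, which is what `setarch -R` sets.

```c
#include <stdio.h>
#include <sys/personality.h>

/* System-wide setting: returns 0, 1 or 2, or -1 if the file is unreadable. */
int aslr_system_mode(void) {
    FILE *f = fopen("/proc/sys/kernel/randomize_va_space", "r");
    if (!f) return -1;
    int mode = -1;
    if (fscanf(f, "%d", &mode) != 1) mode = -1;
    fclose(f);
    return mode;
}

/* Per-process override: 1 if ADDR_NO_RANDOMIZE is set for this process,
 * 0 if it is not, -1 if the query failed. */
int aslr_disabled_for_self(void) {
    int persona = personality(0xffffffff);  /* query without changing */
    if (persona == -1) return -1;
    return (persona & ADDR_NO_RANDOMIZE) ? 1 : 0;
}
```

Randomization is effectively on for the current process when the first function returns 1 or 2 and the second returns 0.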

2.13 Memory Protection Bits

┌──────────────────────────────────────────────────────────────────────────────┐
│                        Memory Protection Summary                             │
├───────────────────┬──────────────────────────────────────────────────────────┤
│    Permission     │                     Description                          │
├───────────────────┼──────────────────────────────────────────────────────────┤
│    r (Read)       │ Contents can be read                                     │
│                   │ Example: .rodata, .text, shared libs                     │
├───────────────────┼──────────────────────────────────────────────────────────┤
│    w (Write)      │ Contents can be modified                                 │
│                   │ Example: .data, .bss, heap, stack                        │
├───────────────────┼──────────────────────────────────────────────────────────┤
│    x (Execute)    │ Contents can be executed as code                         │
│                   │ Example: .text, [vdso]                                   │
│                   │ NX bit prevents execution when x is missing              │
├───────────────────┼──────────────────────────────────────────────────────────┤
│    p (Private)    │ Copy-on-write: writes are private to this process        │
│                   │ Example: Most mappings, private mmap()                   │
├───────────────────┼──────────────────────────────────────────────────────────┤
│    s (Shared)     │ Writes visible to other processes mapping same file      │
│                   │ Example: MAP_SHARED mmap(), shared memory                │
└───────────────────┴──────────────────────────────────────────────────────────┘

Common combinations:
  r--p : Read-only data (.rodata, read-only file mapping)
  r-xp : Executable code (.text section)
  rw-p : Writable data (.data, .bss, heap, stack)
  ---p : Guard page (catches overflows)
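To see these bits enforced without installing a signal handler, one option is to let a short-lived child make the access and inspect how it died. The sketch below assumes Linux; `write_is_fatal` is a hypothetical helper name. It maps one anonymous page with the requested protection and reports whether a write to it ends in SIGSEGV.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1   /* expose MAP_ANONYMOUS under -std=c11 */
#endif
#include <signal.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: map one anonymous page with `prot`, let a child write
 * to it, and report whether the child was killed by SIGSEGV.
 * Returns 1 (killed), 0 (write succeeded), -1 on setup error. */
int write_is_fatal(int prot) {
    volatile char *page = mmap(NULL, 4096, prot,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED) return -1;

    pid_t pid = fork();
    if (pid < 0) { munmap((void *)page, 4096); return -1; }
    if (pid == 0) {
        page[0] = 1;          /* faults unless the mapping has PROT_WRITE */
        _exit(0);
    }

    int status = 0;
    int rc = -1;
    if (waitpid(pid, &status, 0) == pid)
        rc = WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV;
    munmap((void *)page, 4096);
    return rc;
}
```

`write_is_fatal(PROT_READ)` corresponds to the `r--p` row and `write_is_fatal(PROT_NONE)` to the `---p` guard page; only a mapping that includes PROT_WRITE lets the child exit normally.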

3. Project Specification

3.1 What You Will Build

A command-line tool that:

  1. Reports a process's virtual memory layout with detailed explanations
  2. Demonstrates demand paging with controlled experiments
  3. Triggers and explains protection faults
  4. Shows copy-on-write behavior after fork()

3.2 Functional Requirements

  1. Memory Map Display (--map <pid> or --self):
    • Parse and display /proc/[pid]/maps
    • Categorize regions (text, data, heap, stack, libraries, etc.)
    • Show size of each region
    • Calculate total virtual vs resident memory
  2. Region Details (--detail <address>):
    • Identify which region contains an address
    • Show permissions, backing file, and offset
    • Explain what would happen on read/write/execute
  3. Demand Paging Demo (--demand):
    • Allocate memory without touching it
    • Show maps before and after first access
    • Measure and display page fault counts
  4. Protection Fault Demo (--fault <type>):
    • Types: null, stack-overflow, write-readonly, execute-data
    • Controlled crash with explanation
    • Show faulting address and why it faulted
  5. Copy-on-Write Demo (--cow):
    • Fork and show shared pages
    • Write in child and show page copy
    • Display before/after memory usage
  6. Page Fault Counter (--faults <command>):
    • Run a command and report minor/major page faults
    • Use /proc/[pid]/stat or getrusage()
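For requirement 6, one possible shape is fork, exec, then collect the child's resource usage when reaping it. The sketch below assumes Linux and uses `wait4()`, which returns a `struct rusage` for exactly the child being waited on; the function name `run_and_count_faults` is an assumption, not a name the project prescribes.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1   /* expose wait4() under -std=c11 */
#endif
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] with its arguments, wait for it, and report its page faults.
 * Returns the child's exit status (127 if exec failed), or -1 on error. */
int run_and_count_faults(char *const argv[], long *minor, long *major) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                      /* only reached if exec failed */
    }

    int status = 0;
    struct rusage ru;
    if (wait4(pid, &status, 0, &ru) != pid) return -1;
    *minor = ru.ru_minflt;               /* served without disk I/O */
    *major = ru.ru_majflt;               /* required disk I/O */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Even a trivial command reports some minor faults, because loading the executable and its libraries is itself demand-paged.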

3.3 Non-Functional Requirements

  • Safety: Controlled faults don't corrupt system state
  • Portability: Works on Linux x86-64 (macOS stretch goal)
  • Educational: Output explains "why" not just "what"
  • Accurate: Region identification matches kernel's view

3.4 Example Output

$ ./vmmap --self

=== VIRTUAL MEMORY MAP (PID 12345) ===

SUMMARY:
  Total Virtual:    140.25 GB (theoretical maximum)
  Mapped Regions:   47.3 MB (actual allocations)
  Resident:         8.2 MB (currently in RAM)
  Shared:           4.1 MB (shared with other processes)

EXECUTABLE (/usr/bin/vmmap):
  Address Range           Size    Perms  Description
  ──────────────────────────────────────────────────────────────
  0x5555555ff000-0x555555600000   4 KB   r--p   ELF headers
  0x555555600000-0x555555612000  72 KB   r-xp   .text (code)
  0x555555612000-0x555555618000  24 KB   r--p   .rodata
  0x555555618000-0x55555561a000   8 KB   rw-p   .data + .bss

HEAP:
  0x555555800000-0x555555900000   1 MB   rw-p   [heap]
  └── brk() managed, grows upward

SHARED LIBRARIES:
  libc.so.6:
    0x7ffff7c00000-0x7ffff7c22000 136 KB  r--p   ELF headers
    0x7ffff7c22000-0x7ffff7db7000 1.6 MB  r-xp   .text
    0x7ffff7db7000-0x7ffff7e0f000 352 KB  r--p   .rodata
    0x7ffff7e0f000-0x7ffff7e15000  24 KB  rw-p   .data

  ld-linux-x86-64.so.2:
    0x7ffff7fc0000-0x7ffff7fc4000  16 KB  r--p   ELF headers
    0x7ffff7fc4000-0x7ffff7fef000 172 KB  r-xp   .text
    0x7ffff7fef000-0x7ffff7ffc000  52 KB  r--p   .rodata
    0x7ffff7ffc000-0x7ffff7fff000  12 KB  rw-p   .data

STACK:
  0x7ffffffde000-0x7ffffffff000 132 KB  rw-p   [stack]
  └── Grows downward, guard pages below

SPECIAL:
  0x7ffff7fbc000-0x7ffff7fbe000   8 KB   r--p   [vvar] (kernel variables)
  0x7ffff7fbe000-0x7ffff7fc0000   8 KB   r-xp   [vdso] (virtual syscalls)

ANONYMOUS (mmap'd):
  0x7ffff7800000-0x7ffff7900000   1 MB   rw-p   (no backing file)

=== END MAP ===

$ ./vmmap --demand

=== DEMAND PAGING DEMONSTRATION ===

Step 1: Allocating 100 MB with mmap (no MAP_POPULATE)
  Virtual address: 0x7f1234000000
  Size: 100 MB (25600 pages)

Step 2: Checking /proc/self/statm (sizes) and /proc/self/stat (fault counts)
  Before touching:
    Virtual size: 150 MB
    Resident (RSS): 8 MB    <-- Allocated memory NOT in RSS!
    Page faults: 1234

Step 3: Touching every page (writing first byte)
  Progress: [========================================] 100%

Step 4: Checking again
  After touching:
    Virtual size: 150 MB
    Resident (RSS): 108 MB  <-- NOW it's resident!
    Page faults: 26834      <-- 25600 new minor faults

LESSON: Physical memory is not assigned until first access.
        The first touch of each page triggers a minor page fault.
        This is "demand paging" - the kernel loads on demand.

=== END DEMO ===

$ ./vmmap --fault write-readonly

=== PROTECTION FAULT DEMONSTRATION ===

Setting up: Mapping 4KB read-only page at 0x7f1234000000
  Permissions: r--p (read only, no write, no execute)

Attempting to write to 0x7f1234000000...

*** SIGSEGV received! ***

Fault analysis:
  Faulting address: 0x7f1234000000
  Signal: SIGSEGV (Segmentation Fault)
  Code: SEGV_ACCERR (Invalid permissions for mapped object)

  The page exists (no SEGV_MAPERR), but write permission is denied.

  Mapping permissions for this address (from /proc/self/maps):
    Mapped:  YES
    Read:    YES
    Write:   NO   <-- This caused the fault!
    Execute: NO

  To fix: Use mprotect() to add PROT_WRITE, or map with PROT_WRITE initially.

=== END DEMO ===

$ ./vmmap --cow

=== COPY-ON-WRITE DEMONSTRATION ===

Step 1: Parent process (PID 12345) allocating 10 MB private data
  Address: 0x7f1234000000
  Initial value at page 0: 0xDEADBEEF

Step 2: Checking memory usage before fork()
  Parent RSS: 18.5 MB

Step 3: Forking child process (PID 12346)

Step 4: Memory usage immediately after fork()
  Parent RSS: 18.5 MB
  Child RSS:  18.5 MB
  Combined:   18.5 MB  <-- NOT 37 MB! Pages are SHARED.

  Both processes see:
    0x7f1234000000 -> Physical frame 0x1a3b4000 (SHARED, read-only)

Step 5: Child writing to page 0...
  Child writes 0xCAFEBABE to 0x7f1234000000

Step 6: Memory after write
  Parent RSS: 18.5 MB
  Child RSS:  18.5 MB (but 4KB is now private)

  Parent sees: 0x7f1234000000 = 0xDEADBEEF (original)
  Child sees:  0x7f1234000000 = 0xCAFEBABE (copied)

  After COW trigger:
    Parent: 0x7f1234000000 -> Physical frame 0x1a3b4000 (now exclusive)
    Child:  0x7f1234000000 -> Physical frame 0x2c5d6000 (new copy)

LESSON: fork() doesn't copy data immediately.
        Pages are shared until one process writes.
        Then the kernel copies just that page.
        This makes fork() fast!

=== END DEMO ===

4. Solution Architecture

4.1 High-Level Design

┌──────────────────────────────────────────────────────────────────────────────┐
│                               vmmap (CLI Tool)                               │
├──────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐      │
│  │   Maps       │  │   Demand     │  │   Fault      │  │    COW       │      │
│  │   Parser     │  │   Demo       │  │   Demo       │  │    Demo      │      │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘      │
│         │                 │                 │                 │              │
│         ▼                 ▼                 ▼                 ▼              │
│  ┌────────────────────────────────────────────────────────────────────────┐  │
│  │                          Memory Analysis Core                          │  │
│  │  ┌────────────┐  ┌────────────┐  ┌────────────┐  ┌────────────┐        │  │
│  │  │   /proc    │  │   mmap/    │  │   Signal   │  │   fork/    │        │  │
│  │  │   Reader   │  │  mprotect  │  │   Handler  │  │   wait     │        │  │
│  │  └────────────┘  └────────────┘  └────────────┘  └────────────┘        │  │
│  └────────────────────────────────────────────────────────────────────────┘  │
│                                   │                                          │
│                                   ▼                                          │
│  ┌────────────────────────────────────────────────────────────────────────┐  │
│  │                            Report Generator                            │  │
│  │  - Formatted text output                                               │  │
│  │  - Region categorization                                               │  │
│  │  - Educational explanations                                            │  │
│  └────────────────────────────────────────────────────────────────────────┘  │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘

4.2 Key Components

Component         Responsibility                              Key Functions
─────────────────────────────────────────────────────────────────────────────────────────────────────
Maps Parser       Parse /proc/[pid]/maps                      parse_maps(), categorize_region()
Memory Stats      Read /proc/[pid]/statm, /proc/[pid]/status  get_memory_stats(), get_page_faults()
Demand Demo       mmap without touch, measure faults          demo_demand_paging()
Fault Demo        Trigger controlled protection violations    demo_fault(), install_handler()
COW Demo          fork(), modify, observe sharing             demo_cow()
Report Generator  Format and explain output                   print_map(), explain_region()
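As an illustration of the Memory Stats component, the first two fields of /proc/self/statm give the total virtual size and the resident set size, both counted in pages. The sketch below assumes Linux; `read_statm` is a hypothetical name (the table's `get_memory_stats()` could wrap something like it).

```c
#include <stdio.h>
#include <unistd.h>

/* Read total virtual size and resident set size of the calling process, in
 * bytes, from the first two fields of /proc/self/statm (given in pages). */
int read_statm(unsigned long *vsize_bytes, unsigned long *rss_bytes) {
    FILE *f = fopen("/proc/self/statm", "r");
    if (!f) return -1;

    unsigned long size_pages, rss_pages;
    int n = fscanf(f, "%lu %lu", &size_pages, &rss_pages);
    fclose(f);
    if (n != 2) return -1;

    long page = sysconf(_SC_PAGESIZE);   /* usually 4096 on x86-64 */
    if (page <= 0) return -1;
    *vsize_bytes = size_pages * (unsigned long)page;
    *rss_bytes   = rss_pages * (unsigned long)page;
    return 0;
}
```

Calling it before and after touching a large mapping shows the virtual size staying constant while RSS grows.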

4.3 Data Structures

#include <stddef.h>   // size_t
#include <stdint.h>   // uint64_t, uint8_t

// Region types for categorization
// (declared first so MemoryRegion can use the RegionType typedef)
typedef enum {
    REGION_TEXT,       // Executable code
    REGION_RODATA,     // Read-only data
    REGION_DATA,       // Writable data
    REGION_BSS,        // Uninitialized data
    REGION_HEAP,       // [heap]
    REGION_STACK,      // [stack]
    REGION_LIBRARY,    // Shared library
    REGION_VDSO,       // [vdso]
    REGION_VVAR,       // [vvar]
    REGION_ANONYMOUS,  // Anonymous mmap
    REGION_MMAP_FILE,  // Memory-mapped file
    REGION_GUARD,      // Guard page (---p)
    REGION_UNKNOWN
} RegionType;

// Memory region from /proc/[pid]/maps
typedef struct {
    uint64_t start;
    uint64_t end;
    char perms[5];      // "rwxp" or "rwxs"
    uint64_t offset;
    uint8_t dev_major;
    uint8_t dev_minor;
    uint64_t inode;
    char pathname[256];

    // Derived fields
    size_t size;
    RegionType type;       // one of the REGION_* values above
    int is_readable;
    int is_writable;
    int is_executable;
    int is_private;
} MemoryRegion;

// Complete memory map
typedef struct {
    MemoryRegion *regions;
    size_t count;
    size_t capacity;

    // Summary statistics
    uint64_t total_virtual;
    uint64_t total_resident;
    uint64_t total_shared;
    uint64_t total_private;

    // Parsed from /proc/[pid]/status
    unsigned long vm_peak;
    unsigned long vm_size;
    unsigned long vm_rss;
    unsigned long vm_data;
    unsigned long vm_stack;
    unsigned long vm_exe;
    unsigned long vm_lib;
} MemoryMap;

// Page fault statistics
typedef struct {
    unsigned long minor_faults;  // Satisfied from memory
    unsigned long major_faults;  // Required disk I/O
} PageFaultStats;

// Fault demonstration result
typedef struct {
    uint64_t fault_address;
    int signal_received;
    int signal_code;
    const char *explanation;
} FaultResult;
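One way to fill a `PageFaultStats` is to read /proc/self/stat, where `minflt` is field 10 and `majflt` is field 12. The sketch below assumes Linux; the struct is repeated so the example compiles on its own, and `read_fault_stats` is a hypothetical name. Because field 2 (the command name) is parenthesised and may itself contain spaces, parsing resumes after the last `)`.

```c
#include <stdio.h>
#include <string.h>

typedef struct {                 /* same layout as PageFaultStats above */
    unsigned long minor_faults;
    unsigned long major_faults;
} PageFaultStats;

/* Fill *out from /proc/self/stat. Returns 0 on success, -1 on error. */
int read_fault_stats(PageFaultStats *out) {
    FILE *f = fopen("/proc/self/stat", "r");
    if (!f) return -1;
    char buf[1024];
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    char *p = strrchr(buf, ')');         /* end of the comm field */
    if (!p) return -1;

    unsigned long minflt, cminflt, majflt;
    /* skipped: state ppid pgrp session tty_nr tpgid flags */
    int got = sscanf(p + 1, " %*c %*d %*d %*d %*d %*d %*u %lu %lu %lu",
                     &minflt, &cminflt, &majflt);
    if (got != 3) return -1;

    out->minor_faults = minflt;
    out->major_faults = majflt;
    return 0;
}
```

Taking one snapshot before an experiment and one after, then subtracting, gives the faults attributable to the experiment.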

4.4 Algorithm Overview

Maps Parsing Algorithm:

1. Open /proc/[pid]/maps
2. For each line:
   a. Parse address range (sscanf with %lx-%lx)
   b. Parse permissions string
   c. Parse offset, device, inode
   d. Parse pathname (may be empty)
3. Categorize each region:
   - If pathname is [heap] -> HEAP
   - If pathname is [stack] -> STACK
   - If pathname contains ".so" (e.g. libc.so.6) -> LIBRARY
   - If r-xp and pathname is executable -> TEXT
   - etc.
4. Build summary statistics
5. Return MemoryMap structure

Demand Paging Demo Algorithm:

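1. Read initial page fault count from /proc/self/stat
2. Allocate large region with mmap(MAP_PRIVATE | MAP_ANONYMOUS)
3. Verify region appears in /proc/self/maps
4. Read page fault count (should be nearly unchanged: no page is resident yet)
5. Touch each page by writing its first byte (a read of untouched anonymous
   memory maps the shared zero page and does not grow RSS)
6. Read page fault count again
7. Report: faults_after - faults_before is approximately the number of pages
   touched (fewer if transparent huge pages are in use)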
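The measurement at the heart of this algorithm can be sketched as follows. It assumes Linux, uses `getrusage()` rather than /proc/self/stat for the counter, and opts out of transparent huge pages with `madvise(MADV_NOHUGEPAGE)` so that each 4 KB page faults separately; `faults_from_touching` is a hypothetical name.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1   /* expose MAP_ANONYMOUS and madvise() under -std=c11 */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

static long minor_faults_now(void) {
    struct rusage ru;
    return getrusage(RUSAGE_SELF, &ru) == 0 ? ru.ru_minflt : -1;
}

/* Map npages of anonymous memory, write one byte per page, and return the
 * number of minor faults taken while touching (or -1 on error). */
long faults_from_touching(size_t npages) {
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) return -1;
    size_t len = npages * (size_t)page;

    char *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return -1;

    /* Ask for base pages; the call fails harmlessly on kernels built
     * without transparent huge page support. */
    (void)madvise(mem, len, MADV_NOHUGEPAGE);

    long before = minor_faults_now();
    for (size_t i = 0; i < npages; i++)
        ((volatile char *)mem)[i * (size_t)page] = 1;   /* first touch */
    long after = minor_faults_now();

    munmap(mem, len);
    if (before < 0 || after < 0) return -1;
    return after - before;
}
```

Touching the same pages a second time would add no further faults, since they are already resident.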

Protection Fault Demo Algorithm:

1. Install SIGSEGV handler with sigaction() and SA_SIGINFO
   (for stack-overflow, also register an alternate stack with sigaltstack()
   and set SA_ONSTACK, since the overflowed stack cannot run the handler)
2. Use sigsetjmp() to establish recovery point
3. Based on fault type:
   - null: dereference NULL pointer
   - write-readonly: mmap() with PROT_READ, attempt write
   - execute-data: mmap() with PROT_READ|PROT_WRITE, call as function
   - stack-overflow: recurse deeply or alloca() large amount
4. In handler:
   - Record fault address (si_addr) and cause (si_code) from siginfo_t
   - siglongjmp() back to recovery point
5. Explain why fault occurred based on region permissions
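Steps 1, 2 and 4 fit together roughly as shown below for the write-readonly case. This is a sketch assuming Linux; `try_write` and `demo_write_readonly` are hypothetical names, and the stack-overflow case (which needs an alternate signal stack) is deliberately left out.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1   /* expose sigaction, sigsetjmp, MAP_ANONYMOUS */
#endif
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

static sigjmp_buf recover_point;
static void *volatile fault_addr;
static volatile sig_atomic_t fault_code;

static void on_segv(int sig, siginfo_t *info, void *ctx) {
    (void)sig; (void)ctx;
    fault_addr = info->si_addr;          /* address that was accessed */
    fault_code = info->si_code;          /* SEGV_MAPERR or SEGV_ACCERR */
    siglongjmp(recover_point, 1);        /* abandon the faulting access */
}

/* Try to write one byte at addr. Returns 0 if the write succeeded, the
 * si_code of the SIGSEGV it raised, or -1 if the handler could not be set. */
int try_write(volatile char *addr, void **where) {
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;            /* needed to receive siginfo_t */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0) return -1;

    int code = 0;
    if (sigsetjmp(recover_point, 1) == 0) {
        *addr = 1;                       /* may fault */
    } else {
        code = fault_code;
        *where = fault_addr;
    }
    sigaction(SIGSEGV, &old, NULL);      /* restore previous handler */
    return code;
}

/* Write to a read-only page; returns the si_code, or -1 on any mismatch. */
int demo_write_readonly(void) {
    void *page = mmap(NULL, 4096, PROT_READ,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED) return -1;
    void *where = NULL;
    int code = try_write(page, &where);
    munmap(page, 4096);
    return (where == page) ? code : -1;
}
```

Passing 1 as the second argument of `sigsetjmp()` saves the signal mask, so SIGSEGV is unblocked again after the jump and the probe can be repeated for the next fault type.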

5. Implementation Guide

5.1 Development Environment Setup

# Required packages (Debian/Ubuntu)
sudo apt-get install build-essential gcc gdb linux-headers-$(uname -r)

# Project structure
mkdir -p vmmap/{src,include,tests,demos}
cd vmmap

# Create Makefile
cat > Makefile << 'EOF'
CC = gcc
CFLAGS = -Wall -Wextra -g -O2 -std=c11 -D_GNU_SOURCE -Iinclude
LDFLAGS =

SRCS = src/main.c src/maps_parser.c src/memory_stats.c \
       src/demand_demo.c src/fault_demo.c src/cow_demo.c \
       src/report.c src/util.c
OBJS = $(SRCS:.c=.o)
TARGET = vmmap

all: $(TARGET)

$(TARGET): $(OBJS)
	$(CC) $(LDFLAGS) -o $@ $^

%.o: %.c
	$(CC) $(CFLAGS) -c -o $@ $<

clean:
	rm -f $(OBJS) $(TARGET)

test: $(TARGET)
	./$(TARGET) --self
	./$(TARGET) --demand
	./$(TARGET) --fault null
	./$(TARGET) --cow

.PHONY: all clean test
EOF

5.2 Project Structure

vmmap/
├── src/
│   ├── main.c              # CLI argument parsing, dispatch
│   ├── maps_parser.c       # Parse /proc/[pid]/maps
│   ├── memory_stats.c      # Read /proc stats (RSS, page faults)
│   ├── demand_demo.c       # Demand paging demonstration
│   ├── fault_demo.c        # Protection fault demonstrations
│   ├── cow_demo.c          # Copy-on-write demonstration
│   ├── report.c            # Output formatting
│   └── util.c              # Helpers (size formatting, etc.)
├── include/
│   ├── vmmap.h             # Main header, data structures
│   └── util.h              # Utility declarations
├── tests/
│   ├── test_parser.c       # Unit tests for maps parser
│   ├── test_regions.c      # Test region categorization
│   └── test_demos.c        # Integration tests
├── demos/
│   ├── simple_program.c    # Simple program for testing
│   └── heap_heavy.c        # Program with lots of heap
├── Makefile
└── README.md

5.3 Implementation Phases

Phase 1: Maps Parser (Days 1-3)

Goals:

  • Parse /proc/[pid]/maps completely
  • Categorize regions correctly
  • Display basic map output

Key Code - maps_parser.c:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include "vmmap.h"

// Parse a single line from /proc/[pid]/maps
static int parse_maps_line(const char *line, MemoryRegion *region) {
    char perms[5] = {0};
    char pathname[256] = {0};

    // Format: start-end perms offset dev inode pathname
    int fields = sscanf(line, "%lx-%lx %4s %lx %hhx:%hhx %lu %255[^\n]",
                        &region->start, &region->end,
                        perms,
                        &region->offset,
                        &region->dev_major, &region->dev_minor,
                        &region->inode,
                        pathname);

    if (fields < 7) return -1;  // pathname is optional

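    // strncpy() does not terminate on truncation, and regions obtained from
    // realloc() are not zeroed, so terminate both strings explicitly
    strncpy(region->perms, perms, sizeof(region->perms) - 1);
    region->perms[sizeof(region->perms) - 1] = '\0';
    strncpy(region->pathname, pathname, sizeof(region->pathname) - 1);
    region->pathname[sizeof(region->pathname) - 1] = '\0';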

    // Parse permission bits
    region->is_readable   = (perms[0] == 'r');
    region->is_writable   = (perms[1] == 'w');
    region->is_executable = (perms[2] == 'x');
    region->is_private    = (perms[3] == 'p');

    region->size = region->end - region->start;
    region->type = categorize_region(region);

    return 0;
}

// Determine region type based on permissions and pathname
RegionType categorize_region(const MemoryRegion *region) {
    const char *path = region->pathname;

    // Special kernel mappings
    if (strcmp(path, "[heap]") == 0) return REGION_HEAP;
    if (strcmp(path, "[stack]") == 0) return REGION_STACK;
    if (strcmp(path, "[vdso]") == 0) return REGION_VDSO;
    if (strcmp(path, "[vvar]") == 0) return REGION_VVAR;
    if (strcmp(path, "[vsyscall]") == 0) return REGION_VDSO;

    // Guard pages (no permissions)
    if (!region->is_readable && !region->is_writable && !region->is_executable) {
        return REGION_GUARD;
    }

    // Anonymous mappings (no pathname, no inode)
    if (path[0] == '\0' && region->inode == 0) {
        return REGION_ANONYMOUS;
    }

    // Shared libraries
    if (strstr(path, ".so") != NULL) {
        if (region->is_executable) return REGION_TEXT;
        if (region->is_writable) return REGION_DATA;
        return REGION_RODATA;
    }

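    // File-backed mapping that is not a shared library. This check treats
    // every such path as the main executable, so other mapped files (data
    // files, locale archives) are classified here too. To tell them apart,
    // compare the path with the target of /proc/[pid]/exe and fall through
    // to REGION_MMAP_FILE when it differs.
    if (path[0] == '/' && strstr(path, ".so") == NULL) {
        if (region->is_executable) return REGION_TEXT;
        if (region->is_writable) return REGION_DATA;
        return REGION_RODATA;
    }

    // Memory-mapped file (as written, never reached: the check above
    // already matches every absolute path that is not a library)
    if (path[0] == '/' && region->inode != 0) {
        return REGION_MMAP_FILE;
    }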

    return REGION_UNKNOWN;
}

// Parse complete memory map for a PID
MemoryMap *parse_memory_map(pid_t pid) {
    char path[64];
    snprintf(path, sizeof(path), "/proc/%d/maps", pid);

    FILE *f = fopen(path, "r");
    if (!f) return NULL;

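    MemoryMap *map = calloc(1, sizeof(MemoryMap));
    if (!map) {
        fclose(f);
        return NULL;
    }
    map->capacity = 64;
    map->regions = calloc(map->capacity, sizeof(MemoryRegion));
    if (!map->regions) {
        free(map);
        fclose(f);
        return NULL;
    }

    char line[512];
    while (fgets(line, sizeof(line), f)) {
        if (map->count >= map->capacity) {
            // Grow through a temporary so the old block is not leaked on failure
            MemoryRegion *grown = realloc(map->regions,
                                          map->capacity * 2 * sizeof(MemoryRegion));
            if (!grown) break;  // Keep the regions parsed so far
            map->regions = grown;
            map->capacity *= 2;
        }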

        if (parse_maps_line(line, &map->regions[map->count]) == 0) {
            map->count++;
        }
    }

    fclose(f);

    // Calculate totals
    for (size_t i = 0; i < map->count; i++) {
        map->total_virtual += map->regions[i].size;
    }

    return map;
}

Checkpoint: ./vmmap --self shows all memory regions with correct categorization.
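
Before wiring the parser into the tool, it can help to check the `sscanf` format against literal lines, which is also a natural starting point for tests/test_parser.c. The sketch below is an assumption-laden stand-in: `MapsLineSketch` and `parse_maps_line_sketch` are hypothetical names, the device numbers are read with `%x` into `unsigned int`, and the code assumes a 64-bit Linux target where `unsigned long` holds a full address.

```c
#include <stdio.h>

/* Hypothetical stand-in for MemoryRegion, holding only the raw fields. */
typedef struct {
    unsigned long start, end, offset, inode;
    unsigned int dev_major, dev_minor;
    char perms[5];
    char pathname[256];
} MapsLineSketch;

/*
 * Parse one line in the format:
 *   start-end perms offset dev inode pathname
 * Returns the number of converted fields (7 when the pathname is absent,
 * 8 when it is present) or -1 if the line is malformed.
 */
int parse_maps_line_sketch(const char *line, MapsLineSketch *out) {
    out->pathname[0] = '\0';   /* pathname is optional */
    int fields = sscanf(line, "%lx-%lx %4s %lx %x:%x %lu %255[^\n]",
                        &out->start, &out->end, out->perms, &out->offset,
                        &out->dev_major, &out->dev_minor, &out->inode,
                        out->pathname);
    return fields >= 7 ? fields : -1;
}
```

The space before `%255[^\n]` skips the column padding, so the pathname starts at its first visible character. An anonymous region simply stops after the inode, which is why 7 fields counts as success.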

Phase 2: Memory Statistics (Days 4-5)

Goals:

  • Read RSS, page faults from /proc
  • Calculate memory summaries
  • Enhance output with statistics

Key Code - memory_stats.c:

#include <stdio.h>
#include <string.h>
#include "vmmap.h"

// Read page fault counts from /proc/[pid]/stat
int get_page_faults(pid_t pid, PageFaultStats *stats) {
    char path[64];
    snprintf(path, sizeof(path), "/proc/%d/stat", pid);

    FILE *f = fopen(path, "r");
    if (!f) return -1;

    // Fields in /proc/[pid]/stat:
    // 1:pid 2:comm 3:state 4:ppid ... 10:minflt 11:cminflt 12:majflt ...
    char line[1024];
    if (!fgets(line, sizeof(line), f)) {
        fclose(f);
        return -1;
    }
    fclose(f);

    // Skip to fields we need (10 and 12)
    unsigned long minflt, cminflt, majflt, cmajflt;

    // Find end of comm field (in parentheses)
    char *p = strrchr(line, ')');
    if (!p) return -1;

    // Skip state and parse fields
    if (sscanf(p + 2, "%*c %*d %*d %*d %*d %*d %*u %lu %lu %lu %lu",
               &minflt, &cminflt, &majflt, &cmajflt) != 4) {
        return -1;
    }

    stats->minor_faults = minflt;
    stats->major_faults = majflt;
    return 0;
}

// Read memory statistics from /proc/[pid]/status
int get_memory_status(pid_t pid, MemoryMap *map) {
    char path[64];
    snprintf(path, sizeof(path), "/proc/%d/status", pid);

    FILE *f = fopen(path, "r");
    if (!f) return -1;

    char line[256];
    while (fgets(line, sizeof(line), f)) {
        unsigned long value;

        if (sscanf(line, "VmPeak: %lu kB", &value) == 1)
            map->vm_peak = value * 1024;
        else if (sscanf(line, "VmSize: %lu kB", &value) == 1)
            map->vm_size = value * 1024;
        else if (sscanf(line, "VmRSS: %lu kB", &value) == 1)
            map->vm_rss = value * 1024;
        else if (sscanf(line, "VmData: %lu kB", &value) == 1)
            map->vm_data = value * 1024;
        else if (sscanf(line, "VmStk: %lu kB", &value) == 1)
            map->vm_stack = value * 1024;
        else if (sscanf(line, "VmExe: %lu kB", &value) == 1)
            map->vm_exe = value * 1024;
        else if (sscanf(line, "VmLib: %lu kB", &value) == 1)
            map->vm_lib = value * 1024;
    }

    fclose(f);
    map->total_resident = map->vm_rss;
    return 0;
}

Checkpoint: Memory map output includes RSS, page faults, and size summaries.
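
The reason for searching for the last `)` is that the comm field may itself contain spaces and parentheses. Separating the string parsing from the file I/O makes that easy to test. The following is a sketch under that assumption; `parse_stat_faults` is a hypothetical helper name, not part of the project's header.

```c
#include <stdio.h>
#include <string.h>

/*
 * Extract minflt (field 10) and majflt (field 12) from the text of one
 * /proc/[pid]/stat line. Returns 0 on success, -1 if the line is malformed.
 */
int parse_stat_faults(const char *line,
                      unsigned long *minflt, unsigned long *majflt) {
    /* comm (field 2) is wrapped in parentheses and may contain ')' itself,
       so the LAST ')' marks the end of the field. */
    const char *p = strrchr(line, ')');
    if (p == NULL || p[1] != ' ' || p[2] == '\0')
        return -1;

    unsigned long cminflt, cmajflt;
    /* After comm: state ppid pgrp session tty_nr tpgid flags
       minflt cminflt majflt cmajflt ... */
    if (sscanf(p + 2, "%*c %*d %*d %*d %*d %*d %*u %lu %lu %lu %lu",
               minflt, &cminflt, majflt, &cmajflt) != 4)
        return -1;
    return 0;
}
```

`get_page_faults()` could read the line with `fgets()` as before and hand it to a helper like this one, which keeps the tricky part covered by a unit test.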

Phase 3: Demand Paging Demo (Days 6-7)

Goals:

  • Demonstrate that allocated memory isn't resident until touched
  • Count page faults during access
  • Explain the observation

Key Code - demand_demo.c:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>
#include "vmmap.h"

#define DEMO_SIZE ((size_t)100 * 1024 * 1024)  // 100 MB (size_t so %zu matches)

void demo_demand_paging(void) {
    printf("\n=== DEMAND PAGING DEMONSTRATION ===\n\n");

    // Step 1: Get initial page fault count
    PageFaultStats before, after;
    get_page_faults(getpid(), &before);

    printf("Step 1: Initial page fault count\n");
    printf("  Minor faults: %lu\n", before.minor_faults);
    printf("  Major faults: %lu\n\n", before.major_faults);

    // Step 2: Allocate without touching
    printf("Step 2: Allocating %zu MB with mmap (no MAP_POPULATE)\n",
           DEMO_SIZE / (1024 * 1024));

    void *ptr = mmap(NULL, DEMO_SIZE, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (ptr == MAP_FAILED) {
        perror("mmap");
        return;
    }

    printf("  Virtual address: %p\n", ptr);
    printf("  Size: %zu pages\n\n", DEMO_SIZE / 4096);

    // Step 3: Check RSS before touching
    MemoryMap map = {0};
    get_memory_status(getpid(), &map);
    get_page_faults(getpid(), &after);

    printf("Step 3: Memory status BEFORE touching\n");
    printf("  Virtual size: %lu MB\n", map.vm_size / (1024 * 1024));
    printf("  RSS: %lu MB\n", map.vm_rss / (1024 * 1024));
    printf("  Page faults since alloc: %lu\n\n",
           after.minor_faults - before.minor_faults);

    // Step 4: Touch every page
    // Write rather than read: a read of untouched anonymous memory can be
    // satisfied by the kernel's shared zero page, which does not raise RSS.
    // With transparent huge pages enabled, the fault count can also be far
    // lower than one per 4 KB page.
    printf("Step 4: Touching every page (writing first byte)...\n");

    volatile char *p = (volatile char *)ptr;
    size_t pages = DEMO_SIZE / 4096;

    for (size_t i = 0; i < pages; i++) {
        p[i * 4096] = 1;  // Touch first byte of each page
        if (i % (pages / 10) == 0) {
            printf("  Progress: %zu%%\n", (i * 100) / pages);
        }
    }
    printf("  Progress: 100%%\n\n");

    // Step 5: Check RSS after touching
    get_memory_status(getpid(), &map);
    get_page_faults(getpid(), &after);

    printf("Step 5: Memory status AFTER touching\n");
    printf("  Virtual size: %lu MB\n", map.vm_size / (1024 * 1024));
    printf("  RSS: %lu MB\n", map.vm_rss / (1024 * 1024));
    printf("  New page faults: %lu\n\n",
           after.minor_faults - before.minor_faults);

    // Explanation
    printf("EXPLANATION:\n");
    printf("  - We allocated %zu MB of virtual memory\n",
           DEMO_SIZE / (1024 * 1024));
    printf("  - Before touching: RSS was much smaller than allocation\n");
    printf("  - Each first access triggered a MINOR page fault\n");
    printf("  - The kernel allocated physical frames on demand\n");
    printf("  - After touching: RSS increased by ~%zu MB\n",
           DEMO_SIZE / (1024 * 1024));
    printf("  - Total page faults: ~%zu (one per page)\n\n", pages);

    munmap(ptr, DEMO_SIZE);

    printf("=== END DEMO ===\n\n");
}

Checkpoint: Demo shows page faults increasing as pages are touched.
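
Fault counters and RSS are process-wide, so they pick up noise from stdio and the allocator. As a cross-check, `mincore()` reports residency for exactly the pages of one mapping. The sketch below assumes Linux; `count_resident_pages` and `residency_experiment` are hypothetical helper names introduced for illustration.

```c
#define _DEFAULT_SOURCE 1   /* exposes mincore() and MAP_ANONYMOUS under -std=c11 */
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count resident pages in [addr, addr + len). addr must be page-aligned.
   Returns -1 on error. */
long count_resident_pages(void *addr, size_t len) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages = (len + page - 1) / page;
    unsigned char *vec = malloc(npages);
    if (vec == NULL)
        return -1;
    if (mincore(addr, len, vec) != 0) {
        free(vec);
        return -1;
    }
    long resident = 0;
    for (size_t i = 0; i < npages; i++)
        resident += vec[i] & 1;   /* bit 0 set: page is in RAM */
    free(vec);
    return resident;
}

/* Map npages anonymous pages, write one byte to each, and report how many
   pages were resident before and after. Returns 0 on success. */
int residency_experiment(size_t npages, long *before, long *after) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * page;
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    *before = count_resident_pages(p, len);
    for (size_t i = 0; i < npages; i++)
        ((volatile unsigned char *)p)[i * page] = 1;   /* write fault */
    *after = count_resident_pages(p, len);
    munmap(p, len);
    return (*before < 0 || *after < 0) ? -1 : 0;
}
```

On a typical Linux system the "before" count for a fresh anonymous mapping should be zero and the "after" count should equal the number of pages touched, which is the same story the RSS numbers tell with less noise.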

Phase 4: Protection Fault Demo (Days 8-10)

Goals:

  • Trigger various protection violations safely
  • Catch and explain each fault type
  • Show relevant page table information

Key Code - fault_demo.c:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <signal.h>
#include <setjmp.h>
#include <sys/mman.h>
#include <unistd.h>
#include "vmmap.h"

static sigjmp_buf jump_buffer;
static volatile sig_atomic_t fault_caught = 0;
static FaultResult last_fault = {0};

// Signal handler for SIGSEGV
static void segv_handler(int sig, siginfo_t *info, void *context) {
    (void)sig;
    (void)context;

    fault_caught = 1;
    last_fault.fault_address = (uint64_t)(uintptr_t)info->si_addr;
    last_fault.signal_received = sig;
    last_fault.signal_code = info->si_code;

    // Determine explanation based on signal code
    switch (info->si_code) {
        case SEGV_MAPERR:
            last_fault.explanation = "Address not mapped to object";
            break;
        case SEGV_ACCERR:
            last_fault.explanation = "Invalid permissions for mapped object";
            break;
        default:
            last_fault.explanation = "Unknown SIGSEGV cause";
    }

    siglongjmp(jump_buffer, 1);
}

// Install SIGSEGV handler
static void install_handler(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = segv_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, NULL);
}

// Demo: NULL pointer dereference
static void demo_null_deref(void) {
    printf("\n--- NULL Pointer Dereference ---\n\n");
    printf("Attempting to read from address 0x0...\n\n");

    fault_caught = 0;
    if (sigsetjmp(jump_buffer, 1) == 0) {
        volatile int *null_ptr = NULL;
        int value = *null_ptr;  // This will fault
        (void)value;
    }

    if (fault_caught) {
        printf("*** SIGSEGV received! ***\n\n");
        printf("Fault analysis:\n");
        printf("  Faulting address: 0x%lx\n", last_fault.fault_address);
        printf("  Signal code: %s\n", last_fault.explanation);
        printf("\n");
        printf("  Why this faulted:\n");
        printf("  - Address 0x0 is in the first page (NULL guard)\n");
        printf("  - This page is intentionally unmapped\n");
        printf("  - Any access causes SEGV_MAPERR\n");
    }
}

// Demo: Write to read-only memory
static void demo_write_readonly(void) {
    printf("\n--- Write to Read-Only Memory ---\n\n");

    void *ptr = mmap(NULL, 4096, PROT_READ,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (ptr == MAP_FAILED) {
        perror("mmap");
        return;
    }

    printf("Mapped 4KB read-only at %p\n", ptr);
    printf("Permissions: r--p (read only)\n\n");
    printf("Attempting to write...\n\n");

    fault_caught = 0;
    if (sigsetjmp(jump_buffer, 1) == 0) {
        *(volatile int *)ptr = 42;  // This will fault
    }

    if (fault_caught) {
        printf("*** SIGSEGV received! ***\n\n");
        printf("Fault analysis:\n");
        printf("  Faulting address: 0x%lx\n", last_fault.fault_address);
        printf("  Signal code: %s\n", last_fault.explanation);
        printf("\n");
        printf("  Why this faulted:\n");
        printf("  - Page is mapped (SEGV_ACCERR, not SEGV_MAPERR)\n");
        printf("  - But write permission is not set\n");
        printf("  - CPU checked PTE.R/W bit = 0, faulted\n");
        printf("\n");
        printf("  Fix: Use mprotect() to add PROT_WRITE\n");
    }

    munmap(ptr, 4096);
}

// Demo: Execute non-executable memory (NX bit)
static void demo_execute_data(void) {
    printf("\n--- Execute Non-Executable Memory ---\n\n");

    // Machine code for: mov eax, 42; ret
    unsigned char code[] = {0xb8, 0x2a, 0x00, 0x00, 0x00, 0xc3};

    void *ptr = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (ptr == MAP_FAILED) {
        perror("mmap");
        return;
    }

    memcpy(ptr, code, sizeof(code));

    printf("Mapped 4KB read-write at %p\n", ptr);
    printf("Copied executable code to data page\n");
    printf("Permissions: rw-p (no execute!)\n\n");
    printf("Attempting to execute...\n\n");

    typedef int (*func_t)(void);
    func_t func = (func_t)ptr;

    fault_caught = 0;
    if (sigsetjmp(jump_buffer, 1) == 0) {
        func();  // This will fault (NX bit)
    }

    if (fault_caught) {
        printf("*** SIGSEGV received! ***\n\n");
        printf("Fault analysis:\n");
        printf("  Faulting address: 0x%lx\n", last_fault.fault_address);
        printf("  Signal code: %s\n", last_fault.explanation);
        printf("\n");
        printf("  Why this faulted:\n");
        printf("  - Page lacks execute permission (NX bit set)\n");
        printf("  - This is the NX/XD security feature\n");
        printf("  - Prevents code injection attacks\n");
        printf("\n");
        printf("  Fix: Use mprotect() to add PROT_EXEC\n");
    }

    munmap(ptr, 4096);
}

// Main demo dispatcher
void demo_fault(const char *type) {
    printf("\n=== PROTECTION FAULT DEMONSTRATION ===\n");

    install_handler();

    if (strcmp(type, "null") == 0) {
        demo_null_deref();
    } else if (strcmp(type, "write-readonly") == 0) {
        demo_write_readonly();
    } else if (strcmp(type, "execute-data") == 0) {
        demo_execute_data();
    } else {
        printf("Unknown fault type: %s\n", type);
        printf("Available: null, write-readonly, execute-data\n");
    }

    printf("\n=== END DEMO ===\n\n");
}

Checkpoint: Each fault type is triggered, caught, and explained correctly.
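
The write-to-read-only demo ends by suggesting `mprotect()` as the fix. A minimal sketch of that fix follows, assuming Linux; `write_after_mprotect` is a hypothetical function name used only for this illustration.

```c
#define _DEFAULT_SOURCE 1   /* exposes MAP_ANONYMOUS under -std=c11 */
#include <sys/mman.h>
#include <unistd.h>

/* Map one read-only page, upgrade it to read-write with mprotect(),
   store value in it, and return what is read back (-1 on failure). */
int write_after_mprotect(int value) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    int *p = mmap(NULL, page, PROT_READ,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    int result = -1;
    /* Without this call, the store below would raise SIGSEGV (SEGV_ACCERR). */
    if (mprotect(p, page, PROT_READ | PROT_WRITE) == 0) {
        *p = value;
        result = *p;
    }
    munmap(p, page);
    return result;
}
```

After the `mprotect()` call the region would show up as `rw-p` instead of `r--p` in /proc/self/maps, which makes a nice before/after pair for the tool's own output.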

Phase 5: Copy-on-Write Demo (Days 11-12)

Goals:

  • Demonstrate memory sharing after fork()
  • Show COW trigger on write
  • Measure memory before/after

Key Code - cow_demo.c:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>
#include "vmmap.h"

#define COW_SIZE ((size_t)10 * 1024 * 1024)  // 10 MB (size_t so %zu matches)

void demo_cow(void) {
    printf("\n=== COPY-ON-WRITE DEMONSTRATION ===\n\n");

    // Step 1: Allocate and initialize data
    printf("Step 1: Parent (PID %d) allocating %zu MB\n",
           getpid(), COW_SIZE / (1024 * 1024));

    volatile uint32_t *data = mmap(NULL, COW_SIZE,
                                   PROT_READ | PROT_WRITE,
                                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (data == MAP_FAILED) {
        perror("mmap");
        return;
    }

    // Touch all pages to make them resident
    for (size_t i = 0; i < COW_SIZE / sizeof(uint32_t); i += 4096 / sizeof(uint32_t)) {
        data[i] = 0xDEADBEEF;
    }

    printf("  Address: %p\n", (void *)data);
    printf("  Initial value at page 0: 0x%08X\n\n", data[0]);

    // Step 2: Memory before fork
    MemoryMap parent_map = {0};
    get_memory_status(getpid(), &parent_map);

    printf("Step 2: Memory before fork()\n");
    printf("  Parent RSS: %lu MB\n\n", parent_map.vm_rss / (1024 * 1024));

    // Step 3: Fork
    printf("Step 3: Forking...\n");
    fflush(stdout);

    pid_t child = fork();

    if (child == 0) {
        // Child process
        printf("\n  Child (PID %d) created\n", getpid());

        // Check memory immediately after fork
        MemoryMap child_map = {0};
        get_memory_status(getpid(), &child_map);
        printf("  Child RSS immediately: %lu MB\n",
               child_map.vm_rss / (1024 * 1024));
        printf("  Child sees data[0] = 0x%08X (shared!)\n\n", data[0]);

        // Step 4: Write in child
        printf("Step 4: Child writing to page 0...\n");
        data[0] = 0xCAFEBABE;
        printf("  Child wrote 0x%08X to data[0]\n", data[0]);

        // Check memory after write
        get_memory_status(getpid(), &child_map);
        printf("  Child RSS after write: %lu MB\n",
               child_map.vm_rss / (1024 * 1024));

        printf("  (Page was copied - COW triggered!)\n\n");

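        // Exit child. _exit() skips stdio cleanup, so flush first or the
        // child's output is lost when stdout is a pipe or a file.
        fflush(stdout);
        munmap((void *)data, COW_SIZE);
        _exit(0);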

    } else if (child > 0) {
        // Parent process: waitpid() blocks until the child has printed
        // its output and exited, so no sleep is needed for ordering
        waitpid(child, NULL, 0);

        // Step 5: Verify parent data unchanged
        printf("Step 5: Parent checking its data...\n");
        printf("  Parent sees data[0] = 0x%08X (unchanged!)\n\n", data[0]);

        // Explanation
        printf("EXPLANATION:\n");
        printf("  1. Before fork: Parent had %zu MB resident\n",
               COW_SIZE / (1024 * 1024));
        printf("  2. After fork: Both share the same physical pages\n");
        printf("     - Pages are marked read-only in both\n");
        printf("     - Total physical memory: still ~%zu MB\n",
               COW_SIZE / (1024 * 1024));
        printf("  3. Child wrote to a page:\n");
        printf("     - Write caused protection fault (page was read-only)\n");
        printf("     - Kernel caught fault, copied the page\n");
        printf("     - Child got new page, parent keeps original\n");
        printf("  4. Only ONE page was copied (4 KB), not all %zu MB!\n\n",
               COW_SIZE / (1024 * 1024));

        munmap((void *)data, COW_SIZE);
    } else {
        perror("fork");
    }

    printf("=== END DEMO ===\n\n");
}

Checkpoint: Demo shows memory sharing and COW trigger with explanations.
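
For automated tests, grepping interleaved output is fragile. One alternative, sketched here under the assumption of a Linux target, is to have the child report what it sees through a pipe so the parent can compare values directly. `cow_roundtrip` is a hypothetical helper name, not part of the tool's interface.

```c
#define _DEFAULT_SOURCE 1   /* exposes MAP_ANONYMOUS under -std=c11 */
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork with one private page, let the child overwrite it, and collect the
   value each process ends up seeing. Returns 0 on success. */
int cow_roundtrip(uint32_t *parent_value, uint32_t *child_value) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    volatile uint32_t *data = mmap(NULL, page, PROT_READ | PROT_WRITE,
                                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (data == MAP_FAILED)
        return -1;
    data[0] = 0xDEADBEEF;

    int fds[2];
    if (pipe(fds) != 0) {
        munmap((void *)data, page);
        return -1;
    }

    pid_t child = fork();
    if (child < 0) {
        close(fds[0]);
        close(fds[1]);
        munmap((void *)data, page);
        return -1;
    }
    if (child == 0) {
        close(fds[0]);
        data[0] = 0xCAFEBABE;            /* this write triggers the copy */
        uint32_t seen = data[0];
        ssize_t w = write(fds[1], &seen, sizeof(seen));
        _exit(w == (ssize_t)sizeof(seen) ? 0 : 1);
    }

    close(fds[1]);
    uint32_t from_child = 0;
    /* Blocks until the child has written its value (or exited). */
    ssize_t n = read(fds[0], &from_child, sizeof(from_child));
    close(fds[0]);

    int status = 0;
    waitpid(child, &status, 0);

    *parent_value = data[0];             /* parent's page was never modified */
    *child_value = from_child;
    munmap((void *)data, page);
    return (n == (ssize_t)sizeof(from_child) &&
            WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

The pipe doubles as the synchronization point: the parent cannot read its own copy until the child's write has already happened, so the comparison is free of timing assumptions.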

Phase 6: Polish and Integration (Days 13-14)

Goals:

  • Clean error handling
  • Comprehensive help message
  • Test all edge cases

Final Integration in main.c:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include "vmmap.h"

void print_usage(const char *prog) {
    printf("Usage: %s [OPTIONS]\n\n", prog);
    printf("Options:\n");
    printf("  --self              Show memory map of this process\n");
    printf("  --map <pid>         Show memory map of process <pid>\n");
    printf("  --demand            Demonstrate demand paging\n");
    printf("  --fault <type>      Trigger protection fault\n");
    printf("                      Types: null, write-readonly, execute-data\n");
    printf("  --cow               Demonstrate copy-on-write\n");
    printf("  --help              Show this help message\n");
}

int main(int argc, char *argv[]) {
    if (argc < 2) {
        print_usage(argv[0]);
        return 1;
    }

    if (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-h") == 0) {
        print_usage(argv[0]);
        return 0;
    }

    if (strcmp(argv[1], "--self") == 0) {
        MemoryMap *map = parse_memory_map(getpid());
        if (!map) {
            fprintf(stderr, "Cannot read own memory map\n");
            return 1;
        }
        get_memory_status(getpid(), map);
        print_memory_map(map);
        free_memory_map(map);
        return 0;
    }

    if (strcmp(argv[1], "--map") == 0 && argc > 2) {
        pid_t pid = atoi(argv[2]);
        MemoryMap *map = parse_memory_map(pid);
        if (map) {
            get_memory_status(pid, map);
            print_memory_map(map);
            free_memory_map(map);
        } else {
            fprintf(stderr, "Cannot read map for PID %d\n", pid);
            return 1;
        }
        return 0;
    }

    if (strcmp(argv[1], "--demand") == 0) {
        demo_demand_paging();
        return 0;
    }

    if (strcmp(argv[1], "--fault") == 0 && argc > 2) {
        demo_fault(argv[2]);
        return 0;
    }

    if (strcmp(argv[1], "--cow") == 0) {
        demo_cow();
        return 0;
    }

    fprintf(stderr, "Unknown option: %s\n", argv[1]);
    print_usage(argv[0]);
    return 1;
}

6. Testing Strategy

6.1 Test Categories

Category | Purpose | Examples
Unit Tests | Test parsing functions | Maps parser handles all line formats
Integration Tests | Full tool on sample processes | Map matches expected output
Fault Tests | Controlled crashes work correctly | Each fault type is caught
Cross-Process | Analyze other processes | Can read /proc/1/maps (if permitted)

6.2 Critical Test Cases

1. Maps Parser Tests:

# Create test program with various regions
cat > tests/varied_regions.c << 'EOF'
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

int global_init = 42;        // .data
int global_uninit;           // .bss
const char *rodata = "test"; // string literal in .rodata (the pointer is in .data)

int main() {
    char stack_var[4096];    // stack
    void *heap = malloc(4096); // heap
    void *anon = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

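    printf("PID %d is waiting; kill it when done\n", getpid());
    fflush(stdout);
    pause();  // Block until a signal arrives so the map can be inspected

    free(heap);
    munmap(anon, 4096);
    return 0;
}
EOF

gcc -o tests/varied_regions tests/varied_regions.c
./tests/varied_regions &
PID=$!
sleep 1  # Give the program time to reach pause()
./vmmap --map $PID
kill $PID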

2. Demand Paging Verification:

# Should see page faults increase by ~25600 for 100MB
./vmmap --demand 2>&1 | grep "New page faults"
# Expected: approximately 25600 (100MB / 4KB)

3. Fault Handler Tests:

# Each should catch fault and not crash the tool
./vmmap --fault null
./vmmap --fault write-readonly
./vmmap --fault execute-data

4. COW Verification:

# Child and parent should see different values
./vmmap --cow 2>&1 | grep "0x"
# Parent should see 0xDEADBEEF
# Child should see 0xCAFEBABE

6.3 Test Automation Script

#!/bin/bash
# tests/run_tests.sh

set -e

echo "=== Testing vmmap ==="

echo "1. Testing --self..."
./vmmap --self > /dev/null
echo "   PASS"

echo "2. Testing --demand..."
./vmmap --demand | grep -q "page faults"
echo "   PASS"

echo "3. Testing --fault null..."
./vmmap --fault null | grep -q "SIGSEGV"
echo "   PASS"

echo "4. Testing --fault write-readonly..."
./vmmap --fault write-readonly | grep -q "SIGSEGV"
echo "   PASS"

echo "5. Testing --cow..."
./vmmap --cow | grep -q "0xDEADBEEF"
echo "   PASS"

echo ""
echo "=== All tests passed! ==="

7. Common Pitfalls

7.1 Frequent Mistakes

Pitfall | Symptom | Solution
Not handling empty pathname | Segfault on parse | Check for empty string before using
Signal handler not async-safe | Mysterious crashes | Only use async-signal-safe functions
Forgetting siglongjmp | Handler hangs (the faulting instruction re-runs after the handler returns) | Always have recovery path
ASLR confuses testing | Addresses differ between runs | Disable with setarch "$(uname -m)" -R ./vmmap for testing
COW demo races | Output interleaved | Use proper synchronization or delays
Parsing /proc race | Process exits during read | Handle errors gracefully

7.2 Debugging Strategies

# Debug maps parsing
cat /proc/self/maps | head -5  # See actual format

# Debug page faults
perf stat -e page-faults ./vmmap --demand

# Debug signal handling
strace -e signal ./vmmap --fault null

# Debug COW
pmap -x <parent_pid>
pmap -x <child_pid>

7.3 Signal Handler Safety

// WRONG: Not async-signal-safe
void bad_handler(int sig) {
    printf("Caught signal %d\n", sig);  // printf is NOT safe!
    malloc(100);                         // malloc is NOT safe!
}

// RIGHT: Async-signal-safe
static volatile sig_atomic_t caught = 0;

void good_handler(int sig, siginfo_t *info, void *ctx) {
    (void)sig; (void)info; (void)ctx;
    caught = 1;
    // Only set simple flags, then siglongjmp out
    siglongjmp(jump_buffer, 1);
}
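
Put together, the flag-free variant of this pattern fits in one small function. The sketch below is Linux-oriented and uses hypothetical names (`probe_env`, `probe_handler`, `probe_read`); it installs a temporary handler, attempts a one-byte read, and restores the previous handler on both paths.

```c
#define _DEFAULT_SOURCE 1   /* exposes sigaction() and sigsetjmp() under -std=c11 */
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf probe_env;

/* The handler does nothing except jump back to the recovery point. */
static void probe_handler(int sig) {
    (void)sig;
    siglongjmp(probe_env, 1);
}

/* Returns 1 if one byte at addr can be read, 0 if the read raised SIGSEGV,
   and -1 if the handler could not be installed. Not thread-safe. */
int probe_read(const void *addr) {
    struct sigaction sa, old;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = probe_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0)
        return -1;

    /* savemask = 1: siglongjmp() restores the signal mask, so SIGSEGV is
       unblocked again after the jump out of the handler. */
    if (sigsetjmp(probe_env, 1) != 0) {
        sigaction(SIGSEGV, &old, NULL);   /* recovery path */
        return 0;
    }

    (void)*(const volatile char *)addr;   /* may fault */
    sigaction(SIGSEGV, &old, NULL);
    return 1;
}
```

Usage is a plain call such as `probe_read(ptr)`. Anything that needs `printf` or `malloc` belongs after the call returns, outside the handler.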

8. Extensions

8.1 Beginner Extensions

  • Add JSON output: Machine-readable format for scripting
  • Colorized output: Different colors for different region types
  • Size formatting: Human-readable sizes (KB, MB, GB)
  • Region search: Find which region contains a given address
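
As a starting point for the size-formatting extension, here is one possible helper. It is only a sketch: `format_size` is a hypothetical name, and the choice of binary units (1 KB = 1024 bytes) and of one decimal place from MB upward is an assumption rather than something the project specifies.

```c
#include <stddef.h>
#include <stdio.h>

/* Format a byte count into buf as e.g. "512 B", "132 KB" or "1.6 MB".
   Uses binary units (1 KB = 1024 bytes). Returns buf. */
char *format_size(unsigned long long bytes, char *buf, size_t buflen) {
    static const char *units[] = { "B", "KB", "MB", "GB", "TB" };
    double value = (double)bytes;
    size_t u = 0;

    while (value >= 1024.0 && u < 4) {
        value /= 1024.0;
        u++;
    }
    if (u <= 1)
        snprintf(buf, buflen, "%.0f %s", value, units[u]);  /* whole B / KB */
    else
        snprintf(buf, buflen, "%.1f %s", value, units[u]);  /* one decimal */
    return buf;
}
```

A caller would pass a small stack buffer, for example `char b[32]; format_size(region_size, b, sizeof(b));`, and print the result in the SIZE column.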

8.2 Intermediate Extensions

  • Compare two processes: Show differences in memory layout
  • Track memory over time: Poll and show changes
  • Memory-mapped file explorer: List all mmap'd files
  • Shared memory analysis: Show which pages are shared with whom

8.3 Advanced Extensions

  • Parse /proc/[pid]/pagemap: Show actual physical frame numbers
  • TLB miss estimation: Use perf to correlate with address patterns
  • Page table visualization: ASCII art of multi-level tables
  • NUMA awareness: Show which memory is local vs remote
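
For the pagemap extension, each virtual page has one 64-bit entry in /proc/[pid]/pagemap. The sketch below decodes such an entry using the bit layout from the kernel's pagemap documentation (bit 63 present, bit 62 swapped, bit 61 file-backed or shared anonymous, bits 0-54 frame number). The type and function names are hypothetical, and on recent kernels unprivileged readers typically see the frame number field as zero.

```c
#include <stdint.h>

/* Hypothetical decoded view of one pagemap entry. */
typedef struct {
    int present;          /* bit 63: page is in RAM */
    int swapped;          /* bit 62: page is in swap */
    int file_or_shared;   /* bit 61: file-backed or shared anonymous page */
    uint64_t pfn;         /* bits 0-54, meaningful only when present */
} PagemapEntrySketch;

/* Split a raw 64-bit pagemap entry into its fields. */
PagemapEntrySketch decode_pagemap_entry(uint64_t raw) {
    PagemapEntrySketch e;
    e.present = (int)((raw >> 63) & 1);
    e.swapped = (int)((raw >> 62) & 1);
    e.file_or_shared = (int)((raw >> 61) & 1);
    e.pfn = e.present ? (raw & ((UINT64_C(1) << 55) - 1)) : 0;
    return e;
}

/* Byte offset inside /proc/[pid]/pagemap of the entry covering vaddr. */
uint64_t pagemap_offset(uint64_t vaddr, uint64_t page_size) {
    return (vaddr / page_size) * sizeof(uint64_t);
}
```

Reading an entry would then be a `pread()` of 8 bytes at `pagemap_offset(addr, page_size)` from the opened pagemap file, followed by a call to `decode_pagemap_entry()`.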

9. Real-World Connections

9.1 Industry Applications

  • Memory profiling tools (Valgrind, massif): Understanding maps is foundational
  • Container runtimes (Docker, containerd): cgroups + namespaces change memory views
  • Security tools: ASLR verification, memory protection auditing
  • Debuggers (GDB, LLDB): Must understand VM to show correct state
  • JIT compilers: Allocate executable memory with mmap + mprotect

9.2 Useful Commands

# Show memory maps
pmap -x <pid>           # Detailed process map
cat /proc/<pid>/maps    # Raw kernel view
cat /proc/<pid>/smaps   # Detailed per-region stats

# Memory statistics
free -h                 # System-wide memory
vmstat 1                # Virtual memory stats over time
cat /proc/meminfo       # Detailed memory info

# Page faults
ps -o min_flt,maj_flt -p <pid>  # Fault counts
perf stat -e page-faults ./cmd   # Count faults during run

# Memory mapping
strace -e mmap,mprotect,munmap ./cmd  # Trace mmap calls

9.3 Interview Relevance

This project prepares you for questions like:

  • "How does virtual memory work?"
  • "What happens when you dereference a NULL pointer?"
  • "Why is fork() considered efficient?"
  • "Explain the difference between RSS and virtual size"
  • "How does ASLR improve security?"
  • "What's the purpose of guard pages?"

10. Resources

10.1 Essential Reading

  • CS:APP Chapter 9: "Virtual Memory" - The foundation
  • CS:APP Chapter 8: "Exceptional Control Flow" - Signals and faults
  • Understanding the Linux Virtual Memory Manager by Mel Gorman
  • Linux Kernel Development, 3rd Ed by Robert Love (Ch. 15: Process Address Space)

10.2 Documentation

  • man 5 proc - /proc filesystem documentation
  • man mmap - Memory mapping
  • man mprotect - Changing memory protection
  • man sigaction - Signal handling

10.3 Online Resources

  • Previous: P11 (Signals + Processes Sandbox) - Prerequisite for ECF
  • Parallel: P9 (Cache Simulator) - Locality concepts
  • Next: P14 (Build Your Own Malloc) - Uses VM concepts heavily

11. Self-Assessment Checklist

Before considering this project complete, verify:

Understanding

  • I can explain the multi-level page table structure
  • I understand the difference between virtual and physical addresses
  • I can explain what happens on a TLB miss
  • I understand demand paging and can explain minor vs major faults
  • I can read /proc/[pid]/maps and explain each field
  • I understand copy-on-write and why fork() is efficient
  • I can explain protection bits (R/W/X) and what happens on violation
  • I understand ASLR and its security implications

Implementation

  • Maps parser correctly identifies all region types
  • Demand paging demo shows clear before/after fault counts
  • Protection fault demos catch and explain each fault type
  • COW demo shows sharing and copy trigger
  • Error handling works for inaccessible processes
  • Output is clear and educational

Growth

  • I can debug a segfault by analyzing the memory map
  • I understand how VM interacts with cache locality
  • I can explain memory usage discrepancies (virtual vs resident)
  • I'm comfortable using mmap() and mprotect()
  • I can write correct signal handlers

12. Real World Outcome

When you complete this project, here's exactly what you'll see when running your tool:

Viewing Your Own Process Memory Map

$ ./vmmap --self

=== VIRTUAL MEMORY MAP: vmmap (PID 12345) ===

REGION TYPE          START ADDR         END ADDR           SIZE       PERM   DESCRIPTION
--------------------------------------------------------------------------------------
[text]               0x0000555555554000 0x0000555555556000     8 KB    r-xp   vmmap (executable)
[rodata]             0x0000555555556000 0x0000555555557000     4 KB    r--p   vmmap (read-only data)
[data]               0x0000555555557000 0x0000555555558000     4 KB    rw-p   vmmap (initialized data)
[bss]                0x0000555555558000 0x0000555555559000     4 KB    rw-p   vmmap (uninitialized data)
[heap]               0x0000555555559000 0x000055555557a000   132 KB    rw-p   [heap]
[shared-lib]         0x00007ffff7c00000 0x00007ffff7c28000   160 KB    r--p   libc.so.6
[shared-lib]         0x00007ffff7c28000 0x00007ffff7dbd000  1620 KB    r-xp   libc.so.6 (code)
[shared-lib]         0x00007ffff7dbd000 0x00007ffff7e15000   352 KB    r--p   libc.so.6 (rodata)
[shared-lib]         0x00007ffff7e15000 0x00007ffff7e19000    16 KB    rw-p   libc.so.6 (data)
[anon]               0x00007ffff7e19000 0x00007ffff7e26000    52 KB    rw-p   [anonymous]
[vdso]               0x00007ffff7fc0000 0x00007ffff7fc4000    16 KB    r-xp   [vdso]
[stack]              0x00007ffffffde000 0x00007ffffffff000   132 KB    rw-p   [stack]
[vsyscall]           0xffffffffff600000 0xffffffffff601000     4 KB    --xp   [vsyscall]

=== MEMORY SUMMARY ===
Total Virtual Size:    2.5 GB
  Code (.text):        1.8 MB (0.07%)
  Data (.data/.bss):   156 KB
  Heap:                132 KB
  Stack:               132 KB
  Shared Libraries:    15.2 MB
  Anonymous:           52 KB

Regions by Permission:
  r-xp (executable):   6 regions
  rw-p (read-write):   8 regions
  r--p (read-only):    4 regions

Demand Paging Demonstration

$ ./vmmap --demand

=== DEMAND PAGING DEMONSTRATION ===

Allocating 100 MB with mmap (no physical pages yet)...
  Virtual address: 0x7ffff0c00000
  Allocation time: 0.0001 seconds
  Page faults before: 1,247

Touching every page (forcing demand paging)...
  Pages to touch: 25,600 (100 MB / 4 KB)
  [####################] 100% complete
  Time elapsed: 0.847 seconds

Page fault analysis:
  Page faults after: 26,892
  New page faults:   25,645  <-- Almost exactly 25,600!
  Extra faults:      45 (library/stack growth)

What this demonstrates:
  - mmap() does NOT allocate physical memory immediately
  - Physical pages are allocated on-demand when first accessed
  - The first access to each 4 KB page triggers one page fault
  - The kernel satisfies faults by mapping anonymous pages

Memory before/after:
  RSS before touch:  2.1 MB
  RSS after touch:   102.1 MB  (+100 MB, as expected)

Protection Fault Demonstration

$ ./vmmap --fault null

=== PROTECTION FAULT DEMONSTRATION: NULL Pointer ===

Setting up SIGSEGV handler...

Attempting to dereference NULL pointer (address 0x0)...

*** CAUGHT SIGSEGV ***
  Signal:       SIGSEGV (Segmentation fault)
  Fault address: 0x0000000000000000
  Fault reason:  SEGV_MAPERR (address not mapped to object)

Why this happened:
  - Address 0x0 is in the "unmapped" region at the bottom of address space
  - The kernel deliberately leaves this unmapped to catch NULL pointer bugs
  - The MMU triggered an exception when the CPU tried to access this address
  - The kernel converted this to SIGSEGV delivered to our process

If this were a real bug, you'd see:
  Segmentation fault (core dumped)

$ ./vmmap --fault write-readonly

=== PROTECTION FAULT DEMONSTRATION: Write to Read-Only ===

Setting up SIGSEGV handler...

Creating read-only mapped region at 0x7ffff7f00000...
Attempting to write to read-only memory...

*** CAUGHT SIGSEGV ***
  Signal:       SIGSEGV (Segmentation fault)
  Fault address: 0x00007ffff7f00000
  Fault reason:  SEGV_ACCERR (invalid permissions for mapped object)

Why this happened:
  - The page is mapped but with read-only permission (r--p)
  - The MMU checked permissions on the PTE and found no write bit
  - Hardware exception -> kernel -> SIGSEGV to process

This is how the OS protects:
  - Code sections (.text) from modification
  - Shared library code from corruption
  - Read-only data (.rodata) integrity

Copy-on-Write Demonstration

$ ./vmmap --cow

=== COPY-ON-WRITE DEMONSTRATION ===

Creating private anonymous mapping (1 page = 4 KB)...
  Address: 0x7ffff7f00000
  Initial value: 0xDEADBEEF

Forking process...
  Parent PID: 12345
  Child PID:  12346

Before modification:
  Parent reads: 0xDEADBEEF at 0x7ffff7f00000
  Child reads:  0xDEADBEEF at 0x7ffff7f00000  (same physical page!)

Child modifying shared page to 0xCAFEBABE...

After modification:
  Parent reads: 0xDEADBEEF at 0x7ffff7f00000
  Child reads:  0xCAFEBABE at 0x7ffff7f00000

What happened:
  1. After fork(), parent and child shared the SAME physical page
  2. Both PTEs pointed to the same physical frame
  3. Both PTEs were marked read-only, so any write traps to the kernel
  4. When child wrote, MMU triggered page fault
  5. Kernel allocated NEW physical page for child
  6. Kernel copied contents to new page
  7. Kernel updated child's PTE to point to new page (now writable)
  8. Parent's page unchanged - processes now have independent copies

This is why fork() is efficient:
  - Only page table entries are copied, not actual pages
  - Physical pages are shared until one process modifies them
  - Most pages (code, libraries) are never modified -> stay shared

13. The Core Question You're Answering

"When my program uses memory, what actually happens between the addresses my code sees and the physical RAM chips in my computer?"

This project demystifies virtual memory by making the abstract concrete. You'll see that addresses in your code (virtual addresses) go through a complex translation involving page tables, the MMU, and kernel data structures before touching real hardware. You'll understand why processes can't see each other's memory, why dereferencing NULL crashes, and why fork() is amazingly fast despite "copying" an entire process.


14. Concepts You Must Understand First

Before starting this project, ensure you understand these concepts:

Concept | Why It Matters | Where to Learn
What a pointer is and how to dereference it | You'll be working with raw addresses | CS:APP 3.8, any C book Ch. 5-6
Hexadecimal and binary representation | Memory addresses are displayed in hex | CS:APP 2.1
Process isolation concept | VM's main purpose is process isolation | CS:APP 8.2.3
What a page fault is (conceptually) | You'll be triggering and catching these | CS:APP 9.3
Signal handling basics (SIGSEGV) | You'll catch segmentation faults | CS:APP 8.5
fork() and process creation | COW demo requires understanding fork | CS:APP 8.4.2
Basic file I/O in C | Reading /proc files | CS:APP 10.1-10.4

15. Questions to Guide Your Design

Work through these questions BEFORE writing code:

  1. Parsing Strategy: The /proc/[pid]/maps format has many fields. How will you parse each line reliably? What if a pathname has spaces?

  2. Region Classification: How do you distinguish heap from anonymous mmap? Stack from thread stacks? What heuristics will you use?

  3. Signal Safety: Your SIGSEGV handler will run in a dangerous context. What functions are async-signal-safe? How do you recover from the handler?

  4. Timing Page Faults: How do you measure page faults before and after an operation? What kernel interface provides this?

  5. COW Verification: How do you prove that parent and child initially share physical pages? Can you detect when the copy actually happens?

  6. Cross-Platform: The /proc filesystem is Linux-specific. If you want macOS support, what API would you use instead?

  7. Output Format: How do you make the output educational, not just a data dump? What explanations help the learner understand?


16. Thinking Exercise

Before writing any code, trace through this scenario by hand:

A program does:

int *ptr = malloc(8192);  // 2 pages
ptr[0] = 42;              // Write to first page
ptr[1024] = 99;           // Write to second page (4096 bytes later)
fork();
// In child:
ptr[0] = 100;             // Modify first page

Exercise: On paper, answer:

  1. Before fork(): How many physical pages back the malloc'd region? Are both pages allocated immediately or on-demand?

  2. After fork(), before write: How many total physical pages exist for this region (parent + child combined)? What do the page table entries look like in both processes?

  3. After child writes: Now how many physical pages exist? Which PTE changed? What triggered the copy?

  4. Memory usage: If we had forked 100 times and only one child modified the data, how many physical copies of the 2 pages would exist?

Verify your answers by implementing the COW demo and adding instrumentation to observe physical page allocation.


17. The Interview Questions They'll Ask

After completing this project, you'll be ready for these common interview questions:

  1. "Explain how virtual memory works."
    • Expected: Describe address translation, page tables, MMU role
    • Bonus: Mention TLB, multi-level page tables, and why VM enables process isolation
  2. "What happens when you dereference a NULL pointer?"
    • Expected: MMU finds no valid mapping -> page fault -> kernel sends SIGSEGV
    • Bonus: Explain why page 0 is deliberately unmapped, and how guard pages work
  3. "Why is fork() efficient even though it 'copies' an entire process?"
    • Expected: Copy-on-write - only page tables are copied, physical pages are shared
    • Bonus: Explain how COW pages are marked read-only and copied on first write
  4. "What's the difference between virtual memory size and RSS (Resident Set Size)?"
    • Expected: Virtual is address space reserved; RSS is physical pages currently in RAM
    • Bonus: Explain demand paging, why virtual >> RSS, and when pages get evicted
  5. "How does ASLR improve security?"
    • Expected: Randomizes addresses of stack, heap, libraries, making exploits harder
    • Bonus: Discuss what's randomized vs. fixed, and limitations (information leaks)
  6. "What causes a segmentation fault?"
    • Expected: Accessing unmapped memory or violating page permissions
    • Bonus: Distinguish SEGV_MAPERR (unmapped) from SEGV_ACCERR (permission violation)

18. Hints in Layers

If you're stuck, reveal hints one at a time:

Hint 1: Parsing /proc/[pid]/maps

The format is: address perms offset dev inode pathname

Example line:

7f9c2c000000-7f9c2c021000 rw-p 00000000 00:00 0                          [heap]

Parse with sscanf or by splitting on whitespace. Be careful: pathname can be empty or contain spaces!

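char perms[5], dev[16], pathname[4096] = "";  // pathname stays "" if the line has none
int n = sscanf(line, "%lx-%lx %4s %lx %15s %lu %4095[^\n]",
               &start, &end, perms, &offset, dev, &inode, pathname);
// n == 7: pathname present; n == 6: no pathname; anything else: malformed line

If pathname is empty, it's an anonymous region.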

Hint 2: Counting Page Faults

Read /proc/self/stat and extract fields 10 (minflt) and 12 (majflt).

Or use getrusage():

#include <sys/resource.h>

struct rusage usage;
getrusage(RUSAGE_SELF, &usage);
printf("Minor faults: %ld\n", usage.ru_minflt);
printf("Major faults: %ld\n", usage.ru_majflt);

Minor faults are resolved without disk I/O (for example, a fresh zero-filled page or a page already in the page cache). Major faults require reading the page from disk.

Hint 3: Signal Handler Recovery

You can't return normally from a SIGSEGV handler - the faulting instruction would just run again!

Use sigsetjmp/siglongjmp:

sigjmp_buf jump_buffer;

void handler(int sig, siginfo_t *info, void *ctx) {
    // Log the fault info safely
    siglongjmp(jump_buffer, 1);
}

// In main:
if (sigsetjmp(jump_buffer, 1) == 0) {
    // Try the dangerous operation
    *bad_ptr = 42;
} else {
    // Jumped here from handler
    printf("Caught fault!\n");
}

Hint 4: Demonstrating COW

To show COW is happening, you need to observe that parent and child see different values after one modifies:

int *shared = mmap(...);  // Not really "shared" after COW
*shared = 0xDEADBEEF;

pid_t pid = fork();
if (pid == 0) {
    // Child
    printf("Child before: %x\n", *shared);  // Same as parent
    *shared = 0xCAFEBABE;                   // Triggers COW
    printf("Child after: %x\n", *shared);   // Different from parent
    exit(0);
} else {
    // Parent
    wait(NULL);
    printf("Parent after child exit: %x\n", *shared);  // Still 0xDEADBEEF!
}

19. Books That Will Help

Topic | Book | Chapter/Section
Virtual Memory overview | CS:APP 3rd Ed | Chapter 9.1-9.3 "Physical and Virtual Addressing", "Address Spaces", "VM as Caching"
Page tables and translation | CS:APP 3rd Ed | Chapter 9.3 "VM as Tool for Caching"
Page faults and demand paging | CS:APP 3rd Ed | Chapter 9.3.3 "Page Faults"
Memory mapping (mmap) | CS:APP 3rd Ed | Chapter 9.8 "Memory Mapping"
Copy-on-write | CS:APP 3rd Ed | Chapter 9.8.3 "The fork Function Revisited"
Protection and permissions | CS:APP 3rd Ed | Chapter 9.7 "Memory Protection"
Linux VM implementation | Understanding the Linux Kernel | Chapter 8 "Memory Management"
Deep VM internals | Operating Systems: Three Easy Pieces | Chapters on VM (free online)
Intel paging hardware | Intel SDM Volume 3A | Chapter 4 "Paging"

20. Submission / Completion Criteria

Minimum Viable Completion:

  • Parse and display /proc/[pid]/maps correctly
  • Categorize regions by type
  • Show basic memory statistics
  • At least one working demonstration (demand paging, fault, or COW)

Full Completion:

  • All analysis modes work (--self, --map, --demand, --fault, --cow)
  • Accurate region categorization
  • Clear, educational output with explanations
  • All fault types demonstrated and caught
  • COW behavior demonstrated

Excellence (Going Above & Beyond):

  • Parse /proc/[pid]/pagemap for physical addresses
  • TLB/cache interaction analysis
  • Compare memory layouts across processes
  • Integration with perf for page fault profiling
  • Support for macOS (vm_region_recurse_64)

This guide was expanded from CSAPP_3E_DEEP_LEARNING_PROJECTS.md. For the complete learning path, see the project index.