Project 13: Virtual Memory Map Visualizer
Build a tool that reveals the invisible architecture of process memory: page tables, protection bits, demand paging, and the beautiful illusion that gives every process its own private address space.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Advanced |
| Time Estimate | 1-2 weeks |
| Language | C (Alternatives: Rust, Zig, C++) |
| Prerequisites | Project 11 (Signals + Processes Sandbox) recommended |
| Key Topics | Virtual memory, page tables, address translation, memory protection, demand paging, mmap, copy-on-write |
| CS:APP Chapters | 8, 9 |
Table of Contents
- Learning Objectives
- Theoretical Foundation
- Project Specification
- Solution Architecture
- Implementation Guide
- Testing Strategy
- Common Pitfalls
- Extensions
- Real-World Connections
- Resources
- Self-Assessment Checklist
1. Learning Objectives
By completing this project, you will:
- Understand virtual memory fundamentals: Explain why every process thinks it owns all of memory, and how the hardware + OS maintain this illusion
- Master address translation: Trace a virtual address through page tables to its physical location (or understand why it faults)
- Interpret /proc/[pid]/maps: Read and explain every field in a Linux memory map, connecting regions to their purpose
- Demonstrate demand paging: Create controlled experiments that trigger and observe page faults
- Explain memory protection: Understand R/W/X bits, why they exist, and what happens when violated
- Observe copy-on-write: Demonstrate how fork() shares memory until modification triggers a copy
- Connect VM to performance: Understand how TLB misses and page faults affect locality and cache behavior
- Debug memory-related crashes: Given a segfault address, explain exactly why the access failed
2. Theoretical Foundation
2.1 The Virtual Memory Abstraction
Every process believes it has exclusive access to a large, contiguous address space starting at 0. This is a beautiful lie maintained by hardware (MMU) and software (OS kernel):
Process A's View:            Process B's View:            Physical Reality:
0xFFFFFFFF ┌────────────┐    0xFFFFFFFF ┌────────────┐    ┌────────────────────┐
           │   Kernel   │               │   Kernel   │    │   Physical RAM     │
           ├────────────┤               ├────────────┤    │                    │
           │   Stack    │               │   Stack    │    │   Shared by ALL    │
           │     |      │               │     |      │    │   processes via    │
           │     v      │               │     v      │    │   page tables      │
           │            │               │            │    │                    │
           │            │               │            │    │   + Disk (swap)    │
           │     ^      │               │     ^      │    │                    │
           │     |      │               │     |      │    └────────────────────┘
           │   Heap     │               │   Heap     │
           ├────────────┤               ├────────────┤
           │   .bss     │               │   .bss     │
           ├────────────┤               ├────────────┤
           │   .data    │               │   .data    │
           ├────────────┤               ├────────────┤
           │   .text    │               │   .text    │
0x00000000 └────────────┘    0x00000000 └────────────┘
           "I own all of                "I own all of
            this memory!"                this memory!"
Why Virtual Memory?
- Isolation: Processes cannot corrupt each other's memory
- Simplicity: Programs don't need to know where they're loaded
- Efficiency: Memory is allocated on demand, shared when possible
- Protection: Different regions have different permissions (R/W/X)
- Convenience: Larger address spaces than physical memory (via paging to disk)
2.2 Pages: The Fundamental Unit
Memory is managed in fixed-size chunks called pages (typically 4KB on x86-64):
Virtual Address Space                    Physical Memory (RAM)
┌──────────────────────┐                 ┌──────────────────────┐
│ Page 0               ├────────────────▶│ Frame 47             │
├──────────────────────┤                 ├──────────────────────┤
│ Page 1               ├───┐             │ Frame 48             │
├──────────────────────┤   │             ├──────────────────────┤
│ Page 2               ├───┼────────────▶│ Frame 49             │
├──────────────────────┤   │             ├──────────────────────┤
│ Page 3 (unmapped)    │   │             │ Frame 50             │
├──────────────────────┤   │             ├──────────────────────┤
│ Page 4               ├───┼────────────▶│ Frame 51             │
├──────────────────────┤   │             ├──────────────────────┤
│ ...                  │   │             │ ...                  │
└──────────────────────┘   │             └──────────────────────┘
                           │
                           │             On Disk (Swap)
                           │             ┌──────────────────────┐
                           └────────────▶│ Swapped Page         │
                                         └──────────────────────┘
Page size = 4KB = 4096 bytes = 0x1000 bytes
Key Insight: Pages don't need to be contiguous in physical memory, and some may not be in memory at all!
2.3 Page Tables and Address Translation
The page table maps virtual pages to physical frames. On x86-64, this is a multi-level hierarchy:
48-bit Virtual Address (x86-64 with 4-level paging):

 63    48 47    39 38    30 29    21 20    12 11        0
┌────────┬────────┬────────┬────────┬────────┬───────────┐
│  Sign  │  PML4  │  PDPT  │   PD   │   PT   │  Offset   │
│ Extend │ Index  │ Index  │ Index  │ Index  │ (12 bits) │
└────────┴────────┴────────┴────────┴────────┴───────────┘
   16b       9b       9b       9b       9b       12b

Each index selects one of 512 entries (2^9 = 512) in that level's table.
The offset selects a byte within the 4KB page (2^12 = 4096).
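The bit layout above can be turned into a few shifts and masks. The following is a minimal sketch, assuming 4-level paging with 4KB pages; the names `VaFields`, `split_va`, `is_canonical`, and `print_va` are illustrative and not part of the project specification.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Index fields of a 48-bit x86-64 virtual address (4-level paging, 4KB pages). */
typedef struct {
    unsigned pml4;    /* bits 47..39 */
    unsigned pdpt;    /* bits 38..30 */
    unsigned pd;      /* bits 29..21 */
    unsigned pt;      /* bits 20..12 */
    unsigned offset;  /* bits 11..0  */
} VaFields;

/* A canonical address has bits 63..48 all equal to bit 47. */
int is_canonical(uint64_t va) {
    uint64_t top = va >> 47;            /* bits 63..47 (17 bits) */
    return top == 0 || top == 0x1FFFF;
}

/* Hypothetical helper: split a canonical virtual address into table indices. */
VaFields split_va(uint64_t va) {
    assert(is_canonical(va));
    VaFields f;
    f.pml4   = (unsigned)((va >> 39) & 0x1FF);  /* 9 bits -> 512 entries */
    f.pdpt   = (unsigned)((va >> 30) & 0x1FF);
    f.pd     = (unsigned)((va >> 21) & 0x1FF);
    f.pt     = (unsigned)((va >> 12) & 0x1FF);
    f.offset = (unsigned)(va & 0xFFF);          /* 12 bits -> byte in 4KB page */
    return f;
}

/* Print the decomposition in a form that mirrors the diagram. */
void print_va(uint64_t va) {
    VaFields f = split_va(va);
    printf("VA 0x%016llx -> PML4[%u] PDPT[%u] PD[%u] PT[%u] + 0x%03x\n",
           (unsigned long long)va, f.pml4, f.pdpt, f.pd, f.pt, f.offset);
}
```

Calling `print_va(0x00007fffabcd1234ULL)` from a `main()` shows which entry is selected at each level for a typical stack address.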
Multi-Level Page Table Walk:

┌─────────────────────────────────────────────────────────┐
│               CR3 (Page Map Level 4 Base)               │
└────────────────────────────┬────────────────────────────┘
                             │
                             ▼
┌────────────────────────────────────────────────────────────────────────┐
│ PML4 (Page Map Level 4)                                                │
│ ┌─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┐                │
│ │  0  │  1  │  2  │ ... │ idx │ ... │ 509 │ 510 │ 511 │  ← 512 entries │
│ └─────┴─────┴─────┴─────┴──┬──┴─────┴─────┴─────┴─────┘                │
└────────────────────────────┼───────────────────────────────────────────┘
                             │ PML4[idx] contains pointer to PDPT
                             ▼
┌────────────────────────────────────────────────────────────────────────┐
│ PDPT (Page Directory Pointer Table)                                    │
│ ┌─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┐                │
│ │  0  │  1  │  2  │ ... │ idx │ ... │ 509 │ 510 │ 511 │  ← 512 entries │
│ └─────┴─────┴─────┴─────┴──┬──┴─────┴─────┴─────┴─────┘                │
└────────────────────────────┼───────────────────────────────────────────┘
                             │ PDPT[idx] contains pointer to PD
                             ▼
┌────────────────────────────────────────────────────────────────────────┐
│ PD (Page Directory)                                                    │
│ ┌─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┐                │
│ │  0  │  1  │  2  │ ... │ idx │ ... │ 509 │ 510 │ 511 │  ← 512 entries │
│ └─────┴─────┴─────┴─────┴──┬──┴─────┴─────┴─────┴─────┘                │
└────────────────────────────┼───────────────────────────────────────────┘
                             │ PD[idx] contains pointer to PT
                             ▼
┌────────────────────────────────────────────────────────────────────────┐
│ PT (Page Table)                                                        │
│ ┌─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┐                │
│ │  0  │  1  │  2  │ ... │ idx │ ... │ 509 │ 510 │ 511 │  ← 512 entries │
│ └─────┴─────┴─────┴─────┴──┬──┴─────┴─────┴─────┴─────┘                │
└────────────────────────────┼───────────────────────────────────────────┘
                             │ PT[idx] contains Physical Frame Number + flags
                             ▼
┌───────────────────────────────────────────────────────┐
│ Physical Frame + Offset                               │
│ ┌──────────────────────────┬───────────────┐          │
│ │ Physical Frame Number    │ Offset        │          │
│ │ (40 bits)                │ (12 bits)     │          │
│ └─────────────┬────────────┴───────────────┘          │
│               │                                       │
│               ▼                                       │
│ Physical Memory Address                               │
└───────────────────────────────────────────────────────┘
2.4 Page Table Entry (PTE) Format
Each page table entry contains critical information:
x86-64 Page Table Entry (64 bits):

 63  62    59 58    52 51        12 11   9  8   7   6   5   4   3   2   1   0
┌───┬────────┬────────┬────────────┬──────┬───┬───┬───┬───┬───┬───┬───┬───┬───┐
│NX │  PKey  │ Avail  │  Physical  │ Avl  │ G │PAT│ D │ A │PCD│PWT│U/S│R/W│ P │
│   │        │        │Frame Number│      │   │   │   │   │   │   │   │   │   │
└───┴────────┴────────┴────────────┴──────┴───┴───┴───┴───┴───┴───┴───┴───┴───┘

Key bits:
P (bit 0): Present - page is in physical memory
R/W (bit 1): Read/Write - 0=read-only, 1=read-write
U/S (bit 2): User/Supervisor - 0=kernel only, 1=user accessible
A (bit 5): Accessed - set by hardware when the page is read, written, or executed
D (bit 6): Dirty - set by hardware when page is written
PKey (bits 62-59): Protection key when that feature is enabled, otherwise available to the OS
NX (bit 63): No Execute - prevents code execution (security feature)
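To make the flag bits concrete, here is a small decoding sketch. It assumes the 4KB-page PTE layout described above; `PteInfo`, `pte_decode`, and `pte_allows_user` are hypothetical names. User-space code cannot read real PTEs, so this is only a model for reasoning about a raw 64-bit value, and it ignores the permission bits of the upper-level entries as well as features such as SMEP/SMAP and protection keys.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_P          (1ULL << 0)            /* present */
#define PTE_RW         (1ULL << 1)            /* writable */
#define PTE_US         (1ULL << 2)            /* user accessible */
#define PTE_A          (1ULL << 5)            /* accessed */
#define PTE_D          (1ULL << 6)            /* dirty */
#define PTE_NX         (1ULL << 63)           /* no-execute */
#define PTE_FRAME_MASK 0x000FFFFFFFFFF000ULL  /* bits 51..12 */

static_assert(sizeof(uint64_t) == 8, "a PTE is 64 bits wide");

typedef struct {
    bool present, writable, user, accessed, dirty, no_exec;
    uint64_t frame;   /* physical frame number (physical address >> 12) */
} PteInfo;

/* Decode the fields of a raw PTE value. */
PteInfo pte_decode(uint64_t pte) {
    PteInfo i;
    i.present  = (pte & PTE_P)  != 0;
    i.writable = (pte & PTE_RW) != 0;
    i.user     = (pte & PTE_US) != 0;
    i.accessed = (pte & PTE_A)  != 0;
    i.dirty    = (pte & PTE_D)  != 0;
    i.no_exec  = (pte & PTE_NX) != 0;
    i.frame    = (pte & PTE_FRAME_MASK) >> 12;
    return i;
}

/* Simplified check: would a user-mode read, write, or instruction fetch
 * be permitted by this entry alone? */
bool pte_allows_user(uint64_t pte, bool write, bool exec) {
    PteInfo i = pte_decode(pte);
    if (!i.present || !i.user) return false;
    if (write && !i.writable)  return false;
    if (exec && i.no_exec)     return false;
    return true;
}
```

For example, the value `0x8000000012345067` decodes to a present, writable, user-accessible, non-executable page in frame `0x12345`.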
2.5 The Translation Lookaside Buffer (TLB)
Walking the page table for every memory access would be catastrophically slow (4 memory accesses just to translate one address!). The TLB caches recent translations:
CPU Memory Access

┌───────────────────────┐
│    Virtual Address    │
│   0x7fff_abcd_1234    │
└───────────┬───────────┘
            ▼
┌───────────────────────┐   miss   ┌───────────────────────┐
│      TLB Lookup       ├─────────▶│    Page Table Walk    │
│     (1-2 cycles)      │          │     (~100 cycles)     │
│                       │◀─────────┤  Update TLB with the  │
│                       │  refill  │    new translation    │
└───────────┬───────────┘          └───────────────────────┘
            │ hit (or after refill)
            ▼
┌───────────────────────┐
│   Physical Address    │
│   0x0000_1234_5234    │
└───────────┬───────────┘
            ▼
┌───────────────────────┐
│  L1/L2/L3 Cache or    │
│  Main Memory Access   │
└───────────────────────┘
TLB Facts:
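- Typical size: roughly 64 entries in a first-level TLB, up to one or two thousand in a second-level TLB
- Associativity: small first-level TLBs may be fully associative; larger ones are usually set-associative
- Flushed on context switch (CR3 change) unless PCID is used
- TLB miss is expensive: a page table walk costs on the order of 100 cycles when the page table entries are not already cached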
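One way to connect these facts to locality is "TLB reach": the amount of memory the TLB can translate without a miss, which is simply entries times page size. The sketch below is illustrative arithmetic only; the function names are hypothetical and the entry counts you pass in are assumptions about a particular CPU.

```c
#include <assert.h>
#include <stdint.h>

/* TLB reach in bytes: how much memory can be mapped by a full TLB. */
uint64_t tlb_reach_bytes(uint64_t entries, uint64_t page_size) {
    return entries * page_size;
}

/* Number of distinct pages covered by the byte range [addr, addr + len). */
uint64_t pages_touched(uint64_t addr, uint64_t len, uint64_t page_size) {
    assert(page_size > 0);
    if (len == 0) return 0;
    uint64_t first = addr / page_size;
    uint64_t last  = (addr + len - 1) / page_size;
    return last - first + 1;
}
```

With 4KB pages a 64-entry TLB reaches only 256 KB, while the same 64 entries with 2 MB huge pages reach 128 MB. This is one reason a working set that strides across many pages can be slow even when it fits in cache.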
2.6 Page Faults and Fault Handling
A page fault occurs when the MMU cannot complete address translation, or when the access violates the permissions found in the page table entry:

Page Fault Types

1. MINOR FAULT (Soft Fault)
   ├── Page is in memory but PTE not set up
   ├── Example: First access to newly mapped region
   └── Resolution: Kernel updates PTE, no I/O needed

2. MAJOR FAULT (Hard Fault)
   ├── Page must be loaded from disk
   ├── Example: Swapped-out page, memory-mapped file
   └── Resolution: Kernel reads from disk (SLOW: milliseconds)

3. PROTECTION FAULT
   ├── Access violates page permissions
   ├── Examples:
   │   - Write to read-only page (R/W=0)
   │   - Execute data page (NX=1)
   │   - User access to kernel page (U/S=0)
   └── Resolution: Usually SIGSEGV (segmentation fault)

4. INVALID FAULT
   ├── Address not mapped at all (P=0, no backing)
   ├── Example: NULL pointer dereference, wild pointer
   └── Resolution: SIGSEGV
Page Fault Handling Flow:

┌──────────────────────────────────────────────────────────────┐
│                     Page Fault Exception                     │
│                  (CPU trap to kernel mode)                   │
└──────────────────────────────┬───────────────────────────────┘
                               ▼
┌──────────────────────────────────────────────────────────────┐
│ Kernel Page Fault Handler                                    │
│                                                              │
│ 1. Read CR2 register (contains faulting virtual address)     │
│ 2. Read error code from stack (indicates fault type)         │
│ 3. Find VMA (Virtual Memory Area) containing the address     │
│ 4. Check if access is legal for that VMA                     │
└──────────────────────────────┬───────────────────────────────┘
                               │
               ┌───────────────┴───────────────┐
               ▼                               ▼
         Legal Access?                  Illegal Access?
               │                               │
               ▼                               ▼
┌─────────────────────────────┐ ┌─────────────────────────────┐
│ Allocate physical page      │ │ Send SIGSEGV to process     │
│ (if needed)                 │ │ (Segmentation Fault)        │
│                             │ │                             │
│ Read from disk/swap         │ │ Default: terminate          │
│ (if needed)                 │ │ process with core dump      │
│                             │ │                             │
│ Update page table entry     │ └─────────────────────────────┘
│                             │
│ Return to user mode         │
│ (retry instruction)         │
└─────────────────────────────┘
2.7 Demand Paging
Pages are not loaded until actually accessed:
Program starts:                    First access to code page:
┌───────────────────────┐          ┌───────────────────────┐
│   Virtual Address     │          │   Virtual Address     │
│       Space           │          │       Space           │
│                       │          │                       │
│ ┌───────────────────┐ │          │ ┌───────────────────┐ │
│ │ .text (code)      │ │          │ │ .text (code)      │ │
│ │ Mapped but NOT    ├─┼──X       │ │ NOW IN MEMORY     ├─┼──▶ Physical Frame
│ │ in memory yet     │ │          │ │ (page fault       │ │
│ └───────────────────┘ │          │ │ triggered load)   │ │
│                       │          │ └───────────────────┘ │
└───────────────────────┘          └───────────────────────┘

Benefit: Fast startup, only load what's actually used
Cost: Page faults on first access to each page
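Residency can be observed directly with `mincore()`, which reports for each page of a range whether it is currently in RAM. The sketch below is a minimal illustration under the assumption of a Linux system with anonymous private mappings; `count_resident_pages` and `demand_paging_probe` are hypothetical helper names, not part of the project specification.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count pages in [addr, addr + len) that mincore() reports as resident.
 * addr must be page-aligned. Returns -1 on error. */
long count_resident_pages(void *addr, size_t len) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages = (len + page - 1) / page;
    unsigned char *vec = malloc(npages);
    if (!vec) return -1;
    if (mincore(addr, len, vec) != 0) { free(vec); return -1; }
    long resident = 0;
    for (size_t i = 0; i < npages; i++)
        resident += vec[i] & 1;          /* low bit set = page is in RAM */
    free(vec);
    return resident;
}

/* Map npages without touching them, then write one byte per page.
 * Stores the resident-page count before and after. Returns 0 on success. */
int demand_paging_probe(size_t npages, long *before, long *after) {
    assert(before != NULL && after != NULL);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * page;
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;
    *before = count_resident_pages(p, len);
    for (size_t i = 0; i < npages; i++)
        p[i * page] = 1;                 /* first write faults the page in */
    *after = count_resident_pages(p, len);
    munmap(p, len);
    return (*before < 0 || *after < 0) ? -1 : 0;
}
```

On a typical Linux system the "before" count is expected to be 0 and the "after" count equal to the number of pages, although exact numbers can vary with kernel configuration.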
2.8 Memory-Mapped Files (mmap)
mmap() creates a direct mapping between virtual addresses and file contents:
Memory-Mapped File (mmap)

File on disk:                          Process virtual memory:
┌────────────────────┐                 ┌────────────────────────────┐
│ data.bin           │                 │                            │
│  ┌────────────┐    │                 │  ┌────────────────────┐    │
│  │ Page 0     ├────┼─────────────────┼─▶│ VA: 0x7f1234000000 │    │
│  ├────────────┤    │                 │  ├────────────────────┤    │
│  │ Page 1     ├────┼─────────────────┼─▶│ VA: 0x7f1234001000 │    │
│  ├────────────┤    │                 │  ├────────────────────┤    │
│  │ Page 2     ├────┼─────────────────┼─▶│ VA: 0x7f1234002000 │    │
│  ├────────────┤    │                 │  ├────────────────────┤    │
│  │ ...        │    │                 │  │ ...                │    │
│  └────────────┘    │                 │  └────────────────────┘    │
└────────────────────┘                 └────────────────────────────┘

// Usage:
int fd = open("data.bin", O_RDWR);
char *ptr = mmap(NULL, file_size, PROT_READ | PROT_WRITE,
                 MAP_SHARED, fd, 0);
// Check ptr against MAP_FAILED before using it
// Now ptr[i] accesses file byte i; the first touch of each page faults it in
// With MAP_SHARED, changes to memory are written back to the file
mmap Modes:
| Flag | Behavior |
|---|---|
| MAP_PRIVATE | Copy-on-write: writes don't affect file |
| MAP_SHARED | Writes are visible to other processes and saved to file |
| MAP_ANONYMOUS | No file backing, just allocate memory (like malloc) |
| MAP_FIXED | Map at exact address specified (dangerous: silently replaces any existing mapping there) |
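The difference between MAP_SHARED and MAP_PRIVATE can be checked with a scratch file. This is a minimal sketch, assuming a writable `/tmp`; the function name `mapping_write_reaches_file` and the scratch file name are made up for the example.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Write "hello" through a file mapping created with the given flag
 * (MAP_SHARED or MAP_PRIVATE), then read the file back with pread().
 * Returns 1 if the write reached the file, 0 if not, -1 on setup error.
 */
int mapping_write_reaches_file(int map_flag) {
    assert(map_flag == MAP_SHARED || map_flag == MAP_PRIVATE);
    char path[] = "/tmp/vmmap_demo_XXXXXX";  /* hypothetical scratch file */
    int fd = mkstemp(path);
    if (fd < 0) return -1;
    unlink(path);                            /* file vanishes once fd is closed */

    const size_t len = 4096;
    int result = -1;
    if (ftruncate(fd, (off_t)len) == 0) {
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, map_flag, fd, 0);
        if (p != MAP_FAILED) {
            memcpy(p, "hello", 5);           /* first touch faults the page in */
            msync(p, len, MS_SYNC);          /* flush dirty shared pages */
            char buf[5] = {0};
            if (pread(fd, buf, sizeof buf, 0) == (ssize_t)sizeof buf)
                result = (memcmp(buf, "hello", 5) == 0);
            munmap(p, len);
        }
    }
    close(fd);
    return result;
}
```

With MAP_SHARED the bytes written through the pointer show up when the file is read back. With MAP_PRIVATE the write lands in a private copy of the page and the file still contains zeros.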
2.9 Copy-on-Write (COW)
When fork() creates a child process, it doesn't immediately copy all memory:
Before fork():

Parent Process
┌─────────────────────────┐          Physical Memory
│ Page Table              │          ┌─────────────┐
│ ┌─────────────────────┐ │          │             │
│ │ VA 0x1000  (R/W)    ├─┼─────────▶│   Frame A   │
│ └─────────────────────┘ │          │   (data)    │
└─────────────────────────┘          └─────────────┘

After fork() (before any writes):

Parent Process
┌─────────────────────────┐          Physical Memory
│ Page Table              │          ┌─────────────┐
│ ┌─────────────────────┐ │          │             │
│ │ VA 0x1000  (R)      ├─┼─────────▶│   Frame A   │  SHARED!
│ └─────────────────────┘ │    ┌────▶│   (data)    │  (read-only
└─────────────────────────┘    │     └─────────────┘   in both)
                               │
Child Process                  │
┌─────────────────────────┐    │
│ Page Table              │    │
│ ┌─────────────────────┐ │    │
│ │ VA 0x1000  (R)      ├─┼────┘
│ └─────────────────────┘ │
└─────────────────────────┘

(Both entries were R/W before fork(); both are now R.)

After child writes to the page:

Parent Process
┌─────────────────────────┐          Physical Memory
│ Page Table              │          ┌─────────────┐
│ ┌─────────────────────┐ │          │   Frame A   │
│ │ VA 0x1000  (R/W)    ├─┼─────────▶│  (parent's  │
│ └─────────────────────┘ │          │    data)    │
└─────────────────────────┘          └─────────────┘

Child Process
┌─────────────────────────┐          ┌─────────────┐
│ Page Table              │          │   Frame B   │  COPY made
│ ┌─────────────────────┐ │          │  (child's   │  on write!
│ │ VA 0x1000  (R/W)    ├─┼─────────▶│    data)    │
│ └─────────────────────┘ │          └─────────────┘
└─────────────────────────┘

COW Benefits:
- fork() is fast (just copy page tables, not data)
- If child calls exec() immediately, no data was copied unnecessarily
- Pages that are never written, such as shared library code, are never copied
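The visible effect of copy-on-write is that a write in the child never changes what the parent sees. The sketch below demonstrates only that semantic guarantee; it does not prove that the physical page was shared before the write (observing that would need something like /proc/[pid]/pagemap). The function name is hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Parent stores 0xDEADBEEF in a private page, forks, and the child
 * overwrites it with 0xCAFEBABE. Returns the value the parent sees
 * afterwards, or 0 if anything went wrong.
 */
uint32_t cow_parent_value_after_child_write(void) {
    uint32_t *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return 0;
    *p = 0xDEADBEEF;

    pid_t pid = fork();
    if (pid < 0) { munmap(p, 4096); return 0; }
    if (pid == 0) {
        *p = 0xCAFEBABE;                 /* write fault: kernel copies the page */
        _exit(*p == 0xCAFEBABE ? 0 : 1); /* child sees its own copy */
    }

    int status = 0;
    pid_t waited = waitpid(pid, &status, 0);
    assert(waited == pid);
    uint32_t seen = 0;
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
        seen = *p;                       /* parent's page is untouched */
    munmap(p, 4096);
    return seen;
}
```

The parent still reads `0xDEADBEEF` after the child has exited, because the child's write went to a separate physical frame.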
2.10 Linux Process Memory Layout
Linux x86-64 Process Address Space

0xFFFFFFFFFFFFFFFF ┌──────────────────────────────────────────────┐
                   │ Kernel Space                                 │
                   │ (not accessible to user)                     │
0xFFFF800000000000 ├──────────────────────────────────────────────┤
                   │ Non-canonical hole                           │
                   │ (addresses cause #GP fault)                  │
0x00007FFFFFFFFFFF ├──────────────────────────────────────────────┤
                   │ Stack ([stack] in /proc/maps)                │
~0x7FFF...         │    Command-line args, environment            │
(grows down)       │    Local variables, return addrs             │
                   │    Stack frames grow downward                │
                   │                                              │
                   │ (unmapped guard pages)                       │
                   ├──────────────────────────────────────────────┤
~0x7F...           │ Memory-Mapped Region                         │
                   │    Shared libraries: libc.so, libm.so, etc.  │
                   │    mmap() allocations                        │
                   │    File mappings                             │
                   │    (grows down toward heap)                  │
                   ├──────────────────────────────────────────────┤
(grows up)         │ Heap ([heap] in /proc/maps)                  │
                   │    malloc/free managed memory                │
                   │    Dynamic allocations                       │
                   │    brk()/sbrk() region                       │
                   ├──────────────────────────────────────────────┤
                   │ BSS                                          │
                   │    Uninitialized global/static variables     │
                   │    (zeroed by kernel at load time)           │
                   ├──────────────────────────────────────────────┤
                   │ Data                                         │
                   │    Initialized global/static variables       │
                   │    Copied from executable at load time       │
                   ├──────────────────────────────────────────────┤
                   │ Read-Only Data                               │
                   │    String literals, const globals            │
                   ├──────────────────────────────────────────────┤
                   │ Text (Code)                                  │
                   │    Executable instructions                   │
                   │    Read-only, executable                     │
~0x400000          ├──────────────────────────────────────────────┤
                   │ (unmapped, catches NULL derefs)              │
0x0000000000000000 └──────────────────────────────────────────────┘
2.11 Understanding /proc/[pid]/maps
The /proc/[pid]/maps file shows the memory map of a process:
$ cat /proc/self/maps
Address Range Perms Offset Dev Inode Pathname
─────────────────────────────────────────────────────────────────────────────
55a8e4000000-55a8e4001000 r--p 00000000 08:01 1234567 /usr/bin/cat
55a8e4001000-55a8e4005000 r-xp 00001000 08:01 1234567 /usr/bin/cat
55a8e4005000-55a8e4007000 r--p 00005000 08:01 1234567 /usr/bin/cat
55a8e4007000-55a8e4008000 r--p 00006000 08:01 1234567 /usr/bin/cat
55a8e4008000-55a8e4009000 rw-p 00007000 08:01 1234567 /usr/bin/cat
55a8e5000000-55a8e5021000 rw-p 00000000 00:00 0 [heap]
7f1234000000-7f1234022000 r--p 00000000 08:01 2345678 /lib/x86_64-linux.../libc.so.6
7f1234022000-7f12341b7000 r-xp 00022000 08:01 2345678 /lib/x86_64-linux.../libc.so.6
7f12341b7000-7f123420f000 r--p 001b7000 08:01 2345678 /lib/x86_64-linux.../libc.so.6
7f123420f000-7f1234213000 r--p 0020e000 08:01 2345678 /lib/x86_64-linux.../libc.so.6
7f1234213000-7f1234215000 rw-p 00212000 08:01 2345678 /lib/x86_64-linux.../libc.so.6
7f1234400000-7f1234401000 r--p 00000000 08:01 3456789 /lib/x86_64-linux.../ld-linux-x86-64.so.2
7fff12340000-7fff12361000 rw-p 00000000 00:00 0 [stack]
7fff12380000-7fff12384000 r--p 00000000 00:00 0 [vvar]
7fff12384000-7fff12386000 r-xp 00000000 00:00 0 [vdso]
Field explanation:
─────────────────────────────────────────────────────────────────────────────
Address Range: Start-End virtual addresses (hex)
Permissions: r = read, w = write, x = execute, p = private, s = shared
Offset: Offset into the mapped file (0 for anonymous)
Device: Major:Minor device numbers (00:00 for anonymous)
Inode: Inode number of mapped file (0 for anonymous)
Pathname: File path, [heap], [stack], [vdso], or empty for anon mmap
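The field list maps naturally onto a single `sscanf()` call. The following is a sketch of one possible approach, not the parser this guide builds later; the `MapsEntry` type and `parse_maps_entry` function are hypothetical names chosen for the example. The pathname is taken as the remainder of the line because it may be empty or contain spaces.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    uint64_t start, end, offset, inode;
    unsigned dev_major, dev_minor;   /* printed in hex in the maps file */
    char perms[5];                   /* e.g. "r-xp" */
    char pathname[256];              /* truncated if longer */
} MapsEntry;

/* Parse one line of /proc/[pid]/maps. Returns 0 on success, -1 on mismatch. */
int parse_maps_entry(const char *line, MapsEntry *e) {
    assert(line != NULL && e != NULL);
    int consumed = 0;
    memset(e, 0, sizeof *e);
    int n = sscanf(line,
                   "%" SCNx64 "-%" SCNx64 " %4s %" SCNx64 " %x:%x %" SCNu64 " %n",
                   &e->start, &e->end, e->perms, &e->offset,
                   &e->dev_major, &e->dev_minor, &e->inode, &consumed);
    if (n < 7) return -1;            /* %n is not counted in the result */

    /* Whatever follows the inode is the pathname (possibly empty). */
    const char *path = consumed > 0 ? line + consumed : "";
    size_t len = strcspn(path, "\n");
    if (len >= sizeof e->pathname) len = sizeof e->pathname - 1;
    memcpy(e->pathname, path, len);
    e->pathname[len] = '\0';
    return 0;
}
```

A caller would typically read the file line by line with `fgets()` and pass each line to this function, then compute the region size as `end - start`.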
2.12 Address Space Layout Randomization (ASLR)
ASLR randomizes memory layout to make exploits harder:
Without ASLR (predictable): With ASLR (randomized):
Run 1: Run 1:
Stack: 0x7fffffffe000 Stack: 0x7ffc3a2fe000
Heap: 0x555555756000 Heap: 0x5612a8756000
libc: 0x7ffff7c00000 libc: 0x7f2d3fc00000
Run 2: Run 2:
Stack: 0x7fffffffe000 (same!) Stack: 0x7ffcd12fe000 (different!)
Heap: 0x555555756000 (same!) Heap: 0x55f8b9756000 (different!)
libc: 0x7ffff7c00000 (same!) libc: 0x7f1a2bc00000 (different!)
Check/control ASLR:
# Check current setting
cat /proc/sys/kernel/randomize_va_space
# 0 = disabled, 1 = stack+mmap only, 2 = full (heap too)
# Disable for debugging (requires root)
echo 0 | sudo tee /proc/sys/kernel/randomize_va_space
# Disable for single command
setarch $(uname -m) -R ./your_program
2.13 Memory Protection Bits
Memory Protection Summary

| Permission | Description | Example |
|---|---|---|
| r (Read) | Contents can be read | .rodata, .text, shared libs |
| w (Write) | Contents can be modified | .data, .bss, heap, stack |
| x (Execute) | Contents can be executed as code; the NX bit prevents execution when x is missing | .text, [vdso] |
| p (Private) | Copy-on-write: writes are private to this process | Most mappings, private mmap() |
| s (Shared) | Writes visible to other processes mapping same file | MAP_SHARED mmap(), shared memory |

Common combinations:
r--p : Read-only data (.rodata, read-only file mapping)
r-xp : Executable code (.text section)
rw-p : Writable data (.data, .bss, heap, stack)
---p : Guard page (catches overflows)
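A tool that explains regions needs to turn the four-character permission string into something it can reason about. Below is a minimal sketch that maps the string onto the `PROT_*` constants used by `mmap()` and `mprotect()`; `perms_to_prot` and `describe_perms` are illustrative names, and the descriptions simply restate the common combinations listed above.

```c
#include <assert.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Convert a maps-style permission string ("rwxp", "r--s", ...) into
 * PROT_* bits. Sets *is_shared to 1 for 's' and 0 for 'p'.
 * Returns -1 if the string is malformed.
 */
int perms_to_prot(const char *perms, int *is_shared) {
    assert(perms != NULL && is_shared != NULL);
    static const char letters[3] = { 'r', 'w', 'x' };
    static const int  bits[3]    = { PROT_READ, PROT_WRITE, PROT_EXEC };
    if (strlen(perms) != 4) return -1;

    int prot = PROT_NONE;
    for (int i = 0; i < 3; i++) {
        if (perms[i] == letters[i])  prot |= bits[i];
        else if (perms[i] != '-')    return -1;
    }
    if (perms[3] == 's')      *is_shared = 1;
    else if (perms[3] == 'p') *is_shared = 0;
    else                      return -1;
    return prot;
}

/* Short description for the common combinations. */
const char *describe_perms(const char *perms) {
    if (strcmp(perms, "r--p") == 0) return "read-only data";
    if (strcmp(perms, "r-xp") == 0) return "executable code";
    if (strcmp(perms, "rw-p") == 0) return "writable data";
    if (strcmp(perms, "---p") == 0) return "guard page";
    return "other";
}
```

The returned bits can be compared against the kind of access being explained, for example reporting that a write would fault when `PROT_WRITE` is absent.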
3. Project Specification
3.1 What You Will Build
A command-line tool that:
- Reports a process's virtual memory layout with detailed explanations
- Demonstrates demand paging with controlled experiments
- Triggers and explains protection faults
- Shows copy-on-write behavior after fork()
3.2 Functional Requirements
- Memory Map Display (--map <pid> or --self):
  - Parse and display /proc/[pid]/maps
  - Categorize regions (text, data, heap, stack, libraries, etc.)
  - Show size of each region
  - Calculate total virtual vs resident memory
- Region Details (--detail <address>):
  - Identify which region contains an address
  - Show permissions, backing file, and offset
  - Explain what would happen on read/write/execute
- Demand Paging Demo (--demand):
  - Allocate memory without touching it
  - Show maps before and after first access
  - Measure and display page fault counts
- Protection Fault Demo (--fault <type>):
  - Types: null, stack-overflow, write-readonly, execute-data
  - Controlled crash with explanation
  - Show faulting address and why it faulted
- Copy-on-Write Demo (--cow):
  - Fork and show shared pages
  - Write in child and show page copy
  - Display before/after memory usage
- Page Fault Counter (--faults <command>):
  - Run a command and report minor/major page faults
  - Use /proc/[pid]/stat or getrusage()
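For the page fault counter, one option is to let the kernel hand back the child's resource usage when it is reaped. The sketch below uses `wait4()`, which fills in a `struct rusage` for exactly the child that was waited on; this is one possible design rather than the required one, and `FaultReport` and `run_and_count_faults` are hypothetical names.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef struct {
    long minor_faults;   /* satisfied without disk I/O */
    long major_faults;   /* required disk I/O */
    int  exit_status;    /* exit code, or -1 if killed by a signal */
} FaultReport;

/* Run argv[0] with arguments argv and report its page faults.
 * Returns 0 on success, -1 if fork or wait failed. */
int run_and_count_faults(char *const argv[], FaultReport *out) {
    assert(argv != NULL && argv[0] != NULL && out != NULL);
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                      /* exec failed */
    }
    int status = 0;
    struct rusage ru;
    if (wait4(pid, &status, 0, &ru) < 0) return -1;
    out->minor_faults = ru.ru_minflt;
    out->major_faults = ru.ru_majflt;
    out->exit_status  = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    return 0;
}
```

Compared with polling /proc/[pid]/stat, this avoids racing against a short-lived child, at the cost of only reporting totals after the command has finished.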
3.3 Non-Functional Requirements
- Safety: Controlled faults don't corrupt system state
- Portability: Works on Linux x86-64 (macOS stretch goal)
- Educational: Output explains "why" not just "what"
- Accurate: Region identification matches kernel's view
3.4 Example Output
$ ./vmmap --self
=== VIRTUAL MEMORY MAP (PID 12345) ===
SUMMARY:
Total Virtual: 140.25 GB (theoretical maximum)
Mapped Regions: 47.3 MB (actual allocations)
Resident: 8.2 MB (currently in RAM)
Shared: 4.1 MB (shared with other processes)
EXECUTABLE (/usr/bin/vmmap):
Address Range Size Perms Description
──────────────────────────────────────────────────────────────
0x5555555ff000-0x555555600000 4 KB r--p ELF headers
0x555555600000-0x555555612000 72 KB r-xp .text (code)
0x555555612000-0x555555618000 24 KB r--p .rodata
0x555555618000-0x55555561a000 8 KB rw-p .data + .bss
HEAP:
0x555555800000-0x555555900000 1 MB rw-p [heap]
└── brk() managed, grows upward
SHARED LIBRARIES:
libc.so.6:
0x7ffff7c00000-0x7ffff7c22000 136 KB r--p .rodata
0x7ffff7c22000-0x7ffff7db7000 1.6 MB r-xp .text
0x7ffff7db7000-0x7ffff7e0f000 352 KB r--p .rodata
0x7ffff7e0f000-0x7ffff7e15000 24 KB rw-p .data
ld-linux-x86-64.so.2:
0x7ffff7fc0000-0x7ffff7fc4000 16 KB r--p .rodata
0x7ffff7fc4000-0x7ffff7fef000 172 KB r-xp .text
0x7ffff7fef000-0x7ffff7ffc000 52 KB r--p .rodata
0x7ffff7ffc000-0x7ffff7fff000 12 KB rw-p .data
STACK:
0x7ffffffde000-0x7ffffffff000 132 KB rw-p [stack]
└── Grows downward, guard pages below
SPECIAL:
0x7ffff7fbc000-0x7ffff7fbe000 8 KB r--p [vvar] (kernel variables)
0x7ffff7fbe000-0x7ffff7fc0000 8 KB r-xp [vdso] (virtual syscalls)
ANONYMOUS (mmap'd):
0x7ffff7800000-0x7ffff7900000 1 MB rw-p (no backing file)
=== END MAP ===
$ ./vmmap --demand
=== DEMAND PAGING DEMONSTRATION ===
Step 1: Allocating 100 MB with mmap (no MAP_POPULATE)
Virtual address: 0x7f1234000000
Size: 100 MB (25600 pages)
Step 2: Checking /proc/self/statm
Before touching:
Virtual size: 150 MB
Resident (RSS): 8 MB <-- Allocated memory NOT in RSS!
Page faults: 1234
Step 3: Touching every page (writing first byte)
Progress: [========================================] 100%
Step 4: Checking again
After touching:
Virtual size: 150 MB
Resident (RSS): 108 MB <-- NOW it's resident!
Page faults: 26834 <-- 25600 new minor faults
LESSON: Memory is not loaded until first access.
The first touch of each page triggers a minor page fault.
This is "demand paging" - the kernel loads on demand.
=== END DEMO ===
$ ./vmmap --fault write-readonly
=== PROTECTION FAULT DEMONSTRATION ===
Setting up: Mapping 4KB read-only page at 0x7f1234000000
Permissions: r--p (read only, no write, no execute)
Attempting to write to 0x7f1234000000...
*** SIGSEGV received! ***
Fault analysis:
Faulting address: 0x7f1234000000
Signal: SIGSEGV (Segmentation Fault)
Code: SEGV_ACCERR (Invalid permissions for mapped object)
The page exists (no SEGV_MAPERR), but write permission is denied.
Page table entry for this address:
Present: YES
Read: YES
Write: NO <-- This caused the fault!
Execute: NO
To fix: Use mprotect() to add PROT_WRITE, or map with PROT_WRITE initially.
=== END DEMO ===
$ ./vmmap --cow
=== COPY-ON-WRITE DEMONSTRATION ===
Step 1: Parent process (PID 12345) allocating 10 MB private data
Address: 0x7f1234000000
Initial value at page 0: 0xDEADBEEF
Step 2: Checking memory usage before fork()
Parent RSS: 18.5 MB
Step 3: Forking child process (PID 12346)
Step 4: Memory usage immediately after fork()
Parent RSS: 18.5 MB
Child RSS: 18.5 MB
Combined: 18.5 MB <-- NOT 37 MB! Pages are SHARED.
Both processes see:
0x7f1234000000 -> Physical frame 0x1a3b4000 (SHARED, read-only)
Step 5: Child writing to page 0...
Child writes 0xCAFEBABE to 0x7f1234000000
Step 6: Memory after write
Parent RSS: 18.5 MB
Child RSS: 18.5 MB (but 4KB is now private)
Parent sees: 0x7f1234000000 = 0xDEADBEEF (original)
Child sees: 0x7f1234000000 = 0xCAFEBABE (copied)
After COW trigger:
Parent: 0x7f1234000000 -> Physical frame 0x1a3b4000 (now exclusive)
Child: 0x7f1234000000 -> Physical frame 0x2c5d6000 (new copy)
LESSON: fork() doesn't copy data immediately.
Pages are shared until one process writes.
Then the kernel copies just that page.
This makes fork() fast!
=== END DEMO ===
4. Solution Architecture
4.1 High-Level Design
┌──────────────────────────────────────────────────────────────────────────┐
│                             vmmap (CLI Tool)                             │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  │
│  │    Maps      │  │   Demand     │  │    Fault     │  │     COW      │  │
│  │   Parser     │  │    Demo      │  │    Demo      │  │    Demo      │  │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘  │
│         │                 │                 │                 │          │
│         ▼                 ▼                 ▼                 ▼          │
│  ┌────────────────────────────────────────────────────────────────────┐  │
│  │                        Memory Analysis Core                        │  │
│  │   ┌────────────┐  ┌────────────┐  ┌────────────┐  ┌────────────┐   │  │
│  │   │   /proc    │  │   mmap/    │  │   Signal   │  │   fork/    │   │  │
│  │   │   Reader   │  │  mprotect  │  │  Handler   │  │    wait    │   │  │
│  │   └────────────┘  └────────────┘  └────────────┘  └────────────┘   │  │
│  └─────────────────────────────────┬──────────────────────────────────┘  │
│                                    │                                     │
│                                    ▼                                     │
│  ┌────────────────────────────────────────────────────────────────────┐  │
│  │                          Report Generator                          │  │
│  │  - Formatted text output                                           │  │
│  │  - Region categorization                                           │  │
│  │  - Educational explanations                                        │  │
│  └────────────────────────────────────────────────────────────────────┘  │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘
4.2 Key Components
| Component | Responsibility | Key Functions |
|---|---|---|
| Maps Parser | Parse /proc/[pid]/maps | parse_maps(), categorize_region() |
| Memory Stats | Read /proc/[pid]/statm, /proc/[pid]/status | get_memory_stats(), get_page_faults() |
| Demand Demo | mmap without touch, measure faults | demo_demand_paging() |
| Fault Demo | Trigger controlled protection violations | demo_fault(), install_handler() |
| COW Demo | fork(), modify, observe sharing | demo_cow() |
| Report Generator | Format and explain output | print_map(), explain_region() |
4.3 Data Structures
#include <stddef.h>   // size_t
#include <stdint.h>   // uint64_t

// Region types for categorization
// (declared first so MemoryRegion can use the typedef)
typedef enum {
    REGION_TEXT,        // Executable code
    REGION_RODATA,      // Read-only data
    REGION_DATA,        // Writable data
    REGION_BSS,         // Uninitialized data
    REGION_HEAP,        // [heap]
    REGION_STACK,       // [stack]
    REGION_LIBRARY,     // Shared library
    REGION_VDSO,        // [vdso]
    REGION_VVAR,        // [vvar]
    REGION_ANONYMOUS,   // Anonymous mmap
    REGION_MMAP_FILE,   // Memory-mapped file
    REGION_GUARD,       // Guard page (---p)
    REGION_UNKNOWN
} RegionType;

// Memory region from /proc/[pid]/maps
typedef struct {
    uint64_t start;
    uint64_t end;
    char perms[5];            // "rwxp" or "rwxs"
    uint64_t offset;
    unsigned int dev_major;   // Printed in hex; values can exceed 255
    unsigned int dev_minor;
    uint64_t inode;
    char pathname[256];       // Longer paths must be truncated safely
    // Derived fields
    size_t size;
    RegionType type;          // TEXT, DATA, HEAP, STACK, LIBRARY, ANONYMOUS, SPECIAL
    int is_readable;
    int is_writable;
    int is_executable;
    int is_private;
} MemoryRegion;

// Complete memory map
typedef struct {
    MemoryRegion *regions;
    size_t count;
    size_t capacity;
    // Summary statistics
    uint64_t total_virtual;
    uint64_t total_resident;
    uint64_t total_shared;
    uint64_t total_private;
    // Parsed from /proc/[pid]/status
    unsigned long vm_peak;
    unsigned long vm_size;
    unsigned long vm_rss;
    unsigned long vm_data;
    unsigned long vm_stack;
    unsigned long vm_exe;
    unsigned long vm_lib;
} MemoryMap;

// Page fault statistics
typedef struct {
    unsigned long minor_faults;   // Satisfied from memory
    unsigned long major_faults;   // Required disk I/O
} PageFaultStats;

// Fault demonstration result
typedef struct {
    uint64_t fault_address;
    int signal_received;
    int signal_code;
    const char *explanation;
} FaultResult;
4.4 Algorithm Overview
Maps Parsing Algorithm:
1. Open /proc/[pid]/maps
2. For each line:
   a. Parse address range (sscanf with %lx-%lx)
   b. Parse permissions string
   c. Parse offset, device, inode
   d. Parse pathname (may be empty)
3. Categorize each region:
   - If pathname is [heap] -> HEAP
   - If pathname is [stack] -> STACK
   - If pathname contains ".so" (for example libc.so.6) -> LIBRARY
   - If r-xp and pathname is the executable -> TEXT
   - etc.
4. Build summary statistics
5. Return MemoryMap structure
Demand Paging Demo Algorithm:
1. Read initial page fault count from /proc/self/stat
2. Allocate large region with mmap(MAP_PRIVATE | MAP_ANONYMOUS)
3. Verify region appears in /proc/self/maps
4. Read page fault count (should be essentially unchanged)
5. Touch each page by writing its first byte (a read only maps the shared zero page and does not grow RSS)
6. Read page fault count again
7. Report: faults_after - faults_before is approximately the number of pages touched (fewer if transparent huge pages are in use)
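The steps above can be sketched with `getrusage()` instead of parsing /proc/self/stat, which the specification also allows. This is a minimal illustration, with hypothetical function names, that returns the minor-fault delta caused by writing one byte to each page of a fresh anonymous mapping.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Minor faults charged to this process so far, or -1 on error. */
long minor_faults_now(void) {
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0) return -1;
    return ru.ru_minflt;
}

/*
 * Map npages of anonymous memory, write one byte per page, and return
 * how many minor faults that caused. Returns -1 on error.
 */
long faults_from_touching(size_t npages) {
    assert(npages > 0);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * page;
    /* volatile keeps the compiler from eliding the stores */
    volatile unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;

    long before = minor_faults_now();
    for (size_t i = 0; i < npages; i++)
        p[i * page] = 1;                 /* first write to each page faults */
    long after = minor_faults_now();

    munmap((void *)p, len);
    return (before < 0 || after < 0) ? -1 : after - before;
}
```

On a typical Linux system the result is close to `npages`. It can be lower when transparent huge pages back the region, and some sandboxed environments do not report fault counts at all.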
Protection Fault Demo Algorithm:
1. Install SIGSEGV handler with sigaction() (use SA_SIGINFO so the handler receives a siginfo_t)
2. Use sigsetjmp() to establish recovery point
3. Based on fault type:
   - null: dereference NULL pointer
   - write-readonly: mmap() with PROT_READ, attempt write
   - execute-data: mmap() with PROT_READ|PROT_WRITE, call as function
   - stack-overflow: recurse deeply or alloca() large amount (the handler must run on an alternate stack set up with sigaltstack() and SA_ONSTACK)
4. In handler:
   - Record fault address from siginfo_t (si_addr)
   - siglongjmp() back to recovery point
5. Explain why fault occurred based on region permissions
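A sketch of the write-readonly case following these steps is shown below. It is an illustration under the assumption of Linux with glibc, not the guide's final implementation; `probe_write_readonly` and the file-scope variables are hypothetical names. The other fault types follow the same pattern with a different triggering access.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>

static sigjmp_buf recover_point;
static volatile sig_atomic_t fault_code;
static void *volatile fault_addr;

/* Record what the kernel told us about the fault, then jump back. */
static void on_segv(int sig, siginfo_t *info, void *ctx) {
    (void)sig; (void)ctx;
    fault_code = info->si_code;
    fault_addr = info->si_addr;
    siglongjmp(recover_point, 1);
}

/*
 * Map one read-only page, try to write to it, and return the si_code
 * of the resulting SIGSEGV (SEGV_ACCERR is expected). Returns -1 if
 * setup failed or no fault occurred. *addr_out receives the fault address.
 */
int probe_write_readonly(void **addr_out) {
    assert(addr_out != NULL);
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0) return -1;

    volatile char *page = mmap(NULL, 4096, PROT_READ,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED) { sigaction(SIGSEGV, &old, NULL); return -1; }

    int result = -1;
    if (sigsetjmp(recover_point, 1) == 0) {
        page[0] = 'X';                   /* faults: page has no write permission */
    } else {
        result = fault_code;             /* reached via siglongjmp() */
        *addr_out = fault_addr;
    }
    munmap((void *)page, 4096);
    sigaction(SIGSEGV, &old, NULL);      /* restore previous handler */
    return result;
}
```

Passing 1 as the second argument to `sigsetjmp()` saves the signal mask, so SIGSEGV is unblocked again after the jump and a later demo can fault a second time.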
5. Implementation Guide
5.1 Development Environment Setup
# Required packages (Debian/Ubuntu)
sudo apt-get install build-essential gcc gdb linux-headers-$(uname -r)
# Project structure
mkdir -p vmmap/{src,include,tests,demos}
cd vmmap
# Create Makefile
cat > Makefile << 'EOF'
CC = gcc
# -D_GNU_SOURCE: strict -std=c11 hides sigaction, sigsetjmp, MAP_ANONYMOUS, etc.
# -Iinclude: sources in src/ include "vmmap.h" from include/
CFLAGS = -Wall -Wextra -g -O2 -std=c11 -D_GNU_SOURCE -Iinclude
LDFLAGS =
SRCS = src/main.c src/maps_parser.c src/memory_stats.c \
       src/demand_demo.c src/fault_demo.c src/cow_demo.c \
       src/report.c src/util.c
OBJS = $(SRCS:.c=.o)
TARGET = vmmap
all: $(TARGET)
# Recipe lines below must start with a TAB character, not spaces
$(TARGET): $(OBJS)
	$(CC) $(LDFLAGS) -o $@ $^
%.o: %.c
	$(CC) $(CFLAGS) -c -o $@ $<
clean:
	rm -f $(OBJS) $(TARGET)
test: $(TARGET)
	./$(TARGET) --self
	./$(TARGET) --demand
	./$(TARGET) --fault null
	./$(TARGET) --cow
.PHONY: all clean test
EOF
5.2 Project Structure
vmmap/
├── src/
│   ├── main.c            # CLI argument parsing, dispatch
│   ├── maps_parser.c     # Parse /proc/[pid]/maps
│   ├── memory_stats.c    # Read /proc stats (RSS, page faults)
│   ├── demand_demo.c     # Demand paging demonstration
│   ├── fault_demo.c      # Protection fault demonstrations
│   ├── cow_demo.c        # Copy-on-write demonstration
│   ├── report.c          # Output formatting
│   └── util.c            # Helpers (size formatting, etc.)
├── include/
│   ├── vmmap.h           # Main header, data structures
│   └── util.h            # Utility declarations
├── tests/
│   ├── test_parser.c     # Unit tests for maps parser
│   ├── test_regions.c    # Test region categorization
│   └── test_demos.c      # Integration tests
├── demos/
│   ├── simple_program.c  # Simple program for testing
│   └── heap_heavy.c      # Program with lots of heap
├── Makefile
└── README.md
5.3 Implementation Phases
Phase 1: Maps Parser (Days 1-3)
Goals:
- Parse /proc/[pid]/maps completely
- Categorize regions correctly
- Display basic map output
Key Code - maps_parser.c:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <sys/types.h> // pid_t
#include "vmmap.h"
// Parse a single line from /proc/[pid]/maps
static int parse_maps_line(const char *line, MemoryRegion *region) {
    char perms[5] = {0};
    char pathname[256] = {0};
    // Start from a zeroed record so the strncpy() calls below always leave
    // NUL-terminated strings, even in memory that came from realloc()
    memset(region, 0, sizeof(*region));
    // Format: start-end perms offset dev inode pathname
    // %hhx stores into unsigned char, so device majors above 0xff are truncated
    int fields = sscanf(line, "%lx-%lx %4s %lx %hhx:%hhx %lu %255[^\n]",
                        &region->start, &region->end,
                        perms,
                        &region->offset,
                        &region->dev_major, &region->dev_minor,
                        &region->inode,
                        pathname);
    if (fields < 7) return -1; // pathname is optional
    strncpy(region->perms, perms, 4);
    strncpy(region->pathname, pathname, sizeof(region->pathname) - 1);
    // Parse permission bits
    region->is_readable = (perms[0] == 'r');
    region->is_writable = (perms[1] == 'w');
    region->is_executable = (perms[2] == 'x');
    region->is_private = (perms[3] == 'p');
    region->size = region->end - region->start;
    region->type = categorize_region(region);
    return 0;
}
// Determine region type based on permissions and pathname
RegionType categorize_region(const MemoryRegion *region) {
const char *path = region->pathname;
// Special kernel mappings
if (strcmp(path, "[heap]") == 0) return REGION_HEAP;
if (strcmp(path, "[stack]") == 0) return REGION_STACK;
if (strcmp(path, "[vdso]") == 0) return REGION_VDSO;
if (strcmp(path, "[vvar]") == 0) return REGION_VVAR;
if (strcmp(path, "[vsyscall]") == 0) return REGION_VDSO;
// Guard pages (no permissions)
if (!region->is_readable && !region->is_writable && !region->is_executable) {
return REGION_GUARD;
}
// Anonymous mappings (no pathname, no inode)
if (path[0] == '\0' && region->inode == 0) {
return REGION_ANONYMOUS;
}
// Shared libraries
if (strstr(path, ".so") != NULL) {
if (region->is_executable) return REGION_TEXT;
if (region->is_writable) return REGION_DATA;
return REGION_RODATA;
}
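// Main executable (has pathname, not a library)
// NOTE: this heuristic matches every file-backed mapping whose path lacks
// ".so", so the REGION_MMAP_FILE branch further down is never reached.
// To tell the two apart, compare the path against the target of the
// /proc/[pid]/exe symlink.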
if (path[0] == '/' && strstr(path, ".so") == NULL) {
if (region->is_executable) return REGION_TEXT;
if (region->is_writable) return REGION_DATA;
return REGION_RODATA;
}
// Memory-mapped file
if (path[0] == '/' && region->inode != 0) {
return REGION_MMAP_FILE;
}
return REGION_UNKNOWN;
}
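// Parse complete memory map for a PID
MemoryMap *parse_memory_map(pid_t pid) {
    char path[64];
    snprintf(path, sizeof(path), "/proc/%d/maps", (int)pid);
    FILE *f = fopen(path, "r");
    if (!f) return NULL;
    MemoryMap *map = calloc(1, sizeof(MemoryMap));
    if (!map) { fclose(f); return NULL; }
    map->capacity = 64;
    map->regions = calloc(map->capacity, sizeof(MemoryRegion));
    if (!map->regions) { free(map); fclose(f); return NULL; }
    // 512 bytes covers typical lines; very long pathnames will not fit
    char line[512];
    while (fgets(line, sizeof(line), f)) {
        if (map->count >= map->capacity) {
            // Grow through a temporary so the old block survives a failure
            size_t new_cap = map->capacity * 2;
            MemoryRegion *grown = realloc(map->regions,
                                          new_cap * sizeof(MemoryRegion));
            if (!grown) break; // Keep the regions parsed so far
            // realloc() does not zero the newly added slots
            memset(grown + map->capacity, 0,
                   (new_cap - map->capacity) * sizeof(MemoryRegion));
            map->regions = grown;
            map->capacity = new_cap;
        }
        if (parse_maps_line(line, &map->regions[map->count]) == 0) {
            map->count++;
        }
    }
    fclose(f);
    // Calculate totals
    for (size_t i = 0; i < map->count; i++) {
        map->total_virtual += map->regions[i].size;
    }
    return map;
}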
Checkpoint: ./vmmap --self shows all memory regions with correct categorization.
Phase 2: Memory Statistics (Days 4-5)
Goals:
- Read RSS, page faults from /proc
- Calculate memory summaries
- Enhance output with statistics
Key Code - memory_stats.c:
#include <stdio.h>
#include <string.h>
#include "vmmap.h"
// Read page fault counts from /proc/[pid]/stat
int get_page_faults(pid_t pid, PageFaultStats *stats) {
char path[64];
snprintf(path, sizeof(path), "/proc/%d/stat", pid);
FILE *f = fopen(path, "r");
if (!f) return -1;
// Fields in /proc/[pid]/stat:
// 1:pid 2:comm 3:state 4:ppid ... 10:minflt 11:cminflt 12:majflt ...
char line[1024];
if (!fgets(line, sizeof(line), f)) {
fclose(f);
return -1;
}
fclose(f);
// Skip to fields we need (10 and 12)
unsigned long minflt, cminflt, majflt, cmajflt;
// Find end of comm field (in parentheses)
char *p = strrchr(line, ')');
if (!p) return -1;
// Skip state and parse fields
if (sscanf(p + 2, "%*c %*d %*d %*d %*d %*d %*u %lu %lu %lu %lu",
&minflt, &cminflt, &majflt, &cmajflt) != 4) {
return -1;
}
stats->minor_faults = minflt;
stats->major_faults = majflt;
return 0;
}
// Read memory statistics from /proc/[pid]/status
int get_memory_status(pid_t pid, MemoryMap *map) {
char path[64];
snprintf(path, sizeof(path), "/proc/%d/status", pid);
FILE *f = fopen(path, "r");
if (!f) return -1;
char line[256];
while (fgets(line, sizeof(line), f)) {
unsigned long value;
if (sscanf(line, "VmPeak: %lu kB", &value) == 1)
map->vm_peak = value * 1024;
else if (sscanf(line, "VmSize: %lu kB", &value) == 1)
map->vm_size = value * 1024;
else if (sscanf(line, "VmRSS: %lu kB", &value) == 1)
map->vm_rss = value * 1024;
else if (sscanf(line, "VmData: %lu kB", &value) == 1)
map->vm_data = value * 1024;
else if (sscanf(line, "VmStk: %lu kB", &value) == 1)
map->vm_stack = value * 1024;
else if (sscanf(line, "VmExe: %lu kB", &value) == 1)
map->vm_exe = value * 1024;
else if (sscanf(line, "VmLib: %lu kB", &value) == 1)
map->vm_lib = value * 1024;
}
fclose(f);
map->total_resident = map->vm_rss;
return 0;
}
Checkpoint: Memory map output includes RSS, page faults, and size summaries.
Phase 3: Demand Paging Demo (Days 6-7)
Goals:
- Demonstrate that allocated memory isn't resident until touched
- Count page faults during access
- Explain the observation
Key Code - demand_demo.c:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>
#include "vmmap.h"
#define DEMO_SIZE ((size_t)100 * 1024 * 1024) // 100 MB; size_t so it matches the %zu format specifiers
void demo_demand_paging(void) {
printf("\n=== DEMAND PAGING DEMONSTRATION ===\n\n");
// Step 1: Get initial page fault count
PageFaultStats before, after;
get_page_faults(getpid(), &before);
printf("Step 1: Initial page fault count\n");
printf(" Minor faults: %lu\n", before.minor_faults);
printf(" Major faults: %lu\n\n", before.major_faults);
// Step 2: Allocate without touching
printf("Step 2: Allocating %zu MB with mmap (no MAP_POPULATE)\n",
DEMO_SIZE / (1024 * 1024));
void *ptr = mmap(NULL, DEMO_SIZE, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (ptr == MAP_FAILED) {
perror("mmap");
return;
}
printf(" Virtual address: %p\n", ptr);
printf(" Size: %zu pages\n\n", DEMO_SIZE / 4096);
// Step 3: Check RSS before touching
MemoryMap map = {0};
get_memory_status(getpid(), &map);
get_page_faults(getpid(), &after);
printf("Step 3: Memory status BEFORE touching\n");
printf(" Virtual size: %lu MB\n", map.vm_size / (1024 * 1024));
printf(" RSS: %lu MB\n", map.vm_rss / (1024 * 1024));
printf(" Page faults since alloc: %lu\n\n",
after.minor_faults - before.minor_faults);
// Step 4: Touch every page. Write rather than read: a read fault on untouched
// anonymous memory maps the shared zero page, which does not add to RSS.
printf("Step 4: Touching every page (writing first byte)...\n");
volatile char *p = (volatile char *)ptr;
size_t pages = DEMO_SIZE / 4096;
for (size_t i = 0; i < pages; i++) {
p[i * 4096] = 1; // First write to each page triggers a minor fault
if (i % (pages / 10) == 0) {
printf(" Progress: %zu%%\n", (i * 100) / pages);
}
}
printf(" Progress: 100%%\n\n");
// Step 5: Check RSS after touching
get_memory_status(getpid(), &map);
get_page_faults(getpid(), &after);
printf("Step 5: Memory status AFTER touching\n");
printf(" Virtual size: %lu MB\n", map.vm_size / (1024 * 1024));
printf(" RSS: %lu MB\n", map.vm_rss / (1024 * 1024));
printf(" New page faults: %lu\n\n",
after.minor_faults - before.minor_faults);
// Explanation
printf("EXPLANATION:\n");
printf(" - We allocated %zu MB of virtual memory\n",
DEMO_SIZE / (1024 * 1024));
printf(" - Before touching: RSS was much smaller than allocation\n");
printf(" - Each first access triggered a MINOR page fault\n");
printf(" - The kernel allocated physical frames on demand\n");
printf(" - After touching: RSS increased by ~%zu MB\n",
DEMO_SIZE / (1024 * 1024));
printf(" - Total page faults: ~%zu (one per page)\n\n", pages);
munmap(ptr, DEMO_SIZE);
printf("=== END DEMO ===\n\n");
}
Checkpoint: Demo shows page faults increasing as pages are touched.
Phase 4: Protection Fault Demo (Days 8-10)
Goals:
- Trigger various protection violations safely
- Catch and explain each fault type
- Show relevant page table information
Key Code - fault_demo.c:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <signal.h>
#include <setjmp.h>
#include <sys/mman.h>
#include <unistd.h>
#include "vmmap.h"
static sigjmp_buf jump_buffer;
static volatile sig_atomic_t fault_caught = 0;
static FaultResult last_fault = {0};
// Signal handler for SIGSEGV
static void segv_handler(int sig, siginfo_t *info, void *context) {
(void)sig;
(void)context;
fault_caught = 1;
last_fault.fault_address = (uint64_t)(uintptr_t)info->si_addr; // pointer -> integer via uintptr_t
last_fault.signal_received = sig;
last_fault.signal_code = info->si_code;
// Determine explanation based on signal code
switch (info->si_code) {
case SEGV_MAPERR:
last_fault.explanation = "Address not mapped to object";
break;
case SEGV_ACCERR:
last_fault.explanation = "Invalid permissions for mapped object";
break;
default:
last_fault.explanation = "Unknown SIGSEGV cause";
}
siglongjmp(jump_buffer, 1);
}
// Install SIGSEGV handler
static void install_handler(void) {
struct sigaction sa;
memset(&sa, 0, sizeof(sa));
sa.sa_sigaction = segv_handler;
sa.sa_flags = SA_SIGINFO;
sigemptyset(&sa.sa_mask);
sigaction(SIGSEGV, &sa, NULL);
}
// Demo: NULL pointer dereference
static void demo_null_deref(void) {
printf("\n--- NULL Pointer Dereference ---\n\n");
printf("Attempting to read from address 0x0...\n\n");
fault_caught = 0;
if (sigsetjmp(jump_buffer, 1) == 0) {
volatile int *null_ptr = NULL;
int value = *null_ptr; // This will fault
(void)value;
}
if (fault_caught) {
printf("*** SIGSEGV received! ***\n\n");
printf("Fault analysis:\n");
printf(" Faulting address: 0x%lx\n", last_fault.fault_address);
printf(" Signal code: %s\n", last_fault.explanation);
printf("\n");
printf(" Why this faulted:\n");
printf(" - Address 0x0 is in the first page (NULL guard)\n");
printf(" - This page is intentionally unmapped\n");
printf(" - Any access causes SEGV_MAPERR\n");
}
}
// Demo: Write to read-only memory
static void demo_write_readonly(void) {
printf("\n--- Write to Read-Only Memory ---\n\n");
void *ptr = mmap(NULL, 4096, PROT_READ,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (ptr == MAP_FAILED) {
perror("mmap");
return;
}
printf("Mapped 4KB read-only at %p\n", ptr);
printf("Permissions: r--p (read only)\n\n");
printf("Attempting to write...\n\n");
fault_caught = 0;
if (sigsetjmp(jump_buffer, 1) == 0) {
*(volatile int *)ptr = 42; // This will fault
}
if (fault_caught) {
printf("*** SIGSEGV received! ***\n\n");
printf("Fault analysis:\n");
printf(" Faulting address: 0x%lx\n", last_fault.fault_address);
printf(" Signal code: %s\n", last_fault.explanation);
printf("\n");
printf(" Why this faulted:\n");
printf(" - Page is mapped (SEGV_ACCERR, not SEGV_MAPERR)\n");
printf(" - But write permission is not set\n");
printf(" - The kernel's fault handler found the mapping has no write permission\n");
printf("\n");
printf(" Fix: Use mprotect() to add PROT_WRITE\n");
}
munmap(ptr, 4096);
}
// Demo: Execute non-executable memory (NX bit)
static void demo_execute_data(void) {
printf("\n--- Execute Non-Executable Memory ---\n\n");
// x86/x86-64 machine code for: mov eax, 42; ret (other architectures need different bytes)
unsigned char code[] = {0xb8, 0x2a, 0x00, 0x00, 0x00, 0xc3};
void *ptr = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (ptr == MAP_FAILED) {
perror("mmap");
return;
}
memcpy(ptr, code, sizeof(code));
printf("Mapped 4KB read-write at %p\n", ptr);
printf("Copied executable code to data page\n");
printf("Permissions: rw-p (no execute!)\n\n");
printf("Attempting to execute...\n\n");
typedef int (*func_t)(void);
func_t func = (func_t)ptr;
fault_caught = 0;
if (sigsetjmp(jump_buffer, 1) == 0) {
func(); // This will fault (NX bit)
}
if (fault_caught) {
printf("*** SIGSEGV received! ***\n\n");
printf("Fault analysis:\n");
printf(" Faulting address: 0x%lx\n", last_fault.fault_address);
printf(" Signal code: %s\n", last_fault.explanation);
printf("\n");
printf(" Why this faulted:\n");
printf(" - Page lacks execute permission (NX bit set)\n");
printf(" - This is the NX/XD security feature\n");
printf(" - Prevents code injection attacks\n");
printf("\n");
printf(" Fix: Use mprotect() to add PROT_EXEC\n");
}
munmap(ptr, 4096);
}
// Main demo dispatcher
void demo_fault(const char *type) {
printf("\n=== PROTECTION FAULT DEMONSTRATION ===\n");
install_handler();
if (strcmp(type, "null") == 0) {
demo_null_deref();
} else if (strcmp(type, "write-readonly") == 0) {
demo_write_readonly();
} else if (strcmp(type, "execute-data") == 0) {
demo_execute_data();
} else {
printf("Unknown fault type: %s\n", type);
printf("Available: null, write-readonly, execute-data\n");
}
printf("\n=== END DEMO ===\n\n");
}
Checkpoint: Each fault type is triggered, caught, and explained correctly.
Phase 5: Copy-on-Write Demo (Days 11-12)
Goals:
- Demonstrate memory sharing after fork()
- Show COW trigger on write
- Measure memory before/after
Key Code - cow_demo.c:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h> // uint32_t
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>
#include "vmmap.h"
#define COW_SIZE ((size_t)10 * 1024 * 1024) // 10 MB; size_t so it matches the %zu format specifiers
void demo_cow(void) {
printf("\n=== COPY-ON-WRITE DEMONSTRATION ===\n\n");
// Step 1: Allocate and initialize data
printf("Step 1: Parent (PID %d) allocating %zu MB\n",
getpid(), COW_SIZE / (1024 * 1024));
volatile uint32_t *data = mmap(NULL, COW_SIZE,
PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (data == MAP_FAILED) {
perror("mmap");
return;
}
// Touch all pages to make them resident
for (size_t i = 0; i < COW_SIZE / sizeof(uint32_t); i += 4096 / sizeof(uint32_t)) {
data[i] = 0xDEADBEEF;
}
printf(" Address: %p\n", (void *)data);
printf(" Initial value at page 0: 0x%08X\n\n", data[0]);
// Step 2: Memory before fork
MemoryMap parent_map = {0};
get_memory_status(getpid(), &parent_map);
printf("Step 2: Memory before fork()\n");
printf(" Parent RSS: %lu MB\n\n", parent_map.vm_rss / (1024 * 1024));
// Step 3: Fork
printf("Step 3: Forking...\n");
fflush(stdout);
pid_t child = fork();
if (child == 0) {
// Child process
printf("\n Child (PID %d) created\n", getpid());
// Check memory immediately after fork
MemoryMap child_map = {0};
get_memory_status(getpid(), &child_map);
printf(" Child RSS immediately: %lu MB\n",
child_map.vm_rss / (1024 * 1024));
printf(" Child sees data[0] = 0x%08X (shared!)\n\n", data[0]);
// Step 4: Write in child
printf("Step 4: Child writing to page 0...\n");
data[0] = 0xCAFEBABE;
printf(" Child wrote 0x%08X to data[0]\n", data[0]);
// Check memory after write
get_memory_status(getpid(), &child_map);
printf(" Child RSS after write: %lu MB\n",
child_map.vm_rss / (1024 * 1024));
printf(" (Page was copied - COW triggered!)\n\n");
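// Exit child. _exit() skips stdio flushing, so flush explicitly or the
// child's output is lost when stdout is a pipe or a file.
munmap((void *)data, COW_SIZE);
fflush(stdout);
_exit(0);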
} else if (child > 0) {
// Parent process: waitpid() blocks until the child has finished, so the
// child's output always comes first and no sleep is needed
waitpid(child, NULL, 0);
// Step 5: Verify parent data unchanged
printf("Step 5: Parent checking its data...\n");
printf(" Parent sees data[0] = 0x%08X (unchanged!)\n\n", data[0]);
// Explanation
printf("EXPLANATION:\n");
printf(" 1. Before fork: Parent had %zu MB resident\n",
COW_SIZE / (1024 * 1024));
printf(" 2. After fork: Both share the same physical pages\n");
printf(" - Pages are marked read-only in both\n");
printf(" - Total physical memory: still ~%zu MB\n",
COW_SIZE / (1024 * 1024));
printf(" 3. Child wrote to a page:\n");
printf(" - Write caused protection fault (page was read-only)\n");
printf(" - Kernel caught fault, copied the page\n");
printf(" - Child got new page, parent keeps original\n");
printf(" 4. Only ONE page was copied (4 KB), not all %zu MB!\n\n",
COW_SIZE / (1024 * 1024));
munmap((void *)data, COW_SIZE);
} else {
perror("fork");
}
printf("=== END DEMO ===\n\n");
}
Checkpoint: Demo shows memory sharing and COW trigger with explanations.
Phase 6: Polish and Integration (Days 13-14)
Goals:
- Clean error handling
- Comprehensive help message
- Test all edge cases
Final Integration in main.c:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include "vmmap.h"
void print_usage(const char *prog) {
printf("Usage: %s [OPTIONS]\n\n", prog);
printf("Options:\n");
printf(" --self Show memory map of this process\n");
printf(" --map <pid> Show memory map of process <pid>\n");
printf(" --demand Demonstrate demand paging\n");
printf(" --fault <type> Trigger protection fault\n");
printf(" Types: null, write-readonly, execute-data\n");
printf(" --cow Demonstrate copy-on-write\n");
printf(" --help Show this help message\n");
}
int main(int argc, char *argv[]) {
if (argc < 2) {
print_usage(argv[0]);
return 1;
}
if (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-h") == 0) {
print_usage(argv[0]);
return 0;
}
if (strcmp(argv[1], "--self") == 0) {
MemoryMap *map = parse_memory_map(getpid());
if (map) {
get_memory_status(getpid(), map);
print_memory_map(map);
free_memory_map(map);
}
return 0;
}
if (strcmp(argv[1], "--map") == 0 && argc > 2) {
pid_t pid = atoi(argv[2]);
MemoryMap *map = parse_memory_map(pid);
if (map) {
get_memory_status(pid, map);
print_memory_map(map);
free_memory_map(map);
} else {
fprintf(stderr, "Cannot read map for PID %d\n", pid);
return 1;
}
return 0;
}
if (strcmp(argv[1], "--demand") == 0) {
demo_demand_paging();
return 0;
}
if (strcmp(argv[1], "--fault") == 0) {
if (argc < 3) {
fprintf(stderr, "--fault requires a type\n");
print_usage(argv[0]);
return 1;
}
demo_fault(argv[2]);
return 0;
}
if (strcmp(argv[1], "--cow") == 0) {
demo_cow();
return 0;
}
fprintf(stderr, "Unknown option: %s\n", argv[1]);
print_usage(argv[0]);
return 1;
}
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | Test parsing functions | Maps parser handles all line formats |
| Integration Tests | Full tool on sample processes | Map matches expected output |
| Fault Tests | Controlled crashes work correctly | Each fault type is caught |
| Cross-Process | Analyze other processes | Can read /proc/1/maps (if permitted) |
6.2 Critical Test Cases
1. Maps Parser Tests:
# Create test program with various regions
cat > tests/varied_regions.c << 'EOF'
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>
int global_init = 42; // .data
int global_uninit; // .bss
const char *rodata = "test"; // .rodata
int main() {
char stack_var[4096]; // stack
void *heap = malloc(4096); // heap
void *anon = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
pause(); // Block until killed; a background job cannot reliably wait on stdin
free(heap);
munmap(anon, 4096);
return 0;
}
EOF
gcc -o tests/varied_regions tests/varied_regions.c
./tests/varied_regions &
PID=$!
sleep 0.5 # Give the program time to reach pause()
./vmmap --map $PID
kill $PID
2. Demand Paging Verification:
# Should see page faults increase by ~25600 for 100MB
./vmmap --demand 2>&1 | grep "New page faults"
# Expected: approximately 25600 (100MB / 4KB)
3. Fault Handler Tests:
# Each should catch fault and not crash the tool
./vmmap --fault null
./vmmap --fault write-readonly
./vmmap --fault execute-data
4. COW Verification:
# Child and parent should see different values
./vmmap --cow 2>&1 | grep "0x"
# Parent should see 0xDEADBEEF
# Child should see 0xCAFEBABE
6.3 Test Automation Script
#!/bin/bash
# tests/run_tests.sh
set -e
echo "=== Testing vmmap ==="
echo "1. Testing --self..."
./vmmap --self > /dev/null
echo " PASS"
echo "2. Testing --demand..."
./vmmap --demand | grep -q "page faults"
echo " PASS"
echo "3. Testing --fault null..."
./vmmap --fault null | grep -q "SIGSEGV"
echo " PASS"
echo "4. Testing --fault write-readonly..."
./vmmap --fault write-readonly | grep -q "SIGSEGV"
echo " PASS"
echo "5. Testing --cow..."
./vmmap --cow | grep -q "0xDEADBEEF"
echo " PASS"
echo ""
echo "=== All tests passed! ==="
7. Common Pitfalls
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Not handling empty pathname | Segfault on parse | Check for empty string before using |
| Signal handler not async-safe | Mysterious crashes | Only use async-signal-safe functions |
| Forgetting siglongjmp | Handler hangs | Always have recovery path |
| ASLR confuses testing | Addresses differ between runs | Disable with setarch $(uname -m) -R ./vmmap for testing |
| COW demo races | Output interleaved | Use proper synchronization or delays |
| Parsing /proc race | Process exits during read | Handle errors gracefully |
7.2 Debugging Strategies
# Debug maps parsing
cat /proc/self/maps | head -5 # See actual format
# Debug page faults
perf stat -e page-faults ./vmmap --demand
# Debug signal handling
strace -e signal ./vmmap --fault null
# Debug COW
pmap -x <parent_pid>
pmap -x <child_pid>
7.3 Signal Handler Safety
// WRONG: Not async-signal-safe
void bad_handler(int sig) {
printf("Caught signal %d\n", sig); // printf is NOT safe!
malloc(100); // malloc is NOT safe!
}
// RIGHT: Async-signal-safe
static sigjmp_buf jump_buffer; // Filled in by sigsetjmp() before the risky access
static volatile sig_atomic_t caught = 0;
void good_handler(int sig, siginfo_t *info, void *ctx) {
(void)sig; (void)info; (void)ctx;
caught = 1;
// Only set simple flags, then siglongjmp out
siglongjmp(jump_buffer, 1);
}
8. Extensions
8.1 Beginner Extensions
- Add JSON output: Machine-readable format for scripting
- Colorized output: Different colors for different region types
- Size formatting: Human-readable sizes (KB, MB, GB)
- Region search: Find which region contains a given address
8.2 Intermediate Extensions
- Compare two processes: Show differences in memory layout
- Track memory over time: Poll and show changes
- Memory-mapped file explorer: List all mmap'd files
- Shared memory analysis: Show which pages are shared with whom
8.3 Advanced Extensions
- Parse /proc/[pid]/pagemap: Show actual physical frame numbers
- TLB miss estimation: Use perf to correlate with address patterns
- Page table visualization: ASCII art of multi-level tables
- NUMA awareness: Show which memory is local vs remote
9. Real-World Connections
9.1 Industry Applications
- Memory profiling tools (Valgrind, massif): Understanding maps is foundational
- Container runtimes (Docker, containerd): cgroups + namespaces change memory views
- Security tools: ASLR verification, memory protection auditing
- Debuggers (GDB, LLDB): Must understand VM to show correct state
- JIT compilers: Allocate executable memory with mmap + mprotect
9.2 Related Linux Commands
# Show memory maps
pmap -x <pid> # Detailed process map
cat /proc/<pid>/maps # Raw kernel view
cat /proc/<pid>/smaps # Detailed per-region stats
# Memory statistics
free -h # System-wide memory
vmstat 1 # Virtual memory stats over time
cat /proc/meminfo # Detailed memory info
# Page faults
ps -o min_flt,maj_flt -p <pid> # Fault counts
perf stat -e page-faults ./cmd # Count faults during run
# Memory mapping
strace -e mmap,mprotect,munmap ./cmd # Trace mmap calls
9.3 Interview Relevance
This project prepares you for questions like:
- "How does virtual memory work?"
- "What happens when you dereference a NULL pointer?"
- "Why is fork() considered efficient?"
- "Explain the difference between RSS and virtual size"
- "How does ASLR improve security?"
- "What's the purpose of guard pages?"
10. Resources
10.1 Essential Reading
- CS:APP Chapter 9: "Virtual Memory" - The foundation
- CS:APP Chapter 8: "Exceptional Control Flow" - Signals and faults
- Understanding the Linux Virtual Memory Manager by Mel Gorman
- Linux Kernel Development, 3rd Ed by Robert Love (Ch. 15: Process Address Space)
10.2 Documentation
- man 5 proc - /proc filesystem documentation
- man mmap - Memory mapping
- man mprotect - Changing memory protection
- man sigaction - Signal handling
10.3 Online Resources
- Gustavo Duarte's "How The Kernel Manages Your Memory"
- LWN.net Virtual Memory articles
- Intel Software Developer Manual, Vol 3A, Ch. 4 - Paging
10.4 Related Projects in This Series
- Previous: P11 (Signals + Processes Sandbox) - Prerequisite for ECF
- Parallel: P9 (Cache Simulator) - Locality concepts
- Next: P14 (Build Your Own Malloc) - Uses VM concepts heavily
11. Self-Assessment Checklist
Before considering this project complete, verify:
Understanding
- I can explain the multi-level page table structure
- I understand the difference between virtual and physical addresses
- I can explain what happens on a TLB miss
- I understand demand paging and can explain minor vs major faults
- I can read /proc/[pid]/maps and explain each field
- I understand copy-on-write and why fork() is efficient
- I can explain protection bits (R/W/X) and what happens on violation
- I understand ASLR and its security implications
Implementation
- Maps parser correctly identifies all region types
- Demand paging demo shows clear before/after fault counts
- Protection fault demos catch and explain each fault type
- COW demo shows sharing and copy trigger
- Error handling works for inaccessible processes
- Output is clear and educational
Growth
- I can debug a segfault by analyzing the memory map
- I understand how VM interacts with cache locality
- I can explain memory usage discrepancies (virtual vs resident)
- I'm comfortable using mmap() and mprotect()
- I can write correct signal handlers
12. Real World Outcome
When you complete this project, here's exactly what you'll see when running your tool:
Viewing Your Own Process Memory Map
$ ./vmmap --self
=== VIRTUAL MEMORY MAP: vmmap (PID 12345) ===
REGION TYPE START ADDR END ADDR SIZE PERM DESCRIPTION
--------------------------------------------------------------------------------------
[text] 0x0000555555554000 0x0000555555556000 8 KB r-xp vmmap (executable)
[rodata] 0x0000555555556000 0x0000555555557000 4 KB r--p vmmap (read-only data)
[data] 0x0000555555557000 0x0000555555558000 4 KB rw-p vmmap (initialized data)
[bss] 0x0000555555558000 0x0000555555559000 4 KB rw-p vmmap (uninitialized data)
[heap] 0x0000555555559000 0x000055555557a000 132 KB rw-p [heap]
[shared-lib] 0x00007ffff7c00000 0x00007ffff7c28000 160 KB r--p libc.so.6
[shared-lib] 0x00007ffff7c28000 0x00007ffff7dbd000 1620 KB r-xp libc.so.6 (code)
[shared-lib] 0x00007ffff7dbd000 0x00007ffff7e15000 352 KB r--p libc.so.6 (rodata)
[shared-lib] 0x00007ffff7e15000 0x00007ffff7e19000 16 KB rw-p libc.so.6 (data)
[anon] 0x00007ffff7e19000 0x00007ffff7e26000 52 KB rw-p [anonymous]
[vdso] 0x00007ffff7fc0000 0x00007ffff7fc4000 16 KB r-xp [vdso]
[stack] 0x00007ffffffde000 0x00007ffffffff000 132 KB rw-p [stack]
[vsyscall] 0xffffffffff600000 0xffffffffff601000 4 KB --xp [vsyscall]
=== MEMORY SUMMARY ===
Total Virtual Size: 2.5 GB
Code (.text): 1.8 MB (0.07%)
Data (.data/.bss): 156 KB
Heap: 132 KB
Stack: 132 KB
Shared Libraries: 15.2 MB
Anonymous: 52 KB
Regions by Permission:
r-xp (executable): 6 regions
rw-p (read-write): 8 regions
r--p (read-only): 4 regions
Demand Paging Demonstration
$ ./vmmap --demand
=== DEMAND PAGING DEMONSTRATION ===
Allocating 100 MB with mmap (no physical pages yet)...
Virtual address: 0x7ffff0c00000
Allocation time: 0.0001 seconds
Page faults before: 1,247
Touching every page (forcing demand paging)...
Pages to touch: 25,600 (100 MB / 4 KB)
[####################] 100% complete
Time elapsed: 0.847 seconds
Page fault analysis:
Page faults after: 26,892
New page faults: 25,645 <-- Almost exactly 25,600!
Extra faults: 45 (library/stack growth)
What this demonstrates:
- mmap() does NOT allocate physical memory immediately
- Physical pages are allocated on-demand when first accessed
- The first access to each 4 KB page triggers one page fault; later accesses to that page do not
- The kernel satisfies faults by mapping anonymous pages
Memory before/after:
RSS before touch: 2.1 MB
RSS after touch: 102.1 MB (+100 MB, as expected)
Protection Fault Demonstration
$ ./vmmap --fault null
=== PROTECTION FAULT DEMONSTRATION: NULL Pointer ===
Setting up SIGSEGV handler...
Attempting to dereference NULL pointer (address 0x0)...
*** CAUGHT SIGSEGV ***
Signal: SIGSEGV (Segmentation fault)
Fault address: 0x0000000000000000
Fault reason: SEGV_MAPERR (address not mapped to object)
Why this happened:
- Address 0x0 is in the "unmapped" region at the bottom of address space
- The kernel deliberately leaves this unmapped to catch NULL pointer bugs
- The MMU triggered an exception when the CPU tried to access this address
- The kernel converted this to SIGSEGV delivered to our process
If this were a real bug, you'd see:
Segmentation fault (core dumped)
$ ./vmmap --fault write-readonly
=== PROTECTION FAULT DEMONSTRATION: Write to Read-Only ===
Setting up SIGSEGV handler...
Creating read-only mapped region at 0x7ffff7f00000...
Attempting to write to read-only memory...
*** CAUGHT SIGSEGV ***
Signal: SIGSEGV (Segmentation fault)
Fault address: 0x00007ffff7f00000
Fault reason: SEGV_ACCERR (invalid permissions for mapped object)
Why this happened:
- The page is mapped but with read-only permission (r--p)
- The MMU checked permissions on the PTE and found no write bit
- Hardware exception -> kernel -> SIGSEGV to process
This is how the OS protects:
- Code sections (.text) from modification
- Shared library code from corruption
- Read-only data (.rodata) integrity
Copy-on-Write Demonstration
$ ./vmmap --cow
=== COPY-ON-WRITE DEMONSTRATION ===
Creating shared memory region (1 page = 4 KB)...
Address: 0x7ffff7f00000
Initial value: 0xDEADBEEF
Forking process...
Parent PID: 12345
Child PID: 12346
Before modification:
Parent reads: 0xDEADBEEF at 0x7ffff7f00000
Child reads: 0xDEADBEEF at 0x7ffff7f00000 (same physical page!)
Child modifying shared page to 0xCAFEBABE...
After modification:
Parent reads: 0xDEADBEEF at 0x7ffff7f00000
Child reads: 0xCAFEBABE at 0x7ffff7f00000
What happened:
1. After fork(), parent and child shared the SAME physical page
2. Both PTEs pointed to the same physical frame
3. Both PTEs were write-protected (the kernel tracks COW itself; there is no hardware COW bit)
4. When child wrote, MMU triggered page fault
5. Kernel allocated NEW physical page for child
6. Kernel copied contents to new page
7. Kernel updated child's PTE to point to new page (now writable)
8. Parent's page unchanged - processes now have independent copies
This is why fork() is efficient:
- Only page table entries are copied, not actual pages
- Physical pages are shared until one process modifies them
- Most pages (code, libraries) are never modified -> stay shared
13. The Core Question You're Answering
"When my program uses memory, what actually happens between the addresses my code sees and the physical RAM chips in my computer?"
This project demystifies virtual memory by making the abstract concrete. You'll see that addresses in your code (virtual addresses) go through a complex translation involving page tables, the MMU, and kernel data structures before touching real hardware. You'll understand why processes can't see each other's memory, why dereferencing NULL crashes, and why fork() is amazingly fast despite "copying" an entire process.
14. Concepts You Must Understand First
Before starting this project, ensure you understand these concepts:
| Concept | Why It Matters | Where to Learn |
|---|---|---|
| What a pointer is and how to dereference it | You'll be working with raw addresses | CS:APP 3.8, any C book Ch. 5-6 |
| Hexadecimal and binary representation | Memory addresses are displayed in hex | CS:APP 2.1 |
| Process isolation concept | VM's main purpose is process isolation | CS:APP 8.2.3 |
| What a page fault is (conceptually) | You'll be triggering and catching these | CS:APP 9.3 |
| Signal handling basics (SIGSEGV) | You'll catch segmentation faults | CS:APP 8.5 |
| fork() and process creation | COW demo requires understanding fork | CS:APP 8.4.2 |
| Basic file I/O in C | Reading /proc files | CS:APP 10.1-10.4 |
15. Questions to Guide Your Design
Work through these questions BEFORE writing code:
- Parsing Strategy: The /proc/[pid]/maps format has many fields. How will you parse each line reliably? What if a pathname has spaces?
- Region Classification: How do you distinguish heap from anonymous mmap? Stack from thread stacks? What heuristics will you use?
- Signal Safety: Your SIGSEGV handler will run in a dangerous context. What functions are async-signal-safe? How do you recover from the handler?
- Timing Page Faults: How do you measure page faults before and after an operation? What kernel interface provides this?
- COW Verification: How do you prove that parent and child initially share physical pages? Can you detect when the copy actually happens?
- Cross-Platform: The /proc filesystem is Linux-specific. If you want macOS support, what API would you use instead?
- Output Format: How do you make the output educational, not just a data dump? What explanations help the learner understand?
16. Thinking Exercise
Before writing any code, trace through this scenario by hand:
A program does:
int *ptr = malloc(8192); // 2 pages
ptr[0] = 42; // Write to first page
ptr[1024] = 99; // Write to second page (4096 bytes later)
fork();
// In child:
ptr[0] = 100; // Modify first page
Exercise: On paper, answer:
- Before fork(): How many physical pages back the malloc'd region? Are both pages allocated immediately or on-demand?
- After fork(), before write: How many total physical pages exist for this region (parent + child combined)? What do the page table entries look like in both processes?
- After child writes: Now how many physical pages exist? Which PTE changed? What triggered the copy?
- Memory usage: If we had forked 100 times and only one child modified the data, how many physical copies of the 2 pages would exist?
Verify your answers by implementing the COW demo and adding instrumentation to observe physical page allocation.
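One possible piece of instrumentation for the first question is mincore(), which reports whether each page of a mapping is currently resident in RAM. The sketch below is a minimal illustration, not part of the original project: the helper name residency_demo is hypothetical, and it uses a fresh private anonymous mmap instead of malloc (an assumption made so the pages are guaranteed to be untouched at the start).

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helper: records residency of a fresh two-page mapping.
// out[0], out[1] = pages 0 and 1 before any access
// out[2], out[3] = pages 0 and 1 after writing to page 0 only
// Returns 0 on success, -1 on error.
int residency_demo(int out[4]) {
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) return -1;
    size_t len = 2 * (size_t)page;
    unsigned char vec[2];

    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;

    if (mincore(p, len, vec) != 0) { munmap(p, len); return -1; }
    out[0] = vec[0] & 1;   // low bit set means "resident"
    out[1] = vec[1] & 1;

    ((volatile char *)p)[0] = 42;   // first write to page 0

    if (mincore(p, len, vec) != 0) { munmap(p, len); return -1; }
    out[2] = vec[0] & 1;
    out[3] = vec[1] & 1;

    munmap(p, len);
    return 0;
}
```

On a typical Linux system one would expect only the touched page to become resident, but the exact values depend on the kernel and environment, so treat the output as an observation to compare against your paper answers.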
17. The Interview Questions They'll Ask
After completing this project, you'll be ready for these common interview questions:
- "Explain how virtual memory works."
  - Expected: Describe address translation, page tables, MMU role
  - Bonus: Mention TLB, multi-level page tables, and why VM enables process isolation
- "What happens when you dereference a NULL pointer?"
  - Expected: MMU finds no valid mapping -> page fault -> kernel sends SIGSEGV
  - Bonus: Explain why page 0 is deliberately unmapped, and how guard pages work
- "Why is fork() efficient even though it 'copies' an entire process?"
  - Expected: Copy-on-write - only page tables are copied, physical pages are shared
  - Bonus: Explain how COW pages are marked read-only and copied on first write
- "What's the difference between virtual memory size and RSS (Resident Set Size)?"
  - Expected: Virtual is address space reserved; RSS is physical pages currently in RAM
  - Bonus: Explain demand paging, why virtual >> RSS, and when pages get evicted
- "How does ASLR improve security?"
  - Expected: Randomizes addresses of stack, heap, libraries, making exploits harder
  - Bonus: Discuss what's randomized vs. fixed, and limitations (information leaks)
- "What causes a segmentation fault?"
  - Expected: Accessing unmapped memory or violating page permissions
  - Bonus: Distinguish SEGV_MAPERR (unmapped) from SEGV_ACCERR (permission violation)
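To make the virtual size versus RSS question concrete, a process can read its own counters from /proc/self/statm, whose first two fields are total program size and resident set size, both in pages. The following is a minimal Linux-only sketch; the function name read_statm is hypothetical.

```c
#include <stdio.h>

// Hypothetical helper: reads the first two fields of /proc/self/statm.
// vm_pages  = total virtual size of the process, in pages
// rss_pages = resident set size, in pages
// Returns 0 on success, -1 on error.
int read_statm(unsigned long *vm_pages, unsigned long *rss_pages) {
    FILE *f = fopen("/proc/self/statm", "r");
    if (!f) return -1;
    int n = fscanf(f, "%lu %lu", vm_pages, rss_pages);
    fclose(f);
    return (n == 2) ? 0 : -1;
}
```

Multiplying each value by sysconf(_SC_PAGESIZE) gives bytes. Calling this before and after mapping a large region, and again after touching it, is one way to watch the virtual size grow first and RSS follow only as pages are used.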
18. Hints in Layers
If you're stuck, reveal hints one at a time:
Hint 1: Parsing /proc/[pid]/maps
The format is: address perms offset dev inode pathname
Example line:
7f9c2c000000-7f9c2c021000 rw-p 00000000 00:00 0 [heap]
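Parse with sscanf or by splitting on whitespace. Be careful: the pathname can be empty or contain spaces! Bound every string conversion so a long field cannot overflow its buffer:
char perms[5], dev[16], pathname[256] = "";
int n = sscanf(line, "%lx-%lx %4s %lx %15s %lu %255[^\n]",
               &start, &end, perms, &offset, dev, &inode, pathname);
If sscanf returns 6 (no pathname matched), pathname stays empty and the region is anonymous.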
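As a fuller illustration of this hint, the sketch below wraps the same sscanf pattern in a function and adds a simple classification pass. The struct layout, the function names parse_maps_line and classify_region, and the category strings are all assumptions made for this example; your tool may need finer heuristics (for instance, for thread stacks).

```c
#include <stdio.h>
#include <string.h>

// Hypothetical record for one line of /proc/[pid]/maps.
struct region {
    unsigned long start, end, offset, inode;
    char perms[5];    // e.g. "rw-p"
    char dev[16];     // e.g. "08:01"
    char path[256];   // empty string for anonymous regions
};

// Parses one maps line. Returns 0 on success, -1 if the line is malformed.
int parse_maps_line(const char *line, struct region *r) {
    memset(r, 0, sizeof *r);
    int n = sscanf(line, "%lx-%lx %4s %lx %15s %lu %255[^\n]",
                   &r->start, &r->end, r->perms, &r->offset,
                   r->dev, &r->inode, r->path);
    // 6 fields = anonymous region (no pathname), 7 = pathname present
    return (n >= 6) ? 0 : -1;
}

// Very rough classification based only on the pathname column.
const char *classify_region(const struct region *r) {
    if (r->path[0] == '\0') return "anonymous";
    if (strcmp(r->path, "[heap]") == 0) return "heap";
    if (strcmp(r->path, "[stack]") == 0) return "stack";
    if (r->path[0] == '[') return "kernel-provided";  // e.g. [vdso], [vvar]
    return "file-backed";
}
```

Because %255[^\n] reads to the end of the line, a pathname containing spaces is captured whole, including any trailing " (deleted)" marker the kernel appends.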
Hint 2: Counting Page Faults
Read /proc/self/stat and extract fields 10 (minflt) and 12 (majflt).
Or use getrusage():
struct rusage usage;
getrusage(RUSAGE_SELF, &usage);
printf("Minor faults: %ld\n", usage.ru_minflt);
printf("Major faults: %ld\n", usage.ru_majflt);
Minor faults are resolved without disk I/O (for example, mapping a zero-filled page or a page already in the page cache). Major faults require reading the page from disk.
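To turn these counters into a demand-paging experiment, sample them before and after touching freshly mapped memory and report the difference. The sketch below is one way to do that; the function name count_touch_faults is hypothetical, and it assumes a private anonymous mapping so that no page has been touched beforehand.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

// Hypothetical helper: maps npages fresh pages, writes one byte to each,
// and returns how many minor faults occurred during the writes.
// Returns -1 on error.
long count_touch_faults(size_t npages) {
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || npages == 0) return -1;
    size_t len = npages * (size_t)page;
    struct rusage before, after;

    volatile char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;

    if (getrusage(RUSAGE_SELF, &before) != 0) {
        munmap((void *)p, len);
        return -1;
    }
    for (size_t i = 0; i < npages; i++)
        p[i * (size_t)page] = 1;          // first touch of each page
    if (getrusage(RUSAGE_SELF, &after) != 0) {
        munmap((void *)p, len);
        return -1;
    }

    munmap((void *)p, len);
    return after.ru_minflt - before.ru_minflt;
}
```

On Linux the result is usually close to one fault per page, though features such as transparent huge pages or a sandboxed kernel can change the count, so compare runs with different npages values before drawing conclusions.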
Hint 3: Signal Handler Recovery
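You can't return normally from a SIGSEGV handler - the faulting instruction would just run again!
Use sigsetjmp/siglongjmp:
static sigjmp_buf jump_buffer;
// Install with sigaction() and SA_SIGINFO so the handler receives siginfo_t
void handler(int sig, siginfo_t *info, void *ctx) {
// Log the fault info (info->si_addr, info->si_code) using only async-signal-safe calls
siglongjmp(jump_buffer, 1);
}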
// In main:
if (sigsetjmp(jump_buffer, 1) == 0) {
// Try the dangerous operation
*bad_ptr = 42;
} else {
// Jumped here from handler
printf("Caught fault!\n");
}
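The fragment above omits handler installation. A more complete sketch might look like the following, which installs the handler with sigaction, attempts a write, and reports the si_code of any fault. The names probe_write and protection_fault_demo are hypothetical, and the demo assumes a private anonymous page that is made read-only with mprotect.

```c
#define _DEFAULT_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf probe_env;
static volatile sig_atomic_t fault_code;

static void on_segv(int sig, siginfo_t *info, void *ctx) {
    (void)sig; (void)ctx;
    fault_code = info->si_code;     // SEGV_MAPERR or SEGV_ACCERR
    siglongjmp(probe_env, 1);       // skip the faulting instruction
}

// Tries to write through addr. Returns 0 if the write succeeded,
// the SIGSEGV si_code if it faulted, or -1 if setup failed.
int probe_write(volatile int *addr) {
    struct sigaction sa, old;
    int result;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0) return -1;

    if (sigsetjmp(probe_env, 1) == 0) {   // 1 = save and restore signal mask
        *addr = 42;
        result = 0;
    } else {
        result = fault_code;
    }
    sigaction(SIGSEGV, &old, NULL);       // restore the previous handler
    return result;
}

// Maps a page, makes it read-only, and returns the si_code from writing to it.
int protection_fault_demo(void) {
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0) return -1;
    int *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;

    int code = probe_write(p);            // page is writable here
    if (code == 0 && mprotect(p, (size_t)page, PROT_READ) == 0)
        code = probe_write(p);            // now a permission violation
    munmap(p, (size_t)page);
    return code;
}
```

Saving the signal mask in sigsetjmp matters: SIGSEGV is blocked while its handler runs, and siglongjmp must unblock it or a second fault could not be caught.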
Hint 4: Demonstrating COW
To show COW is happening, you need to observe that parent and child see different values after one modifies:
int *shared = mmap(...); // Not really "shared" after COW
*shared = 0xDEADBEEF;
pid_t pid = fork();
if (pid == 0) {
// Child
printf("Child before: %x\n", *shared); // Same as parent
*shared = 0xCAFEBABE; // Triggers COW
printf("Child after: %x\n", *shared); // Different from parent
exit(0);
} else {
// Parent
wait(NULL);
printf("Parent after child exit: %x\n", *shared); // Still 0xDEADBEEF!
}
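The mmap arguments are elided above. One plausible choice, shown here as an assumption and not as the project's required implementation, is a private anonymous mapping, since MAP_PRIVATE is what makes the page copy-on-write across fork(). This sketch (function name cow_demo is hypothetical) has the child report its value through a pipe so both views can be checked programmatically.

```c
#define _DEFAULT_SOURCE
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: after the child overwrites a private page,
// stores what the parent and the child each read from the same address.
// Returns 0 on success, -1 on error.
int cow_demo(uint32_t *parent_sees, uint32_t *child_sees) {
    const size_t len = 4096;   // assumed size; mmap rounds up to a whole page
    int fds[2];

    volatile uint32_t *page = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED) return -1;
    *page = 0xDEADBEEF;

    if (pipe(fds) != 0) { munmap((void *)page, len); return -1; }

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]); close(fds[1]);
        munmap((void *)page, len);
        return -1;
    }
    if (pid == 0) {                       // child
        close(fds[0]);
        *page = 0xCAFEBABE;               // write fault: kernel copies the page
        uint32_t v = *page;
        ssize_t w = write(fds[1], &v, sizeof v);
        _exit(w == (ssize_t)sizeof v ? 0 : 1);
    }

    close(fds[1]);                        // parent
    uint32_t v = 0;
    ssize_t n = read(fds[0], &v, sizeof v);
    close(fds[0]);
    waitpid(pid, NULL, 0);

    *child_sees = v;
    *parent_sees = *page;                 // unaffected by the child's write
    munmap((void *)page, len);
    return (n == (ssize_t)sizeof v) ? 0 : -1;
}
```

This only demonstrates the visible effect of COW (independent values at the same virtual address). Showing that the two processes initially share a physical frame requires /proc/[pid]/pagemap, which on recent kernels typically needs elevated privileges to expose frame numbers.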
19. Books That Will Help
| Topic | Book | Chapter/Section |
|---|---|---|
| Virtual Memory overview | CS:APP 3rd Ed | Chapter 9.1-9.3 "Physical and Virtual Addressing", "Address Spaces", "VM as a Tool for Caching" |
| Page tables and translation | CS:APP 3rd Ed | Chapter 9.3.2 "Page Tables" and Chapter 9.6 "Address Translation" |
| Page faults and demand paging | CS:APP 3rd Ed | Chapter 9.3.4 "Page Faults" |
| Memory mapping (mmap) | CS:APP 3rd Ed | Chapter 9.8 "Memory Mapping" |
| Copy-on-write | CS:APP 3rd Ed | Chapter 9.8.2 "The fork Function Revisited" |
| Protection and permissions | CS:APP 3rd Ed | Chapter 9.5 "VM as a Tool for Memory Protection" |
| Linux VM implementation | Understanding the Linux Kernel | Chapter 8 "Memory Management" |
| Deep VM internals | Operating Systems: Three Easy Pieces | Chapters on VM (free online) |
| Intel paging hardware | Intel SDM Volume 3A | Chapter 4 "Paging" |
20. Submission / Completion Criteria
Minimum Viable Completion:
- Parse and display /proc/[pid]/maps correctly
- Categorize regions by type
- Show basic memory statistics
- At least one working demonstration (demand paging, fault, or COW)
Full Completion:
- All analysis modes work (--self, --map, --demand, --fault, --cow)
- Accurate region categorization
- Clear, educational output with explanations
- All fault types demonstrated and caught
- COW behavior demonstrated
Excellence (Going Above & Beyond):
- Parse /proc/[pid]/pagemap for physical addresses
- TLB/cache interaction analysis
- Compare memory layouts across processes
- Integration with perf for page fault profiling
- Support for macOS (vm_region_recurse_64)
This guide was expanded from CSAPP_3E_DEEP_LEARNING_PROJECTS.md. For the complete learning path, see the project index.