Project 5: Shadow Page Table Simulator
Simulate the pre-EPT memory virtualization technique where the VMM maintains “shadow” page tables to translate guest virtual addresses directly to host physical addresses.
Quick Reference
| Attribute | Value |
|---|---|
| Language | C (alt: Rust, C++) |
| Difficulty | Advanced |
| Time | 2-3 weeks |
| Knowledge Area | Memory Virtualization / Page Tables |
| Coolness | Hardcore Tech Flex |
| Portfolio Value | Resume Gold |
| Hardware Needed | None (pure simulation) |
Learning Objectives
By completing this project, you will:
- Master three-level address translation: Understand how a Guest Virtual Address (GVA) maps to a Guest Physical Address (GPA), and how the GPA in turn maps to a Host Physical Address (HPA)
- Implement trap-and-emulate for page tables: Build the mechanism that detects and handles guest page table modifications
- Understand why shadow page tables are expensive: Quantify the VM exit cost through statistics and analysis
- Simulate TLB behavior: Model how Translation Lookaside Buffers cache address translations
- Compare to hardware-assisted virtualization: Understand why EPT/NPT was such a breakthrough
- Debug memory virtualization issues: Develop intuition for the bugs that plague real hypervisors
The Core Question You’re Answering
“How did hypervisors provide memory isolation before hardware support existed, and why was it so expensive?”
Before Intel’s Extended Page Tables (EPT) and AMD’s Nested Page Tables (NPT), hypervisors faced a fundamental problem: the guest OS expects to control its own page tables, but the VMM must ensure guests can only access memory allocated to them. The solution—shadow page tables—required intercepting every guest page table modification and maintaining a parallel set of translations. This project forces you to implement this mechanism and understand viscerally why hardware-assisted memory virtualization was revolutionary.
Concepts You Must Understand First
Before starting this project, ensure you understand these concepts:
| Concept | Why It Matters | Where to Learn |
|---|---|---|
| x86 paging basics (PDE, PTE, CR3) | Shadow tables mirror guest page tables | OSTEP Ch. 18-20, Intel SDM Vol. 3A Ch. 4 |
| Page table walk algorithm | You’ll implement this three times | CS:APP Ch. 9, OSTEP Ch. 18 |
| Virtual memory address translation | Foundation of everything | OSTEP Ch. 18, CS:APP Ch. 9 |
| TLB operation and invalidation | Performance depends on TLB behavior | OSTEP Ch. 19, CS:APP Ch. 9.6 |
| Write protection and page faults | How you detect guest PT modifications | OSTEP Ch. 21, Intel SDM Vol. 3A Ch. 4 |
| Trap-and-emulate virtualization | Core hypervisor technique | Popek & Goldberg 1974, OSTEP Ch. 6-7 |
Key Concepts Deep Dive
- The Address Translation Problem in Virtualization
- Without virtualization: Process VA -> Physical Address (one level)
- With virtualization: Guest VA -> Guest PA -> Host PA (two levels)
- Why can’t the guest control real page tables?
- What happens if two guests try to use the same “physical” address?
- OSTEP Ch. 18-20, Virtualization Deep Dive resources
- Shadow Page Table Mechanism
- What is the relationship between guest PT and shadow PT?
- Why does the shadow PT map GVA -> HPA directly?
- When must the VMM update the shadow tables?
- What happens on a guest page fault?
- “The Definitive Guide to Xen” Ch. 8, VMware Papers
- Write-Protecting Guest Page Tables
- How does the VMM know when the guest modifies its page tables?
- What is the trap mechanism?
- Why is every PT modification expensive?
- Intel SDM Vol. 3A Ch. 4.6, KVM Source Code
- TLB Coherency in Virtualization
- When does the guest flush its TLB?
- When must the VMM flush the real TLB?
- What are the coherency requirements?
- OSTEP Ch. 19, Intel SDM Vol. 3A Ch. 4.10
- Performance Implications
- How many VM exits does a typical workload cause?
- What is the cost of each VM exit?
- Why did this drive the development of EPT/NPT?
- Intel VT-x Performance Papers, VMware Whitepapers
Theoretical Foundation
The Memory Virtualization Challenge
In a virtualized environment, we have multiple address spaces to manage:
┌─────────────────────────────────────────────────────────────────────────────┐
│ ADDRESS SPACE HIERARCHY │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Guest Process Guest OS VMM/Host │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Guest │ Guest │ Guest │ VMM │ Host │ │
│ │ Virtual │──────────>│ Physical │──────>│ Physical │ │
│ │ Address │ Page │ Address │ Map │ Address │ │
│ │ (GVA) │ Tables │ (GPA) │ │ (HPA) │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │
│ Example: │
│ GVA 0x1000 ──────────────> GPA 0x5000 ──────────> HPA 0x8A000 │
│ │
│ The CPU only understands one level of translation. │
│ Shadow page tables collapse GVA -> GPA -> HPA into GVA -> HPA. │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
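In code, that collapse is just function composition. A minimal sketch (the helper names and one-argument signatures here are hypothetical; the hints later in this guide use slightly different ones):

#include <stdint.h>

#define INVALID_ADDR ((uint64_t)-1)

// Walk the guest's page tables: GVA -> GPA
uint64_t gva_to_gpa(uint64_t gva);
// Consult the VMM's memory map:  GPA -> HPA
uint64_t gpa_to_hpa(uint64_t gpa);

// The shadow PT caches exactly this composition so the real MMU
// can resolve GVA -> HPA in a single step.
uint64_t translate(uint64_t gva) {
    uint64_t gpa = gva_to_gpa(gva);
    if (gpa == INVALID_ADDR) return INVALID_ADDR; // guest page fault
    return gpa_to_hpa(gpa);                       // VMM-level mapping
}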
Without Hardware Support: The Problem
The x86 CPU’s MMU only supports ONE level of page table translation. When a guest OS sets CR3 to point to its page tables:
Naive approach (WRONG):
- Let guest control real CR3
- Guest page tables translate GVA -> GPA
- Problem: GPA is NOT a real physical address!
┌─────────────────────────────────────────────────────────────────────────────┐
│ WHY NAIVE APPROACH FAILS │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Guest thinks GPA 0x1000 is physical memory │
│ But GPA 0x1000 might be: │
│ - HPA 0x50000 for Guest 1 │
│ - HPA 0xA0000 for Guest 2 │
│ - Unmapped (requires allocation) │
│ - A device MMIO region │
│ │
│ If we let the guest directly control page tables: │
│ 1. Guest could access any physical memory (security violation) │
│ 2. Multiple guests would conflict (isolation violation) │
│ 3. VMM couldn't track memory usage (management failure) │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
The Shadow Page Table Solution
The VMM maintains a “shadow” of the guest’s page tables that maps GVA directly to HPA:
┌─────────────────────────────────────────────────────────────────────────────┐
│ SHADOW PAGE TABLE ARCHITECTURE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Guest Page │ │ Shadow Page │ │
│ │ Tables │ Controlled │ Tables │ Used by │
│ │ (GVA -> GPA) │ by Guest │ (GVA -> HPA) │ Real MMU │
│ └────────┬────────┘ └────────┬────────┘ │
│ │ │ │
│ │ VMM intercepts │ VMM points │
│ │ writes here │ CR3 here │
│ ▼ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ VMM Memory Map │ │
│ │ (GPA -> HPA) │ │
│ │ GPA 0x0000 -> HPA 0x10000 GPA 0x1000 -> HPA 0x20000 │ │
│ │ GPA 0x2000 -> HPA 0x35000 GPA 0x3000 -> HPA 0x48000 │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ Translation Path: │
│ 1. Guest writes to page table at GPA 0x2004 │
│ 2. VMM traps (page was write-protected) │
│ 3. VMM reads guest's intended mapping: GVA 0x1000 -> GPA 0x5000 │
│ 4. VMM looks up GPA 0x5000 in its map: GPA 0x5000 -> HPA 0x8A000 │
│ 5. VMM updates shadow: GVA 0x1000 -> HPA 0x8A000 │
│ 6. VMM resumes guest │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
The Trap-and-Update Mechanism
The key insight is that the VMM must intercept every guest page table modification:
┌─────────────────────────────────────────────────────────────────────────────┐
│ SHADOW PT UPDATE PROTOCOL │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ 1. VMM marks all guest page table pages as READ-ONLY in shadow tables │
│ │
│ 2. Guest tries to modify its page table: │
│ mov [GPA_of_PTE], new_mapping │
│ │
│ 3. CPU triggers page fault (write to read-only page) │
│ │
│ 4. Page fault causes VM EXIT to VMM │
│ │
│ 5. VMM handles the exit: │
│ a. Decode the faulting instruction │
│ b. Emulate the write to guest's page table (in guest memory) │
│ c. Parse the new PTE to extract GPA │
│ d. Translate GPA to HPA using VMM's memory map │
│ e. Update shadow page table with GVA -> HPA mapping │
│ f. If new PT page created, mark it read-only in shadow │
│ │
│ 6. VMM resumes guest execution (VM ENTRY) │
│ │
│  ┌───────────────────────────────────────────────────────────────────────┐│
│  │ COST: Every page table modification = 1 VM EXIT + instruction decode  ││
│  │       + shadow table update + potential TLB flush + VM ENTRY          ││
│  │                                                                       ││
│  │ Typical guest OS during boot: 10,000+ page table modifications        ││
│  │ Each VM EXIT/ENTRY cycle: ~1000-3000 CPU cycles                       ││
│  │ Total overhead: Millions of cycles just for memory management!        ││
│  └───────────────────────────────────────────────────────────────────────┘│
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Multi-Level Page Table Structure (x86-64)
For realistic simulation, understand the 4-level page table hierarchy:
┌─────────────────────────────────────────────────────────────────────────────┐
│ x86-64 4-LEVEL PAGE TABLE WALK │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Virtual Address (48 bits used): 0x00007F4A12345678 │
│ │
│ ┌─────┬──────┬──────┬──────┬──────┬──────────────┐ │
│ │Sign │PML4 │ PDPT │ PD │ PT │ Offset │ │
│ │Ext. │Index │Index │Index │Index │ (12 bits) │ │
│ │16b │ 9b │ 9b │ 9b │ 9b │ │ │
│ └─────┴──┬───┴───┬──┴───┬──┴──┬───┴──────────────┘ │
│ │ │ │ │ │
│ ▼ ▼ ▼ ▼ │
│ CR3 ─> [PML4] ─> [PDPT] ─> [PD] ─> [PT] ─> Physical Page │
│ │ │ │ │ │
│ │ │ │ └── 512 PTEs (4KB page table) │
│ │ │ └────────── 512 PDEs (can map 2MB pages) │
│ │ └─────────────────── 512 PDPTEs (can map 1GB pages) │
│ └──────────────────────────── 512 PML4Es │
│ │
│ Each entry is 8 bytes: │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │ Bits 63-52: Reserved/NX │ Bits 51-12: Physical Address │ Bits 11-0 │ │
│ │ │ (Page Frame Number) │ Flags │ │
│ └────────────────────────────────────────────────────────────────────┘ │
│ │
│ Important Flags: │
│ Bit 0 (P): Present │
│ Bit 1 (R/W): Read/Write │
│ Bit 2 (U/S): User/Supervisor │
│ Bit 5 (A): Accessed │
│ Bit 6 (D): Dirty │
│ Bit 7 (PS): Page Size (for 2MB/1GB pages) │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
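The bit-field split above is a handful of shifts and masks in code; a sketch (the struct and its name are mine, not from the SDM):

#include <stdint.h>

// Walk indices for a 48-bit canonical virtual address.
typedef struct {
    unsigned pml4, pdpt, pd, pt, offset;
} va_indices_t;

va_indices_t split_va(uint64_t va) {
    va_indices_t ix;
    ix.pml4   = (va >> 39) & 0x1FF; // bits 47-39
    ix.pdpt   = (va >> 30) & 0x1FF; // bits 38-30
    ix.pd     = (va >> 21) & 0x1FF; // bits 29-21
    ix.pt     = (va >> 12) & 0x1FF; // bits 20-12
    ix.offset =  va        & 0xFFF; // bits 11-0
    return ix;
}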
Shadow Page Table Entry Mapping
For each guest PTE, the shadow PTE combines guest permissions with VMM policy:
┌─────────────────────────────────────────────────────────────────────────────┐
│ SHADOW PTE CONSTRUCTION │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Guest PTE (at GPA 0x2008): │
│ ┌───────────────────────────────────────────────────────────────────┐ │
│ │ Physical Addr: 0x5000 (GPA) │ R/W: 1 │ U/S: 1 │ P: 1 │ Flags... │ │
│ └───────────────────────────────────────────────────────────────────┘ │
│ │
│ VMM Memory Map lookup: GPA 0x5000 -> HPA 0x8A000 │
│ │
│ Shadow PTE (used by real MMU): │
│ ┌───────────────────────────────────────────────────────────────────┐ │
│ │ Physical Addr: 0x8A000 (HPA) │ R/W: 1 │ U/S: 1 │ P: 1 │ Flags...│ │
│ └───────────────────────────────────────────────────────────────────┘ │
│ │
│ SPECIAL CASE - Guest Page Table Page: │
│ If GPA 0x5000 contains a guest page table, shadow marks it read-only: │
│ ┌───────────────────────────────────────────────────────────────────┐ │
│ │ Physical Addr: 0x8A000 (HPA) │ R/W: 0 │ U/S: 1 │ P: 1 │ Flags...│ │
│ └───────────────────────────────────────────────────────────────────┘ │
│ This ensures any write triggers a trap so VMM can update shadows. │
│ │
│ PERMISSION INTERSECTION: │
│ Shadow R/W = Guest R/W AND VMM Policy (read-only if PT page) │
│ Shadow U/S = Guest U/S (VMM doesn't usually restrict) │
│ Shadow NX = Guest NX OR VMM Policy (can make more restrictive) │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
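The permission intersection is a few lines of bit twiddling; a sketch (the flag macro matches the constants suggested in the hints below; the NX rule would be an OR in the same spirit):

#include <stdint.h>
#include <stdbool.h>

#define PTE_WRITABLE (1u << 1)

// Combine guest-visible permissions with VMM policy.
// is_pt_page: true if the target frame holds a guest page table.
uint32_t shadow_flags(uint32_t guest_flags, bool is_pt_page) {
    uint32_t flags = guest_flags;  // U/S and friends pass through
    if (is_pt_page)
        flags &= ~PTE_WRITABLE;    // force a trap on guest PT writes
    return flags;
}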
TLB Management in Shadow Paging
TLB coherency is critical and adds more overhead:
┌─────────────────────────────────────────────────────────────────────────────┐
│ TLB COHERENCY REQUIREMENTS │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Guest modifies PTE: │
│ 1. Guest writes to its page table (trapped by VMM) │
│ 2. VMM updates shadow page table │
│ 3. VMM must flush TLB entry for affected GVA │
│ - If not flushed, CPU uses stale shadow translation! │
│ │
│ Guest executes INVLPG (invalidate page): │
│ 1. INVLPG causes VM EXIT (privileged instruction) │
│ 2. VMM must execute real INVLPG for that GVA │
│ 3. Another VM EXIT/ENTRY cycle │
│ │
│ Guest writes to CR3 (changes address space): │
│ 1. CR3 write causes VM EXIT │
│ 2. VMM must: │
│ a. Look up which shadow PT corresponds to new guest CR3 │
│ b. Create new shadow PT if needed │
│ c. Point real CR3 to shadow PT │
│ d. Flush entire TLB (or use VPID) │
│ │
│  ┌───────────────────────────────────────────────────────────────────────┐│
│  │ TLB Flush Statistics (typical workload):                              ││
│  │ - Context switches: 100s per second, each flushes TLB                 ││
│  │ - Page table updates: 1000s per second during heavy allocation        ││
│  │ - Each flush: ~100-500 cycles, but destroys cached translations       ││
│  │ - TLB miss after flush: ~50-200 cycles per translation                ││
│  └───────────────────────────────────────────────────────────────────────┘│
│ │
└─────────────────────────────────────────────────────────────────────────────┘
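Steps (a)-(d) of the CR3 path map onto a handler like this; a sketch, assuming the global stats and tlb used in the hints, a current_shadow pointer, and a hypothetical find_or_create_shadow() that caches one shadow root per guest CR3 value:

// Guest wrote CR3: switch shadow roots and flush stale translations.
void handle_cr3_write(uint64_t new_guest_cr3_gpa) {
    stats.vm_exits++;

    // (a)+(b): reuse the shadow PT for this guest root, or build one
    // (write-protecting the guest's PT pages as part of construction).
    current_shadow = find_or_create_shadow(new_guest_cr3_gpa);

    // (c): on real hardware the VMM would now load CR3 with the
    // shadow root; the simulator just switches its pointer.

    // (d): without VPID-style tagging, every cached GVA->HPA entry
    // may belong to the old address space, so flush them all.
    tlb_flush_all(tlb);
    stats.tlb_flushes++;
}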
Why EPT Changed Everything
For comparison, understand what hardware assistance provides:
┌─────────────────────────────────────────────────────────────────────────────┐
│ SHADOW PT vs EPT COMPARISON │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ SHADOW PAGE TABLES: │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ Guest CR3 ─────────────┐ │ │
│ │ (not used by HW) │ │ │
│ │ ▼ │ │
│ │ Guest Page Tables │ │
│ │ (in guest memory) │ │
│ │ │ │ │
│ │ │ VMM intercepts writes │ │
│ │ ▼ │ │
│ │ Real CR3 ────────> Shadow Page Tables ────> Physical Memory │ │
│ │ (points here) (VMM maintained) │ │
│ │ │ │
│ │ Cost: VM EXIT on every PT modification, CR3 write, INVLPG │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ EPT (Extended Page Tables): │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ Guest CR3 ────────> Guest Page Tables ────> Guest Physical │ │
│ │ (guest controlled) (GVA -> GPA) Address │ │
│ │ │ │ │
│ │ │ EPT Walk │ │
│ │ │ (hardware) │ │
│ │ ▼ │ │
│ │ EPT Pointer ────────> EPT Tables ─────────> Host Physical │ │
│ │ (VMM controlled) (GPA -> HPA) Address │ │
│ │ │ │
│ │ Cost: Extra memory access during page walk (cached in TLB) │ │
│ │ NO VM EXIT for normal PT modifications! │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ Performance Impact: │
│ ┌──────────────────────┬─────────────────┬───────────────────────────┐ │
│ │ Operation │ Shadow PT │ EPT │ │
│ ├──────────────────────┼─────────────────┼───────────────────────────┤ │
│ │ Guest PT modification│ VM EXIT (~2000 │ No exit (0 cycles) │ │
│ │ │ cycles) │ │ │
│ │ CR3 write │ VM EXIT + flush │ No exit (VPID helps) │ │
│ │ INVLPG │ VM EXIT │ No exit (affects guest │ │
│ │ │ │ TLB only) │ │
│ │ Page walk (TLB miss) │ 1 walk (fast) │ 2D walk (up to 24 accesses│ │
│ │ │ │ worst case, cached) │ │
│ └──────────────────────┴─────────────────┴───────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Project Specification
What You Will Build
A software simulator that models shadow page table memory virtualization, including:
- Simulated guest memory with guest page tables
- VMM memory map tracking GPA to HPA mappings
- Shadow page table management with trap-and-update
- TLB simulation with hit/miss tracking
- Statistics collection for performance analysis
Functional Requirements
- Memory Subsystem:
- Allocate host physical memory pool (e.g., 256MB)
- Allocate guest physical memory from pool (e.g., 64MB)
- Track GPA -> HPA mappings
- Guest Page Table Management:
- Support single-level (for simplicity) or multi-level paging
- Allow guest to “allocate” pages and create mappings
- Detect writes to page table pages
- Shadow Page Table Management:
- Create shadow tables that map GVA -> HPA
- Update shadows when guest modifies its tables
- Mark guest PT pages as read-only in shadows
- TLB Simulation:
- Model a simple TLB (e.g., 64 entries, fully associative)
- Track hits and misses
- Flush on CR3 changes and INVLPG
- Statistics and Analysis:
- Count guest PT modifications (VM exits)
- Count TLB flushes
- Calculate estimated performance overhead
Non-Functional Requirements
- Educational: Output explains what’s happening at each step
- Configurable: Memory sizes, TLB size as parameters
- Deterministic: Same input produces same output
- Extensible: Easy to add multi-level paging
Real World Outcome
When complete, your simulator will produce output like this:
$ ./shadow_sim --guest-mem=64M --host-mem=256M guest_program.bin
=== Shadow Page Table Simulator ===
Configuration:
Guest Physical Memory: 64MB (16,384 pages)
Host Physical Memory: 256MB (65,536 pages)
Guest PT allocated at GPA 0x1000000-0x1003FFF
TLB: 64 entries, fully associative
Loading guest program: guest_program.bin (4KB)
Program loaded at GPA 0x0000 -> HPA 0x50000
=== Simulation Start ===
[CYCLE 1] Guest loads CR3 with GPA 0x1000000
[VMM] VM EXIT: CR3 write
[VMM] Guest page directory at GPA 0x1000000 -> HPA 0x60000
[VMM] Creating shadow page directory at HPA 0x70000
[VMM] Real CR3 set to HPA 0x70000
[VMM] Write-protecting guest PD at HPA 0x60000
[VMM] TLB flushed (full)
[VMM] VM ENTRY (resumed guest)
[STATS] VM exits so far: 1
[CYCLE 2] Guest writes PTE: GVA 0x1000 -> GPA 0x5000
[VMM] VM EXIT: Write to protected page (guest page table)
[VMM] Fault address: GPA 0x1000004 (PTE for GVA 0x1000)
[VMM] Decoding guest instruction: MOV [0x1000004], 0x5003
[VMM] Guest mapping: GVA 0x1000 -> GPA 0x5000 (R/W, Present)
[VMM] Looking up GPA 0x5000 in memory map...
[VMM] GPA 0x5000 -> HPA 0x8A000
[VMM] Updating shadow PTE: GVA 0x1000 -> HPA 0x8A000
[VMM] TLB invalidate: GVA 0x1000
[VMM] VM ENTRY (resumed guest)
[STATS] VM exits so far: 2
[CYCLE 3] Guest accesses GVA 0x1000 (read)
[CPU] TLB lookup: GVA 0x1000 -> MISS
[CPU] Shadow table walk: GVA 0x1000 -> HPA 0x8A000
[CPU] TLB fill: GVA 0x1000 -> HPA 0x8A000
[CPU] Memory access completed
[STATS] TLB misses: 1, TLB hits: 0
[CYCLE 4] Guest accesses GVA 0x1000 (read)
[CPU] TLB lookup: GVA 0x1000 -> HIT (HPA 0x8A000)
[CPU] Memory access completed
[STATS] TLB misses: 1, TLB hits: 1
[CYCLE 5] Guest writes PTE: GVA 0x2000 -> GPA 0x6000
[VMM] VM EXIT: Write to protected page
[VMM] Fault address: GPA 0x1000008 (PTE for GVA 0x2000)
[VMM] Guest mapping: GVA 0x2000 -> GPA 0x6000 (R/W, Present)
[VMM] GPA 0x6000 -> HPA 0x95000
[VMM] Updating shadow PTE: GVA 0x2000 -> HPA 0x95000
[VMM] TLB invalidate: GVA 0x2000
[VMM] VM ENTRY
[STATS] VM exits so far: 3
... (more cycles) ...
[CYCLE 1000] Guest changes CR3 (context switch simulation)
[VMM] VM EXIT: CR3 write
[VMM] Full TLB flush
[VMM] Shadow table switched
[STATS] VM exits so far: 1,492
=== Simulation Complete ===
Total cycles simulated: 1000
Guest instructions: ~5000
=== Performance Statistics ===
┌────────────────────────────────────┬─────────────┐
│ Metric │ Count │
├────────────────────────────────────┼─────────────┤
│ Guest page table modifications │ 1,247 │
│ Shadow table updates │ 1,247 │
│ CR3 writes (context switches) │ 89 │
│ INVLPG instructions │ 156 │
│ Total VM exits │ 1,492 │
│ TLB flushes (full) │ 89 │
│ TLB invalidations (single page) │ 1,403 │
│ TLB hits │ 3,412 │
│ TLB misses │ 588 │
│ TLB hit rate │ 85.3% │
└────────────────────────────────────┴─────────────┘
=== Performance Analysis ===
Estimated cycles lost to VM exits: 1,492 * 2000 = 2,984,000 cycles
Estimated cycles lost to TLB misses: 588 * 100 = 58,800 cycles
Total overhead: ~3,042,800 cycles (~6% of execution time at 1GHz)
=== Why EPT is Better ===
With EPT, PT modifications would cause 0 VM exits.
Only EPT violations (unmapped GPA access) would trap.
Estimated EPT overhead: ~50,000 cycles (for initial EPT faults only)
Shadow PT overhead vs EPT: 60x more expensive!
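The "=== Performance Analysis ===" lines above fall out of a trivial cost model; a sketch using the same (assumed) per-event cycle costs, so that print_overhead(1492, 588) reproduces the numbers shown:

#include <stdio.h>

#define EXIT_CYCLES     2000 // assumed cost of one VM exit/entry round trip
#define TLB_MISS_CYCLES  100 // assumed cost of one shadow-table walk

void print_overhead(unsigned long vm_exits, unsigned long tlb_misses) {
    unsigned long exit_cost = vm_exits * EXIT_CYCLES;
    unsigned long miss_cost = tlb_misses * TLB_MISS_CYCLES;
    printf("Estimated cycles lost to VM exits: %lu * %d = %lu cycles\n",
           vm_exits, EXIT_CYCLES, exit_cost);
    printf("Estimated cycles lost to TLB misses: %lu * %d = %lu cycles\n",
           tlb_misses, TLB_MISS_CYCLES, miss_cost);
    printf("Total overhead: ~%lu cycles\n", exit_cost + miss_cost);
}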
Questions to Guide Your Design
Work through these questions BEFORE writing code:
- Memory Layout: How will you organize your simulated physical memory? How do you track which HPAs are free vs allocated? How do you handle the GPA->HPA mapping table efficiently?
- Page Table Structure: Will you start with single-level paging (simpler) or jump to multi-level? What page size will you use? How will you represent PTEs in your simulator?
- Trap Detection: How do you detect writes to guest page tables? In a real system this is done by marking pages read-only; how will you simulate that?
- Shadow Synchronization: When the guest writes a PTE, what exact steps must you perform? How do you handle the case where the guest is creating a new page table?
- TLB Simulation: What TLB replacement policy will you use (LRU, random, FIFO)? How do you handle TLB flushes efficiently?
- Guest Program: How will you simulate guest execution? Will you interpret a simple instruction set, or script the memory operations?
- Statistics Collection: What metrics best illustrate the overhead of shadow page tables? How will you make the comparison to EPT compelling?
Thinking Exercise
Before writing code, work through this scenario on paper:
Setup:
- Guest physical memory: 16 pages (GPA 0x0000-0xFFFF)
- Host physical memory: 64 pages (HPA 0x00000-0x3FFFF)
- GPA->HPA map:
- GPA 0x0000 -> HPA 0x10000 (guest code)
- GPA 0x1000 -> HPA 0x20000 (guest page table)
- GPA 0x2000 -> HPA 0x25000 (free for mapping)
- GPA 0x3000 -> HPA 0x30000 (free for mapping)
- Initial state: Shadow PT at HPA 0x40000, all entries marked not present
Trace this sequence:
- Guest writes CR3 = 0x1000
- What does VMM do?
- What is the state of shadow PT?
- What is real CR3?
- Guest writes PTE at GPA 0x1000: entry[0] = 0x2003 (maps GVA 0x0000 to GPA 0x2000, present, R/W)
- Why does this trap?
- What does VMM write to shadow PT?
- Show the exact shadow PTE value
- Guest reads from GVA 0x0100
- Show the address translation (TLB miss path)
- What HPA is accessed?
- What TLB entry is created?
- Guest reads from GVA 0x0200
- Is this a TLB hit or miss?
- What HPA is accessed?
- Guest writes PTE at GPA 0x1000: entry[0] = 0x3003 (changes GVA 0x0000 to point to GPA 0x3000)
- What happens to the TLB?
- What does VMM update in shadow PT?
- If guest reads GVA 0x0100 now, what HPA is accessed?
Write out all the data structure states after each step.
Hints in Layers
If you’re stuck, reveal hints one at a time:
Hint 1: Basic Data Structures
Start with these structures:
#define PAGE_SIZE 4096
#define PAGE_SHIFT 12
// Host physical memory (simulated)
typedef struct {
uint8_t *memory; // mmap'd region
size_t size;
uint64_t *free_bitmap; // Track free pages
} host_memory_t;
// Guest physical address to host physical address mapping
typedef struct {
uint64_t gpa;
uint64_t hpa;
uint32_t flags; // Present, is_page_table, etc.
} gpa_mapping_t;
typedef struct {
gpa_mapping_t *mappings;
size_t count;
size_t guest_mem_size;
} vmm_memory_map_t;
// Page table entry (simplified single-level)
// Note: flags occupy the low 12 bits, as in real x86 PTEs, so a raw
// value like 0x2003 parses as flags=0x003, pfn=0x2 when cast later.
typedef struct {
    uint64_t flags : 12; // P, R/W, U/S, etc.
    uint64_t pfn : 40;   // Physical frame number
    uint64_t rsvd : 12;
} pte_t;
// Shadow page table state
typedef struct {
uint64_t guest_cr3_gpa; // Guest's CR3 value (GPA)
uint64_t shadow_cr3_hpa; // Shadow PT base (HPA)
pte_t *shadow_pt; // Pointer to shadow PT
} shadow_state_t;
// TLB entry
typedef struct {
uint64_t gva;
uint64_t hpa;
int valid;
uint32_t flags;
} tlb_entry_t;
typedef struct {
tlb_entry_t *entries;
size_t size;
size_t hits;
size_t misses;
} tlb_t;
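These structures (and the hints that follow) lean on a few constants that are never spelled out; one plausible set, with names following the hints and values that are my assumptions:

#define INVALID_HPA  ((uint64_t)-1)

// gpa_mapping_t.flags
#define MAP_PRESENT  (1u << 0)
#define MAP_PT_PAGE  (1u << 1)  // this frame holds a guest page table

// pte_t.flags -- mirror the x86 low bits
#define PTE_PRESENT  (1u << 0)
#define PTE_WRITABLE (1u << 1)
#define PTE_USER     (1u << 2)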
Hint 2: Memory Map Implementation
Create and manage the GPA->HPA mapping:
vmm_memory_map_t* create_memory_map(size_t guest_mem_size, host_memory_t *host) {
vmm_memory_map_t *map = calloc(1, sizeof(vmm_memory_map_t));
map->guest_mem_size = guest_mem_size;
size_t num_pages = guest_mem_size / PAGE_SIZE;
map->mappings = calloc(num_pages, sizeof(gpa_mapping_t));
map->count = num_pages;
// Allocate HPAs for each GPA
for (size_t i = 0; i < num_pages; i++) {
uint64_t gpa = i * PAGE_SIZE;
uint64_t hpa = allocate_host_page(host);
map->mappings[i].gpa = gpa;
map->mappings[i].hpa = hpa;
map->mappings[i].flags = MAP_PRESENT;
}
return map;
}
uint64_t gpa_to_hpa(vmm_memory_map_t *map, uint64_t gpa) {
size_t page_index = gpa / PAGE_SIZE;
if (page_index >= map->count) {
return INVALID_HPA; // GPA out of range
}
uint64_t page_offset = gpa & (PAGE_SIZE - 1);
return map->mappings[page_index].hpa + page_offset;
}
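allocate_host_page() is called above but never defined; a minimal first-fit sketch over the free bitmap in host_memory_t:

// Scan the bitmap for a free page; linear, but fine for a simulator.
uint64_t allocate_host_page(host_memory_t *host) {
    size_t num_pages = host->size / PAGE_SIZE;
    for (size_t i = 0; i < num_pages; i++) {
        uint64_t *word = &host->free_bitmap[i / 64];
        uint64_t  bit  = 1ULL << (i % 64);
        if (!(*word & bit)) {     // page i is free
            *word |= bit;         // mark it allocated
            return i * PAGE_SIZE; // its HPA
        }
    }
    return INVALID_HPA;           // pool exhausted
}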
Hint 3: Trap-and-Update Logic
The core shadow table update routine:
void handle_guest_pte_write(shadow_state_t *shadow,
vmm_memory_map_t *map,
uint64_t fault_gpa,
uint64_t new_value) {
// 1. Extract which GVA this PTE maps
uint64_t guest_pt_gpa = shadow->guest_cr3_gpa;
uint64_t pte_offset = fault_gpa - guest_pt_gpa;
uint64_t pte_index = pte_offset / sizeof(pte_t);
uint64_t gva = pte_index * PAGE_SIZE;
// 2. Apply the write to guest's page table (in guest memory)
uint64_t fault_hpa = gpa_to_hpa(map, fault_gpa);
*(uint64_t*)hpa_to_ptr(fault_hpa) = new_value;
// 3. Parse the new PTE
pte_t guest_pte = *(pte_t*)&new_value;
if (!(guest_pte.flags & PTE_PRESENT)) {
// Guest unmapped this page
shadow->shadow_pt[pte_index].flags &= ~PTE_PRESENT;
return;
}
// 4. Translate GPA to HPA
uint64_t target_gpa = guest_pte.pfn << PAGE_SHIFT;
uint64_t target_hpa = gpa_to_hpa(map, target_gpa);
// 5. Build shadow PTE
pte_t shadow_pte;
shadow_pte.pfn = target_hpa >> PAGE_SHIFT;
shadow_pte.flags = guest_pte.flags; // Copy permissions
// 6. If target is a page table page, mark read-only in shadow
if (is_page_table_page(map, target_gpa)) {
shadow_pte.flags &= ~PTE_WRITABLE;
}
shadow->shadow_pt[pte_index] = shadow_pte;
printf("[VMM] Shadow PTE[%lu]: GVA 0x%lx -> HPA 0x%lx\n",
pte_index, gva, target_hpa);
}
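This routine uses two helpers it never defines; plausible sketches, assuming a global host_memory_t *host for the simulated pool and the MAP_PT_PAGE flag suggested earlier:

#include <stdbool.h>

// An HPA is just an offset into the simulated host pool.
void* hpa_to_ptr(uint64_t hpa) {
    return host->memory + hpa;
}

// A GPA holds a page table iff the VMM flagged its frame as one.
bool is_page_table_page(vmm_memory_map_t *map, uint64_t gpa) {
    size_t idx = gpa / PAGE_SIZE;
    return idx < map->count && (map->mappings[idx].flags & MAP_PT_PAGE);
}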
Hint 4: TLB Simulation
Simple fully-associative TLB:
tlb_t* create_tlb(size_t size) {
tlb_t *tlb = calloc(1, sizeof(tlb_t));
tlb->entries = calloc(size, sizeof(tlb_entry_t));
tlb->size = size;
return tlb;
}
// Returns HPA or INVALID_HPA if miss
uint64_t tlb_lookup(tlb_t *tlb, uint64_t gva) {
uint64_t page_gva = gva & ~(PAGE_SIZE - 1);
for (size_t i = 0; i < tlb->size; i++) {
if (tlb->entries[i].valid && tlb->entries[i].gva == page_gva) {
tlb->hits++;
uint64_t offset = gva & (PAGE_SIZE - 1);
return tlb->entries[i].hpa + offset;
}
}
tlb->misses++;
return INVALID_HPA;
}
void tlb_insert(tlb_t *tlb, uint64_t gva, uint64_t hpa, uint32_t flags) {
uint64_t page_gva = gva & ~(PAGE_SIZE - 1);
uint64_t page_hpa = hpa & ~(PAGE_SIZE - 1);
// Simple replacement: prefer an empty slot, else evict at random
size_t slot = rand() % tlb->size; // needs <stdlib.h>; random fallback
for (size_t i = 0; i < tlb->size; i++) {
    if (!tlb->entries[i].valid) {
        slot = i;
        break;
    }
}
tlb->entries[slot].gva = page_gva;
tlb->entries[slot].hpa = page_hpa;
tlb->entries[slot].flags = flags;
tlb->entries[slot].valid = 1;
}
void tlb_invalidate_page(tlb_t *tlb, uint64_t gva) {
uint64_t page_gva = gva & ~(PAGE_SIZE - 1);
for (size_t i = 0; i < tlb->size; i++) {
if (tlb->entries[i].gva == page_gva) {
tlb->entries[i].valid = 0;
}
}
}
void tlb_flush_all(tlb_t *tlb) {
for (size_t i = 0; i < tlb->size; i++) {
tlb->entries[i].valid = 0;
}
}
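A quick usage trace of these routines (the addresses are arbitrary; PTE_PRESENT as suggested earlier):

tlb_t *tlb = create_tlb(64);
uint64_t hpa = tlb_lookup(tlb, 0x1234);        // miss: INVALID_HPA, misses=1
tlb_insert(tlb, 0x1234, 0x8A000, PTE_PRESENT); // caches page GVA 0x1000
hpa = tlb_lookup(tlb, 0x1234);                 // hit: 0x8A000 + 0x234
tlb_invalidate_page(tlb, 0x1000);              // the next lookup misses again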
Hint 5: Simulation Loop Structure
Main simulation driver:
typedef enum {
OP_LOAD_CR3, // Guest sets CR3
OP_WRITE_PTE, // Guest modifies page table
OP_READ_MEM, // Guest reads from GVA
OP_WRITE_MEM, // Guest writes to GVA
OP_INVLPG, // Guest invalidates TLB entry
} guest_op_t;
typedef struct {
guest_op_t op;
uint64_t addr; // GVA or GPA depending on op
uint64_t value; // For writes
} guest_instruction_t;
void simulate(guest_instruction_t *program, size_t count) {
for (size_t i = 0; i < count; i++) {
guest_instruction_t *inst = &program[i];
switch (inst->op) {
case OP_LOAD_CR3:
stats.vm_exits++;
handle_cr3_write(inst->value);
tlb_flush_all(tlb);
stats.tlb_flushes++;
break;
case OP_WRITE_PTE:
stats.vm_exits++;
stats.pt_modifications++;
handle_guest_pte_write(shadow, map, inst->addr, inst->value);
tlb_invalidate_page(tlb, pte_addr_to_gva(inst->addr));
break;
case OP_READ_MEM:
case OP_WRITE_MEM: { // braces: a declaration can't directly follow a case label in C
    // Try TLB first
    uint64_t hpa = tlb_lookup(tlb, inst->addr);
    if (hpa == INVALID_HPA) {
        // TLB miss - walk shadow tables
        hpa = shadow_table_walk(shadow, inst->addr);
        tlb_insert(tlb, inst->addr, hpa, get_pte_flags(shadow, inst->addr));
    }
    // Perform the access
    if (inst->op == OP_READ_MEM) {
        do_read(hpa);
    } else {
        do_write(hpa, inst->value);
    }
    break;
}
case OP_INVLPG:
stats.vm_exits++;
tlb_invalidate_page(tlb, inst->addr);
break;
}
}
}
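To feed this loop from the text workloads used in the Testing section ("CR3 1000", "WRITE_PTE 0 2003", ...), a small parser sketch; the mapping of operands onto addr/value follows the simulate() loop above and is otherwise my assumption:

#include <stdio.h>
#include <string.h>

// Parse one workload line into a guest instruction.
// Returns 0 on success, -1 for blank/comment/unknown lines.
int parse_line(const char *line, guest_instruction_t *out) {
    char op[16];
    unsigned long long a = 0, v = 0;
    if (sscanf(line, "%15s %llx %llx", op, &a, &v) < 1 || op[0] == '#')
        return -1;
    if      (!strcmp(op, "CR3"))       { out->op = OP_LOAD_CR3;  out->value = a; }
    else if (!strcmp(op, "WRITE_PTE")) { out->op = OP_WRITE_PTE; out->addr = a; out->value = v; }
    else if (!strcmp(op, "READ"))      { out->op = OP_READ_MEM;  out->addr = a; }
    else if (!strcmp(op, "WRITE"))     { out->op = OP_WRITE_MEM; out->addr = a; out->value = v; }
    else if (!strcmp(op, "INVLPG"))    { out->op = OP_INVLPG;    out->addr = a; }
    else return -1;
    return 0;
}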
Hint 6: Multi-Level Page Tables (Advanced)
Extend to 2-level paging:
// Two-level: Page Directory -> Page Table -> Page
typedef struct {
uint64_t guest_pd_gpa; // Guest's CR3 points here
pte_t *shadow_pd; // Shadow page directory (HPA)
pte_t **shadow_pts; // Array of shadow page tables
} shadow_state_2level_t;
uint64_t shadow_walk_2level(shadow_state_2level_t *shadow, uint64_t gva) {
uint64_t pd_index = (gva >> 22) & 0x3FF; // Top 10 bits
uint64_t pt_index = (gva >> 12) & 0x3FF; // Next 10 bits
uint64_t offset = gva & 0xFFF; // Bottom 12 bits
// Check PD entry
pte_t pde = shadow->shadow_pd[pd_index];
if (!(pde.flags & PTE_PRESENT)) {
return INVALID_HPA; // Page fault
}
// Get page table
pte_t *pt = shadow->shadow_pts[pd_index];
pte_t pte = pt[pt_index];
if (!(pte.flags & PTE_PRESENT)) {
return INVALID_HPA; // Page fault
}
return (pte.pfn << PAGE_SHIFT) + offset;
}
void handle_guest_pd_write(shadow_state_2level_t *shadow,
vmm_memory_map_t *map,
uint64_t fault_gpa,
uint64_t new_value) {
// Guest is creating/modifying a page directory entry
// This points to a page table, which must also be write-protected
uint64_t pde_index = (fault_gpa - shadow->guest_pd_gpa) / sizeof(pte_t);
pte_t guest_pde = *(pte_t*)&new_value;
if (guest_pde.flags & PTE_PRESENT) {
uint64_t pt_gpa = guest_pde.pfn << PAGE_SHIFT;
uint64_t pt_hpa = gpa_to_hpa(map, pt_gpa);
// Mark this page table page as read-only so we trap PT writes
mark_page_readonly_in_shadow(shadow, pt_gpa);
// Allocate shadow page table if needed
if (shadow->shadow_pts[pde_index] == NULL) {
shadow->shadow_pts[pde_index] = allocate_shadow_pt();
}
// Populate shadow PT from guest PT
sync_shadow_pt_from_guest(shadow, pde_index, pt_gpa, pt_hpa);
}
}
Solution Architecture
High-Level Design
┌─────────────────────────────────────────────────────────────────────────────┐
│ SHADOW PT SIMULATOR │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Guest │ │ VMM │ │ Host │ │
│ │ Program │──────>│ Handler │──────>│ Memory │ │
│ │ Loader │ │ │ │ Pool │ │
│ └──────────────┘ └──────┬───────┘ └──────────────┘ │
│ │ │
│ ┌────────────┼────────────┐ │
│ ▼ ▼ ▼ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Memory │ │ Shadow │ │ TLB │ │
│ │ Map │ │ PT │ │ Sim │ │
│ │ (GPA→HPA)│ │ Manager │ │ │ │
│ └──────────┘ └──────────┘ └──────────┘ │
│ │ │ │ │
│ └────────────┼────────────┘ │
│ ▼ │
│ ┌──────────────┐ │
│ │ Statistics │ │
│ │ Collector │ │
│ └──────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────┐ │
│ │ Report │ │
│ │ Generator │ │
│ └──────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Host Memory Pool | Simulates physical RAM | Use mmap for large allocation; bitmap for free tracking |
| Memory Map | GPA to HPA translation | Array indexed by GPA page number for O(1) lookup |
| Shadow PT Manager | Create/update shadow tables | Start with single-level, extend to multi-level |
| TLB Simulator | Model caching behavior | Fully associative, configurable size, LRU or random |
| VMM Handler | Process guest operations | State machine for trap-and-emulate |
| Statistics | Track performance metrics | VM exits, TLB stats, shadow updates |
| Report Generator | Format output | Verbose during sim, summary at end |
Data Flow
Guest Operation
│
▼
┌─────────────────┐
│ Is it a trap? │─── No ──> TLB Lookup ──> Hit? ──> Access Memory
│ (CR3/PT write) │ │
└────────┬────────┘ │ Miss
│ Yes ▼
▼ Shadow Walk
┌─────────────────┐ │
│ VMM Handler │ ▼
│ - Update map │ TLB Insert
│ - Sync shadow │ │
│ - Update TLB │ ▼
└─────────────────┘ Access Memory
│
▼
Resume Guest
Algorithm: Shadow Page Table Update
PROCEDURE: UpdateShadowOnGuestPTEWrite(fault_gpa, new_pte_value)
INPUT: fault_gpa - GPA where guest wrote
new_pte_value - value guest wrote
OUTPUT: Updated shadow page table
1. INCREMENT stats.vm_exits
2. INCREMENT stats.pt_modifications
3. // Apply write to guest memory (emulate the instruction)
fault_hpa := gpa_to_hpa(fault_gpa)
WRITE new_pte_value TO fault_hpa
4. // Determine which GVA this PTE maps
pte_index := (fault_gpa - guest_pt_base) / sizeof(PTE)
gva := pte_index * PAGE_SIZE
5. // Parse guest PTE
guest_pte := parse_pte(new_pte_value)
6. IF NOT guest_pte.present THEN
shadow_pt[pte_index].present := false
tlb_invalidate(gva)
RETURN
7. // Translate GPA to HPA
target_gpa := guest_pte.pfn * PAGE_SIZE
target_hpa := gpa_to_hpa(target_gpa)
8. IF target_hpa == INVALID THEN
// Guest mapped non-existent memory - inject fault to guest
inject_page_fault_to_guest(gva)
RETURN
9. // Build shadow PTE
shadow_pte.pfn := target_hpa / PAGE_SIZE
shadow_pte.flags := guest_pte.flags
10. // If target is a PT page, make shadow read-only
IF is_page_table_page(target_gpa) THEN
shadow_pte.flags.writable := false
11. shadow_pt[pte_index] := shadow_pte
12. tlb_invalidate(gva)
13. LOG "Shadow: GVA %x -> HPA %x", gva, target_hpa
Implementation Guide
Development Environment Setup
# Create project structure
mkdir -p shadow-pt-sim/{src,include,tests,guest-programs}
cd shadow-pt-sim
# Required: C compiler, make
# Optional: gdb for debugging, graphviz for diagrams
# Create Makefile
cat > Makefile << 'EOF'
CC = gcc
CFLAGS = -Wall -Wextra -g -O2 -std=c11
LDFLAGS = -lm
SRC = src/main.c src/memory.c src/shadow.c src/tlb.c src/vmm.c src/stats.c
OBJ = $(SRC:.c=.o)
TARGET = shadow_sim
all: $(TARGET)
$(TARGET): $(OBJ)
$(CC) $(CFLAGS) -o $@ $^ $(LDFLAGS)
%.o: %.c
$(CC) $(CFLAGS) -c -o $@ $<
clean:
rm -f $(OBJ) $(TARGET)
test: $(TARGET)
./$(TARGET) --test
.PHONY: all clean test
EOF
Project Structure
shadow-pt-sim/
├── src/
│ ├── main.c # Entry point, CLI parsing
│ ├── memory.c # Host memory pool, GPA->HPA map
│ ├── shadow.c # Shadow page table management
│ ├── tlb.c # TLB simulation
│ ├── vmm.c # VM exit handling, guest program execution
│ └── stats.c # Statistics collection and reporting
├── include/
│ ├── types.h # Common type definitions
│ ├── memory.h # Memory management declarations
│ ├── shadow.h # Shadow PT declarations
│ ├── tlb.h # TLB declarations
│ └── vmm.h # VMM declarations
├── tests/
│ ├── test_basic.c # Basic single-mapping test
│ ├── test_multimap.c # Multiple mappings
│ └── test_context.c # Context switch simulation
├── guest-programs/
│ └── workload.txt # Sample guest workload specification
├── Makefile
└── README.md
Implementation Phases
Phase 1: Memory Infrastructure (Days 1-3)
Goals:
- Implement host physical memory pool
- Implement GPA to HPA mapping
- Basic memory read/write operations
Tasks:
- Create `memory.c` with `create_host_memory()` and `allocate_host_page()`
- Create GPA->HPA mapping structure and lookup function
- Test: Allocate guest memory, verify mapping
Checkpoint: Can allocate 64MB guest memory from 256MB host pool, translate GPA to HPA.
Phase 2: Single-Level Shadow Tables (Days 4-7)
Goals:
- Implement single-level page table structure
- Create shadow page table
- Handle basic PTE writes
Tasks:
- Define PTE structure with flags
- Create shadow PT allocation and initialization
- Implement `handle_cr3_write()` - create shadow from guest PT
- Implement `handle_pte_write()` - update shadow on guest modification
- Test: Guest creates mapping, shadow reflects it
Checkpoint: Guest can set CR3, write PTEs, shadow tables are correctly updated.
Phase 3: TLB and Address Translation (Days 8-10)
Goals:
- Implement TLB simulation
- Implement shadow table walk
- Complete address translation path
Tasks:
- Create TLB structure with lookup/insert/invalidate
- Implement `shadow_table_walk()` for the TLB miss path
- Connect TLB to shadow PT
- Track TLB hits/misses
Checkpoint: Memory accesses use TLB, misses walk shadow tables, statistics accurate.
Phase 4: Guest Program Execution (Days 11-14)
Goals:
- Implement guest program parser/executor
- Complete VM exit handling
- Add INVLPG support
Tasks:
- Create simple guest operation language (CR3 load, PTE write, mem read/write, INVLPG)
- Implement simulation loop
- Add CR3 change handling (context switch)
- Add INVLPG handling
Checkpoint: Can run scripted guest workloads, all VM exits handled correctly.
Phase 5: Statistics and Polish (Days 15-17)
Goals:
- Comprehensive statistics collection
- Performance analysis output
- EPT comparison
Tasks:
- Implement detailed statistics tracking
- Create summary report format
- Calculate estimated cycle overhead
- Add EPT comparison analysis
- Clean up output, add verbose mode
Checkpoint: Full statistics, compelling comparison to EPT, clean output.
Testing Strategy
Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | Test individual functions | PTE parsing, TLB operations |
| Integration Tests | Test component interaction | Shadow update triggers TLB invalidate |
| Workload Tests | Test realistic scenarios | Boot simulation, context switch heavy |
| Regression Tests | Prevent bugs from returning | Fixed mapping bug stays fixed |
Critical Test Cases
- Basic Mapping Test:
  CR3 = 0x1000
  PTE[0] = 0x2003   (GVA 0 -> GPA 0x2000, present, R/W)
  READ GVA 0x100
  Expected: Read from HPA corresponding to GPA 0x2100
- Mapping Change Test:
  CR3 = 0x1000
  PTE[0] = 0x2003
  READ GVA 0x100    # From GPA 0x2100
  PTE[0] = 0x3003   # Change mapping
  READ GVA 0x100    # Should be from GPA 0x3100
  Verify: TLB was invalidated, new HPA accessed
- Context Switch Test:
  CR3 = 0x1000      (Process 1)
  PTE[0] = 0x2003
  READ GVA 0
  CR3 = 0x4000      (Process 2 - different PT)
  PTE[0] = 0x5003
  READ GVA 0        # Should be from different HPA
  Verify: Full TLB flush on CR3 change
- Write Protection Test:
  CR3 = 0x1000      (PT at GPA 0x1000)
  READ from GPA 0x1000   # Should succeed
  WRITE to GPA 0x1000    # Should trap (PT page is protected)
  Verify: VM exit occurs on PT write
- TLB Hit Rate Test:
  PTE[0] = 0x2003
  Repeat 100x: READ GVA 0x100
  Expected: 1 TLB miss, 99 TLB hits
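Each case can be automated as an assertion; for example, the hit-rate test (a sketch: reset_simulator() and exec_op(op, addr, value) are hypothetical wrappers around the simulate() loop, and stats is the global counter struct):

#include <assert.h>

void test_tlb_hit_rate(void) {
    reset_simulator();                 // hypothetical: fresh TLB, shadow, stats
    exec_op(OP_LOAD_CR3, 0, 0x1000);
    exec_op(OP_WRITE_PTE, 0, 0x2003);  // map GVA 0 -> GPA 0x2000
    for (int i = 0; i < 100; i++)
        exec_op(OP_READ_MEM, 0x100, 0);
    assert(stats.tlb_misses == 1);
    assert(stats.tlb_hits == 99);
}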
Test Data
# Create test workloads
cat > guest-programs/basic.txt << 'EOF'
# Basic mapping test
CR3 1000
WRITE_PTE 0 2003
READ 100
READ 200
WRITE 150 DEADBEEF
READ 150
EOF
cat > guest-programs/context_switch.txt << 'EOF'
# Context switch simulation
CR3 1000
WRITE_PTE 0 2003
WRITE_PTE 1 3003
READ 100
READ 1100
CR3 4000
WRITE_PTE 0 5003
READ 100
CR3 1000
READ 100
EOF
Common Pitfalls & Debugging
Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Not write-protecting PT pages | Guest can modify PT without VMM seeing | Mark PT pages read-only in shadow when creating |
| Forgetting TLB invalidation | Stale translations used after PT update | Always invalidate affected GVA on PTE change |
| Wrong GPA->HPA for PT access | Shadow points to wrong memory | Double-check that GPA of PT page maps to correct HPA |
| Off-by-one in PTE index | Wrong GVA affected | Verify PTE index calculation: (fault_gpa - pt_base) / 8 |
| Not handling not-present PTEs | Crash on access to unmapped page | Check present bit before using PFN |
| TLB not flushed on CR3 change | Process sees other process’s memory | Flush entire TLB when CR3 changes |
Debugging Strategies
- Trace Everything Initially:
  #define TRACE(fmt, ...) fprintf(stderr, "[TRACE] " fmt "\n", ##__VA_ARGS__)

  void handle_pte_write(uint64_t gpa, uint64_t value) {
      TRACE("PTE write: GPA=0x%lx, value=0x%lx", gpa, value);
      // ... rest of function
  }
- Dump State After Each Operation:
  void dump_shadow_state(shadow_state_t *s) {
      printf("Shadow PT State:\n");
      printf("  Guest CR3 GPA: 0x%lx\n", s->guest_cr3_gpa);
      printf("  Shadow CR3 HPA: 0x%lx\n", s->shadow_cr3_hpa);
      for (int i = 0; i < 16; i++) {
          if (s->shadow_pt[i].flags & PTE_PRESENT) {
              printf("  [%d] GVA 0x%x -> HPA 0x%lx (flags=0x%x)\n",
                     i, i * PAGE_SIZE,
                     s->shadow_pt[i].pfn << PAGE_SHIFT,
                     s->shadow_pt[i].flags);
          }
      }
  }
- Verify Round-Trip:
  // After updating shadow, verify translation works
  uint64_t test_hpa = shadow_table_walk(shadow, gva);
  uint64_t expected_hpa = gpa_to_hpa(map, target_gpa);
  assert(test_hpa == expected_hpa);
- Statistics Sanity Check:
  // At end of simulation
  assert(stats.pt_modifications == stats.shadow_updates);
  assert(stats.tlb_hits + stats.tlb_misses == stats.total_mem_accesses);
Extensions & Challenges
Beginner Extensions
- Add verbose mode: Show every operation step by step
- Multiple guest processes: Simulate multiple address spaces
- Page fault injection: When guest accesses unmapped GVA
Intermediate Extensions
- 2-level paging: Implement PD + PT hierarchy
- Large page support: 2MB pages (skip PT level)
- LRU TLB: Replace random eviction with LRU
- Accessed/Dirty bit emulation: Track page access patterns
Advanced Extensions
- 4-level x86-64 paging: Full PML4 + PDPT + PD + PT
- ASID/VPID simulation: Tag TLB entries to avoid full flush
- EPT simulator: Implement for comparison
- Performance modeling: Estimate actual cycle costs
- Real guest code: Load simple ELF and simulate execution
Self-Assessment Checklist
Before considering this project complete, verify:
Understanding
- I can explain why shadow page tables require trapping PT modifications
- I can draw the three address spaces (GVA, GPA, HPA) and translation path
- I can explain why CR3 writes must cause VM exits
- I understand why guest page table pages are marked read-only in the shadow tables
- I can explain the TLB coherency requirements
- I understand why EPT is dramatically more efficient
Implementation
- GPA to HPA mapping works correctly
- Shadow table updates on guest PTE write
- TLB hits and misses tracked accurately
- CR3 changes handled with TLB flush
- Statistics match expected values for test workloads
Growth
- I can estimate the overhead of shadow PT vs EPT
- I could extend this to multi-level paging
- I understand how real hypervisors (KVM, Xen) handle this
The Interview Questions They’ll Ask
After completing this project, you’ll be ready for these questions:
- “How do hypervisors virtualize memory?”
- They want: Explain GVA->GPA->HPA, shadow PT or EPT, why isolation matters
- Bonus: Mention performance implications of each approach
- “What is a shadow page table?”
- They want: VMM-maintained table that maps GVA directly to HPA, updated on guest PT modifications
- Explain the trap mechanism
- “Why was EPT/NPT such a big improvement?”
- They want: Eliminated VM exits for PT modifications, moved GPA->HPA to hardware
- Quantify: From 1000s of exits to nearly zero for memory management
- “How does a hypervisor detect when a guest modifies its page tables?”
- They want: Write-protect PT pages in shadow, trap on write, emulate and sync
- Explain the VM exit/entry cost
- “What is TLB shootdown and why does it matter for virtualization?”
- They want: Invalidating TLB entries across CPUs, even more expensive in VM context
- Discuss VPID/ASID as mitigation
- “How would you debug a memory corruption issue in a VM?”
- They want: Understanding of memory layers, ability to trace translation, check shadow consistency
- Mention tools like VM introspection
Real-World Connections
Industry Applications
- Cloud providers (AWS, GCP, Azure): Run millions of VMs, memory virtualization efficiency is critical
- KVM/QEMU: Linux’s hypervisor, started with shadow PT, now uses EPT
- VMware: Pioneered shadow page tables in VMware Workstation (late 1990s)
- Security research: Hypervisor-based security tools need deep understanding
- Embedded virtualization: Where EPT isn’t available, shadow PT still used
Related Open Source Projects
- KVM source code: `arch/x86/kvm/mmu/` - production shadow/EPT implementation
- Xen: `xen/arch/x86/mm/shadow/` - shadow page table code
- QEMU: User-space component that works with KVM
- bhyve: FreeBSD hypervisor, good learning codebase
- Bareflank: Educational hypervisor project
Historical Context
- 1974: Popek & Goldberg formal requirements for virtualization
- 1998: VMware Workstation uses binary translation + shadow PT
- 2005: Intel VT-x released (CPU virtualization)
- 2007: AMD NPT (Nested Page Tables) released
- 2008: Intel EPT released (memory virtualization)
- Today: Shadow PT still used for nested virtualization edge cases
Resources
Essential Reading
- OSTEP Chapters 18-20: Virtual Memory, Paging, TLBs
- CS:APP Chapter 9: Virtual Memory
- Intel SDM Vol. 3A Chapter 4: Paging
- Intel SDM Vol. 3C Chapters 28-29: EPT and Memory Virtualization
Papers
- “Formal Requirements for Virtualizable Third Generation Architectures” - Popek & Goldberg (1974)
- “A Comparison of Software and Hardware Techniques for x86 Virtualization” - Adams & Agesen (2006)
- “Memory Resource Management in VMware ESX Server” - Waldspurger (2002)
Online Resources
- MMU Virtualization: Shadow Page Tables
- Understanding Memory Virtualization
- KVM MMU Documentation
- Hypervisor From Scratch - Part 4
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Virtual memory fundamentals | Operating Systems: Three Easy Pieces | Ch. 18-20 |
| Address translation hardware | Computer Systems: A Programmer’s Perspective | Ch. 9 |
| x86 paging details | Intel SDM Volume 3A | Ch. 4 |
| Hypervisor architecture | The Definitive Guide to the Xen Hypervisor | Ch. 8 |
| Memory virtualization | Intel SDM Volume 3C | Ch. 28 |
| Low-level memory management | Understanding the Linux Kernel | Ch. 2 |
Related Projects in This Series
- Previous: P04 (Simple JIT Compiler) - Dynamic code generation
- Next: P06 (User-Space Memory Mapper) - User-space memory management with mmap
- Later: P12 (EPT Implementation) - Hardware-assisted memory virtualization
Learning Milestones
Track your progress through these milestones:
Milestone 1: Single-Level Paging Simulated (Days 1-7)
- Host memory pool working
- GPA->HPA mapping working
- Single-level shadow PT created
- Basic PTE write handling
- Verification: Guest can create one mapping, access works
Milestone 2: Shadow Table Updates on PT Writes (Days 8-10)
- Trap mechanism simulated
- Shadow updates correctly on guest PTE modification
- TLB invalidation on mapping change
- Verification: Changing a mapping changes where accesses go
Milestone 3: Multi-Level Paging (Days 11-14) - Stretch
- 2-level PT hierarchy implemented
- PD and PT pages both protected
- Recursive PT walks work
- Verification: More realistic memory layout
Milestone 4: Statistics Show VM Exit Cost (Days 15-16)
- All VM exits counted
- TLB statistics accurate
- Performance overhead calculated
- Verification: Numbers match expected for test workloads
Milestone 5: Compare to EPT Simulation (Day 17) - Stretch
- EPT simulation added (or theoretical comparison)
- Side-by-side statistics
- Clear demonstration of why EPT is better
- Verification: Compelling performance comparison output
Submission / Completion Criteria
Minimum Viable Completion:
- Single-level shadow page tables work
- Guest can create mappings, access memory
- VM exits counted on PT modifications
- Basic TLB simulation with hit/miss tracking
- Summary statistics at end
Full Completion:
- Multi-level paging (at least 2-level)
- TLB coherency correct (flush on CR3, invalidate on PTE change)
- INVLPG handling
- Comprehensive statistics and analysis
- Clean, educational output
Excellence (Going Above & Beyond):
- Full 4-level x86-64 paging simulation
- EPT simulation for comparison
- VPID/ASID TLB optimization
- Realistic guest workload generation
- Performance modeling with estimated cycle costs
- Visualization of address translation
This guide was expanded from HYPERVISOR_VIRTUALIZATION_DEEP_DIVE_PROJECTS.md. For the complete learning path, see the project index.