Project 6: User-Space Memory Mapper with Protection
Build a user-space memory allocator that simulates guest physical memory using mmap, with support for memory-mapped I/O regions and access permission enforcement.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Intermediate (Level 2) |
| Time Estimate | 1-2 weeks |
| Language | C (alternatives: Rust) |
| Prerequisites | Linux system programming basics, understanding of virtual memory |
| Key Topics | mmap, signal handling (SIGSEGV), memory protection, MMIO emulation, dirty page tracking |
1. Learning Objectives
By completing this project, you will:
- Master Linux memory mapping APIs: Understand how mmap(), mprotect(), and munmap() work at a deep level, including the relationship between virtual addresses and physical pages
- Implement signal-based memory trapping: Learn how to use SIGSEGV handlers to intercept memory accesses and emulate device behavior
- Build foundational VM memory management: Create the memory layer that every hypervisor needs - the same patterns used in QEMU, Firecracker, and cloud-hypervisor
- Understand memory hot-plug and dirty tracking: Implement the mechanisms required for live VM migration and dynamic memory allocation
2. Theoretical Foundation
2.1 Core Concepts
The Memory Mapping Problem in Virtualization
When you run a virtual machine, the guest operating system believes it has access to physical RAM starting at address 0. But that’s an illusion - the guest’s “physical memory” is actually backed by memory in the host process. This mapping is fundamental to all virtualization:
┌──────────────────────────────────────────────────────────────────────┐
│ Guest's View │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ 0x00000000 Guest RAM 0x3FFFFFFF │ 0x40000000 MMIO │ │
│ │ ▼ ▼ ▼ │ ▼ │ │
│ │ ████████████████████████████████████ │ ░░░░░░ │ │
│ │ Real memory the guest can read/write │ Device regs │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │ │ │
│ ▼ ▼ │
└──────────────────────────────────────────────────────────────────────┘
│ │
│ Address Translation │
▼ ▼
┌──────────────────────────────────────────────────────────────────────┐
│ Host's Reality │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ mmap'd region at 0x7f0000000000 │ │
│ │ ▼ │ │
│ │ ████████████████████████████████████ PROT_NONE region │ │
│ │ Anonymous mapping (PROT_READ|PROT_WRITE) triggers SIGSEGV │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ VMM Process Virtual Address Space │
└──────────────────────────────────────────────────────────────────────┘
Understanding mmap() for VM Memory
The mmap() system call is the foundation of user-space memory management:
void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);
For VM memory, we typically use:
- addr = NULL - let the kernel choose the address (or use MAP_FIXED for specific placement)
- prot = PROT_READ | PROT_WRITE - guest RAM needs read/write access
- flags = MAP_PRIVATE | MAP_ANONYMOUS - private memory not backed by a file
- fd = -1, offset = 0 - no file backing (anonymous mapping)
Key insight: When you mmap() anonymous memory, the kernel doesn’t actually allocate physical pages immediately. Pages are allocated on first access (demand paging). This is crucial for VM memory efficiency.
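To make this concrete, here is a minimal sketch of the allocation pattern described above (the 512MB size is arbitrary):
#include <sys/mman.h>
#include <stdio.h>

#define GUEST_RAM_SIZE (512ULL * 1024 * 1024)

int main(void) {
    // Anonymous private mapping: no file backing, pages appear on first touch
    void *guest_ram = mmap(NULL, GUEST_RAM_SIZE,
                           PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (guest_ram == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    // The virtual range exists now; physical pages are allocated lazily
    ((char *)guest_ram)[0] = 0x42; // first touch faults in a single page
    printf("Guest RAM at %p\n", guest_ram);
    munmap(guest_ram, GUEST_RAM_SIZE);
    return 0;
}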
Memory-Mapped I/O (MMIO) Emulation
Real hardware devices expose their registers through memory addresses. When a CPU accesses these addresses, it’s not accessing RAM - it’s communicating with the device. In virtualization, we need to trap these accesses and emulate the device behavior:
Guest writes to 0x10000000 (UART data register)
│
▼
┌───────────────────┐
│ MMU Translation │
│ GPA → HPA │
└───────────────────┘
│
▼
┌───────────────────┐
│ Page Fault! │ ← Region marked PROT_NONE
│ SIGSEGV raised │
└───────────────────┘
│
▼
┌───────────────────┐
│ Signal Handler │
│ - Decode access │
│ - Identify device │
│ - Emulate op │
│ - Resume guest │
└───────────────────┘
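A sketch of the trap setup side, assuming a page-aligned device window: because the region is mapped PROT_NONE, every load or store to it faults into the SIGSEGV handler.
#include <sys/mman.h>

#define MMIO_SIZE 0x1000 // one 4KB page for the emulated device

// Reserve an address range that can never be accessed directly;
// every guest access to it raises SIGSEGV, which the VMM handles.
void *map_mmio_window(void) {
    return mmap(NULL, MMIO_SIZE, PROT_NONE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}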
Dirty Page Tracking for Live Migration
Live migration moves a running VM from one host to another. This requires:
- Knowing which memory pages have been modified (dirty)
- Transferring only dirty pages in subsequent rounds
- Eventually pausing the VM when few pages remain dirty
The technique: Mark all pages read-only with mprotect(). When guest writes, we get a SIGSEGV, record the page as dirty, mark it writable, and resume.
┌─────────────────────────────────────────────────────────────────┐
│ Dirty Page Tracking │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Initial state: All pages PROT_READ only │
│ ┌────┬────┬────┬────┬────┬────┬────┬────┐ │
│ │ R │ R │ R │ R │ R │ R │ R │ R │ (read-only) │
│ └────┴────┴────┴────┴────┴────┴────┴────┘ │
│ │
│ Guest writes to page 2: │
│ - SIGSEGV fired │
│ - Handler marks page 2 in dirty bitmap │
│ - Handler calls mprotect(page2, PROT_READ|PROT_WRITE) │
│ │
│ After some writes: │
│ ┌────┬────┬────┬────┬────┬────┬────┬────┐ │
│ │ R │ R │ RW │ R │ RW │ RW │ R │ R │ │
│ └────┴────┴────┴────┴────┴────┴────┴────┘ │
│ ▲ ▲ ▲ │
│ │ │ │ │
│ Dirty bitmap: [0, 0, 1, 0, 1, 1, 0, 0] │
│ │
└─────────────────────────────────────────────────────────────────┘
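A minimal sketch of this protect/fault/mark/unprotect cycle; the ram_base, ram_size, and dirty_bitmap globals are illustrative stand-ins for whatever state your manager keeps:
#include <sys/mman.h>
#include <stdint.h>

#define PAGE_SIZE 4096ULL

static uint8_t  *ram_base;     // base of the mmap'd guest RAM
static uint64_t  ram_size;
static uint64_t *dirty_bitmap; // 1 bit per page

// Start a tracking round: remove write permission from all of RAM
static int dirty_tracking_start(void) {
    return mprotect(ram_base, ram_size, PROT_READ);
}

// Called from the SIGSEGV handler on a write fault inside RAM
static void dirty_tracking_on_write(uintptr_t fault_addr) {
    uint64_t page = (fault_addr - (uintptr_t)ram_base) / PAGE_SIZE;
    dirty_bitmap[page / 64] |= 1ULL << (page % 64);
    // Re-enable writes so the faulting instruction retries and succeeds
    mprotect(ram_base + page * PAGE_SIZE, PAGE_SIZE, PROT_READ | PROT_WRITE);
}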
2.2 Why This Matters
Every hypervisor needs this: Whether it's QEMU, Firecracker, Cloud Hypervisor, or VMware Workstation - they all have a memory management layer that does exactly what you're building. This is not an academic exercise; these are production code patterns.
Foundation for live migration: Cloud providers like AWS and Google move VMs between hosts constantly. The dirty page tracking you’ll implement is the core mechanism that makes this possible.
Understanding QEMU internals: QEMU’s memory API (memory_region_init_ram(), memory_region_init_io()) does exactly what you’re building. After this project, QEMU’s memory code will make sense.
Security boundaries: MMIO trapping creates security boundaries. A VM can’t accidentally (or maliciously) access host memory because device accesses are trapped and validated.
2.3 Historical Context
Early virtualization (1960s-1990s): IBM’s VM/370 and VMware pioneered these concepts. Memory virtualization was done entirely in software through binary translation and trap-and-emulate.
Hardware-assisted memory (2008): Intel EPT (Extended Page Tables) and AMD NPT (Nested Page Tables) made guest physical to host physical translation a hardware function. But user-space VMMs still need mmap-based memory management for the host-side allocation.
Modern cloud (2010s-present): Firecracker (2018) and Cloud Hypervisor (2019) use these exact techniques for microVM memory management. AWS Lambda uses Firecracker, which uses mmap for guest memory.
2.4 Common Misconceptions
Misconception 1: “mmap allocates memory immediately”
- Reality: mmap creates virtual address mappings. Physical pages are allocated on first access (demand paging). A 512MB mmap uses almost no physical memory until touched.
Misconception 2: “SIGSEGV means your program crashed”
- Reality: SIGSEGV is just a signal. It can be caught and handled. VMMs intentionally trigger SIGSEGV to intercept memory accesses.
Misconception 3: “Memory protection is just for security”
- Reality: Protection bits (PROT_READ, PROT_WRITE, PROT_EXEC) are used for dirty tracking, copy-on-write, MMIO emulation, and many other virtualization features.
Misconception 4: “Guest physical addresses map directly to host physical addresses”
- Reality: There are usually four address spaces in play: Guest Virtual (GVA) -> Guest Physical (GPA) -> Host Virtual (HVA) -> Host Physical (HPA). Your mmap creates the GPA->HVA mapping.
3. Project Specification
3.1 What You Will Build
A comprehensive user-space memory management library that provides:
- Guest RAM allocation: Allocate large contiguous regions for guest physical memory
- MMIO region registration: Define address ranges that trigger callbacks instead of normal memory access
- Access trapping via signals: Use SIGSEGV to intercept MMIO accesses and read-only page writes
- Dirty page tracking: Track which pages have been modified for live migration support
- Memory hot-plug simulation: Add and remove memory regions at runtime
3.2 Functional Requirements
- Memory Region Management
- Allocate guest RAM of configurable size (up to several GB)
- Support multiple disjoint RAM regions
- Register MMIO regions with custom handlers
- Unmap and free regions cleanly
- MMIO Emulation
- Trap reads and writes to MMIO regions
- Pass access information to registered handlers (address, size, is_write, value)
- Support different access sizes (1, 2, 4, 8 bytes)
- Resume execution after handling
- Dirty Page Tracking
- Enable/disable dirty tracking
- Mark all pages read-only when tracking starts
- Record dirty pages in a bitmap
- Report and clear dirty pages
- Re-protect pages for next iteration
- Memory Hot-plug
- Add new memory regions at runtime
- Remove memory regions (with proper cleanup)
- Notify callback when regions change
3.3 Non-Functional Requirements
- Performance: Normal RAM access should have zero overhead (no trapping on regular memory)
- Memory efficiency: Don’t pre-allocate physical pages; rely on demand paging
- Thread safety: Handle signals properly in multi-threaded context
- Reliability: Clean error handling for mmap/mprotect failures
- Testability: Clear interfaces for unit testing
3.4 Example Usage / Output
$ ./memmap_demo
[MEMMAP] Allocating 512MB guest RAM at 0x7f0000000000
[MEMMAP] Registered MMIO region: 0x7f0010000000-0x7f0010001000 (UART)
[MEMMAP] Registered MMIO region: 0x7f0010001000-0x7f0010002000 (VGA)
[GUEST] Writing to RAM at offset 0x1000: OK
[GUEST] Writing to UART at offset 0x10000000: TRAPPED!
-> UART handler called with value 0x41 ('A')
-> Character 'A' printed to console
[MEMMAP] Enabling dirty tracking...
[GUEST] Writing to RAM at offset 0x2000: (dirty page recorded)
[GUEST] Writing to RAM at offset 0x3000: (dirty page recorded)
[MEMMAP] Scanning dirty pages...
-> 2 pages modified since last scan
-> Dirty pages: 0x2, 0x3
[MEMMAP] Simulating hot-plug: adding 256MB
[MEMMAP] New guest RAM size: 768MB
[MEMMAP] Cleanup: unmapping all regions
[MEMMAP] Done
Advanced usage with library API:
#include "memmap.h"
// Create memory manager
memmap_t *mm = memmap_create();
// Allocate 512MB guest RAM at guest physical address 0
memmap_add_ram(mm, 0, 512 * 1024 * 1024);
// Register UART at guest physical address 0x10000000
memmap_add_mmio(mm, 0x10000000, 0x1000, uart_handler, uart_state);
// Get host pointer for guest physical address
void *host_ptr = memmap_gpa_to_hva(mm, 0x1000);
memcpy(host_ptr, data, 100); // Direct access to guest RAM
// Enable dirty tracking
memmap_start_dirty_tracking(mm);
// ... guest runs and modifies memory ...
// Get dirty pages
uint64_t *dirty_bitmap;
size_t dirty_count = memmap_get_dirty_pages(mm, &dirty_bitmap);
printf("Pages modified: %zu\n", dirty_count);
// Cleanup
memmap_destroy(mm);
4. Solution Architecture
4.1 High-Level Design
┌─────────────────────────────────────────────────────────────────────────┐
│ User Application │
│ (test program, mini-VM, etc.) │
└────────────────────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────┐
│ Memory Manager API │
│ memmap_create() / memmap_add_ram() / memmap_add_mmio() / etc. │
└────────────────────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────┐
│ Core Components │
│ ┌─────────────────────┐ ┌──────────────────┐ ┌───────────────────┐ │
│ │ Region Manager │ │ Signal Handler │ │ Dirty Tracker │ │
│ │ │ │ │ │ │ │
│ │ - RAM regions list │ │ - SIGSEGV setup │ │ - Bitmap mgmt │ │
│ │ - MMIO regions list │ │ - Access decode │ │ - Page protection │ │
│ │ - Address lookup │ │ - Handler dispatch│ │ - Iteration ctrl │ │
│ └─────────────────────┘ └──────────────────┘ └───────────────────┘ │
└────────────────────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────┐
│ Linux System Calls │
│ mmap() / munmap() / mprotect() / sigaction() / sigaltstack() │
└─────────────────────────────────────────────────────────────────────────┘
4.2 Key Components
Region Manager
Maintains the mapping between Guest Physical Addresses (GPA) and Host Virtual Addresses (HVA):
┌─────────────────────────────────────────────────────────────────┐
│ Region Manager │
├─────────────────────────────────────────────────────────────────┤
│ │
│ RAM Regions (sorted by GPA): │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ GPA: 0x00000000 Size: 512MB HVA: 0x7f0000000000 │ │
│ │ GPA: 0x20000000 Size: 256MB HVA: 0x7f0100000000 │ │
│ └────────────────────────────────────────────────────────┘ │
│ │
│ MMIO Regions (sorted by GPA): │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ GPA: 0x10000000 Size: 4KB Handler: uart_handler │ │
│ │ GPA: 0x10001000 Size: 4KB Handler: vga_handler │ │
│ │ GPA: 0x10002000 Size: 64KB Handler: virtio_handler │ │
│ └────────────────────────────────────────────────────────┘ │
│ │
│ Functions: │
│ - lookup_region(gpa) -> region_t* │
│ - gpa_to_hva(gpa) -> void* │
│ - is_mmio(gpa) -> bool │
│ │
└─────────────────────────────────────────────────────────────────┘
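Using the region_t and memmap_t definitions from Section 4.3 below, the lookup functions might be sketched as a linear scan (adequate for a handful of regions):
#include <stdint.h>
#include <stddef.h>

// Walk the sorted region list; NULL means the GPA is unmapped
static region_t *lookup_region(memmap_t *mm, uint64_t gpa) {
    for (region_t *r = mm->regions; r != NULL; r = r->next) {
        if (gpa >= r->gpa_start && gpa < r->gpa_start + r->size)
            return r;
    }
    return NULL;
}

// Translate GPA to HVA. For RAM the pointer is directly usable; for MMIO
// it points into the PROT_NONE window, so any access traps as intended.
static void *gpa_to_hva(memmap_t *mm, uint64_t gpa) {
    region_t *r = lookup_region(mm, gpa);
    if (r == NULL)
        return NULL;
    return (uint8_t *)r->hva + (gpa - r->gpa_start);
}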
Signal Handler
Intercepts SIGSEGV and determines appropriate action:
┌─────────────────────────────────────────────────────────────────┐
│ Signal Handler Flow │
├─────────────────────────────────────────────────────────────────┤
│ │
│ SIGSEGV received with siginfo_t: │
│ - si_addr: faulting address │
│ - si_code: fault type (SEGV_ACCERR, SEGV_MAPERR) │
│ │
│ │ │
│ ▼ │
│ ┌───────────────────────┐ │
│ │ Is addr in MMIO region?│ │
│ └───────────────────────┘ │
│ │ │ │
│ YES NO │
│ │ │ │
│ ▼ ▼ │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Decode access: │ │ Is dirty track │ │
│ │ - read/write │ │ enabled? │ │
│ │ - size │ └─────────────────┘ │
│ │ - value (write) │ │ │ │
│ └─────────────────┘ YES NO │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Call MMIO │ │ Mark │ │ Real │ │
│ │ handler │ │ dirty │ │ fault │ │
│ └─────────────────┘ │ + allow │ │ abort() │ │
│ │ └─────────┘ └─────────┘ │
│ ▼ │ │
│ ┌─────────────────┐ │ │
│ │ Emulate instr │ │ │
│ │ Update PC │ │ │
│ │ Return │◄─────┘ │
│ └─────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
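One plausible handler installation, using sigaltstack so the handler's decoding work does not depend on the faulting thread's stack (the 64KB size matches the recommendation in Section 7); a sketch:
#include <signal.h>
#include <stdlib.h>

static void sigsegv_handler(int sig, siginfo_t *info, void *ucontext);

static int install_fault_handler(void) {
    // Dedicated signal stack: survives even if the main stack is unusable
    stack_t ss = {
        .ss_sp = malloc(64 * 1024),
        .ss_size = 64 * 1024,
        .ss_flags = 0,
    };
    if (ss.ss_sp == NULL || sigaltstack(&ss, NULL) < 0)
        return -1;

    struct sigaction sa = { 0 };
    sa.sa_sigaction = sigsegv_handler;     // three-argument form
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK; // siginfo_t + alternate stack
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}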
Dirty Tracker
Manages page-level dirty tracking for live migration:
┌─────────────────────────────────────────────────────────────────┐
│ Dirty Tracker │
├─────────────────────────────────────────────────────────────────┤
│ │
│ State: │
│ - enabled: bool │
│ - bitmap: uint64_t[] (1 bit per page) │
│ - num_pages: size_t │
│ │
│ Operations: │
│ │
│ start_tracking(): │
│ 1. Allocate bitmap │
│ 2. mprotect(all_ram, PROT_READ) // Remove write │
│ 3. enabled = true │
│ │
│ on_write_fault(addr): │
│ 1. page_num = (addr - base) / PAGE_SIZE │
│ 2. bitmap[page_num / 64] |= (1ULL << (page_num % 64)) │
│ 3. mprotect(page_addr, PAGE_SIZE, PROT_READ|PROT_WRITE) │
│ │
│ get_dirty_pages(): │
│ 1. Return copy of bitmap │
│ 2. Clear bitmap │
│ 3. mprotect(all_ram, PROT_READ) // Re-protect │
│ │
└─────────────────────────────────────────────────────────────────┘
4.3 Data Structures
// Memory region types
typedef enum {
    REGION_RAM,
    REGION_MMIO
} region_type_t;

// MMIO handler callback
typedef void (*mmio_handler_t)(void *opaque, uint64_t addr,
                               uint64_t *data, int size, int is_write);

// Memory region descriptor
typedef struct region {
    uint64_t gpa_start;      // Guest physical address start
    uint64_t size;           // Region size in bytes
    region_type_t type;      // RAM or MMIO

    // For RAM regions
    void *hva;               // Host virtual address (mmap'd)

    // For MMIO regions
    mmio_handler_t handler;  // Callback function
    void *handler_opaque;    // User data for callback

    struct region *next;     // Linked list
} region_t;

// Dirty tracking state
typedef struct dirty_tracker {
    bool enabled;
    uint64_t *bitmap;        // 1 bit per page
    size_t num_pages;
    size_t bitmap_size;      // In uint64_t units
} dirty_tracker_t;

// Main memory manager
typedef struct memmap {
    region_t *regions;       // Sorted list of all regions
    dirty_tracker_t dirty;   // Dirty tracking state

    // For signal handler context
    void *ram_base;          // Base address of RAM mapping
    size_t ram_size;         // Total RAM size
} memmap_t;
4.4 Algorithm Overview
GPA to HVA Translation:
Input: Guest Physical Address (GPA)
Output: Host Virtual Address (HVA) or NULL
1. Binary search regions list for GPA
2. If found in RAM region:
- HVA = region->hva + (GPA - region->gpa_start)
- Return HVA
3. If found in MMIO region:
- Return special MMIO marker (or NULL)
4. If not found:
- Return NULL (unmapped)
MMIO Access Emulation:
Input: Faulting address, CPU context (registers, instruction pointer)
Output: Modified context (for reads), handler called (for writes)
1. Decode the faulting instruction at RIP
2. Determine:
- Is it a read or write?
- What size (1, 2, 4, 8 bytes)?
- Source/destination register
3. Look up MMIO handler for this address
4. If write:
- Extract value from source register
- Call handler(addr, &value, size, true)
5. If read:
- Call handler(addr, &value, size, false)
- Store value in destination register
6. Advance RIP past the instruction
7. Return from signal handler
Dirty Page Iteration:
Input: None
Output: List of dirty page numbers
1. Scan bitmap for set bits
2. For each set bit:
- Add page number to result list
- Clear the bit
3. mprotect() all dirty pages back to PROT_READ
4. Return result list
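A sketch of the bitmap scan using a GCC/Clang builtin to find set bits; callers would mprotect() the reported pages afterwards as step 3 describes:
#include <stdint.h>
#include <stddef.h>

// Collect set bits from the dirty bitmap into out[]; returns the count.
// Clears the bitmap as it goes, ready for the next iteration.
static size_t collect_dirty_pages(uint64_t *bitmap, size_t words,
                                  uint64_t *out, size_t out_cap) {
    size_t n = 0;
    for (size_t w = 0; w < words; w++) {
        uint64_t bits = bitmap[w];
        while (bits != 0 && n < out_cap) {
            int bit = __builtin_ctzll(bits); // index of lowest set bit
            out[n++] = w * 64 + bit;
            bits &= bits - 1;                // clear lowest set bit
        }
        bitmap[w] = 0;                       // clear for next round
    }
    return n;
}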
5. Implementation Guide
5.1 Development Environment Setup
Required packages (Ubuntu/Debian):
sudo apt update
sudo apt install build-essential gdb valgrind
Required packages (Fedora/RHEL):
sudo dnf install gcc gdb valgrind
Kernel requirements:
- Any modern Linux kernel (3.x+)
- No special modules needed
- Works in VMs and containers
Testing setup:
mkdir memmap_project && cd memmap_project
mkdir -p src include tests
5.2 Project Structure
memmap_project/
├── Makefile
├── include/
│ └── memmap.h # Public API header
├── src/
│ ├── memmap.c # Main implementation
│ ├── region.c # Region management
│ ├── signal.c # Signal handling
│ ├── dirty.c # Dirty tracking
│ └── decode.c # Instruction decoding (for MMIO)
├── tests/
│ ├── test_ram.c # RAM allocation tests
│ ├── test_mmio.c # MMIO trapping tests
│ ├── test_dirty.c # Dirty tracking tests
│ └── test_hotplug.c # Hot-plug tests
└── examples/
└── demo.c # Full demonstration
5.3 The Core Question You’re Answering
“How does a user-space program create the illusion of guest physical memory while intercepting specific address ranges for device emulation?”
This is the fundamental question every VMM must answer. Your solution combines:
- Memory mapping for guest RAM
- Protection-based trapping for MMIO
- Signal handling for access interception
- State management for dirty tracking
5.4 Concepts You Must Understand First
Before writing code, verify you understand these concepts:
Question 1: What’s the difference between virtual addresses and physical addresses? How does mmap create virtual-to-physical mappings?
- Reference: “Computer Systems: A Programmer’s Perspective” Chapter 9
Question 2: What is a page fault? When does the kernel generate SIGSEGV vs. allocating a new page?
- Reference: “The Linux Programming Interface” Chapter 49
Question 3: What do the protection bits (PROT_READ, PROT_WRITE, PROT_EXEC, PROT_NONE) actually control?
- Reference: mmap(2) and mprotect(2) man pages
Question 4: How does sigaction() differ from signal()? Why do we need siginfo_t?
- Reference: “The Linux Programming Interface” Chapter 21
Question 5: What is a signal handler’s stack? Why might we need sigaltstack()?
- Reference: “The Linux Programming Interface” Chapter 21.3
5.5 Questions to Guide Your Design
Memory Layout Questions:
- How will you organize the address space? Contiguous or scattered regions?
- What happens if a user requests overlapping regions?
- How do you handle regions that cross page boundaries?
Signal Handling Questions:
- How do you distinguish MMIO faults from real bugs?
- How do you decode the faulting instruction to determine read vs. write?
- What happens if the signal handler itself causes a fault?
Dirty Tracking Questions:
- How do you efficiently store the dirty bitmap? (Consider: 1GB RAM = 262,144 pages = 32KB bitmap)
- How do you handle concurrent access while scanning the bitmap?
- Should you track dirty pages at 4KB granularity or larger?
Performance Questions:
- How do you minimize signal handler overhead?
- Should you use different strategies for frequent vs. infrequent MMIO access?
- How do you avoid false-positive dirty pages?
5.6 Thinking Exercise
Before coding, work through this scenario by hand:
Scenario: A guest runs on your memory manager with:
- 256MB RAM at GPA 0x00000000
- UART at GPA 0x10000000 (4KB)
- Dirty tracking enabled
The guest executes these memory accesses:
1. mov [0x1000], rax - Write to RAM
2. mov rbx, [0x10000000] - Read from UART
3. mov [0x10000005], cl - Write to UART (1 byte)
4. mov [0x2000], rdx - Write to RAM (second write)
For each access, trace:
- What address is accessed (GPA)?
- Does it cause a signal? Why or why not?
- What does your signal handler do?
- What’s the state of the dirty bitmap after?
Expected answers:
- GPA 0x1000: Signal (dirty tracking), mark page 1 dirty, allow write
- GPA 0x10000000: Signal (MMIO), call UART read handler, return value in rbx
- GPA 0x10000005: Signal (MMIO), call UART write handler with cl value
- GPA 0x2000: Signal (dirty tracking), mark page 2 dirty, allow write
5.7 Hints in Layers
Hint 1 - Starting Point (Conceptual Direction):
Start with just RAM allocation. Get mmap() working for a large anonymous region. Verify you can read and write to it. Then add MMIO as a separate concept that triggers signal handling.
Hint 2 - Next Level (More Specific):
For MMIO, map the region with PROT_NONE. This guarantees any access triggers SIGSEGV. In your signal handler, use siginfo_t->si_addr to get the faulting address. Use the ucontext_t to access registers.
Hint 3 - Technical Details (Approach): Instruction decoding is tricky. For a simplified approach, only support specific instruction patterns:
// Common x86-64 MOV patterns
// 89 /r MOV r/m32, r32 (4-byte write)
// 8B /r MOV r32, r/m32 (4-byte read)
// C7 /0 id MOV r/m32, imm32 (immediate write)
// You can use a library like capstone for full decoding
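To make the simplified approach concrete, here is a sketch handling only the two byte-sized MOV forms 88 /r and 8A /r with mod=00 and no REX/SIB/displacement - enough for a toy UART data register. The mmio_dispatch() helper is assumed to look up the region and invoke its registered handler; it is illustrative, not part of the API above. Anything unrecognized should abort or fall through to a full decoder such as capstone.
#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <ucontext.h>

// Assumed helper: looks up the MMIO region for gpa and calls its handler
extern void mmio_dispatch(uint64_t gpa, uint64_t *data, int size, int is_write);

// Map ModRM reg codes 0-3 to glibc greg indices (AL, CL, DL, BL without REX)
static const int reg_map[4] = { REG_RAX, REG_RCX, REG_RDX, REG_RBX };

static void emulate_mmio_insn(ucontext_t *uc, uint64_t gpa) {
    const uint8_t *rip =
        (const uint8_t *)(uintptr_t)uc->uc_mcontext.gregs[REG_RIP];
    uint8_t opcode = rip[0], modrm = rip[1];
    int mod = modrm >> 6, reg = (modrm >> 3) & 7, rm = modrm & 7;

    // Only mod=00 with a plain base register: no SIB (rm=4), no RIP-rel (rm=5)
    if (mod != 0 || reg > 3 || rm == 4 || rm == 5)
        abort(); // unsupported encoding - needs a real decoder

    uint64_t value = 0;
    if (opcode == 0x88) {                 // MOV [reg], r8: 1-byte MMIO write
        value = (uint64_t)uc->uc_mcontext.gregs[reg_map[reg]] & 0xFF;
        mmio_dispatch(gpa, &value, 1, 1);
    } else if (opcode == 0x8A) {          // MOV r8, [reg]: 1-byte MMIO read
        mmio_dispatch(gpa, &value, 1, 0);
        uint64_t old = (uint64_t)uc->uc_mcontext.gregs[reg_map[reg]];
        uc->uc_mcontext.gregs[reg_map[reg]] =
            (greg_t)((old & ~0xFFULL) | (value & 0xFF));
    } else {
        abort();
    }
    uc->uc_mcontext.gregs[REG_RIP] += 2;  // opcode + ModRM: both forms are 2 bytes
}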
Hint 4 - Tools/Debugging (Verification): Debug signal handler issues with:
# Trace signals
strace -e trace=signal ./your_program
# Debug signal handler with gdb
gdb ./your_program
(gdb) handle SIGSEGV nostop noprint pass
(gdb) break sigsegv_handler
5.8 The Interview Questions They’ll Ask
- “How does QEMU manage guest memory?”
- Explain mmap for RAM, MMIO regions, memory slots for KVM
- Discuss MemoryRegion API and address space abstraction
- “What is dirty page tracking and why is it needed?”
- Explain live migration use case
- Describe protection-based tracking mechanism
- Discuss performance implications
- “How would you handle a guest that writes to a read-only MMIO register?”
- Discuss emulation choices: ignore, log, inject exception
- Explain how real hardware behaves (bus error, ignore, etc.)
- “What are the challenges of signal handlers in multi-threaded programs?”
- Signal delivery to threads, async-signal-safe functions
- Using sigaltstack, blocking signals during critical sections
- “How does EPT/NPT differ from this software approach?”
- Hardware-based GPA→HPA translation
- EPT violations instead of signals
- Performance comparison
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| mmap and memory mapping | “The Linux Programming Interface” by Kerrisk | Chapter 49 |
| Signal handling | “The Linux Programming Interface” by Kerrisk | Chapters 20-22 |
| Virtual memory concepts | “Computer Systems: A Programmer’s Perspective” | Chapter 9 |
| QEMU memory architecture | QEMU documentation | Memory API guide |
| x86 instruction encoding | “Intel SDM Volume 2” | Chapter 2 |
5.10 Implementation Phases
Phase 1: Basic RAM Management (Day 1-2)
- Implement memmap_create() and memmap_destroy()
- Implement memmap_add_ram() using mmap
- Implement memmap_gpa_to_hva() for address translation
- Test: allocate RAM, write/read data
Phase 2: MMIO Region Support (Day 3-4)
- Implement memmap_add_mmio() with PROT_NONE mapping
- Set up SIGSEGV handler with sigaction
- Implement basic handler dispatch (identify MMIO region)
- Test: register MMIO, verify signal on access
Phase 3: Instruction Decoding (Day 5-6)
- Implement instruction decoder for basic MOV patterns
- Extract read/write, size, and register information
- Call MMIO handlers with proper parameters
- Test: UART emulation with read/write
Phase 4: Dirty Page Tracking (Day 7-8)
- Implement dirty bitmap allocation
- Implement memmap_start_dirty_tracking() with mprotect
- Modify signal handler for dirty tracking
- Implement memmap_get_dirty_pages()
- Test: write patterns and verify dirty bitmap
Phase 5: Hot-plug and Polish (Day 9-10)
- Implement memmap_add_ram() for additional regions
- Implement memmap_remove_region()
- Add error handling throughout
- Create comprehensive test suite
5.11 Key Implementation Decisions
Decision 1: Signal handler stack
- Options: Use default stack, use sigaltstack
- Recommendation: Use sigaltstack. The default stack may not have enough space, especially if the handler does instruction decoding.
Decision 2: Instruction decoding complexity
- Options: Hand-coded decoder, use library (capstone/Zydis)
- Recommendation: Start with hand-coded for common patterns. Add library if you need full x86-64 support.
Decision 3: Dirty bitmap granularity
- Options: 4KB pages, larger tracking units (64KB, 1MB)
- Recommendation: Start with 4KB for accuracy. Add larger granularity as optimization later.
Decision 4: Region storage structure
- Options: Linked list, array, tree
- Recommendation: Simple sorted linked list for < 100 regions. Use interval tree for many regions.
6. Testing Strategy
Unit Tests
// test_ram.c
void test_ram_allocation(void) {
    memmap_t *mm = memmap_create();

    // Allocate 64MB RAM
    int ret = memmap_add_ram(mm, 0, 64 * 1024 * 1024);
    assert(ret == 0);

    // Get HVA and write data
    void *hva = memmap_gpa_to_hva(mm, 0x1000);
    assert(hva != NULL);
    memset(hva, 0xAB, 4096);
    assert(((uint8_t *)hva)[0] == 0xAB);

    memmap_destroy(mm);
    printf("test_ram_allocation: PASSED\n");
}
// test_mmio.c
static int uart_write_count = 0;
static uint8_t last_uart_write = 0;

void test_uart_handler(void *opaque, uint64_t addr,
                       uint64_t *data, int size, int is_write) {
    if (is_write) {
        uart_write_count++;
        last_uart_write = *data & 0xFF;
    }
}

void test_mmio_trapping(void) {
    memmap_t *mm = memmap_create();

    // Add RAM and UART
    memmap_add_ram(mm, 0, 64 * 1024 * 1024);
    memmap_add_mmio(mm, 0x10000000, 0x1000, test_uart_handler, NULL);

    // Get HVA of UART region (should return MMIO marker)
    void *uart_hva = memmap_gpa_to_hva(mm, 0x10000000);

    // Write to UART (this should trigger handler)
    volatile uint8_t *uart = (volatile uint8_t *)uart_hva;
    *uart = 'X'; // This triggers SIGSEGV -> handler

    assert(uart_write_count == 1);
    assert(last_uart_write == 'X');

    memmap_destroy(mm);
    printf("test_mmio_trapping: PASSED\n");
}
Integration Tests
// test_dirty.c
void test_dirty_tracking_iteration(void) {
    memmap_t *mm = memmap_create();
    memmap_add_ram(mm, 0, 4096 * 100); // 100 pages

    // Enable dirty tracking
    memmap_start_dirty_tracking(mm);

    // Write to specific pages (uint8_t * keeps the offset math well-defined C)
    uint8_t *base = memmap_gpa_to_hva(mm, 0);
    memset(base + 4096 * 5, 0xAA, 100);  // Page 5
    memset(base + 4096 * 42, 0xBB, 100); // Page 42
    memset(base + 4096 * 99, 0xCC, 100); // Page 99

    // Get dirty pages
    uint64_t *bitmap;
    size_t count = memmap_get_dirty_pages(mm, &bitmap);

    // Verify
    assert(count == 3);
    assert(bitmap[5 / 64] & (1UL << (5 % 64)));
    assert(bitmap[42 / 64] & (1UL << (42 % 64)));
    assert(bitmap[99 / 64] & (1UL << (99 % 64)));

    // Second iteration - write to one page
    memset(base + 4096 * 10, 0xDD, 100);
    count = memmap_get_dirty_pages(mm, &bitmap);
    assert(count == 1);

    memmap_destroy(mm);
    printf("test_dirty_tracking_iteration: PASSED\n");
}
Stress Tests
static uint64_t counter = 0;

// Minimal handler definition so the snippet is self-contained:
// count writes to the emulated register
static void counter_handler(void *opaque, uint64_t addr,
                            uint64_t *data, int size, int is_write) {
    if (is_write)
        (*(uint64_t *)opaque)++;
}

void test_high_frequency_mmio(void) {
    memmap_t *mm = memmap_create();
    memmap_add_mmio(mm, 0x10000000, 0x1000, counter_handler, &counter);

    volatile uint32_t *reg = (volatile uint32_t *)memmap_gpa_to_hva(mm, 0x10000000);

    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < 100000; i++) {
        *reg = i; // Each triggers signal handler
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed = (end.tv_sec - start.tv_sec) +
                     (end.tv_nsec - start.tv_nsec) / 1e9;
    printf("100,000 MMIO writes in %.3f seconds (%.0f/sec)\n",
           elapsed, 100000 / elapsed);

    memmap_destroy(mm);
}
7. Common Pitfalls & Debugging
| Problem | Root Cause | Fix | Verification |
|---|---|---|---|
| SIGSEGV kills process | Signal handler not installed or returning incorrectly | Ensure sigaction() with SA_SIGINFO, handler must modify context correctly | Add printf at handler entry to confirm it’s called |
| Infinite loop in handler | Handler doesn’t advance instruction pointer | After emulating, add instruction length to RIP in ucontext | Print RIP before/after to verify change |
| Random crashes | Signal handler stack overflow | Use sigaltstack() with sufficient size (64KB+) | Check stack pointer in handler |
| MMIO reads return garbage | Not properly extracting value from handler | Ensure handler writes to *data, and handler stores in correct register | Print emulated value and register state |
| Dirty tracking misses pages | mprotect() not applied to all pages | Verify mprotect() call covers entire RAM region | Check return value and errno |
| Multi-threaded crashes | Signal delivered to wrong thread | Use per-thread signal masks, consider using userfaultfd instead | Test with single thread first |
| Slow MMIO emulation | Instruction decoding too complex | Cache decoded instructions, use simpler patterns | Profile with perf |
Debugging Techniques
Print signal info:
void sigsegv_handler(int sig, siginfo_t *info, void *context) {
    // Note: printf is not async-signal-safe; acceptable for debugging,
    // but prefer write() in production handlers
    printf("SIGSEGV at %p, code=%d\n", info->si_addr, info->si_code);
    // SEGV_MAPERR: invalid address
    // SEGV_ACCERR: valid address, invalid permissions
}
Examine registers in handler:
// REG_RIP and friends require #define _GNU_SOURCE before <ucontext.h> on glibc
void sigsegv_handler(int sig, siginfo_t *info, void *context) {
    ucontext_t *uc = (ucontext_t *)context;
    mcontext_t *mc = &uc->uc_mcontext;
    printf("RIP: 0x%llx\n", mc->gregs[REG_RIP]);
    printf("RSP: 0x%llx\n", mc->gregs[REG_RSP]);
    printf("RAX: 0x%llx\n", mc->gregs[REG_RAX]);
}
Verify mmap success:
void *ptr = mmap(NULL, size, prot, flags, -1, 0);
if (ptr == MAP_FAILED) {
    perror("mmap failed");
    printf("errno=%d, size=%zu\n", errno, size);
    exit(1);
}
8. Extensions & Challenges
Basic Extensions
- Multiple RAM Regions: Support non-contiguous guest physical address spaces (common in real systems due to reserved ranges)
- Memory Ballooning: Implement a mechanism to dynamically reduce guest memory by marking pages as "ballooned" (unusable by guest)
- Huge Pages: Add support for 2MB huge pages to reduce page table overhead (see the sketch below)
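For the huge-page extension, Linux already exposes the flag; a sketch (requires reserved huge pages, e.g. via /proc/sys/vm/nr_hugepages):
#define _GNU_SOURCE
#include <sys/mman.h>

// 2MB-backed guest RAM; fails with ENOMEM if no huge pages are reserved
void *alloc_hugepage_ram(size_t size) {
    return mmap(NULL, size, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
}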
Intermediate Challenges
- NUMA Awareness: Allocate memory from specific NUMA nodes for better performance
- Copy-on-Write Snapshots: Implement VM snapshots using mmap with MAP_PRIVATE on a shared backing file
- Memory Encryption: Add hooks for AMD SEV-style memory encryption (encryption at the API level, not real encryption)
Advanced Challenges
- userfaultfd Integration: Replace SIGSEGV handling with userfaultfd for better performance and thread safety
- KSM-like Deduplication: Scan for identical pages and merge them (copy-on-write)
- Live Migration Support: Implement iterative dirty page transfer with a simulated network interface
9. Real-World Connections
QEMU’s Memory Model
QEMU uses a MemoryRegion abstraction very similar to what you’re building:
// QEMU equivalent of your API
memory_region_init_ram(&ram, owner, "ram", size, &error_abort);
memory_region_init_io(&uart, owner, &uart_ops, uart_state, "uart", size);
memory_region_add_subregion(system_memory, 0, &ram);
memory_region_add_subregion(system_memory, 0x10000000, &uart);
After completing this project, you can read QEMU’s memory.c and exec.c with understanding.
Firecracker’s Approach
Firecracker (AWS’s microVM monitor) uses a simpler memory model:
- Single contiguous guest RAM region via mmap
- KVM for actual virtualization
- Minimal device emulation
Your project teaches the concepts Firecracker uses, minus KVM integration.
Cloud Hypervisor
Cloud Hypervisor (Intel’s rust-vmm based VMM) has:
- GuestMemory trait for memory abstraction
- GuestMemoryMmap implementation using mmap
- Dirty page tracking for live migration
The Rust API is different, but the concepts are identical to what you’re implementing.
10. Resources
Essential Reading
- Linux man pages: mmap(2), mprotect(2), sigaction(2), sigaltstack(2)
- QEMU Memory API: https://www.qemu.org/docs/master/devel/memory.html
- userfaultfd: https://www.kernel.org/doc/html/latest/admin-guide/mm/userfaultfd.html
Code References
- QEMU memory: https://github.com/qemu/qemu/blob/master/system/memory.c
- Firecracker: https://github.com/firecracker-microvm/firecracker
- Cloud Hypervisor: https://github.com/cloud-hypervisor/cloud-hypervisor
Tutorials
- LWN mmap tutorial: https://lwn.net/Articles/250967/
- Signal handling guide: https://www.gnu.org/software/libc/manual/html_node/Signal-Handling.html
11. Self-Assessment Checklist
Before moving on, verify you can:
- Explain the difference between GPA (Guest Physical Address) and HVA (Host Virtual Address)
- Describe how mmap creates memory mappings without immediately allocating physical pages
- Write a SIGSEGV handler that extracts the faulting address and resumes execution
- Explain why MMIO regions are mapped with PROT_NONE
- Describe the dirty page tracking algorithm (protect -> fault -> mark -> unprotect)
- Debug signal handling issues using strace and gdb
- Implement address translation from GPA to HVA for multiple regions
12. Submission / Completion Criteria
Your project is complete when:
- RAM Management Works
- Can allocate > 256MB of guest RAM
- Can read/write to any offset
- Memory is properly freed on cleanup
- MMIO Trapping Works
- MMIO regions trigger signal handler
- Handler correctly identifies read vs. write
- Handler receives correct address and size
- Execution resumes after handling
- Dirty Tracking Works
- Can enable dirty tracking
- Write faults are recorded in bitmap
- memmap_get_dirty_pages() returns correct set
- Second iteration only shows new dirty pages
- Passes All Tests
- Unit tests pass for each component
- Integration test with simulated guest access patterns
- Stress test with 100,000+ MMIO accesses
- Code Quality
- Clean error handling (no silent failures)
- No memory leaks (verify with valgrind)
- Comments explaining key design decisions
- README with build instructions
Demonstration: Run the demo program showing:
$ ./demo
[Expected output showing RAM write, MMIO trap, dirty page scan, and hot-plug]
After completing this project, you’ll have built the memory management foundation that every hypervisor needs. This understanding transfers directly to QEMU development, cloud VMM work, and systems programming roles involving virtualization.