Project 15: Complete Type-2 Hypervisor (Capstone)
Build a complete Type-2 hypervisor that runs Linux as a guest, with virtio devices, SMP support (multiple VCPUs), and APIC emulation, stable enough for long-running interactive use.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Master (Level 5: The First-Principles Wizard) |
| Time Estimate | 3-6 months |
| Language | C (alternatives: Rust) |
| Prerequisites | All previous projects (P1-P14), deep understanding of x86 architecture, Linux kernel internals |
| Key Topics | SMP virtualization, APIC/IOAPIC emulation, Linux boot protocol, virtio drivers, interrupt virtualization |
1. Learning Objectives
By completing this project, you will:
- Master multi-processor virtualization including VCPU scheduling and IPIs
- Implement complete APIC emulation for modern interrupt handling
- Understand and implement the Linux x86 boot protocol
- Integrate virtio block and network devices for a fully functional system
- Build a production-quality hypervisor comparable to early QEMU/KVM
- Gain deep expertise in systems programming that few engineers possess
- Create a portfolio piece that demonstrates mastery-level systems knowledge
2. Theoretical Foundation
2.1 Core Concepts
What Makes a Complete Hypervisor
A complete Type-2 hypervisor must handle everything a real computer does:
┌────────────────────────────────────────────────────────────────────────┐
│ Complete Type-2 Hypervisor Components │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ Guest Linux Kernel │ │
│ │ ┌─────────────────────────────────────────────────────────┐ │ │
│ │ │ Applications │ Filesystem │ Networking │ Drivers │ │ │
│ │ └─────────────────────────────────────────────────────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌─────────────────────────────────────────────────────────┐ │ │
│ │ │ Linux Kernel │ │ │
│ │ │ - Process scheduler (needs timer interrupts) │ │ │
│ │ │ - Memory manager (needs page tables, EPT) │ │ │
│ │ │ - Device drivers (virtio, console, etc.) │ │ │
│ │ │ - SMP support (needs APIC, IPIs) │ │ │
│ │ └─────────────────────────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │ │
│ │ VM Exits │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ Your Hypervisor │ │
│ │ │ │
│ │ ┌────────────────────────────────────────────────────────┐ │ │
│ │ │ CPU Virtualization │ │ │
│ │ │ ┌───────────┐ ┌───────────┐ ┌───────────┐ │ │ │
│ │ │ │ VCPU 0 │ │ VCPU 1 │ │ VCPU N │ │ │ │
│ │ │ │ (KVM API) │ │ (KVM API) │ │ (KVM API) │ │ │ │
│ │ │ └───────────┘ └───────────┘ └───────────┘ │ │ │
│ │ │ │ │ │ │ │ │
│ │ │ └──────────────┼──────────────┘ │ │ │
│ │ │ │ │ │ │
│ │ │ ┌────────▼────────┐ │ │ │
│ │ │ │ VCPU Scheduler │ │ │ │
│ │ │ │ (threads) │ │ │ │
│ │ │ └─────────────────┘ │ │ │
│ │ └────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ ┌────────────────────────────────────────────────────────┐ │ │
│ │ │ Interrupt Virtualization │ │ │
│ │ │ ┌──────────────┐ ┌──────────────┐ │ │ │
│ │ │ │ Local APIC │ │ I/O APIC │ │ │ │
│ │ │ │ (per VCPU) │ │ (shared) │ │ │ │
│ │ │ └──────────────┘ └──────────────┘ │ │ │
│ │ │ │ │ │ │ │
│ │ │ ├────────────────────┤ │ │ │
│ │ │ │ Interrupt Routing │ │ │
│ │ │ │ - Timer IRQ 0 │ │ │
│ │ │ │ - Keyboard IRQ 1 │ │ │
│ │ │ │ - Serial IRQ 4 │ │ │
│ │ │ │ - Disk IRQ 14 │ │ │
│ │ │ │ - Network IRQ 11 │ │ │
│ │ └────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ ┌────────────────────────────────────────────────────────┐ │ │
│ │ │ Memory Virtualization │ │ │
│ │ │ ┌──────────────────────────────────────────────────┐ │ │ │
│ │ │ │ EPT │ │ │ │
│ │ │ │ GPA 0x00000000 → HPA (RAM) │ │ │ │
│ │ │ │ GPA 0xFEE00000 → Local APIC (MMIO) │ │ │ │
│ │ │ │ GPA 0xFEC00000 → I/O APIC (MMIO) │ │ │ │
│ │ │ └──────────────────────────────────────────────────┘ │ │ │
│ │ └────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ ┌────────────────────────────────────────────────────────┐ │ │
│ │ │ Device Emulation │ │ │
│ │ │ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ │ │ │
│ │ │ │ Serial │ │ VirtIO │ │ VirtIO │ │ Timer │ │ │ │
│ │ │ │ (UART) │ │ Block │ │ Net │ │ (PIT) │ │ │ │
│ │ │ └────────┘ └────────┘ └────────┘ └────────┘ │ │ │
│ │ └────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ ┌────────────────────────────────────────────────────────┐ │ │
│ │ │ Boot Loader │ │ │
│ │ │ - Load bzImage (compressed kernel) │ │ │
│ │ │ - Load initrd (initial ramdisk) │ │ │
│ │ │ - Set up boot parameters │ │ │
│ │ │ - Jump to kernel entry point │ │ │
│ │ └────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │ │
│ │ System calls, file I/O │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ Host Linux │ │
│ │ - KVM module provides VT-x access │ │
│ │ - TAP device for networking │ │
│ │ - File I/O for disk images │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
└────────────────────────────────────────────────────────────────────────┘
SMP Virtualization Architecture
┌────────────────────────────────────────────────────────────────────────┐
│ SMP Virtualization Model │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ Your Hypervisor Process │
│ ════════════════════════ │
│ │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ Main Thread │ │
│ │ │ │
│ │ - Parse arguments │ │
│ │ - Create VM via KVM │ │
│ │ - Set up memory │ │
│ │ - Create VCPUs │ │
│ │ - Load kernel/initrd │ │
│ │ - Spawn VCPU threads │ │
│ │ - Wait for completion │ │
│ │ │ │
│ └────────────────────────────────────────────────────────────────┘ │
│ │ │
│ │ pthread_create() │
│ ▼ │
│ ┌────────────────┬────────────────┬────────────────┬───────────┐ │
│ │ VCPU 0 Thread │ VCPU 1 Thread │ VCPU 2 Thread │ ... │ │
│ │ │ │ │ │ │
│ │ ┌──────────┐ │ ┌──────────┐ │ ┌──────────┐ │ │ │
│ │ │ KVM_RUN │ │ │ KVM_RUN │ │ │ KVM_RUN │ │ │ │
│ │ │ loop │ │ │ loop │ │ │ loop │ │ │ │
│ │ └──────────┘ │ └──────────┘ │ └──────────┘ │ │ │
│ │ │ │ │ │ │ │ │ │
│ │ │ │ │ │ │ │ │ │
│ │ ┌────▼────┐ │ ┌────▼────┐ │ ┌────▼────┐ │ │ │
│ │ │ Local │ │ │ Local │ │ │ Local │ │ │ │
│ │ │ APIC │ │ │ APIC │ │ │ APIC │ │ │ │
│ │ │ State │ │ │ State │ │ │ State │ │ │ │
│ │ └─────────┘ │ └─────────┘ │ └─────────┘ │ │ │
│ │ │ │ │ │ │
│ └────────────────┴────────────────┴────────────────┴───────────┘ │
│ │ │ │ │
│ │ │ │ │
│ └────────────────┼────────────────┘ │
│ │ │
│ ┌────────▼────────┐ │
│ │ Shared State │ │
│ │ │ │
│ │ - I/O APIC │ │
│ │ - Memory │ │
│ │ - Devices │ │
│ │ - Event loop │ │
│ │ │ │
│ └─────────────────┘ │
│ │
│ │
│ Inter-Processor Interrupt (IPI) Flow: │
│ ══════════════════════════════════════ │
│ │
│ VCPU 0 VCPU 1 │
│ │ │ │
│ │ Write to APIC ICR │ │
│ │ (Send IPI to VCPU 1) │ │
│ ▼ │ │
│ ┌──────────┐ │ │
│ │ APIC ICR │──────────────────────►│ │
│ │ handler │ Signal VCPU 1 │ │
│ └──────────┘ thread ▼ │
│ │ ┌──────────┐ │
│ │ │ Interrupt│ │
│ │ │ injected │ │
│ │ └──────────┘ │
│ │ │ │
│ ▼ ▼ │
│ Continue Handle IPI │
│ execution (e.g., TLB flush, │
│ reschedule) │
│ │
└────────────────────────────────────────────────────────────────────────┘
APIC Architecture
┌────────────────────────────────────────────────────────────────────────┐
│ APIC System Architecture │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ I/O APIC (Shared) │
│ MMIO at 0xFEC00000 │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ 24 Redirection Table Entries (IOREDTBL) │ │
│ │ ┌──────────────────────────────────────────────────────┐ │ │
│ │ │ IRQ 0: Timer → Destination: VCPU 0, Vector: 32 │ │ │
│ │ │ IRQ 1: Keyboard → Destination: VCPU 0, Vector: 33 │ │ │
│ │ │ IRQ 4: Serial → Destination: VCPU 0, Vector: 36 │ │ │
│ │ │ IRQ 11: Network → Destination: VCPU 0, Vector: 43 │ │ │
│ │ │ IRQ 14: Disk → Destination: VCPU 0, Vector: 46 │ │ │
│ │ │ ... │ │ │
│ │ └──────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ Registers: │ │
│ │ - IOREGSEL (offset 0x00): Select register │ │
│ │ - IOWIN (offset 0x10): Read/write selected register │ │
│ │ - IOAPICVER: Version and max entries │ │
│ │ - IOAPICID: APIC ID │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │ │
│ │ Interrupt routing │
│ ▼ │
│ ┌─────────────────┬────────────────┬───────────────────────────┐ │
│ │ │ │ │ │
│ │ Local APIC 0 │ Local APIC 1 │ Local APIC N │ │
│ │ (VCPU 0) │ (VCPU 1) │ (VCPU N) │ │
│ │ 0xFEE00000 │ 0xFEE00000 │ 0xFEE00000 │ │
│ │ │ │ (per-VCPU) │ │
│ │ ┌───────────┐ │ ┌───────────┐ │ ┌───────────────────┐ │ │
│ │ │ APIC ID │ │ │ APIC ID │ │ │ APIC ID │ │ │
│ │ │ 0 │ │ │ 1 │ │ │ N │ │ │
│ │ └───────────┘ │ └───────────┘ │ └───────────────────┘ │ │
│ │ │ │ │ │
│ │ Key Registers:│ │ │ │
│ │ ┌───────────────────────────────────────────────────────┐ │ │
│ │ │ Offset │ Name │ Description │ │ │
│ │ ├────────┼──────┼───────────────────────────────────────┤ │ │
│ │ │ 0x020 │ ID │ Local APIC ID │ │ │
│ │ │ 0x030 │ VER │ Version │ │ │
│ │ │ 0x080 │ TPR │ Task Priority Register │ │ │
│ │ │ 0x0B0 │ EOI │ End of Interrupt │ │ │
│ │ │ 0x0F0 │ SVR │ Spurious Interrupt Vector │ │ │
│ │ │ 0x100 │ ISR │ In-Service Register (256 bits) │ │ │
│ │ │ 0x180 │ TMR │ Trigger Mode Register │ │ │
│ │ │ 0x200 │ IRR │ Interrupt Request Register │ │ │
│ │ │ 0x300 │ ICR │ Interrupt Command Register (for IPI) │ │ │
│ │ │ 0x320 │ LVTT │ Timer LVT │ │ │
│ │ │ 0x380 │ ICT │ Initial Count (timer) │ │ │
│ │ │ 0x390 │ CCT │ Current Count (timer) │ │ │
│ │ │ 0x3E0 │ DCR │ Divide Configuration │ │ │
│ │ └───────────────────────────────────────────────────────┘ │ │
│ │ │ │ │ │
│ │ Timer: │ │ │ │
│ │ - LVTT: Vector and mode │ │ │
│ │ - ICT: Countdown start │ │ │
│ │ - CCT: Current count │ │ │
│ │ - On expiry: fire interrupt │ │ │
│ │ │ │ │ │
│ └─────────────────┴────────────────┴───────────────────────────┘ │
│ │
└────────────────────────────────────────────────────────────────────────┘
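The indirect IOREGSEL/IOWIN access scheme in the diagram above can be sketched as follows. This is a minimal illustration, not the guide's actual `struct ioapic` (which is not shown): the names `ioapic_sketch`, `ioapic_mmio_read`, `ioapic_mmio_write`, and the `ioapic_pin_*` helpers are hypothetical, and the version value 0x11 is an assumption. The read-only bits of a redirection entry (delivery status, remote IRR) are not modeled.

```c
#include <assert.h>
#include <stdint.h>

#define IOAPIC_NUM_PINS    24
#define IOAPIC_IOREGSEL    0x00   /* MMIO offset: register select */
#define IOAPIC_IOWIN       0x10   /* MMIO offset: data window */
#define IOAPIC_REG_ID      0x00
#define IOAPIC_REG_VER     0x01
#define IOAPIC_REG_REDTBL  0x10   /* two 32-bit registers per pin */
#define IOAPIC_ENTRY_MASK  (1ull << 16)

/* Hypothetical state layout for illustration only. */
struct ioapic_sketch {
    uint32_t ioregsel;
    uint32_t id;
    uint64_t redtbl[IOAPIC_NUM_PINS];
};

void ioapic_reset(struct ioapic_sketch *io) {
    io->ioregsel = 0;
    io->id = 0;
    for (int pin = 0; pin < IOAPIC_NUM_PINS; pin++)
        io->redtbl[pin] = IOAPIC_ENTRY_MASK;   /* every pin starts masked */
}

static int is_redtbl(uint32_t reg) {
    return reg >= IOAPIC_REG_REDTBL &&
           reg < IOAPIC_REG_REDTBL + 2 * IOAPIC_NUM_PINS;
}

uint32_t ioapic_mmio_read(const struct ioapic_sketch *io, uint32_t offset) {
    if (offset == IOAPIC_IOREGSEL)
        return io->ioregsel;
    if (offset != IOAPIC_IOWIN)
        return 0;
    uint32_t reg = io->ioregsel;
    if (reg == IOAPIC_REG_ID)
        return io->id;
    if (reg == IOAPIC_REG_VER)   /* max entry index in bits 23:16, version in 7:0 */
        return ((uint32_t)(IOAPIC_NUM_PINS - 1) << 16) | 0x11;
    if (is_redtbl(reg)) {
        uint64_t entry = io->redtbl[(reg - IOAPIC_REG_REDTBL) / 2];
        return (reg & 1) ? (uint32_t)(entry >> 32) : (uint32_t)entry;
    }
    return 0;
}

void ioapic_mmio_write(struct ioapic_sketch *io, uint32_t offset, uint32_t val) {
    if (offset == IOAPIC_IOREGSEL) {
        io->ioregsel = val & 0xFF;
        return;
    }
    if (offset != IOAPIC_IOWIN)
        return;
    uint32_t reg = io->ioregsel;
    if (reg == IOAPIC_REG_ID) {
        io->id = val & 0x0F000000;             /* ID lives in bits 27:24 */
    } else if (is_redtbl(reg)) {
        uint64_t *entry = &io->redtbl[(reg - IOAPIC_REG_REDTBL) / 2];
        if (reg & 1)                           /* odd index: high 32 bits */
            *entry = (*entry & 0xFFFFFFFFull) | ((uint64_t)val << 32);
        else                                   /* even index: low 32 bits */
            *entry = (*entry & ~0xFFFFFFFFull) | val;
    }
}

/* Decode helpers for the fields used when routing an IRQ. */
int ioapic_pin_vector(const struct ioapic_sketch *io, int pin) {
    assert(pin >= 0 && pin < IOAPIC_NUM_PINS);
    return (int)(io->redtbl[pin] & 0xFF);
}

int ioapic_pin_masked(const struct ioapic_sketch *io, int pin) {
    assert(pin >= 0 && pin < IOAPIC_NUM_PINS);
    return (int)((io->redtbl[pin] >> 16) & 1);
}

int ioapic_pin_dest(const struct ioapic_sketch *io, int pin) {
    assert(pin >= 0 && pin < IOAPIC_NUM_PINS);
    return (int)((io->redtbl[pin] >> 56) & 0xFF);
}
```

A guest programs pin 4 by writing index 0x18 to IOREGSEL and the low half to IOWIN, then index 0x19 and the high half. The device model consults the decode helpers when the serial device raises IRQ 4.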
Linux Boot Protocol
┌────────────────────────────────────────────────────────────────────────┐
│ Linux x86-64 Boot Protocol │
├────────────────────────────────────────────────────────────────────────┤
│ │
│ What you need to boot Linux: │
│ ═══════════════════════════════ │
│ │
│ 1. bzImage (compressed kernel) │
│ - Real-mode setup code at offset 0 │
│ - Protected mode kernel at offset 0x200 * (setup_sects + 1) │
│ - Decompression code included │
│ │
│ 2. initrd (initial ramdisk) │
│ - Compressed filesystem image │
│ - Contains early userspace, drivers │
│ │
│ 3. Command line │
│ - Kernel parameters │
│ - e.g., "console=ttyS0 root=/dev/vda1" │
│ │
│ Memory Layout for Boot: │
│ ════════════════════════ │
│ │
│ GPA Contents │
│ ──────────────────────────────────────────────────────────────── │
│ 0x00000000 Real mode IVT │
│ 0x00000400 BIOS data area │
│ 0x00007C00 (Traditional boot sector - not used) │
│ 0x00010000 Real-mode setup code (from bzImage) │
│ 0x00020000 Command line string │
│ 0x00100000 Protected-mode kernel (1MB mark) │
│ 0x0F000000 initrd load address (example) │
│ (high mem) │
│ │
│ Boot Sequence: │
│ ═══════════════ │
│ │
│ Step 1: Load bzImage │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ - Read bzImage file │ │
│ │ - Parse setup header at offset 0x1F1 │ │
│ │ - Get setup_sects, boot_flag, version │ │
│ │ - Copy real-mode setup to 0x10000 │ │
│ │ - Copy protected-mode kernel to 0x100000 │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ Step 2: Load initrd │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ - Read initrd file │ │
│ │ - Load to high memory (e.g., 0x0F000000) │ │
│ │ - Set ramdisk_image and ramdisk_size in boot params │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ Step 3: Set up boot parameters │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ struct boot_params (at 0x10000): │ │
│ │ - e820_table: Memory map │ │
│ │ - hdr.cmd_line_ptr: Address of command line │ │
│ │ - hdr.ramdisk_image: Address of initrd │ │
│ │ - hdr.ramdisk_size: Size of initrd │ │
│ │ - hdr.hardware_subarch: 0 for standard PC │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ Step 4: Set up VCPU state │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ For 64-bit entry (if xloadflags has XLF_KERNEL_64 set): │ │
│ │ - Set up identity-mapped page tables │ │
│ │ - Enter long mode (CR0.PE=1, CR0.PG=1, CR4.PAE=1, EFER.LME=1) │ │
│ │ - Set RSI = address of boot_params │ │
│ │ - Set RIP = 64-bit kernel entry point (load address + 0x200) │ │
│ │ │ │
│ │ For 32-bit entry (traditional): │ │
│ │ - Set up protected mode (CR0.PE=1) │ │
│ │ - Set ESI = address of boot_params │ │
│ │ - Set EIP = 32-bit kernel entry (0x100000) │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ Step 5: Start VCPU │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ - KVM_RUN on VCPU 0 (BSP - Bootstrap Processor) │ │
│ │ - Kernel decompresses, initializes │ │
│ │ - Kernel sends INIT/SIPI to start other VCPUs │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
└────────────────────────────────────────────────────────────────────────┘
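The E820 memory map from Step 3 can be sketched as below. This is an illustrative layout under stated assumptions: the names `e820_entry_sketch`, `build_e820`, and `e820_write_zero_page` are hypothetical, and guest RAM is assumed to be one contiguous block that ends below the APIC MMIO regions (larger guests would need a split map). The zero-page offsets 0x1E8 (entry count) and 0x2D0 (entry array) follow the kernel's documented `boot_params` layout.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define E820_TYPE_RAM       1
#define E820_TYPE_RESERVED  2
#define ZP_E820_ENTRIES     0x1E8   /* zero-page offset: entry count (1 byte) */
#define ZP_E820_TABLE       0x2D0   /* zero-page offset: entry array */

/* One E820 entry as the kernel expects it: 20 bytes, no padding. */
struct e820_entry_sketch {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
} __attribute__((packed));

/* Fill map[] for a guest with ram_size bytes of contiguous RAM.
 * Returns the number of entries written. */
int build_e820(struct e820_entry_sketch *map, int max_entries, uint64_t ram_size) {
    assert(max_entries >= 3);
    assert(ram_size > 0x100000);
    int n = 0;
    /* Conventional memory below 640 KB */
    map[n++] = (struct e820_entry_sketch){ 0x00000, 0xA0000, E820_TYPE_RAM };
    /* Legacy VGA/BIOS hole between 640 KB and 1 MB */
    map[n++] = (struct e820_entry_sketch){ 0xA0000, 0x60000, E820_TYPE_RESERVED };
    /* Everything from 1 MB to the end of guest RAM */
    map[n++] = (struct e820_entry_sketch){ 0x100000, ram_size - 0x100000,
                                           E820_TYPE_RAM };
    return n;
}

/* Copy the map into the zero page (struct boot_params) in guest memory. */
void e820_write_zero_page(uint8_t *zero_page,
                          const struct e820_entry_sketch *map, int n) {
    assert(n >= 0 && n <= 128);
    zero_page[ZP_E820_ENTRIES] = (uint8_t)n;
    memcpy(zero_page + ZP_E820_TABLE, map, (size_t)n * sizeof(*map));
}
```

With the layout in this guide, `zero_page` would be `hv->ram + 0x10000`, the address passed to the kernel in RSI (or ESI for 32-bit entry).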
2.2 Why This Matters
This project represents the culmination of virtualization knowledge:
You Will Have Built What Powers the Cloud:
- Every AWS EC2 instance
- Every Google Compute Engine VM
- Every Azure Virtual Machine
- All rely on the same hardware-assisted virtualization techniques you are building on
Career-Defining Skills:
- Deep systems expertise rare in the industry
- Ability to debug any virtualization issue
- Understanding of kernel/hypervisor boundaries
- Skills valued at $300K+ in specialized roles
Foundation for Advanced Work:
- Hypervisor security research
- Cloud infrastructure optimization
- MicroVM platforms (Firecracker-like systems)
- Hardware/software co-design
2.3 Historical Context
Evolution of Hypervisor Completeness:
1999-2005: VMware Workstation
- Full system virtualization without hardware support
- Binary translation for privileged code
- Proved complete virtualization was possible
2006-2007: KVM + QEMU
- Hardware-assisted virtualization (VT-x)
- QEMU for device models
- Made open-source virtualization viable
2008-2012: Cloud Era
- EC2, OpenStack, CloudStack
- Hypervisors became infrastructure
- SMP, live migration, hot-plug
2017-Present: Specialized VMMs
- Firecracker: Minimal, fast boot
- gVisor: User-space kernel sandbox
- Kata Containers: Container-VM hybrid
Your Project: Matches KVM+QEMU circa 2010
- Full Linux boot
- SMP support
- virtio devices
- APIC emulation
2.4 Common Misconceptions
Misconception 1: “SMP just means multiple VCPUs”
- Reality: SMP requires APIC emulation, IPI handling, proper startup sequence (INIT/SIPI), memory ordering guarantees, and shared device access synchronization.
Misconception 2: “The kernel just boots if you load it right”
- Reality: The kernel requires correct boot parameters, memory map (E820), ACPI tables (optional but helpful), and proper initial CPU state.
Misconception 3: “Interrupts are simple”
- Reality: APIC emulation alone requires understanding of LVT, ICR, EOI, ISR, IRR, TPR, priority arbitration, and timer configuration.
Misconception 4: “virtio is just another device”
- Reality: virtio requires vring implementation, feature negotiation, proper interrupt injection, and backend thread coordination.
3. Project Specification
3.1 What You Will Build
A complete Type-2 hypervisor that can:
- Boot a standard Linux distribution (Ubuntu, Alpine, etc.)
- Support multiple VCPUs with proper SMP semantics
- Provide virtio-blk storage (boot disk)
- Provide virtio-net networking (optional but recommended)
- Emulate enough hardware for a functional system
3.2 Functional Requirements
Core Requirements:
- SMP: Support 1-16 VCPUs
- Memory: Support 256MB - 16GB guest RAM
- Boot: Load and execute standard bzImage + initrd
- Console: Serial console via emulated 16550 UART
- Storage: virtio-blk device with qcow2 or raw image support
- Interrupts: Full APIC (local + I/O APIC) emulation
Advanced Requirements:
- Network: virtio-net with TAP backend
- Timing: PIT and APIC timer emulation
- ACPI: Minimal ACPI tables for proper shutdown
- Debug: GDB stub for kernel debugging (optional)
3.3 Non-Functional Requirements
- Boot Time: < 10 seconds to kernel prompt
- Stability: Run for hours without crashes
- Performance: Reasonable for interactive use
- Memory: < 100MB hypervisor overhead
- Maintainability: Well-structured, documented code
3.4 Example Usage / Output
$ ./myhypervisor \
--cpus 4 \
--memory 2G \
--kernel /boot/vmlinuz-5.15.0 \
--initrd /boot/initrd.img-5.15.0 \
--append "console=ttyS0 root=/dev/vda1 rw" \
--disk ubuntu.qcow2 \
--net tap,ifname=tap0
╔══════════════════════════════════════════════════════════════════════════════╗
║ My Hypervisor v1.0 ║
║ A Complete Type-2 Hypervisor ║
╠══════════════════════════════════════════════════════════════════════════════╣
║ Configuration: ║
║ VCPUs: 4 ║
║ RAM: 2048 MB ║
║ Kernel: /boot/vmlinuz-5.15.0 ║
║ Initrd: /boot/initrd.img-5.15.0 ║
║ Cmdline: console=ttyS0 root=/dev/vda1 rw ║
║ Disk: ubuntu.qcow2 (virtio-blk) ║
║ Network: tap0 (virtio-net) ║
╚══════════════════════════════════════════════════════════════════════════════╝
[BOOT] Opening /dev/kvm...
[BOOT] KVM API version: 12
[BOOT] Creating VM (4 VCPUs, 2048 MB RAM)...
[MEMORY] Allocating guest RAM...
[MEMORY] Guest RAM: GPA 0x00000000 - 0x80000000 (2048 MB)
[MEMORY] EPT configured with 1024 2MB pages
[APIC] Initializing I/O APIC at 0xFEC00000
[APIC] Initializing Local APIC at 0xFEE00000 (per-VCPU)
[KERNEL] Loading bzImage (7,234,560 bytes)...
[KERNEL] Setup header version: 2.15
[KERNEL] Protected mode kernel at 0x100000
[KERNEL] Loading initrd (32,456,789 bytes) at 0x7F000000...
[KERNEL] Command line at 0x20000
[BOOT] Setting up boot parameters...
[BOOT] E820 memory map:
0x0000000000000000 - 0x000000000009FFFF: Usable (640 KB)
0x00000000000A0000 - 0x00000000000FFFFF: Reserved (384 KB)
0x0000000000100000 - 0x000000007EFFFFFF: Usable (2031 MB)
0x000000007F000000 - 0x000000007FFFFFFF: Reserved (initrd)
[VCPU] Creating VCPU 0 (BSP)...
[VCPU] Creating VCPU 1 (AP)...
[VCPU] Creating VCPU 2 (AP)...
[VCPU] Creating VCPU 3 (AP)...
[DEVICE] virtio-blk: Initialized with ubuntu.qcow2
[DEVICE] virtio-net: Initialized with tap0, MAC 52:54:00:12:34:56
[BOOT] Starting VCPU 0...
[BOOT] ═══════════════════════════════════════════════════════════════════════
[ 0.000000] Linux version 5.15.0 (gcc 11.2.0) ...
[ 0.000000] Command line: console=ttyS0 root=/dev/vda1 rw
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009ffff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000000a0000-0x00000000000fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000007effffff] usable
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] DMI not present or invalid.
[ 0.000000] Hypervisor detected: KVM
[ 0.002345] smpboot: Allowing 4 CPUs, 0 hotplug CPUs
[ 0.003456] setup_percpu: NR_CPUS:256 nr_cpumask_bits:256 nr_cpu_ids:4
[ 0.012345] smpboot: CPU 0 Converting physical 0 to logical
[ 0.023456] APIC: Switch to symmetric I/O mode setup
[ 0.034567] smpboot: Booting Node 0 Processor 1 APIC 0x1
[ 0.045678] smpboot: Booting Node 0 Processor 2 APIC 0x2
[ 0.056789] smpboot: Booting Node 0 Processor 3 APIC 0x3
[ 0.067890] smpboot: Total of 4 processors activated
[VCPU 1] Started via INIT/SIPI
[VCPU 2] Started via INIT/SIPI
[VCPU 3] Started via INIT/SIPI
[ 0.123456] virtio_blk virtio0: [vda] 41943040 512-byte sectors (21.5 GB)
[ 0.134567] virtio_net virtio1: MAC: 52:54:00:12:34:56
[ 0.145678] virtio_net virtio1: Host supports net-announce
[ 0.234567] EXT4-fs (vda1): mounted filesystem
[ 1.234567] Run /sbin/init as init process
Ubuntu 22.04 LTS myvm ttyS0
myvm login: root
Password:
Welcome to Ubuntu 22.04 LTS (Jammy Jellyfish)!
root@myvm:~# cat /proc/cpuinfo | grep processor
processor : 0
processor : 1
processor : 2
processor : 3
root@myvm:~# free -m
total used free
Mem: 2048 234 1814
root@myvm:~# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda 254:0 0 21.5G 0 disk
└─vda1 254:1 0 21.5G 0 part /
root@myvm:~# ip addr show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP>
inet 10.0.2.15/24 scope global eth0
root@myvm:~# ping -c 3 8.8.8.8
PING 8.8.8.8: 64 bytes from 8.8.8.8: seq=0 ttl=64 time=1.2ms
PING 8.8.8.8: 64 bytes from 8.8.8.8: seq=1 ttl=64 time=0.8ms
PING 8.8.8.8: 64 bytes from 8.8.8.8: seq=2 ttl=64 time=0.9ms
root@myvm:~# uptime
12:34:56 up 5 min, 1 user, load average: 0.00, 0.00, 0.00
root@myvm:~# poweroff
[ 123.456789] reboot: Power down
[HYPERVISOR] ═════════════════════════════════════════════════════════════════
╔══════════════════════════════════════════════════════════════════════════════╗
║ Session Statistics ║
╠══════════════════════════════════════════════════════════════════════════════╣
║ Runtime: 5 minutes 23 seconds ║
║ ║
║ VCPU Statistics: ║
║ VCPU 0: 1,234,567,890 cycles, 45,678 exits (I/O: 12,345, HLT: 33,333) ║
║ VCPU 1: 987,654,321 cycles, 34,567 exits (I/O: 9,876, HLT: 24,691) ║
║ VCPU 2: 876,543,210 cycles, 23,456 exits (I/O: 7,654, HLT: 15,802) ║
║ VCPU 3: 765,432,109 cycles, 12,345 exits (I/O: 5,432, HLT: 6,913) ║
║ ║
║ Interrupt Statistics: ║
║ Timer interrupts: 32,456 ║
║ Disk interrupts: 1,234 ║
║ Network interrupts: 567 ║
║ IPIs: 4,567 ║
║ ║
║ Device Statistics: ║
║ virtio-blk: 23,456 reads, 1,234 writes ║
║ virtio-net: 5,678 packets TX, 4,321 packets RX ║
║ Serial: 123,456 bytes TX ║
║ ║
╚══════════════════════════════════════════════════════════════════════════════╝
[HYPERVISOR] Shutdown complete.
4. Solution Architecture
4.1 High-Level Design
┌────────────────────────────────────────────────────────────────────────────┐
│ Complete Type-2 Hypervisor Architecture │
├────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ Main Components │ │
│ │ │ │
│ │ main.c │ │
│ │ ├── Parse arguments │ │
│ │ ├── Initialize hypervisor │ │
│ │ ├── Start VCPU threads │ │
│ │ ├── Run event loop │ │
│ │ └── Cleanup │ │
│ │ │ │
│ │ ┌─────────────────────────────────────────────────────────────────┐ │ │
│ │ │ hypervisor.c │ │ │
│ │ │ │ │ │
│ │ │ struct hypervisor { │ │ │
│ │ │ int kvm_fd, vm_fd; │ │ │
│ │ │ struct vcpu vcpus[MAX_VCPUS]; │ │ │
│ │ │ int num_vcpus; │ │ │
│ │ │ void *ram; │ │ │
│ │ │ size_t ram_size; │ │ │
│ │ │ struct ioapic ioapic; │ │ │
│ │ │ struct device *devices; │ │ │
│ │ │ struct event_loop *loop; │ │ │
│ │ │ }; │ │ │
│ │ │ │ │ │
│ │ └─────────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ └───────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────────┐│
│ │ VCPU Subsystem ││
│ │ ││
│ │ vcpu.c ││
│ │ ┌───────────────────────────────────────────────────────────────────┐││
│ │ │ struct vcpu { │││
│ │ │ int vcpu_id; │││
│ │ │ int vcpu_fd; │││
│ │ │ struct kvm_run *run; │││
│ │ │ pthread_t thread; │││
│ │ │ struct local_apic lapic; │││
│ │ │ bool is_bsp; // Bootstrap processor? │││
│ │ │ enum { WAIT_SIPI, RUNNING, HALTED } state; │││
│ │ │ }; │││
│ │ └───────────────────────────────────────────────────────────────────┘││
│ │ ││
│ │ void *vcpu_thread(void *arg) { ││
│ │ struct vcpu *vcpu = arg; ││
│ │ ││
│ │ if (!vcpu->is_bsp) ││
│ │ wait_for_sipi(vcpu); ││
│ │ ││
│ │ while (vcpu->state == RUNNING) { ││
│ │ ioctl(vcpu->vcpu_fd, KVM_RUN, 0); ││
│ │ handle_exit(vcpu, vcpu->run); ││
│ │ } ││
│ │ return NULL; ││
│ │ } ││
│ │ ││
│ └─────────────────────────────────────────────────────────────────────────┘│
│ │
│ ┌─────────────────────────────────────────────────────────────────────────┐│
│ │ APIC Subsystem ││
│ │ ││
│ │ apic.c ││
│ │ ┌───────────────────────────────────────────────────────────────────┐││
│ │ │ Local APIC (per-VCPU): │││
│ │ │ - Timer with interrupt generation │││
│ │ │ - IPI sending (ICR write handling) │││
│ │ │ - Interrupt priority arbitration │││
│ │ │ - EOI processing │││
│ │ │ │││
│ │ │ I/O APIC (shared): │││
│ │ │ - 24 redirection entries │││
│ │ │ - Route device IRQs to VCPU(s) │││
│ │ │ - Level vs edge triggering │││
│ │ └───────────────────────────────────────────────────────────────────┘││
│ │ ││
│ └─────────────────────────────────────────────────────────────────────────┘│
│ │
│ ┌─────────────────────────────────────────────────────────────────────────┐│
│ │ Device Subsystem ││
│ │ ││
│ │ devices/ ││
│ │ ├── serial.c - 16550 UART ││
│ │ ├── pit.c - 8254 Timer (optional, APIC timer preferred) ││
│ │ ├── rtc.c - Real-time clock ││
│ │ ├── virtio_blk.c - Block device ││
│ │ ├── virtio_net.c - Network device ││
│ │ └── pci.c - PCI bus for virtio devices ││
│ │ ││
│ │ Event Loop: ││
│ │ ┌───────────────────────────────────────────────────────────────────┐││
│ │ │ while (running) { │││
│ │ │ epoll_wait(); │││
│ │ │ if (disk_io_ready) virtio_blk_complete(); │││
│ │ │ if (net_rx_ready) virtio_net_receive(); │││
│ │ │ if (serial_input) serial_receive(); │││
│ │ │ if (timer_expired) inject_timer_interrupt(); │││
│ │ │ } │││
│ │ └───────────────────────────────────────────────────────────────────┘││
│ │ ││
│ └─────────────────────────────────────────────────────────────────────────┘│
│ │
│ ┌─────────────────────────────────────────────────────────────────────────┐│
│ │ Boot Subsystem ││
│ │ ││
│ │ boot.c ││
│ │ ├── load_bzimage() - Parse and load Linux kernel ││
│ │ ├── load_initrd() - Load initial ramdisk ││
│ │ ├── setup_boot_params() - Fill struct boot_params ││
│ │ ├── setup_e820() - Create memory map ││
│ │ └── setup_vcpu_boot() - Initialize BSP for kernel entry ││
│ │ ││
│ └─────────────────────────────────────────────────────────────────────────┘│
│ │
└────────────────────────────────────────────────────────────────────────────┘
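The event loop in the device subsystem box can be sketched with `epoll` as follows. This is a minimal, hedged illustration: `event_loop_sketch`, `loop_init`, `loop_add`, `loop_run_once`, and the example callback `count_wakeup` are hypothetical names, and the fixed handler table is a simplification. Each registered file descriptor (TAP device, stdin, timerfd, and so on) carries a callback that the loop invokes when the descriptor becomes readable.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

#define LOOP_MAX_HANDLERS 16

typedef void (*loop_cb)(int fd, void *opaque);

struct loop_handler {
    int     fd;
    loop_cb cb;
    void   *opaque;
};

/* Hypothetical event loop state: one epoll instance plus a handler table. */
struct event_loop_sketch {
    int epfd;
    int count;
    struct loop_handler handlers[LOOP_MAX_HANDLERS];
};

int loop_init(struct event_loop_sketch *loop) {
    loop->count = 0;
    loop->epfd = epoll_create1(0);
    return loop->epfd < 0 ? -1 : 0;
}

/* Register fd for readability events; cb runs from loop_run_once(). */
int loop_add(struct event_loop_sketch *loop, int fd, loop_cb cb, void *opaque) {
    assert(cb != NULL);
    if (loop->count == LOOP_MAX_HANDLERS)
        return -1;
    struct loop_handler *h = &loop->handlers[loop->count];
    h->fd = fd;
    h->cb = cb;
    h->opaque = opaque;
    struct epoll_event ev = { .events = EPOLLIN, .data.ptr = h };
    if (epoll_ctl(loop->epfd, EPOLL_CTL_ADD, fd, &ev) < 0)
        return -1;
    loop->count++;
    return 0;
}

/* Wait up to timeout_ms and dispatch ready handlers.
 * Returns the number of events handled, or -1 on error. */
int loop_run_once(struct event_loop_sketch *loop, int timeout_ms) {
    struct epoll_event events[8];
    int n = epoll_wait(loop->epfd, events, 8, timeout_ms);
    for (int i = 0; i < n; i++) {
        struct loop_handler *h = events[i].data.ptr;
        h->cb(h->fd, h->opaque);
    }
    return n;
}

/* Example callback: drain one byte and count the wakeup. */
void count_wakeup(int fd, void *opaque) {
    char byte;
    if (read(fd, &byte, 1) == 1)
        (*(int *)opaque)++;
}
```

In the hypervisor, a callback would typically complete the device work (for example, copy a received packet into a virtqueue) and then raise the device's IRQ line.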
4.2 Project Structure
myhypervisor/
├── Makefile
├── include/
│ ├── hypervisor.h # Main hypervisor structures
│ ├── vcpu.h # VCPU management
│ ├── memory.h # Memory subsystem
│ ├── apic.h # APIC emulation
│ ├── ioapic.h # I/O APIC
│ ├── boot.h # Boot loader
│ ├── pci.h # PCI bus
│ ├── virtio.h # virtio common
│ └── devices/
│ ├── serial.h
│ ├── virtio_blk.h
│ └── virtio_net.h
├── src/
│ ├── main.c # Entry point
│ ├── hypervisor.c # Core hypervisor logic
│ ├── vcpu.c # VCPU creation and threading
│ ├── memory.c # Memory setup
│ ├── apic.c # Local APIC emulation
│ ├── ioapic.c # I/O APIC emulation
│ ├── boot.c # Linux boot protocol
│ ├── pci.c # PCI configuration space
│ ├── eventloop.c # epoll-based event loop
│ └── devices/
│ ├── serial.c # 16550 UART
│ ├── pit.c # 8254 timer
│ ├── rtc.c # RTC
│ ├── virtio_blk.c # Block device
│ └── virtio_net.c # Network device
├── scripts/
│ ├── run.sh # Launch script
│ └── create_disk.sh # Create disk image
└── tests/
├── test_apic.c
├── test_virtio.c
└── integration/
└── boot_linux.sh
5. Implementation Guide
5.1 Development Environment
# Create project directory
mkdir myhypervisor && cd myhypervisor
# Install dependencies
sudo apt-get install build-essential libpthread-stubs0-dev
# Get test kernel and initrd
# Option 1: Use host kernel
cp /boot/vmlinuz-$(uname -r) ./vmlinuz
cp /boot/initrd.img-$(uname -r) ./initrd.img
# Option 2: Build minimal kernel (recommended)
# Follow kernel.org instructions with minimal config
# Create test disk image
dd if=/dev/zero of=disk.img bs=1M count=1024
mkfs.ext4 disk.img
# Or use qcow2 (qemu-img is provided by the qemu-utils package)
qemu-img create -f qcow2 disk.qcow2 10G
5.2 Implementation Phases
This project is substantial. Follow this phased approach:
Phase 1: Single-VCPU Linux Boot (Weeks 1-4)
Goal: Boot Linux kernel to panic (no root filesystem)
Milestones:
- KVM setup with proper memory regions
- bzImage loading and parsing
- Boot parameters setup
- 64-bit entry with identity-mapped page tables
- Kernel decompresses and runs
- Panic at “no init found” or similar
Key Code:
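/* boot.c - Load and parse bzImage */
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>
#include <linux/types.h>

/* Setup header as stored at file offset 0x1F1; packed to match that layout */
struct setup_header {
    __u8  setup_sects;
    __u16 root_flags;
    __u32 syssize;
    __u16 ram_size;
    __u16 vid_mode;
    __u16 root_dev;
    __u16 boot_flag;        /* must be 0xAA55 */
    __u16 jump;
    __u32 header;           /* "HdrS" magic */
    __u16 version;
    __u32 realmode_swtch;
    __u16 start_sys_seg;
    __u16 kernel_version;
    __u8  type_of_loader;
    __u8  loadflags;
    __u16 setup_move_size;
    __u32 code32_start;
    __u32 ramdisk_image;
    __u32 ramdisk_size;
    // ...
} __attribute__((packed));

int load_bzimage(struct hypervisor *hv, const char *path) {
    struct setup_header header;
    struct stat st;
    int fd = open(path, O_RDONLY);
    if (fd < 0 || fstat(fd, &st) < 0) {
        perror(path);
        if (fd >= 0)
            close(fd);
        return -1;
    }
    /* Read and validate header */
    if (pread(fd, &header, sizeof(header), 0x1F1) != (ssize_t)sizeof(header) ||
        header.boot_flag != 0xAA55) {
        fprintf(stderr, "Invalid bzImage\n");
        close(fd);
        return -1;
    }
    /* Calculate offsets (the boot protocol treats setup_sects == 0 as 4) */
    size_t setup_size = ((header.setup_sects ? header.setup_sects : 4) + 1) * 512;
    if ((size_t)st.st_size <= setup_size ||
        0x100000 + ((size_t)st.st_size - setup_size) > hv->ram_size) {
        fprintf(stderr, "bzImage is truncated or does not fit in guest RAM\n");
        close(fd);
        return -1;
    }
    size_t kernel_size = (size_t)st.st_size - setup_size;
    /* Load real-mode setup */
    ssize_t got_setup = pread(fd, (char *)hv->ram + 0x10000, setup_size, 0);
    /* Load protected-mode kernel */
    ssize_t got_kernel = pread(fd, (char *)hv->ram + 0x100000, kernel_size,
                               (off_t)setup_size);
    close(fd);
    return (got_setup == (ssize_t)setup_size &&
            got_kernel == (ssize_t)kernel_size) ? 0 : -1;
}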
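The "64-bit entry with identity-mapped page tables" milestone can be sketched as below. This is an illustration under assumptions, not the guide's implementation: the table locations (0x9000 to 0xBFFF), the function name `build_identity_map`, and the choice to map only the first 1 GB with 2 MB pages are all hypothetical. The mapping only needs to cover the loaded kernel, the boot parameters, and the command line, since the kernel builds its own page tables early in boot.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical guest-physical locations for the three table pages. */
#define BOOT_PML4_GPA  0x9000
#define BOOT_PDPT_GPA  0xA000
#define BOOT_PD_GPA    0xB000

#define PTE_PRESENT  (1ull << 0)
#define PTE_WRITE    (1ull << 1)
#define PTE_HUGE     (1ull << 7)   /* PS bit: 2 MB page in a PD entry */

/* Control-register bits needed for long mode. */
#define CR0_PE    (1ull << 0)
#define CR0_PG    (1ull << 31)
#define CR4_PAE   (1ull << 5)
#define EFER_LME  (1ull << 8)
#define EFER_LMA  (1ull << 10)

/* Store a 64-bit entry into guest RAM (little-endian x86 host assumed). */
static void put_u64(uint8_t *ram, uint64_t gpa, uint64_t val) {
    memcpy(ram + gpa, &val, sizeof(val));
}

/* Identity-map the first 1 GB of guest memory using 512 pages of 2 MB.
 * Returns the value to load into CR3. */
uint64_t build_identity_map(uint8_t *ram, size_t ram_size) {
    assert(ram_size >= BOOT_PD_GPA + 0x1000);
    memset(ram + BOOT_PML4_GPA, 0, 0x3000);   /* clear all three pages */

    /* PML4[0] -> PDPT, PDPT[0] -> PD */
    put_u64(ram, BOOT_PML4_GPA, BOOT_PDPT_GPA | PTE_PRESENT | PTE_WRITE);
    put_u64(ram, BOOT_PDPT_GPA, BOOT_PD_GPA | PTE_PRESENT | PTE_WRITE);

    /* PD[i] maps guest-virtual i * 2 MB to the same guest-physical address */
    for (uint64_t i = 0; i < 512; i++)
        put_u64(ram, BOOT_PD_GPA + i * 8,
                (i << 21) | PTE_PRESENT | PTE_WRITE | PTE_HUGE);

    return BOOT_PML4_GPA;
}
```

To enter long mode, the returned value goes into `sregs.cr3`, and the `CR0_PE`, `CR0_PG`, `CR4_PAE`, `EFER_LME`, and `EFER_LMA` bits are set in `struct kvm_sregs` together with a 64-bit code segment before calling `KVM_SET_SREGS`.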
Phase 2: Add Serial Console (Week 5)
Goal: See kernel output
Key Points:
- Implement 16550 UART at I/O ports 0x3F8-0x3FF
- Connect to stdout/stdin
- Handle LSR (Line Status Register) for TX ready
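The register handling behind these points can be sketched as follows. This is a simplified, polling-only model under stated assumptions: `uart_sketch`, `uart_write`, `uart_read`, and `uart_receive` are hypothetical names, transmitted bytes are captured in a buffer rather than written to stdout, and no interrupts or FIFOs are emulated. It is enough to show why the LSR matters: the guest driver checks THRE before each transmit and DR before each receive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UART_LSR_DR    0x01   /* receive data ready */
#define UART_LSR_THRE  0x20   /* transmit holding register empty */
#define UART_LSR_TEMT  0x40   /* transmitter empty */
#define UART_LCR_DLAB  0x80   /* divisor latch access bit */

/* Hypothetical device state; offsets are relative to I/O port 0x3F8. */
struct uart_sketch {
    uint8_t ier, lcr, mcr, scr, dll, dlm;
    uint8_t rx_byte;
    int     rx_ready;
    char    out[256];     /* captured TX bytes; a real device would use stdout */
    size_t  out_len;
};

/* Guest OUT instruction to port 0x3F8 + offset. */
void uart_write(struct uart_sketch *u, unsigned offset, uint8_t val) {
    assert(offset < 8);
    int dlab = u->lcr & UART_LCR_DLAB;
    switch (offset) {
    case 0:   /* THR, or divisor low byte when DLAB is set */
        if (dlab)
            u->dll = val;
        else if (u->out_len < sizeof(u->out))
            u->out[u->out_len++] = (char)val;
        break;
    case 1:   /* IER, or divisor high byte when DLAB is set */
        if (dlab)
            u->dlm = val;
        else
            u->ier = val & 0x0F;
        break;
    case 3: u->lcr = val; break;
    case 4: u->mcr = val; break;
    case 7: u->scr = val; break;
    default: break;   /* FCR (2) ignored; LSR and MSR are read-only here */
    }
}

/* Guest IN instruction from port 0x3F8 + offset. */
uint8_t uart_read(struct uart_sketch *u, unsigned offset) {
    assert(offset < 8);
    int dlab = u->lcr & UART_LCR_DLAB;
    switch (offset) {
    case 0:   /* RBR, or divisor low byte when DLAB is set */
        if (dlab)
            return u->dll;
        u->rx_ready = 0;
        return u->rx_byte;
    case 1: return dlab ? u->dlm : u->ier;
    case 2: return 0x01;   /* IIR: no interrupt pending */
    case 3: return u->lcr;
    case 4: return u->mcr;
    case 5:   /* LSR: transmitter always ready, DR reflects pending input */
        return UART_LSR_THRE | UART_LSR_TEMT | (u->rx_ready ? UART_LSR_DR : 0);
    case 6: return 0xB0;   /* MSR: DCD, DSR, and CTS asserted */
    default: return u->scr;
    }
}

/* Host side: hand one input byte (for example, from stdin) to the guest. */
void uart_receive(struct uart_sketch *u, uint8_t ch) {
    u->rx_byte = ch;
    u->rx_ready = 1;
}
```

These functions would be called from the `KVM_EXIT_IO` handler for ports 0x3F8 to 0x3FF, with `offset = port - 0x3F8`.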
Phase 3: Add Storage (Weeks 6-8)
Goal: Mount root filesystem
Key Points:
- Implement virtio-blk device
- Handle virtqueue requests
- Support raw or qcow2 images
- Boot reaches init
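Handling virtqueue requests comes down to walking descriptor chains from the available ring and publishing results on the used ring. The sketch below shows that cycle for a split virtqueue. It is a simplified illustration: `virtqueue_sketch`, `vq_process`, and the example handler `vq_fill_writable` are hypothetical names, the ring pointers are assumed to be already translated to host addresses, and indirect descriptors and notification suppression are not handled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define VRING_DESC_F_NEXT   1   /* chain continues at desc[next] */
#define VRING_DESC_F_WRITE  2   /* device writes into this buffer */
#define VQ_MAX_CHAIN        16

struct vq_desc      { uint64_t addr; uint32_t len; uint16_t flags; uint16_t next; };
struct vq_used_elem { uint32_t id; uint32_t len; };

/* Hypothetical host-side view of one split virtqueue. */
struct virtqueue_sketch {
    uint16_t size;                    /* number of descriptors, power of two */
    struct vq_desc *desc;
    uint16_t *avail_idx;              /* driver-owned: next free avail slot */
    uint16_t *avail_ring;
    uint16_t *used_idx;               /* device-owned: next free used slot */
    struct vq_used_elem *used_ring;
    uint16_t last_avail;              /* device's private read position */
};

/* Handler receives a copy of the chain and returns bytes written to the guest. */
typedef uint32_t (*vq_handler)(const struct vq_desc *chain, int n, void *opaque);

/* Process all pending requests. Returns the number completed, or -1 if the
 * guest supplied a malformed chain. */
int vq_process(struct virtqueue_sketch *vq, vq_handler handle, void *opaque) {
    assert(vq->size != 0 && (vq->size & (vq->size - 1)) == 0);
    int completed = 0;
    while (vq->last_avail != *vq->avail_idx) {
        /* Read ring contents only after observing the new avail index */
        atomic_thread_fence(memory_order_acquire);
        uint16_t head = vq->avail_ring[vq->last_avail % vq->size];
        vq->last_avail++;

        struct vq_desc chain[VQ_MAX_CHAIN];
        int n = 0;
        uint16_t i = head;
        for (;;) {
            if (i >= vq->size || n == VQ_MAX_CHAIN)
                return -1;            /* out-of-range index or looping chain */
            chain[n++] = vq->desc[i];
            if (!(vq->desc[i].flags & VRING_DESC_F_NEXT))
                break;
            i = vq->desc[i].next;
        }

        uint32_t written = handle(chain, n, opaque);

        uint16_t slot = *vq->used_idx;
        vq->used_ring[slot % vq->size] = (struct vq_used_elem){ head, written };
        /* Publish the used element before the index that makes it visible */
        atomic_thread_fence(memory_order_release);
        *vq->used_idx = (uint16_t)(slot + 1);
        completed++;
    }
    return completed;
}

/* Example handler: pretends to fill every device-writable descriptor. */
uint32_t vq_fill_writable(const struct vq_desc *chain, int n, void *opaque) {
    (void)opaque;
    uint32_t written = 0;
    for (int i = 0; i < n; i++)
        if (chain[i].flags & VRING_DESC_F_WRITE)
            written += chain[i].len;
    return written;
}
```

A virtio-blk handler would interpret the first descriptor as the request header, the middle descriptors as data, and the last one-byte descriptor as the status field. After `vq_process` returns a positive count, the device raises its interrupt.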
Phase 4: Add APIC (Weeks 9-12)
Goal: Timer interrupts work
Key Points:
- Local APIC timer for scheduling
- I/O APIC for device interrupts
- EOI handling
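- Interrupt injection via the KVM_INTERRUPT ioctl (userspace APIC); APICv is a hardware feature that KVM uses internally only with its in-kernel irqchip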
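For the Local APIC timer, one alternative to decrementing a counter on every tick is to compute an absolute deadline when the guest writes the Initial Count register and arm a host timer for it. The helpers below sketch the arithmetic. The function names and the `ns_per_tick` parameter (the duration of one undivided timer tick) are assumptions for illustration; the Divide Configuration Register encoding uses bits 3, 1, and 0.

```c
#include <assert.h>
#include <stdint.h>

/* Decode the Divide Configuration Register (offset 0x3E0).
 * Bits 3, 1, 0 form a 3-bit selector: 0 -> /2, 1 -> /4, ... 6 -> /128, 7 -> /1. */
uint32_t apic_timer_divisor(uint32_t dcr) {
    uint32_t sel = (dcr & 0x3) | ((dcr >> 1) & 0x4);
    return sel == 7 ? 1u : 2u << sel;
}

/* Nanoseconds from an Initial Count write until the timer expires. */
uint64_t apic_timer_ns_until_expiry(uint32_t initial_count, uint32_t dcr,
                                    uint64_t ns_per_tick) {
    assert(ns_per_tick > 0);
    return (uint64_t)initial_count * apic_timer_divisor(dcr) * ns_per_tick;
}

/* Value to return when the guest reads the Current Count register. */
uint32_t apic_timer_current(uint64_t deadline_ns, uint64_t now_ns,
                            uint32_t dcr, uint64_t ns_per_tick) {
    assert(ns_per_tick > 0);
    if (now_ns >= deadline_ns)
        return 0;
    return (uint32_t)((deadline_ns - now_ns) /
                      (apic_timer_divisor(dcr) * ns_per_tick));
}
```

The deadline would be stored per VCPU (the `timer_deadline` field in `struct local_apic` suits this) and fed to a host timer such as a timerfd watched by the event loop.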
/* apic.c - Local APIC timer */
#include <stdint.h>

struct local_apic {
    uint32_t id;
    uint32_t version;
    uint32_t tpr;            /* Task Priority */
    uint32_t eoi;            /* End of Interrupt */
    uint32_t ldr;            /* Logical Destination */
    uint32_t dfr;            /* Destination Format */
    uint32_t svr;            /* Spurious Vector */
    uint32_t isr[8];         /* In-Service Register (256 bits) */
    uint32_t tmr[8];         /* Trigger Mode Register */
    uint32_t irr[8];         /* Interrupt Request Register */
    /* Timer */
    uint32_t lvt_timer;      /* Timer LVT */
    uint32_t timer_initial;  /* Initial count */
    uint32_t timer_current;  /* Current count */
    uint32_t timer_divide;   /* Divide configuration */
    uint64_t timer_deadline;
    /* IPI */
    uint64_t icr;            /* Interrupt Command Register */
};

/* Defined elsewhere in apic.c */
void apic_inject_interrupt(struct local_apic *apic, int vector);

void apic_timer_tick(struct local_apic *apic) {
    if (apic->timer_current > 0) {
        apic->timer_current--;
        if (apic->timer_current == 0) {
            /* Timer expired - inject interrupt unless the LVT entry is masked (bit 16) */
            if (!(apic->lvt_timer & (1u << 16))) {
                int vector = apic->lvt_timer & 0xFF;
                apic_inject_interrupt(apic, vector);
            }
            /* Reload if periodic (LVT bit 17) */
            if (apic->lvt_timer & (1u << 17)) {
                apic->timer_current = apic->timer_initial;
            }
        }
    }
}
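EOI handling and priority arbitration interact through the IRR, ISR, and TPR. The sketch below shows one way to model that logic in isolation. It uses its own reduced state (`lapic_sketch`) and hypothetical function names rather than the guide's `struct local_apic`, and it ignores trigger modes and the I/O APIC EOI broadcast for level-triggered interrupts.

```c
#include <assert.h>
#include <stdint.h>

/* Reduced Local APIC state: only what priority arbitration needs. */
struct lapic_sketch {
    uint32_t tpr;      /* Task Priority Register */
    uint32_t isr[8];   /* In-Service Register (256 bits) */
    uint32_t irr[8];   /* Interrupt Request Register (256 bits) */
};

/* Highest set bit in a 256-bit register, or -1 if empty. */
static int highest_set(const uint32_t reg[8]) {
    for (int word = 7; word >= 0; word--)
        for (int bit = 31; bit >= 0; bit--)
            if (reg[word] & (1u << bit))
                return word * 32 + bit;
    return -1;
}

/* A device or IPI requests delivery of a vector. */
void lapic_set_irr(struct lapic_sketch *a, int vector) {
    assert(vector >= 16 && vector <= 255);
    a->irr[vector / 32] |= 1u << (vector % 32);
}

/* Processor priority: the higher of the TPR class and the class of the
 * highest in-service vector. */
uint32_t lapic_ppr(const struct lapic_sketch *a) {
    int isrv = highest_set(a->isr);
    uint32_t isr_class = isrv < 0 ? 0 : ((uint32_t)isrv & 0xF0);
    return (a->tpr & 0xF0) >= isr_class ? (a->tpr & 0xFF) : isr_class;
}

/* Vector that may be injected now, or -1 if nothing beats the current priority. */
int lapic_next_vector(const struct lapic_sketch *a) {
    int vector = highest_set(a->irr);
    if (vector < 0 || ((uint32_t)vector & 0xF0) <= (lapic_ppr(a) & 0xF0))
        return -1;
    return vector;
}

/* Called when the vector is actually injected into the VCPU. */
void lapic_accept(struct lapic_sketch *a, int vector) {
    assert(vector >= 16 && vector <= 255);
    a->irr[vector / 32] &= ~(1u << (vector % 32));
    a->isr[vector / 32] |= 1u << (vector % 32);
}

/* Guest wrote the EOI register: retire the highest in-service vector. */
void lapic_eoi(struct lapic_sketch *a) {
    int vector = highest_set(a->isr);
    if (vector >= 0)
        a->isr[vector / 32] &= ~(1u << (vector % 32));
}
```

A lower-priority vector stays pending in the IRR until the guest writes EOI for the higher-priority one, which is why a missing EOI path shows up as interrupts that fire once and never again.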
Phase 5: Add SMP (Weeks 13-16)
Goal: Multiple VCPUs running
Key Points:
- Create multiple VCPU threads
- Handle INIT/SIPI sequence
- IPI via APIC ICR
- Proper synchronization
/* smp.c - VCPU startup sequence */
#include <pthread.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

void handle_apic_icr_write(struct vcpu *sender, uint64_t icr) {
    int delivery_mode = (icr >> 8) & 0x7;
    int dest = (icr >> 56) & 0xFF;   /* physical destination (xAPIC format) */
    /* Find target VCPU; logical mode and shorthands are not handled here */
    struct vcpu *target = find_vcpu_by_apic_id(dest);
    (void)sender;

    if (!target)
        return;

    switch (delivery_mode) {
    case 5: /* INIT */
        /* INIT resets VCPU to known state */
        reset_vcpu(target);
        target->state = WAIT_SIPI;
        break;
    case 6: /* SIPI (Startup IPI) */
        if (target->state == WAIT_SIPI) {
            /* Vector specifies start address / 0x1000 */
            int vector = icr & 0xFF;
            struct kvm_sregs sregs;
            struct kvm_regs regs = {0};
            /* The AP starts in real mode at CS:IP = (vector << 8):0000 */
            ioctl(target->vcpu_fd, KVM_GET_SREGS, &sregs);
            sregs.cs.selector = vector << 8;
            sregs.cs.base = (uint64_t)vector << 12;
            ioctl(target->vcpu_fd, KVM_SET_SREGS, &sregs);
            regs.rip = 0;
            regs.rflags = 0x2;   /* bit 1 of RFLAGS is always set */
            ioctl(target->vcpu_fd, KVM_SET_REGS, &regs);
            target->state = RUNNING;
            pthread_cond_signal(&target->start_cond);
        }
        break;
    }
}
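The handler above resolves only a single physical destination. A fuller IPI path also has to honor the destination shorthand (ICR bits 19:18) and logical destination mode (bit 11). The sketch below computes the set of target VCPUs as a bitmask. It is an illustration under assumptions: `ipi_targets` is a hypothetical name, APIC IDs are assumed to equal VCPU indices, and logical mode is modeled only for the flat model, where each VCPU's logical ID is a one-hot byte taken from LDR bits 31:24.

```c
#include <assert.h>
#include <stdint.h>

/* Return a bitmask of VCPU indices that an ICR write targets.
 *   icr         - full 64-bit Interrupt Command Register value
 *   sender      - index of the VCPU that wrote the ICR
 *   num_vcpus   - number of VCPUs in the guest (1..32)
 *   logical_ids - per-VCPU logical ID byte (LDR bits 31:24), flat model */
uint32_t ipi_targets(uint64_t icr, int sender, int num_vcpus,
                     const uint8_t *logical_ids) {
    assert(num_vcpus > 0 && num_vcpus <= 32);
    assert(sender >= 0 && sender < num_vcpus);

    uint32_t all  = num_vcpus == 32 ? 0xFFFFFFFFu : (1u << num_vcpus) - 1;
    uint32_t self = 1u << sender;

    /* Destination shorthand overrides the destination field */
    switch ((icr >> 18) & 0x3) {
    case 1: return self;          /* self */
    case 2: return all;           /* all including self */
    case 3: return all & ~self;   /* all excluding self */
    default: break;               /* 0: use the destination field */
    }

    uint8_t dest = (uint8_t)(icr >> 56);
    uint32_t mask = 0;

    if ((icr >> 11) & 0x1) {
        /* Logical mode, flat model: dest is a bitmap matched against each LDR */
        for (int i = 0; i < num_vcpus; i++)
            if (logical_ids[i] & dest)
                mask |= 1u << i;
    } else if (dest == 0xFF) {
        mask = all;               /* physical broadcast */
    } else if (dest < num_vcpus) {
        mask = 1u << dest;        /* single physical APIC ID */
    }
    return mask;
}
```

The caller would then loop over the set bits and apply the delivery mode (fixed, INIT, or SIPI) to each target VCPU.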
Phase 6: Add Networking (Weeks 17-20)
Goal: Guest has network access
Key Points:
- Implement virtio-net
- TAP device backend
- NAT or bridged networking
Phase 7: Polish and Stabilize (Weeks 21-24)
Goal: Production-quality stability
Key Points:
- Handle edge cases
- Improve error messages
- Performance optimization
- Extended testing
6. Testing Strategy
6.1 Progressive Testing
# Test 1: Basic boot (no root fs)
./myhypervisor --kernel vmlinuz --memory 256M
# Expected: Kernel panic "Unable to mount root fs"
# Test 2: With initrd
./myhypervisor --kernel vmlinuz --initrd initrd.img --memory 512M
# Expected: Reaches init, may complain about missing devices
# Test 3: With disk
./myhypervisor --kernel vmlinuz --initrd initrd.img --disk ubuntu.img --memory 1G
# Expected: Boots to login prompt
# Test 4: With SMP
./myhypervisor --cpus 4 --kernel vmlinuz --initrd initrd.img --disk ubuntu.img --memory 2G
# Expected: "smpboot: Total of 4 processors activated"
# Test 5: With network
./myhypervisor --cpus 4 --memory 2G --kernel vmlinuz --initrd initrd.img --disk ubuntu.img --net tap,ifname=tap0
# Expected: Guest can ping external hosts
6.2 Stress Testing
# Run for extended period
./myhypervisor --cpus 4 --memory 4G --kernel vmlinuz --initrd initrd.img --disk ubuntu.img &
sleep 3600 # 1 hour
# Check if still running, check dmesg for errors
# CPU stress test
# In guest:
$ stress --cpu 4 --timeout 600
# I/O stress test
# In guest:
$ fio --name=test --rw=randwrite --size=1G
7. Common Pitfalls
| Problem | Root Cause | Fix |
|---|---|---|
| Kernel doesn’t decompress | Wrong load address | Check boot_params setup |
| “No init found” | initrd not loaded | Verify ramdisk_image/size |
| Timer interrupts not firing | APIC timer not configured | Check LVT timer register |
| APs don’t start | SIPI handling wrong | Debug APIC ICR writes |
| Guest hangs on disk access | virtio queue bug | Check vring implementation |
| Network packets dropped | TAP misconfigured | Check TAP and bridge setup |
8. Resources
Primary References
Code References
- QEMU - Reference implementation
- Firecracker - Minimal VMM
- kvmtool - Simple KVM VMM
- crosvm - Chrome OS VMM
9. Self-Assessment Checklist
Core Functionality
- VM boots Linux kernel
- Serial console works
- Disk access works
- Timer interrupts fire
- Multiple VCPUs work
- Guest can run for hours
Advanced Features
- Network access works
- ACPI shutdown works
- Guest sees all configured CPUs
- Performance is reasonable
Code Quality
- Clean architecture
- Error handling throughout
- No memory leaks
- Well-documented
10. Completion Criteria
Your hypervisor is complete when you can:
- Boot Ubuntu/Debian/Alpine to login prompt
- Run with 4 VCPUs and guest sees all 4
- Run stress tests without crashes
- Access network from guest (ping external hosts)
- Run for 24+ hours without issues
- Shutdown cleanly via guest poweroff
Congratulations! You’ve built a complete hypervisor comparable to QEMU/KVM circa 2010. You now possess rare, deep systems knowledge that places you among the most skilled systems programmers in the industry.
“After completing this project, you’ll have gone from ‘I don’t know if QEMU is a hypervisor’ to ‘I built my own hypervisor that runs Linux.’ You’ll understand virtualization at every level: hardware-assisted CPU virtualization, memory virtualization with EPT, interrupt virtualization with APIC, and device emulation with virtio. This is deep systems knowledge that very few developers possess, and it will serve you throughout your career in systems programming, cloud infrastructure, and security.”