Project 11: Minimal VT-x Hypervisor (Type 2)

Build a minimal hypervisor that uses Intel VT-x to run a simple guest (real-mode code that prints to the serial port) with proper VMCS setup and VM exit handling.

Quick Reference

Attribute Value
Difficulty Expert (Level 4: The Systems Architect)
Time Estimate 4-6 weeks
Language C (alternatives: Rust)
Prerequisites Project 10 (VMX Capability Explorer), deep understanding of x86 architecture, Linux kernel module development
Key Topics Intel VT-x, VMXON, VMCS, VM Entry/Exit, VMX Root/Non-root, Guest State, Host State

1. Learning Objectives

By completing this project, you will:

  • Understand the Intel VT-x hardware virtualization architecture at a deep level
  • Learn how to enter and exit VMX operation (root mode)
  • Master the VMCS (Virtual Machine Control Structure) configuration with all mandatory fields
  • Implement proper VM entry and exit handling including VMLAUNCH and VMRESUME
  • Understand the dual-world model of VMX root vs VMX non-root operation
  • Build the foundation you need to understand production hypervisors such as KVM or Xen
  • Debug low-level hypervisor issues using Intel SDM and hardware features

2. Theoretical Foundation

2.1 Core Concepts

The VMX Dual-World Architecture

Intel VT-x introduces a new processor mode called VMX (Virtual Machine Extensions). The CPU can operate in two fundamentally different “worlds”:

┌─────────────────────────────────────────────────────────────────────┐
│                        CPU Execution Modes                           │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│   ┌──────────────────────────────────┐                              │
│   │         VMX ROOT MODE            │                              │
│   │    (Hypervisor executes here)    │                              │
│   │                                  │                              │
│   │  ┌────────────────────────────┐  │                              │
│   │  │   Ring 0 (Kernel mode)     │  │ ◄── Your hypervisor code    │
│   │  ├────────────────────────────┤  │                              │
│   │  │   Ring 3 (User mode)       │  │ ◄── Hypervisor userspace    │
│   │  └────────────────────────────┘  │                              │
│   └──────────────────────────────────┘                              │
│                    │                                                 │
│                    │ VM Exit (Trap to hypervisor)                   │
│                    │ VM Entry (VMLAUNCH/VMRESUME)                   │
│                    ▼                                                 │
│   ┌──────────────────────────────────┐                              │
│   │       VMX NON-ROOT MODE          │                              │
│   │      (Guest executes here)       │                              │
│   │                                  │                              │
│   │  ┌────────────────────────────┐  │                              │
│   │  │   Ring 0 (Guest kernel)    │  │ ◄── Guest OS kernel         │
│   │  ├────────────────────────────┤  │                              │
│   │  │   Ring 3 (Guest user)      │  │ ◄── Guest applications      │
│   │  └────────────────────────────┘  │                              │
│   └──────────────────────────────────┘                              │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘

Key Insight: The guest thinks it’s running normally with full privilege, but the hardware traps sensitive operations back to the hypervisor. This is “trap-and-emulate” done in silicon.

The VMCS (Virtual Machine Control Structure)

The VMCS is the heart of VT-x. It’s a per-virtual-CPU region of up to 4KB, accessed only through VMREAD and VMWRITE, that holds:

┌─────────────────────────────────────────────────────────────────────┐
│                     VMCS Structure (4KB)                             │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  ┌─────────────────────────────────────────────────────────────┐    │
│  │               GUEST STATE AREA                               │    │
│  │  ┌─────────────────────────────────────────────────────┐    │    │
│  │  │ Control Registers: CR0, CR3, CR4                    │    │    │
│  │  │ Segment Registers: CS, SS, DS, ES, FS, GS, LDTR, TR │    │    │
│  │  │ General Registers: RSP, RIP, RFLAGS                 │    │    │
│  │  │ Descriptor Tables: GDTR, IDTR                       │    │    │
│  │  │ MSRs: SYSENTER_CS/ESP/EIP, IA32_EFER, etc.         │    │    │
│  │  │ Activity State, Interruptibility State              │    │    │
│  │  └─────────────────────────────────────────────────────┘    │    │
│  └─────────────────────────────────────────────────────────────┘    │
│                                                                      │
│  ┌─────────────────────────────────────────────────────────────┐    │
│  │                HOST STATE AREA                               │    │
│  │  ┌─────────────────────────────────────────────────────┐    │    │
│  │  │ Control Registers: CR0, CR3, CR4                    │    │    │
│  │  │ Segment Selectors: CS, SS, DS, ES, FS, GS, TR       │    │    │
│  │  │ Base Addresses: FS/GS base, TR base, GDTR, IDTR     │    │    │
│  │  │ RSP, RIP (where to return on VM exit)               │    │    │
│  │  │ MSRs: SYSENTER_CS/ESP/EIP, IA32_EFER                │    │    │
│  │  └─────────────────────────────────────────────────────┘    │    │
│  └─────────────────────────────────────────────────────────────┘    │
│                                                                      │
│  ┌─────────────────────────────────────────────────────────────┐    │
│  │               VM-EXECUTION CONTROL FIELDS                    │    │
│  │  ┌─────────────────────────────────────────────────────┐    │    │
│  │  │ Pin-Based Controls: External/NMI/Virtual NMIs       │    │    │
│  │  │ Primary Proc-Based: HLT/MWAIT/RDPMC/CR/IO exiting  │    │    │
│  │  │ Secondary Proc-Based: EPT/VPID/Unrestricted Guest   │    │    │
│  │  │ Exception Bitmap: Which exceptions cause VM exit    │    │    │
│  │  │ I/O Bitmap, MSR Bitmap, CR0/CR4 Masks               │    │    │
│  │  └─────────────────────────────────────────────────────┘    │    │
│  └─────────────────────────────────────────────────────────────┘    │
│                                                                      │
│  ┌─────────────────────────────────────────────────────────────┐    │
│  │               VM-EXIT CONTROL FIELDS                         │    │
│  │               VM-ENTRY CONTROL FIELDS                        │    │
│  │               VM-EXIT INFORMATION FIELDS                     │    │
│  └─────────────────────────────────────────────────────────────┘    │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘

VMX Operation Lifecycle

┌────────────────────────────────────────────────────────────────────┐
│                    VMX Operation Lifecycle                          │
└────────────────────────────────────────────────────────────────────┘

     Normal CPU Operation
            │
            │ 1. Set CR4.VMXE = 1
            ▼
    ┌───────────────────┐
    │ VMXON instruction │────────────────────────────────────────────┐
    │ (Enter VMX root)  │                                            │
    └─────────┬─────────┘                                            │
              │                                                       │
              │ 2. Allocate and configure VMCS                       │
              │                                                       │
              ▼                                                       │
    ┌───────────────────┐                                            │
    │     VMCLEAR       │  Clear VMCS launch state                   │
    └─────────┬─────────┘                                            │
              │                                                       │
              ▼                                                       │
    ┌───────────────────┐                                            │
    │     VMPTRLD       │  Load VMCS as current                      │
    └─────────┬─────────┘                                            │
              │                                                       │
              │ 3. Write all VMCS fields                             │
              │    - Guest state                                      │
              │    - Host state                                       │
              │    - Control fields                                   │
              ▼                                                       │
    ┌───────────────────┐         ┌───────────────────┐              │
    │    VMLAUNCH       │────────►│   Guest Running   │              │
    │ (First VM entry)  │         │ (VMX non-root)    │              │
    └───────────────────┘         └─────────┬─────────┘              │
                                            │                         │
                                            │ 4. Sensitive operation  │
                                            │    triggers VM Exit     │
                                            ▼                         │
    ┌───────────────────┐         ┌───────────────────┐              │
    │  Handle VM Exit   │◄────────│     VM EXIT       │              │
    │  (VMX root mode)  │         │                   │              │
    └─────────┬─────────┘         └───────────────────┘              │
              │                                                       │
              │ 5. Emulate/handle                                     │
              │    the operation                                      │
              ▼                                                       │
    ┌───────────────────┐         ┌───────────────────┐              │
    │    VMRESUME       │────────►│   Guest Running   │              │
    │ (Subsequent entry)│         │ (VMX non-root)    │              │
    └───────────────────┘         └─────────┬─────────┘              │
              ▲                             │                         │
              │                             │                         │
              └─────────────────────────────┘                         │
                                                                      │
    ┌───────────────────┐                                            │
    │     VMXOFF        │◄───────────────────────────────────────────┘
    │ (Exit VMX mode)   │  6. When done, leave VMX operation
    └───────────────────┘
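
The launch-state rule in the lifecycle above (VMCLEAR, then VMLAUNCH once, then VMRESUME for every later entry) can be sketched as a small software model. This is only an illustration: `vmcs_model` and the `model_*` functions are hypothetical names, and real hardware tracks launch state internally. The error numbers 4 and 5 are the SDM's VM-instruction error numbers for these two mistakes.

```c
#include <assert.h>
#include <stddef.h>

/* Launch state the CPU tracks for each VMCS. */
enum launch_state { VMCS_CLEAR, VMCS_LAUNCHED };

/* Two VM-instruction error numbers from the Intel SDM. */
enum {
    VMERR_NONE               = 0,
    VMERR_LAUNCH_NONCLEAR    = 4,  /* VMLAUNCH with non-clear VMCS */
    VMERR_RESUME_NONLAUNCHED = 5,  /* VMRESUME with non-launched VMCS */
};

/* Hypothetical software model; real hardware keeps this state internally. */
struct vmcs_model {
    enum launch_state state;
};

/* VMCLEAR always resets the launch state to "clear". */
static inline void model_vmclear(struct vmcs_model *v)
{
    v->state = VMCS_CLEAR;
}

/* VMLAUNCH is only legal on a clear VMCS; success marks it launched. */
static inline int model_vmlaunch(struct vmcs_model *v)
{
    if (v->state != VMCS_CLEAR)
        return VMERR_LAUNCH_NONCLEAR;
    v->state = VMCS_LAUNCHED;
    return VMERR_NONE;
}

/* VMRESUME is only legal on a VMCS that has already been launched. */
static inline int model_vmresume(const struct vmcs_model *v)
{
    return v->state == VMCS_LAUNCHED ? VMERR_NONE : VMERR_RESUME_NONLAUNCHED;
}

/* Pick the correct entry instruction, as the exit-handling loop must. */
static inline int model_enter(struct vmcs_model *v)
{
    assert(v != NULL);
    return v->state == VMCS_CLEAR ? model_vmlaunch(v) : model_vmresume(v);
}
```

In the real module the same decision is usually driven by a flag such as `guest_launched`: use VMLAUNCH while it is false, VMRESUME afterwards, and reset it whenever you VMCLEAR the VMCS.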

2.2 Why This Matters

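This is what KVM, VMware ESXi, and Xen do at their core on Intel hardware (AMD CPUs provide the analogous AMD-V/SVM extensions). Every time you run a VM on an Intel laptop or an Intel-based cloud instance on AWS/GCP/Azure, this mechanism is executing:

  1. Cloud Computing: EC2 instances, GCE VMs, and Azure VMs on Intel hosts rely on VT-x
  2. Container Security: Kata Containers, and gVisor when run on its KVM platform, use hardware virtualization for isolation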
  3. Security Research: Hypervisor-based rootkits and detection tools
  4. Desktop Virtualization: VirtualBox, Parallels, VMware Workstation

Understanding VT-x gives you insight into:

  • Why some guest operations are slow (VM exits)
  • Why nested virtualization is complex
  • How to optimize virtualized workloads
  • Security boundaries between VMs

2.3 Historical Context

2005-2006: Intel introduces VT-x (Vanderpool) and AMD introduces AMD-V (Pacifica)

  • Before this, x86 virtualization required complex binary translation (VMware) or paravirtualization (Xen)
  • The x86 architecture was famously “unvirtualizable” due to sensitive but non-privileged instructions

2008: Intel adds EPT (Extended Page Tables)

  • Eliminated the need for shadow page tables
  • Massive performance improvement for memory-intensive workloads

2010+: Continuous improvements

  • VMCS shadowing for nested virtualization
  • APICv for interrupt virtualization
  • Posted interrupts, VMFUNC, etc.

Key Insight: VT-x made x86 virtualization practical. Before it, VMware had to rely on complex binary translation techniques. After it, KVM could be written as a relatively simple kernel module.

2.4 Common Misconceptions

Misconception 1: “VT-x makes virtualization fast”

  • Reality: VT-x makes virtualization correct. Speed comes from minimizing VM exits. Naive hypervisors with many VM exits can be slower than binary translation.

Misconception 2: “The guest runs with reduced privileges”

  • Reality: The guest runs with FULL privilege in its world (Ring 0 in VMX non-root). It just can’t escape to the host world without triggering a VM exit.

Misconception 3: “VMCS is just a data structure you fill in”

  • Reality: VMCS fields have complex interdependencies. Getting the configuration right requires careful study of ~200 pages of Intel SDM. Many fields have requirements like “if bit X is set, then field Y must be…”

Misconception 4: “VM exits are like system calls”

  • Reality: VM exits are much heavier. As rough orders of magnitude, a syscall might take ~100ns, while a VM exit plus re-entry can take 500-1000ns or more, depending on the CPU generation. This is why minimizing exits is crucial for performance.

3. Project Specification

3.1 What You Will Build

A Linux kernel module that:

  1. Enters VMX operation (VMX root mode)
  2. Configures a VMCS for a real-mode guest
  3. Loads guest code into memory
  4. Launches the guest using VMLAUNCH
  5. Handles VM exits (I/O, HLT, etc.)
  6. Resumes the guest using VMRESUME
  7. Cleanly exits VMX operation

The guest will be a simple real-mode program that prints “Hello from VMX guest!” to the serial port and then halts.

3.2 Functional Requirements

  • VMX Enable: Set CR4.VMXE and execute VMXON successfully
  • VMCS Configuration: Set all mandatory guest, host, and control fields
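  • Guest Initialization: Configure guest to start in 16-bit real mode (running real mode directly in VMX non-root operation requires the “unrestricted guest” control, which depends on EPT)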
  • VM Entry: Successfully execute VMLAUNCH
  • VM Exit Handling: Handle at least: HLT, I/O port access, CPUID
  • VM Resume: Execute VMRESUME to continue guest after handling exit
  • Clean Shutdown: Execute VMXOFF and unload module safely

3.3 Non-Functional Requirements

  • Hardware: Intel CPU with VT-x support (most Intel CPUs since 2006; the unrestricted-guest feature arrived with Westmere in 2010)
  • BIOS/UEFI: VT-x must be enabled in firmware
  • Safety: Must not crash the host system (kernel module safety)
  • Debuggability: Clear logging of all VMX operations and exits
  • Documentation: Comments explaining each VMCS field setting

3.4 Example Usage / Output

$ sudo insmod myhypervisor.ko
$ sudo ./launch_guest guest.bin

[HYPERVISOR] Entering VMX operation...
[HYPERVISOR] Checking VT-x support: CPUID.1:ECX.VMX = 1
[HYPERVISOR] IA32_FEATURE_CONTROL: Lock=1, VMXON enabled=1
[HYPERVISOR] Setting CR4.VMXE = 1
[HYPERVISOR] Allocating VMXON region at 0xffff888012340000
[HYPERVISOR] Writing VMCS revision ID: 0x12
[HYPERVISOR] VMXON successful!

[HYPERVISOR] Allocating VMCS at 0xffff888012341000
[HYPERVISOR] VMCLEAR successful
[HYPERVISOR] VMPTRLD successful - VMCS is now current

[HYPERVISOR] Configuring VMCS fields...
[HYPERVISOR] Guest state:
  - CR0: 0x60000030 (Real mode defaults plus NE, which VMX requires)
  - CR3: 0x00000000
  - CR4: 0x00002000 (VMXE, required by the VMX fixed-bit rules)
  - CS: selector=0x0000, base=0x00000000, limit=0xFFFF, access=0x9B
  - DS/ES/SS: selector=0x0000, base=0x00000000, limit=0xFFFF
  - RIP: 0x00007C00 (Boot sector entry)
  - RSP: 0x00007000
  - RFLAGS: 0x00000002

[HYPERVISOR] Host state:
  - CR0: 0x80050033
  - CR3: 0x00000001a2345000
  - CR4: 0x000026e0
  - CS: 0x0010
  - RIP: 0xffffffff81234567 (VM exit handler)

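[HYPERVISOR] Control fields:
  - Pin-based: 0x00000016
  - Primary proc-based: 0x8501E1F2 (HLT exiting, unconditional I/O exiting, secondary controls)
  - Secondary proc-based: 0x00000082 (EPT, unrestricted guest)
  - VM-exit controls: 0x00036FFF (includes host address-space size)
  - VM-entry controls: 0x000011FF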

[HYPERVISOR] Loading guest code (512 bytes) at GPA 0x7C00
[HYPERVISOR] Launching guest with VMLAUNCH...

[HYPERVISOR] === GUEST IS NOW RUNNING ===

[HYPERVISOR] VM EXIT #1
  - Exit reason: 30 (I/O instruction)
  - Exit qualification: 0x03F80000
  - Guest RIP: 0x00007C12
  - Instruction length: 1
  - Port: 0x3F8 (COM1 data)
  - Direction: OUT
  - Data: 0x48 ('H')
[HYPERVISOR] Emulating: OUT 0x3F8, 'H'
[HYPERVISOR] Resuming guest...

[HYPERVISOR] VM EXIT #2
  - Exit reason: 30 (I/O instruction)
  - Port: 0x3F8, OUT, data: 0x65 ('e')
[HYPERVISOR] Emulating: OUT 0x3F8, 'e'

[HYPERVISOR] VM EXIT #3
  - Exit reason: 30 (I/O instruction)
  - Port: 0x3F8, OUT, data: 0x6C ('l')

[... more I/O exits for remaining characters ...]

[HYPERVISOR] VM EXIT #22
  - Exit reason: 30 (I/O instruction)
  - Port: 0x3F8, OUT, data: 0x0A ('\n')

[HYPERVISOR] VM EXIT #23
  - Exit reason: 12 (HLT)
  - Guest RIP: 0x00007C42
[HYPERVISOR] Guest executed HLT instruction - shutting down

[GUEST OUTPUT]
Hello from VMX guest!

[HYPERVISOR] === GUEST TERMINATED ===
[HYPERVISOR] Statistics:
  - Total VM exits: 23
  - I/O exits: 22
  - HLT exits: 1
  - Total cycles in guest: ~847,000
  - Average exit latency: 487 cycles

[HYPERVISOR] Executing VMXOFF...
[HYPERVISOR] Clearing CR4.VMXE...
[HYPERVISOR] VMX operation exited cleanly

$ dmesg | tail
[12345.678901] myhypervisor: Module unloaded successfully

4. Solution Architecture

4.1 High-Level Design

┌──────────────────────────────────────────────────────────────────────┐
│                         User Space                                    │
├──────────────────────────────────────────────────────────────────────┤
│                                                                       │
│  ┌─────────────────────────────────────────────────────────────┐     │
│  │                    launch_guest utility                      │     │
│  │                                                              │     │
│  │  - Reads guest binary file                                   │     │
│  │  - Opens /dev/myhypervisor                                  │     │
│  │  - Passes guest code via ioctl                              │     │
│  │  - Receives output via ioctl or read                        │     │
│  └─────────────────────────────────────────────────────────────┘     │
│                              │                                        │
│                              │ ioctl(LAUNCH_GUEST)                   │
│                              ▼                                        │
├──────────────────────────────────────────────────────────────────────┤
│                         Kernel Space                                  │
├──────────────────────────────────────────────────────────────────────┤
│                                                                       │
│  ┌─────────────────────────────────────────────────────────────┐     │
│  │              myhypervisor.ko (Kernel Module)                 │     │
│  ├─────────────────────────────────────────────────────────────┤     │
│  │                                                              │     │
│  │  ┌─────────────────┐  ┌─────────────────┐                   │     │
│  │  │   VMX Manager   │  │  VMCS Manager   │                   │     │
│  │  │                 │  │                 │                   │     │
│  │  │ - vmx_enable()  │  │ - vmcs_alloc()  │                   │     │
│  │  │ - vmxon()       │  │ - vmcs_setup()  │                   │     │
│  │  │ - vmxoff()      │  │ - vmcs_write()  │                   │     │
│  │  │ - check_vmx()   │  │ - vmcs_read()   │                   │     │
│  │  └─────────────────┘  └─────────────────┘                   │     │
│  │                                                              │     │
│  │  ┌─────────────────┐  ┌─────────────────┐                   │     │
│  │  │  Guest Memory   │  │  VM Exit Handler│                   │     │
│  │  │                 │  │                 │                   │     │
│  │  │ - alloc_guest() │  │ - handle_io()   │                   │     │
│  │  │ - load_code()   │  │ - handle_hlt()  │                   │     │
│  │  │ - map_memory()  │  │ - handle_cpuid()│                   │     │
│  │  └─────────────────┘  │ - handle_msr()  │                   │     │
│  │                       └─────────────────┘                   │     │
│  │                                                              │     │
│  │  ┌──────────────────────────────────────────────────────┐   │     │
│  │  │                   VM Entry/Exit ASM                   │   │     │
│  │  │                                                       │   │     │
│  │  │  vm_launch():    vm_exit_handler():                   │   │     │
│  │  │    save host      save guest regs                     │   │     │
│  │  │    VMLAUNCH       call handle_exit()                  │   │     │
│  │  │                   restore guest regs                  │   │     │
│  │  │  vm_resume():     VMRESUME                            │   │     │
│  │  │    VMRESUME                                           │   │     │
│  │  └──────────────────────────────────────────────────────┘   │     │
│  │                                                              │     │
│  └─────────────────────────────────────────────────────────────┘     │
│                              │                                        │
│                              │ VMX instructions                       │
│                              ▼                                        │
├──────────────────────────────────────────────────────────────────────┤
│                         CPU Hardware (VT-x)                           │
├──────────────────────────────────────────────────────────────────────┤
│                                                                       │
│  ┌────────────────────────────────────────────────────────────┐      │
│  │                      Intel CPU                              │      │
│  │                                                             │      │
│  │   VMX Root Mode ◄────────────────────► VMX Non-Root Mode   │      │
│  │   (Hypervisor)         VM Exit         (Guest)              │      │
│  │                       ─────────►                            │      │
│  │                       ◄─────────                            │      │
│  │                        VM Entry                             │      │
│  │                                                             │      │
│  │   ┌──────────────────────────────────────────────────┐     │      │
│  │   │                    VMCS                           │     │      │
│  │   │   Guest State │ Host State │ Control Fields       │     │      │
│  │   └──────────────────────────────────────────────────┘     │      │
│  └────────────────────────────────────────────────────────────┘      │
│                                                                       │
└──────────────────────────────────────────────────────────────────────┘

4.2 Key Components

1. VMX Manager

Handles entering and exiting VMX operation:

  • Checks CPU support for VT-x
  • Verifies BIOS/UEFI has enabled VT-x
  • Allocates VMXON region (4KB, aligned)
  • Executes VMXON/VMXOFF instructions

2. VMCS Manager

Manages VMCS allocation and configuration:

  • Allocates VMCS region (4KB, aligned)
  • Implements VMCLEAR, VMPTRLD, VMREAD, VMWRITE
  • Configures all required fields

3. Guest Memory Manager

Allocates and manages guest physical memory:

  • Allocates host memory for guest RAM
  • Loads guest binary into memory
  • Sets up identity mapping (initially)
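
Loading the guest binary comes down to a bounds-checked copy into the host buffer that backs guest RAM. The sketch below assumes a single contiguous buffer that backs guest-physical address 0 upward; `guest_ram`, `gpa_to_host`, and `load_guest_image` are hypothetical names, not part of any kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor: one contiguous block of guest RAM starting at GPA 0. */
struct guest_ram {
    uint8_t *host_base;  /* host virtual address that backs GPA 0 */
    size_t   size;       /* bytes of guest RAM */
};

/* Translate a guest-physical range to a host pointer, or NULL if it does not fit. */
static inline uint8_t *gpa_to_host(const struct guest_ram *ram, uint64_t gpa, size_t len)
{
    assert(ram != NULL && ram->host_base != NULL);
    if (gpa > ram->size || len > ram->size - gpa)
        return NULL;  /* written this way so gpa + len cannot overflow */
    return ram->host_base + gpa;
}

/* Copy a guest image to its load address; 0 on success, -1 if out of range. */
static inline int load_guest_image(const struct guest_ram *ram, uint64_t load_gpa,
                                   const void *image, size_t image_len)
{
    uint8_t *dst = gpa_to_host(ram, load_gpa, image_len);

    if (dst == NULL)
        return -1;
    memcpy(dst, image, image_len);
    return 0;
}
```

In the kernel module the image arrives from user space through the ioctl, so the final copy would go through `copy_from_user()` rather than `memcpy()`. The bounds check stays the same.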

4. VM Exit Handler

Handles all VM exits:

  • Reads exit reason and qualification
  • Dispatches to appropriate handler
  • Updates guest state as needed
  • Decides whether to resume or terminate
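
A minimal sketch of the dispatch step, written as plain C so the logic can be tested outside the kernel. The exit reason numbers (10, 12, 30) and the entry-failure flag in bit 31 come from the Intel SDM; `exit_stats`, `exit_action`, and `dispatch_exit` are hypothetical names, and the real handlers would do the emulation work where the comments indicate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Basic exit reasons used by this project (Intel SDM, basic exit reason table). */
enum {
    EXIT_REASON_CPUID          = 10,
    EXIT_REASON_HLT            = 12,
    EXIT_REASON_IO_INSTRUCTION = 30,
};

/* Bit 31 of the exit-reason field flags a failed VM entry. */
#define EXIT_REASON_ENTRY_FAILURE (1u << 31)

/* What the exit loop should do next (hypothetical names). */
enum exit_action { ACTION_RESUME, ACTION_TERMINATE };

/* Counters for the statistics printed at shutdown. */
struct exit_stats {
    unsigned total, io, hlt, cpuid;
};

static inline enum exit_action dispatch_exit(uint32_t raw_reason, struct exit_stats *stats)
{
    assert(stats != NULL);
    stats->total++;

    if (raw_reason & EXIT_REASON_ENTRY_FAILURE)
        return ACTION_TERMINATE;       /* guest state was rejected; nothing to resume */

    switch (raw_reason & 0xFFFFu) {    /* basic exit reason lives in bits 15:0 */
    case EXIT_REASON_IO_INSTRUCTION:   /* emulate the port access here */
        stats->io++;
        return ACTION_RESUME;
    case EXIT_REASON_CPUID:            /* fill in guest RAX/RBX/RCX/RDX here */
        stats->cpuid++;
        return ACTION_RESUME;
    case EXIT_REASON_HLT:              /* the demo guest halts when it is done */
        stats->hlt++;
        return ACTION_TERMINATE;
    default:                           /* unknown exit: stop rather than guess */
        return ACTION_TERMINATE;
    }
}
```

Terminating on unknown exit reasons is a deliberately conservative choice for a first hypervisor: resuming after an exit you did not handle usually just re-triggers the same exit forever.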

5. VM Entry/Exit Assembly

Low-level entry/exit code:

  • Saves/restores host registers not in VMCS
  • Executes VMLAUNCH/VMRESUME
  • Handles VM exit return path

4.3 Data Structures

#include <linux/types.h>  /* bool, size_t, and the uintN_t fixed-width types */

/* VCPU state (registers not in VMCS) */
struct vcpu_state {
    /* General purpose registers */
    uint64_t rax, rbx, rcx, rdx;
    uint64_t rsi, rdi, rbp;
    uint64_t r8, r9, r10, r11;
    uint64_t r12, r13, r14, r15;

    /* Cached from VMCS for convenience */
    uint64_t rip;
    uint64_t rsp;
    uint64_t rflags;
};

/* Main hypervisor state (vcpu_state is defined first because it is embedded by value) */
struct hypervisor {
    void *vmxon_region;          /* VMXON region (4KB aligned) */
    uint64_t vmxon_phys;         /* Physical address of VMXON region */

    void *vmcs;                  /* VMCS region (4KB aligned) */
    uint64_t vmcs_phys;          /* Physical address of VMCS */

    void *guest_memory;          /* Guest physical memory */
    uint64_t guest_memory_phys;  /* Physical address of guest memory */
    size_t guest_memory_size;    /* Size of guest memory */

    struct vcpu_state vcpu;      /* Current VCPU state */
    bool vmx_enabled;            /* Are we in VMX operation? */
    bool guest_launched;         /* Has VMLAUNCH been executed? */
};

/* VM exit information */
struct vmexit_info {
    uint32_t reason;             /* Basic exit reason (bits 15:0) */
    uint64_t qualification;      /* Exit qualification */
    uint64_t guest_rip;          /* Guest RIP at exit */
    uint32_t instruction_len;    /* Length of causing instruction */
    uint32_t instruction_info;   /* Instruction-specific info */
    uint64_t guest_physical;     /* GPA for EPT violations */
    uint64_t guest_linear;       /* Linear address if applicable */
};

/* VMCS field encoding helpers */
struct vmcs_field {
    uint32_t encoding;
    const char *name;
    int width;  /* 16, 32, 64, or 0 for natural width */
};
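
The width and category of a VMCS field do not need to be stored by hand, because the SDM defines them as bit fields of the 32-bit encoding itself: bit 0 is the access type, bits 9:1 the index, bits 11:10 the field type, and bits 14:13 the width. The following sketch decodes them; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bits 14:13 of a VMCS field encoding. */
enum vmcs_width { VMCS_W16 = 0, VMCS_W64 = 1, VMCS_W32 = 2, VMCS_WNATURAL = 3 };

/* Bits 11:10 of a VMCS field encoding. */
enum vmcs_type { VMCS_T_CONTROL = 0, VMCS_T_EXIT_INFO = 1, VMCS_T_GUEST = 2, VMCS_T_HOST = 3 };

struct vmcs_encoding_parts {
    unsigned        access_high;  /* bit 0: 1 = high half of a 64-bit field */
    unsigned        index;        /* bits 9:1 */
    enum vmcs_type  type;         /* bits 11:10 */
    enum vmcs_width width;        /* bits 14:13 */
};

static inline struct vmcs_encoding_parts vmcs_decode(uint32_t enc)
{
    struct vmcs_encoding_parts p;

    assert((enc & 0xFFFF9000u) == 0);  /* bit 12 and bits 31:15 are reserved */
    p.access_high = enc & 1u;
    p.index       = (enc >> 1) & 0x1FFu;
    p.type        = (enum vmcs_type)((enc >> 10) & 3u);
    p.width       = (enum vmcs_width)((enc >> 13) & 3u);
    return p;
}

/* Width in bits, using the same convention as vmcs_field.width (0 = natural). */
static inline int vmcs_width_bits(uint32_t enc)
{
    static const int bits[] = { 16, 64, 32, 0 };

    return bits[vmcs_decode(enc).width];
}
```

For example, 0x681E (guest RIP) decodes as a natural-width guest-state field, and 0x4400 (VM-instruction error) as a 32-bit read-only information field. A logging helper can use this to print each field with the right width.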

4.4 Algorithm Overview

Main Launch Algorithm

1. INITIALIZE VMX:
   a. Check CPUID for VMX support
   b. Check IA32_FEATURE_CONTROL MSR
   c. Set CR4.VMXE = 1
   d. Allocate VMXON region, write revision ID
   e. Execute VMXON

2. SETUP VMCS:
   a. Allocate VMCS region, write revision ID
   b. Execute VMCLEAR (clear launch state)
   c. Execute VMPTRLD (make VMCS current)
   d. Write guest state fields
   e. Write host state fields
   f. Write control fields

3. LOAD GUEST:
   a. Copy guest binary to guest memory
   b. Set guest RIP to entry point

4. LAUNCH:
   a. Save host registers not in VMCS
   b. Execute VMLAUNCH
   c. (CPU now executing guest)

5. HANDLE EXIT:
   a. Save guest registers
   b. Read exit reason
   c. Dispatch to handler
   d. If continue: restore guest regs, VMRESUME
   e. If terminate: goto CLEANUP

6. CLEANUP:
   a. Execute VMCLEAR on the VMCS
   b. Execute VMXOFF
   c. Clear CR4.VMXE
   d. Free all allocated memory
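
The checks in step 1 and the "write revision ID" parts of steps 1d and 2a are pure bit manipulation, so they can be sketched and tested in user space by passing in the raw register values. The bit positions (CPUID.1:ECX bit 5, IA32_FEATURE_CONTROL bits 0 and 2, IA32_VMX_BASIC bits 30:0) are from the Intel SDM; the function names are hypothetical, and the module itself would obtain the values with CPUID and RDMSR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CPUID_1_ECX_VMX                   (1u << 5)    /* CPUID.1:ECX.VMX */
#define FEATURE_CONTROL_LOCK              (1ull << 0)  /* IA32_FEATURE_CONTROL bit 0 */
#define FEATURE_CONTROL_VMXON_OUTSIDE_SMX (1ull << 2)  /* IA32_FEATURE_CONTROL bit 2 */
#define VMX_REGION_SIZE                   4096u

static inline bool cpu_has_vmx(uint32_t cpuid1_ecx)
{
    return (cpuid1_ecx & CPUID_1_ECX_VMX) != 0;
}

/*
 * True when the MSR is locked with VMXON enabled. If the lock bit is clear,
 * software may set both bits itself; this sketch simply reports "not ready".
 */
static inline bool firmware_allows_vmxon(uint64_t feature_control)
{
    uint64_t need = FEATURE_CONTROL_LOCK | FEATURE_CONTROL_VMXON_OUTSIDE_SMX;

    return (feature_control & need) == need;
}

/* The VMCS revision identifier is IA32_VMX_BASIC bits 30:0. */
static inline uint32_t vmx_revision_id(uint64_t vmx_basic)
{
    return (uint32_t)(vmx_basic & 0x7FFFFFFFu);
}

/* Zero a 4KB VMXON or VMCS region and store the revision ID in its first 4 bytes. */
static inline void vmx_region_init(void *region, uint64_t vmx_basic)
{
    uint32_t rev = vmx_revision_id(vmx_basic);

    assert(region != NULL);
    memset(region, 0, VMX_REGION_SIZE);
    memcpy(region, &rev, sizeof rev);  /* bit 31 of the first dword stays 0 */
}
```

The same `vmx_region_init()` works for both the VMXON region and the VMCS, since both start with the revision identifier.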

5. Implementation Guide

5.1 Development Environment Setup

# 1. Verify VT-x support
$ grep -E 'vmx|svm' /proc/cpuinfo
flags: ... vmx ...

# 2. Check if VT-x is enabled in BIOS
# If not, reboot and enable in BIOS/UEFI settings

# 3. Install kernel development headers
$ sudo apt-get install linux-headers-$(uname -r) build-essential

# 4. Disable KVM if loaded (it uses VMX)
$ sudo rmmod kvm_intel kvm

# 5. Create project directory
$ mkdir ~/hypervisor && cd ~/hypervisor

# 6. Create Makefile
$ cat > Makefile << 'EOF'
obj-m += myhypervisor.o
myhypervisor-objs := main.o vmx.o vmcs.o exit_handler.o asm.o

KDIR := /lib/modules/$(shell uname -r)/build
PWD := $(shell pwd)

# Recipe lines must start with a TAB character, not spaces
all:
	$(MAKE) -C $(KDIR) M=$(PWD) modules

clean:
	$(MAKE) -C $(KDIR) M=$(PWD) clean

# User-space launcher
launcher: launch_guest.c
	gcc -o launch_guest launch_guest.c

.PHONY: all clean launcher
EOF

5.2 Project Structure

hypervisor/
├── Makefile
├── myhypervisor.h       # Main header with structs and constants
├── main.c               # Module init/exit, ioctl handling
├── vmx.c                # VMX enable/disable, VMXON/VMXOFF
├── vmcs.c               # VMCS allocation and configuration
├── exit_handler.c       # VM exit handling
├── asm.S                # Assembly for VMLAUNCH/VMRESUME
├── vmcs_fields.h        # VMCS field encoding definitions
├── launch_guest.c       # User-space launcher utility
└── guest/
    ├── guest.asm        # Simple real-mode guest
    └── Makefile         # Build guest binary

5.3 The Core Question You’re Answering

“How does the CPU provide a hardware-enforced boundary between a hypervisor and its guests, such that the guest can run at full speed with full privilege, yet cannot escape or interfere with the hypervisor?”

The answer lies in VMX’s dual-world architecture:

  1. The CPU has two execution environments (root and non-root)
  2. The VMCS defines what the guest can do and what causes traps
  3. VM exits are automatic hardware traps on sensitive operations
  4. The hypervisor is always in control via the VMCS configuration

5.4 Concepts You Must Understand First

Before implementing, verify you can answer these questions:

CPU Basics:

  • What are CR0, CR3, and CR4, and what do their bits control?
  • What is the difference between a segment selector and a segment base?
  • What is the GDT and how does segment resolution work?
  • What happens on a privilege level transition (Ring 0 ↔ Ring 3)?

VT-x Fundamentals:

  • What is the difference between VMX root and non-root operation?
  • Why can’t the guest just execute VMXOFF to escape?
  • What is a VM exit and what can trigger one?
  • What is stored in the VMCS and why?

Memory:

  • What is the difference between physical and virtual addresses?
  • How does paging work at a high level?
  • What does a 4KB page-alignment requirement mean?

Book References:

  • Intel SDM Vol. 3C, Chapter 23: “Introduction to Virtual Machine Extensions”
  • Intel SDM Vol. 3C, Chapter 24: “Virtual Machine Control Structures”
  • Chapter numbers shift between SDM revisions, so locate these chapters by title

5.5 Questions to Guide Your Design

VMXON Region:

  1. How big must the VMXON region be?
  2. What must be written to it before VMXON?
  3. What alignment is required?
  4. Does it need to be writable after VMXON?

VMCS Configuration:

  1. Which guest state fields are mandatory?
  2. Which host state fields are mandatory?
  3. What happens if control fields have invalid settings?
  4. How do you know which bits must be 0 or 1 in control fields?

VM Entry:

  1. What is the difference between VMLAUNCH and VMRESUME?
  2. What registers does the CPU load from VMCS on entry?
  3. What registers must you save/restore manually?

VM Exit:

  1. Where does the CPU jump on VM exit?
  2. What information is available about the exit?
  3. How do you advance guest RIP after handling an exit?

5.6 Thinking Exercise

Before writing any code, trace through this scenario manually:

Setup:

  • Guest code at physical address 0x7C00
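  • Guest code: MOV DX, 0x3F8; OUT DX, AL; HLT (port 0x3F8 does not fit the 8-bit immediate form of OUT)
  • AL contains ‘X’ (0x58)
  • VMCS configured to exit on I/O instructions

Trace the execution:

  1. VMLAUNCH executes
    • CPU loads guest state from VMCS
    • Guest RIP = 0x7C00
    • Guest AL = 0x58 (loaded by your entry code just before VMLAUNCH; general-purpose registers other than RSP are not stored in the VMCS)
    • CPU enters VMX non-root mode
  2. Guest executes MOV DX, 0x3F8 and then OUT DX, AL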
    • This is an I/O instruction
    • VMCS says: exit on unconditional I/O
    • CPU triggers VM exit
  3. VM exit occurs
    • CPU saves guest state to VMCS
    • CPU loads host state from VMCS
    • Execution continues at host RIP (your handler)
  4. Your handler runs
    • Read VMCS exit reason: 30 (I/O instruction)
    • Read exit qualification: encodes port, direction, size
    • You emulate: write ‘X’ to your serial buffer
    • Advance guest RIP by instruction length
  5. VMRESUME executes
    • CPU loads guest state from VMCS
    • Guest RIP now points to HLT instruction
  6. Guest executes HLT
    • VMCS says: exit on HLT
    • CPU triggers VM exit
  7. Your handler runs
    • Read exit reason: 12 (HLT)
    • Decide to terminate guest

Questions:

  • What would happen if you didn’t advance guest RIP?
  • What if the guest tried to execute VMXOFF?
  • What if you forgot to set up host state properly?

5.7 Hints in Layers

Hint 1 - Starting Point (Conceptual Direction)

Start with the simplest possible guest: a single instruction that causes a VM exit. Don’t try to run real code yet. Your first milestone is:

  1. Enter VMX operation (VMXON succeeds)
  2. Configure minimal VMCS
  3. Execute VMLAUNCH
  4. Handle one VM exit
  5. Clean up

Hint 2 - Next Level (More Specific Guidance)

The hardest part is getting VMCS configuration right. Follow this order:

  1. Start with KVM’s VMCS setup as a reference
  2. Use IA32_VMX_TRUE_* MSRs to determine allowed settings
  3. Set all mandatory fields to safe defaults first
  4. Add functionality incrementally

Key VMCS fields for a minimal setup:

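  • Guest: CR0, CR3, CR4 (real-mode values, subject to the IA32_VMX_CR0_FIXED0/1 and IA32_VMX_CR4_FIXED0/1 MSRs; CR0.PE and CR0.PG may be clear only with the unrestricted guest control)
  • Guest: selector/base/limit/access rights for CS and the other segment registers (SS, DS, ES, FS, GS, LDTR, TR)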
  • Guest: RIP, RSP, RFLAGS
  • Host: All state fields (copy from current CPU state)
  • Control: Pin-based, proc-based (with HLT and I/O exits)
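
As a rough sketch of the segment values in the list above: real-mode segments use base = selector * 16 and a 64 KiB limit. The access-rights bytes depend on how the real-mode code is run. The values 0x9B/0x93 assume the unrestricted guest control, while virtual-8086 mode requires 0xF3 for every segment. The struct, enum and function names here are hypothetical.

```c
#include <stdint.h>

/* Values for one guest segment register, as they would be written to the VMCS. */
struct guest_segment {
    uint16_t selector;
    uint64_t base;
    uint32_t limit;
    uint32_t access_rights;
};

enum rm_style {
    RM_UNRESTRICTED,  /* true real mode, needs the unrestricted guest control */
    RM_VIRTUAL_8086   /* real-mode code run in virtual-8086 mode */
};

struct guest_segment real_mode_segment(uint16_t selector, int is_code,
                                       enum rm_style style)
{
    struct guest_segment seg;

    seg.selector = selector;
    seg.base     = (uint64_t)selector << 4;  /* selector * 16 */
    seg.limit    = 0xFFFF;                   /* 64 KiB - 1 */

    if (style == RM_VIRTUAL_8086)
        seg.access_rights = 0xF3;  /* present, DPL 3, read/write data, accessed */
    else if (is_code)
        seg.access_rights = 0x9B;  /* present, DPL 0, execute/read code, accessed */
    else
        seg.access_rights = 0x93;  /* present, DPL 0, read/write data, accessed */
    return seg;
}
```

With a CS selector of 0 and RIP = 0x7C00 (or CS = 0x07C0 and RIP = 0), the first guest instruction is fetched from physical address 0x7C00.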

Hint 3 - Technical Details (Approach/Pseudocode)

/* Calculating valid control field values */
uint32_t adjust_controls(uint32_t desired, uint32_t msr) {
    uint64_t msr_val = read_msr(msr);
    uint32_t allowed_0 = (uint32_t)msr_val;        /* Must be 1 */
    uint32_t allowed_1 = (uint32_t)(msr_val >> 32); /* Can be 1 */

    /* Start with bits that must be 1 */
    desired |= allowed_0;
    /* Clear bits that must be 0 */
    desired &= allowed_1;

    return desired;
}

/* Minimum proc-based controls for HLT and I/O exiting */
#define CPU_BASED_HLT_EXITING       (1 << 7)
#define CPU_BASED_UNCOND_IO_EXITING (1 << 24)

uint32_t proc_based = adjust_controls(
    CPU_BASED_HLT_EXITING | CPU_BASED_UNCOND_IO_EXITING,
    IA32_VMX_TRUE_PROCBASED_CTLS
);
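
One thing to watch with adjust_controls: it silently clears any requested bit the CPU does not allow, so a missing feature shows up later as a guest that never exits. A small check like the following, run before the adjustment, makes that failure visible. The function name controls_supported is made up for this sketch, and it takes the raw MSR value as a parameter so it can be exercised without reading a real MSR.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * msr_val is the raw 64-bit VMX capability MSR value:
 *   low 32 bits  = control bits that must be 1
 *   high 32 bits = control bits that may be 1
 */
bool controls_supported(uint32_t wanted, uint64_t msr_val)
{
    uint32_t allowed_1 = (uint32_t)(msr_val >> 32);

    /* Every wanted bit has to be among the bits that may be 1. */
    return (wanted & ~allowed_1) == 0;
}
```

A caller could refuse to continue when controls_supported(CPU_BASED_HLT_EXITING | CPU_BASED_UNCOND_IO_EXITING, msr_val) returns false, instead of launching a guest whose HLT never traps.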

Hint 4 - Tools/Debugging (Verification Methods)

Debugging a hypervisor is hard because you can’t use printk during VMX transitions. Strategies:

  1. Check VMLAUNCH/VMRESUME return: They set RFLAGS.CF or RFLAGS.ZF on failure
  2. Read error code: If ZF=1, read VMCS field 0x4400 for error number
  3. Use serial output: Configure serial console, hypervisor can write to it
  4. Reference KVM: Compare your VMCS dump to KVM’s (enable KVM tracing)
# Enable KVM tracing
$ sudo trace-cmd record -e 'kvm:*' sleep 1
$ sudo trace-cmd report

Error numbers (field 0x4400):

  • 7: “VM entry with invalid control field(s)”
  • 8: “VM entry with invalid host-state field(s)”
  • 9: “VMPTRLD with invalid physical address”
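
The CF/ZF convention and the error numbers can be folded into two small helpers. This is only a sketch: the enum and function names are invented here, and error numbers 4 and 5 are taken from the SDM's VM-instruction error table rather than from the list above.

```c
#include <stdint.h>

#define RFLAGS_CF (1u << 0)
#define RFLAGS_ZF (1u << 6)

enum vmx_result {
    VMX_SUCCEED,       /* CF = 0 and ZF = 0 */
    VMX_FAIL_INVALID,  /* CF = 1: failed, no error number available */
    VMX_FAIL_VALID     /* ZF = 1: error number is in VMCS field 0x4400 */
};

/* Classify the RFLAGS value captured right after a VMX instruction. */
enum vmx_result vmx_classify(uint64_t rflags)
{
    if (rflags & RFLAGS_CF)
        return VMX_FAIL_INVALID;
    if (rflags & RFLAGS_ZF)
        return VMX_FAIL_VALID;
    return VMX_SUCCEED;
}

/* Text for a few VM-instruction error numbers (value of field 0x4400). */
const char *vm_instruction_error_str(uint32_t err)
{
    switch (err) {
    case 4:  return "VMLAUNCH with non-clear VMCS";
    case 5:  return "VMRESUME with non-launched VMCS";
    case 7:  return "VM entry with invalid control field(s)";
    case 8:  return "VM entry with invalid host-state field(s)";
    case 9:  return "VMPTRLD with invalid physical address";
    default: return "other (see the SDM VM-instruction error table)";
    }
}
```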

5.8 The Interview Questions They’ll Ask

  1. “Explain the difference between VMX root mode and VMX non-root mode.”
    • Root: Hypervisor runs here, has full control, can execute VMX instructions
    • Non-root: Guest runs here, appears to have full privilege but is constrained
    • VM exits transfer from non-root to root; VM entries go root to non-root
  2. “What is a VM exit and what are some common causes?”
    • VM exit is a hardware trap from guest to hypervisor
    • Causes: HLT, I/O, CPUID, CR access, MSR access, interrupts, EPT violations
    • Configurable via VMCS control fields
  3. “Why is VMCS configuration so complex?”
    • Must match CPU capabilities (not all features on all CPUs)
    • Fields have interdependencies (if X then Y must be…)
    • Guest state must be valid for the mode you’re entering
    • Host state must be valid for returning to hypervisor
  4. “How would you minimize VM exit overhead?”
    • Use hardware features: EPT (fewer CR3 exits), APICv (fewer interrupt exits)
    • Use MSR/I/O bitmaps to only exit on necessary accesses
    • Paravirtualization: guest cooperates to avoid unnecessary exits
    • Batch operations when possible
  5. “What happens if you forget to advance guest RIP after handling an I/O exit?”
    • VMRESUME will re-execute the same instruction
    • Infinite loop: exit → handle → resume → same instruction → exit…
  6. “How does nested virtualization work?”
    • L0 hypervisor runs L1 hypervisor as a guest
    • L1 tries to use VMX instructions → exit to L0
    • L0 emulates VMX for L1
    • VMCS shadowing hardware helps performance

5.9 Books That Will Help

Topic Book Chapter
VT-x Architecture Intel SDM Vol. 3C Chapters 23-27
VMCS Fields Intel SDM Vol. 3C Chapter 24, Appendix B
VM Exits Intel SDM Vol. 3C Chapter 25-27
Control Fields Intel SDM Vol. 3C Appendix A
x86 Architecture Intel SDM Vol. 1 Chapters 3-6
Kernel Modules Linux Device Drivers, 3rd Ed Chapters 1-2
Memory Allocation Understanding the Linux Kernel Chapter 8

5.10 Implementation Phases

Phase 1: VMX Enable (Week 1)

Goal: Successfully execute VMXON

/* vmx.c - VMX enable implementation */

int vmx_enable(struct hypervisor *hv) {
    uint64_t feature_control;
    uint64_t vmx_basic;

    /* 1. Check CPUID */
    if (!cpu_has_vmx()) {
        pr_err("CPU does not support VMX\n");
        return -ENODEV;
    }

    /* 2. Check IA32_FEATURE_CONTROL */
    feature_control = read_msr(IA32_FEATURE_CONTROL);
    if (!(feature_control & FEATURE_CONTROL_LOCKED)) {
        pr_err("IA32_FEATURE_CONTROL not locked\n");
        return -ENODEV;
    }
    if (!(feature_control & FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX)) {
        pr_err("VMXON not enabled in BIOS\n");
        return -ENODEV;
    }

    /* 3. Allocate VMXON region (4KB-aligned, zeroed) */
    hv->vmxon_region = (void *)get_zeroed_page(GFP_KERNEL);
    if (!hv->vmxon_region)
        return -ENOMEM;
    hv->vmxon_phys = virt_to_phys(hv->vmxon_region);

    /* 4. Write VMCS revision ID (bits 30:0 of IA32_VMX_BASIC; bit 31 stays 0) */
    vmx_basic = read_msr(IA32_VMX_BASIC);
    *(uint32_t *)hv->vmxon_region = (uint32_t)vmx_basic & 0x7FFFFFFF;

    /* 5. Set CR4.VMXE */
    cr4_set_bits(X86_CR4_VMXE);

    /* 6. Execute VMXON */
    if (vmxon(hv->vmxon_phys)) {
        pr_err("VMXON failed\n");
        cr4_clear_bits(X86_CR4_VMXE);
        free_page((unsigned long)hv->vmxon_region);
        return -EIO;
    }

    hv->vmx_enabled = true;
    pr_info("VMX operation enabled\n");
    return 0;
}

Validation:

  • Module loads without crash
  • dmesg shows “VMX operation enabled”
  • A second VMXON on the same CPU fails (already in VMX operation)
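
Step 4 of vmx_enable depends on a few fields of IA32_VMX_BASIC. The decoder below is a sketch of how those fields could be pulled apart. The struct and function names are hypothetical, and the MSR value is passed in as a parameter so the logic can be tried outside the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

struct vmx_basic_info {
    uint32_t revision_id;   /* bits 30:0, goes into the VMXON region and each VMCS */
    uint32_t region_size;   /* bits 44:32, bytes needed per region (at most 4096) */
    bool     true_controls; /* bit 55, set if the IA32_VMX_TRUE_* MSRs exist */
};

struct vmx_basic_info decode_vmx_basic(uint64_t msr_val)
{
    struct vmx_basic_info info;

    info.revision_id   = (uint32_t)(msr_val & 0x7FFFFFFFu);
    info.region_size   = (uint32_t)((msr_val >> 32) & 0x1FFFu);
    info.true_controls = (msr_val >> 55) & 1;
    return info;
}
```

If true_controls is false, the IA32_VMX_TRUE_* MSRs should not be read; the plain IA32_VMX_PINBASED_CTLS / IA32_VMX_PROCBASED_CTLS family would be used for the allowed-settings calculation instead.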

Phase 2: VMCS Configuration (Weeks 2-3)

Goal: Configure VMCS with valid guest and host state

Focus on these VMCS field categories:

  1. Guest register state (CS, RIP, RSP, RFLAGS, CR0, CR3, CR4)
  2. Host register state (same fields, reflecting hypervisor state)
  3. VM-execution controls (what causes exits)
  4. VM-exit controls (what happens on exit)
  5. VM-entry controls (what happens on entry)

Validation:

  • VMCLEAR succeeds
  • VMPTRLD succeeds
  • All VMWRITE operations succeed
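
The five field categories are visible in the VMCS field encodings themselves, which can help when choosing between 16-, 32-, 64-bit and natural-width accessors. The sketch below decodes the width and type bits of an encoding. The enum and function names are illustrative, while the sample encodings use the names that appear in this project's VMCS dump code.

```c
#include <stdint.h>

/* Bits 14:13 of a VMCS field encoding. */
enum vmcs_width { VMCS_W16 = 0, VMCS_W64 = 1, VMCS_W32 = 2, VMCS_WNATURAL = 3 };

/* Bits 11:10 of a VMCS field encoding. */
enum vmcs_type { VMCS_CONTROL = 0, VMCS_EXIT_INFO = 1, VMCS_GUEST = 2, VMCS_HOST = 3 };

/* A few well-known encodings. */
#define GUEST_CS_SELECTOR         0x0802u
#define CPU_BASED_VM_EXEC_CONTROL 0x4002u
#define VM_INSTRUCTION_ERROR      0x4400u
#define GUEST_RIP                 0x681Eu
#define HOST_RIP                  0x6C16u

enum vmcs_width vmcs_field_width(uint32_t encoding)
{
    return (enum vmcs_width)((encoding >> 13) & 0x3);
}

enum vmcs_type vmcs_field_type(uint32_t encoding)
{
    return (enum vmcs_type)((encoding >> 10) & 0x3);
}
```

For example, GUEST_RIP decodes as a natural-width guest-state field, while VM_INSTRUCTION_ERROR decodes as a 32-bit read-only (exit information) field, which is why writing to it fails.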

Phase 3: VM Launch (Week 4)

Goal: Successfully execute VMLAUNCH and see first VM exit

This requires:

  1. Loading guest code into allocated memory
  2. Setting up assembly trampoline for VMLAUNCH
  3. Handling the transition back on VM exit

Validation:

  • VMLAUNCH returns (doesn’t hang)
  • Exit reason is readable
  • Guest RIP matches expected location
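
VMLAUNCH only works on a VMCS whose launch state is clear, and VMRESUME only on one that has been launched, so the run loop has to remember which applies. The CPU does not expose the launch state directly, so a software flag is one way to track it. The names below are made up for this sketch.

```c
#include <stdbool.h>

enum vm_entry_insn { USE_VMLAUNCH, USE_VMRESUME };

struct vmcs_tracker {
    bool launched; /* software mirror of the VMCS launch state */
};

/* VMCLEAR returns the VMCS to the "clear" launch state. */
void tracker_on_vmclear(struct vmcs_tracker *t)
{
    t->launched = false;
}

/* Call once a VMLAUNCH has actually entered the guest. */
void tracker_on_entry_success(struct vmcs_tracker *t)
{
    t->launched = true;
}

/* Which instruction the assembly trampoline should execute next. */
enum vm_entry_insn next_entry_insn(const struct vmcs_tracker *t)
{
    return t->launched ? USE_VMRESUME : USE_VMLAUNCH;
}
```

Getting this wrong shows up as VM-instruction error 4 (VMLAUNCH with non-clear VMCS) or 5 (VMRESUME with non-launched VMCS).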

Phase 4: Exit Handling (Weeks 5-6)

Goal: Handle multiple exit types, run complete guest program

Implement handlers for:

  1. HLT (terminate or idle)
  2. I/O (emulate serial port output)
  3. CPUID (return fake values)
  4. CR access (track guest CR changes)

Validation:

  • Guest prints complete message
  • Guest terminates cleanly on HLT
  • Statistics show expected exit count
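
A dispatcher for these four handlers, with the per-reason counters the validation step asks for, might look like the sketch below. The exit reason numbers are the SDM's basic exit reasons. The struct and function names are hypothetical, and the actual emulation work is left out and only hinted at in comments.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXIT_REASON_CPUID          10
#define EXIT_REASON_HLT            12
#define EXIT_REASON_CR_ACCESS      28
#define EXIT_REASON_IO_INSTRUCTION 30
#define EXIT_REASON_MAX            64

struct exit_stats {
    uint64_t count[EXIT_REASON_MAX];
    uint64_t unhandled;
};

/* Returns true if the guest should be resumed after this exit. */
bool dispatch_exit(struct exit_stats *stats, uint32_t exit_reason)
{
    uint32_t basic = exit_reason & 0xFFFF; /* bits 15:0 = basic exit reason */

    if (basic >= EXIT_REASON_MAX) {
        stats->unhandled++;
        return false;
    }
    stats->count[basic]++;

    switch (basic) {
    case EXIT_REASON_IO_INSTRUCTION:
    case EXIT_REASON_CPUID:
    case EXIT_REASON_CR_ACCESS:
        return true;   /* emulate, advance guest RIP, then VMRESUME */
    case EXIT_REASON_HLT:
        return false;  /* this project treats HLT as "guest finished" */
    default:
        stats->unhandled++;
        return false;  /* stop instead of resuming into the same exit */
    }
}
```

Running guest_hello through such a loop should leave count[30] at one per character written (including CR and LF) and count[12] at 1.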

5.11 Key Implementation Decisions

Decision 1: Kernel Module vs. KVM API

Option A: Kernel Module (This Project)

  • Pros: Learn VT-x directly, full control, deeper understanding
  • Cons: Dangerous (can crash system), complex, no user-space debugger

Option B: KVM API (Project 14)

  • Pros: Safer, user-space debugging, production-tested
  • Cons: KVM abstracts the details, less learning

Recommendation: Do both! Start with this project for understanding, then use KVM for practical applications.

Decision 2: Real Mode vs. Protected Mode Guest

Real Mode (This Project)

  • Simpler segment setup
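  • Guest code needs no page tables of its own, but without the unrestricted guest control (Section 8.2) VM entry requires CR0.PE and CR0.PG to be set, so the hypervisor runs the code in virtual-8086 mode behind identity-mapped page tables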
  • Limited to 1MB address space
  • Good for learning

Protected Mode

  • Full 32/64-bit capability
  • Requires GDT setup
  • Requires paging (or EPT)
  • Needed for real OS

Recommendation: Start with real mode, upgrade to protected mode as Phase 5.

Decision 3: Memory Management

Simple Allocation

  • Use __get_free_pages() for guest memory
  • Identity-map if using EPT
  • Simpler but limited

Proper Guest Physical Address Space

  • Model GPA→HPA translation
  • Support large guest memory
  • Required for real OS

Recommendation: Start simple, add EPT in Project 12.
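
Even the simple option benefits from one bounds-checked helper for turning a guest physical address into a host pointer, for example when loading the guest image at 0x7C00 or emulating a string I/O instruction. The sketch below assumes a single flat buffer that backs guest physical address 0. The names are hypothetical, and how the CPU itself translates guest addresses (page tables or EPT) is a separate matter that is not shown.

```c
#include <stddef.h>
#include <stdint.h>

struct guest_mem {
    uint8_t *host_base; /* host buffer backing guest physical address 0 */
    size_t   size;      /* guest RAM size in bytes */
};

/* Host pointer for the range [gpa, gpa + len), or NULL if it is out of range. */
void *gpa_to_host(const struct guest_mem *mem, uint64_t gpa, size_t len)
{
    /* Written this way so that gpa + len cannot overflow. */
    if (gpa > mem->size || len > mem->size - gpa)
        return NULL;
    return mem->host_base + gpa;
}
```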


6. Testing Strategy

6.1 Unit Tests

Test individual components before integration:

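/* assert() is not available in kernel code; log the failure and return instead */
#define TEST_ASSERT(cond)                                         \
    do {                                                          \
        if (!(cond)) {                                            \
            pr_err("TEST: %s: %s - FAIL\n", __func__, #cond);     \
            return;                                               \
        }                                                         \
    } while (0)

/* Test VMX capability detection */
void test_vmx_supported(void) {
    TEST_ASSERT(cpu_has_vmx());
    pr_info("TEST: VMX support detected - PASS\n");
}

/* Test MSR reading */
void test_msr_reading(void) {
    uint64_t basic = read_msr(IA32_VMX_BASIC);
    TEST_ASSERT(basic != 0);
    pr_info("TEST: IA32_VMX_BASIC = 0x%llx - PASS\n", basic);
}

/* Test VMCS allocation */
void test_vmcs_alloc(void) {
    void *vmcs = alloc_vmcs();
    TEST_ASSERT(vmcs != NULL);
    if ((unsigned long)vmcs & 0xFFF) {  /* must be 4KB aligned */
        pr_err("TEST: VMCS allocation not 4KB aligned - FAIL\n");
        free_vmcs(vmcs);
        return;
    }
    free_vmcs(vmcs);
    pr_info("TEST: VMCS allocation - PASS\n");
}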

6.2 Integration Tests

#!/bin/bash
# test_hypervisor.sh

# Test 1: Module loads
echo "Test 1: Loading module..."
sudo insmod myhypervisor.ko || { echo "FAIL: Module load"; exit 1; }
echo "PASS: Module loaded"

# Test 2: VMX enabled
echo "Test 2: Checking VMX status..."
dmesg | tail -5 | grep -q "VMX operation enabled" || { echo "FAIL: VMX not enabled"; exit 1; }
echo "PASS: VMX enabled"

# Test 3: Run simple guest
echo "Test 3: Running guest..."
./launch_guest guest_hello.bin > output.txt 2>&1
grep -q "Hello from VMX guest" output.txt || { echo "FAIL: Guest output"; exit 1; }
echo "PASS: Guest executed"

# Test 4: Module unloads cleanly
echo "Test 4: Unloading module..."
sudo rmmod myhypervisor || { echo "FAIL: Module unload"; exit 1; }
echo "PASS: Module unloaded"

echo "All tests passed!"

6.3 Guest Test Programs

guest_hlt.asm - Minimal test (just HLT):

[BITS 16]
[ORG 0x7C00]
    hlt

guest_io.asm - I/O test:

[BITS 16]
[ORG 0x7C00]
    mov al, 'X'
    mov dx, 0x3F8   ; Serial port (ports above 0xFF must go through DX)
    out dx, al
    hlt

guest_hello.asm - Full message:

[BITS 16]
[ORG 0x7C00]

start:
    mov si, message
    mov dx, 0x3F8   ; Serial port (ports above 0xFF must go through DX)
.loop:
    lodsb           ; Load byte from [SI] into AL
    test al, al     ; Check for null terminator
    jz .done
    out dx, al      ; Output to serial
    jmp .loop
.done:
    hlt

message: db "Hello from VMX guest!", 13, 10, 0

7. Common Pitfalls & Debugging

Problem                                | Root Cause                                | Fix                                            | Verification
VMXON fails with CF=1                  | Invalid VMXON region                      | Ensure 4KB aligned, write correct revision ID  | Check alignment and MSR value
VMLAUNCH fails with ZF=1               | Invalid VMCS fields                       | Read error code from field 0x4400              | Compare with valid settings
System hangs on VMLAUNCH               | Missing host state                        | Ensure all host fields set correctly           | Copy from current CPU state
Guest loops infinitely                 | Forgot to advance RIP                     | Add instruction length to RIP after I/O exit   | Check RIP changes after exit
VM exit reason = 33                    | Invalid guest state                       | Check segment access rights, CR0/CR4           | Follow SDM Chapter 26 checks
VMXON fails even with a valid region   | KVM loaded and already in VMX operation   | Unload kvm_intel and kvm modules               | lsmod lists no kvm modules
Module crashes on unload               | VMXOFF not called                         | Always call VMXOFF before module exit          | Add cleanup handler

Debugging VMCS Configuration

/* Dump all mandatory VMCS fields for debugging */
void dump_vmcs_state(void) {
    pr_info("=== VMCS Dump ===\n");

    /* Guest state */
    pr_info("Guest CR0: 0x%llx\n", vmcs_read64(GUEST_CR0));
    pr_info("Guest CR3: 0x%llx\n", vmcs_read64(GUEST_CR3));
    pr_info("Guest CR4: 0x%llx\n", vmcs_read64(GUEST_CR4));
    pr_info("Guest RIP: 0x%llx\n", vmcs_read64(GUEST_RIP));
    pr_info("Guest RSP: 0x%llx\n", vmcs_read64(GUEST_RSP));
    pr_info("Guest RFLAGS: 0x%llx\n", vmcs_read64(GUEST_RFLAGS));
    pr_info("Guest CS: sel=0x%x base=0x%llx limit=0x%x ar=0x%x\n",
            vmcs_read16(GUEST_CS_SELECTOR),
            vmcs_read64(GUEST_CS_BASE),
            vmcs_read32(GUEST_CS_LIMIT),
            vmcs_read32(GUEST_CS_AR_BYTES));

    /* Control fields */
    pr_info("Pin-based: 0x%x\n", vmcs_read32(PIN_BASED_VM_EXEC_CONTROL));
    pr_info("Proc-based: 0x%x\n", vmcs_read32(CPU_BASED_VM_EXEC_CONTROL));
    pr_info("Exit controls: 0x%x\n", vmcs_read32(VM_EXIT_CONTROLS));
    pr_info("Entry controls: 0x%x\n", vmcs_read32(VM_ENTRY_CONTROLS));

    /* Host state */
    pr_info("Host CR0: 0x%llx\n", vmcs_read64(HOST_CR0));
    pr_info("Host CR3: 0x%llx\n", vmcs_read64(HOST_CR3));
    pr_info("Host CR4: 0x%llx\n", vmcs_read64(HOST_CR4));
    pr_info("Host RIP: 0x%llx\n", vmcs_read64(HOST_RIP));
    pr_info("Host RSP: 0x%llx\n", vmcs_read64(HOST_RSP));
}

8. Extensions & Challenges

Once the basic hypervisor works, try these extensions:

8.1 Protected Mode Guest

Modify guest to run in 32-bit protected mode:

  • Set up GDT in guest memory
  • Configure guest segment registers properly
  • Handle more complex memory model

8.2 Unrestricted Guest Mode

Use the “unrestricted guest” VMX feature to run real-mode code more easily:

  • Requires EPT
  • Allows real-mode without strict segment checks
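
Whether the feature is available can be read from the VMX capability MSRs. The sketch below checks the allowed-1 halves of the primary and secondary processor-based control MSRs. The bit positions are from the SDM and the macro names follow Linux's vmx.h, but the function itself is hypothetical and takes the MSR values as parameters.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPU_BASED_ACTIVATE_SECONDARY_CONTROLS (1u << 31)
#define SECONDARY_EXEC_ENABLE_EPT             (1u << 1)
#define SECONDARY_EXEC_UNRESTRICTED_GUEST     (1u << 7)

/* High 32 bits of a VMX capability MSR = control bits that may be set to 1. */
static uint32_t allowed_1(uint64_t msr_val)
{
    return (uint32_t)(msr_val >> 32);
}

/*
 * procbased_msr:  value of the primary processor-based controls MSR
 * procbased2_msr: value of IA32_VMX_PROCBASED_CTLS2, or 0 if secondary
 *                 controls are unsupported (the MSR must not be read then)
 */
bool unrestricted_guest_available(uint64_t procbased_msr, uint64_t procbased2_msr)
{
    uint32_t need2 = SECONDARY_EXEC_ENABLE_EPT | SECONDARY_EXEC_UNRESTRICTED_GUEST;

    if (!(allowed_1(procbased_msr) & CPU_BASED_ACTIVATE_SECONDARY_CONTROLS))
        return false;
    return (allowed_1(procbased2_msr) & need2) == need2;
}
```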

8.3 MSR Bitmap

Instead of exiting on all MSR accesses, use an MSR bitmap:

  • Only exit on specific MSRs
  • Improves performance for MSR-heavy guests
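
The MSR bitmap is a single 4 KiB page split into four 1 KiB quarters: reads of low MSRs, reads of high MSRs, writes of low MSRs and writes of high MSRs. A set bit means the access exits. The helper below is a sketch of the indexing only. Its name is made up, and the page would still have to be allocated page-aligned and its physical address written to the VMCS.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_BITMAP_SIZE   4096

/* Byte offsets of the four 1 KiB quarters inside the bitmap page. */
#define MSR_BM_READ_LOW   0     /* RDMSR, MSRs 0x00000000-0x00001FFF */
#define MSR_BM_READ_HIGH  1024  /* RDMSR, MSRs 0xC0000000-0xC0001FFF */
#define MSR_BM_WRITE_LOW  2048  /* WRMSR, MSRs 0x00000000-0x00001FFF */
#define MSR_BM_WRITE_HIGH 3072  /* WRMSR, MSRs 0xC0000000-0xC0001FFF */

/* Make RDMSR (is_write = false) or WRMSR (is_write = true) of one MSR exit. */
bool msr_bitmap_intercept(uint8_t *bitmap, uint32_t msr, bool is_write)
{
    uint32_t base, index;

    if (msr <= 0x1FFFu) {
        base  = is_write ? MSR_BM_WRITE_LOW : MSR_BM_READ_LOW;
        index = msr;
    } else if (msr >= 0xC0000000u && msr <= 0xC0001FFFu) {
        base  = is_write ? MSR_BM_WRITE_HIGH : MSR_BM_READ_HIGH;
        index = msr - 0xC0000000u;
    } else {
        return false; /* outside both ranges: the access always exits */
    }
    bitmap[base + index / 8] |= (uint8_t)(1u << (index % 8));
    return true;
}
```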

8.4 I/O Bitmap

Instead of exiting on all I/O, use the I/O bitmaps:

  • Only exit on specific ports
  • Pass through safe ports to hardware
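
There are two I/O bitmap pages: bitmap A covers ports 0x0000-0x7FFF and bitmap B covers ports 0x8000-0xFFFF, one bit per port, with a set bit meaning the access exits. The sketch below shows only the indexing, with hypothetical names. A real setup would also need page-aligned pages, their physical addresses in the VMCS, and the "use I/O bitmaps" control (bit 24's neighbor, bit 25) set, at which point the unconditional I/O exiting control is ignored.

```c
#include <stdbool.h>
#include <stdint.h>

#define IO_BITMAP_SIZE 4096 /* each page covers 32768 ports */

struct io_bitmaps {
    uint8_t a[IO_BITMAP_SIZE]; /* ports 0x0000-0x7FFF */
    uint8_t b[IO_BITMAP_SIZE]; /* ports 0x8000-0xFFFF */
};

/* Make accesses to one port cause a VM exit. */
void io_bitmap_intercept(struct io_bitmaps *bm, uint16_t port)
{
    uint8_t *page  = (port < 0x8000) ? bm->a : bm->b;
    uint16_t index = port & 0x7FFF;

    page[index / 8] |= (uint8_t)(1u << (index % 8));
}

/* Would a one-byte access to this port exit? */
bool io_bitmap_exits(const struct io_bitmaps *bm, uint16_t port)
{
    const uint8_t *page  = (port < 0x8000) ? bm->a : bm->b;
    uint16_t       index = port & 0x7FFF;

    return (page[index / 8] >> (index % 8)) & 1;
}
```

For the serial port used by the test guests, io_bitmap_intercept(&bm, 0x3F8) sets bit 0 of byte 0x7F in bitmap A.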

8.5 CPUID Emulation

Return custom CPUID values to the guest:

  • Hide host features
  • Emulate different CPU models
  • Useful for compatibility
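
CPUID always causes a VM exit in VMX non-root operation, so the handler decides what the guest sees. One minimal policy is to pass host values through while hiding VT-x and setting the hypervisor-present bit in leaf 1. The struct and function names below are hypothetical, and the host values are assumed to have been obtained by executing CPUID in the exit handler.

```c
#include <stdint.h>

struct cpuid_regs {
    uint32_t eax, ebx, ecx, edx;
};

#define CPUID_1_ECX_VMX        (1u << 5)   /* VMX supported */
#define CPUID_1_ECX_HYPERVISOR (1u << 31)  /* running under a hypervisor */

/* host holds what the real CPUID instruction returned for this leaf. */
struct cpuid_regs emulate_cpuid(uint32_t leaf, struct cpuid_regs host)
{
    struct cpuid_regs guest = host;

    if (leaf == 1) {
        guest.ecx &= ~CPUID_1_ECX_VMX;        /* hide VT-x from the guest */
        guest.ecx |= CPUID_1_ECX_HYPERVISOR;  /* advertise virtualization */
    }
    return guest;
}
```

After copying the result into the guest's saved EAX, EBX, ECX and EDX, the handler advances guest RIP by the reported instruction length, as with any other emulated instruction.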

9. Real-World Connections

How This Relates to KVM

KVM (Kernel Virtual Machine) is essentially a production version of what you’re building:

Your Hypervisor                    KVM
┌──────────────────┐              ┌──────────────────┐
│ myhypervisor.ko  │              │    kvm.ko        │
│                  │              │   kvm_intel.ko   │
│ - VMX enable     │      ≈       │                  │
│ - VMCS setup     │              │ - Same VMX code  │
│ - Exit handling  │              │ - Much more      │
│                  │              │   complete       │
└──────────────────┘              └──────────────────┘
        │                                  │
        │                                  │
        ▼                                  ▼
┌──────────────────┐              ┌──────────────────┐
│  launch_guest    │              │     QEMU         │
│  (simple tool)   │              │ (full emulator)  │
└──────────────────┘              └──────────────────┘

How This Relates to VMware

VMware ESXi is a Type-1 hypervisor that:

  • Has its own kernel (VMkernel)
  • Uses VT-x similar to your hypervisor
  • Adds sophisticated scheduling, storage, networking
  • Enterprise features: vMotion, HA, DRS

Your hypervisor teaches the same core concepts VMware engineers use.

How This Relates to Xen

Xen takes a different approach:

  • Very thin hypervisor (a privileged guest, dom0, handles device drivers and management)
  • Uses VT-x for HVM guests
  • Paravirtualization for PV guests
  • The VT-x usage is similar to yours

10. Resources

Primary References

Code References

Tools


11. Self-Assessment Checklist

Before considering this project complete, verify:

VMX Operation

  • Can enter VMX operation (VMXON succeeds)
  • Can detect and report VMX capabilities
  • Can cleanly exit VMX operation (VMXOFF succeeds)
  • CR4.VMXE is properly managed

VMCS Configuration

  • Can allocate and initialize VMCS
  • Can write all mandatory guest state fields
  • Can write all mandatory host state fields
  • Can configure control fields with valid values
  • Can read VMCS fields for debugging

VM Entry/Exit

  • VMLAUNCH succeeds on first entry
  • Can handle VM exits (read reason and qualification)
  • VMRESUME succeeds for subsequent entries
  • Can advance guest RIP correctly

Guest Execution

  • Guest can execute instructions
  • Guest can perform I/O (causes exit, handled correctly)
  • Guest can execute HLT (causes exit, terminates)
  • Guest output is captured and displayed

Robustness

  • Module loads and unloads cleanly
  • No kernel crashes or hangs
  • Error cases handled gracefully
  • Resources freed properly on exit

12. Submission / Completion Criteria

Your hypervisor is complete when you can demonstrate:

  1. Successful VMX Operation
    • Show dmesg output proving VMXON succeeded
    • Show capability dump from your hypervisor
  2. Guest Execution
    • Run the provided guest_hello.bin
    • Show complete output: “Hello from VMX guest!”
  3. Exit Statistics
    • Show count of VM exits by type
    • Verify exits match expected (23 I/O + 1 HLT for the hello message: 21 characters plus CR and LF)
  4. Clean Shutdown
    • Module unloads without errors
    • KVM can be loaded afterward (VMX released)
  5. Code Quality
    • Well-commented, especially VMCS field settings
    • Error handling for all VMX operations
    • Clear separation of VMX, VMCS, and exit handling logic

Bonus Points:

  • Protected mode guest works
  • CPUID emulation implemented
  • Multiple VM exits handled (not just I/O and HLT)
  • Performance statistics (cycles in guest, exit latency)

After completing this project, you’ll have direct, hands-on understanding of how hardware virtualization works. The magic of “VMs running at native speed” will be demystified—you built the mechanism yourself. This knowledge is foundational for understanding cloud computing, security isolation, and modern operating systems.