Project 5: Write a Simple Virtual Machine Monitor (VMM)

Build a minimal VT-x based VMM that enters VMX operation, configures VMCS fields, runs a guest, and handles VM exits directly without KVM.

Quick Reference

| Attribute | Value |
|-----------|-------|
| Difficulty | Expert (Level 5) |
| Time Estimate | 6-9 weeks |
| Main Programming Language | C (Alternatives: Rust) |
| Alternative Programming Languages | Rust |
| Coolness Level | Level 5: Pure Magic |
| Business Potential | Level 1: Resume Gold |
| Prerequisites | x86 assembly, paging, OS development basics |
| Key Topics | VMX operation, VMCS fields, EPT, VM exits |

1. Learning Objectives

By completing this project, you will:

  1. Enter VMX operation and allocate VMXON/VMCS regions correctly.
  2. Initialize guest and host state fields in VMCS.
  3. Configure EPT for guest memory translation.
  4. Handle core VM exits (CPUID, HLT, IO, EPT violations).
  5. Debug VMX errors and interpret VM-instruction failures.

2. All Theory Needed (Per-Concept Breakdown)

2.1 VMX Operation and VMCS Fields

Fundamentals

Intel VT-x introduces two modes of operation: VMX root (hypervisor) and VMX non-root (guest). The VMCS (Virtual Machine Control Structure) holds guest state, host state, and execution controls. To run a guest, you must enter VMX operation, allocate a VMXON region, initialize a VMCS region with the correct revision ID, and fill required fields. Understanding VMX operation and VMCS fields is the foundation of a bare-metal VMM. Without this foundation, every VM entry will fail or crash unpredictably. You should also recognize that VMX is often not usable out of the box: firmware (BIOS/UEFI) may leave it disabled, and the IA32_FEATURE_CONTROL MSR must have its lock and VMX-enable bits set before VMXON will succeed.

Deep Dive into the concept

VMX operation begins with enabling the VMX bit in CR4 (CR4.VMXE) and executing VMXON with a physical address of a 4KB-aligned VMXON region. That region contains a revision identifier read from IA32_VMX_BASIC. If the region is not aligned, the revision is wrong, or the CPU is not configured, VMXON fails.
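The region preparation described above can be sketched as a small helper. This is a minimal sketch, not a definitive implementation: the names vmx_revision_id and vmx_region_init are hypothetical, and the caller is assumed to already know the region's physical address and to have read IA32_VMX_BASIC with RDMSR.

```c
#include <stdint.h>
#include <string.h>

#define VMX_REGION_SIZE 4096u

/* The VMCS revision identifier is bits 30:0 of IA32_VMX_BASIC. */
static uint32_t vmx_revision_id(uint64_t vmx_basic)
{
    return (uint32_t)(vmx_basic & 0x7FFFFFFFu);
}

/*
 * Zero a VMXON or VMCS region and stamp the revision ID into its first
 * 32 bits. Returns 0 on success, -1 if region_pa is not 4KB aligned.
 * region is the virtual mapping and region_pa the physical address;
 * how the caller obtains both is outside this sketch.
 */
static int vmx_region_init(void *region, uint64_t region_pa, uint64_t vmx_basic)
{
    uint32_t rev = vmx_revision_id(vmx_basic);

    if (region_pa & (VMX_REGION_SIZE - 1))
        return -1;
    memset(region, 0, VMX_REGION_SIZE);
    memcpy(region, &rev, sizeof rev);
    return 0;
}
```

The same helper works for both the VMXON region and each VMCS region, since both start with the same revision identifier.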

Once in VMX operation, you allocate a VMCS region. A VMCS is also 4KB aligned and begins with the revision ID. You execute VMCLEAR on the region to initialize it and then VMPTRLD to make it the current VMCS. The VMCS contains three broad categories: guest state, host state, and execution controls. Guest state includes registers (RIP, RSP, RFLAGS), control registers, segment descriptors, and MSRs. Host state is the state the CPU loads on VM exit. Execution controls define which events cause VM exits (e.g., I/O, MSR access, HLT), and how entry/exit should behave (e.g., whether to load EFER).

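VMX has strict requirements: certain fields must have “fixed” bits set or cleared according to MSRs like IA32_VMX_CR0_FIXED0. If a control or host-state field violates these constraints, VMLAUNCH fails with a VM-instruction error, for example error 7 (“VM entry with invalid control field(s)”) or error 8 (“VM entry with invalid host-state field(s)”). If guest state is invalid, the entry instead fails with a VM exit whose exit reason has bit 31 set (basic reason 33, invalid guest state). This is why many VMMs include helper functions that validate and sanitize control registers before writing them to VMCS.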
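A sanitizing helper of the kind mentioned above might look like the following. It is a sketch under the assumption that the caller has already read the FIXED0 and FIXED1 MSR pair for CR0 or CR4; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * fixed0: a bit that is 1 here must be 1 in the register.
 * fixed1: a bit that is 0 here must be 0 in the register.
 */
static uint64_t vmx_apply_fixed_bits(uint64_t value, uint64_t fixed0, uint64_t fixed1)
{
    return (value | fixed0) & fixed1;
}

/* True if value already satisfies both masks and needs no adjustment. */
static bool vmx_fixed_bits_ok(uint64_t value, uint64_t fixed0, uint64_t fixed1)
{
    return vmx_apply_fixed_bits(value, fixed0, fixed1) == value;
}
```

Checking with vmx_fixed_bits_ok before applying the masks lets a VMM log which requested value was non-conforming instead of silently changing it.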

The VMCS is per vCPU. If you later scale to multiple vCPUs, you need separate VMCS regions and must manage them carefully. The VMCS fields are numerous and subtle, covering everything from segment selectors to MSR load lists. A minimal guest can run in 64-bit long mode or, on CPUs that support the “unrestricted guest” control, in real mode; without that control, VM entry requires guest CR0.PE and CR0.PG to be 1. Each mode has different state requirements. Many toy VMMs use a 64-bit guest to avoid real-mode segmentation complexity, but where unrestricted guest is available, real mode can be simpler if you want just a HLT guest.

VM entry and VM exit are controlled by VM entry and exit controls. These specify whether certain MSRs are loaded on entry, whether to save debug registers, and whether to load EFER. If you misconfigure them, the guest may start with wrong state or crash. The entry checks are strict, so debugging often involves reading the VM-instruction error field to see which constraint was violated.

Understanding VMX operation also requires understanding how the CPU transitions between root and non-root. On VM entry, the CPU loads guest state from the VMCS. It does not save host state at that point, so the VMM must write the host-state fields (host RIP, RSP, CR3, and so on) before the first entry. On VM exit, the CPU saves guest state back into the VMCS and loads host state from those fields. This is the hardware mechanism that makes virtualization practical. Once you grasp this, you can reason about why certain fields matter and how to debug failures.

It is also useful to appreciate the sheer number of fields in a VMCS and how real hypervisors manage them. Production VMMs use structured initialization tables and carefully documented defaults so they can audit changes as CPU generations evolve. Even in a toy VMM, you should adopt a pattern: gather all required fields in one place, document why each field is set, and validate against the SDM. This approach prevents “configuration drift” where incremental tweaks accumulate into an un-debuggable VMCS.

How this fits into the project

VMX operation is the heart of Section 3.2 and Section 5.10 Phase 1, and it drives all VM exit handling in Section 5.10 Phase 2.

Definitions & key terms

  • VMXON -> Instruction that enters VMX operation.
  • VMCS -> Control structure defining guest/host state and execution controls.
  • VMPTRLD -> Instruction that sets the current VMCS.
  • VMLAUNCH/VMRESUME -> Enter guest mode.

Mental model diagram (ASCII)

Host (VMX root) -- VMLAUNCH --> Guest (VMX non-root)
        ^                          |
        |---------- VM exit -------|

How it works (step-by-step, with invariants and failure modes)

  1. Enable CR4.VMXE and verify IA32_FEATURE_CONTROL.
  2. Allocate VMXON region with correct revision.
  3. Execute VMXON (failure: VMXON error if region invalid).
  4. Allocate VMCS region, VMCLEAR it, then load it with VMPTRLD.
  5. Fill guest/host state and controls.
  6. Execute VMLAUNCH.

Failure modes: invalid CR0/CR4 bits, missing required VMCS fields, VMXON not allowed.
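Step 1 and the “VMXON not allowed” failure mode both depend on IA32_FEATURE_CONTROL. The sketch below classifies a value already read from that MSR; the enum and function names are hypothetical, and the bit positions (bit 0 lock, bit 2 enable VMX outside SMX) follow the SDM.

```c
#include <stdint.h>

enum vmx_fc_status {
    VMX_FC_READY,         /* locked with VMX enabled: VMXON may proceed      */
    VMX_FC_NEEDS_ENABLE,  /* unlocked: the VMM must set bits 0 and 2 itself  */
    VMX_FC_LOCKED_OFF     /* locked with VMX disabled: change it in firmware */
};

#define FC_LOCK            (1ULL << 0)
#define FC_VMX_OUTSIDE_SMX (1ULL << 2)

static enum vmx_fc_status vmx_feature_control_status(uint64_t fc)
{
    if (!(fc & FC_LOCK))
        return VMX_FC_NEEDS_ENABLE;
    return (fc & FC_VMX_OUTSIDE_SMX) ? VMX_FC_READY : VMX_FC_LOCKED_OFF;
}
```

VMX_FC_LOCKED_OFF corresponds to the “VMX disabled in BIOS” case: once the lock bit is set, software cannot change the MSR until the next reset.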

Minimal concrete example

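mov rax, cr4            ; 64-bit mode moves CR4 through a 64-bit register (use eax in 32-bit code)
or  rax, (1 << 13)      ; set CR4.VMXE
mov cr4, rax
vmxon [vmxon_pa]        ; memory operand holds the 64-bit physical address of the VMXON region
jbe vmxon_failed        ; CF=1 or ZF=1 means VMXON failed (label is illustrative)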

Common misconceptions

  • “VMCS is only a data structure.” -> It is a hardware-controlled structure with strict constraints.
  • “VMLAUNCH always works if VMXON works.” -> It can fail due to invalid VMCS fields.
  • “VMX operation is just a bit.” -> It is a CPU mode with explicit entry/exit semantics.

Check-your-understanding questions

  1. Why must VMXON and VMCS regions be 4KB aligned?
  2. What is the purpose of the VM-instruction error field?
  3. Predict what happens if CR0 fixed bits are not respected.

Check-your-understanding answers

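  1. The SDM requires the physical address of each region to be 4KB aligned (bits 11:0 zero); VMXON and VMPTRLD fail on a misaligned address.
  2. It reports why VMX instructions failed, critical for debugging.
  3. For guest CR0, VM entry fails its guest-state checks and control returns to the host with basic exit reason 33 (bit 31 of the exit reason set). For host CR0, VMLAUNCH fails with VM-instruction error 8.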

Real-world applications

  • Hypervisors like ESXi and KVM use these primitives at their core.

Where you’ll apply it

References

  • Intel SDM, VMX chapters

Key insights

VMX operation is a state machine; correct VMCS configuration is the key to entering guest mode.

Summary

You now understand how VMX operation and VMCS fields define guest execution.

Homework/Exercises to practice the concept

  1. Read IA32_VMX_BASIC and decode the revision ID.
  2. Write a small program that fails VMLAUNCH intentionally and reads the error code.

Solutions to the homework/exercises

  1. Use rdmsr to read and mask the revision bits.
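  2. Clear a must-be-1 bit in a control field (or write an invalid host CR0) and read the VM-instruction error field. An invalid guest CR0 instead produces a failed-entry VM exit (basic exit reason 33), not a VM-instruction error.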

2.2 EPT and Nested Page Tables

Fundamentals

Extended Page Tables (EPT) provide a second level of address translation for guests. The guest maps GVA -> GPA, and the hypervisor maps GPA -> HPA via EPT. In a bare-metal VMM, you must build the EPT hierarchy yourself and handle EPT violations (e.g., guest accesses unmapped GPA). Understanding EPT is essential for guest memory isolation and for handling memory exits. You should be able to explain how permissions in EPT differ from guest page permissions. Even at a basic level, EPT forces you to think about page sizes, alignment, and how many tables you need. This also shapes how you lay out guest RAM and MMIO regions.

Deep Dive into the concept

EPT is a four-level page table structure similar to regular x86-64 paging. The difference is that EPT translates guest physical addresses to host physical addresses. Each EPT entry includes read/write/execute permissions and a memory type. When the guest accesses memory, the CPU performs two translations: guest page tables (GVA->GPA) and EPT (GPA->HPA). The combined translation is cached in the TLB, but EPT adds overhead and complexity.
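To make the four-level walk concrete, the sketch below splits a guest physical address into the indices the CPU uses at each EPT level. The struct and function names are hypothetical; the bit ranges match ordinary 4-level x86-64 paging.

```c
#include <stdint.h>

struct ept_indices {
    unsigned pml4;    /* GPA bits 47:39 */
    unsigned pdpt;    /* GPA bits 38:30 */
    unsigned pd;      /* GPA bits 29:21 */
    unsigned pt;      /* GPA bits 20:12 */
    uint64_t offset;  /* GPA bits 11:0  */
};

static struct ept_indices ept_split_gpa(uint64_t gpa)
{
    struct ept_indices ix;

    ix.pml4   = (unsigned)((gpa >> 39) & 0x1FF);
    ix.pdpt   = (unsigned)((gpa >> 30) & 0x1FF);
    ix.pd     = (unsigned)((gpa >> 21) & 0x1FF);
    ix.pt     = (unsigned)((gpa >> 12) & 0x1FF);
    ix.offset = gpa & 0xFFF;
    return ix;
}
```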

Building EPT in a VMM means allocating aligned pages for the EPT PML4, PDPT, PD, and PT. You populate entries with host physical addresses of guest memory. For a simple VMM, you can use identity mapping: GPA == HPA, mapping a single contiguous guest region. This is simpler for a toy guest but still demonstrates the core mechanism.
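The identity-mapping approach can be sketched with three statically sized tables and 2MB pages, which avoids the PT level entirely. This is one possible layout, not the only one: the struct, macro, and function names are hypothetical, and the caller is assumed to supply the physical addresses of the PDPT and PD tables.

```c
#include <stdint.h>
#include <string.h>

#define EPT_R       (1ULL << 0)
#define EPT_W       (1ULL << 1)
#define EPT_X       (1ULL << 2)
#define EPT_MT_WB   (6ULL << 3)   /* memory type, bits 5:3 of a leaf entry */
#define EPT_LARGE   (1ULL << 7)   /* this PDE maps a 2MB page              */
#define EPT_ENTRIES 512

struct ept_tables {
    _Alignas(4096) uint64_t pml4[EPT_ENTRIES];
    _Alignas(4096) uint64_t pdpt[EPT_ENTRIES];
    _Alignas(4096) uint64_t pd[EPT_ENTRIES];
};

/*
 * Identity-map GPA 0..1GB onto HPA 0..1GB with 512 2MB pages.
 * pdpt_pa and pd_pa are the physical addresses of t->pdpt and t->pd.
 */
static void ept_identity_map_1gb(struct ept_tables *t, uint64_t pdpt_pa, uint64_t pd_pa)
{
    memset(t, 0, sizeof *t);
    t->pml4[0] = pdpt_pa | EPT_R | EPT_W | EPT_X;
    t->pdpt[0] = pd_pa   | EPT_R | EPT_W | EPT_X;
    for (uint64_t i = 0; i < EPT_ENTRIES; i++)
        t->pd[i] = (i << 21) | EPT_R | EPT_W | EPT_X | EPT_MT_WB | EPT_LARGE;
}
```

A real VMM would map only the pages it has set aside for the guest; a full identity map exposes host memory to the guest and is suitable only for experiments.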

EPT violations are akin to page faults but happen at the second translation stage. If a guest accesses an unmapped GPA, or violates permissions, the CPU triggers a VM exit with an EPT violation reason. The exit qualification provides details about the access. Your VMM can then decide to map the page (like demand paging) or inject a page fault into the guest. For this project, you can simply terminate the guest and log the violation, but you should understand that production hypervisors often use EPT violations to implement memory ballooning or lazy mapping.
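The exit qualification for an EPT violation can be decoded into named flags before logging or handling. The following is a sketch covering only the commonly used low bits as described in the SDM; the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

struct ept_violation {
    bool read, write, fetch;                          /* bits 0-2: kind of access     */
    bool gpa_readable, gpa_writable, gpa_executable;  /* bits 3-5: what EPT permitted */
    bool linear_valid;                                /* bit 7: guest linear address field is valid */
};

static struct ept_violation ept_decode_violation(uint64_t qual)
{
    struct ept_violation v;

    v.read           = (qual >> 0) & 1;
    v.write          = (qual >> 1) & 1;
    v.fetch          = (qual >> 2) & 1;
    v.gpa_readable   = (qual >> 3) & 1;
    v.gpa_writable   = (qual >> 4) & 1;
    v.gpa_executable = (qual >> 5) & 1;
    v.linear_valid   = (qual >> 7) & 1;
    return v;
}

/* No permissions at all usually means the GPA was never mapped. */
static bool ept_violation_is_unmapped(const struct ept_violation *v)
{
    return !v->gpa_readable && !v->gpa_writable && !v->gpa_executable;
}
```

The faulting guest physical address itself is reported in a separate VMCS field, so a handler typically logs both that address and these flags.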

Memory types matter. EPT entries include memory type bits (UC, WB, etc.), which affect caching. For correctness, RAM should be mapped as write-back. Device memory (MMIO) should be uncached. If you misconfigure memory types, you can see subtle bugs or performance issues. For a toy VMM, using WB for RAM is sufficient.

EPT also interacts with page permissions. If you want to intercept access to a region (e.g., for MMIO), you can clear the EPT read/write/execute bits for that region, causing EPT violations on access. This is a common technique for trapping device accesses without using I/O port exits. Understanding this gives you a mental model of how advanced hypervisors implement MMIO and device emulation.

Another subtle issue is translation caching with EPT. The CPU caches EPT-derived translations, so when you update EPT entries you may need to invalidate them (via INVEPT) to ensure the CPU uses the new mapping. If you skip this, the guest might continue using stale translations, which can lead to confusing behavior. For a simple VMM, you can avoid dynamic changes or invalidate globally after setup, but knowing this mechanism is important for correctness in more advanced designs.

EPT also supports accessed and dirty bits on some CPUs, which can be used for tracking guest memory usage. This is how hypervisors implement features like live migration pre-copy and memory ballooning efficiently. Even if you do not implement these features, understanding that EPT can provide hardware-assisted dirty tracking is important. It shows that EPT is not just a translation table; it is also a source of observability about guest memory behavior.

Finally, be mindful of the EPTP (EPT pointer) configuration. The EPTP encodes the page-walk length and memory type. A wrong EPTP value can cause VM entry failures or silent misbehavior. Treat EPTP as part of the guest contract, and validate it like any other VMCS field.
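The EPTP encoding can be captured in one small function. This sketch assumes a 4-level walk and write-back memory type for the paging structures, which is the common configuration; make_eptp is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Build an EPT pointer from the physical address of the EPT PML4 table.
 *   bits 2:0  memory type for the paging structures (6 = write-back)
 *   bits 5:3  page-walk length minus 1 (3 for a 4-level walk)
 *   bit  6    enable accessed/dirty flags (only if the CPU reports support)
 *   bits 12+  physical address of the PML4 table
 */
static uint64_t make_eptp(uint64_t pml4_pa, bool enable_ad)
{
    uint64_t eptp = pml4_pa & ~0xFFFULL;

    eptp |= 6ULL;
    eptp |= 3ULL << 3;
    if (enable_ad)
        eptp |= 1ULL << 6;
    return eptp;
}
```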

How this fits into the project

EPT is used in Section 3.2 and Section 5.10 Phase 2, and EPT violations are handled in Section 7.1.

Definitions & key terms

  • EPT -> Extended Page Tables for GPA->HPA translation.
  • EPT violation -> VM exit triggered by disallowed EPT access.
  • Identity mapping -> Mapping GPA == HPA for simplicity.

Mental model diagram (ASCII)

GVA -> guest PT -> GPA -> EPT -> HPA

How it works (step-by-step, with invariants and failure modes)

  1. Allocate EPT tables.
  2. Map GPA ranges to HPA ranges.
  3. Load EPTP into VMCS.
  4. Guest memory access triggers EPT translation.

Failure modes: incorrect alignment, missing mappings, invalid memory types.

Minimal concrete example

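// Map the first 2MB of guest-physical memory with one 2MB EPT page.
// Bits 5:3 of a leaf entry hold the memory type; 6 = write-back for RAM.
ept_pde[0] = (host_pa & ~0x1FFFFFULL) | EPT_READ | EPT_WRITE | EPT_EXEC
           | (6ULL << 3) | EPT_2MB;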

Common misconceptions

  • “EPT replaces guest paging.” -> It is an additional layer, not a replacement.
  • “EPT violations are fatal.” -> They can be handled and used for lazy mapping.
  • “Identity mapping is always safe.” -> It is simple but not always secure.

Check-your-understanding questions

  1. Why is EPT a second translation layer?
  2. What does an EPT violation indicate?
  3. Predict how clearing execute permission affects guest code execution.

Check-your-understanding answers

  1. It allows the hypervisor to control guest physical memory independently.
  2. The guest accessed unmapped or disallowed GPA.
  3. The guest will trigger EPT violations on instruction fetch.

Real-world applications

  • Hypervisors use EPT permissions to implement MMIO traps and security features.

Where you’ll apply it

References

  • Intel SDM EPT chapters

Key insights

EPT is the hypervisor’s memory policy engine; it defines what the guest can touch.

Summary

You now understand how to build and use EPT to control guest memory.

Homework/Exercises to practice the concept

  1. Map a 4KB page and trigger an EPT violation by accessing an unmapped GPA.
  2. Change EPT permissions and observe exit behavior.

Solutions to the homework/exercises

  1. Access GPA outside the mapped range; handle EPT violation in exit handler.
  2. Clear execute bits and observe instruction fetch exits.

2.3 VM Exits, Guest State Setup, and Debugging

Fundamentals

A VMM must handle VM exits and set up guest state correctly. Exits occur on CPUID, HLT, I/O, or EPT violations. Guest state includes registers, segments, and control registers; if misconfigured, VMLAUNCH fails or the guest triple-faults. Debugging VMX requires reading VM-instruction error codes and logging exit reasons. This concept ties together execution, correctness, and diagnostics. A minimal VMM still needs a clear strategy for which exits to emulate and which to treat as fatal. This clarity is what turns a crash into a teachable, repeatable failure mode. It is also the basis for deterministic debugging. Without it, debugging becomes guesswork.

Deep Dive into the concept

VM exits are how the hypervisor regains control. Each exit provides an exit reason and qualification data. The VMM must decode this and respond appropriately. For CPUID, the VMM emulates the instruction by writing expected values into guest registers, then advances guest RIP by the VM-exit instruction length so the guest does not re-execute the same instruction. For HLT, it can treat it as a guest shutdown. For I/O, it can emulate a device or simply ignore the access (again advancing RIP). For EPT violations, it can map memory or kill the guest.
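Decoding the raw exit-reason field is the first step of every handler. The sketch below uses the basic exit reason numbers from the SDM for the exits this project cares about; the enum constant and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

enum {
    EXIT_TRIPLE_FAULT        = 2,
    EXIT_CPUID               = 10,
    EXIT_HLT                 = 12,
    EXIT_IO_INSTRUCTION      = 30,
    EXIT_INVALID_GUEST_STATE = 33,
    EXIT_EPT_VIOLATION       = 48,
    EXIT_EPT_MISCONFIG       = 49
};

/* The basic exit reason is the low 16 bits of the exit-reason field. */
static uint16_t exit_basic_reason(uint32_t raw)
{
    return (uint16_t)(raw & 0xFFFF);
}

/* Bit 31 set means the VM entry itself failed; the guest never ran. */
static bool exit_is_entry_failure(uint32_t raw)
{
    return (raw >> 31) & 1;
}

static const char *exit_reason_name(uint32_t raw)
{
    switch (exit_basic_reason(raw)) {
    case EXIT_TRIPLE_FAULT:        return "TRIPLE_FAULT";
    case EXIT_CPUID:               return "CPUID";
    case EXIT_HLT:                 return "HLT";
    case EXIT_IO_INSTRUCTION:      return "IO";
    case EXIT_INVALID_GUEST_STATE: return "INVALID_GUEST_STATE";
    case EXIT_EPT_VIOLATION:       return "EPT_VIOLATION";
    case EXIT_EPT_MISCONFIG:       return "EPT_MISCONFIG";
    default:                       return "UNKNOWN";
    }
}
```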

Guest state setup is subtle. You must configure guest CR0, CR3, CR4, segment selectors, descriptor tables, and MSRs. Many fields have fixed bits that must be set. For example, unless the “unrestricted guest” control is enabled, guest CR0.PE and CR0.PG must both be 1; a long-mode guest additionally needs CR4.PAE set and IA32_EFER.LME/LMA consistent with the “IA-32e mode guest” VM-entry control. Segment descriptors must be valid and consistent with the chosen mode. If any field violates VMX requirements, VM entry fails. If fields are valid but inconsistent, the guest may run but crash immediately.

A common approach is to start with a minimal guest in real mode, which requires the “unrestricted guest” secondary control (and therefore EPT). This avoids complex segment and paging setups. However, VMX still expects certain fields (e.g., guest CS selector, base, limit, and access rights) to be consistent. You can also set up a simple 64-bit guest for clarity. Either way, the VMM must understand how the CPU interprets these fields during guest execution.

Debugging VMX is a unique challenge. Unlike software bugs, VMX failures often manifest as VMLAUNCH failure with an error code. You must read the VM-instruction error field from VMCS. The Intel SDM lists error codes and their meaning. Additionally, you should log exit reasons and guest RIP values to track where the guest was when it exited. This is a critical skill for real hypervisor work.
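Turning the VM-instruction error number into text makes logs far easier to read. The sketch below covers a subset of the SDM's error table, with strings paraphrasing the SDM wording; the function name is hypothetical, and the value is assumed to have been read from the VM-instruction error field (encoding 0x4400) with VMREAD.

```c
#include <stdint.h>

static const char *vmx_instruction_error_str(uint32_t err)
{
    switch (err) {
    case 2:  return "VMCLEAR with invalid physical address";
    case 4:  return "VMLAUNCH with non-clear VMCS";
    case 5:  return "VMRESUME with non-launched VMCS";
    case 7:  return "VM entry with invalid control field(s)";
    case 8:  return "VM entry with invalid host-state field(s)";
    case 9:  return "VMPTRLD with invalid physical address";
    case 11: return "VMPTRLD with incorrect VMCS revision identifier";
    case 12: return "VMREAD/VMWRITE from/to unsupported VMCS component";
    case 13: return "VMWRITE to read-only VMCS component";
    default: return "other (see the SDM VM-instruction error table)";
    }
}
```

Errors 7 and 8 are the ones most often hit while bringing up a new VMCS, and error 4 typically means VMLAUNCH was used where VMRESUME was needed.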

Exit handling also impacts performance. If you intercept too many events, the guest runs slowly. A minimal VMM can handle only what it needs: CPUID (which always exits in VMX non-root operation and lets you hide hypervisor features), HLT, I/O, and EPT violations. This is enough for a small guest. Understanding which exits are unconditional, which are necessary, and which are optional is part of hypervisor design.

In addition, exit handling shapes your observability. The exits you log become your primary debugging signals, so decide early which ones to record and at what verbosity. For example, logging every I/O exit can overwhelm output, while logging only unexpected exits can hide useful patterns. A good approach is to log all exits during development and provide a filter mechanism once the system stabilizes. This mirrors real hypervisors, which have configurable tracing levels.

You should also understand exception injection. If the guest triggers a fault (e.g., divide by zero), the VMM can inject that exception back into the guest. For a toy VMM you might simply terminate, but recognizing that exceptions can be virtualized explains how guests continue operating under faults. This prepares you for future extensions like basic interrupt injection or timer emulation.
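Injection works by writing an encoded value into the VM-entry interruption-information field before the next VM entry. The sketch below builds that value according to the SDM's field layout; the enum and function names are hypothetical, and a real VMM would also write the error-code field when one is delivered.

```c
#include <stdbool.h>
#include <stdint.h>

/* Interruption types used in bits 10:8 of the field. */
enum vmx_event_type {
    EVT_EXTERNAL_INTERRUPT = 0,
    EVT_NMI                = 2,
    EVT_HW_EXCEPTION       = 3,
    EVT_SW_INTERRUPT       = 4,
    EVT_SW_EXCEPTION       = 6
};

static uint32_t make_entry_intr_info(uint8_t vector, enum vmx_event_type type,
                                     bool deliver_error_code)
{
    uint32_t info = vector;              /* bits 7:0  vector              */

    info |= (uint32_t)type << 8;         /* bits 10:8 interruption type   */
    if (deliver_error_code)
        info |= 1u << 11;                /* bit 11    deliver error code  */
    info |= 1u << 31;                    /* bit 31    valid               */
    return info;
}
```

For example, re-injecting a divide error uses vector 0 with EVT_HW_EXCEPTION and no error code, while a general-protection fault uses vector 13 with an error code.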

Another useful addition is to track guest RFLAGS and interruptibility state. VMX exposes fields like GUEST_INTERRUPTIBILITY_STATE that govern whether interrupts can be delivered. While you may ignore them for a minimal guest, understanding these fields explains why some exits cannot be immediately followed by injected interrupts. This deepens your understanding of correctness in VM state transitions.
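A simplified check of whether an external interrupt could be injected right now might look like this. It is a sketch that considers only RFLAGS.IF and the STI / MOV SS blocking bits of the interruptibility state; the function name is hypothetical and other conditions (such as the guest's activity state) are ignored.

```c
#include <stdbool.h>
#include <stdint.h>

#define RFLAGS_IF          (1ULL << 9)
#define INTR_BLOCK_STI     (1u << 0)   /* interruptibility bit 0: blocking by STI    */
#define INTR_BLOCK_MOV_SS  (1u << 1)   /* interruptibility bit 1: blocking by MOV SS */

static bool guest_can_take_external_interrupt(uint64_t rflags, uint32_t interruptibility)
{
    if (!(rflags & RFLAGS_IF))
        return false;
    return !(interruptibility & (INTR_BLOCK_STI | INTR_BLOCK_MOV_SS));
}
```

When this returns false, production hypervisors usually request an interrupt-window exit and inject later rather than dropping the interrupt.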

How this fits into the project

Exit handling and debugging appear in Section 3.2, Section 5.10 Phase 2, and Section 7.1.

Definitions & key terms

  • Exit reason -> Code describing why the CPU left guest mode.
  • Exit qualification -> Additional details about the exit.
  • VM-instruction error -> VMCS field reporting why VMX instruction failed.

Mental model diagram (ASCII)

Guest executes -> VM exit -> VMM handles -> VM resume

How it works (step-by-step, with invariants and failure modes)

  1. VMLAUNCH starts guest.
  2. Exit occurs (CPUID/HLT/IO/EPT).
  3. VMM reads exit reason and qualification.
  4. VMM emulates or handles.
  5. VMRESUME continues.

Failure modes: invalid guest state, unhandled exit, wrong emulation.
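Step 3 reads the exit qualification, whose layout depends on the exit reason. For I/O instruction exits it can be decoded as below; the struct and function names are hypothetical and the bit layout follows the SDM.

```c
#include <stdbool.h>
#include <stdint.h>

struct io_exit {
    uint16_t port;        /* bits 31:16                         */
    uint8_t  size;        /* access size in bytes: 1, 2, or 4   */
    bool     is_in;       /* bit 3: 1 = IN, 0 = OUT             */
    bool     is_string;   /* bit 4: INS/OUTS                    */
    bool     is_rep;      /* bit 5: REP prefixed                */
    bool     imm_operand; /* bit 6: port given as an immediate  */
};

static struct io_exit io_decode_qualification(uint64_t qual)
{
    struct io_exit io;

    io.size        = (uint8_t)((qual & 7) + 1);   /* bits 2:0 hold size minus 1 */
    io.is_in       = (qual >> 3) & 1;
    io.is_string   = (qual >> 4) & 1;
    io.is_rep      = (qual >> 5) & 1;
    io.imm_operand = (qual >> 6) & 1;
    io.port        = (uint16_t)(qual >> 16);
    return io;
}
```

A toy serial device, for instance, could print the low byte of guest RAX whenever it sees a 1-byte OUT to a port of its choosing.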

Minimal concrete example

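// advance_guest_rip() and log_unhandled_exit() are placeholder helpers.
switch (exit_reason & 0xFFFF) {   // low 16 bits hold the basic exit reason
  case EXIT_REASON_CPUID: emulate_cpuid(); advance_guest_rip(); break;
  case EXIT_REASON_HLT: running = false; break;
  default: log_unhandled_exit(exit_reason); running = false; break;
}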

Common misconceptions

  • “If VMLAUNCH works once, it always will.” -> Any change to VMCS can break it.
  • “Unhandled exits are harmless.” -> They lead to undefined guest behavior.
  • “Debugging VMX is impossible.” -> VM-instruction error codes are your friend.

Check-your-understanding questions

  1. Why must a VMM handle CPUID exits?
  2. What is the difference between an exit reason and qualification?
  3. Predict what happens if you leave guest segment access-rights fields at zero in protected mode.

Check-your-understanding answers

  1. CPUID exits unconditionally in VMX non-root operation, so the VMM has to handle it; doing so also lets it hide or virtualize CPU features.
  2. Reason is the type of exit; qualification gives detailed context.
  3. VM entry fails its guest-state checks (basic exit reason 33), because a usable segment needs valid type, S, and P bits. Zero segment bases alone are fine in a flat model.

Real-world applications

  • Hypervisors use exit handling to virtualize devices and features.
  • Security researchers use VMX exit tracing for analysis.

Where you’ll apply it

References

  • Intel SDM VMX exit handling sections

Key insights

VM exits are the only way you regain control; they are both your power and your bottleneck.

Summary

You now understand how to set guest state, handle exits, and debug VMX failures.

Homework/Exercises to practice the concept

  1. Implement CPUID emulation that clears the hypervisor bit.
  2. Log guest RIP on every exit and observe patterns.

Solutions to the homework/exercises

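  1. Run CPUID on the host for the requested leaf, clear ECX bit 31 (the hypervisor-present bit) for leaf 1, write the results into guest EAX/EBX/ECX/EDX, and advance guest RIP.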
  2. Log RIP from VMCS field GUEST_RIP.
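The filtering part of the first solution can be isolated into a pure function that is easy to unit test. This is a sketch: the struct and function names are hypothetical, the host CPUID results are assumed to be already in the struct, and also hiding the VMX feature bit is an optional extra not required by the exercise.

```c
#include <stdint.h>

struct cpuid_regs { uint32_t eax, ebx, ecx, edx; };

#define CPUID_1_ECX_VMX        (1u << 5)    /* VMX supported        */
#define CPUID_1_ECX_HYPERVISOR (1u << 31)   /* hypervisor present   */

/* Adjust host CPUID output before handing it to the guest. */
static void cpuid_filter_for_guest(uint32_t leaf, struct cpuid_regs *r)
{
    if (leaf == 1)
        r->ecx &= ~(CPUID_1_ECX_VMX | CPUID_1_ECX_HYPERVISOR);
}

/* New guest RIP after emulating an instruction of instr_len bytes. */
static uint64_t rip_after_exit(uint64_t guest_rip, uint32_t instr_len)
{
    return guest_rip + instr_len;
}
```

In the exit handler, instr_len would come from the VM-exit instruction length field and the result would be written back to GUEST_RIP.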

2.4 VMX Capability MSRs and Control Validation

Fundamentals

A VMX-based hypervisor cannot simply “turn on” virtualization and set arbitrary control bits. Intel exposes capability MSRs that tell you which VMX controls are allowed or required on the current CPU. These MSRs define allowed-0 and allowed-1 bits for each control field, and the hypervisor must compute valid control values by combining its desired features with these constraints. If you skip this step, VMLAUNCH fails with an “invalid control” error. Understanding capability MSRs is therefore essential for a bare-metal VMM: it is the difference between a VM that launches and one that immediately fails with no guest code executed.

Deep Dive into the concept

Intel VMX control fields are not fixed across CPU generations. Newer CPUs introduce additional features and allow different control bit combinations. To manage this, Intel provides a set of MSRs such as IA32_VMX_PINBASED_CTLS, IA32_VMX_PROCBASED_CTLS, IA32_VMX_PROCBASED_CTLS2, IA32_VMX_EXIT_CTLS, and IA32_VMX_ENTRY_CTLS. Each of these MSRs encodes two 32-bit masks: the lower 32 bits indicate which bits must be 1, and the upper 32 bits indicate which bits may be 1. The valid value for a control field is computed as:

controls = (desired | must_be_1) & may_be_1

This formula ensures you never clear a bit that must be 1, and you never set a bit that the CPU forbids. Many bare-metal VMM failures come from ignoring these masks. For example, a hypervisor might attempt to enable secondary execution controls without checking whether they are allowed, leading to VMLAUNCH failure.

Another critical MSR is IA32_VMX_BASIC. It provides the VMCS revision ID that must be written into the first 32 bits of the VMXON and VMCS regions. If this value is wrong, VMXON fails. It also indicates whether “true” control MSRs are supported. If IA32_VMX_BASIC reports that true controls are available, the hypervisor should prefer IA32_VMX_TRUE_* MSRs because they provide more accurate allowed bit masks.
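Choosing between the legacy and “true” capability MSRs can be sketched as a lookup keyed on bit 55 of IA32_VMX_BASIC. The MSR indices are the architectural ones from the SDM; the macro and function names are hypothetical.

```c
#include <stdint.h>

#define MSR_IA32_VMX_BASIC               0x480u
#define MSR_IA32_VMX_PINBASED_CTLS       0x481u
#define MSR_IA32_VMX_PROCBASED_CTLS      0x482u
#define MSR_IA32_VMX_EXIT_CTLS           0x483u
#define MSR_IA32_VMX_ENTRY_CTLS          0x484u
#define MSR_IA32_VMX_TRUE_PINBASED_CTLS  0x48Du
#define MSR_IA32_VMX_TRUE_PROCBASED_CTLS 0x48Eu
#define MSR_IA32_VMX_TRUE_EXIT_CTLS      0x48Fu
#define MSR_IA32_VMX_TRUE_ENTRY_CTLS     0x490u

#define VMX_BASIC_TRUE_CTLS (1ULL << 55)

/* Return the MSR index to read for a given legacy control MSR. */
static uint32_t vmx_ctl_msr_index(uint64_t vmx_basic, uint32_t legacy_msr)
{
    if (!(vmx_basic & VMX_BASIC_TRUE_CTLS))
        return legacy_msr;
    switch (legacy_msr) {
    case MSR_IA32_VMX_PINBASED_CTLS:  return MSR_IA32_VMX_TRUE_PINBASED_CTLS;
    case MSR_IA32_VMX_PROCBASED_CTLS: return MSR_IA32_VMX_TRUE_PROCBASED_CTLS;
    case MSR_IA32_VMX_EXIT_CTLS:      return MSR_IA32_VMX_TRUE_EXIT_CTLS;
    case MSR_IA32_VMX_ENTRY_CTLS:     return MSR_IA32_VMX_TRUE_ENTRY_CTLS;
    default:                          return legacy_msr;  /* no "true" variant */
    }
}
```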

VMX also has fixed bits for CR0 and CR4. The MSRs IA32_VMX_CR0_FIXED0/1 and IA32_VMX_CR4_FIXED0/1 define which bits must be 0 or 1. If the guest’s CR0 or CR4 violates these fixed bits at VM entry, the CPU rejects the entry. The VMM should compute valid guest values by applying these masks or by ensuring that any user-specified value conforms to them. This is particularly important when enabling features like paging or long mode in the guest.

Finally, capability MSRs interact with EPT and VPID. The IA32_VMX_EPT_VPID_CAP MSR indicates which EPT paging structures and memory types are supported, which is crucial when you design your EPT tables. If you choose a memory type or page size unsupported by the CPU, EPT violations or VM entry errors can occur. A minimal VMM can choose conservative defaults (4KB pages, write-back memory type) that are widely supported, but it should still query the MSR and validate its assumptions.
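Querying IA32_VMX_EPT_VPID_CAP before building tables can be sketched as follows. The bit positions are taken from the SDM; the struct and function names are hypothetical, and the policy (require a 4-level walk and write-back, use 2MB pages and A/D flags only when reported) is one reasonable choice rather than the only one.

```c
#include <stdbool.h>
#include <stdint.h>

#define EPT_CAP_WALK_LENGTH_4 (1ULL << 6)
#define EPT_CAP_MT_WB         (1ULL << 14)
#define EPT_CAP_2MB_PAGES     (1ULL << 16)
#define EPT_CAP_1GB_PAGES     (1ULL << 17)
#define EPT_CAP_INVEPT        (1ULL << 20)
#define EPT_CAP_AD_FLAGS      (1ULL << 21)

struct ept_plan {
    bool usable;    /* 4-level walk and WB paging structures supported */
    bool use_2mb;   /* 2MB leaf pages may be used                      */
    bool use_ad;    /* accessed/dirty flags may be enabled in the EPTP */
};

static struct ept_plan ept_plan_from_caps(uint64_t cap)
{
    struct ept_plan p;

    p.usable  = (cap & EPT_CAP_WALK_LENGTH_4) && (cap & EPT_CAP_MT_WB);
    p.use_2mb = p.usable && (cap & EPT_CAP_2MB_PAGES);
    p.use_ad  = p.usable && (cap & EPT_CAP_AD_FLAGS);
    return p;
}
```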

In short, capability MSRs are the contract between your VMM and the CPU. They tell you what the hardware allows and what it requires. A correct VMM treats these values as non-negotiable constraints and derives its configuration from them. Once you adopt this mindset, VMX becomes less mysterious: you are no longer guessing which bits to set, you are complying with a documented contract.

How this fits into the project

This concept is used when building the VMCS in Section 5.10 Phase 1 and when troubleshooting VM entry failures in Section 7.1.

Definitions & key terms

  • Capability MSR -> Model-specific register that defines allowed VMX control bits.
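  • Allowed-0 settings -> Low 32 bits of a capability MSR; a 1 means the corresponding control bit must be 1 (it may not be 0).
  • Allowed-1 settings -> High 32 bits of a capability MSR; a 0 means the corresponding control bit must be 0 (it may not be 1).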
  • VMCS revision ID -> A value required in the VMCS/VMXON region header.

Mental model diagram (ASCII)

Desired controls
    |
    v
Apply must_be_1 (OR)
    |
    v
Apply may_be_1 (AND)
    |
    v
Valid control value

How it works (step-by-step, with invariants and failure modes)

  1. Read IA32_VMX_BASIC and verify VMX is supported.
  2. Read control MSRs (pin-based, proc-based, exit, entry).
  3. Compute valid control values with the allowed-0/allowed-1 masks.
  4. Write controls into VMCS fields.
  5. Invariant: VMCS controls must satisfy MSR constraints or VM entry fails.

Minimal concrete example

uint64_t msr = rdmsr(MSR_IA32_VMX_PROCBASED_CTLS);
uint32_t must_be_1 = (uint32_t)msr;
uint32_t may_be_1  = (uint32_t)(msr >> 32);

uint32_t desired = CPU_BASED_HLT_EXITING | CPU_BASED_CR8_LOAD_EXITING;
uint32_t controls = (desired | must_be_1) & may_be_1;
vmwrite(CPU_BASED_VM_EXEC_CONTROL, controls);

Common misconceptions

  • “You can set any control bit you want.” -> Control bits are constrained by the CPU’s capability MSRs.
  • “VMXON only needs a zeroed page.” -> The VMXON region must contain the correct revision ID.
  • “CR0/CR4 are entirely guest-controlled.” -> VMX imposes fixed bits that must be respected.

Check-your-understanding questions

  1. Why does VMLAUNCH fail if VMX control bits violate capability MSRs?
  2. What is the purpose of the VMCS revision ID?
  3. Predict what happens if you set a control bit that the CPU disallows.
  4. How do IA32_VMX_TRUE_* MSRs differ from the non-“true” ones?

Check-your-understanding answers

  1. The CPU validates controls at VM entry and rejects invalid configurations.
  2. It identifies the VMCS format expected by the CPU for this generation.
  3. VMLAUNCH fails with VM-instruction error 7 (“VM entry with invalid control field(s)”) and the guest never runs.
  4. They provide more accurate allowed-bit masks and should be preferred when available.

Real-world applications

  • Every production hypervisor (VMware, KVM, Hyper-V) computes control fields from MSR masks.
  • CPU feature detection for live migration relies on these capability queries.

Where you’ll apply it

References

  • Intel SDM, VMX capability MSR tables
  • Linux KVM documentation on VMX control validation

Key insights

VMX control fields are not a guess; they are derived from hardware-defined masks.

Summary

You can now compute valid VMX control values and avoid VM entry failures.

Homework/Exercises to practice the concept

  1. Write a helper function adjust_vmx_ctrl(desired, msr) and test it with multiple MSRs.
  2. Read IA32_VMX_BASIC and extract the VMCS revision ID.

Solutions to the homework/exercises

  1. Implement (desired | must_be_1) & may_be_1 and validate against known example values.
  2. The revision ID is the low 31 bits of the MSR; write it into the VMXON/VMCS region header.
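One way to write the helper from exercise 1 is shown below. The extra out-parameter that reports requested-but-unsupported bits is an addition for illustration, not something the exercise requires; without it, the formula silently drops features the CPU does not allow.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Compute a valid control value from the desired bits and a capability MSR.
 * If unsupported is non-NULL, it receives the desired bits the CPU forbids.
 */
static uint32_t adjust_vmx_ctrl(uint32_t desired, uint64_t cap_msr, uint32_t *unsupported)
{
    uint32_t must_be_1 = (uint32_t)cap_msr;          /* low 32 bits  */
    uint32_t may_be_1  = (uint32_t)(cap_msr >> 32);  /* high 32 bits */

    if (unsupported != NULL)
        *unsupported = desired & ~may_be_1;
    return (desired | must_be_1) & may_be_1;
}
```

A caller that depends on a feature (for example, secondary controls for EPT) should treat a non-zero unsupported mask as a fatal setup error rather than continuing.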

3. Project Specification

3.1 What You Will Build

A bare-metal VMM that:

  • Enters VMX operation and configures VMCS.
  • Sets up EPT and guest memory.
  • Runs a tiny guest and handles VM exits.
  • Logs exit reasons and handles shutdown.

Included: VMXON/VMCS, EPT, exit handling, minimal guest. Excluded: full device model, SMP, virtio.

3.2 Functional Requirements

  1. VMX Init: VMXON and VMCS setup.
  2. Guest State: valid registers and segments.
  3. EPT: map guest memory.
  4. Exit Handling: CPUID, HLT, EPT violations.
  5. Logging: exit reasons with RIP.

3.3 Non-Functional Requirements

  • Reliability: VMLAUNCH must succeed consistently.
  • Usability: clear log output and error codes.

3.4 Example Usage / Output

$ sudo ./vmm
[VMX] VMXON success
[VMX] VMCS configured
[VMX] VMLAUNCH
[VMX] VMEXIT reason=CPUID rip=0x1000
[VMX] VMEXIT reason=HLT rip=0x1005
[VMX] Guest halted

3.5 Data Formats / Schemas / Protocols

  • Guest binary: flat binary at GPA 0x1000.
  • Exit log format: [VMX] VMEXIT reason=<reason> rip=<addr>.
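The exit log format above can be produced with a single snprintf call. This is a sketch; format_exit_log is a hypothetical name, and a freestanding VMM would substitute its own formatted-print routine for the C library.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Write "[VMX] VMEXIT reason=<reason> rip=<addr>" into buf. */
static int format_exit_log(char *buf, size_t n, const char *reason, uint64_t rip)
{
    return snprintf(buf, n, "[VMX] VMEXIT reason=%s rip=0x%llx",
                    reason, (unsigned long long)rip);
}
```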

3.6 Edge Cases

  • VMX disabled in BIOS -> exit code 2.
  • VMLAUNCH invalid state -> exit code 3.
  • EPT violation -> exit code 4.

3.7 Real World Outcome

You will have a working, minimal VMM that runs guest instructions directly on hardware.

3.7.1 How to Run (Copy/Paste)

sudo ./vmm

3.7.2 Golden Path Demo (Deterministic)

  • Guest executes CPUID then HLT; logs show two exits and clean shutdown.

3.7.3 CLI Transcript (Success + Failure)

$ sudo ./vmm
[VMX] ok
[exit] code=0

$ sudo ./vmm
[error] VMX disabled in BIOS
[exit] code=2

Exit codes:

  • 0 success
  • 2 VMX disabled
  • 3 invalid VMCS
  • 4 EPT violation

4. Solution Architecture

4.1 High-Level Design

VMM
 |-- VMX init
 |-- VMCS setup
 |-- EPT tables
 |-- VM exit loop

4.2 Key Components

| Component | Responsibility | Key Decisions |
|-----------|----------------|---------------|
| VMX init | enter VMX operation | validate MSRs |
| VMCS config | guest/host state | minimal set |
| EPT | memory translation | identity map |
| Exit handler | emulate CPUID/HLT | log exits |

4.3 Data Structures (No Full Code)

struct VMXState {
  void *vmxon;
  void *vmcs;
  void *ept_pml4;
};

4.4 Algorithm Overview

Key Algorithm: VM Run Loop

  1. VMLAUNCH or VMRESUME.
  2. Handle exit.
  3. Resume until halt.

Complexity Analysis:

  • Time: O(number of exits)
  • Space: O(guest memory + EPT tables)
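The run loop's control flow can be exercised without hardware by hiding the actual VMLAUNCH/VMRESUME behind a callback. Everything here is a hypothetical test scaffold: enter_guest_fn stands in for the assembly stub that enters the guest, and scripted_enter replays a fixed list of exit reasons instead of running one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { EXIT_CPUID = 10, EXIT_HLT = 12 };

/* Enters the guest (VMLAUNCH if !launched, else VMRESUME); returns the exit reason. */
typedef uint32_t (*enter_guest_fn)(bool launched, void *ctx);

struct run_stats { unsigned exits, launches, resumes; int status; };

static struct run_stats vm_run_loop(enter_guest_fn enter, void *ctx)
{
    struct run_stats s = { 0, 0, 0, 0 };
    bool launched = false;

    for (;;) {
        uint32_t reason = enter(launched, ctx) & 0xFFFF;

        if (launched) s.resumes++; else s.launches++;
        launched = true;
        s.exits++;
        switch (reason) {
        case EXIT_CPUID: break;                   /* emulate, advance RIP, resume */
        case EXIT_HLT:   return s;                /* guest halted: clean stop     */
        default:         s.status = -1; return s; /* unhandled exit               */
        }
    }
}

/* Test double: replays a scripted sequence of exit reasons. */
struct scripted_guest { const uint32_t *exits; size_t count, next; };

static uint32_t scripted_enter(bool launched, void *ctx)
{
    struct scripted_guest *g = ctx;

    (void)launched;
    return g->next < g->count ? g->exits[g->next++] : (uint32_t)EXIT_HLT;
}
```

Keeping the launched flag inside the loop mirrors the hardware rule that VMLAUNCH is used once per cleared VMCS and VMRESUME afterwards.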

5. Implementation Guide

5.1 Development Environment Setup

# Requires kernel module or bare metal environment

5.2 Project Structure

vmm/
+-- src/
|   +-- vmx.c
|   +-- ept.c
|   +-- main.c
+-- guest/
    +-- guest.bin

5.3 The Core Question You’re Answering

“How do you control VMX directly without KVM?”

5.4 Concepts You Must Understand First

  1. VMX operation and VMCS fields
  2. EPT mappings
  3. Exit handling

5.5 Questions to Guide Your Design

  1. Which exits will you intercept first?
  2. How will you validate VMCS state?
  3. How will you log errors for debugging?

5.6 Thinking Exercise

Design the minimal VMCS fields required to execute a HLT instruction.

5.7 The Interview Questions They’ll Ask

  1. What is a VMCS and why is it per vCPU?
  2. Explain EPT vs shadow paging.
  3. Why do VMX instructions fail?

5.8 Hints in Layers

Hint 1: Start with a VMXON test only.
Hint 2: Add VMCS fields incrementally.
Hint 3: Handle only HLT exit first.

5.9 Books That Will Help

| Topic | Book | Chapter |
|-------|------|---------|
| VMX | Intel SDM | Vol. 3 |
| Virtual Machines | OS Concepts | Ch. 16 |
| Virtual Memory | CSAPP | Ch. 9 |

5.10 Implementation Phases

Phase 1: Foundation (2-3 weeks)

Goals: Enter VMX operation and validate VMCS.
Tasks: VMXON, VMCS allocation, guest state.
Checkpoint: VMLAUNCH succeeds.

Phase 2: Core Functionality (2-3 weeks)

Goals: EPT and exit handling.
Tasks: Map memory, handle CPUID and HLT exits.
Checkpoint: Guest executes and halts.

Phase 3: Polish & Edge Cases (1-2 weeks)

Goals: Logging and error reporting.
Tasks: VM-instruction error codes and diagnostics.
Checkpoint: invalid VMCS yields clear error.

5.11 Key Implementation Decisions

| Decision | Options | Recommendation | Rationale |
|----------|---------|----------------|-----------|
| Guest mode | real vs long mode | real mode for minimal (needs the “unrestricted guest” control) | simpler state |
| EPT page size | 4KB vs 2MB | 2MB (if IA32_VMX_EPT_VPID_CAP reports support) | fewer tables |
| Exit handling | minimal vs broad | minimal | focus on core |


6. Testing Strategy

6.1 Test Categories

| Category | Purpose | Examples |
|----------|---------|----------|
| Unit Tests | VMX helpers | CR0 sanitization |
| Integration Tests | guest run | CPUID + HLT |
| Edge Case Tests | VMX disabled | BIOS off |

6.2 Critical Test Cases

  1. VMXON succeeds and VMLAUNCH runs.
  2. Guest executes CPUID then halts.
  3. VMX disabled error returns exit code 2.

6.3 Test Data

Guest binaries: hlt.bin, cpuid.bin

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

| Pitfall | Symptom | Solution |
|---------|---------|----------|
| Missing fixed bits | VMLAUNCH fails | apply fixed0/fixed1 masks |
| Wrong segment state | failed VM entry or triple fault | use known-good descriptors |
| EPT misalignment | EPT violation or misconfiguration | align tables to 4KB |

7.2 Debugging Strategies

  • Read VM-instruction error field.
  • Log VM exit reason and RIP.

7.3 Performance Traps

  • Excessive logging can dominate runtime.

8. Extensions & Challenges

8.1 Beginner Extensions

  • Add MSR emulation for IA32_EFER.

8.2 Intermediate Extensions

  • Support 64-bit long mode guest.

8.3 Advanced Extensions

  • Add simple MMIO device emulation.

9. Real-World Connections

9.1 Industry Applications

  • Hypervisors and secure enclaves.
  • KVM: kernel VMX usage.
  • Xen: alternative hypervisor architecture.

9.3 Interview Relevance

  • VMX internals are a niche but impressive systems topic.

10. Resources

10.1 Essential Reading

  • Intel SDM VMX chapters

10.2 Video Resources

  • VMX deep dive talks

10.3 Tools & Documentation

  • rdmsr, wrmsr, kvm-unit-tests

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain VMX operation and VMCS fields.
  • I understand EPT translation.
  • I can interpret VM-exit reasons.

11.2 Implementation

  • VMLAUNCH succeeds.
  • Guest executes and halts.
  • Error handling is clear.

11.3 Growth

  • I can explain this VMM in a technical interview.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • VMXON + VMLAUNCH works.
  • Guest runs and halts.

Full Completion:

  • EPT and exit handling implemented.
  • Deterministic logs.

Excellence (Going Above & Beyond):

  • Long mode guest or MMIO emulation.