Project 13: Operating System Kernel (Capstone)

The ultimate C++ challenge: writing bare-metal code that runs with no safety net

Quick Reference

Attribute Value
Difficulty Master
Time Estimate 6-12 months
Language C++ (with C and Assembly)
Prerequisites All previous projects, assembly basics, computer architecture
Key Topics Bare metal, paging, scheduling, interrupts, filesystems

1. Learning Objectives

By completing this capstone project, you will:

  • Understand the boot process from BIOS/UEFI through bootloader to kernel initialization
  • Implement memory management including physical/virtual memory, paging, and heap allocation
  • Master interrupt handling for hardware events, exceptions, and system calls
  • Build a process scheduler with context switching and multiple execution contexts
  • Create a simple filesystem for persistent storage and file operations
  • Write C++ without the standard library using freestanding mode and custom allocators
  • Use inline assembly to interface directly with CPU instructions and hardware
  • Debug bare-metal code using QEMU, GDB, and serial output

This project synthesizes everything from the previous 12 projects: memory management from Project 3, data structures from Project 4, concurrency from Project 5, parsing from Projects 1/6/10, and low-level I/O from Projects 2/7/8.


2. Theoretical Foundation

2.1 Core Concepts

What is an Operating System Kernel?

The kernel is the core of an operating system. It’s the first program that runs after the bootloader, and it provides the fundamental abstractions that all other programs depend on:

Hardware Layer (CPU, Memory, Devices)
           |
           v
+------------------------+
|        KERNEL          |
| - Memory Management    |
| - Process Scheduling   |
| - Device Drivers       |
| - System Calls         |
| - Interrupt Handling   |
+------------------------+
           |
           v
     User Programs

Key Abstractions the Kernel Provides:

  1. Processes - Illusion of dedicated CPU for each program
  2. Virtual Memory - Illusion of private address space for each process
  3. Files - Uniform interface to persistent storage
  4. System Calls - Safe way for user programs to request kernel services

The Privilege Model:

Modern CPUs have protection rings. x86 has 4 rings, but most OSes use only 2:

Ring 0 (Kernel Mode)
+------------------------------------------+
| - Full hardware access                   |
| - Can execute privileged instructions    |
| - Can access all memory                  |
| - Handles interrupts and exceptions      |
+------------------------------------------+

Ring 3 (User Mode)
+------------------------------------------+
| - Restricted hardware access             |
| - Cannot execute privileged instructions |
| - Can only access permitted memory       |
| - Must use syscalls for kernel services  |
+------------------------------------------+

Memory Architecture:

Physical Memory Layout (simplified):
+----------------+ 0x00000000
| Reserved/BIOS  |
+----------------+ 0x00100000 (1 MB)
| Kernel Code    |
| Kernel Data    |
| Kernel Heap    |
+----------------+
| Free Memory    |
| (for processes)|
+----------------+ End of RAM

Virtual Address Space (per process):
+----------------+ 0x00000000
| User Code      |
| User Data      |
| User Heap      |
|       v        |
|                |
|       ^        |
| User Stack     |
+----------------+ 0xC0000000 (typical split)
| Kernel Space   |
| (mapped same   |
|  in all procs) |
+----------------+ 0xFFFFFFFF

2.2 Why This Matters

Understanding the Foundation:

Every program you’ve ever written runs on top of an operating system. When you call malloc(), the kernel’s memory manager handles the request. When you read a file, the kernel’s filesystem and device drivers do the work. When your program runs concurrently with others, the kernel’s scheduler makes it possible.

Writing a kernel gives you:

  • Complete understanding of how software interacts with hardware
  • Debugging superpowers - you’ll understand crashes, memory issues, and performance at the deepest level
  • Systems intuition - you’ll make better design decisions in all future projects
  • Ultimate C++ mastery - no safety net means you truly understand the language

Industry Relevance:

  • Embedded systems - IoT devices, automotive, aerospace all need kernel-level code
  • Cloud infrastructure - Hypervisors, containers, and orchestration require kernel knowledge
  • Security - Kernel vulnerabilities are the most critical; understanding prevents them
  • Performance - When milliseconds matter, kernel-level optimization is essential

2.3 Historical Context

The Evolution of Operating Systems:

1960s: Batch Processing
       Programs run one at a time, no interaction

1970s: Time-Sharing (UNIX born)
       Multiple users share a single computer
       Ken Thompson and Dennis Ritchie at Bell Labs

1980s: Personal Computers
       MS-DOS, Mac OS, early Windows
       Single-user, limited multitasking

1990s: Modern Kernels
       Linux kernel (1991), Windows NT (1993)
       True multitasking, virtual memory, networking

2000s-Present: Cloud and Containers
       Hypervisors, containers, microservices
       Kernels now virtualized and containerized

Key Kernel Architectures:

Monolithic Kernel (Linux, BSD):
+------------------------------------------+
| All kernel services in one address space |
| Drivers, filesystem, networking, etc.    |
| Fast (no IPC overhead)                   |
| Large attack surface                     |
+------------------------------------------+

Microkernel (MINIX, QNX, seL4):
+------------------+
| Minimal kernel   |
| IPC, scheduling  |
| memory basics    |
+------------------+
        ^
        | IPC
        v
+-------+-------+-------+
| FS    | Net   |Drivers|  <- User-space servers
+-------+-------+-------+
Small attack surface, but IPC overhead

Hybrid (Windows NT, macOS):
Mix of both approaches

2.4 Common Misconceptions

Misconception 1: “Writing an OS requires years of work” Reality: A minimal kernel that boots, manages memory, and runs simple programs can be built in months. Linux started as a hobby project.

Misconception 2: “You can’t use C++ for kernel development” Reality: Many modern kernels use C++. You just can’t use features that require runtime support (exceptions, RTTI) without implementing that support first.

Misconception 3: “Kernels are only for x86” Reality: The concepts are universal. ARM, RISC-V, MIPS all work similarly. We use x86/x86-64 because QEMU emulation is excellent.

Misconception 4: “I need real hardware to develop a kernel” Reality: QEMU provides perfect emulation. You can develop entirely in a virtual machine with full debugging support.

Misconception 5: “Kernel development is dangerous and could break my computer” Reality: In an emulator, the worst case is restarting QEMU. Your host system is completely isolated.


3. Project Specification

3.1 What You Will Build

A minimal but complete operating system kernel for x86-64 architecture that:

  1. Boots via GRUB using the Multiboot2 specification
  2. Sets up the Global Descriptor Table (GDT) for memory segmentation
  3. Configures the Interrupt Descriptor Table (IDT) for interrupt handling
  4. Implements paging for virtual memory with 4KB pages
  5. Provides a heap allocator with custom operator new and operator delete
  6. Includes a process scheduler with round-robin scheduling and context switching
  7. Has a simple filesystem for reading files from a RAM disk
  8. Runs user-space programs with proper privilege separation
  9. Provides system calls for I/O, process control, and memory allocation

3.2 Functional Requirements

ID Requirement Description
F1 Boot Kernel boots via GRUB and reaches kernel_main()
F2 Terminal VGA text-mode output with scrolling and colors
F3 GDT Proper memory segmentation for kernel and user mode
F4 IDT Interrupt handlers for CPU exceptions and hardware IRQs
F5 Paging Virtual memory with kernel mapped in high memory
F6 Heap Dynamic memory allocation with new/delete
F7 Keyboard PS/2 keyboard input via IRQ1
F8 Timer PIT or APIC timer for scheduling at ~100Hz
F9 Scheduler Round-robin scheduling with context switching
F10 Filesystem Simple filesystem (initrd or ext2-like)
F11 User Mode Transition to Ring 3 for user programs
F12 Syscalls System call interface via software interrupt
F13 Shell Basic command interpreter for user interaction

3.3 Non-Functional Requirements

ID Requirement Target
NF1 Boot Time < 1 second from GRUB to shell prompt
NF2 Memory Support up to 4GB RAM (32-bit) or 16GB (64-bit)
NF3 Stability No kernel panics during normal operation
NF4 Debuggability Serial console output for QEMU debugging
NF5 Portability Clean separation between arch-specific and generic code

3.4 Example Usage / Output

$ make run
qemu-system-x86_64 -kernel kernel.bin -serial stdio

Booting MyOS v0.1.0...

[  0.000] Kernel loaded at 0xFFFFFFFF80100000
[  0.001] CPU: GenuineIntel, 64-bit mode
[  0.002] Memory: 128 MB detected

[  0.003] Initializing subsystems...
[  0.003]   GDT: OK (5 entries)
[  0.004]   IDT: OK (256 entries)
[  0.005]   Paging: OK (kernel mapped at 0xFFFFFFFF80000000)
[  0.010]   Heap: OK (64 MB available)
[  0.011]   Timer: OK (PIT at 100 Hz)
[  0.012]   Keyboard: OK (PS/2)
[  0.015]   Filesystem: OK (initrd mounted at /)
[  0.020]   Scheduler: OK

[  0.025] Starting init process (PID 1)...

Welcome to MyOS!

myos:/$ ls
bin/  dev/  etc/  home/  usr/

myos:/$ cat /etc/motd
Welcome to MyOS - A minimal operating system built from scratch!

myos:/$ ps
PID   STATE     CMD
1     RUNNING   init
2     RUNNING   shell

myos:/$ /bin/hello
Hello from user space!
System call test: write() works!

myos:/$ /bin/forktest
Parent process: PID 3
Child process: PID 4
Fork and exec working correctly!

myos:/$ free
Total: 128 MB
Used:  12 MB
Free:  116 MB
Kernel: 4 MB
Heap: 8 MB

myos:/$ uptime
System uptime: 0d 0h 5m 23s
Tick count: 32300

myos:/$ exit
Halting system...

3.5 Real World Outcome

When you complete this project, you will have:

  1. A bootable kernel that runs in QEMU (and potentially real hardware)
  2. Full source code for memory management, scheduling, and I/O
  3. Understanding of OS concepts that’s impossible to get from reading alone
  4. A foundation for building more advanced OS features
  5. Interview-ready knowledge of systems programming fundamentals

You’ll be able to explain:

  • How a computer boots from power-on to running your code
  • How virtual memory protects processes from each other
  • How the CPU switches between processes
  • How system calls transition between user and kernel mode
  • How interrupts enable hardware communication

4. Solution Architecture

4.1 High-Level Design

+------------------------------------------------------------------+
|                        User Space (Ring 3)                        |
|  +----------+  +----------+  +----------+  +----------+          |
|  |  shell   |  |  hello   |  | forktest |  |   ...    |          |
|  +----+-----+  +----+-----+  +----+-----+  +----+-----+          |
|       |             |             |             |                 |
+-------|-------------|-------------|-------------|----------------+
        |             |             |             |
        v             v             v             v
   +----------------------------------------------------------+
   |                  System Call Interface                     |
   |  int 0x80 / syscall instruction                           |
   +----------------------------------------------------------+
        |             |             |             |
+-------|-------------|-------------|-------------|----------------+
|                        Kernel Space (Ring 0)                      |
|  +-------------+  +-------------+  +-------------+               |
|  | Scheduler   |  | Memory Mgr  |  | Filesystem  |               |
|  +------+------+  +------+------+  +------+------+               |
|         |                |                |                      |
|  +------v------+  +------v------+  +------v------+               |
|  | Process     |  | Page        |  | VFS         |               |
|  | Control     |  | Allocator   |  | Layer       |               |
|  +------+------+  +------+------+  +------+------+               |
|         |                |                |                      |
|  +------v----------------v----------------v------+               |
|  |              Hardware Abstraction             |               |
|  |  +-------+  +-------+  +-------+  +-------+  |               |
|  |  |  GDT  |  |  IDT  |  | Timer |  |  KBD  |  |               |
|  |  +-------+  +-------+  +-------+  +-------+  |               |
|  +------------------------------------------------+               |
+------------------------------------------------------------------+
        |             |             |             |
        v             v             v             v
+------------------------------------------------------------------+
|                         Hardware Layer                            |
|  CPU     RAM     VGA     PIT     PIC     PS/2 Keyboard           |
+------------------------------------------------------------------+

4.2 Key Components

Boot Sequence:

1. BIOS/UEFI -> GRUB Bootloader
                    |
2. GRUB loads kernel.bin at 0x100000 (1MB)
                    |
3. boot.asm: _start entry point
   - Set up initial stack
   - Check multiboot magic
   - Call kernel_main()
                    |
4. kernel_main() (C++)
   - Initialize GDT (segmentation)
   - Initialize IDT (interrupts)
   - Enable paging (virtual memory)
   - Initialize heap (dynamic allocation)
   - Initialize drivers (keyboard, timer)
   - Start scheduler
   - Launch init process

Memory Map After Boot:

Virtual Address Space (Higher-Half Kernel):

0xFFFFFFFF'FFFFFFFF  +------------------------+
                     |    Kernel Stack        |
0xFFFFFFFF'80100000  +------------------------+
                     |    Kernel Code/Data    |
0xFFFFFFFF'80000000  +------------------------+
                     |    Kernel Heap         |
                     |         v              |
                     |                        |
                     |         ^              |
0x00007FFF'FFFFFFFF  +------------------------+
                     |    User Stack          |
                     |         v              |
                     |                        |
                     |         ^              |
                     |    User Heap           |
                     +------------------------+
                     |    User Code/Data      |
0x00000000'00400000  +------------------------+
                     |    Reserved            |
0x00000000'00000000  +------------------------+

4.3 Data Structures

Process Control Block (PCB):

enum class ProcessState {
    READY,
    RUNNING,
    BLOCKED,
    TERMINATED
};

struct Context {
    uint64_t rax, rbx, rcx, rdx;
    uint64_t rsi, rdi, rbp;
    uint64_t r8, r9, r10, r11, r12, r13, r14, r15;
    uint64_t rip;           // Instruction pointer
    uint64_t cs;            // Code segment
    uint64_t rflags;        // CPU flags
    uint64_t rsp;           // Stack pointer
    uint64_t ss;            // Stack segment
};

struct Process {
    uint32_t pid;
    ProcessState state;
    Context context;
    uint64_t* page_table;   // CR3 value
    uint64_t kernel_stack;
    uint64_t user_stack;
    char name[64];
    Process* next;          // For scheduler queue
};

Page Table Entry (x86-64):

struct PageTableEntry {
    uint64_t present    : 1;   // Page is in memory
    uint64_t writable   : 1;   // Page is writable
    uint64_t user       : 1;   // User-mode accessible
    uint64_t pwt        : 1;   // Page write-through
    uint64_t pcd        : 1;   // Page cache disable
    uint64_t accessed   : 1;   // Page was accessed
    uint64_t dirty      : 1;   // Page was written to
    uint64_t pat        : 1;   // Page attribute table
    uint64_t global     : 1;   // Global page
    uint64_t available  : 3;   // Available for OS use
    uint64_t address    : 40;  // Physical address (bits 12-51)
    uint64_t reserved   : 11;  // Reserved
    uint64_t nx         : 1;   // No-execute bit
};

GDT Entry:

struct GDTEntry {
    uint16_t limit_low;
    uint16_t base_low;
    uint8_t  base_middle;
    uint8_t  access;
    uint8_t  granularity;
    uint8_t  base_high;
} __attribute__((packed));

// Access byte bits:
// Bit 7: Present (1 = valid)
// Bit 6-5: Ring (0 = kernel, 3 = user)
// Bit 4: Descriptor type (1 = code/data)
// Bit 3: Executable (1 = code, 0 = data)
// Bit 2: Direction/Conforming
// Bit 1: Readable/Writable
// Bit 0: Accessed

IDT Entry:

struct IDTEntry {
    uint16_t offset_low;    // Offset bits 0-15
    uint16_t selector;      // Code segment selector
    uint8_t  ist;           // Interrupt Stack Table offset
    uint8_t  type_attr;     // Type and attributes
    uint16_t offset_mid;    // Offset bits 16-31
    uint32_t offset_high;   // Offset bits 32-63
    uint32_t zero;          // Reserved
} __attribute__((packed));

4.4 Algorithm Overview

Context Switching:

1. Timer interrupt fires (IRQ0)
2. CPU pushes interrupt frame (RIP, CS, RFLAGS, RSP, SS)
3. Interrupt handler saves remaining registers
4. Scheduler selects next process
5. Scheduler loads next process's registers
6. IRETQ restores RIP, CS, RFLAGS, RSP, SS
7. Execution continues in new process

Page Fault Handling:

1. CPU generates #PF exception
2. CR2 register contains faulting address
3. Error code indicates:
   - Bit 0: Present (0 = not present, 1 = protection violation)
   - Bit 1: Write (0 = read, 1 = write)
   - Bit 2: User (0 = kernel, 1 = user mode)
4. Handler checks if address is valid for process
5. If valid: allocate page, map it, return
6. If invalid: kill process (segfault)

System Call Flow:

User code:
    mov rax, SYSCALL_WRITE   ; syscall number
    mov rdi, fd              ; arg1
    mov rsi, buffer          ; arg2
    mov rdx, length          ; arg3
    int 0x80                 ; trigger syscall

Kernel:
    1. IDT entry 0x80 points to syscall_handler
    2. Handler saves user context
    3. Dispatch based on rax value
    4. Execute kernel function
    5. Store return value in rax
    6. Restore user context
    7. IRETQ back to user code

5. Implementation Guide

5.1 Development Environment Setup

Required Tools:

# Cross-compiler (builds code for target, not host)
# On Ubuntu/Debian:
sudo apt install build-essential bison flex libgmp3-dev \
    libmpc-dev libmpfr-dev texinfo libisl-dev

# Or install pre-built cross-compiler:
sudo apt install gcc-x86-64-linux-gnu g++-x86-64-linux-gnu

# Assembler and linker
sudo apt install nasm

# QEMU for testing
sudo apt install qemu-system-x86

# GRUB tools for bootable image
sudo apt install grub-pc-bin xorriso mtools

# Debugging
sudo apt install gdb

# macOS (using Homebrew):
brew install x86_64-elf-gcc nasm qemu xorriso mtools

Verify Setup:

x86_64-elf-gcc --version   # Cross-compiler
nasm --version             # Assembler
qemu-system-x86_64 --version  # Emulator
grub-mkrescue --version    # GRUB tools

5.2 Project Structure

myos/
├── Makefile
├── linker.ld               # Linker script
├── grub.cfg                # GRUB configuration
│
├── boot/
│   ├── boot.asm            # Multiboot header and entry point
│   ├── gdt.asm             # GDT setup (assembly)
│   └── interrupts.asm      # Interrupt stubs (assembly)
│
├── kernel/
│   ├── kernel.cpp          # kernel_main()
│   ├── terminal.cpp        # VGA text output
│   ├── terminal.h
│   ├── string.cpp          # String utilities (no libc!)
│   ├── string.h
│   ├── port.h              # I/O port access (inb, outb)
│   └── panic.cpp           # Kernel panic handler
│
├── cpu/
│   ├── gdt.cpp             # GDT initialization
│   ├── gdt.h
│   ├── idt.cpp             # IDT initialization
│   ├── idt.h
│   ├── isr.cpp             # Interrupt service routines
│   └── isr.h
│
├── memory/
│   ├── pmm.cpp             # Physical memory manager
│   ├── pmm.h
│   ├── vmm.cpp             # Virtual memory manager (paging)
│   ├── vmm.h
│   ├── heap.cpp            # Kernel heap
│   └── heap.h
│
├── drivers/
│   ├── keyboard.cpp        # PS/2 keyboard driver
│   ├── keyboard.h
│   ├── timer.cpp           # PIT timer driver
│   └── timer.h
│
├── process/
│   ├── process.cpp         # Process management
│   ├── process.h
│   ├── scheduler.cpp       # Round-robin scheduler
│   └── scheduler.h
│
├── fs/
│   ├── vfs.cpp             # Virtual filesystem layer
│   ├── vfs.h
│   ├── initrd.cpp          # Initial RAM disk
│   └── initrd.h
│
├── syscall/
│   ├── syscall.cpp         # System call dispatcher
│   └── syscall.h
│
└── user/
    ├── shell.cpp           # Simple shell
    ├── init.cpp            # Init process
    └── programs/
        ├── hello.cpp       # Hello world user program
        └── forktest.cpp    # Fork test program

5.3 The Core Question You’re Answering

“How does a computer execute code when there is no operating system to provide services?”

When you power on a computer, there’s nothing to call malloc() on, no printf() to call, no filesystem to read from. The kernel must create all of these services from scratch, using only:

  1. CPU instructions
  2. Memory (as raw bytes)
  3. I/O ports (for hardware communication)

You’re building the layer that makes everything else possible.

5.4 Concepts You Must Understand First

Before writing code, ensure you can answer:

Computer Architecture:

  • What is the difference between real mode and protected mode? (x86)
  • What are privilege rings and how do they work?
  • How does the CPU switch from user mode to kernel mode?

Memory:

  • What is the difference between physical and virtual addresses?
  • How does a page table translate virtual to physical addresses?
  • What is a TLB and why does it matter?

Interrupts:

  • What happens when an interrupt fires?
  • What is the difference between an interrupt and an exception?
  • How does the CPU know where to jump when an interrupt occurs?

C++ Specifics:

  • What is a freestanding environment vs hosted environment?
  • What C++ features require runtime support (and thus won’t work)?
  • How does operator new work when there’s no malloc?

Book References:

  • “Operating Systems: Three Easy Pieces” - Chapters 1-6, 12-16, 36-39
  • “Computer Systems: A Programmer’s Perspective” - Chapter 9 (Virtual Memory)
  • “The Linux Programming Interface” - Chapters 2, 6 (concepts, not Linux-specific)

5.5 Questions to Guide Your Design

Boot Process:

  • How will GRUB find and load your kernel?
  • What state is the CPU in when your code starts executing?
  • What’s the minimum setup needed before calling C++ code?

Memory Management:

  • How will you track which physical pages are free?
  • Will you use a bitmap, linked list, or buddy allocator?
  • How will you handle memory fragmentation in the heap?

Interrupts:

  • How will you handle CPU exceptions (divide by zero, page fault)?
  • How will you handle hardware interrupts (keyboard, timer)?
  • What’s your interrupt handler calling convention?

Scheduling:

  • What scheduling algorithm will you use?
  • How will you handle processes that block (waiting for I/O)?
  • What data do you need to save during context switch?

User Space:

  • How will you transition from kernel mode to user mode?
  • How will user programs request kernel services?
  • How will you protect kernel memory from user programs?

5.6 Thinking Exercise

Before writing any code, trace through this scenario mentally:

Exercise: A key is pressed on the keyboard

  1. Where does the electrical signal go first?
  2. How does the CPU know a key was pressed?
  3. What happens to the currently running process?
  4. How does the kernel get control?
  5. How is the key data read?
  6. How does the data get to the waiting process?
  7. How does the waiting process resume?

Write out each step with the specific registers, instructions, and data structures involved. This exercise will reveal gaps in your understanding before you start coding.

5.7 Hints in Layers

Hint 1 - Starting Point: Begin with boot.asm that sets up a Multiboot header. The header tells GRUB this is a valid kernel. Set up a stack and call kernel_main(). In kernel_main(), just write “Hello” to VGA memory at 0xB8000.

Hint 2 - Next Level (GDT/IDT): The GDT defines memory segments. Set up null, kernel code, kernel data, user code, and user data segments. The IDT maps interrupt numbers to handlers. Create stub handlers that just print “Interrupt X” and hang.

Hint 3 - Technical Details (Memory): For physical memory, use a bitmap where each bit represents a 4KB page. Parse the multiboot memory map to find usable regions. For paging, set up a 4-level page table (PML4 -> PDPT -> PD -> PT) and enable paging by setting CR3 and the PG bit in CR0.

Hint 4 - Tools/Debugging: Use QEMU with -d int -no-reboot -no-shutdown to see interrupts. Use -serial stdio for serial output (easier to debug than VGA). Use GDB with qemu-system-x86_64 -s -S to single-step through code.

5.8 The Interview Questions They’ll Ask

  1. “Explain the x86 boot process from power-on to your kernel’s entry point.”
    • Expected: BIOS POST, MBR/GPT, bootloader stages, protected/long mode transition
  2. “How does virtual memory protect processes from each other?”
    • Expected: Separate page tables, user/supervisor bit, page fault handling
  3. “Walk me through what happens when a process makes a system call.”
    • Expected: int/syscall instruction, privilege switch, register conventions, iret
  4. “How does your scheduler choose the next process to run?”
    • Expected: Ready queue, round-robin, time quantum, context switch mechanics
  5. “What happens when a page fault occurs?”
    • Expected: CR2 contains faulting address, error code interpretation, demand paging
  6. “How do you handle the ‘no standard library’ constraint in C++?”
    • Expected: Freestanding mode, custom operator new/delete, no exceptions/RTTI
  7. “What’s the difference between an interrupt and an exception?”
    • Expected: External vs internal, IRQs vs traps/faults, PIC/APIC vs CPU-generated
  8. “How would you add support for multiprocessing (SMP)?”
    • Expected: Per-CPU structures, APIC, spinlocks, scheduler per CPU

5.9 Books That Will Help

Topic Book Chapter(s)
OS Concepts “Operating Systems: Three Easy Pieces” All, especially Part I & II
x86 Architecture Intel Manual Volume 3A Chapters 2-6
Virtual Memory “CS:APP” Chapter 9
Boot Process “Low-Level Programming” by Igor Zhirkov Chapters 17-19
C++ Freestanding “Effective Modern C++” Items relevant to embedded
Filesystems “Operating Systems: Three Easy Pieces” Chapters 39-42
Scheduling “Operating Systems: Three Easy Pieces” Chapters 7-10

5.10 Implementation Phases

Phase 1: Boot and Print (Week 1-2)

Goal: Kernel boots and prints “Hello” to screen

; boot.asm - Multiboot header
section .multiboot
align 4
    dd 0x1BADB002              ; Magic number
    dd 0x00                    ; Flags (no special requirements)
    dd -(0x1BADB002 + 0x00)    ; Checksum (must equal zero when added)

section .bss
align 16
stack_bottom:
    resb 16384                 ; 16 KB stack
stack_top:

section .text
global _start
extern kernel_main

_start:
    ; Set up stack
    mov esp, stack_top

    ; Disable interrupts
    cli

    ; Call C++ kernel
    call kernel_main

    ; Halt if kernel returns
    cli
.hang:
    hlt
    jmp .hang
// kernel.cpp
extern "C" void kernel_main() {
    // VGA text buffer at 0xB8000
    volatile char* video = (volatile char*)0xB8000;

    const char* hello = "Hello from kernel!";
    for (int i = 0; hello[i] != '\0'; i++) {
        video[i * 2] = hello[i];      // Character
        video[i * 2 + 1] = 0x0F;      // White on black
    }

    while (true) {
        asm volatile("hlt");
    }
}

Phase 2: GDT and Protected Mode (Week 3-4)

Goal: Proper memory segmentation

// gdt.h
struct GDTEntry {
    uint16_t limit_low;
    uint16_t base_low;
    uint8_t  base_middle;
    uint8_t  access;
    uint8_t  granularity;
    uint8_t  base_high;
} __attribute__((packed));

struct GDTPointer {
    uint16_t limit;
    uint64_t base;
} __attribute__((packed));

namespace GDT {
    void initialize();
}

// gdt.cpp
static GDTEntry gdt[5];
static GDTPointer gdtr;

void setEntry(int num, uint32_t base, uint32_t limit,
              uint8_t access, uint8_t gran) {
    gdt[num].base_low = base & 0xFFFF;
    gdt[num].base_middle = (base >> 16) & 0xFF;
    gdt[num].base_high = (base >> 24) & 0xFF;
    gdt[num].limit_low = limit & 0xFFFF;
    gdt[num].granularity = ((limit >> 16) & 0x0F) | (gran & 0xF0);
    gdt[num].access = access;
}

void GDT::initialize() {
    gdtr.limit = sizeof(gdt) - 1;
    gdtr.base = (uint64_t)&gdt;

    setEntry(0, 0, 0, 0, 0);                    // Null
    setEntry(1, 0, 0xFFFFFFFF, 0x9A, 0xCF);     // Kernel code
    setEntry(2, 0, 0xFFFFFFFF, 0x92, 0xCF);     // Kernel data
    setEntry(3, 0, 0xFFFFFFFF, 0xFA, 0xCF);     // User code
    setEntry(4, 0, 0xFFFFFFFF, 0xF2, 0xCF);     // User data

    // Load GDT (assembly helper)
    gdt_flush(&gdtr);
}

Phase 3: Interrupts (Week 5-6)

Goal: Handle CPU exceptions and hardware interrupts

// idt.h
struct IDTEntry {
    uint16_t offset_low;
    uint16_t selector;
    uint8_t  ist;
    uint8_t  type_attr;
    uint16_t offset_mid;
    uint32_t offset_high;
    uint32_t zero;
} __attribute__((packed));

namespace IDT {
    void initialize();
    void setEntry(uint8_t vector, void (*handler)(), uint8_t type);
}

// isr.cpp
extern "C" void isr_handler(uint64_t vector, uint64_t error_code) {
    if (vector == 14) {
        // Page fault
        uint64_t faulting_address;
        asm volatile("mov %%cr2, %0" : "=r"(faulting_address));
        Terminal::printf("Page Fault at 0x%x\n", faulting_address);
    } else {
        Terminal::printf("Exception %d (error: %d)\n", vector, error_code);
    }
    while (true) asm volatile("hlt");
}

extern "C" void irq_handler(uint64_t irq) {
    if (irq == 0) {
        // Timer
        Scheduler::tick();
    } else if (irq == 1) {
        // Keyboard
        uint8_t scancode = inb(0x60);
        Keyboard::handle(scancode);
    }

    // Send EOI to PIC
    if (irq >= 8) outb(0xA0, 0x20);  // Slave PIC
    outb(0x20, 0x20);                 // Master PIC
}

Phase 4: Memory Management (Week 7-10)

Goal: Physical and virtual memory working

// pmm.cpp - Physical Memory Manager
class PhysicalMemoryManager {
    uint8_t* bitmap;
    size_t total_pages;
    size_t used_pages;

public:
    void initialize(uint64_t memory_size) {
        total_pages = memory_size / 4096;
        used_pages = 0;

        // Bitmap needs 1 bit per page
        size_t bitmap_size = total_pages / 8;
        bitmap = (uint8_t*)BITMAP_ADDRESS;

        // Mark all pages as used initially
        memset(bitmap, 0xFF, bitmap_size);
    }

    void markFree(uint64_t base, uint64_t length) {
        uint64_t start_page = base / 4096;
        uint64_t num_pages = length / 4096;

        for (uint64_t i = 0; i < num_pages; i++) {
            uint64_t page = start_page + i;
            bitmap[page / 8] &= ~(1 << (page % 8));
        }
    }

    uint64_t allocate() {
        for (size_t i = 0; i < total_pages; i++) {
            if (!(bitmap[i / 8] & (1 << (i % 8)))) {
                bitmap[i / 8] |= (1 << (i % 8));
                used_pages++;
                return i * 4096;
            }
        }
        return 0;  // Out of memory
    }

    void free(uint64_t address) {
        uint64_t page = address / 4096;
        bitmap[page / 8] &= ~(1 << (page % 8));
        used_pages--;
    }
};

// vmm.cpp - Virtual Memory Manager
void mapPage(uint64_t* pml4, uint64_t virtual_addr, uint64_t physical_addr,
             uint64_t flags) {
    uint64_t pml4_idx = (virtual_addr >> 39) & 0x1FF;
    uint64_t pdpt_idx = (virtual_addr >> 30) & 0x1FF;
    uint64_t pd_idx   = (virtual_addr >> 21) & 0x1FF;
    uint64_t pt_idx   = (virtual_addr >> 12) & 0x1FF;

    // Traverse/create page table hierarchy
    uint64_t* pdpt = getOrCreateTable(pml4, pml4_idx);
    uint64_t* pd   = getOrCreateTable(pdpt, pdpt_idx);
    uint64_t* pt   = getOrCreateTable(pd, pd_idx);

    // Set the final mapping
    pt[pt_idx] = (physical_addr & ~0xFFF) | flags | PAGE_PRESENT;

    // Invalidate TLB for this address
    asm volatile("invlpg (%0)" :: "r"(virtual_addr) : "memory");
}

Phase 5: Heap and Allocators (Week 11-12)

Goal: operator new/delete working

// heap.cpp
struct BlockHeader {
    size_t size;
    bool free;
    BlockHeader* next;
};

static BlockHeader* heap_start = nullptr;
static BlockHeader* heap_end = nullptr;

void Heap::initialize(uint64_t start, size_t size) {
    heap_start = (BlockHeader*)start;
    heap_start->size = size - sizeof(BlockHeader);
    heap_start->free = true;
    heap_start->next = nullptr;
    heap_end = heap_start;
}

void* Heap::allocate(size_t size) {
    // Align to 16 bytes
    size = (size + 15) & ~15;

    BlockHeader* current = heap_start;
    while (current) {
        if (current->free && current->size >= size) {
            // Split block if significantly larger
            if (current->size > size + sizeof(BlockHeader) + 32) {
                BlockHeader* new_block = (BlockHeader*)
                    ((uint8_t*)current + sizeof(BlockHeader) + size);
                new_block->size = current->size - size - sizeof(BlockHeader);
                new_block->free = true;
                new_block->next = current->next;
                current->size = size;
                current->next = new_block;
            }
            current->free = false;
            return (void*)((uint8_t*)current + sizeof(BlockHeader));
        }
        current = current->next;
    }
    return nullptr;  // Out of heap
}

void Heap::free(void* ptr) {
    if (!ptr) return;
    BlockHeader* header = (BlockHeader*)((uint8_t*)ptr - sizeof(BlockHeader));
    header->free = true;

    // Coalesce with next block if free
    if (header->next && header->next->free) {
        header->size += sizeof(BlockHeader) + header->next->size;
        header->next = header->next->next;
    }
}

// Custom operators
void* operator new(size_t size) {
    return Heap::allocate(size);
}

void* operator new[](size_t size) {
    return Heap::allocate(size);
}

void operator delete(void* ptr) noexcept {
    Heap::free(ptr);
}

void operator delete[](void* ptr) noexcept {
    Heap::free(ptr);
}

void operator delete(void* ptr, size_t) noexcept {
    Heap::free(ptr);
}

void operator delete[](void* ptr, size_t) noexcept {
    Heap::free(ptr);
}

// Placement new (always available)
inline void* operator new(size_t, void* ptr) noexcept {
    return ptr;
}

Phase 6: Scheduling (Week 13-16)

Goal: Multiple processes with context switching

// scheduler.cpp
static Process* current_process = nullptr;
static Process* ready_queue = nullptr;

Process* Scheduler::createProcess(void (*entry)()) {
    Process* proc = new Process();
    proc->pid = next_pid++;
    proc->state = ProcessState::READY;

    // Allocate kernel stack
    proc->kernel_stack = (uint64_t)Heap::allocate(KERNEL_STACK_SIZE);

    // Allocate page table
    proc->page_table = VMM::createAddressSpace();

    // Set up initial context
    proc->context.rip = (uint64_t)entry;
    proc->context.cs = KERNEL_CODE_SELECTOR;
    proc->context.rflags = 0x202;  // IF set
    proc->context.rsp = proc->kernel_stack + KERNEL_STACK_SIZE;
    proc->context.ss = KERNEL_DATA_SELECTOR;

    // Add to ready queue
    addToReadyQueue(proc);

    return proc;
}

void Scheduler::schedule() {
    if (!ready_queue) return;

    // Round-robin: move current to end of queue
    if (current_process && current_process->state == ProcessState::RUNNING) {
        current_process->state = ProcessState::READY;
        addToReadyQueue(current_process);
    }

    // Get next process
    Process* next = ready_queue;
    ready_queue = ready_queue->next;
    next->next = nullptr;
    next->state = ProcessState::RUNNING;

    Process* prev = current_process;
    current_process = next;

    if (prev != next) {
        // Switch page tables
        asm volatile("mov %0, %%cr3" :: "r"(next->page_table));

        // Context switch (assembly helper)
        context_switch(&prev->context, &next->context);
    }
}

void Scheduler::tick() {
    static int ticks = 0;
    if (++ticks >= TIME_QUANTUM) {
        ticks = 0;
        schedule();
    }
}
; context_switch.asm
global context_switch
; void context_switch(Context* old, Context* new)

context_switch:
    ; Save old context
    mov [rdi + 0], rax
    mov [rdi + 8], rbx
    mov [rdi + 16], rcx
    mov [rdi + 24], rdx
    mov [rdi + 32], rsi
    mov [rdi + 40], rdi
    mov [rdi + 48], rbp
    mov [rdi + 56], r8
    mov [rdi + 64], r9
    mov [rdi + 72], r10
    mov [rdi + 80], r11
    mov [rdi + 88], r12
    mov [rdi + 96], r13
    mov [rdi + 104], r14
    mov [rdi + 112], r15

    ; Save RIP (return address is on stack)
    mov rax, [rsp]
    mov [rdi + 120], rax

    ; Save RSP
    mov [rdi + 144], rsp

    ; Load new context
    mov rax, [rsi + 0]
    mov rbx, [rsi + 8]
    mov rcx, [rsi + 16]
    mov rdx, [rsi + 24]
    ; rsi loaded last (we need it for loads)
    mov rdi, [rsi + 40]
    mov rbp, [rsi + 48]
    mov r8, [rsi + 56]
    mov r9, [rsi + 64]
    mov r10, [rsi + 72]
    mov r11, [rsi + 80]
    mov r12, [rsi + 88]
    mov r13, [rsi + 96]
    mov r14, [rsi + 104]
    mov r15, [rsi + 112]

    ; Load RSP
    mov rsp, [rsi + 144]

    ; Load RIP (push and ret)
    push qword [rsi + 120]

    ; Finally load rsi
    mov rsi, [rsi + 32]

    ret

Phase 7: System Calls (Week 17-18)

Goal: User programs can request kernel services

// syscall.cpp
#define SYSCALL_EXIT   0
#define SYSCALL_WRITE  1
#define SYSCALL_READ   2
#define SYSCALL_FORK   3
#define SYSCALL_EXEC   4
#define SYSCALL_GETPID 5

extern "C" int64_t syscall_handler(uint64_t num, uint64_t arg1,
                                   uint64_t arg2, uint64_t arg3) {
    switch (num) {
        case SYSCALL_EXIT:
            Scheduler::exit((int)arg1);
            return 0;

        case SYSCALL_WRITE: {
            int fd = (int)arg1;
            const char* buf = (const char*)arg2;
            size_t count = (size_t)arg3;

            if (fd == 1) {  // stdout
                for (size_t i = 0; i < count; i++) {
                    Terminal::putchar(buf[i]);
                }
                return count;
            }
            return -1;
        }

        case SYSCALL_FORK:
            return Scheduler::fork();

        case SYSCALL_GETPID:
            return Scheduler::getCurrentPID();

        default:
            return -1;
    }
}

Phase 8: User Space and Shell (Week 19-24)

Goal: User programs run in Ring 3

// user/shell.cpp - Compiled as separate user binary
int main() {
    char buffer[256];
    const char* prompt = "myos:/$ ";

    while (true) {
        write(1, prompt, strlen(prompt));

        int len = read(0, buffer, sizeof(buffer) - 1);
        if (len <= 0) continue;
        buffer[len] = '\0';

        if (strcmp(buffer, "exit\n") == 0) {
            break;
        } else if (strcmp(buffer, "ps\n") == 0) {
            // System call to list processes
            ps();
        } else if (strncmp(buffer, "/bin/", 5) == 0) {
            // Execute program
            int pid = fork();
            if (pid == 0) {
                exec(buffer);
                exit(1);  // exec failed
            } else {
                wait(pid);
            }
        } else {
            write(1, "Unknown command\n", 16);
        }
    }

    return 0;
}

5.11 Key Implementation Decisions

Decision 1: Higher-Half Kernel

Map the kernel to high virtual addresses (e.g., 0xFFFFFFFF80000000 for x86-64). This leaves the lower half for user programs and simplifies memory layout.

Decision 2: Physical Memory Allocator

Use a bitmap for simplicity. Each bit represents one 4KB page. Set bits indicate used pages. Buddy allocator is more complex but better for larger allocations.

Decision 3: Process Representation

Keep process control blocks in kernel heap. Use a singly-linked list for ready queue (simple round-robin). Consider priority queues for more sophisticated scheduling.

Decision 4: Interrupt Handling Strategy

Use a two-phase approach: quick handler in assembly that saves state and calls C++ handler. This keeps interrupt handlers clean and debuggable.


6. Testing Strategy

Unit Testing (Limited in Kernel):

// test.cpp - Early tests before scheduler works
void test_heap() {
    void* p1 = Heap::allocate(100);
    void* p2 = Heap::allocate(200);
    void* p3 = Heap::allocate(300);

    assert(p1 != nullptr);
    assert(p2 != nullptr);
    assert(p3 != nullptr);
    assert(p1 != p2 && p2 != p3);

    Heap::free(p2);
    void* p4 = Heap::allocate(150);
    assert(p4 == p2);  // Reused freed block

    Terminal::println("Heap tests passed!");
}

Integration Testing with QEMU:

# Test boot
qemu-system-x86_64 -kernel kernel.bin -serial stdio 2>&1 | grep "Kernel loaded"

# Test with GDB
qemu-system-x86_64 -kernel kernel.bin -s -S &
gdb -ex "target remote :1234" -ex "break kernel_main" -ex "continue" kernel.bin

# Test filesystem
echo "test content" > initrd/test.txt
./make_initrd initrd/ initrd.img
qemu-system-x86_64 -kernel kernel.bin -initrd initrd.img -serial stdio

Automated Test Script:

#!/bin/bash
# test.sh

TIMEOUT=10
EXPECTED_STRINGS=("Kernel loaded" "GDT initialized" "IDT initialized" "Heap initialized")

OUTPUT=$(timeout $TIMEOUT qemu-system-x86_64 -kernel kernel.bin -nographic -serial mon:stdio 2>&1)

for expected in "${EXPECTED_STRINGS[@]}"; do
    if echo "$OUTPUT" | grep -q "$expected"; then
        echo "PASS: Found '$expected'"
    else
        echo "FAIL: Missing '$expected'"
        exit 1
    fi
done

echo "All tests passed!"

7. Common Pitfalls & Debugging

Problem Symptom Root Cause Fix
Triple fault on boot QEMU resets immediately Invalid GDT, stack overflow, or page fault with no handler Use -d int -no-reboot to see the exception
Page fault in kernel Exception 14 Accessing unmapped virtual address Check page table setup, ensure kernel is mapped
General protection fault Exception 13 Segment selector issue or privilege violation Verify GDT entries and segment registers
No keyboard input Key presses ignored IRQ1 not enabled or PIC not configured Initialize PIC and unmask IRQ1
Scheduler deadlock System hangs All processes blocked or ready queue corrupted Add debug output in scheduler, check queue pointers
Heap corruption Random crashes Buffer overflow or double free Add canary values around allocations
User program crashes Page fault in Ring 3 User page table missing mapping Verify user space is mapped with user flag

Debugging Techniques:

# See all interrupts
qemu-system-x86_64 -d int -no-reboot

# Serial output to terminal
qemu-system-x86_64 -serial stdio

# GDB debugging
qemu-system-x86_64 -s -S -kernel kernel.bin &
gdb kernel.bin -ex "target remote :1234"

# View registers at interrupt
(gdb) info registers

# Examine memory
(gdb) x/16xg 0xFFFFFFFF80100000

8. Extensions & Challenges

After completing the basic kernel:

  1. SMP Support - Boot multiple CPU cores, per-CPU schedulers
  2. ext2 Filesystem - Real filesystem with directories and permissions
  3. Network Stack - Ethernet driver, IP, UDP, TCP
  4. Graphics Mode - VESA framebuffer, basic GUI
  5. Shared Memory - IPC between processes
  6. Signals - Unix-style signal handling
  7. Dynamic Linking - Shared libraries (.so)
  8. UEFI Boot - Modern boot method instead of BIOS/GRUB
  9. Device Tree - For ARM port
  10. Port to RISC-V - Different architecture, same concepts

9. Real-World Connections

This project relates to:

  • Linux Kernel - Many of the same concepts, vastly more complexity
  • Xen/KVM - Hypervisors that run below operating systems
  • RTOS (FreeRTOS, Zephyr) - Real-time kernels for embedded systems
  • Container Runtimes - Use kernel features like namespaces and cgroups
  • Game Console Development - Bare-metal or minimal OS environments

Job Roles Using These Skills:

  • Kernel Developer (Linux, Windows, embedded)
  • Systems Programmer
  • Embedded Software Engineer
  • Security Researcher (kernel exploits)
  • Hypervisor/Virtualization Engineer
  • Firmware Developer

10. Resources

Essential Reading:

Resource URL Purpose
OSDev Wiki wiki.osdev.org Comprehensive OS development reference
Intel Manual Vol 3 intel.com x86 architecture specification
OSDEV Bare Bones wiki.osdev.org/Bare_Bones Minimal kernel tutorial

Video Courses:

  • “Writing a Simple Operating System from Scratch” by Nick Blundell (free PDF)
  • YouTube: “Build an OS” series by various creators

Example Projects to Study:

  • xv6 - MIT’s teaching OS (simple, well-documented)
  • ToaruOS - Hobby OS with full GUI
  • SerenityOS - Modern C++ hobby OS
  • MINIX 3 - Microkernel design

Books:

  • “Operating Systems: Three Easy Pieces” - ostep.org (free online)
  • “The Design of the UNIX Operating System” by Maurice Bach
  • “Linux Kernel Development” by Robert Love

11. Self-Assessment Checklist

Before marking this project complete, verify:

Boot Process:

  • Kernel boots via GRUB without errors
  • Multiboot info structure is parsed correctly
  • Memory map is extracted and used

Memory Management:

  • Physical memory manager tracks free/used pages
  • Paging is enabled with kernel mapped
  • Heap allocator works correctly (no leaks, no corruption)
  • new and delete operators work

Interrupts:

  • CPU exceptions print error info (don’t just hang)
  • Timer interrupt fires at regular intervals
  • Keyboard input is received

Scheduling:

  • Multiple processes can be created
  • Context switching works correctly
  • Round-robin scheduling is fair

User Space:

  • Processes run in Ring 3
  • System calls work
  • User programs can print output

Filesystem:

  • initrd is loaded and parsed
  • Files can be read

12. Submission / Completion Criteria

Your kernel is complete when:

  1. Boot to Shell - Boots from GRUB to interactive shell in < 2 seconds
  2. Run Programs - At least 3 user programs (hello, ps, forktest) work
  3. Stable - No crashes during 10 minutes of interactive use
  4. Documented - README explains build process and design decisions
  5. Clean Code - Organized structure, consistent style, comments for tricky parts

Deliverables:

myos/
├── README.md           # How to build and run
├── DESIGN.md           # Architecture decisions explained
├── Makefile            # Build system
├── kernel.bin          # Compiled kernel
├── myos.iso            # Bootable ISO image
└── src/                # All source code

Build Verification:

# Clean build
make clean && make

# Run in QEMU
make run

# Create bootable ISO
make iso

# Run ISO
qemu-system-x86_64 -cdrom myos.iso

Conclusion

Congratulations on reaching the capstone project of the C++ Mastery path. Building an operating system kernel represents the pinnacle of systems programming. You’re not just using the computer - you’re telling it exactly what to do at the lowest level.

This project synthesizes everything you’ve learned:

  • Memory management from Project 3 (Smart Pointers) and Project 4 (Containers)
  • Concurrency from Project 5 (Thread Pool)
  • Parsing from Project 1 (Calculator), Project 6 (JSON), and Project 10 (Compiler)
  • Networking patterns from Project 7 (TCP Chat) and Project 11 (HTTP Server)
  • Low-level control from Project 2 (Memory-Mapped Files) and Project 8 (Database)
  • Data structures from Project 4 (Generic Containers)
  • Performance mindset from Project 9 (Game Engine)
  • Template mastery from Project 12 (TMP Library)

When you complete this project, you’ll have achieved true C++ mastery. You’ll understand not just how to use the language, but how the entire software stack works - from transistors to high-level applications.

The journey doesn’t end here. Your kernel can grow into a real operating system. Or you can apply this knowledge to kernel modules, embedded systems, security research, or any domain where deep systems understanding matters.

Good luck, and enjoy writing code that runs on bare metal!


“In the beginning, there was the machine, and the machine was without software, and void. And the programmer said, ‘Let there be code,’ and there was code. And the programmer saw the code, and it was good.”