Project 4: Two-Stage Bootloader

Break free from the 512-byte MBR constraint by implementing a multi-stage bootloader, the same architectural pattern used by GRUB, LILO, and Windows Boot Manager.


Quick Reference

Attribute      Value
Difficulty     ★★★☆☆ Advanced
Time Estimate  1-2 weeks
Language       x86 Assembly (NASM) + C
Prerequisites  Project 3 (Real to Protected Mode), basic C knowledge, linker basics
Key Topics     INT 13h disk services, CHS/LBA addressing, memory layout, ABI conventions, linker scripts, cross-compilation

1. Learning Objectives

By completing this project, you will:

  1. Master BIOS disk I/O: Use INT 13h to read sectors from disk, handling both CHS and LBA addressing modes
  2. Understand bootloader architecture: Implement the same multi-stage pattern used by GRUB, LILO, and professional bootloaders
  3. Bridge assembly and C: Call C functions from assembly and vice versa, understanding ABI conventions for bare-metal environments
  4. Write linker scripts: Control memory layout for bare-metal binaries, placing code at exact addresses
  5. Cross-compile for bare metal: Build freestanding code without standard library dependencies
  6. Design robust error handling: Implement retry logic and diagnostic output in space-constrained code
  7. Plan memory layouts: Design memory maps that accommodate bootloader stages, stack, and future kernel

2. Theoretical Foundation

2.1 Core Concepts

The 512-Byte Problem

The Master Boot Record (MBR) is exactly 512 bytes. Of these:

  • 2 bytes are the boot signature (bytes 0x55, 0xAA; read as a little-endian word, 0xAA55)
  • 64 bytes are the partition table (4 entries x 16 bytes)
  • 446 bytes remain for your bootloader code

In 446 bytes, there is no realistic room to:

  • Include a filesystem driver
  • Parse ELF or PE executables
  • Display a boot menu with options
  • Perform memory detection
  • Set up proper protected mode with comprehensive GDT

Every professional bootloader solves this with staged loading:

┌─────────────────────────────────────────────────────────────────────────────┐
│                        MULTI-STAGE BOOTLOADER ARCHITECTURE                   │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ┌──────────────────────────────────────────────────────────────────────┐   │
│  │                         STAGE 1 (MBR)                                 │   │
│  │                         512 bytes max                                 │   │
│  ├──────────────────────────────────────────────────────────────────────┤   │
│  │  Responsibilities:                                                    │   │
│  │  ✓ Set up minimal stack                                              │   │
│  │  ✓ Read Stage 2 from disk (fixed sectors)                           │   │
│  │  ✓ Jump to Stage 2                                                   │   │
│  │                                                                       │   │
│  │  NOT responsible for:                                                │   │
│  │  ✗ Filesystem parsing                                                │   │
│  │  ✗ Protected mode                                                    │   │
│  │  ✗ Memory detection                                                  │   │
│  │  ✗ User interface                                                    │   │
│  └──────────────────────────────────────────────────────────────────────┘   │
│                                    │                                         │
│                                    │ Load via INT 13h                        │
│                                    ▼                                         │
│  ┌──────────────────────────────────────────────────────────────────────┐   │
│  │                         STAGE 2                                       │   │
│  │                      No size limit!                                   │   │
│  ├──────────────────────────────────────────────────────────────────────┤   │
│  │  Responsibilities:                                                    │   │
│  │  ✓ Memory detection (E820)                                           │   │
│  │  ✓ A20 gate enable                                                   │   │
│  │  ✓ GDT setup                                                         │   │
│  │  ✓ Protected mode transition                                         │   │
│  │  ✓ Filesystem driver (FAT, ext2, etc.)                              │   │
│  │  ✓ Kernel loading and parsing                                        │   │
│  │  ✓ Boot menu and configuration                                       │   │
│  │  ✓ Video mode setup                                                  │   │
│  │                                                                       │   │
│  │  Can be written in C with assembly stubs!                            │   │
│  └──────────────────────────────────────────────────────────────────────┘   │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

INT 13h Disk Services

The BIOS provides disk I/O through interrupt 13h. Two main functions:

AH=02h: Read Sectors (CHS addressing)

Input:
  AH = 02h (function: read sectors)
  AL = number of sectors to read (1-128)
  CH = cylinder number (low 8 bits)
  CL = sector number (bits 0-5) + cylinder high bits (bits 6-7)
  DH = head number
  DL = drive number (80h = first HDD, 00h = first floppy)
  ES:BX = destination buffer address

Output:
  CF = 0 on success, 1 on error
  AH = status code (0 = success)
  AL = number of sectors actually read
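
The CH/CL split above is easy to get wrong, so here is a minimal C sketch of the packing. The names chs_regs and pack_chs are hypothetical helpers for illustration, not part of any BIOS or library API; the real Stage 1 does the same bit manipulation in assembly.

```c
#include <stdint.h>

/* Register values for INT 13h AH=02h, packed from a CHS address. */
struct chs_regs {
    uint8_t ch;  /* cylinder bits 0-7 */
    uint8_t cl;  /* sector in bits 0-5, cylinder bits 8-9 in bits 6-7 */
    uint8_t dh;  /* head */
};

/* Hypothetical helper: pack a 10-bit cylinder, a head, and a 1-based
   sector number into the layout the BIOS expects. */
struct chs_regs pack_chs(uint16_t cylinder, uint8_t head, uint8_t sector)
{
    struct chs_regs r;
    r.ch = (uint8_t)(cylinder & 0xFF);
    r.cl = (uint8_t)((sector & 0x3F) | ((cylinder >> 2) & 0xC0));
    r.dh = head;
    return r;
}
```

For cylinder 0, head 0, sector 2 (the first sector of Stage 2) this yields CH=0x00, CL=0x02, DH=0x00, matching the values used in the Stage 1 hints.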

AH=42h: Extended Read (LBA addressing)

Input:
  AH = 42h (function: extended read)
  DL = drive number
  DS:SI = pointer to Disk Address Packet (DAP)

Disk Address Packet structure:
  Offset  Size  Description
  0x00    1     Size of packet (16 bytes)
  0x01    1     Reserved (0)
  0x02    2     Number of sectors to read
  0x04    4     Transfer buffer (offset word first, then segment word)
  0x08    8     Starting LBA (64-bit)

Output:
  CF = 0 on success, 1 on error
  AH = status code

CHS vs LBA Conversion:

LBA = (Cylinder × HeadsPerCylinder + Head) × SectorsPerTrack + (Sector - 1)

Example for standard floppy (18 sectors, 2 heads):
  CHS (0, 0, 2) = (0 × 2 + 0) × 18 + (2 - 1) = LBA 1 (second sector)
  CHS (0, 0, 3) = LBA 2
  CHS (0, 1, 1) = LBA 18 (first sector of second head)
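
The conversion can be sketched in C in both directions. The struct and function names below are assumptions made for this example; the geometry (heads, sectors per track) is passed in rather than queried from the BIOS.

```c
#include <stdint.h>

struct chs {
    uint16_t cylinder;
    uint8_t  head;
    uint8_t  sector;   /* 1-based, as INT 13h expects */
};

/* LBA = (C x heads + H) x sectors_per_track + (S - 1) */
uint32_t chs_to_lba(struct chs a, uint32_t heads, uint32_t spt)
{
    return (a.cylinder * heads + a.head) * spt + (a.sector - 1u);
}

/* Inverse of the formula above. */
struct chs lba_to_chs(uint32_t lba, uint32_t heads, uint32_t spt)
{
    struct chs a;
    a.cylinder = (uint16_t)(lba / (heads * spt));
    a.head     = (uint8_t)((lba / spt) % heads);
    a.sector   = (uint8_t)(lba % spt + 1);
    return a;
}
```

With the floppy geometry from the example (2 heads, 18 sectors per track), lba_to_chs(18, 2, 18) gives cylinder 0, head 1, sector 1, which agrees with the third line of the worked example.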

Memory Map for Two-Stage Loading

┌─────────────────────────────────────────────────────────────────────────────┐
│                    REAL MODE MEMORY MAP (First 1MB)                          │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  Address      Size        Description                                        │
│  ─────────────────────────────────────────────────────────────────────────  │
│  0x00000      1KB         Interrupt Vector Table (IVT) - DO NOT TOUCH       │
│  0x00400      256B        BIOS Data Area (BDA) - DO NOT TOUCH               │
│  0x00500      ~30KB       FREE - Can use for Stage 2                        │
│  0x07C00      512B        Stage 1 (MBR) - BIOS loads here                   │
│  0x07E00      ~607KB      FREE - Can use for Stage 2 or kernel              │
│  0x9FC00      1KB         Extended BIOS Data Area (EBDA)                    │
│  0xA0000      128KB       Video Memory (VGA)                                │
│  0xC0000      32KB        Video BIOS ROM                                    │
│  0xC8000      160KB       Mapped hardware / Option ROMs                     │
│  0xF0000      64KB        System BIOS ROM                                   │
│                                                                              │
│  RECOMMENDED LAYOUT:                                                         │
│  ┌────────────────────────────────────────────────────────────────────────┐ │
│  │  0x00500 - 0x07BFF : Stage 2 loading area (~29KB available)           │ │
│  │  0x07C00 - 0x07DFF : Stage 1 (MBR) code                               │ │
│  │  0x07E00 - 0x0FFFF : Stack (grows down from 0x10000)                  │ │
│  │  0x10000 - 0x9FBFF : Kernel loading area (~576KB available)           │ │
│  └────────────────────────────────────────────────────────────────────────┘ │
│                                                                              │
│  ALTERNATIVE (Stage 2 above Stage 1):                                       │
│  ┌────────────────────────────────────────────────────────────────────────┐ │
│  │  0x07C00 - 0x07DFF : Stage 1 (MBR) code                               │ │
│  │  0x08000 - 0x0FFFF : Stage 2 loading area (32KB)                      │ │
│  │  0x10000 - 0x1FFFF : Stack (64KB, grows down from 0x20000)            │ │
│  │  0x20000 - 0x9FBFF : Kernel loading area (~510KB)                     │ │
│  └────────────────────────────────────────────────────────────────────────┘ │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Calling C from Assembly (System V i386 ABI)

For 32-bit protected mode code, the System V i386 ABI specifies:

┌─────────────────────────────────────────────────────────────────────────────┐
│                      SYSTEM V i386 CALLING CONVENTION                        │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ARGUMENT PASSING (cdecl):                                                   │
│  ┌────────────────────────────────────────────────────────────────────────┐ │
│  │ Arguments pushed right-to-left onto stack                              │ │
│  │                                                                         │ │
│  │ For: void foo(int a, int b, int c)                                     │ │
│  │      Call foo(1, 2, 3)                                                 │ │
│  │                                                                         │ │
│  │ Stack before call:        Stack after CALL:                            │ │
│  │     ...                       ...                                      │ │
│  │     3 (arg c)                 3 (arg c)      [EBP+16]                 │ │
│  │     2 (arg b)                 2 (arg b)      [EBP+12]                 │ │
│  │     1 (arg a)                 1 (arg a)      [EBP+8]                  │ │
│  │     ← ESP                     return addr    [EBP+4]                  │ │
│  │                               saved EBP      [EBP] ← EBP, ESP         │ │
│  └────────────────────────────────────────────────────────────────────────┘ │
│                                                                              │
│  RETURN VALUE:                                                               │
│  • Integer/pointer: EAX (32-bit) or EDX:EAX (64-bit)                       │
│  • Floating point: x87 FPU stack top (ST0)                                  │
│                                                                              │
│  CALLER-SAVED (volatile):                                                    │
│  • EAX, ECX, EDX - function can destroy these                              │
│                                                                              │
│  CALLEE-SAVED (non-volatile):                                               │
│  • EBX, ESI, EDI, EBP - function must preserve these                       │
│                                                                              │
│  STACK CLEANUP:                                                              │
│  • Caller cleans up arguments after call (ADD ESP, n)                       │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

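For 16-bit real mode, the convention depends on the compiler. A compiler that emits true 16-bit cdecl code follows the same principles with 16-bit registers:

  • Arguments on stack via push (right to left)
  • Return in AX (or DX:AX for long)
  • Caller cleans stack

GCC has no true 16-bit x86 target. Its -m16 option emits 32-bit code that runs in real mode behind operand-size prefixes, so it keeps the 32-bit convention shown above: 4-byte stack slots, return value in EAX, and a 32-bit return address on the stack.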

Linker Scripts for Bare Metal

A linker script controls where code and data are placed in memory:

/* stage2.ld - Example linker script for Stage 2 */

OUTPUT_FORMAT(binary)       /* Raw binary, no ELF wrapper */
OUTPUT_ARCH(i386)

ENTRY(_start)               /* Entry point symbol */

SECTIONS
{
    . = 0x8000;             /* Stage 2 loads at 0x8000 */

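    .text : {
        *(.text)            /* All code sections; the entry object must be linked first */
        *(.text.*)          /* Per-function sections such as .text.startup (GCC -O2) */
    }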

    .rodata : {
        *(.rodata)          /* Read-only data (strings, constants) */
        *(.rodata.*)
    }

    .data : {
        *(.data)            /* Initialized data */
    }

    .bss : {
        __bss_start = .;    /* Symbol for BSS start */
        *(.bss)
        *(COMMON)
        __bss_end = .;      /* Symbol for BSS end */
    }

    /DISCARD/ : {
        *(.eh_frame)        /* Discard exception handling (no runtime) */
        *(.comment)
    }
}
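
One consequence of a flat binary is worth spelling out: .bss occupies no bytes on disk, so Stage 1 never loads zeros for it and the memory holds whatever was there before. A common approach is to clear the range between __bss_start and __bss_end early in Stage 2. The sketch below shows the idea; clear_region is a hypothetical helper name, and it takes plain pointers so that it can also be exercised on a host machine.

```c
#include <stdint.h>

/* Zero every byte in [start, end). Writing through a volatile pointer
   keeps the compiler from turning the loop into a call to memset,
   which a freestanding build does not provide. */
void clear_region(uint8_t *start, uint8_t *end)
{
    for (volatile uint8_t *p = start; p < end; p++)
        *p = 0;
}
```

In Stage 2 this would presumably be called as clear_region(__bss_start, __bss_end) after declaring the two linker symbols as extern uint8_t arrays, and before any C code relies on zero-initialized globals.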

2.2 Why This Matters

Practically every production BIOS bootloader uses multi-stage loading. Understanding this pattern is fundamental:

Bootloader  Stage 1                     Stage 2                                  Notes
GRUB        boot.img (446 bytes)        core.img (variable, typically 25-500KB)  Stage 2 contains filesystem drivers
LILO        Boot sector                 Map file sectors                         Stage 2 location recorded at install time
Windows     MBR code                    bootmgr                                  Loaded via the active partition’s boot sector; bootmgr then reads the BCD store
Syslinux    ldlinux.sys (first sector)  ldlinux.sys (rest)                       Designed for FAT filesystems

This pattern appears throughout computing:

  • CPU microcode updates
  • UEFI PEI to DXE transition
  • U-Boot SPL to main U-Boot
  • Embedded bootloader chains

2.3 Historical Context

The 512-byte limit comes from the original IBM PC (1981):

  • The PC’s floppy drive controller could read one sector at a time
  • Sector size was standardized at 512 bytes
  • BIOS was designed to load exactly one sector to 0x7C00
  • The boot signature (bytes 0x55, 0xAA at offsets 510 and 511) marks the sector as bootable

This limit has persisted for 40+ years due to backward compatibility requirements. Even modern UEFI systems can emulate legacy BIOS boot for older operating systems.

2.4 Common Misconceptions

Misconception 1: “Stage 2 must be at a fixed address”
Reality: Stage 2 can be anywhere in free conventional memory. The only requirement is that Stage 1 knows where to load it and that the linker script matches that address.

Misconception 2: “You can’t use C in a bootloader”
Reality: C works well in bootloaders with proper setup: -ffreestanding, -nostdlib, a correct linker script, and an assembly stub to set up the stack before calling C.

Misconception 3: “INT 13h only works with floppies”
Reality: INT 13h works with all drives the BIOS exposes, including hard disks, USB drives, and virtual disks. Extended INT 13h (AH=42h) supports LBA for large drives.

Misconception 4: “The 64 bytes of the partition table are always off limits”
Reality: If you don’t need partitions (e.g., floppy disk or dedicated boot device), you can use those 64 bytes for code. For hard disk compatibility, keep the partition table.


3. Project Specification

3.1 What You Will Build

A complete two-stage bootloader:

  • Stage 1: 512-byte MBR that loads Stage 2 from disk and jumps to it
  • Stage 2: Larger program (written in C with assembly stubs) that initializes the system and prepares for kernel loading

3.2 Functional Requirements

  1. Stage 1 Requirements:
    • Fit entirely within 446 bytes (512 - 64 partition table - 2 signature)
    • Set up a valid stack
    • Read Stage 2 from consecutive disk sectors (sectors 2-N)
    • Implement retry logic for disk read failures (3-5 retries)
    • Display minimal status (‘1’ for success, ‘E’ for error)
    • Jump to Stage 2’s entry point
  2. Stage 2 Requirements:
    • Display “Stage 2 loaded!” message confirming successful loading
    • Query memory using INT 15h/E820 (optional but recommended)
    • Set up GDT for protected mode
    • Enable A20 gate
    • Switch to protected mode
    • Display “Protected mode active!” from 32-bit code
    • Demonstrate C function calls (print a message from C)
  3. Build System Requirements:
    • Makefile that builds both stages
    • Creates a bootable disk image
    • Supports make run to test in QEMU
    • Supports make debug to debug in QEMU+GDB

3.3 Non-Functional Requirements

  • Deterministic: Same build produces identical binary
  • Testable: Works identically in QEMU and on real hardware (USB boot)
  • Educational: Code is well-commented explaining each step
  • Extensible: Stage 2 structure allows easy addition of features

3.4 Example Usage / Output

# Build the bootloader
$ make
nasm -f bin stage1.asm -o stage1.bin
nasm -f elf32 stage2_entry.asm -o stage2_entry.o
i686-elf-gcc -c -ffreestanding -nostdlib -m16 stage2.c -o stage2.o
i686-elf-ld -T stage2.ld stage2_entry.o stage2.o -o stage2.bin
cat stage1.bin stage2.bin > boot.img
# Pad to floppy size for QEMU
dd if=/dev/zero bs=1474560 count=1 of=floppy.img
dd if=boot.img of=floppy.img conv=notrunc

# Run in QEMU
$ make run
qemu-system-i386 -fda floppy.img

# Expected output on QEMU console:
1                           # Stage 1 loaded sector successfully
Stage 1: Loading Stage 2 from sectors 2-10...
Stage 1: Jumping to Stage 2 at 0x8000
Stage 2: Hello from C code!
Stage 2: Initializing protected mode...
[A20 enabled]
[GDT loaded]
[Entering protected mode...]
Stage 2 (32-bit): Protected mode active!
Stage 2 (32-bit): Ready to load kernel.

3.5 Real World Outcome

Upon completion, you will have:

  1. A working two-stage bootloader that follows the same staged architecture as GRUB, demonstrating that you understand professional bootloader design

  2. Skills to extend it with:
    • Filesystem drivers (FAT12/FAT32, ext2)
    • ELF kernel loading
    • Boot menu with configuration
    • Graphics mode initialization
  3. Understanding of bare-metal C - how to run C code without any OS support, which is crucial for embedded systems and OS development

  4. A portfolio piece demonstrating low-level systems expertise

4. Solution Architecture

4.1 High-Level Design

┌─────────────────────────────────────────────────────────────────────────────┐
│                          TWO-STAGE BOOTLOADER                                │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   DISK LAYOUT:                                                               │
│   ┌──────────┬──────────────────────────────────────────────────────────┐   │
│   │ Sector 1 │ Sector 2 │ Sector 3 │ Sector 4 │ ... │ Sector N │ ...   │   │
│   ├──────────┼──────────────────────────────────────────────────────────┤   │
│   │ Stage 1  │              Stage 2 Binary                      │ Free │   │
│   │ (MBR)    │         (loaded to 0x8000)                       │      │   │
│   │ 512 bytes│              Variable size                       │      │   │
│   └──────────┴──────────────────────────────────────────────────────────┘   │
│                                                                              │
│   MEMORY AFTER STAGE 2 LOADS:                                               │
│   ┌──────────────────────────────────────────────────────────────────────┐  │
│   │                                                                       │  │
│   │  0x0000 ┌─────────────────────────────────────────┐                  │  │
│   │         │ IVT + BIOS Data (DO NOT TOUCH)          │                  │  │
│   │  0x0500 ├─────────────────────────────────────────┤                  │  │
│   │         │ (Available memory)                      │                  │  │
│   │         │ Stack grows down from 0x7C00            │                  │  │
│   │  0x7C00 ├─────────────────────────────────────────┤                  │  │
│   │         │ Stage 1 Code (512 bytes)                │                  │  │
│   │  0x7E00 ├─────────────────────────────────────────┤                  │  │
│   │         │ (Unused, 512 bytes)                     │                  │  │
│   │  0x8000 ├─────────────────────────────────────────┤                  │  │
│   │         │                                         │                  │  │
│   │         │ Stage 2 Code + Data                     │                  │  │
│   │         │ (Variable size, e.g., 8KB)              │                  │  │
│   │         │                                         │                  │  │
│   │  0xA000 ├─────────────────────────────────────────┤                  │  │
│   │         │ (Available for kernel)                  │                  │  │
│   │  0x9FC00├─────────────────────────────────────────┤                  │  │
│   │         │ EBDA + Video + ROM (DO NOT TOUCH)       │                  │  │
│   │  0xFFFFF└─────────────────────────────────────────┘                  │  │
│   └──────────────────────────────────────────────────────────────────────┘  │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

4.2 Key Components

Component      File              Purpose
Stage 1        stage1.asm        MBR bootloader, loads Stage 2
Stage 2 Entry  stage2_entry.asm  Assembly stub that sets up and calls C
Stage 2 Main   stage2.c          Main Stage 2 logic in C
Stage 2 Print  print.c           Utility functions for printing (real/protected mode)
Linker Script  stage2.ld         Controls Stage 2 memory layout
Makefile       Makefile          Build automation

4.3 Data Structures

Disk Address Packet (DAP) for Extended Read:

struct disk_address_packet {
    uint8_t  size;           // Size of this structure (16)
    uint8_t  reserved;       // Must be 0
    uint16_t sectors;        // Number of sectors to transfer
    uint16_t offset;         // Transfer buffer offset
    uint16_t segment;        // Transfer buffer segment
    uint64_t lba;            // Starting LBA
} __attribute__((packed));
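
As a rough illustration of how this packet might be filled in, the sketch below builds a DAP for a buffer given as a physical address below 1MB. make_dap is a hypothetical helper, and the struct is repeated only so the example compiles on its own. The compile-time check guards against a compiler adding padding.

```c
#include <stdint.h>

struct disk_address_packet {
    uint8_t  size;
    uint8_t  reserved;
    uint16_t sectors;
    uint16_t offset;
    uint16_t segment;
    uint64_t lba;
} __attribute__((packed));

_Static_assert(sizeof(struct disk_address_packet) == 16,
               "DAP must be exactly 16 bytes");

/* Build a DAP for a buffer at phys_addr (assumed to be below 1MB).
   The address is split into segment = addr / 16, offset = addr % 16. */
struct disk_address_packet make_dap(uint32_t phys_addr, uint16_t sectors,
                                    uint64_t lba)
{
    struct disk_address_packet p;
    p.size     = sizeof p;
    p.reserved = 0;
    p.sectors  = sectors;
    p.offset   = (uint16_t)(phys_addr & 0xF);
    p.segment  = (uint16_t)(phys_addr >> 4);
    p.lba      = lba;
    return p;
}
```

For Stage 2 at 0x8000, make_dap(0x8000, 8, 1) produces segment 0x0800, offset 0x0000, eight sectors, starting at LBA 1 (the sector right after the MBR).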

GDT Entry:

struct gdt_entry {
    uint16_t limit_low;      // Limit bits 0-15
    uint16_t base_low;       // Base bits 0-15
    uint8_t  base_mid;       // Base bits 16-23
    uint8_t  access;         // Access byte
    uint8_t  granularity;    // Limit 16-19 + flags
    uint8_t  base_high;      // Base bits 24-31
} __attribute__((packed));

struct gdt_ptr {
    uint16_t limit;          // Size of GDT - 1
    uint32_t base;           // Linear address of GDT
} __attribute__((packed));
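
To check a descriptor by hand, it can help to view the same eight bytes as a single 64-bit value. The function below is a sketch with a hypothetical name (gdt_encode); note that its flags argument is only the upper 4-bit nibble (G, D/B, L, AVL), not the full granularity byte used in struct gdt_entry.

```c
#include <stdint.h>

/* Pack base, 20-bit limit, access byte, and 4-bit flags into the
   64-bit layout of an x86 segment descriptor. */
uint64_t gdt_encode(uint32_t base, uint32_t limit, uint8_t access,
                    uint8_t flags)
{
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFF);              /* limit 0-15  -> bits 0-15  */
    d |= (uint64_t)(base & 0xFFFFFF) << 16;       /* base 0-23   -> bits 16-39 */
    d |= (uint64_t)access << 40;                  /* access byte -> bits 40-47 */
    d |= (uint64_t)((limit >> 16) & 0xF) << 48;   /* limit 16-19 -> bits 48-51 */
    d |= (uint64_t)(flags & 0xF) << 52;           /* flags       -> bits 52-55 */
    d |= (uint64_t)((base >> 24) & 0xFF) << 56;   /* base 24-31  -> bits 56-63 */
    return d;
}
```

A flat 4GB ring-0 code segment (base 0, limit 0xFFFFF, access 0x9A, flags 0xC) encodes to 0x00CF9A000000FFFF, and the matching data segment (access 0x92) to 0x00CF92000000FFFF. These are handy values to compare against a memory dump in GDB.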

4.4 Algorithm Overview

Stage 1 Algorithm:

1. BIOS loads Stage 1 at 0x7C00
2. Set up segment registers (DS, ES, SS = 0)
3. Set up stack (SP = 0x7C00, grows down to ~0x0500)
4. Display '1' to indicate Stage 1 running
5. Prepare disk read:
   - Target address: 0x8000
   - Start sector: 2 (LBA 1)
   - Sector count: STAGE2_SECTORS (compile-time constant)
6. Call INT 13h AH=02h or AH=42h (with retry loop)
7. If error: display 'E', halt
8. If success: jump to 0x8000
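
Steps 6 to 8 amount to a bounded retry loop with a controller reset between attempts. The control flow can be modeled in C as follows. This is a host-side simulation only: disk_op, read_with_retry, and the fake_disk helpers are names invented for this sketch, standing in for INT 13h AH=02h/42h (read) and AH=00h (reset) in the real Stage 1.

```c
#include <stdbool.h>

typedef bool (*disk_op)(void *ctx);   /* returns true on success */

/* Try the read up to `attempts` times, resetting the disk system
   after each failure. Returns false once every attempt has failed. */
bool read_with_retry(disk_op read, disk_op reset, void *ctx, int attempts)
{
    while (attempts-- > 0) {
        if (read(ctx))
            return true;
        reset(ctx);
    }
    return false;
}

/* Simulated drive that fails a fixed number of reads before working. */
struct fake_disk {
    int failures_left;
    int resets;
};

bool fake_read(void *ctx)
{
    struct fake_disk *d = ctx;
    if (d->failures_left > 0) {
        d->failures_left--;
        return false;
    }
    return true;
}

bool fake_reset(void *ctx)
{
    struct fake_disk *d = ctx;
    d->resets++;
    return true;
}
```

With three attempts, a drive that fails twice still succeeds on the third read after two resets, while a drive that keeps failing ends in the error path (display 'E', halt).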

Stage 2 Algorithm:

1. Entry point at 0x8000 (assembly)
2. Display "Stage 2 loaded" message
3. Set up stack for C calling
4. Call C main() function
5. In C main():
   a. Print welcome message
   b. Query memory with E820 (optional)
   c. Enable A20 gate
   d. Load GDT
   e. Enable protected mode (set CR0.PE)
   f. Far jump to 32-bit code
6. In 32-bit code:
   a. Reload segment registers with 32-bit selectors
   b. Print "Protected mode active!" to video memory
   c. Halt or continue to kernel loading
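
Step 6b relies on the VGA text buffer layout: each screen cell is a 16-bit value with the character in the low byte and the attribute in the high byte, 80 cells per row. A minimal sketch in C, assuming it is compiled as ordinary 32-bit code and run after the mode switch (vga_cell and vga_puts_at are hypothetical names):

```c
#include <stdint.h>

#define VGA_COLS 80

/* One text cell: character in the low byte, attribute in the high byte. */
uint16_t vga_cell(char ch, uint8_t attr)
{
    return (uint16_t)((uint8_t)ch | ((uint16_t)attr << 8));
}

/* Write a NUL-terminated string starting at (row, col). No scrolling
   or bounds checking; the caller keeps the text on screen. */
void vga_puts_at(volatile uint16_t *vga, int row, int col,
                 const char *s, uint8_t attr)
{
    volatile uint16_t *cell = vga + row * VGA_COLS + col;
    while (*s)
        *cell++ = vga_cell(*s++, attr);
}
```

In the bootloader the first argument would be (volatile uint16_t *)0xB8000, for example vga_puts_at((volatile uint16_t *)0xB8000, 0, 0, "Protected mode active!", 0x0F) for white on black. Passing the buffer as a parameter also lets the routine be tested against an ordinary array.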

5. Implementation Guide

5.1 Development Environment Setup

# Required tools
# On Ubuntu/Debian:
sudo apt-get install nasm qemu-system-x86 gcc gdb make

# Cross-compiler for 32-bit freestanding code (recommended):
# Option 1: Use system gcc with -m32
sudo apt-get install gcc-multilib

# Option 2: Build cross-compiler (cleaner, no host dependencies)
# See: https://wiki.osdev.org/GCC_Cross-Compiler

# Verify NASM
nasm --version   # Should be 2.14+

# Verify QEMU
qemu-system-i386 --version   # Should be 4.0+

5.2 Project Structure

two-stage-bootloader/
├── Makefile
├── stage1.asm           # Stage 1 assembly (512 bytes)
├── stage2/
│   ├── entry.asm        # Stage 2 entry point (assembly)
│   ├── main.c           # Main C code
│   ├── print.c          # Print utilities
│   ├── print.h          # Header file
│   ├── gdt.c            # GDT setup
│   ├── a20.c            # A20 gate enable
│   └── linker.ld        # Linker script for Stage 2
├── build/               # Build artifacts
│   ├── stage1.bin
│   ├── stage2.bin
│   └── boot.img
└── README.md

5.3 The Core Question You’re Answering

“How do you overcome the 512-byte constraint of the MBR while maintaining compatibility with BIOS boot requirements, and what architectural patterns enable a bootloader to scale from minimal initialization code to complex features like filesystem drivers and interactive menus?”

This is about understanding the fundamental pattern of staged loading and bootstrap architectures.

5.4 Concepts You Must Understand First

Before writing code, verify you understand these concepts:

  1. INT 13h Function AH=02h Parameters
    • What does each register (AH, AL, CH, CL, DH, DL, ES:BX) contain?
    • How do you convert from LBA to CHS?
    • What does the Carry Flag indicate after the call?
    • Reference: Ralf Brown’s Interrupt List, INT 13h
  2. Segment:Offset Addressing
    • Given ES=0x0800, BX=0x0000, what physical address does ES:BX point to?
    • Why do we set DS=ES=SS=0 and use only offsets?
    • Reference: CS:APP Chapter 3.4, Intel SDM Vol. 1 Chapter 3
  3. Stack Setup in Assembly
    • Why do we need to set up SS and SP before using the stack?
    • What happens if we call a function without a stack?
    • Reference: Low-Level Programming by Zhirkov, Chapter 3
  4. Freestanding C Environment
    • What does -ffreestanding tell the compiler?
    • Why can’t we use printf, malloc, or any libc function?
    • What do we have: basic C syntax, no standard library
    • Reference: GCC Manual, Section 3.4 “Options Controlling C Dialect”
  5. Linker Script Basics
    • What does . = 0x8000; mean?
    • What are .text, .data, .rodata, .bss sections?
    • Why OUTPUT_FORMAT(binary) instead of ELF?
    • Reference: LD Manual, Chapter 3 “Linker Scripts”
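
For the segment:offset item, the real-mode rule is physical = segment × 16 + offset. A small C sketch (real_mode_phys is a hypothetical helper name) makes it easy to experiment with values on a host machine:

```c
#include <stdint.h>

/* Real-mode address translation: shift the segment left by 4 bits
   (multiply by 16) and add the offset. */
uint32_t real_mode_phys(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

Different pairs can name the same byte: 0x07C0:0x0000 and 0x0000:0x7C00 both resolve to 0x7C00. That aliasing is one reason to pick a single convention (all segments zero) and stick with it.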

5.5 Questions to Guide Your Design

Work through these before coding:

  1. Memory Layout:
    • Where will you load Stage 2? (0x8000 is common, but why?)
    • Where will the stack be for Stage 1? For Stage 2?
    • How will you ensure Stage 1 and Stage 2 don’t overlap?
  2. Disk Access:
    • How many sectors is your Stage 2?
    • Will you use CHS or LBA addressing?
    • What drive number will you use (DL is passed by BIOS)?
  3. Error Handling:
    • What happens if disk read fails?
    • How will you indicate success/failure (remember: limited space)?
    • How many times will you retry?
  4. Stage 2 Entry:
    • What does Stage 2 expect when entered (segment registers, stack)?
    • How will the assembly stub hand off to C?
    • What calling convention will you use?
  5. Protected Mode Transition:
    • Where will your GDT live in memory?
    • How will you reload segment registers after mode switch?
    • How will you print in protected mode (no BIOS interrupts)?
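
For the disk access questions, two small calculations are worth doing up front: how many sectors the Stage 2 image occupies, and whether a single CHS read stays inside one track. Some BIOS implementations are reported to reject reads that cross a track boundary, so treating that as a limit is a conservative assumption. Both function names below are made up for this sketch.

```c
#include <stdint.h>

#define SECTOR_SIZE 512u

/* Sectors needed to hold image_bytes, rounding up. */
uint32_t sectors_needed(uint32_t image_bytes)
{
    return (image_bytes + SECTOR_SIZE - 1) / SECTOR_SIZE;
}

/* Nonzero when a read of `count` sectors starting at 1-based sector
   `first` stays within a track of `spt` sectors. */
int fits_in_track(uint32_t first, uint32_t count, uint32_t spt)
{
    return first >= 1 && count >= 1 && first + count - 1 <= spt;
}
```

On a 1.44MB floppy (18 sectors per track), a read starting at sector 2 can cover at most 17 sectors this way, which is about 8.5KB of Stage 2. A larger image would need several reads or LBA addressing.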

5.6 Thinking Exercise

Before writing any code, trace through this scenario on paper:

  1. BIOS loads Stage 1 at 0x7C00. Draw the memory map showing:
    • Where is Stage 1 code?
    • Where should the stack pointer be set?
    • What are the initial values of DS, ES, SS, CS?
  2. Stage 1 loads Stage 2. Show:
    • The INT 13h register setup for reading 8 sectors to 0x8000
    • The memory map after loading
    • The instruction that transfers control to Stage 2
  3. Stage 2 runs and enters protected mode. Diagram:
    • Where is the GDT in memory?
    • What are the three GDT entries (null, code, data)?
    • What are the segment selector values after protected mode?
  4. Stage 2 writes to video memory in protected mode. Explain:
    • Why can’t we use INT 10h anymore?
    • What is the physical address of video memory?
    • How do we write a character with attribute to the screen?

5.7 Hints in Layers

Hint 1: Stage 1 Skeleton

Start with this Stage 1 structure:

[BITS 16]
[ORG 0x7C00]

start:
    ; Disable interrupts during setup
    cli

    ; Set up segments
    xor ax, ax
    mov ds, ax
    mov es, ax
    mov ss, ax
    mov sp, 0x7C00    ; Stack grows down from here

    ; Re-enable interrupts
    sti

    ; Save drive number (BIOS passes it in DL)
    mov [boot_drive], dl

    ; TODO: Print '1' to show we're running
    ; TODO: Load Stage 2
    ; TODO: Jump to Stage 2

    ; Error handling
    jmp error

error:
    mov ah, 0x0E        ; BIOS teletype output
    mov al, 'E'
    int 0x10
.hang:
    hlt                 ; An interrupt wakes the CPU, so loop back to HLT
    jmp .hang

boot_drive: db 0

; Pad to 510 bytes
times 510-($-$$) db 0

; Boot signature
dw 0xAA55

Hint 2: Reading Sectors with INT 13h

For reading sectors using CHS (simpler for small disks):

load_stage2:
    mov ah, 0x02        ; Function: read sectors
    mov al, STAGE2_SECTORS  ; Number of sectors
    mov ch, 0           ; Cylinder 0
    mov cl, 2           ; Sector 2 (1-indexed, sector 1 is MBR)
    mov dh, 0           ; Head 0
    mov dl, [boot_drive] ; Drive number

    mov bx, 0x8000      ; ES:BX = destination
    ; ES is already 0 from setup

    int 0x13
    jc disk_error       ; Jump if carry set (error)

    ; Success - jump to Stage 2
    jmp 0x0000:0x8000

disk_error:
    ; Retry logic here
    ; ...

For retry logic:

    mov si, 3           ; 3 retries
.retry:
    ; ... disk read code ...
    int 0x13
    jnc .success        ; No carry = success

    ; Reset disk system
    xor ax, ax
    int 0x13

    dec si
    jnz .retry

    jmp error           ; All retries failed

.success:
    jmp 0x0000:0x8000

Hint 3: Stage 2 Assembly Entry

The Stage 2 entry point needs to set up for C:

[BITS 16]
[GLOBAL _start]
[EXTERN main]         ; C function we'll call

section .text
_start:
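    ; We're at 0x8000, loaded by Stage 1

    ; Set up segments and stack before touching data or calling anything
    xor ax, ax
    mov ds, ax
    mov es, ax
    mov ss, ax
    mov esp, 0x7C00     ; Grows down below Stage 1, clear of Stage 2 at 0x8000
                        ; (full ESP: GCC -m16 code addresses the stack via ESP)

    ; Print message
    mov si, msg_loaded
    call print_string

    ; Call C code. This assumes the C objects are built with gcc -m16,
    ; whose functions return with a 32-bit RET, so push a 32-bit return address.
    call dword main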

    ; main() returned - halt
    cli
    hlt
    jmp $

print_string:
    ; SI = string pointer
.loop:
    lodsb
    or al, al
    jz .done
    mov ah, 0x0E
    int 0x10
    jmp .loop
.done:
    ret

section .rodata
msg_loaded: db "Stage 2 loaded!", 13, 10, 0

Hint 4: Minimal GDT for Protected Mode

A minimal GDT needs three entries:

// gdt.c
#include <stdint.h>   // uint8_t, uint16_t, uint32_t (available in freestanding builds)

struct gdt_entry {
    uint16_t limit_low;
    uint16_t base_low;
    uint8_t  base_mid;
    uint8_t  access;
    uint8_t  granularity;
    uint8_t  base_high;
} __attribute__((packed));

struct gdt_ptr {
    uint16_t limit;
    uint32_t base;
} __attribute__((packed));

// GDT with 3 entries: null, code, data
struct gdt_entry gdt[3];
struct gdt_ptr gdtp;

void gdt_set_entry(int i, uint32_t base, uint32_t limit,
                   uint8_t access, uint8_t gran) {
    gdt[i].limit_low = limit & 0xFFFF;
    gdt[i].base_low = base & 0xFFFF;
    gdt[i].base_mid = (base >> 16) & 0xFF;
    gdt[i].access = access;
    gdt[i].granularity = ((limit >> 16) & 0x0F) | (gran & 0xF0);
    gdt[i].base_high = (base >> 24) & 0xFF;
}

void gdt_init(void) {
    // Null descriptor (required)
    gdt_set_entry(0, 0, 0, 0, 0);

    // Code segment: base=0, limit=4GB, 32-bit, ring 0
    // Access: present(1) + ring 0(00) + code/data(1) + exec(1) + r/w(1) = 0x9A
    // Granularity: 4KB pages(1) + 32-bit(1) + reserved(0) + limit high = 0xCF
    gdt_set_entry(1, 0, 0xFFFFF, 0x9A, 0xCF);

    // Data segment: base=0, limit=4GB, 32-bit, ring 0
    // Access: same as code but not executable = 0x92
    gdt_set_entry(2, 0, 0xFFFFF, 0x92, 0xCF);

    // Set up GDTR
    gdtp.limit = sizeof(gdt) - 1;
    gdtp.base = (uint32_t)&gdt;
}

Hint 5: Switching to Protected Mode

The actual mode switch requires assembly:

[BITS 16]
[EXTERN gdtp]               ; GDT pointer filled in by gdt_init() in gdt.c
switch_to_protected:
    cli                     ; Disable interrupts

    lgdt [gdtp]             ; Load GDT register

    mov eax, cr0
    or eax, 1               ; Set PE (Protection Enable) bit
    mov cr0, eax

    jmp 0x08:protected_mode ; Far jump to code segment (selector 0x08)
                            ; This also flushes the prefetch queue

[BITS 32]
protected_mode:
    ; Reload data segments with selector 0x10 (data segment)
    mov ax, 0x10
    mov ds, ax
    mov es, ax
    mov fs, ax
    mov gs, ax
    mov ss, ax

    mov esp, 0x90000        ; Set up 32-bit stack

    ; Now we're in protected mode!
    ; Write directly to video memory at 0xB8000
    mov byte [0xB8000], 'P'
    mov byte [0xB8001], 0x0F  ; White on black

    ; ... continue execution ...

    cli
    hlt

Hint 6: Makefile for Building

# Cross-compiler (or use gcc -m32)
CC = i686-elf-gcc
LD = i686-elf-ld

CFLAGS = -ffreestanding -nostdlib -O2 -Wall -Wextra

# If you don't have a cross-compiler, use the host toolchain instead.
# These lines must come after the CFLAGS assignment above:
# CC = gcc
# LD = ld -m elf_i386
# CFLAGS += -m32 -fno-pie -fno-stack-protector

# Make sure the output directory exists
$(shell mkdir -p build)

# Default target: plain `make` builds the disk image
all: build/boot.img

# Stage 1
build/stage1.bin: stage1.asm
	nasm -f bin stage1.asm -o build/stage1.bin

# Stage 2 object files
build/entry.o: stage2/entry.asm
	nasm -f elf32 stage2/entry.asm -o build/entry.o

build/main.o: stage2/main.c
	$(CC) $(CFLAGS) -c stage2/main.c -o build/main.o

build/print.o: stage2/print.c
	$(CC) $(CFLAGS) -c stage2/print.c -o build/print.o

# Stage 2 binary
build/stage2.bin: build/entry.o build/main.o build/print.o stage2/linker.ld
	$(LD) -T stage2/linker.ld build/entry.o build/main.o build/print.o -o build/stage2.bin

# Final disk image
build/boot.img: build/stage1.bin build/stage2.bin
	cat build/stage1.bin build/stage2.bin > build/boot.img
	# Pad to a full 1.44 MB floppy image so QEMU detects the geometry
	dd if=/dev/zero of=build/floppy.img bs=1474560 count=1
	dd if=build/boot.img of=build/floppy.img conv=notrunc

# QEMU targets
run: build/boot.img
	qemu-system-i386 -fda build/floppy.img

debug: build/boot.img
	qemu-system-i386 -fda build/floppy.img -s -S &
	gdb -ex "target remote localhost:1234" -ex "set architecture i8086"

clean:
	rm -f build/*.bin build/*.o build/*.img

.PHONY: all run debug clean

5.8 The Interview Questions They’ll Ask

These questions test understanding of the concepts in this project:

  1. “Why is the MBR limited to 512 bytes? Could we change this?”

    Good answer: The 512-byte limit comes from the original PC’s floppy controller reading one sector. The BIOS boot process loads exactly one sector to 0x7C00. We can’t change this for legacy BIOS because the firmware is hardcoded. UEFI solves this with the ESP and UEFI applications.

  2. “Explain what happens from power-on to your Stage 2 code running.”

    Expected flow: PSU → CPU reset → execute at 0xFFFFFFF0 → jump to BIOS → POST → enumerate devices → read MBR to 0x7C00 → verify 0xAA55 signature → jump to 0x7C00 (Stage 1) → Stage 1 reads sectors to 0x8000 → jump to 0x8000 (Stage 2).

  3. “What’s the difference between CHS and LBA addressing? Which would you use?”

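    Answer: CHS addresses a sector by cylinder, head, and sector number (3D geometry) and is limited to about 8.4GB through INT 13h. LBA treats the disk as a flat array of sectors. Use LBA (INT 13h AH=42h) when available (check with AH=41h) because it’s simpler and supports larger disks, and fall back to CHS on BIOSes without the extensions.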

  4. “How would you debug if Stage 2 doesn’t load correctly?”

    Answer: (1) Print a character before/after disk read to identify where it fails. (2) Verify sector count and destination address. (3) Use QEMU with -d int to log interrupts. (4) Use GDB with QEMU -s -S to step through. (5) Check CF and AH after INT 13h for error codes.

  5. “Why can’t you call printf() in your Stage 2 C code?”

    Answer: printf() is part of the C standard library, which requires an OS to handle output. In bare-metal, there’s no OS, no libc. We compile with -ffreestanding -nostdlib. We implement our own print function using direct hardware access (INT 10h in real mode, video memory in protected mode).

  6. “What does a linker script do and why do you need one for bare-metal?”

    Answer: Linker scripts control memory layout: where code/data sections go, the entry point, output format. For bare-metal, we need to: (1) place code at the exact load address (e.g., 0x8000), (2) output raw binary (not ELF) so the CPU can execute it directly, (3) define symbols for runtime (like BSS boundaries).

  7. “How do you ensure your C code doesn’t use features that require an OS?”

    Answer: Use -ffreestanding (tells the compiler not to assume a hosted environment with a standard library) and -nostdlib (don’t link libc or the startup files). Avoid: dynamic memory (malloc), floating point (may need FPU init), global constructors, standard headers except freestanding ones (such as stdint.h, stddef.h, stdarg.h, stdbool.h).

  8. “What happens if you forget to set up the stack before calling C code?”

    Answer: Undefined behavior. C assumes a valid stack for: function parameters, local variables, return addresses, saved registers. Without a stack, PUSH/POP/CALL/RET corrupt random memory. Could overwrite interrupt vectors, BIOS data, or your own code.
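
For question 3, the relationship between the two addressing schemes is plain integer arithmetic. The sketch below assumes a fixed geometry passed in by the caller (heads per cylinder and sectors per track); `struct chs`, `lba_to_chs()`, and `chs_to_lba()` are illustrative names rather than part of the project code.

```c
#include <stdint.h>

/* Illustrative CHS triple. Sector numbers start at 1, cylinders and heads at 0. */
struct chs {
    uint16_t cylinder;
    uint8_t  head;
    uint8_t  sector;
};

/* Convert a flat sector index into cylinder/head/sector for a given geometry. */
static struct chs lba_to_chs(uint32_t lba, uint8_t heads, uint8_t sectors_per_track) {
    struct chs r;
    r.cylinder = (uint16_t)(lba / ((uint32_t)heads * sectors_per_track));
    r.head     = (uint8_t)((lba / sectors_per_track) % heads);
    r.sector   = (uint8_t)((lba % sectors_per_track) + 1);
    return r;
}

/* Inverse mapping, useful for checking the conversion. */
static uint32_t chs_to_lba(struct chs a, uint8_t heads, uint8_t sectors_per_track) {
    return ((uint32_t)a.cylinder * heads + a.head) * sectors_per_track
           + (uint32_t)(a.sector - 1);
}
```

On a 1.44MB floppy (2 heads, 18 sectors per track), LBA 0 is the MBR at cylinder 0, head 0, sector 1, and LBA 1 (cylinder 0, head 0, sector 2) is the sector that follows it on disk.
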

5.9 Books That Will Help

| Topic | Book | Specific Chapters/Pages |
|---|---|---|
| x86 Assembly Fundamentals | “Low-Level Programming” by Igor Zhirkov | Chapters 1-3 (Assembly basics), Chapter 8 (Disk I/O) |
| Disk I/O and BIOS | CS:APP (3rd ed.) by Bryant & O’Hallaron | Chapter 6.1 (Storage Technologies) pp. 598-612 |
| Linking and Loading | “Low-Level Programming” by Igor Zhirkov | Chapter 6 (Tool Chain) pp. 201-234 |
| Linker Scripts | GNU LD Manual | Chapter 3 (Linker Scripts) |
| x86 Protected Mode | Intel SDM Vol. 3A | Chapter 2 (System Architecture Overview) |
| Bootloader Architecture | “Operating Systems: Three Easy Pieces” | Appendix on Boot (available online free) |
| GDT and Segmentation | Intel SDM Vol. 3A | Chapter 3.4 (Logical and Linear Addresses) |
| BIOS Interrupts | Ralf Brown’s Interrupt List | INT 13h (Disk), INT 10h (Video) |

5.10 Implementation Phases

Phase 1: Basic Stage 1 (Days 1-3)

Goals:

  • Stage 1 boots and prints a character
  • Basic disk read code (even if hardcoded)

Tasks:

  1. Write minimal Stage 1 that boots and prints ‘1’
  2. Add INT 13h read code for a single sector
  3. Test that the read code returns successfully

Checkpoint: QEMU shows ‘1’ on screen and doesn’t immediately crash.

Phase 2: Load Stage 2 (Days 4-5)

Goals:

  • Stage 1 successfully loads multiple sectors
  • Jumps to Stage 2 entry point

Tasks:

  1. Extend Stage 1 to read N sectors
  2. Add retry logic for robustness
  3. Create minimal Stage 2 assembly that prints “Stage 2 loaded!”
  4. Verify Stage 2 executes

Checkpoint: QEMU shows the “Stage 2 loaded!” message from Stage 2 code.

Phase 3: Add C Support (Days 6-8)

Goals:

  • Stage 2 can call C functions
  • Basic print library works

Tasks:

  1. Write linker script for Stage 2
  2. Create assembly entry point that calls C main()
  3. Implement print functions in C
  4. Test C functions work

Checkpoint: Messages from C code appear on screen.

Phase 4: Protected Mode (Days 9-12)

Goals:

  • GDT setup in C
  • A20 enable
  • Mode switch works

Tasks:

  1. Implement GDT setup function in C
  2. Implement A20 enable (fast A20 first, then keyboard controller fallback)
  3. Add assembly code for actual mode switch
  4. Verify protected mode works (write to video memory at 0xB8000)

Checkpoint: “Protected mode active!” appears on screen via direct video memory access.

Phase 5: Polish (Days 13-14)

Goals:

  • Clean code, good documentation
  • Works on real hardware (USB boot)

Tasks:

  1. Add comments explaining each step
  2. Test on real hardware if possible
  3. Add memory detection (E820) as extension
  4. Write README

Checkpoint: Clean, documented, working bootloader.

5.11 Key Implementation Decisions

| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Stage 2 address | 0x1000, 0x8000, 0x10000 | 0x8000 | Above Stage 1, leaves room for stack below, plenty of room above |
| Disk addressing | CHS vs LBA | LBA with CHS fallback | LBA is simpler, but CHS needed for old BIOSes |
| GDT location | In Stage 2 .data vs dedicated area | In Stage 2 .data | Simpler, linker handles placement |
| Stack location | Below Stage 1 vs above Stage 2 | Below Stage 1 (0x7C00 down) for real mode | Traditional, avoids overwriting loaded code |
| A20 method | Fast A20 vs keyboard controller | Fast A20 first, KC fallback | Fast A20 works on most systems, faster |
| C compiler | Native gcc -m32 vs cross-compiler | Cross-compiler if possible | Cleaner, avoids host library leakage |
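
Address and stack decisions like these are easy to get subtly wrong, so it can be worth writing the memory map down as data and checking it for overlaps on the host. The sketch below is only an illustration: the region sizes (a 32 KiB budget for Stage 2 and a 64 KiB protected-mode stack) are assumptions, and `struct region`, `example_map`, and `first_overlap()` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry of a memory map: [start, end) in physical addresses. */
struct region {
    const char *name;
    uint32_t start;
    uint32_t end;   /* exclusive */
};

/* Example layout following the recommendations above; sizes are assumptions. */
static const struct region example_map[] = {
    { "IVT + BIOS data area", 0x00000, 0x00500 },
    { "real-mode stack",      0x00500, 0x07C00 },  /* grows down from 0x7C00 */
    { "Stage 1 (MBR)",        0x07C00, 0x07E00 },
    { "Stage 2 (32 KiB max)", 0x08000, 0x10000 },
    { "protected-mode stack", 0x80000, 0x90000 },  /* grows down from 0x90000 */
};

/* Two half-open ranges overlap when each starts before the other ends. */
static int regions_overlap(const struct region *a, const struct region *b) {
    return a->start < b->end && b->start < a->end;
}

/* Return the index of the first region that overlaps an earlier one, or -1. */
static int first_overlap(const struct region *map, size_t count) {
    for (size_t i = 1; i < count; i++) {
        for (size_t j = 0; j < i; j++) {
            if (regions_overlap(&map[i], &map[j])) {
                return (int)i;
            }
        }
    }
    return -1;
}
```

A check like this would flag, for example, a real-mode stack topping out at 0x9000 when Stage 2 is loaded at 0x8000 and grows past a few kilobytes.
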

6. Testing Strategy

Test Categories

| Category | What to Test | How to Test |
|---|---|---|
| Stage 1 Load | Stage 1 boots | Character ‘1’ appears |
| Disk Read | Sectors read correctly | Stage 2 code runs |
| Retry Logic | Handle transient failures | Difficult - verify code path exists |
| C Integration | C functions callable | Message from C appears |
| GDT | GDT structure correct | Protected mode works |
| Mode Switch | Enter protected mode | Video memory write works |
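
Retry logic is hard to exercise in QEMU because the emulated disk rarely fails. One way to gain confidence in the control flow is to model it in C on the host with an injectable read function. This is a sketch of the idea only: the real Stage 1 loop is written in assembly, and `read_with_retry()`, the callback types, and the `flaky_disk` test double are all hypothetical names.

```c
#include <stdint.h>

/* Hypothetical callbacks: read returns an INT 13h style status (0 = success). */
typedef uint8_t (*sector_read_fn)(uint32_t lba, void *dest, void *ctx);
typedef void (*disk_reset_fn)(void *ctx);

/* Try a read up to max_attempts times, resetting the disk after each failure.
 * Returns 0 on success, otherwise the status of the last failed attempt
 * (0xFF if max_attempts was not positive and no read was attempted). */
static uint8_t read_with_retry(sector_read_fn read, disk_reset_fn reset, void *ctx,
                               uint32_t lba, void *dest, int max_attempts) {
    uint8_t status = 0xFF;
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        status = read(lba, dest, ctx);
        if (status == 0) {
            return 0;
        }
        reset(ctx);  /* corresponds to INT 13h AH=00h in the real loader */
    }
    return status;
}

/* Test double: fails a fixed number of times, then succeeds. */
struct flaky_disk {
    int failures_left;
    int reads;
    int resets;
};

static uint8_t flaky_read(uint32_t lba, void *dest, void *ctx) {
    struct flaky_disk *disk = ctx;
    (void)lba;
    (void)dest;
    disk->reads++;
    if (disk->failures_left > 0) {
        disk->failures_left--;
        return 0x80;  /* timeout / drive not ready */
    }
    return 0;
}

static void flaky_reset(void *ctx) {
    struct flaky_disk *disk = ctx;
    disk->resets++;
}
```

With a disk that fails twice, three attempts should succeed after two resets; with a disk that keeps failing, the caller gets the last status back and can print it.
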

Critical Test Cases

  1. Basic Boot Test
    make run
    # Expected: '1', "Stage 2 loaded!", eventually protected mode message
    
  2. Debug Session
    make debug
    # In GDB:
    (gdb) break *0x7c00    # Break at Stage 1 entry
    (gdb) continue
    (gdb) x/20i $eip       # View Stage 1 code
    (gdb) break *0x8000    # Break at Stage 2 entry
    (gdb) continue
    (gdb) x/20i $eip       # View Stage 2 code
    
  3. Memory Verification
    # In QEMU monitor (Ctrl+Alt+2):
    (qemu) xp /16xb 0x7c00  # Verify Stage 1 at 0x7C00
    (qemu) xp /16xb 0x8000  # Verify Stage 2 at 0x8000
    (qemu) info registers   # Check segment registers
    
  4. GDT Verification
    # After the GDT is loaded, in the QEMU monitor:
    (qemu) info registers       # The GDT= line shows the GDTR base and limit
    (qemu) xp /24xb <gdtr_base>  # View the three 8-byte GDT entries
    

Verifying Success

  • Stage 1 displays ‘1’ character
  • “Stage 2 loaded!” message appears
  • C code message appears
  • A20 enable message appears
  • “Protected mode active!” in video memory
  • No crashes or triple faults
  • Works in QEMU with both -fda and -hda

7. Common Pitfalls & Debugging

Pitfall 1: Stage 2 Not Loading (Nothing After ‘1’)

Symptoms: Stage 1 prints ‘1’ but nothing else happens.

Possible Causes:

  1. Wrong sector number (remember: sectors are 1-indexed for CHS)
  2. Wrong destination address in ES:BX
  3. INT 13h returning error (check CF and AH)
  4. Stage 2 binary not actually on disk (build issue)

Debug Steps:

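; Add after INT 13h:
jc .error        ; Check carry flag (set on failure)
cmp ah, 0        ; Check status
jne .error
; If we reach here, read succeeded
jmp .read_ok     ; Skip the error handler

.error:
    mov al, ah   ; Get error code
    add al, '0'  ; Convert to ASCII (readable digit only for codes 0-9)
    mov ah, 0x0E
    int 0x10     ; Print error code
    jmp $        ; Stop here instead of running code that was never loaded

.read_ok:
    ; Continue with the jump to Stage 2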

Fix: Verify disk image has Stage 2 at correct sector, verify ES:BX correct.


Pitfall 2: Triple Fault on Jump to Stage 2

Symptoms: QEMU resets or freezes after Stage 1 completes.

Possible Causes:

  1. Stage 2 not at expected address in memory
  2. Jump address doesn’t match load address
  3. Stage 2 code corrupted or not built correctly

Debug Steps:

# In QEMU monitor:
(qemu) xp /16xb 0x8000   # Should show Stage 2 code
# If all zeros or 0xAA, Stage 2 didn't load

# With GDB:
(gdb) break *0x8000
(gdb) continue
# If breakpoint never hits, jump instruction is wrong

Fix: Ensure linker script sets origin to match load address (0x8000). Ensure far jump uses correct segment.


Pitfall 3: Protected Mode Triple Fault

Symptoms: System resets immediately after enabling protected mode.

Possible Causes:

  1. GDT entries incorrect (bad access bytes or granularity)
  2. GDTR not loaded correctly
  3. Far jump to wrong selector
  4. Interrupts enabled during switch (a hardware IRQ such as the timer arrives with no valid protected-mode IDT and escalates to a triple fault)

Debug Steps:

// Print GDT base and limit before loading
void debug_gdt(void) {
    print_hex((uint32_t)&gdt);      // Should be in low memory
    print_hex(sizeof(gdt) - 1);     // Should be 23 (3 entries * 8 - 1)
}

Fix:

  1. Verify GDT entry format matches Intel SDM exactly
  2. Ensure CLI before mode switch
  3. Far jump selector should be 0x08 (offset to code segment in GDT)

Pitfall 4: C Code Crashes or Acts Strangely

Symptoms: C function calls produce garbage output or crash.

Possible Causes:

  1. Stack not set up correctly
  2. Wrong calling convention
  3. C code uses features requiring libc (stack protector, etc.)
  4. Segments not set up for C’s assumptions

Debug Steps:

# Disassemble C code to verify it's reasonable:
objdump -d build/main.o

# Check for unwanted libc references (use the object files; the raw stage2.bin has no symbol table):
nm -u build/main.o build/print.o
# Should list only symbols defined in your other Stage 2 files

Fix:

  1. Add -fno-stack-protector to CFLAGS
  2. Ensure DS, ES, SS all point to data segment (selector 0x10 in protected mode)
  3. Set ESP to valid stack address before calling C

Pitfall 5: Linker Errors or Wrong Binary Output

Symptoms: Stage 2 binary is empty, wrong size, or has ELF headers.

Possible Causes:

  1. Linker script not used
  2. OUTPUT_FORMAT not binary
  3. Entry point symbol not found

Debug Steps:

# Check Stage 2 binary:
ls -la build/stage2.bin     # Should be reasonable size (1KB+)
file build/stage2.bin       # Should say "data" not "ELF"
hexdump -C build/stage2.bin | head  # Should start with code, not 0x7F ELF

Fix:

  1. Ensure -T linker.ld in link command
  2. Ensure linker script has OUTPUT_FORMAT(binary)
  3. Ensure entry symbol matches ENTRY() in linker script

Pitfall 6: A20 Gate Not Enabled

Symptoms: Memory above 1MB wraps to lower memory.

Possible Causes:

  1. A20 enable code not run
  2. Fast A20 method doesn’t work on this system
  3. Keyboard controller method has bug

Debug Steps:

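// Test A20 (run in 32-bit protected mode with flat segments).
// With A20 disabled, address bit 20 is forced to 0, so 0x100500 aliases 0x000500.
// 0x500 is free conventional memory; volatile keeps the compiler from caching reads.
volatile uint8_t *high = (volatile uint8_t *)0x00100500;  // 1MB + 0x500
volatile uint8_t *low  = (volatile uint8_t *)0x00000500;  // Same offset below 1MB

*low = 0x00;
*high = 0xFF;

if (*low == 0xFF) {
    print("A20 DISABLED - memory wraps!\n");
} else {
    print("A20 enabled\n");
}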
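
The wrap-around that this test detects can be modelled in one line: with the gate disabled, address bit 20 is forced to zero before the address reaches memory. The function below is a host-side illustration only, and `a20_effective_address()` is a hypothetical name.

```c
#include <stdint.h>

/* Physical address that memory actually sees for a given CPU address.
 * With A20 disabled, bit 20 is masked off, so every odd megabyte
 * aliases the even megabyte directly below it. */
static uint32_t a20_effective_address(uint32_t addr, int a20_enabled) {
    if (a20_enabled) {
        return addr;
    }
    return addr & ~(UINT32_C(1) << 20);
}
```

This also shows why a test must compare an address in an odd megabyte with its alias 1MB lower: addresses in even megabytes (for example 0x200000) are unaffected by the gate.
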

Fix: Implement both fast A20 and keyboard controller fallback:

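void enable_a20(void) {
    // Fast A20 (System Control Port A): set bit 1 to enable A20.
    // Keep bit 0 clear, because writing 1 to it requests a fast CPU reset.
    outb(0x92, (inb(0x92) | 0x02) & ~0x01);

    // If that didn't work, use keyboard controller
    // ... (more complex, see OSDev wiki)
}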

8. Extensions & Challenges

After completing the basic project, try these extensions:

Extension 1: LBA Addressing

Add support for LBA addressing using INT 13h AH=42h. First check for LBA support with AH=41h.
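
The extended read takes its parameters from a 16-byte Disk Address Packet that DS:SI points to, with the drive number in DL. The sketch below lays that packet out as a C struct so the field order is easy to see; `struct disk_address_packet` and `dap_make()` are illustrative names, and in Stage 1 the same bytes would normally be declared directly in assembly.

```c
#include <stdint.h>

/* Disk Address Packet for INT 13h AH=42h (16-byte form). */
struct disk_address_packet {
    uint8_t  size;         /* size of this packet in bytes: 0x10 */
    uint8_t  reserved;     /* must be 0 */
    uint16_t count;        /* number of sectors to transfer */
    uint16_t buf_offset;   /* destination buffer offset */
    uint16_t buf_segment;  /* destination buffer segment */
    uint64_t lba;          /* starting sector, counted from 0 */
} __attribute__((packed));

/* Build a packet for a destination given as a linear address below 1MB.
 * The address is split so that segment * 16 + offset equals dest_linear. */
static struct disk_address_packet dap_make(uint64_t lba, uint16_t count,
                                           uint32_t dest_linear) {
    struct disk_address_packet p;
    p.size        = (uint8_t)sizeof p;
    p.reserved    = 0;
    p.count       = count;
    p.buf_segment = (uint16_t)(dest_linear >> 4);
    p.buf_offset  = (uint16_t)(dest_linear & 0xF);
    p.lba         = lba;
    return p;
}
```

For example, reading 16 sectors starting at LBA 1 to 0x8000 gives segment 0x0800 and offset 0. The AH=41h support check is called with BX=0x55AA and reports support by clearing CF and returning BX=0xAA55.
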

Extension 2: Memory Detection with E820

Implement INT 15h/E820 memory detection in Stage 2 and display available memory regions.
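
Each E820 call (EAX=0xE820, EDX='SMAP', ES:DI pointing at a buffer, EBX carrying the continuation value) fills in one 20-byte entry. Once the entries are collected, summarizing them is ordinary C. The sketch below assumes the entries have already been gathered into an array; `struct e820_entry` follows the standard layout, while `e820_usable_bytes()` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry as returned by INT 15h, EAX=0xE820 (basic 20-byte form). */
struct e820_entry {
    uint64_t base;    /* start of the region */
    uint64_t length;  /* size of the region in bytes */
    uint32_t type;    /* 1 = usable RAM, 2 = reserved, other values exist */
} __attribute__((packed));

#define E820_TYPE_USABLE 1u

/* Sum the length of all regions the firmware reports as usable RAM. */
static uint64_t e820_usable_bytes(const struct e820_entry *map, size_t count) {
    uint64_t total = 0;
    for (size_t i = 0; i < count; i++) {
        if (map[i].type == E820_TYPE_USABLE) {
            total += map[i].length;
        }
    }
    return total;
}
```

Stage 2 could print each entry with its own hex-printing routine and then report the total. Because the lengths are 64-bit, printing them from 32-bit code means handling the high and low halves separately.
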

Extension 3: Display Boot Drive Information

Query and display the boot drive’s parameters (sectors, heads, cylinders) using INT 13h AH=08h.
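
AH=08h returns the geometry packed into CX and DX: CH holds the low 8 bits of the highest cylinder number, the top two bits of CL hold its high bits, the low six bits of CL hold the highest sector number, and DH holds the highest head number. A sketch of the unpacking follows, with `struct drive_geometry` and `decode_drive_params()` as illustrative names.

```c
#include <stdint.h>

/* Geometry as counts rather than the maximum indices the BIOS returns. */
struct drive_geometry {
    uint16_t cylinders;
    uint16_t heads;
    uint8_t  sectors_per_track;
};

/* Unpack the CX and DX values returned by INT 13h AH=08h. */
static struct drive_geometry decode_drive_params(uint16_t cx, uint16_t dx) {
    struct drive_geometry g;
    uint8_t ch = (uint8_t)(cx >> 8);
    uint8_t cl = (uint8_t)(cx & 0xFF);
    uint8_t dh = (uint8_t)(dx >> 8);

    /* The 10-bit cylinder number is split across CH and the top of CL. */
    uint16_t max_cylinder = (uint16_t)(ch | ((uint16_t)(cl & 0xC0) << 2));

    g.cylinders         = (uint16_t)(max_cylinder + 1);  /* indices start at 0 */
    g.heads             = (uint16_t)(dh + 1);            /* indices start at 0 */
    g.sectors_per_track = (uint8_t)(cl & 0x3F);          /* sectors start at 1 */
    return g;
}
```

A 1.44MB floppy reports CX=0x4F12 and DH=1, which decodes to 80 cylinders, 2 heads, and 18 sectors per track.
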

Extension 4: Simple Boot Menu

Add a basic menu that waits for keypress and can boot different configurations.

Extension 5: VGA Text Mode Clear Screen

Implement screen clear and color text output in protected mode by writing to 0xB8000.
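
In VGA text mode each screen cell is a 16-bit value: the character in the low byte and the attribute (background in the high nibble, foreground in the low nibble) in the high byte. The sketch below takes the buffer as a parameter so it can be tried on the host with an ordinary array; in the bootloader the buffer would be `(volatile uint16_t *)0xB8000`. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define VGA_COLS 80
#define VGA_ROWS 25

/* Combine a character and colors into one text-mode cell. */
static uint16_t vga_cell(char ch, uint8_t fg, uint8_t bg) {
    uint8_t attr = (uint8_t)((bg << 4) | (fg & 0x0F));
    return (uint16_t)(((uint16_t)attr << 8) | (uint8_t)ch);
}

/* Fill the whole 80x25 screen with spaces in the given colors. */
static void vga_clear(volatile uint16_t *buf, uint8_t fg, uint8_t bg) {
    for (size_t i = 0; i < (size_t)VGA_COLS * VGA_ROWS; i++) {
        buf[i] = vga_cell(' ', fg, bg);
    }
}

/* Write a string starting at (row, col), stopping at the end of the screen. */
static void vga_puts_at(volatile uint16_t *buf, int row, int col,
                        const char *s, uint8_t fg, uint8_t bg) {
    size_t i = (size_t)row * VGA_COLS + (size_t)col;
    while (*s != '\0' && i < (size_t)VGA_COLS * VGA_ROWS) {
        buf[i++] = vga_cell(*s++, fg, bg);
    }
}
```

Foreground 15 on background 0 is attribute 0x0F (white on black), so a 'P' in those colors is the cell value 0x0F50.
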

Extension 6: Load a Simple Kernel

Create a minimal kernel that Stage 2 loads and jumps to. The kernel prints “Hello from kernel!”.

Extension 7: Error Message Display

Instead of single-character errors, implement a system to display descriptive error messages.
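
One simple approach is a lookup from the INT 13h status byte (returned in AH) to a short message. The sketch below covers a handful of common status codes as listed in Ralf Brown's Interrupt List; `int13_status_message()` is a hypothetical name, and the table is deliberately incomplete.

```c
#include <stdint.h>

/* Map an INT 13h status code (AH after the call) to a short description. */
static const char *int13_status_message(uint8_t status) {
    switch (status) {
    case 0x00: return "success";
    case 0x01: return "invalid command or parameter";
    case 0x02: return "address mark not found";
    case 0x03: return "disk write-protected";
    case 0x04: return "sector not found";
    case 0x06: return "disk changed";
    case 0x08: return "DMA overrun";
    case 0x09: return "DMA crossed 64K boundary";
    case 0x10: return "uncorrectable CRC/ECC error";
    case 0x20: return "controller failure";
    case 0x40: return "seek failed";
    case 0x80: return "timeout (drive not ready)";
    default:   return "unknown error";
    }
}
```

This fits more naturally in Stage 2 than in Stage 1, where the strings would eat into the 446-byte code budget. Stage 1 can keep its single-character codes and leave the descriptive text to Stage 2.
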


9. Real-World Connections

How GRUB Uses Multi-Stage Loading

GRUB 2’s boot process:

  1. boot.img (512 bytes, at most 446 of them code): Minimal MBR, loads first sector of core.img
  2. diskboot.img (512 bytes): Part of core.img, loads rest of core.img
  3. core.img (variable, ~25-500KB): Contains filesystem drivers, boot menu, kernel loader

Your two-stage bootloader mirrors this architecture at a simpler level.

Industry Applications

  • Embedded Systems: Most embedded bootloaders (U-Boot, Barebox) use multi-stage loading
  • Security: Secure boot chains verify each stage before loading the next
  • Firmware Updates: Multi-stage allows updating Stage 2 without touching Stage 1
  • Recovery: Stage 1 can have fallback logic if Stage 2 fails

10. Resources

Additional Reading

  • “Low-Level Programming” by Igor Zhirkov - Chapters 1-3, 6, 8
  • “Operating Systems: Three Easy Pieces” - Boot appendix
  • Linux kernel source: arch/x86/boot/ for real-world bootloader code

11. Self-Assessment Checklist

Before considering this project complete, verify:

Understanding

  • I can explain why the MBR is limited to 512 bytes
  • I understand the difference between CHS and LBA addressing
  • I can describe what INT 13h does and how to use it
  • I understand why we need a linker script for bare-metal C
  • I can explain the GDT entry format

Implementation

  • Stage 1 fits in 446 bytes (leaving room for partition table)
  • Stage 1 includes retry logic for disk reads
  • Stage 2 loads correctly and executes
  • C code compiles without libc dependencies
  • Protected mode transition works without triple fault
  • Video memory writes work in protected mode

Quality

  • Code is well-commented
  • Makefile builds everything correctly
  • Works in QEMU with make run
  • Debuggable with make debug

12. Submission / Completion Criteria

Your implementation is complete when:

  1. Functional Criteria:
    • make produces a bootable disk image
    • make run boots in QEMU and shows all expected output
    • Stage 1 → Stage 2 → C code → Protected mode works end-to-end
  2. Code Quality:
    • Each file has a header comment explaining its purpose
    • Complex code sections have explanatory comments
    • No hardcoded magic numbers without explanation
  3. Documentation:
    • README explains how to build and run
    • Memory map is documented
    • Any design decisions are explained
  4. Demonstration:
    • Can explain any line of code if asked
    • Can describe the boot flow from power-on to protected mode
    • Can debug a problem introduced by changing code

Project 4 of 17 in the Bootloader Deep Dive series

Next: Project 5 - FAT12 Filesystem Bootloader