Project 4: Two-Stage Bootloader
Break free from the 512-byte MBR constraint by implementing the same multi-stage bootloader architecture used by GRUB, LILO, and Windows Boot Manager.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | ★★★☆☆ Advanced |
| Time Estimate | 1-2 weeks |
| Language | x86 Assembly (NASM) + C |
| Prerequisites | Project 3 (Real to Protected Mode), basic C knowledge, linker basics |
| Key Topics | INT 13h disk services, CHS/LBA addressing, memory layout, ABI conventions, linker scripts, cross-compilation |
1. Learning Objectives
By completing this project, you will:
- Master BIOS disk I/O: Use INT 13h to read sectors from disk, handling both CHS and LBA addressing modes
- Understand bootloader architecture: Implement the same multi-stage pattern used by GRUB, LILO, and professional bootloaders
- Bridge assembly and C: Call C functions from assembly and vice versa, understanding ABI conventions for bare-metal environments
- Write linker scripts: Control memory layout for bare-metal binaries, placing code at exact addresses
- Cross-compile for bare metal: Build freestanding code without standard library dependencies
- Design robust error handling: Implement retry logic and diagnostic output in space-constrained code
- Plan memory layouts: Design memory maps that accommodate bootloader stages, stack, and future kernel
2. Theoretical Foundation
2.1 Core Concepts
The 512-Byte Problem
The Master Boot Record (MBR) is exactly 512 bytes. Of these:
- 2 bytes are the boot signature (bytes 0x55, 0xAA at offsets 510-511, i.e. the little-endian word 0xAA55)
- 64 bytes are the partition table (4 entries x 16 bytes)
- 446 bytes remain for your bootloader code
In 446 bytes, there is no realistic room to:
- Include a filesystem driver
- Parse ELF or PE executables
- Display a boot menu with options
- Perform memory detection
- Set up proper protected mode with comprehensive GDT
Every professional bootloader solves this with staged loading:
┌─────────────────────────────────────────────────────────────────────────────┐
│ MULTI-STAGE BOOTLOADER ARCHITECTURE │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────────────────────────────────────────────────────────────┐ │
│ │ STAGE 1 (MBR) │ │
│ │ 512 bytes max │ │
│ ├──────────────────────────────────────────────────────────────────────┤ │
│ │ Responsibilities: │ │
│ │ ✓ Set up minimal stack │ │
│ │ ✓ Read Stage 2 from disk (fixed sectors) │ │
│ │ ✓ Jump to Stage 2 │ │
│ │ │ │
│ │ NOT responsible for: │ │
│ │ ✗ Filesystem parsing │ │
│ │ ✗ Protected mode │ │
│ │ ✗ Memory detection │ │
│ │ ✗ User interface │ │
│ └──────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ │ Load via INT 13h │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────────┐ │
│ │ STAGE 2 │ │
│ │ Not bound by 512 bytes (limited by free memory) │ │
│ ├──────────────────────────────────────────────────────────────────────┤ │
│ │ Responsibilities: │ │
│ │ ✓ Memory detection (E820) │ │
│ │ ✓ A20 gate enable │ │
│ │ ✓ GDT setup │ │
│ │ ✓ Protected mode transition │ │
│ │ ✓ Filesystem driver (FAT, ext2, etc.) │ │
│ │ ✓ Kernel loading and parsing │ │
│ │ ✓ Boot menu and configuration │ │
│ │ ✓ Video mode setup │ │
│ │ │ │
│ │ Can be written in C with assembly stubs! │ │
│ └──────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
INT 13h Disk Services
The BIOS provides disk I/O through interrupt 13h. Two main functions:
AH=02h: Read Sectors (CHS addressing)
Input:
AH = 02h (function: read sectors)
AL = number of sectors to read (1-128)
CH = cylinder number (low 8 bits)
CL = sector number, 1-63 (bits 0-5) + cylinder bits 8-9 (in bits 6-7)
DH = head number
DL = drive number (80h = first HDD, 00h = first floppy)
ES:BX = destination buffer address
Output:
CF = 0 on success, 1 on error
AH = status code (0 = success)
AL = number of sectors actually read
AH=42h: Extended Read (LBA addressing)
Input:
AH = 42h (function: extended read)
DL = drive number
DS:SI = pointer to Disk Address Packet (DAP)
Disk Address Packet structure:
Offset Size Description
0x00 1 Size of packet (16 bytes)
0x01 1 Reserved (0)
0x02 2 Number of sectors to read
0x04 4 Transfer buffer: offset (2 bytes), then segment (2 bytes)
0x08 8 Starting LBA (64-bit)
Output:
CF = 0 on success, 1 on error
AH = status code
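To see why the buffer field is stored offset first, it can help to build a packet byte by byte. The sketch below is a host-side C illustration, not BIOS code; dap_encode and put16 are hypothetical names, and it assumes x86's little-endian byte order.

```c
#include <stdint.h>

/* Store a 16-bit value little-endian, as x86 does. */
static void put16(uint8_t *p, uint16_t v) {
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)(v >> 8);
}

/* Hypothetical helper: serialize an INT 13h AH=42h Disk Address Packet
 * into the 16 bytes the BIOS expects at DS:SI. */
void dap_encode(uint8_t out[16], uint16_t sectors,
                uint16_t buf_segment, uint16_t buf_offset, uint64_t lba) {
    out[0] = 16;                  /* 0x00: packet size */
    out[1] = 0;                   /* 0x01: reserved, must be 0 */
    put16(out + 2, sectors);      /* 0x02: number of sectors */
    put16(out + 4, buf_offset);   /* 0x04: buffer OFFSET comes first... */
    put16(out + 6, buf_segment);  /* 0x06: ...then the buffer SEGMENT */
    for (int i = 0; i < 8; i++)   /* 0x08: 64-bit starting LBA */
        out[8 + i] = (uint8_t)(lba >> (8 * i));
}
```

For example, dap_encode(buf, 16, 0x0000, 0x8000, 1) describes "read 16 sectors starting at LBA 1 into 0x0000:0x8000" and yields the bytes 10 00 10 00 00 80 00 00 01 00 00 00 00 00 00 00.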
CHS vs LBA Conversion:
LBA = (Cylinder × HeadsPerCylinder + Head) × SectorsPerTrack + (Sector - 1)
Example for standard floppy (18 sectors, 2 heads):
CHS (0, 0, 2) = (0 × 2 + 0) × 18 + (2 - 1) = LBA 1 (second sector)
CHS (0, 0, 3) = LBA 2
CHS (0, 1, 1) = LBA 18 (first sector of second head)
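The conversion also runs in reverse, which is what Stage 1 needs when it only knows an LBA. Below is a small host-side C sketch of both directions plus the CH/CL/DH packing from the AH=02h register table; the struct and function names are illustrative, not part of any BIOS API.

```c
#include <stdint.h>

/* Disk geometry; the 1.44MB floppy has 2 heads and 18 sectors per track. */
struct geometry {
    uint32_t heads;              /* heads per cylinder */
    uint32_t sectors_per_track;  /* sectors are numbered from 1 */
};

struct chs {
    uint32_t cylinder, head, sector;
};

/* LBA = (C * HPC + H) * SPT + (S - 1) */
uint32_t chs_to_lba(struct chs a, struct geometry g) {
    return (a.cylinder * g.heads + a.head) * g.sectors_per_track + (a.sector - 1);
}

/* Inverse: the quotient/remainder form of the same formula. */
struct chs lba_to_chs(uint32_t lba, struct geometry g) {
    struct chs a;
    a.sector   = lba % g.sectors_per_track + 1;
    a.head     = (lba / g.sectors_per_track) % g.heads;
    a.cylinder = lba / (g.sectors_per_track * g.heads);
    return a;
}

/* Pack into AH=02h registers: CH = cylinder bits 0-7,
 * CL bits 0-5 = sector, CL bits 6-7 = cylinder bits 8-9, DH = head. */
void chs_to_int13(struct chs a, uint8_t *ch, uint8_t *cl, uint8_t *dh) {
    *ch = (uint8_t)(a.cylinder & 0xFF);
    *cl = (uint8_t)((a.sector & 0x3F) | ((a.cylinder >> 2) & 0xC0));
    *dh = (uint8_t)a.head;
}
```

On the 2-head, 18-sector floppy, lba_to_chs(18, g) gives (0, 1, 1), matching the last example above.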
Memory Map for Two-Stage Loading
┌─────────────────────────────────────────────────────────────────────────────┐
│ REAL MODE MEMORY MAP (First 1MB) │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Address Size Description │
│ ───────────────────────────────────────────────────────────────────────── │
│ 0x00000 1KB Interrupt Vector Table (IVT) - DO NOT TOUCH │
│ 0x00400 256B BIOS Data Area (BDA) - DO NOT TOUCH │
│ 0x00500 ~30KB FREE - Can use for Stage 2 │
│ 0x07C00 512B Stage 1 (MBR) - BIOS loads here │
│ 0x07E00 ~607KB FREE - Can use for Stage 2 or kernel │
│ 0x9FC00 1KB Extended BIOS Data Area (EBDA) │
│ 0xA0000 64KB Video Memory (VGA) │
│ 0xC0000 32KB Video BIOS ROM │
│ 0xC8000 160KB Mapped hardware / Option ROMs │
│ 0xF0000 64KB System BIOS ROM │
│ │
│ RECOMMENDED LAYOUT: │
│ ┌────────────────────────────────────────────────────────────────────────┐ │
│ │ 0x00500 - 0x07BFF : Stage 2 loading area (~29KB available) │ │
│ │ 0x07C00 - 0x07DFF : Stage 1 (MBR) code │ │
│ │ 0x07E00 - 0x0FFFF : Stack (grows down from 0x10000) │ │
│ │ 0x10000 - 0x9FBFF : Kernel loading area (~576KB available) │ │
│ └────────────────────────────────────────────────────────────────────────┘ │
│ │
│ ALTERNATIVE (Stage 2 above Stage 1): │
│ ┌────────────────────────────────────────────────────────────────────────┐ │
│ │ 0x07C00 - 0x07DFF : Stage 1 (MBR) code │ │
│ │ 0x08000 - 0x0FFFF : Stage 2 loading area (32KB) │ │
│ │ 0x10000 - 0x1FFFF : Stack (64KB, grows down from 0x20000) │ │
│ │ 0x20000 - 0x9FBFF : Kernel loading area (~510KB) │ │
│ └────────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
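Before committing to a layout, it can be worth checking the arithmetic on the host. This hypothetical C helper computes how many 512-byte sectors fit in a region and whether two half-open regions collide; the limits you pass in are the boundaries from the map above.

```c
#include <stdint.h>

#define SECTOR_SIZE 512u

/* A half-open memory region [start, end). */
struct region { uint32_t start, end; };

int regions_overlap(struct region a, struct region b) {
    return a.start < b.end && b.start < a.end;
}

/* Whole sectors that fit between a load address and the first byte
 * that must not be touched (e.g. 0x7C00 or 0x10000). */
uint32_t max_stage2_sectors(uint32_t load_addr, uint32_t limit) {
    return limit > load_addr ? (limit - load_addr) / SECTOR_SIZE : 0;
}
```

With Stage 2 at 0x8000 and the stack region starting at 0x10000 (the alternative layout), max_stage2_sectors(0x8000, 0x10000) is 64 sectors, i.e. 32KB.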
Calling C from Assembly (System V i386 ABI)
For 32-bit protected mode code, the System V i386 ABI specifies:
┌─────────────────────────────────────────────────────────────────────────────┐
│ SYSTEM V i386 CALLING CONVENTION │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ARGUMENT PASSING (cdecl): │
│ ┌────────────────────────────────────────────────────────────────────────┐ │
│ │ Arguments pushed right-to-left onto stack │ │
│ │ │ │
│ │ For: void foo(int a, int b, int c) │ │
│ │ Call foo(1, 2, 3) │ │
│ │ │ │
│ │ Stack before CALL: After CALL + prologue (push ebp; mov ebp, esp): │ │
│ │ ... ... │ │
│ │ 3 (arg c) 3 (arg c) [EBP+16] │ │
│ │ 2 (arg b) 2 (arg b) [EBP+12] │ │
│ │ 1 (arg a) ← ESP 1 (arg a) [EBP+8] │ │
│ │ return addr [EBP+4] │ │
│ │ saved EBP [EBP] ← EBP, ESP │ │
│ └────────────────────────────────────────────────────────────────────────┘ │
│ │
│ RETURN VALUE: │
│ • Integer/pointer: EAX (32-bit) or EDX:EAX (64-bit) │
│ • Floating point: x87 FPU stack top (ST0) │
│ │
│ CALLER-SAVED (volatile): │
│ • EAX, ECX, EDX - function can destroy these │
│ │
│ CALLEE-SAVED (non-volatile): │
│ • EBX, ESI, EDI, EBP - function must preserve these │
│ │
│ STACK CLEANUP: │
│ • Caller cleans up arguments after call (ADD ESP, n) │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
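The EBP offsets in the diagram follow from the push order alone. The toy C model below stores each 4-byte stack word in an array slot and replays the caller's pushes and the callee's prologue; it only illustrates the layout, and all names in it are hypothetical.

```c
#include <stdint.h>

/* Each array slot stands for one 4-byte stack word. Higher indices are
 * higher addresses, and the stack grows toward index 0, like ESP. */
#define STACK_WORDS 16

struct model {
    uint32_t mem[STACK_WORDS];
    int esp;   /* index of the last pushed word */
    int ebp;
};

static void push(struct model *m, uint32_t v) { m->mem[--m->esp] = v; }

/* Caller side of foo(1, 2, 3) followed by the callee's prologue. */
void build_frame(struct model *m, uint32_t ret_addr, uint32_t old_ebp) {
    m->esp = STACK_WORDS;
    push(m, 3);          /* arguments pushed right to left */
    push(m, 2);
    push(m, 1);
    push(m, ret_addr);   /* CALL pushes the return address */
    push(m, old_ebp);    /* prologue: push ebp */
    m->ebp = m->esp;     /* prologue: mov ebp, esp */
}

/* Read the word at [EBP + byte_offset]. */
uint32_t at_ebp(const struct model *m, int byte_offset) {
    return m->mem[m->ebp + byte_offset / 4];
}
```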
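For C code that runs in 16-bit real mode, GCC’s usual route is -m16, which compiles ordinary 32-bit code and emits a .code16gcc directive so the result executes in real mode. The principles above still apply, but the sizes stay 32-bit:
- Arguments are pushed right to left as 32-bit values
- Return values come back in EAX (or EDX:EAX for 64-bit values)
- The C function returns with a 32-bit RET, so an assembly caller must use a 32-bit near call (NASM: call dword main) that pushes a 4-byte return address
- The caller cleans the stack; the upper 16 bits of ESP must be zero, and with DS=SS=0 all code, data, and stack must lie within the first 64KB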
Linker Scripts for Bare Metal
A linker script controls where code and data are placed in memory:
/* stage2.ld - Example linker script for Stage 2 */
OUTPUT_FORMAT(binary) /* Raw binary, no ELF wrapper */
OUTPUT_ARCH(i386)
ENTRY(_start) /* Entry point symbol */
SECTIONS
{
. = 0x8000; /* Stage 2 loads at 0x8000 */
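.text : {
*(.text) /* All code; entry.o is linked first, so _start lands at 0x8000 */
*(.text.*) /* e.g. .text.startup, where GCC may place main() at -O2 */
}
.rodata : {
*(.rodata) /* Read-only data (strings, constants) */
*(.rodata.*)
}
.data : {
*(.data) /* Initialized data */
*(.data.*)
}
.bss : {
__bss_start = .; /* Symbol for BSS start */
*(.bss)
*(.bss.*)
*(COMMON)
__bss_end = .; /* Symbol for BSS end; .bss is not stored in the binary, so zero it at startup */
}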
/DISCARD/ : {
*(.eh_frame) /* Discard exception handling (no runtime) */
*(.comment)
}
}
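Because OUTPUT_FORMAT(binary) does not store .bss in the output file, the memory between __bss_start and __bss_end holds whatever was there before Stage 2 was loaded. A minimal sketch of a clearing routine, assuming the entry stub calls it once before main(), is below (zero_bss is a hypothetical name). The volatile qualifier is deliberate: GCC may turn a plain zeroing loop into a call to memset, which it expects even a freestanding environment to provide.

```c
#include <stddef.h>

/* Zero the .bss range. In Stage 2 the entry code would pass the
 * linker-script symbols:
 *     extern char __bss_start[], __bss_end[];
 *     zero_bss(__bss_start, __bss_end);
 * The volatile pointer keeps GCC from replacing the loop with memset(). */
void zero_bss(char *start, char *end) {
    for (volatile char *p = start; p < end; p++)
        *p = 0;
}
```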
2.2 Why This Matters
Every real bootloader uses multi-stage loading. Understanding this pattern is fundamental:
| Bootloader | Stage 1 | Stage 2 | Notes |
|---|---|---|---|
| GRUB | boot.img (446 bytes) | core.img (variable, typically 25-500KB) | Stage 2 contains filesystem drivers |
| LILO | Boot sector | Map file sectors | Stage 2 location recorded at install time |
| Windows | MBR code, then the partition boot sector (VBR) | bootmgr | bootmgr reads the BCD store to find the OS loader |
| Syslinux | ldlinux.sys (first sector) | ldlinux.sys (rest) | Designed for FAT filesystems |
This pattern appears throughout computing:
- CPU microcode updates
- UEFI PEI to DXE transition
- U-Boot SPL to main U-Boot
- Embedded bootloader chains
2.3 Historical Context
The 512-byte limit comes from the original IBM PC (1981):
- Disks transfer data in whole sectors, and the PC’s sector size was standardized at 512 bytes
- The BIOS was designed to load exactly one sector (the boot sector) to 0x7C00
- The boot signature (0x55AA) verified valid boot code
This limit has persisted for 40+ years due to backward compatibility requirements. Even modern UEFI systems can emulate legacy BIOS boot for older operating systems.
2.4 Common Misconceptions
Misconception 1: “Stage 2 must be at a fixed address”
Reality: Stage 2 can be anywhere in available memory. The only requirement is that Stage 1 knows where to load it and that the linker script matches that address.
Misconception 2: “You can’t use C in a bootloader”
Reality: C works perfectly in bootloaders with proper setup: -ffreestanding, -nostdlib, correct linker script, and an assembly stub to set up the stack before calling C.
Misconception 3: “INT 13h only works with floppies”
Reality: INT 13h works with all drives including hard disks, USB drives, and virtual disks. Extended INT 13h (AH=42h) supports LBA for large drives.
Misconception 4: “The partition table uses all 64 bytes”
Reality: If you don’t need partitions (e.g., floppy disk or dedicated boot device), you can use those 64 bytes for code. But for hard disk compatibility, keep the partition table.
3. Project Specification
3.1 What You Will Build
A complete two-stage bootloader:
- Stage 1: 512-byte MBR that loads Stage 2 from disk and jumps to it
- Stage 2: Larger program (written in C with assembly stubs) that initializes the system and prepares for kernel loading
3.2 Functional Requirements
- Stage 1 Requirements:
- Fit entirely within 446 bytes (512 - 64 partition table - 2 signature)
- Set up a valid stack
- Read Stage 2 from consecutive disk sectors (sectors 2-N)
- Implement retry logic for disk read failures (3-5 retries)
- Display minimal status (‘1’ for success, ‘E’ for error)
- Jump to Stage 2’s entry point
- Stage 2 Requirements:
- Display “Stage 2 loaded!” message confirming successful loading
- Query memory using INT 15h/E820 (optional but recommended)
- Set up GDT for protected mode
- Enable A20 gate
- Switch to protected mode
- Display “Protected mode active!” from 32-bit code
- Demonstrate C function calls (print a message from C)
- Build System Requirements:
- Makefile that builds both stages
- Creates a bootable disk image
- Supports make run to test in QEMU
- Supports make debug to debug in QEMU+GDB
3.3 Non-Functional Requirements
- Deterministic: Same build produces identical binary
- Testable: Works identically in QEMU and on real hardware (USB boot)
- Educational: Code is well-commented explaining each step
- Extensible: Stage 2 structure allows easy addition of features
3.4 Example Usage / Output
# Build the bootloader
$ make
nasm -f bin stage1.asm -o stage1.bin
nasm -f elf32 stage2_entry.asm -o stage2_entry.o
i686-elf-gcc -c -ffreestanding -nostdlib -m16 stage2.c -o stage2.o
i686-elf-ld -T stage2.ld stage2_entry.o stage2.o -o stage2.bin
cat stage1.bin stage2.bin > boot.img
# Pad to floppy size for QEMU
dd if=/dev/zero bs=1474560 count=1 of=floppy.img
dd if=boot.img of=floppy.img conv=notrunc
# Run in QEMU
$ make run
qemu-system-i386 -fda floppy.img
# Expected output on QEMU console:
1 # Stage 1 is running
Stage 1: Loading Stage 2 from sectors 2-10...
Stage 1: Jumping to Stage 2 at 0x8000
Stage 2: Hello from C code!
Stage 2: Initializing protected mode...
[A20 enabled]
[GDT loaded]
[Entering protected mode...]
Stage 2 (32-bit): Protected mode active!
Stage 2 (32-bit): Ready to load kernel.
3.5 Real World Outcome
Upon completion, you will have:
- A working two-stage bootloader with the same staged architecture as GRUB, showing you understand professional bootloader design
- Skills to extend it with:
- Filesystem drivers (FAT12/FAT32, ext2)
- ELF kernel loading
- Boot menu with configuration
- Graphics mode initialization
- An understanding of bare-metal C: how to run C code without any OS support, which is crucial for embedded systems and OS development
- A portfolio piece demonstrating low-level systems expertise
4. Solution Architecture
4.1 High-Level Design
┌─────────────────────────────────────────────────────────────────────────────┐
│ TWO-STAGE BOOTLOADER │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ DISK LAYOUT: │
│ ┌──────────┬──────────────────────────────────────────────────────────┐ │
│ │ Sector 1 │ Sector 2 │ Sector 3 │ Sector 4 │ ... │ Sector N │ ... │ │
│ ├──────────┼──────────────────────────────────────────────────────────┤ │
│ │ Stage 1 │ Stage 2 Binary │ Free │ │
│ │ (MBR) │ (loaded to 0x8000) │ │ │
│ │ 512 bytes│ Variable size │ │ │
│ └──────────┴──────────────────────────────────────────────────────────┘ │
│ │
│ MEMORY AFTER STAGE 2 LOADS: │
│ ┌──────────────────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ 0x0000 ┌─────────────────────────────────────────┐ │ │
│ │ │ IVT + BIOS Data (DO NOT TOUCH) │ │ │
│ │ 0x0500 ├─────────────────────────────────────────┤ │ │
│ │ │ Stack: SP starts at 0x7C00, grows toward 0x0500 │ │ │
│ │ 0x7C00 ├─────────────────────────────────────────┤ │ │
│ │ │ Stage 1 Code (512 bytes) │ │ │
│ │ 0x7E00 ├─────────────────────────────────────────┤ │ │
│ │ │ (Unused, 512 bytes) │ │ │
│ │ 0x8000 ├─────────────────────────────────────────┤ │ │
│ │ │ │ │ │
│ │ │ Stage 2 Code + Data │ │ │
│ │ │ (Variable size, e.g., 8KB) │ │ │
│ │ │ │ │ │
│ │ 0xA000 ├─────────────────────────────────────────┤ │ │
│ │ │ (Available for kernel) │ │ │
│ │ 0x9FC00├─────────────────────────────────────────┤ │ │
│ │ │ EBDA + Video + ROM (DO NOT TOUCH) │ │ │
│ │ 0xFFFFF└─────────────────────────────────────────┘ │ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
4.2 Key Components
| Component | File | Purpose |
|---|---|---|
| Stage 1 | stage1.asm | MBR bootloader, loads Stage 2 |
| Stage 2 Entry | stage2/entry.asm | Assembly stub that sets up and calls C |
| Stage 2 Main | stage2/main.c | Main Stage 2 logic in C |
| Stage 2 Print | stage2/print.c | Utility functions for printing (real/protected mode) |
| Linker Script | stage2/linker.ld | Controls Stage 2 memory layout |
| Makefile | Makefile | Build automation |
4.3 Data Structures
Disk Address Packet (DAP) for Extended Read:
#include <stdint.h>
struct disk_address_packet {
uint8_t size; // Size of this structure (16)
uint8_t reserved; // Must be 0
uint16_t sectors; // Number of sectors to transfer
uint16_t offset; // Transfer buffer offset
uint16_t segment; // Transfer buffer segment
uint64_t lba; // Starting LBA
} __attribute__((packed));
GDT Entry:
struct gdt_entry {
uint16_t limit_low; // Limit bits 0-15
uint16_t base_low; // Base bits 0-15
uint8_t base_mid; // Base bits 16-23
uint8_t access; // Access byte
uint8_t granularity; // Limit 16-19 + flags
uint8_t base_high; // Base bits 24-31
} __attribute__((packed));
struct gdt_ptr {
uint16_t limit; // Size of GDT - 1
uint32_t base; // Linear address of GDT
} __attribute__((packed));
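Packing mistakes in these structures usually show up as silent boot failures, so it can be cheaper to let the compiler check the layout. This sketch repeats the definitions above and adds C11 _Static_assert checks against the sizes and offsets given earlier for the DAP and the GDTR.

```c
#include <stddef.h>
#include <stdint.h>

struct disk_address_packet {
    uint8_t  size;
    uint8_t  reserved;
    uint16_t sectors;
    uint16_t offset;
    uint16_t segment;
    uint64_t lba;
} __attribute__((packed));

struct gdt_entry {
    uint16_t limit_low;
    uint16_t base_low;
    uint8_t  base_mid;
    uint8_t  access;
    uint8_t  granularity;
    uint8_t  base_high;
} __attribute__((packed));

struct gdt_ptr {
    uint16_t limit;
    uint32_t base;
} __attribute__((packed));

/* Fail the build, not the boot, if a layout is wrong. */
_Static_assert(sizeof(struct disk_address_packet) == 16, "DAP must be 16 bytes");
_Static_assert(offsetof(struct disk_address_packet, offset) == 4, "buffer offset at 0x04");
_Static_assert(offsetof(struct disk_address_packet, segment) == 6, "buffer segment at 0x06");
_Static_assert(offsetof(struct disk_address_packet, lba) == 8, "LBA at 0x08");
_Static_assert(sizeof(struct gdt_entry) == 8, "GDT descriptors are 8 bytes");
_Static_assert(sizeof(struct gdt_ptr) == 6, "GDTR image is 6 bytes");
```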
4.4 Algorithm Overview
Stage 1 Algorithm:
1. BIOS loads Stage 1 at 0x7C00
2. Set up segment registers (DS, ES, SS = 0)
3. Set up stack (SP = 0x7C00, grows down to ~0x0500)
4. Display '1' to indicate Stage 1 running
5. Prepare disk read:
- Target address: 0x8000
- Start sector: 2 (LBA 1)
- Sector count: STAGE2_SECTORS (compile-time constant)
6. Call INT 13h AH=02h or AH=42h (with retry loop)
7. If error: display 'E', halt
8. If success: jump to 0x8000
Stage 2 Algorithm:
1. Entry point at 0x8000 (assembly)
2. Display "Stage 2 loaded" message
3. Set up stack for C calling
4. Call C main() function
5. In C main():
a. Print welcome message
b. Query memory with E820 (optional)
c. Enable A20 gate
d. Load GDT
e. Enable protected mode (set CR0.PE)
f. Far jump to 32-bit code
6. In 32-bit code:
a. Reload segment registers with 32-bit selectors
b. Print "Protected mode active!" to video memory
c. Halt or continue to kernel loading
5. Implementation Guide
5.1 Development Environment Setup
# Required tools
# On Ubuntu/Debian:
sudo apt-get install nasm qemu-system-x86 gcc gdb make
# Cross-compiler for 32-bit freestanding code (recommended):
# Option 1: Use system gcc with -m32
sudo apt-get install gcc-multilib
# Option 2: Build cross-compiler (cleaner, no host dependencies)
# See: https://wiki.osdev.org/GCC_Cross-Compiler
# Verify NASM
nasm --version # Should be 2.14+
# Verify QEMU
qemu-system-i386 --version # Should be 4.0+
5.2 Project Structure
two-stage-bootloader/
├── Makefile
├── stage1.asm # Stage 1 assembly (512 bytes)
├── stage2/
│ ├── entry.asm # Stage 2 entry point (assembly)
│ ├── main.c # Main C code
│ ├── print.c # Print utilities
│ ├── print.h # Header file
│ ├── gdt.c # GDT setup
│ ├── a20.c # A20 gate enable
│ └── linker.ld # Linker script for Stage 2
├── build/ # Build artifacts
│ ├── stage1.bin
│ ├── stage2.bin
│ └── boot.img
└── README.md
5.3 The Core Question You’re Answering
“How do you overcome the 512-byte constraint of the MBR while maintaining compatibility with BIOS boot requirements, and what architectural patterns enable a bootloader to scale from minimal initialization code to complex features like filesystem drivers and interactive menus?”
This is about understanding the fundamental pattern of staged loading and bootstrap architectures.
5.4 Concepts You Must Understand First
Before writing code, verify you understand these concepts:
- INT 13h Function AH=02h Parameters
- What does each register (AH, AL, CH, CL, DH, DL, ES:BX) contain?
- How do you convert from LBA to CHS?
- What does the Carry Flag indicate after the call?
- Reference: Ralf Brown’s Interrupt List, INT 13h
- Segment:Offset Addressing
- Given ES=0x0800, BX=0x0000, what physical address does ES:BX point to?
- Why do we set DS=ES=SS=0 and use only offsets?
- Reference: CS:APP Chapter 3.4, Intel SDM Vol. 1 Chapter 3
- Stack Setup in Assembly
- Why do we need to set up SS and SP before using the stack?
- What happens if we call a function without a stack?
- Reference: Low-Level Programming by Zhirkov, Chapter 3
- Freestanding C Environment
- What does -ffreestanding tell the compiler?
- Why can’t we use printf, malloc, or any libc function?
- What do we have: the core language plus the freestanding headers (stdint.h, stddef.h, stdarg.h, stdbool.h), but no libc
- Reference: GCC Manual, Section 3.4 “Options Controlling C Dialect”
- Linker Script Basics
- What does . = 0x8000; mean?
- What are .text, .data, .rodata, .bss sections?
- Why OUTPUT_FORMAT(binary) instead of ELF?
- Reference: LD Manual, Chapter 3 “Linker Scripts”
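For the segment:offset question, the translation is a single shift and add. The host-side C sketch below (function names are illustrative) also shows that many segment:offset pairs name the same byte, which is one reason setting DS=ES=SS=0 and working only with offsets keeps things simple.

```c
#include <stdint.h>

/* Real-mode translation: linear = segment * 16 + offset.
 * The 8086 wrapped at 1MB; with A20 enabled, 0xFFFF:0xFFFF reaches 0x10FFEF. */
uint32_t linear_address(uint16_t segment, uint16_t offset) {
    return ((uint32_t)segment << 4) + offset;
}

/* One of many pairs for a linear address below 1MB: keep the offset
 * in 0..15 ("normalized" form). */
void normalize(uint32_t linear, uint16_t *segment, uint16_t *offset) {
    *segment = (uint16_t)(linear >> 4);
    *offset  = (uint16_t)(linear & 0xF);
}
```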
5.5 Questions to Guide Your Design
Work through these before coding:
- Memory Layout:
- Where will you load Stage 2? (0x8000 is common, but why?)
- Where will the stack be for Stage 1? For Stage 2?
- How will you ensure Stage 1 and Stage 2 don’t overlap?
- Disk Access:
- How many sectors is your Stage 2?
- Will you use CHS or LBA addressing?
- What drive number will you use (DL is passed by BIOS)?
- Error Handling:
- What happens if disk read fails?
- How will you indicate success/failure (remember: limited space)?
- How many times will you retry?
- Stage 2 Entry:
- What does Stage 2 expect when entered (segment registers, stack)?
- How will the assembly stub hand off to C?
- What calling convention will you use?
- Protected Mode Transition:
- Where will your GDT live in memory?
- How will you reload segment registers after mode switch?
- How will you print in protected mode (no BIOS interrupts)?
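For the disk-access questions, a few lines of arithmetic answer most of them. The hypothetical helpers below compute STAGE2_SECTORS from the binary size and check two constraints that are assumptions to verify on your target: some BIOSes will not continue a single CHS read across a track boundary, and floppy transfers go through ISA DMA, which cannot cross a 64KB physical boundary (INT 13h reports this as status 09h).

```c
#include <stdint.h>

#define SECTOR_SIZE 512u

/* Round a binary size up to whole sectors (this is STAGE2_SECTORS). */
uint32_t sectors_for(uint32_t bytes) {
    return (bytes + SECTOR_SIZE - 1) / SECTOR_SIZE;
}

/* Do sectors first_sector..first_sector+count-1 stay on one track? */
int fits_on_track(uint32_t first_sector, uint32_t count, uint32_t sectors_per_track) {
    return count > 0 && first_sector + count - 1 <= sectors_per_track;
}

/* Would a transfer of `bytes` into `buffer` cross a 64KB physical boundary? */
int crosses_64k(uint32_t buffer, uint32_t bytes) {
    return bytes > 0 && (buffer >> 16) != ((buffer + bytes - 1) >> 16);
}
```

A 16-sector Stage 2 read to 0x8000 ends at 0x9FFF and covers sectors 2-17, so it stays inside both limits on an 18-sector-per-track floppy.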
5.6 Thinking Exercise
Before writing any code, trace through this scenario on paper:
- BIOS loads Stage 1 at 0x7C00. Draw the memory map showing:
- Where is Stage 1 code?
- Where should the stack pointer be set?
- What are the initial values of DS, ES, SS, CS?
- Stage 1 loads Stage 2. Show:
- The INT 13h register setup for reading 8 sectors to 0x8000
- The memory map after loading
- The instruction that transfers control to Stage 2
- Stage 2 runs and enters protected mode. Diagram:
- Where is the GDT in memory?
- What are the three GDT entries (null, code, data)?
- What are the segment selector values after protected mode?
- Stage 2 writes to video memory in protected mode. Explain:
- Why can’t we use INT 10h anymore?
- What is the physical address of video memory?
- How do we write a character with attribute to the screen?
5.7 Hints in Layers
Hint 1: Stage 1 Skeleton
Start with this Stage 1 structure:
[BITS 16]
[ORG 0x7C00]
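start:
; Some BIOSes enter at 0x07C0:0x0000 instead of 0x0000:0x7C00;
; a far jump forces CS=0 to match [ORG 0x7C00]
jmp 0x0000:.init
.init:
; Disable interrupts during setup
cli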
; Set up segments
xor ax, ax
mov ds, ax
mov es, ax
mov ss, ax
mov sp, 0x7C00 ; Stack grows down from here
; Re-enable interrupts
sti
; Save drive number (BIOS passes it in DL)
mov [boot_drive], dl
; TODO: Print '1' to show we're running
; TODO: Load Stage 2
; TODO: Jump to Stage 2
; Error handling
jmp error
error:
mov ah, 0x0E
mov al, 'E'
int 0x10
hlt
jmp $
boot_drive: db 0
; Pad to 510 bytes
times 510-($-$$) db 0
; Boot signature
dw 0xAA55
Hint 2: Reading Sectors with INT 13h
For reading sectors using CHS (simpler for small disks):
load_stage2:
mov ah, 0x02 ; Function: read sectors
mov al, STAGE2_SECTORS ; Number of sectors
mov ch, 0 ; Cylinder 0
mov cl, 2 ; Sector 2 (1-indexed, sector 1 is MBR)
mov dh, 0 ; Head 0
mov dl, [boot_drive] ; Drive number
mov bx, 0x8000 ; ES:BX = destination
; ES is already 0 from setup
int 0x13
jc disk_error ; Jump if carry set (error)
; Success - jump to Stage 2
jmp 0x0000:0x8000
disk_error:
; Retry logic here
; ...
For retry logic:
mov si, 3 ; 3 retries
.retry:
; ... disk read code ...
int 0x13
jnc .success ; No carry = success
; Reset disk system (AH=00h); DL must still hold the boot drive
xor ax, ax
int 0x13
dec si
jnz .retry
jmp error ; All retries failed
.success:
jmp 0x0000:0x8000
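The retry loop's control flow can be checked without a disk. This C model is a sketch that assumes a read returns 0 on success and a nonzero status (like AH) on failure; read_with_retry and the fake_disk test double are hypothetical names standing in for INT 13h AH=02h and AH=00h.

```c
/* Returns 0 on success, otherwise a nonzero status (like AH). */
typedef int  (*read_fn)(void *ctx);
typedef void (*reset_fn)(void *ctx);

int read_with_retry(read_fn read_sectors, reset_fn reset_disk, void *ctx, int retries) {
    for (int attempt = 0; attempt < retries; attempt++) {
        if (read_sectors(ctx) == 0)
            return 0;        /* jnc .success */
        reset_disk(ctx);     /* xor ax, ax / int 0x13 */
    }
    return -1;               /* jmp error */
}

/* Fake disk for testing: fails a fixed number of times, then succeeds. */
struct fake_disk { int failures_left, reads, resets; };

int fake_read(void *ctx) {
    struct fake_disk *d = ctx;
    d->reads++;
    if (d->failures_left > 0) {
        d->failures_left--;
        return 0x80;         /* AH=80h: timeout (drive not ready) */
    }
    return 0;
}

void fake_reset(void *ctx) { ((struct fake_disk *)ctx)->resets++; }
```

Like the assembly, the model resets the disk after every failed attempt, including the last, before giving up.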
Hint 3: Stage 2 Assembly Entry
The Stage 2 entry point needs to set up for C:
[BITS 16]
[GLOBAL _start]
[EXTERN main] ; C function we'll call
section .text
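_start:
; We're at 0x8000, loaded by Stage 1.
; Set up segments and the stack first so nothing depends on Stage 1's state
cli
xor ax, ax
mov ds, ax
mov es, ax
mov ss, ax
mov esp, 0x7C00 ; Reuse the area below Stage 1 (0x9000 would grow into Stage 2); also clears ESP's upper half for -m16 C code
sti
cld ; lodsb in print_string must walk forward
; Print message
mov si, msg_loaded
call print_string
; Call C code compiled with -m16: it returns with a 32-bit RET,
; so use a 32-bit near call that pushes a 4-byte return address
call dword main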
; main() returned - halt
cli
hlt
jmp $
print_string:
; SI = string pointer
.loop:
lodsb
or al, al
jz .done
mov ah, 0x0E
int 0x10
jmp .loop
.done:
ret
section .rodata
msg_loaded: db "Stage 2 loaded!", 13, 10, 0
Hint 4: Minimal GDT for Protected Mode
A minimal GDT needs three entries:
// gdt.c
#include <stdint.h>
struct gdt_entry {
uint16_t limit_low;
uint16_t base_low;
uint8_t base_mid;
uint8_t access;
uint8_t granularity;
uint8_t base_high;
} __attribute__((packed));
struct gdt_ptr {
uint16_t limit;
uint32_t base;
} __attribute__((packed));
// GDT with 3 entries: null, code, data
struct gdt_entry gdt[3];
struct gdt_ptr gdtp;
void gdt_set_entry(int i, uint32_t base, uint32_t limit,
uint8_t access, uint8_t gran) {
gdt[i].limit_low = limit & 0xFFFF;
gdt[i].base_low = base & 0xFFFF;
gdt[i].base_mid = (base >> 16) & 0xFF;
gdt[i].access = access;
gdt[i].granularity = ((limit >> 16) & 0x0F) | (gran & 0xF0);
gdt[i].base_high = (base >> 24) & 0xFF;
}
void gdt_init(void) {
// Null descriptor (required)
gdt_set_entry(0, 0, 0, 0, 0);
// Code segment: base=0, limit=4GB, 32-bit, ring 0
// Access: present(1) + ring 0(00) + code/data(1) + exec(1) + r/w(1) = 0x9A
// Granularity: 4KB granularity(1) + 32-bit(1) + long mode(0) + AVL(0), then limit bits 16-19 (0xF) = 0xCF
gdt_set_entry(1, 0, 0xFFFFF, 0x9A, 0xCF);
// Data segment: base=0, limit=4GB, 32-bit, ring 0
// Access: same as code but not executable = 0x92
gdt_set_entry(2, 0, 0xFFFFF, 0x92, 0xCF);
// Set up GDTR
gdtp.limit = sizeof(gdt) - 1;
gdtp.base = (uint32_t)&gdt;
}
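When debugging, it helps to know what the eight bytes of each descriptor should look like. The sketch below packs the same fields as gdt_set_entry() into one 64-bit value (gdt_descriptor is a hypothetical helper). For the flat code and data segments above it yields 0x00CF9A000000FFFF and 0x00CF92000000FFFF, which you can compare against a memory dump of the GDT in GDB or the QEMU monitor.

```c
#include <stdint.h>

/* Pack a segment descriptor into the 64-bit value the CPU reads,
 * using the same field split as gdt_set_entry(). */
uint64_t gdt_descriptor(uint32_t base, uint32_t limit, uint8_t access, uint8_t gran) {
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFF);              /* bits 0-15:  limit 0-15 */
    d |= (uint64_t)(base & 0xFFFFFF) << 16;       /* bits 16-39: base 0-23 */
    d |= (uint64_t)access << 40;                  /* bits 40-47: access byte */
    d |= (uint64_t)((limit >> 16) & 0x0F) << 48;  /* bits 48-51: limit 16-19 */
    d |= (uint64_t)(gran & 0xF0) << 48;           /* bits 52-55: AVL, L, D/B, G */
    d |= (uint64_t)((base >> 24) & 0xFF) << 56;   /* bits 56-63: base 24-31 */
    return d;
}
```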
Hint 5: Switching to Protected Mode
The actual mode switch requires assembly:
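[BITS 16]
[EXTERN gdtp] ; GDTR image filled in by gdt_init() in gdt.c
; Call this only after A20 is enabled and gdt_init() has run
switch_to_protected:
cli ; Disable interrupts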
lgdt [gdtp] ; Load GDT register
mov eax, cr0
or eax, 1 ; Set PE (Protection Enable) bit
mov cr0, eax
jmp 0x08:protected_mode ; Far jump to code segment (selector 0x08)
; This also flushes the prefetch queue
[BITS 32]
protected_mode:
; Reload data segments with selector 0x10 (data segment)
mov ax, 0x10
mov ds, ax
mov es, ax
mov fs, ax
mov gs, ax
mov ss, ax
mov esp, 0x90000 ; Set up 32-bit stack
; Now we're in protected mode!
; Write directly to video memory at 0xB8000
mov byte [0xB8000], 'P'
mov byte [0xB8001], 0x0F ; White on black
; ... continue execution ...
cli
.hang:
hlt ; Halt; loop in case an NMI wakes the CPU
jmp .hang
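The single-character write above generalizes to strings. Below is a sketch of a 32-bit C helper, assuming 80x25 color text mode with its buffer at 0xB8000; vga_write and vga_cell are hypothetical names. It would have to be compiled as 32-bit code (for example with -m32) and called only after the switch: code built with -m16 for real mode does not run correctly in protected mode, and real-mode code with DS=0 cannot reach 0xB8000 anyway.

```c
#include <stdint.h>

#define VGA_COLS 80

/* Each text-mode cell is 16 bits: low byte = character,
 * high byte = attribute (high nibble background, low nibble foreground). */
static inline uint16_t vga_cell(char c, uint8_t attr) {
    return (uint16_t)((uint16_t)attr << 8 | (uint8_t)c);
}

/* Write a string starting at (row, col). Taking the buffer as a parameter
 * lets the same code be tested against an ordinary array. */
void vga_write(volatile uint16_t *vga, int row, int col, const char *s, uint8_t attr) {
    volatile uint16_t *p = vga + row * VGA_COLS + col;
    while (*s)
        *p++ = vga_cell(*s++, attr);
}
```

In Stage 2 the call would look like vga_write((volatile uint16_t *)0xB8000, 0, 0, "Protected mode active!", 0x0F).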
Hint 6: Makefile for Building
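# Cross-compiler (recommended)
CC = i686-elf-gcc
LD = i686-elf-ld
# Without a cross-compiler, use the host toolchain instead:
# CC = gcc
# LD = ld -m elf_i386
# -m16: Stage 2's C code runs in real mode before the protected mode switch
CFLAGS = -m16 -ffreestanding -fno-pic -fno-stack-protector -nostdlib -O2 -Wall -Wextra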
# Default target: build the disk image
all: build/boot.img
# Stage 1
build/stage1.bin: stage1.asm
nasm -f bin stage1.asm -o build/stage1.bin
# Stage 2 object files
build/entry.o: stage2/entry.asm
nasm -f elf32 stage2/entry.asm -o build/entry.o
build/main.o: stage2/main.c
$(CC) $(CFLAGS) -c stage2/main.c -o build/main.o
build/print.o: stage2/print.c
$(CC) $(CFLAGS) -c stage2/print.c -o build/print.o
# Stage 2 binary
build/stage2.bin: build/entry.o build/main.o build/print.o stage2/linker.ld
$(LD) -T stage2/linker.ld build/entry.o build/main.o build/print.o -o build/stage2.bin
# Final disk image
build/boot.img: build/stage1.bin build/stage2.bin
cat build/stage1.bin build/stage2.bin > build/boot.img
# Pad to a full 1.44MB floppy image for QEMU's -fda
dd if=/dev/zero of=build/floppy.img bs=1474560 count=1
dd if=build/boot.img of=build/floppy.img conv=notrunc
# QEMU targets
run: build/boot.img
qemu-system-i386 -fda build/floppy.img
debug: build/boot.img
qemu-system-i386 -fda build/floppy.img -s -S &
gdb -ex "target remote localhost:1234" -ex "set architecture i8086"
clean:
rm -f build/*.bin build/*.o build/*.img
.PHONY: all run debug clean
5.8 The Interview Questions They’ll Ask
These questions test understanding of the concepts in this project:
- “Why is the MBR limited to 512 bytes? Could we change this?”
Good answer: The 512-byte limit comes from the original IBM PC, whose BIOS loads exactly one 512-byte sector to 0x7C00. We can’t change this for legacy BIOS because the firmware behavior is fixed. UEFI solves this with the ESP and UEFI applications.
- “Explain what happens from power-on to your Stage 2 code running.”
Expected flow: PSU → CPU reset → execute at 0xFFFFFFF0 → jump to BIOS → POST → enumerate devices → read MBR to 0x7C00 → verify 0xAA55 signature → jump to 0x7C00 (Stage 1) → Stage 1 reads sectors to 0x8000 → jump to 0x8000 (Stage 2).
- “What’s the difference between CHS and LBA addressing? Which would you use?”
Answer: CHS uses cylinder/head/sector (3D geometry), limited to 8.4GB. LBA treats disk as flat array of sectors. Use LBA (INT 13h AH=42h) when available (check with AH=41h) because it’s simpler and supports larger disks.
- “How would you debug if Stage 2 doesn’t load correctly?”
Answer: (1) Print a character before/after the disk read to identify where it fails. (2) Verify the sector count and destination address. (3) Use QEMU with -d int to log interrupts. (4) Use GDB with QEMU’s -s -S to step through. (5) Check CF and AH after INT 13h for error codes.
- “Why can’t you call printf() in your Stage 2 C code?”
Answer: printf() is part of the C standard library, which relies on an OS to handle output. In bare metal there is no OS and no libc, so we compile with -ffreestanding -nostdlib and implement our own print function using direct hardware access (INT 10h in real mode, video memory in protected mode).
- “What does a linker script do and why do you need one for bare-metal?”
Answer: Linker scripts control memory layout: where code/data sections go, the entry point, output format. For bare-metal, we need to: (1) place code at the exact load address (e.g., 0x8000), (2) output raw binary (not ELF) so the CPU can execute it directly, (3) define symbols for runtime (like BSS boundaries).
- “How do you ensure your C code doesn’t use features that require an OS?”
Answer: Use -ffreestanding (the hosted standard library is not assumed) and -nostdlib (don’t link libc). Avoid: dynamic memory (malloc), floating point (may need FPU init), global constructors, and standard headers other than the freestanding ones (stdint.h, stddef.h, stdarg.h, stdbool.h).
- “What happens if you forget to set up the stack before calling C code?”
Answer: Undefined behavior. C assumes a valid stack for function parameters, local variables, return addresses, and saved registers. Without a valid SS:SP, PUSH/POP/CALL/RET write wherever SS:SP happens to point, which could overwrite interrupt vectors, BIOS data, or your own code.
5.9 Books That Will Help
| Topic | Book | Specific Chapters/Pages |
|---|---|---|
| x86 Assembly Fundamentals | “Low-Level Programming” by Igor Zhirkov | Chapters 1-3 (Assembly basics), Chapter 8 (Disk I/O) |
| Disk I/O and BIOS | CS:APP (3rd ed.) by Bryant & O’Hallaron | Chapter 6.1 (Storage Technologies) pp. 598-612 |
| Linking and Loading | “Low-Level Programming” by Igor Zhirkov | Chapter 6 (Tool Chain) pp. 201-234 |
| Linker Scripts | GNU LD Manual | Chapter 3 (Linker Scripts) |
| x86 Protected Mode | Intel SDM Vol. 3A | Chapter 2 (System Architecture Overview) |
| Bootloader Architecture | “Operating Systems: Three Easy Pieces” | Appendix on Boot (available online free) |
| GDT and Segmentation | Intel SDM Vol. 3A | Chapter 3.4 (Logical and Linear Addresses) |
| BIOS Interrupts | Ralf Brown’s Interrupt List | INT 13h (Disk), INT 10h (Video) |
5.10 Implementation Phases
Phase 1: Basic Stage 1 (Days 1-3)
Goals:
- Stage 1 boots and prints a character
- Basic disk read code (even if hardcoded)
Tasks:
- Write minimal Stage 1 that boots and prints ‘1’
- Add INT 13h read code for a single sector
- Test that the read code returns successfully
Checkpoint: QEMU shows ‘1’ on screen and doesn’t immediately crash.
Phase 2: Load Stage 2 (Days 4-5)
Goals:
- Stage 1 successfully loads multiple sectors
- Jumps to Stage 2 entry point
Tasks:
- Extend Stage 1 to read N sectors
- Add retry logic for robustness
- Create minimal Stage 2 assembly that prints “Stage 2!”
- Verify Stage 2 executes
Checkpoint: QEMU shows “Stage 2 loaded” message from Stage 2 code.
Phase 3: Add C Support (Days 6-8)
Goals:
- Stage 2 can call C functions
- Basic print library works
Tasks:
- Write linker script for Stage 2
- Create assembly entry point that calls C main()
- Implement print functions in C
- Test C functions work
Checkpoint: Messages from C code appear on screen.
Phase 4: Protected Mode (Days 9-12)
Goals:
- GDT setup in C
- A20 enable
- Mode switch works
Tasks:
- Implement GDT setup function in C
- Implement A20 enable (fast A20 first, then keyboard controller fallback)
- Add assembly code for actual mode switch
- Verify protected mode works (write to video memory at 0xB8000)
Checkpoint: “Protected mode active!” appears on screen via direct video memory access.
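For the fast A20 step, the subtle part is the value written back to port 0x92: bit 1 enables A20, while bit 0 triggers a fast CPU reset. The pure function below (a hypothetical name) computes that value and can be tested on the host; the port I/O itself needs your own inb/outb inline-assembly wrappers and is shown only as a comment. Verifying A20 afterward with a memory wraparound test is still advisable, since port 0x92 is not supported on every system.

```c
#include <stdint.h>

/* System control port A (I/O port 0x92):
 *   bit 1 = A20 gate enable
 *   bit 0 = fast reset (writing 1 resets the CPU)
 * Compute the byte to write back from the byte read. */
uint8_t fast_a20_value(uint8_t port92) {
    return (uint8_t)((port92 | 0x02) & ~0x01);
}

/* Hypothetical use in Stage 2 (not runnable on a host):
 *   uint8_t v = inb(0x92);
 *   if (!(v & 0x02)) outb(0x92, fast_a20_value(v));
 */
```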
Phase 5: Polish (Days 13-14)
Goals:
- Clean code, good documentation
- Works on real hardware (USB boot)
Tasks:
- Add comments explaining each step
- Test on real hardware if possible
- Add memory detection (E820) as extension
- Write README
Checkpoint: Clean, documented, working bootloader.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Stage 2 address | 0x1000, 0x8000, 0x10000 | 0x8000 | Above Stage 1, leaves room for stack below, plenty of room above |
| Disk addressing | CHS vs LBA | LBA with CHS fallback | LBA is simpler, but CHS needed for old BIOSes |
| GDT location | In Stage 2 .data vs dedicated area | In Stage 2 .data | Simpler, linker handles placement |
| Stack location | Below Stage 1 vs above Stage 2 | Below Stage 1 (0x7C00 down) for real mode | Traditional, avoids overwriting loaded code |
| A20 method | Fast A20 vs keyboard controller | Fast A20 first, KC fallback | Fast A20 works on most systems, faster |
| C compiler | Native gcc -m32 vs cross-compiler | Cross-compiler if possible | Cleaner, avoids host library leakage |
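The table recommends LBA with a CHS fallback, and that fallback needs the standard LBA-to-CHS conversion. The sketch below is minimal and assumes the drive geometry (sectors per track and heads) is already known, for example from INT 13h AH=08h. The 1.44 MB floppy geometry of 18 sectors per track and 2 heads appears only as an example. The function names are hypothetical.

```c
#include <stdint.h>

struct chs {
    uint16_t cylinder;  /* 0-based; INT 13h AH=02h allows up to 1023 */
    uint8_t  head;      /* 0-based */
    uint8_t  sector;    /* 1-based! */
};

/* Converts a 0-based LBA to CHS for a drive with the given geometry. */
struct chs lba_to_chs(uint32_t lba, uint8_t sectors_per_track, uint8_t heads)
{
    struct chs r;
    r.cylinder = (uint16_t)(lba / ((uint32_t)sectors_per_track * heads));
    r.head     = (uint8_t)((lba / sectors_per_track) % heads);
    r.sector   = (uint8_t)((lba % sectors_per_track) + 1);
    return r;
}

/* Packs cylinder and sector into CX as INT 13h AH=02h expects:
   CH = cylinder bits 0-7, CL bits 6-7 = cylinder bits 8-9,
   CL bits 0-5 = sector. The head goes separately in DH. */
uint16_t chs_to_cx(struct chs c)
{
    return (uint16_t)(((c.cylinder & 0xFF) << 8) |
                      ((c.cylinder >> 2) & 0xC0) |
                      (c.sector & 0x3F));
}
```

For example, Stage 2 written immediately after the MBR starts at LBA 1, which is cylinder 0, head 0, sector 2. The "off by one" in Pitfall 1 comes from exactly this conversion.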
6. Testing Strategy
Test Categories
| Category | What to Test | How to Test |
|---|---|---|
| Stage 1 Load | Stage 1 boots | Character ‘1’ appears |
| Disk Read | Sectors read correctly | Stage 2 code runs |
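| Retry Logic | Handle transient failures | Hard to trigger in QEMU; at minimum, verify the code path exists (e.g., temporarily request a sector past the end of the image and check that the error path runs) |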
| C Integration | C functions callable | Message from C appears |
| GDT | GDT structure correct | Protected mode works |
| Mode Switch | Enter protected mode | Video memory write works |
Critical Test Cases
- Basic Boot Test
make run   # Expected: '1', "Stage 2 loaded!", then the protected-mode message
- Debug Session
make debug
# In GDB:
(gdb) set architecture i8086   # Disassemble as 16-bit real-mode code
(gdb) break *0x7c00            # Break at Stage 1 entry
(gdb) continue
(gdb) x/20i $eip               # View Stage 1 code
(gdb) break *0x8000            # Break at Stage 2 entry
(gdb) continue
(gdb) x/20i $eip               # View Stage 2 code
- Memory Verification
# In the QEMU monitor (Ctrl+Alt+2):
(qemu) xp /16xb 0x7c00         # Verify Stage 1 at 0x7C00
(qemu) xp /16xb 0x8000         # Verify Stage 2 at 0x8000
(qemu) info registers          # Check segment registers
- GDT Verification
# After the GDT is loaded, in the QEMU monitor:
(qemu) info registers          # The GDT= line shows the GDTR base and limit
(qemu) xp /24xb <gdtr_base>    # View GDT entries
Verifying Success
- Stage 1 displays ‘1’ character
- “Stage 2 loaded!” message appears
- C code message appears
- A20 enable message appears
- “Protected mode active!” in video memory
- No crashes or triple faults
- Works in QEMU with both -fda and -hda
7. Common Pitfalls & Debugging
Pitfall 1: Stage 2 Not Loading (Nothing After ‘1’)
Symptoms: Stage 1 prints ‘1’ but nothing else happens.
Possible Causes:
- Wrong sector number (remember: sectors are 1-indexed for CHS)
- Wrong destination address in ES:BX
- INT 13h returning error (check CF and AH)
- Stage 2 binary not actually on disk (build issue)
Debug Steps:
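    jnc .read_ok        ; After INT 13h: CF clear means the read succeeded
.error:                 ; CF set: AH holds the INT 13h status code
    mov cl, ah          ; Keep the code in CL (INT 10h needs AH)
    mov ch, 2           ; Print it as two hex digits
    xor bh, bh          ; Video page 0 (BX still holds the buffer offset)
.next_digit:
    rol cl, 4           ; Move the next nibble into the low 4 bits
    mov al, cl
    and al, 0x0F
    add al, '0'
    cmp al, '9'
    jbe .print
    add al, 7           ; Map 10..15 to 'A'..'F'
.print:
    mov ah, 0x0E        ; BIOS teletype output
    int 0x10
    dec ch
    jnz .next_digit
    jmp $               ; Hang so the code stays on screen
.read_ok: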
Fix: Verify that the disk image has Stage 2 at the expected sector and that ES:BX points to the intended load address.
Pitfall 2: Triple Fault on Jump to Stage 2
Symptoms: QEMU resets or freezes after Stage 1 completes.
Possible Causes:
- Stage 2 not at expected address in memory
- Jump address doesn’t match load address
- Stage 2 code corrupted or not built correctly
Debug Steps:
# In QEMU monitor:
(qemu) xp /16xb 0x8000 # Should show Stage 2 code
# If all zeros or 0xAA, Stage 2 didn't load
# With GDB:
(gdb) break *0x8000
(gdb) continue
# If breakpoint never hits, jump instruction is wrong
Fix: Ensure linker script sets origin to match load address (0x8000). Ensure far jump uses correct segment.
Pitfall 3: Protected Mode Triple Fault
Symptoms: System resets immediately after enabling protected mode.
Possible Causes:
- GDT entries incorrect (bad access bytes or granularity)
- GDTR not loaded correctly
- Far jump to wrong selector
- Interrupts enabled during the switch (an IRQ such as the timer arriving before a protected-mode IDT exists causes a triple fault)
Debug Steps:
// Print GDT base and limit before loading
void debug_gdt(void) {
print_hex((uint32_t)&gdt); // Should be in low memory
print_hex(sizeof(gdt) - 1); // Should be 23 (3 entries * 8 - 1)
}
Fix:
- Verify GDT entry format matches Intel SDM exactly
- Ensure CLI before mode switch
- Far jump selector should be 0x08 (offset to code segment in GDT)
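"GDTR not loaded correctly" often means the operand of LGDT was laid out wrong. LGDT reads a 6-byte pseudo-descriptor: a 16-bit limit followed by a 32-bit linear base address. If the C structure describing it gets padded, LGDT picks up a garbage base. The sketch below is minimal. It assumes GCC or Clang, since `__attribute__((packed))` is a compiler extension. The helper `gdtr_for` is hypothetical.

```c
#include <stdint.h>

/* Operand for LGDT: limit, then base, with no padding in between. */
struct gdtr {
    uint16_t limit;   /* Size of the GDT in bytes, minus 1 */
    uint32_t base;    /* Linear address of the first descriptor */
} __attribute__((packed));

/* Fills a GDTR for a table of `count` 8-byte descriptors at `base`. */
struct gdtr gdtr_for(uint32_t base, uint16_t count)
{
    struct gdtr r;
    r.limit = (uint16_t)(count * 8 - 1);
    r.base  = base;
    return r;
}
```

Because the base is a linear address, real-mode code running with a nonzero DS must add DS × 16 to the symbol's offset. When LGDT runs with a 16-bit operand size (the real-mode default), only the low 24 bits of the base are used, which is harmless for a GDT in low memory.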
Pitfall 4: C Code Crashes or Acts Strangely
Symptoms: C function calls produce garbage output or crash.
Possible Causes:
- Stack not set up correctly
- Wrong calling convention
- C code uses features requiring libc (stack protector, etc.)
- Segments not set up for C’s assumptions
Debug Steps:
# Disassemble C code to verify it's reasonable:
objdump -d build/main.o
# List undefined symbols; anything not defined in your own sources
# (e.g., memcpy or __stack_chk_fail) points to a libc dependency.
# Run nm on the ELF object, not the flat stage2.bin:
nm -u build/main.o
Fix:
- Add -fno-stack-protector to CFLAGS
- Ensure DS, ES, SS all point to the data segment (selector 0x10 in protected mode)
- Set ESP to a valid stack address before calling C
Pitfall 5: Linker Errors or Wrong Binary Output
Symptoms: Stage 2 binary is empty, wrong size, or has ELF headers.
Possible Causes:
- Linker script not used
- OUTPUT_FORMAT not binary
- Entry point symbol not found
Debug Steps:
# Check Stage 2 binary:
ls -la build/stage2.bin # Should be reasonable size (1KB+)
file build/stage2.bin # Should say "data" not "ELF"
hexdump -C build/stage2.bin | head # Should start with code, not 0x7F ELF
Fix:
- Ensure -T linker.ld is in the link command
- Ensure the linker script has OUTPUT_FORMAT(binary)
- Ensure the entry symbol matches ENTRY() in the linker script
Pitfall 6: A20 Gate Not Enabled
Symptoms: Memory above 1MB wraps to lower memory.
Possible Causes:
- A20 enable code not run
- Fast A20 method doesn’t work on this system
- Keyboard controller method has bug
Debug Steps:
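// Test A20 (from 32-bit protected mode, where 0x100000 is addressable).
// Avoids address 0, which holds the IVT and is a null pointer in C;
// volatile keeps the compiler from caching the reads and writes.
volatile uint8_t *low  = (volatile uint8_t *)0x00007DFE; // Inside Stage 1's sector
volatile uint8_t *high = (volatile uint8_t *)0x00107DFE; // low + 1 MB
uint8_t saved = *low;
*low  = 0x00;
*high = 0xFF;               // With A20 off, this write lands on *low
if (*low == 0xFF) {
    print("A20 DISABLED - memory wraps!\n");
} else {
    print("A20 enabled\n");
}
*low = saved;               // Restore the original byte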
Fix: Implement both fast A20 and keyboard controller fallback:
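void enable_a20(void) {
    // Fast A20 (System Control Port A): set bit 1 to enable A20 and
    // keep bit 0 clear, because writing 1 to bit 0 resets the machine
    uint8_t val = inb(0x92);
    if (!(val & 0x02))
        outb(0x92, (val | 0x02) & 0xFE);
    // If that didn't work, use the keyboard controller
    // ... (more complex, see OSDev wiki)
}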
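The wraparound in this pitfall comes from address arithmetic. A real-mode address is segment × 16 + offset, which can reach 0x10FFEF (FFFF:FFFF). With the A20 line disabled, bit 20 of every physical address is forced to zero, so 0x100000 aliases 0x000000 and 0x107DFE aliases 0x7DFE. The small host-side model below illustrates this; both function names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Linear address of a real-mode segment:offset pair (up to 0x10FFEF). */
uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Address that reaches memory: with A20 disabled, bit 20 is masked off. */
uint32_t a20_physical(uint32_t linear, bool a20_enabled)
{
    return a20_enabled ? linear : (linear & ~(UINT32_C(1) << 20));
}
```

This is also why the test above pairs 0x7DFE with 0x107DFE: the two addresses differ only in bit 20.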
8. Extensions & Challenges
After completing the basic project, try these extensions:
Extension 1: Extended INT 13h with LBA (Recommended)
Add support for LBA addressing using INT 13h AH=42h. First check for LBA support with AH=41h.
Extension 2: Memory Detection with E820
Implement INT 15h/E820 memory detection in Stage 2 and display available memory regions.
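Each E820 call takes the following inputs:

- EAX=0xE820
- EDX=0x534D4150 ('SMAP')
- ECX set to the buffer size
- EBX holding the continuation value (0 on the first call)

The call writes one entry to ES:DI and returns EBX=0 after the last entry. An entry is 20 bytes, or 24 if the BIOS supplies ACPI 3.0 extended attributes. Once the assembly loop has collected the entries, summarizing them is plain C. Below is a minimal sketch, assuming a hypothetical array already filled in real mode; type 1 marks usable RAM.

```c
#include <stdint.h>
#include <stddef.h>

/* One entry as returned by INT 15h, EAX=E820h (24-byte form). */
struct e820_entry {
    uint64_t base;
    uint64_t length;
    uint32_t type;        /* 1 = usable, 2 = reserved, 3 = ACPI reclaimable,
                             4 = ACPI NVS, 5 = unusable */
    uint32_t acpi_attrs;  /* ACPI 3.0 extended attributes, if provided */
} __attribute__((packed));

#define E820_USABLE 1

/* Sums the lengths of all usable regions (overlapping entries are not merged). */
uint64_t e820_usable_bytes(const struct e820_entry *entries, size_t count)
{
    uint64_t total = 0;
    for (size_t i = 0; i < count; i++) {
        if (entries[i].type == E820_USABLE)
            total += entries[i].length;
    }
    return total;
}

/* Returns a short name for an entry type, for display in Stage 2. */
const char *e820_type_name(uint32_t type)
{
    switch (type) {
    case 1:  return "usable";
    case 2:  return "reserved";
    case 3:  return "ACPI reclaimable";
    case 4:  return "ACPI NVS";
    case 5:  return "unusable";
    default: return "unknown";
    }
}
```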
Extension 3: Display Boot Drive Information
Query and display the boot drive’s parameters (sectors, heads, cylinders) using INT 13h AH=08h.
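INT 13h AH=08h packs the geometry into registers:

- CH holds the low 8 bits of the maximum cylinder number.
- CL bits 6-7 hold that number's high 2 bits.
- CL bits 0-5 hold the maximum sector number.
- DH holds the maximum head number.

Cylinders and heads are reported as maximum indices (0-based). Because sectors are 1-based, the sector value is already a count. Below is a minimal decoding sketch; `geometry_from_regs` is a hypothetical name.

```c
#include <stdint.h>

struct drive_geometry {
    uint16_t cylinders;
    uint16_t heads;
    uint8_t  sectors_per_track;
};

/* Decodes CX and DH as returned by INT 13h AH=08h (after checking CF). */
struct drive_geometry geometry_from_regs(uint16_t cx, uint8_t dh)
{
    uint8_t ch = (uint8_t)(cx >> 8);
    uint8_t cl = (uint8_t)(cx & 0xFF);
    struct drive_geometry g;
    g.cylinders         = (uint16_t)((((cl & 0xC0) << 2) | ch) + 1);
    g.heads             = (uint16_t)(dh + 1);    /* DH = 255 means 256 heads */
    g.sectors_per_track = cl & 0x3F;
    return g;
}
```

Multiplying the three values gives the CHS-addressable capacity in sectors. Using the decoded values in Stage 1's LBA-to-CHS math, instead of hardcoded floppy geometry, keeps reads correct when booting from -hda.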
Extension 4: Simple Boot Menu
Add a basic menu that waits for keypress and can boot different configurations.
Extension 5: VGA Text Mode Clear Screen
Implement screen clear and color text output in protected mode by writing to 0xB8000.
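In the default 80 × 25 VGA text mode, each cell at 0xB8000 is a 16-bit value. The low byte holds the character and the high byte holds the attribute: foreground color in bits 0-3, background in bits 4-6, and blink or bright background in bit 7. Below is a minimal sketch with hypothetical function names. It takes the buffer as a parameter, so the same functions can run against a host array in tests and be called with `(volatile uint16_t *)0xB8000` in Stage 2.

```c
#include <stdint.h>

#define VGA_COLS 80
#define VGA_ROWS 25

/* Builds one text-mode cell: attribute in the high byte, character low. */
static inline uint16_t vga_cell(char c, uint8_t fg, uint8_t bg)
{
    uint8_t attr = (uint8_t)(((bg & 0x7) << 4) | (fg & 0xF));
    return (uint16_t)(((uint16_t)attr << 8) | (uint8_t)c);
}

/* Fills the whole screen with spaces in the given colors. */
void vga_clear(volatile uint16_t *vga, uint8_t fg, uint8_t bg)
{
    for (int i = 0; i < VGA_COLS * VGA_ROWS; i++)
        vga[i] = vga_cell(' ', fg, bg);
}

/* Writes a NUL-terminated string at (row, col); no wrapping or scrolling. */
void vga_write_at(volatile uint16_t *vga, int row, int col,
                  const char *s, uint8_t fg, uint8_t bg)
{
    volatile uint16_t *p = vga + row * VGA_COLS + col;
    while (*s)
        *p++ = vga_cell(*s++, fg, bg);
}
```

For example, `vga_clear(vga, 7, 1)` gives light gray on blue, and `vga_write_at(vga, 0, 0, "Protected mode active!", 0xF, 1)` prints the Phase 4 message in white.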
Extension 6: Load a Simple Kernel
Create a minimal kernel that Stage 2 loads and jumps to. The kernel prints “Hello from kernel!”.
Extension 7: Error Message Display
Instead of single-character errors, implement a system to display descriptive error messages.
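Descriptive messages can come from a lookup table keyed by the INT 13h status code in AH. Below is a minimal sketch covering a few common codes as listed in Ralf Brown's Interrupt List. The table is deliberately partial, and `disk_error_message` is a hypothetical name.

```c
#include <stdint.h>
#include <stddef.h>

struct disk_error {
    uint8_t code;
    const char *message;
};

/* Partial table of INT 13h status codes (AH after a failed call). */
static const struct disk_error disk_errors[] = {
    { 0x01, "invalid function or parameter" },
    { 0x02, "address mark not found" },
    { 0x03, "disk write-protected" },
    { 0x04, "sector not found" },
    { 0x06, "disk changed" },
    { 0x09, "DMA across 64 KiB boundary" },
    { 0x10, "uncorrectable CRC/ECC error" },
    { 0x20, "controller failure" },
    { 0x40, "seek failed" },
    { 0x80, "timeout (drive not ready)" },
    { 0xAA, "drive not ready" },
};

/* Returns a message for `code`, or a generic fallback for unknown codes. */
const char *disk_error_message(uint8_t code)
{
    if (code == 0x00)
        return "success";
    for (size_t i = 0; i < sizeof disk_errors / sizeof disk_errors[0]; i++) {
        if (disk_errors[i].code == code)
            return disk_errors[i].message;
    }
    return "unknown disk error";
}
```

Stage 1 rarely has room for strings like these, so a practical split is to keep printing the raw hex code in Stage 1 and use the table only in Stage 2.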
9. Real-World Connections
How GRUB Uses Multi-Stage Loading
GRUB 2’s boot process:
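- boot.img (512 bytes, written to the MBR): reads the first sector of core.img from a location hardcoded at install time
- diskboot.img (512 bytes): the first sector of core.img; loads the rest of core.img
- core.img (variable size, kept small because the gap after the MBR is often only about 32 KB): the GRUB kernel plus enough modules, such as filesystem drivers, to reach /boot/grub, from which it loads everything else, including the boot menu and kernel loaders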
Your two-stage bootloader mirrors this architecture at a simpler level.
Industry Applications
- Embedded Systems: Most embedded bootloaders (U-Boot, Barebox) use multi-stage loading
- Security: Secure boot chains verify each stage before loading the next
- Firmware Updates: Multi-stage allows updating Stage 2 without touching Stage 1
- Recovery: Stage 1 can have fallback logic if Stage 2 fails
10. Resources
Essential References
- OSDev Wiki - Rolling Your Own Bootloader
- OSDev Wiki - Disk Access Using BIOS
- OSDev Wiki - GDT Tutorial
- Ralf Brown’s Interrupt List
- Intel SDM Vol. 3A - System Architecture
Additional Reading
- “Low-Level Programming” by Igor Zhirkov - Chapters 1-3, 6, 8
- “Operating Systems: Three Easy Pieces” - Boot appendix
- Linux kernel source: arch/x86/boot/ for real-world bootloader code
11. Self-Assessment Checklist
Before considering this project complete, verify:
Understanding
- I can explain why the MBR is limited to 512 bytes
- I understand the difference between CHS and LBA addressing
- I can describe what INT 13h does and how to use it
- I understand why we need a linker script for bare-metal C
- I can explain the GDT entry format
Implementation
- Stage 1 fits in 446 bytes (leaving room for partition table)
- Stage 1 includes retry logic for disk reads
- Stage 2 loads correctly and executes
- C code compiles without libc dependencies
- Protected mode transition works without triple fault
- Video memory writes work in protected mode
Quality
- Code is well-commented
- Makefile builds everything correctly
- Works in QEMU with make run
- Debuggable with make debug
12. Submission / Completion Criteria
Your implementation is complete when:
- Functional Criteria:
- make produces a bootable disk image
- make run boots in QEMU and shows all expected output
- Stage 1 → Stage 2 → C code → Protected mode works end-to-end
- Code Quality:
- Each file has a header comment explaining its purpose
- Complex code sections have explanatory comments
- No hardcoded magic numbers without explanation
- Documentation:
- README explains how to build and run
- Memory map is documented
- Any design decisions are explained
- Demonstration:
- Can explain any line of code if asked
- Can describe the boot flow from power-on to protected mode
- Can debug a problem introduced by changing code
Project 4 of 17 in the Bootloader Deep Dive series