LEARN BARE METAL PROGRAMMING
Learn Bare Metal Programming: From Blinking LEDs to Operating Systems
Goal: Deeply understand bare metal programming—writing code that runs directly on hardware without an operating system—from microcontrollers to x86 PCs, mastering CPU architecture, memory management, interrupts, and device drivers.
What is Bare Metal Programming?
Bare metal programming means writing code that runs directly on hardware: no operating system, no runtime, no abstractions. On a microcontroller your code is the first thing the CPU executes after reset; on a PC it runs as soon as the firmware hands over control. You control everything: memory, peripherals, interrupts, the CPU itself.
Traditional Programming: Bare Metal Programming:
┌─────────────────────────┐ ┌─────────────────────────┐
│ Your Application │ │ │
├─────────────────────────┤ │ Your Code │
│ Runtime/Libraries │ │ (Everything!) │
├─────────────────────────┤ │ │
│ Operating System │ │ │
├─────────────────────────┤ ├─────────────────────────┤
│ Hardware │ │ Hardware │
└─────────────────────────┘ └─────────────────────────┘
Why Learn Bare Metal Programming?
- Deep Understanding: Know exactly how computers work at the lowest level
- OS Development: Foundation for writing operating systems and kernels
- Embedded Systems: Required for microcontrollers, IoT, automotive, aerospace
- Performance: Remove OS and runtime overhead for real-time and performance-critical systems
- Security Research: Understand firmware, bootloaders, and hardware security
- Debugging Skills: Debug anything when you understand everything
Where Bare Metal Programming is Used
| Domain | Examples |
|---|---|
| Embedded Systems | Pacemakers, car ECUs, industrial controllers |
| Operating Systems | Linux kernel, Windows kernel, macOS kernel |
| Bootloaders | GRUB, U-Boot, BIOS/UEFI |
| Firmware | BIOS, hard drive firmware, GPU firmware |
| Real-Time Systems | Flight controllers, robotics, audio processing |
| Game Consoles | Retro console homebrew, emulators |
| IoT Devices | Sensors, actuators, smart devices |
Core Concepts
1. CPU Architecture
Different platforms have different architectures:
| Architecture | Common Uses | Characteristics |
|---|---|---|
| x86/x86-64 | PCs, servers | Complex (CISC), legacy modes, rich ecosystem |
| ARM Cortex-M | Microcontrollers | Simple, low power, no MMU |
| ARM Cortex-A | Phones, Raspberry Pi | Full-featured, MMU, multiple modes |
| AVR | Arduino | Very simple 8-bit, good for learning |
| RISC-V | Emerging | Open standard ISA, clean design |
2. Memory Map
Every system has a memory map defining where things are located:
x86 PC Memory Map (Real Mode):
┌──────────────────────────────────────┐ 0xFFFFF (1MB)
│ BIOS ROM │
├──────────────────────────────────────┤ 0xF0000
│ Reserved │
├──────────────────────────────────────┤ 0xC0000
│ Video Memory │
├──────────────────────────────────────┤ 0xA0000
│ Extended BIOS Data Area │
├──────────────────────────────────────┤ 0x9FC00
│ │
│ Free Memory │
│ (Your bootloader goes here) │
│ │
├──────────────────────────────────────┤ 0x7E00
│ Bootloader (512 bytes) │
├──────────────────────────────────────┤ 0x7C00
│ Free Memory │
├──────────────────────────────────────┤ 0x00500
│ BIOS Data Area │
├──────────────────────────────────────┤ 0x00400
│ Interrupt Vector Table │
└──────────────────────────────────────┘ 0x00000
3. Registers
Direct register manipulation is the core of bare metal:
// AVR example: Set pin 5 of Port B as output, then high
DDRB |= (1 << 5); // Data Direction Register
PORTB |= (1 << 5); // Port output register
// ARM (STM32F4) example: write GPIOA_ODR to drive PA5 high
// (a plain assignment also overwrites every other bit in the register)
*(volatile uint32_t*)0x40020014 = 0x00000020;
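The read-modify-write pattern behind `|=` can be wrapped in small helpers. This is a minimal sketch, not a vendor API: the names `reg_set_bits`, `reg_clear_bits`, and `reg_test_bit` are hypothetical, and the register is passed in as a pointer so the code can be tried on a host with an ordinary variable standing in for hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Set the bits in `mask`, leaving every other bit of the register unchanged. */
void reg_set_bits(volatile uint32_t *reg, uint32_t mask)
{
    assert(reg != NULL);
    *reg |= mask;
}

/* Clear the bits in `mask`, leaving every other bit unchanged. */
void reg_clear_bits(volatile uint32_t *reg, uint32_t mask)
{
    assert(reg != NULL);
    *reg &= ~mask;
}

/* Read a single bit (0-31) of the register. */
bool reg_test_bit(const volatile uint32_t *reg, unsigned bit)
{
    assert(reg != NULL && bit < 32);
    return ((*reg >> bit) & 1u) != 0;
}
```

On real hardware the pointer would be a fixed address taken from the datasheet, cast to `volatile uint32_t *`.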
4. Interrupts
Hardware events that pause normal execution:
Interrupt Flow:
┌──────────────────┐
Normal execution ───────────────────►│ Interrupt occurs │
└────────┬─────────┘
│
┌────────▼─────────┐
│ Save CPU state │
│ (registers, PC) │
└────────┬─────────┘
│
┌────────▼─────────┐
│ Jump to ISR │
│ (handler code) │
└────────┬─────────┘
│
┌────────▼─────────┐
│ Execute handler │
└────────┬─────────┘
│
┌────────▼─────────┐
│ Restore state │
│ Resume execution │
└──────────────────┘
5. Boot Process
How a computer starts:
x86 Boot Process:
┌─────────────────┐
│ Power On / Reset│
└────────┬────────┘
│
┌────────▼────────┐
│ CPU starts at │ (Real Mode, 16-bit)
│ 0xFFFF:0000 │
│ (Reset Vector) │
└────────┬────────┘
│
┌────────▼────────┐
│ BIOS/UEFI runs │ Initialize hardware, POST
└────────┬────────┘
│
┌────────▼────────┐
│ Load boot sector│ First 512 bytes from disk
│ to 0x7C00 │
└────────┬────────┘
│
┌────────▼────────┐
│ Your bootloader │ You take control here!
│ runs at 0x7C00 │
└────────┬────────┘
│
┌────────▼────────┐
│ Switch to │ Set up GDT, enable A20
│ Protected Mode │
└────────┬────────┘
│
┌────────▼────────────┐
│ Load & run kernel │
└─────────────────────┘
Hardware Platforms
Recommended Learning Path by Platform
| Order | Platform | Why | Cost |
|---|---|---|---|
| 1 | AVR (Arduino Uno) | Simplest, 8-bit, great docs | ~$25 |
| 2 | STM32 (Blue Pill/Nucleo) | ARM Cortex-M, professional | ~$5-15 |
| 3 | Raspberry Pi | Full ARM system, can also run Linux | ~$35-75 |
| 4 | x86 (QEMU) | PC architecture, free emulation | Free |
| 5 | x86 (Real Hardware) | Ultimate test | Existing PC |
Project List
Projects progress from simple to complex, covering multiple platforms and concepts.
Project 1: Blink LED on AVR (Arduino Bare Metal)
- File: LEARN_BARE_METAL_PROGRAMMING.md
- Main Programming Language: C
- Alternative Programming Languages: Assembly (AVR ASM)
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 1: Beginner
- Knowledge Area: Embedded Systems / GPIO / Registers
- Software or Tool: avr-gcc, avrdude, Arduino Uno
- Main Book: “AVR Workshop” by John Boxall
What you’ll build: Blink an LED on an Arduino Uno without using the Arduino framework—pure C, direct register manipulation, compiled with avr-gcc.
Why it teaches bare metal: This strips away all abstractions. No digitalWrite(), no setup(), no loop(). Just you, the datasheet, and the hardware. You’ll learn that “magic” Arduino functions are just register writes.
Core challenges you’ll face:
- Reading the ATmega328P datasheet → maps to understanding hardware documentation
- Memory-mapped I/O registers → maps to how all hardware is controlled
- Setting up the toolchain → maps to cross-compilation basics
- Timing without delay() → maps to hardware timers
Key Concepts:
- AVR Registers: ATmega328P datasheet (Microchip)
- GPIO Programming: “AVR Workshop” Ch. 2 - Boxall
- Makefiles: “The GNU Make Book” Ch. 1
- Cross-Compilation: avr-gcc documentation
Difficulty: Beginner
Time estimate: Weekend
Prerequisites: Basic C programming, ability to read a datasheet
Real world outcome:
$ avr-gcc -mmcu=atmega328p -Os -o blink.elf blink.c
$ avr-objcopy -O ihex blink.elf blink.hex
$ avrdude -p m328p -c arduino -P /dev/ttyACM0 -b 115200 -U flash:w:blink.hex
avrdude: AVR device initialized
avrdude: writing flash (176 bytes)
avrdude: verifying flash...
avrdude: 176 bytes of flash verified
# LED on pin 13 starts blinking!
# Compare: Arduino IDE version is ~900 bytes, ours is 176 bytes!
Implementation Hints:
The ATmega328P has Port B controlling pins 8-13. Pin 13 is PB5 (bit 5 of Port B).
Key registers:
- DDRB (0x24): Data Direction Register B - set bits to 1 for output
- PORTB (0x25): Port B Data Register - write to set pin high/low
- PINB (0x23): Port B Input Pins - read pin state
Minimal blink structure:
1. Include <avr/io.h> for register definitions
2. Create a delay function using volatile loop counter
3. In main():
   - Set DDRB bit 5 as output (DDRB |= (1 << PB5))
   - Infinite loop:
     - Set PORTB bit 5 high
     - Call delay
     - Clear PORTB bit 5 (pin low)
     - Call delay
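The structure above can be sketched as follows. This is an illustration under stated assumptions, not the only way to write it: on the target, `<avr/io.h>` supplies `DDRB`, `PORTB`, and `PB5`, while the `#else` branch defines host stand-ins so the logic can be compiled and checked off-target. The function names (`led_init`, `blink_once`, and so on) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#ifdef __AVR__
#include <avr/io.h>             /* real DDRB, PORTB, PB5 definitions */
#else
/* Host stand-ins (assumption) so the logic compiles without avr-libc. */
volatile uint8_t DDRB, PORTB;
#define PB5 5
#endif

/* Crude busy-wait; `volatile` keeps the compiler from deleting the loop. */
void delay_loop(uint32_t count)
{
    for (volatile uint32_t i = 0; i < count; i++) { }
}

void led_init(void) { DDRB  |= (uint8_t)(1u << PB5); }    /* PB5 as output */
void led_on(void)   { PORTB |= (uint8_t)(1u << PB5); }    /* pin high */
void led_off(void)  { PORTB &= (uint8_t)~(1u << PB5); }   /* pin low */

/* One on/off cycle. On the target, main() would call led_init() once
   and then call blink_once() inside for (;;). */
void blink_once(uint32_t half_period)
{
    assert(half_period > 0);    /* zero would make the blink invisible */
    led_on();
    delay_loop(half_period);
    led_off();
    delay_loop(half_period);
}
```

The loop count needed for a visible blink depends on clock speed and optimization level, which is one reason the later timer project replaces this delay.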
Questions to guide you:
- What is the CPU clock speed? (16 MHz with external crystal)
- Why use volatile? (It stops the compiler from optimizing away the delay loop)
- What happens at address 0x0000 when the chip powers on? (Reset vector)
Learning milestones:
- LED blinks → You understand GPIO registers
- Code is smaller than Arduino → You understand abstraction overhead
- You can read the datasheet → You can work with any hardware
- Timer-based delay → You understand hardware peripherals
Project 2: UART Serial Communication (AVR)
- File: LEARN_BARE_METAL_PROGRAMMING.md
- Main Programming Language: C
- Alternative Programming Languages: Assembly
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 2: Intermediate
- Knowledge Area: Serial Communication / UART
- Software or Tool: avr-gcc, screen/minicom
- Main Book: “Make: AVR Programming” by Elliot Williams
What you’ll build: A bare metal UART driver that can send and receive data over serial, implementing printf-style output for debugging.
Why it teaches bare metal: UART is your window into a bare metal system. Without an OS, you have no terminal, no printf, no debugger. Building UART first gives you debugging capability for all future projects.
Core challenges you’ll face:
- Baud rate calculation → maps to clock dividers and timing
- Polling vs interrupts → maps to I/O handling strategies
- Ring buffers → maps to data structure implementation
- Implementing printf → maps to variadic functions and formatting
Key Concepts:
- UART Protocol: “Make: AVR Programming” Ch. 5 - Williams
- Baud Rate Calculation: ATmega328P datasheet Section 20
- Ring Buffers: Common embedded pattern
- Interrupt-Driven I/O: “Make: AVR Programming” Ch. 7
Difficulty: Intermediate
Time estimate: 1 week
Prerequisites: Project 1, understanding of serial communication
Real world outcome:
$ make flash
avrdude: writing flash...
$ screen /dev/ttyACM0 115200
=== AVR Bare Metal Serial Demo ===
System initialized.
> hello
You typed: hello
> led on
LED is now ON
> led off
LED is now OFF
> status
Uptime: 42 seconds
Temperature: 23.5C (from ADC)
Free RAM: 1847 bytes
Implementation Hints:
UART registers on ATmega328P:
- UBRR0H/UBRR0L: Baud rate register (16-bit)
- UCSR0A: Status register (data ready, transmit complete)
- UCSR0B: Control register (enable TX/RX, interrupts)
- UCSR0C: Frame format (8N1, etc.)
- UDR0: Data register (read/write)
Baud rate calculation:
UBRR = (F_CPU / (16 * BAUD)) - 1
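For 16 MHz and 115200 baud: UBRR = (16000000 / (16 * 115200)) - 1 ≈ 7.68, which rounds to 8. The actual rate is then 16000000 / (16 * 9) ≈ 111,111 baud, about 3.5% low. Setting the U2X0 double-speed bit (divider 8 instead of 16) with UBRR = 16 brings the error down to about +2.1%.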
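The baud rate arithmetic is easy to get wrong by one because of integer truncation, so a small helper that rounds to the nearest divider can help. This is a sketch; `ubrr_for` and `actual_baud` are hypothetical names, and the `double_speed` flag models the U2X0 bit (divider 8 instead of 16).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Clock cycles per bit divider: 16 normally, 8 when U2X0 is set. */
static uint32_t uart_divider(bool double_speed)
{
    return double_speed ? 8u : 16u;
}

/* UBRR rounded to nearest: round(f_cpu / (divider * baud)) - 1.
   The caller must still check that the result fits the 12-bit UBRR field. */
uint16_t ubrr_for(uint32_t f_cpu, uint32_t baud, bool double_speed)
{
    assert(baud > 0);
    uint32_t div = uart_divider(double_speed) * baud;
    uint32_t q = (f_cpu + div / 2u) / div;   /* rounded quotient */
    assert(q >= 1);
    return (uint16_t)(q - 1u);
}

/* Baud rate the hardware actually produces for a given UBRR value. */
uint32_t actual_baud(uint32_t f_cpu, uint16_t ubrr, bool double_speed)
{
    return f_cpu / (uart_divider(double_speed) * ((uint32_t)ubrr + 1u));
}
```

Comparing `actual_baud` with the requested rate shows the error before anything is flashed. Receivers generally tolerate only a few percent of mismatch.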
UART initialization steps:
- Set baud rate in UBRR0H/UBRR0L
- Enable transmitter and receiver in UCSR0B
- Set frame format (8 data bits, 1 stop bit, no parity) in UCSR0C
Transmit a character:
- Wait for UDRE0 flag in UCSR0A (transmit buffer empty)
- Write character to UDR0
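Once polled transmit works, the interrupt-driven variant usually pairs the UART ISR with the ring buffer listed under the core challenges. The following is one possible single-producer, single-consumer layout, offered as a sketch: the type and function names are hypothetical, and the size of 16 is an arbitrary choice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RB_SIZE 16u   /* must be a power of two so the index mask works */

typedef struct {
    volatile uint8_t head;     /* next slot to write (producer, e.g. RX ISR) */
    volatile uint8_t tail;     /* next slot to read (consumer, main loop) */
    uint8_t data[RB_SIZE];
} ring_buffer;

void rb_init(ring_buffer *rb)
{
    assert(rb != NULL);
    rb->head = 0;
    rb->tail = 0;
}

bool rb_is_empty(const ring_buffer *rb)
{
    return rb->head == rb->tail;
}

/* Returns false (and drops the byte) when the buffer is full. */
bool rb_put(ring_buffer *rb, uint8_t byte)
{
    uint8_t next = (uint8_t)((rb->head + 1u) & (RB_SIZE - 1u));
    if (next == rb->tail)
        return false;          /* one slot is always kept free */
    rb->data[rb->head] = byte;
    rb->head = next;
    return true;
}

/* Returns false when there is nothing to read. */
bool rb_get(ring_buffer *rb, uint8_t *out)
{
    if (rb_is_empty(rb))
        return false;
    *out = rb->data[rb->tail];
    rb->tail = (uint8_t)((rb->tail + 1u) & (RB_SIZE - 1u));
    return true;
}
```

Keeping one slot free distinguishes "full" from "empty" without a separate counter, so the ISR and the main loop each write only their own index.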
Learning milestones:
- Characters appear in terminal → You understand UART TX
- Can receive input → You understand UART RX
- Interrupt-driven works → You understand hardware interrupts
- Printf works → You can debug any future project
Project 3: Hardware Timer and PWM (AVR)
- File: LEARN_BARE_METAL_PROGRAMMING.md
- Main Programming Language: C
- Alternative Programming Languages: Assembly
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 2: Intermediate
- Knowledge Area: Timers / PWM / Interrupts
- Software or Tool: avr-gcc, oscilloscope (optional)
- Main Book: “AVR Workshop” by John Boxall
What you’ll build: Configure hardware timers for precise timing and PWM output—control LED brightness, generate tones, measure time accurately.
Why it teaches bare metal: Software delays are imprecise and waste CPU cycles. Hardware timers are fundamental to embedded systems: real-time scheduling, motor control, audio generation, and power management all depend on them.
Core challenges you’ll face:
- Timer modes (Normal, CTC, PWM) → maps to understanding timer hardware
- Prescaler selection → maps to timing calculations
- Timer interrupts → maps to periodic task execution
- PWM duty cycle → maps to analog output from digital
Key Concepts:
- Timer/Counter Registers: ATmega328P datasheet Section 15-16
- PWM Theory: “AVR Workshop” Ch. 5
- Interrupt Service Routines: “AVR Workshop” Ch. 7
- Prescalers: Clock division for timing
Difficulty: Intermediate
Time estimate: 1 week
Prerequisites: Projects 1-2, basic understanding of interrupts
Real world outcome:
$ make flash
# LED fades in and out smoothly using PWM
# Buzzer plays a melody using timer-generated frequencies
# Serial output shows precise 1-second timestamps
=== Timer Demo ===
[0.000s] System started
[1.000s] Tick
[2.000s] Tick
[2.500s] Button pressed! (measured with input capture)
[3.000s] Tick
PWM Demo:
LED brightness: 0% -> 100% -> 0% (smooth fade)
Tone Demo:
Playing note A4 (440 Hz) for 500ms
Playing note C5 (523 Hz) for 500ms
Implementation Hints:
ATmega328P has three timers:
- Timer0: 8-bit (pins OC0A/OC0B = pins 6/5)
- Timer1: 16-bit (pins OC1A/OC1B = pins 9/10)
- Timer2: 8-bit (pins OC2A/OC2B = pins 11/3)
CTC mode (Clear Timer on Compare Match):
- Set WGM12 bit in TCCR1B for CTC mode
- Set prescaler bits in TCCR1B
- Set compare value in OCR1A
- Enable compare match interrupt with OCIE1A in TIMSK1
- Write ISR(TIMER1_COMPA_vect) handler
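Choosing the prescaler and compare value for the CTC steps above is plain arithmetic that can be checked on a host first. A minimal sketch, assuming the usual Timer1 prescaler options of 1, 8, 64, 256, and 1024; `ctc_config` and `ctc_config_for` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t prescaler;   /* one of 1, 8, 64, 256, 1024 */
    uint16_t ocr;         /* value to write to OCR1A */
} ctc_config;

/* Find the smallest prescaler whose compare value fits in 16 bits.
   In CTC mode the interrupt rate is f_cpu / (prescaler * (OCR1A + 1)). */
bool ctc_config_for(uint32_t f_cpu, uint32_t irq_hz, ctc_config *out)
{
    static const uint16_t prescalers[] = { 1, 8, 64, 256, 1024 };
    assert(irq_hz > 0 && out != NULL);
    for (size_t i = 0; i < sizeof prescalers / sizeof prescalers[0]; i++) {
        uint64_t denom = (uint64_t)prescalers[i] * irq_hz;
        uint64_t ticks = f_cpu / denom;          /* timer ticks per interrupt */
        if (ticks >= 1 && ticks <= 65536u) {
            out->prescaler = prescalers[i];
            out->ocr = (uint16_t)(ticks - 1u);
            return true;
        }
    }
    return false;   /* rate is out of range for a 16-bit timer */
}
```

For a 1-second tick at 16 MHz this selects prescaler 256 with a compare value of 62499. Smaller prescalers give finer resolution, which is why the search starts from 1.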
PWM for LED brightness:
- Set WGM00 and WGM01 for Fast PWM mode
- Set COM0A1 for non-inverting mode
- Set prescaler
- Write duty cycle (0-255) to OCR0A
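The duty cycle value and the resulting PWM frequency follow directly from the 8-bit counter range. The helpers below are a sketch with hypothetical names; they only compute the numbers and leave the register writes to the steps listed above.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a brightness percentage (0-100) to an 8-bit compare value,
   rounding to the nearest step. */
uint8_t pwm_duty_from_percent(uint8_t percent)
{
    assert(percent <= 100);
    return (uint8_t)(((uint32_t)percent * 255u + 50u) / 100u);
}

/* Fast PWM on an 8-bit timer counts 0..255, so one period is 256 ticks. */
uint32_t fast_pwm_hz(uint32_t f_cpu, uint16_t prescaler)
{
    assert(prescaler > 0);
    return f_cpu / ((uint32_t)prescaler * 256u);
}
```

With a 16 MHz clock and a prescaler of 64, the PWM frequency comes out just under 1 kHz, comfortably above the rate at which LED flicker is visible.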
Learning milestones:
- Precise timing with interrupts → You understand timer hardware
- LED fades smoothly → You understand PWM
- Generates audio tones → You understand frequency generation
- Measured input timing → You understand input capture
Project 4: Raspberry Pi Bare Metal - Hello World
- File: LEARN_BARE_METAL_PROGRAMMING.md
- Main Programming Language: C
- Alternative Programming Languages: Assembly (ARM)
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 3: Advanced
- Knowledge Area: ARM Architecture / Boot Process
- Software or Tool: aarch64-none-elf-gcc (kernel8.img is a 64-bit image; arm-none-eabi-gcc only targets 32-bit ARM), Raspberry Pi 3/4
- Main Book: “Bare-metal programming for ARM” by Daniels Umanovskis
What you’ll build: A bare metal “kernel” for Raspberry Pi that boots, initializes UART, and prints “Hello World” to the serial console—no Linux, no OS, just your code.
Why it teaches bare metal: The Raspberry Pi is a real ARM computer with GPU, RAM, peripherals. Getting code to run on it teaches the full boot process, linker scripts, and ARM architecture fundamentals.
Core challenges you’ll face:
- Understanding the Pi boot process → maps to how real systems boot
- Writing a linker script → maps to memory layout control
- ARM assembly basics → maps to architecture fundamentals
- UART on BCM2837 → maps to ARM peripheral access
Key Concepts:
- RPi Boot Process: bztsrc/raspi3-tutorial
- ARM Assembly: “Bare-metal programming for ARM” Ch. 3 - Umanovskis
- Linker Scripts: “Bare-metal programming for ARM” Ch. 4
- BCM2837 Peripherals: BCM2837 ARM Peripherals Manual
Difficulty: Advanced
Time estimate: 2 weeks
Prerequisites: Projects 1-3, basic ARM knowledge
Real world outcome:
$ make
aarch64-none-elf-gcc -mcpu=cortex-a53 -fpic -ffreestanding -c start.S -o start.o
aarch64-none-elf-gcc -mcpu=cortex-a53 -fpic -ffreestanding -c kernel.c -o kernel.o
aarch64-none-elf-ld -T linker.ld -o kernel.elf start.o kernel.o
aarch64-none-elf-objcopy kernel.elf -O binary kernel8.img
$ ls -la kernel8.img
-rw-r--r-- 1 user user 4096 Dec 21 10:30 kernel8.img
# Copy kernel8.img to SD card with bootcode.bin and start.elf
$ screen /dev/ttyUSB0 115200
=== Raspberry Pi 3 Bare Metal ===
UART initialized at 115200 baud
Hello from bare metal!
CPU: Cortex-A53 @ 1.2GHz
RAM: 1024 MB
Implementation Hints:
Raspberry Pi 3 boot process:
- GPU loads bootcode.bin from SD card
- GPU runs bootcode.bin, loads start.elf
- start.elf initializes ARM, loads kernel8.img to 0x80000
- ARM starts executing at 0x80000
Linker script must set origin to 0x80000:
SECTIONS {
. = 0x80000;
.text : { *(.text.boot) *(.text) }
.rodata : { *(.rodata) }
.data : { *(.data) }
.bss : { *(.bss) }
}
Startup assembly needs to:
- Check processor ID, halt secondary cores
- Set up stack pointer
- Clear BSS section
- Jump to C main function
UART on BCM2837:
- Base address: 0x3F201000
- Data register: base + 0x00
- Flag register: base + 0x18
- Wait for TX FIFO not full before writing
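The transmit path described above might be sketched like this. The struct covers only the two registers named in the list, `pl011_regs`, `pl011_putc`, and `pl011_puts` are hypothetical names, and the "TX FIFO full" flag is assumed to be bit 5 of the flag register, as in the PL011 documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Only the registers used here; the gap in between is padding. */
typedef struct {
    volatile uint32_t dr;        /* 0x00: data register */
    uint32_t reserved[5];        /* 0x04-0x14: not used in this sketch */
    volatile uint32_t fr;        /* 0x18: flag register */
} pl011_regs;

#define PL011_FR_TXFF (1u << 5)  /* transmit FIFO full (assumed bit position) */

/* Send one character, spinning until the FIFO has room. */
void pl011_putc(pl011_regs *uart, char c)
{
    assert(uart != NULL);
    while (uart->fr & PL011_FR_TXFF) { }
    uart->dr = (uint32_t)(unsigned char)c;
}

/* Send a NUL-terminated string. */
void pl011_puts(pl011_regs *uart, const char *s)
{
    while (*s)
        pl011_putc(uart, *s++);
}
```

On the Pi 3 the pointer would be the base address given above, cast to `pl011_regs *`. Passing it as a parameter lets the same code run against a plain struct on a host.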
Learning milestones:
- Pi boots your code → You understand the boot process
- UART works → You understand ARM peripheral access
- Can blink the ACT LED → You understand GPIO on BCM2837
- Multi-core awareness → You understand SMP basics
Project 5: x86 Bootloader - Real Mode
- File: LEARN_BARE_METAL_PROGRAMMING.md
- Main Programming Language: Assembly (x86)
- Alternative Programming Languages: None (must be assembly for boot sector)
- Coolness Level: Level 5: Pure Magic
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 4: Expert
- Knowledge Area: x86 Architecture / Boot Process
- Software or Tool: NASM, QEMU, dd
- Main Book: “Write Great Code, Volume 1” by Randall Hyde
What you’ll build: A 512-byte boot sector that the BIOS loads and executes—printing “Hello” using BIOS interrupts, demonstrating you control the machine from power-on.
Why it teaches bare metal: This is the ultimate bare metal experience on PC. Your code runs before any OS. You’ll understand real mode, BIOS services, and the x86 boot process that every PC uses.
Core challenges you’ll face:
- 512-byte limit → maps to extreme code optimization
- 16-bit real mode → maps to x86 legacy architecture
- BIOS interrupts → maps to pre-OS I/O
- Boot signature (0xAA55) → maps to boot protocol
Key Concepts:
- x86 Real Mode: OSDev Wiki - Real Mode
- BIOS Interrupts: Ralf Brown’s Interrupt List
- Boot Sector: OSDev Wiki - Bootloader
- x86 Assembly: “Write Great Code, Volume 1” Ch. 4-6
Difficulty: Expert
Time estimate: 1-2 weeks
Prerequisites: x86 assembly basics, understanding of memory segments
Real world outcome:
$ nasm -f bin boot.asm -o boot.bin
$ ls -la boot.bin
-rw-r--r-- 1 user user 512 Dec 21 10:30 boot.bin
$ xxd boot.bin | tail -2
000001f0: 0000 0000 0000 0000 0000 0000 0000 55aa ..............U.
^^^^
Boot signature!
$ qemu-system-x86_64 -drive format=raw,file=boot.bin
# QEMU window shows:
Hello from the boot sector!
Implementation Hints:
Boot sector structure:
- [BITS 16] - Tell assembler we’re in 16-bit mode
- [ORG 0x7C00] - BIOS loads us to this address
- Set up segment registers (DS, ES, SS to 0)
- Set stack pointer below 0x7C00
- Print message using BIOS int 0x10 (ah=0x0E for teletype)
- Infinite loop (jmp $)
- Pad to 510 bytes with zeros
- Boot signature: dw 0xAA55
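The boot sector itself has to be assembly, but a small host-side check of the finished image can catch a missing or misplaced signature before it reaches QEMU. This is a sketch of such a check; `has_boot_signature` is a hypothetical helper, not part of any toolchain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BOOT_SECTOR_SIZE 512u

/* `dw 0xAA55` is stored little-endian, so the last two bytes of the
   sector are 0x55 followed by 0xAA. */
bool has_boot_signature(const uint8_t *sector, size_t len)
{
    assert(sector != NULL);
    return len == BOOT_SECTOR_SIZE
        && sector[510] == 0x55
        && sector[511] == 0xAA;
}
```

The byte order matches the `55aa` pair visible at offset 0x1FE in the xxd dump shown earlier.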
BIOS teletype output (int 0x10, ah=0x0E):
- AL = character to print
- BH = page number (0)
- BL = foreground color (graphics modes)
To print a string, loop through each character, loading into AL and calling int 0x10.
Learning milestones:
- Boot sector runs → You understand PC boot process
- Prints to screen → You understand BIOS services
- Loads more sectors → You can load a kernel
- Runs in QEMU and real hardware → It really works!
Project 6: x86 Protected Mode Kernel
- File: LEARN_BARE_METAL_PROGRAMMING.md
- Main Programming Language: C with Assembly
- Alternative Programming Languages: Rust
- Coolness Level: Level 5: Pure Magic
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 4: Expert
- Knowledge Area: x86 Protected Mode / GDT / Memory
- Software or Tool: NASM, GCC cross-compiler, QEMU
- Main Book: “Operating Systems: Three Easy Pieces” by Arpaci-Dusseau
What you’ll build: A kernel that transitions from real mode to 32-bit protected mode, sets up a GDT, and runs C code with full memory access—the foundation for an operating system.
Why it teaches bare metal: Protected mode is where real OSes live. You’ll understand segmentation, privilege levels, and the transition from 16-bit to 32-bit—the most critical step in x86 system programming.
Core challenges you’ll face:
- Creating the GDT → maps to x86 segmentation
- Switching to protected mode → maps to CPU mode transitions
- Calling C from assembly → maps to ABI and calling conventions
- No BIOS in protected mode → maps to writing your own drivers
Key Concepts:
- Protected Mode: OSDev Wiki - Protected Mode
- Global Descriptor Table: OSDev Wiki - GDT
- x86 ABI: System V ABI for i386
- VGA Text Mode: Direct video memory access
Difficulty: Expert
Time estimate: 2-3 weeks
Prerequisites: Project 5, understanding of x86 segments
Real world outcome:
$ make
nasm -f elf32 boot.asm -o boot.o
i686-elf-gcc -c kernel.c -o kernel.o -ffreestanding
i686-elf-ld -T linker.ld -o kernel.elf boot.o kernel.o
$ qemu-system-i386 -kernel kernel.elf
# QEMU shows VGA text output:
================================================================================
My x86 Protected Mode Kernel v0.1
================================================================================
[OK] GDT loaded at 0x00000800
[OK] Switched to 32-bit protected mode
[OK] VGA driver initialized (80x25)
Implementation Hints:
GDT needs at least 3 entries:
- Null descriptor (required, all zeros)
- Code segment (base=0, limit=0xFFFFF with 4 KiB granularity, i.e. the full 4 GiB; executable)
- Data segment (base=0, limit=0xFFFFF with 4 KiB granularity, i.e. the full 4 GiB; read/write)
Each GDT entry is 8 bytes with a complex structure encoding base, limit, access rights, and flags.
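The "complex structure" is easier to see as code that packs the fields. The sketch below builds one descriptor as a 64-bit value. `gdt_encode` is a hypothetical name, and the access bytes mentioned in the note (0x9A for ring-0 code, 0x92 for ring-0 data) and the flags nibble 0xC (4 KiB granularity, 32-bit) are commonly used values, not ones given in the text above.

```c
#include <assert.h>
#include <stdint.h>

/* Pack base (32 bits), limit (20 bits), access byte and flags nibble
   into the 8-byte x86 segment descriptor layout. */
uint64_t gdt_encode(uint32_t base, uint32_t limit, uint8_t access, uint8_t flags)
{
    assert(limit <= 0xFFFFFu);                       /* limit is a 20-bit field */
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFFu);                /* bits 0-15:  limit low   */
    d |= (uint64_t)(base & 0xFFFFFFu) << 16;         /* bits 16-39: base low    */
    d |= (uint64_t)access << 40;                     /* bits 40-47: access byte */
    d |= (uint64_t)((limit >> 16) & 0xFu) << 48;     /* bits 48-51: limit high  */
    d |= (uint64_t)(flags & 0xFu) << 52;             /* bits 52-55: flags       */
    d |= (uint64_t)((base >> 24) & 0xFFu) << 56;     /* bits 56-63: base high   */
    return d;
}
```

A flat code segment, `gdt_encode(0, 0xFFFFF, 0x9A, 0xC)`, packs to 0x00CF9A000000FFFF. Base and limit are each split across two fields, which is a holdover from the 80286 descriptor format.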
Switching to protected mode:
- Disable interrupts (cli)
- Load GDT with lgdt instruction
- Set CR0.PE bit (Protection Enable)
- Far jump to 32-bit code segment to flush pipeline
- Set up data segment registers
- Set up stack
- Call C main function
VGA text mode memory is at 0xB8000, each character is 2 bytes (character + attribute).
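A minimal sketch of that cell format is shown here. The buffer is passed in as a pointer so the code can run against an ordinary array on a host; in the kernel it would be `(volatile uint16_t *)0xB8000`. The names `vga_cell` and `vga_write_at` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VGA_COLS 80
#define VGA_ROWS 25

/* Low byte: character code. High byte: attribute (background << 4 | foreground). */
uint16_t vga_cell(char c, uint8_t fg, uint8_t bg)
{
    uint8_t attr = (uint8_t)(((bg & 0x0Fu) << 4) | (fg & 0x0Fu));
    return (uint16_t)(((uint16_t)attr << 8) | (uint8_t)c);
}

/* Write a string starting at (row, col); text past the end of the screen is cut off. */
void vga_write_at(volatile uint16_t *buf, int row, int col,
                  const char *s, uint8_t fg, uint8_t bg)
{
    assert(buf != NULL && row >= 0 && row < VGA_ROWS && col >= 0 && col < VGA_COLS);
    size_t i = (size_t)row * VGA_COLS + (size_t)col;
    while (*s && i < (size_t)VGA_ROWS * VGA_COLS)
        buf[i++] = vga_cell(*s++, fg, bg);
}
```

Scrolling and cursor handling are left out. They are the next things a real VGA driver would need.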
Learning milestones:
- Enters protected mode → You understand mode switching
- C code runs → You understand the ABI
- Prints to VGA → You understand memory-mapped I/O
- No more BIOS needed → You’re truly bare metal
Project 7: Interrupt Descriptor Table (IDT)
- File: LEARN_BARE_METAL_PROGRAMMING.md
- Main Programming Language: C with Assembly
- Alternative Programming Languages: Rust
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 4: Expert
- Knowledge Area: Interrupts / x86 Architecture
- Software or Tool: GCC cross-compiler, QEMU
- Main Book: “Operating Systems: Three Easy Pieces” by Arpaci-Dusseau
What you’ll build: A complete interrupt handling system with IDT, ISR stubs, exception handlers, and hardware IRQ support—the foundation for preemptive multitasking.
Why it teaches bare metal: Interrupts are how hardware communicates with software. Without an IDT, your kernel can’t handle exceptions (divide by zero, page faults) or hardware events (keyboard, timer).
Core challenges you’ll face:
- Setting up the IDT → maps to x86 interrupt architecture
- Writing ISR stubs in assembly → maps to context saving/restoring
- Programming the PIC → maps to hardware interrupt controllers
- Handling exceptions → maps to CPU fault management
Key Concepts:
- IDT: OSDev Wiki - IDT
- ISRs and IRQs: OSDev Wiki - Interrupts
- 8259 PIC: OSDev Wiki - PIC
- CPU Exceptions: Intel SDM Volume 3, Chapter 6
Difficulty: Expert
Time estimate: 2 weeks
Prerequisites: Project 6, understanding of CPU exceptions
Real world outcome:
$ qemu-system-i386 -kernel kernel.elf
=== Interrupt Test Suite ===
[Test 1] Divide by zero exception
[EXCEPTION 0] Divide Error at 0x00100234 ✓
[Test 2] Timer interrupt (IRQ0)
[IRQ 0] Timer tick #1
[IRQ 0] Timer tick #2 ✓
[Test 3] Keyboard interrupt (IRQ1)
[IRQ 1] Keyboard: scancode 0x1E (key 'A') ✓
All interrupt tests passed!
Implementation Hints:
IDT entry structure (8 bytes for 32-bit):
- Bits 0-15: Offset low
- Bits 16-31: Segment selector
- Bits 32-39: Zero
- Bits 40-47: Type and attributes
- Bits 48-63: Offset high
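The bit layout above can be packed like this. This is a sketch: `idt_encode` is a hypothetical name, and the type/attribute value 0x8E mentioned in the note (present, ring 0, 32-bit interrupt gate) is the commonly used one rather than something stated in the list.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a 32-bit handler address, code segment selector and
   type/attribute byte into an 8-byte IDT gate descriptor. */
uint64_t idt_encode(uint32_t handler, uint16_t selector, uint8_t type_attr)
{
    assert((selector & 0x4u) == 0);            /* this sketch expects a GDT selector */
    uint64_t e = 0;
    e |= (uint64_t)(handler & 0xFFFFu);        /* bits 0-15:  offset low       */
    e |= (uint64_t)selector << 16;             /* bits 16-31: segment selector */
                                               /* bits 32-39: left as zero     */
    e |= (uint64_t)type_attr << 40;            /* bits 40-47: type/attributes  */
    e |= (uint64_t)(handler >> 16) << 48;      /* bits 48-63: offset high      */
    return e;
}
```

For a handler at 0x00100234 in code segment 0x08, `idt_encode(0x00100234, 0x08, 0x8E)` packs to 0x00108E0000080234.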
ISR stubs need to:
- Push error code (if CPU doesn’t)
- Push interrupt number
- Save all registers (pusha)
- Call C handler
- Restore registers (popa)
- Add esp, 8 (remove error code and int number)
- iret
Remap the PIC to move IRQs 0-15 from their BIOS defaults (IRQ 0-7 at vectors 0x08-0x0F, IRQ 8-15 at 0x70-0x77) to vectors 32-47, so that the first eight no longer collide with CPU exceptions.
Learning milestones:
- Exceptions are caught → You understand the IDT
- Timer ticks work → You understand IRQs
- Keyboard input works → You understand hardware interrupts
- System calls work → You understand software interrupts
Project 8: Memory Management - Paging
- File: LEARN_BARE_METAL_PROGRAMMING.md
- Main Programming Language: C with Assembly
- Alternative Programming Languages: Rust
- Coolness Level: Level 5: Pure Magic
- Business Potential: 4. The “Open Core” Infrastructure
- Difficulty: Level 5: Master
- Knowledge Area: Virtual Memory / MMU
- Software or Tool: GCC cross-compiler, QEMU
- Main Book: “Operating Systems: Three Easy Pieces” by Arpaci-Dusseau
What you’ll build: A paging system with page directory, page tables, virtual address translation, and page fault handling—the foundation for process isolation and virtual memory.
Why it teaches bare metal: Paging is how modern systems isolate processes, implement virtual memory, and protect the kernel. Understanding paging means understanding how all modern OSes manage memory.
Core challenges you’ll face:
- Setting up page tables → maps to x86 paging structures
- Enabling paging (CR0, CR3) → maps to CPU control registers
- Virtual to physical translation → maps to MMU operation
- Page fault handling → maps to demand paging, CoW
Key Concepts:
- x86 Paging: OSDev Wiki - Paging
- Page Tables: Ciro Santilli’s x86 Paging Tutorial
- Virtual Memory: “Operating Systems: Three Easy Pieces” Ch. 18-21
- TLB: Translation Lookaside Buffer
Difficulty: Master
Time estimate: 3-4 weeks
Prerequisites: Projects 6-7, understanding of memory hierarchy
Real world outcome:
$ qemu-system-i386 -kernel kernel.elf -m 128M
=== Paging Demo ===
Physical memory detected: 128 MB
Kernel virtual: 0xC0100000 - 0xC0200000 (higher half)
[OK] Page directory created
[OK] Paging enabled
Page fault test...
Accessing unmapped address 0x40000000
[PAGE FAULT] Allocating new page...
Mapping to physical 0x00400000 ✓
Free pages: 31,744 (124 MB)
Implementation Hints:
32-bit x86 paging uses two levels:
- Page Directory: 1024 entries, each points to a Page Table
- Page Table: 1024 entries, each points to a 4KB page
Virtual address breakdown:
- Bits 22-31: Page Directory index (10 bits)
- Bits 12-21: Page Table index (10 bits)
- Bits 0-11: Offset within page (12 bits)
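The breakdown above amounts to two shifts and two masks. A small sketch, with hypothetical names, that splits an address and puts it back together:

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t dir_index;     /* bits 22-31 */
    uint32_t table_index;   /* bits 12-21 */
    uint32_t offset;        /* bits 0-11  */
} vaddr_parts;

/* Split a 32-bit virtual address into its three paging fields. */
vaddr_parts vaddr_split(uint32_t vaddr)
{
    vaddr_parts p;
    p.dir_index   = vaddr >> 22;
    p.table_index = (vaddr >> 12) & 0x3FFu;
    p.offset      = vaddr & 0xFFFu;
    return p;
}

/* Inverse of vaddr_split. */
uint32_t vaddr_join(vaddr_parts p)
{
    assert(p.dir_index < 1024 && p.table_index < 1024 && p.offset < 4096);
    return (p.dir_index << 22) | (p.table_index << 12) | p.offset;
}
```

The higher-half address 0xC0100000 from the demo output splits into directory index 768, table index 256, and offset 0.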
Page table entry flags:
- Bit 0: Present
- Bit 1: Read/Write
- Bit 2: User/Supervisor
- Bits 12-31: Physical page address
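Building and inspecting an entry from these fields could look like the following sketch. The macro and function names are hypothetical, and only the three flags listed above are modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_PRESENT  0x001u   /* bit 0 */
#define PTE_WRITABLE 0x002u   /* bit 1 */
#define PTE_USER     0x004u   /* bit 2 */

/* Combine a 4 KiB-aligned physical address with flag bits. */
uint32_t pte_make(uint32_t phys_addr, uint32_t flags)
{
    assert((phys_addr & 0xFFFu) == 0);        /* frames are 4 KiB aligned */
    return (phys_addr & 0xFFFFF000u) | (flags & 0xFFFu);
}

/* Physical frame address stored in bits 12-31. */
uint32_t pte_frame(uint32_t pte)
{
    return pte & 0xFFFFF000u;
}

bool pte_is_present(uint32_t pte)
{
    return (pte & PTE_PRESENT) != 0;
}
```

Page directory entries use the same low flag bits, so the same helpers can be reused when a directory slot points at a page table.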
Enable paging:
- Set up page directory and tables
- Load page directory address into CR3
- Set CR0.PG bit
Learning milestones:
- Paging enabled, kernel runs → You understand page table setup
- Higher half kernel works → You understand virtual memory layout
- Page faults handled → You understand demand paging
- Can allocate virtual memory → You’ve built a memory manager
Project 9: STM32 Bare Metal (ARM Cortex-M)
- File: LEARN_BARE_METAL_PROGRAMMING.md
- Main Programming Language: C
- Alternative Programming Languages: Rust
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 3: Advanced
- Knowledge Area: ARM Cortex-M / Embedded
- Software or Tool: arm-none-eabi-gcc, STM32F4, OpenOCD
- Main Book: “Making Embedded Systems” by Elecia White
What you’ll build: A complete bare metal project on STM32—GPIO, UART, timers, interrupts, and DMA—without HAL or any libraries.
Why it teaches bare metal: STM32 (ARM Cortex-M) is the industry standard for embedded systems. Learning bare metal STM32 opens doors to professional embedded development.
Core challenges you’ll face:
- Clock tree configuration → maps to understanding system clocks
- Register-level peripheral access → maps to reading datasheets
- NVIC and interrupts → maps to ARM interrupt architecture
- DMA transfers → maps to efficient I/O
Key Concepts:
- STM32 Reference Manual: ST’s RM0090 (for STM32F4)
- ARM Cortex-M Architecture: “The Definitive Guide to ARM Cortex-M3/M4”
- CMSIS: ARM’s Cortex Microcontroller Software Interface Standard
- Startup Code: Vector table and initialization
Difficulty: Advanced
Time estimate: 3-4 weeks
Prerequisites: Projects 1-3, ARM basics
Real world outcome:
$ make
arm-none-eabi-gcc -mcpu=cortex-m4 -mthumb -c main.c -o main.o
arm-none-eabi-ld -T stm32f4.ld -o firmware.elf main.o
arm-none-eabi-objcopy -O binary firmware.elf firmware.bin
$ st-flash write firmware.bin 0x8000000
wrote 16384 bytes
$ screen /dev/ttyUSB0 115200
=== STM32F4 Bare Metal Demo ===
System Clock: 168 MHz
LED blinking ✓
UART working ✓
Timer interrupts working ✓
DMA transfer complete ✓
Implementation Hints:
Vector table must be placed at flash start (0x08000000):
- First entry: Initial stack pointer
- Second entry: Reset handler address
- Remaining entries: Exception and IRQ handlers
GPIO configuration on STM32F4:
- Enable GPIO clock in RCC_AHB1ENR
- Set pin mode in GPIOx_MODER (input/output/alternate/analog)
- Set output type in GPIOx_OTYPER
- Set speed in GPIOx_OSPEEDR
- Set pull-up/pull-down in GPIOx_PUPDR
- Write to GPIOx_ODR or GPIOx_BSRR to set output
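Two of the steps above, the mode field and the set/reset register, are sketched below. The register is passed in as a pointer so the bit manipulation can be checked on a host. The function names are hypothetical, and the assumed encoding is 2 bits per pin in MODER (0 input, 1 output, 2 alternate, 3 analog) with BSRR bit n setting and bit n+16 resetting pin n.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    GPIO_MODE_INPUT     = 0,
    GPIO_MODE_OUTPUT    = 1,
    GPIO_MODE_ALTERNATE = 2,
    GPIO_MODE_ANALOG    = 3
} gpio_mode;

/* Update the 2-bit mode field for one pin without touching the others. */
void gpio_set_mode(volatile uint32_t *moder, unsigned pin, gpio_mode mode)
{
    assert(moder != NULL && pin < 16);
    uint32_t v = *moder;
    v &= ~(3u << (pin * 2u));
    v |= (uint32_t)mode << (pin * 2u);
    *moder = v;
}

/* Word to write to BSRR: bit n sets pin n, bit n+16 resets it. */
uint32_t gpio_bsrr_word(unsigned pin, bool high)
{
    assert(pin < 16);
    return high ? (1u << pin) : (1u << (pin + 16u));
}
```

Writing to BSRR changes one pin atomically without a read-modify-write on ODR, which matters once interrupts also touch the same port.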
All register addresses are in the reference manual. Define base addresses and offsets as volatile pointers.
Learning milestones:
- LED blinks → You understand GPIO and clocks
- UART works → You understand serial peripherals
- Timer interrupts work → You understand NVIC
- DMA transfers work → You understand efficient I/O
Project 10: Simple Kernel with Multitasking
- File: LEARN_BARE_METAL_PROGRAMMING.md
- Main Programming Language: C with Assembly
- Alternative Programming Languages: Rust
- Coolness Level: Level 5: Pure Magic
- Business Potential: 5. The “Industry Disruptor”
- Difficulty: Level 5: Master
- Knowledge Area: Operating Systems / Scheduling
- Software or Tool: GCC cross-compiler, QEMU
- Main Book: “Operating Systems: Three Easy Pieces” by Arpaci-Dusseau
What you’ll build: A kernel with cooperative and preemptive multitasking—multiple tasks running “simultaneously” using a scheduler and context switching.
Why it teaches bare metal: This is the pinnacle of bare metal programming. You’ll implement process abstraction, context switching, and scheduling—the core of any operating system.
Core challenges you’ll face:
- Context switching → maps to saving/restoring CPU state
- Task Control Blocks → maps to process management
- Scheduler implementation → maps to OS scheduling algorithms
- Preemption via timer → maps to time-slicing
Key Concepts:
- Context Switch: “Operating Systems: Three Easy Pieces” Ch. 6
- Scheduling: “Operating Systems: Three Easy Pieces” Ch. 7-9
- Task Control Block: Process state management
- Timer Preemption: Using PIT for time slicing
Difficulty: Master
Time estimate: 4-6 weeks
Prerequisites: All previous projects
Real world outcome:
$ qemu-system-i386 -kernel kernel.elf
=== Simple Multitasking Kernel ===
Task 0: Idle
Task 1: Counter A
Task 2: Counter B
[Task 1] Count: 1
[Task 2] Count: 1
[Task 1] Count: 2
[Context switch: Task 1 -> Task 2]
[Task 2] Count: 2
...
Two tasks running concurrently!
Implementation Hints:
Task Control Block structure:
- Task ID
- Task state (ready, running, blocked)
- Stack pointer
- Saved registers
- Next task pointer (for linked list)
Context switch (in assembly):
- Save current task’s registers to its stack
- Save stack pointer to current TCB
- Load stack pointer from next TCB
- Restore next task’s registers from its stack
- Return (will return to new task’s saved PC)
Round-robin scheduler:
- Timer interrupt fires at fixed interval (e.g., 10ms)
- In timer ISR, call schedule()
- schedule() picks next ready task
- Perform context switch
- Return from interrupt into new task
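The "pick next ready task" step can be prototyped on a host before any assembly is written. The sketch below follows the TCB fields listed earlier but leaves out the saved registers; `tcb` and `schedule_next` are hypothetical names, and the tasks are assumed to form a circular linked list.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TASK_READY, TASK_RUNNING, TASK_BLOCKED } task_state;

typedef struct tcb {
    int         id;
    task_state  state;
    void       *stack_pointer;   /* saved SP; unused by this host-side sketch */
    struct tcb *next;            /* circular list of all tasks */
} tcb;

/* Round-robin: return the first READY task after `current`.
   If no other task is ready, `current` keeps the CPU. */
tcb *schedule_next(tcb *current)
{
    assert(current != NULL && current->next != NULL);
    for (tcb *t = current->next; t != current; t = t->next) {
        if (t->state == TASK_READY) {
            if (current->state == TASK_RUNNING)
                current->state = TASK_READY;   /* preempted, still runnable */
            t->state = TASK_RUNNING;
            return t;
        }
    }
    return current;
}
```

In the kernel, the timer ISR would call this and then hand the result to the assembly context switch. A blocked current task with nothing else ready is the case the idle task exists to cover.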
Initial task setup:
- Allocate stack for each task
- Set up fake “return” frame on stack
- Task entry point goes where PC would be saved
Learning milestones:
- Two tasks alternate → You understand context switching
- Timer preemption works → You understand preemptive scheduling
- Tasks can block/wake → You understand synchronization
- It feels like an OS → You’ve built an operating system!
Project 11: File System Driver
- File: LEARN_BARE_METAL_PROGRAMMING.md
- Main Programming Language: C
- Alternative Programming Languages: Rust
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 4: Expert
- Knowledge Area: Storage / File Systems
- Software or Tool: Custom kernel, QEMU
- Main Book: “Operating Systems: Three Easy Pieces” by Arpaci-Dusseau
What you’ll build: A FAT16 file system driver that can read files from a disk image—the first step to having a real OS that can load programs.
Why it teaches bare metal: Reading files is essential for loading programs, configurations, and data. Understanding file systems at the block level is crucial for OS development.
Core challenges you’ll face:
- Disk I/O (ATA/IDE) → maps to talking to storage hardware
- FAT structure → maps to on-disk data structures
- Directory parsing → maps to navigating file hierarchy
- File reading → maps to cluster chains and fragmentation
Key Concepts:
- ATA PIO Mode: OSDev Wiki - ATA PIO
- FAT File System: Microsoft FAT specification
- Block I/O: Sector-based disk access
Difficulty: Expert
Time estimate: 3-4 weeks
Prerequisites: Projects 6-8
Real world outcome:
$ qemu-system-i386 -kernel kernel.elf -hda disk.img
=== FAT16 Driver Demo ===
Root directory:
HELLO.TXT 17 bytes
SUBDIR/ <DIR>
> cat hello.txt
Hello from file!
> ls subdir
TEST.TXT 12 bytes
Implementation Hints:
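ATA PIO read (primary bus, I/O ports 0x1F0-0x1F7):
- Wait for drive ready (BSY bit clear in the status register, port 0x1F7)
- Send sector count, LBA, and drive/head to ports 0x1F2-0x1F6
- Send READ SECTORS command (0x20) to port 0x1F7
- Wait for data ready (BSY clear and DRQ set in the status register)
- Read 256 words (512 bytes) from the data port 0x1F0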
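Port I/O cannot run outside a kernel, so the sketch below covers only the parts of this sequence that are pure arithmetic: splitting a 28-bit LBA across the registers and decoding the status byte. The struct and function names are hypothetical. In a kernel, each field would be written to the port named in its comment with an `outb`-style helper.

```c
#include <stdint.h>

/* Values for the primary-bus registers of one LBA28 read. */
typedef struct {
    uint8_t sector_count;  /* port 0x1F2 */
    uint8_t lba_low;       /* port 0x1F3: LBA bits 0-7   */
    uint8_t lba_mid;       /* port 0x1F4: LBA bits 8-15  */
    uint8_t lba_high;      /* port 0x1F5: LBA bits 16-23 */
    uint8_t drive_head;    /* port 0x1F6: mode bits, drive select, LBA bits 24-27 */
    uint8_t command;       /* port 0x1F7: 0x20 = READ SECTORS */
} ata_read_regs_t;

#define ATA_STATUS_BSY 0x80u   /* drive is busy */
#define ATA_STATUS_DRQ 0x08u   /* data is ready to transfer */

/* Compute the register values; `slave` selects the second drive on the bus. */
ata_read_regs_t ata_lba28_read_regs(uint32_t lba, uint8_t count, int slave)
{
    ata_read_regs_t r;
    r.sector_count = count;
    r.lba_low      = (uint8_t)(lba & 0xFFu);
    r.lba_mid      = (uint8_t)((lba >> 8) & 0xFFu);
    r.lba_high     = (uint8_t)((lba >> 16) & 0xFFu);
    r.drive_head   = (uint8_t)(0xE0u | (slave ? 0x10u : 0x00u)
                               | ((lba >> 24) & 0x0Fu));
    r.command      = 0x20u;
    return r;
}

/* "Data ready" test for a status byte read from port 0x1F7. */
int ata_data_ready(uint8_t status)
{
    return (status & ATA_STATUS_BSY) == 0 && (status & ATA_STATUS_DRQ) != 0;
}
```

Keeping the register math separate from the port writes also makes the driver easier to test on a development machine before it ever touches emulated hardware.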
FAT16 boot sector contains:
- Bytes per sector (usually 512)
- Sectors per cluster
- Reserved sectors
- Number of FATs
- Root directory entries
- Total sectors
- Sectors per FAT (needed to find where the root directory starts)
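To make the boot sector fields concrete, here is a sketch that pulls them out of a raw 512-byte sector and derives where the root directory and data area begin. The byte offsets follow the standard FAT BIOS Parameter Block layout. The struct and function names are assumptions for this example, and the code also reads the sectors-per-FAT field at offset 22, which is required for the layout arithmetic.

```c
#include <stdint.h>

typedef struct {
    uint16_t bytes_per_sector;
    uint8_t  sectors_per_cluster;
    uint16_t reserved_sectors;
    uint8_t  num_fats;
    uint16_t root_entries;
    uint32_t total_sectors;
    uint16_t sectors_per_fat;
    uint32_t root_dir_start;    /* first sector of the root directory */
    uint32_t root_dir_sectors;  /* sectors occupied by the root directory */
    uint32_t data_start;        /* first sector of cluster 2 */
} fat16_info_t;

/* Little-endian readers, safe on any host byte order and alignment. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the sector is obviously not usable. */
int fat16_parse_boot_sector(const uint8_t *s, fat16_info_t *fs)
{
    fs->bytes_per_sector    = rd16(s + 11);
    fs->sectors_per_cluster = s[13];
    fs->reserved_sectors    = rd16(s + 14);
    fs->num_fats            = s[16];
    fs->root_entries        = rd16(s + 17);
    fs->total_sectors       = rd16(s + 19);
    if (fs->total_sectors == 0)
        fs->total_sectors   = rd32(s + 32);   /* 32-bit count for large volumes */
    fs->sectors_per_fat     = rd16(s + 22);

    if (fs->bytes_per_sector == 0 || fs->sectors_per_cluster == 0)
        return -1;

    fs->root_dir_start   = fs->reserved_sectors
                         + (uint32_t)fs->num_fats * fs->sectors_per_fat;
    fs->root_dir_sectors = ((uint32_t)fs->root_entries * 32u
                         + fs->bytes_per_sector - 1u) / fs->bytes_per_sector;
    fs->data_start       = fs->root_dir_start + fs->root_dir_sectors;
    return 0;
}

/* Cluster numbering starts at 2, so cluster 2 maps to data_start. */
uint32_t fat16_cluster_to_sector(const fat16_info_t *fs, uint16_t cluster)
{
    return fs->data_start + (uint32_t)(cluster - 2u) * fs->sectors_per_cluster;
}
```

Reading the fields byte by byte avoids relying on packed structs, whose layout depends on compiler-specific attributes.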
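Directory entries are 32 bytes each:
- Bytes 0-10: 8.3 filename (8 name characters plus 3 extension characters, space-padded, no dot stored)
- Byte 11: attributes (0x10 marks a directory)
- Bytes 26-27: first cluster
- Bytes 28-31: file size in bytes
- The remaining bytes hold reserved fields and timestamps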
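The entry layout can be decoded with a few lines of C. In the sketch below, the type `fat16_dirent_t` and the function `fat16_parse_dirent` are hypothetical names. The field offsets (attributes at byte 11, first cluster at bytes 26-27, size at bytes 28-31) are those of a standard FAT short directory entry.

```c
#include <stdint.h>
#include <stddef.h>

#define FAT_ATTR_DIRECTORY 0x10u

typedef struct {
    char     name[13];       /* "NAME.EXT" plus terminating NUL */
    uint8_t  attr;
    uint16_t first_cluster;
    uint32_t size;
} fat16_dirent_t;

/* Decode one raw 32-byte directory entry. */
void fat16_parse_dirent(const uint8_t *e, fat16_dirent_t *out)
{
    size_t n = 0;

    /* Bytes 0-7: name, padded with spaces. */
    for (size_t i = 0; i < 8 && e[i] != ' '; i++)
        out->name[n++] = (char)e[i];

    /* Bytes 8-10: extension; the dot is implied, not stored on disk. */
    if (e[8] != ' ') {
        out->name[n++] = '.';
        for (size_t i = 8; i < 11 && e[i] != ' '; i++)
            out->name[n++] = (char)e[i];
    }
    out->name[n] = '\0';

    out->attr          = e[11];
    out->first_cluster = (uint16_t)(e[26] | (e[27] << 8));
    out->size          = (uint32_t)e[28] | ((uint32_t)e[29] << 8) |
                         ((uint32_t)e[30] << 16) | ((uint32_t)e[31] << 24);
}
```

When scanning a directory, a first byte of 0x00 marks the end of the listing and 0xE5 marks a deleted entry, so a caller would check `e[0]` before decoding.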
Learning milestones:
- Disk sectors read → You understand ATA I/O
- Boot sector parsed → You understand FAT structure
- Directories listed → You understand directory entries
- Files read → You understand cluster chains
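The cluster chains in the last milestone can be illustrated with a short walk over an in-memory copy of the FAT. This is a sketch under the assumption that the whole table has already been read into an array of 16-bit entries. The function name and its parameters are invented for this example.

```c
#include <stdint.h>
#include <stddef.h>

/* FAT16 entry values: 0 = free, 0xFFF7 = bad cluster,
 * 0xFFF8-0xFFFF = end of chain. Valid data clusters start at 2. */
#define FAT16_BAD_CLUSTER 0xFFF7u

/* Follow the chain that starts at `first`, storing each cluster number
 * in `out`. Stops at end-of-chain, at an invalid entry, or after `max`
 * clusters (which also guards against a corrupted, looping chain).
 * Returns the number of clusters stored. */
size_t fat16_follow_chain(const uint16_t *fat, size_t fat_entries,
                          uint16_t first, uint16_t *out, size_t max)
{
    size_t n = 0;
    uint16_t c = first;

    while (n < max && c >= 2 && c < fat_entries && c < FAT16_BAD_CLUSTER) {
        out[n++] = c;
        c = fat[c];     /* each entry names the next cluster of the file */
    }
    return n;
}
```

A fragmented file simply shows up as non-consecutive cluster numbers in `out`. Reading the file means reading those clusters in order and trimming the last one to the size stored in the directory entry.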
Project 12: UEFI Application
- File: LEARN_BARE_METAL_PROGRAMMING.md
- Main Programming Language: C
- Alternative Programming Languages: Rust
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 3: Advanced
- Knowledge Area: UEFI / Modern Boot
- Software or Tool: GNU-EFI or EDK2, QEMU with OVMF
- Main Book: UEFI Specification
What you’ll build: A UEFI application that runs before any OS—displaying graphics, reading files, and interacting with UEFI services.
Why it teaches bare metal: UEFI is the modern replacement for BIOS. Understanding UEFI means understanding how modern PCs boot, and opens doors to firmware development, secure boot, and bootloaders.
Core challenges you’ll face:
- UEFI application structure → maps to PE executable format
- UEFI protocols (GOP, File, Console) → maps to pre-OS services
- Memory services → maps to UEFI memory map
- Exiting boot services → maps to taking over from UEFI
Key Concepts:
- UEFI Specification: UEFI Forum documentation
- GNU-EFI: Simpler UEFI development
- Graphics Output Protocol: UEFI graphics
Difficulty: Advanced
Time estimate: 2 weeks
Prerequisites: C programming, understanding of PE format
Real world outcome:
$ qemu-system-x86_64 -bios OVMF.fd -drive file=fat:rw:esp
# UEFI application displays:
╔══════════════════════════════════════╗
║ MY UEFI APPLICATION v1.0 ║
╠══════════════════════════════════════╣
║ UEFI Version: 2.70 ║
║ Screen: 1024x768 @ 32bpp ║
║ Memory: 127 MB ║
╚══════════════════════════════════════╝
Implementation Hints:
UEFI application entry point:
EFI_STATUS EFIAPI efi_main(
EFI_HANDLE ImageHandle,
EFI_SYSTEM_TABLE *SystemTable
)
SystemTable provides access to:
- ConOut: Console output
- ConIn: Console input
- BootServices: Memory, protocol, event services
- RuntimeServices: Time, variables, reset
Graphics Output Protocol (GOP):
- Use BootServices->LocateProtocol to get GOP
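- Read Mode->Info for the resolution (HorizontalResolution, VerticalResolution) and pixel format
- Write pixels directly to the framebuffer at Mode->FrameBufferBase; consecutive rows are Info->PixelsPerScanLine pixels apart, which may exceed the visible width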
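Once the GOP mode information is in hand, plotting a pixel is plain pointer arithmetic. The sketch below assumes a 32-bit-per-pixel mode and copies the relevant GOP values into a hypothetical `framebuffer_t` struct, which is not a UEFI type, so the code can be compiled and tested without the UEFI headers.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical container for values taken from the GOP mode structure. */
typedef struct {
    uint32_t *base;                  /* from Mode->FrameBufferBase */
    uint32_t  width;                 /* Info->HorizontalResolution */
    uint32_t  height;                /* Info->VerticalResolution */
    uint32_t  pixels_per_scan_line;  /* Info->PixelsPerScanLine */
} framebuffer_t;

/* Write one 32-bit pixel. Returns 0 on success, -1 if (x, y) is off-screen. */
int fb_put_pixel(const framebuffer_t *fb, uint32_t x, uint32_t y, uint32_t color)
{
    if (x >= fb->width || y >= fb->height)
        return -1;

    /* Rows are pixels_per_scan_line apart, not width apart. */
    fb->base[(size_t)y * fb->pixels_per_scan_line + x] = color;
    return 0;
}
```

Using the scan-line stride rather than the visible width matters on hardware that pads each row. Indexing by width there produces a skewed image.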
Learning milestones:
- UEFI app runs → You understand UEFI boot
- Graphics work → You understand GOP
- Can read files → You understand UEFI file protocol
- Exit boot services → You’re ready to write an OS loader
Project Comparison Table
| Project | Platform | Difficulty | Time | Key Learning | Fun Factor |
|---|---|---|---|---|---|
| 1. AVR LED Blink | Arduino | Beginner | Weekend | GPIO, registers | ⭐⭐⭐ |
| 2. AVR UART | Arduino | Intermediate | 1 week | Serial, interrupts | ⭐⭐⭐⭐ |
| 3. AVR Timer/PWM | Arduino | Intermediate | 1 week | Timers, PWM | ⭐⭐⭐⭐ |
| 4. RPi Bare Metal | Raspberry Pi | Advanced | 2 weeks | ARM, boot process | ⭐⭐⭐⭐⭐ |
| 5. x86 Bootloader | x86 (QEMU) | Expert | 1-2 weeks | Real mode, BIOS | ⭐⭐⭐⭐⭐ |
| 6. Protected Mode | x86 (QEMU) | Expert | 2-3 weeks | GDT, 32-bit | ⭐⭐⭐⭐⭐ |
| 7. IDT/Interrupts | x86 (QEMU) | Expert | 2 weeks | IDT, ISRs, PIC | ⭐⭐⭐⭐ |
| 8. Paging | x86 (QEMU) | Master | 3-4 weeks | Virtual memory | ⭐⭐⭐⭐⭐ |
| 9. STM32 Bare Metal | STM32 | Advanced | 3-4 weeks | ARM Cortex-M | ⭐⭐⭐⭐⭐ |
| 10. Multitasking | x86 (QEMU) | Master | 4-6 weeks | Scheduling | ⭐⭐⭐⭐⭐ |
| 11. File System | x86 (QEMU) | Expert | 3-4 weeks | Storage, FAT | ⭐⭐⭐⭐ |
| 12. UEFI App | x86 (QEMU) | Advanced | 2 weeks | Modern boot | ⭐⭐⭐⭐ |
Recommended Learning Path
Phase 1: Microcontroller Basics (4-6 weeks)
- Project 1: AVR LED Blink - Your first bare metal success
- Project 2: AVR UART - Debug capability
- Project 3: AVR Timer/PWM - Hardware timers
Phase 2: ARM Architecture (4-6 weeks)
- Project 4: Raspberry Pi Bare Metal - Full ARM system
- Project 9: STM32 Bare Metal - Professional embedded
Phase 3: x86 Boot Process (6-8 weeks)
- Project 5: x86 Bootloader - Real mode
- Project 6: Protected Mode - 32-bit, run C code
- Project 12: UEFI Application - Modern boot
Phase 4: OS Foundations (8-12 weeks)
- Project 7: IDT/Interrupts - Handle hardware events
- Project 8: Paging - Virtual memory
- Project 11: File System - Persistent storage
- Project 10: Multitasking - The capstone
Total estimated time: 6-12 months
Essential Resources
Books
- “Operating Systems: Three Easy Pieces” by Arpaci-Dusseau - Free online
- “Making Embedded Systems” by Elecia White - Practical embedded
- “Write Great Code, Volumes 1 & 2” by Randall Hyde - Low-level
- “Bare-metal programming for ARM” by Daniels Umanovskis - Free ebook
Online Resources
- OSDev Wiki - The bible of OS development
- Raspberry Pi Bare Metal
- James Molloy’s Kernel Tutorial
- Bare Metal ARM Guide
Tools
- QEMU - Essential emulator for x86 development
- GDB - Debugger with QEMU remote debugging
- NASM - x86 assembler
- arm-none-eabi-gcc - ARM cross-compiler
- avr-gcc - AVR cross-compiler
- OpenOCD - On-chip debugger for STM32
Summary
| # | Project | Main Language | Platform |
|---|---|---|---|
| 1 | Blink LED on AVR | C | Arduino/AVR |
| 2 | UART Serial Communication | C | Arduino/AVR |
| 3 | Hardware Timer and PWM | C | Arduino/AVR |
| 4 | Raspberry Pi Bare Metal | C + ARM Assembly | Raspberry Pi |
| 5 | x86 Bootloader - Real Mode | x86 Assembly | x86 (QEMU) |
| 6 | x86 Protected Mode Kernel | C + x86 Assembly | x86 (QEMU) |
| 7 | Interrupt Descriptor Table | C + x86 Assembly | x86 (QEMU) |
| 8 | Memory Management - Paging | C + x86 Assembly | x86 (QEMU) |
| 9 | STM32 Bare Metal | C | STM32/ARM Cortex-M |
| 10 | Simple Kernel with Multitasking | C + x86 Assembly | x86 (QEMU) |
| 11 | File System Driver | C | x86 (QEMU) |
| 12 | UEFI Application | C | x86 (QEMU/OVMF) |
Sources
- OSDev Wiki - Bare Bones
- OSDev Wiki - Bootloader
- OSDev Wiki - Protected Mode
- OSDev Wiki - Paging
- OSDev Wiki - Interrupts
- Raspberry Pi Bare Metal Tutorial
- Valvers Raspberry Pi Tutorial
- Bare-metal programming for ARM (ebook)
- cpq Bare Metal Programming Guide
- James Molloy Kernel Tutorial
- Ciro Santilli x86 Paging
- Vivonomicon STM32 Bare Metal
- Hackster.io Arduino Bare Metal
- AVR Bare Metal Examples
- Baeldung UEFI Programming