Project 9: Raspberry Pi Bare-Metal Bootloader
Write code that runs directly on ARM silicon with no operating system - a bare-metal bootloader for Raspberry Pi that blinks LEDs, outputs to serial, and boots a kernel.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | ★★★★☆ Expert |
| Time Estimate | 2-3 weeks |
| Language | ARM Assembly + C (alt: Rust with no_std) |
| Prerequisites | Basic C, willingness to learn ARM assembly, a Raspberry Pi |
| Key Topics | ARM boot process, GPU boot sequence, MMIO, UART initialization, GPIO, device trees, exception vectors |
1. Learning Objectives
After completing this project, you will be able to:
- Explain the unique Raspberry Pi boot sequence (GPU boots first, then ARM CPU)
- Write ARM assembly code that executes without any operating system
- Initialize and use UART for serial communication at the bare-metal level
- Control GPIO pins using memory-mapped I/O (MMIO)
- Understand ARM exception vectors and their role in embedded systems
- Work with device trees for hardware description
- Cross-compile code for ARM targets from x86
- Debug embedded systems using serial output and LED indicators
- Load and execute a secondary kernel from the bootloader
2. Theoretical Foundation
2.1 Core Concepts
The Raspberry Pi Boot Sequence
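Unlike x86 computers, where the CPU starts first, the Raspberry Pi has an unusual boot process in which the GPU boots first. The stages below describe the Pi 3 and earlier; on the Pi 4, the second-stage loader lives in an on-board SPI EEPROM rather than in bootcode.bin on the SD card.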
Raspberry Pi Boot Sequence (GPU-First Architecture):
┌─────────────────────────────────────────────────────────────────────────────┐
│ POWER APPLIED │
└─────────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ STAGE 1: GPU ROM BOOTLOADER │
│ • GPU (VideoCore IV/VI) wakes up first - ARM CPU is OFF! │
│ • Executes code from internal ROM │
│ • Initializes SD card controller in bare minimum mode │
│ • Loads bootcode.bin from SD card FAT partition │
└─────────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ STAGE 2: bootcode.bin (GPU code) │
│ • Enables SDRAM (RAM wasn't usable before this!) │
│ • Reads config.txt for configuration │
│ • Loads start.elf (main GPU firmware) │
└─────────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ STAGE 3: start.elf (GPU firmware) │
│ • Full GPU initialization │
│ • Reads config.txt (kernel filename, UART settings, memory split) │
│ • Loads kernel*.img and device tree (.dtb) to RAM │
│ • Sets up ARM CPU registers │
│ • FINALLY releases ARM CPU from reset │
└─────────────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ YOUR BARE-METAL CODE RUNS HERE! │
│ • ARM CPU starts at address specified by config.txt │
│ • Default: 0x8000 (32-bit) or 0x80000 (64-bit) │
│ • GPU has already initialized RAM, clocks, and basic peripherals │
│ • You have full control of the ARM cores │
└─────────────────────────────────────────────────────────────────────────────┘
Memory-Mapped I/O (MMIO)
On ARM systems, peripherals are controlled by reading and writing to specific memory addresses. There are no special I/O instructions like x86’s IN and OUT.
Raspberry Pi Memory Map (Pi 3):
┌─────────────────────────────────────────────────────────────────────────────┐
│ Address Range │ Purpose │
├──────────────────────┼──────────────────────────────────────────────────────┤
│ 0x00000000-0x3EFFFFFF│ SDRAM (1GB minus GPU memory) │
│ 0x3F000000-0x3FFFFFFF│ Peripheral Memory Space (MMIO) │
│ 0x3F200000 │ GPIO registers │
│ 0x3F201000 │ PL011 UART (main UART) │
│ 0x3F215000 │ Mini UART (secondary UART) │
│ 0x40000000-0x40FFFFFF│ Local peripherals (interrupts, timers, mailboxes) │
└─────────────────────────────────────────────────────────────────────────────┘
Raspberry Pi Memory Map (Pi 4):
┌─────────────────────────────────────────────────────────────────────────────┐
│ Address Range │ Purpose │
├──────────────────────┼──────────────────────────────────────────────────────┤
│ 0x00000000-0xFBFFFFFF│ SDRAM (up to 4GB/8GB minus GPU memory) │
│ 0xFE000000-0xFEFFFFFF│ Peripheral Memory Space (MMIO) │
│ 0xFE200000 │ GPIO registers │
│ 0xFE201000 │ PL011 UART │
│ 0xFE215000 │ Mini UART │
└─────────────────────────────────────────────────────────────────────────────┘
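The memory maps above translate into a pair of small accessor helpers. The following is a minimal sketch, assuming helper names (`mmio_read`, `mmio_write`, `gpio_base`) chosen for illustration; the argument order mirrors the `mmio_write(register, value)` calls used later in this guide.

```c
#include <stdint.h>

/* Peripheral bases and GPIO offset taken from the memory maps above. */
#define PERIPHERAL_BASE_PI3 0x3F000000UL
#define PERIPHERAL_BASE_PI4 0xFE000000UL
#define GPIO_OFFSET         0x200000UL

/* Write a 32-bit value to a device register. The volatile access keeps
 * the compiler from merging, caching, or dropping the store. */
static inline void mmio_write(uintptr_t addr, uint32_t value) {
    *(volatile uint32_t *)addr = value;
}

/* Read a 32-bit device register; every call performs a real load. */
static inline uint32_t mmio_read(uintptr_t addr) {
    return *(volatile uint32_t *)addr;
}

/* GPIO register block address for a given peripheral base. */
static inline uintptr_t gpio_base(uintptr_t peripheral_base) {
    return peripheral_base + GPIO_OFFSET;
}
```

Because the helpers take a plain address, they can be exercised on a development machine against an ordinary variable before being pointed at real registers.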
GPIO (General Purpose Input/Output)
GPIO pins can be configured as inputs, outputs, or alternate functions (UART, SPI, I2C, PWM).
GPIO Configuration (Simplified):
┌─────────────────────────────────────────────────────────────────────────────┐
│ GPIO Registers (Pi 3 base: 0x3F200000) │
├────────────────────┬────────────────────────────────────────────────────────┤
│ GPFSEL0 (0x00) │ Function select for GPIO 0-9 (3 bits per pin) │
│ GPFSEL1 (0x04) │ Function select for GPIO 10-19 │
│ GPFSEL2 (0x08) │ Function select for GPIO 20-29 │
│ GPFSEL3 (0x0C) │ Function select for GPIO 30-39 │
│ GPFSEL4 (0x10) │ Function select for GPIO 40-49 │
│ GPFSEL5 (0x14) │ Function select for GPIO 50-53 │
├────────────────────┼────────────────────────────────────────────────────────┤
│ GPSET0 (0x1C) │ Set GPIO high (write 1 to set) │
│ GPSET1 (0x20) │ Set GPIO 32-53 high │
├────────────────────┼────────────────────────────────────────────────────────┤
│ GPCLR0 (0x28) │ Set GPIO low (write 1 to clear) │
│ GPCLR1 (0x2C) │ Clear GPIO 32-53 │
├────────────────────┼────────────────────────────────────────────────────────┤
│ GPLEV0 (0x34) │ Read GPIO level (1 = high, 0 = low) │
│ GPLEV1 (0x38) │ Read GPIO 32-53 level │
└────────────────────┴────────────────────────────────────────────────────────┘
Function Select Values (3 bits per GPIO):
000 = Input
001 = Output
100 = Alt Function 0 (varies by pin)
101 = Alt Function 1
110 = Alt Function 2
111 = Alt Function 3
011 = Alt Function 4
010 = Alt Function 5
Example: Configure GPIO 47 (ACT LED on Pi 3) as output
- GPIO 47 is in GPFSEL4 (handles GPIO 40-49)
- Bits 21-23 control GPIO 47 (7 pins * 3 bits = 21)
- Write 001 to bits 21-23 for output mode
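The register and bit arithmetic in this example generalizes to any pin. Here is a small sketch, assuming hypothetical helper names (`gpfsel_index`, `gpfsel_shift`, `gpfsel_apply`); alt0 is encoded as binary 100, which matches the `4 << 12` used in the UART code later in this guide.

```c
#include <stdint.h>

/* Function-select encodings used in this guide. */
enum { GPIO_FUNC_INPUT = 0, GPIO_FUNC_OUTPUT = 1, GPIO_FUNC_ALT0 = 4 };

/* Index of the GPFSELn register that holds a pin (10 pins per register). */
static inline unsigned gpfsel_index(unsigned pin) { return pin / 10; }

/* Bit position of the pin's 3-bit field inside that register. */
static inline unsigned gpfsel_shift(unsigned pin) { return (pin % 10) * 3; }

/* Return `old` with the pin's field replaced by `func`, leaving the
 * fields of the other pins in the register untouched. */
static inline uint32_t gpfsel_apply(uint32_t old, unsigned pin, unsigned func) {
    unsigned shift = gpfsel_shift(pin);
    return (old & ~(UINT32_C(7) << shift)) | ((uint32_t)(func & 7u) << shift);
}
```

The read-modify-write shape matters: writing a whole GPFSEL register without preserving the other fields would silently reconfigure up to nine neighbouring pins.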
UART (Universal Asynchronous Receiver/Transmitter)
UART provides serial communication - essential for debugging bare-metal code.
UART Configuration (PL011 - Full UART):
┌─────────────────────────────────────────────────────────────────────────────┐
│ PL011 UART Registers (Pi 3 base: 0x3F201000, Pi 4: 0xFE201000) │
├────────────────────┬────────────────────────────────────────────────────────┤
│ DR (0x00) │ Data Register - read/write character data │
│ FR (0x18) │ Flag Register - status flags (TXFF, RXFE, BUSY) │
│ IBRD (0x24) │ Integer Baud Rate Divisor │
│ FBRD (0x28) │ Fractional Baud Rate Divisor │
│ LCRH (0x2C) │ Line Control Register (data bits, parity, stop bits) │
│ CR (0x30) │ Control Register (enable TX, RX, UART) │
│ ICR (0x44) │ Interrupt Clear Register │
└────────────────────┴────────────────────────────────────────────────────────┘
Baud Rate Calculation:
Divider = UART_CLOCK / (16 * Baud_Rate)
IBRD = Integer part of Divider
FBRD = Integer part of ((Fractional part of Divider * 64) + 0.5)
Example for 115200 baud with 48MHz UART clock:
Divider = 48000000 / (16 * 115200) = 26.041666...
IBRD = 26
FBRD = int((0.041666 * 64) + 0.5) = int(3.1666...) = 3
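The same calculation can be done in integer arithmetic, which is convenient in a bootloader with no floating-point setup. This is a sketch under the assumption that the divisor is computed at run time; the type and function names (`pl011_divisor_t`, `pl011_divisor`) are illustrative, not part of any existing API.

```c
#include <stdint.h>

typedef struct {
    uint32_t ibrd; /* integer part, written to IBRD */
    uint32_t fbrd; /* 6-bit fractional part, written to FBRD */
} pl011_divisor_t;

/* Divider = clock / (16 * baud). Scaling by 64 gives clock * 4 / baud;
 * adding baud / 2 before dividing rounds to the nearest 1/64th. */
static inline pl011_divisor_t pl011_divisor(uint32_t uart_clock_hz, uint32_t baud) {
    uint64_t scaled = ((uint64_t)uart_clock_hz * 4 + baud / 2) / baud;
    pl011_divisor_t d = { (uint32_t)(scaled >> 6), (uint32_t)(scaled & 0x3F) };
    return d;
}
```

For a 48 MHz clock at 115200 baud this yields IBRD = 26 and FBRD = 3, the values worked out by hand above.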
ARM Exception Vectors
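ARM processors handle interrupts and exceptions through an exception vector table. The table below is the classic AArch32 (ARMv7) layout, which sits at address 0x0 by default. AArch64 uses a different layout (16 entries of 128 bytes each, located through a VBAR_ELx register), so this table applies only if you build for 32-bit mode.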
ARM Exception Vector Table:
┌─────────────────────────────────────────────────────────────────────────────┐
│ Offset │ Exception │ Description │
├────────┼──────────────────────────┼────────────────────────────────────────┤
│ 0x00 │ Reset │ CPU reset or power-on │
│ 0x04 │ Undefined Instruction │ Invalid opcode executed │
│ 0x08 │ Software Interrupt (SWI) │ System call (SVC instruction) │
│ 0x0C │ Prefetch Abort │ Instruction fetch failed │
│ 0x10 │ Data Abort │ Data access failed │
│ 0x14 │ Reserved │ (Not used on ARMv7) │
│ 0x18 │ IRQ │ Hardware interrupt │
│ 0x1C │ FIQ │ Fast interrupt │
└────────┴──────────────────────────┴────────────────────────────────────────┘
Vector Table Example (ARM Assembly):
.section .text.boot
.global _start
_start:
ldr pc, _reset_handler
ldr pc, _undefined_handler
ldr pc, _swi_handler
ldr pc, _prefetch_abort_handler
ldr pc, _data_abort_handler
ldr pc, _reserved_handler
ldr pc, _irq_handler
ldr pc, _fiq_handler
_reset_handler: .word reset_handler
_undefined_handler: .word undefined_handler
_swi_handler: .word swi_handler
_prefetch_abort_handler: .word prefetch_abort_handler
_data_abort_handler: .word data_abort_handler
_reserved_handler: .word reserved_handler
_irq_handler: .word irq_handler
_fiq_handler: .word fiq_handler
2.2 Why This Matters
Real-World Embedded Systems: Although the GPU-first sequence is unusual, the overall pattern (boot ROM, staged loaders, firmware, then your payload) is representative of how most ARM-based systems boot. Understanding it prepares you for working with:
- Industrial controllers
- IoT devices
- Automotive systems
- Medical devices
- Consumer electronics
Deep Hardware Understanding: Writing bare-metal code forces you to understand:
- How hardware is actually controlled
- Why abstraction layers exist
- Performance implications of every operation
- Power management at the lowest level
Debugging Skills: When there’s no OS, you learn to debug with:
- LED blink patterns (the original debug output)
- Serial output
- Logic analyzers
- JTAG debuggers
2.3 Historical Context
The Raspberry Pi’s Unique Architecture: The Pi’s Broadcom SoC comes from a family of multimedia chips of the kind used in set-top boxes. The VideoCore GPU was the primary processor, with the ARM CPU added as a co-processor. This unusual design decision means the GPU is actually the “main” processor from a boot perspective.
The ARM Revolution: ARM processors dominate mobile and embedded computing. Understanding ARM at the bare-metal level opens doors to billions of devices that power our world.
From Hobby to Industry: The Raspberry Pi, initially an educational tool, is now used in industrial and commercial applications. Skills learned here directly apply to professional embedded development.
2.4 Common Misconceptions
Misconception 1: “The ARM CPU boots first like x86”
- Reality: The GPU boots first and controls when the ARM CPU starts. Your code runs only after the GPU has initialized everything.
Misconception 2: “I can use the same code for all Pi models”
- Reality: Peripheral addresses differ between Pi models. Pi 3 uses 0x3F000000, Pi 4 uses 0xFE000000. Your code must detect or be compiled for the specific model.
Misconception 3: “config.txt is just for display settings”
- Reality: config.txt controls kernel loading, UART configuration, memory split, clock speeds, GPIO configuration, and much more. It’s essential for bare-metal development.
Misconception 4: “ARM assembly is similar to x86”
- Reality: ARM is a RISC architecture with a completely different instruction set, register usage, and conventions. Load/store architecture means no memory operands in arithmetic.
Misconception 5: “The Pi is too complex for bare-metal programming”
- Reality: The GPU handles the complex initialization. By the time your code runs, RAM is initialized, clocks are running, and peripherals are powered. It’s actually simpler than x86 in many ways.
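For Misconception 2, one common way to pick the peripheral base at run time is to look at the CPU part number in the Main ID Register. The sketch below assumes only the Pi 3 (Cortex-A53) and Pi 4 (Cortex-A72) need to be told apart; the function names are hypothetical, and the register read itself is compiled only for AArch64 targets.

```c
#include <stdint.h>

#define PERIPHERAL_BASE_PI3 0x3F000000UL
#define PERIPHERAL_BASE_PI4 0xFE000000UL

/* Bits [15:4] of MIDR hold the primary part number of the core. */
static inline unsigned midr_part_number(uint64_t midr) {
    return (unsigned)((midr >> 4) & 0xFFF);
}

/* Map a MIDR value to a peripheral base; 0 means "unknown core". */
static inline uintptr_t peripheral_base_from_midr(uint64_t midr) {
    switch (midr_part_number(midr)) {
    case 0xD03: return PERIPHERAL_BASE_PI3; /* Cortex-A53 (Pi 3) */
    case 0xD08: return PERIPHERAL_BASE_PI4; /* Cortex-A72 (Pi 4) */
    default:    return 0;
    }
}

#if defined(__aarch64__)
/* Read MIDR_EL1 on the target. */
static inline uint64_t read_midr(void) {
    uint64_t midr;
    __asm__ volatile("mrs %0, midr_el1" : "=r"(midr));
    return midr;
}
#endif
```

On the target, `peripheral_base_from_midr(read_midr())` would be called once early in `main()`. Compiling a separate image per model remains the simpler alternative.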
3. Project Specification
3.1 What You Will Build
A bare-metal bootloader for Raspberry Pi that:
- Runs directly on ARM CPU after GPU firmware loads it
- Initializes UART for serial output at 115200 baud
- Prints boot messages to serial console
- Blinks the ACT LED in a recognizable pattern
- Loads and jumps to a secondary kernel
3.2 Functional Requirements
- Code boots on Raspberry Pi 3 or Pi 4 (choose one to start)
- UART outputs boot messages visible in serial terminal
- ACT LED blinks in a recognizable pattern (e.g., 3 short, 1 long)
- Bootloader can load a kernel from SD card
- Clean jump to kernel with correct register state
- Works in both QEMU emulation and real hardware
3.3 Non-Functional Requirements
- Boot time from power-on to serial output: < 2 seconds
- Code size: < 64KB for basic version
- Must work with standard USB-to-TTL serial adapters
- Should compile with standard ARM GCC toolchain
- LED pattern should be recognizable (not just random flickering)
3.4 Example Usage / Output
# Build the bootloader
$ make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu-
# Copy to SD card boot partition
$ cp kernel8.img /media/boot/
$ cp config.txt /media/boot/
# Connect serial adapter (GPIO 14=TXD, GPIO 15=RXD, GND)
$ screen /dev/ttyUSB0 115200
# Power on Pi, see output:
3.5 Real World Outcome
# On your serial console (via USB-TTL adapter):
=========================================
Bare Metal Bootloader v1.0
=========================================
Raspberry Pi 3 Model B+
CPU: ARM Cortex-A53 @ 1.4GHz
Core Temperature: 42.3C
Initializing hardware...
UART: OK (115200 8N1)
GPIO: OK
Timer: OK
Blinking ACT LED (3 short, 1 long)...
Searching for kernel on SD card...
Found: kernel.img (65536 bytes)
Loading to 0x80000...
Verifying checksum... OK
Boot arguments: console=ttyAMA0,115200
Jumping to kernel at 0x80000!
------------------------------------------
# Your kernel takes over and outputs:
Minimal Kernel v0.1
Hello from bare metal ARM!
The Pi’s green ACT LED blinks in your programmed pattern, and you’ve successfully built a complete boot chain running on real ARM silicon!
4. Solution Architecture
4.1 High-Level Design
Bootloader Architecture:
┌─────────────────────────────────────────────────────────────────────────────┐
│ SD Card (FAT32) │
├─────────────────────────────────────────────────────────────────────────────┤
│ bootcode.bin start.elf config.txt │
│ (GPU stage 2) (GPU stage 3) (configuration) │
│ │
│ kernel8.img kernel.img *.dtb │
│ (YOUR CODE!) (target kernel) (device tree) │
└─────────────────────────────────────────────────────────────────────────────┘
│
│ GPU loads kernel8.img to 0x80000
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ YOUR BOOTLOADER │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ start.S │────►│ main.c │────►│ kernel │ │
│ │ │ │ │ │ (payload) │ │
│ │ • Stack │ │ • UART init │ │ │ │
│ │ • BSS clear │ │ • GPIO LED │ │ │ │
│ │ • Call main │ │ • Load kern │ │ │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ Support Modules │ │
│ ├─────────────┬─────────────┬─────────────┬─────────────┬─────────────┤ │
│ │ uart.c │ gpio.c │ timer.c │ mailbox.c │ sd.c │ │
│ │ Serial I/O │ LED Ctrl │ Delays │ GPU Comm │ SD Access │ │
│ └─────────────┴─────────────┴─────────────┴─────────────┴─────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
4.2 Key Components
- Bootstrap Assembly (start.S)
- Entry point at 0x80000 (Pi 3 64-bit)
- Sets up stack pointer
- Clears BSS section (uninitialized data)
- Calls C main() function
- UART Module (uart.c)
- Configures GPIO pins 14/15 for UART function
- Sets baud rate divisors
- Enables transmit and receive
- Provides putc, puts, printf functions
- GPIO Module (gpio.c)
- Configures pins as input/output
- Controls LED (GPIO 47 on Pi 3, GPIO 42 on Pi 4)
- Sets up alternate functions for UART
- Timer Module (timer.c)
- Access to system timer
- Delay functions (milliseconds, microseconds)
- For LED blinking timing
- Mailbox Module (mailbox.c)
- GPU-CPU communication
- Query hardware information (board model, MAC address, temperature)
- SD Card Module (sd.c) - Optional for kernel loading
- SD card initialization
- FAT filesystem reading
- Kernel image loading
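The GPIO and timer modules come together in the blink pattern. One way to keep the pattern readable is to describe it as data and let a small runner play it back. This is only a sketch: the durations, `blink_step_t`, and the `led_set`/`delay_ms` callbacks are assumptions standing in for whatever gpio.c and timer.c end up exposing.

```c
#include <stddef.h>

typedef struct {
    int      led_on; /* 1 = LED lit, 0 = LED dark */
    unsigned ms;     /* how long to hold that state */
} blink_step_t;

#define BLINK_SHORT_MS 150u
#define BLINK_LONG_MS  600u
#define BLINK_GAP_MS   150u

/* "3 short, 1 long", each blink followed by a dark gap. */
static const blink_step_t BOOT_PATTERN[] = {
    {1, BLINK_SHORT_MS}, {0, BLINK_GAP_MS},
    {1, BLINK_SHORT_MS}, {0, BLINK_GAP_MS},
    {1, BLINK_SHORT_MS}, {0, BLINK_GAP_MS},
    {1, BLINK_LONG_MS},  {0, BLINK_GAP_MS},
};
#define BOOT_PATTERN_LEN (sizeof BOOT_PATTERN / sizeof BOOT_PATTERN[0])

/* Hypothetical hardware hooks supplied by the GPIO and timer modules. */
typedef void (*led_set_fn)(int on);
typedef void (*delay_ms_fn)(unsigned ms);

/* Play a pattern: set the LED state, then hold it for the step's duration. */
static inline void blink_run(const blink_step_t *steps, size_t count,
                             led_set_fn led_set, delay_ms_fn delay_ms) {
    for (size_t i = 0; i < count; i++) {
        led_set(steps[i].led_on);
        delay_ms(steps[i].ms);
    }
}

/* Total playback time, useful for checking against a boot-time budget. */
static inline unsigned blink_total_ms(const blink_step_t *steps, size_t count) {
    unsigned total = 0;
    for (size_t i = 0; i < count; i++)
        total += steps[i].ms;
    return total;
}
```

Keeping the pattern in a table makes it easy to add distinct patterns for different failure points later without touching the runner.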
4.3 Data Structures
// Peripheral base addresses
#define PERIPHERAL_BASE_PI3 0x3F000000
#define PERIPHERAL_BASE_PI4 0xFE000000
// GPIO registers structure
typedef struct {
volatile uint32_t GPFSEL[6]; // Function select
volatile uint32_t reserved0;
volatile uint32_t GPSET[2]; // Pin output set
volatile uint32_t reserved1;
volatile uint32_t GPCLR[2]; // Pin output clear
volatile uint32_t reserved2;
volatile uint32_t GPLEV[2]; // Pin level
// ... more registers
} gpio_t;
// UART registers structure (PL011)
typedef struct {
volatile uint32_t DR; // Data register
volatile uint32_t RSRECR; // Error clear
volatile uint32_t reserved[4];
volatile uint32_t FR; // Flag register
volatile uint32_t reserved1;
volatile uint32_t ILPR; // Not used
volatile uint32_t IBRD; // Integer baud rate divisor
volatile uint32_t FBRD; // Fractional baud rate divisor
volatile uint32_t LCRH; // Line control register
volatile uint32_t CR; // Control register
// ... more registers
} uart_t;
// Boot information passed to kernel
typedef struct {
uint32_t magic; // Magic number for validation
uint32_t version; // Boot protocol version
uint32_t mem_base; // RAM base address
uint32_t mem_size; // RAM size in bytes
uint32_t cmdline_addr; // Address of command line string
uint32_t initrd_addr; // Initramfs address (if loaded)
uint32_t initrd_size; // Initramfs size
} boot_info_t;
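Register structs like these only work if every field lands on the documented offset, and a misplaced reserved word is easy to miss. A compile-time check is one way to guard against that. The sketch below repeats the GPIO layout under a separate, hypothetical name (`gpio_check_t`) so it stands alone; in a real build the assertions would sit next to the actual `gpio_t` definition.

```c
#include <stddef.h>
#include <stdint.h>

/* Same field layout as the GPIO register struct above. */
typedef struct {
    volatile uint32_t GPFSEL[6];
    volatile uint32_t reserved0;
    volatile uint32_t GPSET[2];
    volatile uint32_t reserved1;
    volatile uint32_t GPCLR[2];
    volatile uint32_t reserved2;
    volatile uint32_t GPLEV[2];
} gpio_check_t;

/* Offsets taken from the GPIO register table earlier in this guide.
 * The build fails if a field drifts from its documented position. */
_Static_assert(offsetof(gpio_check_t, GPSET) == 0x1C, "GPSET0 must sit at 0x1C");
_Static_assert(offsetof(gpio_check_t, GPCLR) == 0x28, "GPCLR0 must sit at 0x28");
_Static_assert(offsetof(gpio_check_t, GPLEV) == 0x34, "GPLEV0 must sit at 0x34");
```

`_Static_assert` requires a C11 compiler, which current GCC cross-toolchains provide.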
4.4 Algorithm Overview
Main Boot Flow:
1. ENTRY (start.S):
- Disable interrupts
- Set stack pointer to safe location
- Clear BSS section to zero
- Jump to main()
2. HARDWARE INIT (main.c):
a. Detect Pi model (e.g., from the CPU part number) and select the peripheral base
b. Initialize UART:
- Set GPIO 14/15 to alt function 0
- Configure baud rate (115200)
- Enable UART TX/RX
c. Print boot banner
d. Initialize GPIO for LED
3. HARDWARE PROBE:
- Query board revision via mailbox
- Get ARM memory size
- Get temperature
- Print hardware info
4. LED BLINK:
- Blink pattern (shows we're alive)
- Use timer for consistent timing
5. KERNEL LOAD (if implemented):
- Initialize SD card
- Mount FAT partition
- Read kernel.img
- Load to target address
6. KERNEL HANDOFF:
- Set register arguments (AArch64):
- x0 = DTB address
- x1 = 0 (reserved)
- x2 = 0 (reserved)
- x3 = 0 (reserved)
- For a 32-bit kernel instead: r0 = 0, r1 = machine type, r2 = DTB address
- Disable caches (if enabled)
- Jump to kernel entry point
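Step 6 can be expressed in C for the 64-bit case without any extra assembly, because the AArch64 calling convention passes the first four integer arguments in x0-x3. The following is a sketch with hypothetical names (`handoff_regs_t`, `prepare_handoff`, `jump_to_kernel`); it assumes caches and the MMU have already been dealt with, and it does not cover the 32-bit register convention.

```c
#include <stdint.h>

/* Register values for an AArch64 kernel entry. */
typedef struct {
    uint64_t x0; /* physical address of the device tree blob */
    uint64_t x1; /* reserved, must be 0 */
    uint64_t x2; /* reserved, must be 0 */
    uint64_t x3; /* reserved, must be 0 */
} handoff_regs_t;

typedef void (*kernel_entry_t)(uint64_t x0, uint64_t x1, uint64_t x2, uint64_t x3);

/* Build the register set for a given DTB address. */
static inline handoff_regs_t prepare_handoff(uint64_t dtb_addr) {
    handoff_regs_t regs = { dtb_addr, 0, 0, 0 };
    return regs;
}

/* An ordinary call places the four arguments in x0-x3.
 * The kernel is not expected to return. */
static inline void jump_to_kernel(uintptr_t entry_addr, handoff_regs_t regs) {
    kernel_entry_t entry = (kernel_entry_t)entry_addr;
    entry(regs.x0, regs.x1, regs.x2, regs.x3);
}
```

Separating "prepare" from "jump" lets the bootloader print the values over UART just before the point of no return, which helps when the kernel stays silent.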
5. Implementation Guide
5.1 Development Environment Setup
Required Tools:
# Cross-compiler for ARM64 (Pi 3/4 in 64-bit mode)
sudo apt install gcc-aarch64-linux-gnu
# Cross-compiler for ARM32 (Pi 2, or 32-bit mode)
sudo apt install gcc-arm-linux-gnueabihf
# QEMU for testing without real hardware
sudo apt install qemu-system-arm
# Serial terminal
sudo apt install screen minicom
# Hex tools
sudo apt install xxd
Hardware Required:
- Raspberry Pi 3 or 4
- microSD card (4GB minimum)
- USB-to-TTL serial adapter (FTDI, CP2102, or similar)
- 3 jumper wires (connect GPIO 14, 15, and GND)
- Power supply for Pi
Serial Connection:
USB-TTL Adapter Raspberry Pi GPIO
┌───────────┐ ┌───────────────────┐
│ GND ──────┼────────────┤ GND (Pin 6) │
│ TXD ──────┼────────────┤ RXD/GPIO 15 │
│ RXD ──────┼────────────┤ TXD/GPIO 14 │
│ VCC │ │ (DO NOT CONNECT!) │
└───────────┘ └───────────────────┘
WARNING: Do NOT connect VCC from USB adapter to Pi!
Power the Pi separately via USB-C/micro-USB.
5.2 Project Structure
rpi-bootloader/
├── src/
│ ├── start.S # Assembly entry point
│ ├── main.c # Main bootloader code
│ ├── uart.c # UART driver
│ ├── uart.h
│ ├── gpio.c # GPIO driver
│ ├── gpio.h
│ ├── timer.c # System timer
│ ├── timer.h
│ ├── mailbox.c # GPU mailbox interface
│ ├── mailbox.h
│ ├── mmio.h # Memory-mapped I/O helpers
│ └── kernel/ # Simple test kernel
│ ├── kernel.S
│ └── kernel.c
├── linker.ld # Linker script
├── config.txt # Pi boot configuration
├── Makefile # Build system
└── README.md # Documentation
5.3 The Core Question You’re Answering
How do embedded systems boot and initialize hardware when there is no operating system, no drivers, and no abstraction layers - just raw silicon?
This project forces you to understand that every convenience we take for granted (printf, memory allocation, interrupts) must be built from scratch when there’s nothing but a CPU and some addressable memory.
5.4 Concepts You Must Understand First
- ARM Architecture Basics (“Making Embedded Systems” Chapter 4, Elecia White)
- Self-assessment: Can you explain the difference between ARM and Thumb instruction sets? What is a load-store architecture?
- Why it matters: ARM assembly is different from x86; understanding the philosophy helps write correct code.
- Memory-Mapped I/O (BCM2835 ARM Peripherals Manual)
- Self-assessment: How do you read a hardware register in ARM? Why must peripheral registers be declared volatile?
- Why it matters: All hardware control on ARM is done through memory addresses.
- Cross-Compilation (“Mastering Embedded Linux Programming” Chapter 2, Chris Simmonds)
- Self-assessment: What does CROSS_COMPILE=aarch64-linux-gnu- mean? Why can’t you use your host compiler?
- Why it matters: Building code for a different architecture requires understanding toolchains.
- Linker Scripts (“Computer Systems: A Programmer’s Perspective” Chapter 7, Bryant & O’Hallaron)
- Self-assessment: What does a linker script control? What is the difference between VMA and LMA?
- Why it matters: Bare-metal code must be linked to specific addresses.
- Serial Communication (UART) (“Bare Metal C” Chapter 6, Steve Oualline)
- Self-assessment: What does 115200 8N1 mean? How is a character transmitted serially?
- Why it matters: Serial output is your primary debugging tool in bare-metal work, alongside LED patterns.
5.5 Questions to Guide Your Design
Architecture Decisions:
- Will you target 32-bit (ARMv7) or 64-bit (ARMv8/AArch64)?
- Which Pi model(s) will you support initially?
- How will you handle the different peripheral base addresses?
Hardware Initialization:
- What must happen before you can print to UART?
- In what order should peripherals be initialized?
- How will you detect which Pi model is running?
Debugging Strategy:
- How will you know code is running if UART isn’t working?
- What LED pattern indicates successful boot?
- How can you use QEMU before testing on real hardware?
Kernel Loading:
- Where will the kernel be loaded in memory?
- What register state does Linux expect at entry?
- How will you pass the device tree address?
5.6 Thinking Exercise
Before writing any code, trace the complete path from power to your code:
- Draw a timeline showing:
- GPU ROM execution
- bootcode.bin loading
- start.elf loading
- Your kernel8.img loading
- ARM CPU release from reset
- Your code starting
- For each phase, answer:
- What hardware is initialized?
- What memory is available?
- Who is in control (GPU or ARM)?
- Map out the GPIO path for UART:
- Which pins are GPIO 14 and 15 on the physical header?
- What alternate function is UART TX/RX?
- What bits in which registers must be set?
Expected Insights:
- The GPU does most of the hard initialization work for you
- Your code benefits from RAM being ready and clocks running
- The peripheral base address is your key to all hardware
- Everything must be configured before use - nothing “just works”
5.7 Hints in Layers
Hint 1: Getting Started (Conceptual Direction)
Start with a minimal assembly stub that just blinks the LED. Don’t try to initialize UART first - you can’t debug UART without another working output. The LED proves your code is running.
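For Pi 3, this guide assumes the ACT LED is GPIO 47; for Pi 4, it’s GPIO 42 (directly controllable). The wiring varies by board revision: on the Pi 3 Model B+ the ACT LED is on GPIO 29, and on the original Pi 3 Model B it sits behind a GPIO expander that is reached through the mailbox rather than a GPIO register. Check the device tree source for your exact board before assuming a pin.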
Get QEMU working first. Test your bootloader with:
qemu-system-aarch64 -M raspi3b -serial stdio -kernel kernel8.img
Hint 2: Minimal Bootstrap (More Specific Guidance)
Your start.S should look something like this:
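.section ".text.boot"
.global _start
_start:
    // Read the core ID; only core 0 continues, the others are parked
    mrs x1, mpidr_el1
    and x1, x1, #3
    cbz x1, master
    b hang
master:
    // Set stack pointer just below our code (the stack grows downward)
    ldr x1, =_start
    mov sp, x1
    // Clear BSS. __bss_start and __bss_size come from the linker script;
    // __bss_size must be a count of 8-byte words, not bytes, because
    // each iteration stores 8 bytes and decrements the count by 1
    ldr x1, =__bss_start
    ldr w2, =__bss_size
1:  cbz w2, 2f
    str xzr, [x1], #8
    sub w2, w2, #1
    b 1b
2:  // Jump to C code; main() should never return
    bl main
    b hang
hang:
    wfe
    b hang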
Hint 3: UART Initialization (Technical Details)
Initialize UART in this order:
void uart_init(void) {
// 1. Disable UART
mmio_write(UART0_CR, 0);
// 2. Setup GPIO pins 14 and 15
unsigned int selector = mmio_read(GPFSEL1);
selector &= ~(7 << 12); // Clear GPIO 14
selector |= 4 << 12; // Set alt0 for GPIO 14
selector &= ~(7 << 15); // Clear GPIO 15
selector |= 4 << 15; // Set alt0 for GPIO 15
mmio_write(GPFSEL1, selector);
// 3. Disable pull-up/down for pins 14, 15 (BCM2835/2837 sequence;
//    the Pi 4 uses different pull-control registers instead of GPPUD)
mmio_write(GPPUD, 0);
delay(150);
mmio_write(GPPUDCLK0, (1 << 14) | (1 << 15));
delay(150);
mmio_write(GPPUDCLK0, 0);
// 4. Clear pending interrupts
mmio_write(UART0_ICR, 0x7FF);
// 5. Set baud rate (115200 with 48MHz clock)
mmio_write(UART0_IBRD, 26);
mmio_write(UART0_FBRD, 3);
// 6. Enable FIFO, 8 bit data
mmio_write(UART0_LCRH, (1 << 4) | (1 << 5) | (1 << 6));
// 7. Enable UART TX and RX
mmio_write(UART0_CR, (1 << 0) | (1 << 8) | (1 << 9));
}
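Once the UART is initialized, output is a matter of polling the flag register before each write. The sketch below shows one possible `uart_putc`/`uart_puts` pair. It takes a pointer to the register block (a trimmed, hypothetical `uart_min_t` covering only DR and FR) rather than using the `mmio_write` macros above, so the logic can be tried against a plain struct on a development machine.

```c
#include <stdint.h>

#define UART_FR_TXFF (1u << 5) /* FR bit 5: transmit FIFO full */

/* Only the registers this sketch touches: DR at 0x00 and FR at 0x18. */
typedef struct {
    volatile uint32_t DR;
    volatile uint32_t reserved[5];
    volatile uint32_t FR;
} uart_min_t;

/* Wait until the transmit FIFO has room, then queue one character. */
static inline void uart_putc(uart_min_t *uart, char c) {
    while (uart->FR & UART_FR_TXFF) {
        /* spin: FIFO is full */
    }
    uart->DR = (uint32_t)(unsigned char)c;
}

/* Send a NUL-terminated string, expanding '\n' to "\r\n" so that
 * serial terminals return to column 0 on each new line. */
static inline void uart_puts(uart_min_t *uart, const char *s) {
    for (; *s; s++) {
        if (*s == '\n')
            uart_putc(uart, '\r');
        uart_putc(uart, *s);
    }
}
```

On a Pi 3 the pointer would be `(uart_min_t *)0x3F201000`, the PL011 base from the memory map.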
Hint 4: Testing and Debugging (Tools and Verification)
QEMU testing command:
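# Run with serial output to terminal. On raspi3b the first -serial is the
# PL011 (UART0) and the second is the mini UART (UART1), so the form below
# shows mini UART output. For PL011 code such as Hint 3, pass a single
# "-serial stdio" instead.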
qemu-system-aarch64 -M raspi3b -serial null -serial stdio -kernel kernel8.img
# Run with GDB debugging
qemu-system-aarch64 -M raspi3b -serial null -serial stdio -kernel kernel8.img -s -S
# In another terminal:
aarch64-linux-gnu-gdb kernel8.elf
(gdb) target remote localhost:1234
(gdb) break main
(gdb) continue
For real hardware:
# Connect serial adapter and open terminal
screen /dev/ttyUSB0 115200
# Or with minicom
minicom -b 115200 -D /dev/ttyUSB0
Verification checklist:
- LED blinks: Code is running
- Serial output: UART initialized correctly
- Hardware info printed: Mailbox working
- Kernel loaded: SD card access working
5.8 The Interview Questions They’ll Ask
- “Explain the Raspberry Pi boot process. Why does the GPU boot first?”
- What they’re testing: Understanding of SoC architecture and boot sequences
- Strong answer: “The Pi’s Broadcom SoC comes from a multimedia chip family where the VideoCore GPU was the primary processor. The GPU starts from its own boot ROM, and its firmware stages initialize SDRAM, load the main GPU firmware, and finally start the ARM CPU. This is different from x86 where the CPU starts first. By the time my bare-metal code runs, the GPU has already initialized RAM, clocks, and basic peripherals, which actually simplifies bare-metal programming.”
- “How does memory-mapped I/O work, and why is volatile important?”
- What they’re testing: Understanding of hardware access in C
- Strong answer: “ARM doesn’t have separate I/O instructions like x86. Peripherals appear at specific memory addresses. To configure UART, I write to addresses like 0x3F201030. volatile tells the compiler not to optimize away reads/writes - without it, the compiler might think a repeated write to the same address is redundant and remove it, but hardware registers have side effects.”
- “What challenges did you face getting the first serial output working?”
- What they’re testing: Real debugging experience
- Strong answer: “The chicken-and-egg problem: I couldn’t debug UART without serial output. I solved this by first blinking an LED - a single GPIO write that proves code is running. Common issues include wrong GPIO alternate function selection, wrong peripheral base address for the Pi model, and baud rate calculation errors. Testing in QEMU first eliminates hardware connection issues.”
- “How would you port this to a different ARM board?”
- What they’re testing: Generalization ability
- Strong answer: “The principles are the same, but specifics differ: peripheral base address, GPIO/UART register layouts, boot protocol (some boards use U-Boot), and memory map. I’d start by reading the SoC datasheet to find peripheral addresses, then test each component individually. The architecture - assembly entry, BSS clear, C main, MMIO access - is reusable.”
- “How do you ensure the bootloader correctly hands off to the kernel?”
- What they’re testing: Understanding of boot protocols
- Strong answer: “Linux expects specific register values at entry: x0 contains the device tree address, x1-x3 should be zero. Caches and MMU should be disabled. I pass boot arguments through the command line in the device tree or via a legacy ATAGS structure. Testing with a minimal kernel that just prints ‘Hello’ verifies the handoff.”
5.9 Books That Will Help
| Topic | Book & Chapter |
|---|---|
| Raspberry Pi bare metal | “Bare Metal C” by Steve Oualline, Chapters 7-8 (Raspberry Pi specifics) |
| ARM exception model | “Making Embedded Systems” by Elecia White, Chapter 4 (Interrupts and Exceptions) |
| BCM2835/2837 peripherals | BCM2835 ARM Peripherals Manual |
| UART programming | “Bare Metal C” by Steve Oualline, Chapter 6 (Serial Communication) |
| Device tree basics | Device Tree for Dummies |
| ARM architecture | “ARM System Developer’s Guide” by Andrew Sloss et al., Chapters 1-3 |
| Cross-compilation | “Mastering Embedded Linux Programming” by Chris Simmonds, Chapter 2 |
| Embedded debugging | “Making Embedded Systems” by Elecia White, Chapter 9 (Debugging) |
5.10 Implementation Phases
Phase 1: LED Blink (2-3 days)
- Set up cross-compilation toolchain
- Write minimal start.S
- Create linker script for 0x80000
- Implement GPIO output for LED
- Test in QEMU and on real Pi
Phase 2: UART Output (3-4 days)
- Configure GPIO for UART alternate function
- Initialize UART registers
- Implement putc() and puts()
- Print boot banner
- Implement simple printf()
Phase 3: Hardware Detection (2-3 days)
- Implement mailbox interface
- Query board model
- Get memory size and temperature
- Print hardware information
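The mailbox query in this phase boils down to filling a small word buffer in the property-tag format and handing its address to the GPU. The sketch below builds a "get board revision" request. The function and macro names are illustrative, and the mailbox register access itself (writing the message word and polling for the reply) is left out.

```c
#include <stdint.h>

#define MBOX_REQUEST           0x00000000u
#define MBOX_TAG_GET_BOARD_REV 0x00010002u
#define MBOX_TAG_END           0x00000000u
#define MBOX_CHANNEL_PROPERTY  8u

/* Fill `buf` with a single-tag property request and return the number
 * of words used. On real hardware the buffer must be 16-byte aligned. */
static inline unsigned mbox_build_board_rev(volatile uint32_t buf[8]) {
    buf[0] = 7 * sizeof(uint32_t);  /* total message size in bytes */
    buf[1] = MBOX_REQUEST;          /* marks the buffer as a request */
    buf[2] = MBOX_TAG_GET_BOARD_REV;
    buf[3] = 4;                     /* value buffer size in bytes */
    buf[4] = 0;                     /* tag request code */
    buf[5] = 0;                     /* firmware writes the revision here */
    buf[6] = MBOX_TAG_END;
    return 7;
}

/* Word written to the mailbox: the buffer address with its low 4 bits
 * replaced by the channel number (hence the alignment requirement). */
static inline uint32_t mbox_message(uint32_t buf_addr, uint32_t channel) {
    return (buf_addr & ~0xFu) | (channel & 0xFu);
}
```

Other queries such as memory size and temperature follow the same shape with a different tag and value-buffer size.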
Phase 4: Kernel Loading (4-5 days - optional)
- Initialize SD card
- Read FAT filesystem
- Load kernel to memory
- Set up registers
- Jump to kernel
Phase 5: Polish (2-3 days)
- Add boot timeout
- Implement kernel command line
- Add checksum verification
- Create documentation
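For the checksum step, the guide does not prescribe an algorithm. One reasonable choice is the standard CRC-32 used by zlib and Ethernet, sketched below in its bit-at-a-time form. The function name `crc32_ieee` is an assumption, and a table-driven version would be faster for large kernels.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, initial value and
 * final XOR of 0xFFFFFFFF). Needs no lookup table, so it adds almost
 * nothing to the bootloader's size. */
static inline uint32_t crc32_ieee(const void *data, size_t len) {
    const uint8_t *p = (const uint8_t *)data;
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= p[i];
        for (int bit = 0; bit < 8; bit++) {
            /* If the low bit is set, shift and XOR in the polynomial. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}
```

The bootloader would compare the result over the loaded image against an expected value stored alongside the kernel; where that expected value lives is a design decision left open here.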
5.11 Key Implementation Decisions
- 32-bit vs 64-bit: Start with 64-bit (AArch64) for Pi 3/4. It’s the modern standard and has cleaner assembly syntax.
- Mini UART vs PL011: Mini UART is simpler, but its baud rate is derived from the GPU core clock, so it shifts if that clock changes. PL011 is more reliable but more complex to configure. Start with Mini UART (GPIO 14/15 in alt5; the PL011 uses alt0).
- Polling vs Interrupts: Use polling for simplicity. Interrupts add complexity and aren’t needed for a basic bootloader.
- config.txt settings: Start with minimal settings:
enable_uart=1
kernel=kernel8.img
arm_64bit=1
6. Testing Strategy
Level 1: QEMU Testing
# Build and test in emulator
make
qemu-system-aarch64 -M raspi3b -serial null -serial stdio -kernel kernel8.img
Expected: Boot message appears in terminal
Level 2: LED Verification
- Flash bootloader to SD card
- Power on Pi
- Observe LED blink pattern (3 short, 1 long)
Level 3: Serial Communication
# Connect USB-TTL adapter and open terminal
screen /dev/ttyUSB0 115200
# Power cycle Pi
Expected: Full boot message with hardware info
Level 4: Kernel Loading
- Place test kernel on SD card
- Bootloader loads and jumps to it
- Test kernel outputs “Hello from kernel!”
Level 5: Hardware Matrix
- Test on Pi 3 Model B
- Test on Pi 3 Model B+
- Test on Pi 4 (different peripheral base)
7. Common Pitfalls & Debugging
Problem 1: Pi doesn’t boot at all (no LED activity)
Root cause: Wrong file format, corrupt bootcode.bin, bad SD card
Fix:
- Start with a working Raspberry Pi OS (formerly Raspbian) image
- Replace only kernel8.img with your code
- Verify GPU firmware files (bootcode.bin, start.elf) are present
- Try a different SD card
Problem 2: LED blinks but UART produces no output
Root cause: GPIO alternate function not set, baud rate mismatch, wrong UART
Fix:
- Verify GPIO 14/15 are set to alt0 for the PL011 (alt5 selects the Mini UART instead)
- Check baud rate calculation (48MHz / 115200 / 16)
- Make sure you are using the correct UART base address for your Pi model
- Verify serial adapter connection (TX to RX, RX to TX)
Problem 3: Works in QEMU but not on real hardware
Root cause: QEMU is more forgiving, timing issues, hardware differences
Fix:
- Add delays after GPIO/UART configuration
- Verify peripheral base address for your specific Pi model
- Check config.txt settings (enable_uart=1)
- Try reducing clock speed in config.txt
Problem 4: Code runs once then hangs on reset
Root cause: BSS not cleared, stack corruption, infinite loop in initialization
Fix:
- Ensure BSS section is zeroed in start.S
- Set stack pointer before any function calls
- Add LED toggle at each initialization step to find hang point
Problem 5: Kernel doesn’t boot after loading
Root cause: Wrong register state, incorrect load address, kernel expects different boot protocol
Fix:
- Verify x0 contains valid DTB address
- Check that the kernel expects to be loaded at the address you used
- Ensure caches are disabled before jump
- Test with a minimal kernel first (just prints hello)
8. Extensions & Challenges
Extension 1: Multi-core Boot
- Boot all 4 ARM cores
- Implement simple spinlock synchronization
- Have cores cooperate on a task (parallel LED chase)
Extension 2: Interrupt Handling
- Set up exception vector table
- Handle timer interrupts
- Implement simple preemptive multitasking
Extension 3: USB Keyboard Input
- Initialize USB controller
- Implement basic HID driver
- Boot menu with keyboard selection
Extension 4: Display Output
- Initialize framebuffer via mailbox
- Draw pixels directly
- Display boot logo
Extension 5: Network Boot
- Initialize Ethernet controller
- Implement TFTP client
- Download kernel over network
Extension 6: Secure Boot
- Verify kernel signature
- Implement chain of trust
- Explore ARM TrustZone
9. Real-World Connections
Android Boot Process: Android devices use similar bare-metal bootloaders. Understanding Pi bare-metal prepares you for Android bootloader development.
IoT Devices: Smart home devices, industrial controllers, and embedded systems use ARM processors with similar boot sequences. This knowledge directly applies.
U-Boot: Project 10 explores U-Boot, the industry-standard bootloader. Understanding bare-metal concepts makes U-Boot’s complexity comprehensible.
Custom Hardware: Product companies regularly need bootloader development for custom ARM boards. This skill is directly marketable.
Debugging Skills: The techniques learned here - LED debugging, serial output, systematic hardware bring-up - apply to any embedded system.
10. Resources
Official Documentation
Tutorials
Reference Implementations
- circle - C++ bare metal environment
- Mailbox Property Interface
Hardware
11. Self-Assessment Checklist
Fundamentals
- Can explain the GPU-first boot sequence
- Understand memory-mapped I/O concept
- Can read ARM assembly code
- Know difference between AArch32 and AArch64
Implementation
- LED blinks in controlled pattern
- Serial output works at 115200 baud
- Code works in both QEMU and real hardware
- Understand every line of start.S
Hardware Understanding
- Can configure GPIO pins for different functions
- Understand UART register configuration
- Know how to use mailbox for GPU communication
- Can debug hardware issues systematically
Advanced
- Can load and boot a secondary kernel
- Understand kernel handoff requirements
- Can explain differences between Pi models
- Have ideas for extending the bootloader
12. Submission / Completion Criteria
Minimum Viable Submission:
- Code compiles with ARM64 cross-compiler
- Boots in QEMU with serial output visible
- LED blinks in identifiable pattern
- Prints boot banner with “Bare Metal Bootloader v1.0”
- Works on at least one real Raspberry Pi
Complete Submission:
- All minimum criteria met
- Hardware detection (board model, temperature)
- Loads and boots a secondary kernel
- Clean, documented code
- Works on both Pi 3 and Pi 4
Excellence Criteria:
- Implements boot menu with timeout
- SD card FAT filesystem reading
- Kernel command line passing
- Comprehensive error handling
- Well-structured modular code
“When you’ve written code that runs on bare ARM silicon - no OS, no runtime, no safety net - you’ve touched the metal in a way few programmers ever do.”