Project 4: Raspberry Pi Bare Metal — Hello World

Build a bare metal kernel for the Raspberry Pi that boots without any operating system, initializes the UART, and prints “Hello World” to the serial console—demonstrating full control over a modern ARM system.


Quick Reference

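┌───────────────┬───────────────────────────────────────────────────────────────────┐
│ Attribute     │ Value                                                             │
├───────────────┼───────────────────────────────────────────────────────────────────┤
│ Difficulty    │ Advanced                                                          │
│ Time Estimate │ 2 weeks                                                           │
│ Language      │ C + ARM Assembly                                                  │
│ Platform      │ Raspberry Pi 3/4                                                  │
│ Prerequisites │ Projects 1-3, ARM basics                                          │
│ Key Topics    │ ARM Cortex-A, boot process, linker scripts, UART, BCM2837/BCM2711 │
└───────────────┴───────────────────────────────────────────────────────────────────┘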

1. Learning Objectives

By completing this project, you will:

  1. Understand the Raspberry Pi boot process — How the GPU loads firmware and your kernel
  2. Write ARM64 (AArch64) assembly — Startup code that initializes the CPU
  3. Create linker scripts — Control memory layout for bare metal code
  4. Configure memory-mapped peripherals — Access UART through physical addresses
  5. Handle multi-core startup — Park secondary cores while core 0 runs your code
  6. Debug without an OS — Use UART and LED for diagnostics
  7. Cross-compile for ARM — Build on your development machine, run on Pi

The Core Question: How does code start running on a computer before any operating system exists?


2. Theoretical Foundation

2.1 Raspberry Pi Boot Process

The Raspberry Pi has one of the most unusual boot processes among embedded systems. Unlike x86 PCs where the CPU starts first, the Pi’s GPU runs before the ARM CPU even powers on.

Raspberry Pi Boot Sequence:

Power On
    │
    ▼
┌─────────────────────────────────────────────────────────────────────┐
│                         GPU DOMAIN                                  │
│                                                                     │
│  ┌─────────────────┐                                               │
│  │ 1. GPU ROM      │  Hardcoded in silicon                         │
│  │    (First Stage)│  Loads bootcode.bin from SD                   │
│  └────────┬────────┘                                               │
│           │                                                         │
│           ▼                                                         │
│  ┌─────────────────┐                                               │
│  │ 2. bootcode.bin │  GPU firmware                                 │
│  │   (Second Stage)│  Enables SDRAM                                │
│  └────────┬────────┘  Loads start.elf                              │
│           │                                                         │
│           ▼                                                         │
│  ┌─────────────────┐                                               │
│  │ 3. start.elf    │  Main GPU firmware                            │
│  │    (Third Stage)│  Reads config.txt                             │
│  └────────┬────────┘  Loads kernel8.img to 0x80000                 │
│           │          Releases ARM core 0 from reset                │
└───────────┼─────────────────────────────────────────────────────────┘
            │
            ▼
┌─────────────────────────────────────────────────────────────────────┐
│                         ARM DOMAIN                                  │
│                                                                     │
│  ┌─────────────────┐                                               │
│  │ 4. kernel8.img  │  YOUR CODE STARTS HERE!                       │
│  │    @ 0x80000    │  ARM64 execution begins                       │
│  └────────┬────────┘  All 4 cores wake up simultaneously           │
│           │                                                         │
│           ▼                                                         │
│  ┌─────────────────┐                                               │
│  │ 5. Startup Code │  Park cores 1-3 in spinloop                   │
│  │    (Assembly)   │  Set up stack for core 0                      │
│  └────────┬────────┘  Clear BSS section                            │
│           │          Jump to C main()                              │
│           ▼                                                         │
│  ┌─────────────────┐                                               │
│  │ 6. C Kernel     │  Initialize UART                              │
│  │    main()       │  Print "Hello World!"                         │
│  └─────────────────┘  Your application logic                       │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

Key insight: You don’t write a bootloader for the Pi—the GPU handles that. Your kernel8.img is loaded to address 0x80000 and ARM execution begins there. The “8” in kernel8.img indicates 64-bit ARM (AArch64).

Files required on SD card:

  • bootcode.bin — GPU second-stage bootloader (from Raspberry Pi Foundation; Pi 3 only, the Pi 4 boots from on-board EEPROM)
  • start.elf — GPU firmware (from Raspberry Pi Foundation; start4.elf on Pi 4)
  • fixup.dat — Memory configuration (from Raspberry Pi Foundation; fixup4.dat on Pi 4)
  • kernel8.img — YOUR kernel binary
  • config.txt — (Optional) Boot configuration

2.2 ARM Cortex-A Architecture

The Raspberry Pi 3 uses the Cortex-A53 and the Pi 4 uses the Cortex-A72. Both implement ARMv8-A and support 64-bit (AArch64) execution.

ARM Cortex-A53/A72 Register Set (AArch64):

General Purpose Registers (64-bit):
┌────────────────────────────────────────────────────────────┐
│  X0-X7    │  Arguments/Results (caller-saved)              │
│  X8       │  Indirect result location register            │
│  X9-X15   │  Temporary (caller-saved)                     │
│  X16-X17  │  Intra-procedure-call scratch (IP0, IP1)      │
│  X18      │  Platform register (reserved)                 │
│  X19-X28  │  Callee-saved registers                       │
│  X29      │  Frame pointer (FP)                           │
│  X30      │  Link register (LR) - return address          │
│  SP       │  Stack pointer (per exception level)          │
│  PC       │  Program counter (not directly accessible)    │
└────────────────────────────────────────────────────────────┘

W0-W30 = Lower 32 bits of X0-X30

Special Registers:
┌────────────────────────────────────────────────────────────┐
│  MPIDR_EL1   │  Multiprocessor affinity - which core am I? │
│  SCTLR_EL1   │  System control register                   │
│  CurrentEL   │  Current exception level                   │
│  VBAR_EL1    │  Vector base address (exception handlers)  │
└────────────────────────────────────────────────────────────┘

Exception Levels (EL):
┌─────────────────────────────────────────────────────────────┐
│  EL3  │  Secure Monitor (TrustZone) - Highest privilege    │
├───────┼─────────────────────────────────────────────────────┤
│  EL2  │  Hypervisor - Virtualization                       │
├───────┼─────────────────────────────────────────────────────┤
│  EL1  │  Kernel/OS - Where bare metal code typically runs  │
├───────┼─────────────────────────────────────────────────────┤
│  EL0  │  User mode - Applications (lowest privilege)       │
└───────┴─────────────────────────────────────────────────────┘

On Pi boot, the firmware hands over to your kernel at EL2 (EL3 is used only by the firmware’s boot stub)

What you need to know for this project:

  1. Core identification: Read MPIDR_EL1 and mask the Aff0 field to get the core ID (bits 0-1 are enough for four cores)
  2. Exception level: Kernel code runs at EL1
  3. Stack setup: Each core needs its own stack, pointed to by SP
  4. Calling convention: First 8 arguments in X0-X7, return in X0
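
The first two items reduce to bit-field extraction. A minimal sketch, assuming the raw register values have already been read with `mrs` in the startup assembly; the helper names `core_id_from_mpidr` and `el_from_currentel` are illustrative and not part of any library:

```c
#include <stdint.h>

/* Aff0 occupies MPIDR_EL1[7:0]. On the four-core Pi 3/4 the low two bits
 * are enough to tell the cores apart (assumption: no more than 4 cores). */
static inline unsigned core_id_from_mpidr(uint64_t mpidr)
{
    return (unsigned)(mpidr & 0x3u);
}

/* CurrentEL reports the exception level in bits [3:2]; all other bits
 * read as zero. */
static inline unsigned el_from_currentel(uint64_t current_el)
{
    return (unsigned)((current_el >> 2) & 0x3u);
}
```

Keeping the decoding separate from the register read lets the same helpers be checked on a development machine before they ever run on the Pi.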

2.3 Linker Scripts Deep Dive

A linker script tells the linker how to arrange your code in memory. For bare metal, this is critical—there’s no OS to relocate your code.

Linker Script Structure:

┌─────────────────────────────────────────────────────────────────────┐
│  ENTRY(_start)                 <- Execution entry point             │
│                                                                     │
│  SECTIONS {                    <- Memory section definitions        │
│      . = 0x80000;              <- Start address (where GPU loads)   │
│                                                                     │
│      .text.boot : {            <- Boot code MUST be first           │
│          *(.text.boot)         <- Wildcard: all .text.boot inputs   │
│      }                                                              │
│                                                                     │
│      .text : {                 <- Main code section                 │
│          *(.text)                                                   │
│      }                                                              │
│                                                                     │
│      .rodata : {               <- Read-only data (strings, etc.)    │
│          *(.rodata)                                                 │
│      }                                                              │
│                                                                     │
│      .data : {                 <- Initialized global variables      │
│          *(.data)                                                   │
│      }                                                              │
│                                                                     │
│      .bss : {                  <- Uninitialized globals (zeroed)    │
│          __bss_start = .;      <- Symbol for BSS start              │
│          *(.bss)                                                    │
│          __bss_end = .;        <- Symbol for BSS end                │
│      }                                                              │
│  }                                                                  │
└─────────────────────────────────────────────────────────────────────┘

Memory Layout After Linking:

                    0x00000000 ─────────────────────────────────┐
                                │ Firmware stub, boot data      │
                                │ Stack (down from 0x80000)     │
                    0x00080000 ─────────────────────────────────┤
                                │ .text.boot (entry point)      │ ◄── _start
                                │ .text (C code)                │
                                │ .rodata (strings)             │
                                │ .data (initialized vars)      │
                                │ .bss (zero-initialized)       │
                    0x000XXXXX ─────────────────────────────────┤
                                │                               │
                                │ FREE MEMORY                   │
                                │ (available for heap, buffers) │
                                │                               │
                    0x3B400000 ─────────────────────────────────┤
                                │ GPU memory (varies)           │
                    0x40000000 ─────────────────────────────────┘
                                  (1GB address space - Pi 3)

Critical requirements:

  1. Entry point at 0x80000: GPU loads kernel here
  2. Boot code first: _start assembly must be at the very beginning
  3. BSS symbols exported: Startup code needs to zero BSS
  4. Alignment: Some sections need 4K alignment for MMU (not needed for hello world)
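
Requirement 3 exists so that startup code can zero the range between the two symbols. The project does this in assembly; the same loop in C might look like the sketch below. `zero_words` is a hypothetical helper, and it assumes both bounds are 8-byte aligned, which the linker script shown above does not enforce by itself.

```c
#include <stdint.h>

/* Zero every 64-bit word in [start, end). The bound is checked before the
 * first store, so an empty region is left untouched. */
static void zero_words(volatile uint64_t *start, volatile uint64_t *end)
{
    while (start < end) {
        *start++ = 0;
    }
}
```

On the Pi this would be called as `zero_words(__bss_start, __bss_end)`, with both symbols declared as `extern uint64_t` arrays so that their addresses (not their contents) are used.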

2.4 BCM2837/BCM2711 Peripherals

The Raspberry Pi’s peripherals are memory-mapped—you read and write to specific physical addresses to control hardware.

Raspberry Pi Peripheral Memory Map:

Pi 3 (BCM2837):                      Pi 4 (BCM2711):
┌────────────────────────────────┐   ┌────────────────────────────────┐
│ Peripheral Base: 0x3F000000    │   │ Peripheral Base: 0xFE000000    │
├────────────────────────────────┤   ├────────────────────────────────┤
│ GPIO:      Base + 0x200000     │   │ GPIO:      Base + 0x200000     │
│ UART0:     Base + 0x201000     │   │ UART0:     Base + 0x201000     │
│ AUX/UART1: Base + 0x215000     │   │ AUX/UART1: Base + 0x215000     │
│ Mailbox:   Base + 0x00B880     │   │ Mailbox:   Base + 0x00B880     │
│ Interrupt: Base + 0x00B200     │   │ GIC:       0xFF840000          │
└────────────────────────────────┘   └────────────────────────────────┘
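
Because only the base address differs between the two boards, the map above can be captured with one base lookup plus shared offsets. This is a sketch; the `pi_model` enum and the function names are illustrative choices, not an established API:

```c
#include <stdint.h>

/* Hypothetical board selector for this sketch. */
enum pi_model { PI_MODEL_3 = 3, PI_MODEL_4 = 4 };

/* Offsets are identical on BCM2837 and BCM2711 (see the map above). */
#define GPIO_OFFSET  0x200000u
#define UART0_OFFSET 0x201000u

static inline uintptr_t peripheral_base(enum pi_model model)
{
    return (model == PI_MODEL_4) ? 0xFE000000u : 0x3F000000u;
}

static inline uintptr_t gpio_base(enum pi_model model)
{
    return peripheral_base(model) + GPIO_OFFSET;
}

static inline uintptr_t uart0_base(enum pi_model model)
{
    return peripheral_base(model) + UART0_OFFSET;
}
```

A build that targets a single board could replace the runtime lookup with a compile-time `#define` chosen in the Makefile.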

UART0 (PL011) Register Map @ 0x3F201000 (Pi 3):
┌─────────────────────────────────────────────────────────────────────┐
│ Offset │ Name      │ Description                                   │
├────────┼───────────┼───────────────────────────────────────────────┤
│ 0x00   │ DR        │ Data Register - read/write characters         │
│ 0x04   │ RSRECR    │ Receive status / error clear                  │
│ 0x18   │ FR        │ Flag Register - TX/RX status                  │
│ 0x24   │ IBRD      │ Integer baud rate divisor                     │
│ 0x28   │ FBRD      │ Fractional baud rate divisor                  │
│ 0x2C   │ LCRH      │ Line control (data bits, parity, FIFO)        │
│ 0x30   │ CR        │ Control Register (enable TX/RX)               │
│ 0x34   │ IFLS      │ Interrupt FIFO level select                   │
│ 0x38   │ IMSC      │ Interrupt mask set/clear                      │
│ 0x44   │ ICR       │ Interrupt clear register                      │
└─────────────────────────────────────────────────────────────────────┘

Flag Register (FR) Bits:
┌─────────────────────────────────────────────────────────────────────┐
│ Bit │ Name │ Description                                           │
├─────┼──────┼───────────────────────────────────────────────────────┤
│  7  │ TXFE │ Transmit FIFO empty                                   │
│  5  │ TXFF │ Transmit FIFO full (wait if set before writing)       │
│  4  │ RXFE │ Receive FIFO empty (no data to read)                  │
│  3  │ BUSY │ UART busy transmitting                                │
└─────────────────────────────────────────────────────────────────────┘

Transmit Algorithm:
┌──────────────────────────┐
│ while (FR & TXFF)        │ ← Wait until TX FIFO not full
│     ; // spin            │
│ DR = character;          │ ← Write character to data register
└──────────────────────────┘
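
The transmit algorithm translates almost line for line into C. In this sketch the UART base is passed in as a pointer (an assumption made so the code can be exercised against a plain array on a development machine); the register indices come from the map above:

```c
#include <stdint.h>

/* Register indices into a uint32_t array (byte offset / 4). */
#define UART_DR  (0x00 / 4)
#define UART_FR  (0x18 / 4)
#define FR_TXFF  (1u << 5)   /* transmit FIFO full */

/* Spin while the TX FIFO is full, then write one character to DR. */
static void uart_putc(volatile uint32_t *uart, char c)
{
    while (uart[UART_FR] & FR_TXFF) {
        /* busy-wait */
    }
    uart[UART_DR] = (uint32_t)(unsigned char)c;
}

/* Send a NUL-terminated string; an empty string sends nothing. */
static void uart_puts(volatile uint32_t *uart, const char *s)
{
    while (*s) {
        uart_putc(uart, *s++);
    }
}
```

On a Pi 3 the pointer would be `(volatile uint32_t *)0x3F201000`. The `volatile` qualifier matters: without it the compiler may hoist the FR read out of the loop.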

GPIO Configuration for UART:

The UART pins (GPIO 14 = TXD, GPIO 15 = RXD) must be configured for their “alternate function” (ALT0).

GPIO Function Select Registers (GPFSEL):

Each GPIO pin uses 3 bits to select its function:
┌─────────────────────────────────────────────────────────────────────┐
│ Value │ Function                                                   │
├───────┼────────────────────────────────────────────────────────────┤
│  000  │ Input                                                      │
│  001  │ Output                                                     │
│  010  │ ALT5                                                       │
│  011  │ ALT4                                                       │
│  100  │ ALT0  ← UART0 TX/RX for GPIO 14/15                        │
│  101  │ ALT1                                                       │
│  110  │ ALT2                                                       │
│  111  │ ALT3                                                       │
└─────────────────────────────────────────────────────────────────────┘

GPFSEL1 @ GPIO Base + 0x04 (covers GPIO 10-19):
┌─────────────────────────────────────────────────────────────────────┐
│ Bits  │  20-19-18  │  17-16-15  │  14-13-12  │  ...                │
│       │   GPIO 16  │   GPIO 15  │   GPIO 14  │                      │
│       │            │   (RXD)    │   (TXD)    │                      │
└─────────────────────────────────────────────────────────────────────┘

To set GPIO 14 and 15 to ALT0:
  GPFSEL1 = (GPFSEL1 & ~0x3F000) | 0x24000;
           └─ clear bits 12-17 ─┘ └─ set 100 100 ─┘
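
The same read-modify-write can be expressed per pin, which avoids hand-computing masks. A sketch, with `gpfsel_set` and `gpfsel1_uart0` as illustrative names; it operates on a plain value so the caller decides when to read and write the real register:

```c
#include <stdint.h>

#define GPIO_FSEL_ALT0 4u   /* binary 100, from the function table above */

/* Return `reg` with the 3-bit function field for `pin` replaced.
 * Each GPFSELn register covers ten pins, three bits per pin. */
static uint32_t gpfsel_set(uint32_t reg, unsigned pin, uint32_t function)
{
    unsigned shift = (pin % 10u) * 3u;
    reg &= ~(7u << shift);              /* clear the old function */
    reg |= (function & 7u) << shift;    /* set the new function   */
    return reg;
}

/* New GPFSEL1 value with GPIO 14 and 15 both switched to ALT0. */
static uint32_t gpfsel1_uart0(uint32_t gpfsel1)
{
    gpfsel1 = gpfsel_set(gpfsel1, 14, GPIO_FSEL_ALT0);
    gpfsel1 = gpfsel_set(gpfsel1, 15, GPIO_FSEL_ALT0);
    return gpfsel1;
}
```

Starting from zero, `gpfsel1_uart0` yields 0x24000, matching the constant in the one-line form above.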

2.5 Startup Assembly Requirements

The startup assembly code must handle several critical tasks before C code can run:

Startup Assembly Flow:

                    ┌────────────────────────────────────┐
                    │        All 4 cores wake up         │
                    │          simultaneously            │
                    └───────────────┬────────────────────┘
                                    │
                                    ▼
                    ┌────────────────────────────────────┐
                    │  1. Read MPIDR_EL1 to get core ID  │
                    │     Core 0 continues               │
                    │     Cores 1-3 → spin loop          │
                    └───────────────┬────────────────────┘
                                    │
                    ┌───────────────┴───────────────────┐
                    │                                   │
                    ▼                                   ▼
    ┌───────────────────────────┐       ┌───────────────────────────┐
    │    CORE 0 (continues)     │       │    CORES 1-3 (parked)     │
    └─────────────┬─────────────┘       └───────────────────────────┘
                  │                              │
                  ▼                              ▼
    ┌───────────────────────────┐       ┌───────────────────────────┐
    │  2. Check exception level │       │  wfe                      │
    │     (should be EL2 or EL1)│       │  b spin_loop              │
    └─────────────┬─────────────┘       │  (wait for event, loop)   │
                  │                     └───────────────────────────┘
                  ▼
    ┌───────────────────────────┐
    │  3. Drop from EL2 to EL1  │
    │     (if starting in EL2)  │
    └─────────────┬─────────────┘
                  │
                  ▼
    ┌───────────────────────────┐
    │  4. Set up stack pointer  │
    │     SP = 0x80000          │
    │     (grows down)          │
    └─────────────┬─────────────┘
                  │
                  ▼
    ┌───────────────────────────┐
    │  5. Clear BSS section     │
    │     memset(bss, 0, len)   │
    └─────────────┬─────────────┘
                  │
                  ▼
    ┌───────────────────────────┐
    │  6. Branch to kernel_main │
    │     bl kernel_main        │
    └─────────────┬─────────────┘
                  │
                  ▼
    ┌───────────────────────────┐
    │  7. Infinite loop (halt)  │
    │     (if main returns)     │
    └───────────────────────────┘

Stack Layout:
                    0x80000 ─────────────────────────────────┐
                            │ ▲ kernel_main stack frame      │
                            │ │                              │
                            │ │ Stack grows DOWN             │
                            │ │                              │
                    0x7XXXX ────────────────────────────────┘

Why park secondary cores?

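All four cores (Cortex-A53 on the Pi 3, Cortex-A72 on the Pi 4) can start executing at 0x80000 simultaneously. Depending on the firmware version, cores 1-3 may already be held in a firmware spin loop, but startup code should not rely on that. If you don’t park cores 1-3, they’ll all try to: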

  • Use the same stack pointer (corrupting each other’s stack)
  • Initialize the same peripherals (race conditions)
  • Print to UART (garbled output)

For this project, we park them. Multi-core programming comes later.

2.6 Common Misconceptions

Misconception 1: “I need to write a bootloader”

  • Reality: The GPU handles boot. You write a “kernel” that the GPU loads.

Misconception 2: “The ARM CPU starts first”

  • Reality: The GPU boots first, initializes RAM, then releases the ARM cores.

Misconception 3: “I can use any address for my kernel”

  • Reality: Must be 0x80000 (or you change config.txt, not recommended for learning).

Misconception 4: “UART just works if I write to it”

  • Reality: GPIO pins must be configured for ALT0 function first.

Misconception 5: “All cores start at different times”

  • Reality: All 4 cores wake up simultaneously and jump to 0x80000.

Misconception 6: “I can use the standard C library”

  • Reality: No OS = no libc. You implement everything yourself.

Misconception 7: “Debugging is easy with printf”

  • Reality: You must implement UART first to get any output. Before that, blink an LED.

3. Project Specification

3.1 What You Will Build

A minimal bare metal kernel for Raspberry Pi 3/4 that:

  1. Boots from SD card (replacing the default Linux kernel)
  2. Parks secondary CPU cores in a spin loop
  3. Sets up a stack for the primary core
  4. Initializes the PL011 UART at 115200 baud
  5. Prints “Hello from bare metal!” to the serial console
  6. Optionally blinks the onboard ACT LED as a visual indicator

3.2 Functional Requirements

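┌─────┬──────────────────────────────────────────────────────┬──────────┐
│ ID  │ Requirement                                          │ Priority │
├─────┼──────────────────────────────────────────────────────┼──────────┤
│ FR1 │ Kernel boots on Pi 3 or Pi 4                         │ Required │
│ FR2 │ Secondary cores parked (not executing random memory) │ Required │
│ FR3 │ BSS section cleared to zero                          │ Required │
│ FR4 │ UART0 (PL011) initialized at 115200 baud, 8N1        │ Required │
│ FR5 │ Print string to serial terminal                      │ Required │
│ FR6 │ Support both Pi 3 (BCM2837) and Pi 4 (BCM2711)       │ Optional │
│ FR7 │ Blink ACT LED as visual indicator                    │ Optional │
│ FR8 │ Print CPU identification (core type, revision)       │ Optional │
└─────┴──────────────────────────────────────────────────────┴──────────┘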

3.3 Non-Functional Requirements

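┌──────┬───────────────────────────┬──────────────────────────────────┐
│ ID   │ Requirement               │ Target                           │
├──────┼───────────────────────────┼──────────────────────────────────┤
│ NFR1 │ Kernel binary size        │ < 4KB                            │
│ NFR2 │ Boot to UART output       │ < 100ms                          │
│ NFR3 │ Cross-compilation support │ Linux/macOS host                 │
│ NFR4 │ QEMU emulation support    │ For development without hardware │
└──────┴───────────────────────────┴──────────────────────────────────┘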

3.4 Example Output

When you connect a USB-to-TTL serial adapter to GPIO 14 (TXD) and GPIO 15 (RXD) and run screen /dev/ttyUSB0 115200:

=== Raspberry Pi 3 Bare Metal ===
UART initialized at 115200 baud
Hello from bare metal!

CPU: Cortex-A53
Core ID: 0
Exception Level: EL1

System running. Press any key...
You pressed: 'a' (0x61)
You pressed: 'b' (0x62)
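
The "CPU:" line in this output (optional requirement FR8) can be derived from the part-number field of MIDR_EL1. A sketch, assuming the register value has already been read with `mrs`; `cpu_name_from_midr` is a hypothetical helper, and only the two cores relevant to this project are listed:

```c
#include <stdint.h>

/* MIDR_EL1 bits [15:4] hold the primary part number.
 * 0xD03 is the Cortex-A53 (Pi 3), 0xD08 the Cortex-A72 (Pi 4). */
static const char *cpu_name_from_midr(uint64_t midr)
{
    switch ((midr >> 4) & 0xFFFu) {
    case 0xD03: return "Cortex-A53";
    case 0xD08: return "Cortex-A72";
    default:    return "unknown";
    }
}
```

Returning a string constant keeps the helper free of any libc dependency, so the result can be passed directly to a UART string routine.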

3.5 Real World Outcome

After completing this project, you will have:

On Your Development Machine:

$ ls -la build/
total 24
-rw-r--r-- 1 user user   512 Dec 15 10:30 boot.S
-rw-r--r-- 1 user user  2048 Dec 15 10:30 kernel.c
-rw-r--r-- 1 user user   256 Dec 15 10:30 linker.ld
-rw-r--r-- 1 user user    56 Dec 15 10:30 Makefile
-rwxr-xr-x 1 user user  3584 Dec 15 10:30 kernel8.img

$ file build/kernel8.img
build/kernel8.img: data

$ xxd build/kernel8.img | head -4
00000000: d53800a1 12000c21 7100003f 54000040  .8.....!q..?T..@
00000010: 580000c2 d61f0040 00008000 00000000  X......@........

On Your SD Card:

boot/
├── bootcode.bin     # From Raspberry Pi firmware
├── start.elf        # From Raspberry Pi firmware
├── fixup.dat        # From Raspberry Pi firmware
├── config.txt       # Optional: arm_64bit=1
└── kernel8.img      # YOUR compiled kernel!

On Serial Terminal:

=== Raspberry Pi 3 Bare Metal ===
Hello from bare metal!

Verification checklist:

  • Pi boots without Linux kernel panic
  • Serial output appears within 1 second of power-on
  • Output is not garbled (correct baud rate)
  • Pressing keys echoes them back (if you implement RX)
  • ACT LED blinks (if you implement GPIO)

4. Solution Architecture

4.1 Boot Sequence Diagram

Complete Boot Flow (Your Code Path Highlighted):

Power On
    │
    ▼
[GPU ROM] ──────────────────────────────────────────────────────────┐
    │  Hardcoded in Broadcom SoC                                    │
    │  Looks for bootcode.bin on SD card                            │
    │                                                               │
    ▼                                                               │
[bootcode.bin] ─────────────────────────────────────────────────────┤
    │  Enables SDRAM                                                │
    │  Loads start.elf                                              │
    │                                                               │  NOT YOUR
    ▼                                                               │  CODE
[start.elf] ────────────────────────────────────────────────────────┤
    │  GPU firmware with ARM bootloader                             │
    │  Reads config.txt (optional)                                  │
    │  Loads kernel8.img to 0x80000                                 │
    │  Sets up ATAGs or device tree (optional)                      │
    │  Releases ARM cores from reset                                │
    └───────────────────────────────────────────────────────────────┘
                                    │
════════════════════════════════════╪════════════════════════════════
                                    │
    ┌───────────────────────────────┘
    │
    ▼
╔═══════════════════════════════════════════════════════════════════╗
║                     YOUR CODE STARTS HERE                         ║
╠═══════════════════════════════════════════════════════════════════╣
║                                                                   ║
║  [boot.S - _start] ◄────── Entry point at 0x80000                 ║
║      │                                                            ║
║      ├─► Get core ID from MPIDR_EL1                               ║
║      │   │                                                        ║
║      │   ├─► Core 0: Continue                                     ║
║      │   └─► Cores 1-3: Jump to spin_loop                         ║
║      │                                                            ║
║      ├─► Set stack pointer (SP = 0x80000)                         ║
║      │                                                            ║
║      ├─► Clear BSS section                                        ║
║      │   │                                                        ║
║      │   └─► Loop: while (bss_ptr < bss_end) *bss_ptr++ = 0;      ║
║      │                                                            ║
║      └─► Branch to kernel_main (bl kernel_main)                   ║
║                                                                   ║
║  [kernel.c - kernel_main]                                         ║
║      │                                                            ║
║      ├─► uart_init()                                              ║
║      │   │                                                        ║
║      │   ├─► Disable UART                                         ║
║      │   ├─► Set GPIO 14, 15 to ALT0                              ║
║      │   ├─► Disable pull-up/down                                 ║
║      │   ├─► Clear pending interrupts                             ║
║      │   ├─► Set baud rate (115200)                               ║
║      │   ├─► Enable FIFO, 8-bit                                   ║
║      │   └─► Enable UART, TX, RX                                  ║
║      │                                                            ║
║      ├─► uart_puts("Hello from bare metal!")                      ║
║      │   │                                                        ║
║      │   └─► Loop: while (*s) uart_putc(*s++);                    ║
║      │                                                            ║
║      └─► while(1) { ... } // Main loop                            ║
║                                                                   ║
╚═══════════════════════════════════════════════════════════════════╝

4.2 Memory Map

Raspberry Pi 3 Physical Memory Map:

0x00000000 ┌─────────────────────────────────────────────┐
           │ Low memory: firmware boot stub and data     │
           │ Stack for core 0 grows down from 0x80000    │
           │ (GPU memory is carved from the top of RAM)  │
0x00080000 ├─────────────────────────────────────────────┤ ◄── kernel8.img loaded here
           │ ┌─────────────────────────────────────────┐ │
           │ │ .text.boot section                      │ │ _start (entry point)
           │ │   - Core parking logic                  │ │
           │ │   - Stack setup                         │ │
           │ │   - BSS clearing                        │ │
           │ │   - Jump to C                           │ │
           │ ├─────────────────────────────────────────┤ │
           │ │ .text section                           │ │
           │ │   - kernel_main()                       │ │
           │ │   - uart_init(), uart_putc(), etc.      │ │
           │ ├─────────────────────────────────────────┤ │
           │ │ .rodata section                         │ │
           │ │   - "Hello from bare metal!\n"          │ │
           │ │   - Other string constants              │ │
           │ ├─────────────────────────────────────────┤ │
           │ │ .data section                           │ │
           │ │   - Initialized global variables        │ │
           │ ├─────────────────────────────────────────┤ │
           │ │ .bss section                            │ │ __bss_start
           │ │   - Uninitialized globals (zeroed)      │ │ __bss_end
           │ └─────────────────────────────────────────┘ │
           │                                             │
           │ ┌─────────────────────────────────────────┐ │
           │ │                                         │ │
           │ │           FREE MEMORY                   │ │
           │ │                                         │ │
           │ │      (Heap could grow up here)          │ │
           │ │                                         │ │
           │ └─────────────────────────────────────────┘ │
           │                                             │
           │ ┌─────────────────────────────────────────┐ │
           │ │ GPU memory (size set by gpu_mem)        │ │
           │ └─────────────────────────────────────────┘ │
0x3F000000 ├─────────────────────────────────────────────┤
           │                                             │
           │ Peripheral I/O (Memory-Mapped)              │
           │                                             │
           │ +0x200000: GPIO                             │
           │ +0x201000: UART0 (PL011)                    │
           │ +0x215000: AUX (UART1, SPI)                 │
           │ +0x00B880: Mailbox                          │
           │                                             │
0x40000000 ├─────────────────────────────────────────────┤
           │ Local peripherals                           │
           │ (Core timers, mailboxes)                    │
0x40000100 └─────────────────────────────────────────────┘

Note: Pi 4 has peripherals at 0xFE000000 and more RAM (up to 8GB)

4.3 File Structure

project/
├── Makefile                 # Build automation
├── linker.ld                # Memory layout for linker
├── src/
│   ├── boot.S               # Assembly entry point
│   ├── kernel.c             # Main C code
│   ├── uart.c               # UART driver
│   ├── uart.h               # UART interface
│   ├── gpio.c               # GPIO driver (optional)
│   ├── gpio.h               # GPIO interface (optional)
│   └── mmio.h               # Memory-mapped I/O helpers
├── build/                   # Build output directory
│   ├── boot.o
│   ├── kernel.o
│   ├── uart.o
│   ├── kernel.elf           # Linked executable
│   └── kernel8.img          # Final binary for SD card
└── README.md                # Build instructions

Recommended: Start with a single file approach, then split

Minimal single-file approach (recommended for learning):

project/
├── Makefile
├── linker.ld
├── boot.S
├── kernel.c                 # Everything in one file initially
└── kernel8.img

4.4 UART Implementation Strategy

UART Initialization Sequence:

Step 1: Disable UART
────────────────────────────────────────
    CR = 0                      # Control Register = 0
    │
    └── Turns off UART for safe reconfiguration


Step 2: Configure GPIO for UART
────────────────────────────────────────
    GPFSEL1 &= ~(7 << 12)       # Clear GPIO 14 function
    GPFSEL1 |= (4 << 12)        # GPIO 14 = ALT0 (TXD0)
    GPFSEL1 &= ~(7 << 15)       # Clear GPIO 15 function
    GPFSEL1 |= (4 << 15)        # GPIO 15 = ALT0 (RXD0)
    │
    └── Sets pins to UART alternate function


Step 3: Disable Pull-up/Pull-down
────────────────────────────────────────
    GPPUD = 0                   # Disable pull-up/down
    delay(150)                  # Wait 150 cycles
    GPPUDCLK0 = (1 << 14) | (1 << 15)  # Clock GPIO 14, 15
    delay(150)                  # Wait 150 cycles
    GPPUDCLK0 = 0               # Remove clock
    │
    └── Required sequence from BCM2835 datasheet (Pi 3; the Pi 4's BCM2711
        uses GPIO_PUP_PDN_CNTRL_REG0 instead of GPPUD/GPPUDCLK0)


Step 4: Clear Pending Interrupts
────────────────────────────────────────
    ICR = 0x7FF                 # Clear all interrupt flags
    │
    └── Start with clean state


Step 5: Set Baud Rate
────────────────────────────────────────
    # For 115200 baud with 48MHz UART clock:
    # Divider = 48000000 / (16 * 115200) = 26.0416...
    # Integer part: 26
    # Fractional part: 0.0416 * 64 = 2.67 ≈ 3

    IBRD = 26                   # Integer baud rate divisor
    FBRD = 3                    # Fractional baud rate divisor
    │
    └── Configures 115200 baud (close approximation)


Step 6: Configure Line Control
────────────────────────────────────────
    LCRH = (3 << 5) | (1 << 4)  # 8 bits, FIFO enabled
    │        │          │
    │        │          └── FIFO enable (bit 4)
    │        └── Word length = 8 bits (bits 5-6 = 11)
    │
    └── 8N1: 8 data bits, no parity, 1 stop bit


Step 7: Enable UART
────────────────────────────────────────
    CR = (1 << 9) | (1 << 8) | (1 << 0)
    │      │          │          │
    │      │          │          └── UART enable (bit 0)
    │      │          └── TX enable (bit 8)
    │      └── RX enable (bit 9)
    │
    └── UART is now ready for communication
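
The divisor arithmetic in Step 5 can be checked on a development machine before it goes anywhere near hardware. The following is a minimal sketch, assuming the PL011 divisor formula quoted above (clock / (16 × baud), with the fraction stored in 1/64 steps). The names `baud_divisors` and `pl011_baud_divisors` are hypothetical helpers, not part of the project skeleton.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of the project skeleton): holds the two
 * values that Step 5 writes to the baud rate registers. */
struct baud_divisors {
    uint32_t ibrd;  /* integer part, written to IBRD */
    uint32_t fbrd;  /* fractional part in 1/64 steps (0..63), written to FBRD */
};

static struct baud_divisors pl011_baud_divisors(uint32_t uart_clk_hz,
                                                uint32_t baud)
{
    assert(baud != 0);

    /* divisor * 64 = clk * 4 / baud. Compute twice that value, add one,
     * and halve it, which rounds to the nearest 1/64 step instead of
     * truncating. 64-bit math avoids overflow for large clock values. */
    uint64_t times128 = ((uint64_t)uart_clk_hz * 8u) / baud;
    uint64_t times64  = (times128 + 1u) / 2u;

    struct baud_divisors d;
    d.ibrd = (uint32_t)(times64 >> 6);     /* whole part of the divisor */
    d.fbrd = (uint32_t)(times64 & 0x3Fu);  /* remaining 1/64 steps */
    return d;
}
```

For a 48 MHz clock and 115200 baud this yields an integer part of 26 and a fractional part of 3, matching the hand calculation in Step 5. If the firmware on your board configures a different UART clock, only the first argument changes.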

5. Implementation Guide

5.1 Environment Setup

Required tools:

# On Ubuntu/Debian:
sudo apt-get update
sudo apt-get install gcc-aarch64-linux-gnu
sudo apt-get install qemu-system-arm

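# On macOS (via Homebrew):
# arm-none-eabi-gcc targets 32-bit ARM and cannot build AArch64 code.
# Install the AArch64 bare metal toolchain instead; its tools use the
# aarch64-elf- prefix, so set CROSS = aarch64-elf- in the Makefile.
brew install aarch64-elf-gcc
brew install qemu

# On Arch Linux:
sudo pacman -S aarch64-linux-gnu-gcc qemu-system-aarch64

# Verify installation (use aarch64-elf-gcc on macOS):
aarch64-linux-gnu-gcc --version
qemu-system-aarch64 --version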

Download Raspberry Pi firmware files:

# Create project directory
mkdir rpi-bare-metal && cd rpi-bare-metal

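# Download firmware files (bootcode.bin, start.elf, fixup.dat) for the Pi 3
wget https://github.com/raspberrypi/firmware/raw/master/boot/bootcode.bin
wget https://github.com/raspberrypi/firmware/raw/master/boot/start.elf
wget https://github.com/raspberrypi/firmware/raw/master/boot/fixup.dat
# Pi 4 note: it boots from on-board EEPROM rather than bootcode.bin, and its
# firmware files are start4.elf and fixup4.dat from the same directory.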

5.2 Step 1: Create the Linker Script

The linker script controls where your code goes in memory. Think through these questions:

Design Questions:

  1. At what address does the GPU load kernel8.img?
  2. What must be at the very first byte of the kernel?
  3. What sections does your code need?
  4. How will your startup code know where BSS starts and ends?

Linker Script Skeleton:

/* linker.ld - Fill in the values */

ENTRY(/* What symbol is your entry point? */)

SECTIONS
{
    /* Where does the GPU load the kernel? */
    . = ???;

    /* Boot code MUST be first - why? */
    .text.boot : { *(/* pattern? */) }

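    /* Rest of code */
    .text : { *(.text .text.*) }

    /* Read-only data (strings; GCC may emit .rodata.* subsections) */
    .rodata : { *(.rodata .rodata.*) }

    /* Initialized global variables */
    .data : { *(.data .data.*) }

    /* Uninitialized data - needs symbols for startup code.
       Both symbols are 8-byte aligned so the startup loop can
       clear BSS with 8-byte stores without overrunning it. */
    . = ALIGN(8);
    .bss : {
        /* Export symbol: where does BSS start? */
        __bss_start = .;
        *(.bss .bss.*)
        *(COMMON)
        . = ALIGN(8);
        /* Export symbol: where does BSS end? */
        __bss_end = .;
    }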
}

5.3 Step 2: Write the Assembly Entry Point

Your startup code runs before C. It must set up the environment C expects.

Design Questions:

  1. How do you determine which core you are?
  2. What should cores 1-3 do?
  3. What must be true about the stack before calling C?
  4. What is the BSS section and why clear it?

Assembly Skeleton:

// boot.S - Fill in the implementation

.section ".text.boot"

.global _start

_start:
    // Step 1: Which core am I?
    // Read MPIDR_EL1, check bits [1:0] for core ID
    // Instruction: mrs x?, mpidr_el1
    // Then mask: and x?, x?, #0x3

    // Step 2: If not core 0, go to parking loop
    // Instruction: cbnz x?, label

    // Step 3: Set stack pointer
    // Stack grows DOWN, starts BELOW kernel
    // Instruction: ldr x?, =???  then mov sp, x?

    // Step 4: Clear BSS
    // Load __bss_start and __bss_end symbols
    // Loop storing zeros
    // Instructions: ldr x?, =__bss_start, etc.

    // Step 5: Jump to C
    // Instruction: bl kernel_main

    // Step 6: Hang if main returns
halt:
    // Wait for event, branch to self
    // Instructions: wfe, b halt

// Parking loop for secondary cores
secondary_spin:
    // Wait for event, loop forever
    // Instructions: wfe, b secondary_spin

Key ARM64 Instructions:

  • mrs Xd, <sysreg> — Move from system register to Xd
  • and Xd, Xn, #imm — Bitwise AND with immediate
  • cbnz Xn, label — Compare and branch if not zero
  • cbz Xn, label — Compare and branch if zero
  • ldr Xd, =symbol — Load address of symbol (pseudo-instruction)
  • mov sp, Xn — Set stack pointer from register
  • str Xzr, [Xn], #8 — Store zero, post-increment address by 8
  • cmp Xn, Xm — Compare two registers
  • b.lt label — Branch if less than (signed comparison, after cmp); use b.lo for unsigned values such as addresses
  • bl label — Branch with link (call function)
  • wfe — Wait for event (low power wait)
  • b label — Unconditional branch
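
Two of the startup steps are easier to reason about when written out in C first. The sketch below is a host-side model only, not a replacement for boot.S: it assumes the core number sits in the low two bits of MPIDR_EL1 (as Step 1 states) and that the BSS bounds are 8-byte aligned. The function names `core_id_from_mpidr` and `clear_words` are hypothetical.

```c
#include <stdint.h>

/* Step 1 equivalent: keep only the low two bits of an MPIDR_EL1 value,
 * which hold the core number on the quad-core Pi 3 and Pi 4. */
static unsigned core_id_from_mpidr(uint64_t mpidr)
{
    return (unsigned)(mpidr & 0x3u);
}

/* Step 4 equivalent: store zero to every 8-byte word in [start, end).
 * This mirrors a loop built from "str xzr, [xN], #8", cmp, and a
 * conditional branch. Both bounds are assumed to be 8-byte aligned. */
static void clear_words(uint64_t *start, uint64_t *end)
{
    while (start < end) {
        *start++ = 0;
    }
}
```

In the real startup code the two pointers come from the __bss_start and __bss_end linker symbols, and the loop has to be written in assembly because C code may not run safely until BSS is zeroed and the stack is set up.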

5.4 Step 3: Implement UART Driver

Your first C code. Keep it simple.

Design Questions:

  1. How do you read/write to a specific memory address in C?
  2. What is volatile and why is it critical for hardware registers?
  3. In what order must UART be configured?
  4. How do you know when it’s safe to write another character?

UART Skeleton:

// kernel.c - Fill in the implementation

// Step 1: Define peripheral base addresses
// Pi 3: 0x3F000000, Pi 4: 0xFE000000
#define PERIPHERAL_BASE ???

// Step 2: Define UART register offsets
#define UART0_BASE (PERIPHERAL_BASE + ???)
#define UART0_DR   (UART0_BASE + 0x00)  // Data register
#define UART0_FR   (UART0_BASE + ???)   // Flag register
// ... more registers

// Step 3: Helper to write to memory-mapped register
// Think: How do you write to an arbitrary address in C?
// What does volatile mean and why is it needed?
static inline void mmio_write(unsigned long reg, unsigned int data) {
    *(volatile unsigned int *)reg = data;
}

// Step 4: Helper to read from memory-mapped register
static inline unsigned int mmio_read(unsigned long reg) {
    return *(volatile unsigned int *)reg;
}

// Step 5: Transmit single character
// Think: What must you check before writing to data register?
void uart_putc(unsigned char c) {
    // Wait while transmit FIFO is full
    // Check bit ??? of flag register
    while (???) {
        // spin
    }
    // Write character to data register
    ???;
}

// Step 6: Transmit string
void uart_puts(const char *str) {
    while (*str) {
        ???;
    }
}

// Step 7: UART initialization
// Follow the sequence from section 4.4
void uart_init(void) {
    // 1. Disable UART
    // 2. Configure GPIO
    // 3. Disable pull-up/down
    // 4. Clear interrupts
    // 5. Set baud rate
    // 6. Configure line control
    // 7. Enable UART
}

// Step 8: Main function
void kernel_main(void) {
    uart_init();
    uart_puts("Hello from bare metal!\r\n");

    // Loop forever
    while (1) {
        // Optional: echo received characters
    }
}
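
The "wait while the FIFO is full, then write" pattern from Step 5 can be exercised on a host without giving away the register details you are meant to look up. This is a minimal sketch against a simulated device: `struct fake_uart`, `FAKE_TX_FULL`, and the `fake_*` functions are all made up for illustration, and the bit position used here is deliberately not the real one.

```c
#include <stddef.h>
#include <stdint.h>

/* Everything here is a host-side stand-in, not real hardware access.
 * FAKE_TX_FULL is a made-up bit position; take the real one from the
 * PL011 flag register description. */
#define FAKE_TX_FULL (1u << 0)

struct fake_uart {
    unsigned busy_reads;  /* how many flag reads still report "full" */
    unsigned polls;       /* total flag reads performed */
    char     sent[64];    /* bytes "transmitted" so far */
    size_t   sent_len;
};

/* Simulated flag register read: reports "full" for the first few polls. */
static uint32_t fake_read_flags(struct fake_uart *u)
{
    u->polls++;
    if (u->busy_reads > 0) {
        u->busy_reads--;
        return FAKE_TX_FULL;
    }
    return 0;
}

/* Simulated data register write: records the byte. */
static void fake_write_data(struct fake_uart *u, char c)
{
    if (u->sent_len < sizeof u->sent) {
        u->sent[u->sent_len++] = c;
    }
}

/* The polling pattern itself: spin while the FIFO is full, then write. */
static void fake_uart_putc(struct fake_uart *u, char c)
{
    while (fake_read_flags(u) & FAKE_TX_FULL) {
        /* spin */
    }
    fake_write_data(u, c);
}
```

On real hardware the flag must be re-read from the device on every iteration, which is why mmio_read goes through a volatile pointer: without it, the compiler is free to read the register once and spin forever on the stale value.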

5.5 Step 4: Create the Makefile

Design Questions:

  1. What CPU architecture flags does the compiler need?
  2. What does -ffreestanding mean and why is it needed?
  3. How do you convert an ELF file to a raw binary?

Makefile Skeleton:

# Makefile - Fill in the commands

# Toolchain prefix for cross-compilation
CROSS = aarch64-linux-gnu-

# Compiler flags
# -mcpu=??? : Which CPU?
# -fpic : Position independent code
# -ffreestanding : Freestanding build; do not assume a hosted C library or a normal main()
CFLAGS = ???

# Assembler flags
ASFLAGS = ???

# Source files
SOURCES = boot.S kernel.c

# Build targets
all: kernel8.img

# Compile assembly
boot.o: boot.S
	$(CROSS)gcc $(ASFLAGS) -c $< -o $@

# Compile C
kernel.o: kernel.c
	$(CROSS)gcc $(CFLAGS) -c $< -o $@

# Link
kernel.elf: boot.o kernel.o linker.ld
	$(CROSS)ld -T linker.ld -o $@ boot.o kernel.o

# Convert to raw binary
# Hint: objcopy with -O binary
kernel8.img: kernel.elf
	$(CROSS)objcopy ??? $< $@

clean:
	rm -f *.o *.elf *.img

5.6 Step 5: Test with QEMU

Before putting it on hardware, test in QEMU:

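# Run in QEMU (serial to stdio). Older QEMU releases may list this machine
# as raspi3; run qemu-system-aarch64 -M help to see the available names.
qemu-system-aarch64 -M raspi3b -kernel kernel8.img -serial stdio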

# With debugging:
qemu-system-aarch64 -M raspi3b -kernel kernel8.img -serial stdio -d int,cpu_reset

# With GDB server (for debugging):
qemu-system-aarch64 -M raspi3b -kernel kernel8.img -serial stdio -S -s
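# In another terminal (on Debian/Ubuntu the multi-architecture debugger is
# packaged as gdb-multiarch and can be used in place of aarch64-linux-gnu-gdb):
aarch64-linux-gnu-gdb kernel.elf -ex "target remote :1234"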

Expected QEMU output:

Hello from bare metal!

5.7 Step 6: Deploy to Hardware

Prepare SD card:

# Format SD card as FAT32 (use lsblk to find device, e.g., /dev/sdb)
# WARNING: this erases the partition, so double-check the device name first
sudo mkfs.vfat -F 32 /dev/sdX1

# Mount and copy files
sudo mount /dev/sdX1 /mnt
sudo cp bootcode.bin start.elf fixup.dat kernel8.img /mnt/
sudo umount /mnt

Optional config.txt:

# config.txt (optional)
arm_64bit=1
kernel=kernel8.img

Connect serial adapter:

USB-TTL Adapter          Raspberry Pi GPIO
────────────────         ───────────────────
   GND     ─────────────   Pin 6 (Ground)
   RXD     ─────────────   Pin 8 (GPIO 14, TXD)
   TXD     ─────────────   Pin 10 (GPIO 15, RXD)
   (DO NOT connect VCC - Pi has its own power)
   (Use an adapter with 3.3 V logic levels - the Pi's GPIO pins are not 5 V tolerant)

Open terminal:

# Linux
screen /dev/ttyUSB0 115200

# macOS
screen /dev/cu.usbserial-* 115200

# Exit screen: Ctrl-A, then K, then Y

Power on Pi and watch output.


6. Testing Strategy

6.1 Unit Testing (Limited in Bare Metal)

Without an OS, traditional unit testing is difficult. Instead:

Test each component in isolation:

  1. Assembly entry only: Confirm it reaches the halt loop without faulting (run QEMU with -d int and check that no exceptions are logged)
  2. Stack test: Write a C function that uses locals, verify it returns
  3. UART TX only: Hardcode characters before implementing putc
  4. UART RX: Echo characters back

6.2 Integration Testing

Test Matrix:

Test                 QEMU   Pi 3   Pi 4   Expected
──────────────────   ────   ────   ────   ────────────────────────
Boots without hang   Pass   Pass   Pass   UART output appears
Correct baud rate    N/A    Pass   Pass   No garbled output
String output        Pass   Pass   Pass   “Hello from bare metal!”
Character echo       Pass   Pass   Pass   Typed chars echoed
Multi-core parking   Pass   Pass   Pass   Only one “Hello” message

6.3 Debugging Techniques

When nothing works (no output):

Debugging Flowchart:

[No output]
    │
    ├──► Is QEMU working?
    │    │
    │    ├── Yes: Problem is hardware-specific
    │    │        → Check GPIO wiring
    │    │        → Check serial adapter
    │    │        → Verify firmware files on SD
    │    │
    │    └── No: Problem is in code
    │          → Check linker script origin
    │          → Verify _start is at 0x80000
    │          → Add infinite LED blink BEFORE uart_init
    │
    ├──► Is it garbled output?
    │    │
    │    └── Baud rate mismatch
    │        → Check IBRD/FBRD calculation
    │        → Verify UART clock frequency
    │
    └──► Is it partial output?
         │
         └── Missing newlines or buffer issues
             → Use \r\n not just \n
             → Add uart_flush() if needed
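
The "use \r\n" advice can also be handled in one place inside the string routine. Below is a minimal sketch, assuming an output hook with the hypothetical signature `putc_fn`. On the Pi the hook would wrap uart_putc. Here a small capture buffer (`struct capture`, `capture_putc`, both made up for this example) stands in so the translation can be checked on a host.

```c
#include <stddef.h>

/* Hypothetical output hook: on the Pi this would forward to uart_putc;
 * on a host it can append to a buffer so the result can be inspected. */
typedef void (*putc_fn)(void *ctx, char c);

/* Writes a string, emitting "\r\n" for every bare '\n' so that serial
 * terminals return the cursor to column 0. An existing "\r\n" pair is
 * passed through unchanged. */
static void puts_crlf(putc_fn put, void *ctx, const char *s)
{
    char prev = '\0';
    while (*s) {
        if (*s == '\n' && prev != '\r') {
            put(ctx, '\r');
        }
        put(ctx, *s);
        prev = *s++;
    }
}

/* Host-side sink used only for demonstration. */
struct capture {
    char   buf[64];
    size_t len;
};

static void capture_putc(void *ctx, char c)
{
    struct capture *cap = ctx;
    if (cap->len + 1 < sizeof cap->buf) {
        cap->buf[cap->len++] = c;
        cap->buf[cap->len] = '\0';
    }
}
```

With this in place, strings in the kernel can use a plain \n and still display correctly in screen or minicom.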

LED debugging (before UART works):

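// GPIO 47 drives the ACT LED on some models (for example the Pi Zero,
// active low), but not on every board: the Pi 3B routes the LED through a
// GPIO expander, the Pi 3B+ uses GPIO 29, and the Pi 4 uses GPIO 42.
// Check your board's schematic or device tree and adjust pin and polarity.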
#define GPIO_BASE (PERIPHERAL_BASE + 0x200000)
#define GPFSEL4   (GPIO_BASE + 0x10)
#define GPSET1    (GPIO_BASE + 0x20)
#define GPCLR1    (GPIO_BASE + 0x2C)

void led_init(void) {
    // Set GPIO 47 as output
    unsigned int val = mmio_read(GPFSEL4);
    val &= ~(7 << 21);  // Clear bits 23-21
    val |= (1 << 21);   // Set as output
    mmio_write(GPFSEL4, val);
}

void led_on(void) {
    mmio_write(GPCLR1, (1 << 15));  // GPIO 47 is bit 15 of GPCLR1
}

void led_off(void) {
    mmio_write(GPSET1, (1 << 15));
}
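
The shifts in led_init and in Step 2 of the UART sequence follow one rule: each GPFSEL register holds ten pins, three bits per pin. The sketch below makes that rule explicit so the constants can be checked on a host. The helper names (`gpfsel_index`, `gpfsel_shift`, `gpfsel_with_func`) and the enum are hypothetical, not part of the project skeleton.

```c
#include <stdint.h>

/* GPIO function codes as encoded in a 3-bit GPFSEL field. */
enum gpio_func {
    GPIO_FUNC_INPUT  = 0,
    GPIO_FUNC_OUTPUT = 1,
    GPIO_FUNC_ALT0   = 4
};

/* Index of the GPFSEL register that holds this pin (10 pins per register). */
static unsigned gpfsel_index(unsigned pin)
{
    return pin / 10u;
}

/* Bit offset of the pin's 3-bit field inside that register. */
static unsigned gpfsel_shift(unsigned pin)
{
    return (pin % 10u) * 3u;
}

/* Returns the register value with only this pin's field replaced,
 * leaving the other nine pins in the register untouched. */
static uint32_t gpfsel_with_func(uint32_t old, unsigned pin, enum gpio_func f)
{
    unsigned shift = gpfsel_shift(pin);
    old &= ~(UINT32_C(7) << shift);      /* clear the old function */
    old |= ((uint32_t)f & 7u) << shift;  /* insert the new one */
    return old;
}
```

For pin 47 this gives register index 4 and shift 21, and for pins 14 and 15 it gives register index 1 with shifts 12 and 15, which are the same constants used in led_init and in the UART GPIO setup.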

Use LED as progress indicator:

void kernel_main(void) {
    led_init();
    led_on();   // LED on = we got to main

    uart_init();
    led_off();  // LED off = UART init complete

    uart_puts("Hello");
    led_on();   // LED on = message sent

    while (1) {
        // Stay here so the final LED state remains visible
    }
}

7. Common Pitfalls & Debugging

7.1 Build Issues

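Symptom                         Cause                                Fix
─────────────────────────────   ──────────────────────────────────   ─────────────────────────────────────────────
undefined reference to _start   Entry symbol missing or not global   Check ENTRY(_start) and .global _start in boot.S
kernel8.img is 0 bytes          objcopy failed                       Check section names in linker script
cannot find -lgcc               Missing compiler libs                Use -nostdlib flag
relocation truncated            Code too far from address            Use -fpic or check linker script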

7.2 Boot Issues

Symptom                   Cause                        Fix
───────────────────────   ──────────────────────────   ────────────────────────────────────
Rainbow screen            Kernel not found             Check kernel8.img name, FAT32 format
Hangs with blank screen   Crash before UART            Add LED blink at _start
Pi doesn’t power on       SD card issue                Verify firmware files present
Boot loops                Kernel crashes immediately   Check stack pointer setup

7.3 UART Issues

Symptom                       Cause                         Fix
───────────────────────────   ───────────────────────────   ───────────────────────────────────
No output at all              GPIO not configured           Set ALT0 function for GPIO 14/15
Garbled output                Wrong baud rate               Recalculate IBRD/FBRD
Missing characters            TX FIFO overflow              Wait for FIFO not full before write
Only first char shows         Missing loop in puts          Check string iteration
Works in QEMU, not hardware   Pull-up/down not configured   Implement GPPUD sequence

7.4 Multi-Core Issues

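Symptom                 Cause                     Fix
─────────────────────   ───────────────────────   ──────────────────────────────────────────────
Output repeated 4x      All cores running main    Add core ID check in _start
Random crashes          Cores sharing one stack   Give each core its own stack, or park cores 1-3
Inconsistent behavior   Race conditions           Park secondary cores with WFE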
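
If you later wake the secondary cores instead of parking them, each one needs its own stack region. The following is a minimal sketch of one possible layout, assuming fixed-size stacks packed upward from a base address. The function name `core_stack_top` and the layout itself are assumptions for illustration, not something this project prescribes.

```c
#include <stdint.h>

/* Hypothetical layout: one fixed-size stack per core, packed upward from
 * stacks_base. Returns the initial SP for a core, which is the TOP of its
 * slot because the stack grows down. AArch64 expects SP to be 16-byte
 * aligned, so the base and size are assumed to be multiples of 16. */
static uint64_t core_stack_top(uint64_t stacks_base, unsigned core,
                               uint64_t stack_size)
{
    return stacks_base + ((uint64_t)core + 1u) * stack_size;
}
```

In startup assembly the same calculation would use the core number read from MPIDR_EL1, so that no two cores ever receive the same stack pointer.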

7.5 Memory Issues

Symptom                      Cause                   Fix
──────────────────────────   ─────────────────────   ───────────────────────────────
Global variables corrupted   BSS not cleared         Add BSS clearing loop in _start
Function calls crash         Stack pointer wrong     Check SP initialization
Strings show garbage         .rodata not in binary   Check linker script sections

8. Extensions & Challenges

Once “Hello World” works, try these extensions:

8.1 Easy Extensions

  1. Implement uart_getc() — Read characters from serial
  2. Echo terminal — Echo typed characters back
  3. Blink ACT LED — Visual feedback without serial

8.2 Intermediate Extensions

  1. Print hex numbers — Implement simple hex formatting
  2. Simple command parser — “led on”, “led off”, “help”
  3. System information — Print CPU ID, memory size from mailbox
  4. Timer-based LED blink — Use ARM timer instead of delay loop
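
For the hex printing extension, the formatting can be written and tested on a host before it is wired to the UART. This is a minimal sketch that uses no C library calls, so it should also build with -ffreestanding. The function name `format_hex64` and the fixed 16-digit output format are choices made for this example.

```c
#include <stdint.h>

/* Formats a 64-bit value as "0x" followed by 16 hex digits and a NUL.
 * The caller supplies a buffer of at least 19 bytes. No libc functions
 * are used, so the same code can run in a freestanding kernel. */
static void format_hex64(uint64_t value, char out[19])
{
    static const char digits[] = "0123456789abcdef";

    out[0] = '0';
    out[1] = 'x';
    for (int i = 0; i < 16; i++) {
        /* Take nibbles from the most significant end first. */
        unsigned shift = (unsigned)(15 - i) * 4u;
        out[2 + i] = digits[(value >> shift) & 0xFu];
    }
    out[18] = '\0';
}
```

In the kernel, the resulting buffer can be passed straight to uart_puts, which is handy for printing register values and addresses while debugging.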

8.3 Advanced Extensions

  1. Framebuffer output — Print to screen via mailbox/framebuffer
  2. USB keyboard input — Much harder, requires USB stack
  3. Multi-core activation — Wake secondary cores, give them tasks
  4. Exception handlers — Catch and report exceptions
  5. Simple shell — Command line with history

8.4 Project Ideas Using This Foundation

  • Bare metal music player — PWM audio output
  • Logic analyzer — GPIO sampling with timing
  • LED matrix controller — SPI or bit-banged output
  • Temperature logger — I2C sensor reading
  • Mini operating system — Memory management, task switching

9. Real-World Connections

9.1 Where This Knowledge Applies

Domain              Application
─────────────────   ──────────────────────────────────────────────────
Embedded Systems    IoT firmware, sensor nodes, industrial controllers
Operating Systems   Linux kernel development, boot process
Bootloaders         U-Boot, Raspberry Pi bootloader
Security Research   Firmware analysis, rootkit development
Game Development    Retro console homebrew
Automotive          ECU firmware, CAN bus controllers
Aerospace           Flight controllers, satellite systems

9.2 Production Examples

Raspberry Pi Foundation’s own kernel:

  • Same boot process
  • Same UART initialization
  • Continues from this early setup into a full Linux kernel

U-Boot bootloader:

  • Similar early initialization
  • Loads and boots Linux
  • Used on most ARM boards

FreeRTOS on Pi:

  • Similar bare metal start
  • Adds scheduler, tasks
  • Real-time applications

9.3 Career Relevance

Job roles that use this:

  • Embedded Systems Engineer
  • Firmware Developer
  • OS/Kernel Developer
  • Security Researcher
  • IoT Developer

Interview topics from this project:

  • “Explain the ARM boot process”
  • “What is a linker script?”
  • “How does memory-mapped I/O work?”
  • “Why use volatile for hardware registers?”

10. Resources

10.1 Primary References

Resource                            Why It’s Useful
─────────────────────────────────   ───────────────────────────────────────────
bztsrc/raspi3-tutorial              Excellent step-by-step bare metal tutorials
BCM2837 ARM Peripherals             Official peripheral documentation
ARM Architecture Reference Manual   Definitive ARM instruction reference
Valvers Pi Tutorial                 Classic bare metal tutorial series

10.2 Books

Book                                                Chapters   Why
─────────────────────────────────────────────────   ────────   ────────────────────────────────────
“Bare-metal programming for ARM” by Umanovskis      Ch. 3-5    Free ebook, perfect for this project
“The Definitive Guide to ARM Cortex-M3/M4”          Ch. 1-4    ARM architecture fundamentals
“Operating Systems: Three Easy Pieces”              Ch. 1-6    OS concepts that apply here

10.3 Tools

Tool                        Purpose
─────────────────────────   ──────────────────────────────
aarch64-linux-gnu-gcc       Cross-compiler for ARM64
qemu-system-aarch64         ARM64 emulator
screen or minicom           Serial terminal
aarch64-linux-gnu-objdump   Disassembly and inspection
xxd                         Hex dump for binary inspection

10.4 Community


11. Self-Assessment Checklist

Use this to verify your understanding before, during, and after the project:

Before Starting

  • I can explain what bare metal programming means
  • I understand why the Raspberry Pi GPU boots before the ARM CPU
  • I know what a linker script does
  • I understand the difference between ELF and raw binary formats
  • I know what memory-mapped I/O means

During Implementation

  • I can read the MPIDR_EL1 register to get core ID
  • I understand why secondary cores must be parked
  • I know why volatile is required for hardware registers
  • I can calculate UART baud rate divisors
  • I understand the GPIO function select register format

After Completion

  • I can explain every line of my assembly startup code
  • I know what would happen if I didn’t clear BSS
  • I can describe the complete boot sequence from power-on to my first uart_puts call
  • I can debug UART issues systematically
  • I could implement UART on a different ARM board using a different datasheet

Extension Readiness

  • I could add interrupt handling to this kernel
  • I understand how to access the mailbox for system information
  • I could implement a simple timer-based scheduler
  • I know what additional steps would be needed for MMU/paging

12. Completion Criteria

You have successfully completed this project when:

Minimum Requirements

  1. Builds without errors: make produces kernel8.img
  2. Boots in QEMU: qemu-system-aarch64 -M raspi3b -kernel kernel8.img -serial stdio shows output
  3. Prints message: “Hello from bare metal!” appears in serial terminal
  4. Works on hardware: Same output when running on physical Pi 3/4
  5. Stable idle: After printing, the kernel stays in its infinite loop without crashing

Quality Criteria

  1. Code is readable: Comments explain non-obvious operations
  2. Proper separation: Startup assembly, main kernel, UART driver are distinct
  3. No hardcoded magic: Constants are named and documented
  4. Portable base addresses: Easy to switch between Pi 3 and Pi 4

Understanding Criteria

  1. Can explain boot sequence: From power-on to your first uart_puts call
  2. Understands linker script: Can modify memory layout if needed
  3. Knows register purpose: Can describe each UART register used
  4. Can debug issues: Systematic approach to fixing problems

Stretch Goals

  1. Echo terminal: Receives and echoes typed characters
  2. LED feedback: ACT LED indicates system state
  3. Pi 3 and Pi 4: Same code works on both (peripheral base detection)
  4. System info: Prints CPU type and memory size via mailbox
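
For the "Pi 3 and Pi 4" stretch goal, one possible approach is to pick the peripheral base from the CPU part number in MIDR_EL1 (Cortex-A53 on the Pi 3, Cortex-A72 on the Pi 4). The sketch below shows only the decision logic so it can be tested on a host. The function name and macros are hypothetical, and on the Pi the input value would come from an `mrs` read of MIDR_EL1.

```c
#include <stdint.h>

#define PERIPHERAL_BASE_PI3 UINT64_C(0x3F000000)  /* BCM2837 */
#define PERIPHERAL_BASE_PI4 UINT64_C(0xFE000000)  /* BCM2711 */

/* MIDR_EL1 bits [15:4] hold the CPU part number. */
#define MIDR_PARTNUM(midr)  (((midr) >> 4) & 0xFFFu)
#define PART_CORTEX_A53     0xD03u  /* Pi 3 */
#define PART_CORTEX_A72     0xD08u  /* Pi 4 */

/* Returns the peripheral base for a recognised core, or 0 if unknown,
 * so the caller can decide how to handle an unsupported board. */
static uint64_t peripheral_base_from_midr(uint64_t midr)
{
    switch (MIDR_PARTNUM(midr)) {
    case PART_CORTEX_A53: return PERIPHERAL_BASE_PI3;
    case PART_CORTEX_A72: return PERIPHERAL_BASE_PI4;
    default:              return 0;
    }
}
```

This replaces the compile-time PERIPHERAL_BASE constant with a value chosen at boot. The trade-off is that every register address becomes a runtime calculation, so a build-time switch may still be the simpler choice for a first version.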

Summary

This project takes you from zero to running your own code on a Raspberry Pi with no operating system. You’ve learned:

  1. The Pi boot process — GPU loads firmware, then your kernel
  2. ARM64 assembly basics — Core parking, stack setup, BSS clearing
  3. Linker scripts — Controlling memory layout for bare metal
  4. Memory-mapped I/O — Accessing hardware through physical addresses
  5. UART programming — Serial communication for debugging

This is the foundation of embedded systems and operating system development: OS kernels, bootloaders, and firmware all begin with startup code much like this.

Next steps:

  • Project 5: x86 Bootloader (learn another architecture)
  • Project 9: STM32 Bare Metal (professional ARM Cortex-M)
  • Project 10: Simple Kernel with Multitasking (use this as a base)

The transition from “code that runs on an OS” to “code that IS the first thing running” is a fundamental shift in understanding. You now have that understanding.


Time to boot into your own code. Good luck!