Project 1: The Bare-Metal “Hello, World”

Bring up a Cortex-M board from reset with your own startup code and linker script, then blink an LED by writing directly to memory-mapped registers.

Quick Reference

Attribute	Value
Difficulty	Intermediate
Time Estimate	1-2 weeks
Main Programming Language	C
Alternative Programming Languages	Rust, Ada
Coolness Level	High
Business Potential	Medium
Prerequisites	C pointers, basic MCU datasheets, basic ARM register awareness
Key Topics	Boot sequence, linker scripts, vector tables, memory-mapped I/O, GPIO

1. Learning Objectives

By completing this project, you will:

Build a complete bare-metal firmware image with custom startup and linker control.
Explain the Cortex-M reset sequence and how the vector table seeds initial execution.
Configure clock and GPIO registers to control an LED without any HAL.
Debug early boot issues with GDB by inspecting SP, PC, and memory-mapped registers.

2. All Theory Needed (Per-Concept Breakdown)

2.1 Cortex-M Boot Sequence and Vector Table

Fundamentals

The Cortex-M family always starts from a fixed contract: the first two 32-bit words in the vector table are the initial stack pointer and the reset handler address. On reset, the CPU loads SP from word 0, PC from word 1, and begins executing the reset handler in Thumb state. Every exception or interrupt thereafter indexes into the vector table, which is simply an array of function pointers. This is why a working vector table is non-negotiable: if either of the first two words is wrong, the CPU faults before your first instruction runs, and a bad entry further down is a latent fault that fires the first time that exception is taken. The vector table is also the bridge between hardware events and software behavior, mapping asynchronous sources (SysTick, GPIO interrupts, faults) to deterministic handler code. Understanding this contract means you can reason about why a microcontroller appears “dead” when the vector table is wrong and why even a perfect main() cannot save you if boot is broken.

Additional fundamentals: Think of the vector table as a hardware contract, not an optional data structure. The CPU does not search for handlers; it simply trusts the table. That is why the table must be in the correct location, aligned, and complete. Even unused entries must point somewhere safe. This mental model helps you debug: if the system does nothing, your first check is whether the contract is satisfied.

Deep Dive into the concept

A Cortex-M boot looks simple, but every step carries hidden invariants. First, the CPU reads the vector table from the address held in VTOR, which resets to 0x00000000 on most MCUs; what sits at that address depends on the memory map and boot configuration, and it is usually aliased to flash at reset. The first word must be a valid RAM address because the CPU immediately uses it as the stack pointer. If your linker script places the stack outside RAM or misaligns it, even pushing the first register will fault. The second word is the reset handler address. On Cortex-M, all code executes in Thumb state, so the LSB of the handler address must be set to 1. The toolchain sets this bit for you when the symbol is typed as a Thumb function (for example with .thumb_func or .type Reset_Handler, %function in assembly); writing Reset_Handler + 1 by hand is only needed when the symbol is not typed as a function. The reset handler is responsible for zeroing .bss, copying .data from flash to RAM, and configuring the environment to a known state before calling main. If you skip these steps, global variables will contain garbage and the system will behave nondeterministically.
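The invariants on the first two words can be checked mechanically. The following is a minimal host-side sketch, assuming an STM32F4-style memory map; the function name `boot_words_valid` and the region constants are illustrative assumptions, not part of any vendor API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed memory map (STM32F4-style); take real values from your datasheet. */
#define FLASH_BASE_ADDR 0x08000000u
#define FLASH_SIZE      (512u * 1024u)
#define RAM_BASE_ADDR   0x20000000u
#define RAM_SIZE        (128u * 1024u)

/* Returns true if the first two vector table words look bootable. */
bool boot_words_valid(uint32_t initial_sp, uint32_t reset_vector)
{
    /* A full-descending stack may start one past the last RAM byte. */
    bool sp_in_ram   = initial_sp > RAM_BASE_ADDR &&
                       initial_sp <= RAM_BASE_ADDR + RAM_SIZE;
    bool sp_aligned  = (initial_sp & 7u) == 0u;     /* 8-byte aligned */
    bool thumb_bit   = (reset_vector & 1u) != 0u;   /* LSB selects Thumb */
    uint32_t target  = reset_vector & ~1u;          /* actual code address */
    bool pc_in_flash = target >= FLASH_BASE_ADDR &&
                       target < FLASH_BASE_ADDR + FLASH_SIZE;
    return sp_in_ram && sp_aligned && thumb_bit && pc_in_flash;
}
```

You could feed it the two words read from the start of a raw binary image before flashing, which catches a bad stack symbol or a missing Thumb bit without touching the board.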

The vector table continues beyond reset and lists exception handlers in a fixed order: NMI, HardFault, MemManage, BusFault, UsageFault, SVCall, DebugMonitor, PendSV, SysTick, and then device-specific IRQs, with reserved slots after UsageFault and after DebugMonitor. The order and offsets are defined by the ARM Architecture Reference Manual. This order is crucial because the hardware uses the exception number to index into the table. The invariants are: the table base must meet the core's alignment rule (at least 128 bytes on Cortex-M3/M4), every entry must be a valid handler address with the Thumb bit set, and any unused entry should point to a safe default handler to avoid executing random memory. Many bugs at this layer manifest as a HardFault, but the fault is only the symptom. The cause is almost always a malformed vector table or an invalid stack pointer.
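Because every entry is one 32-bit word, the offset of any handler follows directly from its exception number. The sketch below shows that arithmetic; the exception numbers are the architectural ones for ARMv7-M, while the enum and function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Architecturally fixed exception numbers on ARMv7-M.
 * Numbers 7-10 and 13 are reserved slots in the table. */
enum {
    EXC_RESET = 1, EXC_NMI = 2, EXC_HARDFAULT = 3, EXC_MEMMANAGE = 4,
    EXC_BUSFAULT = 5, EXC_USAGEFAULT = 6, EXC_SVCALL = 11,
    EXC_DEBUGMON = 12, EXC_PENDSV = 14, EXC_SYSTICK = 15,
    EXC_FIRST_IRQ = 16   /* device-specific IRQ0 */
};

/* Byte offset of an exception's entry from the vector table base. */
uint32_t vector_offset(uint32_t exception_number)
{
    return exception_number * 4u;   /* one 32-bit word per entry */
}

/* Byte offset for external interrupt IRQn (as numbered by the NVIC). */
uint32_t irq_vector_offset(uint32_t irqn)
{
    return vector_offset(EXC_FIRST_IRQ + irqn);
}
```

For example, `vector_offset(EXC_HARDFAULT)` gives 0x0C and `irq_vector_offset(0)` gives 0x40, which you can compare against the addresses shown in the map file.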

The reset handler is also the natural place to configure early clocks and to relocate the vector table if you are using a bootloader. On some boards, you may boot from ROM and then remap flash or SRAM. If you decide to relocate the vector table to SRAM (for dynamic ISR swapping or faster access), you must update VTOR and ensure the table stays aligned to at least 128 bytes (implementation-dependent). This project does not require relocation, but understanding the rule allows you to reason about advanced RTOS features later.
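The alignment rule for a relocated table can be computed from the number of interrupts the device implements. This sketch follows the rule given in the Cortex-M3/M4 documentation (table size rounded up to a power of two, with a 128-byte minimum); treat it as an assumption for other cores and confirm against your core's manual. The function names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Required alignment in bytes for a vector table with num_irqs external
 * interrupts plus the 16 system entries. */
uint32_t vtor_alignment_bytes(uint32_t num_irqs)
{
    uint32_t table_bytes = (16u + num_irqs) * 4u;
    uint32_t align = 128u;              /* minimum: 32 words */
    while (align < table_bytes) {
        align <<= 1;                    /* round up to the next power of two */
    }
    return align;
}

/* True if base is a legal vector table address for that many interrupts. */
bool vtor_base_ok(uint32_t base, uint32_t num_irqs)
{
    return (base & (vtor_alignment_bytes(num_irqs) - 1u)) == 0u;
}
```

With 21 interrupts the table is 37 words, so the base must sit on a 64-word (256-byte) boundary.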

Finally, the boot process interacts with debugging. When GDB attaches, it can halt the CPU after reset and show you the SP, PC, and memory contents. If the SP points into flash or the reset handler points into unmapped memory, the CPU never executes real code. Being able to inspect these values is an essential skill for early bring-up. The boot sequence is the skeleton that everything else hangs on, including interrupts, context switching, and RTOS scheduling. If you can explain it, you can fix most “board is dead” scenarios in minutes instead of days.

Additional depth: The reset sequence also involves system-level configuration in the System Control Block (SCB). For example, the VTOR register can be used to relocate the vector table to a different memory region (such as SRAM) after boot. This matters when you have a bootloader that lives at flash base and an application that lives at an offset. In that case, the bootloader often remaps the vector table for the application, or the application writes VTOR so that interrupts resolve to its own handlers. Another nuance is that faults are exceptions too; if your HardFault handler is missing or points to an invalid address, the CPU can end up in a lockup state where it repeatedly faults and never returns. This can look like a completely dead board. Learning to read fault status registers (HFSR, CFSR) in the SCB helps you pinpoint boot errors quickly.
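Reading the fault status registers is mostly a matter of testing a few bits. The sketch below classifies the boot-time causes discussed above from raw HFSR (0xE000ED2C) and CFSR (0xE000ED28) values, for example as copied out of GDB. The bit positions come from the ARMv7-M SCB definitions; the enum, the function name, and the mapping from each bit to a "likely cause" are illustrative assumptions, not a complete decoder.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions from the ARMv7-M SCB register descriptions. */
#define HFSR_VECTTBL   (1u << 1)    /* bus error while fetching a vector */
#define HFSR_FORCED    (1u << 30)   /* configurable fault escalated to HardFault */
#define CFSR_STKERR    (1u << 12)   /* bus error while stacking on exception entry */
#define CFSR_INVSTATE  (1u << 17)   /* execution attempted with the Thumb bit clear */
#define CFSR_NOCP      (1u << 19)   /* coprocessor (FPU) access while disabled */

typedef enum {
    FAULT_NONE,
    FAULT_VECTOR_FETCH,       /* check table address and entries */
    FAULT_MISSING_THUMB_BIT,  /* check handler addresses' LSB */
    FAULT_FPU_DISABLED,       /* enable the FPU before using it */
    FAULT_BAD_STACK,          /* check the initial SP */
    FAULT_ESCALATED_OTHER     /* inspect the remaining CFSR bits */
} fault_hint_t;

/* Maps raw register values to the most likely boot-time cause. */
fault_hint_t classify_boot_fault(uint32_t hfsr, uint32_t cfsr)
{
    if (hfsr & HFSR_VECTTBL)  return FAULT_VECTOR_FETCH;
    if (cfsr & CFSR_INVSTATE) return FAULT_MISSING_THUMB_BIT;
    if (cfsr & CFSR_NOCP)     return FAULT_FPU_DISABLED;
    if (cfsr & CFSR_STKERR)   return FAULT_BAD_STACK;
    if (hfsr & HFSR_FORCED)   return FAULT_ESCALATED_OTHER;
    return FAULT_NONE;
}
```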

There is also a subtle interaction with linker-generated symbols and startup code. The vector table, reset handler, and stack top must be placed consistently by the linker script, and the startup code must use those symbols as absolute addresses. If you accidentally declare a symbol as a regular variable instead of extern, the linker might allocate storage for it, shifting memory layout and corrupting the vector table. A good practice is to inspect the map file and verify that _estack, _sidata, _sdata, and _edata match your expectations. These details are not just “setup work”; they determine whether the CPU can even fetch instructions safely. Many embedded engineers keep a printed copy of the vector table order and the memory map while debugging early boot. This level of rigor is the difference between a weekend project and production-grade firmware.

Remember that the reset handler runs before any C runtime state exists, so it is the place where the runtime environment must be set up. Even a simple call to a function that uses global variables depends on .data being copied and .bss being cleared. If you skip this or do it in the wrong order, you might see strange behavior like global counters starting at random values. On Cortex-M, the reset handler can also configure stack alignment and enable the FPU if your code uses floating point. If you skip the FPU setup but compile with hardware floating-point options, the first floating-point instruction triggers a UsageFault. These issues often appear only after you add more code, which is why validating the boot path early is critical.
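On cores with an FPU (Cortex-M4F, for example), enabling it means granting full access to coprocessors CP10 and CP11 in CPACR, which lives at 0xE000ED88. A minimal sketch follows; the function takes the register address as a parameter so it can be exercised on a host, and its name is an illustrative assumption.

```c
#include <assert.h>
#include <stdint.h>

/* CP10 and CP11 access fields occupy bits 23:20; 0b11 in each = full access. */
#define CPACR_CP10_CP11_FULL (0xFu << 20)

/* Grants full FPU access. On target, pass (volatile uint32_t *)0xE000ED88
 * and follow the write with DSB and ISB barriers before any FP instruction. */
void enable_fpu(volatile uint32_t *cpacr)
{
    *cpacr |= CPACR_CP10_CP11_FULL;
}
```

This has to run in the reset handler before any code compiled with hardware floating point, since the compiler may emit FPU instructions in function prologues.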

How this fits into the project

This concept is the backbone of the entire project. You will implement the vector table in startup.s, validate the initial stack pointer placement, and prove the reset handler initializes .data/.bss before calling main (see Sec. 3.1 and Sec. 5.2). It also sets the stage for later projects that rely on interrupts and PendSV.

Definitions & key terms

Vector table: An array of exception/interrupt handler addresses in a fixed order.
Reset handler: First function executed after reset; initializes memory and jumps to main.
VTOR: Vector Table Offset Register; points to the base address of the vector table.
Thumb state: Cortex-M execution mode; function pointers must have LSB set.
HardFault: Exception raised on invalid memory access, bad stack, or illegal state.

Mental model diagram (ASCII)

Reset
  |
  v
[Vector Table]
  | word0 -> SP
  | word1 -> Reset_Handler
  v
Reset_Handler
  | init .data/.bss
  | configure clocks
  v
main()

How it works (step-by-step, with invariants and failure modes)

CPU fetches the vector table base (default 0x00000000).
Loads SP from word 0; invariant: SP must point into RAM and be 8-byte aligned.
Loads PC from word 1; invariant: LSB must be 1 (Thumb state).
Executes reset handler; failure mode: if .data copy is wrong, globals are garbage.
Reset handler sets up .bss, optional clocks, then calls main.
If any interrupt fires, the core indexes into the vector table at entry 16 + IRQ number.

Minimal concrete example

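#include <stdint.h>

extern uint32_t __stack_top;        // defined by the linker script
void Reset_Handler(void);
void NMI_Handler(void);
void HardFault_Handler(void);

__attribute__((section(".isr_vector")))
void (* const vector_table[])(void) = {
    (void (*)(void))(&__stack_top), // word 0: initial SP
    Reset_Handler,                  // word 1: reset vector
    NMI_Handler,
    HardFault_Handler,
    // ...
};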

Common misconceptions

“The compiler sets up the stack for me” -> Not on bare metal; you must place it.
“main is the first code that runs” -> Reset handler runs first and must initialize memory.
“Any address can be a handler” -> Must be valid and in Thumb state.

Check-your-understanding questions

Why does the reset handler address need its LSB set?
What happens if the initial stack pointer points into flash?
Why must .bss be zeroed manually on boot?
What is the role of VTOR, and when would you change it?

Check-your-understanding answers

The LSB indicates Thumb state on Cortex-M; without it, the CPU faults.
The CPU will attempt to push registers into flash, causing a fault.
C assumes zero-initialized globals; without zeroing, behavior is undefined.
VTOR tells the CPU where the vector table lives; change it when relocating the table.

Real-world applications

Bootloaders that remap vector tables for firmware updates.
Safety systems that place fault handlers in protected memory.
ROM-based startup code for automotive ECUs.

Where you’ll apply it

This project: Sec. 3.1, Sec. 3.7.2, Sec. 4.1, Sec. 5.2.
Also used in: Project 2 and Project 4.

References

ARMv7-M Architecture Reference Manual, Exception Model.
“The Definitive Guide to ARM Cortex-M” by Joseph Yiu, Ch. 5-6.
STM32F4 Reference Manual, System Control Block (SCB).

Key insights

A correct vector table and reset handler are the only gatekeepers between a dead board and a running system.

Summary

The boot sequence defines the first execution path on Cortex-M. If you can set SP, PC, and memory correctly, everything else becomes possible.

Homework/Exercises to practice the concept

Draw the vector table layout with exact offsets for the first 10 entries.
Modify the vector table to point SysTick to a custom handler and verify it triggers.
Relocate the vector table to SRAM and confirm VTOR updates the base.

Solutions to the homework/exercises

Use the ARM exception order: Reset at offset 0x04, HardFault at 0x0C, etc.
Set the SysTick entry to your handler and enable SysTick; observe a GPIO toggle.
Copy the table to SRAM, set VTOR to SRAM base, and verify interrupts still work.

2.2 Linker Script, Memory Map, and Section Placement

Fundamentals

A linker script defines where every byte of your firmware lives in flash and RAM. It maps sections such as .text (code), .rodata (const data), .data (initialized variables), and .bss (zeroed variables) into explicit addresses. On a microcontroller, these addresses must match the physical memory map in the datasheet. If you give .data a runtime address in flash or let the stack overlap the heap, your program will fail, often before main runs. The linker script also defines symbols such as __stack_top, which your startup code uses. This means the linker script is not a build artifact; it is part of your runtime contract.

Additional fundamentals: A linker script is the only tool that tells the firmware where things live. Without it, sections might overlap or land in invalid memory. Reading the map file after linking is part of correct engineering, not a debugging trick. If you can point to each section in memory, you can reason about every crash and prevent most early-boot faults.

Deep Dive into the concept

Unlike desktop software, embedded firmware must be placed in fixed locations because there is no MMU, no loader, and no OS to relocate code. The linker script provides two sets of addresses for sections like .data: a load address in flash (LMA) and a runtime address in RAM (VMA). The reset handler uses these addresses to copy initialized data into RAM. This is why .data has two addresses and why the linker provides symbols like _sidata, _sdata, and _edata. The .bss section, on the other hand, has only a runtime address and must be zeroed by the reset handler. The linker script also establishes the boundary of RAM for the stack and, optionally, a heap. If you use a static-only RTOS, you may omit the heap entirely and reserve stack regions for tasks.
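The copy and zero steps reduce to two short loops. The sketch below takes the boundaries as parameters so it can be run on a host; on target, the arguments would be the addresses of the linker symbols (`&_sidata`, `&_sdata`, `&_edata`, plus `.bss` bounds, whose names `_sbss` and `_ebss` are an assumption here). Copying word by word also assumes the section boundaries are 4-byte aligned.

```c
#include <assert.h>
#include <stdint.h>

/* Copy initialized data from its load address (flash) to its runtime
 * address (RAM). [start, end) is the runtime range of .data. */
void copy_data(const uint32_t *load_addr, uint32_t *start, const uint32_t *end)
{
    while (start < end) {
        *start++ = *load_addr++;
    }
}

/* Clear zero-initialized data in place; .bss has no image in flash. */
void zero_bss(uint32_t *start, const uint32_t *end)
{
    while (start < end) {
        *start++ = 0u;
    }
}
```

A reset handler would call both before `main`, in either order, since the two ranges do not overlap.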

The memory map is typically divided into FLASH and RAM with explicit sizes. For STM32F4, flash may start at 0x08000000 and RAM at 0x20000000. A common failure is forgetting that the vector table must be at the start of flash, or placing .isr_vector in the wrong section. Another failure is forgetting alignment: Cortex-M requires the vector table to be aligned to 128 bytes or more, depending on the number of vectors. The linker script controls alignment by using ALIGN() directives. If you skip alignment, the CPU might use the wrong offset for interrupt entries.

Beyond correctness, a linker script is how you design memory safety. You can set up guard regions, reserve space for a bootloader, or separate privileged kernel stacks from user stacks. You can even place constant tables into CCM (Core-Coupled Memory) on some MCUs for faster access. The key is that every decision is explicit; there is no hidden allocator moving things around. This is why the linker script is as important to the RTOS as the scheduler itself. Later projects use the same mechanism to carve out task stacks, memory pools, and instrumentation buffers, so this project is where you learn the technique.

Finally, linker scripts enable deterministic debugging. If you know the exact address of a symbol, you can inspect it in GDB or dump it over UART. You can also verify that the stack pointer starts at the correct location by checking the symbol __stack_top. This is the difference between guessing and engineering. The linker script is the map; the startup code is the tour guide.

Additional depth: Linker scripts also allow you to control initialization ordering. For example, you can define custom sections like .init_array or .rtos_init and ensure those sections are placed in flash and copied appropriately. This becomes important when you have kernel-level objects that must be initialized before main. Another aspect is the difference between AT and > directives in the linker script. AT defines the load address (in flash), while > defines the runtime address (in RAM). Forgetting this distinction leads to a very common error: you place .data in RAM but forget to specify a load address, so the reset handler copies garbage into RAM because the LMA symbols are wrong. The firmware may run for a while and then behave unpredictably.

Alignment constraints can be subtle. The vector table must be aligned to a power-of-two boundary, and many MCUs require at least 128-byte alignment. Similarly, the stack pointer must be 8-byte aligned per the ARM ABI. Misalignment may not fail immediately but can cause faults when the CPU uses doubleword accesses or when an exception occurs. It is a good habit to explicitly align sections using ALIGN(8) or higher and to verify the alignment in the map file. When you later add floating-point or DSP instructions, alignment becomes even more important.
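The arithmetic behind an ALIGN() directive is a mask operation on a power-of-two boundary. This small sketch mirrors it in C so you can sanity-check addresses from the map file; the helper names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round addr up to the next multiple of alignment (a power of two). */
uint32_t align_up(uint32_t addr, uint32_t alignment)
{
    assert(alignment != 0u && (alignment & (alignment - 1u)) == 0u);
    return (addr + alignment - 1u) & ~(alignment - 1u);
}

/* Round addr down to a multiple of alignment, e.g. for a stack top. */
uint32_t align_down(uint32_t addr, uint32_t alignment)
{
    assert(alignment != 0u && (alignment & (alignment - 1u)) == 0u);
    return addr & ~(alignment - 1u);
}

/* True if addr already sits on the boundary. */
bool is_aligned(uint32_t addr, uint32_t alignment)
{
    return align_down(addr, alignment) == addr;
}
```

Rounding a stack top down rather than up keeps it inside RAM, which is why both directions are shown.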

Memory maps also involve reserved regions. Some microcontrollers have special RAM blocks like CCM (Core Coupled Memory) that are faster but not accessible by DMA. If you place DMA buffers there, peripheral transfers will fail silently. Similarly, some flash regions are reserved for bootloader or option bytes. The linker script is where you enforce these boundaries so that your application never overwrites them. Understanding these details in a small project pays off later when you integrate larger systems with bootloaders, firmware updates, or safety partitions.
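A containment check makes such boundaries testable. The sketch below is a host-side helper that reports whether a buffer lies entirely inside a region; the `region_t` type and function name are illustrative, and the example sizes in the usage note (128 KB of main SRAM at 0x20000000, CCM at 0x10000000) are assumptions that vary by part.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A contiguous memory region from the MCU memory map. */
typedef struct {
    uint32_t base;
    uint32_t size;
} region_t;

/* True if [addr, addr + len) lies entirely inside r.
 * Written with subtractions so it cannot overflow near 0xFFFFFFFF. */
bool region_contains(region_t r, uint32_t addr, uint32_t len)
{
    return addr >= r.base &&
           len <= r.size &&
           (addr - r.base) <= (r.size - len);
}
```

Checking a DMA buffer against the main SRAM region, rather than against "any RAM", is what catches a buffer that was accidentally placed in CCM.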

How this fits into the project

You will author linker.ld and verify .isr_vector, .text, .data, .bss, and stack placement. The memory map directly influences the correctness of the LED blink and all later RTOS features (Sec. 3.1, Sec. 4.4, Sec. 5.2).

Definitions & key terms

LMA (Load Memory Address): Where data is stored in flash.
VMA (Virtual Memory Address): Where data lives at runtime, typically in RAM.
Section: A labeled block of data or code placed by the linker.
Symbol: A named address emitted by the linker for startup code to use.
Alignment: Boundary requirements for sections (power-of-two).

Mental model diagram (ASCII)

FLASH (0x0800_0000)
| .isr_vector | .text | .rodata | .data (LMA) |
                   | copy to RAM
RAM (0x2000_0000)
| .data (VMA) | .bss | task stacks | idle stack |

How it works (step-by-step, with invariants and failure modes)

Linker places .isr_vector at flash base; invariant: correct alignment.
Linker emits symbols for section boundaries.
Reset handler copies .data from LMA to VMA; failure mode: wrong size.
Reset handler zeroes .bss; failure mode: uninitialized globals.
Stack pointer initialized to __stack_top; failure mode: stack overlaps .bss.

Minimal concrete example

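MEMORY {
  FLASH (rx) : ORIGIN = 0x08000000, LENGTH = 512K
  RAM   (rwx): ORIGIN = 0x20000000, LENGTH = 128K
}

__stack_top = ORIGIN(RAM) + LENGTH(RAM);   /* initial SP: top of RAM */

SECTIONS {
  .isr_vector : { KEEP(*(.isr_vector)) } > FLASH
  .text : { *(.text*) *(.rodata*) . = ALIGN(4); } > FLASH
  _sidata = LOADADDR(.data);               /* LMA of .data in flash */
  .data : { _sdata = .; *(.data*) . = ALIGN(4); _edata = .; } > RAM AT > FLASH
  .bss  : { _sbss = .; *(.bss*) *(COMMON) . = ALIGN(4); _ebss = .; } > RAM
}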

Common misconceptions

“The linker script is optional” -> Without it, placement is undefined.
“The stack grows up” -> On Cortex-M, it grows downward from the initial SP, which is usually placed at the top of RAM.
“Alignment doesn’t matter” -> Misaligned vectors cause crashes.

Check-your-understanding questions

Why does .data have both LMA and VMA?
What symbol tells you where the stack should start?
What happens if .bss overlaps task stacks?
Why must .isr_vector be kept from garbage collection?

Check-your-understanding answers

It is stored in flash but used in RAM; the reset handler copies it.
A linker-defined symbol like __stack_top (or _estack).
Task stacks will overwrite globals, causing random failures.
The linker might discard it if not referenced; KEEP prevents that.

Real-world applications

Bootloader + application memory partitioning.
Safety-critical systems with protected stack regions.
Firmware upgrades with dual-bank flash layouts.

Where you’ll apply it

This project: Sec. 3.1, Sec. 4.4, Sec. 5.2, Sec. 5.10.
Also used in: Project 3 and Project 9.

References

GNU ld documentation, “Linker Scripts” section.
“The Definitive Guide to ARM Cortex-M” by Joseph Yiu, Ch. 4.
STM32F4 Reference Manual, Memory Map chapter.

Key insights

The linker script is the physical blueprint of your firmware; if the blueprint is wrong, every higher layer collapses.

Summary

Linker scripts place code, data, and stacks into precise memory locations and enable deterministic boot and debugging.

Homework/Exercises to practice the concept

Add a custom section .rtos_stacks and place it in RAM after .bss.
Modify the linker script to reserve 16 KB for a bootloader at flash base.
Verify in the map file that .isr_vector is at the correct address.

Solutions to the homework/exercises

Define .rtos_stacks in SECTIONS and reference it with an extern symbol.
Change FLASH origin to 0x08004000 and adjust LENGTH accordingly.
Inspect the map file for the .isr_vector address and alignment.

2.3 Memory-Mapped I/O and GPIO Control

Fundamentals

Memory-mapped I/O means peripherals are controlled by reading and writing specific addresses that map to hardware registers. For GPIO, this includes configuration registers (mode, speed, pull-up/pull-down) and data registers (output state). Because these addresses represent hardware, you must use volatile to prevent the compiler from optimizing away reads and writes. The CPU treats these locations like memory, but the side effects are real: writing a bit can turn on an LED, and reading a register can clear a hardware flag. This direct control is the essence of bare-metal programming.

Additional fundamentals: Memory-mapped registers are not normal variables. Reads and writes have side effects, and timing can matter. Treat every access as a transaction with hardware. Use volatile consistently, and always confirm the peripheral clock is enabled before configuring registers. This simple discipline prevents a huge class of bring-up failures.

Deep Dive into the concept

GPIO peripherals typically sit behind a clock gate. On STM32, you must first enable the GPIO clock in RCC before any writes to GPIO registers take effect. This is a classic bring-up trap: your code might write the correct values, but the peripheral is not clocked and appears dead. Once clocked, each GPIO port has a MODER register to set pin direction, an OTYPER register for push-pull or open-drain, PUPDR for pull resistors, and ODR/BSRR for outputs. The BSRR register is especially important because it allows atomic set/reset of bits without read-modify-write hazards. This matters because an interrupt could otherwise modify ODR between your read and your write, and your write would then silently undo its change.
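The BSRR layout is easiest to see in a small model: the low 16 bits set pins, the high 16 bits reset them, and set wins when both are requested for the same pin. The sketch below is a host-side model of that behavior as described for STM32 GPIO; the helper names are illustrative and the model is not a substitute for the reference manual.

```c
#include <assert.h>
#include <stdint.h>

/* Value to write to BSRR to drive a pin high. */
uint32_t bsrr_set(unsigned pin)
{
    assert(pin < 16u);
    return 1u << pin;
}

/* Value to write to BSRR to drive a pin low. */
uint32_t bsrr_reset(unsigned pin)
{
    assert(pin < 16u);
    return 1u << (pin + 16u);
}

/* Host-side model of how the port applies one BSRR write to ODR. */
uint32_t bsrr_apply(uint32_t odr, uint32_t bsrr_value)
{
    uint32_t set_bits   = bsrr_value & 0xFFFFu;
    uint32_t reset_bits = (bsrr_value >> 16) & 0xFFFFu;
    /* Reset first, then set: set takes priority for a pin named in both. */
    return ((odr & ~reset_bits) | set_bits) & 0xFFFFu;
}
```

Because each write names only the pins it affects, no read of ODR is needed, which is what removes the read-modify-write window.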

Memory-mapped I/O also has timing and ordering implications. Some registers require delays between writes, others are write-one-to-clear, and many are reset to default values that must be changed in a specific sequence. A common failure is writing to the wrong address because of a misread datasheet or missing offset. This is why it is best to define registers using #define offsets or struct mappings and to verify addresses with the reference manual. It is also why GDB is a powerful tool: you can inspect the registers directly in memory to confirm whether your writes had effect. If a GPIO output doesn’t toggle, you can read back ODR or IDR to verify the state.

In the broader RTOS context, memory-mapped I/O is how the kernel interacts with timers, UARTs, and interrupt controllers. The RTOS does not own special privileges; it simply writes registers more systematically. This means mastering GPIO is not just about blinking an LED; it is about understanding the basic transaction between software and hardware. The same principles apply to NVIC configuration, SysTick, and even context switch triggers. If you can control a pin reliably, you can control the entire system.

Additional depth: GPIO control is often your first exposure to real hardware behavior. Many GPIO registers are not simple read/write registers; some are write-only (like BSRR), some are write-one-to-clear, and some have reserved bits that must never be modified. If you accidentally write to reserved bits, the peripheral may misbehave or become unstable. This is why register definitions should use masks that preserve reserved fields. A common pattern is to read-modify-write only the bits you intend to change and to use the reference manual’s bit descriptions rather than guessing.
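A masked field update keeps neighboring bits intact. The sketch below computes a new MODER value for one pin's 2-bit mode field (00 input, 01 output, 10 alternate function, 11 analog on STM32); the function name is an illustrative assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Returns moder with the 2-bit mode field for pin replaced by mode.
 * All other pins' fields are preserved. */
uint32_t moder_with_mode(uint32_t moder, unsigned pin, uint32_t mode)
{
    assert(pin < 16u && mode < 4u);
    uint32_t shift = pin * 2u;
    uint32_t mask  = 3u << shift;
    return (moder & ~mask) | (mode << shift);   /* clear, then set */
}
```

Clearing the field first matters: OR-ing 01 into a field that already holds 11 leaves it at 11, so the pin would stay in analog mode instead of becoming an output.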

Clock gating is another subtle aspect. On STM32, the RCC module controls which peripheral clocks are enabled. If the clock is off, writes are ignored, but reads may return zero or stale values. This can trick you into thinking your code is correct when it is not. Always verify the RCC enable bit after setting it. Also note that some peripherals require a delay after enabling the clock before configuration registers respond. This is not always obvious in the datasheet, but it can cause transient failures if you configure too quickly. Adding a short dummy read or a few NOPs after enabling the clock is a defensive practice.
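The enable-then-verify habit fits in a few lines. This sketch takes the enable register as a pointer so it can be exercised against an ordinary variable on a host; on target you would pass the RCC enable register for your port. The function name is an illustrative assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sets one clock-enable bit and reads the register back.
 * Returns true if the bit reads as set afterwards. */
bool enable_clock(volatile uint32_t *enable_reg, unsigned bit)
{
    assert(bit < 32u);
    *enable_reg |= (1u << bit);
    /* The read-back doubles as the dummy read after enabling the clock. */
    uint32_t readback = *enable_reg;
    return (readback & (1u << bit)) != 0u;
}
```

If this returns false on hardware, the register address or bit number is wrong, and no amount of GPIO configuration afterwards will take effect.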

GPIO output speed and drive type also matter. If the LED is wired from the pin to ground and the pin is configured as open-drain, the pin can only pull low and the LED will never light. If you select a low-speed output but try to toggle at a high frequency, you may see rounded edges or timing distortion on a scope. Understanding these electrical properties helps you interpret timing measurements later in the RTOS projects. For example, if you use a GPIO pulse to measure ISR latency, the pin’s output speed setting affects rise time, which affects measurement accuracy. This is why even a “simple” LED blink is a lesson in the hardware-software boundary.

How this fits into the project

You will configure RCC and GPIO registers to blink an LED in Sec. 3.1 and validate the output in Sec. 3.7. This knowledge is reused in timing measurements in Project 2 and Project 10.

Definitions & key terms

Memory-mapped I/O: Peripheral registers mapped into the CPU address space.
GPIO: General Purpose Input/Output, controllable pins.
RCC: Reset and Clock Control; enables peripheral clocks.
BSRR: Bit Set/Reset Register; atomic GPIO set/reset.
volatile: C qualifier preventing optimization of hardware accesses.

Mental model diagram (ASCII)

CPU store -> [GPIOx_BSRR address] -> hardware latch -> LED pin
CPU load  <- [GPIOx_ODR address]  <- output state

How it works (step-by-step, with invariants and failure modes)

Enable GPIO clock in RCC; invariant: peripheral clock must be on.
Configure pin mode as output; failure: pin remains input.
Write to BSRR to set/reset; failure: using ODR with read-modify-write glitch.
Observe LED state; verify by reading back ODR/IDR.

Minimal concrete example

#include <stdint.h>

#define RCC_AHB1ENR (*(volatile uint32_t*)0x40023830)
#define GPIOA_MODER (*(volatile uint32_t*)0x40020000)
#define GPIOA_BSRR  (*(volatile uint32_t*)0x40020018)

void led_on(void)
{
    RCC_AHB1ENR |= (1u << 0);           // enable GPIOA clock
    GPIOA_MODER &= ~(3u << (5 * 2));    // clear PA5 mode field
    GPIOA_MODER |=  (1u << (5 * 2));    // PA5 = 01: general-purpose output
    GPIOA_BSRR   =  (1u << 5);          // set PA5
}

Common misconceptions

“If I write to a register, it always sticks” -> not if the peripheral clock is off.
“ODR is always safe” -> read-modify-write can glitch under interrupts.
“volatile is optional” -> compiler may remove necessary writes.

Check-your-understanding questions

Why do you enable the GPIO clock before configuring the pin?
What is the advantage of BSRR over ODR writes?
How can you verify a GPIO register write without a scope?

Check-your-understanding answers

Without the clock, the peripheral ignores writes.
BSRR is atomic and avoids read-modify-write hazards.
Read back ODR/IDR with GDB or print via UART if available.

Real-world applications

Control outputs for motors, relays, and indicators.
Toggle timing pins for logic-analyzer measurements.
Drive chip-select lines for SPI devices.

Where you’ll apply it

This project: Sec. 3.1, Sec. 3.7.2, Sec. 5.10 Phase 1.
Also used in: Project 2 and Project 10.

References

STM32F4 Reference Manual, GPIO chapter.
“Making Embedded Systems” by Elecia White, Ch. 4.

Key insights

GPIO bring-up is the first proof that your software controls real hardware.

Summary

Memory-mapped I/O uses ordinary loads and stores to drive hardware. GPIO is the simplest and most useful example.

Homework/Exercises to practice the concept

Configure two pins and alternate them to create a square wave.
Measure the maximum toggle rate with a scope or logic analyzer.
Implement a read-back verification routine that asserts pin state.

Solutions to the homework/exercises

Set pin A, delay, clear pin A, set pin B, delay, clear pin B.
Remove delays and loop; measure frequency and estimate CPU overhead.
After writing BSRR, read ODR and compare expected bit values.

3. Project Specification

3.1 What You Will Build

A complete bare-metal firmware image for a Cortex-M board that boots with your own startup code and linker script, configures GPIO directly, and blinks an LED at a visible rate without any vendor HAL. The build will produce ELF and binary artifacts and will be flashable via OpenOCD and ST-Link.

Included:

Custom startup.s with vector table and reset handler
linker.ld defining flash/RAM layout
C code to configure RCC/GPIO and blink LED

Intentionally excluded:

Vendor HAL or CMSIS device libraries (optional headers are allowed)
Any RTOS code (that comes later)
Dynamic memory allocation

3.2 Functional Requirements

Boot Correctness: Reset handler initializes .data and .bss and calls main.
Vector Table Placement: Vector table is at the correct flash base and aligned.
GPIO Control: LED pin toggles using memory-mapped registers.
Debuggable Build: ELF includes symbols for GDB inspection.

3.3 Non-Functional Requirements

Performance: LED blink should be stable with <5% timing drift.
Reliability: No HardFaults during 60 seconds of continuous run.
Usability: Build uses a single make target to produce artifacts.

3.4 Example Usage / Output

$ make all
arm-none-eabi-gcc ... -T linker.ld -o build/rtos.elf
arm-none-eabi-objcopy -O binary build/rtos.elf build/rtos.bin

$ make flash
openocd -f interface/stlink.cfg -f target/stm32f4x.cfg -c "program build/rtos.elf verify reset exit"

3.5 Data Formats / Schemas / Protocols

ELF: build/rtos.elf contains symbols, sections, and debug info.
BIN: build/rtos.bin is raw flash image.
Map file: build/rtos.map records section placement.

3.6 Edge Cases

Vector table misaligned -> HardFault on reset.
GPIO clock not enabled -> LED never toggles.
Stack pointer outside RAM -> immediate fault on first push.
Incorrect .data copy size -> globals corrupted.

3.7 Real World Outcome

A learner can power-cycle the board, see the LED blink with a stable rhythm, and attach GDB to single-step through register writes. The firmware runs indefinitely without fault.

3.7.1 How to Run (Copy/Paste)

make clean all
make flash

Expected working directory: the project root with Makefile, startup.s, and linker.ld.

Exit codes:

make returns 0 on success, 2 on compilation or link failure.
openocd returns 0 on successful flash/verify, 1 on connection failure.

3.7.2 Golden Path Demo (Deterministic)

Power board with USB.
Run make flash.
Observe LED toggling at a 1 Hz visible rate.
Attach GDB and break main, then continue and observe the LED resumes.

3.7.3 Failure Demo (Deterministic)

Comment out the RCC GPIO clock enable.
Rebuild and flash.
LED never toggles, and GDB shows the GPIO registers still at their reset values.
Restore the RCC enable and the LED toggles again.

3.7.4 Hardware/Firmware Demo

Scope or logic analyzer probe on the LED pin shows a square wave with ~50% duty cycle.
GDB shows RCC_AHB1ENR bit for the GPIO port set to 1.

4. Solution Architecture

4.1 High-Level Design

+-------------------+      +--------------------+
| Reset/Startup     | ---> | main()             |
| - Vector table    |      | - GPIO init        |
| - .data/.bss init |      | - Blink loop       |
+-------------------+      +--------------------+

4.2 Key Components

Component	Responsibility	Key Decisions
`startup.s`	Vector table, reset handler	Thumb state, correct stack pointer
`linker.ld`	Memory layout and section placement	Stack location, section alignment
`gpio.c`	Register-level GPIO control	Use BSRR for atomic set/reset
`Makefile`	Build and flash automation	`arm-none-eabi-gcc` toolchain

4.3 Data Structures (No Full Code)

extern uint32_t __stack_top;  // from linker
struct gpio_regs {
    volatile uint32_t MODER;
    volatile uint32_t OTYPER;
    volatile uint32_t OSPEEDR;
    volatile uint32_t PUPDR;
    volatile uint32_t IDR;
    volatile uint32_t ODR;
    volatile uint32_t BSRR;
};

4.4 Algorithm Overview

Key Algorithm: Delay Loop

Configure GPIO pin as output.
Flip the pin state with a single BSRR write (set or reset).
Busy-wait for N cycles.
Repeat indefinitely.

Complexity Analysis:

Time: O(N) per loop iteration, dominated by the N-cycle busy-wait; the register writes themselves are O(1).
Space: O(1).
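
The four steps above can be sketched in C as follows. This is an illustrative outline, not the project's required implementation: the function names are hypothetical, the MODER and BSRR bit layouts assume an STM32F4-style GPIO block, and `blink` takes a toggle count so it can terminate, where real firmware would loop forever.

```c
#include <stdint.h>

struct gpio_regs {
    volatile uint32_t MODER, OTYPER, OSPEEDR, PUPDR, IDR, ODR, BSRR;
};

/* MODER holds two bits per pin; 01 selects general-purpose output. */
static void gpio_set_output(struct gpio_regs *gpio, unsigned pin)
{
    uint32_t moder = gpio->MODER;
    moder &= ~(UINT32_C(3) << (pin * 2u));
    moder |= UINT32_C(1) << (pin * 2u);
    gpio->MODER = moder;
}

/* BSRR: writing 1 to bit n sets the pin; writing 1 to bit n+16 resets it.
 * A single store changes one pin without a read-modify-write of ODR. */
static void gpio_write(struct gpio_regs *gpio, unsigned pin, int high)
{
    gpio->BSRR = high ? (UINT32_C(1) << pin) : (UINT32_C(1) << (pin + 16u));
}

/* Crude busy-wait; the volatile counter keeps the compiler from removing it. */
static void delay_loops(uint32_t n)
{
    for (volatile uint32_t i = 0; i < n; ++i) { }
}

/* Performs `toggles` half-periods on the given pin. */
static void blink(struct gpio_regs *gpio, unsigned pin,
                  uint32_t loops, uint32_t toggles)
{
    int level = 0;
    gpio_set_output(gpio, pin);
    for (uint32_t t = 0; t < toggles; ++t) {
        level = !level;              /* track the state in software */
        gpio_write(gpio, pin, level);
        delay_loops(loops);
    }
}
```

Tracking the level in a local variable means the loop never reads ODR back, so each state change is a single BSRR store.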

5. Implementation Guide

5.1 Development Environment Setup

brew install arm-none-eabi-gcc openocd
# or use your OS package manager

5.2 Project Structure

rtos-p01/
|-- startup.s
|-- linker.ld
|-- src/
|   |-- main.c
|   `-- gpio.c
|-- Makefile
`-- build/

5.3 The Core Question You’re Answering

“How does a CPU start executing my code with no OS, and how can I prove it by toggling a real pin?”

5.4 Concepts You Must Understand First

Stop and research these before coding:

Vector Table and Reset
- Why the first two words matter.
- How a wrong stack pointer causes HardFaults.
Linker Scripts
- How .data and .bss are placed and initialized.
- How to read a map file.
Memory-Mapped GPIO
- Which register enables the port clock.
- Why volatile is mandatory.

5.5 Questions to Guide Your Design

Which address is your flash base and RAM base?
Where will you place the stack and how big will it be?
Which register toggles the LED without read-modify-write hazards?

5.6 Thinking Exercise

Sketch the full memory map for your MCU and mark .text, .data, .bss, and the stack. Then annotate where the vector table lives.

5.7 The Interview Questions They’ll Ask

“What does the reset handler do that main cannot?”
“Why is volatile required for GPIO registers?”
“How do you know the vector table is aligned?”
“What happens if .data is not copied?”

5.8 Hints in Layers

Hint 1: Prove the vector table works. Set a breakpoint in Reset_Handler and confirm the SP and PC values.

Hint 2: Verify the clock. Read the RCC enable register after writing it.

Hint 3: Use BSRR. Use the BSRR register to avoid accidental clears.

Hint 4: Check the map file. Inspect build/rtos.map to ensure .isr_vector is at the flash base.

5.9 Books That Will Help

Topic	Book	Chapter
Cortex-M startup and exceptions	“The Definitive Guide to ARM Cortex-M”	Ch. 5-6
Embedded memory maps and boot	“Making Embedded Systems”	Ch. 3-4
Makefiles and build automation	“The GNU Make Book”	Ch. 1-3

5.10 Implementation Phases

Phase 1: Bring-Up (2-3 days)

Goals:

Build a minimal binary that reaches main.
Validate .data/.bss init with GDB.

Tasks:

Write startup.s with vector table and reset handler.
Create linker.ld with flash/RAM regions.

Checkpoint: GDB breaks at main, SP points into RAM.
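
The .data/.bss initialization that Phase 1 validates can be sketched in C, although the project itself places the reset handler in startup.s. This is a swapped-in illustration of the same logic, under the assumption that the linker script exports section boundary symbols; the function name `init_ram_sections` and the symbol names mentioned after the block are hypothetical.

```c
#include <stdint.h>

/* Copies the initial values of .data from their load address in flash to
 * their run address in RAM, then zero-fills .bss. All bounds are passed in,
 * so the routine does not depend on any particular linker symbol names. */
static void init_ram_sections(const uint32_t *data_load,
                              uint32_t *data_start, uint32_t *data_end,
                              uint32_t *bss_start, uint32_t *bss_end)
{
    for (uint32_t *dst = data_start; dst < data_end; ++dst) {
        *dst = *data_load++;
    }
    for (uint32_t *dst = bss_start; dst < bss_end; ++dst) {
        *dst = 0u;
    }
}
```

On the target, a reset handler would call this with symbols exported by linker.ld (for example `_sidata`, `_sdata`, `_edata`, `_sbss`, `_ebss`, if your script uses those names) before calling main(). In GDB, an initialized global that reads as garbage at main suggests the copy loop did not run or its bounds are wrong.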

Phase 2: GPIO Control (2-4 days)

Goals:

Configure GPIO and blink LED.
Verify output with a scope or LED observation.

Tasks:

Enable GPIO clock in RCC.
Set GPIO mode and toggle using BSRR.

Checkpoint: LED toggles at stable rate.

Phase 3: Debug Polish (2-3 days)

Goals:

Add map file and build targets.
Document how to debug boot faults.

Tasks:

Add make flash and make gdb targets.
Write a short debug checklist in README.

Checkpoint: You can reproduce a HardFault and explain the cause.

5.11 Key Implementation Decisions

Decision	Options	Recommendation	Rationale
GPIO toggle method	ODR vs BSRR	BSRR	Atomic set/reset avoids race conditions
Stack size	Minimal vs generous	Generous	Avoid early stack overflow during bring-up
Use of CMSIS headers	Yes vs No	Minimal	Avoid abstraction and keep register addresses visible

6. Testing Strategy

6.1 Test Categories

Category	Purpose	Examples
Unit Tests	Validate helpers and math	Delay loop calibration
Integration Tests	Verify boot + GPIO together	Flash and observe LED
Edge Case Tests	Verify failure modes	Misaligned vector table simulation
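
As an example of the "helpers and math" a unit test could cover, the delay-loop calibration reduces to simple arithmetic that runs on the host. The sketch below is an assumption-laden illustration: `delay_loops_for_ms` is a hypothetical name, and the cycles-per-iteration figure depends on your compiler output and core, so measure it with a scope rather than trusting a guess.

```c
#include <stdint.h>

/* Returns how many busy-wait iterations approximate `ms` milliseconds.
 * cycles_per_loop is the measured cost of one loop iteration in CPU cycles.
 * The result is truncated to 32 bits, so keep cpu_hz * ms within range. */
static uint32_t delay_loops_for_ms(uint32_t cpu_hz, uint32_t ms,
                                   uint32_t cycles_per_loop)
{
    if (cycles_per_loop == 0u) {
        return 0u;  /* avoid division by zero on a bad calibration value */
    }
    /* 64-bit intermediate so cpu_hz * ms cannot overflow. */
    uint64_t cycles = (uint64_t)cpu_hz * ms / 1000u;
    return (uint32_t)(cycles / cycles_per_loop);
}
```

For instance, with an assumed 16 MHz core clock and an assumed 4 cycles per iteration, a 500 ms half-period works out to 16,000,000 × 0.5 / 4 = 2,000,000 iterations.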

6.2 Critical Test Cases

Boot Integrity: Check SP/PC after reset; expect valid RAM/flash addresses.
GPIO Toggle: LED toggles at the expected interval for 60 seconds.
Fault Injection: Disable RCC clock; LED should remain off.

6.3 Test Data

Expected SP range: 0x20000000 - 0x20020000 (assumes 128 KB of SRAM at 0x20000000; adjust for your MCU)
Expected LED frequency: ~1 Hz (or your chosen rate)
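
The boot-integrity check can be expressed as two small predicates over the first two vector table words, which also makes it easy to test on the host. This is a sketch under stated assumptions: the struct and function names are hypothetical, the region sizes in the usage note are examples, and the 8-byte stack alignment reflects the ARM procedure call standard rather than a hardware-enforced rule.

```c
#include <stdbool.h>
#include <stdint.h>

struct mem_region {
    uint32_t base;
    uint32_t size;
};

/* Full-descending stack: the initial SP may equal the top of RAM,
 * must lie above the base, and should be 8-byte aligned. */
static bool initial_sp_ok(uint32_t sp, struct mem_region ram)
{
    return sp > ram.base && sp <= ram.base + ram.size && (sp & 7u) == 0u;
}

/* The reset vector must have bit 0 set (Thumb state) and, with that bit
 * masked off, must point inside flash. */
static bool reset_vector_ok(uint32_t vec, struct mem_region flash)
{
    uint32_t addr = vec & ~UINT32_C(1);
    return (vec & 1u) != 0u
        && addr >= flash.base
        && addr < flash.base + flash.size;
}
```

With RAM described as `{0x20000000, 0x20000}` and an assumed flash region of `{0x08000000, 0x80000}`, an SP of `0x20020000` and a reset vector of `0x08000199` both pass, while an even reset vector such as `0x08000198` fails the Thumb-bit check.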

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall	Symptom	Solution
Vector table misaligned	HardFault at reset	Use ALIGN and verify map file
GPIO clock not enabled	LED never toggles	Set RCC enable bit and verify in GDB
Stack outside RAM	Immediate crash	Fix linker symbol and stack placement

7.2 Debugging Strategies

Check SP/PC first: If those are wrong, nothing else matters.
Read back registers: Confirm RCC/GPIO values before blaming code.

7.3 Performance Traps

Busy-wait delays are CPU-expensive; later projects replace them with timers.

8. Extensions & Challenges

8.1 Beginner Extensions

Add a second LED on another pin and alternate them.
Replace the uncalibrated busy delay with a loop that polls a hardware cycle counter (for example DWT CYCCNT, where the core provides it).

8.2 Intermediate Extensions

Add UART output to print a boot banner.
Relocate vector table to SRAM and confirm interrupts still work.

8.3 Advanced Extensions

Implement a minimal fault handler that prints register state over UART.
Add a CRC check of flash at boot and fault if corrupted.

9. Real-World Connections

9.1 Industry Applications

Automotive ECUs: Boot code initializes safety-critical systems.
Medical devices: Reliable startup is required before any patient interaction.

9.2 Related Open Source Projects

FreeRTOS: Startup + linker scripts for each MCU family.
Zephyr: Board-specific linker and vector table implementations.

9.3 Interview Relevance

Cortex-M startup sequence and linker script reasoning are common embedded interview topics.

10. Resources

10.1 Essential Reading

“The Definitive Guide to ARM Cortex-M” by Joseph Yiu (Startup and exceptions).
“Making Embedded Systems” by Elecia White (Ch. 3-4).

10.2 Video Resources

Cortex-M boot sequence walkthroughs (ARM YouTube channels).
STM32 bring-up tutorials (vendor training videos).

10.3 Tools & Documentation

OpenOCD: Flashing and debug server.
GDB: Register and memory inspection.

10.4 Related Projects in This Series

Project 2: adds timer interrupts.
Project 3: introduces task stacks.

11. Self-Assessment Checklist

11.1 Understanding

I can explain the first two words of the vector table.
I can describe how .data and .bss are initialized.
I can explain why volatile is required for GPIO registers.

11.2 Implementation

All functional requirements are met.
The LED toggles reliably for at least one minute.
GDB can break at main and inspect registers.

11.3 Growth

I can describe at least one boot failure mode and its fix.
I can explain this project in a job interview.

12. Submission / Completion Criteria

Minimum Viable Completion:

LED blinks reliably.
Reset handler initializes .data/.bss.
Map file confirms correct section placement.

Full Completion:

All minimum criteria plus:
Demonstrated GDB inspection of SP/PC and GPIO registers.
Documented memory map in README.

Excellence (Going Above & Beyond):

Vector table relocation to SRAM works.
Fault handler prints register context over UART.