Project 9: STM32 Bare Metal ARM Cortex-M

Build a complete bare metal project on STM32 without HAL or any other libraries, covering GPIO, UART, timers, interrupts, and DMA, and learn professional embedded development from the register level up.

Quick Reference

Attribute       Value
Difficulty      Advanced
Time Estimate   3-4 weeks
Language        C (alt: Rust)
Prerequisites   Projects 1-3 (AVR), ARM basics, C programming
Key Topics      ARM Cortex-M4, RCC clock tree, GPIO, NVIC, DMA, startup code
Hardware        STM32F4 Discovery or Nucleo board (~$15-25)
Tools           arm-none-eabi-gcc, OpenOCD, ST-Link

1. Learning Objectives

By completing this project, you will:

  1. Understand ARM Cortex-M architecture: Master the exception model, NVIC, and core registers unique to Cortex-M processors
  2. Configure complex clock trees: Navigate the STM32 RCC (Reset and Clock Control) to set up system clocks from HSI/HSE through PLLs
  3. Write production-quality startup code: Create vector tables, initialize memory sections, and configure the processor before main()
  4. Master memory-mapped peripheral access: Read datasheets and access GPIO, UART, timers, and DMA through direct register manipulation
  5. Implement interrupt-driven I/O: Configure NVIC priorities, write ISRs, and understand ARM’s exception entry/exit mechanism
  6. Use DMA for efficient data transfer: Offload CPU work to DMA controller for memory-to-peripheral and peripheral-to-memory transfers
  7. Debug embedded systems: Use OpenOCD, GDB, and logic analyzers to troubleshoot bare metal code
  8. Create professional embedded firmware: Organize code with proper hardware abstraction while maintaining bare metal efficiency

2. Theoretical Foundation

2.1 Core Concepts

ARM Cortex-M Architecture Overview

The Cortex-M series is ARM’s processor family for microcontrollers. Unlike Cortex-A (applications) or Cortex-R (real-time), Cortex-M is designed for:

  • Low power consumption
  • Deterministic interrupt latency
  • Simplified programming model (no MMU; only two modes, Thread and Handler)
  • Thumb-2 instruction set (16/32-bit mixed)
┌─────────────────────────────────────────────────────────────────────────────┐
│                    ARM CORTEX-M4 PROCESSOR CORE                             │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                         CPU CORE                                     │   │
│  │  ┌─────────────┐  ┌─────────────┐  ┌──────────────────────────────┐ │   │
│  │  │  Registers  │  │    ALU      │  │      Pipeline (3-stage)      │ │   │
│  │  │  R0-R12     │  │ 32-bit ops  │  │  Fetch → Decode → Execute   │ │   │
│  │  │  SP (R13)   │  │ Hardware    │  └──────────────────────────────┘ │   │
│  │  │  LR (R14)   │  │ multiply    │                                   │   │
│  │  │  PC (R15)   │  │ divide      │  ┌──────────────────────────────┐ │   │
│  │  │  xPSR       │  └─────────────┘  │         FPU (M4F)            │ │   │
│  │  └─────────────┘                   │  Single-precision float      │ │   │
│  │                                    └──────────────────────────────┘ │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                       NVIC (Nested Vectored Interrupt Controller)    │   │
│  │  • Up to 240 external interrupts                                     │   │
│  │  • Up to 8 priority bits (STM32F4 implements 4)                      │   │
│  │  • 12-cycle latency with zero-wait-state memory                      │   │
│  │  • Tail-chaining (back-to-back interrupts without stacking)         │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
│  ┌─────────────────┐  ┌──────────────────┐  ┌────────────────────────┐    │
│  │    SysTick      │  │    Debug Unit    │  │   Memory Protection    │    │
│  │  24-bit timer   │  │  SWD/JTAG        │  │   Unit (MPU) - opt     │    │
│  └─────────────────┘  └──────────────────┘  └────────────────────────┘    │
│                                                                             │
│  BUS INTERFACES:                                                            │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │  I-Code Bus    D-Code Bus    System Bus                              │   │
│  │  (Instructions) (Data/Debug) (Peripherals)                           │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

The STM32 Memory Map

Every peripheral, RAM, and Flash region has a specific address:

┌─────────────────────────────────────────────────────────────────────────────┐
│                    STM32F4 MEMORY MAP (32-bit address space)                │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  0xFFFFFFFF ┌──────────────────────────────────────┐                       │
│             │    Vendor-Specific System Region     │                       │
│  0xE0100000 ├──────────────────────────────────────┤                       │
│             │   Cortex-M4 Internal Peripherals     │                       │
│             │   • SysTick: 0xE000E010              │                       │
│             │   • NVIC:    0xE000E100              │                       │
│             │   • SCB:     0xE000ED00              │                       │
│  0xE0000000 ├──────────────────────────────────────┤ ← Private Peripheral  │
│             │          (Reserved)                  │   Bus (PPB)           │
│  0xA0000000 ├──────────────────────────────────────┤                       │
│             │       FSMC / FMC Registers           │                       │
│  0x60000000 ├──────────────────────────────────────┤ ← External Memory     │
│             │                                      │                       │
│             │      AHB1 Peripherals                │                       │
│             │   • GPIOA:   0x40020000              │                       │
│             │   • GPIOB:   0x40020400              │                       │
│             │   • RCC:     0x40023800              │                       │
│             │   • DMA1:    0x40026000              │                       │
│  0x40020000 ├──────────────────────────────────────┤                       │
│             │      APB2 Peripherals                │                       │
│             │   • USART1:  0x40011000              │                       │
│             │   • SPI1:    0x40013000              │                       │
│             │   • TIM1:    0x40010000              │                       │
│  0x40010000 ├──────────────────────────────────────┤                       │
│             │      APB1 Peripherals                │                       │
│             │   • USART2:  0x40004400              │                       │
│             │   • I2C1:    0x40005400              │                       │
│             │   • TIM2-7:  0x40000000+             │                       │
│  0x40000000 ├──────────────────────────────────────┤ ← Peripheral Base     │
│             │                                      │                       │
│             │           SRAM                       │                       │
│             │   • SRAM1:  0x20000000 (112KB)       │                       │
│             │   • SRAM2:  0x2001C000 (16KB)        │                       │
│  0x20000000 ├──────────────────────────────────────┤ ← SRAM Base           │
│             │                                      │                       │
│             │           Flash Memory               │                       │
│             │   • 1MB (0x08000000 - 0x080FFFFF)    │                       │
│  0x08000000 ├──────────────────────────────────────┤ ← Flash Base          │
│             │                                      │                       │
│  0x00000000 └──────────────────────────────────────┘ ← Aliased (Boot cfg) │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
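
One common way to turn the map above into C is to overlay a struct on a peripheral's base address. The sketch below is illustrative rather than definitive: the names `gpio_regs_t`, `gpio_port_base`, and `GPIO_PORT` are invented for this example (CMSIS device headers ship their own equivalents), and the register order follows the STM32F4 GPIO block.

```c
#include <stddef.h>
#include <stdint.h>

/* GPIO register block, in the order the registers appear in memory.
   Illustrative name; CMSIS headers define an equivalent type. */
typedef struct {
    volatile uint32_t MODER;    /* 0x00: mode                        */
    volatile uint32_t OTYPER;   /* 0x04: output type                 */
    volatile uint32_t OSPEEDR;  /* 0x08: output speed                */
    volatile uint32_t PUPDR;    /* 0x0C: pull-up / pull-down         */
    volatile uint32_t IDR;      /* 0x10: input data                  */
    volatile uint32_t ODR;      /* 0x14: output data                 */
    volatile uint32_t BSRR;     /* 0x18: bit set / reset             */
    volatile uint32_t LCKR;     /* 0x1C: configuration lock          */
    volatile uint32_t AFR[2];   /* 0x20: alternate function low/high */
} gpio_regs_t;

/* Catch layout mistakes at compile time rather than on hardware. */
_Static_assert(offsetof(gpio_regs_t, BSRR) == 0x18, "BSRR offset");
_Static_assert(sizeof(gpio_regs_t) == 0x28, "GPIO block size");

#define AHB1_BASE   0x40020000u  /* GPIOA sits at the start of AHB1 */
#define GPIO_STRIDE 0x400u       /* each GPIO port occupies 1 KB    */

/* Base address of GPIO port 0 (A), 1 (B), 2 (C), ... */
uint32_t gpio_port_base(unsigned port)
{
    return AHB1_BASE + port * GPIO_STRIDE;
}

/* On the target only: view that address as a register block. */
#define GPIO_PORT(port) ((gpio_regs_t *)(uintptr_t)gpio_port_base(port))
```

On hardware, `GPIO_PORT(0)->MODER` would then read GPIOA's mode register. The address arithmetic itself can be checked on a desktop compiler.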

The STM32 Clock Tree

The clock tree is usually the hardest part of STM32 bare metal bring-up. It selects, multiplies, and divides the clock sources to produce the system and bus clocks:

┌─────────────────────────────────────────────────────────────────────────────┐
│                       STM32F4 CLOCK TREE (Simplified)                       │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  CLOCK SOURCES                         SYSTEM CLOCK                         │
│  ─────────────                         ────────────                         │
│                                                                             │
│  ┌─────────────┐                      ┌───────────────┐                    │
│  │    HSI      │──────────────────────│               │                    │
│  │  16 MHz     │                      │   System      │                    │
│  │ (internal)  │     ┌────────┐       │   Clock       │──► SYSCLK          │
│  └─────────────┘     │        │   ┌──►│   Switch      │    (up to 168 MHz) │
│                      │  PLL   │   │   │  (SW bits)    │                    │
│  ┌─────────────┐     │        │───┘   └───────────────┘                    │
│  │    HSE      │────►│ PLLM   │                │                           │
│  │  8-25 MHz   │     │ PLLN   │                │                           │
│  │ (external)  │     │ PLLP   │                ▼                           │
│  └─────────────┘     │ PLLQ   │       ┌───────────────┐                    │
│                      └────────┘       │     AHB       │                    │
│  ┌─────────────┐          │           │   Prescaler   │──► HCLK            │
│  │    LSI      │          │           │  (/1,2,4..512)│   (CPU, AHB bus)   │
│  │  32 kHz     │          │           └───────────────┘                    │
│  │ (internal)  │──────────┼───────────────────┬─────────────┐              │
│  └─────────────┘          │                   │             │              │
│                           │           ┌───────┴───┐ ┌───────┴───┐          │
│  ┌─────────────┐          │           │   APB1    │ │   APB2    │          │
│  │    LSE      │──────────┘           │ Prescaler │ │ Prescaler │          │
│  │  32.768 kHz │                      │/1,2,4,8,16│ │/1,2,4,8,16│          │
│  │ (external)  │                      └───────────┘ └───────────┘          │
│  └─────────────┘                           │             │                 │
│                                            ▼             ▼                 │
│                                         PCLK1         PCLK2                │
│                                        (≤42 MHz)     (≤84 MHz)             │
│                                            │             │                 │
│                                            ▼             ▼                 │
│                                       APB1 Periph   APB2 Periph            │
│                                       (USART2,      (USART1,               │
│                                        TIM2-7,       TIM1,                 │
│                                        I2C, SPI2/3)  SPI1, ADC)            │
│                                                                             │
│  EXAMPLE: 168 MHz from 8 MHz HSE                                           │
│  ──────────────────────────────────                                        │
│  HSE (8 MHz) → PLLM (/8) → 1 MHz → PLLN (*336) → 336 MHz → PLLP (/2)       │
│  → 168 MHz SYSCLK                                                          │
│                                                                             │
│  RCC Registers:                                                             │
│  • RCC_CR:      Enable oscillators, check ready flags                      │
│  • RCC_PLLCFGR: Configure PLL multipliers                                  │
│  • RCC_CFGR:    Select system clock, set prescalers                        │
│  • RCC_AHBxENR / RCC_APBxENR: Enable peripheral clocks                      │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
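
The 168 MHz example can be checked with a few lines of arithmetic before touching hardware. The sketch below computes the PLL output and packs the dividers into the RCC_PLLCFGR bit layout (PLLM in bits 5:0, PLLN in 14:6, PLLP in 17:16, PLLSRC in bit 22, PLLQ in 27:24). The function names are invented for this example, and Q = 7 in the usage note is an assumed value that yields a 48 MHz USB clock.

```c
#include <assert.h>
#include <stdint.h>

/* SYSCLK produced by the main PLL: f_in / M * N / P. */
uint32_t pll_sysclk_hz(uint32_t f_in_hz, uint32_t m, uint32_t n, uint32_t p)
{
    assert(m >= 2u && m <= 63u);                      /* 6-bit field, 0 and 1 invalid */
    assert(n < 512u);                                 /* 9-bit field                  */
    assert(p == 2u || p == 4u || p == 6u || p == 8u);
    return f_in_hz / m * n / p;
}

/* Pack the dividers into the RCC_PLLCFGR register layout. */
uint32_t pllcfgr_encode(uint32_t m, uint32_t n, uint32_t p, uint32_t q,
                        int use_hse)
{
    assert(q >= 2u && q <= 15u);
    uint32_t p_field = p / 2u - 1u;                   /* /2,/4,/6,/8 -> 0,1,2,3 */
    return (m << 0)
         | (n << 6)
         | (p_field << 16)
         | ((use_hse ? 1u : 0u) << 22)                /* PLLSRC: 1 = HSE */
         | (q << 24);
}
```

With an 8 MHz HSE, `pll_sysclk_hz(8000000, 8, 336, 2)` gives 168000000 and `pllcfgr_encode(8, 336, 2, 7, 1)` gives 0x07405408. On the target, that value would be written after HSE reports ready and before the PLL is enabled, and the flash wait states must be raised before SYSCLK is switched to the PLL.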

Vector Table and Startup

The vector table is the heart of Cortex-M interrupt handling:

┌─────────────────────────────────────────────────────────────────────────────┐
│                    CORTEX-M VECTOR TABLE (at 0x08000000)                    │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  Offset    Exception          Handler Function                             │
│  ──────    ─────────          ────────────────                             │
│  0x0000    Initial SP value   → Points to top of RAM                       │
│  0x0004    Reset              → Reset_Handler (entry point!)               │
│  0x0008    NMI                → NMI_Handler                                │
│  0x000C    HardFault          → HardFault_Handler                          │
│  0x0010    MemManage          → MemManage_Handler                          │
│  0x0014    BusFault           → BusFault_Handler                           │
│  0x0018    UsageFault         → UsageFault_Handler                         │
│  0x001C    Reserved           → (unused)                                   │
│  0x0020    Reserved           → (unused)                                   │
│  0x0024    Reserved           → (unused)                                   │
│  0x0028    Reserved           → (unused)                                   │
│  0x002C    SVCall             → SVC_Handler (supervisor call)              │
│  0x0030    DebugMonitor       → DebugMon_Handler                           │
│  0x0034    Reserved           → (unused)                                   │
│  0x0038    PendSV             → PendSV_Handler (context switch)            │
│  0x003C    SysTick            → SysTick_Handler (system timer)             │
│  0x0040    IRQ0 (WWDG)        → WWDG_IRQHandler                            │
│  0x0044    IRQ1 (PVD)         → PVD_IRQHandler                             │
│  ...       ...                → ...                                        │
│  0x00D4    IRQ37 (USART1)     → USART1_IRQHandler                          │
│  0x00D8    IRQ38 (USART2)     → USART2_IRQHandler                          │
│  ...       (up to IRQ81)      → ...                                        │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │  BOOT SEQUENCE:                                                      │   │
│  │                                                                      │   │
│  │  1. Power on → CPU fetches SP from 0x00000000 (aliased to Flash)    │   │
│  │  2. CPU fetches Reset vector from 0x00000004                        │   │
│  │  3. CPU jumps to Reset_Handler                                       │   │
│  │  4. Reset_Handler:                                                   │   │
│  │     a. Copy .data section from Flash to RAM                         │   │
│  │     b. Zero out .bss section in RAM                                 │   │
│  │     c. Call SystemInit() (optional - clock setup)                   │   │
│  │     d. Call main()                                                   │   │
│  │  5. main() runs your application                                     │   │
│  │                                                                      │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
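
Steps 4a and 4b of the boot sequence are small enough to sketch directly. The version below is written against plain pointers so it can be exercised on a desktop compiler. On the target, the arguments would come from linker-script symbols, and the names used in the comments (`_sidata`, `_sdata`, `_edata`, `_sbss`, `_ebss`) are a common convention assumed here, not something this project has defined yet. `irq_vector_offset` is likewise a hypothetical helper showing how the table offsets are derived.

```c
#include <stdint.h>

/* Byte offset of an external interrupt's slot in the vector table.
   The first 16 entries (0x00-0x3C) hold the initial SP and the
   system exceptions, so IRQ0 starts at 0x40. */
uint32_t irq_vector_offset(uint32_t irqn)
{
    return (16u + irqn) * 4u;
}

/* Reset-time memory setup.
   data_load            : Flash image of .data   (assumed symbol: _sidata)
   data_start, data_end : .data location in RAM  (_sdata, _edata)
   bss_start, bss_end   : .bss location in RAM   (_sbss, _ebss) */
void startup_init_memory(const uint32_t *data_load,
                         uint32_t *data_start, uint32_t *data_end,
                         uint32_t *bss_start, uint32_t *bss_end)
{
    /* 4a: copy initialized variables from their Flash image into RAM. */
    for (uint32_t *dst = data_start; dst < data_end; ) {
        *dst++ = *data_load++;
    }

    /* 4b: zero the uninitialized variables, as the C standard requires. */
    for (uint32_t *dst = bss_start; dst < bss_end; ) {
        *dst++ = 0u;
    }
}
```

A `Reset_Handler` would call `startup_init_memory` with the linker symbols, then `SystemInit()` if present, then `main()`. `irq_vector_offset(0)` returns 0x40, matching the WWDG entry in the table.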

GPIO Configuration

STM32 GPIO is highly configurable compared to AVR:

┌─────────────────────────────────────────────────────────────────────────────┐
│                      STM32 GPIO CONFIGURATION                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  For EACH pin, you configure:                                               │
│                                                                             │
│  1. MODE (MODER register) - 2 bits per pin                                 │
│     ├── 00: Input                                                          │
│     ├── 01: General purpose output                                         │
│     ├── 10: Alternate function (UART, SPI, etc.)                          │
│     └── 11: Analog                                                         │
│                                                                             │
│  2. OUTPUT TYPE (OTYPER) - 1 bit per pin                                   │
│     ├── 0: Push-pull                                                       │
│     └── 1: Open-drain                                                      │
│                                                                             │
│  3. SPEED (OSPEEDR) - 2 bits per pin                                       │
│     ├── 00: Low speed (2 MHz)                                              │
│     ├── 01: Medium speed (25 MHz)                                          │
│     ├── 10: High speed (50 MHz)                                            │
│     └── 11: Very high speed (100 MHz)                                      │
│                                                                             │
│  4. PULL-UP/DOWN (PUPDR) - 2 bits per pin                                  │
│     ├── 00: No pull-up/down                                                │
│     ├── 01: Pull-up                                                        │
│     ├── 10: Pull-down                                                      │
│     └── 11: Reserved                                                       │
│                                                                             │
│  5. ALTERNATE FUNCTION (AFRL/AFRH) - 4 bits per pin                        │
│     └── 0-15: Selects which peripheral gets the pin                        │
│                                                                             │
│  ─────────────────────────────────────────────────────────────────────────  │
│                                                                             │
│  EXAMPLE: Configure PA5 (LED on Nucleo) as output                          │
│                                                                             │
│  // 1. Enable GPIOA clock                                                   │
│  RCC->AHB1ENR |= (1 << 0);  // GPIOAEN bit                                 │
│                                                                             │
│  // 2. Set PA5 as output (bits 11:10 = 01)                                 │
│  GPIOA->MODER &= ~(3 << 10);  // Clear bits                                │
│  GPIOA->MODER |= (1 << 10);   // Set output mode                           │
│                                                                             │
│  // 3. Set PA5 high                                                         │
│  GPIOA->ODR |= (1 << 5);                                                   │
│  // OR use BSRR for atomic set/reset:                                      │
│  GPIOA->BSRR = (1 << 5);      // Set (bits 0-15)                           │
│  GPIOA->BSRR = (1 << 21);     // Reset (bits 16-31)                        │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
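
The PA5 example generalizes to any pin with a little shift arithmetic. The following is a minimal sketch of driver helpers, not a finished driver: `gpio_regs_t`, `gpio_set_mode`, `gpio_set_af`, and `gpio_write` are names made up for illustration. Each helper takes a pointer to the register block, so on the target it would receive the port's base address, while on a desktop it can be pointed at an ordinary struct for testing.

```c
#include <assert.h>
#include <stdint.h>

/* Register block in memory order (illustrative type name). */
typedef struct {
    volatile uint32_t MODER, OTYPER, OSPEEDR, PUPDR, IDR, ODR, BSRR, LCKR;
    volatile uint32_t AFR[2];
} gpio_regs_t;

enum gpio_mode {
    GPIO_MODE_INPUT  = 0,
    GPIO_MODE_OUTPUT = 1,
    GPIO_MODE_ALT    = 2,
    GPIO_MODE_ANALOG = 3
};

/* MODER holds 2 bits per pin: clear the field, then set the new mode. */
void gpio_set_mode(gpio_regs_t *port, unsigned pin, enum gpio_mode mode)
{
    assert(pin < 16u);
    uint32_t shift = pin * 2u;
    uint32_t value = port->MODER;
    value &= ~(3u << shift);
    value |= (uint32_t)mode << shift;
    port->MODER = value;
}

/* AFR[0] covers pins 0-7 and AFR[1] pins 8-15, 4 bits per pin. */
void gpio_set_af(gpio_regs_t *port, unsigned pin, unsigned af)
{
    assert(pin < 16u && af < 16u);
    uint32_t shift = (pin % 8u) * 4u;
    volatile uint32_t *afr = &port->AFR[pin / 8u];
    *afr = (*afr & ~(0xFu << shift)) | ((uint32_t)af << shift);
}

/* BSRR: writing bit n sets the pin, writing bit n+16 resets it.
   A single store, so no read-modify-write race with interrupts. */
void gpio_write(gpio_regs_t *port, unsigned pin, int high)
{
    assert(pin < 16u);
    port->BSRR = high ? (1u << pin) : (1u << (pin + 16u));
}
```

Configuring PA5 as an output then becomes `gpio_set_mode(port_a, 5, GPIO_MODE_OUTPUT)`, where `port_a` stands for a pointer to GPIOA's registers. Preferring BSRR over `ODR |=` is a design choice: the latter is a read-modify-write sequence that an interrupt handler touching the same port can corrupt.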

NVIC and ARM Interrupt Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                    NVIC INTERRUPT HANDLING                                  │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  EXCEPTION ENTRY (Hardware-automated):                                      │
│                                                                             │
│  When interrupt occurs, CPU automatically:                                  │
│  1. Pushes 8 registers to stack (xPSR, PC, LR, R12, R3-R0)                │
│  2. Loads PC from vector table                                             │
│  3. Loads LR with special EXC_RETURN value                                 │
│  4. Enters handler mode                                                     │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │               STACK FRAME (pushed by hardware)                       │   │
│  │                                                                      │   │
│  │   Higher addresses                                                   │   │
│  │   ┌──────────────┐                                                  │   │
│  │   │    xPSR      │ ← Original status register                       │   │
│  │   ├──────────────┤                                                  │   │
│  │   │    PC        │ ← Return address                                 │   │
│  │   ├──────────────┤                                                  │   │
│  │   │    LR        │ ← Original link register                         │   │
│  │   ├──────────────┤                                                  │   │
│  │   │    R12       │                                                  │   │
│  │   ├──────────────┤                                                  │   │
│  │   │    R3        │                                                  │   │
│  │   ├──────────────┤                                                  │   │
│  │   │    R2        │                                                  │   │
│  │   ├──────────────┤                                                  │   │
│  │   │    R1        │                                                  │   │
│  │   ├──────────────┤                                                  │   │
│  │   │    R0        │ ← SP after stacking                              │   │
│  │   └──────────────┘                                                  │   │
│  │   Lower addresses                                                    │   │
│  │                                                                      │   │
│  │   EXC_RETURN values (in LR):                                        │   │
│  │   • 0xFFFFFFF1: Return to Handler mode, use MSP                     │   │
│  │   • 0xFFFFFFF9: Return to Thread mode, use MSP                      │   │
│  │   • 0xFFFFFFFD: Return to Thread mode, use PSP                      │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
│  PRIORITY CONFIGURATION:                                                    │
│                                                                             │
│  STM32F4 uses 4 priority bits (16 levels: 0-15, lower = higher priority)  │
│                                                                             │
│  NVIC_SetPriority(USART1_IRQn, 5);  // Priority 5                          │
│  NVIC_EnableIRQ(USART1_IRQn);       // Enable interrupt                    │
│                                                                             │
│  PREEMPTION vs SUB-PRIORITY:                                                │
│  Configure with NVIC_SetPriorityGrouping()                                 │
│  • Preemption priority: Can interrupt lower-priority handlers             │
│  • Sub-priority: Determines order when same preemption priority           │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
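
Without CMSIS helper functions, enabling an interrupt and setting its priority come down to index and shift calculations on the NVIC registers. The sketch below shows that arithmetic under the assumption of 4 implemented priority bits, as stated above. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NVIC_PRIO_BITS 4u   /* STM32F4 implements the top 4 bits of each priority byte */

/* Each ISER word enables 32 interrupts: find the word and the bit. */
uint32_t nvic_iser_index(uint32_t irqn) { return irqn / 32u; }
uint32_t nvic_iser_mask(uint32_t irqn)  { return 1u << (irqn % 32u); }

/* Value for the IRQ's 8-bit priority register: the level occupies the
   upper bits, and the unimplemented low bits are written as zero. */
uint8_t nvic_ipr_value(uint32_t priority)
{
    assert(priority < (1u << NVIC_PRIO_BITS));
    return (uint8_t)(priority << (8u - NVIC_PRIO_BITS));
}

/* Combine preemption and sub-priority into one 4-bit level.
   sub_bits is how many of the 4 bits the grouping gives to sub-priority. */
uint32_t nvic_encode_priority(uint32_t sub_bits, uint32_t preempt, uint32_t sub)
{
    assert(sub_bits <= NVIC_PRIO_BITS);
    assert(preempt < (1u << (NVIC_PRIO_BITS - sub_bits)));
    assert(sub < (1u << sub_bits));
    return (preempt << sub_bits) | sub;
}
```

For IRQ number 37, for instance, the enable bit is bit 5 of the second ISER word, and priority 5 is stored as 0x50. With two bits assigned to sub-priority, preemption level 3 and sub-priority 1 combine to level 13.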

DMA (Direct Memory Access)

DMA allows peripherals to transfer data without CPU involvement:

┌─────────────────────────────────────────────────────────────────────────────┐
│                           DMA OPERATION                                     │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  WITHOUT DMA (Polling or Interrupt):                                        │
│  ────────────────────────────────────                                       │
│                                                                             │
│  for (i = 0; i < 1000; i++) {                                              │
│      while (!(USART1->SR & USART_SR_RXNE));  // Wait                       │
│      buffer[i] = USART1->DR;                  // CPU reads each byte       │
│  }                                                                          │
│  // CPU is busy during entire transfer!                                     │
│                                                                             │
│  ─────────────────────────────────────────────────────────────────────────  │
│                                                                             │
│  WITH DMA:                                                                  │
│  ──────────                                                                 │
│                                                                             │
│  // Configure DMA once, then forget it!                                     │
│  DMA2_Stream5->PAR  = (uint32_t)&USART1->DR;  // Peripheral address         │
│  DMA2_Stream5->M0AR = (uint32_t)buffer;       // Memory address             │
│  DMA2_Stream5->NDTR = 1000;                   // Number of items            │
│  DMA2_Stream5->CR   = DMA_SxCR_EN | ...;      // Start!                     │
│  // CPU is FREE to do other work                                           │
│  // Interrupt fires when complete                                           │
│                                                                             │
│  ┌───────────────────────────────────────────────────────────────────────┐ │
│  │                    DMA BLOCK DIAGRAM                                   │ │
│  │                                                                        │ │
│  │   ┌──────────────┐      ┌──────────────┐      ┌──────────────┐       │ │
│  │   │    Memory    │◄────►│     DMA      │◄────►│  Peripheral  │       │ │
│  │   │    (SRAM)    │      │  Controller  │      │   (USART)    │       │ │
│  │   └──────────────┘      └──────────────┘      └──────────────┘       │ │
│  │          ▲                     ▲                     │                │ │
│  │          │                     │                     │                │ │
│  │          │              Bus Arbiter                  │                │ │
│  │          │                     ▲                     │                │ │
│  │          │                     │                     ▼                │ │
│  │          │              ┌──────────────┐      DMA Request            │ │
│  │          │              │     CPU      │      (hardware trigger)     │ │
│  │          │              │   (free!)    │                              │ │
│  │          │              └──────────────┘                              │ │
│  │          │                                                            │ │
│  │          └─────────── DMA writes directly to memory ─────────────────│ │
│  │                                                                        │ │
│  └───────────────────────────────────────────────────────────────────────┘ │
│                                                                             │
│  DMA CONFIGURATION PARAMETERS:                                              │
│  • Direction: Peripheral→Memory, Memory→Peripheral, Memory→Memory         │
│  • Mode: Normal (one-shot) or Circular (auto-restart)                      │
│  • Increment: Increment memory address, peripheral address, or both       │
│  • Data size: Byte (8-bit), Half-word (16-bit), Word (32-bit)            │
│  • Priority: Low, Medium, High, Very High                                  │
│  • FIFO: Optional buffering for burst transfers                           │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
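
The configuration parameters listed above map onto bit fields of the stream's CR register, and circular mode needs a small amount of bookkeeping on the CPU side. The sketch below is one possible arrangement under stated assumptions: the `SXCR_*` macros and function names are local to this example (CMSIS uses `DMA_SxCR_*` names), and channel 4 is assumed because that is where the STM32F4 request mapping places USART1_RX on DMA2 streams 2 and 5.

```c
#include <stdint.h>

/* DMA stream CR bit positions (local names for this sketch). */
#define SXCR_EN        (1u << 0)    /* stream enable                 */
#define SXCR_TCIE      (1u << 4)    /* transfer-complete interrupt   */
#define SXCR_DIR_P2M   (0u << 6)    /* direction: peripheral->memory */
#define SXCR_CIRC      (1u << 8)    /* circular mode                 */
#define SXCR_MINC      (1u << 10)   /* increment memory address      */
#define SXCR_CHSEL(n)  ((uint32_t)(n) << 25)

/* CR value for byte-wide UART reception into a circular buffer.
   EN is left clear: addresses and NDTR must be written first. */
uint32_t dma_cr_uart_rx_circular(uint32_t channel)
{
    return SXCR_CHSEL(channel) | SXCR_DIR_P2M | SXCR_CIRC
         | SXCR_MINC | SXCR_TCIE;
}

/* NDTR counts down from buf_size and reloads in circular mode,
   so the DMA's next write index is buf_size - NDTR. */
uint32_t dma_rx_head(uint32_t buf_size, uint32_t ndtr)
{
    return (buf_size - ndtr) % buf_size;
}

/* Unread bytes between the consumer's tail and the DMA's head. */
uint32_t dma_rx_available(uint32_t buf_size, uint32_t head, uint32_t tail)
{
    return (head + buf_size - tail) % buf_size;
}
```

On the target, the stream would be started by writing `dma_cr_uart_rx_circular(4) | SXCR_EN` to CR once PAR, M0AR, and NDTR are set, and the USART must also have its DMA receive request enabled. The main loop can then poll `dma_rx_head` against its own tail index instead of taking an interrupt per byte.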

2.2 Why This Matters

Industry Relevance:

  • STM32 is one of the most widely used 32-bit microcontroller families
  • Used in: drones, medical devices, automotive ECUs, industrial automation
  • Cortex-M cores ship in products from companies such as Tesla, Apple, and DJI
  • ARM Cortex-M or STM32 experience is a common requirement in embedded job postings

Skills Gained:

  • Directly applicable to any Cortex-M chip (NXP, Nordic, TI, etc.)
  • Foundation for RTOS development (FreeRTOS, Zephyr run on Cortex-M)
  • Understanding of HAL libraries (you’ll know what’s underneath)
  • Debugging skills transfer to any embedded platform

2.3 Historical Context

ARM’s Evolution:

  • 1983: Acorn RISC Machine at Cambridge
  • 1990: ARM Ltd spun off; licensing model begins
  • 2004: Cortex-M3 announced, the first core of the Cortex family (A and R profiles follow)
  • 2007: Cortex-M3 revolutionizes MCUs
  • 2010: Cortex-M4 adds DSP/FPU
  • Today: Cortex-M55/M85 with ML acceleration

STM32’s Rise:

  • 2007: ST launches first STM32 (Cortex-M3)
  • Aggressive pricing, excellent documentation
  • Free HAL/LL libraries accelerated adoption
  • 2024: 1000+ STM32 variants, 10 billion+ shipped

2.4 Common Misconceptions

Misconception                 Reality
“HAL is always bad”           HAL is fine for prototyping; bare metal for production optimization
“Cortex-M has no modes”       It has Thread and Handler mode, just not user/supervisor like Cortex-A
“DMA is complicated”          DMA is simpler than interrupt-driven I/O once configured
“Clock tree is optional”      Wrong! Incorrect clocks = nothing works or peripherals misbehave
“ARM assembly is required”    99% of bare metal is C; assembly only for startup/context switch

3. Project Specification

3.1 What You Will Build

A complete bare metal firmware for STM32F4 that demonstrates:

  1. Custom startup code with vector table in C
  2. Clock configuration to run at 168 MHz from 8 MHz HSE
  3. GPIO driver to control LEDs and read buttons
  4. UART driver for serial communication at 115200 baud
  5. Timer driver with interrupt-driven timing
  6. DMA-based UART for efficient data transfer
  7. A simple command shell tying everything together

3.2 Functional Requirements

ID Requirement
FR1 System boots at 168 MHz using PLL from external crystal
FR2 User LED (PA5) blinks at 1 Hz using timer interrupt
FR3 UART1 (PA9/PA10) operates at 115200 baud, 8N1
FR4 UART RX uses DMA in circular mode for buffer filling
FR5 Serial shell accepts commands: “led on/off”, “status”, “help”
FR6 Button (PC13) toggles LED on press with debouncing
FR7 SysTick provides 1ms system tick for timing functions
FR8 All code compiles with -Wall -Werror, no warnings
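
Several of these requirements reduce to register values that can be worked out ahead of time. The sketch below computes the USART baud divisor (FR3), the SysTick reload (FR7), and a timer prescaler check (FR2). It assumes a bus setup that this document has not fixed yet: HCLK = 168 MHz, APB2 divided by 2 (PCLK2 = 84 MHz), APB1 divided by 4 (PCLK1 = 42 MHz), and 16x oversampling on the USART. The function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* USART_BRR for 16x oversampling. The register packs the mantissa and
   4-bit fraction of pclk / (16 * baud), which equals round(pclk / baud). */
uint32_t usart_brr(uint32_t pclk_hz, uint32_t baud)
{
    assert(baud != 0u);
    return (pclk_hz + baud / 2u) / baud;
}

/* SysTick reload for a given tick rate; the counter is 24 bits wide. */
uint32_t systick_reload(uint32_t core_hz, uint32_t tick_hz)
{
    assert(tick_hz != 0u);
    uint32_t reload = core_hz / tick_hz - 1u;
    assert(reload <= 0x00FFFFFFu);
    return reload;
}

/* Timer input clock: PCLK when the APB prescaler is 1, else 2 x PCLK. */
uint32_t timer_clock_hz(uint32_t pclk_hz, uint32_t apb_prescaler)
{
    return (apb_prescaler == 1u) ? pclk_hz : pclk_hz * 2u;
}

/* Update (overflow) rate for a prescaler and auto-reload value. */
uint32_t timer_update_hz(uint32_t tim_clk_hz, uint32_t psc, uint32_t arr)
{
    return tim_clk_hz / ((psc + 1u) * (arr + 1u));
}
```

Under those assumptions, `usart_brr(84000000, 115200)` is 0x2D9, `systick_reload(168000000, 1000)` is 167999, and TIM2 runs from an 84 MHz clock, so a prescaler of 8399 with an auto-reload of 9999 produces one update per second.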

3.3 Non-Functional Requirements

ID Requirement
NFR1 Binary size < 16 KB (leaves room for growth)
NFR2 No external dependencies (no HAL, no CMSIS beyond headers)
NFR3 Interrupt latency < 50 cycles for UART/Timer
NFR4 Zero busy-waiting in main loop (all interrupt-driven)
NFR5 Clean separation between drivers and application logic

3.4 Example Usage / Output

# Build and flash
$ make
arm-none-eabi-gcc -mcpu=cortex-m4 -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16 \
    -Os -Wall -Werror -c main.c -o main.o
arm-none-eabi-gcc -mcpu=cortex-m4 -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16 \
    -T stm32f4.ld -nostartfiles \
    -o firmware.elf startup.o system.o main.o gpio.o uart.o timer.o dma.o systick.o shell.o
arm-none-eabi-objcopy -O binary firmware.elf firmware.bin

$ ls -la firmware.bin
-rw-r--r-- 1 user user 8192 Jan 15 10:30 firmware.bin

$ st-flash write firmware.bin 0x8000000
st-flash 1.7.0
2024-01-15T10:30:45 INFO usb.c: Found 1 stlink programmers
2024-01-15T10:30:45 INFO flash.c: Starting Flash write
2024-01-15T10:30:46 INFO flash.c: Flash written and target reset

# Serial terminal (115200 baud)
$ screen /dev/ttyACM0 115200

================================================================================
                    STM32F4 Bare Metal Demo v1.0
================================================================================
System Clock: 168 MHz (HSE + PLL)
UART1: 115200 baud, DMA-enabled RX
Timer2: 1 Hz LED blink interrupt

Initialization complete.

> help
Available commands:
  help     - Show this help
  status   - Show system status
  led on   - Turn LED on
  led off  - Turn LED off
  blink    - Toggle automatic blinking

> status
System Status:
  Uptime: 42 seconds
  LED: ON (blinking)
  Button presses: 3
  UART RX bytes: 128 (DMA)
  Timer ticks: 42

> led off
LED disabled (manual mode)

> led on
LED enabled (blinking resumed)

> [Button pressed - LED toggled]

3.5 Real World Outcome

After completing this project, you will have:

  1. Production-quality startup code reusable for any STM32F4 project
  2. Driver library for GPIO, UART, Timer, DMA that you fully understand
  3. Debugging skills for ARM Cortex-M using OpenOCD and GDB
  4. Foundation for RTOS - context switching builds on this
  5. Portfolio piece demonstrating deep embedded understanding
  6. Interview readiness for embedded systems positions

4. Solution Architecture

4.1 High-Level Design

┌─────────────────────────────────────────────────────────────────────────────┐
│                         FIRMWARE ARCHITECTURE                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌────────────────────────────────────────────────────────────────────────┐│
│  │                        APPLICATION LAYER                                ││
│  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────────────────────┐  ││
│  │  │    Shell     │  │  LED Blink   │  │     Button Handler           │  ││
│  │  │  (commands)  │  │   (1 Hz)     │  │     (debounced)              │  ││
│  │  └──────────────┘  └──────────────┘  └──────────────────────────────┘  ││
│  └────────────────────────────────────────────────────────────────────────┘│
│                                    │                                        │
│                                    ▼                                        │
│  ┌────────────────────────────────────────────────────────────────────────┐│
│  │                         DRIVER LAYER (HAL-like)                        ││
│  │  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────┐ ││
│  │  │  GPIO    │ │  UART    │ │  Timer   │ │   DMA    │ │   SysTick    │ ││
│  │  │  Driver  │ │  Driver  │ │  Driver  │ │  Driver  │ │   Driver     │ ││
│  │  └──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────────┘ ││
│  └────────────────────────────────────────────────────────────────────────┘│
│                                    │                                        │
│                                    ▼                                        │
│  ┌────────────────────────────────────────────────────────────────────────┐│
│  │                      LOW-LEVEL REGISTER ACCESS                         ││
│  │  ┌─────────────────────────────────────────────────────────────────┐   ││
│  │  │  stm32f4xx.h - Register definitions and memory-mapped structs   │   ││
│  │  └─────────────────────────────────────────────────────────────────┘   ││
│  └────────────────────────────────────────────────────────────────────────┘│
│                                    │                                        │
│                                    ▼                                        │
│  ┌────────────────────────────────────────────────────────────────────────┐│
│  │                         STARTUP & SYSTEM                               ││
│  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────────────────────┐  ││
│  │  │  Vector      │  │  Startup     │  │       System Init            │  ││
│  │  │  Table       │  │  Code        │  │   (clock config)             │  ││
│  │  └──────────────┘  └──────────────┘  └──────────────────────────────┘  ││
│  └────────────────────────────────────────────────────────────────────────┘│
│                                    │                                        │
│                                    ▼                                        │
│  ┌────────────────────────────────────────────────────────────────────────┐│
│  │                           HARDWARE                                      ││
│  │  ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐          ││
│  │  │  GPIO   │ │ USART1  │ │  TIM2   │ │  DMA2   │ │  NVIC   │          ││
│  │  │  A/B/C  │ │         │ │         │ │ Stream5 │ │         │          ││
│  │  └─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘          ││
│  └────────────────────────────────────────────────────────────────────────┘│
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

4.2 Key Components

Component File Responsibility
Vector Table startup.c Exception handlers, initial SP
Startup Code startup.c Copy .data, zero .bss, call main
System Init system.c Clock configuration (168 MHz)
GPIO Driver gpio.c/h Pin configuration, read/write
UART Driver uart.c/h Init, TX, RX (polling and DMA)
Timer Driver timer.c/h TIM2 configuration, 1 Hz ISR
DMA Driver dma.c/h UART RX DMA setup and handling
SysTick systick.c/h 1ms tick, delay functions
Shell shell.c/h Command parsing and execution
Main main.c Application logic
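
The SysTick row above implies two small pieces of arithmetic: the reload value for a 1 ms tick and a timeout check that survives counter wrap-around. The following is a minimal host-runnable sketch; `systick_reload` and `systick_elapsed` are hypothetical helper names, and on hardware the reload value would be written to the SysTick LOAD register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SYSTICK_MAX_RELOAD 0x00FFFFFFu   // SysTick LOAD is a 24-bit register

// SysTick counts from the reload value down to 0, so one period is
// (reload + 1) core cycles.
uint32_t systick_reload(uint32_t core_hz, uint32_t tick_hz) {
    assert(tick_hz != 0u);
    uint32_t cycles = core_hz / tick_hz;
    assert(cycles >= 1u && (cycles - 1u) <= SYSTICK_MAX_RELOAD);
    return cycles - 1u;
}

// Wrap-safe timeout check on a free-running millisecond counter.
// Unsigned subtraction gives the right difference even after the
// counter rolls over from 0xFFFFFFFF to 0.
bool systick_elapsed(uint32_t now_ms, uint32_t start_ms, uint32_t timeout_ms) {
    return (uint32_t)(now_ms - start_ms) >= timeout_ms;
}
```

At 168 MHz a 1 ms tick needs a reload of 167999, which fits comfortably in 24 bits.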

4.3 Data Structures

// GPIO Pin definition
typedef struct {
    GPIO_TypeDef *port;     // GPIOA, GPIOB, etc.
    uint8_t pin;            // 0-15
    uint8_t mode;           // Input, Output, AF, Analog
    uint8_t af;             // Alternate function number
    uint8_t pull;           // None, Up, Down
    uint8_t speed;          // Low, Medium, High, Very High
} gpio_pin_t;

// UART configuration
typedef struct {
    USART_TypeDef *usart;   // USART1, USART2, etc.
    uint32_t baud;          // 9600, 115200, etc.
    gpio_pin_t tx_pin;
    gpio_pin_t rx_pin;
    uint8_t *dma_buffer;    // For DMA RX
    uint16_t dma_size;
} uart_config_t;

// Ring buffer for UART RX
typedef struct {
    uint8_t *buffer;
    uint16_t size;
    volatile uint16_t head;
    volatile uint16_t tail;
} ring_buffer_t;

// Command structure for shell
typedef struct {
    const char *name;
    void (*handler)(int argc, char *argv[]);
    const char *help;
} shell_command_t;
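
One way the ring_buffer_t above could be driven is sketched here, assuming a single producer (an ISR) and a single consumer (the main loop). The helper names `rb_init`, `rb_push`, `rb_pop`, and `rb_count` are hypothetical, and the typedef is repeated only so the sketch compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Same layout as ring_buffer_t above. One slot is always left empty so
// that head == tail unambiguously means "empty".
typedef struct {
    uint8_t *buffer;
    uint16_t size;
    volatile uint16_t head;   // written only by the producer (e.g. an ISR)
    volatile uint16_t tail;   // written only by the consumer (main loop)
} ring_buffer_t;

// Bind storage to the ring buffer and mark it empty.
void rb_init(ring_buffer_t *rb, uint8_t *storage, uint16_t size) {
    assert(size >= 2u);       // one slot is never used
    rb->buffer = storage;
    rb->size = size;
    rb->head = 0;
    rb->tail = 0;
}

// Producer side: returns false (byte dropped) when the buffer is full.
bool rb_push(ring_buffer_t *rb, uint8_t byte) {
    uint16_t next = (uint16_t)((rb->head + 1u) % rb->size);
    if (next == rb->tail) {
        return false;
    }
    rb->buffer[rb->head] = byte;
    rb->head = next;          // publish only after the data is stored
    return true;
}

// Consumer side: returns false when there is nothing to read.
bool rb_pop(ring_buffer_t *rb, uint8_t *out) {
    if (rb->head == rb->tail) {
        return false;
    }
    *out = rb->buffer[rb->tail];
    rb->tail = (uint16_t)((rb->tail + 1u) % rb->size);
    return true;
}

// Number of bytes currently stored.
uint16_t rb_count(const ring_buffer_t *rb) {
    return (uint16_t)((rb->head + rb->size - rb->tail) % rb->size);
}
```

Because each index has exactly one writer, this layout needs no interrupt masking in the single-producer, single-consumer case; that property is lost as soon as two contexts push.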

4.4 Algorithm Overview

Clock Configuration Algorithm:

1. Enable HSE oscillator
2. Wait for HSE ready (HSERDY)
3. Configure PLL:
   - PLLM = 8 (HSE/8 = 1 MHz)
   - PLLN = 336 (1 MHz * 336 = 336 MHz VCO)
   - PLLP = 2 (336/2 = 168 MHz SYSCLK)
   - PLLQ = 7 (for USB, 336/7 = 48 MHz)
4. Enable PLL
5. Wait for PLL ready (PLLRDY)
6. Configure Flash latency (5 wait states for 168 MHz)
7. Configure AHB, APB1, APB2 prescalers (APB1 must stay ≤ 42 MHz, APB2 ≤ 84 MHz)
8. Select PLL as system clock (SW = PLL)
9. Wait for clock switch complete (SWS = PLL)
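
The PLL arithmetic in step 3 can be checked on a PC before touching hardware. The sketch below is an assumption-laden helper, not part of any ST library: `pll_config_t`, `pll_compute`, and `pll_p_field` are hypothetical names, and the limits coded in (1-2 MHz VCO input, 432 MHz VCO output, 168 MHz SYSCLK) are the STM32F405/407 values and should be confirmed against the reference manual for your part.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical container for the PLL factors from the steps above.
typedef struct {
    uint32_t m;   // PLLM: divides the input clock (2..63)
    uint32_t n;   // PLLN: multiplies the VCO input (up to 432)
    uint32_t p;   // PLLP: divides the VCO output for SYSCLK (2, 4, 6 or 8)
    uint32_t q;   // PLLQ: divides the VCO output for the 48 MHz domain (2..15)
} pll_config_t;

typedef struct {
    uint32_t vco_in_hz;
    uint32_t vco_out_hz;
    uint32_t sysclk_hz;
    uint32_t pll48_hz;
} pll_result_t;

// Computes the PLL frequencies; returns false if a factor or an
// intermediate frequency is outside the assumed limits.
bool pll_compute(uint32_t src_hz, const pll_config_t *cfg, pll_result_t *out) {
    assert(cfg && out);
    if (cfg->m < 2u || cfg->m > 63u) return false;
    if (cfg->p != 2u && cfg->p != 4u && cfg->p != 6u && cfg->p != 8u) return false;
    if (cfg->q < 2u || cfg->q > 15u) return false;
    if (cfg->n > 432u) return false;   // also keeps the multiply below in range

    out->vco_in_hz = src_hz / cfg->m;
    if (out->vco_in_hz < 1000000u || out->vco_in_hz > 2000000u) return false;

    out->vco_out_hz = out->vco_in_hz * cfg->n;
    if (out->vco_out_hz > 432000000u) return false;

    out->sysclk_hz = out->vco_out_hz / cfg->p;
    out->pll48_hz = out->vco_out_hz / cfg->q;
    return out->sysclk_hz <= 168000000u;
}

// PLLP is stored in the register as (P / 2) - 1, so P = 2 is written as 0.
uint32_t pll_p_field(uint32_t p) {
    return p / 2u - 1u;
}
```

With an 8 MHz source and M = 8, N = 336, P = 2, Q = 7 this yields 1 MHz, 336 MHz, 168 MHz, and 48 MHz, matching the figures in step 3.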

DMA-Based UART RX:

1. Configure DMA stream for USART1_RX (DMA2 Stream 5, Channel 4)
2. Set peripheral address = &USART1->DR
3. Set memory address = rx_buffer
4. Set buffer size, circular mode
5. Enable DMA stream
6. Enable USART DMA receiver
7. DMA fills buffer automatically
8. Application reads NDTR to find the DMA write position (NDTR counts down, so position = buffer size - NDTR)
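
The last step is where most of the logic lives: NDTR counts down from the buffer size and reloads in circular mode, so the application has to convert it into a write index and remember how far it has already read. A minimal host-runnable sketch follows; `dma_rx_reader_t`, `dma_rx_head`, and `dma_rx_read` are hypothetical names, and the `ndtr` argument stands in for a read of the stream's NDTR register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical reader state: where the application stopped reading.
typedef struct {
    const uint8_t *buf;   // buffer the DMA stream writes into
    uint16_t size;        // length programmed into NDTR
    uint16_t tail;        // next index the application will read
} dma_rx_reader_t;

// NDTR holds the number of items still to transfer, so the DMA write
// position is (size - NDTR), wrapping to 0 when NDTR has just reloaded.
uint16_t dma_rx_head(uint16_t size, uint16_t ndtr) {
    assert(size > 0u && ndtr <= size);
    return (uint16_t)((size - ndtr) % size);
}

// Copies the bytes that arrived since the last call into out and
// returns how many were copied (at most max).
size_t dma_rx_read(dma_rx_reader_t *r, uint16_t ndtr, uint8_t *out, size_t max) {
    uint16_t head = dma_rx_head(r->size, ndtr);
    size_t n = 0;
    while (r->tail != head && n < max) {
        out[n++] = r->buf[r->tail];
        r->tail = (uint16_t)((r->tail + 1u) % r->size);
    }
    return n;
}
```

This scheme cannot detect an overrun: if the DMA laps the reader, old data is silently overwritten, so the buffer has to be sized for the longest interval between reads.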

5. Implementation Guide

5.1 Development Environment Setup

# macOS
brew install arm-none-eabi-gcc openocd stlink

# Ubuntu/Debian
sudo apt install gcc-arm-none-eabi gdb-multiarch openocd stlink-tools

# Windows
# Download from ARM developer website or use WSL

# Verify installation
arm-none-eabi-gcc --version
# arm-none-eabi-gcc (GNU Arm Embedded Toolchain 10.3-2021.10) 10.3.1

openocd --version
# Open On-Chip Debugger 0.11.0

st-info --probe
# Found 1 stlink programmers

5.2 Project Structure

stm32f4_bare_metal/
├── Makefile                 # Build system
├── stm32f4.ld              # Linker script
├── inc/                     # Header files
│   ├── stm32f4xx.h         # Register definitions
│   ├── gpio.h
│   ├── uart.h
│   ├── timer.h
│   ├── dma.h
│   ├── systick.h
│   └── shell.h
├── src/                     # Source files
│   ├── startup.c           # Vector table + startup
│   ├── system.c            # Clock configuration
│   ├── gpio.c
│   ├── uart.c
│   ├── timer.c
│   ├── dma.c
│   ├── systick.c
│   ├── shell.c
│   └── main.c
├── openocd.cfg             # OpenOCD configuration
└── README.md

5.3 The Core Question You’re Answering

“How does a bare metal ARM Cortex-M system boot, configure its hardware, and handle I/O without any OS or libraries?”

You’ll answer this by:

  1. Writing startup code that the CPU executes first
  2. Configuring the clock tree for maximum performance
  3. Implementing drivers that talk directly to hardware registers
  4. Using interrupts and DMA for efficient, non-blocking I/O

5.4 Concepts You Must Understand First

Before implementing, verify you understand:

Concept Self-Assessment Question Reference
Memory-mapped I/O Why does volatile matter when accessing hardware registers? Cortex-M Guide Ch. 4
Interrupt handling What happens to the stack when an interrupt fires on Cortex-M? Cortex-M Guide Ch. 8
Linker scripts What are .text, .data, .bss, and why must .data be copied? “Linkers and Loaders” Ch. 3
Clock domains Why do peripherals need their clocks enabled separately? STM32F4 Reference Manual
Bit manipulation How do you set bit 5 without affecting other bits? Any C book
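
For the bit manipulation row, the usual idioms are short enough to write out. This is a sketch on plain integers so it runs on a host; the helper names are hypothetical, and on hardware the same expressions would be applied to a volatile register such as GPIOA->MODER.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Single-bit operations: only the selected bit changes.
uint32_t bit_set(uint32_t reg, unsigned bit) {
    assert(bit < 32u);
    return reg | (1u << bit);
}

uint32_t bit_clear(uint32_t reg, unsigned bit) {
    assert(bit < 32u);
    return reg & ~(1u << bit);
}

uint32_t bit_toggle(uint32_t reg, unsigned bit) {
    assert(bit < 32u);
    return reg ^ (1u << bit);
}

bool bit_test(uint32_t reg, unsigned bit) {
    assert(bit < 32u);
    return (reg & (1u << bit)) != 0u;
}

// Multi-bit field write: clear the field with a mask, then OR in the
// new value. Needed for registers like MODER, which uses 2 bits per pin.
uint32_t field_write(uint32_t reg, unsigned pos, unsigned width, uint32_t value) {
    assert(width >= 1u && width < 32u && pos + width <= 32u);
    uint32_t mask = ((1u << width) - 1u) << pos;
    return (reg & ~mask) | ((value << pos) & mask);
}
```

For example, selecting output mode (01) for pin 5 is `field_write(moder, 10, 2, 1)`, which leaves the other fifteen pin fields as they were.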

5.5 Questions to Guide Your Design

Startup & Memory:

  • Where does the stack start and why?
  • How does the CPU know where to jump after reset?
  • Why can’t .data just stay in Flash?

Clock Configuration:

  • What happens if you enable a peripheral before its clock?
  • Why does Flash need wait states at higher frequencies?
  • How do you verify the clock is running correctly?

Peripheral Drivers:

  • Should drivers be blocking or non-blocking?
  • How do you prevent race conditions between main and ISR?
  • Where should hardware-specific constants live?

DMA:

  • When is DMA better than interrupt-driven I/O?
  • How do you know when DMA has transferred data?
  • What happens if DMA and CPU access memory simultaneously?

5.6 Thinking Exercise

Before writing any code, trace through this sequence manually:

SCENARIO: Button pressed while LED blinking

1. Main loop is in shell_process() waiting for input
2. TIM2 interrupt fires (1 Hz tick)
   - What registers are pushed to stack?
   - What is in LR?
3. In TIM2_IRQHandler, EXTI interrupt fires (button pressed)
   - Can EXTI interrupt TIM2 handler?
   - What determines if preemption happens?
4. Both handlers complete
   - In what order?
5. Return to main loop
   - How does CPU restore context?

Draw the stack at each step. Verify with the Cortex-M reference manual.

5.7 Hints in Layers

Hint 1 - Starting Point (Conceptual Direction): Start with the linker script and startup code. Without these, nothing else works. Your linker script must define:

  • Flash origin and length
  • RAM origin and length
  • Sections: .isr_vector, .text, .rodata, .data, .bss
  • Symbols for startup code: _estack, _sdata, _edata, _sidata, _sbss, _ebss

Hint 2 - Next Level (More Specific): The vector table must be at the very beginning of Flash (0x08000000). In startup.c:

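extern uint32_t _estack;       // Top of stack, defined in the linker script
void Reset_Handler(void);

// "used" keeps the table from being discarded if nothing references it
__attribute__((section(".isr_vector"), used))
const uint32_t vector_table[] = {
    (uint32_t)&_estack,        // Initial stack pointer
    (uint32_t)Reset_Handler,   // Reset handler
    // ... other handlers
};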

The linker script places .isr_vector first, and the CPU reads the first two entries on reset.
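
Once the CPU has loaded the stack pointer and jumped to Reset_Handler, the handler's first job is to copy .data from Flash to RAM and zero .bss. The sketch below shows that memory setup as a host-runnable function; `startup_init_memory` is a hypothetical name, and the parameters stand in for the linker symbols named in Hint 1, which on hardware would be referenced directly as extern symbols.

```c
#include <assert.h>
#include <stdint.h>

// Copies initialized data from its load address (Flash) to its run
// address (RAM), then clears the zero-initialized region.
void startup_init_memory(const uint32_t *load_addr,   // _sidata
                         uint32_t *data_start,        // _sdata
                         uint32_t *data_end,          // _edata
                         uint32_t *bss_start,         // _sbss
                         uint32_t *bss_end)           // _ebss
{
    assert(data_start <= data_end && bss_start <= bss_end);

    const uint32_t *src = load_addr;
    for (uint32_t *dst = data_start; dst < data_end; ) {
        *dst++ = *src++;      // .data: initial values live in Flash
    }
    for (uint32_t *dst = bss_start; dst < bss_end; ) {
        *dst++ = 0u;          // .bss: C requires these to start at zero
    }
}
```

Word-wise copying assumes the linker script aligns the section boundaries to 4 bytes; if it does not, the loops would have to work byte by byte.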

Hint 3 - Technical Details (Approach/Pseudocode): Clock configuration sequence for 168 MHz:

void SystemInit(void) {
    // Enable HSE
    RCC->CR |= RCC_CR_HSEON;
    while (!(RCC->CR & RCC_CR_HSERDY));

    // Configure PLL: 8MHz / 8 * 336 / 2 = 168 MHz
    RCC->PLLCFGR = (8 << RCC_PLLCFGR_PLLM_Pos) |
                   (336 << RCC_PLLCFGR_PLLN_Pos) |
                   (0 << RCC_PLLCFGR_PLLP_Pos) |   // PLLP = 2
                   (7 << RCC_PLLCFGR_PLLQ_Pos) |
                   RCC_PLLCFGR_PLLSRC_HSE;

    // Enable PLL
    RCC->CR |= RCC_CR_PLLON;
    while (!(RCC->CR & RCC_CR_PLLRDY));

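    // Flash latency for 168 MHz (must be in place before the clock is raised)
    FLASH->ACR = FLASH_ACR_LATENCY_5WS | FLASH_ACR_PRFTEN | FLASH_ACR_ICEN;

    // Bus prescalers: AHB = 168 MHz, APB1 = 42 MHz (max), APB2 = 84 MHz (max)
    RCC->CFGR |= RCC_CFGR_PPRE1_DIV4 | RCC_CFGR_PPRE2_DIV2;

    // Switch to PLL (SW is 00 after reset, so OR-ing in the PLL value works)
    RCC->CFGR |= RCC_CFGR_SW_PLL;
    while ((RCC->CFGR & RCC_CFGR_SWS) != RCC_CFGR_SWS_PLL);
}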

Hint 4 - Tools/Debugging (Verification Methods): Verify clock configuration:

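// Output the PLL clock on MCO1 (PA8). MCO1 can select HSI, LSE, HSE, or PLL;
// SYSCLK itself is only available on MCO2 (PC9).
RCC->CFGR |= RCC_CFGR_MCO1;      // MCO1 = 0b11: PLL
RCC->CFGR |= RCC_CFGR_MCO1PRE;   // MCO1PRE = 0b111: divide by 5
// PA8 must be configured as alternate function AF0 at high speed
// Measure PA8 with an oscilloscope: expect 168 MHz / 5 = 33.6 MHz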

// Alternative: Toggle LED and measure frequency
while (1) {
    GPIOA->ODR ^= (1 << 5);
    // Known delay
}
// If LED toggles at expected rate, clock is correct

Debug with OpenOCD + GDB:

# Terminal 1
openocd -f interface/stlink.cfg -f target/stm32f4x.cfg

# Terminal 2
arm-none-eabi-gdb firmware.elf
(gdb) target remote localhost:3333
(gdb) monitor reset halt
(gdb) load
(gdb) break main
(gdb) continue

5.8 The Interview Questions They’ll Ask

  1. “Walk me through what happens from power-on to main() on an ARM Cortex-M.”
    • Expected: Vector table fetch, Reset_Handler, .data copy, .bss zero, SystemInit, main
  2. “How would you configure an STM32 GPIO pin for UART TX (alternate function)?”
    • Expected: Enable clock, set MODER to AF, set AFR to correct function, set speed
  3. “Explain the difference between NVIC priority and preemption.”
    • Expected: Priority groups, preemption bits, sub-priority, when nesting occurs
  4. “When would you use DMA instead of interrupt-driven I/O?”
    • Expected: High data rates, CPU-intensive tasks, streaming data, power savings
  5. “What is the EXC_RETURN value and why does it matter?”
    • Expected: Magic value in LR, determines return mode (Thread/Handler) and stack (MSP/PSP)
  6. “How do you handle a HardFault on Cortex-M?”
    • Expected: Stack frame analysis, CFSR register, implement handler that dumps info
  7. “Describe how you’d debug an STM32 that doesn’t boot.”
    • Expected: Check power, verify clock, use debugger to halt at reset, check vector table
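
For question 5, it can help to see EXC_RETURN as three flag bits rather than a set of magic numbers. The decoder below is a minimal sketch for ARMv7-M (Cortex-M3/M4); `exc_return_t` and `exc_return_decode` are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool returns_to_thread;   // bit 3: 1 = Thread mode, 0 = Handler mode
    bool uses_psp;            // bit 2: 1 = frame on PSP, 0 = frame on MSP
    bool basic_frame;         // bit 4: 1 = 8-word frame, 0 = extended frame with FPU state
} exc_return_t;

// Decodes the value the core places in LR on exception entry.
exc_return_t exc_return_decode(uint32_t lr) {
    assert((lr >> 28) == 0xFu);   // EXC_RETURN values always start with 0xF
    exc_return_t info;
    info.returns_to_thread = (lr & (1u << 3)) != 0u;
    info.uses_psp          = (lr & (1u << 2)) != 0u;
    info.basic_frame       = (lr & (1u << 4)) != 0u;
    return info;
}
```

The common values then read naturally: 0xFFFFFFF1 returns to Handler mode (a nested exception), 0xFFFFFFF9 returns to Thread mode on MSP, and 0xFFFFFFFD returns to Thread mode on PSP, which is the case an RTOS task would produce.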

5.9 Books That Will Help

Topic Book Chapters
ARM Cortex-M Architecture “The Definitive Guide to ARM Cortex-M3/M4” by Joseph Yiu Ch. 1-4, 7-8
Embedded Systems Design “Making Embedded Systems” by Elecia White Ch. 1-5
STM32 Specifics “Mastering STM32” by Carmine Noviello Ch. 1-10
Low-Level C “Write Great Code, Volume 1” by Randall Hyde Ch. 4-6
Linker Scripts “Linkers and Loaders” by John Levine Ch. 3

5.10 Implementation Phases

Phase 1: Boot & Blink (Week 1)

  • Create linker script
  • Write startup code with vector table
  • Implement Reset_Handler
  • Blink LED using polling delay

Phase 2: Clock & UART (Week 1-2)

  • Configure PLL for 168 MHz
  • Implement UART driver (polling TX/RX)
  • Get “Hello World” over serial
  • Add SysTick for proper delays

Phase 3: Interrupts & Timer (Week 2-3)

  • Set up NVIC for EXTI (button)
  • Configure TIM2 for 1 Hz interrupt
  • Implement LED blink in timer ISR
  • Add button debouncing
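
One common way to debounce the button is to sample it from the 1 ms tick and accept a change only after it has been stable for a fixed number of samples. The sketch below takes that approach; it is one option among several, `debounce_t`, `debounce_update`, and the 20-sample threshold are assumptions, and `raw_pressed` is expected to be already corrected for the button's polarity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEBOUNCE_TICKS 20u   // hypothetical: 20 consecutive 1 ms samples

typedef struct {
    bool stable;      // debounced state (true = pressed)
    uint8_t count;    // consecutive samples that disagree with stable
} debounce_t;

// Feed one raw sample per tick. Returns true exactly once per debounced
// press (the released-to-pressed edge), false otherwise.
bool debounce_update(debounce_t *d, bool raw_pressed) {
    assert(d);
    if (raw_pressed == d->stable) {
        d->count = 0;             // any agreeing sample resets the counter
        return false;
    }
    if (++d->count < DEBOUNCE_TICKS) {
        return false;             // not stable for long enough yet
    }
    d->stable = raw_pressed;      // accept the new state
    d->count = 0;
    return d->stable;             // report presses only, not releases
}
```

With this structure the EXTI interrupt is optional: polling the pin from the SysTick handler is enough, and it avoids an interrupt storm while the contacts bounce.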

Phase 4: DMA & Shell (Week 3-4)

  • Configure DMA for UART RX
  • Implement ring buffer
  • Create command parser
  • Add shell commands
  • Clean up and document
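
The command parser in Phase 4 can be as small as a tokenizer plus a table lookup over shell_command_t. The following is a host-runnable sketch under a few assumptions: `shell_tokenize`, `shell_dispatch`, `cmd_led`, `demo_led_state`, and `SHELL_MAX_ARGS` are hypothetical names, and `strcmp` from the C library is used for brevity, so a build that links no libc would need its own comparison routine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SHELL_MAX_ARGS 4   // hypothetical limit on tokens per line

typedef struct {
    const char *name;
    void (*handler)(int argc, char *argv[]);
    const char *help;
} shell_command_t;

// Splits line in place on spaces and tabs; returns the token count.
int shell_tokenize(char *line, char *argv[], int max_args) {
    int argc = 0;
    char *p = line;
    while (*p != '\0' && argc < max_args) {
        while (*p == ' ' || *p == '\t') p++;      // skip leading blanks
        if (*p == '\0') break;
        argv[argc++] = p;                         // token starts here
        while (*p != '\0' && *p != ' ' && *p != '\t') p++;
        if (*p != '\0') *p++ = '\0';              // terminate the token
    }
    return argc;
}

// Looks up argv[0] in the table and calls its handler.
// Returns 0 on success, -1 for an empty line or unknown command.
int shell_dispatch(const shell_command_t *table, size_t count, char *line) {
    char *argv[SHELL_MAX_ARGS];
    int argc = shell_tokenize(line, argv, SHELL_MAX_ARGS);
    assert(table != NULL);
    if (argc == 0) return -1;
    for (size_t i = 0; i < count; i++) {
        if (strcmp(argv[0], table[i].name) == 0) {
            table[i].handler(argc, argv);
            return 0;
        }
    }
    return -1;
}

// Demo handler: records the requested state instead of touching a GPIO.
int demo_led_state = 0;

void cmd_led(int argc, char *argv[]) {
    if (argc >= 2 && strcmp(argv[1], "on") == 0)  demo_led_state = 1;
    if (argc >= 2 && strcmp(argv[1], "off") == 0) demo_led_state = 0;
}

const shell_command_t demo_commands[] = {
    { "led", cmd_led, "led on|off - Turn LED on or off" },
};
```

Tokenizing in place avoids any allocation, at the cost of modifying the caller's line buffer, which is usually acceptable because the line is discarded after dispatch.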

5.11 Key Implementation Decisions

Decision Option A Option B Recommendation
Startup language C Assembly C (cleaner, portable)
Clock source HSI (internal) HSE (external) HSE (more accurate)
UART RX Polling DMA DMA (frees CPU)
Driver style Single file Header + Source H + C (cleaner)
Error handling Assert Return codes Return codes (production)

6. Testing Strategy

6.1 Unit Testing

Since we’re bare metal, “unit tests” run on hardware:

// test_gpio.c
void test_gpio_output(void) {
    gpio_pin_t led = {GPIOA, 5, GPIO_MODE_OUTPUT, 0, GPIO_PULL_NONE, GPIO_SPEED_LOW};
    gpio_init(&led);

    gpio_write(&led, 1);
    assert(gpio_read_output(&led) == 1);  // Via ODR

    gpio_write(&led, 0);
    assert(gpio_read_output(&led) == 0);

    uart_puts("GPIO test PASSED\r\n");
}

6.2 Integration Testing

Use serial output to verify behavior:

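void test_timer_interrupt(void) {
    // Align to a tick edge so the measurement starts on a timer boundary
    uint32_t first_tick = timer_tick_count;
    while (timer_tick_count == first_tick);

    uint32_t start_ms = systick_get_ms();
    uint32_t start_tick = timer_tick_count;

    // Timer should fire every 1000 ms; timer_tick_count must be volatile
    while ((timer_tick_count - start_tick) < 5);  // Wait for 5 ticks

    uint32_t elapsed = systick_get_ms() - start_ms;
    // Should be ~5000 ms (within 1%)
    assert(elapsed > 4950 && elapsed < 5050);

    uart_puts("Timer test PASSED\r\n");
}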

6.3 Hardware Verification

Test Method Expected
Clock frequency Oscilloscope on MCO1 (PA8) 168 MHz / MCO prescaler (e.g. 33.6 MHz at /5)
UART baud rate Oscilloscope on TX 8.68 us/bit (115200)
Timer frequency LED + stopwatch LED changes state once per second (1 Hz interrupt; full on/off cycle is 0.5 Hz)
DMA throughput Send known data, verify 100% match
Interrupt latency Toggle GPIO in ISR, scope trigger < 500 ns
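
The 1 Hz timer figure comes from the relation update rate = timer clock / ((PSC + 1) × (ARR + 1)). The sketch below picks the two values for a given intermediate counter rate; `timer_div_t` and `timer_compute` are hypothetical names, and the 84 MHz input assumes TIM2 sits on APB1 at 42 MHz with the timer clock doubled because the APB prescaler is not 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t psc;   // value for TIMx->PSC (16-bit register)
    uint32_t arr;   // value for TIMx->ARR
} timer_div_t;

// Chooses PSC so the counter runs at tick_hz, then ARR so it overflows
// at update_hz. Returns false if the division is not exact or PSC
// would not fit in 16 bits.
bool timer_compute(uint32_t timer_clk_hz, uint32_t tick_hz,
                   uint32_t update_hz, timer_div_t *out) {
    assert(out);
    if (tick_hz == 0u || update_hz == 0u) return false;
    if (timer_clk_hz % tick_hz != 0u) return false;
    if (tick_hz % update_hz != 0u) return false;

    uint32_t prescale = timer_clk_hz / tick_hz;
    uint32_t period = tick_hz / update_hz;
    if (prescale == 0u || prescale > 65536u || period == 0u) return false;

    out->psc = prescale - 1u;   // hardware divides by PSC + 1
    out->arr = period - 1u;     // counter runs 0..ARR, i.e. ARR + 1 counts
    return true;
}
```

For an 84 MHz timer clock, a 10 kHz counter rate, and a 1 Hz update, this gives PSC = 8399 and ARR = 9999.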

7. Common Pitfalls & Debugging

7.1 Common Pitfalls

Symptom Likely Cause Fix
Nothing happens after flash Vector table not at 0x08000000 Check linker script .isr_vector placement
HardFault immediately Stack pointer invalid Verify _estack symbol in linker script
Peripheral doesn’t work Clock not enabled Set the peripheral’s enable bit in RCC->AHB1ENR, APB1ENR, or APB2ENR
UART garbled text Wrong baud rate Check PCLK frequency and BRR calculation
Interrupts don’t fire NVIC not enabled NVIC_EnableIRQ() after peripheral config
DMA doesn’t transfer Stream not enabled or wrong channel Check DMA_SxCR and channel selection
Clock won’t start HSE crystal issue Try HSI first, verify hardware
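
For the garbled-UART row, the BRR calculation is worth checking by hand. With 16x oversampling (OVER8 = 0), BRR holds the divider as a 12.4 fixed-point number, which works out to the peripheral clock divided by the baud rate. The helpers below are a sketch with hypothetical names; the 84 MHz figure assumes USART1 on APB2 with the /2 prescaler at a 168 MHz system clock.

```c
#include <assert.h>
#include <stdint.h>

// BRR for 16x oversampling: round(pclk / baud). The low 4 bits are the
// fraction and the upper 12 bits the mantissa of USARTDIV.
uint32_t usart_brr(uint32_t pclk_hz, uint32_t baud) {
    assert(baud != 0u);
    return (pclk_hz + baud / 2u) / baud;
}

// Baud rate the hardware actually produces for a given BRR value
// (integer division, so the result is truncated).
uint32_t usart_actual_baud(uint32_t pclk_hz, uint32_t brr) {
    assert(brr != 0u);
    return pclk_hz / brr;
}
```

At 84 MHz and 115200 baud, BRR is 729 (0x2D9). If the same BRR is used while the chip is still running from the 16 MHz HSI with no prescaler, the line runs at about 21947 baud, which is a typical cause of garbled output when the clock switch silently failed.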

7.2 Debugging Techniques

Using GDB with OpenOCD:

(gdb) info registers           # Show all registers
(gdb) x/16x 0x40020000        # Examine GPIOA registers
(gdb) x/x 0xE000ED28          # Check CFSR for fault info
(gdb) set {int}0x40020018 = 0x20  # Write to GPIOA BSRR (sets PA5)
(gdb) monitor reset halt       # Reset and halt CPU

Fault Analysis:

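void hardfault_handler_c(uint32_t *stack);

// naked: no compiler prologue, so LR and SP are untouched when the asm runs
__attribute__((naked)) void HardFault_Handler(void) {
    __asm volatile (
        "tst lr, #4\n"           // Check EXC_RETURN bit 2
        "ite eq\n"
        "mrseq r0, msp\n"        // Bit 2 clear: frame is on MSP
        "mrsne r0, psp\n"        // Bit 2 set: frame is on PSP
        "b hardfault_handler_c\n"
    );
}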

void hardfault_handler_c(uint32_t *stack) {
    uart_puts("HARDFAULT!\r\n");
    uart_printf("R0:  0x%08X\r\n", stack[0]);
    uart_printf("R1:  0x%08X\r\n", stack[1]);
    uart_printf("R2:  0x%08X\r\n", stack[2]);
    uart_printf("R3:  0x%08X\r\n", stack[3]);
    uart_printf("R12: 0x%08X\r\n", stack[4]);
    uart_printf("LR:  0x%08X\r\n", stack[5]);
    uart_printf("PC:  0x%08X\r\n", stack[6]);  // Faulting instruction
    uart_printf("xPSR:0x%08X\r\n", stack[7]);
    uart_printf("CFSR:0x%08X\r\n", SCB->CFSR);
    while (1);
}
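
The raw CFSR value printed by the handler is easier to act on once its bits are named. The table below is a sketch that maps the Cortex-M4 fault status bits to their architectural names; `cfsr_flag_t` and `cfsr_decode` are hypothetical, and the address-valid bits (MMARVALID, BFARVALID) are deliberately left out because they qualify MMFAR/BFAR rather than name a fault.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t mask;
    const char *name;
} cfsr_flag_t;

static const cfsr_flag_t cfsr_flags[] = {
    // MemManage faults (bits 0-7)
    { 1u << 0,  "IACCVIOL"    }, { 1u << 1,  "DACCVIOL"  },
    { 1u << 3,  "MUNSTKERR"   }, { 1u << 4,  "MSTKERR"   },
    { 1u << 5,  "MLSPERR"     },
    // Bus faults (bits 8-15)
    { 1u << 8,  "IBUSERR"     }, { 1u << 9,  "PRECISERR" },
    { 1u << 10, "IMPRECISERR" }, { 1u << 11, "UNSTKERR"  },
    { 1u << 12, "STKERR"      }, { 1u << 13, "LSPERR"    },
    // Usage faults (bits 16-31)
    { 1u << 16, "UNDEFINSTR"  }, { 1u << 17, "INVSTATE"  },
    { 1u << 18, "INVPC"       }, { 1u << 19, "NOCP"      },
    { 1u << 24, "UNALIGNED"   }, { 1u << 25, "DIVBYZERO" },
};

// Writes the names of the set fault bits into names[] (up to max
// entries) and returns how many were written.
size_t cfsr_decode(uint32_t cfsr, const char *names[], size_t max) {
    assert(names != NULL || max == 0u);
    size_t n = 0;
    for (size_t i = 0; i < sizeof cfsr_flags / sizeof cfsr_flags[0] && n < max; i++) {
        if (cfsr & cfsr_flags[i].mask) {
            names[n++] = cfsr_flags[i].name;
        }
    }
    return n;
}
```

In the handler, the returned names could be printed with uart_puts after the register dump; NOCP, for instance, usually means floating-point code ran before the FPU was enabled.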

7.3 Quick Verification Tests

// Test 1: Is clock working?
void test_clock(void) {
    // Toggle LED as fast as possible. Each pass is a load, XOR, store, and
    // branch, so at 168 MHz expect a square wave of roughly 10-20 MHz on a
    // scope, well below the core clock.
    while (1) {
        GPIOA->ODR ^= (1 << 5);
    }
}

// Test 2: Is UART working?
void test_uart(void) {
    while (1) {
        uart_putc('U');  // 0x55 = 01010101 - good for scope
    }
}

// Test 3: Are interrupts working?
void test_interrupts(void) {
    __disable_irq();
    GPIOA->ODR |= (1 << 5);   // LED on
    __enable_irq();
    // If interrupt fires, handler will toggle LED
}

8. Extensions & Challenges

8.1 Beginner Extensions

  1. Add more shell commands: “uptime”, “reset”, “version”
  2. Implement PWM LED dimming: Use timer output compare
  3. Add button long-press detection: Track press duration
  4. Implement printf: Variable arguments, format specifiers

8.2 Intermediate Extensions

  1. Add SPI driver: Talk to an external device (e.g., SD card)
  2. Implement I2C driver: Read a sensor (e.g., temperature)
  3. Add ADC driver: Read analog input (potentiometer)
  4. Boot from different memory: Execute code from RAM

8.3 Advanced Extensions

  1. Implement a simple RTOS scheduler: Round-robin with 2 tasks
  2. Add USB CDC driver: Virtual COM port (complex but rewarding)
  3. Implement Flash programming: Store configuration in Flash
  4. Add Ethernet driver: If using STM32F4 with Ethernet

8.4 Expert Challenges

  1. Port to Rust: Use embedded-hal traits for drivers
  2. Implement secure boot: Verify firmware signature
  3. Add power management: Sleep modes, wake sources
  4. Build a bootloader: Update firmware over UART

9. Real-World Connections

9.1 Industry Applications

Medical Devices:

  • Insulin pumps use Cortex-M for motor control
  • Patient monitors use DMA for continuous data acquisition
  • Safety-critical: similar bare metal approach for determinism

Automotive:

  • Engine control units run bare metal or minimal RTOS
  • CAN bus drivers similar to UART driver
  • Real-time requirements demand understanding interrupts

IoT/Consumer:

  • Smartwatches (Nordic nRF52 is Cortex-M4)
  • Wireless earbuds (power management critical)
  • Smart home devices

9.2 How Production Code Differs

Aspect This Project Production
Error handling Minimal Comprehensive, watchdog
Power management None Sleep modes, power optimization
Security None Secure boot, encrypted storage
Testing Manual Automated hardware-in-loop
Documentation Comments Doxygen, requirements tracing
Configuration Hardcoded Runtime configurable

9.3 Career Relevance

This project prepares you for:

  • Embedded Software Engineer roles at hardware companies
  • Firmware Developer positions in IoT/automotive
  • RTOS Developer - this is the foundation
  • Security Researcher - understanding bare metal is essential
  • Technical Lead - you’ll understand what the hardware can do

10. Resources

10.1 Essential Documentation

Document Source Usage
STM32F4 Reference Manual (RM0090) ST All peripheral details
STM32F4 Datasheet ST Pinouts, electrical specs
Cortex-M4 Technical Reference ARM Core architecture
Cortex-M4 Generic User Guide ARM Programming model

10.2 Online Resources

10.3 Tools

Tool Purpose Link
arm-none-eabi-gcc Compiler ARM Developer
OpenOCD Debug server openocd.org
ST-Link Flashing/debug STMicroelectronics
PulseView Logic analyzer sigrok.org
STM32CubeMX Pin/clock visualization STMicroelectronics

11. Self-Assessment Checklist

Before Starting

  • I can explain what a memory-mapped register is
  • I understand pointers and volatile in C
  • I can read a basic linker script
  • I have my hardware and tools ready

After Phase 1 (Boot & Blink)

  • I can explain every line of the linker script
  • I understand why startup code copies .data and zeros .bss
  • My LED blinks at an expected rate
  • I can debug with OpenOCD and GDB

After Phase 2 (Clock & UART)

  • I can configure the clock tree without reference code
  • I understand the PLL calculation
  • UART transmits correctly at 115200 baud
  • I can explain why Flash needs wait states

After Phase 3 (Interrupts & Timer)

  • I can explain the interrupt entry/exit sequence
  • I understand NVIC priority and preemption
  • Timer interrupt fires at exactly 1 Hz
  • Button interrupt works with debouncing

After Completion

  • DMA fills my buffer without CPU involvement
  • Shell commands work correctly
  • All code compiles without warnings
  • I can answer all interview questions in section 5.8
  • I could implement this from scratch on a different Cortex-M chip

12. Submission / Completion Criteria

Your project is complete when:

  1. Functionality:
    • System boots at 168 MHz (verified via LED timing or scope)
    • LED blinks at 1 Hz via timer interrupt
    • UART works at 115200 baud
    • Shell accepts and executes commands
    • Button toggles LED (debounced)
    • DMA is used for UART RX
  2. Code Quality:
    • Compiles with -Wall -Werror (no warnings)
    • Code is well-organized (separate drivers)
    • All magic numbers are named constants
    • Comments explain non-obvious code
  3. Documentation:
    • README explains how to build and flash
    • Memory map is documented
    • Interrupt priorities are documented
  4. Understanding:
    • Can explain code to interviewer
    • Can modify for different baud rate
    • Can add new peripheral driver
    • Understands tradeoffs made

Summary

This project takes you from zero to professional-level STM32 bare metal programming. You’ll understand:

  • How ARM Cortex-M boots from power-on to main()
  • Clock configuration for maximum performance
  • Driver development for GPIO, UART, Timer, DMA
  • Interrupt handling with NVIC configuration
  • Debugging techniques for embedded systems

The skills transfer to any Cortex-M device (NXP, Nordic, TI) and form the foundation for RTOS development, bootloaders, and production firmware.

Estimated time: 3-4 weeks for complete implementation
Difficulty: Advanced (requires patience with datasheets)
Reward: Deep understanding of professional embedded development


Next: P10-kernel-multitasking.md - Build a simple kernel with preemptive multitasking