Project 6: Interrupt-Driven UART Echo

Use interrupts to echo UART input on Cortex-M.

Quick Reference

Attribute                         │ Value
──────────────────────────────────┼──────────────────────────────────────
Difficulty                        │ Level 4
Time Estimate                     │ 16-24 hours
Main Programming Language         │ Assembly + C
Alternative Programming Languages │ Rust
Coolness Level                    │ Level 4
Business Potential                │ Level 2
Prerequisites                     │ Concept 5: Boot & Exceptions, Concept 4: Memory Maps & Ordering
Key Topics                        │ ISR flow, MMIO access, ring buffers

1. Learning Objectives

By completing this project, you will:

  1. Translate ARM concepts into observable outputs you can verify.
  2. Explain why each toolchain or hardware step is necessary.
  3. Detect and fix at least one realistic failure mode.
  4. Communicate the result clearly in a technical review or interview.

2. All Theory Needed (Per-Concept Breakdown)

Boot, Exceptions & Interrupts

Fundamentals

Boot and exception handling define how control flow starts and changes when the system is interrupted. On Cortex-M, reset reads a vector table at a fixed address to obtain the initial stack pointer and reset handler. On AArch64, exception levels (EL0–EL3) define privilege and isolation across kernel, hypervisor, and secure monitor. Interrupts are structured events with defined entry and exit behavior; when misunderstood, they cause the most common low-level failures (silent lockups, corrupted stacks, and unacknowledged interrupts).

Deep Dive

Boot flow is architecture-specific, but it always starts with the hardware choosing a program counter and stack pointer. In Cortex-M, the vector table is a literal list of addresses at the start of flash (or a remapped location). The CPU loads the initial SP from offset 0 and the reset handler from offset 4; execution begins there. This is why vector tables are so critical: a single incorrect address prevents boot. In AArch64 systems, boot is more complex. Firmware (or a ROM) selects the initial exception level and execution state, then transfers control to your image. This can occur at EL2 or EL1 depending on platform; understanding the starting level is essential for setting up the MMU and interrupt controller.
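Because a single bad entry prevents boot, it can help to sanity-check a built image on the host before flashing it. The following is a minimal sketch, not a definitive tool: `read_le32` and `check_vectors` are hypothetical helper names, and the SRAM bounds are assumptions the caller supplies for their own part.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read a little-endian 32-bit word from an image. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 if the first two vector table entries look plausible,
 * or a negative code naming the first check that failed.
 * sram_base and sram_end are assumptions provided by the caller. */
int check_vectors(const uint8_t *image, size_t len,
                  uint32_t sram_base, uint32_t sram_end)
{
    assert(image != NULL);
    if (len < 8) return -1;                  /* table is truncated          */

    uint32_t sp    = read_le32(image + 0);   /* offset 0: initial SP        */
    uint32_t reset = read_le32(image + 4);   /* offset 4: reset handler     */

    if (sp <= sram_base || sp > sram_end) return -2; /* SP outside SRAM     */
    if (sp & 7u) return -3;                  /* stack not 8-byte aligned    */
    if ((reset & 1u) == 0) return -4;        /* Thumb bit (bit 0) is clear  */
    return 0;
}
```

Passing the first eight bytes of a binary together with the target's SRAM range gives a quick pre-flash check; a result of -4 points at a handler address whose Thumb bit is missing.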

Exceptions and interrupts are structured transitions. On Cortex-M, hardware automatically saves a register frame on the stack and switches to handler mode. This means your ISR runs on top of a known stack layout; if you corrupt that frame or return with an unbalanced stack pointer, the exception return restores garbage. AArch64 exceptions follow a different path: they are taken to the same or a higher exception level, save the return state in per-EL registers (ELR_ELx and SPSR_ELx), and use exception vector tables that differ per EL. This makes exception handling on A-profile both more powerful and more complex. In practice, you must know which registers are saved by hardware and which you must save manually, and you must understand the difference between synchronous exceptions (e.g., illegal instruction) and asynchronous interrupts (e.g., timer).

Interrupt latency is also a systems-level trade-off. M-profile is designed for low-latency, deterministic responses, which is why it dominates microcontroller workloads. This is a critical difference from A-profile, where throughput and virtualization might be prioritized. When you design firmware, you need to decide which tasks are best done in an ISR versus in the main loop; an ISR that does too much can starve other interrupts and introduce jitter.

Finally, exceptions connect directly to debugging. Many “mysterious” crashes are just unhandled faults. On Cortex-M, a hard fault may indicate an invalid memory access or misaligned stack. On AArch64, synchronous exceptions reveal illegal instructions or permission violations. By understanding the exception model, you can interpret fault codes and correlate them to your assembly-level behavior, which is a core skill in systems programming and security analysis.

How this fits on projects

  • Central to P05 (Vector Table Builder), P06 (Interrupt-Driven UART), and P07 (Exception Level Lab).

Definitions & key terms

  • Vector table: Table of exception handler addresses used at reset or interrupt.
  • Exception level: Privilege tier in AArch64 (EL0–EL3).
  • ISR: Interrupt service routine.
  • HardFault: Cortex-M fault handler for severe errors.

Mental model diagram

Cortex-M Boot Sequence:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

    Power Applied
         │
         ▼
    ┌─────────────────────────────────────────────────────────────────┐
    │  1. CPU comes out of reset                                      │
    │     - All registers undefined (except SP and PC)                │
    │     - Processor in Thread mode, privileged                      │
    │     - Using Main Stack Pointer (MSP)                            │
    └─────────────────────────────────────────────────────────────────┘
         │
         ▼
    ┌─────────────────────────────────────────────────────────────────┐
    │  2. CPU reads address 0x00000000 (or VTOR)                      │
    │     - Loads INITIAL STACK POINTER value                         │
    │     - This value goes into SP/r13                               │
    └─────────────────────────────────────────────────────────────────┘
         │
         ▼
    ┌─────────────────────────────────────────────────────────────────┐
    │  3. CPU reads address 0x00000004                                │
    │     - Loads RESET HANDLER address                               │
    │     - This value goes into PC/r15                               │
    │     - Bit 0 MUST be 1 (Thumb mode indicator)                    │
    └─────────────────────────────────────────────────────────────────┘
         │
         ▼
    ┌─────────────────────────────────────────────────────────────────┐
    │  4. Execution begins at Reset_Handler                           │
    │     - Your code starts running!                                 │
    │     - Stack is ready to use                                     │
    │     - All peripherals need initialization                       │
    └─────────────────────────────────────────────────────────────────┘


Vector Table Structure (first 16 entries are standard Cortex-M):
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

    Offset   │  Exception #  │  Contents
    ─────────┼───────────────┼────────────────────────────────────────
    0x0000   │  -            │  Initial Stack Pointer value
    0x0004   │  1 (Reset)    │  Reset_Handler address (| 1 for Thumb)
    0x0008   │  2 (NMI)      │  NMI_Handler address
    0x000C   │  3 (HardFault)│  HardFault_Handler address
    0x0010   │  4            │  Reserved (M0+ doesn't use)
    ...      │  ...          │  ...
    0x003C   │  15 (SysTick) │  SysTick_Handler address
    0x0040   │  16 (IRQ0)    │  First peripheral interrupt
    0x0044   │  17 (IRQ1)    │  Second peripheral interrupt
    ...      │  ...          │  (RP2040 has 26 IRQs)


Example minimal vector table in assembly:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

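    .syntax unified
    .section .vectors, "a"
    .align 2

    // Declare each handler with .thumb_func (or .type name, %function) so the
    // linker sets bit 0 for you; adding 1 by hand on top of that breaks it.
    .word   _stack_top          // 0x00: Initial SP
    .word   Reset_Handler       // 0x04: Reset (bit 0 = Thumb)
    .word   NMI_Handler         // 0x08: NMI
    .word   HardFault_Handler   // 0x0C: HardFault
    .word   0                   // 0x10: Reserved
    // ... more entries ...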

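NOTE: On RP2040, flash (XIP) starts at 0x10000000. With the standard
SDK layout the first 256 bytes hold the second-stage bootloader
(boot2), so the vector table sits at 0x10000100; boot2 points VTOR
there and loads SP and PC from it.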
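The offsets in the table follow one rule: external interrupt IRQn has exception number IRQn + 16, and each entry is 4 bytes. A small sketch of that arithmetic follows; the function names are hypothetical, and the IRQ count of 26 is taken from the table above.

```c
#include <assert.h>
#include <stdint.h>

#define SYSTEM_EXCEPTION_SLOTS 16u  /* entries 0..15 come before IRQ0   */
#define RP2040_NUM_IRQS        26u  /* IRQ count quoted in the table    */

/* Exception number of an external interrupt: IRQn + 16. */
uint32_t irq_exception_number(uint32_t irqn)
{
    assert(irqn < RP2040_NUM_IRQS);
    return irqn + SYSTEM_EXCEPTION_SLOTS;
}

/* Byte offset of that interrupt's entry from the start of the table. */
uint32_t irq_vector_offset(uint32_t irqn)
{
    return irq_exception_number(irqn) * 4u;
}

/* Total size in bytes of a table that covers every RP2040 interrupt. */
uint32_t vector_table_bytes(void)
{
    return (SYSTEM_EXCEPTION_SLOTS + RP2040_NUM_IRQS) * 4u;
}
```

For this project the entry of interest is the UART interrupt. Assuming UART0 is IRQ 20 on RP2040 (worth confirming in the datasheet), its handler address belongs at offset 0x90.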


AArch64 Exception Levels:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

    ┌─────────────────────────────────────────────────────────────────┐
    │  EL3: Secure Monitor                                            │
    │       - Highest privilege, manages secure/non-secure worlds     │
    │       - TrustZone firmware lives here                           │
    ├─────────────────────────────────────────────────────────────────┤
    │  EL2: Hypervisor                                                │
    │       - Virtualization support                                  │
    │       - Controls virtual machines                               │
    ├─────────────────────────────────────────────────────────────────┤
    │  EL1: OS Kernel                                                 │
    │       - Where Linux kernel runs                                 │
    │       - Your bare-metal code runs here!                         │
    ├─────────────────────────────────────────────────────────────────┤
    │  EL0: User Applications                                         │
    │       - Lowest privilege                                        │
    │       - Normal programs run here under Linux                    │
    └─────────────────────────────────────────────────────────────────┘

    On Raspberry Pi boot:
    ┌──────────────────────────────────────────────────────────────┐
    │ GPU firmware loads kernel8.img at 0x80000. The default ARM   │
    │ stub starts at EL3, drops to EL2, and jumps there at EL2.    │
    │ Your bare-metal code typically runs at EL1 after setup.      │
    └──────────────────────────────────────────────────────────────┘


Interrupt Flow on Cortex-M:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

    Main Code Running
           │
           │ ← UART receives byte
           │   Hardware sets interrupt flag
           │   NVIC sees enabled interrupt
           ▼
    ┌──────────────────────────────────────────────────────────────────┐
    │  AUTOMATIC HARDWARE ACTIONS (you don't write code for this):    │
    │  1. Finish current instruction                                   │
    │  2. Push 8 registers to stack: r0-r3, r12, LR, PC, xPSR         │
    │  3. Load new PC from vector table (exception #)                  │
    │  4. Load 0xFFFFFFF9 into LR (EXC_RETURN)                        │
    │  5. Enter Handler mode (privileged)                              │
    └──────────────────────────────────────────────────────────────────┘
           │
           ▼
    ┌──────────────────────────────────────────────────────────────────┐
    │  YOUR ISR EXECUTES:                                              │
    │  - Must save r4-r11 if you use them (push {r4-r7})              │
    │  - Read UART data register (clears interrupt flag)               │
    │  - Process byte (store in buffer, set flag, etc.)                │
    │  - Restore r4-r11 if saved                                       │
    │  - Return with: BX LR (the magic EXC_RETURN value)               │
    └──────────────────────────────────────────────────────────────────┘
           │
           ▼
    ┌──────────────────────────────────────────────────────────────────┐
    │  AUTOMATIC HARDWARE ACTIONS:                                     │
    │  1. Hardware detects EXC_RETURN in LR                            │
    │  2. Pop 8 registers from stack                                   │
    │  3. Resume execution exactly where interrupted                   │
    │  4. Return to Thread mode                                        │
    └──────────────────────────────────────────────────────────────────┘
           │
           ▼
    Main Code Continues (unaware anything happened!)


Stack During Interrupt:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

    BEFORE interrupt:           AFTER entry, BEFORE ISR code:
    ┌──────────────┐           ┌──────────────┐
    │  (old data)  │           │  (old data)  │
    │              │           ├──────────────┤
    │              │           │  xPSR        │ ← +0x1C from new SP
    │              │           ├──────────────┤
    │              │           │  PC (return) │ ← +0x18
    │              │           ├──────────────┤
    │              │           │  LR          │ ← +0x14
    │              │           ├──────────────┤
    │              │           │  r12         │ ← +0x10
    │              │           ├──────────────┤
    │              │           │  r3          │ ← +0x0C
    │              │           ├──────────────┤
    │              │           │  r2          │ ← +0x08
    │              │           ├──────────────┤
    │              │           │  r1          │ ← +0x04
    │              │           ├──────────────┤
SP →│              │        SP→│  r0          │ ← +0x00 (new SP)
    └──────────────┘           └──────────────┘

    The 32 bytes (8 × 4) are pushed automatically by hardware!
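The frame in the diagram maps directly onto a C struct, which is handy when a fault handler wants to report where the CPU was interrupted. This is a sketch under the layout shown above (no floating-point context); `exception_frame_t` and `frame_from_stack` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Registers stacked by hardware on exception entry, lowest address first. */
typedef struct {
    uint32_t r0, r1, r2, r3;
    uint32_t r12;
    uint32_t lr;    /* LR of the interrupted code                 */
    uint32_t pc;    /* address execution resumes at on return     */
    uint32_t xpsr;
} exception_frame_t;

/* Compile-time checks against the offsets in the diagram. */
_Static_assert(sizeof(exception_frame_t) == 32, "frame is 8 words");
_Static_assert(offsetof(exception_frame_t, r12) == 0x10, "r12 at +0x10");
_Static_assert(offsetof(exception_frame_t, pc) == 0x18, "PC at +0x18");
_Static_assert(offsetof(exception_frame_t, xpsr) == 0x1C, "xPSR at +0x1C");

/* Copy the stacked frame out of the memory that SP pointed to at handler
 * entry. On a host build, sp may point at any 8-word array. */
exception_frame_t frame_from_stack(const uint32_t *sp)
{
    exception_frame_t f;
    assert(sp != NULL);
    memcpy(&f, sp, sizeof f);
    return f;
}
```

In a HardFault handler, the `pc` field of the copied frame is usually the first value worth printing, since it names the instruction that was executing when the fault was taken.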


How it works (step-by-step, with invariants and failure modes)

  1. Boot loads initial SP and PC from the vector table (Cortex-M) or firmware-defined entry (AArch64).
  2. An interrupt triggers hardware context save and branches to the handler.
  3. Handler restores context and returns using the architecture-specific mechanism.
  4. Failure mode: wrong vector address or corrupted stack → boot hang or fault loop.
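Step 3 on Cortex-M hinges on the EXC_RETURN value that hardware places in LR. The sketch below decodes the three values defined for cores without a floating-point unit or security extension, such as the Cortex-M0+; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    EXC_RET_HANDLER_MSP,  /* 0xFFFFFFF1: back to Handler mode, main stack   */
    EXC_RET_THREAD_MSP,   /* 0xFFFFFFF9: back to Thread mode, main stack    */
    EXC_RET_THREAD_PSP,   /* 0xFFFFFFFD: back to Thread mode, process stack */
    EXC_RET_INVALID       /* anything else is not a basic EXC_RETURN code   */
} exc_return_kind_t;

/* Classify the LR value seen on entry to a handler. */
exc_return_kind_t decode_exc_return(uint32_t lr)
{
    switch (lr) {
    case 0xFFFFFFF1u: return EXC_RET_HANDLER_MSP;
    case 0xFFFFFFF9u: return EXC_RET_THREAD_MSP;
    case 0xFFFFFFFDu: return EXC_RET_THREAD_PSP;
    default:          return EXC_RET_INVALID;
    }
}

/* True when returning resumes ordinary (non-handler) code. */
bool returns_to_thread_mode(uint32_t lr)
{
    exc_return_kind_t k = decode_exc_return(lr);
    return k == EXC_RET_THREAD_MSP || k == EXC_RET_THREAD_PSP;
}
```

Seeing 0xFFFFFFF1 inside a handler means it preempted another handler, which is a useful clue when chasing nested-interrupt bugs.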

Minimal concrete example (pseudo, not runnable)

VECTOR_TABLE[Reset] -> Reset_Handler
On interrupt: push context, branch handler, restore, return

Common misconceptions

  • “Interrupt handlers are just normal functions” → They obey different entry/exit rules.
  • “Boot is just jump to main” → Boot is a structured sequence with strict alignment rules.

Check-your-understanding questions

  1. Why must bit 0 of the reset handler address be set on Cortex-M?
  2. What does EL1 represent in AArch64?
  3. Why must ISRs be short?

Check-your-understanding answers

  1. Cortex-M executes only Thumb instructions, and bit 0 of a vector entry is loaded into the Thumb state bit; if it is 0, the first instruction fetched faults.
  2. EL1 is the kernel-level privilege where OS code typically runs.
  3. Long ISRs increase latency and block interrupts of equal or lower priority until they return.

Real-world applications

  • Firmware bootloaders, interrupt-driven I/O, and OS exception handling.

Where you’ll apply it

  • This project: see §3.1 and §5.4 in P06-interrupt-driven-uart-echo.md
  • P05 Vector Table Builder
  • P06 Interrupt-Driven UART
  • P07 AArch64 Exception Level Lab

References

  • AArch64 exception model and privilege levels.
  • Cortex-M profile emphasis on low-latency interrupt response.

Key insights: Boot and exceptions are not features you add later; they are the foundation of control flow.

Summary: Once you understand boot and exceptions, most “mysterious” bare-metal failures become obvious.

Homework/Exercises to practice the concept

  1. Draw the Cortex-M vector table layout and label the first 8 entries.
  2. Explain how an interrupt differs from a synchronous exception.

Solutions to the homework/exercises

  1. The first entry is the initial SP, followed by reset, NMI, HardFault, and system handlers.
  2. Interrupts are asynchronous hardware events; synchronous exceptions are triggered by the current instruction.

Memory Maps, MMIO & Ordering

Fundamentals

ARM systems expose peripherals through memory-mapped I/O (MMIO): reading or writing specific addresses triggers hardware behavior rather than normal memory access. This is central to microcontrollers and still vital on A-profile SoCs. The memory map defines which address ranges are RAM, flash, peripherals, and internal control regions. Memory ordering adds another layer: modern CPUs can reorder memory accesses for performance, so barriers (DMB/DSB/ISB) are required to guarantee visibility and ordering to devices or other cores. Understanding MMIO and ordering is the key to controlling hardware reliably.

Deep Dive

A memory map is a contract between the CPU and the SoC. Addresses are not abstract: they correspond to real hardware blocks. In Cortex-M systems, large fixed ranges map to flash, SRAM, peripherals, and internal control registers. These ranges determine what happens when you load or store. For example, a store to a GPIO register flips a pin; a load from a UART data register consumes a byte from a FIFO. MMIO behaves differently from RAM: it is often non-cacheable, may have side effects on read, and is frequently write-only or read-only. When you treat it like ordinary memory, bugs emerge: stale values, missing updates, or unintended state changes.

Memory ordering complicates this further. ARM cores, like most modern CPUs, can reorder memory operations to improve performance. This is invisible in single-threaded logic but catastrophic for devices and multi-core coordination. If you write a command buffer to memory and then write a “doorbell” MMIO register that tells the device to consume it, the device might see the doorbell first unless you insert a barrier. ARM provides three barrier instructions (DMB, DSB, and ISB), each with distinct strength. DMB ensures prior memory accesses are observed before subsequent ones; DSB additionally holds back later instructions until those accesses have completed; ISB flushes the instruction pipeline so that context changes such as control-register writes take effect for the instructions that follow. These are not optional: they are the difference between “mostly works” and “always correct.”
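The doorbell pattern can be sketched in C as follows. This is an illustration under stated assumptions rather than a drop-in driver: `DEVICE_BARRIER` and `ring_doorbell` are hypothetical names, the inline `dsb sy` is used only when the compiler reports ARMv6-M or ARMv7 and later, and the C11 fence is merely a stand-in so the same source builds on a non-ARM host.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumption: on ARM targets use a full-system DSB; elsewhere fall back to
 * a C11 fence so the sketch still compiles and runs on a development host. */
#if defined(__ARM_ARCH_6M__) || (defined(__ARM_ARCH) && __ARM_ARCH >= 7)
#define DEVICE_BARRIER() __asm__ volatile("dsb sy" ::: "memory")
#else
#define DEVICE_BARRIER() atomic_thread_fence(memory_order_seq_cst)
#endif

/* Fill a two-word command buffer, then tell the device to consume it.
 * buffer and doorbell are volatile so the compiler keeps both the stores
 * and their program order; the barrier orders them for the hardware. */
void ring_doorbell(volatile uint32_t *buffer, volatile uint32_t *doorbell,
                   uint32_t cmd, uint32_t arg, uint32_t token)
{
    assert(buffer != 0 && doorbell != 0);
    buffer[0] = cmd;        /* 1. write the command                     */
    buffer[1] = arg;        /* 2. write its argument                    */
    DEVICE_BARRIER();       /* 3. buffer writes complete before...      */
    *doorbell = token;      /* 4. ...the device is told to look         */
}
```

On a real target the two pointers would be a buffer in RAM and a fixed peripheral address from the datasheet; on a host they can be ordinary variables, which is enough to unit-test the data path even though ordering itself cannot be observed there.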

On microcontrollers, you may not have caches or complex reorder buffers, but the bus fabric and peripheral interactions still require ordering. On A-profile systems with caches, speculation, and out-of-order execution, the need is even greater. DMA engines read and write memory independently of the CPU; if you don’t synchronize caches or enforce ordering, the DMA sees stale or partial data. This is why firmware often combines barriers with explicit cache maintenance. The principle is simple: your mental model must include the device, the bus, and the CPU pipeline, not just the instruction sequence.

MMIO access patterns also introduce concurrency hazards. Read-modify-write sequences can race with interrupts or other cores. Hardware often provides SET/CLEAR registers specifically to avoid these races by allowing atomic bit operations. If you ignore these and perform a naive read-modify-write, you can silently clear unrelated bits. The safest approach is to understand the register semantics and use the atomic registers provided. That is not assembly-specific, but assembly exposes the pattern directly and makes it obvious.
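The lost-update hazard is easy to reproduce on a host with a simulated register block. In this sketch `sim_sio_t`, `sim_read`, `sim_write`, and the "interrupt" callback are all hypothetical stand-ins; only the register offsets are taken from the RP2040 SIO layout.

```c
#include <assert.h>
#include <stdint.h>

enum { GPIO_OUT = 0x010, GPIO_OUT_SET = 0x014,
       GPIO_OUT_CLR = 0x018, GPIO_OUT_XOR = 0x01C };

typedef struct { uint32_t gpio_out; } sim_sio_t;  /* simulated hardware state */
typedef void (*irq_fn)(sim_sio_t *);              /* simulated interrupt      */

/* Only GPIO_OUT is readable in this model. */
uint32_t sim_read(const sim_sio_t *s, uint32_t off)
{
    return off == GPIO_OUT ? s->gpio_out : 0u;
}

/* Model the alias semantics: SET ORs, CLR masks, XOR toggles. */
void sim_write(sim_sio_t *s, uint32_t off, uint32_t v)
{
    switch (off) {
    case GPIO_OUT:     s->gpio_out  = v;  break;
    case GPIO_OUT_SET: s->gpio_out |= v;  break;
    case GPIO_OUT_CLR: s->gpio_out &= ~v; break;
    case GPIO_OUT_XOR: s->gpio_out ^= v;  break;
    default: assert(0 && "unmodelled register");
    }
}

/* Naive read-modify-write; irq fires in the window between read and write. */
void set_bits_rmw(sim_sio_t *s, uint32_t mask, irq_fn irq)
{
    uint32_t v = sim_read(s, GPIO_OUT);
    if (irq) irq(s);                    /* interrupt lands here            */
    sim_write(s, GPIO_OUT, v | mask);   /* overwrites the interrupt's work */
}

/* Single store to the SET alias; there is no window to race in. */
void set_bits_atomic(sim_sio_t *s, uint32_t mask, irq_fn irq)
{
    if (irq) irq(s);
    sim_write(s, GPIO_OUT_SET, mask);
}

/* Example interrupt body: sets bit 3 through the SET alias. */
void irq_sets_bit3(sim_sio_t *s) { sim_write(s, GPIO_OUT_SET, 1u << 3); }
```

Running `set_bits_rmw` with `irq_sets_bit3` leaves bit 3 cleared afterwards, while `set_bits_atomic` with the same callback keeps both bits set.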

How this fits on projects

  • Core to P04 (Memory Map & MMIO Field Notebook) and P09 (Memory Ordering Litmus Tests).

Definitions & key terms

  • Memory map: The assignment of address ranges to RAM, flash, and peripherals.
  • MMIO: Memory addresses that control hardware rather than store data.
  • DMB/DSB/ISB: Memory barrier instructions for ordering and visibility.

Mental model diagram

Cortex-M Memory Map (4GB address space):
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

    0xFFFFFFFF ┌─────────────────────────────────────────┐
               │         Vendor-Specific                 │
    0xE0100000 ├─────────────────────────────────────────┤
               │         Private Peripheral Bus          │  ← NVIC lives here
               │         (Internal peripherals)          │    at 0xE000E000
    0xE0000000 ├─────────────────────────────────────────┤
               │                                         │
               │         External Device                 │  ← Memory-mapped
               │         (Peripherals, etc.)             │    devices
               │                                         │
    0xA0000000 ├─────────────────────────────────────────┤
               │                                         │
               │         External RAM                    │
               │                                         │
    0x60000000 ├─────────────────────────────────────────┤
               │                                         │
               │         Peripheral                      │  ← GPIO, UART, SPI,
               │         (On-chip I/O)                   │    I2C, PWM, etc.
               │                                         │
    0x40000000 ├─────────────────────────────────────────┤
               │                                         │
               │         SRAM                            │  ← Variables, stack,
               │         (On-chip RAM)                   │    heap
               │                                         │
    0x20000000 ├─────────────────────────────────────────┤
               │                                         │
               │         Code                            │  ← Flash/ROM with
               │         (Flash/ROM)                     │    your program
               │                                         │
    0x00000000 └─────────────────────────────────────────┘


RP2040-Specific Memory Map:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

    Address         │  Size      │  Contents
    ────────────────┼────────────┼─────────────────────────────────────
    0x10000000      │  2 MB      │  External Flash (XIP)
                    │            │  ↳ Your code runs from here
    ────────────────┼────────────┼─────────────────────────────────────
    0x20000000      │  256 KB    │  Main SRAM (4 banks × 64KB)
                    │            │  ↳ Variables, stack, heap
    0x20040000      │  4 KB      │  SRAM4 (small bank, e.g. a core stack)
    0x20041000      │  4 KB      │  SRAM5 (small bank, e.g. a core stack)
    ────────────────┼────────────┼─────────────────────────────────────
    0x40000000      │  -         │  APB Peripherals
                    │            │  ↳ UART, SPI, I2C, PWM...
    ────────────────┼────────────┼─────────────────────────────────────
    0x50000000      │  -         │  AHB-Lite Peripherals
                    │            │  ↳ DMA, USB, PIO...
    ────────────────┼────────────┼─────────────────────────────────────
    0xD0000000      │  -         │  SIO (Single-cycle I/O)
                    │            │  ↳ GPIO (fast access!)
    ────────────────┼────────────┼─────────────────────────────────────
    0xE0000000      │  -         │  Cortex-M0+ internal
                    │            │  ↳ NVIC, SysTick, Debug


Memory-Mapped I/O Concept:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

    Normal Memory:                  Peripheral Register:
    ──────────────                  ────────────────────
    LDR r0, [addr]                  LDR r0, [UART_DATA]
         │                               │
         ▼                               ▼
    Read from RAM                   Read TRIGGERS HARDWARE!
    Data was sitting there          Byte removed from RX FIFO
    Memory unchanged                Status flags updated

    STR r0, [addr]                  STR r0, [GPIO_OUT]
         │                               │
         ▼                               ▼
    Write to RAM                    Write CAUSES ACTION!
    Data now stored there           Pin voltage changes
    Can read it back                May not read same value back


Example: GPIO Control on RP2040:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

    SIO Base: 0xD0000000

    Offset  │ Register       │ Purpose
    ────────┼────────────────┼──────────────────────────────────
    0x000   │ CPUID          │ Processor ID (read-only)
    0x004   │ GPIO_IN        │ Read current GPIO input state
    0x010   │ GPIO_OUT       │ Read/write GPIO output state
    0x014   │ GPIO_OUT_SET   │ Set bits in GPIO_OUT (write-only)
    0x018   │ GPIO_OUT_CLR   │ Clear bits in GPIO_OUT (write-only)
    0x01C   │ GPIO_OUT_XOR   │ Toggle bits in GPIO_OUT (write-only)
    0x020   │ GPIO_OE        │ Output enable (1=output, 0=input)
    0x024   │ GPIO_OE_SET    │ Set bits in GPIO_OE
    0x028   │ GPIO_OE_CLR    │ Clear bits in GPIO_OE


    To turn ON GPIO25 (Pico's LED):
    ─────────────────────────────────────────────────────────────────

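    // Assumes GPIO25 is out of reset and its pin function is set to SIO.
    LDR  r0, =0xD0000000     // SIO base address
    MOVS r1, #1              // Cortex-M0+ only has flag-setting MOVS/LSLS
    LSLS r1, r1, #25         // r1 = 0x02000000 (bit 25)
    STR  r1, [r0, #0x024]    // GPIO_OE_SET: enable output
    STR  r1, [r0, #0x014]    // GPIO_OUT_SET: set high → LED ON!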


    Why SET/CLR registers instead of just GPIO_OUT?
    ─────────────────────────────────────────────────────────────────

    Without SET/CLR (DANGEROUS):
    ┌────────────────────────────────────────────────────────────────┐
    │ LDR  r1, [r0, #GPIO_OUT]  // Read current value                │
    │ ORRS r1, r2               // Set bit 25 (r2 = 1<<25)           │
    │ STR  r1, [r0, #GPIO_OUT]  // Write back                        │
    │                                                                 │
    │ PROBLEM: If another core or interrupt modifies GPIO_OUT        │
    │ between the LDR and STR, those changes are LOST!               │
    │ This is a classic "read-modify-write race condition."          │
    └────────────────────────────────────────────────────────────────┘

    With SET/CLR (ATOMIC and SAFE):
    ┌────────────────────────────────────────────────────────────────┐
    │ LDR r1, =(1<<25)             // r1 = mask for bit 25           │
    │ STR r1, [r0, #GPIO_OUT_SET]  // Hardware atomically sets bit   │
    │                                                                 │
    │ Other bits are UNAFFECTED - hardware handles it!               │
    └────────────────────────────────────────────────────────────────┘


Why Memory Barriers Are Needed:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Modern CPUs reorder memory accesses for performance. This is usually
invisible to single-threaded code, but becomes critical when:

  1. Communicating with peripherals (they have side effects!)
  2. Multi-core systems (other cores see different ordering)
  3. DMA operations (hardware sees memory, not caches)

Example WITHOUT barrier (BROKEN):
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

    You write:                  CPU might execute as:
    ──────────────────────      ────────────────────────────────
    mailbox_buffer[0] = cmd     mailbox_write = buffer_addr ← FIRST!
    mailbox_buffer[1] = arg     mailbox_buffer[0] = cmd     ← TOO LATE
    mailbox_write = buffer_addr mailbox_buffer[1] = arg

    The peripheral reads garbage because the buffer wasn't filled yet!


ARM Memory Barrier Instructions:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

    DMB (Data Memory Barrier)
    ├── Ensures all previous memory accesses complete before
    │   subsequent memory accesses begin
    ├── Does NOT affect instruction execution order
    └── Use between: data writes and peripheral write

    DSB (Data Synchronization Barrier)
    ├── Like DMB, but also stalls later instructions until all
    │   previous memory accesses complete (stronger than DMB)
    └── Use before: peripheral access that must be visible

    ISB (Instruction Synchronization Barrier)
    ├── Flushes the instruction pipeline
    ├── Ensures previous context changes take effect
    └── Use after: changing system registers, enabling MMU


Correct Pattern for Peripheral Access:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

    // Fill mailbox buffer
    str  w1, [x0]           // Write data to buffer
    str  w2, [x0, #4]       // Write more data

    dsb  sy                 // ← BARRIER: Complete all writes

    str  w3, [x4]           // Now write to mailbox register
                            // Hardware now sees complete buffer


How it works (step-by-step, with invariants and failure modes)

  1. Identify which addresses are MMIO and which are normal memory.
  2. Use atomic SET/CLEAR registers when available to avoid races.
  3. Insert barriers before device “doorbell” writes to guarantee ordering.
  4. Failure mode: devices read partial buffers, interrupts race, or GPIO bits flip incorrectly.

Minimal concrete example (pseudo, not runnable)

WRITE buffer
BARRIER
WRITE device_register

Common misconceptions

  • “MMIO behaves like RAM” → Reads and writes can trigger side effects.
  • “Ordering is always preserved” → CPUs and buses can reorder operations.

Check-your-understanding questions

  1. Why can reading a UART data register change system state?
  2. When do you need a DSB instead of a DMB?
  3. Why are SET/CLEAR registers safer than read-modify-write?

Check-your-understanding answers

  1. MMIO reads can pop FIFO entries or clear flags, which changes hardware state.
  2. When no later instruction may execute until prior memory accesses have completed, for example before sleeping with WFI or before an ISB that follows a system-register change.
  3. They avoid races because the hardware performs the atomic bit update.

Real-world applications

  • GPIO control, DMA setup, and peripheral initialization in firmware.

Where you’ll apply it

  • This project: see §3.1 and §5.4 in P06-interrupt-driven-uart-echo.md
  • P04 Memory Map & MMIO Field Notebook
  • P09 Memory Ordering Litmus Tests

References

  • Arm ACLE barrier intrinsics and semantics.

Key insights: MMIO and ordering are the difference between “works once” and “always correct.”

Summary: Memory maps define what addresses mean; barriers define when writes become real.

Homework/Exercises to practice the concept

  1. Describe a race condition caused by a read-modify-write on GPIO.
  2. Sketch an ordering bug where a peripheral sees stale data.

Solutions to the homework/exercises

  1. Another core sets a different bit between your read and write; your write erases it.
  2. You signal the device before writing the buffer; it reads garbage.

3. Project Specification

3.1 What You Will Build

An ISR-driven UART echo with a ring buffer and deterministic output.

3.2 Functional Requirements

  1. Requirement 1: Configure UART and enable RX interrupts
  2. Requirement 2: Capture bytes in ISR and echo them
  3. Requirement 3: Handle buffer full conditions gracefully

3.3 Non-Functional Requirements

  • No data loss at typical baud rates

3.4 Example Usage / Output

$ uart-echo
> hello
hello

$ uart-echo --buffer 0
error: buffer size must be >= 1
exit code: 2

3.5 Data Formats / Schemas / Protocols

  • Ring buffer: head, tail, size
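One way to realize this format is a single-producer, single-consumer ring where the ISR only writes `head` and the main loop only writes `tail`. The sketch below is one possible layout, not the required one: the names, the fixed power-of-two `RB_SIZE` (standing in for the size field), and the drop-newest overflow policy are all assumptions. It targets a single core; sharing the buffer between cores would additionally need barriers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RB_SIZE 16u  /* assumed capacity; must be a power of two */
_Static_assert((RB_SIZE & (RB_SIZE - 1u)) == 0u, "RB_SIZE must be 2^n");

typedef struct {
    volatile uint32_t head;     /* free-running write index (ISR only)   */
    volatile uint32_t tail;     /* free-running read index (main only)   */
    volatile uint32_t dropped;  /* bytes discarded because ring was full */
    uint8_t data[RB_SIZE];
} ring_t;

/* Producer side, intended to be called from the UART RX ISR. */
bool rb_push(ring_t *rb, uint8_t byte)
{
    uint32_t head = rb->head;
    if (head - rb->tail == RB_SIZE) {   /* full: count it, drop newest   */
        rb->dropped++;
        return false;
    }
    rb->data[head & (RB_SIZE - 1u)] = byte;
    rb->head = head + 1u;               /* publish only after data write */
    return true;
}

/* Consumer side, intended to be called from the main loop. */
bool rb_pop(ring_t *rb, uint8_t *out)
{
    uint32_t tail = rb->tail;
    assert(out != 0);
    if (tail == rb->head) return false; /* empty                         */
    *out = rb->data[tail & (RB_SIZE - 1u)];
    rb->tail = tail + 1u;               /* release the slot to the ISR   */
    return true;
}
```

Free-running indices keep "full" and "empty" distinguishable without sacrificing a slot, and the `dropped` counter gives the main loop a way to report overflow instead of failing silently.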

3.6 Edge Cases

  • Interrupt storms
  • Buffer overflow

3.7 Real World Outcome

This is the golden reference for success:

  • UART echo works while main loop stays responsive.

3.7.1 How to Run (Copy/Paste)

  • Build: follow the toolchain steps defined in this guide
  • Run: use the CLI examples in §3.4 with fixed inputs
  • Expected directory: project root

3.7.2 Golden Path Demo (Deterministic)

Run with a fixed input set and confirm output matches §3.4 exactly.

3.7.3 If CLI: Exact Terminal Transcript

$ uart-echo
> hello
hello

$ uart-echo --buffer 0
error: buffer size must be >= 1
exit code: 2

4. Solution Architecture

4.1 High-Level Design

┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│ Input Layer  │───▶│ Core Logic   │───▶│ Output Layer │
└──────────────┘     └──────────────┘     └──────────────┘

4.2 Key Components

Component    │ Responsibility               │ Key Decisions
─────────────┼──────────────────────────────┼──────────────────────
Input Parser │ Validate and normalize input │ Strict error handling
Core Engine  │ Perform the main computation │ Deterministic paths
Reporter     │ Produce user-facing output   │ Stable formatting

4.3 Data Structures (No Full Code)

Record Entry {
  name: string
  fields: list
  notes: text
}

4.4 Algorithm Overview

Key Algorithm: Core Flow

  1. Parse input and validate parameters.
  2. Execute the core transformation or analysis.
  3. Emit deterministic output or error summary.

Complexity Analysis:

  • Time: O(n) in the size of input records
  • Space: O(n) for stored mappings and logs

5. Implementation Guide

5.1 Development Environment Setup

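# Install toolchain and verify versions
# (assuming the GNU Arm Embedded toolchain; substitute your own)
arm-none-eabi-gcc --version
arm-none-eabi-objdump --version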

5.2 Project Structure

project-root/
├── src/
│   ├── core
│   └── io
├── tests/
│   └── fixtures
├── docs/
└── README.md

5.3 The Core Question You’re Answering

“How do you use interrupts to echo UART input on Cortex-M while the main loop stays responsive?”

5.4 Concepts You Must Understand First

Stop and research these before coding:

  1. Boot, Exceptions & Interrupts
    • What is the key invariant you must preserve?
  2. Memory Maps, MMIO & Ordering
    • What is the key invariant you must preserve?
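
As a starting point for the MMIO question, one invariant worth examining is that every register access written in the source must actually reach the hardware, in program order. In C this is usually expressed with `volatile`. The sketch below assumes a hypothetical two-register UART (`STATUS`, `DATA`) and a hypothetical `RX_READY` bit; real offsets and bit positions come from the reference manual of the target MCU. Because the functions take a pointer to the register block, the same code can be exercised on a host against an ordinary struct.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register layout, for illustration only. */
typedef struct {
    volatile uint32_t STATUS;  /* assumed: bit 0 set when a byte is waiting */
    volatile uint32_t DATA;    /* assumed: received byte in bits 7:0 */
} uart_regs_t;

#define UART_STATUS_RX_READY (1u << 0)  /* assumed bit position */

/* volatile forces a fresh load of STATUS on every call; without it the
 * compiler could legally hoist the read out of a polling loop. */
static inline bool uart_rx_ready(const uart_regs_t *uart) {
    return (uart->STATUS & UART_STATUS_RX_READY) != 0u;
}

/* Reads one byte if available. STATUS is checked before DATA is touched,
 * and the compiler may not reorder the two volatile accesses. */
static inline bool uart_try_read(const uart_regs_t *uart, uint8_t *out) {
    if (!uart_rx_ready(uart)) {
        return false;
    }
    *out = (uint8_t)(uart->DATA & 0xFFu);
    return true;
}
```

On a target, the pointer would be a fixed address cast to `uart_regs_t *`. Note that `volatile` constrains only the compiler; ordering between MMIO and other memory at the bus level is a separate question that barrier instructions address.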

5.5 Questions to Guide Your Design

  1. Data Flow
    • How does input become output?
    • Which steps must be deterministic?
  2. Validation
    • What is the simplest test that proves correctness?
    • How will you detect regressions?

5.6 Thinking Exercise

Trace the Critical Path

Write a step-by-step trace of the most important workflow in this project.

Questions to answer:

  • Where could a subtle bug hide?
  • What would you log to prove correctness?

5.7 The Interview Questions They’ll Ask

  1. “What is the core invariant this project relies on?”
  2. “How would you debug a failure in this workflow?”
  3. “What trade-offs did you make in design?”
  4. “How does this map to real hardware or toolchains?”
  5. “How do you prove your output is correct?”

5.8 Hints in Layers

Hint 1: Start small
Focus on the smallest input that still demonstrates the concept.

Hint 2: Make output deterministic
Fix inputs and produce stable logs before expanding functionality.

Hint 3: Validate against a known reference
Compare with a known-good output or specification.

Hint 4: Add instrumentation
Log internal steps so you can verify each phase explicitly.

5.9 Books That Will Help

Topic            Book                                      Chapter
Core concept     “ARM Assembly Language” by William Hohl   Ch. 3-5
Binary formats   “Linkers and Loaders” by John R. Levine   Ch. 1-3

5.10 Implementation Phases

Phase 1: Foundation (2-4 hours)

Goals:

  • Establish a minimal working pipeline
  • Validate one end-to-end path

Tasks:

  1. Build the smallest viable input and output
  2. Verify outputs against a reference

Checkpoint: Output matches expected golden path

Phase 2: Core Functionality (4-8 hours)

Goals:

  • Implement main logic and validation
  • Add structured error handling

Tasks:

  1. Implement the core transformation
  2. Add deterministic reporting

Checkpoint: Core tests pass reliably

Phase 3: Polish & Edge Cases (2-4 hours)

Goals:

  • Cover edge cases
  • Improve output clarity

Tasks:

  1. Add negative tests
  2. Document limitations

Checkpoint: All edge cases handled gracefully

5.11 Key Implementation Decisions

Decision        Options                   Recommendation   Rationale
Input format    Free-form vs structured   Structured       Easier validation
Output format   Human vs machine          Both             Supports verification and tooling

6. Testing Strategy

6.1 Test Categories

Category            Purpose               Examples
Unit Tests          Validate core logic   Field parsing, bounds checks
Integration Tests   Validate full flow    End-to-end CLI runs
Edge Case Tests     Validate boundaries   Empty input, invalid flags

6.2 Critical Test Cases

  1. Golden path: Fixed input produces known output.
  2. Invalid input: Error path triggers correct exit code.
  3. Boundary case: Maximum supported value handled correctly.

6.3 Test Data

Input: fixed seed or fixed fixture
Expected: exact output text from §3.4

7. Common Pitfalls & Debugging

Pitfall                  Symptom             Solution
Misaligned assumptions   Unexpected output   Re-check invariants
Missing validation       Silent failures     Add explicit checks
Non-determinism          Flaky output        Fix inputs and seeds

7.2 Debugging Strategies

  • Trace everything: Log each step with stable ordering
  • Compare against reference: Use known-good outputs

7.3 Performance Traps

  • Avoid repeated parsing of the same input; cache results when possible

8. Extensions & Challenges

8.1 Beginner Extensions

  • Add one extra output format
  • Add a help screen with examples

8.2 Intermediate Extensions

  • Add a verification mode that compares two outputs
  • Add structured JSON output

8.3 Advanced Extensions

  • Add a batch mode for large inputs
  • Add cross-target comparisons (M vs A profile)

9. Real-World Connections

9.1 Industry Applications

  • Firmware bring-up: use the same checks to validate early boot images
  • Security audits: analyze binaries for ABI or control-flow correctness

9.2 Related Open Source Projects

  • binutils: source of many ARM tooling workflows
  • QEMU: emulator used for ARM testing

9.3 Interview Relevance

  • Explains why ARM behavior differs across profiles
  • Demonstrates toolchain literacy and debugging rigor

10. Resources

10.1 Essential Reading

  • “ARM Assembly Language” by William Hohl - practical instruction usage
  • “Linkers and Loaders” by John R. Levine - binary layout

10.2 Video Resources

  • ARM architecture overview talks and lectures

10.3 Tools & Documentation

  • GNU binutils documentation
  • Arm developer documentation
  • This project connects with: P01-toolchain-pipeline-explorer.md, P02-register-stack-visualizer.md, P03-thumb-encoder-decoder.md

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain the core concept without notes
  • I can explain why my design choices were necessary
  • I can describe one realistic failure mode

11.2 Implementation

  • All functional requirements are met
  • Tests pass deterministically
  • Edge cases are documented

11.3 Growth

  • I can describe what I would improve next time
  • I can explain this project in an interview

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Core functionality works on reference inputs
  • Deterministic golden path is documented
  • At least one failure path is demonstrated

Full Completion:

  • All minimum criteria plus:
  • Edge cases are covered with tests
  • Output format is stable and documented

Excellence (Going Above & Beyond):

  • Add a comparison against a second target
  • Provide a short write-up of lessons learned