Project 4: Memory Map & MMIO Field Notebook
Document MMIO register maps and compute addresses reliably.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 3 |
| Time Estimate | 10-16 hours |
| Main Programming Language | Assembly + C |
| Alternative Programming Languages | Rust |
| Coolness Level | Level 3 |
| Business Potential | Level 2 |
| Prerequisites | Datasheet reading, Concept 4: Memory Maps & Ordering |
| Key Topics | MMIO semantics, atomic set/clear, address calculation |
1. Learning Objectives
By completing this project, you will:
- Translate ARM concepts into observable outputs you can verify.
- Explain why each toolchain or hardware step is necessary.
- Detect and fix at least one realistic failure mode.
- Communicate the result clearly in a technical review or interview.
2. All Theory Needed (Per-Concept Breakdown)
Memory Maps, MMIO & Ordering
Fundamentals
ARM systems expose peripherals through memory-mapped I/O (MMIO): reading or writing specific addresses triggers hardware behavior rather than normal memory access. This is central to microcontrollers and still vital on A-profile SoCs. The memory map defines which address ranges are RAM, flash, peripherals, and internal control regions. Memory ordering adds another layer: modern CPUs can reorder memory accesses for performance, so barriers (DMB/DSB/ISB) are required to guarantee visibility and ordering to devices or other cores. Understanding MMIO and ordering is the key to controlling hardware reliably.
Deep Dive
A memory map is a contract between the CPU and the SoC. Addresses are not abstract: they correspond to real hardware blocks. In Cortex-M systems, large fixed ranges map to flash, SRAM, peripherals, and internal control registers. These ranges determine what happens when you load or store. For example, a store to a GPIO register flips a pin; a load from a UART data register consumes a byte from a FIFO. MMIO behaves differently from RAM: it is often non-cacheable, may have side effects on read, and is frequently write-only or read-only. When you treat it like ordinary memory, bugs emerge: stale values, missing updates, or unintended state changes.
Memory ordering complicates this further. ARM cores, like most modern CPUs, can reorder memory operations to improve performance. This is invisible in single-threaded logic but catastrophic for devices and multi-core coordination. If you write a command buffer to memory and then write a “doorbell” MMIO register that tells the device to consume it, the device might see the doorbell first unless you insert a barrier. ARM provides three barrier instructions (DMB, DSB, ISB), each with distinct strength. DMB ensures prior memory accesses are observed before subsequent ones; DSB additionally holds back every later instruction until those accesses have completed; ISB flushes the instruction pipeline to make control-register changes visible. These are not optional: they are the difference between “mostly works” and “always correct.”
On microcontrollers, you may not have caches or complex reorder buffers, but the bus fabric and peripheral interactions still require ordering. On A-profile systems with caches, speculation, and out-of-order execution, the need is even greater. DMA engines read and write memory independently of the CPU; if you don’t synchronize caches or enforce ordering, the DMA sees stale or partial data. This is why firmware often combines barriers with explicit cache maintenance. The principle is simple: your mental model must include the device, the bus, and the CPU pipeline, not just the instruction sequence.
MMIO access patterns also introduce concurrency hazards. Read-modify-write sequences can race with interrupts or other cores. Hardware often provides SET/CLEAR registers specifically to avoid these races by allowing atomic bit operations. If you ignore these and perform a naive read-modify-write, you can silently clear unrelated bits. The safest approach is to understand the register semantics and use the atomic registers provided. That is not assembly-specific, but assembly exposes the pattern directly and makes it obvious.
How this fits in the projects
- Core to P04 (Memory Map & MMIO Field Notebook) and P09 (Memory Ordering Litmus Tests).
Definitions & key terms
- Memory map: The assignment of address ranges to RAM, flash, and peripherals.
- MMIO: Memory addresses that control hardware rather than store data.
- DMB/DSB/ISB: Memory barrier instructions for ordering and visibility.
Mental model diagram
Cortex-M Memory Map (4GB address space):
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
0xFFFFFFFF ┌─────────────────────────────────────────┐
│ Vendor-Specific │
0xE0100000 ├─────────────────────────────────────────┤
│ Private Peripheral Bus │ ← System Control Space (SysTick,
│ (Internal peripherals) │ NVIC, SCB) starts at 0xE000E000
0xE0000000 ├─────────────────────────────────────────┤
│ │
│ External Device │ ← Memory-mapped
│ (Peripherals, etc.) │ devices
│ │
0xA0000000 ├─────────────────────────────────────────┤
│ │
│ External RAM │
│ │
0x60000000 ├─────────────────────────────────────────┤
│ │
│ Peripheral │ ← GPIO, UART, SPI,
│ (On-chip I/O) │ I2C, PWM, etc.
│ │
0x40000000 ├─────────────────────────────────────────┤
│ │
│ SRAM │ ← Variables, stack,
│ (On-chip RAM) │ heap
│ │
0x20000000 ├─────────────────────────────────────────┤
│ │
│ Code │ ← Flash/ROM with
│ (Flash/ROM) │ your program
│ │
0x00000000 └─────────────────────────────────────────┘
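The fixed ranges in the diagram above lend themselves to a small lookup routine. The following is a minimal sketch, assuming the region boundaries exactly as drawn; the enum, table, and function names are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Region labels mirror the diagram; the identifiers are illustrative. */
typedef enum {
    REGION_CODE,
    REGION_SRAM,
    REGION_PERIPHERAL,
    REGION_EXTERNAL_RAM,
    REGION_EXTERNAL_DEVICE,
    REGION_PPB,
    REGION_VENDOR
} cm_region_t;

/* Region base addresses, highest first, so the first match wins. */
static const struct {
    uint32_t    base;
    cm_region_t region;
} k_cm_map[] = {
    { 0xE0100000u, REGION_VENDOR },
    { 0xE0000000u, REGION_PPB },
    { 0xA0000000u, REGION_EXTERNAL_DEVICE },
    { 0x60000000u, REGION_EXTERNAL_RAM },
    { 0x40000000u, REGION_PERIPHERAL },
    { 0x20000000u, REGION_SRAM },
    { 0x00000000u, REGION_CODE },
};

/* Return the region whose base is the highest one not above addr. */
cm_region_t cm_classify(uint32_t addr)
{
    for (size_t i = 0; i < sizeof k_cm_map / sizeof k_cm_map[0]; i++) {
        if (addr >= k_cm_map[i].base) {
            return k_cm_map[i].region;
        }
    }
    /* Not reachable: the last table entry has base 0. */
    assert(0 && "memory map table must end with base 0");
    return REGION_CODE;
}
```

With this table, `cm_classify(0xE000E000u)` yields `REGION_PPB` and `cm_classify(0x40000000u)` yields `REGION_PERIPHERAL`. A notebook tool could use the same idea to sanity-check that a register's computed address lands in the region its datasheet claims.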
RP2040-Specific Memory Map:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Address │ Size │ Contents
────────────────┼────────────┼─────────────────────────────────────
0x10000000 │ 2 MB │ External Flash (XIP)
│ │ ↳ Your code runs from here
────────────────┼────────────┼─────────────────────────────────────
0x20000000 │ 256 KB │ Main SRAM (4 banks × 64KB)
│ │ ↳ Variables, stack, heap
0x20040000 │ 4 KB │ SRAM4 (non-striped bank, e.g. per-core stack)
0x20041000 │ 4 KB │ SRAM5 (non-striped bank, e.g. per-core stack)
────────────────┼────────────┼─────────────────────────────────────
0x40000000 │ - │ APB Peripherals
│ │ ↳ UART, SPI, I2C, PWM...
────────────────┼────────────┼─────────────────────────────────────
0x50000000 │ - │ AHB-Lite Peripherals
│ │ ↳ DMA, USB, PIO...
────────────────┼────────────┼─────────────────────────────────────
0xD0000000 │ - │ SIO (Single-cycle I/O)
│ │ ↳ GPIO (fast access!)
────────────────┼────────────┼─────────────────────────────────────
0xE0000000 │ - │ Cortex-M0+ internal
│ │ ↳ NVIC, SysTick, Debug

Memory-Mapped I/O Concept:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Normal Memory: Peripheral Register:
────────────── ────────────────────
LDR r0, [addr] LDR r0, [UART_DATA]
│ │
▼ ▼
Read from RAM Read TRIGGERS HARDWARE!
Data was sitting there Byte removed from RX FIFO
Memory unchanged Status flags updated
STR r0, [addr] STR r0, [GPIO_OUT]
│ │
▼ ▼
Write to RAM Write CAUSES ACTION!
Data now stored there Pin voltage changes
Can read it back May not read same value back
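The read-side-effect column above can be imitated on a host machine. The following is a small software model, not real driver code: the FIFO depth, type, and function names are all hypothetical, and the model only illustrates that a "read" of the data register removes a byte and changes the status.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UART_MODEL_FIFO_DEPTH 4u   /* hypothetical depth for the demo */

typedef struct {
    uint8_t fifo[UART_MODEL_FIFO_DEPTH];
    size_t  head;    /* index of the oldest queued byte */
    size_t  count;   /* number of bytes currently queued */
} uart_model_t;

/* Models the wire side: queue a received byte, or report overrun. */
int uart_model_receive(uart_model_t *u, uint8_t byte)
{
    if (u->count == UART_MODEL_FIFO_DEPTH) {
        return -1;   /* FIFO full: the byte is dropped */
    }
    u->fifo[(u->head + u->count) % UART_MODEL_FIFO_DEPTH] = byte;
    u->count++;
    return 0;
}

/* Models the status flag; its value depends on earlier data reads. */
int uart_model_rx_ready(const uart_model_t *u)
{
    return u->count > 0;
}

/* Models a load from the data register: returns the oldest byte AND
 * removes it, so reading twice gives two different answers. */
uint8_t uart_model_read_data(uart_model_t *u)
{
    assert(u->count > 0);   /* caller must check rx_ready first */
    uint8_t byte = u->fifo[u->head];
    u->head = (u->head + 1u) % UART_MODEL_FIFO_DEPTH;
    u->count--;
    return byte;
}
```

After queuing `'A'` and `'B'`, two consecutive calls to `uart_model_read_data` return `'A'` then `'B'`, and `uart_model_rx_ready` drops to 0. This is why a debugger memory window pointed at a real UART data register can silently eat received bytes.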
Example: GPIO Control on RP2040:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
SIO Base: 0xD0000000
Offset │ Register │ Purpose
────────┼────────────────┼──────────────────────────────────
0x000 │ CPUID │ Processor ID (read-only)
0x004 │ GPIO_IN │ Read current GPIO input state
0x010 │ GPIO_OUT │ Read/write GPIO output state
0x014 │ GPIO_OUT_SET │ Set bits in GPIO_OUT (write-only)
0x018 │ GPIO_OUT_CLR │ Clear bits in GPIO_OUT (write-only)
0x01C │ GPIO_OUT_XOR │ Toggle bits in GPIO_OUT (write-only)
0x020 │ GPIO_OE │ Output enable (1=output, 0=input)
0x024 │ GPIO_OE_SET │ Set bits in GPIO_OE
0x028 │ GPIO_OE_CLR │ Clear bits in GPIO_OE
To turn ON GPIO25 (Pico's LED):
─────────────────────────────────────────────────────────────────
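// Assumes GPIO25 is already routed to SIO via its function select
LDR r0, =0xD0000000 // SIO base address
MOVS r1, #1 // Cortex-M0+ (Thumb) needs the flag-setting form
LSLS r1, r1, #25 // r1 = 0x02000000 (bit 25)
STR r1, [r0, #0x024] // GPIO_OE_SET: enable output
STR r1, [r0, #0x014] // GPIO_OUT_SET: set high → LED ON!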
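The same sequence can be sketched in C. This is an illustration under assumptions: the macro and function names are hypothetical (they are not Pico SDK identifiers), and the register block is passed in as a pointer so the routine can be exercised against an ordinary array on a host. On real hardware the pin must also be routed to SIO through its function-select register, which this sketch does not cover.

```c
#include <assert.h>
#include <stdint.h>

#define SIO_BASE         0xD0000000u   /* from the table above */
#define GPIO_OUT_SET_OFF 0x014u
#define GPIO_OE_SET_OFF  0x024u

/* Bit mask for one pin; bank 0 on the RP2040 has GPIO0..GPIO29. */
uint32_t gpio_mask(unsigned pin)
{
    assert(pin < 30u);
    return UINT32_C(1) << pin;
}

/* Absolute address of a SIO register: base + offset. */
uint32_t sio_reg_addr(uint32_t offset)
{
    return SIO_BASE + offset;
}

/* Enable the output driver, then drive the pin high.
 * `sio` points at the start of the SIO register block; on the target
 * that would be (volatile uint32_t *)SIO_BASE. Offsets are in bytes,
 * so they are divided by 4 to index 32-bit words. */
void led_on(volatile uint32_t *sio, unsigned pin)
{
    uint32_t mask = gpio_mask(pin);
    sio[GPIO_OE_SET_OFF / 4u]  = mask;   /* GPIO_OE_SET  */
    sio[GPIO_OUT_SET_OFF / 4u] = mask;   /* GPIO_OUT_SET */
}
```

Here `gpio_mask(25)` evaluates to `0x02000000` and `sio_reg_addr(GPIO_OUT_SET_OFF)` to `0xD0000014`, matching the values in the assembly listing. The `volatile` qualifier matters on the target: it stops the compiler from merging or dropping the two stores.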
Why SET/CLR registers instead of just GPIO_OUT?
─────────────────────────────────────────────────────────────────
Without SET/CLR (DANGEROUS):
┌────────────────────────────────────────────────────────────────┐
│ LDR r1, [r0, #GPIO_OUT] // Read current value │
│ LDR r2, =(1<<25) // Mask (M0+ has no ORR-immediate) │
│ ORRS r1, r1, r2 // Set bit 25 │
│ STR r1, [r0, #GPIO_OUT] // Write back │
│ │
│ PROBLEM: If another core or interrupt modifies GPIO_OUT │
│ between the LDR and STR, those changes are LOST! │
│ This is a classic "read-modify-write race condition." │
└────────────────────────────────────────────────────────────────┘
With SET/CLR (ATOMIC and SAFE):
┌────────────────────────────────────────────────────────────────┐
│ LDR r1, =(1<<25) // Mask is too wide for a MOV immediate │
│ STR r1, [r0, #GPIO_OUT_SET] // Hardware atomically sets bit │
│ │
│ Other bits are UNAFFECTED - hardware handles it! │
└────────────────────────────────────────────────────────────────┘
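The lost update described in the boxes above can be reproduced deterministically with a software model. This is a sketch only: the model type and the scenario functions are hypothetical, and the "interrupt" is simulated by calling it at the exact point between the read and the write-back.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t out;   /* models the GPIO_OUT register contents */
} gpio_model_t;

/* Models a store to GPIO_OUT_SET: hardware ORs the mask in. */
void gpio_model_write_set(gpio_model_t *g, uint32_t mask)
{
    g->out |= mask;
}

/* Models a store to GPIO_OUT: every bit is replaced. */
void gpio_model_write_out(gpio_model_t *g, uint32_t value)
{
    g->out = value;
}

/* Main code uses read-modify-write; an interrupt sets irq_mask
 * between the read and the write-back. Returns final GPIO_OUT. */
uint32_t scenario_rmw(uint32_t main_mask, uint32_t irq_mask)
{
    assert((main_mask & irq_mask) == 0u);        /* demo uses disjoint bits */
    gpio_model_t g = { 0u };
    uint32_t snapshot = g.out;                   /* LDR                   */
    gpio_model_write_set(&g, irq_mask);          /* interrupt fires here  */
    gpio_model_write_out(&g, snapshot | main_mask); /* ORRS + STR         */
    return g.out;
}

/* Same interleaving, but main code uses the SET register instead. */
uint32_t scenario_set(uint32_t main_mask, uint32_t irq_mask)
{
    assert((main_mask & irq_mask) == 0u);
    gpio_model_t g = { 0u };
    gpio_model_write_set(&g, irq_mask);          /* interrupt */
    gpio_model_write_set(&g, main_mask);         /* main code */
    return g.out;
}
```

With `main_mask = 1u << 25` and `irq_mask = 1u << 3`, `scenario_rmw` ends with only bit 25 set (the interrupt's bit 3 is overwritten by the stale snapshot), while `scenario_set` ends with both bits set.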

Why Memory Barriers Are Needed:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Modern CPUs reorder memory accesses for performance. This is usually
invisible to single-threaded code, but becomes critical when:
1. Communicating with peripherals (they have side effects!)
2. Multi-core systems (other cores see different ordering)
3. DMA operations (hardware sees memory, not caches)
Example WITHOUT barrier (BROKEN):
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
You write: CPU might execute as:
────────────────────── ────────────────────────────────
mailbox_buffer[0] = cmd mailbox_write = buffer_addr ← FIRST!
mailbox_buffer[1] = arg mailbox_buffer[0] = cmd ← TOO LATE
mailbox_write = buffer_addr mailbox_buffer[1] = arg
The peripheral reads garbage because the buffer wasn't filled yet!
ARM Memory Barrier Instructions:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
DMB (Data Memory Barrier)
├── Ensures all previous memory accesses complete before
│ subsequent memory accesses begin
├── Does NOT affect instruction execution order
└── Use between: data writes and peripheral write
DSB (Data Synchronization Barrier)
├── Like DMB, but also blocks every later instruction until all
│ previous memory accesses have completed (stronger than DMB)
└── Use before: peripheral access that must be visible
ISB (Instruction Synchronization Barrier)
├── Flushes the instruction pipeline
├── Ensures previous context changes take effect
└── Use after: changing system registers, enabling MMU
Correct Pattern for Peripheral Access:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
// Fill mailbox buffer
str w1, [x0] // Write data to buffer
str w2, [x0, #4] // Write more data
dsb sy // ← BARRIER: Complete all writes
str w3, [x4] // Now write to mailbox register
// Hardware now sees complete buffer

How it works (step-by-step, with invariants and failure modes)
- Identify which addresses are MMIO and which are normal memory.
- Use atomic SET/CLEAR registers when available to avoid races.
- Insert barriers before device “doorbell” writes to guarantee ordering.
- Failure mode: devices read partial buffers, interrupts race, or GPIO bits flip incorrectly.
Minimal concrete example (pseudo, not runnable)
WRITE buffer
BARRIER
WRITE device_register
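The three pseudo-steps above might look like this in C. This is a sketch under stated assumptions: `IO_BARRIER` and `ring_doorbell` are hypothetical names, the barrier is an inline `dsb sy` when built with GCC or Clang for an Arm target that has the instruction (Armv6-M, Armv7, or later), and a C11 fence stands in otherwise so the code still builds on a host. The buffer's device-visible address is passed as a separate integer because a host pointer need not fit in 32 bits.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption of this sketch: pick a barrier by target. */
#if (defined(__arm__) || defined(__aarch64__)) && defined(__GNUC__)
#define IO_BARRIER() __asm__ volatile("dsb sy" ::: "memory")
#else
#define IO_BARRIER() atomic_thread_fence(memory_order_seq_cst)
#endif

/* Fill a two-word command buffer, then tell the device about it.
 * The barrier keeps the doorbell store from passing the buffer stores. */
void ring_doorbell(volatile uint32_t *buffer,
                   volatile uint32_t *doorbell,
                   uint32_t cmd,
                   uint32_t arg,
                   uint32_t buffer_addr)
{
    assert(buffer != NULL && doorbell != NULL);

    buffer[0] = cmd;           /* WRITE buffer          */
    buffer[1] = arg;

    IO_BARRIER();              /* BARRIER               */

    *doorbell = buffer_addr;   /* WRITE device_register */
}
```

On a host this only demonstrates the store order in program text; the reordering hazard itself shows up on real hardware with a device or second core observing memory. On systems with data caches, cache maintenance may be needed in addition to the barrier, as the Deep Dive notes.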
Common misconceptions
- “MMIO behaves like RAM” → Reads and writes can trigger side effects.
- “Ordering is always preserved” → CPUs and buses can reorder operations.
Check-your-understanding questions
- Why can reading a UART data register change system state?
- When do you need a DSB instead of a DMB?
- Why are SET/CLEAR registers safer than read-modify-write?
Check-your-understanding answers
- MMIO reads can pop FIFO entries or clear flags, which changes hardware state.
- When prior memory accesses must actually complete, not merely be ordered, before any later instruction executes (for example, before entering a sleep state with WFI).
- They avoid races because the hardware performs the atomic bit update.
Real-world applications
- GPIO control, DMA setup, and peripheral initialization in firmware.
Where you’ll apply it
- This project: see §3.1 and §5.4 in P04-mmio-memory-map-notebook.md
- P04 Memory Map & MMIO Field Notebook
- P09 Memory Ordering Litmus Tests
References
- Arm ACLE barrier intrinsics and semantics.
Key insights
MMIO and ordering are the difference between “works once” and “always correct.”
Summary
Memory maps define what addresses mean; barriers define when writes become real.
Homework/Exercises to practice the concept
- Describe a race condition caused by a read-modify-write on GPIO.
- Sketch an ordering bug where a peripheral sees stale data.
Solutions to the homework/exercises
- Another core sets a different bit between your read and write; your write erases it.
- You signal the device before writing the buffer; it reads garbage.
3. Project Specification
3.1 What You Will Build
A structured notebook that maps peripheral registers to addresses and behaviors.
3.2 Functional Requirements
- Requirement 1: Compute addresses from base + offset
- Requirement 2: Track read/write/clear-on-read semantics
- Requirement 3: Provide a lookup interface for at least 3 peripherals
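Requirement 1 is a single addition, but a notebook tool should reject inputs that cannot be a valid register address. The following is a minimal sketch with a hypothetical function name; the word-alignment rule is an assumption that fits the 32-bit registers used in this guide and would need relaxing for peripherals with byte-wide registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Compute base + offset in a 32-bit address space.
 * Returns false (leaving *out untouched) when the sum would wrap past
 * 0xFFFFFFFF or when the offset is not a multiple of 4. */
bool mmio_address(uint32_t base, uint32_t offset, uint32_t *out)
{
    assert(out != NULL);

    if (offset > UINT32_MAX - base) {
        return false;              /* would overflow 32 bits */
    }
    if ((offset & 3u) != 0u) {
        return false;              /* assumption: word-aligned registers */
    }
    *out = base + offset;
    return true;
}
```

For example, `mmio_address(0xD0000000u, 0x014u, &a)` succeeds with `a == 0xD0000014`, while a base of `0xFFFFFFF0` with offset `0x020` is rejected because the sum does not fit in 32 bits.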
3.3 Non-Functional Requirements
- Data must be consistent and validated
3.4 Example Usage / Output
$ mmio-notebook lookup GPIO_OUT_SET
Base: 0xD0000000
Offset: 0x014
Address: 0xD0000014
Access: write-only
$ mmio-notebook lookup UNKNOWN
error: register not found
exit code: 3
3.5 Data Formats / Schemas / Protocols
- Register entry: name, base, offset, access, notes
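One possible C shape for this entry, together with a lookup and a formatter that reproduces the layout in §3.4, is sketched below. The type names, function names, and the `notes` strings are hypothetical; the register values come from the SIO table earlier in this guide. A CLI wrapper would map a `NULL` lookup result to the "register not found" error and exit code 3.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef enum {
    ACCESS_RO,   /* read-only      */
    ACCESS_WO,   /* write-only     */
    ACCESS_RW,   /* read-write     */
    ACCESS_RC    /* clear-on-read  */
} access_t;

typedef struct {
    const char *name;
    uint32_t    base;
    uint32_t    offset;
    access_t    access;
    const char *notes;
} reg_entry_t;

/* Sample data taken from the SIO table; notes are illustrative. */
static const reg_entry_t k_regs[] = {
    { "GPIO_IN",      0xD0000000u, 0x004u, ACCESS_RO, "current input state" },
    { "GPIO_OUT",     0xD0000000u, 0x010u, ACCESS_RW, "output state" },
    { "GPIO_OUT_SET", 0xD0000000u, 0x014u, ACCESS_WO, "atomic bit set" },
    { "GPIO_OUT_CLR", 0xD0000000u, 0x018u, ACCESS_WO, "atomic bit clear" },
};

static const char *access_name(access_t a)
{
    switch (a) {
    case ACCESS_RO: return "read-only";
    case ACCESS_WO: return "write-only";
    case ACCESS_RW: return "read-write";
    case ACCESS_RC: return "clear-on-read";
    default:        return "unknown";
    }
}

/* Linear search by exact name; returns NULL when not found. */
const reg_entry_t *reg_lookup(const char *name)
{
    for (size_t i = 0; i < sizeof k_regs / sizeof k_regs[0]; i++) {
        if (strcmp(k_regs[i].name, name) == 0) {
            return &k_regs[i];
        }
    }
    return NULL;
}

/* Render one entry in the four-line layout of §3.4.
 * Returns what snprintf returns (length that was or would be written). */
int reg_format(const reg_entry_t *e, char *buf, size_t len)
{
    assert(e != NULL && buf != NULL);
    return snprintf(buf, len,
                    "Base: 0x%08" PRIX32 "\n"
                    "Offset: 0x%03" PRIX32 "\n"
                    "Address: 0x%08" PRIX32 "\n"
                    "Access: %s\n",
                    e->base, e->offset,
                    (uint32_t)(e->base + e->offset),
                    access_name(e->access));
}
```

Keeping `base` and `offset` as separate fields, rather than storing only the final address, lets the tool show its arithmetic and lets a validator catch entries whose offset was mistyped.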
3.6 Edge Cases
- Duplicate names
- Conflicting offsets
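Both edge cases can be caught with a pairwise scan over the table before any lookup is served. The sketch below uses hypothetical names and interprets "conflicting offsets" as two differently named entries resolving to the same address; that interpretation is an assumption, since some hardware deliberately places a read-only and a write-only register at one address, and a real tool might need an allow-list for such pairs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    const char *name;
    uint32_t    base;
    uint32_t    offset;
} reg_key_t;

typedef enum {
    REG_OK = 0,
    REG_DUP_NAME,        /* two entries share a name            */
    REG_ADDR_CONFLICT    /* two names resolve to one address    */
} reg_check_t;

/* Compare every pair once; O(n^2) is fine for notebook-sized tables.
 * Returns the first problem found, or REG_OK. */
reg_check_t reg_validate(const reg_key_t *regs, size_t n)
{
    assert(regs != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (strcmp(regs[i].name, regs[j].name) == 0) {
                return REG_DUP_NAME;
            }
            if (regs[i].base + regs[i].offset ==
                regs[j].base + regs[j].offset) {
                return REG_ADDR_CONFLICT;
            }
        }
    }
    return REG_OK;
}
```

Running this check at load time, rather than at lookup time, supports the non-functional requirement that data be consistent and validated.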
3.7 Real World Outcome
This is the golden reference for success:
- You can explain why a write toggles a pin and a read clears a flag.
3.7.1 How to Run (Copy/Paste)
- Build: follow the toolchain steps defined in this guide
- Run: use the CLI examples in §3.4 with fixed inputs
- Expected directory: project root
3.7.2 Golden Path Demo (Deterministic)
Run with a fixed input set and confirm output matches §3.4 exactly.
3.7.3 If CLI: Exact Terminal Transcript
$ mmio-notebook lookup GPIO_OUT_SET
Base: 0xD0000000
Offset: 0x014
Address: 0xD0000014
Access: write-only
$ mmio-notebook lookup UNKNOWN
error: register not found
exit code: 3
4. Solution Architecture
4.1 High-Level Design
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Input Layer │───▶│ Core Logic │───▶│ Output Layer │
└──────────────┘ └──────────────┘ └──────────────┘
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Input Parser | Validate and normalize input | Strict error handling |
| Core Engine | Perform the main computation | Deterministic paths |
| Reporter | Produce user-facing output | Stable formatting |
4.3 Data Structures (No Full Code)
Register Entry {
name: string
base: address (32-bit)
offset: integer
access: read-only | write-only | read-write | clear-on-read
notes: text
}
4.4 Algorithm Overview
Key Algorithm: Core Flow
- Parse input and validate parameters.
- Execute the core transformation or analysis.
- Emit deterministic output or error summary.
Complexity Analysis:
- Time: O(n) in the size of input records
- Space: O(n) for stored mappings and logs
5. Implementation Guide
5.1 Development Environment Setup
# Install toolchain and verify versions
toolchain --version
5.2 Project Structure
project-root/
├── src/
│ ├── core
│ └── io
├── tests/
│ └── fixtures
├── docs/
└── README.md
5.3 The Core Question You’re Answering
“Document MMIO register maps and compute addresses reliably.”
5.4 Concepts You Must Understand First
Stop and research these before coding:
- Memory Maps, MMIO & Ordering
- What is the key invariant you must preserve?
5.5 Questions to Guide Your Design
- Data Flow
- How does input become output?
- Which steps must be deterministic?
- Validation
- What is the simplest test that proves correctness?
- How will you detect regressions?
5.6 Thinking Exercise
Trace the Critical Path
Write a step-by-step trace of the most important workflow in this project.
Questions to answer:
- Where could a subtle bug hide?
- What would you log to prove correctness?
5.7 The Interview Questions They’ll Ask
- “What is the core invariant this project relies on?”
- “How would you debug a failure in this workflow?”
- “What trade-offs did you make in design?”
- “How does this map to real hardware or toolchains?”
- “How do you prove your output is correct?”
5.8 Hints in Layers
Hint 1: Start small
Focus on the smallest input that still demonstrates the concept.
Hint 2: Make output deterministic
Fix inputs and produce stable logs before expanding functionality.
Hint 3: Validate against a known reference
Compare with a known-good output or specification.
Hint 4: Add instrumentation
Log internal steps so you can verify each phase explicitly.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Core concept | “ARM Assembly Language” by William Hohl | Ch. 3-5 |
| Binary formats | “Linkers and Loaders” by John R. Levine | Ch. 1-3 |
5.10 Implementation Phases
Phase 1: Foundation (2-4 hours)
Goals:
- Establish a minimal working pipeline
- Validate one end-to-end path
Tasks:
- Build the smallest viable input and output
- Verify outputs against a reference
Checkpoint: Output matches expected golden path
Phase 2: Core Functionality (4-8 hours)
Goals:
- Implement main logic and validation
- Add structured error handling
Tasks:
- Implement the core transformation
- Add deterministic reporting
Checkpoint: Core tests pass reliably
Phase 3: Polish & Edge Cases (2-4 hours)
Goals:
- Cover edge cases
- Improve output clarity
Tasks:
- Add negative tests
- Document limitations
Checkpoint: All edge cases handled gracefully
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Input format | Free-form vs structured | Structured | Easier validation |
| Output format | Human vs machine | Both | Supports verification and tooling |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | Validate core logic | Field parsing, bounds checks |
| Integration Tests | Validate full flow | End-to-end CLI runs |
| Edge Case Tests | Validate boundaries | Empty input, invalid flags |
6.2 Critical Test Cases
- Golden path: Fixed input produces known output.
- Invalid input: Error path triggers correct exit code.
- Boundary case: Maximum supported value handled correctly.
6.3 Test Data
Input: fixed seed or fixed fixture
Expected: exact output text from §3.4
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Misaligned assumptions | Unexpected output | Re-check invariants |
| Missing validation | Silent failures | Add explicit checks |
| Non-determinism | Flaky output | Fix inputs and seeds |
7.2 Debugging Strategies
- Trace everything: Log each step with stable ordering
- Compare against reference: Use known-good outputs
7.3 Performance Traps
- Avoid repeated parsing of the same input; cache results when possible
8. Extensions & Challenges
8.1 Beginner Extensions
- Add one extra output format
- Add a help screen with examples
8.2 Intermediate Extensions
- Add a verification mode that compares two outputs
- Add structured JSON output
8.3 Advanced Extensions
- Add a batch mode for large inputs
- Add cross-target comparisons (M vs A profile)
9. Real-World Connections
9.1 Industry Applications
- Firmware bring-up: use the same checks to validate early boot images
- Security audits: analyze binaries for ABI or control-flow correctness
9.2 Related Open Source Projects
- binutils: source of many ARM tooling workflows
- QEMU: emulator used for ARM testing
9.3 Interview Relevance
- Explains why ARM behavior differs across profiles
- Demonstrates toolchain literacy and debugging rigor
10. Resources
10.1 Essential Reading
- “ARM Assembly Language” by William Hohl - practical instruction usage
- “Linkers and Loaders” by John R. Levine - binary layout
10.2 Video Resources
- ARM architecture overview talks and lectures
10.3 Tools & Documentation
- GNU binutils documentation
- Arm developer documentation
10.4 Related Projects in This Series
- This project connects with: P01-toolchain-pipeline-explorer.md, P02-register-stack-visualizer.md, P03-thumb-encoder-decoder.md
11. Self-Assessment Checklist
11.1 Understanding
- I can explain the core concept without notes
- I can explain why my design choices were necessary
- I can describe one realistic failure mode
11.2 Implementation
- All functional requirements are met
- Tests pass deterministically
- Edge cases are documented
11.3 Growth
- I can describe what I would improve next time
- I can explain this project in an interview
12. Submission / Completion Criteria
Minimum Viable Completion:
- Core functionality works on reference inputs
- Deterministic golden path is documented
- At least one failure path is demonstrated
Full Completion:
- All minimum criteria plus:
- Edge cases are covered with tests
- Output format is stable and documented
Excellence (Going Above & Beyond):
- Add a comparison against a second target
- Provide a short write-up of lessons learned