Learn RP2350 LCD Development: From Zero to Embedded Graphics Master
Goal: Build deep, practical mastery of embedded graphics and real-time systems by programming the RP2350 1.47-inch LCD board end-to-end. You will understand the RP2350 dual-ISA architecture, boot flow, memory map, clocking, GPIO pad control, SPI display protocols, DMA/PIO pipelines, multicore scheduling, and storage/USB I/O. You will be able to design and debug high-performance graphics pipelines, write bare-metal drivers, and build interactive applications on constrained hardware. By the end, you can reason about every byte, cycle, and bus transaction that moves a pixel from your C code onto the screen.
Introduction
RP2350 LCD development is the practice of building real-time, resource-constrained graphics systems on Raspberry Pi’s dual-ISA microcontroller (RP2350) paired with a 1.47-inch SPI LCD. You learn how processors boot, how memory and buses are organized, how pixels are encoded, and how DMA/PIO pipelines eliminate CPU overhead while keeping animations smooth. The Waveshare RP2350-LCD-1.47-A board combines the RP2350 with a 172x320, 262K-color ST7789-based LCD, onboard RGB LED, QSPI flash, and a TF (microSD) slot, making it a compact lab for embedded graphics and I/O.
What you will build (by the end of this guide):
- A complete SPI driver and ST7789 LCD initialization stack
- A software graphics engine (primitives, sprites, fonts, compositing)
- A DMA-driven display pipeline with double buffering
- A dual-core rendering system that hits smooth frame rates
- A bare-metal display driver without any SDK
- A USB HID device and a mini cooperative OS with a task manager UI
Scope (what’s included):
- RP2350 silicon architecture, memory map, and boot flow
- SPI display programming with ST7789 command sequences
- DMA/PIO pipelines, multicore scheduling, and real-time timing
- Storage and USB fundamentals for embedded peripherals
Out of scope (for this guide):
- Full RTOS integration (FreeRTOS/Zephyr) beyond concept-level
- Complex 3D graphics or GPU-based rendering
- Hardware board design or PCB layout
The Big Picture (Mental Model)
┌─────────────────────────────────────────────┐
│ YOUR APPLICATION │
│ game loop, UI logic, sensors, input, etc. │
└────────────────────────────┬────────────────┘
│
v
┌────────────────────────────────────────────────────────────────────────┐
│ SOFTWARE GRAPHICS PIPELINE (CPU) │
│ draw -> rasterize -> blend -> write framebuffer (RGB565/RGB666) │
└────────────────────────────┬───────────────────────────────────────────┘
│
v
┌────────────────────────────────────────────────────────────────────────┐
│ DMA + SPI + PIO DISPLAY PIPELINE │
│ DMA pulls framebuffer -> SPI FIFO -> ST7789 RAM -> LCD pixels │
└────────────────────────────┬───────────────────────────────────────────┘
│
v
┌────────────────────────────────────────────────────────────────────────┐
│ ST7789 DISPLAY CONTROLLER │
│ address window + pixel format + refresh = visible pixels │
└────────────────────────────────────────────────────────────────────────┘
Key Terms You Will See Everywhere
- XIP: Execute-In-Place, running code directly from QSPI flash.
- PIO: Programmable IO, tiny deterministic state machines for IO.
- DREQ: DMA request signal used for pacing peripheral transfers.
- MADCTL: ST7789 memory access control (rotation/mirror).
- RGB565/RGB666: 16-bit/18-bit pixel formats used by LCD.
How to Use This Guide
- Read the Theory Primer first. It is a mini-book that gives you the mental models needed to make sense of the projects.
- Complete the projects in order if you are new to embedded graphics. Each project adds a new concept and a new performance layer.
- For each project, use the Thinking Exercise and Design Questions before coding. These prevent false starts.
- If you get stuck, use the Hints in Layers section. Each hint reveals slightly more without spoiling the entire solution.
- Track mastery using the Definition of Done checklists.
Prerequisites & Background Knowledge
Essential Prerequisites (Must Have)
Programming Skills:
- C fundamentals: pointers, structs, arrays, and bitwise operations
- Ability to read and write simple Make/CMake-based projects
- Comfort with serial logs and basic debugging
Embedded Fundamentals:
- GPIO basics (input/output, pull-ups, pin mux)
- Basic SPI knowledge (clock, MOSI, CS, data framing)
- Interrupts and timers at a conceptual level
Recommended Reading: “Making Embedded Systems, 2nd Edition” by Elecia White - Ch. 1-4
Computer Architecture Basics:
- Memory mapped IO
- The difference between RAM and Flash
Recommended Reading: “Code: The Hidden Language of Computer Hardware and Software” by Charles Petzold - Ch. 13-16
Helpful But Not Required
Advanced Topics:
- DMA configuration and circular buffers (learn in Project 3)
- PIO assembly and state machines (learn in Project 4)
- Multicore synchronization and lock-free queues (learn in Project 5)
- USB descriptors (learn in Project 12)
Self-Assessment Questions
- Can you read a peripheral register value and explain what each bit does?
- Can you explain SPI timing (CPOL/CPHA) and why it matters?
- Have you ever written to a memory-mapped register in C?
- Do you know how to use a logic analyzer or oscilloscope to verify a signal?
- Can you explain the difference between blocking and DMA-driven IO?
Development Environment Setup
Required Tools:
- A Raspberry Pi RP2350 LCD 1.47” board (Waveshare RP2350-LCD-1.47-A or equivalent)
- A USB-C cable (data-capable)
- A build machine (Linux/macOS/Windows) with CMake + GCC
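- Pico SDK 2.0 or later (the first release with RP2350 support) and a toolchain (arm-none-eabi-gcc or LLVM for the Arm cores; a RISC-V GCC toolchain if you also want to build for Hazard3)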
Recommended Tools:
- Logic analyzer (Saleae or cheap 8-channel analyzer)
- SWD debugger (Raspberry Pi Debug Probe or ST-Link)
- Python 3.11+ (for asset conversion tools)
Testing Your Setup:
$ arm-none-eabi-gcc --version
arm-none-eabi-gcc (GNU Arm Embedded Toolchain) 12.x
$ cmake --version
cmake version 3.22+
Time Investment
- Simple projects (1, 2): Weekend (4-8 hours each)
- Moderate projects (3, 4, 6, 8, 9): 1-2 weeks each
- Complex projects (5, 7, 10, 11, 12): 2-4 weeks each
- Final project (13): 1-2 months
Important Reality Check
Embedded graphics is deceptively deep. Your first version will work, your second will be fast, and your third will be clean. Expect to iterate. The real learning happens when you profile, optimize, and debug invisible timing bugs. This guide is structured so that each project teaches a new performance or correctness constraint.
Big Picture / Mental Model
┌────────────────────────────────────────────────────────────────────────┐
│ INPUT SOURCES │
│ GPIO buttons, USB HID, SD card, sensors │
└────────────────────────────┬───────────────────────────────────────────┘
│
v
┌────────────────────────────────────────────────────────────────────────┐
│ APPLICATION LOGIC │
│ game loop, UI state, animation, menus │
└────────────────────────────┬───────────────────────────────────────────┘
│
v
┌────────────────────────────────────────────────────────────────────────┐
│ GRAPHICS / RENDERING CORE │
│ primitives -> sprites -> fonts -> composition -> framebuffer │
└────────────────────────────┬───────────────────────────────────────────┘
│
v
┌────────────────────────────────────────────────────────────────────────┐
│ DATA MOVEMENT & IO ENGINES │
│ DMA + SPI + PIO + interrupts + timers │
└────────────────────────────┬───────────────────────────────────────────┘
│
v
┌────────────────────────────────────────────────────────────────────────┐
│ ST7789 LCD CONTROLLER │
│ address windows, pixel format, RAM write │
└────────────────────────────────────────────────────────────────────────┘
Theory Primer (Read This Before Coding)
Chapter 1: RP2350 Dual-ISA Architecture, Boot Flow, and Security Domains
Fundamentals
The RP2350 is unusual because it can boot either a pair of Arm Cortex-M33 cores or a pair of Hazard3 RISC-V cores. That means the same chip lets you run two different instruction sets, which is an ideal way to learn how ISA choices affect performance, debugging, and toolchains. The boot ROM and bootstrapping logic decide which core pair to enable at startup. This is not just a gimmick: it affects compiler selection, debug tools, and even how you read disassembly and instruction traces in a debugger. The RP2350 also includes security features such as TrustZone support for Arm, OTP memory, and hardware crypto primitives. Understanding secure vs non-secure execution matters because it changes which peripherals and memory regions are accessible. These architectural choices ripple through everything you build in this guide: from how you link code to how you manage memory protection and concurrency.
Deep Dive into the Concept
A dual-ISA microcontroller exposes you to the boundaries between software and silicon. The RP2350 contains two Cortex-M33 cores and two Hazard3 cores, but only one pair is active at a time. This means the boot ROM configures a processor subsystem at boot, then releases the chosen cores from reset. On Arm, the Cortex-M33 brings TrustZone (secure and non-secure states), optional floating-point, and a well-known debug ecosystem (SWD, ETM). On Hazard3, you get a compact 3-stage RV32 core with optional extensions (M, A, C, and bit manipulation sets). That changes code density, interrupt latency, and performance characteristics. For example, compressed instructions in RISC-V reduce flash bandwidth pressure, while ARM’s hardware FPU can accelerate graphics math. A dual-ISA environment is a practical way to compare pipeline depth, ISA encoding, and toolchain maturity in real tasks like drawing sprites or running benchmarks.
Boot flow matters because embedded systems typically start executing from ROM, then configure clocks, copy data into RAM, and finally jump to the main application. If you bypass the SDK, you must set up the vector table, stack pointers, and memory sections yourself. Any mistake shows up as a silent boot failure. On RP2350, the boot ROM is also involved in the USB mass-storage (UF2) boot process, which is why boards appear as a drag-and-drop drive in BOOTSEL mode. Secure boot and OTP features matter in production: you can lock down which code can boot, or store keys that control firmware validation. Even if you are not building a secure product, these features change how you think about trust boundaries, update strategies, and debug access. You must also internalize that peripherals can be assigned to security domains, which influences what a non-secure application can access.
Dual-ISA also means you will maintain two separate build configurations and sometimes two assembly dialects. That is not just busywork: it forces you to understand what your compiler outputs, and how calling conventions, alignment rules, and memory barriers differ. In later projects, you will use these differences to profile graphics pipelines and build a fair performance comparison between ARM and RISC-V. You will also see practical differences in DSP and floating-point support: on Cortex-M33, many math operations map to hardware instructions; on Hazard3, they may be emulated or rely on software libraries. For graphics and signal processing, this can be the difference between a smooth animation and a stuttering one.
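One small consequence of keeping two build configurations is that shared source files sometimes need to know which ISA they were compiled for. The following is a minimal sketch, assuming a GCC or Clang toolchain that predefines `__riscv` or `__arm__`; the helper names `build_isa_name` and `build_is_embedded` are hypothetical and not part of any SDK.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Report the ISA this translation unit was compiled for, based on
 * compiler-predefined macros. Hypothetical helper, not an SDK function. */
static const char *build_isa_name(void) {
#if defined(__riscv)
    return "riscv";   /* Hazard3 build */
#elif defined(__arm__) || defined(__thumb__)
    return "arm";     /* Cortex-M33 build */
#else
    return "host";    /* e.g. unit tests compiled on a PC */
#endif
}

/* True when the code was built for one of the two embedded ISAs. */
static int build_is_embedded(void) {
    const char *name = build_isa_name();
    assert(name != NULL);
    return strcmp(name, "host") != 0;
}
```

A benchmark harness could print `build_isa_name()` next to each timing result so that ARM and RISC-V logs are never confused.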
Security is not optional in modern devices. TrustZone gives you two worlds: secure and non-secure. A secure boot process can set up memory regions and then hand off to non-secure code. If you later build USB devices or handle external data (SD card images), separating the parsing logic from privileged firmware becomes a best practice. Even if you do not fully implement this in your projects, understanding the concept will improve your reasoning about firmware upgrades and trust boundaries.
Toolchains are part of the architecture story. ARM uses a mature ecosystem with widespread debugger support, while RISC-V tools are newer and may differ in debug visibility or optimization behavior. In practice, you will notice differences in startup code, linker scripts, and debug probes. This forces you to understand the lower-level details of your firmware, which is precisely the kind of mastery you want from this guide.
How This Fits in Projects
You will use this in Projects 1, 5, 7, 10, and 13 where boot flow, ISA selection, and security domains directly impact startup code and performance.
Definitions & Key Terms
- ISA: Instruction Set Architecture, the software-visible contract for a CPU.
- TrustZone: Arm security extension that separates secure/non-secure state.
- Boot ROM: Read-only boot code that initializes the chip and loads firmware.
- OTP: One-time programmable memory used for keys and device identity.
- PMP: Physical Memory Protection (RISC-V mechanism for isolation).
Mental Model Diagram
          BOOT ROM
             │
             v
    ┌─────────────────┐
    │ ISA SELECT MUX  │
    └────────┬────────┘
             │
     ┌───────┴────────────┐
     │                    │
     v                    v
ARM Cortex-M33      Hazard3 RISC-V
     │                    │
     v                    v
Secure/Non-secure   PMP + Privilege
How It Works (Step-by-Step)
- Power-on reset releases boot ROM.
- Boot ROM reads boot mode and ISA select configuration.
- Boot ROM initializes minimal clocks and memory.
- Chosen core pair is released from reset.
- Vector table and stack pointers are configured.
- Application code begins execution.
- Optional security configuration partitions memory/peripherals.
Minimal Concrete Example
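// Pseudocode for minimal startup flow (simplified)
void reset_handler(void) {
    init_stack_pointers();  // on Cortex-M the initial SP is loaded from the vector table by hardware
    init_clocks();          // bring up the oscillator and PLLs
    copy_data_to_ram();     // copy initialized .data from flash into SRAM
    zero_bss();             // clear zero-initialized globals
    main();                 // hand over to the application
}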
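The two memory-initialization steps in the pseudocode above can be sketched as plain C loops. This is a simplified illustration, not the SDK's startup code: the function names are hypothetical, and on real hardware the pointers would come from linker-script symbols rather than from ordinary arrays.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy initialized data from its load address (flash) to its run address
 * (SRAM), one 32-bit word at a time. */
static void copy_data_section(const uint32_t *load, uint32_t *start, uint32_t *end) {
    assert(start <= end);
    while (start < end) {
        *start++ = *load++;
    }
}

/* Clear the zero-initialized region so C globals start at 0. */
static void zero_bss_section(uint32_t *start, uint32_t *end) {
    assert(start <= end);
    while (start < end) {
        *start++ = 0u;
    }
}
```

Because the loops only take pointers, they can be exercised on a PC with ordinary arrays before they ever run on the board.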
Common Misconceptions
- “Both ARM and RISC-V run simultaneously.” (They do not.)
- “Boot ROM can be overwritten.” (It is read-only.)
- “TrustZone is only for big CPUs.” (It also matters on microcontrollers.)
Check-Your-Understanding Questions
- What does the boot ROM do before your code runs?
- Why does the ISA choice affect debugging tools?
- What is the difference between secure and non-secure execution?
Check-Your-Understanding Answers
- It initializes minimal clocks/memory and releases the chosen cores.
- Because ARM uses SWD/ETM tooling while RISC-V uses different debug modules and tooling.
- Secure execution can access protected memory/peripherals; non-secure cannot.
Real-World Applications
- Secure firmware update pipelines
- Products that support both ARM and RISC-V toolchains
- Teaching environments for ISA comparison
Where You Will Apply It
- Project 7 (ARM vs RISC-V benchmark)
- Project 10 (bare-metal startup code)
- Project 13 (mini OS and context switching)
References
- RP2350 Datasheet (processor subsystem)
- Raspberry Pi silicon documentation
- Hazard3 RISC-V core documentation
Key Insight: The RP2350 is a living lab for understanding how ISA, boot flow, and security boundaries shape real firmware.
Summary
Dual-ISA means you control the CPU identity at boot. That affects toolchains, startup code, and security configuration. It is the foundation for everything else you build on this board.
Homework/Exercises
- Draw a boot flow diagram for ARM vs RISC-V startup.
- Compare two compiled binaries (ARM vs RISC-V) for the same function and count instructions.
- Map which peripherals should be secure vs non-secure in a hypothetical product.
Solutions
- The flow is identical until the ISA select step, then diverges by toolchain and startup vector table.
- ARM often emits fewer instructions for floating-point, while RISC-V benefits from compressed encoding.
- Keep USB and storage parsing in non-secure; keep boot validation in secure.
Chapter 2: Memory Map, XIP, SRAM Banking, and Bus Fabric
Fundamentals
Embedded performance is mostly memory performance. On RP2350, code typically runs from external QSPI flash via XIP (Execute-In-Place), while data lives in on-chip SRAM. The memory map defines where ROM, XIP, SRAM, and peripheral registers live. SRAM is split into multiple banks, some striped for bandwidth. The bus fabric (AHB/APB) and crossbar determine how cores, DMA, and peripherals contend for memory. Understanding addresses and bus segments lets you place high-bandwidth buffers in the right memory bank, avoid contention, and explain why a DMA transfer stalls your CPU. This is not just an optimization: graphics rendering becomes impossible if you misplace your framebuffer or block the bus with poorly timed transfers.
Deep Dive into the Concept
The RP2350 memory map is layered: ROM at low addresses, XIP flash mapped at 0x10000000, SRAM at 0x20000000, APB peripherals around 0x40000000, and AHB peripherals around 0x50000000. The RP2350 datasheet specifies base addresses for SRAM, DMA, PIO, and USB control, which you will use when writing bare-metal code. SRAM is not a single block: it is divided into multiple banks, and some are striped for throughput. This means sequential addresses might alternate between banks, allowing parallel access and higher throughput. If you place a framebuffer in striped SRAM, both cores and DMA can fetch data more efficiently, reducing display tearing or stutter.
XIP is critical: it allows the CPU to fetch instructions directly from QSPI flash without copying them into RAM. This simplifies boot and conserves RAM, but it also means flash latency is in your critical path. When you draw pixels or update UI state, you want time-critical code in RAM to reduce flash wait states. A common strategy is to place your hottest loops (pixel blits, DMA setup) into SRAM using linker sections. You can do this by using GCC attributes or linker script sections and then verifying the map file.
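A hedged sketch of the linker-section approach follows. The section name `.time_critical.fill_span` is an assumption: it only has an effect if your linker script copies matching sections into SRAM (the Pico SDK's default scripts do this for `.time_critical*`, and the SDK also provides a `__not_in_flash_func` wrapper for the same purpose). `RAM_FUNC` and `fill_span16` are hypothetical names. On a host build the macro expands to nothing, so the same file compiles for unit tests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* On the target, request placement in a RAM-resident section.
 * Assumed section name; it must match what your linker script copies to SRAM. */
#if defined(__arm__) || defined(__riscv)
#define RAM_FUNC __attribute__((noinline, section(".time_critical.fill_span")))
#else
#define RAM_FUNC
#endif

/* Hot inner loop: fill a horizontal span of 16-bit pixels with one color. */
RAM_FUNC void fill_span16(uint16_t *dst, size_t count, uint16_t color) {
    assert(dst != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        dst[i] = color;
    }
}
```

After building for the board, search the map file for `fill_span16` and confirm its address starts with 0x2 (SRAM) rather than 0x1 (XIP flash).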
The bus fabric determines which agents can access which regions. APB is for low-bandwidth peripherals and often has longer wait states, while AHB is for higher bandwidth devices like DMA and USB. If you are streaming pixels to SPI using DMA, you are competing with CPU fetches on the same crossbar. A poor placement of buffers can cause bus contention and reduce frame rate. The RP2350 also includes SIO (single-cycle IO) for core-local fast registers (GPIO, FIFOs). This means toggling a GPIO pin can be faster when done via SIO, which is important for timing-critical signals or for debugging with a logic analyzer.
Memory mapping also affects security. Secure/non-secure partitions or PMP regions can restrict memory and peripheral access, which is important for any project that touches USB or external storage. You should understand not only addresses but also access permissions and fetch restrictions (e.g., peripherals are not executable). A memory map is not a map of bytes, it is a map of capabilities. When you build a bare-metal project, your linker script defines where code and data live. If you place the vector table or .data in the wrong region, the system will crash immediately. This is why memory map mastery is not optional.
Finally, SRAM banking influences DMA behavior. If the DMA controller and the CPU access the same bank, you will see stalls. If you interleave buffers across striped banks, you can often increase effective bandwidth. Many experienced embedded engineers treat SRAM layout like a performance tool: the layout is part of the algorithm.
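To reason about which bank an address lands in, a small helper can model the striping. The layout below is an assumption based on one reading of the RP2350 datasheet (SRAM0-3 word-striped from 0x20000000, SRAM4-7 word-striped from 0x20040000, SRAM8 and SRAM9 as two 4 KB non-striped banks from 0x20080000); verify the ranges against the address map before relying on them. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Return the SRAM bank (0..9) an address maps to, or -1 if it is outside SRAM.
 * ASSUMED layout: word striping selects the bank from address bits [3:2]. */
static int sram_bank_of(uint32_t addr) {
    if (addr >= 0x20000000u && addr < 0x20040000u) return (int)((addr >> 2) & 3u);
    if (addr >= 0x20040000u && addr < 0x20080000u) return 4 + (int)((addr >> 2) & 3u);
    if (addr >= 0x20080000u && addr < 0x20081000u) return 8;
    if (addr >= 0x20081000u && addr < 0x20082000u) return 9;
    return -1;
}

/* Quick self-check: consecutive words in a striped region rotate through banks. */
static void sram_bank_selfcheck(void) {
    assert(sram_bank_of(0x20000000u) == 0);
    assert(sram_bank_of(0x20000004u) == 1);
    assert(sram_bank_of(0x20000010u) == 0);
}
```

Under this model a DMA channel streaming a framebuffer from a striped region touches each bank only every fourth word, which leaves the other banks free for the CPU most of the time.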
How This Fits in Projects
Projects 3, 5, 7, 8, 9, and 10 depend on careful placement of buffers and correct register addresses.
Definitions & Key Terms
- XIP: Execute-In-Place flash mapping.
- APB/AHB: Bus types for peripheral access with different bandwidth.
- Striped SRAM: Memory interleaved across banks for throughput.
- SIO: Core-local fast IO region.
- Bus contention: Multiple agents competing for the same bus resources.
Mental Model Diagram
Address Space
0x00000000 ROM (boot)
0x10000000 XIP flash (code)
0x20000000 SRAM (data, buffers)
0x40000000 APB peripherals (low bandwidth)
0x50000000 AHB peripherals (DMA, PIO, USB)
0xD0000000 SIO (core-local fast IO)
How It Works (Step-by-Step)
- Boot ROM runs from ROM and jumps to flash (XIP).
- Code fetches instructions from XIP.
- Data accesses go to SRAM or peripherals.
- DMA reads from SRAM and writes to peripheral FIFOs.
- Bus fabric arbitrates between CPU and DMA.
Minimal Concrete Example
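#include <stdint.h>

#define DMA_BASE  0x50000000u  // DMA controller registers
#define PIO0_BASE 0x50200000u  // PIO block 0 registers
#define SRAM_BASE 0x20000000u  // start of on-chip SRAM
// Keep volatile in the cast so the compiler never caches or reorders register accesses
volatile uint32_t *dma = (volatile uint32_t *)(DMA_BASE + 0x000);
// Fixed address for illustration only; in a real project let the linker place the buffer
volatile uint16_t *framebuffer = (volatile uint16_t *)(SRAM_BASE + 0x0000);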
Common Misconceptions
- “All SRAM is equal.” (Banking and striping matter.)
- “XIP is as fast as RAM.” (Flash has wait states.)
- “Peripherals are executable.” (They are not.)
Check-Your-Understanding Questions
- Why might a framebuffer in SRAM8/9 behave differently than one in the striped banks (SRAM0-7)?
- What is the benefit of XIP, and what is its cost?
- Why do DMA transfers sometimes slow your CPU?
Check-Your-Understanding Answers
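- SRAM8/9 are small, non-striped banks, so sequential accesses all hit one bank instead of being spread across several, and they contend more easily with another bus master.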
- XIP saves RAM and boot time but adds flash latency.
- DMA shares the bus and can create contention.
Real-World Applications
- High-performance display pipelines
- Deterministic real-time control loops
- Secure partitioning of firmware and data
Where You Will Apply It
- Project 3 (DMA display)
- Project 10 (bare-metal register map)
- Project 13 (mini OS memory layout)
References
- RP2350 Datasheet, address map section
Key Insight: Performance bottlenecks on microcontrollers are often memory bottlenecks. Place data intentionally.
Summary
The RP2350 memory map defines not just addresses but performance and access rules. Learn it early and you will debug 10x faster.
Homework/Exercises
- Draw a memory map with the base addresses you will use in Projects 10 and 12.
- Benchmark a loop in XIP vs SRAM and measure the cycle difference.
- Place one buffer in striped SRAM and one in non-striped SRAM, then compare DMA throughput.
Solutions
- Use the datasheet base addresses for SRAM, DMA, and PIO.
- SRAM loops should show fewer wait states and higher throughput.
- The striped buffer should produce higher sustained throughput.
Chapter 3: Clocking, Reset, Power Domains, and GPIO/Pad Control
Fundamentals
Clocks are the heartbeat of embedded systems. The RP2350 has multiple clock sources (crystal oscillator, ring oscillator, PLLs), and each peripheral is clocked separately. Reset logic and power domains control which blocks are active and when. GPIOs are not just pins; they are configurable pads with drive strength, pull-up/down, and function selection. If your display flickers or your SPI bus glitches, the cause is often a misconfigured pad or a peripheral clock set too high. Understanding the clock tree and IO pad configuration makes your system stable, fast, and power-efficient.
Deep Dive into the Concept
The RP2350 includes multiple clock generators (PLLs) that derive system and peripheral clocks from a crystal or internal oscillator. The system clock feeds the CPU cores and bus fabric, while peripheral clocks can be derived or divided independently. This allows you to run the CPU at 150 MHz while keeping the SPI clock within the LCD’s spec. Power domains let you gate off unused blocks to reduce power consumption and noise. In practice, you will use reset and clock control registers to enable a peripheral, then configure its clock divider and source. Skipping this step is the most common reason peripherals appear “dead.”
GPIO pads are programmable. Each pin has a function select (e.g., GPIO, SPI, PIO), and each pad has configuration bits for pull-ups, pull-downs, hysteresis, slew rate, and drive strength. SPI signal integrity depends on proper drive strength and slew. The LCD’s CS or DC lines require clean transitions, or commands are misread. When you enable PWM for the backlight, the pad configuration also matters: a weak drive can produce a slow edge and visible flicker.
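The pad configuration bits can be modeled as a small bitfield builder. The bit positions below are assumptions taken from the pad control register layout as commonly documented for this chip family (slew fast in bit 0, Schmitt in bit 1, pull-down in bit 2, pull-up in bit 3, drive in bits 5:4, input enable in bit 6); check them against the RP2350 datasheet before writing the value to hardware. `pad_config` and the `PAD_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ASSUMED pad control bit layout; verify against the datasheet. */
#define PAD_SLEWFAST  (1u << 0)
#define PAD_SCHMITT   (1u << 1)
#define PAD_PDE       (1u << 2)   /* pull-down enable */
#define PAD_PUE       (1u << 3)   /* pull-up enable */
#define PAD_DRIVE_LSB 4           /* 2-bit drive field: 0=2mA, 1=4mA, 2=8mA, 3=12mA */
#define PAD_IE        (1u << 6)   /* input enable */

/* Build a pad control value from individual electrical options. */
static uint32_t pad_config(unsigned drive, bool pull_up, bool pull_down,
                           bool schmitt, bool slew_fast, bool input_enable) {
    assert(drive <= 3u);
    uint32_t v = (uint32_t)drive << PAD_DRIVE_LSB;
    if (pull_up)      v |= PAD_PUE;
    if (pull_down)    v |= PAD_PDE;
    if (schmitt)      v |= PAD_SCHMITT;
    if (slew_fast)    v |= PAD_SLEWFAST;
    if (input_enable) v |= PAD_IE;
    return v;
}
```

For an SPI clock output you might start with a moderate drive and slow slew, then raise one setting at a time while watching the edge on a scope.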
Reset is not just about starting the chip; it’s also about cleaning up between experiments. Many boards have a BOOTSEL mode where the boot ROM presents a USB mass-storage device. This mode always works because the BOOTSEL bootloader lives in ROM and cannot be erased. It is your safety net when you misconfigure clocks or break your firmware. Good embedded developers learn to recover quickly from bad configurations and design their firmware so it can always be reflashed.
Clocking also affects DMA and PIO timing. If you set the system clock high but leave a peripheral clock low, you may see underflow errors in SPI or slower DMA transfers. If you set the SPI clock too high, the ST7789 may accept commands but corrupt data, which looks like random pixel noise. The correct approach is to read the LCD datasheet, configure SPI accordingly, and then scale clocks for performance only after functional correctness.
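Choosing an SPI clock that stays within the LCD's limit comes down to divider arithmetic. The sketch below assumes the SPI block follows the Arm PL022 scheme, where the bit rate is the peripheral clock divided by an even prescaler (2 to 254) times a second divider (1 to 256). The struct and function names are hypothetical, and the 40 MHz limit in the usage note is a placeholder, not a value from the ST7789 datasheet.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t prescale;  /* even prescaler, 2..254 */
    uint32_t postdiv;   /* second divider, 1..256 */
    uint32_t baud;      /* resulting SCK frequency in Hz (rounded down) */
} spi_div_t;

/* Pick the smallest total divider whose output does not exceed max_baud_hz.
 * If no divider is slow enough, the slowest setting is returned. */
static spi_div_t spi_pick_divider(uint32_t clk_hz, uint32_t max_baud_hz) {
    assert(clk_hz > 0u && max_baud_hz > 0u);
    spi_div_t best = {254u, 256u, clk_hz / (254u * 256u)};
    uint32_t best_div = 254u * 256u;
    for (uint32_t pre = 2u; pre <= 254u; pre += 2u) {
        for (uint32_t post = 1u; post <= 256u; post++) {
            uint32_t div = pre * post;
            /* clk / div <= max  is equivalent to  div * max >= clk */
            if ((uint64_t)div * max_baud_hz >= clk_hz && div < best_div) {
                best_div = div;
                best = (spi_div_t){pre, post, clk_hz / div};
            }
        }
    }
    return best;
}
```

With a 150 MHz peripheral clock and a 40 MHz ceiling, the search settles on a total divider of 4, giving 37.5 MHz. The gap between the requested and actual rate is worth logging during bring-up.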
Another subtlety is clock domain crossing. Some peripherals run on separate clocks, and signals crossing domains can introduce metastability if not handled correctly. While the hardware handles most of this for you, you will see the effects in timing jitter or unstable UART outputs if clocks are misconfigured. In real-time systems, consistent timing is more important than maximum speed. That is why many embedded engineers stabilize the system first, then increase clock rates gradually while measuring signal integrity.
How This Fits in Projects
Projects 1-4 depend on correct clocking and pad setup. Projects 8-9 also rely on power domain control.
Definitions & Key Terms
- PLL: Phase-locked loop for generating high-frequency clocks.
- Pad control: Electrical configuration of a GPIO pin.
- BOOTSEL: Boot mode for UF2 mass storage programming.
- Power domain: A block that can be turned on/off independently.
- Clock divider: A register that divides a clock to a lower frequency.
Mental Model Diagram
Crystal/Ring OSC -> PLLs -> System Clock -> CPU/BUS
                                 │
                                 ├-> SPI Clock Divider -> SPI
                                 ├-> PWM Clock Divider -> Backlight
                                 └-> PIO Clock Divider -> PIO SM
How It Works (Step-by-Step)
- Enable oscillator or external crystal.
- Configure PLLs to generate system and USB clocks.
- Set clock dividers for SPI, PWM, PIO.
- Release resets for peripherals.
- Configure pin mux and pad controls.
Minimal Concrete Example
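// Sketch using Pico SDK calls; GPIO10 and 10 MHz are placeholders, check your board's pinout and the LCD's limits
#include "hardware/gpio.h"
#include "hardware/spi.h"
void lcd_spi_bringup(void) {
    spi_init(spi0, 10 * 1000 * 1000);                      // takes SPI0 out of reset and sets the baud rate
    gpio_set_function(10, GPIO_FUNC_SPI);                  // route the pad to the SPI peripheral
    gpio_set_drive_strength(10, GPIO_DRIVE_STRENGTH_4MA);  // pad drive strength
}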
Common Misconceptions
- “If code compiles, clocks must be correct.” (Clocks are runtime configuration.)
- “GPIO is always push-pull by default.” (Pads may default to high-impedance.)
- “Faster clocks are always better.” (Signal integrity and power can suffer.)
Check-Your-Understanding Questions
- Why do you need to enable the SPI clock before using SPI?
- How does drive strength affect signal integrity?
- Why is BOOTSEL mode a safety mechanism?
Check-Your-Understanding Answers
- Without a running clock (and while the block is still held in reset), the peripheral's registers and shift logic do nothing, so it will not respond.
- Too weak or too strong drive can cause slow edges or ringing.
- BOOTSEL uses ROM code and cannot be erased, enabling recovery.
Real-World Applications
- Low-power wearable devices
- Reliable high-speed SPI sensors
- Robust firmware recovery strategies
Where You Will Apply It
- Project 1 (SPI LCD bring-up)
- Project 4 (PWM and LED control)
- Project 10 (bare-metal clock setup)
References
- RP2350 Datasheet (clocks/resets/pads)
- Raspberry Pi Pico documentation for BOOTSEL/UF2
Key Insight: Most “mystery bugs” in embedded graphics are clock or pad configuration bugs.
Summary
Clocking and pad configuration define stability, performance, and power. Learn them once and you will debug faster forever.
Homework/Exercises
- Draw a clock tree for your board with all peripheral dividers.
- Experiment with SPI clock speed and measure corruption thresholds.
- Change pad drive strength and observe signal shape on a scope.
Solutions
- Use the system clock as root and add branch dividers for peripherals.
- You will see pixel corruption beyond the LCD’s max SPI clock.
- Too weak drive yields slow edges; too strong yields ringing.
Chapter 4: SPI and the ST7789 Display Controller
Fundamentals
The LCD is controlled by an ST7789 display controller over SPI. SPI is a synchronous serial protocol with a clock and data line plus control signals like chip select (CS) and data/command (DC). The ST7789 expects a specific initialization sequence and a stream of pixel data. You must send commands to set column/row address windows, configure pixel format (RGB565 or RGB666), and then stream pixel data with a memory write command. If you get the command order or timing wrong, the display will stay blank or show corrupted colors.
Deep Dive into the Concept
The ST7789 is not just a display; it is a full controller with internal RAM, address counters, and a programmable refresh engine. SPI is used to write commands and data, which are latched into the controller and then mapped into the display’s frame memory. The sequence matters: typically you assert CS, set DC low to send a command byte, then set DC high to send data bytes. Commands like CASET (column address set) and RASET (row address set) configure the window in RAM where subsequent pixel data will land. The RAMWR command starts a memory write; the following bytes are interpreted as pixel data. The pixel format is configured with the COLMOD command (0x3A), which selects RGB565 or RGB666 packing. The ST7789 datasheet also defines memory access control (MADCTL, 0x36) for orientation, row/column order, and RGB/BGR ordering. This is critical for rotation and for matching the LCD’s physical wiring.
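The MADCTL byte is a set of independent flag bits, so rotation can be expressed as a small lookup. The bit values follow the ST7789 command description (MY, MX, MV, and the RGB/BGR bit); which combination corresponds to which physical orientation depends on how the panel is mounted, so treat the table below as a starting point rather than this board's confirmed mapping. `madctl_for_rotation` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MADCTL_MY   0x80u  /* row address order */
#define MADCTL_MX   0x40u  /* column address order */
#define MADCTL_MV   0x20u  /* row/column exchange */
#define MADCTL_BGR  0x08u  /* set for BGR panels, clear for RGB */

/* Build the MADCTL data byte for 0, 1, 2, or 3 quarter turns.
 * The turn-to-bits mapping is a common convention, assumed here. */
static uint8_t madctl_for_rotation(unsigned quarter_turns, bool bgr) {
    static const uint8_t table[4] = {
        0x00u,
        MADCTL_MX | MADCTL_MV,
        MADCTL_MX | MADCTL_MY,
        MADCTL_MY | MADCTL_MV,
    };
    assert(quarter_turns < 4u);
    return (uint8_t)(table[quarter_turns] | (bgr ? MADCTL_BGR : 0u));
}
```

The result is the single data byte that follows command 0x36. When MV is set, width and height swap, so the driver's coordinate limits and any panel offset need to swap with it.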
SPI timing matters. The LCD has a maximum clock frequency; exceeding it can produce flicker or random color corruption. The ST7789 can accept both 3-line and 4-line serial protocols; most boards use 4-line SPI with a separate DC pin. The LCD’s 172x320 resolution is not the default 240x320 used in many ST7789 panels, so you must configure the correct address window and offsets. Many LCD modules have hidden pixels or offsets; you will discover these by experimenting with the window ranges and observing where pixels land.
The ST7789’s memory is larger than the visible screen, and the controller maps a subset to the visible pixels. That is why you might see a (0,0) pixel not actually appear in the top-left corner unless you apply an offset. This offset is board-specific and should be documented in the board’s wiki or sample drivers. Once you know it, you can abstract it into your display driver. The ST7789 also supports partial updates: you can set an address window to a small rectangle and update only that region. This is a powerful optimization for UI rendering, reducing SPI bandwidth and CPU load. It is the foundation for fast text rendering and small animations.
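Applying the offset once, inside the driver, keeps the rest of the code in plain screen coordinates. The sketch below builds the four data bytes that follow CASET or RASET. The X offset of 34 is an assumption, computed as (240 - 172) / 2 for a panel centered in the controller's 240-column RAM; confirm it against the board's documentation or sample driver. The names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PANEL_WIDTH     172u
#define PANEL_HEIGHT    320u
#define PANEL_X_OFFSET  34u  /* ASSUMED: (240 - 172) / 2, verify for your board */
#define PANEL_Y_OFFSET  0u   /* ASSUMED */

/* Fill out[4] with the start/end address payload for CASET or RASET:
 * two big-endian 16-bit values, both inclusive, with the RAM offset applied. */
static void window_payload(uint16_t first, uint16_t last, uint16_t offset,
                           uint8_t out[4]) {
    assert(first <= last);
    uint16_t start = (uint16_t)(first + offset);
    uint16_t end   = (uint16_t)(last + offset);
    out[0] = (uint8_t)(start >> 8);
    out[1] = (uint8_t)(start & 0xFFu);
    out[2] = (uint8_t)(end >> 8);
    out[3] = (uint8_t)(end & 0xFFu);
}
```

A full-screen window would call this with 0 to `PANEL_WIDTH - 1` and the X offset for CASET, and 0 to `PANEL_HEIGHT - 1` and the Y offset for RASET. A partial update passes a smaller rectangle through the same function.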
The ST7789 command set also includes sleep mode, inversion, gamma adjustment, and frame rate control. In most projects you will use a standard initialization sequence that sets power, gamma, pixel format, and display on commands. But advanced projects can tweak gamma curves to improve contrast or change frame rate settings to balance power vs smoothness. If you plan to run on battery, these settings matter.
How This Fits in Projects
Projects 1-3, 6, 8, 11 rely directly on ST7789 command sequences and SPI timing.
Definitions & Key Terms
- CASET (0x2A): Column Address Set command.
- RASET (0x2B): Row Address Set command.
- RAMWR (0x2C): Start memory write.
- COLMOD (0x3A): Pixel format selection.
- MADCTL (0x36): Memory access control (rotation and RGB/BGR).
- DC pin: Data/Command select pin on 4-wire SPI LCDs.
Mental Model Diagram
CPU -> SPI -> [ST7789 Command Parser] -> [Address Window] -> [Display RAM] -> LCD
DC=0 (cmd) DC=1 (data)
How It Works (Step-by-Step)
- Reset LCD and wait for stable power.
- Send initialization command sequence.
- Set pixel format (RGB565 or RGB666).
- Define address window with CASET/RASET.
- Send RAMWR and stream pixel data.
- Repeat for partial updates.
Minimal Concrete Example
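lcd_cmd(0x2A);                   // CASET: column range (add the panel's X offset if it has one)
lcd_data16(x0); lcd_data16(x1);  // start and end column, inclusive, high byte first
lcd_cmd(0x2B);                   // RASET: row range
lcd_data16(y0); lcd_data16(y1);  // start and end row, inclusive
lcd_cmd(0x2C);                   // RAMWR: the data bytes that follow are pixels
lcd_data_pixels(buf, count);     // count pixels in the format selected by COLMOD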
Common Misconceptions
- “SPI is just data; DC does not matter.” (DC distinguishes commands.)
- “A full-frame update is always required.” (Partial windows are faster.)
- “Color errors mean bad framebuffer.” (Often pixel format mismatch.)
Check-Your-Understanding Questions
- Why must CASET and RASET be sent before RAMWR?
- How does COLMOD affect pixel packing?
- Why might the visible image be shifted on the panel?
Check-Your-Understanding Answers
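- CASET and RASET define the address window. RAMWR starts writing at the window’s origin and fills it in order, so the window must already be set when RAMWR arrives.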
- COLMOD tells the controller how many bits per pixel are sent.
- Because the panel may use a non-zero RAM offset.
Real-World Applications
- Smartwatch and wearable UIs
- Industrial status panels
- Handheld measurement tools
Where You Will Apply It
- Project 1 (bring-up)
- Project 2 (drawing primitives)
- Project 3 (DMA pipeline)
References
- ST7789 datasheet (command set)
- Waveshare RP2350 LCD board documentation
Key Insight: The LCD is a tiny computer. Treat it like a device with its own memory and rules, not just a “screen.”
Summary
SPI display programming is about command sequences, timing, and pixel formats. Master those and you control every pixel.
Homework/Exercises
- Implement a partial update API for a 10x10 region.
- Verify with a logic analyzer that DC toggles correctly.
- Add a function that rotates the display using MADCTL.
Solutions
- Send CASET/RASET for the small region and stream only that data.
- DC should be low for command bytes and high for data bytes.
- Set the MADCTL rotation bits and update your coordinate mapping.
Chapter 5: Pixel Formats, Framebuffers, and Graphics Rendering
Fundamentals
Pixels are just bits in memory. The ST7789 can accept RGB565 (16-bit) or RGB666 (18-bit). RGB565 is the most common because it needs two bytes per pixel instead of the three that RGB666 takes on the wire, which cuts bandwidth by a third, and it fits well into 16-bit buffers. A framebuffer is a contiguous region of memory representing the pixels you want to show. Rendering is the process of turning shapes, fonts, and sprites into pixel values in that buffer. You need to understand coordinate systems, stride (bytes per row), clipping, and blending. If you want smooth animations, you must control when and how you update the framebuffer so you avoid tearing.
Deep Dive into the Concept
A framebuffer is conceptually simple but full of traps. For a 172x320 display, an RGB565 framebuffer requires 172 * 320 * 2 = 110,080 bytes (~110 KB), roughly a fifth of the RP2350’s 520 KB of on-chip SRAM. If you use double buffering, you need ~220 KB. This is possible but forces you to be careful about where the buffers live and what else shares SRAM. You can also use partial buffers (scanline or tile buffers) and update the display in chunks to save memory. This is a classic trade-off: memory vs CPU time.
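The memory trade-off can be made concrete with a few lines of arithmetic. This sketch compares a full frame, two full frames, and a pair of strip buffers. The 16-row strip height (`TILE_ROWS`) and all function names are hypothetical choices for illustration.

```c
#include <stddef.h>

/* Bytes needed for a w x h buffer at the given bytes per pixel. */
size_t fb_bytes(size_t w, size_t h, size_t bytes_per_pixel) {
    return w * h * bytes_per_pixel;
}

enum {
    LCD_W      = 172,
    LCD_H      = 320,
    RGB565_BPP = 2,
    TILE_ROWS  = 16    /* hypothetical strip height */
};

/* One full RGB565 frame. */
size_t full_frame_bytes(void)   { return fb_bytes(LCD_W, LCD_H, RGB565_BPP); }

/* Two full frames for classic double buffering. */
size_t double_frame_bytes(void) { return 2 * full_frame_bytes(); }

/* Two 16-row strips: render one while the other is being sent. */
size_t strip_pair_bytes(void)   { return 2 * fb_bytes(LCD_W, TILE_ROWS, RGB565_BPP); }
```

Under these assumptions the strip pair needs about 11 KB instead of about 220 KB, at the cost of rendering the scene in 20 passes per frame.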
Color formats encode RGB values into limited bits. RGB565 uses 5 bits red, 6 bits green, 5 bits blue. That means you must convert 8-bit RGB values by shifting and masking. If you do not align correctly, your colors will look wrong. RGB666 uses 18 bits and allows smoother gradients but requires 3 bytes per pixel, which increases bandwidth. In practice, RGB565 is the sweet spot for most microcontroller graphics. If you need high color fidelity for photos, you might still choose RGB666 and accept lower frame rates.
Rendering primitives (lines, rectangles, circles) are algorithms that touch many pixels. You should learn Bresenham’s algorithm for lines and midpoint algorithms for circles to avoid floating point. Sprites are just bitmaps with a transparent color or alpha mask. Font rendering is similar: fonts are arrays of bits or bytes that map glyphs into pixels. You will likely pre-render fonts into bitmaps and blit them into the framebuffer. The efficiency of these operations determines your frame rate.
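As an illustration of an integer-only primitive, here is a minimal sketch of Bresenham's line algorithm writing into a framebuffer. It clips per pixel, which is the simplest safe approach rather than the fastest one. The `fb_t` type and the function names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint16_t *pixels;   /* RGB565, row-major */
    int w, h;
} fb_t;

/* Write one pixel, silently dropping anything outside the buffer. */
static void fb_put(fb_t *fb, int x, int y, uint16_t color) {
    if (x < 0 || x >= fb->w || y < 0 || y >= fb->h) return;
    fb->pixels[y * fb->w + x] = color;
}

/* Integer Bresenham line from (x0,y0) to (x1,y1), both endpoints included. */
void fb_line(fb_t *fb, int x0, int y0, int x1, int y1, uint16_t color) {
    int dx = abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
    int dy = -abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
    int err = dx + dy;                 /* combined error term */
    for (;;) {
        fb_put(fb, x0, y0, color);
        if (x0 == x1 && y0 == y1) break;
        int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; }   /* step in x */
        if (e2 <= dx) { err += dx; y0 += sy; }   /* step in y */
    }
}
```

For long lines that are mostly off-screen, clipping the endpoints first avoids iterating over pixels that will be discarded anyway.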
Double buffering is critical for smooth animations. If you draw directly to the buffer while DMA is sending it to the display, you will see tearing. With double buffering, you draw into a back buffer while the front buffer is being transmitted. When DMA finishes, you swap the pointers. This requires careful synchronization and a swap protocol that is safe across cores. An alternative is dirty rectangles: track which regions changed and update only those. Dirty rectangles are complex but reduce bandwidth and can outperform full-frame double buffering on small changes.
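The simplest form of dirty-rectangle tracking keeps one bounding box that grows to cover every region drawn since the last flush. The sketch below shows that variant; real UI toolkits often keep a list of rectangles instead. All type and function names here are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct { int x0, y0, x1, y1; } rect_t;   /* inclusive corners */

typedef struct {
    rect_t box;     /* bounding box of everything drawn since the last flush */
    bool   dirty;
} dirty_tracker_t;

/* Grow the bounding box so that it also covers r. */
void dirty_add(dirty_tracker_t *t, rect_t r) {
    if (!t->dirty) { t->box = r; t->dirty = true; return; }
    if (r.x0 < t->box.x0) t->box.x0 = r.x0;
    if (r.y0 < t->box.y0) t->box.y0 = r.y0;
    if (r.x1 > t->box.x1) t->box.x1 = r.x1;
    if (r.y1 > t->box.y1) t->box.y1 = r.y1;
}

/* If anything changed, copy the box to *out, reset the tracker, return true. */
bool dirty_take(dirty_tracker_t *t, rect_t *out) {
    if (!t->dirty) return false;
    *out = t->box;
    t->dirty = false;
    return true;
}

/* Bytes that must cross the SPI bus to refresh r in RGB565. */
size_t rect_bytes_rgb565(rect_t r) {
    return (size_t)(r.x1 - r.x0 + 1) * (size_t)(r.y1 - r.y0 + 1) * 2u;
}
```

Two overlapping 10x10 sprites merge into a 15x15 box, which costs 450 bytes to send instead of 110,080 for the full frame. The weakness of a single box is that two small changes in opposite corners produce a box covering almost the whole screen.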
Clipping is another subtlety. Drawing algorithms must be clipped to the visible region or you will write outside the buffer, which can corrupt memory. You also need to understand coordinate transforms for rotation or UI layouts. The ST7789’s MADCTL can rotate the display, but your rendering coordinates must follow that rotation. A robust renderer has a clear coordinate system and consistent transforms. Advanced renderers also support alpha blending, which requires reading the destination pixel, blending with a source pixel, and writing back. This is expensive on microcontrollers, so you often use 1-bit masks or preblended sprites.
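To see why alpha blending is expensive, consider what a single blended pixel costs. This sketch unpacks both RGB565 pixels, mixes each channel with an 8-bit alpha, and repacks the result. `blend565` is a hypothetical helper, and the division by 255 is chosen for exactness at the endpoints, not for speed.

```c
#include <stdint.h>

/* Blend fg over bg in RGB565. alpha = 0 gives bg, alpha = 255 gives fg. */
uint16_t blend565(uint16_t fg, uint16_t bg, uint8_t alpha) {
    uint32_t a  = alpha;
    uint32_t ia = 255u - alpha;
    /* Unpack each channel, mix, and scale back to the channel range. */
    uint32_t r = ((uint32_t)((fg >> 11) & 0x1F) * a + (uint32_t)((bg >> 11) & 0x1F) * ia) / 255u;
    uint32_t g = ((uint32_t)((fg >> 5)  & 0x3F) * a + (uint32_t)((bg >> 5)  & 0x3F) * ia) / 255u;
    uint32_t b = ((uint32_t)(fg & 0x1F) * a + (uint32_t)(bg & 0x1F) * ia) / 255u;
    return (uint16_t)((r << 11) | (g << 5) | b);
}
```

Each blended pixel needs a framebuffer read, six multiplies, three divides, and a write, which is why 1-bit masks and preblended sprites are attractive when you have tens of thousands of pixels per frame.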
How This Fits in Projects
Projects 2, 5, 6, 8, 11 are all rendering-heavy.
Definitions & Key Terms
- Framebuffer: Memory region holding pixel data.
- Stride: Bytes per row in a framebuffer.
- RGB565: 16-bit pixel format (5-6-5 bits).
- Double buffering: Two framebuffers for tear-free updates.
- Dirty rectangle: A region that changed and needs re-rendering.
Mental Model Diagram
[Render API] -> [Rasterizer] -> [Framebuffer] -> DMA -> LCD
(lines, text, sprites)
How It Works (Step-by-Step)
- Clear framebuffer to background color.
- Render primitives and sprites into buffer.
- Send buffer to display using SPI or DMA.
- Swap buffers if double buffering is enabled.
Minimal Concrete Example
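// Pack 8-bit-per-channel RGB into RGB565 (rrrrrggg gggbbbbb). Needs <stdint.h>.
uint16_t rgb565(uint8_t r, uint8_t g, uint8_t b) {
    return (uint16_t)(((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3));
}
// Write one pixel; w is the framebuffer width in pixels.
// No bounds check here: the caller must clip x and y first.
void draw_pixel(uint16_t *fb, int x, int y, int w, uint16_t color) {
    fb[y * w + x] = color;
}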
Common Misconceptions
- “Framebuffers must be full screen.” (Tile buffers also work.)
- “Double buffering is always required.” (Partial updates can avoid tearing.)
- “RGB565 is bad quality.” (For small displays, it is excellent.)
Check-Your-Understanding Questions
- How many bytes does a 172x320 RGB565 framebuffer require?
- What is the advantage of a tile buffer?
- Why can DMA + single buffer cause tearing?
Check-Your-Understanding Answers
- 172 * 320 * 2 = 110,080 bytes.
- It reduces RAM usage while still supporting partial updates.
- Because the buffer is being updated while it is transmitted.
Real-World Applications
- UI rendering for IoT devices
- Portable data loggers with custom graphics
- Wearable interfaces
Where You Will Apply It
- Project 2 (pixel artist)
- Project 6 (font engine)
- Project 11 (game rendering)
References
- ST7789 datasheet (pixel format)
- Computer Graphics from Scratch (rasterization)
Key Insight: Graphics is memory bandwidth. Optimize your pixels, not your CPU.
Summary
Rendering is about mapping shapes and text to pixel buffers efficiently. Master it and your UI becomes smooth and responsive.
Homework/Exercises
- Implement a line-drawing algorithm with clipping.
- Build a small sprite blitter with transparency.
- Implement dirty rectangles and measure bandwidth savings.
Solutions
- Use Bresenham and clip before drawing.
- Skip pixels matching the transparent color.
- Track changed regions and update only those windows.
Chapter 6: DMA and PIO Pipelines for High-Throughput IO
Fundamentals
DMA moves data without CPU intervention. PIO is a programmable IO engine that can generate deterministic waveforms. Together, they form a pipeline where the CPU sets up transfers and the hardware streams data to peripherals. This is the key to smooth graphics: the CPU can render the next frame while DMA pushes the current frame to SPI. PIO can implement protocols or timing that SPI alone cannot. Understanding DMA channels, DREQ pacing, and FIFO behavior is essential for zero-copy, high-throughput display updates.
Deep Dive into the Concept
A DMA controller is essentially a configurable engine that reads from memory and writes to a peripheral or another memory region. On RP2350, DMA can access SRAM and peripheral registers. The typical display path is: DMA reads the framebuffer and writes to the SPI TX FIFO. The DMA channel can be paced by a DREQ signal so it only writes when the FIFO has space. This prevents overflow and allows consistent throughput. If you ignore DREQ and simply blast data, you will either stall or lose data. DMA transfers can be chained: once one transfer finishes, another automatically starts. This is useful for double buffering or for sending command sequences followed by pixel data.
PIO is complementary. It is a small state machine that manipulates GPIO pins with cycle-level determinism. PIO can implement custom SPI-like protocols, generate WS2812 LED waveforms, or even bit-bang LCD signals if needed. The advantage is determinism: PIO timing is independent of CPU jitter. Each PIO block has four state machines that share one 32-instruction memory; each state machine has its own FIFOs and shift registers. PIO can also be fed by DMA, creating a fully hardware-driven IO pipeline: CPU writes buffer -> DMA feeds PIO -> PIO toggles pins. For graphics, this means you can generate precise timing for unusual displays or backlight protocols.
A key concept is flow control. DMA transfers can starve or overflow a peripheral if not paced correctly. The SPI peripheral usually has a small FIFO. DREQ ensures the DMA engine only writes when the FIFO is ready. Similarly, PIO can request data when its TX FIFO is low. By combining DREQ with DMA, you can build a steady streaming pipeline without busy-waiting. This is how you achieve high frame rates while leaving CPU cycles free for rendering, input, or logic.
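Before tuning a pipeline, it helps to know the ceiling that the SPI clock imposes. The sketch below computes the time to clock out one frame and the resulting upper bound on full-frame updates per second. The function names are hypothetical, and the clock rates in the usage note are illustrative values, not board specifications. Command overhead and gaps between bytes are ignored.

```c
#include <stdint.h>

/* Microseconds needed to clock one frame out over SPI (rounded up). */
uint32_t frame_time_us(uint32_t width, uint32_t height,
                       uint32_t bits_per_pixel, uint32_t spi_hz) {
    uint64_t bits = (uint64_t)width * height * bits_per_pixel;
    return (uint32_t)((bits * 1000000u + spi_hz - 1) / spi_hz);
}

/* Upper bound on full-frame updates per second at that SPI clock. */
uint32_t max_fps(uint32_t width, uint32_t height,
                 uint32_t bits_per_pixel, uint32_t spi_hz) {
    return 1000000u / frame_time_us(width, height, bits_per_pixel, spi_hz);
}
```

A 172x320 RGB565 frame is 880,640 bits. At an assumed 40 MHz that takes 22,016 µs, so full-frame updates top out at 45 per second no matter how good the DMA setup is. A full-frame 60 FPS target needs at least about 52.8 MHz (880,640 * 60), which is one reason partial updates matter.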
DMA and PIO also affect memory placement. Because DMA reads from memory, you should place buffers in SRAM banks with good bandwidth and avoid contention with the CPU. If you run DMA and the CPU from the same bank, you will see stalls. If you place DMA buffers in striped SRAM, you can often improve throughput. This is why memory map knowledge matters. A disciplined DMA design also includes error handling: overrun flags, transfer-complete interrupts, and graceful recovery if a transfer is aborted.
Finally, PIO is best understood as a hardware co-processor. It has its own instruction memory and runs independently. It can be used for more than LED or SPI; it can generate video timing, decode sensors, or implement proprietary buses. When you understand how to feed it with DMA, you gain a reliable pipeline that is independent of CPU scheduling jitter.
How This Fits in Projects
Projects 3 and 4 are direct DMA/PIO exercises. Projects 5 and 11 use these pipelines for smooth graphics.
Definitions & Key Terms
- DMA: Direct Memory Access engine.
- DREQ: DMA request for pacing transfers.
- PIO: Programmable IO state machine.
- FIFO: First-in-first-out queue inside peripherals.
- Chaining: DMA feature that starts a new transfer when one finishes.
Mental Model Diagram
CPU sets DMA -> DMA streams -> SPI FIFO -> ST7789 RAM
\-> DMA streams -> PIO FIFO -> GPIO waveforms
How It Works (Step-by-Step)
- Configure DMA channel with source and destination.
- Set DREQ to SPI TX or PIO TX.
- Enable channel and let it run.
- DMA triggers IRQ on completion.
- Swap buffers or chain next transfer.
Minimal Concrete Example
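// DMA from framebuffer to the SPI0 TX FIFO (Pico SDK: hardware/dma.h, hardware/spi.h).
// Assumes SPI0 was set to 16-bit frames with spi_set_format(), so one transfer is one pixel.
int ch = dma_claim_unused_channel(true);
dma_channel_config cfg = dma_channel_get_default_config(ch);
channel_config_set_transfer_data_size(&cfg, DMA_SIZE_16); // one RGB565 pixel per transfer
channel_config_set_read_increment(&cfg, true);            // walk through the framebuffer
channel_config_set_write_increment(&cfg, false);          // always write the same FIFO register
channel_config_set_dreq(&cfg, spi_get_dreq(spi0, true));  // pace on TX FIFO space
dma_channel_configure(ch, &cfg, &spi_get_hw(spi0)->dr,    // destination: SPI data register
                      framebuffer, pixel_count, true);    // source, transfer count, start now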
Common Misconceptions
- “DMA always increases speed.” (Incorrect configuration can slow you down.)
- “PIO replaces SPI.” (PIO is for custom timing, not always needed.)
Check-Your-Understanding Questions
- Why is DREQ important in DMA transfers?
- What makes PIO deterministic?
- How do DMA and PIO interact?
Check-Your-Understanding Answers
- It paces the transfer to FIFO availability.
- PIO runs a fixed instruction sequence at a fixed clock.
- DMA can feed PIO FIFOs to automate output.
Real-World Applications
- High-speed LCD refresh
- NeoPixel/WS2812 LED drivers
- Precise sensor protocols
Where You Will Apply It
- Project 3 (DMA display)
- Project 4 (PIO LED)
- Project 5 (dual-core rendering with DMA)
References
- RP2350 Datasheet (DMA/PIO address map)
- Pico SDK hardware_pio documentation
Key Insight: DMA + PIO is how you get hardware-level throughput while the CPU does real work.
Summary
DMA moves bulk data efficiently, PIO handles deterministic timing. Together they create high-performance IO pipelines.
Homework/Exercises
- Build a DMA transfer that fills a framebuffer with a gradient.
- Write a PIO program that toggles a pin at 1 MHz and verify with a scope.
- Chain two DMA transfers: one for a command sequence, one for pixel data.
Solutions
- Use DMA with a memory-to-memory copy and a repeating pattern.
- Use PIO clock divider and a simple loop with SET instructions.
- Use a control channel that triggers the data channel on completion.
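For the PIO exercise, the clock divider can be worked out on paper or with a few lines of code. The PIO divider is a 16.8 fixed-point value (16-bit integer part, 8-bit fraction). This sketch assumes a 150 MHz system clock and a two-instruction toggle loop. The type and function names are hypothetical, so check the resulting values against the SDK helpers you actually use.

```c
#include <stdint.h>

typedef struct {
    uint16_t integer;   /* 16-bit integer part */
    uint8_t  frac;      /* 8-bit fractional part, in 1/256 steps */
} pio_clkdiv_t;

/* Divider that runs a state machine at sm_hz from a sys_hz clock.
 * Valid when 1 <= sys_hz / sm_hz < 65536. */
pio_clkdiv_t pio_clkdiv_for(uint32_t sys_hz, uint32_t sm_hz) {
    uint64_t scaled = ((uint64_t)sys_hz << 8) / sm_hz;   /* 16.8 fixed point */
    pio_clkdiv_t d = { (uint16_t)(scaled >> 8), (uint8_t)(scaled & 0xFF) };
    return d;
}

/* A two-instruction loop (set pin high, set pin low) completes one period
 * every two state-machine cycles, so the state machine must run twice as
 * fast as the desired square wave. */
uint32_t sm_hz_for_square_wave(uint32_t pin_hz) {
    return 2u * pin_hz;
}
```

For a 1 MHz square wave the state machine runs at 2 MHz, so the divider is 150 MHz / 2 MHz = 75 with no fractional part. Integer dividers are worth aiming for because fractional division introduces cycle-to-cycle jitter in the output.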
Chapter 7: Multicore, Interrupts, and Real-Time Scheduling
Fundamentals
The RP2350 gives you two active cores, which lets you separate rendering from IO or UI logic. Multicore programming is powerful but dangerous: race conditions, cache/bus contention, and synchronization mistakes can crash or corrupt your system. Interrupts are the mechanism by which peripherals notify your code, but they must be short and deterministic. Real-time scheduling is the art of deciding what runs when, so animations stay smooth and input feels responsive.
Deep Dive into the Concept
Dual-core microcontrollers are different from SMP desktop CPUs. They often share memory and peripherals but have limited cache and tighter timing constraints. On RP2350, both cores see the same memory map. You can use a FIFO or shared memory to communicate. The challenge is synchronization: if two cores write the same framebuffer, the results are undefined. The usual pattern is to dedicate one core to rendering and another to IO and display updates. You then synchronize frame swaps using atomic flags or multicore FIFO messages.
Interrupts are essential for timing. For example, you might use a timer interrupt to trigger a screen refresh or to advance animation frames. But interrupts that do too much work can cause jitter and missed deadlines. The correct pattern is to set a flag in the ISR and let the main loop do heavy work. In a multicore system, you can also route interrupts to a specific core to avoid contention. Some tasks (like USB) may require strict timing and should be isolated from rendering code.
Real-time scheduling does not always require an RTOS. You can build a cooperative scheduler that runs tasks in a fixed order and yields control regularly. In Project 13, you will build a tiny cooperative OS that switches between tasks and updates a task manager UI. This teaches you context switching, stack management, and priority scheduling. It also forces you to confront how much CPU time each subsystem consumes. Even with a cooperative scheduler, you need to track deadlines. If one task runs too long, others miss their deadlines and you see visible jitter or input lag.
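A cooperative scheduler can start out as little more than a table of function pointers. The sketch below is a run-to-completion variant in which a task yields by returning. It has no per-task stacks or context switching, so it is a simpler starting point than the design Project 13 works toward. The names (`task_t`, `sched_tick`, `count_task`) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*task_fn)(void *ctx);

typedef struct {
    const char *name;
    task_fn     run;        /* must return quickly: returning is the yield */
    void       *ctx;
    uint32_t    period;     /* run every `period` ticks */
    uint32_t    next_due;   /* tick at which the task runs next */
    uint32_t    runs;       /* statistic for a task-manager style UI */
} task_t;

/* Run every task that is due at tick `now`, in table order. */
void sched_tick(task_t *tasks, size_t n, uint32_t now) {
    for (size_t i = 0; i < n; i++) {
        task_t *t = &tasks[i];
        if ((int32_t)(now - t->next_due) >= 0) {   /* wrap-safe comparison */
            t->run(t->ctx);
            t->runs++;
            t->next_due += t->period;
        }
    }
}

/* Example task body: counts how often it was called. */
void count_task(void *ctx) {
    (*(uint32_t *)ctx)++;
}
```

Table order acts as a fixed priority: earlier entries run first within a tick. Nothing stops a task from overrunning, so timing each `run` call and showing the worst case on the LCD is a natural next step.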
Concurrency also affects DMA and PIO. If one core is configuring DMA while another is reading from the same registers, you can introduce subtle race conditions. You must define ownership: one core owns DMA setup, the other owns rendering, etc. The more explicit your ownership model, the fewer bugs you will chase at 2 AM. In practice, you will use mutexes, spinlocks, or atomic flags to coordinate access, and you will design protocols (like “only core 1 writes to SPI”).
Atomic operations and memory barriers matter even on microcontrollers. If you do not use proper barriers, one core might not see a flag update from the other. The RP2350 provides spinlocks and FIFO mechanisms for coordination. Understanding these low-level primitives will help you debug the kinds of bugs that only show up under load.
How This Fits in Projects
Projects 5, 7, 9, 11, and 13 rely on multicore and scheduling concepts.
Definitions & Key Terms
- ISR: Interrupt Service Routine.
- Cooperative scheduling: Tasks yield explicitly.
- Race condition: Two writers updating the same data without coordination.
- FIFO: A hardware queue used for inter-core messaging.
- Jitter: Variation in timing of periodic events.
Mental Model Diagram
Core 0: Render -> Framebuffer A
Core 1: DMA -> LCD
^
| swap signal
How It Works (Step-by-Step)
- Core 0 renders into back buffer.
- Core 1 starts DMA for front buffer.
- Core 0 signals when new frame is ready.
- Core 1 swaps buffer pointers after DMA complete.
Minimal Concrete Example
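// Shared flag for buffer swap (C11 atomics from <stdatomic.h>; volatile alone gives no ordering)
atomic_bool frame_ready = false;
// Core 0
while (atomic_load_explicit(&frame_ready, memory_order_acquire)) { } // previous frame not taken yet
render_frame(back_buffer);
atomic_store_explicit(&frame_ready, true, memory_order_release);     // publish pixels, then the flag
// Core 1
if (dma_done() && atomic_load_explicit(&frame_ready, memory_order_acquire)) {
    swap_buffers();
    atomic_store_explicit(&frame_ready, false, memory_order_release); // hand the old front buffer back
}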
Common Misconceptions
- “Two cores always means 2x speed.” (Bus contention can reduce speed.)
- “Interrupts are faster than polling.” (Interrupt overhead can be worse.)
- “RTOS is required for real-time.” (Cooperative scheduling can be enough.)
Check-Your-Understanding Questions
- Why should ISRs be short?
- What is the simplest safe buffer swap mechanism?
- When would you use cooperative scheduling vs an RTOS?
Check-Your-Understanding Answers
- Long ISRs introduce jitter and block other interrupts.
- A flag plus DMA-complete interrupt on a single owner core.
- Cooperative scheduling is simpler and sufficient for small systems.
Real-World Applications
- Wearable UIs with smooth animation
- Industrial control dashboards
- Low-latency input devices
Where You Will Apply It
- Project 5 (dual-core renderer)
- Project 13 (mini OS)
References
- RP2350 Datasheet (multicore SIO/FIFO)
- Operating Systems: Three Easy Pieces (scheduling)
Key Insight: Multicore is a force multiplier only if you define ownership and synchronization clearly.
Summary
Concurrency gives you power, but only when you control timing and data access. Clear ownership is the best optimization.
Homework/Exercises
- Design a buffer swap protocol with two cores and DMA.
- Implement a cooperative scheduler with 3 tasks and fixed time slices.
- Measure jitter in a timer ISR under load.
Solutions
- Use DMA completion IRQ and an atomic frame-ready flag.
- Each task runs for a short time then yields to the next.
- Log timestamps and compute deviation from expected intervals.
Chapter 8: Storage and USB I/O (QSPI, SD/FAT, USB HID)
Fundamentals
Embedded systems rarely live in isolation. You will often load assets from flash or SD card and communicate over USB. The RP2350 supports QSPI flash with XIP, and many boards include a TF (microSD) slot for FAT filesystems. The RP2350’s USB 1.1 controller can act as host or device; in device mode, the board can enumerate as a HID device. Understanding how file systems and USB descriptors work lets you build real devices instead of demos.
Deep Dive into the Concept
QSPI flash is the main storage for firmware. XIP maps flash into memory so you can execute code directly. The tradeoff is latency and bandwidth: flash access is slower than SRAM. For data-heavy tasks (fonts, images), you may want to load assets into SRAM or stream them in chunks. This affects how you design image viewers and sprite systems. Flash writes also require erase cycles, which are relatively slow and limited in lifetime. You should design firmware updates and asset caches with erase blocks in mind.
SD cards provide much larger storage but require a filesystem. FAT is common because it is simple and supported everywhere. A FAT parser must read the boot sector, FAT tables, and directory entries. You can use an existing FAT library, but you should still understand the fundamentals: cluster size, file allocation chains, and sector reads. A common performance optimization is to cache the FAT and directory sectors to reduce repeated reads. For image viewers, you often stream large files in blocks; reading sequential clusters is efficient, while random seeks are costly.
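The boot sector fields mentioned above sit at fixed byte offsets in the BIOS Parameter Block, so computing where the FAT, root directory, and data region start is mostly arithmetic. The sketch below assumes a 512-byte boot sector already read into memory. The type and function names are hypothetical. All sector numbers are relative to the start of the volume, so add the partition's starting sector if the card has a partition table.

```c
#include <stdint.h>

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

typedef struct {
    uint32_t bytes_per_sector;
    uint32_t sectors_per_cluster;
    uint32_t first_fat_sector;
    uint32_t first_root_sector;   /* fixed root directory, FAT12/16 only */
    uint32_t root_dir_sectors;    /* 0 on FAT32 */
    uint32_t first_data_sector;
} fat_layout_t;

/* Returns 0 on success, -1 if the signature or basic fields look wrong. */
int fat_parse_boot_sector(const uint8_t bs[512], fat_layout_t *out) {
    if (bs[510] != 0x55 || bs[511] != 0xAA) return -1;
    uint32_t bps      = rd16(bs + 11);   /* bytes per sector */
    uint32_t spc      = bs[13];          /* sectors per cluster */
    uint32_t reserved = rd16(bs + 14);   /* reserved sectors before first FAT */
    uint32_t num_fats = bs[16];
    uint32_t root_ent = rd16(bs + 17);   /* root entries, 0 on FAT32 */
    uint32_t fat_size = rd16(bs + 22);   /* sectors per FAT (FAT12/16) */
    if (fat_size == 0) fat_size = rd32(bs + 36);   /* FAT32 field */
    if (bps == 0 || spc == 0 || num_fats == 0) return -1;

    out->bytes_per_sector    = bps;
    out->sectors_per_cluster = spc;
    out->first_fat_sector    = reserved;
    out->root_dir_sectors    = (root_ent * 32 + bps - 1) / bps;
    out->first_root_sector   = reserved + num_fats * fat_size;
    out->first_data_sector   = out->first_root_sector + out->root_dir_sectors;
    return 0;
}

/* First sector of data cluster n (clusters are numbered from 2). */
uint32_t fat_cluster_to_sector(const fat_layout_t *l, uint32_t n) {
    return l->first_data_sector + (n - 2) * l->sectors_per_cluster;
}
```

With this layout in hand, following a file is a loop: read the cluster's sectors, look up the next cluster in the FAT, and repeat until the end-of-chain marker. Caching the FAT sector you last read avoids most of the extra SD traffic.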
USB HID devices require a descriptor that tells the host what kind of device you are (keyboard, mouse, gamepad). HID reports are structured bytes that represent button states, axes, or keys. The device must respond to control transfers during enumeration. Libraries like TinyUSB handle the heavy lifting, but you must still define descriptors correctly and match report format with your firmware logic. The descriptor is your contract with the host OS. If it is wrong, your device might enumerate but not function.
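To make the "descriptor is a contract" idea concrete, here is a sketch of a conventional three-button mouse report descriptor with X, Y, and wheel, plus a tiny walker that adds up the Input bits the descriptor promises. The walker handles only short items and assumes no Report IDs. Its name and the array name are hypothetical, and a stack such as TinyUSB would consume the descriptor bytes in its own way.

```c
#include <stddef.h>
#include <stdint.h>

const uint8_t mouse_report_desc[] = {
    0x05, 0x01,  /* Usage Page (Generic Desktop) */
    0x09, 0x02,  /* Usage (Mouse) */
    0xA1, 0x01,  /* Collection (Application) */
    0x09, 0x01,  /*   Usage (Pointer) */
    0xA1, 0x00,  /*   Collection (Physical) */
    0x05, 0x09,  /*     Usage Page (Button) */
    0x19, 0x01,  /*     Usage Minimum (1) */
    0x29, 0x03,  /*     Usage Maximum (3) */
    0x15, 0x00,  /*     Logical Minimum (0) */
    0x25, 0x01,  /*     Logical Maximum (1) */
    0x95, 0x03,  /*     Report Count (3) */
    0x75, 0x01,  /*     Report Size (1) */
    0x81, 0x02,  /*     Input (Data, Variable, Absolute): 3 button bits */
    0x95, 0x01,  /*     Report Count (1) */
    0x75, 0x05,  /*     Report Size (5) */
    0x81, 0x01,  /*     Input (Constant): 5 padding bits */
    0x05, 0x01,  /*     Usage Page (Generic Desktop) */
    0x09, 0x30,  /*     Usage (X) */
    0x09, 0x31,  /*     Usage (Y) */
    0x09, 0x38,  /*     Usage (Wheel) */
    0x15, 0x81,  /*     Logical Minimum (-127) */
    0x25, 0x7F,  /*     Logical Maximum (127) */
    0x75, 0x08,  /*     Report Size (8) */
    0x95, 0x03,  /*     Report Count (3) */
    0x81, 0x06,  /*     Input (Data, Variable, Relative): X, Y, wheel */
    0xC0,        /*   End Collection */
    0xC0         /* End Collection */
};

/* Sum Report Size * Report Count over every Input item (short items only). */
size_t hid_input_report_bits(const uint8_t *d, size_t len) {
    uint32_t report_size = 0, report_count = 0;
    size_t bits = 0, i = 0;
    while (i < len) {
        uint8_t prefix = d[i];
        size_t n = prefix & 0x03;
        if (n == 3) n = 4;                  /* size code 3 means 4 data bytes */
        if (i + 1 + n > len) break;         /* truncated item */
        uint32_t value = 0;
        for (size_t k = 0; k < n; k++) value |= (uint32_t)d[i + 1 + k] << (8 * k);
        switch (prefix & 0xFC) {
        case 0x74: report_size = value; break;                        /* Report Size */
        case 0x94: report_count = value; break;                       /* Report Count */
        case 0x80: bits += (size_t)report_size * report_count; break; /* Input */
        default: break;
        }
        i += 1 + n;
    }
    return bits;
}
```

The descriptor promises 32 bits, so the firmware must send exactly four bytes per report: buttons, X, Y, wheel. Checking that number against `sizeof(report)` at startup catches the most common mismatch between descriptor and firmware.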
USB also introduces timing and buffer constraints. HID devices typically use interrupt endpoints with polling intervals. If you send reports too frequently, the host will drop them or throttle. If you send too slowly, your device feels laggy. The goal is a stable, predictable report rate. The LCD can reflect USB state, which is a good debugging tool. If you ever plan to build custom controllers, understanding HID is essential.
Finally, the UF2 bootloader is a key part of the Raspberry Pi ecosystem. It allows drag-and-drop firmware updates by presenting the board as a USB mass storage device. This is extremely practical when experimenting with bare-metal code, because it provides a recovery path. If you misconfigure clocks or crash the firmware, you can always return to BOOTSEL mode.
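A UF2 file is a sequence of self-describing 512-byte blocks, which is why the bootloader can accept them in any order the host writes them. The sketch below models one block following the public UF2 layout and checks its magic numbers. The struct and function names are hypothetical, and the field comparison assumes a little-endian host, matching the on-disk format.

```c
#include <stdbool.h>
#include <stdint.h>

#define UF2_MAGIC_START0 0x0A324655u   /* "UF2\n" */
#define UF2_MAGIC_START1 0x9E5D5157u
#define UF2_MAGIC_END    0x0AB16F30u

typedef struct {
    uint32_t magic_start0;
    uint32_t magic_start1;
    uint32_t flags;
    uint32_t target_addr;            /* where the payload should be written */
    uint32_t payload_size;           /* bytes used in data[] */
    uint32_t block_no;               /* index of this block */
    uint32_t num_blocks;             /* total blocks in the file */
    uint32_t file_size_or_family_id;
    uint8_t  data[476];
    uint32_t magic_end;
} uf2_block_t;

_Static_assert(sizeof(uf2_block_t) == 512, "a UF2 block is exactly 512 bytes");

/* Basic sanity check a loader might run on each received block. */
bool uf2_block_valid(const uf2_block_t *b) {
    return b->magic_start0 == UF2_MAGIC_START0 &&
           b->magic_start1 == UF2_MAGIC_START1 &&
           b->magic_end    == UF2_MAGIC_END &&
           b->payload_size <= sizeof b->data &&
           b->block_no < b->num_blocks;
}
```

Because every block carries its own target address and sequence number, a plain mass-storage write is enough to deliver firmware, with no custom host driver. Dumping the first block of one of your own `.uf2` files with a hex viewer is a quick way to see these fields in practice.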
How This Fits in Projects
Projects 8 and 12 rely on SD/FAT and USB. Projects 10 and 13 rely on flash/XIP and boot flow.
Definitions & Key Terms
- FAT: File Allocation Table filesystem.
- HID: Human Interface Device class in USB.
- UF2: USB flashing file format used by Raspberry Pi bootloader.
- Descriptor: USB structure describing device capabilities.
- Cluster: FAT allocation unit.
Mental Model Diagram
Firmware in QSPI (XIP) -> App loads assets from SD -> UI displayed
\-> USB HID descriptors -> Host recognizes device
How It Works (Step-by-Step)
- Boot ROM exposes UF2 mass storage in BOOTSEL mode.
- Firmware stored in QSPI flash executes via XIP.
- FAT parser reads SD card sectors and loads images.
- USB stack enumerates and sends HID reports.
Minimal Concrete Example
// Pseudocode: send HID report
uint8_t report[4] = {buttons, x, y, 0};
usb_hid_send(report, sizeof(report));
Common Misconceptions
- “USB just works if you plug it in.” (Descriptors must be correct.)
- “FAT is simple enough to ignore.” (Parsing errors corrupt files.)
- “Flash writes are instantaneous.” (Erase cycles are slow.)
Check-Your-Understanding Questions
- Why does UF2 require a special file format?
- What is a HID report?
- Why might SD card reads be slow without caching?
Check-Your-Understanding Answers
- UF2 includes metadata for flash addresses and block ordering.
- A structured byte packet representing input state.
- FAT tables and directory entries require multiple reads.
Real-World Applications
- USB macro pads and controllers
- Portable photo viewers
- Data loggers with SD storage
Where You Will Apply It
- Project 8 (TF card image viewer)
- Project 12 (USB HID)
References
- Raspberry Pi Pico documentation (BOOTSEL/UF2)
- USB HID specs and TinyUSB docs
Key Insight: Storage and USB turn your board from a demo into a real device.
Summary
QSPI, SD, and USB are the I/O backbone of real products. Learn them and your projects become practical.
Homework/Exercises
- Write a simple FAT sector reader that lists directory entries.
- Modify a HID report descriptor to add a new button.
- Implement a simple asset cache that stores one image in SRAM.
Solutions
- Read the boot sector, compute FAT offsets, then parse directory entries.
- Extend the report descriptor and update firmware to set the new bit.
- Load one image into SRAM and reuse it without SD reads.
Glossary
- AHB/APB: Bus protocols for high/low bandwidth peripherals.
- BOOTSEL: Button mode for USB mass storage flashing.
- DREQ: DMA pacing signal.
- Frame buffer: Memory containing pixel data for the LCD.
- MADCTL: Display memory access control register.
- PIO: Programmable IO engine.
- SRAM banking: Multiple memory banks for throughput.
- SPI: Serial Peripheral Interface.
- TrustZone: Secure/non-secure execution domains.
- UF2: USB flashing file format.
- XIP: Execute-In-Place flash mapping.
Why RP2350 LCD Development Matters
The Modern Problem It Solves
Most IoT and embedded devices need a user interface, but they run on tight power and memory budgets. The RP2350 LCD board is a compact, low-cost platform for learning how to build those interfaces while understanding the hardware that drives them. The skills you develop here are directly transferable to wearable devices, industrial control panels, test equipment, and consumer gadgets.
Real-world impact and scale:
- IoT Analytics reports 18.5 billion connected IoT devices in 2024 and forecasts 21.1 billion by the end of 2025. This scale drives demand for embedded UI and device firmware skills.
- Grand View Research projects the global embedded systems market to reach USD 169.1 billion by 2030, emphasizing embedded systems growth across industries.
OLD APPROACH                  NEW APPROACH
┌───────────────────────┐     ┌────────────────────────┐
│ Fixed UI, low graphics│     │ Dynamic UI + telemetry │
│ No OTA updates        │     │ OTA updates + security │
│ Single-core firmware  │     │ Multicore + DMA + PIO  │
└───────────────────────┘     └────────────────────────┘
Context & Evolution
Early embedded systems used fixed-function displays or no UI at all. As IoT devices proliferated, demand grew for responsive, graphics-capable interfaces on low-power hardware. The RP2350’s dual-ISA and PIO/DMA features make it a uniquely capable learning platform for modern embedded UI systems.
Concept Summary Table
| Concept Cluster | What You Need to Internalize |
|---|---|
| Dual-ISA Architecture & Boot | How RP2350 selects ARM vs RISC-V, secure vs non-secure domains, and the boot ROM flow. |
| Memory Map & XIP | Where code and data live, how SRAM banking affects throughput, and how DMA competes for bandwidth. |
| Clocking & Pad Control | How clocks and pad settings affect stability, SPI timing, and display reliability. |
| SPI + ST7789 Commands | The exact command sequence and pixel format that brings the display to life. |
| Framebuffers & Rendering | How to pack pixels, render primitives, and manage double buffering. |
| DMA + PIO Pipelines | How to stream pixels and IO data without burning CPU cycles. |
| Multicore & Scheduling | How to split workloads across cores safely and deterministically. |
| Storage & USB I/O | How to load assets from SD and expose the board as a USB HID device. |
Project-to-Concept Map
| Project | What It Builds | Primer Chapters It Uses |
|---|---|---|
| Project 1: Hello Display | SPI bring-up + LCD init | 3, 4 |
| Project 2: Pixel Artist | Primitives + framebuffer | 4, 5 |
| Project 3: DMA Display Driver | DMA streaming to SPI | 2, 6 |
| Project 4: RGB LED Controller | PIO waveforms | 6 |
| Project 5: Dual-Core Renderer | Multicore rendering + DMA | 2, 6, 7 |
| Project 6: Font Rendering Engine | Text rendering + partial updates | 4, 5 |
| Project 7: ARM vs RISC-V Benchmark | ISA comparison | 1, 2 |
| Project 8: TF Card Image Viewer | FAT + asset streaming | 2, 8 |
| Project 9: Real-Time System Monitor | Timers + multicore + UI | 2, 7 |
| Project 10: Bare-Metal Driver | Register-level boot + SPI | 1, 2, 3, 4 |
| Project 11: Simple Game | Game loop + rendering | 5, 7 |
| Project 12: USB HID Device | USB descriptors + reports | 8 |
| Project 13: Mini OS | Cooperative scheduling + UI | 1, 2, 7 |
Deep Dive Reading by Concept
Architecture & Hardware
| Concept | Book & Chapter | Why This Matters |
|---|---|---|
| RP2350 boot + memory map | RP2350 Datasheet - Sections 2-3 | Register and address correctness for bare-metal code |
| ISA basics | Computer Organization and Design RISC-V - Ch. 1-2 | Understand instruction set differences |
| Embedded constraints | Making Embedded Systems - Ch. 1-4 | Real-world constraints and design trade-offs |
Display & Graphics
| Concept | Book & Chapter | Why This Matters |
|---|---|---|
| Graphics memory + bitmaps | Code: The Hidden Language - Ch. 13-16 | Binary representation of data and pixels |
| Rendering primitives | Computer Graphics from Scratch - Ch. 2-5 | Rasterization fundamentals |
| LCD command sets | ST7789 Datasheet - Command sections | Real command order and pixel format |
IO & Performance
| Concept | Book & Chapter | Why This Matters |
|---|---|---|
| DMA/PIO design | RP2350 Datasheet - DMA/PIO sections | Correct high-throughput IO design |
| Debugging timing | The Art of Debugging with GDB - Ch. 6-8 | Timing and peripheral debugging |
Systems & Scheduling
| Concept | Book & Chapter | Why This Matters |
|---|---|---|
| Scheduling fundamentals | Operating Systems: Three Easy Pieces - Ch. 4-9 | Context switching and scheduling basics |
| Concurrency basics | Computer Systems: A Programmer’s Perspective - Ch. 12 | Synchronization and concurrency models |
Storage & USB
| Concept | Book & Chapter | Why This Matters |
|---|---|---|
| Filesystems | Operating Systems: Three Easy Pieces - Ch. 39 | File system layout and metadata |
| USB fundamentals | USB Complete by Jan Axelson - Ch. 1-4, 8 | USB enumeration and HID class |
Quick Start: Your First 48 Hours
Day 1 (4 hours):
- Read the Introduction, Chapter 4 (SPI + ST7789), and Chapter 5 (Framebuffers).
- Flash the Hello Display project and make the screen show text.
- Use a logic analyzer to verify SPI clock and DC toggling.
Day 2 (4 hours):
- Implement a simple rectangle fill and draw a moving box.
- Add partial update support to reduce SPI traffic.
- Read the “Core Question” and “Pitfalls” sections in Project 2.
End of Weekend: You can bring up the display, draw primitives, and understand the SPI command flow. That is the core foundation.
Recommended Learning Paths
Path 1: The Embedded Graphics Beginner (Recommended Start)
- Project 1 - Hello Display
- Project 2 - Pixel Artist
- Project 3 - DMA Display Driver
- Project 6 - Font Rendering Engine
Path 2: The Performance Engineer
- Project 3 - DMA Display Driver
- Project 4 - RGB LED Controller
- Project 5 - Dual-Core Renderer
- Project 7 - ARM vs RISC-V Benchmark
Path 3: The Product Builder
- Project 1 - Hello Display
- Project 8 - TF Card Image Viewer
- Project 12 - USB HID Device
- Project 11 - Simple Game
Path 4: The Completionist
- Phase 1: Projects 1-3
- Phase 2: Projects 4-6
- Phase 3: Projects 7-9
- Phase 4: Projects 10-13
Success Metrics
By the end of this guide, you should be able to:
- Explain RP2350 boot flow and memory map from memory
- Bring up the LCD without SDK helper functions
- Achieve smooth 60 FPS animation using DMA
- Implement partial updates and double buffering
- Run the same benchmark on ARM and RISC-V and explain the results
- Build a USB HID device with a custom descriptor
- Build a cooperative scheduler and show task stats on the LCD
Tooling & Debugging Appendix
Logic Analyzer Workflow:
- Capture SPI clock, MOSI, CS, and DC.
- Verify command bytes and data bytes align with DC.
- Look for glitches, missed clock edges, or data changing too close to the sampling edge.
Common Debug Tools:
- picotool to inspect flash and device state
- openocd + SWD for stepping through startup
- GDB for register inspection
Signal Integrity Tips:
- Reduce SPI clock if you see random pixel noise.
- Increase drive strength on SPI pins for longer wires.
- Keep wires short to avoid ringing.
Project Overview Table
| # | Project | Difficulty | Time | Primary Focus |
|---|---|---|---|---|
| 1 | Hello Display | Beginner | Weekend | SPI + LCD init |
| 2 | Pixel Artist | Intermediate | 1-2 weeks | Rendering primitives |
| 3 | DMA Display Driver | Advanced | 1-2 weeks | DMA pipeline |
| 4 | RGB LED Controller | Advanced | 1-2 weeks | PIO timing |
| 5 | Dual-Core Renderer | Advanced | 2-3 weeks | Multicore + DMA |
| 6 | Font Rendering Engine | Intermediate | 1-2 weeks | Text rendering |
| 7 | ARM vs RISC-V Benchmark | Expert | 2-3 weeks | ISA comparison |
| 8 | TF Card Image Viewer | Intermediate | 1-2 weeks | FAT + assets |
| 9 | Real-Time System Monitor | Intermediate | 1-2 weeks | Timers + UI |
| 10 | Bare-Metal Driver | Master | 1 month | Registers + boot |
| 11 | Simple Game | Advanced | 2-3 weeks | Game loop |
| 12 | USB HID Device | Expert | 2-3 weeks | USB stack |
| 13 | Mini Operating System | Master | 1-2 months | Scheduling |
Project List
Project 1: Hello Display - Raw SPI Communication
- Main Programming Language: C
- Alternative Programming Languages: Rust, MicroPython
- Coolness Level: Level 2: Fun
- Business Potential: 2. The “Prototype” Level
- Difficulty: Level 1: Beginner
- Knowledge Area: SPI + Display Bring-up
- Software or Tool: Pico SDK, logic analyzer
- Main Book: “Making Embedded Systems” by Elecia White
What you’ll build: A minimal firmware that initializes SPI, sends the ST7789 command sequence, and displays text and a color gradient on the LCD.
Why it teaches RP2350 fundamentals: You will wire up SPI, configure GPIO pads, and confirm that the LCD responds to command/data sequences. This is the foundation for everything else.
Core challenges you’ll face:
- Pin mapping -> Correctly assign SPI and control pins
- Command sequencing -> ST7789 init order matters
- Timing -> Reset delays and SPI clock stability
Real World Outcome
You will see a title, a gradient fill, and a status line:
┌────────────────────────────────────────┐
│ RP2350 LCD HELLO │
│ │
│ Gradient: ██████████░░░░░░░░ │
│ │
│ SPI OK | LCD OK │
└────────────────────────────────────────┘
Command Line Outcome Example:
$ mkdir build && cd build
$ cmake ..
$ make -j4
[100%] Built target hello_display
$ cp hello_display.uf2 /Volumes/RP2350
# Screen updates within 1-2 seconds
The Core Question You’re Answering
“How do I speak the LCD’s language well enough to make a pixel appear?”
Concepts You Must Understand First
- SPI basics
- What does CPOL/CPHA change?
- Why does CS framing matter?
- Book Reference: “Making Embedded Systems” Ch. 6
- ST7789 command flow
- Why do CASET/RASET precede RAMWR?
- Book Reference: ST7789 datasheet command section
- GPIO pad control
- How does drive strength affect edges?
- Book Reference: RP2350 Datasheet, IO/pad sections
Questions to Guide Your Design
- How will you abstract “command” vs “data” writes?
- What reset timing does the LCD require?
- How will you handle display offsets for 172x320?
Thinking Exercise
“The First Pixel”
- Imagine you want to draw only pixel (0,0). What commands must be sent?
- Write the sequence of command bytes and data bytes.
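One way to work through the exercise is to write the sequence as (DC level, byte) pairs. The sketch below uses the standard ST7789 opcodes (CASET 0x2A, RASET 0x2B, RAMWR 0x2C). The column offset of 34 is an assumption based on centering 172 columns inside the controller's 240, so verify it against your board. The names `lcd_byte_t`, `put`, and `build_pixel_sequence` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* ST7789 opcodes. */
#define ST7789_CASET 0x2A /* column address set */
#define ST7789_RASET 0x2B /* row address set */
#define ST7789_RAMWR 0x2C /* memory write */

/* ASSUMPTION: a 172-wide panel centered in 240 controller columns
 * gives a column offset of (240 - 172) / 2 = 34. Check your board. */
#define LCD_X_OFFSET 34
#define LCD_Y_OFFSET 0

/* One byte on the wire plus the DC level it must be sent with. */
typedef struct {
    uint8_t dc;    /* 0 = command, 1 = data */
    uint8_t value;
} lcd_byte_t;

/* Append one byte to the sequence and return the new length. */
static size_t put(lcd_byte_t *seq, size_t n, uint8_t dc, uint8_t value) {
    seq[n].dc = dc;
    seq[n].value = value;
    return n + 1;
}

/* Fill seq with the bytes that draw one RGB565 pixel at (x, y).
 * seq must hold at least 13 entries. Returns the number written. */
size_t build_pixel_sequence(lcd_byte_t *seq, uint16_t x, uint16_t y,
                            uint16_t rgb565) {
    uint16_t col = (uint16_t)(x + LCD_X_OFFSET);
    uint16_t row = (uint16_t)(y + LCD_Y_OFFSET);
    size_t n = 0;

    /* Column window: start == end == col, high byte first. */
    n = put(seq, n, 0, ST7789_CASET);
    n = put(seq, n, 1, (uint8_t)(col >> 8));
    n = put(seq, n, 1, (uint8_t)(col & 0xFF));
    n = put(seq, n, 1, (uint8_t)(col >> 8));
    n = put(seq, n, 1, (uint8_t)(col & 0xFF));

    /* Row window: start == end == row. */
    n = put(seq, n, 0, ST7789_RASET);
    n = put(seq, n, 1, (uint8_t)(row >> 8));
    n = put(seq, n, 1, (uint8_t)(row & 0xFF));
    n = put(seq, n, 1, (uint8_t)(row >> 8));
    n = put(seq, n, 1, (uint8_t)(row & 0xFF));

    /* Memory write followed by one 16-bit pixel, high byte first. */
    n = put(seq, n, 0, ST7789_RAMWR);
    n = put(seq, n, 1, (uint8_t)(rgb565 >> 8));
    n = put(seq, n, 1, (uint8_t)(rgb565 & 0xFF));
    return n;
}
```

Keeping the DC level next to each byte makes it easy to compare the table against a logic analyzer capture line by line.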
The Interview Questions They’ll Ask
- “What is the purpose of DC in SPI LCDs?”
- “Why must you set an address window before RAMWR?”
- “What are common causes of a blank LCD screen?”
- “Why might a display show shifted graphics?”
Hints in Layers
Hint 1: Start simple. Toggle CS and reset pins and confirm with a logic analyzer.
Hint 2: Known-good sequence. Use a known ST7789 initialization sequence and add delays after reset.
Hint 3: Color swap check. If colors are wrong, flip RGB/BGR in MADCTL.
Hint 4: Verify on the wire. Capture the first 16 bytes after RAMWR and ensure they match your pixel data.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| SPI basics | “Making Embedded Systems” | Ch. 6 |
| Memory-mapped IO | “Code” by Charles Petzold | Ch. 15 |
| Bare-metal debugging | “The Art of Debugging with GDB” | Ch. 6 |
Common Pitfalls & Debugging
Problem 1: “Screen stays white”
- Why: Reset not asserted or SPI clock disabled
- Fix: Toggle reset pin with proper delays, enable SPI clock
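- Quick test: Send the “display on” command (DISPON, 0x29) and confirm it on the logic analyzer
Problem 2: “Colors are wrong (red and blue swapped)”
- Why: RGB/BGR bit in MADCTL wrong
- Fix: Toggle MADCTL bit D3 (0x08). If the image looks like a photo negative instead, toggle display inversion (INVON 0x21 / INVOFF 0x20)
- Quick test: Draw pure red and see if it appears red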
Problem 3: “Only part of the screen updates”
- Why: Address window not configured for 172x320
- Fix: Adjust CASET/RASET values and offsets
- Quick test: Draw a border and confirm edges align
Definition of Done
- LCD shows text and gradient correctly
- SPI clock and DC timing verified on analyzer
- Reset sequence is documented and repeatable
- You can draw a single pixel at (0,0)
Project 2: Pixel Artist - Drawing Primitives and Sprites
- Main Programming Language: C
- Alternative Programming Languages: Rust, MicroPython
- Coolness Level: Level 3: Impressive
- Business Potential: 3. The “Demo” Level
- Difficulty: Level 2: Intermediate
- Knowledge Area: Graphics rendering
- Software or Tool: Pico SDK, image conversion scripts
- Main Book: “Computer Graphics from Scratch” by Gabriel Gambetta
What you’ll build: A small graphics library that can draw pixels, lines, rectangles, circles, and sprites. You will render a pixel-art scene and animate a sprite.
Why it teaches graphics fundamentals: You will learn rasterization, clipping, and pixel format conversion, all on a real device.
Core challenges you’ll face:
- Rasterization algorithms -> Bresenham, midpoint circle
- Pixel formats -> RGB565 conversion
- Clipping -> Avoid buffer overflows
Real World Outcome
┌────────────────────────────────────────┐
│ PIXEL ART SCENE │
│ * ^^^ ^^ │
│ ** ^^^ ^^^ │
│ │
│ Sprite: [>o<] │
│ FPS: 30 │
└────────────────────────────────────────┘
Command Line Outcome Example:
$ ./pixel_artist
Loaded palette: 16 colors
Rendered frame in 9.4ms
FPS: 30
The Core Question You’re Answering
“How do I turn math into pixels efficiently?”
Concepts You Must Understand First
- RGB565 encoding
- How do you map 8-bit RGB to 16-bit?
- Book Reference: “Computer Graphics from Scratch” Ch. 2
- Line drawing algorithms
- Why use integer-only algorithms?
- Book Reference: “Computer Graphics from Scratch” Ch. 3
- Clipping
- How do you ensure you never write out of bounds?
- Book Reference: “Computer Graphics from Scratch” Ch. 4
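The RGB565 mapping keeps the top 5 bits of red, the top 6 bits of green, and the top 5 bits of blue. A minimal sketch follows. The function names are hypothetical. The byte-swap helper assumes you send a `uint16_t` buffer from little-endian memory using 8-bit SPI transfers, in which case the high byte must go out first.

```c
#include <stdint.h>

/* Pack 8-bit R, G, B into RGB565: RRRRRGGG GGGBBBBB. */
static inline uint16_t rgb888_to_rgb565(uint8_t r, uint8_t g, uint8_t b) {
    return (uint16_t)(((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3));
}

/* Swap the two bytes of a pixel so that a little-endian CPU stores
 * the high byte at the lower address (the order sent on the wire). */
static inline uint16_t rgb565_swap_bytes(uint16_t c) {
    return (uint16_t)((c << 8) | (c >> 8));
}
```

Pure red, green, and blue map to 0xF800, 0x07E0, and 0x001F, which makes them handy test colors.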
Questions to Guide Your Design
- How will you structure your framebuffer API?
- What is your clipping strategy?
- How will you batch updates to reduce SPI traffic?
Thinking Exercise
“The Diagonal Line”
- Draw a line from (0,0) to (171,319). Which pixels are touched?
- How many pixels are out of bounds if you forget clipping?
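To check your answer, here is a sketch of an integer-only Bresenham line that clips per pixel and counts what it drew and what it rejected. The framebuffer layout and the names `put_pixel`, `draw_line`, and `line_stats_t` are assumptions for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

#define LCD_W 172
#define LCD_H 320

static uint16_t framebuffer[LCD_W * LCD_H];

typedef struct {
    int drawn;   /* pixels written to the framebuffer */
    int clipped; /* pixels rejected by the bounds check */
} line_stats_t;

/* Write one pixel if it is inside the screen. Returns 1 if written. */
static int put_pixel(int x, int y, uint16_t color) {
    if (x < 0 || x >= LCD_W || y < 0 || y >= LCD_H) return 0;
    framebuffer[y * LCD_W + x] = color;
    return 1;
}

/* Bresenham line for all octants, integer math only. */
line_stats_t draw_line(int x0, int y0, int x1, int y1, uint16_t color) {
    line_stats_t stats = {0, 0};
    int dx = abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
    int dy = -abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
    int err = dx + dy;

    for (;;) {
        if (put_pixel(x0, y0, color)) stats.drawn++;
        else stats.clipped++;
        if (x0 == x1 && y0 == y1) break;
        int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; } /* step in x */
        if (e2 <= dx) { err += dx; y0 += sy; } /* step in y */
    }
    return stats;
}
```

The diagonal from (0,0) to (171,319) touches one pixel per row and stays fully on screen. A line that runs past x = 171 shows how many writes would have landed outside the buffer without the bounds check.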
The Interview Questions They’ll Ask
- “Explain Bresenham’s line algorithm in one minute.”
- “What causes flicker in framebuffer updates?”
- “Why is clipping necessary in graphics?”
- “What is the difference between RGB565 and RGB666?”
Hints in Layers
Hint 1: Start with put_pixel(). Build a reliable pixel writer before any other primitive.
Hint 2: Integer math only. Avoid float operations to keep rendering fast.
Hint 3: Clip early. Implement a clip rectangle and check bounds before drawing.
Hint 4: Precompute colors. Cache common colors in RGB565 to reduce per-pixel overhead.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Rasterization | “Computer Graphics from Scratch” | Ch. 2-4 |
| Bitwise math | “Code” by Charles Petzold | Ch. 15 |
| Efficient C | “Effective C” by Robert Seacord | Ch. 4 |
Common Pitfalls & Debugging
Problem 1: “Lines have gaps”
- Why: Incorrect Bresenham step condition
- Fix: Re-check error term update
Problem 2: “Random crashes”
- Why: Missing clipping, writing outside buffer
- Fix: Clip coordinates before drawing
Problem 3: “Colors look wrong”
- Why: RGB565 conversion bug
- Fix: Verify bit shifts and masks
Definition of Done
- Draw lines, rectangles, and circles correctly
- Render a sprite with transparency
- No buffer overflows (verified with guard bytes)
- Scene animates at 30+ FPS
Project 3: DMA Display Driver - Zero-CPU Frame Streaming
- Main Programming Language: C
- Alternative Programming Languages: Rust
- Coolness Level: Level 4: Hardcore
- Business Potential: 3. The “Performance” Level
- Difficulty: Level 3: Advanced
- Knowledge Area: DMA and high-throughput IO
- Software or Tool: Pico SDK DMA APIs
- Main Book: “Making Embedded Systems” by Elecia White
What you’ll build: A display driver that uses DMA to stream the framebuffer to SPI while the CPU renders the next frame.
Why it teaches performance engineering: DMA is the key to high frame rates and low CPU usage.
Core challenges you’ll face:
- DMA configuration -> Correct DREQ and transfer size
- Buffer synchronization -> Avoid tearing
- Bus contention -> Optimize SRAM placement
Real World Outcome
Terminal output:
$ ./frame_test
DMA enabled: yes
Frame time: 16.2ms
CPU usage: 24%
Display: smooth animation with no tearing.
The Core Question You’re Answering
“How can I move 110 KB per frame without the CPU doing the copy?”
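The 110 KB figure comes from 172 x 320 pixels at 2 bytes each. The sketch below turns that into a frame time and a minimum SPI clock. It ignores command bytes and gaps between transfers, so treat the results as a best case. The function names are hypothetical, and any SPI clock you plug in must stay within what your panel's datasheet allows.

```c
#include <stdint.h>

#define LCD_W 172u
#define LCD_H 320u
#define BYTES_PER_PIXEL 2u /* RGB565 */

/* Bytes in one full frame: 172 * 320 * 2 = 110080. */
uint32_t frame_bytes(void) {
    return LCD_W * LCD_H * BYTES_PER_PIXEL;
}

/* Best-case time to clock one frame out at spi_hz, in microseconds,
 * rounded up. One bit moves per SPI clock cycle. */
uint32_t frame_time_us(uint32_t spi_hz) {
    uint64_t bits = (uint64_t)frame_bytes() * 8u;
    return (uint32_t)((bits * 1000000u + spi_hz - 1u) / spi_hz);
}

/* Lowest SPI clock that can sustain full-frame updates at fps. */
uint32_t min_spi_hz_for_fps(uint32_t fps) {
    return frame_bytes() * 8u * fps;
}
```

For example, full-frame updates at 60 FPS need roughly 52.8 MHz of SPI clock, which is why partial updates matter as much as DMA.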
Concepts You Must Understand First
- DMA channels and DREQ
- How does pacing prevent FIFO overflow?
- Book Reference: RP2350 Datasheet DMA section
- Double buffering
- Why is DMA + single buffer unsafe?
- Book Reference: “Making Embedded Systems” Ch. 7
- SRAM banking
- Why does buffer placement affect throughput?
- Book Reference: RP2350 Datasheet address map
Questions to Guide Your Design
- Where will your buffers live in SRAM banks?
- How will you handle DMA completion interrupts?
- How will you measure CPU utilization?
Thinking Exercise
“The Bus Contention Problem”
- Suppose DMA and CPU both read from SRAM0. What happens to frame time?
The Interview Questions They’ll Ask
- “What is DREQ and why is it needed?”
- “How do you avoid tearing with DMA?”
- “How would you profile DMA throughput?”
- “What is the advantage of DMA chaining?”
Hints in Layers
Hint 1: Validate DMA basics. Start with DMA in memory-to-memory mode to verify setup.
Hint 2: DREQ pacing. Use DREQ_SPI0_TX so the DMA only writes when FIFO has space.
Hint 3: DMA completion. Swap buffers only after DMA complete interrupt.
Hint 4: Measure throughput. Measure SPI clock and transfer time with a logic analyzer.
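The swap rule in Hint 3 can be sketched as a small state machine. This version runs on a host: `start_dma` is a hypothetical stand-in for the real transfer setup, and you would call `on_dma_complete` from your DMA interrupt handler. On hardware, `present` and the handler can race, so protect `try_swap` by briefly disabling the DMA interrupt around it.

```c
#include <stdbool.h>
#include <stdint.h>

#define FRAME_PIXELS (172 * 320)

static uint16_t buffers[2][FRAME_PIXELS];
static volatile bool dma_busy = false;    /* a transfer is in flight */
static volatile bool frame_ready = false; /* back buffer is finished */
static int front = 0;                     /* buffer owned by the DMA */

/* Hypothetical stand-in for configuring and starting the real DMA. */
static void start_dma(const uint16_t *src) {
    (void)src;
    dma_busy = true;
}

/* Swap only when the DMA is idle and a new frame is waiting. */
static void try_swap(void) {
    if (dma_busy || !frame_ready) return;
    front ^= 1;
    frame_ready = false;
    start_dma(buffers[front]);
}

/* The buffer the renderer may draw into. */
uint16_t *back_buffer(void) { return buffers[front ^ 1]; }

/* False while a finished frame is still waiting to be swapped in. */
bool can_render(void) { return !frame_ready; }

/* Renderer calls this when the back buffer is complete. */
void present(void) {
    frame_ready = true;
    try_swap();
}

/* Call this from the DMA completion interrupt. */
void on_dma_complete(void) {
    dma_busy = false;
    try_swap();
}
```

The renderer checks `can_render()` before touching `back_buffer()`. If no new frame is ready when a transfer ends, nothing is re-sent, which is fine because the LCD keeps the last frame in its own memory.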
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| DMA fundamentals | “Making Embedded Systems” | Ch. 7 |
| Bus arbitration | RP2350 Datasheet | Bus fabric section |
| Optimization | “Effective C” by Robert Seacord | Ch. 8 |
Common Pitfalls & Debugging
Problem 1: “DMA stalls”
- Why: DREQ mismatch or SPI not enabled
- Fix: Ensure SPI is clocked and DREQ matches SPI TX
Problem 2: “Random tearing”
- Why: Buffer swap before DMA completion
- Fix: Swap only in DMA complete IRQ
Problem 3: “Frame rate is low”
- Why: SPI clock too slow, or DMA and CPU contending for the same SRAM bank
- Fix: Increase SPI clock within spec; keep the framebuffer in striped SRAM and move hot CPU data elsewhere
Definition of Done
- DMA streams full frame without CPU loops
- CPU usage below 30% at 30 FPS
- No tearing during animation
- DMA completion interrupt triggers buffer swap reliably
Project 4: RGB LED Controller with PIO (WS2812 Style)
- Main Programming Language: C + PIO
- Alternative Programming Languages: Rust
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 2. The “Eye Candy” Level
- Difficulty: Level 3: Advanced
- Knowledge Area: PIO and deterministic IO
- Software or Tool: PIO assembler
- Main Book: “Making Embedded Systems” by Elecia White
What you’ll build: A PIO-based driver to control the onboard RGB LED (or an external WS2812 strip) with precise timing and smooth animations.
Why it teaches low-level IO: PIO is the RP2350’s secret weapon for custom protocols.
Core challenges you’ll face:
- PIO timing -> Exact bit widths
- DMA feeding -> Smooth LED updates
- Signal verification -> Scope/logic analyzer testing
Real World Outcome
The LED cycles through a smooth rainbow at 60 FPS with minimal CPU overhead.
The Core Question You’re Answering
“How do I generate perfect waveforms without CPU jitter?”
Concepts You Must Understand First
- PIO instruction set
- How does OUT/SET timing work?
- Book Reference: RP2350 Datasheet PIO section
- WS2812 timing
- Why are timing tolerances so strict?
- Book Reference: WS2812 datasheet
- DMA feeding
- How does DMA keep PIO FIFOs full?
- Book Reference: RP2350 Datasheet DMA section
Questions to Guide Your Design
- What PIO clock divider gives you exact WS2812 timings?
- How will you pack RGB bytes into the PIO FIFO?
- Will you use DMA or CPU writes?
Thinking Exercise
“Timing Budget”
- If each WS2812 bit is 1.25 us, how many PIO cycles per bit do you need at 8 MHz?
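The timing budget is plain arithmetic: cycles per bit equals bit period times PIO clock. The sketch below also derives the clock divider in the 16.8 fixed-point form the PIO uses (16 integer bits, 8 fractional bits). The 150 MHz system clock in the usage note is an assumption, so substitute your actual clock. The function names are hypothetical.

```c
#include <stdint.h>

/* PIO cycles that fit in one bit period at a given PIO clock. */
uint32_t pio_cycles_per_bit(uint32_t bit_period_ns, uint32_t pio_hz) {
    return (uint32_t)(((uint64_t)bit_period_ns * pio_hz) / 1000000000u);
}

/* Divider from system clock to PIO clock as 16.8 fixed point:
 * integer part in the upper bits, fraction (in 1/256 steps) in the
 * low 8 bits. */
uint32_t pio_clkdiv_16_8(uint32_t sys_hz, uint32_t pio_hz) {
    return (uint32_t)(((uint64_t)sys_hz * 256u) / pio_hz);
}
```

At 8 MHz a 1.25 us bit is 10 PIO cycles. With an assumed 150 MHz system clock the divider is 18.75, stored as integer 18 and fraction 192/256.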
The Interview Questions They’ll Ask
- “What is the advantage of PIO over bit-banging?”
- “Why is WS2812 timing strict?”
- “How do you verify PIO timing?”
- “What is DMA’s role in LED control?”
Hints in Layers
Hint 1: Start with a known WS2812 PIO program.
Hint 2: Verify timing with a logic analyzer.
Hint 3: Feed PIO via DMA for smooth animation.
Hint 4: Use a small lookup table for gamma-corrected color.
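A gamma table and the GRB packing can be sketched together. Assumptions: gamma is approximated as 2.0 using integer math (out = in^2 / 255), and the 24 color bits are left-aligned in a 32-bit word for a PIO program that shifts MSB-first, as common WS2812 PIO programs do. Adjust the packing if your program expects a different layout. The function names are hypothetical.

```c
#include <stdint.h>

/* Gamma-corrected brightness for each 8-bit input level. */
static uint8_t gamma_lut[256];

/* Fill the table with an integer approximation of gamma 2.0. */
void gamma_lut_init(void) {
    for (int i = 0; i < 256; i++) {
        gamma_lut[i] = (uint8_t)((i * i + 127) / 255);
    }
}

/* Pack one color in WS2812 wire order (green, red, blue), applying
 * the gamma table. The 24 bits sit in the top of the 32-bit word. */
uint32_t ws2812_pack(uint8_t r, uint8_t g, uint8_t b) {
    return ((uint32_t)gamma_lut[g] << 24) |
           ((uint32_t)gamma_lut[r] << 16) |
           ((uint32_t)gamma_lut[b] << 8);
}
```

Call `gamma_lut_init()` once at startup. After that each color costs three table lookups, so the per-frame work stays negligible.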
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| PIO basics | RP2350 Datasheet | PIO section |
| Embedded timing | “Making Embedded Systems” | Ch. 5 |
| Debugging | “The Art of Debugging with GDB” | Ch. 7 |
Common Pitfalls & Debugging
Problem 1: “LED flickers”
- Why: Timing jitter or incorrect divider
- Fix: Adjust PIO clock divider and verify waveform
Problem 2: “Colors are wrong”
- Why: RGB vs GRB byte order
- Fix: Swap byte order before sending
Problem 3: “LED freezes after a few seconds”
- Why: FIFO underflow without DMA pacing
- Fix: Use DMA or add blocking writes to keep FIFO filled
Definition of Done
- LED color transitions are smooth
- Timing verified with analyzer
- CPU usage stays low during animation
- Gamma correction improves color smoothness
Project 5: Dual-Core Rendering Engine
- Main Programming Language: C
- Alternative Programming Languages: Rust
- Coolness Level: Level 5: Pure Magic
- Business Potential: 3. The “Performance” Level
- Difficulty: Level 4: Expert
- Knowledge Area: Multicore + DMA
- Software or Tool: Pico SDK multicore APIs
- Main Book: “Computer Systems: A Programmer’s Perspective”
What you’ll build: A dual-core renderer where Core 0 renders frames and Core 1 handles DMA display updates, achieving stable frame times.
Real World Outcome
A demo scene with moving sprites at 60 FPS while CPU usage is split across cores.
The Core Question You’re Answering
“How do I split rendering and IO across cores without data races?”
Concepts You Must Understand First
- Multicore FIFO and synchronization
- How do cores signal each other safely?
- Book Reference: RP2350 Datasheet multicore section
- Double buffering
- Why must buffers be swapped carefully?
- Book Reference: “Making Embedded Systems” Ch. 7
- DMA completion interrupts
- How do you know when transfers finish?
- Book Reference: RP2350 Datasheet DMA section
Questions to Guide Your Design
- Which core owns SPI and DMA configuration?
- How will you avoid buffer swaps mid-transfer?
- How will you measure per-core CPU usage?
Thinking Exercise
“Ownership Model”
- Write down which core owns each resource (DMA, SPI, framebuffer, timers).
The Interview Questions They’ll Ask
- “How do you avoid data races in shared memory?”
- “What is a safe buffer swap protocol?”
- “How do you profile multicore performance?”
- “Why can two cores still be slower than one?”
Hints in Layers
Hint 1: Assign one core as the sole owner of SPI and DMA.
Hint 2: Use a shared flag or FIFO to request swaps.
Hint 3: Swap only on DMA completion.
Hint 4: Use a cycle counter to measure frame time per core.
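One possible ownership protocol gives each buffer a state and lets each core move it only along its own edge. The sketch below uses C11 atomics and assumes your toolchain's atomics are safe across both cores. The multicore FIFO or hardware spin locks are alternatives. All names are hypothetical.

```c
#include <stdatomic.h>

/* Lifecycle: FREE -> RENDERING -> READY -> DISPLAYING -> FREE. */
enum { BUF_FREE = 0, BUF_RENDERING, BUF_READY, BUF_DISPLAYING };

/* Static storage starts zeroed, so both buffers begin as BUF_FREE. */
static atomic_int buf_state[2];

/* Move the first buffer found in state `from` to state `to`.
 * Returns its index, or -1 if none is in that state. */
static int claim(int from, int to) {
    for (int i = 0; i < 2; i++) {
        int expected = from;
        if (atomic_compare_exchange_strong(&buf_state[i], &expected, to)) {
            return i;
        }
    }
    return -1;
}

/* Core 0 (renderer): take a free buffer, later mark it finished. */
int render_acquire(void) { return claim(BUF_FREE, BUF_RENDERING); }
void render_release(int i) { atomic_store(&buf_state[i], BUF_READY); }

/* Core 1 (display): take a finished frame, free it when the DMA
 * completion interrupt fires. */
int display_acquire(void) { return claim(BUF_READY, BUF_DISPLAYING); }
void display_release(int i) { atomic_store(&buf_state[i], BUF_FREE); }
```

This sketch does not track frame order. If both buffers can be READY at once, add a sequence number so the display core always picks the older frame first.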
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Concurrency | “Computer Systems: A Programmer’s Perspective” | Ch. 12 |
| Scheduling | “Operating Systems: Three Easy Pieces” | Ch. 5-7 |
Common Pitfalls & Debugging
Problem 1: “Random tearing”
- Why: Swap mid-transfer
- Fix: Swap only on DMA completion interrupt
Problem 2: “Data corruption”
- Why: Both cores writing same buffer
- Fix: Strict ownership and locks
Problem 3: “Frame rate drops”
- Why: Bus contention between cores
- Fix: Place buffers in striped SRAM and minimize shared writes
Definition of Done
- Stable 60 FPS with no tearing
- Clear ownership of buffers
- No data corruption in shared state
- Measured per-core CPU usage and documented
Project 6: Font Rendering Engine
- Main Programming Language: C
- Alternative Programming Languages: Rust
- Coolness Level: Level 3: Impressive
- Business Potential: 2. The “UI” Level
- Difficulty: Level 2: Intermediate
- Knowledge Area: Text rendering
- Software or Tool: Bitmap fonts
- Main Book: “Computer Graphics from Scratch”
What you’ll build: A text rendering engine that supports multiple font sizes and partial screen updates.
Real World Outcome
A UI screen with crisp text at 16px and 24px sizes and no flicker.
The Core Question You’re Answering
“How do I render readable text efficiently on a tiny LCD?”
Concepts You Must Understand First
- Bitmap fonts
- How are glyphs stored in memory?
- Book Reference: “Computer Graphics from Scratch” Ch. 5
- Dirty rectangles
- How do you update only changed text?
- Book Reference: Rendering chapter notes
- Baseline and spacing
- How do you align characters consistently?
- Book Reference: Typography basics
Questions to Guide Your Design
- Will you store fonts in flash or SRAM?
- How will you align glyph baselines?
- How will you handle variable-width fonts?
Thinking Exercise
“The Text Baseline”
- Draw a baseline grid for ‘A’, ‘g’, and ‘y’. How do descenders affect layout?
The Interview Questions They’ll Ask
- “What is the cost of full-frame text redraws?”
- “How do bitmap fonts differ from vector fonts?”
- “Why do you need a baseline?”
- “What is a glyph cache?”
Hints in Layers
Hint 1: Start with a fixed-width font for simplicity.
Hint 2: Store glyphs as bitmaps in a compact format (1 bpp).
Hint 3: Use a dirty rectangle around the text region.
Hint 4: Precompute a glyph cache for common characters.
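A 1 bpp glyph stores one bit per pixel, so an 8x8 character fits in 8 bytes. The sketch below draws such a glyph with clipping. The glyph data is a hand-drawn example rather than a real font, the bit order (MSB is the leftmost pixel) is an assumption, and `draw_glyph` is a hypothetical name.

```c
#include <stdint.h>

#define GLYPH_W 8
#define GLYPH_H 8

/* Hand-drawn example 'A', 1 bit per pixel, MSB = leftmost pixel. */
static const uint8_t glyph_A[GLYPH_H] = {
    0x18, /* ...XX... */
    0x24, /* ..X..X.. */
    0x42, /* .X....X. */
    0x42, /* .X....X. */
    0x7E, /* .XXXXXX. */
    0x42, /* .X....X. */
    0x42, /* .X....X. */
    0x00, /* ........ */
};

/* Draw the set bits of an 8x8 glyph at (x, y) into an RGB565
 * framebuffer of fb_w x fb_h pixels. Unset bits leave the
 * background untouched. Returns the number of pixels written. */
int draw_glyph(uint16_t *fb, int fb_w, int fb_h, int x, int y,
               const uint8_t *glyph, uint16_t color) {
    int written = 0;
    for (int row = 0; row < GLYPH_H; row++) {
        int py = y + row;
        if (py < 0 || py >= fb_h) continue;      /* clip vertically */
        for (int col = 0; col < GLYPH_W; col++) {
            int px = x + col;
            if (px < 0 || px >= fb_w) continue;  /* clip horizontally */
            if (glyph[row] & (0x80 >> col)) {
                fb[py * fb_w + px] = color;
                written++;
            }
        }
    }
    return written;
}
```

The return value is also the size of the work done, which is useful when you compare full redraws against dirty-rectangle updates.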
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Rasterization | “Computer Graphics from Scratch” | Ch. 5 |
| Bitwise operations | “Effective C” | Ch. 4 |
Common Pitfalls & Debugging
Problem 1: “Text looks jagged”
- Why: Low-resolution font or wrong scaling
- Fix: Use pre-rendered font sizes
Problem 2: “Text alignment off”
- Why: Baseline not handled
- Fix: Add ascent/descent metrics per font
Problem 3: “Text flickers”
- Why: Redrawing full screen
- Fix: Use dirty rectangles for partial updates
Definition of Done
- Render ASCII text in multiple sizes
- Partial updates only redraw changed regions
- Text baseline and spacing are correct
- Glyph cache reduces render time for repeated strings
Project 7: ARM vs RISC-V Benchmark Suite
- Main Programming Language: C
- Alternative Programming Languages: Rust
- Coolness Level: Level 5: Pure Magic
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 4: Expert
- Knowledge Area: ISA comparison
- Software or Tool: Dual toolchains
- Main Book: “Computer Organization and Design RISC-V”
What you’ll build: A benchmark harness that runs the same rendering kernel on ARM and RISC-V and compares throughput.
Real World Outcome
A dashboard showing cycles per pixel, memory bandwidth, and FPS for both ISAs.
The Core Question You’re Answering
“How does ISA choice change real-world performance on the same silicon?”
Concepts You Must Understand First
- Instruction set differences
- How do ARM and RISC-V encode instructions?
- Book Reference: “Computer Organization and Design RISC-V” Ch. 1-2
- Cycle counting
- How do you measure cycles reliably?
- Book Reference: RP2350 Datasheet counters section
- Benchmark methodology
- How do you avoid bias?
- Book Reference: Computer Architecture intro
Questions to Guide Your Design
- How will you isolate CPU performance from memory effects?
- Which kernels represent real workloads (blit, fill, memcpy)?
- How will you collect statistically stable results?
Thinking Exercise
“Benchmark Fairness”
- What variables must be identical between ARM and RISC-V runs?
The Interview Questions They’ll Ask
- “Why is a fair benchmark hard to build?”
- “What is the impact of instruction density on flash fetch?”
- “How does FPU presence change graphics workloads?”
- “Why should you disable interrupts during benchmarks?”
Hints in Layers
Hint 1: Start with a simple memcpy benchmark.
Hint 2: Use the same compiler optimization flags.
Hint 3: Lock clocks and disable interrupts during measurement.
Hint 4: Log results over serial to avoid display overhead.
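A minimal harness can run a kernel several times and keep the fastest run, since interference from interrupts or bus traffic only ever adds time. In this sketch `read_ticks` is a placeholder built on the C library `clock()` so it runs on a host. On the board you would replace it with a hardware cycle counter for whichever ISA you are measuring. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* ASSUMPTION: placeholder time source. Replace with a hardware
 * cycle counter on the target. */
static uint32_t read_ticks(void) {
    return (uint32_t)clock();
}

/* Kernel under test: fill a buffer with one RGB565 color. */
void fill_kernel(uint16_t *dst, size_t n, uint16_t color) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = color;
    }
}

/* Run the kernel `runs` times and return the smallest elapsed tick
 * count. Each run uses the run index as the fill color. */
uint32_t bench_min_ticks(uint16_t *buf, size_t n, int runs) {
    uint32_t best = UINT32_MAX;
    for (int i = 0; i < runs; i++) {
        uint32_t t0 = read_ticks();
        fill_kernel(buf, n, (uint16_t)i);
        uint32_t dt = read_ticks() - t0; /* wraps safely (unsigned) */
        if (dt < best) best = dt;
    }
    return best;
}
```

Divide the result by the pixel count to get ticks per pixel, and build both ISA variants from the same source with the same optimization flags.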
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| ISA fundamentals | “Computer Organization and Design RISC-V” | Ch. 1-3 |
| Performance | “Computer Architecture” (Hennessy/Patterson) | Ch. 1 |
Common Pitfalls & Debugging
Problem 1: “Results inconsistent”
- Why: Interrupts or variable clock rates
- Fix: Disable interrupts and fix clocks
Problem 2: “ARM seems much faster”
- Why: Using hardware FPU on ARM, software emulation on RISC-V
- Fix: Compare integer-only kernels for fairness
Problem 3: “Benchmark slows display”
- Why: Measuring while rendering
- Fix: Run benchmarks offscreen and report results later
Definition of Done
- Benchmarks run on both ARM and RISC-V
- Results logged with cycle counts
- Analysis explains differences
- Graphs rendered on LCD with comparison bars
Project 8: TF Card Image Viewer
- Main Programming Language: C
- Alternative Programming Languages: MicroPython
- Coolness Level: Level 3: Impressive
- Business Potential: 3. The “Product” Level
- Difficulty: Level 2: Intermediate
- Knowledge Area: FAT filesystem
- Software or Tool: FAT library
- Main Book: “Operating Systems: Three Easy Pieces”
What you’ll build: An image viewer that reads BMP/RAW images from a TF card and displays them.
Real World Outcome
Insert a TF (microSD) card, choose an image, and see it on the display within seconds.
The Core Question You’re Answering
“How do I stream large assets from storage without running out of RAM?”
Concepts You Must Understand First
- FAT filesystem basics
- How do clusters map to file data?
- Book Reference: OSTEP Ch. 39
- Streaming IO
- How do you read large files in chunks?
- Book Reference: “Making Embedded Systems” Ch. 8
- Pixel format conversion
- How do you map BMP formats to RGB565?
- Book Reference: Graphics references
Questions to Guide Your Design
- Will you decode images on the fly or preconvert?
- How will you cache FAT sectors to reduce IO?
- How will you display loading progress?
Thinking Exercise
“Chunked Load”
- If your buffer is 4 KB, how many reads are needed for a 100 KB image?
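The chunk count is a ceiling division, and a row-at-a-time converter keeps RAM use to a single row. The sketch assumes uncompressed 24-bit BMP data, where each pixel is stored as blue, green, red and each row is padded to a multiple of 4 bytes. BMP rows are also usually stored bottom-up, which this sketch leaves to the caller. The function names are hypothetical.

```c
#include <stdint.h>

/* Number of reads needed to cover file_bytes with a fixed buffer. */
uint32_t chunks_needed(uint32_t file_bytes, uint32_t chunk_bytes) {
    return (file_bytes + chunk_bytes - 1u) / chunk_bytes;
}

/* Bytes per BMP row, including padding to a 4-byte boundary. */
uint32_t bmp_row_stride(uint32_t width_px, uint32_t bits_per_pixel) {
    return ((width_px * bits_per_pixel + 31u) / 32u) * 4u;
}

/* Convert one 24-bit BMP row (B, G, R byte order) to RGB565. */
void bmp24_row_to_rgb565(const uint8_t *src, uint16_t *dst,
                         uint32_t width_px) {
    for (uint32_t i = 0; i < width_px; i++) {
        uint8_t b = src[3u * i + 0u];
        uint8_t g = src[3u * i + 1u];
        uint8_t r = src[3u * i + 2u];
        dst[i] = (uint16_t)(((r & 0xF8) << 8) | ((g & 0xFC) << 3) |
                            (b >> 3));
    }
}
```

A 100 KB image needs 25 reads with a 4 KB buffer. A 172-pixel-wide 24-bit row is 516 bytes, so several rows fit in each chunk.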
The Interview Questions They’ll Ask
- “Why is FAT still used in embedded systems?”
- “What is a cluster in FAT?”
- “How do you stream data with limited RAM?”
- “Why convert images to RGB565 offline?”
Hints in Layers
Hint 1: Use a known FAT library first, then optimize.
Hint 2: Read image rows and send to display as you go.
Hint 3: Cache FAT tables to reduce repeated reads.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Filesystems | “Operating Systems: Three Easy Pieces” | Ch. 39 |
| Embedded IO | “Making Embedded Systems” | Ch. 8 |
Common Pitfalls & Debugging
Problem 1: “Images load slowly”
- Why: Many small reads without caching
- Fix: Use larger reads and cache FAT
Problem 2: “Corrupted image”
- Why: Wrong endianness or pixel format conversion
- Fix: Validate BMP header and color depth
Problem 3: “Out of memory”
- Why: Trying to load full image into RAM
- Fix: Stream line-by-line
Definition of Done
- FAT parsing works for root directory
- Image files load correctly
- UI shows loading progress
- At least 5 images can be displayed in sequence
Project 9: Real-Time System Monitor
- Main Programming Language: C
- Alternative Programming Languages: Rust
- Coolness Level: Level 4: Hardcore
- Business Potential: 2. The “Diagnostics” Level
- Difficulty: Level 2: Intermediate
- Knowledge Area: Timers and metrics
- Software or Tool: ADC + timers
- Main Book: “Making Embedded Systems”
What you’ll build: A system monitor UI that displays CPU utilization, temperature, and FPS in real time.
Real World Outcome
A dashboard with live graphs and numeric stats updating at 10 Hz.
The Core Question You’re Answering
“How do I measure and visualize system health in real time?”
Concepts You Must Understand First
- Timers and tick counters
- How do you measure time accurately?
- Book Reference: “Making Embedded Systems” Ch. 7
- ADC temperature sensor
- How do you convert ADC value to degrees?
- Book Reference: RP2350 Datasheet ADC section
- Graph rendering
- How do you draw graphs efficiently?
- Book Reference: Graphics references
Questions to Guide Your Design
- How will you measure CPU idle time?
- How will you draw graphs efficiently?
- How will you avoid the monitor affecting performance?
Thinking Exercise
“Observer Effect”
- How can measuring CPU usage increase CPU usage?
The Interview Questions They’ll Ask
- “How do you measure CPU utilization on a microcontroller?”
- “What is sampling rate vs accuracy trade-off?”
- “How do you avoid the monitor distorting results?”
- “Why is double buffering useful for graphs?”
Hints in Layers
Hint 1: Use a periodic timer interrupt to sample counters.
Hint 2: Track idle loop cycles to estimate CPU usage.
Hint 3: Draw only changed graph regions (dirty rectangles).
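The idle-counter approach in Hint 2 can be sketched as follows. It assumes you calibrate `idle_max` once by counting idle-loop iterations over one sample window with no other work running. The 8-sample moving average is one simple way to steady noisy ADC readings. All names are hypothetical.

```c
#include <stdint.h>

/* CPU usage in percent from idle-loop iterations in one window.
 * idle_max is the calibrated count for a fully idle window. */
uint32_t cpu_usage_percent(uint32_t idle_count, uint32_t idle_max) {
    if (idle_max == 0) return 0;
    if (idle_count > idle_max) idle_count = idle_max;
    return 100u - (uint32_t)(((uint64_t)idle_count * 100u) / idle_max);
}

/* Moving average over the last 8 samples. Zero-initialize before
 * first use. The result reads low until 8 samples have arrived. */
typedef struct {
    uint32_t sum;
    uint16_t samples[8];
    uint8_t idx;
} avg8_t;

/* Add a sample and return the current average. */
uint16_t avg8_push(avg8_t *a, uint16_t sample) {
    a->sum -= a->samples[a->idx];  /* drop the oldest sample */
    a->samples[a->idx] = sample;
    a->sum += sample;
    a->idx = (uint8_t)((a->idx + 1u) & 7u);
    return (uint16_t)(a->sum / 8u);
}
```

Reset the idle counter at the start of every window, otherwise the usage figure drifts toward 0% as the count keeps growing.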
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Timers | “Making Embedded Systems” | Ch. 7 |
| Debugging | “The Art of Debugging with GDB” | Ch. 5 |
Common Pitfalls & Debugging
Problem 1: “CPU usage shows 0% or 100%”
- Why: Counter not reset properly
- Fix: Reset counters every sampling interval
Problem 2: “Temperature fluctuates wildly”
- Why: ADC noise or incorrect formula
- Fix: Average multiple samples
Problem 3: “Graphs flicker”
- Why: Full-screen redraws
- Fix: Use dirty rectangles
Definition of Done
- CPU usage graph updates smoothly
- Temperature readings are stable
- UI remains responsive under load
- Sampling interval is documented and adjustable
Project 10: Bare-Metal Display Driver - No SDK, Just Registers
- Main Programming Language: C (or Assembly)
- Alternative Programming Languages: Rust (no_std)
- Coolness Level: Level 5: Pure Magic
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 5: Master
- Knowledge Area: Bare-metal programming
- Software or Tool: arm-none-eabi-gcc, linker scripts
- Main Book: “Bare Metal C” by Steve Oualline
What you’ll build: A complete SPI + LCD driver without the Pico SDK. Full control of clocks, pads, SPI, and memory.
Real World Outcome
Display output identical to Project 1, but with no SDK dependency.
The Core Question You’re Answering
“Can I bring up the RP2350 from reset to pixels with only a datasheet?”
Concepts You Must Understand First
- Startup code and linker scripts
- How is memory laid out at boot?
- Book Reference: “Bare Metal C” Ch. 2-3
- Clock and reset registers
- Which registers must be touched first?
- Book Reference: RP2350 Datasheet clock section
- Vector table
- How does the CPU find interrupt handlers?
- Book Reference: Arm Cortex-M docs
Questions to Guide Your Design
- Where will your vector table live?
- How will you configure the stack pointer?
- What is the minimal clock setup for SPI?
Thinking Exercise
“Minimal Boot”
- List the absolute minimum steps to light one pixel without SDK.
The Interview Questions They’ll Ask
- “What is the role of the vector table?”
- “Why is linker script critical in bare metal?”
- “How do you initialize .data and .bss?”
- “What happens if you forget to enable a peripheral clock?”
Hints in Layers
Hint 1: Start from a known minimal linker script.
Hint 2: Copy .data and zero .bss before calling main().
Hint 3: Configure clocks before enabling SPI.
Hint 4: Use memory-mapped register defines from the datasheet.
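The section setup in Hint 2 is two short loops. This sketch takes the section boundaries as parameters so it can be tested on a host. In real startup code the pointers would come from symbols defined in your linker script, and the symbol names shown in the comment are hypothetical.

```c
#include <stdint.h>

/* Copy initialized data from its load address (flash) to its run
 * address (RAM), then zero the .bss range. All ranges are assumed
 * to be 4-byte aligned. */
void init_sections(const uint32_t *data_load, uint32_t *data_start,
                   uint32_t *data_end, uint32_t *bss_start,
                   uint32_t *bss_end) {
    while (data_start < data_end) {
        *data_start++ = *data_load++;
    }
    while (bss_start < bss_end) {
        *bss_start++ = 0;
    }
}

/* In a reset handler this might be called as (hypothetical symbols):
 *
 *   extern uint32_t __data_load[], __data_start[], __data_end[];
 *   extern uint32_t __bss_start[], __bss_end[];
 *   init_sections(__data_load, __data_start, __data_end,
 *                 __bss_start, __bss_end);
 *   main();
 */
```

Skipping either loop produces the “random faults” pitfall: globals with initializers hold garbage, or zero-initialized globals are not zero.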
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Bare-metal startup | “Bare Metal C” | Ch. 1-3 |
| Low-level C | “Effective C” | Ch. 5 |
| Toolchains | “The GNU Make Book” | Ch. 2 |
Common Pitfalls & Debugging
Problem 1: “No boot”
- Why: Stack pointer or vector table wrong
- Fix: Verify linker script addresses
Problem 2: “SPI dead”
- Why: Clocks not enabled
- Fix: Configure clock and reset registers
Problem 3: “Random faults”
- Why: Uninitialized .bss or .data
- Fix: Ensure startup code clears and copies sections
Definition of Done
- System boots without SDK
- Clocks and SPI configured manually
- LCD init sequence works from scratch
- Binary size documented and under 10 KB
Project 11: Simple Game - Pong or Snake
- Main Programming Language: C
- Alternative Programming Languages: Rust, MicroPython
- Coolness Level: Level 4: Hardcore
- Business Potential: 2. The “Fun” Level
- Difficulty: Level 3: Advanced
- Knowledge Area: Game loop + rendering
- Software or Tool: Pico SDK
- Main Book: “Game Programming Patterns” by Robert Nystrom
What you’ll build: A playable Pong or Snake game with button input and smooth animation.
Real World Outcome
A playable game at 60 FPS with score display.
The Core Question You’re Answering
“How do I build a real-time game loop on constrained hardware?”
Concepts You Must Understand First
- Fixed timestep game loop
- Why is fixed timestep more stable?
- Book Reference: “Game Programming Patterns” Ch. 3
- Input debouncing
- Why do buttons bounce?
- Book Reference: “Making Embedded Systems” Ch. 6
- Collision detection
- How do you detect overlaps efficiently?
- Book Reference: Graphics basics
Questions to Guide Your Design
- How will you keep frame time stable?
- How will you represent game state and collisions?
- How will you handle input latency?
Thinking Exercise
“Frame Budget”
- If you target 60 FPS, how many milliseconds per frame do you have?
The Interview Questions They’ll Ask
- “What is a fixed timestep loop?”
- “Why does input debouncing matter?”
- “How do you avoid frame drops?”
- “What is the trade-off between responsiveness and stability?”
Hints in Layers
Hint 1: Start with a 60 FPS timer interrupt.
Hint 2: Separate update() from render().
Hint 3: Use simple collision boxes for sprites.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Game loop | “Game Programming Patterns” | Ch. 3 |
| Input handling | “Making Embedded Systems” | Ch. 6 |
Common Pitfalls & Debugging
Problem 1: “Game speed varies”
- Why: Variable timestep
- Fix: Use fixed timestep and accumulate delta
Problem 2: “Input feels laggy”
- Why: Slow polling
- Fix: Sample input every frame
Problem 3: “Collision misses”
- Why: Objects move too fast per frame
- Fix: Use smaller timestep or swept collisions
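A fixed timestep loop accumulates elapsed time and runs the game logic in equal steps. The sketch below uses a 16667 us step (about 60 Hz) and caps the number of catch-up steps per frame. The cap value, the placeholder `update`, and the other names are assumptions for illustration.

```c
#include <stdint.h>

#define STEP_US 16667u          /* one logic step, about 1/60 s */
#define MAX_STEPS_PER_FRAME 5u  /* cap on catch-up after a stall */

typedef struct {
    uint32_t accumulator_us; /* time not yet consumed by updates */
    uint32_t updates;        /* total logic steps run so far */
} game_clock_t;

/* Placeholder for the real game logic. */
static void update(game_clock_t *c) {
    c->updates++;
}

/* Feed in the time since the last call. Runs zero or more fixed
 * steps and returns how many ran. */
uint32_t game_clock_advance(game_clock_t *c, uint32_t elapsed_us) {
    uint32_t steps = 0;
    c->accumulator_us += elapsed_us;
    while (c->accumulator_us >= STEP_US && steps < MAX_STEPS_PER_FRAME) {
        update(c);
        c->accumulator_us -= STEP_US;
        steps++;
    }
    /* If the cap was hit, discard the remaining whole steps so the
     * game does not spiral further behind. */
    if (c->accumulator_us >= STEP_US) {
        c->accumulator_us %= STEP_US;
    }
    return steps;
}
```

Call `game_clock_advance` once per loop iteration with the measured elapsed time, then render. Game speed stays constant even when the frame time varies.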
Definition of Done
- Game runs at fixed frame rate
- Input debounced correctly
- Score and state updates correctly
- Frame time stays within 16.7 ms budget
Project 12: USB HID Device - Custom Controller
- Main Programming Language: C
- Alternative Programming Languages: Rust
- Coolness Level: Level 4: Hardcore
- Business Potential: 3. The “Product” Level
- Difficulty: Level 4: Expert
- Knowledge Area: USB protocol
- Software or Tool: TinyUSB
- Main Book: “USB Complete” by Jan Axelson
What you’ll build: A USB HID device recognized as a game controller or macro keypad with LCD status display.
Real World Outcome
Host OS recognizes the device as a HID controller; LCD shows status.
The Core Question You’re Answering
“How do I make a microcontroller enumerate as a standard USB device?”
Concepts You Must Understand First
- USB descriptors
- What information does the host need?
- Book Reference: “USB Complete” Ch. 1-4
- HID reports
- How do you pack buttons and axes?
- Book Reference: “USB Complete” Ch. 8
- Endpoint polling
- How often can you send reports?
- Book Reference: USB fundamentals
Questions to Guide Your Design
- What HID report format will you implement?
- How often will you send reports?
- How will you indicate connection state on the LCD?
Thinking Exercise
“Descriptor Design”
- Sketch a HID report descriptor for 4 buttons and 2 axes.
The Interview Questions They’ll Ask
- “What happens during USB enumeration?”
- “What is a HID report descriptor?”
- “Why is polling interval important?”
- “Why does the host control report size?”
Hints in Layers
Hint 1: Start with TinyUSB examples.
Hint 2: Define a simple report: 1 byte buttons, 2 bytes axes.
Hint 3: Use a timer to send reports at a fixed interval.
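The report from Hint 2 (1 byte of buttons, 2 bytes of axes) can be described with a descriptor like the one below. It is a sketch of one valid layout: four button bits, four padding bits, then signed 8-bit X and Y. The array, struct, and function names are hypothetical, and how you register the descriptor depends on your USB stack.

```c
#include <stdint.h>

/* HID report descriptor: 4 buttons + 2 axes, 3-byte input report. */
static const uint8_t hid_report_desc[] = {
    0x05, 0x01, /* Usage Page (Generic Desktop) */
    0x09, 0x05, /* Usage (Game Pad) */
    0xA1, 0x01, /* Collection (Application) */
    0x05, 0x09, /*   Usage Page (Button) */
    0x19, 0x01, /*   Usage Minimum (Button 1) */
    0x29, 0x04, /*   Usage Maximum (Button 4) */
    0x15, 0x00, /*   Logical Minimum (0) */
    0x25, 0x01, /*   Logical Maximum (1) */
    0x75, 0x01, /*   Report Size (1 bit) */
    0x95, 0x04, /*   Report Count (4) */
    0x81, 0x02, /*   Input (Data, Variable, Absolute) */
    0x75, 0x04, /*   Report Size (4 bits) */
    0x95, 0x01, /*   Report Count (1) */
    0x81, 0x03, /*   Input (Constant): pad to a full byte */
    0x05, 0x01, /*   Usage Page (Generic Desktop) */
    0x09, 0x30, /*   Usage (X) */
    0x09, 0x31, /*   Usage (Y) */
    0x15, 0x81, /*   Logical Minimum (-127) */
    0x25, 0x7F, /*   Logical Maximum (127) */
    0x75, 0x08, /*   Report Size (8 bits) */
    0x95, 0x02, /*   Report Count (2) */
    0x81, 0x02, /*   Input (Data, Variable, Absolute) */
    0xC0        /* End Collection */
};

/* In-memory report matching the descriptor, byte for byte. */
typedef struct {
    uint8_t buttons; /* bits 0-3 = buttons 1-4, bits 4-7 = padding */
    int8_t x;
    int8_t y;
} gamepad_report_t;

/* Build a report, masking off the padding bits. */
gamepad_report_t make_report(uint8_t buttons, int8_t x, int8_t y) {
    gamepad_report_t r = { (uint8_t)(buttons & 0x0F), x, y };
    return r;
}
```

The struct size must equal the report length the descriptor implies (3 bytes here). A mismatch is the usual cause of the “Inputs not updating” pitfall.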
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| USB basics | “USB Complete” | Ch. 1-4 |
| HID | “USB Complete” | Ch. 8 |
Common Pitfalls & Debugging
Problem 1: “Device not recognized”
- Why: Descriptor mismatch or incorrect VID/PID
- Fix: Validate descriptor and use known VID/PID for testing
Problem 2: “Inputs not updating”
- Why: Report size mismatch
- Fix: Ensure report length matches descriptor
Problem 3: “Laggy input”
- Why: Polling interval too long
- Fix: Reduce interval while respecting USB limits
Definition of Done
- USB enumerates correctly on Windows/Mac/Linux
- HID reports update in real time
- LCD shows connection status
- Report rate documented and stable
Project 13: Mini Operating System - Cooperative Multitasking
- Main Programming Language: C
- Alternative Programming Languages: Rust
- Coolness Level: Level 5: Pure Magic
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 5: Master
- Knowledge Area: OS design
- Software or Tool: Bare-metal
- Main Book: “Operating Systems: Three Easy Pieces”
What you’ll build: A cooperative multitasking mini-OS with a task manager UI on the LCD.
Real World Outcome
A task manager screen showing multiple tasks, CPU usage, and stack stats.
The Core Question You’re Answering
“How do I design a tiny OS that schedules tasks on a microcontroller?”
Concepts You Must Understand First
- Context switching
- Which registers must be saved and restored?
- Book Reference: OSTEP Ch. 4-6
- Scheduling policies
- Round-robin vs priority
- Book Reference: OSTEP Ch. 7-9
- Stack management
- How much stack does each task need?
- Book Reference: “Effective C” Ch. 5
Questions to Guide Your Design
- How will you store task control blocks?
- How will tasks yield control?
- How will you measure per-task CPU usage?
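One way to approach the CPU usage question is to timestamp every resume and yield and accumulate the difference per task. This is a minimal sketch under assumptions: the type and function names are hypothetical, `MAX_TASKS` is arbitrary, and `now` is any free-running 32-bit counter (on the Pico SDK, `time_us_32()` would be one candidate).

```c
#include <stdint.h>

#define MAX_TASKS 4

typedef struct {
    uint32_t task_ticks[MAX_TASKS]; /* time accumulated by each task      */
    uint32_t total_ticks;           /* time accumulated across all tasks  */
    uint32_t slice_start;           /* counter value at the last resume   */
} cpu_stats_t;

/* Call right after the scheduler switches into a task. */
void cpu_stats_resume(cpu_stats_t *s, uint32_t now)
{
    s->slice_start = now;
}

/* Call when the running task yields. Unsigned subtraction stays correct
 * across counter wraparound. */
void cpu_stats_yield(cpu_stats_t *s, unsigned task, uint32_t now)
{
    if (task >= MAX_TASKS) {
        return;
    }
    uint32_t elapsed = now - s->slice_start;
    s->task_ticks[task] += elapsed;
    s->total_ticks += elapsed;
}

/* Share of accumulated time spent in one task, as a whole percentage. */
unsigned cpu_stats_percent(const cpu_stats_t *s, unsigned task)
{
    if (task >= MAX_TASKS || s->total_ticks == 0) {
        return 0;
    }
    return (unsigned)(((uint64_t)s->task_ticks[task] * 100u) / s->total_ticks);
}
```

The task manager UI could clear the counters once per refresh period so the percentages reflect recent load instead of a lifetime average.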
Thinking Exercise
“Stack Size”
- How much stack does a task need if it calls three nested functions?
The Interview Questions They’ll Ask
- “What is a context switch?”
- “How do you avoid starvation?”
- “What is cooperative vs preemptive scheduling?”
- “Why is stack size allocation critical?”
Hints in Layers
Hint 1: Start with two tasks: idle and display.
Hint 2: Use a simple round-robin scheduler.
Hint 3: Track stack pointer per task and restore on switch.
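The data side of Hints 2 and 3 can be sketched as a fixed array of task control blocks plus a round-robin pick function. All names here are hypothetical. The register save and restore that performs the switch itself is architecture-specific assembly (Cortex-M33 or Hazard3) and is deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_TASKS 4

typedef enum { TASK_UNUSED = 0, TASK_READY, TASK_BLOCKED } task_state_t;

/* Task control block: the minimum state the scheduler keeps per task. */
typedef struct {
    uint32_t    *sp;    /* saved stack pointer, written by the switch code */
    task_state_t state;
    const char  *name;  /* shown in the task manager UI */
} tcb_t;

typedef struct {
    tcb_t  tasks[MAX_TASKS];
    size_t current;     /* index of the task that is running now */
} scheduler_t;

/* Round-robin: scan forward from the slot after the current task, wrapping
 * around, and return the first READY task. The final step of the scan lands
 * on the current task itself, so a lone READY task keeps running. If nothing
 * is READY, the current index is returned unchanged. */
size_t sched_pick_next(const scheduler_t *s)
{
    for (size_t step = 1; step <= MAX_TASKS; ++step) {
        size_t candidate = (s->current + step) % MAX_TASKS;
        if (s->tasks[candidate].state == TASK_READY) {
            return candidate;
        }
    }
    return s->current;
}
```

A `yield()` implementation would call `sched_pick_next()`, store the outgoing task's stack pointer in its `tcb_t`, update `current`, and load the incoming task's saved stack pointer.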
Books That Will Help
| Topic | Book | Chapter |
|-------|------|---------|
| Scheduling | “Operating Systems: Three Easy Pieces” | Ch. 7-9 |
| Low-level C | “Effective C” | Ch. 5 |
Common Pitfalls & Debugging
Problem 1: “Tasks crash after switch”
- Why: Stack pointer not restored correctly
- Fix: Save/restore SP and callee-saved registers
Problem 2: “Scheduler hangs”
- Why: Task never yields
- Fix: Enforce yield points
Problem 3: “Stack overflow”
- Why: Task stack too small
- Fix: Add guard regions and measure max depth
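A common way to measure maximum stack depth is "stack painting": fill each task stack with a known pattern at creation and later count how many words still hold it. The sketch below assumes a descending stack (index 0 is the deepest word), and the pattern value and function names are illustrative choices.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL_PATTERN 0xDEADBEEFu

/* Fill a task stack with the pattern before the task first runs. */
void stack_paint(uint32_t *stack, size_t words)
{
    for (size_t i = 0; i < words; ++i) {
        stack[i] = STACK_FILL_PATTERN;
    }
}

/* Count words at the low (deep) end that still hold the pattern. With a
 * descending stack these are the words the task has never reached. */
size_t stack_unused_words(const uint32_t *stack, size_t words)
{
    size_t unused = 0;
    while (unused < words && stack[unused] == STACK_FILL_PATTERN) {
        ++unused;
    }
    return unused;
}

/* High-water mark: the deepest the stack has ever grown, in words. */
size_t stack_high_water_words(const uint32_t *stack, size_t words)
{
    return words - stack_unused_words(stack, words);
}
```

If `stack_unused_words()` ever returns 0, the task has used its entire allocation and has probably overflowed. Treating the lowest few words as a guard region and checking them at every yield would catch this earlier.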
Definition of Done
- Multiple tasks run and yield cooperatively without corrupting each other's state
- Context switching works for at least 4 tasks
- Task manager UI updates in real time
- Scheduler overhead measured and documented
Sources and References
- RP2350 Datasheet (Raspberry Pi): https://datasheets.raspberrypi.com/rp2350/rp2350-datasheet.pdf
- Raspberry Pi silicon documentation: https://www.raspberrypi.com/documentation/microcontrollers/silicon.html
- Raspberry Pi Pico SDK: https://github.com/raspberrypi/pico-sdk
- Pico SDK hardware APIs (PIO, DMA): https://www.raspberrypi.com/documentation/pico-sdk/hardware.html
- Waveshare RP2350-LCD-1.47-A wiki: https://www.waveshare.com/wiki/RP2350-LCD-1.47-A
- ST7789 datasheet (Sitronix): https://www.newhavendisplay.com/appnotes/datasheets/LCDs/ST7789V.pdf
- Hazard3 RISC-V core: https://github.com/Wren6991/Hazard3
- TinyUSB: https://github.com/hathach/tinyusb
- UF2 bootloader docs: https://www.raspberrypi.com/documentation/microcontrollers/raspberry-pi-pico.html