Learn RP2350 LCD Development: From Zero to Embedded Graphics Master

Goal: Build deep, practical mastery of embedded graphics and real-time systems by programming the RP2350 1.47-inch LCD board end-to-end. You will understand the RP2350 dual-ISA architecture, boot flow, memory map, clocking, GPIO pad control, SPI display protocols, DMA/PIO pipelines, multicore scheduling, and storage/USB I/O. You will be able to design and debug high-performance graphics pipelines, write bare-metal drivers, and build interactive applications on constrained hardware. By the end, you can reason about every byte, cycle, and bus transaction that moves a pixel from your C code onto the screen.

Introduction

RP2350 LCD development is the practice of building real-time, resource-constrained graphics systems on Raspberry Pi’s dual-ISA microcontroller (RP2350) paired with a 1.47-inch SPI LCD. You learn how processors boot, how memory and buses are organized, how pixels are encoded, and how DMA/PIO pipelines eliminate CPU overhead while keeping animations smooth. The Waveshare RP2350-LCD-1.47-A board combines the RP2350 with a 172x320, 262K-color ST7789-based LCD, onboard RGB LED, QSPI flash, and a TF (microSD) slot, making it a compact lab for embedded graphics and I/O.

What you will build (by the end of this guide):

A complete SPI driver and ST7789 LCD initialization stack
A software graphics engine (primitives, sprites, fonts, compositing)
A DMA-driven display pipeline with double buffering
A dual-core rendering system that hits smooth frame rates
A bare-metal display driver without any SDK
A USB HID device and a mini cooperative OS with a task manager UI

Scope (what’s included):

RP2350 silicon architecture, memory map, and boot flow
SPI display programming with ST7789 command sequences
DMA/PIO pipelines, multicore scheduling, and real-time timing
Storage and USB fundamentals for embedded peripherals

Out of scope (for this guide):

Full RTOS integration (FreeRTOS/Zephyr) beyond a conceptual overview
Complex 3D graphics or GPU-based rendering
Hardware board design or PCB layout

The Big Picture (Mental Model)

                  ┌─────────────────────────────────────────────┐
                  │              YOUR APPLICATION               │
                  │  game loop, UI logic, sensors, input, etc.  │
                  └────────────────────────────┬────────────────┘
                                               │
                                               v
┌────────────────────────────────────────────────────────────────────────┐
│                 SOFTWARE GRAPHICS PIPELINE (CPU)                       │
│  draw -> rasterize -> blend -> write framebuffer (RGB565/RGB666)       │
└────────────────────────────┬───────────────────────────────────────────┘
                             │
                             v
┌────────────────────────────────────────────────────────────────────────┐
│                  DMA + SPI + PIO DISPLAY PIPELINE                      │
│  DMA pulls framebuffer -> SPI FIFO -> ST7789 RAM -> LCD pixels         │
└────────────────────────────┬───────────────────────────────────────────┘
                             │
                             v
┌────────────────────────────────────────────────────────────────────────┐
│                    ST7789 DISPLAY CONTROLLER                           │
│   address window + pixel format + refresh = visible pixels             │
└────────────────────────────────────────────────────────────────────────┘

Key Terms You Will See Everywhere

XIP: Execute-In-Place, running code directly from QSPI flash.
PIO: Programmable IO, tiny deterministic state machines for IO.
DREQ: DMA request signal used for pacing peripheral transfers.
MADCTL: ST7789 memory access control (rotation/mirror).
RGB565/RGB666: 16-bit/18-bit pixel formats used by the LCD.

How to Use This Guide

Read the Theory Primer first. It is a mini-book that gives you the mental models needed to make sense of the projects.
Complete the projects in order if you are new to embedded graphics. Each project adds a new concept and a new performance layer.
For each project, use the Thinking Exercise and Design Questions before coding. These prevent false starts.
If you get stuck, use the Hints in Layers section. Each hint reveals slightly more without spoiling the entire solution.
Track mastery using the Definition of Done checklists.

Prerequisites & Background Knowledge

Essential Prerequisites (Must Have)

Programming Skills:

C fundamentals: pointers, structs, arrays, and bitwise operations
Ability to read and write simple Make/CMake-based projects
Comfort with serial logs and basic debugging

Embedded Fundamentals:

GPIO basics (input/output, pull-ups, pin mux)
Basic SPI knowledge (clock, MOSI, CS, data framing)
Interrupts and timers at a conceptual level

Recommended Reading: “Making Embedded Systems, 2nd Edition” by Elecia White - Ch. 1-4

Computer Architecture Basics:

Memory-mapped IO
The difference between RAM and Flash

Recommended Reading: “Code: The Hidden Language of Computer Hardware and Software” by Charles Petzold - Ch. 13-16

Helpful But Not Required

Advanced Topics:

DMA configuration and circular buffers (learn in Project 3)
PIO assembly and state machines (learn in Project 4)
Multicore synchronization and lock-free queues (learn in Project 5)
USB descriptors (learn in Project 12)

Self-Assessment Questions

Can you read a peripheral register value and explain what each bit does?
Can you explain SPI timing (CPOL/CPHA) and why it matters?
Have you ever written to a memory-mapped register in C?
Do you know how to use a logic analyzer or oscilloscope to verify a signal?
Can you explain the difference between blocking and DMA-driven IO?

Development Environment Setup

Required Tools:

A Raspberry Pi RP2350 LCD 1.47” board (Waveshare RP2350-LCD-1.47-A or equivalent)
A USB-C cable (data-capable)
A build machine (Linux/macOS/Windows) with CMake + GCC
Pico SDK and toolchain (arm-none-eabi-gcc or LLVM + pico-sdk)

Recommended Tools:

Logic analyzer (Saleae or cheap 8-channel analyzer)
SWD debugger (Raspberry Pi Debug Probe or ST-Link)
Python 3.11+ (for asset conversion tools)

Testing Your Setup:

$ arm-none-eabi-gcc --version
arm-none-eabi-gcc (GNU Arm Embedded Toolchain) 12.x

$ cmake --version
cmake version 3.22+

Time Investment

Simple projects (1, 2): Weekend (4-8 hours each)
Moderate projects (3, 4, 6, 8, 9): 1-2 weeks each
Complex projects (5, 7, 10, 11, 12): 2-4 weeks each
Final project (13): 1-2 months

Important Reality Check

Embedded graphics is deceptively deep. Your first version will work, your second will be fast, and your third will be clean. Expect to iterate. The real learning happens when you profile, optimize, and debug invisible timing bugs. This guide is structured so that each project teaches a new performance or correctness constraint.

Big Picture / Mental Model

┌────────────────────────────────────────────────────────────────────────┐
│                            INPUT SOURCES                               │
│  GPIO buttons, USB HID, SD card, sensors                               │
└────────────────────────────┬───────────────────────────────────────────┘
                             │
                             v
┌────────────────────────────────────────────────────────────────────────┐
│                         APPLICATION LOGIC                              │
│  game loop, UI state, animation, menus                                 │
└────────────────────────────┬───────────────────────────────────────────┘
                             │
                             v
┌────────────────────────────────────────────────────────────────────────┐
│                      GRAPHICS / RENDERING CORE                         │
│  primitives -> sprites -> fonts -> composition -> framebuffer          │
└────────────────────────────┬───────────────────────────────────────────┘
                             │
                             v
┌────────────────────────────────────────────────────────────────────────┐
│                    DATA MOVEMENT & IO ENGINES                          │
│  DMA + SPI + PIO + interrupts + timers                                 │
└────────────────────────────┬───────────────────────────────────────────┘
                             │
                             v
┌────────────────────────────────────────────────────────────────────────┐
│                     ST7789 LCD CONTROLLER                              │
│  address windows, pixel format, RAM write                              │
└────────────────────────────────────────────────────────────────────────┘

Theory Primer (Read This Before Coding)

Chapter 1: RP2350 Dual-ISA Architecture, Boot Flow, and Security Domains

Fundamentals

The RP2350 is unusual because it can boot either a pair of Arm Cortex-M33 cores or a pair of Hazard3 RISC-V cores. That means the same chip lets you run two different instruction sets, which is an ideal way to learn how ISA choices affect performance, debugging, and toolchains. The boot ROM and bootstrapping logic decide which core pair to enable at startup. This is not just a gimmick: it affects compiler selection, debug tools, and even how you read disassembly and instruction traces in a debugger. The RP2350 also includes security features such as TrustZone support for Arm, OTP memory, and hardware crypto primitives. Understanding secure vs non-secure execution matters because it changes which peripherals and memory regions are accessible. These architectural choices ripple through everything you build in this guide: from how you link code to how you manage memory protection and concurrency.

Deep Dive into the Concept

A dual-ISA microcontroller exposes you to the boundaries between software and silicon. The RP2350 contains two Cortex-M33 cores and two Hazard3 cores, but only one pair is active at a time. This means the boot ROM configures a processor subsystem at boot, then releases the chosen cores from reset. On Arm, the Cortex-M33 brings TrustZone (secure and non-secure states), optional floating-point, and a well-known debug ecosystem (SWD, ETM). On Hazard3, you get a compact 3-stage RV32 core with optional extensions (M, A, C, and bit manipulation sets). That changes code density, interrupt latency, and performance characteristics. For example, compressed instructions in RISC-V reduce flash bandwidth pressure, while ARM’s hardware FPU can accelerate graphics math. A dual-ISA environment is a practical way to compare pipeline depth, ISA encoding, and toolchain maturity in real tasks like drawing sprites or running benchmarks.

Boot flow matters because embedded systems typically start executing from ROM, then configure clocks, copy data into RAM, and finally jump to the main application. If you bypass the SDK, you must set up the vector table, stack pointers, and memory sections yourself. Any mistake shows up as a silent boot failure. On RP2350, the boot ROM is also involved in the USB mass-storage (UF2) boot process, which is why boards appear as a drag-and-drop drive in BOOTSEL mode. Secure boot and OTP features matter in production: you can lock down which code can boot, or store keys that control firmware validation. Even if you are not building a secure product, these features change how you think about trust boundaries, update strategies, and debug access. You must also internalize that peripherals can be assigned to security domains, which influences what a non-secure application can access.

Dual-ISA also means you will maintain two separate build configurations and sometimes two assembly dialects. That is not just busywork: it forces you to understand what your compiler outputs, and how calling conventions, alignment rules, and memory barriers differ. In later projects, you will use these differences to profile graphics pipelines and build a fair performance comparison between ARM and RISC-V. You will also see practical differences in DSP and floating-point support: on Cortex-M33, many math operations map to hardware instructions; on Hazard3, they may be emulated or rely on software libraries. For graphics and signal processing, this can be the difference between a smooth animation and a stuttering one.
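One way to keep a single source tree across both build configurations is to branch on the compiler's predefined macros and to keep a fixed-point fallback for math that may otherwise land in a software floating-point library. The sketch below is illustrative only: `build_isa`, `fx16_t`, `FX_ONE`, and `fx_mul` are hypothetical names, not part of any SDK.

```c
#include <stdint.h>

/* Name the ISA this translation unit was compiled for. __riscv and __arm__
 * are predefined by RISC-V and 32-bit Arm compilers; anything else (such as
 * a desktop build used for unit tests) reports "host". */
const char *build_isa(void) {
#if defined(__riscv)
    return "riscv";
#elif defined(__arm__)
    return "arm";
#else
    return "host";
#endif
}

/* 16.16 fixed-point type: the upper 16 bits hold the integer part. */
typedef int32_t fx16_t;
#define FX_ONE ((fx16_t)1 << 16)

/* Fixed-point multiply. Widening to 64 bits keeps the intermediate product
 * from overflowing before it is scaled back down. */
fx16_t fx_mul(fx16_t a, fx16_t b) {
    return (fx16_t)(((int64_t)a * (int64_t)b) >> 16);
}
```

Calling `fx_mul(FX_ONE / 2, FX_ONE / 2)` yields `FX_ONE / 4`, which is the kind of operation a sprite scaler can use on either core type without depending on an FPU.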

Security is not optional in modern devices. TrustZone gives you two worlds: secure and non-secure. A secure boot process can set up memory regions and then hand off to non-secure code. If you later build USB devices or handle external data (SD card images), separating the parsing logic from privileged firmware becomes a best practice. Even if you do not fully implement this in your projects, understanding the concept will improve your reasoning about firmware upgrades and trust boundaries.

Toolchains are part of the architecture story. ARM uses a mature ecosystem with widespread debugger support, while RISC-V tools are newer and may differ in debug visibility or optimization behavior. In practice, you will notice differences in startup code, linker scripts, and debug probes. This forces you to understand the lower-level details of your firmware, which is precisely the kind of mastery you want from this guide.

How This Fits in Projects

You will use this in Projects 1, 5, 7, 10, and 13 where boot flow, ISA selection, and security domains directly impact startup code and performance.

Definitions & Key Terms

ISA: Instruction Set Architecture, the software-visible contract for a CPU.
TrustZone: Arm security extension that separates secure/non-secure state.
Boot ROM: Read-only boot code that initializes the chip and loads firmware.
OTP: One-time programmable memory used for keys and device identity.
PMP: Physical Memory Protection (RISC-V mechanism for isolation).

Mental Model Diagram

        BOOT ROM
           │
           v
   ┌─────────────────┐
   │ ISA SELECT MUX  │
   └───────┬─────────┘
           │
   ┌───────┴─────────┐
   │                 │
   v                 v
ARM Cortex-M33     Hazard3 RISC-V
   │                 │
   v                 v
Secure/Non-secure   PMP + Privilege

How It Works (Step-by-Step)

Power-on reset releases boot ROM.
Boot ROM reads boot mode and ISA select configuration.
Boot ROM initializes minimal clocks and memory.
Chosen core pair is released from reset.
Vector table and stack pointers are configured.
Application code begins execution.
Optional security configuration partitions memory/peripherals.

Minimal Concrete Example

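// Pseudocode for a minimal startup flow (simplified; helper names are placeholders)
void reset_handler(void) {
  init_stack_pointers();  // on Cortex-M the initial stack pointer comes from the vector table
  init_clocks();          // bring up the oscillator and PLLs
  copy_data_to_ram();     // copy initialized .data from flash to SRAM
  zero_bss();             // clear zero-initialized globals
  main();
  for (;;) { }            // trap here if main() ever returns
}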
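The two memory steps in that flow are small enough to write out. This is a minimal sketch, assuming word-aligned sections; `copy_words` and `zero_words` are hypothetical helpers, and the linker symbol names in the comment are placeholders that must match your own linker script.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy n 32-bit words from the load image (flash) to the run address (SRAM).
 * This is what populates initialized globals before main() runs. */
void copy_words(uint32_t *dst, const uint32_t *src, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[i];
    }
}

/* Clear n 32-bit words, as C requires for zero-initialized globals (.bss). */
void zero_words(uint32_t *dst, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = 0u;
    }
}

/* In firmware the bounds would come from linker-script symbols, for example
 * (hypothetical names):
 *   extern uint32_t __data_load[], __data_start[], __data_end[];
 *   copy_words(__data_start, __data_load, (size_t)(__data_end - __data_start));
 */
```

Because the helpers take plain pointers, you can unit-test them on a desktop with ordinary arrays before trusting them in a reset handler.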

Common Misconceptions

“Both ARM and RISC-V run simultaneously.” (They do not.)
“Boot ROM can be overwritten.” (It is read-only.)
“TrustZone is only for big CPUs.” (It also matters on microcontrollers.)

Check-Your-Understanding Questions

What does the boot ROM do before your code runs?
Why does the ISA choice affect debugging tools?
What is the difference between secure and non-secure execution?

Check-Your-Understanding Answers

It initializes minimal clocks/memory and releases the chosen cores.
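Because the Arm cores are debugged through CoreSight components (with optional ETM trace), while the Hazard3 cores expose a RISC-V debug module, so probe configuration and debugger support differ.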
Secure execution can access protected memory/peripherals; non-secure cannot.

Real-World Applications

Secure firmware update pipelines
Products that support both ARM and RISC-V toolchains
Teaching environments for ISA comparison

Where You Will Apply It

Project 7 (ARM vs RISC-V benchmark)
Project 10 (bare-metal startup code)
Project 13 (mini OS and context switching)

References

RP2350 Datasheet (processor subsystem)
Raspberry Pi silicon documentation
Hazard3 RISC-V core documentation

Key Insight: The RP2350 is a living lab for understanding how ISA, boot flow, and security boundaries shape real firmware.

Summary

Dual-ISA means you control the CPU identity at boot. That affects toolchains, startup code, and security configuration. It is the foundation for everything else you build on this board.

Homework/Exercises

Draw a boot flow diagram for ARM vs RISC-V startup.
Compare two compiled binaries (ARM vs RISC-V) for the same function and count instructions.
Map which peripherals should be secure vs non-secure in a hypothetical product.

Solutions

The flow is identical until the ISA select step, then diverges by toolchain and startup vector table.
ARM often emits fewer instructions for floating-point, while RISC-V benefits from compressed encoding.
Keep USB and storage parsing in non-secure; keep boot validation in secure.

Chapter 2: Memory Map, XIP, SRAM Banking, and Bus Fabric

Fundamentals

Embedded performance is mostly memory performance. On RP2350, code typically runs from external QSPI flash via XIP (Execute-In-Place), while data lives in on-chip SRAM. The memory map defines where ROM, XIP, SRAM, and peripheral registers live. SRAM is split into multiple banks, some striped for bandwidth. The bus fabric (AHB/APB) and crossbar determine how cores, DMA, and peripherals contend for memory. Understanding addresses and bus segments lets you place high-bandwidth buffers in the right memory bank, avoid contention, and explain why a DMA transfer stalls your CPU. This is not just an optimization: a misplaced framebuffer or a badly timed transfer that blocks the bus can drop your frame rate below anything usable.

Deep Dive into the Concept

The RP2350 memory map is layered: ROM at low addresses, XIP flash mapped at 0x10000000, SRAM at 0x20000000, APB peripherals around 0x40000000, and AHB peripherals around 0x50000000. The RP2350 datasheet specifies base addresses for SRAM, DMA, PIO, and USB control, which you will use when writing bare-metal code. SRAM is not a single block: it is divided into multiple banks, and some are striped for throughput. This means sequential addresses might alternate between banks, allowing parallel access and higher throughput. If you place a framebuffer in striped SRAM, both cores and DMA can fetch data more efficiently, reducing display tearing or stutter.

XIP is critical: it allows the CPU to fetch instructions directly from QSPI flash without copying them into RAM. This simplifies boot and conserves RAM, but it also means flash latency is in your critical path whenever a fetch misses the XIP cache. When you draw pixels or update UI state, you want time-critical code in RAM so it never waits on flash. A common strategy is to place your hottest loops (pixel blits, DMA setup) into SRAM using linker sections. You can do this with GCC section attributes or linker script sections, then verify the placement in the map file.
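A section attribute is the usual GCC mechanism for this. The sketch below defines a hypothetical `RAM_FUNC` macro; the section name `.time_critical.fill` is an assumption and only has an effect if your linker script maps that section into SRAM. The Pico SDK ships its own wrapper for the same purpose (`__not_in_flash_func`), which you would normally prefer in SDK projects.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: on an embedded Arm or RISC-V build, place the function
 * in a section that the linker script is assumed to map into SRAM. On any
 * other target the macro expands to nothing so the code still builds. */
#if defined(__arm__) || defined(__riscv)
#define RAM_FUNC __attribute__((noinline, section(".time_critical.fill")))
#else
#define RAM_FUNC
#endif

/* Hot loop: fill a run of RGB565 pixels with one color. */
RAM_FUNC void fill_rgb565(uint16_t *dst, size_t count, uint16_t color) {
    for (size_t i = 0; i < count; i++) {
        dst[i] = color;
    }
}
```

After building for the target, look up `fill_rgb565` in the map file: its address should fall in the SRAM range rather than the XIP range.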

The bus fabric determines which agents can access which regions. APB is for low-bandwidth peripherals and often has longer wait states, while AHB is for higher bandwidth devices like DMA and USB. If you are streaming pixels to SPI using DMA, you are competing with CPU fetches on the same crossbar. A poor placement of buffers can cause bus contention and reduce frame rate. The RP2350 also includes SIO (single-cycle IO) for core-local fast registers (GPIO, FIFOs). This means toggling a GPIO pin can be faster when done via SIO, which is important for timing-critical signals or for debugging with a logic analyzer.

Memory mapping also affects security. Secure/non-secure partitions or PMP regions can restrict memory and peripheral access, which is important for any project that touches USB or external storage. You should understand not only addresses but also access permissions and fetch restrictions (e.g., peripherals are not executable). A memory map is not just a map of bytes; it is a map of capabilities. When you build a bare-metal project, your linker script defines where code and data live. If you place the vector table or .data in the wrong region, the system will crash immediately. This is why memory map mastery is not optional.

Finally, SRAM banking influences DMA behavior. If the DMA controller and the CPU access the same bank, you will see stalls. If you interleave buffers across striped banks, you can often increase effective bandwidth. Many experienced embedded engineers treat SRAM layout like a performance tool: the layout is part of the algorithm.
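Before deciding which banks hold which buffers, it helps to know how much SRAM the buffers need at all. The arithmetic below is a sketch: the 172x320 resolution and RGB565 format come from this guide, the 520 KB SRAM total is an assumption to check against the datasheet, and the function names are hypothetical.

```c
#include <stdint.h>

#define LCD_WIDTH        172u
#define LCD_HEIGHT       320u
#define RGB565_BYTES     2u
#define SRAM_TOTAL_BYTES (520u * 1024u)  /* assumption: total on-chip SRAM */

/* Size of one full framebuffer in bytes. */
uint32_t framebuffer_bytes(uint32_t width, uint32_t height, uint32_t bytes_per_pixel) {
    return width * height * bytes_per_pixel;
}

/* Returns 1 if buffer_count full-frame RGB565 buffers fit while leaving
 * reserve_bytes for stacks, heap, and other data; 0 otherwise. */
int buffers_fit(uint32_t buffer_count, uint32_t reserve_bytes) {
    uint32_t need = buffer_count * framebuffer_bytes(LCD_WIDTH, LCD_HEIGHT, RGB565_BYTES);
    return need + reserve_bytes <= SRAM_TOTAL_BYTES;
}
```

One RGB565 frame is 172 x 320 x 2 = 110,080 bytes, so double buffering costs 220,160 bytes. That fits, but it is a large enough share of SRAM that placement across banks is worth planning rather than leaving to chance.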

How This Fits in Projects

Projects 3, 5, 7, 8, 9, and 10 depend on careful placement of buffers and correct register addresses.

Definitions & Key Terms

XIP: Execute-In-Place flash mapping.
APB/AHB: Bus types for peripheral access with different bandwidth.
Striped SRAM: Memory interleaved across banks for throughput.
SIO: Core-local fast IO region.
Bus contention: Multiple agents competing for the same bus resources.

Mental Model Diagram

Address Space
0x00000000  ROM (boot)
0x10000000  XIP flash (code)
0x20000000  SRAM (data, buffers)
0x40000000  APB peripherals (low bandwidth)
0x50000000  AHB peripherals (DMA, PIO, USB)
0xD0000000  SIO (core-local fast IO)
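When a fault handler or debugger hands you a raw address, a small classifier built from the map above tells you what kind of region it touched. This is a coarse sketch that looks only at the top four address bits; real regions are far smaller than 256 MB, so treat the datasheet as the authority on exact bounds. The `region_t` type and both functions are hypothetical.

```c
#include <stdint.h>

typedef enum {
    REGION_ROM, REGION_XIP, REGION_SRAM,
    REGION_APB, REGION_AHB, REGION_SIO, REGION_UNKNOWN
} region_t;

/* Coarse classification by the top four address bits, following the map above. */
region_t classify_address(uint32_t addr) {
    switch (addr >> 28) {
    case 0x0: return REGION_ROM;
    case 0x1: return REGION_XIP;
    case 0x2: return REGION_SRAM;
    case 0x4: return REGION_APB;
    case 0x5: return REGION_AHB;
    case 0xD: return REGION_SIO;
    default:  return REGION_UNKNOWN;
    }
}

/* Code can be fetched from ROM, XIP flash, and SRAM; peripheral regions are
 * data-only, so a program counter pointing there indicates a bug. */
int region_is_executable(region_t r) {
    return r == REGION_ROM || r == REGION_XIP || r == REGION_SRAM;
}
```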

How It Works (Step-by-Step)

Boot ROM runs from ROM and jumps to flash (XIP).
Code fetches instructions from XIP.
Data accesses go to SRAM or peripherals.
DMA reads from SRAM and writes to peripheral FIFOs.
Bus fabric arbitrates between CPU and DMA.

Minimal Concrete Example

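#include <stdint.h>

#define DMA_BASE  0x50000000u
#define PIO0_BASE 0x50200000u
#define SRAM_BASE 0x20000000u

// Cast to a volatile pointer so the compiler never caches or drops register accesses.
volatile uint32_t *dma = (volatile uint32_t *)(DMA_BASE + 0x000);
// Illustration only: in a real build, let the linker place the framebuffer in SRAM.
volatile uint16_t *framebuffer = (volatile uint16_t *)(SRAM_BASE + 0x0000);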

Common Misconceptions

“All SRAM is equal.” (Banking and striping matter.)
“XIP is as fast as RAM.” (Flash has wait states.)
“Peripherals are executable.” (They are not.)

Check-Your-Understanding Questions

Why might a framebuffer in SRAM8/9 behave differently than SRAM0-3?
What is the benefit of XIP, and what is its cost?
Why do DMA transfers sometimes slow your CPU?

Check-Your-Understanding Answers

SRAM8/9 are non-striped and may have different bandwidth.
XIP saves RAM and boot time but adds flash latency.
DMA shares the bus and can create contention.

Real-World Applications

High-performance display pipelines
Deterministic real-time control loops
Secure partitioning of firmware and data

Where You Will Apply It

Project 3 (DMA display)
Project 10 (bare-metal register map)
Project 13 (mini OS memory layout)

References

RP2350 Datasheet, address map section

Key Insight: Performance bottlenecks on microcontrollers are often memory bottlenecks. Place data intentionally.

Summary

The RP2350 memory map defines not just addresses but performance and access rules. Learn it early and many crashes and slowdowns become quick to diagnose.

Homework/Exercises

Draw a memory map with the base addresses you will use in Projects 10 and 12.
Benchmark a loop in XIP vs SRAM and measure the cycle difference.
Place one buffer in striped SRAM and one in non-striped SRAM, then compare DMA throughput.

Solutions

Use the datasheet base addresses for SRAM, DMA, and PIO.
SRAM loops should show fewer wait states and higher throughput.
The striped buffer should produce higher sustained throughput.

Chapter 3: Clocking, Reset, Power Domains, and GPIO/Pad Control

Fundamentals

Clocks are the heartbeat of embedded systems. The RP2350 has multiple clock sources (crystal oscillator, ring oscillator, PLLs), and each peripheral is clocked separately. Reset logic and power domains control which blocks are active and when. GPIOs are not just pins; they are configurable pads with drive strength, pull-up/down, and function selection. If your display flickers or your SPI bus glitches, the cause is often a misconfigured pad or a peripheral clock set too high. Understanding the clock tree and IO pad configuration makes your system stable, fast, and power-efficient.

Deep Dive into the Concept

The RP2350 includes multiple clock generators (PLLs) that derive system and peripheral clocks from a crystal or internal oscillator. The system clock feeds the CPU cores and bus fabric, while peripheral clocks can be derived or divided independently. This allows you to run the CPU at 150 MHz while keeping the SPI clock within the LCD’s spec. Power domains let you gate off unused blocks to reduce power consumption and noise. In practice, you will use reset and clock control registers to enable a peripheral, then configure its clock divider and source. Skipping this step is a common reason peripherals appear “dead.”

GPIO pads are programmable. Each pin has a function select (e.g., GPIO, SPI, PIO), and each pad has configuration bits for pull-ups, pull-downs, hysteresis, slew rate, and drive strength. SPI signal integrity depends on proper drive strength and slew. The LCD’s CS or DC lines require clean transitions, or commands are misread. When you enable PWM for the backlight, the pad configuration also matters: a weak drive can produce a slow edge and visible flicker.
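Those pad options end up as bit fields in one control register per pin. The sketch below composes such a value; the bit positions (slew at bit 0, Schmitt trigger at bit 1, pull-down at bit 2, pull-up at bit 3, drive strength in bits 5:4, input enable at bit 6) reflect my reading of the pad register layout and are an assumption to verify against the datasheet before use. All names are hypothetical.

```c
#include <stdint.h>

/* Assumed pad control bit positions; confirm against the RP2350 datasheet. */
enum {
    PAD_SLEWFAST = 1u << 0,  /* fast slew rate */
    PAD_SCHMITT  = 1u << 1,  /* input hysteresis */
    PAD_PDE      = 1u << 2,  /* pull-down enable */
    PAD_PUE      = 1u << 3,  /* pull-up enable */
    PAD_IE       = 1u << 6   /* input enable */
};
#define PAD_DRIVE_SHIFT 4u

typedef enum { DRIVE_2MA = 0, DRIVE_4MA = 1, DRIVE_8MA = 2, DRIVE_12MA = 3 } pad_drive_t;

/* Build a pad control value from the options discussed above. */
uint32_t pad_config(pad_drive_t drive, int fast_slew, int pull_up, int input_enable) {
    uint32_t v = (uint32_t)drive << PAD_DRIVE_SHIFT;
    if (fast_slew)    v |= PAD_SLEWFAST;
    if (pull_up)      v |= PAD_PUE;
    if (input_enable) v |= PAD_IE;
    return v;
}
```

For an SPI clock output you might start with `pad_config(DRIVE_4MA, 0, 0, 0)` and raise the drive or slew only if a scope shows edges that are too slow.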

Reset is not just about starting the chip; it’s also about cleaning up between experiments. Many boards have a BOOTSEL mode where the boot ROM presents a USB mass-storage device. This mode is possible because BOOTSEL lives in ROM and cannot be erased. It is your safety net when you misconfigure clocks or break your firmware. Good embedded developers learn to recover quickly from bad configurations and design their firmware so it can always be reflashed.

Clocking also affects DMA and PIO timing. If you set the system clock high but leave a peripheral clock low, you may see underflow errors in SPI or slower DMA transfers. If you set the SPI clock too high, the ST7789 may accept commands but corrupt data, which looks like random pixel noise. The correct approach is to read the LCD datasheet, configure SPI accordingly, and then scale clocks for performance only after functional correctness.
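Picking the SPI clock is a small search problem: find the fastest rate the divider can produce without exceeding the limit you chose from the LCD datasheet. The sketch below assumes a PL022-style divider, where the bit rate is the peripheral clock divided by an even prescaler (2 to 254) times a post-divider (1 to 256). The struct and function names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint32_t prescale;   /* even, 2..254 */
    uint32_t postdiv;    /* 1..256 */
    uint32_t actual_hz;  /* resulting SPI clock (integer-truncated) */
} spi_div_t;

/* Return the divider pair giving the highest SPI clock that does not exceed
 * max_hz. If even the largest divider is too fast, the slowest setting is
 * returned. */
spi_div_t spi_pick_divider(uint32_t clk_peri_hz, uint32_t max_hz) {
    spi_div_t best = { 254u, 256u, clk_peri_hz / (254u * 256u) };
    for (uint32_t pre = 2u; pre <= 254u; pre += 2u) {
        for (uint32_t post = 1u; post <= 256u; post++) {
            uint32_t hz = clk_peri_hz / (pre * post);
            if (hz <= max_hz && hz > best.actual_hz) {
                best.prescale = pre;
                best.postdiv = post;
                best.actual_hz = hz;
            }
        }
    }
    return best;
}
```

With a 150 MHz peripheral clock and a 62.5 MHz limit, the result is 37.5 MHz, because the total divider is always even: dividing by 2 gives 75 MHz, which is too fast, so the next option is dividing by 4. This is why the achieved clock can sit well below the limit you asked for.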

Another subtlety is clock domain crossing. Some peripherals run on separate clocks, and signals crossing domains can introduce metastability if not handled correctly. While the hardware handles most of this for you, you will see the effects in timing jitter or unstable UART outputs if clocks are misconfigured. In real-time systems, consistent timing is more important than maximum speed. That is why many embedded engineers stabilize the system first, then increase clock rates gradually while measuring signal integrity.

How This Fits in Projects

Projects 1-4 depend on correct clocking and pad setup. Projects 8-9 also rely on power domain control.

Definitions & Key Terms

PLL: Phase-locked loop for generating high-frequency clocks.
Pad control: Electrical configuration of a GPIO pin.
BOOTSEL: Boot mode for UF2 mass storage programming.
Power domain: A block that can be turned on/off independently.
Clock divider: A register that divides a clock to a lower frequency.

Mental Model Diagram

Crystal/Ring OSC -> PLLs -> System Clock -> CPU/BUS
                                 │
                                 ├-> SPI Clock Divider -> SPI
                                 ├-> PWM Clock Divider -> Backlight
                                 └-> PIO Clock Divider -> PIO SM

How It Works (Step-by-Step)

Enable oscillator or external crystal.
Configure PLLs to generate system and USB clocks.
Set clock dividers for SPI, PWM, PIO.
Release resets for peripherals.
Configure pin mux and pad controls.

Minimal Concrete Example

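// Pseudocode for enabling a peripheral clock (helper and constant names are placeholders)
reset_clear(RESETS_SPI0);                      // take the SPI block out of reset
clock_configure(CLK_SPI0, SRC_PLL_SYS, 1, 4);  // source from the system PLL, divide by 4
pad_set_drive_strength(GPIO10, DRIVE_4MA);     // set the pad drive strength for the SPI pin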

Common Misconceptions

“If code compiles, clocks must be correct.” (Clocks are runtime configuration.)
“GPIO is always push-pull by default.” (Pads may default to high-impedance.)
“Faster clocks are always better.” (Signal integrity and power can suffer.)

Check-Your-Understanding Questions

Why do you need to enable the SPI clock before using SPI?
How does drive strength affect signal integrity?
Why is BOOTSEL mode a safety mechanism?

Check-Your-Understanding Answers

The peripheral is held in reset (and its clock may not be running) until you enable it, so it will not respond.
Too weak or too strong drive can cause slow edges or ringing.
BOOTSEL uses ROM code and cannot be erased, enabling recovery.

Real-World Applications

Low-power wearable devices
Reliable high-speed SPI sensors
Robust firmware recovery strategies

Where You Will Apply It

Project 1 (SPI LCD bring-up)
Project 4 (PWM and LED control)
Project 10 (bare-metal clock setup)

References

RP2350 Datasheet (clocks/resets/pads)
Raspberry Pi Pico documentation for BOOTSEL/UF2

Key Insight: Most “mystery bugs” in embedded graphics are clock or pad configuration bugs.

Summary

Clocking and pad configuration define stability, performance, and power. Learn them once and every later bring-up goes faster.

Homework/Exercises

Draw a clock tree for your board with all peripheral dividers.
Experiment with SPI clock speed and measure corruption thresholds.
Change pad drive strength and observe signal shape on a scope.

Solutions

Use the system clock as root and add branch dividers for peripherals.
You will see pixel corruption beyond the LCD’s max SPI clock.
Too weak drive yields slow edges; too strong yields ringing.

Chapter 4: SPI and the ST7789 Display Controller

Fundamentals

The LCD is controlled by an ST7789 display controller over SPI. SPI is a synchronous serial protocol with a clock line, a data line, and a chip select (CS); the LCD adds a data/command (DC) pin that tells the controller whether each byte is a command or a parameter. The ST7789 expects a specific initialization sequence and a stream of pixel data. You must send commands to set column/row address windows, configure pixel format (RGB565 or RGB666), and then stream pixel data with a memory write command. If you get the command order or timing wrong, the display will stay blank or show corrupted colors.

Deep Dive into the Concept

The ST7789 is not just a display; it is a full controller with internal RAM, address counters, and a programmable refresh engine. SPI is used to write commands and data, which are latched into the controller and then mapped into the display’s frame memory. The sequence matters: typically you assert CS, set DC low to send a command byte, then set DC high to send data bytes. Commands like CASET (column address set) and RASET (row address set) configure the window in RAM where subsequent pixel data will land. The RAMWR command starts a memory write; the following bytes are interpreted as pixel data. The pixel format is configured with the COLMOD command (0x3A), which selects RGB565 or RGB666 packing. The ST7789 datasheet also defines memory access control (MADCTL, 0x36) for orientation, row/column order, and RGB/BGR ordering. This is critical for rotation and for matching the LCD’s physical wiring.
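The command/data framing described above can be sketched without hardware by recording each byte together with the DC level it was sent under. In this sketch `bus_write` is a hypothetical stand-in: on a real board it would drive the DC GPIO and push the byte into the SPI peripheral. The `lcd_cmd` and `lcd_data16` names mirror the helpers used elsewhere in this guide.

```c
#include <stddef.h>
#include <stdint.h>

/* One recorded bus transfer: the DC level (0 = command, 1 = data) and the byte. */
typedef struct { uint8_t dc; uint8_t byte; } bus_event_t;

#define BUS_LOG_MAX 64
bus_event_t bus_log[BUS_LOG_MAX];
size_t bus_log_len = 0;

/* Host-side stand-in for "set DC, then shift one byte out over SPI". */
void bus_write(uint8_t dc, uint8_t byte) {
    if (bus_log_len < BUS_LOG_MAX) {
        bus_log[bus_log_len].dc = dc;
        bus_log[bus_log_len].byte = byte;
        bus_log_len++;
    }
}

void lcd_cmd(uint8_t cmd) { bus_write(0, cmd); }   /* DC low: command byte */
void lcd_data8(uint8_t d) { bus_write(1, d); }     /* DC high: parameter or pixel byte */

/* 16-bit parameters go out high byte first. */
void lcd_data16(uint16_t v) {
    lcd_data8((uint8_t)(v >> 8));
    lcd_data8((uint8_t)(v & 0xFFu));
}
```

Keeping the DC decision inside two tiny functions means the rest of the driver never touches the pin directly, which makes a wrong DC level much easier to find with a logic analyzer.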

SPI timing matters. The LCD has a maximum clock frequency; exceeding it can produce flicker or random color corruption. The ST7789 can accept both 3-line and 4-line serial protocols; most boards use 4-line SPI with a separate DC pin. The LCD’s 172x320 resolution is not the default 240x320 used in many ST7789 panels, so you must configure the correct address window and offsets. Many LCD modules have hidden pixels or offsets; you will discover these by experimenting with the window ranges and observing where pixels land.
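A driver usually hides the panel offset behind one function that converts screen coordinates into the bytes that follow CASET and RASET. The sketch below assumes the 172 visible columns are centered in a 240-column controller RAM, giving an X offset of (240 - 172) / 2 = 34, and a Y offset of 0. Both values are assumptions to confirm against the board documentation; `window_params` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define PANEL_X_OFFSET 34u  /* assumption: (240 - 172) / 2, verify for your board */
#define PANEL_Y_OFFSET 0u   /* assumption */

/* Convert an inclusive screen-space rectangle into the four parameter bytes
 * for CASET and the four for RASET (start and end, each high byte first). */
void window_params(uint16_t x0, uint16_t y0, uint16_t x1, uint16_t y1,
                   uint8_t caset[4], uint8_t raset[4]) {
    assert(x0 <= x1 && y0 <= y1);
    uint16_t xs = (uint16_t)(x0 + PANEL_X_OFFSET);
    uint16_t xe = (uint16_t)(x1 + PANEL_X_OFFSET);
    uint16_t ys = (uint16_t)(y0 + PANEL_Y_OFFSET);
    uint16_t ye = (uint16_t)(y1 + PANEL_Y_OFFSET);
    caset[0] = (uint8_t)(xs >> 8); caset[1] = (uint8_t)(xs & 0xFFu);
    caset[2] = (uint8_t)(xe >> 8); caset[3] = (uint8_t)(xe & 0xFFu);
    raset[0] = (uint8_t)(ys >> 8); raset[1] = (uint8_t)(ys & 0xFFu);
    raset[2] = (uint8_t)(ye >> 8); raset[3] = (uint8_t)(ye & 0xFFu);
}
```

For the full screen, (0, 0) to (171, 319), this produces column bytes 00 22 00 CD and row bytes 00 00 01 3F under the stated offsets. If your image appears shifted, these two constants are the first thing to adjust.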

The ST7789’s memory is larger than the visible screen, and the controller maps a subset to the visible pixels. That is why you might see a (0,0) pixel not actually appear in the top-left corner unless you apply an offset. This offset is board-specific and should be documented in the board’s wiki or sample drivers. Once you know it, you can abstract it into your display driver. The ST7789 also supports partial updates: you can set an address window to a small rectangle and update only that region. This is a powerful optimization for UI rendering, reducing SPI bandwidth and CPU load. It is the foundation for fast text rendering and small animations.
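The payoff of partial updates is easy to estimate with arithmetic. The sketch below computes how long the pixel data alone takes to clock out; it ignores command bytes and gaps between transfers, and the 40 MHz SPI clock used in the note that follows is an illustrative assumption, not a board specification. Function names are hypothetical.

```c
#include <stdint.h>

/* Bytes of RGB565 pixel data for a w x h rectangle. */
uint32_t rect_bytes(uint32_t w, uint32_t h) {
    return w * h * 2u;
}

/* Microseconds needed to clock `bytes` out at spi_hz (8 bits per byte).
 * The 64-bit intermediate avoids overflow for full-frame sizes. */
uint32_t transfer_us(uint32_t bytes, uint32_t spi_hz) {
    return (uint32_t)(((uint64_t)bytes * 8u * 1000000u) / spi_hz);
}
```

At 40 MHz, a full 172x320 frame (110,080 bytes) takes about 22,016 microseconds, which caps full-frame updates at roughly 45 per second. A 40x16 text cell (1,280 bytes) takes about 256 microseconds, so updating only what changed frees most of the bus.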

The ST7789 command set also includes sleep mode, inversion, gamma adjustment, and frame rate control. In most projects you will use a standard initialization sequence that sets power, gamma, pixel format, and display on commands. But advanced projects can tweak gamma curves to improve contrast or change frame rate settings to balance power vs smoothness. If you plan to run on battery, these settings matter.
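Initialization sequences are easier to maintain as a data table than as a long run of function calls. The sketch below uses standard command codes (SWRESET 0x01, SLPOUT 0x11, COLMOD 0x3A, MADCTL 0x36, INVON 0x21, NORON 0x13, DISPON 0x29) but is deliberately incomplete: it omits the vendor-specific power and gamma settings, the delays are conservative guesses, and whether inversion is needed depends on the panel. The three `lcd_*` functions here are host-side stubs that only count traffic; on hardware they would drive SPI, the DC pin, and a timer.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t  cmd;
    uint8_t  nargs;
    uint8_t  args[4];
    uint16_t delay_ms;   /* wait after the command, 0 for none */
} lcd_init_step_t;

static const lcd_init_step_t st7789_init[] = {
    { 0x01, 0, {0},    150 },  /* SWRESET: software reset */
    { 0x11, 0, {0},    120 },  /* SLPOUT: leave sleep mode */
    { 0x3A, 1, {0x55},   0 },  /* COLMOD: 16 bits per pixel (RGB565) */
    { 0x36, 1, {0x00},   0 },  /* MADCTL: default orientation, RGB order */
    { 0x21, 0, {0},      0 },  /* INVON: assumption, panel-dependent */
    { 0x13, 0, {0},      0 },  /* NORON: normal display mode */
    { 0x29, 0, {0},     20 },  /* DISPON: display on */
};

/* Host-side stubs that count what would be sent. */
unsigned sent_cmds = 0;
unsigned sent_data = 0;
uint32_t waited_ms = 0;
void lcd_cmd(uint8_t c)        { (void)c; sent_cmds++; }
void lcd_data8(uint8_t d)      { (void)d; sent_data++; }
void lcd_delay_ms(uint16_t ms) { waited_ms += ms; }

/* Walk the table: command, then its parameters, then any required delay. */
void lcd_run_init(void) {
    for (size_t i = 0; i < sizeof st7789_init / sizeof st7789_init[0]; i++) {
        const lcd_init_step_t *s = &st7789_init[i];
        lcd_cmd(s->cmd);
        for (uint8_t a = 0; a < s->nargs; a++) {
            lcd_data8(s->args[a]);
        }
        if (s->delay_ms) {
            lcd_delay_ms(s->delay_ms);
        }
    }
}
```

Adding a gamma or frame-rate tweak then becomes one more table row, and the total startup delay is visible at a glance.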

How This Fits in Projects

Projects 1-3, 6, 8, 11 rely directly on ST7789 command sequences and SPI timing.

Definitions & Key Terms

CASET (0x2A): Column Address Set command.
RASET (0x2B): Row Address Set command.
RAMWR (0x2C): Start memory write.
COLMOD (0x3A): Pixel format selection.
MADCTL (0x36): Memory access control (rotation and RGB/BGR).
DC pin: Data/Command select pin on 4-wire SPI LCDs.

Mental Model Diagram

CPU -> SPI -> [ST7789 Command Parser] -> [Address Window] -> [Display RAM] -> LCD
            DC=0 (cmd)     DC=1 (data)

How It Works (Step-by-Step)

Reset LCD and wait for stable power.
Send initialization command sequence.
Set pixel format (RGB565 or RGB666).
Define address window with CASET/RASET.
Send RAMWR and stream pixel data.
Repeat for partial updates.
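
The first three steps can be sketched on a host machine by recording every byte together with its DC level, much as a logic analyzer would. The command opcodes (SWRESET 0x01, SLPOUT 0x11, COLMOD 0x3A, MADCTL 0x36, DISPON 0x29) come from the ST7789 command set. Everything else is an assumption for illustration: the `bus_trace` recorder, the helper names, the delay values, and the choice to stop at a bare minimum. Real panels usually need additional power and gamma commands from the vendor's init sequence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Host-side stand-in for the SPI bus: stores each byte with its DC level.
 * On hardware, bus_write() would set the DC GPIO and send the byte over SPI. */
typedef struct { bool dc; uint8_t byte; } bus_event;
typedef struct { bus_event ev[32]; size_t n; uint32_t delay_ms; } bus_trace;

void bus_write(bus_trace *t, bool dc, uint8_t byte)
{
    assert(t->n < sizeof t->ev / sizeof t->ev[0]);
    t->ev[t->n].dc = dc;
    t->ev[t->n].byte = byte;
    t->n++;
}

/* Accumulates requested delays instead of sleeping. */
void bus_delay_ms(bus_trace *t, uint32_t ms) { t->delay_ms += ms; }

void lcd_cmd(bus_trace *t, uint8_t cmd) { bus_write(t, false, cmd); } /* DC low  */
void lcd_data8(bus_trace *t, uint8_t d) { bus_write(t, true, d); }    /* DC high */

/* Minimal bring-up order: reset, wake, pixel format, orientation, display on.
 * Delay values are conservative guesses; check the datasheet timing table. */
void st7789_init_minimal(bus_trace *t)
{
    lcd_cmd(t, 0x01); bus_delay_ms(t, 150); /* SWRESET */
    lcd_cmd(t, 0x11); bus_delay_ms(t, 120); /* SLPOUT  */
    lcd_cmd(t, 0x3A); lcd_data8(t, 0x55);   /* COLMOD: 16 bits per pixel */
    lcd_cmd(t, 0x36); lcd_data8(t, 0x00);   /* MADCTL: default orientation */
    lcd_cmd(t, 0x29);                       /* DISPON  */
}
```

Comparing a real logic-analyzer capture against a trace like this is a quick way to confirm that DC is low for exactly the command bytes and high for exactly the parameter bytes.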

Minimal Concrete Example

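// x0..x1 and y0..y1 are inclusive and must already include any panel RAM offset.
lcd_cmd(0x2A); // CASET: column start, column end
lcd_data16(x0); lcd_data16(x1);
lcd_cmd(0x2B); // RASET: row start, row end
lcd_data16(y0); lcd_data16(y1);
lcd_cmd(0x2C); // RAMWR: pixel data follows
lcd_data_pixels(buf, count); // count = (x1 - x0 + 1) * (y1 - y0 + 1) pixels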

Common Misconceptions

“SPI is just data; DC does not matter.” (DC distinguishes commands.)
“A full-frame update is always required.” (Partial windows are faster.)
“Color errors mean bad framebuffer.” (Often pixel format mismatch.)

Check-Your-Understanding Questions

Why must CASET and RASET be sent before RAMWR?
How does COLMOD affect pixel packing?
Why might the visible image be shifted on the panel?

Check-Your-Understanding Answers

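RAMWR writes into whatever address window is currently set, so CASET and RASET must define that window first; otherwise the pixels land in the previous (or default) region.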
COLMOD tells the controller how many bits per pixel are sent.
Because the panel may use a non-zero RAM offset.

Real-World Applications

Smartwatch and wearable UIs
Industrial status panels
Handheld measurement tools

Where You Will Apply It

Project 1 (bring-up)
Project 2 (drawing primitives)
Project 3 (DMA pipeline)

References

ST7789 datasheet (command set)
Waveshare RP2350 LCD board documentation

Key Insight: The LCD is a tiny computer. Treat it like a device with its own memory and rules, not just a “screen.”

Summary

SPI display programming is about command sequences, timing, and pixel formats. Master those and you control every pixel.

Homework/Exercises

Implement a partial update API for a 10x10 region.
Verify with a logic analyzer that DC toggles correctly.
Add a function that rotates the display using MADCTL.

Solutions

Send CASET/RASET for the small region and stream only that data.
DC should be low for command bytes and high for data bytes.
Set the MADCTL rotation bits and update your coordinate mapping.

Chapter 5: Pixel Formats, Framebuffers, and Graphics Rendering

Fundamentals

Pixels are just bits in memory. The ST7789 can accept RGB565 (16-bit) or RGB666 (18-bit). RGB565 is the most common because it needs two bytes per pixel on the wire instead of three, cutting bandwidth by a third, and it fits neatly into 16-bit buffers. A framebuffer is a contiguous region of memory representing the pixels you want to show. Rendering is the process of turning shapes, fonts, and sprites into pixel values in that buffer. You need to understand coordinate systems, stride (bytes per row), clipping, and blending. If you want smooth animations, you must control when and how you update the framebuffer so you avoid tearing.

Deep Dive into the Concept

A framebuffer is conceptually simple but full of traps. For a 172x320 display, an RGB565 framebuffer requires 172 * 320 * 2 bytes, which is ~110 KB. That is a large fraction of RP2350 SRAM. If you use double buffering, you need ~220 KB. This is possible but forces you to be careful about where the buffers live and what else shares SRAM. You can also use partial buffers (scanline or tile buffers) and update the display in chunks to save memory. This is a classic trade-off: memory vs CPU time.
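
The memory budget is easy to check with a few lines of arithmetic. The sketch below computes stride and buffer size and expresses the result as a percentage of on-chip SRAM. The helper names are hypothetical, and the 520 KB figure is taken from the RP2350 datasheet's headline SRAM size.

```c
#include <assert.h>
#include <stdint.h>

#define RP2350_SRAM_BYTES (520u * 1024u) /* on-chip SRAM, per the datasheet */

/* Bytes per row of pixels. */
uint32_t fb_stride_bytes(uint32_t width_px, uint32_t bytes_per_pixel)
{
    return width_px * bytes_per_pixel;
}

/* Bytes for a buffer covering `rows` rows (use 1 for a scanline buffer). */
uint32_t fb_bytes(uint32_t width_px, uint32_t rows, uint32_t bytes_per_pixel)
{
    assert(rows > 0);
    return fb_stride_bytes(width_px, bytes_per_pixel) * rows;
}

/* Whole-number percentage of SRAM consumed (rounded down). */
uint32_t percent_of_sram(uint32_t bytes)
{
    return (uint32_t)(((uint64_t)bytes * 100u) / RP2350_SRAM_BYTES);
}
```

With these numbers, a single RGB565 framebuffer is 110,080 bytes (about 20% of SRAM), two are 220,160 bytes (about 41%), and a 16-row tile buffer is only 5,504 bytes. That gap is the memory side of the memory-versus-CPU-time trade-off.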

Color formats encode RGB values into limited bits. RGB565 uses 5 bits red, 6 bits green, 5 bits blue. That means you must convert 8-bit RGB values by shifting and masking. If you do not align correctly, your colors will look wrong. RGB666 uses 18 bits and allows smoother gradients but requires 3 bytes per pixel, which increases bandwidth. In practice, RGB565 is the sweet spot for most microcontroller graphics. If you need high color fidelity for photos, you might still choose RGB666 and accept lower frame rates.
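
Packing is only half of the story; byte order on the wire matters too. The sketch below shows how each format might be serialized for SPI. It assumes the controller expects the high byte of an RGB565 pixel first, and that the 18-bit serial format carries each 6-bit channel in the upper bits of its own byte. Both assumptions, and the helper names, should be checked against the ST7789 datasheet's interface section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RGB565 on the wire: high byte first. A little-endian uint16_t framebuffer
 * sent as raw bytes would go out low byte first, so the order must be fixed
 * either here or by using 16-bit SPI transfers. */
void rgb565_to_wire(uint16_t c, uint8_t out[2])
{
    assert(out != NULL);
    out[0] = (uint8_t)(c >> 8);
    out[1] = (uint8_t)(c & 0xFFu);
}

/* RGB666 on the wire (assumed format): three bytes, each keeping the top
 * 6 bits of an 8-bit channel and zeroing the low 2 bits. */
void rgb666_to_wire(uint8_t r, uint8_t g, uint8_t b, uint8_t out[3])
{
    assert(out != NULL);
    out[0] = (uint8_t)(r & 0xFCu);
    out[1] = (uint8_t)(g & 0xFCu);
    out[2] = (uint8_t)(b & 0xFCu);
}
```

If pure red (0xF800) shows up as a dim blue-green, swapped bytes are a likely cause, and this is the first place to look.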

Rendering primitives (lines, rectangles, circles) are algorithms that touch many pixels. You should learn Bresenham’s algorithm for lines and midpoint algorithms for circles to avoid floating point. Sprites are just bitmaps with a transparent color or alpha mask. Font rendering is similar: fonts are arrays of bits or bytes that map glyphs into pixels. You will likely pre-render fonts into bitmaps and blit them into the framebuffer. The efficiency of these operations determines your frame rate.
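
As a concrete instance of integer-only rasterization, here is a minimal sketch of Bresenham's line algorithm drawing into an RGB565 framebuffer. The `fb_t` type and function names are hypothetical. Clipping is done per pixel in `fb_plot`, which is simple and safe but not the fastest approach; a production renderer would more likely clip the endpoints first.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct { uint16_t *px; int w, h; } fb_t;

/* Write one pixel, silently discarding anything outside the buffer. */
void fb_plot(fb_t *fb, int x, int y, uint16_t c)
{
    if (x < 0 || y < 0 || x >= fb->w || y >= fb->h) return; /* clip */
    fb->px[y * fb->w + x] = c;
}

/* Integer Bresenham line covering all octants, endpoints inclusive. */
void fb_line(fb_t *fb, int x0, int y0, int x1, int y1, uint16_t c)
{
    assert(fb && fb->px);
    int dx = abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
    int dy = -abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
    int err = dx + dy; /* combined error term for both axes */

    for (;;) {
        fb_plot(fb, x0, y0, c);
        if (x0 == x1 && y0 == y1) break;
        int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; } /* step in x */
        if (e2 <= dx) { err += dx; y0 += sy; } /* step in y */
    }
}
```

Calling `fb_line` with endpoints outside the screen is safe here because every write goes through `fb_plot`.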

Double buffering is critical for smooth animations. If you draw directly to the buffer while DMA is sending it to the display, you will see tearing. With double buffering, you draw into a back buffer while the front buffer is being transmitted. When DMA finishes, you swap the pointers. This requires careful synchronization and a swap protocol that is safe across cores. An alternative is dirty rectangles: track which regions changed and update only those. Dirty rectangles are complex but reduce bandwidth and can outperform full-frame double buffering on small changes.
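
The simplest form of dirty-rectangle tracking keeps a single bounding box that grows to cover every change in a frame. The sketch below illustrates that variant with hypothetical names. It is deliberately coarse: two small changes in opposite corners produce one large box, so real implementations often keep a short list of rectangles instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bounding box of everything that changed this frame (inclusive corners). */
typedef struct { int x0, y0, x1, y1; bool empty; } dirty_rect;

static int min_i(int a, int b) { return a < b ? a : b; }
static int max_i(int a, int b) { return a > b ? a : b; }

void dirty_reset(dirty_rect *d)
{
    d->x0 = d->y0 = d->x1 = d->y1 = 0;
    d->empty = true;
}

/* Grow the box to include the w-by-h rectangle whose top-left is (x, y). */
void dirty_add(dirty_rect *d, int x, int y, int w, int h)
{
    assert(w > 0 && h > 0);
    int x1 = x + w - 1, y1 = y + h - 1;
    if (d->empty) {
        d->x0 = x; d->y0 = y; d->x1 = x1; d->y1 = y1;
        d->empty = false;
        return;
    }
    d->x0 = min_i(d->x0, x);  d->y0 = min_i(d->y0, y);
    d->x1 = max_i(d->x1, x1); d->y1 = max_i(d->y1, y1);
}

/* Bytes that must be sent to refresh the box. */
uint32_t dirty_bytes(const dirty_rect *d, uint32_t bytes_per_pixel)
{
    if (d->empty) return 0;
    return (uint32_t)(d->x1 - d->x0 + 1) * (uint32_t)(d->y1 - d->y0 + 1)
           * bytes_per_pixel;
}
```

At the end of each frame you would set the address window to the box, stream only those rows, and call `dirty_reset`. Comparing `dirty_bytes` with the 110,080-byte full frame gives a direct measure of the bandwidth saved.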

Clipping is another subtlety. Drawing algorithms must be clipped to the visible region or you will write outside the buffer, which can corrupt memory. You also need to understand coordinate transforms for rotation or UI layouts. The ST7789’s MADCTL can rotate the display, but your rendering coordinates must follow that rotation. A robust renderer has a clear coordinate system and consistent transforms. Advanced renderers also support alpha blending, which requires reading the destination pixel, blending with a source pixel, and writing back. This is expensive on microcontrollers, so you often use 1-bit masks or preblended sprites.
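
The read-blend-write cost of alpha blending is easier to judge with the arithmetic in front of you. The sketch below blends one RGB565 source pixel over one destination pixel with an 8-bit alpha. The function names are hypothetical, and the per-channel divide by 255 is chosen for clarity rather than speed; optimized blenders usually replace it with shifts.

```c
#include <assert.h>
#include <stdint.h>

/* Blend one channel: alpha = 255 returns s, alpha = 0 returns d.
 * The +127 rounds to nearest instead of truncating. */
static uint32_t blend_channel(uint32_t s, uint32_t d, uint32_t alpha)
{
    assert(alpha <= 255u);
    return (s * alpha + d * (255u - alpha) + 127u) / 255u;
}

/* Blend an RGB565 source pixel over an RGB565 destination pixel. */
uint16_t rgb565_blend(uint16_t src, uint16_t dst, uint8_t alpha)
{
    /* Unpack the 5-6-5 channels. */
    uint32_t sr = (src >> 11) & 0x1Fu, sg = (src >> 5) & 0x3Fu, sb = src & 0x1Fu;
    uint32_t dr = (dst >> 11) & 0x1Fu, dg = (dst >> 5) & 0x3Fu, db = dst & 0x1Fu;

    uint32_t r = blend_channel(sr, dr, alpha);
    uint32_t g = blend_channel(sg, dg, alpha);
    uint32_t b = blend_channel(sb, db, alpha);

    return (uint16_t)((r << 11) | (g << 5) | b); /* repack */
}
```

Each blended pixel costs a framebuffer read, three multiplies, three divides, and a write. Multiplied across a sprite, that is why 1-bit masks and preblended assets are attractive on a microcontroller.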

How This Fits in Projects

Projects 2, 5, 6, 8, 11 are all rendering-heavy.

Definitions & Key Terms

Framebuffer: Memory region holding pixel data.
Stride: Bytes per row in a framebuffer.
RGB565: 16-bit pixel format (5-6-5 bits).
Double buffering: Two framebuffers for tear-free updates.
Dirty rectangle: A region that changed and needs re-rendering.

Mental Model Diagram

[Render API] -> [Rasterizer] -> [Framebuffer] -> DMA -> LCD
                 (lines, text, sprites)

How It Works (Step-by-Step)

Clear framebuffer to background color.
Render primitives and sprites into buffer.
Send buffer to display using SPI or DMA.
Swap buffers if double buffering is enabled.

Minimal Concrete Example

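#include <stdint.h>

// Pack 8-bit R, G, B into RGB565 by keeping the top 5/6/5 bits of each channel.
uint16_t rgb565(uint8_t r, uint8_t g, uint8_t b) {
  return (uint16_t)(((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3));
}

// Write one pixel; w is the framebuffer width in pixels.
// No bounds check here: the caller must clip x and y first.
void draw_pixel(uint16_t *fb, int x, int y, int w, uint16_t color) {
  fb[y * w + x] = color;
}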

Common Misconceptions

“Framebuffers must be full screen.” (Tile buffers also work.)
“Double buffering is always required.” (Partial updates can avoid tearing.)
“RGB565 is bad quality.” (For small displays, it is excellent.)

Check-Your-Understanding Questions

How many bytes does a 172x320 RGB565 framebuffer require?
What is the advantage of a tile buffer?
Why can DMA + single buffer cause tearing?

Check-Your-Understanding Answers

172 * 320 * 2 = 110,080 bytes.
It reduces RAM usage while still supporting partial updates.
Because the buffer is being updated while it is transmitted.

Real-World Applications

UI rendering for IoT devices
Portable data loggers with custom graphics
Wearable interfaces

Where You Will Apply It

Project 2 (pixel artist)
Project 6 (font engine)
Project 11 (game rendering)

References

ST7789 datasheet (pixel format)
Computer Graphics from Scratch (rasterization)

Key Insight: Graphics is memory bandwidth. Optimize your pixels, not your CPU.

Summary

Rendering is about mapping shapes and text to pixel buffers efficiently. Master it and your UI becomes smooth and responsive.

Homework/Exercises

Implement a line-drawing algorithm with clipping.
Build a small sprite blitter with transparency.
Implement dirty rectangles and measure bandwidth savings.

Solutions

Use Bresenham and clip before drawing.
Skip pixels matching the transparent color.
Track changed regions and update only those windows.

Chapter 6: DMA and PIO Pipelines for High-Throughput IO

Fundamentals

DMA moves data without CPU intervention. PIO is a programmable IO engine that can generate deterministic waveforms. Together, they form a pipeline where the CPU sets up transfers and the hardware streams data to peripherals. This is the key to smooth graphics: the CPU can render the next frame while DMA pushes the current frame to SPI. PIO can implement protocols or timing that SPI alone cannot. Understanding DMA channels, DREQ pacing, and FIFO behavior is essential for zero-copy, high-throughput display updates.

Deep Dive into the Concept

A DMA controller is essentially a configurable engine that reads from memory and writes to a peripheral or another memory region. On RP2350, DMA can access SRAM and peripheral registers. The typical display path is: DMA reads the framebuffer and writes to the SPI TX FIFO. The DMA channel can be paced by a DREQ signal so it only writes when the FIFO has space. This prevents overflow and allows consistent throughput. If you ignore DREQ and simply blast data, you will either stall or lose data. DMA transfers can be chained: once one transfer finishes, another automatically starts. This is useful for double buffering or for sending command sequences followed by pixel data.

PIO is complementary. It is a small state machine that manipulates GPIO pins with cycle-level determinism. PIO can implement custom SPI-like protocols, generate WS2812 LED waveforms, or even bit-bang LCD signals if needed. The advantage is determinism: PIO timing is independent of CPU jitter. Each PIO block has four state machines that share one small instruction memory, while each state machine has its own program counter, FIFOs, and shift registers. PIO can also be fed by DMA, creating a fully hardware-driven IO pipeline: CPU writes buffer -> DMA feeds PIO -> PIO toggles pins. For graphics, this means you can generate precise timing for unusual displays or backlight protocols.

A key concept is flow control. DMA transfers can starve or overflow a peripheral if not paced correctly. The SPI peripheral usually has a small FIFO. DREQ ensures the DMA engine only writes when the FIFO is ready. Similarly, PIO can request data when its TX FIFO is low. By combining DREQ with DMA, you can build a steady streaming pipeline without busy-waiting. This is how you achieve high frame rates while leaving CPU cycles free for rendering, input, or logic.
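
Flow control is easier to reason about with a toy model. The sketch below is a host-side simulation, not driver code: a writer (standing in for DMA) can push one word per tick into a FIFO, and a reader (standing in for the SPI shifter) removes one word every `drain_period` ticks. The FIFO depth of 8, the names, and the tick model are all assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FIFO_DEPTH 8u /* assumed TX FIFO depth for this model */

typedef struct {
    uint32_t sent;    /* words that reached the wire      */
    uint32_t dropped; /* words written while FIFO was full */
    uint32_t ticks;   /* simulated clock ticks consumed    */
} pacing_result;

/* With paced == true the writer waits for space, which is what DREQ
 * arranges in hardware. With paced == false it writes regardless and any
 * word written to a full FIFO is lost. */
pacing_result simulate_stream(uint32_t words, uint32_t drain_period, bool paced)
{
    assert(drain_period > 0);
    pacing_result r = {0, 0, 0};
    uint32_t level = 0, remaining = words;

    while (remaining > 0 || level > 0) {
        r.ticks++;
        if (remaining > 0) {
            if (level < FIFO_DEPTH) { level++; remaining--; }
            else if (!paced)        { r.dropped++; remaining--; }
            /* paced and full: hold the word until space appears */
        }
        if (r.ticks % drain_period == 0 && level > 0) { level--; r.sent++; }
    }
    return r;
}
```

Running the model with 100 words and a drain period of 4 delivers all 100 words when paced, and loses a large share of them when unpaced. The paced run takes no CPU effort in hardware, which is the point of letting DREQ do the waiting.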

DMA and PIO also affect memory placement. Because DMA reads from memory, you should place buffers in SRAM banks with good bandwidth and avoid contention with the CPU. If you run DMA and the CPU from the same bank, you will see stalls. If you place DMA buffers in striped SRAM, you can often improve throughput. This is why memory map knowledge matters. A disciplined DMA design also includes error handling: overrun flags, transfer-complete interrupts, and graceful recovery if a transfer is aborted.

Finally, PIO is best understood as a hardware co-processor. It has its own instruction memory and runs independently. It can be used for more than LED or SPI; it can generate video timing, decode sensors, or implement proprietary buses. When you understand how to feed it with DMA, you gain a reliable pipeline that is independent of CPU scheduling jitter.

How This Fits in Projects

Projects 3 and 4 are direct DMA/PIO exercises. Projects 5 and 11 use these pipelines for smooth graphics.

Definitions & Key Terms

DMA: Direct Memory Access engine.
DREQ: DMA request for pacing transfers.
PIO: Programmable IO state machine.
FIFO: First-in-first-out queue inside peripherals.
Chaining: DMA feature that starts a new transfer when one finishes.

Mental Model Diagram

CPU sets DMA -> DMA streams -> SPI FIFO -> ST7789 RAM
         \-> DMA streams -> PIO FIFO -> GPIO waveforms

How It Works (Step-by-Step)

Configure DMA channel with source and destination.
Set DREQ to SPI TX or PIO TX.
Enable channel and let it run.
DMA triggers IRQ on completion.
Swap buffers or chain next transfer.

Minimal Concrete Example

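// Pseudocode for DMA to SPI
configure_dma(channel,
  src = framebuffer,      // read address, increments after each transfer
  dst = &spi->dr,         // SPI data register, fixed address
  count = pixels,         // transfers: pixels for 16-bit transfers, pixels * 2 for 8-bit
  dreq = DREQ_SPI0_TX);   // pace on SPI0 TX FIFO space
start_dma(channel);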

Common Misconceptions

“DMA always increases speed.” (Incorrect configuration can slow you down.)
“PIO replaces SPI.” (PIO is for custom timing, not always needed.)

Check-Your-Understanding Questions

Why is DREQ important in DMA transfers?
What makes PIO deterministic?
How do DMA and PIO interact?

Check-Your-Understanding Answers

It paces the transfer to FIFO availability.
PIO runs a fixed instruction sequence at a fixed clock.
DMA can feed PIO FIFOs to automate output.

Real-World Applications

High-speed LCD refresh
NeoPixel/WS2812 LED drivers
Precise sensor protocols

Where You Will Apply It

Project 3 (DMA display)
Project 4 (PIO LED)
Project 5 (dual-core rendering with DMA)

References

RP2350 Datasheet (DMA/PIO address map)
Pico SDK hardware_pio documentation

Key Insight: DMA + PIO is how you get hardware-level throughput while the CPU does real work.

Summary

DMA moves bulk data efficiently, PIO handles deterministic timing. Together they create high-performance IO pipelines.

Homework/Exercises

Build a DMA transfer that fills a framebuffer with a gradient.
Write a PIO program that toggles a pin at 1 MHz and verify with a scope.
Chain two DMA transfers: one for a command sequence, one for pixel data.

Solutions

Render one row of the gradient with the CPU, then use memory-to-memory DMA to copy that row into every line of the framebuffer.
Use PIO clock divider and a simple loop with SET instructions.
Use a control channel that triggers the data channel on completion.
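
For the PIO exercise, the divider arithmetic can be worked out ahead of time. The sketch below computes an integer-plus-fraction divider, assuming the PIO clock divider is a 16-bit integer with an 8-bit fraction and that the program is a two-instruction loop (set pin high, set pin low) so one output period costs two state-machine cycles. The type and function names are hypothetical, and the 150 MHz system clock in the usage note is an assumption about your clock setup.

```c
#include <assert.h>
#include <stdint.h>

/* Divider split into integer and 1/256ths parts. */
typedef struct { uint16_t integer; uint8_t frac; } pio_clkdiv;

/* Divider needed so a loop of `cycles_per_period` state-machine cycles
 * produces an output waveform of `out_hz`. Rounded to the nearest 1/256. */
pio_clkdiv pio_clkdiv_for(uint32_t sys_hz, uint32_t out_hz,
                          uint32_t cycles_per_period)
{
    assert(out_hz > 0 && cycles_per_period > 0);
    uint64_t sm_hz = (uint64_t)out_hz * cycles_per_period;
    uint64_t div256 = (((uint64_t)sys_hz << 8) + sm_hz / 2) / sm_hz;

    /* A divider below 1.0 or at/above 65536 cannot be represented here. */
    assert(div256 >= 256u && div256 < ((uint64_t)65536u << 8));

    pio_clkdiv d = { (uint16_t)(div256 >> 8), (uint8_t)(div256 & 0xFFu) };
    return d;
}
```

With a 150 MHz system clock, a 1 MHz square wave from a two-cycle loop needs a divider of exactly 75. At 125 MHz the result is 62.5, which shows up as integer 62 and fraction 128. Fractional dividers add a small amount of period jitter, so integer values are preferable when you can choose the system clock.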

Chapter 7: Multicore, Interrupts, and Real-Time Scheduling

Fundamentals

The RP2350 gives you two active cores, which lets you separate rendering from IO or UI logic. Multicore programming is powerful but dangerous: race conditions, cache/bus contention, and synchronization mistakes can crash or corrupt your system. Interrupts are the mechanism by which peripherals notify your code, but they must be short and deterministic. Real-time scheduling is the art of deciding what runs when, so animations stay smooth and input feels responsive.

Deep Dive into the Concept

Dual-core microcontrollers are different from SMP desktop CPUs. They often share memory and peripherals but have limited cache and tighter timing constraints. On RP2350, both cores see the same memory map. You can use a FIFO or shared memory to communicate. The challenge is synchronization: if two cores write the same framebuffer, the results are undefined. The usual pattern is to dedicate one core to rendering and another to IO and display updates. You then synchronize frame swaps using atomic flags or multicore FIFO messages.

Interrupts are essential for timing. For example, you might use a timer interrupt to trigger a screen refresh or to advance animation frames. But interrupts that do too much work can cause jitter and missed deadlines. The correct pattern is to set a flag in the ISR and let the main loop do heavy work. In a multicore system, you can also route interrupts to a specific core to avoid contention. Some tasks (like USB) may require strict timing and should be isolated from rendering code.

Real-time scheduling does not always require an RTOS. You can build a cooperative scheduler that runs tasks in a fixed order and yields control regularly. In Project 13, you will build a tiny cooperative OS that switches between tasks and updates a task manager UI. This teaches you context switching, stack management, and priority scheduling. It also forces you to confront how much CPU time each subsystem consumes. Even with a cooperative scheduler, you need to track deadlines. If one task runs too long, others miss their deadlines and you see visible jitter or input lag.
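
The smallest possible cooperative scheduler is a run-to-completion loop over function pointers, where a task "yields" simply by returning. The sketch below shows that form with hypothetical names. It is much simpler than what Project 13 describes, since it has no per-task stacks or context switching, but it is enough to see how one slow task delays all the others.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TASKS 4

typedef void (*task_fn)(void *ctx);

typedef struct {
    task_fn  run;  /* task body; must return promptly */
    void    *ctx;  /* task-private state              */
    uint32_t runs; /* how many times it has been run  */
} task_t;

typedef struct { task_t tasks[MAX_TASKS]; size_t count; } sched_t;

void sched_add(sched_t *s, task_fn fn, void *ctx)
{
    assert(s->count < MAX_TASKS && fn != NULL);
    s->tasks[s->count].run = fn;
    s->tasks[s->count].ctx = ctx;
    s->tasks[s->count].runs = 0;
    s->count++;
}

/* One round: every task runs once, in the order it was added. */
void sched_run_round(sched_t *s)
{
    for (size_t i = 0; i < s->count; i++) {
        s->tasks[i].run(s->tasks[i].ctx);
        s->tasks[i].runs++;
    }
}

/* Example task: increments the counter passed as its context. */
void task_count(void *ctx) { (*(uint32_t *)ctx)++; }
```

A main loop would call `sched_run_round` forever. Timestamping each task's entry and exit inside the loop gives you the per-task CPU usage that a task manager UI could display.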

Concurrency also affects DMA and PIO. If one core is configuring DMA while another is reading from the same registers, you can introduce subtle race conditions. You must define ownership: one core owns DMA setup, the other owns rendering, etc. The more explicit your ownership model, the fewer bugs you will chase at 2 AM. In practice, you will use mutexes, spinlocks, or atomic flags to coordinate access, and you will design protocols (like “only core 1 writes to SPI”).

Atomic operations and memory barriers matter even on microcontrollers. If you do not use proper barriers, one core might not see a flag update from the other. The RP2350 provides spinlocks and FIFO mechanisms for coordination. Understanding these low-level primitives will help you debug the kinds of bugs that only show up under load.
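
One portable way to get the required ordering is C11 atomics with release/acquire semantics. The sketch below is a single-producer, single-consumer ring buffer in shared memory, written with hypothetical names. It assumes exactly one core calls `ring_push` and exactly one core calls `ring_pop`; with more writers or readers it is not safe. The hardware inter-core FIFO and spinlocks are alternatives with different trade-offs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8u /* must be a power of two */

typedef struct {
    uint32_t    slots[RING_SIZE];
    atomic_uint head; /* next slot to write; modified only by the producer */
    atomic_uint tail; /* next slot to read; modified only by the consumer  */
} spsc_ring;

void ring_init(spsc_ring *r)
{
    atomic_init(&r->head, 0u);
    atomic_init(&r->tail, 0u);
}

/* Producer side. Returns false if the ring is full. */
bool ring_push(spsc_ring *r, uint32_t v)
{
    unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SIZE) return false;
    r->slots[head % RING_SIZE] = v;
    /* Release: the slot write above becomes visible before the new head. */
    atomic_store_explicit(&r->head, head + 1u, memory_order_release);
    return true;
}

/* Consumer side. Returns false if the ring is empty. */
bool ring_pop(spsc_ring *r, uint32_t *out)
{
    assert(out != NULL);
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    /* Acquire: pairs with the producer's release store to head. */
    unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail) return false;
    *out = r->slots[tail % RING_SIZE];
    atomic_store_explicit(&r->tail, tail + 1u, memory_order_release);
    return true;
}
```

A render core could push "frame N is ready" messages and the display core could pop them, which removes the need for both cores to write the same flag.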

How This Fits in Projects

Projects 5, 7, 9, 11, and 13 rely on multicore and scheduling concepts.

Definitions & Key Terms

ISR: Interrupt Service Routine.
Cooperative scheduling: Tasks yield explicitly.
Race condition: Two writers updating the same data without coordination.
FIFO: A hardware queue used for inter-core messaging.
Jitter: Variation in timing of periodic events.

Mental Model Diagram

Core 0: Render -> Framebuffer A
Core 1: DMA -> LCD
            ^
            | swap signal

How It Works (Step-by-Step)

Core 0 renders into back buffer.
Core 1 starts DMA for front buffer.
Core 0 signals when new frame is ready.
Core 1 swaps buffer pointers after DMA complete.

Minimal Concrete Example

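#include <stdatomic.h>
#include <stdbool.h>

// Shared flag for buffer swap. An atomic with release/acquire ordering
// guarantees Core 1 sees the finished frame, which volatile alone does not.
atomic_bool frame_ready = false;

// Core 0
render_frame(back_buffer);
atomic_store_explicit(&frame_ready, true, memory_order_release);

// Core 1
if (atomic_load_explicit(&frame_ready, memory_order_acquire) && dma_done()) {
  swap_buffers();
  atomic_store_explicit(&frame_ready, false, memory_order_release);
}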

Common Misconceptions

“Two cores always means 2x speed.” (Bus contention can reduce speed.)
“Interrupts are faster than polling.” (Interrupt overhead can be worse.)
“RTOS is required for real-time.” (Cooperative scheduling can be enough.)

Check-Your-Understanding Questions

Why should ISRs be short?
What is the simplest safe buffer swap mechanism?
When would you use cooperative scheduling vs an RTOS?

Check-Your-Understanding Answers

Long ISRs introduce jitter and block other interrupts.
A flag plus DMA-complete interrupt on a single owner core.
Cooperative scheduling is simpler and sufficient for small systems.

Real-World Applications

Wearable UIs with smooth animation
Industrial control dashboards
Low-latency input devices

Where You Will Apply It

Project 5 (dual-core renderer)
Project 13 (mini OS)

References

RP2350 Datasheet (multicore SIO/FIFO)
Operating Systems: Three Easy Pieces (scheduling)

Key Insight: Multicore is a force multiplier only if you define ownership and synchronization clearly.

Summary

Concurrency gives you power, but only when you control timing and data access. Clear ownership is the best optimization.

Homework/Exercises

Design a buffer swap protocol with two cores and DMA.
Implement a cooperative scheduler with 3 tasks and fixed time slices.
Measure jitter in a timer ISR under load.

Solutions

Use DMA completion IRQ and an atomic frame-ready flag.
Each task runs for a short time then yields to the next.
Log timestamps and compute deviation from expected intervals.
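
The jitter measurement in the last solution can be sketched as a small post-processing function. It assumes you have logged 32-bit microsecond timestamps from the ISR into an array; the function name and the choice of reporting the worst-case deviation (rather than a mean or histogram) are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Largest absolute deviation, in microseconds, between consecutive
 * timestamps and the expected period. */
uint32_t max_jitter_us(const uint32_t *ts, size_t n, uint32_t expected_us)
{
    assert(ts != NULL && n >= 2);
    uint32_t worst = 0;
    for (size_t i = 1; i < n; i++) {
        /* Unsigned subtraction stays correct across a counter wraparound. */
        uint32_t dt = ts[i] - ts[i - 1];
        uint32_t dev = dt > expected_us ? dt - expected_us : expected_us - dt;
        if (dev > worst) worst = dev;
    }
    return worst;
}
```

Run it once with the system idle and once while rendering and streaming frames; the difference between the two results is the jitter your load adds.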

Chapter 8: Storage and USB I/O (QSPI, SD/FAT, USB HID)

Fundamentals

Embedded systems rarely live in isolation. You will often load assets from flash or SD card and communicate over USB. The RP2350 supports QSPI flash with XIP, and many boards include a TF (microSD) slot for FAT filesystems. USB 1.1 host/device lets the board enumerate as a HID device. Understanding how file systems and USB descriptors work lets you build real devices instead of demos.

Deep Dive into the Concept

QSPI flash is the main storage for firmware. XIP maps flash into memory so you can execute code directly. The tradeoff is latency and bandwidth: flash access is slower than SRAM. For data-heavy tasks (fonts, images), you may want to load assets into SRAM or stream them in chunks. This affects how you design image viewers and sprite systems. Flash writes also require erase cycles, which are relatively slow and limited in lifetime. You should design firmware updates and asset caches with erase blocks in mind.
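
Designing with erase blocks in mind starts with knowing which blocks a write will touch. The sketch below rounds a byte range outward to whole erase sectors. The 4096-byte sector size is an assumption that matches many QSPI NOR flash parts; confirm it for the flash chip on your board. The names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define FLASH_SECTOR_BYTES 4096u /* assumed erase granularity */

typedef struct { uint32_t start; uint32_t length; } erase_span;

/* Smallest sector-aligned span that covers [offset, offset + length). */
erase_span flash_erase_span(uint32_t offset, uint32_t length)
{
    assert(length > 0);
    const uint32_t mask  = ~(FLASH_SECTOR_BYTES - 1u);
    const uint32_t first = offset & mask;                         /* round down */
    const uint32_t end   = (offset + length + FLASH_SECTOR_BYTES - 1u) & mask; /* round up */
    erase_span s = { first, end - first };
    return s;
}
```

Note how a 2-byte write that straddles a sector boundary forces two full sectors (8 KB) to be erased and rewritten. Aligning asset slots to sector boundaries avoids that amplification.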

SD cards provide much larger storage but require a filesystem. FAT is common because it is simple and supported everywhere. A FAT parser must read the boot sector, FAT tables, and directory entries. You can use an existing FAT library, but you should still understand the fundamentals: cluster size, file allocation chains, and sector reads. A common performance optimization is to cache the FAT and directory sectors to reduce repeated reads. For image viewers, you often stream large files in blocks; reading sequential clusters is efficient, while random seeks are costly.
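
The first step of any FAT parser is turning the boot sector into a layout: where the FAT starts, how big the root directory is, and where file data begins. The sketch below reads the standard BIOS Parameter Block fields from a 512-byte sector. The struct and function names are hypothetical, and the code assumes you have already located the volume's boot sector; on most SD cards that means following a partition table entry first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t bytes_per_sector;
    uint8_t  sectors_per_cluster;
    uint32_t fat_start;        /* first FAT sector, relative to volume start */
    uint32_t root_dir_sectors; /* 0 on FAT32, where the root is a cluster chain */
    uint32_t data_start;       /* first sector of the data region */
} fat_layout;

/* FAT on-disk fields are little-endian. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

bool fat_parse_boot_sector(const uint8_t sec[512], fat_layout *out)
{
    assert(sec != NULL && out != NULL);
    if (sec[510] != 0x55 || sec[511] != 0xAA) return false; /* signature */

    uint16_t bps          = rd16(sec + 11);
    uint8_t  spc          = sec[13];
    uint16_t reserved     = rd16(sec + 14);
    uint8_t  num_fats     = sec[16];
    uint16_t root_entries = rd16(sec + 17);
    uint32_t fat_size     = rd16(sec + 22);       /* 16-bit FAT size field */
    if (fat_size == 0) fat_size = rd32(sec + 36); /* FAT32 uses a 32-bit field */
    if (bps == 0 || spc == 0) return false;

    out->bytes_per_sector    = bps;
    out->sectors_per_cluster = spc;
    out->fat_start           = reserved;
    /* Root directory entries are 32 bytes each; round up to whole sectors. */
    out->root_dir_sectors    = ((uint32_t)root_entries * 32u + bps - 1u) / bps;
    out->data_start          = reserved + (uint32_t)num_fats * fat_size
                               + out->root_dir_sectors;
    return true;
}
```

From here, the first sector of cluster N is `data_start + (N - 2) * sectors_per_cluster`, which is the calculation every file read ends up repeating.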

USB HID devices require a descriptor that tells the host what kind of device you are (keyboard, mouse, gamepad). HID reports are structured bytes that represent button states, axes, or keys. The device must respond to control transfers during enumeration. Libraries like TinyUSB handle the heavy lifting, but you must still define descriptors correctly and match report format with your firmware logic. The descriptor is your contract with the host OS. If it is wrong, your device might enumerate but not function.
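
Matching the report format to the descriptor is mostly careful byte packing. The sketch below builds a 4-byte mouse-style report, assuming a descriptor that declares one byte of button bits followed by three signed 8-bit relative axes (X, Y, wheel). That layout and the function names are assumptions for illustration; your descriptor defines the real contract.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp to the symmetric range -127..127 commonly declared for relative axes. */
static int8_t clamp_i8(int v)
{
    return (int8_t)(v > 127 ? 127 : (v < -127 ? -127 : v));
}

/* Pack buttons and relative motion into the assumed 4-byte report layout. */
void hid_mouse_report(uint8_t out[4], uint8_t buttons, int dx, int dy, int wheel)
{
    assert(out != NULL);
    out[0] = buttons;                    /* bit 0 = left, bit 1 = right, ... */
    out[1] = (uint8_t)clamp_i8(dx);      /* two's complement on the wire */
    out[2] = (uint8_t)clamp_i8(dy);
    out[3] = (uint8_t)clamp_i8(wheel);
}
```

If the host sees the cursor jump wildly on small negative movements, check that the descriptor declares the axes as signed with a logical minimum that matches the clamp used here.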

USB also introduces timing and buffer constraints. HID devices typically use interrupt endpoints with polling intervals. If you send reports too frequently, the host will drop them or throttle. If you send too slowly, your device feels laggy. The goal is a stable, predictable report rate. The LCD can reflect USB state, which is a good debugging tool. If you ever plan to build custom controllers, understanding HID is essential.

Finally, the UF2 bootloader is a key part of the Raspberry Pi ecosystem. It allows drag-and-drop firmware updates by presenting the board as a USB mass storage device. This is extremely practical when experimenting with bare-metal code, because it provides a recovery path. If you misconfigure clocks or crash the firmware, you can always return to BOOTSEL mode.
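
A UF2 file is a sequence of self-describing 512-byte blocks, which is what makes drag-and-drop flashing robust. The sketch below models one block using the layout and magic numbers from the published UF2 format, and checks the fields a loader would typically verify. The struct and function names are hypothetical, and the code assumes a little-endian host so the fields can be read directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define UF2_MAGIC_START0 0x0A324655u /* "UF2\n" */
#define UF2_MAGIC_START1 0x9E5D5157u
#define UF2_MAGIC_END    0x0AB16F30u

typedef struct {
    uint32_t magic_start0;
    uint32_t magic_start1;
    uint32_t flags;
    uint32_t target_addr;         /* where the payload should be written */
    uint32_t payload_size;        /* bytes of data[] actually used       */
    uint32_t block_no;            /* index of this block                 */
    uint32_t num_blocks;          /* total blocks in the file            */
    uint32_t file_size_or_family; /* meaning depends on flags            */
    uint8_t  data[476];
    uint32_t magic_end;
} uf2_block;

_Static_assert(sizeof(uf2_block) == 512, "UF2 blocks are exactly 512 bytes");

/* Basic sanity checks on one block. */
bool uf2_block_is_valid(const uf2_block *b)
{
    assert(b != NULL);
    return b->magic_start0 == UF2_MAGIC_START0 &&
           b->magic_start1 == UF2_MAGIC_START1 &&
           b->magic_end    == UF2_MAGIC_END &&
           b->payload_size <= sizeof b->data &&
           b->block_no     <  b->num_blocks;
}
```

Because every block carries its own target address and index, the host operating system can write the blocks in any order and the loader can still place each payload correctly.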

How This Fits in Projects

Projects 8 and 12 rely on SD/FAT and USB. Projects 10 and 13 rely on flash/XIP and boot flow.

Definitions & Key Terms

FAT: File Allocation Table filesystem.
HID: Human Interface Device class in USB.
UF2: USB flashing file format used by Raspberry Pi bootloader.
Descriptor: USB structure describing device capabilities.
Cluster: FAT allocation unit.

Mental Model Diagram

Firmware in QSPI (XIP) -> App loads assets from SD -> UI displayed
             \-> USB HID descriptors -> Host recognizes device

How It Works (Step-by-Step)

Boot ROM exposes UF2 mass storage in BOOTSEL mode.
Firmware stored in QSPI flash executes via XIP.
FAT parser reads SD card sectors and loads images.
USB stack enumerates and sends HID reports.

Minimal Concrete Example

// Pseudocode: send HID report
uint8_t report[4] = {buttons, x, y, 0};
usb_hid_send(report, sizeof(report));

Common Misconceptions

“USB just works if you plug it in.” (Descriptors must be correct.)
“FAT is simple enough to ignore.” (Parsing errors corrupt files.)
“Flash writes are instantaneous.” (Erase cycles are slow.)

Check-Your-Understanding Questions

Why does UF2 require a special file format?
What is a HID report?
Why might SD card reads be slow without caching?

Check-Your-Understanding Answers

UF2 includes metadata for flash addresses and block ordering.
A structured byte packet representing input state.
FAT tables and directory entries require multiple reads.

Real-World Applications

USB macro pads and controllers
Portable photo viewers
Data loggers with SD storage

Where You Will Apply It

Project 8 (TF card image viewer)
Project 12 (USB HID)

References

Raspberry Pi Pico documentation (BOOTSEL/UF2)
USB HID specs and TinyUSB docs

Key Insight: Storage and USB turn your board from a demo into a real device.

Summary

QSPI, SD, and USB are the I/O backbone of real products. Learn them and your projects become practical.

Homework/Exercises

Write a simple FAT sector reader that lists directory entries.
Modify a HID report descriptor to add a new button.
Implement a simple asset cache that stores one image in SRAM.

Solutions

Read the boot sector, compute FAT offsets, then parse directory entries.
Extend the report descriptor and update firmware to set the new bit.
Load one image into SRAM and reuse it without SD reads.

Glossary

AHB/APB: Bus protocols for high/low bandwidth peripherals.
BOOTSEL: Button mode for USB mass storage flashing.
DREQ: DMA pacing signal.
Frame buffer: Memory containing pixel data for the LCD.
MADCTL: Display memory access control register.
PIO: Programmable IO engine.
SRAM banking: Multiple memory banks for throughput.
SPI: Serial Peripheral Interface.
TrustZone: Secure/non-secure execution domains.
UF2: USB flashing file format.
XIP: Execute-In-Place flash mapping.

Why RP2350 LCD Development Matters

The Modern Problem It Solves

Most IoT and embedded devices need a user interface, but they run on tight power and memory budgets. The RP2350 LCD board is a compact, low-cost platform for learning how to build those interfaces while understanding the hardware that drives them. The skills you develop here are directly transferable to wearable devices, industrial control panels, test equipment, and consumer gadgets.

Real-world impact and scale:

IoT Analytics reports 18.5 billion connected IoT devices in 2024 and forecasts 21.1 billion by the end of 2025. This scale drives demand for embedded UI and device firmware skills.
Grand View Research projects the global embedded systems market to reach USD 169.1 billion by 2030, emphasizing embedded systems growth across industries.

OLD APPROACH                          NEW APPROACH
┌───────────────────────┐           ┌────────────────────────┐
│ Fixed UI, low graphics│           │ Dynamic UI + telemetry │
│ No OTA updates        │           │ OTA updates + security │
│ Single-core firmware  │           │ Multicore + DMA + PIO   │
└───────────────────────┘           └────────────────────────┘

Context & Evolution

Early embedded systems used fixed-function displays or no UI at all. As IoT devices proliferated, demand grew for responsive, graphics-capable interfaces on low-power hardware. The RP2350’s dual-ISA and PIO/DMA features make it a uniquely capable learning platform for modern embedded UI systems.

Concept Summary Table

Concept Cluster	What You Need to Internalize
Dual-ISA Architecture & Boot	How RP2350 selects ARM vs RISC-V, secure vs non-secure domains, and the boot ROM flow.
Memory Map & XIP	Where code and data live, how SRAM banking affects throughput, and how DMA competes for bandwidth.
Clocking & Pad Control	How clocks and pad settings affect stability, SPI timing, and display reliability.
SPI + ST7789 Commands	The exact command sequence and pixel format that brings the display to life.
Framebuffers & Rendering	How to pack pixels, render primitives, and manage double buffering.
DMA + PIO Pipelines	How to stream pixels and IO data without burning CPU cycles.
Multicore & Scheduling	How to split workloads across cores safely and deterministically.
Storage & USB I/O	How to load assets from SD and expose the board as a USB HID device.

Project-to-Concept Map

Project	What It Builds	Primer Chapters It Uses
Project 1: Hello Display	SPI bring-up + LCD init	3, 4
Project 2: Pixel Artist	Primitives + framebuffer	4, 5
Project 3: DMA Display Driver	DMA streaming to SPI	2, 6
Project 4: RGB LED Controller	PIO waveforms	6
Project 5: Dual-Core Renderer	Multicore rendering + DMA	2, 6, 7
Project 6: Font Rendering Engine	Text rendering + partial updates	4, 5
Project 7: ARM vs RISC-V Benchmark	ISA comparison	1, 2
Project 8: TF Card Image Viewer	FAT + asset streaming	2, 8
Project 9: Real-Time System Monitor	Timers + multicore + UI	2, 7
Project 10: Bare-Metal Driver	Register-level boot + SPI	1, 2, 3, 4
Project 11: Simple Game	Game loop + rendering	5, 7
Project 12: USB HID Device	USB descriptors + reports	8
Project 13: Mini OS	Cooperative scheduling + UI	1, 2, 7

Deep Dive Reading by Concept

Architecture & Hardware

Concept	Book & Chapter	Why This Matters
RP2350 boot + memory map	RP2350 Datasheet - Sections 2-3	Register and address correctness for bare-metal code
ISA basics	Computer Organization and Design RISC-V - Ch. 1-2	Understand instruction set differences
Embedded constraints	Making Embedded Systems - Ch. 1-4	Real-world constraints and design trade-offs

Display & Graphics

Concept	Book & Chapter	Why This Matters
Graphics memory + bitmaps	Code: The Hidden Language - Ch. 13-16	Binary representation of data and pixels
Rendering primitives	Computer Graphics from Scratch - Ch. 2-5	Rasterization fundamentals
LCD command sets	ST7789 Datasheet - Command sections	Real command order and pixel format

IO & Performance

Concept	Book & Chapter	Why This Matters
DMA/PIO design	RP2350 Datasheet - DMA/PIO sections	Correct high-throughput IO design
Debugging timing	The Art of Debugging with GDB - Ch. 6-8	Timing and peripheral debugging

Systems & Scheduling

Concept	Book & Chapter	Why This Matters
Scheduling fundamentals	Operating Systems: Three Easy Pieces - Ch. 4-9	Context switching and scheduling basics
Concurrency basics	Computer Systems: A Programmer’s Perspective - Ch. 12	Synchronization and concurrency models

Storage & USB

Concept	Book & Chapter	Why This Matters
Filesystems	Operating Systems: Three Easy Pieces - Ch. 39	File system layout and metadata
USB fundamentals	USB Complete by Jan Axelson - Ch. 1-4, 8	USB enumeration and HID class

Quick Start: Your First 48 Hours

Day 1 (4 hours):

Read the Introduction, Chapter 4 (SPI + ST7789), and Chapter 5 (Framebuffers).
Flash the Hello Display project and make the screen show text.
Use a logic analyzer to verify SPI clock and DC toggling.

Day 2 (4 hours):

Implement a simple rectangle fill and draw a moving box.
Add partial update support to reduce SPI traffic.
Read the “Core Question” and “Pitfalls” sections in Project 2.

End of Weekend: You can bring up the display, draw primitives, and understand the SPI command flow. That is the core foundation.

Recommended Learning Paths

Path 1: The Embedded Graphics Beginner (Recommended Start)

Project 1 - Hello Display
Project 2 - Pixel Artist
Project 3 - DMA Display Driver
Project 6 - Font Rendering Engine

Path 2: The Performance Engineer

Project 3 - DMA Display Driver
Project 4 - RGB LED Controller
Project 5 - Dual-Core Renderer
Project 7 - ARM vs RISC-V Benchmark

Path 3: The Product Builder

Project 1 - Hello Display
Project 8 - TF Card Image Viewer
Project 12 - USB HID Device
Project 11 - Simple Game

Path 4: The Completionist

Phase 1: Projects 1-3
Phase 2: Projects 4-6
Phase 3: Projects 7-9
Phase 4: Projects 10-13

Success Metrics

By the end of this guide, you should be able to:

Explain RP2350 boot flow and memory map from memory
Bring up the LCD without SDK helper functions
Achieve smooth 60 FPS animation using DMA
Implement partial updates and double buffering
Run the same benchmark on ARM and RISC-V and explain the results
Build a USB HID device with a custom descriptor
Build a cooperative scheduler and show task stats on the LCD

Tooling & Debugging Appendix

Logic Analyzer Workflow:

Capture SPI clock, MOSI, CS, and DC.
Verify command bytes and data bytes align with DC.
Look for glitches or missed edges on the clock line.

Common Debug Tools:

picotool to inspect flash and device state
openocd + SWD for stepping through startup
GDB for register inspection

Signal Integrity Tips:

Reduce SPI clock if you see random pixel noise.
Increase drive strength on SPI pins for longer wires.
Keep wires short to avoid ringing.

Project Overview Table

#	Project	Difficulty	Time	Primary Focus
1	Hello Display	Beginner	Weekend	SPI + LCD init
2	Pixel Artist	Intermediate	1-2 weeks	Rendering primitives
3	DMA Display Driver	Advanced	1-2 weeks	DMA pipeline
4	RGB LED Controller	Advanced	1-2 weeks	PIO timing
5	Dual-Core Renderer	Advanced	2-3 weeks	Multicore + DMA
6	Font Rendering Engine	Intermediate	1-2 weeks	Text rendering
7	ARM vs RISC-V Benchmark	Expert	2-3 weeks	ISA comparison
8	TF Card Image Viewer	Intermediate	1-2 weeks	FAT + assets
9	Real-Time System Monitor	Intermediate	1-2 weeks	Timers + UI
10	Bare-Metal Driver	Master	1 month	Registers + boot
11	Simple Game	Advanced	2-3 weeks	Game loop
12	USB HID Device	Expert	2-3 weeks	USB stack
13	Mini Operating System	Master	1-2 months	Scheduling

Project List

Project 1: Hello Display - Raw SPI Communication

Main Programming Language: C
Alternative Programming Languages: Rust, MicroPython
Coolness Level: Level 2: Fun
Business Potential: 2. The “Prototype” Level
Difficulty: Level 1: Beginner
Knowledge Area: SPI + Display Bring-up
Software or Tool: Pico SDK, logic analyzer
Main Book: “Making Embedded Systems” by Elecia White

What you’ll build: A minimal firmware that initializes SPI, sends the ST7789 command sequence, and displays text and a color gradient on the LCD.

Why it teaches RP2350 fundamentals: You will wire up SPI, configure GPIO pads, and confirm that the LCD responds to command/data sequences. This is the foundation for everything else.

Core challenges you’ll face:

Pin mapping -> Correctly assign SPI and control pins
Command sequencing -> ST7789 init order matters
Timing -> Reset delays and SPI clock stability

Real World Outcome

You will see a text banner and a gradient fill:

┌────────────────────────────────────────┐
│  RP2350 LCD HELLO                      │
│                                        │
│  Gradient: ██████████░░░░░░░░          │
│                                        │
│  SPI OK   | LCD OK                     │
└────────────────────────────────────────┘

Command Line Outcome Example:

$ mkdir build && cd build
$ cmake ..
$ make -j4
[100%] Built target hello_display

$ cp hello_display.uf2 /Volumes/RP2350
# Screen updates within 1-2 seconds

The Core Question You’re Answering

“How do I speak the LCD’s language well enough to make a pixel appear?”

Concepts You Must Understand First

SPI basics
- What does CPOL/CPHA change?
- Why does CS framing matter?
- Book Reference: “Making Embedded Systems” Ch. 6
ST7789 command flow
- Why do CASET/RASET precede RAMWR?
- Book Reference: ST7789 datasheet command section
GPIO pad control
- How does drive strength affect edges?
- Book Reference: RP2350 Datasheet, IO/pad sections

Questions to Guide Your Design

How will you abstract “command” vs “data” writes?
What reset timing does the LCD require?
How will you handle display offsets for 172x320?

Thinking Exercise

“The First Pixel”

Imagine you want to draw only pixel (0,0). What commands must be sent?
Write the sequence of command bytes and data bytes.
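One possible answer to this exercise, sketched as host-runnable C that only builds the byte stream (it does not touch SPI). It assumes the panel is already in 16-bit RGB565 mode and that the 172-column glass is centred in the controller's 240-column RAM, which gives a column offset of 34; that offset and the helper names `lcd_byte_t` and `lcd_build_pixel_write` are assumptions for illustration, so check the offset against your board.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ST7789 command opcodes. */
#define ST7789_CASET 0x2A /* column address set */
#define ST7789_RASET 0x2B /* row address set */
#define ST7789_RAMWR 0x2C /* memory write */

/* ASSUMPTION: 172 visible columns centred in 240 columns of controller RAM,
 * so the column offset is (240 - 172) / 2 = 34. Verify on your hardware. */
#define LCD_X_OFFSET 34
#define LCD_Y_OFFSET 0

/* One byte on the wire plus the DC pin level while it is sent. */
typedef struct {
    uint8_t dc;    /* 0 = command, 1 = data */
    uint8_t value;
} lcd_byte_t;

static size_t emit(lcd_byte_t *out, size_t n, uint8_t dc, uint8_t value)
{
    out[n].dc = dc;
    out[n].value = value;
    return n + 1;
}

/* Append a command followed by a start/end pair of 16-bit big-endian values. */
static size_t emit_range(lcd_byte_t *out, size_t n, uint8_t cmd,
                         uint16_t start, uint16_t end)
{
    n = emit(out, n, 0, cmd);
    n = emit(out, n, 1, (uint8_t)(start >> 8));
    n = emit(out, n, 1, (uint8_t)(start & 0xFF));
    n = emit(out, n, 1, (uint8_t)(end >> 8));
    n = emit(out, n, 1, (uint8_t)(end & 0xFF));
    return n;
}

/* Hypothetical helper: fills `out` with the 13 bytes that write one RGB565
 * pixel at (x, y) and returns the number of bytes produced. */
size_t lcd_build_pixel_write(lcd_byte_t *out, size_t cap,
                             uint16_t x, uint16_t y, uint16_t rgb565)
{
    assert(out != NULL && cap >= 13);
    uint16_t col = (uint16_t)(x + LCD_X_OFFSET);
    uint16_t row = (uint16_t)(y + LCD_Y_OFFSET);
    size_t n = 0;

    n = emit_range(out, n, ST7789_CASET, col, col); /* 1-pixel-wide window */
    n = emit_range(out, n, ST7789_RASET, row, row); /* 1-pixel-tall window */
    n = emit(out, n, 0, ST7789_RAMWR);
    n = emit(out, n, 1, (uint8_t)(rgb565 >> 8));    /* high byte first */
    n = emit(out, n, 1, (uint8_t)(rgb565 & 0xFF));
    return n;
}
```

On hardware, each entry would become one SPI byte sent with the DC pin set to the recorded level, which is exactly what you should see on the logic analyzer.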

The Interview Questions They’ll Ask

“What is the purpose of DC in SPI LCDs?”
“Why must you set an address window before RAMWR?”
“What are common causes of a blank LCD screen?”
“Why might a display show shifted graphics?”

Hints in Layers

Hint 1 (Start simple): Toggle CS and reset pins and confirm with a logic analyzer.

Hint 2 (Known-good sequence): Use a known ST7789 initialization sequence and add delays after reset.

Hint 3 (Color swap check): If colors are wrong, flip RGB/BGR in MADCTL.

Hint 4 (Verify on the wire): Capture the first 16 bytes after RAMWR and ensure they match your pixel data.

Books That Will Help

Common Pitfalls & Debugging

Problem 1: “Screen stays white”

Why: Reset not asserted or SPI clock disabled
Fix: Toggle reset pin with proper delays, enable SPI clock
Quick test: Send “display on” command and check logic analyzer

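Problem 2: “Colors are wrong (red and blue swapped)”

Why: RGB/BGR bit in MADCTL wrong
Fix: Toggle MADCTL bit D3. If colors instead look like a photographic negative, toggle display inversion (INVON/INVOFF).
Quick test: Draw pure red and see if it appears red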

Problem 3: “Only part of the screen updates”

Why: Address window not configured for 172x320
Fix: Adjust CASET/RASET values and offsets
Quick test: Draw a border and confirm edges align

Definition of Done

LCD shows text and gradient correctly
SPI clock and DC timing verified on analyzer
Reset sequence is documented and repeatable
You can draw a single pixel at (0,0)

Project 2: Pixel Artist - Drawing Primitives and Sprites

Main Programming Language: C
Alternative Programming Languages: Rust, MicroPython
Coolness Level: Level 3: Impressive
Business Potential: 3. The “Demo” Level
Difficulty: Level 2: Intermediate
Knowledge Area: Graphics rendering
Software or Tool: Pico SDK, image conversion scripts
Main Book: “Computer Graphics from Scratch” by Gabriel Gambetta

What you’ll build: A small graphics library that can draw pixels, lines, rectangles, circles, and sprites. You will render a pixel-art scene and animate a sprite.

Why it teaches graphics fundamentals: You will learn rasterization, clipping, and pixel format conversion, all on a real device.

Core challenges you’ll face:

Rasterization algorithms -> Bresenham, midpoint circle
Pixel formats -> RGB565 conversion
Clipping -> Avoid buffer overflows

Real World Outcome

┌────────────────────────────────────────┐
│  PIXEL ART SCENE                        │
│  *  ^^^  ^^                             │
│  ** ^^^ ^^^                             │
│                                        │
│  Sprite: [>o<]                          │
│  FPS: 30                                │
└────────────────────────────────────────┘

Command Line Outcome Example:

$ ./pixel_artist
Loaded palette: 16 colors
Rendered frame in 9.4ms
FPS: 30

The Core Question You’re Answering

“How do I turn math into pixels efficiently?”

Concepts You Must Understand First

RGB565 encoding
- How do you map 8-bit RGB to 16-bit?
- Book Reference: “Computer Graphics from Scratch” Ch. 2
Line drawing algorithms
- Why use integer-only algorithms?
- Book Reference: “Computer Graphics from Scratch” Ch. 3
Clipping
- How do you ensure you never write out of bounds?
- Book Reference: “Computer Graphics from Scratch” Ch. 4
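The RGB565 mapping asked about above can be sketched as follows. This is a minimal version that truncates the low bits of each channel; the function names are illustrative, not part of any SDK.

```c
#include <assert.h>
#include <stdint.h>

/* Pack 8-bit R, G, B into RGB565 (bit layout RRRRRGGG GGGBBBBB) by keeping
 * the top 5, 6, and 5 bits of each channel. */
uint16_t rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r & 0xF8u) << 8) | ((g & 0xFCu) << 3) | (b >> 3));
}

/* Expand RGB565 back to 8 bits per channel. The top bits are replicated into
 * the low bits so that full-scale 5/6-bit values map to 255, not 248/252. */
void rgb565_to_rgb888(uint16_t c, uint8_t *r, uint8_t *g, uint8_t *b)
{
    assert(r != 0 && g != 0 && b != 0);
    uint8_t r5 = (uint8_t)((c >> 11) & 0x1F);
    uint8_t g6 = (uint8_t)((c >> 5) & 0x3F);
    uint8_t b5 = (uint8_t)(c & 0x1F);
    *r = (uint8_t)((r5 << 3) | (r5 >> 2));
    *g = (uint8_t)((g6 << 2) | (g6 >> 4));
    *b = (uint8_t)((b5 << 3) | (b5 >> 2));
}
```

Pure red, green, and blue pack to 0xF800, 0x07E0, and 0x001F, which are handy constants to check against when colors look wrong on the panel.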

Questions to Guide Your Design

How will you structure your framebuffer API?
What is your clipping strategy?
How will you batch updates to reduce SPI traffic?

Thinking Exercise

“The Diagonal Line”

Draw a line from (0,0) to (171,319). Which pixels are touched?
How many pixels are out of bounds if you forget clipping?
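To experiment with this exercise on a host machine, here is a hedged sketch of an integer-only Bresenham line drawn through a bounds-checked `put_pixel()`. The counter `clipped_pixels` and the function names are illustrative. With a 172x320 buffer, the line from (0,0) to (171,319) visits 320 pixels and none are clipped; try an endpoint such as (200,0) to see how many writes would have landed outside the buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define FB_W 172
#define FB_H 320

static unsigned clipped_pixels; /* writes rejected by the bounds check */

/* Every primitive funnels through this single bounds-checked writer. */
void put_pixel(uint16_t *fb, int x, int y, uint16_t color)
{
    assert(fb != NULL);
    if (x < 0 || x >= FB_W || y < 0 || y >= FB_H) {
        clipped_pixels++;
        return;
    }
    fb[y * FB_W + x] = color;
}

/* Integer-only Bresenham covering all octants.
 * Returns the number of pixels visited, including clipped ones. */
unsigned draw_line(uint16_t *fb, int x0, int y0, int x1, int y1, uint16_t color)
{
    int dx = abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
    int dy = -abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
    int err = dx + dy; /* combined error term for both axes */
    unsigned visited = 0;

    for (;;) {
        put_pixel(fb, x0, y0, color);
        visited++;
        if (x0 == x1 && y0 == y1)
            break;
        int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; } /* step in x */
        if (e2 <= dx) { err += dx; y0 += sy; } /* step in y */
    }
    return visited;
}
```

Per-pixel clipping like this is the simplest safe option; clipping the line endpoints before the loop is faster but takes more care to get right.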

The Interview Questions They’ll Ask

“Explain Bresenham’s line algorithm in one minute.”
“What causes flicker in framebuffer updates?”
“Why is clipping necessary in graphics?”
“What is the difference between RGB565 and RGB666?”

Hints in Layers

Hint 1 (Start with put_pixel()): Build a reliable pixel writer before any other primitive.

Hint 2 (Integer math only): Avoid float operations to keep rendering fast.

Hint 3 (Clip early): Implement a clip rectangle and check bounds before drawing.

Hint 4 (Precompute colors): Cache common colors in RGB565 to reduce per-pixel overhead.

Books That Will Help

Common Pitfalls & Debugging

Problem 1: “Lines have gaps”

Why: Incorrect Bresenham step condition
Fix: Re-check error term update

Problem 2: “Random crashes”

Why: Missing clipping, writing outside buffer
Fix: Clip coordinates before drawing

Problem 3: “Colors look wrong”

Why: RGB565 conversion bug
Fix: Verify bit shifts and masks

Definition of Done

Draw lines, rectangles, and circles correctly
Render a sprite with transparency
No buffer overflows (verified with guard bytes)
Scene animates at 30+ FPS

Project 3: DMA Display Driver - Zero-CPU Frame Streaming

Main Programming Language: C
Alternative Programming Languages: Rust
Coolness Level: Level 4: Hardcore
Business Potential: 3. The “Performance” Level
Difficulty: Level 3: Advanced
Knowledge Area: DMA and high-throughput IO
Software or Tool: Pico SDK DMA APIs
Main Book: “Making Embedded Systems” by Elecia White

What you’ll build: A display driver that uses DMA to stream the framebuffer to SPI while the CPU renders the next frame.

Why it teaches performance engineering: DMA is the key to high frame rates and low CPU usage.

Core challenges you’ll face:

DMA configuration -> Correct DREQ and transfer size
Buffer synchronization -> Avoid tearing
Bus contention -> Optimize SRAM placement

Real World Outcome

Terminal output:

$ ./frame_test
DMA enabled: yes
Frame time: 16.2ms
CPU usage: 24%

Display: smooth animation with no tearing.

The Core Question You’re Answering

“How can I move 110 KB per frame without the CPU doing the copy?”
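The 110 KB figure comes from 172 x 320 pixels x 2 bytes = 110,080 bytes. The sketch below works out that number and a lower bound on the time needed to clock one frame over SPI. The SPI clock values in the usage note are example inputs, not board specifications, and the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define LCD_W 172u
#define LCD_H 320u
#define BYTES_PER_PIXEL 2u /* RGB565 */

/* Size of one full framebuffer in bytes. */
uint32_t frame_bytes(void)
{
    return LCD_W * LCD_H * BYTES_PER_PIXEL;
}

/* Lower bound, in microseconds, on the time to shift one frame out over SPI
 * at the given clock. Ignores command bytes and any gaps between bytes,
 * so real transfers take at least this long. Rounds up. */
uint32_t frame_time_us(uint32_t spi_hz)
{
    assert(spi_hz > 0);
    uint64_t bits = (uint64_t)frame_bytes() * 8u;
    return (uint32_t)((bits * 1000000u + spi_hz - 1u) / spi_hz);
}
```

For example, `frame_time_us(40000000)` gives 22016 us, which is already over the 16.7 ms budget for 60 FPS, while `frame_time_us(62500000)` gives 14091 us. This is why SPI clock rate and partial updates matter as much as DMA itself.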

Concepts You Must Understand First

DMA channels and DREQ
- How does pacing prevent FIFO overflow?
- Book Reference: RP2350 Datasheet DMA section
Double buffering
- Why is DMA + single buffer unsafe?
- Book Reference: “Making Embedded Systems” Ch. 7
SRAM banking
- Why does buffer placement affect throughput?
- Book Reference: RP2350 Datasheet address map

Questions to Guide Your Design

Where will your buffers live in SRAM banks?
How will you handle DMA completion interrupts?
How will you measure CPU utilization?

Thinking Exercise

“The Bus Contention Problem”

Suppose DMA and CPU both read from SRAM0. What happens to frame time?

The Interview Questions They’ll Ask

“What is DREQ and why is it needed?”
“How do you avoid tearing with DMA?”
“How would you profile DMA throughput?”
“What is the advantage of DMA chaining?”

Hints in Layers

Hint 1 (Validate DMA basics): Start with DMA in memory-to-memory mode to verify setup.

Hint 2 (DREQ pacing): Use DREQ_SPI0_TX so the DMA only writes when FIFO has space.

Hint 3 (DMA completion): Swap buffers only after DMA complete interrupt.

Hint 4 (Measure throughput): Measure SPI clock and transfer time with a logic analyzer.

Books That Will Help

Common Pitfalls & Debugging

Problem 1: “DMA stalls”

Why: DREQ mismatch or SPI not enabled
Fix: Ensure SPI is clocked and DREQ matches SPI TX

Problem 2: “Random tearing”

Why: Buffer swap before DMA completion
Fix: Swap only in DMA complete IRQ

Problem 3: “Frame rate is low”

Why: SPI clock too slow or buffer in slow SRAM bank
Fix: Increase SPI clock within spec, move buffer to striped SRAM

Definition of Done

DMA streams full frame without CPU loops
CPU usage below 30% at 30 FPS
No tearing during animation
DMA completion interrupt triggers buffer swap reliably

Project 4: RGB LED Controller with PIO (WS2812 Style)

Main Programming Language: C + PIO
Alternative Programming Languages: Rust
Coolness Level: Level 4: Hardcore Tech Flex
Business Potential: 2. The “Eye Candy” Level
Difficulty: Level 3: Advanced
Knowledge Area: PIO and deterministic IO
Software or Tool: PIO assembler
Main Book: “Making Embedded Systems” by Elecia White

What you’ll build: A PIO-based driver to control the onboard RGB LED (or an external WS2812 strip) with precise timing and smooth animations.

Why it teaches low-level IO: PIO is the RP2350’s secret weapon for custom protocols.

Core challenges you’ll face:

PIO timing -> Exact bit widths
DMA feeding -> Smooth LED updates
Signal verification -> Scope/logic analyzer testing

Real World Outcome

The LED cycles through a smooth rainbow at 60 FPS without CPU overhead.

The Core Question You’re Answering

“How do I generate perfect waveforms without CPU jitter?”

Concepts You Must Understand First

PIO instruction set
- How does OUT/SET timing work?
- Book Reference: RP2350 Datasheet PIO section
WS2812 timing
- Why are timing tolerances so strict?
- Book Reference: WS2812 datasheet
DMA feeding
- How does DMA keep PIO FIFOs full?
- Book Reference: RP2350 Datasheet DMA section

Questions to Guide Your Design

What PIO clock divider gives you exact WS2812 timings?
How will you pack RGB bytes into the PIO FIFO?
Will you use DMA or CPU writes?

Thinking Exercise

“Timing Budget”

If each WS2812 bit is 1.25 us, how many PIO cycles per bit do you need at 8 MHz?
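The arithmetic for this exercise is 1.25 us x 8 MHz = 10 PIO cycles per bit. The sketch below computes that, plus the clock divider needed to reach a chosen PIO clock, expressed in the 16.8 fixed-point form that PIO clock dividers use. The 150 MHz system clock in the usage note is an assumption; substitute whatever your firmware actually configures.

```c
#include <assert.h>
#include <stdint.h>

/* Cycles per bit = bit period x PIO clock, using integer nanoseconds and Hz. */
uint32_t pio_cycles_per_bit(uint32_t bit_period_ns, uint32_t pio_hz)
{
    return (uint32_t)(((uint64_t)bit_period_ns * pio_hz) / 1000000000u);
}

/* Clock divider as 16.8 fixed point: integer part in the upper bits,
 * 8-bit fraction in the low byte. Rounds down. */
uint32_t pio_clkdiv_16_8(uint32_t sys_hz, uint32_t pio_hz)
{
    assert(pio_hz > 0 && pio_hz <= sys_hz);
    return (uint32_t)(((uint64_t)sys_hz << 8) / pio_hz);
}
```

With an assumed 150 MHz system clock, `pio_clkdiv_16_8(150000000, 8000000)` returns 4800, that is an integer part of 18 and a fraction of 192/256 (18.75). A fractional divider adds a little cycle-to-cycle jitter, so if your waveform is marginal, consider a PIO clock that divides the system clock evenly.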

The Interview Questions They’ll Ask

“What is the advantage of PIO over bit-banging?”
“Why is WS2812 timing strict?”
“How do you verify PIO timing?”
“What is DMA’s role in LED control?”

Hints in Layers

Hint 1: Start with a known WS2812 PIO program.

Hint 2: Verify timing with a logic analyzer.

Hint 3: Feed PIO via DMA for smooth animation.

Hint 4: Use a small lookup table for gamma-corrected color.

Books That Will Help

Common Pitfalls & Debugging

Problem 1: “LED flickers”

Why: Timing jitter or incorrect divider
Fix: Adjust PIO clock divider and verify waveform

Problem 2: “Colors are wrong”

Why: RGB vs GRB byte order
Fix: Swap byte order before sending

Problem 3: “LED freezes after a few seconds”

Why: FIFO underflow without DMA pacing
Fix: Use DMA or add blocking writes to keep FIFO filled
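For the byte-order pitfall, here is a minimal sketch of packing colors in the green, red, blue order that WS2812 LEDs expect. It assumes a PIO program that shifts data out MSB-first and pulls 24 bits per LED from a 32-bit FIFO word, so the color sits in the top 24 bits; if your PIO program shifts differently, the packing must change to match. The type and function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t r, g, b;
} rgb_t;

/* Pack one color as G, R, B in the top 24 bits of a 32-bit word.
 * ASSUMPTION: the PIO program shifts left (MSB first), 24 bits per LED. */
uint32_t ws2812_pack_grb(uint8_t r, uint8_t g, uint8_t b)
{
    return ((uint32_t)g << 24) | ((uint32_t)r << 16) | ((uint32_t)b << 8);
}

/* Convert a strip of colors into FIFO words, ready to feed by DMA or CPU. */
void ws2812_pack_strip(const rgb_t *in, uint32_t *out, size_t count)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < count; i++)
        out[i] = ws2812_pack_grb(in[i].r, in[i].g, in[i].b);
}
```

Packing the whole strip into a word buffer first keeps the per-frame work in one place and gives DMA a contiguous source to read from.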

Definition of Done

LED color transitions are smooth
Timing verified with analyzer
CPU usage stays low during animation
Gamma correction improves color smoothness

Project 5: Dual-Core Rendering Engine

Main Programming Language: C
Alternative Programming Languages: Rust
Coolness Level: Level 5: Pure Magic
Business Potential: 3. The “Performance” Level
Difficulty: Level 4: Expert
Knowledge Area: Multicore + DMA
Software or Tool: Pico SDK multicore APIs
Main Book: “Computer Systems: A Programmer’s Perspective”

What you’ll build: A dual-core renderer where Core 0 renders frames and Core 1 handles DMA display updates, achieving stable frame times.

Real World Outcome

A demo scene with moving sprites at 60 FPS while CPU usage is split across cores.

The Core Question You’re Answering

“How do I split rendering and IO across cores without data races?”

Concepts You Must Understand First

Multicore FIFO and synchronization
- How do cores signal each other safely?
- Book Reference: RP2350 Datasheet multicore section
Double buffering
- Why must buffers be swapped carefully?
- Book Reference: “Making Embedded Systems” Ch. 7
DMA completion interrupts
- How do you know when transfers finish?
- Book Reference: RP2350 Datasheet DMA section

Questions to Guide Your Design

Which core owns SPI and DMA configuration?
How will you avoid buffer swaps mid-transfer?
How will you measure per-core CPU usage?

Thinking Exercise

“Ownership Model”

Write down which core owns each resource (DMA, SPI, framebuffer, timers).

The Interview Questions They’ll Ask

“How do you avoid data races in shared memory?”
“What is a safe buffer swap protocol?”
“How do you profile multicore performance?”
“Why can two cores still be slower than one?”

Hints in Layers

Hint 1: Assign one core as the sole owner of SPI and DMA.

Hint 2: Use a shared flag or FIFO to request swaps.

Hint 3: Swap only on DMA completion.

Hint 4: Use a cycle counter to measure frame time per core.
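One way to realize the "shared flag or FIFO" hint is a pair of single-producer, single-consumer queues that pass buffer indices between cores: one queue carries finished frames to the display core, the other returns free buffers to the render core. The sketch below uses C11 atomics and is an illustration of the ownership idea, not the Pico SDK's multicore API; all names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define Q_CAP 4u /* must be a power of two */

/* Single-producer, single-consumer ring of buffer indices.
 * Exactly one core pushes and exactly one other core pops. */
typedef struct {
    _Atomic uint32_t head; /* advanced only by the consumer */
    _Atomic uint32_t tail; /* advanced only by the producer */
    uint8_t slot[Q_CAP];
} spsc_q_t;

void q_init(spsc_q_t *q)
{
    assert(q != NULL);
    atomic_init(&q->head, 0);
    atomic_init(&q->tail, 0);
}

/* Producer side. Returns false if the queue is full. */
bool q_push(spsc_q_t *q, uint8_t v)
{
    uint32_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (tail - head == Q_CAP)
        return false;
    q->slot[tail % Q_CAP] = v;
    /* Release: the slot write becomes visible before the new tail does. */
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer side. Returns false if the queue is empty. */
bool q_pop(spsc_q_t *q, uint8_t *v)
{
    assert(v != NULL);
    uint32_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (head == tail)
        return false;
    *v = q->slot[head % Q_CAP];
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return true;
}
```

A buffer index lives in exactly one place at a time: the free queue, the render core's hands, the ready queue, or the display core's hands. The display core pushes an index back to the free queue only from its DMA completion handler, which enforces the "swap only on DMA completion" rule and preserves frame order.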

Books That Will Help

| Topic | Book | Chapter |
|-------|------|---------|
| Concurrency | “Computer Systems: A Programmer’s Perspective” | Ch. 12 |
| Scheduling | “Operating Systems: Three Easy Pieces” | Ch. 5-7 |

Common Pitfalls & Debugging

Problem 1: “Random tearing”

Why: Swap mid-transfer
Fix: Swap only on DMA completion interrupt

Problem 2: “Data corruption”

Why: Both cores writing same buffer
Fix: Strict ownership and locks

Problem 3: “Frame rate drops”

Why: Bus contention between cores
Fix: Place buffers in striped SRAM and minimize shared writes

Definition of Done

Stable 60 FPS with no tearing
Clear ownership of buffers
No data corruption in shared state
Measured per-core CPU usage and documented

Project 6: Font Rendering Engine

Main Programming Language: C
Alternative Programming Languages: Rust
Coolness Level: Level 3: Impressive
Business Potential: 2. The “UI” Level
Difficulty: Level 2: Intermediate
Knowledge Area: Text rendering
Software or Tool: Bitmap fonts
Main Book: “Computer Graphics from Scratch”

What you’ll build: A text rendering engine that supports multiple font sizes and partial screen updates.

Real World Outcome

A UI screen with crisp text at 16px and 24px sizes and no flicker.

The Core Question You’re Answering

“How do I render readable text efficiently on a tiny LCD?”

Concepts You Must Understand First

Bitmap fonts
- How are glyphs stored in memory?
- Book Reference: “Computer Graphics from Scratch” Ch. 5
Dirty rectangles
- How do you update only changed text?
- Book Reference: Rendering chapter notes
Baseline and spacing
- How do you align characters consistently?
- Book Reference: Typography basics

Questions to Guide Your Design

Will you store fonts in flash or SRAM?
How will you align glyph baselines?
How will you handle variable-width fonts?

Thinking Exercise

“The Text Baseline”

Draw a baseline grid for ‘A’, ‘g’, and ‘y’. How do descenders affect layout?

The Interview Questions They’ll Ask

“What is the cost of full-frame text redraws?”
“How do bitmap fonts differ from vector fonts?”
“Why do you need a baseline?”
“What is a glyph cache?”

Hints in Layers

Hint 1: Start with a fixed-width font for simplicity.

Hint 2: Store glyphs as bitmaps in a compact format (1 bpp).

Hint 3: Use a dirty rectangle around the text region.

Hint 4: Precompute a glyph cache for common characters.
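As a starting point for Hints 1 and 2, this sketch draws a fixed-width, 1 bpp glyph into an RGB565 framebuffer. The glyph layout (8 pixels wide, one byte per row, most significant bit on the left) and the `glyph8_t` type are assumptions chosen for simplicity, not a standard font format.

```c
#include <assert.h>
#include <stdint.h>

#define FB_W 172
#define FB_H 320

/* ASSUMED glyph format: 8 pixels wide, one byte per row, MSB = leftmost. */
typedef struct {
    uint8_t height;      /* number of rows */
    const uint8_t *rows; /* `height` bytes of 1 bpp bitmap data */
} glyph8_t;

/* Draw the set bits of a glyph in color `fg` with its top-left at (x, y).
 * Clear bits leave the background untouched; off-screen pixels are skipped. */
void draw_glyph(uint16_t *fb, int x, int y, const glyph8_t *g, uint16_t fg)
{
    assert(fb != 0 && g != 0 && g->rows != 0);
    for (int row = 0; row < g->height; row++) {
        int py = y + row;
        if (py < 0 || py >= FB_H)
            continue;
        uint8_t bits = g->rows[row];
        for (int col = 0; col < 8; col++) {
            int px = x + col;
            if (px < 0 || px >= FB_W)
                continue;
            if (bits & (0x80u >> col))
                fb[py * FB_W + px] = fg;
        }
    }
}
```

At 1 bpp, an 8x16 glyph costs 16 bytes, so a 95-character printable ASCII set fits in about 1.5 KB of flash.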

Books That Will Help

| Topic | Book | Chapter |
|-------|------|---------|
| Rasterization | “Computer Graphics from Scratch” | Ch. 5 |
| Bitwise operations | “Effective C” | Ch. 4 |

Common Pitfalls & Debugging

Problem 1: “Text looks jagged”

Why: Low-resolution font or wrong scaling
Fix: Use pre-rendered font sizes

Problem 2: “Text alignment off”

Why: Baseline not handled
Fix: Add ascent/descent metrics per font

Problem 3: “Text flickers”

Why: Redrawing full screen
Fix: Use dirty rectangles for partial updates
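A dirty rectangle can be as simple as one bounding box that grows to cover everything drawn since the last flush. The sketch below tracks that box and reports how many bytes a partial update would send; the names are illustrative, and a single merged box is only one of several possible strategies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LCD_W 172
#define LCD_H 320

typedef struct {
    int x0, y0; /* inclusive top-left */
    int x1, y1; /* exclusive bottom-right */
    bool valid; /* false when nothing has been drawn since the last flush */
} dirty_rect_t;

void dirty_reset(dirty_rect_t *d)
{
    assert(d != 0);
    d->x0 = d->y0 = d->x1 = d->y1 = 0;
    d->valid = false;
}

/* Grow the dirty box to include a w x h region at (x, y), clamped to the screen. */
void dirty_add(dirty_rect_t *d, int x, int y, int w, int h)
{
    assert(d != 0);
    int x0 = x < 0 ? 0 : x;
    int y0 = y < 0 ? 0 : y;
    int x1 = x + w > LCD_W ? LCD_W : x + w;
    int y1 = y + h > LCD_H ? LCD_H : y + h;
    if (x0 >= x1 || y0 >= y1)
        return; /* empty or entirely off screen */
    if (!d->valid) {
        d->x0 = x0; d->y0 = y0; d->x1 = x1; d->y1 = y1;
        d->valid = true;
        return;
    }
    if (x0 < d->x0) d->x0 = x0;
    if (y0 < d->y0) d->y0 = y0;
    if (x1 > d->x1) d->x1 = x1;
    if (y1 > d->y1) d->y1 = y1;
}

/* Bytes of RGB565 data a flush of the dirty box would send. */
uint32_t dirty_bytes(const dirty_rect_t *d)
{
    if (!d->valid)
        return 0;
    return (uint32_t)(d->x1 - d->x0) * (uint32_t)(d->y1 - d->y0) * 2u;
}
```

At flush time, the box becomes the CASET/RASET window, and only `dirty_bytes()` bytes follow RAMWR. Two small regions far apart merge into one large box, so a list of rectangles can pay off when updates are scattered.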

Definition of Done

Render ASCII text in multiple sizes
Partial updates only redraw changed regions
Text baseline and spacing are correct
Glyph cache reduces render time for repeated strings

Project 7: ARM vs RISC-V Benchmark Suite

Main Programming Language: C
Alternative Programming Languages: Rust
Coolness Level: Level 5: Pure Magic
Business Potential: 1. The “Resume Gold”
Difficulty: Level 4: Expert
Knowledge Area: ISA comparison
Software or Tool: Dual toolchains
Main Book: “Computer Organization and Design RISC-V”

What you’ll build: A benchmark harness that runs the same rendering kernel on ARM and RISC-V and compares throughput.

Real World Outcome

A dashboard showing cycles per pixel, memory bandwidth, and FPS for both ISAs.

The Core Question You’re Answering

“How does ISA choice change real-world performance on the same silicon?”

Concepts You Must Understand First

Instruction set differences
- How do ARM and RISC-V encode instructions?
- Book Reference: “Computer Organization and Design RISC-V” Ch. 1-2
Cycle counting
- How do you measure cycles reliably?
- Book Reference: RP2350 Datasheet counters section
Benchmark methodology
- How do you avoid bias?
- Book Reference: Computer Architecture intro

Questions to Guide Your Design

How will you isolate CPU performance from memory effects?
Which kernels represent real workloads (blit, fill, memcpy)?
How will you collect statistically stable results?

Thinking Exercise

“Benchmark Fairness”

What variables must be identical between ARM and RISC-V runs?

The Interview Questions They’ll Ask

“Why is a fair benchmark hard to build?”
“What is the impact of instruction density on flash fetch?”
“How does FPU presence change graphics workloads?”
“Why should you disable interrupts during benchmarks?”

Hints in Layers

Hint 1: Start with a simple memcpy benchmark.

Hint 2: Use the same compiler optimization flags.

Hint 3: Lock clocks and disable interrupts during measurement.

Hint 4: Log results over serial to avoid display overhead.
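For statistically stable results, one common approach is to repeat each kernel several times and report the median cycle count, which is less sensitive to a stray interrupt than the mean. The helpers below are a hedged, host-runnable sketch of that post-processing; how you obtain the raw cycle counts on each ISA is left to your harness.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a;
    uint32_t y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Median of n cycle-count samples. Sorts the array in place. */
uint32_t median_u32(uint32_t *samples, size_t n)
{
    assert(samples != NULL && n > 0);
    qsort(samples, n, sizeof samples[0], cmp_u32);
    if (n & 1u)
        return samples[n / 2];
    return (uint32_t)(((uint64_t)samples[n / 2 - 1] + samples[n / 2]) / 2u);
}

/* Cycles per pixel in hundredths (e.g. 250 means 2.50 cycles/pixel),
 * kept as an integer so both ISAs report through identical code paths. */
uint32_t cycles_per_pixel_x100(uint32_t cycles, uint32_t pixels)
{
    assert(pixels > 0);
    return (uint32_t)(((uint64_t)cycles * 100u) / pixels);
}
```

Reporting an integer metric avoids pulling floating point into the measurement path, which matters here because only one of the two cores types has a hardware FPU.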

Books That Will Help

| Topic | Book | Chapter |
|-------|------|---------|
| ISA fundamentals | “Computer Organization and Design RISC-V” | Ch. 1-3 |
| Performance | “Computer Architecture” (Hennessy/Patterson) | Ch. 1 |

Common Pitfalls & Debugging

Problem 1: “Results inconsistent”

Why: Interrupts or variable clock rates
Fix: Disable interrupts and fix clocks

Problem 2: “ARM seems much faster”

Why: Using hardware FPU on ARM, software emulation on RISC-V
Fix: Compare integer-only kernels for fairness

Problem 3: “Benchmark slows display”

Why: Measuring while rendering
Fix: Run benchmarks offscreen and report results later

Definition of Done

Benchmarks run on both ARM and RISC-V
Results logged with cycle counts
Analysis explains differences
Graphs rendered on LCD with comparison bars

Project 8: TF Card Image Viewer

Main Programming Language: C
Alternative Programming Languages: MicroPython
Coolness Level: Level 3: Impressive
Business Potential: 3. The “Product” Level
Difficulty: Level 2: Intermediate
Knowledge Area: FAT filesystem
Software or Tool: FAT library
Main Book: “Operating Systems: Three Easy Pieces”

What you’ll build: An image viewer that reads BMP/RAW images from a TF card and displays them.

Real World Outcome

Insert the TF (microSD) card, choose an image, and see it on screen within seconds.

The Core Question You’re Answering

“How do I stream large assets from storage without running out of RAM?”

Concepts You Must Understand First

FAT filesystem basics
- How do clusters map to file data?
- Book Reference: OSTEP Ch. 39
Streaming IO
- How do you read large files in chunks?
- Book Reference: “Making Embedded Systems” Ch. 8
Pixel format conversion
- How do you map BMP formats to RGB565?
- Book Reference: Graphics references

Questions to Guide Your Design

Will you decode images on the fly or preconvert?
How will you cache FAT sectors to reduce IO?
How will you display loading progress?

Thinking Exercise

“Chunked Load”

If your buffer is 4 KB, how many reads are needed for a 100 KB image?
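The answer is a ceiling division: 100 KB / 4 KB = 25 reads, and any remainder costs one extra read. A small sketch, with illustrative function names, that also shows how many whole RGB565 rows fit in one chunk:

```c
#include <assert.h>
#include <stdint.h>

/* Number of chunk-sized reads needed to cover a file (ceiling division). */
uint32_t reads_needed(uint32_t file_bytes, uint32_t chunk_bytes)
{
    assert(chunk_bytes > 0);
    return (uint32_t)(((uint64_t)file_bytes + chunk_bytes - 1u) / chunk_bytes);
}

/* Whole display rows that fit in one chunk, given bytes per row. */
uint32_t rows_per_chunk(uint32_t chunk_bytes, uint32_t row_bytes)
{
    assert(row_bytes > 0);
    return chunk_bytes / row_bytes;
}
```

For a 172-pixel-wide RGB565 image, each row is 344 bytes, so a 4 KB buffer holds 11 whole rows. Reading in multiples of whole rows keeps the display writes aligned to row boundaries.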

The Interview Questions They’ll Ask

“Why is FAT still used in embedded systems?”
“What is a cluster in FAT?”
“How do you stream data with limited RAM?”
“Why convert images to RGB565 offline?”

Hints in Layers

Hint 1: Use a known FAT library first, then optimize.

Hint 2: Read image rows and send to display as you go.

Hint 3: Cache FAT tables to reduce repeated reads.

Books That Will Help

| Topic | Book | Chapter |
|-------|------|---------|
| Filesystems | “Operating Systems: Three Easy Pieces” | Ch. 39 |
| Embedded IO | “Making Embedded Systems” | Ch. 8 |

Common Pitfalls & Debugging

Problem 1: “Images load slowly”

Why: Many small reads without caching
Fix: Use larger reads and cache FAT

Problem 2: “Corrupted image”

Why: Wrong endian or pixel format conversion
Fix: Validate BMP header and color depth

Problem 3: “Out of memory”

Why: Trying to load full image into RAM
Fix: Stream line-by-line

Definition of Done

FAT parsing works for root directory
Image files load correctly
UI shows loading progress
At least 5 images can be displayed in sequence

Project 9: Real-Time System Monitor

Main Programming Language: C
Alternative Programming Languages: Rust
Coolness Level: Level 4: Hardcore
Business Potential: 2. The “Diagnostics” Level
Difficulty: Level 2: Intermediate
Knowledge Area: Timers and metrics
Software or Tool: ADC + timers
Main Book: “Making Embedded Systems”

What you’ll build: A system monitor UI that displays CPU utilization, temperature, and FPS in real time.

Real World Outcome

A dashboard with live graphs and numeric stats updating at 10 Hz.

The Core Question You’re Answering

“How do I measure and visualize system health in real time?”

Concepts You Must Understand First

Timers and tick counters
- How do you measure time accurately?
- Book Reference: “Making Embedded Systems” Ch. 7
ADC temperature sensor
- How do you convert ADC value to degrees?
- Book Reference: RP2350 Datasheet ADC section
Graph rendering
- How do you draw graphs efficiently?
- Book Reference: Graphics references

Questions to Guide Your Design

How will you measure CPU idle time?
How will you draw graphs efficiently?
How will you avoid the monitor affecting performance?

Thinking Exercise

“Observer Effect”

How can measuring CPU usage increase CPU usage?

The Interview Questions They’ll Ask

“How do you measure CPU utilization on a microcontroller?”
“What is sampling rate vs accuracy trade-off?”
“How do you avoid the monitor distorting results?”
“Why is double buffering useful for graphs?”

Hints in Layers

Hint 1: Use a periodic timer interrupt to sample counters.

Hint 2: Track idle loop cycles to estimate CPU usage.

Hint 3: Draw only changed graph regions (dirty rectangles).
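Hints 1 and 2 can be combined into a small meter: the idle loop increments a counter, and a periodic sample converts it to a percentage and resets it. This is a sketch under the assumption that `idle_max` (idle iterations per window with no load) has been calibrated once at startup; the `cpu_meter_t` type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    volatile uint32_t idle_ticks; /* incremented by the idle loop */
    uint32_t idle_max;            /* ASSUMED: calibrated ticks per window at 0% load */
    uint32_t last_percent;        /* result of the most recent sample */
} cpu_meter_t;

/* Usage = 100% minus the fraction of the window spent idle. */
uint32_t cpu_usage_percent(uint32_t idle_count, uint32_t idle_max)
{
    assert(idle_max > 0);
    if (idle_count > idle_max)
        idle_count = idle_max; /* clamp calibration drift */
    return 100u - (uint32_t)(((uint64_t)idle_count * 100u) / idle_max);
}

/* Call once per pass through the idle loop. */
void cpu_meter_idle_tick(cpu_meter_t *m)
{
    m->idle_ticks++;
}

/* Call from the periodic sampling timer. Resets the counter so each
 * window is measured independently. */
void cpu_meter_sample(cpu_meter_t *m)
{
    assert(m != 0);
    m->last_percent = cpu_usage_percent(m->idle_ticks, m->idle_max);
    m->idle_ticks = 0;
}
```

Forgetting the reset in `cpu_meter_sample()` is the usual cause of a reading stuck at 0%, because the idle count keeps growing past `idle_max`.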

Books That Will Help

| Topic | Book | Chapter |
|-------|------|---------|
| Timers | “Making Embedded Systems” | Ch. 7 |
| Debugging | “The Art of Debugging with GDB” | Ch. 5 |

Common Pitfalls & Debugging

Problem 1: “CPU usage shows 0% or 100%”

Why: Counter not reset properly
Fix: Reset counters every sampling interval

Problem 2: “Temperature fluctuates wildly”

Why: ADC noise or incorrect formula
Fix: Average multiple samples

Problem 3: “Graphs flicker”

Why: Full-screen redraws
Fix: Use dirty rectangles

Definition of Done

CPU usage graph updates smoothly
Temperature readings are stable
UI remains responsive under load
Sampling interval is documented and adjustable

Project 10: Bare-Metal Display Driver - No SDK, Just Registers

Main Programming Language: C (or Assembly)
Alternative Programming Languages: Rust (no_std)
Coolness Level: Level 5: Pure Magic
Business Potential: 1. The “Resume Gold”
Difficulty: Level 5: Master
Knowledge Area: Bare-metal programming
Software or Tool: arm-none-eabi-gcc, linker scripts
Main Book: “Bare Metal C” by Steve Oualline

What you’ll build: A complete SPI + LCD driver without the Pico SDK. Full control of clocks, pads, SPI, and memory.

Real World Outcome

Display output identical to Project 1, but with no SDK dependency.

The Core Question You’re Answering

“Can I bring up the RP2350 from reset to pixels with only a datasheet?”

Concepts You Must Understand First

Startup code and linker scripts
- How is memory laid out at boot?
- Book Reference: “Bare Metal C” Ch. 2-3
Clock and reset registers
- Which registers must be touched first?
- Book Reference: RP2350 Datasheet clock section
Vector table
- How does the CPU find interrupt handlers?
- Book Reference: Arm Cortex-M docs

Questions to Guide Your Design

Where will your vector table live?
How will you configure the stack pointer?
What is the minimal clock setup for SPI?

Thinking Exercise

“Minimal Boot”

List the absolute minimum steps to light one pixel without SDK.

The Interview Questions They’ll Ask

“What is the role of the vector table?”
“Why is linker script critical in bare metal?”
“How do you initialize .data and .bss?”
“What happens if you forget to enable a peripheral clock?”

Hints in Layers

Hint 1: Start from a known minimal linker script.

Hint 2: Copy .data and zero .bss before calling main().

Hint 3: Configure clocks before enabling SPI.

Hint 4: Use memory-mapped register defines from the datasheet.
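Hint 2 is the core of any C startup routine. The sketch below takes the section boundaries as parameters so it can be tested on a host; on the target, a reset handler would pass addresses exported by your linker script (the symbol names depend entirely on that script, so none are assumed here).

```c
#include <assert.h>
#include <stdint.h>

/* Copy initialized data from its load address (flash) to its run address
 * (RAM), then zero the .bss region. All pointers are word-aligned. */
void init_sections(const uint32_t *data_load,
                   uint32_t *data_start, uint32_t *data_end,
                   uint32_t *bss_start, uint32_t *bss_end)
{
    assert(data_start <= data_end && bss_start <= bss_end);

    for (uint32_t *dst = data_start; dst < data_end; )
        *dst++ = *data_load++; /* .data: flash image -> RAM */

    for (uint32_t *dst = bss_start; dst < bss_end; )
        *dst++ = 0;            /* .bss: C requires zero-initialized statics */
}
```

This must run before `main()` and before any code that reads a global or static variable, which is why skipping it shows up as seemingly random faults.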

Books That Will Help

Common Pitfalls & Debugging

Problem 1: “No boot”

Why: Stack pointer or vector table wrong
Fix: Verify linker script addresses

Problem 2: “SPI dead”

Why: Clocks not enabled
Fix: Configure clock and reset registers

Problem 3: “Random faults”

Why: Uninitialized .bss or .data
Fix: Ensure startup code clears and copies sections

Definition of Done

System boots without SDK
Clocks and SPI configured manually
LCD init sequence works from scratch
Binary size documented and under 10 KB

Project 11: Simple Game - Pong or Snake

Main Programming Language: C
Alternative Programming Languages: Rust, MicroPython
Coolness Level: Level 4: Hardcore
Business Potential: 2. The “Fun” Level
Difficulty: Level 3: Advanced
Knowledge Area: Game loop + rendering
Software or Tool: Pico SDK
Main Book: “Game Programming Patterns” by Robert Nystrom

What you’ll build: A playable Pong or Snake game with button input and smooth animation.

Real World Outcome

A playable game at 60 FPS with score display.

The Core Question You’re Answering

“How do I build a real-time game loop on constrained hardware?”

Concepts You Must Understand First

Fixed timestep game loop
- Why is fixed timestep more stable?
- Book Reference: “Game Programming Patterns” Ch. 3
Input debouncing
- Why do buttons bounce?
- Book Reference: “Making Embedded Systems” Ch. 6
Collision detection
- How do you detect overlaps efficiently?
- Book Reference: Graphics basics

Questions to Guide Your Design

How will you keep frame time stable?
How will you represent game state and collisions?
How will you handle input latency?

Thinking Exercise

“Frame Budget”

If you target 60 FPS, how many milliseconds per frame do you have?

The Interview Questions They’ll Ask

“What is a fixed timestep loop?”
“Why does input debouncing matter?”
“How do you avoid frame drops?”
“What is the trade-off between responsiveness and stability?”

Hints in Layers

Hint 1: Start with a 60 Hz timer interrupt.

Hint 2: Separate update() from render().

Hint 3: Use simple collision boxes for sprites.
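Separating update() from render() usually goes hand in hand with a fixed-timestep accumulator: elapsed time is banked, and update() runs once per whole step. The sketch below assumes a 60 Hz step (16,667 us) and a cap of 5 catch-up steps per frame; both constants and the names are illustrative choices.

```c
#include <assert.h>
#include <stdint.h>

#define STEP_US   16667u /* one update step: 1/60 s, rounded */
#define MAX_STEPS 5u     /* ASSUMED cap on catch-up updates per frame */

typedef struct {
    uint32_t accumulator_us; /* elapsed time not yet consumed by updates */
} game_clock_t;

/* Bank the elapsed time and return how many update() calls to run now. */
uint32_t game_clock_advance(game_clock_t *c, uint32_t elapsed_us)
{
    assert(c != 0);
    c->accumulator_us += elapsed_us;

    uint32_t steps = 0;
    while (c->accumulator_us >= STEP_US && steps < MAX_STEPS) {
        c->accumulator_us -= STEP_US;
        steps++;
    }
    /* If the cap was hit, drop the remaining whole steps so one long stall
     * does not cause an ever-growing backlog of updates. */
    if (steps == MAX_STEPS)
        c->accumulator_us %= STEP_US;
    return steps;
}
```

A typical main loop measures the time since the previous iteration, calls update() the returned number of times, and then calls render() once.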

Books That Will Help

| Topic | Book | Chapter |
|-------|------|---------|
| Game loop | “Game Programming Patterns” | Ch. 3 |
| Input handling | “Making Embedded Systems” | Ch. 6 |

Common Pitfalls & Debugging

Problem 1: “Game speed varies”

Why: Variable timestep
Fix: Use fixed timestep and accumulate delta

Problem 2: “Input feels laggy”

Why: Slow polling
Fix: Sample input every frame

Problem 3: “Collision misses”

Why: Objects move too fast per frame
Fix: Use smaller timestep or swept collisions

Definition of Done

Game runs at fixed frame rate
Input debounced correctly
Score and state updates correctly
Frame time stays within 16.7 ms budget

Project 12: USB HID Device - Custom Controller

Main Programming Language: C
Alternative Programming Languages: Rust
Coolness Level: Level 4: Hardcore
Business Potential: 3. The “Product” Level
Difficulty: Level 4: Expert
Knowledge Area: USB protocol
Software or Tool: TinyUSB
Main Book: “USB Complete” by Jan Axelson

What you’ll build: A USB HID device recognized as a game controller or macro keypad with LCD status display.

Real World Outcome

Host OS recognizes the device as a HID controller; LCD shows status.

The Core Question You’re Answering

“How do I make a microcontroller enumerate as a standard USB device?”

Concepts You Must Understand First

USB descriptors
- What information does the host need?
- Book Reference: “USB Complete” Ch. 1-4
HID reports
- How do you pack buttons and axes?
- Book Reference: “USB Complete” Ch. 8
Endpoint polling
- How often can you send reports?
- Book Reference: USB fundamentals

Questions to Guide Your Design

What HID report format will you implement?
How often will you send reports?
How will you indicate connection state on the LCD?

Thinking Exercise

“Descriptor Design”

Sketch a HID report descriptor for 4 buttons and 2 axes.

The Interview Questions They’ll Ask

“What happens during USB enumeration?”
“What is a HID report descriptor?”
“Why is polling interval important?”
“Why does the host control report size?”

Hints in Layers

Hint 1: Start with TinyUSB examples.

Hint 2: Define a simple report: 1 byte buttons, 2 bytes axes.

Hint 3: Use a timer to send reports at a fixed interval.
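As one possible shape for the report in Hint 2, here is a hedged sketch of a HID report descriptor for 4 buttons and 2 signed 8-bit axes, together with the matching 3-byte report struct. The item bytes follow the HID report descriptor encoding, but this exact descriptor has not been validated against a host here; treat it as a starting point and check it with a descriptor tool before relying on it. The identifier names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Report layout: byte 0 = buttons (bits 0-3) + 4 padding bits,
 * byte 1 = X, byte 2 = Y. */
const uint8_t gamepad_report_desc[] = {
    0x05, 0x01, /* Usage Page (Generic Desktop)      */
    0x09, 0x05, /* Usage (Game Pad)                  */
    0xA1, 0x01, /* Collection (Application)          */
    0x05, 0x09, /*   Usage Page (Button)             */
    0x19, 0x01, /*   Usage Minimum (Button 1)        */
    0x29, 0x04, /*   Usage Maximum (Button 4)        */
    0x15, 0x00, /*   Logical Minimum (0)             */
    0x25, 0x01, /*   Logical Maximum (1)             */
    0x75, 0x01, /*   Report Size (1 bit)             */
    0x95, 0x04, /*   Report Count (4)                */
    0x81, 0x02, /*   Input (Data, Variable, Absolute)*/
    0x75, 0x04, /*   Report Size (4 bits)            */
    0x95, 0x01, /*   Report Count (1)                */
    0x81, 0x03, /*   Input (Constant) - padding      */
    0x05, 0x01, /*   Usage Page (Generic Desktop)    */
    0x09, 0x30, /*   Usage (X)                       */
    0x09, 0x31, /*   Usage (Y)                       */
    0x15, 0x81, /*   Logical Minimum (-127)          */
    0x25, 0x7F, /*   Logical Maximum (127)           */
    0x75, 0x08, /*   Report Size (8 bits)            */
    0x95, 0x02, /*   Report Count (2)                */
    0x81, 0x02, /*   Input (Data, Variable, Absolute)*/
    0xC0        /* End Collection                    */
};

/* Must match the descriptor exactly: 4 + 4 + 8 + 8 bits = 3 bytes. */
typedef struct {
    uint8_t buttons; /* bits 0-3 used, bits 4-7 must be zero */
    int8_t x;
    int8_t y;
} gamepad_report_t;

gamepad_report_t gamepad_make_report(uint8_t buttons, int8_t x, int8_t y)
{
    gamepad_report_t r;
    assert(x >= -127 && y >= -127); /* -128 is outside the declared range */
    r.buttons = (uint8_t)(buttons & 0x0Fu);
    r.x = x;
    r.y = y;
    return r;
}
```

The padding item is what keeps the axes byte-aligned. If the struct size and the descriptor's total bit count ever disagree, the host silently misreads or drops reports, which is the "Inputs not updating" pitfall.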

Books That Will Help

| Topic | Book | Chapter |
|-------|------|---------|
| USB basics | “USB Complete” | Ch. 1-4 |
| HID | “USB Complete” | Ch. 8 |

Common Pitfalls & Debugging

Problem 1: “Device not recognized”

Why: Descriptor mismatch or incorrect VID/PID
Fix: Validate descriptor and use known VID/PID for testing

Problem 2: “Inputs not updating”

Why: Report size mismatch
Fix: Ensure report length matches descriptor

Problem 3: “Laggy input”

Why: Polling interval too long
Fix: Reduce interval while respecting USB limits

Definition of Done

USB enumerates correctly on Windows/Mac/Linux
HID reports update in real time
LCD shows connection status
Report rate documented and stable

Project 13: Mini Operating System - Cooperative Multitasking

Main Programming Language: C
Alternative Programming Languages: Rust
Coolness Level: Level 5: Pure Magic
Business Potential: 1. The “Resume Gold”
Difficulty: Level 5: Master
Knowledge Area: OS design
Software or Tool: Bare-metal
Main Book: “Operating Systems: Three Easy Pieces”

What you’ll build: A cooperative multitasking mini-OS with a task manager UI on the LCD.

Real World Outcome

A task manager screen showing multiple tasks, CPU usage, and stack stats.

The Core Question You’re Answering

“How do I design a tiny OS that schedules tasks on a microcontroller?”

Concepts You Must Understand First

Context switching
- Which registers must be saved and restored?
- Book Reference: OSTEP Ch. 4-6
Scheduling policies
- Round-robin vs priority
- Book Reference: OSTEP Ch. 7-9
Stack management
- How much stack does each task need?
- Book Reference: “Effective C” Ch. 5

Questions to Guide Your Design

How will you store task control blocks?
How will tasks yield control?
How will you measure per-task CPU usage?

Thinking Exercise

“Stack Size”

How much stack does a task need if it calls three nested functions?
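A rough way to work the exercise is to add up the frame size of each nested call, add the space the context switch pushes when the task yields, apply a safety margin, and round up to the stack alignment. The function below is a sketch of that arithmetic only. The name `stack_budget_bytes` is hypothetical and every number in the usage note is illustrative, not measured. Real frame sizes can be obtained from GCC's `-fstack-usage` output.

```c
#include <assert.h>
#include <stddef.h>

/* Estimate a task's stack size in bytes.
 *   frame_bytes    : stack frame size of each function on the deepest path
 *   depth          : number of entries in frame_bytes
 *   context_bytes  : bytes pushed by the context switch when the task yields
 *   margin_percent : safety margin added on top of the sum
 *   align          : required stack alignment, must be a power of two */
static size_t stack_budget_bytes(const size_t *frame_bytes, size_t depth,
                                 size_t context_bytes, unsigned margin_percent,
                                 size_t align)
{
    assert(frame_bytes != NULL || depth == 0);
    assert(align != 0 && (align & (align - 1)) == 0);

    size_t total = context_bytes;
    for (size_t i = 0; i < depth; i++)
        total += frame_bytes[i];

    total += (total * margin_percent + 99u) / 100u;  /* margin, rounded up */
    return (total + align - 1) & ~(align - 1);       /* round up to align */
}
```

For example, three nested frames of 32, 48, and 24 bytes plus 40 bytes of saved context come to 144 bytes; a 25% margin brings that to 180, and rounding up to the 8-byte alignment the Arm procedure call standard requires gives 184. If interrupts run on the task's stack, their frames must be added to the budget as well.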

The Interview Questions They’ll Ask

“What is a context switch?”
“How do you avoid starvation?”
“What is cooperative vs preemptive scheduling?”
“Why is stack size allocation critical?”

Hints in Layers

Hint 1: Start with two tasks: idle and display.

Hint 2: Use a simple round-robin scheduler.

Hint 3: Track stack pointer per task and restore on switch.
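The policy half of Hints 2 and 3 fits in a few lines of portable C. The sketch below shows a hypothetical task control block that stores the saved stack pointer, and a round-robin picker that returns the next ready task. The actual register save and restore is architecture-specific assembly and is deliberately not shown; `tcb_t`, `task_state_t`, and `sched_next` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { TASK_UNUSED = 0, TASK_READY, TASK_BLOCKED } task_state_t;

/* Hypothetical task control block. */
typedef struct {
    uint32_t    *sp;     /* saved stack pointer, written by the context switch */
    task_state_t state;  /* only TASK_READY tasks are eligible to run */
    const char  *name;   /* label for the task manager UI */
} tcb_t;

/* Round-robin: scan forward from the task after `current`, wrapping around,
 * and return the index of the first READY task. The scan ends on `current`
 * itself, so a lone ready task keeps running. Returns -1 if nothing is ready. */
static int sched_next(const tcb_t *tasks, size_t count, size_t current)
{
    assert(tasks != NULL && count > 0 && current < count);
    for (size_t step = 1; step <= count; step++) {
        size_t i = (current + step) % count;
        if (tasks[i].state == TASK_READY)
            return (int)i;
    }
    return -1;
}
```

A yield routine would then save the callee-saved registers and stack pointer into the current task's `sp` field, call `sched_next`, and restore from the chosen task's `sp`. Keeping an always-ready idle task (Hint 1) means the -1 case never occurs in practice.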

Books That Will Help

| Topic | Book | Chapter |
|-------|------|---------|
| Scheduling | “Operating Systems: Three Easy Pieces” | Ch. 7-9 |
| Low-level C | “Effective C” | Ch. 5 |

Common Pitfalls & Debugging

Problem 1: “Tasks crash after switch”

Why: Stack pointer not restored correctly
Fix: Save/restore SP and callee-saved registers

Problem 2: “Scheduler hangs”

Why: Task never yields
Fix: Enforce yield points

Problem 3: “Stack overflow”

Why: Task stack too small
Fix: Add guard regions and measure max depth
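The fix for Problem 3 is often implemented with stack painting: fill each task stack with a known pattern before the task starts, then count how much of the pattern survives. The sketch below assumes a descending stack (the stack grows from the highest address toward `stack[0]`), which is the usual convention on both RP2350 core types. The fill value, the guard size, and all function names are arbitrary choices for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL  0xA5A5A5A5u  /* arbitrary, easy-to-spot pattern */
#define GUARD_WORDS 4u           /* words at the low end that must stay untouched */

/* Fill the whole stack with the pattern before the task first runs. */
static void stack_paint(uint32_t *stack, size_t words)
{
    assert(stack != NULL);
    for (size_t i = 0; i < words; i++)
        stack[i] = STACK_FILL;
}

/* Count pattern words still intact at the low end. Because the stack grows
 * downward, these are the words the task has never reached. */
static size_t stack_unused_words(const uint32_t *stack, size_t words)
{
    assert(stack != NULL);
    size_t n = 0;
    while (n < words && stack[n] == STACK_FILL)
        n++;
    return n;
}

/* High-water mark: the deepest the task has ever gone, in words. */
static size_t stack_max_used_words(const uint32_t *stack, size_t words)
{
    return words - stack_unused_words(stack, words);
}

/* True while the guard region at the low end still holds the pattern. */
static bool stack_guard_intact(const uint32_t *stack, size_t words)
{
    assert(stack != NULL && words >= GUARD_WORDS);
    for (size_t i = 0; i < GUARD_WORDS; i++)
        if (stack[i] != STACK_FILL)
            return false;
    return true;
}
```

Checking `stack_guard_intact` at every yield point turns a silent overflow into an immediate, reportable fault, and `stack_max_used_words` supplies the stack stats for the task manager screen. The measurement can undercount if a task happens to write the fill value itself, so treat it as an estimate.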

Definition of Done

Multiple tasks run cooperatively without corrupting each other's state
Context switching works for at least 4 tasks
Task manager UI updates in real time
Scheduler overhead measured and documented

Sources and References

RP2350 Datasheet (Raspberry Pi): https://datasheets.raspberrypi.com/rp2350/rp2350-datasheet.pdf
Raspberry Pi silicon documentation: https://www.raspberrypi.com/documentation/microcontrollers/silicon.html
Raspberry Pi Pico SDK: https://github.com/raspberrypi/pico-sdk
Pico SDK hardware APIs (PIO, DMA): https://www.raspberrypi.com/documentation/pico-sdk/hardware.html
Waveshare RP2350-LCD-1.47-A wiki: https://www.waveshare.com/wiki/RP2350-LCD-1.47-A
ST7789 datasheet (Sitronix): https://www.newhavendisplay.com/appnotes/datasheets/LCDs/ST7789V.pdf
Hazard3 RISC-V core: https://github.com/Wren6991/Hazard3
TinyUSB: https://github.com/hathach/tinyusb
UF2 bootloader docs: https://www.raspberrypi.com/documentation/microcontrollers/raspberry-pi-pico.html