Project 3: DMA Display Driver - Zero-CPU Frame Streaming

Offload pixel transfers to DMA so the LCD updates while the CPU renders the next frame.

Quick Reference

Attribute	Value
Difficulty	Level 3: Advanced
Time Estimate	1-2 weeks
Main Programming Language	C (Alternatives: Rust)
Alternative Programming Languages	Rust (pico-sdk)
Coolness Level	Level 4: Hardcore
Business Potential	3. The “Product” Level
Prerequisites	Project 1 SPI LCD bring-up, Project 2 framebuffer basics
Key Topics	DMA, SPI FIFO, DREQ pacing, double buffering

1. Learning Objectives

By completing this project, you will:

Configure RP2350 DMA channels to stream RGB565 frames to SPI.
Use DREQ to pace DMA transfers to the SPI FIFO.
Implement double buffering to render and transmit concurrently.
Measure frame time improvements versus CPU-driven SPI.
Build a tear-free update pipeline with deterministic timing.

2. All Theory Needed (Per-Concept Breakdown)

2.1 DMA Engines and DREQ Pacing

Fundamentals

DMA (Direct Memory Access) lets a dedicated hardware engine move data between memory and a peripheral register without CPU intervention. On the RP2350, DMA channels can be configured to read from RAM and write to the SPI FIFO. A key concept is DREQ (DMA request): a hardware signal that tells DMA when the peripheral is ready for more data. Without the correct DREQ, a DMA channel may overflow the FIFO (when it runs unpaced) or stall waiting on the wrong signal. Correct DMA pacing lets you stream pixels at high speed without CPU overhead, freeing the CPU to render the next frame or handle input.

Deep Dive into the concept

DMA is essentially a programmable copy engine. You configure a source address, a destination address, a transfer count, and a pacing source. For SPI, the destination is the SPI data register (FIFO), and the source is your framebuffer. The DREQ signal serves as a hardware throttle: it advances the DMA when the SPI FIFO can accept more bytes. This is critical because SPI is serial and slow relative to RAM. If DMA writes too quickly, the FIFO overflows, and pixels are lost or corrupted. If DMA is too conservative, you lose throughput.
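The throttling effect described above can be illustrated with a small host-side toy model. This is a sketch, not RP2350 code: the names `fifo_model`, `fifo_write`, `fifo_drain_one`, and `simulate` are hypothetical, and the FIFO depth of 8 entries is an assumption of the model. The producer tries to write several entries per tick while the serial side drains only one, which mirrors RAM being much faster than SPI.

```c
#include <stdbool.h>
#include <stdint.h>

#define FIFO_DEPTH 8u  /* assumed depth for this toy model */

typedef struct {
    uint32_t level;    /* entries currently queued */
    uint32_t dropped;  /* writes attempted while the FIFO was full */
    uint32_t sent;     /* entries shifted out on the wire */
} fifo_model;

/* Producer side: returns false (and counts a drop) if the FIFO is full. */
bool fifo_write(fifo_model *f) {
    if (f->level == FIFO_DEPTH) {
        f->dropped++;
        return false;
    }
    f->level++;
    return true;
}

/* Consumer side: the serial clock drains one entry per tick. */
void fifo_drain_one(fifo_model *f) {
    if (f->level > 0) {
        f->level--;
        f->sent++;
    }
}

/* Run `ticks` ticks. Each tick the producer attempts `writes_per_tick`
 * writes. When `paced` is true it stops as soon as the FIFO is full and
 * retries on a later tick, which is the role DREQ plays for DMA. */
fifo_model simulate(uint32_t ticks, uint32_t writes_per_tick, bool paced) {
    fifo_model f = {0, 0, 0};
    for (uint32_t t = 0; t < ticks; t++) {
        for (uint32_t w = 0; w < writes_per_tick; w++) {
            if (paced && f.level == FIFO_DEPTH) break;  /* wait for space */
            fifo_write(&f);
        }
        fifo_drain_one(&f);
    }
    return f;
}
```

In this model both runs shift out the same number of entries, because the wire is the bottleneck either way. The difference is that the unpaced run discards data once the FIFO fills, while the paced run loses nothing.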

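A robust DMA setup uses 16-bit transfers for RGB565 or 8-bit for byte-wise streams. The DMA transfer size must match the SPI frame size: a 16-bit write only sends a full pixel if the SPI is configured for 16-bit frames, and an 8-bit stream from little-endian memory sends the low byte of each pixel first, which swaps the color bytes unless the buffer is stored pre-swapped. You also need to consider alignment: keep the source buffer aligned to the transfer size (a uint16_t array already is for 16-bit transfers). The RP2350 DMA controller also supports chaining, which can automatically trigger another DMA channel when the first completes. This is useful for double buffering: one channel sends frame A, then chains to a control channel that reconfigures for frame B. DREQ pacing also means you can maintain deterministic throughput regardless of CPU load, because the DMA engine runs independently. Finally, when DMA completes, you can trigger an interrupt to swap buffers or update a frame counter.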

How this fits on projects

This is the core of Section 3.2 and Section 5.10 Phase 2. It is required for Project 5 (dual-core rendering) and Project 9 (system monitor).

Definitions & key terms

DMA -> Hardware engine for memory transfers without CPU.
DREQ -> DMA request signal from a peripheral.
FIFO -> First-in-first-out buffer in SPI controller.
Channel chaining -> Triggering a DMA channel from another.

Mental model diagram (ASCII)

Framebuffer RAM --> DMA Channel --> SPI FIFO --> LCD
                    ^ DREQ pacing ^

How it works (step-by-step)

Configure DMA with source=framebuffer, destination=SPI FIFO.
Set transfer size to 16-bit (RGB565) or 8-bit (byte stream).
Select SPI DREQ as the pacing signal.
Start DMA; SPI FIFO drains via clock.
DMA triggers interrupt on completion; swap buffers.

Failure modes:

No DREQ -> FIFO overflow.
Wrong transfer size -> color corruption.
Misaligned source -> DMA errors or slow transfers.
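The "wrong transfer size" failure is usually a byte-order problem, and a short sketch makes it concrete. The helper names `rgb565`, `swap16`, and `swap_buffer_bytes` are hypothetical. The sketch assumes the panel expects the high byte of each RGB565 pixel first, while the RP2350 stores a uint16_t with its low byte at the lower address.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack 8-bit channels into RGB565: rrrrrggg gggbbbbb. */
uint16_t rgb565(uint8_t r, uint8_t g, uint8_t b) {
    return (uint16_t)(((r & 0xF8u) << 8) | ((g & 0xFCu) << 3) | (b >> 3));
}

/* Swap the two bytes of one pixel. */
uint16_t swap16(uint16_t px) {
    return (uint16_t)((px << 8) | (px >> 8));
}

/* Pre-swap a whole buffer so that byte-wise (8-bit) streaming from
 * little-endian memory puts the high byte of each pixel on the wire first. */
void swap_buffer_bytes(uint16_t *buf, size_t count) {
    for (size_t i = 0; i < count; i++) {
        buf[i] = swap16(buf[i]);
    }
}
```

With 16-bit DMA transfers and 16-bit SPI frames no swap is needed. With 8-bit transfers, either pre-swap the buffer as above or render pixels already swapped.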

Minimal concrete example

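#include "hardware/dma.h"
#include "hardware/spi.h"

// Assumes spi0 is already initialized and the LCD is in RAM-write mode.
// 16-bit DMA writes need 16-bit SPI frames, or only the low byte is sent.
// CPOL/CPHA here are an assumption; keep the mode from your Project 1 bring-up.
spi_set_format(spi0, 16, SPI_CPOL_0, SPI_CPHA_0, SPI_MSB_FIRST);

int dma_chan = dma_claim_unused_channel(true);     // panic if none is free
dma_channel_config cfg = dma_channel_get_default_config(dma_chan);
channel_config_set_transfer_data_size(&cfg, DMA_SIZE_16);
channel_config_set_dreq(&cfg, DREQ_SPI0_TX);       // pace on SPI0 TX FIFO space
channel_config_set_read_increment(&cfg, true);     // walk through the framebuffer
channel_config_set_write_increment(&cfg, false);   // always write the same FIFO register

dma_channel_configure(dma_chan, &cfg,
  &spi_get_hw(spi0)->dr,  // write addr
  framebuffer,            // read addr
  WIDTH * HEIGHT,         // transfer count in 16-bit units
  true                    // start
);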

Common misconceptions

“DMA just makes things faster automatically.” -> Wrong setup can be slower.
“DREQ is optional.” -> Without it, FIFO overflows.
“DMA copies whole frames in one burst.” -> It is paced by FIFO readiness.

Check-your-understanding questions

What does DREQ do in a DMA transfer to SPI?
Why must the destination not increment?
How do you detect DMA completion?

Check-your-understanding answers

It paces DMA so it writes only when SPI FIFO can accept data.
SPI FIFO is a fixed register; only the source should increment.
Use DMA completion interrupt or poll the channel status.
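Both ways of detecting completion can be sketched with the Pico SDK. This is a hedged example rather than the project's reference code: `dma_chan`, `frame_done`, `dma_completion_irq_init`, and `frame_in_flight` are hypothetical names, and the channel is assumed to be claimed and configured elsewhere and routed to DMA_IRQ_0.

```c
#include <stdbool.h>
#include "hardware/dma.h"
#include "hardware/irq.h"

static int dma_chan;               /* assumed: set by dma_claim_unused_channel() elsewhere */
static volatile bool frame_done;   /* set in the IRQ, cleared by the main loop */

/* Interrupt path: runs when the channel's transfer count reaches zero. */
static void dma_irq0_handler(void) {
    if (dma_channel_get_irq0_status(dma_chan)) {
        dma_channel_acknowledge_irq0(dma_chan);  /* clear the flag or the IRQ re-fires */
        frame_done = true;
    }
}

/* One-time setup: enable the channel's IRQ0 line and install the handler. */
void dma_completion_irq_init(void) {
    dma_channel_set_irq0_enabled(dma_chan, true);
    irq_set_exclusive_handler(DMA_IRQ_0, dma_irq0_handler);
    irq_set_enabled(DMA_IRQ_0, true);
}

/* Polling path: true while the channel still has transfers outstanding. */
bool frame_in_flight(void) {
    return dma_channel_is_busy(dma_chan);
}
```

DMA completion means the last pixel was written into the SPI FIFO, not that it has left the wire. Before releasing chip select or sending a new command, also wait until the SPI peripheral reports idle, for example with `spi_is_busy()`.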

Real-world applications

High-speed display updates in wearables
Streaming data to DACs or audio codecs
Sensor data logging without CPU overhead

Where you’ll apply it

This project: Section 3.2, Section 5.10 Phase 2
Also used in: Project 5, Project 9

References

RP2350 datasheet (DMA controller)
Pico SDK DMA examples

Key insights

DMA is only fast when it is correctly paced.

Summary

DMA plus DREQ turns SPI into a continuous pixel conveyor belt.

Homework/Exercises to practice the concept

Configure a DMA channel to copy a buffer to a dummy memory address.
Measure SPI throughput with and without DMA.
Create a DMA chain for two buffers.

Solutions to the homework/exercises

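Set read increment true and write increment false, and leave the channel unpaced (the SDK default config uses DREQ_FORCE), since plain memory needs no pacing.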
DMA should free CPU and maintain steady throughput.
Use a control channel to reconfigure source address.

2.2 Double Buffering and Tear-Free Updates

Fundamentals

Double buffering uses two framebuffers: one displayed (front buffer) and one being rendered (back buffer). While DMA streams the front buffer to the LCD, the CPU draws the next frame into the back buffer. When DMA completes, buffers swap. This prevents tearing (half-updated frames) and allows smooth animations. Without double buffering, you either stall rendering or risk updating the framebuffer while it’s being transmitted, causing visible artifacts.

Deep Dive into the concept

Tearing occurs when the display is mid-refresh and the framebuffer changes under it. If you push pixels directly from your framebuffer while rendering into it, you can end up with a frame that is half old, half new. Double buffering solves this by separating render and display memory. The key is synchronization: you must swap buffers only when DMA completes. This can be done with an interrupt handler that sets a flag, or with a semaphore if you’re using multicore. You also need to manage memory: a 172x320 RGB565 frame is 110,080 bytes, so two full buffers consume about 220 KB of the RP2350's 520 KB of SRAM. This is feasible but not trivial, so you must plan memory regions carefully.
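The memory budget is easy to check with a few lines of arithmetic. The sketch below assumes the 172x320 RGB565 panel used in this project and the RP2350's 520 KB of on-chip SRAM; the macro and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define LCD_WIDTH   172u
#define LCD_HEIGHT  320u
#define SRAM_BYTES  (520u * 1024u)   /* RP2350 on-chip SRAM */
#define FB_BYTES    (LCD_WIDTH * LCD_HEIGHT * 2u)  /* 2 bytes per RGB565 pixel */

/* Fail the build if two framebuffers would take more than half of SRAM,
 * leaving the rest for stack, heap, and application data. The 50% limit
 * is an arbitrary budget chosen for this sketch. */
_Static_assert(2u * FB_BYTES <= SRAM_BYTES / 2u,
               "double buffering exceeds the framebuffer RAM budget");

/* Bytes needed for one RGB565 framebuffer of the given size. */
size_t framebuffer_bytes(uint32_t width, uint32_t height) {
    return (size_t)width * height * sizeof(uint16_t);
}
```

For this panel one buffer is 110,080 bytes and two are 220,160 bytes, roughly 41% of SRAM.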

A common refinement is “ping-pong DMA,” where DMA alternates between buffers without CPU reconfiguration. Another technique is “triple buffering,” which allows rendering ahead but uses more memory. For this project, double buffering is enough. You must also ensure that buffer swaps are atomic: if the CPU starts drawing into the buffer while DMA is still using it, you will corrupt the display. Use a state flag or lock. Finally, you should measure frame time: if rendering takes longer than a frame’s transmit time, you’ll miss the next update. That’s acceptable, but you should detect and log it.
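The "state flag" idea can be sketched as a small ownership tracker that refuses to swap unless it is safe. This extends the two-pointer structure with two flags; the type `frame_buffers_state` and the `fb_*` functions are hypothetical names. The sketch assumes a single core, with `fb_on_dma_done` called from the DMA completion interrupt and everything else from the main loop.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t *front;         /* owned by DMA while dma_busy is true */
    uint16_t *back;          /* owned by the renderer */
    volatile bool dma_busy;  /* cleared from the DMA completion IRQ */
    bool back_ready;         /* renderer has finished a complete frame */
} frame_buffers_state;

void fb_init(frame_buffers_state *fb, uint16_t *a, uint16_t *b) {
    fb->front = a;
    fb->back = b;
    fb->dma_busy = false;
    fb->back_ready = false;
}

/* Call from the DMA completion interrupt. */
void fb_on_dma_done(frame_buffers_state *fb) { fb->dma_busy = false; }

/* Call from the renderer when the back buffer holds a complete frame. */
void fb_mark_rendered(frame_buffers_state *fb) { fb->back_ready = true; }

/* Swap only when DMA is idle and a finished frame is waiting.
 * Returns the buffer to stream next, or NULL if no swap happened.
 * The caller must start DMA on the returned buffer. */
uint16_t *fb_try_swap(frame_buffers_state *fb) {
    if (fb->dma_busy || !fb->back_ready) return NULL;
    uint16_t *tmp = fb->front;
    fb->front = fb->back;
    fb->back = tmp;
    fb->back_ready = false;
    fb->dma_busy = true;
    return fb->front;
}
```

A NULL return while `back_ready` is set means rendering finished before the transfer did. A NULL return while DMA is idle means the renderer missed the slot, which is the case to count and log. With two cores, the flags would need a proper synchronization primitive rather than plain variables.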

How this fits on projects

Double buffering is central in Section 3.2 and Section 5.10 Phase 2. It is required for Project 5 (dual-core rendering) and Project 11 (game loop stability).

Definitions & key terms

Front buffer -> Buffer currently being displayed.
Back buffer -> Buffer being rendered.
Tearing -> Visual artifact from partial updates.
Swap -> Exchange front and back buffer pointers.

Mental model diagram (ASCII)

CPU renders -> [Back Buffer]
DMA streams -> [Front Buffer]
        swap at frame boundary

How it works (step-by-step)

DMA streams front buffer to LCD.
CPU renders next frame into back buffer.
DMA completion interrupt fires.
Swap front/back pointers.
Start DMA on new front buffer.

Failure modes:

Swap too early -> tearing.
Swap too late -> dropped frames.
Insufficient RAM -> crashes.

Minimal concrete example

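volatile bool dma_done = false;

// Assumes dma_chan is routed to DMA_IRQ_0 and this handler is registered for it.
void dma_irq_handler(void) {
  dma_channel_acknowledge_irq0(dma_chan);  // clear the flag, or the IRQ fires again at once
  dma_done = true;
}

// Main loop: swap only after DMA has finished AND the back buffer is fully rendered.
// back_buffer_ready is a hypothetical flag set by the renderer.
if (dma_done && back_buffer_ready) {
  dma_done = false;
  back_buffer_ready = false;
  swap_buffers(&front, &back);
  start_dma(front);
}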

Common misconceptions

“Double buffering always doubles FPS.” -> It prevents tearing; it does not make rendering itself any faster.
“Swap anytime.” -> Only swap on DMA completion.

Check-your-understanding questions

Why does double buffering reduce tearing?
What happens if rendering is slower than DMA?
How do you ensure a safe swap?

Check-your-understanding answers

It prevents the display from reading a buffer being modified.
Frames are skipped; display shows older frame longer.
Swap only after DMA completes, guarded by an atomic flag or lock.

Real-world applications

Video playback on embedded displays
Game rendering on microcontrollers
UI frameworks with smooth animations

Where you’ll apply it

This project: Section 3.2, Section 5.10 Phase 2
Also used in: Project 5, Project 11

References

Game loop literature on buffering
Pico SDK DMA examples

Key insights

Double buffering decouples rendering from display transfer.

Summary

Use two buffers and swap at DMA completion to keep frames clean.

Homework/Exercises to practice the concept

Measure RAM usage with one vs two buffers.
Force a mid-frame buffer swap and observe tearing.
Implement a swap counter and log skipped frames.

Solutions to the homework/exercises

Two buffers use ~220 KB for 172x320 RGB565.
You should see a split frame with mismatched content.
Increment counter when render time > DMA time.
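The third solution can be sketched as a small counter structure. The names `swap_stats` and `swap_stats_record` are hypothetical, and the durations are assumed to be measured in microseconds by whatever timer you use.

```c
#include <stdint.h>

typedef struct {
    uint32_t frames;           /* frames rendered so far */
    uint32_t skipped;          /* frames whose render overran the transfer */
    uint32_t worst_render_us;  /* longest render time seen */
} swap_stats;

/* Record one frame: a frame counts as skipped when rendering took longer
 * than the DMA transfer, so the display had to show the old frame again. */
void swap_stats_record(swap_stats *s, uint32_t render_us, uint32_t dma_us) {
    s->frames++;
    if (render_us > dma_us) s->skipped++;
    if (render_us > s->worst_render_us) s->worst_render_us = render_us;
}
```

Tracking the worst render time alongside the skip count helps tell an occasional spike apart from a renderer that is consistently too slow.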

3. Project Specification

3.1 What You Will Build

A DMA-driven display pipeline that streams RGB565 frames to the ST7789 LCD with minimal CPU overhead. The project includes double buffering and frame timing diagnostics.

3.2 Functional Requirements

DMA channel setup with SPI DREQ pacing.
Framebuffer streaming in RGB565.
Double buffering with safe swap.
Frame timing metrics logged over serial.

3.3 Non-Functional Requirements

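Performance: Achieve at least 30 FPS full-screen updates. A 172x320 RGB565 frame is 880,640 bits, so 20 MHz SPI tops out near 22.7 FPS; 30 FPS needs an SPI clock of about 26.4 MHz or higher.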
Reliability: No tearing under stress.
Usability: Clear buffer swap API.

3.4 Example Usage / Output

DMA FPS: 31
Frame time: 32 ms
Dropped frames: 0
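One way to produce a report in this shape is to count frames over a measurement window and format the result. The function `format_report` is a hypothetical helper. It assumes you supply the frame count, the elapsed time in microseconds (on the Pico this could come from `time_us_64()`), and a dropped-frame count.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format the three status lines into `out`.
 * Returns the snprintf result, or -1 if there is nothing to report. */
int format_report(char *out, size_t n, uint32_t frames,
                  uint64_t elapsed_us, uint32_t dropped) {
    if (frames == 0 || elapsed_us == 0) return -1;
    /* Integer math: both values are truncated, not rounded. */
    uint32_t fps = (uint32_t)(((uint64_t)frames * 1000000u) / elapsed_us);
    uint32_t frame_ms = (uint32_t)(elapsed_us / frames / 1000u);
    return snprintf(out, n,
                    "DMA FPS: %" PRIu32 "\nFrame time: %" PRIu32
                    " ms\nDropped frames: %" PRIu32 "\n",
                    fps, frame_ms, dropped);
}
```

Averaging over a window of about one second keeps the numbers steady; computing FPS from a single frame interval tends to jitter.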

3.5 Data Formats / Schemas / Protocols

Framebuffer: RGB565 array, 172x320
DMA transfer size: 16-bit elements

3.6 Edge Cases

DMA completion interrupt missed
Frame render time exceeds transfer time
FIFO overflow due to wrong DREQ

3.7 Real World Outcome

The display updates smoothly while CPU load stays low. A status overlay shows FPS and dropped frames.

3.7.1 How to Run (Copy/Paste)

cd LEARN_RP2350_LCD_DEEP_DIVE/dma_display
mkdir -p build
cd build
cmake ..
make -j4
cp dma_display.uf2 /Volumes/RP2350

3.7.2 Golden Path Demo (Deterministic)

Render a moving gradient at 30 FPS.
FPS counter remains between 29-31.
CPU usage shows at least 50% idle.

3.7.3 Failure Demo (Deterministic)

Disable DREQ pacing.
Expected: corrupted frames or SPI FIFO overflow.
Fix: restore DREQ configuration.

4. Solution Architecture

4.1 High-Level Design

[Renderer] -> [Back Buffer] --swap--> [Front Buffer] -> [DMA+SPI] -> LCD

4.2 Key Components

4.3 Data Structures (No Full Code)

struct frame_buffers {
  uint16_t *front;
  uint16_t *back;
};

4.4 Algorithm Overview

Key Algorithm: Frame Swap

Start DMA on front buffer.
Render into back buffer.
Wait for DMA done.
Swap and restart.

Complexity Analysis:

Time: O(pixels) per frame render
Space: O(2 * framebuffer)

5. Implementation Guide

5.1 Development Environment Setup

# Use pico-sdk + DMA examples

5.2 Project Structure

dma_display/
- src/
  - dma_display.c
  - buffer.c
  - main.c
- README.md

5.3 The Core Question You’re Answering

“How do I stream full frames without tying up the CPU?”

5.4 Concepts You Must Understand First

DMA channel configuration and DREQ
SPI FIFO behavior
Double buffering synchronization

5.5 Questions to Guide Your Design

How will you detect DMA completion?
What transfer size is safest for RGB565?
How will you measure FPS reliably?

5.6 Thinking Exercise

Compute theoretical max FPS at 20 MHz SPI for 110 KB frames.
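To check your answer, the calculation can be written as a few helper functions. The names are hypothetical, and the result is an upper bound only: it ignores command bytes, chip-select gaps, and any idle time between frames.

```c
#include <stdint.h>

/* Bits on the wire for one full frame. */
uint32_t frame_bits(uint32_t width, uint32_t height, uint32_t bits_per_pixel) {
    return width * height * bits_per_pixel;
}

/* Upper bound on frame rate in milli-FPS (1000 = 1 FPS),
 * assuming one bit per SPI clock and no gaps. */
uint32_t max_fps_milli(uint32_t spi_hz, uint32_t bits) {
    return (uint32_t)(((uint64_t)spi_hz * 1000u) / bits);
}

/* Minimum SPI clock in Hz needed to sustain a target frame rate. */
uint32_t min_spi_hz(uint32_t target_fps, uint32_t bits) {
    return target_fps * bits;
}
```

For 172x320 at 16 bits per pixel the frame is 880,640 bits. At 20 MHz that takes about 44 ms per frame, or roughly 22.7 FPS, and sustaining 30 FPS needs at least 26,419,200 Hz.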

5.7 The Interview Questions They’ll Ask

What is DMA and why is it useful?
What does DREQ do?
How does double buffering prevent tearing?

5.8 Hints in Layers

Hint 1: Start with a small buffer and verify DMA writes.
Hint 2: Add DREQ pacing and verify FIFO stability.
Hint 3: Add double buffering and swap on DMA IRQ.

5.9 Books That Will Help

| Topic | Book | Chapter |
|-------|------|---------|
| DMA fundamentals | “Making Embedded Systems” | Ch. 9 |
| Performance tuning | “Computer Architecture” | Ch. 1 |

5.10 Implementation Phases

Phase 1: DMA Bring-up (2-3 days)

Goals: Send a buffer via DMA.
Tasks: Configure DMA channel and DREQ.
Checkpoint: Solid color fill works.

Phase 2: Double Buffer (3-4 days)

Goals: Render and transmit concurrently.
Tasks: Implement buffer swap logic.
Checkpoint: No tearing in animations.

Phase 3: Metrics (2-3 days)

Goals: Measure FPS and CPU idle time.
Tasks: Add counters and logging.
Checkpoint: Stable FPS counter on LCD.

5.11 Key Implementation Decisions

6. Testing Strategy

6.1 Test Categories

6.2 Critical Test Cases

Solid Color DMA: all pixels identical, no corruption.
Gradient DMA: verify correct ordering.
Stress Loop: 1000 consecutive frames.

6.3 Test Data

Frame A: red gradient
Frame B: blue gradient
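The two test frames can be generated and checked on the host before they ever reach the LCD. This is a sketch with hypothetical helper names. It assumes a horizontal gradient in a single 5-bit channel, with shift 11 selecting red and shift 0 selecting blue in RGB565.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fill a w x h buffer with a left-to-right gradient in one 5-bit channel.
 * shift = 11 gives a red gradient, shift = 0 a blue one. Requires w >= 2. */
void fill_gradient(uint16_t *fb, uint32_t w, uint32_t h, unsigned shift) {
    for (uint32_t y = 0; y < h; y++) {
        for (uint32_t x = 0; x < w; x++) {
            uint16_t level = (uint16_t)((x * 31u) / (w - 1u));  /* 0..31 */
            fb[y * w + x] = (uint16_t)(level << shift);
        }
    }
}

/* Check ordering: every row must be non-decreasing left to right and
 * identical to the first row. A misordered buffer fails this check. */
bool gradient_is_ordered(const uint16_t *fb, uint32_t w, uint32_t h) {
    for (uint32_t y = 0; y < h; y++) {
        for (uint32_t x = 0; x < w; x++) {
            if (x > 0 && fb[y * w + x] < fb[y * w + x - 1]) return false;
            if (fb[y * w + x] != fb[x]) return false;
        }
    }
    return true;
}
```

This only validates the buffer contents in RAM. Confirming that the same ordering arrives at the panel still needs a visual check or a logic analyzer capture.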

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

7.2 Debugging Strategies

Use SPI FIFO level registers to verify pacing.
Toggle a GPIO on DMA completion to measure timing.

7.3 Performance Traps

Using CPU to fill buffers while DMA runs can still be a bottleneck; optimize rendering.

8. Extensions & Challenges

8.1 Beginner Extensions

Add a DMA completion LED blink.
Log frame time over UART.

8.2 Intermediate Extensions

Implement triple buffering.
Use DMA chaining for automatic swaps.

8.3 Advanced Extensions

Use PIO to drive SPI-like LCD in parallel.
Implement scanline DMA updates.

9. Real-World Connections

9.1 Industry Applications

Wearable displays with smooth UI
Portable instruments with continuous data rendering

9.2 Related Open Source Projects

Pico SDK DMA examples
LVGL display drivers using DMA

9.3 Interview Relevance

DMA and buffering are common embedded optimization topics.

10. Resources

10.1 Essential Reading

RP2350 datasheet DMA chapter
ST7789 data transfer timing

10.2 Video Resources

DMA walkthroughs for microcontrollers

10.3 Tools & Documentation

Logic analyzer for SPI throughput verification

10.4 Related Projects in This Series

Project 5 uses DMA + multicore.

11. Self-Assessment Checklist

11.1 Understanding

I can configure DMA with DREQ pacing.
I can explain why double buffering prevents tearing.

11.2 Implementation

DMA streams full frames without corruption.
FPS is measured and stable.

11.3 Growth

I can explain DMA setup in an interview.

12. Submission / Completion Criteria

Minimum Viable Completion:

DMA transfer works with DREQ.
One full frame renders without CPU-driven SPI.

Full Completion:

Double buffering and FPS counter working.

Excellence (Going Above & Beyond):

DMA chaining or triple buffering implemented.