Project 3: DMA Display Driver - Zero-CPU Frame Streaming
Offload pixel transfers to DMA so the LCD updates while the CPU renders the next frame.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 3: Advanced |
| Time Estimate | 1-2 weeks |
| Main Programming Language | C (pico-sdk) |
| Alternative Programming Languages | Rust |
| Coolness Level | Level 4: Hardcore |
| Business Potential | 3. The “Product” Level |
| Prerequisites | Project 1 SPI LCD bring-up, Project 2 framebuffer basics |
| Key Topics | DMA, SPI FIFO, DREQ pacing, double buffering |
1. Learning Objectives
By completing this project, you will:
- Configure RP2350 DMA channels to stream RGB565 frames to SPI.
- Use DREQ to pace DMA transfers to the SPI FIFO.
- Implement double buffering to render and transmit concurrently.
- Measure frame time improvements versus CPU-driven SPI.
- Build a tear-free update pipeline with deterministic timing.
2. All Theory Needed (Per-Concept Breakdown)
2.1 DMA Engines and DREQ Pacing
Fundamentals
DMA (Direct Memory Access) lets a dedicated hardware engine move data between memory and a peripheral's I/O register without CPU intervention. On the RP2350, DMA channels can be configured to read from RAM and write to the SPI FIFO. A key concept is DREQ (DMA request): a hardware signal that tells the DMA channel when the peripheral is ready for more data. Without DREQ pacing, a DMA channel writes as fast as the bus allows and can overflow the FIFO. Correct pacing lets you stream pixels at the full SPI rate without CPU overhead, freeing the CPU to render the next frame or handle input.
Deep Dive into the concept
DMA is essentially a programmable copy engine. You configure a source address, a destination address, a transfer count, and a pacing source. For SPI, the destination is the SPI data register (FIFO), and the source is your framebuffer. The DREQ signal serves as a hardware throttle: it advances the DMA when the SPI FIFO can accept more bytes. This is critical because SPI is serial and slow relative to RAM. If DMA writes too quickly, the FIFO overflows, and pixels are lost or corrupted. If DMA is too conservative, you lose throughput.
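To make the overflow argument concrete, here is a small host-side model rather than RP2350 code. It assumes an 8-entry FIFO and a producer that can attempt one write per tick; `simulate_fifo`, `FIFO_DEPTH`, and the tick model are illustrative simplifications, and the real DREQ handshake is more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FIFO_DEPTH 8u  /* assumed depth for this model; check the datasheet */

/* Simulate `ticks` bus cycles. The producer (DMA) may attempt one write per
 * tick; the consumer (SPI shifter) removes one entry every `drain_period`
 * ticks. With `paced` set, the producer writes only when the FIFO has room,
 * which is the rule DREQ enforces. Returns the number of writes lost. */
uint32_t simulate_fifo(uint32_t ticks, uint32_t drain_period, bool paced)
{
    assert(drain_period > 0);
    uint32_t level = 0;
    uint32_t lost = 0;
    for (uint32_t t = 0; t < ticks; t++) {
        if (level < FIFO_DEPTH) {
            level++;   /* write accepted */
        } else if (!paced) {
            lost++;    /* unpaced write hits a full FIFO: pixel lost */
        }              /* paced: the DMA simply waits for the next request */
        if ((t + 1) % drain_period == 0 && level > 0) {
            level--;   /* SPI finished shifting one entry out */
        }
    }
    return lost;
}
```

With a drain period of 16 ticks, the unpaced run loses writes as soon as the FIFO fills, while the paced run loses none. Throughput in the paced case is set entirely by the drain rate, which is the point of the conveyor-belt picture.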
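A robust DMA setup uses 16-bit transfers for RGB565 or 8-bit transfers for byte-wise streams. The DMA transfer size has to match the SPI frame size: 16-bit DMA writes work as intended only when the SPI block is also configured for 16-bit frames, and an 8-bit stream from little-endian memory sends the low byte of each pixel first unless the pixels are byte-swapped beforehand. You also need to consider alignment: keep the source buffer aligned to the transfer size (2-byte alignment for 16-bit transfers). The RP2350 DMA controller also supports chaining, which can automatically trigger another DMA channel when the first completes. This is useful for double buffering: one channel sends frame A, then chains to a control channel that reconfigures for frame B. DREQ pacing also means you can maintain deterministic throughput regardless of CPU load, because the DMA engine runs independently. Finally, when DMA completes, you can trigger an interrupt to swap buffers or update a frame counter.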
How this fits on projects
This is the core of Section 3.2 and Section 5.10 Phases 1 and 2. It is required for Project 5 (dual-core rendering) and Project 9 (system monitor).
Definitions & key terms
- DMA -> Hardware engine for memory transfers without CPU.
- DREQ -> DMA request signal from a peripheral.
- FIFO -> First-in-first-out buffer in SPI controller.
- Channel chaining -> Triggering a DMA channel from another.
Mental model diagram (ASCII)
Framebuffer RAM --> DMA Channel --> SPI TX FIFO --> LCD
                        ^                |
                        +-- DREQ pacing -+
How it works (step-by-step)
- Configure DMA with source=framebuffer, destination=SPI FIFO.
- Set transfer size to 16-bit (RGB565) or 8-bit (byte stream).
- Select SPI DREQ as the pacing signal.
- Start DMA; SPI FIFO drains via clock.
- DMA triggers interrupt on completion; swap buffers.
Failure modes:
- No DREQ -> FIFO overflow.
- Wrong transfer size -> color corruption.
- Misaligned source -> DMA errors or slow transfers.
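The "wrong transfer size" failure is mostly a byte-order problem: the ST7789 expects the high byte of each RGB565 pixel first, while the RP2350 stores `uint16_t` values low byte first. The sketch below is host-side C with hypothetical helper names (`rgb565`, `swap16`, `swap_buffer_bytes`). It shows the packing and the byte swap you would need if you chose 8-bit transfers instead of 16-bit SPI frames.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack 8-bit-per-channel color into RGB565: RRRRRGGG GGGBBBBB. */
uint16_t rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r & 0xF8u) << 8) | ((g & 0xFCu) << 3) | (b >> 3));
}

/* Swap the two bytes of one pixel so the high byte sits first in memory. */
uint16_t swap16(uint16_t px)
{
    return (uint16_t)((px << 8) | (px >> 8));
}

/* Pre-swap a whole buffer in place before streaming it with 8-bit transfers. */
void swap_buffer_bytes(uint16_t *buf, size_t count)
{
    assert(buf != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        buf[i] = swap16(buf[i]);
    }
}
```

Pure red packs to `0xF800`. Sent as two bytes from little-endian memory without a swap, the panel would receive `0x00` then `0xF8` and interpret it as `0x00F8`, which is a green-blue mix. With 16-bit SPI frames and 16-bit DMA transfers the swap is unnecessary, at the cost of requiring the SPI format to match.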
Minimal concrete example
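// Assumes spi0 is already initialized and dma_chan was claimed,
// e.g. with dma_claim_unused_channel(true).
// 16-bit DMA writes need 16-bit SPI frames; with 8-bit frames only the low byte is sent.
// Mode 0 is an assumption here; use the mode from your Project 1 bring-up.
spi_set_format(spi0, 16, SPI_CPOL_0, SPI_CPHA_0, SPI_MSB_FIRST);

channel_config cfg = dma_channel_get_default_config(dma_chan);
channel_config_set_transfer_data_size(&cfg, DMA_SIZE_16);
channel_config_set_dreq(&cfg, DREQ_SPI0_TX);      // pace on SPI0 TX FIFO space
channel_config_set_read_increment(&cfg, true);    // walk through the framebuffer
channel_config_set_write_increment(&cfg, false);  // always write the same data register
dma_channel_configure(dma_chan, &cfg,
    &spi_get_hw(spi0)->dr,  // write addr
    framebuffer,            // read addr
    WIDTH * HEIGHT,         // transfer count in 16-bit units, not bytes
    true                    // start
);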
Common misconceptions
- “DMA just makes things faster automatically.” -> Wrong setup can be slower.
- “DREQ is optional.” -> Without it, FIFO overflows.
- “DMA copies whole frames in one burst.” -> It is paced by FIFO readiness.
Check-your-understanding questions
- What does DREQ do in a DMA transfer to SPI?
- Why must the destination not increment?
- How do you detect DMA completion?
Check-your-understanding answers
- It paces DMA so it writes only when SPI FIFO can accept data.
- SPI FIFO is a fixed register; only the source should increment.
- Use DMA completion interrupt or poll the channel status.
Real-world applications
- High-speed display updates in wearables
- Streaming data to DACs or audio codecs
- Sensor data logging without CPU overhead
Where you’ll apply it
- This project: Section 3.2, Section 5.10 Phases 1 and 2
- Also used in: Project 5, Project 9
References
- RP2350 datasheet (DMA controller)
- Pico SDK DMA examples
Key insights
DMA is only fast when it is correctly paced.
Summary
DMA plus DREQ turns SPI into a continuous pixel conveyor belt.
Homework/Exercises to practice the concept
- Configure a DMA channel to copy a buffer to a dummy memory address.
- Measure SPI throughput with and without DMA.
- Create a DMA chain for two buffers.
Solutions to the homework/exercises
- Set read increment true, write increment false, and leave the channel unpaced (no DREQ), since memory is always ready.
- DMA should free CPU and maintain steady throughput.
- Use a control channel to reconfigure source address.
2.2 Double Buffering and Tear-Free Updates
Fundamentals
Double buffering uses two framebuffers: one displayed (front buffer) and one being rendered (back buffer). While DMA streams the front buffer to the LCD, the CPU draws the next frame into the back buffer. When DMA completes, buffers swap. This prevents tearing (half-updated frames) and allows smooth animations. Without double buffering, you either stall rendering or risk updating the framebuffer while it’s being transmitted, causing visible artifacts.
Deep Dive into the concept
Tearing occurs when the display is mid-refresh and the framebuffer changes under it. If you push pixels directly from your framebuffer while rendering into it, you can end up with a frame that is half old, half new. Double buffering solves this by separating render and display memory. The key is synchronization: swap buffers only when the DMA transfer has completed and the back buffer is fully rendered. This can be done with an interrupt handler that sets a flag, or with a semaphore if you’re using multicore. You also need to manage memory: a 172x320 RGB565 frame is 110,080 bytes, so two full buffers consume about 220 KB. On the RP2350 this is feasible but not trivial, so you must plan memory regions carefully.
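The memory figures can be checked with a few lines of arithmetic. This is a host-side sketch; `frame_bytes` and `sram_percent` are hypothetical helpers, and the 520 KB SRAM total is an assumption you should confirm against the datasheet for your part.

```c
#include <assert.h>
#include <stdint.h>

#define LCD_WIDTH        172u
#define LCD_HEIGHT       320u
#define BYTES_PER_PIXEL  2u               /* RGB565 */
#define SRAM_BYTES       (520u * 1024u)   /* assumed total on-chip SRAM */

/* Size of one full framebuffer in bytes. */
uint32_t frame_bytes(uint32_t w, uint32_t h)
{
    assert(w > 0 && h > 0);
    return w * h * BYTES_PER_PIXEL;
}

/* Percentage of SRAM consumed by n full framebuffers, rounded down. */
uint32_t sram_percent(uint32_t n_buffers)
{
    return (n_buffers * frame_bytes(LCD_WIDTH, LCD_HEIGHT) * 100u) / SRAM_BYTES;
}
```

Under these assumptions one buffer is 110,080 bytes, two buffers take roughly 41% of SRAM, and a third buffer would push that to about 62%, which is why triple buffering is left as an extension.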
A common refinement is “ping-pong DMA,” where DMA alternates between buffers without CPU reconfiguration. Another technique is “triple buffering,” which allows rendering ahead but uses more memory. For this project, double buffering is enough. You must also ensure that buffer swaps are atomic: if the CPU starts drawing into the buffer while DMA is still using it, you will corrupt the display. Use a state flag or lock. Finally, you should measure frame time: if rendering takes longer than a frame’s transmit time, you’ll miss the next update. That’s acceptable, but you should detect and log it.
How this fits on projects
Double buffering is central in Section 3.2 and Section 5.10 Phase 2. It is required for Project 5 (dual-core rendering) and Project 11 (game loop stability).
Definitions & key terms
- Front buffer -> Buffer currently being displayed.
- Back buffer -> Buffer being rendered.
- Tearing -> Visual artifact from partial updates.
- Swap -> Exchange front and back buffer pointers.
Mental model diagram (ASCII)
CPU renders -> [Back Buffer]
DMA streams -> [Front Buffer]
swap at frame boundary
How it works (step-by-step)
- DMA streams front buffer to LCD.
- CPU renders next frame into back buffer.
- DMA completion interrupt fires.
- Swap front/back pointers.
- Start DMA on new front buffer.
Failure modes:
- Swap too early -> tearing.
- Swap too late -> dropped frames.
- Insufficient RAM -> crashes.
Minimal concrete example
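volatile bool dma_done = false;

void dma_irq_handler(void) {
    // Acknowledge the channel, or the IRQ fires again immediately.
    // Assumes dma_chan is routed to DMA_IRQ_0.
    dma_hw->ints0 = 1u << dma_chan;
    dma_done = true;
}

// In the main loop, once the back buffer is fully rendered:
if (dma_done) {
    dma_done = false;
    swap_buffers(&front, &back);
    start_dma(front);
}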
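The flag-based swap can also be modeled on the host to test the ordering rules before touching hardware. The sketch below is one possible shape, not the project's required API: `struct pipeline`, `pipeline_poll`, and the `frames_late` counter are hypothetical names. It swaps only when both the transfer and the render are finished, and counts each late frame once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct pipeline {
    volatile bool dma_done;  /* set by the DMA completion handler */
    bool render_done;        /* set by the renderer when the back buffer is complete */
    bool late_counted;       /* guards against counting one late frame many times */
    uint16_t *front;         /* buffer being streamed to the LCD */
    uint16_t *back;          /* buffer being rendered into */
    uint32_t frames_shown;   /* swaps performed */
    uint32_t frames_late;    /* transfers that finished before the next frame was ready */
};

/* Call from the main loop. Returns true when a swap happened and the caller
 * should start a new transfer from p->front. */
bool pipeline_poll(struct pipeline *p)
{
    assert(p != NULL);
    if (!p->dma_done) {
        return false;              /* front buffer still being streamed */
    }
    if (!p->render_done) {
        if (!p->late_counted) {    /* count each late frame once, not once per poll */
            p->frames_late++;
            p->late_counted = true;
        }
        return false;              /* display keeps showing the old frame */
    }
    uint16_t *tmp = p->front;
    p->front = p->back;
    p->back = tmp;
    p->dma_done = false;
    p->render_done = false;
    p->late_counted = false;
    p->frames_shown++;
    return true;
}
```

On the device, the completion handler would set `dma_done`, the renderer would set `render_done`, and a `true` return is the cue to retrigger the DMA channel on the new front buffer.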
Common misconceptions
- “Double buffering always doubles FPS.” -> It prevents tearing and overlaps rendering with transfer; it does not make rendering itself faster.
- “Swap anytime.” -> Only swap on DMA completion.
Check-your-understanding questions
- Why does double buffering reduce tearing?
- What happens if rendering is slower than DMA?
- How do you ensure a safe swap?
Check-your-understanding answers
- It prevents the display from reading a buffer being modified.
- Frames are skipped; display shows older frame longer.
- Swap only after DMA completion has been signaled through a volatile or atomic flag, and only once the back buffer is fully rendered.
Real-world applications
- Video playback on embedded displays
- Game rendering on microcontrollers
- UI frameworks with smooth animations
Where you’ll apply it
- This project: Section 3.2, Section 5.10 Phase 2
- Also used in: Project 5, Project 11
References
- Game loop literature on buffering
- Pico SDK DMA examples
Key insights
Double buffering decouples rendering from display transfer.
Summary
Use two buffers and swap at DMA completion to keep frames clean.
Homework/Exercises to practice the concept
- Measure RAM usage with one vs two buffers.
- Force a mid-frame buffer swap and observe tearing.
- Implement a swap counter and log skipped frames.
Solutions to the homework/exercises
- Two buffers use ~220 KB for 172x320 RGB565.
- You should see a split frame with mismatched content.
- Increment counter when render time > DMA time.
3. Project Specification
3.1 What You Will Build
A DMA-driven display pipeline that streams RGB565 frames to the ST7789 LCD with minimal CPU overhead. The project includes double buffering and frame timing diagnostics.
3.2 Functional Requirements
- DMA channel setup with SPI DREQ pacing.
- Framebuffer streaming in RGB565.
- Double buffering with safe swap.
- Frame timing metrics logged over serial.
3.3 Non-Functional Requirements
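- Performance: Achieve at least 30 FPS full-screen updates. A 172x320 RGB565 frame is 880,640 bits, so 20 MHz SPI tops out near 22.7 FPS; 30 FPS needs an SPI clock of about 26.4 MHz or higher.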
- Reliability: No tearing under stress.
- Usability: Clear buffer swap API.
3.4 Example Usage / Output
DMA FPS: 31
Frame time: 32 ms
Dropped frames: 0
3.5 Data Formats / Schemas / Protocols
- Framebuffer: RGB565 array, 172x320
- DMA transfer size: 16-bit elements
3.6 Edge Cases
- DMA completion interrupt missed
- Frame render time exceeds transfer time
- FIFO overflow due to wrong DREQ
3.7 Real World Outcome
The display updates smoothly while CPU load stays low. A status overlay shows FPS and dropped frames.
3.7.1 How to Run (Copy/Paste)
cd LEARN_RP2350_LCD_DEEP_DIVE/dma_display
mkdir -p build
cd build
cmake -DPICO_BOARD=pico2 ..   # assumes a Pico 2-compatible RP2350 board; change PICO_BOARD to match yours
make -j4
cp dma_display.uf2 /Volumes/RP2350
3.7.2 Golden Path Demo (Deterministic)
- Render a moving gradient at 30 FPS.
- FPS counter remains between 29 and 31.
- CPU usage shows at least 50% idle.
3.7.3 Failure Demo (Deterministic)
- Disable DREQ pacing.
- Expected: corrupted frames or SPI FIFO overflow.
- Fix: restore DREQ configuration.
4. Solution Architecture
4.1 High-Level Design
[Renderer] -> [Back Buffer] --swap--> [Front Buffer] -> [DMA+SPI] -> LCD
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| DMA config | Setup channel and DREQ | Use 16-bit transfers |
| Buffer manager | Swap front/back buffers | Swap on DMA completion |
| Metrics | Track FPS and dropped frames | Serial logging |
4.3 Data Structures (No Full Code)
struct frame_buffers {
uint16_t *front;
uint16_t *back;
};
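One way to back this struct is with two statically allocated buffers and a pointer swap. This is a sketch under assumptions: `frame_buffers_init`, `frame_buffers_swap`, `buffer_a`, and `buffer_b` are hypothetical names, and the dimensions are the 172x320 panel from Section 3.5.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LCD_WIDTH  172
#define LCD_HEIGHT 320

struct frame_buffers {
    uint16_t *front;  /* owned by DMA while a transfer is in flight */
    uint16_t *back;   /* owned by the renderer */
};

/* Static allocation lets the linker report an out-of-RAM condition at build
 * time instead of failing at runtime. */
static uint16_t buffer_a[LCD_WIDTH * LCD_HEIGHT];
static uint16_t buffer_b[LCD_WIDTH * LCD_HEIGHT];

void frame_buffers_init(struct frame_buffers *fb)
{
    assert(fb != NULL);
    fb->front = buffer_a;
    fb->back = buffer_b;
}

/* Exchange roles. Call only when DMA is idle and the back buffer is complete. */
void frame_buffers_swap(struct frame_buffers *fb)
{
    assert(fb != NULL);
    uint16_t *tmp = fb->front;
    fb->front = fb->back;
    fb->back = tmp;
}
```

Swapping pointers rather than copying pixels keeps the swap cost constant regardless of frame size.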
4.4 Algorithm Overview
Key Algorithm: Frame Swap
- Start DMA on front buffer.
- Render into back buffer.
- Wait for DMA done.
- Swap and restart.
Complexity Analysis:
- Time: O(pixels) per frame render
- Space: O(2 * framebuffer)
5. Implementation Guide
5.1 Development Environment Setup
# Use pico-sdk + DMA examples
5.2 Project Structure
dma_display/
- src/
- dma_display.c
- buffer.c
- main.c
- README.md
5.3 The Core Question You’re Answering
“How do I stream full frames without tying up the CPU?”
5.4 Concepts You Must Understand First
- DMA channel configuration and DREQ
- SPI FIFO behavior
- Double buffering synchronization
5.5 Questions to Guide Your Design
- How will you detect DMA completion?
- What transfer size is safest for RGB565?
- How will you measure FPS reliably?
5.6 Thinking Exercise
Compute theoretical max FPS at 20 MHz SPI for 110 KB frames.
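A minimal way to check your answer is to divide the SPI bit rate by the bits per frame. The helpers below (`max_fps_x100`, `min_spi_hz`) are hypothetical names for this exercise. They give an upper bound only: command bytes, address-window setup, and any gaps between SPI frames are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound on full-frame updates per second, scaled by 100
 * (2271 means 22.71 FPS). */
uint32_t max_fps_x100(uint32_t spi_hz, uint32_t width, uint32_t height,
                      uint32_t bits_per_pixel)
{
    uint64_t bits_per_frame = (uint64_t)width * height * bits_per_pixel;
    assert(bits_per_frame > 0);
    return (uint32_t)(((uint64_t)spi_hz * 100u) / bits_per_frame);
}

/* Smallest SPI clock in Hz that can carry target_fps full frames per second. */
uint64_t min_spi_hz(uint32_t target_fps, uint32_t width, uint32_t height,
                    uint32_t bits_per_pixel)
{
    return (uint64_t)target_fps * width * height * bits_per_pixel;
}
```

For a 172x320 RGB565 frame, `max_fps_x100(20000000, 172, 320, 16)` evaluates to 2271, i.e. about 22.7 FPS, and `min_spi_hz(30, 172, 320, 16)` evaluates to 26,419,200 Hz. Compare those numbers with your frame-rate target before choosing an SPI clock.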
5.7 The Interview Questions They’ll Ask
- What is DMA and why is it useful?
- What does DREQ do?
- How does double buffering prevent tearing?
5.8 Hints in Layers
- Hint 1: Start with a small buffer and verify DMA writes.
- Hint 2: Add DREQ pacing and verify FIFO stability.
- Hint 3: Add double buffering and swap on DMA IRQ.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| DMA fundamentals | “Making Embedded Systems” | Ch. 9 |
| Performance tuning | “Computer Architecture” | Ch. 1 |
5.10 Implementation Phases
Phase 1: DMA Bring-up (2-3 days)
- Goals: Send a buffer via DMA.
- Tasks: Configure DMA channel and DREQ.
- Checkpoint: Solid color fill works.
Phase 2: Double Buffer (3-4 days)
- Goals: Render and transmit concurrently.
- Tasks: Implement buffer swap logic.
- Checkpoint: No tearing in animations.
Phase 3: Metrics (2-3 days)
- Goals: Measure FPS and CPU idle time.
- Tasks: Add counters and logging.
- Checkpoint: Stable FPS counter on LCD.
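For the Phase 3 counters, one option is to count completed frames over a one-second window and publish the result when the window closes. The sketch below is host-testable; `struct frame_stats` and `frame_stats_tick` are hypothetical names, and on the device the timestamp would come from a monotonic microsecond timer (the pico-sdk's `time_us_64()` is the assumed source).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct frame_stats {
    uint64_t window_start_us;  /* timestamp at which the current window began */
    uint32_t frames;           /* frames completed in the current window */
    uint32_t fps;              /* last published value, whole frames per second */
    uint32_t frame_time_us;    /* average frame time over the last window */
};

/* Call once per completed frame with a monotonic microsecond timestamp.
 * Returns 1 when a new FPS value was published (about once per second). */
int frame_stats_tick(struct frame_stats *s, uint64_t now_us)
{
    assert(s != NULL);
    s->frames++;
    uint64_t elapsed = now_us - s->window_start_us;
    if (elapsed < 1000000u) {
        return 0;  /* window still open */
    }
    /* Frames per second over the actual window length, rounded to nearest. */
    s->fps = (uint32_t)(((uint64_t)s->frames * 1000000u + elapsed / 2) / elapsed);
    s->frame_time_us = (uint32_t)(elapsed / s->frames);
    s->window_start_us = now_us;
    s->frames = 0;
    return 1;
}
```

Averaging over a window smooths out per-frame jitter, which is what keeps the on-screen counter stable. Feeding it one frame every 32 ms publishes 31 FPS with a 32,000 us frame time, in line with the example output in Section 3.4.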
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Transfer size | 8-bit vs 16-bit | 16-bit | matches RGB565 |
| Buffer count | 1 vs 2 | 2 | avoids tearing |
| Completion signaling | Poll vs IRQ | IRQ | lower CPU overhead |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | Validate buffer swaps | swap logic |
| Integration Tests | DMA to SPI | full-frame transfer |
| Performance Tests | FPS | 30 FPS target |
6.2 Critical Test Cases
- Solid Color DMA: all pixels identical, no corruption.
- Gradient DMA: verify correct ordering.
- Stress Loop: 1000 consecutive frames.
6.3 Test Data
Frame A: red gradient
Frame B: blue gradient
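The two test frames can be generated rather than stored. This is a sketch with a hypothetical helper, `fill_gradient`, that ramps a single 5-bit channel from left to right; a shift of 11 selects red and a shift of 0 selects blue in RGB565.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill a w x h RGB565 frame with a horizontal ramp in one 5-bit channel.
 * shift = 11 selects red, shift = 0 selects blue. */
void fill_gradient(uint16_t *frame, uint32_t w, uint32_t h, unsigned shift)
{
    assert(frame != NULL && w > 1 && (shift == 0 || shift == 11));
    for (uint32_t y = 0; y < h; y++) {
        for (uint32_t x = 0; x < w; x++) {
            /* Level runs from 0 at the left edge to 31 at the right edge. */
            uint16_t level = (uint16_t)((x * 31u) / (w - 1u));
            frame[(size_t)y * w + x] = (uint16_t)(level << shift);
        }
    }
}
```

Because every row is identical and the ramp is monotonic, swapped bytes or dropped pixels show up as visible bands or a shifted ramp, which makes these frames useful for the ordering check in Section 6.2.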
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| No DREQ | Random corruption | Enable SPI DREQ |
| Wrong transfer size | Colors wrong | Set DMA size to 16-bit |
| Swap too early | Tearing | Swap only on DMA done |
7.2 Debugging Strategies
- Use SPI FIFO level registers to verify pacing.
- Toggle a GPIO on DMA completion to measure timing.
7.3 Performance Traps
- Using CPU to fill buffers while DMA runs can still be a bottleneck; optimize rendering.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add a DMA completion LED blink.
- Log frame time over UART.
8.2 Intermediate Extensions
- Implement triple buffering.
- Use DMA chaining for automatic swaps.
8.3 Advanced Extensions
- Use PIO to drive SPI-like LCD in parallel.
- Implement scanline DMA updates.
9. Real-World Connections
9.1 Industry Applications
- Wearable displays with smooth UI
- Portable instruments with continuous data rendering
9.2 Related Open Source Projects
- Pico SDK DMA examples
- LVGL display drivers using DMA
9.3 Interview Relevance
- DMA and buffering are common embedded optimization topics.
10. Resources
10.1 Essential Reading
- RP2350 datasheet DMA chapter
- ST7789 data transfer timing
10.2 Video Resources
- DMA walkthroughs for microcontrollers
10.3 Tools & Documentation
- Logic analyzer for SPI throughput verification
10.4 Related Projects in This Series
- Project 5 uses DMA + multicore.
11. Self-Assessment Checklist
11.1 Understanding
- I can configure DMA with DREQ pacing.
- I can explain why double buffering prevents tearing.
11.2 Implementation
- DMA streams full frames without corruption.
- FPS is measured and stable.
11.3 Growth
- I can explain DMA setup in an interview.
12. Submission / Completion Criteria
Minimum Viable Completion:
- DMA transfer works with DREQ.
- One full frame renders without CPU-driven SPI.
Full Completion:
- Double buffering and FPS counter working.
Excellence (Going Above & Beyond):
- DMA chaining or triple buffering implemented.