Project 1: The “Eye” (Display and State Feedback)

Build a responsive, low-latency UI on a round TFT that visualizes the voice assistant state machine without stealing time from audio tasks.

Quick Reference

Attribute	Value
Difficulty	Level 2: Beginner-Intermediate
Time Estimate	1-2 days
Main Programming Language	C/C++ (ESP-IDF) (Alternatives: Arduino, MicroPython for display bring-up)
Alternative Programming Languages	C (ESP-IDF), C++ (Arduino), Python (MicroPython)
Coolness Level	High
Business Potential	Medium
Prerequisites	Basic ESP32 flashing, SPI wiring, C structs, basic FreeRTOS tasks
Key Topics	LVGL rendering pipeline, display buffers, event-driven UI state machine, PSRAM usage

1. Learning Objectives

By completing this project, you will:

Configure a round SPI display on ESP32-S3 and drive it reliably with LVGL.
Implement an event-driven UI task that renders system state without blocking real-time audio work.
Choose and justify display buffer sizes and PSRAM placement based on latency and memory tradeoffs.
Build a deterministic state visualization flow and verify it through serial logs and visual cues.

2. All Theory Needed (Per-Concept Breakdown)

This section includes every concept required to implement the project successfully.

2.1 LVGL Rendering Pipeline and Flush Callbacks

Fundamentals

LVGL is a retained-mode GUI library. You do not draw pixels directly to the screen for every interaction; instead, you build a tree of objects and let LVGL render dirty regions. The display driver provides a flush callback, which LVGL calls when it has a region to render. The flush callback must push pixel data to the display and then notify LVGL that the flush is complete. The most important takeaway is that LVGL is only as fast and stable as your display driver. If the flush callback blocks too long, LVGL misses its timing and your UI feels unresponsive. If the callback signals completion too early, you can see flicker or tearing. On ESP32-S3, the flush callback often uses SPI DMA to move the buffer to the display. Because DMA needs the buffer to remain valid while the transfer is in flight, you must design buffer lifetimes carefully. This is why LVGL supports single-buffer and double-buffer modes, and why buffer size directly affects frame rate and memory use.

Deep Dive into the concept

The LVGL rendering pipeline is a structured series of steps: LVGL marks objects as dirty, calculates invalidated areas, composites the scene into a buffer, and requests the display driver to flush that buffer to the panel. This pattern is powerful because it separates UI logic from pixel IO. The ESP32-S3 does not have a GPU, so LVGL runs entirely in software. That means every pixel you draw is CPU time and memory bandwidth. The flush callback is the bridge between LVGL and the hardware driver. On a round display, you often use a rectangular buffer and still push a rectangle over SPI. The panel then clips the circle. This is a common optimization: LVGL does not need to know about the round shape; the driver or panel handles it.

A key detail is partial rendering. LVGL lets you define a buffer that is smaller than the full screen. When LVGL renders, it processes chunks of the invalidated area and sends them as multiple flushes. This reduces RAM usage but increases flush overhead. If your buffer is too small, you will spend too much time in the flush callback and starve other tasks. If your buffer is too large, you may exceed internal SRAM and force the buffer into PSRAM, which can slow down DMA or require extra cache management. On ESP32-S3, a typical 240x240 RGB565 screen is 115,200 bytes for a full frame. Two full buffers would be around 230 KB, which is too much for internal SRAM. A common compromise is to use a 1/10 screen buffer (about 11.5 KB), which easily fits in internal SRAM. That increases flush calls, but keeps DMA consistent and avoids PSRAM stalls.

Another important part of the pipeline is the tick and task handler. LVGL requires a tick counter (usually 1 ms) and a periodic call to lv_timer_handler(). Many embedded projects put this in a dedicated UI task that sleeps for a few milliseconds. If that task is too frequent, it can steal time from audio tasks. If it is too slow, animations will stutter and input may feel delayed. The key is to tune the UI task period to your system: if your audio pipeline is running 10 ms frames, a UI task period of 20-30 ms is often fine. You should also keep UI updates event-driven, so you only change labels or animations when the system state changes, not on every tick.

From a memory perspective, LVGL uses draw buffers and a heap for object allocations. Using PSRAM for LVGL heap is possible, but small, timing-sensitive buffers should remain in internal SRAM. When using DMA, you must ensure the buffer is DMA-capable and not cache-unstable. ESP-IDF provides heap capabilities and attributes to allocate DMA-capable memory. If you ignore this, you can see random flicker or partial updates. Another hidden issue is SPI transaction queue depth. If you push too many flushes without waiting, you can overflow the driver queue. The correct pattern is to start a DMA transaction, then call lv_disp_flush_ready() when the transfer completes (often in an SPI callback). This keeps LVGL honest about when it can reuse the buffer.

Finally, the LVGL pipeline is not just about drawing. It is about determinism. A voice assistant UI should reflect system state accurately. If the UI lags, the user loses trust. This is why you must treat flush callbacks and buffer sizing as part of your real-time design, not just a graphics detail. A stable UI is a user-facing proof that your firmware is healthy.

How this fit on projects

You will apply this concept directly in the display driver setup, LVGL buffer configuration, and the UI task that renders state transitions.

Definitions & key terms

Flush callback: Function called by LVGL to push a rendered buffer to the display.
Invalidated area: Region of the UI that needs redrawing.
Partial rendering: Using smaller buffers to draw the screen in chunks.
DMA-capable memory: Memory that can be used safely for hardware DMA.

Mental model diagram (ASCII)

[UI Objects] -> [LVGL Render] -> [Draw Buffer] -> [flush_cb] -> [SPI DMA] -> [Display]
       ^               |               |               |            |
       |               |               |               |            v
   State Events     Invalidated     Buffer size     Transfer     Pixels

How it works (step-by-step)

A UI object changes (label text, icon, animation state).
LVGL marks the object as dirty and computes invalidated rectangles.
LVGL renders the invalidated area into the draw buffer.
The display driver flush callback sends the buffer over SPI DMA.
The driver signals completion, allowing LVGL to reuse the buffer.

Minimal concrete example

static lv_disp_draw_buf_t draw_buf;
static lv_color_t *buf1;

buf1 = heap_caps_malloc(240 * 24 * sizeof(lv_color_t), MALLOC_CAP_DMA); // 1/10 screen
lv_disp_draw_buf_init(&draw_buf, buf1, NULL, 240 * 24);

static void my_flush_cb(lv_disp_drv_t *drv, const lv_area_t *area, lv_color_t *color_p) {
    spi_lcd_blit(area, color_p); // Start DMA transfer
    // In DMA done callback:
    // lv_disp_flush_ready(drv);
}

Common misconceptions

“Full-screen buffers are always faster.” They reduce flush calls but can exceed SRAM and cause stalls.
“Flush callbacks can be blocking.” Blocking flushes will starve other tasks.

Check-your-understanding questions

Why does LVGL need a flush callback instead of drawing directly to the screen?
What happens if you use a non-DMA-capable buffer for SPI DMA?
How does partial rendering trade memory for CPU time?
Why can calling lv_disp_flush_ready() too early cause flicker?

Check-your-understanding answers

LVGL separates rendering from hardware IO so it can run on many targets.
DMA may read invalid or cached data, causing corruption or crashes.
Smaller buffers reduce RAM but increase the number of flushes.
LVGL may overwrite the buffer while the DMA transfer is still active.

Real-world applications

Smart speakers and displays
Instrument clusters in automotive HMIs
IoT devices with local status screens

Where you will apply it

In this project: See Section 3.2 for buffer requirements and Section 5.2 for driver structure.
Also used in: P05-the-full-stack-xiaozhi-clone.md

References

LVGL official documentation on display drivers and draw buffers.
ESP-IDF SPI LCD example projects.
“Making Embedded Systems” by Elecia White, chapters on timing and display systems.

Key insights

A smooth UI is mostly about the flush callback timing and buffer placement.

Summary

LVGL renders into buffers and relies on your driver to move pixels to the screen. If you choose buffers and DMA setup poorly, the UI will flicker and steal time from real-time tasks.

Homework/exercises to practice the concept

Measure how long a single flush takes for different buffer sizes.
Try single-buffer vs double-buffer mode and compare flicker.
Reduce buffer size until the UI becomes visibly sluggish, then document the threshold.

Solutions to the homework/exercises

Use esp_timer_get_time() before and after the flush to log duration.
Double buffering reduces tearing but increases RAM use.
For a 240x240 display, buffers smaller than 240x12 often cause excessive flush overhead.

2.2 Event-Driven UI State Machines on FreeRTOS

Fundamentals

An embedded UI must reflect system state accurately. You cannot poll every subsystem continuously because polling wastes CPU cycles and can introduce jitter in audio tasks. Instead, you treat the UI as a separate task that receives events. Each event maps to a UI state (idle, listening, thinking, speaking, error). This approach is called an event-driven state machine. In FreeRTOS, you implement it using queues or event groups. When the networking or audio subsystem changes state, it pushes a message to the UI task. The UI task updates LVGL objects and animations accordingly. This decouples UI timing from core system timing. A clean state machine prevents illegal UI transitions and makes your logs easier to interpret.

Deep Dive into the concept

A state machine is more than a list of states. It is a contract about what transitions are allowed and what must happen on entry and exit. In this project, you have two concerns: system state and UI state. The system state represents the assistant pipeline (boot, idle, listening, thinking, speaking, error). The UI state is the visible representation of that system state. They should be closely aligned, but you should still decouple them so that the UI does not block core logic. The mechanism is an event queue.

In FreeRTOS, a common pattern is to define a struct that includes the event type, a timestamp, and optional payload data. For example, UI_EVT_WIFI_CONNECTED may include the IP address, while UI_EVT_ERROR includes an error code. The UI task blocks on the queue, and when it receives an event, it updates the UI objects. LVGL itself is not thread-safe, so only the UI task should call LVGL APIs. This is critical: if another task calls LVGL directly, you can corrupt internal structures and crash.

You also need to consider timing. If the UI task processes events too slowly, the queue can fill and you will lose updates. The system should never block on UI events. That means the queue should be sized to handle bursts and the UI task should be lower priority but not starved. A good approach is to process multiple queued events in one UI task iteration, then call lv_timer_handler() to update animations. You should also coalesce events: if you receive a LISTENING event while already in LISTENING, you can ignore it.

The state machine itself can be documented with a simple transition table. For example, IDLE -> LISTENING occurs when wake word triggers. LISTENING -> THINKING occurs when audio capture stops and a network request begins. THINKING -> SPEAKING occurs when TTS audio arrives. If a network error occurs at any point, you transition to ERROR and then eventually back to IDLE. You should not allow LISTENING -> SPEAKING without THINKING in between, because it would indicate a missing ASR or response step. These constraints are not just theoretical; they help you catch bugs where tasks are out of sync.

Event-driven UI design is also about throughput. UI updates should be minimal. For example, use a single label and icon, and change their text and color. Avoid frequent redraws or complex animations while streaming audio. A common trick is to use a low frame rate (10-20 FPS) for UI animations; this keeps the system responsive without stealing CPU. Another important consideration is that logging itself can block. If your UI task prints a log line on every event, it is fine. If it logs on every LVGL tick, it will slow down the system. Use event logs instead of tick logs.

Finally, build in determinism. When a user says the wake word, the UI should always transition to LISTENING within a predictable time (say under 100 ms). That means the event from the wake word task should be high priority, but the UI task should still not run at the same priority as audio. A clear mapping between system events and UI states also helps you test the UI in isolation: you can inject synthetic events to validate the UI without real audio.

How this fit on projects

This concept is central to the UI task, event queue, and the state visualization logic. It also sets the pattern for the more complex full-stack clone.

Definitions & key terms

State machine: A system of states and allowed transitions.
Event queue: A FreeRTOS queue used to send UI events.
UI task: The task that owns LVGL and handles UI updates.
Transition table: A table that documents allowed state changes.

Mental model diagram (ASCII)

[Audio/WiFi Tasks] ---> [UI Event Queue] ---> [UI Task] ---> [LVGL Objects]
        |                      |                 |
        |                      v                 v
        +--------------> [State Table]     [Screen/Icons]

How it works (step-by-step)

A subsystem posts a UI event to the queue.
The UI task receives the event and checks current state.
The state machine validates the transition.
LVGL objects are updated (text, icon, animation).
UI task calls lv_timer_handler() to render changes.

Minimal concrete example

typedef enum { UI_IDLE, UI_LISTENING, UI_THINKING, UI_SPEAKING, UI_ERROR } ui_state_t;

void ui_task(void *arg) {
    ui_state_t state = UI_IDLE;
    ui_event_t evt;
    while (1) {
        if (xQueueReceive(ui_queue, &evt, pdMS_TO_TICKS(50))) {
            state = ui_transition(state, evt.type);
            ui_render_state(state, evt);
        }
        lv_timer_handler();
        vTaskDelay(pdMS_TO_TICKS(20));
    }
}

Common misconceptions

“The UI can update itself from any task.” LVGL must be single-threaded.
“Polling is simpler.” Polling wastes CPU and increases jitter.

Check-your-understanding questions

Why should only the UI task call LVGL APIs?
What happens if the UI queue overflows?
How do you prevent invalid state transitions?

Check-your-understanding answers

LVGL is not thread-safe; concurrent access can corrupt its state.
Events are dropped, leading to stale or inconsistent UI.
Use a transition table and reject illegal transitions.

Real-world applications

Smart displays with asynchronous sensors
Embedded HMIs in appliances
Industrial dashboards

Where you will apply it

In this project: See Section 5.5 for the design questions and Section 5.10 for phases.
Also used in: P03-the-dumb-chatbot-streaming-audio-api.md, P05-the-full-stack-xiaozhi-clone.md

References

FreeRTOS documentation on queues and task priorities.
“Making Embedded Systems” by Elecia White, chapters on state machines.

Key insights

Event-driven UI keeps the system responsive and makes state visible.

Summary

A UI state machine driven by events avoids polling, reduces jitter, and keeps LVGL safe.

Homework/exercises to practice the concept

Write a transition table for idle, listening, thinking, speaking, error.
Inject synthetic events and log state transitions.
Measure time from event generation to UI update.

Solutions to the homework/exercises

Idle -> Listening -> Thinking -> Speaking -> Idle; any state -> Error.
Use a test task that sends events every second.
Use esp_timer_get_time() to log event and render timestamps.

2.3 Display Buffering, PSRAM, and Memory Tradeoffs

Fundamentals

A display buffer is a chunk of memory that holds pixel data before it is sent to the screen. On the ESP32-S3, internal SRAM is fast but limited, while PSRAM is larger but slower. If you store display buffers in PSRAM, you can keep more UI assets in memory, but the DMA engine may experience higher latency or cache issues. The correct choice depends on buffer size, display resolution, and how often you redraw. For a 240x240 RGB565 display, a full frame is about 115 KB. Double buffering would be 230 KB. This is too large for internal SRAM, so you must either use partial buffers in SRAM or accept slower PSRAM buffers. The tradeoff is clear: partial buffers increase flush frequency, while PSRAM buffers increase transfer latency.

Deep Dive into the concept

Memory tradeoffs define whether your UI feels smooth or sluggish. The ESP32-S3 includes limited internal SRAM for time-critical work. When you add a display, you can easily consume a large portion of that SRAM, starving the audio pipeline. This is why memory placement is a first-class design decision. You must decide where to store the LVGL draw buffer, LVGL heap, UI assets, and any cached images. Each choice affects DMA behavior and CPU cache behavior.

On ESP-IDF, you can allocate memory with specific capabilities. MALLOC_CAP_DMA ensures the buffer is suitable for DMA, while MALLOC_CAP_SPIRAM places it in PSRAM. However, a buffer that is both DMA-capable and in PSRAM can have subtle performance issues. PSRAM accesses are slower and can stall the CPU. If your UI flush takes too long, the UI task may block and the audio pipeline may suffer. This is why many designs use a small DMA-capable buffer in SRAM, even if it requires partial rendering. The cost is more flush calls, which increases CPU overhead. The benefit is lower per-flush latency and more predictable behavior.

Another factor is cache line alignment. If your buffer is not aligned, DMA may require extra copies or even fail. The ESP-IDF drivers often handle alignment, but you should still allocate buffers with alignment in mind. If you use the SPI LCD driver, it may internally copy buffers into a DMA-capable region. This double buffering reduces performance. The best approach is to allocate the draw buffer in a DMA-capable region directly.

You also need to consider the LVGL object heap. LVGL uses dynamic allocations for its objects, styles, and animations. Placing this heap in PSRAM can free SRAM for audio tasks. But if the UI objects are accessed frequently, you can experience cache misses. A balanced approach is to keep only large assets (images, fonts) in PSRAM while keeping the draw buffer in SRAM. Many LVGL port examples support custom allocators so you can control where memory goes.

Timing analysis is critical. Suppose your SPI clock is 40 MHz and you flush 11.5 KB per partial buffer. That is about 2.3 ms per flush, plus overhead. If you flush 10 times to update a full screen, you spend ~23 ms just on SPI transfers. If your UI task runs at 20 ms intervals, you already fall behind. You can either reduce the UI refresh rate or increase buffer size. These are the kinds of tradeoffs that you must measure. The best practice is to instrument flush time and track how often the UI task consumes CPU. If the UI task consumes more than 10-15 percent CPU, it is likely to impact audio.

Finally, consider how the UI interacts with the rest of the system. When the assistant is idle, you can allow more UI animation and higher refresh. When streaming audio, you can lower UI refresh rate or pause animations. This dynamic throttling is a powerful technique to keep audio stable. It also makes the UI feel intentional: it becomes calmer during listening and thinking, and more animated when idle. All of these behaviors are consequences of memory and timing decisions.

How this fit on projects

This concept is applied in buffer sizing, memory allocation, and UI refresh strategies in this project and the full-stack clone.

Definitions & key terms

Draw buffer: LVGL buffer used for rendering.
PSRAM: External memory with higher latency.
DMA-capable: Memory that can be used directly by DMA engines.
Partial buffer: Buffer that covers only a portion of the screen.

Mental model diagram (ASCII)

Internal SRAM (fast)          External PSRAM (large)
+-------------------+         +---------------------+
| Audio ring buffer |         | UI assets/fonts     |
| UI draw buffer    |         | Full-frame images   |
+-------------------+         +---------------------+
         |                              |
         v                              v
     Low latency                   High capacity

How it works (step-by-step)

Decide UI refresh target (for example 15 FPS).
Compute buffer size options (full, half, partial).
Allocate DMA-capable buffers in SRAM.
Place large assets in PSRAM to save SRAM.
Measure flush time and adjust buffer size.

Minimal concrete example

lv_color_t *buf1 = heap_caps_malloc(240 * 24 * sizeof(lv_color_t), MALLOC_CAP_DMA);
assert(buf1);

// Store a large image in PSRAM
void *logo = heap_caps_malloc(40 * 1024, MALLOC_CAP_SPIRAM);

Common misconceptions

“PSRAM is good for everything.” It can add latency and jitter.
“Smaller buffers always save time.” They save RAM but can cost more CPU.

Check-your-understanding questions

Why might a partial buffer in SRAM be faster than a full buffer in PSRAM?
What happens if the UI task uses too much CPU?
How do you decide whether to reduce refresh rate or increase buffer size?

Check-your-understanding answers

SRAM has lower latency and DMA overhead is lower per flush.
Audio tasks can miss deadlines, causing glitches.
Measure flush time and CPU usage; choose the option with less total impact.

Real-world applications

Wearable device UIs
Smart thermostat screens
Battery-powered displays

Where you will apply it

In this project: See Section 3.2 for functional requirements and Section 7.3 for performance traps.
Also used in: P01-the-eye-display-and-state-feedback.md, P05-the-full-stack-xiaozhi-clone.md

References

ESP-IDF heap capabilities documentation.
LVGL memory configuration guides.

Key insights

Display buffers are a memory budget, not a convenience.

Summary

Choosing where buffers live and how large they are determines UI smoothness and audio stability.

Homework/exercises to practice the concept

Calculate RAM usage for full-frame, half-frame, and 1/10-frame buffers.
Measure flush time for each buffer size.
Log CPU usage of the UI task during an animation.

Solutions to the homework/exercises

Full: 115,200 bytes, half: 57,600 bytes, 1/10: 11,520 bytes (RGB565).
Use timestamp logs around the flush callback.
Use FreeRTOS run-time stats and log UI task percentage.

3. Project Specification

3.1 What You Will Build

You will build a round-display UI that shows a boot animation and then cycles through assistant states (idle, listening, thinking, speaking, error). The UI must update based on events from other tasks and must not interfere with audio pipeline timing. The project includes a driver for the display, a UI task that owns LVGL, and a small event interface that other modules can call to update state. You will also include a deterministic demo mode that cycles through states without any audio or network dependencies.

3.2 Functional Requirements

Display Initialization: Bring up the SPI display and show a test pattern within 5 seconds of boot.
LVGL Integration: Render LVGL objects using a DMA-capable buffer and flush callback.
State Machine: Implement at least five UI states: boot, idle, listening, thinking, speaking, error.
Event Interface: Provide a function ui_set_state(state, reason) that queues an event to the UI task.
Deterministic Demo Mode: Provide a demo mode that cycles through states on a fixed schedule.

3.3 Non-Functional Requirements

Performance: UI task CPU usage must remain under 15 percent during idle animation.
Reliability: No flicker or corrupted frames during a 5-minute run.
Usability: State changes must be visible within 200 ms of event arrival.

3.4 Example Usage / Output

Boot: screen shows logo for 1.5 seconds
Idle: blue ring, label "idle"
Listening: green pulsing ring
Thinking: yellow spinner
Speaking: orange waves
Error: red exclamation icon and error code

3.5 Data Formats / Schemas / Protocols

UI event message format (binary queue payload):

typedef struct {
    uint32_t timestamp_ms;
    ui_state_t state;
    uint16_t reason_code; // 0=normal, non-zero for error reason
} ui_event_t;

3.6 Edge Cases

Display init fails or SPI not wired: show serial error and keep retrying.
UI queue full: drop oldest event and log warning.
LVGL heap allocation fails: show error screen and halt UI task.

3.7 Real World Outcome

This section is a golden reference. You will compare your result directly against it.

3.7.1 How to Run (Copy/Paste)

cd /path/to/project
idf.py set-target esp32s3
idf.py build
idf.py flash monitor

3.7.2 Golden Path Demo (Deterministic)

Set the system clock to a fixed time for logs and run demo mode.

Expected serial output:

I (000100) ui: demo=on, clock=2026-01-01T00:00:00Z
I (001500) ui: state=boot
I (003000) ui: state=idle
I (005000) ui: state=listening
I (007000) ui: state=thinking
I (009000) ui: state=speaking
I (011000) ui: state=error reason=42
I (013000) ui: state=idle

Expected screen sequence:

0-1.5s: logo
1.5-3s: idle blue ring
3-5s: listening green pulse
5-7s: thinking yellow spinner
7-9s: speaking orange waves
9-11s: error red icon with code 42

3.7.3 Failure Demo (Deterministic)

Simulate a display reset pin failure by forcing reset pin low.

Expected serial output:

E (000200) lcd: reset timeout
E (000210) ui: display_init_failed

Expected screen:

Backlight remains off or white screen.
Device stays in retry loop every 5 seconds.

3.7.4 If GUI / Desktop / Mobile

This is an embedded GUI. The screen must include:

A center label showing the state text.
A circular ring animation or icon around the label.
A small status dot for Wi-Fi (gray/red/green).

ASCII wireframe:

+------------------------+
|        (ring)          |
|        [idle]          |
|                        |
|       o  wifi          |
+------------------------+

4. Solution Architecture

4.1 High-Level Design

[System Tasks] -> [UI Event Queue] -> [UI Task] -> [LVGL] -> [SPI DMA] -> [Display]
        |                |
        |                v
   [State Machine]  [Transition Table]

4.2 Key Components

Component	Responsibility	Key Decisions
Display Driver	SPI init, flush callback, DMA transfers	Buffer size vs speed
UI Task	Own LVGL and render states	Event-driven vs polling
UI Event API	Provide system state updates	Queue depth and payload
UI Assets	Icons, fonts, animations	Store in PSRAM vs flash

4.3 Data Structures (No Full Code)

typedef struct {
    ui_state_t state;
    uint32_t last_transition_ms;
    uint16_t error_code;
} ui_model_t;

typedef struct {
    ui_event_t events[16];
    uint8_t head;
    uint8_t tail;
} ui_event_queue_t;

4.4 Algorithm Overview

Key Algorithm: State Update and Render

Receive UI event.
Validate state transition using a table.
Update UI model and schedule LVGL changes.
Render only changed objects to reduce redraw cost.

Complexity Analysis:

Time: O(1) per event
Space: O(1) for state model

5. Implementation Guide

5.1 Development Environment Setup

idf.py set-target esp32s3
idf.py menuconfig
# Enable PSRAM and SPI LCD settings

5.2 Project Structure

project-root/
├── main/
│   ├── app_main.c
│   ├── ui_task.c
│   ├── ui_task.h
│   ├── ui_assets.c
│   └── display_driver.c
├── components/
│   └── lvgl/
├── CMakeLists.txt
└── README.md

5.3 The Core Question You’re Answering

“How can a tiny device show complex state changes without interfering with real-time audio?”

5.4 Concepts You Must Understand First

Stop and research these before coding:

SPI LCD wiring and reset sequences
LVGL draw buffer sizing and flush callbacks
FreeRTOS queue usage for event-driven updates

5.5 Questions to Guide Your Design

What buffer size gives a stable frame rate without starving audio tasks?
How often should lv_timer_handler() run in your system?
Which state transitions are illegal, and how will you log them?

5.6 Thinking Exercise

Draw a timeline of events from wake word to speaking. Mark where UI events fire and how long it takes to see the UI change.

5.7 The Interview Questions They’ll Ask

Why is LVGL not thread-safe?
How does partial buffering affect CPU usage?
What is the impact of SPI clock speed on frame rate?

5.8 Hints in Layers

Hint 1: Start with a static label and a solid background.

Hint 2: Only update UI when the system state changes.

Hint 3: Use a 1/10 screen buffer and log flush time.

Hint 4: If animations stutter, reduce frame rate and increase buffer size.

5.9 Books That Will Help

5.10 Implementation Phases

Phase 1: Bring-up (2-4 hours)

Goals:

Initialize display and show test pattern.
Verify SPI wiring and backlight.

Tasks:

Run a basic SPI LCD example.
Display a static color screen.

Checkpoint: You see a stable, non-flickering solid color.

Phase 2: LVGL Integration (4-6 hours)

Goals:

Integrate LVGL with flush callback.
Render basic UI objects.

Tasks:

Allocate draw buffer and hook flush callback.
Create label and ring shape.

Checkpoint: UI renders text and a static ring.

Phase 3: State Machine (4-6 hours)

Goals:

Implement UI event queue and states.
Add animations per state.

Tasks:

Define ui_event_t and queue.
Map states to colors and animations.

Checkpoint: Demo mode cycles through states predictably.

5.11 Key Implementation Decisions

6. Testing Strategy

6.1 Test Categories

6.2 Critical Test Cases

Transition Validity: Send invalid transition and verify error log.
Queue Overflow: Push 100 events quickly and verify oldest is dropped.
Flush Timing: Measure flush time and verify under 5 ms for partial buffer.

6.3 Test Data

UI events: [BOOT, IDLE, LISTENING, THINKING, SPEAKING, ERROR]

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

7.2 Debugging Strategies

Measure flush time with timers to confirm performance.
Use a demo mode to isolate UI issues from audio/network tasks.

7.3 Performance Traps

Excessive logging in the UI task can cause jitter. Reduce log frequency during animations.

8. Extensions & Challenges

8.1 Beginner Extensions

Add a Wi-Fi RSSI icon.
Add a boot splash screen with version number.

8.2 Intermediate Extensions

Add a settings menu navigated by button.
Add a circular progress bar during thinking.

8.3 Advanced Extensions

Render a waveform visualization for mic input.
Implement a low-power screen dimming mode.

9. Real-World Connections

9.1 Industry Applications

Smart speakers with visual feedback rings.
Wearable UIs with constrained memory.

LVGL demo applications.
ESP-IDF SPI LCD examples.

9.3 Interview Relevance

UI state machines for embedded systems.
Buffering and DMA tradeoffs under resource constraints.

10. Resources

10.1 Essential Reading

“Making Embedded Systems” by Elecia White, Ch. 7
LVGL documentation (display drivers, draw buffers)

10.2 Video Resources

LVGL official YouTube tutorials on display drivers.
Espressif talks on GUI performance.

10.3 Tools & Documentation

ESP-IDF SPI LCD driver docs
LVGL porting guide

P02-the-parrot-audio-capture-playback.md - Introduces real-time audio pipeline.
P05-the-full-stack-xiaozhi-clone.md - Combines UI and audio in a full assistant.

11. Self-Assessment Checklist

11.1 Understanding

I can explain LVGL flush callbacks and buffer sizing.
I can describe why event-driven UI avoids audio jitter.
I can explain how PSRAM affects UI performance.

11.2 Implementation

UI renders all states without flicker.
Event queue does not overflow under bursty events.
UI task CPU usage stays under target.

11.3 Growth

I documented buffer size experiments.
I can explain UI tradeoffs in an interview.

12. Submission / Completion Criteria

Minimum Viable Completion:

Boot screen and idle state render correctly.
UI transitions to listening, thinking, speaking, error.
Demo mode runs deterministically for 3 minutes.

Full Completion:

All minimum criteria plus:
Buffer timing logs captured and analyzed.
UI task CPU usage measured under load.

Excellence (Going Above & Beyond):

UI adapts refresh rate based on audio load.
Custom animation assets added without increasing glitches.

13 Additional Content Rules (Hard Requirements)

13.1 Determinism

Demo mode uses fixed timing and fixed timestamps.
State transitions are based on a deterministic timer.

13.2 Outcome Completeness

Golden path demo in Section 3.7.2.
Failure demo in Section 3.7.3.

13.3 Cross-Linking

Cross-links are included in “Where you will apply it” and Section 10.4.

13.4 No Placeholder Text

All sections are fully filled with specific content.