Project 10: GPU-Accelerated Renderer
Build a GPU renderer for terminal glyphs with a texture atlas, batching, and damage tracking for high FPS.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 3: Advanced |
| Time Estimate | 3-6 weeks |
| Main Programming Language | C (with OpenGL, Vulkan, or Metal) |
| Alternative Programming Languages | C++, Rust |
| Coolness Level | Level 4: Hardcore Tech Flex |
| Business Potential | Level 3: Niche Infrastructure |
| Prerequisites | Basic GPU rendering, font rendering |
| Key Topics | texture atlas, batching, damage tracking |
1. Learning Objectives
By completing this project, you will:
- Render terminal glyphs using a GPU pipeline.
- Build a glyph texture atlas and cache.
- Batch thousands of glyphs into a few draw calls.
- Implement damage tracking to minimize redraw.
- Measure FPS and rendering throughput with reproducible, replay-based benchmarks.
2. All Theory Needed (Per-Concept Breakdown)
Concept 1: Texture Atlases and Glyph Caching on the GPU
Fundamentals
A texture atlas packs many glyph bitmaps into a single GPU texture. Each glyph is stored at a known position in the atlas, and rendering a glyph becomes drawing a quad with texture coordinates. This reduces texture binds and enables batching. A glyph cache maps (font face, codepoint) keys to atlas positions so glyphs are uploaded once and reused many times.
Deep Dive into the Concept
GPU rendering thrives on minimizing state changes. If each glyph required a separate texture, rendering would involve thousands of texture binds per frame, which is too slow. A texture atlas solves this by packing glyph bitmaps into a single large texture. The renderer then uses texture coordinates to sample the correct glyph region. This allows you to render all glyphs in a frame with a single texture bind.
Building an atlas is a packing problem. A simple approach is a shelf packer: place glyphs left to right along a row (a shelf), tracking the current x, the shelf's y, and the height of the tallest glyph on it; when the next glyph does not fit in the remaining width, start a new shelf directly below. Skyline packers are a denser alternative when glyph sizes vary a lot. For a terminal, a fixed-size atlas (e.g., 2048x2048) is often sufficient because glyph sets are limited. You must handle atlas exhaustion by either allocating a new atlas or evicting rarely used glyphs. A simple strategy is to allocate multiple atlas pages and open a new page when the current one is full.
Glyph caching ties CPU and GPU together. On the CPU, you rasterize glyphs with FreeType. Then you upload the bitmap into the atlas texture with glTexSubImage2D or similar. You store the atlas coordinates and metrics in a cache keyed by font face and codepoint. The renderer then uses cached glyph metadata to generate quads. The cache needs eviction policies if the atlas is full. A simple LRU based on last-used frame can work.
Atlas management must be deterministic and correct. If you evict a glyph, every visible cell that uses it must be marked dirty so the glyph is re-rasterized and re-uploaded before that cell is drawn again; otherwise the cell samples whatever glyph now occupies that atlas region. Many renderers avoid eviction by using large atlases and assuming the glyph set is stable, which is true for most terminal sessions. For this project, a single atlas with a fixed max glyph count is sufficient; when it is full, you can reject new glyphs or rebuild the atlas from scratch in a deterministic way.
How this fits on projects
This concept builds on P09 and is essential for P13 and P15.
Definitions & Key Terms
- Texture atlas -> large texture containing many glyph bitmaps.
- Glyph cache -> map from (font face, codepoint) to atlas position and metrics.
- Atlas page -> one fixed-size texture in a multi-texture atlas; a new page is allocated when the current one is full.
Mental Model Diagram (ASCII)
[atlas texture]
+-------------------------+
| A B C D E ... |
| f g h i j ... |
+-------------------------+
How It Works (Step-by-Step)
- Rasterize glyph with FreeType.
- Allocate space in atlas.
- Upload bitmap to GPU texture.
- Store UV coords in cache.
- Render quad with UV coords.
Invariants:
- Each glyph has stable atlas coordinates.
- Atlas texture size is fixed per page.
Failure modes:
- Atlas overflow with no eviction policy.
- Incorrect UV mapping causing wrong glyphs.
Minimal Concrete Example
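AtlasPos pos;
if (!atlas_alloc(&atlas, bmp.width, bmp.rows, &pos))
    return ATLAS_FULL;                      /* overflow policy: new page, evict, or reject */
upload_to_texture(pos.x, pos.y, &bmp);      /* e.g. glTexSubImage2D into the atlas */
cache_put(&cache, face_id, codepoint, pos); /* key includes the font face */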
Common Misconceptions
- “Each glyph needs its own texture.” -> Atlas avoids per-glyph textures.
- “GPU upload cost is negligible.” -> Uploads are expensive; cache helps.
Check-Your-Understanding Questions
- Why is an atlas better than per-glyph textures?
- What happens when the atlas fills up?
- Why must cache keys include font face?
Check-Your-Understanding Answers
- It reduces texture binds and enables batching.
- You must allocate a new atlas or evict glyphs.
- Different fonts produce different glyphs for the same codepoint.
Real-World Applications
- GPU terminals like Alacritty and Kitty
- Text rendering in game engines
Where You’ll Apply It
- This project: Section 3.2 (atlas), Section 4.3 (data structs)
- Also used in: P13-full-terminal-emulator, P15-feature-complete-terminal-capstone
References
- GPU text rendering articles
- FreeType + OpenGL tutorials
Key Insight
A texture atlas is the core optimization that makes GPU text rendering fast.
Summary
The atlas turns thousands of glyphs into a single texture, enabling batching and speed.
Homework/Exercises to Practice the Concept
- Pack 100 glyphs into a 512x512 atlas and visualize positions.
- Upload glyphs and render a grid of ASCII characters.
- Simulate atlas overflow and choose a policy.
Solutions to the Homework/Exercises
- Use a row-based packer and print positions.
- Render quads with UVs for each glyph.
- Implement a second atlas page and select based on availability.
Concept 2: Batching, Damage Tracking, and Frame Pacing
Fundamentals
Batching combines many glyphs into a single draw call by building a vertex buffer of quads. Damage tracking minimizes redraw by updating only regions of the screen that changed. Frame pacing ensures rendering happens at a stable rate, avoiding unnecessary redraws.
Deep Dive into the Concept
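Rendering every cell every frame is wasteful. A terminal screen might be 120x40 = 4,800 cells; redrawing all of them at 144 FPS means building and submitting about 691,200 quads per second (4,800 x 144), most of them unchanged. Damage tracking solves this by marking which cells changed since the last frame. When rendering, you only build quads for dirty cells or dirty rows. This assumes the pixels of unchanged cells survive from the previous frame, for example because you render into a persistent offscreen framebuffer and then copy it to the window; with plain double buffering the back buffer is not guaranteed to hold the last frame, so a full redraw is the safe default there. For a simpler implementation, you can track dirty rows or rectangles rather than individual cells.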
Batching is the key to performance. Each glyph becomes a quad with 4 vertices and 6 indices. You collect all quads for dirty cells into a single vertex buffer and issue one draw call. This drastically reduces CPU-GPU overhead. The vertex buffer can be rebuilt each frame or updated partially, depending on complexity. For this project, rebuild per frame is acceptable.
Frame pacing ensures smooth animation and avoids redundant frames. If no cells are dirty, you can skip rendering entirely. If updates are continuous (e.g., a yes command), you may render at a fixed max FPS (e.g., 60) to reduce CPU usage. You can implement a simple time-based limiter: if the last frame was less than 16 ms ago, delay rendering unless a critical event occurs.
Damage tracking also simplifies correctness: you can maintain a boolean per cell or a dirty rectangle list. When the screen model updates a cell, mark it dirty. When scrolling occurs, mark the affected region dirty. For scroll operations, you might choose to redraw the full screen, which is simpler and still performant for a small grid.
How this fits on projects
This concept is used in P13 and P15.
Definitions & Key Terms
- Batching -> combining many draw calls into one.
- Damage tracking -> marking changed regions.
- Frame pacing -> controlling render frequency.
Mental Model Diagram (ASCII)
Dirty cells -> build quads -> GPU draw
No dirty cells -> skip frame
How It Works (Step-by-Step)
- Mark cells dirty when they change.
- Gather dirty cells into a vertex buffer.
- Issue one draw call for all quads.
- Clear dirty marks.
Invariants:
- Dirty marks always cleared after successful render.
- No updates means no draw.
Failure modes:
- Forgetting to mark dirty causes stale cells.
- Clearing dirty flags too early causes missed updates.
Minimal Concrete Example
if (cell_changed) dirty[row][col] = true;
if (has_dirty) { build_vertex_buffer(); draw(); clear_dirty(); }
Common Misconceptions
- “GPU is always fast.” -> CPU-GPU sync can be the bottleneck.
- “Redraw everything is fine.” -> It wastes resources at high FPS.
Check-Your-Understanding Questions
- Why batch glyphs into one draw call?
- What happens if you skip damage tracking?
- How do you avoid rendering when nothing changed?
Check-Your-Understanding Answers
- To reduce draw call overhead and improve performance.
- You redraw unnecessary cells, wasting CPU/GPU time.
- Track dirty state and skip when no changes.
Real-World Applications
- GPU terminal rendering
- Game UI text rendering
Where You’ll Apply It
- This project: Section 3.2 (damage tracking), Section 7.3 (perf)
- Also used in: P15-feature-complete-terminal-capstone
References
- GPU rendering best practices
- Terminal rendering performance studies
Key Insight
Damage tracking and batching can be the difference between 30 FPS and 144 FPS under heavy output.
Summary
Efficient rendering depends on minimizing work: track changes and batch draws.
Homework/Exercises to Practice the Concept
- Render only dirty rows and compare FPS.
- Add a frame limiter and measure CPU usage.
- Build a benchmark that floods output.
Solutions to the Homework/Exercises
- Mark rows dirty on updates and build quads only for them.
- Sleep if the last frame is too recent.
- Use the yes command or a fixed log replay.
3. Project Specification
3.1 What You Will Build
A GPU-backed renderer that:
- Uses a glyph texture atlas.
- Builds a vertex buffer of quads for dirty cells.
- Draws text using a single draw call per frame.
- Reports FPS and cache hit rate.
Intentionally excluded:
- Full GUI or windowing framework (use a minimal GL context).
3.2 Functional Requirements
- Atlas: allocate a texture atlas and upload glyphs.
- Cache: store glyph metadata per codepoint.
- Batching: build a vertex buffer for dirty cells.
- Damage tracking: mark dirty cells on updates.
- Frame pacing: cap FPS and skip frames with no changes.
3.3 Non-Functional Requirements
- Performance: 60+ FPS at 4K with 120x40 grid.
- Determinism: benchmark replays with fixed logs and seeds.
- Stability: no GPU resource leaks.
3.4 Example Usage / Output
$ ./gpu_term --bench samples/flood.log
[render] fps=120 draw_calls=1 cache_hit=95%
3.5 Data Formats / Schemas / Protocols
- Vertex format:
{pos_x, pos_y, uv_x, uv_y, color}
3.6 Edge Cases
- Atlas full with new glyphs.
- Large window resize requiring reallocation.
- GPU device lost or context reset.
3.7 Real World Outcome
A renderer that handles high-output workloads at smooth frame rates.
3.7.1 How to Run (Copy/Paste)
cc -O2 -o gpu_term src/*.c $(pkg-config --cflags --libs gl freetype2)
TZ=UTC LC_ALL=C ./gpu_term --bench samples/flood.log --seed 1234
3.7.2 Golden Path Demo (Deterministic)
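- Replay samples/flood.log with seed 1234.
- Verify that the draw-call count and cache hit rate match the expected values; FPS depends on hardware, so check it against the target range rather than an exact number.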
3.7.3 Failure Demo (Deterministic)
$ ./gpu_term --atlas-size 0
error: atlas size must be > 0
exit status: 64
3.7.6 If Library: minimal usage snippet
renderer_init(&r, width, height);
renderer_draw(&r, screen);
Expected: draws the screen grid using the GPU with batching.
4. Solution Architecture
4.1 High-Level Design
Screen -> glyph cache -> vertex buffer -> GPU draw
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Atlas Manager | Pack and upload glyphs | Fixed size atlas |
| Glyph Cache | Map codepoint -> atlas coords | LRU eviction optional |
| Renderer | Build vertices and draw | Single draw call |
| Damage Tracker | Mark changed cells | Dirty grid or rows |
4.3 Data Structures (No Full Code)
struct GlyphInfo { float u0, v0, u1, v1; int w, h, bearing_x, bearing_y, adv; int page; };
struct DirtyGrid { bool dirty[MAX_ROWS][MAX_COLS]; bool any_dirty; };
4.4 Algorithm Overview
Key Algorithm: Render Frame
- If no dirty cells, skip frame.
- Build vertex buffer for dirty cells.
- Bind atlas texture and draw.
- Clear dirty flags.
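Complexity Analysis:
- Time: O(dirty_cells) for quad building if you keep a list of dirty cells; scanning a dirty grid instead costs O(rows*cols) per frame.
- Space: O(rows*cols)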
5. Implementation Guide
5.1 Development Environment Setup
pkg-config --cflags --libs gl freetype2
5.2 Project Structure
gpu-term/
|-- src/
| |-- renderer.c
| |-- atlas.c
| `-- bench.c
|-- samples/
| `-- flood.log
|-- Makefile
`-- README.md
5.3 The Core Question You’re Answering
“How do you render thousands of glyphs at high FPS without tearing?”
5.4 Concepts You Must Understand First
- Texture atlas packing and caching.
- Batching and draw calls.
- Damage tracking and frame pacing.
5.5 Questions to Guide Your Design
- How large should the atlas be?
- How will you measure cache hit rate?
- How will you cap FPS deterministically?
5.6 Thinking Exercise
Estimate the number of glyphs that fit in a 2048x2048 atlas with 16x20 cells.
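A worked version of the estimate, assuming each glyph occupies a full 16x20 slot with no padding between slots:

```latex
\left\lfloor \frac{2048}{16} \right\rfloor = 128 \text{ slots per row}, \qquad
\left\lfloor \frac{2048}{20} \right\rfloor = 102 \text{ rows}, \qquad
128 \times 102 = 13{,}056 \text{ glyphs}
```

Real atlases often add padding between slots to avoid texture-sampling bleed; with 1 px of padding (17x21 slots) the same arithmetic gives 120 x 97 = 11,640 glyphs.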
5.7 The Interview Questions They’ll Ask
- Why use a texture atlas?
- How does damage tracking improve performance?
- What are GPU bottlenecks in text rendering?
5.8 Hints in Layers
Hint 1: Start with CPU rendering. Validate output before moving to the GPU.
Hint 2: Upload glyphs lazily. Only add glyphs when needed.
Hint 3: Build a single VBO per frame. Batch all glyphs together.
Hint 4: Add a benchmark mode. Measure FPS under load.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Graphics | “Computer Graphics from Scratch” | Ch. 9-12 |
| Performance | “Computer Systems: A Programmer’s Perspective” | Ch. 5 |
5.10 Implementation Phases
Phase 1: Atlas and cache (1-2 weeks)
Goals: upload glyphs to GPU.
Tasks:
- Implement atlas allocation and upload.
- Build glyph cache.
Checkpoint: Glyphs render from atlas.
Phase 2: Batching (1 week)
Goals: single draw call.
Tasks:
- Build vertex buffer for all glyphs.
- Render with one draw call.
Checkpoint: Output renders correctly.
Phase 3: Damage tracking (1 week)
Goals: reduce redraw work.
Tasks:
- Mark dirty cells.
- Render only dirty cells.
Checkpoint: FPS improves under load.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Atlas size | 1024 vs 2048 | 2048 | More glyphs before overflow |
| Dirty tracking | Cells vs rows | Cells | Higher precision |
| Frame pacing | Vsync vs timer | Timer for benchmarks, vsync for interactive use | Vsync ties FPS to the display refresh rate; a timer keeps benchmark runs reproducible |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | Atlas packing | Fill until full |
| Integration Tests | Render sample log | flood.log |
| Performance Tests | FPS benchmark | 120 FPS target |
6.2 Critical Test Cases
- Atlas overflow: gracefully handle full atlas.
- Cache reuse: repeated text uses cached glyphs.
- Dirty rendering: only changed cells redraw.
6.3 Test Data
Log: 10k lines of repeated text
Expected: cache hit rate > 90%
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| UV misalignment | Wrong glyphs | Verify atlas coordinates |
| No damage tracking | Low FPS | Mark dirty cells |
| Excessive texture uploads | Stutter | Cache glyphs |
7.2 Debugging Strategies
- Render atlas texture to screen for inspection.
- Add debug counters for draw calls and cache hits.
7.3 Performance Traps
Re-uploading glyph bitmaps every frame adds CPU-to-GPU transfer cost and can stall the pipeline. Upload each glyph once, cache it, and batch draws.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add support for bold/italic atlas pages.
- Add a debug overlay for FPS and cache.
8.2 Intermediate Extensions
- Add multi-atlas page support.
- Add GPU scroll with texture blit.
8.3 Advanced Extensions
- Implement persistent mapped buffers.
- Add compute shader glyph packing.
9. Real-World Connections
9.1 Industry Applications
- GPU-accelerated terminals
- High-performance log viewers
9.2 Related Open Source Projects
- Alacritty: GPU terminal
- Kitty: GPU terminal with advanced features
9.3 Interview Relevance
- GPU rendering pipelines
- Performance optimization
10. Resources
10.1 Essential Reading
- OpenGL or Vulkan text rendering tutorials
- GPU performance guides
10.2 Video Resources
- Talks on GPU-based text rendering
10.3 Tools & Documentation
- glxinfo or GPU profiling tools
10.4 Related Projects in This Series
11. Self-Assessment Checklist
11.1 Understanding
- I can explain atlas packing and caching.
- I know how batching reduces draw calls.
- I understand damage tracking.
11.2 Implementation
- GPU renderer draws glyphs correctly.
- Cache hit rate is high.
- FPS meets target.
11.3 Growth
- I can extend to multi-atlas support.
- I can profile GPU performance.
12. Submission / Completion Criteria
Minimum Viable Completion:
- GPU renders glyphs with an atlas.
- Single draw call for a frame.
Full Completion:
- Damage tracking and FPS benchmark.
- Deterministic benchmark results.
Excellence (Going Above & Beyond):
- Multi-atlas and GPU scroll.
- Advanced buffer management.