Project 10: GPU-Accelerated Renderer

Build a GPU renderer for terminal glyphs with a texture atlas, batching, and damage tracking for high FPS.

Quick Reference

Attribute	Value
Difficulty	Level 3: Advanced
Time Estimate	3-6 weeks
Main Programming Language	C or Rust (with OpenGL/Vulkan/Metal)
Alternative Programming Languages	C++
Coolness Level	Level 4: Hardcore Tech Flex
Business Potential	Level 3: Niche Infrastructure
Prerequisites	Basic GPU rendering, font rendering
Key Topics	texture atlas, batching, damage tracking

1. Learning Objectives

By completing this project, you will:

Render terminal glyphs using a GPU pipeline.
Build a glyph texture atlas and cache.
Batch thousands of glyphs into a few draw calls.
Implement damage tracking to minimize redraw.
Measure FPS and rendering throughput deterministically.

2. All Theory Needed (Per-Concept Breakdown)

Concept 1: Texture Atlases and Glyph Caching on the GPU

Fundamentals

A texture atlas packs many glyph bitmaps into a single GPU texture. Each glyph is stored at a known position in the atlas, and rendering a glyph becomes drawing a quad with texture coordinates. This reduces texture binds and enables batching. A glyph cache maps codepoints to atlas positions so glyphs are uploaded once and reused many times.

Deep Dive into the Concept

GPU rendering thrives on minimizing state changes. If each glyph required a separate texture, rendering would involve thousands of texture binds per frame, which is too slow. A texture atlas solves this by packing glyph bitmaps into a single large texture. The renderer then uses texture coordinates to sample the correct glyph region. This allows you to render all glyphs in a frame with a single texture bind.

Building an atlas is a packing problem. A simple approach is a shelf (row-based) or skyline algorithm. A shelf packer places glyphs left to right along a row while tracking the current x and y; when a glyph no longer fits in the row, it starts a new row below the tallest glyph of the previous one. For a terminal, a fixed-size atlas (e.g., 2048x2048) is often sufficient because glyph sets are limited. You must handle atlas exhaustion by either allocating a new atlas or evicting rarely used glyphs. A simple strategy is to keep multiple atlas pages and allocate from a new page when the current one is full.
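The row-by-row packing described above can be sketched in a few lines. This is a minimal illustration, not a production packer: the names ShelfAtlas, AtlasPos, atlas_init, and atlas_alloc are hypothetical, and the sketch leaves out the padding between glyphs that a real atlas usually needs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration only. */
typedef struct {
    int width, height;  /* atlas page size in pixels */
    int pen_x, pen_y;   /* top-left corner of the next free slot */
    int shelf_h;        /* tallest glyph placed on the current shelf */
} ShelfAtlas;

typedef struct { int x, y; bool ok; } AtlasPos;

void atlas_init(ShelfAtlas *a, int width, int height) {
    assert(width > 0 && height > 0);
    a->width = width;
    a->height = height;
    a->pen_x = 0;
    a->pen_y = 0;
    a->shelf_h = 0;
}

/* Reserve a w x h rectangle. ok == false means this page is full
 * (or the glyph is wider than the page); state is left unchanged. */
AtlasPos atlas_alloc(ShelfAtlas *a, int w, int h) {
    AtlasPos pos = { 0, 0, false };
    if (w <= 0 || h <= 0 || w > a->width) return pos;

    int x = a->pen_x, y = a->pen_y, shelf_h = a->shelf_h;
    if (x + w > a->width) {       /* row is full: open a new shelf below */
        y += shelf_h;
        x = 0;
        shelf_h = 0;
    }
    if (y + h > a->height) return pos;  /* no vertical space left */

    pos.x = x;
    pos.y = y;
    pos.ok = true;
    a->pen_x = x + w;
    a->pen_y = y;
    a->shelf_h = (h > shelf_h) ? h : shelf_h;
    return pos;
}
```

When atlas_alloc reports a full page, the caller decides between the policies discussed above: open another page, evict, or reject the glyph.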

Glyph caching ties CPU and GPU together. On the CPU, you rasterize glyphs with FreeType. Then you upload the bitmap into the atlas texture with glTexSubImage2D or similar. You store the atlas coordinates and metrics in a cache keyed by font face and codepoint. The renderer then uses cached glyph metadata to generate quads. The cache needs eviction policies if the atlas is full. A simple LRU based on last-used frame can work.
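A cache with last-used-frame eviction might look like the sketch below. It assumes fixed-size glyph cells, so that slot index i can map to a fixed atlas rectangle and evicting a slot frees exactly one rectangle for reuse. All names (GlyphCache, cache_lookup, cache_insert, cache_hit_rate) are hypothetical, and the linear scan is chosen for clarity rather than speed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 4  /* tiny on purpose; a real cache would be far larger */

typedef struct {
    uint32_t face_id;    /* which font face (regular, bold, ...) */
    uint32_t codepoint;
    uint64_t last_used;  /* frame number of the most recent use */
    int      in_use;
} GlyphSlot;

typedef struct {
    GlyphSlot slots[CACHE_SLOTS];
    uint64_t  hits, misses;  /* counters for the cache hit rate */
} GlyphCache;

/* Returns the slot index for (face_id, codepoint), or -1 on a miss. */
int cache_lookup(GlyphCache *c, uint32_t face_id, uint32_t codepoint,
                 uint64_t frame) {
    assert(c != NULL);
    for (int i = 0; i < CACHE_SLOTS; i++) {
        GlyphSlot *s = &c->slots[i];
        if (s->in_use && s->face_id == face_id && s->codepoint == codepoint) {
            s->last_used = frame;
            c->hits++;
            return i;
        }
    }
    c->misses++;
    return -1;
}

/* Claims a free slot, or evicts the least recently used one.
 * The caller is expected to upload the glyph bitmap for the returned slot. */
int cache_insert(GlyphCache *c, uint32_t face_id, uint32_t codepoint,
                 uint64_t frame) {
    assert(c != NULL);
    int victim = 0;
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (!c->slots[i].in_use) { victim = i; break; }
        if (c->slots[i].last_used < c->slots[victim].last_used) victim = i;
    }
    c->slots[victim] = (GlyphSlot){ face_id, codepoint, frame, 1 };
    return victim;
}

/* Fraction of lookups that were hits; 0.0 before any lookup. */
double cache_hit_rate(const GlyphCache *c) {
    uint64_t total = c->hits + c->misses;
    return total ? (double)c->hits / (double)total : 0.0;
}
```

Including face_id in the key keeps a bold "a" from being served where a regular "a" was requested. A hash map would replace the linear scan once the slot count grows.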

Atlas management must be deterministic and correct. If you evict a glyph, any cells that still display it must be marked dirty, and the glyph must be rasterized and uploaded again before it is next drawn. Many renderers avoid eviction by using large atlases and assuming the glyph set is stable, which is true for most terminal sessions. For this project, a single atlas with a fixed max glyph count is sufficient; when full, you can reject new glyphs or rebuild the atlas from scratch in a deterministic way.

How This Fits into the Projects

This concept builds on P09 and is essential for P13 and P15.

Definitions & Key Terms

Texture atlas -> large texture containing many glyph bitmaps.
Glyph cache -> map from (font face, codepoint) to atlas position and metrics.
Atlas page -> one of several atlas textures; a new page is added when the current one is full.

Mental Model Diagram (ASCII)

[atlas texture]
+-------------------------+
| A B C D E ...           |
| f g h i j ...           |
+-------------------------+

How It Works (Step-by-Step)

Rasterize glyph with FreeType.
Allocate space in atlas.
Upload bitmap to GPU texture.
Store UV coords in cache.
Render quad with UV coords.

Invariants:

Each glyph has stable atlas coordinates.
Atlas texture size is fixed per page.

Failure modes:

Atlas overflow with no eviction policy.
Incorrect UV mapping causing wrong glyphs.

Minimal Concrete Example

AtlasPos pos = atlas_alloc(w, h);
upload_to_texture(pos.x, pos.y, bitmap);
cache_put(codepoint, pos);
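The stored pixel position still has to be converted to normalized texture coordinates before a quad can sample it. A minimal sketch of that conversion follows; the GlyphUV type and atlas_uv function are hypothetical names, and the field names mirror the u0, v0, u1, v1 convention used elsewhere in this guide.

```c
#include <assert.h>

/* Hypothetical helper type: normalized texture coordinates of one glyph. */
typedef struct { float u0, v0, u1, v1; } GlyphUV;

/* Convert a pixel rectangle inside the atlas to UVs in the range [0, 1]. */
GlyphUV atlas_uv(int x, int y, int w, int h, int atlas_w, int atlas_h) {
    assert(atlas_w > 0 && atlas_h > 0);
    assert(x >= 0 && y >= 0 && x + w <= atlas_w && y + h <= atlas_h);
    GlyphUV uv;
    uv.u0 = (float)x / (float)atlas_w;
    uv.v0 = (float)y / (float)atlas_h;
    uv.u1 = (float)(x + w) / (float)atlas_w;
    uv.v1 = (float)(y + h) / (float)atlas_h;
    return uv;
}
```

Dividing by the wrong atlas dimensions, or by the glyph size instead of the atlas size, is a typical source of the "wrong glyph" failure mode listed above.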

Common Misconceptions

“Each glyph needs its own texture.” -> Atlas avoids per-glyph textures.
“GPU upload cost is negligible.” -> Uploads are expensive; cache helps.

Check-Your-Understanding Questions

Why is an atlas better than per-glyph textures?
What happens when the atlas fills up?
Why must cache keys include font face?

Check-Your-Understanding Answers

It reduces texture binds and enables batching.
You must allocate a new atlas or evict glyphs.
Different fonts produce different glyphs for the same codepoint.

Real-World Applications

GPU terminals like Alacritty and Kitty
Text rendering in game engines

Where You’ll Apply It

This project: Section 3.2 (atlas), Section 4.3 (data structs)
Also used in: P13-full-terminal-emulator, P15-feature-complete-terminal-capstone

References

GPU text rendering articles
FreeType + OpenGL tutorials

Key Insight

A texture atlas is the core optimization that makes GPU text rendering fast.

Summary

The atlas turns thousands of glyphs into a single texture, enabling batching and speed.

Homework/Exercises to Practice the Concept

Pack 100 glyphs into a 512x512 atlas and visualize positions.
Upload glyphs and render a grid of ASCII characters.
Simulate atlas overflow and choose a policy.

Solutions to the Homework/Exercises

Use a row-based packer and print positions.
Render quads with UVs for each glyph.
Implement a second atlas page and select based on availability.

Concept 2: Batching, Damage Tracking, and Frame Pacing

Fundamentals

Batching combines many glyphs into a single draw call by building a vertex buffer of quads. Damage tracking minimizes redraw by updating only regions of the screen that changed. Frame pacing ensures rendering happens at a stable rate, avoiding unnecessary redraws.

Deep Dive into the Concept

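Rendering every cell every frame is wasteful. A terminal screen might be 120x40 = 4,800 cells. At 144 FPS that is about 691,000 cell redraws per second, most of them for cells that did not change. Damage tracking solves this by marking which cells changed since the last frame. When rendering, you only build quads for dirty cells or dirty rows. This assumes the pixels of clean cells survive from the previous frame, for example because you draw into a texture or framebuffer that you keep between frames; with a plain double-buffered swap, the back buffer contents are not guaranteed to be preserved. For a simpler implementation, you can track dirty rows or rectangles rather than individual cells.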

Batching is the key to performance. Each glyph becomes a quad with 4 vertices and 6 indices. You collect all quads for dirty cells into a single vertex buffer and issue one draw call. This drastically reduces CPU-GPU overhead. The vertex buffer can be rebuilt each frame or updated partially, depending on complexity. For this project, rebuild per frame is acceptable.
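The "4 vertices and 6 indices" layout can be sketched as follows. The Vertex fields follow the vertex format listed in Section 3.5; the push_quad function and the packed 32-bit color are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field names follow the vertex format in Section 3.5;
 * a packed 32-bit color is an assumption of this sketch. */
typedef struct {
    float    pos_x, pos_y;
    float    uv_x, uv_y;
    uint32_t color;
} Vertex;

/* Append one glyph quad (hypothetical helper). Returns the new quad count.
 * verts must hold 4 * max_quads entries, indices 6 * max_quads entries. */
size_t push_quad(Vertex *verts, uint32_t *indices,
                 size_t quad_count, size_t max_quads,
                 float x, float y, float w, float h,
                 float u0, float v0, float u1, float v1, uint32_t color) {
    assert(quad_count < max_quads);
    Vertex *v = verts + quad_count * 4;
    v[0] = (Vertex){ x,     y,     u0, v0, color };  /* top-left     */
    v[1] = (Vertex){ x + w, y,     u1, v0, color };  /* top-right    */
    v[2] = (Vertex){ x + w, y + h, u1, v1, color };  /* bottom-right */
    v[3] = (Vertex){ x,     y + h, u0, v1, color };  /* bottom-left  */

    uint32_t base = (uint32_t)(quad_count * 4);
    uint32_t *ix = indices + quad_count * 6;
    ix[0] = base;     ix[1] = base + 1; ix[2] = base + 2;  /* triangle 1 */
    ix[3] = base;     ix[4] = base + 2; ix[5] = base + 3;  /* triangle 2 */
    return quad_count + 1;
}
```

After the loop over dirty cells, both arrays are uploaded once and drawn with a single indexed call, for example glDrawElements with GL_TRIANGLES and an index count of 6 times the quad count.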

Frame pacing ensures smooth animation and avoids redundant frames. If no cells are dirty, you can skip rendering entirely. If updates are continuous (e.g., output from the yes command), you may render at a fixed max FPS (e.g., 60) to reduce CPU usage. You can implement a simple time-based limiter: at a 60 FPS cap, if the last frame was less than about 16.7 ms ago, delay rendering unless a critical event occurs.
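One way to express the limiter is as a pure decision function that receives the current time from the caller. Passing the timestamp in, rather than reading a clock inside, lets a benchmark replay feed fixed timestamps and get the same decisions on every run. The FramePacer type and pacer_should_render function are hypothetical names for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pacing state; times are microseconds from a monotonic clock. */
typedef struct {
    uint64_t min_interval_us;  /* e.g. 16667 for a 60 FPS cap */
    uint64_t last_frame_us;    /* timestamp of the last rendered frame */
    bool     has_rendered;     /* false until the first frame is drawn */
} FramePacer;

/* Decide whether to render now. `urgent` stands in for the "critical event"
 * case (for example a resize) that bypasses the cap. */
bool pacer_should_render(FramePacer *p, uint64_t now_us,
                         bool has_dirty, bool urgent) {
    assert(!p->has_rendered || now_us >= p->last_frame_us);
    if (!has_dirty) return false;              /* nothing changed: skip */
    if (p->has_rendered && !urgent &&
        now_us - p->last_frame_us < p->min_interval_us) {
        return false;                          /* too soon: wait */
    }
    p->last_frame_us = now_us;
    p->has_rendered = true;
    return true;
}
```

In the interactive path the caller would take now_us from a monotonic clock; in benchmark mode it can come from the replay log.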

Damage tracking is also straightforward to keep correct: you can maintain a boolean per cell or a dirty rectangle list. When the screen model updates a cell, mark it dirty. When scrolling occurs, mark the affected region dirty. For scroll operations, you might choose to redraw the full screen, which is simpler and still performant for a small grid.
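A boolean-per-cell grid with a running count could look like this. The 40x120 bounds and the function names are assumptions for the sketch; the count makes the "is anything dirty?" check O(1), and the row-range helper covers the scroll case by marking a whole region.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_ROWS 40   /* assumed grid bounds for this sketch */
#define MAX_COLS 120

typedef struct {
    bool dirty[MAX_ROWS][MAX_COLS];
    int  dirty_count;  /* number of cells currently marked */
} DirtyGrid;

/* Mark one cell; marking an already dirty cell does not change the count. */
void dirty_mark(DirtyGrid *g, int row, int col) {
    assert(row >= 0 && row < MAX_ROWS && col >= 0 && col < MAX_COLS);
    if (!g->dirty[row][col]) {
        g->dirty[row][col] = true;
        g->dirty_count++;
    }
}

/* Mark every cell in rows first_row..last_row (inclusive), e.g. on scroll. */
void dirty_mark_rows(DirtyGrid *g, int first_row, int last_row) {
    for (int r = first_row; r <= last_row; r++)
        for (int c = 0; c < MAX_COLS; c++)
            dirty_mark(g, r, c);
}

/* Reset all marks; call only after the frame was rendered successfully. */
void dirty_clear(DirtyGrid *g) {
    memset(g, 0, sizeof *g);
}
```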

How This Fits into the Projects

This concept is used in P13 and P15.

Definitions & Key Terms

Batching -> combining many draw calls into one.
Damage tracking -> marking changed regions.
Frame pacing -> controlling render frequency.

Mental Model Diagram (ASCII)

Dirty cells -> build quads -> GPU draw
No dirty cells -> skip frame

How It Works (Step-by-Step)

Mark cells dirty when they change.
Gather dirty cells into a vertex buffer.
Issue one draw call for all quads.
Clear dirty marks.

Invariants:

Dirty marks are always cleared after a successful render.
No updates means no draw.

Failure modes:

Forgetting to mark dirty causes stale cells.
Clearing dirty flags too early causes missed updates.

Minimal Concrete Example

if (cell_changed) dirty[row][col] = true;
if (has_dirty) {
    build_vertex_buffer();
    draw();
    clear_dirty();  /* only after the draw was issued */
}

Common Misconceptions

“GPU is always fast.” -> CPU-GPU sync can be the bottleneck.
“Redrawing everything is fine.” -> It wastes resources at high FPS.

Check-Your-Understanding Questions

Why batch glyphs into one draw call?
What happens if you skip damage tracking?
How do you avoid rendering when nothing changed?

Check-Your-Understanding Answers

To reduce draw call overhead and improve performance.
You redraw unnecessary cells, wasting CPU/GPU time.
Track dirty state and skip when no changes.

Real-World Applications

GPU terminal rendering
Game UI text rendering

Where You’ll Apply It

This project: Section 3.2 (damage tracking), Section 7.3 (perf)
Also used in: P15-feature-complete-terminal-capstone

References

GPU rendering best practices
Terminal rendering performance studies

Key Insight

Damage tracking and batching can be the difference between a renderer that struggles at 30 FPS and one that holds 144 FPS.

Summary

Efficient rendering depends on minimizing work: track changes and batch draws.

Homework/Exercises to Practice the Concept

Render only dirty rows and compare FPS.
Add a frame limiter and measure CPU usage.
Build a benchmark that floods output.

Solutions to the Homework/Exercises

Mark rows dirty on updates and build quads only for them.
Sleep if the last frame is too recent.
Use yes or a fixed log replay.

3. Project Specification

3.1 What You Will Build

A GPU-backed renderer that:

Uses a glyph texture atlas.
Builds a vertex buffer of quads for dirty cells.
Draws text using a single draw call per frame.
Reports FPS and cache hit rate.

Intentionally excluded:

Full GUI or windowing framework (use a minimal GL context).

3.2 Functional Requirements

Atlas: allocate a texture atlas and upload glyphs.
Cache: store glyph metadata per font face and codepoint.
Batching: build a vertex buffer for dirty cells.
Damage tracking: mark dirty cells on updates.
Frame pacing: cap FPS and skip frames with no changes.

3.3 Non-Functional Requirements

Performance: 60+ FPS at 4K with 120x40 grid.
Determinism: benchmark replays with fixed logs and seeds.
Stability: no GPU resource leaks.

3.4 Example Usage / Output

$ ./gpu_term --bench samples/flood.log
[render] fps=120 draw_calls=1 cache_hit=95%

3.5 Data Formats / Schemas / Protocols

Vertex format: {pos_x, pos_y, uv_x, uv_y, color}
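As a C struct, the format above might be laid out as in the sketch below. Storing the color as one packed 32-bit value is an assumption (the format line does not say how color is encoded), and pack_rgba is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    float    pos_x, pos_y;  /* position of the vertex on screen */
    float    uv_x, uv_y;    /* normalized atlas coordinates */
    uint32_t color;         /* assumed: packed 8-bit R, G, B, A */
} Vertex;

/* 4 floats + 1 uint32_t with no padding: 20 bytes per vertex. */
_Static_assert(sizeof(Vertex) == 20, "Vertex is expected to be tightly packed");

/* Pack four 8-bit channels so that, on a little-endian machine,
 * the bytes appear in memory in R, G, B, A order. */
uint32_t pack_rgba(uint8_t r, uint8_t g, uint8_t b, uint8_t a) {
    return (uint32_t)r | ((uint32_t)g << 8) |
           ((uint32_t)b << 16) | ((uint32_t)a << 24);
}
```

With OpenGL, each field would typically be described by one glVertexAttribPointer call using sizeof(Vertex) as the stride and offsetof for the field offset, with the color passed as four normalized unsigned bytes.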

3.6 Edge Cases

Atlas full with new glyphs.
Large window resize requiring reallocation.
GPU device lost or context reset.

3.7 Real World Outcome

A renderer that handles high-output workloads at smooth frame rates.

3.7.1 How to Run (Copy/Paste)

cc -O2 -o gpu_term src/*.c $(pkg-config --cflags --libs gl freetype2)
TZ=UTC LC_ALL=C ./gpu_term --bench samples/flood.log --seed 1234

3.7.2 Golden Path Demo (Deterministic)

Replay samples/flood.log with seed 1234.
Verify FPS and cache hit rate match expected values.

3.7.3 Failure Demo (Deterministic)

$ ./gpu_term --atlas-size 0
error: atlas size must be > 0
exit status: 64
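The failure demo implies that the flag value is validated before any GPU setup. A sketch of that check is shown here; parse_atlas_size and the EXIT_USAGE macro are hypothetical names, and the message for non-numeric input is invented for the illustration. The value 64 matches EX_USAGE from the BSD sysexits.h convention.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

#define EXIT_USAGE 64  /* same value as EX_USAGE in BSD sysexits.h */

/* Parse the value of --atlas-size (hypothetical helper).
 * Returns 0 and stores the size on success, EXIT_USAGE otherwise. */
int parse_atlas_size(const char *arg, int *out) {
    assert(arg != NULL && out != NULL);
    char *end = NULL;
    errno = 0;
    long v = strtol(arg, &end, 10);
    if (errno != 0 || end == arg || *end != '\0' || v > INT_MAX) {
        /* Message text is an assumption, not taken from the demo above. */
        fprintf(stderr, "error: invalid atlas size '%s'\n", arg);
        return EXIT_USAGE;
    }
    if (v <= 0) {
        fprintf(stderr, "error: atlas size must be > 0\n");
        return EXIT_USAGE;
    }
    *out = (int)v;
    return 0;
}
```

The caller in main would return the non-zero code directly, which keeps the failure deterministic and independent of whether a GPU context is available.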

3.7.4 If Library: minimal usage snippet

renderer_init(&r, width, height);
renderer_draw(&r, screen);

Expected: draws the screen grid using the GPU with batching.

4. Solution Architecture

4.1 High-Level Design

Screen -> glyph cache -> vertex buffer -> GPU draw

4.2 Key Components

Component	Responsibility	Key Decisions
Atlas Manager	Pack and upload glyphs	Fixed size atlas
Glyph Cache	Map (font face, codepoint) -> atlas coords	LRU eviction optional
Renderer	Build vertices and draw	Single draw call
Damage Tracker	Mark changed cells	Dirty grid or rows

4.3 Data Structures (No Full Code)

struct GlyphInfo { float u0,v0,u1,v1; int w,h,adv; };
struct DirtyGrid { bool dirty[MAX_ROWS][MAX_COLS]; };

4.4 Algorithm Overview

Key Algorithm: Render Frame

If no dirty cells, skip frame.
Build vertex buffer for dirty cells.
Bind atlas texture and draw.
Clear dirty flags.

Complexity Analysis:

Time: O(dirty_cells)
Space: O(rows*cols)

5. Implementation Guide

5.1 Development Environment Setup

pkg-config --libs gl

5.2 Project Structure

gpu-term/
|-- src/
|   |-- renderer.c
|   |-- atlas.c
|   `-- bench.c
|-- samples/
|   `-- flood.log
|-- Makefile
`-- README.md

5.3 The Core Question You’re Answering

“How do you render thousands of glyphs at high FPS without tearing?”

5.4 Concepts You Must Understand First

Texture atlas packing and caching.
Batching and draw calls.
Damage tracking and frame pacing.

5.5 Questions to Guide Your Design

How large should the atlas be?
How will you measure cache hit rate?
How will you cap FPS deterministically?

5.6 Thinking Exercise

Estimate the number of glyphs that fit in a 2048x2048 atlas with 16x20 cells.
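To check an estimate, the capacity of a grid-packed atlas is the number of whole cells per row times the number of whole rows. The sketch below assumes uniform cells and no padding; atlas_capacity is a hypothetical helper.

```c
#include <assert.h>

/* Number of fixed-size cells that fit in an atlas, assuming a uniform grid
 * with no padding. Integer division discards partial cells at the edges. */
int atlas_capacity(int atlas_w, int atlas_h, int cell_w, int cell_h) {
    assert(cell_w > 0 && cell_h > 0);
    int cols = atlas_w / cell_w;  /* 2048 / 16 = 128 */
    int rows = atlas_h / cell_h;  /* 2048 / 20 = 102 (the last 8 rows of pixels are unused) */
    return cols * rows;           /* 128 * 102 = 13056 */
}
```

For the 2048x2048 atlas with 16x20 cells this gives 13,056 slots, far more than a typical terminal session uses, which is why a single page is usually enough.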

5.7 The Interview Questions They’ll Ask

Why use a texture atlas?
How does damage tracking improve performance?
What are GPU bottlenecks in text rendering?

5.8 Hints in Layers

Hint 1: Start with CPU rendering. Validate output before moving to the GPU.

Hint 2: Upload glyphs lazily. Only add glyphs when needed.

Hint 3: Build a single VBO per frame. Batch all glyphs together.

Hint 4: Add a benchmark mode. Measure FPS under load.

5.9 Books That Will Help

Topic	Book	Chapter
Graphics	“Computer Graphics from Scratch”	Ch. 9-12
Performance	“Computer Systems: A Programmer’s Perspective”	Ch. 5

5.10 Implementation Phases

Phase 1: Atlas and cache (1-2 weeks)

Goals: upload glyphs to GPU.

Tasks:

Implement atlas allocation and upload.
Build glyph cache.

Checkpoint: Glyphs render from atlas.

Phase 2: Batching (1 week)

Goals: single draw call.

Tasks:

Build vertex buffer for all glyphs.
Render with one draw call.

Checkpoint: Output renders correctly.

Phase 3: Damage tracking (1 week)

Goals: reduce redraw work.

Tasks:

Mark dirty cells.
Render only dirty cells.

Checkpoint: FPS improves under load.

5.11 Key Implementation Decisions

Decision	Options	Recommendation	Rationale
Atlas size	1024 vs 2048	2048	More glyphs before overflow
Dirty tracking	Cells vs rows	Cells	Higher precision
Frame pacing	Vsync vs timer	Timer + vsync	Deterministic benchmarks

6. Testing Strategy

6.1 Test Categories

Category	Purpose	Examples
Unit Tests	Atlas packing	Fill until full
Integration Tests	Render sample log	flood.log
Performance Tests	FPS benchmark	60+ FPS target (Section 3.3)

6.2 Critical Test Cases

Atlas overflow: gracefully handle full atlas.
Cache reuse: repeated text uses cached glyphs.
Dirty rendering: only changed cells redraw.

6.3 Test Data

Log: 10k lines of repeated text
Expected: cache hit rate > 90%

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall	Symptom	Solution
UV misalignment	Wrong glyphs	Verify atlas coordinates
No damage tracking	Low FPS	Mark dirty cells
Excessive texture uploads	Stutter	Cache glyphs

7.2 Debugging Strategies

Render atlas texture to screen for inspection.
Add debug counters for draw calls and cache hits.

7.3 Performance Traps

Uploading glyphs every frame stalls the pipeline on texture transfers. Upload each glyph once, cache it, and batch the draws.

8. Extensions & Challenges

8.1 Beginner Extensions

Add support for bold/italic atlas pages.
Add a debug overlay for FPS and cache.

8.2 Intermediate Extensions

Add multi-atlas page support.
Add GPU scroll with texture blit.

8.3 Advanced Extensions

Implement persistent mapped buffers.
Add compute shader glyph packing.

9. Real-World Connections

9.1 Industry Applications

GPU-accelerated terminals
High-performance log viewers

9.2 Related Open Source Projects

Alacritty: GPU terminal
Kitty: GPU terminal with advanced features

9.3 Interview Relevance

GPU rendering pipelines
Performance optimization

10. Resources

10.1 Essential Reading

OpenGL or Vulkan text rendering tutorials
GPU performance guides

10.2 Video Resources

Talks on GPU-based text rendering

10.3 Tools & Documentation

glxinfo or GPU profiling tools

11. Self-Assessment Checklist

11.1 Understanding

I can explain atlas packing and caching.
I know how batching reduces draw calls.
I understand damage tracking.

11.2 Implementation

GPU renderer draws glyphs correctly.
Cache hit rate is high.
FPS meets target.

11.3 Growth

I can extend to multi-atlas support.
I can profile GPU performance.

12. Submission / Completion Criteria

Minimum Viable Completion:

GPU renders glyphs with an atlas.
Single draw call for a frame.

Full Completion:

Damage tracking and FPS benchmark.
Deterministic benchmark results.

Excellence (Going Above & Beyond):

Multi-atlas and GPU scroll.
Advanced buffer management.