Project 10: Game Boy Emulator (Expert Capstone)
Build a playable Game Boy emulator on the Pico with accurate CPU timing, graphics, and audio output.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 5: Expert |
| Time Estimate | 1 month+ |
| Main Programming Language | C |
| Alternative Programming Languages | Rust |
| Coolness Level | Level 5: Legendary |
| Business Potential | 4. The “Retro Hardware” Tier |
| Prerequisites | Strong C, embedded timing, graphics basics |
| Key Topics | CPU emulation, memory map, PPU rendering, audio timing |
1. Learning Objectives
By completing this project, you will:
- Implement a CPU emulator with accurate instruction timing.
- Emulate the Game Boy memory map and IO registers.
- Render frames with a PPU pipeline and display output.
- Synchronize audio with CPU cycles.
- Optimize performance on constrained hardware.
2. All Theory Needed (Per-Concept Breakdown)
2.1 CPU Emulation and Cycle Accuracy
Fundamentals
CPU emulation means executing a target CPU’s instruction set in software. Cycle accuracy ensures that the timing of instructions matches the original hardware, which is critical for games that rely on precise timing.
Deep Dive into the concept
The Game Boy uses an 8-bit CPU (Sharp LR35902) clocked at 4.194304 MHz, with a well-defined instruction set and documented cycle counts for every instruction. Emulation involves decoding opcodes, updating registers and flags, and adjusting the program counter. Cycle accuracy is important because the CPU, PPU, timers, and audio are all driven from the same clock. If the CPU runs too fast or too slow relative to the other subsystems, graphics glitches and audio artifacts appear. A common approach is to execute one instruction, take the number of cycles it consumed, and advance the PPU, timers, and audio by that amount. You also need to handle interrupts correctly, including the timing of interrupt enable/disable (EI takes effect only after the following instruction) and the HALT instruction. On the Pico, performance is limited, so you may need to optimize hot paths (e.g., opcode dispatch) and place critical code in SRAM, because code normally executes from external flash through the XIP cache and cache misses are slow.
How this fits on projects
CPU emulation is the core of §3.2 and §5.10 Phase 1.
Definitions & key terms
- Opcode -> instruction byte
- Cycle accuracy -> matching original CPU timing
- Interrupt -> hardware event that pauses execution
Mental model diagram (ASCII)
Fetch -> Decode -> Execute -> Update cycles
How it works (step-by-step)
- Fetch opcode from memory.
- Decode and execute instruction.
- Update registers and flags.
- Add cycle count and update subsystems.
Minimal concrete example
opcode = mem[pc++];
switch(opcode) { case 0x00: /* NOP */ cycles += 4; break; }
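The two-line example above can be grown into a small step function. The following is a minimal sketch, not a full core: `mini_cpu_t`, `mini_cpu_step`, and the flat 64 KiB memory array are hypothetical names chosen for illustration, and only three opcodes are handled.

```c
#include <stdint.h>

/* Hypothetical minimal CPU state with a flat 64 KiB memory (illustration only). */
typedef struct {
    uint8_t  a, b;
    uint16_t pc;
    uint8_t  mem[0x10000];
} mini_cpu_t;

/* Execute one instruction and return the T-cycles it consumed.
 * Returns 0 (and leaves PC on the opcode) if the opcode is not handled. */
int mini_cpu_step(mini_cpu_t *cpu)
{
    uint8_t opcode = cpu->mem[cpu->pc++];   /* fetch */

    switch (opcode) {                       /* decode + execute */
    case 0x00:                              /* NOP */
        return 4;
    case 0x06:                              /* LD B,d8: load immediate byte */
        cpu->b = cpu->mem[cpu->pc++];
        return 8;
    case 0x78:                              /* LD A,B */
        cpu->a = cpu->b;
        return 4;
    default:                                /* not implemented in this sketch */
        cpu->pc--;
        return 0;
    }
}
```

Returning the cycle count from the step function, rather than adding to a global, makes it easy for the caller to feed the same number into the PPU, timer, and audio updates.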
Common misconceptions
- “Accuracy doesn’t matter” -> many games rely on exact timing.
- “Emulation is just running code” -> hardware side effects matter.
Check-your-understanding questions
- Why is cycle accuracy important for the Game Boy?
- How do interrupts affect timing?
- What is a common optimization in opcode dispatch?
Check-your-understanding answers
- Graphics and audio are synchronized to CPU cycles.
- Servicing an interrupt costs extra cycles (20 T-cycles on the Game Boy) and preempts normal execution.
- Use a jump table or computed goto.
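As an illustration of the jump-table answer, here is a sketch of table-driven dispatch using an array of function pointers. The names `regs_t`, `op_fn`, `op_table`, and `dispatch` are hypothetical, flag updates are omitted, and only three opcodes are populated.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical register subset, for illustration only. */
typedef struct { uint8_t a, b; uint16_t pc; } regs_t;

/* Each handler executes one opcode and returns its T-cycle cost. */
typedef int (*op_fn)(regs_t *r);

int op_nop(regs_t *r)    { (void)r; return 4; }
int op_inc_b(regs_t *r)  { r->b++; return 4; }       /* flag updates omitted */
int op_ld_a_b(regs_t *r) { r->a = r->b; return 4; }

/* 256-entry table indexed by opcode; unlisted entries are NULL. */
const op_fn op_table[256] = {
    [0x00] = op_nop,
    [0x04] = op_inc_b,
    [0x78] = op_ld_a_b,
};

/* Look up and run the handler; -1 signals an unimplemented opcode. */
int dispatch(regs_t *r, uint8_t opcode)
{
    op_fn fn = op_table[opcode];
    return fn ? fn(r) : -1;
}
```

Whether a table beats a dense `switch` depends on the compiler and target, so it is worth measuring both on the Pico before committing.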
Real-world applications
- Emulators, virtual machines, compatibility layers.
Where you’ll apply it
- In this project: §3.2, §5.10 Phase 1.
- Also used in: P11-can-bus-vehicle-interface-automotive-hacking.md for protocol timing.
References
- Game Boy CPU documentation
- “Code” by Charles Petzold (logic and timing fundamentals)
Key insights
Emulation is a timing system; correctness includes both logic and cycles.
Summary
Accurate CPU timing is the foundation for stable graphics and audio.
Homework/Exercises to practice the concept
- Implement and test NOP, LD, and ADD instructions.
- Measure cycle counts for a short instruction sequence.
Solutions to the homework/exercises
- NOP advances PC by 1 and adds 4 cycles; LD copies a value into the destination register; ADD also updates the Z, N, H, and C flags.
- Sum cycles and compare to reference tables.
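For the ADD exercise, most of the work is in the flags. The sketch below assumes the usual flag layout in the F register (Z in bit 7, N in bit 6, H in bit 5, C in bit 4); `alu_add` and the `FLAG_*` names are hypothetical helpers, not part of any existing code.

```c
#include <stdint.h>

#define FLAG_Z 0x80   /* zero */
#define FLAG_N 0x40   /* subtract (always cleared by ADD) */
#define FLAG_H 0x20   /* half carry: carry out of bit 3 */
#define FLAG_C 0x10   /* carry out of bit 7 */

/* 8-bit ADD: returns the result and writes the new flag byte to *f. */
uint8_t alu_add(uint8_t a, uint8_t value, uint8_t *f)
{
    unsigned sum = (unsigned)a + value;
    uint8_t flags = 0;

    if ((sum & 0xFF) == 0)                     flags |= FLAG_Z;
    if (((a & 0x0F) + (value & 0x0F)) > 0x0F)  flags |= FLAG_H;
    if (sum > 0xFF)                            flags |= FLAG_C;

    *f = flags;
    return (uint8_t)sum;
}
```

For example, `0x0F + 0x01` gives `0x10` with only H set, and `0xFF + 0x01` gives `0x00` with Z, H, and C set.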
2.2 Memory Map and IO Registers
Fundamentals
The Game Boy memory map divides address space into ROM, RAM, video RAM, and IO registers. Reads and writes have side effects, especially for graphics and input.
Deep Dive into the concept
Emulating the memory map means that a load or store must access the correct region and trigger side effects. ROM is read-only; RAM is writable; VRAM is used by the PPU. IO registers control timers, LCD control, joypad input, and audio. Some registers are write-only or have special behavior (e.g., writing to the DMA register triggers a DMA transfer). Cartridge mappers (MBC) bank-switch ROM and RAM; you must implement at least one mapper to run games. Correct emulation requires that memory reads during certain PPU modes behave differently (e.g., VRAM inaccessible during pixel transfer). While you may simplify at first, timing and side effects will be required for compatibility.
How this fits on projects
Memory map emulation is required for §3.2 and §5.10 Phase 2.
Definitions & key terms
- VRAM -> video RAM
- IO register -> memory-mapped hardware control
- MBC -> memory bank controller
Mental model diagram (ASCII)
0x0000-3FFF ROM0 | 0x4000-7FFF ROMX | 0x8000-9FFF VRAM
How it works (step-by-step)
- Decode address region.
- Route to correct memory or device.
- Apply side effects for IO writes.
- Return value or update state.
Minimal concrete example
if (addr == 0xFF04) { /* DIV register */ div = 0; }
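Extending the one-liner above, a read/write router might look like the following sketch. It is deliberately incomplete: `bus_t`, `bus_read`, and `bus_write` are hypothetical names, there is no MBC, and external RAM, echo RAM, OAM, and most IO registers simply read back as `0xFF`.

```c
#include <stdint.h>

/* Hypothetical bus state; only a few regions are modelled in this sketch. */
typedef struct {
    const uint8_t *rom;      /* 32 KiB cartridge ROM, no bank switching */
    uint8_t vram[0x2000];    /* 0x8000-0x9FFF */
    uint8_t wram[0x2000];    /* 0xC000-0xDFFF */
    uint8_t hram[0x7F];      /* 0xFF80-0xFFFE */
    uint8_t div;             /* 0xFF04 */
} bus_t;

uint8_t bus_read(const bus_t *bus, uint16_t addr)
{
    if (addr < 0x8000)                   return bus->rom[addr];
    if (addr < 0xA000)                   return bus->vram[addr - 0x8000];
    if (addr >= 0xC000 && addr < 0xE000) return bus->wram[addr - 0xC000];
    if (addr == 0xFF04)                  return bus->div;
    if (addr >= 0xFF80 && addr < 0xFFFF) return bus->hram[addr - 0xFF80];
    return 0xFF;                         /* not modelled in this sketch */
}

void bus_write(bus_t *bus, uint16_t addr, uint8_t value)
{
    if (addr < 0x8000)                        return;  /* ROM: an MBC would intercept this */
    else if (addr < 0xA000)                   bus->vram[addr - 0x8000] = value;
    else if (addr >= 0xC000 && addr < 0xE000) bus->wram[addr - 0xC000] = value;
    else if (addr == 0xFF04)                  bus->div = 0;  /* side effect: any write resets DIV */
    else if (addr >= 0xFF80 && addr < 0xFFFF) bus->hram[addr - 0xFF80] = value;
}
```

Routing every CPU access through these two functions is what lets side effects such as the DIV reset live in one place.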
Common misconceptions
- “Memory is just an array” -> IO registers have behavior.
- “Bank switching optional” -> many games require it.
Check-your-understanding questions
- Why do IO registers need custom handlers?
- What is an MBC?
- Why might VRAM be restricted during certain modes?
Check-your-understanding answers
- Reads/writes trigger hardware behavior.
- A memory bank controller: cartridge hardware that switches additional ROM (and often RAM) banks into the CPU's address space.
- The PPU is actively using VRAM for rendering.
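To make the MBC answer concrete, here is a simplified sketch of MBC1-style ROM banking. It assumes only the 5-bit ROM bank register is used; RAM banking, the upper bank bits, and the banking mode register are left out, and wrapping out-of-range banks with a modulo is a simplification. `mbc1_t`, `mbc1_write`, and `mbc1_read` are hypothetical names.

```c
#include <stdint.h>

/* Simplified MBC1 state: ROM banking only. */
typedef struct {
    const uint8_t *rom;
    uint32_t rom_size;   /* bytes, a multiple of 16 KiB */
    uint8_t  rom_bank;   /* bank visible at 0x4000-0x7FFF, starts at 1 */
} mbc1_t;

/* Writes into the ROM area are register writes, not memory writes. */
void mbc1_write(mbc1_t *m, uint16_t addr, uint8_t value)
{
    if (addr >= 0x2000 && addr < 0x4000) {
        uint8_t bank = value & 0x1F;        /* 5-bit bank number */
        m->rom_bank = bank ? bank : 1;      /* selecting bank 0 yields bank 1 */
    }
}

uint8_t mbc1_read(const mbc1_t *m, uint16_t addr)
{
    if (addr < 0x4000)
        return m->rom[addr];                /* fixed bank 0 */

    /* Switchable bank: 16 KiB window starting at rom_bank * 0x4000. */
    uint32_t offset = (uint32_t)m->rom_bank * 0x4000u + (addr - 0x4000u);
    return m->rom[offset % m->rom_size];    /* wrap if the bank is out of range */
}
```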
Real-world applications
- Emulators, hardware simulation, firmware testing.
Where you’ll apply it
- In this project: §3.2, §5.10 Phase 2.
- Also used in: P02-digital-oscilloscope-see-electricity.md for memory-mapped peripherals.
References
- Pan Docs (Game Boy hardware reference)
Key insights
The memory map is a contract; violating it breaks games.
Summary
IO registers and memory regions must emulate real hardware behaviors.
Homework/Exercises to practice the concept
- Implement a memory read/write router.
- Add a handler for the DIV timer register.
Solutions to the homework/exercises
- Use address ranges to select region.
- Writing to DIV resets it to zero.
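One common way to model DIV is as the upper byte of a 16-bit counter that advances once per T-cycle, so the visible register increments every 256 cycles (16384 Hz). The sketch below follows that approach; `divider_t` and the `div_*` functions are hypothetical names.

```c
#include <stdint.h>

/* DIV modelled as the high byte of a free-running 16-bit cycle counter. */
typedef struct { uint16_t counter; } divider_t;

/* Advance by the number of T-cycles the last instruction took. */
void div_tick(divider_t *d, unsigned cycles)
{
    d->counter = (uint16_t)(d->counter + cycles);
}

/* Read of 0xFF04: the upper 8 bits of the counter. */
uint8_t div_read(const divider_t *d)
{
    return (uint8_t)(d->counter >> 8);
}

/* Write of 0xFF04: the written value is ignored and the counter resets. */
void div_write(divider_t *d)
{
    d->counter = 0;
}
```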
2.3 PPU Rendering Pipeline
Fundamentals
The PPU (Picture Processing Unit) renders the Game Boy’s 160x144 display by drawing tiles and sprites. It advances in step with the CPU clock and cycles through four modes: OAM search, pixel transfer, HBlank, and VBlank.
Deep Dive into the concept
The PPU operates in a cycle-based state machine. Each scanline goes through OAM search (sprite fetch), pixel transfer (tile rendering), HBlank (idle), and after all lines, VBlank. Rendering requires reading tile data from VRAM and composing background and sprites according to priority rules. The LCD control registers define which layers are enabled and which tile maps to use. Accurate rendering requires synchronized timing with CPU cycles; many games rely on mid-frame register changes. On the Pico, you can render into a framebuffer and then push to an external display via SPI or PIO. The framebuffer must be converted to the display’s pixel format. Performance is tight; consider rendering in a line-by-line fashion instead of full frames to save memory.
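The pixel-format conversion mentioned above can be done one scanline at a time. The sketch below assumes a packed 2-bit line buffer (four pixels per byte, leftmost pixel in the top two bits) and an RGB565 display; the packing order, the grayscale values in `shade_rgb565`, and the function name `line_to_rgb565` are all illustrative assumptions.

```c
#include <stdint.h>

/* Assumed grayscale palette in RGB565: shade 0 = white ... shade 3 = black. */
const uint16_t shade_rgb565[4] = { 0xFFFF, 0xAD55, 0x52AA, 0x0000 };

/* Convert one 160-pixel scanline from packed 2-bit shades to RGB565.
 * Assumed packing: 4 pixels per byte, leftmost pixel in bits 7-6. */
void line_to_rgb565(const uint8_t packed[40], uint16_t out[160])
{
    for (int x = 0; x < 160; x++) {
        int shift = 6 - 2 * (x & 3);
        uint8_t shade = (uint8_t)((packed[x >> 2] >> shift) & 0x03);
        out[x] = shade_rgb565[shade];
    }
}
```

Converting per line keeps the working buffer at 320 bytes of RGB565 instead of a full 46,080-byte frame, which is the memory saving the paragraph above refers to.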
How this fits on projects
PPU emulation is required in §3.2 and §5.10 Phase 2.
Definitions & key terms
- OAM -> Object Attribute Memory (sprites)
- VBlank -> vertical blanking interval
- Tile map -> grid of tile indices
Mental model diagram (ASCII)
Scanline: OAM -> Pixel -> HBlank -> next line
How it works (step-by-step)
- For each scanline, fetch visible sprites.
- Render background tiles and sprites.
- Enter HBlank and update registers.
- After last line, enter VBlank.
Minimal concrete example
for (y=0; y<144; y++) render_scanline(y);
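The loop above draws pixels but ignores when each scanline happens. A timing-only sketch of the mode state machine follows, assuming 456 dots per line, 154 lines per frame, 80 dots of OAM search, and a fixed 172 dots of pixel transfer (on real hardware the transfer length varies with sprites and scrolling). `ppu_timing_t`, `ppu_mode`, and `ppu_tick` are hypothetical names.

```c
/* Mode numbers as reported in the low two bits of the STAT register. */
enum { MODE_HBLANK = 0, MODE_VBLANK = 1, MODE_OAM = 2, MODE_TRANSFER = 3 };

typedef struct {
    unsigned dot;   /* position within the current line, 0..455 */
    unsigned ly;    /* current line, 0..153 */
} ppu_timing_t;

int ppu_mode(const ppu_timing_t *p)
{
    if (p->ly >= 144)       return MODE_VBLANK;    /* lines 144-153 */
    if (p->dot < 80)        return MODE_OAM;       /* OAM search */
    if (p->dot < 80 + 172)  return MODE_TRANSFER;  /* fixed length in this sketch */
    return MODE_HBLANK;
}

/* Advance by the CPU's T-cycles; returns 1 if VBlank started during this call. */
int ppu_tick(ppu_timing_t *p, unsigned cycles)
{
    int vblank_started = 0;

    p->dot += cycles;
    while (p->dot >= 456) {
        p->dot -= 456;
        p->ly = (p->ly + 1) % 154;
        if (p->ly == 144)
            vblank_started = 1;   /* caller raises the VBlank interrupt, presents the frame */
    }
    return vblank_started;
}
```

A scanline renderer would call `render_scanline(ly)` when a visible line leaves pixel transfer, so that register writes made earlier in the frame are already reflected.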
Common misconceptions
- “You can render once per frame” -> timing changes mid-frame matter.
- “Sprites always on top” -> priority rules are complex.
Check-your-understanding questions
- What is VBlank used for?
- Why are scanline timings important?
- How do background and sprite priorities work?
Check-your-understanding answers
- It’s a safe time to update graphics state.
- Games change registers during rendering.
- Sprites can be hidden by background depending on priority bits.
Real-world applications
- Graphics emulation, display pipelines, GPU timing.
Where you’ll apply it
- In this project: §3.2, §5.10 Phase 2.
- Also used in: P06-neopixel-display-engine-1000-led-controller.md for rendering pipelines.
References
- Game Boy PPU documentation
Key insights
PPU timing is as important as correct pixels.
Summary
Render scanlines with correct timing and layer priority for accurate graphics.
Homework/Exercises to practice the concept
- Render a single tile map in software.
- Implement sprite priority over background.
Solutions to the homework/exercises
- Map tile indices to pixel data and draw 8x8 tiles.
- Check priority bits before writing pixel.
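Both solutions can be sketched in a few lines. Game Boy tiles store each 8-pixel row as two bytes, the first holding the low bit of each pixel and the second the high bit, with bit 7 as the leftmost pixel. The function names `decode_tile_row` and `sprite_wins` are hypothetical, and the priority check covers only the background-priority attribute bit, not sprite-versus-sprite ordering.

```c
#include <stdint.h>

/* Decode one tile row (2 bytes) into eight 2-bit color indices. */
void decode_tile_row(uint8_t lo, uint8_t hi, uint8_t out[8])
{
    for (int x = 0; x < 8; x++) {
        int bit = 7 - x;                          /* bit 7 is the leftmost pixel */
        out[x] = (uint8_t)((((hi >> bit) & 1) << 1) | ((lo >> bit) & 1));
    }
}

/* Decide whether a sprite pixel is drawn over the background pixel.
 * bg_index / obj_index are 2-bit color indices before palette lookup;
 * obj_behind_bg is the sprite's background-priority attribute bit. */
int sprite_wins(uint8_t bg_index, uint8_t obj_index, int obj_behind_bg)
{
    if (obj_index == 0)
        return 0;                 /* sprite color 0 is transparent */
    if (obj_behind_bg && bg_index != 0)
        return 0;                 /* sprite hidden behind non-zero background */
    return 1;
}
```

For example, the row bytes `0x3C, 0x7E` decode to the indices `0 2 3 3 3 3 2 0`.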
2.4 Audio Timing and Mixing
Fundamentals
The Game Boy has four audio channels with distinct waveforms. Audio output must be synchronized to CPU cycles to avoid glitches.
Deep Dive into the concept
The audio system includes two square wave channels, one wavetable channel, and one noise channel. Each channel has registers controlling frequency, envelope, and length. Emulating audio requires updating these channels based on CPU cycles and mixing them into a PCM stream. The output sample rate (e.g., 44.1 kHz) is independent of the CPU clock, so you must resample by accumulating cycles until enough time has passed to produce a sample. Buffering is critical; if the audio buffer underruns, you hear pops. On the Pico, you can output audio via PWM or an external DAC. Keep audio generation on a high-priority task and avoid large blocking operations in the main loop.
How this fits on projects
Audio timing is required for §3.2 and §5.10 Phase 3.
Definitions & key terms
- Envelope -> volume change over time
- Wavetable -> pre-defined waveform samples
- Underrun -> buffer runs out of data
Mental model diagram (ASCII)
CPU cycles -> Audio state -> Mix -> PCM buffer -> PWM/DAC
How it works (step-by-step)
- Update audio channels based on cycles.
- Mix channels into a sample.
- Push sample into output buffer.
- Output via PWM/DAC.
Minimal concrete example
sample = (ch1 + ch2 + ch3 + ch4) / 4;
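The mixing line above says what a sample is, but not when to produce one. A sketch of the cycle-accumulation resampler described in the deep dive follows, assuming a 4,194,304 Hz emulated clock and a 44,100 Hz output rate. `resampler_t`, `resampler_advance`, and `mix4` are hypothetical names, and `cycles` is expected to be small (one instruction's worth) so the 32-bit arithmetic cannot overflow.

```c
#include <stdint.h>

#define GB_CLOCK_HZ    4194304u
#define SAMPLE_RATE_HZ 44100u

typedef struct { uint32_t acc; } resampler_t;

/* Advance by `cycles` emulated T-cycles and return how many output
 * samples are now due. The remainder stays in acc, so no drift builds up. */
uint32_t resampler_advance(resampler_t *r, uint32_t cycles)
{
    uint32_t due = 0;

    r->acc += cycles * SAMPLE_RATE_HZ;
    while (r->acc >= GB_CLOCK_HZ) {
        r->acc -= GB_CLOCK_HZ;
        due++;
    }
    return due;
}

/* Average four channel outputs into one sample (no clipping possible). */
int16_t mix4(int16_t ch1, int16_t ch2, int16_t ch3, int16_t ch4)
{
    return (int16_t)(((int32_t)ch1 + ch2 + ch3 + ch4) / 4);
}
```

Over one emulated second (4,194,304 cycles) this yields exactly 44,100 samples, because the accumulator works in integer units of cycles times sample rate.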
Common misconceptions
- “Audio can be updated once per frame” -> too coarse.
- “PWM output is always clean” -> needs filtering.
Check-your-understanding questions
- Why is audio tied to CPU cycles?
- What causes buffer underruns?
- Why filter PWM output?
Check-your-understanding answers
- The audio hardware is clocked by the same source as the CPU.
- The producer is too slow or blocked.
- PWM produces a high-frequency carrier that must be smoothed.
Real-world applications
- Audio emulation, synth engines, real-time DSP.
Where you’ll apply it
- In this project: §3.2, §5.10 Phase 3.
- Also used in: P12-digital-theremin-touchless-music.md.
References
- Game Boy sound documentation
Key insights
Audio is the first subsystem to reveal timing errors.
Summary
Synchronize audio updates to CPU cycles and keep buffers filled.
Homework/Exercises to practice the concept
- Implement a simple square wave generator.
- Measure output frequency from PWM.
Solutions to the homework/exercises
- Toggle output at frequency intervals to generate square wave.
- Use scope to verify frequency and duty.
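A generic 50% duty square wave can be produced with a 32-bit phase accumulator, as sketched below. This is not the Game Boy's own square channel (which steps through 8-entry duty patterns); `square_t`, `square_init`, and `square_next` are hypothetical names for practicing the exercise.

```c
#include <stdint.h>

/* Phase-accumulator oscillator: the top bit of `phase` selects the half-cycle. */
typedef struct {
    uint32_t phase;
    uint32_t step;    /* phase increment per output sample */
} square_t;

void square_init(square_t *s, uint32_t freq_hz, uint32_t sample_rate_hz)
{
    s->phase = 0;
    /* step = freq / sample_rate, scaled so that 2^32 is one full period */
    s->step = (uint32_t)(((uint64_t)freq_hz << 32) / sample_rate_hz);
}

/* Return the next sample: +amplitude for the first half-period, -amplitude after. */
int16_t square_next(square_t *s, int16_t amplitude)
{
    int16_t out = (s->phase & 0x80000000u) ? (int16_t)-amplitude : amplitude;
    s->phase += s->step;   /* wraps naturally at 2^32 */
    return out;
}
```

With `freq_hz = 11025` and `sample_rate_hz = 44100`, the period is four samples, so the output repeats high, high, low, low.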
3. Project Specification
3.1 What You Will Build
A Game Boy emulator that loads ROMs, runs games at ~60 FPS, renders graphics on an external display, and outputs audio.
3.2 Functional Requirements
- CPU emulation: implement core instruction set.
- Memory map: support ROM, RAM, VRAM, and IO.
- PPU rendering: output frames to display.
- Audio output: implement basic channels.
- Input: map buttons to joypad registers.
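For the input requirement, the joypad register at `0xFF00` is active-low: the game clears bit 4 to select the direction keys or bit 5 to select the action buttons, then reads pressed keys as 0 bits in the low nibble. The sketch below assumes that layout; `joypad_t`, `joypad_write`, and `joypad_read` are hypothetical names, and the joypad interrupt is not modelled.

```c
#include <stdint.h>

typedef struct {
    uint8_t select;    /* bits 4-5 as last written to 0xFF00 */
    uint8_t dpad;      /* bit0 Right, bit1 Left, bit2 Up, bit3 Down; 1 = pressed */
    uint8_t buttons;   /* bit0 A, bit1 B, bit2 Select, bit3 Start; 1 = pressed */
} joypad_t;

/* Only the two select bits are writable. */
void joypad_write(joypad_t *j, uint8_t value)
{
    j->select = value & 0x30;
}

/* Build the register value: a 0 in the low nibble means "pressed". */
uint8_t joypad_read(const joypad_t *j)
{
    uint8_t low = 0x0F;

    if (!(j->select & 0x10)) low &= (uint8_t)~j->dpad;      /* directions selected */
    if (!(j->select & 0x20)) low &= (uint8_t)~j->buttons;   /* buttons selected */

    return (uint8_t)(0xC0 | j->select | (low & 0x0F));      /* unused bits 6-7 read as 1 */
}
```

The GPIO or button-matrix code on the Pico only needs to update `dpad` and `buttons`; the register logic stays the same.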
3.3 Non-Functional Requirements
- Performance: stable 59-60 FPS.
- Reliability: no crashes after 5 minutes.
- Usability: clear logs and error handling.
3.4 Example Usage / Output
[EMU] ROM loaded: TETRIS.GB
[EMU] FPS=59.7
[PPU] frame rendered in 16.4 ms
[AUDIO] buffer underruns=0
3.5 Data Formats / Schemas / Protocols
- ROM file: .gb file loaded from SD or flash
- Frame buffer: 160x144 2-bit color
3.6 Edge Cases
- Unsupported MBC -> show error.
- ROM checksum mismatch -> warn.
- Audio buffer underrun -> log and mute briefly.
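The first two edge cases can be detected from the cartridge header before emulation starts. In the sketch below, the cartridge type byte is at `0x0147` and the header checksum at `0x014D` covers bytes `0x0134` to `0x014C`. Supporting only ROM-only and MBC1 cartridges (types `0x00` to `0x03`) is an assumption for illustration, as are the names `rom_status_t`, `header_checksum`, and `rom_check`.

```c
#include <stdint.h>
#include <stddef.h>

typedef enum {
    ROM_OK = 0,
    ROM_TOO_SMALL,
    ROM_UNSUPPORTED_MBC,   /* fatal: show an error */
    ROM_BAD_CHECKSUM       /* non-fatal: warn and continue */
} rom_status_t;

/* Header checksum: for each byte in 0x0134..0x014C, x = x - byte - 1. */
uint8_t header_checksum(const uint8_t *rom)
{
    uint8_t x = 0;
    for (uint16_t i = 0x0134; i <= 0x014C; i++)
        x = (uint8_t)(x - rom[i] - 1);
    return x;
}

rom_status_t rom_check(const uint8_t *rom, size_t size)
{
    if (size < 0x0150)
        return ROM_TOO_SMALL;            /* header is incomplete */

    uint8_t type = rom[0x0147];          /* cartridge type byte */
    if (type > 0x03)                     /* assumed support: ROM-only and MBC1 */
        return ROM_UNSUPPORTED_MBC;

    if (header_checksum(rom) != rom[0x014D])
        return ROM_BAD_CHECKSUM;

    return ROM_OK;
}
```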
3.7 Real World Outcome
Playable Game Boy games running on a Pico with sound and input.
3.7.1 How to Run (Copy/Paste)
mkdir -p build && cd build && cmake .. && make -j4
picotool load -f gbemu.uf2
3.7.2 Golden Path Demo (Deterministic)
- Load a known ROM with test harness (e.g., instruction test ROM).
- Verify that CPU test passes and frame timing is stable.
3.7.3 Failure Demo (Bad Input)
- Scenario: load unsupported ROM mapper.
- Expected result: log `[ERROR] unsupported MBC` and halt.
3.7.4 If GUI: ASCII wireframe
+----------------------+
| GAME BOY SCREEN |
| [160x144 framebuffer] |
+----------------------+
4. Solution Architecture
4.1 High-Level Design
ROM -> CPU -> Memory Map -> PPU -> Display
|-> Audio -> PWM/DAC
|-> Input -> Joypad
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| CPU Core | Opcode execution | Table-driven dispatch |
| Memory Map | IO and ROM | Modular mapper support |
| PPU | Frame rendering | Scanline-based renderer |
| Audio | Channel mixing | Fixed sample rate buffer |
| Display | Output frames | SPI TFT or VGA |
4.3 Data Structures (No Full Code)
typedef struct {
    uint8_t a, f, b, c, d, e, h, l;
    uint16_t pc, sp;
} cpu_t;
4.4 Algorithm Overview
Key Algorithm: Cycle-Stepped Emulation
- Execute instruction, add cycles.
- Update timers/PPU/audio based on cycles.
- Render frame when VBlank occurs.
Complexity Analysis:
- Time: O(1) per instruction
- Space: O(memory map size)
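The cycle-stepped algorithm can be sketched as a per-frame loop. The version below takes the CPU and subsystem hooks as function pointers so it stays self-contained; `emu_t`, `emu_run_frame`, and the `stub_*` callbacks are hypothetical, and the stubs exist only to make the sketch runnable without a real core.

```c
#define CYCLES_PER_FRAME 70224   /* 456 dots x 154 lines */

typedef struct {
    int  (*cpu_step)(void *ctx);           /* run one instruction, return T-cycles */
    void (*tick)(void *ctx, int cycles);   /* advance timers, PPU, and audio */
    void *ctx;
} emu_t;

/* Run one frame's worth of cycles. `carry` is the overshoot returned by the
 * previous call, so frame boundaries do not drift over time. */
int emu_run_frame(const emu_t *emu, int carry)
{
    int cycles = carry;

    while (cycles < CYCLES_PER_FRAME) {
        int n = emu->cpu_step(emu->ctx);
        emu->tick(emu->ctx, n);
        cycles += n;
    }
    return cycles - CYCLES_PER_FRAME;
}

/* Stub callbacks for illustration: a fake CPU with a fixed cost per step. */
typedef struct { int cycles_per_step; long ticked; } stub_t;

int  stub_step(void *ctx)             { return ((stub_t *)ctx)->cycles_per_step; }
void stub_tick(void *ctx, int cycles) { ((stub_t *)ctx)->ticked += cycles; }
```

Instructions rarely end exactly on a frame boundary, which is why the overshoot is carried into the next frame instead of being discarded.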
5. Implementation Guide
5.1 Development Environment Setup
# Pico SDK + display driver
5.2 Project Structure
gb-emulator/
├── firmware/
│ ├── cpu.c
│ ├── mem.c
│ ├── ppu.c
│ ├── audio.c
│ └── display.c
└── README.md
5.3 The Core Question You’re Answering
“How do you recreate a complex hardware system in software with precise timing?”
5.4 Concepts You Must Understand First
- CPU emulation and cycle timing
- Memory-mapped IO
- PPU rendering pipeline
- Audio mixing and buffer timing
5.5 Questions to Guide Your Design
- Which MBC will you support first?
- How will you validate CPU correctness?
- How will you optimize for FPS?
5.6 Thinking Exercise
Calculate the cycle budget per frame: 70,224 cycles (456 per scanline x 154 scanlines) at 4.194304 MHz, which is about 16.74 ms per frame (roughly 59.7 FPS).
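The arithmetic behind this exercise can be checked in code. The sketch assumes the Game Boy clock of 4,194,304 Hz and a frame of 154 scanlines at 456 cycles each; the function names are hypothetical. At that clock, 70,224 cycles take about 16.74 ms.

```c
#include <stdint.h>

#define GB_CLOCK_HZ      4194304u
#define CYCLES_PER_LINE  456u
#define LINES_PER_FRAME  154u     /* 144 visible + 10 VBlank */

/* Emulated cycles in one frame. */
uint32_t cycles_per_frame(void)
{
    return CYCLES_PER_LINE * LINES_PER_FRAME;
}

/* Wall-clock duration of one frame in milliseconds. */
double frame_time_ms(void)
{
    return 1000.0 * cycles_per_frame() / GB_CLOCK_HZ;
}

/* Host clock cycles available per emulated cycle at a given system clock. */
double host_cycles_per_gb_cycle(uint32_t sys_clock_hz)
{
    return (double)sys_clock_hz / GB_CLOCK_HZ;
}
```

At a 125 MHz system clock this leaves roughly 30 host cycles per emulated cycle, shared between CPU emulation, rendering, audio, and display output, which is why the hot paths need attention.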
5.7 The Interview Questions They’ll Ask
- How do you ensure timing accuracy in an emulator?
- What are the hardest parts of PPU emulation?
- How do you synchronize audio and video?
5.8 Hints in Layers
- Hint 1: Build CPU core and run test ROMs on a PC.
- Hint 2: Add memory map and timers.
- Hint 3: Add PPU rendering to framebuffer.
- Hint 4: Add audio and input.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Architecture | “Code” | Ch. 15-17 |
| CPU design | “Computer Organization and Design” | Ch. 4 |
| Graphics | “Computer Graphics from Scratch” | Ch. 3-4 |
5.10 Implementation Phases
Phase 1: Foundation (2-3 weeks)
- Implement CPU core and pass instruction tests.
- Checkpoint: CPU test ROM passes.
Phase 2: Core Functionality (2-3 weeks)
- Implement memory map, timers, and PPU.
- Checkpoint: Boot logo renders.
Phase 3: Audio & Input (1-2 weeks)
- Add audio channels and input mapping.
- Checkpoint: Game runs with sound.
Phase 4: Optimization (ongoing)
- Profile and optimize hot paths.
- Checkpoint: Stable 60 FPS.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| CPU dispatch | switch vs table | Table | Faster and cleaner |
| Rendering | Full frame vs scanline | Scanline | Less memory |
| Audio output | PWM vs DAC | PWM (MVP) | Simpler hardware |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | Opcode correctness | CPU test ROMs |
| Integration Tests | Boot process | Boot logo validation |
| Edge Case Tests | MBC handling | Unsupported mapper error |
6.2 Critical Test Cases
- Instruction tests: all opcodes pass reference ROM.
- Boot logo: correct sequence displayed.
- Audio: no buffer underruns for 60 s.
6.3 Test Data
Test ROM: cpu_instrs.gb
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Wrong cycle counts | Games run fast/slow | Use reference tables |
| Memory map errors | Crashes or glitches | Validate address routing |
| Audio underruns | Pops/crackle | Increase buffer size |
7.2 Debugging Strategies
- Compare against known emulator logs.
- Use frame timing logs to detect drift.
7.3 Performance Traps
- Excessive logging kills FPS; compile out in release builds.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add save states.
- Add FPS overlay.
8.2 Intermediate Extensions
- Support additional MBCs.
- Add palette controls.
8.3 Advanced Extensions
- Implement link cable emulation.
- Add speed-up and rewind.
9. Real-World Connections
9.1 Industry Applications
- Emulation and hardware preservation.
- Virtual machines and compatibility layers.
9.2 Related Open Source Projects
- SameBoy, Gambatte (for reference behavior)
9.3 Interview Relevance
- Emulation, timing, and optimization are advanced systems topics.
10. Resources
10.1 Essential Reading
- Pan Docs (Game Boy hardware reference)
- RP2040 datasheet for performance tuning
10.2 Video Resources
- Game Boy emulator deep-dive talks
10.3 Tools & Documentation
- Test ROM suites for CPU/PPU validation
10.4 Related Projects in This Series
- P04-logic-analyzer-debug-digital-signals.md for timing analysis
- P12-digital-theremin-touchless-music.md for audio output
11. Self-Assessment Checklist
11.1 Understanding
- I can explain cycle-accurate CPU emulation.
- I can describe the Game Boy memory map.
- I can render scanlines correctly.
11.2 Implementation
- ROM loads and runs stable at 60 FPS.
- Audio plays without underruns.
- Input is responsive.
11.3 Growth
- I can add new mappers and features.
- I can profile and optimize critical code.
12. Submission / Completion Criteria
Minimum Viable Completion:
- CPU executes test ROM and renders a frame.
Full Completion:
- Playable game with audio and input.
Excellence (Going Above & Beyond):
- Save states and additional mapper support.
13. Additional Content Rules
13.1 Determinism
Use fixed test ROMs and fixed CPU clock. Log FPS and cycle counts for reproducibility.
13.2 Outcome Completeness
- Success demo: §3.7.2
- Failure demo: §3.7.3
- CLI exit codes: host ROM loader returns `0` success, `2` file not found, `4` unsupported mapper.
13.3 Cross-Linking
Concept references in §2.x and related projects in §10.4.
13.4 No Placeholder Text
All sections are fully specified for this project.