Project 4: CHIP-8 Emulator
Build a complete emulator for the CHIP-8, a 1970s virtual machine with 35 instructions, 16 registers, a 64x32 monochrome display, and hexadecimal keyboard input. Run actual games like Pong, Tetris, and Space Invaders on a CPU you built yourself.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | ★★★☆☆ Intermediate |
| Time Estimate | 1-2 weeks |
| Language | C (Recommended), Rust, C++, Go |
| Prerequisites | Binary/hex fluency, basic C programming, understanding of fetch-decode-execute cycle |
| Key Topics | Opcode decoding, register files, program counter, stack operations, memory-mapped I/O, timer interrupts, sprite graphics |
1. Learning Objectives
After completing this project, you will be able to:
- Implement the complete fetch-decode-execute cycle for a real instruction set architecture
- Decode 2-byte opcodes and extract embedded operands using bitmasking and shifting
- Design and implement a register file with 16 general-purpose registers
- Manage a program counter and subroutine stack for control flow
- Implement hardware timers that decrement at a fixed frequency
- Draw sprites using XOR-based graphics (the same technique used in early video games)
- Read and interpret official technical specifications for a CPU
- Debug CPU emulation issues systematically using state inspection
- Understand why emulators are structured the way they are
2. Theoretical Foundation
2.1 What is CHIP-8?
CHIP-8 is not a physical CPU - it’s a virtual machine specification created in the mid-1970s by Joseph Weisbecker for the COSMAC VIP computer. It was designed to make game programming easier by providing a higher-level interface than raw machine code.
The CHIP-8 Ecosystem:
1970s Physical Hardware Modern Emulation
┌─────────────────────────┐ ┌─────────────────────────┐
│ COSMAC VIP / TELMAC │ │ Your Computer │
│ ┌───────────────────┐ │ │ ┌───────────────────┐ │
│ │ CDP1802 CPU │ │ │ │ x86/ARM CPU │ │
│ │ (Hardware) │ │ │ │ (Hardware) │ │
│ └─────────┬─────────┘ │ │ └─────────┬─────────┘ │
│ │ │ │ │ │
│ ┌─────────▼─────────┐ │ │ ┌─────────▼─────────┐ │
│ │ CHIP-8 Interpreter│ │ │ │ YOUR EMULATOR │ │
│ │ (Software) │ │ │ │ (Software) │ │
│ └─────────┬─────────┘ │ │ └─────────┬─────────┘ │
│ │ │ │ │ │
│ ┌─────────▼─────────┐ │ │ ┌─────────▼─────────┐ │
│ │ CHIP-8 ROM │ │ │ │ CHIP-8 ROM │ │
│ │ (Pong, Tetris) │ │ │ │ (Same exact ROM) │ │
│ └───────────────────┘ │ │ └───────────────────┘ │
└─────────────────────────┘ └─────────────────────────┘
You are building the middle layer - the interpreter that makes
CHIP-8 programs run on modern hardware.
2.2 Why CHIP-8 is the Perfect First Emulation Project
CHIP-8 is the “Hello World” of CPU emulation for several reasons:
- Simple but complete: Only 35 instructions, but covers all fundamental concepts
- Well-documented: Cowgod’s Technical Reference is the gold standard spec
- Testable: Hundreds of test ROMs and games available
- Visual feedback: You immediately see if your emulator works (games play!)
- Real hardware concepts: Registers, stack, timers, memory-mapped I/O - all present
CHIP-8 Complexity Comparison:
CPU                | Instructions | Registers | Addressing Modes | Time to Implement
-------------------|--------------|-----------|------------------|------------------
CHIP-8             | 35           | 16        | 2                | 1-2 weeks
Game Boy (LR35902) | 500+         | 8+        | 8+               | 2-3 months
6502 (NES/C64)     | 56           | 5         | 13               | 1 month
x86-64             | 1000+        | 16+       | 10+              | Years (never done)
CHIP-8 is the on-ramp to CPU emulation.
2.3 The CHIP-8 Architecture
CHIP-8 System Architecture:
┌─────────────────────────────────────────────────────────────────────────┐
│ CHIP-8 VIRTUAL MACHINE │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ MEMORY (4KB) │ │
│ │ ┌──────────────┬──────────────────────────────────────────┐ │ │
│ │ │ 0x000-0x1FF │ Reserved (Interpreter / Font sprites) │ │ │
│ │ ├──────────────┼──────────────────────────────────────────┤ │ │
│ │ │ 0x200-0xFFF │ Program ROM & Working RAM │ │ │
│ │ └──────────────┴──────────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────┐ ┌─────────────────────────────────────┐ │
│ │ REGISTERS │ │ STACK (16 levels) │ │
│ │ ┌───┬───┬───┬───┐ │ │ ┌───┬───┬───┬───┬───┬───┬───┬───┐ │ │
│ │ │V0 │V1 │V2 │V3 │ │ │ │ 0 │ 1 │ 2 │ 3 │...│13 │14 │15 │ │ │
│ │ ├───┼───┼───┼───┤ │ │ └───┴───┴───┴───┴───┴───┴───┴───┘ │ │
│ │ │V4 │V5 │V6 │V7 │ │ │ ▲ │ │
│ │ ├───┼───┼───┼───┤ │ │ │ SP (Stack Pointer) │ │
│ │ │V8 │V9 │VA │VB │ │ └───────────┴─────────────────────────┘ │
│ │ ├───┼───┼───┼───┤ │ │
│ │ │VC │VD │VE │VF │ │ ┌─────────────────────────────────────┐ │
│ │ └───┴───┴───┴───┘ │ │ SPECIAL REGISTERS │ │
│ │ (8-bit each) │ │ ┌──────┐ ┌──────┐ ┌──────┐ │ │
│ │ VF = Carry flag │ │ │ PC │ │ I │ │ SP │ │ │
│ └─────────────────────┘ │ │16-bit│ │16-bit│ │ 8-bit│ │ │
│ │ └──────┘ └──────┘ └──────┘ │ │
│ ┌─────────────────────┐ │ Program Index Stack │ │
│ │ TIMERS │ │ Counter Reg Pointer │ │
│ │ ┌──────┐ ┌──────┐ │ └─────────────────────────────────────┘ │
│ │ │ DT │ │ ST │ │ │
│ │ │Delay │ │Sound │ │ ┌─────────────────────────────────────┐ │
│ │ │Timer │ │Timer │ │ │ DISPLAY (64 x 32 monochrome) │ │
│ │ └──────┘ └──────┘ │ │ ┌─────────────────────────────────┐ │ │
│ │ (8-bit each, │ │ │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │ │ │
│ │ decrement @60Hz) │ │ │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │ │ │
│ └─────────────────────┘ │ │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │ │ │
│ │ └─────────────────────────────────┘ │ │
│ ┌─────────────────────┐ └─────────────────────────────────────┘ │
│ │ KEYBOARD │ │
│ │ ┌─┬─┬─┬─┐ │ ┌─────────────────────────────────────┐ │
│ │ │1│2│3│C│ │ │ INPUT HANDLING │ │
│ │ ├─┼─┼─┼─┤ │ │ • 16-key hexadecimal keypad │ │
│ │ │4│5│6│D│ │ │ • Mapped to modern keyboard │ │
│ │ ├─┼─┼─┼─┤ │ │ • Blocking wait for key press │ │
│ │ │7│8│9│E│ │ └─────────────────────────────────────┘ │
│ │ ├─┼─┼─┼─┤ │ │
│ │ │A│0│B│F│ │ │
│ │ └─┴─┴─┴─┘ │ │
│ └─────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘
2.4 Memory Map
CHIP-8 Memory Layout (4KB = 4096 bytes):
Address Size Purpose
──────────────────────────────────────────────────────────
0x000 80 Built-in font sprites (0-F)
│ bytes Each character is 5 bytes (8x5 pixels)
│
0x050 432 Reserved / unused
│ bytes (Implementation varies; some interpreters store the font here instead)
│
0x1FF ──── End of interpreter area
─────────────────────────────────────────
0x200 ◄── Programs loaded here (START ADDRESS)
│ This is where your ROM loads
│ This is where PC starts
│
│ Program code and data occupy this area
│ Variables stored here too
│
0xFFF ◄── End of memory (4095)
──────────────────────────────────────────────────────────
Memory Layout Diagram:
0x000 ┌────────────────────────────────┐
│ Font Sprites (0-F) │ 80 bytes
│ '0' = F0 90 90 90 F0 │
│ '1' = 20 60 20 20 70 │
│ ... │
0x050 ├────────────────────────────────┤
│ (Reserved/Unused) │ ~432 bytes
│ │
0x200 ├────────────────────────────────┤ ◄── PROGRAM START
│ │
│ Your CHIP-8 ROM loads │
│ here and executes │
│ │
│ ┌──────────────────────┐ │
│ │ Instructions │ │
│ │ 2 bytes each │ │
│ │ 0x00E0 = CLS │ │
│ │ 0x1NNN = JMP NNN │ │
│ │ ... │ │
│ └──────────────────────┘ │
│ │
0xFFF └────────────────────────────────┘
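The layout above implies a hard limit on ROM size: 0x1000 - 0x200 = 3584 bytes. The following is a minimal sketch of a bounds-checked copy, assuming the ROM image has already been read into a buffer. The helper name `load_rom_bytes` and the two constants are hypothetical names chosen for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHIP8_MEM_SIZE   4096u   /* 0x000-0xFFF */
#define CHIP8_PROG_START 0x200u  /* first byte available to programs */

/* Hypothetical helper: copy a ROM image into CHIP-8 memory at 0x200.
 * Returns false (and leaves memory untouched) if the image is empty
 * or would run past 0xFFF. */
static bool load_rom_bytes(uint8_t memory[CHIP8_MEM_SIZE],
                           const uint8_t *rom, size_t rom_size)
{
    const size_t capacity = CHIP8_MEM_SIZE - CHIP8_PROG_START; /* 3584 */
    if (rom == NULL || rom_size == 0 || rom_size > capacity) {
        return false;
    }
    memcpy(&memory[CHIP8_PROG_START], rom, rom_size);
    return true;
}
```

Rejecting oversized images up front is one possible design choice; it keeps a bad file from silently truncating or overwriting state.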
2.5 Instruction Format
All CHIP-8 instructions are exactly 2 bytes (16 bits), stored big-endian:
CHIP-8 Instruction Encoding:
High Byte (first in memory) Low Byte (second in memory)
┌───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┐
│ 15│ 14│ 13│ 12│ 11│ 10│ 9 │ 8 │ 7 │ 6 │ 5 │ 4 │ 3 │ 2 │ 1 │ 0 │
└───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┘
│ │ │ │ │
└─────┬─────┘ └─────┴─────┴─────┴─────┴─────┘
│ │
First nibble Various operands
(Opcode identifier) depending on instruction
Nibble positions (commonly used in documentation):
Nibble 1 (bits 15-12): Primary opcode
Nibble 2 (bits 11-8): Usually register X
Nibble 3 (bits 7-4): Usually register Y
Nibble 4 (bits 3-0): Usually N (small constant)
Common patterns:
NNN = 12-bit address (bits 11-0)
NN = 8-bit constant (bits 7-0)
N = 4-bit constant (bits 3-0)
X = Register index (bits 11-8)
Y = Register index (bits 7-4)
Example: Instruction 0x6A42 (LD VA, 0x42)
Binary: 0110 1010 0100 0010
│ │ └────────┘
│ │ │
│ │ └─ NN = 0x42 (immediate value 66)
│ │
│ └─ X = 0xA (register VA)
│
└─ Opcode = 0x6 (LD Vx, byte)
Action: Load the value 0x42 into register VA
Example: Instruction 0x8124 (ADD V1, V2)
Binary: 1000 0001 0010 0100
│ │ │ │
│ │ │ └─ N = 0x4 (for 8XYN, the low nibble
│ │ │ selects the operation: 4 = ADD)
│ │ │
│ │ └─ Y = 0x2 (register V2)
│ │
│ └─ X = 0x1 (register V1)
│
└─ Opcode = 0x8 (arithmetic/logic group)
Action: V1 = V1 + V2, VF = carry
Note: The low nibble matters. 0x8123 differs only in that nibble,
but it is XOR V1, V2, not ADD.
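The field layout described in this section can be sketched as a pair of small helpers. This is an illustrative sketch, not a required design: the `DecodedOp` struct and the function names `decode_opcode` and `fetch_opcode` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical container for the fields of one 16-bit CHIP-8 opcode. */
typedef struct {
    uint8_t  op;   /* bits 15-12: primary opcode nibble */
    uint8_t  x;    /* bits 11-8:  register index X */
    uint8_t  y;    /* bits 7-4:   register index Y */
    uint8_t  n;    /* bits 3-0:   4-bit constant */
    uint8_t  nn;   /* bits 7-0:   8-bit constant */
    uint16_t nnn;  /* bits 11-0:  12-bit address */
} DecodedOp;

/* Extract every possible operand; each instruction uses only some. */
static DecodedOp decode_opcode(uint16_t opcode)
{
    DecodedOp d;
    d.op  = (uint8_t)((opcode >> 12) & 0xF);
    d.x   = (uint8_t)((opcode >> 8) & 0xF);
    d.y   = (uint8_t)((opcode >> 4) & 0xF);
    d.n   = (uint8_t)(opcode & 0x000F);
    d.nn  = (uint8_t)(opcode & 0x00FF);
    d.nnn = (uint16_t)(opcode & 0x0FFF);
    return d;
}

/* Assemble the big-endian opcode from two consecutive memory bytes. */
static uint16_t fetch_opcode(const uint8_t *memory, uint16_t pc)
{
    return (uint16_t)((memory[pc] << 8) | memory[pc + 1]);
}
```

For 0x6A42 this yields op = 0x6, x = 0xA, nn = 0x42; for 0x8124 it yields op = 0x8, x = 1, y = 2, n = 4.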
2.6 The Fetch-Decode-Execute Cycle
CHIP-8 CPU Cycle:
┌─────────────────────────────────────────────────────────────────────────┐
│ FETCH-DECODE-EXECUTE │
└─────────────────────────────────────────────────────────────────────────┘
┌─────────────────┐
│ FETCH │
│ │
│ Read 2 bytes │
│ from memory │
│ at [PC] │
│ │
│ opcode = │
│ mem[PC] << 8 | │
│ mem[PC+1] │
│ │
│ PC += 2 │
└────────┬────────┘
│
▼
┌─────────────────┐
│ DECODE │
│ │
│ Extract opcode │
│ identifier │
│ (high nibble) │
│ │
│ Extract │
│ operands: │
│ X, Y, N, │
│ NN, NNN │
└────────┬────────┘
│
▼
┌─────────────────┐
│ EXECUTE │
│ │
│ switch(opcode) │
│ { │
│ case 0x0: │
│ ... │
│ case 0x1: │
│ ... │
│ } │
│ │
│ Modify state │
│ (registers, │
│ memory, PC, │
│ display) │
└────────┬────────┘
│
▼
┌─────────────────┐
│ UPDATE TIMERS │
│ (at 60Hz) │
│ │
│ if (DT > 0) │
│ DT-- │
│ if (ST > 0) │
│ ST-- │
│ beep() │
└────────┬────────┘
│
▼
┌───────┐
│ LOOP │───────────┐
└───────┘ │
▲ │
└───────────────┘
Timing Considerations:
─────────────────────
• Most CHIP-8 programs run well at ~500-700 Hz (instructions per second)
• Timers (DT and ST) decrement at exactly 60 Hz
• You need to separate instruction execution rate from timer rate
• Common approach: Execute ~10 instructions per timer tick
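One way to keep the 60 Hz timer rate independent of how fast instructions execute is an integer accumulator fed with elapsed time. The sketch below assumes the host supplies elapsed microseconds from some monotonic clock; `TickClock`, `tick_clock_advance`, and `timers_tick` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical 60 Hz tick generator driven by elapsed microseconds.
 * The accumulator holds elapsed time scaled by 60, so one timer tick
 * corresponds to 1,000,000 units and no fractional time is lost. */
typedef struct {
    uint64_t acc;  /* (elapsed microseconds * 60) modulo one tick */
} TickClock;

/* Add elapsed time; return how many 60 Hz ticks are now due. */
static unsigned tick_clock_advance(TickClock *clock, uint64_t elapsed_us)
{
    clock->acc += elapsed_us * 60u;
    unsigned ticks = (unsigned)(clock->acc / 1000000u);
    clock->acc %= 1000000u;
    return ticks;
}

/* Decrement both timers once (call this once per due tick). */
static void timers_tick(uint8_t *delay_timer, uint8_t *sound_timer)
{
    if (*delay_timer > 0) (*delay_timer)--;
    if (*sound_timer > 0) (*sound_timer)--;
}
```

A main loop could call `tick_clock_advance` once per iteration and run `timers_tick` that many times, while executing instructions at whatever rate it likes.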
2.7 The Display and Sprite System
CHIP-8 Display System:
Display Dimensions: 64 pixels wide x 32 pixels tall (2048 pixels total)
Pixel State: ON (1) or OFF (0) - monochrome
Drawing Method: XOR - sprites toggle pixels
Display Coordinates:
0 63
┌─────────────────────────────────────────────────────────┐
0 │ (0,0) (63,0) │
│ │
│ │
│ SCREEN │
│ │
│ │
31 │ (0,31) (63,31) │
└─────────────────────────────────────────────────────────┘
Sprites are stored in memory as rows of bytes:
─────────────────────────────────────────────────
Each sprite is 8 pixels wide, 1-15 pixels tall
Each row is 1 byte (8 bits = 8 pixels)
Example: The letter 'F' (font sprite at 0x000 + 15*5 = 0x04B)
Memory: Binary: Visual:
0xF0 = 1111 0000 = ████░░░░
0x80 = 1000 0000 = █░░░░░░░
0xF0 = 1111 0000 = ████░░░░
0x80 = 1000 0000 = █░░░░░░░
0x80 = 1000 0000 = █░░░░░░░
XOR Drawing Explained:
─────────────────────
When you draw a sprite, each pixel is XORed with the screen:
Screen pixel XOR Sprite pixel = Result
────────────────────────────────────────────────
0 XOR 0 = 0 (stays off)
0 XOR 1 = 1 (turns on)
1 XOR 0 = 1 (stays on)
1 XOR 1 = 0 (turns OFF - collision!)
If ANY pixel turns OFF during a draw, set VF = 1 (collision detection)
Drawing Algorithm (DXYN instruction):
─────────────────────────────────────
DXYN: Draw N-byte sprite at position (VX, VY)
1. Get X coordinate from V[X] % 64 (wrap horizontally)
2. Get Y coordinate from V[Y] % 32 (wrap vertically)
3. Set VF = 0 (no collision yet)
4. For each row (0 to N-1):
a. Read sprite byte from memory[I + row]
b. For each bit (0 to 7):
- If sprite bit is 1:
- screen_x = (X + bit) % 64
- screen_y = (Y + row) % 32
- If screen[screen_x][screen_y] is 1:
- VF = 1 (collision detected!)
- screen[screen_x][screen_y] ^= 1
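Edge behavior note: On the original COSMAC VIP interpreter the starting
coordinate wraps (steps 1-2), but sprite pixels that run past the right or
bottom edge are clipped. The per-pixel % 64 and % 32 in step 4 wrap them
instead. Some ROMs expect clipping and others expect wrapping.
Document your choice.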
Example Draw Operation:
───────────────────────
Drawing 0xF0 at position (2, 1):
Sprite:
Bit: 01234567
0xF0: 11110000
Before:
 0123456789...
┌──────────────
│░░░░░░░░░░
│░░░░░░░░░░
│░░░░░░░░░░
After XOR at (2,1):
 0123456789...
┌──────────────
│░░░░░░░░░░
│░░████░░░░ ← Sprite drawn here
│░░░░░░░░░░
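The drawing algorithm above leaves edge behavior as a choice. The following sketch makes that choice a parameter, assuming a one-byte-per-pixel display buffer. `draw_sprite` and its `clip` flag are hypothetical names; the return value is what DXYN would store in VF.

```c
#include <stdbool.h>
#include <stdint.h>

#define SCREEN_W 64
#define SCREEN_H 32

/* Draw an n-row sprite with XOR. Returns 1 if any lit pixel was erased
 * (the value DXYN stores in VF), otherwise 0.
 * clip == true : pixels past the right/bottom edge are dropped.
 * clip == false: pixels past an edge wrap to the opposite side. */
static uint8_t draw_sprite(uint8_t display[SCREEN_W * SCREEN_H],
                           const uint8_t *sprite, int n,
                           uint8_t vx, uint8_t vy, bool clip)
{
    int x0 = vx % SCREEN_W;   /* the starting position always wraps */
    int y0 = vy % SCREEN_H;
    uint8_t collision = 0;

    for (int row = 0; row < n; row++) {
        int y = y0 + row;
        if (y >= SCREEN_H) {
            if (clip) break;          /* rest of the sprite is off-screen */
            y %= SCREEN_H;
        }
        for (int bit = 0; bit < 8; bit++) {
            if ((sprite[row] & (0x80 >> bit)) == 0) continue;
            int x = x0 + bit;
            if (x >= SCREEN_W) {
                if (clip) break;      /* rest of this row is off-screen */
                x %= SCREEN_W;
            }
            uint8_t *px = &display[y * SCREEN_W + x];
            if (*px) collision = 1;   /* a lit pixel is about to turn off */
            *px ^= 1;
        }
    }
    return collision;
}
```

Drawing the same sprite twice at the same position erases it and reports a collision, which is how many CHIP-8 games move objects.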
2.8 Complete Instruction Set
CHIP-8 Instruction Set (35 Instructions):
┌─────────┬─────────────────────────────────────────────────────────────────┐
│ Opcode │ Description │
├─────────┼─────────────────────────────────────────────────────────────────┤
│ │ SYSTEM INSTRUCTIONS │
├─────────┼─────────────────────────────────────────────────────────────────┤
│ 0NNN │ SYS addr - Call machine code routine at NNN (ignored by most modern interpreters) │
│ 00E0 │ CLS - Clear the display │
│ 00EE │ RET - Return from subroutine (pop stack into PC) │
├─────────┼─────────────────────────────────────────────────────────────────┤
│ │ FLOW CONTROL │
├─────────┼─────────────────────────────────────────────────────────────────┤
│ 1NNN │ JP addr - Jump to address NNN │
│ 2NNN │ CALL addr - Call subroutine at NNN (push PC, jump to NNN) │
│ 3XNN │ SE Vx, byte - Skip next instruction if Vx == NN │
│ 4XNN │ SNE Vx, byte - Skip next instruction if Vx != NN │
│ 5XY0 │ SE Vx, Vy - Skip next instruction if Vx == Vy │
│ 9XY0 │ SNE Vx, Vy - Skip next instruction if Vx != Vy │
│ BNNN │ JP V0, addr - Jump to address NNN + V0 │
├─────────┼─────────────────────────────────────────────────────────────────┤
│ │ REGISTER OPERATIONS │
├─────────┼─────────────────────────────────────────────────────────────────┤
│ 6XNN │ LD Vx, byte - Load NN into Vx │
│ 7XNN │ ADD Vx, byte - Add NN to Vx (no carry flag) │
│ 8XY0 │ LD Vx, Vy - Set Vx = Vy │
│ 8XY1 │ OR Vx, Vy - Set Vx = Vx OR Vy │
│ 8XY2 │ AND Vx, Vy - Set Vx = Vx AND Vy │
│ 8XY3 │ XOR Vx, Vy - Set Vx = Vx XOR Vy │
│ 8XY4 │ ADD Vx, Vy - Set Vx = Vx + Vy, VF = carry │
│ 8XY5 │ SUB Vx, Vy - Set Vx = Vx - Vy, VF = NOT borrow │
│ 8XY6 │ SHR Vx {, Vy} - Shift Vx right, VF = LSB before shift │
│ 8XY7 │ SUBN Vx, Vy - Set Vx = Vy - Vx, VF = NOT borrow │
│ 8XYE │ SHL Vx {, Vy} - Shift Vx left, VF = MSB before shift │
│ CXNN │ RND Vx, byte - Set Vx = random byte AND NN │
├─────────┼─────────────────────────────────────────────────────────────────┤
│ │ MEMORY OPERATIONS │
├─────────┼─────────────────────────────────────────────────────────────────┤
│ ANNN │ LD I, addr - Set I = NNN │
│ FX1E │ ADD I, Vx - Set I = I + Vx │
│ FX29 │ LD F, Vx - Set I = location of sprite for digit Vx │
│ FX33 │ LD B, Vx - Store BCD of Vx in I, I+1, I+2 │
│ FX55 │ LD [I], Vx - Store V0-Vx in memory starting at I │
│ FX65 │ LD Vx, [I] - Read V0-Vx from memory starting at I │
├─────────┼─────────────────────────────────────────────────────────────────┤
│ │ DISPLAY │
├─────────┼─────────────────────────────────────────────────────────────────┤
│ DXYN │ DRW Vx, Vy, N - Draw N-byte sprite at (Vx, Vy), VF = collision │
├─────────┼─────────────────────────────────────────────────────────────────┤
│ │ KEYBOARD │
├─────────┼─────────────────────────────────────────────────────────────────┤
│ EX9E │ SKP Vx - Skip next instruction if key Vx is pressed │
│ EXA1 │ SKNP Vx - Skip next instruction if key Vx is NOT pressed │
│ FX0A │ LD Vx, K - Wait for key press, store key value in Vx │
├─────────┼─────────────────────────────────────────────────────────────────┤
│ │ TIMERS │
├─────────┼─────────────────────────────────────────────────────────────────┤
│ FX07 │ LD Vx, DT - Set Vx = delay timer value │
│ FX15 │ LD DT, Vx - Set delay timer = Vx │
│ FX18 │ LD ST, Vx - Set sound timer = Vx │
└─────────┴─────────────────────────────────────────────────────────────────┘
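Two rows in the table above, FX29 and FX33, are easy to get wrong on a first pass. The sketch below assumes the font is stored at 0x000 with 5 bytes per digit, as described in the memory map. `font_sprite_address` and `store_bcd` are hypothetical helper names, and masking addresses to 12 bits is an assumption of this sketch rather than documented behavior.

```c
#include <stdint.h>

/* FX29: address of the 5-byte font sprite for a hex digit,
 * assuming the font starts at 0x000. Only the low nibble is used. */
static uint16_t font_sprite_address(uint8_t digit)
{
    return (uint16_t)((digit & 0x0F) * 5);
}

/* FX33: write the decimal digits of value to memory[i..i+2]
 * (hundreds, tens, ones). Addresses are masked to stay inside 4KB. */
static void store_bcd(uint8_t memory[4096], uint16_t i, uint8_t value)
{
    memory[(i + 0) & 0x0FFF] = value / 100;
    memory[(i + 1) & 0x0FFF] = (value / 10) % 10;
    memory[(i + 2) & 0x0FFF] = value % 10;
}
```

For example, a value of 254 is stored as the three bytes 2, 5, 4.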
Note on 8XY6 and 8XYE (Shift operations):
─────────────────────────────────────────
There are two interpretations:
Original COSMAC VIP: Vx = Vy >> 1 or Vx = Vy << 1 (uses Y)
Modern/CHIP-48: Vx = Vx >> 1 or Vx = Vx << 1 (ignores Y)
Most games expect the modern behavior, but some require the original.
Consider making this configurable (quirks mode).
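A quirks mode for the shift instructions can be as small as one boolean. The following is a sketch under that assumption; `op_shr`, `op_shl`, and the `shift_uses_vy` flag are hypothetical names. Writing VF last is a choice made here so that the flag is the final result even when X is 0xF.

```c
#include <stdbool.h>
#include <stdint.h>

/* 8XY6 with a configurable quirk.
 * shift_uses_vy == true : original COSMAC VIP behavior, Vx = Vy >> 1
 * shift_uses_vy == false: modern/CHIP-48 behavior,     Vx = Vx >> 1
 * VF is written last so the flag survives even when x == 0xF. */
static void op_shr(uint8_t V[16], uint8_t x, uint8_t y, bool shift_uses_vy)
{
    uint8_t src = shift_uses_vy ? V[y] : V[x];
    uint8_t lsb = src & 0x01;
    V[x] = src >> 1;
    V[0xF] = lsb;
}

/* 8XYE: same idea, shifting left and reporting the old MSB. */
static void op_shl(uint8_t V[16], uint8_t x, uint8_t y, bool shift_uses_vy)
{
    uint8_t src = shift_uses_vy ? V[y] : V[x];
    uint8_t msb = (src >> 7) & 0x01;
    V[x] = (uint8_t)(src << 1);
    V[0xF] = msb;
}
```

The flag could be set from a command-line option so the same binary can run ROMs written for either interpretation.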
2.9 Common Misconceptions
Misconception 1: “CHIP-8 was a real hardware CPU”
- Reality: CHIP-8 was always a virtual machine interpreted by software. There was no CHIP-8 chip.
Misconception 2: “I need to emulate at exact clock speeds”
- Reality: CHIP-8 ran on different hardware at different speeds. You just need to keep timers at 60Hz and instructions “fast enough” (typically 500-700 Hz feels right).
Misconception 3: “VF is just another general-purpose register”
- Reality: VF is special - it’s used as a flag register by many instructions (carry, borrow, collision). Never use it for general computation.
Misconception 4: “The display is like a framebuffer I write to directly”
- Reality: CHIP-8 programs can change the display only by XOR-drawing sprites with DXYN or by clearing the whole screen with CLS (00E0). There is no instruction that sets a single pixel to a specific value.
Misconception 5: “I should use floating-point for timing”
- Reality: Use integer counters and fixed update rates. Floating-point timing leads to subtle bugs and inconsistent behavior.
3. Project Specification
3.1 What You Will Build
A complete CHIP-8 emulator that:
- Loads and executes any standard CHIP-8 ROM
- Implements all 35 instructions correctly
- Displays graphics in a 64x32 window (scaled up for visibility)
- Accepts keyboard input mapped to the CHIP-8 hex keypad
- Runs timers at the correct 60Hz rate
- Plays sound when the sound timer is active
3.2 Functional Requirements
- Load ROM files into memory at address 0x200
- Initialize all CHIP-8 state (registers, PC, stack, display)
- Implement fetch-decode-execute cycle
- Implement all 35 CHIP-8 instructions
- 64x32 monochrome display with XOR sprite drawing
- Keyboard input for 16 hex keys
- Delay timer decrements at 60Hz
- Sound timer triggers audio when > 0
- Collision detection (VF flag) for sprite drawing
3.3 Non-Functional Requirements
- Instruction execution at approximately 500-700 Hz (configurable)
- Timer updates at exactly 60 Hz
- Display should scale to a reasonable window size (at least 640x320)
- Frame rate should be smooth (target 60 FPS for display)
- Memory usage under 1MB
- Should run on Linux, macOS, or Windows
3.4 Example Usage / Output
# Build the emulator
$ make
# Run a ROM
$ ./chip8 roms/PONG.ch8
┌────────────────────────────────────────────────────────────────┐
│ │
│ █ ● █ │
│ █ █ │
│ █ █ │
│ █ █ │
│ │
│ │
│ │
│ Score: 3 - 2 │
└────────────────────────────────────────────────────────────────┘
Controls: 1=Left Up, Q=Left Down | 4=Right Up, R=Right Down | ESC=Quit
# Run with debug output
$ ./chip8 --debug roms/test.ch8
PC=0x200: 00E0 CLS
PC=0x202: 6A02 LD VA, 0x02
PC=0x204: 6B0C LD VB, 0x0C
PC=0x206: A300 LD I, 0x300
PC=0x208: DAB5 DRW VA, VB, 5
...
# Run test ROM to verify implementation
$ ./chip8 roms/BC_test.ch8
[All test patterns should display correctly]
3.5 Real World Outcome
When complete, you will be able to:
- Play classic games: Pong, Tetris, Space Invaders, Breakout, Brix
- Run test ROMs that verify your instruction implementation
- Understand emulator architecture well enough to tackle Game Boy or NES
- Explain CPU emulation concepts in interviews
- Have a compelling portfolio project demonstrating systems knowledge
4. Solution Architecture
4.1 High-Level Design
CHIP-8 Emulator Architecture:
┌─────────────────────────────────────────────────────────────────────────────┐
│ YOUR EMULATOR │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ MAIN LOOP │ │
│ │ ┌─────────────────────────────────────────────────────────────┐ │ │
│ │ │ while (running) { │ │ │
│ │ │ handle_input(); // Check keyboard state │ │ │
│ │ │ │ │ │
│ │ │ // Run multiple instructions per frame │ │ │
│ │ │ for (int i = 0; i < instructions_per_frame; i++) { │ │ │
│ │ │ emulate_cycle(); // Fetch-Decode-Execute │ │ │
│ │ │ } │ │ │
│ │ │ │ │ │
│ │ │ update_timers(); // Decrement DT and ST at 60Hz │ │ │
│ │ │ render_display(); // Draw to window │ │ │
│ │ │ play_sound(); // Beep if ST > 0 │ │ │
│ │ │ delay_for_60fps(); // Maintain timing │ │ │
│ │ │ } │ │ │
│ │ └─────────────────────────────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌───────────────────────────┐ ┌───────────────────────────────────┐ │
│ │ CHIP-8 STATE │ │ INPUT HANDLER │ │
│ │ │ │ │ │
│ │ uint8_t memory[4096] │ │ Map physical keys to CHIP-8: │ │
│ │ uint8_t V[16] │ │ │ │
│ │ uint16_t I │ │ Physical CHIP-8 │ │
│ │ uint16_t PC │ │ ───────────────────── │ │
│ │ uint8_t SP │ │ 1 2 3 4 1 2 3 C │ │
│ │ uint16_t stack[16] │ │ Q W E R 4 5 6 D │ │
│ │ uint8_t delay_timer │ │ A S D F 7 8 9 E │ │
│ │ uint8_t sound_timer │ │ Z X C V A 0 B F │ │
│ │ uint8_t display[64*32] │ │ │ │
│ │ uint8_t keypad[16] │ │ bool keypad[16] │ │
│ │ │ │ │ │
│ └───────────────────────────┘ └───────────────────────────────────┘ │
│ │
│ ┌───────────────────────────┐ ┌───────────────────────────────────┐ │
│ │ INSTRUCTION DECODER │ │ DISPLAY RENDERER │ │
│ │ │ │ │ │
│ │ uint16_t opcode = │ │ Scale 64x32 to window size │ │
│ │ mem[PC] << 8 | │ │ Draw rectangles for each pixel │ │
│ │ mem[PC+1]; │ │ Use SDL2, raylib, or similar │ │
│ │ │ │ │ │
│ │ Extract: │ │ for (y = 0; y < 32; y++) │ │
│ │ nnn = opcode & 0x0FFF │ │ for (x = 0; x < 64; x++) │ │
│ │ nn = opcode & 0x00FF │ │ if (display[y*64+x]) │ │
│ │ n = opcode & 0x000F │ │ draw_rect(x*scale, │ │
│ │ x = (opcode>>8)&0xF │ │ y*scale, │ │
│ │ y = (opcode>>4)&0xF │ │ scale, scale) │ │
│ │ │ │ │ │
│ └───────────────────────────┘ └───────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
4.2 Key Components
Component Breakdown:
┌──────────────────────────────────────────────────────────────────┐
│ SOURCE FILES │
├──────────────────────────────────────────────────────────────────┤
│ │
│ main.c │
│ ├── Parse command line arguments │
│ ├── Initialize SDL/graphics library │
│ ├── Load ROM file │
│ ├── Run main loop │
│ └── Cleanup on exit │
│ │
│ chip8.h │
│ └── CHIP8 state structure definition │
│ └── All registers, memory, display, stack, etc. │
│ │
│ chip8.c │
│ ├── chip8_init() - Reset all state to initial values │
│ ├── chip8_load_rom() - Load ROM into memory at 0x200 │
│ ├── chip8_cycle() - Fetch, decode, execute one instruction │
│ ├── chip8_update_timers() - Decrement DT and ST │
│ └── chip8_set_key() / chip8_clear_key() - Input handling │
│ │
│ display.c (or integrated with main.c) │
│ ├── Initialize graphics window │
│ ├── Render CHIP-8 display to window │
│ └── Handle window events │
│ │
│ input.c (or integrated with main.c) │
│ ├── Map keyboard keys to CHIP-8 keypad │
│ └── Update keypad state on key press/release │
│ │
│ audio.c (optional, or integrated) │
│ ├── Initialize audio system │
│ └── Play/stop beep based on sound timer │
│ │
└──────────────────────────────────────────────────────────────────┘
4.3 Data Structures
// Core CHIP-8 state structure (chip8.h)
#include <stdbool.h>  // bool
#include <stdint.h>   // uint8_t, uint16_t

typedef struct {
    // Memory: 4KB
    uint8_t memory[4096];
    // General purpose registers V0-VF
    uint8_t V[16];
    // Index register (for memory operations)
    uint16_t I;
    // Program counter (current instruction address)
    uint16_t PC;
    // Stack for subroutine calls
    uint16_t stack[16];
    uint8_t SP; // Stack pointer (index of the next free slot)
    // Timers (decrement at 60Hz)
    uint8_t delay_timer;
    uint8_t sound_timer;
    // Display: 64x32 monochrome pixels
    // 1 = pixel on, 0 = pixel off
    uint8_t display[64 * 32];
    // Input: 16-key hexadecimal keypad
    // 1 = key pressed, 0 = key released
    uint8_t keypad[16];
    // Flag: does display need to be redrawn?
    bool draw_flag;
} Chip8;
// Built-in font sprites (stored at 0x000-0x04F)
// Each character is 5 bytes (8 pixels wide x 5 pixels tall)
static const uint8_t chip8_fontset[80] = {
    0xF0, 0x90, 0x90, 0x90, 0xF0, // 0
    0x20, 0x60, 0x20, 0x20, 0x70, // 1
    0xF0, 0x10, 0xF0, 0x80, 0xF0, // 2
    0xF0, 0x10, 0xF0, 0x10, 0xF0, // 3
    0x90, 0x90, 0xF0, 0x10, 0x10, // 4
    0xF0, 0x80, 0xF0, 0x10, 0xF0, // 5
    0xF0, 0x80, 0xF0, 0x90, 0xF0, // 6
    0xF0, 0x10, 0x20, 0x40, 0x40, // 7
    0xF0, 0x90, 0xF0, 0x90, 0xF0, // 8
    0xF0, 0x90, 0xF0, 0x10, 0xF0, // 9
    0xF0, 0x90, 0xF0, 0x90, 0x90, // A
    0xE0, 0x90, 0xE0, 0x90, 0xE0, // B
    0xF0, 0x80, 0x80, 0x80, 0xF0, // C
    0xE0, 0x90, 0x90, 0x90, 0xE0, // D
    0xF0, 0x80, 0xF0, 0x80, 0xF0, // E
    0xF0, 0x80, 0xF0, 0x80, 0x80  // F
};
4.4 Algorithm Overview
MAIN EMULATION LOOP:
┌─────────────────────────────────────────────────────────────────┐
│ 1. INITIALIZATION │
│ ├── Create Chip8 structure │
│ ├── Call chip8_init() to reset state │
│ ├── Load fontset into memory[0x000..0x04F] │
│ ├── Load ROM into memory[0x200...] │
│ ├── Initialize graphics/audio/input systems │
│ └── Set PC = 0x200 (program start) │
├─────────────────────────────────────────────────────────────────┤
│ 2. MAIN LOOP (runs at 60Hz) │
│ │ │
│ ├── Poll input events │
│ │ └── Update keypad[] based on key state │
│ │ │
│ ├── Execute ~8-12 CPU cycles (roughly 500-700Hz effective) │
│ │ └── for (i = 0; i < 10; i++) chip8_cycle() │
│ │ │
│ ├── Update timers (exactly once per 60Hz frame) │
│ │ ├── if (delay_timer > 0) delay_timer-- │
│ │ └── if (sound_timer > 0) sound_timer-- │
│ │ │
│ ├── Render display if draw_flag is set │
│ │ └── draw_flag = false after rendering │
│ │ │
│ ├── Play/stop beep based on sound_timer │
│ │ │
│ └── Delay to maintain 60 FPS timing │
├─────────────────────────────────────────────────────────────────┤
│ 3. SINGLE CPU CYCLE (chip8_cycle) │
│ │ │
│ ├── FETCH: Read opcode │
│ │ └── opcode = (memory[PC] << 8) | memory[PC + 1] │
│ │ │
│ ├── INCREMENT PC │
│ │ └── PC += 2 (before execute, simplifies branching) │
│ │ │
│ ├── DECODE: Extract operands │
│ │ ├── nnn = opcode & 0x0FFF │
│ │ ├── nn = opcode & 0x00FF │
│ │ ├── n = opcode & 0x000F │
│ │ ├── x = (opcode >> 8) & 0x0F │
│ │ └── y = (opcode >> 4) & 0x0F │
│ │ │
│ └── EXECUTE: Switch on first nibble │
│ ├── case 0x0: if opcode==0x00E0 -> CLS │
│ │ if opcode==0x00EE -> RET │
│ ├── case 0x1: JP nnn │
│ ├── case 0x2: CALL nnn │
│ ├── ... │
│ └── case 0xF: (sub-switch on nn) │
└─────────────────────────────────────────────────────────────────┘
5. Implementation Guide
5.1 Development Environment Setup
# REQUIRED: C Compiler and Build Tools
# macOS
xcode-select --install
brew install sdl2
# Ubuntu/Debian
sudo apt update
sudo apt install build-essential libsdl2-dev
# Fedora/RHEL
sudo dnf install gcc make SDL2-devel
# Windows (with MSYS2)
pacman -S mingw-w64-x86_64-gcc mingw-w64-x86_64-SDL2
# Verify installation
gcc --version
sdl2-config --version
# ALTERNATIVE GRAPHICS LIBRARIES:
# If you prefer simpler APIs than SDL2:
# raylib (simpler, game-focused)
brew install raylib # macOS
apt install libraylib-dev # Ubuntu
# SFML (C++ focused)
brew install sfml
apt install libsfml-dev
5.2 Project Structure
chip8-emulator/
├── Makefile
├── README.md
├── src/
│ ├── main.c # Entry point, main loop
│ ├── chip8.h # CHIP-8 state structure
│ ├── chip8.c # CPU emulation logic
│ ├── display.h # Display rendering interface
│ ├── display.c # SDL2/graphics implementation
│ ├── input.h # Input handling interface
│ └── input.c # Keyboard mapping
├── roms/ # Test ROMs (download separately)
│ ├── BC_test.ch8
│ ├── PONG.ch8
│ ├── TETRIS.ch8
│ ├── INVADERS.ch8
│ └── test_opcode.ch8
└── docs/
├── cowgod_spec.txt # Technical reference (for offline)
└── notes.md # Your implementation notes
5.3 The Core Question You’re Answering
“How do CPUs execute programs, and how can I build one in software?”
This project answers this by making you implement:
- How instructions are fetched from memory
- How opcodes are decoded into operations
- How registers and memory are modified
- How control flow (jumps, calls, returns) works
- How hardware timers and I/O integrate with the CPU
You will emerge understanding that a CPU is just a state machine following simple rules very fast.
5.4 Concepts You Must Understand First
Before writing code, ensure you can answer these self-assessment questions:
Hexadecimal and Binary:
- Q: What is 0xDEAD in binary? In decimal?
- A: 1101 1110 1010 1101; 57005
Bitmasking:
- Q: How do you extract bits 11-8 from a 16-bit number?
- A: (value >> 8) & 0xF
Two’s Complement (for subtract/borrow):
- Q: If A = 5 and B = 8, what is A - B for 8-bit unsigned? What should VF be?
- A: 253 (0xFD wraps around); VF = 0 (borrow occurred)
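The wrap-around and flag rules for 8XY4 and 8XY5 can be sketched as follows. `op_add` and `op_sub` are hypothetical helper names. Treating Vx == Vy as "no borrow" (VF = 1) and writing VF after the result are assumptions of this sketch.

```c
#include <stdint.h>

/* 8XY4: Vx = Vx + Vy, VF = 1 on carry. */
static void op_add(uint8_t V[16], uint8_t x, uint8_t y)
{
    uint16_t sum = (uint16_t)V[x] + V[y];
    V[x] = (uint8_t)sum;          /* keep the low 8 bits */
    V[0xF] = sum > 0xFF ? 1 : 0;  /* flag written last */
}

/* 8XY5: Vx = Vx - Vy, VF = 1 when no borrow is needed (Vx >= Vy). */
static void op_sub(uint8_t V[16], uint8_t x, uint8_t y)
{
    uint8_t no_borrow = V[x] >= V[y] ? 1 : 0;
    V[x] = (uint8_t)(V[x] - V[y]);  /* wraps modulo 256 */
    V[0xF] = no_borrow;
}
```

With V0 = 5 and V1 = 8, `op_sub(V, 0, 1)` leaves V0 = 253 (0xFD) and VF = 0, matching the answer above.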
Fetch-Decode-Execute:
- Q: What are the three phases of instruction execution?
- A: Fetch (read instruction from memory), Decode (extract opcode and operands), Execute (perform operation)
Stack Operations:
- Q: What happens to SP when you CALL a subroutine? When you RET?
- A: CALL: stack[SP] = PC, SP++; RET: SP--, PC = stack[SP]
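The CALL and RET sequence from the last answer can be sketched with bounds checks added, so a buggy ROM cannot write outside the 16-entry stack. `CallState`, `op_call`, and `op_ret` are hypothetical names, and returning false on overflow or underflow is one possible policy.

```c
#include <stdbool.h>
#include <stdint.h>

#define STACK_DEPTH 16

typedef struct {
    uint16_t stack[STACK_DEPTH];
    uint8_t  SP;   /* index of the next free slot */
    uint16_t PC;
} CallState;

/* 2NNN: push the return address, then jump. Fails on overflow. */
static bool op_call(CallState *s, uint16_t nnn)
{
    if (s->SP >= STACK_DEPTH) return false;
    s->stack[s->SP++] = s->PC;  /* PC already points past the CALL */
    s->PC = nnn;
    return true;
}

/* 00EE: pop the return address into PC. Fails on underflow. */
static bool op_ret(CallState *s)
{
    if (s->SP == 0) return false;
    s->PC = s->stack[--s->SP];
    return true;
}
```

This relies on PC having been advanced by 2 during fetch, so the saved address is the instruction after the CALL.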
5.5 Questions to Guide Your Design
Architecture:
- Should you increment PC before or after executing the instruction?
- How will you structure the switch statement for opcode decoding?
- What happens if an unknown opcode is encountered?
Display:
- How will you represent the 64x32 display in memory?
- What scale factor will you use for the window?
- How will you handle sprite wrapping at screen edges?
Timing:
- How many instructions should execute per frame at 60Hz to feel right?
- How will you measure elapsed time for the main loop?
Input:
- How will you map modern keyboard keys to the CHIP-8 hex keypad?
- How will FX0A (wait for key press) block execution?
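For the FX0A question, one common approach is to avoid blocking the host loop at all: if no key is down, rewind PC so the same instruction runs again on the next cycle. The sketch below assumes PC was already advanced during fetch. `op_wait_key` is a hypothetical name, and reacting to a held key (rather than to a key release, as some interpreters do) is a simplification.

```c
#include <stdbool.h>
#include <stdint.h>

/* FX0A sketch: called after PC has already been advanced by 2.
 * If a key is down, store its index in Vx and return true.
 * Otherwise rewind PC so the same instruction is fetched again on the
 * next cycle; timers and rendering keep running in the meantime. */
static bool op_wait_key(uint8_t V[16], const uint8_t keypad[16],
                        uint16_t *pc, uint8_t x)
{
    for (uint8_t key = 0; key < 16; key++) {
        if (keypad[key]) {
            V[x] = key;
            return true;
        }
    }
    *pc -= 2;
    return false;
}
```

Because the main loop keeps running, the window stays responsive and the timers keep counting down while the program waits.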
5.6 Thinking Exercise
Before coding, trace through this simple CHIP-8 program by hand:
Address | Opcode | Instruction
--------|--------|------------------
0x200 | 00E0 | CLS
0x202 | 6005 | LD V0, 0x05
0x204 | 6110 | LD V1, 0x10
0x206 | 8014 | ADD V0, V1
0x208 | 3015 | SE V0, 0x15
0x20A | 1200 | JP 0x200
0x20C | 00FD | (invalid - exit in some impls)
Trace it:
- PC = 0x200, fetch 00E0 (CLS), clear display, PC = 0x202
- PC = 0x202, fetch 6005, V0 = 0x05, PC = 0x204
- PC = 0x204, fetch 6110, V1 = 0x10, PC = 0x206
- PC = 0x206, fetch 8014, V0 = V0 + V1 = 0x05 + 0x10 = 0x15, VF = 0 (no carry), PC = 0x208
- PC = 0x208, fetch 3015, V0 == 0x15? Yes! Skip next, PC = 0x20C (not 0x20A)
- Program ends (or traps on invalid opcode)
What did you learn?
- SE (skip if equal) advances PC by 4 instead of 2 when condition is true
- The condition was true because we calculated V0 = 0x05 + 0x10 = 0x15
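The hand trace can be checked mechanically with a tiny interpreter that handles only the opcodes used in the exercise. This is a sketch for verification, not a starting point for the full emulator; `MiniChip8`, `mini_step`, and `mini_load_exercise` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t  memory[4096];
    uint8_t  V[16];
    uint16_t PC;
} MiniChip8;

/* Execute one instruction. Only the opcodes used in the exercise are
 * handled; returns false on anything else (such as 00FD). */
static bool mini_step(MiniChip8 *c)
{
    uint16_t opcode = (uint16_t)((c->memory[c->PC] << 8) |
                                 c->memory[c->PC + 1]);
    uint8_t x  = (opcode >> 8) & 0xF;
    uint8_t y  = (opcode >> 4) & 0xF;
    uint8_t nn = opcode & 0xFF;
    c->PC += 2;  /* advance before executing */

    switch (opcode & 0xF000) {
    case 0x0000:                      /* CLS only; no display here */
        return opcode == 0x00E0;
    case 0x1000:                      /* JP nnn */
        c->PC = opcode & 0x0FFF;
        return true;
    case 0x3000:                      /* SE Vx, byte */
        if (c->V[x] == nn) c->PC += 2;
        return true;
    case 0x6000:                      /* LD Vx, byte */
        c->V[x] = nn;
        return true;
    case 0x8000:
        if ((opcode & 0xF) == 0x4) {  /* ADD Vx, Vy */
            uint16_t sum = (uint16_t)c->V[x] + c->V[y];
            c->V[x] = (uint8_t)sum;
            c->V[0xF] = sum > 0xFF ? 1 : 0;
            return true;
        }
        return false;
    default:
        return false;
    }
}

/* Reset the machine and place the exercise program at 0x200. */
static void mini_load_exercise(MiniChip8 *c)
{
    static const uint8_t program[] = {
        0x00, 0xE0, 0x60, 0x05, 0x61, 0x10, 0x80, 0x14,
        0x30, 0x15, 0x12, 0x00, 0x00, 0xFD
    };
    memset(c, 0, sizeof *c);
    memcpy(&c->memory[0x200], program, sizeof program);
    c->PC = 0x200;
}
```

After `mini_load_exercise` and five calls to `mini_step`, V0 holds 0x15 and PC is 0x20C; a sixth call returns false because 00FD is not handled.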
5.7 Hints in Layers
Use these progressively. Try each level before moving to the next.
Hint 1: Starting Structure
Your main.c should look something like this:
#include <stdbool.h>
#include <stdio.h>
#include "chip8.h"
#include "display.h"

// should_quit(), handle_input(), and delay_for_60fps() are helpers
// you write yourself (for example in display.c and input.c).
int main(int argc, char *argv[]) {
    if (argc < 2) {
        fprintf(stderr, "Usage: %s <rom>\n", argv[0]);
        return 1;
    }
    Chip8 chip8;
    chip8_init(&chip8);
    chip8_load_rom(&chip8, argv[1]);
    display_init();
    while (!should_quit()) {
        handle_input(&chip8);
        // Run ~10 cycles per frame (about 600 Hz at 60 frames per second)
        for (int i = 0; i < 10; i++) {
            chip8_cycle(&chip8);
        }
        // Once per frame, so the timers tick at 60 Hz
        chip8_update_timers(&chip8);
        if (chip8.draw_flag) {
            display_render(&chip8);
            chip8.draw_flag = false;
        }
        delay_for_60fps();
    }
    display_cleanup();
    return 0;
}
Hint 2: Initialization
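#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "chip8.h"

void chip8_init(Chip8 *chip8) {
    memset(chip8, 0, sizeof(Chip8));
    // Load fontset into memory at 0x000
    memcpy(chip8->memory, chip8_fontset, sizeof(chip8_fontset));
    // PC starts at 0x200 (where ROMs load)
    chip8->PC = 0x200;
}

void chip8_load_rom(Chip8 *chip8, const char *filename) {
    FILE *f = fopen(filename, "rb");
    if (!f) {
        fprintf(stderr, "Failed to open ROM: %s\n", filename);
        exit(1);
    }
    // Read at most 4096 - 0x200 bytes into memory starting at 0x200
    size_t bytes_read = fread(&chip8->memory[0x200], 1, 4096 - 0x200, f);
    fclose(f);
    if (bytes_read == 0) {
        fprintf(stderr, "ROM is empty or unreadable: %s\n", filename);
        exit(1);
    }
}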
Hint 3: Opcode Decoding Structure
void chip8_cycle(Chip8 *chip8) {
    // FETCH: two bytes, big-endian
    uint16_t opcode = (chip8->memory[chip8->PC] << 8) |
                      chip8->memory[chip8->PC + 1];
    // INCREMENT PC (before execute)
    chip8->PC += 2;
    // DECODE
    uint16_t nnn = opcode & 0x0FFF;
    uint8_t nn = opcode & 0x00FF;
    uint8_t n = opcode & 0x000F;
    uint8_t x = (opcode >> 8) & 0x0F;
    uint8_t y = (opcode >> 4) & 0x0F;
    // EXECUTE
    switch (opcode & 0xF000) {
    case 0x0000:
        switch (opcode) {
        case 0x00E0: // CLS
            memset(chip8->display, 0, sizeof(chip8->display));
            chip8->draw_flag = true;
            break;
        case 0x00EE: // RET
            if (chip8->SP == 0) {
                printf("Stack underflow at RET\n");
                break;
            }
            chip8->SP--;
            chip8->PC = chip8->stack[chip8->SP];
            break;
        default:
            printf("Unknown opcode: 0x%04X\n", opcode);
            break;
        }
        break;
    case 0x1000: // JP nnn
        chip8->PC = nnn;
        break;
    case 0x2000: // CALL nnn
        if (chip8->SP >= 16) {
            printf("Stack overflow at CALL\n");
            break;
        }
        chip8->stack[chip8->SP] = chip8->PC;
        chip8->SP++;
        chip8->PC = nnn;
        break;
    // ... continue for all opcodes (these will use nn, n, x, and y)
    default:
        printf("Unknown opcode: 0x%04X\n", opcode);
        break;
    }
}
Hint 4: Implementing the Draw Instruction (DXYN)
This is the most complex instruction. Take it step by step:
case 0xD000: { // DRW Vx, Vy, n
    uint8_t xpos = chip8->V[x] % 64;
    uint8_t ypos = chip8->V[y] % 32;
    chip8->V[0xF] = 0; // Reset collision flag
    for (int row = 0; row < n; row++) {
        uint8_t sprite_byte = chip8->memory[chip8->I + row];
        for (int col = 0; col < 8; col++) {
            // Check if current pixel in sprite is set
            if ((sprite_byte & (0x80 >> col)) != 0) {
                int screen_x = (xpos + col) % 64;
                int screen_y = (ypos + row) % 32;
                int pixel_index = screen_y * 64 + screen_x;
                // Check for collision
                if (chip8->display[pixel_index] == 1) {
                    chip8->V[0xF] = 1;
                }
                // XOR the pixel
                chip8->display[pixel_index] ^= 1;
            }
        }
    }
    chip8->draw_flag = true;
    break;
}
Hint 5: Input Handling with SDL2
// Keypad mapping: Physical keyboard -> CHIP-8 hex key
//   1 2 3 4  ->  1 2 3 C
//   Q W E R  ->  4 5 6 D
//   A S D F  ->  7 8 9 E
//   Z X C V  ->  A 0 B F
int key_map[16] = {
    SDL_SCANCODE_X, // 0
    SDL_SCANCODE_1, // 1
    SDL_SCANCODE_2, // 2
    SDL_SCANCODE_3, // 3
    SDL_SCANCODE_Q, // 4
    SDL_SCANCODE_W, // 5
    SDL_SCANCODE_E, // 6
    SDL_SCANCODE_A, // 7
    SDL_SCANCODE_S, // 8
    SDL_SCANCODE_D, // 9
    SDL_SCANCODE_Z, // A
    SDL_SCANCODE_C, // B
    SDL_SCANCODE_4, // C
    SDL_SCANCODE_R, // D
    SDL_SCANCODE_F, // E
    SDL_SCANCODE_V  // F
};

void handle_input(Chip8 *chip8) {
    SDL_Event event;
    while (SDL_PollEvent(&event)) {
        if (event.type == SDL_QUIT) {
            exit(0);
        }
    }
    const uint8_t *keyboard = SDL_GetKeyboardState(NULL);
    for (int i = 0; i < 16; i++) {
        chip8->keypad[i] = keyboard[key_map[i]];
    }
}
Hint 6: Complete Instruction Implementation Checklist
Track your progress implementing each instruction:
SYSTEM:
[ ] 00E0 - CLS
[ ] 00EE - RET
FLOW CONTROL:
[ ] 1NNN - JP addr
[ ] 2NNN - CALL addr
[ ] 3XNN - SE Vx, byte
[ ] 4XNN - SNE Vx, byte
[ ] 5XY0 - SE Vx, Vy
[ ] 9XY0 - SNE Vx, Vy
[ ] BNNN - JP V0, addr
REGISTER:
[ ] 6XNN - LD Vx, byte
[ ] 7XNN - ADD Vx, byte
[ ] 8XY0 - LD Vx, Vy
[ ] 8XY1 - OR Vx, Vy
[ ] 8XY2 - AND Vx, Vy
[ ] 8XY3 - XOR Vx, Vy
[ ] 8XY4 - ADD Vx, Vy (with carry)
[ ] 8XY5 - SUB Vx, Vy (with borrow)
[ ] 8XY6 - SHR Vx
[ ] 8XY7 - SUBN Vx, Vy
[ ] 8XYE - SHL Vx
[ ] CXNN - RND Vx, byte
MEMORY:
[ ] ANNN - LD I, addr
[ ] FX1E - ADD I, Vx
[ ] FX29 - LD F, Vx
[ ] FX33 - LD B, Vx (BCD)
[ ] FX55 - LD [I], Vx
[ ] FX65 - LD Vx, [I]
DISPLAY:
[ ] DXYN - DRW Vx, Vy, n
KEYBOARD:
[ ] EX9E - SKP Vx
[ ] EXA1 - SKNP Vx
[ ] FX0A - LD Vx, K (wait for key)
TIMERS:
[ ] FX07 - LD Vx, DT
[ ] FX15 - LD DT, Vx
[ ] FX18 - LD ST, Vx
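The arithmetic entries in the checklist (8XY4, 8XY5, 8XY6) are where VF handling most often goes wrong. The following is a minimal sketch, not the reference implementation: the function names are hypothetical, V is assumed to be a plain 16-byte register array, and the shift uses the modern in-place behavior. It writes VF last, so the flag is what remains when Vx happens to be VF itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers; V is assumed to be the 16-entry register file. */

/* 8XY4: Vx += Vy, VF = 1 if the sum does not fit in 8 bits. */
void op_add(uint8_t V[16], uint8_t x, uint8_t y) {
    assert(x < 16 && y < 16);
    uint16_t sum = (uint16_t)(V[x] + V[y]);
    V[x] = (uint8_t)(sum & 0xFF);
    V[0xF] = (sum > 0xFF) ? 1 : 0;  /* flag written after the result */
}

/* 8XY5: Vx -= Vy, VF = 1 when there is NO borrow (Vx >= Vy). */
void op_sub(uint8_t V[16], uint8_t x, uint8_t y) {
    assert(x < 16 && y < 16);
    uint8_t no_borrow = (V[x] >= V[y]) ? 1 : 0;
    V[x] = (uint8_t)(V[x] - V[y]);  /* wraps modulo 256 */
    V[0xF] = no_borrow;
}

/* 8XY6 (modern interpretation): Vx >>= 1, VF = the bit shifted out. */
void op_shr(uint8_t V[16], uint8_t x) {
    assert(x < 16);
    uint8_t lsb = V[x] & 0x01;
    V[x] = (uint8_t)(V[x] >> 1);
    V[0xF] = lsb;
}
```

Capturing the flag in a local before overwriting Vx avoids reading a register that the same instruction has already modified.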
5.8 The Interview Questions They’ll Ask
Basic Understanding
- “What is the fetch-decode-execute cycle?”
- Good Answer: CPU reads instruction bytes from memory (fetch), interprets the opcode and operands (decode), then performs the operation (execute). This repeats continuously.
- “How did you decode CHIP-8 opcodes?”
- Good Answer: Used bitmasking and shifting. First nibble (opcode >> 12) determines instruction type, then extract X, Y, N, NN, NNN using masks like 0x0F00, 0x00FF.
- “Why does CHIP-8 use XOR for drawing sprites?”
- Good Answer: XOR allows toggling pixels on/off, and detecting collisions (if a pixel goes from 1 to 0). It also enables erasing by drawing the same sprite again.
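The decoding answer above can be made concrete with a small sketch. The struct and function names here are hypothetical; the masks and shifts are the ones used in Hint 3.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the fields packed into one 16-bit opcode. */
typedef struct {
    uint8_t  op;   /* top nibble: instruction family */
    uint8_t  x;    /* second nibble: register index Vx */
    uint8_t  y;    /* third nibble: register index Vy */
    uint8_t  n;    /* lowest nibble: 4-bit immediate */
    uint8_t  nn;   /* low byte: 8-bit immediate */
    uint16_t nnn;  /* low 12 bits: address */
} DecodedOpcode;

DecodedOpcode decode_opcode(uint16_t opcode) {
    DecodedOpcode d;
    d.op  = (uint8_t)(opcode >> 12);
    d.x   = (uint8_t)((opcode >> 8) & 0x0F);
    d.y   = (uint8_t)((opcode >> 4) & 0x0F);
    d.n   = (uint8_t)(opcode & 0x000F);
    d.nn  = (uint8_t)(opcode & 0x00FF);
    d.nnn = (uint16_t)(opcode & 0x0FFF);
    return d;
}

/* Usage check: 0x8124 is 8XY4 (ADD Vx, Vy) with x = 1 and y = 2. */
void decode_opcode_check(void) {
    DecodedOpcode d = decode_opcode(0x8124);
    assert(d.op == 0x8 && d.x == 1 && d.y == 2 && d.n == 4);
    assert(d.nn == 0x24 && d.nnn == 0x124);
}
```

All fields are extracted unconditionally; each instruction then reads only the ones its encoding defines.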
Technical Details
- “How did you handle the timing difference between CPU speed and timer speed?”
- Good Answer: Timer updates at fixed 60Hz, but instructions run faster (10-12 per timer tick). Main loop runs at 60Hz, executes multiple cycles, then decrements timers once.
- “What’s special about register VF?”
- Good Answer: It’s a flag register. Many instructions set VF: carry for ADD, NOT borrow for SUB, collision for DRAW, LSB/MSB for shifts. Never use it for general data.
- “How does CALL/RET work?”
- Good Answer: CALL pushes current PC to stack, increments SP, sets PC to new address. RET decrements SP, pops address from stack back into PC.
- “What happens on a sprite collision in CHIP-8?”
- Good Answer: When XORing a sprite, if any pixel goes from ON to OFF (was 1, XOR with 1 = 0), VF is set to 1. Games use this for collision detection.
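The timing answer above relies on a once-per-frame timer update, which Hint 1 calls as chip8_update_timers. A minimal sketch of what such an update might do follows. The Timers struct, the delay_timer field name, and the timers_tick function are assumptions for illustration; your Chip8 struct may lay these out differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two timer fields of the emulator state. */
typedef struct {
    uint8_t delay_timer;
    uint8_t sound_timer;
} Timers;

/* Call exactly once per 60 Hz frame, after the batch of CPU cycles.
 * Returns 1 if the buzzer should sound during this tick, else 0. */
int timers_tick(Timers *t) {
    assert(t);
    int sound_on = t->sound_timer > 0;
    if (t->delay_timer > 0) t->delay_timer--;  /* stops at 0, never wraps */
    if (t->sound_timer > 0) t->sound_timer--;
    return sound_on;
}
```

With this shape, a delay timer loaded with 60 reaches zero after 60 ticks, which is about one second at 60 Hz.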
Problem-Solving
- “Your emulator runs but games don’t work correctly. How do you debug?”
- Good Answer: Add trace output (print PC, opcode, register state after each instruction). Use test ROMs designed to verify specific instructions. Compare trace output with known-good emulators.
- “What quirks did you encounter with CHIP-8?”
- Good Answer: Shift instructions (8XY6/8XYE) have two interpretations - original vs modern. Load/store (FX55/FX65) may or may not increment I. Some games expect one behavior, some expect the other.
- “How would you extend this to emulate a more complex system like the Game Boy?”
- Good Answer: Similar structure but more components: multiple memory regions with banking, a PPU for graphics (tile-based backgrounds plus hardware sprites), interrupts, more complex timing. Same fetch-decode-execute core though.
5.9 Books That Will Help
| Topic | Book | Specific Chapter | Why It Helps |
|---|---|---|---|
| Instruction Execution | “Computer Organization and Design” - Patterson & Hennessy | Chapter 4: The Processor | Understanding datapath and control |
| Opcode Encoding | “Computer Organization and Design” - Patterson & Hennessy | Chapter 2.5: Representing Instructions | How instructions are encoded in bits |
| Register Files | “Computer Organization and Design” - Patterson & Hennessy | Chapter 4.2: Logic Design Conventions | How registers are organized |
| Memory-Mapped I/O | “Computer Organization and Design” - Patterson & Hennessy | Chapter 5.2: Memory Technology | How hardware accesses memory |
| Control Flow | “Computer Systems: A Programmer’s Perspective” - Bryant & O’Hallaron | Chapter 3.6: Control | Jumps, branches, calls at machine level |
| The Stack | “Computer Systems: A Programmer’s Perspective” - Bryant & O’Hallaron | Chapter 3.7: Procedures | How stack supports function calls |
Online Resources:
- Cowgod’s CHIP-8 Technical Reference - THE definitive spec
- How to Write an Emulator (CHIP-8) - Laurence Muller’s tutorial
- CHIP-8 Test ROMs - Modern test suite
5.10 Implementation Phases
Phase 1: Skeleton (2-3 hours)
- Create project structure
- Define Chip8 struct
- Implement chip8_init() and chip8_load_rom()
- Open window (blank display)
- Milestone: Window opens, ROM loads without crashing
Phase 2: Basic Instructions (4-6 hours)
- Implement fetch-decode-execute loop
- Implement 00E0 (CLS), 1NNN (JP), 6XNN (LD)
- Add debug output (print each instruction)
- Milestone: Can trace through simple programs
Phase 3: Core Instructions (4-6 hours)
- Implement all register operations (8XY0-8XYE)
- Implement skip instructions (3XNN, 4XNN, 5XY0, 9XY0)
- Implement ADD, CALL, RET
- Milestone: Test ROM instructions work
Phase 4: Display (3-4 hours)
- Implement DXYN (draw sprite)
- Load fontset
- Implement FX29 (font address)
- Milestone: Sprites render correctly
Phase 5: Input & Timers (2-3 hours)
- Implement keyboard mapping
- Implement EX9E, EXA1, FX0A
- Implement timer instructions and 60Hz update
- Milestone: Input works, timers countdown
Phase 6: Remaining Instructions (2-3 hours)
- Implement FX33 (BCD), FX55, FX65
- Implement CXNN (random)
- Handle edge cases
- Milestone: All instructions implemented
Phase 7: Polish & Testing (2-4 hours)
- Run test ROMs, fix bugs
- Run games (Pong, Tetris, etc.)
- Add sound
- Optimize display rendering
- Milestone: Games playable!
5.11 Key Implementation Decisions
- Display storage: Use a 1D array uint8_t display[64*32] indexed as y*64+x. Simpler than a 2D array.
- Timing approach: Run the main loop at 60Hz (16.67ms per frame). Executing 10-12 instructions per frame gives an effective CPU speed of roughly 600-720Hz.
- Quirks mode: Implement configurable behavior for controversial instructions (8XY6, 8XYE, FX55, FX65). Default to modern behavior.
- PC increment timing: Increment PC immediately after fetch, before execute. Makes branching simpler (just set PC = target, don’t need to subtract 2).
- Error handling: Print unknown opcodes but continue running. Some ROMs have garbage data that should be skipped.
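The quirks-mode decision above could be sketched as a small settings struct that the affected instructions consult. Everything named here (Quirks, op_shr_q, op_store_q, the flag names) is hypothetical. The two behaviors shown are the commonly documented ones: the original interpreter shifted Vy into Vx and left I advanced after FX55, while most modern ROMs expect an in-place shift and an unchanged I.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical quirk switches; false/false is the "modern" default. */
typedef struct {
    bool shift_uses_vy;     /* 8XY6/8XYE read Vy instead of Vx */
    bool load_store_inc_i;  /* FX55/FX65 leave I = I + x + 1 */
} Quirks;

/* 8XY6 under either interpretation. */
void op_shr_q(uint8_t V[16], uint8_t x, uint8_t y, const Quirks *q) {
    assert(x < 16 && y < 16);
    uint8_t src = q->shift_uses_vy ? V[y] : V[x];
    V[x] = (uint8_t)(src >> 1);
    V[0xF] = src & 0x01;  /* bit shifted out, written last */
}

/* FX55: store V0..Vx at I; memory is assumed to be 4096 bytes. */
void op_store_q(uint8_t *memory, uint16_t *I, const uint8_t V[16],
                uint8_t x, const Quirks *q) {
    assert(x < 16);
    for (uint8_t i = 0; i <= x; i++) {
        memory[(*I + i) & 0x0FFF] = V[i];
    }
    if (q->load_store_inc_i) {
        *I = (uint16_t)(*I + x + 1);
    }
}
```

Passing the settings by pointer keeps the instruction code identical for both modes, so the same ROM can be run under different settings without recompiling.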
6. Testing Strategy
6.1 Unit Testing
Test individual instructions in isolation:
#include <assert.h>
#include <stdio.h>
#include "chip8.h"

// Test: 6XNN - LD Vx, byte
void test_ld_vx_byte(void) {
    Chip8 chip8;
    chip8_init(&chip8);
    // Set up memory: 6505 = LD V5, 0x05
    chip8.memory[0x200] = 0x65;
    chip8.memory[0x201] = 0x05;
    chip8_cycle(&chip8);
    assert(chip8.V[5] == 0x05);
    assert(chip8.PC == 0x202);
    printf("PASS: LD Vx, byte\n");
}

// Test: 8XY4 - ADD with carry
void test_add_with_carry(void) {
    Chip8 chip8;
    chip8_init(&chip8);
    chip8.V[0] = 0xFF;
    chip8.V[1] = 0x02;
    // 8014 = ADD V0, V1
    chip8.memory[0x200] = 0x80;
    chip8.memory[0x201] = 0x14;
    chip8_cycle(&chip8);
    assert(chip8.V[0] == 0x01); // 0xFF + 0x02 = 0x101, truncated to 0x01
    assert(chip8.V[0xF] == 1); // Carry set
    printf("PASS: ADD with carry\n");
}
6.2 Integration Testing with Test ROMs
Use standardized test ROMs to verify your implementation:
# Corax89's CHIP-8 test ROM
$ ./chip8 roms/test_opcode.ch8
# Should display "OK" for each instruction group
# BC_test (comprehensive)
$ ./chip8 roms/BC_test.ch8
# All patterns should render correctly
# Timendus test suite
$ ./chip8 roms/chip8-test-suite.ch8
# Modern comprehensive tests
6.3 Debugging Techniques
// Add trace mode for debugging
void chip8_cycle_debug(Chip8 *chip8) {
uint16_t opcode = (chip8->memory[chip8->PC] << 8) |
chip8->memory[chip8->PC + 1];
printf("PC=%04X: %04X ", chip8->PC, opcode);
// Decode and print mnemonic
switch (opcode & 0xF000) {
case 0x0000:
if (opcode == 0x00E0) printf("CLS");
else if (opcode == 0x00EE) printf("RET");
break;
case 0x1000: printf("JP %03X", opcode & 0x0FFF); break;
case 0x6000: printf("LD V%X, %02X", (opcode>>8)&0xF, opcode&0xFF); break;
// ... etc
}
printf(" | V0=%02X V1=%02X ... VF=%02X | I=%04X | SP=%d\n",
chip8->V[0], chip8->V[1], chip8->V[0xF], chip8->I, chip8->SP);
// Execute
chip8->PC += 2;
// ... rest of cycle
}
Debug checklist for common issues:
- Is PC incrementing correctly (by 2)?
- Is opcode being fetched in correct byte order?
- Are you using the correct x and y from the opcode?
- Is VF being set by all instructions that should set it?
- Is the stack growing/shrinking correctly?
7. Common Pitfalls & Debugging
Problem 1: Nothing displays on screen
- Root Cause: DXYN not implemented, or display not being rendered
- Fix: Add debug print in DXYN, verify sprite bytes are non-zero
- Quick Test: Load font sprite at (0,0) - should show character
Problem 2: Sprites are garbled or in wrong position
- Root Cause: Incorrect bit extraction in DXYN, or wrong coordinate wrapping
- Fix: Use (0x80 >> col) for bit check, use % 64 and % 32 for wrapping
- Quick Test: Draw at (0,0), should appear in top-left corner
Problem 3: Games don’t respond to input
- Root Cause: Keypad not updating, or key mapping wrong
- Fix: Print keypad state, verify SDL key events are processed
- Quick Test: Add debug output in key handling
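A related input bug is FX0A (wait for key) blocking the whole main loop, which also stops SDL event polling. One common approach, sketched here with a hypothetical helper name, is to let the instruction repeat itself: if no key is down, move PC back by 2 so the same opcode is fetched again on the next cycle. This assumes PC was already incremented after fetch, as in Hint 3. It stores the key on press; some interpreters wait for the key to be released instead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical FX0A helper. keypad[k] is nonzero while key k is held.
 * Returns 1 and stores the key index in *vx if a key is down;
 * otherwise rewinds *pc so the instruction runs again next cycle. */
int op_wait_key(const uint8_t keypad[16], uint8_t *vx, uint16_t *pc) {
    assert(keypad && vx && pc);
    for (uint8_t k = 0; k < 16; k++) {
        if (keypad[k]) {
            *vx = k;
            return 1;
        }
    }
    *pc = (uint16_t)(*pc - 2);  /* undo the post-fetch increment */
    return 0;
}
```

Because the main loop keeps running between retries, input handling and timers continue to work while the program waits.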
Problem 4: Jumps go to wrong addresses
- Root Cause: PC incremented after setting it in JP/CALL
- Fix: Increment PC before execute (at start of cycle)
- Quick Test: Simple loop should repeat, not advance
Problem 5: CALL/RET crashes or loops infinitely
- Root Cause: Stack overflow (SP not bounded), or wrong PC saved
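- Fix: In CALL, push the already-incremented PC (the return address), then increment SP. In RET, decrement SP first, then load PC from the stack. Bounds-check SP in both.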
- Quick Test: Simple subroutine call and return
Problem 6: Games run too fast or too slow
- Root Cause: Timing not properly controlled
- Fix: Use SDL_Delay or proper frame limiting
- Quick Test: Timer countdown should take ~1 second for DT=60
Problem 7: Collision detection doesn’t work
- Root Cause: VF not set when pixel flips from 1 to 0
- Fix: Check pixel BEFORE XOR, set VF if it was 1
- Quick Test: Draw same sprite twice, VF should be 1 after second draw
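The quick test above can be written as a standalone check. This sketch uses a hypothetical draw_sprite function that mirrors the DXYN logic from Hint 4 but returns the collision flag instead of writing VF. It assumes the 64x32 one-byte-per-pixel display layout and non-negative coordinates.

```c
#include <assert.h>
#include <stdint.h>

#define DISPLAY_W 64
#define DISPLAY_H 32

/* Hypothetical helper: XOR a sprite of `rows` bytes onto the display.
 * Returns 1 if any lit pixel was turned off (collision), else 0. */
uint8_t draw_sprite(uint8_t display[DISPLAY_W * DISPLAY_H],
                    const uint8_t *sprite, int rows, int x0, int y0) {
    assert(rows >= 0 && x0 >= 0 && y0 >= 0);
    uint8_t collision = 0;
    for (int row = 0; row < rows; row++) {
        for (int col = 0; col < 8; col++) {
            if ((sprite[row] & (0x80 >> col)) == 0) continue;
            int idx = ((y0 + row) % DISPLAY_H) * DISPLAY_W
                    + ((x0 + col) % DISPLAY_W);
            if (display[idx]) collision = 1;  /* read BEFORE the XOR */
            display[idx] ^= 1;
        }
    }
    return collision;
}
```

Drawing the font glyph for 0 (bytes F0 90 90 90 F0) twice at the same position should return 0 the first time and 1 the second time, and leave the display blank again.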
Problem 8: BCD (FX33) produces wrong digits
- Root Cause: Division/modulo order wrong
- Fix: mem[I] = val/100; mem[I+1] = (val/10)%10; mem[I+2] = val%10
- Quick Test: FX33 with V0=123 should store 1, 2, 3
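The BCD fix above, written out as a function. The name bcd_store is hypothetical, and the address mask assumes the standard 4096-byte memory so that a large I cannot write out of bounds.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical FX33 helper: store the decimal digits of `value`
 * at memory[I], memory[I+1], memory[I+2] (hundreds, tens, ones). */
void bcd_store(uint8_t *memory, uint16_t I, uint8_t value) {
    assert(memory);
    memory[(I + 0) & 0x0FFF] = value / 100;        /* hundreds */
    memory[(I + 1) & 0x0FFF] = (value / 10) % 10;  /* tens */
    memory[(I + 2) & 0x0FFF] = value % 10;         /* ones */
}
```

For example, a value of 123 stores 1, 2, 3 and a value of 7 stores 0, 0, 7.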
8. Extensions & Challenges
Extension 1: Debug Mode
- Add real-time register display
- Step through instructions one at a time
- Set breakpoints at specific addresses
- Inspect memory contents
Extension 2: Assembler
- Write an assembler that converts CHIP-8 assembly to binary
- Support labels and expressions
- Create your own games!
Extension 3: Disassembler
- Convert ROM binary back to assembly
- Show control flow graph
- Identify subroutines
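A disassembler can start as a function that formats one opcode into a string. The sketch below covers only a handful of instructions and falls back to a raw data word for everything else. The function name, the output syntax, and the DW fallback are illustrative choices, not a standard.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical partial disassembler: writes a mnemonic for `opcode`
 * into `out` and returns what snprintf returned. */
int disassemble(uint16_t opcode, char *out, size_t out_size) {
    assert(out && out_size > 0);
    unsigned nnn = opcode & 0x0FFFu;
    unsigned nn  = opcode & 0x00FFu;
    unsigned n   = opcode & 0x000Fu;
    unsigned x   = (opcode >> 8) & 0x0Fu;
    unsigned y   = (opcode >> 4) & 0x0Fu;

    switch (opcode & 0xF000) {
        case 0x0000:
            if (opcode == 0x00E0) return snprintf(out, out_size, "CLS");
            if (opcode == 0x00EE) return snprintf(out, out_size, "RET");
            break;
        case 0x1000: return snprintf(out, out_size, "JP 0x%03X", nnn);
        case 0x2000: return snprintf(out, out_size, "CALL 0x%03X", nnn);
        case 0x3000: return snprintf(out, out_size, "SE V%X, 0x%02X", x, nn);
        case 0x6000: return snprintf(out, out_size, "LD V%X, 0x%02X", x, nn);
        case 0x7000: return snprintf(out, out_size, "ADD V%X, 0x%02X", x, nn);
        case 0xA000: return snprintf(out, out_size, "LD I, 0x%03X", nnn);
        case 0xD000: return snprintf(out, out_size, "DRW V%X, V%X, %u", x, y, n);
        default: break;
    }
    /* Not decoded (yet): emit the raw word so no bytes are lost. */
    return snprintf(out, out_size, "DW 0x%04X", (unsigned)opcode);
}
```

Walking a ROM two bytes at a time from 0x200 and calling this on each word gives a linear listing. Since ROMs mix code and sprite data, following JP and CALL targets gives a more accurate picture than a straight sweep.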
Extension 4: Super CHIP-8 (SCHIP)
- Extend to 128x64 resolution
- Add scroll instructions
- Implement extended instructions
Extension 5: Save States
- Serialize entire emulator state to file
- Load save states
- Implement rewind feature
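Because the whole machine state is a flat block of bytes and integers, a first save-state implementation can write it out in one piece. The sketch below assumes a hypothetical MachineState struct with no pointers in it; your own Chip8 struct may differ. The magic tag and function names are also made up for illustration. A raw struct dump like this is only readable by a build with the same compiler layout and byte order, so it suits local saves rather than a portable file format.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical pointer-free snapshot of the emulator. */
typedef struct {
    uint8_t  memory[4096];
    uint8_t  V[16];
    uint16_t I, PC;
    uint16_t stack[16];
    uint8_t  SP;
    uint8_t  delay_timer, sound_timer;
    uint8_t  display[64 * 32];
} MachineState;

static const char SAVE_MAGIC[4] = { 'C', '8', 'S', '1' };

/* Returns 0 on success, -1 on a short write. */
int state_save(const MachineState *s, FILE *f) {
    assert(s && f);
    if (fwrite(SAVE_MAGIC, 1, sizeof SAVE_MAGIC, f) != sizeof SAVE_MAGIC) return -1;
    if (fwrite(s, sizeof *s, 1, f) != 1) return -1;
    return 0;
}

/* Returns 0 on success; *s is left untouched on any failure. */
int state_load(MachineState *s, FILE *f) {
    assert(s && f);
    char magic[4];
    MachineState tmp;
    if (fread(magic, 1, sizeof magic, f) != sizeof magic) return -1;
    if (memcmp(magic, SAVE_MAGIC, sizeof magic) != 0) return -1;
    if (fread(&tmp, sizeof tmp, 1, f) != 1) return -1;
    *s = tmp;  /* commit only after a complete, valid read */
    return 0;
}
```

A rewind feature could reuse the same struct by keeping a ring buffer of recent snapshots in memory and copying one back on demand.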
Extension 6: Configurable Quirks
- Make controversial behaviors configurable
- Test same ROM with different quirk settings
- Document which games need which settings
9. Real-World Connections
Game Emulation Industry: Every game emulator (Nintendo, PlayStation, etc.) uses these same concepts. CHIP-8 is a common first project for people who go on to write emulators for more complex systems.
CPU Design: Understanding emulation helps CPU designers. You can prototype a new ISA in software before committing to silicon.
Virtualization: Virtual machines (VMware, VirtualBox) are sophisticated emulators. The concepts scale up to full x86 emulation.
Security Research: Malware analysts use emulators to safely execute suspicious code. Understanding emulation is crucial for threat analysis.
Compiler Testing: Emulators provide controlled environments for testing compiler output. You know exactly what state the machine is in.
History Preservation: The only way to play many classic games is through emulation. You’re participating in digital preservation.
10. Resources
Primary References
- Cowgod’s CHIP-8 Technical Reference - The definitive specification
- CHIP-8 Wikipedia - History and overview
- Mattmik’s CHIP-8 Mastering Guide - Deep dive
Test ROMs
- Timendus CHIP-8 Test Suite - Modern, comprehensive
- corax89 Test ROM - Classic opcode test
- CHIP-8 Archive - Huge ROM collection
Tutorials
- How to Write an Emulator (CHIP-8) - Laurence Muller
- Austin Morlan’s CHIP-8 Guide - Detailed walkthrough
- Tobias V. Langhoff’s Guide - Excellent modern guide
Reference Implementations
- Study existing emulators AFTER completing your own
- Compare your approach to others
- Learn from different coding styles
11. Self-Assessment Checklist
Before considering this project complete, verify:
- Can you explain the fetch-decode-execute cycle without notes?
- Can you decode an opcode like 0x8124 by hand?
- Can you explain why XOR drawing enables collision detection?
- Can you describe the difference between PC and I registers?
- Can you explain how CALL/RET use the stack?
- Does your emulator pass standard test ROMs?
- Can you play Pong, controlling both paddles?
- Can you play Tetris, placing and rotating pieces?
- Can you step through execution in debug mode?
- Could you implement a similar emulator for a different system?
12. Submission / Completion Criteria
Your implementation is complete when:
- All 35 CHIP-8 instructions are implemented
- Test ROMs (BC_test, corax89 test_opcode) display correctly
- Pong is playable (both paddles respond)
- Tetris is playable (pieces fall, rotate, clear lines)
- Space Invaders is playable (shoot, hit enemies)
- Timers decrement at 60Hz
- Sound plays when sound_timer > 0
- No crashes on standard ROMs
- Code is reasonably commented
- You can explain your implementation to someone else
Congratulations! You’ve built a working CPU emulator. This is a genuine accomplishment that many professional programmers never achieve. You now understand how instructions execute at a level that will inform all your future systems programming.
Proceed to Project 5 to design and implement your own ISA, or jump to Project 6 to tackle the legendary 6502 processor.