Project 4: CHIP-8 Emulator

Build a complete emulator for the CHIP-8, a 1970s virtual machine with 35 instructions, 16 registers, a 64x32 monochrome display, and hexadecimal keyboard input. Run actual games like Pong, Tetris, and Space Invaders on a CPU you built yourself.

Quick Reference

Attribute     | Value
--------------|---------------------------------------------------------------
Difficulty    | ★★★☆☆ Intermediate
Time Estimate | 1-2 weeks
Language      | C (Recommended), Rust, C++, Go
Prerequisites | Binary/hex fluency, basic C programming, understanding of fetch-decode-execute cycle
Key Topics    | Opcode decoding, register files, program counter, stack operations, memory-mapped I/O, timer interrupts, sprite graphics

1. Learning Objectives

After completing this project, you will be able to:

  • Implement the complete fetch-decode-execute cycle for a real instruction set architecture
  • Decode 2-byte opcodes and extract embedded operands using bitmasking and shifting
  • Design and implement a register file with 16 general-purpose registers
  • Manage a program counter and subroutine stack for control flow
  • Implement hardware timers that decrement at a fixed frequency
  • Draw sprites using XOR-based graphics (the same technique used in early video games)
  • Read and interpret official technical specifications for a CPU
  • Debug CPU emulation issues systematically using state inspection
  • Understand why emulators are structured the way they are

2. Theoretical Foundation

2.1 What is CHIP-8?

CHIP-8 is not a physical CPU - it’s a virtual machine specification created in the mid-1970s by Joseph Weisbecker for the COSMAC VIP computer. It was designed to make game programming easier by providing a higher-level interface than raw machine code.

The CHIP-8 Ecosystem:

    1970s Physical Hardware                Modern Emulation
    ┌─────────────────────────┐           ┌─────────────────────────┐
    │  COSMAC VIP / TELMAC   │           │    Your Computer        │
    │  ┌───────────────────┐ │           │  ┌───────────────────┐  │
    │  │   CDP1802 CPU     │ │           │  │   x86/ARM CPU     │  │
    │  │   (Hardware)      │ │           │  │   (Hardware)      │  │
    │  └─────────┬─────────┘ │           │  └─────────┬─────────┘  │
    │            │           │           │            │            │
    │  ┌─────────▼─────────┐ │           │  ┌─────────▼─────────┐  │
    │  │ CHIP-8 Interpreter│ │           │  │ YOUR EMULATOR     │  │
    │  │   (Software)      │ │           │  │   (Software)      │  │
    │  └─────────┬─────────┘ │           │  └─────────┬─────────┘  │
    │            │           │           │            │            │
    │  ┌─────────▼─────────┐ │           │  ┌─────────▼─────────┐  │
    │  │   CHIP-8 ROM      │ │           │  │   CHIP-8 ROM      │  │
    │  │  (Pong, Tetris)   │ │           │  │  (Same exact ROM) │  │
    │  └───────────────────┘ │           │  └───────────────────┘  │
    └─────────────────────────┘           └─────────────────────────┘

    You are building the middle layer - the interpreter that makes
    CHIP-8 programs run on modern hardware.

2.2 Why CHIP-8 is the Perfect First Emulation Project

CHIP-8 is the “Hello World” of CPU emulation for several reasons:

  1. Simple but complete: Only 35 instructions, but covers all fundamental concepts
  2. Well-documented: Cowgod’s Technical Reference is the gold standard spec
  3. Testable: Hundreds of test ROMs and games available
  4. Visual feedback: You immediately see if your emulator works (games play!)
  5. Real hardware concepts: Registers, stack, timers, memory-mapped I/O - all present

CHIP-8 Complexity Comparison:

CPU                | Instructions | Registers | Addressing Modes | Time to Implement
-------------------|--------------|-----------|------------------|------------------
CHIP-8             | 35           | 16        | 2                | 1-2 weeks
Game Boy (LR35902) | 500+         | 8+        | 8+               | 2-3 months
6502 (NES/C64)     | 56           | 5         | 13               | 1 month
x86-64             | 1000+        | 16+       | 10+              | Years (never done)

CHIP-8 is the on-ramp to CPU emulation.

2.3 The CHIP-8 Architecture

CHIP-8 System Architecture:

┌─────────────────────────────────────────────────────────────────────────┐
│                           CHIP-8 VIRTUAL MACHINE                         │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │                          MEMORY (4KB)                            │    │
│  │  ┌──────────────┬──────────────────────────────────────────┐    │    │
│  │  │ 0x000-0x1FF  │  Reserved (Interpreter / Font sprites)   │    │    │
│  │  ├──────────────┼──────────────────────────────────────────┤    │    │
│  │  │ 0x200-0xFFF  │  Program ROM & Working RAM               │    │    │
│  │  └──────────────┴──────────────────────────────────────────┘    │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                                                                          │
│  ┌─────────────────────┐    ┌─────────────────────────────────────┐    │
│  │     REGISTERS       │    │           STACK (16 levels)         │    │
│  │  ┌───┬───┬───┬───┐  │    │  ┌───┬───┬───┬───┬───┬───┬───┬───┐ │    │
│  │  │V0 │V1 │V2 │V3 │  │    │  │ 0 │ 1 │ 2 │ 3 │...│13 │14 │15 │ │    │
│  │  ├───┼───┼───┼───┤  │    │  └───┴───┴───┴───┴───┴───┴───┴───┘ │    │
│  │  │V4 │V5 │V6 │V7 │  │    │           ▲                         │    │
│  │  ├───┼───┼───┼───┤  │    │           │ SP (Stack Pointer)     │    │
│  │  │V8 │V9 │VA │VB │  │    └───────────┴─────────────────────────┘    │
│  │  ├───┼───┼───┼───┤  │                                                │
│  │  │VC │VD │VE │VF │  │    ┌─────────────────────────────────────┐    │
│  │  └───┴───┴───┴───┘  │    │       SPECIAL REGISTERS             │    │
│  │   (8-bit each)      │    │  ┌──────┐ ┌──────┐ ┌──────┐         │    │
│  │   VF = Carry flag   │    │  │  PC  │ │  I   │ │  SP  │         │    │
│  └─────────────────────┘    │  │16-bit│ │16-bit│ │ 8-bit│         │    │
│                             │  └──────┘ └──────┘ └──────┘         │    │
│  ┌─────────────────────┐    │  Program   Index   Stack            │    │
│  │      TIMERS         │    │  Counter   Reg     Pointer          │    │
│  │  ┌──────┐ ┌──────┐  │    └─────────────────────────────────────┘    │
│  │  │ DT   │ │ ST   │  │                                                │
│  │  │Delay │ │Sound │  │    ┌─────────────────────────────────────┐    │
│  │  │Timer │ │Timer │  │    │    DISPLAY (64 x 32 monochrome)     │    │
│  │  └──────┘ └──────┘  │    │  ┌─────────────────────────────────┐ │    │
│  │  (8-bit each,       │    │  │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │ │    │
│  │   decrement @60Hz)  │    │  │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │ │    │
│  └─────────────────────┘    │  │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │ │    │
│                             │  └─────────────────────────────────┘ │    │
│  ┌─────────────────────┐    └─────────────────────────────────────┘    │
│  │    KEYBOARD         │                                                │
│  │  ┌─┬─┬─┬─┐          │    ┌─────────────────────────────────────┐    │
│  │  │1│2│3│C│          │    │         INPUT HANDLING              │    │
│  │  ├─┼─┼─┼─┤          │    │  • 16-key hexadecimal keypad        │    │
│  │  │4│5│6│D│          │    │  • Mapped to modern keyboard        │    │
│  │  ├─┼─┼─┼─┤          │    │  • Blocking wait for key press      │    │
│  │  │7│8│9│E│          │    └─────────────────────────────────────┘    │
│  │  ├─┼─┼─┼─┤          │                                                │
│  │  │A│0│B│F│          │                                                │
│  │  └─┴─┴─┴─┘          │                                                │
│  └─────────────────────┘                                                │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
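The 16-level stack and its pointer in the diagram above only ever hold return addresses, so CALL and RET reduce to a bounded push and pop. The following is a minimal sketch; the `CallStack` type and the `stack_push` / `stack_pop` names are illustrative assumptions, and reporting overflow through a `false` return is one possible policy, not something the CHIP-8 specification prescribes.

```c
#include <stdbool.h>
#include <stdint.h>

#define STACK_DEPTH 16

/* Hypothetical container for the call stack (names are illustrative). */
typedef struct {
    uint16_t slots[STACK_DEPTH]; /* saved return addresses */
    uint8_t  sp;                 /* number of slots in use, 0..16 */
} CallStack;

/* Used by CALL (2NNN): save the return address.
   Returns false instead of writing past the 16th level. */
static bool stack_push(CallStack *s, uint16_t return_addr) {
    if (s->sp >= STACK_DEPTH) return false;
    s->slots[s->sp++] = return_addr;
    return true;
}

/* Used by RET (00EE): fetch the most recent return address.
   Returns false if the stack is empty. */
static bool stack_pop(CallStack *s, uint16_t *return_addr) {
    if (s->sp == 0) return false;
    *return_addr = s->slots[--s->sp];
    return true;
}
```

A failed push or pop usually means the ROM (or the emulator) has a bug, so logging the PC and halting is a reasonable reaction while debugging.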

2.4 Memory Map

CHIP-8 Memory Layout (4KB = 4096 bytes):

Address     Size    Purpose
──────────────────────────────────────────────────────────
0x000      80      Built-in font sprites (0-F)
   │        bytes   Each character is 5 bytes (4x5 glyph, one 8-bit row per byte)
   │
0x050      432     Reserved / unused
   │        bytes   (Some interpreters store the font at 0x050-0x09F instead)
   │
0x1FF      ────    End of interpreter area
                   ─────────────────────────────────────────
0x200              ◄── Programs loaded here (START ADDRESS)
   │                   This is where your ROM loads
   │                   This is where PC starts
   │
   │               Program code and data occupy this area
   │               Variables stored here too
   │
0xFFF              ◄── End of memory (4095)
──────────────────────────────────────────────────────────

Memory Layout Diagram:

0x000 ┌────────────────────────────────┐
      │     Font Sprites (0-F)         │  80 bytes
      │     '0' = F0 90 90 90 F0       │
      │     '1' = 20 60 20 20 70       │
      │     ...                        │
0x050 ├────────────────────────────────┤
      │     (Reserved/Unused)          │  432 bytes
      │                                │
0x200 ├────────────────────────────────┤ ◄── PROGRAM START
      │                                │
      │     Your CHIP-8 ROM loads      │
      │     here and executes          │
      │                                │
      │     ┌──────────────────────┐   │
      │     │ Instructions         │   │
      │     │ 2 bytes each         │   │
      │     │ 0x00E0 = CLS         │   │
      │     │ 0x1NNN = JP NNN      │   │
      │     │ ...                  │   │
      │     └──────────────────────┘   │
      │                                │
0xFFF └────────────────────────────────┘
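Since programs start at 0x200, a ROM can be at most 4096 - 512 = 3584 bytes. Below is a small sketch of the copy step with that bound checked; `load_rom_image` and the macro names are assumptions made for illustration, and reading the file from disk is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MEM_SIZE      4096
#define PROGRAM_START 0x200

/* Copy a ROM image into CHIP-8 memory at 0x200.
   Returns 0 on success, -1 if the image would run past 0xFFF. */
static int load_rom_image(uint8_t *memory, const uint8_t *rom, size_t len) {
    if (len > MEM_SIZE - PROGRAM_START) return -1;  /* more than 3584 bytes */
    memcpy(&memory[PROGRAM_START], rom, len);
    return 0;
}
```

After a successful load, PC is set to 0x200 and the first opcode sits in `memory[0x200]` and `memory[0x201]`.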

2.5 Instruction Format

All CHIP-8 instructions are exactly 2 bytes (16 bits), stored big-endian:

CHIP-8 Instruction Encoding:

  High Byte (first in memory)    Low Byte (second in memory)
  ┌───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┬───┐
  │ 15│ 14│ 13│ 12│ 11│ 10│ 9 │ 8 │ 7 │ 6 │ 5 │ 4 │ 3 │ 2 │ 1 │ 0 │
  └───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┴───┘
  │           │           │           │           │
  └─────┬─────┘     └─────┴─────┴─────┴─────┴─────┘
        │                         │
   First nibble              Various operands
   (Opcode identifier)       depending on instruction

Nibble positions (commonly used in documentation):
  Nibble 1 (bits 15-12): Primary opcode
  Nibble 2 (bits 11-8):  Usually register X
  Nibble 3 (bits 7-4):   Usually register Y
  Nibble 4 (bits 3-0):   Usually N (small constant)

Common patterns:
  NNN  = 12-bit address (bits 11-0)
  NN   = 8-bit constant (bits 7-0)
  N    = 4-bit constant (bits 3-0)
  X    = Register index (bits 11-8)
  Y    = Register index (bits 7-4)


Example: Instruction 0x6A42 (LD VA, 0x42)

  Binary: 0110 1010 0100 0010
          │    │    └────────┘
          │    │         │
          │    │         └─ NN = 0x42 (immediate value 66)
          │    │
          │    └─ X = 0xA (register VA)
          │
          └─ Opcode = 0x6 (LD Vx, byte)

  Action: Load the value 0x42 into register VA
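The field layout above maps directly onto shifts and masks. The sketch below fetches a big-endian opcode and pulls out every field at once; the `Decoded` struct and both function names are illustrative, and masking the address with 0x0FFF (so a fetch at the end of memory wraps instead of reading out of bounds) is an assumption rather than specified behavior.

```c
#include <stdint.h>

/* All operand fields of one opcode (illustrative struct). */
typedef struct {
    uint8_t  op;   /* bits 15-12: primary opcode   */
    uint8_t  x;    /* bits 11-8:  register index X */
    uint8_t  y;    /* bits 7-4:   register index Y */
    uint8_t  n;    /* bits 3-0:   4-bit constant   */
    uint8_t  nn;   /* bits 7-0:   8-bit constant   */
    uint16_t nnn;  /* bits 11-0:  12-bit address   */
} Decoded;

/* Combine two consecutive bytes, high byte first (big-endian). */
static uint16_t fetch_opcode(const uint8_t *memory, uint16_t pc) {
    uint8_t hi = memory[pc & 0x0FFF];
    uint8_t lo = memory[(pc + 1) & 0x0FFF];
    return (uint16_t)((hi << 8) | lo);
}

/* Extract every field; each instruction uses only the ones it needs. */
static Decoded decode_opcode(uint16_t opcode) {
    Decoded d;
    d.op  = (uint8_t)((opcode >> 12) & 0xF);
    d.x   = (uint8_t)((opcode >> 8) & 0xF);
    d.y   = (uint8_t)((opcode >> 4) & 0xF);
    d.n   = (uint8_t)(opcode & 0x000F);
    d.nn  = (uint8_t)(opcode & 0x00FF);
    d.nnn = (uint16_t)(opcode & 0x0FFF);
    return d;
}
```

For 0x6A42 this gives `op = 0x6`, `x = 0xA` and `nn = 0x42`, matching the breakdown above. Decoding all fields up front costs a few cycles but keeps each `case` in the execute step short.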


Example: Instruction 0x8124 (ADD V1, V2)

  Binary: 1000 0001 0010 0100
          │    │    │    │
          │    │    │    └─ N = 0x4 (selects ADD within the 8XYN group)
          │    │    │
          │    │    └─ Y = 0x2 (register V2)
          │    │
          │    └─ X = 0x1 (register V1)
          │
          └─ Opcode = 0x8 (arithmetic/logic group)

  Action: V1 = V1 + V2, VF = carry

  Note: For 8XYN the low nibble determines the operation. 0x8123 has
  the same X and Y but a low nibble of 3, so it decodes as XOR V1, V2.

2.6 The Fetch-Decode-Execute Cycle

CHIP-8 CPU Cycle:

┌─────────────────────────────────────────────────────────────────────────┐
│                         FETCH-DECODE-EXECUTE                             │
└─────────────────────────────────────────────────────────────────────────┘

                    ┌─────────────────┐
                    │   FETCH         │
                    │                 │
                    │  Read 2 bytes   │
                    │  from memory    │
                    │  at [PC]        │
                    │                 │
                    │  opcode =       │
                    │  mem[PC] << 8 | │
                    │  mem[PC+1]      │
                    │                 │
                    │  PC += 2        │
                    └────────┬────────┘
                             │
                             ▼
                    ┌─────────────────┐
                    │   DECODE        │
                    │                 │
                    │  Extract opcode │
                    │  identifier     │
                    │  (high nibble)  │
                    │                 │
                    │  Extract        │
                    │  operands:      │
                    │  X, Y, N,       │
                    │  NN, NNN        │
                    └────────┬────────┘
                             │
                             ▼
                    ┌─────────────────┐
                    │   EXECUTE       │
                    │                 │
                    │  switch(opcode) │
                    │  {              │
                    │    case 0x0:    │
                    │      ...        │
                    │    case 0x1:    │
                    │      ...        │
                    │  }              │
                    │                 │
                    │  Modify state   │
                    │  (registers,    │
                    │   memory, PC,   │
                    │   display)      │
                    └────────┬────────┘
                             │
                             ▼
                    ┌─────────────────┐
                    │   UPDATE TIMERS │
                    │   (at 60Hz)     │
                    │                 │
                    │  if (DT > 0)    │
                    │    DT--         │
                    │  if (ST > 0)    │
                    │    ST--         │
                    │    beep()       │
                    └────────┬────────┘
                             │
                             ▼
                         ┌───────┐
                         │ LOOP  │───────────┐
                         └───────┘           │
                             ▲               │
                             └───────────────┘


Timing Considerations:
─────────────────────
• Original CHIP-8 ran at ~500-700 Hz (instructions per second)
• Timers (DT and ST) decrement at exactly 60 Hz
• You need to separate instruction execution rate from timer rate
• Common approach: Execute ~10 instructions per timer tick

2.7 The Display and Sprite System

CHIP-8 Display System:

Display Dimensions: 64 pixels wide x 32 pixels tall (2048 pixels total)
Pixel State: ON (1) or OFF (0) - monochrome
Drawing Method: XOR - sprites toggle pixels

Display Coordinates:
     0                                                    63
   ┌─────────────────────────────────────────────────────────┐
 0 │ (0,0)                                           (63,0) │
   │                                                        │
   │                                                        │
   │                        SCREEN                          │
   │                                                        │
   │                                                        │
31 │ (0,31)                                         (63,31) │
   └─────────────────────────────────────────────────────────┘

Sprites are stored in memory as rows of bytes:
─────────────────────────────────────────────────
Each sprite is 8 pixels wide, 1-15 pixels tall
Each row is 1 byte (8 bits = 8 pixels)

Example: The letter 'F' (font sprite at 0x000 + 15*5 = 0x04B)

Memory:        Binary:          Visual:
0xF0    =    1111 0000    =    ████░░░░
0x80    =    1000 0000    =    █░░░░░░░
0xF0    =    1111 0000    =    ████░░░░
0x80    =    1000 0000    =    █░░░░░░░
0x80    =    1000 0000    =    █░░░░░░░
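The address arithmetic in the 'F' example generalizes to every hex digit, and the bit-to-pixel mapping can be checked by printing a row as text. This is a small sketch with assumed names (`font_sprite_addr`, `sprite_row_to_text`); it places the font at 0x000 as in this guide's memory map, and the '#' and '.' characters are just a debugging convention.

```c
#include <stdint.h>

#define FONT_BASE        0x000  /* font location used in this guide */
#define FONT_GLYPH_BYTES 5      /* each digit sprite is 5 rows tall */

/* Address of the built-in sprite for a hex digit (what FX29 puts in I).
   Only the low nibble of the digit is used. */
static uint16_t font_sprite_addr(uint8_t digit) {
    return (uint16_t)(FONT_BASE + (digit & 0x0F) * FONT_GLYPH_BYTES);
}

/* Expand one sprite row into 8 characters plus a terminator.
   Bit 7 is the leftmost pixel; '#' means on, '.' means off. */
static void sprite_row_to_text(uint8_t row, char out[9]) {
    for (int bit = 0; bit < 8; bit++) {
        out[bit] = (row & (0x80 >> bit)) ? '#' : '.';
    }
    out[8] = '\0';
}
```

Printing the five rows of a glyph this way is a quick check that the font table was typed in correctly before any graphics code exists.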


XOR Drawing Explained:
─────────────────────
When you draw a sprite, each pixel is XORed with the screen:

  Screen pixel   XOR   Sprite pixel   =   Result
  ────────────────────────────────────────────────
       0         XOR        0         =     0      (stays off)
       0         XOR        1         =     1      (turns on)
       1         XOR        0         =     1      (stays on)
       1         XOR        1         =     0      (turns OFF - collision!)

If ANY pixel turns OFF during a draw, set VF = 1 (collision detection)


Drawing Algorithm (DXYN instruction):
─────────────────────────────────────
DXYN: Draw N-byte sprite at position (VX, VY)

1. Get X coordinate from V[X] % 64 (wrap horizontally)
2. Get Y coordinate from V[Y] % 32 (wrap vertically)
3. Set VF = 0 (no collision yet)
4. For each row (0 to N-1):
   a. Read sprite byte from memory[I + row]
   b. For each bit (0 to 7):
      - If sprite bit is 1:
        - screen_x = (X + bit) % 64
        - screen_y = (Y + row) % 32
        - If screen[screen_x][screen_y] is 1:
          - VF = 1 (collision detected!)
        - screen[screen_x][screen_y] ^= 1

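Wrapping behavior note: The steps above wrap every pixel around the
screen edges. The original COSMAC VIP interpreter wrapped only the
starting coordinates and clipped the rest of the sprite at the edge,
while some ROMs expect full wrapping. Document your choice.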
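The drawing steps above translate almost line for line into C. This sketch assumes a row-major display buffer with one byte per pixel and uses the per-pixel wrapping variant from the numbered steps; `draw_sprite` and its parameters are illustrative names, and the caller is expected to pass `&memory[I]` as the sprite pointer and store the return value in VF.

```c
#include <stdint.h>

#define SCREEN_W 64
#define SCREEN_H 32

/* XOR an n-row sprite onto the display at (vx, vy), wrapping at the edges.
   display holds SCREEN_W * SCREEN_H bytes, each 0 or 1, row-major.
   Returns 1 if any lit pixel was turned off, otherwise 0. */
static uint8_t draw_sprite(uint8_t *display, const uint8_t *sprite,
                           uint8_t n, uint8_t vx, uint8_t vy) {
    uint8_t collision = 0;
    unsigned x0 = vx % SCREEN_W;
    unsigned y0 = vy % SCREEN_H;

    for (unsigned row = 0; row < n; row++) {
        uint8_t bits = sprite[row];
        for (unsigned bit = 0; bit < 8; bit++) {
            if (!(bits & (0x80 >> bit))) continue;  /* sprite pixel is 0 */
            unsigned sx = (x0 + bit) % SCREEN_W;
            unsigned sy = (y0 + row) % SCREEN_H;
            uint8_t *px = &display[sy * SCREEN_W + sx];
            if (*px) collision = 1;                 /* a lit pixel goes dark */
            *px ^= 1;
        }
    }
    return collision;
}
```

Drawing the same sprite twice at the same position erases it and reports a collision on the second call, which is how many games move objects. To get clipping instead of wrapping, the inner loop would skip pixels where `x0 + bit` or `y0 + row` leaves the screen.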


Example Draw Operation:
───────────────────────
Drawing 0xF0 at position (2, 1):

Before:                          Sprite:
0123456789...                    Bit: 01234567
┌──────────────                  0xF0: 11110000
│░░░░░░░░░░
│░░░░░░░░░░                      After XOR at (2,1):
│░░░░░░░░░░                      ┌──────────────
                                 │░░░░░░░░░░
                                 │░░████░░░░  ← Sprite drawn here
                                 │░░░░░░░░░░

2.8 Complete Instruction Set

CHIP-8 Instruction Set (35 Instructions):

┌─────────┬─────────────────────────────────────────────────────────────────┐
│ Opcode  │ Description                                                      │
├─────────┼─────────────────────────────────────────────────────────────────┤
│         │ SYSTEM INSTRUCTIONS                                              │
├─────────┼─────────────────────────────────────────────────────────────────┤
│ 0NNN    │ SYS addr - Call machine routine at NNN (ignored by modern VMs)  │
│ 00E0    │ CLS - Clear the display                                         │
│ 00EE    │ RET - Return from subroutine (pop stack into PC)                │
├─────────┼─────────────────────────────────────────────────────────────────┤
│         │ FLOW CONTROL                                                     │
├─────────┼─────────────────────────────────────────────────────────────────┤
│ 1NNN    │ JP addr - Jump to address NNN                                   │
│ 2NNN    │ CALL addr - Call subroutine at NNN (push PC, jump to NNN)       │
│ 3XNN    │ SE Vx, byte - Skip next instruction if Vx == NN                 │
│ 4XNN    │ SNE Vx, byte - Skip next instruction if Vx != NN                │
│ 5XY0    │ SE Vx, Vy - Skip next instruction if Vx == Vy                   │
│ 9XY0    │ SNE Vx, Vy - Skip next instruction if Vx != Vy                  │
│ BNNN    │ JP V0, addr - Jump to address NNN + V0                          │
├─────────┼─────────────────────────────────────────────────────────────────┤
│         │ REGISTER OPERATIONS                                              │
├─────────┼─────────────────────────────────────────────────────────────────┤
│ 6XNN    │ LD Vx, byte - Load NN into Vx                                   │
│ 7XNN    │ ADD Vx, byte - Add NN to Vx (no carry flag)                     │
│ 8XY0    │ LD Vx, Vy - Set Vx = Vy                                         │
│ 8XY1    │ OR Vx, Vy - Set Vx = Vx OR Vy                                   │
│ 8XY2    │ AND Vx, Vy - Set Vx = Vx AND Vy                                 │
│ 8XY3    │ XOR Vx, Vy - Set Vx = Vx XOR Vy                                 │
│ 8XY4    │ ADD Vx, Vy - Set Vx = Vx + Vy, VF = carry                       │
│ 8XY5    │ SUB Vx, Vy - Set Vx = Vx - Vy, VF = NOT borrow                  │
│ 8XY6    │ SHR Vx {, Vy} - Shift Vx right, VF = LSB before shift           │
│ 8XY7    │ SUBN Vx, Vy - Set Vx = Vy - Vx, VF = NOT borrow                 │
│ 8XYE    │ SHL Vx {, Vy} - Shift Vx left, VF = MSB before shift            │
│ CXNN    │ RND Vx, byte - Set Vx = random byte AND NN                      │
├─────────┼─────────────────────────────────────────────────────────────────┤
│         │ MEMORY OPERATIONS                                                │
├─────────┼─────────────────────────────────────────────────────────────────┤
│ ANNN    │ LD I, addr - Set I = NNN                                        │
│ FX1E    │ ADD I, Vx - Set I = I + Vx                                      │
│ FX29    │ LD F, Vx - Set I = location of sprite for digit Vx              │
│ FX33    │ LD B, Vx - Store BCD of Vx in I, I+1, I+2                       │
│ FX55    │ LD [I], Vx - Store V0-Vx in memory starting at I                │
│ FX65    │ LD Vx, [I] - Read V0-Vx from memory starting at I               │
├─────────┼─────────────────────────────────────────────────────────────────┤
│         │ DISPLAY                                                          │
├─────────┼─────────────────────────────────────────────────────────────────┤
│ DXYN    │ DRW Vx, Vy, N - Draw N-byte sprite at (Vx, Vy), VF = collision  │
├─────────┼─────────────────────────────────────────────────────────────────┤
│         │ KEYBOARD                                                         │
├─────────┼─────────────────────────────────────────────────────────────────┤
│ EX9E    │ SKP Vx - Skip next instruction if key Vx is pressed             │
│ EXA1    │ SKNP Vx - Skip next instruction if key Vx is NOT pressed        │
│ FX0A    │ LD Vx, K - Wait for key press, store key value in Vx            │
├─────────┼─────────────────────────────────────────────────────────────────┤
│         │ TIMERS                                                           │
├─────────┼─────────────────────────────────────────────────────────────────┤
│ FX07    │ LD Vx, DT - Set Vx = delay timer value                          │
│ FX15    │ LD DT, Vx - Set delay timer = Vx                                │
│ FX18    │ LD ST, Vx - Set sound timer = Vx                                │
└─────────┴─────────────────────────────────────────────────────────────────┘
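The flag-setting arithmetic instructions and the BCD store are where many first implementations go wrong, so here is a hedged sketch of three of them. The function names are illustrative. Writing VF after the result (so the flag wins when X is F) is an assumption that matches what common test ROMs check, not something the table above states.

```c
#include <stdint.h>

/* 8XY4: Vx = Vx + Vy, VF = 1 if the sum exceeds 255. */
static void op_add(uint8_t *V, uint8_t x, uint8_t y) {
    uint16_t sum = (uint16_t)(V[x] + V[y]);
    V[x] = (uint8_t)sum;            /* keep the low 8 bits */
    V[0xF] = (sum > 0xFF) ? 1 : 0;  /* flag written last */
}

/* 8XY5: Vx = Vx - Vy, VF = 1 if no borrow was needed (Vx >= Vy). */
static void op_sub(uint8_t *V, uint8_t x, uint8_t y) {
    uint8_t no_borrow = (V[x] >= V[y]) ? 1 : 0;
    V[x] = (uint8_t)(V[x] - V[y]);  /* wraps modulo 256 */
    V[0xF] = no_borrow;
}

/* FX33: store hundreds, tens and ones of value at I, I+1, I+2.
   The caller must ensure i + 2 is inside memory. */
static void op_bcd(uint8_t *memory, uint16_t i, uint8_t value) {
    memory[i]     = value / 100;
    memory[i + 1] = (value / 10) % 10;
    memory[i + 2] = value % 10;
}
```

For example, adding 100 to 200 leaves 44 in Vx with VF = 1, and the BCD of 254 is stored as the bytes 2, 5, 4. 8XY7 (SUBN) follows the same pattern as `op_sub` with the operands swapped.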

Note on 8XY6 and 8XYE (Shift operations):
─────────────────────────────────────────
There are two interpretations:
  Original COSMAC VIP: Vx = Vy >> 1 or Vx = Vy << 1 (uses Y)
  Modern/CHIP-48:      Vx = Vx >> 1 or Vx = Vx << 1 (ignores Y)

Most games expect the modern behavior, but some require the original.
Consider making this configurable (quirks mode).
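A quirks mode for the shifts can be as small as one boolean that selects the source register. In this sketch `shift_uses_vy` is a hypothetical flag name: true selects the original COSMAC VIP behavior and false selects the modern/CHIP-48 behavior described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* 8XY6: shift right by one, VF = bit shifted out (old LSB). */
static void op_shr(uint8_t *V, uint8_t x, uint8_t y, bool shift_uses_vy) {
    uint8_t src = shift_uses_vy ? V[y] : V[x];
    V[x] = (uint8_t)(src >> 1);
    V[0xF] = src & 0x01;          /* flag written after the result */
}

/* 8XYE: shift left by one, VF = bit shifted out (old MSB). */
static void op_shl(uint8_t *V, uint8_t x, uint8_t y, bool shift_uses_vy) {
    uint8_t src = shift_uses_vy ? V[y] : V[x];
    V[x] = (uint8_t)(src << 1);
    V[0xF] = (src >> 7) & 0x01;
}
```

The flag could come from a command-line option or a per-ROM settings table, so a game that misbehaves under one interpretation can be retried under the other without recompiling.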

2.9 Common Misconceptions

Misconception 1: “CHIP-8 was a real hardware CPU”

  • Reality: CHIP-8 was always a virtual machine interpreted by software. There was no CHIP-8 chip.

Misconception 2: “I need to emulate at exact clock speeds”

  • Reality: CHIP-8 ran on different hardware at different speeds. You just need to keep timers at 60Hz and instructions “fast enough” (typically 500-700 Hz feels right).

Misconception 3: “VF is just another general-purpose register”

  • Reality: VF is special - it’s used as a flag register by many instructions (carry, borrow, collision). Never use it for general computation.

Misconception 4: “The display is like a framebuffer I write to directly”

  • Reality: Programs can change the display only through DXYN (XOR sprite drawing) and 00E0 (clear screen). There is no instruction that sets a single pixel to a specific value.

Misconception 5: “I should use floating-point for timing”

  • Reality: Use integer counters and fixed update rates. Floating-point timing leads to subtle bugs and inconsistent behavior.
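As an illustration of integer-only timing, the sketch below accumulates elapsed microseconds scaled by 60, so the 16666.67 µs tick period never has to be rounded. `TickClock` and `ticks_due` are assumed names, and the elapsed time is expected to come from whatever monotonic clock your platform or graphics library provides.

```c
#include <stdint.h>

#define US_PER_SECOND 1000000u
#define TICK_HZ       60u

/* Carries the unconsumed fraction of a tick between calls. */
typedef struct {
    uint64_t acc;  /* elapsed microseconds * 60, modulo one second */
} TickClock;

/* Add elapsed time and return how many 60 Hz ticks are now due.
   Remainders are kept in acc, so no time is lost to rounding. */
static unsigned ticks_due(TickClock *c, uint64_t elapsed_us) {
    c->acc += elapsed_us * TICK_HZ;
    unsigned ticks = (unsigned)(c->acc / US_PER_SECOND);
    c->acc %= US_PER_SECOND;
    return ticks;
}
```

Calling this once per loop iteration and running the timer update that many times keeps DT and ST at 60 Hz on average, even if individual frames run long or short.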

3. Project Specification

3.1 What You Will Build

A complete CHIP-8 emulator that:

  1. Loads and executes any standard CHIP-8 ROM
  2. Implements all 35 instructions correctly
  3. Displays graphics in a 64x32 window (scaled up for visibility)
  4. Accepts keyboard input mapped to the CHIP-8 hex keypad
  5. Runs timers at the correct 60Hz rate
  6. Plays sound when the sound timer is active

3.2 Functional Requirements

  • Load ROM files into memory at address 0x200
  • Initialize all CHIP-8 state (registers, PC, stack, display)
  • Implement fetch-decode-execute cycle
  • Implement all 35 CHIP-8 instructions
  • 64x32 monochrome display with XOR sprite drawing
  • Keyboard input for 16 hex keys
  • Delay timer decrements at 60Hz
  • Sound timer triggers audio when > 0
  • Collision detection (VF flag) for sprite drawing

3.3 Non-Functional Requirements

  • Instruction execution at approximately 500-700 Hz (configurable)
  • Timer updates at exactly 60 Hz
  • Display should scale to a reasonable window size (at least 640x320)
  • Frame rate should be smooth (target 60 FPS for display)
  • Memory usage under 1MB
  • Should run on Linux, macOS, or Windows

3.4 Example Usage / Output

# Build the emulator
$ make

# Run a ROM
$ ./chip8 roms/PONG.ch8
┌────────────────────────────────────────────────────────────────┐
│                                                                │
│    █                           ●                           █   │
│    █                                                       █   │
│    █                                                       █   │
│    █                                                       █   │
│                                                                │
│                                                                │
│                                                                │
│                           Score: 3 - 2                         │
└────────────────────────────────────────────────────────────────┘
Controls: 1=Left Up, Q=Left Down | 4=Right Up, R=Right Down | ESC=Quit

# Run with debug output
$ ./chip8 --debug roms/test.ch8
PC=0x200: 00E0 CLS
PC=0x202: 6A02 LD VA, 0x02
PC=0x204: 6B0C LD VB, 0x0C
PC=0x206: A300 LD I, 0x300
PC=0x208: DAB5 DRW VA, VB, 5
...

# Run test ROM to verify implementation
$ ./chip8 roms/BC_test.ch8
[All test patterns should display correctly]

3.5 Real World Outcome

When complete, you will be able to:

  • Play classic games: Pong, Tetris, Space Invaders, Breakout, Brix
  • Run test ROMs that verify your instruction implementation
  • Understand emulator architecture well enough to tackle Game Boy or NES
  • Explain CPU emulation concepts in interviews
  • Have a compelling portfolio project demonstrating systems knowledge

4. Solution Architecture

4.1 High-Level Design

CHIP-8 Emulator Architecture:

┌─────────────────────────────────────────────────────────────────────────────┐
│                              YOUR EMULATOR                                   │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                           MAIN LOOP                                  │    │
│  │  ┌─────────────────────────────────────────────────────────────┐    │    │
│  │  │  while (running) {                                          │    │    │
│  │  │      handle_input();    // Check keyboard state             │    │    │
│  │  │                                                             │    │    │
│  │  │      // Run multiple instructions per frame                 │    │    │
│  │  │      for (int i = 0; i < instructions_per_frame; i++) {     │    │    │
│  │  │          emulate_cycle();  // Fetch-Decode-Execute          │    │    │
│  │  │      }                                                      │    │    │
│  │  │                                                             │    │    │
│  │  │      update_timers();   // Decrement DT and ST at 60Hz      │    │    │
│  │  │      render_display();  // Draw to window                   │    │    │
│  │  │      play_sound();      // Beep if ST > 0                   │    │    │
│  │  │      delay_for_60fps(); // Maintain timing                  │    │    │
│  │  │  }                                                          │    │    │
│  │  └─────────────────────────────────────────────────────────────┘    │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                              │
│  ┌───────────────────────────┐    ┌───────────────────────────────────┐    │
│  │     CHIP-8 STATE          │    │        INPUT HANDLER              │    │
│  │                           │    │                                   │    │
│  │  uint8_t memory[4096]     │    │  Map physical keys to CHIP-8:    │    │
│  │  uint8_t V[16]            │    │                                   │    │
│  │  uint16_t I               │    │  Physical    CHIP-8              │    │
│  │  uint16_t PC              │    │  ─────────────────────           │    │
│  │  uint8_t SP               │    │  1 2 3 4    1 2 3 C              │    │
│  │  uint16_t stack[16]       │    │  Q W E R    4 5 6 D              │    │
│  │  uint8_t delay_timer      │    │  A S D F    7 8 9 E              │    │
│  │  uint8_t sound_timer      │    │  Z X C V    A 0 B F              │    │
│  │  uint8_t display[64*32]   │    │                                   │    │
│  │  uint8_t keypad[16]       │    │  bool keypad[16]                 │    │
│  │                           │    │                                   │    │
│  └───────────────────────────┘    └───────────────────────────────────┘    │
│                                                                              │
│  ┌───────────────────────────┐    ┌───────────────────────────────────┐    │
│  │    INSTRUCTION DECODER    │    │        DISPLAY RENDERER           │    │
│  │                           │    │                                   │    │
│  │  uint16_t opcode =        │    │  Scale 64x32 to window size      │    │
│  │    mem[PC] << 8 |         │    │  Draw rectangles for each pixel  │    │
│  │    mem[PC+1];             │    │  Use SDL2, raylib, or similar    │    │
│  │                           │    │                                   │    │
│  │  Extract:                 │    │  for (y = 0; y < 32; y++)        │    │
│  │    nnn = opcode & 0x0FFF  │    │    for (x = 0; x < 64; x++)      │    │
│  │    nn  = opcode & 0x00FF  │    │      if (display[y*64+x])        │    │
│  │    n   = opcode & 0x000F  │    │        draw_rect(x*scale,        │    │
│  │    x   = (opcode>>8)&0xF  │    │                 y*scale,         │    │
│  │    y   = (opcode>>4)&0xF  │    │                 scale, scale)    │    │
│  │                           │    │                                   │    │
│  └───────────────────────────┘    └───────────────────────────────────┘    │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
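The physical-to-CHIP-8 key table in the input handler box can be expressed as a pair of parallel arrays. This sketch works on plain characters so it stays independent of any graphics library; `map_key` is an illustrative name, and a real input handler would translate its library's key codes to characters (or to array indices) first.

```c
#include <stdint.h>

/* Map a physical key character to a CHIP-8 key value (0x0-0xF).
   Returns -1 for keys that are not part of the keypad layout.
   Layout: 1234 / QWER / ASDF / ZXCV -> 123C / 456D / 789E / A0BF. */
static int map_key(char physical) {
    static const char keys[16] = {
        '1', '2', '3', '4',
        'q', 'w', 'e', 'r',
        'a', 's', 'd', 'f',
        'z', 'x', 'c', 'v'
    };
    static const uint8_t chip8[16] = {
        0x1, 0x2, 0x3, 0xC,
        0x4, 0x5, 0x6, 0xD,
        0x7, 0x8, 0x9, 0xE,
        0xA, 0x0, 0xB, 0xF
    };

    /* Accept upper-case letters as well. */
    if (physical >= 'A' && physical <= 'Z') {
        physical = (char)(physical - 'A' + 'a');
    }
    for (int i = 0; i < 16; i++) {
        if (keys[i] == physical) return chip8[i];
    }
    return -1;
}
```

On a key-down event the handler would set `keypad[map_key(ch)]` to 1, and on key-up back to 0, after checking that the result is not -1.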

4.2 Key Components

Component Breakdown:

┌──────────────────────────────────────────────────────────────────┐
│                        SOURCE FILES                               │
├──────────────────────────────────────────────────────────────────┤
│                                                                   │
│  main.c                                                           │
│  ├── Parse command line arguments                                 │
│  ├── Initialize SDL/graphics library                              │
│  ├── Load ROM file                                                │
│  ├── Run main loop                                                │
│  └── Cleanup on exit                                              │
│                                                                   │
│  chip8.h                                                          │
│  └── CHIP8 state structure definition                             │
│      └── All registers, memory, display, stack, etc.              │
│                                                                   │
│  chip8.c                                                          │
│  ├── chip8_init() - Reset all state to initial values             │
│  ├── chip8_load_rom() - Load ROM into memory at 0x200             │
│  ├── chip8_cycle() - Fetch, decode, execute one instruction       │
│  ├── chip8_update_timers() - Decrement DT and ST                  │
│  └── chip8_set_key() / chip8_clear_key() - Input handling         │
│                                                                   │
│  display.c (or integrated with main.c)                            │
│  ├── Initialize graphics window                                   │
│  ├── Render CHIP-8 display to window                              │
│  └── Handle window events                                         │
│                                                                   │
│  input.c (or integrated with main.c)                              │
│  ├── Map keyboard keys to CHIP-8 keypad                           │
│  └── Update keypad state on key press/release                     │
│                                                                   │
│  audio.c (optional, or integrated)                                │
│  ├── Initialize audio system                                      │
│  └── Play/stop beep based on sound timer                          │
│                                                                   │
└──────────────────────────────────────────────────────────────────┘

4.3 Data Structures

// Core CHIP-8 state structure (chip8.h)

#include <stdbool.h>  // bool (draw_flag)
#include <stdint.h>   // uint8_t, uint16_t

typedef struct {
    // Memory: 4KB
    uint8_t memory[4096];

    // General purpose registers V0-VF
    uint8_t V[16];

    // Index register (for memory operations)
    uint16_t I;

    // Program counter (current instruction address)
    uint16_t PC;

    // Stack for subroutine calls
    uint16_t stack[16];
    uint8_t SP;  // Stack pointer

    // Timers (decrement at 60Hz)
    uint8_t delay_timer;
    uint8_t sound_timer;

    // Display: 64x32 monochrome pixels
    // 1 = pixel on, 0 = pixel off
    uint8_t display[64 * 32];

    // Input: 16-key hexadecimal keypad
    // 1 = key pressed, 0 = key released
    uint8_t keypad[16];

    // Flag: does display need to be redrawn?
    bool draw_flag;

} Chip8;


// Built-in font sprites (stored at 0x000-0x04F)
// Each character is 5 bytes, one byte per row. Only the high nibble of each
// byte is used, so a glyph is 4 pixels wide x 5 pixels tall.

static const uint8_t chip8_fontset[80] = {
    0xF0, 0x90, 0x90, 0x90, 0xF0, // 0
    0x20, 0x60, 0x20, 0x20, 0x70, // 1
    0xF0, 0x10, 0xF0, 0x80, 0xF0, // 2
    0xF0, 0x10, 0xF0, 0x10, 0xF0, // 3
    0x90, 0x90, 0xF0, 0x10, 0x10, // 4
    0xF0, 0x80, 0xF0, 0x10, 0xF0, // 5
    0xF0, 0x80, 0xF0, 0x90, 0xF0, // 6
    0xF0, 0x10, 0x20, 0x40, 0x40, // 7
    0xF0, 0x90, 0xF0, 0x90, 0xF0, // 8
    0xF0, 0x90, 0xF0, 0x10, 0xF0, // 9
    0xF0, 0x90, 0xF0, 0x90, 0x90, // A
    0xE0, 0x90, 0xE0, 0x90, 0xE0, // B
    0xF0, 0x80, 0x80, 0x80, 0xF0, // C
    0xE0, 0x90, 0x90, 0x90, 0xE0, // D
    0xF0, 0x80, 0xF0, 0x80, 0xF0, // E
    0xF0, 0x80, 0xF0, 0x80, 0x80  // F
};
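Because every glyph occupies 5 consecutive bytes starting at 0x000, the address of a digit's sprite is a single multiplication. A minimal sketch of the lookup that FX29 needs follows; the helper name `font_sprite_address` and the masking of Vx to its low nibble are assumptions, not something the fontset itself mandates.

```c
#include <stdint.h>

#define FONT_BASE            0x000  /* fontset is copied to the start of memory */
#define FONT_BYTES_PER_GLYPH 5      /* each hex digit is 5 rows of one byte */

/* Address of the sprite for the hex digit in vx (assumption: only the low nibble counts). */
uint16_t font_sprite_address(uint8_t vx) {
    return (uint16_t)(FONT_BASE + (vx & 0x0F) * FONT_BYTES_PER_GLYPH);
}
```

With this helper, FX29 reduces to `I = font_sprite_address(V[x])`.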

4.4 Algorithm Overview

MAIN EMULATION LOOP:

┌─────────────────────────────────────────────────────────────────┐
│ 1. INITIALIZATION                                                │
│    ├── Create Chip8 structure                                    │
│    ├── Call chip8_init() to reset state                          │
│    ├── Load fontset into memory[0x000..0x04F]                    │
│    ├── Load ROM into memory[0x200...]                            │
│    ├── Initialize graphics/audio/input systems                   │
│    └── Set PC = 0x200 (program start)                            │
├─────────────────────────────────────────────────────────────────┤
│ 2. MAIN LOOP (runs at 60Hz)                                      │
│    │                                                             │
│    ├── Poll input events                                         │
│    │   └── Update keypad[] based on key state                    │
│    │                                                             │
│    ├── Execute 10-12 CPU cycles (600-720Hz effective)            │
│    │   └── for (i = 0; i < 10; i++) chip8_cycle()               │
│    │                                                             │
│    ├── Update timers (exactly once per 60Hz frame)               │
│    │   ├── if (delay_timer > 0) delay_timer--                    │
│    │   └── if (sound_timer > 0) sound_timer--                    │
│    │                                                             │
│    ├── Render display if draw_flag is set                        │
│    │   └── draw_flag = false after rendering                     │
│    │                                                             │
│    ├── Play/stop beep based on sound_timer                       │
│    │                                                             │
│    └── Delay to maintain 60 FPS timing                           │
├─────────────────────────────────────────────────────────────────┤
│ 3. SINGLE CPU CYCLE (chip8_cycle)                                │
│    │                                                             │
│    ├── FETCH: Read opcode                                        │
│    │   └── opcode = (memory[PC] << 8) | memory[PC + 1]          │
│    │                                                             │
│    ├── INCREMENT PC                                              │
│    │   └── PC += 2 (before execute, simplifies branching)        │
│    │                                                             │
│    ├── DECODE: Extract operands                                  │
│    │   ├── nnn = opcode & 0x0FFF                                 │
│    │   ├── nn  = opcode & 0x00FF                                 │
│    │   ├── n   = opcode & 0x000F                                 │
│    │   ├── x   = (opcode >> 8) & 0x0F                            │
│    │   └── y   = (opcode >> 4) & 0x0F                            │
│    │                                                             │
│    └── EXECUTE: Switch on first nibble                           │
│        ├── case 0x0: if opcode==0x00E0 -> CLS                    │
│        │             if opcode==0x00EE -> RET                    │
│        ├── case 0x1: JP nnn                                      │
│        ├── case 0x2: CALL nnn                                    │
│        ├── ...                                                   │
│        └── case 0xF: (sub-switch on nn)                          │
└─────────────────────────────────────────────────────────────────┘
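The timer step of the main loop is small enough to sketch on its own. The version below uses a cut-down `Timers` struct instead of the full `Chip8` struct so that it compiles in isolation; that struct, the function name `timers_tick`, and the boolean return value are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cut-down stand-in for the timer fields of the Chip8 struct. */
typedef struct {
    uint8_t delay_timer;
    uint8_t sound_timer;
} Timers;

/*
 * Call exactly once per 60 Hz frame. Each timer counts down to zero and
 * then stays there. Returns true while the beep should keep sounding.
 */
bool timers_tick(Timers *t) {
    if (t->delay_timer > 0) t->delay_timer--;
    if (t->sound_timer > 0) t->sound_timer--;
    return t->sound_timer > 0;
}
```

Tying the decrement to the frame rather than to the instruction count keeps game speed stable even if you later change how many cycles run per frame.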

5. Implementation Guide

5.1 Development Environment Setup

# REQUIRED: C Compiler and Build Tools
# macOS
xcode-select --install
brew install sdl2

# Ubuntu/Debian
sudo apt update
sudo apt install build-essential libsdl2-dev

# Fedora/RHEL
sudo dnf install gcc make SDL2-devel

# Windows (with MSYS2)
pacman -S mingw-w64-x86_64-gcc mingw-w64-x86_64-SDL2

# Verify installation
gcc --version
sdl2-config --version


# ALTERNATIVE GRAPHICS LIBRARIES:
# If you prefer simpler APIs than SDL2:

# raylib (simpler, game-focused)
brew install raylib  # macOS
apt install libraylib-dev  # Ubuntu

# SFML (C++ focused)
brew install sfml
apt install libsfml-dev

5.2 Project Structure

chip8-emulator/
├── Makefile
├── README.md
├── src/
│   ├── main.c           # Entry point, main loop
│   ├── chip8.h          # CHIP-8 state structure
│   ├── chip8.c          # CPU emulation logic
│   ├── display.h        # Display rendering interface
│   ├── display.c        # SDL2/graphics implementation
│   ├── input.h          # Input handling interface
│   └── input.c          # Keyboard mapping
├── roms/                # Test ROMs (download separately)
│   ├── BC_test.ch8
│   ├── PONG.ch8
│   ├── TETRIS.ch8
│   ├── INVADERS.ch8
│   └── test_opcode.ch8
└── docs/
    ├── cowgod_spec.txt  # Technical reference (for offline)
    └── notes.md         # Your implementation notes

5.3 The Core Question You’re Answering

“How do CPUs execute programs, and how can I build one in software?”

This project answers this by making you implement:

  • How instructions are fetched from memory
  • How opcodes are decoded into operations
  • How registers and memory are modified
  • How control flow (jumps, calls, returns) works
  • How hardware timers and I/O integrate with the CPU

You will emerge understanding that a CPU is just a state machine following simple rules very fast.

5.4 Concepts You Must Understand First

Before writing code, ensure you can answer these self-assessment questions:

Hexadecimal and Binary:

  • Q: What is 0xDEAD in binary? In decimal?
  • A: 1101 1110 1010 1101; 57005

Bitmasking:

  • Q: How do you extract bits 11-8 from a 16-bit number?
  • A: (value >> 8) & 0xF
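The same shift-and-mask pattern extracts every CHIP-8 operand. As a worked example, the sketch below decodes all fields of an opcode in one place; the `Decoded` struct and the `decode` function are hypothetical names used only for this illustration.

```c
#include <stdint.h>

/* Hypothetical container for every field an opcode can carry. */
typedef struct {
    uint8_t  op;   /* bits 15-12: instruction family */
    uint8_t  x;    /* bits 11-8:  first register index */
    uint8_t  y;    /* bits 7-4:   second register index */
    uint8_t  n;    /* bits 3-0:   4-bit immediate */
    uint8_t  nn;   /* bits 7-0:   8-bit immediate */
    uint16_t nnn;  /* bits 11-0:  12-bit address */
} Decoded;

Decoded decode(uint16_t opcode) {
    Decoded d;
    d.op  = (uint8_t)(opcode >> 12);
    d.x   = (uint8_t)((opcode >> 8) & 0x0F);
    d.y   = (uint8_t)((opcode >> 4) & 0x0F);
    d.n   = (uint8_t)(opcode & 0x000F);
    d.nn  = (uint8_t)(opcode & 0x00FF);
    d.nnn = (uint16_t)(opcode & 0x0FFF);
    return d;
}
```

For 0x8124 this gives op = 8, x = 1, y = 2, n = 4, which is 8XY4 with X = 1 and Y = 2, that is, ADD V1, V2.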

Two’s Complement (for subtract/borrow):

  • Q: If A = 5 and B = 8, what is A - B for 8-bit unsigned? What should VF be?
  • A: 253 (0xFD wraps around); VF = 0 (borrow occurred)
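The borrow rule is easy to get backwards, so here is a minimal sketch of 8XY5 operating on a bare register array. The function name `op_sub` is hypothetical, and writing VF after the result (so the flag survives when X is 0xF) is an assumption about the expected behaviour that you should confirm against a flags test ROM.

```c
#include <stdint.h>

/* 8XY5: Vx = Vx - Vy. VF = 1 when no borrow occurs (Vx >= Vy), otherwise 0. */
void op_sub(uint8_t V[16], uint8_t x, uint8_t y) {
    uint8_t vx = V[x];
    uint8_t vy = V[y];
    uint8_t no_borrow = (vx >= vy) ? 1 : 0;

    V[x] = (uint8_t)(vx - vy);  /* unsigned arithmetic wraps modulo 256 */
    V[0xF] = no_borrow;         /* assumption: flag is written last */
}
```

With V0 = 5 and V1 = 8, calling `op_sub(V, 0, 1)` leaves V0 = 253 and VF = 0, matching the answer above.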

Fetch-Decode-Execute:

  • Q: What are the three phases of instruction execution?
  • A: Fetch (read instruction from memory), Decode (extract opcode and operands), Execute (perform operation)

Stack Operations:

  • Q: What happens to SP when you CALL a subroutine? When you RET?
  • A: CALL: stack[SP] = PC, SP++; RET: SP--, PC = stack[SP]
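A sketch of the two stack operations with bounds checks added follows. The `CallState` struct is a hypothetical subset of the full machine state, and returning `false` on overflow or underflow is one possible error policy, not something the CHIP-8 specification defines.

```c
#include <stdbool.h>
#include <stdint.h>

#define STACK_DEPTH 16

/* Hypothetical subset of the machine state needed for CALL and RET. */
typedef struct {
    uint16_t stack[STACK_DEPTH];
    uint8_t  SP;  /* index of the next free slot */
    uint16_t PC;
} CallState;

/* CALL nnn: returns false and leaves the state untouched if the stack is full. */
bool do_call(CallState *s, uint16_t nnn) {
    if (s->SP >= STACK_DEPTH) return false;
    s->stack[s->SP] = s->PC;  /* PC already points past the CALL instruction */
    s->SP++;
    s->PC = nnn;
    return true;
}

/* RET: returns false and leaves the state untouched if the stack is empty. */
bool do_ret(CallState *s) {
    if (s->SP == 0) return false;
    s->SP--;
    s->PC = s->stack[s->SP];
    return true;
}
```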

5.5 Questions to Guide Your Design

Architecture:

  1. Should you increment PC before or after executing the instruction?
  2. How will you structure the switch statement for opcode decoding?
  3. What happens if an unknown opcode is encountered?

Display:

  1. How will you represent the 64x32 display in memory?
  2. What scale factor will you use for the window?
  3. How will you handle sprite wrapping at screen edges?

Timing:

  1. How many instructions should execute per frame at 60Hz to feel right?
  2. How will you measure elapsed time for the main loop?

Input:

  1. How will you map modern keyboard keys to the CHIP-8 hex keypad?
  2. How will FX0A (wait for key press) block execution?
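One common answer to the FX0A question is to avoid blocking the host loop at all: if no key is down, rewind PC so the same instruction is fetched again on the next cycle. The sketch below assumes PC was already advanced past the instruction, and the helper name `op_wait_key` is hypothetical. Some interpreters complete FX0A only when the key is released; this sketch does not model that.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * FX0A, assuming *pc has already been advanced past the instruction.
 * If a key is down, store its index in *vx and return true.
 * Otherwise rewind *pc by 2 so the instruction repeats, and return false.
 */
bool op_wait_key(const uint8_t keypad[16], uint8_t *vx, uint16_t *pc) {
    for (uint8_t key = 0; key < 16; key++) {
        if (keypad[key]) {
            *vx = key;
            return true;
        }
    }
    *pc -= 2;
    return false;
}
```

Because the main loop keeps running while the instruction repeats, input polling, timers, and window events continue to work during the wait.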

5.6 Thinking Exercise

Before coding, trace through this simple CHIP-8 program by hand:

Address | Opcode | Instruction
--------|--------|------------------
0x200   | 00E0   | CLS
0x202   | 6005   | LD V0, 0x05
0x204   | 6110   | LD V1, 0x10
0x206   | 8014   | ADD V0, V1
0x208   | 3015   | SE V0, 0x15
0x20A   | 1200   | JP 0x200
0x20C   | 00FD   | (not a base CHIP-8 opcode; SUPER-CHIP uses it as EXIT)

Trace it:

  1. PC = 0x200, fetch 00E0 (CLS), clear display, PC = 0x202
  2. PC = 0x202, fetch 6005, V0 = 0x05, PC = 0x204
  3. PC = 0x204, fetch 6110, V1 = 0x10, PC = 0x206
  4. PC = 0x206, fetch 8014, V0 = V0 + V1 = 0x05 + 0x10 = 0x15, VF = 0 (no carry), PC = 0x208
  5. PC = 0x208, fetch 3015, V0 == 0x15? Yes! Skip next, PC = 0x20C (not 0x20A)
  6. Program ends (or traps on invalid opcode)

What did you learn?

  • SE (skip if equal) advances PC by 4 instead of 2 when condition is true
  • The condition was true because we calculated V0 = 0x05 + 0x10 = 0x15
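To check a hand trace mechanically, the program from the table can be run through a tiny interpreter that understands only the opcodes it uses. This is a sketch under stated assumptions: `MiniState`, `trace_program`, and `run_trace` are hypothetical names, and the loop stops at the first opcode it does not recognise rather than modelling a real exit instruction.

```c
#include <stdint.h>

/* Hypothetical state: only what this trace needs. */
typedef struct {
    uint8_t  V[16];
    uint16_t PC;
} MiniState;

/* The seven opcodes from the table, stored big-endian starting at 0x200. */
static const uint8_t trace_program[] = {
    0x00, 0xE0, 0x60, 0x05, 0x61, 0x10, 0x80, 0x14,
    0x30, 0x15, 0x12, 0x00, 0x00, 0xFD
};

/* Runs until an unsupported opcode is fetched; returns that opcode. */
uint16_t run_trace(MiniState *s) {
    uint8_t memory[4096] = {0};
    for (unsigned i = 0; i < sizeof trace_program; i++) {
        memory[0x200 + i] = trace_program[i];
    }
    for (int i = 0; i < 16; i++) s->V[i] = 0;
    s->PC = 0x200;

    for (;;) {
        uint16_t opcode = (uint16_t)((memory[s->PC] << 8) | memory[s->PC + 1]);
        uint8_t x  = (opcode >> 8) & 0x0F;
        uint8_t y  = (opcode >> 4) & 0x0F;
        uint8_t nn = opcode & 0x00FF;
        s->PC += 2;  /* increment before execute */

        if (opcode == 0x00E0) {
            /* CLS: nothing to clear in this display-less sketch */
        } else if ((opcode & 0xF000) == 0x6000) {         /* LD Vx, nn */
            s->V[x] = nn;
        } else if ((opcode & 0xF00F) == 0x8004) {         /* ADD Vx, Vy */
            uint16_t sum = (uint16_t)(s->V[x] + s->V[y]);
            s->V[x] = (uint8_t)sum;
            s->V[0xF] = (sum > 0xFF) ? 1 : 0;
        } else if ((opcode & 0xF000) == 0x3000) {         /* SE Vx, nn */
            if (s->V[x] == nn) s->PC += 2;
        } else if ((opcode & 0xF000) == 0x1000) {         /* JP nnn */
            s->PC = opcode & 0x0FFF;
        } else {
            s->PC -= 2;  /* leave PC pointing at the unhandled opcode */
            return opcode;
        }
    }
}
```

Running it should reproduce the hand trace: the loop stops on 00FD with PC = 0x20C, V0 = 0x15, V1 = 0x10, and VF = 0, and the JP at 0x20A is never executed.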

5.7 Hints in Layers

Use these progressively. Try each level before moving to the next.

Hint 1: Starting Structure

Your main.c should look something like this:

#include <stdbool.h>
#include <stdio.h>

#include "chip8.h"
#include "display.h"
#include "input.h"    // handle_input()

int main(int argc, char *argv[]) {
    if (argc < 2) {
        printf("Usage: %s <rom>\n", argv[0]);
        return 1;
    }

    Chip8 chip8;
    chip8_init(&chip8);
    chip8_load_rom(&chip8, argv[1]);

    display_init();

    while (!should_quit()) {
        handle_input(&chip8);

        // Run ~10 cycles per frame (10 x 60 frames = 600 Hz effective)
        for (int i = 0; i < 10; i++) {
            chip8_cycle(&chip8);
        }

        chip8_update_timers(&chip8);

        if (chip8.draw_flag) {
            display_render(&chip8);
            chip8.draw_flag = false;
        }

        delay_for_60fps();
    }

    display_cleanup();
    return 0;
}

Hint 2: Initialization

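#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include "chip8.h"

void chip8_init(Chip8 *chip8) {
    memset(chip8, 0, sizeof(Chip8));

    // Load fontset into memory at 0x000
    memcpy(chip8->memory, chip8_fontset, sizeof(chip8_fontset));

    // PC starts at 0x200 (where ROMs load)
    chip8->PC = 0x200;
}

void chip8_load_rom(Chip8 *chip8, const char *filename) {
    FILE *f = fopen(filename, "rb");
    if (!f) {
        fprintf(stderr, "Failed to open ROM: %s\n", filename);
        exit(1);
    }

    // Read at most the space available above 0x200; a larger file is truncated
    size_t max_size = sizeof(chip8->memory) - 0x200;
    size_t bytes_read = fread(&chip8->memory[0x200], 1, max_size, f);
    fclose(f);

    if (bytes_read == 0) {
        fprintf(stderr, "ROM is empty or unreadable: %s\n", filename);
        exit(1);
    }
}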

Hint 3: Opcode Decoding Structure

void chip8_cycle(Chip8 *chip8) {
    // FETCH
    uint16_t opcode = (chip8->memory[chip8->PC] << 8) |
                       chip8->memory[chip8->PC + 1];

    // INCREMENT PC (before execute)
    chip8->PC += 2;

    // DECODE
    uint16_t nnn = opcode & 0x0FFF;
    uint8_t nn   = opcode & 0x00FF;
    uint8_t n    = opcode & 0x000F;
    uint8_t x    = (opcode >> 8) & 0x0F;
    uint8_t y    = (opcode >> 4) & 0x0F;

    // EXECUTE
    switch (opcode & 0xF000) {
        case 0x0000:
            switch (opcode) {
                case 0x00E0:  // CLS
                    memset(chip8->display, 0, sizeof(chip8->display));
                    chip8->draw_flag = true;
                    break;
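                case 0x00EE:  // RET
                    if (chip8->SP == 0) {  // nothing to return to
                        fprintf(stderr, "Stack underflow at 0x%03X\n",
                                (unsigned)(chip8->PC - 2));
                        break;
                    }
                    chip8->SP--;
                    chip8->PC = chip8->stack[chip8->SP];
                    break;
                default:
                    printf("Unknown opcode: 0x%04X\n", opcode);
                    break;
            }
            break;

        case 0x1000:  // JP nnn
            chip8->PC = nnn;
            break;

        case 0x2000:  // CALL nnn
            if (chip8->SP >= 16) {  // all 16 stack slots in use
                fprintf(stderr, "Stack overflow at 0x%03X\n",
                        (unsigned)(chip8->PC - 2));
                break;
            }
            chip8->stack[chip8->SP] = chip8->PC;  // PC already points past CALL
            chip8->SP++;
            chip8->PC = nnn;
            break;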

        // ... continue for all opcodes
    }
}

Hint 4: Implementing the Draw Instruction (DXYN)

This is the most complex instruction. Take it step by step:

case 0xD000: {  // DRW Vx, Vy, n
    uint8_t xpos = chip8->V[x] % 64;
    uint8_t ypos = chip8->V[y] % 32;
    chip8->V[0xF] = 0;  // Reset collision flag

    for (int row = 0; row < n; row++) {
        uint8_t sprite_byte = chip8->memory[(chip8->I + row) & 0x0FFF];  // stay inside 4KB memory

        for (int col = 0; col < 8; col++) {
            // Check if current pixel in sprite is set
            if ((sprite_byte & (0x80 >> col)) != 0) {
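                // Wrap pixels that run past an edge. (Quirk: the original
                // interpreter clips them instead; some test ROMs check this.)
                int screen_x = (xpos + col) % 64;
                int screen_y = (ypos + row) % 32;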
                int pixel_index = screen_y * 64 + screen_x;

                // Check for collision
                if (chip8->display[pixel_index] == 1) {
                    chip8->V[0xF] = 1;
                }

                // XOR the pixel
                chip8->display[pixel_index] ^= 1;
            }
        }
    }

    chip8->draw_flag = true;
    break;
}

Hint 5: Input Handling with SDL2

// Keypad mapping: Physical keyboard -> CHIP-8 hex key
// 1 2 3 4   ->   1 2 3 C
// Q W E R   ->   4 5 6 D
// A S D F   ->   7 8 9 E
// Z X C V   ->   A 0 B F

static const SDL_Scancode key_map[16] = {
    SDL_SCANCODE_X,  // 0
    SDL_SCANCODE_1,  // 1
    SDL_SCANCODE_2,  // 2
    SDL_SCANCODE_3,  // 3
    SDL_SCANCODE_Q,  // 4
    SDL_SCANCODE_W,  // 5
    SDL_SCANCODE_E,  // 6
    SDL_SCANCODE_A,  // 7
    SDL_SCANCODE_S,  // 8
    SDL_SCANCODE_D,  // 9
    SDL_SCANCODE_Z,  // A
    SDL_SCANCODE_C,  // B
    SDL_SCANCODE_4,  // C
    SDL_SCANCODE_R,  // D
    SDL_SCANCODE_F,  // E
    SDL_SCANCODE_V   // F
};

void handle_input(Chip8 *chip8) {
    SDL_Event event;
    while (SDL_PollEvent(&event)) {
        if (event.type == SDL_QUIT) {
            exit(0);
        }
    }

    const uint8_t *keyboard = SDL_GetKeyboardState(NULL);
    for (int i = 0; i < 16; i++) {
        chip8->keypad[i] = keyboard[key_map[i]];
    }
}
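The keypad layout can be checked without SDL2 by expressing the same table in plain characters. The sketch below mirrors the `key_map` order above but uses lowercase QWERTY characters instead of scancodes; the function name `chip8_key_from_char` and the -1 return value for unmapped keys are assumptions for illustration.

```c
/* Returns the CHIP-8 key (0x0-0xF) for a lowercase QWERTY character, or -1 if unmapped. */
int chip8_key_from_char(char c) {
    /* Index = CHIP-8 key value; entry = the keyboard character that triggers it. */
    static const char layout[16] = {
        'x', '1', '2', '3',   /* 0 1 2 3 */
        'q', 'w', 'e', 'a',   /* 4 5 6 7 */
        's', 'd', 'z', 'c',   /* 8 9 A B */
        '4', 'r', 'f', 'v'    /* C D E F */
    };

    for (int key = 0; key < 16; key++) {
        if (layout[key] == c) return key;
    }
    return -1;
}
```

A table like this is also handy in unit tests, where simulated key presses are easier to write as characters than as scancodes.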

Hint 6: Complete Instruction Implementation Checklist

Track your progress implementing each instruction:

SYSTEM:
[ ] 00E0 - CLS
[ ] 00EE - RET

FLOW CONTROL:
[ ] 1NNN - JP addr
[ ] 2NNN - CALL addr
[ ] 3XNN - SE Vx, byte
[ ] 4XNN - SNE Vx, byte
[ ] 5XY0 - SE Vx, Vy
[ ] 9XY0 - SNE Vx, Vy
[ ] BNNN - JP V0, addr

REGISTER:
[ ] 6XNN - LD Vx, byte
[ ] 7XNN - ADD Vx, byte
[ ] 8XY0 - LD Vx, Vy
[ ] 8XY1 - OR Vx, Vy
[ ] 8XY2 - AND Vx, Vy
[ ] 8XY3 - XOR Vx, Vy
[ ] 8XY4 - ADD Vx, Vy (with carry)
[ ] 8XY5 - SUB Vx, Vy (with borrow)
[ ] 8XY6 - SHR Vx
[ ] 8XY7 - SUBN Vx, Vy
[ ] 8XYE - SHL Vx
[ ] CXNN - RND Vx, byte

MEMORY:
[ ] ANNN - LD I, addr
[ ] FX1E - ADD I, Vx
[ ] FX29 - LD F, Vx
[ ] FX33 - LD B, Vx (BCD)
[ ] FX55 - LD [I], Vx
[ ] FX65 - LD Vx, [I]

DISPLAY:
[ ] DXYN - DRW Vx, Vy, n

KEYBOARD:
[ ] EX9E - SKP Vx
[ ] EXA1 - SKNP Vx
[ ] FX0A - LD Vx, K (wait for key)

TIMERS:
[ ] FX07 - LD Vx, DT
[ ] FX15 - LD DT, Vx
[ ] FX18 - LD ST, Vx

5.8 The Interview Questions They’ll Ask

Basic Understanding

  1. “What is the fetch-decode-execute cycle?”
    • Good Answer: CPU reads instruction bytes from memory (fetch), interprets the opcode and operands (decode), then performs the operation (execute). This repeats continuously.
  2. “How did you decode CHIP-8 opcodes?”
    • Good Answer: Used bitmasking and shifting. The first nibble (opcode >> 12) determines the instruction type; X, Y, N, NN, and NNN are then extracted with masks such as 0x0F00, 0x00F0, 0x000F, 0x00FF, and 0x0FFF.
  3. “Why does CHIP-8 use XOR for drawing sprites?”
    • Good Answer: XOR allows toggling pixels on/off, and detecting collisions (if a pixel goes from 1 to 0). It also enables erasing by drawing the same sprite again.

Technical Details

  1. “How did you handle the timing difference between CPU speed and timer speed?”
    • Good Answer: Timer updates at fixed 60Hz, but instructions run faster (10-12 per timer tick). Main loop runs at 60Hz, executes multiple cycles, then decrements timers once.
  2. “What’s special about register VF?”
    • Good Answer: It’s a flag register. Many instructions set VF: carry for ADD, NOT borrow for SUB, collision for DRAW, LSB/MSB for shifts. Never use it for general data.
  3. “How does CALL/RET work?”
    • Good Answer: CALL pushes current PC to stack, increments SP, sets PC to new address. RET decrements SP, pops address from stack back into PC.
  4. “What happens on a sprite collision in CHIP-8?”
    • Good Answer: When XORing a sprite, if any pixel goes from ON to OFF (was 1, XOR with 1 = 0), VF is set to 1. Games use this for collision detection.

Problem-Solving

  1. “Your emulator runs but games don’t work correctly. How do you debug?”
    • Good Answer: Add trace output (print PC, opcode, register state after each instruction). Use test ROMs designed to verify specific instructions. Compare trace output with known-good emulators.
  2. “What quirks did you encounter with CHIP-8?”
    • Good Answer: Shift instructions (8XY6/8XYE) have two interpretations - original vs modern. Load/store (FX55/FX65) may or may not increment I. Some games expect one behavior, some expect the other.
  3. “How would you extend this to emulate a more complex system like the Game Boy?”
    • Good Answer: Similar structure but more components: multiple memory regions with banking, a PPU for graphics (tile-based backgrounds plus hardware sprites), interrupts, more complex timing. Same fetch-decode-execute core though.

5.9 Books That Will Help

Topic                 | Book                                                                 | Specific Chapter                       | Why It Helps
----------------------|----------------------------------------------------------------------|----------------------------------------|---------------------------------------
Instruction Execution | “Computer Organization and Design” - Patterson & Hennessy            | Chapter 4: The Processor               | Understanding datapath and control
Opcode Encoding       | “Computer Organization and Design” - Patterson & Hennessy            | Chapter 2.5: Representing Instructions | How instructions are encoded in bits
Register Files        | “Computer Organization and Design” - Patterson & Hennessy            | Chapter 4.2: Logic Design Conventions  | How registers are organized
Memory-Mapped I/O     | “Computer Organization and Design” - Patterson & Hennessy            | Chapter 5.2: Memory Technology         | How hardware accesses memory
Control Flow          | “Computer Systems: A Programmer’s Perspective” - Bryant & O’Hallaron | Chapter 3.6: Control                   | Jumps, branches, calls at machine level
The Stack             | “Computer Systems: A Programmer’s Perspective” - Bryant & O’Hallaron | Chapter 3.7: Procedures                | How stack supports function calls

Online Resources:

5.10 Implementation Phases

Phase 1: Skeleton (2-3 hours)

  • Create project structure
  • Define Chip8 struct
  • Implement chip8_init() and chip8_load_rom()
  • Open window (blank display)
  • Milestone: Window opens, ROM loads without crashing

Phase 2: Basic Instructions (4-6 hours)

  • Implement fetch-decode-execute loop
  • Implement 00E0 (CLS), 1NNN (JP), 6XNN (LD)
  • Add debug output (print each instruction)
  • Milestone: Can trace through simple programs

Phase 3: Core Instructions (4-6 hours)

  • Implement all register operations (8XY0-8XYE)
  • Implement skip instructions (3XNN, 4XNN, 5XY0, 9XY0)
  • Implement ADD, CALL, RET
  • Milestone: Test ROM instructions work

Phase 4: Display (3-4 hours)

  • Implement DXYN (draw sprite)
  • Load fontset
  • Implement FX29 (font address)
  • Milestone: Sprites render correctly

Phase 5: Input & Timers (2-3 hours)

  • Implement keyboard mapping
  • Implement EX9E, EXA1, FX0A
  • Implement timer instructions and 60Hz update
  • Milestone: Input works, timers countdown

Phase 6: Remaining Instructions (2-3 hours)

  • Implement FX33 (BCD), FX55, FX65
  • Implement CXNN (random)
  • Handle edge cases
  • Milestone: All instructions implemented

Phase 7: Polish & Testing (2-4 hours)

  • Run test ROMs, fix bugs
  • Run games (Pong, Tetris, etc.)
  • Add sound
  • Optimize display rendering
  • Milestone: Games playable!

5.11 Key Implementation Decisions

  1. Display storage: Use a 1D array uint8_t display[64*32] indexed as y*64+x. Simpler than 2D array.

  2. Timing approach: Run main loop at 60Hz (16.67ms per frame). Execute 10-12 instructions per frame for ~600-720Hz effective CPU speed.

  3. Quirks mode: Implement configurable behavior for controversial instructions (8XY6, 8XYE, FX55, FX65). Default to modern behavior.

  4. PC increment timing: Increment PC immediately after fetch, before execute. Makes branching simpler (just set PC = target, don’t need to subtract 2).

  5. Error handling: Print unknown opcodes but continue running. Some ROMs have garbage data that should be skipped.
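The quirks decision can be sketched as a small configuration struct that the affected instructions consult. The `Quirks` struct, its field names, and the helper signatures below are hypothetical; the two behaviours shown (whether shifts read Vy, and whether FX55 advances I) are the ones named in decision 3.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical configuration; false/false selects the "modern" behaviour. */
typedef struct {
    bool shift_uses_vy;          /* original: Vx = Vy >> 1; modern: Vx = Vx >> 1 */
    bool load_store_advances_i;  /* original: FX55/FX65 leave I at I + x + 1 */
} Quirks;

/* 8XY6 (SHR): VF receives the bit that was shifted out. */
void op_shr(uint8_t V[16], uint8_t x, uint8_t y, const Quirks *q) {
    uint8_t src = q->shift_uses_vy ? V[y] : V[x];
    V[x] = (uint8_t)(src >> 1);
    V[0xF] = src & 0x01;  /* assumption: flag is written last */
}

/* FX55: store V0..Vx at memory[I..]; returns the value I should hold afterwards. */
uint16_t op_store(uint8_t *memory, const uint8_t V[16], uint16_t I,
                  uint8_t x, const Quirks *q) {
    for (uint8_t r = 0; r <= x; r++) {
        memory[(I + r) & 0x0FFF] = V[r];  /* mask keeps writes inside 4KB */
    }
    return q->load_store_advances_i ? (uint16_t)(I + x + 1) : I;
}
```

Passing the quirks as a parameter, rather than using compile-time switches, lets you flip a setting per ROM without rebuilding.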


6. Testing Strategy

6.1 Unit Testing

Test individual instructions in isolation:

#include <assert.h>
#include <stdio.h>

#include "chip8.h"

// Test: 6XNN - LD Vx, byte
void test_ld_vx_byte(void) {
    Chip8 chip8;
    chip8_init(&chip8);

    // Set up memory: 6505 = LD V5, 0x05
    chip8.memory[0x200] = 0x65;
    chip8.memory[0x201] = 0x05;

    chip8_cycle(&chip8);

    assert(chip8.V[5] == 0x05);
    assert(chip8.PC == 0x202);
    printf("PASS: LD Vx, byte\n");
}

// Test: 8XY4 - ADD with carry
void test_add_with_carry(void) {
    Chip8 chip8;
    chip8_init(&chip8);

    chip8.V[0] = 0xFF;
    chip8.V[1] = 0x02;

    // 8014 = ADD V0, V1
    chip8.memory[0x200] = 0x80;
    chip8.memory[0x201] = 0x14;

    chip8_cycle(&chip8);

    assert(chip8.V[0] == 0x01);  // 0xFF + 0x02 = 0x101, truncated to 0x01
    assert(chip8.V[0xF] == 1);   // Carry set
    printf("PASS: ADD with carry\n");
}

6.2 Integration Testing with Test ROMs

Use standardized test ROMs to verify your implementation:

# Corax89's CHIP-8 test ROM
$ ./chip8 roms/test_opcode.ch8
# Should display "OK" for each instruction group

# BC_test (comprehensive)
$ ./chip8 roms/BC_test.ch8
# All patterns should render correctly

# Timendus test suite
$ ./chip8 roms/chip8-test-suite.ch8
# Modern comprehensive tests

6.3 Debugging Techniques

// Add trace mode for debugging
void chip8_cycle_debug(Chip8 *chip8) {
    uint16_t opcode = (chip8->memory[chip8->PC] << 8) |
                       chip8->memory[chip8->PC + 1];

    printf("PC=%04X: %04X ", chip8->PC, opcode);

    // Decode and print mnemonic
    switch (opcode & 0xF000) {
        case 0x0000:
            if (opcode == 0x00E0) printf("CLS");
            else if (opcode == 0x00EE) printf("RET");
            break;
        case 0x1000: printf("JP %03X", opcode & 0x0FFF); break;
        case 0x6000: printf("LD V%X, %02X", (opcode>>8)&0xF, opcode&0xFF); break;
        // ... etc
    }

    printf(" | V0=%02X V1=%02X ... VF=%02X | I=%04X | SP=%d\n",
           chip8->V[0], chip8->V[1], chip8->V[0xF], chip8->I, chip8->SP);

    // Execute
    chip8->PC += 2;
    // ... rest of cycle
}

Debug checklist for common issues:

  • Is PC incrementing correctly (by 2)?
  • Is opcode being fetched in correct byte order?
  • Are you using the correct x and y from the opcode?
  • Is VF being set by all instructions that should set it?
  • Is the stack growing/shrinking correctly?
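The mnemonic-printing part of the trace mode can be pulled into a standalone function, which also makes it testable. The sketch below covers only a handful of opcodes; the function name `disassemble` and the "DW xxxx" fallback for anything it does not recognise are assumptions, not a standard CHIP-8 convention.

```c
#include <stdint.h>
#include <stdio.h>

/* Writes a mnemonic for a few common opcodes into buf; others become "DW xxxx". */
void disassemble(uint16_t opcode, char *buf, size_t size) {
    unsigned x   = (opcode >> 8) & 0xFu;
    unsigned y   = (opcode >> 4) & 0xFu;
    unsigned n   = opcode & 0xFu;
    unsigned nn  = opcode & 0xFFu;
    unsigned nnn = opcode & 0xFFFu;

    if (opcode == 0x00E0) { snprintf(buf, size, "CLS"); return; }
    if (opcode == 0x00EE) { snprintf(buf, size, "RET"); return; }

    switch (opcode & 0xF000u) {
        case 0x1000: snprintf(buf, size, "JP %03X", nnn); break;
        case 0x2000: snprintf(buf, size, "CALL %03X", nnn); break;
        case 0x3000: snprintf(buf, size, "SE V%X, %02X", x, nn); break;
        case 0x6000: snprintf(buf, size, "LD V%X, %02X", x, nn); break;
        case 0x7000: snprintf(buf, size, "ADD V%X, %02X", x, nn); break;
        case 0xA000: snprintf(buf, size, "LD I, %03X", nnn); break;
        case 0xD000: snprintf(buf, size, "DRW V%X, V%X, %X", x, y, n); break;
        default:     snprintf(buf, size, "DW %04X", (unsigned)opcode); break;
    }
}
```

Extending the switch until no test ROM produces a "DW" line is a reasonable way to track decoding coverage.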

7. Common Pitfalls & Debugging

Problem 1: Nothing displays on screen

  • Root Cause: DXYN not implemented, or display not being rendered
  • Fix: Add debug print in DXYN, verify sprite bytes are non-zero
  • Quick Test: Load font sprite at (0,0) - should show character

Problem 2: Sprites are garbled or in wrong position

  • Root Cause: Incorrect bit extraction in DXYN, or wrong coordinate wrapping
  • Fix: Use (0x80 >> col) for bit check, use % 64 and % 32 for wrapping
  • Quick Test: Draw at (0,0), should appear in top-left corner

Problem 3: Games don’t respond to input

  • Root Cause: Keypad not updating, or key mapping wrong
  • Fix: Print keypad state, verify SDL key events are processed
  • Quick Test: Add debug output in key handling

Problem 4: Jumps go to wrong addresses

  • Root Cause: PC incremented after setting it in JP/CALL
  • Fix: Increment PC before execute (at start of cycle)
  • Quick Test: Simple loop should repeat, not advance

Problem 5: CALL/RET crashes or loops infinitely

  • Root Cause: Stack overflow (SP not bounded), or wrong PC saved
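  • Fix: In CALL, push the already-incremented PC and then increment SP; in RET, decrement SP before reading stack[SP]. Reject a CALL when SP is already 16 and a RET when SP is 0.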
  • Quick Test: Simple subroutine call and return

Problem 6: Games run too fast or too slow

  • Root Cause: Timing not properly controlled
  • Fix: Use SDL_Delay or proper frame limiting
  • Quick Test: Timer countdown should take ~1 second for DT=60
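A fixed delay per frame drifts when rendering takes a variable amount of time. One alternative is to accumulate measured elapsed time and run a 60Hz tick whenever enough has built up. The sketch below shows only that bookkeeping; `FrameClock`, `frame_clock_advance`, and the microsecond units are assumptions, and the elapsed time is expected to come from whatever clock your platform provides.

```c
#include <stdint.h>

#define TICK_US 16667u  /* one 60 Hz period, rounded to the nearest microsecond */

/* Hypothetical accumulator for time not yet consumed by a tick. */
typedef struct {
    uint32_t accumulated_us;
} FrameClock;

/*
 * Add the time elapsed since the last call and return how many 60 Hz ticks
 * are now due. The remainder is kept so that no time is lost between calls.
 */
unsigned frame_clock_advance(FrameClock *c, uint32_t elapsed_us) {
    c->accumulated_us += elapsed_us;
    unsigned ticks = c->accumulated_us / TICK_US;
    c->accumulated_us %= TICK_US;
    return ticks;
}
```

For each tick returned, the main loop would run its batch of CPU cycles and decrement the timers once, which keeps DT=60 close to one second regardless of the host frame rate.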

Problem 7: Collision detection doesn’t work

  • Root Cause: VF not set when pixel flips from 1 to 0
  • Fix: Check pixel BEFORE XOR, set VF if it was 1
  • Quick Test: Draw same sprite twice, VF should be 1 after second draw
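The quick test above can be turned into a self-contained check. The sketch below is a standalone XOR draw routine that works on a bare display array; the function name `draw_sprite` and its signature are hypothetical, coordinates are assumed to be non-negative, and it wraps pixels at the edges as Hint 4 does.

```c
#include <stdint.h>

#define DISPLAY_W 64
#define DISPLAY_H 32

/*
 * XOR-draw `rows` sprite bytes at (x0, y0).
 * Returns 1 if any lit pixel was turned off (a collision), otherwise 0.
 */
uint8_t draw_sprite(uint8_t *display, const uint8_t *sprite,
                    int rows, int x0, int y0) {
    uint8_t collision = 0;
    x0 %= DISPLAY_W;
    y0 %= DISPLAY_H;

    for (int row = 0; row < rows; row++) {
        for (int col = 0; col < 8; col++) {
            /* Skip sprite bits that are 0; bit 7 is the leftmost pixel. */
            if ((sprite[row] & (0x80 >> col)) == 0) continue;

            int idx = ((y0 + row) % DISPLAY_H) * DISPLAY_W
                    + ((x0 + col) % DISPLAY_W);
            if (display[idx]) collision = 1;  /* check BEFORE the XOR */
            display[idx] ^= 1;
        }
    }
    return collision;
}
```

Drawing the same sprite twice at the same position should return 0 the first time and 1 the second time, and leave the display blank again.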

Problem 8: BCD (FX33) produces wrong digits

  • Root Cause: Division/modulo order wrong
  • Fix: mem[I] = val/100; mem[I+1] = (val/10)%10; mem[I+2] = val%10
  • Quick Test: FX33 with V0=123 should store 1, 2, 3
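The digit extraction from the fix above, written as a standalone helper so it can be tested without the rest of the emulator. The name `bcd_digits` and the three-element output array are assumptions; in the emulator the three values would go to memory[I], memory[I+1], and memory[I+2].

```c
#include <stdint.h>

/* FX33: write the hundreds, tens, and ones digits of value to out[0..2]. */
void bcd_digits(uint8_t value, uint8_t out[3]) {
    out[0] = value / 100;         /* hundreds */
    out[1] = (value / 10) % 10;   /* tens */
    out[2] = value % 10;          /* ones */
}
```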

8. Extensions & Challenges

Extension 1: Debug Mode

  • Add real-time register display
  • Step through instructions one at a time
  • Set breakpoints at specific addresses
  • Inspect memory contents

Extension 2: Assembler

  • Write an assembler that converts CHIP-8 assembly to binary
  • Support labels and expressions
  • Create your own games!

Extension 3: Disassembler

  • Convert ROM binary back to assembly
  • Show control flow graph
  • Identify subroutines

Extension 4: Super CHIP-8 (SCHIP)

  • Extend to 128x64 resolution
  • Add scroll instructions
  • Implement extended instructions

Extension 5: Save States

  • Serialize entire emulator state to file
  • Load save states
  • Implement rewind feature

Extension 6: Configurable Quirks

  • Make controversial behaviors configurable
  • Test same ROM with different quirk settings
  • Document which games need which settings

9. Real-World Connections

Game Emulation Industry: Emulators for consoles such as Nintendo and PlayStation systems are built from these same concepts. CHIP-8 is a common first project for people learning emulator development.

CPU Design: Understanding emulation helps CPU designers. You can prototype a new ISA in software before committing to silicon.

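Virtualization: Virtual machines (VMware, VirtualBox) build on related ideas, although they run most guest code directly on the host CPU rather than interpreting it. Full-system emulators such as QEMU show the same concepts scaled up to x86.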

Security Research: Malware analysts use emulators to safely execute suspicious code. Understanding emulation is crucial for threat analysis.

Compiler Testing: Emulators provide controlled environments for testing compiler output. You know exactly what state the machine is in.

History Preservation: The only way to play many classic games is through emulation. You’re participating in digital preservation.


10. Resources

Primary References

Test ROMs

Tutorials

Reference Implementations

  • Study existing emulators AFTER completing your own
  • Compare your approach to others
  • Learn from different coding styles

11. Self-Assessment Checklist

Before considering this project complete, verify:

  • Can you explain the fetch-decode-execute cycle without notes?
  • Can you decode an opcode like 0x8124 by hand?
  • Can you explain why XOR drawing enables collision detection?
  • Can you describe the difference between PC and I registers?
  • Can you explain how CALL/RET use the stack?
  • Does your emulator pass standard test ROMs?
  • Can you play Pong, controlling both paddles?
  • Can you play Tetris, placing and rotating pieces?
  • Can you step through execution in debug mode?
  • Could you implement a similar emulator for a different system?

12. Submission / Completion Criteria

Your implementation is complete when:

  • All 35 CHIP-8 instructions are handled: the 34 in the Hint 6 checklist plus 0NNN (SYS addr), which modern interpreters ignore
  • Test ROMs (BC_test, corax89 test_opcode) display correctly
  • Pong is playable (both paddles respond)
  • Tetris is playable (pieces fall, rotate, clear lines)
  • Space Invaders is playable (shoot, hit enemies)
  • Timers decrement at 60Hz
  • Sound plays when sound_timer > 0
  • No crashes on standard ROMs
  • Code is reasonably commented
  • You can explain your implementation to someone else

Congratulations! You’ve built a working CPU emulator. This is a genuine accomplishment that many professional programmers never achieve. You now understand how instructions execute at a level that will inform all your future systems programming.

Proceed to Project 5 to design and implement your own ISA, or jump to Project 6 to tackle the legendary 6502 processor.