Project 11: The no_std Game Boy Core (CPU Simulation)
“To truly understand how a computer works, build one yourself - even if it’s in software.” - The Art of the Metaobject Protocol
Project Metadata
- Main Programming Language: Rust
- Coolness Level: Level 5: Pure Magic (Super Cool)
- Difficulty: Level 5: Master
- Knowledge Area: Computer Architecture / Emulation
- Time Estimate: 1 month+
- Prerequisites:
- Strong Rust fundamentals (ownership, traits, lifetimes)
- Understanding of binary/hexadecimal arithmetic
- Familiarity with bitwise operations
- Basic knowledge of computer architecture (registers, memory, fetch-decode-execute)
- Experience with no_std Rust (from Project 4)
What You Will Build
A CPU core (LR35902) for the Nintendo Game Boy that works in a no_std environment. You will implement the complete register set, the full instruction set (about 500 opcodes including CB-prefixed instructions), and the memory map. Your core will be portable enough to compile to WebAssembly, embedded ARM Cortex-M, or native x86_64.
This is not a toy project. You will build a cycle-accurate CPU emulator that can pass the industry-standard Blargg’s test ROMs, proving your implementation matches the real silicon.
Learning Objectives
By the end of this project, you will be able to:
- Describe the LR35902 CPU architecture including its registers, flag behaviors, and instruction encoding schemes
- Implement the fetch-decode-execute cycle with proper cycle counting for timing accuracy
- Design a flexible memory bus abstraction using Rust traits for multiple backend support
- Master bit manipulation techniques for flag calculation, half-carry detection, and register pairing
- Understand memory banking controllers (MBC1, MBC3, MBC5) and cartridge ROM/RAM switching
- Implement interrupt handling with proper priority and timing for VBLANK, LCD STAT, Timer, Serial, and Joypad
- Write comprehensive tests using Blargg’s test ROMs as the gold standard for correctness
Deep Theoretical Foundation
Before writing a single line of code, you must deeply understand the system you are emulating. This section provides the foundational knowledge you need.
The Game Boy: A System Overview
The Nintendo Game Boy, released in 1989, was a triumph of engineering compromise. Designed by Gunpei Yokoi following his “Lateral Thinking with Withered Technology” philosophy, it used proven, inexpensive components to create a gaming device with extraordinary battery life and durability.
+-----------------------------------------------------------------------+
| GAME BOY SYSTEM ARCHITECTURE |
+-----------------------------------------------------------------------+
+-------------------+
| CARTRIDGE |
| (Game ROM/RAM) |
| +MBC Chip |
+--------+----------+
|
| Cartridge Bus
|
+--------+ +---------+ +----+----+ +----------+ +--------+
| | | | | | | | | |
| JOYPAD +----+ CPU +----+ BUS +----+ PPU +----+ LCD |
| | | LR35902 | | ARBITER | | (Video) | | 160x144|
| | | @4MHz | | | | | | |
+--------+ +----+----+ +----+----+ +----+-----+ +--------+
| | |
| +----+----+ |
| | | |
+---------+ WRAM +---------+
| | 8KB |
| +---------+
|
+----+----+
| |
| APU +-----> Audio Output (4 Channels)
| (Audio) |
| |
+---------+
Key Specifications:
- CPU: Sharp LR35902 (custom Z80-like) @ 4.194304 MHz
- RAM: 8 KB Work RAM + 8 KB Video RAM
- Display: 160x144 pixels, 4 shades of green
- Audio: 4 channels (2 pulse, 1 wave, 1 noise)
- Cartridge: Up to 8 MB ROM, 128 KB RAM (with MBC)
The Sharp LR35902:
The CPU is often incorrectly called a “Z80.” It is actually a hybrid of the Intel 8080 and the Zilog Z80: it takes some instructions from each and adds a few of its own. It runs at 4.194304 MHz (often rounded to 4 MHz), which is exactly 2^22 Hz, so the clock divides cleanly by powers of two for the timer, audio, and video circuits.
Book Reference: “Game Boy Coding Adventure” by Maximilien Dagois - Full Book
The LR35902 CPU: A Modified Z80/8080 Hybrid
Understanding the LR35902’s lineage helps you understand its quirks:
+-----------------------------------------------------------------------+
| CPU FAMILY TREE |
+-----------------------------------------------------------------------+
Intel 8080 (1974)
|
+---> Zilog Z80 (1976)
| |
| +---> Sharp LR35902 (1989)
| |
| +---> What Game Boy uses
|
+---> Intel 8085 (1976)
What LR35902 INHERITED from 8080:
- Basic register set (A, B, C, D, E, H, L)
- Most 8-bit arithmetic instructions
- Register pairing (BC, DE, HL)
- Memory addressing via HL
What LR35902 INHERITED from Z80:
- CB-prefixed rotate, shift, and bit operations (BIT, RES, SET)
- Relative jumps (JR)
What LR35902 REMOVED vs Z80:
- IX and IY index registers
- Shadow registers (A', F', B', C', etc.)
- Block transfer instructions (LDIR, LDDR)
- I/O space (replaced with memory-mapped I/O)
- All DD-, ED-, and FD-prefixed instructions
What LR35902 ADDED uniquely:
- SWAP instruction (swap nibbles)
- STOP instruction (for power saving)
- High-page loads (LDH) and auto-increment/decrement loads through HL
- Simpler interrupt system
The 4 MHz Clock:
The CPU operates at exactly 4,194,304 Hz. This number is 2^22, making it easy to divide for various timing purposes:
- 4,194,304 / 154 scanlines / 456 cycles per scanline = ~59.73 frames per second
- This matches the original Game Boy’s display refresh rate
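The frame-rate arithmetic above can be checked with a few constants. This is a minimal sketch; the constant and function names are illustrative assumptions, not part of any existing crate. It uses integer math only, so it also works on targets without floating-point support.

```rust
/// Master clock in Hz (2^22).
pub const CLOCK_HZ: u32 = 1 << 22;
/// Clock cycles per scanline, as quoted above.
pub const CYCLES_PER_SCANLINE: u32 = 456;
/// Scanlines per frame (144 visible + 10 VBlank).
pub const SCANLINES_PER_FRAME: u32 = 154;
/// Clock cycles in one full frame.
pub const CYCLES_PER_FRAME: u32 = CYCLES_PER_SCANLINE * SCANLINES_PER_FRAME;

/// Frame rate in millihertz (thousandths of a frame per second),
/// truncated toward zero.
pub const fn frame_rate_millihertz() -> u32 {
    ((CLOCK_HZ as u64 * 1000) / CYCLES_PER_FRAME as u64) as u32
}
```

Dividing the result by 1000 gives the ~59.73 frames per second quoted above.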
Book Reference: “Computer Organization and Design” by Patterson & Hennessy - Chapter 4
The Register Set: Your CPU’s Working Memory
The LR35902 has eight 8-bit registers and several special-purpose registers:
+-----------------------------------------------------------------------+
| LR35902 REGISTER LAYOUT |
+-----------------------------------------------------------------------+
8-bit Registers 16-bit Register Pairs
+-----------+ +-----------+-----------+
| A | Accumulator | A | F | AF (Accumulator + Flags)
+-----------+ +-----------+-----------+
| F | Flags 0x01 0xB0
+-----------+
+-----------+-----------+
+-----------+ | B | C | BC (General Purpose)
| B | +-----------+-----------+
+-----------+ 0x00 0x13
| C |
+-----------+
+-----------+-----------+
+-----------+ | D | E | DE (General Purpose)
| D | +-----------+-----------+
+-----------+ 0x00 0xD8
| E |
+-----------+
+-----------+-----------+
+-----------+ | H | L | HL (Memory Pointer)
| H | +-----------+-----------+
+-----------+ 0x01 0x4D
| L |
+-----------+
Special Registers:
+-----------------------+
| SP | Stack Pointer (16-bit)
+-----------------------+ Points to top of stack
0xFFFE Grows DOWNWARD in memory
+-----------------------+
| PC | Program Counter (16-bit)
+-----------------------+ Points to next instruction
0x0100 Starts at 0x0100 after boot ROM
After Boot ROM Execution, Registers Contain:
A = 0x01 (or 0x11 on Game Boy Color)
F = 0xB0 (Z=1, N=0, H=1, C=1)
B = 0x00
C = 0x13
D = 0x00
E = 0xD8
H = 0x01
L = 0x4D
SP = 0xFFFE
PC = 0x0100
Register Pairing:
The 8-bit registers can be combined into 16-bit pairs:
- AF: A (high byte) + F (low byte) - Accumulator and Flags
- BC: B (high byte) + C (low byte) - Counter/General purpose
- DE: D (high byte) + E (low byte) - Data/General purpose
- HL: H (high byte) + L (low byte) - Primary memory pointer
When you access HL as a 16-bit value and H=0x01, L=0x4D, then HL=0x014D.
Rust Implementation Strategy:
/// CPU registers with efficient 8-bit and 16-bit access
pub struct Registers {
    pub a: u8,
    pub f: u8, // Flags register - only upper 4 bits used
    pub b: u8,
    pub c: u8,
    pub d: u8,
    pub e: u8,
    pub h: u8,
    pub l: u8,
    pub sp: u16,
    pub pc: u16,
}

impl Registers {
    /// Get AF as 16-bit value
    pub fn af(&self) -> u16 {
        ((self.a as u16) << 8) | (self.f as u16)
    }

    /// Set AF from 16-bit value
    pub fn set_af(&mut self, value: u16) {
        self.a = (value >> 8) as u8;
        self.f = (value & 0xF0) as u8; // Lower 4 bits always 0
    }

    /// Get HL as 16-bit value
    pub fn hl(&self) -> u16 {
        ((self.h as u16) << 8) | (self.l as u16)
    }

    /// Set HL from 16-bit value
    pub fn set_hl(&mut self, value: u16) {
        self.h = (value >> 8) as u8;
        self.l = value as u8;
    }

    // Similar for BC and DE...
}
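The BC and DE accessors left out above follow the same high-byte/low-byte pattern. One way to avoid repeating the shifts is a pair of free helpers; `join` and `split` are hypothetical names chosen for this sketch, built on the standard `u16::from_be_bytes` and `u16::to_be_bytes`.

```rust
/// Combine a high and a low byte into a 16-bit pair (for example H and L into HL).
pub fn join(hi: u8, lo: u8) -> u16 {
    u16::from_be_bytes([hi, lo])
}

/// Split a 16-bit pair back into its (high, low) bytes.
pub fn split(value: u16) -> (u8, u8) {
    let [hi, lo] = value.to_be_bytes();
    (hi, lo)
}
```

With these, a BC getter could be written as `join(self.b, self.c)`, and the matching setter would assign the two halves returned by `split(value)` to `self.b` and `self.c`.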
The Flag Register: Z, N, H, C
The F register contains four flags that record information about the results of arithmetic and logic operations:
+-----------------------------------------------------------------------+
| FLAG REGISTER (F) LAYOUT |
+-----------------------------------------------------------------------+
Bit: 7 6 5 4 3 2 1 0
+----+----+----+----+----+----+----+----+
F = | Z | N | H | C | 0 | 0 | 0 | 0 |
+----+----+----+----+----+----+----+----+
| | | |
| | | +-- Carry Flag
| | +------- Half-Carry Flag (BCD)
| +------------ Subtract Flag (BCD)
+----------------- Zero Flag
IMPORTANT: Bits 3-0 are ALWAYS 0. Writing to them has no effect.
FLAG BEHAVIORS:
Z (Zero) Flag - Bit 7
Set (1): Result of operation is zero
Reset (0): Result is non-zero
Example:
ADD A, B where A=5, B=3 -> A=8, Z=0 (result not zero)
SUB A, B where A=5, B=5 -> A=0, Z=1 (result is zero)
N (Subtract) Flag - Bit 6
Set (1): Last operation was a subtraction
Reset (0): Last operation was an addition
Used by: DAA instruction for BCD correction
Example:
ADD A, B -> N=0
SUB A, B -> N=1
H (Half-Carry) Flag - Bit 5
Set (1): Carry occurred from bit 3 to bit 4
Reset (0): No half-carry occurred
This is the TRICKIEST flag to implement correctly!
Example (8-bit addition):
A = 0x0F (0000 1111)
B = 0x01 (0000 0001)
--------------------
R = 0x10 (0001 0000) <- Carry from bit 3 to bit 4, H=1
Formula for ADD: H = ((A & 0xF) + (B & 0xF)) > 0xF
Formula for SUB: H = (A & 0xF) < (B & 0xF)
C (Carry) Flag - Bit 4
Set (1): Carry occurred from bit 7 (overflow/underflow)
Reset (0): No carry occurred
Example (8-bit addition):
A = 0xFF (255)
B = 0x01 (1)
----------------
R = 0x00, C=1 (256 overflowed to 0)
Formula for ADD: C = (A as u16 + B as u16) > 0xFF
Formula for SUB: C = A < B
The Half-Carry Problem:
Half-carry is the flag most often implemented incorrectly. It matters because the DAA (Decimal Adjust Accumulator) instruction reads H and N to correct the result of an addition or subtraction on BCD (Binary Coded Decimal) operands, which games use for displaying numbers.
HALF-CARRY CALCULATION EXAMPLES:
8-bit Addition (ADD A, B):
A = 0x3C, B = 0x2F
Binary: 0011 1100 + 0010 1111
Lower nibble: 0xC + 0xF = 0x1B (> 0xF, so H=1)
Full result: 0x6B
Calculation: H = ((0x3C & 0x0F) + (0x2F & 0x0F)) > 0x0F
H = (0x0C + 0x0F) > 0x0F
H = 0x1B > 0x0F
H = true
16-bit Addition (ADD HL, BC):
Half-carry is calculated from bit 11 to bit 12 (not bit 3 to 4)!
HL = 0x0FFF, BC = 0x0001
H = ((HL & 0x0FFF) + (BC & 0x0FFF)) > 0x0FFF
H = (0x0FFF + 0x0001) > 0x0FFF
H = 0x1000 > 0x0FFF
H = true
8-bit Subtraction (SUB A, B):
A = 0x3C, B = 0x2F
Lower nibble: 0xC - 0xF = -3 (borrow needed, H=1)
Calculation: H = (A & 0x0F) < (B & 0x0F)
H = 0x0C < 0x0F
H = true
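The three worked examples above translate directly into small pure functions. This is a sketch under the assumption that flag computation is kept separate from register state; the `Flags` struct and the function names are illustrative, not taken from the project's code.

```rust
/// Flags produced by an 8-bit ALU operation (hypothetical helper type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Flags {
    pub z: bool,
    pub n: bool,
    pub h: bool,
    pub c: bool,
}

/// 8-bit addition: H is the carry out of bit 3, C the carry out of bit 7.
pub fn add8(a: u8, b: u8) -> (u8, Flags) {
    let (result, carry) = a.overflowing_add(b);
    let half = (a & 0x0F) + (b & 0x0F) > 0x0F;
    (result, Flags { z: result == 0, n: false, h: half, c: carry })
}

/// 8-bit subtraction: H is a borrow into bit 3, C a borrow into bit 7.
pub fn sub8(a: u8, b: u8) -> (u8, Flags) {
    let (result, borrow) = a.overflowing_sub(b);
    let half = (a & 0x0F) < (b & 0x0F);
    (result, Flags { z: result == 0, n: true, h: half, c: borrow })
}

/// 16-bit addition as used by ADD HL, rr. That instruction leaves Z
/// untouched and clears N, so only (result, H, C) are returned here.
pub fn add16(hl: u16, rr: u16) -> (u16, bool, bool) {
    let (result, carry) = hl.overflowing_add(rr);
    let half = (hl & 0x0FFF) + (rr & 0x0FFF) > 0x0FFF;
    (result, half, carry)
}
```

Keeping these as pure functions makes them easy to unit-test against the tables above before wiring them into the opcode dispatcher.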
Rust Implementation:
impl Registers {
    /// Set the Zero flag
    pub fn set_z(&mut self, value: bool) {
        if value {
            self.f |= 0x80; // Set bit 7
        } else {
            self.f &= !0x80; // Clear bit 7
        }
    }

    /// Get the Zero flag
    pub fn z(&self) -> bool {
        (self.f & 0x80) != 0
    }

    /// Set the Subtract flag
    pub fn set_n(&mut self, value: bool) {
        if value {
            self.f |= 0x40;
        } else {
            self.f &= !0x40;
        }
    }

    /// Set the Half-carry flag
    pub fn set_h(&mut self, value: bool) {
        if value {
            self.f |= 0x20;
        } else {
            self.f &= !0x20;
        }
    }

    /// Set the Carry flag
    pub fn set_c(&mut self, value: bool) {
        if value {
            self.f |= 0x10;
        } else {
            self.f &= !0x10;
        }
    }

    /// Set all flags at once (common pattern)
    pub fn set_flags(&mut self, z: bool, n: bool, h: bool, c: bool) {
        self.f = 0;
        if z { self.f |= 0x80; }
        if n { self.f |= 0x40; }
        if h { self.f |= 0x20; }
        if c { self.f |= 0x10; }
    }
}
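Since N and H exist mainly to serve DAA, it helps to see how that instruction consumes them. The following is a sketch of the commonly described Game Boy DAA behavior, written as a pure function; the name `daa` and its tuple return are assumptions for illustration, and the result should still be validated against Blargg's test ROMs.

```rust
/// Decimal-adjust `a` after a BCD addition (n == false) or subtraction (n == true).
/// Returns (adjusted A, new Z flag, new C flag). H is always cleared by DAA
/// and N is left unchanged, so neither is returned.
pub fn daa(a: u8, n: bool, h: bool, c: bool) -> (u8, bool, bool) {
    let mut adjust = 0u8;
    let mut carry = c;
    let result = if !n {
        // After addition: fix a low digit above 9 or a half-carry,
        // then a high digit above 9 or a full carry.
        if h || (a & 0x0F) > 0x09 {
            adjust |= 0x06;
        }
        if c || a > 0x99 {
            adjust |= 0x60;
            carry = true;
        }
        a.wrapping_add(adjust)
    } else {
        // After subtraction: only the recorded borrows matter.
        if h {
            adjust |= 0x06;
        }
        if c {
            adjust |= 0x60;
        }
        a.wrapping_sub(adjust)
    };
    (result, result == 0, carry)
}
```

For example, adding BCD 15 and 27 gives binary 0x3C with no half-carry; DAA turns that into 0x42, the BCD encoding of 42.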
Book Reference: “The Art of Computer Programming” Vol. 4 by Donald Knuth - Bit manipulation techniques
The Memory Map: Addressing 64KB
The Game Boy has a 16-bit address bus, allowing it to address 65,536 bytes (64 KB) of memory. Different regions of this address space are mapped to different hardware:
+-----------------------------------------------------------------------+
| GAME BOY MEMORY MAP |
+-----------------------------------------------------------------------+
0xFFFF +-----------------------+
| IE (Interrupt Enable) | 1 byte - Per-interrupt enable bits (not the master enable, IME)
0xFFFF +-----------------------+
| |
| High RAM (HRAM) | 127 bytes - Fast internal RAM
| (Zero Page) | Used for stack and quick access variables
0xFF80 +-----------------------+
| |
| I/O Registers | 128 bytes - Hardware control registers
| (Joypad, Serial, | 0xFF00: Joypad
| Timer, Audio, | 0xFF04-0xFF07: Timer registers
| PPU controls) | 0xFF10-0xFF3F: Audio registers
| | 0xFF40-0xFF4B: LCD/PPU registers
0xFF00 +-----------------------+
| Unused | 96 bytes - Not mapped (returns 0xFF)
0xFEA0 +-----------------------+
| |
| OAM (Sprite Data) | 160 bytes - Object Attribute Memory
| 40 sprites x 4 | Each sprite: Y, X, Tile, Attributes
| bytes each |
0xFE00 +-----------------------+
| |
| Echo RAM | 7680 bytes - Mirror of 0xC000-0xDDFF
| (Avoid using) | Reading/writing here affects WRAM
| |
0xE000 +-----------------------+
| |
| Work RAM (WRAM) | 8192 bytes - General purpose RAM
| Bank 0: 0xC000-CFFF | Always accessible
| Bank 1: 0xD000-DFFF | Switchable on GBC
| |
0xC000 +-----------------------+
| |
| Cartridge RAM | 8192 bytes (if present)
| (External/Battery) | Depends on MBC type
| | Often battery-backed for saves
0xA000 +-----------------------+
| |
| Video RAM (VRAM) | 8192 bytes - Tile data and maps
| Tile Data: | 0x8000-0x97FF: Tile patterns
| 0x8000-0x97FF | 0x9800-0x9BFF: BG Map 0
| Tile Maps: | 0x9C00-0x9FFF: BG Map 1
| 0x9800-0x9FFF |
| |
0x8000 +-----------------------+
| |
| Cartridge ROM | 16384 bytes - Switchable Bank
| Bank 1-N | Which bank depends on MBC registers
| (Switchable) | Bank 0 is always the low bank
| |
0x4000 +-----------------------+
| |
| Cartridge ROM | 16384 bytes - Fixed Bank 0
| Bank 0 | Always contains start of game
| (Fixed) | Interrupt vectors at 0x0040, 0x0048, etc.
| |
0x0000 +-----------------------+
BOOT ROM OVERLAY (0x0000-0x00FF):
When the Game Boy first powers on, addresses 0x0000-0x00FF
are mapped to the internal 256-byte Boot ROM.
Writing 0x01 to 0xFF50 unmaps the Boot ROM until the next reset.
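The memory map above can be expressed as a single exhaustive `match` on the address. This is a minimal sketch; the `Region` enum and the function names are hypothetical, and the boot ROM overlay is deliberately left out.

```rust
/// Address-space regions from the memory map (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Region {
    RomBank0,
    RomBankN,
    Vram,
    CartRam,
    Wram,
    Echo,
    Oam,
    Unusable,
    Io,
    Hram,
    InterruptEnable,
}

/// Classify a 16-bit address. The ranges cover all 65,536 addresses,
/// so the compiler accepts the match without a wildcard arm.
pub fn region(addr: u16) -> Region {
    match addr {
        0x0000..=0x3FFF => Region::RomBank0,
        0x4000..=0x7FFF => Region::RomBankN,
        0x8000..=0x9FFF => Region::Vram,
        0xA000..=0xBFFF => Region::CartRam,
        0xC000..=0xDFFF => Region::Wram,
        0xE000..=0xFDFF => Region::Echo,
        0xFE00..=0xFE9F => Region::Oam,
        0xFEA0..=0xFEFF => Region::Unusable,
        0xFF00..=0xFF7F => Region::Io,
        0xFF80..=0xFFFE => Region::Hram,
        0xFFFF => Region::InterruptEnable,
    }
}

/// Echo RAM mirrors WRAM 0x2000 bytes lower; other addresses pass through.
pub fn resolve_echo(addr: u16) -> u16 {
    if region(addr) == Region::Echo {
        addr - 0x2000
    } else {
        addr
    }
}
```

A full bus implementation would dispatch on `region(addr)` inside its `read` and `write` methods and forward each case to the cartridge, PPU, or internal RAM.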
Key I/O Registers:
+-----------------------------------------------------------------------+
| KEY I/O REGISTER ADDRESSES |
+-----------------------------------------------------------------------+
0xFF00 - JOYP (Joypad)
Bits 5-4: Select button group (write)
Bits 3-0: Button states (read)
0xFF04 - DIV (Divider Register)
Increments at 16384 Hz. Writing any value resets to 0.
0xFF05 - TIMA (Timer Counter)
Incremented at rate specified by TAC. Triggers interrupt on overflow.
0xFF06 - TMA (Timer Modulo)
Value loaded into TIMA when it overflows.
0xFF07 - TAC (Timer Control)
Bit 2: Timer enable
Bits 1-0: Clock select (4096/262144/65536/16384 Hz)
0xFF0F - IF (Interrupt Flag)
Bit 4: Joypad interrupt requested
Bit 3: Serial interrupt requested
Bit 2: Timer interrupt requested
Bit 1: LCD STAT interrupt requested
Bit 0: VBlank interrupt requested
0xFF40 - LCDC (LCD Control)
Bit 7: LCD enable
Bit 6: Window tile map select
Bit 5: Window enable
Bit 4: BG & Window tile data select
Bit 3: BG tile map select
Bit 2: OBJ (sprite) size (8x8 or 8x16)
Bit 1: OBJ enable
Bit 0: BG & Window enable
0xFF41 - STAT (LCD Status)
Bits 6-3: Interrupt sources
Bits 1-0: Mode (0-3)
0xFF44 - LY (Current Scanline)
Current scanline being drawn (0-153)
0xFF50 - Boot ROM Disable
Write 0x01 to unmap Boot ROM from 0x0000-0x00FF
0xFFFF - IE (Interrupt Enable)
Same format as IF, but controls which interrupts are enabled
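As an example of decoding one of these registers, the TAC clock-select bits can be turned into the number of clock cycles between TIMA increments. This is a sketch; `tima_period` is a hypothetical helper name, and it models only the nominal rates listed above, not the edge cases of the real timer circuit.

```rust
/// CPU clock in Hz.
pub const CLOCK_HZ: u32 = 4_194_304;

/// Clock cycles between TIMA increments for a given TAC value,
/// or None when the timer is disabled (bit 2 clear).
pub fn tima_period(tac: u8) -> Option<u32> {
    if (tac & 0x04) == 0 {
        return None;
    }
    // Bits 1-0 select the increment frequency in Hz.
    let hz: u32 = match tac & 0x03 {
        0b00 => 4_096,
        0b01 => 262_144,
        0b10 => 65_536,
        _ => 16_384,
    };
    Some(CLOCK_HZ / hz)
}
```

A timer component can accumulate the cycle counts returned by the CPU and increment TIMA each time the accumulator crosses this period.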
Rust Memory Bus Trait:
/// Memory bus abstraction for different backends
pub trait MemoryBus {
    /// Read a byte from the given address
    fn read(&self, addr: u16) -> u8;

    /// Write a byte to the given address
    fn write(&mut self, addr: u16, value: u8);

    /// Read a 16-bit word (little-endian)
    fn read_word(&self, addr: u16) -> u16 {
        let lo = self.read(addr) as u16;
        let hi = self.read(addr.wrapping_add(1)) as u16;
        (hi << 8) | lo
    }

    /// Write a 16-bit word (little-endian)
    fn write_word(&mut self, addr: u16, value: u16) {
        self.write(addr, value as u8);
        self.write(addr.wrapping_add(1), (value >> 8) as u8);
    }
}
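For early CPU tests it is convenient to have the simplest possible backend: 64 KB of flat RAM with no mapping at all. The sketch below restates the trait so that it compiles on its own; `FlatBus` and its `load` method are hypothetical names, and a real Game Boy bus would add the region dispatch described in the memory map.

```rust
/// The trait from above, repeated so this example is self-contained.
pub trait MemoryBus {
    fn read(&self, addr: u16) -> u8;
    fn write(&mut self, addr: u16, value: u8);

    fn read_word(&self, addr: u16) -> u16 {
        let lo = self.read(addr) as u16;
        let hi = self.read(addr.wrapping_add(1)) as u16;
        (hi << 8) | lo
    }

    fn write_word(&mut self, addr: u16, value: u16) {
        self.write(addr, value as u8);
        self.write(addr.wrapping_add(1), (value >> 8) as u8);
    }
}

/// Hypothetical test backend: the whole address space is plain RAM.
/// It uses a fixed-size array, so it needs no allocator.
pub struct FlatBus {
    mem: [u8; 0x1_0000],
}

impl FlatBus {
    pub fn new() -> Self {
        FlatBus { mem: [0; 0x1_0000] }
    }

    /// Copy `bytes` into memory starting at `origin`; anything that
    /// would run past 0xFFFF is dropped.
    pub fn load(&mut self, origin: u16, bytes: &[u8]) {
        let start = origin as usize;
        let end = (start + bytes.len()).min(self.mem.len());
        self.mem[start..end].copy_from_slice(&bytes[..end - start]);
    }
}

impl MemoryBus for FlatBus {
    fn read(&self, addr: u16) -> u8 {
        self.mem[addr as usize]
    }

    fn write(&mut self, addr: u16, value: u8) {
        self.mem[addr as usize] = value;
    }
}
```

Because the CPU is generic over the trait, the same core can run against this flat backend in unit tests and against a full bus in the emulator.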
Instruction Encoding: Reading the Opcode Tables
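The LR35902 has 256 base opcodes (0x00-0xFF) plus 256 CB-prefixed opcodes (0xCB 0x00 - 0xCB 0xFF), for a total of 512 opcode slots. Not every slot is an instruction: 0xCB is itself the prefix byte, and 11 base opcodes (0xD3, 0xDB, 0xDD, 0xE3, 0xE4, 0xEB, 0xEC, 0xED, 0xF4, 0xFC, 0xFD) are undefined, which leaves about 500 valid instructions.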
+-----------------------------------------------------------------------+
| OPCODE ENCODING STRUCTURE |
+-----------------------------------------------------------------------+
OPCODE FORMAT (8 bits):
Many opcodes follow regular patterns based on bit positions:
+---+---+---+---+---+---+---+---+
| 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
+---+---+---+---+---+---+---+---+
| | | | | | | |
+---+ +---+---+ +---+---+
| | |
| | +-- Source register (for many ops)
| +-------------- Destination register (for many ops)
+------------------------ Operation type
REGISTER ENCODING (3 bits):
000 = B 100 = H
001 = C 101 = L
010 = D 110 = (HL) - memory at address HL
011 = E 111 = A
8-BIT LOAD PATTERNS:
LD r, r' (Load register to register)
01 DDD SSS
^^^ ^^^
| +-- Source register
+------ Destination register
Examples:
LD B, C = 01 000 001 = 0x41
LD A, B = 01 111 000 = 0x78
LD L, H = 01 101 100 = 0x6C
LD A, (HL) = 01 111 110 = 0x7E
ARITHMETIC PATTERNS:
ADD A, r (Add register to A)
10 000 SSS
ADC A, r (Add with carry)
10 001 SSS
SUB r (Subtract register from A)
10 010 SSS
SBC A, r (Subtract with carry)
10 011 SSS
AND r
10 100 SSS
XOR r
10 101 SSS
OR r
10 110 SSS
CP r (Compare)
10 111 SSS
CB-PREFIXED INSTRUCTIONS:
The 0xCB prefix indicates a bit manipulation instruction.
After reading 0xCB, read another byte:
RLC r (Rotate Left Circular - bit 7 goes to both Carry and bit 0)
00 000 SSS
RRC r (Rotate Right Circular - bit 0 goes to both Carry and bit 7)
00 001 SSS
RL r (Rotate Left through Carry)
00 010 SSS
RR r (Rotate Right through Carry)
00 011 SSS
SLA r (Shift Left Arithmetic)
00 100 SSS
SRA r (Shift Right Arithmetic - preserve sign)
00 101 SSS
SWAP r (Swap nibbles)
00 110 SSS
SRL r (Shift Right Logical)
00 111 SSS
BIT n, r (Test bit n)
01 NNN SSS
^^^ ^^^
| +-- Register
+------ Bit number (0-7)
RES n, r (Reset bit n)
10 NNN SSS
SET n, r (Set bit n)
11 NNN SSS
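The bit-field patterns above mean that large parts of the opcode space can be decoded arithmetically instead of being listed one by one. The sketch below covers only the 01 (LD r, r') and 10 (ALU) blocks plus the CB fields; the `Operand` and `Decoded` types are hypothetical. Note the one exception inside the LD block: 0x76, which would encode LD (HL), (HL), is HALT.

```rust
/// The 3-bit register encoding (hypothetical type name).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Operand { B, C, D, E, H, L, HlIndirect, A }

/// Map the low three bits of `bits` to an operand.
pub fn operand(bits: u8) -> Operand {
    match bits & 0x07 {
        0 => Operand::B,
        1 => Operand::C,
        2 => Operand::D,
        3 => Operand::E,
        4 => Operand::H,
        5 => Operand::L,
        6 => Operand::HlIndirect,
        _ => Operand::A,
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decoded {
    Ld { dst: Operand, src: Operand },
    Halt,
    /// op: 0=ADD 1=ADC 2=SUB 3=SBC 4=AND 5=XOR 6=OR 7=CP
    Alu { op: u8, src: Operand },
    /// Everything outside the two regular blocks.
    Other,
}

/// Decode the regular 0x40-0xBF region of the base opcode table.
pub fn decode_base(opcode: u8) -> Decoded {
    let mid = (opcode >> 3) & 0x07;
    match opcode >> 6 {
        0b01 if opcode == 0x76 => Decoded::Halt,
        0b01 => Decoded::Ld { dst: operand(mid), src: operand(opcode) },
        0b10 => Decoded::Alu { op: mid, src: operand(opcode) },
        _ => Decoded::Other,
    }
}

/// Split a CB-prefixed byte into (group, bit number or shift kind, operand).
/// Group 0 = rotates/shifts/SWAP, 1 = BIT, 2 = RES, 3 = SET.
pub fn decode_cb(opcode: u8) -> (u8, u8, Operand) {
    (opcode >> 6, (opcode >> 3) & 0x07, operand(opcode))
}
```

Whether to decode this way or to write out a flat 256-arm `match` is a design choice: the arithmetic form is compact, while the flat form makes per-opcode cycle counts easier to audit.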
Opcode Table Extract (Base Instructions):
+-----------------------------------------------------------------------+
| OPCODE TABLE (PARTIAL) |
+-----------------------------------------------------------------------+
Columns x0-x7:
     | x0           x1           x2           x3           x4           x5           x6           x7
-----+---------------------------------------------------------------------------------------------------
0x   | NOP          LD BC,d16    LD (BC),A    INC BC       INC B        DEC B        LD B,d8      RLCA
1x   | STOP         LD DE,d16    LD (DE),A    INC DE       INC D        DEC D        LD D,d8      RLA
2x   | JR NZ,r8     LD HL,d16    LD (HL+),A   INC HL       INC H        DEC H        LD H,d8      DAA
3x   | JR NC,r8     LD SP,d16    LD (HL-),A   INC SP       INC (HL)     DEC (HL)     LD (HL),d8   SCF
4x   | LD B,B       LD B,C       LD B,D       LD B,E       LD B,H       LD B,L       LD B,(HL)    LD B,A
...
Cx   | RET NZ       POP BC       JP NZ,a16    JP a16       CALL NZ,a16  PUSH BC      ADD A,d8     RST 00
-----+---------------------------------------------------------------------------------------------------
Columns x8-xF:
     | x8           x9           xA           xB           xC           xD           xE           xF
-----+---------------------------------------------------------------------------------------------------
0x   | LD (a16),SP  ADD HL,BC    LD A,(BC)    DEC BC       INC C        DEC C        LD C,d8      RRCA
1x   | JR r8        ADD HL,DE    LD A,(DE)    DEC DE       INC E        DEC E        LD E,d8      RRA
2x   | JR Z,r8      ADD HL,HL    LD A,(HL+)   DEC HL       INC L        DEC L        LD L,d8      CPL
3x   | JR C,r8      ADD HL,SP    LD A,(HL-)   DEC SP       INC A        DEC A        LD A,d8      CCF
4x   | LD C,B       LD C,C       LD C,D       LD C,E       LD C,H       LD C,L       LD C,(HL)    LD C,A
...
Cx   | RET Z        RET          JP Z,a16     PREFIX CB    CALL Z,a16   CALL a16     ADC A,d8     RST 08
-----+---------------------------------------------------------------------------------------------------
Legend:
d8 = immediate 8-bit data
d16 = immediate 16-bit data
a8 = 8-bit unsigned offset (0xFF00 + a8)
a16 = 16-bit address
r8 = 8-bit signed offset (-128 to +127)
CB = prefix for bit operations
The Fetch-Decode-Execute Cycle
Virtually every CPU follows this fundamental loop:
+-----------------------------------------------------------------------+
| FETCH-DECODE-EXECUTE CYCLE |
+-----------------------------------------------------------------------+
+----------+
| FETCH |
+----+-----+
|
Read opcode from | PC = PC + 1
memory[PC] |
v
+----+-----+
| DECODE |
+----+-----+
|
Determine which | May read additional bytes
instruction to | for operands (d8, d16, r8)
execute | PC = PC + operand_size
v
+----+-----+
| EXECUTE |
+----+-----+
|
Perform the | Update registers
operation | Update flags
| Access memory if needed
|
v
+----------+----------+
| ADD CYCLES TO TOTAL |
+----------+----------+
|
|
v
+----------+----------+
| CHECK INTERRUPTS |
+----------+----------+
|
+------> Back to FETCH
TIMING ACCURACY:
Each instruction takes a specific number of "machine cycles" (M-cycles).
Each M-cycle is 4 clock cycles (T-states).
For example:
NOP = 1 M-cycle = 4 T-states
LD B, C = 1 M-cycle = 4 T-states
LD A, (HL) = 2 M-cycles = 8 T-states (1 for opcode, 1 for memory read)
PUSH BC = 4 M-cycles = 16 T-states (1 + 1 + 2 for stack writes)
CALL a16 = 6 M-cycles = 24 T-states (1 + 2 + 1 + 2 for stack + PC)
Some conditional instructions have DIFFERENT cycle counts:
JR NZ, r8:
If NOT taken: 2 M-cycles (8 T-states)
If taken: 3 M-cycles (12 T-states)
CALL NZ, a16:
If NOT taken: 3 M-cycles (12 T-states)
If taken: 6 M-cycles (24 T-states)
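The taken/not-taken difference is easiest to get right when the handler computes the new PC and the cycle count together. This is a sketch for the conditional JR family; `jr_cc` is a hypothetical helper name, and `pc` is assumed to already point past the two-byte instruction, as it would after fetching the opcode and the offset.

```rust
/// Conditional relative jump.
/// `pc` is the address just after the 2-byte JR instruction.
/// `offset` is the raw operand byte, interpreted as a signed value.
/// Returns (new PC, T-states consumed).
pub fn jr_cc(pc: u16, offset: u8, condition: bool) -> (u16, u8) {
    if condition {
        // Sign-extend the 8-bit offset to 16 bits before adding.
        let signed = offset as i8 as i16;
        (pc.wrapping_add(signed as u16), 12)
    } else {
        (pc, 8)
    }
}
```

For example, the byte pair 0x18 0xFE at 0x0100 (JR -2) jumps back to 0x0100, a common idiom for an idle loop.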
Rust Implementation:
pub struct Cpu<M: MemoryBus> {
    pub regs: Registers,
    pub bus: M,
    pub halted: bool,
    pub ime: bool,         // Interrupt Master Enable
    pub ime_pending: bool, // EI enables IME after next instruction
    cycles: u64,           // Total cycles executed
}

impl<M: MemoryBus> Cpu<M> {
    /// Execute one instruction and return the number of cycles it took
    pub fn step(&mut self) -> u8 {
        // Handle pending IME enable
        if self.ime_pending {
            self.ime = true;
            self.ime_pending = false;
        }

        // Handle halted state: the clock keeps running, so still
        // consume (and count) 4 cycles while halted
        if self.halted {
            self.cycles += 4;
            return 4;
        }

        // FETCH
        let opcode = self.fetch_byte();

        // DECODE & EXECUTE
        let cycles = self.execute(opcode);

        // Update total cycle count
        self.cycles += cycles as u64;
        cycles
    }

    /// Fetch a byte and increment PC
    fn fetch_byte(&mut self) -> u8 {
        let byte = self.bus.read(self.regs.pc);
        self.regs.pc = self.regs.pc.wrapping_add(1);
        byte
    }

    /// Fetch a 16-bit word (little-endian) and increment PC by 2
    fn fetch_word(&mut self) -> u16 {
        let lo = self.fetch_byte() as u16;
        let hi = self.fetch_byte() as u16;
        (hi << 8) | lo
    }

    /// Execute an instruction and return cycles
    fn execute(&mut self, opcode: u8) -> u8 {
        match opcode {
            0x00 => { /* NOP */ 4 }
            0x01 => { /* LD BC, d16 */
                let value = self.fetch_word();
                self.regs.set_bc(value);
                12
            }
            // ... hundreds more opcodes ...
            0xCB => {
                let cb_opcode = self.fetch_byte();
                self.execute_cb(cb_opcode)
            }
            _ => panic!("Unknown opcode: 0x{:02X}", opcode),
        }
    }

    fn execute_cb(&mut self, opcode: u8) -> u8 {
        match opcode {
            // rlc_r8 and Reg8 are helpers you define alongside the other ALU routines
            0x00 => { /* RLC B */ self.rlc_r8(Reg8::B); 8 }
            // ... 255 more CB opcodes ...
            _ => panic!("Unknown CB opcode: 0x{:02X}", opcode),
        }
    }
}
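Because `step` returns the cycles it consumed, a host can drive the core one frame at a time. The loop below is a sketch of that idea with the CPU abstracted as a closure, so it compiles on its own; `run_frame` and its carry-over convention are assumptions for illustration, not part of the listing above.

```rust
/// Clock cycles in one frame (154 scanlines x 456 cycles).
pub const CYCLES_PER_FRAME: u32 = 70_224;

/// Call `step` until one frame's worth of cycles has elapsed.
/// `carry_in` is the overshoot returned by the previous frame.
/// Returns the new overshoot, to be passed into the next call.
pub fn run_frame(carry_in: u32, mut step: impl FnMut() -> u8) -> u32 {
    let mut elapsed = carry_in;
    while elapsed < CYCLES_PER_FRAME {
        elapsed += u32::from(step());
    }
    elapsed - CYCLES_PER_FRAME
}
```

In a real front end the closure would call `cpu.step()`, feed the returned cycles to the PPU and timer, and then handle interrupts. Carrying the overshoot forward keeps the long-run average at exactly 70,224 cycles per frame even though instructions rarely end on a frame boundary.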
Timing and Cycle Accuracy
Cycle accuracy matters because the Game Boy’s hardware is tightly synchronized:
+-----------------------------------------------------------------------+
| TIMING RELATIONSHIPS |
+-----------------------------------------------------------------------+
CPU Clock: 4,194,304 Hz
|
| / 4
v
Machine Cycles: 1,048,576 Hz
|
|
+-----------------+-----------------+
| | |
v v v
PPU Timing Timer Timing Audio Timing
| | |
v v v
456 cycles Variable Sample rate
per scanline based on TAC depends on
154 scanlines channel
per frame
|
v
70,224 cycles per frame = 59.73 FPS
PPU MODE TIMING (per scanline):
Mode 2 (OAM Scan): 80 cycles - PPU reading OAM
Mode 3 (Drawing): 172-289 cycles - PPU reading VRAM, rendering
Mode 0 (HBlank): 87-204 cycles - Horizontal blank (the remainder of the line)
-----------
456 cycles total
After 144 visible scanlines:
Mode 1 (VBlank): 4560 cycles (10 scanlines x 456 cycles)
-----------
Total per frame: 70,224 cycles
WHY ACCURACY MATTERS:
Many games rely on precise timing for:
1. Raster effects (changing graphics mid-frame)
2. STAT interrupt tricks
3. Audio synchronization
4. Copy protection
If your emulator runs the CPU too fast relative to the PPU,
graphics will glitch and games may break.
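Given a cycle position within the frame, the current PPU mode follows from the table above. This sketch assumes the minimum Mode 3 length of 172 cycles on every line, which a real PPU does not guarantee; the function names are hypothetical.

```rust
/// Clock cycles per scanline.
pub const CYCLES_PER_SCANLINE: u32 = 456;

/// Convert a cycle offset within a frame (0..70_224) into (scanline, dot).
pub fn position(frame_cycle: u32) -> (u8, u16) {
    let line = (frame_cycle / CYCLES_PER_SCANLINE) as u8;
    let dot = (frame_cycle % CYCLES_PER_SCANLINE) as u16;
    (line, dot)
}

/// PPU mode (0-3) for a scanline and dot.
/// Assumption: Mode 3 always lasts exactly 172 cycles.
pub fn ppu_mode(line: u8, dot: u16) -> u8 {
    if line >= 144 {
        return 1; // VBlank covers scanlines 144-153 entirely
    }
    match dot {
        0..=79 => 2,   // OAM scan
        80..=251 => 3, // Drawing
        _ => 0,        // HBlank
    }
}
```

This fixed-length approximation is enough for many CPU-side tests that only poll STAT or LY; accurate raster effects need the variable Mode 3 length.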
Book Reference: Pan Docs (The definitive Game Boy reference) - https://gbdev.io/pandocs/
Memory Banking Controllers (MBC)
Game Boy cartridges can contain far more than 32KB of ROM. Memory Banking Controllers (MBCs) are chips on the cartridge that allow swapping different “banks” of ROM and RAM into the CPU’s address space:
+-----------------------------------------------------------------------+
| MEMORY BANKING OVERVIEW |
+-----------------------------------------------------------------------+
Without MBC (32KB ROM max):
0x0000-0x3FFF: ROM Bank 0 (16KB) - Fixed
0x4000-0x7FFF: ROM Bank 1 (16KB) - Fixed
Total: 32KB ROM
With MBC1 (up to 2MB ROM, 32KB RAM):
0x0000-0x3FFF: ROM Bank 0 (16KB) - Fixed*
0x4000-0x7FFF: ROM Bank 1-127 (16KB) - Switchable
0xA000-0xBFFF: External RAM Bank 0-3 (8KB each) - Switchable
* In advanced mode, bank 0 can also be switched
MBC REGISTER INTERFACE (Memory-Mapped):
MBC1:
0x0000-0x1FFF: RAM Enable
Write 0x0A to enable external RAM
Write 0x00 to disable
0x2000-0x3FFF: ROM Bank Number (Lower 5 bits)
Write bank number (1-31)
Writing 0 selects bank 1
0x4000-0x5FFF: RAM Bank / Upper ROM Bank bits
In ROM mode: Upper 2 bits of ROM bank (for >512KB)
In RAM mode: RAM bank number (0-3)
0x6000-0x7FFF: Banking Mode Select
0 = ROM Banking Mode (default)
1 = RAM Banking Mode
MBC3 (up to 2MB ROM, 32KB RAM, RTC):
Adds Real-Time Clock registers mapped at 0xA000-0xBFFF
when bank 0x08-0x0C is selected.
RTC Registers:
0x08: Seconds (0-59)
0x09: Minutes (0-59)
0x0A: Hours (0-23)
0x0B: Day counter (lower 8 bits)
0x0C: Day counter bit 8 (bit 0), Halt flag (bit 6), Day carry (bit 7)
MBC5 (up to 8MB ROM, 128KB RAM):
Most common MBC for later games.
0x2000-0x2FFF: ROM Bank Number (lower 8 bits)
0x3000-0x3FFF: ROM Bank Number (9th bit)
0x4000-0x5FFF: RAM Bank Number (0-15)
Rust Implementation:
// In a no_std crate, Vec comes from the alloc crate (a global allocator is required)
extern crate alloc;
use alloc::vec::Vec;

pub enum MbcType {
    None,
    Mbc1,
    Mbc3,
    Mbc5,
}

pub struct Cartridge {
    rom: Vec<u8>,
    ram: Vec<u8>,
    mbc_type: MbcType,
    rom_bank: usize,
    ram_bank: usize,
    ram_enabled: bool,
    banking_mode: u8,
}

impl Cartridge {
    pub fn read(&self, addr: u16) -> u8 {
        match addr {
            // Fixed ROM Bank 0 (0xFF if the image is shorter than expected)
            0x0000..=0x3FFF => self.rom.get(addr as usize).copied().unwrap_or(0xFF),
            // Switchable ROM Bank
            0x4000..=0x7FFF => {
                let offset = (addr as usize) - 0x4000;
                let bank_offset = self.rom_bank * 0x4000;
                self.rom.get(bank_offset + offset).copied().unwrap_or(0xFF)
            }
            // External RAM
            0xA000..=0xBFFF => {
                if self.ram_enabled && !self.ram.is_empty() {
                    let offset = (addr as usize) - 0xA000;
                    let bank_offset = self.ram_bank * 0x2000;
                    self.ram.get(bank_offset + offset).copied().unwrap_or(0xFF)
                } else {
                    0xFF
                }
            }
            _ => 0xFF,
        }
    }

    pub fn write(&mut self, addr: u16, value: u8) {
        match self.mbc_type {
            MbcType::None => { /* No MBC, writes ignored */ }
            MbcType::Mbc1 => self.write_mbc1(addr, value),
            // write_mbc3 and write_mbc5 follow the same pattern and are not shown here
            MbcType::Mbc3 => self.write_mbc3(addr, value),
            MbcType::Mbc5 => self.write_mbc5(addr, value),
        }
    }

    fn write_mbc1(&mut self, addr: u16, value: u8) {
        match addr {
            0x0000..=0x1FFF => {
                self.ram_enabled = (value & 0x0F) == 0x0A;
            }
            0x2000..=0x3FFF => {
                // Lower 5 bits of the ROM bank; writing 0 selects bank 1
                let low = match (value & 0x1F) as usize {
                    0 => 1,
                    n => n,
                };
                self.rom_bank = (self.rom_bank & 0x60) | low;
            }
            0x4000..=0x5FFF => {
                let bits = (value & 0x03) as usize;
                if self.banking_mode == 0 {
                    // ROM banking mode: upper 2 bits of the ROM bank
                    self.rom_bank = (self.rom_bank & 0x1F) | (bits << 5);
                } else {
                    // RAM banking mode: RAM bank number
                    self.ram_bank = bits;
                }
            }
            0x6000..=0x7FFF => {
                self.banking_mode = value & 0x01;
            }
            0xA000..=0xBFFF => {
                if self.ram_enabled && !self.ram.is_empty() {
                    let offset = (addr as usize) - 0xA000;
                    let bank_offset = self.ram_bank * 0x2000;
                    if bank_offset + offset < self.ram.len() {
                        self.ram[bank_offset + offset] = value;
                    }
                }
            }
            _ => {}
        }
    }
}
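The MBC5 registers listed in the overview can be sketched in the same style. The example below models only the three bank registers from that list; `Mbc5Regs` is a hypothetical standalone type, and the power-on bank values are an assumption rather than something the text above specifies.

```rust
/// Bank-select state for an MBC5 cartridge (hypothetical standalone type).
pub struct Mbc5Regs {
    pub rom_bank: u16, // 9-bit bank number, 0..=0x1FF
    pub ram_bank: u8,  // 4-bit bank number, 0..=0x0F
}

impl Mbc5Regs {
    /// Assumed power-on state: ROM bank 1, RAM bank 0.
    pub fn new() -> Self {
        Mbc5Regs { rom_bank: 1, ram_bank: 0 }
    }

    /// Handle a CPU write into the bank-register area.
    pub fn write(&mut self, addr: u16, value: u8) {
        match addr {
            // Lower 8 bits of the ROM bank number
            0x2000..=0x2FFF => self.rom_bank = (self.rom_bank & 0x100) | value as u16,
            // 9th bit of the ROM bank number
            0x3000..=0x3FFF => {
                self.rom_bank = (self.rom_bank & 0x0FF) | ((value as u16 & 0x01) << 8)
            }
            // RAM bank number (0-15)
            0x4000..=0x5FFF => self.ram_bank = value & 0x0F,
            _ => {}
        }
    }

    /// Byte offset into the ROM image for a CPU address in 0x4000-0x7FFF.
    pub fn rom_offset(&self, addr: u16) -> usize {
        self.rom_bank as usize * 0x4000 + (addr as usize - 0x4000)
    }
}
```

Unlike the MBC1 code above, this sketch does not remap a written 0 to bank 1: MBC5 is generally described as allowing bank 0 in the switchable window.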
PPU (Picture Processing Unit) Basics
While this project focuses on the CPU, understanding the PPU is essential because they share memory and synchronization:
+-----------------------------------------------------------------------+
| PPU ARCHITECTURE OVERVIEW |
+-----------------------------------------------------------------------+
DISPLAY CHARACTERISTICS:
Resolution: 160 x 144 pixels
Colors: 4 shades (on original GB)
Refresh rate: 59.73 Hz
TILE-BASED RENDERING:
The Game Boy doesn't store pixels directly. Instead, it stores:
1. Tile patterns (8x8 pixel bitmaps)
2. Tile maps (which tiles go where)
3. Sprite data (position, tile, flags)
+---+---+---+---+---+ Each tile is 8x8 pixels
| | | | | | Each pixel is 2 bits (4 colors)
+---+---+---+---+---+ Each tile = 16 bytes
| | | | | |
+---+---+---+---+---+ 384 tiles fit in VRAM; an 8-bit index selects among 256
| | | | | |
+---+---+---+---+---+
| | | | | |
+---+---+---+---+---+
TILE DATA ENCODING:
Each row of a tile is 2 bytes:
Byte 0: Low bits of each pixel
Byte 1: High bits of each pixel
Example row:
Byte 0: 0x3C = 0011 1100
Byte 1: 0x7E = 0111 1110
Pixel values:
Pos: 7 6 5 4 3 2 1 0
Byte0: 0 0 1 1 1 1 0 0
Byte1: 0 1 1 1 1 1 1 0
--------------------------------
Color: 0 2 3 3 3 3 2 0
Color index: 0 = White, 1 = Light gray, 2 = Dark gray, 3 = Black
(assuming an identity palette in the BGP register)
PPU MODES (Visible in STAT register):
Mode 0: HBlank (87-204 cycles)
- CPU can access VRAM and OAM
Mode 1: VBlank (4560 cycles = 10 lines)
- CPU can access VRAM and OAM
- Good time for game logic and VRAM updates
- VBLANK interrupt triggered
Mode 2: OAM Scan (80 cycles)
- CPU can access VRAM but NOT OAM
- PPU reading sprite data
Mode 3: Drawing (172-289 cycles)
- CPU CANNOT access VRAM or OAM
- PPU rendering pixels
SCANLINE TIMING DIAGRAM:
Scanline 0-143 (Visible):
|-- Mode 2 --|------- Mode 3 -------|------- Mode 0 -------|
| 80 cyc | 172-289 cyc | 87-204 cyc |
| | | |
|<---------------------- 456 cycles ----------------------->|
Scanline 144-153 (VBlank):
|----------------------- Mode 1 ---------------------------|
| 456 cycles |
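The tile row encoding described above can be decoded with a few shifts. This is a minimal sketch; `decode_row` is a hypothetical helper that returns raw color indices and leaves palette lookup to the caller.

```rust
/// Decode one 8-pixel tile row into color indices (0-3), leftmost pixel first.
/// `low` holds the low bit of each pixel, `high` the high bit;
/// bit 7 of each byte belongs to the leftmost pixel.
pub fn decode_row(low: u8, high: u8) -> [u8; 8] {
    let mut pixels = [0u8; 8];
    for (x, pixel) in pixels.iter_mut().enumerate() {
        let bit = 7 - x;
        let lo = (low >> bit) & 1;
        let hi = (high >> bit) & 1;
        *pixel = (hi << 1) | lo;
    }
    pixels
}
```

Calling `decode_row(0x3C, 0x7E)` reproduces the example row from the encoding diagram. A full tile is eight such rows, read from 16 consecutive bytes of VRAM.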
Interrupts: VBLANK, LCD STAT, Timer, Serial, Joypad
The Game Boy has 5 interrupt sources with fixed priority:
+-----------------------------------------------------------------------+
| INTERRUPT SYSTEM |
+-----------------------------------------------------------------------+
INTERRUPT VECTORS (Fixed addresses in ROM):
Address | Interrupt | Priority | Bit in IF/IE
----------+----------------+----------+--------------
0x0040 | VBlank | Highest | Bit 0
0x0048 | LCD STAT | | Bit 1
0x0050 | Timer | | Bit 2
0x0058 | Serial | | Bit 3
0x0060 | Joypad | Lowest | Bit 4
INTERRUPT HANDLING SEQUENCE:
When an interrupt occurs and IME=1:
1. The IF bit corresponding to the interrupt is set by hardware
2. CPU completes current instruction
3. CPU checks: (IE & IF) != 0 and IME == 1
4. If true:
a. IME = 0 (disable further interrupts)
b. PUSH PC onto stack (SP -= 2, write PC)
c. Clear the specific IF bit being serviced
d. PC = interrupt vector address
e. Total time: 5 M-cycles (20 T-states)
REGISTERS:
0xFF0F - IF (Interrupt Flag)
Bit 4: Joypad
Bit 3: Serial
Bit 2: Timer
Bit 1: LCD STAT
Bit 0: VBlank
Writing 1 to a bit REQUESTS that interrupt.
Reading shows pending interrupts.
Bits are CLEARED when interrupt is serviced.
0xFFFF - IE (Interrupt Enable)
Same bit layout as IF.
Bit = 1: Interrupt source is enabled
Bit = 0: Interrupt source is masked
IME (Interrupt Master Enable) - NOT memory-mapped
Internal CPU flag.
Controlled by EI, DI, RETI instructions.
EI enables IME AFTER the next instruction.
DI disables IME immediately.
PRIORITY HANDLING:
If multiple interrupts are pending, the lowest-numbered bit wins:
IF = 0x05 (VBlank + Timer pending)
IE = 0xFF (all enabled)
VBlank (bit 0) is serviced first.
After RETI or EI, Timer (bit 2) would be serviced.
HALT BEHAVIOR:
The HALT instruction stops CPU execution until an interrupt:
If (IE & IF) == 0:
CPU sleeps, PPU and timers continue running
CPU wakes when (IE & IF) != 0
If IME == 0 but (IE & IF) != 0:
HALT BUG: PC fails to increment on next instruction
(Yes, this is a real hardware bug!)
Rust Implementation:
impl<M: MemoryBus> Cpu<M> {
    /// Check for and handle pending interrupts
    /// Returns cycles consumed by interrupt handling (0 or 20)
    pub fn handle_interrupts(&mut self) -> u8 {
        let ie = self.bus.read(0xFFFF);
        let if_ = self.bus.read(0xFF0F);
        // Only bits 0-4 correspond to interrupt sources; mask the rest so
        // stray upper bits can never select a non-existent vector
        let pending = ie & if_ & 0x1F;
        if pending == 0 {
            return 0;
        }

        // An interrupt is pending - wake from HALT
        self.halted = false;

        // If IME is disabled, don't actually service the interrupt
        if !self.ime {
            return 0;
        }

        // Service the highest priority pending interrupt (lowest set bit)
        let interrupt_bit = pending.trailing_zeros() as u8;
        let vector = 0x0040 + (interrupt_bit as u16 * 8);

        // Clear the IF flag
        self.bus.write(0xFF0F, if_ & !(1 << interrupt_bit));

        // Disable interrupts
        self.ime = false;

        // Push PC and jump to vector
        self.push_word(self.regs.pc);
        self.regs.pc = vector;

        20 // Interrupt handling takes 5 M-cycles
    }

    /// Push a 16-bit value: high byte first, so the low byte ends up at the lower address
    fn push_word(&mut self, value: u16) {
        self.regs.sp = self.regs.sp.wrapping_sub(1);
        self.bus.write(self.regs.sp, (value >> 8) as u8);
        self.regs.sp = self.regs.sp.wrapping_sub(1);
        self.bus.write(self.regs.sp, value as u8);
    }
}
Why no_std Matters for Portability
Building the emulator core with #![no_std] means the same crate compiles unchanged for browsers, desktops, and microcontrollers:
+-----------------------------------------------------------------------+
| no_std PORTABILITY |
+-----------------------------------------------------------------------+
WITH no_std, YOUR CORE CAN RUN ON:
+------------------+ +-----------------+ +------------------+
| Web Browser | | Desktop App | | Embedded MCU |
| (via WebAssembly)| | (Windows/Linux/ | | (ESP32, STM32, |
| | | macOS) | | Raspberry Pi |
+------------------+ +-----------------+ | Pico) |
| | +------------------+
| | |
v v v
+---------------------------------------------------------------+
| YOUR no_std CORE |
| |
| pub struct Cpu<M: MemoryBus> { ... } |
| impl<M: MemoryBus> Cpu<M> { pub fn step(&mut self) -> u8 } |
| |
+---------------------------------------------------------------+
|
| Implements trait
v
+---------------------------------------------------------------+
| MemoryBus TRAIT |
| |
| fn read(&self, addr: u16) -> u8; |
| fn write(&mut self, addr: u16, value: u8); |
| |
+---------------------------------------------------------------+
PLATFORM-SPECIFIC IMPLEMENTATIONS:
WebAssembly:
- MemoryBus backed by JavaScript TypedArray
- Frame rendering via Canvas API
- Audio via Web Audio API
- Input from DOM events
Desktop:
- MemoryBus backed by Vec<u8>
- Frame rendering via SDL2, winit, or similar
- Audio via cpal, rodio, or system audio
- Input from window events
Embedded:
- MemoryBus backed by static array or external RAM
- Frame rendering via SPI display (ILI9341, ST7789, etc.)
- Audio via I2S DAC
- Input from GPIO buttons
THE KEY INSIGHT:
By abstracting all I/O through traits, the CPU core becomes
pure computation - no allocation, no system calls, no platform
dependencies. It's just:
loop {
let cycles = cpu.step();
advance_other_hardware(cycles);
}
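In practice a frontend rarely runs that loop forever; it usually runs one frame's worth of cycles and then presents the picture. The sketch below shows one way to do that. The `Step` trait, `run_frame`, and `FixedCost` are hypothetical names introduced for illustration (they stand in for `Cpu::step` and `advance_other_hardware`), and the frame length is the 70,224 cycles quoted in the trace output.

```rust
/// T-cycles in one full frame: 154 scanlines x 456 cycles each.
pub const CYCLES_PER_FRAME: u32 = 154 * 456; // 70,224

/// Hypothetical stand-in for the CPU core: anything that can execute
/// one instruction and report how many T-cycles it took.
pub trait Step {
    fn step(&mut self) -> u8;
}

/// Runs `core` for one frame's worth of cycles.
///
/// `carry` is the overshoot left over from the previous frame. The return
/// value is the overshoot to pass into the next call, so timing does not
/// drift when the last instruction crosses the frame boundary.
pub fn run_frame<C: Step>(core: &mut C, carry: u32, mut tick: impl FnMut(u8)) -> u32 {
    let mut elapsed = carry;
    while elapsed < CYCLES_PER_FRAME {
        let cycles = core.step();
        tick(cycles); // advance PPU, timer, and so on by the same amount
        elapsed += u32::from(cycles);
    }
    elapsed - CYCLES_PER_FRAME
}

/// Test double: every "instruction" costs a fixed number of cycles.
pub struct FixedCost(pub u8);

impl Step for FixedCost {
    fn step(&mut self) -> u8 {
        self.0
    }
}
```

Nothing here allocates or touches the operating system, so the same function can be driven by a browser animation callback, a desktop event loop, or a timer interrupt on a microcontroller.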
Book Reference: “The Secret Life of Programs” by Jonathan Steinhart - Chapter 5
Real World Outcome
When complete, you will have a working Game Boy emulator core that demonstrates deep understanding of CPU architecture, memory systems, and Rust’s no_std ecosystem.
Example: Boot Sequence and CPU Trace
$ cargo run --release -- --rom roms/tetris.gb --trace --frames 4
+=======================================================================+
| Game Boy Emulator Core v0.1.0 (no_std) |
+=======================================================================+
| ROM: tetris.gb |
| Size: 32768 bytes (32 KB) |
| Type: ROM ONLY (No MBC) |
| Checksum: 0x3B VALID |
+=======================================================================+
[BOOT] Initializing CPU (LR35902 @ 4.194304 MHz)
[BOOT] Initializing PPU (LCD Controller)
[BOOT] Initializing Memory Map (64KB address space)
[BOOT] Loading Boot ROM (256 bytes) at 0x0000-0x00FF
[BOOT] Starting execution at PC=0x0000
======================== CPU EXECUTION TRACE ===========================
Cycle: 0000000 | PC: 0x0000 | SP: 0x0000 | [BOOT ROM]
Memory[0x0000] = 0x31
OP: 0x31 (LD SP, d16)
Operands: 0xFE, 0xFF -> d16 = 0xFFFE
Execution: SP <- 0xFFFE
Registers BEFORE: A:00 F:00 B:00 C:00 D:00 E:00 H:00 L:00 SP:0000
Registers AFTER: A:00 F:00 B:00 C:00 D:00 E:00 H:00 L:00 SP:FFFE
Flags: [Z:0 N:0 H:0 C:0] (unchanged)
Cycles: 12 (3 M-cycles)
Cycle: 0000012 | PC: 0x0003 | SP: 0xFFFE
Memory[0x0003] = 0xAF
OP: 0xAF (XOR A)
Execution: A <- A XOR A = 0x00
Registers AFTER: A:00 F:80 B:00 C:00 D:00 E:00 H:00 L:00 SP:FFFE
Flags: [Z:1 N:0 H:0 C:0] <- Zero flag SET (result is 0)
Cycles: 4 (1 M-cycle)
Cycle: 0000016 | PC: 0x0004 | SP: 0xFFFE
Memory[0x0004] = 0x21
OP: 0x21 (LD HL, d16)
Operands: 0x26, 0xFF -> d16 = 0xFF26
Execution: HL <- 0xFF26 (Audio Master Control register)
Registers AFTER: A:00 F:80 B:00 C:00 D:00 E:00 H:FF L:26 SP:FFFE
Flags: [Z:1 N:0 H:0 C:0] (unchanged)
Cycles: 12 (3 M-cycles)
Cycle: 0000028 | PC: 0x0007 | SP: 0xFFFE
Memory[0x0007] = 0x0E
OP: 0x0E (LD C, d8)
Operands: 0x11
Execution: C <- 0x11
Registers AFTER: A:00 F:80 B:00 C:11 D:00 E:00 H:FF L:26 SP:FFFE
Cycles: 8 (2 M-cycles)
... [Boot ROM continues for ~244 more instructions] ...
Cycle: 0024812 | PC: 0x00FE | SP: 0xFFFE | [BOOT ROM -> GAME ROM]
Memory[0x00FE] = 0xE0
OP: 0xE0 (LDH (a8), A)
Operands: 0x50
Execution: Memory[0xFF50] <- A
Writing 0x01 to 0xFF50 DISABLES BOOT ROM
[BOOT] Boot ROM disabled. Cartridge ROM now visible at 0x0000-0x00FF
Cycles: 12
Cycle: 0024824 | PC: 0x0100 | SP: 0xFFFE | [GAME ROM - ENTRY POINT]
=================================================================
| GAME ROM EXECUTION BEGINS |
=================================================================
Memory[0x0100] = 0x00
OP: 0x00 (NOP)
Execution: No operation
Registers: A:01 F:B0 B:00 C:13 D:00 E:D8 H:01 L:4D SP:FFFE
Flags: [Z:1 N:0 H:1 C:1]
Cycles: 4
Cycle: 0024828 | PC: 0x0101 | SP: 0xFFFE
Memory[0x0101] = 0xC3
OP: 0xC3 (JP a16)
Operands: 0x50, 0x01 -> a16 = 0x0150
Execution: PC <- 0x0150 (Jump to game initialization)
Cycles: 16
Cycle: 0024844 | PC: 0x0150 | SP: 0xFFFE | [GAME INITIALIZATION]
Memory[0x0150] = 0xC3
OP: 0xC3 (JP a16)
Operands: 0xD3, 0x02 -> a16 = 0x02D3
Execution: PC <- 0x02D3
Cycles: 16
... [Game initialization continues] ...
======================== PPU RENDERING TRACE ===========================
[PPU] Frame 0 | Scanline: 000 | Mode: OAM_SCAN
LCD Control (0xFF40): 0x91
- LCD Enabled: YES
- Window Tile Map: 0x9800
- Window Enabled: NO
- BG Tile Data: 0x8000
- BG Tile Map: 0x9800
- OBJ Size: 8x8
- OBJ Enabled: NO
- BG Enabled: YES
[PPU] Frame 0 | Scanline: 000 | Mode: DRAWING
Scroll: SCX=0x00, SCY=0x00
Fetching tiles from map at 0x9800...
Tile[0,0] = 0x00 -> Pattern at 0x8000
Tile[0,1] = 0x00 -> Pattern at 0x8000
... [20 tiles per scanline]
Rendering 160 pixels to framebuffer[0..159]
[PPU] Frame 0 | Scanline: 000 | Mode: HBLANK
Horizontal blank period
CPU can access VRAM freely
[PPU] Frame 0 | Scanline: 001 | Mode: OAM_SCAN
... [Scanlines 1-143 continue] ...
[PPU] Frame 0 | Scanline: 144 | Mode: VBLANK
+===========================================================+
| VBLANK INTERRUPT TRIGGERED |
| |
| Frame Complete: |
| - 144 visible scanlines rendered |
| - 160x144 = 23,040 pixels |
| - 4 shades per pixel |
| |
| Timing: |
| - 70,224 CPU cycles per frame |
| - Frame time: 16.74ms (59.73 Hz) |
+===========================================================+
[INT] Setting IF bit 0 (VBlank)
[INT] IE & IF = 0x01, IME = 1
[INT] Servicing VBlank interrupt
[INT] PUSH PC (0x02F8) -> Stack at 0xFFFC
[INT] PC <- 0x0040 (VBlank vector)
[INT] IME <- 0 (Interrupts disabled)
Cycle: 0070244 | PC: 0x0040 | SP: 0xFFFC | [VBLANK HANDLER]
Memory[0x0040] = 0xC3
OP: 0xC3 (JP a16)
Operands: 0x00, 0x04 -> a16 = 0x0400
Execution: PC <- 0x0400 (Game's VBlank handler)
Cycles: 16
... [VBlank handler executes] ...
[PPU] Frame 0 | Scanline: 153 | Mode: VBLANK (last)
[PPU] Frame 0 Complete. Starting Frame 1.
[PPU] Frame 1 | Starting new frame
[PPU] Frame 2 | Starting new frame
[PPU] Frame 3 | Starting new frame
======================== EXECUTION SUMMARY =============================
Total Cycles Executed: 280,896 (4 frames)
Total Instructions: 24,312
Total Frames Rendered: 4
Average FPS: 59.73 Hz (target: 59.73 Hz)
Execution Time: 66.96ms (simulated)
Instruction Breakdown:
+------------------+--------+--------+
| Category | Count | % |
+------------------+--------+--------+
| 8-bit Loads | 6,234 | 25.6% |
| 16-bit Loads | 1,102 | 4.5% |
| Arithmetic | 3,891 | 16.0% |
| Logic | 2,567 | 10.6% |
| Bit Operations | 1,234 | 5.1% |
| Jumps/Calls | 2,876 | 11.8% |
| Stack Operations | 891 | 3.7% |
| Control | 5,517 | 22.7% |
+------------------+--------+--------+
Memory Access Statistics:
ROM Reads: 19,234
WRAM Reads: 8,901
WRAM Writes: 4,567
VRAM Reads: 1,234 (mostly by PPU)
VRAM Writes: 456
I/O Reads: 2,123
I/O Writes: 1,024
Interrupt Statistics:
VBlank: 4 (one per frame)
LCD STAT: 0
Timer: 0
Serial: 0
Joypad: 0
[EMULATOR] Execution complete. Exiting.
Example: no_std Build for WebAssembly
$ cargo build --target wasm32-unknown-unknown --release
Compiling gameboy-core v0.1.0
Finished release [optimized] target(s) in 2.34s
$ ls -la target/wasm32-unknown-unknown/release/
-rwxr-xr-x 1 user staff 45678 Dec 27 12:00 gameboy_core.wasm
$ wasm-opt -Os -o optimized.wasm target/wasm32-unknown-unknown/release/gameboy_core.wasm
$ ls -la optimized.wasm
-rw-r--r-- 1 user staff 23456 Dec 27 12:00 optimized.wasm
[BUILD] WebAssembly output: 23 KB (no_std, no allocations)
[BUILD] Can be loaded directly in browser with <50KB footprint
Example: no_std Build for Embedded
$ cargo build --target thumbv7em-none-eabihf --release
Compiling gameboy-core v0.1.0
Finished release [optimized] target(s) in 3.42s
$ arm-none-eabi-size target/thumbv7em-none-eabihf/release/libgameboy_core.a
text data bss dec hex filename
18432 24 2048 20504 5018 gameboy_core.o
[BUILD] Successfully compiled for ARM Cortex-M4 (no_std)
[BUILD] Code size: 18 KB (Flash)
[BUILD] RAM usage: 2 KB (Static)
[BUILD] Suitable for Cortex-M4F/M7F parts such as the STM32F4 (other MCUs need their own target triple)
Complete Project Specification
Minimum Viable Product (MVP):
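- CPU core with all 512 opcode slots correctly implemented (256 base + 256 CB-prefixed; 11 of the base slots are undefined on the LR35902)
- Pass the 11 individual Blargg cpu_instrs test ROMs (each fits in 32KB; the combined cpu_instrs.gb needs MBC1)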
- Basic memory bus supporting 32KB ROM (no MBC)
- Accurate cycle counting
- Compiles with no_std
Extended Goals:
- MBC1 support for larger ROMs (2MB)
- MBC3 support with RTC
- Pass Blargg’s instr_timing.gb test
- Basic PPU for visual output
- Save state serialization
Solution Architecture
+-----------------------------------------------------------------------+
| PROJECT STRUCTURE |
+-----------------------------------------------------------------------+
gameboy-core/
+-- Cargo.toml
+-- src/
| +-- lib.rs <- Crate root, no_std configuration
| +-- cpu/
| | +-- mod.rs <- CPU struct and step() method
| | +-- registers.rs <- Register definitions and accessors
| | +-- opcodes.rs <- Opcode execution (base instructions)
| | +-- cb_opcodes.rs <- CB-prefixed instructions
| | +-- flags.rs <- Flag calculation helpers
| | +-- interrupts.rs <- Interrupt handling
| +-- memory/
| | +-- mod.rs <- MemoryBus trait definition
| | +-- bus.rs <- Main memory bus implementation
| | +-- cartridge.rs <- ROM loading, MBC handling
| | +-- io.rs <- I/O register handling
| +-- ppu/
| | +-- mod.rs <- PPU struct (optional for MVP)
| +-- timer.rs <- Timer (DIV, TIMA, TMA, TAC)
+-- tests/
| +-- blargg.rs <- Integration tests with Blargg ROMs
| +-- instruction_tests.rs <- Unit tests for each opcode
+-- roms/ <- Test ROMs (git-ignored)
+-- cpu_instrs.gb
+-- instr_timing.gb
Core Traits and Types:
// lib.rs
#![no_std]
pub mod cpu;
pub mod memory;
pub mod timer;
pub use cpu::Cpu;
pub use memory::{MemoryBus, Bus};
// memory/mod.rs
/// Memory bus abstraction - implement this for different backends
pub trait MemoryBus {
fn read(&self, addr: u16) -> u8;
fn write(&mut self, addr: u16, value: u8);
fn read_word(&self, addr: u16) -> u16 {
let lo = self.read(addr) as u16;
let hi = self.read(addr.wrapping_add(1)) as u16;
(hi << 8) | lo
}
fn write_word(&mut self, addr: u16, value: u16) {
self.write(addr, value as u8);
self.write(addr.wrapping_add(1), (value >> 8) as u8);
}
}
// cpu/mod.rs
pub struct Cpu<M: MemoryBus> {
pub regs: Registers,
pub bus: M,
pub halted: bool,
pub stopped: bool,
pub ime: bool,
pub ime_pending: bool,
cycles: u64,
}
impl<M: MemoryBus> Cpu<M> {
pub fn new(bus: M) -> Self {
Self {
regs: Registers::new(),
bus,
halted: false,
stopped: false,
ime: false,
ime_pending: false,
cycles: 0,
}
}
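/// Execute one instruction and return cycles consumed
pub fn step(&mut self) -> u8 {
// Handle pending EI: IME takes effect one instruction after EI
if self.ime_pending {
self.ime = true;
self.ime_pending = false;
}
let cycles = if self.halted {
// While halted the CPU idles, but the clock keeps ticking
4
} else {
let opcode = self.fetch_byte();
self.execute(opcode)
};
// Keep the running total that total_cycles() reports
self.cycles += cycles as u64;
cycles
}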
/// Check and handle interrupts, return cycles consumed
pub fn handle_interrupts(&mut self) -> u8 {
// ... implementation
}
pub fn total_cycles(&self) -> u64 {
self.cycles
}
}
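`step()` relies on `fetch_byte`, and the opcode handlers later rely on `fetch_word`, but neither helper is spelled out in this guide. A minimal sketch of how they might look follows. `MiniCpu` and `FlatBus` are hypothetical cut-down types used only to keep the example self-contained; in the real crate these methods would live on `Cpu<M>` and use `self.regs.pc`.

```rust
pub trait MemoryBus {
    fn read(&self, addr: u16) -> u8;
    fn write(&mut self, addr: u16, value: u8);
}

/// Cut-down CPU for this sketch: only the program counter and the bus.
pub struct MiniCpu<M: MemoryBus> {
    pub pc: u16,
    pub bus: M,
}

impl<M: MemoryBus> MiniCpu<M> {
    /// Reads the byte at PC and advances PC by one (wrapping at 0xFFFF).
    pub fn fetch_byte(&mut self) -> u8 {
        let value = self.bus.read(self.pc);
        self.pc = self.pc.wrapping_add(1);
        value
    }

    /// Reads a little-endian 16-bit immediate: low byte first, then high.
    pub fn fetch_word(&mut self) -> u16 {
        let lo = u16::from(self.fetch_byte());
        let hi = u16::from(self.fetch_byte());
        (hi << 8) | lo
    }
}

/// Flat 64KB RAM, used only to exercise the helpers.
pub struct FlatBus(pub [u8; 0x10000]);

impl MemoryBus for FlatBus {
    fn read(&self, addr: u16) -> u8 {
        self.0[usize::from(addr)]
    }

    fn write(&mut self, addr: u16, value: u8) {
        self.0[usize::from(addr)] = value;
    }
}
```

With the bytes 0x31, 0xFE, 0xFF at address 0 (the `LD SP, d16` from the boot trace), `fetch_byte` returns the opcode and `fetch_word` returns 0xFFFE, leaving PC at 0x0003.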
Phased Implementation Guide
Phase 1: Register Definitions with Bit-Shifting (Days 1-2)
Goal: Create the register file with efficient 8-bit and 16-bit access.
// cpu/registers.rs
/// LR35902 CPU registers
#[derive(Debug, Clone, Copy, Default)]
pub struct Registers {
pub a: u8,
pub f: u8,
pub b: u8,
pub c: u8,
pub d: u8,
pub e: u8,
pub h: u8,
pub l: u8,
pub sp: u16,
pub pc: u16,
}
impl Registers {
/// Create registers with post-boot values
pub fn new() -> Self {
Self {
a: 0x01,
f: 0xB0, // Z=1, N=0, H=1, C=1
b: 0x00,
c: 0x13,
d: 0x00,
e: 0xD8,
h: 0x01,
l: 0x4D,
sp: 0xFFFE,
pc: 0x0100,
}
}
// 16-bit register pair accessors
pub fn af(&self) -> u16 {
((self.a as u16) << 8) | (self.f as u16)
}
pub fn set_af(&mut self, value: u16) {
self.a = (value >> 8) as u8;
self.f = (value & 0xF0) as u8; // Lower 4 bits always 0
}
pub fn bc(&self) -> u16 {
((self.b as u16) << 8) | (self.c as u16)
}
pub fn set_bc(&mut self, value: u16) {
self.b = (value >> 8) as u8;
self.c = value as u8;
}
pub fn de(&self) -> u16 {
((self.d as u16) << 8) | (self.e as u16)
}
pub fn set_de(&mut self, value: u16) {
self.d = (value >> 8) as u8;
self.e = value as u8;
}
pub fn hl(&self) -> u16 {
((self.h as u16) << 8) | (self.l as u16)
}
pub fn set_hl(&mut self, value: u16) {
self.h = (value >> 8) as u8;
self.l = value as u8;
}
// Flag accessors
pub fn z(&self) -> bool { (self.f & 0x80) != 0 }
pub fn n(&self) -> bool { (self.f & 0x40) != 0 }
pub fn h(&self) -> bool { (self.f & 0x20) != 0 }
pub fn c(&self) -> bool { (self.f & 0x10) != 0 }
pub fn set_z(&mut self, value: bool) {
if value { self.f |= 0x80; } else { self.f &= !0x80; }
}
pub fn set_n(&mut self, value: bool) {
if value { self.f |= 0x40; } else { self.f &= !0x40; }
}
pub fn set_h(&mut self, value: bool) {
if value { self.f |= 0x20; } else { self.f &= !0x20; }
}
pub fn set_c(&mut self, value: bool) {
if value { self.f |= 0x10; } else { self.f &= !0x10; }
}
/// Set all flags at once
pub fn set_flags(&mut self, z: bool, n: bool, h: bool, c: bool) {
self.f = 0;
if z { self.f |= 0x80; }
if n { self.f |= 0x40; }
if h { self.f |= 0x20; }
if c { self.f |= 0x10; }
}
}
Checkpoint: Write unit tests that verify register pairing works correctly.
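As a starting point for that checkpoint, the tests might look like the sketch below. To stay self-contained it repeats a trimmed copy of `Registers` with only the A, F, B, and C fields; in your crate the tests would sit next to the full struct. The test names are illustrative.

```rust
/// Trimmed copy of `Registers` so the tests compile on their own.
#[derive(Debug, Clone, Copy, Default, PartialEq)]
pub struct Registers {
    pub a: u8,
    pub f: u8,
    pub b: u8,
    pub c: u8,
}

impl Registers {
    pub fn af(&self) -> u16 {
        (u16::from(self.a) << 8) | u16::from(self.f)
    }

    pub fn set_af(&mut self, value: u16) {
        self.a = (value >> 8) as u8;
        self.f = (value & 0xF0) as u8; // lower 4 bits of F always read 0
    }

    pub fn bc(&self) -> u16 {
        (u16::from(self.b) << 8) | u16::from(self.c)
    }

    pub fn set_bc(&mut self, value: u16) {
        self.b = (value >> 8) as u8;
        self.c = value as u8;
    }
}

#[test]
fn bc_round_trips() {
    let mut regs = Registers::default();
    regs.set_bc(0x1234);
    // High byte lands in B, low byte in C
    assert_eq!((regs.b, regs.c), (0x12, 0x34));
    assert_eq!(regs.bc(), 0x1234);
}

#[test]
fn af_masks_low_nibble_of_f() {
    let mut regs = Registers::default();
    regs.set_af(0x12FF);
    assert_eq!(regs.f, 0xF0);
    assert_eq!(regs.af(), 0x12F0);
}
```

The second test matters more than it looks: `POP AF` goes through `set_af`, and Blargg's tests check that the low nibble of F never holds a 1.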
Phase 2: Memory Bus Abstraction (Days 3-4)
Goal: Create a flexible memory bus that can support different backends.
// memory/mod.rs
pub trait MemoryBus {
fn read(&self, addr: u16) -> u8;
fn write(&mut self, addr: u16, value: u8);
fn read_word(&self, addr: u16) -> u16 {
let lo = self.read(addr) as u16;
let hi = self.read(addr.wrapping_add(1)) as u16;
(hi << 8) | lo
}
fn write_word(&mut self, addr: u16, value: u16) {
self.write(addr, value as u8);
self.write(addr.wrapping_add(1), (value >> 8) as u8);
}
}
// memory/bus.rs
/// Main Game Boy memory bus
pub struct Bus {
/// Cartridge ROM (32KB for no-MBC)
rom: [u8; 0x8000],
/// Video RAM (8KB)
vram: [u8; 0x2000],
/// Work RAM (8KB)
wram: [u8; 0x2000],
/// High RAM (127 bytes)
hram: [u8; 127],
/// I/O registers (128 bytes)
io: [u8; 128],
/// Interrupt Enable register
ie: u8,
/// Boot ROM enabled flag
boot_rom_enabled: bool,
/// Boot ROM contents (256 bytes)
boot_rom: [u8; 256],
}
impl Bus {
pub fn new() -> Self {
Self {
rom: [0; 0x8000],
vram: [0; 0x2000],
wram: [0; 0x2000],
hram: [0; 127],
io: [0; 128],
ie: 0,
boot_rom_enabled: true,
boot_rom: [0; 256],
}
}
pub fn load_rom(&mut self, data: &[u8]) {
let len = data.len().min(self.rom.len());
self.rom[..len].copy_from_slice(&data[..len]);
}
pub fn load_boot_rom(&mut self, data: &[u8]) {
let len = data.len().min(256);
self.boot_rom[..len].copy_from_slice(&data[..len]);
}
}
impl MemoryBus for Bus {
fn read(&self, addr: u16) -> u8 {
match addr {
// ROM Bank 0 / Boot ROM
0x0000..=0x00FF => {
if self.boot_rom_enabled {
self.boot_rom[addr as usize]
} else {
self.rom[addr as usize]
}
}
// ROM Bank 0 (continued)
0x0100..=0x3FFF => self.rom[addr as usize],
// ROM Bank 1 (for no-MBC)
0x4000..=0x7FFF => self.rom[addr as usize],
// VRAM
0x8000..=0x9FFF => self.vram[(addr - 0x8000) as usize],
// External RAM (not implemented for no-MBC)
0xA000..=0xBFFF => 0xFF,
// WRAM
0xC000..=0xDFFF => self.wram[(addr - 0xC000) as usize],
// Echo RAM (mirror of WRAM)
0xE000..=0xFDFF => self.wram[(addr - 0xE000) as usize],
// OAM
0xFE00..=0xFE9F => 0xFF, // TODO: Implement OAM
// Unusable
0xFEA0..=0xFEFF => 0xFF,
// I/O Registers
0xFF00..=0xFF7F => self.read_io(addr),
// HRAM
0xFF80..=0xFFFE => self.hram[(addr - 0xFF80) as usize],
// Interrupt Enable
0xFFFF => self.ie,
}
}
fn write(&mut self, addr: u16, value: u8) {
match addr {
// ROM (read-only for no-MBC)
0x0000..=0x7FFF => { /* Ignored */ }
// VRAM
0x8000..=0x9FFF => self.vram[(addr - 0x8000) as usize] = value,
// External RAM
0xA000..=0xBFFF => { /* Ignored for no-MBC */ }
// WRAM
0xC000..=0xDFFF => self.wram[(addr - 0xC000) as usize] = value,
// Echo RAM
0xE000..=0xFDFF => self.wram[(addr - 0xE000) as usize] = value,
// OAM
0xFE00..=0xFE9F => { /* TODO */ }
// Unusable
0xFEA0..=0xFEFF => { /* Ignored */ }
// I/O Registers
0xFF00..=0xFF7F => self.write_io(addr, value),
// HRAM
0xFF80..=0xFFFE => self.hram[(addr - 0xFF80) as usize] = value,
// Interrupt Enable
0xFFFF => self.ie = value,
}
}
}
impl Bus {
fn read_io(&self, addr: u16) -> u8 {
let offset = (addr - 0xFF00) as usize;
match addr {
0xFF44 => 0x90, // LY = 144 (fake VBlank for testing)
_ => self.io[offset],
}
}
fn write_io(&mut self, addr: u16, value: u8) {
let offset = (addr - 0xFF00) as usize;
match addr {
0xFF50 => {
// Writing to FF50 disables boot ROM
if value != 0 {
self.boot_rom_enabled = false;
}
}
_ => self.io[offset] = value,
}
}
}
Checkpoint: Memory reads and writes work correctly, boot ROM overlay functions.
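Two behaviors are worth pinning down for this checkpoint: the echo-RAM mirror and the one-way boot ROM latch. The sketch below isolates both as small pieces you can test without building a full `Bus`. `wram_index` and `BootOverlay` are hypothetical helpers, not part of the listing above; they mirror the match arms in `read`/`write` and the `0xFF50` handling in `write_io`.

```rust
/// Maps a WRAM or echo-RAM address to an index into the 8KB `wram` array.
/// Returns `None` for addresses outside both ranges.
pub fn wram_index(addr: u16) -> Option<usize> {
    match addr {
        0xC000..=0xDFFF => Some(usize::from(addr - 0xC000)),
        0xE000..=0xFDFF => Some(usize::from(addr - 0xE000)), // echo of 0xC000-0xDDFF
        _ => None,
    }
}

/// Cut-down model of the 0x0000-0x00FF overlay from `Bus`.
pub struct BootOverlay {
    boot_rom: [u8; 256],
    cart_rom: [u8; 256],
    boot_rom_enabled: bool,
}

impl BootOverlay {
    pub fn new(boot_rom: [u8; 256], cart_rom: [u8; 256]) -> Self {
        Self { boot_rom, cart_rom, boot_rom_enabled: true }
    }

    /// Reads from the first 256 bytes of the address space.
    pub fn read(&self, addr: u8) -> u8 {
        if self.boot_rom_enabled {
            self.boot_rom[usize::from(addr)]
        } else {
            self.cart_rom[usize::from(addr)]
        }
    }

    /// Models a write to 0xFF50: any non-zero value unmaps the boot ROM.
    /// Nothing maps it back in, so a later write of 0 has no effect.
    pub fn write_ff50(&mut self, value: u8) {
        if value != 0 {
            self.boot_rom_enabled = false;
        }
    }
}
```

A test that writes through 0xC123 and reads back through 0xE123 on the real `Bus` should agree with `wram_index`, since both addresses resolve to the same array slot.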
Phase 3: Implement Core Instructions (Days 5-10)
Goal: Implement roughly 20 common instructions that form the backbone of most programs.
Start with these essential opcodes:
// cpu/opcodes.rs
impl<M: MemoryBus> Cpu<M> {
pub fn execute(&mut self, opcode: u8) -> u8 {
match opcode {
// NOP
0x00 => 4,
// LD BC, d16
0x01 => {
let value = self.fetch_word();
self.regs.set_bc(value);
12
}
// LD (BC), A
0x02 => {
self.bus.write(self.regs.bc(), self.regs.a);
8
}
// INC BC
0x03 => {
self.regs.set_bc(self.regs.bc().wrapping_add(1));
8
}
// INC B
0x04 => {
self.regs.b = self.inc8(self.regs.b);
4
}
// DEC B
0x05 => {
self.regs.b = self.dec8(self.regs.b);
4
}
// LD B, d8
0x06 => {
self.regs.b = self.fetch_byte();
8
}
// ADD A, B
0x80 => {
self.add_a(self.regs.b);
4
}
// SUB B
0x90 => {
self.sub_a(self.regs.b);
4
}
// AND B
0xA0 => {
self.and_a(self.regs.b);
4
}
// XOR A
0xAF => {
self.xor_a(self.regs.a);
4
}
// OR B
0xB0 => {
self.or_a(self.regs.b);
4
}
// CP B
0xB8 => {
self.cp_a(self.regs.b);
4
}
// JP a16
0xC3 => {
let addr = self.fetch_word();
self.regs.pc = addr;
16
}
// JP NZ, a16
0xC2 => {
let addr = self.fetch_word();
if !self.regs.z() {
self.regs.pc = addr;
16
} else {
12
}
}
// JR r8
0x18 => {
let offset = self.fetch_byte() as i8;
self.regs.pc = self.regs.pc.wrapping_add(offset as u16);
12
}
// CALL a16
0xCD => {
let addr = self.fetch_word();
self.push_word(self.regs.pc);
self.regs.pc = addr;
24
}
// RET
0xC9 => {
self.regs.pc = self.pop_word();
16
}
// PUSH BC
0xC5 => {
self.push_word(self.regs.bc());
16
}
// POP BC
0xC1 => {
let value = self.pop_word();
self.regs.set_bc(value);
12
}
// CB prefix
0xCB => {
let cb_opcode = self.fetch_byte();
self.execute_cb(cb_opcode)
}
_ => {
panic!("Unimplemented opcode: 0x{:02X} at PC: 0x{:04X}",
opcode, self.regs.pc.wrapping_sub(1));
}
}
}
// Helper functions for arithmetic
fn inc8(&mut self, value: u8) -> u8 {
let result = value.wrapping_add(1);
self.regs.set_z(result == 0);
self.regs.set_n(false);
self.regs.set_h((value & 0x0F) == 0x0F);
// C flag not affected
result
}
fn dec8(&mut self, value: u8) -> u8 {
let result = value.wrapping_sub(1);
self.regs.set_z(result == 0);
self.regs.set_n(true);
self.regs.set_h((value & 0x0F) == 0);
// C flag not affected
result
}
fn add_a(&mut self, value: u8) {
let a = self.regs.a;
let result = a.wrapping_add(value);
self.regs.set_z(result == 0);
self.regs.set_n(false);
self.regs.set_h((a & 0x0F) + (value & 0x0F) > 0x0F);
self.regs.set_c((a as u16) + (value as u16) > 0xFF);
self.regs.a = result;
}
fn sub_a(&mut self, value: u8) {
let a = self.regs.a;
let result = a.wrapping_sub(value);
self.regs.set_z(result == 0);
self.regs.set_n(true);
self.regs.set_h((a & 0x0F) < (value & 0x0F));
self.regs.set_c(a < value);
self.regs.a = result;
}
fn and_a(&mut self, value: u8) {
self.regs.a &= value;
self.regs.set_flags(self.regs.a == 0, false, true, false);
}
fn xor_a(&mut self, value: u8) {
self.regs.a ^= value;
self.regs.set_flags(self.regs.a == 0, false, false, false);
}
fn or_a(&mut self, value: u8) {
self.regs.a |= value;
self.regs.set_flags(self.regs.a == 0, false, false, false);
}
fn cp_a(&mut self, value: u8) {
let a = self.regs.a;
self.regs.set_z(a == value);
self.regs.set_n(true);
self.regs.set_h((a & 0x0F) < (value & 0x0F));
self.regs.set_c(a < value);
// A is NOT modified
}
fn push_word(&mut self, value: u16) {
self.regs.sp = self.regs.sp.wrapping_sub(1);
self.bus.write(self.regs.sp, (value >> 8) as u8);
self.regs.sp = self.regs.sp.wrapping_sub(1);
self.bus.write(self.regs.sp, value as u8);
}
fn pop_word(&mut self) -> u16 {
let lo = self.bus.read(self.regs.sp) as u16;
self.regs.sp = self.regs.sp.wrapping_add(1);
let hi = self.bus.read(self.regs.sp) as u16;
self.regs.sp = self.regs.sp.wrapping_add(1);
(hi << 8) | lo
}
}
Checkpoint: Simple test programs execute correctly.
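The helpers above cover ADD, SUB, and CP, but not their carry-in variants ADC and SBC, which are a common source of failing flag tests. The sketch below shows one way to compute them as pure functions that return `(result, flags)`. The function names and `FLAG_*` constants are illustrative, not taken from the listing; the bit positions match the `Registers` flag accessors.

```rust
pub const FLAG_Z: u8 = 0x80;
pub const FLAG_N: u8 = 0x40;
pub const FLAG_H: u8 = 0x20;
pub const FLAG_C: u8 = 0x10;

/// ADC: A + value + carry-in. Returns (result, F register).
pub fn adc(a: u8, value: u8, carry_in: bool) -> (u8, u8) {
    let c = u8::from(carry_in);
    let result = a.wrapping_add(value).wrapping_add(c);
    let mut f = 0;
    if result == 0 {
        f |= FLAG_Z;
    }
    // The carry-in takes part in the half-carry out of bit 3...
    if (a & 0x0F) + (value & 0x0F) + c > 0x0F {
        f |= FLAG_H;
    }
    // ...and in the full carry out of bit 7.
    if u16::from(a) + u16::from(value) + u16::from(c) > 0xFF {
        f |= FLAG_C;
    }
    (result, f)
}

/// SBC: A - value - carry-in. Returns (result, F register).
pub fn sbc(a: u8, value: u8, carry_in: bool) -> (u8, u8) {
    let c = u8::from(carry_in);
    let result = a.wrapping_sub(value).wrapping_sub(c);
    let mut f = FLAG_N;
    if result == 0 {
        f |= FLAG_Z;
    }
    // Half-borrow: the low nibble of A is too small for value plus carry.
    if (a & 0x0F) < (value & 0x0F) + c {
        f |= FLAG_H;
    }
    // Full borrow, widened to u16 so value + carry cannot overflow.
    if u16::from(a) < u16::from(value) + u16::from(c) {
        f |= FLAG_C;
    }
    (result, f)
}
```

The easy mistake is to fold the carry into `value` first and reuse `add_a`: for `value = 0xFF` with carry set, the sum wraps to 0 and both H and C come out wrong.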
Phase 4: Implement CB-Prefixed Bit Operations (Days 11-14)
Goal: Complete the 256 CB-prefixed instructions.
// cpu/cb_opcodes.rs
impl<M: MemoryBus> Cpu<M> {
pub fn execute_cb(&mut self, opcode: u8) -> u8 {
// Lower 3 bits select the operand: B, C, D, E, H, L, (HL), A
let reg_idx = opcode & 0x07;
match opcode {
// RLC r
0x00..=0x07 => {
let value = self.read_r8(reg_idx);
let result = self.rlc(value);
self.write_r8(reg_idx, result);
if reg_idx == 6 { 16 } else { 8 }
}
// RRC r
0x08..=0x0F => {
let value = self.read_r8(reg_idx);
let result = self.rrc(value);
self.write_r8(reg_idx, result);
if reg_idx == 6 { 16 } else { 8 }
}
// RL r
0x10..=0x17 => {
let value = self.read_r8(reg_idx);
let result = self.rl(value);
self.write_r8(reg_idx, result);
if reg_idx == 6 { 16 } else { 8 }
}
// RR r
0x18..=0x1F => {
let value = self.read_r8(reg_idx);
let result = self.rr(value);
self.write_r8(reg_idx, result);
if reg_idx == 6 { 16 } else { 8 }
}
// SLA r
0x20..=0x27 => {
let value = self.read_r8(reg_idx);
let result = self.sla(value);
self.write_r8(reg_idx, result);
if reg_idx == 6 { 16 } else { 8 }
}
// SRA r
0x28..=0x2F => {
let value = self.read_r8(reg_idx);
let result = self.sra(value);
self.write_r8(reg_idx, result);
if reg_idx == 6 { 16 } else { 8 }
}
// SWAP r
0x30..=0x37 => {
let value = self.read_r8(reg_idx);
let result = self.swap(value);
self.write_r8(reg_idx, result);
if reg_idx == 6 { 16 } else { 8 }
}
// SRL r
0x38..=0x3F => {
let value = self.read_r8(reg_idx);
let result = self.srl(value);
self.write_r8(reg_idx, result);
if reg_idx == 6 { 16 } else { 8 }
}
// BIT n, r
0x40..=0x7F => {
let bit = (opcode >> 3) & 0x07;
let value = self.read_r8(reg_idx);
self.bit(bit, value);
if reg_idx == 6 { 12 } else { 8 }
}
// RES n, r
0x80..=0xBF => {
let bit = (opcode >> 3) & 0x07;
let value = self.read_r8(reg_idx);
let result = value & !(1 << bit);
self.write_r8(reg_idx, result);
if reg_idx == 6 { 16 } else { 8 }
}
// SET n, r
0xC0..=0xFF => {
let bit = (opcode >> 3) & 0x07;
let value = self.read_r8(reg_idx);
let result = value | (1 << bit);
self.write_r8(reg_idx, result);
if reg_idx == 6 { 16 } else { 8 }
}
}
}
fn read_r8(&self, idx: u8) -> u8 {
match idx {
0 => self.regs.b,
1 => self.regs.c,
2 => self.regs.d,
3 => self.regs.e,
4 => self.regs.h,
5 => self.regs.l,
6 => self.bus.read(self.regs.hl()),
7 => self.regs.a,
_ => unreachable!(),
}
}
fn write_r8(&mut self, idx: u8, value: u8) {
match idx {
0 => self.regs.b = value,
1 => self.regs.c = value,
2 => self.regs.d = value,
3 => self.regs.e = value,
4 => self.regs.h = value,
5 => self.regs.l = value,
6 => self.bus.write(self.regs.hl(), value),
7 => self.regs.a = value,
_ => unreachable!(),
}
}
// Bit operation helpers
fn rlc(&mut self, value: u8) -> u8 {
let carry = (value >> 7) & 1;
let result = (value << 1) | carry;
self.regs.set_flags(result == 0, false, false, carry != 0);
result
}
fn rrc(&mut self, value: u8) -> u8 {
let carry = value & 1;
let result = (value >> 1) | (carry << 7);
self.regs.set_flags(result == 0, false, false, carry != 0);
result
}
fn rl(&mut self, value: u8) -> u8 {
let old_carry = if self.regs.c() { 1 } else { 0 };
let new_carry = (value >> 7) & 1;
let result = (value << 1) | old_carry;
self.regs.set_flags(result == 0, false, false, new_carry != 0);
result
}
fn rr(&mut self, value: u8) -> u8 {
let old_carry = if self.regs.c() { 0x80 } else { 0 };
let new_carry = value & 1;
let result = (value >> 1) | old_carry;
self.regs.set_flags(result == 0, false, false, new_carry != 0);
result
}
fn sla(&mut self, value: u8) -> u8 {
let carry = (value >> 7) & 1;
let result = value << 1;
self.regs.set_flags(result == 0, false, false, carry != 0);
result
}
fn sra(&mut self, value: u8) -> u8 {
let carry = value & 1;
let result = (value >> 1) | (value & 0x80); // Preserve sign bit
self.regs.set_flags(result == 0, false, false, carry != 0);
result
}
fn swap(&mut self, value: u8) -> u8 {
let result = (value >> 4) | (value << 4);
self.regs.set_flags(result == 0, false, false, false);
result
}
fn srl(&mut self, value: u8) -> u8 {
let carry = value & 1;
let result = value >> 1;
self.regs.set_flags(result == 0, false, false, carry != 0);
result
}
fn bit(&mut self, bit: u8, value: u8) {
let result = value & (1 << bit);
self.regs.set_z(result == 0);
self.regs.set_n(false);
self.regs.set_h(true);
// C flag not affected
}
}
Checkpoint: All CB-prefixed instructions implemented and tested.
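One way to test all 256 CB opcodes without writing 256 cases is to check the decoding separately from the execution. The sketch below is a hypothetical pure decoder (the `CbOp` enum, `decode_cb`, and `cb_cycles` are not part of the listing above) that follows the same bit layout and cycle counts as `execute_cb`.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CbOp {
    Rlc,
    Rrc,
    Rl,
    Rr,
    Sla,
    Sra,
    Swap,
    Srl,
    Bit(u8),
    Res(u8),
    Set(u8),
}

/// Splits a CB-prefixed opcode into (operation, operand index).
/// Bits 7-6 pick the group, bits 5-3 the rotate/shift kind or the bit
/// number, and bits 2-0 the operand (6 means the byte at (HL)).
pub fn decode_cb(opcode: u8) -> (CbOp, u8) {
    let reg = opcode & 0x07;
    let mid = (opcode >> 3) & 0x07;
    let op = match opcode >> 6 {
        0 => match mid {
            0 => CbOp::Rlc,
            1 => CbOp::Rrc,
            2 => CbOp::Rl,
            3 => CbOp::Rr,
            4 => CbOp::Sla,
            5 => CbOp::Sra,
            6 => CbOp::Swap,
            _ => CbOp::Srl,
        },
        1 => CbOp::Bit(mid),
        2 => CbOp::Res(mid),
        _ => CbOp::Set(mid),
    };
    (op, reg)
}

/// T-cycles for a CB instruction, including the prefix fetch.
pub fn cb_cycles(opcode: u8) -> u8 {
    match decode_cb(opcode) {
        (_, reg) if reg != 6 => 8,  // register operand
        (CbOp::Bit(_), _) => 12,    // BIT n,(HL) only reads memory
        _ => 16,                    // read-modify-write on (HL)
    }
}
```

A loop over `0u8..=255` can then assert that the cycles returned by `execute_cb` match `cb_cycles`, which catches copy-paste slips in the match arms.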
Phase 5: Interrupt Handling (Days 15-17)
Goal: Implement the complete interrupt system.
// cpu/interrupts.rs
impl<M: MemoryBus> Cpu<M> {
/// Handle pending interrupts
/// Returns cycles consumed (0 if no interrupt, 20 if interrupt serviced)
pub fn handle_interrupts(&mut self) -> u8 {
let ie = self.bus.read(0xFFFF);
let if_ = self.bus.read(0xFF0F);
let pending = ie & if_ & 0x1F;
if pending == 0 {
return 0;
}
// Wake from HALT if any interrupt is pending
self.halted = false;
// If IME is disabled, don't service the interrupt
if !self.ime {
return 0;
}
// Find highest priority pending interrupt (lowest bit number)
let interrupt_bit = pending.trailing_zeros() as u8;
// Vector addresses: 0x40, 0x48, 0x50, 0x58, 0x60
let vector = 0x0040 + (interrupt_bit as u16 * 8);
// Clear the IF flag for this interrupt
let new_if = if_ & !(1 << interrupt_bit);
self.bus.write(0xFF0F, new_if);
// Disable interrupts (IME = 0)
self.ime = false;
// Push current PC to stack
self.push_word(self.regs.pc);
// Jump to interrupt vector
self.regs.pc = vector;
// Interrupt dispatch takes 5 M-cycles (20 T-states)
20
}
}
// Instructions that affect interrupts
impl<M: MemoryBus> Cpu<M> {
fn execute_di(&mut self) {
// DI - Disable interrupts immediately
self.ime = false;
}
fn execute_ei(&mut self) {
// EI - Enable interrupts after next instruction
self.ime_pending = true;
}
fn execute_reti(&mut self) -> u8 {
// RETI - Return and enable interrupts
self.regs.pc = self.pop_word();
self.ime = true;
16
}
fn execute_halt(&mut self) {
// HALT - Stop CPU until interrupt
self.halted = true;
// HALT bug: if IME=0 and (IE & IF) != 0 when HALT executes,
// the CPU does not halt and PC fails to increment on the next
// fetch, so the byte after HALT is read twice.
// This is a real hardware bug; the code above does not emulate it yet.
}
}
Checkpoint: Interrupts fire at correct times, HALT works correctly.
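`execute_halt` above only sets `halted`, so the HALT bug still needs handling. A possible approach, sketched below under stated assumptions, is to classify the situation when HALT executes and to carry a one-shot flag into the next fetch. `HaltOutcome`, `classify_halt`, `Fetcher`, and the `halt_bug` field are all hypothetical; the `Cpu` struct in this guide has no such field.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HaltOutcome {
    /// CPU sleeps; when it wakes, the interrupt handler runs.
    SleepThenService,
    /// CPU sleeps; when it wakes, execution resumes after HALT.
    SleepThenResume,
    /// CPU does not sleep and the byte after HALT is read twice.
    HaltBug,
}

/// Decides what HALT does, given the state at the moment it executes.
pub fn classify_halt(ime: bool, ie: u8, if_: u8) -> HaltOutcome {
    let pending = ie & if_ & 0x1F;
    match (ime, pending != 0) {
        (true, _) => HaltOutcome::SleepThenService,
        (false, false) => HaltOutcome::SleepThenResume,
        (false, true) => HaltOutcome::HaltBug,
    }
}

/// Minimal fetch unit showing the effect of the bug on PC.
pub struct Fetcher {
    pub pc: u16,
    pub halt_bug: bool,
}

impl Fetcher {
    /// Reads the byte at PC. If the HALT bug is armed, PC is left
    /// unchanged once, so the same byte is fetched again next time.
    pub fn fetch(&mut self, mem: &[u8]) -> u8 {
        let byte = mem[usize::from(self.pc)];
        if self.halt_bug {
            self.halt_bug = false;
        } else {
            self.pc = self.pc.wrapping_add(1);
        }
        byte
    }
}
```

With `INC A` (0x3C) right after HALT and the bug armed, two consecutive fetches both return 0x3C, so A ends up incremented twice.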
Phase 6: Pass Blargg’s CPU Tests (Days 18-24)
Goal: Complete all remaining opcodes and pass the cpu_instrs test ROM.
Blargg’s test ROMs are the gold standard for Game Boy emulator accuracy. The cpu_instrs.gb ROM bundles 11 sub-tests, each of which is also available as an individual ROM:
- 01-special.gb - DAA, CPL, SCF, CCF
- 02-interrupts.gb - Interrupt timing
- 03-op sp,hl.gb - SP and HL operations
- 04-op r,imm.gb - Register/immediate operations
- 05-op rp.gb - Register pair operations
- 06-ld r,r.gb - Load between registers
- 07-jr,jp,call,ret,rst.gb - Control flow
- 08-misc instrs.gb - Miscellaneous
- 09-op r,r.gb - Register/register operations
- 10-bit ops.gb - CB-prefixed bit operations
- 11-op a,(hl).gb - A with (HL) operations
The DAA Instruction:
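DAA is widely regarded as one of the trickiest instructions to implement correctly, because its result depends on the N, H, and C flags left by the previous operation: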
fn execute_daa(&mut self) {
// DAA - Decimal Adjust Accumulator
// Converts A to BCD after an addition or subtraction
let mut a = self.regs.a as i16;
if !self.regs.n() {
// After addition
if self.regs.c() || a > 0x99 {
a += 0x60;
self.regs.set_c(true);
}
if self.regs.h() || (a & 0x0F) > 0x09 {
a += 0x06;
}
} else {
// After subtraction
if self.regs.c() {
a -= 0x60;
}
if self.regs.h() {
a -= 0x06;
}
}
self.regs.a = a as u8;
self.regs.set_z(self.regs.a == 0);
self.regs.set_h(false);
// C flag set above if needed, otherwise unchanged
}
Checkpoint: All 11 Blargg cpu_instrs tests pass.
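A few hand-worked BCD cases make DAA much easier to debug than a failing ROM. The sketch below restates the same adjustment rules as a pure function so it can be checked in isolation. The free-function form and its tuple return are assumptions made for the example; the logic follows `execute_daa` above, using wrapping `u8` arithmetic instead of `i16`.

```rust
/// Decimal-adjusts `a` given the N, H, and C flags from the previous
/// operation. Returns (adjusted A, new Z flag, new C flag).
pub fn daa(a: u8, n: bool, h: bool, c: bool) -> (u8, bool, bool) {
    let mut result = a;
    let mut carry = c;
    if !n {
        // After an addition: fix each decimal digit that overflowed.
        if c || a > 0x99 {
            result = result.wrapping_add(0x60);
            carry = true;
        }
        if h || (a & 0x0F) > 0x09 {
            result = result.wrapping_add(0x06);
        }
    } else {
        // After a subtraction: only the flags decide the correction.
        if c {
            result = result.wrapping_sub(0x60);
        }
        if h {
            result = result.wrapping_sub(0x06);
        }
    }
    (result, result == 0, carry)
}
```

Three cases to try by hand: BCD 15 + 27 gives 0x3C before adjustment and 0x42 after; BCD 99 + 01 gives 0x9A, which adjusts to 0x00 with Z and C set; BCD 42 - 15 gives 0x2D with H set, which adjusts to 0x27.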
Phase 7: Basic PPU for Visual Output (Days 25-30)
Goal: Implement enough PPU functionality to see visual output.
// ppu/mod.rs
pub struct Ppu {
/// Current scanline (0-153)
ly: u8,
/// LCD Control register
lcdc: u8,
/// LCD Status register
stat: u8,
/// Scroll X
scx: u8,
/// Scroll Y
scy: u8,
/// Current mode (0-3)
mode: PpuMode,
/// Cycles into current scanline
cycles: u16,
/// Frame buffer (160x144 pixels, one byte per pixel holding a 2-bit color index 0-3)
pub framebuffer: [u8; 160 * 144],
}
#[derive(Clone, Copy, PartialEq)]
pub enum PpuMode {
HBlank = 0,
VBlank = 1,
OamScan = 2,
Drawing = 3,
}
impl Ppu {
pub fn new() -> Self {
Self {
ly: 0,
lcdc: 0x91,
stat: 0,
scx: 0,
scy: 0,
mode: PpuMode::OamScan,
cycles: 0,
framebuffer: [0; 160 * 144],
}
}
/// Step the PPU by the given number of cycles
/// Returns true if VBlank interrupt should be triggered
pub fn step(&mut self, cycles: u8, vram: &[u8]) -> bool {
if !self.lcd_enabled() {
return false;
}
self.cycles += cycles as u16;
let mut vblank_interrupt = false;
match self.mode {
PpuMode::OamScan => {
if self.cycles >= 80 {
self.cycles -= 80;
self.mode = PpuMode::Drawing;
}
}
PpuMode::Drawing => {
if self.cycles >= 172 {
self.cycles -= 172;
self.mode = PpuMode::HBlank;
// Render this scanline
if self.ly < 144 {
self.render_scanline(vram);
}
}
}
PpuMode::HBlank => {
if self.cycles >= 204 {
self.cycles -= 204;
self.ly += 1;
if self.ly == 144 {
self.mode = PpuMode::VBlank;
vblank_interrupt = true;
} else {
self.mode = PpuMode::OamScan;
}
}
}
PpuMode::VBlank => {
if self.cycles >= 456 {
self.cycles -= 456;
self.ly += 1;
if self.ly > 153 {
self.ly = 0;
self.mode = PpuMode::OamScan;
}
}
}
}
vblank_interrupt
}
    fn lcd_enabled(&self) -> bool {
        (self.lcdc & 0x80) != 0
    }
    fn render_scanline(&mut self, vram: &[u8]) {
        if (self.lcdc & 0x01) == 0 {
            // BG disabled
            return;
        }
        let tile_map_addr = if (self.lcdc & 0x08) != 0 { 0x1C00 } else { 0x1800 };
        let tile_data_addr = if (self.lcdc & 0x10) != 0 { 0x0000 } else { 0x0800 };
        let signed_tile_nums = (self.lcdc & 0x10) == 0;
        let y = self.ly.wrapping_add(self.scy);
        let tile_row = (y / 8) as u16;
        let pixel_row = (y % 8) as u16;
        for screen_x in 0u8..160 {
            let x = screen_x.wrapping_add(self.scx);
            let tile_col = (x / 8) as u16;
            let pixel_col = 7 - (x % 8);
            // Get tile number from map
            let map_offset = tile_row * 32 + tile_col;
            let tile_num = vram[(tile_map_addr + map_offset) as usize];
            // Get tile data address
            let tile_addr = if signed_tile_nums {
                let signed_num = tile_num as i8 as i16;
                ((tile_data_addr as i16) + (signed_num + 128) * 16) as u16
            } else {
                tile_data_addr + (tile_num as u16) * 16
            };
            // Read tile row (2 bytes)
            let row_addr = tile_addr + pixel_row * 2;
            let low = vram[row_addr as usize];
            let high = vram[(row_addr + 1) as usize];
            // Get pixel color (0-3)
            let color_bit = ((high >> pixel_col) & 1) << 1 | ((low >> pixel_col) & 1);
            // Store in framebuffer
            let fb_idx = self.ly as usize * 160 + screen_x as usize;
            self.framebuffer[fb_idx] = color_bit;
        }
    }
    pub fn read_register(&self, addr: u16) -> u8 {
        match addr {
            0xFF40 => self.lcdc,
            0xFF41 => (self.stat & 0xF8) | (self.mode as u8),
            0xFF42 => self.scy,
            0xFF43 => self.scx,
            0xFF44 => self.ly,
            _ => 0xFF,
        }
    }
    pub fn write_register(&mut self, addr: u16, value: u8) {
        match addr {
            0xFF40 => self.lcdc = value,
            0xFF41 => self.stat = (value & 0xF8) | (self.stat & 0x07),
            0xFF42 => self.scy = value,
            0xFF43 => self.scx = value,
            // LY is read-only
            _ => {}
        }
    }
}
Checkpoint: Games display on screen.
Testing Strategy
Blargg’s Test ROMs
The gold standard for CPU accuracy:
// tests/blargg.rs
#[test]
fn test_blargg_01_special() {
    let rom = include_bytes!("../roms/01-special.gb");
    let mut emu = Emulator::new(rom);
    // Run until test completes or timeout
    for _ in 0..10_000_000 {
        emu.step();
        // Check serial output for pass/fail
        if emu.serial_output().contains("Passed") {
            return; // Success!
        }
        if emu.serial_output().contains("Failed") {
            panic!("Test failed: {}", emu.serial_output());
        }
    }
    panic!("Test timed out");
}
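The test above relies on `emu.serial_output()`, which is left undefined here. Blargg's CPU test ROMs report progress by writing each character to the serial data register SB (0xFF01) and then writing 0x81 to the serial control register SC (0xFF02). A minimal sketch of how that text could be captured follows; the `SerialCapture` type, its method names, and the 512-byte buffer are assumptions for illustration, and a real serial port would also model transfer timing and the serial interrupt.

```rust
/// Hypothetical serial capture for test ROM output.
/// The names and the buffer size are assumptions, not part of the project API.
pub struct SerialCapture {
    sb: u8,         // last byte written to SB (0xFF01)
    buf: [u8; 512], // fixed buffer keeps this usable without an allocator
    len: usize,
}

impl SerialCapture {
    pub const fn new() -> Self {
        Self { sb: 0, buf: [0; 512], len: 0 }
    }

    /// Call from the bus write handler for I/O register writes.
    pub fn write(&mut self, addr: u16, value: u8) {
        match addr {
            0xFF01 => self.sb = value,
            // 0x81 = start transfer (bit 7) with the internal clock (bit 0)
            0xFF02 if value == 0x81 => {
                if self.len < self.buf.len() {
                    self.buf[self.len] = self.sb;
                    self.len += 1;
                }
            }
            _ => {}
        }
    }

    /// Text captured so far; returns "" if the bytes are not valid UTF-8.
    pub fn output(&self) -> &str {
        core::str::from_utf8(&self.buf[..self.len]).unwrap_or("")
    }
}
```

With something like this wired into the bus, `serial_output()` can simply forward to `output()`.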
Instruction-Level Unit Tests
#[test]
fn test_add_sets_flags_correctly() {
    let mut cpu = Cpu::new(TestBus::new());
    // Test zero flag
    cpu.regs.a = 0;
    cpu.add_a(0);
    assert!(cpu.regs.z(), "Zero flag should be set for result 0");
    // Test half-carry
    cpu.regs.a = 0x0F;
    cpu.add_a(0x01);
    assert!(cpu.regs.h(), "Half-carry should be set for 0x0F + 0x01");
    // Test carry
    cpu.regs.a = 0xFF;
    cpu.add_a(0x01);
    assert!(cpu.regs.c(), "Carry should be set for 0xFF + 0x01");
    assert_eq!(cpu.regs.a, 0x00, "Result should wrap to 0");
}
Cycle Accuracy Verification
#[test]
fn test_instruction_cycles() {
    let mut cpu = Cpu::new(TestBus::with_program(&[
        0x00,             // NOP - 4 cycles
        0x01, 0x00, 0x00, // LD BC, d16 - 12 cycles
        0xC3, 0x00, 0x00, // JP a16 - 16 cycles
    ]));
    let initial = cpu.total_cycles();
    cpu.step(); // NOP
    assert_eq!(cpu.total_cycles() - initial, 4);
    cpu.step(); // LD BC, d16
    assert_eq!(cpu.total_cycles() - initial, 16);
    cpu.step(); // JP a16
    assert_eq!(cpu.total_cycles() - initial, 32);
}
Common Pitfalls
Pitfall 1: Half-Carry Flag Calculation Errors
Symptom: DAA produces wrong results, Blargg test 01 fails
Solution: Double-check your half-carry formula for each operation type:
- ADD: (a & 0xF) + (b & 0xF) > 0xF
- SUB: (a & 0xF) < (b & 0xF)
- 16-bit ADD: Uses bit 11, not bit 3
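The formulas above translate almost directly into small helpers. The sketch below uses hypothetical function names and also includes the carry-in variants for ADC and SBC, which the list does not mention but which follow the same nibble arithmetic.

```rust
/// Half-carry for 8-bit ADD: carry out of bit 3 into bit 4.
pub fn half_carry_add8(a: u8, b: u8) -> bool {
    (a & 0x0F) + (b & 0x0F) > 0x0F
}

/// Half-carry for 8-bit ADC: same as ADD, plus the incoming carry.
pub fn half_carry_adc8(a: u8, b: u8, carry_in: bool) -> bool {
    (a & 0x0F) + (b & 0x0F) + carry_in as u8 > 0x0F
}

/// Half-carry for 8-bit SUB: borrow from bit 4 into bit 3.
pub fn half_carry_sub8(a: u8, b: u8) -> bool {
    (a & 0x0F) < (b & 0x0F)
}

/// Half-carry for 8-bit SBC: same as SUB, plus the incoming carry (borrow).
pub fn half_carry_sbc8(a: u8, b: u8, carry_in: bool) -> bool {
    (a & 0x0F) < (b & 0x0F) + carry_in as u8
}

/// Half-carry for 16-bit ADD (ADD HL, rr): carry out of bit 11.
pub fn half_carry_add16(a: u16, b: u16) -> bool {
    (a & 0x0FFF) + (b & 0x0FFF) > 0x0FFF
}
```

Keeping these as free functions makes them easy to unit test in isolation before wiring them into the ALU.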
Pitfall 2: DAA Instruction Complexity
Symptom: BCD arithmetic produces garbage
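Solution: DAA behavior depends on the N, H, and C flags from the PREVIOUS operation. After an addition (N=0) it adds 0x06 when H is set or the low nibble exceeds 9, and 0x60 when C is set or A exceeds 0x99. After a subtraction (N=1) it subtracts the same corrections based on H and C alone. DAA always clears H and leaves C set whenever the 0x60 correction was applied. Compare against multiple implementations and test thoroughly.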
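As a concrete reference point, here is a minimal sketch of one commonly used DAA formulation. The `DaaResult` struct and the `daa` function signature are hypothetical; in a real core the flags would come from and go back to the F register.

```rust
/// Output of a DAA adjustment (hypothetical helper type).
/// N is left unchanged by DAA and H is always cleared, so neither is returned.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct DaaResult {
    pub a: u8,
    pub zero: bool,
    pub carry: bool,
}

/// Adjust `a` to packed BCD using the flags left by the previous ADD/SUB.
pub fn daa(a: u8, n: bool, h: bool, c: bool) -> DaaResult {
    let mut correction = 0u8;
    let mut carry = c;
    // Low digit: fix when a half-carry occurred, or (after addition only)
    // when the digit is outside 0-9.
    if h || (!n && (a & 0x0F) > 0x09) {
        correction |= 0x06;
    }
    // High digit: fix when a carry occurred, or (after addition only)
    // when the value is above 0x99.
    if c || (!n && a > 0x99) {
        correction |= 0x60;
        carry = true;
    }
    let result = if n {
        a.wrapping_sub(correction)
    } else {
        a.wrapping_add(correction)
    };
    DaaResult { a: result, zero: result == 0, carry }
}
```

For example, 0x15 + 0x27 gives 0x3C with no half-carry, and the adjustment turns it into 0x42, the BCD encoding of 15 + 27.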
Pitfall 3: Interrupt Timing Edge Cases
Symptom: Games freeze or behave erratically
Solution:
- EI enables interrupts only AFTER the instruction that follows it has executed
- The HALT bug triggers when HALT executes with IME=0 and (IE & IF) != 0
- Interrupt dispatch takes 5 M-cycles (20 T-cycles)
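The rules above can be modeled with a small amount of state. The sketch below is one possible arrangement, assuming a hypothetical `InterruptState` type whose `end_of_instruction` method the CPU calls after every instruction, including EI itself. The vector addresses follow the fixed layout 0x40, 0x48, 0x50, 0x58, 0x60 for VBlank, LCD STAT, Timer, Serial, and Joypad.

```rust
/// Hypothetical IME bookkeeping with the one-instruction EI delay.
pub struct InterruptState {
    pub ime: bool,
    ei_delay: u8,
}

impl InterruptState {
    pub const fn new() -> Self {
        Self { ime: false, ei_delay: 0 }
    }

    /// EI: schedule IME to turn on after the following instruction.
    pub fn ei(&mut self) {
        if self.ei_delay == 0 {
            self.ei_delay = 2; // counts EI itself plus the next instruction
        }
    }

    /// DI: takes effect immediately and cancels a pending EI.
    pub fn di(&mut self) {
        self.ime = false;
        self.ei_delay = 0;
    }

    /// Call once after each instruction completes.
    pub fn end_of_instruction(&mut self) {
        if self.ei_delay > 0 {
            self.ei_delay -= 1;
            if self.ei_delay == 0 {
                self.ime = true;
            }
        }
    }
}

/// Vector of the highest-priority pending interrupt (bit 0 wins), if any.
pub fn pending_vector(ie: u8, iflag: u8) -> Option<u16> {
    let pending = ie & iflag & 0x1F;
    if pending == 0 {
        None
    } else {
        Some(0x0040 + pending.trailing_zeros() as u16 * 8)
    }
}

/// Condition under which executing HALT triggers the HALT bug.
pub fn halt_bug_triggers(ime: bool, ie: u8, iflag: u8) -> bool {
    !ime && (ie & iflag & 0x1F) != 0
}
```

The dispatch itself (pushing PC, clearing the IF bit, jumping to the vector, and charging 5 M-cycles) is left out to keep the sketch short.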
Pitfall 4: 16-bit Register Access Patterns
Symptom: Stack corruption, wrong values
Solution:
- Stack grows DOWNWARD (SP decrements)
- 16-bit values are little-endian (low byte first)
- PUSH writes high byte first, POP reads low byte first
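The three rules above fit in two short helpers. This is a sketch over a flat byte slice with hypothetical names `push16` and `pop16`; a real core would route the accesses through its memory bus trait and count cycles.

```rust
/// Push a 16-bit value: SP decrements, high byte is written first.
pub fn push16(mem: &mut [u8], sp: &mut u16, value: u16) {
    *sp = sp.wrapping_sub(1);
    mem[*sp as usize] = (value >> 8) as u8; // high byte at the higher address
    *sp = sp.wrapping_sub(1);
    mem[*sp as usize] = value as u8; // low byte at the lower address
}

/// Pop a 16-bit value: low byte is read first, SP increments.
pub fn pop16(mem: &[u8], sp: &mut u16) -> u16 {
    let lo = mem[*sp as usize] as u16;
    *sp = sp.wrapping_add(1);
    let hi = mem[*sp as usize] as u16;
    *sp = sp.wrapping_add(1);
    (hi << 8) | lo
}
```

After `push16` the value sits in memory in little-endian order, which is why a matching `pop16` can read the low byte first.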
Pitfall 5: CB-Prefixed Instruction Timing
Symptom: Timing tests fail
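Solution: CB instructions that access (HL) take extra cycles on top of the 8 cycles used by the register forms:
- CB (HL) read: +4 cycles (BIT n,(HL) takes 12 in total)
- CB (HL) write: +4 more cycles (every other (HL) form takes 16 in total)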
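Because the CB table is so regular, the timing can be derived from the opcode bits instead of a lookup table. The helper below is a sketch with a hypothetical name; it returns T-cycles for the whole instruction, including the fetch of the 0xCB prefix.

```rust
/// T-cycles for a CB-prefixed instruction, given the byte after 0xCB.
pub fn cb_cycles(opcode: u8) -> u8 {
    // The low three bits select the operand; 6 means (HL).
    let uses_hl = (opcode & 0x07) == 0x06;
    // 0x40-0x7F are the BIT instructions, which only read their operand.
    let is_bit = (0x40..=0x7F).contains(&opcode);
    match (uses_hl, is_bit) {
        (false, _) => 8,     // register operand
        (true, true) => 12,  // BIT n,(HL): one extra memory read
        (true, false) => 16, // rotate/shift/SWAP/RES/SET (HL): read + write
    }
}
```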
Pitfall 6: Boot ROM Overlay
Symptom: Games don’t start
Solution:
- Boot ROM is at 0x0000-0x00FF on startup
- Writing to 0xFF50 disables it until the next reset; software cannot map it back in
- After boot, cartridge ROM is visible at 0x0000
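A sketch of the overlay logic, using a hypothetical `BootOverlay` type that only covers the ROM region. It treats any nonzero write to 0xFF50 as the disable request, which is an assumption made for simplicity (the boot ROM itself writes 0x01).

```rust
/// Hypothetical boot ROM overlay over the start of cartridge ROM.
pub struct BootOverlay<'a> {
    boot_rom: &'a [u8; 256],
    cart_rom: &'a [u8],
    boot_enabled: bool,
}

impl<'a> BootOverlay<'a> {
    pub fn new(boot_rom: &'a [u8; 256], cart_rom: &'a [u8]) -> Self {
        Self { boot_rom, cart_rom, boot_enabled: true }
    }

    /// Read from the ROM region; the boot ROM shadows 0x0000-0x00FF while enabled.
    pub fn read(&self, addr: u16) -> u8 {
        if self.boot_enabled && addr < 0x0100 {
            self.boot_rom[addr as usize]
        } else {
            // Out-of-range reads return 0xFF in this sketch.
            self.cart_rom.get(addr as usize).copied().unwrap_or(0xFF)
        }
    }

    /// Handle I/O writes; the flag is one-way, so later writes cannot re-enable it.
    pub fn write(&mut self, addr: u16, value: u8) {
        if addr == 0xFF50 && value != 0 {
            self.boot_enabled = false;
        }
    }
}
```

If you skip the boot ROM entirely, start with `boot_enabled` set to false and initialize the registers to their post-boot values instead.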
Extensions
Full PPU Implementation
Add complete PPU with:
- Sprite rendering (OAM)
- Window layer
- Priority handling
- Mode 3 variable length
Audio APU
Implement the 4 audio channels:
- Channel 1: Pulse with sweep
- Channel 2: Pulse
- Channel 3: Wave
- Channel 4: Noise
Link Cable
Implement serial communication:
- Internal clock mode
- External clock mode
- Shift register timing
Save States
Serialize emulator state:
- CPU registers
- All RAM
- PPU state
- Timer state
- MBC state
Game Boy Color Support
Extend for GBC:
- Double-speed mode
- Color palettes
- VRAM banking
- WRAM banking
The Interview Questions
- “How does the LR35902 differ from a standard Z80?”
- Missing IX/IY index registers
- Missing shadow registers
- Missing block transfer instructions
- Has unique SWAP instruction
- Memory-mapped I/O instead of I/O ports
- Simplified interrupt system (5 sources, fixed vectors)
- “Why is the half-carry flag important?”
- Required for DAA instruction
- DAA corrects the result of adding or subtracting two BCD values back into valid BCD (for displaying decimal numbers)
- Half-carry tracks overflow from lower nibble to upper nibble
- Essential for correct addition/subtraction of multi-digit decimals
- “How do you handle the different cycle counts for conditional instructions?”
- Check condition before returning cycles
- If taken: return longer cycle count
- If not taken: return shorter cycle count
- Must track total cycles for PPU/Timer synchronization
- “Explain the memory banking system and why it’s necessary.”
- Game Boy has 16-bit address bus = 64KB maximum
- Games can be up to 8MB
- MBC chips on cartridge swap banks into 0x4000-0x7FFF
- Write to ROM addresses controls bank selection
- Also handles external RAM for save games
- “What is the HALT bug and how do you emulate it?”
- When HALT executes with IME=0 but (IE & IF) != 0, the CPU does not halt and PC fails to increment on the next fetch
- The byte after HALT is read twice, so a one-byte instruction there executes twice
- Must track this edge case for accurate emulation
- Some games rely on this behavior
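The bank-switching answer above can be made concrete with a deliberately reduced MBC1-style mapper. This sketch uses a hypothetical `Mbc1Lite` type and only models the 5-bit ROM bank register written through 0x2000-0x3FFF. The upper bank bits, RAM enable, RAM banking, and the banking mode select are left out, and reads beyond the end of the ROM image return 0xFF rather than wrapping.

```rust
/// Reduced MBC1-style ROM banking (hypothetical, ROM only).
pub struct Mbc1Lite<'a> {
    rom: &'a [u8],
    rom_bank: u8,
}

impl<'a> Mbc1Lite<'a> {
    pub fn new(rom: &'a [u8]) -> Self {
        Self { rom, rom_bank: 1 }
    }

    /// Writes to the ROM address range configure the mapper instead of storing data.
    pub fn write(&mut self, addr: u16, value: u8) {
        if (0x2000..=0x3FFF).contains(&addr) {
            let bank = value & 0x1F;
            // Selecting bank 0 in the switchable window yields bank 1.
            self.rom_bank = if bank == 0 { 1 } else { bank };
        }
    }

    pub fn read(&self, addr: u16) -> u8 {
        let offset = match addr {
            // Fixed window: always bank 0.
            0x0000..=0x3FFF => addr as usize,
            // Switchable window: selected bank, 16 KB per bank.
            0x4000..=0x7FFF => self.rom_bank as usize * 0x4000 + (addr as usize - 0x4000),
            _ => return 0xFF,
        };
        self.rom.get(offset).copied().unwrap_or(0xFF)
    }
}
```

The same shape extends to MBC3 and MBC5 by widening the bank register and adding the RAM side.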
Books That Will Help
| Topic | Book | Chapter/Section |
|---|---|---|
| Game Boy Hardware | “Game Boy Coding Adventure” by Maximilien Dagois | Full book - definitive Game Boy reference |
| CPU Architecture | “Computer Organization and Design” by Patterson & Hennessy | Chapter 4 - The Processor |
| Bit Manipulation | “The Art of Computer Programming” Vol. 4A by Donald Knuth | Combinatorial Algorithms |
| Online Reference | Pan Docs | https://gbdev.io/pandocs/ |
| Z80 Heritage | “Z80 Family CPU User Manual” by Zilog | Reference for understanding LR35902 origins |
| Emulator Development | “The Ultimate Game Boy Talk” by Michael Steil | Video + slides on accurate emulation |
Summary
This project teaches you:
- CPU architecture at the deepest level - you will understand exactly how instructions are encoded, decoded, and executed
- Bit manipulation mastery - from flag calculations to register pairing
- Memory systems - direct access, banking, and memory-mapped I/O
- Interrupt handling - priority, timing, and edge cases
- no_std Rust - building portable, allocation-free libraries
- Hardware synchronization - coordinating CPU, PPU, and timers
When complete, you will have built a working Game Boy emulator core that can run commercial games. More importantly, you will have internalized how computers work at the most fundamental level - knowledge that will inform everything else you build.
The Game Boy’s simplicity makes it an ideal first emulator target, but its quirks (HALT bug, DAA complexity, timing requirements) ensure you learn to pay attention to details that matter.
Welcome to the world of emulation. The hardware is now yours to recreate.