Project 11: The no_std Game Boy Core (CPU Simulation)

“To truly understand how a computer works, build one yourself - even if it’s in software.” - The Art of the Metaobject Protocol


Project Metadata

  • Main Programming Language: Rust
  • Coolness Level: Level 5: Pure Magic (Super Cool)
  • Difficulty: Level 5: Master
  • Knowledge Area: Computer Architecture / Emulation
  • Time Estimate: 1 month+
  • Prerequisites:
    • Strong Rust fundamentals (ownership, traits, lifetimes)
    • Understanding of binary/hexadecimal arithmetic
    • Familiarity with bitwise operations
    • Basic knowledge of computer architecture (registers, memory, fetch-decode-execute)
    • Experience with no_std Rust (from Project 4)

What You Will Build

A CPU core (LR35902) for the Nintendo Game Boy that works in a no_std environment. You will implement the complete register set, the full instruction set (over 500 opcodes including CB-prefixed instructions), and the memory map. Your core will be portable enough to compile to WebAssembly, embedded ARM Cortex-M, or native x86_64.

This is not a toy project. You will build a cycle-accurate CPU emulator that passes Blargg’s test ROMs, the emulation community's standard check that instruction behavior and timing match real hardware.


Learning Objectives

By the end of this project, you will be able to:

  1. Describe the LR35902 CPU architecture including its registers, flag behaviors, and instruction encoding schemes
  2. Implement the fetch-decode-execute cycle with proper cycle counting for timing accuracy
  3. Design a flexible memory bus abstraction using Rust traits for multiple backend support
  4. Master bit manipulation techniques for flag calculation, half-carry detection, and register pairing
  5. Understand memory banking controllers (MBC1, MBC3, MBC5) and cartridge ROM/RAM switching
  6. Implement interrupt handling with proper priority and timing for VBLANK, LCD STAT, Timer, Serial, and Joypad
  7. Write comprehensive tests using Blargg’s test ROMs as the gold standard for correctness

Deep Theoretical Foundation

Before writing a single line of code, you must deeply understand the system you are emulating. This section provides the foundational knowledge you need.

The Game Boy: A System Overview

The Nintendo Game Boy, released in 1989, was a triumph of engineering compromise. Designed by Gunpei Yokoi following his “Lateral Thinking with Withered Technology” philosophy, it used proven, inexpensive components to create a gaming device with extraordinary battery life and durability.

+-----------------------------------------------------------------------+
|                   GAME BOY SYSTEM ARCHITECTURE                         |
+-----------------------------------------------------------------------+

                         +-------------------+
                         |    CARTRIDGE      |
                         |  (Game ROM/RAM)   |
                         |    +MBC Chip      |
                         +--------+----------+
                                  |
                                  | Cartridge Bus
                                  |
+--------+    +---------+    +----+----+    +----------+    +--------+
|        |    |         |    |         |    |          |    |        |
| JOYPAD +----+  CPU    +----+  BUS    +----+   PPU    +----+  LCD   |
|        |    | LR35902 |    | ARBITER |    | (Video)  |    | 160x144|
|        |    |  @4MHz  |    |         |    |          |    |        |
+--------+    +----+----+    +----+----+    +----+-----+    +--------+
                   |              |              |
                   |         +----+----+         |
                   |         |         |         |
                   +---------+  WRAM   +---------+
                   |         | 8KB     |
                   |         +---------+
                   |
              +----+----+
              |         |
              |   APU   +-----> Audio Output (4 Channels)
              | (Audio) |
              |         |
              +---------+

Key Specifications:
- CPU: Sharp LR35902 (custom Z80-like) @ 4.194304 MHz
- RAM: 8 KB Work RAM + 8 KB Video RAM
- Display: 160x144 pixels, 4 shades of green
- Audio: 4 channels (2 pulse, 1 wave, 1 noise)
- Cartridge: Up to 8 MB ROM, 128 KB RAM (with MBC)

The Sharp LR35902:

The CPU is often incorrectly called a “Z80.” It is actually a hybrid of the Intel 8080 and the Zilog Z80, with some instructions from each and some unique to itself. It runs at 4.194304 MHz (often rounded to 4 MHz), which is exactly 2^22 Hz, so the slower timer and divider frequencies can be derived with simple power-of-two dividers.

Book Reference: “Game Boy Coding Adventure” by Maximilien Dagois - Full Book


The LR35902 CPU: A Modified Z80/8080 Hybrid

Understanding the LR35902’s lineage helps you understand its quirks:

+-----------------------------------------------------------------------+
|                    CPU FAMILY TREE                                     |
+-----------------------------------------------------------------------+

Intel 8080 (1974)
    |
    +---> Zilog Z80 (1976)
    |         |
    |         +---> Sharp LR35902 (1989)
    |                    |
    |                    +---> What Game Boy uses
    |
    +---> Intel 8085 (1976)

What LR35902 INHERITED from 8080:
- Basic register set (A, B, C, D, E, H, L)
- Most 8-bit arithmetic instructions
- Register pairing (BC, DE, HL)
- Memory addressing via HL

What LR35902 INHERITED from Z80:
- CB-prefixed rotate, shift, and bit operations (BIT, RES, SET)
- Relative jumps (JR)

What LR35902 REMOVED vs Z80:
- IX and IY index registers
- Shadow registers (A', F', B', C', etc.)
- Block transfer instructions (LDIR, LDDR)
- I/O space (replaced with memory-mapped I/O)
- Many Z80-specific CB/ED prefixed instructions

What LR35902 ADDED uniquely:
- SWAP instruction (swap nibbles)
- STOP instruction (for power saving)
- Auto-increment/decrement loads such as LD (HL+), A and LD (HL-), A
- Specific I/O register behaviors
- Simpler interrupt system

The 4 MHz Clock:

The CPU operates at exactly 4,194,304 Hz. This number is 2^22, making it easy to divide for various timing purposes:

  • 4,194,304 / 154 scanlines / 456 cycles per scanline = ~59.73 frames per second
  • This matches the original Game Boy’s display refresh rate
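The frame-rate arithmetic above can be checked with a few constants. This is a minimal sketch; the constant and function names are illustrative choices for this guide, not part of any existing crate.

```rust
/// Master clock in Hz (2^22).
pub const CLOCK_HZ: u32 = 1 << 22;
/// Clock cycles per scanline, from the figures above.
pub const CYCLES_PER_SCANLINE: u32 = 456;
/// Scanlines per frame (144 visible lines plus 10 lines of vertical blank).
pub const SCANLINES_PER_FRAME: u32 = 154;
/// Clock cycles in one full frame.
pub const CYCLES_PER_FRAME: u32 = CYCLES_PER_SCANLINE * SCANLINES_PER_FRAME;

/// Display refresh rate in Hz, derived from the constants above.
pub fn frame_rate() -> f64 {
    CLOCK_HZ as f64 / CYCLES_PER_FRAME as f64
}
```

Only `core` arithmetic is used, so the same constants should work unchanged in a no_std build.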

Book Reference: “Computer Organization and Design” by Patterson & Hennessy - Chapter 4


The Register Set: Your CPU’s Working Memory

The LR35902 has eight 8-bit registers and several special-purpose registers:

+-----------------------------------------------------------------------+
|                   LR35902 REGISTER LAYOUT                              |
+-----------------------------------------------------------------------+

    8-bit Registers              16-bit Register Pairs
    +-----------+                +-----------+-----------+
    |     A     |  Accumulator   |     A     |     F     |  AF (Accumulator + Flags)
    +-----------+                +-----------+-----------+
    |     F     |  Flags            0x01        0xB0
    +-----------+
                                 +-----------+-----------+
    +-----------+                |     B     |     C     |  BC (General Purpose)
    |     B     |                +-----------+-----------+
    +-----------+                   0x00        0x13
    |     C     |
    +-----------+
                                 +-----------+-----------+
    +-----------+                |     D     |     E     |  DE (General Purpose)
    |     D     |                +-----------+-----------+
    +-----------+                   0x00        0xD8
    |     E     |
    +-----------+
                                 +-----------+-----------+
    +-----------+                |     H     |     L     |  HL (Memory Pointer)
    |     H     |                +-----------+-----------+
    +-----------+                   0x01        0x4D
    |     L     |
    +-----------+

Special Registers:

    +-----------------------+
    |          SP           |  Stack Pointer (16-bit)
    +-----------------------+     Points to top of stack
         0xFFFE                   Grows DOWNWARD in memory

    +-----------------------+
    |          PC           |  Program Counter (16-bit)
    +-----------------------+     Points to next instruction
         0x0100                   Starts at 0x0100 after boot ROM

After Boot ROM Execution, Registers Contain:
    A  = 0x01 (or 0x11 on Game Boy Color)
    F  = 0xB0 (Z=1, N=0, H=1, C=1)
    B  = 0x00
    C  = 0x13
    D  = 0x00
    E  = 0xD8
    H  = 0x01
    L  = 0x4D
    SP = 0xFFFE
    PC = 0x0100

Register Pairing:

The 8-bit registers can be combined into 16-bit pairs:

  • AF: A (high byte) + F (low byte) - Accumulator and Flags
  • BC: B (high byte) + C (low byte) - Counter/General purpose
  • DE: D (high byte) + E (low byte) - Data/General purpose
  • HL: H (high byte) + L (low byte) - Primary memory pointer

When you access HL as a 16-bit value and H=0x01, L=0x4D, then HL=0x014D.
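As a quick illustration of the pairing rule, the two free functions below combine and split bytes. They are a sketch with hypothetical names (`pair`, `split`); a real core would more likely expose methods on its register struct.

```rust
/// Combine a high and a low byte into a 16-bit pair (for example H and L into HL).
pub const fn pair(hi: u8, lo: u8) -> u16 {
    ((hi as u16) << 8) | lo as u16
}

/// Split a 16-bit pair back into its (high, low) bytes.
pub const fn split(value: u16) -> (u8, u8) {
    ((value >> 8) as u8, value as u8)
}
```

With the post-boot values listed earlier, `pair(0x01, 0x4D)` gives HL = 0x014D and `pair(0x01, 0xB0)` gives AF = 0x01B0.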

Rust Implementation Strategy:

/// CPU registers with efficient 8-bit and 16-bit access
pub struct Registers {
    pub a: u8,
    pub f: u8,  // Flags register - only upper 4 bits used
    pub b: u8,
    pub c: u8,
    pub d: u8,
    pub e: u8,
    pub h: u8,
    pub l: u8,
    pub sp: u16,
    pub pc: u16,
}

impl Registers {
    /// Get AF as 16-bit value
    pub fn af(&self) -> u16 {
        (self.a as u16) << 8 | (self.f as u16)
    }

    /// Set AF from 16-bit value
    pub fn set_af(&mut self, value: u16) {
        self.a = (value >> 8) as u8;
        self.f = (value & 0xF0) as u8;  // Lower 4 bits always 0
    }

    /// Get HL as 16-bit value
    pub fn hl(&self) -> u16 {
        (self.h as u16) << 8 | (self.l as u16)
    }

    /// Set HL from 16-bit value
    pub fn set_hl(&mut self, value: u16) {
        self.h = (value >> 8) as u8;
        self.l = value as u8;
    }

    // Similar for BC and DE...
}

The Flag Register: Z, N, H, C

The F register contains four flags that record information about the results of arithmetic and logic operations:

+-----------------------------------------------------------------------+
|                   FLAG REGISTER (F) LAYOUT                             |
+-----------------------------------------------------------------------+

  Bit:   7    6    5    4    3    2    1    0
       +----+----+----+----+----+----+----+----+
   F = | Z  | N  | H  | C  | 0  | 0  | 0  | 0  |
       +----+----+----+----+----+----+----+----+
         |    |    |    |
         |    |    |    +-- Carry Flag
         |    |    +------- Half-Carry Flag (BCD)
         |    +------------ Subtract Flag (BCD)
         +----------------- Zero Flag

IMPORTANT: Bits 3-0 are ALWAYS 0. Writing to them has no effect.

FLAG BEHAVIORS:

Z (Zero) Flag - Bit 7
    Set (1): Result of operation is zero
    Reset (0): Result is non-zero

    Example:
        ADD A, B  where A=5, B=3  -> A=8, Z=0 (result not zero)
        SUB A, B  where A=5, B=5  -> A=0, Z=1 (result is zero)

N (Subtract) Flag - Bit 6
    Set (1): Last operation was a subtraction
    Reset (0): Last operation was an addition

    Used by: DAA instruction for BCD correction

    Example:
        ADD A, B  -> N=0
        SUB A, B  -> N=1

H (Half-Carry) Flag - Bit 5
    Set (1): Carry occurred from bit 3 to bit 4
    Reset (0): No half-carry occurred

    This is the TRICKIEST flag to implement correctly!

    Example (8-bit addition):
        A = 0x0F (0000 1111)
        B = 0x01 (0000 0001)
        --------------------
        R = 0x10 (0001 0000)  <- Carry from bit 3 to bit 4, H=1

    Formula for ADD: H = ((A & 0xF) + (B & 0xF)) > 0xF
    Formula for SUB: H = (A & 0xF) < (B & 0xF)

C (Carry) Flag - Bit 4
    Set (1): Carry occurred from bit 7 (overflow/underflow)
    Reset (0): No carry occurred

    Example (8-bit addition):
        A = 0xFF (255)
        B = 0x01 (1)
        ----------------
        R = 0x00, C=1 (256 overflowed to 0)

    Formula for ADD: C = (A as u16 + B as u16) > 0xFF
    Formula for SUB: C = A < B

The Half-Carry Problem:

Half-carry is the flag that emulator authors most often get wrong. It matters because the DAA (Decimal Adjust Accumulator) instruction reads it, together with N and C, to correct a binary result into BCD (Binary Coded Decimal), which games use for displaying numbers.
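To show why DAA depends on H, here is a sketch of the adjustment as it is commonly described for this CPU. The function name and the tuple return type are assumptions of this example, and you should still validate your own version against the test ROMs.

```rust
/// Decimal-adjust `a` after a BCD addition or subtraction.
/// Takes the current N, H and C flags and returns (adjusted A, new carry).
/// The caller sets Z from the result, leaves N unchanged, and clears H.
pub fn daa(a: u8, n: bool, h: bool, c: bool) -> (u8, bool) {
    let mut adjust = 0u8;
    let mut carry = c;
    if n {
        // After a subtraction only the flags decide the correction.
        if h {
            adjust |= 0x06;
        }
        if c {
            adjust |= 0x60;
        }
        (a.wrapping_sub(adjust), carry)
    } else {
        // After an addition, also fix digits that ended up above 9.
        if h || (a & 0x0F) > 0x09 {
            adjust |= 0x06;
        }
        if c || a > 0x99 {
            adjust |= 0x60;
            carry = true;
        }
        (a.wrapping_add(adjust), carry)
    }
}
```

For example, BCD 15 + 27 is computed as 0x15 + 0x27 = 0x3C with no flags set, and `daa(0x3C, false, false, false)` corrects it to 0x42. If H were computed incorrectly in the preceding ADD, the correction would be off by 6.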

HALF-CARRY CALCULATION EXAMPLES:

8-bit Addition (ADD A, B):
    A = 0x3C, B = 0x2F

    Binary: 0011 1100 + 0010 1111

    Lower nibble: 0xC + 0xF = 0x1B (> 0xF, so H=1)
    Full result: 0x6B

    Calculation: H = ((0x3C & 0x0F) + (0x2F & 0x0F)) > 0x0F
                 H = (0x0C + 0x0F) > 0x0F
                 H = 0x1B > 0x0F
                 H = true

16-bit Addition (ADD HL, BC):
    Half-carry is calculated from bit 11 to bit 12 (not bit 3 to 4)!

    HL = 0x0FFF, BC = 0x0001

    H = ((HL & 0x0FFF) + (BC & 0x0FFF)) > 0x0FFF
    H = (0x0FFF + 0x0001) > 0x0FFF
    H = 0x1000 > 0x0FFF
    H = true

8-bit Subtraction (SUB A, B):
    A = 0x3C, B = 0x2F

    Lower nibble: 0xC - 0xF = -3 (borrow needed, H=1)

    Calculation: H = (A & 0x0F) < (B & 0x0F)
                 H = 0x0C < 0x0F
                 H = true

Rust Implementation:

impl Registers {
    /// Set the Zero flag
    pub fn set_z(&mut self, value: bool) {
        if value {
            self.f |= 0x80;  // Set bit 7
        } else {
            self.f &= !0x80; // Clear bit 7
        }
    }

    /// Get the Zero flag
    pub fn z(&self) -> bool {
        (self.f & 0x80) != 0
    }

    /// Set the Subtract flag
    pub fn set_n(&mut self, value: bool) {
        if value {
            self.f |= 0x40;
        } else {
            self.f &= !0x40;
        }
    }

    /// Set the Half-carry flag
    pub fn set_h(&mut self, value: bool) {
        if value {
            self.f |= 0x20;
        } else {
            self.f &= !0x20;
        }
    }

    /// Set the Carry flag
    pub fn set_c(&mut self, value: bool) {
        if value {
            self.f |= 0x10;
        } else {
            self.f &= !0x10;
        }
    }

    /// Set all flags at once (common pattern)
    pub fn set_flags(&mut self, z: bool, n: bool, h: bool, c: bool) {
        self.f = 0;
        if z { self.f |= 0x80; }
        if n { self.f |= 0x40; }
        if h { self.f |= 0x20; }
        if c { self.f |= 0x10; }
    }
}
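The flag formulas from this section can be combined into small pure functions that are easy to unit test before wiring them into the CPU. This is a sketch: the `FLAG_*` constants and the function names are illustrative, and each function returns the new F byte instead of mutating a register struct.

```rust
pub const FLAG_Z: u8 = 0x80;
pub const FLAG_N: u8 = 0x40;
pub const FLAG_H: u8 = 0x20;
pub const FLAG_C: u8 = 0x10;

/// 8-bit ADD: returns (result, new F register).
pub fn add8(a: u8, b: u8) -> (u8, u8) {
    let (result, carry) = a.overflowing_add(b);
    let mut f = 0u8;
    if result == 0 {
        f |= FLAG_Z;
    }
    if (a & 0x0F) + (b & 0x0F) > 0x0F {
        f |= FLAG_H;
    }
    if carry {
        f |= FLAG_C;
    }
    (result, f)
}

/// 8-bit SUB: returns (result, new F register). N is always set.
pub fn sub8(a: u8, b: u8) -> (u8, u8) {
    let result = a.wrapping_sub(b);
    let mut f = FLAG_N;
    if result == 0 {
        f |= FLAG_Z;
    }
    if (a & 0x0F) < (b & 0x0F) {
        f |= FLAG_H;
    }
    if a < b {
        f |= FLAG_C;
    }
    (result, f)
}

/// ADD HL, rr: returns (result, new F register).
/// Z is kept from `old_f`, N is cleared, H comes from bit 11, C from bit 15.
pub fn add_hl(hl: u16, rr: u16, old_f: u8) -> (u16, u8) {
    let (result, carry) = hl.overflowing_add(rr);
    let mut f = old_f & FLAG_Z;
    if (hl & 0x0FFF) + (rr & 0x0FFF) > 0x0FFF {
        f |= FLAG_H;
    }
    if carry {
        f |= FLAG_C;
    }
    (result, f)
}
```

The worked examples above map directly onto calls: `add8(0x3C, 0x2F)` yields 0x6B with only H set, and `sub8(0x3C, 0x2F)` yields 0x0D with N and H set.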

Book Reference: “The Art of Computer Programming” Vol. 4 by Donald Knuth - Bit manipulation techniques


The Memory Map: Addressing 64KB

The Game Boy has a 16-bit address bus, allowing it to address 65,536 bytes (64 KB) of memory. Different regions of this address space are mapped to different hardware:

+-----------------------------------------------------------------------+
|                   GAME BOY MEMORY MAP                                  |
+-----------------------------------------------------------------------+

0xFFFF  +-----------------------+
        | IE (Interrupt Enable) | 1 byte - Per-interrupt enable bits
0xFFFF  +-----------------------+
        |                       |
        |   High RAM (HRAM)     | 127 bytes - Fast internal RAM
        |     (Zero Page)       | Used for stack and quick access variables
0xFF80  +-----------------------+
        |                       |
        |   I/O Registers       | 128 bytes - Hardware control registers
        |   (Joypad, Serial,    | 0xFF00: Joypad
        |    Timer, Audio,      | 0xFF04-0xFF07: Timer registers
        |    PPU controls)      | 0xFF10-0xFF3F: Audio registers
        |                       | 0xFF40-0xFF4B: LCD/PPU registers
0xFF00  +-----------------------+
        |       Unused          | 96 bytes - Prohibited area (reads vary by model)
0xFEA0  +-----------------------+
        |                       |
        |   OAM (Sprite Data)   | 160 bytes - Object Attribute Memory
        |   40 sprites x 4      | Each sprite: Y, X, Tile, Attributes
        |   bytes each          |
0xFE00  +-----------------------+
        |                       |
        |   Echo RAM            | 7680 bytes - Mirror of 0xC000-0xDDFF
        |   (Avoid using)       | Reading/writing here affects WRAM
        |                       |
0xE000  +-----------------------+
        |                       |
        |   Work RAM (WRAM)     | 8192 bytes - General purpose RAM
        |   Bank 0: 0xC000-CFFF | Always accessible
        |   Bank 1: 0xD000-DFFF | Switchable on GBC
        |                       |
0xC000  +-----------------------+
        |                       |
        |   Cartridge RAM       | 8192 bytes (if present)
        |   (External/Battery)  | Depends on MBC type
        |                       | Often battery-backed for saves
0xA000  +-----------------------+
        |                       |
        |   Video RAM (VRAM)    | 8192 bytes - Tile data and maps
        |   Tile Data:          | 0x8000-0x97FF: Tile patterns
        |   0x8000-0x97FF       | 0x9800-0x9BFF: BG Map 0
        |   Tile Maps:          | 0x9C00-0x9FFF: BG Map 1
        |   0x9800-0x9FFF       |
        |                       |
0x8000  +-----------------------+
        |                       |
        |   Cartridge ROM       | 16384 bytes - Switchable Bank
        |   Bank 1-N            | Which bank depends on MBC registers
        |   (Switchable)        | Bank 0 is always the low bank
        |                       |
0x4000  +-----------------------+
        |                       |
        |   Cartridge ROM       | 16384 bytes - Fixed Bank 0
        |   Bank 0              | Always contains start of game
        |   (Fixed)             | Interrupt vectors at 0x0040, 0x0048, etc.
        |                       |
0x0000  +-----------------------+

BOOT ROM OVERLAY (0x0000-0x00FF):
    When the Game Boy first powers on, addresses 0x0000-0x00FF
    are mapped to the internal 256-byte Boot ROM.
    Writing 0x01 to 0xFF50 unmaps the Boot ROM until the next reset.
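One way to turn the memory map into code is a single `match` over address ranges. The sketch below uses a hypothetical `Region` enum and ignores the Boot ROM overlay; a real bus would dispatch reads and writes to the owning component instead of returning a label.

```rust
/// Which part of the memory map an address belongs to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Region {
    Rom0,
    RomX,
    Vram,
    CartRam,
    Wram,
    Echo,
    Oam,
    Unused,
    Io,
    Hram,
    Ie,
}

/// Classify a 16-bit address. The ranges cover every u16 value, so the
/// compiler checks that no address is forgotten.
pub fn region(addr: u16) -> Region {
    match addr {
        0x0000..=0x3FFF => Region::Rom0,
        0x4000..=0x7FFF => Region::RomX,
        0x8000..=0x9FFF => Region::Vram,
        0xA000..=0xBFFF => Region::CartRam,
        0xC000..=0xDFFF => Region::Wram,
        0xE000..=0xFDFF => Region::Echo,
        0xFE00..=0xFE9F => Region::Oam,
        0xFEA0..=0xFEFF => Region::Unused,
        0xFF00..=0xFF7F => Region::Io,
        0xFF80..=0xFFFE => Region::Hram,
        0xFFFF => Region::Ie,
    }
}

/// Map an Echo RAM address (0xE000-0xFDFF) onto the WRAM address it mirrors.
/// Callers must pass an address inside the Echo range.
pub fn echo_to_wram(addr: u16) -> u16 {
    addr - 0x2000
}
```

Because the match is exhaustive without a wildcard arm, removing or mistyping a range becomes a compile error rather than a silent bug.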

Key I/O Registers:

+-----------------------------------------------------------------------+
|                   KEY I/O REGISTER ADDRESSES                           |
+-----------------------------------------------------------------------+

0xFF00 - JOYP (Joypad)
    Bits 5-4: Select button group (write)
    Bits 3-0: Button states (read)

0xFF04 - DIV (Divider Register)
    Increments at 16384 Hz. Writing any value resets to 0.

0xFF05 - TIMA (Timer Counter)
    Incremented at rate specified by TAC. Triggers interrupt on overflow.

0xFF06 - TMA (Timer Modulo)
    Value loaded into TIMA when it overflows.

0xFF07 - TAC (Timer Control)
    Bit 2: Timer enable
    Bits 1-0: Clock select (4096/262144/65536/16384 Hz)

0xFF0F - IF (Interrupt Flag)
    Bit 4: Joypad interrupt requested
    Bit 3: Serial interrupt requested
    Bit 2: Timer interrupt requested
    Bit 1: LCD STAT interrupt requested
    Bit 0: VBlank interrupt requested

0xFF40 - LCDC (LCD Control)
    Bit 7: LCD enable
    Bit 6: Window tile map select
    Bit 5: Window enable
    Bit 4: BG & Window tile data select
    Bit 3: BG tile map select
    Bit 2: OBJ (sprite) size (8x8 or 8x16)
    Bit 1: OBJ enable
    Bit 0: BG & Window enable

0xFF41 - STAT (LCD Status)
    Bits 6-3: Interrupt sources
    Bits 1-0: Mode (0-3)

0xFF44 - LY (Current Scanline)
    Current scanline being drawn (0-153)

0xFF50 - Boot ROM Disable
    Write 0x01 to unmap Boot ROM from 0x0000-0x00FF

0xFFFF - IE (Interrupt Enable)
    Same format as IF, but controls which interrupts are enabled
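Two of these registers translate into very small helpers. The sketch below picks the highest-priority pending interrupt from IE and IF (the lowest set bit wins, and the handlers sit 8 bytes apart starting at 0x0040) and converts a TAC value into the number of clock cycles between TIMA increments. The function names are hypothetical.

```rust
/// Return (bit index, handler address) of the highest-priority interrupt
/// that is both requested in IF and enabled in IE, or None if there is none.
/// Bit 0 (VBlank) has the highest priority, bit 4 (Joypad) the lowest.
pub fn pending_interrupt(ie: u8, iflag: u8) -> Option<(u8, u16)> {
    let pending = ie & iflag & 0x1F;
    if pending == 0 {
        return None;
    }
    let bit = pending.trailing_zeros() as u8;
    Some((bit, 0x0040 + 8 * bit as u16))
}

/// Clock cycles (at 4,194,304 Hz) between TIMA increments for a TAC value,
/// or None when the timer enable bit (bit 2) is clear.
pub fn tima_period(tac: u8) -> Option<u32> {
    if (tac & 0x04) == 0 {
        return None;
    }
    Some(match tac & 0x03 {
        0 => 1024, // 4096 Hz
        1 => 16,   // 262144 Hz
        2 => 64,   // 65536 Hz
        _ => 256,  // 16384 Hz
    })
}
```

For instance, with all interrupts enabled and IF = 0x05 (VBlank and Timer requested), `pending_interrupt(0x1F, 0x05)` selects VBlank at 0x0040 first.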

Rust Memory Bus Trait:

/// Memory bus abstraction for different backends
pub trait MemoryBus {
    /// Read a byte from the given address
    fn read(&self, addr: u16) -> u8;

    /// Write a byte to the given address
    fn write(&mut self, addr: u16, value: u8);

    /// Read a 16-bit word (little-endian)
    fn read_word(&self, addr: u16) -> u16 {
        let lo = self.read(addr) as u16;
        let hi = self.read(addr.wrapping_add(1)) as u16;
        (hi << 8) | lo
    }

    /// Write a 16-bit word (little-endian)
    fn write_word(&mut self, addr: u16, value: u16) {
        self.write(addr, value as u8);
        self.write(addr.wrapping_add(1), (value >> 8) as u8);
    }
}
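For unit tests it helps to have the simplest possible backend. The sketch below is a hypothetical `FlatRam` type: 64 KB of plain RAM with no banking and no I/O side effects. The trait is repeated in shortened form only so the snippet compiles on its own.

```rust
/// Shortened copy of the trait above, repeated so this snippet is self-contained.
pub trait MemoryBus {
    fn read(&self, addr: u16) -> u8;
    fn write(&mut self, addr: u16, value: u8);

    /// Read a 16-bit word (little-endian).
    fn read_word(&self, addr: u16) -> u16 {
        let lo = self.read(addr) as u16;
        let hi = self.read(addr.wrapping_add(1)) as u16;
        (hi << 8) | lo
    }
}

/// Hypothetical test backend: every address is ordinary RAM.
pub struct FlatRam {
    bytes: [u8; 0x10000],
}

impl FlatRam {
    /// Create a zero-filled 64 KB address space.
    pub const fn new() -> Self {
        FlatRam { bytes: [0; 0x10000] }
    }
}

impl MemoryBus for FlatRam {
    fn read(&self, addr: u16) -> u8 {
        self.bytes[addr as usize]
    }

    fn write(&mut self, addr: u16, value: u8) {
        self.bytes[addr as usize] = value;
    }
}
```

A fixed-size array keeps the type usable without an allocator, which suits a no_std core; on very small targets you may prefer a smaller backing store.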

Instruction Encoding: Reading the Opcode Tables

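The LR35902 has 256 base opcode slots (0x00-0xFF) plus 256 CB-prefixed opcodes (0xCB 0x00 - 0xCB 0xFF), for a total of 512 encodings. Eleven base slots (0xD3, 0xDB, 0xDD, 0xE3, 0xE4, 0xEB, 0xEC, 0xED, 0xF4, 0xFC, 0xFD) are unused and 0xCB itself is only a prefix, so roughly 500 of the encodings are real instructions.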

+-----------------------------------------------------------------------+
|                   OPCODE ENCODING STRUCTURE                            |
+-----------------------------------------------------------------------+

OPCODE FORMAT (8 bits):
    Many opcodes follow regular patterns based on bit positions:

    +---+---+---+---+---+---+---+---+
    | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
    +---+---+---+---+---+---+---+---+
      |   |   |   |   |   |   |   |
      +---+   +---+---+   +---+---+
        |         |           |
        |         |           +-- Source register (for many ops)
        |         +-------------- Destination register (for many ops)
        +------------------------ Operation type

REGISTER ENCODING (3 bits):
    000 = B        100 = H
    001 = C        101 = L
    010 = D        110 = (HL) - memory at address HL
    011 = E        111 = A

8-BIT LOAD PATTERNS:
    LD r, r' (Load register to register)
    01 DDD SSS
       ^^^ ^^^
       |   +-- Source register
       +------ Destination register

    Examples:
        LD B, C = 01 000 001 = 0x41
        LD A, B = 01 111 000 = 0x78
        LD L, H = 01 101 100 = 0x6C
        LD A, (HL) = 01 111 110 = 0x7E

ARITHMETIC PATTERNS:
    ADD A, r (Add register to A)
    10 000 SSS

    ADC A, r (Add with carry)
    10 001 SSS

    SUB r (Subtract register from A)
    10 010 SSS

    SBC A, r (Subtract with carry)
    10 011 SSS

    AND r
    10 100 SSS

    XOR r
    10 101 SSS

    OR r
    10 110 SSS

    CP r (Compare)
    10 111 SSS

CB-PREFIXED INSTRUCTIONS:
    The 0xCB prefix indicates a bit manipulation instruction.

    After reading 0xCB, read another byte:

    RLC r (Rotate Left Circular - bit 7 goes to Carry and to bit 0)
    00 000 SSS

    RRC r (Rotate Right Circular - bit 0 goes to Carry and to bit 7)
    00 001 SSS

    RL r (Rotate Left through Carry)
    00 010 SSS

    RR r (Rotate Right through Carry)
    00 011 SSS

    SLA r (Shift Left Arithmetic)
    00 100 SSS

    SRA r (Shift Right Arithmetic - preserve sign)
    00 101 SSS

    SWAP r (Swap nibbles)
    00 110 SSS

    SRL r (Shift Right Logical)
    00 111 SSS

    BIT n, r (Test bit n)
    01 NNN SSS
       ^^^ ^^^
       |   +-- Register
       +------ Bit number (0-7)

    RES n, r (Reset bit n)
    10 NNN SSS

    SET n, r (Set bit n)
    11 NNN SSS
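The bit patterns above can be decoded arithmetically instead of with one match arm per opcode. This sketch, with hypothetical `Operand` and `CbOp` types, decodes the `LD r, r'` block and the whole CB table from the three bit fields. Note that 0x76 falls inside the LD block but is the HALT instruction.

```rust
/// The eight operands selected by a 3-bit register field.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Operand { B, C, D, E, H, L, HlIndirect, A }

/// Decode a 3-bit register field (only the low 3 bits of `bits` are used).
pub fn operand(bits: u8) -> Operand {
    match bits & 0x07 {
        0 => Operand::B,
        1 => Operand::C,
        2 => Operand::D,
        3 => Operand::E,
        4 => Operand::H,
        5 => Operand::L,
        6 => Operand::HlIndirect,
        _ => Operand::A,
    }
}

/// Decode `LD r, r'` (01 DDD SSS) into (destination, source).
/// Returns None for opcodes outside that block and for 0x76 (HALT).
pub fn decode_ld(opcode: u8) -> Option<(Operand, Operand)> {
    if opcode >> 6 != 0b01 || opcode == 0x76 {
        return None;
    }
    Some((operand(opcode >> 3), operand(opcode)))
}

/// Operations available behind the 0xCB prefix.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CbOp { Rlc, Rrc, Rl, Rr, Sla, Sra, Swap, Srl, Bit(u8), Res(u8), Set(u8) }

/// Decode the byte that follows 0xCB into (operation, operand).
pub fn decode_cb(opcode: u8) -> (CbOp, Operand) {
    let y = (opcode >> 3) & 0x07; // middle field: sub-operation or bit number
    let op = match opcode >> 6 {
        0 => match y {
            0 => CbOp::Rlc,
            1 => CbOp::Rrc,
            2 => CbOp::Rl,
            3 => CbOp::Rr,
            4 => CbOp::Sla,
            5 => CbOp::Sra,
            6 => CbOp::Swap,
            _ => CbOp::Srl,
        },
        1 => CbOp::Bit(y),
        2 => CbOp::Res(y),
        _ => CbOp::Set(y),
    };
    (op, operand(opcode))
}
```

Decoding by field keeps the CB table to a handful of arms; whether that is faster than a flat 256-arm match depends on the compiler and target, so measure before committing to either style.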

Opcode Table Extract (Base Instructions):

+-----------------------------------------------------------------------+
|                   OPCODE TABLE (PARTIAL)                               |
+-----------------------------------------------------------------------+

     | x0   x1   x2   x3   x4   x5   x6   x7   x8   x9   xA   xB   xC   xD   xE   xF
-----+--------------------------------------------------------------------------------
0x   | NOP  LD   LD   INC  INC  DEC  LD   RLCA LD   ADD  LD   DEC  INC  DEC  LD   RRCA
     |      BC   (BC) BC   B    B    B,d8      (a16)HL   A    BC   C    C    C,d8
     |      d16  A                             SP   BC   (BC)
-----+--------------------------------------------------------------------------------
1x   | STOP LD   LD   INC  INC  DEC  LD   RLA  JR   ADD  LD   DEC  INC  DEC  LD   RRA
     |      DE   (DE) DE   D    D    D,d8      r8   HL   A    DE   E    E    E,d8
     |      d16  A                                  DE   (DE)
-----+--------------------------------------------------------------------------------
2x   | JR   LD   LD   INC  INC  DEC  LD   DAA  JR   ADD  LD   DEC  INC  DEC  LD   CPL
     | NZ   HL   (HL+)HL   H    H    H,d8      Z    HL   A    HL   L    L    L,d8
     | r8   d16  A                             r8   HL   (HL+)
-----+--------------------------------------------------------------------------------
3x   | JR   LD   LD   INC  INC  DEC  LD   SCF  JR   ADD  LD   DEC  INC  DEC  LD   CCF
     | NC   SP   (HL-)SP   (HL) (HL) (HL)      C    HL   A    SP   A    A    A,d8
     | r8   d16  A                   d8        r8   SP   (HL-)
-----+--------------------------------------------------------------------------------
4x   | LD   LD   LD   LD   LD   LD   LD   LD   LD   LD   LD   LD   LD   LD   LD   LD
     | B,B  B,C  B,D  B,E  B,H  B,L  B    B,A  C,B  C,C  C,D  C,E  C,H  C,L  C    C,A
     |                               (HL)                                    (HL)
-----+--------------------------------------------------------------------------------
...
-----+--------------------------------------------------------------------------------
Cx   | RET  POP  JP   JP   CALL PUSH ADD  RST  RET  RET  JP   CB   CALL CALL ADC  RST
     | NZ   BC   NZ   a16  NZ   BC   A    00   Z         Z         Z    a16  A    08
     |           a16       a16       d8                  a16  pfx  a16       d8
-----+--------------------------------------------------------------------------------

Legend:
    d8  = immediate 8-bit data
    d16 = immediate 16-bit data
    a8  = 8-bit unsigned offset (0xFF00 + a8)
    a16 = 16-bit address
    r8  = 8-bit signed offset (-128 to +127)
    CB  = prefix for bit operations

The Fetch-Decode-Execute Cycle

Nearly every conventional CPU follows this fundamental loop:

+-----------------------------------------------------------------------+
|                   FETCH-DECODE-EXECUTE CYCLE                           |
+-----------------------------------------------------------------------+

                    +----------+
                    |   FETCH  |
                    +----+-----+
                         |
    Read opcode from     |     PC = PC + 1
    memory[PC]           |
                         v
                    +----+-----+
                    |  DECODE  |
                    +----+-----+
                         |
    Determine which      |     May read additional bytes
    instruction to       |     for operands (d8, d16, r8)
    execute              |     PC = PC + operand_size
                         v
                    +----+-----+
                    |  EXECUTE |
                    +----+-----+
                         |
    Perform the          |     Update registers
    operation            |     Update flags
                         |     Access memory if needed
                         |
                         v
              +----------+----------+
              | ADD CYCLES TO TOTAL |
              +----------+----------+
                         |
                         |
                         v
              +----------+----------+
              |   CHECK INTERRUPTS  |
              +----------+----------+
                         |
                         +------> Back to FETCH

TIMING ACCURACY:

Each instruction takes a specific number of "machine cycles" (M-cycles).
Each M-cycle is 4 clock cycles (T-states).

For example:
    NOP         = 1 M-cycle  = 4 T-states
    LD B, C     = 1 M-cycle  = 4 T-states
    LD A, (HL)  = 2 M-cycles = 8 T-states (1 for opcode, 1 for memory read)
    PUSH BC     = 4 M-cycles = 16 T-states (1 + 1 + 2 for stack writes)
    CALL a16    = 6 M-cycles = 24 T-states (1 + 2 + 1 + 2 for stack + PC)

Some conditional instructions have DIFFERENT cycle counts:
    JR NZ, r8:
        If NOT taken: 2 M-cycles (8 T-states)
        If taken:     3 M-cycles (12 T-states)

    CALL NZ, a16:
        If NOT taken: 3 M-cycles (12 T-states)
        If taken:     6 M-cycles (24 T-states)
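The branch-dependent timing can be expressed by returning the cycle count together with the new PC. The sketch below handles a conditional relative jump; the function name and signature are assumptions of this example, and the condition is evaluated by the caller.

```rust
/// Execute a conditional relative jump (JR cc, r8).
/// `pc` is the address just past the 2-byte instruction, `offset` is the raw
/// operand byte, and `taken` is the already-evaluated condition.
/// Returns (new PC, T-states consumed).
pub fn jr_cond(pc: u16, offset: u8, taken: bool) -> (u16, u8) {
    if taken {
        // Reinterpret the byte as signed, then sign-extend it to 16 bits.
        let delta = offset as i8 as i16 as u16;
        (pc.wrapping_add(delta), 12)
    } else {
        (pc, 8)
    }
}
```

A `JR -2` at 0x0100 has operand 0xFE and jumps back to itself: `jr_cond(0x0102, 0xFE, true)` returns PC 0x0100 and 12 T-states.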

Rust Implementation:

pub struct Cpu<M: MemoryBus> {
    pub regs: Registers,
    pub bus: M,
    pub halted: bool,
    pub ime: bool,      // Interrupt Master Enable
    pub ime_pending: bool, // EI enables IME after next instruction
    cycles: u64,        // Total cycles executed
}

impl<M: MemoryBus> Cpu<M> {
    /// Execute one instruction and return the number of cycles it took
    pub fn step(&mut self) -> u8 {
        // Handle pending IME enable
        if self.ime_pending {
            self.ime = true;
            self.ime_pending = false;
        }

        // Handle halted state
        if self.halted {
            // Still consume 4 cycles while halted, and count them in the total
            self.cycles += 4;
            return 4;
        }

        // FETCH
        let opcode = self.fetch_byte();

        // DECODE & EXECUTE
        let cycles = self.execute(opcode);

        // Update total cycle count
        self.cycles += cycles as u64;

        cycles
    }

    /// Fetch a byte and increment PC
    fn fetch_byte(&mut self) -> u8 {
        let byte = self.bus.read(self.regs.pc);
        self.regs.pc = self.regs.pc.wrapping_add(1);
        byte
    }

    /// Fetch a 16-bit word and increment PC by 2
    fn fetch_word(&mut self) -> u16 {
        let lo = self.fetch_byte() as u16;
        let hi = self.fetch_byte() as u16;
        (hi << 8) | lo
    }

    /// Execute an instruction and return cycles
    fn execute(&mut self, opcode: u8) -> u8 {
        match opcode {
            0x00 => { /* NOP */ 4 }
            0x01 => { /* LD BC, d16 */
                let value = self.fetch_word();
                self.regs.set_bc(value);
                12
            }
            // ... hundreds more opcodes ...
            0xCB => {
                let cb_opcode = self.fetch_byte();
                self.execute_cb(cb_opcode)
            }
            _ => panic!("Unknown opcode: 0x{:02X}", opcode),
        }
    }

    fn execute_cb(&mut self, opcode: u8) -> u8 {
        match opcode {
            0x00 => { /* RLC B */ self.rlc_r8(Reg8::B); 8 }
            // ... 255 more CB opcodes ...
            _ => panic!("Unknown CB opcode: 0x{:02X}", opcode),
        }
    }
}
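To see the loop run end to end before the full core exists, the following sketch implements a hypothetical `MiniCpu` with just three opcodes and its own 64 KB memory. It is deliberately not generic over `MemoryBus` so that it compiles on its own, and it returns `None` for unimplemented opcodes instead of panicking.

```rust
/// Hypothetical miniature CPU used only for this illustration.
pub struct MiniCpu {
    pub pc: u16,
    pub b: u8,
    pub c: u8,
    pub cycles: u64,
    mem: [u8; 0x10000],
}

impl MiniCpu {
    /// Copy `program` to `origin` and point PC at it.
    /// The program must fit between `origin` and the end of memory.
    pub fn new(program: &[u8], origin: u16) -> Self {
        let mut mem = [0u8; 0x10000];
        let start = origin as usize;
        mem[start..start + program.len()].copy_from_slice(program);
        MiniCpu { pc: origin, b: 0, c: 0, cycles: 0, mem }
    }

    /// FETCH: read the byte at PC, then advance PC.
    fn fetch_byte(&mut self) -> u8 {
        let byte = self.mem[self.pc as usize];
        self.pc = self.pc.wrapping_add(1);
        byte
    }

    /// Run one instruction. Returns its T-states, or None for an opcode
    /// this sketch does not implement.
    pub fn step(&mut self) -> Option<u8> {
        let opcode = self.fetch_byte();
        let t_states = match opcode {
            0x00 => 4, // NOP
            0x01 => {
                // LD BC, d16: the operand is little-endian, low byte first
                self.c = self.fetch_byte();
                self.b = self.fetch_byte();
                12
            }
            0x41 => {
                // LD B, C
                self.b = self.c;
                4
            }
            _ => return None,
        };
        self.cycles += t_states as u64;
        Some(t_states)
    }
}
```

Running the bytes `00 01 34 12 41` from 0x0100 executes NOP, LD BC, 0x1234 and LD B, C for a total of 20 T-states, leaving PC at 0x0105.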

Timing and Cycle Accuracy

Cycle accuracy matters because the Game Boy’s hardware is tightly synchronized:

+-----------------------------------------------------------------------+
|                   TIMING RELATIONSHIPS                                 |
+-----------------------------------------------------------------------+

CPU Clock:          4,194,304 Hz
                           |
                           | / 4
                           v
Machine Cycles:     1,048,576 Hz
                           |
                           |
         +-----------------+-----------------+
         |                 |                 |
         v                 v                 v
    PPU Timing       Timer Timing      Audio Timing
         |                 |                 |
         v                 v                 v
    456 cycles        Variable         Sample rate
    per scanline      based on TAC     depends on
    154 scanlines                      channel
    per frame
         |
         v
    70,224 cycles per frame = 59.73 FPS


PPU MODE TIMING (per scanline):

    Mode 2 (OAM Scan):    80 cycles   - PPU reading OAM
    Mode 3 (Drawing):     172+ cycles - PPU reading VRAM, rendering
    Mode 0 (HBlank):      204- cycles - Horizontal blank
                          -----------
                          456 cycles total

    After 144 visible scanlines:
    Mode 1 (VBlank):      4560 cycles (10 scanlines x 456 cycles)

                          -----------
    Total per frame:      70,224 cycles

WHY ACCURACY MATTERS:

Many games rely on precise timing for:
1. Raster effects (changing graphics mid-frame)
2. STAT interrupt tricks
3. Audio synchronization
4. Copy protection

If your emulator runs the CPU too fast relative to the PPU,
graphics will glitch and games may break.
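A common way to keep the CPU and PPU in step is to feed the cycle count returned by each CPU step into a scanline counter. The sketch below uses a hypothetical `FrameClock` type and assumes the minimum Mode 3 length of 172 cycles; a real PPU extends Mode 3 depending on sprites and scrolling.

```rust
pub const CYCLES_PER_LINE: u16 = 456;
pub const LINES_PER_FRAME: u8 = 154;
pub const VISIBLE_LINES: u8 = 144;

/// Tracks the position of the PPU within a frame.
#[derive(Default)]
pub struct FrameClock {
    /// Cycle position within the current scanline (0..456).
    pub dot: u16,
    /// Current scanline, the value of LY (0..154).
    pub line: u8,
}

impl FrameClock {
    /// Advance by the cycles one CPU instruction took.
    /// Returns true when this call entered VBlank (reached line 144).
    pub fn advance(&mut self, t_states: u8) -> bool {
        self.dot += t_states as u16;
        if self.dot < CYCLES_PER_LINE {
            return false;
        }
        self.dot -= CYCLES_PER_LINE;
        self.line = (self.line + 1) % LINES_PER_FRAME;
        self.line == VISIBLE_LINES
    }

    /// Current PPU mode, assuming Mode 3 always lasts exactly 172 cycles.
    pub fn mode(&self) -> u8 {
        if self.line >= VISIBLE_LINES {
            1 // VBlank
        } else if self.dot < 80 {
            2 // OAM scan
        } else if self.dot < 80 + 172 {
            3 // Drawing
        } else {
            0 // HBlank
        }
    }
}
```

In a main loop you would call `advance` with the value returned by the CPU step and request the VBlank interrupt whenever it returns true.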

Book Reference: Pan Docs (The definitive Game Boy reference) - https://gbdev.io/pandocs/


Memory Banking Controllers (MBC)

Game Boy cartridges can contain far more than 32KB of ROM. Memory Banking Controllers (MBCs) are chips on the cartridge that allow swapping different “banks” of ROM and RAM into the CPU’s address space:

+-----------------------------------------------------------------------+
|                   MEMORY BANKING OVERVIEW                              |
+-----------------------------------------------------------------------+

Without MBC (32KB ROM max):

    0x0000-0x3FFF: ROM Bank 0 (16KB) - Fixed
    0x4000-0x7FFF: ROM Bank 1 (16KB) - Fixed

    Total: 32KB ROM

With MBC1 (up to 2MB ROM, 32KB RAM):

    0x0000-0x3FFF: ROM Bank 0 (16KB) - Fixed*
    0x4000-0x7FFF: ROM Bank 1-127 (16KB) - Switchable
    0xA000-0xBFFF: External RAM Bank 0-3 (8KB each) - Switchable

    * In advanced mode, bank 0 can also be switched

MBC REGISTER INTERFACE (Memory-Mapped):

MBC1:
    0x0000-0x1FFF: RAM Enable
        Write 0x0A to enable external RAM (only the low nibble is checked)
        Write 0x00 to disable (any other low nibble also disables)

    0x2000-0x3FFF: ROM Bank Number (Lower 5 bits)
        Write bank number (1-31)
        Writing 0 selects bank 1

    0x4000-0x5FFF: RAM Bank / Upper ROM Bank bits
        In ROM mode: Upper 2 bits of ROM bank (for >512KB)
        In RAM mode: RAM bank number (0-3)

    0x6000-0x7FFF: Banking Mode Select
        0 = ROM Banking Mode (default)
        1 = RAM Banking Mode

MBC3 (up to 2MB ROM, 32KB RAM, RTC):
    Adds Real-Time Clock registers mapped at 0xA000-0xBFFF
    when bank 0x08-0x0C is selected.

    RTC Registers:
        0x08: Seconds (0-59)
        0x09: Minutes (0-59)
        0x0A: Hours (0-23)
        0x0B: Day counter (lower 8 bits)
        0x0C: Day counter bit 8 (bit 0), Halt flag (bit 6), Carry (bit 7)

MBC5 (up to 8MB ROM, 128KB RAM):
    Most common MBC for later games.

    0x2000-0x2FFF: ROM Bank Number (lower 8 bits)
    0x3000-0x3FFF: ROM Bank Number (9th bit)
    0x4000-0x5FFF: RAM Bank Number (0-15)

Rust Implementation:

pub enum MbcType {
    None,
    Mbc1,
    Mbc3,
    Mbc5,
}

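// Note: Vec needs `extern crate alloc;` and a global allocator in a no_std
// crate. Borrowing the ROM as `&[u8]` would avoid allocation entirely.
pub struct Cartridge {
    rom: Vec<u8>,
    ram: Vec<u8>,
    mbc_type: MbcType,
    rom_bank: usize,   // Bank mapped at 0x4000-0x7FFF (initialize to 1)
    ram_bank: usize,   // Bank mapped at 0xA000-0xBFFF
    ram_enabled: bool,
    banking_mode: u8,  // MBC1 only: 0 = ROM banking, 1 = RAM banking
}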

impl Cartridge {
    pub fn read(&self, addr: u16) -> u8 {
        match addr {
            // Fixed ROM Bank 0
            0x0000..=0x3FFF => self.rom[addr as usize],

            // Switchable ROM Bank
            0x4000..=0x7FFF => {
                let offset = (addr as usize) - 0x4000;
                let bank_offset = self.rom_bank * 0x4000;
                self.rom.get(bank_offset + offset).copied().unwrap_or(0xFF)
            }

            // External RAM
            0xA000..=0xBFFF => {
                if self.ram_enabled && !self.ram.is_empty() {
                    let offset = (addr as usize) - 0xA000;
                    let bank_offset = self.ram_bank * 0x2000;
                    self.ram.get(bank_offset + offset).copied().unwrap_or(0xFF)
                } else {
                    0xFF
                }
            }

            _ => 0xFF,
        }
    }

    pub fn write(&mut self, addr: u16, value: u8) {
        match self.mbc_type {
            MbcType::None => { /* No MBC, writes ignored */ }
            MbcType::Mbc1 => self.write_mbc1(addr, value),
            MbcType::Mbc3 => self.write_mbc3(addr, value),
            MbcType::Mbc5 => self.write_mbc5(addr, value),
        }
    }

    fn write_mbc1(&mut self, addr: u16, value: u8) {
        match addr {
            0x0000..=0x1FFF => {
                self.ram_enabled = (value & 0x0F) == 0x0A;
            }
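            0x2000..=0x3FFF => {
                // Lower 5 bits of the ROM bank; writing 0 selects bank 1
                let low = (value & 0x1F) as usize;
                let low = if low == 0 { 1 } else { low };
                self.rom_bank = (self.rom_bank & 0x60) | low;
            }
            0x4000..=0x5FFF => {
                let bits = (value & 0x03) as usize;
                if self.banking_mode == 0 {
                    // ROM banking mode: upper 2 bits of the ROM bank
                    self.rom_bank = (self.rom_bank & 0x1F) | (bits << 5);
                } else {
                    // RAM banking mode: RAM bank number (0-3)
                    self.ram_bank = bits;
                }
            }
            0x6000..=0x7FFF => {
                self.banking_mode = value & 0x01;
            }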
            0xA000..=0xBFFF => {
                if self.ram_enabled && !self.ram.is_empty() {
                    let offset = (addr as usize) - 0xA000;
                    let bank_offset = self.ram_bank * 0x2000;
                    if bank_offset + offset < self.ram.len() {
                        self.ram[bank_offset + offset] = value;
                    }
                }
            }
            _ => {}
        }
    }
}
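The write dispatcher above calls write_mbc5, which the listing does not define. The sketch below shows one way the three MBC5 registers listed earlier could be decoded. `Mbc5Banks` is a hypothetical stand-in for the relevant Cartridge fields, and the power-on ROM bank of 1 is an assumption. Unlike MBC1, this sketch does not remap bank 0 to bank 1.

```rust
/// Hypothetical stand-in for the MBC5 banking state of a cartridge.
#[derive(Debug, Clone, Copy)]
pub struct Mbc5Banks {
    /// 9-bit ROM bank number (0-511)
    pub rom_bank: u16,
    /// 4-bit RAM bank number (0-15)
    pub ram_bank: u8,
}

impl Mbc5Banks {
    /// Assumption: start with bank 1 mapped at 0x4000-0x7FFF.
    pub fn new() -> Self {
        Self { rom_bank: 1, ram_bank: 0 }
    }

    /// Decode a CPU write into the MBC5 register ranges.
    pub fn write(&mut self, addr: u16, value: u8) {
        match addr {
            // Lower 8 bits of the ROM bank; keep the 9th bit
            0x2000..=0x2FFF => {
                self.rom_bank = (self.rom_bank & 0x0100) | value as u16;
            }
            // 9th bit of the ROM bank; keep the lower 8 bits
            0x3000..=0x3FFF => {
                self.rom_bank = (self.rom_bank & 0x00FF) | (((value & 0x01) as u16) << 8);
            }
            // RAM bank number (0-15)
            0x4000..=0x5FFF => {
                self.ram_bank = value & 0x0F;
            }
            _ => {}
        }
    }

    /// Byte offset into the ROM image for an address in 0x4000-0x7FFF.
    pub fn rom_offset(&self, addr: u16) -> usize {
        debug_assert!((0x4000..=0x7FFF).contains(&addr));
        self.rom_bank as usize * 0x4000 + (addr as usize - 0x4000)
    }
}
```

Keeping the bank registers in their own small type makes them easy to unit test before wiring them into the cartridge read path.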

PPU (Picture Processing Unit) Basics

While this project focuses on the CPU, understanding the PPU is essential because the two share memory and must stay synchronized:

+-----------------------------------------------------------------------+
|                   PPU ARCHITECTURE OVERVIEW                            |
+-----------------------------------------------------------------------+

DISPLAY CHARACTERISTICS:
    Resolution:     160 x 144 pixels
    Colors:         4 shades (on original GB)
    Refresh rate:   59.73 Hz

TILE-BASED RENDERING:

The Game Boy doesn't store pixels directly. Instead, it stores:
1. Tile patterns (8x8 pixel bitmaps)
2. Tile maps (which tiles go where)
3. Sprite data (position, tile, flags)

    +---+---+---+---+---+    Each tile is 8x8 pixels
    |   |   |   |   |   |    Each pixel is 2 bits (4 colors)
    +---+---+---+---+---+    Each tile = 16 bytes
    |   |   |   |   |   |
    +---+---+---+---+---+    384 tiles fit in VRAM; 256 addressable per layer
    |   |   |   |   |   |
    +---+---+---+---+---+
    |   |   |   |   |   |
    +---+---+---+---+---+

TILE DATA ENCODING:

Each row of a tile is 2 bytes:
    Byte 0: Low bits of each pixel
    Byte 1: High bits of each pixel

    Example row:
        Byte 0: 0x3C = 0011 1100
        Byte 1: 0x7E = 0111 1110

        Pixel values:
        Pos:    7  6  5  4  3  2  1  0
        Byte0:  0  0  1  1  1  1  0  0
        Byte1:  0  1  1  1  1  1  1  0
        --------------------------------
        Color:  0  2  3  3  3  3  2  0
        (with an identity palette: 0 = white, 1 = light gray,
         2 = dark gray, 3 = black)
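The two-byte row encoding above can be decoded with a few shifts. This is a minimal sketch; `decode_tile_row` is a hypothetical helper name, and it returns raw color indices (0-3) before any palette is applied.

```rust
/// Decode one 8-pixel tile row from its two bytes.
/// `lo` holds the low bit of every pixel, `hi` the high bit.
/// The leftmost pixel comes from bit 7 of each byte.
pub fn decode_tile_row(lo: u8, hi: u8) -> [u8; 8] {
    let mut out = [0u8; 8];
    for (x, px) in out.iter_mut().enumerate() {
        let bit = 7 - x;
        let low = (lo >> bit) & 1;
        let high = (hi >> bit) & 1;
        *px = (high << 1) | low;
    }
    out
}
```

Calling it with the example row (0x3C, 0x7E) yields the same color indices as the worked table.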

PPU MODES (Visible in STAT register):

    Mode 0: HBlank (87-204 cycles)
        - CPU can access VRAM and OAM

    Mode 1: VBlank (4560 cycles = 10 lines)
        - CPU can access VRAM and OAM
        - Good time for game logic and VRAM updates
        - VBLANK interrupt triggered

    Mode 2: OAM Scan (80 cycles)
        - CPU can access VRAM but NOT OAM
        - PPU reading sprite data

    Mode 3: Drawing (172-289 cycles)
        - CPU CANNOT access VRAM or OAM
        - PPU rendering pixels


SCANLINE TIMING DIAGRAM:

Scanline 0-143 (Visible):
    |-- Mode 2 --|------- Mode 3 -------|------- Mode 0 -------|
    |   80 cyc   |     172-289 cyc      |      87-204 cyc      |
    |            |                      |                       |
    |<---------------------- 456 cycles ----------------------->|

Scanline 144-153 (VBlank):
    |----------------------- Mode 1 ---------------------------|
    |                      456 cycles                           |
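The scanline layout above maps directly to a lookup from (scanline, cycle within scanline) to a PPU mode. The sketch below assumes the shortest Mode 3 length (172 cycles) for every line, which is a simplification. `PpuMode`, `mode_at`, and the two access helpers are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PpuMode {
    HBlank = 0,
    VBlank = 1,
    OamScan = 2,
    Drawing = 3,
}

/// Mode for a given scanline (0-153) and cycle within that line (0-455).
/// Assumption: Mode 3 always lasts its minimum of 172 cycles.
pub fn mode_at(line: u16, cycle: u16) -> PpuMode {
    if line >= 144 {
        PpuMode::VBlank
    } else if cycle < 80 {
        PpuMode::OamScan
    } else if cycle < 80 + 172 {
        PpuMode::Drawing
    } else {
        PpuMode::HBlank
    }
}

/// VRAM is blocked for the CPU only while the PPU is drawing.
pub fn cpu_can_access_vram(mode: PpuMode) -> bool {
    mode != PpuMode::Drawing
}

/// OAM is blocked during both OAM scan and drawing.
pub fn cpu_can_access_oam(mode: PpuMode) -> bool {
    matches!(mode, PpuMode::HBlank | PpuMode::VBlank)
}
```

A memory bus can consult these helpers and return 0xFF for blocked reads once a PPU is added; for the CPU-only MVP they can be ignored.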

Interrupts: VBLANK, LCD STAT, Timer, Serial, Joypad

The Game Boy has 5 interrupt sources with fixed priority:

+-----------------------------------------------------------------------+
|                   INTERRUPT SYSTEM                                     |
+-----------------------------------------------------------------------+

INTERRUPT VECTORS (Fixed addresses in ROM):

    Address   | Interrupt      | Priority | Bit in IF/IE
    ----------+----------------+----------+--------------
    0x0040    | VBlank         | Highest  | Bit 0
    0x0048    | LCD STAT       |          | Bit 1
    0x0050    | Timer          |          | Bit 2
    0x0058    | Serial         |          | Bit 3
    0x0060    | Joypad         | Lowest   | Bit 4

INTERRUPT HANDLING SEQUENCE:

When an interrupt occurs and IME=1:

    1. The IF bit corresponding to the interrupt is set by hardware
    2. CPU completes current instruction
    3. CPU checks: (IE & IF) != 0 and IME == 1
    4. If true:
        a. IME = 0 (disable further interrupts)
        b. PUSH PC onto stack (SP -= 2, write PC)
        c. Clear the specific IF bit being serviced
        d. PC = interrupt vector address
        e. Total time: 5 M-cycles (20 T-states)

REGISTERS:

    0xFF0F - IF (Interrupt Flag)
        Bit 4: Joypad
        Bit 3: Serial
        Bit 2: Timer
        Bit 1: LCD STAT
        Bit 0: VBlank

        Writing 1 to a bit REQUESTS that interrupt.
        Reading shows pending interrupts.
        Bits are CLEARED when interrupt is serviced.

    0xFFFF - IE (Interrupt Enable)
        Same bit layout as IF.
        Bit = 1: Interrupt source is enabled
        Bit = 0: Interrupt source is masked

    IME (Interrupt Master Enable) - NOT memory-mapped
        Internal CPU flag.
        Controlled by EI, DI, RETI instructions.
        EI enables IME AFTER the next instruction.
        DI disables IME immediately.

PRIORITY HANDLING:

If multiple interrupts are pending, the lowest-numbered bit wins:

    IF = 0x05 (VBlank + Timer pending)
    IE = 0xFF (all enabled)

    VBlank (bit 0) is serviced first.
    After RETI or EI, Timer (bit 2) would be serviced.

HALT BEHAVIOR:

The HALT instruction stops CPU execution until an interrupt is pending:

    If (IE & IF) == 0 when HALT executes:
        CPU sleeps, PPU and timers continue running
        CPU wakes when (IE & IF) != 0

    If IME == 0 and (IE & IF) != 0 when HALT executes:
        HALT BUG: the CPU does not halt, and PC fails to increment
        after the next opcode fetch, so that byte is read twice
        (Yes, this is a real hardware bug!)

Rust Implementation:

impl<M: MemoryBus> Cpu<M> {
    /// Check for and handle pending interrupts
    /// Returns cycles consumed by interrupt handling (0 or 20)
    pub fn handle_interrupts(&mut self) -> u8 {
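        let ie = self.bus.read(0xFFFF);
        let if_ = self.bus.read(0xFF0F);
        // Only bits 0-4 map to interrupt sources; mask the rest so a stray
        // upper bit can never select a nonexistent vector.
        let pending = ie & if_ & 0x1F;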

        if pending == 0 {
            return 0;
        }

        // An interrupt is pending - wake from HALT
        self.halted = false;

        // If IME is disabled, don't actually service the interrupt
        if !self.ime {
            return 0;
        }

        // Service the highest priority pending interrupt
        let interrupt_bit = pending.trailing_zeros() as u8;
        let vector = 0x0040 + (interrupt_bit as u16 * 8);

        // Clear the IF flag
        self.bus.write(0xFF0F, if_ & !(1 << interrupt_bit));

        // Disable interrupts
        self.ime = false;

        // Push PC and jump to vector
        self.push_word(self.regs.pc);
        self.regs.pc = vector;

        20 // Interrupt handling takes 5 M-cycles
    }

    fn push_word(&mut self, value: u16) {
        self.regs.sp = self.regs.sp.wrapping_sub(1);
        self.bus.write(self.regs.sp, (value >> 8) as u8);
        self.regs.sp = self.regs.sp.wrapping_sub(1);
        self.bus.write(self.regs.sp, value as u8);
    }
}
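The priority rule and the HALT cases can also be expressed as pure functions, which makes them easy to test without a full CPU. This is a sketch under stated assumptions: `highest_priority`, `HaltOutcome`, and `halt_outcome` are hypothetical names, and the `NoHalt` case (IME set with an interrupt already pending) is an assumption that goes beyond the two cases listed above.

```rust
/// Only bits 0-4 of IE and IF correspond to interrupt sources.
const INTERRUPT_MASK: u8 = 0x1F;

/// Returns (bit, vector) of the highest-priority pending interrupt, if any.
/// The lowest-numbered set bit wins.
pub fn highest_priority(ie: u8, if_: u8) -> Option<(u8, u16)> {
    let pending = ie & if_ & INTERRUPT_MASK;
    if pending == 0 {
        return None;
    }
    let bit = pending.trailing_zeros() as u8;
    Some((bit, 0x0040 + (bit as u16) * 8))
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HaltOutcome {
    /// Nothing pending: the CPU sleeps until (IE & IF) != 0.
    Sleep,
    /// Assumption: with IME set and an interrupt pending, the CPU
    /// does not sleep and the interrupt is serviced right away.
    NoHalt,
    /// IME clear with an interrupt pending: the next byte is read twice.
    HaltBug,
}

/// Classify what happens when HALT executes in the given state.
pub fn halt_outcome(ime: bool, ie: u8, if_: u8) -> HaltOutcome {
    let pending = (ie & if_ & INTERRUPT_MASK) != 0;
    match (pending, ime) {
        (false, _) => HaltOutcome::Sleep,
        (true, false) => HaltOutcome::HaltBug,
        (true, true) => HaltOutcome::NoHalt,
    }
}
```

With IF = 0x05 and IE = 0xFF, `highest_priority` selects VBlank (bit 0, vector 0x0040), matching the priority example above.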

Why no_std Matters for Portability

Building the emulator core with #![no_std] removes every dependency on an operating system, so the same crate compiles for any target Rust supports:

+-----------------------------------------------------------------------+
|                   no_std PORTABILITY                                   |
+-----------------------------------------------------------------------+

WITH no_std, YOUR CORE CAN RUN ON:

    +------------------+     +-----------------+     +------------------+
    |   Web Browser    |     |  Desktop App    |     |  Embedded MCU    |
    | (via WebAssembly)|     | (Windows/Linux/ |     |  (ESP32, STM32,  |
    |                  |     |     macOS)      |     |   Raspberry Pi   |
    +------------------+     +-----------------+     |     Pico)        |
            |                        |               +------------------+
            |                        |                       |
            v                        v                       v
    +---------------------------------------------------------------+
    |                    YOUR no_std CORE                           |
    |                                                               |
    |  pub struct Cpu<M: MemoryBus> { ... }                        |
    |  impl<M: MemoryBus> Cpu<M> { pub fn step(&mut self) -> u8 }  |
    |                                                               |
    +---------------------------------------------------------------+
                                    |
                                    | Implements trait
                                    v
    +---------------------------------------------------------------+
    |                    MemoryBus TRAIT                            |
    |                                                               |
    |  fn read(&self, addr: u16) -> u8;                            |
    |  fn write(&mut self, addr: u16, value: u8);                  |
    |                                                               |
    +---------------------------------------------------------------+

PLATFORM-SPECIFIC IMPLEMENTATIONS:

WebAssembly:
    - MemoryBus backed by JavaScript TypedArray
    - Frame rendering via Canvas API
    - Audio via Web Audio API
    - Input from DOM events

Desktop:
    - MemoryBus backed by Vec<u8>
    - Frame rendering via SDL2, winit, or similar
    - Audio via cpal, rodio, or system audio
    - Input from window events

Embedded:
    - MemoryBus backed by static array or external RAM
    - Frame rendering via SPI display (ILI9341, ST7789, etc.)
    - Audio via I2S DAC
    - Input from GPIO buttons

THE KEY INSIGHT:

By abstracting all I/O through traits, the CPU core becomes
pure computation - no allocation, no system calls, no platform
dependencies. It's just:

    loop {
        let cycles = cpu.step();
        advance_other_hardware(cycles);
    }

Book Reference: “The Secret Life of Programs” by Jonathan Steinhart - Chapter 5
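To make the trait boundary concrete, here is a minimal sketch of an alternative backend that borrows its storage instead of owning it, as an embedded or WebAssembly host might. The trait repeats the two required methods from the diagram so the block stands alone. `SliceBus` and `fill` are hypothetical names, and the block uses only `core` features (no allocation).

```rust
/// Same required methods as the MemoryBus trait in the diagram.
pub trait MemoryBus {
    fn read(&self, addr: u16) -> u8;
    fn write(&mut self, addr: u16, value: u8);
}

/// Hypothetical backend: wraps a caller-provided buffer, such as a
/// static array on a microcontroller or a region of Wasm linear memory.
pub struct SliceBus<'a> {
    mem: &'a mut [u8],
}

impl<'a> SliceBus<'a> {
    pub fn new(mem: &'a mut [u8]) -> Self {
        Self { mem }
    }
}

impl MemoryBus for SliceBus<'_> {
    fn read(&self, addr: u16) -> u8 {
        // Out-of-range reads behave like open bus and return 0xFF
        self.mem.get(addr as usize).copied().unwrap_or(0xFF)
    }

    fn write(&mut self, addr: u16, value: u8) {
        // Out-of-range writes are silently ignored
        if let Some(slot) = self.mem.get_mut(addr as usize) {
            *slot = value;
        }
    }
}

/// Code written against the trait works with any backend.
pub fn fill<M: MemoryBus>(bus: &mut M, start: u16, len: u16, value: u8) {
    for i in 0..len {
        bus.write(start.wrapping_add(i), value);
    }
}
```

Because `fill` is generic over `MemoryBus`, the same function compiles unchanged against the full `Bus` or any platform-specific backend.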


Real World Outcome

When complete, you will have a working Game Boy emulator core that demonstrates deep understanding of CPU architecture, memory systems, and Rust’s no_std ecosystem.

Example: Boot Sequence and CPU Trace

$ cargo run --release -- --rom roms/tetris.gb --trace --frames 4

+=======================================================================+
|            Game Boy Emulator Core v0.1.0 (no_std)                     |
+=======================================================================+
| ROM: tetris.gb                                                        |
| Size: 32768 bytes (32 KB)                                             |
| Type: ROM ONLY (No MBC)                                               |
| Checksum: 0x3B VALID                                                  |
+=======================================================================+

[BOOT] Initializing CPU (LR35902 @ 4.194304 MHz)
[BOOT] Initializing PPU (LCD Controller)
[BOOT] Initializing Memory Map (64KB address space)
[BOOT] Loading Boot ROM (256 bytes) at 0x0000-0x00FF
[BOOT] Starting execution at PC=0x0000

======================== CPU EXECUTION TRACE ===========================

Cycle: 0000000 | PC: 0x0000 | SP: 0x0000 | [BOOT ROM]
  Memory[0x0000] = 0x31
  OP: 0x31 (LD SP, d16)
  Operands: 0xFE, 0xFF -> d16 = 0xFFFE
  Execution: SP <- 0xFFFE
  Registers BEFORE: A:00 F:00 B:00 C:00 D:00 E:00 H:00 L:00 SP:0000
  Registers AFTER:  A:00 F:00 B:00 C:00 D:00 E:00 H:00 L:00 SP:FFFE
  Flags: [Z:0 N:0 H:0 C:0] (unchanged)
  Cycles: 12 (3 M-cycles)

Cycle: 0000012 | PC: 0x0003 | SP: 0xFFFE
  Memory[0x0003] = 0xAF
  OP: 0xAF (XOR A)
  Execution: A <- A XOR A = 0x00
  Registers AFTER:  A:00 F:80 B:00 C:00 D:00 E:00 H:00 L:00 SP:FFFE
  Flags: [Z:1 N:0 H:0 C:0] <- Zero flag SET (result is 0)
  Cycles: 4 (1 M-cycle)

Cycle: 0000016 | PC: 0x0004 | SP: 0xFFFE
  Memory[0x0004] = 0x21
  OP: 0x21 (LD HL, d16)
  Operands: 0x26, 0xFF -> d16 = 0xFF26
  Execution: HL <- 0xFF26 (Audio Master Control register)
  Registers AFTER:  A:00 F:80 B:00 C:00 D:00 E:00 H:FF L:26 SP:FFFE
  Flags: [Z:1 N:0 H:0 C:0] (unchanged)
  Cycles: 12 (3 M-cycles)

Cycle: 0000028 | PC: 0x0007 | SP: 0xFFFE
  Memory[0x0007] = 0x0E
  OP: 0x0E (LD C, d8)
  Operands: 0x11
  Execution: C <- 0x11
  Registers AFTER:  A:00 F:80 B:00 C:11 D:00 E:00 H:FF L:26 SP:FFFE
  Cycles: 8 (2 M-cycles)

... [Boot ROM continues for ~244 more instructions] ...

Cycle: 0024812 | PC: 0x00FE | SP: 0xFFFE | [BOOT ROM -> GAME ROM]
  Memory[0x00FE] = 0xE0
  OP: 0xE0 (LDH (a8), A)
  Operands: 0x50
  Execution: Memory[0xFF50] <- A
             Writing 0x01 to 0xFF50 DISABLES BOOT ROM
  [BOOT] Boot ROM disabled. Cartridge ROM now visible at 0x0000-0x00FF
  Cycles: 12

Cycle: 0024824 | PC: 0x0100 | SP: 0xFFFE | [GAME ROM - ENTRY POINT]

  =================================================================
  |                    GAME ROM EXECUTION BEGINS                  |
  =================================================================

  Memory[0x0100] = 0x00
  OP: 0x00 (NOP)
  Execution: No operation
  Registers: A:01 F:B0 B:00 C:13 D:00 E:D8 H:01 L:4D SP:FFFE
  Flags: [Z:1 N:0 H:1 C:1]
  Cycles: 4

Cycle: 0024828 | PC: 0x0101 | SP: 0xFFFE
  Memory[0x0101] = 0xC3
  OP: 0xC3 (JP a16)
  Operands: 0x50, 0x01 -> a16 = 0x0150
  Execution: PC <- 0x0150 (Jump to game initialization)
  Cycles: 16

Cycle: 0024844 | PC: 0x0150 | SP: 0xFFFE | [GAME INITIALIZATION]
  Memory[0x0150] = 0xC3
  OP: 0xC3 (JP a16)
  Operands: 0xD3, 0x02 -> a16 = 0x02D3
  Execution: PC <- 0x02D3
  Cycles: 16

... [Game initialization continues] ...

======================== PPU RENDERING TRACE ===========================

[PPU] Frame 0 | Scanline: 000 | Mode: OAM_SCAN
  LCD Control (0xFF40): 0x91
    - LCD Enabled: YES
    - Window Tile Map: 0x9800
    - Window Enabled: NO
    - BG Tile Data: 0x8000
    - BG Tile Map: 0x9800
    - OBJ Size: 8x8
    - OBJ Enabled: NO
    - BG Enabled: YES

[PPU] Frame 0 | Scanline: 000 | Mode: DRAWING
  Scroll: SCX=0x00, SCY=0x00
  Fetching tiles from map at 0x9800...
  Tile[0,0] = 0x00 -> Pattern at 0x8000
  Tile[0,1] = 0x00 -> Pattern at 0x8000
  ... [20 tiles per scanline]
  Rendering 160 pixels to framebuffer[0..159]

[PPU] Frame 0 | Scanline: 000 | Mode: HBLANK
  Horizontal blank period
  CPU can access VRAM freely

[PPU] Frame 0 | Scanline: 001 | Mode: OAM_SCAN
... [Scanlines 1-143 continue] ...

[PPU] Frame 0 | Scanline: 144 | Mode: VBLANK
  +===========================================================+
  |                 VBLANK INTERRUPT TRIGGERED                |
  |                                                           |
  | Frame Complete:                                           |
  |   - 144 visible scanlines rendered                        |
  |   - 160x144 = 23,040 pixels                               |
  |   - 4 shades per pixel                                    |
  |                                                           |
  | Timing:                                                   |
  |   - 70,224 CPU cycles per frame                           |
  |   - Frame time: 16.74ms (59.73 Hz)                        |
  +===========================================================+

[INT] Setting IF bit 0 (VBlank)
[INT] IE & IF = 0x01, IME = 1
[INT] Servicing VBlank interrupt
[INT] PUSH PC (0x02F8) -> Stack at 0xFFFC
[INT] PC <- 0x0040 (VBlank vector)
[INT] IME <- 0 (Interrupts disabled)

Cycle: 0070244 | PC: 0x0040 | SP: 0xFFFC | [VBLANK HANDLER]
  Memory[0x0040] = 0xC3
  OP: 0xC3 (JP a16)
  Operands: 0x00, 0x04 -> a16 = 0x0400
  Execution: PC <- 0x0400 (Game's VBlank handler)
  Cycles: 16

... [VBlank handler executes] ...

[PPU] Frame 0 | Scanline: 153 | Mode: VBLANK (last)
[PPU] Frame 0 Complete. Starting Frame 1.

[PPU] Frame 1 | Starting new frame
[PPU] Frame 2 | Starting new frame
[PPU] Frame 3 | Starting new frame

======================== EXECUTION SUMMARY =============================

Total Cycles Executed:    280,896 (4 frames)
Total Instructions:       24,312
Total Frames Rendered:    4
Average FPS:              59.73 Hz (target: 59.73 Hz)
Execution Time:           66.96ms (simulated)

Instruction Breakdown:
+------------------+--------+--------+
| Category         | Count  | %      |
+------------------+--------+--------+
| 8-bit Loads      |  6,234 | 25.6%  |
| 16-bit Loads     |  1,102 |  4.5%  |
| Arithmetic       |  3,891 | 16.0%  |
| Logic            |  2,567 | 10.6%  |
| Bit Operations   |  1,234 |  5.1%  |
| Jumps/Calls      |  2,876 | 11.8%  |
| Stack Operations |    891 |  3.7%  |
| Control          |  5,517 | 22.7%  |
+------------------+--------+--------+

Memory Access Statistics:
  ROM Reads:      19,234
  WRAM Reads:      8,901
  WRAM Writes:     4,567
  VRAM Reads:      1,234 (mostly by PPU)
  VRAM Writes:       456
  I/O Reads:       2,123
  I/O Writes:      1,024

Interrupt Statistics:
  VBlank:     4 (one per frame)
  LCD STAT:   0
  Timer:      0
  Serial:     0
  Joypad:     0

[EMULATOR] Execution complete. Exiting.

Example: no_std Build for WebAssembly

$ cargo build --target wasm32-unknown-unknown --release
   Compiling gameboy-core v0.1.0
    Finished release [optimized] target(s) in 2.34s

$ ls -la target/wasm32-unknown-unknown/release/
-rwxr-xr-x  1 user  staff  45678 Dec 27 12:00 gameboy_core.wasm

$ wasm-opt -Os -o optimized.wasm target/wasm32-unknown-unknown/release/gameboy_core.wasm

$ ls -la optimized.wasm
-rw-r--r--  1 user  staff  23456 Dec 27 12:00 optimized.wasm

[BUILD] WebAssembly output: 23 KB (no_std, no allocations)
[BUILD] Can be loaded directly in browser with <50KB footprint

Example: no_std Build for Embedded

$ cargo build --target thumbv7em-none-eabihf --release
   Compiling gameboy-core v0.1.0
    Finished release [optimized] target(s) in 3.42s

$ arm-none-eabi-size target/thumbv7em-none-eabihf/release/libgameboy_core.a
   text    data     bss     dec     hex filename
  18432      24    2048   20504    5018 gameboy_core.o

[BUILD] Successfully compiled for ARM Cortex-M4 (no_std)
[BUILD] Code size: 18 KB (Flash)
[BUILD] RAM usage: 2 KB (Static)
[BUILD] Suitable for Cortex-M4F/M7 parts such as STM32F4 (RP2040 needs thumbv6m-none-eabi; ESP32 needs its own target)

Complete Project Specification

Minimum Viable Product (MVP):

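  1. CPU core with every valid opcode in both 256-entry tables (base and CB-prefixed) correctly implemented; 11 base-table slots are undefined on the LR35902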
  2. Pass Blargg’s cpu_instrs.gb test ROM (11 individual tests)
  3. Basic memory bus supporting 32KB ROM (no MBC)
  4. Accurate cycle counting
  5. Compiles with no_std

Extended Goals:

  1. MBC1 support for larger ROMs (2MB)
  2. MBC3 support with RTC
  3. Pass Blargg’s instr_timing.gb test
  4. Basic PPU for visual output
  5. Save state serialization

Solution Architecture

+-----------------------------------------------------------------------+
|                   PROJECT STRUCTURE                                    |
+-----------------------------------------------------------------------+

gameboy-core/
+-- Cargo.toml
+-- src/
|   +-- lib.rs              <- Crate root, no_std configuration
|   +-- cpu/
|   |   +-- mod.rs          <- CPU struct and step() method
|   |   +-- registers.rs    <- Register definitions and accessors
|   |   +-- opcodes.rs      <- Opcode execution (base instructions)
|   |   +-- cb_opcodes.rs   <- CB-prefixed instructions
|   |   +-- flags.rs        <- Flag calculation helpers
|   |   +-- interrupts.rs   <- Interrupt handling
|   +-- memory/
|   |   +-- mod.rs          <- MemoryBus trait definition
|   |   +-- bus.rs          <- Main memory bus implementation
|   |   +-- cartridge.rs    <- ROM loading, MBC handling
|   |   +-- io.rs           <- I/O register handling
|   +-- ppu/
|   |   +-- mod.rs          <- PPU struct (optional for MVP)
|   +-- timer.rs            <- Timer (DIV, TIMA, TMA, TAC)
+-- tests/
|   +-- blargg.rs           <- Integration tests with Blargg ROMs
|   +-- instruction_tests.rs <- Unit tests for each opcode
+-- roms/                   <- Test ROMs (git-ignored)
    +-- cpu_instrs.gb
    +-- instr_timing.gb

Core Traits and Types:

// lib.rs
#![no_std]

pub mod cpu;
pub mod memory;
pub mod timer;

pub use cpu::Cpu;
pub use memory::{MemoryBus, Bus};

// memory/mod.rs
/// Memory bus abstraction - implement this for different backends
pub trait MemoryBus {
    fn read(&self, addr: u16) -> u8;
    fn write(&mut self, addr: u16, value: u8);

    fn read_word(&self, addr: u16) -> u16 {
        let lo = self.read(addr) as u16;
        let hi = self.read(addr.wrapping_add(1)) as u16;
        (hi << 8) | lo
    }

    fn write_word(&mut self, addr: u16, value: u16) {
        self.write(addr, value as u8);
        self.write(addr.wrapping_add(1), (value >> 8) as u8);
    }
}

// cpu/mod.rs
pub struct Cpu<M: MemoryBus> {
    pub regs: Registers,
    pub bus: M,
    pub halted: bool,
    pub stopped: bool,
    pub ime: bool,
    pub ime_pending: bool,
    cycles: u64,
}

impl<M: MemoryBus> Cpu<M> {
    pub fn new(bus: M) -> Self {
        Self {
            regs: Registers::new(),
            bus,
            halted: false,
            stopped: false,
            ime: false,
            ime_pending: false,
            cycles: 0,
        }
    }

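    /// Execute one instruction and return cycles consumed
    pub fn step(&mut self) -> u8 {
        // Handle pending EI (takes effect one instruction late)
        if self.ime_pending {
            self.ime = true;
            self.ime_pending = false;
        }

        let cycles = if self.halted {
            4 // Idle for one M-cycle while halted
        } else {
            let opcode = self.fetch_byte();
            self.execute(opcode)
        };

        // Keep the running total that total_cycles() reports. Interrupt
        // dispatch cycles are returned separately by handle_interrupts().
        self.cycles += cycles as u64;
        cycles
    }

    // handle_interrupts() is defined in cpu/interrupts.rs
    // (see the Interrupts section for its body).

    pub fn total_cycles(&self) -> u64 {
        self.cycles
    }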
}
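The `ime` and `ime_pending` fields implement the one-instruction delay of EI. The sketch below isolates that small state machine so it can be tested on its own. `ImeState` and its method names are hypothetical, and treating DI as cancelling a pending EI is an assumption.

```rust
/// Hypothetical stand-alone model of the IME flag and the EI delay.
#[derive(Debug, Default, Clone, Copy)]
pub struct ImeState {
    enabled: bool,
    pending: bool,
}

impl ImeState {
    /// EI: request that IME be set after the following instruction.
    pub fn ei(&mut self) {
        self.pending = true;
    }

    /// DI: clear IME immediately.
    /// Assumption: this also cancels an EI that has not taken effect yet.
    pub fn di(&mut self) {
        self.enabled = false;
        self.pending = false;
    }

    /// RETI: return from the handler and set IME without a delay.
    pub fn reti(&mut self) {
        self.enabled = true;
    }

    /// Call at the start of every step, mirroring the check in step().
    pub fn begin_step(&mut self) {
        if self.pending {
            self.enabled = true;
            self.pending = false;
        }
    }

    pub fn enabled(&self) -> bool {
        self.enabled
    }
}
```

After `ei()`, `enabled()` stays false until the next `begin_step()`, which is the delay the EI handling in step() is meant to reproduce.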

Phased Implementation Guide

Phase 1: Register Definitions with Bit-Shifting (Days 1-2)

Goal: Create the register file with efficient 8-bit and 16-bit access.

// cpu/registers.rs

/// LR35902 CPU registers
#[derive(Debug, Clone, Copy, Default)]
pub struct Registers {
    pub a: u8,
    pub f: u8,
    pub b: u8,
    pub c: u8,
    pub d: u8,
    pub e: u8,
    pub h: u8,
    pub l: u8,
    pub sp: u16,
    pub pc: u16,
}

impl Registers {
    /// Create registers with post-boot values
    pub fn new() -> Self {
        Self {
            a: 0x01,
            f: 0xB0,  // Z=1, N=0, H=1, C=1
            b: 0x00,
            c: 0x13,
            d: 0x00,
            e: 0xD8,
            h: 0x01,
            l: 0x4D,
            sp: 0xFFFE,
            pc: 0x0100,
        }
    }

    // 16-bit register pair accessors

    pub fn af(&self) -> u16 {
        ((self.a as u16) << 8) | (self.f as u16)
    }

    pub fn set_af(&mut self, value: u16) {
        self.a = (value >> 8) as u8;
        self.f = (value & 0xF0) as u8;  // Lower 4 bits always 0
    }

    pub fn bc(&self) -> u16 {
        ((self.b as u16) << 8) | (self.c as u16)
    }

    pub fn set_bc(&mut self, value: u16) {
        self.b = (value >> 8) as u8;
        self.c = value as u8;
    }

    pub fn de(&self) -> u16 {
        ((self.d as u16) << 8) | (self.e as u16)
    }

    pub fn set_de(&mut self, value: u16) {
        self.d = (value >> 8) as u8;
        self.e = value as u8;
    }

    pub fn hl(&self) -> u16 {
        ((self.h as u16) << 8) | (self.l as u16)
    }

    pub fn set_hl(&mut self, value: u16) {
        self.h = (value >> 8) as u8;
        self.l = value as u8;
    }

    // Flag accessors

    pub fn z(&self) -> bool { (self.f & 0x80) != 0 }
    pub fn n(&self) -> bool { (self.f & 0x40) != 0 }
    pub fn h(&self) -> bool { (self.f & 0x20) != 0 }
    pub fn c(&self) -> bool { (self.f & 0x10) != 0 }

    pub fn set_z(&mut self, value: bool) {
        if value { self.f |= 0x80; } else { self.f &= !0x80; }
    }

    pub fn set_n(&mut self, value: bool) {
        if value { self.f |= 0x40; } else { self.f &= !0x40; }
    }

    pub fn set_h(&mut self, value: bool) {
        if value { self.f |= 0x20; } else { self.f &= !0x20; }
    }

    pub fn set_c(&mut self, value: bool) {
        if value { self.f |= 0x10; } else { self.f &= !0x10; }
    }

    /// Set all flags at once
    pub fn set_flags(&mut self, z: bool, n: bool, h: bool, c: bool) {
        self.f = 0;
        if z { self.f |= 0x80; }
        if n { self.f |= 0x40; }
        if h { self.f |= 0x20; }
        if c { self.f |= 0x10; }
    }
}

Checkpoint: Write unit tests that verify register pairing works correctly.
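As a starting point for that checkpoint, the pairing logic can be exercised through small free functions. This is a sketch: `join`, `split`, `split_af`, and `all_values_round_trip` are hypothetical helpers that mirror the shifts used by the accessors above, not part of the Registers API.

```rust
/// Join two 8-bit halves into a 16-bit pair (high byte first),
/// as bc(), de(), and hl() do.
pub fn join(hi: u8, lo: u8) -> u16 {
    ((hi as u16) << 8) | (lo as u16)
}

/// Split a 16-bit value into (high, low) bytes,
/// as set_bc(), set_de(), and set_hl() do.
pub fn split(value: u16) -> (u8, u8) {
    ((value >> 8) as u8, value as u8)
}

/// AF is special: the low nibble of F is always zero,
/// so set_af() masks it off.
pub fn split_af(value: u16) -> (u8, u8) {
    ((value >> 8) as u8, (value & 0x00F0) as u8)
}

/// Exhaustive round-trip check over all 65,536 values.
pub fn all_values_round_trip() -> bool {
    (0..=u16::MAX).all(|v| {
        let (hi, lo) = split(v);
        join(hi, lo) == v
    })
}
```

The same assertions can be moved into a #[cfg(test)] module and pointed at the real accessors once Registers exists.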


Phase 2: Memory Bus Abstraction (Days 3-4)

Goal: Create a flexible memory bus that can support different backends.

// memory/mod.rs

pub trait MemoryBus {
    fn read(&self, addr: u16) -> u8;
    fn write(&mut self, addr: u16, value: u8);

    fn read_word(&self, addr: u16) -> u16 {
        let lo = self.read(addr) as u16;
        let hi = self.read(addr.wrapping_add(1)) as u16;
        (hi << 8) | lo
    }

    fn write_word(&mut self, addr: u16, value: u16) {
        self.write(addr, value as u8);
        self.write(addr.wrapping_add(1), (value >> 8) as u8);
    }
}

// memory/bus.rs

/// Main Game Boy memory bus
pub struct Bus {
    /// Cartridge ROM (32KB for no-MBC)
    rom: [u8; 0x8000],
    /// Video RAM (8KB)
    vram: [u8; 0x2000],
    /// Work RAM (8KB)
    wram: [u8; 0x2000],
    /// High RAM (127 bytes)
    hram: [u8; 127],
    /// I/O registers (128 bytes)
    io: [u8; 128],
    /// Interrupt Enable register
    ie: u8,
    /// Boot ROM enabled flag
    boot_rom_enabled: bool,
    /// Boot ROM contents (256 bytes)
    boot_rom: [u8; 256],
}

impl Bus {
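    pub fn new() -> Self {
        Self {
            rom: [0; 0x8000],
            vram: [0; 0x2000],
            wram: [0; 0x2000],
            hram: [0; 127],
            io: [0; 128],
            ie: 0,
            // Off until a boot ROM is loaded; otherwise the zeroed overlay
            // would hide the cartridge's RST and interrupt vectors.
            boot_rom_enabled: false,
            boot_rom: [0; 256],
        }
    }

    pub fn load_rom(&mut self, data: &[u8]) {
        let len = data.len().min(self.rom.len());
        self.rom[..len].copy_from_slice(&data[..len]);
    }

    pub fn load_boot_rom(&mut self, data: &[u8]) {
        let len = data.len().min(256);
        self.boot_rom[..len].copy_from_slice(&data[..len]);
        // Map the boot ROM over 0x0000-0x00FF; start the CPU at PC=0x0000.
        self.boot_rom_enabled = true;
    }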
}

impl MemoryBus for Bus {
    fn read(&self, addr: u16) -> u8 {
        match addr {
            // ROM Bank 0 / Boot ROM
            0x0000..=0x00FF => {
                if self.boot_rom_enabled {
                    self.boot_rom[addr as usize]
                } else {
                    self.rom[addr as usize]
                }
            }

            // ROM Bank 0 (continued)
            0x0100..=0x3FFF => self.rom[addr as usize],

            // ROM Bank 1 (for no-MBC)
            0x4000..=0x7FFF => self.rom[addr as usize],

            // VRAM
            0x8000..=0x9FFF => self.vram[(addr - 0x8000) as usize],

            // External RAM (not implemented for no-MBC)
            0xA000..=0xBFFF => 0xFF,

            // WRAM
            0xC000..=0xDFFF => self.wram[(addr - 0xC000) as usize],

            // Echo RAM (mirror of WRAM)
            0xE000..=0xFDFF => self.wram[(addr - 0xE000) as usize],

            // OAM
            0xFE00..=0xFE9F => 0xFF, // TODO: Implement OAM

            // Unusable
            0xFEA0..=0xFEFF => 0xFF,

            // I/O Registers
            0xFF00..=0xFF7F => self.read_io(addr),

            // HRAM
            0xFF80..=0xFFFE => self.hram[(addr - 0xFF80) as usize],

            // Interrupt Enable
            0xFFFF => self.ie,
        }
    }

    fn write(&mut self, addr: u16, value: u8) {
        match addr {
            // ROM (read-only for no-MBC)
            0x0000..=0x7FFF => { /* Ignored */ }

            // VRAM
            0x8000..=0x9FFF => self.vram[(addr - 0x8000) as usize] = value,

            // External RAM
            0xA000..=0xBFFF => { /* Ignored for no-MBC */ }

            // WRAM
            0xC000..=0xDFFF => self.wram[(addr - 0xC000) as usize] = value,

            // Echo RAM
            0xE000..=0xFDFF => self.wram[(addr - 0xE000) as usize] = value,

            // OAM
            0xFE00..=0xFE9F => { /* TODO */ }

            // Unusable
            0xFEA0..=0xFEFF => { /* Ignored */ }

            // I/O Registers
            0xFF00..=0xFF7F => self.write_io(addr, value),

            // HRAM
            0xFF80..=0xFFFE => self.hram[(addr - 0xFF80) as usize] = value,

            // Interrupt Enable
            0xFFFF => self.ie = value,
        }
    }
}

impl Bus {
    fn read_io(&self, addr: u16) -> u8 {
        let offset = (addr - 0xFF00) as usize;
        match addr {
            0xFF44 => 0x90, // LY = 144 (fake VBlank for testing)
            _ => self.io[offset],
        }
    }

    fn write_io(&mut self, addr: u16, value: u8) {
        let offset = (addr - 0xFF00) as usize;
        match addr {
            0xFF50 => {
                // Writing to FF50 disables boot ROM
                if value != 0 {
                    self.boot_rom_enabled = false;
                }
            }
            _ => self.io[offset] = value,
        }
    }
}

Checkpoint: Memory reads and writes work correctly, boot ROM overlay functions.


Phase 3: Implement Core Instructions (Days 5-10)

Goal: Implement about 20 of the most common instructions, which form the backbone of any program.

Start with these essential opcodes:

// cpu/opcodes.rs

impl<M: MemoryBus> Cpu<M> {
    pub fn execute(&mut self, opcode: u8) -> u8 {
        match opcode {
            // NOP
            0x00 => 4,

            // LD BC, d16
            0x01 => {
                let value = self.fetch_word();
                self.regs.set_bc(value);
                12
            }

            // LD (BC), A
            0x02 => {
                self.bus.write(self.regs.bc(), self.regs.a);
                8
            }

            // INC BC
            0x03 => {
                self.regs.set_bc(self.regs.bc().wrapping_add(1));
                8
            }

            // INC B
            0x04 => {
                self.regs.b = self.inc8(self.regs.b);
                4
            }

            // DEC B
            0x05 => {
                self.regs.b = self.dec8(self.regs.b);
                4
            }

            // LD B, d8
            0x06 => {
                self.regs.b = self.fetch_byte();
                8
            }

            // ADD A, B
            0x80 => {
                self.add_a(self.regs.b);
                4
            }

            // SUB B
            0x90 => {
                self.sub_a(self.regs.b);
                4
            }

            // AND B
            0xA0 => {
                self.and_a(self.regs.b);
                4
            }

            // XOR A
            0xAF => {
                self.xor_a(self.regs.a);
                4
            }

            // OR B
            0xB0 => {
                self.or_a(self.regs.b);
                4
            }

            // CP B
            0xB8 => {
                self.cp_a(self.regs.b);
                4
            }

            // JP a16
            0xC3 => {
                let addr = self.fetch_word();
                self.regs.pc = addr;
                16
            }

            // JP NZ, a16
            0xC2 => {
                let addr = self.fetch_word();
                if !self.regs.z() {
                    self.regs.pc = addr;
                    16
                } else {
                    12
                }
            }

            // JR r8
            0x18 => {
                // `i8 as u16` sign-extends, so wrapping_add also handles backward jumps
                let offset = self.fetch_byte() as i8;
                self.regs.pc = self.regs.pc.wrapping_add(offset as u16);
                12
            }

            // CALL a16
            0xCD => {
                let addr = self.fetch_word();
                self.push_word(self.regs.pc);
                self.regs.pc = addr;
                24
            }

            // RET
            0xC9 => {
                self.regs.pc = self.pop_word();
                16
            }

            // PUSH BC
            0xC5 => {
                self.push_word(self.regs.bc());
                16
            }

            // POP BC
            0xC1 => {
                let value = self.pop_word();
                self.regs.set_bc(value);
                12
            }

            // CB prefix
            0xCB => {
                let cb_opcode = self.fetch_byte();
                self.execute_cb(cb_opcode)
            }

            _ => {
                panic!("Unimplemented opcode: 0x{:02X} at PC: 0x{:04X}",
                       opcode, self.regs.pc.wrapping_sub(1));
            }
        }
    }

    // Helper functions for arithmetic

    fn inc8(&mut self, value: u8) -> u8 {
        let result = value.wrapping_add(1);
        self.regs.set_z(result == 0);
        self.regs.set_n(false);
        self.regs.set_h((value & 0x0F) == 0x0F);
        // C flag not affected
        result
    }

    fn dec8(&mut self, value: u8) -> u8 {
        let result = value.wrapping_sub(1);
        self.regs.set_z(result == 0);
        self.regs.set_n(true);
        self.regs.set_h((value & 0x0F) == 0);
        // C flag not affected
        result
    }

    fn add_a(&mut self, value: u8) {
        let a = self.regs.a;
        let result = a.wrapping_add(value);

        self.regs.set_z(result == 0);
        self.regs.set_n(false);
        self.regs.set_h((a & 0x0F) + (value & 0x0F) > 0x0F);
        self.regs.set_c((a as u16) + (value as u16) > 0xFF);

        self.regs.a = result;
    }

    fn sub_a(&mut self, value: u8) {
        let a = self.regs.a;
        let result = a.wrapping_sub(value);

        self.regs.set_z(result == 0);
        self.regs.set_n(true);
        self.regs.set_h((a & 0x0F) < (value & 0x0F));
        self.regs.set_c(a < value);

        self.regs.a = result;
    }

    fn and_a(&mut self, value: u8) {
        self.regs.a &= value;
        self.regs.set_flags(self.regs.a == 0, false, true, false);
    }

    fn xor_a(&mut self, value: u8) {
        self.regs.a ^= value;
        self.regs.set_flags(self.regs.a == 0, false, false, false);
    }

    fn or_a(&mut self, value: u8) {
        self.regs.a |= value;
        self.regs.set_flags(self.regs.a == 0, false, false, false);
    }

    fn cp_a(&mut self, value: u8) {
        let a = self.regs.a;
        self.regs.set_z(a == value);
        self.regs.set_n(true);
        self.regs.set_h((a & 0x0F) < (value & 0x0F));
        self.regs.set_c(a < value);
        // A is NOT modified
    }

    fn push_word(&mut self, value: u16) {
        self.regs.sp = self.regs.sp.wrapping_sub(1);
        self.bus.write(self.regs.sp, (value >> 8) as u8);
        self.regs.sp = self.regs.sp.wrapping_sub(1);
        self.bus.write(self.regs.sp, value as u8);
    }

    fn pop_word(&mut self) -> u16 {
        let lo = self.bus.read(self.regs.sp) as u16;
        self.regs.sp = self.regs.sp.wrapping_add(1);
        let hi = self.bus.read(self.regs.sp) as u16;
        self.regs.sp = self.regs.sp.wrapping_add(1);
        (hi << 8) | lo
    }
}

Checkpoint: Simple test programs execute correctly.
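
Once the helpers above work, ADC and SBC are a natural next step. Their half-carry and carry checks must include the incoming carry bit, which is easy to forget. The following is a minimal sketch, written as free functions over a hypothetical `Flags` struct (not part of the code above) so it can be tested without a `Cpu`:

```rust
/// Hypothetical flag bundle used only for this illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Flags {
    pub z: bool,
    pub n: bool,
    pub h: bool,
    pub c: bool,
}

/// ADC A, value: add `value` plus the incoming carry to `a`.
pub fn adc(a: u8, value: u8, carry_in: bool) -> (u8, Flags) {
    let cin = carry_in as u8;
    let result = a.wrapping_add(value).wrapping_add(cin);
    let flags = Flags {
        z: result == 0,
        n: false,
        // Half-carry: overflow out of bit 3, carry-in included.
        h: (a & 0x0F) + (value & 0x0F) + cin > 0x0F,
        // Carry: overflow out of bit 7, carry-in included.
        c: (a as u16) + (value as u16) + (cin as u16) > 0xFF,
    };
    (result, flags)
}

/// SBC A, value: subtract `value` plus the incoming carry from `a`.
pub fn sbc(a: u8, value: u8, carry_in: bool) -> (u8, Flags) {
    let cin = carry_in as u8;
    let result = a.wrapping_sub(value).wrapping_sub(cin);
    let flags = Flags {
        z: result == 0,
        n: true,
        // Half-borrow: low nibble of `a` is too small, borrow-in included.
        h: (a & 0x0F) < (value & 0x0F) + cin,
        // Borrow: `a` is too small, borrow-in included.
        c: (a as u16) < (value as u16) + (cin as u16),
    };
    (result, flags)
}
```

For example, `adc(0x0F, 0x00, true)` yields 0x10 with only H set: the half-carry comes entirely from the carry-in.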


Phase 4: Implement CB-Prefixed Bit Operations (Days 11-14)

Goal: Complete the 256 CB-prefixed instructions.

// cpu/cb_opcodes.rs

impl<M: MemoryBus> Cpu<M> {
    pub fn execute_cb(&mut self, opcode: u8) -> u8 {
        // Bits 0-2 select the operand: B, C, D, E, H, L, (HL), A.
        // The upper bits select the operation (and, for BIT/RES/SET,
        // the bit number); the match arms below decode them by range.
        let reg_idx = opcode & 0x07;

        match opcode {
            // RLC r
            0x00..=0x07 => {
                let value = self.read_r8(reg_idx);
                let result = self.rlc(value);
                self.write_r8(reg_idx, result);
                if reg_idx == 6 { 16 } else { 8 }
            }

            // RRC r
            0x08..=0x0F => {
                let value = self.read_r8(reg_idx);
                let result = self.rrc(value);
                self.write_r8(reg_idx, result);
                if reg_idx == 6 { 16 } else { 8 }
            }

            // RL r
            0x10..=0x17 => {
                let value = self.read_r8(reg_idx);
                let result = self.rl(value);
                self.write_r8(reg_idx, result);
                if reg_idx == 6 { 16 } else { 8 }
            }

            // RR r
            0x18..=0x1F => {
                let value = self.read_r8(reg_idx);
                let result = self.rr(value);
                self.write_r8(reg_idx, result);
                if reg_idx == 6 { 16 } else { 8 }
            }

            // SLA r
            0x20..=0x27 => {
                let value = self.read_r8(reg_idx);
                let result = self.sla(value);
                self.write_r8(reg_idx, result);
                if reg_idx == 6 { 16 } else { 8 }
            }

            // SRA r
            0x28..=0x2F => {
                let value = self.read_r8(reg_idx);
                let result = self.sra(value);
                self.write_r8(reg_idx, result);
                if reg_idx == 6 { 16 } else { 8 }
            }

            // SWAP r
            0x30..=0x37 => {
                let value = self.read_r8(reg_idx);
                let result = self.swap(value);
                self.write_r8(reg_idx, result);
                if reg_idx == 6 { 16 } else { 8 }
            }

            // SRL r
            0x38..=0x3F => {
                let value = self.read_r8(reg_idx);
                let result = self.srl(value);
                self.write_r8(reg_idx, result);
                if reg_idx == 6 { 16 } else { 8 }
            }

            // BIT n, r
            0x40..=0x7F => {
                let bit = (opcode >> 3) & 0x07;
                let value = self.read_r8(reg_idx);
                self.bit(bit, value);
                if reg_idx == 6 { 12 } else { 8 }
            }

            // RES n, r
            0x80..=0xBF => {
                let bit = (opcode >> 3) & 0x07;
                let value = self.read_r8(reg_idx);
                let result = value & !(1 << bit);
                self.write_r8(reg_idx, result);
                if reg_idx == 6 { 16 } else { 8 }
            }

            // SET n, r
            0xC0..=0xFF => {
                let bit = (opcode >> 3) & 0x07;
                let value = self.read_r8(reg_idx);
                let result = value | (1 << bit);
                self.write_r8(reg_idx, result);
                if reg_idx == 6 { 16 } else { 8 }
            }
        }
    }

    fn read_r8(&self, idx: u8) -> u8 {
        match idx {
            0 => self.regs.b,
            1 => self.regs.c,
            2 => self.regs.d,
            3 => self.regs.e,
            4 => self.regs.h,
            5 => self.regs.l,
            6 => self.bus.read(self.regs.hl()),
            7 => self.regs.a,
            _ => unreachable!(),
        }
    }

    fn write_r8(&mut self, idx: u8, value: u8) {
        match idx {
            0 => self.regs.b = value,
            1 => self.regs.c = value,
            2 => self.regs.d = value,
            3 => self.regs.e = value,
            4 => self.regs.h = value,
            5 => self.regs.l = value,
            6 => self.bus.write(self.regs.hl(), value),
            7 => self.regs.a = value,
            _ => unreachable!(),
        }
    }

    // Bit operation helpers

    fn rlc(&mut self, value: u8) -> u8 {
        let carry = (value >> 7) & 1;
        let result = (value << 1) | carry;
        self.regs.set_flags(result == 0, false, false, carry != 0);
        result
    }

    fn rrc(&mut self, value: u8) -> u8 {
        let carry = value & 1;
        let result = (value >> 1) | (carry << 7);
        self.regs.set_flags(result == 0, false, false, carry != 0);
        result
    }

    fn rl(&mut self, value: u8) -> u8 {
        let old_carry = if self.regs.c() { 1 } else { 0 };
        let new_carry = (value >> 7) & 1;
        let result = (value << 1) | old_carry;
        self.regs.set_flags(result == 0, false, false, new_carry != 0);
        result
    }

    fn rr(&mut self, value: u8) -> u8 {
        let old_carry = if self.regs.c() { 0x80 } else { 0 };
        let new_carry = value & 1;
        let result = (value >> 1) | old_carry;
        self.regs.set_flags(result == 0, false, false, new_carry != 0);
        result
    }

    fn sla(&mut self, value: u8) -> u8 {
        let carry = (value >> 7) & 1;
        let result = value << 1;
        self.regs.set_flags(result == 0, false, false, carry != 0);
        result
    }

    fn sra(&mut self, value: u8) -> u8 {
        let carry = value & 1;
        let result = (value >> 1) | (value & 0x80);  // Preserve sign bit
        self.regs.set_flags(result == 0, false, false, carry != 0);
        result
    }

    fn swap(&mut self, value: u8) -> u8 {
        let result = (value >> 4) | (value << 4);
        self.regs.set_flags(result == 0, false, false, false);
        result
    }

    fn srl(&mut self, value: u8) -> u8 {
        let carry = value & 1;
        let result = value >> 1;
        self.regs.set_flags(result == 0, false, false, carry != 0);
        result
    }

    fn bit(&mut self, bit: u8, value: u8) {
        let result = value & (1 << bit);
        self.regs.set_z(result == 0);
        self.regs.set_n(false);
        self.regs.set_h(true);
        // C flag not affected
    }
}

Checkpoint: All CB-prefixed instructions implemented and tested.
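
The range-based match above works because the CB opcode space is regular: bits 7-6 pick the group, bits 5-3 pick the rotate/shift variant or the bit number, and bits 2-0 pick the operand. A small sketch of that decoding as a pure function can be handy for tests or a disassembler. The `CbOp` enum and `decode_cb` name are illustrative assumptions, not part of the code above:

```rust
/// Hypothetical decoded form of a CB-prefixed opcode.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CbOp {
    Rlc,
    Rrc,
    Rl,
    Rr,
    Sla,
    Sra,
    Swap,
    Srl,
    Bit(u8),
    Res(u8),
    Set(u8),
}

/// Split a CB opcode into (operation, operand index).
/// Operand index: 0=B, 1=C, 2=D, 3=E, 4=H, 5=L, 6=(HL), 7=A.
pub fn decode_cb(opcode: u8) -> (CbOp, u8) {
    let reg = opcode & 0x07;
    let mid = (opcode >> 3) & 0x07;
    let op = match opcode >> 6 {
        0 => match mid {
            0 => CbOp::Rlc,
            1 => CbOp::Rrc,
            2 => CbOp::Rl,
            3 => CbOp::Rr,
            4 => CbOp::Sla,
            5 => CbOp::Sra,
            6 => CbOp::Swap,
            _ => CbOp::Srl,
        },
        1 => CbOp::Bit(mid),
        2 => CbOp::Res(mid),
        _ => CbOp::Set(mid),
    };
    (op, reg)
}
```

For instance, 0x7C decodes to BIT 7 on operand 4 (H), and 0x86 decodes to RES 0 on operand 6, (HL).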


Phase 5: Interrupt Handling (Days 15-17)

Goal: Implement the complete interrupt system.

// cpu/interrupts.rs

impl<M: MemoryBus> Cpu<M> {
    /// Handle pending interrupts
    /// Returns cycles consumed (0 if no interrupt, 20 if interrupt serviced)
    pub fn handle_interrupts(&mut self) -> u8 {
        let ie = self.bus.read(0xFFFF);
        let if_ = self.bus.read(0xFF0F);
        let pending = ie & if_ & 0x1F;

        if pending == 0 {
            return 0;
        }

        // Wake from HALT if any interrupt is pending
        self.halted = false;

        // If IME is disabled, don't service the interrupt
        if !self.ime {
            return 0;
        }

        // Find highest priority pending interrupt (lowest bit number)
        let interrupt_bit = pending.trailing_zeros() as u8;

        // Vector addresses: 0x40, 0x48, 0x50, 0x58, 0x60
        let vector = 0x0040 + (interrupt_bit as u16 * 8);

        // Clear the IF flag for this interrupt
        let new_if = if_ & !(1 << interrupt_bit);
        self.bus.write(0xFF0F, new_if);

        // Disable interrupts (IME = 0)
        self.ime = false;

        // Push current PC to stack
        self.push_word(self.regs.pc);

        // Jump to interrupt vector
        self.regs.pc = vector;

        // Interrupt dispatch takes 5 M-cycles (20 T-states)
        20
    }
}

// Instructions that affect interrupts

impl<M: MemoryBus> Cpu<M> {
    fn execute_di(&mut self) {
        // DI - Disable interrupts immediately
        self.ime = false;
    }

    fn execute_ei(&mut self) {
        // EI - Enable interrupts after next instruction
        self.ime_pending = true;
    }

    fn execute_reti(&mut self) -> u8 {
        // RETI - Return and enable interrupts
        self.regs.pc = self.pop_word();
        self.ime = true;
        16
    }

    fn execute_halt(&mut self) {
        // HALT - Stop the CPU until an interrupt is pending
        self.halted = true;

        // HALT bug (not yet handled here): if IME=0 and (IE & IF) != 0
        // when HALT executes, the CPU does not halt and the byte after
        // HALT is read twice because PC fails to increment once.
        // This is a real hardware bug and should be emulated.
    }
}

Checkpoint: Interrupts fire at correct times, HALT works correctly.
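
`execute_ei` above only sets `ime_pending`; something still has to promote that to `ime` once the instruction following EI has run. One possible way to model the delay is a small countdown, sketched here with a hypothetical `ImeState` type and `after_instruction` hook (both are assumptions for illustration, not the project's actual fields):

```rust
/// Hypothetical holder for the interrupt master enable and the EI delay.
#[derive(Debug, Default)]
pub struct ImeState {
    pub ime: bool,
    /// 0 = no EI in flight. Set to 2 by EI; IME turns on when it reaches 0.
    ei_delay: u8,
}

impl ImeState {
    /// EI: schedule IME to become true after the *next* instruction.
    pub fn ei(&mut self) {
        if self.ei_delay == 0 {
            self.ei_delay = 2;
        }
    }

    /// DI: disable immediately and cancel any EI still in flight.
    pub fn di(&mut self) {
        self.ime = false;
        self.ei_delay = 0;
    }

    /// Call once after every executed instruction, including EI itself.
    pub fn after_instruction(&mut self) {
        if self.ei_delay > 0 {
            self.ei_delay -= 1;
            if self.ei_delay == 0 {
                self.ime = true;
            }
        }
    }
}
```

With this shape, the interrupt check that runs before the instruction following EI still sees `ime == false`, and an `EI` immediately followed by `DI` never opens a window for an interrupt.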


Phase 6: Pass Blargg’s CPU Tests (Days 18-24)

Goal: Complete all remaining opcodes and pass the cpu_instrs test ROM.

Blargg’s test ROMs are the gold standard for Game Boy emulator accuracy. The cpu_instrs.gb ROM contains 11 individual tests:

  1. 01-special.gb - DAA, CPL, SCF, CCF
  2. 02-interrupts.gb - Interrupt timing
  3. 03-op sp,hl.gb - SP and HL operations
  4. 04-op r,imm.gb - Register/immediate operations
  5. 05-op rp.gb - Register pair operations
  6. 06-ld r,r.gb - Load between registers
  7. 07-jr,jp,call,ret,rst.gb - Control flow
  8. 08-misc instrs.gb - Miscellaneous
  9. 09-op r,r.gb - Register/register operations
  10. 10-bit ops.gb - CB-prefixed bit operations
  11. 11-op a,(hl).gb - A with (HL) operations

The DAA Instruction:

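DAA is usually the hardest single instruction to get right, because its result depends on the N, H, and C flags left behind by the previous operation:

fn execute_daa(&mut self) {
    // DAA - Decimal Adjust Accumulator
    // Adjusts A back to valid packed BCD after a BCD addition or subtraction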

    let mut a = self.regs.a as i16;

    if !self.regs.n() {
        // After addition
        if self.regs.c() || a > 0x99 {
            a += 0x60;
            self.regs.set_c(true);
        }
        if self.regs.h() || (a & 0x0F) > 0x09 {
            a += 0x06;
        }
    } else {
        // After subtraction
        if self.regs.c() {
            a -= 0x60;
        }
        if self.regs.h() {
            a -= 0x06;
        }
    }

    self.regs.a = a as u8;
    self.regs.set_z(self.regs.a == 0);
    self.regs.set_h(false);
    // C flag set above if needed, otherwise unchanged
}

Checkpoint: All 11 Blargg cpu_instrs tests pass.
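
Before running the full ROM, it can help to check DAA against a few hand-worked BCD sums. The sketch below restates the same adjustment rules as a pure function; the name `daa` and the `(result, z, c)` return tuple are illustrative choices rather than the project's API:

```rust
/// Pure-function form of DAA for testing.
/// Inputs are A and the N, H, C flags left by the previous operation.
/// Returns (adjusted A, new Z flag, new C flag). H is always cleared by DAA.
pub fn daa(a: u8, n: bool, h: bool, c: bool) -> (u8, bool, bool) {
    let mut result = a;
    let mut carry = c;

    if !n {
        // After an addition: fix the high digit, then the low digit.
        if c || a > 0x99 {
            result = result.wrapping_add(0x60);
            carry = true;
        }
        if h || (a & 0x0F) > 0x09 {
            result = result.wrapping_add(0x06);
        }
    } else {
        // After a subtraction: only the flags decide the correction.
        if c {
            result = result.wrapping_sub(0x60);
        }
        if h {
            result = result.wrapping_sub(0x06);
        }
    }

    (result, result == 0, carry)
}
```

Worked examples: 0x15 + 0x27 gives 0x3C with H and C clear, and DAA turns it into 0x42 (15 + 27 = 42). 0x42 - 0x15 gives 0x2D with N and H set, and DAA turns it into 0x27. 0x99 + 0x01 gives 0x9A, which DAA adjusts to 0x00 with both Z and C set.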


Phase 7: Basic PPU for Visual Output (Days 25-30)

Goal: Implement enough PPU functionality to see visual output.

// ppu/mod.rs

pub struct Ppu {
    /// Current scanline (0-153)
    ly: u8,
    /// LCD Control register
    lcdc: u8,
    /// LCD Status register
    stat: u8,
    /// Scroll X
    scx: u8,
    /// Scroll Y
    scy: u8,
    /// Current mode (0-3)
    mode: PpuMode,
    /// Cycles into current scanline
    cycles: u16,
    /// Frame buffer (160x144 pixels, one byte per pixel holding a 2-bit color index 0-3)
    pub framebuffer: [u8; 160 * 144],
}

#[derive(Clone, Copy, PartialEq)]
pub enum PpuMode {
    HBlank = 0,
    VBlank = 1,
    OamScan = 2,
    Drawing = 3,
}

impl Ppu {
    pub fn new() -> Self {
        Self {
            ly: 0,
            lcdc: 0x91,
            stat: 0,
            scx: 0,
            scy: 0,
            mode: PpuMode::OamScan,
            cycles: 0,
            framebuffer: [0; 160 * 144],
        }
    }

    /// Step the PPU by the given number of cycles
    /// Returns true if VBlank interrupt should be triggered
    pub fn step(&mut self, cycles: u8, vram: &[u8]) -> bool {
        if !self.lcd_enabled() {
            return false;
        }

        self.cycles += cycles as u16;
        let mut vblank_interrupt = false;

        match self.mode {
            PpuMode::OamScan => {
                if self.cycles >= 80 {
                    self.cycles -= 80;
                    self.mode = PpuMode::Drawing;
                }
            }

            PpuMode::Drawing => {
                if self.cycles >= 172 {
                    self.cycles -= 172;
                    self.mode = PpuMode::HBlank;

                    // Render this scanline
                    if self.ly < 144 {
                        self.render_scanline(vram);
                    }
                }
            }

            PpuMode::HBlank => {
                if self.cycles >= 204 {
                    self.cycles -= 204;
                    self.ly += 1;

                    if self.ly == 144 {
                        self.mode = PpuMode::VBlank;
                        vblank_interrupt = true;
                    } else {
                        self.mode = PpuMode::OamScan;
                    }
                }
            }

            PpuMode::VBlank => {
                if self.cycles >= 456 {
                    self.cycles -= 456;
                    self.ly += 1;

                    if self.ly > 153 {
                        self.ly = 0;
                        self.mode = PpuMode::OamScan;
                    }
                }
            }
        }

        vblank_interrupt
    }

    fn lcd_enabled(&self) -> bool {
        (self.lcdc & 0x80) != 0
    }

    fn render_scanline(&mut self, vram: &[u8]) {
        if (self.lcdc & 0x01) == 0 {
            // BG disabled
            return;
        }

        let tile_map_addr = if (self.lcdc & 0x08) != 0 { 0x1C00 } else { 0x1800 };
        let tile_data_addr = if (self.lcdc & 0x10) != 0 { 0x0000 } else { 0x0800 };
        let signed_tile_nums = (self.lcdc & 0x10) == 0;

        let y = self.ly.wrapping_add(self.scy);
        let tile_row = (y / 8) as u16;
        let pixel_row = (y % 8) as u16;

        for screen_x in 0u8..160 {
            let x = screen_x.wrapping_add(self.scx);
            let tile_col = (x / 8) as u16;
            let pixel_col = 7 - (x % 8);

            // Get tile number from map
            let map_offset = tile_row * 32 + tile_col;
            let tile_num = vram[(tile_map_addr + map_offset) as usize];

            // Get tile data address
            let tile_addr = if signed_tile_nums {
                let signed_num = tile_num as i8 as i16;
                ((tile_data_addr as i16) + (signed_num + 128) * 16) as u16
            } else {
                tile_data_addr + (tile_num as u16) * 16
            };

            // Read tile row (2 bytes)
            let row_addr = tile_addr + pixel_row * 2;
            let low = vram[row_addr as usize];
            let high = vram[(row_addr + 1) as usize];

            // Get pixel color (0-3)
            let color_bit = ((high >> pixel_col) & 1) << 1 | ((low >> pixel_col) & 1);

            // Store in framebuffer
            let fb_idx = self.ly as usize * 160 + screen_x as usize;
            self.framebuffer[fb_idx] = color_bit;
        }
    }

    pub fn read_register(&self, addr: u16) -> u8 {
        match addr {
            0xFF40 => self.lcdc,
            0xFF41 => (self.stat & 0xF8) | (self.mode as u8),
            0xFF42 => self.scy,
            0xFF43 => self.scx,
            0xFF44 => self.ly,
            _ => 0xFF,
        }
    }

    pub fn write_register(&mut self, addr: u16, value: u8) {
        match addr {
            0xFF40 => self.lcdc = value,
            0xFF41 => self.stat = (value & 0xF8) | (self.stat & 0x07),
            0xFF42 => self.scy = value,
            0xFF43 => self.scx = value,
            // LY is read-only
            _ => {}
        }
    }
}

Checkpoint: Games display on screen.
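
The framebuffer above stores raw color indices (0-3). Two small pieces are useful when turning those into something visible: decoding one tile row from its two bit-plane bytes, and mapping an index through the background palette register BGP (0xFF47), which holds a 2-bit shade for each index. The function names below are assumptions for illustration:

```rust
/// Decode one 8-pixel tile row from its two bit-plane bytes.
/// `low` supplies bit 0 of each color index, `high` supplies bit 1;
/// bit 7 of each byte is the leftmost pixel.
pub fn decode_tile_row(low: u8, high: u8) -> [u8; 8] {
    let mut pixels = [0u8; 8];
    for (x, pixel) in pixels.iter_mut().enumerate() {
        let bit = 7 - x as u8;
        *pixel = (((high >> bit) & 1) << 1) | ((low >> bit) & 1);
    }
    pixels
}

/// Map a color index (0-3) to a shade (0-3) through BGP.
/// Bits 1-0 hold the shade for index 0, bits 3-2 for index 1, and so on.
pub fn apply_bgp(bgp: u8, color_index: u8) -> u8 {
    (bgp >> ((color_index & 0x03) * 2)) & 0x03
}
```

With the common BGP value 0xE4 every index maps to itself; with 0x1B the palette is inverted, so index 0 becomes shade 3.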


Testing Strategy

Blargg’s Test ROMs

The gold standard for CPU accuracy:

// tests/blargg.rs

#[test]
fn test_blargg_01_special() {
    let rom = include_bytes!("../roms/01-special.gb");
    let mut emu = Emulator::new(rom);

    // Run until test completes or timeout
    for _ in 0..10_000_000 {
        emu.step();

        // Check serial output for pass/fail
        if emu.serial_output().contains("Passed") {
            return; // Success!
        }
        if emu.serial_output().contains("Failed") {
            panic!("Test failed: {}", emu.serial_output());
        }
    }

    panic!("Test timed out");
}
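
The test above assumes the emulator exposes a `serial_output()` string. Blargg's ROMs print each character by writing it to SB (0xFF01) and then writing 0x81 to SC (0xFF02), so capturing that pair of writes is enough for test purposes. Here is a minimal, allocation-free sketch; `SerialCapture` and its fixed 256-byte buffer are assumptions for illustration, and a fuller implementation would also model transfer timing and the serial interrupt:

```rust
/// Hypothetical serial capture for test ROM output (no heap allocation).
pub struct SerialCapture {
    sb: u8,
    buf: [u8; 256],
    len: usize,
}

impl SerialCapture {
    pub const fn new() -> Self {
        Self { sb: 0, buf: [0; 256], len: 0 }
    }

    /// Call from the bus's I/O write path for 0xFF01 and 0xFF02.
    pub fn write(&mut self, addr: u16, value: u8) {
        match addr {
            // SB: latch the byte to be sent.
            0xFF01 => self.sb = value,
            // SC = 0x81: start a transfer with the internal clock.
            0xFF02 if value == 0x81 => {
                if self.len < self.buf.len() {
                    self.buf[self.len] = self.sb;
                    self.len += 1;
                }
            }
            _ => {}
        }
    }

    /// Captured text so far (empty if the bytes are not valid UTF-8).
    pub fn output(&self) -> &str {
        core::str::from_utf8(&self.buf[..self.len]).unwrap_or("")
    }
}
```

Bytes past the buffer size are silently dropped here, which is a simplification; a ring buffer would suit long-running ROMs better.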

Instruction-Level Unit Tests

#[test]
fn test_add_sets_flags_correctly() {
    let mut cpu = Cpu::new(TestBus::new());

    // Test zero flag
    cpu.regs.a = 0;
    cpu.add_a(0);
    assert!(cpu.regs.z(), "Zero flag should be set for result 0");

    // Test half-carry
    cpu.regs.a = 0x0F;
    cpu.add_a(0x01);
    assert!(cpu.regs.h(), "Half-carry should be set for 0x0F + 0x01");

    // Test carry
    cpu.regs.a = 0xFF;
    cpu.add_a(0x01);
    assert!(cpu.regs.c(), "Carry should be set for 0xFF + 0x01");
    assert_eq!(cpu.regs.a, 0x00, "Result should wrap to 0");
}

Cycle Accuracy Verification

#[test]
fn test_instruction_cycles() {
    let mut cpu = Cpu::new(TestBus::with_program(&[
        0x00,       // NOP - 4 cycles
        0x01, 0x00, 0x00, // LD BC, d16 - 12 cycles
        0xC3, 0x00, 0x00, // JP a16 - 16 cycles
    ]));

    let initial = cpu.total_cycles();

    cpu.step(); // NOP
    assert_eq!(cpu.total_cycles() - initial, 4);

    cpu.step(); // LD BC, d16
    assert_eq!(cpu.total_cycles() - initial, 16);

    cpu.step(); // JP a16
    assert_eq!(cpu.total_cycles() - initial, 32);
}

Common Pitfalls

Pitfall 1: Half-Carry Flag Calculation Errors

Symptom: DAA produces wrong results, Blargg test 01 fails

Solution: Double-check your half-carry formula for each operation type:

  • ADD: (a & 0xF) + (b & 0xF) > 0xF
  • SUB: (a & 0xF) < (b & 0xF)
  • 16-bit ADD: Uses bit 11, not bit 3
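
The 16-bit case is the one most often copied incorrectly from the 8-bit formula. A minimal sketch of the flag math for `ADD HL, rr` follows (Z is left unchanged and N is cleared by that instruction); the function name and tuple return are illustrative only:

```rust
/// Flag math for ADD HL, rr.
/// Returns (result, half_carry, carry).
pub fn add_hl(hl: u16, value: u16) -> (u16, bool, bool) {
    let result = hl.wrapping_add(value);
    // Half-carry: overflow out of bit 11 (the low 12 bits).
    let h = (hl & 0x0FFF) + (value & 0x0FFF) > 0x0FFF;
    // Carry: overflow out of bit 15.
    let c = (hl as u32) + (value as u32) > 0xFFFF;
    (result, h, c)
}
```

Note that 0x00FF + 0x0001 does not set H here, even though the same low bytes would set it in an 8-bit ADD.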

Pitfall 2: DAA Instruction Complexity

Symptom: BCD arithmetic produces garbage

Solution: DAA behavior depends on the N, H, and C flags from the PREVIOUS operation. Study multiple implementations and test thoroughly.

Pitfall 3: Interrupt Timing Edge Cases

Symptom: Games freeze or behave erratically

Solution:

  • EI enables interrupts AFTER the next instruction
  • HALT bug when IME=0 and (IE & IF) != 0
  • Interrupt dispatch takes 5 M-cycles

Pitfall 4: 16-bit Register Access Patterns

Symptom: Stack corruption, wrong values

Solution:

  • Stack grows DOWNWARD (SP decrements)
  • 16-bit values are little-endian (low byte first)
  • PUSH writes high byte first, POP reads low byte first
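
A related detail worth checking alongside these points: the low four bits of F always read as zero on the hardware, so any path that writes AF as a 16-bit value (notably POP AF) should mask them. The sketch below uses a stripped-down, hypothetical `Regs` with only A and F to show the idea:

```rust
/// Hypothetical two-register subset, just enough to show AF pairing.
#[derive(Debug, Default, Clone, Copy)]
pub struct Regs {
    pub a: u8,
    pub f: u8,
}

impl Regs {
    /// Combine A (high byte) and F (low byte) into AF.
    pub fn af(&self) -> u16 {
        ((self.a as u16) << 8) | (self.f as u16)
    }

    /// Split a 16-bit value into A and F, forcing F's low nibble to zero.
    pub fn set_af(&mut self, value: u16) {
        self.a = (value >> 8) as u8;
        self.f = (value as u8) & 0xF0;
    }
}
```

Popping 0x12FF into AF therefore leaves AF holding 0x12F0.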

Pitfall 5: CB-Prefixed Instruction Timing

Symptom: Timing tests fail

Solution: CB instructions that access (HL) take extra cycles:

  • CB (HL) read: +4 cycles
  • CB (HL) write: +4 more cycles
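
Those rules reduce to three totals (prefix fetch included): 8 cycles for register operands, 12 for BIT on (HL), which only reads, and 16 for everything else on (HL), which reads and writes back. A sketch as a standalone lookup, with a function name chosen for illustration:

```rust
/// Total T-cycles for a CB-prefixed opcode, including the 0xCB prefix fetch.
pub fn cb_cycles(opcode: u8) -> u8 {
    let targets_hl = (opcode & 0x07) == 6;
    let is_bit = (0x40..=0x7F).contains(&opcode);
    match (targets_hl, is_bit) {
        (false, _) => 8,     // register operand
        (true, true) => 12,  // BIT n, (HL): read only
        (true, false) => 16, // read-modify-write on (HL)
    }
}
```

A table-driven timing test can compare this against the value returned by the executor for all 256 opcodes.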

Pitfall 6: Boot ROM Overlay

Symptom: Games don’t start

Solution:

  • Boot ROM is at 0x0000-0x00FF on startup
  • Writing a non-zero value to 0xFF50 disables it until the next reset
  • After boot, cartridge ROM is visible at 0x0000
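
The overlay behavior in the list above can be sketched in isolation as follows. `RomWindow` and its method names are hypothetical and only cover the ROM read path plus the 0xFF50 latch; out-of-range cartridge reads return 0xFF here as a simplifying assumption:

```rust
/// Hypothetical view of the ROM region with the boot ROM overlay.
pub struct RomWindow<'a> {
    boot_rom: &'a [u8; 256],
    cart_rom: &'a [u8],
    boot_rom_enabled: bool,
}

impl<'a> RomWindow<'a> {
    /// The overlay starts enabled, as it is at power-on.
    pub fn new(boot_rom: &'a [u8; 256], cart_rom: &'a [u8]) -> Self {
        Self { boot_rom, cart_rom, boot_rom_enabled: true }
    }

    /// Read from 0x0000-0x7FFF, honoring the overlay at 0x0000-0x00FF.
    pub fn read(&self, addr: u16) -> u8 {
        if self.boot_rom_enabled && addr < 0x0100 {
            self.boot_rom[addr as usize]
        } else {
            self.cart_rom.get(addr as usize).copied().unwrap_or(0xFF)
        }
    }

    /// Handle a write to 0xFF50: a non-zero value unmaps the boot ROM.
    /// Nothing here can re-enable it, so the change lasts until reset.
    pub fn write_ff50(&mut self, value: u8) {
        if value != 0 {
            self.boot_rom_enabled = false;
        }
    }
}
```

Because there is no code path that sets `boot_rom_enabled` back to true, a later write of 0 to 0xFF50 has no effect.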

Extensions

Full PPU Implementation

Add complete PPU with:

  • Sprite rendering (OAM)
  • Window layer
  • Priority handling
  • Mode 3 variable length

Audio APU

Implement the 4 audio channels:

  • Channel 1: Pulse with sweep
  • Channel 2: Pulse
  • Channel 3: Wave
  • Channel 4: Noise

Serial Communication

Implement serial communication:

  • Internal clock mode
  • External clock mode
  • Shift register timing

Save States

Serialize emulator state:

  • CPU registers
  • All RAM
  • PPU state
  • Timer state
  • MBC state

Game Boy Color Support

Extend for GBC:

  • Double-speed mode
  • Color palettes
  • VRAM banking
  • WRAM banking

The Interview Questions

  1. “How does the LR35902 differ from a standard Z80?”
    • Missing IX/IY index registers
    • Missing shadow registers
    • Missing block transfer instructions
    • Has unique SWAP instruction
    • Memory-mapped I/O instead of I/O ports
    • Simplified interrupt system (5 sources, fixed vectors)
  2. “Why is the half-carry flag important?”
    • Required for DAA instruction
    • DAA corrects the result of a BCD addition or subtraction so that it is valid BCD again (useful for displaying decimal numbers)
    • Half-carry tracks overflow from lower nibble to upper nibble
    • Essential for correct addition/subtraction of multi-digit decimals
  3. “How do you handle the different cycle counts for conditional instructions?”
    • Check condition before returning cycles
    • If taken: return longer cycle count
    • If not taken: return shorter cycle count
    • Must track total cycles for PPU/Timer synchronization
  4. “Explain the memory banking system and why it’s necessary.”
    • Game Boy has 16-bit address bus = 64KB maximum
    • Games can be up to 8MB
    • MBC chips on cartridge swap banks into 0x4000-0x7FFF
    • Write to ROM addresses controls bank selection
    • Also handles external RAM for save games
  5. “What is the HALT bug and how do you emulate it?”
    • When IME=0 but (IE & IF) != 0, HALT exits but PC fails to increment
    • Next instruction after HALT is executed twice
    • Must track this edge case for accurate emulation
    • Some games rely on this behavior
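
For the last question, one common way to emulate the HALT bug is a one-shot flag that suppresses the next PC increment. The sketch below isolates just that mechanism; `Fetcher`, its fields, and the flat `mem` slice are assumptions for illustration and stand in for the real CPU and bus:

```rust
/// Hypothetical fetch unit showing only PC handling around HALT.
pub struct Fetcher {
    pub pc: u16,
    pub halted: bool,
    halt_bug: bool,
}

impl Fetcher {
    pub fn new(pc: u16) -> Self {
        Self { pc, halted: false, halt_bug: false }
    }

    /// Execute HALT. `pending` is IE & IF & 0x1F at that moment.
    pub fn halt(&mut self, ime: bool, pending: u8) {
        if !ime && pending != 0 {
            // HALT bug: the CPU keeps running, but the next fetch
            // will not advance PC.
            self.halt_bug = true;
        } else {
            self.halted = true;
        }
    }

    /// Fetch the byte at PC, advancing PC unless the bug is armed.
    pub fn fetch_byte(&mut self, mem: &[u8]) -> u8 {
        let byte = mem[self.pc as usize];
        if self.halt_bug {
            self.halt_bug = false;
        } else {
            self.pc = self.pc.wrapping_add(1);
        }
        byte
    }
}
```

With memory holding `76 3C 00` (HALT, INC A, NOP), IME=0, and an interrupt pending, the fetch sequence after HALT is 0x3C, 0x3C, 0x00, so INC A runs twice.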

Books That Will Help

| Topic | Book | Chapter/Section |
| --- | --- | --- |
| Game Boy Hardware | “Game Boy Coding Adventure” by Maximilien Dagois | Full book - definitive Game Boy reference |
| CPU Architecture | “Computer Organization and Design” by Patterson & Hennessy | Chapter 4 - The Processor |
| Bit Manipulation | “The Art of Computer Programming” Vol. 4A by Donald Knuth | Combinatorial Algorithms |
| Online Reference | Pan Docs | https://gbdev.io/pandocs/ |
| Z80 Heritage | “Z80 Family CPU User Manual” by Zilog | Reference for understanding LR35902 origins |
| Emulator Development | “The Ultimate Game Boy Talk” by Michael Steil | Video + slides on accurate emulation |

Summary

This project teaches you:

  • CPU architecture at the deepest level - you will understand exactly how instructions are encoded, decoded, and executed
  • Bit manipulation mastery - from flag calculations to register pairing
  • Memory systems - direct access, banking, and memory-mapped I/O
  • Interrupt handling - priority, timing, and edge cases
  • no_std Rust - building portable, allocation-free libraries
  • Hardware synchronization - coordinating CPU, PPU, and timers

When complete, you will have built a working Game Boy emulator core that can run commercial games. More importantly, you will have internalized how computers work at the most fundamental level - knowledge that will inform everything else you build.

The Game Boy’s simplicity makes it an ideal first emulator target, but its quirks (HALT bug, DAA complexity, timing requirements) ensure you learn to pay attention to details that matter.

Welcome to the world of emulation. The hardware is now yours to recreate.