Project 6: 6502 CPU Emulator - The Heart of a Gaming Generation

Build a cycle-accurate emulator of the MOS 6502 processor - the legendary CPU that powered the NES, Commodore 64, Apple II, and Atari 2600. This is not a toy CPU - it’s real silicon translated into software.

Quick Reference

Attribute              Value
Difficulty             Advanced
Time Estimate          3-4 weeks
Main Language          C
Alternative Languages  Rust, C++
Prerequisites          Previous CPU emulator project (P05), strong C skills, understanding of binary/hex
Key Topics             CPU emulation, addressing modes, status flags, cycle timing, BCD arithmetic

1. Learning Objectives

After completing this project, you will be able to:

  • Implement a complete instruction set architecture (ISA) with 56 instructions encoded as 151 opcodes
  • Understand and implement 13 different addressing modes used in real CPUs
  • Handle cycle-accurate timing including page-boundary crossing penalties
  • Implement a processor status register with condition flags (N, V, Z, C, B, I, D)
  • Debug at the CPU level using memory dumps, register traces, and instruction stepping
  • Run real 6502 programs including test suites and vintage software
  • Explain how the 6502’s elegant design influenced modern processor architecture
  • Pass Klaus Dormann’s famous 6502 functional test suite

2. Theoretical Foundation

2.1 The MOS 6502: A Silicon Legend

The MOS Technology 6502, released in 1975, is one of the most influential microprocessors ever created. At $25 (compared to Intel 8080’s $179), it democratized computing and powered the home computer revolution.

Systems Powered by the 6502:

  • Apple I and Apple II (1976-1977)
  • Commodore PET, VIC-20, and C64 (1977-1982)
  • Atari 2600, 400, 800 (1977-1979)
  • Nintendo Entertainment System (NES) with Ricoh 2A03 (1983)
  • BBC Micro (1981)
  • Atari Lynx (1989)
  • Tamagotchi (1996)

The 6502’s design philosophy of “do more with less” led to elegant solutions that every systems programmer should understand.

2.2 Core Architecture

                          MOS 6502 CPU Architecture
    ┌─────────────────────────────────────────────────────────────────┐
    │                                                                 │
    │  ┌─────────────────────────────────────────────────────────┐   │
    │  │                    REGISTERS                             │   │
    │  │  ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ ┌──────────┐  │   │
    │  │  │  A  │ │  X  │ │  Y  │ │  S  │ │  P  │ │    PC    │  │   │
    │  │  │ 8b  │ │ 8b  │ │ 8b  │ │ 8b  │ │ 8b  │ │   16b    │  │   │
    │  │  └─────┘ └─────┘ └─────┘ └─────┘ └─────┘ └──────────┘  │   │
    │  │   Acc    Index X  Index Y  Stack  Status  Program Ctr  │   │
    │  └─────────────────────────────────────────────────────────┘   │
    │                              │                                  │
    │                              ▼                                  │
    │  ┌─────────────────────────────────────────────────────────┐   │
    │  │                     ALU (8-bit)                          │   │
    │  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────────┐  │   │
    │  │  │   Adder     │  │  Logic Ops  │  │  Shift/Rotate   │  │   │
    │  │  │ ADC, SBC    │  │ AND,OR,EOR  │  │ ASL,LSR,ROL,ROR │  │   │
    │  │  └─────────────┘  └─────────────┘  └─────────────────┘  │   │
    │  └─────────────────────────────────────────────────────────┘   │
    │                              │                                  │
    │                              ▼                                  │
    │  ┌─────────────────────────────────────────────────────────┐   │
    │  │               MEMORY BUS (16-bit address)                │   │
    │  │                        64KB                              │   │
    │  └─────────────────────────────────────────────────────────┘   │
    │                                                                 │
    └─────────────────────────────────────────────────────────────────┘

Register Set:

Register  Name              Size    Purpose
A         Accumulator       8-bit   Main arithmetic/logic operations
X         Index X           8-bit   Array indexing, loop counter
Y         Index Y           8-bit   Array indexing, loop counter
S         Stack Pointer     8-bit   Points to stack (page $01)
P         Processor Status  8-bit   Condition flags
PC        Program Counter   16-bit  Next instruction address

The 6502’s minimalist register set forced elegant programming techniques. Where the Intel 8080 had 7 general-purpose registers, the 6502 had just A, X, and Y - yet achieved remarkable efficiency.

2.3 The Processor Status Register

The 8-bit status register (P) contains flags that reflect the CPU state:

    Status Register (P) Bit Layout
    ┌─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┐
    │  7  │  6  │  5  │  4  │  3  │  2  │  1  │  0  │
    │  N  │  V  │  -  │  B  │  D  │  I  │  Z  │  C  │
    └─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┘
       │     │     │     │     │     │     │     │
       │     │     │     │     │     │     │     └── Carry Flag
       │     │     │     │     │     │     └──────── Zero Flag
       │     │     │     │     │     └────────────── Interrupt Disable
       │     │     │     │     └──────────────────── Decimal Mode (BCD)
       │     │     │     └────────────────────────── Break Command
       │     │     └──────────────────────────────── (Always 1)
       │     └────────────────────────────────────── Overflow Flag
       └──────────────────────────────────────────── Negative Flag

Flag Details:

Flag           Bit  Set When                                                 Used For
C (Carry)      0    Carry out of bit 7, no borrow (SBC), or bit shifted out  Multi-byte math, comparisons
Z (Zero)       1    Result is zero                                           Equality tests, loop termination
I (Interrupt)  2    IRQs are masked (NMI is unaffected)                      Critical sections
D (Decimal)    3    BCD mode enabled                                         Financial calculations
B (Break)      4    Set in the copy of P pushed by BRK or PHP                Debugging, software interrupts
- (Unused)     5    Always 1                                                 Reserved
V (Overflow)   6    Signed arithmetic overflow                               Signed number operations
N (Negative)   7    Result bit 7 is set                                      Signed number tests
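The B flag is the one most often misread: it is not a latch inside the CPU, it only shows up in the byte that gets pushed to the stack. A minimal sketch of that rule follows. The helper names `status_for_push` and `status_from_pull` are made up for illustration and are not part of any existing API.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLAG_B 0x10  /* Break: only meaningful in the pushed copy of P */
#define FLAG_U 0x20  /* Unused: always reads as 1 */

/* Byte written to the stack when P is pushed.
 * software == true  for PHP and BRK (B set in the pushed copy)
 * software == false for IRQ and NMI (B clear in the pushed copy) */
uint8_t status_for_push(uint8_t p, bool software) {
    uint8_t pushed = (uint8_t)(p | FLAG_U);
    if (software)
        pushed |= FLAG_B;
    else
        pushed &= (uint8_t)~FLAG_B;
    return pushed;
}

/* Value loaded into P by PLP or RTI: B is ignored, bit 5 stays set. */
uint8_t status_from_pull(uint8_t pulled) {
    return (uint8_t)((pulled & (uint8_t)~FLAG_B) | FLAG_U);
}
```

With this split, an interrupt handler can tell BRK from IRQ by inspecting bit 4 of the status byte on the stack.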

2.4 The 13 Addressing Modes

The 6502’s addressing modes are its most distinctive feature. Each mode specifies how to find the operand:

                    6502 Addressing Modes Visualization

    ┌────────────────────────────────────────────────────────────────┐
    │  IMMEDIATE          ZERO PAGE           ZERO PAGE,X           │
    │  ────────────────   ────────────────    ────────────────       │
    │  LDA #$44           LDA $44             LDA $44,X              │
    │                                                                │
    │  Operand is the     Operand at          Operand at             │
    │  value itself       addr $0044          addr $0044 + X         │
    │                                                                │
    │  ┌────┐            ┌────┐ ─────▶ ┌────┐  ┌────┐ ──┐            │
    │  │ 44 │            │ 44 │   $44  │ ?? │  │ 44 │   │ +X         │
    │  └────┘            └────┘        └────┘  └────┘   │            │
    │  ↓ A=44            Page 0               ─────────▶│            │
    │                                          $44+X    ▼            │
    ├────────────────────────────────────────────────────────────────┤
    │  ABSOLUTE          ABSOLUTE,X           ABSOLUTE,Y            │
    │  ────────────────   ────────────────    ────────────────       │
    │  LDA $4400          LDA $4400,X         LDA $4400,Y            │
    │                                                                │
    │  ┌────┬────┐        ┌────┬────┐         ┌────┬────┐           │
    │  │ 00 │ 44 │        │ 00 │ 44 │ +X      │ 00 │ 44 │ +Y        │
    │  └────┴────┘        └────┴────┘         └────┴────┘           │
    │      │                  │                   │                  │
    │      ▼                  ▼                   ▼                  │
    │  Addr $4400         Addr $4400+X        Addr $4400+Y          │
    ├────────────────────────────────────────────────────────────────┤
    │  INDIRECT          (INDIRECT,X)         (INDIRECT),Y          │
    │  ────────────────   ────────────────    ────────────────       │
    │  JMP ($4400)        LDA ($44,X)         LDA ($44),Y            │
    │                                                                │
    │  Look up addr       At $44+X, find      At $44, find           │
    │  at $4400-4401      pointer to data     pointer, add Y         │
    │                                                                │
    │  ┌────┬────┐        $44+X ──▶ ┌──┬──┐   $44 ──▶ ┌──┬──┐ +Y    │
    │  │ 00 │ 44 │                  │lo│hi│           │lo│hi│       │
    │  └────┴────┘                  └──┴──┘           └──┴──┘       │
    │      │                           │                  │          │
    │      ▼ read addr                 ▼                  ▼          │
    │  ┌────┬────┐                 Effective         Effective       │
    │  │ xx │ yy │                 Address           Addr + Y        │
    │  └────┴────┘                                                   │
    │      │                                                         │
    │      ▼ jump here                                               │
    └────────────────────────────────────────────────────────────────┘

Complete Addressing Mode Reference:

#   Mode          Example      Operand Location     Bytes  Cycles
1   Implied       CLC          None (in opcode)     1      2
2   Accumulator   ASL A        A register           1      2
3   Immediate     LDA #$44     Byte after opcode    2      2
4   Zero Page     LDA $44      Address $0044        2      3
5   Zero Page,X   LDA $44,X    Address $0044+X      2      4
6   Zero Page,Y   LDX $44,Y    Address $0044+Y      2      4
7   Absolute      LDA $4400    Address $4400        3      4
8   Absolute,X    LDA $4400,X  Address $4400+X      3      4(+1)
9   Absolute,Y    LDA $4400,Y  Address $4400+Y      3      4(+1)
10  Indirect      JMP ($4400)  [Address at $4400]   3      5
11  (Indirect,X)  LDA ($44,X)  [Address at $44+X]   2      6
12  (Indirect),Y  LDA ($44),Y  [Address at $44]+Y   2      5(+1)
13  Relative      BEQ label    PC + offset          2      2-4

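Page Boundary Crossing: When an indexed read crosses a 256-byte page boundary (e.g., $44FF + X where X=2 gives $4501), an extra cycle is needed. The 6502 adds the index to the low address byte first and starts the read with the high byte unchanged, so in this example it first reads from $4401. When the low-byte addition carries, it spends one more cycle correcting the high byte and reading again from $4501. Stores and read-modify-write instructions always take the extra cycle, whether or not a page is crossed.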
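The two indirect modes in the table are the ones most often confused, and both read their pointer from zero page with wraparound. The following is a small sketch of the address calculation only, assuming a flat `mem` array. The names `read_zp_pointer`, `ea_indexed_indirect`, and `ea_indirect_indexed` are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flat memory image; a real core would go through its bus. */
uint8_t mem[65536];

/* Read a 16-bit little-endian pointer from zero page.
 * The high byte comes from (zp + 1) & 0xFF, so a pointer at $FF wraps to $00. */
uint16_t read_zp_pointer(uint8_t zp) {
    uint8_t lo = mem[zp];
    uint8_t hi = mem[(uint8_t)(zp + 1)];
    return (uint16_t)(lo | (hi << 8));
}

/* (Indirect,X): add X to the operand first (wrapping within zero page),
 * then dereference the pointer found there. */
uint16_t ea_indexed_indirect(uint8_t operand, uint8_t x) {
    return read_zp_pointer((uint8_t)(operand + x));
}

/* (Indirect),Y: dereference first, then add Y to the 16-bit pointer.
 * *page_crossed reports whether the +1 cycle read penalty applies. */
uint16_t ea_indirect_indexed(uint8_t operand, uint8_t y, bool *page_crossed) {
    assert(page_crossed != NULL);
    uint16_t base = read_zp_pointer(operand);
    uint16_t addr = (uint16_t)(base + y);
    *page_crossed = (base & 0xFF00) != (addr & 0xFF00);
    return addr;
}
```

With the pointer $44FF stored at $44-$45 and Y=2, `ea_indirect_indexed` yields $4501 and reports a page cross, matching the example above.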

2.5 Memory Map

    Standard 6502 Memory Map (64KB Address Space)

    $FFFF ┌─────────────────────────────────────┐
          │           Vector Table              │
    $FFFA │  NMI, RESET, IRQ vectors            │
          ├─────────────────────────────────────┤
    $FFF9 │                                     │
          │          ROM / Cartridge            │
          │      (System-dependent)             │
          │                                     │
    $8000 ├─────────────────────────────────────┤
          │                                     │
          │              RAM                    │
          │      (System-dependent)             │
          │                                     │
    $0200 ├─────────────────────────────────────┤
          │           Stack Page                │
    $0100 │     (256 bytes, $0100-$01FF)        │
          ├─────────────────────────────────────┤
          │          Zero Page                  │
    $0000 │     (256 bytes, $0000-$00FF)        │
          └─────────────────────────────────────┘

    Special Addresses:
    $0000-$00FF  Zero Page (fast access, 1-byte addresses)
    $0100-$01FF  Stack (grows downward from $01FF)
    $FFFA-$FFFB  NMI Vector (Non-Maskable Interrupt)
    $FFFC-$FFFD  RESET Vector (CPU start address)
    $FFFE-$FFFF  IRQ/BRK Vector (Interrupt Request)
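The stack page and the vector table meet in the interrupt entry sequence. Here is a hedged sketch of what a hardware interrupt (IRQ or NMI) does with them, using a cut-down state struct. `mini_cpu`, `push`, `pull`, `read_vector`, and `enter_interrupt` are names invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FLAG_I 0x04
#define FLAG_B 0x10
#define FLAG_U 0x20

#define VEC_NMI   0xFFFA
#define VEC_RESET 0xFFFC
#define VEC_IRQ   0xFFFE

/* Hypothetical minimal state: just what the interrupt sequence touches. */
typedef struct {
    uint8_t  s, p;
    uint16_t pc;
    uint8_t  mem[65536];
} mini_cpu;

/* Push: write at $0100 | S, then decrement S (uint8_t wraps $00 -> $FF). */
void push(mini_cpu *c, uint8_t v) {
    c->mem[0x0100 | c->s] = v;
    c->s--;
}

/* Pull: increment S, then read from $0100 | S. */
uint8_t pull(mini_cpu *c) {
    c->s++;
    return c->mem[0x0100 | c->s];
}

/* Vectors are stored little-endian: low byte first. */
uint16_t read_vector(const mini_cpu *c, uint16_t vec) {
    return (uint16_t)(c->mem[vec] | (c->mem[(uint16_t)(vec + 1)] << 8));
}

/* IRQ/NMI entry: push PCH, PCL, then P with B clear; set I; load PC
 * from the vector. Returns the 7 cycles the sequence takes. */
int enter_interrupt(mini_cpu *c, uint16_t vec) {
    assert(c != NULL);
    push(c, (uint8_t)(c->pc >> 8));
    push(c, (uint8_t)(c->pc & 0xFF));
    push(c, (uint8_t)((c->p | FLAG_U) & ~FLAG_B));
    c->p |= FLAG_I;
    c->pc = read_vector(c, vec);
    return 7;
}
```

RTI reverses the sequence: it pulls P, then PCL, then PCH.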

2.6 The Fetch-Decode-Execute Cycle

    6502 Instruction Execution Cycle

    ┌─────────────────────────────────────────────────────────────┐
    │                                                             │
    │    FETCH                 DECODE                EXECUTE      │
    │    ──────                ──────                ──────       │
    │                                                             │
    │    ┌─────┐              ┌─────────┐          ┌─────────┐   │
    │    │ PC  │──────────▶   │ Opcode  │──────▶   │  ALU    │   │
    │    └─────┘              │ Decode  │          │  Ops    │   │
    │       │                 └─────────┘          └─────────┘   │
    │       ▼                      │                    │        │
    │   Read Memory                ▼                    ▼        │
    │   @ PC                  ┌─────────┐          ┌─────────┐   │
    │       │                 │Addressing│          │ Update  │   │
    │       ▼                 │  Mode    │          │ Flags   │   │
    │   ┌─────┐               └─────────┘          └─────────┘   │
    │   │Opcode│                   │                    │        │
    │   └─────┘                    ▼                    ▼        │
    │       │                 Fetch operand        Store result  │
    │       ▼                 (1-2 bytes)                        │
    │   PC = PC + 1                                              │
    │                                                             │
    │   Cycle Count: 2-7 cycles depending on instruction         │
    │                                                             │
    └─────────────────────────────────────────────────────────────┘

2.7 Why This Matters

For Game Development: Understanding the 6502 explains why NES games play the way they do. The limited registers forced creative solutions - sprite multiplexing, music tricks, and graphics hacks that defined an era.

For Compiler Design: The 6502’s constraints make it an excellent target for learning code generation. Every byte and cycle mattered.

For Security Research: ROP chains and exploitation techniques have roots in understanding CPU fundamentals. The 6502’s simplicity makes these concepts clear.

For Career Advancement: “I wrote a cycle-accurate 6502 emulator” is a powerful interview statement. It demonstrates deep systems understanding.

2.8 Common Misconceptions

Misconception 1: “Cycle-accurate just means counting instructions”

  • Reality: Cycle counts vary by addressing mode. Page crossings can add a cycle, and a branch takes 2, 3, or 4 cycles depending on whether it is taken and whether its target lies on another page.

Misconception 2: “The stack is like a normal stack”

  • Reality: The 6502 stack is fixed at page $01 ($0100-$01FF). The stack pointer is only 8 bits - it automatically wraps within this page.

Misconception 3: “BCD mode is optional”

  • Reality: For accurate emulation, you must implement BCD. Games use it for score displays. The NES disabled BCD in hardware, but a proper 6502 needs it.

Misconception 4: “All 256 opcodes are valid”

  • Reality: Only 151 opcodes are official. The other 105 are “undocumented” - some do useful things, some crash. Real games use some undocumented opcodes.
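To make the timing point from Misconception 1 concrete, here is a minimal sketch of branch handling with its three possible cycle counts. The function name `branch` and its signature are assumptions for illustration. It expects `*pc` to already point at the instruction after the 2-byte branch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns cycles consumed: 2 if not taken, 3 if taken,
 * 4 if taken and the target is on a different page than *pc. */
int branch(uint16_t *pc, uint8_t offset, bool condition) {
    assert(pc != NULL);
    if (!condition)
        return 2;
    /* The operand is a signed 8-bit displacement. */
    uint16_t target = (uint16_t)(*pc + (int8_t)offset);
    int cycles = ((*pc & 0xFF00) != (target & 0xFF00)) ? 4 : 3;
    *pc = target;
    return cycles;
}
```

For example, a taken branch from $06F0 with offset $20 lands on $0710 and costs 4 cycles, while the same offset from $0602 stays on page $06 and costs 3.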

3. Project Specification

3.1 What You Will Build

A complete 6502 emulator that:

  1. Implements all 56 official instructions with their addressing modes (151 opcodes)
  2. Provides cycle-accurate timing including page-boundary penalties
  3. Correctly implements the status register and all flag operations
  4. Handles BCD (Binary Coded Decimal) arithmetic mode
  5. Processes interrupts (IRQ, NMI, RESET)
  6. Passes the Klaus Dormann 6502 functional test suite
  7. Can run real 6502 programs (Apple I BASIC, games, etc.)

3.2 Functional Requirements

  • Implement all 13 addressing modes correctly
  • Implement all 151 official opcodes (56 instructions)
  • Pass Klaus Dormann’s 6502 functional test
  • Pass the BCD test (decimal mode arithmetic)
  • Count cycles accurately for each instruction
  • Handle page-boundary crossing penalties
  • Implement NMI, IRQ, and BRK interrupts
  • Implement RESET initialization
  • Provide debugging output (register dumps, memory inspection)

3.3 Non-Functional Requirements

  • Deterministic execution (same input = same output)
  • Clear separation between CPU core and memory/IO
  • Configurable clock speed or step mode
  • Memory callback system for future system integration (NES, C64, etc.)

3.4 Real World Outcome

When complete, you will have a working 6502 emulator:

$ ./emu6502 test_suite/6502_functional_test.bin
Loading 6502_functional_test.bin at $0400...
Reset vector: $0400
Starting execution...

Running Klaus Dormann's 6502 Functional Test Suite...

Test Group 1: Load/Store Operations
  [OK] LDA immediate
  [OK] LDA zero page
  [OK] LDA zero page,X
  [OK] LDA absolute
  [OK] LDA absolute,X
  [OK] LDA absolute,Y
  [OK] LDA (indirect,X)
  [OK] LDA (indirect),Y
  [OK] LDX/LDY all modes
  [OK] STA/STX/STY all modes

Test Group 2: Transfer Operations
  [OK] TAX, TAY, TXA, TYA
  [OK] TSX, TXS

Test Group 3: Stack Operations
  [OK] PHA, PHP
  [OK] PLA, PLP

Test Group 4: Arithmetic
  [OK] ADC all modes (binary)
  [OK] SBC all modes (binary)
  [OK] ADC/SBC (decimal mode)
  [OK] CMP, CPX, CPY

Test Group 5: Logic
  [OK] AND, ORA, EOR all modes
  [OK] BIT zero page, absolute

Test Group 6: Shift/Rotate
  [OK] ASL, LSR, ROL, ROR

Test Group 7: Branches
  [OK] BCC, BCS, BEQ, BNE
  [OK] BMI, BPL, BVC, BVS
  [OK] Page crossing timing

Test Group 8: Jumps/Calls
  [OK] JMP absolute, indirect
  [OK] JSR, RTS
  [OK] BRK, RTI

Test Group 9: Flags
  [OK] CLC, SEC, CLI, SEI
  [OK] CLD, SED, CLV
  [OK] All flag interactions

═══════════════════════════════════════════════════════════════════
                    ALL TESTS PASSED!
═══════════════════════════════════════════════════════════════════
Executed: 26,765,149 cycles
Time: 2.3 seconds
Effective speed: 11.6 MHz (6.5x real 1.79 MHz NES speed)

$ ./emu6502 roms/apple1_basic.bin
Apple I BASIC loaded. Emulating at 1 MHz...

E000: 4C B0 E2                     JMP $E2B0

Apple I BASIC 1.0
Ready

> PRINT 2 + 2
4

> 10 FOR I = 1 TO 10
> 20 PRINT I * I
> 30 NEXT I
> RUN
1
4
9
16
25
36
49
64
81
100

Ready

Debugging Output Example:

$ ./emu6502 --debug --step test.bin
6502 Debugger v1.0
Loaded: test.bin at $0600
Reset vector: $0600

PC=$0600  A=$00 X=$00 Y=$00 S=$FD P=00100100 (nv_bdIzc)
> s
$0600: A9 44     LDA #$44        ; A = $44
Cycles: 2 (total: 2)
PC=$0602  A=$44 X=$00 Y=$00 S=$FD P=00100100 (nv_bdIzc)

> s
$0602: 85 10     STA $10         ; Store A to zero page $10
Cycles: 3 (total: 5)
PC=$0604  A=$44 X=$00 Y=$00 S=$FD P=00100100 (nv_bdIzc)

> m 10 20
$0010: 44 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  D...............

> r
A=$44 X=$00 Y=$00 S=$FD P=$24
PC=$0604
Flags: NV_BDIZC
       00100100
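The flag string in the register dump can be produced with a few lines of code. This sketch uses one possible convention (uppercase for set, lowercase for clear, `_` for the unused bit). The function name `format_flags` is an assumption, not something the project prescribes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Render P as eight characters in NV_BDIZC order.
 * buf must have room for 9 bytes (8 characters plus the terminator). */
void format_flags(uint8_t p, char buf[9]) {
    static const char names[8] = {'N', 'V', '_', 'B', 'D', 'I', 'Z', 'C'};
    assert(buf != NULL);
    for (int i = 0; i < 8; i++) {
        uint8_t mask = (uint8_t)(0x80 >> i);  /* bit 7 first */
        char c = names[i];
        if (c != '_' && !(p & mask))
            c = (char)(c + ('a' - 'A'));      /* clear flag: lowercase */
        buf[i] = c;
    }
    buf[8] = '\0';
}
```

For P=$24 this writes `nv_bdIzc`: only the interrupt-disable flag shows in uppercase.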

4. Solution Architecture

4.1 High-Level Design

                        6502 Emulator Architecture

    ┌─────────────────────────────────────────────────────────────────┐
    │                         MAIN LOOP                               │
    │                                                                 │
    │   ┌───────────────────────────────────────────────────────┐    │
    │   │                     CPU Core                          │    │
    │   │  ┌─────────┐  ┌─────────────┐  ┌────────────────┐    │    │
    │   │  │Registers│  │  Decoder    │  │    Executor    │    │    │
    │   │  │ A,X,Y   │  │  151 ops    │  │  ALU, Flags    │    │    │
    │   │  │ S,P,PC  │  │             │  │  Memory R/W    │    │    │
    │   │  └─────────┘  └─────────────┘  └────────────────┘    │    │
    │   │       ▲              │                 │              │    │
    │   │       │              ▼                 ▼              │    │
    │   │       │        ┌───────────────────────────┐         │    │
    │   │       └────────│     Addressing Modes      │         │    │
    │   │                │  13 modes, operand fetch  │         │    │
    │   │                └───────────────────────────┘         │    │
    │   └───────────────────────────────────────────────────────┘    │
    │                              │                                  │
    │                              ▼                                  │
    │   ┌───────────────────────────────────────────────────────┐    │
    │   │                    Memory Bus                          │    │
    │   │  ┌─────────────┐  ┌─────────────┐  ┌──────────────┐   │    │
    │   │  │  read_byte  │  │ write_byte  │  │   Callbacks  │   │    │
    │   │  │  (address)  │  │(addr, data) │  │ (for IO/ROM) │   │    │
    │   │  └─────────────┘  └─────────────┘  └──────────────┘   │    │
    │   └───────────────────────────────────────────────────────┘    │
    │                              │                                  │
    │                              ▼                                  │
    │   ┌───────────────────────────────────────────────────────┐    │
    │   │                 Memory Subsystem                       │    │
    │   │    ┌───────┐    ┌───────┐    ┌────────────────┐       │    │
    │   │    │  RAM  │    │  ROM  │    │   I/O Devices  │       │    │
    │   │    │ 64KB  │    │(mapped)│    │   (callbacks) │       │    │
    │   │    └───────┘    └───────┘    └────────────────┘       │    │
    │   └───────────────────────────────────────────────────────┘    │
    │                                                                 │
    └─────────────────────────────────────────────────────────────────┘

4.2 Key Components

  1. CPU Core: Registers, fetch-decode-execute cycle, cycle counting
  2. Decoder: 256-entry lookup table mapping opcodes to handlers
  3. Addressing Mode Calculator: Computes effective addresses for all 13 modes
  4. ALU: Arithmetic, logic, and shift operations with flag updates
  5. Memory Bus: Abstraction layer with read/write callbacks
  6. Interrupt Handler: IRQ, NMI, BRK, and RESET logic

4.3 Data Structures

#include <stdbool.h>
#include <stdint.h>

// CPU State Structure
typedef struct {
    // Registers
    uint8_t  a;        // Accumulator
    uint8_t  x;        // Index X
    uint8_t  y;        // Index Y
    uint8_t  s;        // Stack pointer
    uint8_t  p;        // Processor status
    uint16_t pc;       // Program counter

    // Internal state
    uint64_t cycles;   // Total cycles executed
    bool     irq_pending;
    bool     nmi_pending;
    bool     nmi_edge;  // NMI is edge-triggered

    // Memory interface
    uint8_t  (*read)(uint16_t addr, void *ctx);
    void     (*write)(uint16_t addr, uint8_t val, void *ctx);
    void     *mem_ctx;

} cpu_6502;

// Status register bit masks
#define FLAG_C  0x01  // Carry
#define FLAG_Z  0x02  // Zero
#define FLAG_I  0x04  // Interrupt disable
#define FLAG_D  0x08  // Decimal mode
#define FLAG_B  0x10  // Break
#define FLAG_U  0x20  // Unused (always 1)
#define FLAG_V  0x40  // Overflow
#define FLAG_N  0x80  // Negative

// Addressing modes enumeration
typedef enum {
    ADDR_IMP,    // Implied
    ADDR_ACC,    // Accumulator
    ADDR_IMM,    // Immediate
    ADDR_ZP,     // Zero Page
    ADDR_ZPX,    // Zero Page,X
    ADDR_ZPY,    // Zero Page,Y
    ADDR_ABS,    // Absolute
    ADDR_ABX,    // Absolute,X
    ADDR_ABY,    // Absolute,Y
    ADDR_IND,    // Indirect
    ADDR_IZX,    // (Indirect,X)
    ADDR_IZY,    // (Indirect),Y
    ADDR_REL,    // Relative
} addr_mode;

// Instruction descriptor
typedef struct {
    const char *mnemonic;
    addr_mode   mode;
    uint8_t     bytes;
    uint8_t     cycles;
    bool        page_penalty;  // Extra cycle on page cross
    void        (*execute)(cpu_6502 *cpu, uint16_t addr);
} instruction;
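The read/write function pointers above are what keep the CPU core independent of the machine around it. As a rough sketch, the simplest backend is flat RAM passed through the context pointer. `bus_6502` below is a trimmed stand-in for the bus fields of `cpu_6502`, and `ram_read`, `ram_write`, and `bus_read16` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Trimmed stand-in for the memory-interface fields of cpu_6502. */
typedef struct {
    uint8_t (*read)(uint16_t addr, void *ctx);
    void    (*write)(uint16_t addr, uint8_t val, void *ctx);
    void    *mem_ctx;
} bus_6502;

/* Flat-RAM backend: ctx points at a 64 KB byte array. */
uint8_t ram_read(uint16_t addr, void *ctx) {
    return ((uint8_t *)ctx)[addr];
}

void ram_write(uint16_t addr, uint8_t val, void *ctx) {
    ((uint8_t *)ctx)[addr] = val;
}

/* 16-bit little-endian read built from two byte reads,
 * as needed for absolute operands and the vector table. */
uint16_t bus_read16(const bus_6502 *bus, uint16_t addr) {
    assert(bus != NULL && bus->read != NULL);
    uint8_t lo = bus->read(addr, bus->mem_ctx);
    uint8_t hi = bus->read((uint16_t)(addr + 1), bus->mem_ctx);
    return (uint16_t)(lo | (hi << 8));
}
```

A system emulator (NES, C64) would later swap in callbacks that decode the address and route it to ROM or I/O devices, without touching the CPU core.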

4.4 Algorithm Overview

ALGORITHM: cpu_step (Execute One Instruction)

1. Check for pending interrupts
   - If NMI pending (edge-triggered), handle_nmi()
   - Else if IRQ pending and I flag clear, handle_irq()

2. FETCH opcode
   opcode = read(PC)
   PC++

3. DECODE instruction
   inst = instruction_table[opcode]

4. CALCULATE effective address (based on addressing mode)
   switch(inst.mode):
     ADDR_IMM: addr = PC; PC++
     ADDR_ZP:  addr = read(PC); PC++
     ADDR_ZPX: addr = (read(PC) + X) & 0xFF; PC++
     ADDR_ABS: addr = read16(PC); PC += 2
     // ... etc for all 13 modes

5. Check page boundary crossing
   if (inst.page_penalty && crossed_page)
     cycles++

6. EXECUTE instruction
   inst.execute(cpu, addr)

7. Add base cycle count
   cycles += inst.cycles

8. Return cycles consumed

ALGORITHM: ADC (Add with Carry)

1. Fetch operand value
   operand = read(effective_address)

2. Check decimal mode
   if (P & FLAG_D):
     // BCD addition (complex)
     result = bcd_add(A, operand, carry)
   else:
     // Binary addition
     result = A + operand + (P & FLAG_C)

3. Update flags
   C = (result > 0xFF)
   Z = ((result & 0xFF) == 0)
   N = (result & 0x80)
   V = ~(A ^ operand) & (A ^ result) & 0x80

4. Store result
   A = result & 0xFF
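The binary path of the ADC algorithm also gives you subtraction almost for free. The sketch below works on a bare (A, P) pair rather than the full CPU struct. `adc_binary` and `sbc_binary` are illustrative names, and decimal mode is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FLAG_C 0x01
#define FLAG_Z 0x02
#define FLAG_V 0x40
#define FLAG_N 0x80

/* Binary-mode ADC: A = A + operand + C, updating C, Z, V, N. */
void adc_binary(uint8_t *a, uint8_t *p, uint8_t operand) {
    assert(a != NULL && p != NULL);
    uint16_t sum = (uint16_t)(*a + operand + (*p & FLAG_C));
    uint8_t result = (uint8_t)sum;

    *p &= (uint8_t)~(FLAG_C | FLAG_Z | FLAG_V | FLAG_N);
    if (sum > 0xFF)    *p |= FLAG_C;
    if (result == 0)   *p |= FLAG_Z;
    if (result & 0x80) *p |= FLAG_N;
    /* Overflow: both inputs share a sign and the result's sign differs. */
    if (~(*a ^ operand) & (*a ^ result) & 0x80) *p |= FLAG_V;

    *a = result;
}

/* Binary-mode SBC is ADC of the one's complement of the operand.
 * Carry then acts as "no borrow": set it (SEC) before a fresh subtraction. */
void sbc_binary(uint8_t *a, uint8_t *p, uint8_t operand) {
    adc_binary(a, p, (uint8_t)(operand ^ 0xFF));
}
```

With carry set, $50 - $30 leaves A=$20 and carry still set (no borrow); $30 - $50 leaves A=$E0 with carry clear and N set.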

5. Implementation Guide

5.1 Development Environment Setup

# Install required tools
# macOS
brew install gcc make

# Ubuntu/Debian
sudo apt install build-essential

# Create project structure
mkdir -p 6502emu/{src,include,tests,roms}
cd 6502emu

5.2 Project Structure

6502emu/
├── src/
│   ├── main.c           # Entry point, CLI, debugging
│   ├── cpu.c            # CPU core implementation
│   ├── memory.c         # Memory bus and callbacks
│   ├── opcodes.c        # Instruction implementations
│   └── disasm.c         # Disassembler for debugging
├── include/
│   ├── cpu.h            # CPU state and functions
│   ├── memory.h         # Memory interface
│   └── types.h          # Common type definitions
├── tests/
│   ├── 6502_functional_test.bin  # Klaus Dormann test
│   ├── decimal_test.bin          # BCD test
│   └── test_harness.c            # Test runner
├── roms/
│   └── apple1_basic.bin          # Sample program
├── Makefile
└── README.md

5.3 The Core Question You’re Answering

“How does a CPU execute instructions, and what makes cycle-accurate emulation challenging?”

This project forces you to understand:

  • How addressing modes allow a small instruction set to be powerful
  • Why cycle timing matters for system emulation
  • How status flags enable conditional logic
  • Why the 6502’s design influenced 40 years of CPU architecture

5.4 Concepts You Must Understand First

Before writing code, verify your understanding:

Addressing Modes:

  • Q: What’s the difference between (Indirect,X) and (Indirect),Y?
  • A: (Indirect,X) adds X before the indirection; (Indirect),Y adds Y after

Carry Flag:

  • Q: How does SBC use the carry flag?
  • A: SBC treats carry as an inverted borrow: A = A - M - (1 - C). Set carry (SEC) before a subtraction that has no incoming borrow

Overflow Flag:

  • Q: When is overflow set?
  • A: When signed arithmetic produces a result outside -128 to 127

Page Crossing:

  • Q: Why does crossing a page boundary cost a cycle?
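  • A: The 6502 adds the index to the low address byte first and reads with the high byte unchanged; if that addition carries, the read hit the wrong page and one more cycle is needed to fix the high byte and re-read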

Stack Operations:

  • Q: Where is the stack located?
  • A: Page $01 ($0100-$01FF), pointer is only 8 bits

5.5 Questions to Guide Your Design

CPU Core:

  1. How will you represent the opcode-to-instruction mapping?
  2. Will you use function pointers, a switch statement, or computed goto?
  3. How will you track cycle counts for page-crossing penalties?

Addressing Modes:

  1. Will you calculate the effective address once or inline in each instruction?
  2. How will you handle zero-page wrapping (e.g., $FF + X wraps to $00)?
  3. How will you implement the JMP indirect bug (wraps within page)?

Flags:

  1. How will you calculate the overflow flag correctly?
  2. Will you set flags in each instruction or use helper functions?
  3. How will you handle flags for read-modify-write instructions?

BCD Mode:

  1. Will you implement BCD fully or skip it for NES compatibility?
  2. How will you test your BCD implementation?
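One of the addressing-mode questions above concerns the JMP indirect bug. A minimal sketch of the NMOS behavior, assuming a flat memory array passed in by pointer (the function name `jmp_indirect_target` is made up for this example):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Target of JMP (ptr) on an NMOS 6502.
 * Only the low byte of the pointer is incremented when fetching the
 * high byte of the target, so a pointer at $xxFF reads its high byte
 * from $xx00 rather than from the next page. */
uint16_t jmp_indirect_target(const uint8_t *mem, uint16_t ptr) {
    assert(mem != NULL);
    uint16_t hi_addr = (uint16_t)((ptr & 0xFF00) | ((ptr + 1) & 0x00FF));
    return (uint16_t)(mem[ptr] | (mem[hi_addr] << 8));
}
```

For pointers that do not end in $FF this is an ordinary 16-bit read. The later 65C02 fixed the wraparound, so an emulator targeting that chip would read the high byte from `ptr + 1` instead.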

5.6 Thinking Exercise

Before writing code, trace through this program by hand:

        .org $0600
start:  LDA #$50        ; A = $50
        STA $10         ; Store at zero page $10
        LDA #$30        ; A = $30
        CLC             ; Clear carry
        ADC $10         ; A = A + [$10] = $30 + $50 = $80
        BPL done        ; Branch if positive (N=0)
        LDA #$FF        ; This should execute (N=1)
done:   STA $20         ; Store result
        BRK             ; Stop

For each instruction, write:

  1. The opcode bytes
  2. The cycle count
  3. Register values after execution
  4. Flag values after execution

Your trace should look like:

Address  Bytes  Instruction  A    X    Y    P (NV-BDIZC)  Cycles
$0600    A9 50  LDA #$50     $50  $00  $00  00100100      2
$0602    85 10  STA $10      $50  $00  $00  00100100      3

5.7 Hints in Layers

Use these progressive hints only when stuck.

Hint 1: Starting Structure

Begin with the CPU state and basic fetch:

#include <stdint.h>

// Start simple - just registers and memory access
typedef struct {
    uint8_t a, x, y, s, p;
    uint16_t pc;
    uint8_t memory[65536];  // Simple array for now
} cpu_6502;

void cpu_reset(cpu_6502 *cpu) {
    cpu->a = cpu->x = cpu->y = 0;
    cpu->s = 0xFD;  // Stack pointer after reset
    cpu->p = 0x24;  // I flag set, unused bit set
    cpu->pc = cpu->memory[0xFFFC] | (cpu->memory[0xFFFD] << 8);
}

Hint 2: Opcode Dispatch

Create a dispatch table:

typedef void (*opcode_handler)(cpu_6502 *cpu);

opcode_handler dispatch[256];
uint8_t cycle_table[256];   // Base cycle count per opcode, filled in alongside dispatch

void init_dispatch(void) {
    // Initialize all to illegal opcode handler
    for (int i = 0; i < 256; i++)
        dispatch[i] = op_illegal;

    // Fill in real opcodes
    dispatch[0xA9] = op_lda_imm;   // LDA #
    dispatch[0xA5] = op_lda_zp;    // LDA zp
    dispatch[0xB5] = op_lda_zpx;   // LDA zp,X
    // ... 148 more
}

int cpu_step(cpu_6502 *cpu) {
    uint8_t opcode = cpu->memory[cpu->pc++];
    dispatch[opcode](cpu);
    return cycle_table[opcode];
}

Hint 3: Addressing Mode Helpers

Factor out addressing mode calculations:

uint16_t addr_immediate(cpu_6502 *cpu) {
    return cpu->pc++;
}

uint16_t addr_zeropage(cpu_6502 *cpu) {
    return cpu->memory[cpu->pc++];
}

uint16_t addr_zeropage_x(cpu_6502 *cpu) {
    return (cpu->memory[cpu->pc++] + cpu->x) & 0xFF;
}

uint16_t addr_absolute(cpu_6502 *cpu) {
    uint16_t lo = cpu->memory[cpu->pc++];
    uint16_t hi = cpu->memory[cpu->pc++];
    return (hi << 8) | lo;
}

Hint 4: Flag Update Helpers

Create reusable flag update functions:

void set_nz(cpu_6502 *cpu, uint8_t value) {
    cpu->p &= ~(FLAG_N | FLAG_Z);
    if (value == 0) cpu->p |= FLAG_Z;
    if (value & 0x80) cpu->p |= FLAG_N;
}

void op_lda_imm(cpu_6502 *cpu) {
    cpu->a = cpu->memory[cpu->pc++];
    set_nz(cpu, cpu->a);
}

void op_lda_zp(cpu_6502 *cpu) {
    cpu->a = cpu->memory[addr_zeropage(cpu)];
    set_nz(cpu, cpu->a);
}

Hint 5: ADC Implementation

The most complex instruction - ADC with both binary and BCD modes:

void do_adc(cpu_6502 *cpu, uint8_t value) {
    if (cpu->p & FLAG_D) {
        // BCD mode
        uint16_t lo = (cpu->a & 0x0F) + (value & 0x0F) + (cpu->p & FLAG_C);
        uint16_t hi = (cpu->a & 0xF0) + (value & 0xF0);

        if (lo > 0x09) {
            lo += 0x06;
            hi += 0x10;
        }

        // Overflow check happens before decimal adjust
        uint8_t overflow = ~(cpu->a ^ value) & (cpu->a ^ (hi + (lo & 0x0F))) & 0x80;

        if (hi > 0x90) hi += 0x60;

        cpu->p &= ~(FLAG_C | FLAG_V | FLAG_Z | FLAG_N);
        if (hi > 0xFF) cpu->p |= FLAG_C;
        if (overflow) cpu->p |= FLAG_V;

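        cpu->a = ((hi + (lo & 0x0F)) & 0xFF);
        // Simplification: Z and N are derived from the BCD result here. On NMOS
        // parts Z reflects the binary sum and N is sampled before the high-nibble fixup.
        if (cpu->a == 0) cpu->p |= FLAG_Z;
        if (cpu->a & 0x80) cpu->p |= FLAG_N;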
    } else {
        // Binary mode
        uint16_t sum = cpu->a + value + (cpu->p & FLAG_C);

        cpu->p &= ~(FLAG_C | FLAG_V | FLAG_Z | FLAG_N);
        if (sum > 0xFF) cpu->p |= FLAG_C;
        if (~(cpu->a ^ value) & (cpu->a ^ sum) & 0x80) cpu->p |= FLAG_V;

        cpu->a = sum & 0xFF;
        if (cpu->a == 0) cpu->p |= FLAG_Z;
        if (cpu->a & 0x80) cpu->p |= FLAG_N;
    }
}

Hint 6: Page Boundary Crossing

Track page boundaries for cycle-accurate timing:

uint16_t addr_absolute_x(cpu_6502 *cpu, bool *page_crossed) {
    uint16_t base = addr_absolute(cpu);
    uint16_t addr = base + cpu->x;
    *page_crossed = ((base & 0xFF00) != (addr & 0xFF00));
    return addr;
}

void op_lda_absx(cpu_6502 *cpu) {
    bool page_crossed;
    uint16_t addr = addr_absolute_x(cpu, &page_crossed);
    cpu->a = cpu->memory[addr];
    set_nz(cpu, cpu->a);
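    // Assumes a cycles field as in the section 4.3 struct. If you also add
    // cycle_table[opcode] elsewhere, add only the 1-cycle penalty here.
    cpu->cycles += page_crossed ? 5 : 4;  // Penalty for page cross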
}

5.8 The Interview Questions They’ll Ask

Basic Understanding

  1. “What is the 6502’s register set and why is it so small?”
    • Good Answer: A, X, Y (8-bit data registers), plus S and P (8-bit) and PC (16-bit). The design targeted a low transistor count (about 3,510), which pushes programmers to use zero page as a bank of pseudo-registers.
  2. “Explain the difference between zero page and absolute addressing.”
    • Good Answer: Zero page uses 1-byte address ($00-$FF), saving a byte and a cycle. It’s a performance optimization for frequently-accessed data.
  3. “How does the 6502 stack work?”
    • Good Answer: Fixed at $0100-$01FF. Stack pointer is 8-bit, automatically OR’d with $0100. Grows downward. Push decrements, pull increments.

Technical Details

  1. “What is the JMP indirect bug?”
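    • Good Answer: For JMP ($xxFF), the low byte of the target is read from $xxFF, but the high byte is read from $xx00 instead of $(xx+1)00. The 6502 increments only the low byte of the pointer and never carries into the page byte.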
  2. “How does the overflow flag work?”
    • Good Answer: Set when signed arithmetic produces an invalid result. For ADC: when inputs have same sign but result has different sign.
  3. “Why does page boundary crossing add a cycle?”
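    • Good Answer: The 6502 adds the index to the low address byte first and reads using the unadjusted high byte. If that addition carried, the read hit the wrong page, so the CPU spends one more cycle correcting the high byte and re-reading.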
  4. “How is SBC implemented internally?”
    • Good Answer: In binary mode, SBC computes A - M - (1 - C) as A + (M ^ 0xFF) + C. It reuses the ADC adder circuit with the operand inverted.
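
The JMP indirect bug from question 1 can be sketched as follows. This is a minimal illustration, assuming a flat 64KB memory array; the trimmed-down `cpu_6502` struct and the helper name `read_jmp_indirect_target` are hypothetical stand-ins, not part of the guide's API.

```c
#include <stdint.h>

/* Minimal stand-in for the guide's CPU state (assumption: flat 64KB memory). */
typedef struct {
    uint16_t pc;
    uint8_t memory[0x10000];
} cpu_6502;

/* Read the 16-bit target of JMP (ptr), reproducing the NMOS page-wrap bug:
   when ptr ends in $FF, the high byte comes from the start of the same page. */
uint16_t read_jmp_indirect_target(const cpu_6502 *cpu, uint16_t ptr) {
    uint16_t lo_addr = ptr;
    /* Increment only the low byte of the pointer; the page byte never carries. */
    uint16_t hi_addr = (uint16_t)((ptr & 0xFF00) | ((ptr + 1) & 0x00FF));
    return (uint16_t)(cpu->memory[lo_addr] | (cpu->memory[hi_addr] << 8));
}
```

For a pointer that does not end in $FF, the helper behaves like an ordinary little-endian 16-bit read, so the same function can serve every JMP (indirect).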

Problem-Solving

  1. “Your emulator passes simple tests but fails complex ROMs. Debugging steps?”
    • Good Answer:
      1. Run Klaus Dormann’s test suite - identifies failing instructions
      2. Add instruction trace logging
      3. Compare trace with reference emulator
      4. Binary search to find first divergence
      5. Check edge cases: page boundaries, flag handling, cycle timing
  2. “How would you make your emulator cycle-accurate?”
    • Good Answer: Track cycles per instruction including page-crossing penalties. For sub-instruction accuracy, split instructions into micro-operations per cycle.
  3. “What undocumented opcodes would you implement for NES compatibility?”
    • Good Answer: LAX, SAX, DCP, ISB, SLO, RLA, SRE, RRA at minimum. These are used by some games. Some undocumented ops have unstable behavior.

5.9 Books That Will Help

Topic | Book | Chapter/Section | Why It Helps
6502 ISA Reference | “Programming the 6502” by Rodnay Zaks | All | Definitive 6502 reference from 1980
CPU Design | “Computer Organization and Design” by Patterson & Hennessy | Ch. 4-5 | CPU internals and pipelining concepts
Addressing Modes | “Computer Systems: A Programmer’s Perspective” by Bryant | Ch. 3.4 | How ISAs encode operands
Status Flags | “Write Great Code, Vol. 2” by Randall Hyde | Ch. 5 | Condition codes and branching
Emulation Techniques | “Game Boy Emulation in JavaScript” (online) | CPU chapter | Practical emulator implementation
6502 History | “The COMMODORE 64 in Action” | Ch. 1 | Historical context and design philosophy

5.10 Implementation Phases

Phase 1: Foundation (Days 1-3)

  • Set up project structure
  • Implement CPU state struct
  • Implement simple memory array
  • Add reset vector handling
  • Test with simple NOP loop

Milestone: CPU resets to correct address and executes NOP

Phase 2: Basic Instructions (Days 4-7)

  • Implement LDA, LDX, LDY (all addressing modes)
  • Implement STA, STX, STY
  • Implement transfer instructions (TAX, TAY, etc.)
  • Add flag update helpers

Milestone: Can load and store values, pass load/store tests

Phase 3: Arithmetic (Days 8-12)

  • Implement ADC (binary mode first)
  • Implement SBC
  • Implement compare instructions (CMP, CPX, CPY)
  • Implement INC, DEC, INX, INY, DEX, DEY

Milestone: Can do basic math, pass arithmetic tests

Phase 4: Logic and Shift (Days 13-15)

  • Implement AND, ORA, EOR
  • Implement ASL, LSR, ROL, ROR
  • Implement BIT

Milestone: Pass logic/shift tests

Phase 5: Branches and Jumps (Days 16-18)

  • Implement all branch instructions (BCC, BCS, BEQ, BNE, BMI, BPL, BVC, BVS)
  • Implement JMP (absolute and indirect with bug!)
  • Implement JSR, RTS

Milestone: Can run branching code, pass branch tests

Phase 6: Stack and Interrupts (Days 19-21)

  • Implement stack operations (PHA, PHP, PLA, PLP)
  • Implement BRK, RTI
  • Implement IRQ and NMI handlers
  • Implement flag instructions (CLC, SEC, etc.)

Milestone: Full interrupt support, pass interrupt tests

Phase 7: BCD Mode (Days 22-24)

  • Implement BCD addition in ADC
  • Implement BCD subtraction in SBC
  • Run BCD test suite

Milestone: Pass decimal mode tests

Phase 8: Polish (Days 25-28)

  • Add cycle-accurate timing with page crossing
  • Add debugging features
  • Run Klaus Dormann full test suite
  • Fix any failing tests
  • Test with real programs

Milestone: Pass all tests, run Apple I BASIC

5.11 Key Implementation Decisions

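  1. Opcode Dispatch: Use a 256-entry function pointer table. A large switch statement can perform similarly once the compiler turns it into a jump table, so measure before assuming either is faster. Computed goto can be fastest but relies on a compiler extension and is less portable.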

  2. Memory Model: Start with a simple 64KB array. Refactor to callbacks later for NES/C64 bank switching.

  3. Flag Handling: Create helper functions like set_nz() and set_carry(). Keep the binary-mode arithmetic path inline and move the BCD path into a separate function.

  4. BCD Implementation: Implement it. Even if targeting NES (which disables BCD), it teaches ALU concepts.

  5. Undocumented Opcodes: Implement at least the stable ones (LAX, SAX, etc.) for real ROM compatibility.

  6. Cycle Counting: Return cycles from each instruction. Add page-crossing detection for indexed modes.
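
Decisions 1 and 6 can be combined in one small sketch: a 256-entry table of handlers, each returning the cycles it used. This is an illustrative outline only. The handler names, the reduced `cpu_6502` struct, and the convention of returning -1 for an empty slot are assumptions made for this example.

```c
#include <stdint.h>
#include <stddef.h>

/* Reduced CPU state for illustration (assumption: flat 64KB memory). */
typedef struct cpu_6502 {
    uint8_t a, x, y, s, p;
    uint16_t pc;
    uint64_t cycles;
    uint8_t memory[0x10000];
} cpu_6502;

/* Each handler executes one instruction and returns the cycles it consumed. */
typedef int (*opcode_fn)(cpu_6502 *cpu);

int op_nop(cpu_6502 *cpu) { (void)cpu; return 2; }

int op_lda_imm(cpu_6502 *cpu) {
    cpu->a = cpu->memory[cpu->pc++];
    cpu->p = (uint8_t)(cpu->p & ~0x82);   /* clear Z (0x02) and N (0x80) */
    if (cpu->a == 0) cpu->p |= 0x02;
    cpu->p |= cpu->a & 0x80;
    return 2;
}

/* Unfilled slots stay NULL so a missing opcode is reported, not silently skipped. */
static const opcode_fn opcode_table[256] = {
    [0xA9] = op_lda_imm,
    [0xEA] = op_nop,
};

/* Fetch, dispatch, and accumulate cycles.
   Returns the cycles used, or -1 if the opcode has no handler yet. */
int cpu_step(cpu_6502 *cpu) {
    uint8_t opcode = cpu->memory[cpu->pc++];
    opcode_fn handler = opcode_table[opcode];
    if (handler == NULL) return -1;
    int used = handler(cpu);
    cpu->cycles += (uint64_t)used;
    return used;
}
```

Returning -1 for an empty slot makes it easy to log the first unimplemented opcode a test program reaches, which is a convenient way to decide what to implement next.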


6. Testing Strategy

6.1 Unit Testing

Test each addressing mode in isolation:

void test_lda_immediate(void) {
    cpu_6502 cpu;
    cpu_init(&cpu);

    // LDA #$42
    cpu.memory[0x0600] = 0xA9;
    cpu.memory[0x0601] = 0x42;
    cpu.pc = 0x0600;

    cpu_step(&cpu);

    assert(cpu.a == 0x42);
    assert(cpu.pc == 0x0602);
    assert((cpu.p & FLAG_Z) == 0);
    assert((cpu.p & FLAG_N) == 0);
    printf("test_lda_immediate PASSED\n");
}

void test_lda_zeropage_x_wrap(void) {
    cpu_6502 cpu;
    cpu_init(&cpu);

    // LDA $FF,X with X=$02 should wrap to $01
    cpu.memory[0x0600] = 0xB5;
    cpu.memory[0x0601] = 0xFF;
    cpu.memory[0x0001] = 0x77;  // Wrapped address
    cpu.x = 0x02;
    cpu.pc = 0x0600;

    cpu_step(&cpu);

    assert(cpu.a == 0x77);
    printf("test_lda_zeropage_x_wrap PASSED\n");
}

6.2 Integration Testing

Use Klaus Dormann’s test suite:

# Download the test
wget https://github.com/Klaus2m5/6502_65C02_functional_tests/raw/master/bin_files/6502_functional_test.bin

# Run it
./emu6502 --test 6502_functional_test.bin

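The test is a self-contained program that exercises every documented opcode. It reports a failure by trapping: execution loops forever at one address, and that address identifies the failing check. The prebuilt binary is typically loaded at $0000 and started at $0400; confirm both against the listing file that accompanies it.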
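
One way to detect such a halt is to watch for an instruction that leaves PC unchanged. The sketch below assumes that convention. `run_until_trap` and `demo_step` are hypothetical names, and `demo_step` is a deliberately tiny stand-in (NOP and JMP absolute only) for your real step function.

```c
#include <stdint.h>
#include <stdbool.h>

/* Reduced CPU state for illustration (assumption: flat 64KB memory). */
typedef struct {
    uint16_t pc;
    uint8_t memory[0x10000];
} cpu_6502;

typedef void (*step_fn)(cpu_6502 *cpu);

/* Stand-in step function: handles JMP abs ($4C), treats all else as 1-byte NOP. */
void demo_step(cpu_6502 *cpu) {
    uint8_t opcode = cpu->memory[cpu->pc];
    if (opcode == 0x4C) {
        uint16_t lo = cpu->memory[(uint16_t)(cpu->pc + 1)];
        uint16_t hi = cpu->memory[(uint16_t)(cpu->pc + 2)];
        cpu->pc = (uint16_t)(lo | (hi << 8));
    } else {
        cpu->pc++;
    }
}

/* Run until an instruction leaves PC unchanged (a trap) or max_steps expires.
   Returns true and stores the trap address in *trap_pc if a trap was hit. */
bool run_until_trap(cpu_6502 *cpu, step_fn step, uint64_t max_steps,
                    uint16_t *trap_pc) {
    for (uint64_t i = 0; i < max_steps; i++) {
        uint16_t before = cpu->pc;
        step(cpu);
        if (cpu->pc == before) {
            *trap_pc = before;
            return true;
        }
    }
    return false;
}
```

The step limit keeps a broken emulator from spinning forever in a loop that is not a trap. Whether a given trap address means success or failure has to be checked against the test's listing file.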

6.3 Critical Test Cases

  1. Zero Page Wrapping:
    • LDA $FF,X with X=$01 should read from $00, not $100
  2. JMP Indirect Bug:
    • JMP ($10FF) should read low byte from $10FF and high byte from $1000
  3. BRK/RTI:
    • BRK pushes the address of the BRK opcode plus 2, not plus 1 (RTI returns past the padding byte that follows BRK)
    • B flag handling on push/pull
  4. Overflow Flag:
    • $50 + $50 = $A0 (sets V, result is negative but operands positive)
    • $D0 + $D0 = $A0 (sets C, clears V)
  5. Decimal Mode:
    • $99 + $01 with C=0 going in = $00 with C=1 (BCD)
    • $00 - $01 with C=1 going in (no borrow) = $99 with C=0 (BCD)
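
The two overflow cases in item 4 can be checked with a small pure function. This is a sketch for experimenting with the flag logic in binary mode only; the `adc_result` struct and `adc_binary` name are assumptions for this example, and only C and V are modelled.

```c
#include <stdint.h>

#define FLAG_C 0x01
#define FLAG_V 0x40

/* Result byte plus the C and V flags it produced (other flags omitted). */
typedef struct {
    uint8_t result;
    uint8_t flags;
} adc_result;

/* Binary-mode ADC on plain values, so each case can be checked in isolation. */
adc_result adc_binary(uint8_t a, uint8_t m, uint8_t carry_in) {
    uint16_t sum = (uint16_t)(a + m + (carry_in ? 1 : 0));
    adc_result r = { (uint8_t)sum, 0 };
    /* Carry: the unsigned result did not fit in 8 bits. */
    if (sum > 0xFF) r.flags |= FLAG_C;
    /* Overflow: operands share a sign bit and the result's sign bit differs. */
    if (~(a ^ m) & (a ^ sum) & 0x80) r.flags |= FLAG_V;
    return r;
}
```

With this helper, $50 + $50 yields $A0 with only V set, and $D0 + $D0 yields $A0 with only C set, matching the cases listed above.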

6.4 Debugging Techniques

# Add instruction tracing
./emu6502 --trace test.bin
$0600: A9 44    LDA #$44        A=44 X=00 Y=00 P=00 S=FD
$0602: 85 10    STA $10         A=44 X=00 Y=00 P=00 S=FD
$0604: A9 30    LDA #$30        A=30 X=00 Y=00 P=00 S=FD

# Compare with reference (py65 emulator)
pip install py65
py65mon
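
A trace line like the ones above can be produced with a small formatter. The following is a rough sketch under stated assumptions: `format_trace` is a hypothetical helper, the caller supplies the instruction length, and the disassembled mnemonic column is left out to keep the example short.

```c
#include <stdint.h>
#include <stdio.h>
#include <stddef.h>

/* Reduced CPU state for illustration (assumption: flat 64KB memory). */
typedef struct {
    uint8_t a, x, y, s, p;
    uint16_t pc;
    uint8_t memory[0x10000];
} cpu_6502;

/* Format one trace line: address, raw instruction bytes, then registers.
   `length` is the instruction size in bytes (1 to 3), supplied by the caller.
   Returns the value of the final snprintf call. */
int format_trace(const cpu_6502 *cpu, int length, char *out, size_t out_size) {
    char bytes[9] = "";   /* room for "XX XX XX" plus the terminator */
    int used = 0;
    for (int i = 0; i < length && i < 3; i++) {
        used += snprintf(bytes + used, sizeof bytes - (size_t)used, "%s%02X",
                         i ? " " : "",
                         (unsigned)cpu->memory[(uint16_t)(cpu->pc + i)]);
    }
    return snprintf(out, out_size,
                    "$%04X: %-8s  A=%02X X=%02X Y=%02X P=%02X S=%02X",
                    (unsigned)cpu->pc, bytes, (unsigned)cpu->a,
                    (unsigned)cpu->x, (unsigned)cpu->y,
                    (unsigned)cpu->p, (unsigned)cpu->s);
}
```

Call it before executing each instruction and write the buffer to a log file. A fixed-width layout makes it straightforward to diff your log against a reference emulator's output, provided both use the same column format.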

7. Common Pitfalls & Debugging

Problem 1: ADC/SBC produce wrong results

  • Root Cause: Incorrect carry handling or overflow calculation
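  • Fix: Remember that SBC is ADC with the operand inverted. The carry flag acts as an inverted borrow: C=1 means no borrow going in, and C=0 coming out means a borrow occurred.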
  • Quick Test: $00 - $01 with C=1 should give $FF with C=0
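
The fix can be sketched in a few lines for binary mode. This is a minimal illustration, assuming a reduced `cpu_6502` struct with only A and P; decimal-mode subtraction needs its own path and is not covered here.

```c
#include <stdint.h>

#define FLAG_C 0x01
#define FLAG_Z 0x02
#define FLAG_V 0x40
#define FLAG_N 0x80

/* Only the registers this example needs (assumption, not the full CPU state). */
typedef struct {
    uint8_t a, p;
} cpu_6502;

/* Binary-mode ADC: A = A + value + C, updating C, V, Z, N. */
void adc_binary(cpu_6502 *cpu, uint8_t value) {
    uint16_t sum = (uint16_t)(cpu->a + value + ((cpu->p & FLAG_C) ? 1 : 0));
    cpu->p = (uint8_t)(cpu->p & ~(FLAG_C | FLAG_V | FLAG_Z | FLAG_N));
    if (sum > 0xFF) cpu->p |= FLAG_C;
    if (~(cpu->a ^ value) & (cpu->a ^ sum) & 0x80) cpu->p |= FLAG_V;
    cpu->a = (uint8_t)sum;
    if (cpu->a == 0) cpu->p |= FLAG_Z;
    if (cpu->a & 0x80) cpu->p |= FLAG_N;
}

/* Binary-mode SBC: the same adder with the operand's bits inverted.
   C=1 going in means "no borrow"; C=0 coming out means a borrow happened. */
void sbc_binary(cpu_6502 *cpu, uint8_t value) {
    adc_binary(cpu, (uint8_t)(value ^ 0xFF));
}
```

Running the quick test through it: with A=$00 and C=1, subtracting $01 computes $00 + $FE + 1 = $FF with no carry out, so A=$FF and C=0.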

Problem 2: Zero page indexed addressing reads from wrong address

  • Root Cause: Not masking to 8 bits
  • Fix: addr = (base + cpu->x) & 0xFF
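
The same 8-bit wrap applies when the pointer for (zp,X) addressing is read. A short sketch of both cases, assuming a flat 64KB memory array and PC pointing at the operand byte; the function names here are illustrative.

```c
#include <stdint.h>

/* Reduced CPU state for illustration (assumption: flat 64KB memory). */
typedef struct {
    uint8_t x;
    uint16_t pc;
    uint8_t memory[0x10000];
} cpu_6502;

/* Zero page,X: fetch the operand byte, add X, keep only the low 8 bits. */
uint16_t addr_zeropage_x(cpu_6502 *cpu) {
    uint8_t base = cpu->memory[cpu->pc++];
    return (uint8_t)(base + cpu->x);   /* 8-bit wrap: $FF + $02 -> $01 */
}

/* (zp,X): the pointer location wraps, and so does the fetch of its high byte. */
uint16_t addr_indirect_x(cpu_6502 *cpu) {
    uint8_t ptr = (uint8_t)(cpu->memory[cpu->pc++] + cpu->x);
    uint8_t lo = cpu->memory[ptr];
    uint8_t hi = cpu->memory[(uint8_t)(ptr + 1)];   /* $FF + 1 -> $00 */
    return (uint16_t)(lo | (hi << 8));
}
```

Using `uint8_t` for the intermediate values lets the type do the masking, which is harder to forget than an explicit `& 0xFF`.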

Problem 3: BRK doesn’t work correctly

  • Root Cause: Wrong PC value pushed, or B flag handling
  • Fix: BRK pushes PC+2. B is set on pushed P, not actual P.
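
A possible shape for the BRK sequence is sketched below. It assumes `op_brk` is called with PC already advanced past the BRK opcode, and it uses a reduced `cpu_6502` struct; `FLAG_U` is a name chosen here for bit 5 of P, which always reads as 1 when pushed.

```c
#include <stdint.h>

#define FLAG_I 0x04
#define FLAG_B 0x10
#define FLAG_U 0x20   /* bit 5: unused, reads as 1 when P is pushed */

/* Reduced CPU state for illustration (assumption: flat 64KB memory). */
typedef struct {
    uint8_t s, p;
    uint16_t pc;
    uint8_t memory[0x10000];
} cpu_6502;

void push8(cpu_6502 *cpu, uint8_t value) {
    cpu->memory[0x0100 | cpu->s] = value;
    cpu->s--;
}

/* Call with cpu->pc pointing just past the BRK opcode (BRK address + 1). */
void op_brk(cpu_6502 *cpu) {
    uint16_t ret = (uint16_t)(cpu->pc + 1);   /* skip padding: BRK address + 2 */
    push8(cpu, (uint8_t)(ret >> 8));          /* return address, high byte first */
    push8(cpu, (uint8_t)(ret & 0xFF));
    push8(cpu, (uint8_t)(cpu->p | FLAG_B | FLAG_U));   /* B only in pushed copy */
    cpu->p |= FLAG_I;                         /* mask further IRQs */
    cpu->pc = (uint16_t)(cpu->memory[0xFFFE] | (cpu->memory[0xFFFF] << 8));
}
```

Hardware IRQ and NMI follow the same push order but push P with B clear, which is how a shared handler tells BRK apart from a hardware interrupt.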

Problem 4: Branches always/never taken

  • Root Cause: Wrong flag polarity
  • Fix: BCC = branch if C==0, BCS = branch if C==1
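
Routing every branch through one shared helper keeps the polarity visible at each call site. The sketch below assumes PC points at the offset byte when the helper is called and that handlers return cycle counts; `branch_if` is an illustrative name.

```c
#include <stdint.h>
#include <stdbool.h>

#define FLAG_C 0x01

/* Reduced CPU state for illustration (assumption: flat 64KB memory). */
typedef struct {
    uint8_t p;
    uint16_t pc;
    uint8_t memory[0x10000];
} cpu_6502;

/* Shared branch logic.
   Returns cycles: 2 if not taken, 3 if taken, 4 if taken across a page. */
int branch_if(cpu_6502 *cpu, bool condition) {
    int8_t offset = (int8_t)cpu->memory[cpu->pc++];   /* signed: -128..+127 */
    if (!condition) return 2;
    /* The offset is relative to the address of the next instruction. */
    uint16_t target = (uint16_t)(cpu->pc + offset);
    int cycles = ((target & 0xFF00) != (cpu->pc & 0xFF00)) ? 4 : 3;
    cpu->pc = target;
    return cycles;
}

int op_bcc(cpu_6502 *cpu) { return branch_if(cpu, (cpu->p & FLAG_C) == 0); }
int op_bcs(cpu_6502 *cpu) { return branch_if(cpu, (cpu->p & FLAG_C) != 0); }
```

The remaining six branches differ only in which flag they test and in which polarity, so each becomes a one-line wrapper like the two shown.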

Problem 5: Decimal mode gives wrong results

  • Root Cause: BCD addition algorithm error
  • Fix: Handle carry between nibbles, adjust when >9

Problem 6: Test hangs at specific address

  • Root Cause: Failed test loops forever
  • Fix: This is by design. Each failed check in the Klaus test suite traps in a loop at its own address, so look that address up in the test's listing file to see which check failed.

Problem 7: Cycles don’t match expected

  • Root Cause: Missing page-crossing penalty
  • Fix: Check if ((base & 0xFF00) != (addr & 0xFF00))

Problem 8: Stack operations corrupt memory

  • Root Cause: Not fixing stack to page $01
  • Fix: Always use (0x0100 | cpu->s) for the stack address
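
Centralizing stack access in a few helpers makes this mistake hard to repeat. A minimal sketch, assuming a flat 64KB memory array and an 8-bit `s` field; the helper names are illustrative.

```c
#include <stdint.h>

/* Reduced CPU state for illustration (assumption: flat 64KB memory). */
typedef struct {
    uint8_t s;
    uint8_t memory[0x10000];
} cpu_6502;

/* Push: write at $0100 | S, then decrement S. */
void push8(cpu_6502 *cpu, uint8_t value) {
    cpu->memory[0x0100 | cpu->s] = value;
    cpu->s--;                      /* uint8_t wraps $00 -> $FF, staying in page 1 */
}

/* Pull: increment S, then read from $0100 | S. */
uint8_t pull8(cpu_6502 *cpu) {
    cpu->s++;
    return cpu->memory[0x0100 | cpu->s];
}

/* 16-bit values go high byte first, so they sit little-endian in memory. */
void push16(cpu_6502 *cpu, uint16_t value) {
    push8(cpu, (uint8_t)(value >> 8));
    push8(cpu, (uint8_t)(value & 0xFF));
}

uint16_t pull16(cpu_6502 *cpu) {
    uint8_t lo = pull8(cpu);
    uint8_t hi = pull8(cpu);
    return (uint16_t)(lo | (hi << 8));
}
```

Because `s` is a `uint8_t`, overflow and underflow wrap within page $01 without any extra masking, which mirrors what the hardware does.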

8. Extensions & Challenges

Beginner Extensions

  • Add disassembler output for each instruction
  • Implement undocumented NOPs
  • Add memory read/write breakpoints
  • Create simple assembler for test programs

Intermediate Extensions

  • Implement all stable undocumented opcodes (LAX, SAX, DCP, ISB, etc.)
  • Add cycle-accurate sub-instruction timing
  • Implement the “unstable” undocumented opcodes
  • Create memory-mapped I/O for simple terminal

Advanced Extensions

  • Build a 6502 assembler
  • Create an Apple I emulator using your CPU
  • Create a C64 emulator (add VIC-II and SID chips)
  • Create an NES emulator (add PPU and APU)
  • Implement the 65C02 extended instruction set

Expert Challenges

  • Match Visual 6502 transistor-level timing
  • Implement the unstable undocumented opcodes so that they match real hardware behavior
  • Add save states and rewind capability
  • Create a time-travel debugger

9. Real-World Connections

NES Emulation: The NES CPU is a 6502 variant (Ricoh 2A03) with decimal mode disabled and audio hardware added. A 6502 core like the one you built sits at the heart of emulators such as FCEUX, Nestopia, and Mesen.

Apple II Preservation: Apple II enthusiasts use 6502 emulators to run vintage software. AppleWin and LinApple depend on accurate emulation.

Retrocomputing: The 6502 community is active - people still build new computers with real chips. Understanding the 6502 connects you to this community.

Embedded Systems: The 65C02 is still manufactured today for embedded applications. Your knowledge applies to modern hardware.

Career Skills: Writing an emulator demonstrates:

  • Deep understanding of CPU architecture
  • Ability to implement specifications precisely
  • Debugging at the lowest level
  • Performance optimization skills

10. Resources

Primary References

  • MOS 6502 Programming Manual (original 1976 document)
  • W65C02S Datasheet (modern variant, compatible)
  • Klaus Dormann’s Test Suite Documentation

Reference Emulators

  • py65 (Python) - Simple, well-documented
  • FCEUX (C++) - NES emulator with excellent debugger
  • Mesen (C++ core, C# UI) - Highly accurate NES emulator
  • perfect6502 (C) - Transistor-level simulation

11. Self-Assessment Checklist

Before moving to the next project, verify:

Understanding:

  • Can you explain all 13 addressing modes without notes?
  • Can you describe how the status register works?
  • Can you explain why SBC is implemented as ADC with inverted operand?
  • Can you explain the JMP indirect bug?
  • Can you describe how BCD arithmetic works?

Implementation:

  • Does your emulator pass Klaus Dormann’s functional test?
  • Does it pass the BCD/decimal test?
  • Are cycles counted correctly including page penalties?
  • Do interrupts work correctly (IRQ, NMI, BRK)?
  • Can you run Apple I BASIC?

Debugging:

  • Can you add instruction tracing to your emulator?
  • Can you set breakpoints and inspect memory?
  • Can you identify which test failed when Klaus test halts?
  • Can you compare your trace with a reference emulator?

Growth:

  • Did you write the emulator without copying code?
  • Can you explain your design decisions?
  • Are you ready to extend this to NES or C64 emulation?

12. Completion Criteria

Your implementation is complete when:

  • All 56 official instructions (151 opcode variants) implemented
  • All 13 addressing modes working correctly
  • Klaus Dormann 6502 functional test passes
  • Klaus Dormann decimal test passes
  • Cycle counts match reference for all instructions
  • Page-crossing penalties implemented
  • Interrupts (IRQ, NMI, BRK) working
  • Can run Apple I BASIC and interact with it
  • Code is clean, commented, and maintainable
  • You can explain the entire implementation

Congratulations! You’ve built an emulator for one of the most influential CPUs in computing history. You now understand how millions of NES, Apple II, and C64 programs executed. This is the foundation for building complete system emulators.


This guide was expanded from CPU_ISA_ARCHITECTURE_PROJECTS.md. For the complete learning path, see the project index.