Project 6: 6502 CPU Emulator - The Heart of a Gaming Generation
Build a cycle-accurate emulator of the MOS 6502 processor - the legendary CPU whose family (including the 6510, the 6507, and the NES's Ricoh 2A03) powered the NES, Commodore 64, Apple II, and Atari 2600. This is not a toy CPU: it's real silicon translated into software.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Advanced |
| Time Estimate | 3-4 weeks |
| Main Language | C |
| Alternative Languages | Rust, C++ |
| Prerequisites | Previous CPU emulator project (P05), strong C skills, understanding of binary/hex |
| Key Topics | CPU emulation, addressing modes, status flags, cycle timing, BCD arithmetic |
1. Learning Objectives
After completing this project, you will be able to:
- Implement a complete instruction set architecture (ISA) with 56 instructions encoded as 151 official opcodes
- Understand and implement 13 different addressing modes used in real CPUs
- Handle cycle-accurate timing including page-boundary crossing penalties
- Implement a processor status register with condition flags (N, V, Z, C, B, I, D)
- Debug at the CPU level using memory dumps, register traces, and instruction stepping
- Run real 6502 programs including test suites and vintage software
- Explain how the 6502’s elegant design influenced modern processor architecture
- Pass Klaus Dormann’s famous 6502 functional test suite
2. Theoretical Foundation
2.1 The MOS 6502: A Silicon Legend
The MOS Technology 6502, released in 1975, is one of the most influential microprocessors ever created. At $25 (compared to Intel 8080’s $179), it democratized computing and powered the home computer revolution.
Systems Powered by the 6502:
- Apple I and Apple II (1976-1977)
- Commodore PET, VIC-20, and C64 (1977-1982)
- Atari 2600, 400, 800 (1977-1979)
- Nintendo Entertainment System (NES) with Ricoh 2A03 (1983)
- BBC Micro (1981)
- Atari Lynx (1989)
- Later Tamagotchi models (6502-compatible microcontroller cores)
The 6502’s design philosophy of “do more with less” led to elegant solutions that every systems programmer should understand.
2.2 Core Architecture
MOS 6502 CPU Architecture
┌─────────────────────────────────────────────────────────────────┐
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ REGISTERS │ │
│ │ ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ ┌──────────┐ │ │
│ │ │ A │ │ X │ │ Y │ │ S │ │ P │ │ PC │ │ │
│ │ │ 8b │ │ 8b │ │ 8b │ │ 8b │ │ 8b │ │ 16b │ │ │
│ │ └─────┘ └─────┘ └─────┘ └─────┘ └─────┘ └──────────┘ │ │
│ │ Acc Index X Index Y Stack Status Program Ctr │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ ALU (8-bit) │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────────┐ │ │
│ │ │ Adder │ │ Logic Ops │ │ Shift/Rotate │ │ │
│ │ │ ADC, SBC │ │ AND,OR,EOR │ │ ASL,LSR,ROL,ROR │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────────┘ │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ MEMORY BUS (16-bit address) │ │
│ │ 64KB │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
Register Set:
| Register | Name | Size | Purpose |
|---|---|---|---|
| A | Accumulator | 8-bit | Main arithmetic/logic operations |
| X | Index X | 8-bit | Array indexing, loop counter |
| Y | Index Y | 8-bit | Array indexing, loop counter |
| S | Stack Pointer | 8-bit | Points to stack (page $01) |
| P | Processor Status | 8-bit | Condition flags |
| PC | Program Counter | 16-bit | Next instruction address |
The 6502’s minimalist register set forced elegant programming techniques. Where the Intel 8080 had 7 general-purpose registers, the 6502 had just A, X, and Y - yet achieved remarkable efficiency.
2.3 The Processor Status Register
The 8-bit status register (P) contains flags that reflect the CPU state:
Status Register (P) Bit Layout
┌─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┐
│ 7 │ 6 │ 5 │ 4 │ 3 │ 2 │ 1 │ 0 │
│ N │ V │ - │ B │ D │ I │ Z │ C │
└─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┘
│ │ │ │ │ │ │ │
│ │ │ │ │ │ │ └── Carry Flag
│ │ │ │ │ │ └──────── Zero Flag
│ │ │ │ │ └────────────── Interrupt Disable
│ │ │ │ └──────────────────── Decimal Mode (BCD)
│ │ │ └────────────────────────── Break Command
│ │ └──────────────────────────────── (Always 1)
│ └────────────────────────────────────── Overflow Flag
└──────────────────────────────────────────── Negative Flag
Flag Details:
| Flag | Bit | Set When | Used For |
|---|---|---|---|
| C (Carry) | 0 | Unsigned carry out of bit 7 (ADC), no borrow (SBC, CMP), bit shifted out | Multi-byte math, comparisons |
| Z (Zero) | 1 | Result is zero | Equality tests, loop termination |
| I (Interrupt Disable) | 2 | IRQs are masked (NMI is not) | Critical sections |
| D (Decimal) | 3 | ADC/SBC use BCD arithmetic | Score counters, financial calculations |
| B (Break) | 4 | Not stored in P: pushed as 1 by BRK and PHP, as 0 by IRQ and NMI | Distinguishing BRK from IRQ in a shared handler |
| - (Unused) | 5 | Always 1 (also when pushed) | Reserved |
| V (Overflow) | 6 | Signed arithmetic overflow; BIT copies operand bit 6 here | Signed number operations |
| N (Negative) | 7 | Result bit 7 is set | Signed number tests |
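The B and bit-5 rows are the ones emulators most often get wrong, because neither bit is stored in the register itself. A minimal sketch of one common convention, assuming P is kept as a plain byte (the helper names `status_to_push` and `status_from_pull` are hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag masks matching the P register layout (bit 0 = C ... bit 7 = N). */
#define FLAG_C 0x01
#define FLAG_Z 0x02
#define FLAG_I 0x04
#define FLAG_D 0x08
#define FLAG_B 0x10
#define FLAG_U 0x20
#define FLAG_V 0x40
#define FLAG_N 0x80

/* Byte written to the stack when P is pushed.
 * PHP and BRK push with B=1; IRQ and NMI push with B=0.
 * Bit 5 is always pushed as 1. */
static uint8_t status_to_push(uint8_t p, bool from_software) {
    uint8_t pushed = p | FLAG_U;
    if (from_software)
        pushed |= FLAG_B;
    else
        pushed &= (uint8_t)~FLAG_B;
    return pushed;
}

/* Value loaded into P by PLP and RTI: B has no latch, so it is
 * discarded, and bit 5 always reads back as 1. */
static uint8_t status_from_pull(uint8_t pulled) {
    return (uint8_t)((pulled & ~FLAG_B) | FLAG_U);
}
```

PHP and BRK would call `status_to_push(p, true)`, the IRQ/NMI entry path `status_to_push(p, false)`, and PLP/RTI `status_from_pull`.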
2.4 The 13 Addressing Modes
The 6502’s addressing modes are its most distinctive feature. Each mode specifies how to find the operand:
6502 Addressing Modes Visualization
┌────────────────────────────────────────────────────────────────┐
│ IMMEDIATE ZERO PAGE ZERO PAGE,X │
│ ──────────────── ──────────────── ──────────────── │
│ LDA #$44 LDA $44 LDA $44,X │
│ │
│ Operand is the Operand at Operand at │
│ value itself addr $0044 addr $0044 + X │
│ │
│ ┌────┐ ┌────┐ ─────▶ ┌────┐ ┌────┐ ──┐ │
│ │ 44 │ │ 44 │ $44 │ ?? │ │ 44 │ │ +X │
│ └────┘ └────┘ └────┘ └────┘ │ │
│ ↓ A=44 Page 0 ─────────▶│ │
│ $44+X ▼ │
├────────────────────────────────────────────────────────────────┤
│ ABSOLUTE ABSOLUTE,X ABSOLUTE,Y │
│ ──────────────── ──────────────── ──────────────── │
│ LDA $4400 LDA $4400,X LDA $4400,Y │
│ │
│ ┌────┬────┐ ┌────┬────┐ ┌────┬────┐ │
│ │ 00 │ 44 │ │ 00 │ 44 │ +X │ 00 │ 44 │ +Y │
│ └────┴────┘ └────┴────┘ └────┴────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ Addr $4400 Addr $4400+X Addr $4400+Y │
├────────────────────────────────────────────────────────────────┤
│ INDIRECT (INDIRECT,X) (INDIRECT),Y │
│ ──────────────── ──────────────── ──────────────── │
│ JMP ($4400) LDA ($44,X) LDA ($44),Y │
│ │
│ Look up addr At $44+X, find At $44, find │
│ at $4400-4401 pointer to data pointer, add Y │
│ │
│ ┌────┬────┐ $44+X ──▶ ┌──┬──┐ $44 ──▶ ┌──┬──┐ +Y │
│ │ 00 │ 44 │ │lo│hi│ │lo│hi│ │
│ └────┴────┘ └──┴──┘ └──┴──┘ │
│ │ │ │ │
│ ▼ read addr ▼ ▼ │
│ ┌────┬────┐ Effective Effective │
│ │ xx │ yy │ Address Addr + Y │
│ └────┴────┘ │
│ │ │
│ ▼ jump here │
└────────────────────────────────────────────────────────────────┘
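The difference between the two indirect modes comes down to the order of indexing and indirection. A minimal sketch over a flat 64 KB array (the array `mem` and the function names are illustrative assumptions, not part of any API). The pointer itself lives in zero page, so its high byte wraps from $FF to $00:

```c
#include <stdbool.h>
#include <stdint.h>

/* Flat 64 KB memory for this sketch; a real emulator would go through its bus. */
static uint8_t mem[0x10000];

/* Read a 16-bit little-endian pointer from zero page. The high byte
 * comes from (zp + 1) & 0xFF, so a pointer at $FF wraps to $00. */
static uint16_t read_zp_pointer(uint8_t zp) {
    uint8_t lo = mem[zp];
    uint8_t hi = mem[(uint8_t)(zp + 1)];
    return (uint16_t)(lo | (hi << 8));
}

/* (Indirect,X): add X to the zero-page operand first (wrapping in page 0),
 * then fetch the pointer stored there. */
static uint16_t ea_indexed_indirect(uint8_t operand, uint8_t x) {
    return read_zp_pointer((uint8_t)(operand + x));
}

/* (Indirect),Y: fetch the pointer first, then add Y to the 16-bit result.
 * *page_crossed reports whether the addition carried into the high byte. */
static uint16_t ea_indirect_indexed(uint8_t operand, uint8_t y, bool *page_crossed) {
    uint16_t base = read_zp_pointer(operand);
    uint16_t addr = (uint16_t)(base + y);
    *page_crossed = (base & 0xFF00) != (addr & 0xFF00);
    return addr;
}
```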
Complete Addressing Mode Reference:
| # | Mode | Example | Operand Location | Bytes | Cycles (typical; stores and read-modify-write ops take longer) |
|---|---|---|---|---|---|
| 1 | Implied | CLC | None (in opcode) | 1 | 2 |
| 2 | Accumulator | ASL A | A register | 1 | 2 |
| 3 | Immediate | LDA #$44 | Byte after opcode | 2 | 2 |
| 4 | Zero Page | LDA $44 | Address $0044 | 2 | 3 |
| 5 | Zero Page,X | LDA $44,X | Address ($44+X) & $FF | 2 | 4 |
| 6 | Zero Page,Y | LDX $44,Y | Address ($44+Y) & $FF | 2 | 4 |
| 7 | Absolute | LDA $4400 | Address $4400 | 3 | 4 |
| 8 | Absolute,X | LDA $4400,X | Address $4400+X | 3 | 4 (+1 on page cross) |
| 9 | Absolute,Y | LDA $4400,Y | Address $4400+Y | 3 | 4 (+1 on page cross) |
| 10 | Indirect | JMP ($4400) | [Address at $4400] | 3 | 5 |
| 11 | (Indirect,X) | LDA ($44,X) | [Address at $44+X] | 2 | 6 |
| 12 | (Indirect),Y | LDA ($44),Y | [Address at $44]+Y | 2 | 5 (+1 on page cross) |
| 13 | Relative | BEQ label | PC + signed offset | 2 | 2 (+1 if taken, +1 more if page crossed) |
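Page Boundary Crossing: When indexed addressing crosses a 256-byte page boundary (e.g., $44FF + X where X=2 gives $4501), read instructions need one extra cycle. The 6502 adds the index to the low byte of the base address and, in the same cycle, reads from the address formed with the unadjusted high byte. If that addition carried, the read hit the wrong page, so the CPU spends another cycle incrementing the high byte and reading again. Store and read-modify-write instructions always take this extra cycle, whether or not a page is crossed.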
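A sketch of the index arithmetic and the resulting read timing, assuming a hypothetical `ea_absolute_x` helper that returns the crossing flag alongside the address:

```c
#include <stdbool.h>
#include <stdint.h>

/* Zero Page,X: the sum is truncated to 8 bits, so the access never leaves page 0. */
static uint16_t ea_zeropage_x(uint8_t operand, uint8_t x) {
    return (uint8_t)(operand + x);
}

/* Absolute,X: full 16-bit sum; report whether the high byte changed. */
static uint16_t ea_absolute_x(uint16_t base, uint8_t x, bool *page_crossed) {
    uint16_t addr = (uint16_t)(base + x);
    *page_crossed = (base & 0xFF00) != (addr & 0xFF00);
    return addr;
}

/* Cycle count for a read instruction such as LDA abs,X:
 * 4 base cycles, plus 1 when the index carried into the next page.
 * (Stores and read-modify-write instructions always take the extra cycle.) */
static int cycles_lda_absolute_x(uint16_t base, uint8_t x) {
    bool crossed;
    ea_absolute_x(base, x, &crossed);
    return crossed ? 5 : 4;
}
```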
2.5 Memory Map
Standard 6502 Memory Map (64KB Address Space)
$FFFF ┌─────────────────────────────────────┐
│ Vector Table │
$FFFA │ NMI, RESET, IRQ vectors │
├─────────────────────────────────────┤
$FFF9 │ │
│ ROM / Cartridge │
│ (System-dependent) │
│ │
$8000 ├─────────────────────────────────────┤
│ │
│ RAM │
│ (System-dependent) │
│ │
$0200 ├─────────────────────────────────────┤
│ Stack Page │
$0100 │ (256 bytes, $0100-$01FF) │
├─────────────────────────────────────┤
│ Zero Page │
$0000 │ (256 bytes, $0000-$00FF) │
└─────────────────────────────────────┘
Special Addresses:
$0000-$00FF Zero Page (1-byte addresses: shorter, faster instructions)
$0100-$01FF Stack (grows downward from $01FF)
$FFFA-$FFFB NMI Vector (Non-Maskable Interrupt)
$FFFC-$FFFD RESET Vector (CPU start address)
$FFFE-$FFFF IRQ/BRK Vector (Interrupt Request)
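The vectors are stored little-endian, and the stack lives entirely in page $01. A minimal sketch with a global `mem` array and stack pointer `sp` (both assumptions for illustration):

```c
#include <stdint.h>

static uint8_t mem[0x10000];
static uint8_t sp = 0xFD;  /* typical value after reset */

/* Vectors are little-endian: low byte at the even address. */
static uint16_t read_vector(uint16_t vector_addr) {
    return (uint16_t)(mem[vector_addr] | (mem[(uint16_t)(vector_addr + 1)] << 8));
}

/* Push: write to $0100 + SP, then decrement. SP is 8 bits, so it wraps
 * from $00 to $FF and the stack never leaves page $01. */
static void push(uint8_t value) {
    mem[0x0100 | sp] = value;
    sp--;
}

/* Pull: increment first, then read. */
static uint8_t pull(void) {
    sp++;
    return mem[0x0100 | sp];
}
```

On reset, for example, `pc = read_vector(0xFFFC)`.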
2.6 The Fetch-Decode-Execute Cycle
6502 Instruction Execution Cycle
┌─────────────────────────────────────────────────────────────┐
│ │
│ FETCH DECODE EXECUTE │
│ ────── ────── ────── │
│ │
│ ┌─────┐ ┌─────────┐ ┌─────────┐ │
│ │ PC │──────────▶ │ Opcode │──────▶ │ ALU │ │
│ └─────┘ │ Decode │ │ Ops │ │
│ │ └─────────┘ └─────────┘ │
│ ▼ │ │ │
│ Read Memory ▼ ▼ │
│ @ PC ┌─────────┐ ┌─────────┐ │
│ │ │Addressing│ │ Update │ │
│ ▼ │ Mode │ │ Flags │ │
│ ┌─────┐ └─────────┘ └─────────┘ │
│ │Opcode│ │ │ │
│ └─────┘ ▼ ▼ │
│ │ Fetch operand Store result │
│ ▼ (1-2 bytes) │
│ PC = PC + 1 │
│ │
│ Cycle Count: 2-7 cycles depending on instruction │
│ │
└─────────────────────────────────────────────────────────────┘
2.7 Why This Matters
For Game Development: Understanding the 6502 explains why NES games play the way they do. The limited registers forced creative solutions - sprite multiplexing, music tricks, and graphics hacks that defined an era.
For Compiler Design: The 6502’s constraints make it an excellent target for learning code generation. Every byte and cycle mattered.
For Security Research: ROP chains and exploitation techniques have roots in understanding CPU fundamentals. The 6502’s simplicity makes these concepts clear.
For Career Advancement: “I wrote a cycle-accurate 6502 emulator” is a powerful interview statement. It demonstrates deep systems understanding.
2.8 Common Misconceptions
Misconception 1: “Cycle-accurate just means counting instructions”
- Reality: Each addressing mode takes different cycles. Page crossing adds cycles. Branches take different cycles depending on whether taken and page crossing.
Misconception 2: “The stack is like a normal stack”
- Reality: The 6502 stack is fixed at page $01 ($0100-$01FF). The stack pointer is only 8 bits - it automatically wraps within this page.
Misconception 3: “BCD mode is optional”
- Reality: For accurate emulation, you must implement BCD. Games use it for score displays. The NES disabled BCD in hardware, but a proper 6502 needs it.
Misconception 4: “All 256 opcodes are valid”
- Reality: Only 151 opcodes are official. The other 105 are “undocumented”: some do useful things, and some (the KIL/JAM opcodes) lock up the CPU until reset. A handful of real games use some undocumented opcodes.
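To make Misconception 1 concrete, here is a sketch of branch timing. The `do_branch` helper and its `taken` parameter are hypothetical; the caller would evaluate the flag condition (e.g., Z==1 for BEQ):

```c
#include <stdint.h>

/* Result of a conditional branch: the new PC and the cycles it took. */
typedef struct {
    uint16_t pc;
    int cycles;
} branch_result;

/* pc_next is the address after the 2-byte branch instruction; offset is the
 * signed displacement byte. Timing: 2 cycles if not taken, 3 if taken, and
 * 4 if the target lies in a different page than pc_next. */
static branch_result do_branch(uint16_t pc_next, uint8_t offset, int taken) {
    branch_result r = { pc_next, 2 };
    if (!taken)
        return r;
    uint16_t target = (uint16_t)(pc_next + (int8_t)offset);
    r.cycles = ((target & 0xFF00) != (pc_next & 0xFF00)) ? 4 : 3;
    r.pc = target;
    return r;
}
```

A backward branch of -3 ($FD) from $0602 lands on $05FF, in the previous page, so it costs 4 cycles.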
3. Project Specification
3.1 What You Will Build
A complete 6502 emulator that:
- Implements all 56 official instructions with their addressing modes (151 opcodes)
- Provides cycle-accurate timing including page-boundary penalties
- Correctly implements the status register and all flag operations
- Handles BCD (Binary Coded Decimal) arithmetic mode
- Processes interrupts (IRQ, NMI, RESET)
- Passes the Klaus Dormann 6502 functional test suite
- Can run real 6502 programs (Apple I BASIC, games, etc.)
3.2 Functional Requirements
- Implement all 13 addressing modes correctly
- Implement all 56 official opcodes
- Pass Klaus Dormann’s 6502 functional test
- Pass the BCD test (decimal mode arithmetic)
- Count cycles accurately for each instruction
- Handle page-boundary crossing penalties
- Implement NMI, IRQ, and BRK interrupts
- Implement RESET initialization
- Provide debugging output (register dumps, memory inspection)
3.3 Non-Functional Requirements
- Deterministic execution (same input = same output)
- Clear separation between CPU core and memory/IO
- Configurable clock speed or step mode
- Memory callback system for future system integration (NES, C64, etc.)
3.4 Real World Outcome
When complete, you will have a working 6502 emulator:
$ ./emu6502 test_suite/6502_functional_test.bin
Loading 6502_functional_test.bin (64 KB image) at $0000...
Start address: $0400
Starting execution...
Running Klaus Dormann's 6502 Functional Test Suite...
Test Group 1: Load/Store Operations
[OK] LDA immediate
[OK] LDA zero page
[OK] LDA zero page,X
[OK] LDA absolute
[OK] LDA absolute,X
[OK] LDA absolute,Y
[OK] LDA (indirect,X)
[OK] LDA (indirect),Y
[OK] LDX/LDY all modes
[OK] STA/STX/STY all modes
Test Group 2: Transfer Operations
[OK] TAX, TAY, TXA, TYA
[OK] TSX, TXS
Test Group 3: Stack Operations
[OK] PHA, PHP
[OK] PLA, PLP
Test Group 4: Arithmetic
[OK] ADC all modes (binary)
[OK] SBC all modes (binary)
[OK] ADC/SBC (decimal mode)
[OK] CMP, CPX, CPY
Test Group 5: Logic
[OK] AND, ORA, EOR all modes
[OK] BIT zero page, absolute
Test Group 6: Shift/Rotate
[OK] ASL, LSR, ROL, ROR
Test Group 7: Branches
[OK] BCC, BCS, BEQ, BNE
[OK] BMI, BPL, BVC, BVS
[OK] Page crossing timing
Test Group 8: Jumps/Calls
[OK] JMP absolute, indirect
[OK] JSR, RTS
[OK] BRK, RTI
Test Group 9: Flags
[OK] CLC, SEC, CLI, SEI
[OK] CLD, SED, CLV
[OK] All flag interactions
═══════════════════════════════════════════════════════════════════
ALL TESTS PASSED!
═══════════════════════════════════════════════════════════════════
Executed: 26,765,149 cycles
Time: 2.3 seconds
Effective speed: 11.6 MHz (6.5x the NES's 1.79 MHz)
$ ./emu6502 roms/apple1_basic.bin
Apple I BASIC loaded. Emulating at 1 MHz...
E000: 4C B0 E2 JMP $E2B0
Apple I BASIC 1.0
Ready
> PRINT 2 + 2
4
> 10 FOR I = 1 TO 10
> 20 PRINT I * I
> 30 NEXT I
> RUN
1
4
9
16
25
36
49
64
81
100
Ready
Debugging Output Example:
$ ./emu6502 --debug --step test.bin
6502 Debugger v1.0
Loaded: test.bin at $0600
Reset vector: $0600
PC=$0600 A=$00 X=$00 Y=$00 S=$FD P=00100100 (nv-bdIzc)
> s
$0600: A9 44 LDA #$44 ; A = $44
Cycles: 2
PC=$0602 A=$44 X=$00 Y=$00 S=$FD P=00100100 (nv-bdIzc)
> s
$0602: 85 10 STA $10 ; Store A to zero page $10
Cycles: 3
PC=$0604 A=$44 X=$00 Y=$00 S=$FD P=00100100 (nv-bdIzc)
> m 10 20
$0010: 44 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 D...............
> r
A=$44 X=$00 Y=$00 S=$FD P=$24
PC=$0604
Flags: NV_BDIZC
00100100
4. Solution Architecture
4.1 High-Level Design
6502 Emulator Architecture
┌─────────────────────────────────────────────────────────────────┐
│ MAIN LOOP │
│ │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ CPU Core │ │
│ │ ┌─────────┐ ┌─────────────┐ ┌────────────────┐ │ │
│ │ │Registers│ │ Decoder │ │ Executor │ │ │
│ │ │ A,X,Y │ │ 151 ops │ │ ALU, Flags │ │ │
│ │ │ S,P,PC │ │ │ │ Memory R/W │ │ │
│ │ └─────────┘ └─────────────┘ └────────────────┘ │ │
│ │ ▲ │ │ │ │
│ │ │ ▼ ▼ │ │
│ │ │ ┌───────────────────────────┐ │ │
│ │ └────────│ Addressing Modes │ │ │
│ │ │ 13 modes, operand fetch │ │ │
│ │ └───────────────────────────┘ │ │
│ └───────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ Memory Bus │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌──────────────┐ │ │
│ │ │ read_byte │ │ write_byte │ │ Callbacks │ │ │
│ │ │ (address) │ │(addr, data) │ │ (for IO/ROM) │ │ │
│ │ └─────────────┘ └─────────────┘ └──────────────┘ │ │
│ └───────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────────────────────────────────────────┐ │
│ │ Memory Subsystem │ │
│ │ ┌───────┐ ┌───────┐ ┌────────────────┐ │ │
│ │ │ RAM │ │ ROM │ │ I/O Devices │ │ │
│ │ │ 64KB │ │(mapped)│ │ (callbacks) │ │ │
│ │ └───────┘ └───────┘ └────────────────┘ │ │
│ └───────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
4.2 Key Components
- CPU Core: Registers, fetch-decode-execute cycle, cycle counting
- Decoder: 256-entry lookup table mapping opcodes to handlers
- Addressing Mode Calculator: Computes effective addresses for all 13 modes
- ALU: Arithmetic, logic, and shift operations with flag updates
- Memory Bus: Abstraction layer with read/write callbacks
- Interrupt Handler: IRQ, NMI, BRK, and RESET logic
4.3 Data Structures
#include <stdbool.h>
#include <stdint.h>
// CPU State Structure
typedef struct {
// Registers
uint8_t a; // Accumulator
uint8_t x; // Index X
uint8_t y; // Index Y
uint8_t s; // Stack pointer
uint8_t p; // Processor status
uint16_t pc; // Program counter
// Internal state
uint64_t cycles; // Total cycles executed
bool irq_pending;
bool nmi_pending;
bool nmi_edge; // NMI is edge-triggered
// Memory interface
uint8_t (*read)(uint16_t addr, void *ctx);
void (*write)(uint16_t addr, uint8_t val, void *ctx);
void *mem_ctx;
} cpu_6502;
// Status register bit masks
#define FLAG_C 0x01 // Carry
#define FLAG_Z 0x02 // Zero
#define FLAG_I 0x04 // Interrupt disable
#define FLAG_D 0x08 // Decimal mode
#define FLAG_B 0x10 // Break
#define FLAG_U 0x20 // Unused (always 1)
#define FLAG_V 0x40 // Overflow
#define FLAG_N 0x80 // Negative
// Addressing modes enumeration
typedef enum {
ADDR_IMP, // Implied
ADDR_ACC, // Accumulator
ADDR_IMM, // Immediate
ADDR_ZP, // Zero Page
ADDR_ZPX, // Zero Page,X
ADDR_ZPY, // Zero Page,Y
ADDR_ABS, // Absolute
ADDR_ABX, // Absolute,X
ADDR_ABY, // Absolute,Y
ADDR_IND, // Indirect
ADDR_IZX, // (Indirect,X)
ADDR_IZY, // (Indirect),Y
ADDR_REL, // Relative
} addr_mode;
// Instruction descriptor
typedef struct {
const char *mnemonic;
addr_mode mode;
uint8_t bytes;
uint8_t cycles;
bool page_penalty; // Extra cycle on page cross
void (*execute)(cpu_6502 *cpu, uint16_t addr);
} instruction;
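A minimal memory context that satisfies the `read`/`write` callbacks in `cpu_6502`, assuming a flat 64 KB RAM; `flat_memory`, `rom_protected_write`, and `bus_read16` are illustrative names, not an existing API:

```c
#include <stdint.h>

/* A flat 64 KB RAM used as the memory context for the CPU callbacks. */
typedef struct {
    uint8_t ram[0x10000];
} flat_memory;

/* Signatures match the read/write function pointers in cpu_6502. */
static uint8_t flat_read(uint16_t addr, void *ctx) {
    flat_memory *m = ctx;
    return m->ram[addr];
}

static void flat_write(uint16_t addr, uint8_t val, void *ctx) {
    flat_memory *m = ctx;
    m->ram[addr] = val;
}

/* Example mapping: writes to $8000-$FFFF are ignored, as if that range were ROM. */
static void rom_protected_write(uint16_t addr, uint8_t val, void *ctx) {
    if (addr >= 0x8000)
        return;
    flat_write(addr, val, ctx);
}

/* Read a 16-bit little-endian word through the callback interface. */
static uint16_t bus_read16(uint8_t (*read)(uint16_t, void *), void *ctx, uint16_t addr) {
    uint8_t lo = read(addr, ctx);
    uint8_t hi = read((uint16_t)(addr + 1), ctx);
    return (uint16_t)(lo | (hi << 8));
}
```

Wiring it up would look like `cpu.read = flat_read; cpu.write = rom_protected_write; cpu.mem_ctx = &mem;`. Swapping the write callback is how a system layer (NES mapper, C64 banking) can change behaviour without touching the CPU core.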
4.4 Algorithm Overview
ALGORITHM: cpu_step (Execute One Instruction)
1. Check for pending interrupts
- If NMI pending (edge-triggered), handle_nmi()
- Else if IRQ pending and I flag clear, handle_irq()
2. FETCH opcode
opcode = read(PC)
PC++
3. DECODE instruction
inst = instruction_table[opcode]
4. CALCULATE effective address (based on addressing mode)
switch(inst.mode):
ADDR_IMM: addr = PC; PC++
ADDR_ZP: addr = read(PC); PC++
ADDR_ZPX: addr = (read(PC) + X) & 0xFF; PC++
ADDR_ABS: addr = read16(PC); PC += 2
// ... etc for all 13 modes
5. Check page boundary crossing
if (inst.page_penalty && crossed_page)
cycles++
6. EXECUTE instruction
inst.execute(cpu, addr)
7. Add base cycle count
cycles += inst.cycles
8. Return cycles consumed
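The steps above can be sketched as a runnable step function. This version handles only four opcodes and uses a `switch` instead of the dispatch table, purely to stay short. The `cpu` struct with embedded memory is a simplification, and interrupts (step 1) and page crossing (step 5) are omitted because none of these opcodes need them:

```c
#include <stdint.h>

#define FLAG_Z 0x02
#define FLAG_N 0x80

typedef struct {
    uint8_t a, x, y, s, p;
    uint16_t pc;
    uint64_t cycles;
    uint8_t mem[0x10000];
} cpu;

static void set_nz(cpu *c, uint8_t v) {
    c->p &= (uint8_t)~(FLAG_N | FLAG_Z);
    if (v == 0) c->p |= FLAG_Z;
    if (v & 0x80) c->p |= FLAG_N;
}

/* Executes one instruction and returns the cycles it used, or -1 for an
 * opcode this sketch does not implement. */
static int cpu_step(cpu *c) {
    uint8_t opcode = c->mem[c->pc++];          /* FETCH */
    int cycles;
    switch (opcode) {                          /* DECODE + EXECUTE */
    case 0xA9:                                 /* LDA #imm */
        c->a = c->mem[c->pc++];                /* operand is the next byte */
        set_nz(c, c->a);
        cycles = 2;
        break;
    case 0xA5:                                 /* LDA zp */
        c->a = c->mem[c->mem[c->pc++]];
        set_nz(c, c->a);
        cycles = 3;
        break;
    case 0x85:                                 /* STA zp (flags unchanged) */
        c->mem[c->mem[c->pc++]] = c->a;
        cycles = 3;
        break;
    case 0xEA:                                 /* NOP */
        cycles = 2;
        break;
    default:
        return -1;
    }
    c->cycles += (uint64_t)cycles;
    return cycles;
}
```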
ALGORITHM: ADC (Add with Carry)
1. Fetch operand value
operand = read(effective_address)
2. Check decimal mode
if (P & FLAG_D):
// BCD addition (complex)
result = bcd_add(A, operand, carry)
else:
// Binary addition
result = A + operand + (P & FLAG_C)
3. Update flags
C = (result > 0xFF)
Z = ((result & 0xFF) == 0)
N = (result & 0x80)
V = ~(A ^ operand) & (A ^ result) & 0x80
4. Store result
A = result & 0xFF
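A sketch of binary-mode ADC, plus SBC expressed as ADC of the inverted operand, assuming a trimmed-down `alu_state` struct holding only A and P (decimal mode is left out):

```c
#include <stdint.h>

#define FLAG_C 0x01
#define FLAG_Z 0x02
#define FLAG_V 0x40
#define FLAG_N 0x80

typedef struct {
    uint8_t a, p;
} alu_state;

/* Binary-mode ADC: A = A + M + C, with C, Z, V, N updated. */
static void adc_binary(alu_state *s, uint8_t m) {
    unsigned sum = (unsigned)s->a + m + (s->p & FLAG_C);
    uint8_t result = (uint8_t)sum;
    s->p &= (uint8_t)~(FLAG_C | FLAG_Z | FLAG_V | FLAG_N);
    if (sum > 0xFF) s->p |= FLAG_C;
    /* V: both inputs share a sign that differs from the result's sign. */
    if (~(s->a ^ m) & (s->a ^ result) & 0x80) s->p |= FLAG_V;
    if (result == 0) s->p |= FLAG_Z;
    if (result & 0x80) s->p |= FLAG_N;
    s->a = result;
}

/* Binary-mode SBC is ADC of the one's complement: A - M - (1 - C). */
static void sbc_binary(alu_state *s, uint8_t m) {
    adc_binary(s, (uint8_t)~m);
}
```

For example, $50 + $50 leaves A = $A0 with V and N set and C clear, while $00 - $01 with C=1 leaves A = $FF with C clear.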
5. Implementation Guide
5.1 Development Environment Setup
# Install required tools
# macOS
brew install gcc make
# Ubuntu/Debian
sudo apt install build-essential
# Create project structure
mkdir -p 6502emu/{src,include,tests,roms}
cd 6502emu
5.2 Project Structure
6502emu/
├── src/
│ ├── main.c # Entry point, CLI, debugging
│ ├── cpu.c # CPU core implementation
│ ├── memory.c # Memory bus and callbacks
│ ├── opcodes.c # Instruction implementations
│ └── disasm.c # Disassembler for debugging
├── include/
│ ├── cpu.h # CPU state and functions
│ ├── memory.h # Memory interface
│ └── types.h # Common type definitions
├── tests/
│ ├── 6502_functional_test.bin # Klaus Dormann test
│ ├── decimal_test.bin # BCD test
│ └── test_harness.c # Test runner
├── roms/
│ └── apple1_basic.bin # Sample program
├── Makefile
└── README.md
5.3 The Core Question You’re Answering
“How does a CPU execute instructions, and what makes cycle-accurate emulation challenging?”
This project forces you to understand:
- How addressing modes allow a small instruction set to be powerful
- Why cycle timing matters for system emulation
- How status flags enable conditional logic
- Why the 6502’s design influenced 40 years of CPU architecture
5.4 Concepts You Must Understand First
Before writing code, verify your understanding:
Addressing Modes:
- Q: What’s the difference between (Indirect,X) and (Indirect),Y?
- A: (Indirect,X) adds X before the indirection; (Indirect),Y adds Y after
Carry Flag:
- Q: How does SBC use the carry flag?
- A: SBC subtracts the inverse of carry (borrow). A - M - (1-C)
Overflow Flag:
- Q: When is overflow set?
- A: When signed arithmetic produces a result outside -128 to 127
Page Crossing:
- Q: Why does crossing a page boundary cost a cycle?
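- A: The 6502 adds the index to the low byte first and reads using the unadjusted high byte; if the addition carried, it needs one more cycle to fix the high byte and re-read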
Stack Operations:
- Q: Where is the stack located?
- A: Page $01 ($0100-$01FF), pointer is only 8 bits
5.5 Questions to Guide Your Design
CPU Core:
- How will you represent the opcode-to-instruction mapping?
- Will you use function pointers, a switch statement, or computed goto?
- How will you track cycle counts for page-crossing penalties?
Addressing Modes:
- Will you calculate the effective address once or inline in each instruction?
- How will you handle zero-page wrapping (e.g., $FF + X with X=1 wraps to $00)?
- How will you implement the JMP indirect bug (wraps within page)?
Flags:
- How will you calculate the overflow flag correctly?
- Will you set flags in each instruction or use helper functions?
- How will you handle flags for read-modify-write instructions?
BCD Mode:
- Will you implement BCD fully or skip it for NES compatibility?
- How will you test your BCD implementation?
5.6 Thinking Exercise
Before writing code, trace through this program by hand:
.org $0600
start: LDA #$50 ; A = $50
STA $10 ; Store at zero page $10
LDA #$30 ; A = $30
CLC ; Clear carry
ADC $10 ; A = A + [$10] = $30 + $50 = $80
BPL done ; Branch if positive (N=0)
LDA #$FF ; This should execute (N=1)
done: STA $20 ; Store result
BRK ; Stop
For each instruction, write:
- The opcode bytes
- The cycle count
- Register values after execution
- Flag values after execution
Your trace should look like:
| Address | Bytes | Instruction | A | X | Y | P (NV-BDIZC) | Cycles |
|---|---|---|---|---|---|---|---|
| $0600 | A9 50 | LDA #$50 | $50 | $00 | $00 | 00100100 | 2 |
| $0602 | 85 10 | STA $10 | $50 | $00 | $00 | 00100100 | 3 |
| … | … | … | … | … | … | … | … |
5.7 Hints in Layers
Use these progressive hints only when stuck.
Hint 1: Starting Structure
Begin with the CPU state and basic fetch:
#include <stdint.h>
// Start simple - just registers and memory access
typedef struct {
uint8_t a, x, y, s, p;
uint16_t pc;
uint8_t memory[65536]; // Simple array for now
} cpu_6502;
void cpu_reset(cpu_6502 *cpu) {
cpu->a = cpu->x = cpu->y = 0;
cpu->s = 0xFD; // Stack pointer after reset
cpu->p = 0x24; // I flag set, unused bit set
cpu->pc = cpu->memory[0xFFFC] | (cpu->memory[0xFFFD] << 8);
}
Hint 2: Opcode Dispatch
Create a dispatch table:
typedef void (*opcode_handler)(cpu_6502 *cpu);
opcode_handler dispatch[256];
void init_dispatch(void) {
// Initialize all to illegal opcode handler
for (int i = 0; i < 256; i++)
dispatch[i] = op_illegal;
// Fill in real opcodes
dispatch[0xA9] = op_lda_imm; // LDA #
dispatch[0xA5] = op_lda_zp; // LDA zp
dispatch[0xB5] = op_lda_zpx; // LDA zp,X
// ... 148 more
}
int cpu_step(cpu_6502 *cpu) {
uint8_t opcode = cpu->memory[cpu->pc++];
dispatch[opcode](cpu);
return cycle_table[opcode];
}
Hint 3: Addressing Mode Helpers
Factor out addressing mode calculations:
uint16_t addr_immediate(cpu_6502 *cpu) {
return cpu->pc++;
}
uint16_t addr_zeropage(cpu_6502 *cpu) {
return cpu->memory[cpu->pc++];
}
uint16_t addr_zeropage_x(cpu_6502 *cpu) {
return (cpu->memory[cpu->pc++] + cpu->x) & 0xFF;
}
uint16_t addr_absolute(cpu_6502 *cpu) {
uint16_t lo = cpu->memory[cpu->pc++];
uint16_t hi = cpu->memory[cpu->pc++];
return (hi << 8) | lo;
}
Hint 4: Flag Update Helpers
Create reusable flag update functions:
void set_nz(cpu_6502 *cpu, uint8_t value) {
cpu->p &= ~(FLAG_N | FLAG_Z);
if (value == 0) cpu->p |= FLAG_Z;
if (value & 0x80) cpu->p |= FLAG_N;
}
void op_lda_imm(cpu_6502 *cpu) {
cpu->a = cpu->memory[cpu->pc++];
set_nz(cpu, cpu->a);
}
void op_lda_zp(cpu_6502 *cpu) {
cpu->a = cpu->memory[addr_zeropage(cpu)];
set_nz(cpu, cpu->a);
}
Hint 5: ADC Implementation
The most complex instruction - ADC with both binary and BCD modes:
void do_adc(cpu_6502 *cpu, uint8_t value) {
if (cpu->p & FLAG_D) {
// BCD mode
uint16_t lo = (cpu->a & 0x0F) + (value & 0x0F) + (cpu->p & FLAG_C);
uint16_t hi = (cpu->a & 0xF0) + (value & 0xF0);
if (lo > 0x09) {
lo += 0x06;
hi += 0x10;
}
// Overflow check happens before decimal adjust
uint8_t overflow = ~(cpu->a ^ value) & (cpu->a ^ (hi + (lo & 0x0F))) & 0x80;
if (hi > 0x90) hi += 0x60;
cpu->p &= ~(FLAG_C | FLAG_V | FLAG_Z | FLAG_N);
if (hi > 0xFF) cpu->p |= FLAG_C;
if (overflow) cpu->p |= FLAG_V;
cpu->a = ((hi + (lo & 0x0F)) & 0xFF);
if (cpu->a == 0) cpu->p |= FLAG_Z;
if (cpu->a & 0x80) cpu->p |= FLAG_N;
} else {
// Binary mode
uint16_t sum = cpu->a + value + (cpu->p & FLAG_C);
cpu->p &= ~(FLAG_C | FLAG_V | FLAG_Z | FLAG_N);
if (sum > 0xFF) cpu->p |= FLAG_C;
if (~(cpu->a ^ value) & (cpu->a ^ sum) & 0x80) cpu->p |= FLAG_V;
cpu->a = sum & 0xFF;
if (cpu->a == 0) cpu->p |= FLAG_Z;
if (cpu->a & 0x80) cpu->p |= FLAG_N;
}
}
Hint 6: Page Boundary Crossing
Track page boundaries for cycle-accurate timing:
uint16_t addr_absolute_x(cpu_6502 *cpu, bool *page_crossed) {
uint16_t base = addr_absolute(cpu);
uint16_t addr = base + cpu->x;
*page_crossed = ((base & 0xFF00) != (addr & 0xFF00));
return addr;
}
void op_lda_absx(cpu_6502 *cpu) {
bool page_crossed;
uint16_t addr = addr_absolute_x(cpu, &page_crossed);
cpu->a = cpu->memory[addr];
set_nz(cpu, cpu->a);
cpu->cycles += page_crossed ? 5 : 4; // Penalty for page cross (needs a cycles field, as in section 4.3)
}
5.8 The Interview Questions They’ll Ask
Basic Understanding
- “What is the 6502’s register set and why is it so small?”
- Good Answer: A, X, Y (8-bit), plus S, P, PC. Designed for low transistor count (3510). Forced use of zero page as “registers.”
- “Explain the difference between zero page and absolute addressing.”
- Good Answer: Zero page uses 1-byte address ($00-$FF), saving a byte and a cycle. It’s a performance optimization for frequently-accessed data.
- “How does the 6502 stack work?”
- Good Answer: Fixed at $0100-$01FF. Stack pointer is 8-bit, automatically OR’d with $0100. Grows downward. Push decrements, pull increments.
Technical Details
- “What is the JMP indirect bug?”
- Good Answer: JMP ($xxFF) reads the low byte of the target from $xxFF but the high byte from $xx00 instead of $(xx+1)00, because the 6502 increments only the low byte of the pointer and never carries into the high byte.
- “How does the overflow flag work?”
- Good Answer: Set when signed arithmetic produces an invalid result. For ADC: when inputs have same sign but result has different sign.
- “Why does page boundary crossing add a cycle?”
- Good Answer: The 6502 optimistically reads from the same page. If wrong page, it discards and re-reads from correct page.
- “How is SBC implemented internally?”
- Good Answer: In binary mode, SBC A, M is effectively ADC A, (M ^ 0xFF): the same adder circuit with an inverted operand, with C acting as an inverted borrow.
Problem-Solving
- “Your emulator passes simple tests but fails complex ROMs. Debugging steps?”
- Good Answer:
- Run Klaus Dormann’s test suite - identifies failing instructions
- Add instruction trace logging
- Compare trace with reference emulator
- Binary search to find first divergence
- Check edge cases: page boundaries, flag handling, cycle timing
- “How would you make your emulator cycle-accurate?”
- Good Answer: Track cycles per instruction including page-crossing penalties. For sub-instruction accuracy, split instructions into micro-operations per cycle.
- “What undocumented opcodes would you implement for NES compatibility?”
- Good Answer: LAX, SAX, DCP, ISB, SLO, RLA, SRE, RRA at minimum. These are used by some games. Some undocumented ops have unstable behavior.
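The JMP indirect bug from the technical questions above is a one-line difference in how the pointer's high-byte address is formed. A sketch against a flat `mem` array (an assumption); the CMOS 65C02 fixes this behaviour, so only model it for NMOS targets:

```c
#include <stdint.h>

static uint8_t mem[0x10000];

/* NMOS 6502 JMP (ind): the pointer's high byte is fetched from the same page
 * as its low byte, because only the low byte of the pointer is incremented. */
static uint16_t jmp_indirect_target(uint16_t ptr) {
    uint16_t hi_addr = (uint16_t)((ptr & 0xFF00) | ((ptr + 1) & 0x00FF));
    return (uint16_t)(mem[ptr] | (mem[hi_addr] << 8));
}
```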
5.9 Books That Will Help
| Topic | Book | Chapter/Section | Why It Helps |
|---|---|---|---|
| 6502 ISA Reference | “Programming the 6502” by Rodnay Zaks | All | Definitive 6502 reference from 1980 |
| CPU Design | “Computer Organization and Design” by Patterson & Hennessy | Ch. 4-5 | CPU internals and pipelining concepts |
| Addressing Modes | “Computer Systems: A Programmer’s Perspective” by Bryant | Ch. 3.4 | How ISAs encode operands |
| Status Flags | “Write Great Code, Vol. 2” by Randall Hyde | Ch. 5 | Condition codes and branching |
| Emulation Techniques | “Game Boy Emulation in JavaScript” (online) | CPU chapter | Practical emulator implementation |
| 6502 History | “The COMMODORE 64 in Action” | Ch. 1 | Historical context and design philosophy |
Online Resources:
- 6502.org - Community, documentation, forums
- NesDev Wiki - Ricoh 2A03 (NES CPU) specifics
- Klaus Dormann’s Test Suite
- Visual 6502 - Transistor-level simulation
5.10 Implementation Phases
Phase 1: Foundation (Days 1-3)
- Set up project structure
- Implement CPU state struct
- Implement simple memory array
- Add reset vector handling
- Test with simple NOP loop
Milestone: CPU resets to correct address and executes NOP
Phase 2: Basic Instructions (Days 4-7)
- Implement LDA, LDX, LDY (all addressing modes)
- Implement STA, STX, STY
- Implement transfer instructions (TAX, TAY, etc.)
- Add flag update helpers
Milestone: Can load and store values, pass load/store tests
Phase 3: Arithmetic (Days 8-12)
- Implement ADC (binary mode first)
- Implement SBC
- Implement compare instructions (CMP, CPX, CPY)
- Implement INC, DEC, INX, INY, DEX, DEY
Milestone: Can do basic math, pass arithmetic tests
Phase 4: Logic and Shift (Days 13-15)
- Implement AND, ORA, EOR
- Implement ASL, LSR, ROL, ROR
- Implement BIT
Milestone: Pass logic/shift tests
Phase 5: Branches and Jumps (Days 16-18)
- Implement all branch instructions (BCC, BCS, BEQ, BNE, BMI, BPL, BVC, BVS)
- Implement JMP (absolute and indirect with bug!)
- Implement JSR, RTS
Milestone: Can run branching code, pass branch tests
Phase 6: Stack and Interrupts (Days 19-21)
- Implement stack operations (PHA, PHP, PLA, PLP)
- Implement BRK, RTI
- Implement IRQ and NMI handlers
- Implement flag instructions (CLC, SEC, etc.)
Milestone: Full interrupt support, pass interrupt tests
Phase 7: BCD Mode (Days 22-24)
- Implement BCD addition in ADC
- Implement BCD subtraction in SBC
- Run BCD test suite
Milestone: Pass decimal mode tests
Phase 8: Polish (Days 25-28)
- Add cycle-accurate timing with page crossing
- Add debugging features
- Run Klaus Dormann full test suite
- Fix any failing tests
- Test with real programs
Milestone: Pass all tests, run Apple BASIC
5.11 Key Implementation Decisions
- Opcode Dispatch: Use a 256-entry function pointer table. A switch statement usually compiles to a jump table and performs similarly; computed goto (a GCC/Clang extension) can be faster but is not portable standard C.
- Memory Model: Start with a simple 64KB array. Refactor to callbacks later for NES/C64 bank switching.
- Flag Handling: Create helper functions like set_nz() and set_carry(). Inline the binary-mode path and call a separate helper for BCD.
- BCD Implementation: Implement it. Even if targeting the NES (whose 2A03 disables BCD), it teaches ALU concepts.
- Undocumented Opcodes: Implement at least the stable ones (LAX, SAX, etc.) for real ROM compatibility.
- Cycle Counting: Return cycles from each instruction. Add page-crossing detection for indexed modes.
6. Testing Strategy
6.1 Unit Testing
Test each addressing mode in isolation:
#include <assert.h>
#include <stdio.h>
void test_lda_immediate(void) {
cpu_6502 cpu;
cpu_init(&cpu);
// LDA #$42
cpu.memory[0x0600] = 0xA9;
cpu.memory[0x0601] = 0x42;
cpu.pc = 0x0600;
cpu_step(&cpu);
assert(cpu.a == 0x42);
assert(cpu.pc == 0x0602);
assert((cpu.p & FLAG_Z) == 0);
assert((cpu.p & FLAG_N) == 0);
printf("test_lda_immediate PASSED\n");
}
void test_lda_zeropage_x_wrap(void) {
cpu_6502 cpu;
cpu_init(&cpu);
// LDA $FF,X with X=$02 should wrap to $01
cpu.memory[0x0600] = 0xB5;
cpu.memory[0x0601] = 0xFF;
cpu.memory[0x0001] = 0x77; // Wrapped address
cpu.x = 0x02;
cpu.pc = 0x0600;
cpu_step(&cpu);
assert(cpu.a == 0x77);
printf("test_lda_zeropage_x_wrap PASSED\n");
}
6.2 Integration Testing
Use Klaus Dormann’s test suite:
# Download the test
wget https://github.com/Klaus2m5/6502_65C02_functional_tests/raw/master/bin_files/6502_functional_test.bin
# Run it
./emu6502 --test 6502_functional_test.bin
The test is a self-contained program that exercises every documented opcode and addressing mode. It reports a failure by trapping: at the failing check it jumps or branches to itself, so the PC stops advancing.
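Because a trap is an instruction that targets itself, the harness only needs to notice that the PC stopped changing. Here is a sketch of that check. The `trap_detector` type and `trap_detected()` function are hypothetical names, and the check assumes you call it with the PC taken before each instruction executes. The suite ends in such a trap whether it passes or fails; only the address differs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: remembers the PC seen before the previous instruction. */
typedef struct {
    uint16_t last_pc;
    bool has_last;
} trap_detector;

/* Call with the PC *before* each instruction. Returns true when the same PC
   appears twice in a row, meaning the previous instruction jumped or
   branched to itself. */
static bool trap_detected(trap_detector *t, uint16_t pc) {
    bool trapped = t->has_last && t->last_pc == pc;
    t->last_pc = pc;
    t->has_last = true;
    return trapped;
}
```

A run loop might call `trap_detected(&t, cpu.pc)` before each `cpu_step(&cpu)`. It should stop when the check returns true, or after a generous step limit, and then compare `cpu.pc` with the success address in the test's listing (.lst) file.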
6.3 Critical Test Cases
- Zero Page Wrapping:
  - LDA $FF,X with X=$01 should read from $00, not $100
- JMP Indirect Bug:
  - JMP ($10FF) should read the low byte from $10FF and the high byte from $1000, not $1100
- BRK/RTI:
  - BRK pushes PC+2, not PC+1
  - B flag set in the copy of P pushed by BRK/PHP, ignored when RTI/PLP pull P
- Overflow Flag (ADC with C clear beforehand):
  - $50 + $50 = $A0 (sets V: both operands are positive but the result is negative)
  - $D0 + $D0 = $A0 (sets C, clears V)
- Decimal Mode:
  - $99 + $01 with C=0 gives $00 and sets C (BCD)
  - $00 - $01 with C=1 gives $99 and clears C (BCD)
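The two address-wrapping cases can be checked with small pure helpers before they are wired into the CPU. This sketch assumes a flat 64KB memory array, and `zp_indexed_addr` and `jmp_indirect_target` are hypothetical names. The JMP helper reproduces the NMOS page-wrap bug, which the 65C02 fixes.

```c
#include <stdint.h>

/* Zero page,X / zero page,Y: the sum stays inside page zero, so $FF + $01 -> $00. */
static uint8_t zp_indexed_addr(uint8_t base, uint8_t index) {
    return (uint8_t)(base + index);
}

/* NMOS JMP ($xxFF) bug: the high byte is fetched from the start of the same
   page instead of the next page, because the pointer's low byte wraps alone. */
static uint16_t jmp_indirect_target(const uint8_t mem[0x10000], uint16_t ptr) {
    uint16_t hi_addr = (uint16_t)((ptr & 0xFF00) | ((ptr + 1) & 0x00FF));
    return (uint16_t)(mem[ptr] | (mem[hi_addr] << 8));
}
```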
6.4 Debugging Techniques
```shell
# Add instruction tracing
./emu6502 --trace test.bin
```
Sample output:
```text
$0600: A9 44 LDA #$44 A=44 X=00 Y=00 P=00 S=FD
$0602: 85 10 STA $10 A=44 X=00 Y=00 P=00 S=FD
$0604: A9 30 LDA #$30 A=30 X=00 Y=00 P=00 S=FD
```
Compare against a reference emulator (py65 installs an interactive monitor, `py65mon`):
```shell
pip install py65
py65mon
```
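Fixed-width columns let you diff your trace against a reference log line by line. This sketch assumes the raw-byte string and the disassembly text are produced elsewhere in your emulator; the `trace_regs` struct and `format_trace()` are hypothetical names. Registers are printed after the instruction executes, in the same order as the sample above. Output from a reference emulator will need to be normalized to the same columns before diffing.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical snapshot of the registers after an instruction executes. */
typedef struct {
    uint8_t a, x, y, p, s;
} trace_regs;

/* Writes one padded trace line into buf; returns snprintf's result. */
static int format_trace(char *buf, size_t len, uint16_t pc,
                        const char *bytes_hex, const char *disasm,
                        const trace_regs *r) {
    return snprintf(buf, len,
                    "$%04X: %-8s %-12s A=%02X X=%02X Y=%02X P=%02X S=%02X",
                    pc, bytes_hex, disasm, r->a, r->x, r->y, r->p, r->s);
}
```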
7. Common Pitfalls & Debugging
Problem 1: ADC/SBC produce wrong results
- Root Cause: Incorrect carry handling or overflow calculation
- Fix: Implement SBC as ADC with the operand's bits inverted (`~M`). The carry flag then acts as an inverted borrow: C=1 means "no borrow".
- Quick Test: `$00 - $01` with C=1 should give $FF with C=0
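Here is a sketch of binary-mode ADC and SBC on plain values, with the status byte passed by pointer. The function names are hypothetical. SBC reuses ADC by inverting the operand. The overflow check tests whether both inputs have the same sign while the result's sign differs.

```c
#include <stdint.h>

enum { FLAG_C = 0x01, FLAG_Z = 0x02, FLAG_V = 0x40, FLAG_N = 0x80 };

/* Binary-mode ADC: returns the new A and updates C, Z, V, N in *p. */
static uint8_t adc_binary(uint8_t a, uint8_t m, uint8_t *p) {
    unsigned sum = (unsigned)a + m + (*p & FLAG_C);
    uint8_t result = (uint8_t)sum;
    *p &= (uint8_t)~(FLAG_C | FLAG_Z | FLAG_V | FLAG_N);
    if (sum > 0xFF) *p |= FLAG_C;                     /* unsigned carry out */
    if (result == 0) *p |= FLAG_Z;
    if (~(a ^ m) & (a ^ result) & 0x80) *p |= FLAG_V; /* signed overflow */
    *p |= result & FLAG_N;
    return result;
}

/* Binary-mode SBC: A - M - (1 - C) == A + ~M + C, so reuse ADC. */
static uint8_t sbc_binary(uint8_t a, uint8_t m, uint8_t *p) {
    return adc_binary(a, (uint8_t)~m, p);
}
```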
Problem 2: Zero page indexed addressing reads from wrong address
- Root Cause: Not masking to 8 bits
- Fix: `addr = (base + cpu->x) & 0xFF`
Problem 3: BRK doesn’t work correctly
- Root Cause: Wrong PC value pushed, or B flag handling
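- Fix: BRK pushes PC+2 (the address of the BRK opcode plus 2, skipping its padding byte). B is set only in the copy of P pushed to the stack, never in the live status register.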
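Here is a sketch of the BRK sequence. It assumes a hypothetical minimal `brk_cpu` struct whose `pc` still holds the address of the BRK opcode. The code pushes PC+2 high byte first, then pushes P with B and bit 5 set while leaving the live P without B. Finally it sets I and loads the vector at $FFFE/$FFFF.

```c
#include <stdint.h>

enum { FLAG_I = 0x04, FLAG_B = 0x10, FLAG_U = 0x20 };

/* Hypothetical minimal state for this sketch. */
typedef struct {
    uint16_t pc;             /* address of the BRK opcode on entry */
    uint8_t s, p;
    uint8_t mem[0x10000];
} brk_cpu;

static void push(brk_cpu *c, uint8_t v) {
    c->mem[0x0100 | c->s] = v;   /* stack is fixed to page $01 */
    c->s--;                      /* uint8_t wraps $00 -> $FF */
}

static void brk_sketch(brk_cpu *c) {
    uint16_t ret = (uint16_t)(c->pc + 2);         /* skip opcode + padding byte */
    push(c, (uint8_t)(ret >> 8));                  /* high byte first */
    push(c, (uint8_t)(ret & 0xFF));
    push(c, (uint8_t)(c->p | FLAG_B | FLAG_U));    /* B only in the pushed copy */
    c->p |= FLAG_I;
    c->pc = (uint16_t)(c->mem[0xFFFE] | (c->mem[0xFFFF] << 8));
}
```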
Problem 4: Branches always/never taken
- Root Cause: Wrong flag polarity
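- Fix: Check each pair's polarity: BCC/BCS test C, BNE/BEQ test Z, BPL/BMI test N, BVC/BVS test V. The first of each pair branches when the flag is clear, the second when it is set.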
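All eight branches can share one helper and differ only in the condition passed in. This sketch uses hypothetical names. It assumes `pc` already points past the two-byte branch instruction, since that is the address the signed offset is added to and the one the page-crossing check compares against.

```c
#include <stdbool.h>
#include <stdint.h>

enum { FLAG_C = 0x01 };

/* Shared relative-branch logic. Returns 2 cycles if not taken, 3 if taken,
   4 if taken and the target lies in a different page. */
static int branch(uint16_t *pc, int8_t offset, bool condition) {
    if (!condition) return 2;
    uint16_t target = (uint16_t)(*pc + offset);
    int cycles = ((*pc & 0xFF00) != (target & 0xFF00)) ? 4 : 3;
    *pc = target;
    return cycles;
}

/* BCC branches when C is clear, BCS when C is set. */
static int bcc(uint16_t *pc, int8_t offset, uint8_t p) {
    return branch(pc, offset, (p & FLAG_C) == 0);
}

static int bcs(uint16_t *pc, int8_t offset, uint8_t p) {
    return branch(pc, offset, (p & FLAG_C) != 0);
}
```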
Problem 5: Decimal mode gives wrong results
- Root Cause: BCD addition algorithm error
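- Fix: Add the low digits (plus carry) first. If that sum exceeds 9, add 6 and carry into the high digit. Do the same for the high digit, whose decimal carry becomes C.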
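Here is a sketch of decimal-mode ADC and SBC on valid BCD operands, using the digit-by-digit carry-and-adjust method. It models only the result and the carry. The NMOS 6502's N, V, and Z flags in decimal mode have quirks that Klaus Dormann's decimal test checks, and they are left out here. The function names are hypothetical.

```c
#include <stdint.h>

/* Decimal-mode ADC for valid BCD inputs: returns the new A, stores carry out. */
static uint8_t adc_bcd(uint8_t a, uint8_t m, int carry_in, int *carry_out) {
    unsigned lo = (a & 0x0Fu) + (m & 0x0Fu) + (unsigned)carry_in;
    unsigned hi = (unsigned)(a >> 4) + (unsigned)(m >> 4);
    if (lo > 9) { lo += 6; hi += 1; }          /* decimal carry into high digit */
    *carry_out = 0;
    if (hi > 9) { hi += 6; *carry_out = 1; }   /* decimal carry out of the byte */
    return (uint8_t)((hi << 4) | (lo & 0x0F));
}

/* Decimal-mode SBC: A - M - (1 - C), borrowing 10 per digit; C=1 means no borrow. */
static uint8_t sbc_bcd(uint8_t a, uint8_t m, int carry_in, int *carry_out) {
    int lo = (a & 0x0F) - (m & 0x0F) - (1 - carry_in);
    int hi = (a >> 4) - (m >> 4);
    if (lo < 0) { lo += 10; hi -= 1; }         /* borrow from the high digit */
    *carry_out = 1;
    if (hi < 0) { hi += 10; *carry_out = 0; }  /* borrow out of the byte */
    return (uint8_t)((hi << 4) | lo);
}
```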
Problem 6: Test hangs at specific address
- Root Cause: Failed test loops forever
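- Fix: This is expected behavior. The Klaus test suite traps (jumps or branches to itself) at the failing check. Look up the trap address in the test's listing file to see which test failed.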
Problem 7: Cycles don’t match expected
- Root Cause: Missing page-crossing penalty
- Fix: For indexed reads, add one cycle when `(base & 0xFF00) != (addr & 0xFF00)`, i.e. when indexing crosses into another page.
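Here is a sketch of the page-crossing penalty for a read such as LDA abs,X, which takes 4 cycles, or 5 when the index crosses a page. The names are hypothetical. Stores and read-modify-write instructions in this mode always take their fixed cycle count, so they should not apply the penalty.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when base and the indexed address fall in different 256-byte pages. */
static bool page_crossed(uint16_t base, uint16_t addr) {
    return (base & 0xFF00) != (addr & 0xFF00);
}

/* Absolute,X effective address for a read instruction; reports its cycle cost. */
static uint16_t abs_x_read(uint16_t base, uint8_t x, int *cycles) {
    uint16_t addr = (uint16_t)(base + x);      /* wraps at $FFFF like the hardware */
    *cycles = 4 + (page_crossed(base, addr) ? 1 : 0);
    return addr;
}
```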
Problem 8: Stack operations corrupt memory
- Root Cause: Not fixing stack to page $01
- Fix: Always compute the stack address as `0x0100 | cpu->s` ($0100 + S)
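Here is a sketch of push and pull confined to page $01, assuming a hypothetical `stack_cpu` struct with a flat memory array. Because S is a `uint8_t`, decrementing past $00 wraps to $FF. This matches the hardware, where the stack never walks into page $00.

```c
#include <stdint.h>

/* Hypothetical minimal state for this sketch. */
typedef struct {
    uint8_t s;
    uint8_t mem[0x10000];
} stack_cpu;

/* Push: write at $0100 + S, then decrement S (wrapping within 8 bits). */
static void stack_push(stack_cpu *c, uint8_t value) {
    c->mem[0x0100 | c->s] = value;
    c->s--;
}

/* Pull: increment S first, then read from $0100 + S. */
static uint8_t stack_pull(stack_cpu *c) {
    c->s++;
    return c->mem[0x0100 | c->s];
}
```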
8. Extensions & Challenges
Beginner Extensions
- Add disassembler output for each instruction
- Implement undocumented NOPs
- Add memory read/write breakpoints
- Create simple assembler for test programs
Intermediate Extensions
- Implement all stable undocumented opcodes (LAX, SAX, DCP, ISB, etc.)
- Add cycle-accurate sub-instruction timing
- Implement the “unstable” undocumented opcodes
- Create memory-mapped I/O for simple terminal
Advanced Extensions
- Build a 6502 assembler
- Create an Apple I emulator using your CPU
- Create a C64 emulator (add VIC-II and SID chips)
- Create an NES emulator (add PPU and APU)
- Implement the 65C02 extended instruction set
Expert Challenges
- Match Visual 6502 transistor-level timing
- Implement the unstable undocumented opcodes correctly
- Add save states and rewind capability
- Create a time-travel debugger
9. Real-World Connections
NES Emulation: The NES CPU, the Ricoh 2A03, is a 6502 variant with decimal mode disabled and an on-chip audio unit. A 6502 core like yours is the foundation of NES emulators such as FCEUX, Nestopia, and Mesen.
Apple II Preservation: Apple II enthusiasts use 6502 emulators to run vintage software. AppleWin and LinApple depend on accurate emulation.
Retrocomputing: The 6502 community is active; people still build new computers around real chips. Understanding the 6502 connects you to this community.
Embedded Systems: The 65C02 is still manufactured today for embedded applications. Your knowledge applies to modern hardware.
Career Skills: Writing an emulator demonstrates:
- Deep understanding of CPU architecture
- Ability to implement specifications precisely
- Debugging at the lowest level
- Performance optimization skills
10. Resources
Primary References
- MOS 6502 Programming Manual (original 1976 document)
- W65C02S Datasheet (modern CMOS variant; largely compatible, though it fixes the JMP indirect bug and adds instructions)
- Klaus Dormann’s Test Suite Documentation
Online Resources
- 6502.org - Tutorials, forums, resources
- Visual 6502 - See the actual transistors
- NesDev Wiki 6502 - NES-specific details
- Obelisk 6502 Guide - Excellent opcode reference
Community
- 6502.org Forums
- r/EmuDev - Emulator development
- NesDev Forums - NES-specific help
Reference Emulators
- py65 (Python) - Simple, well-documented
- FCEUX (C++) - NES emulator with excellent debugger
- Mesen (C++ core, C# UI) - Highly accurate NES emulator
- perfect6502 (C) - Transistor-level simulation
11. Self-Assessment Checklist
Before moving to the next project, verify:
Understanding:
- Can you explain all 13 addressing modes without notes?
- Can you describe how the status register works?
- Can you explain why SBC is implemented as ADC with inverted operand?
- Can you explain the JMP indirect bug?
- Can you describe how BCD arithmetic works?
Implementation:
- Does your emulator pass Klaus Dormann’s functional test?
- Does it pass the BCD/decimal test?
- Are cycles counted correctly including page penalties?
- Do interrupts work correctly (IRQ, NMI, BRK)?
- Can you run Apple I BASIC?
Debugging:
- Can you add instruction tracing to your emulator?
- Can you set breakpoints and inspect memory?
- Can you identify which test failed when Klaus test halts?
- Can you compare your trace with a reference emulator?
Growth:
- Did you write the emulator without copying code?
- Can you explain your design decisions?
- Are you ready to extend this to NES or C64 emulation?
12. Completion Criteria
Your implementation is complete when:
- All 56 official instructions (151 opcodes) implemented
- All 13 addressing modes working correctly
- Klaus Dormann 6502 functional test passes
- Klaus Dormann decimal test passes
- Cycle counts match reference for all instructions
- Page-crossing penalties implemented
- Interrupts (IRQ, NMI, BRK) working
- Can run Apple I BASIC and interact with it
- Code is clean, commented, and maintainable
- You can explain the entire implementation
Congratulations! You've built an emulator for one of the most influential CPUs in computing history. You now understand how software for millions of NES, Apple II, and C64 machines executed. This is the foundation for building complete system emulators.
This guide was expanded from CPU_ISA_ARCHITECTURE_PROJECTS.md. For the complete learning path, see the project index.