Project 5: Simple RISC CPU Emulator (Custom ISA)
Design and implement your own RISC instruction set architecture from scratch. Build an emulator that executes your machine code, and write an assembler that translates your assembly language into binary. This is YOUR CPU, YOUR ISA, YOUR design decisions.
Quick Reference
| Attribute | Value |
|---|---|
| Language | C (alt: Rust, C++, Zig) |
| Difficulty | Advanced |
| Time | 2-3 weeks |
| Prerequisites | P03 (Stack VM) or P04 (CHIP-8) |
| Coolness | Hardcore Tech Flex |
| Portfolio Value | Resume Gold |
Table of Contents
- Learning Objectives
- Why Design Your Own ISA?
- Theoretical Foundation
- ISA Design Decisions
- Project Specification
- Solution Architecture
- Implementation Guide
- The Assembler
- Testing Strategy
- Common Pitfalls
- Extensions
- Resources
1. Learning Objectives
By completing this project, you will:
- Understand ISA design trade-offs: Why fixed-width instructions? Why 32 registers vs 8? Why load/store architecture?
- Master instruction encoding: Pack opcodes, registers, and immediates into bit fields
- Implement a complete CPU emulator: Fetch, decode, execute with registers, memory, and flags
- Build an assembler: Parse assembly text, resolve labels, emit binary machine code
- Think like a hardware architect: Every decision affects complexity, performance, and expressiveness
- Debug at the machine level: Trace execution, inspect registers, understand what your code becomes
2. Why Design Your Own ISA?
With CHIP-8 or 6502, you implement someone else’s specification. You learn how CPUs work, but not why they’re designed a particular way.
When YOU design the ISA, you confront fundamental questions:
+-----------------------------------------------------------------------+
| QUESTIONS YOU MUST ANSWER |
+-----------------------------------------------------------------------+
| |
| Instruction Width: |
| - Fixed 32-bit? Variable length? Why? |
| - How many bits for opcode? For registers? For immediates? |
| |
| Registers: |
| - How many? 8? 16? 32? |
| - Special-purpose (PC, SP, flags) or general-purpose? |
| - Register 0 hardwired to zero? (Like MIPS/RISC-V) |
| |
| Memory Access: |
| - Load/store architecture? Or memory operands in ALU ops? |
| - What addressing modes? Immediate? Register indirect? Indexed? |
| - Byte-addressable? Word-addressable? Alignment requirements? |
| |
| Control Flow: |
| - Condition codes (flags) or compare-and-branch? |
| - PC-relative or absolute jumps? |
| - How to implement function calls? |
| |
| Immediates: |
| - How many bits for immediate values? |
| - Sign-extended or zero-extended? |
| - How to load large constants? (LUI + ADDI like RISC-V?) |
| |
+-----------------------------------------------------------------------+
After this project, you’ll understand why real ISAs (ARM, x86, RISC-V) made their choices.
3. Theoretical Foundation
3.1 RISC Philosophy
RISC (Reduced Instruction Set Computer) emerged in the 1980s as a reaction to complex ISAs:
+-----------------------------------------------------------------------+
| CISC vs RISC COMPARISON |
+-----------------------------------------------------------------------+
| |
| CISC (x86, VAX) RISC (ARM, MIPS, RISC-V) |
| +-----------------------+ +-----------------------+ |
| | Complex instructions | | Simple instructions | |
| | - String copy | | - One operation each | |
| | - Polynomial eval | | - Load/store memory | |
| | - Memory-to-memory | | - Register-to-reg ALU | |
| +-----------------------+ +-----------------------+ |
| | Variable length | | Fixed length | |
| | (1-15 bytes x86) | | (32 bits typical) | |
| +-----------------------+ +-----------------------+ |
| | Many addressing modes | | Few addressing modes | |
| | - Indexed, indirect | | - Register, immediate | |
| | - Scaled index | | - Base + offset | |
| +-----------------------+ +-----------------------+ |
| | Microcode-heavy | | Hardwired control | |
| | (ROM for complex ops) | | (faster, simpler) | |
| +-----------------------+ +-----------------------+ |
| |
| RISC Key Principles: |
| 1. Fixed instruction size (easy to fetch/decode) |
| 2. Load/store architecture (only loads/stores touch memory) |
| 3. Many registers (reduce memory traffic) |
| 4. Simple instructions (one operation per instruction) |
| 5. Hardwired control (no microcode) |
| |
+-----------------------------------------------------------------------+
Your ISA will follow RISC principles: fixed-width instructions, load/store architecture, simple operations.
3.2 Instruction Encoding Design
The heart of ISA design is packing information into a fixed number of bits:
+-----------------------------------------------------------------------+
| INSTRUCTION ENCODING TRADE-OFFS |
+-----------------------------------------------------------------------+
| |
| 32-bit instruction with 8 registers (3-bit register IDs): |
| |
| R-type (Register-Register operations): |
| +--------+------+------+------+--------+ |
| | opcode | Rd | Rs1 | Rs2 | unused | |
| | 6 bits | 3b | 3b | 3b | 17 bits| |
| +--------+------+------+------+--------+ |
| bits: 31-26 25-23 22-20 19-17 16-0 |
| |
| Example: ADD R1, R2, R3 (R1 = R2 + R3) |
| opcode=0x01, Rd=1, Rs1=2, Rs2=3 |
| Binary: 000001 001 010 011 00000000000000000 |
| |
| I-type (Immediate operations): |
| +--------+------+------+------------------+ |
| | opcode | Rd | Rs1 | immediate | |
| | 6 bits | 3b | 3b | 20 bits | |
| +--------+------+------+------------------+ |
| bits: 31-26 25-23 22-20 19-0 |
| |
| Example: ADDI R1, R2, #100 (R1 = R2 + 100) |
| opcode=0x02, Rd=1, Rs1=2, imm=100 |
| 20-bit immediate: -524288 to +524287 (signed) |
| |
| J-type (Jump instructions): |
| +--------+------+------------------------+ |
| | opcode | Rd | offset | |
| | 6 bits | 3b | 23 bits | |
| +--------+------+------------------------+ |
| bits: 31-26 25-23 22-0 |
| |
| Example: JAL R7, label (R7 = PC+4, jump to label) |
| 23-bit offset: +/- 4MB range (PC-relative) |
| |
+-----------------------------------------------------------------------+
| |
| DESIGN DECISION: Why these bit allocations? |
| |
| - 6 bits for opcode: 64 possible instructions (plenty for ~20) |
| - 3 bits for registers: 8 registers (R0-R7) |
| - Remaining bits: maximize immediate/offset range |
| |
| Alternative: 16 registers needs 4 bits each |
| +--------+------+------+------+--------+ |
| | opcode | Rd | Rs1 | Rs2 | unused | |
| | 6 bits | 4b | 4b | 4b | 14 bits| |
| +--------+------+------+------+--------+ |
| |
| More registers = smaller immediates. Trade-off! |
| |
+-----------------------------------------------------------------------+

3.3 Register File Architecture
Registers are the CPU’s ultra-fast scratch space:
+-----------------------------------------------------------------------+
| REGISTER FILE DESIGN |
+-----------------------------------------------------------------------+
| |
| OPTION A: 8 General-Purpose Registers |
| +-----+-----+-----+-----+-----+-----+-----+-----+ |
| | R0 | R1 | R2 | R3 | R4 | R5 | R6 | R7 | |
| +-----+-----+-----+-----+-----+-----+-----+-----+ |
| ^ ^ |
| | | |
| Zero register (always 0)? Stack pointer? |
| Or general purpose? Or general purpose? |
| |
| Special Registers (separate from GPRs): |
| +--------+ |
| | PC | Program Counter (address of current instruction) |
| +--------+ |
| +--------+ |
| | FLAGS | Status flags (Zero, Negative, Carry, Overflow) |
| +--------+ |
| |
+-----------------------------------------------------------------------+
| |
| DESIGN CHOICES: |
| |
| 1. R0 = Zero Register (MIPS, RISC-V style) |
| - Writes to R0 are discarded |
| - Reads from R0 always return 0 |
| - Simplifies MOV instruction: ADD R1, R0, R2 copies R2 to R1 |
| - NOP can be: ADD R0, R0, R0 |
| |
| 2. All registers general-purpose (ARM style) |
| - Need explicit MOV instruction |
| - R7 conventionally used as stack pointer (software convention) |
| |
| 3. Dedicated SP and LR (ARM style) |
| - R7 = SP (stack pointer) |
| - R6 = LR (link register for return address) |
| - Simplifies CALL/RET |
| |
+-----------------------------------------------------------------------+
| |
| RECOMMENDATION FOR THIS PROJECT: |
| |
| 8 registers, R0 = zero, R7 = stack pointer (convention) |
| +-----+-----+-----+-----+-----+-----+-----+-----+ |
| | R0 | R1 | R2 | R3 | R4 | R5 | R6 | R7 | |
| | =0 | GP | GP | GP | GP | GP | GP | SP | |
| +-----+-----+-----+-----+-----+-----+-----+-----+ |
| |
| Benefits: |
| - 3 bits per register (efficient encoding) |
| - Zero register simplifies ISA |
| - Stack pointer convention (not hardwired) |
| |
+-----------------------------------------------------------------------+

3.4 Addressing Modes
How instructions specify operand locations:
+-----------------------------------------------------------------------+
| ADDRESSING MODES |
+-----------------------------------------------------------------------+
| |
| 1. REGISTER MODE |
| Operand is in a register |
| |
| ADD R1, R2, R3 ; R1 = R2 + R3 |
| |
| 2. IMMEDIATE MODE |
| Operand is a constant encoded in the instruction |
| |
| ADDI R1, R2, #100 ; R1 = R2 + 100 |
| MOV R1, #42 ; R1 = 42 (same as ADDI R1, R0, #42) |
| |
| 3. REGISTER INDIRECT |
| Address is in a register |
| |
| LW R1, (R2) ; R1 = Memory[R2] |
| SW R1, (R2) ; Memory[R2] = R1 |
| |
| 4. BASE + OFFSET (Displacement) |
| Address = register + constant |
| |
| LW R1, 8(R2) ; R1 = Memory[R2 + 8] |
| SW R1, -4(R2) ; Memory[R2 - 4] = R1 |
| |
| Common pattern: accessing struct fields, local variables |
| LW R1, 0(FP) ; load first local variable |
| LW R2, 4(FP) ; load second local variable |
| |
+-----------------------------------------------------------------------+
| |
| LOAD/STORE ENCODING: |
| |
| LW (Load Word): |
| +--------+------+------+------------------+ |
| | LW_OP | Rd | Rs | offset | |
| | 6 bits | 3b | 3b | 20 bits | |
| +--------+------+------+------------------+ |
| |
| LW R1, 8(R2) => R1 = Memory[R2 + 8] |
| |
| SW (Store Word): |
| +--------+------+------+------------------+ |
| | SW_OP | Rs | Rb | offset | |
| | 6 bits | 3b | 3b | 20 bits | |
| +--------+------+------+------------------+ |
| |
| SW R1, 8(R2) => Memory[R2 + 8] = R1 |
| |
| Note: RISC ISAs typically only have LOAD and STORE touch memory. |
| ALU operations work exclusively on registers. |
| |
+-----------------------------------------------------------------------+
3.5 Condition Codes and Status Flags
Two approaches to conditional execution:
+-----------------------------------------------------------------------+
| CONDITIONAL EXECUTION |
+-----------------------------------------------------------------------+
| |
| APPROACH 1: Condition Codes (FLAGS Register) |
| +---+---+---+---+ |
| | Z | N | C | V | |
| +---+---+---+---+ |
| | | | | |
| | | | +-- Overflow (signed overflow occurred) |
| | | +------ Carry (unsigned overflow/borrow) |
| | +---------- Negative (result is negative, MSB=1) |
| +-------------- Zero (result is zero) |
| |
| ALU operations set flags: |
| SUB R0, R1, R2 ; R0 = R1 - R2, sets Z, N, C, V |
| BEQ label ; branch if Z=1 (equal) |
| BLT label ; branch if N!=V (less than, signed) |
| |
| Condition Combinations: |
| EQ (equal): Z=1 |
| NE (not equal): Z=0 |
| LT (less than): N!=V |
| GE (greater/eq): N==V |
| LE (less/equal): Z=1 OR N!=V |
| GT (greater): Z=0 AND N==V |
| CS (carry set): C=1 (unsigned >=) |
| CC (carry clear): C=0 (unsigned <) |
| |
+-----------------------------------------------------------------------+
| |
| APPROACH 2: Compare-and-Branch (RISC-V style) |
| |
| No flags register! Branch instructions compare directly: |
| BEQ R1, R2, label ; if R1 == R2, branch |
| BLT R1, R2, label ; if R1 < R2 (signed), branch |
| BGE R1, R2, label ; if R1 >= R2 (signed), branch |
| |
| Advantages: |
| - No hidden state (flags) |
| - No flag dependencies between instructions |
| - Easier out-of-order execution |
| |
| Disadvantages: |
| - Branch instruction needs 2 register operands (encoding space) |
| - Can't reuse comparison result |
| |
+-----------------------------------------------------------------------+
| |
| RECOMMENDATION FOR THIS PROJECT: |
| |
| Use FLAGS register (simpler for beginners): |
| - CMP instruction sets flags |
| - Branch instructions check flags |
| - Classic approach, matches ARM and x86 |
| |
| CMP R1, R2 ; FLAGS = R1 - R2 (result discarded) |
| BEQ label ; if Z=1, PC = label |
| |
+-----------------------------------------------------------------------+

3.6 The Fetch-Decode-Execute Cycle
The eternal heartbeat of every CPU:
+-----------------------------------------------------------------------+
| FETCH-DECODE-EXECUTE CYCLE |
+-----------------------------------------------------------------------+
| |
| +---------------+ |
| | FETCH | |
| +-------+-------+ |
| | |
| v |
| +-------------+-------------+ |
| | instruction = MEM[PC] | |
| | PC = PC + 4 | |
| +-------------+-------------+ |
| | |
| v |
| +-------+-------+ |
| | DECODE | |
| +-------+-------+ |
| | |
| v |
| +-------------+-------------+ |
| | opcode = (instr >> 26) | |
| | rd = (instr >> 23) & 0x7 | |
| | rs1 = (instr >> 20) & 0x7 | |
| | rs2 = (instr >> 17) & 0x7 | |
| | imm = sign_ext(instr) | |
| +-------------+-------------+ |
| | |
| v |
| +-------+-------+ |
| | EXECUTE | |
| +-------+-------+ |
| | |
| v |
| +-------------+-------------+ |
| | switch(opcode): | |
| | ADD: R[rd] = R[rs1] + | |
| | R[rs2] | |
| | SUB: R[rd] = R[rs1] - | |
| | R[rs2] | |
| | LW: R[rd] = MEM[R[rs1] | |
| | + imm] | |
| | BEQ: if(Z) PC = PC + | |
| | imm | |
| | ... | |
| +-------------+-------------+ |
| | |
| +------+ |
| | |
| +--------------------+ |
| | |
| v |
| (loop forever until HALT) |
| |
+-----------------------------------------------------------------------+
| |
| EMULATOR MAIN LOOP (pseudocode): |
| |
| while (!halted) { |
| // FETCH |
| instruction = memory[pc]; |
| pc += 4; |
| |
| // DECODE |
| opcode = (instruction >> 26) & 0x3F; |
| rd = (instruction >> 23) & 0x7; |
| rs1 = (instruction >> 20) & 0x7; |
| rs2 = (instruction >> 17) & 0x7; |
| imm = sign_extend_20bit(instruction & 0xFFFFF); |
| |
| // EXECUTE |
| switch(opcode) { |
| case OP_ADD: regs[rd] = regs[rs1] + regs[rs2]; break; |
| case OP_LW: regs[rd] = memory[regs[rs1] + imm]; break; |
| case OP_BEQ: if(flags.Z) pc = pc - 4 + imm; break; |
| case OP_HALT: halted = true; break; |
| // ... |
| } |
| |
| // Enforce R0 = 0 |
| regs[0] = 0; |
| } |
| |
+-----------------------------------------------------------------------+

4. ISA Design Decisions
This section guides you through designing YOUR instruction set. Make deliberate choices!
4.1 Your ISA Parameters
Fill in these decisions:
+-----------------------------------------------------------------------+
| YOUR ISA DESIGN WORKSHEET |
+-----------------------------------------------------------------------+
| |
| ISA Name: ______________________ (e.g., "SimpleCPU", "TinyRISC") |
| |
| INSTRUCTION WIDTH |
| [ ] 16-bit (compact, limited immediates) |
| [X] 32-bit (recommended - good balance) |
| [ ] Variable (complex decoding) |
| |
| REGISTER COUNT |
| [ ] 4 registers (2-bit IDs, very limited) |
| [X] 8 registers (3-bit IDs, recommended) |
| [ ] 16 registers (4-bit IDs, uses more instruction bits) |
| [ ] 32 registers (5-bit IDs, like MIPS/RISC-V) |
| |
| REGISTER WIDTH |
| [ ] 16-bit |
| [X] 32-bit (recommended) |
| [ ] 64-bit |
| |
| MEMORY |
| Word size: 32 bits (4 bytes) |
| Addressing: Byte-addressable |
| Alignment: Word-aligned (addresses must be multiples of 4) |
| Size: 64KB (addressable with 16-bit addresses) |
| |
| SPECIAL REGISTERS |
| [X] R0 = Zero (always reads 0, writes ignored) |
| [ ] R0 = General purpose |
| R7 = Stack pointer (software convention) |
| |
| CONDITIONAL MODEL |
| [X] Flags register (Z, N, C, V) |
| [ ] Compare-and-branch (no flags) |
| |
+-----------------------------------------------------------------------+
4.2 Recommended Instruction Set (~20 instructions)
Design your instruction set in categories:
+-----------------------------------------------------------------------+
| RECOMMENDED INSTRUCTION SET |
+-----------------------------------------------------------------------+
| |
| CATEGORY 1: ARITHMETIC (R-type) |
| +--------+--------------------------------+-------------+ |
| | Opcode | Instruction | Operation | |
| +--------+--------------------------------+-------------+ |
| | 0x01 | ADD Rd, Rs1, Rs2 | Rd=Rs1+Rs2 | |
| | 0x02 | SUB Rd, Rs1, Rs2 | Rd=Rs1-Rs2 | |
| | 0x03 | AND Rd, Rs1, Rs2 | Rd=Rs1&Rs2 | |
| | 0x04 | OR Rd, Rs1, Rs2 | Rd=Rs1|Rs2 | |
| | 0x05 | XOR Rd, Rs1, Rs2 | Rd=Rs1^Rs2 | |
| | 0x06 | SLL Rd, Rs1, Rs2 | Rd=Rs1<<Rs2 | |
| | 0x07 | SRL Rd, Rs1, Rs2 | Rd=Rs1>>Rs2 | |
| +--------+--------------------------------+-------------+ |
| |
| CATEGORY 2: IMMEDIATE OPERATIONS (I-type) |
| +--------+--------------------------------+-------------+ |
| | Opcode | Instruction | Operation | |
| +--------+--------------------------------+-------------+ |
| | 0x10 | ADDI Rd, Rs, #imm | Rd=Rs+imm | |
| | 0x11 | ANDI Rd, Rs, #imm | Rd=Rs&imm | |
| | 0x12 | ORI Rd, Rs, #imm | Rd=Rs|imm | |
| | 0x13 | LUI Rd, #imm | Rd=imm<<12 | |
| +--------+--------------------------------+-------------+ |
| |
| Note: MOV Rd, #imm = ADDI Rd, R0, #imm (pseudo-instruction) |
| MOV Rd, Rs = ADD Rd, R0, Rs (pseudo-instruction) |
| |
| CATEGORY 3: MEMORY (I-type) |
| +--------+--------------------------------+-----------------+ |
| | Opcode | Instruction | Operation | |
| +--------+--------------------------------+-----------------+ |
| | 0x20 | LW Rd, offset(Rs) | Rd=MEM[Rs+off] | |
| | 0x21 | SW Rs, offset(Rb) | MEM[Rb+off]=Rs | |
| | 0x22 | LB Rd, offset(Rs) | Rd=MEM[Rs+off] | |
| | 0x23 | SB Rs, offset(Rb) | MEM[Rb+off]=Rs | |
| +--------+--------------------------------+-----------------+ |
| |
| CATEGORY 4: COMPARISON |
| +--------+--------------------------------+-------------+ |
| | Opcode | Instruction | Operation | |
| +--------+--------------------------------+-------------+ |
| | 0x30 | CMP Rs1, Rs2 | FLAGS=Rs1-Rs2| |
| | 0x31 | CMPI Rs, #imm | FLAGS=Rs-imm | |
| +--------+--------------------------------+-------------+ |
| |
| CATEGORY 5: BRANCHES (B-type, PC-relative) |
| +--------+--------------------------------+-------------------+ |
| | Opcode | Instruction | Operation | |
| +--------+--------------------------------+-------------------+ |
| | 0x40 | BEQ offset | if Z, PC+=offset | |
| | 0x41 | BNE offset | if !Z, PC+=offset | |
| | 0x42 | BLT offset | if N!=V, PC+=off | |
| | 0x43 | BGE offset | if N==V, PC+=off | |
| | 0x44 | BLE offset | if Z|N!=V, PC+=off| |
| | 0x45 | BGT offset | if !Z&N==V, PC+=of| |
| +--------+--------------------------------+-------------------+ |
| |
| CATEGORY 6: JUMPS AND CALLS |
| +--------+--------------------------------+-------------------+ |
| | Opcode | Instruction | Operation | |
| +--------+--------------------------------+-------------------+ |
| | 0x50 | JMP offset | PC += offset | |
| | 0x51 | JAL Rd, offset | Rd=PC, PC+=offset | |
| | 0x52 | JALR Rd, Rs, offset | Rd=PC, PC=Rs+off | |
| +--------+--------------------------------+-------------------+ |
| |
| Note: CALL = JAL R6, offset (R6 = link register by convention) |
| RET = JALR R0, R6, 0 (jump to address in R6, discard PC) |
| |
| CATEGORY 7: SYSTEM |
| +--------+--------------------------------+-------------------+ |
| | Opcode | Instruction | Operation | |
| +--------+--------------------------------+-------------------+ |
| | 0x00 | HALT | Stop execution | |
| | 0x3F | NOP | No operation | |
| +--------+--------------------------------+-------------------+ |
| |
+-----------------------------------------------------------------------+
4.3 Encoding Summary
+-----------------------------------------------------------------------+
| INSTRUCTION ENCODING FORMATS |
+-----------------------------------------------------------------------+
| |
| R-TYPE (Register-Register): |
| +--------+------+------+------+-----------------+ |
| | opcode | Rd | Rs1 | Rs2 | unused | |
| | 6 bits | 3b | 3b | 3b | 17 bits | |
| +--------+------+------+------+-----------------+ |
| bits: 31-26 25-23 22-20 19-17 16-0 |
| |
| Used by: ADD, SUB, AND, OR, XOR, SLL, SRL, CMP |
| |
| I-TYPE (Immediate): |
| +--------+------+------+------------------------+ |
| | opcode | Rd | Rs | immediate | |
| | 6 bits | 3b | 3b | 20 bits | |
| +--------+------+------+------------------------+ |
| bits: 31-26 25-23 22-20 19-0 |
| |
| Used by: ADDI, ANDI, ORI, LUI, LW, SW, CMPI, JALR |
| Immediate is sign-extended to 32 bits |
| Range: -524288 to +524287 |
| |
| B-TYPE (Branch): |
| +--------+-----------------------------+ |
| | opcode | offset | |
| | 6 bits | 26 bits | |
| +--------+-----------------------------+ |
| bits: 31-26 25-0 |
| |
| Used by: BEQ, BNE, BLT, BGE, BLE, BGT, JMP |
| Offset is sign-extended, in BYTES (not words) |
| Range: +/- 32MB |
| |
| J-TYPE (Jump-and-Link): |
| +--------+------+------------------------+ |
| | opcode | Rd | offset | |
| | 6 bits | 3b | 23 bits | |
| +--------+------+------------------------+ |
| bits: 31-26 25-23 22-0 |
| |
| Used by: JAL |
| Rd = return address register |
| |
+-----------------------------------------------------------------------+
5. Project Specification
5.1 What You’ll Build
Two programs:
- Assembler (
myasm): Converts assembly text to binary - Emulator (
mycpu): Executes the binary
5.2 Expected Output
$ cat fibonacci.asm
; Fibonacci sequence - compute F(10)
; Uses R1, R2, R3 as working registers
; Result ends up in R2
ADDI R1, R0, #0 ; R1 = fib(n-2) = 0
ADDI R2, R0, #1 ; R2 = fib(n-1) = 1
ADDI R3, R0, #10 ; R3 = counter = 10
loop:
ADD R4, R1, R2 ; R4 = fib(n) = fib(n-2) + fib(n-1)
ADD R1, R0, R2 ; R1 = old R2 (shift)
ADD R2, R0, R4 ; R2 = new fib
ADDI R3, R3, #-1 ; counter--
CMPI R3, #0
BNE loop ; if counter != 0, continue
HALT
$ ./myasm fibonacci.asm -o fibonacci.bin
Assembler Output:
Input: fibonacci.asm
Output: fibonacci.bin
Pass 1: Scanning for labels...
Found label 'loop' at address 0x000C
Pass 2: Generating machine code...
0x0000: ADDI R1, R0, #0 => 0x40200000
0x0004: ADDI R2, R0, #1 => 0x40400001
0x0008: ADDI R3, R0, #10 => 0x4060000A
0x000C: ADD R4, R1, R2 => 0x04A20000
0x0010: ADD R1, R0, R2 => 0x04220000
0x0014: ADD R2, R0, R4 => 0x04440000
0x0018: ADDI R3, R3, #-1 => 0x406FFFFF
0x001C: CMPI R3, #0 => 0xC4600000
0x0020: BNE loop => 0x41FFFFEC
0x0024: HALT => 0x00000000
Assembled: 10 instructions, 40 bytes
$ ./mycpu fibonacci.bin --trace
SimpleCPU Emulator v1.0
Loading: fibonacci.bin (40 bytes)
Starting execution...
Cycle 1:
[0x0000] 0x40200000 ADDI R1, R0, #0
Registers: R0=0 R1=0 R2=0 R3=0 R4=0 R5=0 R6=0 R7=0
Flags: Z=0 N=0 C=0 V=0
Cycle 2:
[0x0004] 0x40400001 ADDI R2, R0, #1
Registers: R0=0 R1=0 R2=1 R3=0 R4=0 R5=0 R6=0 R7=0
Flags: Z=0 N=0 C=0 V=0
Cycle 3:
[0x0008] 0x4060000A ADDI R3, R0, #10
Registers: R0=0 R1=0 R2=1 R3=10 R4=0 R5=0 R6=0 R7=0
Flags: Z=0 N=0 C=0 V=0
... (loop iterations) ...
Cycle 84:
[0x0024] 0x00000000 HALT
=== EXECUTION COMPLETE ===
Total cycles: 84
Final state:
R0=0 R1=34 R2=55 R3=0 R4=55 R5=0 R6=0 R7=0
Flags: Z=1 N=0 C=0 V=0
Result: R2 = 55 (10th Fibonacci number)
6. Solution Architecture
6.1 Emulator Structure
+-----------------------------------------------------------------------+
| EMULATOR ARCHITECTURE |
+-----------------------------------------------------------------------+
| |
| +------------------------------------------------------------------+ |
| | main.c | |
| | - Parse command line (input file, --trace, --debug) | |
| | - Load binary into memory | |
| | - Call cpu_run() | |
| | - Print final state | |
| +------------------------------------------------------------------+ |
| | |
| v |
| +------------------------------------------------------------------+ |
| | cpu.h / cpu.c | |
| | +----------------------+ +----------------------+ | |
| | | typedef struct { | | void cpu_init() | | |
| | | uint32_t regs[8]; | | void cpu_step() | | |
| | | uint32_t pc; | | void cpu_run() | | |
| | | struct { | | void cpu_dump() | | |
| | | bool Z, N, C, V; | +----------------------+ | |
| | | } flags; | | |
| | | bool halted; | | |
| | | uint8_t *memory; | | |
| | | } CPU; | | |
| | +----------------------+ | |
| +------------------------------------------------------------------+ |
| | |
| v |
| +------------------------------------------------------------------+ |
| | decode.h / decode.c | |
| | +---------------------------+ | |
| | | typedef struct { | | |
| | | uint8_t opcode; | Instruction decode_instr(word) | |
| | | uint8_t rd, rs1, rs2; | const char* disassemble(instr) | |
| | | int32_t imm; | | |
| | | InstrType type; | | |
| | | } Instruction; | | |
| | +---------------------------+ | |
| +------------------------------------------------------------------+ |
| | |
| v |
| +------------------------------------------------------------------+ |
| | execute.h / execute.c | |
| | | |
| | void execute_add(CPU *cpu, Instruction *instr) | |
| | void execute_sub(CPU *cpu, Instruction *instr) | |
| | void execute_lw(CPU *cpu, Instruction *instr) | |
| | void execute_beq(CPU *cpu, Instruction *instr) | |
| | ... | |
| | | |
| | OR: single execute(CPU*, Instruction*) with switch | |
| +------------------------------------------------------------------+ |
| | |
| v |
| +------------------------------------------------------------------+ |
| | memory.h / memory.c | |
| | | |
| | uint32_t mem_read_word(CPU *cpu, uint32_t addr) | |
| | void mem_write_word(CPU *cpu, uint32_t addr, uint32_t val) | |
| | uint8_t mem_read_byte(CPU *cpu, uint32_t addr) | |
| | void mem_write_byte(CPU *cpu, uint32_t addr, uint8_t val) | |
| | | |
| | Handles alignment, bounds checking, little-endian | |
| +------------------------------------------------------------------+ |
| |
+-----------------------------------------------------------------------+
6.2 Assembler Structure
+-----------------------------------------------------------------------+
| ASSEMBLER ARCHITECTURE |
+-----------------------------------------------------------------------+
| |
| +------------------------------------------------------------------+ |
| | main.c | |
| | - Parse command line (input, -o output) | |
| | - Read input file | |
| | - Call assembler passes | |
| | - Write binary output | |
| +------------------------------------------------------------------+ |
| | |
| v |
| +------------------------------------------------------------------+ |
| | lexer.h / lexer.c | |
| | | |
| | typedef struct { | |
| | TokenType type; // LABEL, MNEMONIC, REGISTER, NUMBER, etc | |
| | char *text; | |
| | int line_num; | |
| | } Token; | |
| | | |
| | Token* tokenize(char *source) | |
| | - Strip comments (;) | |
| | - Identify tokens | |
| | - Handle # for immediates, R for registers | |
| +------------------------------------------------------------------+ |
| | |
| v |
| +------------------------------------------------------------------+ |
| | parser.h / parser.c | |
| | | |
| | typedef struct { | |
| | char *label; // NULL if no label on this line | |
| | char *mnemonic; // "ADD", "LW", etc. | |
| | Operand operands[3]; | |
| | int num_operands; | |
| | } ParsedLine; | |
| | | |
| | ParsedLine* parse(Token *tokens) | |
| +------------------------------------------------------------------+ |
| | |
| v |
| +------------------------------------------------------------------+ |
| | assembler.h / assembler.c | |
| | | |
| | PASS 1: Collect labels and their addresses | |
| | - For each instruction, add 4 to address counter | |
| | - When label found, record (label_name, current_address) | |
| | | |
| | PASS 2: Generate machine code | |
| | - For each instruction: | |
| | - Look up opcode | |
| | - Encode registers | |
| | - Resolve label references to offsets | |
| | - Emit 32-bit word | |
| +------------------------------------------------------------------+ |
| | |
| v |
| +------------------------------------------------------------------+ |
| | encode.h / encode.c | |
| | | |
| | uint32_t encode_r_type(opcode, rd, rs1, rs2) | |
| | uint32_t encode_i_type(opcode, rd, rs, imm) | |
| | uint32_t encode_b_type(opcode, offset) | |
| | uint32_t encode_j_type(opcode, rd, offset) | |
| +------------------------------------------------------------------+ |
| |
+-----------------------------------------------------------------------+
7. Implementation Guide
7.1 Phase 1: Basic Emulator (Days 1-3)
Goal: Execute hardcoded machine code with arithmetic operations.
Step 1: CPU State Structure
Think about what state you need to track:
Questions to answer:
- How do you represent 8 registers?
- How do you represent memory (array of bytes? words?)
- How do you store the PC?
- How do you store flags (separate bools? single byte? bitfield?)
Step 2: Instruction Decode
Write a function that extracts fields from a 32-bit instruction:
For instruction 0x04A20000 (ADD R4, R1, R2):
Binary: 0000 0100 1010 0010 0000 0000 0000 0000
---- --++ +--- ---- ---- ---- ---- ----
opcode rd rs1 rs2
How do you extract:
- opcode (bits 31-26)? Use >> and &
- rd (bits 25-23)?
- rs1 (bits 22-20)?
- rs2 (bits 19-17)?
Step 3: Execute Instructions
Start with just a few instructions:
- NOP (do nothing)
- HALT (set halted flag)
- ADD (rd = rs1 + rs2)
- ADDI (rd = rs + immediate)
Write a switch statement that handles each opcode.
Step 4: Main Loop
Pseudocode:
while not halted:
fetch instruction from memory[PC]
decode instruction
execute instruction
PC += 4
regs[0] = 0 // enforce zero register
7.2 Phase 2: Complete ALU (Days 4-5)
Add remaining arithmetic/logic operations:
- SUB, AND, OR, XOR
- SLL, SRL (shifts)
- ANDI, ORI, LUI
Key insight: Immediate operations are almost identical to register operations, just using imm instead of regs[rs2].
7.3 Phase 3: Condition Codes (Days 6-7)
Implement the FLAGS register:
After arithmetic operation (ADD, SUB, CMP):
- Z (Zero): result == 0
- N (Negative): result < 0 (MSB is 1)
- C (Carry): unsigned overflow occurred
- V (Overflow): signed overflow occurred
Carry detection (for ADD):
uint32_t a = regs[rs1];
uint32_t b = regs[rs2];
uint32_t result = a + b;
flags.C = (result < a); // wrapped around
Overflow detection (for ADD):
int32_t a = (int32_t)regs[rs1];
int32_t b = (int32_t)regs[rs2];
int32_t result = a + b;
// Overflow if: both operands same sign, result different sign
flags.V = ((a > 0) && (b > 0) && (result < 0)) ||
((a < 0) && (b < 0) && (result >= 0));
Implement CMP and CMPI (same as SUB but discard result).
7.4 Phase 4: Branches and Jumps (Days 8-9)
Implement conditional branches:
BEQ: if (flags.Z) PC += offset
BNE: if (!flags.Z) PC += offset
BLT: if (flags.N != flags.V) PC += offset
BGE: if (flags.N == flags.V) PC += offset
Important: Offset is relative to current PC. If you already incremented PC, adjust accordingly.
Example:
PC = 0x000C
BEQ -8 ; branch back 8 bytes
If Z=1: PC = 0x000C + (-8) = 0x0004
7.5 Phase 5: Memory Operations (Days 10-11)
Implement LW and SW:
LW Rd, offset(Rs):
address = regs[rs] + sign_extend(offset)
regs[rd] = memory[address] // read 4 bytes
SW Rs, offset(Rb):
address = regs[rb] + sign_extend(offset)
memory[address] = regs[rs] // write 4 bytes
Handle endianness: Most emulators use little-endian (like x86).
Reading 4 bytes from address A:
byte0 = memory[A]
byte1 = memory[A+1]
byte2 = memory[A+2]
byte3 = memory[A+3]
word = byte0 | (byte1 << 8) | (byte2 << 16) | (byte3 << 24)
7.6 Phase 6: Basic Assembler (Days 12-14)
Two-Pass Assembly
Pass 1: Build symbol table
- Scan through source
- Track current address (starts at 0)
- When you see a label, record (name, address)
- Each instruction advances address by 4
Pass 2: Generate code
- For each instruction:
- Look up opcode
- Encode registers
- For labels: calculate offset = label_address - current_address
- Emit 32-bit word
Parsing Strategy
Simple approach for each line:
- Strip comments (everything after
;) - Check for label (ends with
:) - Extract mnemonic and operands
- Parse register (R0-R7) or immediate (#number or label)
7.7 Phase 7: Testing and Polish (Days 15-18)
- Test with handwritten assembly programs
- Add –trace mode (show each instruction)
- Add disassembly output
- Handle errors gracefully
8. The Assembler
8.1 Assembly Language Syntax
Define your syntax clearly:
; Comments start with semicolon
; Labels end with colon
; Registers are R0-R7
; Immediates use # prefix
; Memory uses offset(register) syntax
label:
ADD R1, R2, R3 ; R1 = R2 + R3
ADDI R1, R0, #100 ; R1 = 100
LW R2, 8(R1) ; R2 = MEM[R1 + 8]
BNE label ; branch if not equal
8.2 Pseudo-Instructions
Assemblers typically support pseudo-instructions that expand to real instructions:
+-----------------------------------------------------------------------+
| PSEUDO-INSTRUCTIONS |
+-----------------------------------------------------------------------+
| |
| MOV Rd, Rs => ADD Rd, R0, Rs |
| MOV Rd, #imm => ADDI Rd, R0, #imm |
| NOP => ADD R0, R0, R0 |
| PUSH Rs => ADDI R7, R7, #-4 |
| SW Rs, 0(R7) |
| POP Rd => LW Rd, 0(R7) |
| ADDI R7, R7, #4 |
| CALL label => JAL R6, label |
| RET => JALR R0, R6, #0 |
| |
+-----------------------------------------------------------------------+
8.3 Label Resolution
Labels create forward references that require two-pass assembly:
BEQ skip ; Forward reference - 'skip' not yet defined
ADD R1, R2, R3
skip: ; Now we know skip is at address 0x0008
HALT
Pass 1 Algorithm:
address = 0
for each line:
if line has label:
symbol_table[label_name] = address
if line has instruction:
address += 4
Pass 2 Algorithm:
address = 0
for each instruction:
if operand is label:
target = symbol_table[label]
offset = target - address
encode instruction with offset
emit to output
address += 4
9. Testing Strategy
9.1 Unit Tests for Individual Instructions
Test each instruction in isolation:
; test_add.asm
ADDI R1, R0, #5 ; R1 = 5
ADDI R2, R0, #3 ; R2 = 3
ADD R3, R1, R2 ; R3 = 8
HALT
; Expected: R1=5, R2=3, R3=8
9.2 Test Programs
; test_loop.asm - Count from 0 to 10
ADDI R1, R0, #0 ; counter = 0
loop:
ADDI R1, R1, #1 ; counter++
CMPI R1, #10
BNE loop ; loop until counter == 10
HALT
; Expected: R1=10
; test_memory.asm - Store and load
ADDI R1, R0, #42 ; value to store
ADDI R2, R0, #256 ; address = 256
SW R1, 0(R2) ; MEM[256] = 42
LW R3, 0(R2) ; R3 = MEM[256]
HALT
; Expected: R1=42, R3=42
; test_function.asm - Simple function call
ADDI R7, R0, #1024 ; Initialize stack pointer
ADDI R1, R0, #5 ; argument
JAL R6, double ; call double(5)
HALT ; R2 should be 10
double:
ADD R2, R1, R1 ; return value = arg * 2
JALR R0, R6, #0 ; return
9.3 Validation Checklist
[ ] All R-type instructions work correctly
[ ] All I-type instructions work correctly
[ ] Immediate sign extension is correct
[ ] Flags are set correctly after arithmetic
[ ] All branch conditions work
[ ] Memory loads/stores work
[ ] Little-endian byte order is correct
[ ] Labels resolve to correct addresses
[ ] Negative offsets work for backward branches
[ ] R0 is always zero (even after writes)
10. Common Pitfalls
10.1 Sign Extension Errors
The 20-bit immediate is signed and must be sign-extended to 32 bits:
// WRONG: zero-extension
int32_t imm = instruction & 0xFFFFF; // Always positive!
// RIGHT: sign-extension
int32_t imm = instruction & 0xFFFFF;
if (imm & 0x80000) { // If bit 19 is set (negative)
imm |= 0xFFF00000; // Extend the sign
}
// Or using arithmetic shift:
int32_t imm = ((int32_t)(instruction << 12)) >> 12;
10.2 PC-Relative Offset Calculation
When the CPU fetches an instruction, PC points to that instruction. But by the time you execute a branch, you may have already incremented PC.
Two approaches:
1. Increment PC before execute:
PC += 4;
if (branch_taken) PC += offset; // offset is from NEW PC
2. Increment PC after execute:
if (branch_taken) PC = PC + offset; // offset from current instruction
else PC += 4;
Be consistent in your assembler and emulator!
10.3 Memory Alignment
If you require word-aligned access:
if (address & 0x3) { // Not divisible by 4
// Error: unaligned access
cpu->status = STATUS_ALIGNMENT_ERROR;
cpu->halted = true;
return;
}
10.4 Zero Register Enforcement
R0 must always be zero:
// After EVERY instruction that writes to a register:
cpu->regs[0] = 0;
// Or, before reading:
uint32_t read_reg(CPU *cpu, int r) {
return (r == 0) ? 0 : cpu->regs[r];
}
10.5 Assembler Label Scoping
Labels should be case-sensitive and can contain underscores:
loop: ; valid
LOOP: ; different label!
my_label: ; valid
123label: ; invalid - can't start with digit
11. Extensions
After completing the basic project, consider these extensions:
11.1 Add More Instructions
- MUL Rd, Rs1, Rs2 ; Multiplication
- DIV Rd, Rs1, Rs2 ; Division (integer)
- REM Rd, Rs1, Rs2 ; Remainder (modulo)
- SRA Rd, Rs1, Rs2 ; Arithmetic right shift
- NOT Rd, Rs ; Bitwise NOT
- NEG Rd, Rs ; Two's complement negate
11.2 Implement a Stack
Add PUSH and POP as real instructions (not just pseudo-instructions):
PUSH Rs:
R7 = R7 - 4
MEM[R7] = Rs
POP Rd:
Rd = MEM[R7]
R7 = R7 + 4
11.3 Add System Calls
Implement a simple syscall mechanism:
ADDI R1, R0, #1 ; syscall number: print_int
ADDI R2, R0, #42 ; argument: value to print
SYSCALL ; execute syscall
Syscalls could include:
- Print integer
- Print character
- Read input
- Random number
11.4 Implement Interrupts
Add a timer interrupt:
- Timer counts down each cycle
- When timer reaches zero, save PC and jump to interrupt handler
- RETI instruction returns from interrupt
11.5 Build a Pipeline Simulator
Extend your emulator to simulate pipelining:
- Show what’s in each stage (F/D/E/M/W) each cycle
- Detect and display hazards
- Implement forwarding
- Calculate CPI
11.6 Write a C Compiler Backend
Write a simple compiler that emits your assembly:
- Input: Simple subset of C (arithmetic, if/while, functions)
- Output: Your assembly language
- This is a significant project but teaches you how compilers work!
12. Resources
12.1 Primary Books
| Book | Author | Relevant Chapters |
|---|---|---|
| Computer Organization and Design (RISC-V Ed.) | Patterson & Hennessy | Ch 2 (Instructions), Ch 4 (Processor) |
| Computer Systems: A Programmer’s Perspective | Bryant & O’Hallaron | Ch 3 (Machine-Level Representation) |
| Language Implementation Patterns | Terence Parr | Ch 5 (Assemblers) |
| Digital Design and Computer Architecture | Harris & Harris | Ch 6-7 (Architecture, Microarchitecture) |
12.2 Online Resources
- RISC-V Specification: Official ISA reference (model for your design)
- MIPS Green Sheet: Classic RISC ISA reference card
- Ben Eater’s Videos: 8-bit computer on breadboards
- Nand2Tetris: Build a CPU from scratch (different approach)
12.3 Reference Implementations
Study these open-source emulators:
- rv32emu: Simple RISC-V emulator in C
- chip8-c: CHIP-8 emulator (simpler starting point)
- unicorn: Multi-architecture CPU emulator
12.4 Debugging Tips
- Add verbose trace mode: Show every register after every instruction
- Compare with hand calculation: Work through a program on paper first
- Test in isolation: Verify each instruction type before combining
- Use hex consistently: Easier to match against binary encoding
- Write a disassembler first: Helps verify your encoding is correct
Self-Assessment Checklist
Before moving to the next project, verify:
Understanding
[ ] I can explain why RISC uses fixed-width instructions
[ ] I understand the trade-off between register count and immediate size
[ ] I can explain why load/store architecture simplifies hardware
[ ] I understand how condition codes enable conditional execution
[ ] I can trace the fetch-decode-execute cycle for any instruction
[ ] I can manually encode/decode instructions in my ISA
Implementation
[ ] My emulator runs simple programs correctly
[ ] All arithmetic instructions work, including with negative numbers
[ ] Branches work correctly in both directions
[ ] Memory operations handle all addresses correctly
[ ] My assembler produces correct machine code
[ ] Labels and forward references are resolved correctly
Portfolio
[ ] I have a working emulator with --trace mode
[ ] I have a working assembler with error messages
[ ] I have at least 5 test programs that demonstrate features
[ ] I can explain every design decision I made
[ ] I could extend this to add new instructions
You have designed your own instruction set architecture. You understand why real ISAs make the choices they do. Now go run Fibonacci on YOUR CPU.