Project 5: Simple RISC CPU Emulator (Custom ISA)

Design and implement your own RISC instruction set architecture from scratch. Build an emulator that executes your machine code, and write an assembler that translates your assembly language into binary. This is YOUR CPU, YOUR ISA, YOUR design decisions.

Quick Reference

Attribute	Value
Language	C (alt: Rust, C++, Zig)
Difficulty	Advanced
Time	2-3 weeks
Prerequisites	P03 (Stack VM) or P04 (CHIP-8)
Coolness	Hardcore Tech Flex
Portfolio Value	Resume Gold

Learning Objectives
Why Design Your Own ISA?
Theoretical Foundation
ISA Design Decisions
Project Specification
Solution Architecture
Implementation Guide
The Assembler
Testing Strategy
Common Pitfalls
Extensions
Resources

1. Learning Objectives

By completing this project, you will:

Understand ISA design trade-offs: Why fixed-width instructions? Why 32 registers vs 8? Why load/store architecture?
Master instruction encoding: Pack opcodes, registers, and immediates into bit fields
Implement a complete CPU emulator: Fetch, decode, execute with registers, memory, and flags
Build an assembler: Parse assembly text, resolve labels, emit binary machine code
Think like a hardware architect: Every decision affects complexity, performance, and expressiveness
Debug at the machine level: Trace execution, inspect registers, understand what your code becomes

2. Why Design Your Own ISA?

With CHIP-8 or 6502, you implement someone else’s specification. You learn how CPUs work, but not why they’re designed a particular way.

When YOU design the ISA, you confront fundamental questions:

+-----------------------------------------------------------------------+
|                    QUESTIONS YOU MUST ANSWER                           |
+-----------------------------------------------------------------------+
|                                                                        |
|  Instruction Width:                                                    |
|    - Fixed 32-bit? Variable length? Why?                               |
|    - How many bits for opcode? For registers? For immediates?          |
|                                                                        |
|  Registers:                                                            |
|    - How many? 8? 16? 32?                                              |
|    - Special-purpose (PC, SP, flags) or general-purpose?               |
|    - Register 0 hardwired to zero? (Like MIPS/RISC-V)                  |
|                                                                        |
|  Memory Access:                                                        |
|    - Load/store architecture? Or memory operands in ALU ops?           |
|    - What addressing modes? Immediate? Register indirect? Indexed?     |
|    - Byte-addressable? Word-addressable? Alignment requirements?       |
|                                                                        |
|  Control Flow:                                                         |
|    - Condition codes (flags) or compare-and-branch?                    |
|    - PC-relative or absolute jumps?                                    |
|    - How to implement function calls?                                  |
|                                                                        |
|  Immediates:                                                           |
|    - How many bits for immediate values?                               |
|    - Sign-extended or zero-extended?                                   |
|    - How to load large constants? (LUI + ADDI like RISC-V?)            |
|                                                                        |
+-----------------------------------------------------------------------+

After this project, you’ll understand why real ISAs (ARM, x86, RISC-V) made their choices.

3. Theoretical Foundation

3.1 RISC Philosophy

RISC (Reduced Instruction Set Computer) emerged in the 1980s as a reaction to complex ISAs:

+-----------------------------------------------------------------------+
|                    CISC vs RISC COMPARISON                             |
+-----------------------------------------------------------------------+
|                                                                        |
|  CISC (x86, VAX)                    RISC (ARM, MIPS, RISC-V)           |
|  +-----------------------+         +-----------------------+          |
|  | Complex instructions  |         | Simple instructions   |          |
|  | - String copy         |         | - One operation each  |          |
|  | - Polynomial eval     |         | - Load/store memory   |          |
|  | - Memory-to-memory    |         | - Register-to-reg ALU |          |
|  +-----------------------+         +-----------------------+          |
|  | Variable length       |         | Fixed length          |          |
|  | (1-15 bytes x86)      |         | (32 bits typical)     |          |
|  +-----------------------+         +-----------------------+          |
|  | Many addressing modes |         | Few addressing modes  |          |
|  | - Indexed, indirect   |         | - Register, immediate |          |
|  | - Scaled index        |         | - Base + offset       |          |
|  +-----------------------+         +-----------------------+          |
|  | Microcode-heavy       |         | Hardwired control     |          |
|  | (ROM for complex ops) |         | (faster, simpler)     |          |
|  +-----------------------+         +-----------------------+          |
|                                                                        |
|  RISC Key Principles:                                                  |
|  1. Fixed instruction size (easy to fetch/decode)                      |
|  2. Load/store architecture (only loads/stores touch memory)           |
|  3. Many registers (reduce memory traffic)                             |
|  4. Simple instructions (one operation per instruction)                |
|  5. Hardwired control (no microcode)                                   |
|                                                                        |
+-----------------------------------------------------------------------+

Your ISA will follow RISC principles: fixed-width instructions, load/store architecture, simple operations.

3.2 Instruction Encoding Design

The heart of ISA design is packing information into a fixed number of bits:

+-----------------------------------------------------------------------+
|                 INSTRUCTION ENCODING TRADE-OFFS                        |
+-----------------------------------------------------------------------+
|                                                                        |
|  32-bit instruction with 8 registers (3-bit register IDs):             |
|                                                                        |
|  R-type (Register-Register operations):                                |
|  +--------+------+------+------+--------+                              |
|  | opcode |  Rd  |  Rs1 |  Rs2 | unused |                              |
|  | 6 bits | 3b   | 3b   | 3b   | 17 bits|                              |
|  +--------+------+------+------+--------+                              |
|    bits:   31-26  25-23  22-20  19-17    16-0                          |
|                                                                        |
|  Example: ADD R1, R2, R3   (R1 = R2 + R3)                              |
|  opcode=0x01, Rd=1, Rs1=2, Rs2=3                                       |
|  Binary: 000001 001 010 011 00000000000000000                          |
|                                                                        |
|  I-type (Immediate operations):                                        |
|  +--------+------+------+------------------+                           |
|  | opcode |  Rd  |  Rs1 |    immediate     |                           |
|  | 6 bits | 3b   | 3b   |     20 bits      |                           |
|  +--------+------+------+------------------+                           |
|    bits:   31-26  25-23  22-20     19-0                                |
|                                                                        |
|  Example: ADDI R1, R2, #100   (R1 = R2 + 100)                          |
|  opcode=0x02, Rd=1, Rs1=2, imm=100                                     |
|  20-bit immediate: -524288 to +524287 (signed)                         |
|                                                                        |
|  J-type (Jump instructions):                                           |
|  +--------+------+------------------------+                            |
|  | opcode |  Rd  |        offset          |                            |
|  | 6 bits | 3b   |       23 bits          |                            |
|  +--------+------+------------------------+                            |
|    bits:   31-26  25-23       22-0                                     |
|                                                                        |
|  Example: JAL R7, label   (R7 = PC+4, jump to label)                   |
|  23-bit offset: +/- 4MB range (PC-relative)                            |
|                                                                        |
+-----------------------------------------------------------------------+
|                                                                        |
|  DESIGN DECISION: Why these bit allocations?                           |
|                                                                        |
|  - 6 bits for opcode: 64 possible instructions (plenty for ~20)        |
|  - 3 bits for registers: 8 registers (R0-R7)                           |
|  - Remaining bits: maximize immediate/offset range                     |
|                                                                        |
|  Alternative: 16 registers needs 4 bits each                           |
|  +--------+------+------+------+--------+                              |
|  | opcode |  Rd  |  Rs1 |  Rs2 | unused |                              |
|  | 6 bits | 4b   | 4b   | 4b   | 14 bits|                              |
|  +--------+------+------+------+--------+                              |
|                                                                        |
|  More registers = smaller immediates. Trade-off!                       |
|                                                                        |
+-----------------------------------------------------------------------+

Instruction Encoding Formats

3.3 Register File Architecture

Registers are the CPU’s ultra-fast scratch space:

+-----------------------------------------------------------------------+
|                     REGISTER FILE DESIGN                               |
+-----------------------------------------------------------------------+
|                                                                        |
|  OPTION A: 8 General-Purpose Registers                                 |
|  +-----+-----+-----+-----+-----+-----+-----+-----+                     |
|  | R0  | R1  | R2  | R3  | R4  | R5  | R6  | R7  |                     |
|  +-----+-----+-----+-----+-----+-----+-----+-----+                     |
|    ^                                         ^                         |
|    |                                         |                         |
|    Zero register (always 0)?           Stack pointer?                  |
|    Or general purpose?                 Or general purpose?             |
|                                                                        |
|  Special Registers (separate from GPRs):                               |
|  +--------+                                                            |
|  |   PC   |  Program Counter (address of current instruction)          |
|  +--------+                                                            |
|  +--------+                                                            |
|  |  FLAGS |  Status flags (Zero, Negative, Carry, Overflow)            |
|  +--------+                                                            |
|                                                                        |
+-----------------------------------------------------------------------+
|                                                                        |
|  DESIGN CHOICES:                                                       |
|                                                                        |
|  1. R0 = Zero Register (MIPS, RISC-V style)                            |
|     - Writes to R0 are discarded                                       |
|     - Reads from R0 always return 0                                    |
|     - Simplifies MOV instruction: ADD R1, R0, R2 copies R2 to R1       |
|     - NOP can be: ADD R0, R0, R0                                       |
|                                                                        |
|  2. All registers general-purpose (ARM style)                          |
|     - Need explicit MOV instruction                                    |
|     - R7 conventionally used as stack pointer (software convention)    |
|                                                                        |
|  3. Dedicated SP and LR (ARM style)                                    |
|     - R7 = SP (stack pointer)                                          |
|     - R6 = LR (link register for return address)                       |
|     - Simplifies CALL/RET                                              |
|                                                                        |
+-----------------------------------------------------------------------+
|                                                                        |
|  RECOMMENDATION FOR THIS PROJECT:                                      |
|                                                                        |
|  8 registers, R0 = zero, R7 = stack pointer (convention)               |
|  +-----+-----+-----+-----+-----+-----+-----+-----+                     |
|  | R0  | R1  | R2  | R3  | R4  | R5  | R6  | R7  |                     |
|  | =0  | GP  | GP  | GP  | GP  | GP  | GP  | SP  |                     |
|  +-----+-----+-----+-----+-----+-----+-----+-----+                     |
|                                                                        |
|  Benefits:                                                             |
|  - 3 bits per register (efficient encoding)                            |
|  - Zero register simplifies ISA                                        |
|  - Stack pointer convention (not hardwired)                            |
|                                                                        |
+-----------------------------------------------------------------------+

3.4 Addressing Modes

How instructions specify operand locations:

+-----------------------------------------------------------------------+
|                      ADDRESSING MODES                                  |
+-----------------------------------------------------------------------+
|                                                                        |
|  1. REGISTER MODE                                                      |
|     Operand is in a register                                           |
|                                                                        |
|     ADD R1, R2, R3      ; R1 = R2 + R3                                 |
|                                                                        |
|  2. IMMEDIATE MODE                                                     |
|     Operand is a constant encoded in the instruction                   |
|                                                                        |
|     ADDI R1, R2, #100   ; R1 = R2 + 100                                |
|     MOV R1, #42         ; R1 = 42  (same as ADDI R1, R0, #42)          |
|                                                                        |
|  3. REGISTER INDIRECT                                                  |
|     Address is in a register                                           |
|                                                                        |
|     LW R1, (R2)         ; R1 = Memory[R2]                              |
|     SW R1, (R2)         ; Memory[R2] = R1                              |
|                                                                        |
|  4. BASE + OFFSET (Displacement)                                       |
|     Address = register + constant                                      |
|                                                                        |
|     LW R1, 8(R2)        ; R1 = Memory[R2 + 8]                          |
|     SW R1, -4(R2)       ; Memory[R2 - 4] = R1                          |
|                                                                        |
|     Common pattern: accessing struct fields, local variables           |
|       LW R1, 0(FP)      ; load first local variable                    |
|       LW R2, 4(FP)      ; load second local variable                   |
|                                                                        |
+-----------------------------------------------------------------------+
|                                                                        |
|  LOAD/STORE ENCODING:                                                  |
|                                                                        |
|  LW (Load Word):                                                       |
|  +--------+------+------+------------------+                           |
|  | LW_OP  |  Rd  |  Rs  |     offset       |                           |
|  | 6 bits | 3b   | 3b   |     20 bits      |                           |
|  +--------+------+------+------------------+                           |
|                                                                        |
|  LW R1, 8(R2)  =>  R1 = Memory[R2 + 8]                                 |
|                                                                        |
|  SW (Store Word):                                                      |
|  +--------+------+------+------------------+                           |
|  | SW_OP  |  Rs  |  Rb  |     offset       |                           |
|  | 6 bits | 3b   | 3b   |     20 bits      |                           |
|  +--------+------+------+------------------+                           |
|                                                                        |
|  SW R1, 8(R2)  =>  Memory[R2 + 8] = R1                                 |
|                                                                        |
|  Note: RISC ISAs typically only have LOAD and STORE touch memory.      |
|  ALU operations work exclusively on registers.                         |
|                                                                        |
+-----------------------------------------------------------------------+

3.5 Condition Codes and Status Flags

Two approaches to conditional execution:

+-----------------------------------------------------------------------+
|                    CONDITIONAL EXECUTION                               |
+-----------------------------------------------------------------------+
|                                                                        |
|  APPROACH 1: Condition Codes (FLAGS Register)                          |
|  +---+---+---+---+                                                     |
|  | Z | N | C | V |                                                     |
|  +---+---+---+---+                                                     |
|    |   |   |   |                                                       |
|    |   |   |   +-- Overflow (signed overflow occurred)                 |
|    |   |   +------ Carry (unsigned overflow/borrow)                    |
|    |   +---------- Negative (result is negative, MSB=1)                |
|    +-------------- Zero (result is zero)                               |
|                                                                        |
|  ALU operations set flags:                                             |
|    SUB R0, R1, R2   ; R0 = R1 - R2, sets Z, N, C, V                    |
|    BEQ label        ; branch if Z=1 (equal)                            |
|    BLT label        ; branch if N!=V (less than, signed)               |
|                                                                        |
|  Condition Combinations:                                               |
|    EQ (equal):        Z=1                                              |
|    NE (not equal):    Z=0                                              |
|    LT (less than):    N!=V                                             |
|    GE (greater/eq):   N==V                                             |
|    LE (less/equal):   Z=1 OR N!=V                                      |
|    GT (greater):      Z=0 AND N==V                                     |
|    CS (carry set):    C=1 (unsigned >=)                                |
|    CC (carry clear):  C=0 (unsigned <)                                 |
|                                                                        |
+-----------------------------------------------------------------------+
|                                                                        |
|  APPROACH 2: Compare-and-Branch (RISC-V style)                         |
|                                                                        |
|  No flags register! Branch instructions compare directly:              |
|    BEQ R1, R2, label    ; if R1 == R2, branch                          |
|    BLT R1, R2, label    ; if R1 < R2 (signed), branch                  |
|    BGE R1, R2, label    ; if R1 >= R2 (signed), branch                 |
|                                                                        |
|  Advantages:                                                           |
|  - No hidden state (flags)                                             |
|  - No flag dependencies between instructions                           |
|  - Easier out-of-order execution                                       |
|                                                                        |
|  Disadvantages:                                                        |
|  - Branch instruction needs 2 register operands (encoding space)       |
|  - Can't reuse comparison result                                       |
|                                                                        |
+-----------------------------------------------------------------------+
|                                                                        |
|  RECOMMENDATION FOR THIS PROJECT:                                      |
|                                                                        |
|  Use FLAGS register (simpler for beginners):                           |
|  - CMP instruction sets flags                                          |
|  - Branch instructions check flags                                     |
|  - Classic approach, matches ARM and x86                               |
|                                                                        |
|  CMP R1, R2          ; FLAGS = R1 - R2 (result discarded)              |
|  BEQ label           ; if Z=1, PC = label                              |
|                                                                        |
+-----------------------------------------------------------------------+

Condition Flags Diagram

3.6 The Fetch-Decode-Execute Cycle

The eternal heartbeat of every CPU:

+-----------------------------------------------------------------------+
|                   FETCH-DECODE-EXECUTE CYCLE                           |
+-----------------------------------------------------------------------+
|                                                                        |
|                          +---------------+                             |
|                          |     FETCH     |                             |
|                          +-------+-------+                             |
|                                  |                                     |
|                                  v                                     |
|                    +-------------+-------------+                       |
|                    | instruction = MEM[PC]     |                       |
|                    | PC = PC + 4               |                       |
|                    +-------------+-------------+                       |
|                                  |                                     |
|                                  v                                     |
|                          +-------+-------+                             |
|                          |    DECODE     |                             |
|                          +-------+-------+                             |
|                                  |                                     |
|                                  v                                     |
|                    +-------------+-------------+                       |
|                    | opcode = (instr >> 26)    |                       |
|                    | rd = (instr >> 23) & 0x7  |                       |
|                    | rs1 = (instr >> 20) & 0x7 |                       |
|                    | rs2 = (instr >> 17) & 0x7 |                       |
|                    | imm = sign_ext(instr)     |                       |
|                    +-------------+-------------+                       |
|                                  |                                     |
|                                  v                                     |
|                          +-------+-------+                             |
|                          |    EXECUTE    |                             |
|                          +-------+-------+                             |
|                                  |                                     |
|                                  v                                     |
|                    +-------------+-------------+                       |
|                    | switch(opcode):           |                       |
|                    |   ADD: R[rd] = R[rs1] +   |                       |
|                    |              R[rs2]       |                       |
|                    |   SUB: R[rd] = R[rs1] -   |                       |
|                    |              R[rs2]       |                       |
|                    |   LW:  R[rd] = MEM[R[rs1] |                       |
|                    |              + imm]       |                       |
|                    |   BEQ: if(Z) PC = PC +    |                       |
|                    |              imm          |                       |
|                    |   ...                     |                       |
|                    +-------------+-------------+                       |
|                                  |                                     |
|                                  +------+                              |
|                                         |                              |
|                    +--------------------+                              |
|                    |                                                   |
|                    v                                                   |
|              (loop forever until HALT)                                 |
|                                                                        |
+-----------------------------------------------------------------------+
|                                                                        |
|  EMULATOR MAIN LOOP (pseudocode):                                      |
|                                                                        |
|  while (!halted) {                                                     |
|      // FETCH                                                          |
|      instruction = memory[pc];                                         |
|      pc += 4;                                                          |
|                                                                        |
|      // DECODE                                                         |
|      opcode = (instruction >> 26) & 0x3F;                              |
|      rd = (instruction >> 23) & 0x7;                                   |
|      rs1 = (instruction >> 20) & 0x7;                                  |
|      rs2 = (instruction >> 17) & 0x7;                                  |
|      imm = sign_extend_20bit(instruction & 0xFFFFF);                   |
|                                                                        |
|      // EXECUTE                                                        |
|      switch(opcode) {                                                  |
|          case OP_ADD: regs[rd] = regs[rs1] + regs[rs2]; break;         |
|          case OP_LW:  regs[rd] = memory[regs[rs1] + imm]; break;       |
|          case OP_BEQ: if(flags.Z) pc = pc - 4 + imm; break;            |
|          case OP_HALT: halted = true; break;                           |
|          // ...                                                        |
|      }                                                                 |
|                                                                        |
|      // Enforce R0 = 0                                                 |
|      regs[0] = 0;                                                      |
|  }                                                                     |
|                                                                        |
+-----------------------------------------------------------------------+

Fetch-Decode-Execute Cycle

4. ISA Design Decisions

This section guides you through designing YOUR instruction set. Make deliberate choices!

4.1 Your ISA Parameters

Fill in these decisions:

+-----------------------------------------------------------------------+
|                    YOUR ISA DESIGN WORKSHEET                           |
+-----------------------------------------------------------------------+
|                                                                        |
|  ISA Name: ______________________ (e.g., "SimpleCPU", "TinyRISC")     |
|                                                                        |
|  INSTRUCTION WIDTH                                                     |
|  [ ] 16-bit (compact, limited immediates)                              |
|  [X] 32-bit (recommended - good balance)                               |
|  [ ] Variable (complex decoding)                                       |
|                                                                        |
|  REGISTER COUNT                                                        |
|  [ ] 4 registers (2-bit IDs, very limited)                             |
|  [X] 8 registers (3-bit IDs, recommended)                              |
|  [ ] 16 registers (4-bit IDs, uses more instruction bits)              |
|  [ ] 32 registers (5-bit IDs, like MIPS/RISC-V)                        |
|                                                                        |
|  REGISTER WIDTH                                                        |
|  [ ] 16-bit                                                            |
|  [X] 32-bit (recommended)                                              |
|  [ ] 64-bit                                                            |
|                                                                        |
|  MEMORY                                                                |
|  Word size: 32 bits (4 bytes)                                          |
|  Addressing: Byte-addressable                                          |
|  Alignment: Word-aligned (addresses must be multiples of 4)            |
|  Size: 64KB (addressable with 16-bit addresses)                        |
|                                                                        |
|  SPECIAL REGISTERS                                                     |
|  [X] R0 = Zero (always reads 0, writes ignored)                        |
|  [ ] R0 = General purpose                                              |
|  R7 = Stack pointer (software convention)                              |
|                                                                        |
|  CONDITIONAL MODEL                                                     |
|  [X] Flags register (Z, N, C, V)                                       |
|  [ ] Compare-and-branch (no flags)                                     |
|                                                                        |
+-----------------------------------------------------------------------+

4.2 Recommended Instruction Set (~20 instructions)

Design your instruction set in categories:

+-----------------------------------------------------------------------+
|                    RECOMMENDED INSTRUCTION SET                         |
+-----------------------------------------------------------------------+
|                                                                        |
|  CATEGORY 1: ARITHMETIC (R-type)                                       |
|  +--------+--------------------------------+-------------+             |
|  | Opcode | Instruction                    | Operation   |             |
|  +--------+--------------------------------+-------------+             |
|  | 0x01   | ADD  Rd, Rs1, Rs2              | Rd=Rs1+Rs2  |             |
|  | 0x02   | SUB  Rd, Rs1, Rs2              | Rd=Rs1-Rs2  |             |
|  | 0x03   | AND  Rd, Rs1, Rs2              | Rd=Rs1&Rs2  |             |
|  | 0x04   | OR   Rd, Rs1, Rs2              | Rd=Rs1|Rs2  |             |
|  | 0x05   | XOR  Rd, Rs1, Rs2              | Rd=Rs1^Rs2  |             |
|  | 0x06   | SLL  Rd, Rs1, Rs2              | Rd=Rs1<<Rs2 |             |
|  | 0x07   | SRL  Rd, Rs1, Rs2              | Rd=Rs1>>Rs2 |             |
|  +--------+--------------------------------+-------------+             |
|                                                                        |
|  CATEGORY 2: IMMEDIATE OPERATIONS (I-type)                             |
|  +--------+--------------------------------+-------------+             |
|  | Opcode | Instruction                    | Operation   |             |
|  +--------+--------------------------------+-------------+             |
|  | 0x10   | ADDI Rd, Rs, #imm              | Rd=Rs+imm   |             |
|  | 0x11   | ANDI Rd, Rs, #imm              | Rd=Rs&imm   |             |
|  | 0x12   | ORI  Rd, Rs, #imm              | Rd=Rs|imm   |             |
|  | 0x13   | LUI  Rd, #imm                  | Rd=imm<<12  |             |
|  +--------+--------------------------------+-------------+             |
|                                                                        |
|  Note: MOV Rd, #imm = ADDI Rd, R0, #imm (pseudo-instruction)           |
|        MOV Rd, Rs  = ADD Rd, R0, Rs (pseudo-instruction)               |
|                                                                        |
|  CATEGORY 3: MEMORY (I-type)                                           |
|  +--------+--------------------------------+-----------------+         |
|  | Opcode | Instruction                    | Operation       |         |
|  +--------+--------------------------------+-----------------+         |
|  | 0x20   | LW   Rd, offset(Rs)            | Rd=MEM[Rs+off]  |         |
|  | 0x21   | SW   Rs, offset(Rb)            | MEM[Rb+off]=Rs  |         |
|  | 0x22   | LB   Rd, offset(Rs)            | Rd=MEM[Rs+off]  |         |
|  | 0x23   | SB   Rs, offset(Rb)            | MEM[Rb+off]=Rs  |         |
|  +--------+--------------------------------+-----------------+         |
|                                                                        |
|  CATEGORY 4: COMPARISON                                                |
|  +--------+--------------------------------+-------------+             |
|  | Opcode | Instruction                    | Operation   |             |
|  +--------+--------------------------------+-------------+             |
|  | 0x30   | CMP  Rs1, Rs2                  | FLAGS=Rs1-Rs2|            |
|  | 0x31   | CMPI Rs, #imm                  | FLAGS=Rs-imm |            |
|  +--------+--------------------------------+-------------+             |
|                                                                        |
|  CATEGORY 5: BRANCHES (B-type, PC-relative)                            |
|  +--------+--------------------------------+-------------------+       |
|  | Opcode | Instruction                    | Operation         |       |
|  +--------+--------------------------------+-------------------+       |
|  | 0x40   | BEQ  offset                    | if Z, PC+=offset  |       |
|  | 0x41   | BNE  offset                    | if !Z, PC+=offset |       |
|  | 0x42   | BLT  offset                    | if N!=V, PC+=off  |       |
|  | 0x43   | BGE  offset                    | if N==V, PC+=off  |       |
|  | 0x44   | BLE  offset                    | if Z|N!=V, PC+=off|       |
|  | 0x45   | BGT  offset                    | if !Z&N==V, PC+=of|       |
|  +--------+--------------------------------+-------------------+       |
|                                                                        |
|  CATEGORY 6: JUMPS AND CALLS                                           |
|  +--------+--------------------------------+-------------------+       |
|  | Opcode | Instruction                    | Operation         |       |
|  +--------+--------------------------------+-------------------+       |
|  | 0x50   | JMP  offset                    | PC += offset      |       |
|  | 0x51   | JAL  Rd, offset                | Rd=PC, PC+=offset |       |
|  | 0x52   | JALR Rd, Rs, offset            | Rd=PC, PC=Rs+off  |       |
|  +--------+--------------------------------+-------------------+       |
|                                                                        |
|  Note: CALL = JAL R6, offset (R6 = link register by convention)        |
|        RET  = JALR R0, R6, 0 (jump to address in R6, discard PC)       |
|                                                                        |
|  CATEGORY 7: SYSTEM                                                    |
|  +--------+--------------------------------+-------------------+       |
|  | Opcode | Instruction                    | Operation         |       |
|  +--------+--------------------------------+-------------------+       |
|  | 0x00   | HALT                           | Stop execution    |       |
|  | 0x3F   | NOP                            | No operation      |       |
|  +--------+--------------------------------+-------------------+       |
|                                                                        |
+-----------------------------------------------------------------------+

4.3 Encoding Summary

+-----------------------------------------------------------------------+
|                    INSTRUCTION ENCODING FORMATS                        |
+-----------------------------------------------------------------------+
|                                                                        |
|  R-TYPE (Register-Register):                                           |
|  +--------+------+------+------+-----------------+                     |
|  | opcode |  Rd  |  Rs1 |  Rs2 |     unused      |                     |
|  | 6 bits | 3b   | 3b   | 3b   |     17 bits     |                     |
|  +--------+------+------+------+-----------------+                     |
|  bits:     31-26  25-23  22-20  19-17     16-0                         |
|                                                                        |
|  Used by: ADD, SUB, AND, OR, XOR, SLL, SRL, CMP                        |
|                                                                        |
|  I-TYPE (Immediate):                                                   |
|  +--------+------+------+------------------------+                     |
|  | opcode |  Rd  |  Rs  |       immediate        |                     |
|  | 6 bits | 3b   | 3b   |        20 bits         |                     |
|  +--------+------+------+------------------------+                     |
|  bits:     31-26  25-23  22-20       19-0                              |
|                                                                        |
|  Used by: ADDI, ANDI, ORI, LUI, LW, SW, CMPI, JALR                     |
|  Immediate is sign-extended to 32 bits                                 |
|  Range: -524288 to +524287                                             |
|                                                                        |
|  B-TYPE (Branch):                                                      |
|  +--------+-----------------------------+                              |
|  | opcode |           offset            |                              |
|  | 6 bits |          26 bits            |                              |
|  +--------+-----------------------------+                              |
|  bits:     31-26          25-0                                         |
|                                                                        |
|  Used by: BEQ, BNE, BLT, BGE, BLE, BGT, JMP                            |
|  Offset is sign-extended, in BYTES (not words)                         |
|  Range: +/- 32MB                                                       |
|                                                                        |
|  J-TYPE (Jump-and-Link):                                               |
|  +--------+------+------------------------+                            |
|  | opcode |  Rd  |        offset          |                            |
|  | 6 bits | 3b   |        23 bits         |                            |
|  +--------+------+------------------------+                            |
|  bits:     31-26  25-23       22-0                                     |
|                                                                        |
|  Used by: JAL                                                          |
|  Rd = return address register                                          |
|                                                                        |
+-----------------------------------------------------------------------+

5. Project Specification

5.1 What You’ll Build

Two programs:

Assembler (myasm): Converts assembly text to binary
Emulator (mycpu): Executes the binary

5.2 Expected Output

$ cat fibonacci.asm
; Fibonacci sequence - compute F(10)
; Uses R1, R2, R3 as working registers
; Result ends up in R2

        ADDI R1, R0, #0      ; R1 = fib(n-2) = 0
        ADDI R2, R0, #1      ; R2 = fib(n-1) = 1
        ADDI R3, R0, #10     ; R3 = counter = 10
loop:
        ADD  R4, R1, R2      ; R4 = fib(n) = fib(n-2) + fib(n-1)
        ADD  R1, R0, R2      ; R1 = old R2 (shift)
        ADD  R2, R0, R4      ; R2 = new fib
        ADDI R3, R3, #-1     ; counter--
        CMPI R3, #0
        BNE  loop            ; if counter != 0, continue
        HALT

$ ./myasm fibonacci.asm -o fibonacci.bin
Assembler Output:
  Input:  fibonacci.asm
  Output: fibonacci.bin

Pass 1: Scanning for labels...
  Found label 'loop' at address 0x000C

Pass 2: Generating machine code...
  0x0000: ADDI R1, R0, #0      => 0x40200000
  0x0004: ADDI R2, R0, #1      => 0x40400001
  0x0008: ADDI R3, R0, #10     => 0x4060000A
  0x000C: ADD  R4, R1, R2      => 0x04A20000
  0x0010: ADD  R1, R0, R2      => 0x04220000
  0x0014: ADD  R2, R0, R4      => 0x04440000
  0x0018: ADDI R3, R3, #-1     => 0x406FFFFF
  0x001C: CMPI R3, #0          => 0xC4600000
  0x0020: BNE  loop            => 0x41FFFFEC
  0x0024: HALT                 => 0x00000000

Assembled: 10 instructions, 40 bytes

$ ./mycpu fibonacci.bin --trace
SimpleCPU Emulator v1.0
Loading: fibonacci.bin (40 bytes)
Starting execution...

Cycle 1:
  [0x0000] 0x40200000  ADDI R1, R0, #0
  Registers: R0=0 R1=0 R2=0 R3=0 R4=0 R5=0 R6=0 R7=0
  Flags: Z=0 N=0 C=0 V=0

Cycle 2:
  [0x0004] 0x40400001  ADDI R2, R0, #1
  Registers: R0=0 R1=0 R2=1 R3=0 R4=0 R5=0 R6=0 R7=0
  Flags: Z=0 N=0 C=0 V=0

Cycle 3:
  [0x0008] 0x4060000A  ADDI R3, R0, #10
  Registers: R0=0 R1=0 R2=1 R3=10 R4=0 R5=0 R6=0 R7=0
  Flags: Z=0 N=0 C=0 V=0

... (loop iterations) ...

Cycle 84:
  [0x0024] 0x00000000  HALT

=== EXECUTION COMPLETE ===
Total cycles: 84
Final state:
  R0=0  R1=34  R2=55  R3=0  R4=55  R5=0  R6=0  R7=0
  Flags: Z=1 N=0 C=0 V=0

Result: R2 = 55 (10th Fibonacci number)

6. Solution Architecture

6.1 Emulator Structure

+-----------------------------------------------------------------------+
|                      EMULATOR ARCHITECTURE                             |
+-----------------------------------------------------------------------+
|                                                                        |
|  +------------------------------------------------------------------+  |
|  |                          main.c                                   |  |
|  |  - Parse command line (input file, --trace, --debug)             |  |
|  |  - Load binary into memory                                       |  |
|  |  - Call cpu_run()                                                |  |
|  |  - Print final state                                             |  |
|  +------------------------------------------------------------------+  |
|                              |                                         |
|                              v                                         |
|  +------------------------------------------------------------------+  |
|  |                          cpu.h / cpu.c                            |  |
|  |  +----------------------+   +----------------------+              |  |
|  |  | typedef struct {     |   | void cpu_init()      |              |  |
|  |  |   uint32_t regs[8];  |   | void cpu_step()      |              |  |
|  |  |   uint32_t pc;       |   | void cpu_run()       |              |  |
|  |  |   struct {           |   | void cpu_dump()      |              |  |
|  |  |     bool Z, N, C, V; |   +----------------------+              |  |
|  |  |   } flags;           |                                         |  |
|  |  |   bool halted;       |                                         |  |
|  |  |   uint8_t *memory;   |                                         |  |
|  |  | } CPU;               |                                         |  |
|  |  +----------------------+                                         |  |
|  +------------------------------------------------------------------+  |
|                              |                                         |
|                              v                                         |
|  +------------------------------------------------------------------+  |
|  |                       decode.h / decode.c                         |  |
|  |  +---------------------------+                                    |  |
|  |  | typedef struct {          |                                    |  |
|  |  |   uint8_t opcode;         |   Instruction decode_instr(word)   |  |
|  |  |   uint8_t rd, rs1, rs2;   |   const char* disassemble(instr)   |  |
|  |  |   int32_t imm;            |                                    |  |
|  |  |   InstrType type;         |                                    |  |
|  |  | } Instruction;            |                                    |  |
|  |  +---------------------------+                                    |  |
|  +------------------------------------------------------------------+  |
|                              |                                         |
|                              v                                         |
|  +------------------------------------------------------------------+  |
|  |                       execute.h / execute.c                       |  |
|  |                                                                   |  |
|  |  void execute_add(CPU *cpu, Instruction *instr)                   |  |
|  |  void execute_sub(CPU *cpu, Instruction *instr)                   |  |
|  |  void execute_lw(CPU *cpu, Instruction *instr)                    |  |
|  |  void execute_beq(CPU *cpu, Instruction *instr)                   |  |
|  |  ...                                                              |  |
|  |                                                                   |  |
|  |  OR: single execute(CPU*, Instruction*) with switch               |  |
|  +------------------------------------------------------------------+  |
|                              |                                         |
|                              v                                         |
|  +------------------------------------------------------------------+  |
|  |                       memory.h / memory.c                         |  |
|  |                                                                   |  |
|  |  uint32_t mem_read_word(CPU *cpu, uint32_t addr)                  |  |
|  |  void mem_write_word(CPU *cpu, uint32_t addr, uint32_t val)       |  |
|  |  uint8_t mem_read_byte(CPU *cpu, uint32_t addr)                   |  |
|  |  void mem_write_byte(CPU *cpu, uint32_t addr, uint8_t val)        |  |
|  |                                                                   |  |
|  |  Handles alignment, bounds checking, little-endian                |  |
|  +------------------------------------------------------------------+  |
|                                                                        |
+-----------------------------------------------------------------------+

6.2 Assembler Structure

+-----------------------------------------------------------------------+
|                      ASSEMBLER ARCHITECTURE                            |
+-----------------------------------------------------------------------+
|                                                                        |
|  +------------------------------------------------------------------+  |
|  |                          main.c                                   |  |
|  |  - Parse command line (input, -o output)                         |  |
|  |  - Read input file                                               |  |
|  |  - Call assembler passes                                         |  |
|  |  - Write binary output                                           |  |
|  +------------------------------------------------------------------+  |
|                              |                                         |
|                              v                                         |
|  +------------------------------------------------------------------+  |
|  |                       lexer.h / lexer.c                           |  |
|  |                                                                   |  |
|  |  typedef struct {                                                 |  |
|  |    TokenType type;    // LABEL, MNEMONIC, REGISTER, NUMBER, etc  |  |
|  |    char *text;                                                    |  |
|  |    int line_num;                                                  |  |
|  |  } Token;                                                         |  |
|  |                                                                   |  |
|  |  Token* tokenize(char *source)                                    |  |
|  |  - Strip comments (;)                                            |  |
|  |  - Identify tokens                                               |  |
|  |  - Handle # for immediates, R for registers                      |  |
|  +------------------------------------------------------------------+  |
|                              |                                         |
|                              v                                         |
|  +------------------------------------------------------------------+  |
|  |                       parser.h / parser.c                         |  |
|  |                                                                   |  |
|  |  typedef struct {                                                 |  |
|  |    char *label;       // NULL if no label on this line           |  |
|  |    char *mnemonic;    // "ADD", "LW", etc.                        |  |
|  |    Operand operands[3];                                           |  |
|  |    int num_operands;                                              |  |
|  |  } ParsedLine;                                                    |  |
|  |                                                                   |  |
|  |  ParsedLine* parse(Token *tokens)                                 |  |
|  +------------------------------------------------------------------+  |
|                              |                                         |
|                              v                                         |
|  +------------------------------------------------------------------+  |
|  |                     assembler.h / assembler.c                     |  |
|  |                                                                   |  |
|  |  PASS 1: Collect labels and their addresses                       |  |
|  |    - For each instruction, add 4 to address counter              |  |
|  |    - When label found, record (label_name, current_address)      |  |
|  |                                                                   |  |
|  |  PASS 2: Generate machine code                                    |  |
|  |    - For each instruction:                                       |  |
|  |      - Look up opcode                                            |  |
|  |      - Encode registers                                          |  |
|  |      - Resolve label references to offsets                       |  |
|  |      - Emit 32-bit word                                          |  |
|  +------------------------------------------------------------------+  |
|                              |                                         |
|                              v                                         |
|  +------------------------------------------------------------------+  |
|  |                     encode.h / encode.c                           |  |
|  |                                                                   |  |
|  |  uint32_t encode_r_type(opcode, rd, rs1, rs2)                     |  |
|  |  uint32_t encode_i_type(opcode, rd, rs, imm)                      |  |
|  |  uint32_t encode_b_type(opcode, offset)                           |  |
|  |  uint32_t encode_j_type(opcode, rd, offset)                       |  |
|  +------------------------------------------------------------------+  |
|                                                                        |
+-----------------------------------------------------------------------+

7. Implementation Guide

7.1 Phase 1: Basic Emulator (Days 1-3)

Goal: Execute hardcoded machine code with arithmetic operations.

Step 1: CPU State Structure

Think about what state you need to track:

Questions to answer:
- How do you represent 8 registers?
- How do you represent memory (array of bytes? words?)
- How do you store the PC?
- How do you store flags (separate bools? single byte? bitfield?)

Step 2: Instruction Decode

Write a function that extracts fields from a 32-bit instruction:

For instruction 0x04A20000 (ADD R4, R1, R2):

Binary: 0000 0100 1010 0010 0000 0000 0000 0000
        ---- --++ +--- ---- ---- ---- ---- ----
        opcode rd  rs1 rs2

How do you extract:
- opcode (bits 31-26)?  Use >> and &
- rd (bits 25-23)?
- rs1 (bits 22-20)?
- rs2 (bits 19-17)?

Step 3: Execute Instructions

Start with just a few instructions:

NOP (do nothing)
HALT (set halted flag)
ADD (rd = rs1 + rs2)
ADDI (rd = rs + immediate)

Write a switch statement that handles each opcode.

Step 4: Main Loop

Pseudocode:
  while not halted:
      fetch instruction from memory[PC]
      decode instruction
      execute instruction
      PC += 4
      regs[0] = 0  // enforce zero register

7.2 Phase 2: Complete ALU (Days 4-5)

Add remaining arithmetic/logic operations:

SUB, AND, OR, XOR
SLL, SRL (shifts)
ANDI, ORI, LUI

Key insight: Immediate operations are almost identical to register operations, just using imm instead of regs[rs2].

7.3 Phase 3: Condition Codes (Days 6-7)

Implement the FLAGS register:

After arithmetic operation (ADD, SUB, CMP):
  - Z (Zero): result == 0
  - N (Negative): result < 0 (MSB is 1)
  - C (Carry): unsigned overflow occurred
  - V (Overflow): signed overflow occurred

Carry detection (for ADD):
  uint32_t a = regs[rs1];
  uint32_t b = regs[rs2];
  uint32_t result = a + b;
  flags.C = (result < a);  // wrapped around

Overflow detection (for ADD):
  int32_t a = (int32_t)regs[rs1];
  int32_t b = (int32_t)regs[rs2];
  int32_t result = a + b;
  // Overflow if: both operands same sign, result different sign
  flags.V = ((a > 0) && (b > 0) && (result < 0)) ||
            ((a < 0) && (b < 0) && (result >= 0));

Implement CMP and CMPI (same as SUB but discard result).

7.4 Phase 4: Branches and Jumps (Days 8-9)

Implement conditional branches:

BEQ: if (flags.Z) PC += offset
BNE: if (!flags.Z) PC += offset
BLT: if (flags.N != flags.V) PC += offset
BGE: if (flags.N == flags.V) PC += offset

Important: Offset is relative to current PC. If you already incremented PC, adjust accordingly.

Example:
  PC = 0x000C
  BEQ -8        ; branch back 8 bytes

  If Z=1: PC = 0x000C + (-8) = 0x0004

7.5 Phase 5: Memory Operations (Days 10-11)

Implement LW and SW:

LW Rd, offset(Rs):
  address = regs[rs] + sign_extend(offset)
  regs[rd] = memory[address]  // read 4 bytes

SW Rs, offset(Rb):
  address = regs[rb] + sign_extend(offset)
  memory[address] = regs[rs]  // write 4 bytes

Handle endianness: Most emulators use little-endian (like x86).

Reading 4 bytes from address A:
  byte0 = memory[A]
  byte1 = memory[A+1]
  byte2 = memory[A+2]
  byte3 = memory[A+3]
  word = byte0 | (byte1 << 8) | (byte2 << 16) | (byte3 << 24)

7.6 Phase 6: Basic Assembler (Days 12-14)

Two-Pass Assembly

Pass 1: Build symbol table

Scan through source
Track current address (starts at 0)
When you see a label, record (name, address)
Each instruction advances address by 4

Pass 2: Generate code

For each instruction:
- Look up opcode
- Encode registers
- For labels: calculate offset = label_address - current_address
- Emit 32-bit word

Parsing Strategy

Simple approach for each line:

Strip comments (everything after ;)
Check for label (ends with :)
Extract mnemonic and operands
Parse register (R0-R7) or immediate (#number or label)

7.7 Phase 7: Testing and Polish (Days 15-18)

Test with handwritten assembly programs
Add –trace mode (show each instruction)
Add disassembly output
Handle errors gracefully

8. The Assembler

8.1 Assembly Language Syntax

Define your syntax clearly:

; Comments start with semicolon
; Labels end with colon
; Registers are R0-R7
; Immediates use # prefix
; Memory uses offset(register) syntax

label:
        ADD  R1, R2, R3      ; R1 = R2 + R3
        ADDI R1, R0, #100    ; R1 = 100
        LW   R2, 8(R1)       ; R2 = MEM[R1 + 8]
        BNE  label           ; branch if not equal

8.2 Pseudo-Instructions

Assemblers typically support pseudo-instructions that expand to real instructions:

+-----------------------------------------------------------------------+
|                      PSEUDO-INSTRUCTIONS                               |
+-----------------------------------------------------------------------+
|                                                                        |
|  MOV Rd, Rs        => ADD Rd, R0, Rs                                   |
|  MOV Rd, #imm      => ADDI Rd, R0, #imm                                |
|  NOP               => ADD R0, R0, R0                                   |
|  PUSH Rs           => ADDI R7, R7, #-4                                 |
|                       SW Rs, 0(R7)                                     |
|  POP Rd            => LW Rd, 0(R7)                                     |
|                       ADDI R7, R7, #4                                  |
|  CALL label        => JAL R6, label                                    |
|  RET               => JALR R0, R6, #0                                  |
|                                                                        |
+-----------------------------------------------------------------------+

8.3 Label Resolution

Labels create forward references that require two-pass assembly:

        BEQ skip       ; Forward reference - 'skip' not yet defined
        ADD R1, R2, R3
skip:                  ; Now we know skip is at address 0x0008
        HALT

Pass 1 Algorithm:

address = 0
for each line:
    if line has label:
        symbol_table[label_name] = address
    if line has instruction:
        address += 4

Pass 2 Algorithm:

address = 0
for each instruction:
    if operand is label:
        target = symbol_table[label]
        offset = target - address
    encode instruction with offset
    emit to output
    address += 4

9. Testing Strategy

9.1 Unit Tests for Individual Instructions

Test each instruction in isolation:

; test_add.asm
        ADDI R1, R0, #5      ; R1 = 5
        ADDI R2, R0, #3      ; R2 = 3
        ADD  R3, R1, R2      ; R3 = 8
        HALT
; Expected: R1=5, R2=3, R3=8

9.2 Test Programs

; test_loop.asm - Count from 0 to 10
        ADDI R1, R0, #0      ; counter = 0
loop:
        ADDI R1, R1, #1      ; counter++
        CMPI R1, #10
        BNE  loop            ; loop until counter == 10
        HALT
; Expected: R1=10

; test_memory.asm - Store and load
        ADDI R1, R0, #42     ; value to store
        ADDI R2, R0, #256    ; address = 256
        SW   R1, 0(R2)       ; MEM[256] = 42
        LW   R3, 0(R2)       ; R3 = MEM[256]
        HALT
; Expected: R1=42, R3=42

; test_function.asm - Simple function call
        ADDI R7, R0, #1024   ; Initialize stack pointer
        ADDI R1, R0, #5      ; argument
        JAL  R6, double      ; call double(5)
        HALT                 ; R2 should be 10

double:
        ADD  R2, R1, R1      ; return value = arg * 2
        JALR R0, R6, #0      ; return

9.3 Validation Checklist

[ ] All R-type instructions work correctly
[ ] All I-type instructions work correctly
[ ] Immediate sign extension is correct
[ ] Flags are set correctly after arithmetic
[ ] All branch conditions work
[ ] Memory loads/stores work
[ ] Little-endian byte order is correct
[ ] Labels resolve to correct addresses
[ ] Negative offsets work for backward branches
[ ] R0 is always zero (even after writes)

10. Common Pitfalls

10.1 Sign Extension Errors

The 20-bit immediate is signed and must be sign-extended to 32 bits:

// WRONG: zero-extension
int32_t imm = instruction & 0xFFFFF;  // Always positive!

// RIGHT: sign-extension
int32_t imm = instruction & 0xFFFFF;
if (imm & 0x80000) {  // If bit 19 is set (negative)
    imm |= 0xFFF00000;  // Extend the sign
}

// Or using arithmetic shift:
int32_t imm = ((int32_t)(instruction << 12)) >> 12;

10.2 PC-Relative Offset Calculation

When the CPU fetches an instruction, PC points to that instruction. But by the time you execute a branch, you may have already incremented PC.

Two approaches:

1. Increment PC before execute:
   PC += 4;
   if (branch_taken) PC += offset;  // offset is from NEW PC

2. Increment PC after execute:
   if (branch_taken) PC = PC + offset;  // offset from current instruction
   else PC += 4;

Be consistent in your assembler and emulator!

10.3 Memory Alignment

If you require word-aligned access:

if (address & 0x3) {  // Not divisible by 4
    // Error: unaligned access
    cpu->status = STATUS_ALIGNMENT_ERROR;
    cpu->halted = true;
    return;
}

10.4 Zero Register Enforcement

R0 must always be zero:

// After EVERY instruction that writes to a register:
cpu->regs[0] = 0;

// Or, before reading:
uint32_t read_reg(CPU *cpu, int r) {
    return (r == 0) ? 0 : cpu->regs[r];
}

10.5 Assembler Label Scoping

Labels should be case-sensitive and can contain underscores:

loop:        ; valid
LOOP:        ; different label!
my_label:    ; valid
123label:    ; invalid - can't start with digit

11. Extensions

After completing the basic project, consider these extensions:

11.1 Add More Instructions

- MUL Rd, Rs1, Rs2      ; Multiplication
- DIV Rd, Rs1, Rs2      ; Division (integer)
- REM Rd, Rs1, Rs2      ; Remainder (modulo)
- SRA Rd, Rs1, Rs2      ; Arithmetic right shift
- NOT Rd, Rs            ; Bitwise NOT
- NEG Rd, Rs            ; Two's complement negate

11.2 Implement a Stack

Add PUSH and POP as real instructions (not just pseudo-instructions):

PUSH Rs:
  R7 = R7 - 4
  MEM[R7] = Rs

POP Rd:
  Rd = MEM[R7]
  R7 = R7 + 4

11.3 Add System Calls

Implement a simple syscall mechanism:

ADDI R1, R0, #1      ; syscall number: print_int
ADDI R2, R0, #42     ; argument: value to print
SYSCALL              ; execute syscall

Syscalls could include:

Print integer
Print character
Read input
Random number

11.4 Implement Interrupts

Add a timer interrupt:

Timer counts down each cycle
When timer reaches zero, save PC and jump to interrupt handler
RETI instruction returns from interrupt

11.5 Build a Pipeline Simulator

Extend your emulator to simulate pipelining:

Show what’s in each stage (F/D/E/M/W) each cycle
Detect and display hazards
Implement forwarding
Calculate CPI

11.6 Write a C Compiler Backend

Write a simple compiler that emits your assembly:

Input: Simple subset of C (arithmetic, if/while, functions)
Output: Your assembly language
This is a significant project but teaches you how compilers work!

12. Resources

12.1 Primary Books

Book	Author	Relevant Chapters
Computer Organization and Design (RISC-V Ed.)	Patterson & Hennessy	Ch 2 (Instructions), Ch 4 (Processor)
Computer Systems: A Programmer’s Perspective	Bryant & O’Hallaron	Ch 3 (Machine-Level Representation)
Language Implementation Patterns	Terence Parr	Ch 5 (Assemblers)
Digital Design and Computer Architecture	Harris & Harris	Ch 6-7 (Architecture, Microarchitecture)

12.2 Online Resources

RISC-V Specification: Official ISA reference (model for your design)
MIPS Green Sheet: Classic RISC ISA reference card
Ben Eater’s Videos: 8-bit computer on breadboards
Nand2Tetris: Build a CPU from scratch (different approach)

12.3 Reference Implementations

Study these open-source emulators:

rv32emu: Simple RISC-V emulator in C
chip8-c: CHIP-8 emulator (simpler starting point)
unicorn: Multi-architecture CPU emulator

12.4 Debugging Tips

Add verbose trace mode: Show every register after every instruction
Compare with hand calculation: Work through a program on paper first
Test in isolation: Verify each instruction type before combining
Use hex consistently: Easier to match against binary encoding
Write a disassembler first: Helps verify your encoding is correct

Self-Assessment Checklist

Before moving to the next project, verify:

Understanding

[ ] I can explain why RISC uses fixed-width instructions
[ ] I understand the trade-off between register count and immediate size
[ ] I can explain why load/store architecture simplifies hardware
[ ] I understand how condition codes enable conditional execution
[ ] I can trace the fetch-decode-execute cycle for any instruction
[ ] I can manually encode/decode instructions in my ISA

Implementation

[ ] My emulator runs simple programs correctly
[ ] All arithmetic instructions work, including with negative numbers
[ ] Branches work correctly in both directions
[ ] Memory operations handle all addresses correctly
[ ] My assembler produces correct machine code
[ ] Labels and forward references are resolved correctly

Portfolio

[ ] I have a working emulator with --trace mode
[ ] I have a working assembler with error messages
[ ] I have at least 5 test programs that demonstrate features
[ ] I can explain every design decision I made
[ ] I could extend this to add new instructions

You have designed your own instruction set architecture. You understand why real ISAs make the choices they do. Now go run Fibonacci on YOUR CPU.