Project 2: Simple RISC CPU Emulator (RV32I)
Build an emulator for a subset of RISC-V (RV32I) that can run simple compiled programs, with a debugger interface showing register state and memory.
Quick Reference
| Attribute | Value |
|---|---|
| Language | C (alt: Rust, C++, Zig) |
| Difficulty | Intermediate |
| Time | 2-3 weeks |
| Knowledge Area | CPU Architecture / ISA Design |
| Hardware Needed | None (pure software emulation) |
| Prerequisites | C programming, basic assembly concepts, hex/binary |
| Key Book | “Computer Organization and Design RISC-V Edition” by Patterson & Hennessy |
Learning Objectives
By completing this project, you will:
- Understand how CPUs execute instructions: The fetch-decode-execute cycle becomes concrete when you implement it
- Master RISC-V instruction encoding: Decode R-type, I-type, S-type, B-type, U-type, and J-type instruction formats
- Implement a register file: Understand why x0 is hardwired to zero and how the 32 general-purpose registers work
- Build a memory subsystem: Implement byte, halfword, and word access with proper alignment
- Handle control flow: Implement branches, jumps, and function calls (JAL/JALR)
- Create a debugger interface: Single-step execution, register inspection, memory dumps
- Load and run ELF binaries: Parse the executable format used by GCC-compiled RISC-V programs
- Implement memory-mapped I/O: Understand how CPUs communicate with devices
The Core Question You’re Answering
“How does a CPU actually execute instructions, and what does it mean to ‘emulate’ one?”
This project answers the fundamental question of how code becomes execution. When you write a = b + c, what actually happens in hardware? Your emulator will model every step:
- Fetching the instruction bytes from memory
- Decoding which operation to perform
- Reading operand values from registers
- Performing the computation
- Writing results back
This is exactly what QEMU does with its Tiny Code Generator (TCG) for cross-architecture emulation, and what hypervisors must understand to virtualize CPUs.
Why This Project Teaches Virtualization
RISC-V is a modern, open instruction set used in production systems. Understanding CPU emulation is foundational to hypervisor development because:
- Software virtualization (like QEMU without KVM) works by emulating guest CPU instructions
- Binary translation (QEMU’s TCG) requires understanding instruction formats to translate them
- Hardware virtualization (VT-x) still needs emulation for unsupported instructions
- Debug/trace tools require instruction decoding
- Security analysis tools emulate code to detect malware behavior
What QEMU Does (Simplified)
Guest Program (ARM) Host CPU (x86)
┌─────────────────────┐ ┌─────────────────────┐
│ add r0, r1, r2 │ │ │
│ ldr r3, [r0] │ ──QEMU TCG──► │ mov eax, ebx │
│ bne label │ (Binary │ add eax, ecx │
└─────────────────────┘ Translation) │ mov edx, [eax] │
└─────────────────────┘
Your Project: Similar but simpler
RISC-V Program Your Emulator (runs on any CPU)
┌─────────────────────┐ ┌─────────────────────┐
│ add x10, x11, x12 │ │ cpu.x[10] = │
│ lw x13, 0(x10) │ ──Emulator──► │ cpu.x[11] + │
│ bne x5, x6, label │ │ cpu.x[12] │
└─────────────────────┘ └─────────────────────┘
Concepts You Must Understand First
Before starting this project, ensure you understand these concepts:
| Concept | Why It Matters | Where to Learn |
|---|---|---|
| Binary and hexadecimal | Instructions are encoded as bit patterns | CS:APP Chapter 2 |
| Basic assembly concepts (registers, memory, jumps) | You’re implementing what assembly describes | Patterson & Hennessy Ch. 2 |
| Two’s complement signed integers | RISC-V uses signed arithmetic | CS:APP 2.3 |
| Bitwise operations in C («, », &, |, ^) | Instruction decoding is all bit manipulation | Any C reference |
| Function call conventions (stack, return address) | Understanding JAL/JALR and the ABI | Patterson & Hennessy 2.8 |
| ELF file format basics | Loading compiled programs | “Practical Binary Analysis” Ch. 2 |
Self-Assessment Questions
Before proceeding, verify you can answer:
- Binary encoding: What is
0xDEADBEEFin binary? How many bits is that? - Sign extension: If you have an 8-bit signed value
0xFF, what is it when sign-extended to 32 bits? - Bit extraction: How would you extract bits 12-14 from a 32-bit value in C?
- Two’s complement: What is -5 represented as a 32-bit two’s complement integer?
- Memory addressing: If a word is at address 0x1000, what are the addresses of its 4 bytes?
Theoretical Foundation
RISC-V Architecture Overview
RISC-V (pronounced “risk-five”) is a free, open Instruction Set Architecture (ISA) designed at UC Berkeley in 2010. Unlike proprietary ISAs (x86, ARM), anyone can implement RISC-V without licensing fees.
Key RISC-V Characteristics:
RISC-V Design Philosophy
┌─────────────────────────────────────────────────────────────┐
│ │
│ SIMPLICITY MODULARITY EXTENSIBILITY │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Fixed 32-bit │ │ RV32I = Base │ │ Custom │ │
│ │ instructions │ │ RV32M = Mul │ │ instructions │ │
│ │ (mostly) │ │ RV32F = Float│ │ allowed │ │
│ │ │ │ RV32A = Atom │ │ │ │
│ │ Load/Store │ │ ... │ │ Reserved │ │
│ │ architecture │ │ │ │ opcode space │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
RV32I - The Base Integer Instruction Set:
RV32I is the minimal RISC-V implementation. It has:
- 32 general-purpose 32-bit registers (x0-x31)
- Program counter (PC)
- ~47 instructions (arithmetic, logic, memory, control flow)
The Register File
RISC-V has 32 registers, each 32 bits wide (in RV32):
RISC-V Register File (RV32I)
Register │ ABI Name │ Description │ Saved By
──────────┼──────────┼──────────────────────────────┼──────────
x0 │ zero │ Hardwired to 0 │ N/A
x1 │ ra │ Return address │ Caller
x2 │ sp │ Stack pointer │ Callee
x3 │ gp │ Global pointer │ N/A
x4 │ tp │ Thread pointer │ N/A
x5-x7 │ t0-t2 │ Temporaries │ Caller
x8 │ s0/fp │ Saved register / Frame ptr │ Callee
x9 │ s1 │ Saved register │ Callee
x10-x11 │ a0-a1 │ Function args / Return vals │ Caller
x12-x17 │ a2-a7 │ Function arguments │ Caller
x18-x27 │ s2-s11 │ Saved registers │ Callee
x28-x31 │ t3-t6 │ Temporaries │ Caller
Critical Rule: x0 is always zero. Writing to x0 has no effect. This simplifies instruction design (no need for special “load immediate 0” instruction).
Instruction Formats
RISC-V uses 6 instruction formats. All instructions are 32 bits, with the opcode in the same position:
RISC-V Instruction Formats (32 bits each)
R-type (Register-Register operations):
┌─────────┬───────┬───────┬────────┬───────┬─────────┐
│ funct7 │ rs2 │ rs1 │ funct3 │ rd │ opcode │
│ [31:25]│[24:20]│[19:15]│ [14:12]│[11:7] │ [6:0] │
│ 7 bits │ 5 bits│ 5 bits│ 3 bits │5 bits │ 7 bits │
└─────────┴───────┴───────┴────────┴───────┴─────────┘
Example: ADD x10, x11, x12 → rd=x10, rs1=x11, rs2=x12
I-type (Immediate operations):
┌───────────────────┬───────┬────────┬───────┬─────────┐
│ imm[11:0] │ rs1 │ funct3 │ rd │ opcode │
│ [31:20] │[19:15]│ [14:12]│[11:7] │ [6:0] │
│ 12 bits │ 5 bits│ 3 bits │5 bits │ 7 bits │
└───────────────────┴───────┴────────┴───────┴─────────┘
Example: ADDI x10, x11, 5 → rd=x10, rs1=x11, imm=5
S-type (Store operations):
┌─────────┬───────┬───────┬────────┬─────────┬─────────┐
│imm[11:5]│ rs2 │ rs1 │ funct3 │imm[4:0] │ opcode │
│ [31:25] │[24:20]│[19:15]│ [14:12]│ [11:7] │ [6:0] │
│ 7 bits │ 5 bits│ 5 bits│ 3 bits │ 5 bits │ 7 bits │
└─────────┴───────┴───────┴────────┴─────────┴─────────┘
Example: SW x12, 8(x11) → rs1=x11, rs2=x12, imm=8
B-type (Branch operations):
┌──────┬────────┬───────┬───────┬────────┬────────┬─────────┐
│[12] │[10:5] │ rs2 │ rs1 │ funct3 │[4:1][11]│ opcode │
│bit 31│[30:25] │[24:20]│[19:15]│ [14:12]│ [11:7] │ [6:0] │
└──────┴────────┴───────┴───────┴────────┴────────┴─────────┘
Example: BEQ x5, x6, label → rs1=x5, rs2=x6, imm=offset
U-type (Upper immediate):
┌───────────────────────────────┬───────┬─────────┐
│ imm[31:12] │ rd │ opcode │
│ [31:12] │[11:7] │ [6:0] │
│ 20 bits │5 bits │ 7 bits │
└───────────────────────────────┴───────┴─────────┘
Example: LUI x10, 0x12345 → rd=x10, imm=0x12345000
J-type (Jump operations):
┌──────┬──────────┬──────┬────────────┬───────┬─────────┐
│[20] │ [10:1] │ [11] │ [19:12] │ rd │ opcode │
│bit 31│ [30:21] │bit 20│ [19:12] │[11:7] │ [6:0] │
└──────┴──────────┴──────┴────────────┴───────┴─────────┘
Example: JAL x1, label → rd=x1 (saves return addr)
The Fetch-Decode-Execute Cycle
Every CPU (real or emulated) follows this cycle:
The CPU Cycle
┌─────────────────────────────────────┐
│ │
│ ┌───────────┐ │
│ │ FETCH │ │
│ │ │ │
│ │ Read 4 │ │
│ │ bytes at │ │
┌────────┴────────┐ │ PC │ │
│ UPDATE PC │ └─────┬─────┘ │
│ │ │ │
│ PC = PC + 4 │ │ │
│ (or branch │ ▼ │
│ target) │ ┌───────────┐ │
└────────┬────────┘ │ DECODE │ │
│ │ │ │
│ │ Extract │ │
│ │ opcode, │ │
│ │ registers,│ │
│ │ immediates│ │
│ └─────┬─────┘ │
│ │ │
│ ▼ │
│ ┌───────────┐ │
└──────────────│ EXECUTE │───────────┘
│ │
│ Perform │
│ operation,│
│ write │
│ result │
└───────────┘
Memory Model
RISC-V uses a load/store architecture: arithmetic operations only work on registers. To operate on memory, you must:
- Load from memory to register
- Perform operation
- Store from register to memory
Memory Access Types
Address │ LB/SB │ LH/SH │ LW/SW │
│(byte) │(half) │(word) │
───────────┼───────┼───────┼───────┤
0x1000 │ [X] │ [X] │ [X] │ ← Word-aligned
0x1001 │ [X] │ │ │
0x1002 │ [X] │ [X] │ │ ← Halfword-aligned
0x1003 │ [X] │ │ │
0x1004 │ [X] │ [X] │ [X] │ ← Word-aligned
LB = Load Byte (8 bits, sign-extended to 32)
LBU = Load Byte Unsigned (8 bits, zero-extended)
LH = Load Halfword (16 bits, sign-extended)
LHU = Load Halfword Unsigned
LW = Load Word (32 bits)
SB = Store Byte
SH = Store Halfword
SW = Store Word
Memory-Mapped I/O
In RISC-V (and most architectures), devices communicate via memory-mapped I/O. Specific memory addresses are “magic” - reading or writing them interacts with hardware:
Example Memory Map
0x00000000 ┌─────────────────────────────┐
│ │
│ RAM │
│ (Program + Data) │
│ │
0x0FFFFFFF └─────────────────────────────┘
0x10000000 ┌─────────────────────────────┐
│ UART (Serial Port) │
│ 0x10000000 = TX Data │
│ 0x10000005 = Status │
0x10000FFF └─────────────────────────────┘
0x20000000 ┌─────────────────────────────┐
│ Timer │
│ 0x20000000 = Counter │
0x20000FFF └─────────────────────────────┘
Writing to 0x10000000 sends a character to console
Reading 0x10000005 tells you if UART is ready
Project Specification
What You Will Build
A functional RISC-V (RV32I) emulator that can:
- Load ELF binaries compiled with
riscv64-unknown-elf-gcc -march=rv32i - Execute the RV32I base integer instruction set (~40 instructions)
- Provide a debugger interface for inspection and single-stepping
- Support memory-mapped I/O for console output
Functional Requirements
Core Emulation:
- Implement all 47 RV32I instructions
- Model 32 general-purpose registers (x0-x31) with x0 hardwired to 0
- Implement program counter (PC) with proper increment
- Support at least 64KB of RAM
Memory System:
- Byte, halfword, and word loads (LB, LBU, LH, LHU, LW)
- Byte, halfword, and word stores (SB, SH, SW)
- Memory-mapped UART at 0x10000000 for character output
ELF Loading:
- Parse ELF32 header and program headers
- Load segments into emulated memory
- Set initial PC to entry point
Debugger Interface:
run- Execute until breakpoint or exitstep- Execute single instructionregs- Display all register valuesmem <addr> <count>- Dump memorybreak <addr>- Set breakpointdisasm <addr> <count>- Disassemble instructions
Non-Functional Requirements
- Must correctly execute GCC-compiled RV32I programs
- Should run at least 1 million instructions per second on modern hardware
- Clean, readable code with instruction decoding clearly separated
- Error messages for invalid instructions or memory access
Real World Outcome
When complete, you’ll be able to run RISC-V programs:
# Compile a simple C program for RV32I
$ cat hello.c
#include <stdint.h>
// Memory-mapped UART
#define UART_TX (*(volatile char *)0x10000000)
void print(const char *s) {
while (*s) {
UART_TX = *s++;
}
}
int main() {
print("Hello from RISC-V!\n");
return 0;
}
$ riscv64-unknown-elf-gcc -march=rv32i -mabi=ilp32 -nostdlib \
-Ttext=0x80000000 -o hello hello.c start.S
# Run in your emulator
$ ./rv32emu hello
Hello from RISC-V!
# Debug mode
$ ./rv32emu --debug hello
RV32I Emulator - Debug Mode
Loaded: hello (entry: 0x80000000)
(rv32emu) regs
x0/zero = 0x00000000 x16/a6 = 0x00000000
x1/ra = 0x00000000 x17/a7 = 0x00000000
x2/sp = 0x7FFFFFF0 x18/s2 = 0x00000000
x3/gp = 0x00000000 x19/s3 = 0x00000000
x4/tp = 0x00000000 x20/s4 = 0x00000000
x5/t0 = 0x00000000 x21/s5 = 0x00000000
x6/t1 = 0x00000000 x22/s6 = 0x00000000
x7/t2 = 0x00000000 x23/s7 = 0x00000000
x8/s0 = 0x00000000 x24/s8 = 0x00000000
x9/s1 = 0x00000000 x25/s9 = 0x00000000
x10/a0 = 0x00000000 x26/s10 = 0x00000000
x11/a1 = 0x00000000 x27/s11 = 0x00000000
x12/a2 = 0x00000000 x28/t3 = 0x00000000
x13/a3 = 0x00000000 x29/t4 = 0x00000000
x14/a4 = 0x00000000 x30/t5 = 0x00000000
x15/a5 = 0x00000000 x31/t6 = 0x00000000
PC = 0x80000000
(rv32emu) disasm 0x80000000 5
0x80000000: 00000513 addi x10, x0, 0 # li a0, 0
0x80000004: 00050593 addi x11, x10, 0 # mv a1, a0
0x80000008: 00000617 auipc x12, 0x0
0x8000000c: 01860613 addi x12, x12, 24
0x80000010: 00062683 lw x13, 0(x12)
(rv32emu) step
PC: 0x80000000 Instruction: 0x00000513 ADDI x10, x0, 0
→ x10 = 0 + 0 = 0
(rv32emu) step
PC: 0x80000004 Instruction: 0x00050593 ADDI x11, x10, 0
→ x11 = 0 + 0 = 0
(rv32emu) break 0x80000020
Breakpoint set at 0x80000020
(rv32emu) run
Running...
Breakpoint hit at 0x80000020
(rv32emu) mem 0x10000000 16
0x10000000: 48 65 6c 6c 6f 20 66 72 Hello fr
0x10000008: 6f 6d 20 52 49 53 43 2d om RISC-
(rv32emu) continue
Hello from RISC-V!
Program exited with code 0
Questions to Guide Your Design
Work through these questions BEFORE writing code:
Architecture Questions
-
How will you represent CPU state? A struct with registers, PC, and memory pointer?
-
How will you organize instruction decoding? Giant switch statement? Function pointer table? Separate decode and execute phases?
-
How will you handle sign extension? C doesn’t have built-in sign extension - will you use arithmetic shift or explicit bit manipulation?
-
How will you handle the x0 register? Writes to x0 should be ignored. Do you check on write, or always read 0?
Memory Questions
-
How will you represent memory? Simple array? Sparse mapping for memory-mapped I/O?
-
How will you handle endianness? RISC-V is little-endian - does your host match?
-
How will you detect memory-mapped I/O accesses? Check address ranges on every access?
Implementation Questions
-
How will you load ELF files? Use a library, or parse headers manually?
-
How will you implement the debugger? Simple command parser? Integration with readline?
-
How will you test your emulator? Hand-written assembly? GCC-compiled programs? Test suite?
Thinking Exercise
Before writing any code, work through this instruction by hand:
Given: Instruction bytes at address 0x80000010 = 0x00C58633
Step 1: Convert to binary
0x00C58633 = 0000 0000 1100 0101 1000 0110 0011 0011
Step 2: Extract fields (this is an R-type instruction)
- opcode [6:0] = 0110011 (0x33) → OP (register-register)
- rd [11:7] = 01100 (12) → x12
- funct3 [14:12] = 000 (0) → ADD/SUB
- rs1 [19:15] = 01011 (11) → x11
- rs2 [24:20] = 01100 (12) → x12
- funct7 [31:25] = 0000000 (0) → ADD (not SUB)
Step 3: Determine instruction
opcode=0x33, funct3=0x0, funct7=0x00 → ADD
Step 4: Write semantic
ADD x12, x11, x12 → x[12] = x[11] + x[12]
Now you try:
Decode these instructions (hint: use the RISC-V reference card):
0x00A00513- What instruction? What operands?0x00B50023- What instruction? What memory address?0x00050863- What instruction? What is the branch offset?0x800000EF- What instruction? Where does it jump?
(Answers provided in hints section)
Solution Architecture
High-Level Design
RV32I Emulator Architecture
┌─────────────────────────────────────────────────────────────────────┐
│ MAIN LOOP │
│ ┌───────────────────────────────────────────────────────────────┐ │
│ │ while (running) { │ │
│ │ instruction = fetch(cpu->pc); │ │
│ │ execute(cpu, instruction); │ │
│ │ cpu->pc += 4; // unless branch/jump │ │
│ │ } │ │
│ └───────────────────────────────────────────────────────────────┘ │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ CPU │ │ MEMORY │ │ DEBUGGER │ │
│ │ │ │ │ │ │ │
│ │ x[32] registers │ │ uint8_t *ram │ │ breakpoints[] │ │
│ │ uint32_t pc │ │ size_t size │ │ step mode │ │
│ │ │ │ │ │ watch points │ │
│ │ fetch() │ │ read_byte() │ │ disassemble() │ │
│ │ decode() │ │ read_half() │ │ dump_regs() │ │
│ │ execute() │ │ read_word() │ │ dump_mem() │ │
│ │ │ │ write_byte() │ │ │ │
│ │ │ │ write_half() │ │ │ │
│ │ │ │ write_word() │ │ │ │
│ └────────┬────────┘ └────────┬────────┘ └────────┬────────┘ │
│ │ │ │ │
│ └──────────────┬──────┴─────────────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────────────────────────────────────────────────┐ │
│ │ MEMORY-MAPPED I/O │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │
│ │ │ UART │ │ Timer │ │ CLINT │ │ ... │ │ │
│ │ │0x10000000│ │0x20000000│ │0x02000000│ │ │ │ │
│ │ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │ │
│ └───────────────────────────────────────────────────────────────┘ │
│ │
├─────────────────────────────────────────────────────────────────────┤
│ ELF LOADER │
│ ┌───────────────────────────────────────────────────────────────┐ │
│ │ parse_elf() → load segments → set entry point │ │
│ └───────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
Key Data Structures
// CPU state
typedef struct {
uint32_t x[32]; // General-purpose registers (x0 always 0)
uint32_t pc; // Program counter
} CPU;
// Memory subsystem
typedef struct {
uint8_t *ram; // Main memory
size_t ram_size; // Size of RAM
uint32_t ram_base; // Base address of RAM
} Memory;
// UART device (memory-mapped I/O)
typedef struct {
uint8_t tx_data; // Transmit data register
uint8_t status; // Status register (bit 0 = ready)
} UART;
// Complete emulator state
typedef struct {
CPU cpu;
Memory mem;
UART uart;
bool running;
bool debug_mode;
uint32_t breakpoints[MAX_BREAKPOINTS];
int num_breakpoints;
} Emulator;
// Decoded instruction (for debugging/display)
typedef struct {
uint32_t raw; // Original instruction bits
uint8_t opcode; // bits [6:0]
uint8_t rd; // bits [11:7]
uint8_t funct3; // bits [14:12]
uint8_t rs1; // bits [19:15]
uint8_t rs2; // bits [24:20]
uint8_t funct7; // bits [31:25]
int32_t imm; // Immediate value (sign-extended)
const char *mnemonic;
} Instruction;
Instruction Decoding Strategy
// Opcode definitions (bits [6:0])
#define OP_LUI 0x37 // Load Upper Immediate
#define OP_AUIPC 0x17 // Add Upper Immediate to PC
#define OP_JAL 0x6F // Jump and Link
#define OP_JALR 0x67 // Jump and Link Register
#define OP_BRANCH 0x63 // Conditional Branch
#define OP_LOAD 0x03 // Load from memory
#define OP_STORE 0x23 // Store to memory
#define OP_IMM 0x13 // Immediate arithmetic
#define OP_REG 0x33 // Register arithmetic
#define OP_FENCE 0x0F // Memory ordering
#define OP_SYSTEM 0x73 // System calls (ECALL, EBREAK)
// Main decode/execute function
void execute(Emulator *emu, uint32_t inst) {
uint32_t opcode = inst & 0x7F;
uint32_t rd = (inst >> 7) & 0x1F;
uint32_t funct3 = (inst >> 12) & 0x7;
uint32_t rs1 = (inst >> 15) & 0x1F;
uint32_t rs2 = (inst >> 20) & 0x1F;
uint32_t funct7 = (inst >> 25) & 0x7F;
switch (opcode) {
case OP_LUI:
execute_lui(emu, rd, inst);
break;
case OP_AUIPC:
execute_auipc(emu, rd, inst);
break;
case OP_JAL:
execute_jal(emu, rd, inst);
break;
// ... etc
}
}
Immediate Extraction
Different instruction formats have different immediate encodings:
// I-type immediate (bits [31:20], sign-extended)
int32_t imm_i(uint32_t inst) {
return (int32_t)inst >> 20; // Arithmetic shift preserves sign
}
// S-type immediate (bits [31:25] | bits [11:7])
int32_t imm_s(uint32_t inst) {
int32_t imm = ((inst >> 25) << 5) | ((inst >> 7) & 0x1F);
// Sign extend from bit 11
if (imm & 0x800) imm |= 0xFFFFF000;
return imm;
}
// B-type immediate (branch offset, bits scrambled)
int32_t imm_b(uint32_t inst) {
int32_t imm = 0;
imm |= ((inst >> 31) & 1) << 12; // bit 12
imm |= ((inst >> 7) & 1) << 11; // bit 11
imm |= ((inst >> 25) & 0x3F) << 5; // bits [10:5]
imm |= ((inst >> 8) & 0xF) << 1; // bits [4:1]
// Sign extend from bit 12
if (imm & 0x1000) imm |= 0xFFFFE000;
return imm;
}
// U-type immediate (bits [31:12], already shifted)
int32_t imm_u(uint32_t inst) {
return inst & 0xFFFFF000;
}
// J-type immediate (JAL offset, bits scrambled)
int32_t imm_j(uint32_t inst) {
int32_t imm = 0;
imm |= ((inst >> 31) & 1) << 20; // bit 20
imm |= ((inst >> 12) & 0xFF) << 12; // bits [19:12]
imm |= ((inst >> 20) & 1) << 11; // bit 11
imm |= ((inst >> 21) & 0x3FF) << 1; // bits [10:1]
// Sign extend from bit 20
if (imm & 0x100000) imm |= 0xFFE00000;
return imm;
}
Implementation Guide
Development Environment Setup
# Install RISC-V toolchain
# macOS (Homebrew)
brew tap riscv/riscv
brew install riscv-tools
# Ubuntu/Debian
sudo apt-get install gcc-riscv64-unknown-elf
# Fedora
sudo dnf install gcc-riscv64-linux-gnu
# Verify installation
riscv64-unknown-elf-gcc --version
# For debugging your emulator
# Install gdb for your host platform
Project Structure
rv32emu/
├── src/
│ ├── main.c # Entry point, argument parsing
│ ├── cpu.c # CPU state, fetch-decode-execute
│ ├── cpu.h
│ ├── memory.c # Memory subsystem
│ ├── memory.h
│ ├── decode.c # Instruction decoding
│ ├── decode.h
│ ├── execute.c # Instruction execution
│ ├── execute.h
│ ├── mmio.c # Memory-mapped I/O devices
│ ├── mmio.h
│ ├── elf_loader.c # ELF file parsing
│ ├── elf_loader.h
│ ├── debugger.c # Interactive debugger
│ └── debugger.h
├── tests/
│ ├── asm/ # Hand-written assembly tests
│ │ ├── add.S
│ │ ├── branch.S
│ │ └── ...
│ ├── c/ # C test programs
│ │ ├── hello.c
│ │ ├── fibonacci.c
│ │ └── ...
│ └── run_tests.sh
├── Makefile
└── README.md
Implementation Phases
Phase 1: Foundation (Days 1-3)
Goals:
- Set up project structure
- Implement CPU struct and basic memory
- Implement fetch cycle
Tasks:
- Create CPU struct with 32 registers and PC
- Implement memory read/write (byte, halfword, word)
- Implement basic main loop (fetch instruction at PC)
- Test with hardcoded instruction bytes
Checkpoint: Can fetch instructions from memory and print them.
// Minimal test
int main() {
Emulator emu;
init_emulator(&emu);
// Put ADDI x10, x0, 42 at address 0
// 0x02A00513 = addi x10, x0, 42
write_word(&emu.mem, 0, 0x02A00513);
uint32_t inst = fetch(&emu);
printf("Fetched: 0x%08X\n", inst);
// Should print: Fetched: 0x02A00513
return 0;
}
Phase 2: Basic Instructions (Days 4-7)
Goals:
- Implement instruction decoding
- Implement arithmetic instructions (ADD, SUB, AND, OR, XOR, etc.)
- Implement immediate instructions (ADDI, ANDI, ORI, etc.)
Tasks:
- Implement opcode extraction and switch statement
- Implement R-type instructions (register-register)
- Implement I-type immediate instructions
- Implement U-type (LUI, AUIPC)
- Test each instruction type
Checkpoint: Arithmetic sequence executes correctly.
// Test program (hand-assemble or use assembler)
// addi x10, x0, 5 # x10 = 5
// addi x11, x0, 3 # x11 = 3
// add x12, x10, x11 # x12 = 8
// After execution, x12 should contain 8
Phase 3: Control Flow (Days 8-10)
Goals:
- Implement branches (BEQ, BNE, BLT, BGE, BLTU, BGEU)
- Implement jumps (JAL, JALR)
- Handle PC updates correctly
Tasks:
- Implement B-type immediate extraction
- Implement conditional branches
- Implement JAL (save return address, jump)
- Implement JALR (indirect jump)
- Test loops and function calls
Checkpoint: Loop that counts to 10 executes correctly.
# Test: loop counting down from 10
addi x10, x0, 10 # x10 = 10
loop:
addi x10, x10, -1 # x10--
bne x10, x0, loop # if x10 != 0, goto loop
# x10 should be 0 here
Phase 4: Memory Operations (Days 11-14)
Goals:
- Implement load instructions (LB, LBU, LH, LHU, LW)
- Implement store instructions (SB, SH, SW)
- Handle alignment (at least report errors)
Tasks:
- Implement S-type immediate extraction
- Implement load instructions with sign/zero extension
- Implement store instructions
- Add alignment checks (optional: allow misaligned)
- Test memory access patterns
Checkpoint: Can load and store arrays.
# Test: store and load values
lui x10, 0x80000 # x10 = 0x80000000 (data area)
addi x11, x0, 0x42 # x11 = 0x42
sw x11, 0(x10) # store 0x42 at 0x80000000
lw x12, 0(x10) # load it back
# x12 should equal 0x42
Phase 5: ELF Loading (Days 15-17)
Goals:
- Parse ELF32 header
- Load program segments
- Set entry point
Tasks:
- Read and validate ELF header
- Find and read program headers
- Load PT_LOAD segments into memory
- Set initial PC to entry point
- Test with GCC-compiled programs
Checkpoint: Can load and run GCC-compiled “Hello World”.
// Minimal ELF header parsing
typedef struct {
uint8_t e_ident[16]; // Magic number
uint16_t e_type; // Object file type
uint16_t e_machine; // Machine type (0xF3 for RISC-V)
uint32_t e_version;
uint32_t e_entry; // Entry point
uint32_t e_phoff; // Program header offset
uint32_t e_shoff; // Section header offset
// ... more fields
} Elf32_Ehdr;
Phase 6: Memory-Mapped I/O (Days 18-19)
Goals:
- Implement UART for console output
- Route memory accesses through MMIO check
Tasks:
- Define UART register addresses
- Intercept writes to UART address
- Output character to host console
- Optionally implement UART status register
Checkpoint: “Hello World” prints to terminal.
// MMIO routing
uint32_t mem_read(Emulator *emu, uint32_t addr, int size) {
// Check for UART address range
if (addr >= UART_BASE && addr < UART_BASE + UART_SIZE) {
return uart_read(&emu->uart, addr - UART_BASE, size);
}
// Normal memory read
return ram_read(&emu->mem, addr, size);
}
void mem_write(Emulator *emu, uint32_t addr, uint32_t value, int size) {
if (addr >= UART_BASE && addr < UART_BASE + UART_SIZE) {
uart_write(&emu->uart, addr - UART_BASE, value, size);
return;
}
ram_write(&emu->mem, addr, value, size);
}
Phase 7: Debugger Interface (Days 20-21)
Goals:
- Implement interactive debugger
- Add disassembly
- Add breakpoints
Tasks:
- Implement command parser
- Implement register dump
- Implement memory dump
- Implement single-step mode
- Implement breakpoints
- Implement instruction disassembly
Checkpoint: Can single-step through program and inspect state.
Hints in Layers
Hint 1: Getting Started - CPU Structure
typedef struct {
uint32_t x[32];
uint32_t pc;
} CPU;
void cpu_init(CPU *cpu) {
memset(cpu->x, 0, sizeof(cpu->x));
cpu->pc = 0;
}
// Important: Always return 0 for x0
uint32_t cpu_read_reg(CPU *cpu, int reg) {
if (reg == 0) return 0;
return cpu->x[reg];
}
// Writes to x0 are discarded
void cpu_write_reg(CPU *cpu, int reg, uint32_t value) {
if (reg != 0) {
cpu->x[reg] = value;
}
}
Hint 2: Instruction Decoding Macros
// Field extraction macros
#define OPCODE(inst) ((inst) & 0x7F)
#define RD(inst) (((inst) >> 7) & 0x1F)
#define FUNCT3(inst) (((inst) >> 12) & 0x7)
#define RS1(inst) (((inst) >> 15) & 0x1F)
#define RS2(inst) (((inst) >> 20) & 0x1F)
#define FUNCT7(inst) (((inst) >> 25) & 0x7F)
// Example decode
uint32_t inst = 0x00C58633; // add x12, x11, x12
printf("opcode=%02X rd=%d funct3=%d rs1=%d rs2=%d funct7=%02X\n",
OPCODE(inst), RD(inst), FUNCT3(inst),
RS1(inst), RS2(inst), FUNCT7(inst));
// Output: opcode=33 rd=12 funct3=0 rs1=11 rs2=12 funct7=00
Hint 3: Implementing ADD/SUB
void execute_op(Emulator *emu, uint32_t inst) {
uint32_t rd = RD(inst);
uint32_t rs1 = RS1(inst);
uint32_t rs2 = RS2(inst);
uint32_t funct3 = FUNCT3(inst);
uint32_t funct7 = FUNCT7(inst);
uint32_t val1 = cpu_read_reg(&emu->cpu, rs1);
uint32_t val2 = cpu_read_reg(&emu->cpu, rs2);
uint32_t result;
switch (funct3) {
case 0x0: // ADD or SUB
if (funct7 == 0x00) {
result = val1 + val2; // ADD
} else if (funct7 == 0x20) {
result = val1 - val2; // SUB
}
break;
case 0x1: // SLL (Shift Left Logical)
result = val1 << (val2 & 0x1F);
break;
case 0x2: // SLT (Set Less Than)
result = ((int32_t)val1 < (int32_t)val2) ? 1 : 0;
break;
case 0x3: // SLTU (Set Less Than Unsigned)
result = (val1 < val2) ? 1 : 0;
break;
case 0x4: // XOR
result = val1 ^ val2;
break;
case 0x5: // SRL or SRA
if (funct7 == 0x00) {
result = val1 >> (val2 & 0x1F); // SRL
} else {
result = (int32_t)val1 >> (val2 & 0x1F); // SRA
}
break;
case 0x6: // OR
result = val1 | val2;
break;
case 0x7: // AND
result = val1 & val2;
break;
}
cpu_write_reg(&emu->cpu, rd, result);
}
Hint 4: Sign Extension
// Sign extend a value from 'bits' to 32 bits
int32_t sign_extend(uint32_t value, int bits) {
int32_t sign_bit = 1 << (bits - 1);
return (value ^ sign_bit) - sign_bit;
}
// Alternative using struct bit fields
union {
struct { int32_t val : 12; } s;
uint32_t u;
} conv;
conv.u = imm12;
int32_t extended = conv.s.val;
// Or using arithmetic shift (simplest)
int32_t imm_i(uint32_t inst) {
return (int32_t)inst >> 20; // Arithmetic right shift
}
Hint 5: Branch Instructions
void execute_branch(Emulator *emu, uint32_t inst) {
uint32_t rs1 = RS1(inst);
uint32_t rs2 = RS2(inst);
uint32_t funct3 = FUNCT3(inst);
int32_t offset = imm_b(inst);
uint32_t val1 = cpu_read_reg(&emu->cpu, rs1);
uint32_t val2 = cpu_read_reg(&emu->cpu, rs2);
bool take_branch = false;
switch (funct3) {
case 0x0: // BEQ
take_branch = (val1 == val2);
break;
case 0x1: // BNE
take_branch = (val1 != val2);
break;
case 0x4: // BLT
take_branch = ((int32_t)val1 < (int32_t)val2);
break;
case 0x5: // BGE
take_branch = ((int32_t)val1 >= (int32_t)val2);
break;
case 0x6: // BLTU
take_branch = (val1 < val2);
break;
case 0x7: // BGEU
take_branch = (val1 >= val2);
break;
}
if (take_branch) {
emu->cpu.pc += offset;
emu->cpu.pc -= 4; // Compensate for PC+=4 in main loop
}
}
Hint 6: Thinking Exercise Answers
0x00A00513:- opcode=0x13 (OP-IMM), rd=10, funct3=0, rs1=0, imm=10
- ADDI x10, x0, 10 (load immediate 10 into x10)
0x00B50023:- opcode=0x23 (STORE), funct3=0 (SB), rs1=10, rs2=11
-
imm = [31:25] [11:7] = 0 0 = 0 - SB x11, 0(x10) (store byte from x11 to address in x10)
0x00050863:- opcode=0x63 (BRANCH), funct3=0 (BEQ), rs1=10, rs2=0
- imm_b = 16 (offset of 16 bytes = 4 instructions)
- BEQ x10, x0, +16 (branch if x10 == 0)
0x800000EF:- opcode=0x6F (JAL), rd=1
- imm_j = -2048 (sign-extended)
- JAL x1, -2048 (call function 2KB backwards)
Testing Strategy
Unit Tests for Instructions
Create test programs for each instruction category:
# tests/asm/test_add.S
.globl _start
_start:
# Test ADD
li x10, 5 # x10 = 5
li x11, 3 # x11 = 3
add x12, x10, x11 # x12 should be 8
# Test with negative
li x13, -5 # x13 = -5
add x14, x10, x13 # x14 should be 0
# Signal done (write to UART)
li x15, 0x10000000
li x16, 'P' # 'P' for Pass
sb x16, 0(x15)
# Halt
ebreak
Integration Tests
#!/bin/bash
# tests/run_tests.sh
EMULATOR=../rv32emu
for test in asm/*.S; do
name=$(basename $test .S)
echo -n "Testing $name... "
# Assemble
riscv64-unknown-elf-gcc -march=rv32i -mabi=ilp32 -nostdlib \
-Ttext=0x80000000 -o bin/$name $test
# Run
output=$($EMULATOR bin/$name 2>&1)
if echo "$output" | grep -q "^P"; then
echo "PASS"
else
echo "FAIL"
echo "$output"
fi
done
Test Cases Table
| Test | What It Tests | Expected Result |
|---|---|---|
| test_add | ADD, ADDI | x12 = 8 |
| test_sub | SUB | x12 = 2 |
| test_logic | AND, OR, XOR | Various bit patterns |
| test_shift | SLL, SRL, SRA | Shift results |
| test_compare | SLT, SLTU | 0 or 1 |
| test_branch | BEQ, BNE, BLT | Correct jumps |
| test_jal | JAL, JALR | Return address saved |
| test_load | LB, LH, LW | Values from memory |
| test_store | SB, SH, SW | Values to memory |
| test_lui | LUI, AUIPC | Upper immediate handling |
Common Pitfalls & Debugging
Pitfall 1: Forgetting x0 is Always Zero
Symptom: Programs behave unexpectedly when using x0
Fix:
uint32_t cpu_read_reg(CPU *cpu, int reg) {
if (reg == 0) return 0; // x0 is hardwired to zero
return cpu->x[reg];
}
Pitfall 2: Sign Extension Errors
Symptom: Negative immediates become huge positive numbers
Fix:
// Wrong: zero extension
int32_t imm = (inst >> 20) & 0xFFF;
// Right: sign extension via arithmetic shift
int32_t imm = (int32_t)inst >> 20;
Pitfall 3: Branch Offset Calculation
Symptom: Branches go to wrong addresses
Fix:
// B-type immediate is PC-relative and in multiples of 2
// The offset is added to PC, not PC+4
if (take_branch) {
cpu->pc += offset; // offset includes the ×2 scaling
cpu->pc -= 4; // Compensate for main loop's PC+=4
}
Pitfall 4: Endianness
Symptom: Multi-byte values appear byte-swapped
Fix: RISC-V is little-endian. When loading a word:
uint32_t mem_read_word(Memory *mem, uint32_t addr) {
return mem->ram[addr] |
(mem->ram[addr + 1] << 8) |
(mem->ram[addr + 2] << 16) |
(mem->ram[addr + 3] << 24);
}
Pitfall 5: JALR Target Calculation
Symptom: JALR jumps to wrong address
Fix:
// JALR: target = (rs1 + imm) & ~1 (clear LSB)
void execute_jalr(Emulator *emu, uint32_t inst) {
uint32_t rd = RD(inst);
uint32_t rs1 = RS1(inst);
int32_t imm = imm_i(inst);
uint32_t return_addr = emu->cpu.pc + 4;
uint32_t target = (cpu_read_reg(&emu->cpu, rs1) + imm) & ~1;
cpu_write_reg(&emu->cpu, rd, return_addr);
emu->cpu.pc = target - 4; // Compensate for PC+=4
}
Pitfall 6: Shift Amount Masking
Symptom: Shifts by large amounts behave strangely
Fix:
// Only lower 5 bits of shift amount are used
uint32_t shamt = val2 & 0x1F; // For register shifts
uint32_t shamt = RS2(inst); // For immediate shifts
result = val1 << shamt;
Extensions & Challenges
Beginner Extensions
- Instruction counter: Track how many instructions executed
- Cycle counter: Estimate cycles (assume CPI of 1 for now)
- Memory usage tracking: Report peak memory usage
Intermediate Extensions
- RV32M support: Multiply and divide instructions
- Better disassembler: Show ABI register names, pretty print
- GDB stub: Implement GDB remote protocol for debugging
- Performance profiling: Hot instruction addresses
Advanced Extensions
- RV32F support: Floating-point instructions
- RV32A support: Atomic instructions
- Compressed instructions: 16-bit RV32C extension
- System mode: User/supervisor modes, CSRs, interrupts
- Basic block caching: Cache decoded instructions
The Interview Questions They’ll Ask
After completing this project, you’ll be ready for these questions:
CPU Architecture
- “Explain the fetch-decode-execute cycle”
- Fetch: Read instruction bytes from PC address
- Decode: Extract opcode, registers, immediates
- Execute: Perform operation, write result
- Update PC (sequential or branch target)
- “What is a load/store architecture?”
- Arithmetic only on registers, not memory
- Must load from memory to register first
- RISC-V, ARM, MIPS are load/store
- x86 allows memory operands directly
- “Why is x0 hardwired to zero in RISC-V?”
- Simplifies instruction encoding
- No need for “load immediate 0” instruction
- Useful for discarding results (write to x0)
- Common constant available without extra instruction
Virtualization/Emulation
- “What’s the difference between interpretation and binary translation?”
- Interpretation: Decode and execute each instruction
- Binary translation: Convert guest code to host code once
- Translation is faster but more complex
- QEMU TCG does binary translation
- “How does memory-mapped I/O work?”
- Certain addresses map to device registers
- Reading/writing triggers device operations
- No special I/O instructions needed
- Emulator must intercept these accesses
- “What are the challenges of CPU emulation?”
- Performance (1000x slower than native)
- Timing accuracy (cycle-accurate vs functional)
- Self-modifying code detection
- Privileged instruction handling
Debugging/Analysis
- “How would you implement single-stepping in an emulator?”
- After each instruction, check debug mode flag
- If set, pause and wait for debugger command
- Maintain breakpoint list, check PC against it
- Can implement conditional breakpoints
- “How do you decode a RISC-V instruction?”
- Extract opcode (bits 6-0) to determine format
- Extract fields based on format (rd, rs1, rs2, imm)
- Use opcode + funct3 + funct7 to identify instruction
- Sign-extend immediates as needed
Books That Will Help
| Topic | Book | Specific Chapter | Why It Helps |
|---|---|---|---|
| RISC-V Architecture | Computer Organization and Design RISC-V Edition | Chapters 2-4 | The authoritative RISC-V textbook |
| Instruction Encoding | RISC-V Specification | Chapter 2, 24 | Official instruction encoding reference |
| ELF Format | Practical Binary Analysis | Chapter 2 | How to parse and load executables |
| CPU Emulation | Programming from the Ground Up | Chapters 1-4 | CPU concepts from programmer’s perspective |
| Memory Systems | CS:APP | Chapter 6 | Memory hierarchy and access patterns |
| Bitwise Operations | C Programming Language (K&R) | Chapter 2 | The bit manipulation you’ll use constantly |
Online Resources
- RISC-V Reader - Free introduction to RISC-V
- RISC-V Specification - Official specs
- RISC-V Reference Card - Quick instruction reference
- Venus RISC-V Simulator - Web-based, good for testing
Self-Assessment Checklist
Before moving to Project 3, verify:
Understanding
- Can explain the fetch-decode-execute cycle without notes
- Can decode any RV32I instruction by hand
- Can explain why x0 is hardwired to zero
- Understand the difference between signed and unsigned comparisons
- Can explain how branch offsets are calculated
Implementation
- All 47 RV32I instructions implemented
- Register x0 always returns 0
- Sign extension works correctly for all immediate types
- Branches and jumps update PC correctly
- Memory load/store handles all sizes
Testing
- Hand-written assembly tests pass
- Simple GCC-compiled programs run
- “Hello World” prints correctly
- Debugger can single-step and inspect state
Growth
- Can write RISC-V assembly for simple algorithms
- Understand how C code compiles to RISC-V
- Can debug emulator issues using QEMU as reference
Real-World Connections
What You’re Learning Applies To:
QEMU Development
- QEMU’s TCG (Tiny Code Generator) does exactly this, but with JIT compilation
- Your decoder structure mirrors QEMU’s instruction handlers
- MMIO handling is how QEMU emulates all devices
Hypervisor Development
- KVM uses QEMU for device emulation
- Understanding instruction formats helps with hardware virtualization
- Debug/trace tools need instruction decoding
Security Research
- Malware analysis sandboxes use emulation
- Symbolic execution engines (angr, S2E) are built on emulators
- Fuzzing often uses emulation for coverage
Hardware Design
- Your emulator is a “golden model” for hardware verification
- Same decode logic appears in actual RISC-V processors
- SystemVerilog models work similarly
Submission / Completion Criteria
Minimum Viable Completion:
- All RV32I instructions implemented
- Simple programs execute correctly
- Memory reads and writes work
- Basic console output via MMIO
Full Completion:
- ELF loader works
- Debugger interface functional (step, regs, mem, break)
- Disassembler shows instruction mnemonics
- GCC-compiled programs run
Excellence (Going Above & Beyond):
- Instruction-level profiling
- GDB remote stub
- RV32M extension (multiply/divide)
- Cycle-accurate timing model
- Web-based interface
Next Steps
After completing this project, you’ll be ready for:
- Project 3: Basic Block Binary Translator - Instead of interpreting each instruction, translate basic blocks to host code
- Project 4: Simple JIT Compiler - Generate native code at runtime for faster execution
- Project 13: Mini-QEMU Clone - Combine CPU emulation with full device emulation
Your RISC-V emulator forms the foundation for understanding all CPU virtualization, whether software-based (QEMU) or hardware-assisted (KVM).
This guide was expanded from HYPERVISOR_VIRTUALIZATION_DEEP_DIVE_PROJECTS.md. For the complete learning path, see the project index.