Project 2: Simple RISC CPU Emulator (RV32I)

Build an emulator for a subset of RISC-V (RV32I) that can run simple compiled programs, with a debugger interface showing register state and memory.


Quick Reference

Attribute Value
Language C (alt: Rust, C++, Zig)
Difficulty Intermediate
Time 2-3 weeks
Knowledge Area CPU Architecture / ISA Design
Hardware Needed None (pure software emulation)
Prerequisites C programming, basic assembly concepts, hex/binary
Key Book “Computer Organization and Design RISC-V Edition” by Patterson & Hennessy

Learning Objectives

By completing this project, you will:

  1. Understand how CPUs execute instructions: The fetch-decode-execute cycle becomes concrete when you implement it
  2. Master RISC-V instruction encoding: Decode R-type, I-type, S-type, B-type, U-type, and J-type instruction formats
  3. Implement a register file: Understand why x0 is hardwired to zero and how the 32 general-purpose registers work
  4. Build a memory subsystem: Implement byte, halfword, and word access with proper alignment
  5. Handle control flow: Implement branches, jumps, and function calls (JAL/JALR)
  6. Create a debugger interface: Single-step execution, register inspection, memory dumps
  7. Load and run ELF binaries: Parse the executable format used by GCC-compiled RISC-V programs
  8. Implement memory-mapped I/O: Understand how CPUs communicate with devices

The Core Question You’re Answering

“How does a CPU actually execute instructions, and what does it mean to ‘emulate’ one?”

This project answers the fundamental question of how code becomes execution. When you write a = b + c, what actually happens in hardware? Your emulator will model every step:

  • Fetching the instruction bytes from memory
  • Decoding which operation to perform
  • Reading operand values from registers
  • Performing the computation
  • Writing results back

This is exactly what QEMU does with its Tiny Code Generator (TCG) for cross-architecture emulation, and what hypervisors must understand to virtualize CPUs.


Why This Project Teaches Virtualization

RISC-V is a modern, open instruction set used in production systems. Understanding CPU emulation is foundational to hypervisor development because:

  1. Software virtualization (like QEMU without KVM) works by emulating guest CPU instructions
  2. Binary translation (QEMU’s TCG) requires understanding instruction formats to translate them
  3. Hardware virtualization (VT-x) still needs emulation for unsupported instructions
  4. Debug/trace tools require instruction decoding
  5. Security analysis tools emulate code to detect malware behavior
What QEMU Does (Simplified)

Guest Program (ARM)                     Host CPU (x86)
┌─────────────────────┐                ┌─────────────────────┐
│ add r0, r1, r2      │                │                     │
│ ldr r3, [r0]        │  ──QEMU TCG──► │ mov eax, ebx        │
│ bne label           │    (Binary     │ add eax, ecx        │
└─────────────────────┘   Translation) │ mov edx, [eax]      │
                                       └─────────────────────┘

Your Project: Similar but simpler

RISC-V Program                         Your Emulator (runs on any CPU)
┌─────────────────────┐                ┌─────────────────────┐
│ add x10, x11, x12   │                │ cpu.x[10] =         │
│ lw x13, 0(x10)      │  ──Emulator──► │   cpu.x[11] +       │
│ bne x5, x6, label   │                │   cpu.x[12]         │
└─────────────────────┘                └─────────────────────┘

Concepts You Must Understand First

Before starting this project, ensure you understand these concepts:

Concept Why It Matters Where to Learn
Binary and hexadecimal Instructions are encoded as bit patterns CS:APP Chapter 2
Basic assembly concepts (registers, memory, jumps) You’re implementing what assembly describes Patterson & Hennessy Ch. 2
Two’s complement signed integers RISC-V uses signed arithmetic CS:APP 2.3
Bitwise operations in C («, », &, |, ^) Instruction decoding is all bit manipulation Any C reference
Function call conventions (stack, return address) Understanding JAL/JALR and the ABI Patterson & Hennessy 2.8
ELF file format basics Loading compiled programs “Practical Binary Analysis” Ch. 2

Self-Assessment Questions

Before proceeding, verify you can answer:

  1. Binary encoding: What is 0xDEADBEEF in binary? How many bits is that?
  2. Sign extension: If you have an 8-bit signed value 0xFF, what is it when sign-extended to 32 bits?
  3. Bit extraction: How would you extract bits 12-14 from a 32-bit value in C?
  4. Two’s complement: What is -5 represented as a 32-bit two’s complement integer?
  5. Memory addressing: If a word is at address 0x1000, what are the addresses of its 4 bytes?

Theoretical Foundation

RISC-V Architecture Overview

RISC-V (pronounced “risk-five”) is a free, open Instruction Set Architecture (ISA) designed at UC Berkeley in 2010. Unlike proprietary ISAs (x86, ARM), anyone can implement RISC-V without licensing fees.

Key RISC-V Characteristics:

RISC-V Design Philosophy
┌─────────────────────────────────────────────────────────────┐
│                                                              │
│  SIMPLICITY           MODULARITY           EXTENSIBILITY    │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐  │
│  │ Fixed 32-bit │    │ RV32I = Base │    │ Custom       │  │
│  │ instructions │    │ RV32M = Mul  │    │ instructions │  │
│  │ (mostly)     │    │ RV32F = Float│    │ allowed      │  │
│  │              │    │ RV32A = Atom │    │              │  │
│  │ Load/Store   │    │ ...          │    │ Reserved     │  │
│  │ architecture │    │              │    │ opcode space │  │
│  └──────────────┘    └──────────────┘    └──────────────┘  │
│                                                              │
└─────────────────────────────────────────────────────────────┘

RV32I - The Base Integer Instruction Set:

RV32I is the minimal RISC-V implementation. It has:

  • 32 general-purpose 32-bit registers (x0-x31)
  • Program counter (PC)
  • ~47 instructions (arithmetic, logic, memory, control flow)

The Register File

RISC-V has 32 registers, each 32 bits wide (in RV32):

RISC-V Register File (RV32I)

 Register │ ABI Name │ Description                  │ Saved By
──────────┼──────────┼──────────────────────────────┼──────────
 x0       │ zero     │ Hardwired to 0               │ N/A
 x1       │ ra       │ Return address               │ Caller
 x2       │ sp       │ Stack pointer                │ Callee
 x3       │ gp       │ Global pointer               │ N/A
 x4       │ tp       │ Thread pointer               │ N/A
 x5-x7    │ t0-t2    │ Temporaries                  │ Caller
 x8       │ s0/fp    │ Saved register / Frame ptr   │ Callee
 x9       │ s1       │ Saved register               │ Callee
 x10-x11  │ a0-a1    │ Function args / Return vals  │ Caller
 x12-x17  │ a2-a7    │ Function arguments           │ Caller
 x18-x27  │ s2-s11   │ Saved registers              │ Callee
 x28-x31  │ t3-t6    │ Temporaries                  │ Caller

Critical Rule: x0 is always zero. Writing to x0 has no effect. This simplifies instruction design (no need for special “load immediate 0” instruction).

Instruction Formats

RISC-V uses 6 instruction formats. All instructions are 32 bits, with the opcode in the same position:

RISC-V Instruction Formats (32 bits each)

R-type (Register-Register operations):
┌─────────┬───────┬───────┬────────┬───────┬─────────┐
│ funct7  │  rs2  │  rs1  │ funct3 │   rd  │ opcode  │
│  [31:25]│[24:20]│[19:15]│ [14:12]│[11:7] │  [6:0]  │
│  7 bits │ 5 bits│ 5 bits│ 3 bits │5 bits │ 7 bits  │
└─────────┴───────┴───────┴────────┴───────┴─────────┘
Example: ADD x10, x11, x12  → rd=x10, rs1=x11, rs2=x12

I-type (Immediate operations):
┌───────────────────┬───────┬────────┬───────┬─────────┐
│     imm[11:0]     │  rs1  │ funct3 │   rd  │ opcode  │
│      [31:20]      │[19:15]│ [14:12]│[11:7] │  [6:0]  │
│      12 bits      │ 5 bits│ 3 bits │5 bits │ 7 bits  │
└───────────────────┴───────┴────────┴───────┴─────────┘
Example: ADDI x10, x11, 5   → rd=x10, rs1=x11, imm=5

S-type (Store operations):
┌─────────┬───────┬───────┬────────┬─────────┬─────────┐
│imm[11:5]│  rs2  │  rs1  │ funct3 │imm[4:0] │ opcode  │
│ [31:25] │[24:20]│[19:15]│ [14:12]│ [11:7]  │  [6:0]  │
│  7 bits │ 5 bits│ 5 bits│ 3 bits │ 5 bits  │ 7 bits  │
└─────────┴───────┴───────┴────────┴─────────┴─────────┘
Example: SW x12, 8(x11)    → rs1=x11, rs2=x12, imm=8

B-type (Branch operations):
┌──────┬────────┬───────┬───────┬────────┬────────┬─────────┐
│[12]  │[10:5]  │  rs2  │  rs1  │ funct3 │[4:1][11]│ opcode │
│bit 31│[30:25] │[24:20]│[19:15]│ [14:12]│ [11:7] │  [6:0]  │
└──────┴────────┴───────┴───────┴────────┴────────┴─────────┘
Example: BEQ x5, x6, label → rs1=x5, rs2=x6, imm=offset

U-type (Upper immediate):
┌───────────────────────────────┬───────┬─────────┐
│         imm[31:12]            │   rd  │ opcode  │
│           [31:12]             │[11:7] │  [6:0]  │
│           20 bits             │5 bits │ 7 bits  │
└───────────────────────────────┴───────┴─────────┘
Example: LUI x10, 0x12345      → rd=x10, imm=0x12345000

J-type (Jump operations):
┌──────┬──────────┬──────┬────────────┬───────┬─────────┐
│[20]  │ [10:1]   │ [11] │  [19:12]   │   rd  │ opcode  │
│bit 31│ [30:21]  │bit 20│   [19:12]  │[11:7] │  [6:0]  │
└──────┴──────────┴──────┴────────────┴───────┴─────────┘
Example: JAL x1, label        → rd=x1 (saves return addr)

The Fetch-Decode-Execute Cycle

Every CPU (real or emulated) follows this cycle:

The CPU Cycle
                    ┌─────────────────────────────────────┐
                    │                                      │
                    │              ┌───────────┐           │
                    │              │   FETCH   │           │
                    │              │           │           │
                    │              │ Read 4    │           │
                    │              │ bytes at  │           │
           ┌────────┴────────┐     │ PC        │           │
           │ UPDATE PC       │     └─────┬─────┘           │
           │                 │           │                 │
           │ PC = PC + 4     │           │                 │
           │ (or branch      │           ▼                 │
           │  target)        │     ┌───────────┐           │
           └────────┬────────┘     │  DECODE   │           │
                    │              │           │           │
                    │              │ Extract   │           │
                    │              │ opcode,   │           │
                    │              │ registers,│           │
                    │              │ immediates│           │
                    │              └─────┬─────┘           │
                    │                    │                 │
                    │                    ▼                 │
                    │              ┌───────────┐           │
                    └──────────────│  EXECUTE  │───────────┘
                                   │           │
                                   │ Perform   │
                                   │ operation,│
                                   │ write     │
                                   │ result    │
                                   └───────────┘

Memory Model

RISC-V uses a load/store architecture: arithmetic operations only work on registers. To operate on memory, you must:

  1. Load from memory to register
  2. Perform operation
  3. Store from register to memory
Memory Access Types

Address    │ LB/SB │ LH/SH │ LW/SW │
           │(byte) │(half) │(word) │
───────────┼───────┼───────┼───────┤
0x1000     │  [X]  │  [X]  │  [X]  │ ← Word-aligned
0x1001     │  [X]  │       │       │
0x1002     │  [X]  │  [X]  │       │ ← Halfword-aligned
0x1003     │  [X]  │       │       │
0x1004     │  [X]  │  [X]  │  [X]  │ ← Word-aligned

LB  = Load Byte (8 bits, sign-extended to 32)
LBU = Load Byte Unsigned (8 bits, zero-extended)
LH  = Load Halfword (16 bits, sign-extended)
LHU = Load Halfword Unsigned
LW  = Load Word (32 bits)
SB  = Store Byte
SH  = Store Halfword
SW  = Store Word

Memory-Mapped I/O

In RISC-V (and most architectures), devices communicate via memory-mapped I/O. Specific memory addresses are “magic” - reading or writing them interacts with hardware:

Example Memory Map

0x00000000 ┌─────────────────────────────┐
           │                              │
           │         RAM                  │
           │   (Program + Data)           │
           │                              │
0x0FFFFFFF └─────────────────────────────┘

0x10000000 ┌─────────────────────────────┐
           │   UART (Serial Port)         │
           │   0x10000000 = TX Data       │
           │   0x10000005 = Status        │
0x10000FFF └─────────────────────────────┘

0x20000000 ┌─────────────────────────────┐
           │   Timer                      │
           │   0x20000000 = Counter       │
0x20000FFF └─────────────────────────────┘

Writing to 0x10000000 sends a character to console
Reading 0x10000005 tells you if UART is ready

Project Specification

What You Will Build

A functional RISC-V (RV32I) emulator that can:

  1. Load ELF binaries compiled with riscv64-unknown-elf-gcc -march=rv32i
  2. Execute the RV32I base integer instruction set (~40 instructions)
  3. Provide a debugger interface for inspection and single-stepping
  4. Support memory-mapped I/O for console output

Functional Requirements

Core Emulation:

  • Implement all 47 RV32I instructions
  • Model 32 general-purpose registers (x0-x31) with x0 hardwired to 0
  • Implement program counter (PC) with proper increment
  • Support at least 64KB of RAM

Memory System:

  • Byte, halfword, and word loads (LB, LBU, LH, LHU, LW)
  • Byte, halfword, and word stores (SB, SH, SW)
  • Memory-mapped UART at 0x10000000 for character output

ELF Loading:

  • Parse ELF32 header and program headers
  • Load segments into emulated memory
  • Set initial PC to entry point

Debugger Interface:

  • run - Execute until breakpoint or exit
  • step - Execute single instruction
  • regs - Display all register values
  • mem <addr> <count> - Dump memory
  • break <addr> - Set breakpoint
  • disasm <addr> <count> - Disassemble instructions

Non-Functional Requirements

  • Must correctly execute GCC-compiled RV32I programs
  • Should run at least 1 million instructions per second on modern hardware
  • Clean, readable code with instruction decoding clearly separated
  • Error messages for invalid instructions or memory access

Real World Outcome

When complete, you’ll be able to run RISC-V programs:

# Compile a simple C program for RV32I
$ cat hello.c
#include <stdint.h>

// Memory-mapped UART
#define UART_TX (*(volatile char *)0x10000000)

void print(const char *s) {
    while (*s) {
        UART_TX = *s++;
    }
}

int main() {
    print("Hello from RISC-V!\n");
    return 0;
}

$ riscv64-unknown-elf-gcc -march=rv32i -mabi=ilp32 -nostdlib \
    -Ttext=0x80000000 -o hello hello.c start.S

# Run in your emulator
$ ./rv32emu hello
Hello from RISC-V!

# Debug mode
$ ./rv32emu --debug hello
RV32I Emulator - Debug Mode
Loaded: hello (entry: 0x80000000)

(rv32emu) regs
 x0/zero = 0x00000000   x16/a6   = 0x00000000
 x1/ra   = 0x00000000   x17/a7   = 0x00000000
 x2/sp   = 0x7FFFFFF0   x18/s2   = 0x00000000
 x3/gp   = 0x00000000   x19/s3   = 0x00000000
 x4/tp   = 0x00000000   x20/s4   = 0x00000000
 x5/t0   = 0x00000000   x21/s5   = 0x00000000
 x6/t1   = 0x00000000   x22/s6   = 0x00000000
 x7/t2   = 0x00000000   x23/s7   = 0x00000000
 x8/s0   = 0x00000000   x24/s8   = 0x00000000
 x9/s1   = 0x00000000   x25/s9   = 0x00000000
x10/a0   = 0x00000000   x26/s10  = 0x00000000
x11/a1   = 0x00000000   x27/s11  = 0x00000000
x12/a2   = 0x00000000   x28/t3   = 0x00000000
x13/a3   = 0x00000000   x29/t4   = 0x00000000
x14/a4   = 0x00000000   x30/t5   = 0x00000000
x15/a5   = 0x00000000   x31/t6   = 0x00000000
PC       = 0x80000000

(rv32emu) disasm 0x80000000 5
0x80000000: 00000513  addi    x10, x0, 0     # li a0, 0
0x80000004: 00050593  addi    x11, x10, 0    # mv a1, a0
0x80000008: 00000617  auipc   x12, 0x0
0x8000000c: 01860613  addi    x12, x12, 24
0x80000010: 00062683  lw      x13, 0(x12)

(rv32emu) step
PC: 0x80000000  Instruction: 0x00000513  ADDI x10, x0, 0
  → x10 = 0 + 0 = 0

(rv32emu) step
PC: 0x80000004  Instruction: 0x00050593  ADDI x11, x10, 0
  → x11 = 0 + 0 = 0

(rv32emu) break 0x80000020
Breakpoint set at 0x80000020

(rv32emu) run
Running...
Breakpoint hit at 0x80000020

(rv32emu) mem 0x10000000 16
0x10000000: 48 65 6c 6c 6f 20 66 72  Hello fr
0x10000008: 6f 6d 20 52 49 53 43 2d  om RISC-

(rv32emu) continue
Hello from RISC-V!
Program exited with code 0

Questions to Guide Your Design

Work through these questions BEFORE writing code:

Architecture Questions

  1. How will you represent CPU state? A struct with registers, PC, and memory pointer?

  2. How will you organize instruction decoding? Giant switch statement? Function pointer table? Separate decode and execute phases?

  3. How will you handle sign extension? C doesn’t have built-in sign extension - will you use arithmetic shift or explicit bit manipulation?

  4. How will you handle the x0 register? Writes to x0 should be ignored. Do you check on write, or always read 0?

Memory Questions

  1. How will you represent memory? Simple array? Sparse mapping for memory-mapped I/O?

  2. How will you handle endianness? RISC-V is little-endian - does your host match?

  3. How will you detect memory-mapped I/O accesses? Check address ranges on every access?

Implementation Questions

  1. How will you load ELF files? Use a library, or parse headers manually?

  2. How will you implement the debugger? Simple command parser? Integration with readline?

  3. How will you test your emulator? Hand-written assembly? GCC-compiled programs? Test suite?


Thinking Exercise

Before writing any code, work through this instruction by hand:

Given: Instruction bytes at address 0x80000010 = 0x00C58633

Step 1: Convert to binary
0x00C58633 = 0000 0000 1100 0101 1000 0110 0011 0011

Step 2: Extract fields (this is an R-type instruction)
- opcode [6:0]   = 0110011  (0x33) → OP (register-register)
- rd     [11:7]  = 01100    (12)   → x12
- funct3 [14:12] = 000      (0)    → ADD/SUB
- rs1    [19:15] = 01011    (11)   → x11
- rs2    [24:20] = 01100    (12)   → x12
- funct7 [31:25] = 0000000  (0)    → ADD (not SUB)

Step 3: Determine instruction
opcode=0x33, funct3=0x0, funct7=0x00 → ADD

Step 4: Write semantic
ADD x12, x11, x12 → x[12] = x[11] + x[12]

Now you try:

Decode these instructions (hint: use the RISC-V reference card):

  1. 0x00A00513 - What instruction? What operands?
  2. 0x00B50023 - What instruction? What memory address?
  3. 0x00050863 - What instruction? What is the branch offset?
  4. 0x800000EF - What instruction? Where does it jump?

(Answers provided in hints section)


Solution Architecture

High-Level Design

RV32I Emulator Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                           MAIN LOOP                                  │
│  ┌───────────────────────────────────────────────────────────────┐  │
│  │  while (running) {                                            │  │
│  │      instruction = fetch(cpu->pc);                            │  │
│  │      execute(cpu, instruction);                               │  │
│  │      cpu->pc += 4; // unless branch/jump                      │  │
│  │  }                                                            │  │
│  └───────────────────────────────────────────────────────────────┘  │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  ┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐   │
│  │      CPU        │   │    MEMORY       │   │   DEBUGGER      │   │
│  │                 │   │                 │   │                 │   │
│  │ x[32] registers │   │ uint8_t *ram    │   │ breakpoints[]   │   │
│  │ uint32_t pc     │   │ size_t size     │   │ step mode       │   │
│  │                 │   │                 │   │ watch points    │   │
│  │ fetch()         │   │ read_byte()     │   │ disassemble()   │   │
│  │ decode()        │   │ read_half()     │   │ dump_regs()     │   │
│  │ execute()       │   │ read_word()     │   │ dump_mem()      │   │
│  │                 │   │ write_byte()    │   │                 │   │
│  │                 │   │ write_half()    │   │                 │   │
│  │                 │   │ write_word()    │   │                 │   │
│  └────────┬────────┘   └────────┬────────┘   └────────┬────────┘   │
│           │                     │                     │             │
│           └──────────────┬──────┴─────────────────────┘             │
│                          │                                           │
│                          ▼                                           │
│  ┌───────────────────────────────────────────────────────────────┐  │
│  │                    MEMORY-MAPPED I/O                          │  │
│  │  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐     │  │
│  │  │   UART   │  │  Timer   │  │  CLINT   │  │   ...    │     │  │
│  │  │0x10000000│  │0x20000000│  │0x02000000│  │          │     │  │
│  │  └──────────┘  └──────────┘  └──────────┘  └──────────┘     │  │
│  └───────────────────────────────────────────────────────────────┘  │
│                                                                      │
├─────────────────────────────────────────────────────────────────────┤
│                        ELF LOADER                                    │
│  ┌───────────────────────────────────────────────────────────────┐  │
│  │ parse_elf() → load segments → set entry point                 │  │
│  └───────────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────┘

Key Data Structures

// CPU state
typedef struct {
    uint32_t x[32];      // General-purpose registers (x0 always 0)
    uint32_t pc;         // Program counter
} CPU;

// Memory subsystem
typedef struct {
    uint8_t *ram;        // Main memory
    size_t ram_size;     // Size of RAM
    uint32_t ram_base;   // Base address of RAM
} Memory;

// UART device (memory-mapped I/O)
typedef struct {
    uint8_t tx_data;     // Transmit data register
    uint8_t status;      // Status register (bit 0 = ready)
} UART;

// Complete emulator state
typedef struct {
    CPU cpu;
    Memory mem;
    UART uart;
    bool running;
    bool debug_mode;
    uint32_t breakpoints[MAX_BREAKPOINTS];
    int num_breakpoints;
} Emulator;

// Decoded instruction (for debugging/display)
typedef struct {
    uint32_t raw;        // Original instruction bits
    uint8_t opcode;      // bits [6:0]
    uint8_t rd;          // bits [11:7]
    uint8_t funct3;      // bits [14:12]
    uint8_t rs1;         // bits [19:15]
    uint8_t rs2;         // bits [24:20]
    uint8_t funct7;      // bits [31:25]
    int32_t imm;         // Immediate value (sign-extended)
    const char *mnemonic;
} Instruction;

Instruction Decoding Strategy

// Opcode definitions (bits [6:0])
#define OP_LUI      0x37   // Load Upper Immediate
#define OP_AUIPC    0x17   // Add Upper Immediate to PC
#define OP_JAL      0x6F   // Jump and Link
#define OP_JALR     0x67   // Jump and Link Register
#define OP_BRANCH   0x63   // Conditional Branch
#define OP_LOAD     0x03   // Load from memory
#define OP_STORE    0x23   // Store to memory
#define OP_IMM      0x13   // Immediate arithmetic
#define OP_REG      0x33   // Register arithmetic
#define OP_FENCE    0x0F   // Memory ordering
#define OP_SYSTEM   0x73   // System calls (ECALL, EBREAK)

// Main decode/execute function
void execute(Emulator *emu, uint32_t inst) {
    uint32_t opcode = inst & 0x7F;
    uint32_t rd     = (inst >> 7) & 0x1F;
    uint32_t funct3 = (inst >> 12) & 0x7;
    uint32_t rs1    = (inst >> 15) & 0x1F;
    uint32_t rs2    = (inst >> 20) & 0x1F;
    uint32_t funct7 = (inst >> 25) & 0x7F;

    switch (opcode) {
        case OP_LUI:
            execute_lui(emu, rd, inst);
            break;
        case OP_AUIPC:
            execute_auipc(emu, rd, inst);
            break;
        case OP_JAL:
            execute_jal(emu, rd, inst);
            break;
        // ... etc
    }
}

Immediate Extraction

Different instruction formats have different immediate encodings:

// I-type immediate (bits [31:20], sign-extended)
int32_t imm_i(uint32_t inst) {
    return (int32_t)inst >> 20;  // Arithmetic shift preserves sign
}

// S-type immediate (bits [31:25] | bits [11:7])
int32_t imm_s(uint32_t inst) {
    int32_t imm = ((inst >> 25) << 5) | ((inst >> 7) & 0x1F);
    // Sign extend from bit 11
    if (imm & 0x800) imm |= 0xFFFFF000;
    return imm;
}

// B-type immediate (branch offset, bits scrambled)
int32_t imm_b(uint32_t inst) {
    int32_t imm = 0;
    imm |= ((inst >> 31) & 1) << 12;   // bit 12
    imm |= ((inst >> 7) & 1) << 11;    // bit 11
    imm |= ((inst >> 25) & 0x3F) << 5; // bits [10:5]
    imm |= ((inst >> 8) & 0xF) << 1;   // bits [4:1]
    // Sign extend from bit 12
    if (imm & 0x1000) imm |= 0xFFFFE000;
    return imm;
}

// U-type immediate (bits [31:12], already shifted)
int32_t imm_u(uint32_t inst) {
    return inst & 0xFFFFF000;
}

// J-type immediate (JAL offset, bits scrambled)
int32_t imm_j(uint32_t inst) {
    int32_t imm = 0;
    imm |= ((inst >> 31) & 1) << 20;       // bit 20
    imm |= ((inst >> 12) & 0xFF) << 12;    // bits [19:12]
    imm |= ((inst >> 20) & 1) << 11;       // bit 11
    imm |= ((inst >> 21) & 0x3FF) << 1;    // bits [10:1]
    // Sign extend from bit 20
    if (imm & 0x100000) imm |= 0xFFE00000;
    return imm;
}

Implementation Guide

Development Environment Setup

# Install RISC-V toolchain

# macOS (Homebrew)
brew tap riscv/riscv
brew install riscv-tools

# Ubuntu/Debian
sudo apt-get install gcc-riscv64-unknown-elf

# Fedora
sudo dnf install gcc-riscv64-linux-gnu

# Verify installation
riscv64-unknown-elf-gcc --version

# For debugging your emulator
# Install gdb for your host platform

Project Structure

rv32emu/
├── src/
│   ├── main.c              # Entry point, argument parsing
│   ├── cpu.c               # CPU state, fetch-decode-execute
│   ├── cpu.h
│   ├── memory.c            # Memory subsystem
│   ├── memory.h
│   ├── decode.c            # Instruction decoding
│   ├── decode.h
│   ├── execute.c           # Instruction execution
│   ├── execute.h
│   ├── mmio.c              # Memory-mapped I/O devices
│   ├── mmio.h
│   ├── elf_loader.c        # ELF file parsing
│   ├── elf_loader.h
│   ├── debugger.c          # Interactive debugger
│   └── debugger.h
├── tests/
│   ├── asm/                # Hand-written assembly tests
│   │   ├── add.S
│   │   ├── branch.S
│   │   └── ...
│   ├── c/                  # C test programs
│   │   ├── hello.c
│   │   ├── fibonacci.c
│   │   └── ...
│   └── run_tests.sh
├── Makefile
└── README.md

Implementation Phases

Phase 1: Foundation (Days 1-3)

Goals:

  • Set up project structure
  • Implement CPU struct and basic memory
  • Implement fetch cycle

Tasks:

  1. Create CPU struct with 32 registers and PC
  2. Implement memory read/write (byte, halfword, word)
  3. Implement basic main loop (fetch instruction at PC)
  4. Test with hardcoded instruction bytes

Checkpoint: Can fetch instructions from memory and print them.

// Minimal test
int main() {
    Emulator emu;
    init_emulator(&emu);

    // Put ADDI x10, x0, 42 at address 0
    // 0x02A00513 = addi x10, x0, 42
    write_word(&emu.mem, 0, 0x02A00513);

    uint32_t inst = fetch(&emu);
    printf("Fetched: 0x%08X\n", inst);
    // Should print: Fetched: 0x02A00513

    return 0;
}

Phase 2: Basic Instructions (Days 4-7)

Goals:

  • Implement instruction decoding
  • Implement arithmetic instructions (ADD, SUB, AND, OR, XOR, etc.)
  • Implement immediate instructions (ADDI, ANDI, ORI, etc.)

Tasks:

  1. Implement opcode extraction and switch statement
  2. Implement R-type instructions (register-register)
  3. Implement I-type immediate instructions
  4. Implement U-type (LUI, AUIPC)
  5. Test each instruction type

Checkpoint: Arithmetic sequence executes correctly.

// Test program (hand-assemble or use assembler)
// addi x10, x0, 5    # x10 = 5
// addi x11, x0, 3    # x11 = 3
// add  x12, x10, x11 # x12 = 8

// After execution, x12 should contain 8

Phase 3: Control Flow (Days 8-10)

Goals:

  • Implement branches (BEQ, BNE, BLT, BGE, BLTU, BGEU)
  • Implement jumps (JAL, JALR)
  • Handle PC updates correctly

Tasks:

  1. Implement B-type immediate extraction
  2. Implement conditional branches
  3. Implement JAL (save return address, jump)
  4. Implement JALR (indirect jump)
  5. Test loops and function calls

Checkpoint: Loop that counts to 10 executes correctly.

# Test: loop counting down from 10
        addi    x10, x0, 10     # x10 = 10
loop:
        addi    x10, x10, -1    # x10--
        bne     x10, x0, loop   # if x10 != 0, goto loop
        # x10 should be 0 here

Phase 4: Memory Operations (Days 11-14)

Goals:

  • Implement load instructions (LB, LBU, LH, LHU, LW)
  • Implement store instructions (SB, SH, SW)
  • Handle alignment (at least report errors)

Tasks:

  1. Implement S-type immediate extraction
  2. Implement load instructions with sign/zero extension
  3. Implement store instructions
  4. Add alignment checks (optional: allow misaligned)
  5. Test memory access patterns

Checkpoint: Can load and store arrays.

# Test: store and load values
        lui     x10, 0x80000        # x10 = 0x80000000 (data area)
        addi    x11, x0, 0x42       # x11 = 0x42
        sw      x11, 0(x10)         # store 0x42 at 0x80000000
        lw      x12, 0(x10)         # load it back
        # x12 should equal 0x42

Phase 5: ELF Loading (Days 15-17)

Goals:

  • Parse ELF32 header
  • Load program segments
  • Set entry point

Tasks:

  1. Read and validate ELF header
  2. Find and read program headers
  3. Load PT_LOAD segments into memory
  4. Set initial PC to entry point
  5. Test with GCC-compiled programs

Checkpoint: Can load and run GCC-compiled “Hello World”.

// Minimal ELF header parsing
typedef struct {
    uint8_t  e_ident[16];   // Magic number
    uint16_t e_type;        // Object file type
    uint16_t e_machine;     // Machine type (0xF3 for RISC-V)
    uint32_t e_version;
    uint32_t e_entry;       // Entry point
    uint32_t e_phoff;       // Program header offset
    uint32_t e_shoff;       // Section header offset
    // ... more fields
} Elf32_Ehdr;

Phase 6: Memory-Mapped I/O (Days 18-19)

Goals:

  • Implement UART for console output
  • Route memory accesses through MMIO check

Tasks:

  1. Define UART register addresses
  2. Intercept writes to UART address
  3. Output character to host console
  4. Optionally implement UART status register

Checkpoint: “Hello World” prints to terminal.

// MMIO routing
uint32_t mem_read(Emulator *emu, uint32_t addr, int size) {
    // Check for UART address range
    if (addr >= UART_BASE && addr < UART_BASE + UART_SIZE) {
        return uart_read(&emu->uart, addr - UART_BASE, size);
    }
    // Normal memory read
    return ram_read(&emu->mem, addr, size);
}

void mem_write(Emulator *emu, uint32_t addr, uint32_t value, int size) {
    if (addr >= UART_BASE && addr < UART_BASE + UART_SIZE) {
        uart_write(&emu->uart, addr - UART_BASE, value, size);
        return;
    }
    ram_write(&emu->mem, addr, value, size);
}

Phase 7: Debugger Interface (Days 20-21)

Goals:

  • Implement interactive debugger
  • Add disassembly
  • Add breakpoints

Tasks:

  1. Implement command parser
  2. Implement register dump
  3. Implement memory dump
  4. Implement single-step mode
  5. Implement breakpoints
  6. Implement instruction disassembly

Checkpoint: Can single-step through program and inspect state.


Hints in Layers

Hint 1: Getting Started - CPU Structure

typedef struct {
    uint32_t x[32];
    uint32_t pc;
} CPU;

void cpu_init(CPU *cpu) {
    memset(cpu->x, 0, sizeof(cpu->x));
    cpu->pc = 0;
}

// Important: Always return 0 for x0
uint32_t cpu_read_reg(CPU *cpu, int reg) {
    if (reg == 0) return 0;
    return cpu->x[reg];
}

// Writes to x0 are discarded
void cpu_write_reg(CPU *cpu, int reg, uint32_t value) {
    if (reg != 0) {
        cpu->x[reg] = value;
    }
}

Hint 2: Instruction Decoding Macros

// Field extraction macros
#define OPCODE(inst)  ((inst) & 0x7F)
#define RD(inst)      (((inst) >> 7) & 0x1F)
#define FUNCT3(inst)  (((inst) >> 12) & 0x7)
#define RS1(inst)     (((inst) >> 15) & 0x1F)
#define RS2(inst)     (((inst) >> 20) & 0x1F)
#define FUNCT7(inst)  (((inst) >> 25) & 0x7F)

// Example decode
uint32_t inst = 0x00C58633;  // add x12, x11, x12
printf("opcode=%02X rd=%d funct3=%d rs1=%d rs2=%d funct7=%02X\n",
       OPCODE(inst), RD(inst), FUNCT3(inst),
       RS1(inst), RS2(inst), FUNCT7(inst));
// Output: opcode=33 rd=12 funct3=0 rs1=11 rs2=12 funct7=00

Hint 3: Implementing ADD/SUB

void execute_op(Emulator *emu, uint32_t inst) {
    uint32_t rd = RD(inst);
    uint32_t rs1 = RS1(inst);
    uint32_t rs2 = RS2(inst);
    uint32_t funct3 = FUNCT3(inst);
    uint32_t funct7 = FUNCT7(inst);

    uint32_t val1 = cpu_read_reg(&emu->cpu, rs1);
    uint32_t val2 = cpu_read_reg(&emu->cpu, rs2);
    uint32_t result;

    switch (funct3) {
        case 0x0:  // ADD or SUB
            if (funct7 == 0x00) {
                result = val1 + val2;  // ADD
            } else if (funct7 == 0x20) {
                result = val1 - val2;  // SUB
            }
            break;
        case 0x1:  // SLL (Shift Left Logical)
            result = val1 << (val2 & 0x1F);
            break;
        case 0x2:  // SLT (Set Less Than)
            result = ((int32_t)val1 < (int32_t)val2) ? 1 : 0;
            break;
        case 0x3:  // SLTU (Set Less Than Unsigned)
            result = (val1 < val2) ? 1 : 0;
            break;
        case 0x4:  // XOR
            result = val1 ^ val2;
            break;
        case 0x5:  // SRL or SRA
            if (funct7 == 0x00) {
                result = val1 >> (val2 & 0x1F);  // SRL
            } else {
                result = (int32_t)val1 >> (val2 & 0x1F);  // SRA
            }
            break;
        case 0x6:  // OR
            result = val1 | val2;
            break;
        case 0x7:  // AND
            result = val1 & val2;
            break;
    }

    cpu_write_reg(&emu->cpu, rd, result);
}

Hint 4: Sign Extension

// Sign extend a value from 'bits' to 32 bits
int32_t sign_extend(uint32_t value, int bits) {
    int32_t sign_bit = 1 << (bits - 1);
    return (value ^ sign_bit) - sign_bit;
}

// Alternative using struct bit fields
union {
    struct { int32_t val : 12; } s;
    uint32_t u;
} conv;
conv.u = imm12;
int32_t extended = conv.s.val;

// Or using arithmetic shift (simplest)
int32_t imm_i(uint32_t inst) {
    return (int32_t)inst >> 20;  // Arithmetic right shift
}

Hint 5: Branch Instructions

void execute_branch(Emulator *emu, uint32_t inst) {
    uint32_t rs1 = RS1(inst);
    uint32_t rs2 = RS2(inst);
    uint32_t funct3 = FUNCT3(inst);
    int32_t offset = imm_b(inst);

    uint32_t val1 = cpu_read_reg(&emu->cpu, rs1);
    uint32_t val2 = cpu_read_reg(&emu->cpu, rs2);
    bool take_branch = false;

    switch (funct3) {
        case 0x0:  // BEQ
            take_branch = (val1 == val2);
            break;
        case 0x1:  // BNE
            take_branch = (val1 != val2);
            break;
        case 0x4:  // BLT
            take_branch = ((int32_t)val1 < (int32_t)val2);
            break;
        case 0x5:  // BGE
            take_branch = ((int32_t)val1 >= (int32_t)val2);
            break;
        case 0x6:  // BLTU
            take_branch = (val1 < val2);
            break;
        case 0x7:  // BGEU
            take_branch = (val1 >= val2);
            break;
    }

    if (take_branch) {
        emu->cpu.pc += offset;
        emu->cpu.pc -= 4;  // Compensate for PC+=4 in main loop
    }
}

Hint 6: Thinking Exercise Answers

  1. 0x00A00513:
    • opcode=0x13 (OP-IMM), rd=10, funct3=0, rs1=0, imm=10
    • ADDI x10, x0, 10 (load immediate 10 into x10)
  2. 0x00B50023:
    • opcode=0x23 (STORE), funct3=0 (SB), rs1=10, rs2=11
    • imm = [31:25] [11:7] = 0 0 = 0
    • SB x11, 0(x10) (store byte from x11 to address in x10)
  3. 0x00050863:
    • opcode=0x63 (BRANCH), funct3=0 (BEQ), rs1=10, rs2=0
    • imm_b = 16 (offset of 16 bytes = 4 instructions)
    • BEQ x10, x0, +16 (branch if x10 == 0)
  4. 0x800000EF:
    • opcode=0x6F (JAL), rd=1
    • imm_j = -2048 (sign-extended)
    • JAL x1, -2048 (call function 2KB backwards)

Testing Strategy

Unit Tests for Instructions

Create test programs for each instruction category:

# tests/asm/test_add.S
.globl _start
_start:
    # Test ADD
    li      x10, 5          # x10 = 5
    li      x11, 3          # x11 = 3
    add     x12, x10, x11   # x12 should be 8

    # Test with negative
    li      x13, -5         # x13 = -5
    add     x14, x10, x13   # x14 should be 0

    # Signal done (write to UART)
    li      x15, 0x10000000
    li      x16, 'P'        # 'P' for Pass
    sb      x16, 0(x15)

    # Halt
    ebreak

Integration Tests

#!/bin/bash
# tests/run_tests.sh

EMULATOR=../rv32emu

for test in asm/*.S; do
    name=$(basename $test .S)
    echo -n "Testing $name... "

    # Assemble
    riscv64-unknown-elf-gcc -march=rv32i -mabi=ilp32 -nostdlib \
        -Ttext=0x80000000 -o bin/$name $test

    # Run
    output=$($EMULATOR bin/$name 2>&1)

    if echo "$output" | grep -q "^P"; then
        echo "PASS"
    else
        echo "FAIL"
        echo "$output"
    fi
done

Test Cases Table

Test What It Tests Expected Result
test_add ADD, ADDI x12 = 8
test_sub SUB x12 = 2
test_logic AND, OR, XOR Various bit patterns
test_shift SLL, SRL, SRA Shift results
test_compare SLT, SLTU 0 or 1
test_branch BEQ, BNE, BLT Correct jumps
test_jal JAL, JALR Return address saved
test_load LB, LH, LW Values from memory
test_store SB, SH, SW Values to memory
test_lui LUI, AUIPC Upper immediate handling

Common Pitfalls & Debugging

Pitfall 1: Forgetting x0 is Always Zero

Symptom: Programs behave unexpectedly when using x0

Fix:

uint32_t cpu_read_reg(CPU *cpu, int reg) {
    if (reg == 0) return 0;  // x0 is hardwired to zero
    return cpu->x[reg];
}

Pitfall 2: Sign Extension Errors

Symptom: Negative immediates become huge positive numbers

Fix:

// Wrong: zero extension
int32_t imm = (inst >> 20) & 0xFFF;

// Right: sign extension via arithmetic shift
int32_t imm = (int32_t)inst >> 20;

Pitfall 3: Branch Offset Calculation

Symptom: Branches go to wrong addresses

Fix:

// B-type immediate is PC-relative and in multiples of 2
// The offset is added to PC, not PC+4
if (take_branch) {
    cpu->pc += offset;  // offset includes the ×2 scaling
    cpu->pc -= 4;       // Compensate for main loop's PC+=4
}

Pitfall 4: Endianness

Symptom: Multi-byte values appear byte-swapped

Fix: RISC-V is little-endian. When loading a word:

uint32_t mem_read_word(Memory *mem, uint32_t addr) {
    return mem->ram[addr] |
           (mem->ram[addr + 1] << 8) |
           (mem->ram[addr + 2] << 16) |
           (mem->ram[addr + 3] << 24);
}

Pitfall 5: JALR Target Calculation

Symptom: JALR jumps to wrong address

Fix:

// JALR: target = (rs1 + imm) & ~1 (clear LSB)
void execute_jalr(Emulator *emu, uint32_t inst) {
    uint32_t rd = RD(inst);
    uint32_t rs1 = RS1(inst);
    int32_t imm = imm_i(inst);

    uint32_t return_addr = emu->cpu.pc + 4;
    uint32_t target = (cpu_read_reg(&emu->cpu, rs1) + imm) & ~1;

    cpu_write_reg(&emu->cpu, rd, return_addr);
    emu->cpu.pc = target - 4;  // Compensate for PC+=4
}

Pitfall 6: Shift Amount Masking

Symptom: Shifts by large amounts behave strangely

Fix:

// Only lower 5 bits of shift amount are used
uint32_t shamt = val2 & 0x1F;  // For register shifts
uint32_t shamt = RS2(inst);    // For immediate shifts
result = val1 << shamt;

Extensions & Challenges

Beginner Extensions

  • Instruction counter: Track how many instructions executed
  • Cycle counter: Estimate cycles (assume CPI of 1 for now)
  • Memory usage tracking: Report peak memory usage

Intermediate Extensions

  • RV32M support: Multiply and divide instructions
  • Better disassembler: Show ABI register names, pretty print
  • GDB stub: Implement GDB remote protocol for debugging
  • Performance profiling: Hot instruction addresses

Advanced Extensions

  • RV32F support: Floating-point instructions
  • RV32A support: Atomic instructions
  • Compressed instructions: 16-bit RV32C extension
  • System mode: User/supervisor modes, CSRs, interrupts
  • Basic block caching: Cache decoded instructions

The Interview Questions They’ll Ask

After completing this project, you’ll be ready for these questions:

CPU Architecture

  1. “Explain the fetch-decode-execute cycle”
    • Fetch: Read instruction bytes from PC address
    • Decode: Extract opcode, registers, immediates
    • Execute: Perform operation, write result
    • Update PC (sequential or branch target)
  2. “What is a load/store architecture?”
    • Arithmetic only on registers, not memory
    • Must load from memory to register first
    • RISC-V, ARM, MIPS are load/store
    • x86 allows memory operands directly
  3. “Why is x0 hardwired to zero in RISC-V?”
    • Simplifies instruction encoding
    • No need for “load immediate 0” instruction
    • Useful for discarding results (write to x0)
    • Common constant available without extra instruction

Virtualization/Emulation

  1. “What’s the difference between interpretation and binary translation?”
    • Interpretation: Decode and execute each instruction
    • Binary translation: Convert guest code to host code once
    • Translation is faster but more complex
    • QEMU TCG does binary translation
  2. “How does memory-mapped I/O work?”
    • Certain addresses map to device registers
    • Reading/writing triggers device operations
    • No special I/O instructions needed
    • Emulator must intercept these accesses
  3. “What are the challenges of CPU emulation?”
    • Performance (1000x slower than native)
    • Timing accuracy (cycle-accurate vs functional)
    • Self-modifying code detection
    • Privileged instruction handling

Debugging/Analysis

  1. “How would you implement single-stepping in an emulator?”
    • After each instruction, check debug mode flag
    • If set, pause and wait for debugger command
    • Maintain breakpoint list, check PC against it
    • Can implement conditional breakpoints
  2. “How do you decode a RISC-V instruction?”
    • Extract opcode (bits 6-0) to determine format
    • Extract fields based on format (rd, rs1, rs2, imm)
    • Use opcode + funct3 + funct7 to identify instruction
    • Sign-extend immediates as needed

Books That Will Help

Topic Book Specific Chapter Why It Helps
RISC-V Architecture Computer Organization and Design RISC-V Edition Chapters 2-4 The authoritative RISC-V textbook
Instruction Encoding RISC-V Specification Chapter 2, 24 Official instruction encoding reference
ELF Format Practical Binary Analysis Chapter 2 How to parse and load executables
CPU Emulation Programming from the Ground Up Chapters 1-4 CPU concepts from programmer’s perspective
Memory Systems CS:APP Chapter 6 Memory hierarchy and access patterns
Bitwise Operations C Programming Language (K&R) Chapter 2 The bit manipulation you’ll use constantly

Online Resources


Self-Assessment Checklist

Before moving to Project 3, verify:

Understanding

  • Can explain the fetch-decode-execute cycle without notes
  • Can decode any RV32I instruction by hand
  • Can explain why x0 is hardwired to zero
  • Understand the difference between signed and unsigned comparisons
  • Can explain how branch offsets are calculated

Implementation

  • All 47 RV32I instructions implemented
  • Register x0 always returns 0
  • Sign extension works correctly for all immediate types
  • Branches and jumps update PC correctly
  • Memory load/store handles all sizes

Testing

  • Hand-written assembly tests pass
  • Simple GCC-compiled programs run
  • “Hello World” prints correctly
  • Debugger can single-step and inspect state

Growth

  • Can write RISC-V assembly for simple algorithms
  • Understand how C code compiles to RISC-V
  • Can debug emulator issues using QEMU as reference

Real-World Connections

What You’re Learning Applies To:

QEMU Development

  • QEMU’s TCG (Tiny Code Generator) does exactly this, but with JIT compilation
  • Your decoder structure mirrors QEMU’s instruction handlers
  • MMIO handling is how QEMU emulates all devices

Hypervisor Development

  • KVM uses QEMU for device emulation
  • Understanding instruction formats helps with hardware virtualization
  • Debug/trace tools need instruction decoding

Security Research

  • Malware analysis sandboxes use emulation
  • Symbolic execution engines (angr, S2E) are built on emulators
  • Fuzzing often uses emulation for coverage

Hardware Design

  • Your emulator is a “golden model” for hardware verification
  • Same decode logic appears in actual RISC-V processors
  • SystemVerilog models work similarly

Submission / Completion Criteria

Minimum Viable Completion:

  • All RV32I instructions implemented
  • Simple programs execute correctly
  • Memory reads and writes work
  • Basic console output via MMIO

Full Completion:

  • ELF loader works
  • Debugger interface functional (step, regs, mem, break)
  • Disassembler shows instruction mnemonics
  • GCC-compiled programs run

Excellence (Going Above & Beyond):

  • Instruction-level profiling
  • GDB remote stub
  • RV32M extension (multiply/divide)
  • Cycle-accurate timing model
  • Web-based interface

Next Steps

After completing this project, you’ll be ready for:

  • Project 3: Basic Block Binary Translator - Instead of interpreting each instruction, translate basic blocks to host code
  • Project 4: Simple JIT Compiler - Generate native code at runtime for faster execution
  • Project 13: Mini-QEMU Clone - Combine CPU emulation with full device emulation

Your RISC-V emulator forms the foundation for understanding all CPU virtualization, whether software-based (QEMU) or hardware-assisted (KVM).


This guide was expanded from HYPERVISOR_VIRTUALIZATION_DEEP_DIVE_PROJECTS.md. For the complete learning path, see the project index.