Project 3: Stack Machine Virtual Machine
Build a simple stack-based virtual machine that implements the fetch-decode-execute cycle, teaching you the fundamental loop that powers every CPU from microcontrollers to supercomputers.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Intermediate |
| Time Estimate | 1 week (20-40 hours) |
| Language | C |
| Alternative Languages | Rust, Go, Zig |
| Prerequisites | Basic C programming, understanding of stacks as data structures |
| Key Topics | Fetch-Decode-Execute cycle, Opcode decoding, Stack pointer management, Control flow, Exception handling |
| Main Book | “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron |
1. Learning Objectives
After completing this project, you will be able to:
- Implement the fetch-decode-execute cycle that drives all CPUs
- Design and decode a custom instruction set with opcodes
- Manage a stack pointer register and understand push/pop semantics
- Implement control flow instructions (jumps, conditional branches)
- Handle exception conditions (stack overflow, stack underflow, division by zero)
- Trace program execution through memory at the instruction level
- Explain how high-level constructs (loops, function calls) map to low-level instructions
- Compare stack-based and register-based virtual machine architectures
2. Theoretical Foundation
2.1 Core Concepts
The Fetch-Decode-Execute Cycle
Every computer, from the simplest microcontroller to the most powerful supercomputer, operates on the same fundamental principle. This is the heartbeat of computation:
THE FETCH-DECODE-EXECUTE CYCLE:
================================
┌─────────────────────────────────────────┐
│ THE CPU LOOP │
│ │
│ while (running) { │
│ instruction = fetch(PC); │
│ operation = decode(instruction); │
│ execute(operation); │
│ PC = next_instruction(); │
│ } │
│ │
└─────────────────────────────────────────┘
DETAILED BREAKDOWN:
==================
┌──────────────────────────────────────────────────────────────────┐
│ │
│ ┌────────────────┐ │
│ │ │ │
│ ──▶│ FETCH │ Read instruction bytes from memory[PC] │
│ │ │ PC points to current instruction │
│ │ │ For our VM: read 1 byte (opcode) │
│ └───────┬────────┘ + optional operand bytes │
│ │ │
│ ▼ │
│ ┌────────────────┐ │
│ │ │ │
│ │ DECODE │ Parse opcode to determine operation │
│ │ │ Extract any embedded operands │
│ │ │ Map to handler function │
│ └───────┬────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────┐ │
│ │ │ │
│ │ EXECUTE │ Perform the operation: │
│ │ │ - Arithmetic (ADD, SUB, MUL, DIV) │
│ │ │ - Stack ops (PUSH, POP, DUP, SWAP) │
│ │ │ - Control flow (JMP, JZ, CALL, RET) │
│ │ │ Update registers, memory, or flags │
│ └───────┬────────┘ │
│ │ │
│ │ ◄─────── Update PC (may be modified by jumps) │
│ │ │
│ ────────┴──────────── Loop back to FETCH ───────────────────── │
│ │
└──────────────────────────────────────────────────────────────────┘
TIMING IN REAL CPUs:
====================
Clock Cycle: │ 1 │ 2 │ 3 │ 4 │ 5 │ 6 │ 7 │ 8 │
├─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┤
Simple CPU: │ F1 │ D1 │ E1 │ F2 │ D2 │ E2 │ F3 │ D3 │
│ │ │ │ │ │ │ │ │
Pipelined CPU: │ F1 │ F2 │ F3 │ F4 │ F5 │ F6 │ F7 │ F8 │
(3-stage) │ │ D1 │ D2 │ D3 │ D4 │ D5 │ D6 │ D7 │
│ │ │ E1 │ E2 │ E3 │ E4 │ E5 │ E6 │
│ │ │ │ │ │ │ │ │
Our VM will implement the simple (non-pipelined) version first.
Stack-Based vs Register-Based Machines
Virtual machines come in two main architectural styles. Understanding both helps you appreciate the tradeoffs:
STACK-BASED ARCHITECTURE (What you'll build):
=============================================
Memory Stack (grows ↓)
┌─────────────┐ ┌─────────────┐
│ 0x00: PUSH │ │ │
│ 0x01: 5 │ │ │
│ 0x02: PUSH │ │ 5 │ ← Top after PUSH 5
│ 0x03: 3 │ │ 3 │ ← Top after PUSH 3
│ 0x04: ADD │ │ 8 │ ← Top after ADD (5+3)
│ 0x05: PRINT│ │ │
│ 0x06: HALT │ │ │
└─────────────┘ └─────────────┘
↑
SP (Stack Pointer)
Execution of "5 + 3":
PUSH 5 → Stack: [5] (operand 1)
PUSH 3 → Stack: [5, 3] (operand 2)
ADD → Stack: [8] (5 + 3 = 8)
Advantages:
- Simpler instruction encoding (operations implicitly use stack)
- Easier to compile to (no register allocation needed)
- More compact code
Disadvantages:
- More memory traffic (everything goes through stack)
- Harder to optimize at runtime
REGISTER-BASED ARCHITECTURE (Like x86, ARM):
============================================
Registers Memory
┌──────────────┐ ┌─────────────────────┐
│ R0: 0x0005 │ │ 0x00: LOAD R0, 5 │
│ R1: 0x0003 │ │ 0x04: LOAD R1, 3 │
│ R2: 0x0008 │ │ 0x08: ADD R2,R0,R1 │
│ R3: 0x0000 │ │ 0x0C: PRINT R2 │
│ ... │ │ 0x10: HALT │
└──────────────┘ └─────────────────────┘
Execution of "5 + 3":
LOAD R0, 5 → R0 = 5
LOAD R1, 3 → R1 = 3
ADD R2, R0, R1 → R2 = R0 + R1 = 8
Advantages:
- Fewer memory accesses (operands in registers)
- Easier for hardware optimization
- Better for pipelining
Disadvantages:
- More complex instruction encoding
- Need register allocation (harder to compile)
- Larger code size
REAL-WORLD EXAMPLES:
====================
Stack-Based VMs:
┌────────────────────────────────────────────────────────┐
│ • Java Virtual Machine (JVM) │
│ - Bytecode uses operand stack │
│ - Local variables stored in frame │
│ │
│ • CPython bytecode interpreter │
│ - LOAD_FAST pushes local variable │
│ - BINARY_ADD pops two, pushes result │
│ │
│ • WebAssembly (Wasm) │
│ - Stack machine with structured control flow │
│ - Designed for browser execution │
│ │
│ • Forth │
│ - Classic stack-based language │
│ - Minimal, powerful, still used in embedded │
│ │
│ • PostScript │
│ - Stack-based language for graphics │
│ - PDF is based on this │
└────────────────────────────────────────────────────────┘
Register-Based VMs:
┌────────────────────────────────────────────────────────┐
│ • Lua VM │
│ - Register-based for speed │
│ - LuaJIT is one of fastest scripting VMs │
│ │
│ • Android Dalvik (deprecated) │
│ - Register-based alternative to JVM │
│ - Better for mobile (fewer memory ops) │
│ │
│ • Android ART │
│ - Replaced Dalvik │
│ - Compiles to native at install time │
└────────────────────────────────────────────────────────┘
Stack Machine Architecture
Here’s the complete architecture of the VM you’ll build:
STACK MACHINE VIRTUAL MACHINE ARCHITECTURE:
===========================================
┌──────────────────────────────────────────────────────────────────────────┐
│ VIRTUAL MACHINE │
├──────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │ PROGRAM MEMORY │ │
│ │ (Code Segment - Read Only) │ │
│ ├────────────────────────────────────────────────────────────────────┤ │
│ │ │ │
│ │ Address Byte Meaning │ │
│ │ ─────── ──── ─────── │ │
│ │ 0x0000 0x01 PUSH opcode │ │
│ │ 0x0001 0x05 operand: 5 │ │
│ │ 0x0002 0x01 PUSH opcode │ │
│ │ 0x0003 0x03 operand: 3 │ │
│ │ 0x0004 0x10 ADD opcode │ │
│ │ 0x0005 0x30 PRINT opcode │ │
│ │ 0x0006 0xFF HALT opcode │ │
│ │ ... │ │
│ │ 0xFFFF (max address in 64KB space) │ │
│ │ │ │
│ └────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────┐ ┌─────────────────────────────────────┐ │
│ │ CPU REGISTERS │ │ DATA STACK │ │
│ ├─────────────────────────┤ ├─────────────────────────────────────┤ │
│ │ │ │ │ │
│ │ PC (Program Counter) │ │ ┌─────┐ ← Top of Stack (TOS) │ │
│ │ ┌────────────────────┐ │ │ │ 8 │ (result of 5+3) │ │
│ │ │ 0x0006 │ │ │ ├─────┤ │ │
│ │ └────────────────────┘ │ │ │ │ │ │
│ │ Points to next instr │ │ ├─────┤ │ │
│ │ │ │ │ │ │ │
│ │ SP (Stack Pointer) │ │ ├─────┤ │ │
│ │ ┌────────────────────┐ │ │ │ │ │ │
│ │ │ 1 │ │ │ ├─────┤ │ │
│ │ └────────────────────┘ │ │ │ │ ← Stack Bottom │ │
│ │ Index of TOS element │ │ └─────┘ (index 0) │ │
│ │ │ │ │ │
│ │ FLAGS (Status) │ │ SP = 1 means 1 element on stack │ │
│ │ ┌────────────────────┐ │ │ SP = 0 means stack is empty │ │
│ │ │ Z │ N │ O │ H │ │ │ │ │ │
│ │ └────────────────────┘ │ │ Stack grows UPWARD in our design │ │
│ │ Z=Zero, N=Neg, O=Ovf │ │ (index increases with push) │ │
│ │ H=Halted │ │ │ │
│ └─────────────────────────┘ └─────────────────────────────────────┘ │
│ │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │ CALL STACK │ │
│ │ (For subroutine returns) │ │
│ ├────────────────────────────────────────────────────────────────────┤ │
│ │ │ │
│ │ When CALL is executed: │ │
│ │ 1. Push return address (current PC + instruction size) │ │
│ │ 2. Jump to subroutine address │ │
│ │ │ │
│ │ When RET is executed: │ │
│ │ 1. Pop return address from call stack │ │
│ │ 2. Set PC to that address │ │
│ │ │ │
│ │ ┌────────────────────────────────────────────────────────────┐ │ │
│ │ │ Return Addresses: [0x0010] [0x0025] [ ... ] │ │ │
│ │ └────────────────────────────────────────────────────────────┘ │ │
│ │ ↑ │ │
│ │ RSP (Return Stack Pointer) │ │
│ │ │ │
│ └────────────────────────────────────────────────────────────────────┘ │
│ │
└──────────────────────────────────────────────────────────────────────────┘
Instruction Encoding
Your VM will use a simple variable-length instruction encoding:
INSTRUCTION ENCODING DESIGN:
============================
Format 1: No operand (1 byte)
┌─────────────────────┐
│ OPCODE │
│ (8 bits) │
└─────────────────────┘
Examples: ADD, SUB, MUL, POP, DUP, SWAP, HALT
Format 2: With immediate value (2+ bytes)
┌─────────────────────┬─────────────────────┐
│ OPCODE │ OPERAND │
│ (8 bits) │ (8 or 16 bits) │
└─────────────────────┴─────────────────────┘
Examples: PUSH value, JMP address, JZ address
OPCODE TABLE:
=============
Mnemonic Opcode Size Stack Effect Description
──────── ────── ──── ──────────── ───────────
; Stack Operations
PUSH n 0x01 2 ( -- n) Push immediate value
POP 0x02 1 (a -- ) Discard top of stack
DUP 0x03 1 (a -- a a) Duplicate top
SWAP 0x04 1 (a b -- b a) Swap top two
OVER 0x05 1 (a b -- a b a) Copy second to top
ROT 0x06 1 (a b c -- b c a) Rotate top three
; Arithmetic
ADD 0x10 1 (a b -- a+b) Addition
SUB 0x11 1 (a b -- a-b) Subtraction
MUL 0x12 1 (a b -- a*b) Multiplication
DIV 0x13 1 (a b -- a/b) Integer division
MOD 0x14 1 (a b -- a%b) Modulo
NEG 0x15 1 (a -- -a) Negate
; Comparison
EQ 0x20 1 (a b -- a==b) Equal (1 or 0)
NE 0x21 1 (a b -- a!=b) Not equal
LT 0x22 1 (a b -- a<b) Less than
GT 0x23 1 (a b -- a>b) Greater than
LE 0x24 1 (a b -- a<=b) Less or equal
GE 0x25 1 (a b -- a>=b) Greater or equal
; Logic
AND 0x28 1 (a b -- a&b) Bitwise AND
OR 0x29 1 (a b -- a|b) Bitwise OR
XOR 0x2A 1 (a b -- a^b) Bitwise XOR
NOT 0x2B 1 (a -- ~a) Bitwise NOT
; Control Flow
JMP addr 0x30 3 ( -- ) Unconditional jump
JZ addr 0x31 3 (a -- ) Jump if zero
JNZ addr 0x32 3 (a -- ) Jump if not zero
CALL addr 0x33 3 ( -- ) Call subroutine
RET 0x34 1 ( -- ) Return from subroutine
; I/O
PRINT 0x40 1 (a -- ) Print top as decimal
PRINTC 0x41 1 (a -- ) Print top as character
READ 0x42 1 ( -- n) Read integer from input
; Control
NOP 0x00 1 ( -- ) No operation
HALT 0xFF 1 ( -- ) Stop execution
STACK EFFECT NOTATION (Forth-style):
====================================
( before -- after )
( -- n) : Nothing before, n on stack after
(a -- ) : a on stack before, nothing after
(a b -- a+b) : a and b before, their sum after
(a b -- b a) : a and b before, swapped after
Example execution of "3 4 + 5 *":
Instruction Stack (top right) Notes
─────────── ───────────────── ─────
PUSH 3 [3]
PUSH 4 [3, 4]
ADD [7] 3 + 4 = 7
PUSH 5 [7, 5]
MUL [35] 7 * 5 = 35
2.2 Why This Matters
Building a stack machine VM teaches fundamental concepts that apply everywhere:
REAL-WORLD CONNECTIONS:
=======================
1. JVM and Python Bytecode
┌────────────────────────────────────────────────────────────────┐
│ Your PUSH 5; PUSH 3; ADD is exactly like: │
│ │
│ Java bytecode: Python bytecode: │
│ iconst_5 LOAD_CONST 5 │
│ iconst_3 LOAD_CONST 3 │
│ iadd BINARY_ADD │
│ │
│ Understanding your VM helps you understand these runtimes! │
└────────────────────────────────────────────────────────────────┘
2. WebAssembly (Wasm)
┌────────────────────────────────────────────────────────────────┐
│ Wasm is a stack-based VM that runs in browsers: │
│ │
│ (i32.const 5) ; Push 5 │
│ (i32.const 3) ; Push 3 │
│ (i32.add) ; Pop both, push sum │
│ │
│ Same concepts, used by every major browser! │
└────────────────────────────────────────────────────────────────┘
3. CPU Architecture Understanding
┌────────────────────────────────────────────────────────────────┐
│ Your fetch-decode-execute loop is the SAME loop that runs │
│ in every physical CPU. You're building in software what │
│ Intel and AMD build in silicon. │
│ │
│ The x86 microarchitecture has: │
│ - Instruction fetch unit → Your fetch_instruction() │
│ - Decoder → Your opcode switch statement │
│ - Execution units → Your operation handlers │
└────────────────────────────────────────────────────────────────┘
4. Compiler Backends
┌────────────────────────────────────────────────────────────────┐
│ Compilers generate code for VMs: │
│ │
│ Source: int x = 5 + 3; │
│ ↓ │
│ AST: (assign x (+ 5 3)) │
│ ↓ │
│ Bytecode: PUSH 5; PUSH 3; ADD; STORE x │
│ │
│ Building a VM teaches you what compilers target! │
└────────────────────────────────────────────────────────────────┘
5. Hardware Stack Pointer (SP)
┌────────────────────────────────────────────────────────────────┐
│ Your SP register maps directly to: │
│ │
│ x86: RSP (64-bit stack pointer) │
│ ARM: SP (stack pointer register) │
│ RISC-V: x2/sp (stack pointer by convention) │
│ │
│ Your push/pop operations are what PUSH/POP x86 instructions │
│ do in hardware! │
└────────────────────────────────────────────────────────────────┘
2.3 Historical Context
TIMELINE OF STACK MACHINES:
===========================
1960s ──┬── Burroughs B5000 (1961)
│ - First commercial stack-based computer
│ - Hardware stack for expression evaluation
│ - Influenced ALGOL implementation
│
1970s ──┼── Forth (1970)
│ - Stack-based programming language
│ - Charles Moore for radio telescope control
│ - Still used in embedded systems
│
├── HP calculators (1972)
│ - RPN (Reverse Polish Notation)
│ - Stack-based calculation
│
1980s ──┼── PostScript (1982)
│ - Stack-based page description language
│ - Adobe, used in printing/PDF
│
├── Java (1995) / JVM
│ - Stack-based bytecode interpreter
│ - "Write once, run anywhere"
│
1990s ──┼── Python bytecode
│ - Stack-based CPython VM
│ - LOAD_FAST, BINARY_OP, etc.
│
2000s ──┼── .NET CLR (2002)
│ - Stack-based Common Language Runtime
│ - C#, F#, VB.NET compile to IL
│
2010s ──┼── WebAssembly (2017)
│ - Stack-based VM for browsers
│ - Near-native speed
│ - Compile C/C++/Rust to web
│
2020s ──┴── Still going strong!
- New languages still use stack VMs
- Performance JIT compiles away the stack
The stack machine concept has been a cornerstone of computing for 60+ years.
2.4 Common Misconceptions
MISCONCEPTION 1: "Stack machines are slow because of all the stack operations"
───────────────────────────────────────────────────────────────────────────────
Reality: JIT compilers (like in JVM, V8) transform stack operations into
register operations. The stack is an abstraction that enables portability
and simpler code generation. At runtime, hot code runs at near-native speed.
Source bytecode: JIT output (x86):
PUSH 5 mov eax, 5
PUSH 3 mov ebx, 3
ADD add eax, ebx
The stack is eliminated entirely!
MISCONCEPTION 2: "The data stack and call stack are the same thing"
───────────────────────────────────────────────────────────────────
Reality: Most VMs separate them:
- Data stack: operands for arithmetic/logic (what you manipulate)
- Call stack: return addresses for subroutines (control flow)
In hardware CPUs they often share one stack (with potential for overflow
attacks), but keeping them separate is safer and cleaner.
MISCONCEPTION 3: "Stack underflow can't happen if I'm careful"
──────────────────────────────────────────────────────────────
Reality: It absolutely can, and your VM must handle it gracefully:
POP ; Stack is empty - underflow!
ADD ; Stack has only 1 element - also underflow!
Good VMs check preconditions before every operation.
MISCONCEPTION 4: "PUSH is always followed by an operation"
──────────────────────────────────────────────────────────
Reality: Stack can accumulate values:
PUSH 1
PUSH 2
PUSH 3
PUSH 4 ; Stack: [1, 2, 3, 4]
ADD ; Stack: [1, 2, 7]
ADD ; Stack: [1, 9]
ADD ; Stack: [10]
This is valid and computes 1 + 2 + 3 + 4 = 10.
MISCONCEPTION 5: "My VM is complete when arithmetic works"
─────────────────────────────────────────────────────────
Reality: Control flow is equally important:
- JMP for loops
- JZ/JNZ for conditionals
- CALL/RET for subroutines
Without these, you can't write real programs!
3. Project Specification
3.1 What You Will Build
A complete stack-based virtual machine that:
- Loads and executes bytecode programs from files
- Implements 30+ instructions (stack ops, arithmetic, logic, control flow)
- Manages a data stack with proper overflow/underflow detection
- Supports subroutine calls with a separate return stack
- Provides debug output showing stack state and program execution
- Includes a simple assembler to write programs in human-readable form
3.2 Functional Requirements
- Load binary bytecode from a file
- Implement data stack with configurable maximum depth
- Implement all arithmetic operations: ADD, SUB, MUL, DIV, MOD, NEG
- Implement comparison operations: EQ, NE, LT, GT, LE, GE
- Implement stack manipulation: PUSH, POP, DUP, SWAP, OVER, ROT
- Implement control flow: JMP, JZ, JNZ, CALL, RET
- Implement I/O: PRINT, PRINTC, READ
- Detect and report stack underflow, overflow, and division by zero
- Run programs to completion (HALT) or error
3.3 Non-Functional Requirements
- Clean separation between VM core and I/O
- Portable C code (C99 or later)
- Verbose debug mode showing each instruction execution
- Error messages that help users fix their programs
- Memory-safe implementation (no buffer overflows)
3.4 Example Usage / Output
# Compile a program from assembly to bytecode
$ ./svm-asm factorial.svm factorial.bin
Assembled factorial.svm -> factorial.bin (42 bytes)
# Run the bytecode
$ ./stackvm factorial.bin
120
# Run with debug output
$ ./stackvm --debug factorial.bin
[0000] PUSH 5 | Stack: [5]
[0002] PUSH 1 | Stack: [5, 1]
[0004] DUP | Stack: [5, 1, 5]
[0005] JZ 0x0020 | Stack: [5, 1] (not jumping, TOS != 0)
[0008] SWAP | Stack: [1, 5]
[0009] OVER | Stack: [1, 5, 1]
[000A] MUL | Stack: [1, 5] (wait, that's wrong...)
... (execution continues)
[0020] POP | Stack: [120]
[0021] PRINT | Stack: []
Output: 120
[0022] HALT | Program terminated normally
# Run a program with an error
$ ./stackvm bad_program.bin
Runtime Error at 0x0005: Stack underflow (POP on empty stack)
Stack trace:
0x0005: POP
0x0003: ADD
0x0000: PUSH 5
3.5 Real World Outcome
When complete, you’ll be able to write programs like this:
; factorial.svm - Calculate 5!
; Result should be 120
PUSH 5 ; n = 5
PUSH 1 ; accumulator = 1
; LOOP: while n > 0
loop:
OVER ; copy n to top: [n, acc, n]
JZ end ; if n == 0, done
SWAP ; [n, n, acc]
OVER ; [n, n, acc, n]
MUL ; [n, n, acc*n]
SWAP ; [n, acc*n, n]
PUSH 1 ; [n, acc*n, n, 1]
SUB ; [n, acc*n, n-1]
SWAP ; [n, n-1, acc*n]
ROT ; [n-1, acc*n, n]
POP ; [n-1, acc*n]
JMP loop
end:
POP ; remove the 0
PRINT ; output the result
HALT
Running this produces:
$ ./stackvm factorial.bin
120
You’ve computed 5! = 5 * 4 * 3 * 2 * 1 = 120 using only a stack!
4. Solution Architecture
4.1 High-Level Design
STACK VM ARCHITECTURE:
======================
┌──────────────────────────────────────────────────────────────────────────┐
│ HOST SYSTEM │
├──────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │ MAIN PROGRAM │ │
│ │ │ │
│ │ main() { │ │
│ │ vm = vm_create(); │ │
│ │ vm_load_program(vm, "program.bin"); │ │
│ │ while (vm_running(vm)) { │ │
│ │ vm_step(vm); // One instruction │ │
│ │ } │ │
│ │ vm_destroy(vm); │ │
│ │ } │ │
│ │ │ │
│ └────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │ VM CORE │ │
│ │ │ │
│ │ ┌─────────────────────────────────────────────────────────────┐ │ │
│ │ │ FETCH-DECODE-EXECUTE LOOP │ │ │
│ │ │ │ │ │
│ │ │ vm_step(VM *vm) { │ │ │
│ │ │ // FETCH │ │ │
│ │ │ opcode = vm->code[vm->pc++]; │ │ │
│ │ │ │ │ │
│ │ │ // DECODE & EXECUTE │ │ │
│ │ │ switch (opcode) { │ │ │
│ │ │ case OP_PUSH: ... │ │ │
│ │ │ case OP_ADD: ... │ │ │
│ │ │ case OP_JMP: ... │ │ │
│ │ │ } │ │ │
│ │ │ } │ │ │
│ │ │ │ │ │
│ │ └─────────────────────────────────────────────────────────────┘ │ │
│ │ │ │ │
│ │ ┌────────────────────┼────────────────────┐ │ │
│ │ ▼ ▼ ▼ │ │
│ │ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │ │
│ │ │ DATA STACK │ │ RETURN STACK │ │ PROGRAM MEM │ │ │
│ │ │ │ │ │ │ │ │ │
│ │ │ int32_t[256] │ │ uint16_t[64] │ │ uint8_t[64KB] │ │ │
│ │ │ │ │ │ │ │ │ │
│ │ │ SP: 3 │ │ RSP: 1 │ │ Size: 42 │ │ │
│ │ │ │ │ │ │ │ │ │
│ │ │ [10, 20, 30] │ │ [0x0015] │ │ [bytecode...] │ │ │
│ │ │ ↑ │ │ ↑ │ │ │ │ │
│ │ │ TOS │ │ Return to │ │ │ │ │
│ │ └───────────────┘ └───────────────┘ └───────────────┘ │ │
│ │ │ │
│ └────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │ I/O INTERFACE │ │
│ │ │ │
│ │ PRINT: printf("%d\n", pop()) │ │
│ │ PRINTC: printf("%c", pop()) │ │
│ │ READ: push(read_int()) │ │
│ │ │ │
│ └────────────────────────────────────────────────────────────────────┘ │
│ │
└──────────────────────────────────────────────────────────────────────────┘
4.2 Key Components
- VM State Structure: Contains PC, SP, RSP, stacks, code, and flags
- Loader: Reads binary bytecode from file into code memory
- Fetch Unit: Reads opcode byte (and operands) from code[PC]
- Decoder/Executor: Switch statement mapping opcodes to operations
- Stack Manager: Push/pop with bounds checking
- Error Handler: Reports underflow, overflow, division by zero
- Debug Output: Optional trace of each instruction
4.3 Data Structures
/* Opcode definitions */
typedef enum {
/* Stack operations */
OP_NOP = 0x00,
OP_PUSH = 0x01,
OP_POP = 0x02,
OP_DUP = 0x03,
OP_SWAP = 0x04,
OP_OVER = 0x05,
OP_ROT = 0x06,
/* Arithmetic */
OP_ADD = 0x10,
OP_SUB = 0x11,
OP_MUL = 0x12,
OP_DIV = 0x13,
OP_MOD = 0x14,
OP_NEG = 0x15,
/* Comparison */
OP_EQ = 0x20,
OP_NE = 0x21,
OP_LT = 0x22,
OP_GT = 0x23,
OP_LE = 0x24,
OP_GE = 0x25,
/* Logic */
OP_AND = 0x28,
OP_OR = 0x29,
OP_XOR = 0x2A,
OP_NOT = 0x2B,
/* Control flow */
OP_JMP = 0x30,
OP_JZ = 0x31,
OP_JNZ = 0x32,
OP_CALL = 0x33,
OP_RET = 0x34,
/* I/O */
OP_PRINT = 0x40,
OP_PRINTC = 0x41,
OP_READ = 0x42,
/* Control */
OP_HALT = 0xFF
} Opcode;
/* Error codes */
typedef enum {
VM_OK = 0,
VM_ERR_STACK_UNDERFLOW,
VM_ERR_STACK_OVERFLOW,
VM_ERR_RSTACK_UNDERFLOW,
VM_ERR_RSTACK_OVERFLOW,
VM_ERR_DIV_BY_ZERO,
VM_ERR_INVALID_OPCODE,
VM_ERR_PC_OUT_OF_BOUNDS,
VM_ERR_HALTED
} VMError;
/* VM configuration constants */
#define STACK_SIZE 256
#define RSTACK_SIZE 64
#define CODE_SIZE 65536 /* 64KB code space */
/* Virtual Machine state */
typedef struct {
/* Data stack */
int32_t stack[STACK_SIZE];
int sp; /* Stack pointer (index of next free slot) */
/* Return stack (for CALL/RET) */
uint16_t rstack[RSTACK_SIZE];
int rsp; /* Return stack pointer */
/* Program memory */
uint8_t code[CODE_SIZE];
size_t code_size; /* Actual program size */
/* Registers */
uint16_t pc; /* Program counter */
/* Status */
bool halted;
VMError error;
/* Debug mode */
bool debug;
} VM;
4.4 Algorithm Overview
Main Execution Loop:
ALGORITHM: VM Execution
1. CREATE VM:
- Allocate VM structure
- Zero all stacks
- Set PC = 0, SP = 0, RSP = 0
- halted = false, error = VM_OK
2. LOAD PROGRAM:
- Open bytecode file
- Read into code[] array
- Set code_size
- Return error if file too large
3. EXECUTE:
WHILE (not halted and no error):
a. FETCH:
- opcode = code[pc]
- pc++
b. DECODE & EXECUTE:
SWITCH (opcode):
CASE OP_PUSH:
- operand = fetch_byte()
- stack_push(operand)
CASE OP_ADD:
- b = stack_pop()
- a = stack_pop()
- stack_push(a + b)
CASE OP_JMP:
- addr = fetch_word() // 2 bytes
- pc = addr
CASE OP_JZ:
- addr = fetch_word()
- value = stack_pop()
- IF (value == 0): pc = addr
CASE OP_CALL:
- addr = fetch_word()
- rstack_push(pc) // Save return address
- pc = addr
CASE OP_RET:
- pc = rstack_pop()
CASE OP_HALT:
- halted = true
DEFAULT:
- error = VM_ERR_INVALID_OPCODE
4. CLEANUP:
- Report final state or error
- Free resources
Stack Operations (with error checking):
ALGORITHM: stack_push(value)
IF sp >= STACK_SIZE:
error = VM_ERR_STACK_OVERFLOW
RETURN
stack[sp] = value
sp++
ALGORITHM: stack_pop() -> value
IF sp <= 0:
error = VM_ERR_STACK_UNDERFLOW
RETURN 0
sp--
RETURN stack[sp]
ALGORITHM: stack_peek(depth) -> value
IF sp - depth - 1 < 0:
error = VM_ERR_STACK_UNDERFLOW
RETURN 0
RETURN stack[sp - depth - 1]
5. Implementation Guide
5.1 Development Environment Setup
# Linux/macOS - Standard C development tools
# Ensure you have gcc and make installed
# Verify GCC
gcc --version
# Should show version info
# Create project structure
mkdir -p stackvm/{src,include,tests,examples}
cd stackvm
# Create Makefile
cat > Makefile << 'EOF'
CC = gcc
CFLAGS = -Wall -Wextra -std=c99 -O2 -g
LDFLAGS =
SRC = src/main.c src/vm.c src/loader.c src/debug.c
OBJ = $(SRC:.c=.o)
TARGET = stackvm
all: $(TARGET)
$(TARGET): $(OBJ)
$(CC) $(OBJ) -o $@ $(LDFLAGS)
%.o: %.c
$(CC) $(CFLAGS) -I include -c $< -o $@
clean:
rm -f $(OBJ) $(TARGET)
test: $(TARGET)
./$(TARGET) tests/simple_add.bin
./$(TARGET) tests/factorial.bin
.PHONY: all clean test
EOF
5.2 Project Structure
stackvm/
├── src/
│ ├── main.c # Entry point, CLI argument parsing
│ ├── vm.c # Core VM implementation
│ ├── loader.c # Bytecode file loading
│ └── debug.c # Debug output and disassembly
├── include/
│ ├── vm.h # VM structures and public API
│ ├── opcodes.h # Opcode definitions
│ └── debug.h # Debug function declarations
├── assembler/
│ ├── svm-asm.c # Simple assembler (text -> binary)
│ └── grammar.h # Assembly syntax definitions
├── tests/
│ ├── simple_add.svm # 5 + 3 = 8
│ ├── factorial.svm # Calculate 5!
│ └── fib.svm # Fibonacci sequence
├── examples/
│ ├── hello.svm # Print "Hello, World!"
│ └── calculator.svm # Interactive calculator
├── Makefile
└── README.md
5.3 The Core Question You’re Answering
“How does a CPU execute instructions, and how can we build one in software?”
This project answers the fundamental question of computation. By building a VM, you discover:
- Why the fetch-decode-execute cycle is universal
- How opcodes map to operations
- Why stacks are so useful for computation
- How control flow works at the machine level
- What happens “under the hood” of every program you write
5.4 Concepts You Must Understand First
Before writing code, ensure you can answer these questions:
Stacks:
- Q: If I push 1, then push 2, then pop, what do I get?
- A: 2 (LIFO - last in, first out)
Endianness:
- Q: A 16-bit address 0x0102 is stored little-endian. What’s in the first byte?
- A: 0x02 (least significant byte first)
Two’s Complement:
- Q: How is -1 represented in 8-bit two’s complement?
- A: 0xFF (all bits set)
Control Flow:
- Q: If JZ at address 0x10 jumps to 0x20, and TOS is 5, what’s PC after?
- A: 0x13 (0x10 + 3 bytes for JZ instruction, no jump because 5 != 0)
Function Calls:
- Q: Why do we need a separate return stack?
- A: To remember where to return to after a subroutine finishes
5.5 Questions to Guide Your Design
Architecture:
- What data type should stack elements be? (int32_t gives signed arithmetic)
- How many stack slots are enough? (256 is typical)
- Should the return stack be separate? (Yes, for cleaner design and security)
Instruction Set:
- Which addressing mode for jumps: absolute or relative? (Start with absolute)
- How many bytes for addresses? (2 bytes = 64KB address space)
- Should PUSH take a signed or unsigned operand? (Consider both)
Error Handling:
- What happens on stack underflow? (Stop with error, or return 0?)
- What about division by zero? (Stop with error)
- How to report the error location? (Include PC in error message)
Debugging:
- How to trace execution? (Print PC, opcode, and stack after each step)
- Should debug mode be a command-line flag? (Yes: –debug)
5.6 Thinking Exercise
Before writing any code, trace this program by hand:
Address Bytes Instruction
------ ----- -----------
0x00 01 05 PUSH 5
0x02 01 03 PUSH 3
0x04 10 ADD
0x05 03 DUP
0x06 01 02 PUSH 2
0x08 12 MUL
0x09 11 SUB
0x0A 40 PRINT
0x0B FF HALT
Fill in this execution trace:
| PC | Opcode | Stack Before | Operation | Stack After |
|---|---|---|---|---|
| 0x00 | PUSH 5 | [] | Push 5 | [5] |
| 0x02 | PUSH 3 | [5] | Push 3 | [5, 3] |
| 0x04 | ADD | [5, 3] | 5 + 3 | [8] |
| 0x05 | DUP | [8] | Dup TOS | [8, 8] |
| 0x06 | PUSH 2 | [8, 8] | Push 2 | [8, 8, 2] |
| 0x08 | MUL | [8, 8, 2] | 8 * 2 | [8, 16] |
| 0x09 | SUB | [8, 16] | 8 - 16 | [-8] |
| 0x0A | [-8] | Print -8 | [] | |
| 0x0B | HALT | [] | Stop | [] |
What does this program compute? Answer: (5 + 3) - ((5 + 3) * 2) = 8 - 16 = -8
5.7 Hints in Layers
Use these progressive hints only when stuck. Try to solve problems yourself first.
Hint 1: Starting Point - VM Structure
Start with the VM structure and basic lifecycle:
/* include/vm.h */
#ifndef VM_H
#define VM_H
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>
#define STACK_SIZE 256
#define RSTACK_SIZE 64
#define CODE_SIZE 65536
typedef struct {
int32_t stack[STACK_SIZE];
int sp;
uint16_t rstack[RSTACK_SIZE];
int rsp;
uint8_t code[CODE_SIZE];
size_t code_size;
uint16_t pc;
bool halted;
int error;
bool debug;
} VM;
VM *vm_create(void);
void vm_destroy(VM *vm);
bool vm_load(VM *vm, const char *filename);
void vm_run(VM *vm);
int vm_step(VM *vm);
#endif
Hint 2: Stack Operations with Error Checking
/* src/vm.c - Stack operations */
static int push(VM *vm, int32_t value) {
if (vm->sp >= STACK_SIZE) {
vm->error = VM_ERR_STACK_OVERFLOW;
return -1;
}
vm->stack[vm->sp++] = value;
return 0;
}
static int32_t pop(VM *vm) {
if (vm->sp <= 0) {
vm->error = VM_ERR_STACK_UNDERFLOW;
return 0;
}
return vm->stack[--vm->sp];
}
static int32_t peek(VM *vm, int depth) {
int index = vm->sp - depth - 1;
if (index < 0) {
vm->error = VM_ERR_STACK_UNDERFLOW;
return 0;
}
return vm->stack[index];
}
Hint 3: Fetching Operands
/* Fetch a single byte operand and advance PC */
static uint8_t fetch_byte(VM *vm) {
if (vm->pc >= vm->code_size) {
vm->error = VM_ERR_PC_OUT_OF_BOUNDS;
return 0;
}
return vm->code[vm->pc++];
}
/* Fetch a 16-bit address (little-endian) and advance PC */
static uint16_t fetch_word(VM *vm) {
uint8_t lo = fetch_byte(vm);
uint8_t hi = fetch_byte(vm);
return (hi << 8) | lo;
}
Hint 4: The Main Execution Loop
int vm_step(VM *vm) {
if (vm->halted || vm->error) return -1;
/* FETCH */
uint8_t opcode = fetch_byte(vm);
/* For debug output */
uint16_t instr_pc = vm->pc - 1;
/* DECODE & EXECUTE */
switch (opcode) {
case OP_NOP:
break;
case OP_PUSH: {
int32_t value = (int8_t)fetch_byte(vm); /* Sign-extend */
push(vm, value);
break;
}
case OP_ADD: {
int32_t b = pop(vm);
int32_t a = pop(vm);
push(vm, a + b);
break;
}
case OP_JMP: {
uint16_t addr = fetch_word(vm);
vm->pc = addr;
break;
}
case OP_JZ: {
uint16_t addr = fetch_word(vm);
int32_t value = pop(vm);
if (value == 0) {
vm->pc = addr;
}
break;
}
case OP_HALT:
vm->halted = true;
break;
default:
vm->error = VM_ERR_INVALID_OPCODE;
break;
}
return (vm->halted || vm->error) ? -1 : 0;
}
void vm_run(VM *vm) {
while (vm_step(vm) == 0) {
/* Continue until halt or error */
}
}
Hint 5: CALL and RET Implementation
case OP_CALL: {
uint16_t addr = fetch_word(vm);
/* Push return address onto return stack */
if (vm->rsp >= RSTACK_SIZE) {
vm->error = VM_ERR_RSTACK_OVERFLOW;
break;
}
vm->rstack[vm->rsp++] = vm->pc;
/* Jump to subroutine */
vm->pc = addr;
break;
}
case OP_RET: {
if (vm->rsp <= 0) {
vm->error = VM_ERR_RSTACK_UNDERFLOW;
break;
}
/* Pop return address and jump back */
vm->pc = vm->rstack[--vm->rsp];
break;
}
Hint 6: Debug Output
/* src/debug.c */
#include <stdio.h>
#include "vm.h"
void debug_instruction(VM *vm, uint16_t pc, uint8_t opcode) {
printf("[%04X] ", pc);
switch (opcode) {
case OP_PUSH:
printf("PUSH %d", (int8_t)vm->code[pc + 1]);
break;
case OP_ADD:
printf("ADD");
break;
case OP_JMP:
printf("JMP 0x%04X",
vm->code[pc+1] | (vm->code[pc+2] << 8));
break;
/* ... other opcodes ... */
default:
printf("??? (0x%02X)", opcode);
}
/* Print stack */
printf(" | Stack: [");
for (int i = 0; i < vm->sp; i++) {
if (i > 0) printf(", ");
printf("%d", vm->stack[i]);
}
printf("]\n");
}
5.8 The Interview Questions They’ll Ask
After completing this project, you’ll be ready for these questions:
Basic Understanding
- “Explain the fetch-decode-execute cycle”
- Good Answer: The CPU continuously (1) fetches the next instruction from memory at the program counter address, (2) decodes the opcode to determine what operation to perform, (3) executes the operation, updating registers or memory, then (4) updates the PC and repeats.
- Red Flag: “It’s how programs run” (too vague)
- “What’s the difference between a stack-based and register-based VM?”
- Good Answer: Stack-based VMs use an implicit operand stack (JVM, CPython), simpler encoding but more memory traffic. Register-based VMs use explicit registers (Lua, Dalvik), more complex encoding but faster execution.
- Follow-up: “Why might you choose one over the other?”
- “How do you handle a division by zero?”
- Good Answer: Set an error flag, store the PC where it occurred, and halt execution. The error can be reported to the user with context.
- Red Flag: “Just let it crash” (poor error handling)
Technical Details
- “How does a CALL instruction work?”
- Good Answer: CALL pushes the current PC (return address) onto the return stack, then sets PC to the subroutine address. RET pops the return address and sets PC to it.
- Key Insight: This is exactly how real CPUs handle function calls.
- “Why use a separate return stack?”
- Good Answer: Security (harder to corrupt return addresses), clarity (separates data from control flow), and simplicity (data stack operations don’t affect control flow).
- Real-World Example: Forth traditionally uses two stacks for this reason.
- “How would you implement local variables in a stack VM?”
- Good Answer: Reserve stack slots at function entry (ALLOC n), access them by offset from a base pointer or frame pointer. This is how JVM’s local variable array works.
Problem-Solving
- “Your factorial program gives wrong results. How do you debug?”
- Good Answer:
- Enable debug mode to trace each instruction
- Check the stack at each step
- Verify the loop counter is decrementing
- Check conditional jumps are using correct comparison
- Good Answer:
- “How would you optimize this VM?”
- Good Answer:
- Threaded code: Use computed goto instead of switch
- Super-instructions: Combine common sequences (PUSH_ADD)
- JIT compilation: Translate hot paths to native code
- Register caching: Keep TOS in a CPU register
- Good Answer:
- “How would you add floating-point support?”
- Good Answer: Add float type to stack (union or separate stack), add float opcodes (FADD, FSUB, FMUL, FDIV), handle type conversion (I2F, F2I). Consider alignment requirements.
5.9 Books That Will Help
| Topic | Book | Specific Chapter/Section | Why It Helps |
|---|---|---|---|
| Fetch-Decode-Execute | “Computer Organization and Design” by Patterson & Hennessy | Chapter 4.1 (The Processor) | Hardware perspective on CPU cycles |
| Stack-Based Computation | “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron | Chapter 3.7 (Procedures) | How stacks are used for function calls |
| Bytecode Interpretation | “Crafting Interpreters” by Robert Nystrom | Chapter 14 (Chunks of Bytecode) | Practical bytecode VM design |
| VM Internals | “Crafting Interpreters” | Chapter 15 (A Virtual Machine) | Complete VM implementation |
| Forth & Stack Machines | “Starting Forth” by Leo Brodie | Chapters 1-4 | The classic stack-based language |
| Opcode Design | “Language Implementation Patterns” by Terence Parr | Chapter 10 (Building Bytecode VMs) | Practical opcode encoding |
| JVM Internals | “Inside the Java Virtual Machine” by Bill Venners | Chapter 5 (The JVM Instruction Set) | Real-world stack VM |
| Low-Level C | “The C Programming Language” by K&R | Chapter 5-6 | Pointers, structures needed for VM |
5.10 Implementation Phases
Phase 1: Foundation (Day 1-2)
- Set up project structure
- Implement VM structure and lifecycle (create/destroy)
- Implement bytecode loader
- Add NOP and HALT instructions
- Milestone: Load and execute empty program that halts
Phase 2: Stack Operations (Day 2-3)
- Implement PUSH, POP, DUP, SWAP, OVER, ROT
- Add stack bounds checking
- Add debug output for stack state
- Milestone: Stack manipulation programs work
Phase 3: Arithmetic (Day 3-4)
- Implement ADD, SUB, MUL, DIV, MOD, NEG
- Add division by zero check
- Test with arithmetic expressions
- Milestone: “5 + 3 * 2” evaluates correctly
Phase 4: Comparison & Logic (Day 4)
- Implement EQ, NE, LT, GT, LE, GE
- Implement AND, OR, XOR, NOT
- Milestone: Comparison operations work
Phase 5: Control Flow (Day 5-6)
- Implement JMP, JZ, JNZ
- Implement CALL, RET with return stack
- Test with loops and subroutines
- Milestone: Factorial program runs correctly
Phase 6: I/O and Polish (Day 6-7)
- Implement PRINT, PRINTC, READ
- Add command-line arguments (–debug, etc.)
- Write test programs
- Document everything
- Milestone: Multiple test programs pass
5.11 Key Implementation Decisions
-
Stack Element Type: Use int32_t for signed arithmetic and reasonable range. Could extend to 64-bit later.
-
Operand Encoding: PUSH uses 1-byte signed operand (-128 to 127). For larger constants, add PUSH_WORD with 2-byte operand.
-
Address Format: 2-byte little-endian addresses (64KB address space is plenty for learning).
-
Error Handling: Set error code AND return from execution loop. Don’t try to continue after errors.
-
Separate Return Stack: Yes - cleaner, safer, matches Forth tradition.
-
Debug Mode: Command-line flag –debug enables per-instruction trace output.
6. Testing Strategy
6.1 Unit Testing
Test each instruction in isolation:
/* tests/test_arithmetic.c */
void test_add(void) {
VM *vm = vm_create();
/* Manual bytecode: PUSH 5, PUSH 3, ADD, HALT */
vm->code[0] = OP_PUSH;
vm->code[1] = 5;
vm->code[2] = OP_PUSH;
vm->code[3] = 3;
vm->code[4] = OP_ADD;
vm->code[5] = OP_HALT;
vm->code_size = 6;
vm_run(vm);
assert(vm->sp == 1);
assert(vm->stack[0] == 8); /* 5 + 3 = 8 */
assert(vm->error == VM_OK);
vm_destroy(vm);
printf("test_add: PASS\n");
}
void test_underflow(void) {
VM *vm = vm_create();
/* POP on empty stack */
vm->code[0] = OP_POP;
vm->code[1] = OP_HALT;
vm->code_size = 2;
vm_run(vm);
assert(vm->error == VM_ERR_STACK_UNDERFLOW);
vm_destroy(vm);
printf("test_underflow: PASS\n");
}
6.2 Integration Testing
Create test programs in assembly:
; tests/factorial.svm
; Calculate 5! = 120
PUSH 5 ; n
PUSH 1 ; accumulator
loop:
OVER ; [n, acc, n]
JZ end ; if n == 0, done
SWAP ; [n, n, acc]
OVER ; [n, n, acc, n]
MUL ; [n, n, acc*n]
SWAP ; [n, acc*n, n]
PUSH 1 ; [n, acc*n, n, 1]
SUB ; [n, acc*n, n-1]
SWAP ; [n, n-1, acc*n]
ROT ; [n-1, acc*n, n]
POP ; [n-1, acc*n]
JMP loop
end:
POP ; remove 0
PRINT
HALT
Expected output: 120
6.3 Debugging Techniques
Debug Mode Output:
$ ./stackvm --debug tests/add.bin
[0000] PUSH 5 | Stack: [5]
[0002] PUSH 3 | Stack: [5, 3]
[0004] ADD | Stack: [8]
[0005] PRINT | Stack: []
Output: 8
[0006] HALT | Program terminated
Stack Inspection:
void dump_stack(VM *vm) {
printf("Stack (SP=%d): ", vm->sp);
for (int i = 0; i < vm->sp; i++) {
printf("[%d]=%d ", i, vm->stack[i]);
}
printf("\n");
}
Memory Dump:
void dump_code(VM *vm, uint16_t start, int len) {
printf("Code at 0x%04X:\n", start);
for (int i = 0; i < len; i++) {
if (i % 16 == 0) printf("%04X: ", start + i);
printf("%02X ", vm->code[start + i]);
if (i % 16 == 15) printf("\n");
}
printf("\n");
}
7. Common Pitfalls & Debugging
Problem 1: Stack underflow on binary operations
- Symptom: Error when executing ADD, SUB, etc.
- Root Cause: Not enough values on stack
- Fix: Always ensure preconditions before popping
- Quick Test: Check SP >= 2 before binary ops
Problem 2: Infinite loop
- Symptom: Program never terminates
- Root Cause: JMP to wrong address, or condition never met
- Fix: Add instruction count limit, enable debug trace
- Quick Test: Add
max_instructionscounter
Problem 3: Jump to wrong address
- Symptom: “Invalid opcode” or unexpected behavior after JMP
- Root Cause: Endianness wrong when reading address
- Fix: Ensure little-endian read:
(hi << 8) | lo - Quick Test: JMP to known address, verify PC
Problem 4: Off-by-one in stack operations
- Symptom: Wrong value popped, or SP incorrect
- Root Cause: Pre-increment vs post-increment confusion
- Fix: push:
stack[sp++] = val, pop:return stack[--sp] - Quick Test: Push 1, push 2, pop should return 2
Problem 5: CALL/RET mismatch
- Symptom: Return goes to wrong address or crashes
- Root Cause: Not pushing PC at right time, or wrong stack
- Fix: CALL pushes PC after fetching address, before jumping
- Quick Test: Simple CALL/RET with PRINT before and after
Problem 6: Negative numbers display incorrectly
- Symptom: PUSH -1 shows as 255
- Root Cause: Not sign-extending byte operand
- Fix: Cast to int8_t before pushing:
push((int8_t)byte) - Quick Test: PUSH -1, PRINT should show -1
8. Extensions & Challenges
Extension 1: Simple Assembler
- Write a text-to-binary assembler
- Support labels for jumps
- Handle comments and whitespace
- Challenge: Add macro support
Extension 2: Larger Operands
- Add PUSH_WORD for 16-bit values
- Add PUSH_DWORD for 32-bit values
- Consider how this affects code size
Extension 3: Local Variables
- Add LOAD_LOCAL and STORE_LOCAL
- Implement frame pointer
- Support function-local storage
Extension 4: String Support
- Add string literals in data section
- Implement PRINTS instruction
- Handle null-terminated strings
Extension 5: Interactive Debugger
- Single-step execution
- Breakpoints at addresses
- Watch expressions on stack
- Memory inspection
Extension 6: Performance Optimization
- Threaded code (computed goto)
- Super-instructions (common sequences)
- Register caching (TOS in variable)
- Benchmark improvements
Extension 7: Standard Library
- Built-in subroutines
- Math functions (ABS, MAX, MIN)
- I/O functions (input line, etc.)
9. Real-World Connections
YOUR VM ←→ REAL-WORLD SYSTEMS:
┌────────────────────────────────────────────────────────────────────────┐
│ │
│ YOUR INSTRUCTION JAVA BYTECODE CPYTHON BYTECODE │
│ ────────────────── ────────────── ──────────────── │
│ PUSH 5 iconst_5 LOAD_CONST 5 │
│ PUSH 3 iconst_3 LOAD_CONST 3 │
│ ADD iadd BINARY_ADD │
│ PRINT invokevirtual PRINT_ITEM │
│ println │
│ │
│ JMP addr goto addr JUMP_ABSOLUTE │
│ JZ addr ifeq addr POP_JUMP_IF_FALSE │
│ CALL addr invokestatic CALL_FUNCTION │
│ RET return RETURN_VALUE │
│ │
│ ──────────────────────────────────────────────────────────────────────│
│ │
│ What you learned applies directly to: │
│ │
│ • Reading JVM bytecode (javap -c) │
│ • Understanding Python internals (dis module) │
│ • Analyzing WebAssembly │
│ • Writing compilers that target VMs │
│ • Implementing interpreters for DSLs │
│ │
└────────────────────────────────────────────────────────────────────────┘
JVM Connection:
# After your project, you can understand this:
$ echo 'class Add { public static void main(String[] a) {
System.out.println(5 + 3);
}}' > Add.java
$ javac Add.java
$ javap -c Add
Compiled from "Add.java"
class Add {
public static void main(java.lang.String[]);
Code:
0: getstatic #7 // PrintStream
3: bipush 8 // PUSH 8 (compiler folded 5+3!)
5: invokevirtual #13 // PRINT
8: return
}
Python Connection:
# After your project, you understand this:
import dis
dis.dis(lambda: 5 + 3)
# 1 0 LOAD_CONST 1 (5)
# 2 LOAD_CONST 2 (3)
# 4 BINARY_ADD
# 6 RETURN_VALUE
10. Resources
Primary References
- Crafting Interpreters - Bytecode Chapter - Excellent VM tutorial
- Forth Programming - Classic stack-based language
- JVM Specification - Real-world stack VM spec
Video Resources
- Ben Eater’s 8-bit CPU - Hardware fetch-decode-execute
- Tsoding’s Porth Language - Building a stack-based language
Related Projects
- CHIP-8 Emulator - Next step: emulate a real system
- Simple RISC CPU Emulator - Register-based alternative
11. Self-Assessment Checklist
Before moving to Project 4, verify:
Understanding:
- Can you explain fetch-decode-execute without notes?
- Can you trace a program’s execution by hand?
- Can you explain stack underflow and how to prevent it?
- Can you explain how CALL/RET implement subroutines?
- Can you compare stack-based vs register-based VMs?
Implementation:
- All stack operations work (PUSH, POP, DUP, SWAP, OVER, ROT)?
- All arithmetic works including edge cases (div by zero)?
- Control flow works (JMP, JZ, JNZ)?
- Subroutines work (CALL, RET)?
- Error handling catches all error conditions?
Testing:
- Simple arithmetic test passes (5 + 3 = 8)?
- Factorial program produces 120?
- Negative numbers work correctly?
- Stack underflow is caught and reported?
Growth:
- Can you modify the VM to add new instructions?
- Can you write programs without looking at examples?
- Do you understand how this relates to JVM/CPython?
12. Submission / Completion Criteria
Your implementation is complete when:
Minimum Viable Completion:
- VM loads and executes bytecode files
- Basic stack operations work (PUSH, POP, ADD, SUB, MUL)
- Program halts correctly
- At least one test program runs
Full Completion:
- All 30+ opcodes implemented
- Stack bounds checking works
- Control flow works (jumps, conditionals)
- CALL/RET with return stack works
- Debug mode shows execution trace
- Multiple test programs pass
Excellence (Going Above & Beyond):
- Simple assembler (text to bytecode)
- Interactive debugger (breakpoints, step)
- Local variable support
- Performance optimizations
- Extended test suite
Congratulations! By completing this project, you’ve built your first CPU in software. You now understand the fetch-decode-execute cycle that powers every computer, from embedded microcontrollers to supercomputers. The stack machine architecture you built is the same approach used by the Java Virtual Machine, CPython, and WebAssembly.
You’ve taken the fundamental step from “writing programs” to “understanding how programs run.”
This guide was expanded from CPU_ISA_ARCHITECTURE_PROJECTS.md. For the complete learning path, see the README.