Project 3: Stack Machine Virtual Machine

Build a simple stack-based virtual machine that implements the fetch-decode-execute cycle, teaching you the fundamental loop that powers every CPU from microcontrollers to supercomputers.

Quick Reference

Attribute	Value
Difficulty	Intermediate
Time Estimate	1 week (20-40 hours)
Language	C
Alternative Languages	Rust, Go, Zig
Prerequisites	Basic C programming, understanding of stacks as data structures
Key Topics	Fetch-Decode-Execute cycle, Opcode decoding, Stack pointer management, Control flow, Exception handling
Main Book	“Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron

1. Learning Objectives

After completing this project, you will be able to:

Implement the fetch-decode-execute cycle that drives all CPUs
Design and decode a custom instruction set with opcodes
Manage a stack pointer register and understand push/pop semantics
Implement control flow instructions (jumps, conditional branches)
Handle exception conditions (stack overflow, stack underflow, division by zero)
Trace program execution through memory at the instruction level
Explain how high-level constructs (loops, function calls) map to low-level instructions
Compare stack-based and register-based virtual machine architectures

2. Theoretical Foundation

2.1 Core Concepts

The Fetch-Decode-Execute Cycle

Every computer, from the simplest microcontroller to the most powerful supercomputer, operates on the same fundamental principle. This is the heartbeat of computation:

THE FETCH-DECODE-EXECUTE CYCLE:
================================

                        ┌─────────────────────────────────────────┐
                        │           THE CPU LOOP                   │
                        │                                          │
                        │   while (running) {                      │
                        │       instruction = fetch(PC);           │
                        │       operation = decode(instruction);   │
                        │       execute(operation);                │
                        │       PC = next_instruction();           │
                        │   }                                      │
                        │                                          │
                        └─────────────────────────────────────────┘

DETAILED BREAKDOWN:
==================

    ┌──────────────────────────────────────────────────────────────────┐
    │                                                                   │
    │    ┌────────────────┐                                             │
    │    │                │                                             │
    │ ──▶│    FETCH       │  Read instruction bytes from memory[PC]     │
    │    │                │  PC points to current instruction           │
    │    │                │  For our VM: read 1 byte (opcode)           │
    │    └───────┬────────┘  + optional operand bytes                   │
    │            │                                                      │
    │            ▼                                                      │
    │    ┌────────────────┐                                             │
    │    │                │                                             │
    │    │    DECODE      │  Parse opcode to determine operation        │
    │    │                │  Extract any embedded operands              │
    │    │                │  Map to handler function                    │
    │    └───────┬────────┘                                             │
    │            │                                                      │
    │            ▼                                                      │
    │    ┌────────────────┐                                             │
    │    │                │                                             │
    │    │    EXECUTE     │  Perform the operation:                     │
    │    │                │  - Arithmetic (ADD, SUB, MUL, DIV)          │
    │    │                │  - Stack ops (PUSH, POP, DUP, SWAP)         │
    │    │                │  - Control flow (JMP, JZ, CALL, RET)        │
    │    │                │  Update registers, memory, or flags         │
    │    └───────┬────────┘                                             │
    │            │                                                      │
    │            │  ◄─────── Update PC (may be modified by jumps)       │
    │            │                                                      │
    │    ────────┴──────────── Loop back to FETCH ───────────────────── │
    │                                                                   │
    └──────────────────────────────────────────────────────────────────┘

TIMING IN REAL CPUs:
====================

    Clock Cycle:   │  1  │  2  │  3  │  4  │  5  │  6  │  7  │  8  │
                   ├─────┼─────┼─────┼─────┼─────┼─────┼─────┼─────┤
    Simple CPU:    │ F1  │ D1  │ E1  │ F2  │ D2  │ E2  │ F3  │ D3  │
                   │     │     │     │     │     │     │     │     │
    Pipelined CPU: │ F1  │ F2  │ F3  │ F4  │ F5  │ F6  │ F7  │ F8  │
    (3-stage)      │     │ D1  │ D2  │ D3  │ D4  │ D5  │ D6  │ D7  │
                   │     │     │ E1  │ E2  │ E3  │ E4  │ E5  │ E6  │
                   │     │     │     │     │     │     │     │     │

    Our VM will implement the simple (non-pipelined) version first.

Stack-Based vs Register-Based Machines

Virtual machines come in two main architectural styles. Understanding both helps you appreciate the tradeoffs:

STACK-BASED ARCHITECTURE (What you'll build):
=============================================

    Memory                  Stack (grows ↓)
    ┌─────────────┐         ┌─────────────┐
    │  0x00: PUSH │         │             │
    │  0x01: 5    │         │             │
    │  0x02: PUSH │         │     5       │  ← Top after PUSH 5
    │  0x03: 3    │         │     3       │  ← Top after PUSH 3
    │  0x04: ADD  │         │     8       │  ← Top after ADD (5+3)
    │  0x05: PRINT│         │             │
    │  0x06: HALT │         │             │
    └─────────────┘         └─────────────┘
                                   ↑
                                   SP (Stack Pointer)

    Execution of "5 + 3":

    PUSH 5  →  Stack: [5]           (operand 1)
    PUSH 3  →  Stack: [5, 3]        (operand 2)
    ADD     →  Stack: [8]           (5 + 3 = 8)

    Advantages:
    - Simpler instruction encoding (operations implicitly use stack)
    - Easier to compile to (no register allocation needed)
    - More compact code

    Disadvantages:
    - More memory traffic (everything goes through stack)
    - Harder to optimize at runtime


REGISTER-BASED ARCHITECTURE (Like x86, ARM):
============================================

    Registers               Memory
    ┌──────────────┐        ┌─────────────────────┐
    │ R0:  0x0005  │        │  0x00: LOAD R0, 5   │
    │ R1:  0x0003  │        │  0x04: LOAD R1, 3   │
    │ R2:  0x0008  │        │  0x08: ADD R2,R0,R1 │
    │ R3:  0x0000  │        │  0x0C: PRINT R2     │
    │ ...          │        │  0x10: HALT         │
    └──────────────┘        └─────────────────────┘

    Execution of "5 + 3":

    LOAD R0, 5       →  R0 = 5
    LOAD R1, 3       →  R1 = 3
    ADD R2, R0, R1   →  R2 = R0 + R1 = 8

    Advantages:
    - Fewer memory accesses (operands in registers)
    - Easier for hardware optimization
    - Better for pipelining

    Disadvantages:
    - More complex instruction encoding
    - Need register allocation (harder to compile)
    - Larger code size


REAL-WORLD EXAMPLES:
====================

    Stack-Based VMs:
    ┌────────────────────────────────────────────────────────┐
    │  • Java Virtual Machine (JVM)                          │
    │    - Bytecode uses operand stack                       │
    │    - Local variables stored in frame                   │
    │                                                        │
    │  • CPython bytecode interpreter                        │
    │    - LOAD_FAST pushes local variable                   │
    │    - BINARY_ADD pops two, pushes result                │
    │                                                        │
    │  • WebAssembly (Wasm)                                  │
    │    - Stack machine with structured control flow        │
    │    - Designed for browser execution                    │
    │                                                        │
    │  • Forth                                               │
    │    - Classic stack-based language                      │
    │    - Minimal, powerful, still used in embedded         │
    │                                                        │
    │  • PostScript                                          │
    │    - Stack-based language for graphics                 │
    │    - PDF is based on this                              │
    └────────────────────────────────────────────────────────┘

    Register-Based VMs:
    ┌────────────────────────────────────────────────────────┐
    │  • Lua VM                                              │
    │    - Register-based for speed                          │
    │    - LuaJIT is one of fastest scripting VMs            │
    │                                                        │
    │  • Android Dalvik (deprecated)                         │
    │    - Register-based alternative to JVM                 │
    │    - Better for mobile (fewer memory ops)              │
    │                                                        │
    │  • Android ART                                         │
    │    - Replaced Dalvik                                   │
    │    - Compiles to native at install time                │
    └────────────────────────────────────────────────────────┘

Stack Machine Architecture

Here’s the complete architecture of the VM you’ll build:

STACK MACHINE VIRTUAL MACHINE ARCHITECTURE:
===========================================

┌──────────────────────────────────────────────────────────────────────────┐
│                            VIRTUAL MACHINE                                │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                           │
│  ┌────────────────────────────────────────────────────────────────────┐  │
│  │                        PROGRAM MEMORY                               │  │
│  │                    (Code Segment - Read Only)                       │  │
│  ├────────────────────────────────────────────────────────────────────┤  │
│  │                                                                     │  │
│  │   Address    Byte       Meaning                                     │  │
│  │   ───────    ────       ───────                                     │  │
│  │   0x0000     0x01       PUSH opcode                                 │  │
│  │   0x0001     0x05       operand: 5                                  │  │
│  │   0x0002     0x01       PUSH opcode                                 │  │
│  │   0x0003     0x03       operand: 3                                  │  │
│  │   0x0004     0x10       ADD opcode                                  │  │
│  │   0x0005     0x30       PRINT opcode                                │  │
│  │   0x0006     0xFF       HALT opcode                                 │  │
│  │   ...                                                               │  │
│  │   0xFFFF     (max address in 64KB space)                            │  │
│  │                                                                     │  │
│  └────────────────────────────────────────────────────────────────────┘  │
│                                                                           │
│  ┌─────────────────────────┐    ┌─────────────────────────────────────┐  │
│  │     CPU REGISTERS       │    │           DATA STACK                 │  │
│  ├─────────────────────────┤    ├─────────────────────────────────────┤  │
│  │                         │    │                                      │  │
│  │  PC  (Program Counter)  │    │   ┌─────┐  ← Top of Stack (TOS)      │  │
│  │  ┌────────────────────┐ │    │   │  8  │    (result of 5+3)         │  │
│  │  │       0x0006       │ │    │   ├─────┤                            │  │
│  │  └────────────────────┘ │    │   │     │                            │  │
│  │  Points to next instr   │    │   ├─────┤                            │  │
│  │                         │    │   │     │                            │  │
│  │  SP  (Stack Pointer)    │    │   ├─────┤                            │  │
│  │  ┌────────────────────┐ │    │   │     │                            │  │
│  │  │         1          │ │    │   ├─────┤                            │  │
│  │  └────────────────────┘ │    │   │     │  ← Stack Bottom            │  │
│  │  Index of TOS element   │    │   └─────┘    (index 0)               │  │
│  │                         │    │                                      │  │
│  │  FLAGS (Status)         │    │   SP = 1 means 1 element on stack    │  │
│  │  ┌────────────────────┐ │    │   SP = 0 means stack is empty        │  │
│  │  │ Z │ N │ O │ H │    │ │    │                                      │  │
│  │  └────────────────────┘ │    │   Stack grows UPWARD in our design   │  │
│  │  Z=Zero, N=Neg, O=Ovf   │    │   (index increases with push)        │  │
│  │  H=Halted               │    │                                      │  │
│  └─────────────────────────┘    └─────────────────────────────────────┘  │
│                                                                           │
│  ┌────────────────────────────────────────────────────────────────────┐  │
│  │                         CALL STACK                                  │  │
│  │                   (For subroutine returns)                          │  │
│  ├────────────────────────────────────────────────────────────────────┤  │
│  │                                                                     │  │
│  │   When CALL is executed:                                            │  │
│  │   1. Push return address (current PC + instruction size)            │  │
│  │   2. Jump to subroutine address                                     │  │
│  │                                                                     │  │
│  │   When RET is executed:                                             │  │
│  │   1. Pop return address from call stack                             │  │
│  │   2. Set PC to that address                                         │  │
│  │                                                                     │  │
│  │   ┌────────────────────────────────────────────────────────────┐   │  │
│  │   │  Return Addresses:  [0x0010] [0x0025] [ ... ]              │   │  │
│  │   └────────────────────────────────────────────────────────────┘   │  │
│  │                                           ↑                         │  │
│  │                                          RSP (Return Stack Pointer) │  │
│  │                                                                     │  │
│  └────────────────────────────────────────────────────────────────────┘  │
│                                                                           │
└──────────────────────────────────────────────────────────────────────────┘

Instruction Encoding

Your VM will use a simple variable-length instruction encoding:

INSTRUCTION ENCODING DESIGN:
============================

Format 1: No operand (1 byte)
┌─────────────────────┐
│      OPCODE         │
│     (8 bits)        │
└─────────────────────┘
Examples: ADD, SUB, MUL, POP, DUP, SWAP, HALT

Format 2: With immediate value (2+ bytes)
┌─────────────────────┬─────────────────────┐
│      OPCODE         │      OPERAND        │
│     (8 bits)        │    (8 or 16 bits)   │
└─────────────────────┴─────────────────────┘
Examples: PUSH value, JMP address, JZ address


OPCODE TABLE:
=============

Mnemonic    Opcode    Size    Stack Effect       Description
────────    ──────    ────    ────────────       ───────────
; Stack Operations
PUSH n      0x01      2       ( -- n)            Push immediate value
POP         0x02      1       (a -- )            Discard top of stack
DUP         0x03      1       (a -- a a)         Duplicate top
SWAP        0x04      1       (a b -- b a)       Swap top two
OVER        0x05      1       (a b -- a b a)     Copy second to top
ROT         0x06      1       (a b c -- b c a)   Rotate top three

; Arithmetic
ADD         0x10      1       (a b -- a+b)       Addition
SUB         0x11      1       (a b -- a-b)       Subtraction
MUL         0x12      1       (a b -- a*b)       Multiplication
DIV         0x13      1       (a b -- a/b)       Integer division
MOD         0x14      1       (a b -- a%b)       Modulo
NEG         0x15      1       (a -- -a)          Negate

; Comparison
EQ          0x20      1       (a b -- a==b)      Equal (1 or 0)
NE          0x21      1       (a b -- a!=b)      Not equal
LT          0x22      1       (a b -- a<b)       Less than
GT          0x23      1       (a b -- a>b)       Greater than
LE          0x24      1       (a b -- a<=b)      Less or equal
GE          0x25      1       (a b -- a>=b)      Greater or equal

; Logic
AND         0x28      1       (a b -- a&b)       Bitwise AND
OR          0x29      1       (a b -- a|b)       Bitwise OR
XOR         0x2A      1       (a b -- a^b)       Bitwise XOR
NOT         0x2B      1       (a -- ~a)          Bitwise NOT

; Control Flow
JMP addr    0x30      3       ( -- )             Unconditional jump
JZ  addr    0x31      3       (a -- )            Jump if zero
JNZ addr    0x32      3       (a -- )            Jump if not zero
CALL addr   0x33      3       ( -- )             Call subroutine
RET         0x34      1       ( -- )             Return from subroutine

; I/O
PRINT       0x40      1       (a -- )            Print top as decimal
PRINTC      0x41      1       (a -- )            Print top as character
READ        0x42      1       ( -- n)            Read integer from input

; Control
NOP         0x00      1       ( -- )             No operation
HALT        0xFF      1       ( -- )             Stop execution


STACK EFFECT NOTATION (Forth-style):
====================================

    ( before -- after )

    ( -- n)          : Nothing before, n on stack after
    (a -- )          : a on stack before, nothing after
    (a b -- a+b)     : a and b before, their sum after
    (a b -- b a)     : a and b before, swapped after

Example execution of "3 4 + 5 *":

    Instruction    Stack (top right)    Notes
    ───────────    ─────────────────    ─────
    PUSH 3         [3]
    PUSH 4         [3, 4]
    ADD            [7]                  3 + 4 = 7
    PUSH 5         [7, 5]
    MUL            [35]                 7 * 5 = 35

2.2 Why This Matters

Building a stack machine VM teaches fundamental concepts that apply everywhere:

REAL-WORLD CONNECTIONS:
=======================

1. JVM and Python Bytecode
   ┌────────────────────────────────────────────────────────────────┐
   │  Your PUSH 5; PUSH 3; ADD is exactly like:                     │
   │                                                                 │
   │  Java bytecode:           Python bytecode:                      │
   │    iconst_5                 LOAD_CONST 5                        │
   │    iconst_3                 LOAD_CONST 3                        │
   │    iadd                     BINARY_ADD                          │
   │                                                                 │
   │  Understanding your VM helps you understand these runtimes!    │
   └────────────────────────────────────────────────────────────────┘

2. WebAssembly (Wasm)
   ┌────────────────────────────────────────────────────────────────┐
   │  Wasm is a stack-based VM that runs in browsers:               │
   │                                                                 │
   │    (i32.const 5)           ; Push 5                            │
   │    (i32.const 3)           ; Push 3                            │
   │    (i32.add)               ; Pop both, push sum                │
   │                                                                 │
   │  Same concepts, used by every major browser!                   │
   └────────────────────────────────────────────────────────────────┘

3. CPU Architecture Understanding
   ┌────────────────────────────────────────────────────────────────┐
   │  Your fetch-decode-execute loop is the SAME loop that runs    │
   │  in every physical CPU. You're building in software what       │
   │  Intel and AMD build in silicon.                               │
   │                                                                 │
   │  The x86 microarchitecture has:                                │
   │    - Instruction fetch unit → Your fetch_instruction()         │
   │    - Decoder → Your opcode switch statement                    │
   │    - Execution units → Your operation handlers                 │
   └────────────────────────────────────────────────────────────────┘

4. Compiler Backends
   ┌────────────────────────────────────────────────────────────────┐
   │  Compilers generate code for VMs:                              │
   │                                                                 │
   │    Source: int x = 5 + 3;                                      │
   │      ↓                                                         │
   │    AST: (assign x (+ 5 3))                                     │
   │      ↓                                                         │
   │    Bytecode: PUSH 5; PUSH 3; ADD; STORE x                      │
   │                                                                 │
   │  Building a VM teaches you what compilers target!              │
   └────────────────────────────────────────────────────────────────┘

5. Hardware Stack Pointer (SP)
   ┌────────────────────────────────────────────────────────────────┐
   │  Your SP register maps directly to:                            │
   │                                                                 │
   │  x86:  RSP (64-bit stack pointer)                              │
   │  ARM:  SP (stack pointer register)                             │
   │  RISC-V: x2/sp (stack pointer by convention)                   │
   │                                                                 │
   │  Your push/pop operations are what PUSH/POP x86 instructions   │
   │  do in hardware!                                                │
   └────────────────────────────────────────────────────────────────┘

2.3 Historical Context

TIMELINE OF STACK MACHINES:
===========================

1960s ──┬── Burroughs B5000 (1961)
        │   - First commercial stack-based computer
        │   - Hardware stack for expression evaluation
        │   - Influenced ALGOL implementation
        │
1970s ──┼── Forth (1970)
        │   - Stack-based programming language
        │   - Charles Moore for radio telescope control
        │   - Still used in embedded systems
        │
        ├── HP calculators (1972)
        │   - RPN (Reverse Polish Notation)
        │   - Stack-based calculation
        │
1980s ──┼── PostScript (1982)
        │   - Stack-based page description language
        │   - Adobe, used in printing/PDF
        │
        ├── Java (1995) / JVM
        │   - Stack-based bytecode interpreter
        │   - "Write once, run anywhere"
        │
1990s ──┼── Python bytecode
        │   - Stack-based CPython VM
        │   - LOAD_FAST, BINARY_OP, etc.
        │
2000s ──┼── .NET CLR (2002)
        │   - Stack-based Common Language Runtime
        │   - C#, F#, VB.NET compile to IL
        │
2010s ──┼── WebAssembly (2017)
        │   - Stack-based VM for browsers
        │   - Near-native speed
        │   - Compile C/C++/Rust to web
        │
2020s ──┴── Still going strong!
            - New languages still use stack VMs
            - Performance JIT compiles away the stack

The stack machine concept has been a cornerstone of computing for 60+ years.

2.4 Common Misconceptions

MISCONCEPTION 1: "Stack machines are slow because of all the stack operations"
───────────────────────────────────────────────────────────────────────────────
Reality: JIT compilers (like in JVM, V8) transform stack operations into
register operations. The stack is an abstraction that enables portability
and simpler code generation. At runtime, hot code runs at near-native speed.

    Source bytecode:       JIT output (x86):
    PUSH 5                 mov eax, 5
    PUSH 3                 mov ebx, 3
    ADD                    add eax, ebx

    The stack is eliminated entirely!


MISCONCEPTION 2: "The data stack and call stack are the same thing"
───────────────────────────────────────────────────────────────────
Reality: Most VMs separate them:
  - Data stack: operands for arithmetic/logic (what you manipulate)
  - Call stack: return addresses for subroutines (control flow)

In hardware CPUs they often share one stack (with potential for overflow
attacks), but keeping them separate is safer and cleaner.


MISCONCEPTION 3: "Stack underflow can't happen if I'm careful"
──────────────────────────────────────────────────────────────
Reality: It absolutely can, and your VM must handle it gracefully:

    POP         ; Stack is empty - underflow!
    ADD         ; Stack has only 1 element - also underflow!

Good VMs check preconditions before every operation.


MISCONCEPTION 4: "PUSH is always followed by an operation"
──────────────────────────────────────────────────────────
Reality: Stack can accumulate values:

    PUSH 1
    PUSH 2
    PUSH 3
    PUSH 4      ; Stack: [1, 2, 3, 4]
    ADD         ; Stack: [1, 2, 7]
    ADD         ; Stack: [1, 9]
    ADD         ; Stack: [10]

This is valid and computes 1 + 2 + 3 + 4 = 10.


MISCONCEPTION 5: "My VM is complete when arithmetic works"
─────────────────────────────────────────────────────────
Reality: Control flow is equally important:
  - JMP for loops
  - JZ/JNZ for conditionals
  - CALL/RET for subroutines

Without these, you can't write real programs!

3. Project Specification

3.1 What You Will Build

A complete stack-based virtual machine that:

Loads and executes bytecode programs from files
Implements 30+ instructions (stack ops, arithmetic, logic, control flow)
Manages a data stack with proper overflow/underflow detection
Supports subroutine calls with a separate return stack
Provides debug output showing stack state and program execution
Includes a simple assembler to write programs in human-readable form

3.2 Functional Requirements

Load binary bytecode from a file
Implement data stack with configurable maximum depth
Implement all arithmetic operations: ADD, SUB, MUL, DIV, MOD, NEG
Implement comparison operations: EQ, NE, LT, GT, LE, GE
Implement stack manipulation: PUSH, POP, DUP, SWAP, OVER, ROT
Implement control flow: JMP, JZ, JNZ, CALL, RET
Implement I/O: PRINT, PRINTC, READ
Detect and report stack underflow, overflow, and division by zero
Run programs to completion (HALT) or error

3.3 Non-Functional Requirements

Clean separation between VM core and I/O
Portable C code (C99 or later)
Verbose debug mode showing each instruction execution
Error messages that help users fix their programs
Memory-safe implementation (no buffer overflows)

3.4 Example Usage / Output

# Compile a program from assembly to bytecode
$ ./svm-asm factorial.svm factorial.bin
Assembled factorial.svm -> factorial.bin (42 bytes)

# Run the bytecode
$ ./stackvm factorial.bin
120

# Run with debug output
$ ./stackvm --debug factorial.bin
[0000] PUSH 5        | Stack: [5]
[0002] PUSH 1        | Stack: [5, 1]
[0004] DUP           | Stack: [5, 1, 5]
[0005] JZ 0x0020     | Stack: [5, 1]  (not jumping, TOS != 0)
[0008] SWAP          | Stack: [1, 5]
[0009] OVER          | Stack: [1, 5, 1]
[000A] MUL           | Stack: [1, 5]  (wait, that's wrong...)
... (execution continues)
[0020] POP           | Stack: [120]
[0021] PRINT         | Stack: []
Output: 120
[0022] HALT          | Program terminated normally

# Run a program with an error
$ ./stackvm bad_program.bin
Runtime Error at 0x0005: Stack underflow (POP on empty stack)
Stack trace:
  0x0005: POP
  0x0003: ADD
  0x0000: PUSH 5

3.5 Real World Outcome

When complete, you’ll be able to write programs like this:

; factorial.svm - Calculate 5!
; Result should be 120

PUSH 5          ; n = 5
PUSH 1          ; accumulator = 1

; LOOP: while n > 0
loop:
    OVER        ; copy n to top: [n, acc, n]
    JZ end      ; if n == 0, done
    SWAP        ; [n, n, acc]
    OVER        ; [n, n, acc, n]
    MUL         ; [n, n, acc*n]
    SWAP        ; [n, acc*n, n]
    PUSH 1      ; [n, acc*n, n, 1]
    SUB         ; [n, acc*n, n-1]
    SWAP        ; [n, n-1, acc*n]
    ROT         ; [n-1, acc*n, n]
    POP         ; [n-1, acc*n]
    JMP loop

end:
    POP         ; remove the 0
    PRINT       ; output the result
    HALT

Running this produces:

$ ./stackvm factorial.bin
120

You’ve computed 5! = 5 * 4 * 3 * 2 * 1 = 120 using only a stack!

4. Solution Architecture

4.1 High-Level Design

STACK VM ARCHITECTURE:
======================

┌──────────────────────────────────────────────────────────────────────────┐
│                          HOST SYSTEM                                      │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                           │
│  ┌────────────────────────────────────────────────────────────────────┐  │
│  │                       MAIN PROGRAM                                  │  │
│  │                                                                     │  │
│  │   main() {                                                          │  │
│  │       vm = vm_create();                                             │  │
│  │       vm_load_program(vm, "program.bin");                           │  │
│  │       while (vm_running(vm)) {                                      │  │
│  │           vm_step(vm);           // One instruction                 │  │
│  │       }                                                             │  │
│  │       vm_destroy(vm);                                               │  │
│  │   }                                                                 │  │
│  │                                                                     │  │
│  └────────────────────────────────────────────────────────────────────┘  │
│                              │                                            │
│                              ▼                                            │
│  ┌────────────────────────────────────────────────────────────────────┐  │
│  │                        VM CORE                                      │  │
│  │                                                                     │  │
│  │  ┌─────────────────────────────────────────────────────────────┐   │  │
│  │  │                  FETCH-DECODE-EXECUTE LOOP                   │   │  │
│  │  │                                                              │   │  │
│  │  │   vm_step(VM *vm) {                                          │   │  │
│  │  │       // FETCH                                               │   │  │
│  │  │       opcode = vm->code[vm->pc++];                           │   │  │
│  │  │                                                              │   │  │
│  │  │       // DECODE & EXECUTE                                    │   │  │
│  │  │       switch (opcode) {                                      │   │  │
│  │  │           case OP_PUSH: ...                                  │   │  │
│  │  │           case OP_ADD:  ...                                  │   │  │
│  │  │           case OP_JMP:  ...                                  │   │  │
│  │  │       }                                                      │   │  │
│  │  │   }                                                          │   │  │
│  │  │                                                              │   │  │
│  │  └─────────────────────────────────────────────────────────────┘   │  │
│  │                              │                                      │  │
│  │         ┌────────────────────┼────────────────────┐                │  │
│  │         ▼                    ▼                    ▼                │  │
│  │  ┌───────────────┐  ┌───────────────┐  ┌───────────────┐          │  │
│  │  │  DATA STACK   │  │ RETURN STACK  │  │ PROGRAM MEM   │          │  │
│  │  │               │  │               │  │               │          │  │
│  │  │ int32_t[256]  │  │ uint16_t[64]  │  │ uint8_t[64KB] │          │  │
│  │  │               │  │               │  │               │          │  │
│  │  │ SP: 3         │  │ RSP: 1        │  │ Size: 42      │          │  │
│  │  │               │  │               │  │               │          │  │
│  │  │ [10, 20, 30]  │  │ [0x0015]      │  │ [bytecode...] │          │  │
│  │  │           ↑   │  │          ↑    │  │               │          │  │
│  │  │          TOS  │  │     Return to │  │               │          │  │
│  │  └───────────────┘  └───────────────┘  └───────────────┘          │  │
│  │                                                                     │  │
│  └────────────────────────────────────────────────────────────────────┘  │
│                              │                                            │
│                              ▼                                            │
│  ┌────────────────────────────────────────────────────────────────────┐  │
│  │                      I/O INTERFACE                                  │  │
│  │                                                                     │  │
│  │   PRINT: printf("%d\n", pop())                                      │  │
│  │   PRINTC: printf("%c", pop())                                       │  │
│  │   READ: push(read_int())                                            │  │
│  │                                                                     │  │
│  └────────────────────────────────────────────────────────────────────┘  │
│                                                                           │
└──────────────────────────────────────────────────────────────────────────┘

4.2 Key Components

VM State Structure: Contains PC, SP, RSP, stacks, code, and flags
Loader: Reads binary bytecode from file into code memory
Fetch Unit: Reads opcode byte (and operands) from code[PC]
Decoder/Executor: Switch statement mapping opcodes to operations
Stack Manager: Push/pop with bounds checking
Error Handler: Reports underflow, overflow, division by zero
Debug Output: Optional trace of each instruction

4.3 Data Structures

/* Opcode definitions */
typedef enum {
    /* Stack operations */
    OP_NOP    = 0x00,
    OP_PUSH   = 0x01,
    OP_POP    = 0x02,
    OP_DUP    = 0x03,
    OP_SWAP   = 0x04,
    OP_OVER   = 0x05,
    OP_ROT    = 0x06,

    /* Arithmetic */
    OP_ADD    = 0x10,
    OP_SUB    = 0x11,
    OP_MUL    = 0x12,
    OP_DIV    = 0x13,
    OP_MOD    = 0x14,
    OP_NEG    = 0x15,

    /* Comparison */
    OP_EQ     = 0x20,
    OP_NE     = 0x21,
    OP_LT     = 0x22,
    OP_GT     = 0x23,
    OP_LE     = 0x24,
    OP_GE     = 0x25,

    /* Logic */
    OP_AND    = 0x28,
    OP_OR     = 0x29,
    OP_XOR    = 0x2A,
    OP_NOT    = 0x2B,

    /* Control flow */
    OP_JMP    = 0x30,
    OP_JZ     = 0x31,
    OP_JNZ    = 0x32,
    OP_CALL   = 0x33,
    OP_RET    = 0x34,

    /* I/O */
    OP_PRINT  = 0x40,
    OP_PRINTC = 0x41,
    OP_READ   = 0x42,

    /* Control */
    OP_HALT   = 0xFF
} Opcode;

/* Error codes */
typedef enum {
    VM_OK = 0,
    VM_ERR_STACK_UNDERFLOW,
    VM_ERR_STACK_OVERFLOW,
    VM_ERR_RSTACK_UNDERFLOW,
    VM_ERR_RSTACK_OVERFLOW,
    VM_ERR_DIV_BY_ZERO,
    VM_ERR_INVALID_OPCODE,
    VM_ERR_PC_OUT_OF_BOUNDS,
    VM_ERR_HALTED
} VMError;

/* VM configuration constants */
#define STACK_SIZE    256
#define RSTACK_SIZE   64
#define CODE_SIZE     65536  /* 64KB code space */

/* Virtual Machine state */
typedef struct {
    /* Data stack */
    int32_t stack[STACK_SIZE];
    int sp;  /* Stack pointer (index of next free slot) */

    /* Return stack (for CALL/RET) */
    uint16_t rstack[RSTACK_SIZE];
    int rsp;  /* Return stack pointer */

    /* Program memory */
    uint8_t code[CODE_SIZE];
    size_t code_size;  /* Actual program size */

    /* Registers */
    uint16_t pc;  /* Program counter */

    /* Status */
    bool halted;
    VMError error;

    /* Debug mode */
    bool debug;
} VM;

4.4 Algorithm Overview

Main Execution Loop:

ALGORITHM: VM Execution

1. CREATE VM:
   - Allocate VM structure
   - Zero all stacks
   - Set PC = 0, SP = 0, RSP = 0
   - halted = false, error = VM_OK

2. LOAD PROGRAM:
   - Open bytecode file
   - Read into code[] array
   - Set code_size
   - Return error if file too large

3. EXECUTE:
   WHILE (not halted and no error):
       a. FETCH:
          - opcode = code[pc]
          - pc++

       b. DECODE & EXECUTE:
          SWITCH (opcode):
              CASE OP_PUSH:
                  - operand = fetch_byte()
                  - stack_push(operand)

              CASE OP_ADD:
                  - b = stack_pop()
                  - a = stack_pop()
                  - stack_push(a + b)

              CASE OP_JMP:
                  - addr = fetch_word()  // 2 bytes
                  - pc = addr

              CASE OP_JZ:
                  - addr = fetch_word()
                  - value = stack_pop()
                  - IF (value == 0): pc = addr

              CASE OP_CALL:
                  - addr = fetch_word()
                  - rstack_push(pc)  // Save return address
                  - pc = addr

              CASE OP_RET:
                  - pc = rstack_pop()

              CASE OP_HALT:
                  - halted = true

              DEFAULT:
                  - error = VM_ERR_INVALID_OPCODE

4. CLEANUP:
   - Report final state or error
   - Free resources

Stack Operations (with error checking):

ALGORITHM: stack_push(value)
  IF sp >= STACK_SIZE:
      error = VM_ERR_STACK_OVERFLOW
      RETURN
  stack[sp] = value
  sp++

ALGORITHM: stack_pop() -> value
  IF sp <= 0:
      error = VM_ERR_STACK_UNDERFLOW
      RETURN 0
  sp--
  RETURN stack[sp]

ALGORITHM: stack_peek(depth) -> value
  IF sp - depth - 1 < 0:
      error = VM_ERR_STACK_UNDERFLOW
      RETURN 0
  RETURN stack[sp - depth - 1]

5. Implementation Guide

5.1 Development Environment Setup

# Linux/macOS - Standard C development tools
# Ensure you have gcc and make installed

# Verify GCC
gcc --version
# Should show version info

# Create project structure
mkdir -p stackvm/{src,include,tests,examples}
cd stackvm

# Create Makefile
cat > Makefile << 'EOF'
CC = gcc
CFLAGS = -Wall -Wextra -std=c99 -O2 -g
LDFLAGS =

SRC = src/main.c src/vm.c src/loader.c src/debug.c
OBJ = $(SRC:.c=.o)
TARGET = stackvm

all: $(TARGET)

$(TARGET): $(OBJ)
	$(CC) $(OBJ) -o $@ $(LDFLAGS)

%.o: %.c
	$(CC) $(CFLAGS) -I include -c $< -o $@

clean:
	rm -f $(OBJ) $(TARGET)

test: $(TARGET)
	./$(TARGET) tests/simple_add.bin
	./$(TARGET) tests/factorial.bin

.PHONY: all clean test
EOF

5.2 Project Structure

stackvm/
├── src/
│   ├── main.c           # Entry point, CLI argument parsing
│   ├── vm.c             # Core VM implementation
│   ├── loader.c         # Bytecode file loading
│   └── debug.c          # Debug output and disassembly
├── include/
│   ├── vm.h             # VM structures and public API
│   ├── opcodes.h        # Opcode definitions
│   └── debug.h          # Debug function declarations
├── assembler/
│   ├── svm-asm.c        # Simple assembler (text -> binary)
│   └── grammar.h        # Assembly syntax definitions
├── tests/
│   ├── simple_add.svm   # 5 + 3 = 8
│   ├── factorial.svm    # Calculate 5!
│   └── fib.svm          # Fibonacci sequence
├── examples/
│   ├── hello.svm        # Print "Hello, World!"
│   └── calculator.svm   # Interactive calculator
├── Makefile
└── README.md

5.3 The Core Question You’re Answering

“How does a CPU execute instructions, and how can we build one in software?”

This project answers the fundamental question of computation. By building a VM, you discover:

Why the fetch-decode-execute cycle is universal
How opcodes map to operations
Why stacks are so useful for computation
How control flow works at the machine level
What happens “under the hood” of every program you write

5.4 Concepts You Must Understand First

Before writing code, ensure you can answer these questions:

Stacks:

Q: If I push 1, then push 2, then pop, what do I get?
A: 2 (LIFO - last in, first out)

Endianness:

Q: A 16-bit address 0x0102 is stored little-endian. What’s in the first byte?
A: 0x02 (least significant byte first)

Two’s Complement:

Q: How is -1 represented in 8-bit two’s complement?
A: 0xFF (all bits set)

Control Flow:

Q: If JZ at address 0x10 jumps to 0x20, and TOS is 5, what’s PC after?
A: 0x13 (0x10 + 3 bytes for JZ instruction, no jump because 5 != 0)

Function Calls:

Q: Why do we need a separate return stack?
A: To remember where to return to after a subroutine finishes

5.5 Questions to Guide Your Design

Architecture:

What data type should stack elements be? (int32_t gives signed arithmetic)
How many stack slots are enough? (256 is typical)
Should the return stack be separate? (Yes, for cleaner design and security)

Instruction Set:

Which addressing mode for jumps: absolute or relative? (Start with absolute)
How many bytes for addresses? (2 bytes = 64KB address space)
Should PUSH take a signed or unsigned operand? (Consider both)

Error Handling:

What happens on stack underflow? (Stop with error, or return 0?)
What about division by zero? (Stop with error)
How to report the error location? (Include PC in error message)

Debugging:

How to trace execution? (Print PC, opcode, and stack after each step)
Should debug mode be a command-line flag? (Yes: –debug)

5.6 Thinking Exercise

Before writing any code, trace this program by hand:

Address  Bytes       Instruction
------   -----       -----------
0x00     01 05       PUSH 5
0x02     01 03       PUSH 3
0x04     10          ADD
0x05     03          DUP
0x06     01 02       PUSH 2
0x08     12          MUL
0x09     11          SUB
0x0A     40          PRINT
0x0B     FF          HALT

Fill in this execution trace:

PC	Opcode	Stack Before	Operation	Stack After
0x00	PUSH 5	[]	Push 5	[5]
0x02	PUSH 3	[5]	Push 3	[5, 3]
0x04	ADD	[5, 3]	5 + 3	[8]
0x05	DUP	[8]	Dup TOS	[8, 8]
0x06	PUSH 2	[8, 8]	Push 2	[8, 8, 2]
0x08	MUL	[8, 8, 2]	8 * 2	[8, 16]
0x09	SUB	[8, 16]	8 - 16	[-8]
0x0A	PRINT	[-8]	Print -8	[]
0x0B	HALT	[]	Stop	[]

What does this program compute? Answer: (5 + 3) - ((5 + 3) * 2) = 8 - 16 = -8

5.7 Hints in Layers

Use these progressive hints only when stuck. Try to solve problems yourself first.

Hint 1: Starting Point - VM Structure

Start with the VM structure and basic lifecycle:

/* include/vm.h */
#ifndef VM_H
#define VM_H

#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

#define STACK_SIZE   256
#define RSTACK_SIZE  64
#define CODE_SIZE    65536

typedef struct {
    int32_t stack[STACK_SIZE];
    int sp;

    uint16_t rstack[RSTACK_SIZE];
    int rsp;

    uint8_t code[CODE_SIZE];
    size_t code_size;

    uint16_t pc;
    bool halted;
    int error;
    bool debug;
} VM;

VM *vm_create(void);
void vm_destroy(VM *vm);
bool vm_load(VM *vm, const char *filename);
void vm_run(VM *vm);
int vm_step(VM *vm);

#endif

Hint 2: Stack Operations with Error Checking

/* src/vm.c - Stack operations */

static int push(VM *vm, int32_t value) {
    if (vm->sp >= STACK_SIZE) {
        vm->error = VM_ERR_STACK_OVERFLOW;
        return -1;
    }
    vm->stack[vm->sp++] = value;
    return 0;
}

static int32_t pop(VM *vm) {
    if (vm->sp <= 0) {
        vm->error = VM_ERR_STACK_UNDERFLOW;
        return 0;
    }
    return vm->stack[--vm->sp];
}

static int32_t peek(VM *vm, int depth) {
    int index = vm->sp - depth - 1;
    if (index < 0) {
        vm->error = VM_ERR_STACK_UNDERFLOW;
        return 0;
    }
    return vm->stack[index];
}

Hint 3: Fetching Operands

/* Fetch a single byte operand and advance PC */
static uint8_t fetch_byte(VM *vm) {
    if (vm->pc >= vm->code_size) {
        vm->error = VM_ERR_PC_OUT_OF_BOUNDS;
        return 0;
    }
    return vm->code[vm->pc++];
}

/* Fetch a 16-bit address (little-endian) and advance PC */
static uint16_t fetch_word(VM *vm) {
    uint8_t lo = fetch_byte(vm);
    uint8_t hi = fetch_byte(vm);
    return (hi << 8) | lo;
}

Hint 4: The Main Execution Loop

int vm_step(VM *vm) {
    if (vm->halted || vm->error) return -1;

    /* FETCH */
    uint8_t opcode = fetch_byte(vm);

    /* For debug output */
    uint16_t instr_pc = vm->pc - 1;

    /* DECODE & EXECUTE */
    switch (opcode) {
        case OP_NOP:
            break;

        case OP_PUSH: {
            int32_t value = (int8_t)fetch_byte(vm);  /* Sign-extend */
            push(vm, value);
            break;
        }

        case OP_ADD: {
            int32_t b = pop(vm);
            int32_t a = pop(vm);
            push(vm, a + b);
            break;
        }

        case OP_JMP: {
            uint16_t addr = fetch_word(vm);
            vm->pc = addr;
            break;
        }

        case OP_JZ: {
            uint16_t addr = fetch_word(vm);
            int32_t value = pop(vm);
            if (value == 0) {
                vm->pc = addr;
            }
            break;
        }

        case OP_HALT:
            vm->halted = true;
            break;

        default:
            vm->error = VM_ERR_INVALID_OPCODE;
            break;
    }

    return (vm->halted || vm->error) ? -1 : 0;
}

void vm_run(VM *vm) {
    while (vm_step(vm) == 0) {
        /* Continue until halt or error */
    }
}

Hint 5: CALL and RET Implementation

case OP_CALL: {
    uint16_t addr = fetch_word(vm);

    /* Push return address onto return stack */
    if (vm->rsp >= RSTACK_SIZE) {
        vm->error = VM_ERR_RSTACK_OVERFLOW;
        break;
    }
    vm->rstack[vm->rsp++] = vm->pc;

    /* Jump to subroutine */
    vm->pc = addr;
    break;
}

case OP_RET: {
    if (vm->rsp <= 0) {
        vm->error = VM_ERR_RSTACK_UNDERFLOW;
        break;
    }

    /* Pop return address and jump back */
    vm->pc = vm->rstack[--vm->rsp];
    break;
}

Hint 6: Debug Output

/* src/debug.c */
#include <stdio.h>
#include "vm.h"

void debug_instruction(VM *vm, uint16_t pc, uint8_t opcode) {
    printf("[%04X] ", pc);

    switch (opcode) {
        case OP_PUSH:
            printf("PUSH %d", (int8_t)vm->code[pc + 1]);
            break;
        case OP_ADD:
            printf("ADD");
            break;
        case OP_JMP:
            printf("JMP 0x%04X",
                vm->code[pc+1] | (vm->code[pc+2] << 8));
            break;
        /* ... other opcodes ... */
        default:
            printf("??? (0x%02X)", opcode);
    }

    /* Print stack */
    printf(" | Stack: [");
    for (int i = 0; i < vm->sp; i++) {
        if (i > 0) printf(", ");
        printf("%d", vm->stack[i]);
    }
    printf("]\n");
}

5.8 The Interview Questions They’ll Ask

After completing this project, you’ll be ready for these questions:

Basic Understanding

“Explain the fetch-decode-execute cycle”
- Good Answer: The CPU continuously (1) fetches the next instruction from memory at the program counter address, (2) decodes the opcode to determine what operation to perform, (3) executes the operation, updating registers or memory, then (4) updates the PC and repeats.
- Red Flag: “It’s how programs run” (too vague)
“What’s the difference between a stack-based and register-based VM?”
- Good Answer: Stack-based VMs use an implicit operand stack (JVM, CPython), simpler encoding but more memory traffic. Register-based VMs use explicit registers (Lua, Dalvik), more complex encoding but faster execution.
- Follow-up: “Why might you choose one over the other?”
“How do you handle a division by zero?”
- Good Answer: Set an error flag, store the PC where it occurred, and halt execution. The error can be reported to the user with context.
- Red Flag: “Just let it crash” (poor error handling)

Technical Details

“How does a CALL instruction work?”
- Good Answer: CALL pushes the current PC (return address) onto the return stack, then sets PC to the subroutine address. RET pops the return address and sets PC to it.
- Key Insight: This is exactly how real CPUs handle function calls.
“Why use a separate return stack?”
- Good Answer: Security (harder to corrupt return addresses), clarity (separates data from control flow), and simplicity (data stack operations don’t affect control flow).
- Real-World Example: Forth traditionally uses two stacks for this reason.
“How would you implement local variables in a stack VM?”
- Good Answer: Reserve stack slots at function entry (ALLOC n), access them by offset from a base pointer or frame pointer. This is how JVM’s local variable array works.

Problem-Solving

“Your factorial program gives wrong results. How do you debug?”
- Good Answer:
  1. Enable debug mode to trace each instruction
  2. Check the stack at each step
  3. Verify the loop counter is decrementing
  4. Check conditional jumps are using correct comparison
“How would you optimize this VM?”
- Good Answer:
  1. Threaded code: Use computed goto instead of switch
  2. Super-instructions: Combine common sequences (PUSH_ADD)
  3. JIT compilation: Translate hot paths to native code
  4. Register caching: Keep TOS in a CPU register
“How would you add floating-point support?”
- Good Answer: Add float type to stack (union or separate stack), add float opcodes (FADD, FSUB, FMUL, FDIV), handle type conversion (I2F, F2I). Consider alignment requirements.

5.9 Books That Will Help

Topic	Book	Specific Chapter/Section	Why It Helps
Fetch-Decode-Execute	“Computer Organization and Design” by Patterson & Hennessy	Chapter 4.1 (The Processor)	Hardware perspective on CPU cycles
Stack-Based Computation	“Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron	Chapter 3.7 (Procedures)	How stacks are used for function calls
Bytecode Interpretation	“Crafting Interpreters” by Robert Nystrom	Chapter 14 (Chunks of Bytecode)	Practical bytecode VM design
VM Internals	“Crafting Interpreters”	Chapter 15 (A Virtual Machine)	Complete VM implementation
Forth & Stack Machines	“Starting Forth” by Leo Brodie	Chapters 1-4	The classic stack-based language
Opcode Design	“Language Implementation Patterns” by Terence Parr	Chapter 10 (Building Bytecode VMs)	Practical opcode encoding
JVM Internals	“Inside the Java Virtual Machine” by Bill Venners	Chapter 5 (The JVM Instruction Set)	Real-world stack VM
Low-Level C	“The C Programming Language” by K&R	Chapter 5-6	Pointers, structures needed for VM

5.10 Implementation Phases

Phase 1: Foundation (Day 1-2)

Set up project structure
Implement VM structure and lifecycle (create/destroy)
Implement bytecode loader
Add NOP and HALT instructions
Milestone: Load and execute empty program that halts

Phase 2: Stack Operations (Day 2-3)

Implement PUSH, POP, DUP, SWAP, OVER, ROT
Add stack bounds checking
Add debug output for stack state
Milestone: Stack manipulation programs work

Phase 3: Arithmetic (Day 3-4)

Implement ADD, SUB, MUL, DIV, MOD, NEG
Add division by zero check
Test with arithmetic expressions
Milestone: “5 + 3 * 2” evaluates correctly

Phase 4: Comparison & Logic (Day 4)

Implement EQ, NE, LT, GT, LE, GE
Implement AND, OR, XOR, NOT
Milestone: Comparison operations work

Phase 5: Control Flow (Day 5-6)

Implement JMP, JZ, JNZ
Implement CALL, RET with return stack
Test with loops and subroutines
Milestone: Factorial program runs correctly

Phase 6: I/O and Polish (Day 6-7)

Implement PRINT, PRINTC, READ
Add command-line arguments (–debug, etc.)
Write test programs
Document everything
Milestone: Multiple test programs pass

5.11 Key Implementation Decisions

Stack Element Type: Use int32_t for signed arithmetic and reasonable range. Could extend to 64-bit later.
Operand Encoding: PUSH uses 1-byte signed operand (-128 to 127). For larger constants, add PUSH_WORD with 2-byte operand.
Address Format: 2-byte little-endian addresses (64KB address space is plenty for learning).
Error Handling: Set error code AND return from execution loop. Don’t try to continue after errors.
Separate Return Stack: Yes - cleaner, safer, matches Forth tradition.
Debug Mode: Command-line flag –debug enables per-instruction trace output.

6. Testing Strategy

6.1 Unit Testing

Test each instruction in isolation:

/* tests/test_arithmetic.c */
void test_add(void) {
    VM *vm = vm_create();

    /* Manual bytecode: PUSH 5, PUSH 3, ADD, HALT */
    vm->code[0] = OP_PUSH;
    vm->code[1] = 5;
    vm->code[2] = OP_PUSH;
    vm->code[3] = 3;
    vm->code[4] = OP_ADD;
    vm->code[5] = OP_HALT;
    vm->code_size = 6;

    vm_run(vm);

    assert(vm->sp == 1);
    assert(vm->stack[0] == 8);  /* 5 + 3 = 8 */
    assert(vm->error == VM_OK);

    vm_destroy(vm);
    printf("test_add: PASS\n");
}

void test_underflow(void) {
    VM *vm = vm_create();

    /* POP on empty stack */
    vm->code[0] = OP_POP;
    vm->code[1] = OP_HALT;
    vm->code_size = 2;

    vm_run(vm);

    assert(vm->error == VM_ERR_STACK_UNDERFLOW);

    vm_destroy(vm);
    printf("test_underflow: PASS\n");
}

6.2 Integration Testing

Create test programs in assembly:

; tests/factorial.svm
; Calculate 5! = 120

PUSH 5          ; n
PUSH 1          ; accumulator

loop:
    OVER        ; [n, acc, n]
    JZ end      ; if n == 0, done
    SWAP        ; [n, n, acc]
    OVER        ; [n, n, acc, n]
    MUL         ; [n, n, acc*n]
    SWAP        ; [n, acc*n, n]
    PUSH 1      ; [n, acc*n, n, 1]
    SUB         ; [n, acc*n, n-1]
    SWAP        ; [n, n-1, acc*n]
    ROT         ; [n-1, acc*n, n]
    POP         ; [n-1, acc*n]
    JMP loop

end:
    POP         ; remove 0
    PRINT
    HALT

Expected output: 120

6.3 Debugging Techniques

Debug Mode Output:

$ ./stackvm --debug tests/add.bin
[0000] PUSH 5    | Stack: [5]
[0002] PUSH 3    | Stack: [5, 3]
[0004] ADD       | Stack: [8]
[0005] PRINT     | Stack: []
Output: 8
[0006] HALT      | Program terminated

Stack Inspection:

void dump_stack(VM *vm) {
    printf("Stack (SP=%d): ", vm->sp);
    for (int i = 0; i < vm->sp; i++) {
        printf("[%d]=%d ", i, vm->stack[i]);
    }
    printf("\n");
}

Memory Dump:

void dump_code(VM *vm, uint16_t start, int len) {
    printf("Code at 0x%04X:\n", start);
    for (int i = 0; i < len; i++) {
        if (i % 16 == 0) printf("%04X: ", start + i);
        printf("%02X ", vm->code[start + i]);
        if (i % 16 == 15) printf("\n");
    }
    printf("\n");
}

7. Common Pitfalls & Debugging

Problem 1: Stack underflow on binary operations

Symptom: Error when executing ADD, SUB, etc.
Root Cause: Not enough values on stack
Fix: Always ensure preconditions before popping
Quick Test: Check SP >= 2 before binary ops

Problem 2: Infinite loop

Symptom: Program never terminates
Root Cause: JMP to wrong address, or condition never met
Fix: Add instruction count limit, enable debug trace
Quick Test: Add max_instructions counter

Problem 3: Jump to wrong address

Symptom: “Invalid opcode” or unexpected behavior after JMP
Root Cause: Endianness wrong when reading address
Fix: Ensure little-endian read: (hi << 8) | lo
Quick Test: JMP to known address, verify PC

Problem 4: Off-by-one in stack operations

Symptom: Wrong value popped, or SP incorrect
Root Cause: Pre-increment vs post-increment confusion
Fix: push: stack[sp++] = val, pop: return stack[--sp]
Quick Test: Push 1, push 2, pop should return 2

Problem 5: CALL/RET mismatch

Symptom: Return goes to wrong address or crashes
Root Cause: Not pushing PC at right time, or wrong stack
Fix: CALL pushes PC after fetching address, before jumping
Quick Test: Simple CALL/RET with PRINT before and after

Problem 6: Negative numbers display incorrectly

Symptom: PUSH -1 shows as 255
Root Cause: Not sign-extending byte operand
Fix: Cast to int8_t before pushing: push((int8_t)byte)
Quick Test: PUSH -1, PRINT should show -1

8. Extensions & Challenges

Extension 1: Simple Assembler

Write a text-to-binary assembler
Support labels for jumps
Handle comments and whitespace
Challenge: Add macro support

Extension 2: Larger Operands

Add PUSH_WORD for 16-bit values
Add PUSH_DWORD for 32-bit values
Consider how this affects code size

Extension 3: Local Variables

Add LOAD_LOCAL and STORE_LOCAL
Implement frame pointer
Support function-local storage

Extension 4: String Support

Add string literals in data section
Implement PRINTS instruction
Handle null-terminated strings

Extension 5: Interactive Debugger

Single-step execution
Breakpoints at addresses
Watch expressions on stack
Memory inspection

Extension 6: Performance Optimization

Threaded code (computed goto)
Super-instructions (common sequences)
Register caching (TOS in variable)
Benchmark improvements

Extension 7: Standard Library

Built-in subroutines
Math functions (ABS, MAX, MIN)
I/O functions (input line, etc.)

9. Real-World Connections

YOUR VM ←→ REAL-WORLD SYSTEMS:

┌────────────────────────────────────────────────────────────────────────┐
│                                                                         │
│  YOUR INSTRUCTION        JAVA BYTECODE          CPYTHON BYTECODE       │
│  ──────────────────      ──────────────         ────────────────       │
│  PUSH 5                  iconst_5               LOAD_CONST 5           │
│  PUSH 3                  iconst_3               LOAD_CONST 3           │
│  ADD                     iadd                   BINARY_ADD             │
│  PRINT                   invokevirtual          PRINT_ITEM             │
│                          println                                        │
│                                                                         │
│  JMP addr                goto addr              JUMP_ABSOLUTE          │
│  JZ addr                 ifeq addr              POP_JUMP_IF_FALSE      │
│  CALL addr               invokestatic           CALL_FUNCTION          │
│  RET                     return                 RETURN_VALUE           │
│                                                                         │
│  ──────────────────────────────────────────────────────────────────────│
│                                                                         │
│  What you learned applies directly to:                                  │
│                                                                         │
│  • Reading JVM bytecode (javap -c)                                      │
│  • Understanding Python internals (dis module)                          │
│  • Analyzing WebAssembly                                                │
│  • Writing compilers that target VMs                                    │
│  • Implementing interpreters for DSLs                                   │
│                                                                         │
└────────────────────────────────────────────────────────────────────────┘

JVM Connection:

# After your project, you can understand this:
$ echo 'class Add { public static void main(String[] a) {
    System.out.println(5 + 3);
}}' > Add.java
$ javac Add.java
$ javap -c Add
Compiled from "Add.java"
class Add {
  public static void main(java.lang.String[]);
    Code:
       0: getstatic     #7  // PrintStream
       3: bipush        8   // PUSH 8 (compiler folded 5+3!)
       5: invokevirtual #13 // PRINT
       8: return
}

Python Connection:

# After your project, you understand this:
import dis
dis.dis(lambda: 5 + 3)
#  1 0 LOAD_CONST 1 (5)
#              2 LOAD_CONST 2 (3)
#              4 BINARY_ADD
#              6 RETURN_VALUE

10. Resources

Primary References

Crafting Interpreters - Bytecode Chapter - Excellent VM tutorial
Forth Programming - Classic stack-based language
JVM Specification - Real-world stack VM spec

Video Resources

Ben Eater’s 8-bit CPU - Hardware fetch-decode-execute
Tsoding’s Porth Language - Building a stack-based language

CHIP-8 Emulator - Next step: emulate a real system
Simple RISC CPU Emulator - Register-based alternative

11. Self-Assessment Checklist

Before moving to Project 4, verify:

Understanding:

Can you explain fetch-decode-execute without notes?
Can you trace a program’s execution by hand?
Can you explain stack underflow and how to prevent it?
Can you explain how CALL/RET implement subroutines?
Can you compare stack-based vs register-based VMs?

Implementation:

All stack operations work (PUSH, POP, DUP, SWAP, OVER, ROT)?
All arithmetic works including edge cases (div by zero)?
Control flow works (JMP, JZ, JNZ)?
Subroutines work (CALL, RET)?
Error handling catches all error conditions?

Testing:

Simple arithmetic test passes (5 + 3 = 8)?
Factorial program produces 120?
Negative numbers work correctly?
Stack underflow is caught and reported?

Growth:

Can you modify the VM to add new instructions?
Can you write programs without looking at examples?
Do you understand how this relates to JVM/CPython?

12. Submission / Completion Criteria

Your implementation is complete when:

Minimum Viable Completion:

VM loads and executes bytecode files
Basic stack operations work (PUSH, POP, ADD, SUB, MUL)
Program halts correctly
At least one test program runs

Full Completion:

All 30+ opcodes implemented
Stack bounds checking works
Control flow works (jumps, conditionals)
CALL/RET with return stack works
Debug mode shows execution trace
Multiple test programs pass

Excellence (Going Above & Beyond):

Simple assembler (text to bytecode)
Interactive debugger (breakpoints, step)
Local variable support
Performance optimizations
Extended test suite

Congratulations! By completing this project, you’ve built your first CPU in software. You now understand the fetch-decode-execute cycle that powers every computer, from embedded microcontrollers to supercomputers. The stack machine architecture you built is the same approach used by the Java Virtual Machine, CPython, and WebAssembly.

You’ve taken the fundamental step from “writing programs” to “understanding how programs run.”

This guide was expanded from CPU_ISA_ARCHITECTURE_PROJECTS.md. For the complete learning path, see the README.