Project 3: Build a WASM Interpreter
Project 3: Build a WASM Interpreter
Implement a working WebAssembly virtual machine that executes bytecode instruction by instruction
Project Overview
| Attribute | Value |
|---|---|
| Difficulty | Advanced |
| Time Estimate | 1 month+ |
| Languages | C (primary), Rust, Go, TypeScript |
| Prerequisites | Project 2 (Parser), recursion, data structures |
| Main Reference | WebAssembly Specification ยง4 (Execution) |
| Knowledge Area | Virtual Machines, Stack Machines, Runtime Systems |
Learning Objectives
After completing this project, you will be able to:
- Implement a stack machine - Manage a value stack with push/pop operations for all WASM types
- Execute arithmetic and logical instructions - Handle i32/i64/f32/f64 operations correctly
- Manage control flow - Implement blocks, loops, br, br_if, and br_table correctly
- Handle function calls - Create and manage activation records (call frames)
- Implement linear memory - Provide bounds-checked load/store operations
- Resolve imports/exports - Connect WASM modules to host functions
- Pass the WebAssembly test suite - Verify spec conformance
Conceptual Foundation
1. What Is a WebAssembly Interpreter?
An interpreter reads WebAssembly bytecode and executes it directly, without compiling to native code. This is how you truly understand WASM:
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ WASM Interpreter Architecture โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค
โ โ
โ .wasm file โ
โ โ โ
โ โผ โ
โ โโโโโโโโโโโโโโ โโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ Parser โโโโโโถโ Validator โโโโโโถโ Execution Engine โ โ
โ โ (Project 2)โ โ (optional) โ โ โโโโโโโโโโโโโโโโโโโ โ โ
โ โโโโโโโโโโโโโโ โโโโโโโโโโโโโโ โ โ Value Stack โ โ โ
โ โ โโโโโโโโโโโโโโโโโโโค โ โ
โ โ โ Call Stack โ โ โ
โ โ โโโโโโโโโโโโโโโโโโโค โ โ
โ โ โ Linear Memory โ โ โ
โ โ โโโโโโโโโโโโโโโโโโโค โ โ
โ โ โ Globals โ โ โ
โ โ โโโโโโโโโโโโโโโโโโโค โ โ
โ โ โ Tables โ โ โ
โ โ โโโโโโโโโโโโโโโโโโโ โ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ

Why build an interpreter instead of a JIT?
- Simplicity: No machine code generation, assembly knowledge not required
- Portability: Your interpreter runs anywhere your implementation language runs
- Debuggability: Easy to add tracing, breakpoints, single-stepping
- Understanding: Youโll see exactly how every instruction works
2. The Stack Machine Execution Model
WebAssembly is a stack machine. Every operation pushes to or pops from a value stack:
Instruction Execution:
Before: i32.const 5 i32.const 3 i32.add
Stack: [] โ [5] โ [5, 3] โ [8]
^push ^push ^pop,pop,push
Invariants:
- Every instruction has a known effect on stack height
- Stack must be balanced at function end
- Type must match (can't add i32 to f64)
Key insight: The stack is ephemeral workspace. Function parameters and results flow through the stack, but persistent storage uses locals, globals, or memory.
3. Value Representation
WASM has only 4 numeric types (plus reference types). Your interpreter needs a tagged union:
// C representation
typedef enum {
VAL_I32,
VAL_I64,
VAL_F32,
VAL_F64,
VAL_FUNCREF,
VAL_EXTERNREF,
} ValueType;
typedef struct {
ValueType type;
union {
int32_t i32;
int64_t i64;
float f32;
double f64;
uint32_t funcref; // Function table index, or null (-1)
void* externref; // Host reference
};
} Value;
// Rust representation
enum Value {
I32(i32),
I64(i64),
F32(f32),
F64(f64),
FuncRef(Option<u32>),
ExternRef(Option<Box<dyn Any>>),
}
Type safety: WASM guarantees type safety through validation. Your interpreter can trust that i32.add will always have two i32s on the stackโif the module passed validation.
4. The Value Stack
The heart of execution. Implement as a growable array:
Value Stack Operations:
push(value):
stack[sp++] = value
pop() -> Value:
return stack[--sp]
peek(depth) -> Value:
return stack[sp - 1 - depth]
Stack during function execution:
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ ... caller's values ... โ โ Below frame
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค
โ param 0 (local 0) โ โ Frame start
โ param 1 (local 1) โ
โ local 2 โ
โ local 3 โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค
โ working stack value โ โ sp points here
โ working stack value โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ

5. Call Frames (Activation Records)
Each function call creates a frame that tracks:
- Return address (which instruction to resume after return)
- Local variables (parameters + declared locals)
- Current block depth (for control flow)
Call Frame Structure:
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ return_addr: instruction pointer โ
โ func_idx: which function is executing โ
โ locals: array of Value โ
โ stack_base: where this frame's stack startsโ
โ label_stack: for br/block/loop targets โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Call Stack:
โโโโโโโโโโโโโโโ
โ Frame 3 โ โ current (top)
โโโโโโโโโโโโโโโค
โ Frame 2 โ
โโโโโโโโโโโโโโโค
โ Frame 1 โ
โโโโโโโโโโโโโโโค
โ Frame 0 โ โ entry point
โโโโโโโโโโโโโโโ

6. Control Flow: The Tricky Part
WASMโs control flow is structured, not arbitrary. This is the hardest part to get right:
;; Block creates a branch target
(block $outer ;; label 0
(block $inner ;; label 1
br $inner ;; jumps to end of $inner
unreachable ;; never reached
)
;; execution continues here after br $inner
br $outer ;; jumps to end of $outer
)
;; execution continues here after br $outer
Label stack model:
Entering a block/loop/if:
push label { kind, arity, continuation_ip }
br N:
pop N labels (unwinding)
jump to continuation of Nth label
For blocks: continuation = instruction after 'end'
For loops: continuation = instruction after 'loop' (loop back)
For ifs: continuation = instruction after 'end'
The critical difference:
block: br jumps forward (to end)loop: br jumps backward (to start)
block semantics: loop semantics:
โโโโโโโโโโโโโโโโโโโ โโโโโโโโถโโโโโโโ
โ block โ โ loop โ
โ ... โ โ ... โ
โ br 0 โโโโโโโโผโโโ โ br 0 โโโโโ
โ ... โ โ โ ...
โ end โ โ โ end โโโโโโโโโโ
โ โโโโ โ โ
โโโโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโ

7. Linear Memory
Linear memory is a bounds-checked byte array:
typedef struct {
uint8_t* data;
uint32_t current_pages; // Current size in 64KB pages
uint32_t max_pages; // Maximum size (or UINT32_MAX if unlimited)
} Memory;
// Every load/store must bounds-check:
int32_t memory_load_i32(Memory* mem, uint32_t addr) {
if (addr + 4 > mem->current_pages * 65536) {
trap("out of bounds memory access");
}
// Little-endian read
return *(int32_t*)(mem->data + addr);
}
void memory_store_i32(Memory* mem, uint32_t addr, int32_t value) {
if (addr + 4 > mem->current_pages * 65536) {
trap("out of bounds memory access");
}
*(int32_t*)(mem->data + addr) = value;
}
Memory instructions include offset:
i32.load offset=8
; effective_addr = pop() + 8
; push(read_i32(memory[effective_addr]))
8. The Instruction Execution Loop
The core of your interpreter is a fetch-decode-execute loop:
void execute(Module* module, uint32_t func_idx) {
Frame* frame = create_frame(module, func_idx);
while (true) {
uint8_t opcode = read_byte(frame);
switch (opcode) {
case 0x00: // unreachable
trap("unreachable executed");
break;
case 0x01: // nop
break;
case 0x02: // block
parse_block_type(frame);
push_label(frame, LABEL_BLOCK);
break;
case 0x03: // loop
parse_block_type(frame);
push_label(frame, LABEL_LOOP);
break;
case 0x0b: // end
if (is_function_end(frame)) {
return; // Function complete
}
pop_label(frame);
break;
case 0x0c: // br
uint32_t depth = read_leb128(frame);
branch_to(frame, depth);
break;
case 0x10: // call
uint32_t target = read_leb128(frame);
call_function(module, frame, target);
break;
case 0x20: // local.get
uint32_t idx = read_leb128(frame);
push(frame->locals[idx]);
break;
case 0x21: // local.set
uint32_t idx = read_leb128(frame);
frame->locals[idx] = pop();
break;
case 0x41: // i32.const
int32_t val = read_sleb128(frame);
push((Value){.type = VAL_I32, .i32 = val});
break;
case 0x6a: // i32.add
Value b = pop();
Value a = pop();
push((Value){.type = VAL_I32, .i32 = a.i32 + b.i32});
break;
// ... 200+ more opcodes ...
default:
trap("unknown opcode: 0x%02x", opcode);
}
}
}
9. Import/Export Resolution
Modules connect to their environment through imports and exports:
Module Instantiation:
1. Parse module (Project 2)
2. Resolve imports:
- For each import, find matching export from environment
- Functions: store function pointer/index
- Memory: link to shared memory instance
- Globals: link to global storage
- Tables: link to function table
3. Initialize memory with data segments
4. Initialize tables with element segments
5. Run start function if present
6. Return instance with exports accessible
Host function imports:
// User provides these during instantiation
typedef Value (*HostFunc)(Value* args, int num_args);
typedef struct {
const char* module_name;
const char* func_name;
HostFunc func;
} Import;
// Example: importing console.log
Value host_console_log(Value* args, int num_args) {
printf("WASM says: %d\n", args[0].i32);
return (Value){0}; // No return value
}
Import imports[] = {
{"console", "log", host_console_log},
};
10. Trap Handling
WASM can โtrapโ on errors. Traps are unrecoverable within the module:
Trap conditions:
- unreachable instruction executed
- Integer divide by zero
- Integer overflow (i32.trunc_sat excepted)
- Out of bounds memory access
- Out of bounds table access
- Null function reference called
- Indirect call type mismatch
- Stack overflow
Implementation options:
- setjmp/longjmp (C): Save state, jump on trap
- Exceptions (C++, Rust, TypeScript): Throw on trap
- Result types (Rust): Return
Result<Value, Trap>
Project Specification
Minimum Viable Interpreter
Your interpreter must support:
Instructions (MVP):
- Control:
unreachable,nop,block,loop,if,else,end,br,br_if,return,call - Parametric:
drop,select - Variable:
local.get,local.set,local.tee,global.get,global.set - Memory:
i32.load,i32.store,memory.size,memory.grow - Numeric (i32 only for MVP):
i32.const,i32.add,i32.sub,i32.mul,i32.div_s,i32.div_u,i32.rem_s,i32.rem_u,i32.and,i32.or,i32.xor,i32.shl,i32.shr_s,i32.shr_u,i32.eq,i32.ne,i32.lt_s,i32.lt_u,i32.gt_s,i32.gt_u,i32.le_s,i32.le_u,i32.ge_s,i32.ge_u,i32.eqz
Features:
- Parse and execute .wasm files (reuse Project 2 parser)
- Support module imports (host functions)
- Linear memory with bounds checking
- Proper control flow (blocks, loops, branches)
- Function calls (direct only for MVP)
Input/Output Specification
# Run a WASM file
$ ./wasm-interp program.wasm
# Output: result of start function or exported "main"
# Run with arguments (WASI-style)
$ ./wasm-interp program.wasm arg1 arg2
# Export function call
$ ./wasm-interp program.wasm --call add 5 3
8
# Debug mode
$ ./wasm-interp program.wasm --trace
[0x0010] i32.const 5 stack: [5]
[0x0012] i32.const 3 stack: [5, 3]
[0x0014] i32.add stack: [8]
Success Criteria
- Arithmetic test: Execute
(func (param i32 i32) (result i32) local.get 0 local.get 1 i32.add)with inputs 5, 3 โ output 8 - Factorial test: Execute recursive factorial,
factorial(10)โ 3628800 - Fibonacci test: Execute iterative fibonacci,
fib(20)โ 6765 - Memory test: Store values, read them back correctly
- Control flow test: Nested blocks with branches execute correctly
- Official test suite: Pass the WASM spec interpreter tests for core instructions
Solution Architecture
Core Data Structures
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Interpreter Architecture โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค
โ โ
โ โโโโโโโโโโโโโโ โ
โ โ Module โ โ Parsed representation from Project 2 โ
โ โ - types โ โ
โ โ - funcs โ โ
โ โ - exports โ โ
โ โโโโโโโฌโโโโโโโ โ
โ โ instantiate โ
โ โผ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ Instance โ โ
โ โ โโโโโโโโโโโโ โโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโ โ โ
โ โ โ Memory โ โ Globals โ โ Function Table โ โ โ
โ โ โ (bytes) โ โ (values) โ โ (funcidx) โ โ โ
โ โ โโโโโโโโโโโโ โโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโ โ โ
โ โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ โ
โ โ โ Execution State โ โ โ
โ โ โ โโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโโโโ โ โ โ
โ โ โ โValue Stack โ โ Call Stack (frames) โ โ โ โ
โ โ โ โโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโโโโ โ โ โ
โ โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ

Module vs Instance
Module: The static representation parsed from .wasm
- Types, function bodies, import/export declarations
- Can be instantiated multiple times
- Like a class definition
Instance: A running module with its own state
- Own memory, globals, tables
- Execution state (stack, frames)
- Like a class instance
Suggested File Organization
wasm-interp/
โโโ src/
โ โโโ main.c # Entry point, CLI handling
โ โโโ types.h # Value, Module, Instance types
โ โโโ parser.c # From Project 2
โ โโโ parser.h
โ โโโ validator.c # Optional: type checking
โ โโโ validator.h
โ โโโ instance.c # Module instantiation
โ โโโ instance.h
โ โโโ exec.c # Instruction execution
โ โโโ exec.h
โ โโโ memory.c # Linear memory operations
โ โโโ memory.h
โ โโโ stack.c # Value stack operations
โ โโโ stack.h
โ โโโ host.c # Host function imports
โ โโโ host.h
โโโ tests/
โ โโโ arithmetic.wat
โ โโโ factorial.wat
โ โโโ memory.wat
โ โโโ control_flow.wat
โ โโโ run_tests.sh
โโโ Makefile
โโโ README.md
Implementation Guide
Phase 1: Foundation (Days 1-3)
Goal: Execute the simplest possible program
;; Target: Execute this
(module
(func (export "answer") (result i32)
i32.const 42
)
)
Steps:
- Define Value type (just i32 for now)
- Implement value stack (push, pop)
- Implement minimal execution loop:
i32.const: push immediate valueend: return top of stack
- Call exported function and print result
Checkpoint: Running the above returns 42.
Phase 2: Arithmetic (Days 4-7)
Goal: Execute arithmetic expressions
(module
(func (export "add") (param i32 i32) (result i32)
local.get 0
local.get 1
i32.add
)
)
Steps:
- Implement
local.getwith parameters - Add
local.set,local.tee - Implement i32 arithmetic:
add,sub,mul,div_s,div_u - Implement i32 comparisons:
eq,ne,lt_s,gt_s, etc. - Add
i32.eqz(equals zero)
Checkpoint: Can evaluate any arithmetic expression.
Phase 3: Control Flow (Days 8-14)
Goal: Implement blocks, loops, and branches
(module
(func (export "sum_to") (param $n i32) (result i32)
(local $sum i32)
(local $i i32)
(block $exit
(loop $continue
;; if i > n, exit
local.get $i
local.get $n
i32.gt_s
br_if $exit
;; sum += i
local.get $sum
local.get $i
i32.add
local.set $sum
;; i++
local.get $i
i32.const 1
i32.add
local.set $i
br $continue
)
)
local.get $sum
)
)
Steps:
- Implement label stack
- Implement
block(push label, continuation at end) - Implement
loop(push label, continuation at start) - Implement
br(branch to label) - Implement
br_if(conditional branch) - Implement
if/else/end(two branches) - Handle block return values (arity)
Checkpoint: Loops and conditionals work correctly.
Phase 4: Function Calls (Days 15-21)
Goal: Call other functions, including recursion
(module
(func $factorial (param $n i32) (result i32)
local.get $n
i32.const 2
i32.lt_s
if (result i32)
i32.const 1
else
local.get $n
local.get $n
i32.const 1
i32.sub
call $factorial
i32.mul
end
)
(export "factorial" (func $factorial))
)
Steps:
- Implement call stack (frame creation, management)
- Implement
callinstruction (push frame, jump) - Implement
return(pop frame, return value) - Handle function end (implicit return)
- Test recursion (factorial, fibonacci)
Checkpoint: Recursive factorial works.
Phase 5: Memory (Days 22-28)
Goal: Implement linear memory operations
(module
(memory 1) ;; 1 page = 64KB
(func (export "store_load") (param $val i32) (result i32)
i32.const 0 ;; address
local.get $val
i32.store ;; memory[0] = val
i32.const 0
i32.load ;; return memory[0]
)
)
Steps:
- Allocate memory buffer during instantiation
- Implement
memory.size(return current pages) - Implement
memory.grow(add pages, return old size) - Implement
i32.load,i32.storewith bounds checking - Add all load/store variants (8-bit, 16-bit, signed/unsigned)
- Handle data sections (initialize memory on load)
Checkpoint: Store/load values correctly, bounds checking works.
Phase 6: Imports (Days 29-35)
Goal: Call host functions from WASM
(module
(import "console" "log" (func $log (param i32)))
(func (export "main")
i32.const 42
call $log
)
)
Steps:
- Parse import section
- Create import resolution table
- Allow user to register host functions
- Distinguish imported vs local function indices
- Implement
callfor imported functions
Checkpoint: Can call console.log from WASM.
Phase 7: Completeness (Week 5+)
Goal: Full spec compliance
Steps:
- Add i64, f32, f64 instructions
- Implement globals
- Implement tables and
call_indirect - Add remaining instructions (
select,memory.copy, etc.) - Run official WebAssembly test suite
- Fix edge cases and bugs
Checkpoint: Pass official spec tests.
Testing Strategy
Unit Tests
Test each instruction in isolation:
// test_i32_add.c
void test_i32_add() {
// Setup
Stack stack = create_stack();
push(&stack, (Value){VAL_I32, .i32 = 5});
push(&stack, (Value){VAL_I32, .i32 = 3});
// Execute
exec_i32_add(&stack);
// Verify
Value result = pop(&stack);
assert(result.type == VAL_I32);
assert(result.i32 == 8);
assert(stack_height(&stack) == 0);
}
Integration Tests
Test complete programs:
# tests/run_tests.sh
wat2wasm tests/arithmetic.wat -o /tmp/arith.wasm
result=$(./wasm-interp /tmp/arith.wasm --call add 5 3)
[ "$result" = "8" ] || fail "add test"
result=$(./wasm-interp /tmp/arith.wasm --call factorial 10)
[ "$result" = "3628800" ] || fail "factorial test"
Spec Conformance Tests
The WebAssembly spec includes thousands of tests:
# Clone spec tests
git clone https://github.com/WebAssembly/spec.git
cd spec/test/core
# Run your interpreter against them
for f in *.wast; do
./wasm-interp --test "$f" || echo "FAIL: $f"
done
Tracing for Debugging
Add verbose mode to see execution:
$ ./wasm-interp --trace factorial.wasm --call factorial 5
[call] factorial(5)
[0x10] local.get 0 stack: [5]
[0x12] i32.const 2 stack: [5, 2]
[0x14] i32.lt_s stack: [0]
[0x15] if (false branch)
[0x20] local.get 0 stack: [5]
[0x22] local.get 0 stack: [5, 5]
[0x24] i32.const 1 stack: [5, 5, 1]
[0x26] i32.sub stack: [5, 4]
[0x27] call factorial
[call] factorial(4)
...
Common Pitfalls
1. Block Arity Confusion
Blocks can return values! A block with type (result i32) must leave exactly one i32 on the stack:
;; Block returns a value
(block (result i32)
i32.const 42
)
;; Stack now has 42
;; Error: stack imbalance
(block (result i32)
i32.const 1
i32.const 2 ;; Extra value!
)
2. Loop vs Block Branch Semantics
This is the most common bug:
;; BLOCK: br jumps FORWARD (to end)
(block $exit
br $exit ;; jumps here โโโ
unreachable โ
) ;; โโโ
;; LOOP: br jumps BACKWARD (to start)
(loop $continue ;; โโโ
br $continue ;; โโโโโโโโโโ
)
3. Function Index Offset
If you have imports, local function indices are offset:
Import 0: (import "env" "print")
Import 1: (import "env" "read")
Function 0: (func ...) โ Index is 2, not 0!
Function 1: (func ...) โ Index is 3
call 0 โ calls Import 0
call 2 โ calls Function 0
4. Memory Alignment Traps
WASM doesnโt trap on unaligned access (itโs just a hint), but you must handle it:
;; Unaligned is valid
i32.const 3
i32.const 42
i32.store align=1 ;; Works, just slower on some hardware
5. Integer Overflow Behavior
Division and remainder have edge cases:
i32.div_s: MIN_INT / -1 โ trap (overflow)
i32.div_u: any / 0 โ trap
i32.rem_s: MIN_INT % -1 โ 0 (not trap!)
6. Stack Underflow
Always check before popping:
Value pop(Stack* s) {
if (s->sp == 0) {
trap("stack underflow");
}
return s->data[--s->sp];
}
Extensions
1. Add a Debugger
Implement interactive debugging:
(wdb) break factorial
Breakpoint 1 at func $factorial
(wdb) run
Breakpoint 1 hit
(wdb) stack
[0]: i32 5
(wdb) locals
$n: i32 5
(wdb) step
[0x12] i32.const 2
(wdb) continue
2. Add Profiling
Track instruction counts and execution time:
Execution profile:
i32.add: 15234 calls (12.3%)
local.get: 45123 calls (36.7%)
br_if: 8234 calls ( 6.7%)
call: 2341 calls ( 1.9%)
...
Hot functions:
$factorial: 8234 calls, 45ms total
$fib: 2341 calls, 89ms total
3. Implement Tail Calls
The tail-call proposal optimizes recursion:
;; With tail calls, this uses O(1) stack
(func $factorial_tail (param $n i32) (param $acc i32) (result i32)
local.get $n
i32.eqz
if (result i32)
local.get $acc
else
local.get $n
i32.const 1
i32.sub
local.get $n
local.get $acc
i32.mul
return_call $factorial_tail ;; Tail call!
end
)
4. Add i64/f32/f64 Support
Extend your value type and add the full instruction set:
- i64: 64-bit integers (same ops as i32)
- f32: IEEE 754 single precision
- f64: IEEE 754 double precision
- Conversions:
i32.trunc_f32_s,f64.convert_i32_u, etc.
5. Implement Tables and call_indirect
Tables enable function pointers and dynamic dispatch:
(table 10 funcref)
(elem (i32.const 0) $func0 $func1 $func2)
(func $dispatch (param $idx i32) (result i32)
local.get $idx
call_indirect (type $sig)
)
Real-World Connections
How Production Runtimes Work
wasmtime (Bytecode Alliance):
- Uses Cranelift for JIT compilation
- Your interpreter helps understand what Cranelift compiles to
wasm3 (high-performance interpreter):
- Threaded interpretation (computed goto)
- Study their source as reference
V8 (Chrome):
- Liftoff: baseline compiler (fast startup)
- TurboFan: optimizing compiler (fast execution)
- Your interpreter is like Liftoff without machine code
Edge Computing
Cloudflare Workers, Fastly Compute@Edge, and AWS Lambda all use WASM:
- Cold start in microseconds
- Strong isolation between tenants
- Your interpreter demonstrates the isolation model
Blockchain VMs
Ethereum 2.0 (ewasm), Polkadot, Near Protocol use WASM:
- Deterministic execution across nodes
- Gas metering (add instruction counting!)
- Your interpreter could be extended with gas limits
The Core Question Youโre Answering
โHow does a virtual machine execute bytecode instruction-by-instruction, and what does โexecutionโ really mean at the lowest level?โ
When you write
i32.addin WebAssembly, what actually happens? A compiler translates that to binary opcode0x6a. But execution is not reading bytesโit is transformation of state. Your interpreter must answer: What changes? The value stack loses two elements and gains one. The instruction pointer advances. Nothing else in the universe of your VM changes.This is the deep insight: execution is state transition. Every instruction is a tiny function from machine state to machine state. The entire complexity of a programโrecursion, memory allocation, control flowโreduces to โread byte, change state, repeat.โ When you implement this yourself, the abstraction falls away. You see that there is no magic in computers, only meticulously defined transformations applied billions of times per second.
Understanding this changes how you think about all code. Every high-level construct youโve ever usedโclasses, closures, async/awaitโeventually becomes a sequence of these primitive state changes. Building an interpreter teaches you to see through the abstraction.
Concepts You Must Understand First
Before implementing an interpreter, ensure you have solid grounding in these foundational concepts:
| Concept | Why It Matters | Where to Learn |
|---|---|---|
| Virtual Machine Architecture | Youโre building a VMโunderstand what that means. How does a VM differ from physical hardware? What are the tradeoffs of interpretation vs compilation? | Computer Systems: A Programmerโs Perspective Ch. 3.7 |
| Stack-Based Execution Model | WASM is a stack machine. Every instruction implicitly operates on a stack. You must internalize push/pop semantics before implementing anything. | Engineering a Compiler by Cooper & Torczon, Ch. 7.2 |
| Instruction Dispatch Techniques | How do you go from opcode byte to handler code? Switch statements work but are slow. Computed gotos and threaded dispatch can improve performance 7-15%. | Intel WAMR Technical Article |
| Call Frames and Activation Records | When a function calls another function, where do parameters and locals live? How do you track where to return? The call stack is separate from the value stack. | Low-Level Programming by Igor Zhirkov, Ch. 12 |
| Memory Bounds Checking | Every memory access must be validated. An off-by-one error here is a security vulnerability. Understand how to check efficiently without killing performance. | WebAssembly Specification ยง4.4 |
| Type Validation at Runtime | WASM is typed. Even in an interpreter, you may want to validate that i32.add receives two i32s (for debugging). The spec guarantees type safety if validation passed. | WebAssembly Specification ยง3 (Validation) |
Note on threaded interpretation: The wasm3 interpreter uses a โcontinuation-passing styleโ threaded dispatch that avoids the overhead of a central switch statement. Each instruction handler jumps directly to the next handler. This technique, discussed since the 1970s, shows significant performance benefits on modern CPUs with branch prediction.
Questions to Guide Your Design
Before writing code, think through these design questions. Your answers will shape your implementation:
Value Stack Representation
- How will you represent the value stack? A fixed-size array? A growable vector? Whatโs the maximum depth?
- How do you handle the tagged union problem? Each value has a type (i32, i64, f32, f64). Do you use a struct with a type tag, or separate arrays per type?
- What happens on stack overflow? WASM modules can have deeply nested expressions. Your stack needs a limit.
Function Calls and Returns
- How do you represent a call frame? What data does each frame need: return address, locals, stack base?
- Where do parameters come from? When calling a function, parameters are on the value stack. They become the calleeโs locals.
- How do you handle the return value? The calleeโs result must end up on the callerโs stack. How does unwinding work?
- What about recursion depth? Recursive functions can exhaust your call stack. How do you trap on stack overflow?
Control Flow Implementation
- How do you implement
block,loop, andif? These create branch targets (labels). What data structure tracks them? - What is the label stack? Each block pushes a label with its continuation address.
br Npops N labels and jumps. - Why is
loopdifferent fromblock? Blockโs br jumps forward (to end). Loopโs br jumps backward (to start). This is the most common bug. - How do you handle
br_table? A multi-way branch with a default. This is the hardest control flow instruction.
Linear Memory Management
- How do you allocate memory? WASM memory is sized in 64KB pages. Do you allocate all at once or grow lazily?
- How do you bounds-check efficiently? Every load/store must check. Can you avoid redundant checks?
- How do you handle
memory.grow? The memory can expand. Existing data must be preserved. What if growth fails?
Thinking Exercise
Before implementing your interpreter, hand-trace this program through your proposed data structures:
(module
(func $add_three (param $a i32) (param $b i32) (param $c i32) (result i32)
local.get $a
local.get $b
i32.add
local.get $c
i32.add
)
(func $main (result i32)
i32.const 10
i32.const 20
i32.const 12
call $add_three
)
(export "main" (func $main))
)
Trace the execution step by step. For each instruction, write:
- The instruction pointer (which instruction)
- The value stack contents (before and after)
- The call stack contents (active frames)
- Local variable values for each frame
Expected trace format:
=== Calling $main ===
Frame 0: $main, locals: [], return: (exit)
IP: 0 | i32.const 10 | Stack: [] -> [10]
IP: 1 | i32.const 20 | Stack: [10] -> [10, 20]
IP: 2 | i32.const 12 | Stack: [10, 20] -> [10, 20, 12]
IP: 3 | call $add_three | Creating new frame...
=== Calling $add_three ===
Frame 1: $add_three, locals: [$a=10, $b=20, $c=12], return: Frame 0, IP 4
(Note: 10, 20, 12 were popped from caller's stack and became locals)
IP: 0 | local.get $a | Stack: [] -> [10]
IP: 1 | local.get $b | Stack: [10] -> [10, 20]
IP: 2 | i32.add | Stack: [10, 20] -> [30]
IP: 3 | local.get $c | Stack: [30] -> [30, 12]
IP: 4 | i32.add | Stack: [30, 12] -> [42]
IP: 5 | end | Return 42 to caller, pop frame
=== Returning to $main ===
Frame 0: $main, stack now [42] (return value pushed)
IP: 4 | end | Return 42, execution complete
Result: 42
Complete this trace yourself before implementing. If you can do this by hand, you understand the execution model.
The Interview Questions Theyโll Ask
If you build a WASM interpreter, expect these questions in systems programming interviews:
Fundamental Understanding
- โExplain the difference between a stack machine and a register machine.โ
- Stack machines use implicit operands (pop, compute, push). Register machines use explicit operand names.
- WASM is a stack machine. x86 is a register machine. JVM bytecode is stack-based.
- Tradeoff: stack machines have simpler encoding, register machines have better performance.
- โHow does your interpreter handle function calls?โ
- Be ready to explain: frame creation, parameter passing via stack, local storage, return value handling, stack unwinding.
- โWhat makes WebAssemblyโs control flow โstructuredโ?โ
- No arbitrary
goto. All branches target block/loop/if labels. This enables validation and optimization. - You should be able to explain why this is a security feature.
- No arbitrary
Implementation Details
- โHow do you dispatch instructions efficiently?โ
- Discuss switch statement (baseline), computed goto (faster), threaded interpretation (fastest).
- Mention that wasm3 uses threaded dispatch and achieves high performance.
- โHow do you implement
br_table?โ- Itโs a jump table. Index into array of label depths, branch to that label.
- What if index is out of bounds? Use the default label.
- โHow do you handle memory bounds checking?โ
- Every load/store:
if (addr + size > memory_size) trap(). - Discuss performance: hoisting checks, eliminating redundant checks.
- Every load/store:
Architecture Questions
- โWhatโs the difference between a Module and an Instance?โ
- Module: static, parsed from .wasm, can be instantiated multiple times.
- Instance: runtime state, owns memory/globals/tables, has execution context.
- โHow would you add debugging support?โ
- Breakpoints: check instruction address before executing.
- Single-stepping: execute one instruction, wait.
- Stack inspection: expose frame and value stack state.
- โHow would you extend this to a JIT compiler?โ
- Instead of interpreting opcodes, generate native machine code.
- Still need runtime for memory, calls, traps.
- Mention Cranelift (wasmtime), Lightbeam (deprecated), Liftoff (V8).
Hints in Layers
Stuck on implementation? Read only the hint level you need:
Challenge: Instruction Dispatch is Slow
Hint Level 1 (Conceptual): The obvious approach uses a switch statement with 200+ cases. Every instruction pays the cost of the switch.
Hint Level 2 (Direction): Look up โcomputed gotoโ or โthreaded interpretation.โ The idea is to jump directly from one handler to the next.
Hint Level 3 (Specific): In C/GCC, use &&label to get the address of a label, and goto *address to jump to it. Store handler addresses in an array indexed by opcode.
Hint Level 4 (Code):
void* dispatch_table[256];
dispatch_table[0x6a] = &&op_i32_add;
dispatch_table[0x41] = &&op_i32_const;
// ...
#define DISPATCH() goto *dispatch_table[*ip++]
op_i32_add:
stack[sp-2] += stack[sp-1]; sp--;
DISPATCH();
op_i32_const:
stack[sp++] = read_leb128(&ip);
DISPATCH();
Challenge: Control Flow Labels
Hint Level 1 (Conceptual): Blocks, loops, and ifs create branch targets. You need to track where each br N should jump to.
Hint Level 2 (Direction): Maintain a โlabel stackโ separate from the value stack. Push a label when entering block/loop/if. Pop on end.
Hint Level 3 (Specific): Each label needs: (1) its type (block/loop/if), (2) its continuation address, (3) its arity (number of return values).
Hint Level 4 (Code):
typedef struct {
LabelKind kind; // BLOCK, LOOP, IF
uint8_t* continuation; // Where to jump on br
int arity; // Number of return values
} Label;
// For block: continuation = address after 'end'
// For loop: continuation = address after 'loop' (loops back!)
Challenge: Function Call Mechanics
Hint Level 1 (Conceptual): When you call a function, you need to remember where to return.
Hint Level 2 (Direction): Create a call frame that stores the return address, the base of the value stack, and local variables.
Hint Level 3 (Specific): Pop N arguments from callerโs stack, copy to calleeโs locals. On return, push results back to callerโs stack.
Hint Level 4 (Code):
void call_function(VM* vm, uint32_t func_idx) {
Function* func = &vm->module->functions[func_idx];
Frame* frame = push_frame(vm);
frame->func = func;
frame->return_ip = vm->ip;
frame->stack_base = vm->sp - func->num_params;
// Parameters become first N locals
for (int i = 0; i < func->num_params; i++) {
frame->locals[i] = vm->stack[frame->stack_base + i];
}
vm->sp = frame->stack_base; // Pop params from caller
// Additional locals initialized to zero
for (int i = func->num_params; i < func->num_locals; i++) {
frame->locals[i] = (Value){0};
}
vm->ip = func->code_start;
}
Books That Will Help
| Book | Relevant Chapters | What Youโll Learn |
|---|---|---|
| Low-Level Programming by Igor Zhirkov | Ch. 12: Virtual Machines & Interpreters | How to structure an interpreter, dispatch loops, frame management. This chapter walks through building a simple VM from scratch. |
| Computer Systems: A Programmerโs Perspective by Bryant & OโHallaron | Ch. 3.7: Procedures | How real hardware handles function calls, stack frames, calling conventions. Understanding this helps you design WASM frames. |
| Engineering a Compiler by Cooper & Torczon | Ch. 7.2: Stack Machines, Ch. 5: Runtime Environments | Academic treatment of stack-based execution, activation records, and runtime support. Essential for understanding the theory. |
| Crafting Interpreters by Robert Nystrom | Part II: Bytecode VM (Chapters 14-24) | Complete walkthrough of building a bytecode interpreter. Different language (Lox, not WASM), but same concepts. Free online at craftinginterpreters.com. |
| Writing An Interpreter In Go by Thorsten Ball | Entire book | Practical, hands-on interpreter construction. Good for understanding the end-to-end process before specializing to WASM. |
Real-World Outcome
After completing this project, you should be able to run your interpreter and see output like this:
$ ./wasm-interp factorial.wasm --call factorial 10 --trace
=== WebAssembly Interpreter v1.0 ===
Module: factorial.wasm (1847 bytes)
Functions: 1
Memory: 0 pages
Exports: factorial (func 0)
Calling: factorial(10)
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
[FRAME 0] factorial | locals: [$n=10] | stack_base: 0
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
0x0010 โ local.get 0 โ Stack: [] โ [10]
0x0012 โ i32.const 2 โ Stack: [10] โ [10, 2]
0x0014 โ i32.lt_s โ Stack: [10, 2] โ [0]
0x0015 โ if (result i32) โ Stack: [0] โ []
โ condition: false, taking else branch
0x001A โ local.get 0 โ Stack: [] โ [10]
0x001C โ local.get 0 โ Stack: [10] โ [10, 10]
0x001E โ i32.const 1 โ Stack: [10, 10] โ [10, 10, 1]
0x0020 โ i32.sub โ Stack: [10, 10, 1] โ [10, 9]
0x0021 โ call 0 โ Stack: [10, 9] โ [10]
โ pushing new frame for factorial(9)
[FRAME 1] factorial | locals: [$n=9] | stack_base: 1
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
0x0010 โ local.get 0 โ Stack: [] โ [9]
0x0012 โ i32.const 2 โ Stack: [9] โ [9, 2]
0x0014 โ i32.lt_s โ Stack: [9, 2] โ [0]
0x0015 โ if (result i32) โ condition: false, taking else branch
0x001A โ local.get 0 โ Stack: [] โ [9]
0x001C โ local.get 0 โ Stack: [9] โ [9, 9]
0x001E โ i32.const 1 โ Stack: [9, 9] โ [9, 9, 1]
0x0020 โ i32.sub โ Stack: [9, 9, 1] โ [9, 8]
0x0021 โ call 0 โ pushing new frame for factorial(8)
... [frames 2-8 elided for brevity] ...
[FRAME 9] factorial | locals: [$n=1] | stack_base: 9
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
0x0010 โ local.get 0 โ Stack: [] โ [1]
0x0012 โ i32.const 2 โ Stack: [1] โ [1, 2]
0x0014 โ i32.lt_s โ Stack: [1, 2] โ [1]
0x0015 โ if (result i32) โ condition: true, taking then branch
0x0017 โ i32.const 1 โ Stack: [] โ [1]
0x0019 โ end โ returning from if with [1]
0x0030 โ end โ returning from function with 1
โ popping frame, result: 1
[FRAME 8] factorial | resuming after call
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
0x0022 โ i32.mul โ Stack: [2, 1] โ [2]
0x0023 โ end โ returning from if with [2]
0x0030 โ end โ returning from function with 2
โ popping frame, result: 2
[FRAME 7] factorial | resuming after call
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
0x0022 โ i32.mul โ Stack: [3, 2] โ [6]
...
[FRAME 0] factorial | resuming after call
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
0x0022 โ i32.mul โ Stack: [10, 362880] โ [3628800]
0x0023 โ end โ returning from if with [3628800]
0x0030 โ end โ returning from function with 3628800
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
EXECUTION COMPLETE
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Result: 3628800 (i32)
Instructions executed: 127
Call depth (max): 10
Memory usage: 0 bytes
Execution time: 0.23ms
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
$ ./wasm-interp memory_test.wasm --call store_and_sum --trace
=== WebAssembly Interpreter v1.0 ===
Module: memory_test.wasm (423 bytes)
Functions: 1
Memory: 1 page (65536 bytes)
Exports: store_and_sum (func 0)
Calling: store_and_sum()
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
[FRAME 0] store_and_sum | locals: [] | stack_base: 0
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
0x0010 โ i32.const 0 โ Stack: [] โ [0]
0x0012 โ i32.const 100 โ Stack: [0] โ [0, 100]
0x0014 โ i32.store โ Stack: [0, 100] โ []
โ memory[0x0000..0x0003] = 100 (0x64)
0x0016 โ i32.const 4 โ Stack: [] โ [4]
0x0018 โ i32.const 200 โ Stack: [4] โ [4, 200]
0x001A โ i32.store โ Stack: [4, 200] โ []
โ memory[0x0004..0x0007] = 200 (0xc8)
0x001C โ i32.const 0 โ Stack: [] โ [0]
0x001E โ i32.load โ Stack: [0] โ [100]
โ loaded 100 from memory[0x0000..0x0003]
0x0020 โ i32.const 4 โ Stack: [100] โ [100, 4]
0x0022 โ i32.load โ Stack: [100, 4] โ [100, 200]
โ loaded 200 from memory[0x0004..0x0007]
0x0024 โ i32.add โ Stack: [100, 200] โ [300]
0x0025 โ end โ returning 300
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
EXECUTION COMPLETE
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Result: 300 (i32)
Instructions executed: 11
Memory:
0x0000: 64 00 00 00 c8 00 00 00 00 00 00 00 00 00 00 00
^^^^^^^^^^^^ ^^^^^^^^^^^^
100 200
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
This trace demonstrates mastery of interpreter implementation: instruction dispatch, stack management, call frames with local variables, memory operations with bounds checking, and structured control flow with if/else branches. The step-by-step visibility into execution state proves you understand what โexecutionโ truly means.
Self-Assessment Checklist
Before considering this project complete, verify you can:
Execution Model
- Explain stack machine semantics to another developer
- Trace through a program by hand, showing stack state at each step
- Explain why WASM uses structured control flow instead of goto
- Describe what happens during a function call (frame creation, locals, return)
Implementation
- Execute the official factorial test correctly
- Handle nested blocks and loops with branches
- Bounds-check all memory accesses
- Call host functions from WASM code
- Trap on divide by zero and out-of-bounds access
Debugging
- Add a trace mode that shows instruction-by-instruction execution
- Identify and fix off-by-one errors in branch targets
- Debug stack imbalance issues
Extensions
- Describe how you would add a debugger
- Explain how JIT compilation would improve performance
- Identify which instructions are hardest to optimize
Resources
Primary References
- WebAssembly Specification ยง4 (Execution) - Formal semantics
- WebAssembly Opcode Table - Quick reference
- wasm3 Source Code - Clean, readable interpreter
Books
- Computer Systems: A Programmerโs Perspective Ch. 3 - Machine-level execution
- Engineering a Compiler Ch. 5 - Runtime environments
Videos
- How to Write a WebAssembly Interpreter - Ben Smith (Google)
- Building a WASM Runtime - Lin Clark
Tools for Testing
- WebAssembly Binary Toolkit (wabt) - wat2wasm, wasm-interp
- WebAssembly Spec Tests - Official conformance
Key Insights
The stack is temporary, frames are permanent. Values flow through the stack during expression evaluation, but function state lives in call frames. Understanding this separation is key to implementing control flow correctly.
Control flow is data flow. WASMโs structured control flow means every branch has a known destination. Thereโs no โgoto spaghettiโโthe label stack tells you exactly where each br goes.
The interpreter IS the specification. When youโve implemented all instructions correctly, you have a working definition of WebAssembly semantics. This is the deepest understanding possible.
After completing this project, youโll understand WebAssembly execution as well as anyone who hasnโt written a JIT compiler. Project 4 (compiler) will show you the other sideโgenerating WASM instead of executing it.