Project 3: Build a WASM Interpreter

Project 3: Build a WASM Interpreter

Implement a working WebAssembly virtual machine that executes bytecode instruction by instruction


Project Overview

Attribute Value
Difficulty Advanced
Time Estimate 1 month+
Languages C (primary), Rust, Go, TypeScript
Prerequisites Project 2 (Parser), recursion, data structures
Main Reference WebAssembly Specification ยง4 (Execution)
Knowledge Area Virtual Machines, Stack Machines, Runtime Systems

Learning Objectives

After completing this project, you will be able to:

  1. Implement a stack machine - Manage a value stack with push/pop operations for all WASM types
  2. Execute arithmetic and logical instructions - Handle i32/i64/f32/f64 operations correctly
  3. Manage control flow - Implement blocks, loops, br, br_if, and br_table correctly
  4. Handle function calls - Create and manage activation records (call frames)
  5. Implement linear memory - Provide bounds-checked load/store operations
  6. Resolve imports/exports - Connect WASM modules to host functions
  7. Pass the WebAssembly test suite - Verify spec conformance

Conceptual Foundation

1. What Is a WebAssembly Interpreter?

An interpreter reads WebAssembly bytecode and executes it directly, without compiling to native code. This is how you truly understand WASM:

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚                    WASM Interpreter Architecture                     โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚                                                                      โ”‚
โ”‚  .wasm file                                                          โ”‚
โ”‚      โ”‚                                                               โ”‚
โ”‚      โ–ผ                                                               โ”‚
โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”     โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”     โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”   โ”‚
โ”‚  โ”‚   Parser   โ”‚โ”€โ”€โ”€โ”€โ–ถโ”‚  Validator โ”‚โ”€โ”€โ”€โ”€โ–ถโ”‚  Execution Engine      โ”‚   โ”‚
โ”‚  โ”‚ (Project 2)โ”‚     โ”‚ (optional) โ”‚     โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”   โ”‚   โ”‚
โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜     โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜     โ”‚  โ”‚  Value Stack    โ”‚   โ”‚   โ”‚
โ”‚                                         โ”‚  โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค   โ”‚   โ”‚
โ”‚                                         โ”‚  โ”‚  Call Stack     โ”‚   โ”‚   โ”‚
โ”‚                                         โ”‚  โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค   โ”‚   โ”‚
โ”‚                                         โ”‚  โ”‚  Linear Memory  โ”‚   โ”‚   โ”‚
โ”‚                                         โ”‚  โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค   โ”‚   โ”‚
โ”‚                                         โ”‚  โ”‚  Globals        โ”‚   โ”‚   โ”‚
โ”‚                                         โ”‚  โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค   โ”‚   โ”‚
โ”‚                                         โ”‚  โ”‚  Tables         โ”‚   โ”‚   โ”‚
โ”‚                                         โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜   โ”‚   โ”‚
โ”‚                                         โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜   โ”‚
โ”‚                                                                      โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

WASM Interpreter Architecture

Why build an interpreter instead of a JIT?

  • Simplicity: No machine code generation, assembly knowledge not required
  • Portability: Your interpreter runs anywhere your implementation language runs
  • Debuggability: Easy to add tracing, breakpoints, single-stepping
  • Understanding: Youโ€™ll see exactly how every instruction works

2. The Stack Machine Execution Model

WebAssembly is a stack machine. Every operation pushes to or pops from a value stack:

Instruction Execution:

Before:     i32.const 5       i32.const 3       i32.add
Stack:      []         โ†’      [5]         โ†’     [5, 3]      โ†’     [8]
                              ^push              ^push             ^pop,pop,push

Invariants:
- Every instruction has a known effect on stack height
- Stack must be balanced at function end
- Type must match (can't add i32 to f64)

Key insight: The stack is ephemeral workspace. Function parameters and results flow through the stack, but persistent storage uses locals, globals, or memory.

3. Value Representation

WASM has only 4 numeric types (plus reference types). Your interpreter needs a tagged union:

// C representation
typedef enum {
    VAL_I32,
    VAL_I64,
    VAL_F32,
    VAL_F64,
    VAL_FUNCREF,
    VAL_EXTERNREF,
} ValueType;

typedef struct {
    ValueType type;
    union {
        int32_t i32;
        int64_t i64;
        float f32;
        double f64;
        uint32_t funcref;  // Function table index, or null (-1)
        void* externref;   // Host reference
    };
} Value;
// Rust representation
enum Value {
    I32(i32),
    I64(i64),
    F32(f32),
    F64(f64),
    FuncRef(Option<u32>),
    ExternRef(Option<Box<dyn Any>>),
}

Type safety: WASM guarantees type safety through validation. Your interpreter can trust that i32.add will always have two i32s on the stackโ€”if the module passed validation.

4. The Value Stack

The heart of execution. Implement as a growable array:

Value Stack Operations:

push(value):
  stack[sp++] = value

pop() -> Value:
  return stack[--sp]

peek(depth) -> Value:
  return stack[sp - 1 - depth]

Stack during function execution:
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ ... caller's values ...               โ”‚  โ† Below frame
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚ param 0 (local 0)                     โ”‚  โ† Frame start
โ”‚ param 1 (local 1)                     โ”‚
โ”‚ local 2                               โ”‚
โ”‚ local 3                               โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚ working stack value                   โ”‚  โ† sp points here
โ”‚ working stack value                   โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Value Stack Operations

5. Call Frames (Activation Records)

Each function call creates a frame that tracks:

  • Return address (which instruction to resume after return)
  • Local variables (parameters + declared locals)
  • Current block depth (for control flow)
Call Frame Structure:

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚  return_addr: instruction pointer          โ”‚
โ”‚  func_idx: which function is executing     โ”‚
โ”‚  locals: array of Value                    โ”‚
โ”‚  stack_base: where this frame's stack startsโ”‚
โ”‚  label_stack: for br/block/loop targets    โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Call Stack:
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚  Frame 3    โ”‚  โ† current (top)
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚  Frame 2    โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚  Frame 1    โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚  Frame 0    โ”‚  โ† entry point
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Call Frame Structure and Call Stack

6. Control Flow: The Tricky Part

WASMโ€™s control flow is structured, not arbitrary. This is the hardest part to get right:

;; Block creates a branch target
(block $outer      ;; label 0
  (block $inner    ;; label 1
    br $inner      ;; jumps to end of $inner
    unreachable    ;; never reached
  )
  ;; execution continues here after br $inner
  br $outer        ;; jumps to end of $outer
)
;; execution continues here after br $outer

Label stack model:

Entering a block/loop/if:
  push label { kind, arity, continuation_ip }

br N:
  pop N labels (unwinding)
  jump to continuation of Nth label

For blocks:  continuation = instruction after 'end'
For loops:   continuation = instruction after 'loop' (loop back)
For ifs:     continuation = instruction after 'end'

The critical difference:

  • block: br jumps forward (to end)
  • loop: br jumps backward (to start)
block semantics:          loop semantics:

  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”       โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ–ถโ”€โ”€โ”€โ”€โ”€โ”€โ”
  โ”‚  block          โ”‚       โ”‚  loop       โ”‚
  โ”‚    ...          โ”‚       โ”‚    ...      โ”‚
  โ”‚    br 0  โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”    โ”‚    br 0 โ”€โ”€โ”€โ”€โ”˜
  โ”‚    ...          โ”‚  โ”‚    โ”‚    ...
  โ”‚  end            โ”‚  โ”‚    โ”‚  end โ—€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
  โ”‚                 โ”‚โ—€โ”€โ”˜    โ”‚               โ”‚
  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜       โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Block vs Loop Semantics

7. Linear Memory

Linear memory is a bounds-checked byte array:

typedef struct {
    uint8_t* data;
    uint32_t current_pages;  // Current size in 64KB pages
    uint32_t max_pages;      // Maximum size (or UINT32_MAX if unlimited)
} Memory;

// Every load/store must bounds-check:
int32_t memory_load_i32(Memory* mem, uint32_t addr) {
    if (addr + 4 > mem->current_pages * 65536) {
        trap("out of bounds memory access");
    }
    // Little-endian read
    return *(int32_t*)(mem->data + addr);
}

void memory_store_i32(Memory* mem, uint32_t addr, int32_t value) {
    if (addr + 4 > mem->current_pages * 65536) {
        trap("out of bounds memory access");
    }
    *(int32_t*)(mem->data + addr) = value;
}

Memory instructions include offset:

i32.load offset=8
  ; effective_addr = pop() + 8
  ; push(read_i32(memory[effective_addr]))

8. The Instruction Execution Loop

The core of your interpreter is a fetch-decode-execute loop:

void execute(Module* module, uint32_t func_idx) {
    Frame* frame = create_frame(module, func_idx);

    while (true) {
        uint8_t opcode = read_byte(frame);

        switch (opcode) {
            case 0x00: // unreachable
                trap("unreachable executed");
                break;

            case 0x01: // nop
                break;

            case 0x02: // block
                parse_block_type(frame);
                push_label(frame, LABEL_BLOCK);
                break;

            case 0x03: // loop
                parse_block_type(frame);
                push_label(frame, LABEL_LOOP);
                break;

            case 0x0b: // end
                if (is_function_end(frame)) {
                    return;  // Function complete
                }
                pop_label(frame);
                break;

            case 0x0c: // br
                uint32_t depth = read_leb128(frame);
                branch_to(frame, depth);
                break;

            case 0x10: // call
                uint32_t target = read_leb128(frame);
                call_function(module, frame, target);
                break;

            case 0x20: // local.get
                uint32_t idx = read_leb128(frame);
                push(frame->locals[idx]);
                break;

            case 0x21: // local.set
                uint32_t idx = read_leb128(frame);
                frame->locals[idx] = pop();
                break;

            case 0x41: // i32.const
                int32_t val = read_sleb128(frame);
                push((Value){.type = VAL_I32, .i32 = val});
                break;

            case 0x6a: // i32.add
                Value b = pop();
                Value a = pop();
                push((Value){.type = VAL_I32, .i32 = a.i32 + b.i32});
                break;

            // ... 200+ more opcodes ...

            default:
                trap("unknown opcode: 0x%02x", opcode);
        }
    }
}

9. Import/Export Resolution

Modules connect to their environment through imports and exports:

Module Instantiation:

1. Parse module (Project 2)
2. Resolve imports:
   - For each import, find matching export from environment
   - Functions: store function pointer/index
   - Memory: link to shared memory instance
   - Globals: link to global storage
   - Tables: link to function table
3. Initialize memory with data segments
4. Initialize tables with element segments
5. Run start function if present
6. Return instance with exports accessible

Host function imports:

// User provides these during instantiation
typedef Value (*HostFunc)(Value* args, int num_args);

typedef struct {
    const char* module_name;
    const char* func_name;
    HostFunc func;
} Import;

// Example: importing console.log
Value host_console_log(Value* args, int num_args) {
    printf("WASM says: %d\n", args[0].i32);
    return (Value){0};  // No return value
}

Import imports[] = {
    {"console", "log", host_console_log},
};

10. Trap Handling

WASM can โ€œtrapโ€ on errors. Traps are unrecoverable within the module:

Trap conditions:
- unreachable instruction executed
- Integer divide by zero
- Integer overflow (i32.trunc_sat excepted)
- Out of bounds memory access
- Out of bounds table access
- Null function reference called
- Indirect call type mismatch
- Stack overflow

Implementation options:

  1. setjmp/longjmp (C): Save state, jump on trap
  2. Exceptions (C++, Rust, TypeScript): Throw on trap
  3. Result types (Rust): Return Result<Value, Trap>

Project Specification

Minimum Viable Interpreter

Your interpreter must support:

Instructions (MVP):

  • Control: unreachable, nop, block, loop, if, else, end, br, br_if, return, call
  • Parametric: drop, select
  • Variable: local.get, local.set, local.tee, global.get, global.set
  • Memory: i32.load, i32.store, memory.size, memory.grow
  • Numeric (i32 only for MVP): i32.const, i32.add, i32.sub, i32.mul, i32.div_s, i32.div_u, i32.rem_s, i32.rem_u, i32.and, i32.or, i32.xor, i32.shl, i32.shr_s, i32.shr_u, i32.eq, i32.ne, i32.lt_s, i32.lt_u, i32.gt_s, i32.gt_u, i32.le_s, i32.le_u, i32.ge_s, i32.ge_u, i32.eqz

Features:

  • Parse and execute .wasm files (reuse Project 2 parser)
  • Support module imports (host functions)
  • Linear memory with bounds checking
  • Proper control flow (blocks, loops, branches)
  • Function calls (direct only for MVP)

Input/Output Specification

# Run a WASM file
$ ./wasm-interp program.wasm
# Output: result of start function or exported "main"

# Run with arguments (WASI-style)
$ ./wasm-interp program.wasm arg1 arg2

# Export function call
$ ./wasm-interp program.wasm --call add 5 3
8

# Debug mode
$ ./wasm-interp program.wasm --trace
[0x0010] i32.const 5         stack: [5]
[0x0012] i32.const 3         stack: [5, 3]
[0x0014] i32.add             stack: [8]

Success Criteria

  1. Arithmetic test: Execute (func (param i32 i32) (result i32) local.get 0 local.get 1 i32.add) with inputs 5, 3 โ†’ output 8
  2. Factorial test: Execute recursive factorial, factorial(10) โ†’ 3628800
  3. Fibonacci test: Execute iterative fibonacci, fib(20) โ†’ 6765
  4. Memory test: Store values, read them back correctly
  5. Control flow test: Nested blocks with branches execute correctly
  6. Official test suite: Pass the WASM spec interpreter tests for core instructions

Solution Architecture

Core Data Structures

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚                       Interpreter Architecture                       โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚                                                                      โ”‚
โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”                                                      โ”‚
โ”‚  โ”‚   Module   โ”‚ โ† Parsed representation from Project 2              โ”‚
โ”‚  โ”‚  - types   โ”‚                                                      โ”‚
โ”‚  โ”‚  - funcs   โ”‚                                                      โ”‚
โ”‚  โ”‚  - exports โ”‚                                                      โ”‚
โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”˜                                                      โ”‚
โ”‚        โ”‚ instantiate                                                 โ”‚
โ”‚        โ–ผ                                                             โ”‚
โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”             โ”‚
โ”‚  โ”‚                    Instance                         โ”‚             โ”‚
โ”‚  โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”  โ”‚             โ”‚
โ”‚  โ”‚  โ”‚  Memory  โ”‚  โ”‚  Globals โ”‚  โ”‚  Function Table  โ”‚  โ”‚             โ”‚
โ”‚  โ”‚  โ”‚  (bytes) โ”‚  โ”‚ (values) โ”‚  โ”‚    (funcidx)     โ”‚  โ”‚             โ”‚
โ”‚  โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜  โ”‚             โ”‚
โ”‚  โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”    โ”‚             โ”‚
โ”‚  โ”‚  โ”‚              Execution State               โ”‚    โ”‚             โ”‚
โ”‚  โ”‚  โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”   โ”‚    โ”‚             โ”‚
โ”‚  โ”‚  โ”‚  โ”‚Value Stack โ”‚  โ”‚ Call Stack (frames) โ”‚   โ”‚    โ”‚             โ”‚
โ”‚  โ”‚  โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜   โ”‚    โ”‚             โ”‚
โ”‚  โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜    โ”‚             โ”‚
โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜             โ”‚
โ”‚                                                                      โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Interpreter Architecture - Module vs Instance

Module vs Instance

Module: The static representation parsed from .wasm

  • Types, function bodies, import/export declarations
  • Can be instantiated multiple times
  • Like a class definition

Instance: A running module with its own state

  • Own memory, globals, tables
  • Execution state (stack, frames)
  • Like a class instance

Suggested File Organization

wasm-interp/
โ”œโ”€โ”€ src/
โ”‚   โ”œโ”€โ”€ main.c              # Entry point, CLI handling
โ”‚   โ”œโ”€โ”€ types.h             # Value, Module, Instance types
โ”‚   โ”œโ”€โ”€ parser.c            # From Project 2
โ”‚   โ”œโ”€โ”€ parser.h
โ”‚   โ”œโ”€โ”€ validator.c         # Optional: type checking
โ”‚   โ”œโ”€โ”€ validator.h
โ”‚   โ”œโ”€โ”€ instance.c          # Module instantiation
โ”‚   โ”œโ”€โ”€ instance.h
โ”‚   โ”œโ”€โ”€ exec.c              # Instruction execution
โ”‚   โ”œโ”€โ”€ exec.h
โ”‚   โ”œโ”€โ”€ memory.c            # Linear memory operations
โ”‚   โ”œโ”€โ”€ memory.h
โ”‚   โ”œโ”€โ”€ stack.c             # Value stack operations
โ”‚   โ”œโ”€โ”€ stack.h
โ”‚   โ”œโ”€โ”€ host.c              # Host function imports
โ”‚   โ””โ”€โ”€ host.h
โ”œโ”€โ”€ tests/
โ”‚   โ”œโ”€โ”€ arithmetic.wat
โ”‚   โ”œโ”€โ”€ factorial.wat
โ”‚   โ”œโ”€โ”€ memory.wat
โ”‚   โ”œโ”€โ”€ control_flow.wat
โ”‚   โ””โ”€โ”€ run_tests.sh
โ”œโ”€โ”€ Makefile
โ””โ”€โ”€ README.md

Implementation Guide

Phase 1: Foundation (Days 1-3)

Goal: Execute the simplest possible program

;; Target: Execute this
(module
  (func (export "answer") (result i32)
    i32.const 42
  )
)

Steps:

  1. Define Value type (just i32 for now)
  2. Implement value stack (push, pop)
  3. Implement minimal execution loop:
    • i32.const: push immediate value
    • end: return top of stack
  4. Call exported function and print result

Checkpoint: Running the above returns 42.

Phase 2: Arithmetic (Days 4-7)

Goal: Execute arithmetic expressions

(module
  (func (export "add") (param i32 i32) (result i32)
    local.get 0
    local.get 1
    i32.add
  )
)

Steps:

  1. Implement local.get with parameters
  2. Add local.set, local.tee
  3. Implement i32 arithmetic: add, sub, mul, div_s, div_u
  4. Implement i32 comparisons: eq, ne, lt_s, gt_s, etc.
  5. Add i32.eqz (equals zero)

Checkpoint: Can evaluate any arithmetic expression.

Phase 3: Control Flow (Days 8-14)

Goal: Implement blocks, loops, and branches

(module
  (func (export "sum_to") (param $n i32) (result i32)
    (local $sum i32)
    (local $i i32)

    (block $exit
      (loop $continue
        ;; if i > n, exit
        local.get $i
        local.get $n
        i32.gt_s
        br_if $exit

        ;; sum += i
        local.get $sum
        local.get $i
        i32.add
        local.set $sum

        ;; i++
        local.get $i
        i32.const 1
        i32.add
        local.set $i

        br $continue
      )
    )

    local.get $sum
  )
)

Steps:

  1. Implement label stack
  2. Implement block (push label, continuation at end)
  3. Implement loop (push label, continuation at start)
  4. Implement br (branch to label)
  5. Implement br_if (conditional branch)
  6. Implement if/else/end (two branches)
  7. Handle block return values (arity)

Checkpoint: Loops and conditionals work correctly.

Phase 4: Function Calls (Days 15-21)

Goal: Call other functions, including recursion

(module
  (func $factorial (param $n i32) (result i32)
    local.get $n
    i32.const 2
    i32.lt_s
    if (result i32)
      i32.const 1
    else
      local.get $n
      local.get $n
      i32.const 1
      i32.sub
      call $factorial
      i32.mul
    end
  )

  (export "factorial" (func $factorial))
)

Steps:

  1. Implement call stack (frame creation, management)
  2. Implement call instruction (push frame, jump)
  3. Implement return (pop frame, return value)
  4. Handle function end (implicit return)
  5. Test recursion (factorial, fibonacci)

Checkpoint: Recursive factorial works.

Phase 5: Memory (Days 22-28)

Goal: Implement linear memory operations

(module
  (memory 1)  ;; 1 page = 64KB

  (func (export "store_load") (param $val i32) (result i32)
    i32.const 0    ;; address
    local.get $val
    i32.store      ;; memory[0] = val

    i32.const 0
    i32.load       ;; return memory[0]
  )
)

Steps:

  1. Allocate memory buffer during instantiation
  2. Implement memory.size (return current pages)
  3. Implement memory.grow (add pages, return old size)
  4. Implement i32.load, i32.store with bounds checking
  5. Add all load/store variants (8-bit, 16-bit, signed/unsigned)
  6. Handle data sections (initialize memory on load)

Checkpoint: Store/load values correctly, bounds checking works.

Phase 6: Imports (Days 29-35)

Goal: Call host functions from WASM

(module
  (import "console" "log" (func $log (param i32)))

  (func (export "main")
    i32.const 42
    call $log
  )
)

Steps:

  1. Parse import section
  2. Create import resolution table
  3. Allow user to register host functions
  4. Distinguish imported vs local function indices
  5. Implement call for imported functions

Checkpoint: Can call console.log from WASM.

Phase 7: Completeness (Week 5+)

Goal: Full spec compliance

Steps:

  1. Add i64, f32, f64 instructions
  2. Implement globals
  3. Implement tables and call_indirect
  4. Add remaining instructions (select, memory.copy, etc.)
  5. Run official WebAssembly test suite
  6. Fix edge cases and bugs

Checkpoint: Pass official spec tests.


Testing Strategy

Unit Tests

Test each instruction in isolation:

// test_i32_add.c
void test_i32_add() {
    // Setup
    Stack stack = create_stack();
    push(&stack, (Value){VAL_I32, .i32 = 5});
    push(&stack, (Value){VAL_I32, .i32 = 3});

    // Execute
    exec_i32_add(&stack);

    // Verify
    Value result = pop(&stack);
    assert(result.type == VAL_I32);
    assert(result.i32 == 8);
    assert(stack_height(&stack) == 0);
}

Integration Tests

Test complete programs:

# tests/run_tests.sh
wat2wasm tests/arithmetic.wat -o /tmp/arith.wasm
result=$(./wasm-interp /tmp/arith.wasm --call add 5 3)
[ "$result" = "8" ] || fail "add test"

result=$(./wasm-interp /tmp/arith.wasm --call factorial 10)
[ "$result" = "3628800" ] || fail "factorial test"

Spec Conformance Tests

The WebAssembly spec includes thousands of tests:

# Clone spec tests
git clone https://github.com/WebAssembly/spec.git
cd spec/test/core

# Run your interpreter against them
for f in *.wast; do
    ./wasm-interp --test "$f" || echo "FAIL: $f"
done

Tracing for Debugging

Add verbose mode to see execution:

$ ./wasm-interp --trace factorial.wasm --call factorial 5
[call] factorial(5)
  [0x10] local.get 0          stack: [5]
  [0x12] i32.const 2          stack: [5, 2]
  [0x14] i32.lt_s             stack: [0]
  [0x15] if                   (false branch)
  [0x20] local.get 0          stack: [5]
  [0x22] local.get 0          stack: [5, 5]
  [0x24] i32.const 1          stack: [5, 5, 1]
  [0x26] i32.sub              stack: [5, 4]
  [0x27] call factorial
    [call] factorial(4)
    ...

Common Pitfalls

1. Block Arity Confusion

Blocks can return values! A block with type (result i32) must leave exactly one i32 on the stack:

;; Block returns a value
(block (result i32)
  i32.const 42
)
;; Stack now has 42

;; Error: stack imbalance
(block (result i32)
  i32.const 1
  i32.const 2   ;; Extra value!
)

2. Loop vs Block Branch Semantics

This is the most common bug:

;; BLOCK: br jumps FORWARD (to end)
(block $exit
  br $exit   ;; jumps here โ”€โ”€โ”
  unreachable                โ”‚
)                        ;; โ—€โ”€โ”˜

;; LOOP: br jumps BACKWARD (to start)
(loop $continue       ;; โ—€โ”€โ”
  br $continue   ;; โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
)

3. Function Index Offset

If you have imports, local function indices are offset:

Import 0: (import "env" "print")
Import 1: (import "env" "read")
Function 0: (func ...)   โ† Index is 2, not 0!
Function 1: (func ...)   โ† Index is 3

call 0  โ†’ calls Import 0
call 2  โ†’ calls Function 0

4. Memory Alignment Traps

WASM doesnโ€™t trap on unaligned access (itโ€™s just a hint), but you must handle it:

;; Unaligned is valid
i32.const 3
i32.const 42
i32.store align=1  ;; Works, just slower on some hardware

5. Integer Overflow Behavior

Division and remainder have edge cases:

i32.div_s: MIN_INT / -1 โ†’ trap (overflow)
i32.div_u: any / 0 โ†’ trap
i32.rem_s: MIN_INT % -1 โ†’ 0 (not trap!)

6. Stack Underflow

Always check before popping:

Value pop(Stack* s) {
    if (s->sp == 0) {
        trap("stack underflow");
    }
    return s->data[--s->sp];
}

Extensions

1. Add a Debugger

Implement interactive debugging:

(wdb) break factorial
Breakpoint 1 at func $factorial

(wdb) run
Breakpoint 1 hit

(wdb) stack
[0]: i32 5

(wdb) locals
$n: i32 5

(wdb) step
[0x12] i32.const 2

(wdb) continue

2. Add Profiling

Track instruction counts and execution time:

Execution profile:
  i32.add:       15234 calls (12.3%)
  local.get:     45123 calls (36.7%)
  br_if:          8234 calls ( 6.7%)
  call:           2341 calls ( 1.9%)
  ...

Hot functions:
  $factorial:    8234 calls, 45ms total
  $fib:          2341 calls, 89ms total

3. Implement Tail Calls

The tail-call proposal optimizes recursion:

;; With tail calls, this uses O(1) stack
(func $factorial_tail (param $n i32) (param $acc i32) (result i32)
  local.get $n
  i32.eqz
  if (result i32)
    local.get $acc
  else
    local.get $n
    i32.const 1
    i32.sub
    local.get $n
    local.get $acc
    i32.mul
    return_call $factorial_tail  ;; Tail call!
  end
)

4. Add i64/f32/f64 Support

Extend your value type and add the full instruction set:

  • i64: 64-bit integers (same ops as i32)
  • f32: IEEE 754 single precision
  • f64: IEEE 754 double precision
  • Conversions: i32.trunc_f32_s, f64.convert_i32_u, etc.

5. Implement Tables and call_indirect

Tables enable function pointers and dynamic dispatch:

(table 10 funcref)
(elem (i32.const 0) $func0 $func1 $func2)

(func $dispatch (param $idx i32) (result i32)
  local.get $idx
  call_indirect (type $sig)
)

Real-World Connections

How Production Runtimes Work

wasmtime (Bytecode Alliance):

  • Uses Cranelift for JIT compilation
  • Your interpreter helps understand what Cranelift compiles to

wasm3 (high-performance interpreter):

  • Threaded interpretation (computed goto)
  • Study their source as reference

V8 (Chrome):

  • Liftoff: baseline compiler (fast startup)
  • TurboFan: optimizing compiler (fast execution)
  • Your interpreter is like Liftoff without machine code

Edge Computing

Cloudflare Workers, Fastly Compute@Edge, and AWS Lambda all use WASM:

  • Cold start in microseconds
  • Strong isolation between tenants
  • Your interpreter demonstrates the isolation model

Blockchain VMs

Ethereum 2.0 (ewasm), Polkadot, Near Protocol use WASM:

  • Deterministic execution across nodes
  • Gas metering (add instruction counting!)
  • Your interpreter could be extended with gas limits

The Core Question Youโ€™re Answering

โ€œHow does a virtual machine execute bytecode instruction-by-instruction, and what does โ€˜executionโ€™ really mean at the lowest level?โ€

When you write i32.add in WebAssembly, what actually happens? A compiler translates that to binary opcode 0x6a. But execution is not reading bytesโ€”it is transformation of state. Your interpreter must answer: What changes? The value stack loses two elements and gains one. The instruction pointer advances. Nothing else in the universe of your VM changes.

This is the deep insight: execution is state transition. Every instruction is a tiny function from machine state to machine state. The entire complexity of a programโ€”recursion, memory allocation, control flowโ€”reduces to โ€œread byte, change state, repeat.โ€ When you implement this yourself, the abstraction falls away. You see that there is no magic in computers, only meticulously defined transformations applied billions of times per second.

Understanding this changes how you think about all code. Every high-level construct youโ€™ve ever usedโ€”classes, closures, async/awaitโ€”eventually becomes a sequence of these primitive state changes. Building an interpreter teaches you to see through the abstraction.


Concepts You Must Understand First

Before implementing an interpreter, ensure you have solid grounding in these foundational concepts:

Concept Why It Matters Where to Learn
Virtual Machine Architecture Youโ€™re building a VMโ€”understand what that means. How does a VM differ from physical hardware? What are the tradeoffs of interpretation vs compilation? Computer Systems: A Programmerโ€™s Perspective Ch. 3.7
Stack-Based Execution Model WASM is a stack machine. Every instruction implicitly operates on a stack. You must internalize push/pop semantics before implementing anything. Engineering a Compiler by Cooper & Torczon, Ch. 7.2
Instruction Dispatch Techniques How do you go from opcode byte to handler code? Switch statements work but are slow. Computed gotos and threaded dispatch can improve performance 7-15%. Intel WAMR Technical Article
Call Frames and Activation Records When a function calls another function, where do parameters and locals live? How do you track where to return? The call stack is separate from the value stack. Low-Level Programming by Igor Zhirkov, Ch. 12
Memory Bounds Checking Every memory access must be validated. An off-by-one error here is a security vulnerability. Understand how to check efficiently without killing performance. WebAssembly Specification ยง4.4
Type Validation at Runtime WASM is typed. Even in an interpreter, you may want to validate that i32.add receives two i32s (for debugging). The spec guarantees type safety if validation passed. WebAssembly Specification ยง3 (Validation)

Note on threaded interpretation: The wasm3 interpreter uses a โ€œcontinuation-passing styleโ€ threaded dispatch that avoids the overhead of a central switch statement. Each instruction handler jumps directly to the next handler. This technique, discussed since the 1970s, shows significant performance benefits on modern CPUs with branch prediction.


Questions to Guide Your Design

Before writing code, think through these design questions. Your answers will shape your implementation:

Value Stack Representation

  • How will you represent the value stack? A fixed-size array? A growable vector? Whatโ€™s the maximum depth?
  • How do you handle the tagged union problem? Each value has a type (i32, i64, f32, f64). Do you use a struct with a type tag, or separate arrays per type?
  • What happens on stack overflow? WASM modules can have deeply nested expressions. Your stack needs a limit.

Function Calls and Returns

  • How do you represent a call frame? What data does each frame need: return address, locals, stack base?
  • Where do parameters come from? When calling a function, parameters are on the value stack. They become the calleeโ€™s locals.
  • How do you handle the return value? The calleeโ€™s result must end up on the callerโ€™s stack. How does unwinding work?
  • What about recursion depth? Recursive functions can exhaust your call stack. How do you trap on stack overflow?

Control Flow Implementation

  • How do you implement block, loop, and if? These create branch targets (labels). What data structure tracks them?
  • What is the label stack? Each block pushes a label with its continuation address. br N pops N labels and jumps.
  • Why is loop different from block? Blockโ€™s br jumps forward (to end). Loopโ€™s br jumps backward (to start). This is the most common bug.
  • How do you handle br_table? A multi-way branch with a default. This is the hardest control flow instruction.

Linear Memory Management

  • How do you allocate memory? WASM memory is sized in 64KB pages. Do you allocate all at once or grow lazily?
  • How do you bounds-check efficiently? Every load/store must check. Can you avoid redundant checks?
  • How do you handle memory.grow? The memory can expand. Existing data must be preserved. What if growth fails?

Thinking Exercise

Before implementing your interpreter, hand-trace this program through your proposed data structures:

(module
  (func $add_three (param $a i32) (param $b i32) (param $c i32) (result i32)
    local.get $a
    local.get $b
    i32.add
    local.get $c
    i32.add
  )

  (func $main (result i32)
    i32.const 10
    i32.const 20
    i32.const 12
    call $add_three
  )

  (export "main" (func $main))
)

Trace the execution step by step. For each instruction, write:

  1. The instruction pointer (which instruction)
  2. The value stack contents (before and after)
  3. The call stack contents (active frames)
  4. Local variable values for each frame

Expected trace format:

=== Calling $main ===
Frame 0: $main, locals: [], return: (exit)

IP: 0 | i32.const 10       | Stack: []           -> [10]
IP: 1 | i32.const 20       | Stack: [10]         -> [10, 20]
IP: 2 | i32.const 12       | Stack: [10, 20]     -> [10, 20, 12]
IP: 3 | call $add_three    | Creating new frame...

=== Calling $add_three ===
Frame 1: $add_three, locals: [$a=10, $b=20, $c=12], return: Frame 0, IP 4
  (Note: 10, 20, 12 were popped from caller's stack and became locals)

IP: 0 | local.get $a       | Stack: []           -> [10]
IP: 1 | local.get $b       | Stack: [10]         -> [10, 20]
IP: 2 | i32.add            | Stack: [10, 20]     -> [30]
IP: 3 | local.get $c       | Stack: [30]         -> [30, 12]
IP: 4 | i32.add            | Stack: [30, 12]     -> [42]
IP: 5 | end                | Return 42 to caller, pop frame

=== Returning to $main ===
Frame 0: $main, stack now [42] (return value pushed)

IP: 4 | end                | Return 42, execution complete
Result: 42

Complete this trace yourself before implementing. If you can do this by hand, you understand the execution model.


The Interview Questions Theyโ€™ll Ask

If you build a WASM interpreter, expect these questions in systems programming interviews:

Fundamental Understanding

  1. โ€œExplain the difference between a stack machine and a register machine.โ€
    • Stack machines use implicit operands (pop, compute, push). Register machines use explicit operand names.
    • WASM is a stack machine. x86 is a register machine. JVM bytecode is stack-based.
    • Tradeoff: stack machines have simpler encoding, register machines have better performance.
  2. โ€œHow does your interpreter handle function calls?โ€
    • Be ready to explain: frame creation, parameter passing via stack, local storage, return value handling, stack unwinding.
  3. โ€œWhat makes WebAssemblyโ€™s control flow โ€˜structuredโ€™?โ€
    • No arbitrary goto. All branches target block/loop/if labels. This enables validation and optimization.
    • You should be able to explain why this is a security feature.

Implementation Details

  1. โ€œHow do you dispatch instructions efficiently?โ€
    • Discuss switch statement (baseline), computed goto (faster), threaded interpretation (fastest).
    • Mention that wasm3 uses threaded dispatch and achieves high performance.
  2. โ€œHow do you implement br_table?โ€
    • Itโ€™s a jump table. Index into array of label depths, branch to that label.
    • What if index is out of bounds? Use the default label.
  3. โ€œHow do you handle memory bounds checking?โ€
    • Every load/store: if (addr + size > memory_size) trap().
    • Discuss performance: hoisting checks, eliminating redundant checks.

Architecture Questions

  1. โ€œWhatโ€™s the difference between a Module and an Instance?โ€
    • Module: static, parsed from .wasm, can be instantiated multiple times.
    • Instance: runtime state, owns memory/globals/tables, has execution context.
  2. โ€œHow would you add debugging support?โ€
    • Breakpoints: check instruction address before executing.
    • Single-stepping: execute one instruction, wait.
    • Stack inspection: expose frame and value stack state.
  3. โ€œHow would you extend this to a JIT compiler?โ€
    • Instead of interpreting opcodes, generate native machine code.
    • Still need runtime for memory, calls, traps.
    • Mention Cranelift (wasmtime), Lightbeam (deprecated), Liftoff (V8).

Hints in Layers

Stuck on implementation? Read only the hint level you need:

Challenge: Instruction Dispatch is Slow

Hint Level 1 (Conceptual): The obvious approach uses a switch statement with 200+ cases. Every instruction pays the cost of the switch.

Hint Level 2 (Direction): Look up โ€œcomputed gotoโ€ or โ€œthreaded interpretation.โ€ The idea is to jump directly from one handler to the next.

Hint Level 3 (Specific): In C/GCC, use &&label to get the address of a label, and goto *address to jump to it. Store handler addresses in an array indexed by opcode.

Hint Level 4 (Code):

void* dispatch_table[256];
dispatch_table[0x6a] = &&op_i32_add;
dispatch_table[0x41] = &&op_i32_const;
// ...

#define DISPATCH() goto *dispatch_table[*ip++]

op_i32_add:
    stack[sp-2] += stack[sp-1]; sp--;
    DISPATCH();

op_i32_const:
    stack[sp++] = read_leb128(&ip);
    DISPATCH();

Challenge: Control Flow Labels

Hint Level 1 (Conceptual): Blocks, loops, and ifs create branch targets. You need to track where each br N should jump to.

Hint Level 2 (Direction): Maintain a โ€œlabel stackโ€ separate from the value stack. Push a label when entering block/loop/if. Pop on end.

Hint Level 3 (Specific): Each label needs: (1) its type (block/loop/if), (2) its continuation address, (3) its arity (number of return values).

Hint Level 4 (Code):

typedef struct {
    LabelKind kind;      // BLOCK, LOOP, IF
    uint8_t* continuation; // Where to jump on br
    int arity;           // Number of return values
} Label;

// For block: continuation = address after 'end'
// For loop: continuation = address after 'loop' (loops back!)

Challenge: Function Call Mechanics

Hint Level 1 (Conceptual): When you call a function, you need to remember where to return.

Hint Level 2 (Direction): Create a call frame that stores the return address, the base of the value stack, and local variables.

Hint Level 3 (Specific): Pop N arguments from callerโ€™s stack, copy to calleeโ€™s locals. On return, push results back to callerโ€™s stack.

Hint Level 4 (Code):

void call_function(VM* vm, uint32_t func_idx) {
    Function* func = &vm->module->functions[func_idx];
    Frame* frame = push_frame(vm);

    frame->func = func;
    frame->return_ip = vm->ip;
    frame->stack_base = vm->sp - func->num_params;

    // Parameters become first N locals
    for (int i = 0; i < func->num_params; i++) {
        frame->locals[i] = vm->stack[frame->stack_base + i];
    }
    vm->sp = frame->stack_base; // Pop params from caller

    // Additional locals initialized to zero
    for (int i = func->num_params; i < func->num_locals; i++) {
        frame->locals[i] = (Value){0};
    }

    vm->ip = func->code_start;
}

Books That Will Help

Book Relevant Chapters What Youโ€™ll Learn
Low-Level Programming by Igor Zhirkov Ch. 12: Virtual Machines & Interpreters How to structure an interpreter, dispatch loops, frame management. This chapter walks through building a simple VM from scratch.
Computer Systems: A Programmerโ€™s Perspective by Bryant & Oโ€™Hallaron Ch. 3.7: Procedures How real hardware handles function calls, stack frames, calling conventions. Understanding this helps you design WASM frames.
Engineering a Compiler by Cooper & Torczon Ch. 7.2: Stack Machines, Ch. 5: Runtime Environments Academic treatment of stack-based execution, activation records, and runtime support. Essential for understanding the theory.
Crafting Interpreters by Robert Nystrom Part II: Bytecode VM (Chapters 14-24) Complete walkthrough of building a bytecode interpreter. Different language (Lox, not WASM), but same concepts. Free online at craftinginterpreters.com.
Writing An Interpreter In Go by Thorsten Ball Entire book Practical, hands-on interpreter construction. Good for understanding the end-to-end process before specializing to WASM.

Real-World Outcome

After completing this project, you should be able to run your interpreter and see output like this:

$ ./wasm-interp factorial.wasm --call factorial 10 --trace

=== WebAssembly Interpreter v1.0 ===
Module: factorial.wasm (1847 bytes)
  Functions: 1
  Memory: 0 pages
  Exports: factorial (func 0)

Calling: factorial(10)
โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€

[FRAME 0] factorial | locals: [$n=10] | stack_base: 0
โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
  0x0010 โ”‚ local.get 0        โ”‚ Stack: []                โ†’ [10]
  0x0012 โ”‚ i32.const 2        โ”‚ Stack: [10]              โ†’ [10, 2]
  0x0014 โ”‚ i32.lt_s           โ”‚ Stack: [10, 2]           โ†’ [0]
  0x0015 โ”‚ if (result i32)    โ”‚ Stack: [0]               โ†’ []
         โ”‚   condition: false, taking else branch
  0x001A โ”‚ local.get 0        โ”‚ Stack: []                โ†’ [10]
  0x001C โ”‚ local.get 0        โ”‚ Stack: [10]              โ†’ [10, 10]
  0x001E โ”‚ i32.const 1        โ”‚ Stack: [10, 10]          โ†’ [10, 10, 1]
  0x0020 โ”‚ i32.sub            โ”‚ Stack: [10, 10, 1]       โ†’ [10, 9]
  0x0021 โ”‚ call 0             โ”‚ Stack: [10, 9]           โ†’ [10]
         โ”‚   pushing new frame for factorial(9)

[FRAME 1] factorial | locals: [$n=9] | stack_base: 1
โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
  0x0010 โ”‚ local.get 0        โ”‚ Stack: []                โ†’ [9]
  0x0012 โ”‚ i32.const 2        โ”‚ Stack: [9]               โ†’ [9, 2]
  0x0014 โ”‚ i32.lt_s           โ”‚ Stack: [9, 2]            โ†’ [0]
  0x0015 โ”‚ if (result i32)    โ”‚ condition: false, taking else branch
  0x001A โ”‚ local.get 0        โ”‚ Stack: []                โ†’ [9]
  0x001C โ”‚ local.get 0        โ”‚ Stack: [9]               โ†’ [9, 9]
  0x001E โ”‚ i32.const 1        โ”‚ Stack: [9, 9]            โ†’ [9, 9, 1]
  0x0020 โ”‚ i32.sub            โ”‚ Stack: [9, 9, 1]         โ†’ [9, 8]
  0x0021 โ”‚ call 0             โ”‚ pushing new frame for factorial(8)

  ... [frames 2-8 elided for brevity] ...

[FRAME 9] factorial | locals: [$n=1] | stack_base: 9
โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
  0x0010 โ”‚ local.get 0        โ”‚ Stack: []                โ†’ [1]
  0x0012 โ”‚ i32.const 2        โ”‚ Stack: [1]               โ†’ [1, 2]
  0x0014 โ”‚ i32.lt_s           โ”‚ Stack: [1, 2]            โ†’ [1]
  0x0015 โ”‚ if (result i32)    โ”‚ condition: true, taking then branch
  0x0017 โ”‚ i32.const 1        โ”‚ Stack: []                โ†’ [1]
  0x0019 โ”‚ end                โ”‚ returning from if with [1]
  0x0030 โ”‚ end                โ”‚ returning from function with 1
         โ”‚   popping frame, result: 1

[FRAME 8] factorial | resuming after call
โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
  0x0022 โ”‚ i32.mul            โ”‚ Stack: [2, 1]            โ†’ [2]
  0x0023 โ”‚ end                โ”‚ returning from if with [2]
  0x0030 โ”‚ end                โ”‚ returning from function with 2
         โ”‚   popping frame, result: 2

[FRAME 7] factorial | resuming after call
โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
  0x0022 โ”‚ i32.mul            โ”‚ Stack: [3, 2]            โ†’ [6]
  ...

[FRAME 0] factorial | resuming after call
โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
  0x0022 โ”‚ i32.mul            โ”‚ Stack: [10, 362880]      โ†’ [3628800]
  0x0023 โ”‚ end                โ”‚ returning from if with [3628800]
  0x0030 โ”‚ end                โ”‚ returning from function with 3628800

โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•
EXECUTION COMPLETE
โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
Result: 3628800 (i32)
Instructions executed: 127
Call depth (max): 10
Memory usage: 0 bytes
Execution time: 0.23ms
โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•

$ ./wasm-interp memory_test.wasm --call store_and_sum --trace

=== WebAssembly Interpreter v1.0 ===
Module: memory_test.wasm (423 bytes)
  Functions: 1
  Memory: 1 page (65536 bytes)
  Exports: store_and_sum (func 0)

Calling: store_and_sum()
โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€

[FRAME 0] store_and_sum | locals: [] | stack_base: 0
โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
  0x0010 โ”‚ i32.const 0        โ”‚ Stack: []                โ†’ [0]
  0x0012 โ”‚ i32.const 100      โ”‚ Stack: [0]               โ†’ [0, 100]
  0x0014 โ”‚ i32.store          โ”‚ Stack: [0, 100]          โ†’ []
         โ”‚   memory[0x0000..0x0003] = 100 (0x64)
  0x0016 โ”‚ i32.const 4        โ”‚ Stack: []                โ†’ [4]
  0x0018 โ”‚ i32.const 200      โ”‚ Stack: [4]               โ†’ [4, 200]
  0x001A โ”‚ i32.store          โ”‚ Stack: [4, 200]          โ†’ []
         โ”‚   memory[0x0004..0x0007] = 200 (0xc8)
  0x001C โ”‚ i32.const 0        โ”‚ Stack: []                โ†’ [0]
  0x001E โ”‚ i32.load           โ”‚ Stack: [0]               โ†’ [100]
         โ”‚   loaded 100 from memory[0x0000..0x0003]
  0x0020 โ”‚ i32.const 4        โ”‚ Stack: [100]             โ†’ [100, 4]
  0x0022 โ”‚ i32.load           โ”‚ Stack: [100, 4]          โ†’ [100, 200]
         โ”‚   loaded 200 from memory[0x0004..0x0007]
  0x0024 โ”‚ i32.add            โ”‚ Stack: [100, 200]        โ†’ [300]
  0x0025 โ”‚ end                โ”‚ returning 300

โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•
EXECUTION COMPLETE
โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
Result: 300 (i32)
Instructions executed: 11
Memory:
  0x0000: 64 00 00 00 c8 00 00 00  00 00 00 00 00 00 00 00
          ^^^^^^^^^^^^ ^^^^^^^^^^^^
          100          200
โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•

This trace demonstrates mastery of interpreter implementation: instruction dispatch, stack management, call frames with local variables, memory operations with bounds checking, and structured control flow with if/else branches. The step-by-step visibility into execution state proves you understand what โ€œexecutionโ€ truly means.


Self-Assessment Checklist

Before considering this project complete, verify you can:

Execution Model

  • Explain stack machine semantics to another developer
  • Trace through a program by hand, showing stack state at each step
  • Explain why WASM uses structured control flow instead of goto
  • Describe what happens during a function call (frame creation, locals, return)

Implementation

  • Execute the official factorial test correctly
  • Handle nested blocks and loops with branches
  • Bounds-check all memory accesses
  • Call host functions from WASM code
  • Trap on divide by zero and out-of-bounds access

Debugging

  • Add a trace mode that shows instruction-by-instruction execution
  • Identify and fix off-by-one errors in branch targets
  • Debug stack imbalance issues

Extensions

  • Describe how you would add a debugger
  • Explain how JIT compilation would improve performance
  • Identify which instructions are hardest to optimize

Resources

Primary References

Books

  • Computer Systems: A Programmerโ€™s Perspective Ch. 3 - Machine-level execution
  • Engineering a Compiler Ch. 5 - Runtime environments

Videos

Tools for Testing


Key Insights

The stack is temporary, frames are permanent. Values flow through the stack during expression evaluation, but function state lives in call frames. Understanding this separation is key to implementing control flow correctly.

Control flow is data flow. WASMโ€™s structured control flow means every branch has a known destination. Thereโ€™s no โ€œgoto spaghettiโ€โ€”the label stack tells you exactly where each br goes.

The interpreter IS the specification. When youโ€™ve implemented all instructions correctly, you have a working definition of WebAssembly semantics. This is the deepest understanding possible.


After completing this project, youโ€™ll understand WebAssembly execution as well as anyone who hasnโ€™t written a JIT compiler. Project 4 (compiler) will show you the other sideโ€”generating WASM instead of executing it.