Project 4: Compile a Language to WASM
Project 4: Compile a Language to WASM
Build a compiler that translates a simple programming language into WebAssembly binary format
Project Overview
| Attribute | Value |
|---|---|
| Difficulty | Advanced |
| Time Estimate | 1 month+ |
| Languages | C (primary), Rust, Go, TypeScript |
| Prerequisites | Projects 1-2, basic parsing knowledge |
| Main Reference | Engineering a Compiler by Cooper & Torczon |
| Knowledge Area | Compilers, Code Generation, Language Design |
Learning Objectives
After completing this project, you will be able to:
- Design a simple language - Define syntax and semantics for a compilable language
- Build a lexer and parser - Transform source text into an Abstract Syntax Tree (AST)
- Generate stack machine code - Emit WASM instructions from high-level constructs
- Emit valid WASM binaries - Produce correct .wasm files with proper structure
- Implement type checking - Ensure programs match WASMโs type constraints
- Handle control flow compilation - Transform if/while/for into WASM blocks and branches
- Understand compiler architecture - See how real compilers target WASM
Conceptual Foundation
1. Why Build a Compiler to WASM?
Compiling to WASM gives you the โproduction viewโ of WebAssembly:
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ The Compiler Writer's Perspective โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค
โ โ
โ Source Code AST IR .wasm โ
โ โโโโโโโโโโโ โโโโโโโโโโ โโโโโโโโโโ โโโโโโโโโโ โ
โ func add(a, b) FuncDecl f0: [i32,i32] 00 61 73 6d โ
โ return a + b โparams โ [i32] 01 00 00 00 โ
โ โ โ a: i32 local.get 0 01 07 01 60 โ
โ โ โ b: i32 local.get 1 ... โ
โ โbody i32.add โ
โ โ BinOp(+) end โ
โ โ a โ
โ โ b โ
โ โ
โ You learn: โ
โ โข Why WASM has certain constraints (they make compilation easy!) โ
โ โข Why structured control flow matters (no goto = simpler codegen) โ
โ โข Why only 4 types (uniform representation, easy type checking) โ
โ โข Why stack machine (postfix order matches evaluation order) โ
โ โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ

Real-world impact: Every language that targets WASM (C/C++, Rust, Go, AssemblyScript, Grain) goes through this process. Youโre learning what LLVM does.
2. The Source Language: MiniLang
Design a small but complete language. Hereโs a suggested specification:
MiniLang Grammar (EBNF):
program = { function } ;
function = "func" IDENT "(" params ")" [ ":" type ] block ;
params = [ param { "," param } ] ;
param = IDENT ":" type ;
type = "i32" | "i64" | "f32" | "f64" | "void" ;
block = "{" { statement } "}" ;
statement = var_decl | assignment | if_stmt | while_stmt
| return_stmt | expr_stmt ;
var_decl = "let" IDENT [ ":" type ] "=" expr ";" ;
assignment = IDENT "=" expr ";" ;
if_stmt = "if" expr block [ "else" block ] ;
while_stmt = "while" expr block ;
return_stmt = "return" [ expr ] ";" ;
expr_stmt = expr ";" ;
expr = comparison ;
comparison = addition { ( "==" | "!=" | "<" | ">" | "<=" | ">=" ) addition } ;
addition = term { ( "+" | "-" ) term } ;
term = factor { ( "*" | "/" | "%" ) factor } ;
factor = unary ;
unary = [ "-" | "!" ] primary ;
primary = NUMBER | IDENT | call | "(" expr ")" ;
call = IDENT "(" [ expr { "," expr } ] ")" ;
IDENT = letter { letter | digit | "_" } ;
NUMBER = digit { digit } [ "." digit { digit } ] ;
Example MiniLang program:
func factorial(n: i32): i32 {
if n < 2 {
return 1;
}
return n * factorial(n - 1);
}
func main(): i32 {
let result: i32 = factorial(10);
return result;
}
3. Compiler Pipeline Overview
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Compiler Pipeline โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค
โ โ
โ Source Tokens AST WASM โ
โ โโโโโโโ โโโโโโโ โโโโโโโ โโโโโโโ โ
โ โ.miniโ โโโโโโโถโTokenโ โโโโโโโถโ AST โ โโโโโโโถโ.wasmโ โ
โ โfile โ Lexer โList โ Parser โTree โ CodeGenโfile โ โ
โ โโโโโโโ โโโโโโโ โโโโโโโ โโโโโโโ โ
โ โ โ
โ โผ โ
โ Type Check โ
โ (optional but โ
โ recommended) โ
โ โ
โ Phase Details: โ
โ โโโโโโโโโโโโโ โ
โ Lexer: "func add" โ [FUNC, IDENT("add")] โ
โ Parser: Tokens โ FuncDecl { name: "add", params: [...], body } โ
โ TypeChk: Verify types match, infer where needed โ
โ CodeGen: AST โ WASM instructions โ Binary encoding โ
โ โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ

4. The Lexer (Tokenizer)
Convert character stream to token stream:
typedef enum {
// Keywords
TOK_FUNC, TOK_LET, TOK_IF, TOK_ELSE, TOK_WHILE, TOK_RETURN,
TOK_I32, TOK_I64, TOK_F32, TOK_F64, TOK_VOID,
// Symbols
TOK_LPAREN, TOK_RPAREN, TOK_LBRACE, TOK_RBRACE,
TOK_COMMA, TOK_COLON, TOK_SEMICOLON,
TOK_PLUS, TOK_MINUS, TOK_STAR, TOK_SLASH, TOK_PERCENT,
TOK_EQ, TOK_EQEQ, TOK_NE, TOK_LT, TOK_GT, TOK_LE, TOK_GE,
TOK_BANG,
// Literals
TOK_NUMBER,
TOK_IDENT,
// Special
TOK_EOF,
} TokenType;
typedef struct {
TokenType type;
const char* start;
int length;
int line;
} Token;
Lexer pseudocode:
next_token():
skip_whitespace()
skip_comments()
if at_end(): return EOF
start = current
if is_digit(peek()):
while is_digit(peek()): advance()
if peek() == '.' and is_digit(peek_next()):
advance() // consume '.'
while is_digit(peek()): advance()
return NUMBER
if is_alpha(peek()):
while is_alphanumeric(peek()): advance()
return check_keyword() or IDENT
// Single/double character tokens
switch advance():
'(': return LPAREN
')': return RPAREN
'{': return LBRACE
// ... etc
'=': return match('=') ? EQEQ : EQ
'<': return match('=') ? LE : LT
// ... etc
5. The Parser (Syntax Analysis)
Build an Abstract Syntax Tree from tokens. Use recursive descent:
// AST Node types
typedef enum {
NODE_PROGRAM,
NODE_FUNC_DECL,
NODE_VAR_DECL,
NODE_ASSIGNMENT,
NODE_IF,
NODE_WHILE,
NODE_RETURN,
NODE_BINARY,
NODE_UNARY,
NODE_CALL,
NODE_NUMBER,
NODE_IDENT,
} NodeType;
typedef struct ASTNode {
NodeType type;
union {
struct { /* program data */ } program;
struct { char* name; Param* params; int param_count;
Type return_type; struct ASTNode* body; } func;
struct { char* name; Type type; struct ASTNode* init; } var_decl;
struct { char* name; struct ASTNode* value; } assignment;
struct { struct ASTNode* condition;
struct ASTNode* then_branch;
struct ASTNode* else_branch; } if_stmt;
struct { struct ASTNode* condition;
struct ASTNode* body; } while_stmt;
struct { struct ASTNode* value; } return_stmt;
struct { char op; struct ASTNode* left;
struct ASTNode* right; } binary;
struct { char op; struct ASTNode* operand; } unary;
struct { char* name; struct ASTNode** args;
int arg_count; } call;
double number;
char* ident;
};
} ASTNode;
Recursive descent parser pattern:
// Parse with precedence: low to high
// comparison โ addition โ term โ factor โ unary โ primary
ASTNode* parse_expression() {
return parse_comparison();
}
ASTNode* parse_comparison() {
ASTNode* left = parse_addition();
while (match(TOK_EQEQ) || match(TOK_NE) ||
match(TOK_LT) || match(TOK_GT) ||
match(TOK_LE) || match(TOK_GE)) {
Token op = previous();
ASTNode* right = parse_addition();
left = make_binary(op, left, right);
}
return left;
}
ASTNode* parse_addition() {
ASTNode* left = parse_term();
while (match(TOK_PLUS) || match(TOK_MINUS)) {
Token op = previous();
ASTNode* right = parse_term();
left = make_binary(op, left, right);
}
return left;
}
// ... continue for term, factor, unary, primary ...
ASTNode* parse_primary() {
if (match(TOK_NUMBER)) {
return make_number(strtod(previous().start, NULL));
}
if (match(TOK_IDENT)) {
char* name = copy_string(previous());
if (match(TOK_LPAREN)) {
// Function call
return parse_call(name);
}
return make_ident(name);
}
if (match(TOK_LPAREN)) {
ASTNode* expr = parse_expression();
consume(TOK_RPAREN, "Expected ')' after expression");
return expr;
}
error("Expected expression");
}
6. Type Checking
WASM has strict types. Your compiler must verify type consistency:
typedef enum { TYPE_I32, TYPE_I64, TYPE_F32, TYPE_F64, TYPE_VOID } Type;
// Symbol table entry
typedef struct {
char* name;
Type type;
bool is_param;
int local_index;
} Symbol;
// Type checker
Type check_expression(ASTNode* node, SymbolTable* scope) {
switch (node->type) {
case NODE_NUMBER:
// Infer type from value
if (is_integer(node->number)) return TYPE_I32;
return TYPE_F64;
case NODE_IDENT: {
Symbol* sym = lookup(scope, node->ident);
if (!sym) error("Undefined variable: %s", node->ident);
return sym->type;
}
case NODE_BINARY: {
Type left = check_expression(node->binary.left, scope);
Type right = check_expression(node->binary.right, scope);
if (left != right) {
error("Type mismatch: %s vs %s", type_name(left), type_name(right));
}
// Comparison operators return i32 (boolean)
if (is_comparison(node->binary.op)) return TYPE_I32;
return left;
}
case NODE_CALL: {
FuncInfo* func = lookup_func(node->call.name);
// Check argument count and types
if (node->call.arg_count != func->param_count) {
error("Wrong number of arguments");
}
for (int i = 0; i < node->call.arg_count; i++) {
Type arg_type = check_expression(node->call.args[i], scope);
if (arg_type != func->param_types[i]) {
error("Argument type mismatch");
}
}
return func->return_type;
}
// ... other cases ...
}
}
7. Code Generation: The Heart of the Compiler
Transform AST into WASM instructions. The key insight: expressions compile to stack operations naturally.
Expression: a + b * c
Parse tree: Evaluation order: WASM:
+ 1. get a local.get $a
/ \ 2. get b local.get $b
a * 3. get c local.get $c
/ \ 4. multiply i32.mul
b c 5. add i32.add
The tree's post-order traversal IS the WASM instruction sequence!
Code generator structure:
typedef struct {
// Output buffer for instructions
uint8_t* code;
int code_len;
int code_cap;
// Local variable tracking
Symbol* locals;
int local_count;
int next_local_idx;
// Function being compiled
FuncInfo* current_func;
} CodeGen;
void emit_byte(CodeGen* cg, uint8_t byte) {
if (cg->code_len >= cg->code_cap) {
cg->code_cap *= 2;
cg->code = realloc(cg->code, cg->code_cap);
}
cg->code[cg->code_len++] = byte;
}
void emit_leb128(CodeGen* cg, uint32_t value) {
do {
uint8_t byte = value & 0x7f;
value >>= 7;
if (value != 0) byte |= 0x80;
emit_byte(cg, byte);
} while (value != 0);
}
void emit_sleb128(CodeGen* cg, int32_t value) {
bool more = true;
while (more) {
uint8_t byte = value & 0x7f;
value >>= 7;
if ((value == 0 && !(byte & 0x40)) ||
(value == -1 && (byte & 0x40))) {
more = false;
} else {
byte |= 0x80;
}
emit_byte(cg, byte);
}
}
8. Compiling Expressions
Post-order traversal generates correct stack code:
void compile_expression(CodeGen* cg, ASTNode* node) {
switch (node->type) {
case NODE_NUMBER:
emit_byte(cg, 0x41); // i32.const
emit_sleb128(cg, (int32_t)node->number);
break;
case NODE_IDENT: {
Symbol* sym = lookup(cg, node->ident);
emit_byte(cg, 0x20); // local.get
emit_leb128(cg, sym->local_index);
break;
}
case NODE_BINARY:
// Post-order: left, right, operator
compile_expression(cg, node->binary.left);
compile_expression(cg, node->binary.right);
// Emit operator
switch (node->binary.op) {
case '+': emit_byte(cg, 0x6a); break; // i32.add
case '-': emit_byte(cg, 0x6b); break; // i32.sub
case '*': emit_byte(cg, 0x6c); break; // i32.mul
case '/': emit_byte(cg, 0x6d); break; // i32.div_s
case '%': emit_byte(cg, 0x6f); break; // i32.rem_s
case '<': emit_byte(cg, 0x48); break; // i32.lt_s
case '>': emit_byte(cg, 0x4a); break; // i32.gt_s
// ... etc
}
break;
case NODE_CALL: {
// Compile arguments (they go on stack)
for (int i = 0; i < node->call.arg_count; i++) {
compile_expression(cg, node->call.args[i]);
}
// Call function
int func_idx = lookup_func_idx(cg, node->call.name);
emit_byte(cg, 0x10); // call
emit_leb128(cg, func_idx);
break;
}
case NODE_UNARY:
compile_expression(cg, node->unary.operand);
if (node->unary.op == '-') {
// Negate: push 0, swap, subtract
emit_byte(cg, 0x41); // i32.const
emit_byte(cg, 0x00); // 0
// Stack: [operand, 0]
// We need: [0, operand]
// Actually, easier to do: 0 - operand
// But we already have operand on stack...
// Better approach: multiply by -1 for floats, or use i32.sub from 0
// Simple approach: push 0 first, then subtract
// But operand is already on stack. Use local for temp.
// Actually, even simpler for i32: use xor with -1 and add 1 (two's complement)
// Or: i32.const 0; swap positions; i32.sub
// Cleanest: emit as (0 - x)
// This requires we compile differently:
}
break;
}
}
9. Compiling Control Flow
This is where WASMโs structured control flow shines:
If statements:
void compile_if(CodeGen* cg, ASTNode* node) {
// Compile condition (leaves i32 on stack)
compile_expression(cg, node->if_stmt.condition);
if (node->if_stmt.else_branch) {
// if-else
emit_byte(cg, 0x04); // if
emit_byte(cg, 0x40); // void block type (or result type)
compile_block(cg, node->if_stmt.then_branch);
emit_byte(cg, 0x05); // else
compile_block(cg, node->if_stmt.else_branch);
emit_byte(cg, 0x0b); // end
} else {
// if without else
emit_byte(cg, 0x04); // if
emit_byte(cg, 0x40); // void
compile_block(cg, node->if_stmt.then_branch);
emit_byte(cg, 0x0b); // end
}
}
While loops:
void compile_while(CodeGen* cg, ASTNode* node) {
// WASM pattern for while:
// block $exit
// loop $continue
// <condition>
// i32.eqz ;; invert for "branch if false"
// br_if $exit
// <body>
// br $continue
// end
// end
emit_byte(cg, 0x02); // block
emit_byte(cg, 0x40); // void
emit_byte(cg, 0x03); // loop
emit_byte(cg, 0x40); // void
// Condition
compile_expression(cg, node->while_stmt.condition);
emit_byte(cg, 0x45); // i32.eqz (invert)
emit_byte(cg, 0x0d); // br_if
emit_leb128(cg, 1); // break to outer block (depth 1)
// Body
compile_block(cg, node->while_stmt.body);
// Loop back
emit_byte(cg, 0x0c); // br
emit_leb128(cg, 0); // continue to loop (depth 0)
emit_byte(cg, 0x0b); // end loop
emit_byte(cg, 0x0b); // end block
}
Return statements:
void compile_return(CodeGen* cg, ASTNode* node) {
if (node->return_stmt.value) {
compile_expression(cg, node->return_stmt.value);
}
emit_byte(cg, 0x0f); // return
}
10. Emitting the WASM Binary
Finally, assemble everything into a valid .wasm file:
void emit_module(CodeGen* cg, Program* program) {
// Header
emit_bytes(cg, "\0asm", 4); // Magic
emit_u32_le(cg, 1); // Version
// Type section
emit_type_section(cg, program);
// Function section
emit_function_section(cg, program);
// Export section
emit_export_section(cg, program);
// Code section
emit_code_section(cg, program);
}
void emit_type_section(CodeGen* cg, Program* program) {
ByteBuffer section = new_buffer();
// Count types
emit_leb128(§ion, program->func_count);
for (int i = 0; i < program->func_count; i++) {
FuncDecl* f = program->funcs[i];
emit_byte(§ion, 0x60); // func type
// Params
emit_leb128(§ion, f->param_count);
for (int j = 0; j < f->param_count; j++) {
emit_byte(§ion, type_to_byte(f->params[j].type));
}
// Results
if (f->return_type == TYPE_VOID) {
emit_leb128(§ion, 0);
} else {
emit_leb128(§ion, 1);
emit_byte(§ion, type_to_byte(f->return_type));
}
}
// Write section
emit_byte(cg, 0x01); // Type section ID
emit_leb128(cg, section.len);
emit_bytes(cg, section.data, section.len);
}
void emit_code_section(CodeGen* cg, Program* program) {
ByteBuffer section = new_buffer();
emit_leb128(§ion, program->func_count);
for (int i = 0; i < program->func_count; i++) {
FuncDecl* f = program->funcs[i];
// Compile function body
ByteBuffer body = new_buffer();
// Local declarations
emit_local_declarations(&body, f);
// Instructions
compile_function_body(&body, f);
emit_byte(&body, 0x0b); // end
// Write body size + body
emit_leb128(§ion, body.len);
emit_bytes(§ion, body.data, body.len);
}
// Write section
emit_byte(cg, 0x0a); // Code section ID
emit_leb128(cg, section.len);
emit_bytes(cg, section.data, section.len);
}
Project Specification
Required Features
Your compiler must support:
Types:
i32(32-bit integers)void(for functions with no return)
Statements:
- Variable declarations:
let x: i32 = 5; - Assignments:
x = 10; - If statements:
if condition { ... } else { ... } - While loops:
while condition { ... } - Return:
return expr;
Expressions:
- Integer literals:
42 - Variables:
x - Binary operators:
+,-,*,/,%,==,!=,<,>,<=,>= - Function calls:
factorial(n - 1) - Parentheses:
(a + b) * c
Functions:
- Multiple functions per file
- Parameters and return values
- Recursion
Input/Output
Input: .mini source file
Output: .wasm binary file
$ ./minilang factorial.mini -o factorial.wasm
Compiled factorial.mini -> factorial.wasm (247 bytes)
$ wasmtime factorial.wasm --invoke factorial 10
3628800
Success Criteria
- Hello World: Compile and run
func main(): i32 { return 42; } - Arithmetic: Compile
func add(a: i32, b: i32): i32 { return a + b; } - Conditionals: Compile and run a function with if/else
- Loops: Compile and run sum-to-n using a while loop
- Recursion: Compile and run factorial recursively
- Multiple functions: Call one function from another
- Validation: Output passes
wasm-validate
Solution Architecture
Compiler Structure
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Compiler Architecture โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค
โ โ
โ main.c โ
โ โ โ
โ โโโโถ lexer.c โ
โ โ โ next_token() โ Token stream โ
โ โ โ
โ โโโโถ parser.c โ
โ โ โ parse() โ AST โ
โ โ โ
โ โโโโถ checker.c (optional but recommended) โ
โ โ โ check() โ Type-annotated AST โ
โ โ โ
โ โโโโถ codegen.c โ
โ โ compile_function() โ
โ โ compile_statement() โ
โ โ compile_expression() โ
โ โ emit_module() โ .wasm bytes โ
โ โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ

Suggested File Organization
minilang/
โโโ src/
โ โโโ main.c # Entry point, CLI
โ โโโ lexer.c # Tokenization
โ โโโ lexer.h
โ โโโ parser.c # AST construction
โ โโโ parser.h
โ โโโ ast.h # AST node definitions
โ โโโ checker.c # Type checking
โ โโโ checker.h
โ โโโ codegen.c # WASM generation
โ โโโ codegen.h
โ โโโ wasm.c # Binary encoding helpers
โ โโโ wasm.h
โ โโโ common.h # Shared utilities
โโโ tests/
โ โโโ arithmetic.mini
โ โโโ factorial.mini
โ โโโ fibonacci.mini
โ โโโ control_flow.mini
โ โโโ run_tests.sh
โโโ Makefile
โโโ README.md
Implementation Guide
Phase 1: Lexer (Days 1-3)
Goal: Tokenize MiniLang source code
// Test input
const char* src = "func add(a: i32, b: i32): i32 { return a + b; }";
// Expected tokens
// FUNC, IDENT("add"), LPAREN, IDENT("a"), COLON, I32, COMMA,
// IDENT("b"), COLON, I32, RPAREN, COLON, I32, LBRACE,
// RETURN, IDENT("a"), PLUS, IDENT("b"), SEMICOLON, RBRACE, EOF
Steps:
- Define token types enum
- Implement
skip_whitespace()andskip_comments() - Implement number scanning
- Implement identifier scanning with keyword lookup
- Implement single and double character operators
- Test on sample programs
Checkpoint: All tokens extracted from factorial.mini.
Phase 2: Parser (Days 4-10)
Goal: Build AST from tokens
Steps:
- Define AST node structures
- Implement
parse_program()โ list of functions - Implement
parse_function()โ function declaration - Implement
parse_block()โ list of statements - Implement statement parsers (if, while, return, etc.)
- Implement expression parser with precedence
- Add error recovery (synchronize on semicolons)
Testing approach:
// Print AST for debugging
void print_ast(ASTNode* node, int indent) {
// Recursively print tree structure
}
// Verify structure
assert(ast->type == NODE_PROGRAM);
assert(ast->program.func_count == 2);
assert(strcmp(ast->program.funcs[0]->name, "factorial") == 0);
Checkpoint: Parse factorial.mini into correct AST.
Phase 3: Type Checker (Days 11-14)
Goal: Verify type consistency
Steps:
- Build symbol table (function for scope)
- Implement type inference for expressions
- Check assignment compatibility
- Check function call argument types
- Verify return type matches declaration
- Report clear error messages
Checkpoint: Reject let x: i32 = "hello"; with error.
Phase 4: Code Generator - Expressions (Days 15-21)
Goal: Compile expressions to WASM
Steps:
- Implement
emit_byte(),emit_leb128(),emit_sleb128() - Compile integer constants (
i32.const) - Compile variable references (
local.get) - Compile binary operations (arithmetic, comparison)
- Compile function calls
Test:
func add(a: i32, b: i32): i32 {
return a + b;
}
Should produce working WASM.
Checkpoint: add(5, 3) returns 8.
Phase 5: Code Generator - Control Flow (Days 22-28)
Goal: Compile if/while/return
Steps:
- Compile if-else to WASM
if/else/end - Compile while to
block/loop/br/br_if - Handle nested control structures
- Implement
return(handle early return)
Test:
func sum_to(n: i32): i32 {
let sum: i32 = 0;
let i: i32 = 1;
while i <= n {
sum = sum + i;
i = i + 1;
}
return sum;
}
Checkpoint: sum_to(100) returns 5050.
Phase 6: Binary Emission (Days 29-35)
Goal: Produce valid .wasm files
Steps:
- Emit module header (magic, version)
- Emit type section (function signatures)
- Emit function section (type indices)
- Emit export section
- Emit code section (function bodies)
- Validate with
wasm-validate
Checkpoint: Output passes validation and runs in wasmtime.
Phase 7: Polish (Week 5+)
Goal: Production-quality compiler
Steps:
- Add error messages with line numbers
- Support all comparison operators
- Add unary minus and not
- Support local variable declarations mid-function
- Add memory operations (stretch goal)
Testing Strategy
Unit Tests
Test each phase independently:
// Lexer test
void test_lexer() {
Lexer l = init_lexer("1 + 2");
assert(next_token(&l).type == TOK_NUMBER);
assert(next_token(&l).type == TOK_PLUS);
assert(next_token(&l).type == TOK_NUMBER);
assert(next_token(&l).type == TOK_EOF);
}
// Parser test
void test_parser() {
ASTNode* ast = parse("func f(): i32 { return 1 + 2; }");
assert(ast->type == NODE_PROGRAM);
assert(ast->program.funcs[0]->body->stmts[0]->type == NODE_RETURN);
}
// Codegen test
void test_codegen() {
uint8_t expected[] = {0x41, 0x01, 0x41, 0x02, 0x6a}; // i32.const 1; i32.const 2; i32.add
ASTNode* expr = parse_expr("1 + 2");
ByteBuffer buf = compile_expr(expr);
assert(memcmp(buf.data, expected, 5) == 0);
}
Integration Tests
Compile and run complete programs:
#!/bin/bash
# tests/run_tests.sh
for f in tests/*.mini; do
name=$(basename "$f" .mini)
./minilang "$f" -o "/tmp/$name.wasm"
# Validate
wasm-validate "/tmp/$name.wasm" || { echo "FAIL: $name (invalid)"; continue; }
# Run and check output
result=$(wasmtime "/tmp/$name.wasm" --invoke main)
expected=$(head -1 "$f" | grep -oP '// expect: \K.*')
if [ "$result" = "$expected" ]; then
echo "PASS: $name"
else
echo "FAIL: $name (expected $expected, got $result)"
fi
done
Test Programs
// tests/arithmetic.mini
// expect: 8
func main(): i32 {
return 5 + 3;
}
// tests/factorial.mini
// expect: 3628800
func factorial(n: i32): i32 {
if n < 2 {
return 1;
}
return n * factorial(n - 1);
}
func main(): i32 {
return factorial(10);
}
// tests/fibonacci.mini
// expect: 55
func fib(n: i32): i32 {
if n < 2 {
return n;
}
return fib(n - 1) + fib(n - 2);
}
func main(): i32 {
return fib(10);
}
// tests/while_loop.mini
// expect: 5050
func main(): i32 {
let sum: i32 = 0;
let i: i32 = 1;
while i <= 100 {
sum = sum + i;
i = i + 1;
}
return sum;
}
Common Pitfalls
1. Off-by-One in Precedence
Expression parsing precedence must be correct:
a + b * c should parse as a + (b * c)
a == b + c should parse as a == (b + c)
Lower precedence = called first in recursive descent.
2. Local Index Calculation
Parameters and locals share the same index space:
func f(a: i32, b: i32): i32 { // a=0, b=1
let x: i32 = 0; // x=2
let y: i32 = 0; // y=3
return a + x; // local.get 0 + local.get 2
}
3. Block Type Annotations
WASM blocks need type annotations:
;; Void block (no result)
(block
...
)
;; Bytecode: 0x02 0x40
;; Block returning i32
(block (result i32)
i32.const 42
)
;; Bytecode: 0x02 0x7f
Use 0x40 for void, 0x7f for i32, etc.
4. While Loop Break Depth
When breaking out of a while loop, the depth depends on nesting:
block $exit ;; depth 1 from inside loop
loop $continue ;; depth 0 from inside loop
br_if 1 ;; break to $exit
br 0 ;; continue to $continue
end
end
5. Function Index Offset
If you have imports, function indices are offset:
;; With 2 imports:
;; Import 0, Import 1, Func 0 (index 2), Func 1 (index 3)
call 2 ;; calls your first function
6. LEB128 Signed vs Unsigned
i32.const uses signed LEB128 (negative numbers):
// i32.const -1
emit_byte(0x41);
emit_sleb128(-1); // NOT emit_leb128!
Extensions
1. Add More Types
Support i64, f32, f64:
func pi(): f64 {
return 3.14159265359;
}
func big(): i64 {
return 9223372036854775807;
}
2. Add Arrays via Memory
Use linear memory for arrays:
func sum_array(ptr: i32, len: i32): i32 {
let sum: i32 = 0;
let i: i32 = 0;
while i < len {
sum = sum + load_i32(ptr + i * 4);
i = i + 1;
}
return sum;
}
3. Add String Literals
Store in data section, pass as pointers:
func main(): i32 {
print("Hello, World!"); // ptr to data section
return 0;
}
4. Add Imports
Support calling host functions:
import "console" "log" func log(x: i32);
func main(): i32 {
log(42); // calls imported function
return 0;
}
5. Add Optimizations
Implement simple optimizations:
- Constant folding:
1 + 2โ3 - Dead code elimination: Remove unreachable code after
return - Strength reduction:
x * 2โx << 1
6. Self-Hosting
The ultimate goal: write MiniLang in MiniLang and compile it to WASM. This requires adding strings, memory, and file I/O.
Real-World Connections
How Real Compilers Target WASM
LLVM (Clang, Rust, etc.):
- LLVM IR โ WASM backend
- Your project is a simplified version of this path
- Study
llc -march=wasm32output
AssemblyScript:
- TypeScript-like โ WASM
- Direct compilation without LLVM
- Similar architecture to your project
Emscripten:
- C/C++ โ LLVM โ WASM
- Adds JavaScript glue code
- Full POSIX emulation
Debug Info and Source Maps
Production compilers emit DWARF debug info:
;; Custom section with debug info
(custom "name" ...)
(custom ".debug_info" ...)
Your compiler could add function names for debugging.
Optimization Opportunities
WASM runtimes can optimize, but compilers can help:
- Use
local.teeinstead oflocal.set+local.get - Use block arity to avoid extra operations
- Inline small functions
Self-Assessment Checklist
Compiler Knowledge
- Explain the difference between lexing, parsing, and code generation
- Describe how operator precedence is implemented in recursive descent
- Explain why ASTs are useful intermediate representations
- Draw the AST for a complex expression
WASM Code Generation
- Explain why post-order traversal produces correct stack code
- Generate correct code for nested if/else statements
- Generate correct code for while loops with break conditions
- Handle function calls with correct stack discipline
Binary Format
- Emit valid LEB128 for both positive and negative numbers
- Structure a complete WASM module with all required sections
- Calculate correct section sizes
Testing
- Verify output with wasm-validate
- Run programs in wasmtime and verify results
- Debug compilation errors systematically
Resources
Primary References
- WebAssembly Specification ยง5 - Binary format
- Crafting Interpreters - Excellent compiler tutorial
- Writing a C Compiler - Nora Sandlerโs series
Books
- Engineering a Compiler by Cooper & Torczon - Comprehensive compiler theory
- Modern Compiler Implementation by Appel - Alternative textbook
- WebAssembly: The Definitive Guide - WASM-specific details
Reference Implementations
- AssemblyScript - TypeScript to WASM
- Grain - Functional language to WASM
- binaryen - WASM toolchain
Tools
- wat2wasm - Online WAT compiler
- wasm-validate - Binary validator
- wasmtime - Run WASM from command line
Key Insights
Expressions are post-order, control flow is structured. These two facts make WASM an excellent compilation target. Your ASTโs shape naturally maps to the instruction sequence.
The binary format is the easy part. Most compiler complexity is in parsing and type checking. Once you have a correct AST, code generation is mechanical.
Type checking prevents runtime errors. WASMโs validation guarantees that well-formed modules canโt have stack underflows or type mismatches. Your type checker provides the same guarantee for your source language.
Building a compiler teaches you what languages can and canโt do. After this project, youโll understand why certain language features exist and how theyโre implemented.
The Core Question Youโre Answering
For a compiler targeting WASM, the fundamental question is:
โHow do you translate high-level language constructs into the constrained world of stack-based bytecode?โ
This question unpacks into several critical sub-questions:
- Expression Translation: How do infix expressions like
a + b * cbecome postfix stack operations? - Scope and Binding: How do variable names map to numbered local slots?
- Control Flow Linearization: How do nested if/while structures become flat sequences of blocks and branches?
- Type Erasure and Preservation: Which type information survives to runtime vs. is checked at compile time?
The answer reveals why WASM was designed the way it was: every design decision makes compilation straightforward while maintaining safety and performance.
Source Expression: result = (a + b) * c - d
Your compiler must ask:
1. Where does 'result' live? โ Local slot assignment
2. What type is each operand? โ Type checking phase
3. What evaluation order? โ Post-order AST traversal
4. How to encode in binary? โ LEB128 encoding, opcodes
5. How to structure the module? โ Sections, type indices
The "constrained world" of WASM isn't limitingโit's clarifying.
Every constraint eliminates ambiguity you'd have to handle yourself.
Concepts You Must Understand First
Before writing a single line of compiler code, you need mental models for these concepts:
1. Lexical Analysis and Tokenization
What it is: Converting a character stream into meaningful tokens.
Key insight: The lexer is a finite state machine. Each state transition consumes characters and potentially emits a token.
Input: "func add(a: i32)"
Output: [FUNC, IDENT("add"), LPAREN, IDENT("a"), COLON, I32, RPAREN]
The lexer doesn't care about meaningโjust pattern recognition.
Book reference: Compilers: Principles, Techniques, and Tools (Dragon Book), Chapter 3 covers lexical analysis with DFA theory.
2. Parsing (Recursive Descent or PEG)
What it is: Transforming a token stream into a structured representation according to a grammar.
Key insight: Recursive descent parsers mirror the grammar structure directly. Each grammar rule becomes a function.
Grammar: Parser code:
expr = term fn parse_expr() {
{ "+" term } let left = parse_term();
while match(PLUS) {
let right = parse_term();
left = make_binary(PLUS, left, right);
}
return left;
}
Book reference: Crafting Interpreters by Robert Nystrom, Chapters 5-6 provide the clearest recursive descent explanation.
3. Abstract Syntax Trees (AST)
What it is: A tree representation that captures the hierarchical structure of the program, stripped of syntactic sugar.
Key insight: The ASTโs shape determines the code generation order. Post-order traversal naturally produces stack machine code.
Expression: a + b * c
(+) Post-order traversal:
/ \ 1. Visit a โ local.get $a
a (*) 2. Visit b โ local.get $b
/ \ 3. Visit c โ local.get $c
b c 4. Process (*) โ i32.mul
5. Process (+) โ i32.add
Book reference: Language Implementation Patterns by Terence Parr, Pattern 8 (Tree Grammar) and Pattern 9 (Tree Pattern Matcher).
4. Stack-Based Code Generation
What it is: Emitting instructions for a virtual machine that uses an operand stack rather than registers.
Key insight: The operand stack is implicit in your recursive calls. Each compile_expression() call leaves exactly one value on the stack.
Invariant: Every expression leaves exactly ONE value on the stack
(or zero for void expressions)
compile(a + b):
compile(a) ; stack: [a]
compile(b) ; stack: [a, b]
emit(i32.add) ; stack: [a+b]
Book reference: Engineering a Compiler by Cooper & Torczon, Chapter 7 (Code Shape) covers stack-based evaluation.
5. Expression Evaluation Order
What it is: The sequence in which subexpressions are computed to ensure correct results.
Key insight: Left-to-right evaluation with operator precedence is encoded in the AST structure, then extracted via post-order traversal.
a - b - c parses as ((a - b) - c), not (a - (b - c))
(-) Post-order: a, b, -, c, -
/ \ Which gives: (a - b) - c โ
(-) c
/ \ If parsed wrong: a, b, c, -, -
a b Would give: a - (b - c) โ
Book reference: Writing a C Compiler by Nora Sandler, Chapter 2 covers expression parsing with associativity.
6. Variable Allocation (Locals vs Memory)
What it is: Deciding where each variable lives at runtimeโin a local slot or in linear memory.
Key insight: WASM locals are typed, indexed slots. The compiler must assign each source variable a unique index.
func example(a: i32, b: i32): i32 {
let x: i32 = 0;
let y: i32 = 0;
...
}
Local index allocation:
$a โ 0 (parameter)
$b โ 1 (parameter)
$x โ 2 (local)
$y โ 3 (local)
The symbol table maps names to indices.
Book reference: Modern Compiler Implementation in C by Andrew Appel, Chapter 6 covers activation records and local allocation.
7. Type Checking and Inference
What it is: Ensuring that operations receive operands of the correct type, and inferring types where not explicitly stated.
Key insight: Type checking happens in a separate pass before code generation. By the time you emit code, you know every expressionโs type.
Type checking example:
let x: i32 = 5; โ 5 is i32, matches declaration
let y = 3.14; โ Infer y: f64 from literal
x = y; โ Cannot assign f64 to i32
foo(x, y); Check: foo's param types match argument types
Book reference: Types and Programming Languages by Benjamin Pierce, Chapters 8-11 for foundational type theory.
Questions to Guide Your Design
Use these questions as checkpoints during implementation:
Converting Expressions to Stack Operations
- How do I ensure correct operator precedence in the AST?
- Lower precedence operators become parents of higher precedence ones
a + b * cโAdd(a, Mul(b, c)), notMul(Add(a, b), c)
- How do I handle left vs right associativity?
- Left-associative:
a - b - cโSub(Sub(a, b), c) - Right-associative:
a = b = cโAssign(a, Assign(b, c))
- Left-associative:
- What invariant does each
compile_expression()call maintain?- Answer: Exactly one value pushed to the stack (for non-void expressions)
- How do I generate code for short-circuit operators (
&&,||)?- They require control flow, not just stack operations
a && bmust skip evaluatingbifais false
Handling Variables and Scoping
- How do I distinguish parameters from local variables?
- Both use
local.get/local.setwith different index ranges - Parameters: indices 0 to param_count-1
- Locals: indices param_count and above
- Both use
- How do I handle shadowing (same name in nested scope)?
- Stack of symbol tables, or single table with scope depth
- Inner scopeโs binding shadows outer scopeโs
- What happens to local indices when a variable goes out of scope?
- WASM doesnโt have dynamic local allocationโall locals declared at function start
- You can reuse indices for non-overlapping scopes (optimization)
Compiling Control Flow
- How does
if/elsemap to WASMโs structured control flow?(if (condition) (then ...) (else ...)) - How do I compile
whilewith WASMโsblock/loop/br?(block $exit (loop $continue (br_if $exit (i32.eqz (condition))) ...body... (br $continue))) - How do I handle nested loops with
breakandcontinue?- Track depth of enclosing blocks
breakโbrto outer block,continueโbrto inner loop
Function Prologue/Epilogue
- What goes in the function prologue?
- Local variable declarations (count and types)
- Initialization of locals to zero (WASM does this automatically)
- What goes in the function epilogue?
- Ensure return value is on stack
- The final
endinstruction
- How do I handle early returns?
- The
returninstruction exits immediately - Stack must have correct return type (or be empty for void)
- The
Thinking Exercise
Hand-Compile an Expression
Task: Convert (a + b) * c - d to WAT stack operations, assuming all variables are i32 locals.
Step 1: Draw the AST
(-)
/ \
(*) d
/ \
(+) c
/ \
a b
Step 2: Post-order traversal
Visit order: a, b, (+), c, (*), d, (-)
Step 3: Emit instructions for each visit
;; Visit a
local.get $a ;; stack: [a]
;; Visit b
local.get $b ;; stack: [a, b]
;; Process (+)
i32.add ;; stack: [(a+b)]
;; Visit c
local.get $c ;; stack: [(a+b), c]
;; Process (*)
i32.mul ;; stack: [(a+b)*c]
;; Visit d
local.get $d ;; stack: [(a+b)*c, d]
;; Process (-)
i32.sub ;; stack: [(a+b)*c - d]
Step 4: Verify with concrete values
Let a=2, b=3, c=4, d=5:
- Expected: (2 + 3) * 4 - 5 = 5 * 4 - 5 = 20 - 5 = 15
- Trace: [2] โ [2,3] โ [5] โ [5,4] โ [20] โ [20,5] โ [15] โ
Challenge Yourself
Now try these:
a < b && c > d(short-circuit required!)if (x > 0) { y = x; } else { y = -x; }(if as expression)factorial(n - 1) * n(function call + arithmetic)
The Interview Questions Theyโll Ask
Conceptual Questions
Q1: โExplain the phases of a compiler and what each does.โ
Expected answer: Lexical analysis (tokenization), parsing (AST construction), semantic analysis (type checking, symbol resolution), optimization (optional), code generation. Each phase transforms the representation: characters โ tokens โ AST โ annotated AST โ target code.
Q2: โWhy use an AST instead of directly generating code during parsing?โ
Expected answer: Separation of concernsโparsing handles syntax, later phases handle semantics. AST allows multiple passes (type checking, optimization, codegen). Also enables targeting multiple backends from the same AST.
Q3: โHow would you implement operator precedence in a recursive descent parser?โ
Expected answer: Each precedence level gets its own parsing function. Lower precedence functions call higher precedence functions. For example, parse_addition() calls parse_multiplication(), ensuring multiplication binds tighter.
Code Generation Questions
Q4: โHow do you compile a while loop for a stack-based VM?โ
Expected answer: Use nested block/loop structure:
block $exit
loop $continue
<condition>
br_if $exit (if false)
<body>
br $continue
end
end
The br $continue jumps to loop start; br_if $exit exits when condition is false.
Q5: โWhatโs the difference between local.get/local.set and memory loads/stores?โ
Expected answer: Locals are typed slots on the call frameโfast, limited number, no address. Memory is linear byte arrayโcan store arbitrary data structures, addressable, requires explicit load/store with offset.
Q6: โHow do you handle function calls in a stack-based compiler?โ
Expected answer: Push arguments left-to-right onto the stack, then emit call $func_idx. Callee expects arguments as its first N locals. Return value (if any) is left on callerโs stack.
AST Traversal Questions
Q7: โImplement a post-order tree traversal that generates stack machine code.โ
def codegen(node):
match node:
case BinaryOp(op, left, right):
codegen(left) # leaves value on stack
codegen(right) # leaves value on stack
emit(op_to_instr[op]) # consumes 2, pushes 1
case Number(value):
emit(f"i32.const {value}")
case Variable(name):
emit(f"local.get ${name}")
Q8: โHow would you type-check a function call expression?โ
Expected answer:
- Look up function in symbol table
- Check argument count matches parameter count
- Type-check each argument expression
- Verify each argument type matches corresponding parameter type
- Return the functionโs return type as the expressionโs type
System Design Questions
Q9: โDesign a symbol table that handles nested scopes.โ
Expected answer: Stack of hash tables. Each scope pushes a new table. Lookup searches from top to bottom. Define adds to topmost table. When scope ends, pop the table.
Q10: โHow would you add error recovery to your parser?โ
Expected answer: Synchronization on statement boundaries. When an error is detected, skip tokens until reaching a synchronization point (like ; or }), report the error, then continue parsing. Collect all errors rather than stopping at first.
Hints in Layers
For when youโre stuck, reveal hints progressively:
Layer 1: Getting Started
Hint: How do I structure the compiler?
Start with the simplest possible pipeline:
main.c:
source = read_file(argv[1])
tokens = lex(source)
ast = parse(tokens)
wasm_bytes = codegen(ast)
write_file(output, wasm_bytes)
Get each phase working before moving to the next. Use print statements liberally.
Hint: What's the minimum viable program?
Start with func main(): i32 { return 42; }
This requires:
- Lexer: FUNC, IDENT, LPAREN, RPAREN, COLON, I32, LBRACE, RETURN, NUMBER, SEMICOLON, RBRACE
- Parser: parse_function, parse_return, parse_number
- Codegen: emit module header, type section, function section, export section, code section with
i32.const 42; end
Layer 2: Lexer Struggles
Hint: My lexer is getting complicated
Keep the lexer stateless except for position. Use a switch statement for single-character tokens, special cases for two-character tokens (==, <=, etc.), and separate functions for numbers and identifiers.
Token next_token() {
skip_whitespace();
if (at_end()) return make_token(TOK_EOF);
char c = advance();
if (is_alpha(c)) return identifier();
if (is_digit(c)) return number();
switch (c) {
case '(': return make_token(TOK_LPAREN);
case ')': return make_token(TOK_RPAREN);
case '=': return match('=') ? make_token(TOK_EQEQ) : make_token(TOK_EQ);
// ...
}
}
Layer 3: Parser Struggles
Hint: Operator precedence is wrong
Remember: lower precedence = called first.
Precedence (lowest to highest):
1. || (or)
2. && (and)
3. == != (equality)
4. < > <= >= (comparison)
5. + - (addition)
6. * / % (multiplication)
7. - ! (unary)
8. () function call (primary)
parse_expression() calls parse_or()
parse_or() calls parse_and()
parse_and() calls parse_equality()
... and so on
Hint: Left recursion is breaking my parser
Convert left recursion to a while loop:
// Instead of: expr = expr "+" term | term
// Which would infinitely recurse
ASTNode* parse_expr() {
ASTNode* left = parse_term();
while (match(TOK_PLUS) || match(TOK_MINUS)) {
Token op = previous();
ASTNode* right = parse_term();
left = make_binary(op, left, right);
}
return left;
}
Layer 4: Code Generation Struggles
Hint: My WASM binary is invalid
Common issues:
- Wrong section order: Type (1), Import (2), Function (3), Table (4), Memory (5), Global (6), Export (7), Start (8), Element (9), Code (10), Data (11)
- Size mismatch: Section size doesnโt match actual content
- LEB128 encoding: Use signed for constants, unsigned for indices
- Missing
endbyte (0x0b): Every function, block, loop, if must end with 0x0b
Hint: My while loop doesn't work
The pattern:
block $exit ;; 0x02 0x40
loop $cont ;; 0x03 0x40
<condition>
i32.eqz ;; 0x45 (invert condition)
br_if $exit ;; 0x0d 0x01 (depth 1 = outer block)
<body>
br $cont ;; 0x0c 0x00 (depth 0 = this loop)
end ;; 0x0b
end ;; 0x0b
Key: br to a loop jumps to the start, br to a block jumps to the end.
Layer 5: Deep Issues
Hint: Early return doesn't work correctly
WASMโs return instruction (0x0f) exits the function immediately. But you need to ensure the stack has the right type.
For functions returning i32:
if (condition)
i32.const 0
return ;; OK: stack has i32
end
i32.const 1 ;; Normal path continues
For void functions, donโt push anything before return.
Hint: Nested control flow has wrong branch depths
Track block depth during codegen:
typedef struct {
int block_depth;
// ...
} CodeGen;
void compile_while(CodeGen* cg, ...) {
emit(cg, 0x02); // block
cg->block_depth++;
emit(cg, 0x03); // loop
cg->block_depth++;
// br_if to exit: depth is 1 (back to outer block)
// br to continue: depth is 0 (back to loop start)
emit(cg, 0x0b); // end loop
cg->block_depth--;
emit(cg, 0x0b); // end block
cg->block_depth--;
}
Books That Will Help
| Book | Author(s) | Relevance | Key Chapters |
|---|---|---|---|
| Writing a C Compiler | Nora Sandler | Hands-on compiler construction with modern approach | All chaptersโfollows a similar project structure |
| Engineering a Compiler | Keith Cooper & Linda Torczon | Comprehensive theory with practical focus | Ch. 4 (Parsing), Ch. 7 (Code Shape), Ch. 13 (Instruction Selection) |
| Language Implementation Patterns | Terence Parr | Pattern-based approach to language tools | Part II (Analyzing Languages), Part III (Interpretation) |
| Compilers: Principles and Practice | Aho, Lam, Sethi, Ullman | The โDragon Bookโโfoundational theory | Ch. 2 (A Simple Syntax-Directed Translator), Ch. 6 (Intermediate-Code Generation) |
| Crafting Interpreters | Robert Nystrom | Most accessible introduction, free online | Part II (Tree-Walk Interpreter)โapplicable to compilers too |
| Modern Compiler Implementation in C | Andrew Appel | Alternative to Dragon Book, more implementation-focused | Ch. 6 (Activation Records), Ch. 7 (Translation to IR) |
Reading Order Recommendation
- Start with Crafting Interpreters (free online) for fundamentals
- Then read Writing a C Compiler for a complete project walkthrough
- Reference Engineering a Compiler Chapter 7 for code generation theory
- Consult Language Implementation Patterns when stuck on specific patterns
Real-World Outcome
When you complete this project, your compiler will take source code and produce working WebAssembly. Hereโs what the end-to-end process looks like:
Input: MiniLang Source File
File: factorial.mini
// Compute factorial recursively
func factorial(n: i32): i32 {
if n < 2 {
return 1;
}
return n * factorial(n - 1);
}
// Compute factorial iteratively for comparison
func factorial_iter(n: i32): i32 {
let result: i32 = 1;
let i: i32 = 2;
while i <= n {
result = result * i;
i = i + 1;
}
return result;
}
func main(): i32 {
return factorial(10);
}
Compilation Process
$ ./minilang factorial.mini -o factorial.wasm --emit-wat
Lexing... 47 tokens
Parsing... AST with 3 functions
Type checking... OK
Code generation...
factorial: 23 bytes
factorial_iter: 41 bytes
main: 8 bytes
Emitting binary...
Output:
factorial.wasm (198 bytes)
factorial.wat (generated WAT for debugging)
$ wasm-validate factorial.wasm
OK
$ wasmtime factorial.wasm --invoke factorial 10
3628800
$ wasmtime factorial.wasm --invoke factorial_iter 10
3628800
Generated WAT (Human-Readable)
File: factorial.wat
(module
;; Type section: function signatures
(type $t0 (func (param i32) (result i32))) ;; factorial, factorial_iter
(type $t1 (func (result i32))) ;; main
;; Function declarations
(func $factorial (type $t0) (param $n i32) (result i32)
;; if n < 2
local.get $n
i32.const 2
i32.lt_s
if (result i32)
;; return 1
i32.const 1
else
;; return n * factorial(n - 1)
local.get $n
local.get $n
i32.const 1
i32.sub
call $factorial
i32.mul
end
)
(func $factorial_iter (type $t0) (param $n i32) (result i32)
(local $result i32)
(local $i i32)
;; let result = 1
i32.const 1
local.set $result
;; let i = 2
i32.const 2
local.set $i
;; while i <= n
block $exit
loop $continue
local.get $i
local.get $n
i32.gt_s ;; i > n means exit
br_if $exit
;; result = result * i
local.get $result
local.get $i
i32.mul
local.set $result
;; i = i + 1
local.get $i
i32.const 1
i32.add
local.set $i
br $continue
end
end
;; return result
local.get $result
)
(func $main (type $t1) (result i32)
i32.const 10
call $factorial
)
;; Export main for runtime invocation
(export "factorial" (func $factorial))
(export "factorial_iter" (func $factorial_iter))
(export "main" (func $main))
)
Binary Output Analysis
$ xxd factorial.wasm | head -20
00000000: 0061 736d 0100 0000 0110 0360 017f 017f .asm.......`....
00000010: 6000 017f 0304 0300 0001 0722 0366 6163 `..........".fac
00000020: 746f 7269 616c 0000 0e66 6163 746f 7269 torial...factori
00000030: 616c 5f69 7465 7200 0104 6d61 696e 0002 al_iter...main..
00000040: 0a56 0317 0020 0041 0248 0440 4101 0520 .V... .A.H.@A..
...
Section breakdown:
00-03: Magic number (0x00 0x61 0x73 0x6d = "\0asm")
04-07: Version (0x01 0x00 0x00 0x00 = 1)
08-17: Type section (01) - function signatures
18-1b: Function section (03) - type indices
1c-43: Export section (07) - exported names
44-c5: Code section (0a) - function bodies
Integration with JavaScript
// Load and run the compiled WASM
const wasmBuffer = await fetch('factorial.wasm');
const wasmModule = await WebAssembly.instantiate(await wasmBuffer.arrayBuffer());
const { factorial, factorial_iter, main } = wasmModule.instance.exports;
// Use the compiled functions
console.log(factorial(10)); // 3628800
console.log(factorial_iter(10)); // 3628800
console.log(main()); // 3628800
// Benchmark: your compiled code runs at near-native speed
console.time('factorial');
for (let i = 0; i < 100000; i++) factorial(20);
console.timeEnd('factorial'); // ~50ms for 100k calls
Compiler Test Suite Output
$ ./run_tests.sh
Running test suite...
[PASS] arithmetic.mini: 5 + 3 = 8
[PASS] precedence.mini: 2 + 3 * 4 = 14
[PASS] comparison.mini: 5 > 3 = 1
[PASS] if_else.mini: max(3, 7) = 7
[PASS] while_loop.mini: sum_to(100) = 5050
[PASS] factorial.mini: factorial(10) = 3628800
[PASS] fibonacci.mini: fib(10) = 55
[PASS] multiple_funcs.mini: calling helper functions
[PASS] nested_if.mini: nested conditionals
[PASS] nested_while.mini: nested loops
All 10 tests passed!
Validation:
All outputs pass wasm-validate โ
All outputs run in wasmtime โ
All outputs run in Node.js โ
This is what youโve built: a complete pipeline from high-level source code to portable, efficient binary that runs anywhere WebAssembly is supported.
After this project, youโll see every programming language differentlyโas a layer of abstraction over something like WASM. Project 5 (WASI) shows how WASM programs interact with the outside world.