Project 6: Full-Stack WASM Toolchain (Capstone)
Integrate compiler, validator, interpreter, and debugger into a professional-grade WebAssembly toolchain
The Core Question You're Answering
"How do all the pieces of a compilation pipeline fit together, from source code to running bytecode?"
This capstone project answers a fundamental question every toolchain engineer must understand: how do disparate components (lexer, parser, type checker, code generator, validator, runtime, debugger) come together to form a cohesive system? You will discover that building individual tools is straightforward compared to making them work together seamlessly, with consistent error handling, shared data structures, and a unified user experience.
The deeper insight: a toolchain is not just a collection of tools, but an ecosystem where each component must respect contracts established by others, where error messages must trace through multiple layers, and where optimization at any stage must preserve the semantics guaranteed by earlier stages.
Concepts You Must Understand First
Before attempting this capstone, ensure you have mastery of these foundational concepts:
End-to-End Compilation Pipeline
The journey from source code to execution involves multiple transformations, each with specific responsibilities:
Source Text → Tokens → AST → Typed AST → IR → Optimized IR → Binary → Validated Binary → Instantiated Module → Execution
Each arrow represents a complete component with its own algorithms, data structures, and error handling.
Linker and Module Composition
Understanding how separate compilation units combine:
- Symbol resolution: How do we find function `$add` when it is defined in another module?
- Relocation: How do we patch addresses when the final memory layout is determined?
- Import/Export matching: How do types and signatures align across module boundaries?
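Import/export matching can be sketched as a lookup by (module, field) name followed by a signature comparison. This is a minimal sketch that models only function imports. `FuncSig`, `ExportEntry`, `ImportEntry`, and `resolve_import` are hypothetical names. Real linkers also handle tables, memories, globals, and relocations.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified types: function imports only. */
typedef struct {
    const uint8_t* params;   /* value-type codes, e.g. 0x7f = i32 */
    size_t param_count;
    const uint8_t* results;
    size_t result_count;
} FuncSig;

typedef struct {
    const char* module;      /* name of the exporting module */
    const char* name;        /* export name */
    FuncSig sig;
    uint32_t func_index;     /* index inside the exporting module */
} ExportEntry;

typedef struct {
    const char* module;
    const char* name;
    FuncSig sig;             /* signature the importer expects */
} ImportEntry;

typedef enum { LINK_OK, LINK_UNRESOLVED, LINK_SIG_MISMATCH } LinkStatus;

static bool sig_equal(const FuncSig* a, const FuncSig* b) {
    return a->param_count == b->param_count &&
           a->result_count == b->result_count &&
           (a->param_count == 0 ||
            memcmp(a->params, b->params, a->param_count) == 0) &&
           (a->result_count == 0 ||
            memcmp(a->results, b->results, a->result_count) == 0);
}

/* Find the export that satisfies `imp`; on success store it in *out.
   A name match with the wrong signature is reported, not skipped. */
LinkStatus resolve_import(const ImportEntry* imp,
                          const ExportEntry* exports, size_t export_count,
                          const ExportEntry** out) {
    for (size_t i = 0; i < export_count; i++) {
        const ExportEntry* e = &exports[i];
        if (strcmp(e->module, imp->module) == 0 &&
            strcmp(e->name, imp->name) == 0) {
            if (!sig_equal(&e->sig, &imp->sig)) return LINK_SIG_MISMATCH;
            *out = e;
            return LINK_OK;
        }
    }
    return LINK_UNRESOLVED;
}
```

Distinguishing "not found" from "found but incompatible" lets the CLI tell users whether to add a module or fix a signature.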
Debug Information (DWARF)
DWARF provides all necessary information for debuggers to resolve locations, variable names, type layouts, and more. Key concepts:
- Line number tables: Map bytecode offsets to source locations
- Variable location descriptions: Where is variable `x` at each program point?
- Type definitions: What is the structure of a user-defined type?
- Call frame information: How to unwind the stack for backtraces?
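At its core, a line-number table answers one question: which source line produced the instruction at this offset? The sketch below stores decoded rows in a sorted array and binary-searches for the last row whose offset is at or below the query. It is a simplification that assumes the table is already decoded. Real DWARF `.debug_line` data is a compact state-machine program that must be executed to produce these rows. `LineRow` and `line_for_offset` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* One decoded row: code from `offset` up to the next row's offset
   came from source line `line`. Hypothetical simplified layout. */
typedef struct {
    uint32_t offset;
    uint32_t line;
} LineRow;

/* Rows must be sorted by ascending offset. Returns 0 when the offset
   precedes the first row (no line information available). */
uint32_t line_for_offset(const LineRow* rows, size_t count, uint32_t offset) {
    size_t lo = 0, hi = count;              /* search in [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (rows[mid].offset <= offset) lo = mid + 1;
        else hi = mid;
    }
    /* lo is now the number of rows whose offset is <= the query */
    return lo == 0 ? 0 : rows[lo - 1].line;
}
```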
Source Maps for Debugging
Source maps provide location mapping only. They carry no information about variable locations or types, so a debugger cannot use them to inspect variables. Understanding the tradeoff:
- DWARF: Full debugging experience, larger files, requires tool support
- Source maps: Widely supported, but limited to location mapping
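The encoding shows why source maps are limited to locations. In Source Map v3, the `mappings` field is a sequence of Base64 VLQ numbers (column and line deltas, source indices) with no room for variable or type data. Below is a minimal decoder for one VLQ value. `vlq_decode` is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stdint.h>

static int base64_value(char c) {
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Decode one Base64 VLQ value starting at *p and advance *p past it.
   Each digit carries 5 data bits (least significant group first) plus a
   continuation bit (0x20). The lowest bit of the assembled value is the sign. */
bool vlq_decode(const char** p, int32_t* out) {
    uint32_t value = 0;
    int shift = 0;
    for (;;) {
        int digit = base64_value(**p);
        if (digit < 0 || shift > 30) return false;  /* bad char or too long */
        (*p)++;
        value |= (uint32_t)(digit & 0x1f) << shift;
        shift += 5;
        if (!(digit & 0x20)) break;                  /* no continuation bit */
    }
    int32_t magnitude = (int32_t)(value >> 1);
    *out = (value & 1) ? -magnitude : magnitude;
    return true;
}
```

For example, the segment `AACA` decodes to the four fields 0, 0, 1, 0.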
Optimization Passes
How transformations improve code while preserving semantics:
- Analysis passes: Gather information (liveness, dominance, reaching definitions)
- Transformation passes: Modify code based on analysis
- Pass ordering: Some optimizations enable others, some conflicts exist
- Phase ordering problem: No universally optimal ordering exists
Module Instantiation and Linking
The runtime process of preparing a module for execution:
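- Resolve imports against the exports of other instances, checking that their types match
- Allocate linear memory and tables for the module's own (non-imported) definitions
- Initialize globals
- Initialize tables from element segments and memory from data segments
- Execute the start function, if present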
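The data-segment step can be sketched for active segments. The sketch assumes each segment's offset expression has already been evaluated to a constant. Each segment is bounds-checked against the current memory size and then copied in. It follows MVP-style semantics, checking every segment before writing any bytes. The `Memory` and `DataSegment` shapes here are hypothetical simplifications, not the `Module` types used later.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WASM_PAGE_SIZE 65536u

typedef struct {
    uint8_t* bytes;
    uint32_t pages;          /* current size in 64 KiB pages */
} Memory;

typedef struct {
    uint32_t offset;         /* already-evaluated constant offset expression */
    const uint8_t* init;
    uint32_t len;
} DataSegment;

/* Returns false (instantiation failure) if any segment is out of bounds. */
bool init_data_segments(Memory* mem, const DataSegment* segs, size_t count) {
    uint64_t mem_size = (uint64_t)mem->pages * WASM_PAGE_SIZE;
    for (size_t i = 0; i < count; i++) {
        /* 64-bit arithmetic so offset + len cannot wrap around */
        if ((uint64_t)segs[i].offset + segs[i].len > mem_size) return false;
    }
    for (size_t i = 0; i < count; i++) {
        if (segs[i].len > 0)
            memcpy(mem->bytes + segs[i].offset, segs[i].init, segs[i].len);
    }
    return true;
}
```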
Questions to Guide Your Design
Integration Architecture
- How should parser, compiler, and runtime share module representation?
  - Should each have its own representation, or share one?
  - What are the tradeoffs of mutable vs immutable module structures?
  - How do you handle incremental updates (e.g., adding a function in REPL)?
- How do you maintain error context across components?
  - When the validator rejects code, how do you trace back to source?
  - How do you aggregate errors from multiple sources?
  - What information must each component preserve for debugging?
Module Linking
- How do you implement multi-module linking?
  - Eager vs lazy resolution: when do you verify imports match exports?
  - How do you handle cyclic dependencies?
  - What metadata must accompany each module?
- How do you handle versioning and compatibility?
  - What happens when an import signature changes?
  - How do you detect breaking changes vs compatible extensions?
Debug Support
- How do you add debug information without breaking optimization?
  - DWARF preservation through optimization passes
  - When to regenerate debug info vs preserve it?
  - How do you handle inlined functions in stack traces?
- How do you make the debugger responsive during execution?
  - Polling vs interrupt-based breakpoint checking
  - Impact on performance when debugging is enabled
  - How to minimize overhead when debugging is disabled?
Optimization Strategy
- How do you choose optimization levels?
  - What constitutes `-O1` vs `-O2` vs `-O3`?
  - How do you balance compile time vs runtime performance?
  - What optimizations should always run regardless of level?
- How do you verify optimizations preserve semantics?
  - Differential testing: compare optimized vs unoptimized output
  - Formal verification: prove transformations correct
  - When to trust the optimizer vs verify exhaustively?
Thinking Exercise
Trace a program through every stage of your toolchain:
Consider this simple source program:
func factorial(n: i32) -> i32 {
if n <= 1 { return 1; }
return n * factorial(n - 1);
}
Now trace its journey:
- Lexer: What tokens are produced? How do you handle the `<=` operator vs `<` followed by `=`?
- Parser: What AST structure captures the recursive call? How is the if-expression vs if-statement distinction handled?
- Type Checker: How do you verify that `factorial(n - 1)` has type `i32`? What context must be maintained?
- Code Generator: How does the recursive call become a WASM `call` instruction? Where do locals come from?
- Binary Emitter: How is the LEB128 encoding of the function index generated? What about the block structure for `if`?
- Validator: What stack state exists at each instruction? How does `if` affect the control stack?
- Optimizer: Can the recursive call be tail-call optimized? What analysis determines this?
- Instantiator: How is memory allocated? What happens to the function when loaded?
- Debugger: If we set a breakpoint at `return n * ...`, what state should be visible? How do we show `n`'s value?
- Disassembler: How should the output look? Should we show the original source in comments?
Write out the complete transformation at each stage. This exercise reveals the data that must flow between components.
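The Binary Emitter and Validator stages both rely on LEB128, the variable-length integer encoding WASM uses for indices, counts, and constants. Below is a minimal sketch of unsigned and signed encoders plus the matching unsigned decoder. The function names are hypothetical, not taken from earlier projects.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Unsigned LEB128: 7 bits per byte, low bits first, high bit = "more". */
size_t encode_uleb128(uint32_t value, uint8_t* out) {
    size_t n = 0;
    do {
        uint8_t byte = value & 0x7f;
        value >>= 7;
        if (value != 0) byte |= 0x80;
        out[n++] = byte;
    } while (value != 0);
    return n;
}

/* Signed LEB128: stop once the remaining bits are pure sign extension and
   bit 6 (0x40) of the last byte agrees with the sign. Relies on >> being an
   arithmetic shift for negative values, as on mainstream compilers. */
size_t encode_sleb128(int32_t value, uint8_t* out) {
    size_t n = 0;
    for (;;) {
        uint8_t byte = value & 0x7f;
        value >>= 7;
        bool done = (value == 0 && !(byte & 0x40)) ||
                    (value == -1 && (byte & 0x40));
        if (!done) byte |= 0x80;
        out[n++] = byte;
        if (done) return n;
    }
}

/* Decode an unsigned LEB128 value at code[*pos], advancing *pos.
   Reads at most 5 bytes; bounds and overflow checks are left to the caller. */
uint32_t decode_uleb128(const uint8_t* code, size_t* pos) {
    uint32_t result = 0;
    int shift = 0;
    uint8_t byte;
    do {
        byte = code[(*pos)++];
        result |= (uint32_t)(byte & 0x7f) << shift;
        shift += 7;
    } while ((byte & 0x80) && shift < 35);
    return result;
}
```

For example, `encode_uleb128(624485, buf)` writes the three bytes `E5 8E 26`.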
The Interview Questions They'll Ask
Toolchain Architecture
- "Explain how a linker resolves symbols across compilation units."
  - Discuss symbol tables, relocation entries, and the two-pass algorithm
  - Address weak vs strong symbols
  - Explain dynamic vs static linking tradeoffs
- "How would you design a modular compiler that supports multiple source languages and target architectures?"
  - Discuss the role of intermediate representations
  - Explain how frontends and backends decouple
  - Address the M x N problem (M languages, N targets)
- "What data structures would you use to represent a WASM module in memory?"
  - Discuss arena allocation vs individual heap allocations
  - Address memory layout for cache efficiency
  - Explain immutability vs mutability tradeoffs
Compilation Pipeline
- "Walk me through what happens when you compile `a + b * c`."
  - Lexing: tokens `a`, `+`, `b`, `*`, `c`
  - Parsing: precedence handling to get `a + (b * c)`
  - Type checking: ensuring operands are compatible
  - Code generation: evaluation order considerations
- "How does constant folding work, and what are its limitations?"
  - Explain compile-time evaluation
  - Discuss floating-point precision concerns
  - Address overflow behavior differences
- "Explain the difference between a syntax error and a semantic error."
  - Syntax: violates grammar rules (parser catches)
  - Semantic: violates type rules (type checker catches)
  - Give examples of each in your language
Debugging
- "How does a debugger implement breakpoints?"
  - Software breakpoints: replace the instruction with a trap instruction (e.g., `int3` on x86), restoring the original when the breakpoint is hit
  - Hardware breakpoints: CPU debug registers
  - For interpreters: instruction dispatch interception
- "What information does DWARF encode, and why is it complex?"
  - Location information that changes as program executes
  - Type information for arbitrary user-defined types
  - Call frame information for stack unwinding
  - Discuss expression languages for variable locations
- "How would you implement step-over vs step-into for function calls?"
  - Step-into: stop at first instruction of callee
  - Step-over: set breakpoint at return address, continue
  - Handle recursive calls correctly: compare the call depth as well as the address, so a recursive invocation that reaches the same return address does not stop early
Optimization
- "What is the phase ordering problem in compilers?"
  - Some optimizations enable others (inlining enables constant propagation)
  - Some optimizations conflict (instruction scheduling vs register allocation)
  - No universally optimal ordering exists
- "How does dead code elimination work?"
  - Run a liveness analysis
  - Mark all live instructions
  - Remove unmarked instructions
  - Handle side effects correctly
- "What's the difference between local and global optimization?"
  - Local: within a basic block
  - Global: across basic blocks within a function
  - Interprocedural: across function boundaries
Hints in Layers
Integration Challenges
Layer 1: If components are not communicating, check that they share the same module representation. A common mistake is having the compiler emit a different structure than the validator expects.
Layer 2: Error handling across components requires a unified error type. Consider an error that carries: source location, component that detected it, severity, and suggested fixes.
Layer 3: The key insight for integration is contracts. Document what each component promises (postconditions) and requires (preconditions). Validation failures often indicate contract violations.
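Here is one possible shape for the unified error type from Layer 2, as a sketch. Every component reports through the same `Diagnostic` struct, and a single formatter renders it. The enum values, field names, `format_diagnostic`, and the output format are all assumptions for illustration.

```c
#include <stddef.h>
#include <stdio.h>

typedef enum { COMP_LEXER, COMP_PARSER, COMP_CHECKER, COMP_CODEGEN,
               COMP_VALIDATOR, COMP_RUNTIME } Component;
typedef enum { SEV_NOTE, SEV_WARNING, SEV_ERROR } Severity;

typedef struct {
    Component component;     /* which stage detected the problem */
    Severity severity;
    const char* file;        /* source file, or NULL if unknown */
    int line;                /* 0 when only a binary offset is known */
    long wasm_offset;        /* -1 when no binary offset applies */
    const char* message;
    const char* hint;        /* suggested fix, or NULL */
} Diagnostic;

static const char* component_name(Component c) {
    static const char* names[] = {"lexer", "parser", "checker", "codegen",
                                  "validator", "runtime"};
    return names[c];
}

static const char* severity_name(Severity s) {
    static const char* names[] = {"note", "warning", "error"};
    return names[s];
}

/* Renders "file:line: error [validator]: message (at 0x1c)" plus an optional
   hint line. Returns snprintf's result: the length the full text needs. */
int format_diagnostic(const Diagnostic* d, char* buf, size_t size) {
    char offset_part[32] = "";
    if (d->wasm_offset >= 0)
        snprintf(offset_part, sizeof offset_part, " (at 0x%lx)",
                 (unsigned long)d->wasm_offset);
    return snprintf(buf, size, "%s:%d: %s [%s]: %s%s%s%s",
                    d->file ? d->file : "<unknown>", d->line,
                    severity_name(d->severity), component_name(d->component),
                    d->message, offset_part,
                    d->hint ? "\n  hint: " : "", d->hint ? d->hint : "");
}
```

Carrying both a source line and a binary offset lets a validator error raised deep in the pipeline still point back to source when the compiler recorded the mapping.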
Debugger Implementation
Layer 1: Start with the simplest possible debugger: stop before every instruction, print it, wait for enter key. This proves your interpreter hook works.
Layer 2: Breakpoints need efficient lookup. A hash set of (function_index, instruction_offset) pairs works well. Don't do a linear search through a list on every instruction.
Layer 3: For step-over, you need to track call depth. When step-over is requested, note current depth, continue until depth returns to that level. Handle exceptions that unwind past the step point.
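Layers 2 and 3 can be combined into a single per-instruction check, sketched below under stated assumptions. Breakpoints live in a fixed-size open-addressing hash set keyed by (function index, instruction offset). Step-over compares the current call depth with the depth recorded when `next` was issued. `BreakpointSet`, `StepState`, and `should_pause` are hypothetical names. To keep the sketch short, the set never grows and never deletes.

```c
#include <stdbool.h>
#include <stdint.h>

#define BP_CAPACITY 256   /* power of two; fixed size in this sketch */

typedef struct {
    uint64_t keys[BP_CAPACITY];
    bool used[BP_CAPACITY];
    int count;
} BreakpointSet;

static uint64_t bp_key(uint32_t func, uint32_t offset) {
    return ((uint64_t)func << 32) | offset;
}

static uint32_t bp_slot(uint64_t key) {
    key ^= key >> 33;                      /* cheap bit mixing */
    key *= 0xff51afd7ed558ccdULL;
    key ^= key >> 33;
    return (uint32_t)(key & (BP_CAPACITY - 1));
}

/* Linear probing. Returns false only if the set is full. */
bool bp_add(BreakpointSet* s, uint32_t func, uint32_t offset) {
    uint64_t key = bp_key(func, offset);
    for (uint32_t i = bp_slot(key), n = 0; n < BP_CAPACITY;
         i = (i + 1) & (BP_CAPACITY - 1), n++) {
        if (!s->used[i]) { s->used[i] = true; s->keys[i] = key; s->count++; return true; }
        if (s->keys[i] == key) return true;     /* already present */
    }
    return false;
}

bool bp_contains(const BreakpointSet* s, uint32_t func, uint32_t offset) {
    uint64_t key = bp_key(func, offset);
    for (uint32_t i = bp_slot(key), n = 0; n < BP_CAPACITY;
         i = (i + 1) & (BP_CAPACITY - 1), n++) {
        if (!s->used[i]) return false;          /* empty slot ends the probe */
        if (s->keys[i] == key) return true;
    }
    return false;
}

typedef enum { RUN_CONTINUE, RUN_STEP_INTO, RUN_STEP_OVER } RunMode;

typedef struct {
    RunMode mode;
    int target_depth;   /* call depth when step-over was requested */
} StepState;

/* Called by the interpreter hook before each instruction. */
bool should_pause(const BreakpointSet* bps, const StepState* st,
                  uint32_t func, uint32_t offset, int depth) {
    if (bp_contains(bps, func, offset)) return true;
    switch (st->mode) {
    case RUN_STEP_INTO: return true;                        /* stop anywhere */
    case RUN_STEP_OVER: return depth <= st->target_depth;   /* skip callees */
    default:            return false;
    }
}
```

Using `depth <= target_depth` rather than `==` also stops the step when the current function returns, or when a trap unwinds past the recorded depth.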
Optimizer Correctness
Layer 1: Before optimizing anything, write tests that compare optimized vs unoptimized output on many inputs. The optimizer is wrong if outputs differ.
Layer 2: Implement optimization passes as pure functions: transform(module) -> new_module. Never mutate in place during development. This makes debugging easier.
Layer 3: For complex optimizations, prove correctness on paper first. What invariant does the transformation preserve? Under what conditions is it safe to apply?
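Layer 1's differential test can start as small as the harness below. It runs a reference and an optimized version of the same function on edge cases plus deterministic pseudo-random inputs, and reports the first disagreement. In this sketch both versions are plain C function pointers standing in for "run export X of module A vs module B in the interpreter". That substitution, and the names `diff_test` and `next_rand`, are assumptions.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

typedef int32_t (*I32Fn)(int32_t);

/* xorshift32: deterministic, so a failure reproduces with the same
   (nonzero) seed. */
static uint32_t next_rand(uint32_t* state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

/* Returns the number of mismatches (0 means no difference was observed). */
int diff_test(I32Fn reference, I32Fn optimized, uint32_t seed, int trials) {
    static const int32_t edges[] = {0, 1, -1, 2, INT32_MAX, INT32_MIN};
    int failures = 0;
    for (int i = 0; i < trials + 6; i++) {
        int32_t x = i < 6 ? edges[i] : (int32_t)next_rand(&seed);
        int32_t want = reference(x), got = optimized(x);
        if (want != got && failures++ == 0)
            fprintf(stderr, "mismatch: f(%" PRId32 ") = %" PRId32
                    ", optimized gives %" PRId32 "\n", x, want, got);
    }
    return failures;
}

/* A correct strength reduction (x * 2 -> x << 1), computed in unsigned
   arithmetic to match WASM's wrapping i32 semantics. */
static int32_t times_two(int32_t x) { return (int32_t)((uint32_t)x * 2u); }
static int32_t shl_one(int32_t x)   { return (int32_t)((uint32_t)x << 1); }
/* An incorrect rewrite: x / 2 is not x >> 1 for negative odd x. */
static int32_t half(int32_t x)       { return x / 2; }
static int32_t half_shift(int32_t x) { return x >> 1; }
```

The `half` pair fails on the very first negative edge case (`-1 / 2 == 0`, but `-1 >> 1 == -1`). This is exactly the kind of bug that only shows up when you compare against the unoptimized output.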
CLI Design
Layer 1: Use a consistent option style throughout. If `compile` takes `-o <output>`, then `optimize` should take `-o <output>` too. Inconsistency frustrates users.
Layer 2: Return meaningful exit codes: 0 for success, 1 for user error (bad input), 2 for internal error (bug). Scripts depend on these.
Layer 3: Structured output (JSON, machine-readable) enables integration with editors and build systems. Consider --format=json options for all commands that produce output.
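Layers 2 and 3 fit together naturally. The same diagnostic can be written as a human-readable line or as one JSON object per line, and the command then returns one of the exit codes above. This is a sketch assuming a hypothetical `--format=json` mode. The JSON field names and the `report` function are made up for illustration, and only the escaping JSON strings require is handled.

```c
#include <stdio.h>

/* Exit codes from Layer 2. */
enum { MW_EXIT_OK = 0, MW_EXIT_USER_ERROR = 1, MW_EXIT_INTERNAL_ERROR = 2 };

/* Write s as a JSON string literal: quotes, backslashes, and control
   characters are escaped; other bytes (including UTF-8) pass through. */
static void json_string(FILE* out, const char* s) {
    fputc('"', out);
    for (; *s; s++) {
        unsigned char c = (unsigned char)*s;
        if (c == '"' || c == '\\') fprintf(out, "\\%c", c);
        else if (c == '\n') fputs("\\n", out);
        else if (c < 0x20) fprintf(out, "\\u%04x", c);
        else fputc(c, out);
    }
    fputc('"', out);
}

/* One diagnostic, either as "file:line: message" or as a JSON object line. */
void report(FILE* out, int as_json, const char* file, int line, const char* msg) {
    if (as_json) {
        fputs("{\"file\":", out);
        json_string(out, file);
        fprintf(out, ",\"line\":%d,\"message\":", line);
        json_string(out, msg);
        fputs("}\n", out);
    } else {
        fprintf(out, "%s:%d: %s\n", file, line, msg);
    }
}
```

A subcommand would call `report(stderr, as_json, ...)` and then `return MW_EXIT_USER_ERROR;`. An editor plugin parses the JSON lines, while a shell script only needs to check `$?`.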
Books That Will Help
| Book | Author(s) | Key Topics | Why It Matters |
|---|---|---|---|
| Engineering a Compiler (3rd ed.) | Keith D. Cooper, Linda Torczon | Complete compiler pipeline, optimization algorithms, code generation | The definitive practical guide to building production compilers; covers every phase in depth with modern techniques |
| Advanced C and C++ Compiling | Milan Stevanovic | Linking, loading, library design, ABI | Deep dive into what happens after compilation; essential for understanding module linking |
| Low-Level Programming | Igor Zhirkov | Assembly, memory, calling conventions | Grounds high-level compiler concepts in concrete machine reality |
| Practical Binary Analysis | Dennis Andriesse | Reverse engineering, binary formats, disassembly | Understanding binaries from the consumer side; invaluable for debugger and disassembler implementation |
Additional References
| Resource | Type | Focus Area |
|---|---|---|
| V8 WASM Compilation Pipeline | Documentation | How a production engine handles WASM compilation |
| Chrome DWARF Debugging | Blog Post | Modern WASM debugging with source maps and DWARF |
| Emscripten Debugging Guide | Documentation | Practical debugging flags and techniques |
| LLVM Architecture | Documentation | How a real toolchain organizes its components |
Project Overview
| Attribute | Value |
|---|---|
| Difficulty | Expert |
| Time Estimate | 2-3 months |
| Languages | C (primary), Rust, Go |
| Prerequisites | Projects 1-5 completed |
| Main Reference | All previous project references |
| Knowledge Area | Toolchain Architecture, Software Integration |
Learning Objectives
After completing this project, you will be able to:
- Architect a complete toolchain - Design cohesive tools that work together
- Implement a WASM validator - Verify modules conform to the specification
- Build a debugger - Step through WASM execution with inspection
- Create a disassembler - Convert binary back to readable WAT
- Add optimizations - Improve generated code quality
- Design CLI interfaces - Create professional command-line tools
- Write comprehensive tests - Ensure toolchain reliability
Conceptual Foundation
1. What Is a Toolchain?
A toolchain is a collection of tools that work together to transform source code into running programs:
Complete WebAssembly Toolchain

  Source (.mini)
         │
         ▼
  ┌─────────────┐
  │  Compiler   │   mywasmcc source.mini -o module.wasm
  │ (Project 4) │
  └──────┬──────┘
         │
         ▼
  ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
  │  Validator  │────▶│Disassembler │────▶│  Optimizer  │
  │    (NEW)    │     │    (NEW)    │     │    (NEW)    │
  └──────┬──────┘     └─────────────┘     └──────┬──────┘
         │                                       │
         ▼                                       ▼
  ┌─────────────┐                         ┌─────────────┐
  │ Interpreter │                         │  Debugger   │
  │ (Project 3) │                         │    (NEW)    │
  │ + WASI (P5) │                         └─────────────┘
  └─────────────┘

  CLI Interface:
  ─────────────────────────────────────────────
  mywasm compile source.mini -o module.wasm
  mywasm validate module.wasm
  mywasm disasm module.wasm
  mywasm run module.wasm [args...]
  mywasm debug module.wasm
  mywasm optimize module.wasm -o optimized.wasm

2. Why Build a Complete Toolchain?
Building individual tools teaches concepts. Building a toolchain teaches:
- Integration: How tools communicate and share data
- User experience: How developers actually use tools
- Robustness: Handling edge cases across the full pipeline
- Architecture: Designing for extensibility and maintenance
The professional outcome: You'll have a portfolio piece that demonstrates mastery of WebAssembly and software engineering.
3. Toolchain Components Overview
| Component | Purpose | Status |
|---|---|---|
| Compiler | Source → WASM | From Project 4 |
| Interpreter | Execute WASM | From Project 3 |
| WASI Runtime | System interface | From Project 5 |
| Validator | Verify correctness | NEW |
| Disassembler | WASM → WAT | NEW |
| Debugger | Interactive execution | NEW |
| Optimizer | Improve code | NEW |
| Linker | Combine modules | STRETCH |
4. The Validator: Ensuring Correctness
WASM validation ensures a module is well-formed before execution:
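Validation Checks:
1. Structure validation
   - Magic number correct
   - Version supported
   - Non-custom sections in the required order
   - No duplicate non-custom sections
2. Type validation
   - All type indices in bounds
   - Function signatures valid
   - Block types well-formed
3. Function validation
   - Operand stack never underflows, and each block leaves exactly its declared result types
   - Operand types consistent with each instruction's signature
   - All branches target valid labels
   - All calls reference valid functions
4. Memory/Table validation
   - Indices in bounds
   - Limits valid (min ≤ max, and memories within the 65536-page maximum)
   - Data segment offsets are constant expressions of type i32 (whether a segment actually fits in memory is checked at instantiation)
5. Import/Export validation
   - Import descriptors reference valid type indices (whether imports can actually be satisfied is checked at instantiation)
   - Export names unique
   - Export indices valid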
Type checking algorithm (stack-based):
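validate_function(func):
    stack = []            # operand types
    control_stack = []    # one frame per block/loop/if; innermost last
    # pop_expect(stack, t) fails if popping would go below the current frame's
    # height, unless that frame is unreachable (then any type is accepted)
    # The function body behaves like an implicit outermost block
    control_stack.push({kind: FUNC, result: func.type.results,
                        height: 0, unreachable: false})
    for instruction in func.body:
        match instruction:
            case i32.const(n):
                push(stack, i32)
            case i32.add:
                pop_expect(stack, i32)
                pop_expect(stack, i32)
                push(stack, i32)
            case local.get(idx):
                check(idx < len(func.locals))
                push(stack, func.locals[idx].type)
            case local.set(idx):
                check(idx < len(func.locals))
                pop_expect(stack, func.locals[idx].type)
            case block(result_type):
                control_stack.push({kind: BLOCK, result: result_type,
                                    height: len(stack), unreachable: false})
            case br(depth):
                check(depth < len(control_stack))
                # depth 0 is the innermost frame
                label = control_stack[len(control_stack) - 1 - depth]
                # Branch arity: a block's results (a loop's would be its params)
                for type in reversed(label.result):
                    pop_expect(stack, type)
                # Code after br is unreachable: discard this frame's operands
                # and make the stack polymorphic until the frame ends
                frame = control_stack[-1]
                truncate(stack, frame.height)
                frame.unreachable = true
            case end:
                frame = control_stack[-1]
                for type in reversed(frame.result):
                    pop_expect(stack, type)
                check(len(stack) == frame.height)   # nothing left over
                control_stack.pop()
                # Push results back for the enclosing frame
                for type in frame.result:
                    push(stack, type)
    check(len(control_stack) == 0)   # body ended with the function's own end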
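The algorithm above can be made concrete for a tiny instruction subset. This is a minimal, runnable sketch rather than a full validator. It assumes an already-decoded instruction array (`Instr`) and handles only `i32.const`, `i32.add`, `local.get`, `block` with at most one result, `br`, and `end`. The types and the name `validate_body` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { T_NONE = -1, T_I32 = 0x7f, T_F32 = 0x7d } ValType;
typedef enum { OP_I32_CONST, OP_I32_ADD, OP_LOCAL_GET, OP_BLOCK, OP_BR, OP_END } Op;

typedef struct {
    Op op;
    int32_t arg;        /* constant, local index, or branch depth */
    ValType block_type; /* OP_BLOCK only: result type or T_NONE */
} Instr;

#define MAX_VALS 256
#define MAX_DEPTH 64

typedef struct { ValType result; size_t height; bool unreachable; } Frame;

typedef struct {
    ValType vals[MAX_VALS]; size_t sp;
    Frame ctrl[MAX_DEPTH];  size_t csp;
} State;

/* Pop one operand of type `expect`. At the current frame's height, popping
   succeeds only in unreachable code (the polymorphic stack after br). */
static bool pop_expect(State* s, ValType expect) {
    const Frame* f = &s->ctrl[s->csp - 1];
    if (s->sp == f->height) return f->unreachable;
    return s->vals[--s->sp] == expect;
}

static bool push(State* s, ValType t) {
    if (s->sp == MAX_VALS) return false;
    s->vals[s->sp++] = t;
    return true;
}

bool validate_body(const Instr* code, size_t n, const ValType* locals,
                   size_t num_locals, ValType result) {
    State s = {.sp = 0, .csp = 1};
    s.ctrl[0] = (Frame){result, 0, false};   /* implicit function block */
    for (size_t i = 0; i < n; i++) {
        const Instr* in = &code[i];
        if (s.csp == 0) return false;        /* code after the final end */
        Frame* top = &s.ctrl[s.csp - 1];
        switch (in->op) {
        case OP_I32_CONST:
            if (!push(&s, T_I32)) return false;
            break;
        case OP_I32_ADD:
            if (!pop_expect(&s, T_I32) || !pop_expect(&s, T_I32) ||
                !push(&s, T_I32)) return false;
            break;
        case OP_LOCAL_GET:
            if (in->arg < 0 || (size_t)in->arg >= num_locals) return false;
            if (!push(&s, locals[in->arg])) return false;
            break;
        case OP_BLOCK:
            if (s.csp == MAX_DEPTH) return false;
            s.ctrl[s.csp++] = (Frame){in->block_type, s.sp, false};
            break;
        case OP_BR: {
            if (in->arg < 0 || (size_t)in->arg >= s.csp) return false;
            Frame* label = &s.ctrl[s.csp - 1 - in->arg];  /* 0 = innermost */
            if (label->result != T_NONE && !pop_expect(&s, label->result))
                return false;
            s.sp = top->height;           /* discard the rest of this frame */
            top->unreachable = true;
            break;
        }
        case OP_END:
            if (top->result != T_NONE && !pop_expect(&s, top->result))
                return false;
            if (s.sp != top->height) return false;   /* leftover operands */
            s.csp--;
            if (top->result != T_NONE && s.csp > 0 && !push(&s, top->result))
                return false;
            break;
        }
    }
    return s.csp == 0;   /* the body must end with the function's own end */
}
```

Note the dead `i32.add` after `br 0` in a `block (result i32)`. It validates, because the unreachable frame lets it pop "anything". The same `i32.add` as the first instruction of a function is rejected as a stack underflow.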
5. The Disassembler: Binary to Text
Convert .wasm back to readable WAT:
Disassembler Pipeline

  Binary Input                  WAT Output
  ────────────                  ──────────
  00 61 73 6d 01 00 00 00       (module
  01 07 01 60 02 7f 7f            (type (func (param i32 i32)
  01 7f                                       (result i32)))
  03 02 01 00                     (func (type 0)
  07 07 01 03 61 64 64              local.get 0
  00 00                             local.get 1
  0a 09 01 07 00 20 00              i32.add)
  20 01 6a 0b                     (export "add" (func 0))
                                )

Disassembler features:
- Resolve function/type indices to names
- Format with proper indentation
- Show hex offsets (optional)
- Include comments with original bytes
6. The Debugger: Interactive Execution
A debugger lets you control and inspect execution:
Debugger Architecture

  User Interface
  ┌────────────────────────────────────────────┐
  │ (wdb) break add                            │
  │ Breakpoint 1 set at function $add          │
  │ (wdb) run                                  │
  │ Breakpoint 1 hit at $add                   │
  │ (wdb) stack                                │
  │   [0] i32: 5                               │
  │   [1] i32: 3                               │
  │ (wdb) step                                 │
  │   local.get 0                              │
  │ (wdb) print $0                             │
  │ $0 = i32: 5                                │
  └─────────────────────┬──────────────────────┘
                        │
                        ▼
  ┌────────────────────────────────────────────┐
  │ Debug Controller                           │
  │  - Breakpoint management                   │
  │  - Step control (step, next, continue)     │
  │  - State inspection (stack, locals,        │
  │    memory, globals)                        │
  └─────────────────────┬──────────────────────┘
                        │
                        ▼
  ┌────────────────────────────────────────────┐
  │ Modified Interpreter (from P3)             │
  │  - Hooks before each instruction           │
  │  - State accessible to debugger            │
  │  - Can pause/resume execution              │
  └────────────────────────────────────────────┘

Debugger commands:
- `break <func>` - Set breakpoint at function
- `break <func>:<offset>` - Set breakpoint at instruction
- `run [args]` - Start execution
- `continue` - Resume until next breakpoint
- `step` - Execute one instruction
- `next` - Execute to next line (step over calls)
- `finish` - Execute until function returns
- `stack` - Show value stack
- `locals` - Show local variables
- `memory <addr> <len>` - Dump memory
- `backtrace` - Show call stack
- `print <expr>` - Evaluate and print
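`break <func>:<offset>` needs a small argument parser. One way to split it is sketched below with a hypothetical `parse_break_arg` helper. It accepts decimal or `0x`-prefixed offsets via `strtoul` with base 0 (which also reads a leading `0` as octal). Resolving the function name to an index is left to the caller.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Split "add" or "add:0x1c" into a function name and an instruction offset.
   The offset defaults to 0 (function entry) when no ":offset" is given. */
bool parse_break_arg(const char* arg, char* name, size_t name_size,
                     uint32_t* offset) {
    if (!arg || !*arg) return false;
    const char* colon = strchr(arg, ':');
    size_t name_len = colon ? (size_t)(colon - arg) : strlen(arg);
    if (name_len == 0 || name_len >= name_size) return false;
    memcpy(name, arg, name_len);
    name[name_len] = '\0';
    *offset = 0;
    if (colon) {
        const char* digits = colon + 1;
        if (*digits == '\0' || *digits == '-') return false;
        char* end;
        errno = 0;
        unsigned long v = strtoul(digits, &end, 0);  /* decimal, 0x hex, 0 octal */
        if (*end != '\0' || errno == ERANGE || v > UINT32_MAX) return false;
        *offset = (uint32_t)v;
    }
    return true;
}
```

`cmd_break` would call this with the `arg` it receives from the REPL and print a usage message when it returns false.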
7. The Optimizer: Improving Code
Simple optimizations that improve generated code:
Optimization Passes

  1. Constant Folding
     ────────────────
     Before:  i32.const 5
              i32.const 3
              i32.add
     After:   i32.const 8

  2. Dead Code Elimination
     ─────────────────────
     Before:  return
              i32.const 5   ;; unreachable
              drop
     After:   return

  3. Set/Get Fusion (local.tee)
     ──────────────────────────
     Before:  local.get 0
              local.set 1
              local.get 1
     After:   local.get 0
              local.tee 1

  4. Strength Reduction
     ──────────────────
     Before:  i32.const 2
              i32.mul
     After:   i32.const 1
              i32.shl

  5. Block Flattening
     ────────────────
     Before:  block
                block
                  nop
                end
              end
     After:   nop
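Pass 1 is the easiest to implement as a peephole over a decoded instruction list. The sketch below assumes a hypothetical `Ins` array (opcode plus immediate) rather than raw bytes. It folds `i32.const a; i32.const b; i32.add|i32.sub|i32.mul` into one constant. Because it examines the instructions it has already emitted, chains such as `1 + 2 + 3` collapse in one pass. Arithmetic is done in `uint32_t`, so overflow wraps exactly like WASM's `i32` operations instead of hitting undefined signed overflow in C.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { I32_CONST = 0x41, I32_ADD = 0x6a, I32_SUB = 0x6b, I32_MUL = 0x6c };

typedef struct {
    uint8_t opcode;
    int32_t imm;      /* used by I32_CONST only */
} Ins;

/* Folds in place and returns the new instruction count. Division is
   deliberately not folded: i32.div_s by zero must still trap at run time. */
size_t fold_constants(Ins* code, size_t n) {
    size_t out = 0;                          /* write index */
    for (size_t i = 0; i < n; i++) {
        code[out++] = code[i];
        /* Inspect the last three emitted instructions. */
        if (out >= 3 && code[out - 3].opcode == I32_CONST &&
            code[out - 2].opcode == I32_CONST) {
            uint32_t a = (uint32_t)code[out - 3].imm;
            uint32_t b = (uint32_t)code[out - 2].imm;
            uint32_t r = 0;
            bool foldable = true;
            switch (code[out - 1].opcode) {
            case I32_ADD: r = a + b; break;
            case I32_SUB: r = a - b; break;
            case I32_MUL: r = a * b; break;
            default: foldable = false;
            }
            if (foldable) {
                out -= 2;                    /* replace three with one */
                code[out - 1] = (Ins){I32_CONST, (int32_t)r};
            }
        }
    }
    return out;
}
```

Folding shifts instruction offsets, so any debug information (line tables, breakpoint offsets) must be remapped or regenerated afterward.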

8. Unified CLI Design
Create a cohesive command-line interface:
$ mywasm --help
mywasm - A WebAssembly toolchain
USAGE:
mywasm <COMMAND> [OPTIONS]
COMMANDS:
compile Compile source language to WASM
validate Check if a WASM module is valid
disasm Disassemble WASM to WAT text format
run Execute a WASM module
debug Debug a WASM module interactively
optimize Optimize a WASM module
info Display module information
OPTIONS:
-h, --help Print help information
-V, --version Print version information
-v, --verbose Enable verbose output
EXAMPLES:
mywasm compile hello.mini -o hello.wasm
mywasm validate hello.wasm
mywasm run hello.wasm
mywasm debug hello.wasm
mywasm disasm hello.wasm > hello.wat
Project Specification
Required Components
Core (Must Have):
- Unified CLI - Single `mywasm` command with subcommands
- Validator - Structural and type validation
- Disassembler - Binary to WAT conversion
- Debugger - Basic stepping and inspection
- Integration - Previous projects working together
Enhanced (Should Have):
- Optimizer - At least constant folding
- Module Info - Display module structure
- Error Messages - Clear, actionable diagnostics
- Test Suite - Comprehensive automated tests
Stretch (Nice to Have):
- Linker - Combine multiple modules
- Source Maps - Map WASM back to source
- Profiler - Execution timing and hotspots
- REPL - Interactive WASM evaluation
Success Criteria
- End-to-end flow: Compile, validate, run a program
- Validation catches errors: Reject malformed modules
- Debugger works: Set breakpoint, hit it, inspect state
- Disassembly round-trips: `disasm(compile(source))` is readable WAT that reassembles into an equivalent module
- Professional CLI: Help text, error messages, return codes
- Test coverage: All major functionality tested
Solution Architecture
Directory Structure
mywasm/
├── src/
│   ├── main.c                  # CLI entry point
│   ├── cli/
│   │   ├── cli.c               # Command parsing
│   │   ├── cli.h
│   │   ├── compile_cmd.c       # compile subcommand
│   │   ├── validate_cmd.c      # validate subcommand
│   │   ├── run_cmd.c           # run subcommand
│   │   ├── debug_cmd.c         # debug subcommand
│   │   ├── disasm_cmd.c        # disasm subcommand
│   │   └── optimize_cmd.c      # optimize subcommand
│   │
│   ├── compiler/               # From Project 4
│   │   ├── lexer.c
│   │   ├── parser.c
│   │   ├── checker.c
│   │   └── codegen.c
│   │
│   ├── runtime/                # From Projects 3 & 5
│   │   ├── parser.c            # WASM binary parser
│   │   ├── exec.c              # Interpreter
│   │   ├── memory.c
│   │   ├── stack.c
│   │   └── wasi/
│   │       ├── wasi.c
│   │       └── fd_table.c
│   │
│   ├── validator/              # NEW
│   │   ├── validator.c
│   │   ├── validator.h
│   │   ├── type_checker.c
│   │   └── struct_checker.c
│   │
│   ├── disasm/                 # NEW
│   │   ├── disasm.c
│   │   └── disasm.h
│   │
│   ├── debugger/               # NEW
│   │   ├── debugger.c
│   │   ├── debugger.h
│   │   ├── breakpoints.c
│   │   └── ui.c
│   │
│   ├── optimizer/              # NEW
│   │   ├── optimizer.c
│   │   ├── optimizer.h
│   │   ├── const_fold.c
│   │   ├── dead_code.c
│   │   └── peephole.c
│   │
│   └── common/
│       ├── types.h
│       ├── error.c
│       ├── error.h
│       └── util.c
│
├── tests/
│   ├── compiler/
│   ├── validator/
│   ├── runtime/
│   ├── debugger/
│   ├── optimizer/
│   └── integration/
│
├── docs/
│   ├── user_guide.md
│   ├── architecture.md
│   └── contributing.md
│
├── examples/
│   ├── hello.mini
│   ├── factorial.mini
│   ├── fibonacci.mini
│   └── cat.mini
│
├── Makefile
└── README.md
Shared Module Representation
All tools share a common module representation:
// common/types.h
#include <stddef.h>
#include <stdint.h>
typedef struct {
// Type section
FuncType* types;
uint32_t type_count;
// Function section
uint32_t* func_types; // Type index for each function
uint32_t func_count;
// Code section: one body per defined function, so it also has func_count entries
FuncBody* code;
// Memory section
Memory* memories;
uint32_t memory_count;
// Global section
Global* globals;
uint32_t global_count;
// Import section
Import* imports;
uint32_t import_count;
// Export section
Export* exports;
uint32_t export_count;
// Data section
DataSegment* data;
uint32_t data_count;
// Custom sections (for names, debug info)
CustomSection* custom;
uint32_t custom_count;
// Name section data (if present)
NameSection* names;
} Module;
// Shared across tools:
Module* parse_wasm(const uint8_t* bytes, size_t len);
void free_module(Module* module);
uint8_t* emit_wasm(Module* module, size_t* out_len);
Implementation Guide
Phase 1: CLI Framework (Days 1-5)
Goal: Unified command-line interface
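// main.c
#include <stdio.h>
#include <string.h>
#include "cli/cli.h" // assumed to declare the cmd_* handlers, print_usage(), print_help()
int main(int argc, char** argv) {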
if (argc < 2) {
print_usage();
return 1;
}
const char* cmd = argv[1];
if (strcmp(cmd, "compile") == 0) {
return cmd_compile(argc - 1, argv + 1);
} else if (strcmp(cmd, "validate") == 0) {
return cmd_validate(argc - 1, argv + 1);
} else if (strcmp(cmd, "run") == 0) {
return cmd_run(argc - 1, argv + 1);
} else if (strcmp(cmd, "debug") == 0) {
return cmd_debug(argc - 1, argv + 1);
} else if (strcmp(cmd, "disasm") == 0) {
return cmd_disasm(argc - 1, argv + 1);
} else if (strcmp(cmd, "optimize") == 0) {
return cmd_optimize(argc - 1, argv + 1);
} else if (strcmp(cmd, "--help") == 0 || strcmp(cmd, "-h") == 0) {
print_help();
return 0;
} else {
fprintf(stderr, "Unknown command: %s\n", cmd);
return 1;
}
}
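As the command list grows, a table keeps dispatch and help text in one place. (The help text already lists `info`, which the `if` chain above does not dispatch.) This is a sketch of that alternative. `Command`, `find_command`, `print_commands`, and the stub handlers are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef int (*CommandFn)(int argc, char** argv);

typedef struct {
    const char* name;
    CommandFn run;
    const char* summary;   /* one line for `mywasm --help` */
} Command;

/* Stub handlers standing in for the real cmd_* functions. */
static int stub_validate(int argc, char** argv) { (void)argc; (void)argv; return 0; }
static int stub_info(int argc, char** argv)     { (void)argc; (void)argv; return 0; }

static const Command COMMANDS[] = {
    {"validate", stub_validate, "Check if a WASM module is valid"},
    {"info",     stub_info,     "Display module information"},
};
static const size_t COMMAND_COUNT = sizeof COMMANDS / sizeof COMMANDS[0];

const Command* find_command(const char* name) {
    for (size_t i = 0; i < COMMAND_COUNT; i++)
        if (strcmp(COMMANDS[i].name, name) == 0) return &COMMANDS[i];
    return NULL;
}

/* Help output generated from the same table, so it cannot drift. */
void print_commands(FILE* out) {
    for (size_t i = 0; i < COMMAND_COUNT; i++)
        fprintf(out, "  %-10s %s\n", COMMANDS[i].name, COMMANDS[i].summary);
}
```

In `main`, the dispatch then becomes `const Command* c = find_command(argv[1]); if (c) return c->run(argc - 1, argv + 1);`, followed by the unknown-command error.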
Checkpoint: mywasm --help shows all commands.
Phase 2: Integration (Days 6-10)
Goal: Connect previous projects
Wire up existing code to CLI commands:
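// cli/compile_cmd.c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "cli.h" // assumed to provide read_file(), write_file(), compile(), CompileResult
int cmd_compile(int argc, char** argv) {
    // Parse arguments (argv[0] is "compile")
    const char* input = NULL;
    const char* output = "a.wasm";
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-o") == 0 && i + 1 < argc) {
            output = argv[++i];
        } else if (argv[i][0] != '-' && !input) {
            input = argv[i];
        } else {
            // Unknown option, dangling -o, or a second input file
            fprintf(stderr, "Error: unexpected argument '%s'\n", argv[i]);
            return 1;
        }
    }
    if (!input) {
        fprintf(stderr, "Usage: mywasm compile <input.mini> [-o output.wasm]\n");
        return 1;
    }
    // Read source
    char* source = read_file(input);
    if (!source) {
        fprintf(stderr, "Error: Cannot read %s\n", input);
        return 1;
    }
    // Compile (from Project 4)
    CompileResult result = compile(source);
    free(source);
    if (result.error) {
        fprintf(stderr, "%s:%d: %s\n",
                input, result.error_line, result.error_msg);
        return 1;
    }
    // Write output. Assumes write_file() returns false on failure and that
    // result.wasm is heap-allocated; adjust to Project 4's actual API.
    bool written = write_file(output, result.wasm, result.wasm_len);
    free(result.wasm);
    if (!written) {
        fprintf(stderr, "Error: Cannot write %s\n", output);
        return 1;
    }
    printf("Compiled %s -> %s (%zu bytes)\n", input, output, result.wasm_len);
    return 0;
}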
Checkpoint: mywasm compile hello.mini && mywasm run hello.wasm works.
Phase 3: Validator (Days 11-20)
Goal: Catch invalid modules
// validator/validator.c
#include <stdbool.h>
#include <stdint.h>
#include "common/types.h" // Module, FuncBody, FuncType
// ValidatorState and the validate_*/create_*/free_* helpers are assumed to be
// declared in a private validator header
typedef struct {
    bool valid;
    char error[256];
    int error_offset;
} ValidationResult;
bool validate_function(Module* module, uint32_t idx, ValidationResult* result);
ValidationResult validate_module(Module* module) {
    ValidationResult result = {.valid = true};
    // Check structure
    if (!validate_structure(module, &result)) return result;
    // Check types before functions: validate_function() indexes
    // module->types through func_types[], so those indices must be in bounds
    if (!validate_types(module, &result)) return result;
    // Check functions
    for (uint32_t i = 0; i < module->func_count; i++) {
        if (!validate_function(module, i, &result)) return result;
    }
    // Check memory
    if (!validate_memory(module, &result)) return result;
    // Check data segments
    if (!validate_data(module, &result)) return result;
    return result;
}
bool validate_function(Module* module, uint32_t idx, ValidationResult* result) {
    FuncBody* body = &module->code[idx];
    FuncType* type = &module->types[module->func_types[idx]];
    // Initialize validator state
    ValidatorState state = {
        .stack = create_type_stack(),
        .control = create_control_stack(),
        .locals = get_local_types(module, idx),
        .num_locals = body->local_count + type->param_count,
    };
    bool ok = true;
    // Validate each instruction, stopping at the first error
    for (size_t i = 0; ok && i < body->code_len; ) {
        uint8_t opcode = body->code[i++];
        ok = validate_instruction(&state, opcode, body->code, &i, result);
    }
    // Check final stack matches return type
    if (ok) {
        ok = check_stack_matches(&state, type->results, type->result_count, result);
    }
    // Release per-function state on every path (free_* are the assumed
    // counterparts of the create_* helpers above)
    free_type_stack(state.stack);
    free_control_stack(state.control);
    return ok;
}
Test invalid modules:
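# wat2wasm validates by default and would refuse to emit these modules;
# --no-check makes it write the invalid binary so mywasm can reject it
# Stack underflow
echo "(module (func i32.add))" > bad.wat
wat2wasm --no-check bad.wat -o bad.wasm
./mywasm validate bad.wasm
# Expected: "Error: Stack underflow at offset 0x17" (the i32.add byte)
# Type mismatch
echo "(module (func (result i32) f32.const 1.0))" > bad.wat
wat2wasm --no-check bad.wat -o bad.wasm
./mywasm validate bad.wasm
# Expected: "Error: Type mismatch: expected i32, got f32"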
Checkpoint: Rejects invalid modules with clear errors.
Phase 4: Disassembler (Days 21-28)
Goal: Convert WASM back to WAT
// disasm/disasm.c
void disassemble(Module* module, FILE* out) {
fprintf(out, "(module\n");
// Types
for (uint32_t i = 0; i < module->type_count; i++) {
disasm_type(module, i, out);
}
// Imports
for (uint32_t i = 0; i < module->import_count; i++) {
disasm_import(module, i, out);
}
// Functions
for (uint32_t i = 0; i < module->func_count; i++) {
disasm_function(module, i, out);
}
// Memory
for (uint32_t i = 0; i < module->memory_count; i++) {
disasm_memory(module, i, out);
}
// Exports
for (uint32_t i = 0; i < module->export_count; i++) {
disasm_export(module, i, out);
}
// Data segments
for (uint32_t i = 0; i < module->data_count; i++) {
disasm_data(module, i, out);
}
fprintf(out, ")\n");
}
void disasm_function(Module* module, uint32_t idx, FILE* out) {
FuncBody* body = &module->code[idx];
FuncType* type = &module->types[module->func_types[idx]];
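// Get name if available. The name section indexes the function index space,
// where imported functions come first (count_imported_funcs() is a hypothetical helper)
const char* name = get_func_name(module, count_imported_funcs(module) + idx);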
fprintf(out, " (func");
if (name) fprintf(out, " $%s", name);
fprintf(out, " (type %u)", module->func_types[idx]);
// Parameters
for (uint32_t i = 0; i < type->param_count; i++) {
fprintf(out, " (param %s)", type_name(type->params[i]));
}
// Results
for (uint32_t i = 0; i < type->result_count; i++) {
fprintf(out, " (result %s)", type_name(type->results[i]));
}
fprintf(out, "\n");
// Locals
for (uint32_t i = 0; i < body->local_count; i++) {
fprintf(out, " (local %s)\n", type_name(body->locals[i]));
}
// Instructions
disasm_instructions(module, body->code, body->code_len, out, 2);
fprintf(out, " )\n");
}
void disasm_instructions(Module* module, uint8_t* code, size_t len,
FILE* out, int indent) {
const int base = indent; // indentation of the function body itself
size_t i = 0;
while (i < len) {
uint8_t opcode = code[i++];
if (opcode == 0x0b) { // end: dedent before printing, not after
indent--;
// The body's final end closes the function; it has no text-format
// counterpart, since the caller prints the closing ")"
if (indent < base) break;
print_indent(out, indent);
fprintf(out, "end\n");
continue;
}
print_indent(out, indent);
switch (opcode) {
case 0x00:
fprintf(out, "unreachable\n");
break;
case 0x01:
fprintf(out, "nop\n");
break;
case 0x02: { // block
uint8_t block_type = code[i++]; // 0x40 = no result, else a value type
fprintf(out, "block");
if (block_type != 0x40) {
fprintf(out, " (result %s)", type_name(block_type));
}
fprintf(out, "\n");
indent++;
break;
}
case 0x03: { // loop
uint8_t block_type = code[i++];
fprintf(out, "loop");
if (block_type != 0x40) {
fprintf(out, " (result %s)", type_name(block_type));
}
fprintf(out, "\n");
indent++;
break;
}
case 0x0c: { // br
uint32_t depth = read_leb128(code, &i);
fprintf(out, "br %u\n", depth);
break;
}
case 0x10: { // call
uint32_t func_idx = read_leb128(code, &i);
const char* name = get_func_name(module, func_idx);
if (name) {
fprintf(out, "call $%s\n", name);
} else {
fprintf(out, "call %u\n", func_idx);
}
break;
}
case 0x20: { // local.get
uint32_t idx = read_leb128(code, &i);
fprintf(out, "local.get %u\n", idx);
break;
}
case 0x41: { // i32.const
int32_t val = read_sleb128(code, &i);
fprintf(out, "i32.const %d\n", val);
break;
}
case 0x6a:
fprintf(out, "i32.add\n");
break;
// ... all other opcodes ...
default:
fprintf(out, ";; unknown opcode 0x%02x\n", opcode);
}
}
}
Checkpoint: mywasm disasm hello.wasm produces readable WAT.
Phase 5: Debugger (Days 29-42)
Goal: Interactive debugging
// debugger/debugger.c
typedef struct {
Module* module;
Instance* instance;
// Breakpoints
Breakpoint* breakpoints;
int breakpoint_count;
// Current state
uint32_t current_func;
size_t current_ip;
bool running;
bool stepping;
} Debugger;
void debug_repl(Debugger* dbg) {
char line[256];
printf("WebAssembly Debugger\n");
printf("Type 'help' for commands.\n\n");
while (true) {
printf("(wdb) ");
fflush(stdout);
if (!fgets(line, sizeof(line), stdin)) break;
// Remove newline
line[strcspn(line, "\n")] = 0;
// Parse and execute command
char* cmd = strtok(line, " ");
if (!cmd) continue;
if (strcmp(cmd, "run") == 0 || strcmp(cmd, "r") == 0) {
cmd_run(dbg);
} else if (strcmp(cmd, "break") == 0 || strcmp(cmd, "b") == 0) {
char* arg = strtok(NULL, " ");
cmd_break(dbg, arg);
} else if (strcmp(cmd, "continue") == 0 || strcmp(cmd, "c") == 0) {
cmd_continue(dbg);
} else if (strcmp(cmd, "step") == 0 || strcmp(cmd, "s") == 0) {
cmd_step(dbg);
} else if (strcmp(cmd, "next") == 0 || strcmp(cmd, "n") == 0) {
cmd_next(dbg);
} else if (strcmp(cmd, "stack") == 0) {
cmd_stack(dbg);
} else if (strcmp(cmd, "locals") == 0) {
cmd_locals(dbg);
} else if (strcmp(cmd, "memory") == 0 || strcmp(cmd, "x") == 0) {
char* addr_str = strtok(NULL, " ");
char* len_str = strtok(NULL, " ");
cmd_memory(dbg, addr_str, len_str);
} else if (strcmp(cmd, "backtrace") == 0 || strcmp(cmd, "bt") == 0) {
cmd_backtrace(dbg);
} else if (strcmp(cmd, "help") == 0 || strcmp(cmd, "h") == 0) {
cmd_help();
} else if (strcmp(cmd, "quit") == 0 || strcmp(cmd, "q") == 0) {
break;
} else {
printf("Unknown command: %s\n", cmd);
}
}
}
void cmd_step(Debugger* dbg) {
if (!dbg->running) {
printf("Program not running. Use 'run' to start.\n");
return;
}
// Execute one instruction
dbg->stepping = true;
execute_one(dbg->instance);
dbg->stepping = false;
// Show current instruction
print_current_instruction(dbg);
}
void cmd_stack(Debugger* dbg) {
Stack* stack = &dbg->instance->stack;
printf("Value stack (%d values):\n", stack->sp);
for (int i = stack->sp - 1; i >= 0; i--) {
Value* v = &stack->data[i];
printf(" [%d] %s: ", stack->sp - 1 - i, type_name(v->type));
print_value(v);
printf("\n");
}
}
void cmd_locals(Debugger* dbg) {
Frame* frame = current_frame(dbg->instance);
printf("Local variables:\n");
for (uint32_t i = 0; i < frame->local_count; i++) {
Value* v = &frame->locals[i];
const char* name = get_local_name(dbg->module, dbg->current_func, i);
if (name) {
printf(" $%s: ", name);
} else {
printf(" [%u]: ", i);
}
printf("%s = ", type_name(v->type));
print_value(v);
printf("\n");
}
}
Debugger integration with the interpreter:
// Modify exec.c to support debugging
typedef void (*DebugHook)(Instance* inst, uint8_t opcode, size_t ip);
void execute_with_debug(Instance* inst, DebugHook hook) {
while (!inst->halted) {
uint8_t opcode = read_byte(inst);
// Call debug hook before each instruction
if (hook) {
hook(inst, opcode, inst->ip - 1);
}
execute_instruction(inst, opcode);
}
}
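The Debugger struct stores a Breakpoint array, but the type is never defined. One possible shape is sketched below, assuming breakpoints are keyed by function index and byte offset within the body, together with the check a DebugHook could run before each instruction to decide whether to hand control back to the REPL. The hook signature above receives only ip, so this sketch assumes the current function index is available separately (for example from the Debugger's current_func). Breakpoint's fields, BreakpointTable, and should_pause are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical breakpoint record: function index + byte offset in its body. */
typedef struct {
    uint32_t func_idx;
    size_t offset;
    bool enabled;
} Breakpoint;

typedef struct {
    Breakpoint* items;
    int count;
    bool stepping;   /* single-step mode: pause before every instruction */
} BreakpointTable;

/* Called before each instruction; true means "stop and enter the REPL". */
bool should_pause(const BreakpointTable* t, uint32_t func_idx, size_t ip) {
    if (t->stepping) return true;
    for (int i = 0; i < t->count; i++) {
        const Breakpoint* bp = &t->items[i];
        if (bp->enabled && bp->func_idx == func_idx && bp->offset == ip)
            return true;
    }
    return false;
}
```

A linear scan is fine for a handful of breakpoints; a debugger with many could key them in a hash set instead.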
Checkpoint: Can set breakpoint, hit it, inspect stack and locals.
Phase 6: Optimizer (Days 43-52)
Goal: Improve code quality
// optimizer/optimizer.c
Module* optimize(Module* module, OptimizeOptions* opts) {
Module* opt = clone_module(module);
for (uint32_t i = 0; i < opt->func_count; i++) {
FuncBody* body = &opt->code[i];
if (opts->constant_fold) {
constant_fold(body);
}
if (opts->dead_code) {
eliminate_dead_code(body);
}
if (opts->peephole) {
peephole_optimize(body);
}
}
return opt;
}
// optimizer/const_fold.c
void constant_fold(FuncBody* body) {
    // Build instruction list
    Instruction* instrs = decode_instructions(body->code, body->code_len);
    int count = count_instructions(instrs);
    // Look for patterns
    for (int i = 0; i < count - 2; i++) {
        // i32.const X; i32.const Y; i32.add -> i32.const (X+Y)
        if (instrs[i].opcode == 0x41 &&      // i32.const
            instrs[i+1].opcode == 0x41 &&    // i32.const
            instrs[i+2].opcode == 0x6a) {    // i32.add
            int32_t a = instrs[i].i32_val;
            int32_t b = instrs[i+1].i32_val;
            // i32.add wraps modulo 2^32, but signed overflow is undefined
            // behavior in C: add as unsigned, then convert back
            int32_t result = (int32_t)((uint32_t)a + (uint32_t)b);
            // Replace with single constant
            instrs[i].i32_val = result;
            mark_deleted(&instrs[i+1]);
            mark_deleted(&instrs[i+2]);
        }
        // Similar for other operations: sub and mul wrap the same way; never
        // fold div/rem when they would trap (divisor 0, or INT32_MIN / -1 for div_s)
    }
    // Rebuild body
    body->code = encode_instructions(instrs, &body->code_len);
}
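Extending the fold beyond i32.add needs two rules: arithmetic must wrap like WebAssembly's i32 (so compute on uint32_t), and an operation that would trap at run time must not be folded, or the optimizer silently turns a trap into a constant. A minimal sketch of a helper the loop could call; fold_i32_binop is a hypothetical name and only four opcodes are covered.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fold an i32 binary operator whose operands are both constants.
   Returns false if the opcode is not handled or the instruction would trap,
   in which case the caller must leave the original instructions alone. */
bool fold_i32_binop(uint8_t opcode, int32_t a, int32_t b, int32_t* out) {
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;   /* wrap modulo 2^32 */
    switch (opcode) {
    case 0x6a: *out = (int32_t)(ua + ub); return true;   /* i32.add */
    case 0x6b: *out = (int32_t)(ua - ub); return true;   /* i32.sub */
    case 0x6c: *out = (int32_t)(ua * ub); return true;   /* i32.mul */
    case 0x6d:                                            /* i32.div_s */
        if (b == 0 || (a == INT32_MIN && b == -1))
            return false;                 /* both trap in WebAssembly */
        *out = a / b;                     /* C and wasm both truncate toward zero */
        return true;
    default:
        return false;
    }
}
```

Returning false rather than an error keeps the optimizer conservative: anything it does not fully understand is simply left unfolded.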
// optimizer/dead_code.c
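void eliminate_dead_code(FuncBody* body) {
    Instruction* instrs = decode_instructions(body->code, body->code_len);
    int count = count_instructions(instrs);
    // Mark instructions after unconditional control transfers as dead.
    // Blocks opened inside the dead region are deleted together with their
    // matching end (tracked by dead_depth), so the structure stays balanced.
    bool unreachable = false;
    int dead_depth = 0;
    for (int i = 0; i < count; i++) {
        uint8_t op = instrs[i].opcode;
        if (unreachable) {
            if (op == 0x02 || op == 0x03 || op == 0x04) {   // block, loop, if
                dead_depth++;
                mark_deleted(&instrs[i]);
                continue;
            }
            if (dead_depth == 0 && (op == 0x0b || op == 0x05)) {
                // end/else of the enclosing live block: keep it; code after it is live
                unreachable = false;
            } else {
                if (op == 0x0b) dead_depth--;   // end of a nested dead block
                mark_deleted(&instrs[i]);
                continue;
            }
        }
        // These make the following code unreachable
        if (op == 0x00 ||      // unreachable
            op == 0x0c ||      // br (unconditional)
            op == 0x0e ||      // br_table (always branches)
            op == 0x0f) {      // return
            unreachable = true;
        }
    }
    body->code = encode_instructions(instrs, &body->code_len);
}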
Checkpoint: mywasm optimize reduces code size on test programs.
Phase 7: Testing & Polish (Days 53-60+)
Goal: Production quality
#!/bin/bash
# tests/integration/test_full_pipeline.sh
set -e
echo "=== Integration Tests ==="
# Test 1: Full pipeline
echo "Test 1: Compile -> Validate -> Run"
./mywasm compile examples/factorial.mini -o /tmp/fact.wasm
./mywasm validate /tmp/fact.wasm
result=$(./mywasm run /tmp/fact.wasm --invoke factorial 10)
[ "$result" = "3628800" ] && echo "PASS" || echo "FAIL: expected 3628800, got $result"
# Test 2: Disassembly round-trip
echo "Test 2: Disassembly produces valid WAT"
./mywasm disasm /tmp/fact.wasm > /tmp/fact.wat
wat2wasm /tmp/fact.wat -o /tmp/fact2.wasm
./mywasm validate /tmp/fact2.wasm && echo "PASS" || echo "FAIL"
# Test 3: Optimizer preserves semantics
echo "Test 3: Optimization preserves semantics"
./mywasm optimize /tmp/fact.wasm -o /tmp/fact_opt.wasm
./mywasm validate /tmp/fact_opt.wasm
result_opt=$(./mywasm run /tmp/fact_opt.wasm --invoke factorial 10)
[ "$result_opt" = "3628800" ] && echo "PASS" || echo "FAIL"
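# Test 4: Validator rejects bad modules
echo "Test 4: Validator rejects invalid module"
# A header-only module (00 61 73 6d 01 00 00 00) is valid, so use one that fails
# type checking: a () -> i32 function whose body leaves an f32 (f32.const 1.0)
echo "00 61 73 6d 01 00 00 00 01 05 01 60 00 01 7f 03 02 01 00 0a 09 01 07 00 43 00 00 80 3f 0b" | xxd -r -p > /tmp/invalid.wasm
if ./mywasm validate /tmp/invalid.wasm > /dev/null 2>&1; then echo "FAIL"; else echo "PASS"; fi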
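# Test 5: Error messages are helpful
echo "Test 5: Error messages name the problem and its location"
echo "func main() { return x; }" > /tmp/bad.mini
errors=$(./mywasm compile /tmp/bad.mini 2>&1 || true)
# Location check assumes the file:line format used elsewhere in this guide (e.g. math.mini:1)
echo "$errors" | grep -q "undefined" && echo "$errors" | grep -q "bad.mini:1" && echo "PASS" || echo "FAIL"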
echo "=== All tests completed ==="
Testing Strategy
Unit Tests
Test each component in isolation:
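// tests/validator/test_type_check.c
#include <assert.h>
#include <stdint.h>
#include <string.h>
// ...plus your validator header declaring validate_code() and ValidationResult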
void test_stack_balance() {
// Valid: push, push, add = balanced
uint8_t code[] = {0x41, 0x01, 0x41, 0x02, 0x6a, 0x0b};
ValidationResult r = validate_code(code, sizeof(code), TYPE_I32);
assert(r.valid);
}
void test_stack_underflow() {
// Invalid: add with empty stack
uint8_t code[] = {0x6a, 0x0b};
ValidationResult r = validate_code(code, sizeof(code), TYPE_VOID);
assert(!r.valid);
assert(strstr(r.error, "underflow"));
}
void test_type_mismatch() {
// Invalid: f32 when i32 expected
uint8_t code[] = {0x43, 0x00, 0x00, 0x80, 0x3f, 0x0b}; // f32.const 1.0
ValidationResult r = validate_code(code, sizeof(code), TYPE_I32);
assert(!r.valid);
assert(strstr(r.error, "mismatch"));
}
Fuzzing
Use fuzzing to find edge cases:
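// tests/fuzz/fuzz_validator.c (libFuzzer entry point; build with clang -fsanitize=fuzzer)
#include <stddef.h>
#include <stdint.h>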
int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
if (size < 8) return 0;
// Try to parse as WASM
Module* module = parse_wasm(data, size);
if (module) {
// Validate (should not crash)
ValidationResult r = validate_module(module);
// If valid, try to run
if (r.valid) {
Instance* inst = instantiate(module);
if (inst) {
// Run briefly
execute_steps(inst, 1000);
free_instance(inst);
}
}
free_module(module);
}
return 0;
}
Spec Conformance
Run the official WebAssembly test suite:
# Clone spec tests
git clone https://github.com/WebAssembly/spec.git
# Convert .wast to .wasm and run
for wast in spec/test/core/*.wast; do
wast2json "$wast" -o /tmp/test.json
./mywasm test /tmp/test.json
done
Common Pitfalls
1. Validator Stack Polymorphism
After unreachable (and likewise after br, br_table, or return), the operand stack is polymorphic:
(func (result i32)
unreachable
;; At this point, stack could be anything
i32.add ;; This is actually valid!
)
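Handle this with a special "polymorphic" stack state: once a block becomes unreachable, popping below the height at which the block was entered succeeds and yields a value of unknown type that matches any expected type.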
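A minimal sketch of that state, modeled loosely on the validation algorithm in the appendix of the WebAssembly specification: each control frame records the operand-stack height at block entry plus an unreachable flag. All names here (Validator, pop_expect, mark_unreachable, and so on) are hypothetical, and capacity checks are omitted.

```c
#include <stdbool.h>
#include <stddef.h>

/* VT_UNKNOWN is what a pop from a polymorphic stack produces. */
typedef enum { VT_UNKNOWN, VT_I32, VT_I64, VT_F32, VT_F64 } ValType;

#define MAX_VALS 256
#define MAX_CTRLS 64

typedef struct {
    size_t height;      /* operand stack height when the block was entered */
    bool unreachable;   /* set after unreachable / br / br_table / return */
} CtrlFrame;

typedef struct {
    ValType vals[MAX_VALS];
    size_t nvals;
    CtrlFrame ctrls[MAX_CTRLS];
    size_t nctrls;
    bool error;
} Validator;

void push_val(Validator* v, ValType t) { v->vals[v->nvals++] = t; }

void push_ctrl(Validator* v) {
    v->ctrls[v->nctrls].height = v->nvals;
    v->ctrls[v->nctrls].unreachable = false;
    v->nctrls++;
}

/* At the block's base height, an unreachable block yields VT_UNKNOWN;
   a reachable one has a genuine stack underflow. */
ValType pop_val(Validator* v) {
    const CtrlFrame* f = &v->ctrls[v->nctrls - 1];
    if (v->nvals == f->height) {
        if (!f->unreachable) v->error = true;
        return VT_UNKNOWN;
    }
    return v->vals[--v->nvals];
}

/* Pop an operand that must have type 'expect'; VT_UNKNOWN matches anything. */
ValType pop_expect(Validator* v, ValType expect) {
    ValType actual = pop_val(v);
    if (actual != VT_UNKNOWN && actual != expect) v->error = true;
    return actual == VT_UNKNOWN ? expect : actual;
}

/* After an unconditional transfer: drop the block's operands, go polymorphic. */
void mark_unreachable(Validator* v) {
    CtrlFrame* f = &v->ctrls[v->nctrls - 1];
    v->nvals = f->height;
    f->unreachable = true;
}

void check_i32_add(Validator* v) {
    pop_expect(v, VT_I32);
    pop_expect(v, VT_I32);
    push_val(v, VT_I32);
}
```

Replaying the example function: push_ctrl for the body, mark_unreachable for unreachable, check_i32_add for i32.add, then pop_expect(VT_I32) for the function's result. No error is recorded, while the same i32.add on a reachable empty stack reports an underflow.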
2. Debugger Thread Safety
If you support threading later, debugger state must be synchronized:
// Use mutex for breakpoint list
pthread_mutex_lock(&dbg->breakpoint_mutex);
// ... modify breakpoints ...
pthread_mutex_unlock(&dbg->breakpoint_mutex);
3. Optimizer Correctness
Always verify optimizations preserve semantics:
// Before releasing optimization:
// 1. Run all tests with optimization
// 2. Compare output of optimized vs unoptimized
// 3. Check code still validates
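Step 2 is differential testing: feed identical inputs to both versions and compare the results. A minimal sketch, assuming some way to invoke an exported (i32, i32) -> i32 function from each module. Here two C functions stand in for the unoptimized and optimized modules: the gcd from math.mini in the workflow walkthrough and a recursive rewrite of it. BinaryFn, differential_check, gcd_iterative, and gcd_recursive are hypothetical names.

```c
#include <stdint.h>
#include <stdio.h>

/* Stand-in for "invoke an exported (i32, i32) -> i32 function". */
typedef int32_t (*BinaryFn)(int32_t, int32_t);

/* xorshift32: tiny deterministic PRNG, so a failing seed can be replayed. */
static uint32_t xorshift32(uint32_t* state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

/* Run both versions on 'trials' random inputs; report the first mismatch
   and return how many inputs disagreed. */
int differential_check(BinaryFn reference, BinaryFn optimized,
                       int trials, uint32_t seed) {
    uint32_t state = seed ? seed : 1;   /* xorshift state must be non-zero */
    int mismatches = 0;
    for (int i = 0; i < trials; i++) {
        int32_t a = (int32_t)(xorshift32(&state) % 100000);
        int32_t b = (int32_t)(xorshift32(&state) % 100000);
        int32_t want = reference(a, b), got = optimized(a, b);
        if (want != got && mismatches++ == 0)
            printf("mismatch: f(%d, %d) = %d, optimized gave %d\n",
                   (int)a, (int)b, (int)want, (int)got);
    }
    return mismatches;
}

/* Example pair: the loop from math.mini and an equivalent recursive form. */
int32_t gcd_iterative(int32_t a, int32_t b) {
    while (b != 0) { int32_t t = b; b = a % b; a = t; }
    return a;
}

int32_t gcd_recursive(int32_t a, int32_t b) {
    return b == 0 ? a : gcd_recursive(b, a % b);
}
```

In the real toolchain the two function pointers would wrap "run unoptimized module" and "run optimized module" through the interpreter; fixing the seed makes any reported mismatch reproducible.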
4. CLI Argument Parsing
Handle edge cases:
# These should all work:
mywasm run program.wasm
mywasm run program.wasm --
mywasm run program.wasm -- arg1 arg2
mywasm run program.wasm --verbose -- arg1
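These cases follow the common convention that -- ends the tool's own options and everything after it is passed through to the guest program. A minimal sketch for the run subcommand, assuming --verbose is the only option; RunArgs and parse_run_args are hypothetical names.

```c
#include <stdbool.h>
#include <string.h>

typedef struct {
    const char* file;      /* the .wasm module to run */
    bool verbose;
    char** guest_argv;     /* arguments after "--", passed to the program */
    int guest_argc;
    const char* error;     /* non-NULL if parsing failed */
} RunArgs;

/* argv[0] is the tool name and argv[1] is "run". */
RunArgs parse_run_args(int argc, char** argv) {
    RunArgs r = {0};
    int i = 2;
    for (; i < argc; i++) {
        if (strcmp(argv[i], "--") == 0) { i++; break; }   /* end of our options */
        if (strcmp(argv[i], "--verbose") == 0) {
            r.verbose = true;
        } else if (argv[i][0] == '-' && argv[i][1] != '\0') {
            r.error = "unknown option";
            return r;
        } else if (!r.file) {
            r.file = argv[i];
        } else {
            r.error = "unexpected argument (put program arguments after --)";
            return r;
        }
    }
    if (!r.file) { r.error = "missing module file"; return r; }
    r.guest_argv = argv + i;     /* empty when there is no "--" or nothing follows it */
    r.guest_argc = argc - i;
    return r;
}
```

Rejecting a stray positional argument (instead of silently treating it as a guest argument) makes the -- rule explicit to users.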
5. Error Message Quality
Bad: Error: validation failed
Good: Error at function $add (offset 0x42): stack underflow on i32.add
Include:
- Location (function, offset, line if available)
- What went wrong
- What was expected
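Keeping those three parts as separate fields, rather than formatting strings at each error site, lets every tool in the toolchain report errors the same way. A minimal sketch; Diagnostic and format_diagnostic are hypothetical names, and the output format mirrors the "Good" example above.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    const char* func_name;   /* from the name section; NULL if absent */
    unsigned func_idx;       /* fallback when there is no name */
    size_t offset;           /* byte offset of the failing instruction */
    const char* problem;     /* what went wrong */
    const char* expected;    /* what was expected; NULL if not applicable */
} Diagnostic;

/* Render into buf; returns what snprintf returns (length needed). */
int format_diagnostic(char* buf, size_t size, const Diagnostic* d) {
    char where[64];
    if (d->func_name)
        snprintf(where, sizeof where, "function $%s", d->func_name);
    else
        snprintf(where, sizeof where, "function %u", d->func_idx);
    if (d->expected)
        return snprintf(buf, size, "Error at %s (offset 0x%zx): %s (expected %s)",
                        where, d->offset, d->problem, d->expected);
    return snprintf(buf, size, "Error at %s (offset 0x%zx): %s",
                    where, d->offset, d->problem);
}
```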
Extensions
1. Module Linking
Combine multiple modules:
mywasm link module1.wasm module2.wasm -o combined.wasm
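At its core, combining modules means resolving each import of one module against the exports of another: the two-level import name (module, field) must match an export, and the function types must be identical. A minimal sketch of that check, assuming signatures are compared as canonical strings such as "(i32 i32) -> i32"; Import, Export, LinkStatus, and resolve_import are hypothetical, and renumbering indices after the merge is not shown.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char* module;   /* first half of the two-level import name */
    const char* field;    /* second half */
    const char* sig;      /* canonical signature string */
} Import;

typedef struct {
    const char* name;
    const char* sig;
    unsigned func_idx;    /* function index in the providing module */
} Export;

typedef enum { LINK_OK, LINK_UNDEFINED, LINK_SIG_MISMATCH } LinkStatus;

/* Resolve one import against the exports of the module named 'provider'.
   On success, *out_idx receives the provider's function index. */
LinkStatus resolve_import(const Import* imp, const char* provider,
                          const Export* exports, size_t n_exports,
                          unsigned* out_idx) {
    if (strcmp(imp->module, provider) != 0) return LINK_UNDEFINED;
    for (size_t i = 0; i < n_exports; i++) {
        if (strcmp(exports[i].name, imp->field) != 0) continue;
        if (strcmp(exports[i].sig, imp->sig) != 0) return LINK_SIG_MISMATCH;
        *out_idx = exports[i].func_idx;
        return LINK_OK;
    }
    return LINK_UNDEFINED;
}
```

Distinguishing "undefined" from "signature mismatch" matters for error quality: the second usually means two modules were compiled against different versions of an interface.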
2. Source Maps
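Record a source location for each instruction offset; this table is what DWARF line information (or a source map) ultimately encodes: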
typedef struct {
uint32_t wasm_offset;
const char* source_file;
int source_line;
int source_column;
} SourceMapping;
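To print a location such as math.mini:4 for a paused instruction, the debugger needs the reverse direction: given a byte offset, find the entry that covers it. A minimal sketch, assuming entries are sorted by wasm_offset and each applies until the next one begins. It repeats the struct above so it compiles on its own; lookup_source is a hypothetical name, and serializing the table as DWARF .debug_line or as a source map is a separate step.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t wasm_offset;
    const char* source_file;
    int source_line;
    int source_column;
} SourceMapping;

/* Binary search for the last entry with wasm_offset <= offset.
   Returns NULL if offset precedes every entry. */
const SourceMapping* lookup_source(const SourceMapping* map, size_t count,
                                   uint32_t offset) {
    size_t lo = 0, hi = count;   /* invariant: answer index is lo - 1 */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (map[mid].wasm_offset <= offset)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo == 0 ? NULL : &map[lo - 1];
}
```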
3. Profiler
Add execution profiling:
$ mywasm profile program.wasm
Function Calls Time(ms) Time%
─────────────────────────────────────────────
factorial 1024 45.2 78.3%
helper 512 8.1 14.0%
main 1 4.5 7.7%
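In the sample output the Time% column sums to 100%, which fits self time (time inside a function excluding its callees); inclusive times would count main's callees twice. A minimal sketch of that accounting, assuming the interpreter calls a hook on every function entry and exit with a monotonic timestamp in nanoseconds (for example from POSIX clock_gettime with CLOCK_MONOTONIC). ProfileEntry, Profiler, profile_enter, profile_exit, and profile_report are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    const char* name;
    uint64_t calls;
    uint64_t self_ns;     /* time in this function, excluding callees */
} ProfileEntry;

#define PROFILE_MAX_DEPTH 1024

typedef struct {
    ProfileEntry* funcs;                  /* one entry per function index */
    uint32_t stack[PROFILE_MAX_DEPTH];    /* call stack of function indices */
    size_t depth;
    uint64_t last_ns;                     /* time of the previous enter/exit event */
} Profiler;

/* Charge the time since the last event to whichever function is running. */
static void charge(Profiler* p, uint64_t now_ns) {
    if (p->depth > 0)
        p->funcs[p->stack[p->depth - 1]].self_ns += now_ns - p->last_ns;
    p->last_ns = now_ns;
}

void profile_enter(Profiler* p, uint32_t func, uint64_t now_ns) {
    charge(p, now_ns);
    p->funcs[func].calls++;
    p->stack[p->depth++] = func;          /* overflow check omitted */
}

void profile_exit(Profiler* p, uint64_t now_ns) {
    charge(p, now_ns);
    p->depth--;
}

void profile_report(const Profiler* p, size_t n_funcs) {
    uint64_t total = 0;
    for (size_t i = 0; i < n_funcs; i++) total += p->funcs[i].self_ns;
    printf("%-12s %8s %10s %7s\n", "Function", "Calls", "Time(ms)", "Time%");
    for (size_t i = 0; i < n_funcs; i++) {
        const ProfileEntry* e = &p->funcs[i];
        printf("%-12s %8llu %10.1f %6.1f%%\n", e->name,
               (unsigned long long)e->calls, (double)e->self_ns / 1e6,
               total ? 100.0 * (double)e->self_ns / (double)total : 0.0);
    }
}
```

Because every nanosecond is charged to exactly one function, the self times add up to the total run time and the percentages sum to 100%.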
4. REPL
Interactive WASM evaluation:
$ mywasm repl
wasm> (module (func (export "add") (param i32 i32) (result i32) local.get 0 local.get 1 i32.add))
Module loaded.
wasm> add(5, 3)
8
wasm> (func (export "mul") (param i32 i32) (result i32) local.get 0 local.get 1 i32.mul)
Function added.
wasm> mul(4, 7)
28
5. IDE Integration
Create a language server:
{
"capabilities": {
"completionProvider": {},
"hoverProvider": true,
"definitionProvider": true,
"diagnosticProvider": { "interFileDependencies": false, "workspaceDiagnostics": false }
}
}
Real-World Outcome
Complete Development Workflow: Source to Debugging
This section demonstrates the complete toolchain workflow, showing how your tools work together from writing source code to debugging runtime issues.
The Complete Pipeline in Action
COMPLETE TOOLCHAIN WORKFLOW

1. WRITE SOURCE CODE
$ cat > math.mini << 'EOF'
func gcd(a: i32, b: i32) -> i32 {
    while b != 0 {
        let temp = b;
        b = a % b;
        a = temp;
    }
    return a;
}
EOF

2. COMPILE WITH DEBUG INFO
$ mywasm compile math.mini -o math.wasm -g
Compiling math.mini...
  ✓ Lexing: 47 tokens
  ✓ Parsing: AST generated
  ✓ Type checking: All types verified
  ✓ Code generation: 89 bytes
  ✓ Debug info: DWARF sections embedded
Output: math.wasm (142 bytes with debug info)

3. VALIDATE MODULE
$ mywasm validate math.wasm
Validating math.wasm...
  ✓ Magic number: valid
  ✓ Version: 1
  ✓ Section order: valid
  ✓ Type section: 1 type(s)
  ✓ Function section: 1 function(s)
  ✓ Code section: All functions type-checked
  ✓ Custom sections: names, dwarf
Module is valid.

4. DISASSEMBLE TO INSPECT
$ mywasm disasm math.wasm
(module
  (type (;0;) (func (param i32 i32) (result i32)))
  (func $gcd (type 0) (param $a i32) (param $b i32) (result i32)
    (local $temp i32)
    block $exit
      loop $continue
        local.get $b
        i32.eqz
        br_if $exit
        local.get $b
        local.set $temp
        local.get $a
        local.get $b
        i32.rem_s
        local.set $b
        local.get $temp
        local.set $a
        br $continue
      end
    end
    local.get $a)
  (export "gcd" (func $gcd)))

5. RUN THE PROGRAM
$ mywasm run math.wasm --invoke gcd 48 18
Result: 6

6. OPTIMIZE FOR PRODUCTION
$ mywasm optimize math.wasm -O2 -o math.opt.wasm
Optimizing math.wasm...
  ✓ Constant folding: 0 expressions folded
  ✓ Dead code elimination: 0 instructions removed
  ✓ Local coalescing: 0 locals merged
  ✓ Strength reduction: 0 operations simplified
Output: math.opt.wasm (89 bytes, 0% reduction)

7. DEBUG A PROBLEM
$ mywasm debug math.wasm
WebAssembly Debugger v1.0
Module loaded: math.wasm
Functions: 1 (gcd)

(wdb) break gcd
Breakpoint 1 at function $gcd (offset 0x00)

(wdb) run gcd 48 18
Starting execution with args: [48, 18]
Breakpoint 1 hit at $gcd
math.mini:1    func gcd(a: i32, b: i32) -> i32 {

(wdb) locals
  $a = i32: 48
  $b = i32: 18
  $temp = i32: 0 (locals are zero-initialized)

(wdb) step 9
Stepped 9 instructions
math.mini:4        b = a % b;

(wdb) stack
  [0] i32: 18 (b)
  [1] i32: 48 (a)

(wdb) eval a % b
Result: i32: 12

(wdb) continue
Execution completed.
Return value: i32: 6

(wdb) quit

Debugging with DWARF and Source Maps
For production debugging, your toolchain supports both DWARF and source maps:
# Compile with DWARF debug info (full debugging, larger files)
$ mywasm compile app.mini -o app.wasm -g --debug-format=dwarf
# Result: DWARF sections embedded in custom sections
# Compile with source maps (location only, smaller files, wider support)
$ mywasm compile app.mini -o app.wasm -g --debug-format=sourcemap
# Result: app.wasm.map generated alongside binary
# Strip debug info for production (keep source map separate)
$ mywasm strip app.wasm -o app.prod.wasm --keep-sourcemap
# Result: minimal binary with external source map reference
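A version 3 source map stores its mappings field as Base64 VLQ numbers: the sign moves into the lowest bit, then the value is emitted in 5-bit groups, low group first, with bit 5 (value 32) as a continuation flag. In WebAssembly source maps as commonly generated (for example by Emscripten), the generated "column" is a byte offset into the binary. A minimal sketch of the encoder; vlq_encode is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

static const char B64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Write the Base64 VLQ encoding of 'value' to buf and NUL-terminate it.
   Returns the number of characters written (at most 7 for 32-bit input,
   so buf needs room for 8 bytes). */
size_t vlq_encode(int32_t value, char* buf) {
    /* Sign into bit 0: 1 -> 2, -1 -> 3, 16 -> 32 */
    uint64_t v = value < 0 ? ((uint64_t)(-(int64_t)value) << 1) | 1
                           : (uint64_t)value << 1;
    size_t n = 0;
    do {
        unsigned digit = (unsigned)(v & 31);   /* low 5 bits */
        v >>= 5;
        if (v) digit |= 32;                    /* more groups follow */
        buf[n++] = B64[digit];
    } while (v);
    buf[n] = '\0';
    return n;
}
```

For example, 0 encodes as "A", 1 as "C", -1 as "D", and 16 as "gB". Each segment of the mappings field stores deltas from the previous segment, which keeps most numbers small and their encodings short.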
CI/CD Integration Example
# .github/workflows/build.yml
name: Build and Test WASM
on: [push, pull_request]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Assumes mywasm has already been built and placed on PATH
      - name: Compile
        run: |
          mkdir -p dist
          mywasm compile src/main.mini -o dist/main.wasm -O2 -g
          mywasm compile src/main.mini -o dist/main.debug.wasm -g
      - name: Validate
        run: mywasm validate dist/main.wasm --strict
      - name: Test
        run: |
          mywasm run dist/main.wasm --invoke test_suite
          mywasm run dist/main.debug.wasm --invoke test_suite
      - name: Benchmark
        run: mywasm profile dist/main.wasm --invoke benchmark > profile.txt
      - name: Upload artifacts
        uses: actions/upload-artifact@v4
        with:
          name: wasm-binaries
          path: dist/
Professional Toolchains
Your toolchain mirrors production tools:
| Your Tool | Production Equivalent | Key Learning |
|---|---|---|
| mywasm compile | clang, rustc, emcc | End-to-end compilation pipeline |
| mywasm validate | wasm-validate (wabt) | Type system verification |
| mywasm disasm | wasm2wat (wabt) | Binary format understanding |
| mywasm run | wasmtime, wasmer, wasm3 | Virtual machine execution |
| mywasm debug | lldb, gdb, Chrome DevTools | Debug protocol and state inspection |
| mywasm optimize | wasm-opt (binaryen) | Compiler optimization techniques |
| mywasm link | wasm-ld (LLVM) | Module composition and symbol resolution |
Contributing to the Ecosystem
Your deep understanding enables contributions to:
| Project | Contribution Opportunities |
|---|---|
| wasmtime | Runtime optimizations, new instruction support, debugging improvements |
| wasm3 | Interpreter performance, embedded platform support |
| binaryen | New optimization passes, IR transformations |
| wasi-sdk | Toolchain improvements, better error messages |
| spec | Test suite contributions, proposal implementations |
Teaching Others
Your toolchain becomes an educational platform:
# Use as teaching tool
$ mywasm explain "local.get 0"
local.get 0
Opcode: 0x20
Operand: 0 (LEB128)
Effect: Push value of local variable 0 onto the stack
Stack: [...] -> [..., local_0_value]
$ mywasm trace math.wasm --invoke gcd 48 18
...
[0x0d] local.get $a    stack: [] -> [48]
[0x0f] local.get $b    stack: [48] -> [48, 18]
[0x11] i32.rem_s       stack: [48, 18] -> [12]
...
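Output like mywasm explain can come straight from a per-opcode table, the same kind of table a decoder and disassembler could share. A minimal sketch with four entries; OpInfo, OPS, find_op, and explain are hypothetical names, and a real table would cover every opcode.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    uint8_t opcode;
    const char* name;
    const char* immediate;   /* NULL if the instruction has no immediate */
    const char* effect;
    const char* stack;       /* operand stack before -> after */
} OpInfo;

static const OpInfo OPS[] = {
    {0x20, "local.get", "local index (LEB128)",
     "Push the value of a local variable onto the stack", "[...] -> [..., value]"},
    {0x21, "local.set", "local index (LEB128)",
     "Pop a value and store it in a local variable", "[..., value] -> [...]"},
    {0x41, "i32.const", "value (signed LEB128)",
     "Push a 32-bit integer constant", "[...] -> [..., i32]"},
    {0x6a, "i32.add", NULL,
     "Pop two i32 values and push their wrapping sum", "[..., i32, i32] -> [..., i32]"},
};

const OpInfo* find_op(const char* name) {
    for (size_t i = 0; i < sizeof OPS / sizeof OPS[0]; i++)
        if (strcmp(OPS[i].name, name) == 0) return &OPS[i];
    return NULL;
}

void explain(const OpInfo* op) {
    printf("%s\n  Opcode: 0x%02x\n", op->name, (unsigned)op->opcode);
    if (op->immediate) printf("  Immediate: %s\n", op->immediate);
    printf("  Effect: %s\n  Stack: %s\n", op->effect, op->stack);
}
```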
Your knowledge enables you to:
- Write compiler courses using your toolchain as the example project
- Mentor junior engineers on systems programming
- Create YouTube/blog tutorials on "how WebAssembly really works"
- Contribute to WebAssembly education initiatives
Self-Assessment Checklist
Integration
- All tools work together seamlessly
- Shared module representation is consistent
- Error messages are clear and actionable
- CLI is intuitive and well-documented
Validator
- Catches all structural errors
- Type checks all instructions
- Handles unreachable code correctly
- Reports precise error locations
Disassembler
- Produces valid WAT for all inputs
- Uses names when available
- Handles all instruction types
- Output is properly indented
Debugger
- Breakpoints work at function and instruction level
- Step, next, continue all work correctly
- State inspection shows accurate data
- UI is responsive and clear
Testing
- Unit tests cover all components
- Integration tests verify full pipeline
- Edge cases are handled gracefully
- Performance is acceptable
Resources
Toolchain Design
- LLVM Architecture - How a real toolchain works
- GDB Internals - Debugger architecture
Testing
- WebAssembly Spec Tests - Official conformance tests
- AFL Fuzzing - Fuzzing framework
Reference Tools
Key Insights
Integration is harder than implementation. Each component might work alone, but making them work together smoothly requires careful design of shared data structures and consistent error handling.
User experience matters. Clear error messages, intuitive CLI, and helpful documentation transform a technical project into a usable tool.
Testing is your safety net. With a complex toolchain, the only way to make changes confidently is comprehensive automated testing.
You've built something real. This isn't a toy: it's a functional toolchain that could genuinely be used to compile, debug, and run WebAssembly programs.
Conclusion
Completing this capstone project demonstrates mastery of:
- WebAssembly internals - Binary format, execution semantics, type system
- Compiler construction - Lexing, parsing, type checking, code generation
- Virtual machine design - Stack machines, memory management, control flow
- Systems programming - WASI, sandboxing, capability security
- Software engineering - Testing, documentation, CLI design
You now understand WebAssembly from the binary format up. You could:
- Contribute to production runtimes
- Design new languages targeting WASM
- Build the next generation of edge computing platforms
- Teach others how WebAssembly really works
Congratulations on completing the WebAssembly Deep Learning journey.
This is the culmination of Projects 1-5. Return to individual projects to deepen specific areas, or extend this toolchain with your own innovations.