Project 2: Build a WASM Binary Parser

Decode the .wasm binary format byte-by-byte and understand what makes a WebAssembly module

Project Overview

Attribute	Value
Difficulty	Intermediate
Time Estimate	1-2 weeks
Languages	C (primary), Rust, Python, TypeScript
Prerequisites	Project 1 (WAT), binary/hex familiarity, file I/O
Main Reference	WebAssembly Specification §5 (Binary Format)
Knowledge Area	Binary Parsing, WebAssembly Internals

Learning Objectives

After completing this project, you will be able to:

Parse LEB128 encoding - Decode variable-length integers used throughout WASM
Identify all section types - Know the 12 standard sections and their purposes
Decode function signatures - Understand the type section’s structure
Read import/export tables - Parse how modules connect to their environment
Disassemble code sections - Turn bytecode back into readable instructions
Validate module structure - Know the ordering and format requirements

Conceptual Foundation

1. Why Binary Format Matters

When you write WAT and compile with wat2wasm, the output is a .wasm file—a binary encoding of your module. Understanding this format lets you:

Debug mysterious WASM failures by inspecting the actual bytes
Build tools (validators, optimizers, debuggers)
Write compilers that output WASM directly
Understand runtime performance (binary is what actually executes)

The binary format IS WebAssembly. WAT is just a convenience for humans.

2. The Magic Number and Version

Every WASM file begins with 8 bytes:

Offset  Bytes        Meaning
──────────────────────────────
0x00    00 61 73 6D  Magic number: "\0asm"
0x04    01 00 00 00  Version: 1 (little-endian u32)

Why “\0asm”?

Starts with null byte to distinguish from text files
“asm” is a nod to the original asm.js project
Makes files easily identifiable: file mymodule.wasm shows “WebAssembly”

Version 1: Every deployed core WASM module uses version 1. Newer revisions of the specification (such as WebAssembly 2.0) add features without changing this field.

3. LEB128: Variable-Length Integers

WASM uses LEB128 (Little Endian Base 128) to encode integers compactly:

Small numbers use few bytes: 0-127 fit in 1 byte
Large numbers use more bytes: up to 5 bytes for u32, 10 for u64
Each byte’s high bit indicates “more bytes follow”

Unsigned LEB128 Algorithm

Reading uLEB128:
  result = 0
  shift = 0
  loop:
    byte = read_byte()
    result |= (byte & 0x7F) << shift  // Low 7 bits contribute to value
    if (byte & 0x80) == 0:            // High bit clear = done
      break
    shift += 7
  return result

Example: Decoding 0xE5 0x8E 0x26

Byte 1: 0xE5 = 1110 0101
  - High bit set (1): more bytes coming
  - Value bits: 110 0101 = 0x65

Byte 2: 0x8E = 1000 1110
  - High bit set (1): more bytes coming
  - Value bits: 000 1110 = 0x0E

Byte 3: 0x26 = 0010 0110
  - High bit clear (0): done
  - Value bits: 010 0110 = 0x26

Assembly:
  result = 0x65 | (0x0E << 7) | (0x26 << 14)
         = 0x65 | 0x700 | 0x98000
         = 624485
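The algorithm above translates almost line for line into Python. The sketch below is a minimal version, assuming the whole file is already in memory as a bytes object; the name decode_uleb128 and the (value, next_offset) return convention are illustrative choices, not part of any WASM API.

```python
def decode_uleb128(data: bytes, offset: int = 0):
    """Decode one unsigned LEB128 integer starting at `offset`.

    Returns (value, next_offset). Hypothetical helper for illustration.
    """
    result = 0
    shift = 0
    while True:
        if offset >= len(data):
            raise ValueError("unexpected end of input inside LEB128 value")
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        if (byte & 0x80) == 0:            # high bit clear: this was the last byte
            return result, offset
        shift += 7


value, next_offset = decode_uleb128(bytes([0xE5, 0x8E, 0x26]))
print(value, next_offset)  # 624485 3
```

Returning the next offset alongside the value keeps the function stateless, which makes it easy to unit test before you wrap it in a reader object.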

Signed LEB128 (sLEB128)

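Similar, but bit 6 (the second-highest bit) of the final byte is the sign bit. If it is set, fill every bit above the decoded payload with 1s. Note that shift must already count the final byte's 7 bits at this point:

After the final byte (shift advanced past its 7 bits), if bit 6 of that byte is set:
  result |= (~0 << shift)  // Fill high bits with 1s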
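As a rough Python sketch of the signed variant: the only differences from the unsigned loop are that shift is advanced before the continuation test and that the sign bit of the final byte is checked afterwards. The name decode_sleb128 is hypothetical. Python integers are unbounded, so the sketch subtracts 1 << shift instead of OR-ing in a mask, which has the same effect.

```python
def decode_sleb128(data: bytes, offset: int = 0):
    """Decode one signed LEB128 integer; returns (value, next_offset)."""
    result = 0
    shift = 0
    while True:
        if offset >= len(data):
            raise ValueError("unexpected end of input inside LEB128 value")
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        shift += 7                    # advance first so the final byte is counted
        if (byte & 0x80) == 0:
            break
    if byte & 0x40:                   # bit 6 of the final byte is the sign bit
        result -= 1 << shift          # same effect as filling the high bits with 1s
    return result, offset


print(decode_sleb128(bytes([0x7F])))              # (-1, 1)
print(decode_sleb128(bytes([0x9B, 0xF1, 0x59])))  # (-624485, 3)
```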

4. Section Structure

After the header, a WASM module is a sequence of sections. Each section:

┌─────────────┬──────────────┬────────────────────┐
│ Section ID  │ Section Size │    Section Data    │
│  (1 byte)   │  (uLEB128)   │   (size bytes)     │
└─────────────┴──────────────┴────────────────────┘

Section IDs:

ID	Name	Purpose
0	Custom	Name sections, debug info, extensions
1	Type	Function signatures (param/result types)
2	Import	Functions/memory/tables/globals from host
3	Function	Maps function index → type index
4	Table	Indirect call tables (for function pointers)
5	Memory	Linear memory declarations
6	Global	Global variable declarations
7	Export	Names exposed to host
8	Start	Optional startup function index
9	Element	Table initialization data
10	Code	Function bodies
11	Data	Memory initialization data
12	Data Count	Count of data segments (for validation)

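Critical rule: Non-custom sections must appear in the order listed above, with one exception: Data Count (ID 12), when present, comes after Element (9) and before Code (10). Custom sections can appear anywhere. Each non-custom section may appear at most once.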
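One way to check this rule is to give each section ID a rank and require ranks to increase strictly. The sketch below assumes you have already collected the section IDs in file order; check_section_order and SECTION_ORDER are illustrative names. Note that Data Count (12) is ranked between Element (9) and Code (10), which is where the specification places it.

```python
# Required relative order of non-custom section IDs in the binary.
SECTION_ORDER = [1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 10, 11]
RANK = {section_id: rank for rank, section_id in enumerate(SECTION_ORDER)}


def check_section_order(section_ids):
    """Return None if the order is valid, otherwise an error message."""
    last_rank = -1
    for section_id in section_ids:
        if section_id == 0:           # custom sections may appear anywhere
            continue
        if section_id not in RANK:
            return f"unknown section id {section_id}"
        if RANK[section_id] <= last_rank:
            return f"section {section_id} is duplicated or out of order"
        last_rank = RANK[section_id]
    return None


print(check_section_order([1, 3, 0, 5, 7, 10]))  # None
print(check_section_order([3, 1]))               # section 1 is duplicated or out of order
```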

5. The Type Section (ID = 1)

Contains function signatures (types). Structure:

┌────────────┬─────────────────────────────────────────────┐
│ num_types  │ type[0], type[1], ..., type[num_types-1]   │
│ (uLEB128)  │                                             │
└────────────┴─────────────────────────────────────────────┘

Each type:
┌────────┬────────────┬────────────────┬─────────────┬────────────────┐
│ 0x60   │ num_params │ param_types... │ num_results │ result_types...│
│ (func) │ (uLEB128)  │ (n bytes)      │ (uLEB128)   │ (m bytes)      │
└────────┴────────────┴────────────────┴─────────────┴────────────────┘

Type encoding:

Byte	Type
0x7F	i32
0x7E	i64
0x7D	f32
0x7C	f64
0x70	funcref
0x6F	externref

Example: (func (param i32 i32) (result i32)) encodes as:

60        ; function type marker
02        ; 2 parameters
7F 7F     ; both are i32
01        ; 1 result
7F        ; i32
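To make the layout concrete, here is a small Python sketch that decodes exactly those six bytes back into a readable signature. The helper names (read_uleb128, read_value_types, parse_func_type, format_func_type) are assumptions made for this example, and the input is assumed to be a bytes object positioned at the 0x60 marker.

```python
VALUE_TYPES = {0x7F: "i32", 0x7E: "i64", 0x7D: "f32", 0x7C: "f64",
               0x70: "funcref", 0x6F: "externref"}


def read_uleb128(data, pos):
    result = shift = 0
    while True:
        byte = data[pos]              # raises IndexError if the input is truncated
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def read_value_types(data, pos):
    """Read a count followed by that many one-byte value types."""
    count, pos = read_uleb128(data, pos)
    raw = data[pos:pos + count]
    if len(raw) != count:
        raise ValueError("truncated type vector")
    try:
        names = [VALUE_TYPES[b] for b in raw]
    except KeyError as err:
        raise ValueError(f"unknown value type 0x{err.args[0]:02X}") from None
    return names, pos + count


def parse_func_type(data, pos=0):
    if data[pos] != 0x60:
        raise ValueError(f"expected 0x60 func marker, got 0x{data[pos]:02X}")
    params, pos = read_value_types(data, pos + 1)
    results, pos = read_value_types(data, pos)
    return params, results, pos


def format_func_type(params, results):
    return f"({', '.join(params)}) -> ({', '.join(results)})"


params, results, _ = parse_func_type(bytes.fromhex("60027F7F017F"))
print(format_func_type(params, results))  # (i32, i32) -> (i32)
```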

6. The Function Section (ID = 3)

Maps each function in the code section to a type index:

┌────────────┬─────────────────────────────────────────────┐
│ num_funcs  │ type_idx[0], type_idx[1], ...              │
│ (uLEB128)  │ (each is uLEB128)                          │
└────────────┴─────────────────────────────────────────────┘

Why separate from Code?

Allows streaming compilation: know all types before seeing bodies
Enables forward references and mutual recursion

Function indexing: Imported functions come first. If you have 3 imports, your first local function is index 3.

7. The Import Section (ID = 2)

┌─────────────┬─────────────────────────────────────────────┐
│ num_imports │ import[0], import[1], ...                   │
└─────────────┴─────────────────────────────────────────────┘

Each import:
┌────────────────┬──────────────────┬─────────┬──────────────┐
│ module_name    │ import_name      │ kind    │ description  │
│ (string)       │ (string)         │ (byte)  │ (varies)     │
└────────────────┴──────────────────┴─────────┴──────────────┘

Strings: length (uLEB128) followed by UTF-8 bytes

Kind:
  0x00 = function (followed by type index)
  0x01 = table
  0x02 = memory
  0x03 = global
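A possible Python sketch of decoding a single import entry follows. It assumes the simple limits encoding (flag 0x00 for min only, 0x01 for min and max) and leaves type bytes undecoded; parse_import and its tuple-based result are illustrative, not a fixed design.

```python
def read_uleb128(data, pos):
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def read_name(data, pos):
    """Length-prefixed UTF-8 string."""
    length, pos = read_uleb128(data, pos)
    raw = data[pos:pos + length]
    if len(raw) != length:
        raise ValueError("truncated name")
    return raw.decode("utf-8"), pos + length


def read_limits(data, pos):
    flags = data[pos]
    minimum, pos = read_uleb128(data, pos + 1)
    maximum = None
    if flags & 0x01:                       # bit 0 set: a maximum follows
        maximum, pos = read_uleb128(data, pos)
    return (minimum, maximum), pos


def parse_import(data, pos=0):
    module, pos = read_name(data, pos)
    field, pos = read_name(data, pos)
    kind = data[pos]
    pos += 1
    if kind == 0x00:                       # function: type index
        type_index, pos = read_uleb128(data, pos)
        desc = ("func", type_index)
    elif kind == 0x01:                     # table: reference type byte + limits
        ref_type = data[pos]
        limits, pos = read_limits(data, pos + 1)
        desc = ("table", ref_type, limits)
    elif kind == 0x02:                     # memory: limits
        limits, pos = read_limits(data, pos)
        desc = ("memory", limits)
    elif kind == 0x03:                     # global: value type byte + mutability byte
        desc = ("global", data[pos], bool(data[pos + 1]))
        pos += 2
    else:
        raise ValueError(f"unknown import kind 0x{kind:02X}")
    return (module, field, desc), pos


entry, _ = parse_import(bytes.fromhex("03656E76036C6F670000"))
print(entry)  # ('env', 'log', ('func', 0))
```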

8. The Export Section (ID = 7)

Each export:
┌───────────────┬─────────┬───────────────┐
│ export_name   │ kind    │ index         │
│ (string)      │ (byte)  │ (uLEB128)     │
└───────────────┴─────────┴───────────────┘

Kind: same as imports (0x00=func, 0x01=table, 0x02=memory, 0x03=global)

9. The Code Section (ID = 10)

The heart of WASM—function bodies:

┌────────────┬───────────────────────────────────────────────┐
│ num_funcs  │ func_body[0], func_body[1], ...              │
└────────────┴───────────────────────────────────────────────┘

Each func_body:
┌───────────────┬──────────────┬────────────────────────────┐
│ body_size     │ locals       │ instructions               │
│ (uLEB128)     │ (see below)  │ (sequence of opcodes)      │
└───────────────┴──────────────┴────────────────────────────┘

Locals:
┌────────────┬────────────────────────────────────────────────┐
│ num_groups │ (count, type) pairs                           │
└────────────┴────────────────────────────────────────────────┘

For example: 2 locals of type i32, 1 local of type i64:
  02          ; 2 groups
  02 7F       ; 2 × i32
  01 7E       ; 1 × i64
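The same five bytes can be decoded with a short Python sketch like the one below. parse_locals is a hypothetical helper; it keeps the (count, type) groups as they are rather than expanding them into one entry per local, because a hostile file can declare billions of locals in a few bytes.

```python
VALUE_TYPES = {0x7F: "i32", 0x7E: "i64", 0x7D: "f32", 0x7C: "f64"}


def read_uleb128(data, pos):
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_locals(data, pos=0):
    """Return ([(count, type_name), ...], next_pos) for a local declaration block."""
    group_count, pos = read_uleb128(data, pos)
    groups = []
    for _ in range(group_count):
        count, pos = read_uleb128(data, pos)
        type_byte = data[pos]
        pos += 1
        # Unknown type bytes are shown in hex instead of failing the whole parse.
        groups.append((count, VALUE_TYPES.get(type_byte, f"0x{type_byte:02X}")))
    return groups, pos


print(parse_locals(bytes.fromhex("02027F017E")))  # ([(2, 'i32'), (1, 'i64')], 5)
```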

10. Instruction Encoding (Opcodes)

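Most instructions are a single opcode byte, sometimes followed by immediates. (Newer instruction groups use a prefix byte such as 0xFC or 0xFD followed by a uLEB128 sub-opcode; they are not covered in this table.)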

Opcode	Instruction	Immediates
0x00	unreachable	none
0x01	nop	none
0x02	block	block type
0x03	loop	block type
0x04	if	block type
0x05	else	none
0x0B	end	none
0x0C	br	label index
0x0D	br_if	label index
0x10	call	function index
0x20	local.get	local index
0x21	local.set	local index
0x22	local.tee	local index
0x28	i32.load	memarg
0x36	i32.store	memarg
0x41	i32.const	sLEB128 value
0x42	i64.const	sLEB128 value
0x6A	i32.add	none
0x6B	i32.sub	none
0x6C	i32.mul	none

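memarg: Memory operations take (align, offset), both as uLEB128. align is stored as the base-2 logarithm of the alignment in bytes.

block type: Either 0x40 (void), a single value type byte (0x7F for i32, etc.), or, with the multi-value extension, a type index encoded as a signed LEB128.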
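A table-driven decoder is one common way to handle this. The Python sketch below covers only the opcodes listed above and assumes block types are a single byte (0x40 or a value type); the names OPCODES, decode_instruction, and disassemble are made up for the example, and the printed align value is the raw exponent from the binary.

```python
VALUE_TYPES = {0x7F: "i32", 0x7E: "i64", 0x7D: "f32", 0x7C: "f64"}

# opcode -> (mnemonic, kind of immediate)
OPCODES = {
    0x00: ("unreachable", None), 0x01: ("nop", None),
    0x02: ("block", "blocktype"), 0x03: ("loop", "blocktype"),
    0x04: ("if", "blocktype"), 0x05: ("else", None), 0x0B: ("end", None),
    0x0C: ("br", "index"), 0x0D: ("br_if", "index"), 0x10: ("call", "index"),
    0x20: ("local.get", "index"), 0x21: ("local.set", "index"),
    0x22: ("local.tee", "index"),
    0x28: ("i32.load", "memarg"), 0x36: ("i32.store", "memarg"),
    0x41: ("i32.const", "const"), 0x42: ("i64.const", "const"),
    0x6A: ("i32.add", None), 0x6B: ("i32.sub", None), 0x6C: ("i32.mul", None),
}


def read_leb(data, pos, signed=False):
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break
    if signed and byte & 0x40:
        result -= 1 << shift
    return result, pos


def decode_instruction(data, pos):
    """Return (text, next_pos) for the instruction starting at pos."""
    opcode = data[pos]
    pos += 1
    if opcode not in OPCODES:
        raise ValueError(f"unsupported opcode 0x{opcode:02X} at offset {pos - 1}")
    name, imm = OPCODES[opcode]
    if imm == "index":
        value, pos = read_leb(data, pos)
        return f"{name} {value}", pos
    if imm == "const":                       # constants are signed LEB128
        value, pos = read_leb(data, pos, signed=True)
        return f"{name} {value}", pos
    if imm == "memarg":
        align, pos = read_leb(data, pos)
        offset, pos = read_leb(data, pos)
        return f"{name} align={align} offset={offset}", pos
    if imm == "blocktype":
        block_type = data[pos]
        pos += 1
        if block_type == 0x40:
            return name, pos
        return f"{name} (result {VALUE_TYPES[block_type]})", pos
    return name, pos


def disassemble(code):
    pos, lines = 0, []
    while pos < len(code):
        text, pos = decode_instruction(code, pos)
        lines.append(text)
    return lines


print(disassemble(bytes.fromhex("200020016A0B")))
# ['local.get 0', 'local.get 1', 'i32.add', 'end']
```

Because disassemble runs until the byte range is exhausted rather than stopping at the first 0x0B, nested blocks and immediates that happen to contain 0x0B are handled without special cases.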

11. Module Anatomy Diagram

┌──────────────────────────────────────────────────────────────┐
│                        WASM Module                           │
├──────────────────────────────────────────────────────────────┤
│  Magic Number (4 bytes): 00 61 73 6D                        │
│  Version (4 bytes): 01 00 00 00                             │
├──────────────────────────────────────────────────────────────┤
│  Section 1 (Type):                                          │
│    ├── Number of types                                      │
│    └── Type definitions (function signatures)               │
├──────────────────────────────────────────────────────────────┤
│  Section 2 (Import):                                        │
│    ├── Number of imports                                    │
│    └── Import entries (module, name, kind, description)     │
├──────────────────────────────────────────────────────────────┤
│  Section 3 (Function):                                      │
│    ├── Number of functions                                  │
│    └── Type indices (which signature each function uses)    │
├──────────────────────────────────────────────────────────────┤
│  Section 5 (Memory):                                        │
│    └── Memory limits (initial, optional max)                │
├──────────────────────────────────────────────────────────────┤
│  Section 7 (Export):                                        │
│    ├── Number of exports                                    │
│    └── Export entries (name, kind, index)                   │
├──────────────────────────────────────────────────────────────┤
│  Section 10 (Code):                                         │
│    ├── Number of function bodies                            │
│    └── Function bodies                                      │
│        ├── Body size                                        │
│        ├── Local declarations                               │
│        └── Instructions (ending with 0x0B = end)            │
└──────────────────────────────────────────────────────────────┘

Project Specification

Deliverables

Build a command-line tool that parses .wasm files and prints their structure:

$ ./wasmparser example.wasm
WASM Module
  Version: 1

Section: Type (1)
  [0] (i32, i32) -> (i32)
  [1] (i32) -> ()

Section: Import (2)
  [0] "env"."print" : func type=1

Section: Function (3)
  [0] type=0
  [1] type=0

Section: Memory (5)
  [0] min=1, max=none

Section: Export (7)
  [0] "add" : func 1
  [1] "memory" : memory 0

Section: Code (10)
  [0] locals: 0
      20 00   local.get 0
      20 01   local.get 1
      6A      i32.add
      0B      end
  [1] locals: 1 × i32
      41 00   i32.const 0
      21 02   local.set 2
      ...

Functional Requirements

Parse header: Validate magic number and version
Parse all 12 section types (or at least Type, Import, Function, Memory, Export, Code)
Decode LEB128: Both signed and unsigned variants
Parse type section: Show all function signatures
Parse import section: Show module, name, kind, and type
Parse export section: Show name, kind, and index
Parse code section: Disassemble to readable instructions
Handle errors gracefully: Invalid files shouldn’t crash

Minimum Viable Product

Start with:

Header parsing
Section enumeration (just ID and size)
Type section parsing
Code section basic disassembly

Then expand to remaining sections.

Success Criteria

Parses official WASM test suite files without crashing
Correctly decodes LEB128 values (test with edge cases)
Output matches wasm-objdump -x for section structure
Disassembly matches wasm2wat output for simple modules
Handles empty sections and missing optional sections

Solution Architecture

High-Level Design

┌─────────────────────────────────────────────────────────────┐
│                      wasmparser                             │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐  │
│  │    Reader    │───▶│    Parser    │───▶│   Printer    │  │
│  │ (byte stream)│    │(decode logic)│    │(format output)│ │
│  └──────────────┘    └──────────────┘    └──────────────┘  │
│                                                             │
│  Reader provides:         Parser builds:    Printer shows:  │
│  - read_byte()           - Module struct   - Human text     │
│  - read_bytes(n)         - Section list    - Or JSON        │
│  - read_uleb128()        - Types list      - Or other       │
│  - read_sleb128()        - Functions       │                │
│  - position              - etc.            │                │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Data Structures

Module {
  version: u32
  sections: Vec<Section>
}

Section {
  id: u8
  size: u32
  data: SectionData  // enum or variant type
}

SectionData =
  | TypeSection { types: Vec<FuncType> }
  | ImportSection { imports: Vec<Import> }
  | FunctionSection { type_indices: Vec<u32> }
  | MemorySection { memories: Vec<MemoryType> }
  | ExportSection { exports: Vec<Export> }
  | CodeSection { bodies: Vec<FuncBody> }
  | CustomSection { name: String, bytes: Vec<u8> }
  | ...

FuncType {
  params: Vec<ValueType>
  results: Vec<ValueType>
}

Import {
  module_name: String
  import_name: String
  kind: ImportKind
}

FuncBody {
  locals: Vec<(u32, ValueType)>  // (count, type) pairs
  instructions: Vec<Instruction>
}

Instruction {
  opcode: u8
  immediates: Immediates  // varies by instruction
}

Reader Interface

trait WasmReader {
  fn read_byte(&mut self) -> Result<u8>
  fn read_bytes(&mut self, n: usize) -> Result<Vec<u8>>
  fn read_u32_le(&mut self) -> Result<u32>  // for header
  fn read_uleb128(&mut self) -> Result<u64>
  fn read_sleb128(&mut self) -> Result<i64>
  fn read_string(&mut self) -> Result<String>  // length-prefixed
  fn position(&self) -> usize
  fn remaining(&self) -> usize
}
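If you are working in Python rather than Rust, the same interface might look like the class below. This is only a sketch under the assumption that the whole file fits in memory; the class name Reader and the choice of EOFError for truncated input are illustrative.

```python
class Reader:
    """Cursor over an in-memory byte buffer with bounds-checked reads."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def position(self) -> int:
        return self.pos

    def remaining(self) -> int:
        return len(self.data) - self.pos

    def read_bytes(self, n: int) -> bytes:
        if n > self.remaining():
            raise EOFError(f"need {n} bytes at offset 0x{self.pos:X}, "
                           f"only {self.remaining()} left")
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk

    def read_byte(self) -> int:
        return self.read_bytes(1)[0]

    def read_u32_le(self) -> int:            # fixed-width, used for the header
        return int.from_bytes(self.read_bytes(4), "little")

    def read_uleb128(self) -> int:
        result = shift = 0
        while True:
            byte = self.read_byte()
            result |= (byte & 0x7F) << shift
            if not byte & 0x80:
                return result
            shift += 7

    def read_sleb128(self) -> int:
        result = shift = 0
        while True:
            byte = self.read_byte()
            result |= (byte & 0x7F) << shift
            shift += 7
            if not byte & 0x80:
                break
        if byte & 0x40:                      # sign bit of the final byte
            result -= 1 << shift
        return result

    def read_string(self) -> str:            # uLEB128 length, then UTF-8 bytes
        return self.read_bytes(self.read_uleb128()).decode("utf-8")


r = Reader(bytes.fromhex("0061736D01000000") + b"\x03add")
print(r.read_bytes(4), r.read_u32_le(), r.read_string(), r.remaining())
# b'\x00asm' 1 add 0
```

Routing every read through read_bytes means there is exactly one place where running off the end of the buffer is detected.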

Parsing Flow

parse_module():
  1. Read and validate magic number
  2. Read and store version
  3. While bytes remaining:
     a. Read section id (1 byte)
     b. Read section size (uLEB128)
     c. Read section_size bytes
     d. Dispatch to section parser based on id
     e. Verify exactly section_size bytes were consumed

parse_type_section(bytes):
  1. Read number of types
  2. For each type:
     a. Read 0x60 (func type marker)
     b. Read param count, then that many type bytes
     c. Read result count, then that many type bytes

parse_code_section(bytes):
  1. Read number of function bodies
  2. For each body:
     a. Read body size
     b. Read locals (count, then groups of (count, type))
     c. Read instructions until body_size bytes are consumed; the final opcode must be 0x0B (end)

parse_instruction():
  1. Read opcode byte
  2. Based on opcode, read immediates:
     - 0x41 (i32.const): read sLEB128
     - 0x20 (local.get): read uLEB128 (index)
     - 0x02 (block): read block type
     - etc.

Implementation Guide

Phase 1: File Reading and Header (Day 1)

Goal: Read a .wasm file and validate the header

Steps:

Read entire file into a byte buffer
Check first 4 bytes are 00 61 73 6D
Read next 4 bytes as little-endian u32 (should be 1)
Print “Valid WASM module, version X”

Hint: Little-endian u32 from bytes b0 b1 b2 b3:

value = b0 | (b1 << 8) | (b2 << 16) | (b3 << 24)
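The steps above can be sketched in a few lines of Python. parse_header is a hypothetical function name; it takes the file contents as bytes and returns the version, raising ValueError on a bad header. int.from_bytes with "little" performs the same shift-and-or computation as the hint.

```python
WASM_MAGIC = b"\x00asm"


def parse_header(data: bytes) -> int:
    """Validate the 8-byte header and return the version number."""
    if len(data) < 8:
        raise ValueError("file too short for a WASM header")
    if data[:4] != WASM_MAGIC:
        raise ValueError(f"bad magic number: {data[:4].hex(' ')}")
    return int.from_bytes(data[4:8], "little")


version = parse_header(bytes.fromhex("0061736D01000000"))
print(f"Valid WASM module, version {version}")  # Valid WASM module, version 1
```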

Testing: Create a minimal WAT file, compile it, parse it:

(module)

wat2wasm minimal.wat -o minimal.wasm
./wasmparser minimal.wasm

Phase 2: LEB128 Decoding (Day 1-2)

Goal: Implement robust LEB128 decoding

Unsigned LEB128 algorithm:

result = 0
shift = 0
while true:
  byte = read_byte()
  result |= (byte & 0x7F) << shift
  if (byte & 0x80) == 0:
    break
  shift += 7
return result

Signed LEB128 addition:

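// Use the same loop, but add 7 to shift BEFORE testing the continuation bit,
// so that shift also counts the final byte's 7 bits.
// After the loop, if sign extension is needed:
if shift < 64 and (byte & 0x40) != 0:
  result |= -(1 << shift)  // Sign extend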

Test cases:

Bytes	Unsigned	Signed
0x00	0	0
0x7F	127	-1
0x80 0x01	128	128
0xFF 0x7F	16383	-1
0xE5 0x8E 0x26	624485	624485
0x9B 0xF1 0x59	1472667	-624485

Phase 3: Section Enumeration (Day 2)

Goal: List all sections without parsing contents

Algorithm:

while position < file_length:
  section_id = read_byte()
  section_size = read_uleb128()
  print("Section", section_id, "size", section_size)
  skip(section_size)  // advance position
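In Python, the enumeration loop could look roughly like this. list_sections and SECTION_NAMES are names invented for the sketch; it assumes the header has already been validated and only checks that each declared size fits inside the file.

```python
SECTION_NAMES = {0: "Custom", 1: "Type", 2: "Import", 3: "Function", 4: "Table",
                 5: "Memory", 6: "Global", 7: "Export", 8: "Start", 9: "Element",
                 10: "Code", 11: "Data", 12: "Data Count"}


def read_uleb128(data, pos):
    result = shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated LEB128 value")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def list_sections(data: bytes):
    """Return [(section_id, offset_of_id_byte, payload_size), ...]."""
    pos = 8                                   # skip magic number and version
    sections = []
    while pos < len(data):
        start = pos
        section_id = data[pos]
        size, pos = read_uleb128(data, pos + 1)
        if pos + size > len(data):
            raise ValueError(f"section {section_id} at 0x{start:X} "
                             f"claims {size} bytes past end of file")
        sections.append((section_id, start, size))
        pos += size                           # skip the payload
    return sections


# Hand-written module: header plus a Type section holding one () -> () type.
module = bytes.fromhex("0061736D01000000" "0104016000" "00")
for section_id, offset, size in list_sections(module):
    print(f"Section {SECTION_NAMES.get(section_id, '?')} ({section_id}) "
          f"at 0x{offset:02X}, size {size}")
# Section Type (1) at 0x08, size 4
```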

Testing: Parse a compiled WAT with multiple elements:

(module
  (memory 1)
  (func (export "test") (param i32) (result i32)
    local.get 0
  )
)

Expected sections: Type, Function, Memory, Export, Code

Phase 4: Type Section Parser (Day 3)

Goal: Parse and display function signatures

Hint for structure:

Type section:
  count (uLEB128)
  for each:
    0x60 (function type marker)
    param_count (uLEB128)
    param_types[param_count] (one byte each)
    result_count (uLEB128)
    result_types[result_count] (one byte each)

Type byte decoding:

0x7F → "i32"
0x7E → "i64"
0x7D → "f32"
0x7C → "f64"

Output format:

Type Section (1):
  [0] (i32, i32) -> (i32)
  [1] () -> ()

Phase 5: Import and Export Sections (Day 4)

Goal: Parse imports and exports

String reading: Length-prefixed UTF-8

length = read_uleb128()
bytes = read_bytes(length)
string = utf8_decode(bytes)

Import kind dispatch:

kind = read_byte()
match kind:
  0x00 (function): type_index = read_uleb128()
  0x01 (table): parse table type
  0x02 (memory): parse memory limits
  0x03 (global): parse global type

Export format:

name = read_string()
kind = read_byte()
index = read_uleb128()
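Putting the string and export formats together, an export section payload could be decoded along these lines. This is a sketch: parse_export_section is a hypothetical name, and the input is assumed to be the section payload only (the bytes after the section ID and size).

```python
EXPORT_KINDS = {0x00: "func", 0x01: "table", 0x02: "memory", 0x03: "global"}


def read_uleb128(data, pos):
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_export_section(payload: bytes):
    """Return [(name, kind, index), ...] for an export section payload."""
    count, pos = read_uleb128(payload, 0)
    exports = []
    for _ in range(count):
        length, pos = read_uleb128(payload, pos)
        raw = payload[pos:pos + length]
        if len(raw) != length:
            raise ValueError("truncated export name")
        pos += length
        kind = payload[pos]
        if kind not in EXPORT_KINDS:
            raise ValueError(f"unknown export kind 0x{kind:02X}")
        index, pos = read_uleb128(payload, pos + 1)
        exports.append((raw.decode("utf-8"), EXPORT_KINDS[kind], index))
    if pos != len(payload):                   # section size must match exactly
        raise ValueError("trailing bytes in export section")
    return exports


payload = bytes.fromhex("02 03 61 64 64 00 00 06 6D 65 6D 6F 72 79 02 00")
print(parse_export_section(payload))
# [('add', 'func', 0), ('memory', 'memory', 0)]
```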

Phase 6: Code Section and Disassembly (Day 5-7)

Goal: Parse function bodies and print instructions

Function body structure:

body_size = read_uleb128()
// Now read exactly body_size bytes

locals_count = read_uleb128()
for locals_count times:
  count = read_uleb128()  // how many of this type
  type = read_byte()      // what type

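// Rest is instructions, up to the end of the body.
// Don't stop at the first 0x0B: nested block/loop/if each have their own end,
// and a 0x0B byte can also appear inside an immediate.
body_end = (position right after reading body_size) + body_size
while position < body_end:
  parse_instruction()
// The last instruction parsed must be 0x0B (end)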

Instruction parsing (start with these):

Opcode	Name	Immediates
0x00	unreachable	-
0x01	nop	-
0x0B	end	-
0x10	call	func_idx (uLEB128)
0x20	local.get	local_idx (uLEB128)
0x21	local.set	local_idx (uLEB128)
0x41	i32.const	value (sLEB128)
0x6A	i32.add	-
0x6B	i32.sub	-
0x6C	i32.mul	-

Output format:

Code Section (10):
  [0] 7 bytes, locals: 0
      20 00     local.get 0
      20 01     local.get 1
      6A        i32.add
      0B        end

Phase 7: Full Integration (Day 8+)

Goal: Handle all sections, robust error handling

Add remaining sections (Table, Memory limits, Global, Element, Data)
Add control flow instructions (block, loop, if, br, br_if)
Add memory instructions with memarg parsing
Improve error messages with byte offsets
Add validation (check section ordering, size consistency)

Testing Strategy

Unit Tests for LEB128

// Unsigned
assert(parse_uleb128([0x00]) == 0)
assert(parse_uleb128([0x7F]) == 127)
assert(parse_uleb128([0x80, 0x01]) == 128)
assert(parse_uleb128([0xE5, 0x8E, 0x26]) == 624485)

// Signed
assert(parse_sleb128([0x00]) == 0)
assert(parse_sleb128([0x7F]) == -1)
assert(parse_sleb128([0x80, 0x01]) == 128)
assert(parse_sleb128([0x9B, 0xF1, 0x59]) == -624485)

Integration Tests

Create WAT files, compile them, parse them, verify output:

# Create test cases
echo '(module)' > empty.wat
echo '(module (func (export "f")))' > func.wat
echo '(module (func (param i32) (result i32) local.get 0))' > identity.wat

# Compile
wat2wasm empty.wat
wat2wasm func.wat
wat2wasm identity.wat

# Parse and verify
./wasmparser empty.wasm
./wasmparser func.wasm
./wasmparser identity.wasm

Comparison Testing

Compare your output to wasm-objdump:

wasm-objdump -x example.wasm > expected.txt
./wasmparser example.wasm > actual.txt
diff expected.txt actual.txt

Fuzzing (Advanced)

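Use the official WASM test suite. It ships as .wast scripts, so extract the embedded binaries first (for example with wabt's wast2json):

git clone https://github.com/WebAssembly/testsuite
cd testsuite
for t in *.wast; do wast2json "$t" -o "${t%.wast}.json"; done
for f in *.wasm; do
  ../wasmparser "$f" || echo "Failed: $f"
done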

Common Pitfalls and Debugging

Pitfall 1: LEB128 Off-by-One

Symptom: Values slightly wrong, especially around powers of 2
Cause: Shift calculation or termination condition wrong
Fix: Test with values at boundaries: 0, 127, 128, 255, 256, 16383, 16384

Pitfall 2: Forgetting Section Size Constraint

Symptom: Parsing goes off the rails after first section
Cause: Not tracking bytes consumed within section
Fix: Record position before section, verify position == start + size after

Pitfall 3: Signed vs Unsigned Confusion

Symptom: Negative numbers look huge positive
Cause: Using uLEB128 where sLEB128 needed (e.g., i32.const)
Fix: i32.const and i64.const use signed; indices use unsigned

Pitfall 4: String Encoding

Symptom: Import/export names garbled
Cause: Not reading length prefix, or encoding issues
Fix: Strings are length-prefixed (uLEB128) followed by raw UTF-8 bytes

Pitfall 5: Nested Control Structures

Symptom: Disassembly wrong for if/block/loop
Cause: Not tracking nesting depth
Fix: block/loop/if push depth, end pops depth. Track for proper indentation.
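A depth counter is enough for indentation. The Python sketch below assumes instructions have already been decoded to text (for example "br_if 0"); indent_instructions is a hypothetical helper. The function body's own final end would take the depth to -1, so the indent is clamped at zero.

```python
OPENERS = {"block", "loop", "if"}


def indent_instructions(instructions, width=2):
    """Indent decoded instruction strings by control-flow nesting depth."""
    depth = 0
    lines = []
    for text in instructions:
        mnemonic = text.split()[0]
        if mnemonic in ("end", "else"):
            depth -= 1                        # closers line up with their opener
        lines.append(" " * (width * max(depth, 0)) + text)
        if mnemonic in OPENERS or mnemonic == "else":
            depth += 1                        # following instructions are nested
    return lines


body = ["block", "local.get 0", "if", "nop", "else", "br 1", "end", "end", "end"]
print("\n".join(indent_instructions(body)))
# block
#   local.get 0
#   if
#     nop
#   else
#     br 1
#   end
# end
# end
```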

Debugging with xxd

xxd example.wasm | head -20

Shows hex dump. Match against your parser’s position to find issues.

Debugging with wasm-objdump

wasm-objdump -h example.wasm  # Headers and section list
wasm-objdump -x example.wasm  # Full details
wasm-objdump -d example.wasm  # Disassembly

Compare output to find where your parser diverges.

Extensions and Challenges

Extension 1: JSON Output

Add --json flag for machine-readable output. Useful for tooling.

Extension 2: Hex Dump Mode

Show raw bytes alongside disassembly:

000010: 20 00        local.get 0
000012: 20 01        local.get 1
000014: 6A           i32.add

Extension 3: Validation

Check module validity:

Sections in order?
Type indices in range?
Import counts match?
Stack types balance?

Extension 4: Round-Trip

Parse WASM, emit WAT:

./wasmparser --emit-wat example.wasm > roundtrip.wat
wat2wasm roundtrip.wat -o roundtrip.wasm
diff example.wasm roundtrip.wasm  # Should be identical (or semantically equivalent)

Extension 5: Custom Section Parser

Parse the “name” custom section to show function and local names:

Function names:
  [0] "add"
  [1] "multiply"
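As a starting point, the function-name subsection can be read with a sketch like the following. It assumes the layout from the specification's appendix on the name section: the custom section's name string "name", then subsections of the form (id byte, uLEB128 size, contents), where subsection 1 holds a vector of (function index, name) pairs. parse_function_names is an illustrative name, and the input is the custom section payload after the section ID and size.

```python
def read_uleb128(data, pos):
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def read_name(data, pos):
    length, pos = read_uleb128(data, pos)
    raw = data[pos:pos + length]
    if len(raw) != length:
        raise ValueError("truncated name")
    return raw.decode("utf-8"), pos + length


def parse_function_names(custom_payload: bytes):
    """Return {function_index: name} from a "name" custom section payload."""
    section_name, pos = read_name(custom_payload, 0)
    if section_name != "name":
        return {}                             # some other custom section
    names = {}
    while pos < len(custom_payload):
        subsection_id = custom_payload[pos]
        size, pos = read_uleb128(custom_payload, pos + 1)
        end = pos + size
        if subsection_id == 1:                # 1 = function names
            count, pos = read_uleb128(custom_payload, pos)
            for _ in range(count):
                index, pos = read_uleb128(custom_payload, pos)
                names[index], pos = read_name(custom_payload, pos)
        pos = end                             # skip subsections we don't decode
    return names


payload = b"\x04name" + b"\x01\x10\x02" + b"\x00\x03add" + b"\x01\x08multiply"
print(parse_function_names(payload))  # {0: 'add', 1: 'multiply'}
```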

Real-World Connections

How Browsers Parse WASM

Browsers like Chrome and Firefox have highly optimized WASM parsers:

Streaming compilation: Begin compiling while downloading
Parallel compilation: Function bodies can be compiled on multiple threads
Memory mapping: Large files mapped directly, not copied

Your parser teaches the same concepts, just without these optimizations.

How wabt Works

The official wasm2wat tool does exactly what you’re building:

Parse binary with LEB128 decoding
Build in-memory representation
Pretty-print as WAT

Study wabt’s source code to see production patterns.

Compiler Output Analysis

Compilers like Emscripten and Rust/wasm32 produce WASM. Your parser lets you:

Understand what code patterns compile to
Debug ABI issues
Optimize output by understanding size costs

Security Research

WASM parsers are security-critical. Malformed WASM has been used in exploits:

Buffer overflows from bad section sizes
Integer overflows in LEB128
Memory corruption from invalid instructions

Your parser teaches you to think about these boundaries.
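For instance, a defensive u32 decoder can refuse encodings that are too long or that carry bits beyond 32. The sketch below is one way to do it in Python; read_u32_leb128 is a hypothetical name. A u32 needs at most 5 bytes, and in the fifth byte only the low 4 bits may be set.

```python
def read_u32_leb128(data: bytes, pos: int):
    """Decode an unsigned LEB128 value that must fit in 32 bits.

    Returns (value, next_pos); raises ValueError on truncated, overlong,
    or out-of-range input.
    """
    result = 0
    for i in range(5):                        # ceil(32 / 7) = 5 bytes at most
        if pos >= len(data):
            raise ValueError("truncated LEB128 value")
        byte = data[pos]
        pos += 1
        if i == 4 and byte & 0xF0:
            # Fifth byte: a continuation bit or payload bits above bit 31.
            raise ValueError("LEB128 value does not fit in u32")
        result |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return result, pos
    raise AssertionError("unreachable: fifth byte always terminates or raises")


print(read_u32_leb128(bytes([0xFF, 0xFF, 0xFF, 0xFF, 0x0F]), 0))  # (4294967295, 5)
```

In C the same check matters even more, because shifting a 32-bit value by 32 or more is undefined behavior rather than a silently large integer.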

Real-World Outcome

When your parser is complete, you will have a tool that produces detailed, professional output for any WebAssembly binary. Here is exactly what running your parser should produce:

Example 1: Simple Add Function

Input WAT file (add.wat):

(module
  (func $add (export "add") (param i32 i32) (result i32)
    local.get 0
    local.get 1
    i32.add
  )
  (memory (export "memory") 1)
)

Command and Output:

$ wat2wasm add.wat -o add.wasm
$ ./wasmparser add.wasm

================================================================================
                           WASM BINARY PARSER v1.0
================================================================================

File: add.wasm
Size: 55 bytes

--------------------------------------------------------------------------------
HEADER ANALYSIS
--------------------------------------------------------------------------------
Offset    Bytes           Description
------    -----           -----------
0x00      00 61 73 6D     Magic number: "\0asm" [VALID]
0x04      01 00 00 00     Version: 1 (little-endian u32)

--------------------------------------------------------------------------------
SECTION MAP
--------------------------------------------------------------------------------
ID   Name       Offset    Size      Description
--   ----       ------    ----      -----------
1    Type       0x08      7         Function signatures
3    Function   0x11      2         Function-to-type mappings
5    Memory     0x15      3         Linear memory declarations
7    Export     0x1A      16        Exported names
10   Code       0x2C      9         Function bodies

--------------------------------------------------------------------------------
SECTION 1: TYPE
--------------------------------------------------------------------------------
Count: 1 function type(s)

  [0] func (i32, i32) -> (i32)
      Encoding: 60 02 7F 7F 01 7F
                ^  ^  ^--^  ^  ^
                |  |   |    |  +-- result: i32
                |  |   |    +-- 1 result
                |  |   +-- params: i32, i32
                |  +-- 2 params
                +-- func type marker

--------------------------------------------------------------------------------
SECTION 3: FUNCTION
--------------------------------------------------------------------------------
Count: 1 function(s)

  [0] type_idx=0 -> (i32, i32) -> (i32)

--------------------------------------------------------------------------------
SECTION 5: MEMORY
--------------------------------------------------------------------------------
Count: 1 memory declaration(s)

  [0] limits: min=1 page (64KB), max=unlimited
      Encoding: 00 01
                ^  ^
                |  +-- minimum pages
                +-- flags (no maximum)

--------------------------------------------------------------------------------
SECTION 7: EXPORT
--------------------------------------------------------------------------------
Count: 2 export(s)

  [0] "add" -> func[0]
      Kind: 0x00 (function)
      Index: 0

  [1] "memory" -> memory[0]
      Kind: 0x02 (memory)
      Index: 0

--------------------------------------------------------------------------------
SECTION 10: CODE
--------------------------------------------------------------------------------
Count: 1 function body(ies)

  [0] Function $add
      Body size: 7 bytes
      Locals: (none)

      Disassembly:
      ~~~~~~~~~~~~
      Offset    Bytes    Instruction       Stack Effect
      ------    -----    -----------       ------------
      0x00      20 00    local.get 0       [] -> [i32]
      0x02      20 01    local.get 1       [i32] -> [i32, i32]
      0x04      6A       i32.add           [i32, i32] -> [i32]
      0x05      0B       end               [i32] -> [i32]

================================================================================
SUMMARY
================================================================================
  Total sections: 5
  Function types: 1
  Functions: 1 (0 imported, 1 defined)
  Exports: 2
  Memory: 1 page minimum
  Validation: PASSED

Parse completed in 0.3ms

Example 2: Module with Imports

Input WAT file (with_imports.wat):

(module
  (import "env" "log" (func $log (param i32)))
  (import "env" "memory" (memory 1))

  (func $greet (export "greet") (param i32)
    (local i32)
    local.get 0
    i32.const 100
    i32.add
    local.set 1
    local.get 1
    call $log
  )
)

Command and Output:

$ wat2wasm with_imports.wat -o with_imports.wasm
$ ./wasmparser with_imports.wasm --verbose

================================================================================
                           WASM BINARY PARSER v1.0
================================================================================

File: with_imports.wasm
Size: 77 bytes

--------------------------------------------------------------------------------
HEADER ANALYSIS
--------------------------------------------------------------------------------
Offset    Bytes           Description
------    -----           -----------
0x00      00 61 73 6D     Magic number: "\0asm" [VALID]
0x04      01 00 00 00     Version: 1 (little-endian u32)

--------------------------------------------------------------------------------
SECTION 1: TYPE
--------------------------------------------------------------------------------
Count: 1 function type(s)

  [0] func (i32) -> ()
      Used by: import "env"."log", function $greet

--------------------------------------------------------------------------------
SECTION 2: IMPORT
--------------------------------------------------------------------------------
Count: 2 import(s)

  [0] "env"."log" : func
      Type index: 0
      Signature: (i32) -> ()
      Assigned function index: 0

      Encoding breakdown:
      03 65 6E 76    ; module name length=3, "env"
      03 6C 6F 67    ; import name length=3, "log"
      00             ; kind: function
      00             ; type index: 0

  [1] "env"."memory" : memory
      Limits: min=1 page, max=none
      Assigned memory index: 0

--------------------------------------------------------------------------------
SECTION 3: FUNCTION
--------------------------------------------------------------------------------
Count: 1 function(s)

  [0] type_idx=0 -> (i32) -> ()
      Note: This is function index 1 (after 1 imported function)

--------------------------------------------------------------------------------
SECTION 7: EXPORT
--------------------------------------------------------------------------------
Count: 1 export(s)

  [0] "greet" -> func[1]

--------------------------------------------------------------------------------
SECTION 10: CODE
--------------------------------------------------------------------------------
Count: 1 function body(ies)

  [0] Function index 1 (export: "greet")
      Body size: 16 bytes
      Locals: 1 group(s)
        - 1 x i32 (local index: 1, after 1 param)

      Disassembly:
      ~~~~~~~~~~~~
      Offset    Bytes       Instruction       Immediate       Stack
      ------    -----       -----------       ---------       -----
      0x00      20 00       local.get         idx=0           [] -> [i32]
      0x02      41 E4 00    i32.const         100             [i32] -> [i32,i32]
                                              (LEB128: E4 00 = 100)
      0x05      6A          i32.add                           [i32,i32] -> [i32]
      0x06      21 01       local.set         idx=1           [i32] -> []
      0x08      20 01       local.get         idx=1           [] -> [i32]
      0x0A      10 00       call              func_idx=0      [i32] -> []
                                              (calls: "env"."log")
      0x0C      0B          end                               [] -> []

================================================================================
INDEX SPACES (after all imports processed)
================================================================================
  Function indices:
    [0] import "env"."log" : (i32) -> ()
    [1] $greet : (i32) -> () [exported as "greet"]

  Memory indices:
    [0] import "env"."memory" : 1 page min

================================================================================
Parse completed in 0.4ms

Example 3: Hex Dump Mode

$ ./wasmparser add.wasm --hex

WASM Binary Hex Dump: add.wasm
==============================

Header:
  00000000: 00 61 73 6D 01 00 00 00                          .asm....

Section 1 (Type) - 7 bytes:
  00000008: 01 07 01 60 02 7F 7F 01 7F                       ...`.....

Section 3 (Function) - 2 bytes:
  00000011: 03 02 01 00                                      ....

Section 5 (Memory) - 3 bytes:
  00000015: 05 03 01 00 01                                   .....

Section 7 (Export) - 16 bytes:
  0000001A: 07 10 02 03 61 64 64 00 00 06 6D 65 6D 6F 72 79  ....add...memory
  0000002A: 02 00                                            ..

Section 10 (Code) - 9 bytes:
  0000002C: 0A 09 01 07 00 20 00 20 01 6A 0B                 ..... . .j.

Example 4: JSON Output Mode

$ ./wasmparser add.wasm --json | jq .

{
  "magic": "00 61 73 6D",
  "version": 1,
  "sections": [
    {
      "id": 1,
      "name": "type",
      "offset": 8,
      "size": 7,
      "types": [
        {
          "index": 0,
          "params": ["i32", "i32"],
          "results": ["i32"]
        }
      ]
    },
    {
      "id": 3,
      "name": "function",
      "offset": 17,
      "size": 2,
      "functions": [
        {"index": 0, "type_index": 0}
      ]
    },
    {
      "id": 5,
      "name": "memory",
      "offset": 21,
      "size": 3,
      "memories": [
        {"index": 0, "min": 1, "max": null}
      ]
    },
    {
      "id": 7,
      "name": "export",
      "offset": 26,
      "size": 16,
      "exports": [
        {"name": "add", "kind": "function", "index": 0},
        {"name": "memory", "kind": "memory", "index": 0}
      ]
    },
    {
      "id": 10,
      "name": "code",
      "offset": 44,
      "size": 9,
      "bodies": [
        {
          "index": 0,
          "size": 7,
          "locals": [],
          "instructions": [
            {"offset": 0, "opcode": "0x20", "mnemonic": "local.get", "immediate": 0},
            {"offset": 2, "opcode": "0x20", "mnemonic": "local.get", "immediate": 1},
            {"offset": 4, "opcode": "0x6A", "mnemonic": "i32.add"},
            {"offset": 5, "opcode": "0x0B", "mnemonic": "end"}
          ]
        }
      ]
    }
  ],
  "summary": {
    "total_sections": 5,
    "function_count": 1,
    "import_count": 0,
    "export_count": 2,
    "valid": true
  }
}

Example 5: Error Handling

$ ./wasmparser corrupted.wasm

================================================================================
                           WASM BINARY PARSER v1.0
================================================================================

File: corrupted.wasm
Size: 25 bytes

--------------------------------------------------------------------------------
HEADER ANALYSIS
--------------------------------------------------------------------------------
Offset    Bytes           Description
------    -----           -----------
0x00      00 61 73 6D     Magic number: "\0asm" [VALID]
0x04      01 00 00 00     Version: 1 (little-endian u32)

--------------------------------------------------------------------------------
PARSE ERROR
--------------------------------------------------------------------------------
Error: Invalid section ordering at offset 0x10
  Expected: Section ID > 3
  Found: Section ID 1 (Type)

Context: Section 3 (Function) was already parsed at offset 0x08.
         Known sections must appear at most once and in spec order: ascending ID,
         except Data Count (ID 12), which comes before Code (ID 10).
         Custom sections (ID 0) may appear anywhere, any number of times.

Partial parse results available above this error.

Exit code: 1

The Core Question You’re Answering

How does a compact binary format encode program structure, and what design tradeoffs enable fast parsing?

This question sits at the heart of systems programming. As you build your parser, you’re uncovering answers to fundamental design decisions:

Why variable-length encoding? LEB128 saves bytes for small values (common in indices and sizes) at the cost of complexity. A module with 1000 small functions saves kilobytes compared to fixed 4-byte integers.
Why strict section ordering? Single-pass parsing. By the time the Code section arrives, the parser has already seen every type, import, and function declaration, so a streaming compiler can generate code for each function body as it arrives. No seeking backward, no multi-pass algorithms.
Why separate Function and Code sections? Forward reference resolution. The Function section assigns every function its type index up front, so when the Code section arrives, a call to any function (even one whose body comes later) can be validated immediately without lookahead.
Why magic numbers and version fields? Fast rejection of invalid files. A parser can fail in 8 bytes instead of parsing garbage and failing deep inside.

Reflect on this: Every byte in the format exists for a reason. When you encounter something that seems redundant or complex, ask: “What problem does this solve?” The answer usually involves space efficiency, parse speed, or validation simplicity.
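To make the space tradeoff from the first point concrete, here is a minimal sketch of an unsigned LEB128 encoder plus a size comparison against fixed 4-byte integers. The function names `encode_uleb128` and `size_comparison` are illustrative choices, not part of any spec or library.

```rust
/// Encode `value` as unsigned LEB128: 7 payload bits per byte,
/// low-order group first, high bit set on every byte except the last.
fn encode_uleb128(mut value: u32) -> Vec<u8> {
    let mut out = Vec::new();
    loop {
        let byte = (value & 0x7F) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte); // last byte: continuation bit clear
            return out;
        }
        out.push(byte | 0x80); // more bytes follow
    }
}

/// Bytes needed to store `values` as LEB128 vs. as fixed 4-byte integers.
fn size_comparison(values: &[u32]) -> (usize, usize) {
    let leb: usize = values.iter().map(|&v| encode_uleb128(v).len()).sum();
    (leb, values.len() * 4)
}
```

For the indices 0 through 999, this gives 1872 bytes as LEB128 (128 one-byte values and 872 two-byte values) against 4000 bytes as fixed u32s.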

Concepts You Must Understand First

Before writing code, ensure you have solid mental models for these concepts:

LEB128 variable-length encoding - The algorithm for reading integers where small values use fewer bytes, with continuation bits indicating “more data follows”
Binary file structure and byte-level I/O - How files are organized as sequences of bytes, and how to read them in chunks while tracking position
Hexadecimal notation and bit manipulation - Converting between hex, binary, and decimal; using AND, OR, and shift operations to extract fields from bytes
Magic numbers and file format identification - How files begin with signature bytes that identify their format (PDF starts with %PDF, PNG with \x89PNG, WASM with \0asm)
Type encodings and tagged unions - How a single byte can indicate which variant follows (e.g., 0x60 means “function type”, 0x7F means “i32”)
Length-prefixed vs. delimiter-terminated data - WASM uses length prefixes (know size before reading) rather than delimiters (scan for terminator), enabling single-pass parsing
Index spaces and forward references - How WASM assigns indices to types, functions, memories, etc., and why imports affect the numbering
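As a rough illustration of the last concept, the sketch below builds the function index space from an import list and the Function section. All type and function names here (`Import`, `ImportKind`, `FuncEntry`, `build_function_index_space`) are hypothetical, and only the two import kinds needed for the example are modeled.

```rust
/// What an import brings into the module (only the kinds this sketch needs).
#[derive(Debug, Clone, PartialEq)]
enum ImportKind {
    Func { type_index: u32 },
    Memory { min_pages: u32 },
}

#[derive(Debug, Clone, PartialEq)]
struct Import {
    module: String,
    name: String,
    kind: ImportKind,
}

/// One slot in the function index space.
#[derive(Debug, Clone, PartialEq)]
enum FuncEntry {
    Imported { module: String, name: String, type_index: u32 },
    Defined { type_index: u32 },
}

/// Imported functions take the lowest indices, in import order; functions
/// declared in the Function section are numbered after them.
/// `function_section` holds one type index per locally defined function.
fn build_function_index_space(imports: &[Import], function_section: &[u32]) -> Vec<FuncEntry> {
    let mut space = Vec::new();
    for imp in imports {
        // Non-function imports (memories, tables, globals) live in their own index spaces.
        if let ImportKind::Func { type_index } = &imp.kind {
            space.push(FuncEntry::Imported {
                module: imp.module.clone(),
                name: imp.name.clone(),
                type_index: *type_index,
            });
        }
    }
    for &type_index in function_section {
        space.push(FuncEntry::Defined { type_index });
    }
    space
}
```

With one imported function and one defined function, the import gets index 0 and the defined function gets index 1. This is why a `call 0` in a module with imports does not refer to the first function body in the Code section.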

Book Reference: See “Computer Systems: A Programmer’s Perspective” chapters 2 and 7 for binary representations and linking concepts. The “Practical Binary Analysis” book covers file format parsing in depth.

Questions to Guide Your Design

Before implementing, think through these design questions:

Reading the byte stream:

How will you track your current position in the file?
What happens if you try to read past the end of the file?
Should you read the whole file into memory, or stream from disk?

Decoding LEB128:

How do you know when to stop reading bytes?
What’s the maximum number of bytes you might read for a u32? For a u64?
How does signed LEB128 differ, and when do you need sign extension?

Parsing sections:

How will you verify you consumed exactly the number of bytes a section claims?
What should happen if you encounter an unknown section ID (like ID 15)?
How will you handle custom sections that can appear anywhere?

Validating structure:

How do you enforce that sections appear in ascending ID order?
What if a section appears twice?
Should you validate that function indices are in range, or defer to a separate validation pass?
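One possible answer to the ordering and duplicate questions is sketched below. It ranks each known section ID by its required position rather than comparing raw IDs, because Data Count (ID 12) must come before Code (ID 10). The names `order_rank`, `OrderError`, and `check_section_order` are made up for this sketch, and it treats any ID above 12 as an error, which is a strict-validator choice.

```rust
/// Position of each known section ID in the order the spec requires.
/// Data Count (12) sits between Element (9) and Code (10), so the
/// required order is not purely ascending by ID.
fn order_rank(id: u8) -> Option<u8> {
    match id {
        1..=9 => Some(id), // Type through Element
        12 => Some(10),    // Data Count
        10 => Some(11),    // Code
        11 => Some(12),    // Data
        _ => None,         // custom (0) or unknown
    }
}

#[derive(Debug, PartialEq)]
enum OrderError {
    UnknownId(u8),
    Duplicate(u8),
    OutOfOrder { found: u8, after: u8 },
}

/// Check a sequence of section IDs in the order they appear in the file.
fn check_section_order(ids: &[u8]) -> Result<(), OrderError> {
    let mut last: Option<(u8, u8)> = None; // (rank, id) of the last known section
    for &id in ids {
        if id == 0 {
            continue; // custom sections may appear anywhere, any number of times
        }
        let rank = order_rank(id).ok_or(OrderError::UnknownId(id))?;
        if let Some((last_rank, last_id)) = last {
            if rank == last_rank {
                return Err(OrderError::Duplicate(id));
            }
            if rank < last_rank {
                return Err(OrderError::OutOfOrder { found: id, after: last_id });
            }
        }
        last = Some((rank, id));
    }
    Ok(())
}
```

Because ranks must strictly increase, one comparison against the previous known section catches both repeats and misordering, with no need for a "seen" set.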

Decoding instructions:

How will you map opcode bytes to instruction names?
What’s your strategy for handling the ~200 different opcodes?
How will you track nesting depth for block/loop/if structures?

Output format:

Will you build an in-memory AST first, or print while parsing?
How will you handle alignment and formatting for readable output?
What information is essential vs. nice-to-have?

Thinking Exercise

Before writing any code, manually decode this WASM binary by hand. This exercise builds intuition that no amount of coding can replace.

The binary (32 bytes):

00 61 73 6D 01 00 00 00 01 07 01 60 02 7F 7F 01
7F 03 02 01 00 0A 09 01 07 00 20 00 20 01 6A 0B

Your task:

Identify the header (first 8 bytes):
- What is the magic number?
- What version is this module?
Find the first section (starts at byte 8):
- What is the section ID?
- Decode the section size (it’s a single-byte LEB128)
- What are the raw bytes of the section content?
Parse the Type section content:
- How many types are declared?
- What is the marker byte for a function type?
- How many parameters? What types?
- How many results? What types?
- Write out the signature in WAT form: (func (param ...) (result ...))
Find the second section:
- What is the section ID?
- What is the size?
- This is the Function section. How many functions does it declare?
- What type index does function 0 use?
Find the third section (the Code section):
- What is the section ID?
- What is the size?
- How many function bodies?
- What is the body size for function 0?
- How many local variable groups?
- Decode each instruction:
  - Byte 0x20 = ?
  - Byte 0x00 = ?
  - Byte 0x20 = ?
  - Byte 0x01 = ?
  - Byte 0x6A = ?
  - Byte 0x0B = ?
Reconstruct the WAT: Write the complete WAT that would produce this binary.

Hint for checking your work: The module contains a single function that adds two i32 parameters.

The Interview Questions They’ll Ask

Binary parsing skills translate directly to interview questions. Here’s what you’ll be prepared to answer:

LEB128 and Variable-Length Encoding:

“Implement a function to decode unsigned LEB128 from a byte stream.”
“What is the maximum value that can be stored in 3 bytes of LEB128?”
“Why might a format use variable-length encoding instead of fixed-size integers?”
“Given bytes 0xE5 0x8E 0x26, what unsigned integer does this represent?”

Binary File Formats:

“How would you design a binary format for a configuration file?”
“What are the tradeoffs between length-prefixed and delimiter-terminated strings?”
“Why do file formats use magic numbers?”
“How would you make a binary format extensible for future versions?”

Parsing and Validation:

“How would you implement a streaming parser that can process data as it arrives?”
“What’s the difference between syntax errors and semantic errors in a binary format?”
“How would you gracefully handle corrupted or truncated input?”
“Design a data structure to represent a parsed AST for a binary format.”

Systems Programming:

“What’s the difference between big-endian and little-endian byte order?”
“How would you memory-map a large file for efficient parsing?”
“What security considerations apply when parsing untrusted binary input?”
“How would you implement fuzzing for a binary parser?”

WebAssembly Specific:

“Explain the relationship between the Function section and Code section.”
“Why must WASM sections appear in a specific order?”
“What is the purpose of the Data Count section added in WASM 2.0?”
“How does WASM’s binary format enable streaming compilation?”

Hints in Layers

If you get stuck, reveal hints progressively. Try to solve problems yourself before looking.

Layer 1: Getting Started

Hint: File reading structure

Create a struct to wrap your byte buffer with position tracking:

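// A borrowed slice in a struct needs an explicit lifetime parameter.
struct ByteReader<'a> {
    data: &'a [u8],
    pos: usize,
}

impl<'a> ByteReader<'a> {
    // Returns None at end of input instead of panicking.
    fn read_byte(&mut self) -> Option<u8> {
        if self.pos < self.data.len() {
            let b = self.data[self.pos];
            self.pos += 1;
            Some(b)
        } else {
            None
        }
    }
}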

All other read operations build on read_byte().
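As one way to extend the reader, the sketch below adds two bounds-checked helpers and uses them to validate the 8-byte header. The helper names (`new`, `read_bytes`, `read_u32_le`), the `HeaderError` enum, and `parse_header` are assumptions made for illustration.

```rust
struct ByteReader<'a> {
    data: &'a [u8],
    pos: usize,
}

impl<'a> ByteReader<'a> {
    fn new(data: &'a [u8]) -> Self {
        ByteReader { data, pos: 0 }
    }

    fn read_byte(&mut self) -> Option<u8> {
        let b = *self.data.get(self.pos)?;
        self.pos += 1;
        Some(b)
    }

    /// Read exactly `n` bytes, or return None without consuming anything.
    fn read_bytes(&mut self, n: usize) -> Option<&'a [u8]> {
        let end = self.pos.checked_add(n)?;
        let slice = self.data.get(self.pos..end)?;
        self.pos = end;
        Some(slice)
    }

    /// Fixed-width little-endian u32 (in WASM, used for the version field).
    fn read_u32_le(&mut self) -> Option<u32> {
        let b = self.read_bytes(4)?;
        Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
    }
}

#[derive(Debug, PartialEq)]
enum HeaderError {
    Truncated,
    BadMagic,
    UnsupportedVersion(u32),
}

/// Validate the magic number and version; returns the version on success.
fn parse_header(r: &mut ByteReader<'_>) -> Result<u32, HeaderError> {
    let magic = r.read_bytes(4).ok_or(HeaderError::Truncated)?;
    if magic != &b"\0asm"[..] {
        return Err(HeaderError::BadMagic);
    }
    let version = r.read_u32_le().ok_or(HeaderError::Truncated)?;
    if version != 1 {
        return Err(HeaderError::UnsupportedVersion(version));
    }
    Ok(version)
}
```

Returning `Option` or `Result` from every read means truncated input surfaces as an ordinary error value rather than an out-of-bounds panic.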

Hint: LEB128 termination condition

The high bit (0x80) of each byte indicates continuation:

If byte & 0x80 != 0: more bytes follow
If byte & 0x80 == 0: this is the last byte

Extract the 7 value bits with byte & 0x7F.
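Putting the two rules together, a minimal sketch of an unsigned decoder for u32 values might look like the following. The name `decode_uleb128_u32` and the choice to return `(value, bytes_consumed)` are illustrative. It works on a plain slice so it stays self-contained.

```rust
/// Decode an unsigned LEB128 value that must fit in a u32.
/// Returns the value and the number of bytes consumed, or None if the input
/// is truncated, longer than 5 bytes, or has bits set beyond bit 31.
fn decode_uleb128_u32(bytes: &[u8]) -> Option<(u32, usize)> {
    let mut result: u32 = 0;
    for (i, &byte) in bytes.iter().enumerate().take(5) {
        let payload = (byte & 0x7F) as u32;
        // The fifth byte may only contribute the top 4 bits of a u32.
        if i == 4 && payload > 0x0F {
            return None;
        }
        result |= payload << (7 * i as u32);
        if byte & 0x80 == 0 {
            return Some((result, i + 1)); // continuation bit clear: done
        }
    }
    None // ran out of input, or the continuation bit was still set on byte 5
}
```

For example, the bytes E5 8E 26 decode as 0x65 + (0x0E << 7) + (0x26 << 14) = 101 + 1792 + 622592 = 624485.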

Layer 2: Section Parsing

Hint: Section boundary tracking

Before parsing a section’s content, record your position:

// Read the size first: it counts only the content bytes that follow it.
let section_size = reader.read_uleb128() as usize;
let content_start = reader.pos;

// Parse content...

let bytes_consumed = reader.pos - content_start;
assert_eq!(bytes_consumed, section_size, "Section size mismatch!");

This catches bugs where you read too many or too few bytes.
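A related approach, sketched below under the assumption that the whole file is in memory, is to hand each section parser a sub-slice containing exactly the declared bytes. Over-reads then fail at the slice boundary, and under-reads show up as leftover bytes. `SectionInfo`, `list_sections`, and `parse_function_section` are hypothetical names, and `list_sections` assumes the header was already validated.

```rust
#[derive(Debug, PartialEq)]
struct SectionInfo {
    id: u8,
    content_offset: usize, // file offset of the first content byte
    size: usize,           // content length declared by the section header
}

/// Minimal unsigned LEB128 (u32) reader over `data`, advancing `*pos`.
fn read_uleb128(data: &[u8], pos: &mut usize) -> Option<u32> {
    let mut result = 0u32;
    for shift in (0..35u32).step_by(7) {
        let byte = *data.get(*pos)?;
        *pos += 1;
        let payload = (byte & 0x7F) as u32;
        if shift == 28 && payload > 0x0F {
            return None; // would overflow 32 bits
        }
        result |= payload << shift;
        if byte & 0x80 == 0 {
            return Some(result);
        }
    }
    None
}

/// Walk the sections after the 8-byte header, checking that every declared
/// size stays inside the file. Returns None on truncated input.
fn list_sections(data: &[u8]) -> Option<Vec<SectionInfo>> {
    if data.len() < 8 {
        return None;
    }
    let mut pos = 8; // skip magic + version
    let mut sections = Vec::new();
    while pos < data.len() {
        let id = data[pos];
        pos += 1;
        let size = read_uleb128(data, &mut pos)? as usize;
        let end = pos.checked_add(size)?;
        if end > data.len() {
            return None; // section claims more bytes than the file has
        }
        sections.push(SectionInfo { id, content_offset: pos, size });
        pos = end; // content is data[content_offset..end]
    }
    Some(sections)
}

/// Parse a Function section body (a vector of type indices) from a slice
/// holding exactly the declared content bytes.
fn parse_function_section(content: &[u8]) -> Option<Vec<u32>> {
    let mut pos = 0;
    let count = read_uleb128(content, &mut pos)?;
    let mut type_indices = Vec::new();
    for _ in 0..count {
        type_indices.push(read_uleb128(content, &mut pos)?);
    }
    if pos != content.len() {
        return None; // size mismatch: unread bytes remain
    }
    Some(type_indices)
}
```

Unlike an `assert!`, returning `None` lets the caller report a malformed file as a normal error, which matters once the parser is fed untrusted input.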

Hint: Handling unknown sections

If you encounter a section ID you don’t recognize:

if section_id > 12 {
    // Unknown section - skip it entirely
    reader.skip(section_size);
    continue;
}

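Skipping keeps an inspection tool usable on modules that contain sections from newer proposals. A strict validator should reject an unknown section ID instead, because the spec treats it as malformed.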

Layer 3: Instruction Decoding

Hint: Opcode dispatch pattern

Use a match/switch on the opcode byte. Group by category:

match opcode {
    // Control flow
    0x00 => Instruction::Unreachable,
    0x01 => Instruction::Nop,
    0x02 => Instruction::Block(read_block_type()),
    0x0B => Instruction::End,

    // Variables
    0x20 => Instruction::LocalGet(read_uleb128()),
    0x21 => Instruction::LocalSet(read_uleb128()),

    // Constants
    0x41 => Instruction::I32Const(read_sleb128()),  // Note: signed!

    // Arithmetic (no immediates)
    0x6A => Instruction::I32Add,
    0x6B => Instruction::I32Sub,

    _ => Instruction::Unknown(opcode),
}
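Since the `i32.const` immediate is signed, it needs a separate decoder with sign extension. The following is a minimal sketch. The name `decode_sleb128_i32` and the `(value, bytes_consumed)` return shape are illustrative choices.

```rust
/// Decode a signed LEB128 value that must fit in an i32 (the immediate of
/// `i32.const`). Returns the value and the number of bytes consumed.
fn decode_sleb128_i32(bytes: &[u8]) -> Option<(i32, usize)> {
    let mut result: i64 = 0;
    let mut shift = 0u32;
    for (i, &byte) in bytes.iter().enumerate().take(5) {
        result |= ((byte & 0x7F) as i64) << shift;
        shift += 7;
        if byte & 0x80 == 0 {
            // Bit 6 of the final byte is the sign bit: extend it upward.
            if byte & 0x40 != 0 {
                result |= -1i64 << shift;
            }
            // Reject encodings whose value does not fit in 32 bits.
            if result < i32::MIN as i64 || result > i32::MAX as i64 {
                return None;
            }
            return Some((result as i32, i + 1));
        }
    }
    None // truncated input, or more than 5 bytes
}
```

This is why 100 is encoded as E4 00 rather than the single byte 64: on its own, 0x64 has bit 6 set and decodes as -28.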

Hint: Memory instruction immediates

Memory operations like i32.load and i32.store have two immediates:

// memarg = (align, offset)
let align = read_uleb128();   // log2 of alignment (0=1, 1=2, 2=4, 3=8)
let offset = read_uleb128();  // byte offset from address

// Effective address = stack_value + offset

The alignment is a hint for optimization; the offset is semantic.
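To illustrate how the two immediates are read and used, here is a small sketch. `MemArg`, `read_memarg`, and `effective_address` are hypothetical names, and the bounds check models how a runtime would trap. A parser itself only needs the decoding step.

```rust
#[derive(Debug, PartialEq)]
struct MemArg {
    align_log2: u32, // alignment hint, stored as log2(bytes)
    offset: u32,     // constant byte offset added to the dynamic address
}

/// Minimal unsigned LEB128 (u32) reader over `data`, advancing `*pos`.
fn read_uleb128(data: &[u8], pos: &mut usize) -> Option<u32> {
    let mut result = 0u32;
    for shift in (0..35u32).step_by(7) {
        let byte = *data.get(*pos)?;
        *pos += 1;
        let payload = (byte & 0x7F) as u32;
        if shift == 28 && payload > 0x0F {
            return None; // would overflow 32 bits
        }
        result |= payload << shift;
        if byte & 0x80 == 0 {
            return Some(result);
        }
    }
    None
}

/// Read the memarg that follows a load/store opcode: align first, then offset.
fn read_memarg(data: &[u8], pos: &mut usize) -> Option<MemArg> {
    let align_log2 = read_uleb128(data, pos)?;
    let offset = read_uleb128(data, pos)?;
    Some(MemArg { align_log2, offset })
}

/// Effective address of an access of `width` bytes. The sum is computed in
/// u64 so base + offset cannot wrap around 32 bits; None models the trap
/// for an out-of-bounds access.
fn effective_address(base: u32, arg: &MemArg, width: u64, memory_len: u64) -> Option<u64> {
    let ea = base as u64 + arg.offset as u64;
    if ea + width > memory_len {
        None
    } else {
        Some(ea)
    }
}
```

For the bytes 28 02 10 (`i32.load` with align=2, offset=16), the alignment hint is 1 << 2 = 4 bytes and a base address of 100 gives effective address 116.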

Layer 4: Advanced Topics

Hint: Block type encoding

Block types use a special encoding:

0x40: void (empty block type, no value produced)
0x7F, 0x7E, 0x7D, 0x7C: value type (block produces that type)
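Otherwise: type index, encoded as a signed 33-bit LEB128 (s33) whose value must be non-negative and indexes the type section

The single-byte forms above are the negative values of that same s33 encoding, which is why a type index has to be read as signed. The type index form allows blocks with parameters and with multiple results.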
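A minimal sketch of a block type decoder following these rules is shown below. The `ValType` and `BlockType` enums and the `read_block_type` signature are assumptions for illustration, and only the four number types listed above are handled. Other value types (such as reference types) are rejected here.

```rust
#[derive(Debug, PartialEq)]
enum ValType {
    I32,
    I64,
    F32,
    F64,
}

#[derive(Debug, PartialEq)]
enum BlockType {
    Empty,
    Value(ValType),
    TypeIndex(u32),
}

/// Decode a block type starting at `*pos`, advancing `*pos` past it.
fn read_block_type(data: &[u8], pos: &mut usize) -> Option<BlockType> {
    let first = *data.get(*pos)?;
    let single = match first {
        0x40 => Some(BlockType::Empty),
        0x7F => Some(BlockType::Value(ValType::I32)),
        0x7E => Some(BlockType::Value(ValType::I64)),
        0x7D => Some(BlockType::Value(ValType::F32)),
        0x7C => Some(BlockType::Value(ValType::F64)),
        _ => None,
    };
    if let Some(bt) = single {
        *pos += 1;
        return Some(bt);
    }
    // Otherwise: a type index encoded as signed LEB128 that must be non-negative.
    let mut result: i64 = 0;
    let mut shift = 0u32;
    for _ in 0..5 {
        let byte = *data.get(*pos)?;
        *pos += 1;
        result |= ((byte & 0x7F) as i64) << shift;
        shift += 7;
        if byte & 0x80 == 0 {
            if byte & 0x40 != 0 {
                return None; // negative value not covered by the forms above
            }
            if result > u32::MAX as i64 {
                return None;
            }
            return Some(BlockType::TypeIndex(result as u32));
        }
    }
    None
}
```

Note that type index 64 is encoded as C0 00, not as the single byte 40, precisely because 0x40 read as a signed value is negative and is reserved for the empty block type.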

Hint: Nested block tracking

For proper disassembly indentation, track nesting depth:

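let mut depth: usize = 0;

for instr in instructions {
    match instr {
        Block | Loop | If => {
            print_indented(depth, instr);
            depth += 1;
        }
        Else => {
            // Else prints at the same level as its If
            print_indented(depth.saturating_sub(1), instr);
        }
        End => {
            // The function body's final `end` has no opening block,
            // so saturate at 0 instead of underflowing.
            depth = depth.saturating_sub(1);
            print_indented(depth, instr);
        }
        _ => print_indented(depth, instr),
    }
}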

Books That Will Help

Book	Author(s)	Why It’s Relevant
Practical Binary Analysis	Dennis Andriesse	The definitive guide to understanding binary formats. Covers ELF, PE, and general principles of parsing executable formats. Chapters on disassembly directly apply to WASM bytecode.
Computer Systems: A Programmer’s Perspective	Bryant & O’Hallaron	Foundation for understanding how programs are represented in binary. Chapter 2 on data representation explains bit manipulation; Chapter 7 on linking explains symbol tables and relocations (similar to WASM imports/exports).
Low-Level Programming: C, Assembly, and Program Execution	Igor Zhirkov	Teaches the mindset for byte-level programming. Understanding x86 binary encoding helps appreciate WASM’s simpler design.
Crafting Interpreters	Robert Nystrom	While focused on text parsing, the bytecode virtual machine chapters (Part III) show how to design instruction encodings, directly applicable to WASM disassembly.
The Art of WebAssembly	Rick Battagline	WebAssembly-specific book that covers the binary format from a practical perspective. Good companion for understanding the “why” behind format decisions.
Compilers: Principles, Techniques, and Tools	Aho, Lam, Sethi, Ullman	The Dragon Book’s chapters on intermediate representations and code generation explain why WASM is structured the way it is.

Resources

Specifications

WebAssembly Binary Format - Definitive reference
LEB128 on Wikipedia - Algorithm details

Reference Implementations

wabt source - C++ reference
parity-wasm - Rust implementation
aspect-it/aspect-it - Community docs

Tools

wasm-objdump - From wabt, for comparison
xxd - Hex dump utility
wasm-validate - Check if WASM is valid

Self-Assessment Checklist

Before moving to Project 3, verify you can:

Implement LEB128 encoding/decoding from scratch
List all 12 section types and their purposes
Explain why sections must be ordered
Parse any function signature from the type section
Decode the code section’s instruction stream
Use wasm-objdump to verify your parser’s output
Handle malformed input without crashing

Conceptual Questions

Why does WASM use LEB128 instead of fixed-size integers?
What information is in the Function section vs. the Code section?
How do you know where a function body ends?
Why must imports be declared before functions?
How would you find which type a particular function uses?

Next: P03: Build a WASM Interpreter — execute the bytecode you’ve learned to parse