Project 2: Build a WASM Binary Parser

Decode the .wasm binary format byte-by-byte and understand what makes a WebAssembly module


Project Overview

Attribute Value
Difficulty Intermediate
Time Estimate 1-2 weeks
Languages C (primary), Rust, Python, TypeScript
Prerequisites Project 1 (WAT), binary/hex familiarity, file I/O
Main Reference WebAssembly Specification §5 (Binary Format)
Knowledge Area Binary Parsing, WebAssembly Internals

Learning Objectives

After completing this project, you will be able to:

  1. Parse LEB128 encoding - Decode variable-length integers used throughout WASM
  2. Identify all section types - Know the 12 standard sections and their purposes
  3. Decode function signatures - Understand the type section's structure
  4. Read import/export tables - Parse how modules connect to their environment
  5. Disassemble code sections - Turn bytecode back into readable instructions
  6. Validate module structure - Know the ordering and format requirements

Conceptual Foundation

1. Why Binary Format Matters

When you write WAT and compile with wat2wasm, the output is a .wasm file, a binary encoding of your module. Understanding this format lets you:

  • Debug mysterious WASM failures by inspecting the actual bytes
  • Build tools (validators, optimizers, debuggers)
  • Write compilers that output WASM directly
  • Understand runtime performance (binary is what actually executes)

The binary format IS WebAssembly. WAT is just a convenience for humans.

2. The Magic Number and Version

Every WASM file begins with 8 bytes:

Offset  Bytes        Meaning
──────────────────────────────
0x00    00 61 73 6D  Magic number: "\0asm"
0x04    01 00 00 00  Version: 1 (little-endian u32)

Why "\0asm"?

  • Starts with null byte to distinguish from text files
  • "asm" is a nod to the original asm.js project
  • Makes files easily identifiable: file mymodule.wasm shows "WebAssembly"

Version 1: Every deployed WASM module today carries version 1 in this field. Later revisions of the specification (such as WebAssembly 2.0) add features without changing the binary version number.

3. LEB128: Variable-Length Integers

WASM uses LEB128 (Little Endian Base 128) to encode integers compactly:

  • Small numbers use few bytes: 0-127 fit in 1 byte
  • Large numbers use more bytes: up to 5 bytes for u32, 10 for u64
  • Each byte's high bit indicates "more bytes follow"

Unsigned LEB128 Algorithm

Reading uLEB128:
  result = 0
  shift = 0
  loop:
    byte = read_byte()
    result |= (byte & 0x7F) << shift  // Low 7 bits contribute to value
    if (byte & 0x80) == 0:            // High bit clear = done
      break
    shift += 7
  return result

Example: Decoding 0xE5 0x8E 0x26

Byte 1: 0xE5 = 1110 0101
  - High bit set (1): more bytes coming
  - Value bits: 110 0101 = 0x65

Byte 2: 0x8E = 1000 1110
  - High bit set (1): more bytes coming
  - Value bits: 000 1110 = 0x0E

Byte 3: 0x26 = 0010 0110
  - High bit clear (0): done
  - Value bits: 010 0110 = 0x26

Assembly:
  result = 0x65 | (0x0E << 7) | (0x26 << 14)
         = 0x65 | 0x700 | 0x98000
         = 624485
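
The pseudocode and worked example above translate almost line for line into Python. This is a minimal sketch rather than a reference implementation; the name `decode_uleb128` and its `(value, next_position)` return shape are choices made here for illustration.

```python
def decode_uleb128(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one unsigned LEB128 value; return (value, next_position)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated LEB128 value")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift   # low 7 bits carry the payload
        if (byte & 0x80) == 0:             # high bit clear: this was the last byte
            return result, pos
        shift += 7


value, next_pos = decode_uleb128(bytes([0xE5, 0x8E, 0x26]))
print(value, next_pos)  # 624485 3
```

Returning the next position alongside the value lets the caller keep walking the same buffer without a separate cursor object.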

Signed LEB128 (sLEB128)

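Similar, but bit 6 of the final byte (its second-highest bit) is the sign bit. If it is set, fill the remaining high bits with 1s. Note that shift must already count the final byte, so add 7 for every byte read, including the last one:

If final byte has bit 6 set and shift is less than the result width:
  result |= (~0 << shift)  // Fill high bits with 1s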
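
A sketch of the signed variant in Python, assuming the same `(value, next_position)` convention as a hypothetical helper. Python integers are unbounded, so subtracting `1 << shift` stands in for "fill the high bits with 1s" on a fixed-width integer.

```python
def decode_sleb128(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one signed LEB128 value; return (value, next_position)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated LEB128 value")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7                      # count this byte before testing for the end
        if (byte & 0x80) == 0:
            break
    if byte & 0x40:                     # bit 6 of the final byte is the sign bit
        result -= 1 << shift            # same effect as OR-ing in ones above 'shift'
    return result, pos


print(decode_sleb128(bytes([0x7F])))              # (-1, 1)
print(decode_sleb128(bytes([0x9B, 0xF1, 0x59])))  # (-624485, 3)
```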

4. Section Structure

After the header, a WASM module is a sequence of sections. Each section:

┌─────────────┬──────────────┬────────────────────┐
│ Section ID  │ Section Size │    Section Data    │
│  (1 byte)   │  (uLEB128)   │   (size bytes)     │
└─────────────┴──────────────┴────────────────────┘

WASM Section Structure

Section IDs:

ID Name Purpose
0 Custom Name sections, debug info, extensions
1 Type Function signatures (param/result types)
2 Import Functions/memory/tables/globals from host
3 Function Maps function index → type index
4 Table Indirect call tables (for function pointers)
5 Memory Linear memory declarations
6 Global Global variable declarations
7 Export Names exposed to host
8 Start Optional startup function index
9 Element Table initialization data
10 Code Function bodies
11 Data Memory initialization data
12 Data Count Count of data segments (for validation)

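Critical rule: Each non-custom section may appear at most once, and they must appear in a fixed order. That order follows the IDs with one exception: Data Count (12) goes between Element (9) and Code (10). Custom sections can appear anywhere and may repeat.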
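
The ordering rule is easy to check once the section IDs have been collected. The sketch below is one possible approach; `SECTION_ORDER` and `check_section_order` are names invented here. It encodes the order from the specification, in which Data Count (12) sits between Element (9) and Code (10), and skips custom sections (ID 0).

```python
# Required relative order of the non-custom sections.
SECTION_ORDER = [1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 10, 11]
RANK = {section_id: rank for rank, section_id in enumerate(SECTION_ORDER)}


def check_section_order(section_ids: list[int]) -> None:
    """Raise ValueError if non-custom sections repeat or are out of order."""
    last_rank = -1
    for section_id in section_ids:
        if section_id == 0:          # custom sections may appear anywhere
            continue
        if section_id not in RANK:
            raise ValueError(f"unknown section id {section_id}")
        if RANK[section_id] <= last_rank:
            raise ValueError(f"section {section_id} is duplicated or out of order")
        last_rank = RANK[section_id]


check_section_order([1, 3, 0, 5, 7, 10])   # passes silently
```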

5. The Type Section (ID = 1)

Contains function signatures (types). Structure:

┌────────────┬────────────────────────────────────────────┐
│ num_types  │ type[0], type[1], ..., type[num_types-1]   │
│ (uLEB128)  │                                            │
└────────────┴────────────────────────────────────────────┘

Each type:
┌────────┬────────────┬────────────────┬─────────────┬────────────────┐
│ 0x60   │ num_params │ param_types... │ num_results │ result_types...│
│ (func) │ (uLEB128)  │ (n bytes)      │ (uLEB128)   │ (m bytes)      │
└────────┴────────────┴────────────────┴─────────────┴────────────────┘

WASM Type Section Structure

Type encoding:

Byte Type
0x7F i32
0x7E i64
0x7D f32
0x7C f64
0x70 funcref
0x6F externref

Example: (func (param i32 i32) (result i32)) encodes as:

60        ; function type marker
02        ; 2 parameters
7F 7F     ; both are i32
01        ; 1 result
7F        ; i32
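
To make the layout concrete, here is a small Python sketch that decodes the six bytes above back into a signature. The helper names (`read_uleb128`, `read_type_vector`, `parse_func_type`) are illustrative, not taken from any library.

```python
VALUE_TYPES = {
    0x7F: "i32", 0x7E: "i64", 0x7D: "f32", 0x7C: "f64",
    0x70: "funcref", 0x6F: "externref",
}


def read_uleb128(data, pos):
    """Decode an unsigned LEB128 value; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result, pos
        shift += 7


def read_type_vector(data, pos):
    """Read a count followed by that many one-byte value types."""
    count, pos = read_uleb128(data, pos)
    names = []
    for _ in range(count):
        names.append(VALUE_TYPES[data[pos]])   # KeyError on an unknown type byte
        pos += 1
    return names, pos


def parse_func_type(data, pos=0):
    """Parse one function type; return ((params, results), next_pos)."""
    if data[pos] != 0x60:
        raise ValueError(f"expected 0x60 at offset {pos}, got 0x{data[pos]:02X}")
    params, pos = read_type_vector(data, pos + 1)
    results, pos = read_type_vector(data, pos)
    return (params, results), pos


(params, results), end = parse_func_type(bytes.fromhex("60 02 7F 7F 01 7F"))
print(f"({', '.join(params)}) -> ({', '.join(results)})")  # (i32, i32) -> (i32)
```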

6. The Function Section (ID = 3)

Maps each function in the code section to a type index:

┌────────────┬────────────────────────────────────────────┐
│ num_funcs  │ type_idx[0], type_idx[1], ...              │
│ (uLEB128)  │ (each is uLEB128)                          │
└────────────┴────────────────────────────────────────────┘

Why separate from Code?

  • Allows streaming compilation: know all types before seeing bodies
  • Enables forward references and mutual recursion

Function indexing: Imported functions come first. If you have 3 imports, your first local function is index 3.

7. The Import Section (ID = 2)

┌─────────────┬────────────────────────────────────────────┐
│ num_imports │ import[0], import[1], ...                  │
└─────────────┴────────────────────────────────────────────┘

Each import:
┌────────────────┬──────────────────┬─────────┬──────────────┐
│ module_name    │ import_name      │ kind    │ description  │
│ (string)       │ (string)         │ (byte)  │ (varies)     │
└────────────────┴──────────────────┴─────────┴──────────────┘

Strings: length (uLEB128) followed by UTF-8 bytes

Kind:
  0x00 = function (followed by type index)
  0x01 = table
  0x02 = memory
  0x03 = global

8. The Export Section (ID = 7)

Each export:
┌───────────────┬─────────┬───────────────┐
│ export_name   │ kind    │ index         │
│ (string)      │ (byte)  │ (uLEB128)     │
└───────────────┴─────────┴───────────────┘

Kind: same as imports (0x00=func, 0x01=table, 0x02=memory, 0x03=global)
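
As an illustration, the sketch below parses a hand-assembled Export section payload. It assumes, as with other sections, that the payload starts with an export count. All names in it are hypothetical.

```python
EXPORT_KINDS = {0x00: "func", 0x01: "table", 0x02: "memory", 0x03: "global"}


def read_uleb128(data, pos):
    """Decode an unsigned LEB128 value; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result, pos
        shift += 7


def parse_exports(payload):
    """Parse an Export section payload into (name, kind, index) tuples."""
    count, pos = read_uleb128(payload, 0)
    exports = []
    for _ in range(count):
        length, pos = read_uleb128(payload, pos)
        name = payload[pos:pos + length].decode("utf-8")
        pos += length
        kind = EXPORT_KINDS[payload[pos]]   # KeyError on an unknown kind byte
        pos += 1
        index, pos = read_uleb128(payload, pos)
        exports.append((name, kind, index))
    return exports


# Two exports: "add" -> func 0 and "memory" -> memory 0.
payload = bytes([2, 3]) + b"add" + bytes([0, 0, 6]) + b"memory" + bytes([2, 0])
for name, kind, index in parse_exports(payload):
    print(f'"{name}" : {kind} {index}')
```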

9. The Code Section (ID = 10)

The heart of WASM, function bodies:

┌────────────┬──────────────────────────────────────────────┐
│ num_funcs  │ func_body[0], func_body[1], ...              │
└────────────┴──────────────────────────────────────────────┘

Each func_body:
┌───────────────┬──────────────┬────────────────────────────┐
│ body_size     │ locals       │ instructions               │
│ (uLEB128)     │ (see below)  │ (sequence of opcodes)      │
└───────────────┴──────────────┴────────────────────────────┘

Locals:
┌────────────┬──────────────────────────────────────────────┐
│ num_groups │ (count, type) pairs                          │
└────────────┴──────────────────────────────────────────────┘

For example: 2 locals of type i32, 1 local of type i64:
  02          ; 2 groups
  02 7F       ; 2 × i32
  01 7E       ; 1 × i64
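
The grouped encoding is compact but a disassembler usually wants one entry per local. The following sketch (helper names are made up for this example) decodes the five bytes above and then expands the groups into a flat list.

```python
VALUE_TYPES = {0x7F: "i32", 0x7E: "i64", 0x7D: "f32", 0x7C: "f64"}


def read_uleb128(data, pos):
    """Decode an unsigned LEB128 value; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result, pos
        shift += 7


def parse_locals(data, pos=0):
    """Read local declarations; return ([(count, type_name), ...], next_pos)."""
    group_count, pos = read_uleb128(data, pos)
    groups = []
    for _ in range(group_count):
        count, pos = read_uleb128(data, pos)
        groups.append((count, VALUE_TYPES[data[pos]]))
        pos += 1
    return groups, pos


groups, end = parse_locals(bytes.fromhex("02 02 7F 01 7E"))
flat = [type_name for count, type_name in groups for _ in range(count)]
print(groups)  # [(2, 'i32'), (1, 'i64')]
print(flat)    # ['i32', 'i32', 'i64']
```

Keep in mind that local indices start after the parameters, so in a function with two parameters the first declared local is index 2.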

10. Instruction Encoding (Opcodes)

Instructions start with an opcode byte, sometimes followed by immediates. Every opcode in this table is a single byte:

Opcode Instruction Immediates
0x00 unreachable none
0x01 nop none
0x02 block block type
0x03 loop block type
0x04 if block type
0x05 else none
0x0B end none
0x0C br label index
0x0D br_if label index
0x10 call function index
0x20 local.get local index
0x21 local.set local index
0x22 local.tee local index
0x28 i32.load memarg
0x36 i32.store memarg
0x41 i32.const sLEB128 value
0x42 i64.const sLEB128 value
0x6A i32.add none
0x6B i32.sub none
0x6C i32.mul none

memarg: Memory operations take (align, offset) both as uLEB128.

block type: Either 0x40 (void) or a single value type (0x7F for i32, etc.). Modules that use multi-value blocks can also encode a type index here as a signed LEB128.
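
A table-driven decoder keeps the opcode list and the immediate handling in one place. The sketch below covers only the opcodes listed above and only single-byte block types (no type-index block types); `OPCODES`, `read_leb128`, and `disassemble` are names chosen for this example.

```python
VALUE_TYPES = {0x7F: "i32", 0x7E: "i64", 0x7D: "f32", 0x7C: "f64"}

# opcode -> (mnemonic, immediate kind); entries copied from the table above.
OPCODES = {
    0x00: ("unreachable", None), 0x01: ("nop", None),
    0x02: ("block", "blocktype"), 0x03: ("loop", "blocktype"),
    0x04: ("if", "blocktype"), 0x05: ("else", None), 0x0B: ("end", None),
    0x0C: ("br", "index"), 0x0D: ("br_if", "index"), 0x10: ("call", "index"),
    0x20: ("local.get", "index"), 0x21: ("local.set", "index"),
    0x22: ("local.tee", "index"),
    0x28: ("i32.load", "memarg"), 0x36: ("i32.store", "memarg"),
    0x41: ("i32.const", "signed"), 0x42: ("i64.const", "signed"),
    0x6A: ("i32.add", None), 0x6B: ("i32.sub", None), 0x6C: ("i32.mul", None),
}


def read_leb128(data, pos, signed=False):
    """Decode one LEB128 value starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]          # IndexError here means truncated input
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if (byte & 0x80) == 0:
            break
    if signed and (byte & 0x40):
        result -= 1 << shift      # sign-extend from the last payload bit
    return result, pos


def disassemble(code):
    """Return one text line per instruction in a function's code bytes."""
    lines = []
    pos = 0
    while pos < len(code):
        opcode = code[pos]
        pos += 1
        if opcode not in OPCODES:
            raise ValueError(f"unsupported opcode 0x{opcode:02X} at offset {pos - 1}")
        name, kind = OPCODES[opcode]
        if kind is None:
            text = name
        elif kind == "index":
            value, pos = read_leb128(code, pos)
            text = f"{name} {value}"
        elif kind == "signed":
            value, pos = read_leb128(code, pos, signed=True)
            text = f"{name} {value}"
        elif kind == "blocktype":
            block_type = code[pos]
            pos += 1
            if block_type == 0x40:
                text = name
            else:
                text = f"{name} (result {VALUE_TYPES[block_type]})"
        else:  # memarg: alignment then offset, both unsigned
            align, pos = read_leb128(code, pos)
            offset, pos = read_leb128(code, pos)
            text = f"{name} align={align} offset={offset}"
        lines.append(text)
    return lines


for line in disassemble(bytes.fromhex("20 00 20 01 6A 0B")):
    print(line)
```

Adding an instruction then means adding one dictionary entry, as long as its immediates fit one of the existing kinds.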

11. Module Anatomy Diagram

┌──────────────────────────────────────────────────────────────┐
│                        WASM Module                           │
├──────────────────────────────────────────────────────────────┤
│  Magic Number (4 bytes): 00 61 73 6D                         │
│  Version (4 bytes): 01 00 00 00                              │
├──────────────────────────────────────────────────────────────┤
│  Section 1 (Type):                                           │
│    ├── Number of types                                       │
│    └── Type definitions (function signatures)                │
├──────────────────────────────────────────────────────────────┤
│  Section 2 (Import):                                         │
│    ├── Number of imports                                     │
│    └── Import entries (module, name, kind, description)      │
├──────────────────────────────────────────────────────────────┤
│  Section 3 (Function):                                       │
│    ├── Number of functions                                   │
│    └── Type indices (which signature each function uses)     │
├──────────────────────────────────────────────────────────────┤
│  Section 5 (Memory):                                         │
│    └── Memory limits (initial, optional max)                 │
├──────────────────────────────────────────────────────────────┤
│  Section 7 (Export):                                         │
│    ├── Number of exports                                     │
│    └── Export entries (name, kind, index)                    │
├──────────────────────────────────────────────────────────────┤
│  Section 10 (Code):                                          │
│    ├── Number of function bodies                             │
│    └── Function bodies                                       │
│        ├── Body size                                         │
│        ├── Local declarations                                │
│        └── Instructions (ending with 0x0B = end)             │
└──────────────────────────────────────────────────────────────┘

WASM Module Anatomy


Project Specification

Deliverables

Build a command-line tool that parses .wasm files and prints their structure:

$ ./wasmparser example.wasm
WASM Module
  Version: 1

Section: Type (1)
  [0] (i32, i32) -> (i32)
  [1] (i32) -> ()

Section: Import (2)
  [0] "env"."print" : func type=1

Section: Function (3)
  [0] type=0
  [1] type=0

Section: Memory (5)
  [0] min=1, max=none

Section: Export (7)
  [0] "add" : func 1
  [1] "memory" : memory 0

Section: Code (10)
  [0] locals: 0
      20 00   local.get 0
      20 01   local.get 1
      6A      i32.add
      0B      end
  [1] locals: 1 × i32
      41 00   i32.const 0
      21 02   local.set 2
      ...

Functional Requirements

  1. Parse header: Validate magic number and version
  2. Parse all 12 section types (or at least Type, Import, Function, Memory, Export, Code)
  3. Decode LEB128: Both signed and unsigned variants
  4. Parse type section: Show all function signatures
  5. Parse import section: Show module, name, kind, and type
  6. Parse export section: Show name, kind, and index
  7. Parse code section: Disassemble to readable instructions
  8. Handle errors gracefully: Invalid files shouldn't crash

Minimum Viable Product

Start with:

  1. Header parsing
  2. Section enumeration (just ID and size)
  3. Type section parsing
  4. Code section basic disassembly

Then expand to remaining sections.

Success Criteria

  • Parses official WASM test suite files without crashing
  • Correctly decodes LEB128 values (test with edge cases)
  • Output matches wasm-objdump -x for section structure
  • Disassembly matches wasm2wat output for simple modules
  • Handles empty sections and missing optional sections

Solution Architecture

High-Level Design

┌─────────────────────────────────────────────────────────────┐
│                      wasmparser                             │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌──────────────┐    ┌──────────────┐    ┌───────────────┐  │
│  │    Reader    │───▶│    Parser    │───▶│    Printer    │  │
│  │ (byte stream)│    │(decode logic)│    │(format output)│  │
│  └──────────────┘    └──────────────┘    └───────────────┘  │
│                                                             │
│  Reader provides:         Parser builds:    Printer shows:  │
│  - read_byte()            - Module struct   - Human text    │
│  - read_bytes(n)          - Section list    - Or JSON       │
│  - read_uleb128()         - Types list      - Or other      │
│  - read_sleb128()         - Functions                       │
│  - position               - etc.                            │
│                                                             │
└─────────────────────────────────────────────────────────────┘

WASM Parser Architecture

Data Structures

Module {
  version: u32
  sections: Vec<Section>
}

Section {
  id: u8
  size: u32
  data: SectionData  // enum or variant type
}

SectionData =
  | TypeSection { types: Vec<FuncType> }
  | ImportSection { imports: Vec<Import> }
  | FunctionSection { type_indices: Vec<u32> }
  | MemorySection { memories: Vec<MemoryType> }
  | ExportSection { exports: Vec<Export> }
  | CodeSection { bodies: Vec<FuncBody> }
  | CustomSection { name: String, bytes: Vec<u8> }
  | ...

FuncType {
  params: Vec<ValueType>
  results: Vec<ValueType>
}

Import {
  module_name: String
  import_name: String
  kind: ImportKind
}

FuncBody {
  locals: Vec<(u32, ValueType)>  // (count, type) pairs
  instructions: Vec<Instruction>
}

Instruction {
  opcode: u8
  immediates: Immediates  // varies by instruction
}

Reader Interface

trait WasmReader {
  fn read_byte(&mut self) -> Result<u8>
  fn read_bytes(&mut self, n: usize) -> Result<Vec<u8>>
  fn read_u32_le(&mut self) -> Result<u32>  // for header
  fn read_uleb128(&mut self) -> Result<u64>
  fn read_sleb128(&mut self) -> Result<i64>
  fn read_string(&mut self) -> Result<String>  // length-prefixed
  fn position(&self) -> usize
  fn remaining(&self) -> usize
}
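
For readers working in Python rather than Rust, the same interface could look roughly like the class below. It is a sketch over an in-memory buffer; the class name `Reader` and the choice of `EOFError` for truncated input are assumptions made for this example.

```python
class Reader:
    """Cursor over an in-memory byte buffer (hypothetical helper)."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0

    def position(self) -> int:
        return self.pos

    def remaining(self) -> int:
        return len(self.data) - self.pos

    def read_bytes(self, n: int) -> bytes:
        if n > self.remaining():
            raise EOFError(f"need {n} bytes at offset 0x{self.pos:X}")
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk

    def read_byte(self) -> int:
        return self.read_bytes(1)[0]

    def read_u32_le(self) -> int:
        return int.from_bytes(self.read_bytes(4), "little")

    def read_uleb128(self) -> int:
        result = 0
        shift = 0
        while True:
            byte = self.read_byte()
            result |= (byte & 0x7F) << shift
            if (byte & 0x80) == 0:
                return result
            shift += 7

    def read_sleb128(self) -> int:
        result = 0
        shift = 0
        while True:
            byte = self.read_byte()
            result |= (byte & 0x7F) << shift
            shift += 7
            if (byte & 0x80) == 0:
                break
        if byte & 0x40:               # sign bit of the final byte
            result -= 1 << shift
        return result

    def read_string(self) -> str:
        length = self.read_uleb128()
        return self.read_bytes(length).decode("utf-8")


reader = Reader(b"\x00asm\x01\x00\x00\x00\x03add\x7f")
magic = reader.read_bytes(4)
version = reader.read_u32_le()
name = reader.read_string()
last = reader.read_sleb128()
print(magic, version, name, last, reader.remaining())
```

Routing every read through `read_bytes` means a single bounds check protects all the other methods.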

Parsing Flow

parse_module():
  1. Read and validate magic number
  2. Read and store version
  3. While bytes remaining:
     a. Read section id (1 byte)
     b. Read section size (uLEB128)
     c. Read section_size bytes
     d. Dispatch to section parser based on id
     e. Verify exactly section_size bytes were consumed

parse_type_section(bytes):
  1. Read number of types
  2. For each type:
     a. Read 0x60 (func type marker)
     b. Read param count, then that many type bytes
     c. Read result count, then that many type bytes

parse_code_section(bytes):
  1. Read number of function bodies
  2. For each body:
     a. Read body size
     b. Read locals (count, then groups of (count, type))
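     c. Read instructions until body size bytes are consumed; the final byte must be 0x0B (end). Nested blocks contain their own 0x0B, so the first end byte is not a reliable stop signal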

parse_instruction():
  1. Read opcode byte
  2. Based on opcode, read immediates:
     - 0x41 (i32.const): read sLEB128
     - 0x20 (local.get): read uLEB128 (index)
     - 0x02 (block): read block type
     - etc.

Implementation Guide

Phase 1: File Reading and Header (Day 1)

Goal: Read a .wasm file and validate the header

Steps:

  1. Read entire file into a byte buffer
  2. Check first 4 bytes are 00 61 73 6D
  3. Read next 4 bytes as little-endian u32 (should be 1)
  4. Print "Valid WASM module, version X"

Hint: Little-endian u32 from bytes b0 b1 b2 b3:

value = b0 | (b1 << 8) | (b2 << 16) | (b3 << 24)
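
In Python the shift-and-OR formula is available through the standard `struct` module. The sketch below validates the 8-byte header; `parse_header` and `WASM_MAGIC` are illustrative names.

```python
import struct

WASM_MAGIC = b"\x00asm"


def parse_header(data: bytes) -> int:
    """Validate the magic number and return the version field."""
    if len(data) < 8:
        raise ValueError("file too short for a WASM header")
    if data[:4] != WASM_MAGIC:
        raise ValueError(f"bad magic number: {data[:4].hex(' ')}")
    (version,) = struct.unpack_from("<I", data, 4)   # little-endian u32 at offset 4
    return version


minimal = bytes([0x00, 0x61, 0x73, 0x6D, 0x01, 0x00, 0x00, 0x00])
print("Valid WASM module, version", parse_header(minimal))
```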

Testing: Create a minimal WAT file, compile it, parse it:

minimal.wat:

(module)

Compile and parse:

wat2wasm minimal.wat -o minimal.wasm
./wasmparser minimal.wasm

Phase 2: LEB128 Decoding (Day 1-2)

Goal: Implement robust LEB128 decoding

Unsigned LEB128 algorithm:

result = 0
shift = 0
while true:
  byte = read_byte()
  result |= (byte & 0x7F) << shift
  if (byte & 0x80) == 0:
    break
  shift += 7
return result

Signed LEB128 addition:

// Use the same loop, but do shift += 7 before the high-bit test so the
// final byte is counted. After the loop, if sign extension is needed:
if shift < 64 and (byte & 0x40) != 0:
  result |= -(1 << shift)  // Sign extend

Test cases:

Bytes            Unsigned   Signed
0x00             0          0
0x7F             127        -1
0x80 0x01        128        128
0xFF 0x7F        16383      -1
0xE5 0x8E 0x26   624485     624485
0x9B 0xF1 0x59   -          -624485

Phase 3: Section Enumeration (Day 2)

Goal: List all sections without parsing contents

Algorithm:

while position < file_length:
  section_id = read_byte()
  section_size = read_uleb128()
  print("Section", section_id, "size", section_size)
  skip(section_size)  // advance position
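
A runnable Python version of this loop might look like the following. The test module is assembled by hand so the example does not depend on wat2wasm; `list_sections` and `SECTION_NAMES` are names invented for this sketch.

```python
SECTION_NAMES = {
    0: "Custom", 1: "Type", 2: "Import", 3: "Function", 4: "Table",
    5: "Memory", 6: "Global", 7: "Export", 8: "Start", 9: "Element",
    10: "Code", 11: "Data", 12: "Data Count",
}


def read_uleb128(data, pos):
    """Decode an unsigned LEB128 value; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result, pos
        shift += 7


def list_sections(data):
    """Return (section_id, offset_of_id_byte, payload_size) for each section."""
    pos = 8                                   # skip magic number and version
    sections = []
    while pos < len(data):
        start = pos
        section_id = data[pos]
        pos += 1
        size, pos = read_uleb128(data, pos)
        if pos + size > len(data):
            raise ValueError(f"section at 0x{start:X} runs past end of file")
        sections.append((section_id, start, size))
        pos += size                           # skip the payload
    return sections


module = (
    b"\x00asm\x01\x00\x00\x00"
    + bytes.fromhex("01 04 01 60 00 00")      # Type: one signature, () -> ()
    + bytes.fromhex("03 02 01 00")            # Function: one function of type 0
    + bytes.fromhex("0A 04 01 02 00 0B")      # Code: one body, no locals, just end
)
for section_id, offset, size in list_sections(module):
    name = SECTION_NAMES.get(section_id, "?")
    print(f"Section {name} ({section_id}) at 0x{offset:02X}, size {size}")
```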

Testing: Parse a compiled WAT with multiple elements:

(module
  (memory 1)
  (func (export "test") (param i32) (result i32)
    local.get 0
  )
)

Expected sections: Type, Function, Memory, Export, Code

Phase 4: Type Section Parser (Day 3)

Goal: Parse and display function signatures

Hint for structure:

Type section:
  count (uLEB128)
  for each:
    0x60 (function type marker)
    param_count (uLEB128)
    param_types[param_count] (one byte each)
    result_count (uLEB128)
    result_types[result_count] (one byte each)

Type byte decoding:

0x7F → "i32"
0x7E → "i64"
0x7D → "f32"
0x7C → "f64"

Output format:

Type Section (1):
  [0] (i32, i32) -> (i32)
  [1] () -> ()

Phase 5: Import and Export Sections (Day 4)

Goal: Parse imports and exports

String reading: Length-prefixed UTF-8

length = read_uleb128()
bytes = read_bytes(length)
string = utf8_decode(bytes)

Import kind dispatch:

kind = read_byte()
match kind:
  0x00 (function): type_index = read_uleb128()
  0x01 (table): parse table type
  0x02 (memory): parse memory limits
  0x03 (global): parse global type
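
Putting string reading and kind dispatch together, an import parser could be sketched as follows. The limits and global-type layouts used here (a flag byte of 0x00 or 0x01 followed by a minimum and optional maximum; a value type byte followed by a mutability byte) follow the core specification, and other flag values are rejected rather than interpreted. Function and variable names are illustrative.

```python
VALUE_TYPES = {0x7F: "i32", 0x7E: "i64", 0x7D: "f32", 0x7C: "f64"}
REF_TYPES = {0x70: "funcref", 0x6F: "externref"}


def read_uleb128(data, pos):
    """Decode an unsigned LEB128 value; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result, pos
        shift += 7


def read_name(data, pos):
    """Read a length-prefixed UTF-8 string; return (text, next_pos)."""
    length, pos = read_uleb128(data, pos)
    if pos + length > len(data):
        raise ValueError("string runs past end of data")
    return data[pos:pos + length].decode("utf-8"), pos + length


def read_limits(data, pos):
    """Read limits; return (description, next_pos)."""
    flag = data[pos]
    minimum, pos = read_uleb128(data, pos + 1)
    if flag == 0x00:
        return f"min={minimum} max=none", pos
    if flag == 0x01:
        maximum, pos = read_uleb128(data, pos)
        return f"min={minimum} max={maximum}", pos
    raise ValueError(f"unsupported limits flag 0x{flag:02X}")


def parse_imports(payload):
    """Parse an Import section payload into (module, name, description) tuples."""
    count, pos = read_uleb128(payload, 0)
    imports = []
    for _ in range(count):
        module, pos = read_name(payload, pos)
        name, pos = read_name(payload, pos)
        kind = payload[pos]
        pos += 1
        if kind == 0x00:
            type_index, pos = read_uleb128(payload, pos)
            desc = f"func type={type_index}"
        elif kind == 0x01:
            ref_type = REF_TYPES[payload[pos]]
            limits, pos = read_limits(payload, pos + 1)
            desc = f"table {ref_type} {limits}"
        elif kind == 0x02:
            limits, pos = read_limits(payload, pos)
            desc = f"memory {limits}"
        elif kind == 0x03:
            value_type = VALUE_TYPES[payload[pos]]
            mutable = payload[pos + 1] == 0x01
            pos += 2
            desc = f"global {'mut ' if mutable else ''}{value_type}"
        else:
            raise ValueError(f"unknown import kind 0x{kind:02X}")
        imports.append((module, name, desc))
    return imports


payload = (
    bytes([2])
    + bytes([3]) + b"env" + bytes([3]) + b"log" + bytes([0x00, 0])
    + bytes([3]) + b"env" + bytes([6]) + b"memory" + bytes([0x02, 0x00, 1])
)
for module, name, desc in parse_imports(payload):
    print(f'"{module}"."{name}" : {desc}')
```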

Export format:

name = read_string()
kind = read_byte()
index = read_uleb128()

Phase 6: Code Section and Disassembly (Day 5-7)

Goal: Parse function bodies and print instructions

Function body structure:

body_size = read_uleb128()
// Now read exactly body_size bytes

locals_count = read_uleb128()
for locals_count times:
  count = read_uleb128()  // how many of this type
  type = read_byte()      // what type

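// The rest of the body is instructions. Nested block/loop/if each carry
// their own 0x0B, and immediates can contain that byte too, so stop at the
// body boundary instead of at the first 0x0B.
body_end = body_start + body_size  // body_start = position right after body_size
while position < body_end:
  parse_instruction()
// The last instruction parsed must be 0x0B (end)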
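
Because a nested block ends with its own 0x0B byte, a body parser is safer using body_size as the boundary. The sketch below splits a Code section payload into per-function local groups and raw instruction bytes; the function names and the returned tuple shape are assumptions made for illustration.

```python
def read_uleb128(data, pos):
    """Decode an unsigned LEB128 value; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result, pos
        shift += 7


def parse_code_section(payload):
    """Return a list of (local_groups, instruction_bytes), one per function body."""
    count, pos = read_uleb128(payload, 0)
    bodies = []
    for _ in range(count):
        body_size, pos = read_uleb128(payload, pos)
        body_end = pos + body_size            # everything up to here belongs to this body
        if body_end > len(payload):
            raise ValueError("function body runs past end of section")
        group_count, pos = read_uleb128(payload, pos)
        local_groups = []
        for _ in range(group_count):
            n, pos = read_uleb128(payload, pos)
            local_groups.append((n, payload[pos]))   # (count, raw type byte)
            pos += 1
        code = payload[pos:body_end]
        if not code or code[-1] != 0x0B:
            raise ValueError("function body must end with 0x0B (end)")
        bodies.append((local_groups, code))
        pos = body_end
    return bodies


# One body of 7 bytes: no locals, then local.get 0, local.get 1, i32.add, end.
payload = bytes.fromhex("01 07 00 20 00 20 01 6A 0B")
for local_groups, code in parse_code_section(payload):
    print(local_groups, code.hex(" "))
```

The instruction bytes can then be handed to a disassembler that loops until the slice is exhausted.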

Instruction parsing (start with these):

Opcode Name Immediates
0x00 unreachable -
0x01 nop -
0x0B end -
0x10 call func_idx (uLEB128)
0x20 local.get local_idx (uLEB128)
0x21 local.set local_idx (uLEB128)
0x41 i32.const value (sLEB128)
0x6A i32.add -
0x6B i32.sub -
0x6C i32.mul -

Output format:

Code Section (10):
  [0] 7 bytes, locals: 0
      20 00     local.get 0
      20 01     local.get 1
      6A        i32.add
      0B        end

Phase 7: Full Integration (Day 8+)

Goal: Handle all sections, robust error handling

  1. Add remaining sections (Table, Memory limits, Global, Element, Data)
  2. Add control flow instructions (block, loop, if, br, br_if)
  3. Add memory instructions with memarg parsing
  4. Improve error messages with byte offsets
  5. Add validation (check section ordering, size consistency)

Testing Strategy

Unit Tests for LEB128

// Unsigned
assert(parse_uleb128([0x00]) == 0)
assert(parse_uleb128([0x7F]) == 127)
assert(parse_uleb128([0x80, 0x01]) == 128)
assert(parse_uleb128([0xE5, 0x8E, 0x26]) == 624485)

// Signed
assert(parse_sleb128([0x00]) == 0)
assert(parse_sleb128([0x7F]) == -1)
assert(parse_sleb128([0x80, 0x01]) == 128)
assert(parse_sleb128([0x9B, 0xF1, 0x59]) == -624485)

Integration Tests

Create WAT files, compile them, parse them, verify output:

# Create test cases
echo '(module)' > empty.wat
echo '(module (func (export "f")))' > func.wat
echo '(module (func (param i32) (result i32) local.get 0))' > identity.wat

# Compile
wat2wasm empty.wat
wat2wasm func.wat
wat2wasm identity.wat

# Parse and verify
./wasmparser empty.wasm
./wasmparser func.wasm
./wasmparser identity.wasm

Comparison Testing

Compare your output to wasm-objdump:

wasm-objdump -x example.wasm > expected.txt
./wasmparser example.wasm > actual.txt
diff expected.txt actual.txt

Fuzzing (Advanced)

Use the official WASM test suite:

git clone https://github.com/WebAssembly/testsuite
# The suite ships .wast scripts; wabt's wast2json extracts the .wasm modules they contain
for f in *.wasm; do
  ./wasmparser "$f" || echo "Failed: $f"
done

Common Pitfalls and Debugging

Pitfall 1: LEB128 Off-by-One

Symptom: Values slightly wrong, especially around powers of 2
Cause: Shift calculation or termination condition wrong
Fix: Test with values at boundaries: 0, 127, 128, 255, 256, 16383, 16384

Pitfall 2: Forgetting Section Size Constraint

Symptom: Parsing goes off the rails after first section
Cause: Not tracking bytes consumed within section
Fix: Record position before section, verify position == start + size after

Pitfall 3: Signed vs Unsigned Confusion

Symptom: Negative numbers look huge positive
Cause: Using uLEB128 where sLEB128 needed (e.g., i32.const)
Fix: i32.const and i64.const use signed; indices use unsigned

Pitfall 4: String Encoding

Symptom: Import/export names garbled
Cause: Not reading length prefix, or encoding issues
Fix: Strings are length-prefixed (uLEB128) followed by raw UTF-8 bytes

Pitfall 5: Nested Control Structures

Symptom: Disassembly wrong for if/block/loop
Cause: Not tracking nesting depth
Fix: block/loop/if push depth, end pops depth. Track for proper indentation.
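
One way to track depth is a simple counter applied to already-decoded mnemonics. In this sketch (`indent_instructions` is a made-up name) `else` is printed at the level of its `if`, and the function's final `end` is clamped so the depth never goes negative.

```python
def indent_instructions(mnemonics):
    """Indent decoded instructions according to block nesting depth."""
    depth = 0
    lines = []
    for text in mnemonics:
        op = text.split()[0]
        if op in ("end", "else"):
            depth = max(depth - 1, 0)      # closes the current block (or its 'then' arm)
        lines.append("  " * depth + text)
        if op in ("block", "loop", "if", "else"):
            depth += 1                     # following instructions are one level deeper
    return lines


body = ["block", "local.get 0", "if", "nop", "else", "unreachable", "end", "end", "end"]
print("\n".join(indent_instructions(body)))
```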

Debugging with xxd

xxd example.wasm | head -20

Shows hex dump. Match against your parser's position to find issues.

Debugging with wasm-objdump

wasm-objdump -h example.wasm  # Headers and section list
wasm-objdump -x example.wasm  # Full details
wasm-objdump -d example.wasm  # Disassembly

Compare output to find where your parser diverges.


Extensions and Challenges

Extension 1: JSON Output

Add --json flag for machine-readable output. Useful for tooling.

Extension 2: Hex Dump Mode

Show raw bytes alongside disassembly:

000010: 20 00        local.get 0
000012: 20 01        local.get 1
000014: 6A           i32.add
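
The layout above can be produced with a single format string. This is only a sketch: the column width of 13 characters for the byte field is inferred from the sample lines, and `format_line` is a hypothetical helper.

```python
def format_line(offset: int, raw: bytes, text: str) -> str:
    """Format one disassembly line as 'OFFSET: BYTES   MNEMONIC'."""
    hex_bytes = " ".join(f"{b:02X}" for b in raw)
    return f"{offset:06X}: {hex_bytes:<13}{text}"


print(format_line(0x10, bytes([0x20, 0x00]), "local.get 0"))
print(format_line(0x12, bytes([0x20, 0x01]), "local.get 1"))
print(format_line(0x14, bytes([0x6A]), "i32.add"))
```

Instructions with long immediates will overflow the byte column; a real tool would either widen it or wrap the bytes onto a second line.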

Extension 3: Validation

Check module validity:

  • Sections in order?
  • Type indices in range?
  • Function and Code section counts match?
  • Stack types balance?

Extension 4: Round-Trip

Parse WASM, emit WAT:

./wasmparser --emit-wat example.wasm > roundtrip.wat
wat2wasm roundtrip.wat -o roundtrip.wasm
diff example.wasm roundtrip.wasm  # Should be identical (or semantically equivalent)

Extension 5: Custom Section Parser

Parse the "name" custom section to show function and local names:

Function names:
  [0] "add"
  [1] "multiply"

Real-World Connections

How Browsers Parse WASM

Browsers like Chrome and Firefox have highly optimized WASM parsers:

  • Streaming compilation: Begin compiling while downloading
  • Parallel compilation: Function bodies are compiled on multiple threads
  • Memory mapping: Large files mapped directly, not copied

Your parser teaches the same concepts, just without these optimizations.

How wabt Works

The official wasm2wat tool does exactly what you're building:

  1. Parse binary with LEB128 decoding
  2. Build in-memory representation
  3. Pretty-print as WAT

Study wabt's source code to see production patterns.

Compiler Output Analysis

Compilers like Emscripten and Rust/wasm32 produce WASM. Your parser lets you:

  • Understand what code patterns compile to
  • Debug ABI issues
  • Optimize output by understanding size costs

Security Research

WASM parsers are security-critical. Malformed WASM has been used in exploits:

  • Buffer overflows from bad section sizes
  • Integer overflows in LEB128
  • Memory corruption from invalid instructions

Your parser teaches you to think about these boundaries.


Real-World Outcome

When your parser is complete, you will have a tool that produces detailed, professional output for any WebAssembly binary. Here is exactly what running your parser should produce:

Example 1: Simple Add Function

Input WAT file (add.wat):

(module
  (func $add (export "add") (param i32 i32) (result i32)
    local.get 0
    local.get 1
    i32.add
  )
  (memory (export "memory") 1)
)

Command and Output:

$ wat2wasm add.wat -o add.wasm
$ ./wasmparser add.wasm

================================================================================
                           WASM BINARY PARSER v1.0
================================================================================

File: add.wasm
Size: 55 bytes

--------------------------------------------------------------------------------
HEADER ANALYSIS
--------------------------------------------------------------------------------
Offset    Bytes           Description
------    -----           -----------
0x00      00 61 73 6D     Magic number: "\0asm" [VALID]
0x04      01 00 00 00     Version: 1 (little-endian u32)

--------------------------------------------------------------------------------
SECTION MAP
--------------------------------------------------------------------------------
ID   Name       Offset    Size      Description
--   ----       ------    ----      -----------
1    Type       0x08      7         Function signatures
3    Function   0x11      2         Function-to-type mappings
5    Memory     0x15      3         Linear memory declarations
7    Export     0x1A      16        Exported names
10   Code       0x2C      9         Function bodies

--------------------------------------------------------------------------------
SECTION 1: TYPE
--------------------------------------------------------------------------------
Count: 1 function type(s)

  [0] func (i32, i32) -> (i32)
      Encoding: 60 02 7F 7F 01 7F
                ^  ^  ^--^  ^  ^
                |  |   |    |  +-- result: i32
                |  |   |    +-- 1 result
                |  |   +-- params: i32, i32
                |  +-- 2 params
                +-- func type marker

--------------------------------------------------------------------------------
SECTION 3: FUNCTION
--------------------------------------------------------------------------------
Count: 1 function(s)

  [0] type_idx=0 -> (i32, i32) -> (i32)

--------------------------------------------------------------------------------
SECTION 5: MEMORY
--------------------------------------------------------------------------------
Count: 1 memory declaration(s)

  [0] limits: min=1 page (64KB), max=unlimited
      Encoding: 00 01
                ^  ^
                |  +-- minimum pages
                +-- flags (no maximum)

--------------------------------------------------------------------------------
SECTION 7: EXPORT
--------------------------------------------------------------------------------
Count: 2 export(s)

  [0] "add" -> func[0]
      Kind: 0x00 (function)
      Index: 0

  [1] "memory" -> memory[0]
      Kind: 0x02 (memory)
      Index: 0

--------------------------------------------------------------------------------
SECTION 10: CODE
--------------------------------------------------------------------------------
Count: 1 function body(ies)

  [0] Function $add
      Body size: 7 bytes
      Locals: (none)

      Disassembly:
      ~~~~~~~~~~~~
      Offset    Bytes    Instruction       Stack Effect
      ------    -----    -----------       ------------
      0x00      20 00    local.get 0       [] -> [i32]
      0x02      20 01    local.get 1       [i32] -> [i32, i32]
      0x04      6A       i32.add           [i32, i32] -> [i32]
      0x05      0B       end               [i32] -> [i32]

================================================================================
SUMMARY
================================================================================
  Total sections: 5
  Function types: 1
  Functions: 1 (0 imported, 1 defined)
  Exports: 2
  Memory: 1 page minimum
  Validation: PASSED

Parse completed in 0.3ms
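
The Type section entry shown above (60 02 7F 7F 01 7F) can be decoded with very little code. The following is a minimal sketch, not the reference decoder: the names ValType, FuncType, and decode_func_type are hypothetical, only the four numeric value types are handled, and vector counts are assumed to fit in a single LEB128 byte.

```rust
/// The four numeric value types and their binary tags.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ValType {
    I32,
    I64,
    F32,
    F64,
}

#[derive(Debug, PartialEq, Eq)]
struct FuncType {
    params: Vec<ValType>,
    results: Vec<ValType>,
}

fn val_type(byte: u8) -> Option<ValType> {
    match byte {
        0x7F => Some(ValType::I32),
        0x7E => Some(ValType::I64),
        0x7D => Some(ValType::F32),
        0x7C => Some(ValType::F64),
        _ => None,
    }
}

/// Reads a count byte followed by that many value-type bytes.
/// Assumption: the count fits in one byte (high bit clear).
fn read_val_types(bytes: &[u8], pos: &mut usize) -> Option<Vec<ValType>> {
    let count = *bytes.get(*pos)?;
    if count & 0x80 != 0 {
        return None; // multi-byte counts are out of scope for this sketch
    }
    *pos += 1;
    let mut out = Vec::with_capacity(count as usize);
    for _ in 0..count {
        out.push(val_type(*bytes.get(*pos)?)?);
        *pos += 1;
    }
    Some(out)
}

/// Decodes one function type; returns it with the number of bytes consumed.
fn decode_func_type(bytes: &[u8]) -> Option<(FuncType, usize)> {
    if *bytes.first()? != 0x60 {
        return None; // 0x60 is the function type marker
    }
    let mut pos = 1;
    let params = read_val_types(bytes, &mut pos)?;
    let results = read_val_types(bytes, &mut pos)?;
    Some((FuncType { params, results }, pos))
}
```

Feeding it the six bytes from the example yields two i32 parameters, one i32 result, and a consumed length of 6.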

Example 2: Module with Imports

Input WAT file (with_imports.wat):

(module
  (import "env" "log" (func $log (param i32)))
  (import "env" "memory" (memory 1))

  (func $greet (export "greet") (param i32)
    (local i32)
    local.get 0
    i32.const 100
    i32.add
    local.set 1
    local.get 1
    call $log
  )
)

Command and Output:

$ wat2wasm with_imports.wat -o with_imports.wasm
$ ./wasmparser with_imports.wasm --verbose

================================================================================
                           WASM BINARY PARSER v1.0
================================================================================

File: with_imports.wasm
Size: 77 bytes

--------------------------------------------------------------------------------
HEADER ANALYSIS
--------------------------------------------------------------------------------
Offset    Bytes           Description
------    -----           -----------
0x00      00 61 73 6D     Magic number: "\0asm" [VALID]
0x04      01 00 00 00     Version: 1 (little-endian u32)

--------------------------------------------------------------------------------
SECTION 1: TYPE
--------------------------------------------------------------------------------
Count: 1 function type(s)

  [0] func (i32) -> ()
      Used by: import "env"."log", function $greet
      Note: wat2wasm reuses an existing identical signature, so both share this entry

--------------------------------------------------------------------------------
SECTION 2: IMPORT
--------------------------------------------------------------------------------
Count: 2 import(s)

  [0] "env"."log" : func
      Type index: 0
      Signature: (i32) -> ()
      Assigned function index: 0

      Encoding breakdown:
      03 65 6E 76    ; module name length=3, "env"
      03 6C 6F 67    ; import name length=3, "log"
      00             ; kind: function
      00             ; type index: 0

  [1] "env"."memory" : memory
      Limits: min=1 page, max=none
      Assigned memory index: 0

--------------------------------------------------------------------------------
SECTION 3: FUNCTION
--------------------------------------------------------------------------------
Count: 1 function(s)

  [0] type_idx=0 -> (i32) -> ()
      Note: This is function index 1 (after 1 imported function)

--------------------------------------------------------------------------------
SECTION 7: EXPORT
--------------------------------------------------------------------------------
Count: 1 export(s)

  [0] "greet" -> func[1]

--------------------------------------------------------------------------------
SECTION 10: CODE
--------------------------------------------------------------------------------
Count: 1 function body(ies)

  [0] Function index 1 (export: "greet")
      Body size: 16 bytes (3 bytes of local declarations + 13 bytes of instructions)
      Locals: 1 group(s)
        - 1 x i32 (local index: 1, after 1 param)

      Disassembly:
      ~~~~~~~~~~~~
      Offset    Bytes       Instruction       Immediate       Stack
      ------    -----       -----------       ---------       -----
      0x00      20 00       local.get         idx=0           [] -> [i32]
      0x02      41 E4 00    i32.const         100             [i32] -> [i32,i32]
                                              (LEB128: E4 00 = 100)
      0x05      6A          i32.add                           [i32,i32] -> [i32]
      0x06      21 01       local.set         idx=1           [i32] -> []
      0x08      20 01       local.get         idx=1           [] -> [i32]
      0x0A      10 00       call              func_idx=0      [i32] -> []
                                              (calls: "env"."log")
      0x0C      0B          end                               [] -> []

================================================================================
INDEX SPACES (after all imports processed)
================================================================================
  Function indices:
    [0] import "env"."log" : (i32) -> ()
    [1] $greet : (i32) -> () [exported as "greet"]

  Memory indices:
    [0] import "env"."memory" : 1 page min

================================================================================
Parse completed in 0.4ms
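
The index-space listing above follows one rule: imported functions take the lowest indices, and defined functions continue from there. A minimal sketch of that lookup is below. FuncRef and resolve_func are hypothetical names, and the imports slice is assumed to contain only function imports (memory, table, and global imports live in their own index spaces).

```rust
/// One entry in the function index space.
#[derive(Debug, PartialEq, Eq)]
enum FuncRef<'a> {
    /// Supplied by the host; identified by module and field name.
    Imported { module: &'a str, name: &'a str },
    /// Defined in this module; `code_index` selects the Code section body.
    Defined { code_index: usize },
}

/// Maps a function index (as used by `call` or an export) to its origin.
/// `imports` lists function imports in declaration order.
fn resolve_func<'a>(
    imports: &[(&'a str, &'a str)],
    defined_count: usize,
    index: usize,
) -> Option<FuncRef<'a>> {
    if let Some(&(module, name)) = imports.get(index) {
        return Some(FuncRef::Imported { module, name });
    }
    // `index` is at least `imports.len()` here, so this cannot underflow.
    let code_index = index - imports.len();
    if code_index < defined_count {
        Some(FuncRef::Defined { code_index })
    } else {
        None // out of range: a validator would reject this index
    }
}
```

With one import ("env", "log") and one defined function, index 0 resolves to the import and index 1 resolves to code body 0, which matches the `call func_idx=0` and `"greet" -> func[1]` lines in the output.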

Example 3: Hex Dump Mode

$ ./wasmparser add.wasm --hex

WASM Binary Hex Dump: add.wasm
==============================

Header:
  00000000: 00 61 73 6D 01 00 00 00                          .asm....

Section 1 (Type) - 7 bytes:
  00000008: 01 07 01 60 02 7F 7F 01 7F                       ...`.....

Section 3 (Function) - 2 bytes:
  00000011: 03 02 01 00                                      ....

Section 5 (Memory) - 3 bytes:
  00000015: 05 03 01 00 01                                   .....

Section 7 (Export) - 16 bytes:
  0000001A: 07 10 02 03 61 64 64 00 00 06 6D 65 6D 6F 72 79  ....add...memory
  0000002A: 02 00                                            ..

Section 10 (Code) - 9 bytes:
  0000002C: 0A 09 01 07 00 20 00 20 01 6A 0B                 ..... . .j.

Example 4: JSON Output Mode

$ ./wasmparser add.wasm --json | jq .

{
  "magic": "00 61 73 6D",
  "version": 1,
  "sections": [
    {
      "id": 1,
      "name": "type",
      "offset": 8,
      "size": 7,
      "types": [
        {
          "index": 0,
          "params": ["i32", "i32"],
          "results": ["i32"]
        }
      ]
    },
    {
      "id": 3,
      "name": "function",
      "offset": 17,
      "size": 2,
      "functions": [
        {"index": 0, "type_index": 0}
      ]
    },
    {
      "id": 5,
      "name": "memory",
      "offset": 21,
      "size": 3,
      "memories": [
        {"index": 0, "min": 1, "max": null}
      ]
    },
    {
      "id": 7,
      "name": "export",
      "offset": 26,
      "size": 16,
      "exports": [
        {"name": "add", "kind": "function", "index": 0},
        {"name": "memory", "kind": "memory", "index": 0}
      ]
    },
    {
      "id": 10,
      "name": "code",
      "offset": 44,
      "size": 9,
      "bodies": [
        {
          "index": 0,
          "size": 7,
          "locals": [],
          "instructions": [
            {"offset": 0, "opcode": "0x20", "mnemonic": "local.get", "immediate": 0},
            {"offset": 2, "opcode": "0x20", "mnemonic": "local.get", "immediate": 1},
            {"offset": 4, "opcode": "0x6A", "mnemonic": "i32.add"},
            {"offset": 5, "opcode": "0x0B", "mnemonic": "end"}
          ]
        }
      ]
    }
  ],
  "summary": {
    "total_sections": 5,
    "function_count": 1,
    "import_count": 0,
    "export_count": 2,
    "valid": true
  }
}

Example 5: Error Handling

$ ./wasmparser corrupted.wasm

================================================================================
                           WASM BINARY PARSER v1.0
================================================================================

File: corrupted.wasm
Size: 25 bytes

--------------------------------------------------------------------------------
HEADER ANALYSIS
--------------------------------------------------------------------------------
Offset    Bytes           Description
------    -----           -----------
0x00      00 61 73 6D     Magic number: "\0asm" [VALID]
0x04      01 00 00 00     Version: 1 (little-endian u32)

--------------------------------------------------------------------------------
PARSE ERROR
--------------------------------------------------------------------------------
Error: Invalid section ordering at offset 0x10
  Expected: Section ID > 3
  Found: Section ID 1 (Type)

Context: Section 3 (Function) was already parsed at offset 0x08.
         Known sections must appear at most once and in ascending ID order.
         Only custom sections (ID 0) may appear out of order.

Partial parse results available above this error.

Exit code: 1

The Core Question You’re Answering

How does a compact binary format encode program structure, and what design tradeoffs enable fast parsing?

This question sits at the heart of systems programming. As you build your parser, you’re uncovering answers to fundamental design decisions:

  • Why variable-length encoding? LEB128 saves bytes for small values (common in indices and sizes) at the cost of complexity. A module with 1000 small functions saves kilobytes compared to fixed 4-byte integers.

  • Why strict section ordering? Single-pass parsing. A streaming compiler can begin generating code the moment function types arrive, without waiting for the entire file. No seeking backward, no multi-pass algorithms.

  • Why separate Function and Code sections? Forward reference resolution. The Function section declares all type signatures upfront, so the Code section can validate call instructions immediately without lookahead.

  • Why magic numbers and version fields? Fast rejection of invalid files. A parser can fail in 8 bytes instead of parsing garbage and failing deep inside.

Reflect on this: Every byte in the format exists for a reason. When you encounter something that seems redundant or complex, ask: “What problem does this solve?” The answer usually involves space efficiency, parse speed, or validation simplicity.
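
To make the space argument for LEB128 concrete, here is a small encoder sketch. The function name encode_uleb128 is a hypothetical helper, not part of any library. It shows that values below 128 cost one byte instead of four, while the largest u32 costs five.

```rust
/// Encodes `value` as unsigned LEB128: 7 payload bits per byte,
/// low-order group first, high bit set on every byte except the last.
fn encode_uleb128(mut value: u32) -> Vec<u8> {
    let mut out = Vec::new();
    loop {
        let byte = (value & 0x7F) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte); // last byte: continuation bit clear
            return out;
        }
        out.push(byte | 0x80); // more groups follow
    }
}
```

A type index of 3 encodes as the single byte 03, and 128 encodes as 80 01. The tradeoff is visible at the top end: u32::MAX needs five bytes, one more than a fixed-width field would.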


Concepts You Must Understand First

Before writing code, ensure you have solid mental models for these concepts:

  1. LEB128 variable-length encoding - The algorithm for reading integers where small values use fewer bytes, with continuation bits indicating “more data follows”

  2. Binary file structure and byte-level I/O - How files are organized as sequences of bytes, and how to read them in chunks while tracking position

  3. Hexadecimal notation and bit manipulation - Converting between hex, binary, and decimal; using AND, OR, and shift operations to extract fields from bytes

  4. Magic numbers and file format identification - How files begin with signature bytes that identify their format (PDF starts with %PDF, PNG with \x89PNG, WASM with \0asm)

  5. Type encodings and tagged unions - How a single byte can indicate which variant follows (e.g., 0x60 means “function type”, 0x7F means “i32”)

  6. Length-prefixed vs. delimiter-terminated data - WASM uses length prefixes (know size before reading) rather than delimiters (scan for terminator), enabling single-pass parsing

  7. Index spaces and forward references - How WASM assigns indices to types, functions, memories, etc., and why imports affect the numbering
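
As an illustration of concept 6, the sketch below reads a length-prefixed name such as the 03 65 6E 76 ("env") bytes seen in the import example. The function read_name is a hypothetical helper, and it assumes the length fits in a single byte; a full parser would decode the length as a LEB128 u32.

```rust
/// Reads a name: a length byte followed by that many UTF-8 bytes.
/// Returns the string slice and the total number of bytes consumed.
/// Assumption: the length is below 128, so it occupies one byte.
fn read_name(bytes: &[u8]) -> Option<(&str, usize)> {
    let len = *bytes.first()? as usize;
    if len & 0x80 != 0 {
        return None; // multi-byte lengths are not handled in this sketch
    }
    let end = 1 + len;
    // `get` returns None when the declared length runs past the input,
    // so truncated data is rejected instead of panicking.
    let raw = bytes.get(1..end)?;
    let name = std::str::from_utf8(raw).ok()?;
    Some((name, end))
}
```

Because the length arrives first, the reader knows exactly how many bytes to take and never has to scan for a terminator.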

Book Reference: See “Computer Systems: A Programmer’s Perspective” chapters 2 and 7 for binary representations and linking concepts. The “Practical Binary Analysis” book covers file format parsing in depth.


Questions to Guide Your Design

Before implementing, think through these design questions:

Reading the byte stream:

  • How will you track your current position in the file?
  • What happens if you try to read past the end of the file?
  • Should you read the whole file into memory, or stream from disk?

Decoding LEB128:

  • How do you know when to stop reading bytes?
  • What’s the maximum number of bytes you might read for a u32? For a u64?
  • How does signed LEB128 differ, and when do you need sign extension?
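
One possible answer to the signed-LEB128 question, sketched for 32-bit values. The name decode_sleb128_i32 is hypothetical. Sign extension is needed when bit 6 (0x40) of the final byte is set, which is also why i32.const 100 takes two bytes (E4 00): the single byte 64 would decode as a negative number.

```rust
/// Decodes a signed LEB128 value that must fit in an i32.
/// Returns the value and the number of bytes consumed.
fn decode_sleb128_i32(bytes: &[u8]) -> Option<(i32, usize)> {
    let mut result: i64 = 0;
    let mut shift = 0u32;
    // An i32 needs at most ceil(32 / 7) = 5 bytes.
    for (i, &byte) in bytes.iter().enumerate().take(5) {
        result |= i64::from(byte & 0x7F) << shift;
        shift += 7;
        if byte & 0x80 == 0 {
            // Bit 6 of the last byte is the sign bit: fill the upper bits with 1s.
            if byte & 0x40 != 0 {
                result |= -1i64 << shift;
            }
            if result < i64::from(i32::MIN) || result > i64::from(i32::MAX) {
                return None; // encoded value does not fit in 32 bits
            }
            return Some((result as i32, i + 1));
        }
    }
    None // truncated input, or more than 5 bytes
}
```

Decoding E4 00 gives 100 in two bytes, while the lone byte 64 gives -28, which shows the effect of the sign bit.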

Parsing sections:

  • How will you verify you consumed exactly the number of bytes a section claims?
  • What should happen if you encounter an unknown section ID (like ID 15)?
  • How will you handle custom sections that can appear anywhere?
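
One way to approach the section questions is to walk the section headers first and defer content parsing. The sketch below is an assumption-laden outline, not a complete parser: SectionHeader, read_u32, and list_sections are hypothetical names, and the whole module is assumed to be in memory.

```rust
#[derive(Debug, PartialEq, Eq)]
struct SectionHeader {
    id: u8,
    payload_offset: usize, // offset of the first content byte
    size: usize,           // content length in bytes
}

/// Decodes an unsigned LEB128 u32 at `*pos` and advances `*pos`.
fn read_u32(bytes: &[u8], pos: &mut usize) -> Option<u32> {
    let mut result = 0u32;
    let mut shift = 0;
    loop {
        let byte = *bytes.get(*pos)?;
        *pos += 1;
        // The fifth byte may only carry the top 4 bits and must be the last.
        if shift == 28 && byte > 0x0F {
            return None;
        }
        result |= u32::from(byte & 0x7F) << shift;
        if byte & 0x80 == 0 {
            return Some(result);
        }
        shift += 7;
    }
}

/// Checks the 8-byte header, then records every section's ID and extent.
fn list_sections(module: &[u8]) -> Option<Vec<SectionHeader>> {
    if !module.starts_with(b"\0asm\x01\0\0\0") {
        return None;
    }
    let mut pos = 8;
    let mut sections = Vec::new();
    while pos < module.len() {
        let id = module[pos];
        pos += 1;
        let size = read_u32(module, &mut pos)? as usize;
        let end = pos.checked_add(size)?;
        if end > module.len() {
            return None; // section claims more bytes than the file has
        }
        sections.push(SectionHeader { id, payload_offset: pos, size });
        pos = end; // skip the content; parse it later from the slice
    }
    Some(sections)
}
```

Each payload can then be parsed from module[payload_offset..payload_offset + size]; checking that the sub-parser consumed that slice exactly answers the "exact byte count" question.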

Validating structure:

  • How do you enforce that non-custom sections appear in the required order (ascending by ID, except that Data Count, ID 12, comes before Code, ID 10)?
  • What if a section appears twice?
  • Should you validate that function indices are in range, or defer to a separate validation pass?
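
A possible shape for the ordering check, given only the sequence of section IDs. The names order_rank and check_section_order are hypothetical. The sketch maps each ID to its position in the required order, which handles the Data Count section (ID 12) sitting between Element (ID 9) and Code (ID 10), and treats unknown IDs as errors.

```rust
/// Position of each known section in the required order.
/// Data Count (12) is placed after Element (9) and before Code (10).
fn order_rank(id: u8) -> Option<u8> {
    match id {
        1..=9 => Some(id),
        12 => Some(10),
        10 => Some(11),
        11 => Some(12),
        _ => None,
    }
}

/// Verifies that non-custom sections appear at most once and in order.
fn check_section_order(ids: &[u8]) -> Result<(), String> {
    let mut last_rank = 0u8;
    for (i, &id) in ids.iter().enumerate() {
        if id == 0 {
            continue; // custom sections may appear anywhere, any number of times
        }
        let rank = match order_rank(id) {
            Some(rank) => rank,
            None => return Err(format!("section {}: unknown ID {}", i, id)),
        };
        // Equal rank means a duplicate; lower rank means out of order.
        if rank <= last_rank {
            return Err(format!("section {}: ID {} is duplicated or out of order", i, id));
        }
        last_rank = rank;
    }
    Ok(())
}
```

Keeping a single "last rank seen" value covers both the duplicate and the ordering question in one comparison.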

Decoding instructions:

  • How will you map opcode bytes to instruction names?
  • What’s your strategy for handling the ~200 different opcodes?
  • How will you track nesting depth for block/loop/if structures?

Output format:

  • Will you build an in-memory AST first, or print while parsing?
  • How will you handle alignment and formatting for readable output?
  • What information is essential vs. nice-to-have?

Thinking Exercise

Before writing any code, manually decode this WASM binary by hand. This exercise builds intuition that no amount of coding can replace.

The binary (32 bytes):

00 61 73 6D 01 00 00 00 01 07 01 60 02 7F 7F 01
7F 03 02 01 00 0A 09 01 07 00 20 00 20 01 6A 0B

Your task:

  1. Identify the header (first 8 bytes):
    • What is the magic number?
    • What version is this module?
  2. Find the first section (starts at byte 8):
    • What is the section ID?
    • Decode the section size (it’s a single-byte LEB128)
    • What are the raw bytes of the section content?
  3. Parse the Type section content:
    • How many types are declared?
    • What is the marker byte for a function type?
    • How many parameters? What types?
    • How many results? What types?
    • Write out the signature in WAT form: (func (param ...) (result ...))
  4. Find the second section:
    • What is the section ID?
    • What is the size?
    • This is the Function section. How many functions does it declare?
    • What type index does function 0 use?
  5. Find the third section (the Code section):
    • What is the section ID?
    • What is the size?
    • How many function bodies?
    • What is the body size for function 0?
    • How many local variable groups?
    • Decode each instruction:
      • Byte 0x20 = ?
      • Byte 0x00 = ?
      • Byte 0x20 = ?
      • Byte 0x01 = ?
      • Byte 0x6A = ?
      • Byte 0x0B = ?
  6. Reconstruct the WAT: Write the complete WAT that would produce this binary.

Hint for checking your work: The module contains a single function that adds two i32 parameters.


The Interview Questions They’ll Ask

Binary parsing skills translate directly to interview questions. Here’s what you’ll be prepared to answer:

LEB128 and Variable-Length Encoding:

  • “Implement a function to decode unsigned LEB128 from a byte stream.”
  • “What is the maximum value that can be stored in 3 bytes of LEB128?”
  • “Why might a format use variable-length encoding instead of fixed-size integers?”
  • “Given bytes 0xE5 0x8E 0x26, what unsigned integer does this represent?”

Binary File Formats:

  • “How would you design a binary format for a configuration file?”
  • “What are the tradeoffs between length-prefixed and delimiter-terminated strings?”
  • “Why do file formats use magic numbers?”
  • “How would you make a binary format extensible for future versions?”

Parsing and Validation:

  • “How would you implement a streaming parser that can process data as it arrives?”
  • “What’s the difference between syntax errors and semantic errors in a binary format?”
  • “How would you gracefully handle corrupted or truncated input?”
  • “Design a data structure to represent a parsed AST for a binary format.”

Systems Programming:

  • “What’s the difference between big-endian and little-endian byte order?”
  • “How would you memory-map a large file for efficient parsing?”
  • “What security considerations apply when parsing untrusted binary input?”
  • “How would you implement fuzzing for a binary parser?”

WebAssembly Specific:

  • “Explain the relationship between the Function section and Code section.”
  • “Why must WASM sections appear in a specific order?”
  • “What is the purpose of the Data Count section added in WASM 2.0?”
  • “How does WASM’s binary format enable streaming compilation?”

Hints in Layers

If you get stuck, reveal hints progressively. Try to solve problems yourself before looking.

Layer 1: Getting Started

Hint: File reading structure

Create a struct to wrap your byte buffer with position tracking:

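struct ByteReader<'a> {
    data: &'a [u8], // borrowed input; the lifetime ties the reader to the buffer
    pos: usize,     // index of the next unread byte
}

impl<'a> ByteReader<'a> {
    /// Returns the next byte and advances, or None at end of input.
    fn read_byte(&mut self) -> Option<u8> {
        let b = *self.data.get(self.pos)?;
        self.pos += 1;
        Some(b)
    }
}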

All other read operations build on read_byte().

Hint: LEB128 termination condition

The high bit (0x80) of each byte indicates continuation:

  • If byte & 0x80 != 0: more bytes follow
  • If byte & 0x80 == 0: this is the last byte

Extract the 7 value bits with byte & 0x7F.
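
Putting the two rules together, an unsigned decoder might look like the sketch below. It is written against a plain byte slice rather than the ByteReader above so that it stands alone, and the name decode_uleb128 is a hypothetical helper. It decodes up to 64 bits, so the loop is capped at 10 bytes.

```rust
/// Decodes an unsigned LEB128 value of up to 64 bits.
/// Returns the value and the number of bytes consumed.
fn decode_uleb128(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut result = 0u64;
    // A u64 needs at most ceil(64 / 7) = 10 bytes.
    for (i, &byte) in bytes.iter().enumerate().take(10) {
        let chunk = u64::from(byte & 0x7F); // the 7 value bits
        // The tenth byte can only contribute bit 63.
        if i == 9 && chunk > 1 {
            return None;
        }
        result |= chunk << (7 * i as u32);
        if byte & 0x80 == 0 {
            return Some((result, i + 1)); // high bit clear: last byte
        }
    }
    None // input ended, or the value ran past 10 bytes
}
```

Returning the consumed length alongside the value lets the caller advance its position without a second pass over the bytes.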

Layer 2: Section Parsing

Hint: Section boundary tracking

Before parsing a section’s content, read the size, then record the position where the content starts:

let section_size = reader.read_uleb128() as usize;
let content_start = reader.pos; // first byte after the size field

// Parse content...

let bytes_consumed = reader.pos - content_start;
assert_eq!(bytes_consumed, section_size, "Section size mismatch!");

This catches bugs where you read too many or too few bytes.

Hint: Handling unknown sections

If you encounter a section ID you don’t recognize:

if section_id > 12 {
    // Unknown section - skip it entirely
    reader.skip(section_size);
    continue;
}

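Skipping keeps a disassembler usable on modules that carry sections from newer proposals. Note that the specification treats an unknown section ID as malformed, so a strict validator should reject the module instead of skipping.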

Layer 3: Instruction Decoding

Hint: Opcode dispatch pattern

Use a match/switch on the opcode byte. Group by category:

match opcode {
    // Control flow
    0x00 => Instruction::Unreachable,
    0x01 => Instruction::Nop,
    0x02 => Instruction::Block(read_block_type()),
    0x0B => Instruction::End,

    // Variables
    0x20 => Instruction::LocalGet(read_uleb128()),
    0x21 => Instruction::LocalSet(read_uleb128()),

    // Constants
    0x41 => Instruction::I32Const(read_sleb128()),  // Note: signed!

    // Arithmetic (no immediates)
    0x6A => Instruction::I32Add,
    0x6B => Instruction::I32Sub,

    _ => Instruction::Unknown(opcode),
}

Hint: Memory instruction immediates

Memory operations like i32.load and i32.store have two immediates:

// memarg = (align, offset)
let align = read_uleb128();   // log2 of alignment (0=1, 1=2, 2=4, 3=8)
let offset = read_uleb128();  // byte offset from address

// Effective address = stack_value + offset

The alignment is a hint for optimization; the offset is semantic.
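
The two immediates can be turned into concrete numbers as sketched below. MemArg, alignment_bytes, and effective_address are hypothetical names. The sketch assumes the usual validation rule that the declared alignment may not exceed the access width, and it computes the effective address in 64 bits so that base plus offset cannot wrap around.

```rust
/// The two immediates of a load or store instruction.
struct MemArg {
    align_log2: u32, // alignment exponent as stored in the binary
    offset: u32,     // constant byte offset added to the dynamic address
}

/// Converts the exponent to bytes, rejecting alignments larger than
/// the access width (for example 8-byte alignment on a 4-byte i32.load).
fn alignment_bytes(arg: &MemArg, access_bytes: u32) -> Option<u32> {
    let bytes = 1u32.checked_shl(arg.align_log2)?; // None if exponent >= 32
    if bytes <= access_bytes {
        Some(bytes)
    } else {
        None
    }
}

/// Effective address = dynamic base + static offset, without wraparound.
fn effective_address(base: u32, arg: &MemArg) -> u64 {
    u64::from(base) + u64::from(arg.offset)
}
```

For an i32.load with align=2 and offset=8, the alignment is 4 bytes and a base address of 16 gives an effective address of 24.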

Layer 4: Advanced Topics

Hint: Block type encoding

Block types use a special encoding:

  • 0x40: void (empty block type, no value produced)
  • 0x7F, 0x7E, 0x7D, 0x7C: a single value type (the block produces i32, i64, f32, or f64 respectively)
  • Anything else: a type index, encoded as a signed 33-bit LEB128 (s33) whose value must be non-negative and indexes the type section

The type index form allows blocks with parameters and multiple results.
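
The three cases can be told apart by looking at the first byte, as in this sketch. BlockType and decode_block_type are hypothetical names, and only the four numeric value types are recognized; other single-byte types such as reference types are rejected here for brevity.

```rust
#[derive(Debug, PartialEq, Eq)]
enum BlockType {
    Empty,
    Value(u8),      // raw value-type byte, e.g. 0x7F for i32
    TypeIndex(u32), // index into the type section
}

/// Decodes a block type; returns it with the number of bytes consumed.
fn decode_block_type(bytes: &[u8]) -> Option<(BlockType, usize)> {
    let first = *bytes.first()?;
    match first {
        0x40 => return Some((BlockType::Empty, 1)),
        0x7C..=0x7F => return Some((BlockType::Value(first), 1)),
        _ => {}
    }
    // Otherwise decode a signed LEB128 and require a non-negative result.
    let mut value: i64 = 0;
    for (i, &byte) in bytes.iter().enumerate().take(5) {
        value |= i64::from(byte & 0x7F) << (7 * i as u32);
        if byte & 0x80 == 0 {
            // Bit 6 of the last byte is the sign bit; a negative value
            // here is not a type index.
            if byte & 0x40 != 0 || value > i64::from(u32::MAX) {
                return None;
            }
            return Some((BlockType::TypeIndex(value as u32), i + 1));
        }
    }
    None // truncated or too long
}
```

The single-byte forms work because 0x40 and the value-type bytes all have bit 6 set, so read as signed LEB128 they are negative and can never collide with a real type index.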

Hint: Nested block tracking

For proper disassembly indentation, track nesting depth:

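let mut depth: usize = 0;

for instr in instructions {
    match instr {
        Block | Loop | If => {
            print_indented(depth, instr);
            depth += 1;
        }
        Else => {
            // Else sits at the same level as its If
            print_indented(depth.saturating_sub(1), instr);
        }
        End => {
            // The function body's final End closes no block,
            // so saturate at zero instead of underflowing.
            depth = depth.saturating_sub(1);
            print_indented(depth, instr);
        }
        _ => print_indented(depth, instr),
    }
}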

Books That Will Help

Book | Author(s) | Why It’s Relevant
Practical Binary Analysis | Dennis Andriesse | A thorough guide to understanding binary formats. Covers ELF, PE, and general principles of parsing executable formats. Chapters on disassembly directly apply to WASM bytecode.
Computer Systems: A Programmer’s Perspective | Bryant & O’Hallaron | Foundation for understanding how programs are represented in binary. Chapter 2 on data representation explains bit manipulation; Chapter 7 on linking explains symbol tables and relocations (similar to WASM imports/exports).
Low-Level Programming: C, Assembly, and Program Execution | Igor Zhirkov | Teaches the mindset for byte-level programming. Understanding x86 binary encoding helps appreciate WASM’s simpler design.
Crafting Interpreters | Robert Nystrom | While focused on text parsing, the bytecode virtual machine part (Part III) shows how to design instruction encodings, directly applicable to WASM disassembly.
The Art of WebAssembly | Rick Battagline | WebAssembly-specific book that covers the binary format from a practical perspective. Good companion for understanding the “why” behind format decisions.
Compilers: Principles, Techniques, and Tools | Aho, Lam, Sethi, Ullman | The Dragon Book’s chapters on intermediate representations and code generation explain why WASM is structured the way it is.

Resources

Specifications

Reference Implementations

Tools

  • wasm-objdump - From wabt, for comparison
  • xxd - Hex dump utility
  • wasm-validate - Check if WASM is valid

Self-Assessment Checklist

Before moving to Project 3, verify you can:

  • Implement LEB128 encoding/decoding from scratch
  • List all 12 section types and their purposes
  • Explain why sections must be ordered
  • Parse any function signature from the type section
  • Decode the code section’s instruction stream
  • Use wasm-objdump to verify your parser’s output
  • Handle malformed input without crashing

Conceptual Questions

  1. Why does WASM use LEB128 instead of fixed-size integers?
  2. What information is in the Function section vs. the Code section?
  3. How do you know where a function body ends?
  4. Why must imports be declared before functions?
  5. How would you find which type a particular function uses?

Next: P03: Build a WASM Interpreter, where you execute the bytecode you’ve learned to parse