LEARNING OCAML

Learning OCaml: A Project-Based Journey

Introduction

OCaml is a statically-typed functional programming language with an emphasis on expressiveness, correctness, and performance. It’s the language of choice for:

Compilers and language tools (Rust’s first compiler was in OCaml, Facebook’s Flow and Hack, ReasonML)
Formal verification (Coq proof assistant, CompCert verified C compiler)
Financial systems (Jane Street trades billions daily using OCaml)
Systems programming (MirageOS unikernels, Irmin database)

What makes OCaml special:

Powerful type system: Algebraic data types, parametric polymorphism, type inference
Pattern matching: Destructure data elegantly and exhaustively
Module system: The most powerful module system of any mainstream language
Performance: Native code compilation, competitive with C for many workloads
Safety: Immutable by default, no null pointers, exhaustive pattern matching
OCaml 5: First mainstream language with algebraic effect handlers and multicore support

To truly understand OCaml, you need to think functionally and embrace its unique strengths: algebraic data types, pattern matching, and the module system.

Core Concept Analysis

The OCaml Mental Model

(* OCaml is expression-oriented - everything returns a value *)
let result = if condition then 42 else 0

(* Algebraic data types model your domain precisely *)
type shape =
  | Circle of float              (* radius *)
  | Rectangle of float * float   (* width, height *)
  | Triangle of float * float * float  (* three sides *)

(* Pattern matching destructures data and ensures exhaustiveness *)
let area = function
  | Circle r -> Float.pi *. r *. r
  | Rectangle (w, h) -> w *. h
  | Triangle (a, b, c) ->
      let s = (a +. b +. c) /. 2.0 in
      Float.sqrt (s *. (s -. a) *. (s -. b) *. (s -. c))

Learning Progression

┌─────────────────────────────────────────────────────────────────┐
│  Level 1: Foundations                                           │
│  - Basic types, let bindings, functions                         │
│  - Pattern matching fundamentals                                │
│  - Lists and recursion                                          │
├─────────────────────────────────────────────────────────────────┤
│  Level 2: Algebraic Thinking                                    │
│  - Variant types (sum types)                                    │
│  - Record types (product types)                                 │
│  - Option and Result for error handling                         │
├─────────────────────────────────────────────────────────────────┤
│  Level 3: Functional Idioms                                     │
│  - Higher-order functions (map, filter, fold)                   │
│  - Tail recursion and accumulators                              │
│  - Immutable data structures                                    │
├─────────────────────────────────────────────────────────────────┤
│  Level 4: The Module System                                     │
│  - Modules and signatures                                       │
│  - Abstract types and encapsulation                             │
│  - Functors (parameterized modules)                             │
├─────────────────────────────────────────────────────────────────┤
│  Level 5: Real-World OCaml                                      │
│  - Concurrency with Lwt/Eio                                     │
│  - Parsing and language implementation                          │
│  - OCaml 5 effects and multicore                                │
└─────────────────────────────────────────────────────────────────┘

Project 1: CLI Calculator with Expression Parser

File: LEARNING_OCAML.md
Main Programming Language: OCaml
Alternative Programming Languages: Haskell, F#, Rust
Coolness Level: Level 3: Genuinely Clever
Business Potential: 1. The “Resume Gold” (Educational/Personal Brand)
Difficulty: Level 2: Intermediate (The Developer)
Knowledge Area: Parsing / Algebraic Data Types
Software or Tool: Expression Evaluator
Main Book: “OCaml Programming: Correct + Efficient + Beautiful” (Cornell CS3110)

What you’ll build

A command-line calculator that parses and evaluates mathematical expressions like (3 + 4) * 2 - 1, using algebraic data types to represent the abstract syntax tree.

Why it teaches OCaml

This is the perfect first project because it exercises OCaml’s core strengths: algebraic data types for the AST, pattern matching for evaluation, and recursion for tree traversal. You’ll immediately see why OCaml is the language of choice for compilers.

Core challenges you’ll face

Defining an AST with variants (each operator is a constructor) → maps to algebraic data types
Recursive descent parsing (turn string into AST) → maps to recursion and pattern matching
Evaluating the tree (pattern match and recurse) → maps to recursive functions
Error handling (division by zero, parse errors) → maps to Option/Result types

Key Concepts

Algebraic Data Types: “OCaml Programming” Chapter 3.9 - Cornell CS3110
Pattern Matching: “OCaml Programming” Chapter 3.5 - Cornell CS3110
Recursive Descent Parsing: “OCaml Programming” Chapter 10.2 - Cornell CS3110
Option Type: OCaml Documentation - Basic Data Types

Difficulty

Beginner-Intermediate

Time estimate

1 week

Prerequisites

Basic programming knowledge, no OCaml experience needed

Real world outcome

$ ./calc
> 2 + 3
5
> (10 - 4) * 2
12
> 15 / (3 + 2)
3
> 2 ^ 10
1024
> 1 / 0
Error: Division by zero
> 2 + + 3
Error: Parse error at position 4

Implementation Hints

Define the AST as a variant type:

TYPE expr =
  | Num of int
  | Add of expr * expr
  | Sub of expr * expr
  | Mul of expr * expr
  | Div of expr * expr
  | Pow of expr * expr

The evaluator is a simple recursive function:

FUNCTION eval(expr):
  MATCH expr WITH
  | Num n -> n
  | Add (left, right) -> eval(left) + eval(right)
  | Sub (left, right) -> eval(left) - eval(right)
  | Mul (left, right) -> eval(left) * eval(right)
  | Div (left, right) ->
      IF eval(right) = 0 THEN Error "Division by zero"
      ELSE eval(left) / eval(right)
  | Pow (base, exp) -> power(eval(base), eval(exp))

Parsing strategy:

# Expression grammar:
# expr   ::= term (('+' | '-') term)*
# term   ::= factor (('*' | '/') factor)*
# factor ::= base ('^' factor)?   (* right-associative *)
# base   ::= NUMBER | '(' expr ')'

FUNCTION parse_expr(tokens):
  left = parse_term(tokens)
  WHILE peek(tokens) IN ['+', '-']:
    op = consume(tokens)
    right = parse_term(tokens)
    left = (IF op = '+' THEN Add ELSE Sub)(left, right)
  RETURN left

Critical insight: In OCaml, the type system ensures your pattern matches are exhaustive. If you add a new operator to the AST, the compiler will tell you everywhere you need to handle it. This is why OCaml is beloved for language implementation.

Learning milestones

Basic arithmetic works → You understand variants and pattern matching
Operator precedence is correct → You understand recursive descent
Error handling with Result → You understand OCaml’s error philosophy
Variables and let-bindings → You’re ready for real interpreters

Project 2: JSON Parser and Manipulator

File: LEARNING_OCAML.md
Main Programming Language: OCaml
Alternative Programming Languages: Haskell, F#, Rust
Coolness Level: Level 3: Genuinely Clever
Business Potential: 2. The “Micro-SaaS / Pro Tool” (Solo-Preneur Potential)
Difficulty: Level 2: Intermediate (The Developer)
Knowledge Area: Parsing / Data Transformation
Software or Tool: JSON Library
Main Book: “Real World OCaml” by Yaron Minsky

What you’ll build

A complete JSON parser that reads JSON text into OCaml data structures, allows querying/manipulation, and serializes back to JSON.

Why it teaches OCaml

JSON’s structure maps perfectly to OCaml’s type system. You’ll see how sum types (variants) represent JSON’s different value types, and how recursive types handle nesting. This project teaches practical parsing without compiler theory.

Core challenges you’ll face

Modeling JSON with variants (null, bool, number, string, array, object) → maps to recursive variant types
Lexing (tokenize the input string) → maps to string manipulation and state
Recursive parsing (handle nested structures) → maps to mutual recursion
Pretty printing (serialize with indentation) → maps to recursive formatting
Path-based querying (access nested values like $.users[0].name) → maps to pattern matching on paths

Key Concepts

Recursive Variant Types: “OCaml Programming” Chapter 3.9 - Cornell CS3110
Mutual Recursion: “OCaml Programming” Chapter 3.10 - Cornell CS3110
String Processing: OCaml String Module Documentation
Pretty Printing: “Real World OCaml” Chapter 7 - Yaron Minsky

Difficulty

Intermediate

Time estimate

1-2 weeks

Prerequisites

Project 1 completed, understanding of recursive types

Real world outcome

$ ./json-tool parse test.json
Parsed successfully: Object with 3 keys

$ ./json-tool query test.json '$.users[0].name'
"Alice"

$ ./json-tool query test.json '$.users[*].age'
[30, 25, 35]

$ echo '{"name": "test", "value": 42}' | ./json-tool pretty
{
  "name": "test",
  "value": 42
}

$ ./json-tool validate broken.json
Error at line 5, column 12: Expected ':' but found ','

Implementation Hints

The JSON type:

TYPE json =
  | Null
  | Bool of bool
  | Number of float
  | String of string
  | Array of json list
  | Object of (string * json) list

Lexer tokens:

TYPE token =
  | LBRACE | RBRACE      (* { } *)
  | LBRACKET | RBRACKET  (* [ ] *)
  | COLON | COMMA        (* : , *)
  | TRUE | FALSE | NULL
  | STRING of string
  | NUMBER of float
  | EOF

Recursive parser structure:

FUNCTION parse_value(tokens):
  MATCH peek(tokens) WITH
  | NULL -> consume(tokens); Null
  | TRUE -> consume(tokens); Bool true
  | FALSE -> consume(tokens); Bool false
  | NUMBER n -> consume(tokens); Number n
  | STRING s -> consume(tokens); String s
  | LBRACKET -> parse_array(tokens)
  | LBRACE -> parse_object(tokens)
  | _ -> error "Unexpected token"

FUNCTION parse_array(tokens):
  expect(tokens, LBRACKET)
  IF peek(tokens) = RBRACKET THEN
    consume(tokens); Array []
  ELSE
    elements = parse_comma_separated(tokens, parse_value)
    expect(tokens, RBRACKET)
    Array elements

FUNCTION parse_object(tokens):
  expect(tokens, LBRACE)
  IF peek(tokens) = RBRACE THEN
    consume(tokens); Object []
  ELSE
    pairs = parse_comma_separated(tokens, parse_pair)
    expect(tokens, RBRACE)
    Object pairs

Path-based querying:

TYPE path_element =
  | Key of string      (* .name *)
  | Index of int       (* [0] *)
  | Wildcard           (* [*] *)

FUNCTION query(json, path):
  MATCH path, json WITH
  | [], value -> [value]
  | Key k :: rest, Object pairs ->
      MATCH find k pairs WITH
      | Some v -> query(v, rest)
      | None -> []
  | Index i :: rest, Array elements ->
      IF i < length(elements) THEN query(nth elements i, rest)
      ELSE []
  | Wildcard :: rest, Array elements ->
      concat_map (fun elem -> query(elem, rest)) elements
  | _, _ -> []  (* Type mismatch *)

Critical insight: OCaml’s pattern matching makes tree transformations elegant. The query function is just a few lines of pattern matching, but it handles arbitrary nesting gracefully.

Learning milestones

Parse simple JSON → You understand recursive types
Handle all JSON types → You understand variant types deeply
Pretty printing works → You understand recursive formatting
Path queries work → You’re thinking in patterns

Project 3: Markdown to HTML Converter

File: LEARNING_OCAML.md
Main Programming Language: OCaml
Alternative Programming Languages: Haskell, F#, Rust
Coolness Level: Level 3: Genuinely Clever
Business Potential: 2. The “Micro-SaaS / Pro Tool” (Solo-Preneur Potential)
Difficulty: Level 2: Intermediate (The Developer)
Knowledge Area: Parsing / Text Processing
Software or Tool: Markdown Processor
Main Book: “OCaml Programming: Correct + Efficient + Beautiful” (Cornell CS3110)

What you’ll build

A Markdown parser that converts Markdown text to HTML, supporting headers, paragraphs, bold/italic, links, code blocks, and lists.

Why it teaches OCaml

Markdown parsing requires handling inline elements within block elements—a natural fit for mutually recursive types and functions. You’ll learn to model complex documents as trees and transform them elegantly.

Core challenges you’ll face

Block-level parsing (headers, paragraphs, code blocks, lists) → maps to line-based state machine
Inline parsing (bold, italic, links, code) → maps to recursive descent within strings
Mutual recursion (blocks contain inlines, lists contain blocks) → maps to mutually recursive types
HTML generation (proper escaping, nesting) → maps to tree-to-string transformation

Key Concepts

Mutually Recursive Types: “OCaml Programming” Chapter 3.10 - Cornell CS3110
State Machines: “Real World OCaml” Chapter 15 - Yaron Minsky
String Processing: OCaml Str Module
Regular Expressions in OCaml: “Real World OCaml” Chapter 15 - Yaron Minsky

Difficulty

Intermediate

Time estimate

2 weeks

Prerequisites

Project 2 completed, comfort with recursive types

Real world outcome

$ cat example.md
# Hello World

This is **bold** and *italic* text.

- Item 1
- Item 2
  - Nested item

```python
print("Hello")

$ ./markdown example.md

Hello World

This is bold and italic text.

Item 1
Item 2
- Nested item

print("Hello")

### Implementation Hints

**Document model**:

TYPE inline = | Text of string | Bold of inline list | Italic of inline list | Code of string | Link of { text: inline list; url: string } | Image of { alt: string; url: string }

TYPE list_item = ListItem of inline list * block list option (* inline content, optional nested blocks *)

TYPE document = block list


**Two-phase parsing**:

(* Phase 1: Split into raw blocks based on blank lines *) FUNCTION split_blocks(lines): group_by_blank_lines(lines)

(* Phase 2: Parse each raw block *) FUNCTION parse_block(lines): first_line = head(lines) IF starts_with “#” first_line THEN parse_heading(first_line) ELSE IF starts_with “```” first_line THEN parse_code_block(lines) ELSE IF starts_with “- “ first_line THEN parse_list(lines) ELSE IF starts_with “> “ first_line THEN parse_blockquote(lines) ELSE parse_paragraph(join(lines))

(* Phase 3: Parse inline elements within text ) FUNCTION parse_inlines(text): ( Use a character-by-character scanner *) …


**Inline parsing with a scanner**:

FUNCTION parse_inlines(text): result = [] i = 0 WHILE i < length(text): IF text[i:i+2] = “” THEN end_pos = find “” text (i+2) content = parse_inlines(text[i+2:end_pos]) result.append(Bold content) i = end_pos + 2 ELSE IF text[i] = ‘’ THEN end_pos = find ‘’ text (i+1) content = parse_inlines(text[i+1:end_pos]) result.append(Italic content) i = end_pos + 1 ELSE IF text[i] = ‘[’ THEN (* Parse text ) … ELSE ( Accumulate plain text *) … RETURN result

**Critical insight**: The key to Markdown parsing is separating block-level (line-based) from inline-level (character-based) parsing. OCaml's type system helps you keep these concerns separate.

### Learning milestones
1. **Headers and paragraphs work** → You understand basic parsing
2. **Bold/italic/links work** → You understand inline parsing
3. **Nested lists work** → You understand complex recursion
4. **Code blocks with syntax hints** → You're building real tools

---

## Project 4: Immutable Data Structures Library

- **File**: LEARNING_OCAML.md
- **Main Programming Language**: OCaml
- **Alternative Programming Languages**: Haskell, F#, Scala
- **Coolness Level**: Level 4: Hardcore Tech Flex
- **Business Potential**: 1. The "Resume Gold" (Educational/Personal Brand)
- **Difficulty**: Level 3: Advanced (The Engineer)
- **Knowledge Area**: Data Structures / Functional Programming
- **Software or Tool**: Persistent Data Structures Library
- **Main Book**: "Purely Functional Data Structures" by Chris Okasaki

### What you'll build
A library of persistent (immutable) data structures: finger trees, red-black trees, hash-array mapped tries (HAMTs), and persistent vectors.

### Why it teaches OCaml
Immutable data structures are the heart of functional programming. You'll understand **structural sharing** (how updates are O(log n) despite immutability), **laziness** (for amortized efficiency), and why OCaml's algebraic types make these implementations elegant.

### Core challenges you'll face
- **Structural sharing** (new versions share most data with old) → maps to *immutability patterns*
- **Red-black tree balancing** (maintain invariants immutably) → maps to *pattern matching on trees*
- **Finger trees** (efficient access to both ends) → maps to *recursive types with invariants*
- **HAMTs** (hash-based associative arrays) → maps to *bit manipulation + trees*
- **Lazy evaluation** (for amortized bounds) → maps to *OCaml's `lazy` keyword*

### Resources for key challenges
- **Purely Functional Data Structures** by Chris Okasaki - The definitive resource
- **Finger Trees Paper**: [Finger Trees: A Simple General-purpose Data Structure](https://www.staff.city.ac.uk/~ross/papers/FingerTree.pdf) - Hinze & Paterson

### Key Concepts
- **Persistent Data Structures**: *"Purely Functional Data Structures"* Chapter 1 - Okasaki
- **Red-Black Trees**: *"Purely Functional Data Structures"* Chapter 3.3 - Okasaki
- **Lazy Evaluation**: *"OCaml Programming"* Chapter 8.4 - Cornell CS3110
- **Structural Sharing**: *"Purely Functional Data Structures"* Chapter 2 - Okasaki

### Difficulty
Advanced

### Time estimate
3-4 weeks

### Prerequisites
Solid understanding of OCaml basics, some data structures knowledge

### Real world outcome
```ocaml
(* Usage in REPL *)
# let m = PersistentMap.empty;;
val m : ('a, 'b) PersistentMap.t

# let m1 = PersistentMap.add "key1" 100 m;;
val m1 : (string, int) PersistentMap.t

# let m2 = PersistentMap.add "key2" 200 m1;;
val m2 : (string, int) PersistentMap.t

# PersistentMap.find "key1" m1;;
- : int option = Some 100

# PersistentMap.find "key2" m1;;  (* m1 doesn't have key2 *)
- : int option = None

# PersistentMap.find "key2" m2;;  (* m2 has key2 *)
- : int option = Some 200

(* Benchmark output *)
$ ./bench
Persistent Map (HAMT):
  insert 1M keys: 2.3s
  lookup 1M keys: 0.8s
  memory overhead vs mutable: 1.4x

Persistent Vector:
  append 1M items: 1.1s
  random access: O(log32 n) = ~5 operations for 1M items

Implementation Hints

Red-black tree with smart constructors:

TYPE color = Red | Black

TYPE 'a tree =
  | Empty
  | Node of color * 'a tree * 'a * 'a tree

(* The balance function is the key insight *)
FUNCTION balance(color, left, value, right):
  MATCH (color, left, value, right) WITH
  (* Left-left case *)
  | (Black, Node(Red, Node(Red, a, x, b), y, c), z, d) ->
      Node(Red, Node(Black, a, x, b), y, Node(Black, c, z, d))
  (* Left-right case *)
  | (Black, Node(Red, a, x, Node(Red, b, y, c)), z, d) ->
      Node(Red, Node(Black, a, x, b), y, Node(Black, c, z, d))
  (* Right-left case *)
  | (Black, a, x, Node(Red, Node(Red, b, y, c), z, d)) ->
      Node(Red, Node(Black, a, x, b), y, Node(Black, c, z, d))
  (* Right-right case *)
  | (Black, a, x, Node(Red, b, y, Node(Red, c, z, d))) ->
      Node(Red, Node(Black, a, x, b), y, Node(Black, c, z, d))
  (* Default case *)
  | _ ->
      Node(color, left, value, right)

FUNCTION insert(x, tree):
  LET rec ins = FUNCTION
    | Empty -> Node(Red, Empty, x, Empty)
    | Node(color, left, y, right) ->
        IF x < y THEN balance(color, ins(left), y, right)
        ELSE IF x > y THEN balance(color, left, y, ins(right))
        ELSE Node(color, left, y, right)  (* duplicate *)
  IN
  MATCH ins(tree) WITH
  | Node(_, left, y, right) -> Node(Black, left, y, right)
  | Empty -> Empty  (* impossible *)

HAMT (Hash Array Mapped Trie):

(* 32-way branching trie indexed by hash bits *)
TYPE ('k, 'v) hamt =
  | Empty
  | Leaf of int * 'k * 'v        (* hash, key, value *)
  | Collision of int * ('k * 'v) list  (* hash collision *)
  | Branch of int * ('k, 'v) hamt array  (* bitmap + children *)

FUNCTION find(key, hash, depth, node):
  MATCH node WITH
  | Empty -> None
  | Leaf(h, k, v) -> IF k = key THEN Some(v) ELSE None
  | Collision(h, pairs) -> assoc key pairs
  | Branch(bitmap, children) ->
      LET idx = (hash >> (depth * 5)) land 0x1F IN  (* 5 bits *)
      LET bit = 1 << idx IN
      IF bitmap land bit = 0 THEN None
      ELSE
        LET sparse_idx = popcount(bitmap land (bit - 1)) IN
        find(key, hash, depth + 1, children[sparse_idx])

Critical insight: Pattern matching on trees is where OCaml shines. The red-black tree balancing code is just 4 patterns that each represent a violation of the invariant and how to fix it.

Learning milestones

Red-black tree insert works → You understand tree balancing
Structural sharing verified → You understand persistence
HAMT achieves near-O(1) → You understand hash tries
Library is usable in other projects → You’ve built reusable code

Project 5: Generic Collections with Functors

File: LEARNING_OCAML.md
Main Programming Language: OCaml
Alternative Programming Languages: Standard ML, F#
Coolness Level: Level 4: Hardcore Tech Flex
Business Potential: 1. The “Resume Gold” (Educational/Personal Brand)
Difficulty: Level 3: Advanced (The Engineer)
Knowledge Area: Module System / Abstraction
Software or Tool: Generic Collections Library
Main Book: “Real World OCaml” by Yaron Minsky

What you’ll build

A library of generic collections (Set, Map, PriorityQueue) using functors—modules parameterized by other modules. You’ll create a Set.Make(Ord) functor like the standard library.

Why it teaches OCaml

OCaml’s module system is the most powerful of any mainstream language. Functors let you write code that’s generic over types while maintaining full type safety. This is how Jane Street builds their production systems.

Core challenges you’ll face

Module signatures (define interfaces for modules) → maps to abstraction and encapsulation
Functors (modules that take modules as parameters) → maps to parameterized abstraction
Abstract types (hide implementation details) → maps to information hiding
Signature constraints (with type, include) → maps to interface refinement
First-class modules (modules as values) → maps to runtime polymorphism

Key Concepts

Modules and Signatures: OCaml Documentation - Modules
Functors: “Real World OCaml” Chapter 9 - Yaron Minsky
Abstract Types: “OCaml Programming” Chapter 5.7 - Cornell CS3110
First-Class Modules: “Real World OCaml” Chapter 10 - Yaron Minsky

Difficulty

Advanced

Time estimate

2-3 weeks

Prerequisites

Projects 1-4 completed, understanding of module basics

Real world outcome

(* Define a module for ordering integers *)
module IntOrder : OrderedType with type t = int = struct
  type t = int
  let compare = Int.compare
end

(* Create a Set of integers using the functor *)
module IntSet = MySet.Make(IntOrder)

(* Use it *)
let s = IntSet.empty
let s = IntSet.add 5 s
let s = IntSet.add 3 s
let s = IntSet.add 7 s
let () = IntSet.iter (Printf.printf "%d\n") s  (* prints 3 5 7 *)

(* Create a Map with string keys *)
module StringMap = MyMap.Make(String)

let m = StringMap.empty
let m = StringMap.add "alice" 30 m
let m = StringMap.add "bob" 25 m
let () = match StringMap.find_opt "alice" m with
  | Some age -> Printf.printf "Alice is %d\n" age
  | None -> Printf.printf "Alice not found\n"

(* Priority Queue *)
module IntPQ = PriorityQueue.Make(IntOrder)
let pq = IntPQ.of_list [5; 1; 8; 3; 2]
let (min, pq) = IntPQ.pop pq  (* min = 1 *)

Implementation Hints

Signature for ordered types:

MODULE TYPE OrderedType = sig
  type t
  val compare : t -> t -> int
end

Set functor signature:

MODULE TYPE SetS = sig
  type elt
  type t

  val empty : t
  val is_empty : t -> bool
  val add : elt -> t -> t
  val remove : elt -> t -> t
  val mem : elt -> t -> bool
  val cardinal : int -> t
  val union : t -> t -> t
  val inter : t -> t -> t
  val diff : t -> t -> t
  val iter : (elt -> unit) -> t -> unit
  val fold : (elt -> 'a -> 'a) -> t -> 'a -> 'a
  val to_list : t -> elt list
  val of_list : elt list -> t
end

Set functor implementation:

MODULE Make (Ord : OrderedType) : SetS with type elt = Ord.t = struct
  type elt = Ord.t

  (* Use a balanced tree internally *)
  type t =
    | Empty
    | Node of t * elt * t * int  (* left, value, right, height *)

  let empty = Empty
  let is_empty = function Empty -> true | _ -> false

  let height = function Empty -> 0 | Node(_, _, _, h) -> h

  let create l v r =
    let hl = height l and hr = height r in
    Node(l, v, r, 1 + max hl hr)

  (* AVL rotation *)
  let balance l v r =
    let hl = height l and hr = height r in
    IF hl > hr + 2 THEN
      MATCH l WITH
      | Node(ll, lv, lr, _) ->
          IF height ll >= height lr THEN
            create ll lv (create lr v r)  (* Right rotation *)
          ELSE
            MATCH lr WITH
            | Node(lrl, lrv, lrr, _) ->
                create (create ll lv lrl) lrv (create lrr v r)
            | Empty -> assert false
      | Empty -> assert false
    ELSE IF hr > hl + 2 THEN
      (* Symmetric *)
      ...
    ELSE
      create l v r

  let rec add x = function
    | Empty -> Node(Empty, x, Empty, 1)
    | Node(l, v, r, _) ->
        let c = Ord.compare x v in
        IF c = 0 THEN Node(l, v, r, height l, height r)
        ELSE IF c < 0 THEN balance (add x l) v r
        ELSE balance l v (add x r)

  (* ... rest of implementation ... *)
end

Map functor (key-value store):

MODULE Make (Ord : OrderedType) : MapS with type key = Ord.t = struct
  type key = Ord.t
  type 'a t = (key * 'a) tree  (* tree of key-value pairs *)

  let rec find key = function
    | Empty -> None
    | Node(l, (k, v), r, _) ->
        let c = Ord.compare key k in
        IF c = 0 THEN Some v
        ELSE IF c < 0 THEN find key l
        ELSE find key r

  (* ... similar to Set, but storing pairs ... *)
end

Critical insight: The functor pattern allows you to write an algorithm once (e.g., balanced tree operations) and use it with any type that provides the required interface (comparison). This is more powerful than generics in most languages because modules can contain types, values, and even other modules.

Learning milestones

Set.Make works → You understand basic functors
Map.Make works → You can generalize patterns
Multiple implementations share code → You understand abstraction
First-class modules work → You’re an OCaml power user

Project 6: Type-Safe Database Query Builder

File: LEARNING_OCAML.md
Main Programming Language: OCaml
Alternative Programming Languages: Haskell, F#, Scala
Coolness Level: Level 4: Hardcore Tech Flex
Business Potential: 3. The “Service & Support” Model (B2B Utility)
Difficulty: Level 4: Expert (The Systems Architect)
Knowledge Area: Type-Level Programming / DSLs
Software or Tool: Query Builder
Main Book: “Real World OCaml” by Yaron Minsky

What you’ll build

A type-safe SQL query builder where the types ensure queries are well-formed: column names must exist in the table, JOIN conditions must match types, and projections are type-checked.

Why it teaches OCaml

This project uses OCaml’s type system to catch SQL errors at compile time. You’ll learn phantom types, GADTs (Generalized Algebraic Data Types), and how to encode invariants in types.

Core challenges you’ll face

Phantom types (types that carry information but have no runtime representation) → maps to type-level programming
GADTs (constructors with different return types) → maps to advanced type safety
Type-safe string building (generate SQL without injection) → maps to embedded DSLs
Row types (represent database rows with types) → maps to record manipulation

Resources for key challenges

GADTs Tutorial: OCaml Manual - GADTs
Phantom Types: “Real World OCaml” Chapter 13 - Yaron Minsky

Key Concepts

GADTs: OCaml Manual - GADTs
Phantom Types: “OCaml Programming” Chapter 5.11 - Cornell CS3110
Embedded DSLs: “Real World OCaml” Chapter 17 - Yaron Minsky
Type-Safe Queries: Jane Street’s Bonsai library patterns

Difficulty

Expert

Time estimate

3-4 weeks

Prerequisites

Projects 1-5 completed, understanding of advanced types

Real world outcome

(* Define a table schema at the type level *)
module Users = struct
  type t = {
    id: int;
    name: string;
    email: string;
    age: int option;
  }

  let table = Table.define "users"
    [ Column.int "id"
    ; Column.string "name"
    ; Column.string "email"
    ; Column.int_nullable "age"
    ]
end

(* Type-safe query building *)
let query =
  Query.from Users.table
  |> Query.select [Users.name; Users.email]
  |> Query.where (Users.age > Some 18)
  |> Query.order_by Users.name

let sql = Query.to_sql query
(* "SELECT name, email FROM users WHERE age > 18 ORDER BY name" *)

(* This would be a compile error: *)
let bad_query =
  Query.from Users.table
  |> Query.select [Users.name; NonExistent.column]  (* Type error! *)

(* Type-safe JOINs *)
module Orders = struct
  type t = { id: int; user_id: int; amount: float }
  let table = Table.define "orders" [...]
end

let join_query =
  Query.from Users.table
  |> Query.join Orders.table ~on:(Users.id = Orders.user_id)
  |> Query.select [Users.name; Orders.amount]

Implementation Hints

Phantom types for column types:

(* The 'a phantom type tracks the column's value type *)
TYPE 'a column = {
  name: string;
  table: string;
}

(* Constructors ensure type safety *)
MODULE Column = struct
  let int name : int column = { name; table = "" }
  let string name : string column = { name; table = "" }
  let float name : float column = { name; table = "" }
end

GADT for expressions:

(* The type parameter tracks the result type of the expression *)
TYPE 'a expr =
  | Col : 'a column -> 'a expr
  | Lit : 'a -> 'a expr
  | Eq : 'a expr * 'a expr -> bool expr
  | Lt : int expr * int expr -> bool expr
  | And : bool expr * bool expr -> bool expr
  | Or : bool expr * bool expr -> bool expr
  | IsNull : 'a option expr -> bool expr

(* Type-safe comparison - only works on same types *)
let (=) a b = Eq (a, b)
let (<) a b = Lt (a, b)
let (&&) a b = And (a, b)

Query builder with type tracking:

TYPE ('table, 'result) query = {
  from: 'table Table.t;
  select: 'result selector;
  where: bool expr option;
  order_by: string list;
}

(* The selector type ensures we only select valid columns *)
TYPE 'a selector =
  | All : 'table selector
  | Cols : 'a column list -> 'a selector

SQL generation:

FUNCTION expr_to_sql : type a. a expr -> string = function
  | Col c -> c.table ^ "." ^ c.name
  | Lit x -> quote_literal x
  | Eq (a, b) -> expr_to_sql a ^ " = " ^ expr_to_sql b
  | Lt (a, b) -> expr_to_sql a ^ " < " ^ expr_to_sql b
  | And (a, b) -> "(" ^ expr_to_sql a ^ " AND " ^ expr_to_sql b ^ ")"
  | Or (a, b) -> "(" ^ expr_to_sql a ^ " OR " ^ expr_to_sql b ^ ")"
  | IsNull e -> expr_to_sql e ^ " IS NULL"

Critical insight: GADTs let each constructor have a different return type. This means Eq(int_col, int_lit) has type bool expr, but Eq(int_col, string_lit) is a compile error. The type system prevents invalid queries.

Learning milestones

Simple queries type-check → You understand phantom types
Invalid queries are compile errors → You understand GADTs
JOINs are type-safe → You can compose type constraints
Generated SQL is correct → You’ve built a real DSL

Project 7: Mini-ML Interpreter

File: LEARNING_OCAML.md
Main Programming Language: OCaml
Alternative Programming Languages: Haskell, Standard ML
Coolness Level: Level 5: Pure Magic (Super Cool)
Business Potential: 1. The “Resume Gold” (Educational/Personal Brand)
Difficulty: Level 4: Expert (The Systems Architect)
Knowledge Area: Programming Languages / Compilers
Software or Tool: ML Interpreter
Main Book: “Types and Programming Languages” by Benjamin Pierce

What you’ll build

An interpreter for a subset of ML with let-bindings, functions, pattern matching, algebraic data types, and Hindley-Milner type inference.

Why it teaches OCaml

This is the quintessential OCaml project. OCaml was literally designed for writing language implementations. You’ll implement the same features you’ve been using—it’s a beautiful recursive understanding.

Core challenges you’ll face

Lexing and parsing (turn text into AST) → maps to ocamllex/menhir
Environment-based evaluation (closures, scoping) → maps to interpreters
Type inference (Algorithm W / Hindley-Milner) → maps to unification
Algebraic data types (define and pattern match on variants) → maps to type system design
Let polymorphism (generalization at let-bindings) → maps to polymorphism

Resources for key challenges

Types and Programming Languages by Benjamin Pierce - Chapters 9, 11, 22
Crafting Interpreters in OCaml: nthomas.org
Writing an Interpreter in OCaml: dev.to/nt591

Key Concepts

Lexing with ocamllex: “OCaml Programming” Chapter 10.3 - Cornell CS3110
Parsing with Menhir: “OCaml Programming” Chapter 10.4 - Cornell CS3110
Environment-Based Evaluation: “OCaml Programming” Chapter 10.5 - Cornell CS3110
Hindley-Milner Type Inference: “Types and Programming Languages” Chapter 22 - Pierce

Difficulty

Expert

Time estimate

4-6 weeks

Prerequisites

Projects 1-5 completed, some theory of computation knowledge

Real world outcome

$ ./miniml
MiniML v0.1 - A tiny ML interpreter

> let x = 42;;
val x : int = 42

> let add = fun x y -> x + y;;
val add : int -> int -> int = <fun>

> add 3 4;;
- : int = 7

> let rec fact n = if n <= 1 then 1 else n * fact (n - 1);;
val fact : int -> int = <fun>

> fact 5;;
- : int = 120

> let rec map f = function
    | [] -> []
    | x :: xs -> f x :: map f xs;;
val map : ('a -> 'b) -> 'a list -> 'b list = <fun>

> map (fun x -> x * 2) [1; 2; 3; 4];;
- : int list = [2; 4; 6; 8]

> type option = None | Some of 'a;;
type 'a option = None | Some of 'a

> let safe_div x y = if y = 0 then None else Some (x / y);;
val safe_div : int -> int -> int option = <fun>

Implementation Hints

Abstract syntax tree:

TYPE expr =
  | Int of int
  | Bool of bool
  | Var of string
  | Let of string * expr * expr           (* let x = e1 in e2 *)
  | LetRec of string * expr * expr        (* let rec f = e1 in e2 *)
  | Fun of string * expr                  (* fun x -> e *)
  | App of expr * expr                    (* e1 e2 *)
  | If of expr * expr * expr              (* if e1 then e2 else e3 *)
  | BinOp of binop * expr * expr          (* e1 op e2 *)
  | Match of expr * (pattern * expr) list (* match e with patterns *)

TYPE pattern =
  | PVar of string                        (* x *)
  | PWild                                 (* _ *)
  | PInt of int                           (* 42 *)
  | PCons of pattern * pattern            (* p1 :: p2 *)
  | PNil                                  (* [] *)
  | PCtor of string * pattern option      (* Some x *)

TYPE value =
  | VInt of int
  | VBool of bool
  | VList of value list
  | VClosure of string * expr * env
  | VRecClosure of string * string * expr * env
  | VCtor of string * value option

TYPE env = (string * value) list

Evaluator (big-step semantics):

FUNCTION eval(env, expr):
  MATCH expr WITH
  | Int n -> VInt n
  | Bool b -> VBool b
  | Var x -> lookup x env

  | Let (x, e1, e2) ->
      let v1 = eval env e1 in
      eval ((x, v1) :: env) e2

  | Fun (x, body) ->
      VClosure (x, body, env)

  | App (e1, e2) ->
      MATCH eval env e1 WITH
      | VClosure (x, body, closure_env) ->
          let v2 = eval env e2 in
          eval ((x, v2) :: closure_env) body
      | VRecClosure (f, x, body, closure_env) ->
          let v2 = eval env e2 in
          let rec_env = (f, VRecClosure(f, x, body, closure_env)) :: closure_env in
          eval ((x, v2) :: rec_env) body
      | _ -> error "Not a function"

  | If (cond, then_, else_) ->
      MATCH eval env cond WITH
      | VBool true -> eval env then_
      | VBool false -> eval env else_
      | _ -> error "Condition must be boolean"

  | Match (e, cases) ->
      let v = eval env e in
      eval_match env v cases

Type inference (Algorithm W):

TYPE typ =
  | TInt | TBool
  | TVar of string                        (* Type variable *)
  | TArrow of typ * typ                   (* a -> b *)
  | TList of typ                          (* a list *)
  | TCtor of string * typ list            (* custom types *)

TYPE scheme = Forall of string list * typ (* Polymorphic type *)

TYPE subst = (string * typ) list          (* Type substitution *)

FUNCTION infer(env, expr):
  MATCH expr WITH
  | Int _ -> (empty_subst, TInt)
  | Bool _ -> (empty_subst, TBool)

  | Var x ->
      let scheme = lookup x env in
      (empty_subst, instantiate scheme)

  | Fun (x, body) ->
      let tv = fresh_type_var () in
      let env' = extend env x (Forall([], tv)) in
      let (s, t) = infer env' body in
      (s, TArrow(apply s tv, t))

  | App (e1, e2) ->
      let (s1, t1) = infer env e1 in
      let (s2, t2) = infer (apply_env s1 env) e2 in
      let tv = fresh_type_var () in
      let s3 = unify (apply s2 t1) (TArrow(t2, tv)) in
      (compose s3 (compose s2 s1), apply s3 tv)

  | Let (x, e1, e2) ->
      let (s1, t1) = infer env e1 in
      let env' = apply_env s1 env in
      let scheme = generalize env' t1 in
      let (s2, t2) = infer (extend env' x scheme) e2 in
      (compose s2 s1, t2)

Unification:

FUNCTION unify(t1, t2):
  MATCH (t1, t2) WITH
  | (TInt, TInt) -> empty_subst
  | (TBool, TBool) -> empty_subst
  | (TVar a, t) -> bind a t
  | (t, TVar a) -> bind a t
  | (TArrow(a1, b1), TArrow(a2, b2)) ->
      let s1 = unify a1 a2 in
      let s2 = unify (apply s1 b1) (apply s1 b2) in
      compose s2 s1
  | (TList t1, TList t2) -> unify t1 t2
  | _ -> error "Type mismatch"

FUNCTION bind(a, t):
  IF t = TVar a THEN empty_subst
  ELSE IF occurs a t THEN error "Infinite type"
  ELSE [(a, t)]

Critical insight: Type inference is constraint solving. Each expression generates constraints (e.g., “the argument type must match the function’s parameter type”), and unification solves them. Let-polymorphism allows types to be generalized at let-bindings but not inside function bodies.

Learning milestones

Basic expressions evaluate → You understand interpreters
Functions and closures work → You understand lexical scoping
Type inference works → You understand Hindley-Milner
Pattern matching works → You’ve built a real language

Project 8: Async Web Crawler with Lwt

File: LEARNING_OCAML.md
Main Programming Language: OCaml
Alternative Programming Languages: Haskell (async), Rust (tokio), Go
Coolness Level: Level 3: Genuinely Clever
Business Potential: 2. The “Micro-SaaS / Pro Tool” (Solo-Preneur Potential)
Difficulty: Level 3: Advanced (The Engineer)
Knowledge Area: Concurrency / Networking
Software or Tool: Web Crawler
Main Book: “Real World OCaml” by Yaron Minsky

What you’ll build

A concurrent web crawler using Lwt (or Eio for OCaml 5) that fetches pages in parallel, extracts links, and respects rate limits.

Why it teaches OCaml

Lwt introduces you to monadic concurrency in OCaml. You’ll learn how cooperative multitasking works without threads, and how to structure asynchronous code using promises and combinators.

Core challenges you’ll face

Promise-based concurrency (Lwt.t monad) → maps to cooperative multitasking
HTTP client usage (cohttp-lwt) → maps to library integration
Link extraction (parsing HTML) → maps to practical parsing
Rate limiting (respect robots.txt, limit concurrent requests) → maps to resource management
Duplicate detection (don’t revisit URLs) → maps to state management

Resources for key challenges

Lwt Manual: github.com/ocsigen/lwt
Eio 1.0: Tarides Blog - Eio Release

Key Concepts

Lwt Promises: “OCaml Programming” Chapter 8.7 - Cornell CS3110
Concurrent Programming: “Real World OCaml” Chapter 18 - Yaron Minsky
HTTP Clients: cohttp-lwt documentation
HTML Parsing: lambdasoup library

Difficulty

Advanced

Time estimate

2-3 weeks

Prerequisites

Projects 1-3 completed, understanding of async concepts

Real world outcome

$ ./crawler https://example.com --depth=2 --concurrency=10
Starting crawl from https://example.com
Max depth: 2
Concurrent requests: 10

[1/1] Fetched https://example.com (200 OK, 1256 bytes)
  Found 15 links

[2/15] Fetching https://example.com/about ...
[3/15] Fetching https://example.com/products ...
[4/15] Fetching https://example.com/contact ...
... (10 requests in parallel) ...

[16/42] Depth 2: Fetching https://example.com/products/widget ...

Crawl complete!
  Pages fetched: 42
  Total bytes: 1,234,567
  Time elapsed: 3.2s
  Effective rate: 13.1 pages/second

$ ./crawler https://example.com --output=sitemap.json
Wrote sitemap with 42 pages to sitemap.json

Implementation Hints

Lwt basics:

(* Lwt.t is a promise - a value that will be available later *)
(* bind (>>=) chains async operations *)
(* return wraps a value in a resolved promise *)

(* Fetch a URL and return its body *)
FUNCTION fetch(url):
  Cohttp_lwt_unix.Client.get (Uri.of_string url) >>= fun (response, body) ->
  Cohttp_lwt.Body.to_string body >>= fun body_string ->
  Lwt.return (response, body_string)

(* Using let%lwt syntax (with lwt_ppx) *)
FUNCTION fetch(url):
  let%lwt (response, body) = Cohttp_lwt_unix.Client.get (Uri.of_string url) in
  let%lwt body_string = Cohttp_lwt.Body.to_string body in
  Lwt.return (response, body_string)

Crawler structure:

TYPE crawl_state = {
  visited: StringSet.t;
  pending: string Queue.t;
  results: (string * int * string) list;  (* url, status, title *)
}

FUNCTION crawl(start_url, max_depth, concurrency):
  let state = ref { visited = StringSet.empty; ... } in

  let rec worker () =
    MATCH Queue.take_opt state.pending WITH
    | None -> Lwt.return ()
    | Some (url, depth) ->
        IF depth > max_depth OR StringSet.mem url state.visited THEN
          worker ()
        ELSE
          state := { !state with visited = StringSet.add url !state.visited };
          let%lwt result = fetch url in
          process_result state result depth;
          worker ()
  in

  (* Start multiple workers *)
  Queue.push (start_url, 0) state.pending;
  Lwt.join (List.init concurrency (fun _ -> worker ()))

Parallel fetching with semaphore:

(* Limit concurrent requests *)
let semaphore = Lwt_semaphore.create concurrency

FUNCTION fetch_limited(url):
  Lwt_semaphore.with_semaphore semaphore (fun () ->
    fetch url
  )

(* Fetch multiple URLs in parallel *)
FUNCTION fetch_all(urls):
  Lwt.all (List.map fetch_limited urls)

Link extraction using lambdasoup:

FUNCTION extract_links(base_url, html):
  let soup = Soup.parse html in
  soup $$ "a[href]"
  |> Soup.to_list
  |> List.filter_map (fun a ->
      Soup.attribute "href" a
      |> Option.map (resolve_url base_url)
    )
  |> List.filter is_same_domain

Critical insight: Lwt uses cooperative multitasking. Each >>= is a potential yield point where other promises can make progress. This means you get concurrency without threads, but you must be careful not to block (use Lwt_unix.sleep instead of Unix.sleep).

Learning milestones

Single-threaded fetch works → You understand Lwt basics
Parallel fetching works → You understand Lwt.join
Rate limiting works → You understand semaphores
Full crawler is robust → You can build async systems

Project 9: OCaml 5 Effects: Algebraic Effect Handlers

File: LEARNING_OCAML.md
Main Programming Language: OCaml
Alternative Programming Languages: (Effect handlers are rare - Eff, Koka, Multicore OCaml)
Coolness Level: Level 5: Pure Magic (Super Cool)
Business Potential: 1. The “Resume Gold” (Educational/Personal Brand)
Difficulty: Level 4: Expert (The Systems Architect)
Knowledge Area: Advanced Language Features / Concurrency
Software or Tool: Effect-Based Concurrency
Main Book: OCaml 5 Effects Tutorial (GitHub)

What you’ll build

A series of programs demonstrating OCaml 5’s algebraic effect handlers: build your own async/await, implement cooperative threading, create resumable exceptions, and build a state monad without monads.

Why it teaches OCaml

OCaml 5 is the first mainstream language with algebraic effect handlers. Effects are more fundamental than monads—they can express exceptions, async/await, generators, and more, all with a unified mechanism.

Core challenges you’ll face

Performing effects (declare and raise effects) → maps to effect declaration
Handling effects (intercept and respond to effects) → maps to delimited continuations
Resumable effects (unlike exceptions, you can resume) → maps to continuation management
Building async/await (effects + fibers) → maps to concurrency primitives

Resources for key challenges

OCaml 5 Tutorial: github.com/ocaml-multicore/ocaml5-tutorial
Effects Tutorial: github.com/ocaml-multicore/ocaml-effects-tutorial
Eio Library: github.com/ocaml-multicore/eio

Key Concepts

Algebraic Effects: OCaml Effects Tutorial
Delimited Continuations: “OCaml 5 Effects Tutorial” - ocaml-multicore
Eio for Direct-Style I/O: Eio 1.0 Blog Post
Domains and Parallelism: OCaml 5 Parallel Tutorial

Difficulty

Expert

Time estimate

2-3 weeks

Prerequisites

Projects 1-7 completed, understanding of continuations helpful

Real world outcome

(* Example 1: State without monad *)
effect Get : int
effect Set : int -> unit

let state_handler init f =
  let rec loop s =
    match f () with
    | x -> x
    | effect Get k -> loop s; continue k s
    | effect (Set s') k -> loop s'; continue k ()
  in
  loop init

let example () =
  let x = perform Get in
  perform (Set (x + 1));
  let y = perform Get in
  x + y

let () = Printf.printf "%d\n" (state_handler 10 example)
(* Prints: 21 *)

(* Example 2: Cooperative threading *)
effect Yield : unit
effect Fork : (unit -> unit) -> unit

let scheduler main =
  let queue = Queue.create () in
  let rec spawn f =
    match f () with
    | () -> next ()
    | effect Yield k ->
        Queue.push k queue;
        next ()
    | effect (Fork f) k ->
        Queue.push k queue;
        spawn f
  and next () =
    match Queue.take_opt queue with
    | None -> ()
    | Some k -> continue k ()
  in
  spawn main

(* Example 3: Generators/iterators *)
effect Yield : int -> unit

let generator f =
  let rec loop () =
    match f () with
    | () -> None
    | effect (Yield v) k ->
        Some (v, fun () -> continue k ())
  in
  loop

let count_up_to n () =
  for i = 1 to n do
    perform (Yield i)
  done

let () =
  let gen = generator (count_up_to 5) in
  let rec consume g =
    match g () with
    | None -> ()
    | Some (v, next) ->
        Printf.printf "%d\n" v;
        consume next
  in
  consume gen
(* Prints: 1 2 3 4 5 *)

Implementation Hints

Basic effect declaration and handling:

(* OCaml 5 syntax *)
(* Declare an effect *)
type _ Effect.t += MyEffect : int -> string Effect.t

(* Perform an effect *)
let result = Effect.perform (MyEffect 42)

(* Handle an effect *)
let handler f =
  Effect.Deep.try_with f ()
    { effc = fun (type a) (eff : a Effect.t) ->
        match eff with
        | MyEffect n ->
            Some (fun (k : (a, _) Effect.Deep.continuation) ->
              (* k is the continuation - we can resume it *)
              Effect.Deep.continue k ("Got: " ^ string_of_int n)
            )
        | _ -> None  (* Re-raise other effects *)
    }

Building async/await:

TYPE 'a promise = {
  mutable state: 'a promise_state
}
AND 'a promise_state =
  | Pending of ('a -> unit) list  (* waiters *)
  | Resolved of 'a
  | Rejected of exn

EFFECT Await : 'a promise -> 'a

FUNCTION async(f):
  let p = { state = Pending [] } in
  (* Schedule f to run, will resolve p when done *)
  spawn (fun () ->
    try
      let result = f () in
      resolve p result
    with e ->
      reject p e
  );
  p

FUNCTION await(p):
  perform (Await p)

FUNCTION scheduler():
  (* Handle Await by suspending until promise resolves *)
  match f () with
  | result -> result
  | effect (Await p) k ->
      match p.state with
      | Resolved v -> continue k v
      | Rejected e -> discontinue k e
      | Pending waiters ->
          p.state <- Pending ((fun v -> continue k v) :: waiters);
          run_next ()  (* Run other fibers *)

Transactional memory with effects:

EFFECT Read : 'a ref -> 'a
EFFECT Write : 'a ref * 'a -> unit
EFFECT Abort : unit

FUNCTION transactional(f):
  let log = ref [] in
  match f () with
  | result ->
      (* Commit: apply all writes *)
      List.iter (fun (r, v) -> r := v) !log;
      Some result
  | effect (Read r) k ->
      (* Read from log or actual ref *)
      let v = match assoc_opt r !log with
        | Some v -> v
        | None -> !r
      in
      continue k v
  | effect (Write (r, v)) k ->
      (* Record write in log, don't apply yet *)
      log := (r, v) :: !log;
      continue k ()
  | effect Abort _ ->
      (* Discard log, return None *)
      None

Critical insight: Effects are like resumable exceptions. When you perform an effect, control transfers to the handler. But unlike exceptions, the handler can continue with the continuation, resuming exactly where the effect was performed. This is incredibly powerful for building control structures.

Learning milestones

Basic effects work → You understand perform/continue
State effect works → You understand stateful handlers
Cooperative threading works → You understand scheduling
Async/await works → You can build real concurrency

Project 10: OCaml 5 Multicore Parallel Ray Tracer

File: LEARNING_OCAML.md
Main Programming Language: OCaml
Alternative Programming Languages: Rust, C++, Go
Coolness Level: Level 5: Pure Magic (Super Cool)
Business Potential: 1. The “Resume Gold” (Educational/Personal Brand)
Difficulty: Level 4: Expert (The Systems Architect)
Knowledge Area: Parallelism / Graphics
Software or Tool: Ray Tracer
Main Book: “Ray Tracing in One Weekend” + OCaml 5 Parallelism Tutorial

What you’ll build

A parallel ray tracer using OCaml 5’s domains (parallel execution) and domainslib for work-stealing parallelism. Render complex 3D scenes using multiple CPU cores.

Why it teaches OCaml

Ray tracing is embarrassingly parallel—each pixel can be computed independently. This makes it perfect for learning OCaml 5’s new parallelism features without complex synchronization.

Core challenges you’ll face

Domains (parallel execution contexts) → maps to multi-core parallelism
Work-stealing (domainslib task pools) → maps to load balancing
Parallel iteration (parallel_for) → maps to data parallelism
Ray-sphere intersection (3D math) → maps to functional numerics
Recursive ray bouncing (reflections, refractions) → maps to recursive parallelism

Resources for key challenges

Ray Tracing in One Weekend: raytracing.github.io
OCaml 5 Parallelism: github.com/ocaml-multicore/ocaml5-tutorial
Domainslib: github.com/ocaml-multicore/domainslib

Key Concepts

Domains: OCaml 5 Tutorial
Domainslib: Parallel Programming in Multicore OCaml
Vector Math: “Ray Tracing in One Weekend” - Peter Shirley
Rendering Equation: “Ray Tracing in One Weekend” Chapter 9

Difficulty

Expert

Time estimate

3-4 weeks

Prerequisites

Projects 1-5 completed, basic 3D math understanding

Real world outcome

$ ./raytracer --width=1920 --height=1080 --samples=100 --cores=8
Rendering 1920x1080 image with 100 samples/pixel
Using 8 parallel domains (work-stealing enabled)

[========================================] 100%

Render complete!
  Total rays: 207,360,000
  Time: 12.3s
  Rays/second: 16,851,219
  Speedup vs 1 core: 6.8x

Saved to output.ppm

$ ./raytracer --scene=cornell-box --samples=1000
Rendering Cornell Box scene...
Features: diffuse, specular, emission, glass
...

Implementation Hints

Vector type (immutable):

TYPE vec3 = { x: float; y: float; z: float }

let vec x y z = { x; y; z }
let add a b = { x = a.x +. b.x; y = a.y +. b.y; z = a.z +. b.z }
let sub a b = { x = a.x -. b.x; y = a.y -. b.y; z = a.z -. b.z }
let mul a b = { x = a.x *. b.x; y = a.y *. b.y; z = a.z *. b.z }
let scale t v = { x = t *. v.x; y = t *. v.y; z = t *. v.z }
let dot a b = a.x *. b.x +. a.y *. b.y +. a.z *. b.z
let length v = sqrt (dot v v)
let normalize v = scale (1.0 /. length v) v

Ray and sphere intersection:

TYPE ray = { origin: vec3; direction: vec3 }
TYPE sphere = { center: vec3; radius: float; material: material }

FUNCTION hit_sphere(ray, sphere, t_min, t_max):
  let oc = sub ray.origin sphere.center in
  let a = dot ray.direction ray.direction in
  let half_b = dot oc ray.direction in
  let c = dot oc oc -. sphere.radius *. sphere.radius in
  let discriminant = half_b *. half_b -. a *. c in
  IF discriminant < 0.0 THEN None
  ELSE
    let sqrtd = sqrt discriminant in
    let root = (-.half_b -. sqrtd) /. a in
    IF root < t_min || root > t_max THEN
      let root = (-.half_b +. sqrtd) /. a in
      IF root < t_min || root > t_max THEN None
      ELSE Some (hit_record ray root sphere)
    ELSE Some (hit_record ray root sphere)

Recursive ray tracing:

FUNCTION ray_color(ray, world, depth):
  IF depth <= 0 THEN vec 0.0 0.0 0.0  (* Max bounces *)
  ELSE
    MATCH hit_anything world ray 0.001 infinity WITH
    | None -> sky_color ray
    | Some hit ->
        MATCH scatter hit.material ray hit WITH
        | None -> vec 0.0 0.0 0.0  (* Absorbed *)
        | Some (attenuation, scattered) ->
            mul attenuation (ray_color scattered world (depth - 1))

Parallel rendering with domainslib:

let render_parallel width height samples world camera =
  let pool = Task.setup_pool ~num_domains:7 () in  (* 7 workers + main *)
  let image = Array.make_matrix height width (vec 0.0 0.0 0.0) in

  Task.run pool (fun () ->
    Task.parallel_for pool ~start:0 ~finish:(height - 1)
      ~body:(fun y ->
        for x = 0 to width - 1 do
          let color = ref (vec 0.0 0.0 0.0) in
          for _ = 1 to samples do
            let u = (float x +. Random.float 1.0) /. float (width - 1) in
            let v = (float y +. Random.float 1.0) /. float (height - 1) in
            let ray = Camera.get_ray camera u v in
            color := add !color (ray_color ray world 50)
          done;
          image.(y).(x) <- scale (1.0 /. float samples) !color
        done
      )
  );

  Task.teardown_pool pool;
  image

Critical insight: OCaml 5’s domains are true OS threads with parallel execution. Domainslib’s work-stealing scheduler dynamically balances work across domains. For ray tracing, this means rows that take longer (complex geometry) automatically get redistributed.

Learning milestones

Single-threaded ray tracer works → You understand ray tracing math
Parallel rendering scales → You understand domains
Work-stealing improves load balance → You understand domainslib
Final images are beautiful → You’ve built something cool

Project 11: Compiler Backend: OCaml to Stack Machine

File: LEARNING_OCAML.md
Main Programming Language: OCaml
Alternative Programming Languages: Haskell, Standard ML, Rust
Coolness Level: Level 5: Pure Magic (Super Cool)
Business Potential: 1. The “Resume Gold” (Educational/Personal Brand)
Difficulty: Level 5: Master (The First-Principles Wizard)
Knowledge Area: Compilers / Code Generation
Software or Tool: Compiler Backend
Main Book: “Modern Compiler Implementation in ML” by Andrew Appel

What you’ll build

A compiler that takes your Mini-ML interpreter (Project 7) and adds a bytecode compilation backend—generate stack-machine instructions and execute them in a virtual machine.

Why it teaches OCaml

This is where OCaml truly shines. You’ll see why pattern matching and algebraic types make compiler writing natural. The compiler itself and the VM are both beautiful OCaml code.

Core challenges you’ll face

Bytecode design (instruction set for a stack machine) → maps to VM architecture
Code generation (translate AST to bytecode) → maps to tree-to-linear transformation
Closure conversion (eliminate free variables) → maps to lambda lifting
Virtual machine (execute bytecode) → maps to interpreter optimization
Tail call optimization (compile tail calls efficiently) → maps to control flow

Resources for key challenges

Modern Compiler Implementation in ML by Andrew Appel - Chapters 6-8
Crafting Interpreters - Part III (Bytecode VM)

Key Concepts

Stack Machines: “Modern Compiler Implementation in ML” Chapter 6 - Appel
Closure Conversion: “Modern Compiler Implementation in ML” Chapter 15 - Appel
Bytecode VMs: “Crafting Interpreters” Part III - Bob Nystrom
Tail Call Optimization: “OCaml Programming” Chapter 8.3 - Cornell CS3110

Difficulty

Master

Time estimate

4-6 weeks

Prerequisites

Project 7 (Mini-ML Interpreter) completed

Real world outcome

$ ./miniml-compile factorial.ml
Compiling factorial.ml...
Generated 45 bytecode instructions

$ ./miniml-compile --show-bytecode factorial.ml
; factorial.ml bytecode
0000: PUSH_INT 1
0001: STORE_LOCAL 0      ; n
0002: LOAD_LOCAL 0
0003: PUSH_INT 1
0004: LE
0005: JUMP_IF_FALSE 9
0006: PUSH_INT 1
0007: RETURN
0008: JUMP 18
0009: LOAD_LOCAL 0
0010: LOAD_LOCAL 0
0011: PUSH_INT 1
0012: SUB
0013: LOAD_GLOBAL "fact"
0014: CALL 1
0015: MUL
0016: RETURN

$ ./miniml-vm factorial.mlc
Running factorial.mlc...
> fact 10
3628800
> fact 20
2432902008176640000
Executed 842 instructions in 0.001ms

Implementation Hints

Bytecode instruction set:

TYPE instruction =
  (* Stack manipulation *)
  | PUSH_INT of int
  | PUSH_BOOL of bool
  | PUSH_UNIT
  | POP

  (* Local variables (stack-relative) *)
  | LOAD_LOCAL of int      (* offset from frame pointer *)
  | STORE_LOCAL of int

  (* Global variables *)
  | LOAD_GLOBAL of string
  | STORE_GLOBAL of string

  (* Arithmetic *)
  | ADD | SUB | MUL | DIV | MOD
  | EQ | NE | LT | LE | GT | GE
  | AND | OR | NOT

  (* Control flow *)
  | JUMP of int            (* absolute address *)
  | JUMP_IF_FALSE of int
  | CALL of int            (* num args *)
  | RETURN
  | TAIL_CALL of int       (* optimized tail call *)

  (* Closures *)
  | MAKE_CLOSURE of int * int list  (* code addr, captured vars *)
  | LOAD_FREE of int       (* access captured variable *)

Compiler structure:

(* Compilation environment *)
TYPE compile_env = {
  locals: (string * int) list;   (* name -> stack offset *)
  frees: (string * int) list;    (* name -> closure index *)
  depth: int;                    (* current stack depth *)
}

FUNCTION compile(env, expr):
  MATCH expr WITH
  | Int n -> [PUSH_INT n]

  | Var x ->
      MATCH lookup_local x env WITH
      | Some offset -> [LOAD_LOCAL offset]
      | None ->
          MATCH lookup_free x env WITH
          | Some index -> [LOAD_FREE index]
          | None -> [LOAD_GLOBAL x]

  | BinOp (op, e1, e2) ->
      compile env e1 @
      compile (incr_depth env) e2 @
      [compile_op op]

  | If (cond, then_, else_) ->
      let cond_code = compile env cond in
      let then_code = compile env then_ in
      let else_code = compile env else_ in
      cond_code @
      [JUMP_IF_FALSE (length then_code + 1)] @
      then_code @
      [JUMP (length else_code)] @
      else_code

  | Fun (x, body) ->
      (* Closure conversion: find free variables *)
      let free_vars = free_variables expr in
      let body_env = {
        locals = [(x, 0)];
        frees = List.mapi (fun i v -> (v, i)) free_vars;
        depth = 1;
      } in
      let body_code = compile body_env body @ [RETURN] in
      let code_addr = emit_code body_code in
      let capture_instrs = List.map (fun v ->
        MATCH lookup_local v env WITH
        | Some off -> LOAD_LOCAL off
        | None -> LOAD_FREE (lookup_free v env)
      ) free_vars in
      capture_instrs @ [MAKE_CLOSURE (code_addr, length free_vars)]

  | App (fn, arg) ->
      compile env arg @
      compile (incr_depth env) fn @
      [CALL 1]

Virtual machine:

TYPE vm_state = {
  mutable stack: value array;
  mutable sp: int;              (* stack pointer *)
  mutable fp: int;              (* frame pointer *)
  mutable pc: int;              (* program counter *)
  code: instruction array;
  globals: (string, value) Hashtbl.t;
}

FUNCTION run(vm):
  WHILE vm.pc < length vm.code DO
    let instr = vm.code.(vm.pc) in
    vm.pc <- vm.pc + 1;

    MATCH instr WITH
    | PUSH_INT n ->
        push vm (VInt n)

    | LOAD_LOCAL offset ->
        push vm (vm.stack.(vm.fp + offset))

    | ADD ->
        let b = pop_int vm in
        let a = pop_int vm in
        push vm (VInt (a + b))

    | JUMP_IF_FALSE addr ->
        IF not (pop_bool vm) THEN
          vm.pc <- addr

    | CALL nargs ->
        let closure = pop vm in
        MATCH closure WITH
        | VClosure (code_addr, captured) ->
            (* Save return address and frame pointer *)
            push vm (VReturnAddr vm.pc);
            push vm (VFramePtr vm.fp);
            vm.fp <- vm.sp - nargs - 2;
            (* Push captured variables *)
            List.iter (push vm) captured;
            vm.pc <- code_addr
        | _ -> error "Not a closure"

    | RETURN ->
        let result = pop vm in
        vm.sp <- vm.fp;
        let old_fp = pop_frame_ptr vm in
        let ret_addr = pop_return_addr vm in
        vm.fp <- old_fp;
        vm.pc <- ret_addr;
        push vm result

    | TAIL_CALL nargs ->
        (* Reuse current stack frame *)
        let closure = pop vm in
        (* Move args to current frame position *)
        ...
        vm.pc <- closure.code_addr
  DONE

Critical insight: The key insight is closure conversion—transforming functions with free variables into closures that explicitly capture those variables. The MAKE_CLOSURE instruction creates a closure value containing the code address and captured variables.

Learning milestones

Simple expressions compile → You understand stack machines
Functions and closures compile → You understand closure conversion
Recursion works → You understand call frames
Tail call optimization works → You’ve built an efficient compiler

Project 12: Full-Featured Build System (Mini-Dune)

File: LEARNING_OCAML.md
Main Programming Language: OCaml
Alternative Programming Languages: Rust, Go
Coolness Level: Level 4: Hardcore Tech Flex
Business Potential: 4. The “Open Core” Infrastructure
Difficulty: Level 5: Master (The First-Principles Wizard)
Knowledge Area: Build Systems / Developer Tools
Software or Tool: Build System
Main Book: “The GNU Make Book” + Dune source code

What you’ll build

A simplified build system inspired by Dune, with dependency tracking, incremental rebuilds, parallel execution, and sandboxing.

Why it teaches OCaml

Build systems are perfect for OCaml: they’re essentially dependency graph evaluators with complex rules. You’ll use every feature of OCaml: ADTs for rules, modules for organization, Lwt/Eio for parallelism, and effects for the build monad.

Core challenges you’ll face

Dependency DAG (build order from dependencies) → maps to graph algorithms
Incremental builds (track file hashes, rebuild minimally) → maps to memoization
Parallel execution (build independent targets concurrently) → maps to work-stealing
Rule evaluation (execute build commands) → maps to process management
Sandboxing (isolate build actions) → maps to filesystem virtualization

Key Concepts

Build Systems à la Carte: Microsoft Research Paper
Dune Internals: github.com/ocaml/dune
Incremental Computation: Jane Street’s Incremental library
DAG Topological Sort: “Algorithms” Chapter 4.2 - Sedgewick

Difficulty

Master

Time estimate

6-8 weeks

Prerequisites

All previous projects, understanding of build systems

Real world outcome

$ ./minidune build
Reading build rules from dune file...
Dependency graph: 45 nodes, 78 edges

[1/45] Compiling src/utils.ml
[2/45] Compiling src/parser.ml
[3/45] Compiling src/lexer.ml
... (running 8 jobs in parallel) ...
[45/45] Linking bin/main.exe

Build successful in 3.2s
  Targets built: 45
  Targets cached: 0
  Parallelism: 8 cores

$ ./minidune build  # Second run
Checking dependencies...
All targets up-to-date (cached).
Build successful in 0.05s

$ touch src/parser.ml && ./minidune build
Invalidated: src/parser.ml (modified)
Rebuilding 3 targets...
[1/3] Compiling src/parser.ml
[2/3] Compiling src/main.ml (depends on parser)
[3/3] Linking bin/main.exe
Build successful in 0.8s

Implementation Hints

Build rule representation:

TYPE target =
  | File of string
  | Alias of string
  | Phony of string

TYPE rule = {
  targets: target list;
  deps: target list;
  action: action;
}

TYPE action =
  | Run of string list         (* command and args *)
  | Write_file of string * string
  | Copy of string * string
  | Mkdir of string
  | Sequence of action list
  | Concurrent of action list  (* independent actions *)

Dependency graph and topological sort:

TYPE build_graph = {
  rules: rule StringMap.t;        (* target -> rule *)
  rdeps: StringSet.t StringMap.t; (* reverse dependencies *)
}

FUNCTION build_order(graph, targets):
  (* Kahn's algorithm for topological sort *)
  let in_degree = compute_in_degrees graph targets in
  let queue = targets with in_degree = 0 in
  let result = [] in

  WHILE queue not empty:
    let node = dequeue queue in
    append node to result
    FOR EACH successor IN graph.rdeps[node]:
      decrement in_degree[successor]
      IF in_degree[successor] = 0:
        enqueue successor

  IF length result != expected THEN
    error "Cycle detected"
  RETURN result

Incremental build with file hashing:

TYPE build_state = {
  file_hashes: (string, string) Hashtbl.t;  (* path -> hash *)
  rule_hashes: (string, string) Hashtbl.t;  (* target -> combined hash *)
}

FUNCTION needs_rebuild(state, target, rule):
  (* Check if any dependency is newer or changed *)
  let current_hash = compute_rule_hash rule in
  MATCH Hashtbl.find_opt state.rule_hashes target WITH
  | None -> true  (* Never built *)
  | Some old_hash ->
      IF current_hash != old_hash THEN true
      ELSE
        (* Check actual file hashes *)
        List.exists (fun dep ->
          let old = Hashtbl.find_opt state.file_hashes dep in
          let new_ = hash_file dep in
          old != Some new_
        ) rule.deps

FUNCTION compute_rule_hash(rule):
  (* Hash of: action + all dependency hashes *)
  let dep_hashes = List.map hash_file rule.deps in
  let action_hash = hash (show_action rule.action) in
  hash (action_hash :: dep_hashes)

Parallel execution with work stealing:

FUNCTION build_parallel(graph, targets, num_workers):
  let pool = Domainslib.Task.setup_pool ~num_domains:(num_workers - 1) () in
  let state = load_or_create_state () in

  (* Find all ready targets (deps already built) *)
  let ready = ref (initial_ready_set graph targets) in
  let building = ref StringSet.empty in
  let done_ = ref StringSet.empty in

  Domainslib.Task.run pool (fun () ->
    WHILE not (StringSet.is_empty !ready && StringSet.is_empty !building):
      (* Spawn tasks for ready targets *)
      let batch = StringSet.elements !ready in
      ready := StringSet.empty;
      building := StringSet.union !building (StringSet.of_list batch);

      Domainslib.Task.parallel_for pool ~start:0 ~finish:(length batch - 1)
        ~body:(fun i ->
          let target = batch.(i) in
          build_target state target;
        );

      (* Move completed to done, find newly ready *)
      done_ := StringSet.union !done_ !building;
      building := StringSet.empty;
      ready := find_newly_ready graph !done_;
  );

  save_state state;
  Domainslib.Task.teardown_pool pool

Critical insight: The key innovation in modern build systems like Dune is content-addressed caching. Instead of tracking modification times, hash the content and action. This enables sharing cached results across machines and detecting when a rebuild is truly necessary.

Learning milestones

Basic builds work → You understand DAG evaluation
Incremental builds work → You understand caching
Parallel builds scale → You understand work distribution
Sandboxing isolates actions → You’ve built a production tool

Project Comparison Table

Project	Difficulty	Time	Depth of Understanding	Fun Factor
1. CLI Calculator	Beginner-Intermediate	1 week	⭐⭐⭐	⭐⭐⭐⭐
2. JSON Parser	Intermediate	1-2 weeks	⭐⭐⭐	⭐⭐⭐
3. Markdown Converter	Intermediate	2 weeks	⭐⭐⭐	⭐⭐⭐
4. Immutable Data Structures	Advanced	3-4 weeks	⭐⭐⭐⭐⭐	⭐⭐⭐⭐
5. Functors Library	Advanced	2-3 weeks	⭐⭐⭐⭐⭐	⭐⭐⭐
6. Type-Safe Query Builder	Expert	3-4 weeks	⭐⭐⭐⭐⭐	⭐⭐⭐⭐
7. Mini-ML Interpreter	Expert	4-6 weeks	⭐⭐⭐⭐⭐	⭐⭐⭐⭐⭐
8. Async Web Crawler	Advanced	2-3 weeks	⭐⭐⭐⭐	⭐⭐⭐⭐
9. OCaml 5 Effects	Expert	2-3 weeks	⭐⭐⭐⭐⭐	⭐⭐⭐⭐⭐
10. Parallel Ray Tracer	Expert	3-4 weeks	⭐⭐⭐⭐	⭐⭐⭐⭐⭐
11. Compiler Backend	Master	4-6 weeks	⭐⭐⭐⭐⭐	⭐⭐⭐⭐⭐
12. Build System	Master	6-8 weeks	⭐⭐⭐⭐⭐	⭐⭐⭐⭐

Recommendation

If you’re new to OCaml and functional programming:

Start with Project 1 (Calculator) and Project 2 (JSON Parser). These teach the core concepts—algebraic types, pattern matching, recursion—without overwhelming complexity. You’ll immediately feel the power of OCaml.

If you want to understand OCaml’s unique strengths:

Do Project 5 (Functors) and Project 7 (Mini-ML Interpreter). Functors show you why OCaml’s module system is unmatched. The interpreter teaches you why OCaml is the language of choice for programming language research.

If you’re interested in production OCaml:

Focus on Project 8 (Async Crawler) for real-world async programming, and Project 9 (Effects) + Project 10 (Multicore) for cutting-edge OCaml 5 features.

If you want to go deep:

Project 11 (Compiler Backend) is the crown jewel. Extending your interpreter with compilation teaches you how real programming languages work.

If you have 6+ months:

Work through all projects in order. By the end, you’ll be comfortable writing production OCaml at a level competitive with Jane Street engineers.

Summary

#	Project	Main Language
1	CLI Calculator with Expression Parser	OCaml
2	JSON Parser and Manipulator	OCaml
3	Markdown to HTML Converter	OCaml
4	Immutable Data Structures Library	OCaml
5	Generic Collections with Functors	OCaml
6	Type-Safe Database Query Builder	OCaml
7	Mini-ML Interpreter	OCaml
8	Async Web Crawler with Lwt	OCaml
9	OCaml 5 Effects: Algebraic Effect Handlers	OCaml
10	OCaml 5 Multicore Parallel Ray Tracer	OCaml
11	Compiler Backend: OCaml to Stack Machine	OCaml
12	Full-Featured Build System (Mini-Dune)	OCaml