← Back to all projects

LEARN SML DEEP DIVE

Learn Standard ML: From Zero to Type-Safe Systems Programmer

Goal: Master the Standard ML (SML) language to deeply understand the power of a statically-typed functional language with type inference, a world-class module system, and rigorous pattern matching. Learn a new way to reason about software that emphasizes correctness and robust design.


Why Learn Standard ML?

Standard ML is not just a language; it’s a meticulously designed tool for thinking. It belongs to the ML family of languages, which pioneered many concepts we now take for granted in modern programming, like static type inference. SML occupies a sweet spot: it’s a functional-first language that is far more rigorous than Lisp, but more pragmatic and less dogmatic about purity than Haskell.

Learning SML will fundamentally change how you view types, modules, and program correctness. It’s the language of choice for serious work in programming language theory, compilers, and formal verification for a reason.

After completing these projects, you will:

  • Write code that is “correct by construction” thanks to a powerful and sound type system.
  • Appreciate the magic of Hindley-Milner type inference, where you get all the safety of static types with almost none of the verbosity.
  • Master one of the most advanced module systems ever created, enabling true abstraction and generic programming.
  • Deconstruct complex data with exhaustive pattern matching, eliminating entire classes of bugs.
  • Understand the trade-offs between pure functional programming and principled side effects.

Core Concept Analysis

The SML Philosophy: Type Safety Meets Pragmatism

SML’s design is a masterclass in balancing theoretical purity with practical utility. It proves that a language can be both mathematically elegant and an effective tool for building real-world systems.

┌──────────────────────────────────────────────┐
│        IMPERATIVE (e.g., Python/Java)        │
│                                              │
│  You tell the computer HOW to do something.  │
│  Types are either absent (Python) or         │
│  verbose and manual (Java).                  │
│  Bugs are often found at runtime.            │
└──────────────────────────────────────────────┘
                         │
                         ▼ Paradigm Shift
┌──────────────────────────────────────────────┐
│        PURELY FUNCTIONAL (e.g., Haskell)     │
│                                              │
│  You describe what something IS, with a     │
│  strict separation from the "dirty"         │
│  outside world (I/O).                        │
└──────────────────────────────────────────────┘
                         │
                         ▼ The SML "Sweet Spot"
┌──────────────────────────────────────────────┐
│           STANDARD ML                        │
│                                              │
│  You get the safety of Haskell's type system │
│  with the practicality of allowing controlled│
│  side effects, all wrapped in a module       │
│  system that encourages building robust,     │
│  reusable, and abstract components.          │
└──────────────────────────────────────────────┘

Fundamental Concepts

  1. Static Typing with Full Type Inference: This is SML’s “magic” feature. You write code that looks as clean as a dynamic language like Python, but the compiler deduces all the types for you before the program runs. If it compiles, it is type-safe. fun id x = x; (SML infers the type val id = fn : 'a -> 'a)

  2. Algebraic Datatypes (ADTs) and Pattern Matching: You create complex data types by combining simpler ones. Pattern matching lets you deconstruct this data in a way that is exhaustive and safe. The compiler will warn you if you forget a case. datatype color = Red | Green | Blue | RGB of int*int*int; fun isGray (RGB(r,g,b)) = (r=g andalso g=b) | isGray _ = false;

  3. The Module System (Signatures, Structures, Functors): This is SML’s most powerful feature, influencing languages like Rust and OCaml.
    • Signatures: An interface that specifies the types and values a module must provide.
    • Structures: The implementation of a signature. A collection of types, values, and other modules.
    • Functors: These are “functions” that take structures as arguments and produce new structures. They are the ultimate tool for building generic, reusable code.
    ┌──────────────┐   implements ►   ┌───────────────┐
    │  SIGNATURE   │ ◄─────────────   │   STRUCTURE   │
    │ (Interface)  │                  │ (Implementation)│
    └──────┬───────┘                  └───────────────┘
           │ is input to
           │
           ▼
    ┌──────────────┐
    │   FUNCTOR    │
    │(Module-level │
    │   Function)  │
    └──────┬───────┘
           │ produces
           ▼
    ┌───────────────┐
    │   New         │
    │   STRUCTURE   │
    └───────────────┘
    
  4. Functional Core with Controlled Effects: SML is functional-first. Immutability is the default. However, it provides explicit mechanisms for side effects like I/O and mutable state (ref types), acknowledging their necessity in real-world programming.

Project List

These projects are carefully sequenced to build your understanding from basic functional programming idioms to the advanced modular programming that makes SML unique.


Project 1: Core List Utilities

  • File: LEARN_SML_DEEP_DIVE.md
  • Main Programming Language: Standard ML
  • Alternative Programming Languages: OCaml, F#
  • Coolness Level: Level 2: Practical but Forgettable
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 1: Beginner
  • Knowledge Area: Functional Programming / Core Data Structures
  • Software or Tool: Standard ML of New Jersey (SML/NJ) or MLton
  • Main Book: “Programming in Standard ML” by Robert Harper

What you’ll build: Your own implementation of the essential list processing functions from the SML Basis Library: length, append, reverse, map, filter, and foldl/foldr.

Why it teaches SML: This project is a bootcamp in the functional way of thinking. You will learn to solve problems using recursion and pattern matching instead of loops and mutation. It forces you to master the fundamental [] vs x::xs pattern that underpins all list processing in ML-family languages.

Core challenges you’ll face:

  • Writing recursive functions → maps to thinking about the base case (empty list) and the inductive step (head and tail)
  • Using pattern matching on lists → maps to the fun or case syntax for deconstructing lists
  • Ensuring your functions are polymorphic → maps to writing code that works on 'a list (a list of *anything), which SML will infer automatically*
  • Implementing higher-order functions → maps to writing map and filter, which take other functions as arguments

Key Concepts:

  • Recursion and Pattern Matching: “Programming in Standard ML”, Chapters 7 & 9 - Harper
  • Parametric Polymorphism ('a): “ML for the Working Programmer”, Chapter 2 - Paulson
  • Higher-Order Functions: “Programming in Standard ML”, Chapter 11 - Harper

Difficulty: Beginner Time estimate: Weekend Prerequisites: None. This is where you start.

Real world outcome: You will have a file of utility functions. In the SML/NJ REPL, you can load your file and see them work, with their fully inferred polymorphic types.

- use "list_utils.sml";
[opening list_utils.sml]
val my_length = fn : 'a list -> int
val my_map = fn : ('a -> 'b) -> 'a list -> 'b list
...
val it = () : unit

- my_length([1, 2, 3]);
val it = 3 : int

- my_map (fn x => x + 1) [1, 2, 3];
val it = [2, 3, 4] : int list

- my_reverse(["a", "b", "c"]);
val it = ["c", "b", "a"] : string list

Implementation Hints:

  1. length: The length of an empty list [] is 0. The length of a list x::xs is 1 + the length of xs.
  2. reverse: The naive recursive version is slow. For a challenge, try to write a more efficient version using a helper function with an accumulator. fun reverse L = let fun rev ([], acc) = acc | rev(x::xs, acc) = rev(xs, x::acc) in rev(L, []) end;
  3. map: map takes a function f and a list. If the list is empty, return []. If the list is x::xs, apply f to x and cons the result onto the result of mapping f over xs.
  4. Type Inference: Don’t write any type annotations. Let the compiler do the work. Observe the types SML infers for your functions—they will tell you if your logic is correct and general enough.

Learning milestones:

  1. You can implement length and append → You understand basic recursion and pattern matching.
  2. You can implement map and filter → You understand higher-order functions.
  3. You can implement foldl → You have grasped the most general and powerful pattern of list processing.

Project 2: Abstract Datatype for Expressions

  • File: LEARN_SML_DEEP_DIVE.md
  • Main Programming Language: Standard ML
  • Alternative Programming Languages: OCaml, Haskell
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Compilers / Data Structures
  • Software or Tool: SML/NJ
  • Main Book: “Programming in Standard ML” by Robert Harper

What you’ll build: A system to represent and evaluate simple arithmetic expressions. You’ll define a datatype for an expression tree and then write an eval function that recursively computes its integer value.

Why it teaches SML: This project demonstrates SML’s power in defining and manipulating complex, recursive data structures. It’s a foundational skill for any work in compilers, interpreters, or any domain that involves symbolic manipulation. Pattern matching makes the eval function incredibly clean and safe.

Core challenges you’ll face:

  • Defining a recursive datatype → maps to representing tree-like data such as Const of int | Add of expr * expr
  • Writing a recursive eval function → maps to using pattern matching to handle each case of the datatype
  • Handling potential errors → maps to using SML’s option type (e.g., for division by zero) instead of nulls
  • Extending the language → maps to adding new constructors like Mul or Div and seeing the compiler tell you which functions need updating

Key Concepts:

  • Algebraic Datatypes: “Programming in Standard ML”, Chapter 10 - Harper
  • Case Expressions: A key part of pattern matching for complex logic.
  • Option Type: SML’s safe way to represent a value that might be missing (SOME v | NONE).

Difficulty: Intermediate Time estimate: Weekend Prerequisites: Project 1.

Real world outcome: You can define expressions as SML values and evaluate them. The compiler will help you ensure your eval function handles every possible kind of expression.

- use "eval.sml";
[opening eval.sml]
datatype expr = Const of int | Add of expr * expr | Mul of expr * expr
val eval = fn : expr -> int
val it = () : unit

- val my_expr = Add(Const 10, Mul(Const 2, Const 5));
val my_expr = Add (Const 10, Mul (Const 2, Const 5)) : expr

- eval my_expr;
val it = 20 : int

Implementation Hints:

  1. Start with your datatype definition: datatype expr = Const of int | Add of expr * expr | Neg of expr;
  2. The eval function will have a clause for each constructor:
    fun eval (Const i) = i
      | eval (Neg e) = ~ (eval e)
      | eval (Add (e1, e2)) = (eval e1) + (eval e2)
    
  3. For division, your eval function can’t return an int because division by zero has no integer result. This is a perfect use case for the option type. Change the function’s signature to fun eval : expr -> int option and handle the error case by returning NONE.
    fun eval (Div (e1, e2)) =
        case (eval e1, eval e2) of
             (SOME v1, SOME 0) => NONE (* Division by zero *)
           | (SOME v1, SOME v2) => SOME (v1 div v2)
           | _ => NONE
    

Learning milestones:

  1. You can define and evaluate simple expressions with Add and Const → You understand ADTs and recursive functions over them.
  2. You add Mul and Sub and the compiler tells you eval is non-exhaustive → You experience the safety benefits of pattern matching firsthand.
  3. You implement division using the option type for error handling → You have learned the standard, safe way to handle potential failure in SML.

Project 3: A Generic Set, Abstracted

  • File: LEARN_SML_DEEP_DIVE.md
  • Main Programming Language: Standard ML
  • Alternative Programming Languages: OCaml
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Module Systems / Data Abstraction
  • Software or Tool: SML/NJ
  • Main Book: “ML for the Working Programmer” by L.C. Paulson

What you’ll build: A Set data structure, implemented with a binary search tree. But crucially, you will hide the implementation details behind a SIGNATURE, and then parameterize the whole thing with a functor so it can work with any data type that has a defined ordering.

Why it teaches SML: This project teaches the crown jewel of SML: its module system. You’ll learn the difference between an interface (signature) and an implementation (structure), and then use functors to create generic modules. This is a level of abstraction and code reuse that is unmatched by simple generics in most mainstream languages.

Core challenges you’ll face:

  • Defining a SIGNATURE → maps to specifying the public API of your module (e.g., an abstract type t, an empty set, insert, member functions)
  • Creating a structure that implements the signature → maps to providing the concrete definitions for the types and functions in the signature
  • Hiding the implementation → maps to ensuring users of your Set cannot know it’s a BST, they can only use the functions you’ve exposed
  • Building a functor → maps to creating a generic “factory” that takes a type and its comparison function and produces a specialized Set structure for that type

Key Concepts:

  • Signatures: “Programming in Standard ML”, Chapter 21 - Harper
  • Structures: “Programming in Standard ML”, Chapter 20 - Harper
  • Functors: “Programming in Standard ML”, Chapter 23 - Harper
  • Abstraction: A core principle of software engineering, enforced by the SML type system.

Difficulty: Advanced Time estimate: 1-2 weeks Prerequisites: Project 2, understanding of binary search trees.

Real world outcome: You will have created a generic Set factory. You can use it to instantly create a Set of integers, a Set of strings, or a Set of anything else, just by providing an appropriate ordering module.

(* The signature for what an ordered type must provide *)
signature ORDERED = sig
  type t
  val compare : t * t -> order
end

(* The functor: a function from ORDERED structures to SET structures *)
functor MakeSet (Ord : ORDERED) :> SET = struct
  type item = Ord.t
  datatype tree = Empty | Node of tree * item * tree
  ...
  fun insert item tree = ...
  fun member item tree = ...
end

(* Create a specific implementation for integers *)
structure IntOrder :> ORDERED = struct
  type t = int
  val compare = Int.compare
end
structure IntSet = MakeSet(IntOrder)

(* Create one for strings *)
structure StringOrder :> ORDERED = struct
  type t = string
  val compare = String.compare
end
structure StringViewSet = MakeSet(StringOrder)

- val s1 = IntSet.insert 3 (IntSet.insert 1 IntSet.empty);
val s1 = - : IntSet.set
- IntSet.member 3 s1;
val it = true : bool

Implementation Hints:

  1. Start with the ORDERED signature. It’s simple: it needs to define a type t and a compare function for that type.
  2. Implement IntOrder and StringOrder. These are simple structures that fulfill the ORDERED signature for built-in types.
  3. Define your SET signature. This will be your public API. It should include type set, val empty : set, val insert : item * set -> set, val member : item * set -> bool. The item type will be abstract here.
  4. Build the functor MakeSet. Its argument will be (Ord : ORDERED). Inside the functor, you’ll define your BST, but you will use Ord.t as the type of the elements and Ord.compare as the comparison function.
  5. The functor ... :> SET syntax is called “opaque signature matching” or “sealing”. It ensures that users of your IntSet cannot see that IntSet.set is secretly a tree. This enforces abstraction.

Learning milestones:

  1. You create a working IntSet structure behind a signature → You understand basic abstraction.
  2. You build the MakeSet functor that takes an ORDERED structure → You’ve grasped the core concept of a functor.
  3. You can instantly create both IntSet and StringViewSet from the same functor → You have achieved true, type-safe, generic programming.
  4. You understand why a user of IntSet cannot access the BST nodes directly → You have mastered the principle of data hiding through opaque signatures.

Project 4: A Simple Type Checker

  • File: LEARN_SML_DEEP_DIVE.md
  • Main Programming Language: Standard ML
  • Alternative Programming Languages: OCaml, Haskell
  • Coolness Level: Level 5: Pure Magic (Super Cool)
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 4: Expert
  • Knowledge Area: Compilers / Type Systems / Programming Language Theory
  • Software or Tool: SML/NJ
  • Main Book: “Types and Programming Languages” by Benjamin C. Pierce

What you’ll build: A type checker for a simple typed lambda calculus. You will define datatypes for expressions (variables, lambdas, application) and types (int, bool, function types). Then, you’ll write a typeof function that, given an expression and a context (a map of variable names to their types), returns the type of that expression or fails if it’s ill-typed.

Why it teaches SML: This is what SML was born to do. The language’s own type system was one of the first major applications of ML. Writing a type checker in SML feels incredibly natural. Datatypes represent the syntax, and pattern matching functions represent the typing rules. It’s a deep dive into SML’s original domain.

Core challenges you’ll face:

  • Defining the language’s abstract syntax tree (AST) → maps to using datatype for Var, Lam, App, etc.
  • Managing a type context → maps to using a dictionary/map (from Project 3!) to track variable types
  • Implementing the typing rules → maps to a recursive function that pattern matches on the expression and applies the correct rule
  • Handling type errors gracefully → maps to using the option type to return SOME typ on success and NONE on failure

Key Concepts:

  • Abstract Syntax Trees (ASTs): A core concept in all compilers.
  • Typing Judgements: The formal rules that define a type system (e.g., “if e1 has type A -> B and e2 has type A, then App(e1, e2) has type B”).
  • Programming Language Theory: This project is a practical introduction to the field.

Difficulty: Expert Time estimate: 1-2 weeks Prerequisites: Projects 1, 2, and 3. A grasp of lambda calculus is helpful but can be learned.

Real world outcome: You will be able to represent a program as a data structure and have your SML program prove what its type is, or if it has an error, without ever running the program.

(* Your datatypes *)
datatype typ = TInt | TBool | TFun of typ * typ
datatype expr = Var of string | Const of int | Lam of string * typ * expr | App of expr * expr

(* Your typechecker function *)
val typeof : (string * typ) list -> expr -> typ option

(* Example usage *)
(* Represents: (fn x:int => x + 1) 5 *)
val my_prog = App(Lam("x", TInt, Add(Var "x", Const 1)), Const 5)

- typeof [] my_prog;
val it = SOME TInt : typ option

(* Represents: (fn x:int => x) true *)
val bad_prog = App(Lam("x", TInt, Var "x"), BoolConst true)

- typeof [] bad_prog;
val it = NONE : typ option

Implementation Hints:

  1. Your type context (or environment) can be a simple association list, like [("x", TInt), ("y", TBool)].
  2. The typeof function will take the context and the expression: fun typeof ctx expr = ...
  3. The logic is a case statement over the expr:
    • typeof ctx (Const i) -> SOME TInt
    • typeof ctx (Var s) -> Look up s in the context ctx.
    • typeof ctx (Lam (s, t, body)) -> Add (s, t) to the context and find the type of the body. If the body has type T_body, the lambda’s type is TFun(t, T_body).
    • typeof ctx (App (e1, e2)) -> This is the most complex. Find the type of e1. If it’s a function type TFun(T_arg, T_res), then find the type of e2. If the type of e2 matches T_arg, the whole expression has type T_res. Otherwise, it’s a type error.

Learning milestones:

  1. You can type-check constants and variables → You understand how to manage the context.
  2. You can type-check a lambda function → You understand how to introduce new variables into the scope.
  3. You can type-check a function application → You have implemented the core rule of a typed lambda calculus.
  4. You realize you have just built the heart of a compiler frontend → You’ve gained a deep insight into how programming languages work.

Summary

Project Main Language Difficulty Time Estimate
1. Core List Utilities Standard ML Beginner Weekend
2. Abstract Datatype for Expressions Standard ML Intermediate Weekend
3. A Generic Set, Abstracted Standard ML Advanced 1-2 weeks
4. A Simple Type Checker Standard ML Expert 1-2 weeks