LEARN SML DEEP DIVE
Learn Standard ML: From Zero to Type-Safe Systems Programmer
Goal: Master the Standard ML (SML) language to deeply understand the power of a statically-typed functional language with type inference, a world-class module system, and rigorous pattern matching. Learn a new way to reason about software that emphasizes correctness and robust design.
Why Learn Standard ML?
Standard ML is not just a language; it’s a meticulously designed tool for thinking. It belongs to the ML family of languages, which pioneered many concepts we now take for granted in modern programming, like static type inference. SML occupies a sweet spot: it’s a functional-first language that is far more rigorous than Lisp, but more pragmatic and less dogmatic about purity than Haskell.
Learning SML will fundamentally change how you view types, modules, and program correctness. It’s the language of choice for serious work in programming language theory, compilers, and formal verification for a reason.
After completing these projects, you will:
- Write code that is “correct by construction” thanks to a powerful and sound type system.
- Appreciate the magic of Hindley-Milner type inference, where you get all the safety of static types with almost none of the verbosity.
- Master one of the most advanced module systems ever created, enabling true abstraction and generic programming.
- Deconstruct complex data with exhaustive pattern matching, eliminating entire classes of bugs.
- Understand the trade-offs between pure functional programming and principled side effects.
Core Concept Analysis
The SML Philosophy: Type Safety Meets Pragmatism
SML’s design is a masterclass in balancing theoretical purity with practical utility. It proves that a language can be both mathematically elegant and an effective tool for building real-world systems.
┌──────────────────────────────────────────────┐
│ IMPERATIVE (e.g., Python/Java) │
│ │
│ You tell the computer HOW to do something. │
│ Types are either absent (Python) or │
│ verbose and manual (Java). │
│ Bugs are often found at runtime. │
└──────────────────────────────────────────────┘
│
▼ Paradigm Shift
┌──────────────────────────────────────────────┐
│ PURELY FUNCTIONAL (e.g., Haskell) │
│ │
│ You describe what something IS, with a │
│ strict separation from the "dirty" │
│ outside world (I/O). │
└──────────────────────────────────────────────┘
│
▼ The SML "Sweet Spot"
┌──────────────────────────────────────────────┐
│ STANDARD ML │
│ │
│ You get the safety of Haskell's type system │
│ with the practicality of allowing controlled│
│ side effects, all wrapped in a module │
│ system that encourages building robust, │
│ reusable, and abstract components. │
└──────────────────────────────────────────────┘
Fundamental Concepts
-
Static Typing with Full Type Inference: This is SML’s “magic” feature. You write code that looks as clean as a dynamic language like Python, but the compiler deduces all the types for you before the program runs. If it compiles, it is type-safe.
fun id x = x;(SML infers the typeval id = fn : 'a -> 'a) -
Algebraic Datatypes (ADTs) and Pattern Matching: You create complex data types by combining simpler ones. Pattern matching lets you deconstruct this data in a way that is exhaustive and safe. The compiler will warn you if you forget a case.
datatype color = Red | Green | Blue | RGB of int*int*int;fun isGray (RGB(r,g,b)) = (r=g andalso g=b) | isGray _ = false; - The Module System (Signatures, Structures, Functors): This is SML’s most powerful feature, influencing languages like Rust and OCaml.
- Signatures: An interface that specifies the types and values a module must provide.
- Structures: The implementation of a signature. A collection of types, values, and other modules.
- Functors: These are “functions” that take structures as arguments and produce new structures. They are the ultimate tool for building generic, reusable code.
┌──────────────┐ implements ► ┌───────────────┐ │ SIGNATURE │ ◄───────────── │ STRUCTURE │ │ (Interface) │ │ (Implementation)│ └──────┬───────┘ └───────────────┘ │ is input to │ ▼ ┌──────────────┐ │ FUNCTOR │ │(Module-level │ │ Function) │ └──────┬───────┘ │ produces ▼ ┌───────────────┐ │ New │ │ STRUCTURE │ └───────────────┘ - Functional Core with Controlled Effects: SML is functional-first. Immutability is the default. However, it provides explicit mechanisms for side effects like I/O and mutable state (
reftypes), acknowledging their necessity in real-world programming.
Project List
These projects are carefully sequenced to build your understanding from basic functional programming idioms to the advanced modular programming that makes SML unique.
Project 1: Core List Utilities
- File: LEARN_SML_DEEP_DIVE.md
- Main Programming Language: Standard ML
- Alternative Programming Languages: OCaml, F#
- Coolness Level: Level 2: Practical but Forgettable
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 1: Beginner
- Knowledge Area: Functional Programming / Core Data Structures
- Software or Tool: Standard ML of New Jersey (SML/NJ) or MLton
- Main Book: “Programming in Standard ML” by Robert Harper
What you’ll build: Your own implementation of the essential list processing functions from the SML Basis Library: length, append, reverse, map, filter, and foldl/foldr.
Why it teaches SML: This project is a bootcamp in the functional way of thinking. You will learn to solve problems using recursion and pattern matching instead of loops and mutation. It forces you to master the fundamental [] vs x::xs pattern that underpins all list processing in ML-family languages.
Core challenges you’ll face:
- Writing recursive functions → maps to thinking about the base case (empty list) and the inductive step (head and tail)
- Using pattern matching on lists → maps to the
funorcasesyntax for deconstructing lists - Ensuring your functions are polymorphic → maps to writing code that works on
'a list(a list of *anything), which SML will infer automatically* - Implementing higher-order functions → maps to writing
mapandfilter, which take other functions as arguments
Key Concepts:
- Recursion and Pattern Matching: “Programming in Standard ML”, Chapters 7 & 9 - Harper
- Parametric Polymorphism (
'a): “ML for the Working Programmer”, Chapter 2 - Paulson - Higher-Order Functions: “Programming in Standard ML”, Chapter 11 - Harper
Difficulty: Beginner Time estimate: Weekend Prerequisites: None. This is where you start.
Real world outcome: You will have a file of utility functions. In the SML/NJ REPL, you can load your file and see them work, with their fully inferred polymorphic types.
- use "list_utils.sml";
[opening list_utils.sml]
val my_length = fn : 'a list -> int
val my_map = fn : ('a -> 'b) -> 'a list -> 'b list
...
val it = () : unit
- my_length([1, 2, 3]);
val it = 3 : int
- my_map (fn x => x + 1) [1, 2, 3];
val it = [2, 3, 4] : int list
- my_reverse(["a", "b", "c"]);
val it = ["c", "b", "a"] : string list
Implementation Hints:
length: The length of an empty list[]is 0. The length of a listx::xsis 1 + the length ofxs.reverse: The naive recursive version is slow. For a challenge, try to write a more efficient version using a helper function with an accumulator.fun reverse L = let fun rev ([], acc) = acc | rev(x::xs, acc) = rev(xs, x::acc) in rev(L, []) end;map:maptakes a functionfand a list. If the list is empty, return[]. If the list isx::xs, applyftoxand cons the result onto the result of mappingfoverxs.- Type Inference: Don’t write any type annotations. Let the compiler do the work. Observe the types SML infers for your functions—they will tell you if your logic is correct and general enough.
Learning milestones:
- You can implement
lengthandappend→ You understand basic recursion and pattern matching. - You can implement
mapandfilter→ You understand higher-order functions. - You can implement
foldl→ You have grasped the most general and powerful pattern of list processing.
Project 2: Abstract Datatype for Expressions
- File: LEARN_SML_DEEP_DIVE.md
- Main Programming Language: Standard ML
- Alternative Programming Languages: OCaml, Haskell
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 2: Intermediate
- Knowledge Area: Compilers / Data Structures
- Software or Tool: SML/NJ
- Main Book: “Programming in Standard ML” by Robert Harper
What you’ll build: A system to represent and evaluate simple arithmetic expressions. You’ll define a datatype for an expression tree and then write an eval function that recursively computes its integer value.
Why it teaches SML: This project demonstrates SML’s power in defining and manipulating complex, recursive data structures. It’s a foundational skill for any work in compilers, interpreters, or any domain that involves symbolic manipulation. Pattern matching makes the eval function incredibly clean and safe.
Core challenges you’ll face:
- Defining a recursive
datatype→ maps to representing tree-like data such asConst of int | Add of expr * expr - Writing a recursive
evalfunction → maps to using pattern matching to handle each case of the datatype - Handling potential errors → maps to using SML’s
optiontype (e.g., for division by zero) instead of nulls - Extending the language → maps to adding new constructors like
MulorDivand seeing the compiler tell you which functions need updating
Key Concepts:
- Algebraic Datatypes: “Programming in Standard ML”, Chapter 10 - Harper
- Case Expressions: A key part of pattern matching for complex logic.
- Option Type: SML’s safe way to represent a value that might be missing (
SOME v | NONE).
Difficulty: Intermediate Time estimate: Weekend Prerequisites: Project 1.
Real world outcome:
You can define expressions as SML values and evaluate them. The compiler will help you ensure your eval function handles every possible kind of expression.
- use "eval.sml";
[opening eval.sml]
datatype expr = Const of int | Add of expr * expr | Mul of expr * expr
val eval = fn : expr -> int
val it = () : unit
- val my_expr = Add(Const 10, Mul(Const 2, Const 5));
val my_expr = Add (Const 10, Mul (Const 2, Const 5)) : expr
- eval my_expr;
val it = 20 : int
Implementation Hints:
- Start with your datatype definition:
datatype expr = Const of int | Add of expr * expr | Neg of expr; - The
evalfunction will have a clause for each constructor:fun eval (Const i) = i | eval (Neg e) = ~ (eval e) | eval (Add (e1, e2)) = (eval e1) + (eval e2) - For division, your
evalfunction can’t return anintbecause division by zero has no integer result. This is a perfect use case for theoptiontype. Change the function’s signature tofun eval : expr -> int optionand handle the error case by returningNONE.fun eval (Div (e1, e2)) = case (eval e1, eval e2) of (SOME v1, SOME 0) => NONE (* Division by zero *) | (SOME v1, SOME v2) => SOME (v1 div v2) | _ => NONE
Learning milestones:
- You can define and evaluate simple expressions with
AddandConst→ You understand ADTs and recursive functions over them. - You add
MulandSuband the compiler tells youevalis non-exhaustive → You experience the safety benefits of pattern matching firsthand. - You implement division using the
optiontype for error handling → You have learned the standard, safe way to handle potential failure in SML.
Project 3: A Generic Set, Abstracted
- File: LEARN_SML_DEEP_DIVE.md
- Main Programming Language: Standard ML
- Alternative Programming Languages: OCaml
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 3: Advanced
- Knowledge Area: Module Systems / Data Abstraction
- Software or Tool: SML/NJ
- Main Book: “ML for the Working Programmer” by L.C. Paulson
What you’ll build: A Set data structure, implemented with a binary search tree. But crucially, you will hide the implementation details behind a SIGNATURE, and then parameterize the whole thing with a functor so it can work with any data type that has a defined ordering.
Why it teaches SML: This project teaches the crown jewel of SML: its module system. You’ll learn the difference between an interface (signature) and an implementation (structure), and then use functors to create generic modules. This is a level of abstraction and code reuse that is unmatched by simple generics in most mainstream languages.
Core challenges you’ll face:
- Defining a
SIGNATURE→ maps to specifying the public API of your module (e.g., an abstract typet, anemptyset,insert,memberfunctions) - Creating a
structurethat implements the signature → maps to providing the concrete definitions for the types and functions in the signature - Hiding the implementation → maps to ensuring users of your
Setcannot know it’s a BST, they can only use the functions you’ve exposed - Building a
functor→ maps to creating a generic “factory” that takes a type and its comparison function and produces a specializedSetstructure for that type
Key Concepts:
- Signatures: “Programming in Standard ML”, Chapter 21 - Harper
- Structures: “Programming in Standard ML”, Chapter 20 - Harper
- Functors: “Programming in Standard ML”, Chapter 23 - Harper
- Abstraction: A core principle of software engineering, enforced by the SML type system.
Difficulty: Advanced Time estimate: 1-2 weeks Prerequisites: Project 2, understanding of binary search trees.
Real world outcome:
You will have created a generic Set factory. You can use it to instantly create a Set of integers, a Set of strings, or a Set of anything else, just by providing an appropriate ordering module.
(* The signature for what an ordered type must provide *)
signature ORDERED = sig
type t
val compare : t * t -> order
end
(* The functor: a function from ORDERED structures to SET structures *)
functor MakeSet (Ord : ORDERED) :> SET = struct
type item = Ord.t
datatype tree = Empty | Node of tree * item * tree
...
fun insert item tree = ...
fun member item tree = ...
end
(* Create a specific implementation for integers *)
structure IntOrder :> ORDERED = struct
type t = int
val compare = Int.compare
end
structure IntSet = MakeSet(IntOrder)
(* Create one for strings *)
structure StringOrder :> ORDERED = struct
type t = string
val compare = String.compare
end
structure StringViewSet = MakeSet(StringOrder)
- val s1 = IntSet.insert 3 (IntSet.insert 1 IntSet.empty);
val s1 = - : IntSet.set
- IntSet.member 3 s1;
val it = true : bool
Implementation Hints:
- Start with the
ORDEREDsignature. It’s simple: it needs to define a typetand acomparefunction for that type. - Implement
IntOrderandStringOrder. These are simple structures that fulfill theORDEREDsignature for built-in types. - Define your
SETsignature. This will be your public API. It should includetype set,val empty : set,val insert : item * set -> set,val member : item * set -> bool. Theitemtype will be abstract here. - Build the functor
MakeSet. Its argument will be(Ord : ORDERED). Inside the functor, you’ll define your BST, but you will useOrd.tas the type of the elements andOrd.compareas the comparison function. - The
functor ... :> SETsyntax is called “opaque signature matching” or “sealing”. It ensures that users of yourIntSetcannot see thatIntSet.setis secretly atree. This enforces abstraction.
Learning milestones:
- You create a working
IntSetstructure behind a signature → You understand basic abstraction. - You build the
MakeSetfunctor that takes anORDEREDstructure → You’ve grasped the core concept of a functor. - You can instantly create both
IntSetandStringViewSetfrom the same functor → You have achieved true, type-safe, generic programming. - You understand why a user of
IntSetcannot access the BST nodes directly → You have mastered the principle of data hiding through opaque signatures.
Project 4: A Simple Type Checker
- File: LEARN_SML_DEEP_DIVE.md
- Main Programming Language: Standard ML
- Alternative Programming Languages: OCaml, Haskell
- Coolness Level: Level 5: Pure Magic (Super Cool)
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 4: Expert
- Knowledge Area: Compilers / Type Systems / Programming Language Theory
- Software or Tool: SML/NJ
- Main Book: “Types and Programming Languages” by Benjamin C. Pierce
What you’ll build: A type checker for a simple typed lambda calculus. You will define datatypes for expressions (variables, lambdas, application) and types (int, bool, function types). Then, you’ll write a typeof function that, given an expression and a context (a map of variable names to their types), returns the type of that expression or fails if it’s ill-typed.
Why it teaches SML: This is what SML was born to do. The language’s own type system was one of the first major applications of ML. Writing a type checker in SML feels incredibly natural. Datatypes represent the syntax, and pattern matching functions represent the typing rules. It’s a deep dive into SML’s original domain.
Core challenges you’ll face:
- Defining the language’s abstract syntax tree (AST) → maps to using
datatypeforVar,Lam,App, etc. - Managing a type context → maps to using a dictionary/map (from Project 3!) to track variable types
- Implementing the typing rules → maps to a recursive function that pattern matches on the expression and applies the correct rule
- Handling type errors gracefully → maps to using the
optiontype to returnSOME typon success andNONEon failure
Key Concepts:
- Abstract Syntax Trees (ASTs): A core concept in all compilers.
- Typing Judgements: The formal rules that define a type system (e.g., “if
e1has typeA -> Bande2has typeA, thenApp(e1, e2)has typeB”). - Programming Language Theory: This project is a practical introduction to the field.
Difficulty: Expert Time estimate: 1-2 weeks Prerequisites: Projects 1, 2, and 3. A grasp of lambda calculus is helpful but can be learned.
Real world outcome: You will be able to represent a program as a data structure and have your SML program prove what its type is, or if it has an error, without ever running the program.
(* Your datatypes *)
datatype typ = TInt | TBool | TFun of typ * typ
datatype expr = Var of string | Const of int | Lam of string * typ * expr | App of expr * expr
(* Your typechecker function *)
val typeof : (string * typ) list -> expr -> typ option
(* Example usage *)
(* Represents: (fn x:int => x + 1) 5 *)
val my_prog = App(Lam("x", TInt, Add(Var "x", Const 1)), Const 5)
- typeof [] my_prog;
val it = SOME TInt : typ option
(* Represents: (fn x:int => x) true *)
val bad_prog = App(Lam("x", TInt, Var "x"), BoolConst true)
- typeof [] bad_prog;
val it = NONE : typ option
Implementation Hints:
- Your type context (or environment) can be a simple association list, like
[("x", TInt), ("y", TBool)]. - The
typeoffunction will take the context and the expression:fun typeof ctx expr = ... - The logic is a
casestatement over theexpr:typeof ctx (Const i)->SOME TInttypeof ctx (Var s)-> Look upsin the contextctx.typeof ctx (Lam (s, t, body))-> Add(s, t)to the context and find the type of thebody. If the body has typeT_body, the lambda’s type isTFun(t, T_body).typeof ctx (App (e1, e2))-> This is the most complex. Find the type ofe1. If it’s a function typeTFun(T_arg, T_res), then find the type ofe2. If the type ofe2matchesT_arg, the whole expression has typeT_res. Otherwise, it’s a type error.
Learning milestones:
- You can type-check constants and variables → You understand how to manage the context.
- You can type-check a lambda function → You understand how to introduce new variables into the scope.
- You can type-check a function application → You have implemented the core rule of a typed lambda calculus.
- You realize you have just built the heart of a compiler frontend → You’ve gained a deep insight into how programming languages work.
Summary
| Project | Main Language | Difficulty | Time Estimate |
|---|---|---|---|
| 1. Core List Utilities | Standard ML | Beginner | Weekend |
| 2. Abstract Datatype for Expressions | Standard ML | Intermediate | Weekend |
| 3. A Generic Set, Abstracted | Standard ML | Advanced | 1-2 weeks |
| 4. A Simple Type Checker | Standard ML | Expert | 1-2 weeks |