Project 4: Simply Typed Lambda Calculus + Type Inference

Project 4: Simply Typed Lambda Calculus + Type Inference

Build a type checker for the simply typed lambda calculus, then extend it to full Hindley-Milner type inference with let-polymorphism.

Quick Reference

Attribute Value
Difficulty Expert
Time Estimate 3-4 weeks
Language Haskell (Alternatives: OCaml, Rust, Scala)
Prerequisites Projects 1-3 (Lambda Calculus Interpreter), understanding of substitution, basic type theory
Key Topics Simply Typed Lambda Calculus, Hindley-Milner Type System, Algorithm W, Unification, Let-Polymorphism
Main Resources “Types and Programming Languages” by Benjamin Pierce (Chapters 9-11, 22)

1. Learning Objectives

By completing this project, you will:

  1. Understand type systems from first principles: Explain why types exist, what problems they solve, and how they constrain programs to prevent certain errors
  2. Implement a type checker for STLC: Given an explicitly-typed term, verify it is well-typed using bidirectional type checking
  3. Implement Hindley-Milner type inference: Infer the most general type of an untyped term without any annotations
  4. Master unification: Implement Robinson’s unification algorithm for solving type constraints
  5. Understand let-polymorphism: Explain why let enables polymorphism while lambda abstraction alone does not
  6. Debug type errors: Given a type error, trace back to understand what constraint was violated and why
  7. Connect theory to practice: Relate your implementation to how GHC, OCaml, and Rust infer types

2. Conceptual Foundation

2.1 Why Types?

Before diving into implementation, we must understand why type systems exist at all.

The Fundamental Problem

Consider the untyped lambda calculus from Project 1. Every term is a function, and any term can be applied to any other term. This is maximally flexible but also maximally dangerous:

(lambda x. x + 1) "hello"    -- What does "hello" + 1 mean?
(lambda x. x x) (lambda x. x x)  -- Infinite loop
(lambda x. x 5) true         -- What does true(5) mean?

In the untyped world, these expressions are syntactically valid. We only discover they are nonsensical at runtime (or they diverge forever).

Types solve this by classifying terms before execution. If we can prove at compile time that every function application makes sense, we can eliminate entire classes of runtime errors.

What Types Are

A type is a conservative approximation of the set of values a term can evaluate to:

  • Int means “this term will produce an integer”
  • Bool means “this term will produce a boolean”
  • Int -> Bool means “this term is a function that takes an integer and produces a boolean”

The key word is conservative: the type system may reject some programs that would actually run correctly, but it will never accept a program that would produce a type error at runtime.

This is the fundamental tradeoff: type systems sacrifice completeness for soundness. A well-typed program might still loop forever or throw a non-type exception, but it will never apply a function to an argument of the wrong type.

2.2 The Simply Typed Lambda Calculus (STLC)

The Simply Typed Lambda Calculus (STLC) is the simplest typed extension of the lambda calculus. It was introduced by Alonzo Church in 1940 and forms the foundation of all modern type systems.

Syntax

The STLC extends lambda calculus with type annotations:

Types:
  T ::= Int                    -- Base type
      | Bool                   -- Base type
      | T1 -> T2               -- Function type (right-associative)

Terms:
  e ::= x                      -- Variable
      | lambda x:T. e          -- Abstraction (annotated)
      | e1 e2                  -- Application
      | n                      -- Integer literal
      | true | false           -- Boolean literals

The crucial addition is that lambda-bound variables now carry type annotations: lambda x:Int. x + 1 instead of just lambda x. x + 1.

The Function Type

The arrow type A -> B is the type of functions from A to B. It is right-associative, meaning:

A -> B -> C = A -> (B -> C)

This makes sense when you think about currying. A function add : Int -> Int -> Int takes an Int and returns a function Int -> Int. So:

add     : Int -> Int -> Int   -- Whole function
add 3   : Int -> Int          -- Partially applied
add 3 4 : Int                 -- Fully applied

Typing Rules

The STLC is defined by typing judgments of the form:

Gamma |- e : T

Read as: “Under context Gamma, expression e has type T”

The context (or environment) Gamma maps variable names to their types. The typing rules are:

(VAR)       Gamma, x:T |- x : T

(ABS)       Gamma, x:T1 |- e : T2
            -------------------------
            Gamma |- (lambda x:T1. e) : T1 -> T2

(APP)       Gamma |- e1 : T1 -> T2      Gamma |- e2 : T1
            --------------------------------------------
            Gamma |- e1 e2 : T2

Let’s read these rules:

  1. VAR: If x has type T in the context, then x has type T. (Trivial but necessary.)

  2. ABS: To type an abstraction lambda x:T1. e:
    • Add x:T1 to the context
    • Type-check the body e to get some type T2
    • The whole abstraction has type T1 -> T2
  3. APP: To type an application e1 e2:
    • Type-check e1 to get a function type T1 -> T2
    • Type-check e2 to get type T1 (must match the domain!)
    • The result has type T2

Type Checking vs Type Inference

In STLC, we typically do type checking: given a term with type annotations, verify it is well-typed. This is straightforward because we always know the types of bound variables.

Type inference is harder: given an unannotated term, figure out what type annotations would make it well-typed (if any). This is what Hindley-Milner does.

2.3 The Problem with Simple Types

STLC has a severe limitation: no polymorphism. Consider the identity function:

id = lambda x:Int. x

This works for integers, but if we want to use it on booleans, we need a separate function:

id_bool = lambda x:Bool. x

And for functions, yet another:

id_fun = lambda x:(Int -> Int). x

This is absurd. The identity function does the same thing regardless of type. We want to write it once:

id = lambda x. x

And have it work at any type. This requires parametric polymorphism: the ability for a single function to work at many types.

2.4 Hindley-Milner Type System

The Hindley-Milner (HM) type system solves this problem. Independently discovered by Roger Hindley in 1969 and Robin Milner in 1978, it provides:

  1. Type inference: No type annotations needed
  2. Parametric polymorphism: Functions can work at multiple types
  3. Principal types: Every expression has a “most general” type

Type Variables and Quantification

HM introduces type variables (written as a, b, c or in Haskell notation a, b, c):

id : forall a. a -> a

Read: “For all types a, id is a function from a to a.”

The forall is called universal quantification. It means id can be used at any type:

id 5       -- id used at type Int -> Int
id true    -- id used at type Bool -> Bool
id id      -- id used at type (a -> a) -> (a -> a)

Each use instantiates the type variable with a concrete type.

Monotypes vs Polytypes

HM distinguishes two kinds of types:

Monotypes (tau): Types without quantifiers

Int
Bool
a                   -- A type variable
a -> b              -- Function with type variables
(a -> b) -> a -> b  -- More complex

Polytypes or Type Schemes (sigma): Quantified types

forall a. a -> a
forall a b. a -> b -> a
forall a. (a -> a) -> a -> a

The quantifiers are only allowed at the “top level” of a type scheme. This is called prenex polymorphism or rank-1 polymorphism.

The Let-Polymorphism Insight

Here’s the key question: when can we generalize a type? Consider:

let id = lambda x. x
in (id 5, id true)

We want id to be polymorphic so we can use it at both Int and Bool. But consider:

(lambda id. (id 5, id true)) (lambda x. x)

This is equivalent beta-eta, but now we have a problem! Lambda-bound variables cannot be polymorphic in HM. The term id has a single type that must unify with both Int -> Int and Bool -> Bool, which is impossible.

The solution: let is special. When we write:

let x = e1 in e2

We can generalize the type of e1 before using it in e2. This generalization is only safe because:

  1. We know exactly what e1 is (its full definition)
  2. We can specialize it differently at each use site

Lambda-bound variables don’t have this luxury–we don’t know what argument will be passed until runtime.

2.5 Algorithm W: Type Inference

Algorithm W is the standard algorithm for HM type inference. It was introduced by Milner and is the basis for type inference in ML, OCaml, Haskell, and many other languages.

The Core Idea

Algorithm W works by:

  1. Generating fresh type variables for unknowns
  2. Collecting constraints during traversal
  3. Solving constraints via unification
  4. Generalizing at let-bindings

Unification

Unification is the heart of type inference. Given two types, find a substitution (mapping from type variables to types) that makes them equal.

unify(a, Int) = {a := Int}
unify(a -> b, Int -> Bool) = {a := Int, b := Bool}
unify(a -> a, Int -> Bool) = FAIL (a cannot be both Int and Bool)

Robinson’s unification algorithm:

unify(T1, T2):
  if T1 is a type variable a:
    if a == T2: return {}
    if a occurs in T2: FAIL (infinite type!)
    return {a := T2}

  if T2 is a type variable: return unify(T2, T1)

  if T1 = S1 -> S2 and T2 = R1 -> R2:
    sub1 = unify(S1, R1)
    sub2 = unify(apply(sub1, S2), apply(sub1, R2))
    return compose(sub2, sub1)

  if T1 = T2 (same base type): return {}

  FAIL (incompatible types)

The occurs check prevents infinite types. Without it, unify(a, a -> Int) would produce {a := a -> Int}, but then a = a -> Int = (a -> Int) -> Int = ... forever.

The Algorithm

Algorithm W traverses the expression, maintaining:

  • A type environment mapping variables to type schemes
  • A substitution accumulated from unifications
  • A fresh variable counter
W(env, e):
  case e of:
    x (variable):
      scheme = lookup(x, env)
      return (empty_subst, instantiate(scheme))

    lambda x. body:
      alpha = fresh_type_var()
      (subst, tau) = W(env + {x: alpha}, body)
      return (subst, apply(subst, alpha) -> tau)

    e1 e2:
      (s1, tau1) = W(env, e1)
      (s2, tau2) = W(apply(s1, env), e2)
      alpha = fresh_type_var()
      s3 = unify(apply(s2, tau1), tau2 -> alpha)
      return (compose(s3, compose(s2, s1)), apply(s3, alpha))

    let x = e1 in e2:
      (s1, tau1) = W(env, e1)
      scheme = generalize(apply(s1, env), tau1)
      (s2, tau2) = W(apply(s1, env) + {x: scheme}, e2)
      return (compose(s2, s1), tau2)

Generalization and Instantiation

Generalization turns a monotype into a polytype by quantifying over free type variables that don’t appear in the environment:

generalize(env, tau):
  free_vars = freeVars(tau) - freeVars(env)
  return forall free_vars. tau

Instantiation turns a polytype into a monotype by replacing quantified variables with fresh type variables:

instantiate(forall a1 ... an. tau):
  alpha1 = fresh(), ..., alphan = fresh()
  return tau[a1 := alpha1, ..., an := alphan]

2.6 Type Safety: Progress and Preservation

A type system is sound if it satisfies two properties:

  1. Progress: A well-typed term is either a value or can take a step.
  2. Preservation: If a well-typed term takes a step, the result is also well-typed.

Together, these guarantee: well-typed programs don’t get stuck.

For STLC/HM, we can prove:

  • Progress: By case analysis on the typing derivation
  • Preservation: By the substitution lemma (substituting a value for a variable preserves types)

2.7 Extensions and Variations

Adding Base Types and Primitives

Real languages have more than just functions:

Gamma |- n : Int          (INT)
Gamma |- true : Bool      (TRUE)
Gamma |- false : Bool     (FALSE)

Gamma |- e1 : Int    Gamma |- e2 : Int
--------------------------------------  (PLUS)
Gamma |- e1 + e2 : Int

Gamma |- e1 : Bool    Gamma |- e2 : T    Gamma |- e3 : T
--------------------------------------------------------  (IF)
Gamma |- if e1 then e2 else e3 : T

Recursive Types and Recursion

The Y combinator is untypable in STLC! To support recursion, we need either:

  1. A fixpoint operator: fix : forall a. (a -> a) -> a
  2. Recursive types: type List a = Nil | Cons a (List a)

Type Annotations

Even with inference, programmers sometimes want to annotate types:

id : forall a. a -> a
id x = x

This requires checking that the inferred type is at least as general as the declared type.

2.8 Common Type Errors and What They Mean

Error Meaning
“Cannot unify Int with Bool” You used an Int where a Bool was expected (or vice versa)
“Infinite type: a ~ a -> b” Occurs check failed; you’re trying to create a recursive type
“Unbound variable: x” Variable used but not in scope
“Cannot unify a -> b with Int” You applied something that’s not a function
“Rigid type variable” (GHC) You’re trying to instantiate a type variable that’s already fixed

3. Project Specification

3.1 Phase 1: Simply Typed Lambda Calculus (Week 1)

Build a type checker for the simply typed lambda calculus with explicit type annotations.

Features

  1. Syntax:
    • Types: Int, Bool, T1 -> T2
    • Terms: variables, annotated lambdas, application, literals
    • Parser for the syntax
  2. Type Checker:
    • Implement the typing rules for STLC
    • Maintain a typing context (environment)
    • Produce helpful error messages
  3. REPL:
    • :type <expr> to check the type of an expression
    • Parse and type-check expressions interactively

Example Session

> :type lambda x:Int. x
Int -> Int

> :type lambda x:Int. x + 1
Int -> Int

> :type lambda f:(Int -> Int). lambda x:Int. f (f x)
(Int -> Int) -> Int -> Int

> :type lambda x:Int. x x
Type error: Cannot unify Int with Int -> a
  In expression: x x
  x has type Int but is applied as a function

> :type (lambda x:Int. x) true
Type error: Cannot unify Int with Bool
  Argument 'true' has type Bool
  But function expects type Int

3.2 Phase 2: Unification (Week 2)

Implement Robinson’s unification algorithm as a standalone module.

Features

  1. Type representation with variables:
    • Type variables (fresh, named)
    • Substitutions (mappings from variables to types)
    • Apply substitution to type
  2. Unification:
    • unify :: Type -> Type -> Either TypeError Substitution
    • Occurs check to prevent infinite types
    • Compose substitutions correctly
  3. Tests:
    • Unit tests for all unification cases
    • Property tests: unify t1 t2 produces substitution such that apply s t1 == apply s t2

3.3 Phase 3: Hindley-Milner Inference (Week 3)

Implement Algorithm W for full type inference.

Features

  1. Type Schemes:
    • forall quantification
    • Generalization (monotype to polytype)
    • Instantiation (polytype to monotype)
  2. Algorithm W:
    • Fresh variable generation
    • Constraint collection
    • Let-polymorphism
  3. Extended REPL:
    > :type \x -> x
    Inferred: forall a. a -> a
    
    > :type \f -> \x -> f (f x)
    Inferred: forall a. (a -> a) -> a -> a
    
    > let id = \x -> x in (id 5, id true)
    Inferred: (Int, Bool)
    

3.4 Phase 4: Extensions (Week 4)

Extend your type system with practical features.

Features

  1. Recursive let (let rec):
    > let rec fact = \n -> if n == 0 then 1 else n * fact (n - 1) in fact 5
    Inferred: Int
    
  2. Pattern matching (basic):
    > \(x, y) -> x + y
    Inferred: (Int, Int) -> Int
    
  3. Better error messages:
    • Location information
    • Suggestions for common mistakes
    • “Did you mean?” for typos

4. Solution Architecture

4.1 Module Organization

src/
  Types/
    Syntax.hs       -- Type and Term AST
    Substitution.hs -- Substitutions on types
    Unify.hs        -- Unification algorithm
    Check.hs        -- STLC type checking
    Infer.hs        -- HM type inference
    Scheme.hs       -- Type schemes, generalization, instantiation
    Error.hs        -- Type error representation
    Pretty.hs       -- Pretty printing types and terms
  Parser/
    Lexer.hs
    Parser.hs
  Repl.hs
  Main.hs

test/
  Types/
    SubstitutionSpec.hs
    UnifySpec.hs
    CheckSpec.hs
    InferSpec.hs

4.2 Core Data Types

-- Types
data Type
  = TVar TVar              -- Type variable
  | TInt                   -- Base type
  | TBool                  -- Base type
  | TArrow Type Type       -- Function type

newtype TVar = TVar String

-- Type schemes (polytypes)
data Scheme = Forall [TVar] Type

-- Terms
data Expr
  = Var String
  | Lam String (Maybe Type) Expr  -- Optional annotation
  | App Expr Expr
  | Let String Expr Expr
  | LetRec String Expr Expr
  | IntLit Int
  | BoolLit Bool
  | BinOp Op Expr Expr
  | If Expr Expr Expr

-- Type errors
data TypeError
  = UnboundVariable String
  | CannotUnify Type Type
  | OccursCheckFailed TVar Type
  | NotAFunction Type
  | TypeMismatch Type Type String  -- expected, actual, context

4.3 Substitution

A substitution is a finite mapping from type variables to types:

newtype Subst = Subst (Map TVar Type)

empty :: Subst
singleton :: TVar -> Type -> Subst
compose :: Subst -> Subst -> Subst  -- Order matters!

class Substitutable a where
  apply :: Subst -> a -> a
  freeVars :: a -> Set TVar

Key insight: compose s2 s1 means “apply s1 first, then s2”.

compose s2 s1 = Subst (Map.map (apply s2) (unSubst s1) `Map.union` unSubst s2)

4.4 Unification Algorithm

unify :: Type -> Type -> Either TypeError Subst
unify (TVar v) t = bindVar v t
unify t (TVar v) = bindVar v t
unify (TArrow a1 a2) (TArrow b1 b2) = do
  s1 <- unify a1 b1
  s2 <- unify (apply s1 a2) (apply s1 b2)
  return (compose s2 s1)
unify TInt TInt = return empty
unify TBool TBool = return empty
unify t1 t2 = Left (CannotUnify t1 t2)

bindVar :: TVar -> Type -> Either TypeError Subst
bindVar v (TVar u) | v == u = return empty
bindVar v t | v `Set.member` freeVars t = Left (OccursCheckFailed v t)
bindVar v t = return (singleton v t)

4.5 Algorithm W

-- The inference monad
type Infer a = ExceptT TypeError (State InferState) a

data InferState = InferState
  { counter :: Int  -- For fresh variables
  }

fresh :: Infer Type
fresh = do
  n <- gets counter
  modify (\s -> s { counter = n + 1 })
  return (TVar (TVar ("t" ++ show n)))

-- Main inference function
infer :: TypeEnv -> Expr -> Infer (Subst, Type)

5. Implementation Guide

5.1 Phase 1 Milestones

Milestone 1.1: Type Representation

  • Define Type data type
  • Implement Show and Eq
  • Parse simple types

Milestone 1.2: Term Representation

  • Define Expr data type
  • Parser for explicitly-typed terms
  • Pretty printer

Milestone 1.3: Type Checker

  • Implement check :: TypeEnv -> Expr -> Either TypeError Type
  • Handle all STLC cases
  • Error messages with context

5.2 Phase 2 Milestones

Milestone 2.1: Substitution

  • Implement Subst type
  • Implement apply for types and environments
  • Implement compose correctly

Milestone 2.2: Unification

  • Implement basic unification
  • Add occurs check
  • Comprehensive test suite

Milestone 2.3: Property Tests

  • Idempotence: apply s (apply s t) == apply s t
  • Correctness: unify t1 t2 = Right s => apply s t1 == apply s t2
  • Compositionality tests

5.3 Phase 3 Milestones

Milestone 3.1: Type Schemes

  • Define Scheme type
  • Implement generalize
  • Implement instantiate

Milestone 3.2: Algorithm W

  • Implement core algorithm
  • Handle let-polymorphism
  • Fresh variable generation

Milestone 3.3: Integration

  • Connect to REPL
  • Infer types for unannotated terms
  • Show most general type

5.4 Hints for Tricky Parts

Hint 1: Composition Order

The order in compose s2 s1 matters. Think of it as function composition:

apply (compose s2 s1) t == apply s2 (apply s1 t)

Hint 2: Environment Updates

When descending into a lambda body, you extend the environment. But when returning, you must apply the accumulated substitution to the extended variable’s type.

Hint 3: Let vs Lambda

The key difference:

  • Lambda: The bound variable has a monotype (no generalization)
  • Let: The bound variable’s type is generalized before use

Hint 4: Occurs Check

Don’t forget it! Without the occurs check:

\x -> x x

Would infer a ~ a -> b, giving the “infinite type” a -> a -> a -> ...


6. Testing Strategy

6.1 Unit Tests

-- Unification tests
describe "unify" $ do
  it "unifies type variables" $
    unify (TVar "a") TInt `shouldBe` Right (singleton "a" TInt)

  it "fails occurs check" $
    unify (TVar "a") (TArrow (TVar "a") TInt) `shouldSatisfy` isLeft

  it "unifies function types" $
    unify (TArrow (TVar "a") (TVar "b"))
          (TArrow TInt TBool)
    `shouldBe` Right (fromList [("a", TInt), ("b", TBool)])

-- Inference tests
describe "infer" $ do
  it "infers identity" $
    inferType (Lam "x" Nothing (Var "x"))
    `shouldBe` Right (Forall ["a"] (TArrow (TVar "a") (TVar "a")))

  it "infers const" $
    inferType (Lam "x" Nothing (Lam "y" Nothing (Var "x")))
    `shouldBe` Right (Forall ["a", "b"] (TArrow (TVar "a") (TArrow (TVar "b") (TVar "a"))))

6.2 Property-Based Tests

-- Substitution properties
prop_applyIdempotent :: Subst -> Type -> Bool
prop_applyIdempotent s t = apply s (apply s t) == apply s t

-- Unification properties
prop_unifyCorrect :: Type -> Type -> Property
prop_unifyCorrect t1 t2 =
  case unify t1 t2 of
    Left _ -> property True  -- No constraint on failure
    Right s -> apply s t1 === apply s t2

-- Generalization/Instantiation
prop_instantiateSpecializes :: TypeEnv -> Type -> Property
prop_instantiateSpecializes env tau =
  let scheme = generalize env tau
      tau' = runInfer (instantiate scheme)
  in canUnify tau tau'

6.3 Integration Tests

Test complete programs:

testPrograms :: [(String, String)]
testPrograms =
  [ ("\\x -> x", "forall a. a -> a")
  , ("\\f -> \\x -> f (f x)", "forall a. (a -> a) -> a -> a")
  , ("let id = \\x -> x in id id", "forall a. a -> a")
  , ("let id = \\x -> x in (id 1, id true)", "(Int, Bool)")
  , ("\\x -> x x", "TYPE ERROR")  -- Should fail
  ]

7. Common Pitfalls

7.1 Substitution Composition

Wrong:

compose s2 s1 = Subst (unSubst s1 `Map.union` unSubst s2)

Right:

compose s2 s1 = Subst (Map.map (apply s2) (unSubst s1) `Map.union` unSubst s2)

You must apply s2 to the range of s1!

7.2 Forgetting the Occurs Check

Without the occurs check, your inferencer will loop or produce nonsense:

-- Without occurs check:
unify (TVar "a") (TArrow (TVar "a") TInt)
-- Returns: {a := a -> Int}
-- But a = a -> Int = (a -> Int) -> Int = ...

7.3 Generalization Timing

You must generalize after type checking e1 in let x = e1 in e2, not before. The substitution from checking e1 must be applied first.

7.4 Fresh Variable Collisions

Use a counter or supply monad for fresh variables. Don’t use random names or you’ll get collisions.

7.5 Monomorphism Restriction

In Haskell, top-level bindings without type signatures and with no function arguments don’t get generalized (the “monomorphism restriction”). You can ignore this for your project or implement it as an extension.


8. Extensions and Challenges

8.1 Intermediate Extensions

  1. Type Annotations: Allow optional annotations and check they’re not less general than inferred
  2. Recursive Let: Extend let to allow recursive definitions
  3. Tuples and Records: Add product types with projection
  4. Sum Types: Add Either-like types with case analysis

8.2 Advanced Extensions

  1. Higher-Rank Types: Allow forall inside types (Rank-N polymorphism)
  2. Constraint-Based Inference: Separate constraint generation from solving
  3. Error Recovery: Continue after errors to report multiple problems
  4. Type Classes: Add Haskell-style overloading (significant extension!)

8.3 Research Extensions

  1. Row Polymorphism: Extensible records
  2. GADTs: Generalized Algebraic Data Types
  3. Dependent Types: Types that depend on values
  4. Effect Systems: Track side effects in types

9. Real-World Connections

9.1 Where HM Appears

Language Type System
Haskell HM + type classes + many extensions
OCaml HM + objects + modules
ML (SML) Classic HM
F# HM + .NET interop
Rust HM-inspired + ownership + lifetimes
TypeScript Structural + limited inference (not HM)
Elm Pure HM (no type classes)

9.2 Type Inference in Production

GHC (Haskell): Uses constraint-based inference with extensive extensions (GADTs, type families, etc.). Error messages have improved dramatically with work on error provenance.

Rust: The borrow checker is separate from type inference. Rust’s inference is mostly local (within functions) with limited bidirectional inference.

TypeScript: Uses a “best common type” algorithm with widening. Not HM because it needs to interop with JavaScript’s dynamic typing.

9.3 Industry Applications

  • Compiler internals: Every typed language needs a type checker
  • IDE support: Type inference powers autocomplete and refactoring
  • Static analysis: Type-like systems for security, dataflow
  • Program synthesis: Types guide search for correct programs

10. Interview Questions

After completing this project, you should be able to answer:

  1. What is the difference between type checking and type inference?
    • Checking: verify a term has a given type
    • Inference: discover the type without annotations
    • HM does inference with principal types
  2. Explain the occurs check and why it’s necessary.
    • Prevents infinite types like a = a -> Int
    • Without it, unification would loop or produce nonsense
    • Related to the termination of the unification algorithm
  3. Why can lambda-bound variables not be polymorphic in HM?
    • We don’t know what value will be passed at runtime
    • Can’t generalize because the variable might unify with something concrete
    • Let-polymorphism is safe because we know the full definition
  4. What is a principal type?
    • The most general type of an expression
    • All other valid types are instances of it
    • HM guarantees principal types exist and are computable
  5. How would you extend HM to support type classes?
    • Add constraints/predicates to type schemes
    • Instance resolution during type checking
    • Dictionary passing in implementation
  6. What are the tradeoffs of Hindley-Milner vs. dependent types?
    • HM: decidable inference, simpler, proven track record
    • Dependent types: more expressive, harder inference, fewer mainstream languages
  7. Explain Algorithm W in your own words.
    • Walk the syntax tree
    • Generate fresh type variables for unknowns
    • Collect constraints via unification
    • Generalize at let-bindings
    • Return accumulated substitution and final type

11. Self-Assessment Checklist

Phase 1: STLC Type Checker

  • Can type-check identity: lambda x:Int. x : Int -> Int
  • Can type-check composition: lambda f:(Int->Int). lambda g:(Int->Int). lambda x:Int. f (g x)
  • Rejects ill-typed terms with helpful errors
  • Parser handles nested lambdas and applications correctly

Phase 2: Unification

  • Unifies type variables with concrete types
  • Unifies function types recursively
  • Occurs check prevents infinite types
  • Composition of substitutions is correct
  • Property tests pass

Phase 3: Algorithm W

  • Infers forall a. a -> a for \x -> x
  • Infers forall a b. a -> b -> a for \x -> \y -> x
  • Let-polymorphism works: let id = \x -> x in (id 1, id true)
  • Rejects \x -> x x with occurs check error
  • Fresh variable generation is sound

Phase 4: Extensions

  • Recursive let works for factorial
  • Error messages include source locations
  • Type annotations are checked for consistency
  • REPL provides good user experience

Understanding

  • Can explain why types prevent runtime errors
  • Can trace through Algorithm W by hand
  • Can explain let-polymorphism vs. lambda restriction
  • Can relate to Haskell/OCaml type inference
  • Can discuss soundness (progress + preservation)

12. Resources

Primary Texts

  1. “Types and Programming Languages” by Benjamin Pierce
    • Chapters 9-11: Simply Typed Lambda Calculus
    • Chapter 22: Type Reconstruction (Hindley-Milner)
    • The definitive reference
  2. “Haskell Programming from First Principles” by Allen & Moronuki
    • Chapter 5: Types
    • Chapter 6: Typeclasses
    • Good intuition-building

Papers

  1. “Principal Type-Schemes for Functional Programs” by Damas and Milner (1982)
    • The original Algorithm W paper
    • Dense but foundational
  2. “A Tutorial Introduction to Type Inference” by Luca Cardelli
    • More accessible than Pierce
    • Good pseudocode
  3. “Typing Haskell in Haskell” by Mark P. Jones
    • Complete type checker implementation
    • Includes type classes

Online Resources

  1. Write You a Haskell
    • Chapter on type inference
    • Working Haskell code
  2. Algorithm W Step by Step
    • Martin Grabmuller’s tutorial
    • Complete implementation
  3. Stanford CS242: Type Inference
    • Lecture notes with examples
    • Good for review

Reference Implementations

  1. Hindley-Milner in 500 lines: Various implementations on GitHub
  2. GHC source: The real thing (complex but instructive)
  3. Mini-ML: Various teaching implementations

“The type system is a theorem prover, and type checking is proof verification. Every program that type-checks comes with a machine-verified proof that certain errors cannot occur.”

After completing this project, you’ll never look at a type error the same way again.