Project 7: Parser Combinator Library

Building the Crown Jewel of Functional Programming

Main Programming Language: Haskell
Alternative Languages: Scala, OCaml, F#
Coolness Level: Level 4: Hardcore Tech Flex
Difficulty: Level 3: Advanced
Knowledge Area: Parsing / Combinators
Estimated Time: 2-3 weeks
Prerequisites: Functor, Applicative, Monad, Alternative type classes; Basic recursion; Understanding of grammars

Learning Objectives

After completing this project, you will be able to:

Design parser types from first principles - Derive the Parser a = String -> Maybe (a, String) type and understand why each component is necessary
Implement Functor, Applicative, and Monad instances for parsers - Write these instances by hand and explain what each operation means for parsing
Use the Alternative type class for choice - Implement <|> for backtracking and understand when parsers should backtrack
Build complex parsers from simple primitives - Combine satisfy, char, string into parsers for JSON, arithmetic, or programming languages
Handle parse errors gracefully - Implement meaningful error messages with position tracking
Understand the relationship between grammars and combinators - Translate BNF/EBNF directly into combinator code
Apply monadic parsing to real-world formats - Parse JSON, CSV, configuration files, or simple programming languages

Conceptual Foundation

What Is Parsing?

Parsing is the process of transforming unstructured text into structured data. When you read the string "[1, 2, 3]" and produce a Haskell list [1, 2, 3], you are parsing. When a compiler reads "x + y * z" and produces an abstract syntax tree respecting operator precedence, it is parsing.

The fundamental challenge of parsing is ambiguity and structure recovery. Text is linear—a sequence of characters. But the data it represents often has hierarchical structure: nested lists, nested function calls, nested expressions. The parser’s job is to recover this hidden structure.

Traditional Approaches vs. Combinators

Historically, parsing was done with specialized tools:

Hand-written recursive descent parsers - Tedious, error-prone, but understandable
Parser generators (yacc, bison, ANTLR) - Powerful but require learning a separate grammar language
Regular expressions - Limited to regular languages (no nesting)

Parser combinators offer a different approach: parsers as first-class values. Instead of describing your grammar in a separate DSL, you write it directly in your host language. The grammar is the code.

-- Traditional grammar definition (BNF):
-- expr   ::= term (('+' | '-') term)*
-- term   ::= factor (('*' | '/') factor)*
-- factor ::= number | '(' expr ')'

-- Parser combinator version (this is actual Haskell code):
expr :: Parser Expr
expr = chainl1 term addOp

term :: Parser Expr
term = chainl1 factor mulOp

factor :: Parser Expr
factor = number <|> parens expr

The combinator version is not a separate specification—it’s executable code that directly performs the parsing.

The Parser Type: Building Intuition

What should the type of a parser be? Let’s derive it step by step.

Attempt 1: Parser as a function

A parser takes a string and produces something:

type Parser a = String -> a

But this doesn’t handle failure. What if the input isn’t valid?

Attempt 2: Adding failure with Maybe

type Parser a = String -> Maybe a

Better! Now Nothing represents parse failure. But we have a problem: after parsing part of the input, we need to know what’s left to continue parsing.

Attempt 3: Returning remaining input

type Parser a = String -> Maybe (a, String)

Now this is useful! A parser takes a string, potentially parses some prefix, and returns both the parsed value and the unconsumed remainder.

For example, parsing an integer from "123abc":

Input: "123abc"
Output: Just (123, "abc")

This is the classic parser combinator type. Let’s make it a proper newtype:

newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

Understanding the Parser Type Deeply

Let’s examine what each part of String -> Maybe (a, String) contributes:

                    String -> Maybe (a, String)
                    ------    -----  ----------
                      |         |        |
                      |         |        +-- (a, String): Result type
                      |         |            a: The parsed value
                      |         |            String: Remaining input
                      |         |
                      |         +-- Maybe: Handles failure
                      |             Just: Successful parse
                      |             Nothing: Parse failure
                      |
                      +-- String: Input to consume

Why a tuple? Because parsers are incremental. After one parser succeeds, the next parser continues from where it left off:

Input:    "123abc"
          ^^^
Parser 1: integer
Output 1: Just (123, "abc")
                     ^^^^
                     Passed to Parser 2

Input:    "abc"
          ^^^
Parser 2: letters
Output 2: Just ("abc", "")

This is the essence of sequential composition in parsing.

The Primitive Parsers

Every parser combinator library is built from a few fundamental primitives:

1. satisfy - The Foundation

satisfy :: (Char -> Bool) -> Parser Char
satisfy predicate = Parser $ \input ->
    case input of
        (c:cs) | predicate c -> Just (c, cs)
        _                    -> Nothing

satisfy consumes one character if it satisfies a predicate. All other character-based parsers are built on top of it:

char :: Char -> Parser Char
char c = satisfy (== c)

digit :: Parser Char
digit = satisfy isDigit

letter :: Parser Char
letter = satisfy isAlpha

anyChar :: Parser Char
anyChar = satisfy (const True)

2. pure (or return) - Success Without Consuming

instance Applicative Parser where
    pure x = Parser $ \input -> Just (x, input)
    -- ...

pure always succeeds with a given value, consuming no input. This is essential for constructing results:

-- Parse "true" and return the boolean True
trueParser :: Parser Bool
trueParser = string "true" *> pure True

3. empty (or mzero) - Guaranteed Failure

instance Alternative Parser where
    empty = Parser $ \_ -> Nothing
    -- ...

empty always fails. Useful for conditional failure:

-- Parse a digit, but fail if it's zero
nonZeroDigit :: Parser Char
nonZeroDigit = satisfy isDigit >>= \c ->
    if c == '0' then empty else pure c

Functor: Transforming Parse Results

The Functor instance lets you transform what a parser returns:

instance Functor Parser where
    fmap f (Parser p) = Parser $ \input ->
        case p input of
            Nothing        -> Nothing
            Just (a, rest) -> Just (f a, rest)

Visualized:

                 Parser a
                    |
                    | fmap f
                    v
                 Parser b

Input: "123abc"
       ^^^
Parser (digit): Just ('1', "23abc")
                      ^^^
fmap digitToInt:      Just (1, "23abc")
                            ^
                            Transformed!

Common uses:

-- Parse a digit character, convert to Int
digitAsInt :: Parser Int
digitAsInt = fmap digitToInt digit

-- Parse quoted string, drop the quotes
quotedString :: Parser String
quotedString = fmap (init . tail) (char '"' *> many (satisfy (/= '"')) <* char '"')
-- Or better: char '"' *> many (satisfy (/= '"')) <* char '"'

Applicative: Sequencing Independent Parsers

The Applicative instance sequences parsers and combines their results:

instance Applicative Parser where
    pure x = Parser $ \input -> Just (x, input)

    (Parser pf) <*> (Parser pa) = Parser $ \input ->
        case pf input of
            Nothing        -> Nothing
            Just (f, rest) -> case pa rest of
                Nothing         -> Nothing
                Just (a, rest') -> Just (f a, rest')

The key insight: <*> runs the first parser, then runs the second on the remaining input, and combines the results:

Parser (a -> b)  <*>  Parser a  =  Parser b

Input: "ABCdef"
       ^^^
Parser 1: Parse "ABC", return (\x -> "ABC" ++ x)
Output 1: Just ((\x -> "ABC" ++ x), "def")
                                    ^^^
                                    Remaining input

Input for Parser 2: "def"
                    ^^^
Parser 2: Parse "def", return "def"
Output 2: Just ("def", "")

Combined: (\x -> "ABC" ++ x) applied to "def"
Result: Just ("ABCdef", "")

Applicative operators for convenience:

(*>) :: Parser a -> Parser b -> Parser b  -- Run both, keep right
(<*) :: Parser a -> Parser b -> Parser a  -- Run both, keep left
(<$>) :: (a -> b) -> Parser a -> Parser b -- Same as fmap

Parsing a point (3,4):

data Point = Point Int Int

point :: Parser Point
point = Point
    <$> (char '(' *> integer)
    <*> (char ',' *> integer <* char ')')

-- Or with ApplicativeDo:
point = do
    char '('
    x <- integer
    char ','
    y <- integer
    char ')'
    pure (Point x y)

Monad: Context-Sensitive Parsing

Sometimes you need the result of one parser to decide what to parse next. This requires Monad:

instance Monad Parser where
    return = pure

    (Parser pa) >>= f = Parser $ \input ->
        case pa input of
            Nothing        -> Nothing
            Just (a, rest) -> runParser (f a) rest

The crucial difference from Applicative: with >>=, the second parser is computed from the first result.

Example: Parsing length-prefixed data

Format: "3:abc" means "3 characters follow: abc"
        "5:hello" means "5 characters follow: hello"

lengthPrefixed :: Parser String
lengthPrefixed = do
    n <- integer       -- Parse the length
    char ':'
    count n anyChar    -- Parse exactly n characters
                       -- 'count n' depends on the VALUE of n!

This is impossible with just Applicative because count n needs the value n that was parsed.

When to use Monad vs Applicative:

Use Applicative when the structure of your parser is fixed (most grammars)
Use Monad when the structure depends on parsed values (length prefixes, version-dependent formats)

Applicative is faster because parsers can be analyzed statically. Prefer it when possible.

Alternative: Choice and Backtracking

The Alternative instance provides choice between parsers:

instance Alternative Parser where
    empty = Parser $ \_ -> Nothing

    (Parser p1) <|> (Parser p2) = Parser $ \input ->
        case p1 input of
            Nothing -> p2 input  -- p1 failed, try p2 on ORIGINAL input
            result  -> result    -- p1 succeeded, use its result

Critical concept: Backtracking

When p1 fails, p2 gets the original input, not what p1 consumed. This is called backtracking.

Input: "false"
Parser 1: string "true"
  Attempt: 'f' != 't', fails immediately
  Consumed: nothing

Parser 2: string "false"  -- Gets ORIGINAL input
  Attempt: matches!
  Result: Just ("false", "")

Derived combinators from Alternative:

many :: Parser a -> Parser [a]   -- Zero or more
some :: Parser a -> Parser [a]   -- One or more
optional :: Parser a -> Parser (Maybe a)

-- Implementations:
many p = some p <|> pure []
some p = (:) <$> p <*> many p

The Problem with Naive Backtracking

Backtracking can be expensive. Consider:

keyword :: Parser String
keyword = string "function" <|> string "for" <|> string "func"

For input "function":

Try "function" - matches!

For input "forward":

Try "function" - ‘f’, ‘o’, ‘r’ match, then ‘w’ != ‘c’, fail after consuming 3 chars
Backtrack to start, try "for" - matches!

For input "foreach":

Try "function" - ‘f’, ‘o’, ‘r’ match, fail
Backtrack, try "for" - matches!
But wait… we wanted "foreach", and "for" consumed only 3 chars

This naive approach has problems:

Performance: Re-scanning the same prefix multiple times
Ambiguity: "for" matches as prefix of "foreach"

Advanced Parsers: The `try` Combinator

Real parser libraries like Parsec use predictive parsing: by default, once a parser consumes input, it “commits” and won’t backtrack.

-- In Parsec/Megaparsec:
keyword = string "function" <|> string "for"

-- On input "forward":
-- 1. Try "function", consume "f", commit
-- 2. "o" matches, "r" matches, "w" fails
-- 3. Parser FAILS (no backtrack because we consumed input)

The try combinator enables backtracking:

keyword = try (string "function") <|> string "for"

-- Now on input "forward":
-- 1. try (string "function") consumes "for", fails on 'w'
-- 2. try causes BACKTRACK to original position
-- 3. string "for" matches

For this project, implement the simpler always-backtracking version first, then optionally add try for the commit/backtrack distinction.

Error Messages and Position Tracking

A parser that just returns Nothing is frustrating. Real parsers need error messages:

data ParseError = ParseError
    { errorPos     :: SourcePos
    , errorMessage :: String
    , expected     :: [String]
    }

data SourcePos = SourcePos
    { sourceLine   :: Int
    , sourceColumn :: Int
    }

Enhanced parser type:

newtype Parser a = Parser
    { runParser :: String -> SourcePos -> Either ParseError (a, String, SourcePos)
    }

Now parsers track position and accumulate meaningful error messages:

Parse error at line 3, column 15:
    unexpected '}'
    expecting expression

From Grammar to Combinators: The Translation

Any context-free grammar can be translated to combinators mechanically:

BNF Grammar:

<json>   ::= <object> | <array> | <string> | <number> | <bool> | "null"
<object> ::= '{' <members>? '}'
<members> ::= <pair> (',' <pair>)*
<pair>   ::= <string> ':' <json>
<array>  ::= '[' <elements>? ']'
<elements> ::= <json> (',' <json>)*

Direct Translation:

json :: Parser JSON
json = object <|> array <|> string <|> number <|> bool <|> null_

object :: Parser JSON
object = JObject <$> (char '{' *> optional members <* char '}')

members :: Parser [(String, JSON)]
members = sepBy1 pair (char ',')

pair :: Parser (String, JSON)
pair = (,) <$> stringLiteral <*> (char ':' *> json)

array :: Parser JSON
array = JArray <$> (char '[' *> optional elements <* char ']')

elements :: Parser [JSON]
elements = sepBy1 json (char ',')

The pattern:

| in BNF becomes <|> in combinators
Sequencing in BNF becomes <*> or *>
* (zero or more) becomes many
+ (one or more) becomes some
? (optional) becomes optional

Why Parser Combinators Embody Functional Programming

Parser combinators are the crown jewel of FP because they demonstrate every core principle:

Functions as first-class values: Parsers ARE functions
Composition: Complex parsers from simple ones via <*>, <|>, >>=
Higher-order functions: many, sepBy, chainl1 are all HOFs
Algebraic structures: Functor, Applicative, Monad, Alternative
Referential transparency: Same input always produces same output
Declarative style: Grammar IS the code

Compare to imperative parsing:

// Imperative
int parseInteger(String input, int[] position) {
    int start = position[0];
    while (position[0] < input.length() &&
           Character.isDigit(input.charAt(position[0]))) {
        position[0]++;
    }
    return Integer.parseInt(input.substring(start, position[0]));
}

-- Functional
integer :: Parser Int
integer = read <$> some digit

The functional version is shorter, clearer, and composable.

The `Applicative` Parsing Intuition

Here’s a mental model that helps: Applicative parsing is like filling in a template.

data Person = Person String Int String  -- Name, Age, City

person :: Parser Person
person = Person
    <$> name       -- Fill in slot 1
    <*> age        -- Fill in slot 2
    <*> city       -- Fill in slot 3

Reading left to right:

Start with constructor Person :: String -> Int -> String -> Person
Apply <$> to get Parser (Int -> String -> Person)
Apply <*> to get Parser (String -> Person)
Apply <*> to get Parser Person

Each <*> “uses up” one argument to the constructor.

Common Combinator Patterns

Separated lists:

sepBy :: Parser a -> Parser sep -> Parser [a]
sepBy p sep = sepBy1 p sep <|> pure []

sepBy1 :: Parser a -> Parser sep -> Parser [a]
sepBy1 p sep = (:) <$> p <*> many (sep *> p)

Chained expressions (left-associative):

chainl1 :: Parser a -> Parser (a -> a -> a) -> Parser a
chainl1 p op = do
    x <- p
    rest x
  where
    rest x = (do
        f <- op
        y <- p
        rest (f x y)) <|> pure x

For expression 1 + 2 + 3:

chainl1 number (char '+' *> pure (+))
Parse 1, x = 1
Parse +, parse 2, rest (1 + 2) = rest 3
Parse +, parse 3, rest (3 + 3) = rest 6
No more +, return 6

Between delimiters:

between :: Parser open -> Parser close -> Parser a -> Parser a
between open close p = open *> p <* close

parens :: Parser a -> Parser a
parens = between (char '(') (char ')')

brackets :: Parser a -> Parser a
brackets = between (char '[') (char ']')

Project Specification

Core Requirements

Build a parser combinator library that can parse non-trivial structured text. Your library must include:

1. The Parser Type

Newtype wrapper around the parsing function
Support for tracking remaining input
(Optional) Position tracking for error messages

2. Basic Parsers

satisfy :: (Char -> Bool) -> Parser Char
char :: Char -> Parser Char
string :: String -> Parser String
anyChar :: Parser Char
eof :: Parser () (succeeds only at end of input)

3. Type Class Instances

Functor Parser
Applicative Parser
Monad Parser
Alternative Parser

4. Combinators

many :: Parser a -> Parser [a] (zero or more)
some :: Parser a -> Parser [a] (one or more)
optional :: Parser a -> Parser (Maybe a)
sepBy :: Parser a -> Parser sep -> Parser [a]
sepBy1 :: Parser a -> Parser sep -> Parser [a]
between :: Parser open -> Parser close -> Parser a -> Parser a
chainl1 :: Parser a -> Parser (a -> a -> a) -> Parser a
choice :: [Parser a] -> Parser a

5. Lexical Helpers

spaces :: Parser String (whitespace)
lexeme :: Parser a -> Parser a (consume trailing whitespace)
symbol :: String -> Parser String (string + trailing whitespace)
integer :: Parser Int

6. A Non-Trivial Parser

Choose ONE of:

JSON parser - Parse JSON into a Haskell data type
Arithmetic expression parser - Respect operator precedence
Simple programming language - Variables, let bindings, functions

Stretch Goals

Add position tracking and meaningful error messages
Implement try for explicit backtracking control
Add <?> :: Parser a -> String -> Parser a for labeling parsers in errors
Implement lookAhead :: Parser a -> Parser a (peek without consuming)
Implement notFollowedBy :: Parser a -> Parser () (negative lookahead)

Solution Architecture

Module Structure

src/
  Parser/
    Core.hs         -- Parser type, Functor/Applicative/Monad/Alternative
    Primitives.hs   -- satisfy, char, string, anyChar
    Combinators.hs  -- many, some, sepBy, chainl1, etc.
    Lexer.hs        -- spaces, lexeme, symbol, integer
    Error.hs        -- (Optional) Error types and messages

  Examples/
    JSON.hs         -- JSON parser using your library
    Expr.hs         -- Arithmetic expression parser

test/
    CoreSpec.hs     -- Property tests for laws
    JSONSpec.hs     -- JSON parsing tests

The Parser Type (Two Options)

Simple Version:

newtype Parser a = Parser
    { runParser :: String -> Maybe (a, String)
    }

With Errors:

newtype Parser a = Parser
    { runParser :: String -> Either ParseError (a, String)
    }

data ParseError = ParseError
    { errorPosition :: Int
    , errorMessage  :: String
    }

Implementation Strategy

Start with the simple type - Get Functor/Applicative/Monad/Alternative working
Build primitives - satisfy, char, string
Derive combinators - Most are built on the instances
Add lexing - Whitespace handling
Build your target parser - JSON or expressions
Optionally enhance errors - Add position tracking

Implementation Guide

Phase 1: Core Parser Type and Instances

Goal: Implement the Parser newtype and its Functor/Applicative/Monad/Alternative instances.

Step 1: Define the type

module Parser.Core where

newtype Parser a = Parser
    { runParser :: String -> Maybe (a, String)
    }

Step 2: Implement Functor

Think about what fmap means: transform the result without changing parsing behavior.

instance Functor Parser where
    fmap f (Parser p) = Parser $ \input ->
        -- Pattern match on result, apply f to the parsed value
        -- Hint: use fmap on the outer Maybe, then on the tuple

Step 3: Implement Applicative

pure succeeds without consuming. <*> sequences parsers.

instance Applicative Parser where
    pure x = Parser $ \input -> -- Return x, don't consume input

    (Parser pf) <*> (Parser pa) = Parser $ \input ->
        -- Run pf, get function f and remaining input
        -- Run pa on remaining input, get value a
        -- Return f a with final remaining input

Step 4: Implement Monad

>>= lets the second parser depend on the first result.

instance Monad Parser where
    return = pure

    (Parser pa) >>= f = Parser $ \input ->
        -- Run pa, get value a and remaining input
        -- Apply f to a to get new parser
        -- Run new parser on remaining input

Step 5: Implement Alternative

empty always fails. <|> tries second on failure of first.

instance Alternative Parser where
    empty = Parser $ \_ -> Nothing

    (Parser p1) <|> (Parser p2) = Parser $ \input ->
        -- Try p1 on input
        -- If Nothing, try p2 on SAME input
        -- If Just, return that result

Verification: Write tests for the Functor/Applicative/Monad laws.

Phase 2: Primitive Parsers

Goal: Implement the building blocks all other parsers use.

module Parser.Primitives where

import Parser.Core

-- The fundamental primitive
satisfy :: (Char -> Bool) -> Parser Char
satisfy predicate = Parser $ \input ->
    case input of
        []     -> Nothing
        (c:cs) -> if predicate c then Just (c, cs) else Nothing

-- Derived from satisfy
char :: Char -> Parser Char
char c = satisfy (== c)

digit :: Parser Char
digit = satisfy isDigit

letter :: Parser Char
letter = satisfy isAlpha

anyChar :: Parser Char
anyChar = satisfy (const True)

-- End of input
eof :: Parser ()
eof = Parser $ \input ->
    case input of
        [] -> Just ((), "")
        _  -> Nothing

-- Parse a specific string
string :: String -> Parser String
string []     = pure []
string (c:cs) = (:) <$> char c <*> string cs
-- Or: string s = traverse char s

Phase 3: Combinators

Goal: Build the higher-level parsing tools.

module Parser.Combinators where

import Control.Applicative (Alternative(..))
import Parser.Core
import Parser.Primitives

-- many and some come from Alternative, but let's implement them:
manyP :: Parser a -> Parser [a]
manyP p = someP p <|> pure []

someP :: Parser a -> Parser [a]
someP p = (:) <$> p <*> manyP p

-- Separated by
sepBy :: Parser a -> Parser sep -> Parser [a]
sepBy p sep = sepBy1 p sep <|> pure []

sepBy1 :: Parser a -> Parser sep -> Parser [a]
sepBy1 p sep = (:) <$> p <*> many (sep *> p)

-- Between delimiters
between :: Parser open -> Parser close -> Parser a -> Parser a
between open close p = open *> p <* close

-- Choice from list
choice :: [Parser a] -> Parser a
choice = foldr (<|>) empty

-- Optional with default
option :: a -> Parser a -> Parser a
option x p = p <|> pure x

-- Left-associative chaining (for operators)
chainl1 :: Parser a -> Parser (a -> a -> a) -> Parser a
chainl1 p op = p >>= rest
  where
    rest x = (do
        f <- op
        y <- p
        rest (f x y)) <|> pure x

-- Right-associative chaining
chainr1 :: Parser a -> Parser (a -> a -> a) -> Parser a
chainr1 p op = do
    x <- p
    (do
        f <- op
        y <- chainr1 p op
        pure (f x y)) <|> pure x

Phase 4: Lexer Utilities

Goal: Handle whitespace and token-level parsing.

module Parser.Lexer where

import Parser.Core
import Parser.Primitives
import Parser.Combinators
import Data.Char (isSpace, isDigit)

-- Whitespace
spaces :: Parser String
spaces = many (satisfy isSpace)

-- Parser that consumes trailing whitespace
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

-- Parse a symbol (string + trailing whitespace)
symbol :: String -> Parser String
symbol s = lexeme (string s)

-- Parse an integer
integer :: Parser Int
integer = read <$> some digit

-- Signed integer
signedInteger :: Parser Int
signedInteger = do
    sign <- option '+' (char '-' <|> char '+')
    n <- integer
    pure $ if sign == '-' then -n else n

Phase 5: Build Your Target Parser

Option A: JSON Parser

module Examples.JSON where

data JSON
    = JNull
    | JBool Bool
    | JNumber Double
    | JString String
    | JArray [JSON]
    | JObject [(String, JSON)]
    deriving (Show, Eq)

json :: Parser JSON
json = lexeme $ choice
    [ JNull   <$  symbol "null"
    , JBool   <$> bool
    , JNumber <$> number
    , JString <$> stringLiteral
    , JArray  <$> array
    , JObject <$> object
    ]

bool :: Parser Bool
bool = (True <$ symbol "true") <|> (False <$ symbol "false")

number :: Parser Double
-- Handle integers, decimals, exponents

stringLiteral :: Parser String
-- Handle quotes, escape sequences

array :: Parser [JSON]
array = between (symbol "[") (symbol "]") (sepBy json (symbol ","))

object :: Parser [(String, JSON)]
object = between (symbol "{") (symbol "}") (sepBy pair (symbol ","))

pair :: Parser (String, JSON)
pair = (,) <$> (stringLiteral <* symbol ":") <*> json

Option B: Arithmetic Expressions

module Examples.Expr where

data Expr
    = Lit Int
    | Add Expr Expr
    | Sub Expr Expr
    | Mul Expr Expr
    | Div Expr Expr
    | Neg Expr
    deriving (Show, Eq)

expr :: Parser Expr
expr = term `chainl1` addOp

term :: Parser Expr
term = factor `chainl1` mulOp

factor :: Parser Expr
factor = choice
    [ Lit <$> lexeme integer
    , between (symbol "(") (symbol ")") expr
    , Neg <$> (symbol "-" *> factor)
    ]

addOp :: Parser (Expr -> Expr -> Expr)
addOp = (Add <$ symbol "+") <|> (Sub <$ symbol "-")

mulOp :: Parser (Expr -> Expr -> Expr)
mulOp = (Mul <$ symbol "*") <|> (Div <$ symbol "/")

eval :: Expr -> Int
eval (Lit n)     = n
eval (Add e1 e2) = eval e1 + eval e2
eval (Sub e1 e2) = eval e1 - eval e2
eval (Mul e1 e2) = eval e1 * eval e2
eval (Div e1 e2) = eval e1 `div` eval e2
eval (Neg e)     = - eval e

Phase 6: Error Enhancement (Optional)

Goal: Add position tracking and meaningful errors.

data ParseError = ParseError
    { errorPos      :: SourcePos
    , errorUnexpected :: Maybe Char
    , errorExpected   :: [String]
    }

data SourcePos = SourcePos
    { line   :: Int
    , column :: Int
    } deriving (Show, Eq)

newtype Parser a = Parser
    { runParser :: String -> SourcePos -> Either ParseError (a, String, SourcePos)
    }

-- Update position after consuming character
updatePos :: SourcePos -> Char -> SourcePos
updatePos (SourcePos l c) '\n' = SourcePos (l + 1) 1
updatePos (SourcePos l c) _    = SourcePos l (c + 1)

-- Label a parser for error messages
(<?>) :: Parser a -> String -> Parser a
p <?> label = -- On failure, include label in "expected" list

Testing Strategy

Unit Tests

-- Primitive tests
testSatisfy :: Spec
testSatisfy = describe "satisfy" $ do
    it "succeeds when predicate is true" $
        runParser (satisfy (== 'a')) "abc" `shouldBe` Just ('a', "bc")

    it "fails when predicate is false" $
        runParser (satisfy (== 'a')) "xyz" `shouldBe` Nothing

    it "fails on empty input" $
        runParser (satisfy (const True)) "" `shouldBe` Nothing

testString :: Spec
testString = describe "string" $ do
    it "parses exact match" $
        runParser (string "hello") "hello world" `shouldBe` Just ("hello", " world")

    it "fails on partial match" $
        runParser (string "hello") "help" `shouldBe` Nothing

Property-Based Tests (with QuickCheck)

-- Functor laws
prop_functorIdentity :: String -> Bool
prop_functorIdentity input =
    runParser (fmap id anyChar) input == runParser anyChar input

prop_functorComposition :: Fun Char Char -> Fun Char Char -> String -> Bool
prop_functorComposition (Fun _ f) (Fun _ g) input =
    runParser (fmap (f . g) anyChar) input ==
    runParser (fmap f . fmap g $ anyChar) input

-- Applicative laws
prop_applicativeIdentity :: String -> Bool
prop_applicativeIdentity input =
    runParser (pure id <*> anyChar) input == runParser anyChar input

-- Monad laws
prop_monadLeftIdentity :: Char -> Fun Char Char -> String -> Bool
prop_monadLeftIdentity x (Fun _ f) input =
    runParser (return x >>= \c -> return (f c)) input ==
    runParser (return (f x)) input

-- Alternative laws
prop_alternativeAssociativity :: String -> Property
prop_alternativeAssociativity input =
    runParser ((p1 <|> p2) <|> p3) input ===
    runParser (p1 <|> (p2 <|> p3)) input
  where
    p1 = char 'a'
    p2 = char 'b'
    p3 = char 'c'

Integration Tests (JSON Example)

testJSONParser :: Spec
testJSONParser = describe "JSON parser" $ do
    it "parses null" $
        runParser json "null" `shouldBe` Just (JNull, "")

    it "parses booleans" $ do
        runParser json "true" `shouldBe` Just (JBool True, "")
        runParser json "false" `shouldBe` Just (JBool False, "")

    it "parses integers" $
        runParser json "42" `shouldBe` Just (JNumber 42.0, "")

    it "parses strings" $
        runParser json "\"hello\"" `shouldBe` Just (JString "hello", "")

    it "parses arrays" $
        runParser json "[1, 2, 3]" `shouldBe`
            Just (JArray [JNumber 1, JNumber 2, JNumber 3], "")

    it "parses nested structures" $
        runParser json "{\"a\": [1, 2], \"b\": null}" `shouldBe`
            Just (JObject [("a", JArray [JNumber 1, JNumber 2]), ("b", JNull)], "")

Common Pitfalls

Pitfall 1: Left Recursion

Problem: Parser hangs infinitely.

-- WRONG: Left recursion causes infinite loop
expr :: Parser Expr
expr = (Add <$> expr <*> (char '+' *> term)) <|> term
--              ^^^^
--              Calls itself without consuming input!

Solution: Use chainl1 or rewrite to be right-recursive/iterative.

-- CORRECT: Left-factor or use chainl1
expr = term `chainl1` ((+) <$ char '+')

Pitfall 2: Forgetting Backtracking

Problem: Parser fails when it should try alternatives.

keyword = string "function" <|> string "func"

-- On input "func", this fails!
-- "function" consumes "func", then fails on 't' vs 'i'
-- No backtrack occurs

In the simple always-backtracking model, this works fine. But if you implement commit semantics like Parsec, you need try:

keyword = try (string "function") <|> string "func"

Pitfall 3: Greedy `many` Causing Failures

Problem: many consumes too much, causing the next parser to fail.

-- WRONG: many anyChar consumes EVERYTHING, including the quote
quotedString = char '"' *> many anyChar <* char '"'

runParser quotedString "\"hello\"" -- Nothing!

Solution: Be more specific about what you’re matching.

-- CORRECT: Stop at quote
quotedString = char '"' *> many (satisfy (/= '"')) <* char '"'

Pitfall 4: Mixing Up `>` and `<`

Problem: Discarding the wrong result.

-- These are different!
char '(' *> expr <* char ')'  -- Returns expr result
char '(' *> expr *> char ')'  -- Returns ')' (the last parser's result)

Memory aid:

*> points right: keep the RIGHT result
<* points left: keep the LEFT result

Pitfall 5: Not Handling Whitespace

Problem: Parser works for "123" but fails for " 123" or "123 ".

Solution: Decide on a whitespace strategy:

-- Option 1: Consume leading whitespace once, trailing whitespace after each token
json = spaces *> jsonValue

-- Option 2: Every parser consumes its trailing whitespace
integer = read <$> some digit <* spaces

Extensions and Challenges

Extension 1: Implement `try` for Explicit Backtracking

Add commit/backtrack semantics. By default, consuming input commits to that branch.

try :: Parser a -> Parser a
-- If p fails after consuming input, backtrack to original position

Extension 2: Parse a Complete Programming Language

Extend your expression parser to a mini-language with:

Variable bindings: let x = 5 in x + 1
Conditionals: if x > 0 then x else -x
Functions: \x -> x + 1
Function application: f 5

Extension 3: Add Megaparsec-style Error Handling

Implement:

Error message accumulation
“expected” labels
Position tracking with file name, line, column
Pretty-printed error messages

Extension 4: Implement Indentation-Sensitive Parsing

Parse Python-style or Haskell-style indentation:

-- Track current indentation level
type Indent = Int

newtype Parser a = Parser
    { runParser :: String -> Indent -> Maybe (a, String, Indent)
    }

block :: Parser a -> Parser [a]
-- Parse multiple items at same or greater indentation

Extension 5: Make It a Real Library

Add documentation with Haddock
Publish to Hackage
Compare performance with Parsec/Megaparsec
Add benchmarks with criterion

Real-World Connections

Industry Use Cases

Compilers and Interpreters: GHC uses parser combinators for Haskell parsing
Configuration Files: Parse YAML, TOML, INI files
Domain-Specific Languages: Internal DSLs for business rules, queries
Data Formats: JSON, XML, CSV, log files
Protocol Parsing: HTTP headers, email (RFC 5322), URLs

Popular Parser Combinator Libraries

Library	Language	Notes
Parsec	Haskell	The original, by Daan Leijen
Megaparsec	Haskell	Modern, better errors, more features
attoparsec	Haskell	Faster, less features, for protocols
FParsec	F#	Port of Parsec
nom	Rust	Zero-copy, binary parsing focus
pyparsing	Python	Python-style API
Scala Parser Combinators	Scala	Standard library

The Parsec Paper

Read “Parsec: Direct Style Monadic Parser Combinators For The Real World” by Daan Leijen. It explains:

Why commit/backtrack semantics improve error messages
How to handle left recursion
Performance optimizations

Interview Questions

After completing this project, you should be able to answer:

“Explain the type String -> Maybe (a, String) for a parser. Why each component?”
- String: Input to consume
- Maybe: Handles success/failure
- a: The parsed result
- String: Remaining unconsumed input
“Why do parser combinators use the Monad type class?”
- Monad allows the next parser to depend on the previous result
- Essential for context-sensitive grammars (length-prefixed data, version-dependent formats)
- >>= threads state (remaining input) through the computation
“What is the difference between Applicative and Monad parsing?”
- Applicative: Structure is static, parsers are independent
- Monad: Structure can depend on parsed values
- Applicative is more analyzable (can determine what input looks like before running)
“How does <|> implement backtracking?”
- When left parser fails, right parser receives original input
- In commit semantics, try is needed to enable backtracking after consumption
“What is left recursion and why is it a problem for recursive descent parsers?”
- Left recursion: A → A α | β
- Parser calls itself without consuming input
- Causes infinite loop
- Solution: Left-factor or use iterative combinators like chainl1
“How would you parse operator precedence (e.g., * before +)?”
- Stratify the grammar by precedence level
- Higher precedence operators in inner productions
- Use chainl1 or chainr1 for each level
“What is the relationship between context-free grammars and parser combinators?”
- Direct correspondence: each production becomes a parser
- | becomes <|>, sequencing becomes <*> or >>=
- Combinators can express context-sensitive features too

Self-Assessment Checklist

Check off each item when you’ve mastered it:

Fundamentals

I can explain why parsers return (a, String) tuples
I understand the difference between Maybe and Either for error handling
I can trace through what happens when chaining parsers with >>=

Type Classes

I can implement Functor for Parser from scratch
I can implement Applicative for Parser from scratch
I can implement Monad for Parser from scratch
I can implement Alternative for Parser from scratch
I can explain when to use <*> vs >>=

Combinators

I can implement many and some in terms of Alternative
I can implement sepBy and sepBy1
I can implement chainl1 for left-associative operators
I understand why chainl1 avoids left recursion

Practical Parsing

I can translate a BNF grammar to combinator code
I can parse JSON or arithmetic expressions
I can handle operator precedence correctly
I understand whitespace handling strategies

Advanced Topics

I can explain commit vs backtrack semantics
I understand when try is needed
I can add position tracking for error messages
I can debug common parser issues (infinite loops, unexpected failures)

Resources

Books

Book	Chapter	What You’ll Learn
“Programming in Haskell” (Hutton)	Chapter 13	The definitive introduction to parsing
“Haskell Programming from First Principles”	Chapters 24-25	Parser combinators step by step
“Real World Haskell”	Chapter 16	Parsec in practice
“Haskell in Depth”	Chapter 8	Production parsing techniques

Papers

“Monadic Parsing in Haskell” by Hutton & Meijer (1998)
- The foundational paper, extremely readable
- Derives the parser type step by step
“Parsec: Direct Style Monadic Parser Combinators For The Real World” by Daan Leijen
- Explains Parsec’s design decisions
- Error handling and performance
“Monadic Parser Combinators” by Graham Hutton & Erik Meijer
- Technical report with more detail than the paper

Online Resources

Megaparsec Tutorial - Modern parser combinators
Write You a Haskell - Chapters on parsing for language implementation
Typeclassopedia - Understanding Functor/Applicative/Monad

Code References

Conclusion

Parser combinators represent functional programming at its best: small, composable pieces that combine into powerful tools. By building this library, you’ve learned:

How to design types that capture computation - The parser type encapsulates sequential, fallible computation
Why Functor/Applicative/Monad matter - Real use cases, not abstract nonsense
The power of composition - Complex parsers from simple primitives
Practical FP techniques - Error handling, state threading, algebraic design

You now have the skills to parse any structured text format you encounter, and the conceptual tools to build your own DSLs and compilers.

Next Steps:

Project 8: Lazy Evaluation Demonstrator - See how Haskell’s evaluation strategy enables infinite data structures
Project 9: Monad Transformers - Stack multiple effects (parsing with logging, or parsing with state)