Project 7: Parser Combinator Library
Project 7: Parser Combinator Library
Building the Crown Jewel of Functional Programming
- Main Programming Language: Haskell
- Alternative Languages: Scala, OCaml, F#
- Coolness Level: Level 4: Hardcore Tech Flex
- Difficulty: Level 3: Advanced
- Knowledge Area: Parsing / Combinators
- Estimated Time: 2-3 weeks
- Prerequisites: Functor, Applicative, Monad, Alternative type classes; Basic recursion; Understanding of grammars
Learning Objectives
After completing this project, you will be able to:
- Design parser types from first principles - Derive the
Parser a = String -> Maybe (a, String)type and understand why each component is necessary - Implement Functor, Applicative, and Monad instances for parsers - Write these instances by hand and explain what each operation means for parsing
- Use the Alternative type class for choice - Implement
<|>for backtracking and understand when parsers should backtrack - Build complex parsers from simple primitives - Combine
satisfy,char,stringinto parsers for JSON, arithmetic, or programming languages - Handle parse errors gracefully - Implement meaningful error messages with position tracking
- Understand the relationship between grammars and combinators - Translate BNF/EBNF directly into combinator code
- Apply monadic parsing to real-world formats - Parse JSON, CSV, configuration files, or simple programming languages
Conceptual Foundation
What Is Parsing?
Parsing is the process of transforming unstructured text into structured data. When you read the string "[1, 2, 3]" and produce a Haskell list [1, 2, 3], you are parsing. When a compiler reads "x + y * z" and produces an abstract syntax tree respecting operator precedence, it is parsing.
The fundamental challenge of parsing is ambiguity and structure recovery. Text is linearâa sequence of characters. But the data it represents often has hierarchical structure: nested lists, nested function calls, nested expressions. The parserâs job is to recover this hidden structure.
Traditional Approaches vs. Combinators
Historically, parsing was done with specialized tools:
- Hand-written recursive descent parsers - Tedious, error-prone, but understandable
- Parser generators (yacc, bison, ANTLR) - Powerful but require learning a separate grammar language
- Regular expressions - Limited to regular languages (no nesting)
Parser combinators offer a different approach: parsers as first-class values. Instead of describing your grammar in a separate DSL, you write it directly in your host language. The grammar is the code.
-- Traditional grammar definition (BNF):
-- expr ::= term (('+' | '-') term)*
-- term ::= factor (('*' | '/') factor)*
-- factor ::= number | '(' expr ')'
-- Parser combinator version (this is actual Haskell code):
expr :: Parser Expr
expr = chainl1 term addOp
term :: Parser Expr
term = chainl1 factor mulOp
factor :: Parser Expr
factor = number <|> parens expr
The combinator version is not a separate specificationâitâs executable code that directly performs the parsing.
The Parser Type: Building Intuition
What should the type of a parser be? Letâs derive it step by step.
Attempt 1: Parser as a function
A parser takes a string and produces something:
type Parser a = String -> a
But this doesnât handle failure. What if the input isnât valid?
Attempt 2: Adding failure with Maybe
type Parser a = String -> Maybe a
Better! Now Nothing represents parse failure. But we have a problem: after parsing part of the input, we need to know whatâs left to continue parsing.
Attempt 3: Returning remaining input
type Parser a = String -> Maybe (a, String)
Now this is useful! A parser takes a string, potentially parses some prefix, and returns both the parsed value and the unconsumed remainder.
For example, parsing an integer from "123abc":
- Input:
"123abc" - Output:
Just (123, "abc")
This is the classic parser combinator type. Letâs make it a proper newtype:
newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }
Understanding the Parser Type Deeply
Letâs examine what each part of String -> Maybe (a, String) contributes:
String -> Maybe (a, String)
------ ----- ----------
| | |
| | +-- (a, String): Result type
| | a: The parsed value
| | String: Remaining input
| |
| +-- Maybe: Handles failure
| Just: Successful parse
| Nothing: Parse failure
|
+-- String: Input to consume
Why a tuple? Because parsers are incremental. After one parser succeeds, the next parser continues from where it left off:
Input: "123abc"
^^^
Parser 1: integer
Output 1: Just (123, "abc")
^^^^
Passed to Parser 2
Input: "abc"
^^^
Parser 2: letters
Output 2: Just ("abc", "")
This is the essence of sequential composition in parsing.
The Primitive Parsers
Every parser combinator library is built from a few fundamental primitives:
1. satisfy - The Foundation
satisfy :: (Char -> Bool) -> Parser Char
satisfy predicate = Parser $ \input ->
case input of
(c:cs) | predicate c -> Just (c, cs)
_ -> Nothing
satisfy consumes one character if it satisfies a predicate. All other character-based parsers are built on top of it:
char :: Char -> Parser Char
char c = satisfy (== c)
digit :: Parser Char
digit = satisfy isDigit
letter :: Parser Char
letter = satisfy isAlpha
anyChar :: Parser Char
anyChar = satisfy (const True)
2. pure (or return) - Success Without Consuming
instance Applicative Parser where
pure x = Parser $ \input -> Just (x, input)
-- ...
pure always succeeds with a given value, consuming no input. This is essential for constructing results:
-- Parse "true" and return the boolean True
trueParser :: Parser Bool
trueParser = string "true" *> pure True
3. empty (or mzero) - Guaranteed Failure
instance Alternative Parser where
empty = Parser $ \_ -> Nothing
-- ...
empty always fails. Useful for conditional failure:
-- Parse a digit, but fail if it's zero
nonZeroDigit :: Parser Char
nonZeroDigit = satisfy isDigit >>= \c ->
if c == '0' then empty else pure c
Functor: Transforming Parse Results
The Functor instance lets you transform what a parser returns:
instance Functor Parser where
fmap f (Parser p) = Parser $ \input ->
case p input of
Nothing -> Nothing
Just (a, rest) -> Just (f a, rest)
Visualized:
Parser a
|
| fmap f
v
Parser b
Input: "123abc"
^^^
Parser (digit): Just ('1', "23abc")
^^^
fmap digitToInt: Just (1, "23abc")
^
Transformed!
Common uses:
-- Parse a digit character, convert to Int
digitAsInt :: Parser Int
digitAsInt = fmap digitToInt digit
-- Parse quoted string, drop the quotes
quotedString :: Parser String
quotedString = fmap (init . tail) (char '"' *> many (satisfy (/= '"')) <* char '"')
-- Or better: char '"' *> many (satisfy (/= '"')) <* char '"'
Applicative: Sequencing Independent Parsers
The Applicative instance sequences parsers and combines their results:
instance Applicative Parser where
pure x = Parser $ \input -> Just (x, input)
(Parser pf) <*> (Parser pa) = Parser $ \input ->
case pf input of
Nothing -> Nothing
Just (f, rest) -> case pa rest of
Nothing -> Nothing
Just (a, rest') -> Just (f a, rest')
The key insight: <*> runs the first parser, then runs the second on the remaining input, and combines the results:
Parser (a -> b) <*> Parser a = Parser b
Input: "ABCdef"
^^^
Parser 1: Parse "ABC", return (\x -> "ABC" ++ x)
Output 1: Just ((\x -> "ABC" ++ x), "def")
^^^
Remaining input
Input for Parser 2: "def"
^^^
Parser 2: Parse "def", return "def"
Output 2: Just ("def", "")
Combined: (\x -> "ABC" ++ x) applied to "def"
Result: Just ("ABCdef", "")
Applicative operators for convenience:
(*>) :: Parser a -> Parser b -> Parser b -- Run both, keep right
(<*) :: Parser a -> Parser b -> Parser a -- Run both, keep left
(<$>) :: (a -> b) -> Parser a -> Parser b -- Same as fmap
Parsing a point (3,4):
data Point = Point Int Int
point :: Parser Point
point = Point
<$> (char '(' *> integer)
<*> (char ',' *> integer <* char ')')
-- Or with ApplicativeDo:
point = do
char '('
x <- integer
char ','
y <- integer
char ')'
pure (Point x y)
Monad: Context-Sensitive Parsing
Sometimes you need the result of one parser to decide what to parse next. This requires Monad:
instance Monad Parser where
return = pure
(Parser pa) >>= f = Parser $ \input ->
case pa input of
Nothing -> Nothing
Just (a, rest) -> runParser (f a) rest
The crucial difference from Applicative: with >>=, the second parser is computed from the first result.
Example: Parsing length-prefixed data
Format: "3:abc" means "3 characters follow: abc"
"5:hello" means "5 characters follow: hello"
lengthPrefixed :: Parser String
lengthPrefixed = do
n <- integer -- Parse the length
char ':'
count n anyChar -- Parse exactly n characters
-- 'count n' depends on the VALUE of n!
This is impossible with just Applicative because count n needs the value n that was parsed.
When to use Monad vs Applicative:
- Use Applicative when the structure of your parser is fixed (most grammars)
- Use Monad when the structure depends on parsed values (length prefixes, version-dependent formats)
Applicative is faster because parsers can be analyzed statically. Prefer it when possible.
Alternative: Choice and Backtracking
The Alternative instance provides choice between parsers:
instance Alternative Parser where
empty = Parser $ \_ -> Nothing
(Parser p1) <|> (Parser p2) = Parser $ \input ->
case p1 input of
Nothing -> p2 input -- p1 failed, try p2 on ORIGINAL input
result -> result -- p1 succeeded, use its result
Critical concept: Backtracking
When p1 fails, p2 gets the original input, not what p1 consumed. This is called backtracking.
Input: "false"
Parser 1: string "true"
Attempt: 'f' != 't', fails immediately
Consumed: nothing
Parser 2: string "false" -- Gets ORIGINAL input
Attempt: matches!
Result: Just ("false", "")
Derived combinators from Alternative:
many :: Parser a -> Parser [a] -- Zero or more
some :: Parser a -> Parser [a] -- One or more
optional :: Parser a -> Parser (Maybe a)
-- Implementations:
many p = some p <|> pure []
some p = (:) <$> p <*> many p
The Problem with Naive Backtracking
Backtracking can be expensive. Consider:
keyword :: Parser String
keyword = string "function" <|> string "for" <|> string "func"
For input "function":
- Try
"function"- matches!
For input "forward":
- Try
"function"- âfâ, âoâ, ârâ match, then âwâ != âcâ, fail after consuming 3 chars - Backtrack to start, try
"for"- matches!
For input "foreach":
- Try
"function"- âfâ, âoâ, ârâ match, fail - Backtrack, try
"for"- matches! - But wait⌠we wanted
"foreach", and"for"consumed only 3 chars
This naive approach has problems:
- Performance: Re-scanning the same prefix multiple times
- Ambiguity:
"for"matches as prefix of"foreach"
Advanced Parsers: The try Combinator
Real parser libraries like Parsec use predictive parsing: by default, once a parser consumes input, it âcommitsâ and wonât backtrack.
-- In Parsec/Megaparsec:
keyword = string "function" <|> string "for"
-- On input "forward":
-- 1. Try "function", consume "f", commit
-- 2. "o" matches, "r" matches, "w" fails
-- 3. Parser FAILS (no backtrack because we consumed input)
The try combinator enables backtracking:
keyword = try (string "function") <|> string "for"
-- Now on input "forward":
-- 1. try (string "function") consumes "for", fails on 'w'
-- 2. try causes BACKTRACK to original position
-- 3. string "for" matches
For this project, implement the simpler always-backtracking version first, then optionally add try for the commit/backtrack distinction.
Error Messages and Position Tracking
A parser that just returns Nothing is frustrating. Real parsers need error messages:
data ParseError = ParseError
{ errorPos :: SourcePos
, errorMessage :: String
, expected :: [String]
}
data SourcePos = SourcePos
{ sourceLine :: Int
, sourceColumn :: Int
}
Enhanced parser type:
newtype Parser a = Parser
{ runParser :: String -> SourcePos -> Either ParseError (a, String, SourcePos)
}
Now parsers track position and accumulate meaningful error messages:
Parse error at line 3, column 15:
unexpected '}'
expecting expression
From Grammar to Combinators: The Translation
Any context-free grammar can be translated to combinators mechanically:
BNF Grammar:
<json> ::= <object> | <array> | <string> | <number> | <bool> | "null"
<object> ::= '{' <members>? '}'
<members> ::= <pair> (',' <pair>)*
<pair> ::= <string> ':' <json>
<array> ::= '[' <elements>? ']'
<elements> ::= <json> (',' <json>)*
Direct Translation:
json :: Parser JSON
json = object <|> array <|> string <|> number <|> bool <|> null_
object :: Parser JSON
object = JObject <$> (char '{' *> optional members <* char '}')
members :: Parser [(String, JSON)]
members = sepBy1 pair (char ',')
pair :: Parser (String, JSON)
pair = (,) <$> stringLiteral <*> (char ':' *> json)
array :: Parser JSON
array = JArray <$> (char '[' *> optional elements <* char ']')
elements :: Parser [JSON]
elements = sepBy1 json (char ',')
The pattern:
|in BNF becomes<|>in combinators- Sequencing in BNF becomes
<*>or*> *(zero or more) becomesmany+(one or more) becomessome?(optional) becomesoptional
Why Parser Combinators Embody Functional Programming
Parser combinators are the crown jewel of FP because they demonstrate every core principle:
- Functions as first-class values: Parsers ARE functions
- Composition: Complex parsers from simple ones via
<*>,<|>,>>= - Higher-order functions:
many,sepBy,chainl1are all HOFs - Algebraic structures: Functor, Applicative, Monad, Alternative
- Referential transparency: Same input always produces same output
- Declarative style: Grammar IS the code
Compare to imperative parsing:
// Imperative
int parseInteger(String input, int[] position) {
int start = position[0];
while (position[0] < input.length() &&
Character.isDigit(input.charAt(position[0]))) {
position[0]++;
}
return Integer.parseInt(input.substring(start, position[0]));
}
-- Functional
integer :: Parser Int
integer = read <$> some digit
The functional version is shorter, clearer, and composable.
The Applicative Parsing Intuition
Hereâs a mental model that helps: Applicative parsing is like filling in a template.
data Person = Person String Int String -- Name, Age, City
person :: Parser Person
person = Person
<$> name -- Fill in slot 1
<*> age -- Fill in slot 2
<*> city -- Fill in slot 3
Reading left to right:
- Start with constructor
Person :: String -> Int -> String -> Person - Apply
<$>to getParser (Int -> String -> Person) - Apply
<*>to getParser (String -> Person) - Apply
<*>to getParser Person
Each <*> âuses upâ one argument to the constructor.
Common Combinator Patterns
Separated lists:
sepBy :: Parser a -> Parser sep -> Parser [a]
sepBy p sep = sepBy1 p sep <|> pure []
sepBy1 :: Parser a -> Parser sep -> Parser [a]
sepBy1 p sep = (:) <$> p <*> many (sep *> p)
Chained expressions (left-associative):
chainl1 :: Parser a -> Parser (a -> a -> a) -> Parser a
chainl1 p op = do
x <- p
rest x
where
rest x = (do
f <- op
y <- p
rest (f x y)) <|> pure x
For expression 1 + 2 + 3:
chainl1 number (char '+' *> pure (+))
1. Parse 1, x = 1
2. Parse +, parse 2, rest (1 + 2) = rest 3
3. Parse +, parse 3, rest (3 + 3) = rest 6
4. No more +, return 6
Between delimiters:
between :: Parser open -> Parser close -> Parser a -> Parser a
between open close p = open *> p <* close
parens :: Parser a -> Parser a
parens = between (char '(') (char ')')
brackets :: Parser a -> Parser a
brackets = between (char '[') (char ']')
Project Specification
Core Requirements
Build a parser combinator library that can parse non-trivial structured text. Your library must include:
1. The Parser Type
- Newtype wrapper around the parsing function
- Support for tracking remaining input
- (Optional) Position tracking for error messages
2. Basic Parsers
satisfy :: (Char -> Bool) -> Parser Charchar :: Char -> Parser Charstring :: String -> Parser StringanyChar :: Parser Chareof :: Parser ()(succeeds only at end of input)
3. Type Class Instances
Functor ParserApplicative ParserMonad ParserAlternative Parser
4. Combinators
many :: Parser a -> Parser [a](zero or more)some :: Parser a -> Parser [a](one or more)optional :: Parser a -> Parser (Maybe a)sepBy :: Parser a -> Parser sep -> Parser [a]sepBy1 :: Parser a -> Parser sep -> Parser [a]between :: Parser open -> Parser close -> Parser a -> Parser achainl1 :: Parser a -> Parser (a -> a -> a) -> Parser achoice :: [Parser a] -> Parser a
5. Lexical Helpers
spaces :: Parser String(whitespace)lexeme :: Parser a -> Parser a(consume trailing whitespace)symbol :: String -> Parser String(string + trailing whitespace)integer :: Parser Int
6. A Non-Trivial Parser
Choose ONE of:
- JSON parser - Parse JSON into a Haskell data type
- Arithmetic expression parser - Respect operator precedence
- Simple programming language - Variables, let bindings, functions
Stretch Goals
- Add position tracking and meaningful error messages
- Implement
tryfor explicit backtracking control - Add
<?> :: Parser a -> String -> Parser afor labeling parsers in errors - Implement
lookAhead :: Parser a -> Parser a(peek without consuming) - Implement
notFollowedBy :: Parser a -> Parser ()(negative lookahead)
Solution Architecture
Module Structure
src/
Parser/
Core.hs -- Parser type, Functor/Applicative/Monad/Alternative
Primitives.hs -- satisfy, char, string, anyChar
Combinators.hs -- many, some, sepBy, chainl1, etc.
Lexer.hs -- spaces, lexeme, symbol, integer
Error.hs -- (Optional) Error types and messages
Examples/
JSON.hs -- JSON parser using your library
Expr.hs -- Arithmetic expression parser
test/
CoreSpec.hs -- Property tests for laws
JSONSpec.hs -- JSON parsing tests
The Parser Type (Two Options)
Simple Version:
newtype Parser a = Parser
{ runParser :: String -> Maybe (a, String)
}
With Errors:
newtype Parser a = Parser
{ runParser :: String -> Either ParseError (a, String)
}
data ParseError = ParseError
{ errorPosition :: Int
, errorMessage :: String
}
Implementation Strategy
- Start with the simple type - Get Functor/Applicative/Monad/Alternative working
- Build primitives -
satisfy,char,string - Derive combinators - Most are built on the instances
- Add lexing - Whitespace handling
- Build your target parser - JSON or expressions
- Optionally enhance errors - Add position tracking
Implementation Guide
Phase 1: Core Parser Type and Instances
Goal: Implement the Parser newtype and its Functor/Applicative/Monad/Alternative instances.
Step 1: Define the type
module Parser.Core where
newtype Parser a = Parser
{ runParser :: String -> Maybe (a, String)
}
Step 2: Implement Functor
Think about what fmap means: transform the result without changing parsing behavior.
instance Functor Parser where
fmap f (Parser p) = Parser $ \input ->
-- Pattern match on result, apply f to the parsed value
-- Hint: use fmap on the outer Maybe, then on the tuple
Step 3: Implement Applicative
pure succeeds without consuming. <*> sequences parsers.
instance Applicative Parser where
pure x = Parser $ \input -> -- Return x, don't consume input
(Parser pf) <*> (Parser pa) = Parser $ \input ->
-- Run pf, get function f and remaining input
-- Run pa on remaining input, get value a
-- Return f a with final remaining input
Step 4: Implement Monad
>>= lets the second parser depend on the first result.
instance Monad Parser where
return = pure
(Parser pa) >>= f = Parser $ \input ->
-- Run pa, get value a and remaining input
-- Apply f to a to get new parser
-- Run new parser on remaining input
Step 5: Implement Alternative
empty always fails. <|> tries second on failure of first.
instance Alternative Parser where
empty = Parser $ \_ -> Nothing
(Parser p1) <|> (Parser p2) = Parser $ \input ->
-- Try p1 on input
-- If Nothing, try p2 on SAME input
-- If Just, return that result
Verification: Write tests for the Functor/Applicative/Monad laws.
Phase 2: Primitive Parsers
Goal: Implement the building blocks all other parsers use.
module Parser.Primitives where
import Parser.Core
-- The fundamental primitive
satisfy :: (Char -> Bool) -> Parser Char
satisfy predicate = Parser $ \input ->
case input of
[] -> Nothing
(c:cs) -> if predicate c then Just (c, cs) else Nothing
-- Derived from satisfy
char :: Char -> Parser Char
char c = satisfy (== c)
digit :: Parser Char
digit = satisfy isDigit
letter :: Parser Char
letter = satisfy isAlpha
anyChar :: Parser Char
anyChar = satisfy (const True)
-- End of input
eof :: Parser ()
eof = Parser $ \input ->
case input of
[] -> Just ((), "")
_ -> Nothing
-- Parse a specific string
string :: String -> Parser String
string [] = pure []
string (c:cs) = (:) <$> char c <*> string cs
-- Or: string s = traverse char s
Phase 3: Combinators
Goal: Build the higher-level parsing tools.
module Parser.Combinators where
import Control.Applicative (Alternative(..))
import Parser.Core
import Parser.Primitives
-- many and some come from Alternative, but let's implement them:
manyP :: Parser a -> Parser [a]
manyP p = someP p <|> pure []
someP :: Parser a -> Parser [a]
someP p = (:) <$> p <*> manyP p
-- Separated by
sepBy :: Parser a -> Parser sep -> Parser [a]
sepBy p sep = sepBy1 p sep <|> pure []
sepBy1 :: Parser a -> Parser sep -> Parser [a]
sepBy1 p sep = (:) <$> p <*> many (sep *> p)
-- Between delimiters
between :: Parser open -> Parser close -> Parser a -> Parser a
between open close p = open *> p <* close
-- Choice from list
choice :: [Parser a] -> Parser a
choice = foldr (<|>) empty
-- Optional with default
option :: a -> Parser a -> Parser a
option x p = p <|> pure x
-- Left-associative chaining (for operators)
chainl1 :: Parser a -> Parser (a -> a -> a) -> Parser a
chainl1 p op = p >>= rest
where
rest x = (do
f <- op
y <- p
rest (f x y)) <|> pure x
-- Right-associative chaining
chainr1 :: Parser a -> Parser (a -> a -> a) -> Parser a
chainr1 p op = do
x <- p
(do
f <- op
y <- chainr1 p op
pure (f x y)) <|> pure x
Phase 4: Lexer Utilities
Goal: Handle whitespace and token-level parsing.
module Parser.Lexer where
import Parser.Core
import Parser.Primitives
import Parser.Combinators
import Data.Char (isSpace, isDigit)
-- Whitespace
spaces :: Parser String
spaces = many (satisfy isSpace)
-- Parser that consumes trailing whitespace
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces
-- Parse a symbol (string + trailing whitespace)
symbol :: String -> Parser String
symbol s = lexeme (string s)
-- Parse an integer
integer :: Parser Int
integer = read <$> some digit
-- Signed integer
signedInteger :: Parser Int
signedInteger = do
sign <- option '+' (char '-' <|> char '+')
n <- integer
pure $ if sign == '-' then -n else n
Phase 5: Build Your Target Parser
Option A: JSON Parser
module Examples.JSON where
data JSON
= JNull
| JBool Bool
| JNumber Double
| JString String
| JArray [JSON]
| JObject [(String, JSON)]
deriving (Show, Eq)
json :: Parser JSON
json = lexeme $ choice
[ JNull <$ symbol "null"
, JBool <$> bool
, JNumber <$> number
, JString <$> stringLiteral
, JArray <$> array
, JObject <$> object
]
bool :: Parser Bool
bool = (True <$ symbol "true") <|> (False <$ symbol "false")
number :: Parser Double
-- Handle integers, decimals, exponents
stringLiteral :: Parser String
-- Handle quotes, escape sequences
array :: Parser [JSON]
array = between (symbol "[") (symbol "]") (sepBy json (symbol ","))
object :: Parser [(String, JSON)]
object = between (symbol "{") (symbol "}") (sepBy pair (symbol ","))
pair :: Parser (String, JSON)
pair = (,) <$> (stringLiteral <* symbol ":") <*> json
Option B: Arithmetic Expressions
module Examples.Expr where
data Expr
= Lit Int
| Add Expr Expr
| Sub Expr Expr
| Mul Expr Expr
| Div Expr Expr
| Neg Expr
deriving (Show, Eq)
expr :: Parser Expr
expr = term `chainl1` addOp
term :: Parser Expr
term = factor `chainl1` mulOp
factor :: Parser Expr
factor = choice
[ Lit <$> lexeme integer
, between (symbol "(") (symbol ")") expr
, Neg <$> (symbol "-" *> factor)
]
addOp :: Parser (Expr -> Expr -> Expr)
addOp = (Add <$ symbol "+") <|> (Sub <$ symbol "-")
mulOp :: Parser (Expr -> Expr -> Expr)
mulOp = (Mul <$ symbol "*") <|> (Div <$ symbol "/")
eval :: Expr -> Int
eval (Lit n) = n
eval (Add e1 e2) = eval e1 + eval e2
eval (Sub e1 e2) = eval e1 - eval e2
eval (Mul e1 e2) = eval e1 * eval e2
eval (Div e1 e2) = eval e1 `div` eval e2
eval (Neg e) = - eval e
Phase 6: Error Enhancement (Optional)
Goal: Add position tracking and meaningful errors.
data ParseError = ParseError
{ errorPos :: SourcePos
, errorUnexpected :: Maybe Char
, errorExpected :: [String]
}
data SourcePos = SourcePos
{ line :: Int
, column :: Int
} deriving (Show, Eq)
newtype Parser a = Parser
{ runParser :: String -> SourcePos -> Either ParseError (a, String, SourcePos)
}
-- Update position after consuming character
updatePos :: SourcePos -> Char -> SourcePos
updatePos (SourcePos l c) '\n' = SourcePos (l + 1) 1
updatePos (SourcePos l c) _ = SourcePos l (c + 1)
-- Label a parser for error messages
(<?>) :: Parser a -> String -> Parser a
p <?> label = -- On failure, include label in "expected" list
Testing Strategy
Unit Tests
-- Primitive tests
testSatisfy :: Spec
testSatisfy = describe "satisfy" $ do
it "succeeds when predicate is true" $
runParser (satisfy (== 'a')) "abc" `shouldBe` Just ('a', "bc")
it "fails when predicate is false" $
runParser (satisfy (== 'a')) "xyz" `shouldBe` Nothing
it "fails on empty input" $
runParser (satisfy (const True)) "" `shouldBe` Nothing
testString :: Spec
testString = describe "string" $ do
it "parses exact match" $
runParser (string "hello") "hello world" `shouldBe` Just ("hello", " world")
it "fails on partial match" $
runParser (string "hello") "help" `shouldBe` Nothing
Property-Based Tests (with QuickCheck)
-- Functor laws
prop_functorIdentity :: String -> Bool
prop_functorIdentity input =
runParser (fmap id anyChar) input == runParser anyChar input
prop_functorComposition :: Fun Char Char -> Fun Char Char -> String -> Bool
prop_functorComposition (Fun _ f) (Fun _ g) input =
runParser (fmap (f . g) anyChar) input ==
runParser (fmap f . fmap g $ anyChar) input
-- Applicative laws
prop_applicativeIdentity :: String -> Bool
prop_applicativeIdentity input =
runParser (pure id <*> anyChar) input == runParser anyChar input
-- Monad laws
prop_monadLeftIdentity :: Char -> Fun Char Char -> String -> Bool
prop_monadLeftIdentity x (Fun _ f) input =
runParser (return x >>= \c -> return (f c)) input ==
runParser (return (f x)) input
-- Alternative laws
prop_alternativeAssociativity :: String -> Property
prop_alternativeAssociativity input =
runParser ((p1 <|> p2) <|> p3) input ===
runParser (p1 <|> (p2 <|> p3)) input
where
p1 = char 'a'
p2 = char 'b'
p3 = char 'c'
Integration Tests (JSON Example)
testJSONParser :: Spec
testJSONParser = describe "JSON parser" $ do
it "parses null" $
runParser json "null" `shouldBe` Just (JNull, "")
it "parses booleans" $ do
runParser json "true" `shouldBe` Just (JBool True, "")
runParser json "false" `shouldBe` Just (JBool False, "")
it "parses integers" $
runParser json "42" `shouldBe` Just (JNumber 42.0, "")
it "parses strings" $
runParser json "\"hello\"" `shouldBe` Just (JString "hello", "")
it "parses arrays" $
runParser json "[1, 2, 3]" `shouldBe`
Just (JArray [JNumber 1, JNumber 2, JNumber 3], "")
it "parses nested structures" $
runParser json "{\"a\": [1, 2], \"b\": null}" `shouldBe`
Just (JObject [("a", JArray [JNumber 1, JNumber 2]), ("b", JNull)], "")
Common Pitfalls
Pitfall 1: Left Recursion
Problem: Parser hangs infinitely.
-- WRONG: Left recursion causes infinite loop
expr :: Parser Expr
expr = (Add <$> expr <*> (char '+' *> term)) <|> term
-- ^^^^
-- Calls itself without consuming input!
Solution: Use chainl1 or rewrite to be right-recursive/iterative.
-- CORRECT: Left-factor or use chainl1
expr = term `chainl1` ((+) <$ char '+')
Pitfall 2: Forgetting Backtracking
Problem: Parser fails when it should try alternatives.
keyword = string "function" <|> string "func"
-- On input "func", this fails!
-- "function" consumes "func", then fails on 't' vs 'i'
-- No backtrack occurs
In the simple always-backtracking model, this works fine. But if you implement commit semantics like Parsec, you need try:
keyword = try (string "function") <|> string "func"
Pitfall 3: Greedy many Causing Failures
Problem: many consumes too much, causing the next parser to fail.
-- WRONG: many anyChar consumes EVERYTHING, including the quote
quotedString = char '"' *> many anyChar <* char '"'
runParser quotedString "\"hello\"" -- Nothing!
Solution: Be more specific about what youâre matching.
-- CORRECT: Stop at quote
quotedString = char '"' *> many (satisfy (/= '"')) <* char '"'
Pitfall 4: Mixing Up *> and <*
Problem: Discarding the wrong result.
-- These are different!
char '(' *> expr <* char ')' -- Returns expr result
char '(' *> expr *> char ')' -- Returns ')' (the last parser's result)
Memory aid:
*>points right: keep the RIGHT result<*points left: keep the LEFT result
Pitfall 5: Not Handling Whitespace
Problem: Parser works for "123" but fails for " 123" or "123 ".
Solution: Decide on a whitespace strategy:
-- Option 1: Consume leading whitespace once, trailing whitespace after each token
json = spaces *> jsonValue
-- Option 2: Every parser consumes its trailing whitespace
integer = read <$> some digit <* spaces
Extensions and Challenges
Extension 1: Implement try for Explicit Backtracking
Add commit/backtrack semantics. By default, consuming input commits to that branch.
try :: Parser a -> Parser a
-- If p fails after consuming input, backtrack to original position
Extension 2: Parse a Complete Programming Language
Extend your expression parser to a mini-language with:
- Variable bindings:
let x = 5 in x + 1 - Conditionals:
if x > 0 then x else -x - Functions:
\x -> x + 1 - Function application:
f 5
Extension 3: Add Megaparsec-style Error Handling
Implement:
- Error message accumulation
- âexpectedâ labels
- Position tracking with file name, line, column
- Pretty-printed error messages
Extension 4: Implement Indentation-Sensitive Parsing
Parse Python-style or Haskell-style indentation:
-- Track current indentation level
type Indent = Int
newtype Parser a = Parser
{ runParser :: String -> Indent -> Maybe (a, String, Indent)
}
block :: Parser a -> Parser [a]
-- Parse multiple items at same or greater indentation
Extension 5: Make It a Real Library
- Add documentation with Haddock
- Publish to Hackage
- Compare performance with Parsec/Megaparsec
- Add benchmarks with
criterion
Real-World Connections
Industry Use Cases
- Compilers and Interpreters: GHC uses parser combinators for Haskell parsing
- Configuration Files: Parse YAML, TOML, INI files
- Domain-Specific Languages: Internal DSLs for business rules, queries
- Data Formats: JSON, XML, CSV, log files
- Protocol Parsing: HTTP headers, email (RFC 5322), URLs
Popular Parser Combinator Libraries
| Library | Language | Notes |
|---|---|---|
| Parsec | Haskell | The original, by Daan Leijen |
| Megaparsec | Haskell | Modern, better errors, more features |
| attoparsec | Haskell | Faster, less features, for protocols |
| FParsec | F# | Port of Parsec |
| nom | Rust | Zero-copy, binary parsing focus |
| pyparsing | Python | Python-style API |
| Scala Parser Combinators | Scala | Standard library |
The Parsec Paper
Read âParsec: Direct Style Monadic Parser Combinators For The Real Worldâ by Daan Leijen. It explains:
- Why commit/backtrack semantics improve error messages
- How to handle left recursion
- Performance optimizations
Interview Questions
After completing this project, you should be able to answer:
- âExplain the type
String -> Maybe (a, String)for a parser. Why each component?â- String: Input to consume
- Maybe: Handles success/failure
- a: The parsed result
- String: Remaining unconsumed input
- âWhy do parser combinators use the Monad type class?â
- Monad allows the next parser to depend on the previous result
- Essential for context-sensitive grammars (length-prefixed data, version-dependent formats)
>>=threads state (remaining input) through the computation
- âWhat is the difference between Applicative and Monad parsing?â
- Applicative: Structure is static, parsers are independent
- Monad: Structure can depend on parsed values
- Applicative is more analyzable (can determine what input looks like before running)
- âHow does
<|>implement backtracking?â- When left parser fails, right parser receives original input
- In commit semantics,
tryis needed to enable backtracking after consumption
- âWhat is left recursion and why is it a problem for recursive descent parsers?â
- Left recursion:
A â A Îą | β - Parser calls itself without consuming input
- Causes infinite loop
- Solution: Left-factor or use iterative combinators like
chainl1
- Left recursion:
- âHow would you parse operator precedence (e.g.,
*before+)?â- Stratify the grammar by precedence level
- Higher precedence operators in inner productions
- Use
chainl1orchainr1for each level
- âWhat is the relationship between context-free grammars and parser combinators?â
- Direct correspondence: each production becomes a parser
|becomes<|>, sequencing becomes<*>or>>=- Combinators can express context-sensitive features too
Self-Assessment Checklist
Check off each item when youâve mastered it:
Fundamentals
- I can explain why parsers return
(a, String)tuples - I understand the difference between
MaybeandEitherfor error handling - I can trace through what happens when chaining parsers with
>>=
Type Classes
- I can implement
FunctorforParserfrom scratch - I can implement
ApplicativeforParserfrom scratch - I can implement
MonadforParserfrom scratch - I can implement
AlternativeforParserfrom scratch - I can explain when to use
<*>vs>>=
Combinators
- I can implement
manyandsomein terms ofAlternative - I can implement
sepByandsepBy1 - I can implement
chainl1for left-associative operators - I understand why
chainl1avoids left recursion
Practical Parsing
- I can translate a BNF grammar to combinator code
- I can parse JSON or arithmetic expressions
- I can handle operator precedence correctly
- I understand whitespace handling strategies
Advanced Topics
- I can explain commit vs backtrack semantics
- I understand when
tryis needed - I can add position tracking for error messages
- I can debug common parser issues (infinite loops, unexpected failures)
Resources
Books
| Book | Chapter | What Youâll Learn |
|---|---|---|
| âProgramming in Haskellâ (Hutton) | Chapter 13 | The definitive introduction to parsing |
| âHaskell Programming from First Principlesâ | Chapters 24-25 | Parser combinators step by step |
| âReal World Haskellâ | Chapter 16 | Parsec in practice |
| âHaskell in Depthâ | Chapter 8 | Production parsing techniques |
Papers
- âMonadic Parsing in Haskellâ by Hutton & Meijer (1998)
- The foundational paper, extremely readable
- Derives the parser type step by step
- âParsec: Direct Style Monadic Parser Combinators For The Real Worldâ by Daan Leijen
- Explains Parsecâs design decisions
- Error handling and performance
- âMonadic Parser Combinatorsâ by Graham Hutton & Erik Meijer
- Technical report with more detail than the paper
Online Resources
- Megaparsec Tutorial - Modern parser combinators
- Write You a Haskell - Chapters on parsing for language implementation
- Typeclassopedia - Understanding Functor/Applicative/Monad
Code References
Conclusion
Parser combinators represent functional programming at its best: small, composable pieces that combine into powerful tools. By building this library, youâve learned:
- How to design types that capture computation - The parser type encapsulates sequential, fallible computation
- Why Functor/Applicative/Monad matter - Real use cases, not abstract nonsense
- The power of composition - Complex parsers from simple primitives
- Practical FP techniques - Error handling, state threading, algebraic design
You now have the skills to parse any structured text format you encounter, and the conceptual tools to build your own DSLs and compilers.
Next Steps:
- Project 8: Lazy Evaluation Demonstrator - See how Haskellâs evaluation strategy enables infinite data structures
- Project 9: Monad Transformers - Stack multiple effects (parsing with logging, or parsing with state)