Project 5: Functional JSON Parser

Project 5: Functional JSON Parser

Build a JSON parser using parser combinatorsโ€”composable building blocks that parse complex grammars.

Quick Reference

Attribute Value
Difficulty Expert
Time Estimate 2-3 Weeks
Language TypeScript
Prerequisites Projects 2-4, Recursion, Basic grammar concepts
Key Topics Parser Combinators, Recursion, Monads, Grammar, Backtracking

1. Learning Objectives

By completing this project, you will:

  1. Understand what parser combinators are and why theyโ€™re powerful
  2. Build primitive parsers (char, string, digit)
  3. Build combinator functions (sequence, choice, many, map)
  4. See how monads enable parser composition
  5. Parse a complete JSON grammar using only combinators
  6. Understand the relationship between grammars and parsers
  7. Handle error messages and recovery

2. Theoretical Foundation

2.1 Core Concepts

What is a Parser?

A parser is a function that takes input and produces structured output:

// Conceptually:
type Parser<A> = (input: string) => ParseResult<A>;

type ParseResult<A> =
  | { success: true; value: A; remaining: string }
  | { success: false; error: string; position: number };

A parser consumes some input and returns:

  • The parsed value
  • The remaining unconsumed input
  • Or an error with position
Input: "123abc"
       โ†“
Parser(digits)
       โ†“
Success: { value: 123, remaining: "abc" }

Input: "abc123"
       โ†“
Parser(digits)
       โ†“
Failure: { error: "Expected digit", position: 0 }

Parser Combinators: Composable Parsing

Instead of writing one big parser, we build small parsers and combine them:

Primitive Parsers:

// Parse a single specific character
const char = (c: string): Parser<string> =>
  (input) => {
    if (input[0] === c) {
      return success(c, input.slice(1));
    }
    return failure(`Expected '${c}'`);
  };

// Parse any single character
const anyChar: Parser<string> =
  (input) => {
    if (input.length > 0) {
      return success(input[0], input.slice(1));
    }
    return failure("Unexpected end of input");
  };

// Parse a specific string
const string = (s: string): Parser<string> =>
  (input) => {
    if (input.startsWith(s)) {
      return success(s, input.slice(s.length));
    }
    return failure(`Expected "${s}"`);
  };

Combinator Functions:

// Sequence: Run parsers in order, collect results
const seq = <A, B>(p1: Parser<A>, p2: Parser<B>): Parser<[A, B]> =>
  (input) => {
    const r1 = p1(input);
    if (!r1.success) return r1;
    const r2 = p2(r1.remaining);
    if (!r2.success) return r2;
    return success([r1.value, r2.value], r2.remaining);
  };

// Choice: Try parsers until one succeeds
const or = <A>(p1: Parser<A>, p2: Parser<A>): Parser<A> =>
  (input) => {
    const r1 = p1(input);
    if (r1.success) return r1;
    return p2(input);  // Backtrack and try p2
  };

// Many: Parse zero or more
const many = <A>(p: Parser<A>): Parser<A[]> =>
  (input) => {
    const results: A[] = [];
    let remaining = input;

    while (true) {
      const result = p(remaining);
      if (!result.success) break;
      results.push(result.value);
      remaining = result.remaining;
    }

    return success(results, remaining);
  };

// Map: Transform parser output
const map = <A, B>(p: Parser<A>, fn: (a: A) => B): Parser<B> =>
  (input) => {
    const result = p(input);
    if (!result.success) return result;
    return success(fn(result.value), result.remaining);
  };

Building Up: From Chars to JSON

// Digit: Parse one digit
const digit = or(char('0'), or(char('1'), /* ... */ char('9')));
// Or more elegantly:
const digit = satisfy(c => c >= '0' && c <= '9');

// Digits: Parse many digits
const digits = many1(digit);  // At least one

// Number: Parse integer
const integer = map(digits, ds => parseInt(ds.join(''), 10));

// JSON Number: Handle negatives, decimals, exponents
const jsonNumber = seq(
  optional(char('-')),
  or(char('0'), seq(digit19, many(digit))),
  optional(seq(char('.'), many1(digit))),
  optional(seq(or(char('e'), char('E')), optional(or(char('+'), char('-'))), many1(digit)))
);

Visualized:

Building JSON Parser:

Primitives:
char('a')     โ†’ parses 'a'
digit         โ†’ parses '0'-'9'
letter        โ†’ parses 'a'-'z'

Combinators:
many(digit)   โ†’ parses "123" โ†’ ["1","2","3"]
seq(a,b)      โ†’ parses "ab" โ†’ ["a","b"]
or(a,b)       โ†’ parses "a" or "b"

JSON Values:
jsonNull      = string("null")
jsonBool      = or(string("true"), string("false"))
jsonNumber    = seq(optional('-'), digits, optional(decimal))
jsonString    = between(char('"'), many(stringChar), char('"'))
jsonArray     = between(char('['), sepBy(jsonValue, char(',')), char(']'))
jsonObject    = between(char('{'), sepBy(keyValue, char(',')), char('}'))
jsonValue     = or(jsonNull, jsonBool, jsonNumber, jsonString, jsonArray, jsonObject)

The Monadic Structure

Parsers form a monad because:

  1. We can put a value in a parser (pure / of)
  2. We can chain parsers where the second depends on the first (flatMap / bind)
// pure: A parser that always succeeds with value
const pure = <A>(value: A): Parser<A> =>
  (input) => success(value, input);

// flatMap: Chain where second parser depends on first result
const flatMap = <A, B>(
  p: Parser<A>,
  fn: (a: A) => Parser<B>
): Parser<B> =>
  (input) => {
    const r1 = p(input);
    if (!r1.success) return r1;
    return fn(r1.value)(r1.remaining);
  };

Why this matters:

// Parse a number, then parse that many 'a's
const parseNAs = flatMap(integer, n =>
  times(n, char('a'))
);

parseNAs("3aaa");  // Success: ["a", "a", "a"]
parseNAs("2aaa");  // Success: ["a", "a"], remaining: "a"

The second parser (how many aโ€™s to parse) depends on the first result (the number).

JSON Grammar

The JSON grammar weโ€™ll parse:

value   = object | array | string | number | "true" | "false" | "null"

object  = "{" "}" | "{" members "}"
members = member | member "," members
member  = string ":" value

array   = "[" "]" | "[" elements "]"
elements = value | value "," elements

string  = '"' characters '"'
characters = "" | character characters
character = any-Unicode-except-" | '\"' | '\\' | '\/' | '\b' | '\f' | '\n' | '\r' | '\t' | '\uHHHH'

number  = integer fraction? exponent?
integer = digit | onenine digits | '-' digit | '-' onenine digits
fraction = '.' digits
exponent = ('e' | 'E') ('+' | '-')? digits

2.2 Why This Matters

In Real Applications:

  • DSLs: Parse domain-specific languages
  • Configuration: Parse config file formats
  • Protocols: Parse wire protocols
  • Compilers: Lexing and parsing stages

In Industry:

  • Parsec (Haskell): The original parser combinator library
  • Nom (Rust): High-performance parser combinators
  • Chevrotain (JS): Parser toolkit for JavaScript
  • ANTLR alternative: When you need lightweight parsing

2.3 Historical Context

  • 1975: Parser combinators described in theoretical papers
  • 1996: Huttonโ€™s โ€œHigher-Order Functions for Parsingโ€
  • 2001: Parsec library for Haskell - made combinators practical
  • 2010s: Parser combinators in many languages (Scala, Python, JS)
  • Today: Used in production for DSLs, config parsing, data formats

2.4 Common Misconceptions

โ€œParser combinators are slowโ€

  • Reality: With proper optimization (lazy evaluation, memoization), theyโ€™re fast
  • Trade-off: Development speed vs runtime performance

โ€œYou need a formal grammar firstโ€

  • Reality: You can discover the grammar as you build combinators
  • Combinators mirror BNF grammar naturally

โ€œJSON is too complex for this projectโ€

  • Reality: JSON is perfect - complex enough to learn, simple enough to complete

3. Project Specification

3.1 What You Will Build

A complete JSON parser built entirely from parser combinators:

  • Primitive parsers (char, string, satisfy)
  • Combinators (seq, or, many, map, flatMap)
  • Higher-level combinators (between, sepBy, optional)
  • Full JSON value parser
  • Pretty error messages with line/column

3.2 Functional Requirements

  1. Primitive Parsers:
    • char(c): Parse specific character
    • string(s): Parse specific string
    • satisfy(predicate): Parse char matching predicate
    • anyChar: Parse any character
    • eof: Match end of input
  2. Core Combinators:
    • seq(p1, p2): Sequence two parsers
    • or(p1, p2): Choice between parsers
    • many(p): Zero or more
    • many1(p): One or more
    • optional(p): Zero or one
    • map(p, fn): Transform result
    • flatMap(p, fn): Monadic chain
  3. Higher-Level Combinators:
    • between(open, content, close): Parse between delimiters
    • sepBy(p, sep): Parse separated list
    • sepBy1(p, sep): At least one
    • skip(p): Parse but discard result
    • lazy(thunk): For recursive grammars
  4. JSON Parsers:
    • jsonNull: Parse โ€œnullโ€
    • jsonBool: Parse โ€œtrueโ€ or โ€œfalseโ€
    • jsonNumber: Parse numbers (with decimals, exponents)
    • jsonString: Parse strings (with escape sequences)
    • jsonArray: Parse arrays
    • jsonObject: Parse objects
    • jsonValue: Parse any JSON value
  5. Error Handling:
    • Track line and column numbers
    • Meaningful error messages
    • Show context around error

3.3 Non-Functional Requirements

  • Type Safety: Full generic types for all combinators
  • Immutability: Parsers never mutate state
  • Composability: Any parser can be combined with any combinator
  • Performance: Reasonable for typical JSON (< 1MB)

3.4 Example Usage / Output

// Using primitive parsers
char('a')("abc");  // Success: { value: 'a', remaining: 'bc' }
char('a')("bcd");  // Failure: { error: "Expected 'a'", position: 0 }

// Using combinators
const ab = seq(char('a'), char('b'));
ab("abc");  // Success: { value: ['a', 'b'], remaining: 'c' }

const digit = satisfy(c => c >= '0' && c <= '9');
const number = map(many1(digit), ds => parseInt(ds.join(''), 10));
number("123abc");  // Success: { value: 123, remaining: 'abc' }

// JSON parsing
const result = jsonValue('{"name": "Alice", "age": 30, "active": true}');
// Success: {
//   value: {
//     name: "Alice",
//     age: 30,
//     active: true
//   },
//   remaining: ""
// }

// Error handling
const badResult = jsonValue('{"name": "unterminated');
// Failure: {
//   error: "Expected '\"' at line 1, column 23",
//   position: 22,
//   context: '..."name": "unterminated'
//                            ^
// }

4. Solution Architecture

4.1 High-Level Design

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚                    Parser Combinator Library                โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚                                                            โ”‚
โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”  โ”‚
โ”‚  โ”‚                  Primitives Layer                     โ”‚  โ”‚
โ”‚  โ”‚  char, string, satisfy, anyChar, eof                  โ”‚  โ”‚
โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜  โ”‚
โ”‚                          โ”‚                                  โ”‚
โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”  โ”‚
โ”‚  โ”‚                 Core Combinators                      โ”‚  โ”‚
โ”‚  โ”‚  seq, or, many, many1, optional, map, flatMap         โ”‚  โ”‚
โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜  โ”‚
โ”‚                          โ”‚                                  โ”‚
โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”  โ”‚
โ”‚  โ”‚              Higher-Level Combinators                 โ”‚  โ”‚
โ”‚  โ”‚  between, sepBy, skip, lazy, token                    โ”‚  โ”‚
โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜  โ”‚
โ”‚                          โ”‚                                  โ”‚
โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”  โ”‚
โ”‚  โ”‚                   JSON Parsers                        โ”‚  โ”‚
โ”‚  โ”‚  jsonNull, jsonBool, jsonNumber, jsonString           โ”‚  โ”‚
โ”‚  โ”‚  jsonArray, jsonObject, jsonValue                     โ”‚  โ”‚
โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜  โ”‚
โ”‚                                                            โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

4.2 Key Components

Component Responsibility Key Decisions
Parser<A> Core parser type Function taking ParserState
ParserState Track position, input Include line/column
ParseResult<A> Success or failure Either-like structure
Primitives Match characters Foundation for everything
Combinators Compose parsers Enable complex grammars
JSON parsers Parse JSON specifically Use mutual recursion

4.3 Data Structures

// Parser state tracks position
interface ParserState {
  readonly input: string;
  readonly position: number;
  readonly line: number;
  readonly column: number;
}

// Parse result
type ParseResult<A> = ParseSuccess<A> | ParseFailure;

interface ParseSuccess<A> {
  readonly _tag: 'Success';
  readonly value: A;
  readonly state: ParserState;
}

interface ParseFailure {
  readonly _tag: 'Failure';
  readonly error: string;
  readonly state: ParserState;
  readonly expected: string[];
}

// Parser is a function
type Parser<A> = (state: ParserState) => ParseResult<A>;

// JSON value type
type JsonValue =
  | null
  | boolean
  | number
  | string
  | JsonValue[]
  | { [key: string]: JsonValue };

4.4 Algorithm Overview

Character Parser:

const char = (expected: string): Parser<string> =>
  (state) => {
    if (state.position >= state.input.length) {
      return failure(state, `Unexpected end of input, expected '${expected}'`);
    }

    const actual = state.input[state.position];
    if (actual === expected) {
      return success(actual, advance(state, 1));
    }

    return failure(state, `Expected '${expected}', got '${actual}'`);
  };

Sequence Combinator:

const seq = <A, B>(p1: Parser<A>, p2: Parser<B>): Parser<[A, B]> =>
  (state) => {
    const r1 = p1(state);
    if (r1._tag === 'Failure') return r1;

    const r2 = p2(r1.state);
    if (r2._tag === 'Failure') return r2;

    return success([r1.value, r2.value], r2.state);
  };

Lazy for Recursion:

// jsonValue references jsonArray, which references jsonValue
// Use lazy to break the cycle

const lazy = <A>(thunk: () => Parser<A>): Parser<A> =>
  (state) => thunk()(state);

const jsonArray: Parser<JsonValue[]> =
  between(
    char('['),
    sepBy(lazy(() => jsonValue), char(',')),
    char(']')
  );

Complexity:

  • Single character: O(1)
  • Sequence: O(n) where n = total parsed characters
  • Choice with backtracking: O(k ร— n) where k = choices
  • Full JSON: O(n) for typical JSON

5. Implementation Guide

5.1 Development Environment Setup

mkdir json-parser && cd json-parser
npm init -y
npm install --save-dev typescript ts-node jest @types/jest ts-jest
npx tsc --init
mkdir src tests examples

5.2 Project Structure

json-parser/
โ”œโ”€โ”€ src/
โ”‚   โ”œโ”€โ”€ core/
โ”‚   โ”‚   โ”œโ”€โ”€ types.ts         # Parser, ParserState, ParseResult
โ”‚   โ”‚   โ”œโ”€โ”€ state.ts         # State manipulation functions
โ”‚   โ”‚   โ””โ”€โ”€ result.ts        # Success/failure constructors
โ”‚   โ”œโ”€โ”€ primitives/
โ”‚   โ”‚   โ”œโ”€โ”€ char.ts          # char, anyChar, satisfy
โ”‚   โ”‚   โ”œโ”€โ”€ string.ts        # string, regex
โ”‚   โ”‚   โ””โ”€โ”€ index.ts
โ”‚   โ”œโ”€โ”€ combinators/
โ”‚   โ”‚   โ”œโ”€โ”€ sequence.ts      # seq, seqN
โ”‚   โ”‚   โ”œโ”€โ”€ choice.ts        # or, choice
โ”‚   โ”‚   โ”œโ”€โ”€ repetition.ts    # many, many1, optional
โ”‚   โ”‚   โ”œโ”€โ”€ transform.ts     # map, flatMap
โ”‚   โ”‚   โ”œโ”€โ”€ utility.ts       # between, sepBy, skip, lazy
โ”‚   โ”‚   โ””โ”€โ”€ index.ts
โ”‚   โ”œโ”€โ”€ json/
โ”‚   โ”‚   โ”œโ”€โ”€ primitives.ts    # jsonNull, jsonBool, jsonNumber, jsonString
โ”‚   โ”‚   โ”œโ”€โ”€ compound.ts      # jsonArray, jsonObject
โ”‚   โ”‚   โ”œโ”€โ”€ value.ts         # jsonValue (main entry)
โ”‚   โ”‚   โ””โ”€โ”€ index.ts
โ”‚   โ””โ”€โ”€ index.ts             # Public API
โ”œโ”€โ”€ tests/
โ”‚   โ”œโ”€โ”€ primitives.test.ts
โ”‚   โ”œโ”€โ”€ combinators.test.ts
โ”‚   โ”œโ”€โ”€ json.test.ts
โ”‚   โ””โ”€โ”€ fixtures/            # Test JSON files
โ””โ”€โ”€ package.json

5.3 Implementation Phases

Phase 1: Core Types & Primitives (1-2 days)

Goals:

  • Define Parser, ParserState, ParseResult types
  • Implement char, string, satisfy, anyChar, eof
  • Handle position tracking

Tasks:

  1. Create ParserState with input, position, line, column
  2. Create ParseResult as discriminated union
  3. Create helper functions: success(), failure(), advance()
  4. Implement char(c) - parse single character
  5. Implement satisfy(predicate) - parse if predicate true
  6. Implement string(s) - parse exact string
  7. Implement anyChar and eof

Checkpoint:

const p = char('a');
expect(p(createState("abc"))).toEqual({
  _tag: 'Success',
  value: 'a',
  state: { input: "abc", position: 1, line: 1, column: 2 }
});

Phase 2: Core Combinators (2-3 days)

Goals:

  • Implement sequence and choice
  • Implement repetition (many, many1, optional)
  • Implement map and flatMap

Tasks:

  1. Implement seq(p1, p2) - sequence two parsers
  2. Implement or(p1, p2) - try first, backtrack and try second
  3. Implement many(p) - zero or more
  4. Implement many1(p) - one or more
  5. Implement optional(p) - zero or one (returns Maybe)
  6. Implement map(p, fn) - transform result
  7. Implement flatMap(p, fn) - monadic chain

Checkpoint:

const digit = satisfy(c => c >= '0' && c <= '9');
const digits = many1(digit);
const number = map(digits, ds => parseInt(ds.join(''), 10));

expect(run(number, "123abc")).toEqual({ value: 123, remaining: "abc" });

Phase 3: Higher-Level Combinators (1-2 days)

Goals:

  • Build convenience combinators
  • Handle whitespace
  • Support recursive parsers

Tasks:

  1. Implement between(open, content, close)
  2. Implement sepBy(p, separator) and sepBy1
  3. Implement skip(p) - parse and discard
  4. Implement lazy(() => parser) - for recursion
  5. Implement token(p) - skip whitespace around parser
  6. Implement lexeme(s) - string token with whitespace

Checkpoint:

const list = between(char('['), sepBy(number, char(',')), char(']'));
expect(run(list, "[1,2,3]")).toEqual({ value: [1, 2, 3], remaining: "" });

Phase 4: JSON Parser (3-4 days)

Goals:

  • Implement all JSON value types
  • Handle string escapes
  • Handle recursive structures

Tasks:

  1. Implement jsonNull - parse โ€œnullโ€
  2. Implement jsonBool - parse โ€œtrueโ€ or โ€œfalseโ€
  3. Implement jsonNumber - full number spec with decimals, exponents
  4. Implement jsonString - with escape sequences (\n, \t, \uXXXX)
  5. Implement jsonArray - recursive with jsonValue
  6. Implement jsonObject - with key-value pairs
  7. Combine all into jsonValue
  8. Add whitespace handling throughout

Checkpoint:

const json = `{
  "name": "Alice",
  "age": 30,
  "friends": ["Bob", "Charlie"],
  "address": null,
  "active": true
}`;

expect(run(jsonValue, json)).toEqual({
  value: {
    name: "Alice",
    age: 30,
    friends: ["Bob", "Charlie"],
    address: null,
    active: true
  },
  remaining: ""
});

Phase 5: Error Handling & Polish (2-3 days)

Goals:

  • Improve error messages
  • Add context to errors
  • Test edge cases

Tasks:

  1. Track โ€œexpectedโ€ alternatives in errors
  2. Add label(p, name) for custom error messages
  3. Show context around error position
  4. Handle unterminated strings, arrays, objects
  5. Test with malformed JSON
  6. Benchmark with large JSON files

Checkpoint:

const result = run(jsonValue, '{"key": "unterminated}');
// Error: Expected '"' at line 1, column 23
// Context: ..."key": "unterminated
//                              ^ Expected closing quote

5.4 Key Implementation Decisions

Decision Options Recommendation Rationale
Result type Tuple vs Object Object Clearer, named fields
Backtracking Always vs Explicit Always for or Simpler for beginners
Error collection First vs All First with expected list Balance info vs complexity
Whitespace Implicit vs Explicit Explicit with token helper More control

6. Testing Strategy

6.1 Test Categories

Category Purpose Examples
Primitives Basic character parsing char, string, satisfy
Combinators Composition works seq, or, many
JSON Types Each value type null, number, string
Edge Cases Boundary conditions Empty input, escapes
Errors Good error messages Position, context

6.2 Critical Test Cases

  1. String Escapes:
    test('parses escape sequences', () => {
      expect(run(jsonString, '"hello\\nworld"')).toEqual({
     value: "hello\nworld",
     remaining: ""
      });
      expect(run(jsonString, '"unicode: \\u0041"')).toEqual({
     value: "unicode: A",
     remaining: ""
      });
    });
    
  2. Nested Structures:
    test('parses deeply nested arrays', () => {
      expect(run(jsonValue, '[[[1]]]')).toEqual({
     value: [[[1]]],
     remaining: ""
      });
    });
    
  3. Number Edge Cases:
    test('parses various number formats', () => {
      expect(run(jsonNumber, '0')).toMatchObject({ value: 0 });
      expect(run(jsonNumber, '-42')).toMatchObject({ value: -42 });
      expect(run(jsonNumber, '3.14')).toMatchObject({ value: 3.14 });
      expect(run(jsonNumber, '1e10')).toMatchObject({ value: 1e10 });
      expect(run(jsonNumber, '-1.5e-3')).toMatchObject({ value: -0.0015 });
    });
    
  4. Error Position:
    test('reports correct error position', () => {
      const result = run(jsonValue, '{"a": }');
      expect(result._tag).toBe('Failure');
      expect(result.state.column).toBe(7);
    });
    

6.3 Test Data

// Valid JSON samples
const validSamples = [
  'null',
  'true',
  'false',
  '42',
  '-3.14',
  '"hello"',
  '[]',
  '{}',
  '[1, 2, 3]',
  '{"key": "value"}',
  // Complex nested
  `{
    "users": [
      {"name": "Alice", "age": 30},
      {"name": "Bob", "age": 25}
    ],
    "metadata": null
  }`
];

// Invalid JSON samples (with expected error)
const invalidSamples = [
  { input: '{', error: 'Unexpected end' },
  { input: '{"key": }', error: 'Expected value' },
  { input: '[1, 2,]', error: 'Trailing comma' },
  { input: '"unterminated', error: 'Expected "' },
];

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall Symptom Solution
Left recursion Stack overflow Use iteration or lazy
Forgetting to advance state Infinite loop Always return new state
Backtracking too far Wrong parse Use try combinator
No lazy for recursion TypeScript error Wrap recursive refs in lazy
Whitespace handling Parses fail Use token wrapper

7.2 Debugging Strategies

// Debug combinator: log parse attempts
const debug = <A>(name: string, p: Parser<A>): Parser<A> =>
  (state) => {
    console.log(`Trying ${name} at position ${state.position}: "${state.input.slice(state.position, state.position + 10)}..."`);
    const result = p(state);
    console.log(`${name}: ${result._tag}`);
    return result;
  };

// Usage
const jsonValue = choice(
  debug('null', jsonNull),
  debug('bool', jsonBool),
  debug('number', jsonNumber),
  debug('string', jsonString),
  debug('array', jsonArray),
  debug('object', jsonObject)
);

7.3 Performance Traps

  • No memoization: Consider caching for ambiguous grammars
  • Excessive backtracking: Order choices from most to least likely
  • Concatenating strings: Use array and join at end

8. Extensions & Challenges

8.1 Beginner Extensions

  • Comments: Allow // and /* */ comments in JSON
  • Trailing Commas: Handle optional trailing commas
  • Pretty Errors: Show line context with arrow

8.2 Intermediate Extensions

  • JSONC (JSON with Comments): Full spec support
  • JSON5: Extended JSON format
  • Source Maps: Track original positions for each value

8.3 Advanced Extensions

  • Streaming Parser: Parse large files incrementally
  • Parser Generator: Generate parsers from grammar DSL
  • Expression Language: Extend to parse 1 + 2 * 3

9. Real-World Connections

9.1 Industry Applications

  • Configuration: Parse custom config formats
  • Query Languages: SQL-like query parsing
  • Templates: Mustache/Handlebars parsing
  • Protocols: Parse binary/text protocols
  • Parsimmon: https://github.com/jneen/parsimmon - JS parser combinators
  • ts-parsec: https://github.com/nicktindall/ts-parsec - TypeScript combinators
  • Chevrotain: https://github.com/Chevrotain/chevrotain - Parser toolkit

9.3 Interview Relevance

  • โ€œImplement a simple expression parserโ€: Direct application
  • โ€œHow would you parse a custom format?โ€: Show combinator approach
  • โ€œExplain recursive descent parsingโ€: Conceptual overlap

10. Resources

10.1 Essential Reading

  • โ€œMonadic Parser Combinatorsโ€ by Hutton & Meijer - The classic paper
  • โ€œFunctional Programming in Scalaโ€ Ch. 9 - Parser Combinators
  • โ€œProgramming Language Pragmaticsโ€ by Michael Scott - Parsing fundamentals

10.2 Video Resources

  • โ€œParser Combinators From Scratchโ€ - Computerphile (YouTube)
  • โ€œWrite You A Parserโ€ - Tsoding (YouTube)

10.3 Tools & Documentation

  • JSON Spec: https://www.json.org/json-en.html - Official grammar
  • Parsimmon docs: https://github.com/jneen/parsimmon/blob/master/API.md

11. Self-Assessment Checklist

Before considering this project complete, verify:

Understanding

  • I can explain what a parser combinator is
  • I understand how parsers form a monad
  • I can explain the difference between seq and flatMap
  • I understand why lazy is needed for recursion

Implementation

  • All primitives work correctly
  • All combinators compose correctly
  • Full JSON grammar is parsed
  • String escapes handled (\n, \t, \uXXXX)
  • Numbers with decimals and exponents work
  • Error messages show position

Growth

  • I could extend this to parse other formats
  • I understand performance implications
  • I can compare to regex-based parsing

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Primitives: char, string, satisfy
  • Combinators: seq, or, many, map
  • JSON: null, bool, number, string (basic)
  • Basic tests passing

Full Completion:

  • All combinators including between, sepBy, lazy
  • Full JSON spec including escapes and exponents
  • Nested objects and arrays work
  • Meaningful error messages with position
  • Comprehensive test suite

Excellence (Going Above & Beyond):

  • Streaming parser for large files
  • Support for JSON5 or JSONC
  • Performance benchmarks
  • Published as npm package

This guide was generated from FUNCTIONAL_PROGRAMMING_TYPESCRIPT_LEARNING_PROJECTS.md. For the complete learning path, see the parent directory.