Project 4: Create a Parser Combinator Library

Project 4: Create a Parser Combinator Library

Project Overview

Attribute Value
Difficulty Advanced
Time Estimate 2 weeks
Main Language C
Alternatives Rust, Haskell, OCaml, TypeScript
Prerequisites Higher-order functions, recursion, comfortable with abstraction
Key FP Concepts Higher-order functions, monads, function composition, domain-specific languages

Learning Objectives

By completing this project, you will:

  1. Master higher-order functions - Functions that take and return functions
  2. Understand monads practically - See monads as “chainable operations with context”
  3. Experience composition at scale - Build complex parsers from simple ones
  4. Build a domain-specific language - Create an embedded DSL for parsing
  5. Understand applicative functors - Combine independent computations
  6. See why FP excels at parsing - Grammar rules map directly to code

The Core Question

“How can you build a complex parser by combining tiny, simple parsers? What does it mean that ‘parsers are monads’?”

Parser combinators demonstrate FP’s core insight: composition scales. Instead of writing one giant parsing function, you write many tiny parsers and combine them with operators like “sequence”, “choice”, and “many”.

The magic: your code looks like the grammar it parses.

-- Grammar: email = word '@' word '.' word
email = word <*> char '@' <*> word <*> char '.' <*> word

-- Grammar: json = null | bool | number | string | array | object
json = null_ <|> bool_ <|> number_ <|> string_ <|> array_ <|> object_

Deep Theoretical Foundation

1. What Is a Parser?

A parser is a function that takes input (usually a string) and returns either:

  • Success: The parsed value plus the remaining unparsed input
  • Failure: An error message
// Parser type signature
typedef struct {
    bool success;
    void* result;
    const char* remaining;
    const char* error;
} ParseResult;

typedef ParseResult (*Parser)(const char* input);

In a more functional language:

type Parser a = String -> Either ParseError (a, String)
--              input  -> failure     or (result, remaining)

2. Primitive Parsers

Start with the simplest possible parsers:

// Parse a specific character
ParseResult char_parser(char c, const char* input) {
    if (*input == c) {
        return (ParseResult){ .success = true, .result = ..., .remaining = input + 1 };
    }
    return (ParseResult){ .success = false, .error = "Expected character" };
}

// Parse any character
ParseResult any_char(const char* input) {
    if (*input != '\0') {
        return (ParseResult){ .success = true, .result = *input, .remaining = input + 1 };
    }
    return (ParseResult){ .success = false, .error = "Unexpected end of input" };
}

// Parse end of input
ParseResult end_of_input(const char* input) {
    if (*input == '\0') {
        return (ParseResult){ .success = true, .remaining = input };
    }
    return (ParseResult){ .success = false, .error = "Expected end of input" };
}

3. The First Combinator: Sequence

Sequence: Run one parser, then another on the remaining input.

Parser sequence(Parser p1, Parser p2) {
    return lambda(input) {
        ParseResult r1 = p1(input);
        if (!r1.success) return r1;

        ParseResult r2 = p2(r1.remaining);
        if (!r2.success) return r2;

        return (ParseResult){
            .success = true,
            .result = combine(r1.result, r2.result),
            .remaining = r2.remaining
        };
    };
}

Now you can parse “ab” as sequence(char('a'), char('b')).

4. The Second Combinator: Choice

Choice: Try one parser; if it fails, try another.

Parser choice(Parser p1, Parser p2) {
    return lambda(input) {
        ParseResult r1 = p1(input);
        if (r1.success) return r1;

        return p2(input);  // Try second parser on original input
    };
}

Now you can parse “a” or “b” as choice(char('a'), char('b')).

5. The Third Combinator: Many

Many: Apply a parser zero or more times.

Parser many(Parser p) {
    return lambda(input) {
        List results = empty_list();
        const char* remaining = input;

        while (true) {
            ParseResult r = p(remaining);
            if (!r.success) break;
            list_append(results, r.result);
            remaining = r.remaining;
        }

        return (ParseResult){
            .success = true,
            .result = results,
            .remaining = remaining
        };
    };
}

Now you can parse “aaaa” as many(char('a')).

6. Map: Transforming Results

Map: Apply a function to a parser’s result.

Parser map(Parser p, Transform f) {
    return lambda(input) {
        ParseResult r = p(input);
        if (!r.success) return r;

        return (ParseResult){
            .success = true,
            .result = f(r.result),
            .remaining = r.remaining
        };
    };
}

Now you can parse a digit and convert to int:

Parser digit_as_int = map(digit, char_to_int);

7. Understanding Monads Through Parsers

A monad is a design pattern for chaining operations that have some “context” (like failure). Parsers are monads because:

  1. Wrap a value: pure(x) creates a parser that succeeds with x
  2. Chain operations: bind(p, f) runs parser p, then applies function f to the result
// pure: Create a parser that always succeeds with a value
Parser pure(void* value) {
    return lambda(input) {
        return (ParseResult){
            .success = true,
            .result = value,
            .remaining = input  // Consumes nothing
        };
    };
}

// bind (flatMap, >>=): Chain parsers, threading state through
Parser bind(Parser p, ParserFunc f) {
    return lambda(input) {
        ParseResult r = p(input);
        if (!r.success) return r;

        Parser next = f(r.result);  // f takes result, returns new parser
        return next(r.remaining);
    };
}

Why this matters: bind automatically handles:

  • Threading the remaining input through
  • Propagating failures

You can write:

// Parse "Name: Alice"
bind(string("Name: "), lambda(void* _) {
    return bind(word, lambda(char* name) {
        return pure(make_person(name));
    });
});

Instead of manually checking each parse result.

8. Applicative Functors: Combining Independent Parsers

Applicative is for combining parsers that don’t depend on each other’s results:

// <*>: Apply parsed function to parsed value
Parser apply(Parser pf, Parser pa) {
    return lambda(input) {
        ParseResult rf = pf(input);
        if (!rf.success) return rf;

        ParseResult ra = pa(rf.remaining);
        if (!ra.success) return ra;

        Function f = rf.result;
        return (ParseResult){
            .success = true,
            .result = f(ra.result),
            .remaining = ra.remaining
        };
    };
}

This enables beautiful syntax:

-- Parse "Name: Alice, Age: 30" into Person
person = Person <$> (string "Name: " *> word)
                <*> (string ", Age: " *> number)

9. Recursive Parsers

For recursive grammars (like JSON arrays containing JSON), use forward references:

// JSON grammar:
// json = null | bool | number | string | array | object
// array = '[' (json (',' json)*)? ']'
// object = '{' (pair (',' pair)*)? '}'

Parser json;  // Forward declaration

Parser array = sequence(
    char('['),
    sequence(
        optional(sep_by(json, char(','))),  // json used recursively
        char(']')
    )
);

json = choice(null_, choice(bool_, choice(number_, ...)));  // Define json

Project Specification

What You’re Building

A library where you combine small parsers into complex ones:

// Define parsers
Parser digit = satisfy(isdigit);
Parser digits = many1(digit);
Parser number = map(digits, parse_int);
Parser identifier = sequence(letter, many(alphanumeric));

// Build complex parsers
Parser email = sequence3(
    identifier,
    char('@'),
    sequence(identifier, char('.'), identifier)
);

// Use them
ParseResult r = run_parser(email, "alice@example.com");
// r.result = EmailAddress("alice", "example", "com")

Core Combinators to Implement

Combinator Description Example
pure(x) Always succeeds with x pure(42) → always returns 42
char(c) Parse specific character char('a')
satisfy(pred) Parse char satisfying predicate satisfy(isdigit)
string(s) Parse exact string string("hello")
seq(p1, p2) Parse p1 then p2 seq(char('a'), char('b'))
choice(p1, p2) Try p1, then p2 choice(letter, digit)
many(p) Zero or more many(digit)
many1(p) One or more many1(digit)
optional(p) Zero or one optional(char('-'))
map(p, f) Transform result map(digits, atoi)
bind(p, f) Sequence with dependency monadic chaining
sep_by(p, sep) List with separator sep_by(number, char(','))
between(l, p, r) Surrounded by delimiters between(char('('), expr, char(')'))

Target Parsers to Build

  1. Integer parser: Parse 123, -456, 0
  2. Float parser: Parse 3.14, -0.5, 1.0e-10
  3. String parser: Parse "hello" with escapes
  4. Identifier parser: Parse foo, bar_123, _private
  5. JSON parser: Full JSON including nested structures
  6. Expression parser: 1 + 2 * 3 with precedence

Solution Architecture

Data Types (C)

// Parse result
typedef struct ParseResult {
    bool success;
    void* value;          // The parsed value
    const char* remaining; // Unparsed input
    const char* error;     // Error message if failed
    size_t position;       // Position in original input
} ParseResult;

// Parser is a function pointer
typedef ParseResult (*ParserFn)(const char* input);

// Parser wrapper (allows storing context)
typedef struct Parser {
    ParserFn fn;
    void* context;
} Parser;

// For more complex results
typedef struct {
    size_t count;
    void** items;
} List;

Core Functions

// Create parsers
Parser char_p(char c);
Parser string_p(const char* s);
Parser satisfy(int (*predicate)(int));

// Combinators
Parser seq(Parser p1, Parser p2);
Parser choice(Parser p1, Parser p2);
Parser many(Parser p);
Parser many1(Parser p);
Parser optional(Parser p);

// Transformers
Parser map_p(Parser p, void* (*transform)(void*));
Parser bind_p(Parser p, Parser (*f)(void*));

// Utilities
Parser sep_by(Parser p, Parser sep);
Parser between(Parser left, Parser p, Parser right);
Parser chainl1(Parser p, Parser op);  // Left-associative operators

// Run a parser
ParseResult run_parser(Parser p, const char* input);

Module Structure

src/
├── parser.h           # Type definitions
├── parser.c           # Core parser infrastructure
├── combinators.h      # Combinator declarations
├── combinators.c      # Combinator implementations
├── primitives.h       # char, string, satisfy, etc.
├── primitives.c
├── json.h             # JSON parser built from combinators
├── json.c
├── expr.h             # Expression parser
├── expr.c
└── main.c             # Tests and examples

Implementation Guide

Phase 1: Basic Infrastructure (Days 1-2)

Goal: Set up parser types and basic char parser.

// parser.h
typedef struct {
    bool success;
    char value;
    const char* remaining;
    const char* error;
} CharResult;

typedef CharResult (*CharParser)(const char*);

// parser.c
CharResult char_parser(char c, const char* input) {
    if (input == NULL || *input == '\0') {
        return (CharResult){
            .success = false,
            .error = "Unexpected end of input"
        };
    }

    if (*input == c) {
        return (CharResult){
            .success = true,
            .value = c,
            .remaining = input + 1
        };
    }

    return (CharResult){
        .success = false,
        .error = "Expected different character"
    };
}

Phase 2: Generic Parser Type (Days 3-4)

Goal: Make parsers work with any result type.

In C, this requires some creativity (void pointers or code generation). A cleaner approach uses tagged unions:

typedef enum {
    VAL_CHAR,
    VAL_STRING,
    VAL_INT,
    VAL_FLOAT,
    VAL_LIST,
    VAL_PAIR,
} ValueType;

typedef struct Value {
    ValueType type;
    union {
        char char_val;
        char* string_val;
        int int_val;
        double float_val;
        struct { size_t len; struct Value* items; } list_val;
        struct { struct Value* first; struct Value* second; } pair_val;
    };
} Value;

typedef struct ParseResult {
    bool success;
    Value* value;
    const char* remaining;
    const char* error;
} ParseResult;

Phase 3: Sequence and Choice (Days 5-6)

Goal: Implement the fundamental combinators.

// Closure struct for sequence
typedef struct {
    Parser p1;
    Parser p2;
} SeqContext;

ParseResult seq_fn(const char* input, void* ctx) {
    SeqContext* c = (SeqContext*)ctx;

    ParseResult r1 = run_parser(c->p1, input);
    if (!r1.success) return r1;

    ParseResult r2 = run_parser(c->p2, r1.remaining);
    if (!r2.success) return r2;

    Value* pair = malloc(sizeof(Value));
    pair->type = VAL_PAIR;
    pair->pair_val.first = r1.value;
    pair->pair_val.second = r2.value;

    return (ParseResult){
        .success = true,
        .value = pair,
        .remaining = r2.remaining
    };
}

Parser seq(Parser p1, Parser p2) {
    SeqContext* ctx = malloc(sizeof(SeqContext));
    ctx->p1 = p1;
    ctx->p2 = p2;
    return (Parser){ .fn = seq_fn, .context = ctx };
}

Phase 4: Many and Map (Days 7-8)

Goal: Implement repetition and transformation.

ParseResult many_fn(const char* input, void* ctx) {
    Parser p = *(Parser*)ctx;
    Value* list = make_list();

    const char* remaining = input;
    while (true) {
        ParseResult r = run_parser(p, remaining);
        if (!r.success) break;

        list_append(list, r.value);
        remaining = r.remaining;
    }

    return (ParseResult){
        .success = true,
        .value = list,
        .remaining = remaining
    };
}

typedef struct {
    Parser p;
    Value* (*transform)(Value*);
} MapContext;

ParseResult map_fn(const char* input, void* ctx) {
    MapContext* c = (MapContext*)ctx;

    ParseResult r = run_parser(c->p, input);
    if (!r.success) return r;

    return (ParseResult){
        .success = true,
        .value = c->transform(r.value),
        .remaining = r.remaining
    };
}

Phase 5: Build Useful Parsers (Days 9-10)

Goal: Create number, string, identifier parsers.

// digit: Parse a single digit
Parser digit = satisfy(isdigit);

// digits: One or more digits
Parser digits = many1(digit);

// Integer parser
Value* chars_to_int(Value* chars) {
    // Convert list of chars to int
    char buf[32];
    size_t len = chars->list_val.len;
    for (size_t i = 0; i < len; i++) {
        buf[i] = chars->list_val.items[i].char_val;
    }
    buf[len] = '\0';

    Value* result = malloc(sizeof(Value));
    result->type = VAL_INT;
    result->int_val = atoi(buf);
    return result;
}

Parser integer = map_p(
    seq(
        optional(choice(char_p('-'), char_p('+'))),
        digits
    ),
    chars_to_int
);

Phase 6: JSON Parser (Days 11-13)

Goal: Build a complete JSON parser.

// Forward declaration (needed for recursion)
Parser json_value(void);

// Whitespace
Parser ws = many(satisfy(isspace));

// JSON null
Parser json_null = map_p(string_p("null"), make_null);

// JSON boolean
Parser json_bool = choice(
    map_p(string_p("true"), make_true),
    map_p(string_p("false"), make_false)
);

// JSON number
Parser json_number = /* ... from Phase 5 ... */;

// JSON string
Parser json_string = between(
    char_p('"'),
    many(choice(
        satisfy(lambda(c) { return c != '"' && c != '\\'; }),
        seq(char_p('\\'), any_char)  // Escape sequences
    )),
    char_p('"')
);

// JSON array
Parser json_array(void) {
    return between(
        seq(char_p('['), ws),
        sep_by(json_value(), seq(ws, seq(char_p(','), ws))),
        seq(ws, char_p(']'))
    );
}

// JSON object
Parser json_object(void) {
    Parser pair = seq(
        json_string,
        seq(seq(ws, seq(char_p(':'), ws)), json_value())
    );
    return between(
        seq(char_p('{'), ws),
        sep_by(pair, seq(ws, seq(char_p(','), ws))),
        seq(ws, char_p('}'))
    );
}

// The full JSON parser
Parser json_value(void) {
    return choice(json_null,
           choice(json_bool,
           choice(json_number,
           choice(json_string,
           choice(json_array(),
                  json_object())))));
}

Phase 7: Expression Parser with Precedence (Day 14)

Goal: Parse arithmetic with correct precedence.

Use chainl1 for left-associative operators:

// chainl1: Parse p, then repeatedly (op, p), left-associative
// expr = term (('+' | '-') term)*
// term = factor (('*' | '/') factor)*

Parser addOp = choice(
    map_p(char_p('+'), make_add_op),
    map_p(char_p('-'), make_sub_op)
);

Parser mulOp = choice(
    map_p(char_p('*'), make_mul_op),
    map_p(char_p('/'), make_div_op)
);

Parser factor(void) {
    return choice(
        integer,
        between(char_p('('), expr(), char_p(')'))
    );
}

Parser term(void) {
    return chainl1(factor(), mulOp);
}

Parser expr(void) {
    return chainl1(term(), addOp);
}

Testing Strategy

Primitive Tests

void test_char_parser(void) {
    Parser p = char_p('a');

    ParseResult r1 = run_parser(p, "abc");
    assert(r1.success);
    assert(r1.value->char_val == 'a');
    assert(strcmp(r1.remaining, "bc") == 0);

    ParseResult r2 = run_parser(p, "xyz");
    assert(!r2.success);
}

void test_string_parser(void) {
    Parser p = string_p("hello");

    ParseResult r = run_parser(p, "hello world");
    assert(r.success);
    assert(strcmp(r.remaining, " world") == 0);
}

Combinator Tests

void test_sequence(void) {
    Parser p = seq(char_p('a'), char_p('b'));

    ParseResult r = run_parser(p, "abc");
    assert(r.success);
    assert(r.value->pair_val.first->char_val == 'a');
    assert(r.value->pair_val.second->char_val == 'b');
}

void test_choice(void) {
    Parser p = choice(char_p('a'), char_p('b'));

    assert(run_parser(p, "a").success);
    assert(run_parser(p, "b").success);
    assert(!run_parser(p, "c").success);
}

void test_many(void) {
    Parser p = many(char_p('a'));

    ParseResult r1 = run_parser(p, "aaab");
    assert(r1.success);
    assert(r1.value->list_val.len == 3);
    assert(strcmp(r1.remaining, "b") == 0);

    ParseResult r2 = run_parser(p, "b");
    assert(r2.success);  // many succeeds with 0 matches
    assert(r2.value->list_val.len == 0);
}

JSON Parser Tests

void test_json_null(void) {
    ParseResult r = run_parser(json_value(), "null");
    assert(r.success);
    assert(r.value->type == VAL_NULL);
}

void test_json_array(void) {
    ParseResult r = run_parser(json_value(), "[1, 2, 3]");
    assert(r.success);
    assert(r.value->type == VAL_LIST);
    assert(r.value->list_val.len == 3);
}

void test_json_nested(void) {
    const char* input = "{\"a\": [1, 2], \"b\": {\"c\": true}}";
    ParseResult r = run_parser(json_value(), input);
    assert(r.success);
    // Verify nested structure...
}

Common Pitfalls and Debugging

Pitfall 1: Left Recursion

Symptom: Stack overflow or infinite loop

// WRONG: Left recursion
// expr = expr '+' term | term
Parser expr() {
    return choice(
        seq(expr(), seq(char_p('+'), term())),  // expr calls expr immediately!
        term()
    );
}

Solution: Use chainl1 or rewrite the grammar:

// RIGHT: Eliminate left recursion
// expr = term ('+' term)*
Parser expr() {
    return chainl1(term(), addOp);
}

Pitfall 2: Forgetting Backtracking

Symptom: Parser fails when it shouldn’t

// "if" vs "identifier" - both start with 'i'
Parser keyword_if = string_p("if");
Parser identifier = many1(letter);

// WRONG: choice doesn't backtrack by default
Parser token = choice(keyword_if, identifier);
// Parsing "if" works
// Parsing "ifx" fails: "if" matches, leaving "x" which isn't handled

Solution: Use try combinator or order choices carefully:

// Option 1: try backtracks on partial match
Parser token = choice(try(keyword_if), identifier);

// Option 2: Match longer first
Parser keyword_if = seq(string_p("if"), not(letter));  // "if" followed by non-letter

Pitfall 3: Memory Leaks

Symptom: Memory usage grows with parsing

In C, you must carefully manage memory:

// Remember to free failed parse results
ParseResult r = run_parser(p, input);
if (!r.success) {
    free_result(&r);  // Don't leak partial results
}

// Use arena allocation for efficiency
Arena* arena = arena_create(1024 * 1024);
ParseResult r = run_parser_with_arena(p, input, arena);
// ... use result ...
arena_destroy(arena);  // Free everything at once

Pitfall 4: Poor Error Messages

Symptom: “Parse error at position 42” with no context

Solution: Track what was expected:

typedef struct {
    bool success;
    Value* value;
    const char* remaining;
    struct {
        const char* position;
        const char** expected;  // List of expected tokens
        size_t expected_count;
    } error;
} ParseResult;

// When choice fails, collect expected from all alternatives
ParseResult choice_fn(const char* input, void* ctx) {
    ChoiceContext* c = ctx;

    ParseResult r1 = run_parser(c->p1, input);
    if (r1.success) return r1;

    ParseResult r2 = run_parser(c->p2, input);
    if (r2.success) return r2;

    // Merge expected lists
    return (ParseResult){
        .success = false,
        .error = {
            .position = input,
            .expected = merge_expected(r1.error.expected, r2.error.expected),
            // ...
        }
    };
}

Extensions and Challenges

Challenge 1: Error Recovery

Continue parsing after an error to report multiple issues:

// Instead of failing on first error, skip to synchronization point
Parser recover(Parser p, Parser sync) {
    return lambda(input) {
        ParseResult r = run_parser(p, input);
        if (r.success) return r;

        // Skip until sync succeeds
        while (*input && !run_parser(sync, input).success) {
            input++;
        }

        // Continue parsing, collect this error
        return (ParseResult){
            .success = true,  // Partial success
            .value = make_error_node(r.error),
            .remaining = input
        };
    };
}

Challenge 2: Incremental Parsing

Re-parse only changed portions for editor integration:

typedef struct IncrementalParser {
    // Store parse tree with byte ranges
    // When text changes, invalidate affected nodes
    // Re-parse only invalidated portions
} IncrementalParser;

Challenge 3: Generate Parser from Grammar

Build a parser generator:

grammar JSON {
    json = null | bool | number | string | array | object;
    null = "null";
    bool = "true" | "false";
    ...
}

Generates the parser combinator code automatically.

Challenge 4: Pratt Parsing

For complex expression grammars, implement Pratt parsing (precedence climbing):

typedef struct {
    int prefix_precedence;
    int infix_precedence;
    bool right_associative;
} OperatorInfo;

Value* pratt_parse(Parser* p, int min_precedence) {
    Value* left = parse_prefix(p);

    while (true) {
        OperatorInfo* op = get_operator(current_token(p));
        if (!op || op->infix_precedence < min_precedence) break;

        advance(p);
        int next_prec = op->right_associative
            ? op->infix_precedence
            : op->infix_precedence + 1;
        Value* right = pratt_parse(p, next_prec);
        left = make_binary(op, left, right);
    }

    return left;
}

Real-World Connections

Parser Combinator Libraries

Language Library Notable Features
Haskell Parsec, Megaparsec The originals, beautiful monadic API
Rust nom, combine Zero-copy, streaming parsers
Scala scala-parser-combinators Part of standard library
JavaScript Parsimmon Clean API for JS
Python pyparsing Readable, Pythonic

Where Parser Combinators Excel

  • DSLs: Query languages, config files, templates
  • Protocol parsing: HTTP, SMTP, custom binary formats
  • Compilers: Lexing and parsing phases
  • Data extraction: Log files, structured text

Industrial Usage

  • Rust analyzer: Uses a parser combinator approach for IDE features
  • GraphQL implementations: Many use combinators for query parsing
  • Configuration languages: HCL (HashiCorp), TOML parsers

Resources

Essential Reading

Topic Resource
Monadic parsing “Monadic Parser Combinators” - Hutton & Meijer
Higher-order functions Learn You a Haskell Ch. 6
Parsing techniques Language Implementation Patterns Ch. 2
Composition Domain Modeling Made Functional Ch. 6

Papers

  • “Monadic Parser Combinators” by Graham Hutton and Erik Meijer
  • “Parsec: Direct Style Monadic Parser Combinators For The Real World” by Daan Leijen
  • “Packrat Parsing” by Bryan Ford (for unlimited lookahead)

Implementation References


Self-Assessment Checklist

Primitive Parsers

  • char(c) - parse specific character
  • satisfy(pred) - parse char matching predicate
  • string(s) - parse exact string
  • any_char - parse any single character
  • end_of_input - succeed only at end

Core Combinators

  • seq(p1, p2) - sequence
  • choice(p1, p2) - alternative
  • many(p) - zero or more
  • many1(p) - one or more
  • optional(p) - zero or one
  • map(p, f) - transform result
  • bind(p, f) - monadic chaining

Utility Combinators

  • sep_by(p, sep) - separated list
  • between(l, p, r) - surrounded by delimiters
  • chainl1(p, op) - left-associative chain

Built Parsers

  • Integer parser (handles negative)
  • Float parser (handles exponent)
  • String parser (handles escapes)
  • Complete JSON parser
  • Expression parser with precedence

Quality

  • Error messages include position
  • Memory is managed correctly
  • Tests cover edge cases
  • Code mirrors grammar structure

Interview Questions

  1. “What is a parser combinator?”
    • Expected: A higher-order function that takes parsers and returns a new parser
  2. “Why are parsers monads?”
    • Expected: They support sequencing with state (remaining input) and failure propagation
  3. “How do you handle operator precedence?”
    • Expected: Use chainl1/chainr1 or Pratt parsing
  4. “What’s the difference between many and many1?”
    • Expected: many matches zero or more (always succeeds), many1 requires at least one
  5. “How do you avoid left recursion?”
    • Expected: Rewrite grammar to term (op term)* form or use chainl1
  6. “What are the advantages of parser combinators over parser generators?”
    • Expected: Composability, type safety, integration with host language, easier debugging
  7. “How would you implement good error messages?”
    • Expected: Track position, collect “expected” tokens, use labeled parsers