Project 3: Filter Expression Language
Build an expression DSL with precedence-aware parsing and AST evaluation for dynamic filtering rules.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 2: Intermediate |
| Time Estimate | 1-2 weeks |
| Main Programming Language | Python |
| Alternative Programming Languages | TypeScript, Rust, Go |
| Coolness Level | Level 3: Genuinely Clever |
| Business Potential | 4. The “Open Core” Infrastructure |
| Prerequisites | Project 2 parsing fundamentals, recursion, tree traversal |
| Key Topics | Operator precedence, AST construction, interpreter semantics, extensibility |
1. Learning Objectives
By completing this project, you will:
- Parse boolean/filter expressions into a precise AST.
- Implement precedence and associativity rules correctly.
- Evaluate expressions against structured records deterministically.
- Add operators without breaking grammar stability.
- Build a reusable expression engine for rule systems in Project 4 and Project 7.
2. All Theory Needed (Per-Concept Breakdown)
Operator Precedence and Expression Grammar Design
Fundamentals
Expression languages look simple but fail quickly without clear precedence rules. AND, OR, comparison operators, and parentheses must map to unambiguous parse trees. Precedence defines which operators bind first; associativity defines how equal-precedence operators group. Without formal rules, two implementations can produce different ASTs for the same string, which breaks trust and testability. A DSL becomes useful only when authors can predict behavior. This project teaches grammar design choices that make expression evaluation deterministic and extensible.
Deep Dive into the concept
Start by declaring an explicit precedence ladder, for example from highest to lowest: parentheses, unary operators, comparison (==, !=, <, >=), containment (IN, CONTAINS), logical AND, logical OR. Associativity is usually left-to-right for binary operators, with exceptions for certain unary/binding forms. Once precedence is defined, choose a parser strategy: layered recursive descent or a Pratt parser.
Layered recursive descent creates one function per precedence level (parse_or, parse_and, parse_comparison, parse_primary). It is easy to read and debug but grows verbose as operators increase. Pratt parsing uses binding powers and parselets; it scales elegantly for extensible operator sets and DSL evolution. For this project, either works; if you plan many operators, Pratt is usually cleaner.
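To make the comparison concrete, here is a minimal Pratt-style core in Python. The binding-power table, the tuple-shaped nodes, and the `parse` signature are illustrative assumptions, not this project's required API; parentheses and unary operators are omitted for brevity.

```python
# Minimal Pratt-style expression parser sketch over a pre-tokenized list.
# Higher binding power = binds tighter. Values here are assumptions.
BINDING_POWER = {"OR": 1, "AND": 2, "==": 3, "!=": 3, "<": 3, ">": 3, "<=": 3, ">=": 3}

def parse(tokens, min_bp=0, pos=0):
    """Return (ast, next_pos). Atoms stay strings; binary ops become tuples."""
    left, pos = tokens[pos], pos + 1              # prefix position: consume an atom
    while pos < len(tokens):
        op = tokens[pos]
        bp = BINDING_POWER.get(op, 0)
        if bp <= min_bp:                          # weaker operator: let caller handle it
            break
        right, pos = parse(tokens, bp, pos + 1)   # passing bp as floor => left-associative
        left = (op, left, right)
    return left, pos

ast, _ = parse(["a", "OR", "b", "AND", "c"])
# AND binds tighter than OR, so ast == ("OR", "a", ("AND", "b", "c"))
```

Layered recursive descent would instead spell out one function per row of the binding-power table (parse_or calling parse_and, and so on), which is the verbosity trade-off described below.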
Grammar design must consider token ambiguity. IN may conflict with identifiers if case-insensitive. Decide keyword policy early (reserved keywords vs contextual keywords). Reserved keywords simplify parser logic and diagnostics, but reduce field-name freedom. Contextual keywords preserve flexibility but raise parser complexity.
Parentheses handling should be explicit in both the grammar and the diagnostics. A missing closing parenthesis is a common authoring error; the diagnostic should point to the opening token and the expected closure. Good error UX includes the nearest unexpected token and a likely fix: “Expected ) before AND”.
Precedence bugs are subtle because many tests still pass. Add intentionally ambiguous fixtures to prove behavior:
- `a OR b AND c` should parse as `a OR (b AND c)`.
- `(a OR b) AND c` should differ.
- `x == 1 OR y == 2 AND z == 3` should keep AND binding tighter.
Extensibility means operator additions should be local changes. If adding NOT, you should update one precedence layer/parselet and evaluation mapping, not rewrite parser core. This is a strong architecture signal.
Consider data model constraints while designing grammar. If fields can be nested (user.role), tokenizer and parser must support dotted identifiers. If literals include strings/numbers/booleans/lists, grammar must define literal syntax clearly. Avoid “magic” implicit conversions unless necessary.
A mature expression DSL also defines truthiness and null behavior. Example policy:
- comparisons with missing field -> false or error?
- `CONTAINS` on non-list -> semantic error.
- null equality semantics -> explicit rules.
Document these policies before coding to avoid evaluator drift.
In real systems, expression grammar is often customer-facing. Deterministic behavior and clear docs lower support load significantly. This is why precedence is not an academic detail; it is product reliability.
How this fits into the projects
- Core to this project’s parser and evaluator.
- Reused in Project 4 rule conditions.
- Critical for Project 7 conditions and conflict checks.
Definitions & key terms
- Precedence: ranking that decides operator binding order.
- Associativity: grouping direction for same-precedence operators.
- Pratt parser: expression parser based on token binding powers.
- Parselet: handler for token in Pratt parsing.
- Ambiguous expression: expression with multiple possible parse trees unless precedence is defined.
Mental model diagram
Expression text
      |
      v
Tokenizer --> Parser (precedence-aware) --> AST
                                             |
                                   unambiguous tree shape
How it works
- Tokenize identifiers, literals, operators, parentheses.
- Parse with precedence-aware functions/parselets.
- Build AST nodes preserving source spans.
- Validate operator/operand compatibility.
- Pass AST to evaluator.
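The tokenize step above can be sketched with a single verbose regex. The token names, the reserved-keyword policy, and the exact operator set are assumptions for illustration, not a required design.

```python
import re

# Illustrative tokenizer sketch: named groups classify each match.
TOKEN_RE = re.compile(r"""
    \s+                                   # whitespace (no named group: skipped)
  | (?P<NUMBER>\d+(\.\d+)?)
  | (?P<STRING>"[^"]*")
  | (?P<OP>==|!=|>=|<=|>|<|\(|\))
  | (?P<WORD>[A-Za-z_][A-Za-z0-9_.]*)     # keywords and dotted identifiers
""", re.VERBOSE)

KEYWORDS = {"AND", "OR", "NOT", "IN", "CONTAINS"}  # reserved-keyword policy

def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at offset {pos}: {text[pos]!r}")
        pos = m.end()
        for kind, value in m.groupdict().items():  # first matched named group wins
            if value is None:
                continue
            if kind == "WORD" and value.upper() in KEYWORDS:
                kind = value.upper()               # promote reserved keywords
            tokens.append((kind, value))
            break
    return tokens
```

A real implementation would also record each token's span (start/end offsets) so the parser can attach source locations to AST nodes.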
Minimal concrete example
Input: age >= 21 AND status == "active" OR role == "admin"
AST: OR(
AND(GTE(age,21), EQ(status,"active")),
EQ(role,"admin")
)
Common misconceptions
- “Parentheses are optional if precedence is obvious.” -> obvious to author, not always to reader.
- “
ANDandORcan be same precedence.” -> usually wrong for user expectation. - “Adding operators is trivial.” -> can break grammar and evaluator guarantees.
Check-your-understanding questions
- Why does `a OR b AND c` not parse left-to-right?
- What changes when you add unary `NOT`?
- Predict the AST for `(a OR b) AND c`.
Check-your-understanding answers
- Because `AND` has higher precedence than `OR`.
- Add a unary precedence level/parselet and an evaluator branch.
- `AND(OR(a,b), c)`.
Real-world applications
- Search filtering in admin dashboards.
- Alerting/monitoring condition languages.
- Feature flag targeting rules.
Where you’ll apply it
- §3.2 functional requirements 1-4.
- §5.10 phase 2 parser completion.
- Also used in Project 4 and Project 7.
References
- Terence Parr, Language Implementation Patterns, expression parsing.
- Robert Nystrom, Crafting Interpreters, parsing chapters.
- Pratt parsing article: https://matklad.github.io/2020/04/13/simple-but-powerful-pratt-parsing.html
Key insights
Expression DSL correctness is mostly a grammar policy problem before it is an evaluation problem.
Summary
Clear precedence and associativity rules produce predictable ASTs and make DSL behavior trustworthy.
Homework/Exercises to practice the concept
- Write precedence table for all planned operators.
- Parse three ambiguous expressions by hand.
- Add unary `NOT` to your grammar draft.
Solutions to the homework/exercises
- Keep table explicit and ordered high->low.
- Compare ASTs with and without parentheses.
- Place `NOT` above comparison, below primary grouping.
AST Evaluation and Semantic Safety
Fundamentals
An AST evaluator executes language semantics independent of surface syntax. It walks tree nodes and computes boolean outcomes against a context record. Evaluator quality depends on deterministic semantics: type handling, missing fields, short-circuit rules, and operator contracts must be explicit. Without clear contracts, the same filter can behave differently across data shapes. For production use, evaluator errors should report the specific node and reason. This project builds the interpreter pattern that later scales into rule engines.
Deep Dive into the concept
Design evaluator as pure function eval(node, context) -> Value|Error. Purity simplifies testing and deterministic behavior. Each AST node type maps to one evaluation rule. For binary nodes, evaluate operands and apply operator semantics. For logical operators, use short-circuit evaluation: AND stops on false left operand; OR stops on true left operand. This improves performance and avoids unnecessary errors from right branch evaluation when outcome is already decided.
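A minimal sketch of such a pure evaluator, assuming tuple-shaped nodes with explicit `("field", name)` and `("lit", value)` leaves; this encoding is illustrative, not the project's mandated AST.

```python
import operator

# Dispatch table for comparison operators; Python's short-circuiting
# `and`/`or` give us the logical semantics directly.
OPS = {"==": operator.eq, "!=": operator.ne, ">": operator.gt,
       ">=": operator.ge, "<": operator.lt, "<=": operator.le}

def evaluate(node, context):
    kind = node[0]
    if kind == "field":
        return context[node[1]]          # strict policy: missing field raises KeyError
    if kind == "lit":
        return node[1]
    op, left, right = node               # binary node: (op, left, right)
    if op == "AND":
        return evaluate(left, context) and evaluate(right, context)  # right skipped on False
    if op == "OR":
        return evaluate(left, context) or evaluate(right, context)   # right skipped on True
    return OPS[op](evaluate(left, context), evaluate(right, context))

# Mirrors the earlier example: age >= 21 AND status == "active" OR role == "admin"
expr = ("OR",
        ("AND", (">=", ("field", "age"), ("lit", 21)),
                ("==", ("field", "status"), ("lit", "active"))),
        ("==", ("field", "role"), ("lit", "admin")))
print(evaluate(expr, {"age": 19, "status": "active", "role": "admin"}))  # True
```

Note that the failed AND branch short-circuits before reading `status`, while the OR branch rescues the overall result.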
Type policy is central. You can allow limited coercion ("21" to 21) or require strict typing. Strict typing is safer and easier to debug; coercion is user-friendly but can hide data quality issues. If coercion is allowed, make rules explicit and symmetrical.
Field resolution should support nested paths (user.role). On missing path segments, choose policy: return null sentinel, false for comparisons, or hard error. For operational DSLs, hard errors are often preferable during authoring, while runtime filtering may prefer false to avoid failures on partial data. Choose one and document it.
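One possible shape for that resolver, with a strict flag mirroring the authoring-vs-runtime policy split described above; the function name and flag are assumptions, not a fixed API.

```python
def resolve_path(record, path, strict=True):
    """Resolve a dotted path like 'user.role' against a nested dict record."""
    current = record
    for segment in path.split("."):
        if isinstance(current, dict) and segment in current:
            current = current[segment]
        elif strict:
            # authoring mode: fail loudly so typos surface immediately
            raise KeyError(f"missing field segment {segment!r} in path {path!r}")
        else:
            return None                  # lenient runtime mode: treat as null
    return current

resolve_path({"user": {"role": "admin"}}, "user.role")        # -> "admin"
resolve_path({"user": {}}, "user.role", strict=False)         # -> None
```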
Containment operators (IN, CONTAINS) require container semantics. x IN [a,b] checks membership; tags CONTAINS "premium" checks that left operand is collection/string per policy. Invalid operand types should return semantic errors with node span.
Performance considerations: repeated evaluation over many records benefits from precomputed field accessors and operator dispatch tables. But correctness first. Optimize only after deterministic test suite exists.
Diagnostics should include expression segment and reason. Example: “SemanticError at price > "high": numeric comparison expected, got string.” This directly guides fixes.
Security concerns: if expression strings are user-supplied, never compile/evaluate via host eval. Always interpret your own AST. Limit operator/function set to approved safe operations.
Testing strategy: for each operator, create truth table fixtures. Include missing-field cases, nulls, mixed types, and short-circuit behavior. Add metamorphic tests: equivalent expressions should yield same result set.
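A table-driven sketch of such fixtures, using a toy `evaluate` with a forced-error node to exercise short-circuit behavior; the node encoding and the `SemanticError` type are assumptions for illustration.

```python
class SemanticError(Exception):
    pass

def evaluate(node, context):
    kind = node[0]
    if kind == "lit":
        return node[1]
    if kind == "err":                    # fixture-only node that always fails
        raise SemanticError("forced semantic error")
    op, left, right = node
    if op == "AND":
        return evaluate(left, context) and evaluate(right, context)
    if op == "OR":
        return evaluate(left, context) or evaluate(right, context)
    raise ValueError(f"unknown operator {op!r}")

CASES = [
    (("AND", ("lit", False), ("err",)), False),  # short-circuit suppresses the error
    (("OR", ("lit", True), ("err",)), True),     # same on the OR side
    (("AND", ("lit", True), ("lit", False)), False),
]
for expr, expected in CASES:
    assert evaluate(expr, {}) == expected

try:                                             # true AND error must surface the error
    evaluate(("AND", ("lit", True), ("err",)), {})
except SemanticError:
    pass
else:
    raise AssertionError("true AND error should raise")
```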
This evaluator discipline is essential for business rules in Project 4 and inference systems in Project 7.
How this fits into the projects
- Direct evaluation engine in this project.
- Reused for rule condition execution in Project 4.
- Core semantic executor in Project 7.
Definitions & key terms
- Interpreter: executes AST semantics directly.
- Short-circuit: stop evaluating once result is determined.
- Semantic error: structurally valid expression with invalid meaning/type.
- Context: record/map evaluated against expression fields.
- Purity: function behavior depends only on inputs.
Mental model diagram
AST node
|
v
eval(node, context)
|-- logical node -> short-circuit branches
|-- comparison node -> typed operator
|-- field node -> resolve path
'-- literal node -> value
How it works
- Dispatch by node type.
- Resolve operands recursively.
- Apply type checks and operator semantics.
- Return bool/value or semantic error with span.
- Aggregate results across records.
Minimal concrete example
Expression: tags CONTAINS "premium" AND age >= 21
Record: {tags:["premium"], age:22}
Result: true
Common misconceptions
- “Parser correctness guarantees evaluator correctness.” -> semantics still need independent tests.
- “Missing fields should silently be false always.” -> may hide data pipeline bugs.
- “Short-circuit is just optimization.” -> it changes observable error behavior too.
Check-your-understanding questions
- Why avoid host-language `eval` for DSL execution?
- What is the effect of short-circuit on error propagation?
- How should the evaluator handle `age > "ten"`?
Check-your-understanding answers
- Security and deterministic control over supported semantics.
- Right branch may never execute, so some errors are intentionally suppressed.
- Return typed semantic error with node span.
Real-world applications
- Marketing audience segmentation.
- Fraud rule pre-filters.
- Access policy conditions.
Where you’ll apply it
- §3.2 requirements 3-5.
- §6.2 critical test cases.
- Also used in Project 7.
References
- Nystrom, Crafting Interpreters, tree-walk interpreter chapters.
- Fowler rules-engine notes: https://martinfowler.com/bliki/RulesEngine.html
Key insights
A safe evaluator is a deterministic semantic engine, not a string execution shortcut.
Summary
AST evaluation requires explicit type, field-resolution, and short-circuit policies to stay reliable.
Homework/Exercises to practice the concept
- Define missing-field policy and justify it.
- Write truth tables for `AND`/`OR` with semantic errors.
- Add one custom operator and semantic contract.
Solutions to the homework/exercises
- Recommended default: semantic error in strict mode.
- `false AND error -> false` under short-circuit; `true AND error -> error`.
- Example: `STARTS_WITH(field, string)` with string-only operands.
3. Project Specification
3.1 What You Will Build
An expression language engine that parses and evaluates filter conditions over JSON-like records.
Included:
- comparisons, boolean operators, parentheses.
- string/number/boolean literals and dotted field access.
- evaluation engine with deterministic semantics.
Excluded:
- user-defined functions (extension only).
- SQL compilation backend.
3.2 Functional Requirements
- Tokenize expression text.
- Parse to precedence-correct AST.
- Evaluate AST against records.
- Support `== != > >= < <= AND OR IN CONTAINS`.
- Return actionable parse/semantic errors.
3.3 Non-Functional Requirements
- Performance: evaluate 100k records with simple expression under 1s baseline.
- Reliability: deterministic parse tree and evaluation result.
- Usability: error spans point to exact token/segment.
3.4 Example Usage / Output
expr: status == "active" AND (age >= 21 OR role == "admin")
record: {status:"active", age:19, role:"admin"}
result: true
3.5 Data Formats / Schemas / Protocols
AST Node variants:
- Literal(value)
- Field(path)
- Binary(op,left,right)
- Group(inner)
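One way to encode these variants as Python dataclasses. The `frozen` flag and the optional `span` field are assumptions, added because the diagnostics sections call for source spans on nodes.

```python
from dataclasses import dataclass
from typing import Any

# Immutable AST node variants; `span` carries (start, end) source offsets
# for diagnostics and defaults to None when unknown.
@dataclass(frozen=True)
class Literal:
    value: Any
    span: Any = None

@dataclass(frozen=True)
class Field:
    path: str
    span: Any = None

@dataclass(frozen=True)
class Binary:
    op: str
    left: Any
    right: Any
    span: Any = None

@dataclass(frozen=True)
class Group:
    inner: Any
    span: Any = None

# age >= 21 AND status == "active"
ast = Binary("AND",
             Binary(">=", Field("age"), Literal(21)),
             Binary("==", Field("status"), Literal("active")))
```

Frozen dataclasses make nodes hashable and safe to share, which helps when snapshotting deterministic parse trees in tests.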
3.6 Edge Cases
- chained comparisons without boolean operator.
- mismatched parentheses.
- unknown operator token.
- missing fields.
- invalid type comparisons.
3.7 Real World Outcome
A backend can persist filter strings authored by users and evaluate them consistently against live records.
3.7.1 How to Run (Copy/Paste)
cd project_based_ideas/COMPILERS_RUNTIMES/DOMAIN_SPECIFIC_LANGUAGES_DSL_PROJECTS
make p03-test
./bin/p03-filter-eval --expr-file fixtures/p03_golden.expr --data fixtures/p03_data.json
3.7.2 Golden Path Demo (Deterministic)
Expression and dataset fixtures yield exact expected record IDs.
3.7.3 If CLI: exact terminal transcript
$ ./bin/p03-filter-eval --expr 'status == "active" AND age >= 21' --data fixtures/users.json
[ok] parsed_ast_hash=7b2a91ef
[ok] matched_ids=[u1,u4,u7]
exit=0
$ ./bin/p03-filter-eval --expr 'status == "active" AND (age > )' --data fixtures/users.json
[error] ParseError 1:33 expected literal after operator '>'
[hint] provide a number, string, or field reference
exit=2
4. Solution Architecture
4.1 High-Level Design
Expression text -> Lexer -> Parser -> AST -> Evaluator -> bool per record
                               |
                          diagnostics
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Lexer | produce expression tokens | keyword and operator policy |
| Parser | precedence-correct AST | Pratt vs layered recursive descent |
| Evaluator | execute semantics on records | strict typing + short-circuit |
| Diagnostics | error reporting | span + hint format |
4.3 Data Structures (No Full Code)
record EvalResult { ok: bool, value: bool, error?: SemanticError }
record SemanticError { node_span, message, hint }
4.4 Algorithm Overview
Key Algorithm: Pratt-style Expression Parse
- Parse prefix token into left expression.
- While next token binding power > current power, parse infix.
- Build binary node and continue.
- Return completed subtree.
Complexity Analysis
- Parse time: O(n) tokens.
- Eval time: O(m * t) where m records, t AST node visits per record.
5. Implementation Guide
5.1 Development Environment Setup
mkdir -p bin fixtures tests
5.2 Project Structure
p03-filter-language/
├── src/
│ ├── lexer.*
│ ├── parser.*
│ ├── ast.*
│ ├── evaluator.*
│ └── diagnostics.*
├── fixtures/
└── tests/
5.3 The Core Question You’re Answering
“How do I turn human-readable boolean logic into an unambiguous tree and evaluate it safely?”
5.4 Concepts You Must Understand First
- Operator precedence and associativity.
- AST structure and tree traversal.
- Type-safe evaluator design.
- Error spans and hinting.
5.5 Questions to Guide Your Design
- Which parser strategy best supports future operators?
- Should missing field be false or error?
- How do you represent dotted path resolution?
- How do you test short-circuit behavior explicitly?
5.6 Thinking Exercise
Manually parse:
a OR b AND c OR d
Then evaluate against contexts where a=true, b=false, c=error, d=false and explain short-circuit outcomes.
5.7 The Interview Questions They’ll Ask
- Pratt parser vs recursive descent differences?
- Why precedence bugs are hard to detect?
- How to ensure evaluator safety for untrusted expressions?
- What data structures improve evaluation throughput?
- How would you add functions to the language?
5.8 Hints in Layers
Hint 1: Implement parser for comparisons first.
Hint 2: Add boolean operators with precedence tests.
Hint 3: Build AST pretty-printer before evaluator.
Hint 4: Add semantic strict mode switch (strict=true/false).
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Expression parsing | Language Implementation Patterns | expression chapters |
| Interpreters | Crafting Interpreters | AST/evaluator chapters |
| DSL semantics | Domain Specific Languages | execution semantics |
5.10 Implementation Phases
Phase 1: Foundation (2-4 hours)
- lexer + literals + comparisons.
- parse/evaluate simple `field == value`.
Checkpoint: minimal expression fixture passes.
Phase 2: Core Functionality (6-10 hours)
- boolean operators and precedence.
- dotted fields and containment ops.
Checkpoint: ambiguous precedence fixtures pass.
Phase 3: Polish & Edge Cases (4-8 hours)
- semantic diagnostics and strict mode.
- benchmark and deterministic AST snapshots.
Checkpoint: golden + failure suites stable in CI.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Parser strategy | layered RD / Pratt | Pratt | scalable operator growth |
| Missing field policy | false / null / error | strict error default | avoids silent data bugs |
| Evaluation mode | interpret only / compile bytecode | interpret first | fastest path to correctness |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Parser tests | AST shape correctness | precedence and grouping |
| Evaluator tests | semantic correctness | type checks, short-circuit |
| Integration tests | end-to-end filtering | expression + dataset fixtures |
6.2 Critical Test Cases
- `a OR b AND c` parse shape assertion.
- strict-mode missing field error.
- `CONTAINS` semantics on list vs scalar.
- short-circuit avoids right-side error evaluation.
6.3 Test Data
fixtures/p03_golden.expr
fixtures/p03_precedence.expr
fixtures/p03_bad_syntax.expr
fixtures/p03_dataset.json
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| precedence not encoded | wrong matches | explicit precedence tests |
| coercion hidden | surprising comparisons | strict typing with explicit conversion ops |
| parser/evaluator coupled | hard to extend | clean AST boundary |
7.2 Debugging Strategies
- AST pretty-print every parse test.
- evaluation trace mode that prints node results.
7.3 Performance Traps
Repeated path resolution can be expensive; cache field accessor paths per AST node.
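A sketch of that caching idea, assuming dict-shaped records. The `compile_accessor` name and its lenient missing-path behavior are illustrative, not a fixed API.

```python
from functools import lru_cache

# Compile a dotted path into a reusable accessor once, instead of
# re-splitting the string for every record in the dataset.
@lru_cache(maxsize=None)
def compile_accessor(path):
    segments = path.split(".")           # paid once per distinct path
    def access(record):
        current = record
        for segment in segments:
            current = current.get(segment) if isinstance(current, dict) else None
            if current is None:
                return None              # lenient: missing path -> null
        return current
    return access

get_role = compile_accessor("user.role")
rows = [{"user": {"role": "admin"}}, {"user": {"role": "viewer"}}]
matched = [r for r in rows if get_role(r) == "admin"]
```

In the real engine the accessor would typically be attached to the Field node at parse time, so evaluation touches no string machinery at all.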
8. Extensions & Challenges
8.1 Beginner Extensions
- Add unary `NOT`.
- Add case-insensitive string comparisons.
8.2 Intermediate Extensions
- Add function calls (`len(tags) > 2`).
- Compile the AST to a bytecode-like instruction list.
8.3 Advanced Extensions
- expression optimizer (constant folding).
- partial evaluation with index prefilters.
9. Real-World Connections
9.1 Industry Applications
- CRM segmentation rules.
- Alert policy filters.
- Access-control decision expressions.
9.2 Related Open Source Projects
- CEL (Common Expression Language): https://cel.dev/
- JMESPath: https://jmespath.org/
9.3 Interview Relevance
- expression parser implementation.
- interpreter safety and determinism.
10. Resources
10.1 Essential Reading
- Nystrom, Crafting Interpreters.
- Parr, Language Implementation Patterns.
10.2 Video Resources
- Pratt parser walkthroughs.
- AST interpreter architecture talks.
10.3 Tools & Documentation
- grammar playground tooling.
- snapshot test frameworks.
10.4 Related Projects in This Series
11. Self-Assessment Checklist
11.1 Understanding
- I can explain precedence and associativity clearly.
- I can trace AST evaluation on paper.
- I can justify missing-field and type policies.
11.2 Implementation
- precedence tests pass.
- semantic diagnostics include spans and hints.
- deterministic integration fixtures pass.
11.3 Growth
- I can add one new operator safely.
- I can discuss parser strategy tradeoffs.
- I can map this project to business-rule systems.
12. Submission / Completion Criteria
Minimum Viable Completion:
- parse and evaluate comparisons with AND/OR.
- parentheses support and two failure-path diagnostics.
Full Completion:
- all minimum plus containment operators, strict semantic checks, deterministic fixtures.
Excellence (Going Above & Beyond):
- function/operator extensibility architecture with benchmarks.
13. Additional Content Rules (Applied)
13.1 Determinism
Use fixed dataset and expected matched IDs snapshot.
13.2 Outcome Completeness
Provide both successful evaluation and parse failure demos with explicit exit codes.
13.3 Cross-Linking
The evaluator and parser are prerequisites for Project 4 and Project 7.
13.4 No Placeholder Text
All sections include concrete decisions and outputs.