Project 6: Interpreter for a Programming Language
Build a tree-walking interpreter for a small language (Lox or Monkey style).
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Intermediate to Advanced |
| Time Estimate | 2-3 weeks |
| Main Programming Language | C, Go, or Rust |
| Alternative Programming Languages | Python, JavaScript |
| Coolness Level | Level 3: Genuinely Clever |
| Business Potential | Language tooling and DSLs |
| Prerequisites | Basic parsing, data structures |
| Key Topics | Lexing, parsing, ASTs, evaluation, environments |
1. Learning Objectives
By completing this project, you will:
- Build a lexer that converts source text into tokens.
- Implement a parser that builds an AST.
- Evaluate expressions and statements with correct semantics.
- Implement lexical scoping with environments.
- Add functions and closures.
- Explain how interpreters differ from compilers.
2. All Theory Needed (Per-Concept Breakdown)
2.1 Lexing and Tokens
Description / Expanded Explanation
The lexer reads raw source code and outputs a stream of tokens. This simplifies parsing by turning characters into meaningful symbols like identifiers, numbers, and keywords.
Definitions & Key Terms
- token -> smallest meaningful unit
- lexeme -> raw characters matched
- keyword -> reserved word
- literal -> numeric or string value
Mental Model Diagram (ASCII)
"var x = 42;" -> [VAR][IDENT:x][=][NUMBER:42][;]
How It Works (Step-by-Step)
- Read characters left to right.
- Match patterns for numbers, identifiers, operators.
- Skip whitespace and comments.
- Emit tokens with type and lexeme.
Minimal Concrete Example
Input: "1 + 2"
Tokens: NUMBER(1), PLUS, NUMBER(2)
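A minimal sketch of the scanning loop in Go (the token names and the lex helper are illustrative, not a required API):

```go
package main

import (
	"fmt"
	"unicode"
)

type TokenType int

const (
	NUMBER TokenType = iota
	PLUS
	EOF
)

type Token struct {
	Type   TokenType
	Lexeme string
	Line   int
}

// lex scans left to right, skips whitespace, tracks line numbers,
// and emits NUMBER and PLUS tokens followed by EOF.
func lex(src string) []Token {
	var tokens []Token
	line := 1
	for i := 0; i < len(src); {
		c := rune(src[i])
		switch {
		case c == '\n':
			line++
			i++
		case c == ' ' || c == '\t':
			i++ // whitespace carries no meaning here
		case c == '+':
			tokens = append(tokens, Token{PLUS, "+", line})
			i++
		case unicode.IsDigit(c):
			start := i
			for i < len(src) && unicode.IsDigit(rune(src[i])) {
				i++
			}
			tokens = append(tokens, Token{NUMBER, src[start:i], line})
		default:
			i++ // a real lexer would report an "unexpected character" error here
		}
	}
	return append(tokens, Token{EOF, "", line})
}

func main() {
	fmt.Println(lex("1 + 2")) // NUMBER(1), PLUS, NUMBER(2), EOF as raw struct values
}
```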
Common Misconceptions
- “Lexer handles grammar” -> lexer only groups characters.
- “Whitespace is always irrelevant” -> some languages are whitespace sensitive.
Check-Your-Understanding Questions
- Why separate lexing from parsing?
- How do you handle multi-character operators like “==”?
- What should the lexer do with unknown characters?
Where You’ll Apply It
- See 3.5 for token format.
- See 5.10 Phase 1 for lexer implementation.
- Also used in: P09 Mini-React for JSX-like parsing
2.2 Parsing and ASTs
Description / Expanded Explanation
The parser consumes tokens and produces an Abstract Syntax Tree (AST), which represents the grammatical structure of the program. The AST is the basis for interpretation.
Definitions & Key Terms
- parser -> converts tokens into a tree structure
- AST -> tree representation of program structure
- precedence -> operator binding strength
- associativity -> grouping order of operators
Mental Model Diagram (ASCII)
1 + 2 * 3
    +
   / \
  1   *
     / \
    2   3
How It Works (Step-by-Step)
- Use recursive descent or Pratt parsing.
- Respect precedence and associativity.
- Build AST nodes for expressions and statements.
Minimal Concrete Example
BinaryExpr(op='+', left=Literal(1), right=Literal(2))
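One way to sketch precedence handling is precedence climbing (a simplified Pratt parser). The Go code below is a sketch under illustrative assumptions: the node types Literal and Binary and the space-separated token handling are stand-ins, not a prescribed design.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Expr is any AST node.
type Expr interface{}

type Literal struct{ Value float64 }
type Binary struct {
	Op          string
	Left, Right Expr
}

// Higher precedence binds tighter, so * groups before +.
var prec = map[string]int{"+": 1, "-": 1, "*": 2, "/": 2}

type parser struct {
	toks []string
	pos  int
}

func (p *parser) peek() string {
	if p.pos < len(p.toks) {
		return p.toks[p.pos]
	}
	return ""
}

// parseExpr keeps consuming operators whose precedence is at least minPrec,
// recursing with a higher bound for the right-hand side.
func (p *parser) parseExpr(minPrec int) Expr {
	left := p.parsePrimary()
	for {
		op := p.peek()
		pr, ok := prec[op]
		if !ok || pr < minPrec {
			return left
		}
		p.pos++
		right := p.parseExpr(pr + 1)
		left = Binary{op, left, right}
	}
}

func (p *parser) parsePrimary() Expr {
	n, _ := strconv.ParseFloat(p.toks[p.pos], 64) // error handling omitted in this sketch
	p.pos++
	return Literal{n}
}

func main() {
	p := &parser{toks: strings.Fields("1 + 2 * 3")}
	fmt.Printf("%+v\n", p.parseExpr(1)) // + at the root, * nested on the right
}
```

Because the right-hand side is parsed with the bound set to the operator's precedence plus one, operators of equal precedence group to the left, giving left associativity.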
Common Misconceptions
- “AST is identical to parse tree” -> AST is simplified representation.
- “Parsing is trivial” -> precedence handling is subtle.
Check-Your-Understanding Questions
- Why do we need precedence rules?
- How does recursive descent avoid left recursion?
- What is the difference between statement and expression nodes?
Where You’ll Apply It
- See 4.4 for evaluation of AST nodes.
- See 5.10 Phase 2 for parser implementation.
- Also used in: P10 Mini Git Object Store for object format parsing
2.3 Evaluation and Semantics
Description / Expanded Explanation
The interpreter walks the AST and executes its semantics. This is where operations, control flow, and functions are actually performed.
Definitions & Key Terms
- evaluator -> executes AST nodes
- environment -> mapping of variables to values
- truthiness -> rules for boolean evaluation
- runtime error -> error during execution
Mental Model Diagram (ASCII)
AST -> evaluator -> runtime values
How It Works (Step-by-Step)
- Dispatch on node type (literal, binary, call, etc.).
- Evaluate children first.
- Apply operation and return result.
- Propagate errors with line info.
Minimal Concrete Example
eval(Binary(+)) -> eval(left)+eval(right)
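A hedged sketch of the tree-walking step in Go, reusing the illustrative Literal and Binary nodes from 2.2: children are evaluated first, then the operator is applied, and division by zero surfaces as a runtime error.

```go
package main

import (
	"errors"
	"fmt"
)

type Expr interface{}

type Literal struct{ Value float64 }
type Binary struct {
	Op          string
	Left, Right Expr
}

// eval dispatches on node type, evaluates children first, then applies the operator.
func eval(e Expr) (float64, error) {
	switch n := e.(type) {
	case Literal:
		return n.Value, nil
	case Binary:
		l, err := eval(n.Left)
		if err != nil {
			return 0, err
		}
		r, err := eval(n.Right)
		if err != nil {
			return 0, err
		}
		switch n.Op {
		case "+":
			return l + r, nil
		case "*":
			return l * r, nil
		case "/":
			if r == 0 {
				return 0, errors.New("runtime error: division by zero")
			}
			return l / r, nil
		}
	}
	return 0, fmt.Errorf("unknown node %T", e)
}

func main() {
	ast := Binary{"+", Literal{1}, Binary{"*", Literal{2}, Literal{3}}}
	fmt.Println(eval(ast)) // 7 <nil>
}
```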
Common Misconceptions
- “Interpreter can ignore types” -> dynamic types still have runtime rules.
- “Errors should be ignored” -> clear error messages are critical.
Check-Your-Understanding Questions
- How does evaluation order affect side effects?
- What should happen when dividing by zero?
- How do you implement short-circuit boolean logic?
Where You’ll Apply It
- See 3.2 for functional requirements.
- See 5.10 Phase 3 for runtime implementation.
- Also used in: P07 Lock-Free Queue for reasoning about execution order
2.4 Environments and Scoping
Description / Expanded Explanation
Variables live in environments. Lexical scoping means a variable resolves to the nearest enclosing scope where it is defined. This enables functions and closures.
Definitions & Key Terms
- scope -> region where a name is defined
- lexical scoping -> scope determined by program structure
- closure -> function with captured environment
Mental Model Diagram (ASCII)
global env
-> function env
-> block env
How It Works (Step-by-Step)
- Each block creates a new environment with parent pointer.
- Variable lookup walks parent chain.
- Functions capture the environment where they are defined.
Minimal Concrete Example
var x=1; fun f(){ return x; }
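A minimal sketch of chained environments in Go (names like Env, Define, and Get are illustrative):

```go
package main

import "fmt"

// Env maps names to values and points to the enclosing scope.
type Env struct {
	vars   map[string]any
	parent *Env
}

func NewEnv(parent *Env) *Env {
	return &Env{vars: map[string]any{}, parent: parent}
}

func (e *Env) Define(name string, v any) { e.vars[name] = v }

// Get walks the parent chain until the name is found.
func (e *Env) Get(name string) (any, bool) {
	for env := e; env != nil; env = env.parent {
		if v, ok := env.vars[name]; ok {
			return v, true
		}
	}
	return nil, false
}

func main() {
	global := NewEnv(nil)
	global.Define("x", 1)

	block := NewEnv(global) // entering a block creates a child env
	block.Define("y", 2)

	x, _ := block.Get("x") // found by walking up to the global env
	y, _ := block.Get("y") // found locally
	fmt.Println(x, y)      // 1 2

	// For closures, a function value would also store the *Env in effect at
	// its definition (e.g. a hypothetical Function struct with a Closure *Env field),
	// and each call would evaluate the body in NewEnv(fn.Closure).
}
```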
Common Misconceptions
- “Scope is dynamic” -> lexical scoping is based on source, not call stack.
- “Closures are optional” -> they are required for any function that references variables from an enclosing scope.
Check-Your-Understanding Questions
- How is a variable resolved inside nested functions?
- What is captured when a closure is created?
- What happens if an inner variable shadows an outer one?
Where You’ll Apply It
- See 3.2 for variable semantics.
- See 5.10 Phase 3 for closures.
- Also used in: P09 Mini-React for component state
2.5 Error Handling and Diagnostics
Description / Expanded Explanation
A good interpreter reports syntax and runtime errors clearly, with line and column numbers. Errors guide the user and make debugging possible.
Definitions & Key Terms
- syntax error -> invalid grammar
- runtime error -> error during evaluation
- recovery -> continue parsing after errors
Mental Model Diagram (ASCII)
source -> parser -> error at line 3: unexpected token
How It Works (Step-by-Step)
- Lexer tracks line and column positions.
- Parser reports errors with token context.
- Interpreter reports runtime errors with stack trace.
Minimal Concrete Example
line 2: expected ';' after expression
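A small sketch of a structured error value in Go that carries the line number and serializes to the error JSON shape shown in 3.5 (the type name LangError is an assumption):

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// LangError carries the error kind, line, and message.
type LangError struct {
	Kind    string `json:"error"` // "syntax" or "runtime"
	Line    int    `json:"line"`
	Message string `json:"message"`
}

func (e LangError) Error() string {
	return fmt.Sprintf("line %d: %s", e.Line, e.Message)
}

func main() {
	err := LangError{Kind: "syntax", Line: 2, Message: "expected ';' after expression"}
	json.NewEncoder(os.Stderr).Encode(err) // {"error":"syntax","line":2,"message":"expected ';' after expression"}
	os.Exit(65)                            // the syntax-error exit code from 3.7
}
```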
Common Misconceptions
- “Errors should abort immediately” -> you can often recover and continue.
- “Line numbers are optional” -> they are critical for usability.
Check-Your-Understanding Questions
- How does parser recovery work?
- What information helps users debug runtime errors?
- Why track column numbers?
Where You’ll Apply It
- See 3.4 for error output format.
- See 6.2 for error tests.
- Also used in: P01 Memory Allocator
3. Project Specification
3.1 What You Will Build
A small interpreter that supports variables, arithmetic, conditionals, loops, and functions. It reads source files and executes them.
3.2 Functional Requirements
- Lexing for identifiers, numbers, strings, and keywords.
- Parsing for expressions, statements, and blocks.
- Evaluation of arithmetic, boolean logic, and control flow.
- Functions with parameters and return values.
- Closures and lexical scoping.
- Error reporting with line numbers.
3.3 Non-Functional Requirements
- Performance: execute 10k-line scripts in under one second.
- Reliability: deterministic execution and clear errors.
- Usability: CLI and REPL mode.
3.4 Example Usage / Output
$ mylang run hello.lang
Hello, world
3.5 Data Formats / Schemas / Protocols
Token format:
{type: IDENT, lexeme: "foo", line: 3}
AST node (JSON):
{"type":"Binary","op":"+","left":"Literal","right":"Literal"}
Error JSON:
{"error":"syntax","line":3,"message":"expected ';'"}
3.6 Edge Cases
- Unterminated strings.
- Division by zero.
- Recursive function depth.
3.7 Real World Outcome
3.7.1 How to Run (Copy/Paste)
make
./mylang run examples/fib.lang
3.7.2 Golden Path Demo (Deterministic)
Use a fixed input program and expected output.
3.7.3 CLI Transcript (Success)
$ ./mylang run examples/hello.lang
Hello, world
$ echo $?
0
3.7.4 CLI Transcript (Failure)
$ ./mylang run examples/bad.lang
{"error":"syntax","line":2,"message":"expected ';'"}
$ echo $?
65
3.7.5 Exit Codes
| Code | Meaning |
|---|---|
| 0 | success |
| 65 | syntax error |
| 70 | runtime error |
4. Solution Architecture
4.1 High-Level Design
source -> lexer -> parser -> AST -> interpreter -> output
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Lexer | produce tokens | regex vs manual scanner |
| Parser | build AST | recursive descent |
| Interpreter | evaluate nodes | visitor pattern |
| Environment | variable scopes | chained maps |
4.3 Data Structures (No Full Code)
Env { map<string, Value> vars; Env* parent; }
4.4 Algorithm Overview
- Parse into AST, then evaluate recursively.
- Use environment chain for name lookup.
Complexity Analysis
- Parsing: O(n) tokens
- Execution: O(n) nodes (per run)
5. Implementation Guide
5.1 Development Environment Setup
A toolchain for your chosen language (C, Go, or Rust) plus make for the build.
5.2 Project Structure
project-root/
├── src/lexer/
├── src/parser/
├── src/ast/
├── src/interpreter/
└── tests/
5.3 The Core Question You’re Answering
“How does source code become running behavior without a compiler?”
5.4 Concepts You Must Understand First
- Tokenization and lexical rules
- Grammar and precedence
- AST evaluation
- Lexical scoping and closures
5.5 Questions to Guide Your Design
- How will you represent values (numbers, strings, booleans)?
- How will you manage environments for function calls?
- What error recovery strategy will you use?
5.6 Thinking Exercise
Given var x = 1; fun f(){ var x = 2; return x; }, what does f() return?
5.7 The Interview Questions They’ll Ask
- Explain lexing vs parsing.
- Why use an AST instead of interpreting tokens directly?
- How do closures work?
5.8 Hints in Layers
Hint 1: Implement an expression evaluator first.
Hint 2: Add variables and environments.
Hint 3: Add functions and closures.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Interpreters | Crafting Interpreters | Ch. 1-9 |
| Parsing | Programming Language Pragmatics | Ch. 2 |
5.10 Implementation Phases
Phase 1: Foundation (4-6 days)
Lexer and basic parser.
Phase 2: Core Functionality (7-10 days)
Expressions, statements, variables.
Phase 3: Polish and Edge Cases (4-6 days)
Functions, closures, error handling.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Parser | Pratt vs recursive descent | recursive descent | clarity |
| Values | tagged union | tagged union | simple runtime |
| REPL | optional | include | good UX |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | lexer and parser | token streams |
| Integration Tests | sample programs | fibonacci, loops |
| Error Tests | invalid syntax | missing semicolons |
6.2 Critical Test Cases
- Operator precedence works (1 + 2 * 3 evaluates to 7, not 9).
- Variable scoping works in nested blocks.
- Closure captures outer variable.
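A minimal Go test sketch for the precedence and scoping cases, assuming the illustrative types and functions from the sketches in 2.3 and 2.4 live in the same package; a full suite would drive these through the real lexer and parser instead of hand-built ASTs.

```go
package main

import "testing"

// Assumes the Literal/Binary/eval sketch from 2.3 and the Env sketch from 2.4.
func TestPrecedence(t *testing.T) {
	// 1 + 2 * 3 must group as 1 + (2 * 3) and evaluate to 7.
	ast := Binary{"+", Literal{1}, Binary{"*", Literal{2}, Literal{3}}}
	got, err := eval(ast)
	if err != nil || got != 7 {
		t.Fatalf("want 7, got %v (err %v)", got, err)
	}
}

func TestNestedScopeShadowing(t *testing.T) {
	global := NewEnv(nil)
	global.Define("x", 1)
	inner := NewEnv(global)
	inner.Define("x", 2) // shadows the outer x
	if v, _ := inner.Get("x"); v != 2 {
		t.Fatalf("inner lookup: want 2, got %v", v)
	}
	if v, _ := global.Get("x"); v != 1 {
		t.Fatalf("outer x must be unchanged, got %v", v)
	}
}
```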
6.3 Test Data
programs: hello.lang, fib.lang, scope.lang
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Wrong precedence | incorrect results | implement precedence table |
| Environment leak | variables persist incorrectly | create new env per scope |
| Poor errors | hard to debug | include line numbers |
7.2 Debugging Strategies
- Print AST for complex programs.
- Add trace mode to show evaluation steps.
7.3 Performance Traps
- Re-parsing in REPL without caching.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add arrays or lists.
- Add for loops.
8.2 Intermediate Extensions
- Add classes and objects.
- Add modules/imports.
8.3 Advanced Extensions
- Compile to bytecode.
- Add garbage collection.
9. Real-World Connections
9.1 Industry Applications
- DSLs in build systems and configuration.
- Scripting engines inside applications.
9.2 Related Open Source Projects
- Lua
- Wren
- Monkey language
9.3 Interview Relevance
- Demonstrates parsing and runtime fundamentals.
10. Resources
10.1 Essential Reading
- Crafting Interpreters (Ch. 1-9)
- Writing Interpreters by Hand
10.2 Video Resources
- Language design lectures
10.3 Tools & Documentation
- re2c or flex for lexer inspiration
10.4 Related Projects in This Series
11. Self-Assessment Checklist
11.1 Understanding
- I can explain the lexer-parser-AST pipeline.
- I can explain closures and lexical scoping.
- I can explain error recovery.
11.2 Implementation
- All sample programs run correctly.
- Errors include line numbers.
- REPL works reliably.
11.3 Growth
- I can explain interpreter trade-offs in an interview.
12. Submission / Completion Criteria
Minimum Viable Completion:
- Lexer, parser, and evaluator for arithmetic.
- Variables and basic control flow.
Full Completion:
- Functions and closures implemented.
- Error handling and REPL included.
Excellence (Going Above & Beyond):
- Bytecode compiler and VM.
- Garbage collection.