Project 6: Interpreter for a Programming Language

Build a tree-walking interpreter for a small language (Lox or Monkey style).

Quick Reference

| Attribute | Value |
|-----------|-------|
| Difficulty | Intermediate to Advanced |
| Time Estimate | 2-3 weeks |
| Main Programming Language | C, Go, or Rust |
| Alternative Programming Languages | Python, JavaScript |
| Coolness Level | Level 3: Genuinely Clever |
| Business Potential | Language tooling and DSLs |
| Prerequisites | Basic parsing, data structures |
| Key Topics | Lexing, parsing, ASTs, evaluation, environments |

1. Learning Objectives

By completing this project, you will:

  1. Build a lexer that converts source text into tokens.
  2. Implement a parser that builds an AST.
  3. Evaluate expressions and statements with correct semantics.
  4. Implement lexical scoping with environments.
  5. Add functions and closures.
  6. Explain how interpreters differ from compilers.

2. All Theory Needed (Per-Concept Breakdown)

2.1 Lexing and Tokens

Description / Expanded Explanation

The lexer reads raw source code and outputs a stream of tokens. This simplifies parsing by turning characters into meaningful symbols like identifiers, numbers, and keywords.

Definitions & Key Terms
  • token -> smallest meaningful unit
  • lexeme -> raw characters matched
  • keyword -> reserved word
  • literal -> numeric or string value
Mental Model Diagram (ASCII)
"var x = 42;" -> [VAR][IDENT:x][=][NUMBER:42][;]
Mental Model Diagram (Image)

Lexing

How It Works (Step-by-Step)
  1. Read characters left to right.
  2. Match patterns for numbers, identifiers, operators.
  3. Skip whitespace and comments.
  4. Emit tokens with type and lexeme.
Minimal Concrete Example
Input: "1 + 2"
Tokens: NUMBER(1), PLUS, NUMBER(2)
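The steps above can be sketched as a tiny lexer in Python (one of the alternative languages listed in the Quick Reference). The token type names are illustrative, not a required format.

```python
# Minimal lexer sketch: turns "1 + 2" into (type, lexeme) pairs.
def lex(source):
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():                 # step 3: skip whitespace
            i += 1
        elif ch.isdigit():               # step 2: match a number pattern
            j = i
            while j < len(source) and source[j].isdigit():
                j += 1
            tokens.append(("NUMBER", source[i:j]))
            i = j
        elif ch == "+":
            tokens.append(("PLUS", ch))
            i += 1
        else:                            # unknown character: report, don't guess
            raise SyntaxError(f"unexpected character {ch!r}")
    return tokens                        # step 4: tokens carry type and lexeme

print(lex("1 + 2"))  # [('NUMBER', '1'), ('PLUS', '+'), ('NUMBER', '2')]
```

A real lexer would also handle identifiers, keywords, strings, and multi-character operators, but the shape stays the same: a loop that matches one token per iteration.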
Common Misconceptions
  • “Lexer handles grammar” -> lexer only groups characters.
  • “Whitespace is always irrelevant” -> some languages are whitespace sensitive.
Check-Your-Understanding Questions
  1. Why separate lexing from parsing?
  2. How do you handle multi-character operators like “==”?
  3. What should the lexer do with unknown characters?
Where You’ll Apply It
  • See 3.5 for token format.
  • See 5.10 Phase 1 for lexer implementation.
  • Also used in: P09 Mini-React for JSX-like parsing

2.2 Parsing and ASTs

Description / Expanded Explanation

The parser consumes tokens and produces an Abstract Syntax Tree (AST), which represents the grammatical structure of the program. The AST is the basis for interpretation.

Definitions & Key Terms
  • parser -> converts tokens into a tree structure
  • AST -> tree representation of program structure
  • precedence -> operator binding strength
  • associativity -> grouping order of operators
Mental Model Diagram (ASCII)
1 + 2 * 3
   +
  / \
 1   *
    / \
   2   3
Mental Model Diagram (Image)

AST

How It Works (Step-by-Step)
  1. Use recursive descent or Pratt parsing.
  2. Respect precedence and associativity.
  3. Build AST nodes for expressions and statements.
Minimal Concrete Example
BinaryExpr(op='+', left=Literal(1), right=Literal(2))
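The steps above can be sketched as a recursive-descent parser for + and * in Python; the tuple-based token and AST node shapes are illustrative choices, not part of the spec.

```python
# Recursive-descent parser sketch with precedence:
#   expr -> term ('+' term)*     (lowest precedence)
#   term -> factor ('*' factor)* ('*' binds tighter than '+')
def parse(tokens):
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():
        tok = advance()
        assert tok[0] == "NUMBER"
        return ("Literal", int(tok[1]))

    def term():
        node = factor()
        while peek() == "STAR":
            advance()
            node = ("Binary", "*", node, factor())
        return node

    def expr():
        node = term()
        while peek() == "PLUS":
            advance()
            node = ("Binary", "+", node, term())
        return node

    return expr()

# 1 + 2 * 3 parses as 1 + (2 * 3) because term() is called below expr():
ast = parse([("NUMBER", "1"), ("PLUS", "+"), ("NUMBER", "2"),
             ("STAR", "*"), ("NUMBER", "3")])
print(ast)
# ('Binary', '+', ('Literal', 1), ('Binary', '*', ('Literal', 2), ('Literal', 3)))
```

Note how precedence falls out of the call structure: each level of the grammar handles one precedence tier, and the left-to-right loop gives left associativity without left recursion.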
Common Misconceptions
  • “AST is identical to parse tree” -> AST is simplified representation.
  • “Parsing is trivial” -> precedence handling is subtle.
Check-Your-Understanding Questions
  1. Why do we need precedence rules?
  2. How does recursive descent avoid left recursion?
  3. What is the difference between statement and expression nodes?
Where You’ll Apply It
  • See 4.4 for evaluation of AST nodes.
  • See 5.10 Phase 2 for parser implementation.
  • Also used in: P10 Mini Git Object Store for object format parsing

2.3 Evaluation and Semantics

Description / Expanded Explanation

The interpreter walks the AST and executes its semantics. This is where operations, control flow, and functions are actually performed.

Definitions & Key Terms
  • evaluator -> executes AST nodes
  • environment -> mapping of variables to values
  • truthiness -> rules for boolean evaluation
  • runtime error -> error during execution
Mental Model Diagram (ASCII)
AST -> evaluator -> runtime values
Mental Model Diagram (Image)

Evaluation

How It Works (Step-by-Step)
  1. Dispatch on node type (literal, binary, call, etc.).
  2. Evaluate children first.
  3. Apply operation and return result.
  4. Propagate errors with line info.
Minimal Concrete Example
eval(Binary(+)) -> eval(left)+eval(right)
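The dispatch-then-recurse steps above can be sketched in Python; the tuple-shaped AST nodes are an illustrative representation, not a requirement.

```python
# Tree-walking evaluator sketch: dispatch on node type, evaluate children first.
def evaluate(node):
    kind = node[0]
    if kind == "Literal":
        return node[1]
    if kind == "Binary":
        op, left, right = node[1], node[2], node[3]
        lval = evaluate(left)            # step 2: children first
        rval = evaluate(right)
        if op == "+":                    # step 3: apply the operation
            return lval + rval
        if op == "*":
            return lval * rval
        raise RuntimeError(f"unknown operator {op!r}")
    raise RuntimeError(f"unknown node type {kind!r}")

ast = ("Binary", "+", ("Literal", 1), ("Binary", "*", ("Literal", 2), ("Literal", 3)))
print(evaluate(ast))  # 7
```

A full interpreter adds cases for variables, calls, and control flow, and threads line information through for runtime errors, but every case follows this same evaluate-children-then-apply pattern.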
Common Misconceptions
  • “Interpreter can ignore types” -> dynamic types still have runtime rules.
  • “Errors should be ignored” -> clear error messages are critical.
Check-Your-Understanding Questions
  1. How does evaluation order affect side effects?
  2. What should happen when dividing by zero?
  3. How do you implement short-circuit boolean logic?
Where You’ll Apply It
  • See 3.2 for functional requirements.
  • See 5.10 Phase 3 for runtime implementation.
  • Also used in: P07 Lock-Free Queue for reasoning about execution order

2.4 Environments and Scoping

Description / Expanded Explanation

Variables live in environments. Lexical scoping means a variable resolves to the nearest enclosing scope where it is defined. This enables functions and closures.

Definitions & Key Terms
  • scope -> region where a name is defined
  • lexical scoping -> scope determined by program structure
  • closure -> function with captured environment
Mental Model Diagram (ASCII)
global env
  -> function env
      -> block env
Mental Model Diagram (Image)

Environments

How It Works (Step-by-Step)
  1. Each block creates a new environment with parent pointer.
  2. Variable lookup walks parent chain.
  3. Functions capture the environment where they are defined.
Minimal Concrete Example
var x=1; fun f(){ return x; }
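The environment chain described above can be sketched in Python as chained dictionaries with a parent pointer; the class and method names are illustrative.

```python
# Environment sketch: variable lookup walks the parent chain (step 2).
class Env:
    def __init__(self, parent=None):
        self.vars = {}
        self.parent = parent

    def define(self, name, value):
        self.vars[name] = value

    def get(self, name):
        env = self
        while env is not None:           # nearest enclosing scope wins
            if name in env.vars:
                return env.vars[name]
            env = env.parent
        raise NameError(name)

# Mirroring "var x=1; fun f(){ return x; }":
globals_env = Env()
globals_env.define("x", 1)
captured = globals_env                   # step 3: f captures its defining env
def f():
    return captured.get("x")
print(f())  # 1

# Shadowing: an inner scope can redefine x without touching the outer one.
inner = Env(parent=globals_env)
inner.define("x", 2)
print(inner.get("x"), globals_env.get("x"))  # 2 1
```

Because the closure holds a reference to the environment where it was defined (not where it is called), this structure gives lexical rather than dynamic scoping for free.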
Common Misconceptions
  • “Scope is dynamic” -> lexical scoping is based on source, not call stack.
  • “Closures are optional” -> they are essential for first-class functions under lexical scoping.
Check-Your-Understanding Questions
  1. How is a variable resolved inside nested functions?
  2. What is captured when a closure is created?
  3. What happens if an inner variable shadows an outer one?
Where You’ll Apply It
  • See 3.2 for variable semantics.
  • See 5.10 Phase 3 for closures.
  • Also used in: P09 Mini-React for component state

2.5 Error Handling and Diagnostics

Description / Expanded Explanation

A good interpreter reports syntax and runtime errors clearly, with line and column numbers. Errors guide the user and make debugging possible.

Definitions & Key Terms
  • syntax error -> invalid grammar
  • runtime error -> error during evaluation
  • recovery -> continue parsing after errors
Mental Model Diagram (ASCII)
source -> parser -> error at line 3: unexpected token
Mental Model Diagram (Image)

Errors

How It Works (Step-by-Step)
  1. Lexer tracks line and column positions.
  2. Parser reports errors with token context.
  3. Interpreter reports runtime errors with stack trace.
Minimal Concrete Example
line 2: expected ';' after expression
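Step 1 above (position tracking in the lexer) can be sketched in Python; the function name and scanning strategy are illustrative, and the returned dict follows the error JSON format in 3.5.

```python
# Line-tracking sketch: scan the source, counting newlines, so an error
# can point at the line where a bad character appears.
def find_error(source, bad_char):
    line = 1
    for ch in source:
        if ch == bad_char:
            return {"error": "syntax", "line": line,
                    "message": f"unexpected character {bad_char!r}"}
        if ch == "\n":                   # step 1: track the current line
            line += 1
    return None

print(find_error("var x = 1;\nvar y = @;\n", "@"))
# {'error': 'syntax', 'line': 2, 'message': "unexpected character '@'"}
```

In a real lexer this bookkeeping lives inside the token loop, and each emitted token records its line (and ideally column) so the parser and interpreter can reuse it.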
Common Misconceptions
  • “Errors should abort immediately” -> you can often recover and continue.
  • “Line numbers are optional” -> they are critical for usability.
Check-Your-Understanding Questions
  1. How does parser recovery work?
  2. What information helps users debug runtime errors?
  3. Why track column numbers?
Where You’ll Apply It
  • See 3.5 for the error JSON format.
  • See 5.10 Phase 3 for error handling.

3. Project Specification

3.1 What You Will Build

A small interpreter that supports variables, arithmetic, conditionals, loops, and functions. It reads source files and executes them.

3.2 Functional Requirements

  1. Lexing for identifiers, numbers, strings, and keywords.
  2. Parsing for expressions, statements, and blocks.
  3. Evaluation of arithmetic, boolean logic, and control flow.
  4. Functions with parameters and return values.
  5. Closures and lexical scoping.
  6. Error reporting with line numbers.

3.3 Non-Functional Requirements

  • Performance: execute 10k-line scripts in under 1 second.
  • Reliability: deterministic execution and clear errors.
  • Usability: CLI and REPL mode.

3.4 Example Usage / Output

$ mylang run hello.lang
Hello, world

3.5 Data Formats / Schemas / Protocols

Token format:

{type: IDENT, lexeme: "foo", line: 3}

AST node (JSON):

{"type":"Binary","op":"+","left":{"type":"Literal","value":1},"right":{"type":"Literal","value":2}}

Error JSON:

{"error":"syntax","line":3,"message":"expected ';'"}

3.6 Edge Cases

  • Unterminated strings.
  • Division by zero.
  • Recursive function depth.

3.7 Real World Outcome

3.7.1 How to Run (Copy/Paste)

make
./mylang run examples/fib.lang

3.7.2 Golden Path Demo (Deterministic)

Use a fixed input program and expected output.

3.7.3 CLI Transcript (Success)

$ ./mylang run examples/hello.lang
Hello, world
$ echo $?
0

3.7.4 CLI Transcript (Failure)

$ ./mylang run examples/bad.lang
{"error":"syntax","line":2,"message":"expected ';'"}
$ echo $?
65

3.7.5 Exit Codes

  • 0 success
  • 65 syntax error
  • 70 runtime error

4. Solution Architecture

4.1 High-Level Design

source -> lexer -> parser -> AST -> interpreter -> output

4.2 Key Components

| Component | Responsibility | Key Decisions |
|-----------|----------------|---------------|
| Lexer | produce tokens | regex vs manual scanner |
| Parser | build AST | recursive descent |
| Interpreter | evaluate nodes | visitor pattern |
| Environment | variable scopes | chained maps |

4.3 Data Structures (No Full Code)

Env { map<string, Value> vars; Env* parent; }

4.4 Algorithm Overview

  • Parse into AST, then evaluate recursively.
  • Use environment chain for name lookup.

Complexity Analysis

  • Parsing: O(n) tokens
  • Execution: O(n) nodes (per run)

5. Implementation Guide

5.1 Development Environment Setup

make

5.2 Project Structure

project-root/
├── src/lexer/
├── src/parser/
├── src/ast/
├── src/interpreter/
└── tests/

5.3 The Core Question You’re Answering

“How does source code become running behavior without a compiler?”

5.4 Concepts You Must Understand First

  1. Tokenization and lexical rules
  2. Grammar and precedence
  3. AST evaluation
  4. Lexical scoping and closures

5.5 Questions to Guide Your Design

  1. How will you represent values (numbers, strings, booleans)?
  2. How will you manage environments for function calls?
  3. What error recovery strategy will you use?

5.6 Thinking Exercise

Given var x = 1; fun f(){ var x = 2; return x; }, what does f() return?

5.7 The Interview Questions They’ll Ask

  1. Explain lexing vs parsing.
  2. Why use an AST instead of interpreting tokens directly?
  3. How do closures work?

5.8 Hints in Layers

Hint 1: Implement an expression evaluator first.
Hint 2: Add variables and environments.
Hint 3: Add functions and closures.

5.9 Books That Will Help

| Topic | Book | Chapter |
|-------|------|---------|
| Interpreters | Crafting Interpreters | Ch. 1-9 |
| Parsing | Programming Language Pragmatics | Ch. 2 |

5.10 Implementation Phases

Phase 1: Foundation (4-6 days)

Lexer and basic parser.

Phase 2: Core Functionality (7-10 days)

Expressions, statements, variables.

Phase 3: Polish and Edge Cases (4-6 days)

Functions, closures, error handling.

5.11 Key Implementation Decisions

| Decision | Options | Recommendation | Rationale |
|----------|---------|----------------|-----------|
| Parser | Pratt vs recursive descent | recursive descent | clarity |
| Values | tagged union | tagged union | simple runtime |
| REPL | optional | include | good UX |


6. Testing Strategy

6.1 Test Categories

| Category | Purpose | Examples |
|----------|---------|----------|
| Unit Tests | lexer and parser | token streams |
| Integration Tests | sample programs | fibonacci, loops |
| Error Tests | invalid syntax | missing semicolons |

6.2 Critical Test Cases

  1. Operator precedence works (1 + 2 * 3).
  2. Variable scoping works in nested blocks.
  3. Closure captures outer variable.

6.3 Test Data

programs: hello.lang, fib.lang, scope.lang

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

| Pitfall | Symptom | Solution |
|---------|---------|----------|
| Wrong precedence | incorrect results | implement precedence table |
| Environment leak | variables persist incorrectly | create new env per scope |
| Poor errors | hard to debug | include line numbers |

7.2 Debugging Strategies

  • Print AST for complex programs.
  • Add trace mode to show evaluation steps.

7.3 Performance Traps

  • Re-parsing in REPL without caching.

8. Extensions & Challenges

8.1 Beginner Extensions

  • Add arrays or lists.
  • Add for loops.

8.2 Intermediate Extensions

  • Add classes and objects.
  • Add modules/imports.

8.3 Advanced Extensions

  • Compile to bytecode.
  • Add garbage collection.

9. Real-World Connections

9.1 Industry Applications

  • DSLs in build systems and configuration.
  • Scripting engines inside applications.

9.2 Example Languages

  • Lua
  • Wren
  • Monkey language

9.3 Interview Relevance

  • Demonstrates parsing and runtime fundamentals.

10. Resources

10.1 Essential Reading

  • Crafting Interpreters (Ch. 1-9)
  • Writing Interpreters by Hand

10.2 Video Resources

  • Language design lectures

10.3 Tools & Documentation

  • re2c or flex for lexer inspiration

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain the lexer-parser-AST pipeline.
  • I can explain closures and lexical scoping.
  • I can explain error recovery.

11.2 Implementation

  • All sample programs run correctly.
  • Errors include line numbers.
  • REPL works reliably.

11.3 Growth

  • I can explain interpreter trade-offs in an interview.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Lexer, parser, and evaluator for arithmetic.
  • Variables and basic control flow.

Full Completion:

  • Functions and closures implemented.
  • Error handling and REPL included.

Excellence (Going Above & Beyond):

  • Bytecode compiler and VM.
  • Garbage collection.