Project 6: Interpreter for a Programming Language

Build a tree-walking interpreter for a small language (Lox or Monkey style).

Quick Reference

| Attribute | Value |
|-----------|-------|
| Difficulty | Intermediate to Advanced |
| Time Estimate | 2-3 weeks |
| Main Programming Language | C, Go, or Rust |
| Alternative Programming Languages | Python, JavaScript |
| Coolness Level | Level 3: Genuinely Clever |
| Business Potential | Language tooling and DSLs |
| Prerequisites | Basic parsing, data structures |
| Key Topics | Lexing, parsing, ASTs, evaluation, environments |

1. Learning Objectives

By completing this project, you will:

  1. Build a lexer that converts source text into tokens.
  2. Implement a parser that builds an AST.
  3. Evaluate expressions and statements with correct semantics.
  4. Implement lexical scoping with environments.
  5. Add functions and closures.
  6. Explain how interpreters differ from compilers.

2. All Theory Needed (Per-Concept Breakdown)

2.1 Lexing and Tokens

Description / Expanded Explanation

The lexer reads raw source code and outputs a stream of tokens. This simplifies parsing by turning characters into meaningful symbols like identifiers, numbers, and keywords.

Definitions & Key Terms
  • token -> smallest meaningful unit
  • lexeme -> raw characters matched
  • keyword -> reserved word
  • literal -> numeric or string value
Mental Model Diagram (ASCII)
"var x = 42;" -> [VAR][IDENT:x][=][NUMBER:42][;]
Mental Model Diagram (Image)

Lexing

How It Works (Step-by-Step)
  1. Read characters left to right.
  2. Match patterns for numbers, identifiers, operators.
  3. Skip whitespace and comments.
  4. Emit tokens with type and lexeme.
Minimal Concrete Example
Input: "1 + 2"
Tokens: NUMBER(1), PLUS, NUMBER(2)
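The steps above can be sketched as a tiny lexer in Python (one of the alternative languages listed in the Quick Reference). The token type names are illustrative, not a required format.

```python
# Minimal lexer sketch: turns "1 + 2" into (type, lexeme) pairs.
def lex(source):
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():                 # step 3: skip whitespace
            i += 1
        elif ch.isdigit():               # step 2: match a number pattern
            j = i
            while j < len(source) and source[j].isdigit():
                j += 1
            tokens.append(("NUMBER", source[i:j]))
            i = j
        elif ch == "+":
            tokens.append(("PLUS", ch))
            i += 1
        else:                            # unknown character: report, don't guess
            raise SyntaxError(f"unexpected character {ch!r}")
    return tokens                        # step 4: tokens carry type and lexeme

print(lex("1 + 2"))  # [('NUMBER', '1'), ('PLUS', '+'), ('NUMBER', '2')]
```

A real lexer would also handle identifiers, keywords, strings, and multi-character operators, but the shape stays the same: a loop that matches one token per iteration.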
Common Misconceptions
  • “Lexer handles grammar” -> lexer only groups characters.
  • “Whitespace is always irrelevant” -> some languages are whitespace sensitive.
Check-Your-Understanding Questions
  1. Why separate lexing from parsing?
  2. How do you handle multi-character operators like “==”?
  3. What should the lexer do with unknown characters?
Where You’ll Apply It
  • See 3.5 for token format.
  • See 5.10 Phase 1 for lexer implementation.
  • Also used in: P09 Mini-React for JSX-like parsing

2.2 Parsing and ASTs

Description / Expanded Explanation

The parser consumes tokens and produces an Abstract Syntax Tree (AST), which represents the grammatical structure of the program. The AST is the basis for interpretation.

Definitions & Key Terms
  • parser -> converts tokens into a tree structure
  • AST -> tree representation of program structure
  • precedence -> operator binding strength
  • associativity -> grouping order of operators
Mental Model Diagram (ASCII)
1 + 2 * 3
   +
  / \
 1   *
    / \
   2   3
Mental Model Diagram (Image)

AST

How It Works (Step-by-Step)
  1. Use recursive descent or Pratt parsing.
  2. Respect precedence and associativity.
  3. Build AST nodes for expressions and statements.
Minimal Concrete Example
BinaryExpr(op='+', left=Literal(1), right=Literal(2))
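The steps above can be sketched as a recursive-descent parser for + and * in Python; the tuple-based token and AST node shapes are illustrative choices, not part of the spec.

```python
# Recursive-descent parser sketch with precedence:
#   expr -> term ('+' term)*     (lowest precedence)
#   term -> factor ('*' factor)* ('*' binds tighter than '+')
def parse(tokens):
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():
        tok = advance()
        assert tok[0] == "NUMBER"
        return ("Literal", int(tok[1]))

    def term():
        node = factor()
        while peek() == "STAR":
            advance()
            node = ("Binary", "*", node, factor())
        return node

    def expr():
        node = term()
        while peek() == "PLUS":
            advance()
            node = ("Binary", "+", node, term())
        return node

    return expr()

# 1 + 2 * 3 parses as 1 + (2 * 3) because term() is called below expr():
ast = parse([("NUMBER", "1"), ("PLUS", "+"), ("NUMBER", "2"),
             ("STAR", "*"), ("NUMBER", "3")])
print(ast)
# ('Binary', '+', ('Literal', 1), ('Binary', '*', ('Literal', 2), ('Literal', 3)))
```

Note how precedence falls out of the call structure: each level of the grammar handles one precedence tier, and the left-to-right loop gives left associativity without left recursion.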
Common Misconceptions
  • “AST is identical to parse tree” -> AST is simplified representation.
  • “Parsing is trivial” -> precedence handling is subtle.
Check-Your-Understanding Questions
  1. Why do we need precedence rules?
  2. How does recursive descent avoid left recursion?
  3. What is the difference between statement and expression nodes?
Where You’ll Apply It
  • See 4.4 for evaluation of AST nodes.
  • See 5.10 Phase 2 for parser implementation.
  • Also used in: P10 Mini Git Object Store for object format parsing

2.3 Evaluation and Semantics

Description / Expanded Explanation

The interpreter walks the AST and executes its semantics. This is where operations, control flow, and functions are actually performed.

Definitions & Key Terms
  • evaluator -> executes AST nodes
  • environment -> mapping of variables to values
  • truthiness -> rules for boolean evaluation
  • runtime error -> error during execution
Mental Model Diagram (ASCII)
AST -> evaluator -> runtime values
Mental Model Diagram (Image)

Evaluation

How It Works (Step-by-Step)
  1. Dispatch on node type (literal, binary, call, etc.).
  2. Evaluate children first.
  3. Apply operation and return result.
  4. Propagate errors with line info.
Minimal Concrete Example
eval(Binary(+)) -> eval(left)+eval(right)
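The dispatch-then-recurse steps above can be sketched in Python; the tuple-shaped AST nodes are an illustrative representation, not a requirement.

```python
# Tree-walking evaluator sketch: dispatch on node type, evaluate children first.
def evaluate(node):
    kind = node[0]
    if kind == "Literal":
        return node[1]
    if kind == "Binary":
        op, left, right = node[1], node[2], node[3]
        lval = evaluate(left)            # step 2: children first
        rval = evaluate(right)
        if op == "+":                    # step 3: apply the operation
            return lval + rval
        if op == "*":
            return lval * rval
        raise RuntimeError(f"unknown operator {op!r}")
    raise RuntimeError(f"unknown node type {kind!r}")

ast = ("Binary", "+", ("Literal", 1), ("Binary", "*", ("Literal", 2), ("Literal", 3)))
print(evaluate(ast))  # 7
```

A full interpreter adds cases for variables, calls, and control flow, and threads line information through for runtime errors, but every case follows this same evaluate-children-then-apply pattern.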
Common Misconceptions
  • “Interpreter can ignore types” -> dynamic types still have runtime rules.
  • “Errors should be ignored” -> clear error messages are critical.
Check-Your-Understanding Questions
  1. How does evaluation order affect side effects?
  2. What should happen when dividing by zero?
  3. How do you implement short-circuit boolean logic?
Where You’ll Apply It
  • See 3.2 for functional requirements.
  • See 5.10 Phase 3 for runtime implementation.
  • Also used in: P07 Lock-Free Queue for reasoning about execution order

2.4 Environments and Scoping

Description / Expanded Explanation

Variables live in environments. Lexical scoping means a variable resolves to the nearest enclosing scope where it is defined. This enables functions and closures.

Definitions & Key Terms
  • scope -> region where a name is defined
  • lexical scoping -> scope determined by program structure
  • closure -> function with captured environment
Mental Model Diagram (ASCII)
global env
  -> function env
      -> block env
Mental Model Diagram (Image)

Environments

How It Works (Step-by-Step)
  1. Each block creates a new environment with parent pointer.
  2. Variable lookup walks parent chain.
  3. Functions capture the environment where they are defined.
Minimal Concrete Example
var x=1; fun f(){ return x; }
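The environment chain described above can be sketched in Python as chained dictionaries with a parent pointer; the class and method names are illustrative.

```python
# Environment sketch: variable lookup walks the parent chain (step 2).
class Env:
    def __init__(self, parent=None):
        self.vars = {}
        self.parent = parent

    def define(self, name, value):
        self.vars[name] = value

    def get(self, name):
        env = self
        while env is not None:           # nearest enclosing scope wins
            if name in env.vars:
                return env.vars[name]
            env = env.parent
        raise NameError(name)

# Mirroring "var x=1; fun f(){ return x; }":
globals_env = Env()
globals_env.define("x", 1)
captured = globals_env                   # step 3: f captures its defining env
def f():
    return captured.get("x")
print(f())  # 1

# Shadowing: an inner scope can redefine x without touching the outer one.
inner = Env(parent=globals_env)
inner.define("x", 2)
print(inner.get("x"), globals_env.get("x"))  # 2 1
```

Because the closure holds a reference to the environment where it was defined (not where it is called), this structure gives lexical rather than dynamic scoping for free.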
Common Misconceptions
  • “Scope is dynamic” -> lexical scoping is based on source, not call stack.
  • “Closures are optional” -> they are essential for first-class functions under lexical scoping.
Check-Your-Understanding Questions
  1. How is a variable resolved inside nested functions?
  2. What is captured when a closure is created?
  3. What happens if an inner variable shadows an outer one?
Where You’ll Apply It
  • See 3.2 for variable semantics.
  • See 5.10 Phase 3 for closures.
  • Also used in: P09 Mini-React for component state

2.5 Error Handling and Diagnostics

Description / Expanded Explanation

A good interpreter reports syntax and runtime errors clearly, with line and column numbers. Errors guide the user and make debugging possible.

Definitions & Key Terms
  • syntax error -> invalid grammar
  • runtime error -> error during evaluation
  • recovery -> continue parsing after errors
Mental Model Diagram (ASCII)
source -> parser -> error at line 3: unexpected token
Mental Model Diagram (Image)

Errors

How It Works (Step-by-Step)
  1. Lexer tracks line and column positions.
  2. Parser reports errors with token context.
  3. Interpreter reports runtime errors with stack trace.
Minimal Concrete Example
line 2: expected ';' after expression
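Step 1 above (position tracking in the lexer) can be sketched in Python; the function name and scanning strategy are illustrative, and the returned dict follows the error JSON format in 3.5.

```python
# Line-tracking sketch: scan the source, counting newlines, so an error
# can point at the line where a bad character appears.
def find_error(source, bad_char):
    line = 1
    for ch in source:
        if ch == bad_char:
            return {"error": "syntax", "line": line,
                    "message": f"unexpected character {bad_char!r}"}
        if ch == "\n":                   # step 1: track the current line
            line += 1
    return None

print(find_error("var x = 1;\nvar y = @;\n", "@"))
# {'error': 'syntax', 'line': 2, 'message': "unexpected character '@'"}
```

In a real lexer this bookkeeping lives inside the token loop, and each emitted token records its line (and ideally column) so the parser and interpreter can reuse it.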
Common Misconceptions
  • “Errors should abort immediately” -> you can often recover and continue.
  • “Line numbers are optional” -> they are critical for usability.
Check-Your-Understanding Questions
  1. How does parser recovery work?
  2. What information helps users debug runtime errors?
  3. Why track column numbers?
Where You’ll Apply It
  • See 3.5 for the error JSON format.
  • See 5.10 Phase 3 for error handling.

3. Project Specification

3.1 What You Will Build

A small interpreter that supports variables, arithmetic, conditionals, loops, and functions. It reads source files and executes them.

3.2 Functional Requirements

  1. Lexing for identifiers, numbers, strings, and keywords.
  2. Parsing for expressions, statements, and blocks.
  3. Evaluation of arithmetic, boolean logic, and control flow.
  4. Functions with parameters and return values.
  5. Closures and lexical scoping.
  6. Error reporting with line numbers.

3.3 Non-Functional Requirements

  • Performance: execute 10k-line scripts in under 1 second.
  • Reliability: deterministic execution and clear errors.
  • Usability: CLI and REPL mode.

3.4 Example Usage / Output

$ mylang run hello.lang
Hello, world

3.5 Data Formats / Schemas / Protocols

Token format:

{type: IDENT, lexeme: "foo", line: 3}

AST node (JSON):

{"type":"Binary","op":"+","left":{"type":"Literal","value":1},"right":{"type":"Literal","value":2}}

Error JSON:

{"error":"syntax","line":3,"message":"expected ';'"}

3.6 Edge Cases

  • Unterminated strings.
  • Division by zero.
  • Recursive function depth.

3.7 Real World Outcome

3.7.1 How to Run (Copy/Paste)

make
./mylang run examples/fib.lang

3.7.2 Golden Path Demo (Deterministic)

Use a fixed input program and expected output.

3.7.3 CLI Transcript (Success)

$ ./mylang run examples/hello.lang
Hello, world
$ echo $?
0

3.7.4 CLI Transcript (Failure)

$ ./mylang run examples/bad.lang
{"error":"syntax","line":2,"message":"expected ';'"}
$ echo $?
65

3.7.5 Exit Codes

  • 0 success
  • 65 syntax error
  • 70 runtime error

4. Solution Architecture

4.1 High-Level Design

source -> lexer -> parser -> AST -> interpreter -> output

4.2 Key Components

| Component | Responsibility | Key Decisions |
|-----------|----------------|---------------|
| Lexer | produce tokens | regex vs manual scanner |
| Parser | build AST | recursive descent |
| Interpreter | evaluate nodes | visitor pattern |
| Environment | variable scopes | chained maps |

4.3 Data Structures (No Full Code)

Env { map<string, Value> vars; Env* parent; }

4.4 Algorithm Overview

  • Parse into AST, then evaluate recursively.
  • Use environment chain for name lookup.

Complexity Analysis

  • Parsing: O(n) tokens
  • Execution: O(n) nodes (per run)

5. Implementation Guide

5.1 Development Environment Setup

make

5.2 Project Structure

project-root/
├── src/lexer/
├── src/parser/
├── src/ast/
├── src/interpreter/
└── tests/

5.3 The Core Question You’re Answering

“How does source code become running behavior without a compiler?”

5.4 Concepts You Must Understand First

  1. Tokenization and lexical rules
  2. Grammar and precedence
  3. AST evaluation
  4. Lexical scoping and closures

5.5 Questions to Guide Your Design

  1. How will you represent values (numbers, strings, booleans)?
  2. How will you manage environments for function calls?
  3. What error recovery strategy will you use?

5.6 Thinking Exercise

Given var x = 1; fun f(){ var x = 2; return x; }, what does f() return?

5.7 The Interview Questions They’ll Ask

  1. Explain lexing vs parsing.
  2. Why use an AST instead of interpreting tokens directly?
  3. How do closures work?

5.8 Hints in Layers

Hint 1: Implement an expression evaluator first.
Hint 2: Add variables and environments.
Hint 3: Add functions and closures.

5.9 Books That Will Help

| Topic | Book | Chapter |
|-------|------|---------|
| Interpreters | Crafting Interpreters | Ch. 1-9 |
| Parsing | Programming Language Pragmatics | Ch. 2 |

5.10 Implementation Phases

Phase 1: Foundation (4-6 days)

Lexer and basic parser.

Phase 2: Core Functionality (7-10 days)

Expressions, statements, variables.

Phase 3: Polish and Edge Cases (4-6 days)

Functions, closures, error handling.

5.11 Key Implementation Decisions

| Decision | Options | Recommendation | Rationale |
|----------|---------|----------------|-----------|
| Parser | Pratt vs recursive descent | recursive descent | clarity |
| Values | tagged union | tagged union | simple runtime |
| REPL | optional | include | good UX |


6. Testing Strategy

6.1 Test Categories

| Category | Purpose | Examples |
|----------|---------|----------|
| Unit Tests | lexer and parser | token streams |
| Integration Tests | sample programs | fibonacci, loops |
| Error Tests | invalid syntax | missing semicolons |

6.2 Critical Test Cases

  1. Operator precedence works (1 + 2 * 3).
  2. Variable scoping works in nested blocks.
  3. Closure captures outer variable.

6.3 Test Data

programs: hello.lang, fib.lang, scope.lang

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

| Pitfall | Symptom | Solution |
|---------|---------|----------|
| Wrong precedence | incorrect results | implement precedence table |
| Environment leak | variables persist incorrectly | create new env per scope |
| Poor errors | hard to debug | include line numbers |

7.2 Debugging Strategies

  • Print AST for complex programs.
  • Add trace mode to show evaluation steps.

7.3 Performance Traps

  • Re-parsing in REPL without caching.

8. Extensions & Challenges

8.1 Beginner Extensions

  • Add arrays or lists.
  • Add for loops.

8.2 Intermediate Extensions

  • Add classes and objects.
  • Add modules/imports.

8.3 Advanced Extensions

  • Compile to bytecode.
  • Add garbage collection.

9. Real-World Connections

9.1 Industry Applications

  • DSLs in build systems and configuration.
  • Scripting engines inside applications.

9.2 Example Languages

  • Lua
  • Wren
  • Monkey language

9.3 Interview Relevance

  • Demonstrates parsing and runtime fundamentals.

10. Resources

10.1 Essential Reading

  • Crafting Interpreters (Ch. 1-9)
  • Writing Interpreters by Hand

10.2 Video Resources

  • Language design lectures

10.3 Tools & Documentation

  • re2c or flex for lexer inspiration

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain the lexer-parser-AST pipeline.
  • I can explain closures and lexical scoping.
  • I can explain error recovery.

11.2 Implementation

  • All sample programs run correctly.
  • Errors include line numbers.
  • REPL works reliably.

11.3 Growth

  • I can explain interpreter trade-offs in an interview.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Lexer, parser, and evaluator for arithmetic.
  • Variables and basic control flow.

Full Completion:

  • Functions and closures implemented.
  • Error handling and REPL included.

Excellence (Going Above & Beyond):

  • Bytecode compiler and VM.
  • Garbage collection.