

Understanding Compilers and Interpreters

To truly understand compilers and interpreters, you need to build systems that transform source code into something executable. This forces you to grapple with the fundamental problem: how do we bridge human-readable text and machine execution?

Core Concept Analysis

Compilers and interpreters break down into these fundamental building blocks:

| Stage | What It Does | Core Challenge |
| --- | --- | --- |
| Lexical Analysis | Convert character stream → tokens | Recognizing patterns, handling edge cases |
| Parsing | Convert tokens → Abstract Syntax Tree | Grammar design, precedence, error recovery |
| Semantic Analysis | Validate meaning (types, scopes) | Symbol tables, type systems |
| Intermediate Representation | Language-agnostic program form | Designing good IR, SSA form |
| Optimization | Transform IR for efficiency | Correctness-preserving transformations |
| Code Generation | IR → target (bytecode/assembly) | Register allocation, instruction selection |
| Runtime | Execute bytecode, manage memory | Dispatch loops, garbage collection |

Project 1: Expression Evaluator with Variables

  • File: COMPILERS_INTERPRETERS_LEARNING_PROJECTS.md
  • Programming Language: C
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 1: Beginner
  • Knowledge Area: Compilers / Interpreters
  • Software or Tool: Lexing/Parsing logic
  • Main Book: “Compilers: Principles and Practice” by Dave & Dave

What you’ll build: A REPL calculator that parses mathematical expressions, supports variables (x = 5 + 3), and evaluates them—showing the AST visually before computing results.

Why it teaches compilers/interpreters: This is the “hello world” of language implementation. You’ll implement the complete pipeline (lexer → parser → AST → evaluator) in miniature, without the complexity of control flow or functions. Every concept scales up.

Core challenges you’ll face:

  • Tokenizing expressions with operators, numbers, identifiers, and parentheses (maps to lexical analysis)
  • Handling operator precedence correctly (2 + 3 * 4 must equal 14, not 20) (maps to parsing)
  • Building and traversing an AST (maps to intermediate representation)
  • Managing a symbol table for variables (maps to semantic analysis)

Key Concepts:

  • Tokenization/Lexing: “Compilers: Principles and Practice” Ch. 3 - Dave & Dave
  • Recursive Descent Parsing: “Language Implementation Patterns” Ch. 2-3 - Terence Parr
  • Operator Precedence: “Writing a C Compiler” Ch. 1 - Nora Sandler
  • Abstract Syntax Trees: “Engineering a Compiler” Ch. 4 - Cooper & Torczon

Difficulty: Beginner. Time estimate: Weekend. Prerequisites: Basic programming, recursion.

Real world outcome: A working REPL where you type x = 10 * 2 then x + 5 and see 25. Bonus: print the AST structure before evaluation so you can see how parsing worked.
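
To make the precedence challenge concrete, here is a minimal sketch of a recursive-descent evaluator in C for numbers, the four arithmetic operators, and parentheses. The function names (parse_expr, parse_term, parse_primary) and the character-by-character "tokenizer" are illustrative assumptions, and error handling and variables are omitted:

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

static const char *src;                 /* cursor into the input string */

static double parse_expr(void);         /* forward declaration */

static void skip_ws(void) { while (isspace((unsigned char)*src)) src++; }

/* primary := NUMBER | '(' expr ')' */
static double parse_primary(void) {
    skip_ws();
    if (*src == '(') {
        src++;                          /* consume '(' */
        double v = parse_expr();
        skip_ws();
        if (*src == ')') src++;         /* consume ')' */
        return v;
    }
    return strtod(src, (char **)&src);  /* NUMBER */
}

/* term := primary (('*' | '/') primary)*   binds tighter than + and - */
static double parse_term(void) {
    double v = parse_primary();
    for (;;) {
        skip_ws();
        if (*src == '*')      { src++; v *= parse_primary(); }
        else if (*src == '/') { src++; v /= parse_primary(); }
        else return v;
    }
}

/* expr := term (('+' | '-') term)* */
static double parse_expr(void) {
    double v = parse_term();
    for (;;) {
        skip_ws();
        if (*src == '+')      { src++; v += parse_term(); }
        else if (*src == '-') { src++; v -= parse_term(); }
        else return v;
    }
}

int main(void) {
    src = "2 + 3 * 4";
    printf("%g\n", parse_expr());       /* prints 14, not 20 */
    return 0;
}

Precedence falls out of the call structure alone: parse_expr calls parse_term, which calls parse_primary, so * and / bind tighter than + and - without any explicit precedence table.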

Learning milestones:

  1. Tokenizer works - you understand how source becomes tokens
  2. Parser works - you understand grammars and precedence
  3. Evaluator works - you understand tree traversal and environments
  4. Error messages work - you understand the importance of source locations

Project 2: Interpreter for a Programming Language (Lox or Monkey)

  • File: COMPILERS_INTERPRETERS_LEARNING_PROJECTS.md
  • Programming Language: C
  • Coolness Level: Level 5: Pure Magic (Super Cool)
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Compilers / Interpreters
  • Software or Tool: Lox / Monkey
  • Main Book: “Crafting Interpreters” by Bob Nystrom

What you’ll build: A complete tree-walking interpreter for a dynamically-typed language with variables, functions, closures, control flow, and classes/objects.

Why it teaches compilers/interpreters: This project forces you through the entire interpretation pipeline. You can’t skip anything. By the end, you’ll have implemented closures (which requires understanding environments and scope chains), first-class functions, and dynamic dispatch.

Core challenges you’ll face:

  • Designing a scanner that handles strings, comments, and multi-character operators (maps to lexical analysis)
  • Implementing a parser that handles statements, expressions, and declarations (maps to parsing)
  • Creating an environment chain for lexical scoping and closures (maps to semantic analysis)
  • Implementing control flow by recursively evaluating AST nodes (maps to runtime execution)

Resources for key challenges:

  • “Crafting Interpreters” by Bob Nystrom (free online) - The definitive guide; walks through building Lox step-by-step
  • “Writing An Interpreter In Go” by Thorsten Ball - Practical alternative using the Monkey language

Key Concepts:

  • Lexical Scoping: “Language Implementation Patterns” Ch. 6 - Terence Parr
  • Closure Implementation: “Crafting Interpreters” Ch. 11 - Bob Nystrom
  • Tree-Walking Evaluation: “Engineering a Compiler” Ch. 5 - Cooper & Torczon
  • Symbol Tables: “Compilers: Principles and Practice” Ch. 5 - Dave & Dave

Difficulty: Intermediate. Time estimate: 2-3 weeks. Prerequisites: Comfortable with recursion, basic data structures.

Real world outcome: Run programs like:

fun makeCounter() {
  var i = 0;
  fun count() {
    i = i + 1;
    print i;
  }
  return count;
}

var counter = makeCounter();
counter(); // prints 1
counter(); // prints 2
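
The makeCounter example only works because count holds on to the environment where i was defined. Here is a minimal sketch in C of such an environment chain; the names (Env, env_define, env_lookup) are illustrative assumptions, the values are plain ints, and a real interpreter would also need dynamic values and garbage collection:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One scope: a small list of name/value bindings plus a pointer to the
   enclosing scope. Lookups walk outward until the name is found. */
typedef struct Binding { const char *name; int value; struct Binding *next; } Binding;
typedef struct Env { Binding *bindings; struct Env *parent; } Env;

static Env *env_new(Env *parent) {
    Env *e = calloc(1, sizeof(Env));
    e->parent = parent;
    return e;
}

static void env_define(Env *e, const char *name, int value) {
    Binding *b = malloc(sizeof(Binding));
    b->name = name; b->value = value; b->next = e->bindings;
    e->bindings = b;
}

/* Walk outward through enclosing scopes until the name is found. */
static int *env_lookup(Env *e, const char *name) {
    for (; e != NULL; e = e->parent)
        for (Binding *b = e->bindings; b != NULL; b = b->next)
            if (strcmp(b->name, name) == 0) return &b->value;
    return NULL;
}

int main(void) {
    Env *globals = env_new(NULL);
    Env *counter_env = env_new(globals);    /* created when makeCounter() runs */
    env_define(counter_env, "i", 0);

    /* Each call to count() evaluates "i = i + 1; print i;" against the
       environment it captured, not against globals. */
    for (int call = 0; call < 2; call++) {
        int *i = env_lookup(counter_env, "i");
        *i = *i + 1;
        printf("%d\n", *i);                 /* prints 1, then 2 */
    }
    return 0;
}

A closure is then just a function object paired with a pointer to the Env that was current at its definition; calling it creates a child Env whose parent is that captured environment.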

Learning milestones:

  1. Expressions and statements work - you understand the execution model
  2. Functions work - you understand call stacks and return values
  3. Closures work - you truly understand lexical scope and environments
  4. Classes work - you understand dynamic dispatch and inheritance

Project 3: Bytecode Virtual Machine

  • File: COMPILERS_INTERPRETERS_LEARNING_PROJECTS.md
  • Programming Language: C
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 5. The “Industry Disruptor”
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Virtual Machines
  • Software or Tool: Bytecode VM
  • Main Book: “Crafting Interpreters” by Bob Nystrom

What you’ll build: A stack-based VM that executes bytecode instructions, with a compiler that transforms AST → bytecode. Include a disassembler to visualize the bytecode.

Why it teaches compilers/interpreters: Tree-walking is slow. Production interpreters (CPython, Ruby, Lua, the JVM) compile to bytecode first. This project teaches you why that’s faster and how instruction dispatch works. You’ll understand what “the stack” actually means at runtime.

Core challenges you’ll face:

  • Designing an instruction set (what opcodes do you need?) (maps to IR design)
  • Implementing the bytecode compiler that linearizes your AST (maps to code generation)
  • Building a dispatch loop that executes instructions efficiently (maps to runtime systems)
  • Managing a value stack and call frames (maps to runtime memory model)

Key Concepts:

  • Stack Machine Architecture: “Crafting Interpreters” Part III - Bob Nystrom
  • Instruction Encoding: “Computer Systems: A Programmer’s Perspective” Ch. 3 - Bryant & O’Hallaron
  • Dispatch Loop Optimization: “Virtual Machines” by Iain Craig - Ch. 4
  • Call Frame Management: “Engineering a Compiler” Ch. 6 - Cooper & Torczon

Difficulty: Intermediate-Advanced. Time estimate: 2-4 weeks. Prerequisites: A completed tree-walking interpreter and a solid grasp of the stack data structure.

Real world outcome: Run a program and see output like:

== disassembly of <script> ==
0000    1 OP_CONSTANT    0 '10'
0002    | OP_CONSTANT    1 '20'
0004    | OP_ADD
0005    | OP_PRINT
0006    2 OP_RETURN

Then watch it execute step-by-step, showing the stack state at each instruction.
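
As a rough sketch of the machinery behind that listing, here is a tiny stack machine in C with a handful of opcodes and a switch-based dispatch loop. The opcode names echo the disassembly above, but the encoding and value representation are illustrative assumptions:

#include <stdio.h>
#include <stdint.h>

/* Illustrative opcode set; a real VM defines many more. */
enum { OP_CONSTANT, OP_ADD, OP_PRINT, OP_RETURN };

int main(void) {
    double constants[] = { 10, 20 };               /* constant pool          */
    uint8_t code[] = { OP_CONSTANT, 0,             /* push constants[0]      */
                       OP_CONSTANT, 1,             /* push constants[1]      */
                       OP_ADD,                     /* pop two, push the sum  */
                       OP_PRINT,                   /* pop and print          */
                       OP_RETURN };

    double stack[256];
    double *sp = stack;                            /* stack pointer          */
    uint8_t *ip = code;                            /* instruction pointer    */

    for (;;) {
        switch (*ip++) {                           /* fetch and dispatch     */
        case OP_CONSTANT: *sp++ = constants[*ip++]; break;
        case OP_ADD:      { double b = *--sp, a = *--sp; *sp++ = a + b; } break;
        case OP_PRINT:    printf("%g\n", *--sp); break;   /* prints 30 here  */
        case OP_RETURN:   return 0;
        }
    }
}

Production VMs replace the switch with computed gotos or threaded code precisely because this fetch/dispatch loop is the hottest code in the interpreter.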

Learning milestones:

  1. Simple expressions compile and run - you understand bytecode encoding
  2. Control flow works - you understand jump instructions and patching
  3. Functions work - you understand call frames and the call stack
  4. Performance improves 10-50x over tree-walking - you understand why bytecode is faster

Project 4: Compiler to x86-64 Assembly

  • File: COMPILERS_INTERPRETERS_LEARNING_PROJECTS.md
  • Programming Language: C
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 5. The “Industry Disruptor”
  • Difficulty: Level 4: Expert
  • Knowledge Area: Compilers / Architecture
  • Software or Tool: x86-64 Assembly
  • Main Book: “Writing a C Compiler” by Nora Sandler

What you’ll build: A compiler that takes a C subset (integers, arithmetic, functions, control flow) and produces actual x86-64 assembly that runs natively.

Why it teaches compilers/interpreters: This is where you confront the real machine. No virtual machine to hide behind. You must understand calling conventions, register allocation, stack frames, and how assembly actually executes. This demystifies everything.

Core challenges you’ll face:

  • Generating correct assembly for expressions using registers (maps to instruction selection)
  • Implementing the System V AMD64 calling convention (maps to ABI understanding)
  • Allocating registers efficiently without running out (maps to register allocation)
  • Generating correct code for control flow (if/else, loops) (maps to code generation)

Resources for key challenges:

  • “Writing a C Compiler” by Nora Sandler - Step-by-step guide to building exactly this

Key Concepts:

  • x86-64 Assembly Basics: “Computer Systems: A Programmer’s Perspective” Ch. 3 - Bryant & O’Hallaron
  • Calling Conventions: “Writing a C Compiler” Ch. 9 - Nora Sandler
  • Register Allocation: “Engineering a Compiler” Ch. 13 - Cooper & Torczon
  • Instruction Selection: “Engineering a Compiler” Ch. 11 - Cooper & Torczon

Difficulty: Advanced. Time estimate: 1-2 months. Prerequisites: Assembly basics and a completed interpreter.

Real world outcome: Write this:

int factorial(int n) {
    if (n <= 1) return 1;
    return n * factorial(n - 1);
}

Compile it with YOUR compiler, and run the resulting binary: ./a.out prints 120 for factorial(5).
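
For a taste of what code generation looks like, here is a hedged sketch of an emitter that prints AT&T-syntax x86-64 assembly for a constant-plus-constant expression. The Node struct and emit_expr are hypothetical names, and a real compiler for the C subset would also handle variables, calls, and control flow:

#include <stdio.h>

/* Hypothetical two-case AST for illustration: a constant or an addition. */
typedef struct Node { int is_add; long value; struct Node *lhs, *rhs; } Node;

/* Emit assembly that leaves the expression's value in %rax.
   Strategy: evaluate the left side, push it, evaluate the right side,
   then pop the left operand and add. Simple and correct, not optimal. */
static void emit_expr(const Node *n) {
    if (!n->is_add) {
        printf("    movq $%ld, %%rax\n", n->value);
        return;
    }
    emit_expr(n->lhs);
    printf("    pushq %%rax\n");
    emit_expr(n->rhs);
    printf("    popq %%rcx\n");
    printf("    addq %%rcx, %%rax\n");
}

int main(void) {
    /* AST for 10 + 20 */
    Node ten    = { 0, 10, NULL, NULL };
    Node twenty = { 0, 20, NULL, NULL };
    Node sum    = { 1, 0, &ten, &twenty };

    /* A minimal main() whose exit status is the expression's value. */
    printf("    .globl main\n");
    printf("main:\n");
    emit_expr(&sum);
    printf("    ret\n");
    return 0;
}

Saving the printed output as out.s and assembling it on Linux (for example with gcc out.s -o out) should give a binary whose exit status is 30, the value of the expression.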

Learning milestones:

  1. Arithmetic expressions compile - you understand instruction selection
  2. Functions work - you understand calling conventions and stack frames
  3. Control flow works - you understand conditional jumps and labels
  4. The binary runs - you’ve created a real executable from source code

Project 5: LLVM Frontend for a Custom Language

  • File: COMPILERS_INTERPRETERS_LEARNING_PROJECTS.md
  • Programming Language: C++
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 5. The “Industry Disruptor”
  • Difficulty: Level 4: Expert
  • Knowledge Area: Compilers
  • Software or Tool: LLVM
  • Main Book: “Engineering a Compiler” by Cooper & Torczon

What you’ll build: A compiler frontend that parses your language and emits LLVM IR, letting LLVM handle optimization and code generation for multiple architectures.

Why it teaches compilers/interpreters: This is how modern compilers (Rust, Swift, Clang) work. You focus on the frontend (parsing, type checking, IR generation) while LLVM provides production-grade optimization and codegen. You’ll understand why LLVM is revolutionary.

Core challenges you’ll face:

  • Learning LLVM IR and its type system (maps to IR design)
  • Using the LLVM C API or bindings to emit IR programmatically (maps to code generation)
  • Implementing SSA form (all values assigned exactly once) (maps to optimization)
  • Connecting to LLVM’s optimization and codegen pipeline (maps to backend integration)

Key Concepts:

  • LLVM IR Fundamentals: “LLVM Code Generation” - Quentin Colombet
  • SSA Form: “Engineering a Compiler” Ch. 9 - Cooper & Torczon
  • LLVM API Usage: LLVM Kaleidoscope Tutorial (official docs)
  • Frontend Design: “Language Implementation Patterns” Ch. 8-9 - Terence Parr

Difficulty: Advanced. Time estimate: 1-2 months. Prerequisites: A completed native compiler or interpreter and C/C++ basics.

Real world outcome: Compile your custom language to native binaries for x86, ARM, or WebAssembly—all from the same frontend. Run benchmarks showing LLVM’s optimizations make your code competitive with gcc/clang.
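
To show what "emitting IR programmatically" means, here is a small sketch using the LLVM C API (llvm-c/Core.h) to build an i32 add(i32, i32) function and dump its IR. The module name, function name, and the llvm-config-based compile command in the comment are assumptions; an actual frontend would also run verification and hand the module to the optimization and target layers:

#include <llvm-c/Core.h>

int main(void) {
    /* A module is LLVM's translation unit; everything hangs off it. */
    LLVMModuleRef module = LLVMModuleCreateWithName("demo");

    /* Declare: i32 add(i32 %a, i32 %b) */
    LLVMTypeRef i32 = LLVMInt32Type();
    LLVMTypeRef params[] = { i32, i32 };
    LLVMTypeRef fn_type = LLVMFunctionType(i32, params, 2, 0);
    LLVMValueRef fn = LLVMAddFunction(module, "add", fn_type);

    /* One basic block; the builder appends instructions to it in SSA form. */
    LLVMBasicBlockRef entry = LLVMAppendBasicBlock(fn, "entry");
    LLVMBuilderRef builder = LLVMCreateBuilder();
    LLVMPositionBuilderAtEnd(builder, entry);

    LLVMValueRef sum = LLVMBuildAdd(builder,
                                    LLVMGetParam(fn, 0),
                                    LLVMGetParam(fn, 1),
                                    "sum");
    LLVMBuildRet(builder, sum);

    /* Print the textual IR to stderr; build with something like
       `clang demo.c $(llvm-config --cflags --ldflags --libs core)` (assumed). */
    LLVMDumpModule(module);

    LLVMDisposeBuilder(builder);
    LLVMDisposeModule(module);
    return 0;
}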

Learning milestones:

  1. Hello World compiles via LLVM - you understand the LLVM pipeline
  2. Functions and control flow work - you understand LLVM IR structure
  3. Optimizations kick in - you see -O2 transform your code
  4. Cross-compile to different targets - you understand LLVM’s architecture

Project Comparison Table

| Project | Difficulty | Time | Depth of Understanding | Fun Factor |
| --- | --- | --- | --- | --- |
| Expression Evaluator | Beginner | Weekend | ⭐⭐ Entry-level | ⭐⭐⭐ Quick wins |
| Full Interpreter (Lox/Monkey) | Intermediate | 2-3 weeks | ⭐⭐⭐⭐ Core concepts | ⭐⭐⭐⭐⭐ Magical |
| Bytecode VM | Intermediate-Advanced | 2-4 weeks | ⭐⭐⭐⭐ Runtime systems | ⭐⭐⭐⭐ Satisfying |
| x86-64 Compiler | Advanced | 1-2 months | ⭐⭐⭐⭐⭐ Full depth | ⭐⭐⭐⭐⭐ Mind-blowing |
| LLVM Frontend | Advanced | 1-2 months | ⭐⭐⭐⭐ Production skills | ⭐⭐⭐⭐ Powerful |

Recommendation

Start with Project 2 (Full Interpreter) using Bob Nystrom’s “Crafting Interpreters” as your guide.

Here’s why:

  1. It’s the sweet spot - Complex enough to teach real concepts, achievable enough to finish
  2. Immediate feedback - You can run programs in YOUR language within days
  3. Foundation for everything - Every concept (lexing, parsing, scopes, closures) carries forward
  4. Free, excellent resource - Crafting Interpreters is one of the best programming books ever written

After completing it, you’ll have the foundation to tackle either:

  • Project 3 if you want to understand performance and bytecode
  • Project 4 if you want to understand the machine and native compilation

Final Capstone Project: Self-Hosting Compiler

What you’ll build: A compiler for your own language, written in that same language, so that the compiler can compile itself.

Why it teaches compilers/interpreters: Self-hosting is the ultimate test. Your compiler must be correct enough and complete enough to compile its own source code. This forces you to implement every feature you need, handle every edge case, and produce efficient enough code that compilation completes in reasonable time.

Core challenges you’ll face:

  • Designing a language expressive enough to write a compiler in (maps to language design)
  • Bootstrapping: first compile with another compiler, then with yourself (maps to build systems)
  • Debugging compiler bugs when the compiler itself might be buggy (maps to systematic debugging)
  • Optimizing compilation speed since you’ll compile your compiler thousands of times (maps to performance engineering)

Key Concepts:

  • Bootstrapping Process: “Engineering a Compiler” Ch. 1 - Cooper & Torczon
  • Compiler Testing: “Writing a C Compiler” Ch. 20 - Nora Sandler
  • Language Design Trade-offs: “Language Implementation Patterns” Ch. 1 - Terence Parr
  • Self-Hosting Strategy: “The Dragon Book” Ch. 1 - Aho, Lam, Sethi, Ullman

Difficulty: Expert. Time estimate: 3-6 months. Prerequisites: Completed Projects 2, 3, and 4.

Real world outcome: Run ./mycompiler mycompiler.src -o mycompiler_v2 and produce a working compiler binary. Then use that binary to compile itself again and get an identical result. You’ve achieved what GCC, Clang, Rust, and Go all do.

Learning milestones:

  1. Compiler compiles simple programs - foundation works
  2. Compiler compiles a subset of itself - bootstrapping begins
  3. Compiler fully compiles itself - self-hosting achieved
  4. Stage 2 and Stage 3 binaries are identical - the bootstrap has converged and your compiler reproduces itself

This journey will transform how you see every piece of software. Every programming language, every tool, every runtime—you’ll understand how they work because you’ve built them yourself.