Project 10: SQL Query Engine

A SQL query engine that parses SQL, builds query plans, optimizes them, and executes against in-memory tables—supporting SELECT, WHERE, JOIN, GROUP BY, and ORDER BY.

Quick Reference

Attribute Value
Primary Language Go
Alternative Languages Rust, C++, Java
Difficulty Level 5: Master
Time Estimate 2-3 months
Knowledge Area Databases, Parsing, Query Optimization
Tooling None (from scratch)
Prerequisites All previous projects. Strong algorithms background. Understanding of relational algebra helpful.

What You Will Build

A SQL query engine that parses SQL, builds query plans, optimizes them, and executes against in-memory tables—supporting SELECT, WHERE, JOIN, GROUP BY, and ORDER BY.

Why It Matters

This project builds core skills that appear repeatedly in real-world systems and tooling.

Core Challenges

  • SQL parsing → maps to recursive descent parsing, AST building
  • Query planning → maps to tree transformations, relational algebra
  • Join algorithms → maps to hash join, nested loop join
  • Optimization → maps to cost estimation, plan selection

Key Concepts

  • SQL parsing: “Crafting Interpreters” by Bob Nystrom (parsing techniques)
  • Query execution: “Database Internals” Ch. 7-9 - Alex Petrov
  • Join algorithms: “Designing Data-Intensive Applications” Ch. 3 - Kleppmann
  • B-trees: “Database Internals” Ch. 2-3 - Alex Petrov

Real-World Outcome

$ ./minisql
MiniSQL v1.0
Type "help" for commands, "exit" to quit.

> CREATE TABLE users (id INT, name TEXT, age INT);
Table 'users' created.

> INSERT INTO users VALUES (1, 'Alice', 30);
> INSERT INTO users VALUES (2, 'Bob', 25);
> INSERT INTO users VALUES (3, 'Charlie', 35);
Inserted 3 rows.

> SELECT name, age FROM users WHERE age > 27;
┌─────────┬─────┐
│ name    │ age │
├─────────┼─────┤
│ Alice   │ 30  │
│ Charlie │ 35  │
└─────────┴─────┘
2 rows returned

> CREATE TABLE orders (id INT, user_id INT, amount FLOAT);
> INSERT INTO orders VALUES (1, 1, 99.99);
> INSERT INTO orders VALUES (2, 1, 49.99);
> INSERT INTO orders VALUES (3, 2, 199.99);

> SELECT u.name, SUM(o.amount) as total
  FROM users u
  JOIN orders o ON u.id = o.user_id
  GROUP BY u.name
  ORDER BY total DESC;
┌───────┬────────┐
│ name  │ total  │
├───────┼────────┤
│ Bob   │ 199.99 │
│ Alice │ 149.98 │
└───────┴────────┘

> EXPLAIN SELECT * FROM users WHERE age > 25;
Query Plan:
└─ Filter: age > 25
   └─ TableScan: users
   Estimated cost: 3 rows scanned

Implementation Guide

  1. Reproduce the simplest happy-path scenario.
  2. Build the smallest working version of the core feature.
  3. Add input validation and error handling.
  4. Add instrumentation/logging to confirm behavior.
  5. Refactor into clean modules with tests.

Milestones

  • Milestone 1: Minimal working program that runs end-to-end.
  • Milestone 2: Correct outputs for typical inputs.
  • Milestone 3: Robust handling of edge cases.
  • Milestone 4: Clean structure and documented usage.

Validation Checklist

  • Output matches the real-world outcome example
  • Handles invalid inputs safely
  • Provides clear errors and exit codes
  • Repeatable results across runs

References

  • Main guide: LEARN_GO_DEEP_DIVE.md
  • “Database Internals” by Alex Petrov