Project 1: Build a JSON Query Tool (like jq)

Project Overview

| Attribute | Value |
|---|---|
| Difficulty | Intermediate |
| Time Estimate | 1-2 weeks |
| Main Language | Rust |
| Alternatives | Haskell, OCaml, TypeScript |
| Prerequisites | Basic parsing concepts, comfortable with one language |
| Key FP Concepts | Function composition, pipelines, immutability, algebraic data types |

Learning Objectives

By completing this project, you will:

  1. Understand function composition - Build complex behavior by combining simple, pure functions
  2. Master pipeline architecture - Design systems where data flows through transformation chains
  3. Internalize immutability - Experience why never mutating data simplifies debugging and testing
  4. Apply algebraic data types - Use Option/Maybe and Result/Either for principled error handling
  5. Implement higher-order functions - Build your own map, filter, and reduce operations
  6. Parse structured languages - Convert query strings into abstract syntax trees

The Core Question

"Why do functional programmers say 'composition is everything'? What does it mean to build complex behavior by combining simple functions?"

In imperative programming, you build complexity by accumulating state through loops and mutations. In functional programming, you build complexity by composing functions: feeding the output of one into the input of another.

When you write .users | filter(.age > 21) | map(.name), you're not writing a loop that mutates an array. You're composing three pure functions into a pipeline:

users → filter(age > 21) → map(name) → result

Each function is simple. The power comes from combining them.
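As a concrete taste, the same pipeline can be sketched with plain Rust iterator adapters. The `User` struct here is illustrative, not part of the project spec:

```rust
// A minimal sketch of the pipeline above in plain Rust.
struct User {
    name: String,
    age: u32,
}

fn main() {
    let users = vec![
        User { name: "Alice".to_string(), age: 30 },
        User { name: "Bob".to_string(), age: 19 },
    ];

    // .users | filter(.age > 21) | map(.name)
    let names: Vec<String> = users
        .into_iter()
        .filter(|u| u.age > 21)
        .map(|u| u.name)
        .collect();

    println!("{:?}", names); // ["Alice"]
}
```

Note that no intermediate variable is ever mutated: each adapter produces a new stage of the pipeline.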


Deep Theoretical Foundation

1. Pure Functions: The Foundation of FP

A pure function has two properties:

  1. Deterministic: Given the same inputs, it always returns the same output
  2. No side effects: It doesn't modify any external state, perform I/O, or interact with the outside world
// PURE: Only depends on input, only produces output
fn add(a: i32, b: i32) -> i32 {
    a + b
}

// IMPURE: Modifies external state
static mut COUNTER: i32 = 0;
fn add_and_count(a: i32, b: i32) -> i32 {
    unsafe { COUNTER += 1; }  // Side effect!
    a + b
}

// IMPURE: Output depends on external state
fn add_with_time(a: i32, b: i32) -> i64 {
    // Reading the clock makes the result non-deterministic!
    let secs = std::time::SystemTime::now()
        .duration_since(std::time::UNIX_EPOCH)
        .unwrap()
        .as_secs();
    a as i64 + b as i64 + secs as i64
}

Why purity matters for JSON queries:

  • filter(users, age > 21) always returns the same filtered array for the same input
  • You can test it with just assert_eq!(filter(input, pred), expected_output)
  • You can cache results: if input hasn't changed, output hasn't changed
  • You can parallelize: no shared state means no race conditions

2. Immutability: Data That Never Changes

Immutability means once data is created, it cannot be modified. To "change" data, you create a new version.

// MUTABLE approach (imperative)
let mut users = vec!["Alice", "Bob"];
users.push("Charlie");  // Modifies users in place

// IMMUTABLE approach (functional)
let users = vec!["Alice", "Bob"];
let new_users = [&users[..], &["Charlie"]].concat();  // Creates new vector
// `users` is unchanged

The mental model shift:

  • Imperative: "I have a box. I put things in it and take things out."
  • Functional: "I have data. I create new data derived from it."

Why immutability matters:

  1. No "spooky action at a distance": If you pass data to a function, you know it won't be modified
  2. Thread safety: No locks needed when data canโ€™t change
  3. Time-travel debugging: Keep all previous states, step backward
  4. Easier reasoning: At any point, data is what it was when created

3. Higher-Order Functions: Functions as Values

A higher-order function is a function that either:

  1. Takes one or more functions as arguments, or
  2. Returns a function as its result
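Both cases can be sketched in a few lines of Rust; `adder` and `apply_twice` below are illustrative names, not part of the project spec:

```rust
// Returns a function: adder(n) builds a new closure that adds n.
fn adder(n: i32) -> impl Fn(i32) -> i32 {
    move |x| x + n
}

// Takes a function as an argument and applies it twice.
fn apply_twice(f: impl Fn(i32) -> i32, x: i32) -> i32 {
    f(f(x))
}

fn main() {
    let add_five = adder(5);
    println!("{}", add_five(10));             // 15
    println!("{}", apply_twice(add_five, 0)); // 10
}
```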

The three fundamental HOFs:

map: Transform every element

// map applies a function to each element, returning a new collection
fn map<T, U, F>(items: Vec<T>, f: F) -> Vec<U>
where
    F: Fn(T) -> U
{
    items.into_iter().map(f).collect()
}

// Usage: transform each user to just their name
let names = map(users, |user| user.name);

What map does conceptually:

[a, b, c] → map(f) → [f(a), f(b), f(c)]

filter: Keep elements that pass a test

// filter keeps elements where the predicate returns true
fn filter<T, F>(items: Vec<T>, predicate: F) -> Vec<T>
where
    F: Fn(&T) -> bool
{
    items.into_iter().filter(predicate).collect()
}

// Usage: keep only adults
let adults = filter(users, |user| user.age >= 21);

What filter does conceptually:

[a, b, c, d] → filter(p) → [elements where p(element) is true]

reduce (fold): Accumulate into a single value

// reduce combines all elements into a single value
fn reduce<T, U, F>(items: Vec<T>, initial: U, f: F) -> U
where
    F: Fn(U, T) -> U
{
    items.into_iter().fold(initial, f)
}

// Usage: sum all ages
let total_age = reduce(users, 0, |acc, user| acc + user.age);

What reduce does conceptually:

[a, b, c] → reduce(f, init) → f(f(f(init, a), b), c)

4. Function Composition: Building Pipelines

Composition is combining two functions to create a new function:

compose(f, g)(x) = f(g(x))

In a pipeline, data flows left-to-right:

x |> g |> f  ≡  f(g(x))

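A generic `compose` can be written directly in Rust; a minimal sketch:

```rust
// compose(f, g) is the function |x| f(g(x)).
fn compose<A, B, C>(f: impl Fn(B) -> C, g: impl Fn(A) -> B) -> impl Fn(A) -> C {
    move |x| f(g(x))
}

fn main() {
    let double = |x: i32| x * 2;
    let increment = |x: i32| x + 1;

    // compose(double, increment)(5) = double(increment(5)) = 12
    let double_after_increment = compose(double, increment);
    println!("{}", double_after_increment(5)); // 12
}
```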
Visual representation:

Input: {"users": [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 19}]}

Pipeline: .users | filter(.age >= 21) | map(.name)

Step 1: .users
        ↓
        [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 19}]

Step 2: filter(.age >= 21)
        ↓
        [{"name": "Alice", "age": 30}]

Step 3: map(.name)
        ↓
        ["Alice"]

Pipeline Data Transformation

5. Algebraic Data Types: Making Illegal States Unrepresentable

Sum types (Either/Result) represent โ€œone of these thingsโ€:

enum Result<T, E> {
    Ok(T),    // Success with value
    Err(E),   // Failure with error
}

enum Option<T> {
    Some(T),  // Value present
    None,     // Value absent
}

Why this matters for JSON queries:

  • .nonexistent_field should return None, not crash
  • filter(.age > "text") should return Err(TypeError), not panic
  • Errors compose through the pipeline
// Without Result: exceptions, nulls, magic values
fn get_field(json: Json, field: &str) -> Json {
    // What if field doesn't exist? Return null? Throw? Crash?
}

// With Result: errors are explicit in the type
fn get_field(json: Json, field: &str) -> Result<Json, Error> {
    match json {
        Json::Object(map) => map.get(field)
            .cloned()
            .ok_or(Error::FieldNotFound(field.to_string())),
        _ => Err(Error::NotAnObject),
    }
}
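Because every step returns a `Result`, steps compose with `and_then`: the first error short-circuits the rest of the pipeline. A minimal self-contained sketch, with the `Json` type trimmed to two variants:

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
enum Json {
    String(String),
    Object(HashMap<String, Json>),
}

#[derive(Debug, PartialEq)]
enum Error {
    FieldNotFound(String),
    NotAnObject,
}

fn get_field(json: &Json, field: &str) -> Result<Json, Error> {
    match json {
        Json::Object(map) => map
            .get(field)
            .cloned()
            .ok_or(Error::FieldNotFound(field.to_string())),
        _ => Err(Error::NotAnObject),
    }
}

fn main() {
    let mut inner = HashMap::new();
    inner.insert("name".to_string(), Json::String("Alice".to_string()));
    let mut outer = HashMap::new();
    outer.insert("user".to_string(), Json::Object(inner));
    let json = Json::Object(outer);

    // .user | .name: each step's error short-circuits the chain
    let ok = get_field(&json, "user").and_then(|j| get_field(&j, "name"));
    println!("{:?}", ok); // Ok(String("Alice"))

    // .user | .missing: the error from the second step flows out
    let err = get_field(&json, "user").and_then(|j| get_field(&j, "missing"));
    println!("{:?}", err); // Err(FieldNotFound("missing"))
}
```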

Project Specification

What You're Building

A command-line tool that queries and transforms JSON using a pipeline syntax:

# Basic field access
$ echo '{"name":"Alice","age":30}' | jql '.name'
"Alice"

# Array operations
$ echo '[1,2,3,4,5]' | jql '. | map(. * 2)'
[2,4,6,8,10]

# Complex pipelines
$ cat users.json | jql '.users | filter(.age >= 21) | map(.name) | sort'
["Alice", "Charlie"]

Query Language Grammar

query       = expression ("|" expression)*
expression  = field_access | operation | literal
field_access = "." identifier ("." identifier)*
operation   = "map" "(" query ")"
            | "filter" "(" predicate ")"
            | "sort"
            | "reverse"
            | "length"
            | "keys"
            | "values"
predicate   = query comparator query
comparator  = "==" | "!=" | ">" | "<" | ">=" | "<="
literal     = number | string | "true" | "false" | "null"
identifier  = [a-zA-Z_][a-zA-Z0-9_]*

Core Operations to Implement

| Operation | Input Type | Output Type | Description |
|---|---|---|---|
| .field | Object | Any | Extract field from object |
| map(expr) | Array | Array | Apply expression to each element |
| filter(pred) | Array | Array | Keep elements where predicate is true |
| sort | Array | Array | Sort elements (numbers/strings) |
| reverse | Array | Array | Reverse element order |
| length | Array/String/Object | Number | Count elements |
| keys | Object | Array | Extract object keys |
| values | Object | Array | Extract object values |

Solution Architecture

High-Level Design

┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   Input     │    │   Parser    │    │  Evaluator  │    │   Output    │
│   (JSON +   │───▶│  (Query to  │───▶│  (AST +     │───▶│   (JSON)    │
│   Query)    │    │   AST)      │    │   JSON)     │    │             │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘

JSON Query Tool Pipeline

Data Types

// JSON representation
enum Json {
    Null,
    Bool(bool),
    Number(f64),
    String(String),
    Array(Vec<Json>),
    Object(HashMap<String, Json>),
}

// Query AST
enum Query {
    Identity,                        // .
    Field(String),                   // .name
    Literal(Json),                   // 21, "Alice", true
    Pipe(Box<Query>, Box<Query>),    // query | query
    Map(Box<Query>),                 // map(query)
    Filter(Box<Predicate>),          // filter(predicate)
    Sort,
    Reverse,
    Length,
    Keys,
    Values,
}

enum Predicate {
    Comparison(Box<Query>, Comparator, Box<Query>),
    And(Box<Predicate>, Box<Predicate>),
    Or(Box<Predicate>, Box<Predicate>),
    Not(Box<Predicate>),
}

enum Comparator {
    Eq, Ne, Lt, Le, Gt, Ge,
}

// Errors
enum Error {
    ParseError { position: usize, message: String },
    TypeError { expected: String, got: String, path: String },
    FieldNotFound { field: String, path: String },
    IndexOutOfBounds { index: usize, length: usize },
}

Module Structure

src/
├── main.rs           # CLI entry point
├── lib.rs            # Public API
├── json.rs           # JSON type and operations
├── ast.rs            # Query AST types
├── parser.rs         # Query string → AST
├── eval.rs           # AST + JSON → Result<Json>
└── error.rs          # Error types and formatting

Key Function Signatures

// Parser: Query string → AST
fn parse(query: &str) -> Result<Query, ParseError>;

// Evaluator: AST + JSON → Result
fn evaluate(query: &Query, json: &Json) -> Result<Json, EvalError>;

// Pipeline execution (the core insight: it's just a fold!)
fn execute_pipeline(ops: Vec<Query>, input: Json) -> Result<Json, Error> {
    ops.into_iter().try_fold(input, |acc, op| evaluate(&op, &acc))
}

// Individual operations (all pure functions!)
fn get_field(json: &Json, field: &str) -> Result<Json, Error>;
fn map_array(json: &Json, query: &Query) -> Result<Json, Error>;
fn filter_array(json: &Json, pred: &Predicate) -> Result<Json, Error>;

Implementation Guide

Phase 1: JSON Representation (Day 1)

Goal: Define your JSON type and basic operations.

  1. Create the Json enum with all JSON types
  2. Implement Display for pretty printing
  3. Implement FromStr or use serde_json for parsing
  4. Write tests for JSON equality and display

Self-check questions:

  • Can you represent {"nested": {"deeply": [1, 2, 3]}}?
  • Does your equality check handle floating-point edge cases?
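The first self-check can be answered by hand-building the value with the `Json` enum from the spec (trimmed here to what the example needs); note how the derived equality inherits IEEE 754's NaN behavior:

```rust
use std::collections::HashMap;

// A minimal sketch of the Json type, enough to represent
// {"nested": {"deeply": [1, 2, 3]}}.
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Number(f64),
    Array(Vec<Json>),
    Object(HashMap<String, Json>),
}

fn main() {
    let deeply = Json::Array(vec![
        Json::Number(1.0),
        Json::Number(2.0),
        Json::Number(3.0),
    ]);
    let mut inner = HashMap::new();
    inner.insert("deeply".to_string(), deeply);
    let mut outer = HashMap::new();
    outer.insert("nested".to_string(), Json::Object(inner));
    let json = Json::Object(outer);

    // Derived PartialEq compares structurally...
    assert_eq!(json.clone(), json);
    // ...but inherits NaN != NaN from f64: one of the floating-point
    // edge cases the self-check asks about.
    assert_ne!(Json::Number(f64::NAN), Json::Number(f64::NAN));
    println!("ok");
}
```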

Phase 2: Query AST (Day 2)

Goal: Define the abstract syntax tree for queries.

  1. Create the Query enum with basic operations
  2. Create the Predicate enum for filter conditions
  3. Create the Comparator enum for comparisons
  4. Implement Display for debugging ASTs

Self-check questions:

  • Can you represent .users | filter(.age > 21) | map(.name) as an AST?
  • Is your AST recursive? (It needs to be for nested queries)
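A sketch of the first self-check, with the spec's enums trimmed to the variants this one query needs (plus a `Literal` variant for `21`):

```rust
// AST for `.users | filter(.age > 21) | map(.name)`.
#[derive(Debug)]
enum Query {
    Field(String),
    Literal(f64),
    Pipe(Box<Query>, Box<Query>),
    Map(Box<Query>),
    Filter(Box<Predicate>),
}

#[derive(Debug)]
enum Predicate {
    Comparison(Box<Query>, Comparator, Box<Query>),
}

#[derive(Debug)]
enum Comparator {
    Gt,
}

fn main() {
    let ast = Query::Pipe(
        Box::new(Query::Pipe(
            Box::new(Query::Field("users".to_string())),
            Box::new(Query::Filter(Box::new(Predicate::Comparison(
                Box::new(Query::Field("age".to_string())),
                Comparator::Gt,
                Box::new(Query::Literal(21.0)),
            )))),
        )),
        Box::new(Query::Map(Box::new(Query::Field("name".to_string())))),
    );
    // The AST is recursive: Pipe, Map, and Filter all contain further
    // Queries, which is what lets pipelines nest arbitrarily deep.
    println!("{:?}", ast);
}
```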

Phase 3: Basic Parser (Days 3-4)

Goal: Parse query strings into ASTs.

Start simpleโ€”just handle:

  1. Identity (.)
  2. Field access (.field)
  3. Pipes (|)
fn parse_query(input: &str) -> Result<Query, ParseError> {
    let segments: Vec<&str> = input.split('|').collect();
    let queries: Result<Vec<Query>, _> = segments
        .iter()
        .map(|s| parse_expression(s.trim()))
        .collect();

    // Chain them with Pipe
    queries?.into_iter()
        .reduce(|a, b| Query::Pipe(Box::new(a), Box::new(b)))
        .ok_or(ParseError::EmptyQuery)
}

Self-check questions:

  • Does .users.name parse correctly as nested field access?
  • What happens with malformed input like .|.|.?
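One way to make `.users.name` parse as nested field access is to desugar the dotted chain into nested `Pipe`s. A sketch of `parse_expression` under that design (error type simplified to `String`):

```rust
#[derive(Debug, PartialEq)]
enum Query {
    Identity,
    Field(String),
    Pipe(Box<Query>, Box<Query>),
}

// Handles identity (`.`) and dotted field access (`.users.name`),
// folding the segments into nested Pipes.
fn parse_expression(input: &str) -> Result<Query, String> {
    let input = input.trim();
    if input == "." {
        return Ok(Query::Identity);
    }
    if !input.starts_with('.') {
        return Err(format!("expected '.', got {:?}", input));
    }
    input[1..]
        .split('.')
        .map(|name| {
            if name.is_empty() {
                Err("empty field name".to_string()) // rejects `.users.`
            } else {
                Ok(Query::Field(name.to_string()))
            }
        })
        .collect::<Result<Vec<_>, _>>()?
        .into_iter()
        .reduce(|a, b| Query::Pipe(Box::new(a), Box::new(b)))
        .ok_or_else(|| "empty query".to_string())
}

fn main() {
    println!("{:?}", parse_expression(".users.name"));
    println!("{:?}", parse_expression(".users."));
}
```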

Phase 4: Basic Evaluator (Days 5-6)

Goal: Execute ASTs against JSON.

  1. Implement evaluate for Identity and Field
  2. Implement evaluate for Pipe
  3. Add proper error handling with Result
fn evaluate(query: &Query, json: &Json) -> Result<Json, EvalError> {
    match query {
        Query::Identity => Ok(json.clone()),
        Query::Field(name) => get_field(json, name),
        Query::Pipe(left, right) => {
            let intermediate = evaluate(left, json)?;
            evaluate(right, &intermediate)
        }
        // ... more cases
    }
}

Self-check questions:

  • Does .users on {"users": []} return []?
  • Does .missing return an appropriate error?

Phase 5: Array Operations (Days 7-8)

Goal: Implement map and filter.

This is where FP shines. Each operation is a pure function:

fn map_array(json: &Json, inner: &Query) -> Result<Json, EvalError> {
    match json {
        Json::Array(arr) => {
            let results: Result<Vec<Json>, _> = arr
                .iter()
                .map(|item| evaluate(inner, item))
                .collect();
            Ok(Json::Array(results?))
        }
        _ => Err(EvalError::TypeError {
            expected: "array".to_string(),
            got: json.type_name(),
        }),
    }
}

Self-check questions:

  • Does map(.name) on [{"name": "a"}, {"name": "b"}] return ["a", "b"]?
  • Does filter(.age > 21) correctly evaluate the predicate for each element?

Phase 6: Predicate Parsing and Evaluation (Days 9-10)

Goal: Parse and evaluate filter conditions.

  1. Parse comparison expressions: .age > 21, .name == "Alice"
  2. Evaluate predicates against JSON values
  3. Handle type coercion or type errors

Self-check questions:

  • Can you compare numbers, strings, booleans?
  • What's the behavior of .age > "text"? Error or false?
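One defensible answer to the last question is to make mixed-type comparison an error rather than silently coercing. A sketch of the comparison step on already-evaluated values, with `Json` trimmed to two variants:

```rust
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Number(f64),
    String(String),
}

#[derive(Debug, PartialEq)]
enum EvalError {
    TypeError { expected: String, got: String },
}

// Greater-than on evaluated operands. Numbers compare with numbers,
// strings with strings; anything else is a TypeError.
fn compare_gt(left: &Json, right: &Json) -> Result<bool, EvalError> {
    match (left, right) {
        (Json::Number(a), Json::Number(b)) => Ok(a > b),
        (Json::String(a), Json::String(b)) => Ok(a > b),
        (l, r) => Err(EvalError::TypeError {
            expected: "two comparable values".to_string(),
            got: format!("{:?} vs {:?}", l, r),
        }),
    }
}

fn main() {
    println!("{:?}", compare_gt(&Json::Number(30.0), &Json::Number(21.0)));
    println!("{:?}", compare_gt(&Json::Number(30.0), &Json::String("text".into())));
}
```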

Phase 7: CLI Polish (Days 11-12)

Goal: Make it usable as a real tool.

  1. Read JSON from stdin
  2. Accept query as command-line argument
  3. Pretty-print output
  4. Provide helpful error messages with context
$ echo '{"users":[{"name":"Alice"}]}' | jql '.users | map(.name)'
["Alice"]
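The CLI wiring itself is small: query from argv, JSON from stdin, errors to stderr with a non-zero exit code. A sketch of `main.rs`, with `parse` and `evaluate` stubbed (they are the functions you build in earlier phases) so the skeleton compiles on its own:

```rust
use std::env;
use std::io::{self, Read};

fn parse(query: &str) -> Result<String, String> {
    Ok(query.to_string()) // stub for the real parser
}

fn evaluate(_query: &str, input: &str) -> Result<String, String> {
    Ok(input.trim().to_string()) // stub: identity over the input
}

fn main() {
    // Query is the first command-line argument
    let query = match env::args().nth(1) {
        Some(q) => q,
        None => {
            eprintln!("usage: jql <query>");
            std::process::exit(2);
        }
    };

    // JSON comes from stdin
    let mut input = String::new();
    io::stdin()
        .read_to_string(&mut input)
        .expect("failed to read stdin");

    // Errors go to stderr; output to stdout, so jql composes in pipes
    match parse(&query).and_then(|ast| evaluate(&ast, &input)) {
        Ok(output) => println!("{}", output),
        Err(e) => {
            eprintln!("error: {}", e);
            std::process::exit(1);
        }
    }
}
```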

Phase 8: Extensions (Days 13-14)

Add more operations as you see fit:

  • sort, reverse, unique
  • length, keys, values
  • first, last, nth(n)
  • Arithmetic: + - * /
  • String operations: contains, startswith
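As one worked extension, `sort` must return a new array (never mutate the input) and decide what to do with mixed element types. A sketch that sorts all-number or all-string arrays and errors otherwise:

```rust
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Number(f64),
    String(String),
    Array(Vec<Json>),
}

// Returns a NEW sorted array; the input is untouched.
fn sort_array(json: &Json) -> Result<Json, String> {
    match json {
        Json::Array(arr) => {
            let mut sorted = arr.clone();
            if sorted.iter().all(|j| matches!(j, Json::Number(_))) {
                sorted.sort_by(|a, b| match (a, b) {
                    (Json::Number(x), Json::Number(y)) => {
                        // NaN compares as equal here; pick your own policy
                        x.partial_cmp(y).unwrap_or(std::cmp::Ordering::Equal)
                    }
                    _ => unreachable!(),
                });
            } else if sorted.iter().all(|j| matches!(j, Json::String(_))) {
                sorted.sort_by(|a, b| match (a, b) {
                    (Json::String(x), Json::String(y)) => x.cmp(y),
                    _ => unreachable!(),
                });
            } else {
                return Err("sort: mixed or unsortable element types".to_string());
            }
            Ok(Json::Array(sorted))
        }
        _ => Err("sort: expected an array".to_string()),
    }
}

fn main() {
    let arr = Json::Array(vec![Json::Number(3.0), Json::Number(1.0), Json::Number(2.0)]);
    println!("{:?}", sort_array(&arr));
}
```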

Testing Strategy

Unit Tests

Test each component in isolation:

#[cfg(test)]
mod tests {
    use super::*;
    // Assumes a `json!` helper that builds your `Json` type (or that
    // `Json` is backed by serde_json's Value).
    #[test]
    fn test_field_access() {
        let json = json!({"name": "Alice"});
        let result = evaluate(&Query::Field("name".into()), &json);
        assert_eq!(result.unwrap(), json!("Alice"));
    }

    #[test]
    fn test_map() {
        let json = json!([1, 2, 3]);
        let query = Query::Map(Box::new(Query::Identity));
        let result = evaluate(&query, &json);
        assert_eq!(result.unwrap(), json!([1, 2, 3]));
    }

    #[test]
    fn test_filter() {
        let json = json!([1, 2, 3, 4, 5]);
        // filter(. > 3)
        let pred = Predicate::Comparison(
            Box::new(Query::Identity),
            Comparator::Gt,
            Box::new(Query::Literal(Json::Number(3.0))),
        );
        let query = Query::Filter(Box::new(pred));
        let result = evaluate(&query, &json);
        assert_eq!(result.unwrap(), json!([4, 5]));
    }
}

Integration Tests

Test full pipelines:

#[test]
fn test_full_pipeline() {
    let json = json!({
        "users": [
            {"name": "Alice", "age": 30},
            {"name": "Bob", "age": 19},
            {"name": "Charlie", "age": 25}
        ]
    });

    let query = ".users | filter(.age >= 21) | map(.name)";
    let result = execute(query, &json);
    assert_eq!(result.unwrap(), json!(["Alice", "Charlie"]));
}

Property-Based Tests

Use proptest or quickcheck:

proptest! {
    #[test]
    fn map_preserves_length(arr: Vec<i32>) {
        let json = Json::Array(arr.iter().map(|&n| Json::Number(n as f64)).collect());
        let result = evaluate(&Query::Map(Box::new(Query::Identity)), &json).unwrap();
        if let Json::Array(result_arr) = result {
            assert_eq!(result_arr.len(), arr.len());
        }
    }

    #[test]
    fn filter_never_increases_length(arr: Vec<i32>) {
        let json = Json::Array(arr.iter().map(|&n| Json::Number(n as f64)).collect());
        // filter(. > 0)
        let pred = /* ... */;
        let result = evaluate(&Query::Filter(Box::new(pred)), &json).unwrap();
        if let Json::Array(result_arr) = result {
            assert!(result_arr.len() <= arr.len());
        }
    }
}

Common Pitfalls and Debugging

Pitfall 1: Forgetting to Clone

Symptom: Borrow checker errors, or unexpected data reuse

Solution: JSON operations should return new data, not modify in place

// WRONG: Trying to reuse json
fn get_field(json: Json, field: &str) -> Result<Json, Error> {
    // json is moved, can't be used again!
}

// RIGHT: Take reference, return owned
fn get_field(json: &Json, field: &str) -> Result<Json, Error> {
    // Clone the result, original unchanged
}

Pitfall 2: Stack Overflow on Deep Recursion

Symptom: Stack overflow with deeply nested JSON

Solution: Use iteration or explicit stack for very deep structures

// Recursive (may overflow)
fn depth(json: &Json) -> usize {
    match json {
        Json::Array(arr) => 1 + arr.iter().map(depth).max().unwrap_or(0),
        Json::Object(obj) => 1 + obj.values().map(depth).max().unwrap_or(0),
        _ => 1,
    }
}

// Iterative (safe)
fn depth_safe(json: &Json) -> usize {
    let mut stack = vec![(json, 1)];
    let mut max_depth = 0;
    while let Some((node, d)) = stack.pop() {
        max_depth = max_depth.max(d);
        match node {
            Json::Array(arr) => stack.extend(arr.iter().map(|v| (v, d + 1))),
            Json::Object(obj) => stack.extend(obj.values().map(|v| (v, d + 1))),
            _ => {}
        }
    }
    max_depth
}

Pitfall 3: Poor Error Messages

Symptom: "Error: type mismatch" with no context

Solution: Include path information in errors

// BAD
Err(Error::TypeError)

// GOOD
Err(Error::TypeError {
    expected: "array",
    got: "object",
    path: ".users[0].addresses",
    context: "while evaluating filter(.city == \"NYC\")",
})

Pitfall 4: Incorrect Predicate Evaluation

Symptom: filter(.age > 21) returns empty or wrong results

Debug: Print the predicate result for each element

fn filter_array(json: &Json, pred: &Predicate) -> Result<Json, Error> {
    match json {
        Json::Array(arr) => {
            let results: Vec<Json> = arr
                .iter()
                .filter(|item| {
                    let result = evaluate_predicate(pred, item);
                    eprintln!("Predicate for {:?}: {:?}", item, result); // Debug!
                    result.unwrap_or(false)
                })
                .cloned()
                .collect();
            Ok(Json::Array(results))
        }
        _ => Err(/* ... */),
    }
}

Extensions and Challenges

Challenge 1: Lazy Evaluation

Make operations lazy so they can handle infinite streams or very large files:

// Instead of Vec<Json>, use an iterator
fn filter_lazy<'a>(
    iter: impl Iterator<Item = &'a Json> + 'a,
    pred: &'a Predicate,
) -> impl Iterator<Item = &'a Json> + 'a {
    iter.filter(move |item| evaluate_predicate(pred, item).unwrap_or(false))
}

Challenge 2: Parallel Execution

Since operations are pure, parallelize them with Rayon:

use rayon::prelude::*;

fn map_parallel(json: &Json, inner: &Query) -> Result<Json, Error> {
    match json {
        Json::Array(arr) => {
            let results: Result<Vec<Json>, _> = arr
                .par_iter()  // Parallel iterator!
                .map(|item| evaluate(inner, item))
                .collect();
            Ok(Json::Array(results?))
        }
        _ => Err(/* ... */),
    }
}

Challenge 3: User-Defined Functions

Allow users to define and reuse transformations:

$ jql 'def adult = .age >= 21; .users | filter(adult) | map(.name)'

Challenge 4: Streaming JSON Parser

Handle newline-delimited JSON (NDJSON) for log processing:

$ cat logs.ndjson | jql 'filter(.level == "error") | map(.message)'
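The key idea is to treat each line as an independent document, so memory use stays constant no matter how large the log is. A sketch of the streaming loop, with `run_query` standing in for your parse-plus-evaluate pipeline:

```rust
use std::io::{self, BufRead};

// Stand-in for parse + evaluate on one document; returns None for
// documents the query filters out.
fn run_query(line: &str) -> Option<String> {
    Some(line.to_string()) // stub: pass every line through
}

fn main() {
    let stdin = io::stdin();
    for line in stdin.lock().lines() {
        let line = line.expect("failed to read line");
        if line.trim().is_empty() {
            continue; // tolerate blank lines in the stream
        }
        // One document in, at most one result out: results stream as
        // soon as they are computed, nothing is buffered.
        if let Some(output) = run_query(&line) {
            println!("{}", output);
        }
    }
}
```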

Real-World Connections

jq - The Inspiration

Your tool is a simplified version of jq, one of the most beloved Unix tools. Study its manual to see how a mature implementation handles edge cases.

Unix Philosophy

This project embodies Unix philosophy:

  • Do one thing well: Query JSON, nothing else
  • Compose with others: Works with curl, cat, grep, etc.
  • Text streams as interface: stdin/stdout, JSON as universal format

Modern Dataflow Systems

The pipeline model scales to distributed systems:

  • Apache Spark: df.filter(col("age") > 21).select("name")
  • Pandas: df[df.age > 21]["name"]
  • SQL: SELECT name FROM users WHERE age > 21

All express the same functional idea: declarative transformations on data.


Resources

Essential Reading

| Topic | Resource |
|---|---|
| Pure functions, pipelines | Domain Modeling Made Functional, Ch. 4, 6 |
| Higher-order functions | Learn You a Haskell, Ch. 6 |
| Error handling with Result | Programming Rust, 2nd Ed., Ch. 7 |
| Immutability | Fluent Python, 2nd Ed., Ch. 6 |
| Parsing | Language Implementation Patterns, Ch. 2 |

Tools and Libraries

| Tool | Purpose |
|---|---|
| serde_json (Rust) | JSON parsing (don't reinvent this) |
| clap (Rust) | CLI argument parsing |
| nom (Rust) | Parser combinators (optional) |
| proptest (Rust) | Property-based testing |


Self-Assessment Checklist

Before considering this project complete, verify:

Core Functionality

  • Can parse .field expressions
  • Can parse .nested.field.access
  • Can parse query | query pipelines
  • Can parse map(query) and filter(predicate)
  • Can parse comparison predicates: >, <, ==, !=, >=, <=

Evaluation

  • Field access works on objects
  • Field access on non-objects returns appropriate error
  • map applies query to each array element
  • filter keeps elements where predicate is true
  • Pipes chain operations correctly

Error Handling

  • Parse errors include position information
  • Type errors include expected vs. got
  • Missing field errors include the path
  • All errors implement Display with helpful messages

Quality

  • All operations are pure functions
  • No mutation of input JSON
  • Tests cover happy paths and error cases
  • CLI reads from stdin and accepts query argument

Extensions (optional)

  • sort, reverse, unique
  • length, keys, values
  • Lazy evaluation for large files
  • Parallel execution with Rayon

Interview Questions

Prepare to discuss these after completing the project:

  1. "What's the difference between map and a for loop?"
  2. "Why are pure functions easier to test?"
  3. "How does immutability help with debugging?"
  4. "What is function composition? Give an example."
  5. "How would you make your JSON query tool lazy?"
  6. "What's the difference between Option and Result?"
  7. "Why is functional programming popular for data pipelines?"
  8. "How would you parallelize the map operation?"
See the main file for detailed expected answers.