Project 3: JSON Parser Library

Build a JSON parser with a clear, memory-safe API and explicit ownership rules.

Quick Reference

Attribute       Value
Difficulty      Intermediate
Time Estimate   2 weeks
Language        C
Prerequisites   Strings, recursion, data structures
Key Topics      Parsing, AST, ownership, error reporting

1. Learning Objectives

By completing this project, you will:

  1. Parse JSON into a structured tree.
  2. Design APIs that clarify ownership of nodes and strings.
  3. Implement error reporting with line/column.
  4. Provide traversal and query helpers.

2. Theoretical Foundation

2.1 Core Concepts

  • Recursive descent parsing: JSON is naturally recursive (objects, arrays).
  • Value types: null, bool, number, string, array, object.
  • Ownership: Parser allocates nodes; caller frees via a single destroy call.

2.2 Why This Matters

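A parser sits on the boundary between untrusted input and the rest of the program, and unclear errors at that boundary make failures hard to diagnose. JSON is a small, standardized format, which makes it a good vehicle for practicing robust, user-friendly error handling.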

2.3 Historical Context / Background

JSON became a ubiquitous data interchange format because of its simplicity. Many C libraries exist, but they differ in ownership clarity and API stability.

2.4 Common Misconceptions

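  • “Strings can be referenced directly”: A node that points into the input buffer dangles once that buffer is freed, and escape sequences such as \n or \u0041 have to be decoded anyway. Unless you keep the input buffer alive, copy each string into memory the document owns.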
  • “Errors can be generic”: Parser errors need precise location.
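
The first misconception can be made concrete with a small sketch. The helper name json_copy_slice is hypothetical (it is not taken from any existing library); it assumes the lexer has already found the start and length of the string contents inside the input buffer, and it does not decode escape sequences.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy `len` bytes starting at `start` into a new
 * NUL-terminated heap string. Returns NULL if the allocation fails.
 * The result no longer depends on the lifetime of the input buffer. */
char *json_copy_slice(const char *start, size_t len)
{
    char *copy = malloc(len + 1);
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, start, len);
    copy[len] = '\0';
    return copy;
}
```

Whoever stores the returned pointer (in this project, the document) becomes responsible for freeing it.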

3. Project Specification

3.1 What You Will Build

A jsonlite library that:

  • Parses JSON strings/files
  • Produces an AST of nodes
  • Exposes getters for types and values
  • Provides json_free to release all memory

3.2 Functional Requirements

  1. Parse objects, arrays, strings, numbers, booleans, null.
  2. Return clear errors with line/column.
  3. Provide typed accessors.
  4. Support pretty-print or serialization.
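
One possible shape for requirement 2 is sketched below. The JsonError struct, its field names, and json_error_format are assumptions made for illustration; the "file:line:col: error: message" layout simply mirrors the convention used by common C compilers.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical error record filled in by the parser on failure. */
typedef struct {
    int line;            /* 1-based line of the offending token */
    int col;             /* 1-based column of the offending token */
    char message[96];    /* short human-readable description */
} JsonError;

/* Format an error as "file:line:col: error: message".
 * Returns what snprintf returns: the length the full message needs. */
int json_error_format(const JsonError *err, const char *filename,
                      char *buf, size_t size)
{
    return snprintf(buf, size, "%s:%d:%d: error: %s",
                    filename, err->line, err->col, err->message);
}
```

Keeping the location as separate integer fields, rather than only as formatted text, lets callers and tests compare positions directly.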

3.3 Non-Functional Requirements

  • Safety: No memory leaks or use-after-free.
  • Usability: Errors identify exact location.
  • Maintainability: Cleanly separated lexer and parser.

3.4 Example Usage / Output

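JsonDoc *doc = json_parse_file("config.json");    /* doc owns every node and string */
JsonValue *root = json_root(doc);
const char *name = json_get_string(root, "name"); /* borrowed from doc; do not free */
json_free(doc);                                   /* name is invalid after this call */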

3.5 Real World Outcome

You can load a JSON config file, extract values, and free everything with one call. The API makes ownership unambiguous for users.


4. Solution Architecture

4.1 High-Level Design

lexer -> tokens -> parser -> AST -> query API

4.2 Key Components

Component   Responsibility   Key Decisions
Lexer       Tokenize input   Track line/col
Parser      Build AST        Recursive descent
AST         Node tree        Tagged union
API         Query helpers    Typed accessors

4.3 Data Structures

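#include <stdbool.h>

typedef enum { JSON_NULL, JSON_BOOL, JSON_NUM, JSON_STR, JSON_OBJ, JSON_ARR } JsonType;

typedef struct JsonValue {
    JsonType type;               /* tag: selects the active union member */
    union {
        bool boolean;            /* JSON_BOOL */
        double num;              /* JSON_NUM */
        char *str;               /* JSON_STR: heap copy owned by the document */
        struct JsonObject *obj;  /* JSON_OBJ */
        struct JsonArray *arr;   /* JSON_ARR */
    } as;                        /* JSON_NULL carries no payload */
} JsonValue;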
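
The struct above leaves JsonObject and JsonArray undefined. The sketch below fills them in under stated assumptions: an object is a singly linked list of members and an array is a plain vector of pointers. These layouts, and the names JsonMember and json_value_free, are illustrative choices rather than a prescribed design. The block repeats the type definitions so that it compiles on its own.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef enum { JSON_NULL, JSON_BOOL, JSON_NUM, JSON_STR, JSON_OBJ, JSON_ARR } JsonType;
typedef struct JsonValue JsonValue;

/* Assumed layout: one list node per "key": value pair. */
typedef struct JsonMember {
    char *key;                 /* heap copy, owned by the member */
    JsonValue *value;          /* owned by the member */
    struct JsonMember *next;
} JsonMember;

typedef struct JsonObject { JsonMember *head; } JsonObject;

/* Assumed layout: a vector of owned element pointers. */
typedef struct JsonArray {
    JsonValue **items;
    size_t count;
} JsonArray;

struct JsonValue {
    JsonType type;
    union {
        bool boolean;
        double num;
        char *str;
        JsonObject *obj;
        JsonArray *arr;
    } as;
};

/* Free a value and everything it owns, recursing into containers. */
void json_value_free(JsonValue *v)
{
    if (v == NULL) return;
    switch (v->type) {
    case JSON_STR:
        free(v->as.str);
        break;
    case JSON_ARR:
        if (v->as.arr != NULL) {
            for (size_t i = 0; i < v->as.arr->count; i++)
                json_value_free(v->as.arr->items[i]);
            free(v->as.arr->items);
            free(v->as.arr);
        }
        break;
    case JSON_OBJ:
        if (v->as.obj != NULL) {
            JsonMember *m = v->as.obj->head;
            while (m != NULL) {
                JsonMember *next = m->next;   /* save before freeing m */
                free(m->key);
                json_value_free(m->value);
                free(m);
                m = next;
            }
            free(v->as.obj);
        }
        break;
    default:                                  /* null, bool, number: no payload to free */
        break;
    }
    free(v);
}
```

With this layout, json_free(doc) could call json_value_free on the root and then release the document itself. The recursion depth matches the nesting depth of the input.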

4.4 Algorithm Overview

Key Algorithm: Parse value

  1. Inspect the current token.
  2. Dispatch on its type: object, array, string, number, or a literal (true, false, null).
  3. Build the node and return it; report an error for any other token.

Complexity Analysis:

  • Time: O(n) in the length of the input
  • Space: O(n) for the nodes, plus call-stack depth proportional to the maximum nesting level
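
The dispatch step can be sketched as a recognizer: it walks the text with one function per grammar rule and reports only whether the input is valid, leaving out node construction (step 3) to stay short. It also reads characters directly instead of lexer tokens, does not validate the contents of escape sequences, and has no nesting limit. The names Cursor and json_is_valid are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical cursor over the raw text; a real parser would consume tokens. */
typedef struct { const char *p; } Cursor;

static bool parse_value(Cursor *c);   /* forward declaration: the grammar is recursive */

static void skip_ws(Cursor *c)
{
    while (*c->p == ' ' || *c->p == '\t' || *c->p == '\n' || *c->p == '\r') c->p++;
}

static bool match_word(Cursor *c, const char *word)
{
    size_t n = strlen(word);
    if (strncmp(c->p, word, n) != 0) return false;
    c->p += n;
    return true;
}

static bool parse_string(Cursor *c)
{
    if (*c->p != '"') return false;
    c->p++;
    while (*c->p != '\0' && *c->p != '"') {
        if (*c->p == '\\' && c->p[1] != '\0') c->p++;   /* skip the escaped byte */
        c->p++;
    }
    if (*c->p != '"') return false;                     /* unterminated string */
    c->p++;
    return true;
}

static bool skip_digits(const char **s)
{
    if (!isdigit((unsigned char)**s)) return false;
    while (isdigit((unsigned char)**s)) (*s)++;
    return true;
}

static bool parse_number(Cursor *c)
{
    const char *s = c->p;
    if (*s == '-') s++;
    if (*s == '0') s++;                                 /* a leading zero stands alone */
    else if (!skip_digits(&s)) return false;
    if (*s == '.') { s++; if (!skip_digits(&s)) return false; }
    if (*s == 'e' || *s == 'E') {
        s++;
        if (*s == '+' || *s == '-') s++;
        if (!skip_digits(&s)) return false;
    }
    c->p = s;
    return true;
}

static bool parse_array(Cursor *c)
{
    c->p++;                                             /* consume '[' */
    skip_ws(c);
    if (*c->p == ']') { c->p++; return true; }
    for (;;) {
        if (!parse_value(c)) return false;              /* recursion: element */
        skip_ws(c);
        if (*c->p == ',') { c->p++; continue; }
        if (*c->p == ']') { c->p++; return true; }
        return false;
    }
}

static bool parse_object(Cursor *c)
{
    c->p++;                                             /* consume '{' */
    skip_ws(c);
    if (*c->p == '}') { c->p++; return true; }
    for (;;) {
        skip_ws(c);
        if (!parse_string(c)) return false;             /* member key */
        skip_ws(c);
        if (*c->p != ':') return false;
        c->p++;
        if (!parse_value(c)) return false;              /* recursion: member value */
        skip_ws(c);
        if (*c->p == ',') { c->p++; continue; }
        if (*c->p == '}') { c->p++; return true; }
        return false;
    }
}

static bool parse_value(Cursor *c)
{
    skip_ws(c);
    switch (*c->p) {                                    /* dispatch on the first byte */
    case '{': return parse_object(c);
    case '[': return parse_array(c);
    case '"': return parse_string(c);
    case 't': return match_word(c, "true");
    case 'f': return match_word(c, "false");
    case 'n': return match_word(c, "null");
    default:  return parse_number(c);
    }
}

/* True if `text` is exactly one JSON value surrounded by optional whitespace. */
bool json_is_valid(const char *text)
{
    Cursor c = { text };
    if (!parse_value(&c)) return false;
    skip_ws(&c);
    return *c.p == '\0';
}
```

Turning the recognizer into a parser means having each parse_* function return a node (or NULL plus an error) instead of a bool.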

5. Implementation Guide

5.1 Development Environment Setup

cc -Wall -Wextra -O2 -g -Isrc -o test_json tests/test_json.c src/json.c

5.2 Project Structure

jsonlite/
├── src/
│   ├── json.c
│   └── json.h
├── tests/
│   └── test_json.c
└── README.md

5.3 The Core Question You’re Answering

“How do I parse a recursive format and expose a safe, clear API?”

5.4 Concepts You Must Understand First

Stop and research these before coding:

  1. Recursive parsing
    • How do arrays and objects nest?
  2. Ownership rules
    • Who frees nodes and strings?
  3. Error reporting
    • How to report line/column?
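
For the third question, one common approach is to have the lexer update a line and column counter every time it consumes a byte. A minimal sketch, assuming a hypothetical Lexer struct with 1-based positions and columns counted in bytes (not in Unicode code points):

```c
#include <stddef.h>

/* Hypothetical lexer state; the field names are illustrative. */
typedef struct {
    const char *src;   /* NUL-terminated input */
    size_t pos;        /* byte offset of the next unread byte */
    int line;          /* 1-based line of the next unread byte */
    int col;           /* 1-based column of the next unread byte */
} Lexer;

void lexer_init(Lexer *lx, const char *src)
{
    lx->src = src;
    lx->pos = 0;
    lx->line = 1;
    lx->col = 1;
}

/* Consume one byte and keep line/col in sync.
 * Returns the consumed byte, or '\0' at end of input (nothing is consumed). */
char lexer_advance(Lexer *lx)
{
    char ch = lx->src[lx->pos];
    if (ch == '\0') {
        return '\0';
    }
    lx->pos++;
    if (ch == '\n') {
        lx->line++;
        lx->col = 1;
    } else {
        lx->col++;
    }
    return ch;
}
```

If every token records line and col at the moment its first byte is read, the parser can report the location of an unexpected token without rescanning the input.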

5.5 Questions to Guide Your Design

Before implementing, think through these:

  1. Will users free nodes individually or via a document object?
  2. Will you copy input strings or reference input buffer?
  3. How will you represent object members (hash table vs list)?

5.6 Thinking Exercise

Error Location

If parsing fails at the fragment below, what line/column should you report?

"name": [1, 2,

5.7 The Interview Questions They’ll Ask

Prepare to answer these:

  1. “How do you parse nested structures?”
  2. “How do you design memory ownership for ASTs?”
  3. “How do you report parsing errors precisely?”

5.8 Hints in Layers

Hint 1: Start with tokens. Ensure the lexer is correct before parsing.

Hint 2: Parse primitives first. Add objects and arrays after primitives.

Hint 3: Add a document wrapper. A single json_free(doc) simplifies ownership.
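
One way to realize Hint 3 is to let the document track every allocation made during parsing, so json_free never has to understand the tree shape. The sketch below is an assumption about how JsonDoc might look internally; json_doc_new and json_doc_alloc are hypothetical names, and max_align_t requires C11.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Header placed in front of every allocation. The max_align_t member
 * forces the payload that follows the header to be suitably aligned. */
typedef union DocBlock {
    union DocBlock *next;
    max_align_t align;
} DocBlock;

typedef struct JsonDoc {
    DocBlock *blocks;      /* every allocation made for this document */
    size_t block_count;
} JsonDoc;

JsonDoc *json_doc_new(void)
{
    return calloc(1, sizeof(JsonDoc));
}

/* Allocate `size` bytes owned by `doc`. Returns NULL on failure. */
void *json_doc_alloc(JsonDoc *doc, size_t size)
{
    if (size > SIZE_MAX - sizeof(DocBlock)) return NULL;   /* avoid overflow */
    DocBlock *b = malloc(sizeof(DocBlock) + size);
    if (b == NULL) return NULL;
    b->next = doc->blocks;
    doc->blocks = b;
    doc->block_count++;
    return b + 1;                                          /* payload follows the header */
}

/* Release every block, then the document itself. */
void json_free(JsonDoc *doc)
{
    if (doc == NULL) return;
    DocBlock *b = doc->blocks;
    while (b != NULL) {
        DocBlock *next = b->next;
        free(b);
        b = next;
    }
    free(doc);
}
```

The trade-off is that individual nodes can never be freed early, which matches the "document owns all" rule: a failed parse is cleaned up with the same single call as a successful one.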

5.9 Books That Will Help

Topic              Book                      Chapter
Parsing            “Crafting Interpreters”   Ch. 5-8
Memory ownership   “Effective C”             Ch. 6

5.10 Implementation Phases

Phase 1: Foundation (4-6 days)

Goals:

  • Lexer and primitive values

Tasks:

  1. Tokenize strings, numbers, punctuation.
  2. Parse null/bool/number/string.

Checkpoint: Simple values parse correctly.

Phase 2: Core Functionality (5-7 days)

Goals:

  • Arrays and objects

Tasks:

  1. Parse arrays recursively.
  2. Parse objects into key/value pairs.

Checkpoint: Nested JSON parses.

Phase 3: Polish & API (3-5 days)

Goals:

  • Query helpers and errors

Tasks:

  1. Add typed getters.
  2. Add error messages with line/col.

Checkpoint: Errors are precise.

5.11 Key Implementation Decisions

Decision    Options                   Recommendation   Rationale
Storage     Linked list vs hash map   Linked list      Simpler for parsing
Ownership   Document owns all         Yes              Clear cleanup
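
To make the storage recommendation concrete, here is a sketch of key lookup over a linked list of members. The JsonMember layout and json_member_find are hypothetical, and the int payload is a placeholder for what would be a JsonValue pointer in the real library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical member node: one per "key": value pair, in input order. */
typedef struct JsonMember {
    const char *key;
    int value;                    /* placeholder payload for this sketch */
    struct JsonMember *next;
} JsonMember;

/* Linear search; returns the first member whose key matches, or NULL. */
const JsonMember *json_member_find(const JsonMember *head, const char *key)
{
    for (const JsonMember *m = head; m != NULL; m = m->next) {
        if (strcmp(m->key, key) == 0) {
            return m;
        }
    }
    return NULL;
}
```

Lookup costs time proportional to the number of members, which is usually acceptable for small config objects. The list also preserves insertion order and needs no resizing while parsing; a hash map pays off only for objects with many keys.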

6. Testing Strategy

6.1 Test Categories

Category       Purpose             Examples
Unit Tests     Lexer correctness   Tokens
Parser Tests   Nested JSON         Arrays/objects
Error Tests    Bad JSON            Missing commas

6.2 Critical Test Cases

  1. Nested objects: { "a": {"b": 1} }.
  2. Array parsing: [1, 2, 3].
  3. Invalid JSON: Missing closing brace.

6.3 Test Data

{"name": "Ada", "skills": ["c", "os"]}

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall                   Symptom           Solution
Not copying strings       Use-after-free    Allocate and copy
Bad recursion base case   Stack overflow    Validate tokens
Memory leaks              Valgrind errors   Free recursively

7.2 Debugging Strategies

  • Print AST with indentation.
  • Add token dumps for debugging.

7.3 Performance Traps

Repeated string scans can be slow; tokenize once and reuse tokens.


8. Extensions & Challenges

8.1 Beginner Extensions

  • Add pretty-printing.
  • Add json_get_path with dotted keys.
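
For the dotted-key extension, the first building block is splitting a path such as "server.tls.port" into segments. The helper below is a hypothetical sketch of that step only; it does not walk the tree, and it offers no way to escape a key that itself contains a dot.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the next dot-separated segment of *path into
 * buf and advance *path past it (and past the following dot, if any).
 * Returns false when no segments remain or the segment does not fit. */
bool path_next_segment(const char **path, char *buf, size_t size)
{
    const char *s = *path;
    if (s == NULL || *s == '\0') {
        return false;
    }
    size_t len = strcspn(s, ".");      /* length up to the next '.' or the end */
    if (len + 1 > size) {
        return false;                  /* segment plus NUL would overflow buf */
    }
    memcpy(buf, s, len);
    buf[len] = '\0';
    s += len;
    if (*s == '.') {
        s++;
    }
    *path = s;
    return true;
}
```

A json_get_path implementation could call this in a loop, looking up each segment in the current object and descending into the value it finds.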

8.2 Intermediate Extensions

  • Add serialization back to JSON.
  • Add number validation (range).
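
Range validation can lean on the standard strtod, which sets errno to ERANGE and returns plus or minus HUGE_VAL when the value overflows a double. The sketch below assumes the lexer has already confirmed that the text follows JSON number syntax, because strtod on its own also accepts forms JSON forbids (for example hexadecimal or "inf") and depends on the current locale. The function name is hypothetical.

```c
#include <errno.h>
#include <math.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: convert an already-lexed JSON number.
 * Returns false if the text is not fully consumed or the value
 * overflows the range of double. Underflow toward zero is accepted. */
bool json_number_in_range(const char *text, double *out)
{
    char *end;
    errno = 0;
    double v = strtod(text, &end);
    if (end == text || *end != '\0') {
        return false;                                  /* empty or trailing garbage */
    }
    if (errno == ERANGE && (v == HUGE_VAL || v == -HUGE_VAL)) {
        return false;                                  /* magnitude too large */
    }
    *out = v;
    return true;
}
```

Whether an out-of-range number should be a parse error or be clamped is a policy decision; this sketch treats it as an error so the caller can report the location.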

8.3 Advanced Extensions

  • Add streaming parser for large files.
  • Add JSON schema validation.

9. Real-World Connections

9.1 Industry Applications

  • Config parsing: Application settings.
  • Network protocols: JSON APIs.

9.2 Related Open Source Projects

  • cJSON: Popular C JSON library.

9.3 Interview Relevance

Parsing and ownership design are high-value systems skills.


10. Resources

10.1 Essential Reading

  • “Crafting Interpreters” - Ch. 5-8
  • “Effective C” - Ch. 6

10.2 Video Resources

  • Parsing and recursive descent lectures

10.3 Tools & Documentation

  • JSON spec (RFC 8259)
  • KV Client: API ownership patterns.
  • Logging Library: Provides structured logs.

11. Self-Assessment Checklist

11.1 Understanding

  • I can parse recursive data.
  • I can design a clear ownership API.
  • I can report parsing errors precisely.

11.2 Implementation

  • JSON parsing works for nested input.
  • Errors are informative.
  • Memory is managed correctly.

11.3 Growth

  • I can add streaming parsing.
  • I can explain this project in an interview.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Parse JSON objects and arrays.

Full Completion:

  • Typed getters and error reporting.

Excellence (Going Above & Beyond):

  • Streaming parser and schema validation.

This guide was generated from SPRINT_4_BOUNDARIES_INTERFACES_PROJECTS.md. For the complete learning path, see the parent directory.