Project 3: JSON Parser Library
Build a JSON parser with a clear, memory-safe API and explicit ownership rules.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Intermediate |
| Time Estimate | 2 weeks |
| Language | C |
| Prerequisites | Strings, recursion, data structures |
| Key Topics | Parsing, AST, ownership, error reporting |
1. Learning Objectives
By completing this project, you will:
- Parse JSON into a structured tree.
- Design APIs that clarify ownership of nodes and strings.
- Implement error reporting with line/column.
- Provide traversal and query helpers.
2. Theoretical Foundation
2.1 Core Concepts
- Recursive descent parsing: JSON is naturally recursive (objects, arrays).
- Value types: null, bool, number, string, array, object.
- Ownership: Parser allocates nodes; caller frees via a single destroy call.
2.2 Why This Matters
A parser sits on the boundary between untrusted input and your program, and that boundary fails when errors are unclear. JSON is a standard format with strict rules, which forces you to design robust, user-friendly error handling.
2.3 Historical Context / Background
JSON became a ubiquitous data interchange format because of its simplicity. Many C libraries exist, but they differ in ownership clarity and API stability.
2.4 Common Misconceptions
- “Strings can be referenced directly”: Only if the input buffer outlives the tree and the string contains no escape sequences. Otherwise you must copy (and decode) it.
- “Errors can be generic”: Parser errors need precise location.
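To make the first point concrete, here is a minimal sketch of copying a string slice out of the input buffer so the node no longer depends on that buffer. The helper name `json_copy_slice` is hypothetical, and escape decoding is deliberately left out.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of any existing API): copy `len` bytes
 * starting at `start` into a fresh, NUL-terminated heap string.
 * Returns NULL if allocation fails. Whoever receives the result owns it
 * and must eventually free() it. Escape sequences are NOT decoded here. */
static char *json_copy_slice(const char *start, size_t len)
{
    char *copy = malloc(len + 1);
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, start, len);
    copy[len] = '\0';
    return copy;
}
```

After the copy, the input buffer can be freed or reused without invalidating the string stored in the tree.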
3. Project Specification
3.1 What You Will Build
A jsonlite library that:
- Parses JSON strings/files
- Produces an AST of nodes
- Exposes getters for types and values
- Provides json_free to release all memory
3.2 Functional Requirements
- Parse objects, arrays, strings, numbers, booleans, null.
- Return clear errors with line/column.
- Provide typed accessors.
- Support pretty-print or serialization.
3.3 Non-Functional Requirements
- Safety: No memory leaks or use-after-free.
- Usability: Errors identify exact location.
- Maintainability: Cleanly separated lexer and parser.
3.4 Example Usage / Output
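JsonDoc *doc = json_parse_file("config.json");    /* doc owns every node and string */
JsonValue *root = json_root(doc);
const char *name = json_get_string(root, "name"); /* borrowed from doc; do not free */
json_free(doc);                                   /* name is invalid after this call */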
3.5 Real World Outcome
You can load a JSON config file, extract values, and free everything with one call. The API makes ownership unambiguous for users.
4. Solution Architecture
4.1 High-Level Design
lexer -> tokens -> parser -> AST -> query API
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Lexer | Tokenize input | Track line/col |
| Parser | Build AST | Recursive descent |
| AST | Node tree | Tagged union |
| API | Query helpers | Typed accessors |
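The "Track line/col" decision for the lexer can be sketched as a small cursor that updates its position every time a character is consumed. The `Cursor` type and its functions are hypothetical names chosen for illustration. Columns here count bytes, so tabs and multi-byte UTF-8 characters are not given special treatment.

```c
#include <stddef.h>

/* Hypothetical cursor type; the names are illustrative only. */
typedef struct {
    const char *src;  /* input text, NUL-terminated */
    size_t pos;       /* byte offset of the next unread character */
    int line;         /* 1-based line of the next unread character */
    int col;          /* 1-based column (in bytes) of the next unread character */
} Cursor;

static void cursor_init(Cursor *c, const char *src)
{
    c->src = src;
    c->pos = 0;
    c->line = 1;
    c->col = 1;
}

/* Consume one character and update line/col.
 * Returns the consumed character, or '\0' at end of input. */
static char cursor_advance(Cursor *c)
{
    char ch = c->src[c->pos];
    if (ch == '\0') {
        return '\0';
    }
    c->pos++;
    if (ch == '\n') {
        c->line++;
        c->col = 1;
    } else {
        c->col++;
    }
    return ch;
}
```

If every token records the cursor's line and column at its first character, the parser can report the position of the offending token without rescanning the input.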
4.3 Data Structures
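#include <stdbool.h>

typedef enum { JSON_NULL, JSON_BOOL, JSON_NUM, JSON_STR, JSON_OBJ, JSON_ARR } JsonType;

typedef struct JsonValue {
    JsonType type;                /* tag: selects the active union member */
    union {
        bool boolean;             /* JSON_BOOL */
        double num;               /* JSON_NUM */
        char *str;                /* JSON_STR: owned, NUL-terminated copy */
        struct JsonObject *obj;   /* JSON_OBJ */
        struct JsonArray *arr;    /* JSON_ARR */
    } as;                         /* JSON_NULL uses no member */
} JsonValue;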
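A tagged union is only safe if every read checks the tag first. The sketch below shows what typed accessors over this struct might look like. The accessor names `json_as_number` and `json_as_string` are hypothetical, and the `boolean` member is an assumption added so that JSON_BOOL has somewhere to store its value.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { JSON_NULL, JSON_BOOL, JSON_NUM, JSON_STR, JSON_OBJ, JSON_ARR } JsonType;

typedef struct JsonValue {
    JsonType type;
    union {
        bool boolean;            /* assumption: storage for JSON_BOOL */
        double num;
        char *str;
        struct JsonObject *obj;  /* left opaque in this sketch */
        struct JsonArray *arr;   /* left opaque in this sketch */
    } as;
} JsonValue;

/* Hypothetical accessor: stores the number in *out and returns 1 only
 * when v is a JSON_NUM. On a type mismatch, *out is left untouched. */
static int json_as_number(const JsonValue *v, double *out)
{
    if (v == NULL || v->type != JSON_NUM) {
        return 0;
    }
    *out = v->as.num;
    return 1;
}

/* Hypothetical accessor: returns the string, or NULL on a type mismatch.
 * The returned pointer is borrowed; the caller must not free it. */
static const char *json_as_string(const JsonValue *v)
{
    if (v == NULL || v->type != JSON_STR) {
        return NULL;
    }
    return v->as.str;
}
```

Returning a status instead of a bare value lets callers tell "not a number" apart from a number that happens to be 0.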
4.4 Algorithm Overview
Key Algorithm: Parse value
- Inspect current token.
- Dispatch to parse object/array/string/number.
- Build node and return.
Complexity Analysis:
- Time: O(n) in the input size
- Space: O(n) for the nodes, plus O(d) stack for nesting depth d
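The inspect-and-dispatch steps above can be sketched as a small recursive-descent recognizer. This is a simplified illustration, not a full parser: it reads characters directly instead of tokens, handles only null, true, false, numbers, and arrays, and returns 1 or 0 instead of building nodes. All function names are hypothetical.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

static int parse_value(const char **p);   /* forward declaration: arrays recurse */

static void skip_ws(const char **p)
{
    while (**p == ' ' || **p == '\t' || **p == '\n' || **p == '\r') {
        (*p)++;
    }
}

/* Match a fixed keyword such as "null" at the cursor. */
static int parse_literal(const char **p, const char *word)
{
    size_t n = strlen(word);
    if (strncmp(*p, word, n) != 0) {
        return 0;
    }
    *p += n;
    return 1;
}

/* Simplification: strtod accepts some forms JSON forbids (hex, leading zeros). */
static int parse_number(const char **p)
{
    char *end;
    (void)strtod(*p, &end);
    if (end == *p) {
        return 0;
    }
    *p = end;
    return 1;
}

/* array := '[' ( value ( ',' value )* )? ']' with optional whitespace */
static int parse_array(const char **p)
{
    (*p)++;                                   /* consume '[' */
    skip_ws(p);
    if (**p == ']') { (*p)++; return 1; }     /* empty array */
    for (;;) {
        if (!parse_value(p)) return 0;        /* recursion happens here */
        skip_ws(p);
        if (**p == ',') { (*p)++; continue; }
        if (**p == ']') { (*p)++; return 1; }
        return 0;                             /* missing ',' or ']' */
    }
}

/* Inspect the current character and dispatch to the matching rule. */
static int parse_value(const char **p)
{
    skip_ws(p);
    switch (**p) {
    case 'n': return parse_literal(p, "null");
    case 't': return parse_literal(p, "true");
    case 'f': return parse_literal(p, "false");
    case '[': return parse_array(p);
    default:
        if (**p == '-' || isdigit((unsigned char)**p)) {
            return parse_number(p);
        }
        return 0;                             /* strings/objects omitted here */
    }
}

/* Returns 1 if text is exactly one value from the supported subset. */
static int json_check(const char *text)
{
    const char *p = text;
    if (!parse_value(&p)) return 0;
    skip_ws(&p);
    return *p == '\0';
}
```

Each rule consumes input only when it matches, which is what keeps the recursion from looping on bad input. A real parser would return a node (or an error with line/column) from each rule instead of an int.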
5. Implementation Guide
5.1 Development Environment Setup
cc -Wall -Wextra -O2 -g -Isrc -o test_json tests/test_json.c src/json.c
5.2 Project Structure
jsonlite/
├── src/
│ ├── json.c
│ └── json.h
├── tests/
│ └── test_json.c
└── README.md
5.3 The Core Question You’re Answering
“How do I parse a recursive format and expose a safe, clear API?”
5.4 Concepts You Must Understand First
Stop and research these before coding:
- Recursive parsing
- How do arrays and objects nest?
- Ownership rules
- Who frees nodes and strings?
- Error reporting
- How to report line/column?
5.5 Questions to Guide Your Design
Before implementing, think through these:
- Will users free nodes individually or via a document object?
- Will you copy input strings or reference input buffer?
- How will you represent object members (hash table vs list)?
5.6 Thinking Exercise
Error Location
If parsing fails at "name": [1, 2,, what line/column should you report?
5.7 The Interview Questions They’ll Ask
Prepare to answer these:
- “How do you parse nested structures?”
- “How do you design memory ownership for ASTs?”
- “How do you report parsing errors precisely?”
5.8 Hints in Layers
Hint 1: Start with tokens
Ensure the lexer is correct before parsing.
Hint 2: Parse values only
Add objects and arrays after primitives.
Hint 3: Add a document wrapper
A single json_free(doc) simplifies ownership.
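One way a document wrapper can make a single free call sufficient is to route every allocation through the document and remember it. The sketch below assumes that approach. The names `Doc`, `doc_alloc`, and `doc_destroy` are hypothetical, and `max_align_t` requires C11.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Header placed in front of every allocation. The max_align_t member keeps
 * the payload that follows suitably aligned for any object type (C11). */
typedef union Block {
    union Block *next;
    max_align_t align;
} Block;

/* Hypothetical document type: owns every block allocated through it. */
typedef struct {
    Block *blocks;   /* singly linked list of all live allocations */
    size_t count;    /* number of blocks in the list */
} Doc;

/* Allocate `size` bytes owned by `doc`. Returns NULL on failure. */
static void *doc_alloc(Doc *doc, size_t size)
{
    if (size > SIZE_MAX - sizeof(Block)) {
        return NULL;                      /* size + header would overflow */
    }
    Block *b = malloc(sizeof *b + size);
    if (b == NULL) {
        return NULL;
    }
    b->next = doc->blocks;                /* push onto the document's list */
    doc->blocks = b;
    doc->count++;
    return b + 1;                         /* payload starts after the header */
}

/* Release everything the document owns. Returns the number of blocks freed. */
static size_t doc_destroy(Doc *doc)
{
    size_t freed = 0;
    Block *b = doc->blocks;
    while (b != NULL) {
        Block *next = b->next;
        free(b);
        b = next;
        freed++;
    }
    doc->blocks = NULL;
    doc->count = 0;
    return freed;
}
```

With this design, nodes and strings never need individual free calls, and a failed parse can be cleaned up the same way as a successful one.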
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Parsing | “Crafting Interpreters” | Ch. 5-8 |
| Memory ownership | “Effective C” | Ch. 6 |
5.10 Implementation Phases
Phase 1: Foundation (4-6 days)
Goals:
- Lexer and primitive values
Tasks:
- Tokenize strings, numbers, punctuation.
- Parse null/bool/number/string.
Checkpoint: Simple values parse correctly.
Phase 2: Core Functionality (5-7 days)
Goals:
- Arrays and objects
Tasks:
- Parse arrays recursively.
- Parse objects into key/value pairs.
Checkpoint: Nested JSON parses.
Phase 3: Polish & API (3-5 days)
Goals:
- Query helpers and errors
Tasks:
- Add typed getters.
- Add error messages with line/col.
Checkpoint: Errors are precise.
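A minimal sketch of what a line/column error message might look like, assuming a compiler-style `file:line:col: message` layout. The `JsonError` struct and `json_error_format` function are hypothetical names for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical error record filled in by the parser on failure. */
typedef struct {
    int line;             /* 1-based line of the offending token */
    int col;              /* 1-based column of the offending token */
    const char *message;  /* static description, e.g. "expected value" */
} JsonError;

/* Format the error as "file:line:col: message" into buf.
 * Returns the snprintf result: the length the full message needs,
 * which is >= size if the output was truncated. */
static int json_error_format(const JsonError *err, const char *file,
                             char *buf, size_t size)
{
    return snprintf(buf, size, "%s:%d:%d: %s",
                    file, err->line, err->col, err->message);
}
```

Many editors and build tools recognize this layout and can jump straight to the reported position.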
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Storage | Linked list vs hash map | Linked list | Simpler for parsing |
| Ownership | Document owns all | Yes | Clear cleanup |
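The sketch below shows one possible shape for linked-list storage combined with recursive cleanup. As a simplification, it stores object members and array elements in the same `JsonMember` list rather than in separate object and array structs. All names other than the `JsonType` constants are assumptions.

```c
#include <stdlib.h>

typedef enum { JSON_NULL, JSON_BOOL, JSON_NUM, JSON_STR, JSON_OBJ, JSON_ARR } JsonType;

typedef struct JsonValue JsonValue;

/* Hypothetical list node: one per object member or array element. */
typedef struct JsonMember {
    char *key;                  /* owned copy; NULL for array elements */
    JsonValue *value;           /* owned */
    struct JsonMember *next;
} JsonMember;

struct JsonValue {
    JsonType type;
    union {
        int boolean;
        double num;
        char *str;              /* owned copy */
        JsonMember *members;    /* list head for JSON_OBJ and JSON_ARR */
    } as;
};

static JsonValue *json_new(JsonType type)
{
    JsonValue *v = malloc(sizeof *v);
    if (v == NULL) {
        return NULL;
    }
    v->type = type;
    v->as.members = NULL;       /* containers start empty */
    return v;
}

/* Append child to a container, preserving insertion order. Takes ownership
 * of key (may be NULL) and child on success. Returns 0 on success, or -1 on
 * allocation failure (ownership stays with the caller). */
static int json_append(JsonValue *parent, char *key, JsonValue *child)
{
    JsonMember *m = malloc(sizeof *m);
    if (m == NULL) {
        return -1;
    }
    m->key = key;
    m->value = child;
    m->next = NULL;
    JsonMember **link = &parent->as.members;
    while (*link != NULL) {     /* walk to the tail: O(n) per append */
        link = &(*link)->next;
    }
    *link = m;
    return 0;
}

/* Free a value and everything it owns, recursing into containers. */
static void json_value_free(JsonValue *v)
{
    if (v == NULL) {
        return;
    }
    if (v->type == JSON_STR) {
        free(v->as.str);
    } else if (v->type == JSON_OBJ || v->type == JSON_ARR) {
        JsonMember *m = v->as.members;
        while (m != NULL) {
            JsonMember *next = m->next;
            free(m->key);       /* free(NULL) is a no-op for array elements */
            json_value_free(m->value);
            free(m);
            m = next;
        }
    }
    free(v);
}
```

The trade-off of the list is that key lookup is linear in the number of members, which is usually acceptable for config-sized documents. Keeping a tail pointer would make appends O(1).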
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | Lexer correctness | Tokens |
| Parser Tests | Nested JSON | Arrays/objects |
| Error Tests | Bad JSON | Missing commas |
6.2 Critical Test Cases
- Nested objects: { "a": {"b": 1} }.
- Array parsing: [1, 2, 3].
- Invalid JSON: Missing closing brace.
6.3 Test Data
{"name": "Ada", "skills": ["c", "os"]}
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Not copying strings | Use-after-free | Allocate and copy |
| Bad recursion base case | Stack overflow | Validate tokens, ensure each call consumes input, and cap nesting depth |
| Memory leaks | Valgrind errors | Free recursively |
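Deeply nested input can overflow the stack of a recursive parser even when the recursion itself is correct. One possible guard, sketched here as a pre-scan, is to reject input whose bracket nesting exceeds a limit. The function name `depth_ok` is hypothetical, and a real parser could equally pass a depth counter through its recursive calls.

```c
/* Returns 1 if the nesting of '[' and '{' in text never exceeds max_depth,
 * 0 otherwise. Brackets inside string literals are ignored. This checks
 * depth only; it does not validate that the text is well-formed JSON. */
static int depth_ok(const char *text, int max_depth)
{
    int depth = 0;
    int in_string = 0;

    for (const char *p = text; *p != '\0'; p++) {
        if (in_string) {
            if (*p == '\\' && p[1] != '\0') {
                p++;                      /* skip the escaped character */
            } else if (*p == '"') {
                in_string = 0;            /* closing quote */
            }
        } else if (*p == '"') {
            in_string = 1;                /* opening quote */
        } else if (*p == '[' || *p == '{') {
            if (++depth > max_depth) {
                return 0;
            }
        } else if (*p == ']' || *p == '}') {
            if (depth > 0) {
                depth--;
            }
        }
    }
    return 1;
}
```

Picking a fixed limit and reporting a clear "nesting too deep" error turns a potential crash into an ordinary parse failure.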
7.2 Debugging Strategies
- Print AST with indentation.
- Add token dumps for debugging.
7.3 Performance Traps
Repeated string scans can be slow; tokenize once and reuse tokens.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add pretty-printing.
- Add json_get_path with dotted keys.
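For the dotted-key extension, the first step is splitting a path such as `server.tls.port` into segments without modifying the caller's string. The sketch below shows one way to do that. The name `path_next` is hypothetical, and keys that themselves contain a dot cannot be addressed with this scheme.

```c
#include <stddef.h>
#include <string.h>

/* Return a pointer to the next '.'-separated segment of *path and store its
 * length in *len. Advances *path past the segment, or sets it to NULL after
 * the last one. Returns NULL when no segments remain. The segment is NOT
 * NUL-terminated; compare it with strncmp/memcmp using *len. */
static const char *path_next(const char **path, size_t *len)
{
    const char *start = *path;
    if (start == NULL) {
        return NULL;
    }
    const char *dot = strchr(start, '.');
    if (dot != NULL) {
        *len = (size_t)(dot - start);
        *path = dot + 1;
    } else {
        *len = strlen(start);
        *path = NULL;
    }
    return start;
}
```

A path lookup can then loop over the segments, descending one object level per segment and stopping with "not found" as soon as a key is missing or the current value is not an object.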
8.2 Intermediate Extensions
- Add serialization back to JSON.
- Add number validation (range).
8.3 Advanced Extensions
- Add streaming parser for large files.
- Add JSON schema validation.
9. Real-World Connections
9.1 Industry Applications
- Config parsing: Application settings.
- Network protocols: JSON APIs.
9.2 Related Open Source Projects
- cJSON: Popular C JSON library.
9.3 Interview Relevance
Parsing and ownership design are high-value systems skills.
10. Resources
10.1 Essential Reading
- “Crafting Interpreters” - Ch. 5-8
- “Effective C” - Ch. 6
10.2 Video Resources
- Parsing and recursive descent lectures
10.3 Tools & Documentation
- JSON spec (RFC 8259)
10.4 Related Projects in This Series
- KV Client: API ownership patterns.
- Logging Library: Provides structured logs.
11. Self-Assessment Checklist
11.1 Understanding
- I can parse recursive data.
- I can design a clear ownership API.
- I can report parsing errors precisely.
11.2 Implementation
- JSON parsing works for nested input.
- Errors are informative.
- Memory is managed correctly.
11.3 Growth
- I can add streaming parsing.
- I can explain this project in an interview.
12. Submission / Completion Criteria
Minimum Viable Completion:
- Parse JSON objects and arrays.
Full Completion:
- Typed getters and error reporting.
Excellence (Going Above & Beyond):
- Streaming parser and schema validation.
This guide was generated from SPRINT_4_BOUNDARIES_INTERFACES_PROJECTS.md. For the complete learning path, see the parent directory.