Project 4: JSON Parser with Explicit Ownership
Project 4: JSON Parser with Explicit Ownership
Sprint: 2 - Data & Invariants Difficulty: Advanced Time Estimate: 2 weeks Prerequisites: Projects 1 & 3 (dynamic array and arena), basic parsing concepts
Overview
What youโll build: A JSON parser that produces a tree structure with clear, documented ownership semanticsโwho owns each string, each array, each object.
Why it teaches Data & Invariants: JSON parsing is a gauntlet of ownership decisions:
- Does the parsed tree own the strings, or point into the source?
- Who frees nested objects?
- What happens to the tree if parsing fails halfway?
- How do you represent
nullvs โkey not presentโ?
Every node in your tree has an invariant. Every edge represents an ownership relationship.
The Core Question Youโre Answering:
โHow do you build a recursive tree structure in C where every node clearly owns its children, strings have explicit lifetime semantics, and cleanup is guaranteed even on partial parse failures?โ
Learning Objectives
By the end of this project, you will be able to:
- Implement recursive descent parsing in C
- Design ownership semantics for tree structures
- Use tagged unions to represent multiple types
- Handle parse errors with proper cleanup
- Distinguish null from absent values
- Integrate arena allocation for efficient memory management
Theoretical Foundation
JSON Value Types
JSON Types:
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ null - The value null โ
โ boolean - true or false โ
โ number - integers and floats (64-bit double) โ
โ string - UTF-8 text in quotes โ
โ array - ordered list [v1, v2, ...] โ
โ object - key-value pairs {"k1": v1, "k2": v2, ...} โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Ownership Patterns
OWNERSHIP IN JSON TREES:
Pattern 1: Deep Copy (Safe, Slower)
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Input string: {"name": "Alice"} โ
โ โ
โ Parsed tree: โ
โ โโโโโโโโโโโโ โ
โ โ Object โ โ
โ โ โโโโโโโโ โ โโโโโโโโโโโโโโโ โ
โ โ โpairsโโผโโผโโโโ>โ "name" copy โ โ OWNS this string โ
โ โ โโโโโโโโ โ โ "Alice" copyโ โ OWNS this string โ
โ โโโโโโโโโโโโ โโโโโโโโโโโโโโโ โ
โ โ
โ Tree OWNS all string memory. Can outlive input. โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Pattern 2: Shallow Copy (Fast, Dangerous)
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Input string: {"name": "Alice"} โ
โ โ โ โ
โ Parsed tree: โ โ โ
โ โโโโโโโโโโโโ โ โ โ
โ โ Object โ โ โ โ
โ โ โโโโโโโโ โ โโโโโโโผโโโโโโโผโโโ โ
โ โ โpairsโโผโโผโโ>โ ptr โ ptr โ โ โ Points INTO input โ
โ โ โโโโโโโโ โ โโโโโโโดโโโโโโโดโโโ โ
โ โโโโโโโโโโโโ โ
โ โ
โ Tree BORROWS strings. Dies if input freed. โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Pattern 3: Arena + Copy (Best of Both)
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Arena owns all memory. Tree uses arena allocations. โ
โ To free: arena_destroy() frees EVERYTHING. โ
โ No individual free calls. No ownership tracking. โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Representing Absence vs Null
Key distinction:
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ {"value": null} - key exists, value is null โ
โ {} - key does not exist โ
โ โ
โ These are DIFFERENT! โ
โ โ
โ json_get(obj, "value") should distinguish: โ
โ โข Found key, value is null โ return JSON_NULL type โ
โ โข Key not found โ return NULL pointer (or error code) โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Project Specification
Core API
// Parsing
JsonValue* json_parse(const char* input);
JsonValue* json_parse_file(const char* filename);
void json_free(JsonValue* value);
// Type checking
JsonType json_type(JsonValue* value);
bool json_is_null(JsonValue* value);
bool json_is_bool(JsonValue* value);
bool json_is_number(JsonValue* value);
bool json_is_string(JsonValue* value);
bool json_is_array(JsonValue* value);
bool json_is_object(JsonValue* value);
// Value extraction
bool json_get_bool(JsonValue* value);
double json_get_number(JsonValue* value);
const char* json_get_string(JsonValue* value);
// Array access
size_t json_array_length(JsonValue* array);
JsonValue* json_array_get(JsonValue* array, size_t index);
// Object access
size_t json_object_length(JsonValue* object);
JsonValue* json_object_get(JsonValue* object, const char* key);
const char* json_object_key_at(JsonValue* object, size_t index);
JsonValue* json_object_value_at(JsonValue* object, size_t index);
// Utility
void json_print(JsonValue* value);
void json_pretty_print(JsonValue* value, int indent);
Expected Output
$ ./jsonquery data.json
Successfully parsed JSON in 0.34ms
Tree structure:
- Root: Object (4 keys)
- Total nodes: 247
- Strings: 89 (12.4 KB)
- Arrays: 12
- Objects: 18
- Primitives: 128
$ ./jsonquery data.json '.users[0].name'
"Alice Johnson"
$ ./jsonquery data.json '.users[].age' --format=csv
28
34
22
$ ./jsonquery input.json --pretty-print
{
"users": [
{
"name": "Alice",
"age": 28,
"active": true
}
]
}
Solution Architecture
Data Structures
typedef enum {
JSON_NULL,
JSON_BOOL,
JSON_NUMBER,
JSON_STRING,
JSON_ARRAY,
JSON_OBJECT
} JsonType;
typedef struct JsonValue {
JsonType type;
union {
bool bool_val;
double number_val;
struct {
char* str;
size_t len;
} string_val;
struct {
struct JsonValue** elements;
size_t count;
size_t capacity;
} array_val;
struct {
char** keys;
struct JsonValue** values;
size_t count;
size_t capacity;
} object_val;
} value;
} JsonValue;
Parsing Strategy
JsonValue* parse_value(const char** cursor) {
skip_whitespace(cursor);
char c = **cursor;
if (c == '{') return parse_object(cursor);
if (c == '[') return parse_array(cursor);
if (c == '"') return parse_string(cursor);
if (c == 't' || c == 'f') return parse_bool(cursor);
if (c == 'n') return parse_null(cursor);
if (c == '-' || isdigit(c)) return parse_number(cursor);
return NULL; // Parse error
}
Implementation Guide
Phase 1: Lexer/Tokenizer (2-3 hours)
static void skip_whitespace(const char** cursor) {
while (**cursor == ' ' || **cursor == '\t' ||
**cursor == '\n' || **cursor == '\r') {
(*cursor)++;
}
}
static bool expect(const char** cursor, char c) {
skip_whitespace(cursor);
if (**cursor == c) {
(*cursor)++;
return true;
}
return false;
}
Phase 2: Primitive Parsing (2-3 hours)
static JsonValue* parse_null(const char** cursor) {
if (strncmp(*cursor, "null", 4) == 0) {
*cursor += 4;
JsonValue* v = malloc(sizeof(JsonValue));
v->type = JSON_NULL;
return v;
}
return NULL;
}
static JsonValue* parse_bool(const char** cursor) {
JsonValue* v = malloc(sizeof(JsonValue));
v->type = JSON_BOOL;
if (strncmp(*cursor, "true", 4) == 0) {
*cursor += 4;
v->value.bool_val = true;
return v;
}
if (strncmp(*cursor, "false", 5) == 0) {
*cursor += 5;
v->value.bool_val = false;
return v;
}
free(v);
return NULL;
}
static JsonValue* parse_number(const char** cursor) {
char* end;
double num = strtod(*cursor, &end);
if (end == *cursor) return NULL;
JsonValue* v = malloc(sizeof(JsonValue));
v->type = JSON_NUMBER;
v->value.number_val = num;
*cursor = end;
return v;
}
Phase 3: String Parsing (2-3 hours)
static JsonValue* parse_string(const char** cursor) {
if (**cursor != '"') return NULL;
(*cursor)++;
const char* start = *cursor;
size_t len = 0;
// Find end of string (handle escapes)
while (**cursor != '"') {
if (**cursor == '\0') return NULL; // Unterminated
if (**cursor == '\\') {
(*cursor)++;
if (**cursor == '\0') return NULL;
}
(*cursor)++;
len++;
}
// Copy string
char* str = malloc(len + 1);
// ... copy with escape handling ...
(*cursor)++; // Skip closing quote
JsonValue* v = malloc(sizeof(JsonValue));
v->type = JSON_STRING;
v->value.string_val.str = str;
v->value.string_val.len = len;
return v;
}
Phase 4: Array and Object Parsing (3-4 hours)
static JsonValue* parse_array(const char** cursor) {
if (!expect(cursor, '[')) return NULL;
JsonValue* v = malloc(sizeof(JsonValue));
v->type = JSON_ARRAY;
v->value.array_val.elements = NULL;
v->value.array_val.count = 0;
v->value.array_val.capacity = 0;
skip_whitespace(cursor);
if (**cursor == ']') {
(*cursor)++;
return v; // Empty array
}
while (1) {
JsonValue* elem = parse_value(cursor);
if (!elem) {
json_free(v);
return NULL;
}
// Add to array (grow if needed)
array_push(v, elem);
skip_whitespace(cursor);
if (**cursor == ']') {
(*cursor)++;
return v;
}
if (!expect(cursor, ',')) {
json_free(v);
return NULL;
}
}
}
static JsonValue* parse_object(const char** cursor) {
if (!expect(cursor, '{')) return NULL;
JsonValue* v = malloc(sizeof(JsonValue));
v->type = JSON_OBJECT;
v->value.object_val.keys = NULL;
v->value.object_val.values = NULL;
v->value.object_val.count = 0;
v->value.object_val.capacity = 0;
skip_whitespace(cursor);
if (**cursor == '}') {
(*cursor)++;
return v; // Empty object
}
while (1) {
// Parse key (must be string)
JsonValue* key = parse_string(cursor);
if (!key) {
json_free(v);
return NULL;
}
if (!expect(cursor, ':')) {
json_free(key);
json_free(v);
return NULL;
}
JsonValue* value = parse_value(cursor);
if (!value) {
json_free(key);
json_free(v);
return NULL;
}
// Add key-value pair
object_insert(v, key->value.string_val.str, value);
free(key); // We took ownership of the string
skip_whitespace(cursor);
if (**cursor == '}') {
(*cursor)++;
return v;
}
if (!expect(cursor, ',')) {
json_free(v);
return NULL;
}
}
}
Phase 5: Free Function (1-2 hours)
void json_free(JsonValue* value) {
if (!value) return;
switch (value->type) {
case JSON_STRING:
free(value->value.string_val.str);
break;
case JSON_ARRAY:
for (size_t i = 0; i < value->value.array_val.count; i++) {
json_free(value->value.array_val.elements[i]);
}
free(value->value.array_val.elements);
break;
case JSON_OBJECT:
for (size_t i = 0; i < value->value.object_val.count; i++) {
free(value->value.object_val.keys[i]);
json_free(value->value.object_val.values[i]);
}
free(value->value.object_val.keys);
free(value->value.object_val.values);
break;
default:
break;
}
free(value);
}
Common Pitfalls
Pitfall 1: Partial Parse Cleanup
// WRONG: Memory leak on parse error
JsonValue* elem = parse_value(cursor);
if (!elem) return NULL; // Leaks already-parsed array elements!
// CORRECT: Free on error
JsonValue* elem = parse_value(cursor);
if (!elem) {
json_free(v); // Free what we've built so far
return NULL;
}
Pitfall 2: String Ownership Confusion
// DANGEROUS: Who owns key->value.string_val.str?
char* key_str = key->value.string_val.str;
object_insert(v, key_str, value);
json_free(key); // This frees key_str too!
// Now v has a dangling pointer
// CORRECT: Transfer ownership explicitly
char* key_str = key->value.string_val.str;
key->value.string_val.str = NULL; // Prevent double-free
object_insert(v, key_str, value); // v now owns key_str
free(key); // Safe - str is NULL
Pitfall 3: Confusing Null and Missing
// BAD API:
JsonValue* v = json_object_get(obj, "missing_key");
if (v == NULL) {
// Is key missing, or is value null?
}
// GOOD API:
bool found;
JsonValue* v = json_object_get(obj, "key", &found);
if (!found) {
// Key doesn't exist
} else if (v->type == JSON_NULL) {
// Key exists, value is null
}
Interview Preparation
Common Questions
- โDeep copy vs shallow copy for strings?โ
- Deep: safe, owns data, can outlive input
- Shallow: fast, borrows data, dies with input
- Arena: fast, safe, bulk deallocation
- โHow do you handle parse errors with partial trees?โ
- Track all allocations
- Free everything built so far
- Or use arena: just destroy arena
- โNull vs key not found?โ
- Return error code + output parameter
- Or use sentinel value (not NULL)
- Document behavior clearly
Self-Assessment Checklist
- Parse all JSON value types
- Handle nested structures correctly
- Clean up on parse errors
- No memory leaks (Valgrind clean)
- Distinguish null from missing keys
Next Project: P05: Linked List Database - Apply invariants to a classic data structure with complex operations.