Project 1: Safe Dynamic Array Library (`vec.h`)
Project 1: Safe Dynamic Array Library (vec.h)
Sprint: 2 - Data & Invariants Difficulty: Intermediate Time Estimate: 1 week Prerequisites: Basic C, pointers, malloc/free basics
Overview
What youโll build: A reusable, memory-safe dynamic array library in C with documented invariants, bounds checking, and clear ownership semanticsโusable in other projects.
Why it teaches Data & Invariants: This is the canonical โinvariant-heavyโ data structure. A dynamic array has at least 5 invariants that must hold simultaneously:
capacity >= lengthdata != NULL || capacity == 0lengthelements are initialized- Only the owner may free
data - After realloc, all previous pointers are invalid
You cannot implement this correctly without thinking about invariants constantly.
The Core Question Youโre Answering:
โHow do you design a data structure so that itโs IMPOSSIBLE to use incorrectlyโor at least OBVIOUS when you do?โ
Learning Objectives
By the end of this project, you will be able to:
- Define and enforce invariants using assertions and documentation
- Implement a resizable array with amortized O(1) append
- Handle ownership correctly (who allocates, who frees)
- Detect and prevent common bugs like use-after-realloc
- Verify memory safety using Valgrind
- Design APIs that make misuse difficult
- Document contracts in code comments
Theoretical Foundation
What is an Invariant?
An invariant is a condition that must always be true at specific points in your programโs execution. Itโs not a suggestionโitโs a contract that your code makes with itself.
INVARIANT: A condition that MUST be true
at all "stable points" in your program
Stable points:
โข Before a function is called
โข After a function returns
โข At the start of a loop iteration
โข After allocating/freeing memory
Dynamic Array Invariants
typedef struct {
int *data; // Pointer to heap-allocated array
size_t length; // Number of elements currently in use
size_t capacity; // Total number of elements allocated
} Vec;
This structure has at least five invariants that must hold simultaneously:
INVARIANT SET for Vec:
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ 1. capacity >= length โ
โ (You can't use more space than you allocated) โ
โ โ
โ 2. data != NULL OR capacity == 0 โ
โ (Either we have allocated memory, or capacity is 0) โ
โ โ
โ 3. First 'length' elements at data[0..length-1] are โ
โ initialized and valid โ
โ โ
โ 4. Only ONE owner may call free(data) โ
โ (Double-free is undefined behavior) โ
โ โ
โ 5. After realloc(), ALL previous pointers to โ
โ elements are INVALID โ
โ (The array may have moved) โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Ownership Pattern
OWNERSHIP PRINCIPLE:
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Every malloc() must have EXACTLY ONE free() โ
โ Every piece of memory has ONE responsible โ
โ entity that will free it โ
โ โ
โ Owner = "The entity that calls free()" โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Pattern: STRUCT OWNS ITS HEAP DATA
โโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Vec โ
โ โโโโโโโโโโโโโโโโโโโโโโ โ
โ โ int *data โโโโ โ โ
โ โ size_t length โ โ โ
โ โ size_t capacityโ โ โ
โ โโโโโโโโโโโโโโโโโโโโโโ โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โผ
โโโโโโโโโโโโโโโโโโ
โ Heap memory โ โ Vec OWNS this
โ [1,2,3,4,5...] โ โ vec_destroy() frees it
โโโโโโโโโโโโโโโโโโ
Pointer Invalidation After Realloc
SCENARIO: Reallocation invalidates ALL previous pointers
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Vec v;
vec_init(&v);
vec_push(&v, 10);
int *first = &v.data[0]; // โ first points into v.data
printf("%d\n", *first); // โ SAFE (currently valid)
vec_push(&v, 20); // โ This might realloc() v.data
// โ v.data may have MOVED
// โ first is now INVALID (dangling pointer)
printf("%d\n", *first); // โ UNDEFINED BEHAVIOR
// (first points to freed memory)
BEFORE realloc():
v.data โโโโโโโโโโ
โผ
โโโโโโโโโโโโโโโโโโโ
Heap: โ [10, ...empty...] โ
โโโโโโโโโโโโโโโโโโโ
โฒ
first โโโโโโโโโโ (VALID pointer)
AFTER vec_push() triggers realloc():
v.data โโโโโโโโโโโโโโโโโโโโโโ
โผ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Heap: โ [10, 20, ...empty........] โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โโโโโโโโโโโโโโโโโโโ
โ (freed memory) โ
โโโโโโโโโโโโโโโโโโโ
โฒ
first โโโโโโโโโโ (INVALID - DANGLING!)
Project Specification
Core API
// Create a new vector with initial capacity
void vec_init(vec_t *v);
void vec_init_capacity(vec_t *v, size_t capacity);
// Destroy vector and free memory
void vec_destroy(vec_t *v);
// Add element to end (may trigger realloc)
void vec_push(vec_t *v, int value);
// Remove and return last element
int vec_pop(vec_t *v);
// Access element by index (with bounds checking)
int vec_get(vec_t *v, size_t index);
void vec_set(vec_t *v, size_t index, int value);
// Query size
size_t vec_length(vec_t *v);
size_t vec_capacity(vec_t *v);
bool vec_empty(vec_t *v);
// Remove element at index (shifts remaining)
void vec_remove(vec_t *v, size_t index);
// Clear all elements (keeps capacity)
void vec_clear(vec_t *v);
// Debug: check all invariants
bool vec_check_invariants(vec_t *v);
Expected Output
$ ./vec_demo
=== Dynamic Array (vec.h) Demonstration ===
[Test 1: Basic Creation and Push]
Creating vector with initial capacity 4...
Vector created: {data=0x7f9a42c05a00, length=0, capacity=4}
Invariant check: โ capacity >= length (4 >= 0)
Invariant check: โ data != NULL
Invariant check: โ length within bounds
Pushing integers 0-9...
Push 0: capacity grew 4 โ 8 (realloc triggered)
Old pointer: 0x7f9a42c05a00 (now INVALID)
New pointer: 0x7f9a42c05c00
Push 1: no realloc (8 > 1)
...
Push 8: capacity grew 8 โ 16 (realloc triggered)
Final state: {length=10, capacity=16}
Contents: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[Test 2: Memory Growth Pattern]
Starting with capacity=1, pushing 10,000 integers...
Capacity growth sequence:
1 โ 2 โ 4 โ 8 โ 16 โ 32 โ 64 โ 128 โ 256 โ 512 โ 1024 โ 2048 โ 4096 โ 8192 โ 16384
Total reallocations: 14
Final capacity: 16384 (63.6% utilization)
Amortized cost analysis:
Total push operations: 10,000
Total realloc operations: 14
Average cost per push: O(1) โ
[Test 3: Bounds Checking]
vec_get(vec, 5) = 5 โ
vec_get(vec, 9) = 9 โ
Attempting out-of-bounds access: vec_get(vec, 10)...
ASSERTION FAILED: index < vec->length
[Test 4: Cleanup and Valgrind]
Destroying vector...
Freed 16384 bytes
$ valgrind --leak-check=full ./vec_demo
==12345== All heap blocks were freed -- no leaks are possible
SUCCESS: Zero memory leaks โ
Solution Architecture
Data Structure
typedef struct {
int *data; // Pointer to heap-allocated array
size_t length; // Number of elements currently stored
size_t capacity; // Total slots allocated
} vec_t;
Key Algorithms
Push with Amortized O(1):
vec_push(vec, value):
1. Check if length >= capacity
- If yes: grow to capacity * 2 (or 1 if capacity == 0)
2. Store value at data[length]
3. Increment length
Growth Strategy:
- Start with capacity 0 or small number
- Double capacity when full
- Amortized O(1) because total cost of n pushes is O(n)
Bounds Checking:
vec_get(vec, index):
1. Assert: vec != NULL
2. Assert: index < vec->length
3. Return vec->data[index]
Implementation Guide
Phase 1: Basic Structure (1 hour)
typedef struct {
int *data;
size_t length;
size_t capacity;
} vec_t;
void vec_init(vec_t *v) {
v->data = NULL;
v->length = 0;
v->capacity = 0;
}
void vec_destroy(vec_t *v) {
free(v->data);
v->data = NULL;
v->length = 0;
v->capacity = 0;
}
Phase 2: Push and Grow (1 hour)
void vec_push(vec_t *v, int value) {
if (v->length >= v->capacity) {
size_t new_capacity = (v->capacity == 0) ? 1 : v->capacity * 2;
int *new_data = realloc(v->data, new_capacity * sizeof(int));
if (new_data == NULL) {
// Handle allocation failure
fprintf(stderr, "vec_push: allocation failed\n");
abort();
}
v->data = new_data;
v->capacity = new_capacity;
}
v->data[v->length] = value;
v->length++;
}
Phase 3: Access and Bounds Checking (1 hour)
int vec_get(vec_t *v, size_t index) {
assert(v != NULL);
assert(index < v->length && "Index out of bounds");
return v->data[index];
}
void vec_set(vec_t *v, size_t index, int value) {
assert(v != NULL);
assert(index < v->length && "Index out of bounds");
v->data[index] = value;
}
Phase 4: Additional Operations (1-2 hours)
int vec_pop(vec_t *v) {
assert(v != NULL);
assert(v->length > 0 && "Cannot pop from empty vector");
v->length--;
return v->data[v->length];
}
void vec_remove(vec_t *v, size_t index) {
assert(v != NULL);
assert(index < v->length);
// Shift elements left
for (size_t i = index; i < v->length - 1; i++) {
v->data[i] = v->data[i + 1];
}
v->length--;
}
void vec_clear(vec_t *v) {
v->length = 0;
// Keep capacity - don't free
}
Phase 5: Invariant Checking (1 hour)
bool vec_check_invariants(vec_t *v) {
if (v == NULL) return false;
// Invariant 1: capacity >= length
if (v->capacity < v->length) {
fprintf(stderr, "INVARIANT VIOLATION: capacity < length\n");
return false;
}
// Invariant 2: data != NULL OR capacity == 0
if (v->data == NULL && v->capacity != 0) {
fprintf(stderr, "INVARIANT VIOLATION: data is NULL but capacity != 0\n");
return false;
}
if (v->data != NULL && v->capacity == 0) {
fprintf(stderr, "INVARIANT VIOLATION: data not NULL but capacity == 0\n");
return false;
}
return true;
}
Testing Strategy
Unit Tests
void test_init_destroy() {
vec_t v;
vec_init(&v);
assert(v.length == 0);
assert(v.capacity == 0);
assert(v.data == NULL);
vec_destroy(&v);
printf("test_init_destroy: PASSED\n");
}
void test_push_and_grow() {
vec_t v;
vec_init(&v);
for (int i = 0; i < 1000; i++) {
vec_push(&v, i);
assert(v.length == (size_t)(i + 1));
assert(v.capacity >= v.length);
}
for (int i = 0; i < 1000; i++) {
assert(vec_get(&v, i) == i);
}
vec_destroy(&v);
printf("test_push_and_grow: PASSED\n");
}
void test_bounds_checking() {
vec_t v;
vec_init(&v);
vec_push(&v, 42);
// Valid access
assert(vec_get(&v, 0) == 42);
// This should trigger assertion failure in debug mode
// vec_get(&v, 1); // Would abort
vec_destroy(&v);
printf("test_bounds_checking: PASSED\n");
}
Valgrind Verification
$ valgrind --leak-check=full --show-leak-kinds=all ./test_vec
==12345== All heap blocks were freed -- no leaks are possible
Common Pitfalls
Pitfall 1: Forgetting capacity == 0 Case
// WRONG: Infinite loop when capacity is 0
size_t new_capacity = v->capacity * 2; // 0 * 2 = 0!
// CORRECT: Handle zero case
size_t new_capacity = (v->capacity == 0) ? 1 : v->capacity * 2;
Pitfall 2: Updating Capacity Before Checking Realloc
// WRONG: If realloc fails, capacity is wrong
v->capacity = new_capacity;
v->data = realloc(v->data, new_capacity * sizeof(int));
if (v->data == NULL) // Too late!
// CORRECT: Check realloc first
int *new_data = realloc(v->data, new_capacity * sizeof(int));
if (new_data == NULL) {
// Handle error - original data still valid
return ERROR;
}
v->data = new_data;
v->capacity = new_capacity;
Pitfall 3: Using Pointers After Push
// DANGEROUS: pointer may be invalidated
int *first = &v.data[0];
vec_push(&v, 42); // May realloc!
*first = 10; // UNDEFINED BEHAVIOR - first may be dangling
Pitfall 4: Double Free
vec_t v;
vec_init(&v);
vec_push(&v, 1);
vec_destroy(&v);
vec_destroy(&v); // DOUBLE FREE - undefined behavior
// SOLUTION: Set to NULL after destroy
void vec_destroy(vec_t *v) {
free(v->data);
v->data = NULL; // Prevents double-free issues
v->length = 0;
v->capacity = 0;
}
Interview Preparation
Common Questions
- โWhat is the difference between length and capacity?โ
- Length: number of elements currently stored
- Capacity: total slots allocated
- Invariant: capacity >= length
- โWhy double the capacity instead of adding a fixed amount?โ
- Amortized O(1) per push
- Total cost for n pushes is O(n), not O(nยฒ)
- Adding fixed amount (e.g., +10) would be O(n) per push amortized
- โAfter calling vec_push(), why might pointers to elements become invalid?โ
- Realloc may move the entire array to a new location
- Old pointers become dangling
- Must re-fetch pointers after any operation that may realloc
- โWhat happens if realloc returns NULL?โ
- Allocation failed, but original data is still valid
- Must not update v->data until we know realloc succeeded
- Should handle gracefully (return error, abort, etc.)
- โShould vec_get return a pointer or a copy?โ
- Pointer: faster, but dangerous (invalidation)
- Copy: safer, but slower for large types
- Design choice depends on use case
Self-Assessment Checklist
Understanding
- Can explain the 5 invariants without looking
- Can explain why doubling gives amortized O(1)
- Can describe what happens when realloc moves memory
- Can explain ownership semantics for the vector
Implementation
- vec_init works correctly
- vec_push handles growth correctly
- vec_get/vec_set have bounds checking
- vec_destroy frees all memory
- Valgrind shows no leaks
Verification
- Unit tests pass
- Invariant checker validates state
- Edge cases handled (empty, single element, large)
Resources
Books
| Topic | Book | Chapter |
|---|---|---|
| Dynamic Arrays | โC Programming: A Modern Approachโ by K.N. King | Chapter 17 |
| Invariants & Contracts | โC Interfaces and Implementationsโ by David Hanson | Chapter 1 |
| Assertions | โWriting Solid Codeโ by Steve Maguire | Chapter 2 |
| Memory Management | โEffective Cโ by Robert Seacord | Chapter 6 |
| Amortized Analysis | โIntroduction to Algorithmsโ by Cormen et al. | Chapter 17.4 |
Summary
This project teaches the fundamental discipline of invariant-based programming:
- Invariants are contracts - they define what makes your data structure valid
- Ownership is explicit - one entity is responsible for freeing memory
- Pointers have lifetimes - realloc invalidates all previous pointers
- Bounds checking is non-negotiable - assert your preconditions
- Testing proves correctness - Valgrind and unit tests verify your invariants
When you can build a dynamic array that never leaks, never corrupts, and clearly documents its contracts, youโve internalized the discipline that separates reliable systems code from fragile code.
Next Project: P02: Text Editor Buffer (Gap Buffer) - Apply invariants to a more complex data structure with elegant properties.