Project 1: Safe Dynamic Array Library (`vec.h`)

Sprint: 2 - Data & Invariants Difficulty: Intermediate Time Estimate: 1 week Prerequisites: Basic C, pointers, malloc/free basics

Overview

What you’ll build: A reusable, memory-safe dynamic array library in C with documented invariants, bounds checking, and clear ownership semantics—usable in other projects.

Why it teaches Data & Invariants: This is the canonical “invariant-heavy” data structure. A dynamic array has at least 5 invariants that must hold simultaneously:

capacity >= length
data != NULL || capacity == 0
length elements are initialized
Only the owner may free data
After realloc, all previous pointers are invalid

You cannot implement this correctly without thinking about invariants constantly.

The Core Question You’re Answering:

“How do you design a data structure so that it’s IMPOSSIBLE to use incorrectly—or at least OBVIOUS when you do?”

Learning Objectives

By the end of this project, you will be able to:

Define and enforce invariants using assertions and documentation
Implement a resizable array with amortized O(1) append
Handle ownership correctly (who allocates, who frees)
Detect and prevent common bugs like use-after-realloc
Verify memory safety using Valgrind
Design APIs that make misuse difficult
Document contracts in code comments

Theoretical Foundation

What is an Invariant?

An invariant is a condition that must always be true at specific points in your program’s execution. It’s not a suggestion—it’s a contract that your code makes with itself.

INVARIANT: A condition that MUST be true
at all "stable points" in your program

Stable points:
• Before a function is called
• After a function returns
• At the start of a loop iteration
• After allocating/freeing memory

Dynamic Array Invariants

typedef struct {
    int *data;      // Pointer to heap-allocated array
    size_t length;  // Number of elements currently in use
    size_t capacity; // Total number of elements allocated
} Vec;

This structure has at least five invariants that must hold simultaneously:

INVARIANT SET for Vec:
┌──────────────────────────────────────────────────────────────┐
│ 1. capacity >= length                                        │
│    (You can't use more space than you allocated)             │
│                                                               │
│ 2. data != NULL  OR  capacity == 0                           │
│    (Either we have allocated memory, or capacity is 0)       │
│                                                               │
│ 3. First 'length' elements at data[0..length-1] are          │
│    initialized and valid                                     │
│                                                               │
│ 4. Only ONE owner may call free(data)                        │
│    (Double-free is undefined behavior)                       │
│                                                               │
│ 5. After realloc(), ALL previous pointers to                 │
│    elements are INVALID                                      │
│    (The array may have moved)                                │
└──────────────────────────────────────────────────────────────┘

Ownership Pattern

OWNERSHIP PRINCIPLE:
┌─────────────────────────────────────────────────┐
│  Every malloc() must have EXACTLY ONE free()    │
│  Every piece of memory has ONE responsible      │
│  entity that will free it                       │
│                                                 │
│  Owner = "The entity that calls free()"         │
└─────────────────────────────────────────────────┘

Pattern: STRUCT OWNS ITS HEAP DATA
┌──────────────────────────┐
│  Vec                     │
│  ┌────────────────────┐  │
│  │ int *data   ───┐   │  │
│  │ size_t length  │   │  │
│  │ size_t capacity│   │  │
│  └────────────────│───┘  │
└───────────────────│──────┘
                    ▼
           ┌────────────────┐
           │ Heap memory    │  ← Vec OWNS this
           │ [1,2,3,4,5...] │  ← vec_destroy() frees it
           └────────────────┘

Pointer Invalidation After Realloc

SCENARIO: Reallocation invalidates ALL previous pointers
──────────────────────────────────────────────────────────
Vec v;
vec_init(&v);

vec_push(&v, 10);
int *first = &v.data[0];  // ← first points into v.data
printf("%d\n", *first);   // ✓ SAFE (currently valid)

vec_push(&v, 20);  // ← This might realloc() v.data
                   // ← v.data may have MOVED
                   // ← first is now INVALID (dangling pointer)

printf("%d\n", *first);   // ✗ UNDEFINED BEHAVIOR
                          //   (first points to freed memory)

BEFORE realloc():
  v.data ─────────┐
                  ▼
         ┌─────────────────┐
  Heap:  │ [10, ...empty...] │
         └─────────────────┘
                  ▲
  first ─────────┘   (VALID pointer)

AFTER vec_push() triggers realloc():
  v.data ─────────────────────┐
                              ▼
                     ┌─────────────────────────────┐
  Heap:              │ [10, 20, ...empty........]  │
                     └─────────────────────────────┘

         ┌─────────────────┐
         │ (freed memory)  │
         └─────────────────┘
                  ▲
  first ─────────┘   (INVALID - DANGLING!)

Project Specification

Core API

// Create a new vector with initial capacity
void vec_init(vec_t *v);
void vec_init_capacity(vec_t *v, size_t capacity);

// Destroy vector and free memory
void vec_destroy(vec_t *v);

// Add element to end (may trigger realloc)
void vec_push(vec_t *v, int value);

// Remove and return last element
int vec_pop(vec_t *v);

// Access element by index (with bounds checking)
int vec_get(vec_t *v, size_t index);
void vec_set(vec_t *v, size_t index, int value);

// Query size
size_t vec_length(vec_t *v);
size_t vec_capacity(vec_t *v);
bool vec_empty(vec_t *v);

// Remove element at index (shifts remaining)
void vec_remove(vec_t *v, size_t index);

// Clear all elements (keeps capacity)
void vec_clear(vec_t *v);

// Debug: check all invariants
bool vec_check_invariants(vec_t *v);

Expected Output

$ ./vec_demo

=== Dynamic Array (vec.h) Demonstration ===

[Test 1: Basic Creation and Push]
Creating vector with initial capacity 4...
Vector created: {data=0x7f9a42c05a00, length=0, capacity=4}
Invariant check: ✓ capacity >= length (4 >= 0)
Invariant check: ✓ data != NULL
Invariant check: ✓ length within bounds

Pushing integers 0-9...
Push 0: capacity grew 4 → 8 (realloc triggered)
  Old pointer: 0x7f9a42c05a00 (now INVALID)
  New pointer: 0x7f9a42c05c00
Push 1: no realloc (8 > 1)
...
Push 8: capacity grew 8 → 16 (realloc triggered)

Final state: {length=10, capacity=16}
Contents: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

[Test 2: Memory Growth Pattern]
Starting with capacity=1, pushing 10,000 integers...
Capacity growth sequence:
  1 → 2 → 4 → 8 → 16 → 32 → 64 → 128 → 256 → 512 → 1024 → 2048 → 4096 → 8192 → 16384

Total reallocations: 14
Final capacity: 16384 (63.6% utilization)
Amortized cost analysis:
  Total push operations: 10,000
  Total realloc operations: 14
  Average cost per push: O(1) ✓

[Test 3: Bounds Checking]
vec_get(vec, 5) = 5 ✓
vec_get(vec, 9) = 9 ✓
Attempting out-of-bounds access: vec_get(vec, 10)...
ASSERTION FAILED: index < vec->length

[Test 4: Cleanup and Valgrind]
Destroying vector...
Freed 16384 bytes

$ valgrind --leak-check=full ./vec_demo
==12345== All heap blocks were freed -- no leaks are possible
SUCCESS: Zero memory leaks ✓

Solution Architecture

Data Structure

typedef struct {
    int *data;        // Pointer to heap-allocated array
    size_t length;    // Number of elements currently stored
    size_t capacity;  // Total slots allocated
} vec_t;

Key Algorithms

Push with Amortized O(1):

vec_push(vec, value):
    1. Check if length >= capacity
       - If yes: grow to capacity * 2 (or 1 if capacity == 0)
    2. Store value at data[length]
    3. Increment length

Growth Strategy:

Start with capacity 0 or small number
Double capacity when full
Amortized O(1) because total cost of n pushes is O(n)

Bounds Checking:

vec_get(vec, index):
Assert: vec != NULL
Assert: index < vec->length
Return vec->data[index]

Implementation Guide

Phase 1: Basic Structure (1 hour)

typedef struct {
    int *data;
    size_t length;
    size_t capacity;
} vec_t;

void vec_init(vec_t *v) {
    v->data = NULL;
    v->length = 0;
    v->capacity = 0;
}

void vec_destroy(vec_t *v) {
    free(v->data);
    v->data = NULL;
    v->length = 0;
    v->capacity = 0;
}

Phase 2: Push and Grow (1 hour)

void vec_push(vec_t *v, int value) {
    if (v->length >= v->capacity) {
        size_t new_capacity = (v->capacity == 0) ? 1 : v->capacity * 2;
        int *new_data = realloc(v->data, new_capacity * sizeof(int));
        if (new_data == NULL) {
            // Handle allocation failure
            fprintf(stderr, "vec_push: allocation failed\n");
            abort();
        }
        v->data = new_data;
        v->capacity = new_capacity;
    }
    v->data[v->length] = value;
    v->length++;
}

Phase 3: Access and Bounds Checking (1 hour)

int vec_get(vec_t *v, size_t index) {
    assert(v != NULL);
    assert(index < v->length && "Index out of bounds");
    return v->data[index];
}

void vec_set(vec_t *v, size_t index, int value) {
    assert(v != NULL);
    assert(index < v->length && "Index out of bounds");
    v->data[index] = value;
}

Phase 4: Additional Operations (1-2 hours)

int vec_pop(vec_t *v) {
    assert(v != NULL);
    assert(v->length > 0 && "Cannot pop from empty vector");
    v->length--;
    return v->data[v->length];
}

void vec_remove(vec_t *v, size_t index) {
    assert(v != NULL);
    assert(index < v->length);

    // Shift elements left
    for (size_t i = index; i < v->length - 1; i++) {
        v->data[i] = v->data[i + 1];
    }
    v->length--;
}

void vec_clear(vec_t *v) {
    v->length = 0;
    // Keep capacity - don't free
}

Phase 5: Invariant Checking (1 hour)

bool vec_check_invariants(vec_t *v) {
    if (v == NULL) return false;

    // Invariant 1: capacity >= length
    if (v->capacity < v->length) {
        fprintf(stderr, "INVARIANT VIOLATION: capacity < length\n");
        return false;
    }

    // Invariant 2: data != NULL OR capacity == 0
    if (v->data == NULL && v->capacity != 0) {
        fprintf(stderr, "INVARIANT VIOLATION: data is NULL but capacity != 0\n");
        return false;
    }
    if (v->data != NULL && v->capacity == 0) {
        fprintf(stderr, "INVARIANT VIOLATION: data not NULL but capacity == 0\n");
        return false;
    }

    return true;
}

Testing Strategy

Unit Tests

void test_init_destroy() {
    vec_t v;
    vec_init(&v);
    assert(v.length == 0);
    assert(v.capacity == 0);
    assert(v.data == NULL);
    vec_destroy(&v);
    printf("test_init_destroy: PASSED\n");
}

void test_push_and_grow() {
    vec_t v;
    vec_init(&v);

    for (int i = 0; i < 1000; i++) {
        vec_push(&v, i);
        assert(v.length == (size_t)(i + 1));
        assert(v.capacity >= v.length);
    }

    for (int i = 0; i < 1000; i++) {
        assert(vec_get(&v, i) == i);
    }

    vec_destroy(&v);
    printf("test_push_and_grow: PASSED\n");
}

void test_bounds_checking() {
    vec_t v;
    vec_init(&v);
    vec_push(&v, 42);

    // Valid access
    assert(vec_get(&v, 0) == 42);

    // This should trigger assertion failure in debug mode
    // vec_get(&v, 1);  // Would abort

    vec_destroy(&v);
    printf("test_bounds_checking: PASSED\n");
}

Valgrind Verification

$ valgrind --leak-check=full --show-leak-kinds=all ./test_vec
==12345== All heap blocks were freed -- no leaks are possible

Common Pitfalls

Pitfall 1: Forgetting capacity == 0 Case

// WRONG: Infinite loop when capacity is 0
size_t new_capacity = v->capacity * 2;  // 0 * 2 = 0!

// CORRECT: Handle zero case
size_t new_capacity = (v->capacity == 0) ? 1 : v->capacity * 2;

Pitfall 2: Updating Capacity Before Checking Realloc

// WRONG: If realloc fails, capacity is wrong
v->capacity = new_capacity;
v->data = realloc(v->data, new_capacity * sizeof(int));
if (v->data == NULL) // Too late!

// CORRECT: Check realloc first
int *new_data = realloc(v->data, new_capacity * sizeof(int));
if (new_data == NULL) {
    // Handle error - original data still valid
    return ERROR;
}
v->data = new_data;
v->capacity = new_capacity;

Pitfall 3: Using Pointers After Push

// DANGEROUS: pointer may be invalidated
int *first = &v.data[0];
vec_push(&v, 42);  // May realloc!
*first = 10;  // UNDEFINED BEHAVIOR - first may be dangling

Pitfall 4: Double Free

vec_t v;
vec_init(&v);
vec_push(&v, 1);
vec_destroy(&v);
vec_destroy(&v);  // DOUBLE FREE - undefined behavior

// SOLUTION: Set to NULL after destroy
void vec_destroy(vec_t *v) {
    free(v->data);
    v->data = NULL;  // Prevents double-free issues
    v->length = 0;
    v->capacity = 0;
}

Interview Preparation

Common Questions

“What is the difference between length and capacity?”
- Length: number of elements currently stored
- Capacity: total slots allocated
- Invariant: capacity >= length
“Why double the capacity instead of adding a fixed amount?”
- Amortized O(1) per push
- Total cost for n pushes is O(n), not O(n²)
- Adding fixed amount (e.g., +10) would be O(n) per push amortized
“After calling vec_push(), why might pointers to elements become invalid?”
- Realloc may move the entire array to a new location
- Old pointers become dangling
- Must re-fetch pointers after any operation that may realloc
“What happens if realloc returns NULL?”
- Allocation failed, but original data is still valid
- Must not update v->data until we know realloc succeeded
- Should handle gracefully (return error, abort, etc.)
“Should vec_get return a pointer or a copy?”
- Pointer: faster, but dangerous (invalidation)
- Copy: safer, but slower for large types
- Design choice depends on use case

Self-Assessment Checklist

Understanding

Can explain the 5 invariants without looking
Can explain why doubling gives amortized O(1)
Can describe what happens when realloc moves memory
Can explain ownership semantics for the vector

Implementation

vec_init works correctly
vec_push handles growth correctly
vec_get/vec_set have bounds checking
vec_destroy frees all memory
Valgrind shows no leaks

Verification

Unit tests pass
Invariant checker validates state
Edge cases handled (empty, single element, large)

Resources

Books

Topic	Book	Chapter
Dynamic Arrays	“C Programming: A Modern Approach” by K.N. King	Chapter 17
Invariants & Contracts	“C Interfaces and Implementations” by David Hanson	Chapter 1
Assertions	“Writing Solid Code” by Steve Maguire	Chapter 2
Memory Management	“Effective C” by Robert Seacord	Chapter 6
Amortized Analysis	“Introduction to Algorithms” by Cormen et al.	Chapter 17.4

Summary

This project teaches the fundamental discipline of invariant-based programming:

Invariants are contracts - they define what makes your data structure valid
Ownership is explicit - one entity is responsible for freeing memory
Pointers have lifetimes - realloc invalidates all previous pointers
Bounds checking is non-negotiable - assert your preconditions
Testing proves correctness - Valgrind and unit tests verify your invariants

When you can build a dynamic array that never leaks, never corrupts, and clearly documents its contracts, you’ve internalized the discipline that separates reliable systems code from fragile code.

Next Project: P02: Text Editor Buffer (Gap Buffer) - Apply invariants to a more complex data structure with elegant properties.

Project 1: Safe Dynamic Array Library (vec.h)