Project 1: Safe Dynamic Array Library (`vec.h`)

Project 1: Safe Dynamic Array Library (vec.h)

Sprint: 2 - Data & Invariants Difficulty: Intermediate Time Estimate: 1 week Prerequisites: Basic C, pointers, malloc/free basics


Overview

What youโ€™ll build: A reusable, memory-safe dynamic array library in C with documented invariants, bounds checking, and clear ownership semanticsโ€”usable in other projects.

Why it teaches Data & Invariants: This is the canonical โ€œinvariant-heavyโ€ data structure. A dynamic array has at least 5 invariants that must hold simultaneously:

  1. capacity >= length
  2. data != NULL || capacity == 0
  3. length elements are initialized
  4. Only the owner may free data
  5. After realloc, all previous pointers are invalid

You cannot implement this correctly without thinking about invariants constantly.

The Core Question Youโ€™re Answering:

โ€œHow do you design a data structure so that itโ€™s IMPOSSIBLE to use incorrectlyโ€”or at least OBVIOUS when you do?โ€


Learning Objectives

By the end of this project, you will be able to:

  1. Define and enforce invariants using assertions and documentation
  2. Implement a resizable array with amortized O(1) append
  3. Handle ownership correctly (who allocates, who frees)
  4. Detect and prevent common bugs like use-after-realloc
  5. Verify memory safety using Valgrind
  6. Design APIs that make misuse difficult
  7. Document contracts in code comments

Theoretical Foundation

What is an Invariant?

An invariant is a condition that must always be true at specific points in your programโ€™s execution. Itโ€™s not a suggestionโ€”itโ€™s a contract that your code makes with itself.

INVARIANT: A condition that MUST be true
at all "stable points" in your program

Stable points:
โ€ข Before a function is called
โ€ข After a function returns
โ€ข At the start of a loop iteration
โ€ข After allocating/freeing memory

Dynamic Array Invariants

typedef struct {
    int *data;      // Pointer to heap-allocated array
    size_t length;  // Number of elements currently in use
    size_t capacity; // Total number of elements allocated
} Vec;

This structure has at least five invariants that must hold simultaneously:

INVARIANT SET for Vec:
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ 1. capacity >= length                                        โ”‚
โ”‚    (You can't use more space than you allocated)             โ”‚
โ”‚                                                               โ”‚
โ”‚ 2. data != NULL  OR  capacity == 0                           โ”‚
โ”‚    (Either we have allocated memory, or capacity is 0)       โ”‚
โ”‚                                                               โ”‚
โ”‚ 3. First 'length' elements at data[0..length-1] are          โ”‚
โ”‚    initialized and valid                                     โ”‚
โ”‚                                                               โ”‚
โ”‚ 4. Only ONE owner may call free(data)                        โ”‚
โ”‚    (Double-free is undefined behavior)                       โ”‚
โ”‚                                                               โ”‚
โ”‚ 5. After realloc(), ALL previous pointers to                 โ”‚
โ”‚    elements are INVALID                                      โ”‚
โ”‚    (The array may have moved)                                โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Ownership Pattern

OWNERSHIP PRINCIPLE:
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚  Every malloc() must have EXACTLY ONE free()    โ”‚
โ”‚  Every piece of memory has ONE responsible      โ”‚
โ”‚  entity that will free it                       โ”‚
โ”‚                                                 โ”‚
โ”‚  Owner = "The entity that calls free()"         โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Pattern: STRUCT OWNS ITS HEAP DATA
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚  Vec                     โ”‚
โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”  โ”‚
โ”‚  โ”‚ int *data   โ”€โ”€โ”€โ”   โ”‚  โ”‚
โ”‚  โ”‚ size_t length  โ”‚   โ”‚  โ”‚
โ”‚  โ”‚ size_t capacityโ”‚   โ”‚  โ”‚
โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”‚โ”€โ”€โ”€โ”˜  โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”‚โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                    โ–ผ
           โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
           โ”‚ Heap memory    โ”‚  โ† Vec OWNS this
           โ”‚ [1,2,3,4,5...] โ”‚  โ† vec_destroy() frees it
           โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Pointer Invalidation After Realloc

SCENARIO: Reallocation invalidates ALL previous pointers
โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
Vec v;
vec_init(&v);

vec_push(&v, 10);
int *first = &v.data[0];  // โ† first points into v.data
printf("%d\n", *first);   // โœ“ SAFE (currently valid)

vec_push(&v, 20);  // โ† This might realloc() v.data
                   // โ† v.data may have MOVED
                   // โ† first is now INVALID (dangling pointer)

printf("%d\n", *first);   // โœ— UNDEFINED BEHAVIOR
                          //   (first points to freed memory)

BEFORE realloc():
  v.data โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
                  โ–ผ
         โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
  Heap:  โ”‚ [10, ...empty...] โ”‚
         โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                  โ–ฒ
  first โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜   (VALID pointer)

AFTER vec_push() triggers realloc():
  v.data โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
                              โ–ผ
                     โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
  Heap:              โ”‚ [10, 20, ...empty........]  โ”‚
                     โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

         โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
         โ”‚ (freed memory)  โ”‚
         โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                  โ–ฒ
  first โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜   (INVALID - DANGLING!)

Project Specification

Core API

// Create a new vector with initial capacity
void vec_init(vec_t *v);
void vec_init_capacity(vec_t *v, size_t capacity);

// Destroy vector and free memory
void vec_destroy(vec_t *v);

// Add element to end (may trigger realloc)
void vec_push(vec_t *v, int value);

// Remove and return last element
int vec_pop(vec_t *v);

// Access element by index (with bounds checking)
int vec_get(vec_t *v, size_t index);
void vec_set(vec_t *v, size_t index, int value);

// Query size
size_t vec_length(vec_t *v);
size_t vec_capacity(vec_t *v);
bool vec_empty(vec_t *v);

// Remove element at index (shifts remaining)
void vec_remove(vec_t *v, size_t index);

// Clear all elements (keeps capacity)
void vec_clear(vec_t *v);

// Debug: check all invariants
bool vec_check_invariants(vec_t *v);

Expected Output

$ ./vec_demo

=== Dynamic Array (vec.h) Demonstration ===

[Test 1: Basic Creation and Push]
Creating vector with initial capacity 4...
Vector created: {data=0x7f9a42c05a00, length=0, capacity=4}
Invariant check: โœ“ capacity >= length (4 >= 0)
Invariant check: โœ“ data != NULL
Invariant check: โœ“ length within bounds

Pushing integers 0-9...
Push 0: capacity grew 4 โ†’ 8 (realloc triggered)
  Old pointer: 0x7f9a42c05a00 (now INVALID)
  New pointer: 0x7f9a42c05c00
Push 1: no realloc (8 > 1)
...
Push 8: capacity grew 8 โ†’ 16 (realloc triggered)

Final state: {length=10, capacity=16}
Contents: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

[Test 2: Memory Growth Pattern]
Starting with capacity=1, pushing 10,000 integers...
Capacity growth sequence:
  1 โ†’ 2 โ†’ 4 โ†’ 8 โ†’ 16 โ†’ 32 โ†’ 64 โ†’ 128 โ†’ 256 โ†’ 512 โ†’ 1024 โ†’ 2048 โ†’ 4096 โ†’ 8192 โ†’ 16384

Total reallocations: 14
Final capacity: 16384 (63.6% utilization)
Amortized cost analysis:
  Total push operations: 10,000
  Total realloc operations: 14
  Average cost per push: O(1) โœ“

[Test 3: Bounds Checking]
vec_get(vec, 5) = 5 โœ“
vec_get(vec, 9) = 9 โœ“
Attempting out-of-bounds access: vec_get(vec, 10)...
ASSERTION FAILED: index < vec->length

[Test 4: Cleanup and Valgrind]
Destroying vector...
Freed 16384 bytes

$ valgrind --leak-check=full ./vec_demo
==12345== All heap blocks were freed -- no leaks are possible
SUCCESS: Zero memory leaks โœ“

Solution Architecture

Data Structure

typedef struct {
    int *data;        // Pointer to heap-allocated array
    size_t length;    // Number of elements currently stored
    size_t capacity;  // Total slots allocated
} vec_t;

Key Algorithms

Push with Amortized O(1):

vec_push(vec, value):
    1. Check if length >= capacity
       - If yes: grow to capacity * 2 (or 1 if capacity == 0)
    2. Store value at data[length]
    3. Increment length

Growth Strategy:

  • Start with capacity 0 or small number
  • Double capacity when full
  • Amortized O(1) because total cost of n pushes is O(n)

Bounds Checking:

vec_get(vec, index):
    1. Assert: vec != NULL
    2. Assert: index < vec->length
    3. Return vec->data[index]

Implementation Guide

Phase 1: Basic Structure (1 hour)

typedef struct {
    int *data;
    size_t length;
    size_t capacity;
} vec_t;

void vec_init(vec_t *v) {
    v->data = NULL;
    v->length = 0;
    v->capacity = 0;
}

void vec_destroy(vec_t *v) {
    free(v->data);
    v->data = NULL;
    v->length = 0;
    v->capacity = 0;
}

Phase 2: Push and Grow (1 hour)

void vec_push(vec_t *v, int value) {
    if (v->length >= v->capacity) {
        size_t new_capacity = (v->capacity == 0) ? 1 : v->capacity * 2;
        int *new_data = realloc(v->data, new_capacity * sizeof(int));
        if (new_data == NULL) {
            // Handle allocation failure
            fprintf(stderr, "vec_push: allocation failed\n");
            abort();
        }
        v->data = new_data;
        v->capacity = new_capacity;
    }
    v->data[v->length] = value;
    v->length++;
}

Phase 3: Access and Bounds Checking (1 hour)

int vec_get(vec_t *v, size_t index) {
    assert(v != NULL);
    assert(index < v->length && "Index out of bounds");
    return v->data[index];
}

void vec_set(vec_t *v, size_t index, int value) {
    assert(v != NULL);
    assert(index < v->length && "Index out of bounds");
    v->data[index] = value;
}

Phase 4: Additional Operations (1-2 hours)

int vec_pop(vec_t *v) {
    assert(v != NULL);
    assert(v->length > 0 && "Cannot pop from empty vector");
    v->length--;
    return v->data[v->length];
}

void vec_remove(vec_t *v, size_t index) {
    assert(v != NULL);
    assert(index < v->length);

    // Shift elements left
    for (size_t i = index; i < v->length - 1; i++) {
        v->data[i] = v->data[i + 1];
    }
    v->length--;
}

void vec_clear(vec_t *v) {
    v->length = 0;
    // Keep capacity - don't free
}

Phase 5: Invariant Checking (1 hour)

bool vec_check_invariants(vec_t *v) {
    if (v == NULL) return false;

    // Invariant 1: capacity >= length
    if (v->capacity < v->length) {
        fprintf(stderr, "INVARIANT VIOLATION: capacity < length\n");
        return false;
    }

    // Invariant 2: data != NULL OR capacity == 0
    if (v->data == NULL && v->capacity != 0) {
        fprintf(stderr, "INVARIANT VIOLATION: data is NULL but capacity != 0\n");
        return false;
    }
    if (v->data != NULL && v->capacity == 0) {
        fprintf(stderr, "INVARIANT VIOLATION: data not NULL but capacity == 0\n");
        return false;
    }

    return true;
}

Testing Strategy

Unit Tests

void test_init_destroy() {
    vec_t v;
    vec_init(&v);
    assert(v.length == 0);
    assert(v.capacity == 0);
    assert(v.data == NULL);
    vec_destroy(&v);
    printf("test_init_destroy: PASSED\n");
}

void test_push_and_grow() {
    vec_t v;
    vec_init(&v);

    for (int i = 0; i < 1000; i++) {
        vec_push(&v, i);
        assert(v.length == (size_t)(i + 1));
        assert(v.capacity >= v.length);
    }

    for (int i = 0; i < 1000; i++) {
        assert(vec_get(&v, i) == i);
    }

    vec_destroy(&v);
    printf("test_push_and_grow: PASSED\n");
}

void test_bounds_checking() {
    vec_t v;
    vec_init(&v);
    vec_push(&v, 42);

    // Valid access
    assert(vec_get(&v, 0) == 42);

    // This should trigger assertion failure in debug mode
    // vec_get(&v, 1);  // Would abort

    vec_destroy(&v);
    printf("test_bounds_checking: PASSED\n");
}

Valgrind Verification

$ valgrind --leak-check=full --show-leak-kinds=all ./test_vec
==12345== All heap blocks were freed -- no leaks are possible

Common Pitfalls

Pitfall 1: Forgetting capacity == 0 Case

// WRONG: Infinite loop when capacity is 0
size_t new_capacity = v->capacity * 2;  // 0 * 2 = 0!

// CORRECT: Handle zero case
size_t new_capacity = (v->capacity == 0) ? 1 : v->capacity * 2;

Pitfall 2: Updating Capacity Before Checking Realloc

// WRONG: If realloc fails, capacity is wrong
v->capacity = new_capacity;
v->data = realloc(v->data, new_capacity * sizeof(int));
if (v->data == NULL) // Too late!

// CORRECT: Check realloc first
int *new_data = realloc(v->data, new_capacity * sizeof(int));
if (new_data == NULL) {
    // Handle error - original data still valid
    return ERROR;
}
v->data = new_data;
v->capacity = new_capacity;

Pitfall 3: Using Pointers After Push

// DANGEROUS: pointer may be invalidated
int *first = &v.data[0];
vec_push(&v, 42);  // May realloc!
*first = 10;  // UNDEFINED BEHAVIOR - first may be dangling

Pitfall 4: Double Free

vec_t v;
vec_init(&v);
vec_push(&v, 1);
vec_destroy(&v);
vec_destroy(&v);  // DOUBLE FREE - undefined behavior

// SOLUTION: Set to NULL after destroy
void vec_destroy(vec_t *v) {
    free(v->data);
    v->data = NULL;  // Prevents double-free issues
    v->length = 0;
    v->capacity = 0;
}

Interview Preparation

Common Questions

  1. โ€œWhat is the difference between length and capacity?โ€
    • Length: number of elements currently stored
    • Capacity: total slots allocated
    • Invariant: capacity >= length
  2. โ€œWhy double the capacity instead of adding a fixed amount?โ€
    • Amortized O(1) per push
    • Total cost for n pushes is O(n), not O(nยฒ)
    • Adding fixed amount (e.g., +10) would be O(n) per push amortized
  3. โ€œAfter calling vec_push(), why might pointers to elements become invalid?โ€
    • Realloc may move the entire array to a new location
    • Old pointers become dangling
    • Must re-fetch pointers after any operation that may realloc
  4. โ€œWhat happens if realloc returns NULL?โ€
    • Allocation failed, but original data is still valid
    • Must not update v->data until we know realloc succeeded
    • Should handle gracefully (return error, abort, etc.)
  5. โ€œShould vec_get return a pointer or a copy?โ€
    • Pointer: faster, but dangerous (invalidation)
    • Copy: safer, but slower for large types
    • Design choice depends on use case

Self-Assessment Checklist

Understanding

  • Can explain the 5 invariants without looking
  • Can explain why doubling gives amortized O(1)
  • Can describe what happens when realloc moves memory
  • Can explain ownership semantics for the vector

Implementation

  • vec_init works correctly
  • vec_push handles growth correctly
  • vec_get/vec_set have bounds checking
  • vec_destroy frees all memory
  • Valgrind shows no leaks

Verification

  • Unit tests pass
  • Invariant checker validates state
  • Edge cases handled (empty, single element, large)

Resources

Books

Topic Book Chapter
Dynamic Arrays โ€œC Programming: A Modern Approachโ€ by K.N. King Chapter 17
Invariants & Contracts โ€œC Interfaces and Implementationsโ€ by David Hanson Chapter 1
Assertions โ€œWriting Solid Codeโ€ by Steve Maguire Chapter 2
Memory Management โ€œEffective Cโ€ by Robert Seacord Chapter 6
Amortized Analysis โ€œIntroduction to Algorithmsโ€ by Cormen et al. Chapter 17.4

Summary

This project teaches the fundamental discipline of invariant-based programming:

  1. Invariants are contracts - they define what makes your data structure valid
  2. Ownership is explicit - one entity is responsible for freeing memory
  3. Pointers have lifetimes - realloc invalidates all previous pointers
  4. Bounds checking is non-negotiable - assert your preconditions
  5. Testing proves correctness - Valgrind and unit tests verify your invariants

When you can build a dynamic array that never leaks, never corrupts, and clearly documents its contracts, youโ€™ve internalized the discipline that separates reliable systems code from fragile code.


Next Project: P02: Text Editor Buffer (Gap Buffer) - Apply invariants to a more complex data structure with elegant properties.