Project 1: Safe Dynamic Array Library (vec.h)

Build a reusable, rigorously-specified dynamic array library in C with invariants you can prove and tests that enforce them.

Quick Reference

Attribute Value
Difficulty Intermediate
Time Estimate 1-2 weeks
Main Programming Language C (Alternatives: Rust, Zig, C++)
Alternative Programming Languages Rust, Zig, C++
Coolness Level Level 2 (Practical and Useful)
Business Potential Level 2 (Reusable library)
Prerequisites Pointers, malloc/free, structs, basic testing
Key Topics Invariants, ownership, realloc safety, amortized O(1)

1. Learning Objectives

By completing this project, you will:

  1. Define and enforce representation and behavioral invariants for a C data structure.
  2. Implement safe growth using realloc without losing the old buffer on failure.
  3. Design an API that prevents (or loudly detects) invalid usage.
  4. Explain and justify amortized O(1) append in a dynamic array.
  5. Write tests that validate internal invariants after every operation.

2. All Theory Needed (Per-Concept Breakdown)

This section includes every concept required to implement a safe dynamic array in C.

2.1 Representation Invariants and Contracts

Fundamentals

A representation invariant is a rule about the internal state that must always be true at stable points such as function entry, function exit, and loop boundaries. In a dynamic array, you must be able to state and check that length never exceeds capacity, that the data pointer is either NULL or points to a buffer of at least capacity elements, and that the first length elements are initialized. These are not optional; they are the definition of correctness for the structure. Contracts extend this idea to API boundaries by describing what callers must guarantee (preconditions) and what the function promises in return (postconditions). In C, these contracts are enforced by discipline, documentation, and explicit validation. Without them, your code can appear to work but still be undefined behavior. A safe dynamic array is mostly a contract-engineering exercise.

Deep Dive into the Concept

Representation invariants are the only guardrails in a language that does not check bounds for you. Think of a dynamic array as a struct that describes a memory region. The memory region is not self-describing; it becomes meaningful only if the struct fields accurately describe it. If length is larger than capacity, the struct is lying, and any function that trusts the struct is at risk of accessing unallocated memory. If data is NULL while capacity is non-zero, the struct claims memory exists when it does not. If length is smaller than the number of initialized elements, you are losing data; if it is larger, you may read uninitialized bytes. These invariants are both structural and behavioral: structural invariants define the shape of the data, while behavioral invariants define how operations relate to each other, such as “push followed by pop returns the original element”.

In C, you do not get automatic enforcement. That means you must build a mental model of stable points. A stable point is any moment when other code is allowed to observe your structure: at function boundaries, before and after mutation loops, and at error returns. Inside a function, you can temporarily break invariants while rearranging memory, but you must restore them before any observable point. This suggests a programming style: validate at entry, mutate with an internal plan, then validate before exit. If you find that your function cannot restore invariants in all paths, the design is flawed.

A good contract includes error handling. When a dynamic array operation fails, you need to decide which invariants remain true. A safe design rule is “fail without side effects”: if vec_push fails, the array remains unchanged and valid. This is critical because many functions compose operations. If vec_push partially mutates state on failure, every caller becomes more complex, and invariants become fragile. Another contract rule is to separate internal and external mutability. If you expose the struct fields publicly, any caller can break invariants. That may be acceptable in a trusted codebase, but it increases risk. A safer design is to keep the struct private or at least document that fields must not be mutated by callers.

Contracts also include bounds and index validity. If your API allows vec_get(vec, i) with no checks, you are delegating safety to every caller. You can expose both checked and unchecked functions, but you must be explicit about it. For example, vec_get can assert in debug builds while vec_get_unchecked can skip checks. The invariant is not only about length <= capacity but also about the promise that valid indices are [0, length). If any function accepts an index, it must either check or clearly state that the caller is responsible.

Loop invariants are a special case that matter when resizing or shifting elements. When you copy elements to a new buffer, the loop invariant might be “all elements before index i have been copied correctly, and the destination buffer is large enough for the remaining elements.” This ensures you do not read or write out of bounds. In C, a loop that violates its invariant can corrupt memory without immediate failure, making debugging extremely painful.

The best way to enforce invariants is to make them executable. Write a vec_validate function that asserts internal rules and call it at entry and exit in debug builds. Pair this with tests that intentionally break invariants to prove the validator catches them. This approach turns informal reasoning into a concrete safety net. You should also document invariants in the header file so that users know the rules of engagement. Over time, these invariants become a small specification for the data structure, and tests become proofs that the implementation satisfies that specification.

How this fits on projects

You will apply invariants in every API function (vec_push, vec_pop, vec_insert, vec_remove, vec_reserve) and in your test suite. The project is successful only if the invariants are clearly stated, easy to check, and always preserved.

Definitions & key terms

  • Representation invariant: A rule about the internal structure that must always hold.
  • Behavioral invariant: A rule about the observable behavior across operations.
  • Precondition: What must be true before a function is called.
  • Postcondition: What must be true after a function returns.
  • Stable point: A point where invariants must hold (function entry/exit, loop boundaries).

Mental model diagram (ASCII)

             Stable Points and Invariants

   [call] -> [mutate] -> [return]
     |         |          |
   preconds   (temp)    postconds

Invariant is allowed to be false only inside mutate.

How it works (step-by-step, with invariants and failure modes)

  1. At function entry, validate the structure (length <= capacity, pointer validity).
  2. Check preconditions (index bounds, non-null vector pointer).
  3. Perform mutation steps in an order that does not expose inconsistent state.
  4. If a step fails, roll back or exit without changing the vector.
  5. At function exit, validate invariants again.

Failure modes: out-of-bounds access if index checks are missing, stale length if you update length before allocation succeeds, and memory leaks if you lose track of ownership on errors.

Minimal concrete example

static void vec_validate(const Vec *v) {
    assert(v != NULL);
    assert(v->capacity == 0 || v->data != NULL);
    assert(v->length <= v->capacity);
}

Common misconceptions

  • “If it passes tests, the invariants must hold.” (Bugs can be data-dependent.)
  • “Invariants slow code.” (Use them in debug builds; they save time overall.)
  • “I can fix invariants later.” (Designing without invariants leads to brittle APIs.)

Check-your-understanding questions

  1. What is the difference between a representation invariant and a behavioral invariant?
  2. Why are invariants allowed to be temporarily false during mutation?
  3. If vec_push fails, what should the state of the vector be?

Check-your-understanding answers

  1. Representation invariants define structural correctness; behavioral invariants define operation semantics.
  2. Because the function body is not a stable point; only entry/exit and loop boundaries are.
  3. It should remain unchanged and valid.

Real-world applications

  • Standard libraries (e.g., std::vector) rely on similar invariants.
  • Database page structures have strict invariants for offsets and lengths.
  • Memory allocators validate free lists with invariants to avoid corruption.

Where you will apply it

References

  • “C Interfaces and Implementations” by David Hanson, Ch. 4-5.
  • “Code Complete” by Steve McConnell, Ch. 8.

Key insights

Invariants are the real type system of C; without them, your structure has no meaning.

Summary

A safe dynamic array is defined by its invariants. Once you can state them, you can enforce them, test them, and trust the implementation.

Homework/Exercises to practice the concept

  1. Write invariants for a stack and a queue.
  2. Write a validate() function for each structure.
  3. Create a failing test that violates one invariant and confirm the validator catches it.

Solutions to the homework/exercises

  1. Stack: top <= capacity, data != NULL if capacity > 0. Queue: size <= capacity, head and tail in bounds.
  2. Use asserts on each rule, then call the function at entry and exit of operations.
  3. Force length > capacity in a test and verify an assertion triggers.

2.2 Ownership, Lifetimes, and realloc Safety

Fundamentals

Ownership answers a simple question: who is responsible for freeing memory? In a dynamic array, the vector owns the buffer and is solely responsible for freeing it. Lifetimes describe when a pointer is valid: a pointer is valid only from the moment the memory is allocated until it is freed or reallocated. The realloc function complicates this because it may move memory, invalidating all previous pointers into the array. If you store or return raw element pointers, you must document that they become invalid after any operation that may resize the buffer. Understanding ownership and lifetime is the difference between a safe API and a time bomb.

Deep Dive into the Concept

Ownership in C is contractual, not enforced by the compiler. A safe dynamic array must declare that it owns its internal buffer and that callers must not free it directly. This is a simple rule, but it has many consequences. If you allow callers to access the buffer directly, they might store element pointers and assume they stay valid. That assumption is false whenever the buffer grows. Therefore, your API should either avoid exposing raw pointers or be explicit about the invalidation rules. A common design is to provide vec_data for read-only access with a warning: any mutation that might grow the vector invalidates the returned pointer. Alternatively, you can provide a “frozen” view that only exists as long as the vector is not mutated.

Lifetimes are closely tied to error handling. When realloc fails, the old buffer remains valid, but the returned pointer is NULL. If you assign directly to vec->data before checking, you will lose the only reference to the old buffer and leak memory. The correct pattern is to use a temporary pointer, verify success, then update the vector fields. This pattern is not optional; it is the difference between correct and incorrect code. Additionally, you must decide how to handle realloc failure in the API. For example, vec_push should return a boolean or error code and leave the vector unchanged if growth fails. This ensures that ownership and invariants remain intact.

Element lifetimes are another subtlety. The vector controls the lifetime of its elements, but only for the memory container. If the elements are pointers to other allocations, ownership becomes layered: does the vector own the pointed-to data, or does it merely store references? You must define this in the API. A generic vector might be “shallow” (it only owns the buffer), while a specialized vector might be “deep” (it owns the elements and frees them on destruction). This choice affects how you design vec_destroy and how you test for leaks. A good approach is to allow the caller to provide an optional destructor function for elements.

Realloc safety also includes pointer invalidation and aliasing. Any pointer to vec->data[i] is not safe across operations that might resize. Even if realloc returns the same address, you cannot rely on that. Therefore, APIs that expose element pointers must define invalidation rules clearly. Some libraries prevent this by using indices instead of pointers. Indices are stable across realloc as long as the logical elements remain in order. That is why vec_get should return values or copies rather than raw pointers unless explicitly requested.

Ownership is not just about memory. If your vector uses a file handle, mutex, or other resource (e.g., for persistence), then you must define ownership for those resources too. Even though this project focuses on memory, the same rules apply: there must be one responsible owner, and any transfer must be documented. This mental model scales to more complex systems, which is why it is foundational.

A robust way to test ownership is to deliberately stress error paths. Force realloc to fail by using a very small memory limit or by injecting a failing allocator, then verify that there are no leaks and that the vector remains valid. This is a realistic test: many bugs only appear on allocation failures. When you can handle those failures gracefully, you can trust your ownership model.

How this fits on projects

Ownership and lifetime rules determine the design of vec_create, vec_destroy, vec_reserve, and all functions that return pointers. They also inform the testing strategy: you must test realloc failures and ensure the original buffer is still valid.

Definitions & key terms

  • Ownership: Responsibility for freeing a resource.
  • Lifetime: The time interval when a pointer is valid.
  • Reallocation: Resizing a memory block, which may move it.
  • Aliasing: Two pointers referring to the same memory.

Mental model diagram (ASCII)

Ownership Flow

caller --> vec_create() -> vec owns buffer
caller <-- vec_destroy() frees buffer

realloc may move buffer:
old_ptr ----X (invalid)
new_ptr ----> valid

How it works (step-by-step, with invariants and failure modes)

  1. Vector allocates buffer and becomes sole owner.
  2. Mutations may call realloc to grow.
  3. If realloc fails, keep old buffer and return error.
  4. If realloc succeeds, update pointer and capacity.
  5. Any previously obtained element pointer is now invalid.

Failure modes: double free if caller frees buffer, memory leak if you overwrite pointer on realloc failure, use-after-free if caller keeps element pointers.

Minimal concrete example

int *tmp = realloc(vec->data, new_cap * sizeof(int));
if (!tmp) return false; // old buffer still valid
vec->data = tmp;
vec->capacity = new_cap;

Common misconceptions

  • “realloc always moves memory.” (It may or may not.)
  • “If I only grow a little, old pointers remain valid.” (Not guaranteed.)
  • “Ownership is obvious.” (It is not; it must be documented.)

Check-your-understanding questions

  1. Why must you use a temporary pointer with realloc?
  2. What happens to element pointers after a resize?
  3. If the vector stores pointers, who frees them?

Check-your-understanding answers

  1. To avoid losing the original buffer on failure.
  2. They are invalid because the buffer may move.
  3. It depends on the API; you must define whether the vector owns them.

Real-world applications

  • Custom allocators, pools, and arenas use strict ownership rules.
  • GUI toolkits often return borrowed pointers with explicit lifetimes.
  • Network stacks pass ownership of buffers between layers.

Where you will apply it

References

  • “Effective C” by Robert Seacord, Ch. 4-6.
  • “Expert C Programming” by Peter van der Linden, Ch. 3.

Key insights

If you cannot state who owns a pointer and how long it lives, your program is already unsafe.

Summary

Ownership and lifetimes turn raw pointers into manageable resources. In a dynamic array, the vector owns the buffer, and realloc is the critical moment when lifetimes can be violated.

Homework/Exercises to practice the concept

  1. Write a small program that calls realloc incorrectly and observe the leak.
  2. Add a destructor callback to a vector of pointers and free all elements.
  3. Document ownership rules for a hypothetical vec_view function.

Solutions to the homework/exercises

  1. Assign vec->data = realloc(...) directly and simulate failure to see the leak.
  2. Store function pointers for element cleanup and call them in vec_destroy.
  3. State that vec_view returns a borrowed pointer valid until the next mutation.

2.3 Amortized Analysis and Growth Strategy

Fundamentals

A dynamic array grows by allocating more memory than immediately needed so that most pushes are O(1) and only occasional resizes are O(n). This is called amortized analysis. The fundamental idea is that expensive operations are rare and their cost is spread across many cheap operations. If you double the capacity each time you grow, the total work of copying elements across many pushes is linear in the number of elements inserted. This gives an average, or amortized, cost of O(1) per push. Without a sound growth strategy, you may end up reallocating on every push, making the structure slow and unstable.

Deep Dive into the Concept

Amortized analysis is a way to analyze algorithms that have occasional expensive steps. For a dynamic array, the expensive step is resizing the buffer and copying elements. Suppose you start with capacity 1 and double whenever you run out of space. When you insert the 1st element, you allocate capacity 1. Insert the 2nd element, you grow to 2 and copy 1 element. Insert the 3rd element, you grow to 4 and copy 2 elements. Insert the 5th, you grow to 8 and copy 4 elements. The total number of copies after inserting n elements is less than 2n. That means the average cost per insert is constant, even though some inserts are expensive. This is why doubling (or 1.5x growth) is common in standard libraries.

The growth factor affects memory overhead and performance. Doubling uses more memory but fewer resizes; a smaller factor uses less memory but causes more resizes. The choice depends on your constraints. For a systems programming library, a doubling strategy is common because it minimizes realloc calls, which can be expensive and can fragment memory. However, in memory-constrained environments, you may choose 1.5x or a fixed increment. The key is to document the strategy and ensure it provides bounded complexity.

You must also consider integer overflow when calculating the new capacity. If capacity is large, doubling may overflow size_t, leading to small allocations and buffer overflows. A safe implementation uses a checked multiplication and returns an error if the requested size is too large. This is part of your invariants: capacity must always represent the actual allocated size. If you overflow, your structure lies.

Another amortized consideration is the cost of shrinking. Some vectors shrink when the size drops below a threshold (e.g., one quarter of capacity). This reduces memory but introduces more reallocations if the size fluctuates around the threshold. A common strategy is to avoid automatic shrinking and instead provide an explicit vec_shrink_to_fit function. This gives the caller control and keeps complexity predictable.

Amortized analysis also impacts API guarantees. If you claim that vec_push is O(1) amortized, you are making a contract with users. It means that in a long sequence of pushes, the total work grows linearly with the number of elements. If you implement growth incorrectly (e.g., adding a constant each time), the total work becomes quadratic and the contract is broken. This matters in real code: a poor growth strategy can turn a seemingly simple algorithm into a performance disaster.

Finally, amortized analysis connects to invariants. The growth strategy must maintain invariants before and after resizing: the old contents must be copied correctly, the new capacity must be correct, and the vector must remain valid in case of failure. Amortized analysis tells you how often resizing happens; invariants tell you how to do it safely.

A practical way to internalize amortized analysis is to measure total copy work. Add a counter that increments by the number of elements copied during each resize, then run a long sequence of pushes. You will observe that the counter grows roughly linearly with the final size. This experiment makes the proof tangible and reinforces why exponential growth is the standard choice in real libraries.

How this fits on projects

You will define and justify a growth strategy in vec_reserve and vec_push. This strategy will influence performance, memory overhead, and the correctness of realloc handling.

Definitions & key terms

  • Amortized cost: Average cost per operation across a sequence.
  • Growth factor: The multiplier used when expanding capacity.
  • Resize: The operation that allocates a larger buffer and copies data.
  • Overflow: When arithmetic exceeds the maximum representable value.

Mental model diagram (ASCII)

Capacity growth (doubling)

cap: 1 -> 2 -> 4 -> 8 -> 16
push:1   2   3-4 5-8 9-16
copies:1   2    4    8  (total < 2n)

How it works (step-by-step, with invariants and failure modes)

  1. Check if length + 1 exceeds capacity.
  2. If yes, compute new capacity (e.g., max(1, capacity * 2)).
  3. Validate that new capacity does not overflow.
  4. Attempt to realloc to the new size.
  5. On success, update capacity and continue; on failure, return error and leave vector unchanged.

Failure modes: integer overflow causing tiny allocation, repeated small growth leading to O(n^2) behavior, and failure to handle realloc errors.

Minimal concrete example

size_t new_cap = vec->capacity ? vec->capacity * 2 : 1;
if (new_cap < vec->capacity) return false; // overflow check

Common misconceptions

  • “Doubling wastes too much memory.” (It is a trade-off; it reduces reallocations.)
  • “Resizing is always expensive.” (It is rare if growth is exponential.)
  • “Shrinking automatically is always better.” (It can cause thrashing.)

Check-your-understanding questions

  1. Why is doubling capacity an amortized O(1) strategy?
  2. What happens if you grow by +1 each time?
  3. Why must you check for overflow in capacity calculations?

Check-your-understanding answers

  1. Because total copy work over n inserts is O(n).
  2. You will reallocate on every insert, leading to O(n^2) total cost.
  3. Overflow can lead to allocating too little memory and out-of-bounds writes.

Real-world applications

  • std::vector uses exponential growth with an implementation-defined factor.
  • Dynamic strings in many libraries use doubling to avoid repeated reallocations.
  • Hash tables resize with a growth factor to maintain load factor invariants.

Where you will apply it

  • This project: See §3.2 Functional Requirements and §4.4 Algorithm Overview.
  • Also used in: P04 JSON Parser for dynamic buffers.

References

  • “Algorithms” by Sedgewick and Wayne, dynamic array analysis sections.
  • “The Art of Computer Programming” Vol. 1, amortized analysis discussions.

Key insights

A correct growth strategy turns a potentially slow structure into a predictable one.

Summary

Amortized analysis justifies why occasional expensive resizes do not make the structure slow overall, as long as you choose a sound growth factor.

Homework/Exercises to practice the concept

  1. Compare growth strategies (x2 vs x1.5) in a small benchmark.
  2. Implement a checked multiplication helper for size_t.
  3. Add vec_shrink_to_fit and test its behavior with fluctuating sizes.

Solutions to the homework/exercises

  1. Measure total reallocations and time for each strategy and compare.
  2. Use division to detect overflow: if new_cap / factor != old_cap, overflow occurred.
  3. Shrink when length < capacity / 4 and verify it does not thrash in repeated push/pop.

3. Project Specification

3.1 What You Will Build

You will build a small, reusable C library that implements a dynamic array (Vec) with clear API contracts and invariant validation. The library will support creation, destruction, push/pop, insert/remove, bounds-checked access, reserve/shrink, and optional element destructor callbacks.

Included:

  • vec.h public API and vec.c implementation
  • Debug validation function callable from tests
  • Error-returning API for operations that can fail
  • Optional element destructor for deep cleanup

Excluded:

  • Thread safety
  • Concurrency or lock-free features
  • Generic macros for arbitrary types (you will implement a void*-based or int-based version)

3.2 Functional Requirements

  1. Create/Destroy: vec_create allocates a valid empty vector; vec_destroy frees all owned memory.
  2. Push/Pop: vec_push appends in amortized O(1) time; vec_pop removes the last element safely.
  3. Insert/Remove: Insert at index shifts elements; remove at index compacts the array.
  4. Bounds-Checked Access: vec_get and vec_set validate indices and return errors.
  5. Reserve/Shrink: vec_reserve ensures capacity; vec_shrink_to_fit optional.
  6. Validation: vec_validate checks invariants and is used in tests and debug builds.

3.3 Non-Functional Requirements

  • Performance: Push should be amortized O(1), insert/remove O(n).
  • Reliability: No memory leaks; safe behavior under allocation failure.
  • Usability: API should document ownership and invalidation rules clearly.

3.4 Example Usage / Output

Vec *v = vec_create(sizeof(int));
vec_push(v, &value);
int out;
vec_get(v, 0, &out);
vec_destroy(v, NULL);

3.5 Data Formats / Schemas / Protocols

Public API (example signatures):

typedef struct Vec Vec;

Vec *vec_create(size_t elem_size);
void vec_destroy(Vec *v, void (*elem_destructor)(void *));

bool vec_push(Vec *v, const void *elem);
bool vec_pop(Vec *v, void *out);

bool vec_get(const Vec *v, size_t idx, void *out);
bool vec_set(Vec *v, size_t idx, const void *elem);

bool vec_insert(Vec *v, size_t idx, const void *elem);
bool vec_remove(Vec *v, size_t idx, void *out);

bool vec_reserve(Vec *v, size_t new_cap);
bool vec_shrink_to_fit(Vec *v);

bool vec_validate(const Vec *v);

3.6 Edge Cases

  • vec_pop on empty vector returns false and does not mutate.
  • vec_get out of bounds returns false and leaves output untouched.
  • vec_reserve with smaller capacity does nothing.
  • realloc failure leaves vector unchanged.
  • Overflow when computing new capacity returns false.

3.7 Real World Outcome

This section is a golden reference for what correct behavior looks like.

3.7.1 How to Run (Copy/Paste)

make
./vec_demo
./vec_tests

3.7.2 Golden Path Demo (Deterministic)

Run with a fixed sequence of operations and print the internal state.

3.7.3 CLI Terminal Transcript (Exact)

$ ./vec_demo
[vec] create elem_size=4
[vec] push 10
[vec] push 20
[vec] insert idx=1 val=15
[vec] state len=3 cap=4 data=[10,15,20]
[vec] pop -> 20
[vec] state len=2 cap=4 data=[10,15]
[vec] destroy ok
exit_code=0

3.7.4 Failure Demo (Deterministic)

$ ./vec_demo --fail-alloc-at=2
[vec] create elem_size=4
[vec] push 10
[vec] push 20
[error] allocation failed during resize
[vec] state len=1 cap=1 data=[10]
exit_code=2

3.7.5 Exit Codes

  • 0: success
  • 2: allocation failure
  • 3: invariant violation detected
  • 4: invalid argument

4. Solution Architecture

4.1 High-Level Design

+-------------+     +------------------+     +------------------+
|  vec.h API  | --> | vec.c operations | --> | invariant checks |
+-------------+     +------------------+     +------------------+
         |                     |                      |
         v                     v                      v
     callers             internal buffer         debug validation

4.2 Key Components

Component Responsibility Key Decisions
Vec struct Stores length, capacity, element size, data pointer Keep fields private to enforce contracts
Growth logic Computes new capacity and reallocates Use doubling + overflow checks
Validation Asserts invariants Debug-only checks with optional runtime validation

4.3 Data Structures (No Full Code)

typedef struct {
    size_t length;
    size_t capacity;
    size_t elem_size;
    unsigned char *data;
} Vec;

4.4 Algorithm Overview

Key Algorithm: Push with Growth

  1. Validate vector at entry.
  2. If length == capacity, compute new capacity and realloc.
  3. Copy element bytes into data + length * elem_size.
  4. Increment length.
  5. Validate vector at exit.

Complexity Analysis:

  • Time: O(1) amortized for push, O(n) for insert/remove.
  • Space: O(n) with growth overhead.

5. Implementation Guide

5.1 Development Environment Setup

cc --version
make --version

5.2 Project Structure

vec/
├── include/vec.h
├── src/vec.c
├── tests/vec_test.c
├── examples/vec_demo.c
├── Makefile
└── README.md

5.3 The Core Question You’re Answering

“How do I build a resizable array that is correct in the face of memory failures and pointer invalidation?”

5.4 Concepts You Must Understand First

Stop and verify you can explain these before coding:

  1. Invariants and stable points.
  2. Ownership and lifetime rules for heap memory.
  3. Realloc semantics and pointer invalidation.
  4. Amortized analysis and growth factors.

5.5 Questions to Guide Your Design

  1. How will you prevent callers from breaking invariants?
  2. What is the behavior when allocation fails?
  3. Which API functions return borrowed pointers, if any?
  4. How will you prove that insert/remove maintain ordering?

5.6 Thinking Exercise

Trace this sequence and verify invariants after each step:

Vec *v = vec_create(sizeof(int));
vec_push(v, &a);
vec_push(v, &b);
vec_insert(v, 0, &c);
vec_remove(v, 1, NULL);

5.7 The Interview Questions They’ll Ask

  1. Why is realloc dangerous and how do you use it safely?
  2. What invariants define a dynamic array?
  3. What is amortized O(1) and why does it matter here?

5.8 Hints in Layers

Hint 1: Start with vec_validate() and call it everywhere in debug. Hint 2: Use a temp pointer with realloc to avoid losing the old buffer. Hint 3: Keep element access in helper functions to centralize bounds checks.

5.9 Books That Will Help

| Topic | Book | Chapter | |——|——|———| | Invariants and contracts | “Code Complete” | Ch. 8 | | Dynamic arrays | “C Interfaces and Implementations” | Ch. 4-5 | | Memory safety | “Effective C” | Ch. 4-6 |

5.10 Implementation Phases

Phase 1: Foundation (1-2 days)

Goals: Define API, struct, and invariants. Tasks:

  1. Write vec.h with documented contracts.
  2. Implement vec_create and vec_destroy.
  3. Add vec_validate and unit tests for invariants. Checkpoint: Create/destroy works with no leaks.

Phase 2: Core Functionality (3-5 days)

Goals: Push/pop, insert/remove, reserve. Tasks:

  1. Implement growth strategy with overflow checks.
  2. Implement push/pop and test edge cases.
  3. Implement insert/remove with memmove and tests. Checkpoint: Tests cover all operations and invariants hold.

Phase 3: Polish & Edge Cases (2-3 days)

Goals: Error paths, docs, examples. Tasks:

  1. Add failure injection for realloc.
  2. Write example CLI demo.
  3. Document pointer invalidation rules. Checkpoint: Failure demo behaves deterministically.

5.11 Key Implementation Decisions

| Decision | Options | Recommendation | Rationale | |———-|———|—————-|———–| | Growth factor | 1.5x, 2x | 2x | Fewer reallocations, simpler analysis | | Public struct fields | Expose or hide | Hide | Prevent invariant violations | | Element ownership | Shallow or deep | Shallow + destructor callback | Flexible and explicit |


6. Testing Strategy

6.1 Test Categories

| Category | Purpose | Examples | |———|———|———-| | Unit Tests | Validate single operations | push/pop, insert/remove | | Invariant Tests | Assert internal correctness | validate after every op | | Failure Tests | Ensure safe behavior on errors | realloc failure injection |

6.2 Critical Test Cases

  1. Push until multiple resizes: Ensure length and capacity stay correct.
  2. Insert at index 0 and end: Verify shifting logic.
  3. Realloc failure: Ensure vector unchanged and valid.

6.3 Test Data

Input sequence: push 1, push 2, push 3, insert 0->9, remove idx=2
Expected: [9,1,3]

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

| Pitfall | Symptom | Solution | |———|———|———-| | Overwriting pointer on realloc failure | Leak or crash | Use temp pointer | | Off-by-one in insert/remove | Corrupted data | Use memmove and test boundaries | | Exposing struct fields | Invariant violations | Make struct opaque |

7.2 Debugging Strategies

  • Use AddressSanitizer to detect out-of-bounds.
  • Add invariant checks after every mutation in tests.

7.3 Performance Traps

Repeated small growth (e.g., +1) causes O(n^2) behavior. Use exponential growth.


8. Extensions & Challenges

8.1 Beginner Extensions

  • Add vec_clear to reset length without freeing.
  • Add vec_clone that copies a vector.

8.2 Intermediate Extensions

  • Add vec_sort with user-provided comparator.
  • Add vec_binary_search for sorted vectors.

8.3 Advanced Extensions

  • Add a small-object optimization (store small arrays inline).
  • Add a debug allocator for deterministic failure injection.

9. Real-World Connections

9.1 Industry Applications

  • Used in game engines for entity lists.
  • Used in compilers for token streams and AST nodes.
  • glib GArray: dynamic array with similar invariants.
  • stb_ds stretchy buffer: lightweight dynamic array pattern.

9.3 Interview Relevance

  • Dynamic array invariants are a standard interview topic.
  • Amortized analysis is commonly discussed in complexity questions.

10. Resources

10.1 Essential Reading

  • “C Interfaces and Implementations” by David Hanson - dynamic array design.
  • “Effective C” by Robert Seacord - memory safety and ownership.

10.2 Video Resources

  • “Amortized Analysis” lectures (MIT OCW).

10.3 Tools & Documentation

  • Valgrind Memcheck - memory error detection.
  • LLVM AddressSanitizer docs.

11. Self-Assessment Checklist

11.1 Understanding

  • I can state all invariants for the vector.
  • I can explain why realloc invalidates pointers.
  • I can justify the growth factor choice.

11.2 Implementation

  • All operations pass tests and invariant checks.
  • Failure paths do not leak memory.
  • Example demo output matches the golden transcript.

11.3 Growth

  • I can explain this project in an interview.
  • I documented lessons learned.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • vec.h and vec.c compile cleanly with warnings enabled.
  • All tests pass.
  • Invariants validated at entry/exit in debug builds.

Full Completion:

  • All minimum criteria plus failure injection tests.
  • Documented ownership rules and pointer invalidation.

Excellence (Going Above & Beyond):

  • Benchmarks comparing growth factors.
  • Extended API with optional destructor callbacks.

13. Additional Content Rules (Hard Requirements)

13.1 Determinism

All demos use a fixed input sequence and deterministic output. Allocation failure is injected at a fixed operation count.

13.2 Outcome Completeness

  • Includes a success demo and a failure demo.
  • Exit codes are specified in §3.7.5.

13.3 Cross-Linking

13.4 No Placeholder Text

All content is complete and explicit.