Project 1: Safe Dynamic Array Library (vec.h)
Build a reusable, rigorously-specified dynamic array library in C with invariants you can prove and tests that enforce them.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Intermediate |
| Time Estimate | 1-2 weeks |
| Main Programming Language | C (Alternatives: Rust, Zig, C++) |
| Alternative Programming Languages | Rust, Zig, C++ |
| Coolness Level | Level 2 (Practical and Useful) |
| Business Potential | Level 2 (Reusable library) |
| Prerequisites | Pointers, malloc/free, structs, basic testing |
| Key Topics | Invariants, ownership, realloc safety, amortized O(1) |
1. Learning Objectives
By completing this project, you will:
- Define and enforce representation and behavioral invariants for a C data structure.
- Implement safe growth using
reallocwithout losing the old buffer on failure. - Design an API that prevents (or loudly detects) invalid usage.
- Explain and justify amortized O(1) append in a dynamic array.
- Write tests that validate internal invariants after every operation.
2. All Theory Needed (Per-Concept Breakdown)
This section includes every concept required to implement a safe dynamic array in C.
2.1 Representation Invariants and Contracts
Fundamentals
A representation invariant is a rule about the internal state that must always be true at stable points such as function entry, function exit, and loop boundaries. In a dynamic array, you must be able to state and check that length never exceeds capacity, that the data pointer is either NULL or points to a buffer of at least capacity elements, and that the first length elements are initialized. These are not optional; they are the definition of correctness for the structure. Contracts extend this idea to API boundaries by describing what callers must guarantee (preconditions) and what the function promises in return (postconditions). In C, these contracts are enforced by discipline, documentation, and explicit validation. Without them, your code can appear to work but still be undefined behavior. A safe dynamic array is mostly a contract-engineering exercise.
Deep Dive into the Concept
Representation invariants are the only guardrails in a language that does not check bounds for you. Think of a dynamic array as a struct that describes a memory region. The memory region is not self-describing; it becomes meaningful only if the struct fields accurately describe it. If length is larger than capacity, the struct is lying, and any function that trusts the struct is at risk of accessing unallocated memory. If data is NULL while capacity is non-zero, the struct claims memory exists when it does not. If length is smaller than the number of initialized elements, you are losing data; if it is larger, you may read uninitialized bytes. These invariants are both structural and behavioral: structural invariants define the shape of the data, while behavioral invariants define how operations relate to each other, such as “push followed by pop returns the original element”.
In C, you do not get automatic enforcement. That means you must build a mental model of stable points. A stable point is any moment when other code is allowed to observe your structure: at function boundaries, before and after mutation loops, and at error returns. Inside a function, you can temporarily break invariants while rearranging memory, but you must restore them before any observable point. This suggests a programming style: validate at entry, mutate with an internal plan, then validate before exit. If you find that your function cannot restore invariants in all paths, the design is flawed.
A good contract includes error handling. When a dynamic array operation fails, you need to decide which invariants remain true. A safe design rule is “fail without side effects”: if vec_push fails, the array remains unchanged and valid. This is critical because many functions compose operations. If vec_push partially mutates state on failure, every caller becomes more complex, and invariants become fragile. Another contract rule is to separate internal and external mutability. If you expose the struct fields publicly, any caller can break invariants. That may be acceptable in a trusted codebase, but it increases risk. A safer design is to keep the struct private or at least document that fields must not be mutated by callers.
Contracts also include bounds and index validity. If your API allows vec_get(vec, i) with no checks, you are delegating safety to every caller. You can expose both checked and unchecked functions, but you must be explicit about it. For example, vec_get can assert in debug builds while vec_get_unchecked can skip checks. The invariant is not only about length <= capacity but also about the promise that valid indices are [0, length). If any function accepts an index, it must either check or clearly state that the caller is responsible.
Loop invariants are a special case that matter when resizing or shifting elements. When you copy elements to a new buffer, the loop invariant might be “all elements before index i have been copied correctly, and the destination buffer is large enough for the remaining elements.” This ensures you do not read or write out of bounds. In C, a loop that violates its invariant can corrupt memory without immediate failure, making debugging extremely painful.
The best way to enforce invariants is to make them executable. Write a vec_validate function that asserts internal rules and call it at entry and exit in debug builds. Pair this with tests that intentionally break invariants to prove the validator catches them. This approach turns informal reasoning into a concrete safety net. You should also document invariants in the header file so that users know the rules of engagement. Over time, these invariants become a small specification for the data structure, and tests become proofs that the implementation satisfies that specification.
How this fits on projects
You will apply invariants in every API function (vec_push, vec_pop, vec_insert, vec_remove, vec_reserve) and in your test suite. The project is successful only if the invariants are clearly stated, easy to check, and always preserved.
Definitions & key terms
- Representation invariant: A rule about the internal structure that must always hold.
- Behavioral invariant: A rule about the observable behavior across operations.
- Precondition: What must be true before a function is called.
- Postcondition: What must be true after a function returns.
- Stable point: A point where invariants must hold (function entry/exit, loop boundaries).
Mental model diagram (ASCII)
Stable Points and Invariants
[call] -> [mutate] -> [return]
| | |
preconds (temp) postconds
Invariant is allowed to be false only inside mutate.
How it works (step-by-step, with invariants and failure modes)
- At function entry, validate the structure (
length <= capacity, pointer validity). - Check preconditions (index bounds, non-null vector pointer).
- Perform mutation steps in an order that does not expose inconsistent state.
- If a step fails, roll back or exit without changing the vector.
- At function exit, validate invariants again.
Failure modes: out-of-bounds access if index checks are missing, stale length if you update length before allocation succeeds, and memory leaks if you lose track of ownership on errors.
Minimal concrete example
static void vec_validate(const Vec *v) {
assert(v != NULL);
assert(v->capacity == 0 || v->data != NULL);
assert(v->length <= v->capacity);
}
Common misconceptions
- “If it passes tests, the invariants must hold.” (Bugs can be data-dependent.)
- “Invariants slow code.” (Use them in debug builds; they save time overall.)
- “I can fix invariants later.” (Designing without invariants leads to brittle APIs.)
Check-your-understanding questions
- What is the difference between a representation invariant and a behavioral invariant?
- Why are invariants allowed to be temporarily false during mutation?
- If
vec_pushfails, what should the state of the vector be?
Check-your-understanding answers
- Representation invariants define structural correctness; behavioral invariants define operation semantics.
- Because the function body is not a stable point; only entry/exit and loop boundaries are.
- It should remain unchanged and valid.
Real-world applications
- Standard libraries (e.g.,
std::vector) rely on similar invariants. - Database page structures have strict invariants for offsets and lengths.
- Memory allocators validate free lists with invariants to avoid corruption.
Where you will apply it
- This project: See §3.2 Functional Requirements and §5.10 Implementation Phases.
- Also used in: P02 Gap Buffer, P03 Memory Arena.
References
- “C Interfaces and Implementations” by David Hanson, Ch. 4-5.
- “Code Complete” by Steve McConnell, Ch. 8.
Key insights
Invariants are the real type system of C; without them, your structure has no meaning.
Summary
A safe dynamic array is defined by its invariants. Once you can state them, you can enforce them, test them, and trust the implementation.
Homework/Exercises to practice the concept
- Write invariants for a stack and a queue.
- Write a
validate()function for each structure. - Create a failing test that violates one invariant and confirm the validator catches it.
Solutions to the homework/exercises
- Stack:
top <= capacity,data != NULLif capacity > 0. Queue:size <= capacity,headandtailin bounds. - Use asserts on each rule, then call the function at entry and exit of operations.
- Force
length > capacityin a test and verify an assertion triggers.
2.2 Ownership, Lifetimes, and realloc Safety
Fundamentals
Ownership answers a simple question: who is responsible for freeing memory? In a dynamic array, the vector owns the buffer and is solely responsible for freeing it. Lifetimes describe when a pointer is valid: a pointer is valid only from the moment the memory is allocated until it is freed or reallocated. The realloc function complicates this because it may move memory, invalidating all previous pointers into the array. If you store or return raw element pointers, you must document that they become invalid after any operation that may resize the buffer. Understanding ownership and lifetime is the difference between a safe API and a time bomb.
Deep Dive into the Concept
Ownership in C is contractual, not enforced by the compiler. A safe dynamic array must declare that it owns its internal buffer and that callers must not free it directly. This is a simple rule, but it has many consequences. If you allow callers to access the buffer directly, they might store element pointers and assume they stay valid. That assumption is false whenever the buffer grows. Therefore, your API should either avoid exposing raw pointers or be explicit about the invalidation rules. A common design is to provide vec_data for read-only access with a warning: any mutation that might grow the vector invalidates the returned pointer. Alternatively, you can provide a “frozen” view that only exists as long as the vector is not mutated.
Lifetimes are closely tied to error handling. When realloc fails, the old buffer remains valid, but the returned pointer is NULL. If you assign directly to vec->data before checking, you will lose the only reference to the old buffer and leak memory. The correct pattern is to use a temporary pointer, verify success, then update the vector fields. This pattern is not optional; it is the difference between correct and incorrect code. Additionally, you must decide how to handle realloc failure in the API. For example, vec_push should return a boolean or error code and leave the vector unchanged if growth fails. This ensures that ownership and invariants remain intact.
Element lifetimes are another subtlety. The vector controls the lifetime of its elements, but only for the memory container. If the elements are pointers to other allocations, ownership becomes layered: does the vector own the pointed-to data, or does it merely store references? You must define this in the API. A generic vector might be “shallow” (it only owns the buffer), while a specialized vector might be “deep” (it owns the elements and frees them on destruction). This choice affects how you design vec_destroy and how you test for leaks. A good approach is to allow the caller to provide an optional destructor function for elements.
Realloc safety also includes pointer invalidation and aliasing. Any pointer to vec->data[i] is not safe across operations that might resize. Even if realloc returns the same address, you cannot rely on that. Therefore, APIs that expose element pointers must define invalidation rules clearly. Some libraries prevent this by using indices instead of pointers. Indices are stable across realloc as long as the logical elements remain in order. That is why vec_get should return values or copies rather than raw pointers unless explicitly requested.
Ownership is not just about memory. If your vector uses a file handle, mutex, or other resource (e.g., for persistence), then you must define ownership for those resources too. Even though this project focuses on memory, the same rules apply: there must be one responsible owner, and any transfer must be documented. This mental model scales to more complex systems, which is why it is foundational.
A robust way to test ownership is to deliberately stress error paths. Force realloc to fail by using a very small memory limit or by injecting a failing allocator, then verify that there are no leaks and that the vector remains valid. This is a realistic test: many bugs only appear on allocation failures. When you can handle those failures gracefully, you can trust your ownership model.
How this fits on projects
Ownership and lifetime rules determine the design of vec_create, vec_destroy, vec_reserve, and all functions that return pointers. They also inform the testing strategy: you must test realloc failures and ensure the original buffer is still valid.
Definitions & key terms
- Ownership: Responsibility for freeing a resource.
- Lifetime: The time interval when a pointer is valid.
- Reallocation: Resizing a memory block, which may move it.
- Aliasing: Two pointers referring to the same memory.
Mental model diagram (ASCII)
Ownership Flow
caller --> vec_create() -> vec owns buffer
caller <-- vec_destroy() frees buffer
realloc may move buffer:
old_ptr ----X (invalid)
new_ptr ----> valid
How it works (step-by-step, with invariants and failure modes)
- Vector allocates buffer and becomes sole owner.
- Mutations may call
reallocto grow. - If
reallocfails, keep old buffer and return error. - If
reallocsucceeds, update pointer and capacity. - Any previously obtained element pointer is now invalid.
Failure modes: double free if caller frees buffer, memory leak if you overwrite pointer on realloc failure, use-after-free if caller keeps element pointers.
Minimal concrete example
int *tmp = realloc(vec->data, new_cap * sizeof(int));
if (!tmp) return false; // old buffer still valid
vec->data = tmp;
vec->capacity = new_cap;
Common misconceptions
- “realloc always moves memory.” (It may or may not.)
- “If I only grow a little, old pointers remain valid.” (Not guaranteed.)
- “Ownership is obvious.” (It is not; it must be documented.)
Check-your-understanding questions
- Why must you use a temporary pointer with
realloc? - What happens to element pointers after a resize?
- If the vector stores pointers, who frees them?
Check-your-understanding answers
- To avoid losing the original buffer on failure.
- They are invalid because the buffer may move.
- It depends on the API; you must define whether the vector owns them.
Real-world applications
- Custom allocators, pools, and arenas use strict ownership rules.
- GUI toolkits often return borrowed pointers with explicit lifetimes.
- Network stacks pass ownership of buffers between layers.
Where you will apply it
- This project: See §3.2 Functional Requirements and §3.7 Real World Outcome.
- Also used in: P03 Memory Arena, P04 JSON Parser.
References
- “Effective C” by Robert Seacord, Ch. 4-6.
- “Expert C Programming” by Peter van der Linden, Ch. 3.
Key insights
If you cannot state who owns a pointer and how long it lives, your program is already unsafe.
Summary
Ownership and lifetimes turn raw pointers into manageable resources. In a dynamic array, the vector owns the buffer, and realloc is the critical moment when lifetimes can be violated.
Homework/Exercises to practice the concept
- Write a small program that calls
reallocincorrectly and observe the leak. - Add a destructor callback to a vector of pointers and free all elements.
- Document ownership rules for a hypothetical
vec_viewfunction.
Solutions to the homework/exercises
- Assign
vec->data = realloc(...)directly and simulate failure to see the leak. - Store function pointers for element cleanup and call them in
vec_destroy. - State that
vec_viewreturns a borrowed pointer valid until the next mutation.
2.3 Amortized Analysis and Growth Strategy
Fundamentals
A dynamic array grows by allocating more memory than immediately needed so that most pushes are O(1) and only occasional resizes are O(n). This is called amortized analysis. The fundamental idea is that expensive operations are rare and their cost is spread across many cheap operations. If you double the capacity each time you grow, the total work of copying elements across many pushes is linear in the number of elements inserted. This gives an average, or amortized, cost of O(1) per push. Without a sound growth strategy, you may end up reallocating on every push, making the structure slow and unstable.
Deep Dive into the Concept
Amortized analysis is a way to analyze algorithms that have occasional expensive steps. For a dynamic array, the expensive step is resizing the buffer and copying elements. Suppose you start with capacity 1 and double whenever you run out of space. When you insert the 1st element, you allocate capacity 1. Insert the 2nd element, you grow to 2 and copy 1 element. Insert the 3rd element, you grow to 4 and copy 2 elements. Insert the 5th, you grow to 8 and copy 4 elements. The total number of copies after inserting n elements is less than 2n. That means the average cost per insert is constant, even though some inserts are expensive. This is why doubling (or 1.5x growth) is common in standard libraries.
The growth factor affects memory overhead and performance. Doubling uses more memory but fewer resizes; a smaller factor uses less memory but causes more resizes. The choice depends on your constraints. For a systems programming library, a doubling strategy is common because it minimizes realloc calls, which can be expensive and can fragment memory. However, in memory-constrained environments, you may choose 1.5x or a fixed increment. The key is to document the strategy and ensure it provides bounded complexity.
You must also consider integer overflow when calculating the new capacity. If capacity is large, doubling may overflow size_t, leading to small allocations and buffer overflows. A safe implementation uses a checked multiplication and returns an error if the requested size is too large. This is part of your invariants: capacity must always represent the actual allocated size. If you overflow, your structure lies.
Another amortized consideration is the cost of shrinking. Some vectors shrink when the size drops below a threshold (e.g., one quarter of capacity). This reduces memory but introduces more reallocations if the size fluctuates around the threshold. A common strategy is to avoid automatic shrinking and instead provide an explicit vec_shrink_to_fit function. This gives the caller control and keeps complexity predictable.
Amortized analysis also impacts API guarantees. If you claim that vec_push is O(1) amortized, you are making a contract with users. It means that in a long sequence of pushes, the total work grows linearly with the number of elements. If you implement growth incorrectly (e.g., adding a constant each time), the total work becomes quadratic and the contract is broken. This matters in real code: a poor growth strategy can turn a seemingly simple algorithm into a performance disaster.
Finally, amortized analysis connects to invariants. The growth strategy must maintain invariants before and after resizing: the old contents must be copied correctly, the new capacity must be correct, and the vector must remain valid in case of failure. Amortized analysis tells you how often resizing happens; invariants tell you how to do it safely.
A practical way to internalize amortized analysis is to measure total copy work. Add a counter that increments by the number of elements copied during each resize, then run a long sequence of pushes. You will observe that the counter grows roughly linearly with the final size. This experiment makes the proof tangible and reinforces why exponential growth is the standard choice in real libraries.
How this fits on projects
You will define and justify a growth strategy in vec_reserve and vec_push. This strategy will influence performance, memory overhead, and the correctness of realloc handling.
Definitions & key terms
- Amortized cost: Average cost per operation across a sequence.
- Growth factor: The multiplier used when expanding capacity.
- Resize: The operation that allocates a larger buffer and copies data.
- Overflow: When arithmetic exceeds the maximum representable value.
Mental model diagram (ASCII)
Capacity growth (doubling)
cap: 1 -> 2 -> 4 -> 8 -> 16
push:1 2 3-4 5-8 9-16
copies:1 2 4 8 (total < 2n)
How it works (step-by-step, with invariants and failure modes)
- Check if
length + 1exceedscapacity. - If yes, compute new capacity (e.g., max(1, capacity * 2)).
- Validate that new capacity does not overflow.
- Attempt to
reallocto the new size. - On success, update capacity and continue; on failure, return error and leave vector unchanged.
Failure modes: integer overflow causing tiny allocation, repeated small growth leading to O(n^2) behavior, and failure to handle realloc errors.
Minimal concrete example
size_t new_cap = vec->capacity ? vec->capacity * 2 : 1;
if (new_cap < vec->capacity) return false; // overflow check
Common misconceptions
- “Doubling wastes too much memory.” (It is a trade-off; it reduces reallocations.)
- “Resizing is always expensive.” (It is rare if growth is exponential.)
- “Shrinking automatically is always better.” (It can cause thrashing.)
Check-your-understanding questions
- Why is doubling capacity an amortized O(1) strategy?
- What happens if you grow by +1 each time?
- Why must you check for overflow in capacity calculations?
Check-your-understanding answers
- Because total copy work over n inserts is O(n).
- You will reallocate on every insert, leading to O(n^2) total cost.
- Overflow can lead to allocating too little memory and out-of-bounds writes.
Real-world applications
std::vectoruses exponential growth with an implementation-defined factor.- Dynamic strings in many libraries use doubling to avoid repeated reallocations.
- Hash tables resize with a growth factor to maintain load factor invariants.
Where you will apply it
- This project: See §3.2 Functional Requirements and §4.4 Algorithm Overview.
- Also used in: P04 JSON Parser for dynamic buffers.
References
- “Algorithms” by Sedgewick and Wayne, dynamic array analysis sections.
- “The Art of Computer Programming” Vol. 1, amortized analysis discussions.
Key insights
A correct growth strategy turns a potentially slow structure into a predictable one.
Summary
Amortized analysis justifies why occasional expensive resizes do not make the structure slow overall, as long as you choose a sound growth factor.
Homework/Exercises to practice the concept
- Compare growth strategies (x2 vs x1.5) in a small benchmark.
- Implement a checked multiplication helper for
size_t. - Add
vec_shrink_to_fitand test its behavior with fluctuating sizes.
Solutions to the homework/exercises
- Measure total reallocations and time for each strategy and compare.
- Use division to detect overflow: if
new_cap / factor != old_cap, overflow occurred. - Shrink when
length < capacity / 4and verify it does not thrash in repeated push/pop.
3. Project Specification
3.1 What You Will Build
You will build a small, reusable C library that implements a dynamic array (Vec) with clear API contracts and invariant validation. The library will support creation, destruction, push/pop, insert/remove, bounds-checked access, reserve/shrink, and optional element destructor callbacks.
Included:
vec.hpublic API andvec.cimplementation- Debug validation function callable from tests
- Error-returning API for operations that can fail
- Optional element destructor for deep cleanup
Excluded:
- Thread safety
- Concurrency or lock-free features
- Generic macros for arbitrary types (you will implement a
void*-based orint-based version)
3.2 Functional Requirements
- Create/Destroy:
vec_createallocates a valid empty vector;vec_destroyfrees all owned memory. - Push/Pop:
vec_pushappends in amortized O(1) time;vec_popremoves the last element safely. - Insert/Remove: Insert at index shifts elements; remove at index compacts the array.
- Bounds-Checked Access:
vec_getandvec_setvalidate indices and return errors. - Reserve/Shrink:
vec_reserveensures capacity;vec_shrink_to_fitoptional. - Validation:
vec_validatechecks invariants and is used in tests and debug builds.
3.3 Non-Functional Requirements
- Performance: Push should be amortized O(1), insert/remove O(n).
- Reliability: No memory leaks; safe behavior under allocation failure.
- Usability: API should document ownership and invalidation rules clearly.
3.4 Example Usage / Output
Vec *v = vec_create(sizeof(int));
vec_push(v, &value);
int out;
vec_get(v, 0, &out);
vec_destroy(v, NULL);
3.5 Data Formats / Schemas / Protocols
Public API (example signatures):
typedef struct Vec Vec;
Vec *vec_create(size_t elem_size);
void vec_destroy(Vec *v, void (*elem_destructor)(void *));
bool vec_push(Vec *v, const void *elem);
bool vec_pop(Vec *v, void *out);
bool vec_get(const Vec *v, size_t idx, void *out);
bool vec_set(Vec *v, size_t idx, const void *elem);
bool vec_insert(Vec *v, size_t idx, const void *elem);
bool vec_remove(Vec *v, size_t idx, void *out);
bool vec_reserve(Vec *v, size_t new_cap);
bool vec_shrink_to_fit(Vec *v);
bool vec_validate(const Vec *v);
3.6 Edge Cases
vec_popon empty vector returns false and does not mutate.vec_getout of bounds returns false and leaves output untouched.vec_reservewith smaller capacity does nothing.reallocfailure leaves vector unchanged.- Overflow when computing new capacity returns false.
3.7 Real World Outcome
This section is a golden reference for what correct behavior looks like.
3.7.1 How to Run (Copy/Paste)
make
./vec_demo
./vec_tests
3.7.2 Golden Path Demo (Deterministic)
Run with a fixed sequence of operations and print the internal state.
3.7.3 CLI Terminal Transcript (Exact)
$ ./vec_demo
[vec] create elem_size=4
[vec] push 10
[vec] push 20
[vec] insert idx=1 val=15
[vec] state len=3 cap=4 data=[10,15,20]
[vec] pop -> 20
[vec] state len=2 cap=4 data=[10,15]
[vec] destroy ok
exit_code=0
3.7.4 Failure Demo (Deterministic)
$ ./vec_demo --fail-alloc-at=2
[vec] create elem_size=4
[vec] push 10
[vec] push 20
[error] allocation failed during resize
[vec] state len=1 cap=1 data=[10]
exit_code=2
3.7.5 Exit Codes
0: success2: allocation failure3: invariant violation detected4: invalid argument
4. Solution Architecture
4.1 High-Level Design
+-------------+ +------------------+ +------------------+
| vec.h API | --> | vec.c operations | --> | invariant checks |
+-------------+ +------------------+ +------------------+
| | |
v v v
callers internal buffer debug validation
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Vec struct | Stores length, capacity, element size, data pointer | Keep fields private to enforce contracts |
| Growth logic | Computes new capacity and reallocates | Use doubling + overflow checks |
| Validation | Asserts invariants | Debug-only checks with optional runtime validation |
4.3 Data Structures (No Full Code)
typedef struct {
size_t length;
size_t capacity;
size_t elem_size;
unsigned char *data;
} Vec;
4.4 Algorithm Overview
Key Algorithm: Push with Growth
- Validate vector at entry.
- If length == capacity, compute new capacity and realloc.
- Copy element bytes into
data + length * elem_size. - Increment length.
- Validate vector at exit.
Complexity Analysis:
- Time: O(1) amortized for push, O(n) for insert/remove.
- Space: O(n) with growth overhead.
5. Implementation Guide
5.1 Development Environment Setup
cc --version
make --version
5.2 Project Structure
vec/
├── include/vec.h
├── src/vec.c
├── tests/vec_test.c
├── examples/vec_demo.c
├── Makefile
└── README.md
5.3 The Core Question You’re Answering
“How do I build a resizable array that is correct in the face of memory failures and pointer invalidation?”
5.4 Concepts You Must Understand First
Stop and verify you can explain these before coding:
- Invariants and stable points.
- Ownership and lifetime rules for heap memory.
- Realloc semantics and pointer invalidation.
- Amortized analysis and growth factors.
5.5 Questions to Guide Your Design
- How will you prevent callers from breaking invariants?
- What is the behavior when allocation fails?
- Which API functions return borrowed pointers, if any?
- How will you prove that insert/remove maintain ordering?
5.6 Thinking Exercise
Trace this sequence and verify invariants after each step:
Vec *v = vec_create(sizeof(int));
vec_push(v, &a);
vec_push(v, &b);
vec_insert(v, 0, &c);
vec_remove(v, 1, NULL);
5.7 The Interview Questions They’ll Ask
- Why is
reallocdangerous and how do you use it safely? - What invariants define a dynamic array?
- What is amortized O(1) and why does it matter here?
5.8 Hints in Layers
Hint 1: Start with vec_validate() and call it everywhere in debug.
Hint 2: Use a temp pointer with realloc to avoid losing the old buffer.
Hint 3: Keep element access in helper functions to centralize bounds checks.
5.9 Books That Will Help
| Topic | Book | Chapter | |——|——|———| | Invariants and contracts | “Code Complete” | Ch. 8 | | Dynamic arrays | “C Interfaces and Implementations” | Ch. 4-5 | | Memory safety | “Effective C” | Ch. 4-6 |
5.10 Implementation Phases
Phase 1: Foundation (1-2 days)
Goals: Define API, struct, and invariants. Tasks:
- Write
vec.hwith documented contracts. - Implement
vec_createandvec_destroy. - Add
vec_validateand unit tests for invariants. Checkpoint: Create/destroy works with no leaks.
Phase 2: Core Functionality (3-5 days)
Goals: Push/pop, insert/remove, reserve. Tasks:
- Implement growth strategy with overflow checks.
- Implement push/pop and test edge cases.
- Implement insert/remove with memmove and tests. Checkpoint: Tests cover all operations and invariants hold.
Phase 3: Polish & Edge Cases (2-3 days)
Goals: Error paths, docs, examples. Tasks:
- Add failure injection for
realloc. - Write example CLI demo.
- Document pointer invalidation rules. Checkpoint: Failure demo behaves deterministically.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale | |———-|———|—————-|———–| | Growth factor | 1.5x, 2x | 2x | Fewer reallocations, simpler analysis | | Public struct fields | Expose or hide | Hide | Prevent invariant violations | | Element ownership | Shallow or deep | Shallow + destructor callback | Flexible and explicit |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples | |———|———|———-| | Unit Tests | Validate single operations | push/pop, insert/remove | | Invariant Tests | Assert internal correctness | validate after every op | | Failure Tests | Ensure safe behavior on errors | realloc failure injection |
6.2 Critical Test Cases
- Push until multiple resizes: Ensure length and capacity stay correct.
- Insert at index 0 and end: Verify shifting logic.
- Realloc failure: Ensure vector unchanged and valid.
6.3 Test Data
Input sequence: push 1, push 2, push 3, insert 0->9, remove idx=2
Expected: [9,1,3]
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|———|———|———-|
| Overwriting pointer on realloc failure | Leak or crash | Use temp pointer |
| Off-by-one in insert/remove | Corrupted data | Use memmove and test boundaries |
| Exposing struct fields | Invariant violations | Make struct opaque |
7.2 Debugging Strategies
- Use AddressSanitizer to detect out-of-bounds.
- Add invariant checks after every mutation in tests.
7.3 Performance Traps
Repeated small growth (e.g., +1) causes O(n^2) behavior. Use exponential growth.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add
vec_clearto reset length without freeing. - Add
vec_clonethat copies a vector.
8.2 Intermediate Extensions
- Add
vec_sortwith user-provided comparator. - Add
vec_binary_searchfor sorted vectors.
8.3 Advanced Extensions
- Add a small-object optimization (store small arrays inline).
- Add a debug allocator for deterministic failure injection.
9. Real-World Connections
9.1 Industry Applications
- Used in game engines for entity lists.
- Used in compilers for token streams and AST nodes.
9.2 Related Open Source Projects
glibGArray: dynamic array with similar invariants.stb_dsstretchy buffer: lightweight dynamic array pattern.
9.3 Interview Relevance
- Dynamic array invariants are a standard interview topic.
- Amortized analysis is commonly discussed in complexity questions.
10. Resources
10.1 Essential Reading
- “C Interfaces and Implementations” by David Hanson - dynamic array design.
- “Effective C” by Robert Seacord - memory safety and ownership.
10.2 Video Resources
- “Amortized Analysis” lectures (MIT OCW).
10.3 Tools & Documentation
- Valgrind Memcheck - memory error detection.
- LLVM AddressSanitizer docs.
10.4 Related Projects in This Series
- P02 Gap Buffer: Applies invariants to text editing.
- P03 Memory Arena: Alternative ownership model.
11. Self-Assessment Checklist
11.1 Understanding
- I can state all invariants for the vector.
- I can explain why
reallocinvalidates pointers. - I can justify the growth factor choice.
11.2 Implementation
- All operations pass tests and invariant checks.
- Failure paths do not leak memory.
- Example demo output matches the golden transcript.
11.3 Growth
- I can explain this project in an interview.
- I documented lessons learned.
12. Submission / Completion Criteria
Minimum Viable Completion:
vec.handvec.ccompile cleanly with warnings enabled.- All tests pass.
- Invariants validated at entry/exit in debug builds.
Full Completion:
- All minimum criteria plus failure injection tests.
- Documented ownership rules and pointer invalidation.
Excellence (Going Above & Beyond):
- Benchmarks comparing growth factors.
- Extended API with optional destructor callbacks.
13. Additional Content Rules (Hard Requirements)
13.1 Determinism
All demos use a fixed input sequence and deterministic output. Allocation failure is injected at a fixed operation count.
13.2 Outcome Completeness
- Includes a success demo and a failure demo.
- Exit codes are specified in §3.7.5.
13.3 Cross-Linking
- Internal references: See §5.10 Phase 2 and §6.2 Critical Test Cases.
- Other projects: P02 Gap Buffer, P04 JSON Parser.
13.4 No Placeholder Text
All content is complete and explicit.