Project 3: Memory Arena Allocator

Build a bump-pointer arena allocator with alignment guarantees, explicit ownership, and deterministic reset semantics.

Quick Reference

Attribute	Value
Difficulty	Advanced
Time Estimate	1-2 weeks
Main Programming Language	C
Alternative Programming Languages	Rust, Zig
Coolness Level	Level 3 (Genuinely Clever)
Business Potential	Level 3 (Reusable infrastructure)
Prerequisites	malloc/free, pointer arithmetic, alignment basics
Key Topics	Alignment, bump allocation, ownership, lifetimes

1. Learning Objectives

By completing this project, you will:

Implement a bump-pointer allocator with clear invariants.
Guarantee alignment for arbitrary types using _Alignof.
Design ownership rules where the arena owns all allocations.
Provide deterministic reset semantics without leaks.
Test allocation behavior, out-of-memory conditions, and invariants.

2. All Theory Needed (Per-Concept Breakdown)

2.1 Alignment and Object Representation

Fundamentals

Alignment is the rule that objects must be stored at addresses that are multiples of their alignment requirement. In C, misaligned access can lead to undefined behavior or performance penalties. An arena allocator must round its bump pointer to satisfy alignment before handing out memory. Object representation is the idea that C objects are stored as sequences of bytes with strict requirements on layout, padding, and alignment. If you return a pointer that is not properly aligned, any dereference of that pointer can crash or silently corrupt data. Alignment is therefore a non-negotiable invariant for a general-purpose allocator.

Deep Dive into the Concept

Alignment is a hardware constraint that the C language exposes. Most CPUs can access aligned memory more efficiently; some cannot access misaligned memory at all. For example, certain ARM configurations will fault on unaligned 64-bit loads. The C standard allows the compiler to assume that pointers to a given type are properly aligned. If you violate that, the compiler may optimize in ways that break your program, even if it appears to work in small tests. This is why alignment is fundamental to any allocator.

An arena allocator typically stores a base pointer and a current offset or “bump” pointer. To allocate memory for a type T, you must ensure that the returned pointer is aligned to _Alignof(T). The usual technique is to round the bump pointer up to the next multiple of the alignment. This can be implemented with an align_up function: (p + (a - 1)) & ~(a - 1) when a is a power of two. This arithmetic must be done on integer addresses, usually by casting to uintptr_t. It is critical to ensure that you do not overflow the address computation and that you do not move past the end of the arena buffer. The alignment operation itself does not allocate; it only adjusts where the next allocation begins.
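A minimal sketch of the rounding step, assuming the alignment is a nonzero power of two. The name align_up follows the helper mentioned in the text; the out-parameter and boolean return used to report overflow are illustrative choices, not a required signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Rounds addr up to the next multiple of align.
 * Precondition: align is a nonzero power of two.
 * Returns false (and leaves *out untouched) if the rounded
 * address would not fit in uintptr_t. */
static bool align_up(uintptr_t addr, uintptr_t align, uintptr_t *out)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    uintptr_t mask = align - 1;
    if (addr > UINTPTR_MAX - mask) {
        return false;               /* addr + mask would wrap around */
    }
    *out = (addr + mask) & ~mask;
    return true;
}
```

An address that is already aligned is returned unchanged, so calling the helper never wastes space on its own.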

Object representation also includes padding within structs. When you allocate a struct, you must allocate sizeof(struct) bytes, not the sum of field sizes. The compiler inserts padding to satisfy alignment constraints of fields. Your allocator must respect this by allocating sizeof(T) and aligning to _Alignof(T). If you attempt to pack objects without alignment, you will break the compiler’s assumptions. This is particularly relevant when you allocate arrays of objects: the start of the array must be aligned, and the element size is sizeof(T) which already accounts for padding, so contiguous elements will be properly aligned if the base is aligned.
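To make the padding point concrete, here is a small sketch using a hypothetical struct Sample (the type and the helper sample_padding are invented for illustration). The static assertions hold on any conforming C11 implementation; the exact amount of padding is platform-dependent, so the code computes it rather than assuming a value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct for illustration only. */
struct Sample {
    char   tag;    /* 1 byte, usually followed by padding */
    double value;  /* must start at a multiple of _Alignof(double) */
};

static_assert(offsetof(struct Sample, value) % _Alignof(double) == 0,
              "member is aligned within the struct");
static_assert(sizeof(struct Sample) % _Alignof(struct Sample) == 0,
              "consecutive array elements stay aligned");
static_assert(sizeof(struct Sample) >= sizeof(char) + sizeof(double),
              "sizeof includes any padding");

/* Bytes the compiler added beyond the raw field sizes. */
static size_t sample_padding(void)
{
    return sizeof(struct Sample) - (sizeof(char) + sizeof(double));
}
```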

Another subtle point is that alignment requirements vary by platform and by type. _Alignof(max_align_t) is the strictest fundamental alignment, so it is sufficient for every scalar type. If you want a “generic” allocator that can allocate any type, you must ensure that the arena’s base pointer itself is aligned to at least _Alignof(max_align_t) and that each allocation is aligned to its requested alignment. If you only align to sizeof(void*), you may be incorrect for types like long double, and over-aligned types such as SIMD vectors can require even more than max_align_t, so they need an explicit alignment request. This is a common bug in naive allocators.

Alignment also interacts with internal fragmentation. When you align up, you may skip bytes to reach the next boundary. Those bytes become unused space (padding) inside the arena. Your invariants must account for this: the bump pointer moves forward by the aligned size, not just by the requested size. That means your allocator’s accounting must track how much space is consumed and ensure used <= capacity. If you ignore alignment padding, you will eventually hand out overlapping allocations.

Finally, alignment is part of the allocator’s API contract. You should document the alignment guarantees and provide helper functions like arena_alloc_aligned(arena, size, alignment) and a convenience arena_alloc(arena, size) that aligns to max_align_t. This clarity ensures that callers understand what they can safely store in the allocated memory. It also simplifies testing: you can assert that returned pointers satisfy ((uintptr_t)ptr % alignment) == 0 for a set of requested alignments.
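The alignment assertion described above could be wrapped in a small test helper. The names ptr_is_aligned and aligned_for_all are hypothetical; only the modulo check itself comes from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if p is a multiple of alignment (alignment must be nonzero). */
static bool ptr_is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0);
    return ((uintptr_t)p % alignment) == 0;
}

/* Checks one pointer against a set of requested alignments. */
static bool aligned_for_all(const void *p, const size_t *aligns, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!ptr_is_aligned(p, aligns[i])) {
            return false;
        }
    }
    return true;
}
```

In a test, each pointer returned by the arena would be passed to ptr_is_aligned together with the alignment that was requested for it.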

How this fits in the project

Alignment rules define the correctness of every allocation returned by the arena. You will use this concept to implement arena_alloc and arena_alloc_aligned and to validate invariants after each allocation.

Definitions & key terms

Alignment: Required address multiple for a type.
Padding: Unused bytes inserted to satisfy alignment.
max_align_t: Type whose alignment is the strictest fundamental alignment (sufficient for every scalar type).
Object representation: The bytes that make up a C object.

Mental model diagram (ASCII)

Arena memory (addresses)

0x1000 [----aligned----][padding][obj][padding][obj]
         ^ align_up          ^ bump pointer

How it works (step-by-step, with invariants and failure modes)

Convert bump pointer to uintptr_t.
Round up to the requested alignment.
Check that aligned <= base + capacity and size <= (base + capacity) - aligned (unlike aligned + size, this form cannot overflow).
Return aligned pointer and advance bump pointer.

Failure modes: misaligned pointers, overlapping allocations, overflow in alignment math.

Minimal concrete example

/* align must be a nonzero power of two */
uintptr_t p = (uintptr_t)(arena->base + arena->used);
uintptr_t aligned = (p + (align - 1)) & ~((uintptr_t)align - 1);
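The rounding step above can be extended into a complete allocation routine. This is a sketch under stated assumptions: the Arena layout is the one from §4.3, the function names follow §3.5, and rejecting a non-power-of-two alignment by returning NULL is one possible policy rather than a requirement of the project.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    unsigned char *base;
    size_t capacity;
    size_t used;
} Arena;

/* Returns NULL and leaves the arena untouched on any failure. */
static void *arena_alloc_aligned(Arena *a, size_t size, size_t align)
{
    assert(a != NULL && a->used <= a->capacity);
    if (align == 0 || (align & (align - 1)) != 0) {
        return NULL;                        /* not a power of two */
    }
    uintptr_t start = (uintptr_t)a->base + a->used;
    uintptr_t mask = (uintptr_t)align - 1;
    if (start > UINTPTR_MAX - mask) {
        return NULL;                        /* rounding would overflow */
    }
    uintptr_t aligned = (start + mask) & ~mask;
    size_t padding = (size_t)(aligned - start);

    size_t remaining = a->capacity - a->used;
    if (padding > remaining || size > remaining - padding) {
        return NULL;                        /* out of space */
    }
    a->used += padding + size;              /* padding counts as consumed */
    return a->base + (a->used - size);
}

/* Convenience wrapper: alignment suitable for any scalar type. */
static void *arena_alloc(Arena *a, size_t size)
{
    return arena_alloc_aligned(a, size, _Alignof(max_align_t));
}
```

The bounds check is written with subtractions so that no intermediate sum can wrap, and the returned pointer is derived from base rather than cast back from an integer.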

Common misconceptions

“Alignment is only about performance.” (It is also about correctness.)
“Aligning to sizeof(void*) is enough.” (Not for all types.)
“Padding is wasted and can be removed.” (Padding is required for correct alignment.)

Check-your-understanding questions

Why can misaligned access be undefined behavior?
What is the difference between sizeof(T) and sum of field sizes?
Why must the arena base pointer be aligned?

Check-your-understanding answers

The compiler assumes alignment for optimizations; hardware may fault.
sizeof(T) includes padding required by alignment.
If the base is misaligned, every allocation may be misaligned.

Real-world applications

Memory allocators in game engines.
Network packet buffers requiring alignment for headers.
SIMD-heavy code requiring specific alignment.

Where you will apply it

This project: See §3.2 Functional Requirements and §4.3 Data Structures.
Also used in: P06 HTTP Server for request pools.

References

C11 standard sections on alignment.
“Expert C Programming” by Peter van der Linden.

Key insights

Alignment is a correctness requirement, not an optimization detail.

Summary

If your arena does not align allocations correctly, every caller is at risk. Alignment is the foundation of a safe allocator.

Homework/Exercises to practice the concept

Write a function to check pointer alignment.
Allocate double and long double in your arena and verify alignment.
Add a test that fails when alignment is wrong.

Solutions to the homework/exercises

Use ((uintptr_t)p % alignment) == 0.
Use _Alignof to request alignment and assert the condition.
Force misalignment by offsetting the base pointer and observe failure.

2.2 Bump Allocation and Arena Invariants

Fundamentals

A bump allocator is the simplest form of allocator: it hands out memory by advancing a pointer. The arena owns a fixed block of memory and tracks how much has been used. The key invariants are that used <= capacity, that the bump pointer always moves forward, and that each allocation falls within the arena bounds. There is no individual free; the only valid deallocation is a reset that reclaims the entire arena. These invariants simplify ownership and make allocation extremely fast. The simplicity is a feature, but it also means that a single violated invariant can corrupt every subsequent allocation.

Deep Dive into the Concept

The bump allocator is conceptually a stack of bytes. It has a base pointer, a capacity, and a current offset. Allocation is a single arithmetic operation: compute the aligned pointer, check bounds, then move the offset forward. This simplicity makes it very fast and predictable. However, the simplicity is only safe if the invariants are strict. If the bump pointer ever moves backward (except during reset), you risk overlapping allocations. If you allow individual frees, you break the model and need a free list, which is a different allocator entirely.

The key invariant is that the arena is monotonic: allocations are handed out in increasing address order. That means every allocation is valid until the arena is reset or destroyed. This property is extremely powerful for workloads where many objects share the same lifetime, such as parsing a file or handling a network request. In such cases, you can allocate freely without tracking individual frees, then reset the arena when the phase ends. This drastically reduces overhead and reduces fragmentation, because the memory is reclaimed in bulk.

Another important invariant is that all allocations are owned by the arena. If a caller wants to keep an allocation beyond the arena’s lifetime, it must copy the data elsewhere. This must be documented as a strict rule. The arena is not a general-purpose allocator; it is a scoped allocator. This is why arena_reset is safe: it invalidates all previous allocations at once. You must ensure callers do not use pointers after reset. In a debug build, you can optionally fill the arena with a poison pattern to catch use-after-reset bugs.
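One possible shape for the debug poison mentioned above, assuming the Arena layout from §4.3. The macro name ARENA_POISON_BYTE is hypothetical, and tying the fill to NDEBUG is an assumption about how debug builds are configured.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    unsigned char *base;
    size_t capacity;
    size_t used;
} Arena;

#define ARENA_POISON_BYTE 0xCD   /* hypothetical debug fill value */

/* Reclaims everything at once. In debug builds the previously used
 * bytes are overwritten so stale reads show an obvious pattern. */
static void arena_reset(Arena *a)
{
    assert(a != NULL && a->used <= a->capacity);
#ifndef NDEBUG
    if (a->used > 0) {
        memset(a->base, ARENA_POISON_BYTE, a->used);
    }
#endif
    a->used = 0;
}
```

Only the used prefix is filled, so the cost of a debug reset is proportional to what the phase actually allocated.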

The invariants also define your error handling. If there is not enough space for a new allocation, you should return NULL or a failure code without altering the arena state. That means the bump pointer should remain unchanged on failure. This is similar to the realloc pattern in the dynamic array: fail without side effects. It allows callers to handle allocation failure without worrying about partial state changes.

A subtle aspect is the distinction between used and capacity. Because alignment can add padding, used should represent the true number of bytes consumed, including alignment padding. If you compute used as the sum of requested sizes, you will underestimate consumption and eventually allocate overlapping memory. Therefore, your allocator should track the actual bump pointer position relative to the base. This is an invariant you can assert: base + used == bump.

Some arenas also provide “marks” or checkpoints that allow partial resets. This is a useful extension: you can push a mark, allocate, then pop to that mark to reclaim only the allocations since that point. This is still compatible with the bump model because it preserves the monotonic property within a scope. If you implement marks, you must document that popping invalidates all allocations since the mark.
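A minimal sketch of marks, assuming a mark is simply a saved value of used. The names ArenaMark, arena_mark, and arena_pop are hypothetical and not part of the required API.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    unsigned char *base;
    size_t capacity;
    size_t used;
} Arena;

typedef size_t ArenaMark;   /* hypothetical: a snapshot of used */

static ArenaMark arena_mark(const Arena *a)
{
    return a->used;
}

/* Invalidates every allocation made after the mark was taken. */
static void arena_pop(Arena *a, ArenaMark m)
{
    assert(m <= a->used);   /* marks must be popped newest-first */
    a->used = m;
}
```

An operation that may fail partway can take a mark before its first allocation and pop it on the error path, which rolls back only its own allocations.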

A final consideration is internal fragmentation. Alignment padding creates "dead" bytes that are not available for allocation even though they live inside the arena. This is normal, but you must treat those bytes as consumed to preserve the invariant that allocations never overlap. Logging requested size, alignment, and resulting bump position during tests makes these effects visible and helps verify that your accounting is correct.

How this fits in the project

Bump allocation defines the core of your allocator. Every function in the arena API must preserve the monotonic and bounds invariants.

Definitions & key terms

Bump pointer: The current allocation position in the arena.
Monotonic allocation: Allocation that only moves forward.
Reset: A bulk deallocation that returns the arena to empty.
Mark: A saved pointer for partial reset.

Mental model diagram (ASCII)

Arena bytes

base -> [used..........................][free........]
            ^ bump pointer
reset => bump = base

How it works (step-by-step, with invariants and failure modes)

Validate used <= capacity.
Compute aligned pointer for next allocation.
Check that aligned <= base + capacity and size <= (base + capacity) - aligned (unlike aligned + size, this form cannot overflow).
Return pointer and update bump/used.
Reset sets bump to base and used to 0.

Failure modes: overlapping allocations, forgetting to include padding, using pointers after reset.
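The first step above could be implemented as an invariant check along these lines. The function name matches §3.5; treating a NULL base as valid only for an empty, zero-capacity arena is an assumption, and arena_assert_valid is a hypothetical debug hook.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    unsigned char *base;
    size_t capacity;
    size_t used;
} Arena;

/* Returns true when the arena's bookkeeping is internally consistent. */
static bool arena_validate(const Arena *a)
{
    if (a == NULL) {
        return false;
    }
    if (a->base == NULL) {
        /* Assumption: an uninitialized or destroyed arena is empty. */
        return a->capacity == 0 && a->used == 0;
    }
    return a->used <= a->capacity;
}

/* Debug hook: call after every allocation and reset. */
static void arena_assert_valid(const Arena *a)
{
    assert(arena_validate(a));
}
```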

Minimal concrete example

/* Byte-granular sketch: alignment padding is omitted here. */
if (size > arena->capacity - arena->used) return NULL;  /* cannot overflow */
void *p = arena->base + arena->used;
arena->used += size;

Common misconceptions

“Arena allocations can be freed individually.” (They cannot; use reset.)
“Unused alignment padding does not matter.” (It affects bounds and overlap.)
“Reset only frees the last allocation.” (It invalidates all allocations.)

Check-your-understanding questions

Why must allocations be monotonic in a bump allocator?
What happens to pointers after arena_reset?
Why should allocation failure leave state unchanged?

Check-your-understanding answers

Because the allocator has no free list and relies on forward-only growth.
They become invalid; using them is a bug, because the memory may be poisoned or already handed out to a later allocation.
It preserves invariants and allows callers to handle errors safely.

Real-world applications

Request-scoped allocations in web servers.
Parsing ASTs where all nodes live for the duration of parsing.
Game frame allocators for per-frame data.

Where you will apply it

This project: See §3.2 Functional Requirements and §5.10 Phase 2.
Also used in: P04 JSON Parser, P06 HTTP Server.

References

“C Interfaces and Implementations” by David Hanson, allocator design.
Game engine allocator patterns.

Key insights

Arenas trade individual frees for speed and simplicity. The invariants are what make that trade safe.

Summary

Bump allocation is simple and fast, but only if you maintain strict invariants about monotonic growth and bounds.

Homework/Exercises to practice the concept

Implement a mark/pop system and test nested scopes.
Add a debug fill pattern on reset and detect use-after-reset.
Simulate out-of-memory and verify invariants.

Solutions to the homework/exercises

Store size_t marks and set used back to that value.
Use memset to 0xCD on reset and detect pattern reads.
Inject small capacity and confirm allocator returns NULL without state change.

2.3 Ownership and Lifetime of Arena Allocations

Fundamentals

Arena allocations share a common lifetime: all objects allocated from the arena are valid until the arena is reset or destroyed. The arena is the owner of its memory, and it does not track individual frees. This model simplifies ownership but imposes a strict rule: you must not store arena pointers beyond the arena’s lifetime. If you need data to outlive the arena, you must copy it elsewhere. This is a core contract of the allocator. It is also why arenas are often called phase allocators: their lifetime is tied to a specific phase of work. Treat arena pointers as borrowed handles scoped to that phase.

Deep Dive into the Concept

Ownership in an arena is intentionally coarse-grained. Instead of tracking every allocation, the arena owns a big chunk of memory and hands out slices. The caller receives a borrowed pointer: it can use it, but it must not free it. This is a different ownership contract than malloc, and it must be documented clearly. In practice, this means APIs that accept an arena must define the lifetime of returned data. For example, a JSON parser might allocate nodes in an arena and return a tree that is valid only until the arena is reset. This is a valid design, but only if callers understand the contract.

Lifetimes in an arena are tied to phases. A common pattern is to create an arena per “phase” or per “request”. During the phase, you allocate freely. At the end, you reset the arena, invalidating everything. This works well for workloads with clear boundaries, such as processing a file, handling a network request, or building a temporary data structure. The key invariant is that no pointer crosses a phase boundary. This can be enforced by careful design or by code structure that limits scope.

Because there is no individual free, you must be careful about memory pressure. If you allocate too much in a single phase, you will run out of space and must either grow the arena or fail. A common strategy is to allow arena “blocks” or “chunks” that link together, allowing the arena to grow by adding new blocks. That adds complexity but retains the bulk-free model. For this project, you can keep the arena as a single block and return failure when full. This keeps the invariants simple and easy to reason about.

Error handling must respect ownership rules. If a function allocates multiple objects in an arena and then fails partway, it cannot free the partial allocations individually. Instead, it must either return failure and rely on the caller to reset the arena, or it must use a mark/pop system to roll back to a previous state. This is why marks are useful: they allow you to reclaim only the allocations made in a particular operation while keeping the rest of the arena intact.

Lifetimes also interact with pointer aliasing. If multiple data structures store pointers into the arena, they all become invalid simultaneously on reset. This can be a feature: it simplifies cleanup. But it can also be a bug if a pointer is stored globally or in a long-lived structure. A safe design approach is to keep arena-allocated data within the same module or function that owns the arena, reducing the chance of misuse.

In tests, you should explicitly verify that arena_reset does not leak memory and that allocations after reset reuse the same addresses. This provides a deterministic way to confirm that the arena is truly resetting its state. You can also add debug checks that set the arena to a known pattern on reset to catch use-after-reset bugs in a controlled way.
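A sketch of the address-reuse check described above. To stay short it uses byte-granular helpers named bump and reset, which are hypothetical stand-ins for arena_alloc and arena_reset; alignment handling is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    unsigned char *base;
    size_t capacity;
    size_t used;
} Arena;

/* Stand-in for arena_alloc: no alignment, fails without side effects. */
static void *bump(Arena *a, size_t size)
{
    assert(a->used <= a->capacity);
    if (size > a->capacity - a->used) {
        return NULL;
    }
    void *p = a->base + a->used;
    a->used += size;
    return p;
}

/* Stand-in for arena_reset. */
static void reset(Arena *a)
{
    a->used = 0;
}

/* Returns 1 if the first allocation after a reset lands on the same
 * address as the first allocation before it. */
static int reset_reuses_addresses(Arena *a)
{
    void *before = bump(a, 16);
    (void)bump(a, 8);
    reset(a);
    void *after = bump(a, 16);
    return before != NULL && before == after && a->used == 16;
}
```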

How this fits in the project

This concept defines the contract of every function that allocates from the arena. It also guides how you test and document the allocator.

Definitions & key terms

Phase lifetime: The time during which all arena allocations are valid.
Borrowed pointer: A pointer that must not be freed by the caller.
Reset: Operation that invalidates all allocations.

Mental model diagram (ASCII)

Phase 1: [alloc][alloc][alloc] -> reset
Phase 2: [alloc][alloc]
Pointers from Phase 1 are invalid in Phase 2.

How it works (step-by-step, with invariants and failure modes)

Create arena for a scope.
Allocate objects, storing pointers only within that scope.
At scope end, call arena_reset or destroy.
Do not use old pointers.

Failure modes: use-after-reset, storing pointers beyond scope, inability to free partial allocations.

Minimal concrete example

Arena a;
arena_init(&a, 4096);
Node *n = arena_alloc(&a, sizeof(Node));
// n valid until arena_reset or arena_destroy

Common misconceptions

“Arena allocations can be freed individually.” (They cannot.)
“Reset only frees the last few allocations.” (It frees all.)
“Borrowed pointers are safe forever.” (They are tied to arena lifetime.)

Check-your-understanding questions

What is the lifetime of an arena allocation?
When should you use marks?
Why is it dangerous to store arena pointers globally?

Check-your-understanding answers

From allocation until reset/destroy.
When you need to roll back allocations within a scope.
Because the arena may reset, invalidating them.

Real-world applications

HTTP servers with per-request arenas.
Compilers allocating AST nodes in a phase.

Where you will apply it

This project: See §3.2 Functional Requirements and §5.10 Phase 1.
Also used in: P04 JSON Parser, P06 HTTP Server.

References

“Effective C” by Robert Seacord, ownership models.
Arena allocation patterns in game engines.

Key insights

Arena ownership is simple and safe only when you respect phase lifetimes.

Summary

Arena allocation trades fine-grained frees for fast bulk lifetime management. The contract is clear: reset invalidates everything.

Homework/Exercises to practice the concept

Build a tiny parser that allocates tokens in an arena and resets after each line.
Add a mark/pop to rollback a failed parse.
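Create a test that intentionally uses a pointer after reset and catch it with ASan. Because the arena's backing block is still allocated, ASan only reports the access if reset poisons the region manually.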

Solutions to the homework/exercises

Allocate tokens as you scan and reset after processing a line.
Save used before parse and restore on failure.
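Poison the arena region on reset with ASAN_POISON_MEMORY_REGION (from <sanitizer/asan_interface.h>), unpoison each range as it is allocated, then run with ASan and verify it reports the stale access.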

3. Project Specification

3.1 What You Will Build

A bump-pointer arena allocator library with functions to initialize, allocate, reset, and destroy. Optional support for alignment-specific allocation and debug validation is included.

Included:

arena.h and arena.c
Alignment-aware allocation
Deterministic demo and tests

Excluded:

Thread safety
Free lists or individual free
Garbage collection

3.2 Functional Requirements

Init/Destroy: Initialize arena with fixed capacity and free it.
Alloc: Allocate memory with default alignment.
Alloc Aligned: Allocate with specified alignment.
Reset: Reclaim all allocations at once.
Validate: Provide a function to check invariants.

3.3 Non-Functional Requirements

Performance: Allocation should be O(1).
Reliability: No overlapping allocations or misalignment.
Usability: Clear ownership rules in documentation.

3.4 Example Usage / Output

Arena a;
arena_init(&a, 4096);
int *x = arena_alloc(&a, sizeof(int));
*x = 42;
arena_reset(&a);

3.5 Data Formats / Schemas / Protocols

Public API (example signatures):

#include <stdbool.h>  /* bool */
#include <stddef.h>   /* size_t */

typedef struct Arena Arena;  /* layout in §4.3 */

void arena_init(Arena *a, size_t size);
void arena_destroy(Arena *a);
void *arena_alloc(Arena *a, size_t size);
void *arena_alloc_aligned(Arena *a, size_t size, size_t align);
void arena_reset(Arena *a);
bool arena_validate(const Arena *a);
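A minimal sketch of how arena_init and arena_destroy might be implemented against the signatures above, assuming the backing block comes from malloc (which returns memory aligned for max_align_t). Because arena_init returns void, this sketch signals allocation failure by leaving the arena with capacity 0; returning a status code instead would be a reasonable alternative design.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct Arena {
    unsigned char *base;
    size_t capacity;
    size_t used;
} Arena;

/* On malloc failure the arena is left empty, so every later
 * allocation request fails cleanly. */
void arena_init(Arena *a, size_t size)
{
    assert(a != NULL);
    a->base = malloc(size);
    a->capacity = (a->base != NULL) ? size : 0;
    a->used = 0;
}

/* Releases the backing block; safe to call more than once. */
void arena_destroy(Arena *a)
{
    assert(a != NULL);
    free(a->base);          /* free(NULL) is a no-op */
    a->base = NULL;
    a->capacity = 0;
    a->used = 0;
}
```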

3.6 Edge Cases

Zero-size allocation returns NULL or a unique pointer (documented choice).
Allocation exactly fills the arena.
Alignment larger than max align.
Reset called multiple times.

3.7 Real World Outcome

3.7.1 How to Run (Copy/Paste)

make
./arena_demo

3.7.2 Golden Path Demo (Deterministic)

A fixed allocation sequence is used to verify alignment and reset behavior.

3.7.3 CLI Terminal Transcript (Exact)

$ ./arena_demo
[arena] init size=128
[alloc] 16 align=8 -> 0x1000
[alloc] 24 align=8 -> 0x1010
[reset]
[alloc] 32 align=16 -> 0x1000
[ok] invariants hold
exit_code=0

3.7.4 Failure Demo (Deterministic)

$ ./arena_demo --alloc 1024
[arena] init size=128
[error] out of memory
exit_code=2

3.7.5 Exit Codes

0: success
2: out of memory
3: invariant violation
4: invalid argument

4. Solution Architecture

4.1 High-Level Design

+-------------+     +------------------+     +-----------------------+
| arena API   | --> | bump allocator   | --> | invariants/validation |
+-------------+     +------------------+     +-----------------------+

4.2 Key Components

4.3 Data Structures (No Full Code)

typedef struct Arena {
    unsigned char *base;
    size_t capacity;
    size_t used;
} Arena;

4.4 Algorithm Overview

Key Algorithm: Allocation

Compute aligned pointer.
Check bounds.
Update used.

Complexity Analysis:

Time: O(1) per allocation.
Space: O(capacity) total; alignment padding counts against the capacity.

5. Implementation Guide

5.1 Development Environment Setup

cc --version
make --version

5.2 Project Structure

arena/
├── include/arena.h
├── src/arena.c
├── tests/arena_test.c
├── examples/arena_demo.c
└── Makefile

5.3 The Core Question You’re Answering

“How can I allocate many small objects quickly when they share a lifetime?”

5.4 Concepts You Must Understand First

Alignment and object representation.
Bump allocation invariants.
Ownership and lifetime of arena allocations.

5.5 Questions to Guide Your Design

How will you handle alignment for arbitrary types?
How will you handle allocation failure without state corruption?
Will you support marks for partial reset?

5.6 Thinking Exercise

Given a 64-byte arena, allocate 8 bytes aligned to 16, then 12 bytes aligned to 8. Track used after each allocation and identify padding.

5.7 The Interview Questions They’ll Ask

What is an arena allocator and why is it fast?
How do you ensure alignment in a custom allocator?
What are the trade-offs of an arena vs malloc?

5.8 Hints in Layers

Hint 1: Use uintptr_t for alignment math.
Hint 2: Keep used equal to aligned_ptr - base + size.
Hint 3: Add invariant checks on every allocation in debug builds.

5.9 Books That Will Help

5.10 Implementation Phases

Phase 1: Foundation (2-3 days)

Goals: Define struct, init, destroy.

Tasks:

Implement arena_init/arena_destroy.
Add arena_validate.
Write initial tests.

Checkpoint: Init/destroy with no leaks.

Phase 2: Core Functionality (3-5 days)

Goals: Allocation with alignment.

Tasks:

Implement align_up helper.
Implement arena_alloc and arena_alloc_aligned.
Add out-of-memory tests.

Checkpoint: Alignment tests pass.

Phase 3: Polish & Edge Cases (2-3 days)

Goals: Demo, docs, optional marks.

Tasks:

Implement demo CLI with deterministic output.
Document ownership and reset semantics.
Add optional mark/pop.

Checkpoint: Failure demo matches transcript.

5.11 Key Implementation Decisions

6. Testing Strategy

6.1 Test Categories

6.2 Critical Test Cases

Allocate objects with different alignment requirements.
Allocate until full, then one more allocation should fail.
Reset and verify address reuse.

6.3 Test Data

Capacity: 64
Allocations: 8@8, 16@16, 12@4
Expected: no overlap, used <= 64
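The test data above can be turned into an executable check. This sketch assumes the arena base is 16-byte aligned, in which case the sequence consumes 44 bytes (8 bytes of padding precede the 16@16 allocation). The names alloc_aligned and run_test_data are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    unsigned char *base;
    size_t capacity;
    size_t used;
} Arena;

/* Compact aligned bump allocation; align must be a power of two. */
static void *alloc_aligned(Arena *a, size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    uintptr_t start = (uintptr_t)a->base + a->used;
    size_t padding = (size_t)((align - (start & (align - 1))) & (align - 1));
    size_t remaining = a->capacity - a->used;
    if (padding > remaining || size > remaining - padding) {
        return NULL;
    }
    a->used += padding + size;
    return a->base + (a->used - size);
}

/* Runs 8@8, 16@16, 12@4 and returns the final used count, or
 * (size_t)-1 if any allocation fails, is misaligned, or overlaps. */
static size_t run_test_data(Arena *a)
{
    static const size_t sizes[]  = { 8, 16, 12 };
    static const size_t aligns[] = { 8, 16, 4 };
    unsigned char *prev_end = a->base;
    for (size_t i = 0; i < 3; i++) {
        unsigned char *p = alloc_aligned(a, sizes[i], aligns[i]);
        if (p == NULL) return (size_t)-1;
        if ((uintptr_t)p % aligns[i] != 0) return (size_t)-1;
        if (p < prev_end) return (size_t)-1;    /* overlap */
        prev_end = p + sizes[i];
    }
    return a->used;
}
```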

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

7.2 Debugging Strategies

Add an assertion for alignment on every allocation.
Log used and pointer values in the demo.

7.3 Performance Traps

Using a small arena for large allocations leads to frequent failures. Choose a capacity based on expected workload.

8. Extensions & Challenges

8.1 Beginner Extensions

Add an arena_strdup helper.
Add arena_calloc that zeroes memory.
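A rough sketch of both helpers, assuming a compact aligned allocation routine (here called alloc_aligned, a hypothetical stand-in for arena_alloc_aligned). Using alignment 1 for strings and max_align_t for zeroed blocks is a design choice of this sketch, not something the project prescribes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    unsigned char *base;
    size_t capacity;
    size_t used;
} Arena;

/* Compact aligned bump allocation; align must be a power of two. */
static void *alloc_aligned(Arena *a, size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    uintptr_t start = (uintptr_t)a->base + a->used;
    size_t padding = (size_t)((align - (start & (align - 1))) & (align - 1));
    size_t remaining = a->capacity - a->used;
    if (padding > remaining || size > remaining - padding) {
        return NULL;
    }
    a->used += padding + size;
    return a->base + (a->used - size);
}

/* Copies s, including its terminator, into the arena. */
static char *arena_strdup(Arena *a, const char *s)
{
    size_t n = strlen(s) + 1;
    char *copy = alloc_aligned(a, n, 1);
    if (copy != NULL) {
        memcpy(copy, s, n);
    }
    return copy;
}

/* Zeroed allocation of count * size bytes; rejects overflow. */
static void *arena_calloc(Arena *a, size_t count, size_t size)
{
    if (size != 0 && count > SIZE_MAX / size) {
        return NULL;
    }
    void *p = alloc_aligned(a, count * size, _Alignof(max_align_t));
    if (p != NULL) {
        memset(p, 0, count * size);
    }
    return p;
}
```

The returned string is owned by the arena like any other allocation, so it must not be passed to free and becomes invalid on reset.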

8.2 Intermediate Extensions

Add mark/pop for partial resets.
Add optional zero-on-reset mode.

8.3 Advanced Extensions

Implement a multi-block arena that grows.
Integrate with a JSON parser to allocate nodes.

9. Real-World Connections

9.1 Industry Applications

Request allocators in servers.
Frame allocators in game engines.

9.2 Related Open Source Projects

LLVM provides BumpPtrAllocator, a bump-pointer arena; Clang allocates its AST nodes from one.
Many compilers use arena-like allocators for ASTs.

9.3 Interview Relevance

Custom allocator design questions.
Alignment and memory layout questions.

10. Resources

10.1 Essential Reading

“C Interfaces and Implementations” by David Hanson.
“Effective C” by Robert Seacord.

10.2 Video Resources

Memory allocator lectures (systems programming courses).

10.3 Tools & Documentation

C11 alignment documentation.

11. Self-Assessment Checklist

11.1 Understanding

I can explain alignment and why it matters.
I can describe arena invariants.
I can explain why reset invalidates pointers.

11.2 Implementation

All alignment tests pass.
Out-of-memory cases return correct error codes.
Demo output matches transcript.

11.3 Growth

I can compare arenas to malloc/free.
I documented the ownership model clearly.

12. Submission / Completion Criteria

Minimum Viable Completion:

Basic allocation, reset, destroy work.
Invariants validated in tests.

Full Completion:

Alignment-aware allocation and failure tests.

Excellence (Going Above & Beyond):

Multi-block arena or mark/pop support.

13. Additional Content Rules (Hard Requirements)

13.1 Determinism

All demos use fixed sequences of allocations with known alignment requirements.

13.2 Outcome Completeness

Success and failure demos included.
Exit codes specified in §3.7.5.

13.3 Cross-Linking

Internal references: See §5.10 Phase 2 and §6.2 Critical Test Cases.
Other projects: P04 JSON Parser, P06 HTTP Server.

13.4 No Placeholder Text

All content is complete and explicit.