Project 4: Arena Allocator
Project 4: Arena Allocator
Sprint: 1 - Memory & Control Difficulty: Intermediate Time Estimate: Weekend - 1 week Prerequisites: Project 3 or understanding of malloc/free
Overview
What youโll build: A bump/arena allocator that allocates memory from a pre-allocated block, with O(1) allocation and bulk-free semantics.
Why it teaches memory & control: This is where you understand why allocators exist. Youโll see that malloc is just softwareโsomeone wrote it. By building the simplest possible allocator, you understand memory as a raw resource to be carved up, not a magic service.
The Core Question Youโre Answering:
โWhat if I could allocate memory with just a pointer increment, and free everything at once?โ
This is the arena allocatorโs insight. It trades flexibility (individual frees) for speed (O(1) allocation, O(1) bulk free). By building one, you understand that malloc is just one way to manage memoryโand often not the best way.
Learning Objectives
By the end of this project, you will be able to:
- Explain why allocators exist and what problem they solve
- Implement a bump allocator from scratch using mmap or malloc
- Handle memory alignment requirements for different data types
- Compare arena allocation patterns to general-purpose allocation
- Identify use cases where arena allocation is optimal
- Use mmap to request memory directly from the operating system
- Reason about allocation performance (O(1) vs O(n) patterns)
- Design lifetime-based memory management for batch operations
Theoretical Foundation
Why Allocators Exist
Every time you call malloc(), youโre not asking the operating system directly. Instead, youโre asking the malloc implementationโa piece of software that mediates between your program and the OS.
Why the indirection?
System calls are expensive. Each call to the OS kernel involves:
- Context switch from user mode to kernel mode
- Security checks and permission validation
- Kernel data structure manipulation
- Context switch back to user mode
Without allocator (naive approach):
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Your Program โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค
โ malloc(16) โ syscall โ OS gives 16 bytes โ
โ malloc(32) โ syscall โ OS gives 32 bytes โ
โ malloc(8) โ syscall โ OS gives 8 bytes โ
โ ... โ
โ 1000 allocations = 1000 syscalls = SLOW! โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
With allocator (malloc):
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Your Program โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค
โ malloc(16) โ allocator gives 16 bytes from pool โ
โ malloc(32) โ allocator gives 32 bytes from pool โ
โ malloc(8) โ allocator gives 8 bytes from pool โ
โ ... โ
โ When pool empty โ ONE syscall to refill โ
โ 1000 allocations = ~1 syscall + 1000 pointer ops = FAST! โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
The allocatorโs job:
- Request large chunks from the OS (infrequently)
- Subdivide those chunks for your program (frequently)
- Track whatโs in use and whatโs free
- Coalesce free blocks to reduce fragmentation
The Bump Allocator: Simplest Possible Design
A bump allocator (also called arena, linear, or region allocator) is the simplest possible memory allocator:
Arena/Bump Allocator Structure:
Initial state:
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ โ
โ 1024 bytes of memory โ
โ โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โฒ โฒ
โ โ
base pointer end
(offset = 0)
After arena_alloc(100):
โโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ USED (100) โ AVAILABLE (924) โ
โโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โฒ โฒ โฒ
โ โ โ
base current (offset = 100) end
After arena_alloc(200):
โโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโ
โ USED (100) โ USED (200) โ AVAILABLE(724)โ
โโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโ
โฒ โฒ โฒ
โ โ โ
base current (300) end
After arena_reset():
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ โ
โ 1024 bytes of memory โ
โ โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โฒ โฒ
โ โ
base (offset = 0 again!) end
All "allocations" gone instantly - O(1)!
Why itโs called โbumpโ allocation:
- To allocate, you just โbumpโ (increment) the current pointer
- No searching for free blocks
- No updating free lists
- Just
ptr = base + offset; offset += size;
The Tradeoff: Speed vs Flexibility
General-purpose allocator (malloc):
- Can free individual allocations
- Handles arbitrary allocation/free patterns
- Must track every allocation
- O(n) or O(log n) allocation in worst case
- Complex implementation
Arena allocator:
- Can only free ALL allocations at once
- Perfect for batch operations
- Minimal tracking (just one offset)
- O(1) allocation always
- Simple implementation (~50 lines)
When to use which:
malloc is better when:
โโ Allocations have varying, unpredictable lifetimes
โโ You need to free individual objects
โโ Memory pressure requires immediate reclamation
โโ General-purpose code with no allocation pattern
Arena is better when:
โโ Allocations share a common lifetime (batch operations)
โโ You can reset/free everything together
โโ Performance is critical
โโ Examples:
โโ Game frame allocations (allocate during frame, reset at frame end)
โโ Request handlers (allocate during request, reset when done)
โโ Compiler/parser (allocate AST nodes, free when compilation done)
โโ JSON parser (allocate tree nodes, free entire tree at once)
Memory Alignment
Modern CPUs require data to be aligned to certain boundaries for efficient (or even correct) access:
Alignment Requirements (typical x86-64):
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Type โ Size โ Alignment โ Valid Addresses โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค
โ char โ 1 byte โ 1 byte โ any address โ
โ short โ 2 bytes โ 2 bytes โ 0, 2, 4, 6, 8, ... โ
โ int โ 4 bytes โ 4 bytes โ 0, 4, 8, 12, ... โ
โ long/pointer โ 8 bytes โ 8 bytes โ 0, 8, 16, 24, ... โ
โ double โ 8 bytes โ 8 bytes โ 0, 8, 16, 24, ... โ
โ long double โ 16 bytesโ 16 bytes โ 0, 16, 32, ... โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
What happens with misaligned access:
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Aligned (good): โ
โ โ
โ Address: 0 4 8 12 16 20 24 28 โ
โ โโโโโโผโโโโโผโโโโโผโโโโโผโโโโโผโโโโโผโโโโโค โ
โ โint โint โint โint โ... โ โ โ โ
โ โโโโโโดโโโโโดโโโโโดโโโโโดโโโโโดโโโโโดโโโโโ โ
โ โ One memory read per int โ
โ โ
โ Misaligned (bad): โ
โ โ
โ Address: 0 4 8 12 16 20 24 28 โ
โ โโโโโโผโโโโโผโโโโโผโโโโโผโโโโโผโโโโโผโโโโโค โ
โ โ x โ IN T โผโ x โ โ โ โ โ
โ โโโโโโดโโโโโดโโโโโดโโโโโดโโโโโดโโโโโดโโโโโ โ
โ โฒ int starting at address 2 โ
โ โ Two memory reads required, or CPU exception! โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Alignment formula:
To align address addr up to alignment align (where align is a power of 2):
aligned_addr = (addr + align - 1) & ~(align - 1);
Example:
Aligning 5 to 4-byte boundary:
addr = 5 = 0b00000101
align = 4 = 0b00000100
align - 1 = 0b00000011
addr + align - 1 = 5 + 3 = 8 = 0b00001000
~(align - 1) = 0b11111100
(8) & 0b11111100 = 0b00001000 = 8
Result: 5 aligned up to 8 (next multiple of 4)
Getting Memory from the OS: mmap
While you can use malloc() to get memory for your arena, using mmap() directly gives you more control:
#include <sys/mman.h>
// Request 1MB of memory directly from OS
void* block = mmap(
NULL, // Let OS choose address
1024 * 1024, // Size in bytes
PROT_READ | PROT_WRITE, // Can read and write
MAP_PRIVATE | MAP_ANONYMOUS, // Private, not backed by file
-1, // No file descriptor
0 // No offset
);
// Give memory back to OS
munmap(block, 1024 * 1024);
Why use mmap over malloc?
- Direct control: Know exactly how much youโre getting
- No metadata overhead: malloc adds headers to each allocation
- Guaranteed contiguous: One big block, not potentially scattered
- Zero-initialized: mmap gives you zeroed memory (with MAP_ANONYMOUS)
- Large allocations: malloc may use mmap internally anyway for large requests
Project Specification
Core API
// Create a new arena with specified capacity
Arena* arena_create(size_t capacity);
// Allocate size bytes from the arena
// Returns NULL if arena is exhausted
void* arena_alloc(Arena* arena, size_t size);
// Allocate with specific alignment
void* arena_alloc_aligned(Arena* arena, size_t size, size_t alignment);
// Reset arena - all allocations become invalid
// This is O(1) - just resets the offset
void arena_reset(Arena* arena);
// Destroy arena and release memory to OS
void arena_destroy(Arena* arena);
// Query functions
size_t arena_used(Arena* arena);
size_t arena_remaining(Arena* arena);
Expected Output Examples
Basic Usage:
$ ./test_arena
Arena created: 1048576 bytes
After allocating p1:
Used: 54 bytes
Free: 1048522 bytes
After allocating p2:
Used: 108 bytes
Free: 1048468 bytes
After allocating 100 integers:
Used: 508 bytes
Free: 1048068 bytes
After arena_reset():
Used: 0 bytes (back to zero!)
Free: 1048576 bytes (all available again!)
Visual Bump Pointer:
$ ./test_visual
Initial state:
โโ Base: 0x100204000
โโ Current: 0x100204000 (base + 0)
โโ Capacity: 1024 bytes
โโ Used: 0.00%
After arena_alloc(100):
โโ Base: 0x100204000
โโ Current: 0x100204064 (base + 100)
โโ Capacity: 1024 bytes
โโ Used: 9.77%
After arena_alloc(200):
โโ Base: 0x100204000
โโ Current: 0x1002040c8 (base + 300)
โโ Capacity: 1024 bytes
โโ Used: 29.30%
After arena_alloc(300):
โโ Base: 0x100204000
โโ Current: 0x10020412c (base + 600)
โโ Capacity: 1024 bytes
โโ Used: 58.59%
Performance Benchmark:
$ ./benchmark
Benchmarking 1000000 iterations of 100 allocations each...
malloc/free: 8.342 seconds
Arena alloc: 0.241 seconds
Arena is 34.6x faster!
Minimum Viable Features
- arena_create: Request memory block from OS
- arena_alloc: Bump allocate with default alignment
- arena_reset: Reset offset to 0
- arena_destroy: Return memory to OS
Extended Features
- Alignment support: Align allocations to arbitrary power-of-2 boundaries
- Typed allocation macros:
arena_alloc_type(arena, Type) - Growing arenas: Chain multiple blocks when capacity exceeded
- Temporary arenas: Save/restore offset for nested scopes
- Debug mode: Track allocation count, high watermark, statistics
Solution Architecture
Data Structures
typedef struct Arena {
char* base; // Start of memory block
size_t offset; // Current position (bytes used)
size_t capacity; // Total size of block
} Arena;
Memory Layout:
Arena struct (on heap or stack):
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ base โ โโโโโโโโโโโโโโ โ
โ offset = 300 โ โ
โ capacity = 1024 โ โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ
โผ
Memory block (from mmap):
โโโโโโโโโโโโฌโโโโโโโโโโโโโฌโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโ
โ A (100) โ B (200) โ โ Available (724) โ
โโโโโโโโโโโโดโโโโโโโโโโโโโค โ โ
โ USED (300) โ โ โ
โโโโโโโโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโโโโโโโ
โฒ โฒ โฒ
โ โ โ
base base + offset base + capacity
Algorithm: Bump Allocation
arena_alloc(arena, size):
1. Check if size fits: offset + size <= capacity?
- If no: return NULL (out of space)
2. Calculate pointer: ptr = base + offset
3. Bump offset: offset += size
4. Return ptr
Time complexity: O(1) - just arithmetic and comparison
Space overhead: 0 bytes per allocation (no headers!)
Algorithm: Aligned Allocation
arena_alloc_aligned(arena, size, alignment):
1. Calculate aligned offset:
aligned_offset = (offset + alignment - 1) & ~(alignment - 1)
2. Check if aligned allocation fits:
aligned_offset + size <= capacity?
- If no: return NULL
3. Calculate pointer: ptr = base + aligned_offset
4. Bump offset: offset = aligned_offset + size
5. Return ptr
Note: Some bytes may be "wasted" as padding between allocations
Alignment Padding Example:
Before (offset = 5, need 8-byte alignment):
โโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โUSED โ AVAILABLE โ
โโโโโโโดโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โฒ
โ
offset = 5
After arena_alloc_aligned(arena, 16, 8):
โโโโโโโฌโโโโฌโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โUSED โPADโ NEW ALLOC (16) โ AVAILABLE โ
โโโโโโโดโโโโดโโโโโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โฒ โฒ โฒ
โ โ โ
5 8 24 (new offset)
3 bytes of padding "wasted" for alignment
Component Organization
arena.h - Public API and Arena typedef
arena.c - Implementation
โโ arena_create - mmap wrapper
โโ arena_alloc - Bump allocation
โโ arena_alloc_aligned - Aligned bump allocation
โโ arena_reset - Set offset to 0
โโ arena_destroy - munmap wrapper
โโ query functions
test_arena.c - Basic functionality tests
benchmark.c - Performance comparison with malloc
Implementation Guide
Phase 1: Minimal Arena (30 minutes)
Goal: Get basic allocation working without mmap or alignment.
- Create
Arenastruct with base, offset, capacity - Implement
arena_createusingmalloc(capacity)for now - Implement
arena_alloc: check bounds, bump offset, return pointer - Implement
arena_reset: set offset to 0 - Implement
arena_destroy:free(base)
Test: Allocate some integers, print their addresses, verify theyโre sequential.
// Your first test
Arena* arena = arena_create(1024);
int* a = arena_alloc(arena, sizeof(int));
int* b = arena_alloc(arena, sizeof(int));
int* c = arena_alloc(arena, sizeof(int));
*a = 10; *b = 20; *c = 30;
printf("a at %p, b at %p, c at %p\n", a, b, c);
// Should be exactly 4 bytes apart!
arena_destroy(arena);
Phase 2: Use mmap (30 minutes)
Goal: Request memory directly from OS.
- Replace
malloc(capacity)withmmap()call - Replace
free()withmunmap() - Handle mmap failure (returns
MAP_FAILED)
#include <sys/mman.h>
Arena* arena_create(size_t capacity) {
Arena* arena = malloc(sizeof(Arena));
if (!arena) return NULL;
arena->base = mmap(
NULL,
capacity,
PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS,
-1,
0
);
if (arena->base == MAP_FAILED) {
free(arena);
return NULL;
}
arena->offset = 0;
arena->capacity = capacity;
return arena;
}
Phase 3: Alignment (45 minutes)
Goal: Support aligned allocations for any power-of-2 alignment.
- Implement alignment helper function
- Create
arena_alloc_aligned(arena, size, alignment) - Make
arena_alloccallarena_alloc_alignedwith default alignment (8 or 16) - Test with different types requiring different alignments
// Alignment helper
static size_t align_up(size_t value, size_t alignment) {
return (value + alignment - 1) & ~(alignment - 1);
}
void* arena_alloc_aligned(Arena* arena, size_t size, size_t alignment) {
size_t aligned_offset = align_up(arena->offset, alignment);
if (aligned_offset + size > arena->capacity) {
return NULL;
}
void* ptr = arena->base + aligned_offset;
arena->offset = aligned_offset + size;
return ptr;
}
Phase 4: Type-Safe Macros (30 minutes)
Goal: Create convenient macros for type allocations.
// Single allocation
#define arena_new(arena, type) \
((type*)arena_alloc_aligned((arena), sizeof(type), _Alignof(type)))
// Array allocation
#define arena_array(arena, type, count) \
((type*)arena_alloc_aligned((arena), sizeof(type) * (count), _Alignof(type)))
// Usage:
Player* player = arena_new(arena, Player);
int* scores = arena_array(arena, int, 100);
Phase 5: Performance Benchmark (30 minutes)
Goal: Quantify the speed advantage.
#include <time.h>
#define ITERATIONS 1000000
#define ALLOCS_PER_ITER 100
#define ALLOC_SIZE 64
void benchmark_malloc() {
clock_t start = clock();
void* ptrs[ALLOCS_PER_ITER];
for (int i = 0; i < ITERATIONS; i++) {
for (int j = 0; j < ALLOCS_PER_ITER; j++) {
ptrs[j] = malloc(ALLOC_SIZE);
}
for (int j = 0; j < ALLOCS_PER_ITER; j++) {
free(ptrs[j]);
}
}
clock_t end = clock();
printf("malloc/free: %.3f seconds\n",
(double)(end - start) / CLOCKS_PER_SEC);
}
void benchmark_arena() {
clock_t start = clock();
Arena* arena = arena_create(ALLOC_SIZE * ALLOCS_PER_ITER);
for (int i = 0; i < ITERATIONS; i++) {
for (int j = 0; j < ALLOCS_PER_ITER; j++) {
arena_alloc(arena, ALLOC_SIZE);
}
arena_reset(arena);
}
arena_destroy(arena);
clock_t end = clock();
printf("arena: %.3f seconds\n",
(double)(end - start) / CLOCKS_PER_SEC);
}
Phase 6: Extended Features (Optional)
Temporary/Nested Arenas:
// Save current position
size_t arena_save(Arena* arena) {
return arena->offset;
}
// Restore to saved position (invalidates all allocations since save)
void arena_restore(Arena* arena, size_t savepoint) {
arena->offset = savepoint;
}
// Usage:
size_t saved = arena_save(arena);
// ... temporary allocations ...
arena_restore(arena, saved); // All temp allocations gone
Growing Arenas:
typedef struct ArenaBlock {
char* base;
size_t capacity;
struct ArenaBlock* next;
} ArenaBlock;
typedef struct GrowingArena {
ArenaBlock* head;
ArenaBlock* current;
size_t offset;
size_t block_size;
} GrowingArena;
// When current block is full, allocate new block and chain
Testing Strategy
Unit Tests
void test_basic_allocation() {
Arena* arena = arena_create(1024);
assert(arena != NULL);
int* a = arena_alloc(arena, sizeof(int));
assert(a != NULL);
*a = 42;
assert(*a == 42);
arena_destroy(arena);
printf("test_basic_allocation: PASSED\n");
}
void test_sequential_addresses() {
Arena* arena = arena_create(1024);
char* a = arena_alloc(arena, 1);
char* b = arena_alloc(arena, 1);
char* c = arena_alloc(arena, 1);
// Should be sequential
assert(b == a + 1);
assert(c == b + 1);
arena_destroy(arena);
printf("test_sequential_addresses: PASSED\n");
}
void test_reset() {
Arena* arena = arena_create(1024);
arena_alloc(arena, 512);
assert(arena_used(arena) == 512);
arena_reset(arena);
assert(arena_used(arena) == 0);
assert(arena_remaining(arena) == 1024);
arena_destroy(arena);
printf("test_reset: PASSED\n");
}
void test_exhaustion() {
Arena* arena = arena_create(100);
void* a = arena_alloc(arena, 50);
assert(a != NULL);
void* b = arena_alloc(arena, 50);
assert(b != NULL);
// This should fail - arena exhausted
void* c = arena_alloc(arena, 1);
assert(c == NULL);
arena_destroy(arena);
printf("test_exhaustion: PASSED\n");
}
void test_alignment() {
Arena* arena = arena_create(1024);
// Allocate 1 byte to misalign
arena_alloc(arena, 1);
// Now allocate something requiring 8-byte alignment
double* d = arena_alloc_aligned(arena, sizeof(double), 8);
// Address should be 8-byte aligned
assert(((uintptr_t)d % 8) == 0);
// Should be usable
*d = 3.14159;
assert(*d == 3.14159);
arena_destroy(arena);
printf("test_alignment: PASSED\n");
}
Integration Test: Game Frame Simulation
void test_game_frame_pattern() {
Arena* frame_arena = arena_create(1024 * 1024); // 1MB
for (int frame = 0; frame < 60; frame++) {
// Simulate frame allocations
Vector3* positions = arena_array(frame_arena, Vector3, 1000);
Vector3* velocities = arena_array(frame_arena, Vector3, 1000);
char* debug_buffer = arena_alloc(frame_arena, 4096);
// "Use" the allocations
for (int i = 0; i < 1000; i++) {
positions[i].x = (float)i;
velocities[i].x = (float)i * 0.1f;
}
snprintf(debug_buffer, 4096, "Frame %d", frame);
// Reset for next frame
arena_reset(frame_arena);
}
arena_destroy(frame_arena);
printf("test_game_frame_pattern: PASSED (60 frames, no leaks)\n");
}
Common Pitfalls
Pitfall 1: Using Allocations After Reset
// WRONG: Using pointer after reset
Arena* arena = arena_create(1024);
int* data = arena_alloc(arena, sizeof(int) * 100);
arena_reset(arena);
data[0] = 42; // UNDEFINED BEHAVIOR! Memory may be reused
// CORRECT: Reset invalidates ALL pointers from that arena
Arena* arena = arena_create(1024);
int* data = arena_alloc(arena, sizeof(int) * 100);
// Use data...
// When done:
data = NULL; // Clear your pointer
arena_reset(arena);
Pitfall 2: Forgetting Alignment
// WRONG: May cause crashes on some CPUs
Arena* arena = arena_create(1024);
char c;
arena_alloc(arena, 1); // offset = 1
double* d = arena_alloc(arena, sizeof(double)); // offset = 1, misaligned!
*d = 3.14; // Potential bus error!
// CORRECT: Use aligned allocation
Arena* arena = arena_create(1024);
arena_alloc(arena, 1); // offset = 1
double* d = arena_alloc_aligned(arena, sizeof(double), alignof(double));
*d = 3.14; // Safe, properly aligned
Pitfall 3: Integer Overflow in Size Calculations
// WRONG: Possible overflow with large sizes
void* arena_alloc(Arena* arena, size_t size) {
if (arena->offset + size > arena->capacity) { // Can overflow!
return NULL;
}
// ...
}
// CORRECT: Check for overflow
void* arena_alloc(Arena* arena, size_t size) {
if (size > arena->capacity - arena->offset) { // Safe check
return NULL;
}
// ...
}
Pitfall 4: Not Handling mmap Failure
// WRONG: Not checking mmap return value
Arena* arena_create(size_t capacity) {
Arena* arena = malloc(sizeof(Arena));
arena->base = mmap(NULL, capacity, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
// If mmap failed, base == MAP_FAILED, not NULL!
arena->offset = 0;
arena->capacity = capacity;
return arena;
}
// CORRECT: Check for MAP_FAILED
Arena* arena_create(size_t capacity) {
Arena* arena = malloc(sizeof(Arena));
if (!arena) return NULL;
arena->base = mmap(NULL, capacity, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (arena->base == MAP_FAILED) {
free(arena);
return NULL;
}
arena->offset = 0;
arena->capacity = capacity;
return arena;
}
Pitfall 5: Using Arena for Wrong Pattern
// WRONG: Using arena when you need individual frees
Arena* arena = arena_create(1024);
void* item;
while ((item = get_next_item())) {
void* processed = process_in_arena(arena, item);
output(processed);
// Can't free processed individually - arena keeps growing!
}
// Eventually arena exhausted
// CORRECT: Use arena for batch patterns only
Arena* arena = arena_create(1024);
while (has_batches()) {
Batch* batch = get_batch();
for (int i = 0; i < batch->count; i++) {
void* processed = process_in_arena(arena, batch->items[i]);
output(processed);
}
arena_reset(arena); // Reset after each batch
}
Extensions and Challenges
Challenge 1: Growing Arena (Medium)
Implement an arena that automatically grows when exhausted:
- Chain multiple blocks together
- Keep allocating without interruption
- Reset frees all blocks except the first
- Measure overhead vs fixed arena
Challenge 2: Scratch/Temp Arena (Medium)
Implement nested arena scopes:
Arena* arena = arena_create(1024);
// ... some allocations ...
ArenaTemp temp = arena_begin_temp(arena);
// ... temporary allocations ...
arena_end_temp(temp); // Everything since begin_temp is freed
// Original allocations still valid
Challenge 3: Pool Allocator (Hard)
Build a pool allocator on top of your arena:
- Fixed-size blocks only
- O(1) allocate AND free
- No fragmentation
- Free list stored in the free blocks themselves
Challenge 4: Thread-Local Arenas (Hard)
Make arenas thread-safe:
- Option 1: One arena per thread
- Option 2: Mutex-protected arena
- Compare performance
Challenge 5: Debug Arena (Medium)
Add debugging features:
- Track number of allocations
- Track high-water mark
- Poison freed memory (fill with 0xDE)
- Detect use-after-reset
Real-World Applications
Game Engine Frame Allocator
// Used by many game engines (Unreal, Unity underlying systems, etc.)
void game_loop() {
Arena* frame_arena = arena_create(16 * 1024 * 1024); // 16MB
while (game_running) {
process_input(frame_arena);
update_physics(frame_arena);
render_frame(frame_arena);
// All frame-specific allocations freed instantly
arena_reset(frame_arena);
}
arena_destroy(frame_arena);
}
Web Server Request Handler
// Each request gets its own arena
void handle_request(Request* req) {
Arena* request_arena = arena_create(64 * 1024); // 64KB
ParsedHeaders* headers = parse_headers(request_arena, req);
JsonValue* body = parse_json(request_arena, req->body);
Response* response = generate_response(request_arena, headers, body);
send_response(response);
// All request data freed at once
arena_destroy(request_arena);
}
Compiler/Parser
// Parse phase allocates AST nodes into arena
AST* parse_file(const char* filename) {
Arena* parse_arena = arena_create(1024 * 1024); // 1MB
Tokens* tokens = lex(parse_arena, filename);
AST* ast = parse(parse_arena, tokens);
// AST lives in arena - return it
// Arena destroyed when AST no longer needed
return ast;
}
Simulation/Scientific Computing
// Monte Carlo simulation with many temporary values
void run_simulation(int iterations) {
Arena* iter_arena = arena_create(10 * 1024 * 1024); // 10MB
for (int i = 0; i < iterations; i++) {
Matrix* temp1 = matrix_create(iter_arena, 1000, 1000);
Matrix* temp2 = matrix_multiply(iter_arena, temp1, other);
Vector* result = matrix_reduce(iter_arena, temp2);
accumulate(result);
// Hundreds of MB would be allocated without arena
// With arena: just reset
arena_reset(iter_arena);
}
arena_destroy(iter_arena);
}
Interview Preparation
Common Questions
- โWhat is an arena allocator? When would you use one?โ
- An arena allocator pre-allocates a large block and hands out pieces sequentially
- Use when allocations share a lifetime and can be freed together
- Examples: per-frame, per-request, per-parse operations
- โWhatโs the time complexity of arena allocation vs malloc?โ
- Arena: O(1) always - just bump a pointer
- malloc: O(1) best case, O(n) worst case (searching free lists)
- Arena reset: O(1) - just set offset to 0
- free: O(1) to O(log n) depending on implementation
- โWhy canโt you free individual allocations from an arena?โ
- No per-allocation metadata to track whatโs allocated
- Would need free list, defeating the purpose
- Design choice: trade individual frees for speed
- โWhat is memory alignment and why does it matter?โ
- CPUs require data at specific address boundaries (2, 4, 8, 16 bytes)
- Misaligned access: slower (multiple memory reads) or crash (bus error)
- Align formula:
(addr + align - 1) & ~(align - 1)
- โCompare arena allocators to pool allocators.โ
- Arena: variable-size, no individual free, bulk reset
- Pool: fixed-size, individual free via free list, good for objects of same size
- Arena for temporary data, pool for persistent same-sized objects
- โIn what scenarios would an arena allocator be faster than malloc?โ
- Many small allocations followed by bulk free
- Known lifetime patterns (frame, request, transaction)
- Performance-critical paths with predictable allocation
- When fragmentation would hurt malloc
Self-Assessment Checklist
Understanding (Can You Explain?)
- Why syscalls are expensive and allocators batch them
- What โbump allocationโ means and why itโs O(1)
- The tradeoff between flexibility and performance in allocators
- Why alignment matters and how to calculate aligned addresses
- When arena allocation is appropriate vs general-purpose
Implementation (Can You Build?)
- Basic arena with create/alloc/reset/destroy
- Aligned allocation supporting power-of-2 alignments
- Type-safe macros for allocation
- mmap-based memory acquisition
Application (Can You Apply?)
- Identify code that would benefit from arena allocation
- Benchmark arena vs malloc for specific patterns
- Design arena usage for a game frame or request handler
- Avoid the common pitfalls (use-after-reset, alignment)
Extension (Can You Extend?)
- Implement growing arenas
- Add temp/scratch arena pattern
- Build pool allocator on top of arena
- Add debug/statistics features
Resources
Books
| Topic | Book | Chapter |
|---|---|---|
| Arena design patterns | โC Interfaces and Implementationsโ by David Hanson | Ch. 5-6 |
| Memory allocator internals | โComputer Systems: A Programmerโs Perspectiveโ | Ch. 9.9 |
| mmap and virtual memory | โThe Linux Programming Interfaceโ | Ch. 49 |
| Alignment requirements | โComputer Systems: A Programmerโs Perspectiveโ | Ch. 3.9.3 |
| Allocator strategies | โOperating Systems: Three Easy Piecesโ | Ch. 17 |
Online Resources
- Memory Allocators 101
- malloc() from Scratch
- Build Your Own Memory Allocator
- Ryan Fleuryโs Arena Allocator Article
Code References
- stb single-file libraries - See memory handling patterns
- mimalloc - Microsoftโs performance allocator
- jemalloc - General-purpose allocator with arenas
Summary
The arena allocator teaches a fundamental truth about memory management: malloc is just one solution, and often not the best one. By building your own allocator, you understand:
- Memory is a resource - You can manage it however you want
- Tradeoffs exist - Speed vs flexibility, simplicity vs features
- Patterns matter - Know your allocation pattern, choose the right allocator
- The OS is expensive - Batch syscalls, minimize kernel transitions
- Alignment is real - CPUs have requirements, respect them
When you can build a custom allocator for a specific use case and quantify its performance advantage, youโve moved from โusingโ memory to โmanagingโ memory. This is systems programming.
Next Project: P05: Exploit Lab (Buffer Overflow) - Now that you understand how memory really works, letโs see what happens when it goes wrong on purpose.