Project 4: Arena Allocator

Sprint: 1 - Memory & Control Difficulty: Intermediate Time Estimate: Weekend - 1 week Prerequisites: Project 3 or understanding of malloc/free

Overview

What you’ll build: A bump/arena allocator that allocates memory from a pre-allocated block, with O(1) allocation and bulk-free semantics.

Why it teaches memory & control: This is where you understand why allocators exist. You’ll see that malloc is just software—someone wrote it. By building the simplest possible allocator, you understand memory as a raw resource to be carved up, not a magic service.

The Core Question You’re Answering:

“What if I could allocate memory with just a pointer increment, and free everything at once?”

This is the arena allocator’s insight. It trades flexibility (individual frees) for speed (O(1) allocation, O(1) bulk free). By building one, you understand that malloc is just one way to manage memory—and often not the best way.

Learning Objectives

By the end of this project, you will be able to:

Explain why allocators exist and what problem they solve
Implement a bump allocator from scratch using mmap or malloc
Handle memory alignment requirements for different data types
Compare arena allocation patterns to general-purpose allocation
Identify use cases where arena allocation is optimal
Use mmap to request memory directly from the operating system
Reason about allocation performance (O(1) vs O(n) patterns)
Design lifetime-based memory management for batch operations

Theoretical Foundation

Why Allocators Exist

Every time you call malloc(), you’re not asking the operating system directly. Instead, you’re asking the malloc implementation—a piece of software that mediates between your program and the OS.

Why the indirection?

System calls are expensive. Each call to the OS kernel involves:

Context switch from user mode to kernel mode
Security checks and permission validation
Kernel data structure manipulation
Context switch back to user mode

Without allocator (naive approach):
┌──────────────────────────────────────────────────────────────┐
│ Your Program                                                  │
├──────────────────────────────────────────────────────────────┤
│  malloc(16)  → syscall → OS gives 16 bytes                  │
│  malloc(32)  → syscall → OS gives 32 bytes                  │
│  malloc(8)   → syscall → OS gives 8 bytes                   │
│  ...                                                         │
│  1000 allocations = 1000 syscalls = SLOW!                   │
└──────────────────────────────────────────────────────────────┘

With allocator (malloc):
┌──────────────────────────────────────────────────────────────┐
│ Your Program                                                  │
├──────────────────────────────────────────────────────────────┤
│  malloc(16)  → allocator gives 16 bytes from pool           │
│  malloc(32)  → allocator gives 32 bytes from pool           │
│  malloc(8)   → allocator gives 8 bytes from pool            │
│  ...                                                         │
│  When pool empty → ONE syscall to refill                    │
│  1000 allocations = ~1 syscall + 1000 pointer ops = FAST!   │
└──────────────────────────────────────────────────────────────┘

The allocator’s job:

Request large chunks from the OS (infrequently)
Subdivide those chunks for your program (frequently)
Track what’s in use and what’s free
Coalesce free blocks to reduce fragmentation

The Bump Allocator: Simplest Possible Design

A bump allocator (also called arena, linear, or region allocator) is the simplest possible memory allocator:

Arena/Bump Allocator Structure:

Initial state:
┌───────────────────────────────────────────────────────────────┐
│                                                               │
│                   1024 bytes of memory                        │
│                                                               │
└───────────────────────────────────────────────────────────────┘
▲                                                               ▲
│                                                               │
base pointer                                                    end
(offset = 0)

After arena_alloc(100):
┌─────────────────┬─────────────────────────────────────────────┐
│    USED (100)   │            AVAILABLE (924)                   │
└─────────────────┴─────────────────────────────────────────────┘
▲                 ▲                                             ▲
│                 │                                             │
base              current (offset = 100)                        end

After arena_alloc(200):
┌─────────────────┬─────────────────────────────┬───────────────┐
│    USED (100)   │      USED (200)             │ AVAILABLE(724)│
└─────────────────┴─────────────────────────────┴───────────────┘
▲                                               ▲               ▲
│                                               │               │
base                                            current (300)   end

After arena_reset():
┌───────────────────────────────────────────────────────────────┐
│                                                               │
│                   1024 bytes of memory                        │
│                                                               │
└───────────────────────────────────────────────────────────────┘
▲                                                               ▲
│                                                               │
base (offset = 0 again!)                                        end
All "allocations" gone instantly - O(1)!

Why it’s called “bump” allocation:

To allocate, you just “bump” (increment) the current pointer
No searching for free blocks
No updating free lists
Just ptr = base + offset; offset += size;

The Tradeoff: Speed vs Flexibility

General-purpose allocator (malloc):

Can free individual allocations
Handles arbitrary allocation/free patterns
Must track every allocation
O(n) or O(log n) allocation in worst case
Complex implementation

Arena allocator:

Can only free ALL allocations at once
Perfect for batch operations
Minimal tracking (just one offset)
O(1) allocation always
Simple implementation (~50 lines)

When to use which:

malloc is better when:
├─ Allocations have varying, unpredictable lifetimes
├─ You need to free individual objects
├─ Memory pressure requires immediate reclamation
└─ General-purpose code with no allocation pattern

Arena is better when:
├─ Allocations share a common lifetime (batch operations)
├─ You can reset/free everything together
├─ Performance is critical
└─ Examples:
   ├─ Game frame allocations (allocate during frame, reset at frame end)
   ├─ Request handlers (allocate during request, reset when done)
   ├─ Compiler/parser (allocate AST nodes, free when compilation done)
   └─ JSON parser (allocate tree nodes, free entire tree at once)

Memory Alignment

Modern CPUs require data to be aligned to certain boundaries for efficient (or even correct) access:

Alignment Requirements (typical x86-64):
┌────────────────────────────────────────────────────────────────┐
│ Type          │ Size    │ Alignment   │ Valid Addresses        │
├────────────────────────────────────────────────────────────────┤
│ char          │ 1 byte  │ 1 byte      │ any address            │
│ short         │ 2 bytes │ 2 bytes     │ 0, 2, 4, 6, 8, ...    │
│ int           │ 4 bytes │ 4 bytes     │ 0, 4, 8, 12, ...      │
│ long/pointer  │ 8 bytes │ 8 bytes     │ 0, 8, 16, 24, ...     │
│ double        │ 8 bytes │ 8 bytes     │ 0, 8, 16, 24, ...     │
│ long double   │ 16 bytes│ 16 bytes    │ 0, 16, 32, ...        │
└────────────────────────────────────────────────────────────────┘

What happens with misaligned access:
┌────────────────────────────────────────────────────────────────┐
│ Aligned (good):                                                │
│                                                                │
│ Address: 0    4    8    12   16   20   24   28               │
│          ├────┼────┼────┼────┼────┼────┼────┤                │
│          │int │int │int │int │... │    │    │                │
│          └────┴────┴────┴────┴────┴────┴────┘                │
│          ✓ One memory read per int                            │
│                                                                │
│ Misaligned (bad):                                              │
│                                                                │
│ Address: 0    4    8    12   16   20   24   28               │
│          ├────┼────┼────┼────┼────┼────┼────┤                │
│          │ x  │ IN T  ▼│ x  │    │    │    │                │
│          └────┴────┴────┴────┴────┴────┴────┘                │
│                  ▲ int starting at address 2                  │
│          ✗ Two memory reads required, or CPU exception!       │
└────────────────────────────────────────────────────────────────┘

Alignment formula: To align address addr up to alignment align (where align is a power of 2):

aligned_addr = (addr + align - 1) & ~(align - 1);

Example:

Aligning 5 to 4-byte boundary:
  addr = 5      = 0b00000101
  align = 4     = 0b00000100
  align - 1     = 0b00000011

  addr + align - 1 = 5 + 3 = 8 = 0b00001000
  ~(align - 1) = 0b11111100

  (8) & 0b11111100 = 0b00001000 = 8

  Result: 5 aligned up to 8 (next multiple of 4)

Getting Memory from the OS: mmap

While you can use malloc() to get memory for your arena, using mmap() directly gives you more control:

#include <sys/mman.h>

// Request 1MB of memory directly from OS
void* block = mmap(
    NULL,                    // Let OS choose address
    1024 * 1024,            // Size in bytes
    PROT_READ | PROT_WRITE,  // Can read and write
    MAP_PRIVATE | MAP_ANONYMOUS, // Private, not backed by file
    -1,                      // No file descriptor
    0                        // No offset
);

// Give memory back to OS
munmap(block, 1024 * 1024);

Why use mmap over malloc?

Direct control: Know exactly how much you’re getting
No metadata overhead: malloc adds headers to each allocation
Guaranteed contiguous: One big block, not potentially scattered
Zero-initialized: mmap gives you zeroed memory (with MAP_ANONYMOUS)
Large allocations: malloc may use mmap internally anyway for large requests

Project Specification

Core API

// Create a new arena with specified capacity
Arena* arena_create(size_t capacity);

// Allocate size bytes from the arena
// Returns NULL if arena is exhausted
void* arena_alloc(Arena* arena, size_t size);

// Allocate with specific alignment
void* arena_alloc_aligned(Arena* arena, size_t size, size_t alignment);

// Reset arena - all allocations become invalid
// This is O(1) - just resets the offset
void arena_reset(Arena* arena);

// Destroy arena and release memory to OS
void arena_destroy(Arena* arena);

// Query functions
size_t arena_used(Arena* arena);
size_t arena_remaining(Arena* arena);

Expected Output Examples

Basic Usage:

$ ./test_arena

Arena created: 1048576 bytes

After allocating p1:
  Used: 54 bytes
  Free: 1048522 bytes

After allocating p2:
  Used: 108 bytes
  Free: 1048468 bytes

After allocating 100 integers:
  Used: 508 bytes
  Free: 1048068 bytes

After arena_reset():
  Used: 0 bytes (back to zero!)
  Free: 1048576 bytes (all available again!)

Visual Bump Pointer:

$ ./test_visual

Initial state:
├─ Base:     0x100204000
├─ Current:  0x100204000 (base + 0)
├─ Capacity: 1024 bytes
└─ Used:     0.00%

After arena_alloc(100):
├─ Base:     0x100204000
├─ Current:  0x100204064 (base + 100)
├─ Capacity: 1024 bytes
└─ Used:     9.77%

After arena_alloc(200):
├─ Base:     0x100204000
├─ Current:  0x1002040c8 (base + 300)
├─ Capacity: 1024 bytes
└─ Used:     29.30%

After arena_alloc(300):
├─ Base:     0x100204000
├─ Current:  0x10020412c (base + 600)
├─ Capacity: 1024 bytes
└─ Used:     58.59%

Performance Benchmark:

$ ./benchmark

Benchmarking 1000000 iterations of 100 allocations each...

malloc/free:  8.342 seconds
Arena alloc:  0.241 seconds

Arena is 34.6x faster!

Minimum Viable Features

arena_create: Request memory block from OS
arena_alloc: Bump allocate with default alignment
arena_reset: Reset offset to 0
arena_destroy: Return memory to OS

Extended Features

Alignment support: Align allocations to arbitrary power-of-2 boundaries
Typed allocation macros: arena_alloc_type(arena, Type)
Growing arenas: Chain multiple blocks when capacity exceeded
Temporary arenas: Save/restore offset for nested scopes
Debug mode: Track allocation count, high watermark, statistics

Solution Architecture

Data Structures

typedef struct Arena {
    char* base;       // Start of memory block
    size_t offset;    // Current position (bytes used)
    size_t capacity;  // Total size of block
} Arena;

Memory Layout:

Arena struct (on heap or stack):
┌─────────────────────────────────────────────┐
│ base → ─────────────┐                       │
│ offset = 300        │                       │
│ capacity = 1024     │                       │
└─────────────────────│───────────────────────┘
                      │
                      ▼
Memory block (from mmap):
┌──────────┬────────────┬────────────┬────────────────────────┐
│ A (100)  │  B (200)   │            │    Available (724)     │
├──────────┴────────────┤            │                        │
│      USED (300)       │            │                        │
└───────────────────────┴────────────┴────────────────────────┘
▲                       ▲                                     ▲
│                       │                                     │
base                    base + offset                         base + capacity

Algorithm: Bump Allocation

arena_alloc(arena, size):
    1. Check if size fits: offset + size <= capacity?
       - If no: return NULL (out of space)

    2. Calculate pointer: ptr = base + offset

    3. Bump offset: offset += size

    4. Return ptr

Time complexity: O(1) - just arithmetic and comparison
Space overhead: 0 bytes per allocation (no headers!)

Algorithm: Aligned Allocation

arena_alloc_aligned(arena, size, alignment):
    1. Calculate aligned offset:
       aligned_offset = (offset + alignment - 1) & ~(alignment - 1)

    2. Check if aligned allocation fits:
       aligned_offset + size <= capacity?
       - If no: return NULL

    3. Calculate pointer: ptr = base + aligned_offset

    4. Bump offset: offset = aligned_offset + size

    5. Return ptr

Note: Some bytes may be "wasted" as padding between allocations

Alignment Padding Example:

Before (offset = 5, need 8-byte alignment):
┌─────┬─────────────────────────────────────────────────────────┐
│USED │                    AVAILABLE                            │
└─────┴─────────────────────────────────────────────────────────┘
      ▲
      │
      offset = 5

After arena_alloc_aligned(arena, 16, 8):
┌─────┬───┬──────────────────┬──────────────────────────────────┐
│USED │PAD│   NEW ALLOC (16) │           AVAILABLE              │
└─────┴───┴──────────────────┴──────────────────────────────────┘
      ▲   ▲                  ▲
      │   │                  │
      5   8                  24 (new offset)

      3 bytes of padding "wasted" for alignment

Component Organization

arena.h          - Public API and Arena typedef
arena.c          - Implementation
├─ arena_create  - mmap wrapper
├─ arena_alloc   - Bump allocation
├─ arena_alloc_aligned - Aligned bump allocation
├─ arena_reset   - Set offset to 0
├─ arena_destroy - munmap wrapper
└─ query functions

test_arena.c     - Basic functionality tests
benchmark.c      - Performance comparison with malloc

Implementation Guide

Phase 1: Minimal Arena (30 minutes)

Goal: Get basic allocation working without mmap or alignment.

Create Arena struct with base, offset, capacity
Implement arena_create using malloc(capacity) for now
Implement arena_alloc: check bounds, bump offset, return pointer
Implement arena_reset: set offset to 0
Implement arena_destroy: free(base)

Test: Allocate some integers, print their addresses, verify they’re sequential.

// Your first test
Arena* arena = arena_create(1024);
int* a = arena_alloc(arena, sizeof(int));
int* b = arena_alloc(arena, sizeof(int));
int* c = arena_alloc(arena, sizeof(int));

*a = 10; *b = 20; *c = 30;

printf("a at %p, b at %p, c at %p\n", a, b, c);
// Should be exactly 4 bytes apart!

arena_destroy(arena);

Phase 2: Use mmap (30 minutes)

Goal: Request memory directly from OS.

Replace malloc(capacity) with mmap() call
Replace free() with munmap()
Handle mmap failure (returns MAP_FAILED)

#include <sys/mman.h>

Arena* arena_create(size_t capacity) {
    Arena* arena = malloc(sizeof(Arena));
    if (!arena) return NULL;

    arena->base = mmap(
        NULL,
        capacity,
        PROT_READ | PROT_WRITE,
        MAP_PRIVATE | MAP_ANONYMOUS,
        -1,
        0
    );

    if (arena->base == MAP_FAILED) {
        free(arena);
        return NULL;
    }

    arena->offset = 0;
    arena->capacity = capacity;
    return arena;
}

Phase 3: Alignment (45 minutes)

Goal: Support aligned allocations for any power-of-2 alignment.

Implement alignment helper function
Create arena_alloc_aligned(arena, size, alignment)
Make arena_alloc call arena_alloc_aligned with default alignment (8 or 16)
Test with different types requiring different alignments

// Alignment helper
static size_t align_up(size_t value, size_t alignment) {
    return (value + alignment - 1) & ~(alignment - 1);
}

void* arena_alloc_aligned(Arena* arena, size_t size, size_t alignment) {
    size_t aligned_offset = align_up(arena->offset, alignment);

    if (aligned_offset + size > arena->capacity) {
        return NULL;
    }

    void* ptr = arena->base + aligned_offset;
    arena->offset = aligned_offset + size;
    return ptr;
}

Phase 4: Type-Safe Macros (30 minutes)

Goal: Create convenient macros for type allocations.

// Single allocation
#define arena_new(arena, type) \
    ((type*)arena_alloc_aligned((arena), sizeof(type), _Alignof(type)))

// Array allocation
#define arena_array(arena, type, count) \
    ((type*)arena_alloc_aligned((arena), sizeof(type) * (count), _Alignof(type)))

// Usage:
Player* player = arena_new(arena, Player);
int* scores = arena_array(arena, int, 100);

Phase 5: Performance Benchmark (30 minutes)

Goal: Quantify the speed advantage.

#include <time.h>

#define ITERATIONS 1000000
#define ALLOCS_PER_ITER 100
#define ALLOC_SIZE 64

void benchmark_malloc() {
    clock_t start = clock();
    void* ptrs[ALLOCS_PER_ITER];

    for (int i = 0; i < ITERATIONS; i++) {
        for (int j = 0; j < ALLOCS_PER_ITER; j++) {
            ptrs[j] = malloc(ALLOC_SIZE);
        }
        for (int j = 0; j < ALLOCS_PER_ITER; j++) {
            free(ptrs[j]);
        }
    }

    clock_t end = clock();
    printf("malloc/free: %.3f seconds\n",
           (double)(end - start) / CLOCKS_PER_SEC);
}

void benchmark_arena() {
    clock_t start = clock();
    Arena* arena = arena_create(ALLOC_SIZE * ALLOCS_PER_ITER);

    for (int i = 0; i < ITERATIONS; i++) {
        for (int j = 0; j < ALLOCS_PER_ITER; j++) {
            arena_alloc(arena, ALLOC_SIZE);
        }
        arena_reset(arena);
    }

    arena_destroy(arena);
    clock_t end = clock();
    printf("arena: %.3f seconds\n",
           (double)(end - start) / CLOCKS_PER_SEC);
}

Phase 6: Extended Features (Optional)

Temporary/Nested Arenas:

// Save current position
size_t arena_save(Arena* arena) {
    return arena->offset;
}

// Restore to saved position (invalidates all allocations since save)
void arena_restore(Arena* arena, size_t savepoint) {
    arena->offset = savepoint;
}

// Usage:
size_t saved = arena_save(arena);
// ... temporary allocations ...
arena_restore(arena, saved);  // All temp allocations gone

Growing Arenas:

typedef struct ArenaBlock {
    char* base;
    size_t capacity;
    struct ArenaBlock* next;
} ArenaBlock;

typedef struct GrowingArena {
    ArenaBlock* head;
    ArenaBlock* current;
    size_t offset;
    size_t block_size;
} GrowingArena;

// When current block is full, allocate new block and chain

Testing Strategy

Unit Tests

void test_basic_allocation() {
    Arena* arena = arena_create(1024);
    assert(arena != NULL);

    int* a = arena_alloc(arena, sizeof(int));
    assert(a != NULL);
    *a = 42;
    assert(*a == 42);

    arena_destroy(arena);
    printf("test_basic_allocation: PASSED\n");
}

void test_sequential_addresses() {
    Arena* arena = arena_create(1024);

    char* a = arena_alloc(arena, 1);
    char* b = arena_alloc(arena, 1);
    char* c = arena_alloc(arena, 1);

    // Should be sequential
    assert(b == a + 1);
    assert(c == b + 1);

    arena_destroy(arena);
    printf("test_sequential_addresses: PASSED\n");
}

void test_reset() {
    Arena* arena = arena_create(1024);

    arena_alloc(arena, 512);
    assert(arena_used(arena) == 512);

    arena_reset(arena);
    assert(arena_used(arena) == 0);
    assert(arena_remaining(arena) == 1024);

    arena_destroy(arena);
    printf("test_reset: PASSED\n");
}

void test_exhaustion() {
    Arena* arena = arena_create(100);

    void* a = arena_alloc(arena, 50);
    assert(a != NULL);

    void* b = arena_alloc(arena, 50);
    assert(b != NULL);

    // This should fail - arena exhausted
    void* c = arena_alloc(arena, 1);
    assert(c == NULL);

    arena_destroy(arena);
    printf("test_exhaustion: PASSED\n");
}

void test_alignment() {
    Arena* arena = arena_create(1024);

    // Allocate 1 byte to misalign
    arena_alloc(arena, 1);

    // Now allocate something requiring 8-byte alignment
    double* d = arena_alloc_aligned(arena, sizeof(double), 8);

    // Address should be 8-byte aligned
    assert(((uintptr_t)d % 8) == 0);

    // Should be usable
    *d = 3.14159;
    assert(*d == 3.14159);

    arena_destroy(arena);
    printf("test_alignment: PASSED\n");
}

Integration Test: Game Frame Simulation

void test_game_frame_pattern() {
    Arena* frame_arena = arena_create(1024 * 1024);  // 1MB

    for (int frame = 0; frame < 60; frame++) {
        // Simulate frame allocations
        Vector3* positions = arena_array(frame_arena, Vector3, 1000);
        Vector3* velocities = arena_array(frame_arena, Vector3, 1000);
        char* debug_buffer = arena_alloc(frame_arena, 4096);

        // "Use" the allocations
        for (int i = 0; i < 1000; i++) {
            positions[i].x = (float)i;
            velocities[i].x = (float)i * 0.1f;
        }
        snprintf(debug_buffer, 4096, "Frame %d", frame);

        // Reset for next frame
        arena_reset(frame_arena);
    }

    arena_destroy(frame_arena);
    printf("test_game_frame_pattern: PASSED (60 frames, no leaks)\n");
}

Common Pitfalls

Pitfall 1: Using Allocations After Reset

// WRONG: Using pointer after reset
Arena* arena = arena_create(1024);
int* data = arena_alloc(arena, sizeof(int) * 100);
arena_reset(arena);
data[0] = 42;  // UNDEFINED BEHAVIOR! Memory may be reused

// CORRECT: Reset invalidates ALL pointers from that arena
Arena* arena = arena_create(1024);
int* data = arena_alloc(arena, sizeof(int) * 100);
// Use data...
// When done:
data = NULL;  // Clear your pointer
arena_reset(arena);

Pitfall 2: Forgetting Alignment

// WRONG: May cause crashes on some CPUs
Arena* arena = arena_create(1024);
char c;
arena_alloc(arena, 1);  // offset = 1
double* d = arena_alloc(arena, sizeof(double));  // offset = 1, misaligned!
*d = 3.14;  // Potential bus error!

// CORRECT: Use aligned allocation
Arena* arena = arena_create(1024);
arena_alloc(arena, 1);  // offset = 1
double* d = arena_alloc_aligned(arena, sizeof(double), alignof(double));
*d = 3.14;  // Safe, properly aligned

Pitfall 3: Integer Overflow in Size Calculations

// WRONG: Possible overflow with large sizes
void* arena_alloc(Arena* arena, size_t size) {
    if (arena->offset + size > arena->capacity) {  // Can overflow!
        return NULL;
    }
    // ...
}

// CORRECT: Check for overflow
void* arena_alloc(Arena* arena, size_t size) {
    if (size > arena->capacity - arena->offset) {  // Safe check
        return NULL;
    }
    // ...
}

Pitfall 4: Not Handling mmap Failure

// WRONG: Not checking mmap return value
Arena* arena_create(size_t capacity) {
    Arena* arena = malloc(sizeof(Arena));
    arena->base = mmap(NULL, capacity, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    // If mmap failed, base == MAP_FAILED, not NULL!
    arena->offset = 0;
    arena->capacity = capacity;
    return arena;
}

// CORRECT: Check for MAP_FAILED
Arena* arena_create(size_t capacity) {
    Arena* arena = malloc(sizeof(Arena));
    if (!arena) return NULL;

    arena->base = mmap(NULL, capacity, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (arena->base == MAP_FAILED) {
        free(arena);
        return NULL;
    }

    arena->offset = 0;
    arena->capacity = capacity;
    return arena;
}

Pitfall 5: Using Arena for Wrong Pattern

// WRONG: Using arena when you need individual frees
Arena* arena = arena_create(1024);
void* item;
while ((item = get_next_item())) {
    void* processed = process_in_arena(arena, item);
    output(processed);
    // Can't free processed individually - arena keeps growing!
}
// Eventually arena exhausted

// CORRECT: Use arena for batch patterns only
Arena* arena = arena_create(1024);
while (has_batches()) {
    Batch* batch = get_batch();
    for (int i = 0; i < batch->count; i++) {
        void* processed = process_in_arena(arena, batch->items[i]);
        output(processed);
    }
    arena_reset(arena);  // Reset after each batch
}

Extensions and Challenges

Challenge 1: Growing Arena (Medium)

Implement an arena that automatically grows when exhausted:

Chain multiple blocks together
Keep allocating without interruption
Reset frees all blocks except the first
Measure overhead vs fixed arena

Challenge 2: Scratch/Temp Arena (Medium)

Implement nested arena scopes:

Arena* arena = arena_create(1024);
// ... some allocations ...

ArenaTemp temp = arena_begin_temp(arena);
// ... temporary allocations ...
arena_end_temp(temp);  // Everything since begin_temp is freed

// Original allocations still valid

Challenge 3: Pool Allocator (Hard)

Build a pool allocator on top of your arena:

Fixed-size blocks only
O(1) allocate AND free
No fragmentation
Free list stored in the free blocks themselves

Challenge 4: Thread-Local Arenas (Hard)

Make arenas thread-safe:

Option 1: One arena per thread
Option 2: Mutex-protected arena
Compare performance

Challenge 5: Debug Arena (Medium)

Add debugging features:

Track number of allocations
Track high-water mark
Poison freed memory (fill with 0xDE)
Detect use-after-reset

Real-World Applications

Game Engine Frame Allocator

// Used by many game engines (Unreal, Unity underlying systems, etc.)
void game_loop() {
    Arena* frame_arena = arena_create(16 * 1024 * 1024);  // 16MB

    while (game_running) {
        process_input(frame_arena);
        update_physics(frame_arena);
        render_frame(frame_arena);

        // All frame-specific allocations freed instantly
        arena_reset(frame_arena);
    }

    arena_destroy(frame_arena);
}

Web Server Request Handler

// Each request gets its own arena
void handle_request(Request* req) {
    Arena* request_arena = arena_create(64 * 1024);  // 64KB

    ParsedHeaders* headers = parse_headers(request_arena, req);
    JsonValue* body = parse_json(request_arena, req->body);
    Response* response = generate_response(request_arena, headers, body);

    send_response(response);

    // All request data freed at once
    arena_destroy(request_arena);
}

Compiler/Parser

// Parse phase allocates AST nodes into arena
AST* parse_file(const char* filename) {
    Arena* parse_arena = arena_create(1024 * 1024);  // 1MB

    Tokens* tokens = lex(parse_arena, filename);
    AST* ast = parse(parse_arena, tokens);

    // AST lives in arena - return it
    // Arena destroyed when AST no longer needed
    return ast;
}

Simulation/Scientific Computing

// Monte Carlo simulation with many temporary values
void run_simulation(int iterations) {
    Arena* iter_arena = arena_create(10 * 1024 * 1024);  // 10MB

    for (int i = 0; i < iterations; i++) {
        Matrix* temp1 = matrix_create(iter_arena, 1000, 1000);
        Matrix* temp2 = matrix_multiply(iter_arena, temp1, other);
        Vector* result = matrix_reduce(iter_arena, temp2);

        accumulate(result);

        // Hundreds of MB would be allocated without arena
        // With arena: just reset
        arena_reset(iter_arena);
    }

    arena_destroy(iter_arena);
}

Interview Preparation

Common Questions

“What is an arena allocator? When would you use one?”
- An arena allocator pre-allocates a large block and hands out pieces sequentially
- Use when allocations share a lifetime and can be freed together
- Examples: per-frame, per-request, per-parse operations
“What’s the time complexity of arena allocation vs malloc?”
- Arena: O(1) always - just bump a pointer
- malloc: O(1) best case, O(n) worst case (searching free lists)
- Arena reset: O(1) - just set offset to 0
- free: O(1) to O(log n) depending on implementation
“Why can’t you free individual allocations from an arena?”
- No per-allocation metadata to track what’s allocated
- Would need free list, defeating the purpose
- Design choice: trade individual frees for speed
“What is memory alignment and why does it matter?”
- CPUs require data at specific address boundaries (2, 4, 8, 16 bytes)
- Misaligned access: slower (multiple memory reads) or crash (bus error)
- Align formula: (addr + align - 1) & ~(align - 1)
“Compare arena allocators to pool allocators.”
- Arena: variable-size, no individual free, bulk reset
- Pool: fixed-size, individual free via free list, good for objects of same size
- Arena for temporary data, pool for persistent same-sized objects
“In what scenarios would an arena allocator be faster than malloc?”
- Many small allocations followed by bulk free
- Known lifetime patterns (frame, request, transaction)
- Performance-critical paths with predictable allocation
- When fragmentation would hurt malloc

Self-Assessment Checklist

Understanding (Can You Explain?)

Why syscalls are expensive and allocators batch them
What “bump allocation” means and why it’s O(1)
The tradeoff between flexibility and performance in allocators
Why alignment matters and how to calculate aligned addresses
When arena allocation is appropriate vs general-purpose

Implementation (Can You Build?)

Basic arena with create/alloc/reset/destroy
Aligned allocation supporting power-of-2 alignments
Type-safe macros for allocation
mmap-based memory acquisition

Application (Can You Apply?)

Identify code that would benefit from arena allocation
Benchmark arena vs malloc for specific patterns
Design arena usage for a game frame or request handler
Avoid the common pitfalls (use-after-reset, alignment)

Extension (Can You Extend?)

Implement growing arenas
Add temp/scratch arena pattern
Build pool allocator on top of arena
Add debug/statistics features

Resources

Books

Topic	Book	Chapter
Arena design patterns	“C Interfaces and Implementations” by David Hanson	Ch. 5-6
Memory allocator internals	“Computer Systems: A Programmer’s Perspective”	Ch. 9.9
mmap and virtual memory	“The Linux Programming Interface”	Ch. 49
Alignment requirements	“Computer Systems: A Programmer’s Perspective”	Ch. 3.9.3
Allocator strategies	“Operating Systems: Three Easy Pieces”	Ch. 17

Online Resources

Code References

stb single-file libraries - See memory handling patterns
mimalloc - Microsoft’s performance allocator
jemalloc - General-purpose allocator with arenas

Summary

The arena allocator teaches a fundamental truth about memory management: malloc is just one solution, and often not the best one. By building your own allocator, you understand:

Memory is a resource - You can manage it however you want
Tradeoffs exist - Speed vs flexibility, simplicity vs features
Patterns matter - Know your allocation pattern, choose the right allocator
The OS is expensive - Batch syscalls, minimize kernel transitions
Alignment is real - CPUs have requirements, respect them

When you can build a custom allocator for a specific use case and quantify its performance advantage, you’ve moved from “using” memory to “managing” memory. This is systems programming.

Next Project: P05: Exploit Lab (Buffer Overflow) - Now that you understand how memory really works, let’s see what happens when it goes wrong on purpose.