Project 4: Arena Allocator

Build a fast bump allocator backed by mmap with alignment support, reset, and usage stats.

Quick Reference

Attribute Value
Difficulty Intermediate
Time Estimate 10-16 hours
Main Programming Language C (Alternatives: Rust, Zig)
Alternative Programming Languages Rust, Zig, C++
Coolness Level High
Business Potential Medium (game engines, embedded)
Prerequisites Pointers, malloc, basic OS concepts
Key Topics Alignment, mmap, bump allocation, fragmentation

1. Learning Objectives

By completing this project, you will:

  1. Implement a bump (arena) allocator with alignment guarantees.
  2. Understand how mmap provides large contiguous virtual regions.
  3. Track usage metrics like high-water marks.
  4. Design reset semantics that make allocation O(1).
  5. Compare arena allocation to malloc and explain trade-offs.

2. All Theory Needed (Per-Concept Breakdown)

2.1 Alignment and Pointer Arithmetic

Fundamentals

Alignment is the rule that certain types must be stored at addresses that are multiples of their size (or a platform-specific alignment). Misaligned access can be slow or even crash on some architectures. In a custom allocator, you must round allocation pointers up to the required alignment before returning them. This is done with simple arithmetic on addresses interpreted as integers. Understanding alignment is essential to build a correct arena allocator because even if the memory region is large enough, returning misaligned pointers makes it unsafe to store types like double or struct.

Deep Dive into the Concept

Every type has an alignment requirement, which is usually the size of the type but can be smaller or larger depending on ABI rules. For example, on x86-64, double is 8-byte aligned, and many structs have alignment equal to the largest member. When you return a pointer from an allocator, you must ensure it meets the alignment requirement for any type that may be stored. Standard malloc guarantees alignment suitable for any type. Your arena allocator must do the same or provide an API where the caller specifies alignment.

Alignment is achieved by rounding the current bump pointer up to the next multiple of the alignment. This can be done with the formula (p + align - 1) & ~(align - 1) when align is a power of two. This is why most allocators require power-of-two alignments. If the alignment is not a power of two, the math is more complex and error-prone. You can choose to enforce power-of-two alignment as a precondition.

Pointer arithmetic in C is scaled by type size, so you should cast to uintptr_t or char* when doing alignment calculations. uintptr_t is an integer type that can hold a pointer value. You convert the pointer to uintptr_t, do the rounding, and then cast back to void*. This ensures correct arithmetic in bytes rather than in elements of a type.

Alignment introduces padding. Suppose your bump pointer is at offset 5, and you need 8-byte alignment. You must move to offset 8, wasting 3 bytes. This padding is internal fragmentation. In an arena, fragmentation is acceptable because the allocator is designed for speed and bulk reset. But you still need to measure it. Tracking “wasted bytes” can be a valuable metric to show the cost of alignment.

Alignment also affects performance. Properly aligned data enables the CPU to fetch it efficiently, especially for SIMD instructions. Misaligned access can cause multiple memory reads or exceptions. This is why alignment is not just a correctness rule but also a performance rule. Your project should teach that correct alignment is a necessary trade-off in allocator design.

Finally, alignment is connected to ABI rules and stack discipline. Many calling conventions require the stack to be aligned to 16 bytes at call boundaries. If your arena allocator is used to allocate stack-like objects or SIMD buffers, alignment becomes even more critical. A good arena allocator should support a function like arena_alloc_aligned(arena, size, align) so the caller can request specific alignment. Your tests should verify that returned pointers satisfy the alignment constraint.

How this fits on projects

Alignment is used in Section 3.2 requirements and Section 4.4 algorithm overview. It also defines key tests in Section 6.2.

Definitions & key terms

  • Alignment: Address multiple required by a type.
  • Padding: Unused bytes added to satisfy alignment.
  • uintptr_t: Integer type for pointer arithmetic.
  • Internal fragmentation: Wasted space due to alignment.

Mental model diagram (ASCII)

current=5, align=8
5 -> 8 (padding 3 bytes)
[xxxxx][ppp][aligned block]

How it works (step-by-step, with invariants and failure modes)

  1. Convert pointer to uintptr_t.
  2. Round up to alignment boundary.
  3. Return aligned pointer.

Invariant: Returned pointer % align == 0.

Failure modes: Non-power-of-two alignment, integer overflow.

Minimal concrete example

uintptr_t p = (uintptr_t)arena->base + arena->offset;
uintptr_t aligned = (p + align - 1) & ~(align - 1);

Common misconceptions

  • “Alignment only matters for performance.” -> It is also correctness.
  • “Any alignment value works.” -> Power-of-two is required for bit mask.
  • “Padding is wasted.” -> It is necessary for safety.

Check-your-understanding questions

  1. Why must align be a power of two for bit masking?
  2. What is internal fragmentation?
  3. How do you verify a pointer is aligned?

Check-your-understanding answers

  1. The mask relies on binary boundaries; non-power-of-two breaks it.
  2. Wasted space due to alignment rounding.
  3. (uintptr_t)ptr % align == 0.

Real-world applications

  • Game engines using arenas for fast allocation.
  • SIMD buffers requiring 16-byte alignment.
  • Embedded systems with strict alignment rules.

Where you’ll apply it

References

  • CSAPP (data alignment)
  • “C Interfaces and Implementations” (allocators)

Key insights

Alignment is the rule that keeps raw memory safe for any type.

Summary

You understand how to compute aligned addresses and why alignment is mandatory.

Homework/Exercises to practice the concept

  1. Compute alignment padding for offsets 0..15 with align=8.
  2. Write a function that returns the next aligned address.

Solutions to the homework/exercises

  1. Offsets 0..7 need 0..1..7 bytes; 8..15 repeat.
  2. Use (p + a - 1) & ~(a - 1).

2.2 mmap, Pages, and Virtual Memory Regions

Fundamentals

mmap asks the OS to map a region of virtual memory. Unlike malloc, which allocates from the heap, mmap can request large contiguous regions directly from the kernel. The size is rounded to page boundaries, and the OS returns a page-aligned pointer. This is ideal for arena allocators because you can reserve a large region and then manage allocations yourself. Understanding how pages work and how mmap behaves is essential for building a correct and portable arena.

Deep Dive into the Concept

On Unix systems, mmap creates a mapping of a file or anonymous memory into the process’s address space. For an arena allocator, you typically use anonymous mappings (MAP_ANONYMOUS) with read/write permissions (PROT_READ | PROT_WRITE). The kernel reserves a region of virtual address space and returns a pointer aligned to the page size. The page size is typically 4096 bytes, but you can query it with sysconf(_SC_PAGESIZE).

mmap is different from malloc in two ways. First, it bypasses the allocator and talks directly to the kernel, which makes it slower per call but provides large regions with minimal fragmentation. Second, the memory is lazily allocated. The OS reserves the virtual range, but physical pages are mapped only when you touch them. This is useful because you can reserve large arenas without immediately consuming RAM.

When you free an arena, you should call munmap with the same size. This returns the region to the OS. If you only want to reset allocations, you can keep the mapping and reset the offset to zero. This makes the arena extremely fast: allocations are O(1) and deallocations are a single reset. This is the primary appeal of arenas.

mmap also interacts with the OS’s overcommit policy. On Linux, the kernel may allow you to allocate more virtual memory than physical RAM; if you touch too much memory, you can get SIGKILL due to the OOM killer. Your allocator should handle mmap failure gracefully (returning NULL and setting an error code). You should also avoid trusting that a large allocation will always succeed.

Pages are the unit of protection. You can use mprotect to change permissions (e.g., making a region read-only). For this project, you can optionally add a guard page after your arena to catch overflows. This is an advanced extension: you allocate one extra page, mark it PROT_NONE, and then any out-of-bounds write will trigger a crash. This demonstrates how the OS can enforce safety even for custom allocators.

Finally, the mapping address you get from mmap can vary due to ASLR. This is why addresses printed by your arena demo change between runs. To keep output deterministic, you should print offsets within the arena rather than absolute addresses, or allow a --offsets flag similar to Project 1. This is a practical application of the determinism requirement.

How this fits on projects

mmap is used in Section 3.1 for allocation, and page concepts are used in Section 7.3 for guard pages.

Definitions & key terms

  • mmap: System call that maps memory regions.
  • Page: Fixed-size memory unit (often 4 KiB).
  • munmap: Unmaps a region.
  • Overcommit: OS policy for virtual memory reservations.

Mental model diagram (ASCII)

[mmap] -> [virtual pages] -> touched pages -> physical RAM

How it works (step-by-step, with invariants and failure modes)

  1. Request size rounded to page boundaries.
  2. Kernel reserves virtual range.
  3. Pages are allocated on first touch.
  4. munmap returns region to OS.

Invariant: munmap size must match original mapping.

Failure modes: mmap returns MAP_FAILED, OOM kill on touch.

Minimal concrete example

void *mem = mmap(NULL, size, PROT_READ|PROT_WRITE,
                 MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
if (mem == MAP_FAILED) { /* handle error */ }

Common misconceptions

  • mmap always allocates physical RAM.” -> It is lazy.
  • munmap can free part of a mapping safely.” -> It can, but must be aligned.
  • mmap is always faster than malloc.” -> It is slower per call.

Check-your-understanding questions

  1. Why is mmap useful for arenas?
  2. What is the page size, and why does it matter?
  3. How do you detect mmap failure?

Check-your-understanding answers

  1. It provides large, contiguous regions with minimal fragmentation.
  2. It determines alignment and granularity of mappings.
  3. mmap returns MAP_FAILED.

Real-world applications

  • Memory pools in databases.
  • Game engines and real-time systems.
  • Custom allocators in high-performance servers.

Where you’ll apply it

References

  • man 2 mmap
  • “The Linux Programming Interface” (memory mapping)

Key insights

mmap gives you raw memory; you provide the policy.

Summary

You understand how to request, use, and release memory mappings for arenas.

Homework/Exercises to practice the concept

  1. Allocate 1 MB with mmap and write to the first and last byte.
  2. Add a guard page and intentionally overflow.

Solutions to the homework/exercises

  1. Touching the bytes forces page allocation.
  2. mprotect the guard page to PROT_NONE.

2.3 Bump Allocation and Reset Semantics

Fundamentals

A bump allocator keeps a pointer (or offset) that advances as you allocate. Each allocation returns the current pointer and increments it by the requested size plus alignment padding. There is no individual free; instead, you reset the entire arena at once. This makes allocation extremely fast and predictable. The trade-off is that you cannot free individual allocations, so arenas are best when allocation lifetimes are similar (e.g., per-frame allocations in a game).

Deep Dive into the Concept

The core of a bump allocator is a simple invariant: base + offset points to the next free byte. Allocations are O(1): align the pointer, check if it fits, and move the offset. This simplicity yields speed and predictability, but it comes with limitations. Because there is no per-allocation free, memory usage grows monotonically until you reset or destroy the arena. This makes arenas ideal for “phase-based” workloads where a batch of allocations is used together and then discarded at once.

Reset semantics are central. A reset sets the offset back to zero, effectively freeing all allocations at once. This is much faster than freeing each allocation. However, it requires that the caller knows when all allocations are no longer needed. If the caller resets too early, pointers become invalid and use-after-free bugs occur. In other words, arenas move the burden of lifetime management from the allocator to the user. Your project should make this explicit: a reset invalidates all previously returned pointers.

The bump allocator also highlights fragmentation trade-offs. There is no external fragmentation because allocations are contiguous. But internal fragmentation exists due to alignment padding. The allocator can track “used bytes” and “wasted bytes” separately. This is a valuable teaching tool: it shows that alignment has a cost, and it quantifies that cost for different workloads.

To support multiple lifetimes, you can add “checkpoints.” A checkpoint stores the current offset; later you can roll back to that offset, freeing all allocations after it. This is a simple extension that adds flexibility. It also demonstrates how arena allocators can be used for nested lifetimes (e.g., parsing substructures).

Thread safety is another consideration. A simple arena allocator is not thread-safe because multiple threads can update the offset concurrently. You can add a mutex or atomic operations to support concurrent allocations, but this reduces performance. For this project, you can document that the allocator is single-threaded and leave concurrency as an extension.

Finally, memory usage metrics matter. Your allocator should track current used bytes, high-water mark, and total capacity. These metrics help users understand allocation patterns and debug memory usage. They also provide deterministic output for tests because you can show offsets rather than absolute addresses.

How this fits on projects

This concept defines Section 3.2 requirements and Section 5.10 phases. It also shapes the demo output in Section 3.7.

Definitions & key terms

  • Bump allocator: Allocator that advances a pointer for each allocation.
  • Reset: Operation that frees all allocations at once.
  • High-water mark: Maximum used bytes observed.
  • Checkpoint: Saved offset for rollback.

Mental model diagram (ASCII)

base -> [allocated][allocated][free...]
          ^ offset moves right
reset -> offset = 0

How it works (step-by-step, with invariants and failure modes)

  1. Align the current pointer.
  2. Check if size fits in remaining capacity.
  3. Return pointer and advance offset.
  4. Reset sets offset to zero.

Invariant: offset <= capacity.

Failure modes: OOM without detection, using pointers after reset.

Minimal concrete example

void *arena_alloc(arena *a, size_t n, size_t align) {
    size_t p = align_up(a->offset, align);
    if (p + n > a->capacity) return NULL;
    a->offset = p + n;
    return a->base + p;
}

Common misconceptions

  • “Arena is faster because it is magic.” -> It is fast because it is simple.
  • “Reset frees memory in OS.” -> It only resets offsets.
  • “You can free individual allocations.” -> You cannot, by design.

Check-your-understanding questions

  1. Why is a bump allocator O(1)?
  2. What happens to old pointers after reset?
  3. Why is alignment padding unavoidable?

Check-your-understanding answers

  1. It only updates a pointer and checks bounds.
  2. They become invalid; use-after-free if used.
  3. Types require aligned addresses.

Real-world applications

  • Game engine frame allocators.
  • Parser and compiler temporary memory pools.
  • Real-time systems with deterministic allocation.

Where you’ll apply it

References

  • “C Interfaces and Implementations” (memory arenas)
  • “Memory Systems: Cache, DRAM, Disk” (memory hierarchy)

Key insights

Bump allocation trades flexibility for speed and determinism.

Summary

You understand how arena allocators work and why reset semantics are powerful.

Homework/Exercises to practice the concept

  1. Simulate a sequence of allocations and compute offsets by hand.
  2. Implement a checkpoint/rollback feature.

Solutions to the homework/exercises

  1. Align each allocation and sum sizes to compute offsets.
  2. Save offset and restore it to roll back.

3. Project Specification

3.1 What You Will Build

An arena allocator library that:

  • Allocates a large region with mmap.
  • Serves aligned bump allocations.
  • Provides reset and destroy operations.
  • Tracks usage metrics and prints a demo report.

3.2 Functional Requirements

  1. arena_init(size) allocates a region.
  2. arena_alloc(size, align) returns aligned pointers.
  3. arena_reset() resets offset to zero.
  4. arena_destroy() calls munmap.
  5. Track used bytes, high-water mark, and wasted padding.

3.3 Non-Functional Requirements

  • Performance: Allocation O(1).
  • Reliability: Alignment guarantees met.
  • Usability: Clear usage stats.

3.4 Example Usage / Output

$ ./arena_demo
Arena size: 1048576 bytes
alloc(24, 8) -> offset 0
alloc(64, 16) -> offset 32
alloc(128, 8) -> offset 96

Used: 216 bytes
High-water: 216 bytes
Padding waste: 8 bytes

3.5 Data Formats / Schemas / Protocols

typedef struct {
    uint8_t *base;
    size_t capacity;
    size_t offset;
    size_t high_water;
    size_t padding_waste;
} arena;

3.6 Edge Cases

  • Allocation exactly equal to remaining space.
  • Alignment larger than page size.
  • mmap failure (return NULL).
  • Reset after partial allocations.

3.7 Real World Outcome

The demo should show deterministic offsets and usage metrics.

3.7.1 How to Run (Copy/Paste)

make
./arena_demo

3.7.2 Golden Path Demo (Deterministic)

$ ./arena_demo
Arena size: 1048576 bytes
alloc(24, 8) -> offset 0
alloc(64, 16) -> offset 32
alloc(128, 8) -> offset 96

Used: 216 bytes
High-water: 216 bytes
Padding waste: 8 bytes
Exit code: 0

3.7.3 If CLI: Exact Terminal Transcript

$ ./arena_demo --overflow
Arena size: 128 bytes
alloc(200, 8) -> NULL (out of memory)
Exit code: 1

4. Solution Architecture

4.1 High-Level Design

[mmap region] -> [bump allocator] -> [reset]

4.2 Key Components

Component Responsibility Key Decisions
Arena state base, offset, capacity store metrics
Allocator align and bump power-of-two alignment
Demo deterministic output print offsets

4.3 Data Structures (No Full Code)

typedef struct {
    uint8_t *base;
    size_t capacity;
    size_t offset;
} arena;

4.4 Algorithm Overview

Key Algorithm: Aligned Allocation

  1. Align offset to alignment.
  2. Check if offset + size <= capacity.
  3. Return base + offset and update offset.

Complexity Analysis:

  • Time: O(1).
  • Space: O(capacity).

5. Implementation Guide

5.1 Development Environment Setup

sudo apt-get install build-essential

5.2 Project Structure

arena/
|-- include/
|   \-- arena.h
|-- src/
|   \-- arena.c
|-- demo/
|   \-- arena_demo.c
|-- tests/
|   \-- test_arena.c
\-- Makefile

5.3 The Core Question You’re Answering

“How can I trade flexibility for speed by controlling memory allocation patterns?”

5.4 Concepts You Must Understand First

  1. Alignment and pointer arithmetic (see Section 2.1).
  2. mmap and pages (see Section 2.2).
  3. Bump allocation semantics (see Section 2.3).

5.5 Questions to Guide Your Design

  1. How will you handle alignment and padding metrics?
  2. Will you support checkpoints or just reset?
  3. How will you present deterministic output?

5.6 Thinking Exercise

If your arena is 256 bytes and you allocate 3 blocks of 24 bytes each with 16-byte alignment, how much padding is wasted?

5.7 The Interview Questions They’ll Ask

  1. What is an arena allocator and why is it fast?
  2. When is an arena allocator a bad idea?
  3. How does alignment affect allocator design?

5.8 Hints in Layers

Hint 1: Implement arena_init and arena_destroy first. Hint 2: Implement arena_alloc with alignment. Hint 3: Track high_water and padding_waste. Hint 4: Add arena_reset and test it.

5.9 Books That Will Help

| Topic | Book | Chapter | |——-|——|———| | Allocators | C Interfaces and Implementations | Ch. 5-6 | | Alignment | CSAPP | Ch. 3 | | Memory mapping | TLPI | Ch. 49 |

5.10 Implementation Phases

Phase 1: Mapping + Basic Alloc (3-4 hours)

Goals: mmap region and bump allocate. Checkpoint: allocations return non-NULL pointers.

Phase 2: Alignment + Metrics (3-4 hours)

Goals: align allocations and track padding. Checkpoint: alignment tests pass.

Phase 3: Reset + Demo (2-3 hours)

Goals: reset and deterministic demo output. Checkpoint: demo output matches golden path.

5.11 Key Implementation Decisions

| Decision | Options | Recommendation | Rationale | |———-|———|—————-|———–| | Alignment API | implicit vs explicit | explicit align | clarity | | Memory source | mmap vs malloc | mmap | contiguous region | | Reset semantics | full reset vs checkpoints | full reset | simpler |


6. Testing Strategy

6.1 Test Categories

| Category | Purpose | Examples | |———-|———|———-| | Unit Tests | alignment correctness | pointer % align == 0 | | Integration Tests | demo output | offsets match | | Edge Case Tests | OOM handling | oversize alloc returns NULL |

6.2 Critical Test Cases

  1. Allocation at boundary fits exactly.
  2. Misaligned sizes still return aligned pointers.
  3. Reset restores offset to zero.

6.3 Test Data

Fixed allocation sequence with known offsets.

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

| Pitfall | Symptom | Solution | |———|———|———-| | Wrong alignment math | misaligned pointer | use power-of-two mask | | No OOM check | crash on NULL | check before use | | Reset misuse | use-after-free | document reset semantics |

7.2 Debugging Strategies

  • Print offsets and alignment for each allocation.
  • Add assertions for offset <= capacity.

7.3 Performance Traps

Allocating too small arenas can cause frequent failures; choose reasonable sizes.


8. Extensions & Challenges

8.1 Beginner Extensions

  • Add arena_strdup.
  • Add statistics report function.

8.2 Intermediate Extensions

  • Add checkpoints and rollback.
  • Add guard page with mprotect.

8.3 Advanced Extensions

  • Thread-safe arena with atomic bump.
  • Multiple arenas for different allocation classes.

9. Real-World Connections

9.1 Industry Applications

  • Game engines (per-frame memory).
  • Compilers and parsers (temporary allocations).
  • jemalloc arenas (conceptual inspiration).
  • Lua allocator (custom allocator support).

9.3 Interview Relevance

  • Explain trade-offs between malloc and arena allocators.

10. Resources

10.1 Essential Reading

  • “C Interfaces and Implementations” (arena allocators)
  • “The Linux Programming Interface” (mmap)

10.2 Video Resources

  • Talks on memory allocators and performance.

10.3 Tools & Documentation

  • man 2 mmap, man 2 munmap

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain why alignment matters.
  • I can explain how mmap differs from malloc.
  • I can describe the reset semantics of an arena.

11.2 Implementation

  • Allocations are aligned correctly.
  • OOM conditions are handled gracefully.
  • Demo output is deterministic.

11.3 Growth

  • I can explain where arenas are useful in real systems.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Arena allocator with aligned allocations and reset.
  • Demo program showing usage metrics.

Full Completion:

  • High-water and padding stats.
  • Unit tests for alignment and OOM.

Excellence (Going Above & Beyond):

  • Guard pages and checkpoints.
  • Thread-safe arena variant.