Project 2: Custom Memory Allocator (malloc/free from scratch)

Build a drop-in malloc/free replacement that manages heap pages, splits/coalesces blocks, and uses mmap for large allocations.

Quick Reference

Attribute Value
Difficulty Level 3: Advanced
Time Estimate 2-3 weeks
Main Programming Language C (Alternatives: Rust)
Alternative Programming Languages Rust
Coolness Level Level 4: Hardcore
Business Potential Level 3: Systems / performance tooling
Prerequisites C pointers, structs, bit ops, basic VM concepts
Key Topics Heap layout, free lists, coalescing, alignment, mmap/brk

1. Learning Objectives

By completing this project, you will:

  1. Explain how user-space heaps grow via brk/sbrk and mmap.
  2. Design allocator metadata and free list strategies.
  3. Implement splitting, coalescing, and alignment correctly.
  4. Integrate with real binaries using LD_PRELOAD.
  5. Benchmark and reason about fragmentation vs speed.
  6. Add deterministic debug modes for reproducible tests.

2. All Theory Needed (Per-Concept Breakdown)

2.1 Process Virtual Memory and the Heap

Fundamentals

Processes see a contiguous virtual address space. The heap typically grows upward using brk/sbrk, while large allocations can use mmap to map fresh regions. The kernel backs virtual pages with physical memory on demand, which is why small allocations are cheap until you touch pages.

Deep Dive into the concept

The kernel maintains per-process VMAs (virtual memory areas). The heap is usually a VMA bounded by the program break; sbrk moves the break and expands the heap VMA. mmap creates new VMAs and can bypass the main heap, which is useful for large allocations to reduce fragmentation and allow independent unmapping. Page alignment matters: allocators often align to 16 bytes (for SIMD) and manage internal headers. Understanding VMAs helps you explain why large allocations appear in /proc/<pid>/maps as separate regions.

How this fits on projects

You will choose when to use sbrk vs mmap in Section 3.2 and design the block layout in Section 4.3.

Definitions & key terms

  • heap -> region for dynamic allocations via malloc
  • program break -> end of the heap for brk/sbrk
  • VMA -> virtual memory area tracked by the kernel
  • mmap -> map pages at arbitrary addresses

Mental model diagram (ASCII)

low addr
+--------------------+  text/data
+--------------------+
|   heap (brk)       |  grows up
+--------------------+
|   mmap regions     |  large allocs
+--------------------+
|   stack            |  grows down
+--------------------+
high addr

How it works (step-by-step)

  1. Allocator calls sbrk to extend heap by N bytes.
  2. Kernel updates program break and VMA.
  3. Pages are committed on first touch (page faults).
  4. For large allocations, allocator calls mmap directly.

Minimal concrete example

void *p = sbrk(4096);     // grow heap by a page
void *q = mmap(NULL, 1<<20, PROT_READ|PROT_WRITE,
               MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);

Common misconceptions

  • Misconception: malloc always calls sbrk. Correction: modern allocators often use mmap for large blocks.
  • Misconception: pages are allocated immediately. Correction: pages are typically allocated on first access.

Check-your-understanding questions

  1. Why do large allocations often use mmap?
  2. What does /proc/<pid>/maps show for an mmap allocation?
  3. Why does touching a new heap page cause a page fault?

Check-your-understanding answers

  1. To reduce fragmentation and enable independent unmapping.
  2. A separate VMA labeled [anon] or the mapped file.
  3. The page isn’t backed until first access (demand paging).

Real-world applications

  • High-performance allocators (jemalloc, tcmalloc)
  • Leak detection and heap profiling

Where you’ll apply it

  • This project: Section 3.2 Functional Requirements, Section 4.3 Data Structures.
  • Also used in: P04-page-fault-analyzer for understanding demand paging.

References

  • “Computer Systems: A Programmer’s Perspective” Ch. 9
  • brk(2), mmap(2) man pages

Key insights

Allocator policy is inseparable from how the OS maps memory.

Summary

Understanding VMAs and the program break tells you what the allocator can and cannot control.

Homework/Exercises to practice the concept

  1. Allocate 1 MB with malloc and inspect /proc/<pid>/maps.
  2. Compare sbrk(0) before and after malloc(1).

Solutions to the homework/exercises

  1. You should see a new VMA if malloc uses mmap for large blocks.
  2. The program break may advance after the first allocation.

2.2 Block Metadata, Free Lists, and Coalescing

Fundamentals

Allocators track blocks with headers that store size and status. Free blocks are kept in a free list. When a request arrives, you pick a free block, possibly split it, and later coalesce adjacent free blocks to reduce fragmentation.

Deep Dive into the concept

Metadata usually includes size and flags packed into a word. The allocator keeps an explicit free list (linked list of free blocks), or segregated lists by size class. Coalescing requires checking neighbors; boundary tags let you find previous block size. Splitting a block must leave a minimum block size to store metadata. A common approach is: first-fit or best-fit over a free list, split if large, return pointer to payload, then on free, coalesce adjacent free blocks and insert into free list. Correctness hinges on consistent header/footer updates.

How this fits on projects

This is the core of your allocator in Section 4.4 and Section 5.10 Phase 2.

Definitions & key terms

  • block header -> metadata for a block (size, flags)
  • free list -> collection of free blocks
  • coalescing -> merging adjacent free blocks
  • split -> divide a large free block into allocated + free

Mental model diagram (ASCII)

[Hdr|64|USED][Hdr|128|FREE][Hdr|256|USED][Hdr|128|FREE]
          ^ split here            ^ coalesce here

How it works (step-by-step)

  1. Search free list for a block of adequate size.
  2. If block is much larger, split and return part.
  3. On free, mark block free and check neighbors.
  4. Merge adjacent free blocks and update list.

Minimal concrete example

struct block { size_t size; struct block *next; };

Common misconceptions

  • Misconception: first-fit always minimizes fragmentation. Correction: first-fit is fast but can fragment badly.
  • Misconception: coalescing is optional. Correction: without coalescing, fragmentation grows quickly.

Check-your-understanding questions

  1. Why do you need a minimum block size?
  2. What is the risk of forgetting to update a footer?
  3. How does segregated free lists improve performance?

Check-your-understanding answers

  1. To ensure free blocks can store metadata.
  2. Coalescing will read incorrect neighbor size and corrupt memory.
  3. It reduces search time by limiting candidates to size class.

Real-world applications

  • General-purpose allocators and memory pools
  • Embedded systems with tight memory constraints

Where you’ll apply it

References

  • “C Interfaces and Implementations” (Hanson) Ch. 5-6
  • “The Art of Computer Systems Performance Analysis” sections on fragmentation

Key insights

Allocator correctness is a data-structure invariants problem.

Summary

Block metadata and free list integrity decide whether your allocator is stable or crashes.

Homework/Exercises to practice the concept

  1. Draw a heap layout and simulate a sequence of alloc/free operations.
  2. Implement a minimal free list and test split/coalesce by hand.

Solutions to the homework/exercises

  1. Track block sizes and verify that coalescing reduces total blocks.
  2. Use a fixed array buffer and manage headers manually.

2.3 Alignment, Fragmentation, and Large Allocations

Fundamentals

Allocators align payloads to word boundaries (often 16 bytes) to satisfy ABI and SIMD requirements. Fragmentation comes in two forms: internal (wasted space inside blocks) and external (free space split across blocks). Large allocations should bypass the main heap using mmap.

Deep Dive into the concept

Alignment is usually achieved by rounding requested sizes up to the nearest multiple of an alignment constant. Internal fragmentation is a trade-off for simpler alignment. External fragmentation is mitigated by coalescing and by using segregated size classes. A practical rule: if request size > threshold (e.g., 128 KB), call mmap and free with munmap. This avoids polluting the heap and allows returning memory to the OS.

How this fits on projects

You will implement alignment and a large-allocation policy in Section 3.2 and Section 5.10 Phase 3.

Definitions & key terms

  • alignment -> ensuring payload addresses are multiples of N
  • internal fragmentation -> unused bytes inside allocated blocks
  • external fragmentation -> unused bytes between allocated blocks
  • threshold -> size above which mmap is used

Mental model diagram (ASCII)

request 30 bytes -> align to 32 -> 2 bytes internal fragment

How it works (step-by-step)

  1. Round request to alignment boundary.
  2. Add header size to compute total block size.
  3. For large size, allocate via mmap and mark as mmap block.
  4. On free, munmap large blocks directly.

Minimal concrete example

size_t align = 16;
size_t size = (req + (align-1)) & ~(align-1);

Common misconceptions

  • Misconception: alignment is optional. Correction: misaligned pointers can crash on some architectures.
  • Misconception: fragmentation is only an implementation detail. Correction: fragmentation directly impacts memory usage and performance.

Check-your-understanding questions

  1. Why is 16-byte alignment common on x86-64?
  2. What is the difference between internal and external fragmentation?
  3. Why use mmap for large allocations?

Check-your-understanding answers

  1. SIMD instructions and ABI require 16-byte stack alignment.
  2. Internal is wasted inside blocks; external is wasted between blocks.
  3. To reduce heap fragmentation and allow unmapping to the OS.

Real-world applications

  • Performance-critical systems (DBs, HFT)
  • Memory-constrained embedded software

Where you’ll apply it

References

  • “CS:APP” Ch. 3 (alignment)
  • “C Interfaces and Implementations” Ch. 5

Key insights

Alignment rules keep code correct; fragmentation rules keep it fast.

Summary

A good allocator balances alignment, fragmentation, and fast allocation paths.

Homework/Exercises to practice the concept

  1. Compute total block size for requests of 1, 15, 16, 17 bytes.
  2. Choose a mmap threshold and justify it.

Solutions to the homework/exercises

  1. With 16-byte alignment, requests round to 16, 16, 16, 32.
  2. Start with 128 KB; adjust after benchmarking.

3. Project Specification

3.1 What You Will Build

A shared library libmymalloc.so that provides malloc, free, calloc, and realloc, plus a debug mode that prints heap layout and fragmentation stats. It can be injected into existing programs via LD_PRELOAD.

3.2 Functional Requirements

  1. Implement malloc, free, calloc, realloc.
  2. Maintain free list with block metadata and coalescing.
  3. Support alignment to 16 bytes.
  4. Use mmap for large allocations (configurable threshold).
  5. Provide a debug function heap_dump().
  6. Support LD_PRELOAD usage without crashing common tools (ls, cat).
  7. Deterministic debug output with MALLOC_DEBUG_SEED=42.

3.3 Non-Functional Requirements

  • Performance: allocate/free 1M small blocks in < 2s on a modern CPU.
  • Reliability: no double-free crashes; detect and report.
  • Usability: clear stats for fragmentation and block counts.

3.4 Example Usage / Output

$ LD_PRELOAD=./libmymalloc.so ./bench_alloc
[alloc] 64 -> 0x55b2c00010
[free ] 0x55b2c00010
[stats] in_use=1 free=2 fragmentation=12%

3.5 Data Formats / Schemas / Protocols

Debug output format

[BLOCK addr size status] ...

3.6 Edge Cases

  • free(NULL) should be a no-op.
  • realloc(ptr, 0) should free.
  • Double-free detection in debug mode.
  • calloc overflow check (nmemb * size).

3.7 Real World Outcome

3.7.1 How to Run (Copy/Paste)

make
LD_PRELOAD=./libmymalloc.so ./bench_alloc
MALLOC_DEBUG=1 MALLOC_DEBUG_SEED=42 ./bench_alloc

3.7.2 Golden Path Demo (Deterministic)

  • With MALLOC_DEBUG=1 and MALLOC_DEBUG_SEED=42, heap dump order is fixed.

3.7.3 CLI Transcript (Success + Failure)

$ MALLOC_DEBUG=1 MALLOC_DEBUG_SEED=42 LD_PRELOAD=./libmymalloc.so ./bench_alloc
[BLOCK 0x1000 64 USED] [BLOCK 0x1040 128 FREE]
Summary: allocs=3 frees=2

$ LD_PRELOAD=./libmymalloc.so ./bench_alloc --double-free
error: double free detected at 0x1000
exit code: 3

3.7.4 Exit Codes

  • 0 success
  • 2 invalid argument (allocator API misuse)
  • 3 detected heap corruption (debug mode)

4. Solution Architecture

4.1 High-Level Design

malloc/free API
   |
   v
Free-list manager <-> OS memory (sbrk/mmap)
   |
   v
Debug + Stats reporter

4.2 Key Components

| Component | Responsibility | Key Decisions | |———–|—————-|—————| | Block header | size + flags | size in header, optional footer | | Free list | manage free blocks | first-fit to start | | OS interface | sbrk/mmap | threshold for large allocations | | Debug | heap dump + checks | enabled by env vars |

4.3 Data Structures (No Full Code)

struct block {
    size_t size;
    int free;
    struct block *next;
    struct block *prev;
};

4.4 Algorithm Overview

Key Algorithm: Allocate

  1. Align size and search free list.
  2. Split block if large.
  3. If none found, request more memory from OS.
  4. Return pointer to payload.

Complexity Analysis:

  • Time: O(n) for linear free list search
  • Space: O(n) for number of blocks

5. Implementation Guide

5.1 Development Environment Setup

sudo apt install build-essential

5.2 Project Structure

allocator/
|-- src/
|   |-- malloc.c
|   |-- free.c
|   |-- realloc.c
|   `-- heap_debug.c
|-- include/
|   `-- allocator.h
`-- Makefile

5.3 The Core Question You’re Answering

“How do I manage memory safely and efficiently without relying on libc?”

5.4 Concepts You Must Understand First

  1. VMAs and the heap.
  2. Block metadata and free list invariants.
  3. Alignment and fragmentation trade-offs.

5.5 Questions to Guide Your Design

  1. How will you find a free block quickly?
  2. How will you coalesce neighbors safely?
  3. What is your large-allocation threshold?

5.6 Thinking Exercise

Simulate: allocate 64, allocate 128, free 64, allocate 32. What happens with first-fit vs best-fit?

5.7 The Interview Questions They’ll Ask

  1. What is internal vs external fragmentation?
  2. Why do allocators use mmap for large blocks?
  3. How does realloc work internally?

5.8 Hints in Layers

  • Hint 1: start with a single global free list.
  • Hint 2: add splitting and coalescing.
  • Hint 3: add mmap for large allocations.

5.9 Books That Will Help

| Topic | Book | Chapter | |——|——|———| | Allocators | C Interfaces and Implementations | Ch. 5-6 | | VM | CS:APP | Ch. 9 | | Memory syscalls | TLPI | Memory chapters |

5.10 Implementation Phases

Phase 1: Minimal allocator (3-4 days)

  • Use a single free list and sbrk.
  • Checkpoint: malloc/free works for small tests.

Phase 2: Coalescing + split (1 week)

  • Implement split and coalesce logic.
  • Checkpoint: fragmentation stats improve.

Phase 3: mmap + debug (1 week)

  • Add mmap path and debug checks.
  • Checkpoint: large allocs show as separate VMAs.

5.11 Key Implementation Decisions

| Decision | Options | Recommendation | Rationale | |———-|———|—————-|———–| | Free list search | first-fit vs best-fit | first-fit | simpler, acceptable for v1 | | Large allocs | mmap threshold | 128 KB | reduces heap fragmentation |


6. Testing Strategy

6.1 Test Categories

| Category | Purpose | Examples | |———|———|———-| | Unit | header math | size/alignment tests | | Integration | real programs | ls, grep with LD_PRELOAD | | Edge | overflow | calloc overflow test |

6.2 Critical Test Cases

  1. Allocate/free small blocks repeatedly (no leaks).
  2. Double-free should be detected in debug mode.
  3. Large allocation should use mmap and munmap.

6.3 Test Data

requests: 8, 16, 24, 1024, 256000
expected: aligned sizes, mmap for 256000

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

| Pitfall | Symptom | Solution | |———|———|———-| | Incorrect alignment | crashes with SIMD | round up to 16 bytes | | Metadata overwrite | random crashes | guard headers in debug mode | | No coalescing | memory blow-up | merge neighbors on free |

7.2 Debugging Strategies

  • Heap dumps: print block list after each alloc/free.
  • Canaries: add sentinel bytes to detect overwrites.

7.3 Performance Traps

  • Linear scans for every alloc can be slow; consider segregated lists later.

8. Extensions & Challenges

8.1 Beginner Extensions

  • Add malloc_stats() output.
  • Add colorized heap dump.

8.2 Intermediate Extensions

  • Segregated free lists by size class.
  • Thread-safe allocator with per-thread caches.

8.3 Advanced Extensions

  • Implement a slab allocator for fixed-size objects.
  • Add profiling hooks for hot allocation sites.

9. Real-World Connections

9.1 Industry Applications

  • Memory allocators in databases and game engines
  • Performance profiling and leak detection
  • jemalloc, tcmalloc, mimalloc

9.3 Interview Relevance

  • Explaining fragmentation and allocator internals
  • Debugging memory corruption scenarios

10. Resources

10.1 Essential Reading

  • Hanson, “C Interfaces and Implementations,” Ch. 5-6
  • CS:APP Ch. 9 (VM)

10.2 Tools & Documentation

  • brk(2), mmap(2), munmap(2) man pages

11. Self-Assessment Checklist

11.1 Understanding

  • I can describe heap vs mmap regions.
  • I can explain fragmentation and coalescing.

11.2 Implementation

  • malloc/free/calloc/realloc all work.
  • Debug mode detects corruption.

11.3 Growth

  • I can benchmark and explain allocator trade-offs.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Basic allocator works with small test program.
  • Free list and split implemented.

Full Completion:

  • Coalescing and mmap path implemented.
  • Works with LD_PRELOAD on common tools.

Excellence (Going Above & Beyond):

  • Thread-safe + segregated lists.
  • Detailed benchmark report.

13. Determinism Notes

  • Use MALLOC_DEBUG_SEED=42 for stable heap dumps.