Project 14: Build Your Own Malloc
Implement a dynamic memory allocator from scratch, mastering heap organization, block management, and the trade-offs between throughput and memory utilization.
Quick Reference
| Attribute | Value |
|---|---|
| Language | C (alt: Rust, Zig, C++) |
| Difficulty | Expert |
| Time | 1 month+ |
| Chapters | 6, 9 |
| Coolness | ★★★★★ Pure Magic |
| Portfolio Value | Resume Gold |
Learning Objectives
By completing this project, you will:
- Master heap organization: Understand how the heap is structured and managed at the byte level
- Implement memory allocation algorithms: Build malloc, free, and optionally realloc from scratch
- Design efficient data structures: Create block headers, footers, and free list organizations
- Understand fragmentation trade-offs: Balance internal vs external fragmentation through policy choices
- Apply boundary tag coalescing: Implement efficient free block merging in O(1) time
- Build a heap checker: Write invariant-checking code that catches corruption early
- Measure and optimize performance: Understand throughput vs utilization trade-offs
- Connect VM concepts to practice: See how sbrk() and mmap() extend the heap
Deep Theoretical Foundation
Dynamic Memory Allocation Fundamentals
Why Dynamic Allocation Exists
Static memory (stack and global variables) has fundamental limitations:
┌────────────────────────────────────────────────────────────────────┐
│ MEMORY ALLOCATION SPECTRUM │
├────────────────────────────────────────────────────────────────────┤
│ │
│ STATIC ALLOCATION DYNAMIC ALLOCATION │
│ ───────────────── ─────────────────── │
│ │
│ int arr[100]; int *arr = malloc(n * sizeof(int));│
│ │
│ Pros: Pros: │
│ - Fast (no runtime cost) - Size determined at runtime │
│ - Simple (compiler handles) - Lifetime controlled by program │
│ - Automatic cleanup - Can grow/shrink │
│ │
│ Cons: Cons: │
│ - Size fixed at compile time - Runtime overhead │
│ - Limited by stack size - Manual cleanup required │
│ - Lifetime tied to scope - Fragmentation possible │
│ │
│ Use when: Use when: │
│ - Size is known - Size unknown until runtime │
│ - Small, fixed allocations - Large or variable allocations │
│ - Short-lived data - Data outlives function scope │
│ │
└────────────────────────────────────────────────────────────────────┘
The Allocator’s Contract
A memory allocator must satisfy these requirements:
void *malloc(size_t size);
// 1. Return pointer to block of at least 'size' bytes
// 2. Block must be properly aligned (usually 8 or 16 bytes)
// 3. Block must not overlap with any other allocated block
// 4. Return NULL if request cannot be satisfied
void free(void *ptr);
// 1. Return block to the free pool
// 2. ptr must be from a previous malloc/realloc
// 3. ptr must not have been freed already (undefined behavior)
// 4. ptr == NULL is a valid no-op
void *realloc(void *ptr, size_t size);
// 1. Resize allocation to 'size' bytes
// 2. Preserve contents (up to min of old and new size)
// 3. May move the block; return new location
// 4. realloc(NULL, size) == malloc(size)
// 5. realloc(ptr, 0) == free(ptr), returns NULL
Heap Organization Strategies
The Heap in Process Memory
┌────────────────────────────────────────────────────────────────────┐
│ PROCESS VIRTUAL ADDRESS SPACE │
├────────────────────────────────────────────────────────────────────┤
│ │
│ High Address (0x7fff...) │
│ ┌─────────────────────────────────────────────────────────────────┐│
│ │ KERNEL SPACE ││
│ ├─────────────────────────────────────────────────────────────────┤│
│ │ STACK ││
│ │ ↓ grows down ││
│ │ ... ││
│ ├─────────────────────────────────────────────────────────────────┤│
│ │ (unmapped gap) ││
│ ├─────────────────────────────────────────────────────────────────┤│
│ │ ... ││
│ │ ↑ grows up ││
│ │ HEAP ││
│ │ ┌───────────────────────────────────────────────────────────┐ ││
│ │ │ Your allocator manages this region! │ ││
│ │ │ │ ││
│ │ │ ┌──────┐ ┌──────────┐ ┌────┐ ┌──────────────────────┐ │ ││
│ │ │ │ used │ │ free │ │used│ │ free │ │ ││
│ │ │ └──────┘ └──────────┘ └────┘ └──────────────────────┘ │ ││
│ │ │ │ ││
│ │ └───────────────────────────────────────────────────────────┘ ││
│ ├─────────────────────────────────────────────────────────────────┤│
│ │ .bss (uninitialized data) ││
│ ├─────────────────────────────────────────────────────────────────┤│
│ │ .data (initialized data) ││
│ ├─────────────────────────────────────────────────────────────────┤│
│ │ .text (program code) ││
│ └─────────────────────────────────────────────────────────────────┘│
│ Low Address (0x0...) │
│ │
└────────────────────────────────────────────────────────────────────┘
Heap Expansion Mechanisms
sbrk() - Traditional Heap Extension:
void *sbrk(intptr_t increment);
// Moves the "program break" (end of heap) by 'increment' bytes
// Returns pointer to the OLD break (start of new memory)
// sbrk(0) returns current break without changing it
// Example: Extend heap by 4096 bytes
void *new_mem = sbrk(4096);
if (new_mem == (void *)-1) {
// Error: out of memory
}
mmap() - Memory Mapping for Large Allocations:
void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);
// Maps a region of virtual address space
// For anonymous memory: fd = -1, flags = MAP_ANONYMOUS | MAP_PRIVATE
// Example: Allocate 1MB anonymous region
void *large_block = mmap(NULL, 1024*1024,
PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS,
-1, 0);
Trade-offs:
┌────────────────────────────────────────────────────────────────────┐
│ sbrk() vs mmap() │
├───────────────────────────────┬────────────────────────────────────┤
│ sbrk() │ mmap() │
├───────────────────────────────┼────────────────────────────────────┤
│ Heap must be contiguous │ Can map anywhere in address space │
│ Fast for small increments │ Higher overhead per call │
│ Can only shrink from the top │ Any region can be unmapped │
│ Simple interface │ More complex but flexible │
│ Traditional Unix approach │ Modern approach for large allocs │
├───────────────────────────────┴────────────────────────────────────┤
│ Modern allocators: sbrk for small, mmap for large (>128KB) │
└────────────────────────────────────────────────────────────────────┘
Block Structure: Headers and Footers
The Implicit Free List Block Format
Every block in the heap needs metadata to track its size and status:
┌────────────────────────────────────────────────────────────────────┐
│ BLOCK STRUCTURE (WITH FOOTER) │
├────────────────────────────────────────────────────────────────────┤
│ │
│ ALLOCATED BLOCK FREE BLOCK │
│ ─────────────── ────────── │
│ │
│ ┌────────────────┐ ┌────────────────┐ │
│ │ HEADER (4B) │ │ HEADER (4B) │ │
│ │ size | 1 │ │ size | 0 │ │
│ ├────────────────┤ ├────────────────┤ │
│ │ │ │ │ │
│ │ PAYLOAD │ │ │ │
│ │ (user data) │ │ FREE SPACE │ │
│ │ │ │ (available) │ │
│ │ │ │ │ │
│ ├────────────────┤ ├────────────────┤ │
│ │ PADDING │ │ │ │
│ │ (alignment) │ │ │ │
│ ├────────────────┤ ├────────────────┤ │
│ │ FOOTER (4B) │ │ FOOTER (4B) │ │
│ │ size | 1 │ │ size | 0 │ │
│ └────────────────┘ └────────────────┘ │
│ │
│ Low-order bit of size encodes allocated/free status │
│ (works because size is always multiple of alignment) │
│ │
└────────────────────────────────────────────────────────────────────┘
Header Format (32-bit example):
┌────────────────────────────────────────────────────────────────────┐
│ HEADER FORMAT │
├────────────────────────────────────────────────────────────────────┤
│ │
│ 31 3 2 1 0 │
│ ┌───────────────────────────────────────────────┬──┬──┬──┬──┐ │
│ │ BLOCK SIZE (bytes) │ │ │ P│ A│ │
│ └───────────────────────────────────────────────┴──┴──┴──┴──┘ │
│ │
│ A (bit 0): Allocated flag │
│ 0 = free, 1 = allocated │
│ │
│ P (bit 1): Previous block allocated (optional optimization) │
│ 0 = previous is free, 1 = previous is allocated │
│ (eliminates need for footer in allocated blocks) │
│ │
│ Size is always a multiple of 8 (or 16), so low bits are free │
│ │
│ Example: │
│ Header = 0x00000019 = 24 bytes, allocated │
│ Header = 0x00000020 = 32 bytes, free │
│ │
└────────────────────────────────────────────────────────────────────┘
Why Footers?:
Footers enable O(1) coalescing with the previous block:
Without footers:
To find previous block's size, must scan from heap start = O(n)
With footers:
Just look at (current_header - 4 bytes) = O(1)
┌────────┬────────────────────┬────────┬────────┬────────────────┬────────┐
│ HDR │ PAYLOAD │ FTR │ HDR │ PAYLOAD │ FTR │
│ 24|1 │ │ 24|1 │ 32|0 │ │ 32|0 │
└────────┴────────────────────┴────────┴────────┴────────────────┴────────┘
↑ ↑
| Footer tells us prev |
| block is 24 bytes, alloc |
└──────────────────────────────┘
Look back 4 bytes to find previous footer
Free List Organizations
Implicit Free List
The simplest approach: blocks are organized implicitly by their physical order.
┌────────────────────────────────────────────────────────────────────┐
│ IMPLICIT FREE LIST │
├────────────────────────────────────────────────────────────────────┤
│ │
│ HEAP STRUCTURE: │
│ │
│ ┌──────────┬──────────┬──────────┬──────────┬──────────┐ │
│ │ PROLOGUE │ BLOCK │ BLOCK │ BLOCK │ EPILOGUE │ │
│ │ 8|1 │ 32|1 │ 64|0 │ 128|1 │ 0|1 │ │
│ └──────────┴──────────┴──────────┴──────────┴──────────┘ │
│ ALLOCATED FREE ALLOCATED (end marker) │
│ │
│ FINDING FREE BLOCKS: │
│ - Start at first block │
│ - Walk through ALL blocks checking headers │
│ - Stop at first free block that fits (first fit) │
│ │
│ Time Complexity: │
│ - malloc: O(total blocks) - must scan all blocks │
│ - free: O(1) with footers (coalescing) │
│ │
│ Space Overhead: │
│ - Header: 4 bytes per block │
│ - Footer: 4 bytes per block │
│ - Minimum block size: 16 bytes (header + footer + 8-byte min) │
│ │
└────────────────────────────────────────────────────────────────────┘
Explicit Free List
Maintain a linked list of only free blocks:
┌────────────────────────────────────────────────────────────────────┐
│ EXPLICIT FREE LIST │
├────────────────────────────────────────────────────────────────────┤
│ │
│ FREE BLOCK STRUCTURE: │
│ │
│ ┌────────────────┐ │
│ │ HEADER (4B) │ │
│ │ size | 0 │ │
│ ├────────────────┤ │
│ │ NEXT (8B) │ ──→ points to next free block │
│ ├────────────────┤ │
│ │ PREV (8B) │ ←── points to previous free block │
│ ├────────────────┤ │
│ │ │ │
│ │ (unused) │ │
│ │ │ │
│ ├────────────────┤ │
│ │ FOOTER (4B) │ │
│ │ size | 0 │ │
│ └────────────────┘ │
│ │
│ HEAP VIEW: │
│ │
│ ┌────────┬────────┬────────┬────────┬────────┬────────┐ │
│ │PROLOGUE│ ALLOC │ FREE │ ALLOC │ FREE │EPILOGUE│ │
│ │ │ 32|1 │ 64|0 │ 128|1 │ 256|0 │ │ │
│ └────────┴────────┴────────┴────────┴────────┴────────┘ │
│ ↑ ↑ │
│ │ │ │
│ └───────←──────────┘ │
│ doubly linked │
│ │
│ free_list_head ──→ [64|0] ←──→ [256|0] ←──→ NULL │
│ │
│ Time Complexity: │
│ - malloc: O(free blocks) - only scan free blocks │
│ - free: O(1) with immediate coalescing │
│ │
│ Space Overhead: │
│ - Minimum block: 24 bytes (header + 2 pointers + footer) │
│ - But only free blocks need the pointers │
│ │
└────────────────────────────────────────────────────────────────────┘
List Ordering Policies:
┌────────────────────────────────────────────────────────────────────┐
│ FREE LIST ORDERING POLICIES │
├────────────────────────────────────────────────────────────────────┤
│ │
│ LIFO (Last-In-First-Out): │
│ ───────────────────────── │
│ Insert newly freed block at HEAD of list │
│ + Simple and fast insertion: O(1) │
│ - Poor locality: recently freed blocks scattered │
│ │
│ free(block): │
│ block->next = free_list_head │
│ block->prev = NULL │
│ if (free_list_head) free_list_head->prev = block │
│ free_list_head = block │
│ │
│ ADDRESS-ORDERED: │
│ ──────────────── │
│ Insert in order by address in memory │
│ + Better locality: nearby free blocks adjacent in list │
│ + Enables efficient coalescing │
│ - Slower insertion: O(n) to find position │
│ │
│ Memory: [block_A at 0x1000] [block_B at 0x2000] [block_C at 0x3000]│
│ List: block_A → block_B → block_C │
│ │
│ BEST PRACTICE: Address-ordered for better utilization │
│ │
└────────────────────────────────────────────────────────────────────┘
Segregated Free Lists
Group free blocks by size class for O(1) access to appropriately-sized blocks:
┌────────────────────────────────────────────────────────────────────┐
│ SEGREGATED FREE LISTS │
├────────────────────────────────────────────────────────────────────┤
│ │
│ SIZE CLASS ARRAY: │
│ ───────────────── │
│ │
│ Index Size Range Free List │
│ ───── ────────── ───────── │
│ 0 16-31 bytes ──→ [24|0] → [16|0] → NULL │
│ 1 32-63 bytes ──→ [48|0] → [32|0] → [56|0] → NULL │
│ 2 64-127 bytes ──→ [96|0] → NULL │
│ 3 128-255 bytes ──→ NULL (empty) │
│ 4 256-511 bytes ──→ [384|0] → [256|0] → NULL │
│ 5 512-1023 bytes ──→ [512|0] → NULL │
│ 6 1024-2047 ──→ [1024|0] → NULL │
│ 7 2048-4095 ──→ NULL │
│ 8 4096-∞ ──→ [8192|0] → [4096|0] → NULL │
│ │
│ SIZE CLASS CALCULATION: │
│ ─────────────────────── │
│ class(size) = floor(log2(size)) - 3 (for power-of-2 classes) │
│ │
│ Or use explicit ranges: │
│ size < 32 → class 0 │
│ size < 64 → class 1 │
│ size < 128 → class 2 │
│ ... │
│ │
│ MALLOC ALGORITHM: │
│ ───────────────── │
│ 1. Calculate size class for requested size │
│ 2. Search that class's list │
│ 3. If empty, try next larger class │
│ 4. If all empty, extend heap with sbrk() │
│ 5. Split if block too large │
│ │
│ Time Complexity: O(1) for common case │
│ O(classes) worst case │
│ │
│ Space Overhead: Array of list heads (typically 8-16 pointers) │
│ │
└────────────────────────────────────────────────────────────────────┘
Boundary Tag Coalescing
When freeing a block, adjacent free blocks should be merged to reduce fragmentation:
┌────────────────────────────────────────────────────────────────────┐
│ COALESCING SCENARIOS │
├────────────────────────────────────────────────────────────────────┤
│ │
│ CASE 1: Neither adjacent block is free │
│ ───────────────────────────────────────── │
│ │
│ Before: [ALLOC] [FREE*] [ALLOC] │
│ After: [ALLOC] [FREE ] [ALLOC] (no change, just mark free) │
│ │
│ CASE 2: Previous block is free │
│ ──────────────────────────────── │
│ │
│ Before: [FREE ] [FREE*] [ALLOC] │
│ ↑──64──↑↑──32──↑ │
│ │
│ After: [ FREE (96) ] [ALLOC] │
│ Update header of prev, footer of current │
│ │
│ CASE 3: Next block is free │
│ ─────────────────────────── │
│ │
│ Before: [ALLOC] [FREE*] [FREE ] │
│ ↑──32──↑↑──64──↑ │
│ │
│ After: [ALLOC] [ FREE (96) ] │
│ Update header of current, footer of next │
│ │
│ CASE 4: Both adjacent blocks are free │
│ ────────────────────────────────────── │
│ │
│ Before: [FREE ] [FREE*] [FREE ] │
│ ↑──64──↑↑──32──↑↑──64──↑ │
│ │
│ After: [ FREE (160) ] │
│ Update header of prev, footer of next │
│ │
└────────────────────────────────────────────────────────────────────┘
Coalescing Implementation:
static void *coalesce(void *bp) {
// Get allocation status of adjacent blocks
size_t prev_alloc = GET_ALLOC(FTRP(PREV_BLKP(bp))); // Previous footer
size_t next_alloc = GET_ALLOC(HDRP(NEXT_BLKP(bp))); // Next header
size_t size = GET_SIZE(HDRP(bp));
if (prev_alloc && next_alloc) { // Case 1: both allocated
return bp; // No coalescing needed
}
else if (prev_alloc && !next_alloc) { // Case 2: next is free
size += GET_SIZE(HDRP(NEXT_BLKP(bp)));
PUT(HDRP(bp), PACK(size, 0)); // Update current header
PUT(FTRP(bp), PACK(size, 0)); // Update next's footer
// (FTRP uses new size, finds the right location)
}
else if (!prev_alloc && next_alloc) { // Case 3: prev is free
size += GET_SIZE(HDRP(PREV_BLKP(bp)));
PUT(FTRP(bp), PACK(size, 0)); // Update current footer
PUT(HDRP(PREV_BLKP(bp)), PACK(size, 0)); // Update prev header
bp = PREV_BLKP(bp); // Return prev's address
}
else { // Case 4: both are free
size += GET_SIZE(HDRP(PREV_BLKP(bp))) +
GET_SIZE(FTRP(NEXT_BLKP(bp)));
PUT(HDRP(PREV_BLKP(bp)), PACK(size, 0)); // Update prev header
PUT(FTRP(NEXT_BLKP(bp)), PACK(size, 0)); // Update next footer
bp = PREV_BLKP(bp); // Return prev's address
}
return bp;
}
Splitting Policies
When a free block is larger than needed, we may split it:
┌────────────────────────────────────────────────────────────────────┐
│ BLOCK SPLITTING │
├────────────────────────────────────────────────────────────────────┤
│ │
│ REQUEST: malloc(24) (plus 8 bytes overhead = 32 bytes needed) │
│ FOUND: Free block of 128 bytes │
│ │
│ WITHOUT SPLITTING: │
│ ────────────────── │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ HEADER (128|1) │ PAYLOAD (120 bytes) │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ Internal fragmentation: 120 - 24 = 96 bytes WASTED! │
│ │
│ WITH SPLITTING: │
│ ─────────────── │
│ │
│ ┌──────────────────────┬──────────────────────────────────┐ │
│ │ HDR │ PAYLOAD │ FTR │ HDR │ FREE BLOCK │ FTR │ │
│ │32|1 │ (24) │32|1 │96|0 │ (88 usable) │96|0 │ │
│ └──────────────────────┴──────────────────────────────────┘ │
│ 32 bytes 96 bytes │
│ │
│ SPLITTING DECISION: │
│ ─────────────────── │
│ remainder = block_size - needed_size │
│ if (remainder >= MIN_BLOCK_SIZE) │
│ split the block │
│ else │
│ use entire block (accept internal fragmentation) │
│ │
│ MIN_BLOCK_SIZE = header(4) + min_payload(8) + footer(4) = 16 │
│ or with pointers: 24-32 bytes │
│ │
└────────────────────────────────────────────────────────────────────┘
Placement Policies: First Fit, Next Fit, Best Fit
How do we choose which free block to use?
┌────────────────────────────────────────────────────────────────────┐
│ PLACEMENT POLICIES │
├────────────────────────────────────────────────────────────────────┤
│ │
│ FREE BLOCKS: [32] → [128] → [64] → [256] → [48] → NULL │
│ REQUEST: malloc(50) needs 64 bytes with overhead │
│ │
│ FIRST FIT: │
│ ────────── │
│ Scan from beginning, use first block that fits │
│ Choice: [128] (first one >= 64) │
│ │
│ + Simple to implement │
│ + Fast for small allocations (found early) │
│ - Fragments beginning of heap │
│ - Large blocks may not find space │
│ │
│ NEXT FIT: │
│ ───────── │
│ Like first fit, but start where previous search ended │
│ If last search ended at [64], start there │
│ Choice: [64] (first one >= 64 from current position) │
│ │
│ + Spreads allocations across heap │
│ + Can be faster than first fit │
│ - Often worse fragmentation than first fit │
│ │
│ BEST FIT: │
│ ───────── │
│ Scan entire list, choose smallest block that fits │
│ Choice: [64] (exact fit!) │
│ │
│ + Minimizes wasted space per allocation │
│ + Good memory utilization │
│ - Slow: O(n) every time │
│ - Creates many tiny unusable fragments │
│ │
│ PERFORMANCE COMPARISON: │
│ ─────────────────────── │
│ │
│ Policy Throughput Utilization Implementation │
│ ────── ────────── ─────────── ────────────── │
│ First Fit Fast Medium Simple │
│ Next Fit Fast Poor-Medium Simple + state │
│ Best Fit Slow Good Simple (but slow) │
│ Seg Lists Very Fast Good Complex │
│ │
└────────────────────────────────────────────────────────────────────┘
Fragmentation
Internal Fragmentation
Wasted space INSIDE allocated blocks:
┌────────────────────────────────────────────────────────────────────┐
│ INTERNAL FRAGMENTATION │
├────────────────────────────────────────────────────────────────────┤
│ │
│ REQUEST: malloc(1) │
│ ACTUAL ALLOCATION: │
│ │
│ ┌────────┬────────────────────────────┬────────┐ │
│ │ HEADER │ PAYLOAD │ FOOTER │ │
│ │ 4B │ 16B minimum (aligned) │ 4B │ │
│ └────────┴────────────────────────────┴────────┘ │
│ │ │
│ └── User requested 1 byte │
│ Allocator gave 16 bytes (minimum) │
│ Internal fragmentation: 15 bytes │
│ │
│ CAUSES: │
│ ─────── │
│ 1. Alignment requirements (must be 8 or 16 byte aligned) │
│ 2. Metadata overhead (header/footer) │
│ 3. Minimum block size constraints │
│ 4. Not splitting (when remainder too small) │
│ │
│ MEASUREMENT: │
│ │
│ internal_frag = allocated_size - requested_size │
│ internal_frag_ratio = internal_frag / allocated_size │
│ │
└────────────────────────────────────────────────────────────────────┘
External Fragmentation
Wasted space BETWEEN allocated blocks:
┌────────────────────────────────────────────────────────────────────┐
│ EXTERNAL FRAGMENTATION │
├────────────────────────────────────────────────────────────────────┤
│ │
│ SCENARIO: malloc(256) fails despite 512 bytes total free │
│ │
│ HEAP STATE: │
│ ┌───────┬───────┬───────┬───────┬───────┬───────┬───────┬───────┐│
│ │ FREE │ ALLOC │ FREE │ ALLOC │ FREE │ ALLOC │ FREE │ ALLOC ││
│ │ 128B │ 64B │ 128B │ 64B │ 128B │ 64B │ 128B │ 64B ││
│ └───────┴───────┴───────┴───────┴───────┴───────┴───────┴───────┘│
│ │
│ Total free: 128 × 4 = 512 bytes │
│ Largest contiguous free: 128 bytes │
│ Cannot satisfy 256-byte request! │
│ │
│ CAUSES: │
│ ─────── │
│ 1. Alternating alloc/free pattern │
│ 2. Poor placement policy (best fit creates tiny holes) │
│ 3. No coalescing (or deferred coalescing) │
│ 4. Long-lived allocations fragment the heap │
│ │
│ MITIGATION: │
│ ─────────── │
│ 1. Immediate coalescing │
│ 2. Segregated fits (group similar sizes) │
│ 3. Compaction (expensive, not common in C) │
│ 4. Better placement (first fit often better than best fit) │
│ │
│ MEASUREMENT: │
│ │
│ external_frag = total_free_space - largest_free_block │
│ utilization = allocated_bytes / total_heap_size │
│ │
└────────────────────────────────────────────────────────────────────┘
Alignment Requirements
All pointers returned by malloc must be properly aligned:
┌────────────────────────────────────────────────────────────────────┐
│ ALIGNMENT REQUIREMENTS │
├────────────────────────────────────────────────────────────────────┤
│ │
│ WHY ALIGNMENT MATTERS: │
│ ────────────────────── │
│ │
│ Modern CPUs load data in aligned chunks. Unaligned access: │
│ - May require multiple memory operations │
│ - Can cause hardware exceptions (SIGBUS on some architectures) │
│ - Always slower when supported │
│ │
│ ALIGNMENT RULES: │
│ ──────────────── │
│ │
│ Type Size Required Alignment │
│ ──── ──── ────────────────── │
│ char 1 1 byte │
│ short 2 2 bytes │
│ int 4 4 bytes │
│ long 8 8 bytes (on 64-bit) │
│ double 8 8 bytes │
│ pointer 8 8 bytes (on 64-bit) │
│ long double 16 16 bytes │
│ __m128 16 16 bytes (SSE) │
│ __m256 32 32 bytes (AVX) │
│ │
│ MALLOC REQUIREMENT: │
│ ─────────────────── │
│ │
│ malloc must return addresses aligned to the MAXIMUM alignment │
│ requirement of any type (typically 8 or 16 bytes) │
│ │
│ Valid addresses: 0x1000, 0x1008, 0x1010, 0x1018, ... │
│ Invalid (8-byte): 0x1001, 0x1002, 0x1004, 0x1007, ... │
│ │
│ IMPLEMENTATION: │
│ ─────────────── │
│ │
│ #define ALIGNMENT 8 │
│ #define ALIGN(size) (((size) + (ALIGNMENT-1)) & ~(ALIGNMENT-1)) │
│ │
│ Examples: │
│ ALIGN(1) = 8 │
│ ALIGN(8) = 8 │
│ ALIGN(9) = 16 │
│ ALIGN(15) = 16 │
│ ALIGN(16) = 16 │
│ │
│ BLOCK SIZE = ALIGN(payload_size + header + footer) │
│ │
└────────────────────────────────────────────────────────────────────┘
Memory Allocator Invariants
Invariants are properties that must ALWAYS be true. Checking them catches bugs early:
┌────────────────────────────────────────────────────────────────────┐
│ ALLOCATOR INVARIANTS │
├────────────────────────────────────────────────────────────────────┤
│ │
│ STRUCTURAL INVARIANTS (must always hold): │
│ ────────────────────────────────────────── │
│ │
│ 1. Alignment: Every block address is ALIGNMENT-byte aligned │
│ CHECK: (block_ptr & (ALIGNMENT - 1)) == 0 │
│ │
│ 2. Header/Footer match: Header size == Footer size │
│ CHECK: GET_SIZE(HDRP(bp)) == GET_SIZE(FTRP(bp)) │
│ GET_ALLOC(HDRP(bp)) == GET_ALLOC(FTRP(bp)) │
│ │
│ 3. Coalescing: No two consecutive free blocks │
│ CHECK: if (!allocated(bp)) then allocated(NEXT_BLKP(bp)) │
│ │
│ 4. Boundary: Prologue and epilogue exist and are marked │
│ CHECK: heap starts with 8|1, ends with 0|1 │
│ │
│ 5. Heap bounds: All blocks within [heap_start, heap_end] │
│ CHECK: heap_start <= bp < heap_end for all blocks │
│ │
│ FREE LIST INVARIANTS (for explicit lists): │
│ ────────────────────────────────────────── │
│ │
│ 6. Every block in free list is marked free │
│ CHECK: for each bp in list: !GET_ALLOC(HDRP(bp)) │
│ │
│ 7. Every free block is in the free list │
│ CHECK: scan heap, verify all free blocks are in list │
│ │
│ 8. Pointer consistency: next->prev == current, prev->next == curr │
│ CHECK: for each bp: NEXT(PREV(bp)) == bp && PREV(NEXT(bp)) == bp│
│ │
│ SIZE INVARIANTS: │
│ ──────────────── │
│ │
│ 9. Block size >= minimum block size │
│ CHECK: GET_SIZE(HDRP(bp)) >= MIN_BLOCK_SIZE │
│ │
│ 10. Block size is aligned │
│ CHECK: (GET_SIZE(HDRP(bp)) & (ALIGNMENT - 1)) == 0 │
│ │
│ 11. Total of all block sizes == heap size │
│ CHECK: sum of sizes from prologue to epilogue == brk - start │
│ │
└────────────────────────────────────────────────────────────────────┘
Throughput vs Utilization Trade-offs
Every allocator design involves trade-offs:
┌────────────────────────────────────────────────────────────────────┐
│ THROUGHPUT vs UTILIZATION TRADE-OFFS │
├────────────────────────────────────────────────────────────────────┤
│ │
│ High Throughput │
│ ▲ │
│ │ │
│ Implicit Free │ Segregated Lists │
│ (simple header) │ (per-class lists) │
│ │ │
│ Simple ────────────────┼──────────────────── Complex │
│ │ │
│ Best Fit │ Buddy Allocator │
│ (scan all) │ (power-of-2 splits) │
│ │ │
│ ▼ │
│ High Utilization │
│ │
│ DESIGN CHOICES AND THEIR EFFECTS: │
│ ───────────────────────────────── │
│ │
│ Choice Throughput Effect Utilization Effect │
│ ────── ──────────────── ────────────────── │
│ Larger min block Better (faster) Worse (more internal) │
│ Explicit lists Better Same │
│ Segregated lists Much better Same or better │
│ Best fit Worse Better │
│ First fit Better Worse │
│ Immediate coalesce Usually better Usually better │
│ Deferred coalesce Sometimes better Sometimes worse │
│ Footer elimination Better (less data) Better (less overhead) │
│ │
│ TYPICAL TARGETS: │
│ ──────────────── │
│ - Throughput: 10,000+ operations/second │
│ - Utilization: 70-90% of peak memory │
│ - Real allocators (glibc, jemalloc): highly optimized for both │
│ │
└────────────────────────────────────────────────────────────────────┘
Project Specification
What You Will Build
A dynamic memory allocator that implements:
- malloc(size): Allocate at least
sizebytes of memory - free(ptr): Return memory to the free pool
- realloc(ptr, size): Resize an existing allocation (optional but recommended)
Plus supporting tools:
- Heap checker: Verify invariants after operations
- Performance harness: Measure throughput and utilization
Functional Requirements
┌────────────────────────────────────────────────────────────────────┐
│ FUNCTIONAL REQUIREMENTS │
├────────────────────────────────────────────────────────────────────┤
│ │
│ CORE API: │
│ ───────── │
│ │
│ int mm_init(void); │
│ - Initialize the heap │
│ - Create prologue and epilogue blocks │
│ - Return 0 on success, -1 on failure │
│ │
│ void *mm_malloc(size_t size); │
│ - Return pointer to allocated block of at least 'size' bytes │
│ - Block must be 8-byte aligned (or 16-byte on 64-bit) │
│ - Return NULL if size == 0 or cannot satisfy request │
│ │
│ void mm_free(void *ptr); │
│ - Return block to free pool │
│ - Coalesce with adjacent free blocks │
│ - ptr == NULL is a no-op │
│ │
│ void *mm_realloc(void *ptr, size_t size); │
│ - Resize allocation to 'size' bytes │
│ - Preserve contents up to min(old_size, new_size) │
│ - realloc(NULL, size) == malloc(size) │
│ - realloc(ptr, 0) == free(ptr), return NULL │
│ │
│ HEAP EXTENSION: │
│ ─────────────── │
│ │
│ Use provided mem_sbrk(int incr) to extend heap │
│ - Similar to sbrk() but works with simulated heap │
│ - Returns pointer to new memory on success │
│ - Returns (void *)-1 on failure │
│ │
│ CONSTRAINTS: │
│ ──────────── │
│ │
│ - No global arrays larger than 128 bytes │
│ - No calls to malloc/free/realloc from libc │
│ - Must handle requests from 1 byte to large allocations │
│ - Must not corrupt heap on any sequence of operations │
│ │
└────────────────────────────────────────────────────────────────────┘
Non-Functional Requirements
- Correctness: Never return overlapping blocks, never corrupt heap
- Alignment: All returned pointers must be properly aligned
- Performance: Reasonable throughput (>1000 ops/sec at minimum)
- Utilization: Memory utilization above 50% on standard traces
- Robustness: Handle edge cases (size=0, ptr=NULL, large sizes)
Example Allocation Trace
┌────────────────────────────────────────────────────────────────────┐
│ EXAMPLE ALLOCATION TRACE │
├────────────────────────────────────────────────────────────────────┤
│ │
│ Trace: │
│ ────── │
│ a0 = malloc(32) # Allocate 32 bytes, get pointer p0 │
│ a1 = malloc(64) # Allocate 64 bytes, get pointer p1 │
│ a2 = malloc(128) # Allocate 128 bytes, get pointer p2 │
│ free(a1) # Free the 64-byte block │
│ a3 = malloc(48) # Should reuse (part of) a1's block │
│ a1 = realloc(a2, 64) # Shrink a2 from 128 to 64 bytes │
│ free(a0) # Free original 32-byte block │
│ free(a3) # Free 48-byte block │
│ free(a1) # Free remaining block │
│ │
│ Heap evolution: │
│ ─────────────── │
│ │
│ After init: │
│ [PROLOGUE] ──────────── FREE ────────────────────────── [EPILOGUE]│
│ │
│ After malloc(32): │
│ [PRO] [a0:32|1] ────────── FREE ─────────────────────── [EPI] │
│ │
│ After malloc(64): │
│ [PRO] [a0:32|1] [a1:64|1] ────── FREE ───────────────── [EPI] │
│ │
│ After malloc(128): │
│ [PRO] [a0:32|1] [a1:64|1] [a2:128|1] ─── FREE ───────── [EPI] │
│ │
│ After free(a1): │
│ [PRO] [a0:32|1] [64|0] [a2:128|1] ─── FREE ───────────── [EPI] │
│ │
│ After malloc(48): (reuses part of freed a1) │
│ [PRO] [a0:32|1] [a3:48|1] [16|0] [a2:128|1] ─── FREE ─── [EPI] │
│ or (if 16 too small, uses all 64): │
│ [PRO] [a0:32|1] [a3:64|1] [a2:128|1] ─── FREE ─── [EPI] │
│ │
│ Final state (all freed): │
│ [PRO] ────────────────── FREE ────────────────────────── [EPI] │
│ │
└────────────────────────────────────────────────────────────────────┘
Real World Outcome
When you complete this project, here’s exactly what you’ll see when running your allocator with the test driver:
Running the Malloc Driver
$ ./mdriver -V
Team Name: my-malloc
Member 1: Student Developer
Using default tracefiles in ./traces/
Measuring performance with gettimeofday().
Results for mm malloc:
trace valid util ops secs Kops
0 yes 99% 5694 0.000183 31115
1 yes 99% 5848 0.000172 34000
2 yes 98% 6648 0.000228 29158
3 yes 99% 5380 0.000136 39559
4 yes 66% 14400 0.000132109091
5 yes 92% 4800 0.000345 13913
6 yes 92% 4800 0.000341 14076
7 yes 55% 12000 0.004792 2504
8 yes 51% 24000 0.013844 1734
9 yes 27% 14401 0.052847 272
10 yes 34% 14401 0.002219 6490
Total 74% 112372 0.075239 1494
Perf index = 44 (util) + 40 (thru) = 84/100
Heap Checker Output
$ ./mdriver -c -V -f traces/short1.rep
Checking mm_init...
Prologue: 8 bytes at 0x7f8a1c000000 [8|1]
Epilogue: 4 bytes at 0x7f8a1c000008 [0|1]
Initial heap size: 4096 bytes
PASSED
Checking mm_malloc(32)...
Returned: 0x7f8a1c00000c (aligned: YES)
Block size: 40 bytes (32 + 8 overhead)
Header: [40|1], Footer: [40|1]
PASSED
Checking mm_malloc(64)...
Returned: 0x7f8a1c000034 (aligned: YES)
Block size: 72 bytes (64 + 8 overhead)
PASSED
Checking mm_free(0x7f8a1c00000c)...
Block marked free: [40|0]
Adjacent blocks: [ALLOC] [FREE] [ALLOC]
Coalescing: Not needed (both neighbors allocated)
PASSED
Checking mm_free(0x7f8a1c000034)...
Block marked free: [72|0]
Adjacent blocks: [FREE] [FREE] [EPILOGUE]
Coalescing: Case 2 (merge with previous)
New coalesced block: [112|0]
PASSED
=== HEAP INVARIANT CHECK ===
All blocks aligned: YES
Header/footer consistency: YES
No consecutive free blocks: YES
Free list consistency: YES
Total heap size matches: YES
All checks PASSED!
Performance Comparison
$ ./compare_allocators
=== ALLOCATOR COMPARISON ===
Trace: binary-bal.rep (mixed alloc/free pattern)
Implementation Throughput Utilization Score
----------------------------------------------------------
Implicit First Fit 1,420 Kops 72% 65/100
Implicit Best Fit 890 Kops 78% 61/100
Explicit First Fit 8,230 Kops 75% 82/100
Explicit Best Fit 4,120 Kops 82% 80/100
Segregated Fits 15,800 Kops 84% 94/100
System malloc 45,000 Kops ~90% (baseline)
Analysis:
- Explicit list 5.8x faster than implicit (only scanning free blocks)
- Best fit: +3-7% utilization, -45% throughput
- Segregated fits: best of both worlds
Debugging a Heap Corruption
$ ./mdriver -V -f traces/coalescing.rep
trace 4: Running...
ERROR: Heap checker failed after operation 47
=== HEAP DUMP ===
Address Size Status Header Footer Notes
--------------------------------------------------------------
0x7f8a1c000000 8 ALLOC [8|1] [8|1] Prologue
0x7f8a1c000008 40 FREE [40|0] [40|0]
0x7f8a1c000030 40 FREE [40|0] [40|0] << CONSECUTIVE FREE!
0x7f8a1c000058 72 ALLOC [72|1] [72|1]
0x7f8a1c0000a0 0 ALLOC [0|1] N/A Epilogue
INVARIANT VIOLATION:
Consecutive free blocks at 0x7f8a1c000008 and 0x7f8a1c000030
These should have been coalesced!
Hint: Check your coalesce() function, specifically Case 2
(current block free, next block free)
Solution Architecture
High-Level Design
┌────────────────────────────────────────────────────────────────────┐
│ ALLOCATOR ARCHITECTURE │
├────────────────────────────────────────────────────────────────────┤
│ │
│ mm_malloc() │
│ │ │
│ ┌───────────────┼───────────────┐ │
│ ▼ ▼ ▼ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Find │ │ Extend │ │ Split │ │
│ │ Block │ │ Heap │ │ Block │ │
│ └──────────┘ └──────────┘ └──────────┘ │
│ │ │ │ │
│ └───────────────┼───────────────┘ │
│ ▼ │
│ Return Pointer │
│ │
│ mm_free() │
│ │ │
│ ┌───────────────┼───────────────┐ │
│ ▼ ▼ ▼ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Mark │ │ Coalesce │ │ Update │ │
│ │ Free │ │ Blocks │ │ List │ │
│ └──────────┘ └──────────┘ └──────────┘ │
│ │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ HEAP LAYOUT │ │
│ ├─────────────────────────────────────────────────────────────┤ │
│ │ │ │
│ │ heap_listp │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌──────┬─────────┬─────────┬─────────┬─────────┬──────┐ │ │
│ │ │ PRO │ BLOCK 1 │ BLOCK 2 │ BLOCK 3 │ ... │ EPI │ │ │
│ │ │ 8|1 │ │ │ │ │ 0|1 │ │ │
│ │ └──────┴─────────┴─────────┴─────────┴─────────┴──────┘ │ │
│ │ │ │
│ │ free_listp (for explicit list) │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ [FREE 1] ←──→ [FREE 2] ←──→ [FREE 3] ←──→ NULL │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
└────────────────────────────────────────────────────────────────────┘
Data Structures
/* ==================== CONSTANTS ==================== */
/* Alignment requirement (bytes) */
#define ALIGNMENT 8
/* Word size (bytes) */
#define WSIZE 4
/* Double word size (bytes) */
#define DSIZE 8
/* Minimum block size: header + min payload + footer */
#define MIN_BLOCK_SIZE 16
/* Extend heap by this amount (bytes) */
#define CHUNKSIZE (1 << 12) /* 4096 bytes */
/* ==================== MACROS ==================== */
/* Max of two values */
#define MAX(x, y) ((x) > (y) ? (x) : (y))
/* Pack size and allocated bit into a word */
#define PACK(size, alloc) ((size) | (alloc))
/* Read a word at address p */
#define GET(p) (*(unsigned int *)(p))
/* Write a word at address p */
#define PUT(p, val) (*(unsigned int *)(p) = (val))
/* Read size from header/footer at address p */
#define GET_SIZE(p) (GET(p) & ~0x7)
/* Read allocated bit from header/footer at address p */
#define GET_ALLOC(p) (GET(p) & 0x1)
/* Given block ptr bp, compute address of its header */
#define HDRP(bp) ((char *)(bp) - WSIZE)
/* Given block ptr bp, compute address of its footer */
#define FTRP(bp) ((char *)(bp) + GET_SIZE(HDRP(bp)) - DSIZE)
/* Given block ptr bp, compute address of next block */
#define NEXT_BLKP(bp) ((char *)(bp) + GET_SIZE(((char *)(bp) - WSIZE)))
/* Given block ptr bp, compute address of previous block */
#define PREV_BLKP(bp) ((char *)(bp) - GET_SIZE(((char *)(bp) - DSIZE)))
/* ==================== FOR EXPLICIT FREE LIST ==================== */
/* Given free block ptr bp, get next free block pointer */
#define NEXT_FREE(bp) (*(void **)(bp))
/* Given free block ptr bp, get previous free block pointer */
#define PREV_FREE(bp) (*(void **)((char *)(bp) + DSIZE))
/* ==================== GLOBAL VARIABLES ==================== */
/* Pointer to first block (after prologue) */
static char *heap_listp = NULL;
/* Pointer to first free block (for explicit list) */
static char *free_listp = NULL;
Block Layout
┌────────────────────────────────────────────────────────────────────┐
│ BLOCK LAYOUTS │
├────────────────────────────────────────────────────────────────────┤
│ │
│ IMPLICIT FREE LIST BLOCK: │
│ ───────────────────────── │
│ │
│ Address Content │
│ ─────── ─────── │
│ bp - 4 HEADER: [size | alloc] │
│ bp PAYLOAD start (returned to user) │
│ bp + ? PAYLOAD end │
│ bp + size-8 FOOTER: [size | alloc] │
│ │
│ ┌────────┬────────────────────────────────────────┬────────┐ │
│ │ HEADER │ PAYLOAD │ FOOTER │ │
│ │ 4B │ (size - 8) bytes │ 4B │ │
│ └────────┴────────────────────────────────────────┴────────┘ │
│ │
│ │
│ EXPLICIT FREE LIST BLOCK (when free): │
│ ───────────────────────────────────── │
│ │
│ ┌────────┬──────────┬──────────┬───────────────────┬────────┐ │
│ │ HEADER │ NEXT_PTR │ PREV_PTR │ (unused) │ FOOTER │ │
│ │ 4B │ 8B │ 8B │ (size-28) B │ 4B │ │
│ └────────┴──────────┴──────────┴───────────────────┴────────┘ │
│ ↑ ↑ │
│ bp bp + 8 │
│ │
│ MINIMUM BLOCK SIZE (explicit list): │
│ header(4) + next(8) + prev(8) + footer(4) = 24 bytes │
│ Round up to 32 for alignment │
│ │
│ │
│ PROLOGUE AND EPILOGUE: │
│ ────────────────────── │
│ │
│ ┌────────┬────────┐ ┌────────┐ │
│ │ HDR │ FTR │ │ HDR │ │
│ │ 8|1 │ 8|1 │ │ 0|1 │ │
│ └────────┴────────┘ └────────┘ │
│ PROLOGUE (8 bytes) EPILOGUE (4 bytes, marks end) │
│ │
└────────────────────────────────────────────────────────────────────┘
Algorithm Overview
┌────────────────────────────────────────────────────────────────────┐
│ ALGORITHM SUMMARY │
├────────────────────────────────────────────────────────────────────┤
│ │
│ mm_init(): │
│ ────────── │
│ 1. Request initial heap space (e.g., 4 words) │
│ 2. Create alignment padding if needed │
│ 3. Create prologue header and footer (8|1) │
│ 4. Create epilogue header (0|1) │
│ 5. Set heap_listp to first block position │
│ 6. Extend heap with a free block │
│ │
│ mm_malloc(size): │
│ ──────────────── │
│ 1. Ignore spurious requests (size == 0) │
│ 2. Adjust size: add overhead, round up to alignment │
│ 3. Search free list for suitable block │
│ - First fit: scan from start, take first fit │
│ - Best fit: scan all, take smallest fit │
│ 4. If no block found, extend heap via sbrk() │
│ 5. Split block if remainder >= MIN_BLOCK_SIZE │
│ 6. Mark block as allocated, return payload pointer │
│ │
│ mm_free(ptr): │
│ ───────────── │
│ 1. If ptr == NULL, return immediately │
│ 2. Get block size from header │
│ 3. Mark header and footer as free │
│ 4. Coalesce with adjacent free blocks │
│ 5. (For explicit list) Insert into free list │
│ │
│ coalesce(bp): │
│ ───────────── │
│ 1. Check if previous block is free (via previous footer) │
│ 2. Check if next block is free (via next header) │
│ 3. Handle 4 cases: │
│ a. Neither free: do nothing │
│ b. Next free: merge with next │
│ c. Previous free: merge with previous │
│ d. Both free: merge all three │
│ 4. Update headers/footers and free list │
│ │
│ mm_realloc(ptr, size): │
│ ───────────────────── │
│ 1. Handle edge cases (NULL ptr, size 0) │
│ 2. If new size <= current size, can shrink in place │
│ - Optionally split off excess │
│ 3. If next block is free and combined size sufficient │
│ - Coalesce with next block (avoid copy) │
│ 4. Otherwise: │
│ - malloc(size) │
│ - memcpy(new, old, min_size) │
│ - free(old) │
│ - return new │
│ │
└────────────────────────────────────────────────────────────────────┘
Implementation Guide
Development Environment Setup
# Required tools
gcc --version # Need GCC for compilation
make --version # Build automation
gdb --version # Debugging
# Create project structure
mkdir -p malloc-lab/{src,tests,traces}
cd malloc-lab
# Download CS:APP malloc lab files (if available)
# Or create your own test harness
Project Structure
malloc-lab/
├── src/
│ ├── mm.c # Your allocator implementation
│ ├── mm.h # Public interface
│ ├── memlib.c # Memory system simulator
│ └── memlib.h # Memory system interface
├── tests/
│ ├── mdriver.c # Test driver
│ ├── mm_test.c # Unit tests
│ └── fsecs.c # Timing utilities
├── traces/
│ ├── short1.rep # Short test trace
│ ├── short2.rep # Another short trace
│ └── ... # More traces
├── Makefile
└── README.md
Implementation Phases
Phase 1: Foundation (Days 1-3)
Goals:
- Set up the basic heap structure
- Implement mm_init()
- Create prologue and epilogue blocks
Tasks:
/*
* mm_init - Initialize the heap with prologue and epilogue
*/
int mm_init(void) {
/* Create the initial empty heap */
if ((heap_listp = mem_sbrk(4 * WSIZE)) == (void *)-1)
return -1;
PUT(heap_listp, 0); /* Alignment padding */
PUT(heap_listp + (1 * WSIZE), PACK(DSIZE, 1)); /* Prologue header */
PUT(heap_listp + (2 * WSIZE), PACK(DSIZE, 1)); /* Prologue footer */
PUT(heap_listp + (3 * WSIZE), PACK(0, 1)); /* Epilogue header */
heap_listp += (2 * WSIZE); /* Point to prologue block */
/* Extend the empty heap with a free block of CHUNKSIZE bytes */
if (extend_heap(CHUNKSIZE / WSIZE) == NULL)
return -1;
return 0;
}
Checkpoint: mm_init() creates valid heap structure. Can verify by examining memory.
Phase 2: Basic Allocation (Days 4-7)
Goals:
- Implement find_fit() with first-fit policy
- Implement place() to allocate blocks
- Implement basic mm_malloc()
Tasks:
/*
* find_fit - Find a free block of at least asize bytes
* Uses first-fit search
*/
static void *find_fit(size_t asize) {
void *bp;
/* First-fit search through the heap */
for (bp = heap_listp; GET_SIZE(HDRP(bp)) > 0; bp = NEXT_BLKP(bp)) {
if (!GET_ALLOC(HDRP(bp)) && (asize <= GET_SIZE(HDRP(bp)))) {
return bp;
}
}
return NULL; /* No fit found */
}
/*
* place - Place block of asize bytes at start of free block bp
* Split if remainder would be at least minimum block size
*/
static void place(void *bp, size_t asize) {
size_t csize = GET_SIZE(HDRP(bp));
if ((csize - asize) >= (2 * DSIZE)) { /* Can split */
PUT(HDRP(bp), PACK(asize, 1));
PUT(FTRP(bp), PACK(asize, 1));
bp = NEXT_BLKP(bp);
PUT(HDRP(bp), PACK(csize - asize, 0));
PUT(FTRP(bp), PACK(csize - asize, 0));
} else { /* Cannot split, use entire block */
PUT(HDRP(bp), PACK(csize, 1));
PUT(FTRP(bp), PACK(csize, 1));
}
}
/*
* mm_malloc - Allocate a block with at least size bytes of payload
*/
void *mm_malloc(size_t size) {
size_t asize; /* Adjusted block size */
size_t extendsize; /* Amount to extend heap if no fit */
char *bp;
/* Ignore spurious requests */
if (size == 0)
return NULL;
/* Adjust block size to include overhead and alignment reqs */
if (size <= DSIZE)
asize = 2 * DSIZE; /* Minimum block size */
else
asize = DSIZE * ((size + (DSIZE) + (DSIZE - 1)) / DSIZE);
/* Search the free list for a fit */
if ((bp = find_fit(asize)) != NULL) {
place(bp, asize);
return bp;
}
/* No fit found. Get more memory and place the block */
extendsize = MAX(asize, CHUNKSIZE);
if ((bp = extend_heap(extendsize / WSIZE)) == NULL)
return NULL;
place(bp, asize);
return bp;
}
Checkpoint: Can allocate blocks. Use simple test: malloc(32) returns valid pointer.
Phase 3: Free and Coalesce (Days 8-12)
Goals:
- Implement mm_free()
- Implement coalesce() with all 4 cases
- Verify no adjacent free blocks remain
Tasks:
/*
* mm_free - Free a block and coalesce adjacent free blocks
*/
void mm_free(void *ptr) {
if (ptr == NULL)
return;
size_t size = GET_SIZE(HDRP(ptr));
/* Mark block as free */
PUT(HDRP(ptr), PACK(size, 0));
PUT(FTRP(ptr), PACK(size, 0));
/* Coalesce with adjacent free blocks */
coalesce(ptr);
}
/*
* coalesce - Merge adjacent free blocks
* Returns pointer to the coalesced block
*/
static void *coalesce(void *bp) {
size_t prev_alloc = GET_ALLOC(FTRP(PREV_BLKP(bp)));
size_t next_alloc = GET_ALLOC(HDRP(NEXT_BLKP(bp)));
size_t size = GET_SIZE(HDRP(bp));
if (prev_alloc && next_alloc) { /* Case 1 */
return bp;
}
else if (prev_alloc && !next_alloc) { /* Case 2 */
size += GET_SIZE(HDRP(NEXT_BLKP(bp)));
PUT(HDRP(bp), PACK(size, 0));
PUT(FTRP(bp), PACK(size, 0));
}
else if (!prev_alloc && next_alloc) { /* Case 3 */
size += GET_SIZE(HDRP(PREV_BLKP(bp)));
PUT(FTRP(bp), PACK(size, 0));
PUT(HDRP(PREV_BLKP(bp)), PACK(size, 0));
bp = PREV_BLKP(bp);
}
else { /* Case 4 */
size += GET_SIZE(HDRP(PREV_BLKP(bp))) +
GET_SIZE(FTRP(NEXT_BLKP(bp)));
PUT(HDRP(PREV_BLKP(bp)), PACK(size, 0));
PUT(FTRP(NEXT_BLKP(bp)), PACK(size, 0));
bp = PREV_BLKP(bp);
}
return bp;
}
Checkpoint: Free works correctly. Verify with sequence: malloc, free, malloc (should reuse block).
Phase 4: Heap Checker (Days 13-16)
Goals:
- Implement comprehensive invariant checks
- Add debug printing
- Create test suite
Tasks:
/*
* mm_check - Check the heap for consistency
* Returns 0 if consistent, prints error and returns -1 otherwise
*/
int mm_check(void) {
char *bp;
int free_count_heap = 0;
int free_count_list = 0;
/* Check prologue */
if (GET_SIZE(HDRP(heap_listp)) != DSIZE ||
!GET_ALLOC(HDRP(heap_listp))) {
printf("Error: Invalid prologue header\n");
return -1;
}
/* Iterate through all blocks */
for (bp = heap_listp; GET_SIZE(HDRP(bp)) > 0; bp = NEXT_BLKP(bp)) {
/* Check alignment */
if ((size_t)bp % ALIGNMENT != 0) {
printf("Error: Block at %p is not aligned\n", bp);
return -1;
}
/* Check header/footer match */
if (GET_SIZE(HDRP(bp)) != GET_SIZE(FTRP(bp)) ||
GET_ALLOC(HDRP(bp)) != GET_ALLOC(FTRP(bp))) {
printf("Error: Header/footer mismatch at %p\n", bp);
return -1;
}
/* Check coalescing: no consecutive free blocks */
if (!GET_ALLOC(HDRP(bp))) {
free_count_heap++;
if (!GET_ALLOC(HDRP(NEXT_BLKP(bp))) &&
GET_SIZE(HDRP(NEXT_BLKP(bp))) > 0) {
printf("Error: Consecutive free blocks at %p\n", bp);
return -1;
}
}
/* Check block size >= minimum */
if (GET_SIZE(HDRP(bp)) < 2 * DSIZE) {
printf("Error: Block at %p smaller than minimum\n", bp);
return -1;
}
}
/* Check epilogue */
if (GET_SIZE(HDRP(bp)) != 0 || !GET_ALLOC(HDRP(bp))) {
printf("Error: Invalid epilogue\n");
return -1;
}
/* For explicit list: check all free blocks are in list */
/* (Add this when implementing explicit free list) */
return 0;
}
/*
* print_heap - Print the heap for debugging
*/
void print_heap(void) {
char *bp;
printf("\n===== HEAP DUMP =====\n");
printf("heap_listp = %p\n", heap_listp);
for (bp = heap_listp; GET_SIZE(HDRP(bp)) > 0; bp = NEXT_BLKP(bp)) {
printf("Block at %p: size=%u, alloc=%d\n",
bp, GET_SIZE(HDRP(bp)), GET_ALLOC(HDRP(bp)));
}
printf("Epilogue at %p\n", bp);
printf("=====================\n\n");
}
Checkpoint: Heap checker passes on all test cases.
Phase 5: Realloc Implementation (Days 17-20)
Goals:
- Implement mm_realloc()
- Handle all edge cases
- Optimize to avoid unnecessary copying
Tasks:
/*
* mm_realloc - Resize an allocated block
*/
void *mm_realloc(void *ptr, size_t size) {
size_t oldsize;
void *newptr;
/* If ptr is NULL, just malloc */
if (ptr == NULL) {
return mm_malloc(size);
}
/* If size is 0, just free */
if (size == 0) {
mm_free(ptr);
return NULL;
}
oldsize = GET_SIZE(HDRP(ptr));
/* Calculate new required size with overhead */
size_t asize;
if (size <= DSIZE)
asize = 2 * DSIZE;
else
asize = DSIZE * ((size + DSIZE + (DSIZE - 1)) / DSIZE);
/* If new size fits in current block */
if (asize <= oldsize) {
/* Could split here if oldsize - asize >= MIN_BLOCK_SIZE */
return ptr;
}
/* Check if next block is free and combined size is enough */
size_t next_alloc = GET_ALLOC(HDRP(NEXT_BLKP(ptr)));
size_t next_size = GET_SIZE(HDRP(NEXT_BLKP(ptr)));
if (!next_alloc && (oldsize + next_size >= asize)) {
/* Expand into next block */
PUT(HDRP(ptr), PACK(oldsize + next_size, 1));
PUT(FTRP(ptr), PACK(oldsize + next_size, 1));
return ptr;
}
/* Must allocate new block and copy */
newptr = mm_malloc(size);
if (newptr == NULL)
return NULL;
/* Copy old data (payload only) */
size_t copysize = oldsize - DSIZE; /* Subtract header+footer */
if (size < copysize)
copysize = size;
memcpy(newptr, ptr, copysize);
/* Free old block */
mm_free(ptr);
return newptr;
}
Checkpoint: Realloc passes all tests. Verify contents preserved.
Phase 6: Performance Optimization (Days 21-28)
Goals:
- Upgrade to explicit free list
- Implement segregated lists (optional)
- Tune parameters for best throughput/utilization
Tasks:
/*
* Explicit free list functions
*/
/* Insert block at front of free list (LIFO) */
static void insert_free_block(void *bp) {
NEXT_FREE(bp) = free_listp;
PREV_FREE(bp) = NULL;
if (free_listp != NULL) {
PREV_FREE(free_listp) = bp;
}
free_listp = bp;
}
/* Remove block from free list */
static void remove_free_block(void *bp) {
void *prev = PREV_FREE(bp);
void *next = NEXT_FREE(bp);
if (prev == NULL) { /* First in list */
free_listp = next;
} else {
NEXT_FREE(prev) = next;
}
if (next != NULL) {
PREV_FREE(next) = prev;
}
}
/* Updated find_fit for explicit list */
static void *find_fit(size_t asize) {
void *bp;
/* Search the free list */
for (bp = free_listp; bp != NULL; bp = NEXT_FREE(bp)) {
if (asize <= GET_SIZE(HDRP(bp))) {
return bp;
}
}
return NULL;
}
/* Updated coalesce to maintain free list */
static void *coalesce(void *bp) {
size_t prev_alloc = GET_ALLOC(FTRP(PREV_BLKP(bp)));
size_t next_alloc = GET_ALLOC(HDRP(NEXT_BLKP(bp)));
size_t size = GET_SIZE(HDRP(bp));
if (prev_alloc && next_alloc) {
/* Just insert into free list */
insert_free_block(bp);
return bp;
}
else if (prev_alloc && !next_alloc) {
/* Remove next from list, merge, insert merged */
remove_free_block(NEXT_BLKP(bp));
size += GET_SIZE(HDRP(NEXT_BLKP(bp)));
PUT(HDRP(bp), PACK(size, 0));
PUT(FTRP(bp), PACK(size, 0));
insert_free_block(bp);
}
else if (!prev_alloc && next_alloc) {
/* Remove prev from list, merge, insert merged */
remove_free_block(PREV_BLKP(bp));
size += GET_SIZE(HDRP(PREV_BLKP(bp)));
PUT(FTRP(bp), PACK(size, 0));
PUT(HDRP(PREV_BLKP(bp)), PACK(size, 0));
bp = PREV_BLKP(bp);
insert_free_block(bp);
}
else {
/* Remove both from list, merge all three, insert merged */
remove_free_block(PREV_BLKP(bp));
remove_free_block(NEXT_BLKP(bp));
size += GET_SIZE(HDRP(PREV_BLKP(bp))) +
GET_SIZE(FTRP(NEXT_BLKP(bp)));
PUT(HDRP(PREV_BLKP(bp)), PACK(size, 0));
PUT(FTRP(NEXT_BLKP(bp)), PACK(size, 0));
bp = PREV_BLKP(bp);
insert_free_block(bp);
}
return bp;
}
Checkpoint: Performance meets targets. Throughput and utilization scores acceptable.
Phase 7: Polish and Documentation (Days 29-30)
Goals:
- Clean up code
- Add comments
- Document design decisions
Testing Strategy
Unit Tests
/* Test basic allocation */
void test_malloc_basic(void) {
mm_init();
void *p1 = mm_malloc(32);
assert(p1 != NULL);
assert((size_t)p1 % ALIGNMENT == 0);
void *p2 = mm_malloc(64);
assert(p2 != NULL);
assert(p2 != p1); /* Different blocks */
mm_free(p1);
mm_free(p2);
assert(mm_check() == 0);
printf("test_malloc_basic PASSED\n");
}
/* Test coalescing */
void test_coalesce(void) {
mm_init();
void *p1 = mm_malloc(32);
void *p2 = mm_malloc(32);
void *p3 = mm_malloc(32);
mm_free(p2); /* Create hole in middle */
/* Should have 3 blocks: alloc, free, alloc */
assert(mm_check() == 0);
mm_free(p3); /* Should coalesce p2 and p3 */
/* Should have 2 blocks: alloc, free (coalesced) */
assert(mm_check() == 0);
mm_free(p1); /* Should coalesce all three */
/* Should have 1 large free block */
assert(mm_check() == 0);
printf("test_coalesce PASSED\n");
}
/* Test realloc */
void test_realloc(void) {
mm_init();
/* Test NULL pointer */
void *p = mm_realloc(NULL, 32);
assert(p != NULL);
/* Write pattern */
memset(p, 'A', 32);
/* Grow */
p = mm_realloc(p, 64);
assert(p != NULL);
/* Verify pattern preserved */
for (int i = 0; i < 32; i++) {
assert(((char *)p)[i] == 'A');
}
/* Shrink */
p = mm_realloc(p, 16);
assert(p != NULL);
/* Free via realloc */
void *q = mm_realloc(p, 0);
assert(q == NULL);
assert(mm_check() == 0);
printf("test_realloc PASSED\n");
}
/* Test alignment */
void test_alignment(void) {
mm_init();
for (int size = 1; size <= 1024; size++) {
void *p = mm_malloc(size);
assert(p != NULL);
assert((size_t)p % ALIGNMENT == 0);
mm_free(p);
}
assert(mm_check() == 0);
printf("test_alignment PASSED\n");
}
Stress Tests
/* Random allocation pattern */
void test_random_pattern(int seed, int ops) {
mm_init();
srand(seed);
void *ptrs[1000] = {0};
for (int i = 0; i < ops; i++) {
int op = rand() % 3;
int idx = rand() % 1000;
if (op == 0 && ptrs[idx] == NULL) {
/* Allocate */
size_t size = (rand() % 512) + 1;
ptrs[idx] = mm_malloc(size);
if (ptrs[idx]) {
memset(ptrs[idx], idx & 0xFF, size);
}
}
else if (op == 1 && ptrs[idx] != NULL) {
/* Free */
mm_free(ptrs[idx]);
ptrs[idx] = NULL;
}
else if (op == 2 && ptrs[idx] != NULL) {
/* Realloc */
size_t size = (rand() % 512) + 1;
ptrs[idx] = mm_realloc(ptrs[idx], size);
}
/* Periodic check */
if (i % 100 == 0) {
assert(mm_check() == 0);
}
}
/* Cleanup */
for (int i = 0; i < 1000; i++) {
if (ptrs[i]) mm_free(ptrs[i]);
}
printf("test_random_pattern PASSED (seed=%d, ops=%d)\n", seed, ops);
}
Trace Files
Standard trace file format:
# Comments start with #
# a <id> <size> - allocate <size> bytes, assign to ptr <id>
# f <id> - free ptr <id>
# r <id> <size> - realloc ptr <id> to <size> bytes
a 0 32
a 1 64
a 2 128
f 1
a 3 48
r 2 64
f 0
f 3
f 2
Heap Checker in Action
/* Call after every operation during debugging */
#ifdef DEBUG
#define CHECKHEAP(verbose) do { \
if (mm_check()) { \
printf("Heap error after %s:%d\n", __FILE__, __LINE__); \
print_heap(); \
exit(1); \
} \
} while (0)
#else
#define CHECKHEAP(verbose)
#endif
Common Pitfalls
Corruption Bugs
┌────────────────────────────────────────────────────────────────────┐
│ CORRUPTION BUGS │
├────────────────────────────────────────────────────────────────────┤
│ │
│ BUG: Off-by-one in block size calculation │
│ ────────────────────────────────────────── │
│ │
│ Symptom: Header/footer don't match after a few operations │
│ │
│ Wrong: PUT(FTRP(bp), PACK(size, 0)); // But FTRP uses old size │
│ Right: PUT(HDRP(bp), PACK(size, 0)); // Header first │
│ PUT(FTRP(bp), PACK(size, 0)); // Footer uses new size │
│ │
│ │
│ BUG: Pointer arithmetic on wrong type │
│ ───────────────────────────────────────── │
│ │
│ Symptom: Blocks overlap or have gaps │
│ │
│ Wrong: bp = bp + size; // Adds size * sizeof(*bp) │
│ Right: bp = (char *)bp + size; // Adds exactly 'size' bytes │
│ │
│ │
│ BUG: Coalescing with allocated blocks │
│ ───────────────────────────────────────── │
│ │
│ Symptom: Allocated data gets overwritten │
│ │
│ Wrong: // Forgot to check GET_ALLOC before coalescing │
│ Right: if (!prev_alloc) { ... merge ... } │
│ │
│ │
│ BUG: Double free │
│ ──────────────── │
│ │
│ Symptom: Crash or infinite loop in coalescing │
│ │
│ Defense: In debug mode, mark freed blocks with magic number │
│ Check for magic before freeing │
│ │
└────────────────────────────────────────────────────────────────────┘
Coalescing Bugs
┌────────────────────────────────────────────────────────────────────┐
│ COALESCING BUGS │
├────────────────────────────────────────────────────────────────────┤
│ │
│ BUG: Not updating all headers/footers │
│ ───────────────────────────────────────── │
│ │
│ Case 4 (both neighbors free): │
│ Before: [PREV:64|0] [CURR:32|0] [NEXT:64|0] │
│ │
│ Wrong: PUT(HDRP(PREV_BLKP(bp)), PACK(160, 0)); // Only header! │
│ │
│ Right: PUT(HDRP(PREV_BLKP(bp)), PACK(160, 0)); // Update prev hdr│
│ PUT(FTRP(NEXT_BLKP(bp)), PACK(160, 0)); // Update next ftr│
│ │
│ │
│ BUG: Forgetting to return new block pointer │
│ ──────────────────────────────────────────── │
│ │
│ Case 3 or 4: Block pointer changes to prev block! │
│ │
│ Wrong: return bp; // Still points to old block │
│ Right: bp = PREV_BLKP(bp); │
│ return bp; // Return merged block start │
│ │
│ │
│ BUG: Not removing from free list before coalescing │
│ ────────────────────────────────────────────────── │
│ │
│ (For explicit free lists) │
│ │
│ Wrong: // Merge blocks but neighbor still in free list │
│ Right: remove_free_block(NEXT_BLKP(bp)); // Remove first │
│ // Then merge and insert merged block │
│ │
└────────────────────────────────────────────────────────────────────┘
Alignment Bugs
┌────────────────────────────────────────────────────────────────────┐
│ ALIGNMENT BUGS │
├────────────────────────────────────────────────────────────────────┤
│ │
│ BUG: Block size not multiple of alignment │
│ ───────────────────────────────────────── │
│ │
│ Wrong: asize = size + DSIZE; // Could be odd │
│ Right: asize = DSIZE * ((size + DSIZE + (DSIZE-1)) / DSIZE); │
│ │
│ │
│ BUG: Returning unaligned pointer │
│ ──────────────────────────────── │
│ │
│ Symptom: Bus error or incorrect data │
│ │
│ Check: assert((size_t)bp % ALIGNMENT == 0); │
│ │
│ │
│ BUG: Header not at word boundary │
│ ──────────────────────────────── │
│ │
│ Initial heap must be aligned! │
│ │
│ // In mm_init: │
│ PUT(heap_listp, 0); // Padding word for alignment │
│ │
└────────────────────────────────────────────────────────────────────┘
Performance Pitfalls
┌────────────────────────────────────────────────────────────────────┐
│ PERFORMANCE PITFALLS │
├────────────────────────────────────────────────────────────────────┤
│ │
│ PITFALL: Linear search of all blocks │
│ ──────────────────────────────────── │
│ │
│ Problem: Implicit free list scans ALL blocks │
│ Solution: Explicit free list (only scan free blocks) │
│ Better: Segregated lists (O(1) for common sizes) │
│ │
│ │
│ PITFALL: Extending heap too often │
│ ──────────────────────────────── │
│ │
│ Problem: sbrk() is expensive │
│ Solution: Extend by larger chunks (e.g., 4KB) │
│ Reuse freed blocks before extending │
│ │
│ │
│ PITFALL: Not splitting blocks │
│ ──────────────────────────── │
│ │
│ Problem: Huge internal fragmentation │
│ Solution: Split if remainder >= MIN_BLOCK_SIZE │
│ │
│ │
│ PITFALL: Best fit with O(n) search │
│ ──────────────────────────────── │
│ │
│ Problem: Slow for large heaps │
│ Solution: First fit is usually faster AND has good utilization │
│ Better: Segregated fits combine speed with good utilization │
│ │
└────────────────────────────────────────────────────────────────────┘
Extensions
Beginner Extensions
- Footer elimination: Use “previous allocated” bit to skip footer on allocated blocks
- Better fit: Implement best fit or next fit, compare performance
- Heap visualization: ASCII art showing block layout
- Statistics collection: Track peak usage, fragmentation metrics
Intermediate Extensions
- Segregated free lists: Group blocks by size class for O(1) allocation
- Immediate coalescing vs deferred: Compare strategies
- Memory-mapped large blocks: Use mmap for allocations > threshold
- Debug mode: Pattern filling, guard pages, leak detection
Advanced Extensions
- Thread-safe allocator: Add locking or use lock-free algorithms
- Per-thread arenas: Reduce contention in multithreaded programs
- Compaction: Move blocks to reduce fragmentation (requires handle system)
- SLAB allocator: For fixed-size object pools (common in kernels)
Segregated Free Lists Implementation
/*
* Segregated free lists implementation
*/
#define NUM_SIZE_CLASSES 9
/* Size class boundaries (in bytes) */
static size_t class_sizes[NUM_SIZE_CLASSES] = {
32, 64, 128, 256, 512, 1024, 2048, 4096, SIZE_MAX
};
/* Array of free list heads, one per size class */
static void *seg_lists[NUM_SIZE_CLASSES];
/* Get size class for a given size */
static int get_size_class(size_t size) {
for (int i = 0; i < NUM_SIZE_CLASSES; i++) {
if (size <= class_sizes[i])
return i;
}
return NUM_SIZE_CLASSES - 1;
}
/* Find fit in segregated lists */
static void *seg_find_fit(size_t asize) {
int class = get_size_class(asize);
/* Search this class and larger classes */
for (int i = class; i < NUM_SIZE_CLASSES; i++) {
void *bp = seg_lists[i];
while (bp != NULL) {
if (GET_SIZE(HDRP(bp)) >= asize) {
return bp;
}
bp = NEXT_FREE(bp);
}
}
return NULL; /* No fit in any class */
}
/* Insert into appropriate segregated list */
static void seg_insert(void *bp) {
size_t size = GET_SIZE(HDRP(bp));
int class = get_size_class(size);
/* LIFO insertion at head of list */
NEXT_FREE(bp) = seg_lists[class];
PREV_FREE(bp) = NULL;
if (seg_lists[class] != NULL) {
PREV_FREE(seg_lists[class]) = bp;
}
seg_lists[class] = bp;
}
/* Remove from segregated list */
static void seg_remove(void *bp) {
size_t size = GET_SIZE(HDRP(bp));
int class = get_size_class(size);
void *prev = PREV_FREE(bp);
void *next = NEXT_FREE(bp);
if (prev == NULL) {
seg_lists[class] = next;
} else {
NEXT_FREE(prev) = next;
}
if (next != NULL) {
PREV_FREE(next) = prev;
}
}
Real-World Connections
Production Allocators
glibc malloc (ptmalloc2):
- Uses bins (segregated lists) for small allocations
- Maintains unsorted “fastbin” for recently freed small blocks
- Uses mmap for very large allocations
- Thread-safe with per-arena locking
jemalloc (used by Firefox, Facebook):
- Thread-local caching to reduce contention
- Size classes with minimal internal fragmentation
- Sophisticated metadata management
- Extensive debugging and profiling support
tcmalloc (Google):
- Per-thread cache for small allocations
- Central free list for larger allocations
- Page-based heap organization
- Low fragmentation, high performance
mimalloc (Microsoft):
- Free list sharding for NUMA awareness
- Extremely fast small object allocation
- Designed for modern multi-core systems
Kernel Allocators
Linux SLAB/SLUB:
- Object caching for frequently-used structures
- Per-CPU caches to avoid locking
- Constructors/destructors for complex objects
Buddy Allocator:
- Power-of-2 block sizes
- Fast coalescing via “buddy” relationship
- Used for page-level allocation
Language Runtime Allocators
Go runtime:
- Concurrent, generational garbage collector
- Size-class based allocation
- Integration with goroutine scheduling
Rust allocators:
- System allocator (usually jemalloc or platform default)
- Custom allocator API for embedded/specialized use
- Memory safety guarantees at compile time
Interview Relevance
This project prepares you for questions like:
- “How would you implement malloc?”
- “What is fragmentation and how do you minimize it?”
- “Explain the trade-offs between different allocator designs”
- “How do production allocators achieve thread safety?”
- “What data structures are used in memory allocators?”
Resources
Essential Reading
- CS:APP Chapter 9: Virtual Memory (especially 9.9 Dynamic Memory Allocation)
- CS:APP Chapter 6: The Memory Hierarchy (for locality understanding)
- “C Interfaces and Implementations” by David Hanson: Excellent allocator examples
- glibc malloc source code: Real-world implementation
Papers
- “Dynamic Storage Allocation: A Survey and Critical Review” (Wilson et al.)
- “Hoard: A Scalable Memory Allocator for Multithreaded Applications” (Berger et al.)
- “Mimalloc: Free List Sharding in Action” (Leijen et al.)
Online Resources
Tools
- Valgrind: Memory error detection
- AddressSanitizer: Fast memory error detection
- heaptrack: Heap memory profiler
- perf: Performance profiling
Related Projects in This Series
- Previous: P13 (Virtual Memory Map Visualizer) - VM foundations
- Next: P15 (Robust Unix I/O Toolkit) - I/O patterns
- Uses concepts from: P2 (Bitwise Data Inspector) - bit manipulation for headers
- Uses concepts from: P9 (Cache Simulator) - locality for performance
Self-Assessment Checklist
Understanding
- I can explain why dynamic allocation is necessary
- I understand the difference between internal and external fragmentation
- I can describe at least 3 free list organizations and their trade-offs
- I can explain boundary tag coalescing in all 4 cases
- I understand alignment requirements and why they matter
- I can list 5+ allocator invariants and explain why each is important
Implementation
- mm_init creates valid heap with prologue and epilogue
- mm_malloc returns properly aligned blocks
- mm_malloc correctly splits blocks when possible
- mm_free correctly marks blocks and coalesces
- All 4 coalescing cases work correctly
- mm_realloc handles all edge cases
- Heap checker catches common errors
- No memory corruption on random traces
Performance
- Throughput meets minimum threshold (1000+ ops/sec)
- Utilization is reasonable (>50% on standard traces)
- Explicit free list improves performance over implicit
- I can explain which design choices affect throughput vs utilization
Growth
- I debugged at least one corruption bug using the heap checker
- I measured and improved performance through design changes
- I understand how production allocators (jemalloc, tcmalloc) work
- I can discuss allocator trade-offs in an interview setting
The Core Question You’re Answering
“When I call malloc(100), where does that memory come from, how does the allocator track it, and what happens when I call free()?”
This project takes you inside one of the most fundamental abstractions in systems programming. You’ll see that “getting memory” is really about managing a large region of bytes, carving it into pieces, tracking which pieces are in use, and reassembling freed pieces efficiently. You’ll understand why fragmentation matters, why free() doesn’t shrink your process, and the brilliant engineering behind production allocators.
Concepts You Must Understand First
Before starting this project, ensure you understand these concepts:
| Concept | Why It Matters | Where to Learn |
|---|---|---|
| Pointer arithmetic in C | You’ll compute block addresses constantly | CS:APP 3.8, any C book Ch. 5-6 |
| How the heap grows (sbrk, brk) | You’ll extend the heap when out of space | CS:APP 9.9.1 |
| Alignment requirements | Every allocation must be aligned | CS:APP 9.9.1, 3.9.3 |
| Bit manipulation (pack/unpack) | Headers encode size and status in one word | CS:APP Chapter 2 |
| Linked list operations | Free lists are linked lists | Data structures basics |
| What fragmentation means | Your allocator must minimize it | CS:APP 9.9.3-9.9.4 |
Questions to Guide Your Design
Work through these questions BEFORE writing code:
-
Block Layout: How big is your header? Do you need a footer? What’s your minimum block size?
-
Alignment: How do you ensure every returned pointer is 8-byte (or 16-byte) aligned? How does this affect block sizes?
-
Finding Free Blocks: Will you use implicit list, explicit list, or segregated lists? What are the trade-offs?
-
Placement Policy: First fit, next fit, or best fit? What are the implications for throughput and utilization?
-
Splitting: When do you split a block? What’s the minimum remainder size worth keeping?
-
Coalescing: Will you coalesce immediately or defer? How do footers enable O(1) coalescing with the previous block?
-
Heap Extension: How much should you extend when you run out of space? Small or large chunks?
Thinking Exercise
Before writing any code, trace through this scenario by hand:
Starting heap state (block addresses and sizes shown):
[PROLOGUE 8|1] [FREE 80|0] [ALLOC 48|1] [FREE 120|0] [EPILOGUE 0|1]
Operations to trace:
void *p1 = mm_malloc(32); // Needs 40 bytes with overhead
void *p2 = mm_malloc(100); // Needs 112 bytes with overhead (round up)
mm_free(p1);
void *p3 = mm_malloc(60); // Needs 72 bytes with overhead
Exercise: On paper, answer:
-
After malloc(32): Which block was used? Was it split? Draw the new heap layout with all headers/footers.
-
After malloc(100): Which block was used? What happens if no single block is big enough?
-
After free(p1): Was coalescing needed? Draw the free list if using explicit list.
-
After malloc(60): Which block was chosen (first fit)? Would best fit choose differently?
Verify your answers by implementing and adding debug printing.
Hints in Layers
If you’re stuck, reveal hints one at a time:
Hint 1: Getting Started with Block Macros
Define these macros first - they’re the foundation:
#define WSIZE 4 // Word size (bytes)
#define DSIZE 8 // Double word size (bytes)
#define ALIGNMENT 8
// Pack size and allocated bit into a word
#define PACK(size, alloc) ((size) | (alloc))
// Read and write a word at address p
#define GET(p) (*(unsigned int *)(p))
#define PUT(p, val) (*(unsigned int *)(p) = (val))
// Read size and allocated fields from header/footer
#define GET_SIZE(p) (GET(p) & ~0x7)
#define GET_ALLOC(p) (GET(p) & 0x1)
// Given block ptr bp, compute header and footer addresses
#define HDRP(bp) ((char *)(bp) - WSIZE)
#define FTRP(bp) ((char *)(bp) + GET_SIZE(HDRP(bp)) - DSIZE)
// Given block ptr bp, compute next and previous block addresses
#define NEXT_BLKP(bp) ((char *)(bp) + GET_SIZE(HDRP(bp)))
#define PREV_BLKP(bp) ((char *)(bp) - GET_SIZE((char *)(bp) - DSIZE))
Test these macros extensively before using them!
Hint 2: The Coalescing Cases
When you free a block, check all four cases:
Case 1: [ALLOC] [FREE*] [ALLOC] -> just mark free
Case 2: [ALLOC] [FREE*] [FREE] -> merge current + next
Case 3: [FREE] [FREE*] [ALLOC] -> merge prev + current
Case 4: [FREE] [FREE*] [FREE] -> merge prev + current + next
For cases 3 and 4, remember to return the NEW block pointer (prev block’s address).
To check if previous is free, you need its footer: look at (bp - DSIZE).
Hint 3: Debugging with a Heap Checker
Implement mm_check() early! Check these invariants:
int mm_check(void) {
char *bp;
// Check prologue
if (GET_SIZE(heap_listp - WSIZE) != DSIZE ||
!GET_ALLOC(heap_listp - WSIZE))
return 0; // Bad prologue
// Walk all blocks
for (bp = heap_listp; GET_SIZE(HDRP(bp)) > 0; bp = NEXT_BLKP(bp)) {
// Check alignment
if ((size_t)bp % ALIGNMENT != 0) {
printf("Block %p not aligned!\n", bp);
return 0;
}
// Check header/footer match
if (GET(HDRP(bp)) != GET(FTRP(bp))) {
printf("Header/footer mismatch at %p\n", bp);
return 0;
}
// Check for consecutive free blocks
if (!GET_ALLOC(HDRP(bp)) && !GET_ALLOC(HDRP(NEXT_BLKP(bp)))) {
printf("Consecutive free blocks at %p\n", bp);
return 0;
}
}
return 1; // All checks passed
}
Call mm_check() after every malloc/free during debugging.
Hint 4: Explicit Free List
For explicit lists, free blocks contain next/prev pointers in their payload area:
Free block layout:
[HEADER 4B] [NEXT PTR 8B] [PREV PTR 8B] [...padding...] [FOOTER 4B]
Minimum free block size: 4 + 8 + 8 + 4 = 24 bytes (round to 32 for alignment)
When allocating, remove from list:
void remove_from_free_list(void *bp) {
void *prev = PREV_FREE(bp);
void *next = NEXT_FREE(bp);
if (prev) NEXT_FREE(prev) = next;
else free_list_head = next;
if (next) PREV_FREE(next) = prev;
}
When freeing, add to list (LIFO is simplest):
void add_to_free_list(void *bp) {
NEXT_FREE(bp) = free_list_head;
PREV_FREE(bp) = NULL;
if (free_list_head) PREV_FREE(free_list_head) = bp;
free_list_head = bp;
}
The Interview Questions They’ll Ask
After completing this project, you’ll be ready for these common interview questions:
- “How does malloc work internally?”
- Expected: Manages heap via free lists, finds/splits blocks, extends heap if needed
- Bonus: Discuss headers, footers, coalescing, and alignment
- “What’s the difference between internal and external fragmentation?”
- Expected: Internal = wasted space inside allocated blocks; External = scattered free blocks can’t satisfy large request
- Bonus: Give concrete examples, discuss mitigation strategies
- “Explain how you would implement free().”
- Expected: Mark block as free, coalesce with neighbors, add to free list
- Bonus: Describe all 4 coalescing cases, explain why footers are needed
- “What placement policy would you use and why?”
- Expected: First fit (fast) vs. best fit (better utilization) trade-off
- Bonus: Discuss next fit, segregated fits, and when each is appropriate
- “How do production allocators like jemalloc achieve high performance?”
- Expected: Thread-local caches, size classes, minimal locking
- Bonus: Discuss arena-based allocation, slab allocators, NUMA awareness
- “Why doesn’t free() return memory to the OS immediately?”
- Expected: Heap can only shrink from the top; fragmentation prevents shrinking
- Bonus: Discuss mmap for large allocations, MADV_DONTNEED, memory overcommit
Books That Will Help
| Topic | Book | Chapter/Section |
|---|---|---|
| Dynamic memory allocation overview | CS:APP 3rd Ed | Chapter 9.9 “Dynamic Memory Allocation” |
| Allocator requirements and goals | CS:APP 3rd Ed | Chapter 9.9.1 “The malloc and free Functions” |
| Fragmentation | CS:APP 3rd Ed | Chapter 9.9.3-9.9.4 “Fragmentation” |
| Implementation details | CS:APP 3rd Ed | Chapter 9.9.6-9.9.13 “Implicit Free Lists” through “Explicit Free Lists” |
| Segregated storage | CS:APP 3rd Ed | Chapter 9.9.14 “Segregated Free Lists” |
| Coalescing strategies | CS:APP 3rd Ed | Chapter 9.9.10 “Coalescing Free Blocks” |
| Production allocators | The Linux Programming Interface | Chapter 7 “Memory Allocation” |
| Hoard allocator (scalable) | Paper: “Hoard: A Scalable Memory Allocator for Multithreaded Applications” | - |
| jemalloc design | jemalloc documentation | - |
Submission / Completion Criteria
Minimum Viable Completion:
- mm_init, mm_malloc, mm_free work correctly
- Passes basic correctness tests
- Heap checker implemented and passing
- Implicit free list with first fit
Full Completion:
- All of the above plus mm_realloc
- Explicit free list implemented
- Good throughput and utilization scores
- All 4 coalescing cases verified
Excellence (Going Above & Beyond):
- Segregated free lists
- Footer elimination optimization
- Thread-safe version
- Comparison with production allocators
- Detailed performance analysis report
This guide was expanded from CSAPP_3E_DEEP_LEARNING_PROJECTS.md. For the complete learning path, see the project index.