Project 2: Custom Memory Allocator (malloc/free from scratch)
Build a drop-in
malloc/freereplacement that manages heap pages, splits/coalesces blocks, and usesmmapfor large allocations.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 3: Advanced |
| Time Estimate | 2-3 weeks |
| Main Programming Language | C (Alternatives: Rust) |
| Alternative Programming Languages | Rust |
| Coolness Level | Level 4: Hardcore |
| Business Potential | Level 3: Systems / performance tooling |
| Prerequisites | C pointers, structs, bit ops, basic VM concepts |
| Key Topics | Heap layout, free lists, coalescing, alignment, mmap/brk |
1. Learning Objectives
By completing this project, you will:
- Explain how user-space heaps grow via
brk/sbrkandmmap. - Design allocator metadata and free list strategies.
- Implement splitting, coalescing, and alignment correctly.
- Integrate with real binaries using
LD_PRELOAD. - Benchmark and reason about fragmentation vs speed.
- Add deterministic debug modes for reproducible tests.
2. All Theory Needed (Per-Concept Breakdown)
2.1 Process Virtual Memory and the Heap
Fundamentals
Processes see a contiguous virtual address space. The heap typically grows upward using brk/sbrk, while large allocations can use mmap to map fresh regions. The kernel backs virtual pages with physical memory on demand, which is why small allocations are cheap until you touch pages.
Deep Dive into the concept
The kernel maintains per-process VMAs (virtual memory areas). The heap is usually a VMA bounded by the program break; sbrk moves the break and expands the heap VMA. mmap creates new VMAs and can bypass the main heap, which is useful for large allocations to reduce fragmentation and allow independent unmapping. Page alignment matters: allocators often align to 16 bytes (for SIMD) and manage internal headers. Understanding VMAs helps you explain why large allocations appear in /proc/<pid>/maps as separate regions.
How this fits on projects
You will choose when to use sbrk vs mmap in Section 3.2 and design the block layout in Section 4.3.
Definitions & key terms
- heap -> region for dynamic allocations via
malloc - program break -> end of the heap for
brk/sbrk - VMA -> virtual memory area tracked by the kernel
mmap-> map pages at arbitrary addresses
Mental model diagram (ASCII)
low addr
+--------------------+ text/data
+--------------------+
| heap (brk) | grows up
+--------------------+
| mmap regions | large allocs
+--------------------+
| stack | grows down
+--------------------+
high addr
How it works (step-by-step)
- Allocator calls
sbrkto extend heap by N bytes. - Kernel updates program break and VMA.
- Pages are committed on first touch (page faults).
- For large allocations, allocator calls
mmapdirectly.
Minimal concrete example
void *p = sbrk(4096); // grow heap by a page
void *q = mmap(NULL, 1<<20, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
Common misconceptions
- Misconception:
mallocalways callssbrk. Correction: modern allocators often usemmapfor large blocks. - Misconception: pages are allocated immediately. Correction: pages are typically allocated on first access.
Check-your-understanding questions
- Why do large allocations often use
mmap? - What does
/proc/<pid>/mapsshow for anmmapallocation? - Why does touching a new heap page cause a page fault?
Check-your-understanding answers
- To reduce fragmentation and enable independent unmapping.
- A separate VMA labeled
[anon]or the mapped file. - The page isn’t backed until first access (demand paging).
Real-world applications
- High-performance allocators (jemalloc, tcmalloc)
- Leak detection and heap profiling
Where you’ll apply it
- This project: Section 3.2 Functional Requirements, Section 4.3 Data Structures.
- Also used in: P04-page-fault-analyzer for understanding demand paging.
References
- “Computer Systems: A Programmer’s Perspective” Ch. 9
brk(2),mmap(2)man pages
Key insights
Allocator policy is inseparable from how the OS maps memory.
Summary
Understanding VMAs and the program break tells you what the allocator can and cannot control.
Homework/Exercises to practice the concept
- Allocate 1 MB with
mallocand inspect/proc/<pid>/maps. - Compare
sbrk(0)before and aftermalloc(1).
Solutions to the homework/exercises
- You should see a new VMA if
mallocusesmmapfor large blocks. - The program break may advance after the first allocation.
2.2 Block Metadata, Free Lists, and Coalescing
Fundamentals
Allocators track blocks with headers that store size and status. Free blocks are kept in a free list. When a request arrives, you pick a free block, possibly split it, and later coalesce adjacent free blocks to reduce fragmentation.
Deep Dive into the concept
Metadata usually includes size and flags packed into a word. The allocator keeps an explicit free list (linked list of free blocks), or segregated lists by size class. Coalescing requires checking neighbors; boundary tags let you find previous block size. Splitting a block must leave a minimum block size to store metadata. A common approach is: first-fit or best-fit over a free list, split if large, return pointer to payload, then on free, coalesce adjacent free blocks and insert into free list. Correctness hinges on consistent header/footer updates.
How this fits on projects
This is the core of your allocator in Section 4.4 and Section 5.10 Phase 2.
Definitions & key terms
- block header -> metadata for a block (size, flags)
- free list -> collection of free blocks
- coalescing -> merging adjacent free blocks
- split -> divide a large free block into allocated + free
Mental model diagram (ASCII)
[Hdr|64|USED][Hdr|128|FREE][Hdr|256|USED][Hdr|128|FREE]
^ split here ^ coalesce here
How it works (step-by-step)
- Search free list for a block of adequate size.
- If block is much larger, split and return part.
- On free, mark block free and check neighbors.
- Merge adjacent free blocks and update list.
Minimal concrete example
struct block { size_t size; struct block *next; };
Common misconceptions
- Misconception: first-fit always minimizes fragmentation. Correction: first-fit is fast but can fragment badly.
- Misconception: coalescing is optional. Correction: without coalescing, fragmentation grows quickly.
Check-your-understanding questions
- Why do you need a minimum block size?
- What is the risk of forgetting to update a footer?
- How does segregated free lists improve performance?
Check-your-understanding answers
- To ensure free blocks can store metadata.
- Coalescing will read incorrect neighbor size and corrupt memory.
- It reduces search time by limiting candidates to size class.
Real-world applications
- General-purpose allocators and memory pools
- Embedded systems with tight memory constraints
Where you’ll apply it
- This project: Section 4.3 Data Structures, Section 5.10 Phase 2.
- Also used in: P06-userspace-thread-library-green-threads when allocating stacks.
References
- “C Interfaces and Implementations” (Hanson) Ch. 5-6
- “The Art of Computer Systems Performance Analysis” sections on fragmentation
Key insights
Allocator correctness is a data-structure invariants problem.
Summary
Block metadata and free list integrity decide whether your allocator is stable or crashes.
Homework/Exercises to practice the concept
- Draw a heap layout and simulate a sequence of alloc/free operations.
- Implement a minimal free list and test split/coalesce by hand.
Solutions to the homework/exercises
- Track block sizes and verify that coalescing reduces total blocks.
- Use a fixed array buffer and manage headers manually.
2.3 Alignment, Fragmentation, and Large Allocations
Fundamentals
Allocators align payloads to word boundaries (often 16 bytes) to satisfy ABI and SIMD requirements. Fragmentation comes in two forms: internal (wasted space inside blocks) and external (free space split across blocks). Large allocations should bypass the main heap using mmap.
Deep Dive into the concept
Alignment is usually achieved by rounding requested sizes up to the nearest multiple of an alignment constant. Internal fragmentation is a trade-off for simpler alignment. External fragmentation is mitigated by coalescing and by using segregated size classes. A practical rule: if request size > threshold (e.g., 128 KB), call mmap and free with munmap. This avoids polluting the heap and allows returning memory to the OS.
How this fits on projects
You will implement alignment and a large-allocation policy in Section 3.2 and Section 5.10 Phase 3.
Definitions & key terms
- alignment -> ensuring payload addresses are multiples of N
- internal fragmentation -> unused bytes inside allocated blocks
- external fragmentation -> unused bytes between allocated blocks
- threshold -> size above which
mmapis used
Mental model diagram (ASCII)
request 30 bytes -> align to 32 -> 2 bytes internal fragment
How it works (step-by-step)
- Round request to alignment boundary.
- Add header size to compute total block size.
- For large size, allocate via
mmapand mark as mmap block. - On free,
munmaplarge blocks directly.
Minimal concrete example
size_t align = 16;
size_t size = (req + (align-1)) & ~(align-1);
Common misconceptions
- Misconception: alignment is optional. Correction: misaligned pointers can crash on some architectures.
- Misconception: fragmentation is only an implementation detail. Correction: fragmentation directly impacts memory usage and performance.
Check-your-understanding questions
- Why is 16-byte alignment common on x86-64?
- What is the difference between internal and external fragmentation?
- Why use
mmapfor large allocations?
Check-your-understanding answers
- SIMD instructions and ABI require 16-byte stack alignment.
- Internal is wasted inside blocks; external is wasted between blocks.
- To reduce heap fragmentation and allow unmapping to the OS.
Real-world applications
- Performance-critical systems (DBs, HFT)
- Memory-constrained embedded software
Where you’ll apply it
- This project: Section 3.2 Functional Requirements, Section 5.10 Phase 3.
- Also used in: P05-linux-kernel-module-character-device-driver for alignment-aware buffers.
References
- “CS:APP” Ch. 3 (alignment)
- “C Interfaces and Implementations” Ch. 5
Key insights
Alignment rules keep code correct; fragmentation rules keep it fast.
Summary
A good allocator balances alignment, fragmentation, and fast allocation paths.
Homework/Exercises to practice the concept
- Compute total block size for requests of 1, 15, 16, 17 bytes.
- Choose a
mmapthreshold and justify it.
Solutions to the homework/exercises
- With 16-byte alignment, requests round to 16, 16, 16, 32.
- Start with 128 KB; adjust after benchmarking.
3. Project Specification
3.1 What You Will Build
A shared library libmymalloc.so that provides malloc, free, calloc, and realloc, plus a debug mode that prints heap layout and fragmentation stats. It can be injected into existing programs via LD_PRELOAD.
3.2 Functional Requirements
- Implement
malloc,free,calloc,realloc. - Maintain free list with block metadata and coalescing.
- Support alignment to 16 bytes.
- Use
mmapfor large allocations (configurable threshold). - Provide a debug function
heap_dump(). - Support
LD_PRELOADusage without crashing common tools (ls,cat). - Deterministic debug output with
MALLOC_DEBUG_SEED=42.
3.3 Non-Functional Requirements
- Performance: allocate/free 1M small blocks in < 2s on a modern CPU.
- Reliability: no double-free crashes; detect and report.
- Usability: clear stats for fragmentation and block counts.
3.4 Example Usage / Output
$ LD_PRELOAD=./libmymalloc.so ./bench_alloc
[alloc] 64 -> 0x55b2c00010
[free ] 0x55b2c00010
[stats] in_use=1 free=2 fragmentation=12%
3.5 Data Formats / Schemas / Protocols
Debug output format
[BLOCK addr size status] ...
3.6 Edge Cases
free(NULL)should be a no-op.realloc(ptr, 0)should free.- Double-free detection in debug mode.
callocoverflow check (nmemb * size).
3.7 Real World Outcome
3.7.1 How to Run (Copy/Paste)
make
LD_PRELOAD=./libmymalloc.so ./bench_alloc
MALLOC_DEBUG=1 MALLOC_DEBUG_SEED=42 ./bench_alloc
3.7.2 Golden Path Demo (Deterministic)
- With
MALLOC_DEBUG=1andMALLOC_DEBUG_SEED=42, heap dump order is fixed.
3.7.3 CLI Transcript (Success + Failure)
$ MALLOC_DEBUG=1 MALLOC_DEBUG_SEED=42 LD_PRELOAD=./libmymalloc.so ./bench_alloc
[BLOCK 0x1000 64 USED] [BLOCK 0x1040 128 FREE]
Summary: allocs=3 frees=2
$ LD_PRELOAD=./libmymalloc.so ./bench_alloc --double-free
error: double free detected at 0x1000
exit code: 3
3.7.4 Exit Codes
0success2invalid argument (allocator API misuse)3detected heap corruption (debug mode)
4. Solution Architecture
4.1 High-Level Design
malloc/free API
|
v
Free-list manager <-> OS memory (sbrk/mmap)
|
v
Debug + Stats reporter
4.2 Key Components
| Component | Responsibility | Key Decisions |
|———–|—————-|—————|
| Block header | size + flags | size in header, optional footer |
| Free list | manage free blocks | first-fit to start |
| OS interface | sbrk/mmap | threshold for large allocations |
| Debug | heap dump + checks | enabled by env vars |
4.3 Data Structures (No Full Code)
struct block {
size_t size;
int free;
struct block *next;
struct block *prev;
};
4.4 Algorithm Overview
Key Algorithm: Allocate
- Align size and search free list.
- Split block if large.
- If none found, request more memory from OS.
- Return pointer to payload.
Complexity Analysis:
- Time: O(n) for linear free list search
- Space: O(n) for number of blocks
5. Implementation Guide
5.1 Development Environment Setup
sudo apt install build-essential
5.2 Project Structure
allocator/
|-- src/
| |-- malloc.c
| |-- free.c
| |-- realloc.c
| `-- heap_debug.c
|-- include/
| `-- allocator.h
`-- Makefile
5.3 The Core Question You’re Answering
“How do I manage memory safely and efficiently without relying on libc?”
5.4 Concepts You Must Understand First
- VMAs and the heap.
- Block metadata and free list invariants.
- Alignment and fragmentation trade-offs.
5.5 Questions to Guide Your Design
- How will you find a free block quickly?
- How will you coalesce neighbors safely?
- What is your large-allocation threshold?
5.6 Thinking Exercise
Simulate: allocate 64, allocate 128, free 64, allocate 32. What happens with first-fit vs best-fit?
5.7 The Interview Questions They’ll Ask
- What is internal vs external fragmentation?
- Why do allocators use
mmapfor large blocks? - How does
reallocwork internally?
5.8 Hints in Layers
- Hint 1: start with a single global free list.
- Hint 2: add splitting and coalescing.
- Hint 3: add
mmapfor large allocations.
5.9 Books That Will Help
| Topic | Book | Chapter | |——|——|———| | Allocators | C Interfaces and Implementations | Ch. 5-6 | | VM | CS:APP | Ch. 9 | | Memory syscalls | TLPI | Memory chapters |
5.10 Implementation Phases
Phase 1: Minimal allocator (3-4 days)
- Use a single free list and
sbrk. - Checkpoint:
malloc/freeworks for small tests.
Phase 2: Coalescing + split (1 week)
- Implement split and coalesce logic.
- Checkpoint: fragmentation stats improve.
Phase 3: mmap + debug (1 week)
- Add
mmappath and debug checks. - Checkpoint: large allocs show as separate VMAs.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|———-|———|—————-|———–|
| Free list search | first-fit vs best-fit | first-fit | simpler, acceptable for v1 |
| Large allocs | mmap threshold | 128 KB | reduces heap fragmentation |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|———|———|———-|
| Unit | header math | size/alignment tests |
| Integration | real programs | ls, grep with LD_PRELOAD |
| Edge | overflow | calloc overflow test |
6.2 Critical Test Cases
- Allocate/free small blocks repeatedly (no leaks).
- Double-free should be detected in debug mode.
- Large allocation should use
mmapandmunmap.
6.3 Test Data
requests: 8, 16, 24, 1024, 256000
expected: aligned sizes, mmap for 256000
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution | |———|———|———-| | Incorrect alignment | crashes with SIMD | round up to 16 bytes | | Metadata overwrite | random crashes | guard headers in debug mode | | No coalescing | memory blow-up | merge neighbors on free |
7.2 Debugging Strategies
- Heap dumps: print block list after each alloc/free.
- Canaries: add sentinel bytes to detect overwrites.
7.3 Performance Traps
- Linear scans for every alloc can be slow; consider segregated lists later.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add
malloc_stats()output. - Add colorized heap dump.
8.2 Intermediate Extensions
- Segregated free lists by size class.
- Thread-safe allocator with per-thread caches.
8.3 Advanced Extensions
- Implement a slab allocator for fixed-size objects.
- Add profiling hooks for hot allocation sites.
9. Real-World Connections
9.1 Industry Applications
- Memory allocators in databases and game engines
- Performance profiling and leak detection
9.2 Related Open Source Projects
- jemalloc, tcmalloc, mimalloc
9.3 Interview Relevance
- Explaining fragmentation and allocator internals
- Debugging memory corruption scenarios
10. Resources
10.1 Essential Reading
- Hanson, “C Interfaces and Implementations,” Ch. 5-6
- CS:APP Ch. 9 (VM)
10.2 Tools & Documentation
brk(2),mmap(2),munmap(2)man pages
10.3 Related Projects in This Series
11. Self-Assessment Checklist
11.1 Understanding
- I can describe heap vs mmap regions.
- I can explain fragmentation and coalescing.
11.2 Implementation
malloc/free/calloc/reallocall work.- Debug mode detects corruption.
11.3 Growth
- I can benchmark and explain allocator trade-offs.
12. Submission / Completion Criteria
Minimum Viable Completion:
- Basic allocator works with small test program.
- Free list and split implemented.
Full Completion:
- Coalescing and
mmappath implemented. - Works with
LD_PRELOADon common tools.
Excellence (Going Above & Beyond):
- Thread-safe + segregated lists.
- Detailed benchmark report.
13. Determinism Notes
- Use
MALLOC_DEBUG_SEED=42for stable heap dumps.