P01: Custom Memory Allocator
P01: Custom Memory Allocator
Build a production-ready memory allocator that can replace malloc in real programs
| Attribute | Value |
|---|---|
| Main Language | C |
| Alternative Languages | C++, Rust, Zig |
| Difficulty | Level 3: Advanced |
| Coolness Level | Level 4: Hardcore Tech Flex |
| Knowledge Area | Memory Management, Systems Programming |
| Key Tools | glibc, jemalloc, valgrind, LD_PRELOAD |
| Main Book | โC Interfaces and Implementationsโ - David Hanson |
Learning Objectives
By completing this project, you will:
- Understand memory allocation internals โ Know exactly what happens when you call
malloc()andfree() - Master free list data structures โ Implement and optimize the core data structure behind allocators
- Handle fragmentation โ Understand internal vs external fragmentation and implement coalescing
- Design thread-safe systems โ Make your allocator work correctly in multi-threaded programs
- Profile and benchmark โ Measure your allocator against glibc and jemalloc
- Deploy real software โ Use
LD_PRELOADto run actual programs with your allocator
The Core Question
โWhy is malloc slow, and what does it actually take to make a fast, correct memory allocator?โ
This question separates developers who use memory from those who understand memory. Most programmers treat malloc as magic. After this project, youโll answer:
- Why does
mallocsometimes take 10ns and sometimes 10ยตs? - Why does allocating 16 bytes waste 8 more bytes on metadata?
- Why doesnโt
freealways return memory to the OS? - Why do jemalloc and tcmalloc exist when glibc already has malloc?
Memory allocators sit at the intersection of three hard problems:
- Speed โ O(1) allocation without searching
- Memory efficiency โ Minimizing fragmentation and overhead
- Thread safety โ Scaling to many cores without lock contention
You cannot optimize all three. This project teaches you the tradeoffs by making you choose.
Deep Theoretical Foundation
1. Virtual Memory and Address Spaces
Before writing an allocator, you must understand where memory comes from.
Virtual vs Physical Addresses
When your program accesses address 0x7f8a2c001000, thatโs a virtual address. The CPUโs Memory Management Unit (MMU) translates it to a physical RAM address. This translation happens via page tables maintained by the OS kernel.
Virtual Address Space (per process):
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ 0xFFFFFFFFFFFFFFFF
โ Kernel Space โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค
โ Stack โ โ grows downward
โ โ โ
โ โ
โ (unused) โ
โ โ
โ โ โ
โ Heap โ โ grows upward (where malloc lives)
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค
โ BSS (uninitialized) โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค
โ Data (initialized) โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค
โ Text (code) โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ 0x0000000000000000
Getting Memory from the OS
Your allocator needs to request memory from the kernel. Two primary mechanisms:
sbrk()/brk()โ Extends the heap segment- Contiguous growth (easy to manage)
- Cannot release memory in the middle (only from the top)
- Deprecated on some systems
mmap()โ Maps anonymous pages anywhere in the address space- Can release individual regions via
munmap() - Higher per-call overhead
- Modern allocators prefer this for large allocations
- Can release individual regions via
// sbrk example: extend heap by 4KB
void* ptr = sbrk(4096); // Returns old break address
// mmap example: get 4KB of anonymous memory
void* ptr = mmap(NULL, 4096,
PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS,
-1, 0);
Page Granularity
The OS allocates memory in pages (typically 4KB). When you request 100 bytes:
- The allocator asks the OS for at least one page (4096 bytes)
- Your 100 bytes come from that page
- The remaining 3996 bytes stay in the allocatorโs pool for future requests
2. Metadata: The Hidden Cost of malloc
Every allocation needs bookkeeping. When you call malloc(100), the allocator must remember:
- How big is this block? (Needed for
reallocand coalescing) - Is it currently in use? (Needed to know if itโs available)
Where to Store Metadata?
Option 1: Header before the block (most common)
Memory Layout:
[Header: size=108, used=1][User Data: 100 bytes]
^
Pointer returned to user
When free(ptr) is called, the allocator backs up from ptr to find the header:
void free(void* ptr) {
block_header_t* header = (block_header_t*)ptr - 1;
// Now we know the size and can mark it free
}
Option 2: Boundary tags (header + footer)
[Header: size=108][User Data: 100 bytes][Footer: size=108]
The footer enables finding the previous block during coalescing.
Option 3: Separate metadata table
User blocks: [Block 0][Block 1][Block 2]...
Metadata: [Entry 0][Entry 1][Entry 2]...
Isolates user data from metadata (prevents corruption) but requires lookup.
Minimum Block Size
Even if a user requests 1 byte, your block needs space for:
- Header (8-16 bytes typically)
- User data (rounded up for alignment)
- When free: a โnextโ pointer to link in free list (8 bytes on 64-bit)
Typical minimum: 16-32 bytes per allocation.
3. Free Lists: The Core Data Structure
A free list is a linked list of available memory blocks. The key insight: use the free blockโs memory itself to store the link.
Free List Structure:
โโโโโโโโโโโโโโโโโโโ
head โโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ size: 64 โ
โ next: โโโโโโโโโโโโโโ
โ [free memory] โ โ
โโโโโโโโโโโโโโโโโโโ โ
โ
โโโโโโโโโโโโโโโโโโโ
โ size: 128 โ
โ next: โโโโโโโโโโโโโโ
โ [free memory] โ โ
โโโโโโโโโโโโโโโโโโโ โ
โ
โโโโโโโโโโโโโโโโโโโ
โ size: 256 โ
โ next: NULL โ
โ [free memory] โ
โโโโโโโโโโโโโโโโโโโ
Allocation Strategies
When searching for a block to satisfy malloc(100):
| Strategy | Description | Pros | Cons |
|---|---|---|---|
| First-fit | Return first block โฅ size | Fast O(1) average | Fragments beginning of list |
| Best-fit | Return smallest block โฅ size | Minimizes waste | O(n) search, creates small unusable fragments |
| Worst-fit | Return largest block | Avoids tiny fragments | O(n) search, poor utilization |
| Next-fit | First-fit starting from last position | Spreads allocations | Still O(n) worst case |
Splitting Blocks
If you find a 256-byte block for a 64-byte request:
Before: [Header: 256][โโโโโโโโ 256 bytes โโโโโโโโ]
After: [Header: 64][Used: 64 bytes][Header: 192][Free: 192 bytes]
^ ^
Returned to user Added to free list
Only split if the remainder is large enough for a useful block (header + minimum data).
4. Fragmentation: The Silent Killer
Internal Fragmentation
Wasted space inside allocated blocks:
User requests 17 bytes
Allocator gives 32 bytes (rounded for alignment)
Wasted: 15 bytes (47% overhead!)
External Fragmentation
Free memory exists but is scattered:
After many alloc/free cycles:
[Used:50][Free:50][Used:50][Free:50][Used:50][Free:50]
Total free: 150 bytes
Can allocate 100 bytes? NO! Largest contiguous free: 50 bytes
Coalescing: The Solution
When freeing a block, merge it with adjacent free blocks:
Before free(B):
[A: Free][B: Used][C: Free]
After free(B) with coalescing:
[โโโโโ Single Free Block (A+B+C) โโโโโ]
To find adjacent blocks:
- Next block: Current address + current size
- Previous block: Need boundary tags (footer) or address-ordered list
5. Size Classes: Why jemalloc is Fast
Instead of one free list, maintain separate lists for common sizes:
Size Class 0 (16 bytes): [Free] -> [Free] -> [Free] -> NULL
Size Class 1 (32 bytes): [Free] -> [Free] -> NULL
Size Class 2 (64 bytes): [Free] -> [Free] -> [Free] -> [Free] -> NULL
Size Class 3 (128 bytes): [Free] -> NULL
...
Benefits:
- O(1) allocation for small sizes (no searching)
- Reduced fragmentation (similar sizes grouped)
- Better cache locality
Common size class schemes:
- Powers of 2: 16, 32, 64, 128, 256โฆ
- jemalloc-style: 8, 16, 32, 48, 64, 80, 96, 112, 128โฆ (more granular)
Large allocations (>4KB typically) bypass size classes and use mmap directly.
6. Thread Safety
The Problem
Without synchronization, two threads calling malloc simultaneously corrupt the free list:
Thread 1: malloc(64) Thread 2: malloc(64)
โ โ
Read head = A Read head = A (same value!)
โ โ
head = A->next head = A->next
โ โ
Return A Return A (SAME BLOCK!)
Solution 1: Global Lock (Simple but Slow)
pthread_mutex_t malloc_lock = PTHREAD_MUTEX_INITIALIZER;
void* malloc(size_t size) {
pthread_mutex_lock(&malloc_lock);
// ... allocation logic ...
pthread_mutex_unlock(&malloc_lock);
return ptr;
}
All threads serialize through one lock โ kills scalability.
Solution 2: Per-Thread Caches (What jemalloc Does)
Each thread has its own cache of blocks:
Thread 1: Local cache [Block][Block][Block]
Thread 2: Local cache [Block][Block]
Thread 3: Local cache [Block][Block][Block][Block]
Global arena (locked): [Blocks for refilling caches]
Allocations from local cache need no locking. Only refilling from global arena requires synchronization.
False Sharing
Even with per-thread data, placing thread-local variables on the same cache line causes performance collapse:
Cache line (64 bytes):
[Thread1 counter][Thread2 counter][Thread3 counter]...
When Thread1 increments its counter, Thread2's cache line is invalidated!
Solution: Pad data structures to cache-line boundaries:
struct thread_cache {
// ... actual data ...
char padding[64]; // Ensure cache-line isolation
};
7. Alignment Requirements
Modern CPUs require (or strongly prefer) aligned memory access:
| Type | Alignment Requirement |
|---|---|
char |
1 byte |
short |
2 bytes |
int |
4 bytes |
long, pointer |
8 bytes |
| SSE vectors | 16 bytes |
| AVX vectors | 32 bytes |
The C standard requires malloc to return memory suitable for any type. On most systems: 16-byte alignment.
// Rounding up to alignment
size_t aligned_size = (size + 15) & ~15; // Round up to 16
// Checking alignment
assert((uintptr_t)ptr % 16 == 0);
Project Specification
What Youโll Build
A general-purpose memory allocator implementing:
void* malloc(size_t size)โ Allocate memoryvoid free(void* ptr)โ Release memoryvoid* calloc(size_t nmemb, size_t size)โ Allocate and zerovoid* realloc(void* ptr, size_t size)โ Resize allocation
Your allocator must:
- Be loadable via
LD_PRELOADto replace system malloc - Handle multi-threaded programs correctly
- Have competitive performance (within 2x of glibc for common workloads)
- Pass memory leak detection (valgrind)
Deliverables
- Source code:
myalloc.c,myalloc.h - Shared library:
libmyalloc.so(built with-shared -fPIC) - Benchmark suite: Comparing against glibc and jemalloc
- Test suite: Unit tests and stress tests
- Documentation: Design decisions and tradeoffs made
Success Criteria
# Basic functionality
$ gcc -shared -fPIC -o libmyalloc.so myalloc.c -lpthread
$ LD_PRELOAD=./libmyalloc.so ls
# Should complete without crashes
# Runs Python
$ LD_PRELOAD=./libmyalloc.so python3 -c "print('Hello!')"
Hello!
# Passes valgrind
$ valgrind --leak-check=full ./test_allocator
All heap blocks were freed -- no leaks possible
# Benchmark output
$ ./bench_allocator
Your allocator: 1,234,567 allocs/sec
glibc malloc: 2,345,678 allocs/sec
Ratio: 0.53x # Within acceptable range
Solution Architecture
Component Overview
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Public API โ
โ malloc() / free() / calloc() / realloc() โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Thread-Local Cache โ
โ โโโโโโโโโโโโ โโโโโโโโโโโโ โโโโโโโโโโโโ โ
โ โ Thread 1 โ โ Thread 2 โ โ Thread 3 โ ... (one per thread) โ
โ โ Cache โ โ Cache โ โ Cache โ โ
โ โโโโโโฌโโโโโโ โโโโโโฌโโโโโโ โโโโโโฌโโโโโโ โ
โโโโโโโโโผโโโโโโโโโโโโโผโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ โ โ
โโโโโโโโโโโโโโผโโโโโโโโโโโโโ
โ (when cache empty/full)
โโโโโโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Central Arena (Locked) โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ Size Class Free Lists โ โ
โ โ Class 0: [16B] โ [16B] โ [16B] โ โ
โ โ Class 1: [32B] โ [32B] โ โ
โ โ Class 2: [64B] โ [64B] โ [64B] โ [64B] โ โ
โ โ ... โ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ (when arena needs more memory)
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ OS Memory (mmap/sbrk) โ
โ [Page][Page][Page][Page][Page][Page][Page][Page]... โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Data Structures
Block Header
typedef struct block_header {
size_t size; // Size including header (low bit = in-use flag)
struct block_header* next; // Next in free list (only when free)
} block_header_t;
// Size encoding trick: use low bit for in-use flag
#define GET_SIZE(header) ((header)->size & ~1)
#define IS_USED(header) ((header)->size & 1)
#define SET_USED(header) ((header)->size |= 1)
#define SET_FREE(header) ((header)->size &= ~1)
Size Class Array
#define NUM_SIZE_CLASSES 8
#define SIZE_CLASS_THRESHOLD 2048
// Size classes: 16, 32, 64, 128, 256, 512, 1024, 2048
static block_header_t* size_classes[NUM_SIZE_CLASSES];
static pthread_mutex_t class_locks[NUM_SIZE_CLASSES];
Thread-Local Cache
#define CACHE_SIZE 32
typedef struct {
void* blocks[CACHE_SIZE];
int count;
} thread_cache_t;
static __thread thread_cache_t thread_caches[NUM_SIZE_CLASSES];
Key Algorithms
Allocation Flow
malloc(size):
1. Round size up to next size class
2. Check thread-local cache for that class
- If hit: return cached block (no lock!)
3. Lock size class
4. Search size class free list
- If found: remove from list, unlock, return
5. Request blocks from OS (mmap)
6. Split into size-class blocks
7. Return one, add rest to cache/free list
Deallocation Flow
free(ptr):
1. Get header (ptr - sizeof(header))
2. Determine size class from header
3. If thread-local cache not full:
- Add to cache (no lock!)
4. Else:
- Lock size class
- Add to free list
- Optionally coalesce with neighbors
- Unlock
Phased Implementation Guide
Phase 1: Minimal Allocator (Days 1-2)
Goal: A working malloc/free with a single global free list.
Steps:
- Define block header structure
- Implement
get_memory_from_os()using mmap - Implement first-fit
malloc():- Search free list
- If no fit, get more from OS
- Split block if too large
- Implement
free():- Find header, mark as free
- Add to free list
Test: Can allocate and free without crashing.
Phase 2: Coalescing (Days 2-3)
Goal: Prevent fragmentation by merging adjacent free blocks.
Steps:
- Add boundary tags (footer with size)
- On
free(), check if next block is free โ merge - On
free(), check if previous block is free โ merge - Keep free list address-sorted (simplifies coalescing)
Test: After many alloc/free cycles, large allocations still succeed.
Phase 3: Size Classes (Days 3-4)
Goal: O(1) allocation for common sizes.
Steps:
- Define size classes (e.g., powers of 2 up to 2048)
- Create separate free list for each class
- Round allocation requests up to appropriate class
- Large allocations (>2048) go directly to mmap
Test: Benchmark shows significant speedup for small allocations.
Phase 4: Thread Safety (Days 4-5)
Goal: Correct operation in multi-threaded programs.
Steps:
- Add mutex per size class (not one global lock)
- Lock only when accessing shared free list
- Test with multiple threads allocating concurrently
Test: No crashes or corruption with 8+ threads.
Phase 5: Thread-Local Caches (Days 5-7)
Goal: Avoid locking in the common case.
Steps:
- Add
__threadcache per size class - Allocate from cache first (no lock)
- Free to cache first (no lock)
- When cache full/empty, batch-transfer to/from global list
- Pad structures to avoid false sharing
Test: Near-linear scaling with thread count.
Phase 6: Benchmarking and Optimization (Days 7-10)
Goal: Competitive performance with glibc.
Steps:
- Build comprehensive benchmark suite
- Profile with
perfto find hotspots - Optimize critical paths
- Compare against glibc and jemalloc
Testing Strategy
Unit Tests
// Test basic allocation
void test_basic_malloc() {
void* p = malloc(100);
assert(p != NULL);
memset(p, 0x42, 100); // Should not crash
free(p);
}
// Test alignment
void test_alignment() {
for (int i = 0; i < 1000; i++) {
void* p = malloc(rand() % 1024 + 1);
assert((uintptr_t)p % 16 == 0);
free(p);
}
}
// Test realloc
void test_realloc() {
char* p = malloc(10);
strcpy(p, "hello");
p = realloc(p, 100);
assert(strcmp(p, "hello") == 0);
free(p);
}
Stress Tests
// Fragmentation stress test
void test_fragmentation() {
void* ptrs[1000];
for (int i = 0; i < 1000; i++) {
ptrs[i] = malloc(64);
}
// Free every other block
for (int i = 0; i < 1000; i += 2) {
free(ptrs[i]);
}
// This should still work (coalescing)
void* big = malloc(32000);
assert(big != NULL);
free(big);
}
// Thread stress test
void* thread_func(void* arg) {
for (int i = 0; i < 10000; i++) {
void* p = malloc(rand() % 1024);
free(p);
}
return NULL;
}
void test_multithreaded() {
pthread_t threads[8];
for (int i = 0; i < 8; i++) {
pthread_create(&threads[i], NULL, thread_func, NULL);
}
for (int i = 0; i < 8; i++) {
pthread_join(threads[i], NULL);
}
}
Real-World Testing
# Test with real programs
LD_PRELOAD=./libmyalloc.so ls -la /usr
LD_PRELOAD=./libmyalloc.so find /usr -name "*.so"
LD_PRELOAD=./libmyalloc.so python3 -c "import numpy; print(numpy.zeros(1000))"
# Memory leak detection
valgrind --leak-check=full LD_PRELOAD=./libmyalloc.so ./test_suite
Common Pitfalls and Debugging
Pitfall 1: Corrupted Metadata
Symptom: Crashes on free(), or allocating same block twice.
Cause: User code writes past allocation boundary, corrupting header.
Debug:
// Add canary values
struct block_header {
uint32_t canary_start; // 0xDEADBEEF
size_t size;
struct block_header* next;
uint32_t canary_end; // 0xCAFEBABE
};
void check_canaries(block_header_t* h) {
assert(h->canary_start == 0xDEADBEEF);
assert(h->canary_end == 0xCAFEBABE);
}
Pitfall 2: Alignment Violations
Symptom: Bus error (SIGBUS) on some architectures, or slow performance.
Cause: Returning unaligned pointer.
Fix:
// Always round header size to alignment
#define HEADER_SIZE ((sizeof(block_header_t) + 15) & ~15)
void* malloc(size_t size) {
size_t total = HEADER_SIZE + ((size + 15) & ~15);
// ...
}
Pitfall 3: Double Free
Symptom: Corruption, same block appearing in free list twice.
Debug:
void free(void* ptr) {
block_header_t* h = get_header(ptr);
if (!IS_USED(h)) {
fprintf(stderr, "DOUBLE FREE at %p!\n", ptr);
abort();
}
SET_FREE(h);
// ...
}
Pitfall 4: Race Conditions
Symptom: Crashes or corruption only under heavy threading.
Debug:
- Use ThreadSanitizer:
gcc -fsanitize=thread - Add assertions that check invariants
- Log all lock/unlock operations
Pitfall 5: Memory Leaks in Allocator
Symptom: Memory usage grows unboundedly.
Debug:
- Track total bytes from OS vs bytes in free lists vs bytes allocated
- Use valgrind on your test programs
- Add stats printing:
malloc_stats()function
Extensions and Challenges
Extension 1: Arena/Pool Allocator
Build a specialized allocator for fixed-size objects:
pool_t* pool = pool_create(sizeof(MyStruct), 1000);
MyStruct* s = pool_alloc(pool); // O(1), no fragmentation
pool_free(pool, s); // O(1)
pool_destroy(pool); // Frees everything
Extension 2: Memory-Mapped File Allocator
Persist allocations to a file:
heap_t* h = mmap_heap_open("data.heap");
void* p = mmap_malloc(h, 100);
// Survives program restart!
Extension 3: Debug Allocator
Add rich debugging features:
- Track allocation call stacks
- Detect use-after-free (poison freed memory)
- Detect buffer overflows (guard pages)
- Generate allocation heatmaps
Extension 4: NUMA-Aware Allocator
On multi-socket systems, allocate from the NUMA node closest to the requesting CPU:
void* numa_malloc(size_t size) {
int node = get_current_numa_node();
return allocate_from_node(node, size);
}
Real-World Connections
Where Memory Allocators Matter
-
High-Frequency Trading: Latency-sensitive systems pre-allocate all memory. Understanding allocators reveals why.
-
Game Engines: Frame-rate consistency requires predictable allocation. Many games use custom pools.
-
Databases: B-trees and page buffers use arena allocators for bulk operations.
-
Web Browsers: Tab isolation often uses separate heaps per tab (security + cleanup).
-
Embedded Systems: No OS = must implement your own allocator.
Industry Allocators to Study
| Allocator | Key Innovation |
|---|---|
| jemalloc | Thread-local arenas, size classes, extensive metadata |
| tcmalloc | Per-thread caches, page-level management |
| mimalloc | Free list sharding, eager page-reset |
| Hoard | Provably scalable, superblock-based |
Resources
Essential Reading
- โC Interfaces and Implementationsโ by David Hanson โ Chapter 5 (Arena) and Chapter 6 (Mem)
- โComputer Systems: A Programmerโs Perspectiveโ by Bryant & OโHallaron โ Chapter 9: Virtual Memory
- โThe Linux Programming Interfaceโ by Michael Kerrisk โ Chapter 7: Memory Allocation
Papers
- โA Scalable Concurrent malloc Implementationโ โ Jason Evans (jemalloc)
- โHoard: A Scalable Memory Allocatorโ โ Emery Berger
- โReconsidering Custom Memory Allocationโ โ Berger et al.
Online Resources
- CS 341 Malloc Tutorial โ Step-by-step implementation guide
- Embedded Artistry: Implementing Malloc
- jemalloc Wiki
Self-Assessment Checklist
Before considering this project complete, verify:
Understanding
- I can explain why
mallocneeds metadata and where itโs stored - I can describe internal vs external fragmentation
- I can explain why coalescing is necessary
- I can describe three different free list strategies and their tradeoffs
- I can explain why per-thread caches improve performance
- I can explain what false sharing is and how to avoid it
Implementation
- My allocator compiles to a shared library
LD_PRELOAD=./libmyalloc.so lsworks correctly- My allocator handles multi-threaded programs
- Valgrind reports no leaks in my test programs
- My benchmarks show reasonable performance (within 3x of glibc)
Verification
- Iโve tested with programs of varying sizes (ls, Python, large apps)
- Iโve stress-tested with thousands of alloc/free cycles
- Iโve tested with 8+ concurrent threads
- Iโve profiled and can explain where time is spent
When you complete this project, youโll have built something that can literally replace a core system library. Thatโs not a toyโitโs systems programming at its finest.