Project 5: ARM Memory Allocator (malloc from Scratch)

Build a custom heap memory allocator for bare-metal ARM systems, implementing malloc(), free(), and realloc() with proper alignment, coalescing, and fragmentation handling.


Quick Reference

Attribute Value
Language C (alt: Rust, ARM Assembly)
Difficulty Advanced
Time 1-2 weeks
Coolness ★★★☆☆ Genuinely Clever
Portfolio Value Resume Gold
Prerequisites Project 3 (bare-metal environment), strong C pointer skills
Key Topics Memory management, alignment, free lists, coalescing

Learning Objectives

By completing this project, you will:

  1. Understand heap memory management: Implement the fundamental algorithms behind dynamic memory allocation
  2. Master ARM alignment requirements: Handle 4-byte (ARM32) and 8-byte (AArch64) alignment constraints correctly
  3. Build free list data structures: Design and implement linked lists within raw memory regions
  4. Implement block coalescing: Merge adjacent free blocks to combat memory fragmentation
  5. Debug memory corruption: Detect and diagnose common heap bugs using canary values and validation
  6. Work without standard library: Build malloc without sbrk(), using fixed memory pools

The Core Question You’re Answering

“How does malloc actually work, and why is memory alignment critical on ARM processors?”

This question forces you to understand that malloc is not magic—it’s a data structure (free list) and a set of algorithms (first-fit, best-fit, coalescing) operating on raw memory. On ARM, alignment violations can cause bus errors or severe performance penalties, making correct implementation essential for bare-metal systems.


Concepts You Must Understand First

Before starting this project, verify your understanding:

Concept Why It Matters Where to Learn
Pointer arithmetic in C You’ll be navigating raw memory K&R Chapter 5, CS:APP 3.8
Linked list implementation Free lists are linked lists in memory Any data structures course
Memory layout (stack, heap, data, text) You need to know where your heap lives CS:APP Chapter 9
Bitwise operations Alignment calculations use masks CS:APP Chapter 2
ARM memory model Understand alignment faults and penalties ARM Architecture Reference Manual
Bare-metal memory maps Know where your RAM starts and ends Your MCU’s datasheet

Self-Assessment Questions

Before starting, answer these without looking:

  1. Pointer arithmetic: If char *p = 0x1000, what is (void *)(p + 16)?
  2. Alignment: How do you round up an address to the nearest 8-byte boundary?
  3. Memory layout: In your STM32, where does RAM start and how large is it?
  4. Bit masking: What is 0x1003 & ~7? What about (0x1003 + 7) & ~7?
  5. Linked lists: How do you insert a node into a sorted linked list?

Theoretical Foundation

How malloc Works: The Big Picture

When you call malloc(32), you expect to get a pointer to 32 usable bytes. But where do these bytes come from? In a typical OS, malloc manages a region called the “heap” that grows via sbrk() or mmap(). On bare metal, we have a fixed pool of memory.

Traditional malloc (with OS):           Bare-metal malloc (this project):
┌─────────────────────────┐            ┌─────────────────────────┐
│ Program calls malloc(n) │            │ Program calls malloc(n) │
└───────────┬─────────────┘            └───────────┬─────────────┘
            │                                      │
            ▼                                      ▼
┌─────────────────────────┐            ┌─────────────────────────┐
│ Search free list for    │            │ Search free list for    │
│ block >= n bytes        │            │ block >= n bytes        │
└───────────┬─────────────┘            └───────────┬─────────────┘
            │                                      │
       ┌────┴────┐                           ┌────┴────┐
       │ Found?  │                           │ Found?  │
       └────┬────┘                           └────┬────┘
      Yes   │   No                          Yes   │   No
       │    │    │                           │    │    │
       ▼    │    ▼                           ▼    │    ▼
   Return   │  Call sbrk()               Return   │  Return NULL
   block    │  to grow heap              block    │  (out of memory)
            │    │                               │
            └────┘                               │
                                                 │

Memory Block Structure

Every allocated block needs metadata. We store this in a “header” before the user data:

Block structure:
┌──────────────────────────────────────────────────────────────┐
│                                                              │
│  ┌─────────────────┐  ┌─────────────────────────────────┐   │
│  │  Block Header   │  │         User Data               │   │
│  │  (metadata)     │  │   (what malloc returns)         │   │
│  └─────────────────┘  └─────────────────────────────────┘   │
│  ↑                    ↑                                     │
│  Block start          Pointer returned to user              │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Header contents:
┌─────────────────────────────────────────────────────────────┐
│ size_t size      │ Total block size including header        │
│ block_t *next    │ Pointer to next free block (if free)     │
│ uint32_t magic   │ 0xDEADBEEF=allocated, 0xFEEDFACE=free    │
└─────────────────────────────────────────────────────────────┘

Free List Organization

A free list tracks available memory blocks. When memory is freed, the block is added to this list. When memory is allocated, we search this list:

Free list visualization:

heap_start
    │
    ▼
┌───────────┐     ┌───────────┐     ┌───────────┐
│ HDR │     │     │ HDR │     │     │ HDR │     │
│ 128 │     │────▶│ 64  │     │────▶│ 256 │     │──▶ NULL
│ FACE│     │     │ FACE│     │     │ FACE│     │
└───────────┘     └───────────┘     └───────────┘
  Free: 128        Free: 64         Free: 256

When malloc(50) is called:
1. Search list: 128 >= 50+header? Yes!
2. Split block if much larger
3. Remove from free list
4. Return pointer after header

ARM Alignment Requirements

ARM processors have strict alignment requirements:

Data Type ARM32 Alignment AArch64 Alignment
char 1 byte 1 byte
short 2 bytes 2 bytes
int 4 bytes 4 bytes
long 4 bytes 8 bytes
pointer 4 bytes 8 bytes
double 8 bytes 8 bytes

Why alignment matters:

  • Performance: Unaligned access requires multiple memory operations
  • Correctness: Some ARM processors fault on unaligned access
  • Atomicity: Unaligned access may not be atomic
Alignment calculation:
┌─────────────────────────────────────────────────────────────┐
│ // Round up to 8-byte boundary                              │
│ #define ALIGN8(x) (((x) + 7) & ~7)                         │
│                                                             │
│ Example: ALIGN8(13)                                         │
│   13 + 7 = 20 = 0b00010100                                 │
│   ~7 = 0b...11111000                                        │
│   20 & ~7 = 0b00010000 = 16                                │
│                                                             │
│ So malloc(13) actually reserves at least 16 bytes of       │
│ user data, ensuring the next block starts aligned.         │
└─────────────────────────────────────────────────────────────┘

Allocation Strategies

Different strategies for finding a suitable free block:

First Fit: Take the first block that’s large enough

Pros: Fast (O(1) in best case)
Cons: Tends to leave small fragments at the beginning

Best Fit: Find the smallest block that’s large enough

Pros: Minimizes wasted space in chosen block
Cons: Slow (O(n)), leaves many tiny fragments

Segregated Lists: Multiple lists for different size classes

Pros: Fast allocation, good for common sizes
Cons: More complex, may waste memory between classes

Block Coalescing

When a block is freed, we check if neighbors are also free and merge them:

Before coalescing:
┌─────────┬─────────┬─────────┐
│ A (free)│ B (free)│ C (free)│
│  64     │   32    │   64    │
└─────────┴─────────┴─────────┘
  Three small free blocks: 64, 32, 64

After coalescing:
┌─────────────────────────────┐
│        A+B+C (free)         │
│           160               │
└─────────────────────────────┘
  One large free block: 160 (can satisfy larger requests)

Immediate coalescing: Merge on every free() call Deferred coalescing: Merge only when allocation fails

Why This Matters

Understanding malloc is essential for:

  • Embedded systems: Often need custom allocators for determinism
  • Real-time systems: Standard malloc may have unbounded runtime
  • Game development: Pool allocators for specific object types
  • Security: Heap exploits target malloc metadata
  • Performance: Custom allocators can be 10x faster for specific patterns

Project Specification

What You Will Build

A complete memory allocator for bare-metal ARM with:

  1. malloc(size_t size): Allocate size bytes from the heap
  2. free(void *ptr): Return memory to the free list
  3. realloc(void *ptr, size_t size): Resize an allocation
  4. calloc(size_t num, size_t size): Allocate and zero memory

Functional Requirements

  1. Basic Allocation:
    • Satisfy requests from 1 byte to pool size
    • Return NULL when out of memory
    • Maintain proper alignment (8-byte for this project)
  2. Deallocation:
    • Return blocks to free list
    • Coalesce adjacent free blocks
    • Detect double-free errors
  3. Reallocation:
    • Grow or shrink existing allocations
    • Copy data when moving to new location
    • Handle realloc(NULL, n) as malloc(n)
  4. Debugging Features:
    • Magic numbers to detect corruption
    • Heap statistics (total, used, free, fragments)
    • Optional integrity checking

Non-Functional Requirements

  • Deterministic: Same allocation pattern gives same results
  • No external dependencies: Works without libc
  • Thread-unsafe: Single-threaded only (simplifies implementation)
  • Minimum block size: 16 bytes (to fit header + meaningful data)

Real World Outcome

// Test program output:
=== ARM Malloc Test Suite ===

Test 1: Basic allocation
  malloc(32) = 0x20001008 
  malloc(64) = 0x20001030 
  malloc(16) = 0x20001078 

Test 2: Free and reuse
  free(0x20001030)
  malloc(32) = 0x20001030   (reused!)

Test 3: Coalescing
  free(0x20001008)
  free(0x20001078)
  malloc(100) = 0x20001008   (coalesced block)

Test 4: Alignment
  malloc(1) = 0x20001074 (aligned to 8) 
  malloc(7) = 0x20001080 (aligned to 8) 

Test 5: Large allocation
  malloc(2048) = 0x20001088 

Test 6: Out of memory
  malloc(1000000) = NULL   (expected failure)

Test 7: Realloc
  p = malloc(32) = 0x20001088
  realloc(p, 64) = 0x20001088   (expanded in place)
  realloc(p, 16) = 0x20001088   (shrunk in place)

Test 8: Calloc
  calloc(10, 4) = 0x200010C8 
  First 40 bytes are zero 

Heap stats:
  Total pool: 4096 bytes
  Allocated: 148 bytes (3.6%)
  Free: 3948 bytes (96.4%)
  Fragments: 2
  Largest free block: 3916 bytes

All tests passed!

Solution Architecture

High-Level Design

┌─────────────────────────────────────────────────────────────────────┐
│                        Memory Allocator                             │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  ┌────────────────┐   ┌────────────────┐   ┌────────────────┐     │
│  │  Public API    │   │  Free List     │   │  Block         │     │
│  │  malloc/free   │──▶│  Management    │──▶│  Operations    │     │
│  │  realloc/calloc│   │                │   │                │     │
│  └────────────────┘   └────────────────┘   └────────────────┘     │
│          │                    │                    │               │
│          │                    │                    │               │
│          ▼                    ▼                    ▼               │
│  ┌─────────────────────────────────────────────────────────────┐  │
│  │                    Heap Memory Pool                          │  │
│  │  [HDR|DATA    ][HDR|DATA][HDR|       FREE          ]        │  │
│  │   ↑ allocated   ↑ alloc   ↑ free                            │  │
│  └─────────────────────────────────────────────────────────────┘  │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

Key Components

Component Responsibility Key Functions
Public API User-facing allocation interface malloc, free, realloc, calloc
Free List Track available memory blocks find_free_block, add_to_free_list, remove_from_free_list
Block Operations Split, coalesce, validate blocks split_block, coalesce, validate_block
Statistics Track heap health metrics get_heap_stats, print_heap_state

Data Structures

// Block header - stored before every allocation
typedef struct block_header {
    size_t size;                    // Total block size including header
    struct block_header *next;      // Next free block (only used when free)
    uint32_t magic;                 // Magic number for corruption detection
    // On 64-bit: add padding to maintain alignment
} block_header_t;

#define MAGIC_ALLOCATED 0xDEADBEEF
#define MAGIC_FREE      0xFEEDFACE
#define MIN_BLOCK_SIZE  (sizeof(block_header_t) + 8)  // Header + minimum data

// Heap state
typedef struct {
    uint8_t *pool_start;            // Start of memory pool
    uint8_t *pool_end;              // End of memory pool
    block_header_t *free_list;      // Head of free list
    size_t total_size;              // Total pool size
    size_t allocated_size;          // Currently allocated
    size_t allocation_count;        // Number of active allocations
} heap_state_t;

// Statistics structure
typedef struct {
    size_t total_bytes;
    size_t allocated_bytes;
    size_t free_bytes;
    size_t fragment_count;
    size_t largest_free_block;
    size_t allocation_count;
} heap_stats_t;

Memory Layout

Memory Pool Layout (4KB example at 0x20001000):

0x20001000 ┌─────────────────────────────────────────┐
           │ Block Header (12 bytes ARM32)           │
           │   size: 56                              │
           │   next: NULL                            │
           │   magic: 0xDEADBEEF (allocated)         │
0x2000100C ├─────────────────────────────────────────┤
           │ User Data (44 bytes)                    │
           │   (pointer returned by malloc)          │
0x20001038 ├─────────────────────────────────────────┤
           │ Block Header (12 bytes)                 │
           │   size: 32                              │
           │   next: 0x20001078                      │
           │   magic: 0xFEEDFACE (free)             │
0x20001044 ├─────────────────────────────────────────┤
           │ Free Space (20 bytes)                   │
0x20001058 ├─────────────────────────────────────────┤
           │ Block Header (12 bytes)                 │
           │   size: 24                              │
           │   next: NULL                            │
           │   magic: 0xDEADBEEF (allocated)         │
0x20001064 ├─────────────────────────────────────────┤
           │ User Data (12 bytes)                    │
0x20001070 ├─────────────────────────────────────────┤
           │ Remaining Pool (free)                   │
           │   ...                                   │
0x20002000 └─────────────────────────────────────────┘

Algorithm: malloc

malloc(size):
    1. aligned_size = ALIGN8(size + sizeof(header))
    2. Search free list for block >= aligned_size
    3. If found:
       a. Remove block from free list
       b. If block is much larger, split it
       c. Mark as allocated (magic = ALLOCATED)
       d. Return pointer after header
    4. If not found:
       a. Return NULL (out of memory)

Algorithm: free

free(ptr):
    1. header = ptr - sizeof(header)
    2. Validate magic == ALLOCATED (detect double-free)
    3. Mark as free (magic = FREE)
    4. Coalesce with adjacent free blocks:
       a. Check if previous block is free → merge backward
       b. Check if next block is free → merge forward
    5. Add to free list (sorted by address for easy coalescing)

Algorithm: Coalescing

coalesce(block):
    // Try to merge with next block
    next = (header *)((char *)block + block->size)
    if next is within pool AND next->magic == FREE:
        block->size += next->size
        remove next from free list

    // Try to merge with previous block (if using boundary tags)
    // This requires footer in each block, or walking the list

Implementation Guide

Development Environment Setup

# Required tools
sudo apt-get install gcc-arm-none-eabi gdb-multiarch qemu-system-arm

# Verify ARM compiler
arm-none-eabi-gcc --version

# Create project structure
mkdir -p arm-malloc/{src,include,tests}
cd arm-malloc

Project Structure

arm-malloc/
├── src/
│   ├── malloc.c           # Main allocator implementation
│   ├── startup.s          # ARM startup code (from Project 3)
│   └── main.c             # Test program
├── include/
│   ├── malloc.h           # Public API
│   └── malloc_internal.h  # Internal structures
├── linker.ld              # Linker script with heap definition
├── Makefile
└── tests/
    ├── test_basic.c
    ├── test_stress.c
    └── test_corruption.c

Implementation Phases

Phase 1: Foundation (Days 1-3)

Goals:

  • Set up project structure
  • Define data structures
  • Implement heap initialization

Tasks:

  1. Create block header structure with magic numbers
  2. Define the memory pool (use linker script or static array)
  3. Implement heap_init() to set up initial free block
  4. Implement basic get_heap_stats() for debugging

Code skeleton:

// malloc.c
#include "malloc.h"

#define HEAP_SIZE 4096
static uint8_t heap_pool[HEAP_SIZE] __attribute__((aligned(8)));
static heap_state_t heap;

void heap_init(void) {
    heap.pool_start = heap_pool;
    heap.pool_end = heap_pool + HEAP_SIZE;
    heap.total_size = HEAP_SIZE;
    heap.allocated_size = 0;
    heap.allocation_count = 0;

    // Create one large free block covering entire pool
    block_header_t *initial = (block_header_t *)heap_pool;
    initial->size = HEAP_SIZE;
    initial->next = NULL;
    initial->magic = MAGIC_FREE;

    heap.free_list = initial;
}

Checkpoint: heap_init() creates a valid initial state, get_heap_stats() reports correct total size.

Phase 2: Basic malloc (Days 4-6)

Goals:

  • Implement first-fit allocation
  • Handle alignment correctly
  • Implement block splitting

Tasks:

  1. Implement find_free_block() - search free list for suitable block
  2. Implement split_block() - divide large blocks
  3. Implement malloc() - tie it all together
  4. Test with various sizes

Key code:

void *malloc(size_t size) {
    if (size == 0) return NULL;

    // Calculate total size needed (header + aligned user data)
    size_t total_size = ALIGN8(sizeof(block_header_t) + size);
    if (total_size < MIN_BLOCK_SIZE) {
        total_size = MIN_BLOCK_SIZE;
    }

    // Find a free block
    block_header_t *block = find_free_block(total_size);
    if (!block) return NULL;

    // Remove from free list
    remove_from_free_list(block);

    // Split if block is much larger than needed
    if (block->size >= total_size + MIN_BLOCK_SIZE) {
        split_block(block, total_size);
    }

    // Mark as allocated
    block->magic = MAGIC_ALLOCATED;
    heap.allocated_size += block->size;
    heap.allocation_count++;

    // Return pointer to user data (after header)
    return (void *)((uint8_t *)block + sizeof(block_header_t));
}

Checkpoint: malloc() returns valid, aligned pointers. Multiple allocations don’t overlap.

Phase 3: free and coalescing (Days 7-9)

Goals:

  • Implement free with validation
  • Implement forward and backward coalescing
  • Detect double-free errors

Tasks:

  1. Implement free() with magic number validation
  2. Implement coalesce_forward() - merge with next block
  3. Implement coalesce_backward() - merge with previous block (requires walking list)
  4. Add to free list in sorted order (by address)

Key code:

void free(void *ptr) {
    if (!ptr) return;

    block_header_t *block = (block_header_t *)((uint8_t *)ptr - sizeof(block_header_t));

    // Validate
    if (block->magic != MAGIC_ALLOCATED) {
        if (block->magic == MAGIC_FREE) {
            // Double free detected!
            debug_print("ERROR: Double free detected!\n");
        } else {
            // Corruption detected!
            debug_print("ERROR: Heap corruption at %p\n", block);
        }
        return;
    }

    // Mark as free
    block->magic = MAGIC_FREE;
    heap.allocated_size -= block->size;
    heap.allocation_count--;

    // Coalesce with neighbors
    block = coalesce(block);

    // Add to free list (sorted by address)
    add_to_free_list_sorted(block);
}

Checkpoint: Free blocks are reused. Coalescing combines adjacent blocks.

Phase 4: realloc and calloc (Days 10-11)

Goals:

  • Implement realloc with in-place growth when possible
  • Implement calloc (allocate + zero)
  • Handle edge cases

Tasks:

  1. Implement realloc() - try to grow in place, else allocate+copy+free
  2. Implement calloc() - malloc + memset
  3. Handle edge cases (NULL ptr, size 0)

Key code:

void *realloc(void *ptr, size_t size) {
    if (!ptr) return malloc(size);
    if (size == 0) { free(ptr); return NULL; }

    block_header_t *block = GET_HEADER(ptr);
    size_t current_size = block->size - sizeof(block_header_t);

    if (size <= current_size) {
        // Shrinking - could split off excess, or just keep it
        return ptr;
    }

    // Need to grow - check if next block is free and big enough
    block_header_t *next = GET_NEXT_BLOCK(block);
    if (next && next->magic == MAGIC_FREE) {
        size_t combined = block->size + next->size;
        if (combined >= ALIGN8(sizeof(block_header_t) + size)) {
            // Can grow in place
            remove_from_free_list(next);
            block->size = combined;
            return ptr;
        }
    }

    // Must relocate
    void *new_ptr = malloc(size);
    if (new_ptr) {
        memcpy(new_ptr, ptr, current_size);
        free(ptr);
    }
    return new_ptr;
}

Checkpoint: realloc grows and shrinks allocations correctly. calloc returns zeroed memory.

Phase 5: Testing and Polish (Days 12-14)

Goals:

  • Comprehensive testing
  • Performance measurement
  • Documentation

Tasks:

  1. Write stress tests (many allocations/frees)
  2. Write corruption detection tests
  3. Measure fragmentation under various patterns
  4. Add debug visualization of heap state
  5. Document API and implementation

Checkpoint: All tests pass. Heap stats accurately track state.


Hints in Layers

Hint 1: Getting Started - The Alignment Macro

The most important macro for this project:

#define ALIGN8(x) (((x) + 7) & ~7)

This rounds up to the nearest multiple of 8. Test it:

  • ALIGN8(1) = 8
  • ALIGN8(8) = 8
  • ALIGN8(9) = 16
  • ALIGN8(15) = 16

For ARM32, you could use ALIGN4, but ALIGN8 is safer for doubles and 64-bit values.

Hint 2: Block Header Navigation

To get from user pointer to header and back:

// User pointer to header
#define GET_HEADER(ptr) \
    ((block_header_t *)((uint8_t *)(ptr) - sizeof(block_header_t)))

// Header to user pointer
#define GET_USER_PTR(block) \
    ((void *)((uint8_t *)(block) + sizeof(block_header_t)))

// Get next block in memory (not free list)
#define GET_NEXT_BLOCK(block) \
    ((block_header_t *)((uint8_t *)(block) + (block)->size))

// Check if address is within heap
#define IS_IN_HEAP(ptr) \
    ((uint8_t *)(ptr) >= heap.pool_start && (uint8_t *)(ptr) < heap.pool_end)
Hint 3: Free List Management

Keep the free list sorted by address for easy coalescing:

void add_to_free_list_sorted(block_header_t *block) {
    block_header_t **current = &heap.free_list;

    while (*current && *current < block) {
        current = &((*current)->next);
    }

    block->next = *current;
    *current = block;
}

void remove_from_free_list(block_header_t *block) {
    block_header_t **current = &heap.free_list;

    while (*current && *current != block) {
        current = &((*current)->next);
    }

    if (*current) {
        *current = block->next;
    }
}
Hint 4: Coalescing Strategy

With a sorted free list, coalescing becomes easier:

block_header_t *coalesce(block_header_t *block) {
    // Forward coalescing - check next block
    block_header_t *next = GET_NEXT_BLOCK(block);
    if (IS_IN_HEAP(next) && next->magic == MAGIC_FREE) {
        remove_from_free_list(next);
        block->size += next->size;
        // 'next' is now invalid - absorbed into 'block'
    }

    // Backward coalescing - find previous block
    // This requires either boundary tags or walking the heap
    block_header_t *prev = find_previous_block(block);
    if (prev && prev->magic == MAGIC_FREE) {
        remove_from_free_list(prev);
        prev->size += block->size;
        return prev;  // Return the merged block
    }

    return block;
}

For backward coalescing without boundary tags, you’d need to walk from heap start:

block_header_t *find_previous_block(block_header_t *block) {
    block_header_t *prev = NULL;
    block_header_t *current = (block_header_t *)heap.pool_start;

    while (current < block) {
        prev = current;
        current = GET_NEXT_BLOCK(current);
    }

    return prev;
}
Hint 5: Block Splitting

When you find a free block much larger than needed, split it:

void split_block(block_header_t *block, size_t needed_size) {
    size_t remaining = block->size - needed_size;

    if (remaining < MIN_BLOCK_SIZE) {
        return;  // Not enough left for a new block
    }

    // Create new free block from the remainder
    block_header_t *new_block = (block_header_t *)((uint8_t *)block + needed_size);
    new_block->size = remaining;
    new_block->magic = MAGIC_FREE;
    new_block->next = NULL;

    // Shrink original block
    block->size = needed_size;

    // Add new block to free list
    add_to_free_list_sorted(new_block);
}
Hint 6: Debug Visualization

A heap dump function is invaluable for debugging:

void dump_heap(void) {
    debug_print("=== Heap Dump ===\n");
    debug_print("Pool: %p - %p (%u bytes)\n",
                heap.pool_start, heap.pool_end, heap.total_size);

    block_header_t *block = (block_header_t *)heap.pool_start;
    int count = 0;

    while (IS_IN_HEAP(block) && count < 100) {
        const char *status;
        if (block->magic == MAGIC_ALLOCATED) status = "USED";
        else if (block->magic == MAGIC_FREE) status = "FREE";
        else status = "CORRUPT";

        debug_print("  [%d] %p: size=%u, status=%s\n",
                    count, block, block->size, status);

        block = GET_NEXT_BLOCK(block);
        count++;
    }

    debug_print("Free list: ");
    block_header_t *free = heap.free_list;
    while (free) {
        debug_print("%p(%u) -> ", free, free->size);
        free = free->next;
    }
    debug_print("NULL\n");
}

Testing Strategy

Test Categories

Category Purpose Examples
Basic Verify core functionality Single alloc/free, alignment
Stress Test under heavy load 1000 allocations, random sizes
Coalescing Verify block merging Free adjacent blocks, check merge
Fragmentation Measure memory efficiency Alternating alloc/free patterns
Corruption Test error detection Double free, buffer overflow

Critical Test Cases

// Test 1: Basic allocation and alignment
void test_basic(void) {
    void *p1 = malloc(1);   assert(p1 && IS_ALIGNED(p1, 8));
    void *p2 = malloc(7);   assert(p2 && IS_ALIGNED(p2, 8));
    void *p3 = malloc(8);   assert(p3 && IS_ALIGNED(p3, 8));
    void *p4 = malloc(100); assert(p4 && IS_ALIGNED(p4, 8));

    // Verify no overlap
    assert(p2 >= p1 + 8 || p1 >= p2 + 8);

    free(p1); free(p2); free(p3); free(p4);
}

// Test 2: Coalescing
void test_coalescing(void) {
    void *p1 = malloc(32);
    void *p2 = malloc(32);
    void *p3 = malloc(32);

    free(p2);  // Free middle block
    free(p1);  // Should coalesce with p2
    free(p3);  // Should coalesce all three

    // Now should be able to allocate all memory as one block
    void *big = malloc(80);  // Should succeed with coalesced block
    assert(big != NULL);
    free(big);
}

// Test 3: Stress test
void test_stress(void) {
    void *ptrs[100];

    // Allocate all
    for (int i = 0; i < 100; i++) {
        ptrs[i] = malloc(16 + (i % 64));
        assert(ptrs[i] != NULL);
    }

    // Free odd indices
    for (int i = 1; i < 100; i += 2) {
        free(ptrs[i]);
        ptrs[i] = NULL;
    }

    // Allocate again
    for (int i = 1; i < 100; i += 2) {
        ptrs[i] = malloc(32);
        assert(ptrs[i] != NULL);
    }

    // Free all
    for (int i = 0; i < 100; i++) {
        free(ptrs[i]);
    }

    // Verify all memory returned
    heap_stats_t stats;
    get_heap_stats(&stats);
    assert(stats.allocated_bytes == 0);
}

// Test 4: Realloc
void test_realloc(void) {
    void *p = malloc(32);
    memset(p, 0xAA, 32);

    // Grow
    p = realloc(p, 64);
    assert(p != NULL);
    assert(((uint8_t *)p)[0] == 0xAA);  // Data preserved

    // Shrink
    p = realloc(p, 16);
    assert(p != NULL);
    assert(((uint8_t *)p)[0] == 0xAA);

    free(p);
}

Common Pitfalls & Debugging

Frequent Mistakes

Pitfall Symptom Solution
Off-by-one in alignment Corruption after N allocations Test ALIGN8 macro thoroughly
Forgetting to update free list Memory leaks, double allocation Always pair add/remove operations
Wrong pointer arithmetic Wild pointers, crashes Cast to uint8_t * before arithmetic
Not checking boundaries Read/write outside heap Validate all pointers with IS_IN_HEAP
Split creates too-small block Header-only blocks Check remaining >= MIN_BLOCK_SIZE
Coalesce loop infinite Hang during free Verify next block calculation

Debugging Strategies

  1. Add verbose logging: Print every malloc/free with addresses and sizes
  2. Validate on every operation: Check magic numbers, free list integrity
  3. Use a heap visualizer: Print ASCII diagram of heap state
  4. Test incrementally: Get malloc working before free, free before realloc
  5. Use QEMU with GDB: Set watchpoints on corrupted addresses

Quick Validation

void validate_heap(void) {
    size_t counted_free = 0;
    size_t counted_alloc = 0;

    block_header_t *block = (block_header_t *)heap.pool_start;

    while (IS_IN_HEAP(block)) {
        assert(block->magic == MAGIC_FREE || block->magic == MAGIC_ALLOCATED);

        if (block->magic == MAGIC_FREE) {
            counted_free += block->size;
        } else {
            counted_alloc += block->size;
        }

        block = GET_NEXT_BLOCK(block);
    }

    assert(counted_alloc == heap.allocated_size);
    assert(counted_free + counted_alloc == heap.total_size);
}

Extensions & Challenges

Beginner Extensions

  • Heap statistics CLI: Print stats via UART command
  • Memory fill patterns: Fill allocated memory with 0xCD, freed with 0xDD (like MSVC debug)
  • Allocation tracking: Log allocation call sites (requires stack walking or macro trickery)

Intermediate Extensions

  • Boundary tags: Add footers to blocks for O(1) backward coalescing
  • Best-fit strategy: Compare fragmentation vs first-fit
  • Size-class segregation: Multiple free lists for different size ranges
  • Memory pools: Create specialized allocators for fixed-size objects

Advanced Extensions

  • Thread safety: Add mutex/spinlock protection
  • Defragmentation: Compact heap by moving allocations (requires handles)
  • Guard pages: Use MPU to detect buffer overflows (requires Project 15)
  • TLSF algorithm: Implement Two-Level Segregated Fit for O(1) allocation

The Interview Questions They’ll Ask

After completing this project, you’ll be ready for:

  1. “How does malloc work internally?”
    • Discuss: free lists, block headers, splitting, coalescing
    • Mention: first-fit vs best-fit tradeoffs
  2. “What is memory fragmentation and how do you combat it?”
    • Internal vs external fragmentation
    • Coalescing, compaction, segregated lists
  3. “Why does alignment matter?”
    • Performance penalties for unaligned access
    • Atomicity requirements
    • ARM-specific alignment faults
  4. “How would you detect a double-free bug?”
    • Magic numbers in headers
    • Poisoning freed memory
    • Guard patterns
  5. “What’s the difference between stack and heap allocation?”
    • Lifetime, thread-safety, fragmentation, performance
    • When to use each
  6. “Design a memory allocator for [game engine / embedded system / real-time system]”
    • Pool allocators for uniform objects
    • Arena allocators for bulk free
    • Deterministic timing requirements

Books That Will Help

Topic Book Chapter
Dynamic memory allocation algorithms Computer Systems: A Programmer’s Perspective Ch. 9.9
Free list implementations The Art of Computer Programming, Vol 1 Ch. 2.5
ARM alignment rules ARM Architecture Reference Manual Memory Model
Pool allocators for games Game Programming Patterns Ch. 19 (Object Pool)
Memory allocator design Understanding the Linux Kernel Ch. 8 (Memory Management)
Embedded memory management Making Embedded Systems Ch. 7
Writing robust C Expert C Programming Ch. 9

Self-Assessment Checklist

Before considering this project complete, verify:

Understanding

  • I can explain why alignment matters on ARM processors
  • I can draw the memory layout of an allocated block with header
  • I can trace through a malloc/free sequence on paper
  • I understand when and why coalescing improves memory usage
  • I can explain the tradeoffs between first-fit and best-fit

Implementation

  • malloc returns properly aligned pointers
  • free returns memory to the free list correctly
  • Adjacent free blocks are coalesced
  • Double-free is detected and reported
  • realloc handles grow, shrink, and NULL cases
  • All heap memory can be reclaimed after freeing all allocations

Testing

  • Basic allocation/free tests pass
  • Stress tests with many allocations pass
  • Coalescing tests verify blocks are merged
  • Edge cases (size=0, NULL ptr) are handled

Growth

  • I debugged at least one corruption bug using magic numbers
  • I can extend this allocator with new features
  • I understand how production allocators differ (jemalloc, tcmalloc)

Learning Milestones

Track your progress through these understanding checkpoints:

  1. malloc/free work for single allocations → Basic structure is correct
  2. Memory is reused after free → Free list works
  3. Coalescing prevents fragmentation → Block merging works
  4. Stress test passes → Ready for real use

This guide was expanded from LEARN_ARM_DEEP_DIVE.md. For the complete learning path, see the project index.