Project 3: Memory Leak Detector

The Core Question: "Who owns this memory, and when does it become invalid?"

Project Overview

Attribute      Value
Difficulty     Intermediate
Time Estimate  1-2 weeks
Language       C
Prerequisites  Comfort with structs, linked lists, macros
Main Book      "Understanding and Using C Pointers" by Richard Reese

Learning Objectives

By completing this project, you will:

  1. Understand object lifetime - When memory is "alive" vs "dead"
  2. Master ownership semantics - Who is responsible for calling free()
  3. Detect common memory bugs - Leaks, double-free, use-after-free
  4. Build debugging infrastructure - Track allocations with file/line info
  5. Think like a tool author - Understand how Valgrind/ASan work

Theoretical Foundation

The Three Pointer States

For this project, treat every initialized pointer in C as being in exactly one of three states:

┌────────────────────────────────────────────────────────────────┐
│ POINTER STATES                                                 │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│  1. NULL                                                       │
│     int *p = NULL;                                             │
│     - Explicitly points to nothing                             │
│     - Safe to check: if (p == NULL)                            │
│     - Dereferencing crashes (usually)                          │
│                                                                │
│  2. VALID                                                      │
│     int *p = malloc(sizeof(int));                              │
│     - Points to allocated, live memory                         │
│     - Safe to dereference: *p = 42;                            │
│     - Owner must eventually free                               │
│                                                                │
│  3. DANGLING (INVALID)                                         │
│     int *p = malloc(sizeof(int));                              │
│     free(p);                                                   │
│     // p is now DANGLING - still contains old address          │
│     // but memory is no longer valid                           │
│     *p = 42;  // UNDEFINED BEHAVIOR!                           │
│                                                                │
└────────────────────────────────────────────────────────────────┘

The danger of dangling pointers: They look exactly like valid pointers. There's no runtime check. The old address is still there; the memory at that address is simply no longer yours.
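
One common defensive habit, sketched here as an illustration rather than as part of the project's API, is to reset a pointer to NULL the moment it is freed, so it moves from DANGLING back to the checkable NULL state. `FREE_AND_NULL` and `pointer_state_demo` are hypothetical names. Note that this only protects the one variable passed in; other copies of the same address remain dangling.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not part of the project API): free the block and
 * reset the pointer variable so it ends up NULL instead of dangling. */
#define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)

/* Walks one pointer through VALID -> NULL; returns 1 on success. */
int pointer_state_demo(void) {
    int *p = malloc(sizeof *p);   /* state: VALID if malloc succeeded */
    if (p == NULL) {
        return 0;
    }
    *p = 42;                      /* safe: the memory is live */
    assert(*p == 42);

    FREE_AND_NULL(p);             /* state: NULL, not DANGLING */
    return p == NULL;
}
```

After the macro runs, an accidental `*p` is a NULL dereference, which usually crashes immediately instead of silently reading stale memory.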

Object Lifetime

Every piece of dynamically allocated memory has a lifetime:

Timeline:
───────────────────────────────────────────────────────────────────>
        │                              │
        │  malloc()                    │  free()
        │  ↓                           │  ↓
        │  ╔══════════════════════════╗│
        │  ║    VALID LIFETIME        ║│
        │  ║    Safe to use pointer   ║│
        │  ╚══════════════════════════╝│
        │                              │
   ─────┼──────────────────────────────┼───────────────────────────>
     BEFORE                          AFTER
   (unallocated)                   (freed - DANGLING!)

Bug 1: Memory Leak

void leak(void) {
    int *p = malloc(1000);
    // Function returns without free(p)
    // Those 1000 bytes are now UNREACHABLE but still ALLOCATED
}

// Call leak() 1000 times → about 1 MB of leaked memory

Symptoms: Memory usage grows over time. Long-running programs eventually crash with OOM.
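
A leak like this usually disappears once ownership is stated explicitly. The sketch below assumes a simple convention: the function that returns a heap pointer documents that the caller owns it, and the caller frees it on every path. `make_greeting` and `greeting_length` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical producer: the CALLER owns the returned block and must free it. */
char *make_greeting(const char *name) {
    size_t len = strlen("Hello, ") + strlen(name) + 1;  /* +1 for '\0' */
    char *out = malloc(len);
    if (out == NULL) {
        return NULL;
    }
    strcpy(out, "Hello, ");
    strcat(out, name);
    return out;
}

/* Hypothetical consumer: takes ownership, uses the block, then releases it. */
size_t greeting_length(const char *name) {
    char *g = make_greeting(name);
    if (g == NULL) {
        return 0;
    }
    size_t n = strlen(g);
    free(g);              /* ownership ends here; nothing is leaked */
    return n;
}
```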

Bug 2: Double-Free

int *p = malloc(sizeof(int));
free(p);
free(p);  // BUG! p was already freed

// Consequences:
// - Heap metadata corruption
// - Security vulnerability (exploitation possible)
// - Crash (if lucky)

Bug 3: Use-After-Free

int *p = malloc(sizeof(int));
*p = 42;
free(p);

// Later...
printf("%d\n", *p);  // BUG! Reading freed memory

// Consequences:
// - Reads garbage (old value might still be there!)
// - Reads new data (if memory was reused)
// - Security vulnerability (attacker controls reused memory)

Bug 4: Invalid Free

int x = 42;
free(&x);  // BUG! Can't free stack memory

char *str = "hello";
free(str);  // BUG! Can't free string literals

Why Bugs Appear Later Than Mistakes

This is the critical insight that makes memory bugs so hard:

int *p = malloc(sizeof(int));
*p = 42;
free(p);

// p is now dangling, but...
// The memory at *p might still contain 42!
// The heap hasn't reused it yet.

printf("%d\n", *p);  // Might print 42! "Works!"

// Much later, in unrelated code:
int *q = malloc(sizeof(int));
*q = 100;
// q might get the same address as p was!

printf("%d\n", *p);  // NOW prints 100 (or crashes)

The bug (use-after-free) was committed when we used *p after free(p). But the symptom only appeared when the memory was reused.


Project Specification

What You're Building

A wrapper around malloc/free that:

  1. Tracks all allocations - Registry of (address, size, file, line)
  2. Detects memory leaks - Report unfreed allocations at program exit
  3. Catches double-free - Error if freeing already-freed pointer
  4. Warns about use-after-free - Poison freed memory to make bugs visible
  5. Provides debugging info - Show WHERE the allocation happened

Core API

// Macros that capture file/line
#define malloc(size) debug_malloc(size, __FILE__, __LINE__)
#define free(ptr) debug_free(ptr, __FILE__, __LINE__)

// Underlying functions
void* debug_malloc(size_t size, const char* file, int line);
void debug_free(void* ptr, const char* file, int line);

// Reporting
void leak_check_report(void);    // Call at program exit
size_t get_active_allocations(void);  // Currently allocated count
size_t get_total_allocated_bytes(void);

Sample Output

$ ./my_program

=== Memory Leak Detector Report ===
[LEAK] 100 bytes allocated at test.c:7 (create_user) never freed
       Address: 0x600000004000
[LEAK] 100 bytes allocated at test.c:7 (create_user) never freed
       Address: 0x600000004100
[LEAK] 100 bytes allocated at test.c:7 (create_user) never freed
       Address: 0x600000004200

Total leaks: 3 allocations, 300 bytes
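
The sample report shows a function name (create_user) next to the file and line, while the Core API above only passes `__FILE__` and `__LINE__`. One possible way to get the name, assuming a C99 compiler, is to also pass `__func__`. `site_malloc`, `SITE_MALLOC`, `CallSite`, and `last_site` are hypothetical names for this sketch, not part of the project's API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record of where an allocation was requested. */
typedef struct {
    const char *file;
    const char *func;
    int line;
} CallSite;

static CallSite last_site;  /* most recent call site, kept for illustration */

/* Hypothetical variant of debug_malloc that also receives the function name. */
void *site_malloc(size_t size, const char *file, const char *func, int line) {
    last_site.file = file;
    last_site.func = func;
    last_site.line = line;
    return malloc(size);
}

/* __func__ (C99) is evaluated where the macro is used, so it names the caller. */
#define SITE_MALLOC(size) site_malloc((size), __FILE__, __func__, __LINE__)

/* Mirrors create_user from the sample output; returns the recorded name. */
const char *create_user(void) {
    char *name = SITE_MALLOC(100);
    free(name);
    return last_site.func;
}
```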

Solution Architecture

Module Design

leak_detector/
├── leak_detector.h     # Public API and macros
├── leak_detector.c     # Implementation
├── test_leak.c         # Memory leak test
├── test_double_free.c  # Double-free test
├── test_use_after_free.c # Use-after-free test
├── test_clean.c        # Clean program (no leaks)
└── Makefile

Data Structures

// Track one allocation
typedef struct Allocation {
    void *ptr;              // The allocated address
    size_t size;            // Size in bytes
    const char *file;       // Source file
    int line;               // Line number
    int freed;              // 0 = active, 1 = freed
    struct Allocation *next; // Linked list pointer
} Allocation;

// Global registry
static Allocation *registry_head = NULL;
static int initialized = 0;

Core Algorithm

debug_malloc(size, file, line):
    1. Call real malloc(size)
    2. Create Allocation record with (ptr, size, file, line, freed=0)
    3. Add to registry
    4. Return ptr

debug_free(ptr, file, line):
    1. Find ptr in registry
    2. If not found: WARNING - freeing untracked memory
    3. If found and freed == 1: ERROR - double free! Stop here
    4. Mark freed = 1
    5. Fill memory with a poison pattern (0xDE bytes)
    6. Call real free(ptr)

leak_check_report():
    1. Walk registry
    2. For each entry where freed == 0: report as leak
    3. Print summary statistics
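
The three routines above can be sketched compactly as follows. This is a minimal illustration under stated assumptions, not the reference implementation: the names `Rec`, `sketch_malloc`, `sketch_free`, and `sketch_report` are hypothetical, and `sketch_free` returns a status code instead of printing and aborting so that the behaviour is easy to check.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One registry entry; mirrors the Allocation struct from the text. */
typedef struct Rec {
    void *ptr;
    size_t size;
    const char *file;
    int line;
    int freed;
    struct Rec *next;
} Rec;

static Rec *recs = NULL;

/* debug_malloc steps 1-4: allocate, record, link, return. */
void *sketch_malloc(size_t size, const char *file, int line) {
    void *ptr = malloc(size);
    Rec *r = malloc(sizeof *r);
    if (ptr == NULL || r == NULL) {   /* either failure: undo and report */
        free(ptr);
        free(r);
        return NULL;
    }
    r->ptr = ptr;
    r->size = size;
    r->file = file;
    r->line = line;
    r->freed = 0;
    r->next = recs;                   /* newest record goes first */
    recs = r;
    return ptr;
}

/* debug_free steps 1-6. Returns 0 = freed, 1 = untracked, 2 = double free. */
int sketch_free(void *ptr) {
    if (ptr == NULL) return 0;
    for (Rec *r = recs; r != NULL; r = r->next) {
        if (r->ptr != ptr) continue;
        if (r->freed) return 2;       /* stop: do not poison or free again */
        memset(ptr, 0xDE, r->size);   /* poison before release */
        r->freed = 1;
        free(ptr);
        return 0;
    }
    return 1;                         /* unknown pointer: leave it alone */
}

/* leak_check_report steps 1-3: print each live record, return the count. */
size_t sketch_report(void) {
    size_t leaks = 0;
    for (const Rec *r = recs; r != NULL; r = r->next) {
        if (!r->freed) {
            fprintf(stderr, "[LEAK] %zu bytes allocated at %s:%d never freed\n",
                    r->size, r->file, r->line);
            leaks++;
        }
    }
    return leaks;
}
```

Because new records are pushed at the front, a lookup finds the most recent record for an address first, which matters when the allocator hands a freed address out again.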

Implementation Guide

Phase 1: Basic Tracking (3-4 hours)

Goal: Track allocations in a linked list.

// leak_detector.c

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

typedef struct Allocation {
    void *ptr;
    size_t size;
    const char *file;
    int line;
    int freed;
    struct Allocation *next;
} Allocation;

static Allocation *head = NULL;

// Store the real malloc before we redefine it
static void* (*real_malloc)(size_t) = NULL;
static void (*real_free)(void*) = NULL;

void init_leak_detector(void) {
    // Save references to real functions
    // (This is simplified - in practice you'd use dlsym)
}

void* debug_malloc(size_t size, const char* file, int line) {
    // leak_detector.h is not included in this file, so malloc here
    // is the real allocator, not the debug macro
    void *ptr = malloc(size);
    if (ptr == NULL) return NULL;

    // Create tracking record (kept separate from the user's block)
    Allocation *record = malloc(sizeof(Allocation));
    if (record == NULL) {
        // Cannot track this block, so fail the allocation cleanly
        free(ptr);
        return NULL;
    }
    record->ptr = ptr;
    record->size = size;
    record->file = file;
    record->line = line;
    record->freed = 0;
    record->next = head;
    head = record;

    return ptr;
}
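
The Core API also declares get_active_allocations() and get_total_allocated_bytes(), which the guide does not implement. A possible sketch walks the same linked list. It assumes "total allocated bytes" means bytes that are currently live (not yet freed); `count_active` and `sum_active_bytes` are hypothetical helpers introduced here so the logic can be exercised on any list.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Allocation {
    void *ptr;
    size_t size;
    const char *file;
    int line;
    int freed;
    struct Allocation *next;
} Allocation;

static Allocation *head = NULL;   /* same role as the registry head in the text */

/* Number of records in `list` that have not been freed. */
size_t count_active(const Allocation *list) {
    size_t n = 0;
    for (const Allocation *a = list; a != NULL; a = a->next) {
        if (!a->freed) n++;
    }
    return n;
}

/* Sum of the sizes of records in `list` that have not been freed. */
size_t sum_active_bytes(const Allocation *list) {
    size_t bytes = 0;
    for (const Allocation *a = list; a != NULL; a = a->next) {
        if (!a->freed) bytes += a->size;
    }
    return bytes;
}

/* Public API from the header, expressed in terms of the helpers. */
size_t get_active_allocations(void)    { return count_active(head); }
size_t get_total_allocated_bytes(void) { return sum_active_bytes(head); }
```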

Phase 2: Leak Detection (2-3 hours)

Goal: Report unfreed allocations at exit.

void leak_check_report(void) {
    int leak_count = 0;
    size_t leak_bytes = 0;

    printf("\n=== Memory Leak Detector Report ===\n");

    for (Allocation *a = head; a != NULL; a = a->next) {
        if (!a->freed) {
            printf("[LEAK] %zu bytes allocated at %s:%d never freed\n",
                   a->size, a->file, a->line);
            printf("       Address: %p\n", a->ptr);
            leak_count++;
            leak_bytes += a->size;
        }
    }

    if (leak_count == 0) {
        printf("No memory leaks detected!\n");
    } else {
        printf("\nTotal leaks: %d allocations, %zu bytes\n",
               leak_count, leak_bytes);
    }
}

// Automatically run at program exit
__attribute__((destructor))
void auto_leak_report(void) {
    leak_check_report();
}
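
`__attribute__((destructor))` is a GCC/Clang extension. Where portability matters, standard C's atexit() is one alternative. The sketch below registers a report function once; `report_stub` stands in for leak_check_report() so the block compiles by itself, and `ensure_report_registered` is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

static int report_registered = 0;

/* Stand-in for leak_check_report() so this sketch is self-contained. */
static void report_stub(void) {
    fputs("=== Memory Leak Detector Report ===\n", stderr);
}

/* Registers the report exactly once. Returns 0 on success, -1 on failure. */
int ensure_report_registered(void) {
    if (report_registered) {
        return 0;                 /* already registered: nothing to do */
    }
    if (atexit(report_stub) != 0) {
        return -1;                /* atexit() reports failure with nonzero */
    }
    report_registered = 1;
    return 0;
}
```

Calling `ensure_report_registered()` from the first debug_malloc would give an auto-init effect similar to the destructor attribute, though the report then runs only on normal exit, not after abort().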

Phase 3: Double-Free Detection (2-3 hours)

Goal: Catch attempts to free already-freed memory.

void debug_free(void* ptr, const char* file, int line) {
    if (ptr == NULL) return;  // free(NULL) is valid and does nothing

    // Find this pointer in our registry
    Allocation *found = NULL;
    for (Allocation *a = head; a != NULL; a = a->next) {
        if (a->ptr == ptr) {
            found = a;
            break;
        }
    }

    if (found == NULL) {
        printf("[WARNING] Freeing untracked pointer %p at %s:%d\n",
               ptr, file, line);
        printf("          This may be a bug or memory not allocated via debug_malloc\n");
        free(ptr);
        return;
    }

    if (found->freed) {
        printf("[ERROR] DOUBLE-FREE DETECTED!\n");
        printf("  Pointer: %p\n", ptr);
        printf("  Size: %zu bytes\n", found->size);
        printf("  Originally allocated at: %s:%d\n", found->file, found->line);
        printf("  First freed at: (tracking not implemented yet)\n");
        printf("  Second free attempted at: %s:%d\n", file, line);
        printf("\n*** Aborting to prevent heap corruption ***\n");
        abort();
    }

    found->freed = 1;
    free(ptr);
}

Phase 4: Use-After-Free Detection (2-3 hours)

Goal: Poison freed memory to make UAF bugs visible.

void debug_free(void* ptr, const char* file, int line) {
    // ... (previous checks) ...

    // Before freeing, fill with poison pattern
    // This makes use-after-free more likely to be noticed
    memset(ptr, 0xDE, found->size);

    found->freed = 1;
    free(ptr);
}

Why poisoning helps:

  • If code reads freed memory and gets 0xDEDEDEDE, it's obviously wrong
  • Much easier to debug than reading "old valid data"
  • Pattern 0xDE is visible in debuggers
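
One caveat worth knowing: an optimizing compiler is allowed to drop a memset whose result is never read before free(), so the poison may silently vanish in release builds. A hedged workaround is to write through a volatile pointer. `poison`, `is_poisoned`, `poisoned_word`, and `POISON_BYTE` are hypothetical names for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POISON_BYTE 0xDE

/* Fills the block with the poison byte. Writing through a volatile pointer
 * keeps the compiler from treating the stores as dead and removing them. */
void poison(void *ptr, size_t size) {
    volatile unsigned char *p = ptr;
    for (size_t i = 0; i < size; i++) {
        p[i] = POISON_BYTE;
    }
}

/* Returns 1 if every byte in the block still holds the poison value. */
int is_poisoned(const void *ptr, size_t size) {
    const unsigned char *p = ptr;
    for (size_t i = 0; i < size; i++) {
        if (p[i] != POISON_BYTE) return 0;
    }
    return 1;
}

/* Shows what a 32-bit read of poisoned memory looks like. */
uint32_t poisoned_word(void) {
    uint32_t w = 0;
    poison(&w, sizeof w);
    return w;
}
```

`is_poisoned` is meant for memory the detector still owns (for example, a block held back from the allocator), never for memory already returned with free().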

Phase 5: Header File and Macros (1-2 hours)

// leak_detector.h

#ifndef LEAK_DETECTOR_H
#define LEAK_DETECTOR_H

#include <stddef.h>
#include <stdlib.h>  // Must come before the malloc/free macros below

// Initialize (optional - auto-init on first malloc)
void init_leak_detector(void);

// Underlying functions
void* debug_malloc(size_t size, const char* file, int line);
void debug_free(void* ptr, const char* file, int line);

// Reporting
void leak_check_report(void);
size_t get_active_allocations(void);
size_t get_total_allocated_bytes(void);

// Macros to capture file/line automatically
#define malloc(size) debug_malloc(size, __FILE__, __LINE__)
#define free(ptr) debug_free(ptr, __FILE__, __LINE__)

#endif

Testing Strategy

Test 1: Memory Leak Detection

// test_leak.c
#include "leak_detector.h"

void create_user(void) {
    char *name = malloc(100);
    // Oops, forgot to free!
}

int main(void) {
    for (int i = 0; i < 3; i++) {
        create_user();
    }
    return 0;
}

Expected Output:

=== Memory Leak Detector Report ===
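[LEAK] 100 bytes allocated at test_leak.c:5 never freed
       Address: (varies per run)
[LEAK] 100 bytes allocated at test_leak.c:5 never freed
       Address: (varies per run)
[LEAK] 100 bytes allocated at test_leak.c:5 never freed
       Address: (varies per run)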

Total leaks: 3 allocations, 300 bytes

Test 2: Double-Free Detection

// test_double_free.c
#include "leak_detector.h"

int main(void) {
    int *p = malloc(sizeof(int));
    *p = 42;
    free(p);
    free(p);  // Should trigger error!
    return 0;
}

Test 3: Clean Program

// test_clean.c
#include <string.h>  // strcpy
#include "leak_detector.h"

int main(void) {
    int *p = malloc(sizeof(int));
    *p = 42;
    free(p);  // Properly freed!

    char *str = malloc(100);
    strcpy(str, "Hello");
    free(str);  // Properly freed!

    return 0;
}

Expected Output:

=== Memory Leak Detector Report ===
No memory leaks detected!

Common Pitfalls and Debugging Tips

Pitfall 1: Infinite Recursion

// WRONG - debug_malloc calls malloc which calls debug_malloc...
void* debug_malloc(size_t size, const char* file, int line) {
    Allocation *record = malloc(sizeof(Allocation));  // RECURSION!
    // ...
}

// Solution 1: Use a flag to detect recursion
static int in_debug_malloc = 0;

void* debug_malloc(size_t size, const char* file, int line) {
    if (in_debug_malloc) {
        return real_malloc(size);  // Bypass tracking
    }
    in_debug_malloc = 1;
    // ... do tracking ...
    in_debug_malloc = 0;
}

// Solution 2: Allocate tracking records from separate pool
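
Solution 2 can be sketched with a fixed-size static array, so tracking records never go through malloc at all. `POOL_CAPACITY`, `pool_take`, and `pool_remaining` are hypothetical names, and the capacity is an arbitrary assumption; a real detector would need a policy for when the pool runs out.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_CAPACITY 1024   /* assumed upper bound on tracked allocations */

typedef struct Allocation {
    void *ptr;
    size_t size;
    const char *file;
    int line;
    int freed;
    struct Allocation *next;
} Allocation;

static Allocation pool[POOL_CAPACITY];  /* static storage: no malloc needed */
static size_t pool_used = 0;

/* Hands out the next unused record, or NULL when the pool is exhausted. */
Allocation *pool_take(void) {
    if (pool_used == POOL_CAPACITY) {
        return NULL;
    }
    return &pool[pool_used++];
}

/* Number of records still available. */
size_t pool_remaining(void) {
    return POOL_CAPACITY - pool_used;
}
```

Since records in this scheme are never returned to the pool, it fits the guide's design where freed entries stay in the registry for double-free detection.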

Pitfall 2: Thread Safety

The simple linked list isn't thread-safe. For multi-threaded programs:

  • Use a mutex around registry operations
  • Or use thread-local storage (per-thread registries need extra care when one thread frees a block that another thread allocated)
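
The mutex option might look like the sketch below, which assumes POSIX threads are available (compile with -pthread). `Node`, `registry_push`, and `registry_length` are hypothetical names; the point is only that every read or write of the shared list happens between lock and unlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Simplified registry node for illustration. */
typedef struct Node {
    void *ptr;
    struct Node *next;
} Node;

static Node *list_head = NULL;
static size_t list_len = 0;
static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;

/* Links a caller-provided node into the shared list under the lock. */
void registry_push(Node *n, void *ptr) {
    n->ptr = ptr;                          /* private to this thread so far */
    pthread_mutex_lock(&registry_lock);
    n->next = list_head;                   /* shared state: lock is held */
    list_head = n;
    list_len++;
    pthread_mutex_unlock(&registry_lock);
}

/* Reads the shared length under the same lock. */
size_t registry_length(void) {
    pthread_mutex_lock(&registry_lock);
    size_t n = list_len;
    pthread_mutex_unlock(&registry_lock);
    return n;
}
```

Keeping the critical section this small matters because every malloc and free in the program would contend on the same lock.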

Pitfall 3: Missing Some Allocations

Some code might call malloc before your detector initializes:

  • Use constructor attribute: __attribute__((constructor))
  • Or intercept at link time with the GNU linker's --wrap option (for example, -Wl,--wrap=malloc)

Extensions and Challenges

Challenge 1: Track Free Location

typedef struct Allocation {
    // ... existing fields ...
    const char *free_file;  // Where it was freed
    int free_line;
} Allocation;
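
With those two fields in place, recording the free site could look like this sketch. `mark_freed` is a hypothetical helper; it keeps the first free location intact so a later double-free report can show both sites.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct Allocation {
    void *ptr;
    size_t size;
    const char *file;       /* where it was allocated */
    int line;
    int freed;
    const char *free_file;  /* where it was freed */
    int free_line;
    struct Allocation *next;
} Allocation;

/* Marks a record as freed and remembers where. Returns 0 on the first free,
 * -1 if the record was already freed (the stored location is left untouched). */
int mark_freed(Allocation *a, const char *file, int line) {
    if (a->freed) {
        return -1;
    }
    a->freed = 1;
    a->free_file = file;
    a->free_line = line;
    return 0;
}
```

In debug_free, a -1 result is the double-free case, and `a->free_file` / `a->free_line` can replace the "tracking not implemented yet" message.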

Challenge 2: Memory Usage Statistics

typedef struct {
    size_t total_allocated;
    size_t total_freed;
    size_t peak_usage;
    int allocation_count;
    int free_count;
} MemoryStats;

MemoryStats get_memory_stats(void);
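
One way to keep these counters current is to update them from debug_malloc and debug_free. In the sketch below, `stats_on_alloc` and `stats_on_free` are hypothetical hook names, and peak usage is taken to mean the largest number of live bytes seen so far.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t total_allocated;
    size_t total_freed;
    size_t peak_usage;
    int allocation_count;
    int free_count;
} MemoryStats;

static MemoryStats stats;   /* zero-initialized static storage */

/* Call after a successful allocation of `size` bytes. */
void stats_on_alloc(size_t size) {
    stats.total_allocated += size;
    stats.allocation_count++;
    size_t in_use = stats.total_allocated - stats.total_freed;
    if (in_use > stats.peak_usage) {
        stats.peak_usage = in_use;   /* new high-water mark */
    }
}

/* Call when a tracked block of `size` bytes is freed. */
void stats_on_free(size_t size) {
    stats.total_freed += size;
    stats.free_count++;
}

MemoryStats get_memory_stats(void) {
    return stats;   /* returned by value: callers get a snapshot */
}
```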

Challenge 3: Quarantine for UAF Detection

Instead of immediately freeing, keep freed memory in a "quarantine":

#define QUARANTINE_SIZE 100

static void* quarantine[QUARANTINE_SIZE];
static int quarantine_index = 0;

void debug_free(void* ptr, ...) {
    // Add to quarantine instead of immediate free
    if (quarantine[quarantine_index] != NULL) {
        real_free(quarantine[quarantine_index]);
    }
    quarantine[quarantine_index] = ptr;
    quarantine_index = (quarantine_index + 1) % QUARANTINE_SIZE;
}

This delays reuse, so a use-after-free keeps hitting poisoned memory instead of a live object, which makes the bug more likely to be noticed.
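
A self-contained version of the ring buffer above might look like the following sketch. It calls the real free() directly, and `quarantine_free`, `quarantine_released`, `quarantine_flush`, and `quarantine_demo` are hypothetical names added so the eviction behaviour can be observed.

```c
#include <assert.h>
#include <stdlib.h>

#define QUARANTINE_SIZE 100

static void *quarantine[QUARANTINE_SIZE];
static int quarantine_index = 0;
static size_t released = 0;   /* blocks actually handed back to free() */

/* Parks ptr in the ring; the entry it displaces (the oldest) is freed. */
void quarantine_free(void *ptr) {
    if (ptr == NULL) return;
    if (quarantine[quarantine_index] != NULL) {
        free(quarantine[quarantine_index]);
        released++;
    }
    quarantine[quarantine_index] = ptr;
    quarantine_index = (quarantine_index + 1) % QUARANTINE_SIZE;
}

size_t quarantine_released(void) {
    return released;
}

/* Frees everything still parked, e.g. right before the leak report. */
void quarantine_flush(void) {
    for (int i = 0; i < QUARANTINE_SIZE; i++) {
        free(quarantine[i]);      /* free(NULL) is a no-op */
        quarantine[i] = NULL;
    }
}

/* Quarantines n small blocks and returns how many were evicted. */
size_t quarantine_demo(int n) {
    for (int i = 0; i < n; i++) {
        quarantine_free(malloc(8));
    }
    return quarantine_released();
}
```

The trade-off is memory: up to QUARANTINE_SIZE freed blocks stay resident, so a byte budget rather than a slot count may suit programs with large allocations.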

Challenge 4: Stack Trace Capture

On Linux/macOS, capture the call stack at allocation time:

#include <execinfo.h>

typedef struct Allocation {
    // ... existing fields ...
    void* backtrace[10];
    int backtrace_size;
} Allocation;
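
Filling those fields can be sketched with backtrace() from <execinfo.h>, which is a glibc extension (also available on macOS) rather than standard C. `StackSnapshot`, `capture_stack`, and `print_stack` are hypothetical names; symbol names in the printed output depend on how the program was built and linked.

```c
#include <assert.h>
#include <execinfo.h>

#define MAX_FRAMES 10

/* Hypothetical holder for a captured call stack. */
typedef struct {
    void *frames[MAX_FRAMES];
    int frame_count;
} StackSnapshot;

/* Stores up to MAX_FRAMES return addresses of the current call stack
 * and returns how many were captured. */
int capture_stack(StackSnapshot *snap) {
    snap->frame_count = backtrace(snap->frames, MAX_FRAMES);
    return snap->frame_count;
}

/* Writes one line per frame to stderr (file descriptor 2). Unlike
 * backtrace_symbols(), this variant does not call malloc. */
void print_stack(const StackSnapshot *snap) {
    backtrace_symbols_fd(snap->frames, snap->frame_count, 2);
}
```

Avoiding malloc in the printing path matters here, since the detector itself sits underneath malloc.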

Real-World Connections

Connection 1: How Valgrind Works

Valgrind uses similar techniques:

  • Intercepts malloc/free calls
  • Maintains shadow memory to track validity
  • Reports errors with stack traces

Your detector is a simplified version of Valgrind's Memcheck.

Connection 2: AddressSanitizer

ASan (AddressSanitizer) uses compiler instrumentation:

  • Adds checks around every memory access
  • Much faster than Valgrind (roughly 2x slowdown vs 10-50x for Memcheck)
  • Catches some bugs Memcheck misses (stack and global buffer overflows)

Connection 3: Production Memory Allocators

Production allocators like jemalloc and tcmalloc:

  • Track statistics
  • Detect some corruption
  • Profile memory usage

Interview Questions You Can Now Answer

  1. "What is a memory leak? How do you detect them?"
    • Memory allocated but never freed
    • Detect by tracking allocations and checking at exit
  2. "What is use-after-free? Why is it dangerous?"
    • Accessing memory after it's freed
    • Dangerous because memory might be reused, leading to corruption or security bugs
  3. "What is double-free? What can go wrong?"
    • Calling free() twice on same pointer
    • Corrupts heap allocator metadata, can lead to exploitation
  4. "How does Valgrind detect memory errors?"
    • Intercepts malloc/free
    • Tracks every byte's validity state
    • Checks every memory access
  5. "What's the difference between a dangling pointer and a NULL pointer?"
    • NULL: explicitly points to nothing, checkable
    • Dangling: points to freed memory, looks valid but isn't
  6. "If you free memory, why can you sometimes still read from it?"
    • Memory isn't erased, just marked as available
    • Old data remains until overwritten

Resources

Books

  • "Understanding and Using C Pointers" by Richard Reese - Ch. 2
  • "Computer Systems: A Programmer's Perspective" - Ch. 9.9
  • "Effective C" by Robert Seacord - Ch. 6

Tools

  • Valgrind
  • AddressSanitizer
  • LeakSanitizer (LSan)

Self-Assessment Checklist

  • Explain why freeing memory doesn't zero it out
  • Describe why use-after-free sometimes "works" and sometimes crashes
  • Think about every malloc in terms of "who frees this and when"
  • Implement a basic leak detector with file/line tracking
  • Explain how poisoning freed memory helps debugging
  • Use Valgrind to find memory errors

Final Milestone: You think about every malloc in terms of "who frees this and when".