Project 3: Memory Leak Detector

Build a malloc/free tracker that reports leaks and double frees with allocation-site metadata.

Quick Reference

Attribute Value
Difficulty Intermediate
Time Estimate 12-20 hours
Main Programming Language C (Alternatives: C++)
Alternative Programming Languages C++, Rust
Coolness Level High
Business Potential High (debugging tools)
Prerequisites Pointers, dynamic memory, basic data structures
Key Topics Ownership, allocation tracking, metadata, atexit, macros

1. Learning Objectives

By completing this project, you will:

  1. Track allocations and frees with precise metadata.
  2. Detect memory leaks and double frees at program exit.
  3. Implement safe wrappers using macros and function pointers.
  4. Design deterministic reports for testing.
  5. Understand how production leak detectors work.

2. All Theory Needed (Per-Concept Breakdown)

2.1 Heap Allocation, Ownership, and Lifetime

Fundamentals

Dynamic memory allocation in C is explicit: malloc allocates, and free releases. Ownership is a discipline that answers “who is responsible for freeing this block?” If ownership is unclear, leaks and double frees occur. Lifetime is the period during which a pointer is valid. After free, a pointer becomes invalid even if the address looks unchanged. A leak detector makes ownership visible by recording each allocation and ensuring it is freed exactly once. Understanding heap allocation and lifetime is essential because every leak report is rooted in a lifetime violation.

Deep Dive into the Concept

The C standard library allocator manages the heap. When you call malloc, it returns a pointer to a block of memory that the allocator considers “owned” by the caller. That pointer is valid until free is called on it (or the program exits). realloc is a special case: it may return a new pointer and invalidate the old one. If you use the old pointer after realloc, you create a use-after-free. Ownership must be updated whenever realloc moves memory. A leak detector must treat realloc as either a resize in place or an allocation+free pair, depending on whether it returns a new pointer.

Lifetime errors are subtle. A pointer can outlive the memory it points to. This happens when the pointer is stored in a global or passed to another subsystem that later uses it after it has been freed. Since C doesn’t track ownership automatically, a leak detector must impose a model: each allocation has a unique owner, and that owner is responsible for freeing it exactly once. This is why leak detectors report “allocation sites” (file and line) and “free sites” when possible. By tracking allocation sites, you can correlate leaks with the code that created the memory.

Ownership can be explicit (the API says “caller must free”) or implicit (ownership transferred by a function). Libraries often document ownership rules, but documentation is not enforcement. A leak detector provides enforcement by tracking each pointer. The simplest model is a set of “live allocations.” If a pointer is in the set at exit, it is a leak. If a pointer is freed but not in the set, it is an invalid or double free. This model does not detect use-after-free directly, but it provides enough information to reason about lifetime correctness.

The allocator itself adds metadata around your block. The pointer you receive points to user data, but the allocator may store size and bookkeeping data nearby. This is why freeing a pointer that was not returned by malloc can crash: the allocator expects metadata that doesn’t exist. Your leak detector must not interfere with the allocator’s internal metadata; instead, it should store metadata separately. One common approach is to maintain a hash table keyed by pointer. Another is to use a linked list. The choice is a trade-off between simplicity and performance.

Finally, the lifetime model informs your reporting. A clean report should show total leaked bytes, number of leaked blocks, and the sites where they were allocated. It should also distinguish “still reachable” (for global caches) from “definitely lost” (not referenced). Implementing reachability is complex, so your project can treat all unfreed allocations as leaks. The goal is learning, not perfect GC-level analysis.

How this fits on projects

This concept drives the core requirements in Section 3.2 and influences edge cases in Section 3.6 and testing in Section 6.

Definitions & key terms

  • Ownership: Responsibility for freeing memory.
  • Lifetime: Validity window of a pointer.
  • Leak: Allocated memory not freed before exit.
  • Double free: Freeing the same block twice.

Mental model diagram (ASCII)

malloc -> [live set add]
free   -> [live set remove]
exit   -> [report remaining]

How it works (step-by-step, with invariants and failure modes)

  1. Allocate memory with malloc.
  2. Record pointer + metadata in tracking table.
  3. Free pointer and remove from table.
  4. At exit, report remaining pointers.

Invariant: Every allocation must be freed exactly once.

Failure modes: Leaks, double frees, use-after-free.

Minimal concrete example

void *p = malloc(64);  // tracked
free(p);               // removed from tracking

Common misconceptions

  • “If the program exits, leaks don’t matter.” -> They matter for long-running processes.
  • “Freeing twice is harmless.” -> It corrupts allocator metadata.
  • “A pointer value implies validity.” -> Validity depends on lifetime.

Check-your-understanding questions

  1. What happens to ownership after realloc returns a new pointer?
  2. Why is double-free dangerous?
  3. Why do leaks matter in server processes?

Check-your-understanding answers

  1. Ownership transfers to the new pointer; the old pointer is invalid.
  2. The allocator tries to free the same block twice, corrupting metadata.
  3. Leaks accumulate and exhaust memory or degrade performance.

Real-world applications

  • Leak detection in long-running servers.
  • Memory auditing in embedded systems.
  • Security tooling to detect misuse of allocators.

Where you’ll apply it

References

  • “The Linux Programming Interface” (memory chapters)
  • “Expert C Programming” (memory pitfalls)

Key insights

Ownership is a policy; a leak detector makes that policy visible.

Summary

You understand how heap lifetimes work and why tracking ownership is essential for leak detection.

Homework/Exercises to practice the concept

  1. Write a small program that leaks memory intentionally.
  2. Modify it to free correctly and confirm the leak disappears.

Solutions to the homework/exercises

  1. Allocate and exit without free.
  2. Add free and confirm no leaks are reported.

2.2 Allocation Metadata and Tracking Data Structures

Fundamentals

A leak detector needs a way to associate metadata (size, file, line) with each allocated pointer. This is commonly done with a linked list or hash table keyed by pointer. The data structure must support insertion on allocation, removal on free, and iteration at exit. A linked list is simpler but O(n) for removal; a hash table is faster but more complex. The project goal is to learn how metadata tracking works and to design a structure that is correct and testable.

Deep Dive into the Concept

When you intercept malloc, you must store metadata about the allocation in a separate structure. The simplest is a singly linked list:

typedef struct alloc {
    void *ptr;
    size_t size;
    const char *file;
    int line;
    struct alloc *next;
} alloc;

On every allocation, you create a new node and push it onto the list. On free, you search for the pointer and remove the node. This is O(n), but for small programs or educational tools, it is sufficient. The key is correctness: you must remove exactly one node per free. If the pointer is not found, it is an invalid free. If it is found twice, you have a double free. You must decide how to report these conditions.

A hash table improves performance and is not difficult to implement. You can hash the pointer value (e.g., using uintptr_t) and store nodes in buckets. This yields O(1) average insert/remove. The trade-off is complexity and memory overhead. Because your tool is for learning, a linked list might be acceptable, but you should design the API so that you can swap in a hash table later if needed.

Metadata includes not only size and allocation site, but also optional stack traces. Capturing stack traces requires platform-specific APIs (backtrace() on glibc) and is optional. File and line information can be captured with macros (__FILE__, __LINE__) when you wrap malloc. That means your malloc wrapper should take additional parameters. If you want to track realloc, you need to update the metadata when a pointer changes. A good design is to remove the old node and insert a new one if the pointer changes.

Thread safety is another concern. If your program is multi-threaded, the tracking structure must be protected by a mutex. Otherwise, concurrent allocations can corrupt the list. For this project, you can either explicitly state “single-threaded only” or add a simple pthread_mutex_t around insert/remove. If you add locking, keep it small and local to avoid deadlocks (e.g., do not call malloc while holding the lock unless you have a reentrant allocator).

Finally, consider how your tracker handles its own memory. If your tracker uses malloc to allocate metadata nodes, you can create recursion: your tracker calls malloc, which calls your tracker again. This can be solved by using the real malloc via dlsym, or by preallocating a static pool for metadata. For this project, a simple approach is to call the real malloc function pointer obtained from dlsym and avoid recursion. Alternatively, you can use a fixed-size metadata pool to keep the detector self-contained.

How this fits on projects

Metadata tracking underlies the core algorithm in Section 4.4 and the data structure in Section 4.3. It also informs pitfalls in Section 7.

Definitions & key terms

  • Metadata: Extra information stored about an allocation.
  • Tracking structure: List or table of live allocations.
  • Bucket: Slot in a hash table for collisions.
  • Recursion: Tracker calling itself by accident.

Mental model diagram (ASCII)

ptr -> [hash] -> bucket -> [alloc node] -> next

How it works (step-by-step, with invariants and failure modes)

  1. On allocation, create metadata node.
  2. Insert node into tracking structure.
  3. On free, find node and remove.
  4. On exit, iterate nodes and report.

Invariant: Each pointer appears at most once in the structure.

Failure modes: Missing nodes, duplicated nodes, recursion.

Minimal concrete example

alloc *node = real_malloc(sizeof(alloc));
node->ptr = p; node->size = sz;
node->next = head; head = node;

Common misconceptions

  • “Linked lists are too slow.” -> For educational tooling, they are fine.
  • “Metadata doesn’t need to be freed.” -> It can leak and skew reports.
  • “Hashing pointers is unsafe.” -> It is fine if done carefully.

Check-your-understanding questions

  1. Why must a pointer appear only once in the tracking set?
  2. What happens if your tracker allocates metadata using the wrapped malloc?
  3. How do you detect a double free?

Check-your-understanding answers

  1. Otherwise you may report incorrect leaks and removals.
  2. It causes infinite recursion or stack overflow.
  3. The pointer is not found in the live set when free is called.

Real-world applications

  • Heap profiling tools.
  • Allocation tracing in performance diagnostics.
  • Memory debugging in embedded environments.

Where you’ll apply it

References

  • “The Art of Debugging with GDB” (heap debugging)
  • “Advanced Programming in the UNIX Environment” (dynamic loading)

Key insights

The tracker is a mini database of allocations; keep it consistent and simple.

Summary

You understand how to store and manage allocation metadata in a reliable structure.

Homework/Exercises to practice the concept

  1. Implement a linked list tracker and test insert/remove.
  2. Extend it to a hash table with 64 buckets.

Solutions to the homework/exercises

  1. Ensure head insertion and removal by pointer match work.
  2. Hash uintptr_t(ptr) % 64 and store nodes per bucket.

2.3 Interposition, Wrappers, and Exit Reporting

Fundamentals

A leak detector works by intercepting calls to malloc, free, and realloc. In C, you can do this with macros that replace calls at compile time or by using dynamic interposition (e.g., LD_PRELOAD). For this project, macro-based wrapping is simpler: you define #define malloc(sz) leak_malloc(sz, __FILE__, __LINE__). You also need a way to report leaks at program exit, typically using atexit() to register a handler. This ensures the leak report always prints, even if the program exits normally.

Deep Dive into the Concept

Interposition is the act of sitting between the caller and the real allocator. There are two main strategies. The first is compile-time wrapping: you redefine malloc and free using macros. This is straightforward and portable. The downside is that it only applies to translation units that include your header, and it can break third-party code that expects the real malloc signature. The second strategy is runtime interposition using LD_PRELOAD or platform-specific dynamic linker features. This allows you to intercept allocations in any binary without recompiling, but it requires more advanced tooling. For this project, compile-time macros are sufficient and educational.

Macro wrappers must avoid infinite recursion. Your wrapper calls the real malloc, but if you call malloc directly, the macro expands again and you recurse. The solution is to provide a real_malloc function pointer that points to the true allocator. In a macro-only approach, you can rename the real functions (e.g., #undef malloc inside the implementation file) and call malloc there. Another strategy is to declare the wrapper functions in a separate compilation unit where the macro is not active. This is often the simplest approach.

The atexit() function registers a callback that runs when the program exits normally. This is an excellent place to print leak reports because you can see the final state of the tracking structure. However, it will not run if the program aborts due to a crash or SIGKILL. For deterministic tests, you should use normal exit paths. In your report, you should print the number of leaked blocks, total bytes, and per-allocation details. You should also print a clean message when no leaks are found to avoid ambiguity.

Error reporting must be deterministic. The order of leaks in the report should be stable. If you use a linked list with head insertion, the report order depends on allocation order. That is acceptable if you document it. For tests, you can sort by address or file/line to make output stable. Sorting requires additional data structures but yields deterministic output. This project can choose a stable sorting strategy as an “advanced extension.”

Interposition also raises issues with initialization order. If your tracker uses global state, you must ensure it is initialized before the first allocation. A common pattern is to initialize lazily on the first call to leak_malloc. Another is to use a constructor attribute to initialize early. For educational purposes, lazy initialization is simplest. Just be sure to guard against reentrant calls.

Finally, the report should clearly indicate errors beyond leaks: double frees and invalid frees. These should be reported immediately when detected, and they should not crash the tool unless you choose to abort for safety. A clean design is to increment an error counter and print a warning to stderr, then continue. This mirrors how production tools like ASan report errors but keep going to show more.

How this fits on projects

Interposition defines the core API in Section 3.2 and the “failure demo” in Section 3.7. It also shapes testing in Section 6.

Definitions & key terms

  • Interposition: Intercepting calls to a function.
  • atexit: Register a function to run on normal exit.
  • Wrapper: A function that calls the real allocator and adds metadata.
  • Macro substitution: Preprocessor technique to rewrite calls.

Mental model diagram (ASCII)

user code -> malloc macro -> leak_malloc -> real malloc
                              |
                              v
                          track + return

How it works (step-by-step, with invariants and failure modes)

  1. Replace calls with wrapper functions.
  2. Wrapper allocates using real allocator.
  3. Store metadata in tracking structure.
  4. On free, remove metadata.
  5. On exit, print report.

Invariant: Wrapper must always call the real allocator exactly once.

Failure modes: Recursion, missing exit report, non-deterministic output.

Minimal concrete example

#define malloc(sz) leak_malloc((sz), __FILE__, __LINE__)
#define free(p) leak_free((p), __FILE__, __LINE__)

Common misconceptions

  • “Macros are unsafe.” -> They are safe if scoped carefully.
  • atexit covers crashes.” -> It only runs on normal exit.
  • “All allocations are intercepted.” -> Only code that includes the header.

Check-your-understanding questions

  1. Why can macros cause recursion in the tracker implementation?
  2. Why is deterministic report ordering important?
  3. What happens if the program crashes before atexit runs?

Check-your-understanding answers

  1. The wrapper calls malloc, which expands to itself.
  2. Tests need stable output to compare.
  3. The report may not run; leaks are not printed.

Real-world applications

  • LD_PRELOAD-based memory debuggers.
  • Runtime instrumentation tools.
  • Allocation profilers.

Where you’ll apply it

References

  • man 3 atexit
  • “The Linux Programming Interface” (dynamic linking)

Key insights

Interposition is how tools see allocations without changing application logic.

Summary

You understand how to intercept allocations and report leaks deterministically.

Homework/Exercises to practice the concept

  1. Write a macro wrapper for malloc and confirm it compiles.
  2. Register an atexit handler that prints a message.

Solutions to the homework/exercises

  1. Use #define malloc(sz) leak_malloc(sz, __FILE__, __LINE__).
  2. Call atexit(report_leaks); and run the program.

3. Project Specification

3.1 What You Will Build

A library libleakdet that wraps malloc, free, and realloc, tracks live allocations, and prints a report at exit. A demo program intentionally leaks memory to show the report.

3.2 Functional Requirements

  1. Track allocations with size, file, and line.
  2. Remove allocations on free.
  3. Detect double free and invalid free.
  4. Handle realloc by updating metadata.
  5. Print leak report on normal exit.
  6. Provide deterministic demo output via a --deterministic mode that prints allocation IDs instead of raw addresses.

3.3 Non-Functional Requirements

  • Performance: Overhead should be reasonable for small programs.
  • Reliability: Tracking data structure must remain consistent.
  • Usability: Error messages are clear and concise.

3.4 Example Usage / Output

$ ./leak_demo
[leakdet] malloc(32) -> 0x55555556a2a0 (main.c:12)
[leakdet] malloc(64) -> 0x55555556a2d0 (main.c:13)
[leakdet] free(0x55555556a2a0) (main.c:17)

== Leak Report ==
Leaked blocks: 1
Total leaked bytes: 64
Leaked at:
  0x55555556a2d0 size=64 (main.c:13)

3.5 Data Formats / Schemas / Protocols

typedef struct alloc {
    void *ptr;
    size_t size;
    const char *file;
    int line;
    struct alloc *next;
} alloc;

3.6 Edge Cases

  • free(NULL) should be ignored (like standard C).
  • realloc(NULL, n) behaves like malloc.
  • realloc(p, 0) behaves like free.
  • Double free detection when pointer is already removed.

3.7 Real World Outcome

The program should produce a deterministic leak report and a clear error for double free.

3.7.1 How to Run (Copy/Paste)

make
./leak_demo --deterministic
./leak_demo --double-free --deterministic

3.7.2 Golden Path Demo (Deterministic)

$ ./leak_demo --deterministic
[leakdet] malloc(32) -> id=1 (main.c:12)
[leakdet] malloc(64) -> id=2 (main.c:13)
[leakdet] free(id=1) (main.c:17)

== Leak Report ==
Leaked blocks: 1
Total leaked bytes: 64
Leaked at:
  id=2 size=64 (main.c:13)

Exit code: 0

3.7.3 If CLI: Exact Terminal Transcript

$ ./leak_demo --double-free --deterministic
[leakdet] malloc(16) -> id=1 (main.c:10)
[leakdet] free(id=1) (main.c:12)
[leakdet] error: double free of id=1 (main.c:13)

== Leak Report ==
Leaked blocks: 0
Total leaked bytes: 0

Exit code: 1

4. Solution Architecture

4.1 High-Level Design

User code -> macro wrappers -> tracking table -> report at exit

4.2 Key Components

Component Responsibility Key Decisions
Wrappers intercept malloc/free/realloc macro-based
Tracker store live allocations linked list or hash
Reporter print leaks deterministic ordering

4.3 Data Structures (No Full Code)

typedef struct {
    alloc *head;
    size_t total_bytes;
    size_t total_blocks;
} leak_state;

4.4 Algorithm Overview

Key Algorithm: Allocation Tracking

  1. On malloc, call real allocator.
  2. Insert metadata node.
  3. On free, remove node.
  4. At exit, iterate nodes and print report.

Complexity Analysis:

  • Time: O(n) insertion/removal for linked list.
  • Space: O(n) for metadata.

5. Implementation Guide

5.1 Development Environment Setup

sudo apt-get install build-essential

5.2 Project Structure

leakdet/
|-- include/
|   \-- leakdet.h
|-- src/
|   \-- leakdet.c
|-- demo/
|   \-- leak_demo.c
|-- tests/
|   \-- test_leakdet.c
\-- Makefile

5.3 The Core Question You’re Answering

“How can I make memory ownership visible and enforceable in a C program?”

5.4 Concepts You Must Understand First

  1. Ownership and lifetime (see Section 2.1).
  2. Metadata tracking structures (see Section 2.2).
  3. Interposition and exit reporting (see Section 2.3).

5.5 Questions to Guide Your Design

  1. Will you use a linked list or hash table?
  2. How will you handle realloc?
  3. How will you make reports deterministic for testing?

5.6 Thinking Exercise

Write a small program that allocates three blocks and frees only one. Predict exactly what your leak report should show.

5.7 The Interview Questions They’ll Ask

  1. How do leak detectors work?
  2. Why is double free dangerous?
  3. How would you track allocation sites in C?

5.8 Hints in Layers

Hint 1: Start with wrappers and print every allocation. Hint 2: Add a linked list of active allocations. Hint 3: Remove nodes on free and detect missing entries. Hint 4: Add atexit to print report.

5.9 Books That Will Help

| Topic | Book | Chapter | |——-|——|———| | Heap allocation | CSAPP | Ch. 9 | | Debugging | Art of Debugging with GDB | Ch. 1-4 | | C pitfalls | Expert C Programming | Ch. 2-3 |

5.10 Implementation Phases

Phase 1: Basic Wrappers (3-4 hours)

Goals: wrap malloc/free and log calls. Checkpoint: demo prints allocation log.

Phase 2: Tracking + Reporting (4-6 hours)

Goals: store metadata and report leaks. Checkpoint: leak report matches expected output.

Phase 3: Robustness (3-4 hours)

Goals: add double free detection, handle realloc. Checkpoint: double-free demo triggers error.

5.11 Key Implementation Decisions

| Decision | Options | Recommendation | Rationale | |———-|———|—————-|———–| | Tracking structure | list vs hash | list first | simpler learning | | Error handling | abort vs warn | warn + exit code | more informative | | Determinism | sort output | stable order by address | testable |


6. Testing Strategy

6.1 Test Categories

| Category | Purpose | Examples | |———-|———|———-| | Unit Tests | tracker correctness | insert/remove, realloc | | Integration Tests | leak report | demo output matches | | Edge Case Tests | invalid free | free unknown pointer |

6.2 Critical Test Cases

  1. free(NULL) does nothing and prints no error.
  2. Double free prints error and exit code 1.
  3. Leak report counts total bytes correctly.

6.3 Test Data

Demo program with fixed allocation sizes.

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

| Pitfall | Symptom | Solution | |———|———|———-| | Recursive allocation | crash on first malloc | use real allocator | | Metadata leaks | report includes tracker nodes | ignore tracker allocations | | Non-deterministic order | tests fail | sort output |

7.2 Debugging Strategies

  • Print tracker state after every allocation in debug mode.
  • Use a fixed demo program to compare output.

7.3 Performance Traps

Using a linear list for huge programs; acceptable for educational scope.


8. Extensions & Challenges

8.1 Beginner Extensions

  • Add --quiet mode for only final report.
  • Add JSON output.

8.2 Intermediate Extensions

  • Implement hash table for tracking.
  • Add realloc tracking with move detection.

8.3 Advanced Extensions

  • Add stack trace capture.
  • Build LD_PRELOAD version.

9. Real-World Connections

9.1 Industry Applications

  • Leak detection in long-running servers.
  • Debugging embedded software.
  • Valgrind Memcheck
  • AddressSanitizer

9.3 Interview Relevance

  • Explain ownership, double free, and leak detection strategies.

10. Resources

10.1 Essential Reading

  • “The Linux Programming Interface” (memory chapters)
  • “Expert C Programming” (memory pitfalls)

10.2 Video Resources

  • Tooling talks on memory debugging.

10.3 Tools & Documentation

  • man 3 malloc, man 3 free, man 3 realloc

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain heap ownership and lifetime rules.
  • I can describe how interposition works.
  • I can explain double free and leak detection.

11.2 Implementation

  • Leak report is deterministic.
  • Double free produces error.
  • Tests pass.

11.3 Growth

  • I can explain how my detector differs from Valgrind.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Allocation tracking and leak report at exit.
  • Demo program that shows a leak.

Full Completion:

  • Double-free detection and realloc handling.
  • Deterministic output for tests.

Excellence (Going Above & Beyond):

  • Stack trace capture.
  • LD_PRELOAD version for arbitrary binaries.