Project 6: Data Structure Cache Benchmark

A program that implements a Doubly Linked List and a Dynamic Array (std::vector equivalent). You will then perform two benchmarks on a large dataset: (1) iterating through and summing all elements, and (2) performing many random insertions and deletions.

Quick Reference

Attribute Value
Primary Language C
Alternative Languages C++, Rust
Difficulty Level 2: Intermediate
Time Estimate 1-2 weeks
Knowledge Area Data Structures / CPU Caches / Memory Access Patterns
Tooling GCC/Clang
Prerequisites Solid understanding of malloc/free and pointers.

What You Will Build

A program that implements a Doubly Linked List and a Dynamic Array (std::vector equivalent). You will then perform two benchmarks on a large dataset: (1) iterating through and summing all elements, and (2) performing many random insertions and deletions.

Why It Matters

This project builds core skills that appear repeatedly in real-world systems and tooling.

Core Challenges

  • Implementing a Doubly Linked List from scratch → maps to manual memory management and pointer manipulation
  • Implementing a Dynamic Array (vector) from scratch → maps to handling growth, realloc
  • Populating both structures with a large dataset → maps to preparing the benchmark
  • Comparing traversal vs. random access performance → maps to understanding algorithmic complexity vs. hardware performance

Key Concepts

  • Pointer Chasing vs. Sequential Access: “Computer Systems: A Programmer’s Perspective” Ch. 6 - Bryant & O’Hallaron
  • Dynamic Array Implementation: Numerous online tutorials on creating a vector in C.
  • Linked List Implementation: Classic computer science exercise.

Real-World Outcome

$ ./datastructure_benchmark 1000000
Populating 1,000,000 elements...

Benchmark 1: Full Traversal and Summation
  Dynamic Array: 0.003 seconds
  Linked List:   0.095 seconds
  -> Array is 31.67x faster due to spatial locality.

Benchmark 2: 10,000 Random Insertions/Deletions
  Dynamic Array: 1.25 seconds (due to memory shifting)
  Linked List:   0.05 seconds (pointer manipulation is cheap)
  -> Linked List is 25.00x faster for this access pattern.

Implementation Guide

  1. Reproduce the simplest happy-path scenario.
  2. Build the smallest working version of the core feature.
  3. Add input validation and error handling.
  4. Add instrumentation/logging to confirm behavior.
  5. Refactor into clean modules with tests.

Milestones

  • Milestone 1: Minimal working program that runs end-to-end.
  • Milestone 2: Correct outputs for typical inputs.
  • Milestone 3: Robust handling of edge cases.
  • Milestone 4: Clean structure and documented usage.

Validation Checklist

  • Output matches the real-world outcome example
  • Handles invalid inputs safely
  • Provides clear errors and exit codes
  • Repeatable results across runs

References

  • Main guide: LEARN_C_PERFORMANCE_DEEP_DIVE.md
  • “Data Structures and Algorithm Analysis in C” by Mark Allen Weiss