Project 6: Data Structure Cache Benchmark
A program that implements a Doubly Linked List and a Dynamic Array (std::vector equivalent). You will then perform two benchmarks on a large dataset: (1) iterating through and summing all elements, and (2) performing many random insertions and deletions.
Quick Reference
| Attribute | Value |
|---|---|
| Primary Language | C |
| Alternative Languages | C++, Rust |
| Difficulty | Level 2: Intermediate |
| Time Estimate | 1-2 weeks |
| Knowledge Area | Data Structures / CPU Caches / Memory Access Patterns |
| Tooling | GCC/Clang |
| Prerequisites | Solid understanding of malloc/free and pointers. |
What You Will Build
A program that implements a Doubly Linked List and a Dynamic Array (std::vector equivalent). You will then perform two benchmarks on a large dataset: (1) iterating through and summing all elements, and (2) performing many random insertions and deletions.
Why It Matters
This project builds core skills that appear repeatedly in real-world systems and tooling.
Core Challenges
- Implementing a Doubly Linked List from scratch → maps to manual memory management and pointer manipulation
- Implementing a Dynamic Array (vector) from scratch → maps to handling growth,
realloc - Populating both structures with a large dataset → maps to preparing the benchmark
- Comparing traversal vs. random access performance → maps to understanding algorithmic complexity vs. hardware performance
Key Concepts
- Pointer Chasing vs. Sequential Access: “Computer Systems: A Programmer’s Perspective” Ch. 6 - Bryant & O’Hallaron
- Dynamic Array Implementation: Numerous online tutorials on creating a
vectorin C. - Linked List Implementation: Classic computer science exercise.
Real-World Outcome
$ ./datastructure_benchmark 1000000
Populating 1,000,000 elements...
Benchmark 1: Full Traversal and Summation
Dynamic Array: 0.003 seconds
Linked List: 0.095 seconds
-> Array is 31.67x faster due to spatial locality.
Benchmark 2: 10,000 Random Insertions/Deletions
Dynamic Array: 1.25 seconds (due to memory shifting)
Linked List: 0.05 seconds (pointer manipulation is cheap)
-> Linked List is 25.00x faster for this access pattern.
Implementation Guide
- Reproduce the simplest happy-path scenario.
- Build the smallest working version of the core feature.
- Add input validation and error handling.
- Add instrumentation/logging to confirm behavior.
- Refactor into clean modules with tests.
Milestones
- Milestone 1: Minimal working program that runs end-to-end.
- Milestone 2: Correct outputs for typical inputs.
- Milestone 3: Robust handling of edge cases.
- Milestone 4: Clean structure and documented usage.
Validation Checklist
- Output matches the real-world outcome example
- Handles invalid inputs safely
- Provides clear errors and exit codes
- Repeatable results across runs
References
- Main guide:
LEARN_C_PERFORMANCE_DEEP_DIVE.md - “Data Structures and Algorithm Analysis in C” by Mark Allen Weiss