Project 8: Profiler for Cache Misses
A simple command-line profiler that uses the Linux `perf_event_open` syscall to wrap another command and report on the L1/L2/L3 cache misses it generated.
Quick Reference
| Attribute | Value |
|---|---|
| Primary Language | C |
| Alternative Languages | C++, Rust |
| Difficulty | Level 4: Expert |
| Time Estimate | 2-3 weeks |
| Knowledge Area | Systems Programming / CPU Internals / Linux Kernel |
| Tooling | Linux perf_event_open syscall |
| Prerequisites | Strong C skills, comfort with Linux syscalls, Project 1 (for a target to profile). |
What You Will Build
A simple command-line profiler that uses the Linux perf_event_open syscall to wrap another command and report on the L1/L2/L3 cache misses it generated.
Why It Matters
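Hardware performance counters let you measure cache behavior directly instead of inferring it from wall-clock timings. The interface used here, `perf_event_open`, is the same one that tools such as `perf stat` are built on, so the skills carry over to real-world profiling and systems tooling.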
Core Challenges
- Understanding the `perf_event_open` syscall → maps to reading kernel documentation and dealing with a complex API
- Setting up the `perf_event_attr` struct → maps to specifying which hardware events to count (e.g., `PERF_COUNT_HW_CACHE_MISSES`)
- Forking and executing a child process (`execvp`) → maps to standard Unix process management
- Controlling the counters with `ioctl` → maps to enabling, disabling, and resetting counts around the child process execution
- Reading and reporting the results → maps to getting the final numbers from the kernel
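Before adding fork/exec, you can try the counter-related challenges (filling in `perf_event_attr`, opening the event, bracketing work with `ioctl`, and reading the result) on the profiler's own thread. The sketch below assumes Linux with `<linux/perf_event.h>` installed. glibc ships no `perf_event_open()` wrapper, so the sketch calls it through `syscall(2)`. The helpers `open_counter`, `count_region`, and `busy_loop` are hypothetical names for illustration.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* glibc has no wrapper for this syscall, so invoke it directly. */
static int perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
                           int group_fd, unsigned long flags)
{
    return (int)syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Hypothetical helper: open one counter for `pid` (0 = calling thread) on any
 * CPU, created disabled so nothing is counted until PERF_EVENT_IOC_ENABLE. */
static int open_counter(uint32_t type, uint64_t config, pid_t pid)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.size = sizeof attr;
    attr.type = type;               /* e.g. PERF_TYPE_HARDWARE */
    attr.config = config;           /* e.g. PERF_COUNT_HW_CACHE_MISSES */
    attr.disabled = 1;
    attr.exclude_kernel = 1;        /* user space only: needs less privilege */
    attr.exclude_hv = 1;
    return perf_event_open(&attr, pid, -1, -1, 0);
}

/* Hypothetical helper: count one event while fn() runs in this thread.
 * Returns the count, or -1 if the counter could not be opened or read. */
static int64_t count_region(uint32_t type, uint64_t config, void (*fn)(void))
{
    int fd = open_counter(type, config, 0);
    if (fd < 0)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    fn();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    uint64_t value = 0;   /* default read_format: a single 64-bit count */
    ssize_t n = read(fd, &value, sizeof value);
    close(fd);
    return n == (ssize_t)sizeof value ? (int64_t)value : -1;
}

/* Hypothetical demo workload: a loop the compiler cannot optimize away. */
static volatile uint64_t sink;
static void busy_loop(void)
{
    for (uint64_t i = 0; i < 1000000; i++)
        sink += i;
}
```

For example, `count_region(PERF_TYPE_HARDWARE, PERF_COUNT_HW_CACHE_MISSES, busy_loop)` counts cache misses for the loop. A return of -1 usually means the event is unsupported or blocked by `/proc/sys/kernel/perf_event_paranoid`, which is common inside VMs and containers.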
Key Concepts
- `perf_event_open` syscall: `man perf_event_open` is the primary source.
- Performance Monitoring Units (PMU): Intel/AMD developer manuals.
- `fork`/`exec` process model: “Advanced Programming in the UNIX Environment” by Stevens & Rago.
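The man page also documents how generic cache events are encoded. With `attr.type = PERF_TYPE_HW_CACHE`, `attr.config` packs a cache id, an operation, and a result as `id | (op << 8) | (result << 16)`. The table below sketches how the counters in the sample output might be described. `HW_CACHE_CONFIG`, `struct counter_spec`, and `default_counters` are hypothetical names.

Two caveats apply:
- The generic cache ids cover L1d, L1i, and the last-level cache (`PERF_COUNT_HW_CACHE_LL`) but have no L2 entry. L2 counts typically need model-specific raw events (`PERF_TYPE_RAW`) taken from the vendor manuals.
- Labelling "LL" as "L3" assumes a CPU whose last-level cache is L3. The man page likewise notes that `PERF_COUNT_HW_CACHE_MISSES` usually means last-level cache misses.

```c
#include <linux/perf_event.h>
#include <stdint.h>

/* attr.config encoding for attr.type == PERF_TYPE_HW_CACHE, per
 * perf_event_open(2): cache id | (operation << 8) | (result << 16). */
#define HW_CACHE_CONFIG(cache, op, result) \
    ((uint64_t)(cache) | ((uint64_t)(op) << 8) | ((uint64_t)(result) << 16))

/* Hypothetical description of one counter the profiler will open. */
struct counter_spec {
    const char *label;
    uint32_t type;     /* PERF_TYPE_HARDWARE or PERF_TYPE_HW_CACHE */
    uint64_t config;
};

/* Events matching the rows of the sample output. "L3" assumes the last-level
 * cache (PERF_COUNT_HW_CACHE_LL) is L3 on the machine being profiled. */
static const struct counter_spec default_counters[] = {
    { "L1d Cache Loads", PERF_TYPE_HW_CACHE,
      HW_CACHE_CONFIG(PERF_COUNT_HW_CACHE_L1D, PERF_COUNT_HW_CACHE_OP_READ,
                      PERF_COUNT_HW_CACHE_RESULT_ACCESS) },
    { "L1d Cache Misses", PERF_TYPE_HW_CACHE,
      HW_CACHE_CONFIG(PERF_COUNT_HW_CACHE_L1D, PERF_COUNT_HW_CACHE_OP_READ,
                      PERF_COUNT_HW_CACHE_RESULT_MISS) },
    { "L3 Cache Misses", PERF_TYPE_HW_CACHE,
      HW_CACHE_CONFIG(PERF_COUNT_HW_CACHE_LL, PERF_COUNT_HW_CACHE_OP_READ,
                      PERF_COUNT_HW_CACHE_RESULT_MISS) },
    { "Instructions", PERF_TYPE_HARDWARE, PERF_COUNT_HW_INSTRUCTIONS },
    { "CPU Cycles", PERF_TYPE_HARDWARE, PERF_COUNT_HW_CPU_CYCLES },
};
```

Not every CPU supports every combination, so the profiler should expect some of these to fail to open and report that rather than abort.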
Real-World Outcome
```
# Profile the Locality Benchmarker from Project 1
$ ./cache_profiler ./locality_benchmark
> Profiling command: ./locality_benchmark
... (output from the locality_benchmark) ...
> Profiling finished. Hardware Performance Counters:
L1d Cache Loads: 2,014,589,123
L1d Cache Misses: 501,887,345 (24.91%)
L3 Cache Misses: 125,234,876 ( 6.22%)
Instructions: 10,456,123,789
CPU Cycles: 25,123,456,901
```
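The sample output prints counts with thousands separators and shows each miss count as a percentage of L1d loads. A locale-independent formatter is one way to produce that layout. `format_count` and `print_row` are hypothetical names, and the column widths are a guess at the sample's layout rather than a specification.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format v with comma thousands separators, e.g. 2014589123 -> "2,014,589,123".
 * Done by hand so the output does not depend on the current locale. */
static void format_count(uint64_t v, char *out, size_t outlen)
{
    char digits[21];                       /* UINT64_MAX has 20 digits */
    if (outlen == 0)
        return;
    snprintf(digits, sizeof digits, "%llu", (unsigned long long)v);
    size_t n = strlen(digits), j = 0;
    for (size_t i = 0; i < n && j + 1 < outlen; i++) {
        if (i > 0 && (n - i) % 3 == 0) {   /* a group of 3 digits remains */
            out[j++] = ',';
            if (j + 1 >= outlen)
                break;
        }
        out[j++] = digits[i];
    }
    out[j] = '\0';
}

/* Print one result line; if base > 0, append value as a percentage of base. */
static void print_row(const char *label, uint64_t value, uint64_t base)
{
    char buf[32];
    format_count(value, buf, sizeof buf);
    if (base > 0)
        printf("%-18s %15s (%5.2f%%)\n", label, buf,
               100.0 * (double)value / (double)base);
    else
        printf("%-18s %15s\n", label, buf);
}
```

For example, `print_row("L1d Cache Misses:", 501887345, 2014589123)` prints the count as `501,887,345` followed by `(24.91%)`.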
Implementation Guide
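- Reproduce the simplest happy-path scenario: count a single hardware event (for example, instructions) for a trivial command such as `true`.
- Build the smallest working version of the core feature: fork, attach the counters to the child's pid, `execvp` the target, wait for it, and print the totals.
- Add input validation and error handling: a missing command argument, `execvp` failures, and `perf_event_open` errors such as `EACCES` (insufficient privilege) or `ENOENT` (event not supported on this CPU).
- Add instrumentation/logging to confirm behavior, for example reporting which events were opened successfully.
- Refactor into clean modules with tests.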
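For the "smallest working version" step, one common pattern has four parts:
1. Fork.
2. Have the child block on a pipe until the parent has attached the counter to its pid.
3. `execvp` the target.
4. `waitpid` for it.

The sketch below assumes that pattern and counts a single event. `profile_command` is a hypothetical name, and this is one way to structure it, not the only one.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
                           int group_fd, unsigned long flags)
{
    return (int)syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Hypothetical helper: run argv (argv[0] looked up in PATH) and count one
 * event in the child. On success returns 0 and fills *count and *status
 * (a waitpid() status); returns -1 on failure. */
static int profile_command(char *const argv[], uint32_t type, uint64_t config,
                           uint64_t *count, int *status)
{
    int go[2];                       /* parent -> child "start" signal */
    if (pipe(go) < 0)
        return -1;

    pid_t child = fork();
    if (child < 0) {
        close(go[0]);
        close(go[1]);
        return -1;
    }
    if (child == 0) {
        char c;
        close(go[1]);
        /* Wait until the parent has attached the counter to our pid. */
        if (read(go[0], &c, 1) != 1)
            _exit(127);
        close(go[0]);
        execvp(argv[0], argv);
        _exit(127);                  /* exec failed (shell convention) */
    }
    close(go[0]);

    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.size = sizeof attr;
    attr.type = type;
    attr.config = config;
    attr.disabled = 1;
    attr.inherit = 1;                /* include processes the target forks */
    attr.exclude_kernel = 1;
    attr.exclude_hv = 1;

    int fd = perf_event_open(&attr, child, -1, -1, 0);
    if (fd >= 0) {
        ioctl(fd, PERF_EVENT_IOC_RESET, 0);
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
        if (write(go[1], "x", 1) != 1) {   /* release the child */
            close(fd);
            fd = -1;
        }
    }
    close(go[1]);                    /* on failure the child sees EOF, exits */

    if (waitpid(child, status, 0) < 0 || fd < 0) {
        if (fd >= 0)
            close(fd);
        return -1;
    }
    /* The child has exited, so the counter holds its final value. */
    ssize_t n = read(fd, count, sizeof *count);
    close(fd);
    return n == (ssize_t)sizeof *count ? 0 : -1;
}
```

A full profiler would open one counter per event, all before releasing the child, and print them once `waitpid` returns. An alternative to the `ioctl` enable is `attr.enable_on_exec = 1` (with `disabled` still set), which starts counting at the `execve` boundary and so excludes the child's pipe read. If `perf_event_open` fails with `EACCES`, the usual cause is the `/proc/sys/kernel/perf_event_paranoid` setting or a missing `CAP_PERFMON` capability (Linux 5.8 and later).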
Milestones
- Milestone 1: Minimal working program that runs end-to-end.
- Milestone 2: Correct outputs for typical inputs.
- Milestone 3: Robust handling of edge cases.
- Milestone 4: Clean structure and documented usage.
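One edge case worth handling early is multiplexing. If the requested events exceed the counters the PMU can run at once, the kernel time-multiplexes them, so each raw count covers only part of the run. Setting `attr.read_format` to `PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING` makes `read()` return both times alongside the value. The usual estimate scales the value by their ratio. `struct scaled_read` and `scaled_count` are hypothetical names. The field order follows the `read_format` layout in `perf_event_open(2)` for a non-group read without `PERF_FORMAT_ID`.

```c
#include <stdint.h>

/* What read() returns when read_format is
 * PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING
 * (no PERF_FORMAT_GROUP, no PERF_FORMAT_ID). */
struct scaled_read {
    uint64_t value;
    uint64_t time_enabled;   /* ns the event was enabled */
    uint64_t time_running;   /* ns it was actually on a hardware counter */
};

/* Estimate the full-run count for a multiplexed counter.
 * Returns 0 if the event never got onto the hardware. */
static uint64_t scaled_count(const struct scaled_read *r)
{
    if (r->time_running == 0)
        return 0;
    if (r->time_running >= r->time_enabled)
        return r->value;                      /* ran the whole time: exact */
    return (uint64_t)((double)r->value * (double)r->time_enabled /
                      (double)r->time_running);
}
```

Scaled values are estimates, so reporting when `time_running < time_enabled` (as `perf stat` does with a percentage) helps users judge how far to trust a row.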
Validation Checklist
- Output follows the layout of the real-world outcome example (the counts themselves depend on the machine and workload)
- Handles invalid inputs safely
- Provides clear errors and exit codes
- Repeatable results across runs, within a small tolerance (hardware counts vary slightly from run to run)
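For "clear errors and exit codes", one reasonable choice is to make the profiler exit with its child's status. That choice is an assumption here, borrowed from the convention bash uses for `$?`. A child that exited normally contributes its own exit code, and a child killed by a signal contributes 128 + the signal number. `wait_exit_code` is a hypothetical helper that also retries `waitpid` when a signal interrupts it.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: wait for `child` (a pid returned by fork()) and map
 * its termination status to an exit code for the profiler itself:
 * exit(N) -> N, killed by signal N -> 128 + N.
 * Returns -1 if waitpid() itself fails (e.g. ECHILD). */
static int wait_exit_code(pid_t child)
{
    int status;
    pid_t r;
    do {
        r = waitpid(child, &status, 0);
    } while (r < 0 && errno == EINTR);   /* retry if a signal interrupted us */
    if (r < 0)
        return -1;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return 1;                            /* not expected without WUNTRACED */
}
```

Combined with `_exit(127)` after a failed `execvp` in the child (the shell's code for "command not found"), this lets scripts that wrap `cache_profiler` distinguish a missing command from a failing one.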
References
- Main guide: `LEARN_C_PERFORMANCE_DEEP_DIVE.md`
- “The Linux Programming Interface” by Michael Kerrisk