Project 8: CPU Stack Profiler (profile Clone)

A sampling profiler that captures stack traces at regular intervals to show where CPU time is spent. It emits output in a format suitable for flame graph generation.

Quick Reference

Attribute              Value
Primary Language       C (libbpf)
Alternative Languages  Go (cilium/ebpf), Rust (aya)
Difficulty             Level 3: Advanced
Time Estimate          2 weeks
Knowledge Area         Performance Profiling / CPU Analysis
Tooling                libbpf, perf_event
Prerequisites          Projects 1-7 completed

What You Will Build

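You will build stackprof, a command-line sampling profiler that captures kernel and user stack traces at a fixed rate (99 Hz in the example run) to show where CPU time is spent. It reports the top functions by sample count and can emit folded stacks suitable for flame graph generation.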

Why It Matters

This project combines BPF with perf events for CPU sampling, teaches stack unwinding, and produces data for visualization. The same sample-and-aggregate approach is the foundation of production profilers.

Core Challenges

  • CPU sampling with perf events → maps to PERF_TYPE_SOFTWARE, PERF_COUNT_SW_CPU_CLOCK
  • Capturing kernel and user stacks → maps to bpf_get_stack(), stack traces
  • Symbol resolution → maps to /proc/kallsyms, DWARF, frame pointers
  • Flame graph generation → maps to folded stacks format

Key Concepts

  • CPU Profiling: “BPF Performance Tools” Chapter 6 - Brendan Gregg
  • Flame Graphs: Brendan Gregg Flame Graphs
  • Stack Walking: “BPF Performance Tools” Chapter 2.7 - Brendan Gregg
  • perf_event Integration: “Learning eBPF” Chapter 7 - Liz Rice
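Stack walking yields raw instruction addresses, so each one has to be mapped to the symbol whose start address is the greatest one not above it. The sketch below illustrates that lookup for kernel addresses. The struct ksym type and both function names are hypothetical, and it assumes the table has already been loaded from /proc/kallsyms and sorted by ascending address.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical in-memory symbol entry. */
struct ksym {
    uint64_t addr;
    char name[64];
};

/* Parse one /proc/kallsyms-style line: "<hex addr> <type> <name> [module]".
 * Returns 1 on success, 0 if the line does not match. */
int parse_ksym_line(const char *line, struct ksym *out)
{
    unsigned long long addr;
    char type;

    if (sscanf(line, "%llx %c %63s", &addr, &type, out->name) != 3)
        return 0;
    out->addr = addr;
    return 1;
}

/* Binary search a table sorted by ascending addr for the last symbol
 * whose start address is <= ip. Returns NULL if ip precedes every symbol. */
const struct ksym *resolve_ksym(const struct ksym *syms, size_t n, uint64_t ip)
{
    size_t lo = 0, hi = n;        /* search window is [lo, hi) */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (syms[mid].addr <= ip)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* lo now counts the symbols that start at or below ip. */
    return lo == 0 ? NULL : &syms[lo - 1];
}
```

Two caveats apply to real input: the lines of /proc/kallsyms should be sorted before searching, and the file can report all addresses as zero to readers without sufficient privileges. User-space symbols need a different source, such as the ELF symbol tables of the mapped binaries.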

Real-World Outcome

$ sudo ./stackprof -p 1234 -d 10  # Profile PID 1234 for 10 seconds
Profiling PID 1234 for 10 seconds at 99 Hz...

Collected 990 samples

Top functions by sample count:
  pthread_mutex_lock      234  (23.6%)
  __GI___libc_read        198  (20.0%)
  __memcpy_avx_unaligned  156  (15.8%)
  process_request         123  (12.4%)
  parse_json               89   (9.0%)

# Output folded stacks for flame graph
$ sudo ./stackprof -p 1234 -d 10 -f > stacks.folded
$ flamegraph.pl stacks.folded > profile.svg
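The folded format that flamegraph.pl reads puts one stack on each line: frame names joined by semicolons, root first and leaf last, followed by a space and the sample count. The sketch below formats one such line. The function name fold_stack is hypothetical, and it assumes frames arrive leaf-first, which is the order a stack walk produces, so the function emits them in reverse.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write "root;...;leaf count" into buf.
 * frames[] is leaf-first, so it is walked from the last entry to the first.
 * Returns the line length, or -1 if there are no frames or buf is too small. */
int fold_stack(char *buf, size_t cap, const char *const *frames,
               size_t nframes, unsigned long count)
{
    size_t used = 0;
    int n;

    if (nframes == 0)
        return -1;

    for (size_t i = nframes; i-- > 0; ) {
        /* Append a separator after every frame except the leaf (index 0). */
        n = snprintf(buf + used, cap - used, "%s%s", frames[i], i ? ";" : "");
        if (n < 0 || (size_t)n >= cap - used)
            return -1;
        used += (size_t)n;
    }

    n = snprintf(buf + used, cap - used, " %lu", count);
    if (n < 0 || (size_t)n >= cap - used)
        return -1;
    return (int)(used + (size_t)n);
}
```

For example, with the leaf-first frames pthread_mutex_lock, process_request, main and a count of 234, the function produces the line "main;process_request;pthread_mutex_lock 234". That call chain is invented for illustration; the sample output above lists only leaf functions.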

Implementation Guide

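  1. Reproduce the simplest happy-path scenario: attach to one PID and count samples for a fixed duration.
  2. Build the smallest working version of the core feature: capture kernel and user stacks on each sample.
  3. Add input validation and error handling.
  4. Add instrumentation/logging to confirm behavior, for example the number of samples collected.
  5. Refactor into clean modules with tests.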

Milestones

  • Milestone 1: Minimal working program that runs end-to-end.
  • Milestone 2: Correct outputs for typical inputs.
  • Milestone 3: Robust handling of edge cases.
  • Milestone 4: Clean structure and documented usage.

Validation Checklist

  • Output matches the real-world outcome example
  • Handles invalid inputs safely
  • Provides clear errors and exit codes
  • Repeatable results across runs
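For the input-handling items in the checklist above, one possible approach is to validate the -p and -d arguments before any perf event is opened. This is a minimal sketch with a hypothetical helper name, parse_positive_int. It rejects empty strings, trailing junk, zero, negative values, and values that overflow an int.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse a positive decimal integer such as a PID or a
 * duration in seconds. Returns 0 and stores the value in *out on success,
 * or -1 on malformed or out-of-range input (leaving *out untouched). */
int parse_positive_int(const char *s, int *out)
{
    char *end;
    long v;

    if (s == NULL || *s == '\0')
        return -1;

    errno = 0;
    v = strtol(s, &end, 10);
    if (errno != 0)               /* overflow or underflow of long */
        return -1;
    if (*end != '\0')             /* trailing characters, e.g. "12x" */
        return -1;
    if (v <= 0 || v > INT_MAX)    /* PIDs and durations must be positive */
        return -1;

    *out = (int)v;
    return 0;
}
```

A caller might print a usage message to stderr and exit with EXIT_FAILURE when this returns -1, so that scripts can tell a bad invocation from a successful run.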

References

  • Main guide: LEARN_BPF_EBPF_LINUX.md
  • “BPF Performance Tools” by Brendan Gregg