Project 15: eBPF-based Observability Agent
Quick Reference
| Attribute | Value |
|---|---|
| Primary Language | Go (cilium/ebpf) |
| Alternative Languages | Rust (aya), C (libbpf) |
| Difficulty | Level 4: Expert |
| Time Estimate | 4-6 weeks |
| Knowledge Area | Observability / Distributed Systems |
| Tooling | cilium/ebpf, Prometheus, Grafana |
| Prerequisites | Projects 1-10 completed, familiarity with Go |
What You Will Build
A complete observability agent that uses eBPF to collect metrics (CPU, memory, network, disk), traces (function calls, syscalls), and events, and exposes them as Prometheus metrics and structured logs.
Why It Matters
This project ties together the earlier material: multiple BPF program types, several kinds of maps, userspace integration, and production concerns such as performance and reliability.
Core Challenges
- Multiple data sources: combining tracepoints, kprobes, and XDP programs in one agent
- Metric aggregation: choosing between in-kernel and userspace aggregation
- Configuration management: loading programs dynamically
- Production reliability: error handling and resource limits
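
To make the aggregation challenge concrete, here is a minimal sketch of the userspace half of one common split: the kernel side keeps a counter slot per CPU, and the agent sums the slots when it reads them. It uses plain Go with no BPF library; the `samples` map stands in for what per-CPU map lookups would return, and the names `sumPerCPU` and `aggregate` and all values are hypothetical.

```go
package main

import "fmt"

// sumPerCPU adds up the per-CPU slots of a single counter.
func sumPerCPU(slots []uint64) uint64 {
	var total uint64
	for _, v := range slots {
		total += v
	}
	return total
}

// aggregate collapses per-CPU samples, keyed by syscall name, into totals.
func aggregate(samples map[string][]uint64) map[string]uint64 {
	totals := make(map[string]uint64, len(samples))
	for name, slots := range samples {
		totals[name] = sumPerCPU(slots)
	}
	return totals
}

func main() {
	// Hypothetical snapshot from a four-CPU machine; the values are made up.
	samples := map[string][]uint64{
		"read":  {10, 20, 30, 40},
		"write": {1, 2, 3, 4},
	}
	totals := aggregate(samples)
	fmt.Println(totals["read"], totals["write"]) // prints "100 10"
}
```

Summing in userspace keeps the kernel-side update path free of cross-CPU contention, at the cost of a little work on each read.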
Key Concepts
- Observability Pipelines: “Learning eBPF” by Liz Rice, Chapter 9
- Prometheus Metrics: Prometheus Best Practices
- Production eBPF: How Netflix Uses eBPF
Real-World Outcome
$ sudo ./ebpf-agent --config agent.yaml
eBPF Observability Agent v1.0.0
Loading BPF programs...
✓ syscall_counter (tracepoint)
✓ tcp_tracker (kprobe)
✓ file_monitor (kprobe)
✓ net_stats (XDP)
Metrics server: http://localhost:9090/metrics
Log output: /var/log/ebpf-agent/
# Prometheus metrics
$ curl localhost:9090/metrics
# HELP ebpf_syscalls_total Total system calls by type
# TYPE ebpf_syscalls_total counter
ebpf_syscalls_total{syscall="read"} 1234567
ebpf_syscalls_total{syscall="write"} 987654
ebpf_syscalls_total{syscall="openat"} 123456
# HELP ebpf_tcp_connections Active TCP connections
# TYPE ebpf_tcp_connections gauge
ebpf_tcp_connections{direction="outgoing"} 234
ebpf_tcp_connections{direction="incoming"} 567
# HELP ebpf_network_bytes_total Network bytes by direction
# TYPE ebpf_network_bytes_total counter
ebpf_network_bytes_total{direction="rx"} 12345678901
ebpf_network_bytes_total{direction="tx"} 9876543210
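
The metrics output above follows the Prometheus text exposition format: a `# HELP` line, a `# TYPE` line, then one sample per label value. As a rough sketch of how an agent could produce it using only the Go standard library, the block below renders one metric family and wraps it in an HTTP handler. The `metricFamily` type, `render`, and `metricsHandler` are hypothetical names, and a real agent would more likely use the official Prometheus client library instead.

```go
package main

import (
	"fmt"
	"net/http"
	"sort"
	"strings"
)

// metricFamily is a hypothetical in-memory view of one metric with one label.
type metricFamily struct {
	Name, Help, Type, Label string
	Values                  map[string]uint64 // label value -> sample
}

// render writes one family in the Prometheus text exposition format.
func render(f metricFamily) string {
	var b strings.Builder
	fmt.Fprintf(&b, "# HELP %s %s\n", f.Name, f.Help)
	fmt.Fprintf(&b, "# TYPE %s %s\n", f.Name, f.Type)
	keys := make([]string, 0, len(f.Values))
	for k := range f.Values {
		keys = append(keys, k)
	}
	sort.Strings(keys) // stable ordering makes the output easy to diff
	for _, k := range keys {
		// %q is adequate for simple ASCII label values only.
		fmt.Fprintf(&b, "%s{%s=%q} %d\n", f.Name, f.Label, k, f.Values[k])
	}
	return b.String()
}

// metricsHandler serves a family with the text-format content type.
func metricsHandler(f metricFamily) http.HandlerFunc {
	return func(w http.ResponseWriter, _ *http.Request) {
		w.Header().Set("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
		fmt.Fprint(w, render(f))
	}
}

func main() {
	fam := metricFamily{
		Name: "ebpf_network_bytes_total", Help: "Network bytes by direction",
		Type: "counter", Label: "direction",
		Values: map[string]uint64{"rx": 12345678901, "tx": 9876543210},
	}
	fmt.Print(render(fam))
}
```

To serve it, the handler could be registered with `http.Handle("/metrics", metricsHandler(fam))` before calling `http.ListenAndServe`.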
Implementation Guide
- Reproduce the simplest happy-path scenario.
- Build the smallest working version of the core feature.
- Add input validation and error handling.
- Add instrumentation/logging to confirm behavior.
- Refactor into clean modules with tests.
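
For the input-validation step, one possible shape is a config struct with a `Validate` method that reports every problem in a single pass. This is only a sketch: the source does not show the contents of `agent.yaml`, so the `Config` fields here are assumptions, and YAML parsing is left out because it needs a third-party package. The program names come from the sample startup output.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
	"strings"
)

// Config mirrors a hypothetical agent.yaml; the field names are assumptions.
type Config struct {
	ListenAddr string   // e.g. "localhost:9090"
	LogDir     string   // e.g. "/var/log/ebpf-agent/"
	Programs   []string // BPF programs to load
}

// knownPrograms lists the program names from the sample startup output.
var knownPrograms = map[string]bool{
	"syscall_counter": true, "tcp_tracker": true,
	"file_monitor": true, "net_stats": true,
}

// Validate collects every problem so the user can fix the file in one pass.
func (c Config) Validate() error {
	var problems []string
	if _, port, err := net.SplitHostPort(c.ListenAddr); err != nil {
		problems = append(problems, fmt.Sprintf("listen address %q: %v", c.ListenAddr, err))
	} else if n, err := strconv.Atoi(port); err != nil || n < 1 || n > 65535 {
		problems = append(problems, fmt.Sprintf("listen port %q is not in 1-65535", port))
	}
	if c.LogDir == "" {
		problems = append(problems, "log directory is empty")
	}
	if len(c.Programs) == 0 {
		problems = append(problems, "no programs enabled")
	}
	seen := map[string]bool{}
	for _, p := range c.Programs {
		switch {
		case !knownPrograms[p]:
			problems = append(problems, fmt.Sprintf("unknown program %q", p))
		case seen[p]:
			problems = append(problems, fmt.Sprintf("program %q listed twice", p))
		}
		seen[p] = true
	}
	if len(problems) > 0 {
		return fmt.Errorf("invalid config: %s", strings.Join(problems, "; "))
	}
	return nil
}

func main() {
	cfg := Config{ListenAddr: "localhost:9090", LogDir: "/var/log/ebpf-agent/",
		Programs: []string{"syscall_counter", "net_stats"}}
	fmt.Println("validation error:", cfg.Validate())
}
```

Validating before any BPF program is loaded means a bad config fails fast, without leaving half-attached probes behind.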
Milestones
- Milestone 1: Minimal working program that runs end-to-end.
- Milestone 2: Correct outputs for typical inputs.
- Milestone 3: Robust handling of edge cases.
- Milestone 4: Clean structure and documented usage.
Validation Checklist
- Output matches the real-world outcome example
- Handles invalid inputs safely
- Provides clear errors and exit codes
- Repeatable results across runs
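
The "clear errors and exit codes" item could be approached along these lines. The sketch below is one possible policy, not a prescribed one: a failing optional probe is reported and skipped, while a failing required probe aborts startup with a non-zero exit code. The `probe` type, the `Required` flag, and `errUnsupported` are hypothetical, and `Load` stands in for the real load-and-attach step, which is not shown.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// probe is a hypothetical wrapper around one BPF program.
type probe struct {
	Name, Kind string
	Required   bool
	Load       func() error // stand-in for the real load-and-attach step
}

// errUnsupported is a made-up failure used to exercise the error path.
var errUnsupported = errors.New("not supported on this kernel")

// loadAll tries each probe in order and returns one status line per attempt.
// It stops with an error at the first required probe that fails.
func loadAll(probes []probe) (lines []string, err error) {
	for _, p := range probes {
		if lerr := p.Load(); lerr != nil {
			lines = append(lines, fmt.Sprintf("✗ %s (%s): %v", p.Name, p.Kind, lerr))
			if p.Required {
				return lines, fmt.Errorf("required probe %s: %w", p.Name, lerr)
			}
			continue // optional probe: keep going in degraded mode
		}
		lines = append(lines, fmt.Sprintf("✓ %s (%s)", p.Name, p.Kind))
	}
	return lines, nil
}

func main() {
	ok := func() error { return nil }
	probes := []probe{
		{Name: "syscall_counter", Kind: "tracepoint", Required: true, Load: ok},
		{Name: "net_stats", Kind: "XDP", Load: func() error { return errUnsupported }},
	}
	lines, err := loadAll(probes)
	for _, l := range lines {
		fmt.Println(l)
	}
	if err != nil {
		fmt.Fprintln(os.Stderr, "fatal:", err)
		os.Exit(1) // non-zero exit so a supervisor notices the failure
	}
}
```

Because the error is wrapped with `%w`, callers can still test for the underlying cause with `errors.Is`.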
References
- Main guide: LEARN_BPF_EBPF_LINUX.md
- “Learning eBPF” by Liz Rice