Project 4: Page Fault Analyzer
Build a tool that captures page faults, classifies them (major/minor/COW), and maps fault addresses to memory regions.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 4: Expert |
| Time Estimate | 2-3 weeks |
| Main Programming Language | C |
| Alternative Programming Languages | Rust, Python |
| Coolness Level | Level 4: Hardcore |
| Business Potential | Level 3: Performance tooling |
| Prerequisites | C, virtual memory concepts, /proc familiarity |
| Key Topics | Page faults, perf events, address mapping |
1. Learning Objectives
By completing this project, you will:
- Distinguish minor, major, and COW page faults.
- Capture page-fault events with perf/tracepoints.
- Map fault addresses to VMAs via /proc/<pid>/maps.
- Correlate faults with file-backed vs anonymous regions.
- Produce deterministic summaries and histograms.
- Explain how faults impact performance.
2. All Theory Needed (Per-Concept Breakdown)
2.1 Page Faults and Demand Paging
Fundamentals
A page fault occurs when the CPU references a virtual page that is not currently present, or that is present but does not permit the access. The kernel resolves the fault either without disk I/O (minor fault) or by reading the page from disk (major fault). Copy-on-write (COW) faults happen when a process writes to a page it shares read-only with another process.
Deep Dive into the concept
Page tables translate virtual addresses to physical frames. If the present bit is clear, or the access violates the entry's permissions, the CPU raises a fault. The kernel examines the VMA and decides whether the access is valid. For anonymous memory, it may allocate a new zeroed page. For file-backed memory, it maps the page straight from the page cache when it is already there (minor fault) and otherwise reads it from disk (major fault). COW faults occur when multiple processes share a read-only page after fork; the first write triggers a copy. Understanding this path lets you interpret which faults are truly expensive.
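The decision path above can be sketched as a small classifier. This is a minimal sketch under stated assumptions: the capture layer is assumed to supply whether the kernel accounted the fault as major, plus the x86 page-fault error code. The names `classify_fault` and `fault_kind_name` and the COW heuristic are illustrative, not something this guide prescribes; only the error-code bit positions are fixed by the x86 architecture.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 page-fault error code bits (hardware-defined). */
#define PF_PROT  (1u << 0)  /* 1 = protection violation, 0 = page not present */
#define PF_WRITE (1u << 1)  /* 1 = write access, 0 = read access */
#define PF_USER  (1u << 2)  /* 1 = fault raised in user mode */

enum fault_kind { FAULT_MINOR, FAULT_MAJOR, FAULT_COW };

/*
 * Hypothetical classifier. `was_major` must come from the kernel's own
 * accounting (for example a separate major-fault event); it cannot be
 * derived from the error code. The COW test is a heuristic: a write that
 * hit a present but write-protected page is usually, not always, a
 * copy-on-write fault.
 */
enum fault_kind classify_fault(bool was_major, uint32_t error_code)
{
    if (was_major)
        return FAULT_MAJOR;
    if ((error_code & PF_PROT) && (error_code & PF_WRITE))
        return FAULT_COW;
    return FAULT_MINOR;
}

/* Label used in the text stream (MINOR / MAJOR / COW). */
const char *fault_kind_name(enum fault_kind k)
{
    switch (k) {
    case FAULT_MAJOR: return "MAJOR";
    case FAULT_COW:   return "COW";
    default:          return "MINOR";
    }
}
```

For example, `classify_fault(false, PF_PROT | PF_WRITE | PF_USER)` yields `FAULT_COW`, while a read of a not-present page (`PF_USER` only) yields `FAULT_MINOR`.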
How this fits on projects
Fault classification powers Section 3.2 and the output in Section 3.7.
Definitions & key terms
- minor fault -> resolved without disk I/O
- major fault -> requires disk I/O
- COW -> copy-on-write fault
- present bit -> page table flag indicating page is resident
Mental model diagram (ASCII)
VA -> page table -> present? no -> page fault -> kernel -> map page
How it works (step-by-step)
- CPU accesses a virtual address.
- Page table lookup finds page not present.
- CPU triggers page fault exception.
- Kernel resolves (allocate or read from disk).
- Execution resumes.
Minimal concrete example
mmap file -> first read -> minor/major fault -> page loaded
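A quick way to watch demand paging without any tracing is to compare the process's own fault counters around a first touch. The sketch below swaps the file mapping for an anonymous one so it needs no input file, and uses `getrusage` rather than perf; the function name `minor_faults_for_touch` is hypothetical.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/*
 * Touch `pages` fresh anonymous pages and return how many minor faults the
 * process accumulated while doing so, or -1 on error.
 */
long minor_faults_for_touch(size_t pages)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || pages == 0)
        return -1;
    size_t len = pages * (size_t)page;

    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    struct rusage before, after;
    if (getrusage(RUSAGE_SELF, &before) != 0) {
        munmap(p, len);
        return -1;
    }

    /* First write to each page: the kernel has to back it with a frame. */
    for (size_t i = 0; i < pages; i++)
        ((volatile unsigned char *)p)[i * (size_t)page] = 1;

    int rc = getrusage(RUSAGE_SELF, &after);
    munmap(p, len);
    return rc == 0 ? after.ru_minflt - before.ru_minflt : -1;
}
```

On a typical Linux system the result is expected to be close to `pages`, but the exact number depends on kernel configuration, so treat it as an observation rather than a guarantee.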
Common misconceptions
- Misconception: all page faults are bad. Correction: minor faults are normal and cheap.
Check-your-understanding questions
- What distinguishes a major from a minor fault?
- When do COW faults happen?
Check-your-understanding answers
- Major faults require disk I/O.
- When a shared read-only page is written.
Real-world applications
- Performance analysis of databases and caches
- Understanding cold-start latency
Where you’ll apply it
- This project: Section 3.2 Functional Requirements, Section 3.7 Output.
- Also used in: P02-memory-allocator-malloc-free-from-scratch.
References
- OSTEP, VM chapters
- perf_event_open(2) docs
Key insights
Not all faults are equal; classification matters.
Summary
Page faults are how virtual memory becomes physical reality.
Homework/Exercises to practice the concept
- Use mmap and touch a file to observe faults.
- Compare faults after drop_caches.
Solutions to the homework/exercises
- First touch triggers faults; subsequent touches do not.
- After cache drop, major faults increase.
2.2 perf_event_open and Tracepoint Sampling
Fundamentals
perf_event_open can subscribe to kernel events such as the page-faults software event or the page-fault tracepoints, and deliver them through a ring buffer. You can use it in counting mode or sampling mode.
Deep Dive into the concept
perf events are configured with a perf_event_attr struct. In sampling mode, the kernel writes event records into a ring buffer. You mmap the buffer and parse the records from it. Samples can carry metadata such as the faulting address (requested with PERF_SAMPLE_ADDR); on x86 the page-fault tracepoints also expose the hardware error code. Properly handling buffer overruns and event sizes is critical for correctness.
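A minimal sketch of that configuration, assuming the software page-fault event is used as the source and every fault should produce one sample. The helper names `init_fault_attr` and `open_fault_event` are hypothetical; the constants and struct fields come from `<linux/perf_event.h>`.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Fill `attr` so that each page fault yields a sample with its address. */
void init_fault_attr(struct perf_event_attr *attr)
{
    memset(attr, 0, sizeof *attr);
    attr->type = PERF_TYPE_SOFTWARE;
    attr->size = sizeof *attr;
    attr->config = PERF_COUNT_SW_PAGE_FAULTS;
    attr->sample_period = 1;   /* one record per fault */
    attr->sample_type = PERF_SAMPLE_TID | PERF_SAMPLE_TIME | PERF_SAMPLE_ADDR;
    attr->disabled = 1;        /* enable later with PERF_EVENT_IOC_ENABLE */
    attr->exclude_kernel = 1;  /* user-mode faults only */
    attr->exclude_hv = 1;
}

/*
 * Open the event for `pid` on any CPU. Returns a file descriptor, or -1
 * with errno set. glibc has no wrapper, so the raw syscall is used.
 */
int open_fault_event(pid_t pid)
{
    struct perf_event_attr attr;
    init_fault_attr(&attr);
    return (int)syscall(SYS_perf_event_open, &attr, pid, -1, -1, 0UL);
}
```

The returned descriptor is then mapped with `mmap` using a length of one metadata page plus a power-of-two number of data pages, as described in perf_event_open(2).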
How this fits on projects
This is the event capture engine in Section 4.2 and Section 5.10 Phase 1.
Definitions & key terms
- perf event -> kernel performance counter or tracepoint
- ring buffer -> circular buffer used for events
- sample -> event record containing metadata
Mental model diagram (ASCII)
kernel tracepoint -> perf ring buffer -> user parser
How it works (step-by-step)
- Configure perf_event_attr for the page-fault tracepoint.
- mmap the ring buffer.
- Poll and read event records.
- Decode address and fault type.
Minimal concrete example
int fd = syscall(SYS_perf_event_open, &attr, pid, -1, -1, 0); /* glibc provides no perf_event_open() wrapper */
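Once the buffer is mapped, records have to be walked between the consumer and producer positions, and a record may wrap around the end of the data area. The sketch below shows one way to do that, assuming samples were requested with `PERF_SAMPLE_TID | PERF_SAMPLE_TIME | PERF_SAMPLE_ADDR`; the struct names and `drain_samples` are hypothetical, and the positions are passed in as plain integers so the logic can be exercised on a synthetic buffer.

```c
#include <linux/perf_event.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Record layout implied by sample_type = TID | TIME | ADDR (an assumption). */
struct fault_sample {
    struct perf_event_header header;
    uint32_t pid, tid;
    uint64_t time;
    uint64_t addr;
};

/* PERF_RECORD_LOST body: how many records the kernel had to drop. */
struct lost_record {
    struct perf_event_header header;
    uint64_t id;
    uint64_t lost;
};

/* Copy `len` bytes starting at free-running position `pos`, handling
 * wraparound. `size` is the data area size and must be a power of two. */
static void ring_copy(void *dst, const uint8_t *data, uint64_t size,
                      uint64_t pos, size_t len)
{
    uint64_t off = pos & (size - 1);
    size_t first = len < size - off ? len : (size_t)(size - off);
    memcpy(dst, data + off, first);
    memcpy((uint8_t *)dst + first, data, len - first);
}

/*
 * Walk records in [*tail, head), store up to `max_addrs` fault addresses,
 * add dropped-record counts to *lost, and advance *tail past everything
 * consumed. Returns the number of addresses stored.
 */
size_t drain_samples(const uint8_t *data, uint64_t size,
                     uint64_t *tail, uint64_t head,
                     uint64_t *addrs, size_t max_addrs, uint64_t *lost)
{
    size_t n = 0;
    while (*tail < head && n < max_addrs) {
        struct perf_event_header hdr;
        if (head - *tail < sizeof hdr)
            break;
        ring_copy(&hdr, data, size, *tail, sizeof hdr);
        if (hdr.size < sizeof hdr || hdr.size > head - *tail)
            break;  /* corrupt or incomplete record */
        if (hdr.type == PERF_RECORD_SAMPLE &&
            hdr.size >= sizeof(struct fault_sample)) {
            struct fault_sample s;
            ring_copy(&s, data, size, *tail, sizeof s);
            addrs[n++] = s.addr;
        } else if (hdr.type == PERF_RECORD_LOST &&
                   hdr.size >= sizeof(struct lost_record)) {
            struct lost_record l;
            ring_copy(&l, data, size, *tail, sizeof l);
            *lost += l.lost;
        }
        *tail += hdr.size;
    }
    return n;
}
```

In a real reader, `head` would be loaded from `data_head` in the mapped `struct perf_event_mmap_page` with acquire semantics, and the updated tail written back to `data_tail` only after the records have been copied out.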
Common misconceptions
- Misconception: perf buffers are ordered across CPUs. Correction: they’re per-CPU; ordering is best-effort.
Check-your-understanding questions
- What happens if the ring buffer overflows?
- Why use sampling instead of counting?
Check-your-understanding answers
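- Events are lost: by default the kernel drops new records and reports the gap with a PERF_RECORD_LOST record; in overwrite mode it overwrites the oldest records instead.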
- Sampling gives per-fault metadata (addresses).
Real-world applications
- Profiling (perf, bcc, eBPF tools)
Where you’ll apply it
- This project: Section 4.4 Algorithm Overview, Section 5.10 Phase 2.
- Also used in: P07-interrupt-latency-profiler.
References
- perf_event_open(2) man page
- Kernel perf documentation
Key insights
perf is the bridge from kernel events to user insight.
Summary
Without perf, fault analysis is guesswork.
Homework/Exercises to practice the concept
- Count page faults with perf stat.
- Write a program that reads raw perf events.
Solutions to the homework/exercises
- perf stat -e page-faults ./prog.
- Use perf_event_open with a tiny ring buffer and parse records.
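Before parsing a ring buffer, it can help to confirm that the event fires at all by using counting mode on the current process. This is a sketch under the assumption that self-monitoring is permitted by the system's perf_event_paranoid setting; `count_own_faults` is a hypothetical name.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Count the page faults this thread takes while touching `pages` fresh
 * anonymous pages. Returns the count, or -1 if the event cannot be opened
 * (for example when perf_event_paranoid forbids it) or another step fails.
 */
long long count_own_faults(size_t pages)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || pages == 0)
        return -1;

    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.type = PERF_TYPE_SOFTWARE;
    attr.size = sizeof attr;
    attr.config = PERF_COUNT_SW_PAGE_FAULTS;
    attr.disabled = 1;
    attr.exclude_kernel = 1;
    attr.exclude_hv = 1;

    /* pid = 0, cpu = -1: the calling thread, on whichever CPU it runs. */
    int fd = (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
    if (fd < 0)
        return -1;

    size_t len = pages * (size_t)page;
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    for (size_t i = 0; i < pages; i++)
        ((volatile unsigned char *)p)[i * (size_t)page] = 1;
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    /* In counting mode, read() returns the 64-bit counter value. */
    uint64_t count = 0;
    ssize_t got = read(fd, &count, sizeof count);
    munmap(p, len);
    close(fd);
    return got == (ssize_t)sizeof count ? (long long)count : -1;
}
```

A return value of -1 usually points at permissions rather than at a bug in the tool, which is worth checking before moving on to sampling.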
2.3 Mapping Fault Addresses to VMAs
Fundamentals
A fault address only becomes meaningful when mapped to a region: stack, heap, or file-backed segment. /proc/<pid>/maps lists these regions with permissions and file names.
Deep Dive into the concept
Each VMA includes start/end, permissions, offset, device, inode, and pathname. By scanning /proc/<pid>/maps, you can determine which region contains a fault address and whether it’s anonymous or file-backed. Combining this with ELF symbolization (optional) lets you see which binary segment caused the fault. You must refresh maps periodically because VMAs change as the process allocates or maps files.
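The fields listed above can be pulled out of each line with a single `sscanf`. The sketch below is one possible parser, not a prescribed one: it extends the `struct vma` from Section 4.3 with `perms` and `offset` fields (an assumption), skips the device and inode columns, and uses the hypothetical names `parse_maps_line` and `load_maps`.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Section 4.3's struct, extended with perms and offset (an assumption). */
struct vma {
    uint64_t start, end, offset;
    char perms[5];    /* e.g. "r-xp" */
    char path[128];   /* empty for anonymous mappings */
};

/* Parse one line of /proc/<pid>/maps. Returns 0 on success, -1 otherwise. */
int parse_maps_line(const char *line, struct vma *out)
{
    out->path[0] = '\0';
    /* start-end perms offset dev inode [pathname]; dev and inode skipped. */
    int n = sscanf(line,
                   "%" SCNx64 "-%" SCNx64 " %4s %" SCNx64 " %*s %*s %127[^\n]",
                   &out->start, &out->end, out->perms, &out->offset,
                   out->path);
    if (n < 4 || out->start >= out->end)
        return -1;
    return 0;
}

/*
 * Load up to `max` regions for `pid`. Returns the number parsed, or -1 if
 * the maps file cannot be opened. Call again to refresh a stale snapshot.
 */
long load_maps(int pid, struct vma *out, size_t max)
{
    char name[64], line[4352];
    snprintf(name, sizeof name, "/proc/%d/maps", pid);
    FILE *f = fopen(name, "r");
    if (!f)
        return -1;
    size_t n = 0;
    while (n < max && fgets(line, sizeof line, f))
        if (parse_maps_line(line, &out[n]) == 0)
            n++;
    fclose(f);
    return (long)n;
}
```

Pathnames longer than 127 bytes are truncated in this sketch; a production parser would size the buffer from `PATH_MAX` or allocate dynamically.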
How this fits on projects
This is how you turn fault addresses into human-readable output in Section 3.7.
Definitions & key terms
- VMA -> virtual memory area entry
- anonymous mapping -> no backing file
- file-backed mapping -> region mapped from a file
Mental model diagram (ASCII)
fault addr 0x7f... -> /proc/pid/maps -> libc.so.6 [text]
How it works (step-by-step)
- Read /proc/<pid>/maps into a list of regions.
- For each fault address, find matching region.
- Label as stack/heap/file/anon.
- Optionally symbolize within ELF.
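The lookup and labeling steps can be sketched as a binary search over the region list, which /proc/<pid>/maps already provides in ascending address order. The function names `find_vma` and `region_label` and the label strings are illustrative choices, not part of the specification.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct vma {
    uint64_t start, end;  /* half-open range [start, end) */
    char path[128];       /* empty for anonymous mappings */
};

/* Binary search over regions sorted by start address. Returns the region
 * containing `addr`, or NULL if the address falls outside the snapshot. */
const struct vma *find_vma(const struct vma *v, size_t n, uint64_t addr)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (addr < v[mid].start)
            hi = mid;
        else if (addr >= v[mid].end)
            lo = mid + 1;
        else
            return &v[mid];
    }
    return NULL;
}

/* Coarse label for a region; NULL means the lookup found nothing. */
const char *region_label(const struct vma *v)
{
    if (v == NULL)                       return "unknown";
    if (v->path[0] == '\0')              return "anon";
    if (strcmp(v->path, "[stack]") == 0) return "stack";
    if (strcmp(v->path, "[heap]") == 0)  return "heap";
    if (v->path[0] == '[')               return "special";  /* e.g. [vdso] */
    return "file";
}
```

A `NULL` result is the signal to refresh the maps snapshot and retry once before reporting the fault as unknown.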
Minimal concrete example
7f2c7b100000-7f2c7b200000 r-xp ... /lib/x86_64-linux-gnu/libc.so.6
Common misconceptions
- Misconception: /proc/<pid>/maps is static. Correction: it changes as mappings change.
Check-your-understanding questions
- How do you detect a stack fault?
- What does an empty pathname mean?
Check-your-understanding answers
- Region labeled [stack] in maps.
- Anonymous mapping.
Real-world applications
- Memory profiling and leak analysis
Where you’ll apply it
- This project: Section 3.4 Example Output, Section 5.10 Phase 2.
- Also used in: P02-memory-allocator-malloc-free-from-scratch.
References
- proc(5) man page
Key insights
A fault address is meaningless without its VMA context.
Summary
Mapping faults to VMAs makes the data actionable.
Homework/Exercises to practice the concept
- Trigger a stack growth fault and observe maps.
- Map a file and identify its region.
Solutions to the homework/exercises
- Recursively allocate and watch [stack] grow.
- mmap a file and find its pathname in maps.
3. Project Specification
3.1 What You Will Build
A CLI tool pagefault-analyzer that attaches to a PID and streams page-fault events with classification and region labeling. It also provides summary stats and histograms.
3.2 Functional Requirements
- Attach to target PID and capture page-fault events.
- Classify faults into minor/major/COW.
- Map fault addresses to VMAs and label regions.
- Output live stream and summary report.
- Provide deterministic output with --fixed-ts.
3.3 Non-Functional Requirements
- Performance: handle 10k faults/sec without dropping.
- Reliability: graceful exit on target termination.
- Usability: readable output and clear error messages.
3.4 Example Usage / Output
$ sudo ./pagefault-analyzer --fixed-ts -p 4321
[000000.001] MINOR 0x7f2c7b100000 libc.so.6 .text
[000000.012] MAJOR 0x400000 demo.bin .text 12.4ms
Summary: major=12 minor=542 cow=3
3.5 Data Formats / Schemas / Protocols
- Text stream format:
[ts] <type> <addr> <region> <latency>
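One way to emit this format is a single formatting helper. The sketch assumes the timestamp is supplied in milliseconds since tracing started and that a negative latency means "omit the field"; both conventions, and the name `format_event`, are assumptions made for illustration.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Write one event line into `buf`:
 *   [ssssss.mmm] <type> <addr> <region> [<latency>ms]
 * Returns the number of characters written (excluding the terminator),
 * or a negative value on a formatting error.
 */
int format_event(char *buf, size_t len, uint64_t ts_ms, const char *type,
                 uint64_t addr, const char *region, double latency_ms)
{
    int n = snprintf(buf, len,
                     "[%06" PRIu64 ".%03" PRIu64 "] %s 0x%" PRIx64 " %s",
                     ts_ms / 1000, ts_ms % 1000, type, addr, region);
    if (n < 0 || (size_t)n >= len || latency_ms < 0)
        return n;

    /* Append the optional latency field. */
    int m = snprintf(buf + n, len - (size_t)n, " %.1fms", latency_ms);
    return m < 0 ? m : n + m;
}
```

Keeping all formatting in one function makes the deterministic mode easier to test, because the timestamp is an input rather than something read from a clock.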
3.6 Edge Cases
- Target exits while tracing.
- Missing permissions (perf_event_paranoid).
- Fault address outside current maps snapshot.
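For the first edge case, a simple liveness probe is often enough to decide whether an empty read means "no faults yet" or "target gone". This is a sketch using `kill` with signal 0, which performs the existence and permission checks without delivering a signal; `target_alive` is a hypothetical helper.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if `pid` exists, 0 if it is gone, -1 on invalid input or any
 * other error. A zombie still counts as existing until it is reaped. */
int target_alive(pid_t pid)
{
    if (pid <= 0)
        return -1;
    if (kill(pid, 0) == 0)
        return 1;
    if (errno == EPERM)  /* exists, but we are not allowed to signal it */
        return 1;
    return errno == ESRCH ? 0 : -1;
}
```

The tracer could call this when the ring buffer stays empty, then print the summary and exit cleanly once the target has disappeared.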
3.7 Real World Outcome
3.7.1 How to Run (Copy/Paste)
sudo ./pagefault-analyzer --fixed-ts -p 4321
3.7.2 Golden Path Demo (Deterministic)
- Use --fixed-ts and a fixed workload to produce stable output.
3.7.3 CLI Transcript (Success + Failure)
$ sudo ./pagefault-analyzer --fixed-ts -p 4321
[000000.001] MINOR 0x7f... libc.so.6 .text
Summary: major=1 minor=12 cow=0
$ sudo ./pagefault-analyzer -p 1
error: perf_event_paranoid too strict (see /proc/sys/kernel/perf_event_paranoid)
exit code: 2
3.7.4 Exit Codes
- 0 -> success
- 2 -> permission/config error
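A small mapping from the `errno` left by a failed perf_event_open call to these exit codes might look like the sketch below. Exit code 1 for all other failures is a hypothetical addition, since only 0 and 2 are specified here, and the helper name is illustrative.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* 0 and 2 are specified above; 1 is a hypothetical catch-all. */
enum { PFA_EXIT_OK = 0, PFA_EXIT_OTHER = 1, PFA_EXIT_PERM = 2 };

/* Translate the errno from opening the event into a process exit code,
 * printing a diagnostic to stderr for failures. */
int exit_code_for_open_errno(int err)
{
    switch (err) {
    case 0:
        return PFA_EXIT_OK;
    case EACCES:
    case EPERM:
        fprintf(stderr, "error: perf_event_paranoid too strict "
                        "(see /proc/sys/kernel/perf_event_paranoid)\n");
        return PFA_EXIT_PERM;
    default:
        fprintf(stderr, "error: perf_event_open: %s\n", strerror(err));
        return PFA_EXIT_OTHER;
    }
}
```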
4. Solution Architecture
4.1 High-Level Design
perf event reader -> classifier -> VMA mapper -> reporter
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| perf reader | read page-fault tracepoints | mmap ring buffer |
| classifier | minor/major/COW | tracepoint flags |
| mapper | VMA lookup | periodic refresh |
| reporter | output + summary | fixed timestamp mode |
4.3 Data Structures (No Full Code)
struct vma {
    uint64_t start, end;  /* address range from /proc/<pid>/maps */
    char path[128];       /* backing file or [stack]/[heap]; empty if anonymous */
};
4.4 Algorithm Overview
- Setup perf event.
- Read events and classify.
- Map address to VMA.
- Emit output and update summary.
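The final step, updating the summary, can be sketched with a per-type counter array and a power-of-two latency histogram. The bucket scheme, the microsecond unit, and all names here are assumptions for illustration; the summary line matches the format shown in Section 3.4.

```c
#include <stdint.h>
#include <stdio.h>

enum fault_kind { FAULT_MINOR, FAULT_MAJOR, FAULT_COW, FAULT_KINDS };

#define LAT_BUCKETS 16

struct summary {
    uint64_t count[FAULT_KINDS];
    /* Bucket 0 holds 0-1 us; bucket i holds [2^i, 2^(i+1)) us; the last
     * bucket also absorbs everything larger. */
    uint64_t latency_hist[LAT_BUCKETS];
};

/* floor(log2(us)) clamped to the last bucket. */
unsigned latency_bucket(uint64_t us)
{
    unsigned b = 0;
    while (us > 1 && b < LAT_BUCKETS - 1) {
        us >>= 1;
        b++;
    }
    return b;
}

/* Account one fault of the given kind and latency. */
void summary_add(struct summary *s, enum fault_kind kind, uint64_t latency_us)
{
    s->count[kind]++;
    s->latency_hist[latency_bucket(latency_us)]++;
}

/* Render the one-line summary; returns snprintf's result. */
int summary_format(const struct summary *s, char *buf, size_t len)
{
    return snprintf(buf, len, "Summary: major=%llu minor=%llu cow=%llu",
                    (unsigned long long)s->count[FAULT_MAJOR],
                    (unsigned long long)s->count[FAULT_MINOR],
                    (unsigned long long)s->count[FAULT_COW]);
}
```

Fixed-size arrays keep the per-event update free of allocation, which helps when the target faults at a high rate.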
5. Implementation Guide
5.1 Development Environment Setup
sudo apt install linux-tools-common
5.2 Project Structure
pagefault-analyzer/
|-- src/
| |-- main.c
| |-- perf_reader.c
| |-- maps.c
| `-- report.c
`-- Makefile
5.3 The Core Question You’re Answering
“Where do page faults happen, and how do they map to a program’s memory layout?”
5.4 Concepts You Must Understand First
- Page faults and demand paging.
- perf_event_open and tracepoints.
- /proc/<pid>/maps parsing.
5.5 Questions to Guide Your Design
- What sampling frequency is safe?
- How often should you refresh VMAs?
- How will you handle missing mappings?
5.6 Thinking Exercise
Why might a program have many minor faults but no major faults?
5.7 The Interview Questions They’ll Ask
- Explain major vs minor faults.
- What is copy-on-write and why is it used?
5.8 Hints in Layers
- Hint 1: start in counting mode to verify events.
- Hint 2: switch to sampling and parse ring buffer.
- Hint 3: add VMA labeling.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Virtual memory | OSTEP | VM chapters |
| perf | man-pages | perf_event_open |
5.10 Implementation Phases
- Phase 1: collect fault counts.
- Phase 2: add address sampling + maps.
- Phase 3: add summary + histograms.
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit | maps parser | parse sample maps |
| Integration | fault capture | run cat on uncached file |
6.2 Critical Test Cases
- Faults captured for file-backed reads.
- Address mapping works for stack/heap.
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| perf_event_paranoid | no events | relax setting |
| stale maps | unknown regions | refresh periodically |
8. Extensions & Challenges
- Add symbolization with libbfd or addr2line.
- Export CSV for offline analysis.
9. Real-World Connections
- Cold-start latency analysis for services
10. Resources
- proc(5), perf_event_open(2)
11. Self-Assessment Checklist
- I can classify page faults correctly.
12. Submission / Completion Criteria
- Minimum: capture and classify faults.
- Full: map to VMAs and produce summary.
- Excellence: histogram + symbolization.
13. Determinism Notes
- Use --fixed-ts and a fixed workload.