Project 4: Page Fault Analyzer

Build a tool that captures page faults, classifies them (major/minor/COW), and maps fault addresses to memory regions.

Quick Reference

| Attribute | Value |
|-----------|-------|
| Difficulty | Level 4: Expert |
| Time Estimate | 2-3 weeks |
| Main Programming Language | C |
| Alternative Programming Languages | Rust, Python |
| Coolness Level | Level 4: Hardcore |
| Business Potential | Level 3: Performance tooling |
| Prerequisites | C, virtual memory concepts, /proc familiarity |
| Key Topics | Page faults, perf events, address mapping |

1. Learning Objectives

By completing this project, you will:

  1. Distinguish minor, major, and COW page faults.
  2. Capture page-fault events with perf/tracepoints.
  3. Map fault addresses to VMAs via /proc/<pid>/maps.
  4. Correlate faults with file-backed vs anonymous regions.
  5. Produce deterministic summaries and histograms.
  6. Explain how faults impact performance.

2. All Theory Needed (Per-Concept Breakdown)

2.1 Page Faults and Demand Paging

Fundamentals

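A page fault occurs when the CPU references a virtual address whose page-table entry is not present or does not permit the access. The kernel resolves the fault either without disk I/O, for example by mapping a zero-filled page or a page that is already in the page cache (minor fault), or by reading the page from disk (major fault). A copy-on-write (COW) fault happens on the first write to a page that is shared read-only, typically after fork.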

Deep Dive into the concept

Page tables translate virtual addresses to physical frames. If the present bit of an entry is clear, or the access violates the entry's permissions, the CPU raises a fault. The kernel looks up the VMA that contains the address and decides whether the access is valid. For anonymous memory, it usually allocates a new zero-filled page. For file-backed memory, it maps the page from the page cache if it is there (minor fault) or reads it from disk first (major fault). COW faults occur when processes share a page read-only after fork; the first write triggers a private copy and is normally counted as a minor fault. Understanding this path lets you interpret which faults are truly expensive.
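The copy-on-write path can be observed from user space. The sketch below is illustrative only: cow_faults_in_child is a hypothetical helper name, and it assumes that the kernel reports COW faults through the ru_minflt counter of getrusage (some sandboxes report zero). The parent populates a private anonymous buffer, forks, and the child measures how many minor faults it takes while writing one byte per page.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* MAP_ANONYMOUS under strict -std= modes */
#endif
#include <string.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the child's minor-fault delta while it writes
 * to `pages` pages inherited from the parent, or -1 on error. */
long cow_faults_in_child(size_t pages)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t len = pages * (size_t)page;
    volatile char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;
    memset((void *)buf, 1, len);        /* parent makes every page resident */

    int pipefd[2];
    if (pipe(pipefd) != 0) {
        munmap((void *)buf, len);
        return -1;
    }

    long delta = -1;
    pid_t child = fork();
    if (child == 0) {
        struct rusage before, after;
        close(pipefd[0]);
        getrusage(RUSAGE_SELF, &before);
        for (size_t i = 0; i < pages; i++)
            buf[i * (size_t)page] = 2;  /* first write to a shared page */
        getrusage(RUSAGE_SELF, &after);
        delta = after.ru_minflt - before.ru_minflt;
        if (write(pipefd[1], &delta, sizeof delta) != (ssize_t)sizeof delta)
            _exit(1);
        _exit(0);
    }

    close(pipefd[1]);
    if (child > 0) {
        if (read(pipefd[0], &delta, sizeof delta) != (ssize_t)sizeof delta)
            delta = -1;
        waitpid(child, NULL, 0);
    }
    close(pipefd[0]);
    munmap((void *)buf, len);
    return delta;
}
```

On a typical Linux system the returned value is expected to be close to the number of pages written, but treat that as something to verify rather than a guarantee.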

How this fits on projects

Fault classification powers Section 3.2 and the output in Section 3.7.

Definitions & key terms

  • minor fault -> resolved without disk I/O
  • major fault -> requires disk I/O
  • COW -> copy-on-write fault
  • present bit -> page table flag indicating page is resident

Mental model diagram (ASCII)

VA -> page table -> present? no -> page fault -> kernel -> map page

How it works (step-by-step)

  1. CPU accesses a virtual address.
  2. Page table lookup finds page not present.
  3. CPU triggers page fault exception.
  4. Kernel resolves (allocate or read from disk).
  5. Execution resumes.
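The steps above can be watched with a small experiment. This is a minimal sketch, assuming getrusage exposes the minor-fault counter on your system; touch_twice is a hypothetical helper name. It touches a fresh anonymous mapping twice and reports the minor-fault delta of each pass.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* MAP_ANONYMOUS under strict -std= modes */
#endif
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Write one byte per page and return how many minor faults that cost. */
static long touch_pass(volatile char *buf, size_t pages, size_t page)
{
    struct rusage before, after;
    getrusage(RUSAGE_SELF, &before);
    for (size_t i = 0; i < pages; i++)
        buf[i * page] = 1;
    getrusage(RUSAGE_SELF, &after);
    return after.ru_minflt - before.ru_minflt;
}

/* Hypothetical helper: touch a fresh anonymous mapping twice.
 * *first receives the fault count of the first pass, *second of the second.
 * Returns 0 on success, -1 if the mapping could not be created. */
int touch_twice(size_t pages, long *first, long *second)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = pages * page;
    volatile char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;
    *first = touch_pass(buf, pages, page);   /* pages not yet mapped */
    *second = touch_pass(buf, pages, page);  /* pages already mapped */
    munmap((void *)buf, len);
    return 0;
}
```

The first pass is expected to fault roughly once per page and the second pass hardly at all, although transparent huge pages or a sandboxed kernel can change the numbers.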

Minimal concrete example

mmap file -> first read -> fault (minor if the page is already cached, major if it must be read from disk) -> page mapped
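Whether that first read is likely to be minor or major depends on whether the file's pages are already cached. One way to peek at this is mincore(2). The following is a sketch, not part of the project specification: resident_pages is a hypothetical helper, and recent kernels may limit what mincore reveals for files the caller cannot write, so treat the result as a hint.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* mincore() under strict -std= modes */
#endif
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map `path` read-only and count how many of its pages
 * mincore() reports as resident. Stores the page count in *total.
 * Returns the resident count, or -1 on error (including an empty file). */
long resident_pages(const char *path, long *total)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }

    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = (size_t)st.st_size;
    size_t npages = (len + page - 1) / page;

    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                          /* the mapping stays valid */
    if (map == MAP_FAILED)
        return -1;

    long resident = -1;
    unsigned char *vec = malloc(npages);
    if (vec != NULL && mincore(map, len, vec) == 0) {
        resident = 0;
        for (size_t i = 0; i < npages; i++)
            resident += vec[i] & 1;     /* bit 0 set means resident */
    }
    free(vec);
    munmap(map, len);
    *total = (long)npages;
    return resident;
}
```

Pages reported as resident should fault as minor on first access; the rest are candidates for major faults.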

Common misconceptions

  • Misconception: all page faults are bad. Correction: minor faults are normal and cheap.

Check-your-understanding questions

  1. What distinguishes a major from a minor fault?
  2. When do COW faults happen?

Check-your-understanding answers

  1. Major faults require disk I/O.
  2. When a shared read-only page is written.

Real-world applications

  • Performance analysis of databases and caches
  • Understanding cold-start latency

Where you’ll apply it

References

  • OSTEP, VM chapters
  • perf_event_open(2) docs

Key insights

Not all faults are equal; classification matters.

Summary

Page faults are how virtual memory becomes physical reality.

Homework/Exercises to practice the concept

  1. Use mmap and touch a file to observe faults.
  2. Compare faults after drop_caches.

Solutions to the homework/exercises

  1. First touch triggers faults; subsequent touches do not.
  2. After cache drop, major faults increase.

2.2 perf_event_open and Tracepoint Sampling

Fundamentals

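perf_event_open can subscribe to the kernel's page-fault software events (page-faults, minor-faults, major-faults) or, where the kernel exposes them, to page-fault tracepoints, and deliver records through a ring buffer. You can use it in counting mode or sampling mode.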

Deep Dive into the concept

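perf events are configured with a perf_event_attr struct. In counting mode you read a running total from the file descriptor. In sampling mode the kernel writes variable-size event records into a ring buffer that you mmap and parse. With PERF_SAMPLE_ADDR set, each page-fault sample carries the faulting address. The software events distinguish minor from major faults by event type (PERF_COUNT_SW_PAGE_FAULTS_MIN and PERF_COUNT_SW_PAGE_FAULTS_MAJ); tracepoints, where available, add fields such as the hardware error code. Properly handling buffer overruns and variable record sizes is critical for correctness.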
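As a starting point, counting mode needs only a zeroed perf_event_attr with a few fields set. The sketch below is one possible setup, not the required one: init_fault_counter_attr and sys_perf_event_open are hypothetical names, and the choice to exclude kernel-mode faults is an assumption made so the event can be opened with fewer privileges.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* syscall() under strict -std= modes */
#endif
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: prepare `attr` to count one software page-fault event.
 * `config` is PERF_COUNT_SW_PAGE_FAULTS, PERF_COUNT_SW_PAGE_FAULTS_MIN,
 * or PERF_COUNT_SW_PAGE_FAULTS_MAJ. */
void init_fault_counter_attr(struct perf_event_attr *attr,
                             unsigned long long config)
{
    memset(attr, 0, sizeof *attr);
    attr->type = PERF_TYPE_SOFTWARE;
    attr->size = sizeof *attr;
    attr->config = config;
    attr->disabled = 1;         /* start stopped; enable via ioctl later */
    attr->exclude_kernel = 1;   /* assumption: skip faults taken in kernel mode */
    attr->exclude_hv = 1;
}

/* glibc provides no wrapper for perf_event_open, so call it via syscall(2).
 * Returns a file descriptor, or -1 with errno set. */
long sys_perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
                         int group_fd, unsigned long flags)
{
    return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}
```

After opening, the counter is started with ioctl(fd, PERF_EVENT_IOC_ENABLE, 0) and its 64-bit value is read with read(2). Opening can fail with EACCES or EPERM depending on perf_event_paranoid and on your permissions over the target PID.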

How this fits on projects

This is the event capture engine in Section 4.2 and Section 5.10 Phase 1.

Definitions & key terms

  • perf event -> kernel performance counter or tracepoint
  • ring buffer -> circular buffer used for events
  • sample -> event record containing metadata

Mental model diagram (ASCII)

kernel tracepoint -> perf ring buffer -> user parser

How it works (step-by-step)

  1. Configure perf_event_attr for the page-fault tracepoint.
  2. mmap the ring buffer.
  3. Poll and read event records.
  4. Decode address and fault type.
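Steps 3 and 4 come down to walking variable-size records in a circular byte buffer. The sketch below shows one way to do that, under loudly stated assumptions: sample_type is exactly PERF_SAMPLE_TID | PERF_SAMPLE_TIME | PERF_SAMPLE_ADDR (so the sample body has the layout of struct fault_sample), the data area size is a power of two, and head and tail are the free-running byte offsets from the mmap control page. The names fault_sample, ring_copy, and next_fault_sample are hypothetical.

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>

/* Body of PERF_RECORD_SAMPLE for the assumed sample_type (TID | TIME | ADDR). */
struct fault_sample {
    uint32_t pid, tid;
    uint64_t time;
    uint64_t addr;              /* faulting virtual address */
};

/* Copy `len` bytes that start at logical offset `off`, handling wraparound. */
static void ring_copy(void *dst, const uint8_t *data, uint64_t size,
                      uint64_t off, size_t len)
{
    uint64_t pos = off & (size - 1);
    size_t first = len < size - pos ? len : (size_t)(size - pos);
    memcpy(dst, data + pos, first);
    memcpy((uint8_t *)dst + first, data, len - first);
}

/* Decode the next sample in [*tail, head). Returns 1 and fills *out when a
 * sample was found, 0 when the buffer is drained. PERF_RECORD_LOST counts are
 * added to *lost; other record types are skipped. *tail is advanced. */
int next_fault_sample(const uint8_t *data, uint64_t size, uint64_t head,
                      uint64_t *tail, struct fault_sample *out, uint64_t *lost)
{
    while (head - *tail >= sizeof(struct perf_event_header)) {
        struct perf_event_header hdr;
        ring_copy(&hdr, data, size, *tail, sizeof hdr);
        if (hdr.size < sizeof hdr || hdr.size > head - *tail)
            break;              /* partial or corrupt record: stop here */

        uint64_t body = *tail + sizeof hdr;
        size_t body_len = hdr.size - sizeof hdr;
        *tail += hdr.size;

        if (hdr.type == PERF_RECORD_SAMPLE && body_len >= sizeof *out) {
            ring_copy(out, data, size, body, sizeof *out);
            return 1;
        }
        if (hdr.type == PERF_RECORD_LOST && body_len >= 2 * sizeof(uint64_t)) {
            uint64_t id_and_lost[2];    /* { id, lost } */
            ring_copy(id_and_lost, data, size, body, sizeof id_and_lost);
            *lost += id_and_lost[1];
        }
    }
    return 0;
}
```

In a real reader, data would point at the mapped region plus data_offset from struct perf_event_mmap_page, head would be loaded from data_head with an acquire barrier, and the final tail would be stored back to data_tail so the kernel can reuse the space.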

Minimal concrete example

/* glibc provides no perf_event_open wrapper; invoke the raw syscall. */
int fd = syscall(SYS_perf_event_open, &attr, pid, -1, -1, 0);

Common misconceptions

  • Misconception: perf buffers are ordered across CPUs. Correction: they’re per-CPU; ordering is best-effort.

Check-your-understanding questions

  1. What happens if the ring buffer overflows?
  2. Why use sampling instead of counting?

Check-your-understanding answers

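  1. You lose events. In the default (non-overwrite) mode the kernel drops new records while the buffer is full and reports the number dropped in a PERF_RECORD_LOST record; in overwrite mode the oldest records are overwritten instead.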
  2. Sampling gives per-fault metadata (addresses).

Real-world applications

  • Profiling (perf, bcc, eBPF tools)

Where you’ll apply it

References

  • perf_event_open(2) man page
  • Kernel perf documentation

Key insights

perf is the bridge from kernel events to user insight.

Summary

Without perf, fault analysis is guesswork.

Homework/Exercises to practice the concept

  1. Count page faults with perf stat.
  2. Write a program that reads raw perf events.

Solutions to the homework/exercises

  1. Run perf stat -e page-faults ./prog and read the count from the summary.
  2. Use perf_event_open with a tiny ring buffer and parse records.

2.3 Mapping Fault Addresses to VMAs

Fundamentals

A fault address only becomes meaningful when mapped to a region: stack, heap, or file-backed segment. /proc/<pid>/maps lists these regions with permissions and file names.

Deep Dive into the concept

Each VMA includes start/end, permissions, offset, device, inode, and pathname. By scanning /proc/<pid>/maps, you can determine which region contains a fault address and whether it’s anonymous or file-backed. Combining this with ELF symbolization (optional) lets you see which binary segment caused the fault. You must refresh maps periodically because VMAs change as the process allocates or maps files.
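Parsing one maps line is mostly a single sscanf call. The sketch below is one possible parser, not the required one: parse_maps_line is a hypothetical name, and the struct extends the project's struct vma with a perms field for illustration. It assumes the field order documented in proc(5): address range, permissions, offset, device, inode, optional pathname.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct vma {
    uint64_t start, end;        /* half-open range [start, end) */
    char perms[5];              /* e.g. "r-xp" (added for illustration) */
    char path[128];             /* empty for anonymous mappings */
};

/* Hypothetical helper: parse one line of /proc/<pid>/maps into *out.
 * Returns 0 on success, -1 if the line does not match the expected layout. */
int parse_maps_line(const char *line, struct vma *out)
{
    uint64_t offset;
    unsigned dev_major, dev_minor;
    unsigned long long inode;
    int consumed = 0;

    out->path[0] = '\0';
    int n = sscanf(line,
                   "%" SCNx64 "-%" SCNx64 " %4s %" SCNx64 " %x:%x %llu%n",
                   &out->start, &out->end, out->perms, &offset,
                   &dev_major, &dev_minor, &inode, &consumed);
    if (n < 7)
        return -1;

    /* The pathname, if any, follows the inode after padding spaces. */
    const char *p = line + consumed;
    while (*p == ' ' || *p == '\t')
        p++;
    size_t len = strcspn(p, "\n");
    if (len >= sizeof out->path)
        len = sizeof out->path - 1;     /* truncate overly long paths */
    memcpy(out->path, p, len);
    out->path[len] = '\0';
    return 0;
}
```

Pathnames that contain spaces are kept whole because everything after the inode is copied up to the newline.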

How this fits on projects

This is how you turn fault addresses into human-readable output in Section 3.7.

Definitions & key terms

  • VMA -> virtual memory area entry
  • anonymous mapping -> no backing file
  • file-backed mapping -> region mapped from a file

Mental model diagram (ASCII)

fault addr 0x7f... -> /proc/pid/maps -> libc.so.6 [text]

How it works (step-by-step)

  1. Read /proc/<pid>/maps into a list of regions.
  2. For each fault address, find matching region.
  3. Label as stack/heap/file/anon.
  4. Optionally symbolize within ELF.
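Steps 2 and 3 can be sketched as a binary search followed by a label lookup. This assumes the regions are stored in ascending address order, which is how /proc/<pid>/maps lists them; find_vma and region_kind are hypothetical names, and the label strings are illustrative.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct vma {
    uint64_t start, end;        /* half-open range [start, end) */
    char path[128];
};

/* Binary search over `n` non-overlapping VMAs sorted by start address.
 * Returns the region containing `addr`, or NULL if none does. */
const struct vma *find_vma(const struct vma *v, size_t n, uint64_t addr)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (addr < v[mid].start)
            hi = mid;
        else if (addr >= v[mid].end)
            lo = mid + 1;
        else
            return &v[mid];
    }
    return NULL;
}

/* Coarse label for a region, derived only from its pathname column. */
const char *region_kind(const struct vma *v)
{
    if (v == NULL)
        return "unknown";       /* address not in the current snapshot */
    if (v->path[0] == '\0')
        return "anon";
    if (strcmp(v->path, "[stack]") == 0)
        return "stack";
    if (strcmp(v->path, "[heap]") == 0)
        return "heap";
    if (v->path[0] == '[')
        return "special";       /* e.g. [vdso], [vvar] */
    return "file";
}
```

A NULL result is a natural trigger for re-reading the maps file, since the fault may belong to a mapping created after the last snapshot.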

Minimal concrete example

7f2c7b100000-7f2c7b200000 r-xp ... /lib/x86_64-linux-gnu/libc.so.6

Common misconceptions

  • Misconception: /proc/<pid>/maps is static. Correction: it changes as mappings change.

Check-your-understanding questions

  1. How do you detect a stack fault?
  2. What does an empty pathname mean?

Check-your-understanding answers

  1. Region labeled [stack] in maps.
  2. Anonymous mapping.

Real-world applications

  • Memory profiling and leak analysis

Where you’ll apply it

References

  • proc(5) man page

Key insights

A fault address is meaningless without its VMA context.

Summary

Mapping faults to VMAs makes the data actionable.

Homework/Exercises to practice the concept

  1. Trigger a stack growth fault and observe maps.
  2. Map a file and identify its region.

Solutions to the homework/exercises

  1. Recurse deeply (or declare large local arrays) and watch the [stack] region grow in maps.
  2. mmap a file and find its pathname in maps.

3. Project Specification

3.1 What You Will Build

A CLI tool pagefault-analyzer that attaches to a PID and streams page-fault events with classification and region labeling. It also provides summary stats and histograms.

3.2 Functional Requirements

  1. Attach to target PID and capture page-fault events.
  2. Classify faults into minor/major/COW.
  3. Map fault addresses to VMAs and label regions.
  4. Output live stream and summary report.
  5. Provide deterministic output with --fixed-ts.

3.3 Non-Functional Requirements

  • Performance: handle 10k faults/sec without dropping.
  • Reliability: graceful exit on target termination.
  • Usability: readable output and clear error messages.

3.4 Example Usage / Output

$ sudo ./pagefault-analyzer --fixed-ts -p 4321
[000000.001] MINOR 0x7f2c7b100000 libc.so.6 .text
[000000.012] MAJOR 0x400000 demo.bin .text 12.4ms
Summary: major=12 minor=542 cow=3

3.5 Data Formats / Schemas / Protocols

  • Text stream format: [ts] <type> <addr> <region> <latency>
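A formatter for this line format might look like the sketch below. It assumes the timestamp is printed as zero-padded seconds and milliseconds, which matches the sample output in Section 3.4 but is not stated by the spec, and that the latency column is omitted when no latency was measured. format_fault_line and its parameters are hypothetical.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write "[ssssss.mmm] TYPE 0xADDR REGION [LATms]".
 * ts_us is the event time in microseconds; pass a negative latency_ms to
 * omit the latency column. Returns the number of characters the full line
 * needs, as snprintf does. */
int format_fault_line(char *buf, size_t cap, uint64_t ts_us, const char *type,
                      uint64_t addr, const char *region, double latency_ms)
{
    int n = snprintf(buf, cap,
                     "[%06" PRIu64 ".%03" PRIu64 "] %s 0x%" PRIx64 " %s",
                     ts_us / 1000000, (ts_us / 1000) % 1000,
                     type, addr, region);
    if (latency_ms >= 0 && n >= 0 && (size_t)n < cap)
        n += snprintf(buf + n, cap - (size_t)n, " %.1fms", latency_ms);
    return n;
}
```

Keeping the formatter free of clock calls means a fixed-timestamp mode only has to change the ts_us values passed in, which helps keep output reproducible.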

3.6 Edge Cases

  • Target exits while tracing.
  • Missing permissions (perf_event_paranoid).
  • Fault address outside current maps snapshot.

3.7 Real World Outcome

3.7.1 How to Run (Copy/Paste)

sudo ./pagefault-analyzer --fixed-ts -p 4321

3.7.2 Golden Path Demo (Deterministic)

  • Use --fixed-ts and a fixed workload to produce stable output.

3.7.3 CLI Transcript (Success + Failure)

$ sudo ./pagefault-analyzer --fixed-ts -p 4321
[000000.001] MINOR 0x7f... libc.so.6 .text
Summary: major=1 minor=12 cow=0

$ ./pagefault-analyzer -p 1
error: perf_event_paranoid too strict (see /proc/sys/kernel/perf_event_paranoid)
exit code: 2

3.7.4 Exit Codes

  • 0 success
  • 2 permission/config error
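A preflight check can turn the permission case into exit code 2 before any event is opened. The sketch below is an assumption-laden illustration: read_paranoid_level and paranoid_exit_code are hypothetical names, and the max_level threshold is left to the caller because the meaning of values above 2 depends on the distribution kernel. Privileged callers normally bypass this setting, so a real tool would skip the check for them.

```c
#include <stdio.h>

/* Hypothetical helper: read the integer stored in `path`
 * (normally /proc/sys/kernel/perf_event_paranoid).
 * Returns 0 on success, -1 if the file is missing or malformed. */
int read_paranoid_level(const char *path, int *level)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int ok = fscanf(f, "%d", level) == 1;
    fclose(f);
    return ok ? 0 : -1;
}

/* Map the setting to the exit codes in this section: 0 if the level is at
 * most `max_level`, otherwise 2. An unreadable file is also treated as a
 * configuration error (an assumption, not something the spec states). */
int paranoid_exit_code(const char *path, int max_level)
{
    int level;
    if (read_paranoid_level(path, &level) != 0) {
        fprintf(stderr, "error: cannot read %s\n", path);
        return 2;
    }
    if (level > max_level) {
        fprintf(stderr, "error: perf_event_paranoid too strict (see %s)\n",
                path);
        return 2;
    }
    return 0;
}
```

Taking the path as a parameter keeps the check testable against a temporary file instead of the live /proc entry.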

4. Solution Architecture

4.1 High-Level Design

perf event reader -> classifier -> VMA mapper -> reporter

4.2 Key Components

| Component | Responsibility | Key Decisions |
|-----------|----------------|---------------|
| perf reader | read page-fault tracepoints | mmap ring buffer |
| classifier | minor/major/COW | tracepoint flags |
| mapper | VMA lookup | periodic refresh |
| reporter | output + summary | fixed timestamp mode |

4.3 Data Structures (No Full Code)

struct vma {
    uint64_t start, end;
    char path[128];
};

4.4 Algorithm Overview

  1. Setup perf event.
  2. Read events and classify.
  3. Map address to VMA.
  4. Emit output and update summary.
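Step 4's summary update can be kept to a few counters. The sketch below is illustrative only: the type names and functions are hypothetical, and bucketing latency by powers of two in microseconds is an assumption, since the spec does not say what the histogram measures.

```c
#include <stdint.h>
#include <stdio.h>

enum fault_type { FAULT_MINOR, FAULT_MAJOR, FAULT_COW, FAULT_TYPE_COUNT };
enum { LAT_BUCKETS = 16 };

struct summary {
    uint64_t by_type[FAULT_TYPE_COUNT];
    /* Bucket i counts latencies in [2^i, 2^(i+1)) microseconds; bucket 0
     * also holds 0, and the last bucket collects everything larger. */
    uint64_t latency_hist[LAT_BUCKETS];
};

/* Index of the power-of-two bucket for a latency in microseconds. */
unsigned latency_bucket(uint64_t us)
{
    unsigned b = 0;
    while (us > 1 && b < LAT_BUCKETS - 1) {
        us >>= 1;
        b++;
    }
    return b;
}

/* Record one classified fault. */
void summary_add(struct summary *s, enum fault_type t, uint64_t latency_us)
{
    s->by_type[t]++;
    s->latency_hist[latency_bucket(latency_us)]++;
}

/* Render the one-line summary in the order used by the example output. */
int summary_format(const struct summary *s, char *buf, size_t cap)
{
    return snprintf(buf, cap, "Summary: major=%llu minor=%llu cow=%llu",
                    (unsigned long long)s->by_type[FAULT_MAJOR],
                    (unsigned long long)s->by_type[FAULT_MINOR],
                    (unsigned long long)s->by_type[FAULT_COW]);
}
```

Because the summary depends only on the sequence of events added, the same workload trace produces the same report.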

5. Implementation Guide

5.1 Development Environment Setup

sudo apt install build-essential linux-tools-common linux-tools-$(uname -r)

5.2 Project Structure

pagefault-analyzer/
|-- src/
|   |-- main.c
|   |-- perf_reader.c
|   |-- maps.c
|   `-- report.c
`-- Makefile

5.3 The Core Question You’re Answering

“Where do page faults happen, and how do they map to a program’s memory layout?”

5.4 Concepts You Must Understand First

  1. Page faults and demand paging.
  2. perf_event_open and tracepoints.
  3. /proc/<pid>/maps parsing.

5.5 Questions to Guide Your Design

  1. What sampling frequency is safe?
  2. How often should you refresh VMAs?
  3. How will you handle missing mappings?

5.6 Thinking Exercise

Why might a program have many minor faults but no major faults?

5.7 The Interview Questions They’ll Ask

  1. Explain major vs minor faults.
  2. What is copy-on-write and why is it used?

5.8 Hints in Layers

  • Hint 1: start in counting mode to verify events.
  • Hint 2: switch to sampling and parse ring buffer.
  • Hint 3: add VMA labeling.

5.9 Books That Will Help

| Topic | Book | Chapter |
|-------|------|---------|
| Virtual memory | OSTEP | VM chapters |
| perf | man-pages | perf_event_open |

5.10 Implementation Phases

  • Phase 1: collect fault counts.
  • Phase 2: add address sampling + maps.
  • Phase 3: add summary + histograms.


6. Testing Strategy

6.1 Test Categories

| Category | Purpose | Examples |
|----------|---------|----------|
| Unit | maps parser | parse sample maps |
| Integration | fault capture | run cat on uncached file |

6.2 Critical Test Cases

  1. Faults captured for file-backed reads.
  2. Address mapping works for stack/heap.

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

| Pitfall | Symptom | Solution |
|---------|---------|----------|
| perf_event_paranoid | no events | relax setting |
| stale maps | unknown regions | refresh periodically |


8. Extensions & Challenges

  • Add symbolization with libbfd or addr2line.
  • Export CSV for offline analysis.

9. Real-World Connections

  • Cold-start latency analysis for services

10. Resources

  • proc(5), perf_event_open(2)

11. Self-Assessment Checklist

  • I can classify page faults correctly.

12. Submission / Completion Criteria

  • Minimum: capture and classify faults.
  • Full: map to VMAs and produce summary.
  • Excellence: histogram + symbolization.


13. Determinism Notes

  • Use --fixed-ts and a fixed workload.