Project 2: Memory Leak Detective
Monitor a process over time and flag suspicious heap growth using pmap and /proc.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Advanced |
| Time Estimate | 1-2 weeks |
| Language | C (Alternatives: Rust, Python, Go) |
| Prerequisites | Project 1, pointers and malloc basics |
| Key Topics | virtual memory, RSS vs VSZ, memory maps |
1. Learning Objectives
By completing this project, you will:
- Parse /proc/<pid>/maps and pmap output.
- Track heap size over time and detect linear growth.
- Explain RSS, VSZ, and shared vs private mappings.
- Produce a leak report with trend analysis.
2. Theoretical Foundation
2.1 Core Concepts
- Virtual vs physical memory: VSZ counts all mapped virtual address space, whether or not it is backed by RAM; RSS counts only the pages currently resident in physical memory.
- Memory mappings: [heap], [stack], anonymous, and file-backed regions are distinct.
- Demand paging: A large allocation does not become resident until touched.
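To make the VSZ/RSS distinction concrete, here is a minimal sketch that reads both figures from /proc/<pid>/status, where the kernel reports them as VmSize and VmRSS in kB. The function names parse_status_kb and read_vsz_rss are illustrative only, and the sketch assumes a Linux /proc filesystem.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse one line of /proc/<pid>/status such as "VmRSS:\t  46284 kB".
 * Returns 1 and stores the number in *kb when the line starts with
 * `key` followed by a colon; returns 0 otherwise. */
int parse_status_kb(const char *line, const char *key, long *kb)
{
    assert(line && key && kb);
    size_t n = strlen(key);
    if (strncmp(line, key, n) != 0 || line[n] != ':')
        return 0;
    return sscanf(line + n + 1, "%ld", kb) == 1;
}

/* Read VmSize (VSZ) and VmRSS (RSS) for a PID.
 * Returns 0 on success, -1 if the file cannot be opened or either
 * field is missing (kernel threads, for example, have no Vm* lines). */
int read_vsz_rss(int pid, long *vsz_kb, long *rss_kb)
{
    char path[64], line[256];
    int found = 0;

    snprintf(path, sizeof path, "/proc/%d/status", pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    while (fgets(line, sizeof line, f)) {
        if (parse_status_kb(line, "VmSize", vsz_kb))
            found |= 1;
        else if (parse_status_kb(line, "VmRSS", rss_kb))
            found |= 2;
    }
    fclose(f);
    return found == 3 ? 0 : -1;
}
```

Calling read_vsz_rss on a process right after it maps a large region and again after it touches that region should show VmSize jumping first and VmRSS following only once pages are touched.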
2.2 Why This Matters
Memory leaks can be invisible until the system is under pressure. Understanding maps and RSS lets you debug without restarting services.
2.3 Historical Context / Background
Linux exposes process memory maps via /proc/<pid>/maps and details via /proc/<pid>/smaps, enabling user-space tooling like pmap.
2.4 Common Misconceptions
- “High VSZ means a leak”: VSZ can be inflated by file mappings and reserved space.
- “RSS only goes up”: RSS can drop due to page reclaim.
3. Project Specification
3.1 What You Will Build
A monitoring tool that samples a target PID, tracks heap size and RSS, and flags leak-like growth trends.
3.2 Functional Requirements
- Accept PID, interval, and duration.
- Extract heap region size from maps or smaps.
- Report growth rates and highlight suspicious trends.
3.3 Non-Functional Requirements
- Performance: Low overhead sampling.
- Reliability: Handle target process exit gracefully.
- Usability: Produce a compact table and a final diagnosis.
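For the reliability requirement, one possible way to notice that the target has gone away is to probe it with signal 0 before each sample. This is a sketch under that assumption; process_alive is a hypothetical helper name. A failed open of /proc/<pid>/maps is an equally valid signal that the process has exited.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/* Returns 1 if a process with this PID exists, 0 if it does not,
 * and -1 on an unexpected error. Signal 0 runs the existence and
 * permission checks without delivering a signal. */
int process_alive(pid_t pid)
{
    /* pid <= 0 would address process groups, not a single process. */
    assert(pid > 0);
    if (kill(pid, 0) == 0)
        return 1;
    if (errno == EPERM)   /* exists, but we may not signal it */
        return 1;
    if (errno == ESRCH)   /* no such process */
        return 0;
    return -1;
}
```

Note that a zombie still counts as existing for kill(), and PIDs can be reused, so the sampler should also treat a failed read of the maps file as an exit and print whatever history it has collected.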
3.4 Example Usage / Output
$ ./memleak-detective --pid 1234 --interval 5 --duration 60
Time RSS(MB) Heap(MB) Delta
00:00 45.2 12.4 --
00:05 47.8 14.2 +1.8
3.5 Real World Outcome
You will see a time series of heap growth and a simple leak warning. Example:
$ ./memleak-detective --pid 1234 --interval 5 --duration 60
Time RSS(MB) Heap(MB) Delta
00:00 45.2 12.4 --
00:05 47.8 14.2 +1.8
00:10 51.3 17.9 +3.7
Warning: linear heap growth detected
4. Solution Architecture
4.1 High-Level Design
Sampler -> read maps/smaps -> compute heap size -> store history -> analyze slope
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Sampler | Periodic reads | Use sleep interval |
| Parser | Extract heap region | Prefer /proc/<pid>/maps |
| Analyzer | Trend detection | Simple slope or regression |
4.3 Data Structures
struct Sample { double rss_mb; double heap_mb; double t_sec; };
4.4 Algorithm Overview
Key Algorithm: Trend Detection
- Sample heap size every N seconds.
- Compute slope of heap over time.
- Flag when slope exceeds threshold.
Complexity Analysis:
- Time: O(n) samples
- Space: O(n) history
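The slope step can be sketched as a least-squares fit of heap size against time, slope = sum((t - mean_t) * (h - mean_h)) / sum((t - mean_t)^2). This reuses the Sample struct from section 4.3; heap_slope and is_leak_like are illustrative names, and the threshold is a value you would have to tune. A cruder alternative is simply (last - first) / elapsed.

```c
#include <assert.h>
#include <stddef.h>

struct Sample { double rss_mb; double heap_mb; double t_sec; };

/* Least-squares slope of heap_mb against t_sec, in MB per second.
 * Returns 0.0 when all timestamps are identical (slope undefined). */
double heap_slope(const struct Sample *s, size_t n)
{
    assert(s && n >= 2);
    double mean_t = 0.0, mean_h = 0.0;
    for (size_t i = 0; i < n; i++) {
        mean_t += s[i].t_sec;
        mean_h += s[i].heap_mb;
    }
    mean_t /= (double)n;
    mean_h /= (double)n;

    double sxy = 0.0, sxx = 0.0;
    for (size_t i = 0; i < n; i++) {
        double dt = s[i].t_sec - mean_t;
        sxy += dt * (s[i].heap_mb - mean_h);
        sxx += dt * dt;
    }
    return sxx > 0.0 ? sxy / sxx : 0.0;
}

/* Flag the history when the fitted slope exceeds the threshold. */
int is_leak_like(const struct Sample *s, size_t n, double threshold_mb_per_s)
{
    return heap_slope(s, n) > threshold_mb_per_s;
}
```

On the three samples from section 3.5 (heap 12.4, 14.2, 17.9 MB at t = 0, 5, 10 s) this fit gives 0.55 MB/s. Both passes are linear in the number of samples, matching the O(n) bound above.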
5. Implementation Guide
5.1 Development Environment Setup
gcc --version
5.2 Project Structure
project-root/
├── memleak_detective.c
└── README.md
5.3 The Core Question You’re Answering
“Is this process actually leaking memory, or just reserving it?”
5.4 Concepts You Must Understand First
Stop and research these before coding:
- RSS vs VSZ
- What each means and how it is computed.
- Memory Maps
- How to identify
[heap]and[stack].
- How to identify
- Demand Paging
- Why touching memory matters for RSS.
5.5 Questions to Guide Your Design
Before implementing, think through these:
- How long should you sample before declaring a leak?
- Should you report RSS, heap, or both?
- How do you handle short-lived spikes?
5.6 Thinking Exercise
Create a Leak
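Write a small program that allocates 1MB per second without freeing. Track RSS and heap and compare their growth. Touch the allocated memory, otherwise RSS will not follow the allocations. Be aware that glibc serves requests above its mmap threshold (128 KiB by default) from separate anonymous mappings rather than by growing [heap], so allocate the 1MB in smaller chunks if you want the [heap] region itself to grow.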
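A minimal sketch of such a leaky program follows. It assumes glibc's default malloc behavior, so it leaks 1 MiB per second as sixteen 64 KiB chunks to stay below the mmap threshold and keep the growth inside [heap]. The names leak_bytes and LEAK_DEMO_MAIN are made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Volatile sink so the compiler cannot optimize the allocations away. */
static void *volatile g_sink;

/* Allocate `chunks` blocks of `chunk_bytes` each, touch every byte so
 * the pages become resident, and deliberately never free them.
 * Returns the number of bytes leaked. */
size_t leak_bytes(size_t chunks, size_t chunk_bytes)
{
    assert(chunk_bytes > 0);
    size_t leaked = 0;
    for (size_t i = 0; i < chunks; i++) {
        char *p = malloc(chunk_bytes);
        if (!p)
            break;
        memset(p, 0xA5, chunk_bytes);  /* touching forces pages into RSS */
        g_sink = p;
        leaked += chunk_bytes;
    }
    return leaked;
}

#ifdef LEAK_DEMO_MAIN
/* Build with -DLEAK_DEMO_MAIN to get a standalone test target. */
int main(void)
{
    printf("pid %d\n", (int)getpid());
    for (;;) {
        leak_bytes(16, 64 * 1024);  /* 16 x 64 KiB = 1 MiB per second */
        sleep(1);
    }
}
#endif
```

Removing the memset is a useful variation: the address space keeps growing while RSS barely moves, which is demand paging in action.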
5.7 The Interview Questions They’ll Ask
Prepare to answer these:
- “What is the difference between RSS and VSZ?”
- “How can a process reserve memory without using it?”
- “How do you find a leak in production?”
5.8 Hints in Layers
Hint 1: Start with pmap -x
Parse the total line first.
Hint 2: Use /proc/<pid>/smaps
It provides per-region RSS and private memory.
Hint 3: Use a slope threshold
Consider it a leak if the slope is positive and stable.
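One way to read "positive and stable" is to require both a minimum average growth rate and growth in most sampling intervals, so that a single jump or a short spike does not trigger the warning. The sketch below takes that interpretation; the function names and both thresholds are assumptions to tune, not fixed rules.

```c
#include <assert.h>
#include <stddef.h>

/* Fraction of consecutive sample pairs in which the heap grew. */
double growth_fraction(const double *heap_mb, size_t n)
{
    assert(heap_mb && n >= 2);
    size_t grew = 0;
    for (size_t i = 1; i < n; i++)
        if (heap_mb[i] > heap_mb[i - 1])
            grew++;
    return (double)grew / (double)(n - 1);
}

/* Leak-like when the average rate exceeds min_rate_mb_per_s AND the
 * heap grew in at least min_fraction of the intervals.
 * Samples are assumed to be evenly spaced by interval_sec. */
int looks_like_leak(const double *heap_mb, size_t n, double interval_sec,
                    double min_rate_mb_per_s, double min_fraction)
{
    assert(heap_mb && n >= 2 && interval_sec > 0.0);
    double elapsed = interval_sec * (double)(n - 1);
    double rate = (heap_mb[n - 1] - heap_mb[0]) / elapsed;
    return rate > min_rate_mb_per_s &&
           growth_fraction(heap_mb, n) >= min_fraction;
}
```

With a minimum fraction of 0.8, a series such as 10, 11, 12, 13, 14 is flagged, while 10, 10, 20, 20, 20 is not, even though its average rate is higher.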
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Virtual memory | “CS:APP” | Ch. 9 |
| Memory mapping | “TLPI” | Ch. 49 |
| Process layout | “OSTEP” | Ch. 13-15 |
5.10 Implementation Phases
Phase 1: Foundation (2-3 days)
Goals:
- Read maps and identify heap region.
Tasks:
- Parse /proc/<pid>/maps.
- Compute heap size.
Checkpoint: Heap size matches pmap output.
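The Phase 1 tasks can be sketched as follows. Each line of /proc/<pid>/maps has the form "start-end perms offset dev inode [pathname]", with the addresses in hex, so the heap size is end minus start on the line whose pathname is [heap]. parse_heap_line and heap_size_bytes are hypothetical names for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse one /proc/<pid>/maps line. If it describes the [heap] region,
 * store its size in bytes and return 1; otherwise return 0. */
int parse_heap_line(const char *line, unsigned long *size_bytes)
{
    unsigned long start, end;
    char name[64];

    assert(line && size_bytes);
    /* Skip perms, offset, dev, and inode; the pathname is optional,
     * so anonymous mappings yield only 2 converted fields. */
    if (sscanf(line, "%lx-%lx %*s %*s %*s %*s %63s",
               &start, &end, name) != 3)
        return 0;
    if (strcmp(name, "[heap]") != 0 || end < start)
        return 0;
    *size_bytes = end - start;
    return 1;
}

/* Returns the [heap] size in bytes, 0 if the process has no [heap]
 * line, or -1 if the maps file cannot be opened (e.g. process exited). */
long long heap_size_bytes(int pid)
{
    char path[64], line[512];
    unsigned long size = 0, total = 0;

    snprintf(path, sizeof path, "/proc/%d/maps", pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    while (fgets(line, sizeof line, f))
        if (parse_heap_line(line, &size))
            total += size;
    fclose(f);
    return (long long)total;
}
```

Dividing the result by 1024 gives a figure in KiB that can be compared directly against the [heap] row that pmap prints for the same PID.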
Phase 2: Core Functionality (3-4 days)
Goals:
- Track growth over time.
Tasks:
- Sample periodically.
- Compute deltas.
Checkpoint: Output shows sensible trend line.
Phase 3: Polish & Edge Cases (2-3 days)
Goals:
- Add warnings and summary report.
Tasks:
- Detect linear growth.
- Print diagnosis.
Checkpoint: Tool flags leak in test program.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Heap source | maps vs smaps | maps first | Simpler parsing |
| Trend model | slope vs regression | slope | Sufficient for leaks |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Parsing | Validate heap extraction | Compare with pmap |
| Trend | Validate slope | Leaky test program |
| Stability | Target exits | Handle gracefully |
6.2 Critical Test Cases
- Heap grows linearly -> warning triggered.
- Heap grows once -> no warning.
- Target exits -> tool reports and exits cleanly.
6.3 Test Data
Leaky sample: +1MB/sec heap growth
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Confusing VSZ/RSS | False alarms | Report both clearly |
| Ignoring demand paging | Wrong conclusions | Touch memory in tests |
| Parsing smaps incorrectly | Missing data | Start with maps |
7.2 Debugging Strategies
- Compare to pmap -x <pid> totals.
- Validate heap region by address range.
7.3 Performance Traps
Parsing smaps frequently is costly; use longer intervals if needed.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add CSV export.
- Add a max heap threshold.
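A CSV export could look like the sketch below, which again reuses the Sample struct from section 4.3. The column order and the function names format_csv_row and write_csv are arbitrary choices for this example.

```c
#include <assert.h>
#include <stdio.h>

struct Sample { double rss_mb; double heap_mb; double t_sec; };

/* Format one sample as "t_sec,rss_mb,heap_mb\n" into buf.
 * Returns the number of characters written, or -1 if buf is too small. */
int format_csv_row(char *buf, size_t cap, const struct Sample *s)
{
    assert(buf && s && cap > 0);
    int n = snprintf(buf, cap, "%.0f,%.1f,%.1f\n",
                     s->t_sec, s->rss_mb, s->heap_mb);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}

/* Write a header plus one row per sample.
 * Returns the number of rows written, or -1 on error. */
int write_csv(FILE *out, const struct Sample *s, size_t n)
{
    char row[64];

    assert(out && (s || n == 0));
    if (fputs("t_sec,rss_mb,heap_mb\n", out) == EOF)
        return -1;
    for (size_t i = 0; i < n; i++) {
        if (format_csv_row(row, sizeof row, &s[i]) < 0 ||
            fputs(row, out) == EOF)
            return -1;
    }
    return (int)n;
}
```

Passing stdout as the stream lets the same tool be piped into a plotting script without a separate output-file option.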
8.2 Intermediate Extensions
- Use /proc/<pid>/smaps to separate private vs shared.
- Add a small ASCII sparkline.
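For the private/shared split, /proc/<pid>/smaps repeats each mapping header from maps and follows it with fields such as Rss, Private_Clean, Private_Dirty, Shared_Clean, and Shared_Dirty, all in kB. The sketch below accumulates those fields for one named region as lines are fed to it; RegionAcc and smaps_feed_line are hypothetical names, and matching the region by substring is a simplification.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct RegionAcc {
    const char *name;   /* region to track, e.g. "[heap]" */
    int in_region;      /* 1 while the current mapping matches name */
    long rss_kb, private_kb, shared_kb;
};

/* Feed one line of /proc/<pid>/smaps into the accumulator. */
void smaps_feed_line(struct RegionAcc *acc, const char *line)
{
    unsigned long start, end;
    long kb;

    assert(acc && acc->name && line);
    /* Header lines start with "hexstart-hexend"; field lines do not. */
    if (sscanf(line, "%lx-%lx", &start, &end) == 2) {
        acc->in_region = strstr(line, acc->name) != NULL;
        return;
    }
    if (!acc->in_region)
        return;
    if (sscanf(line, "Rss: %ld", &kb) == 1)
        acc->rss_kb += kb;
    else if (sscanf(line, "Private_Clean: %ld", &kb) == 1 ||
             sscanf(line, "Private_Dirty: %ld", &kb) == 1)
        acc->private_kb += kb;
    else if (sscanf(line, "Shared_Clean: %ld", &kb) == 1 ||
             sscanf(line, "Shared_Dirty: %ld", &kb) == 1)
        acc->shared_kb += kb;
}
```

To use it, open /proc/<pid>/smaps, initialize a RegionAcc with name set to "[heap]" and the counters at zero, and pass every fgets line to smaps_feed_line. Steadily growing private_kb is a stronger leak signal than growing RSS alone, since shared pages are not owned by the process.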
8.3 Advanced Extensions
- Integrate with perf or malloc tracing.
- Detect leaks per allocation site (with optional debug builds).
9. Real-World Connections
9.1 Industry Applications
- Memory leak triage in long-running services.
9.2 Related Open Source Projects
- Valgrind: https://valgrind.org
- heaptrack: https://github.com/KDE/heaptrack
9.3 Interview Relevance
- Memory metrics and leak reasoning are common systems topics.
10. Resources
10.1 Essential Reading
- pmap(1) - man 1 pmap
- proc(5) - /proc/<pid>/maps and /proc/<pid>/smaps
10.2 Video Resources
- Virtual memory overview lectures (search “virtual memory Linux”)
10.3 Tools & Documentation
- smaps: /proc/<pid>/smaps
10.4 Related Projects in This Series
- Syscall Profiler: tie leaks to I/O behavior.
11. Self-Assessment Checklist
11.1 Understanding
- I can explain RSS vs VSZ.
- I can interpret memory maps.
- I can explain demand paging.
11.2 Implementation
- Heap size is computed correctly.
- Trend analysis works.
- Reports are readable.
11.3 Growth
- I can use the tool on a real service.
- I can explain leak detection logic.
12. Submission / Completion Criteria
Minimum Viable Completion:
- Report RSS and heap size over time.
Full Completion:
- Detect linear heap growth and provide a warning.
Excellence (Going Above & Beyond):
- Separate private/shared memory and provide leak confidence scoring.
This guide was generated from LINUX_SYSTEM_TOOLS_MASTERY.md. For the complete learning path, see the parent directory.