Project 2: Memory Leak Detective

Monitor a process over time and flag suspicious heap growth using pmap and /proc.

Quick Reference

Attribute      Value
Difficulty     Advanced
Time Estimate  1-2 weeks
Language       C (Alternatives: Rust, Python, Go)
Prerequisites  Project 1, pointers and malloc basics
Key Topics     virtual memory, RSS vs VSZ, memory maps

1. Learning Objectives

By completing this project, you will:

  1. Parse /proc/<pid>/maps and pmap output.
  2. Track heap size over time and detect linear growth.
  3. Explain RSS, VSZ, and shared vs private mappings.
  4. Produce a leak report with trend analysis.

2. Theoretical Foundation

2.1 Core Concepts

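  • Virtual vs physical memory: VSZ is the total size of all mapped virtual address ranges, whether or not they are backed by RAM; RSS counts only the pages currently resident in physical memory.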
  • Memory mappings: [heap], [stack], anonymous, and file-backed regions are distinct.
  • Demand paging: A large allocation does not become resident until touched.
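
To make the VSZ/RSS distinction concrete, the sketch below reads both values for a process from a /proc/<pid>/status-style stream, where they appear as the VmSize and VmRSS fields (in kB). The function name read_status_kb is hypothetical and chosen for illustration only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Scan a /proc/<pid>/status-style stream for "<key>:" and parse its kB value.
 * Returns 0 on success, -1 if the key is absent or its value is malformed.
 * (read_status_kb is an illustrative name, not part of any library.) */
int read_status_kb(FILE *f, const char *key, long *out_kb)
{
    assert(f != NULL && key != NULL && out_kb != NULL);
    char line[256];
    size_t klen = strlen(key);

    rewind(f);  /* allow repeated lookups on the same stream */
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            /* %ld skips the tabs and spaces that pad the value */
            return sscanf(line + klen + 1, "%ld", out_kb) == 1 ? 0 : -1;
        }
    }
    return -1;
}
```

A caller would typically fopen("/proc/<pid>/status", "r") and ask for "VmSize" (VSZ) and "VmRSS" (RSS). A large gap between the two is expected under demand paging and is not evidence of a leak on its own.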

2.2 Why This Matters

Memory leaks can be invisible until the system is under pressure. Understanding maps and RSS lets you debug without restarting services.

2.3 Historical Context / Background

Linux exposes process memory maps via /proc/<pid>/maps and details via /proc/<pid>/smaps, enabling user-space tooling like pmap.
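
Each line of /proc/<pid>/maps holds an address range, permissions, offset, device, inode, and an optional pathname. As a minimal sketch of how a tool might read one such line, the following uses sscanf; the struct MapRegion type and both function names are assumptions made for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative container for one maps entry (not a kernel or libc type). */
struct MapRegion {
    unsigned long start, end, offset;
    char perms[5];    /* e.g. "rw-p" */
    char path[256];   /* empty for anonymous mappings */
};

/* Parse one line in /proc/<pid>/maps format.
 * Returns 0 on success, -1 if the line does not look like a maps entry. */
int parse_maps_line(const char *line, struct MapRegion *r)
{
    assert(line != NULL && r != NULL);
    r->path[0] = '\0';
    /* Device and inode are skipped with %*s; the pathname is optional,
     * so four successful conversions are enough. */
    int n = sscanf(line, "%lx-%lx %4s %lx %*s %*s %255[^\n]",
                   &r->start, &r->end, r->perms, &r->offset, r->path);
    return (n >= 4 && r->end >= r->start) ? 0 : -1;
}

/* True if the region is the brk-managed heap. */
int region_is_heap(const struct MapRegion *r)
{
    return strcmp(r->path, "[heap]") == 0;
}
```

The region size in bytes is end - start. Anonymous mappings leave the path empty, which is why the parser accepts lines with only four parsed fields.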

2.4 Common Misconceptions

  • “High VSZ means a leak”: VSZ can be inflated by file mappings and reserved space.
  • “RSS only goes up”: RSS can drop when the kernel reclaims or swaps out pages, or when the process unmaps memory.

3. Project Specification

3.1 What You Will Build

A monitoring tool that samples a target PID, tracks heap size and RSS, and flags leak-like growth trends.

3.2 Functional Requirements

  1. Accept PID, interval, and duration.
  2. Extract heap region size from maps or smaps.
  3. Report growth rates and highlight suspicious trends.
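
One possible way to accept the three inputs is getopt_long, matching the --pid, --interval, and --duration flags used in the example invocation. This is a sketch only: struct Options, parse_args, and the default values (5 s interval, 60 s duration) are assumptions, not part of the specification.

```c
#include <assert.h>
#include <errno.h>
#include <getopt.h>
#include <stdlib.h>

/* Illustrative option set; the defaults below are assumed, not specified. */
struct Options { long pid; long interval_s; long duration_s; };

/* Parse a strictly positive decimal integer; returns -1 on any bad input. */
static long parse_positive(const char *s)
{
    char *end = NULL;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0' || v <= 0) return -1;
    return v;
}

/* Returns 0 on success, -1 on a bad or missing option. --pid is mandatory. */
int parse_args(int argc, char **argv, struct Options *o)
{
    static const struct option longopts[] = {
        {"pid",      required_argument, NULL, 'p'},
        {"interval", required_argument, NULL, 'i'},
        {"duration", required_argument, NULL, 'd'},
        {NULL, 0, NULL, 0},
    };
    assert(argv != NULL && o != NULL);
    o->pid = 0;
    o->interval_s = 5;
    o->duration_s = 60;
    optind = 1;  /* restart scanning so the function can be called again */

    int c;
    while ((c = getopt_long(argc, argv, "p:i:d:", longopts, NULL)) != -1) {
        long v = (c == '?') ? -1 : parse_positive(optarg);
        if (v < 0) return -1;
        if (c == 'p') o->pid = v;
        else if (c == 'i') o->interval_s = v;
        else o->duration_s = v;
    }
    return o->pid > 0 ? 0 : -1;
}
```

Rejecting non-numeric and non-positive values up front keeps the sampling loop free of input checks.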

3.3 Non-Functional Requirements

  • Performance: Low overhead sampling.
  • Reliability: Handle target process exit gracefully.
  • Usability: Produce a compact table and a final diagnosis.
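
For the reliability requirement, one common way to notice that the target has exited is to send signal 0 with kill, which checks for existence without delivering a signal. The helper names below are hypothetical, and this is a sketch rather than the only option; a failed fopen on /proc/<pid>/maps is another usable indicator.

```c
#define _POSIX_C_SOURCE 200809L  /* expose kill() and sleep() under strict -std modes */
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if a process with this PID exists, 0 otherwise.
 * Note: a zombie that has not been reaped still counts as existing. */
int target_alive(pid_t pid)
{
    assert(pid > 0);
    if (kill(pid, 0) == 0) return 1;
    return errno == EPERM;  /* exists, but we may not signal it */
}

/* Sleep for one sampling interval, then report whether the target is
 * still there: 0 if alive, -1 if it has gone away. */
int sleep_then_check(pid_t pid, unsigned interval_s)
{
    sleep(interval_s);
    return target_alive(pid) ? 0 : -1;
}
```

When sleep_then_check returns -1, the sampler can stop the loop, print the samples collected so far, and exit with a clear message instead of an error from a failed read.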

3.4 Example Usage / Output

$ ./memleak-detective --pid 1234 --interval 5 --duration 60
Time  RSS(MB)  Heap(MB)  Delta
00:00 45.2     12.4      --
00:05 47.8     14.2      +1.8

3.5 Real World Outcome

You will see a time series of heap growth and a simple leak warning. Example:

$ ./memleak-detective --pid 1234 --interval 5 --duration 60
Time  RSS(MB)  Heap(MB)  Delta
00:00 45.2     12.4      --
00:05 47.8     14.2      +1.8
00:10 51.3     17.9      +3.7
Warning: linear heap growth detected

4. Solution Architecture

4.1 High-Level Design

Sampler -> read maps/smaps -> compute heap size -> store history -> analyze slope

4.2 Key Components

Component  Responsibility       Key Decisions
Sampler    Periodic reads       Use sleep interval
Parser     Extract heap region  Prefer /proc/<pid>/maps
Analyzer   Trend detection      Simple slope or regression

4.3 Data Structures

struct Sample { double rss_mb; double heap_mb; double t_sec; };

4.4 Algorithm Overview

Key Algorithm: Trend Detection

  1. Sample heap size every N seconds.
  2. Compute slope of heap over time.
  3. Flag when slope exceeds threshold.

Complexity Analysis:

  • Time: O(n) per analysis pass over n stored samples
  • Space: O(n) for the sample history
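
The slope step can be sketched in two ways using the Sample struct from section 4.3: a simple endpoint slope, and a least-squares fit that is less sensitive to a noisy first or last sample. The function names and the idea of passing the threshold as a parameter are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct Sample { double rss_mb; double heap_mb; double t_sec; };

/* Endpoint slope: (last - first) / elapsed time, in MB per second. */
double endpoint_slope(const struct Sample *s, size_t n)
{
    assert(s != NULL || n == 0);
    if (n < 2) return 0.0;
    double dt = s[n - 1].t_sec - s[0].t_sec;
    return dt > 0.0 ? (s[n - 1].heap_mb - s[0].heap_mb) / dt : 0.0;
}

/* Least-squares slope of heap_mb against t_sec, in MB per second.
 * Returns 0.0 when there are fewer than two distinct timestamps. */
double fitted_slope(const struct Sample *s, size_t n)
{
    assert(s != NULL || n == 0);
    if (n < 2) return 0.0;

    double mean_t = 0.0, mean_h = 0.0;
    for (size_t i = 0; i < n; i++) {
        mean_t += s[i].t_sec;
        mean_h += s[i].heap_mb;
    }
    mean_t /= (double)n;
    mean_h /= (double)n;

    double num = 0.0, den = 0.0;
    for (size_t i = 0; i < n; i++) {
        double dt = s[i].t_sec - mean_t;
        num += dt * (s[i].heap_mb - mean_h);
        den += dt * dt;
    }
    return den > 0.0 ? num / den : 0.0;
}

/* Flag a leak when the fitted slope exceeds a caller-chosen threshold. */
int leak_suspected(const struct Sample *s, size_t n, double threshold_mb_per_s)
{
    return fitted_slope(s, n) > threshold_mb_per_s;
}
```

For the three samples in the example output (heap 12.4, 14.2, 17.9 MB at 0, 5, 10 s), both functions give 0.55 MB/s. Both make a single O(n) pass over the stored history.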

5. Implementation Guide

5.1 Development Environment Setup

gcc --version

5.2 Project Structure

project-root/
├── memleak_detective.c
└── README.md

5.3 The Core Question You’re Answering

“Is this process actually leaking memory, or just reserving it?”

5.4 Concepts You Must Understand First

Stop and research these before coding:

  1. RSS vs VSZ
    • What each means and how it is computed.
  2. Memory Maps
    • How to identify [heap] and [stack].
  3. Demand Paging
    • Why touching memory matters for RSS.

5.5 Questions to Guide Your Design

Before implementing, think through these:

  1. How long should you sample before declaring a leak?
  2. Should you report RSS, heap, or both?
  3. How do you handle short-lived spikes?

5.6 Thinking Exercise

Create a Leak

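Write a small program that allocates 1MB per second without freeing. Write to each allocation (for example with memset) so its pages become resident, then track RSS and heap and compare their growth. With glibc, requests at or above the mmap threshold (128 KiB by default) are usually served from separate anonymous mappings rather than [heap], so a 1MB-per-step leak may show up in RSS while the [heap] region stays flat; allocate in smaller chunks if you want to see [heap] itself grow.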
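
A minimal sketch of the leaking routine for this exercise follows. The names leak_chunk and run_leak are hypothetical, and the chunk size and pause are left as parameters so you can try both small chunks and 1MB chunks and compare what the maps show.

```c
#define _POSIX_C_SOURCE 200809L  /* expose sleep() under strict -std modes */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Storing the pointer in a volatile sink keeps the compiler from
 * optimizing the never-read allocation away. */
static void *volatile leak_sink;

/* Allocate `bytes` and write to every byte so the pages become resident.
 * The memory is intentionally never freed. */
void *leak_chunk(size_t bytes)
{
    assert(bytes > 0);
    void *p = malloc(bytes);
    if (p != NULL) {
        memset(p, 0xA5, bytes);
        leak_sink = p;
    }
    return p;
}

/* Leak `steps` chunks with a pause between them.
 * Returns the number of chunks actually leaked. */
unsigned run_leak(size_t bytes, unsigned steps, unsigned pause_s)
{
    unsigned done = 0;
    for (unsigned i = 0; i < steps; i++) {
        if (leak_chunk(bytes) == NULL) break;
        done++;
        sleep(pause_s);
    }
    return done;
}
```

A small main that calls run_leak(64 * 1024, 600, 1) gives the detector a slow, steady leak to watch. Removing the memset shows the demand-paging effect: the address space still grows, but RSS barely moves.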

5.7 The Interview Questions They’ll Ask

Prepare to answer these:

  1. “What is the difference between RSS and VSZ?”
  2. “How can a process reserve memory without using it?”
  3. “How do you find a leak in production?”

5.8 Hints in Layers

Hint 1: Start with pmap -x
Parse the total line first.

Hint 2: Use /proc/<pid>/smaps
It provides per-region RSS and private memory.

Hint 3: Use a slope threshold
Treat growth as a leak only if the slope stays positive and stable across the sampling window.
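
As an illustration of Hint 2, the sketch below walks an smaps-format stream and sums the Rss, Private_Clean, and Private_Dirty fields for the [heap] region. Those field names come from smaps itself; the function name and the choice to add the two private fields together are assumptions for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sum resident and private kB for [heap] in an smaps-format stream.
 * Returns 0 if a [heap] region was seen, -1 otherwise. */
int heap_usage_from_smaps(FILE *f, long *rss_kb, long *private_kb)
{
    assert(f != NULL && rss_kb != NULL && private_kb != NULL);
    char line[512];
    unsigned long a, b;
    long v;
    int in_heap = 0, found = 0;

    *rss_kb = 0;
    *private_kb = 0;
    while (fgets(line, sizeof line, f) != NULL) {
        if (sscanf(line, "%lx-%lx", &a, &b) == 2) {
            /* A region header line: start-end perms offset dev inode path */
            in_heap = strstr(line, "[heap]") != NULL;
            found |= in_heap;
        } else if (in_heap) {
            /* A "Key:   value kB" line belonging to the current region */
            if (sscanf(line, "Rss: %ld", &v) == 1)
                *rss_kb += v;
            else if (sscanf(line, "Private_Clean: %ld", &v) == 1 ||
                     sscanf(line, "Private_Dirty: %ld", &v) == 1)
                *private_kb += v;
        }
    }
    return found ? 0 : -1;
}
```

Requiring both addresses to parse is what separates header lines from field lines such as "Anonymous:", whose first letter also happens to be a hex digit.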

5.9 Books That Will Help

Topic           Book      Chapter
Virtual memory  “CS:APP”  Ch. 9
Memory mapping  “TLPI”    Ch. 49
Process layout  “OSTEP”   Ch. 13-15

5.10 Implementation Phases

Phase 1: Foundation (2-3 days)

Goals:

  • Read maps and identify heap region.

Tasks:

  1. Parse /proc/<pid>/maps.
  2. Compute heap size.

Checkpoint: Heap size matches pmap output.
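
The two Phase 1 tasks could be combined roughly as follows. This sketch assumes the heap is the region labelled [heap] and uses hypothetical function names; it takes a FILE * so that the parsing can be tested against saved maps output as well as a live process.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sum the sizes of all regions labelled [heap] in a maps-format stream.
 * Returns the size in bytes, or -1 if no such region is present. */
long heap_bytes_from_maps(FILE *f)
{
    assert(f != NULL);
    char line[512];
    unsigned long start, end, total = 0;
    int found = 0;

    while (fgets(line, sizeof line, f) != NULL) {
        if (strstr(line, "[heap]") == NULL) continue;
        if (sscanf(line, "%lx-%lx", &start, &end) == 2 && end > start) {
            total += end - start;
            found = 1;
        }
    }
    return found ? (long)total : -1;
}

/* Convenience wrapper: open /proc/<pid>/maps and measure the heap.
 * Returns -1 if the file cannot be opened (for example, the target exited). */
long heap_bytes_for_pid(long pid)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/%ld/maps", pid);
    FILE *f = fopen(path, "r");
    if (f == NULL) return -1;
    long bytes = heap_bytes_from_maps(f);
    fclose(f);
    return bytes;
}
```

pmap reports sizes in KiB, so divide the result by 1024 before comparing the two for the checkpoint.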

Phase 2: Core Functionality (3-4 days)

Goals:

  • Track growth over time.

Tasks:

  1. Sample periodically.
  2. Compute deltas.

Checkpoint: Output shows sensible trend line.
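
For the delta column, a row formatter along these lines would reproduce the layout of the example output in section 3.4. The function name and the convention that a negative previous heap value means "no previous sample" are assumptions of this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format one table row under the header "Time  RSS(MB)  Heap(MB)  Delta".
 * Pass prev_heap_mb < 0 for the first sample, which prints "--".
 * Returns the value of snprintf (the length the full row needs). */
int format_row(char *buf, size_t len, int t_sec,
               double rss_mb, double heap_mb, double prev_heap_mb)
{
    assert(buf != NULL && len > 0);
    char delta[32];

    if (prev_heap_mb < 0.0)
        strcpy(delta, "--");
    else
        snprintf(delta, sizeof delta, "%+.1f", heap_mb - prev_heap_mb);

    /* Column widths match the header: 6, 9, and 10 characters. */
    return snprintf(buf, len, "%02d:%02d %-9.1f%-10.1f%s",
                    t_sec / 60, t_sec % 60, rss_mb, heap_mb, delta);
}
```

Printing the sign explicitly with %+.1f makes shrinking intervals stand out in the table as clearly as growing ones.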

Phase 3: Polish & Edge Cases (2-3 days)

Goals:

  • Add warnings and summary report.

Tasks:

  1. Detect linear growth.
  2. Print diagnosis.

Checkpoint: Tool flags leak in test program.

5.11 Key Implementation Decisions

Decision     Options              Recommendation  Rationale
Heap source  maps vs smaps        maps first      Simpler parsing
Trend model  slope vs regression  slope           Sufficient for leaks

6. Testing Strategy

6.1 Test Categories

Category   Purpose                   Examples
Parsing    Validate heap extraction  Compare with pmap
Trend      Validate slope            Leaky test program
Stability  Target exits              Handle gracefully

6.2 Critical Test Cases

  1. Heap grows linearly -> warning triggered.
  2. Heap grows once -> no warning.
  3. Target exits -> tool reports and exits cleanly.
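
Cases 1 and 2 differ in shape rather than in total growth, so a slope check alone may not separate them. One hedged way to tell them apart is to require that the heap grew in most sampling intervals; the function names, the epsilon, and the minimum fraction below are all assumptions to tune for your workload.

```c
#include <assert.h>
#include <stddef.h>

/* Fraction of consecutive intervals in which the heap grew by more
 * than eps_mb. Returns 0.0 when there are fewer than two samples. */
double growth_fraction(const double *heap_mb, size_t n, double eps_mb)
{
    assert(heap_mb != NULL || n == 0);
    if (n < 2) return 0.0;

    size_t up = 0;
    for (size_t i = 1; i < n; i++) {
        if (heap_mb[i] - heap_mb[i - 1] > eps_mb) up++;
    }
    return (double)up / (double)(n - 1);
}

/* Steady growth: at least three samples, and growth in at least
 * min_fraction of the intervals. A single jump does not qualify. */
int steady_growth(const double *heap_mb, size_t n,
                  double eps_mb, double min_fraction)
{
    return n >= 3 && growth_fraction(heap_mb, n, eps_mb) >= min_fraction;
}
```

With a minimum fraction of 0.8, a series that rises every interval is flagged, while a series that jumps once and then stays flat is not.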

6.3 Test Data

Leaky sample: +1MB/sec heap growth

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall                    Symptom            Solution
Confusing VSZ/RSS          False alarms       Report both clearly
Ignoring demand paging     Wrong conclusions  Touch memory in tests
Parsing smaps incorrectly  Missing data       Start with maps

7.2 Debugging Strategies

  • Compare to pmap -x <pid> totals.
  • Validate heap region by address range.

7.3 Performance Traps

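Reading smaps is costly because the kernel walks the page tables of every mapping to compute the per-region counters; use longer sampling intervals if overhead matters.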


8. Extensions & Challenges

8.1 Beginner Extensions

  • Add CSV export.
  • Add a max heap threshold.

8.2 Intermediate Extensions

  • Use /proc/<pid>/smaps to separate private vs shared.
  • Add a small ASCII sparkline.

8.3 Advanced Extensions

  • Integrate with perf or malloc tracing.
  • Detect leaks per allocation site (with optional debug builds).

9. Real-World Connections

9.1 Industry Applications

  • Memory leak triage in long-running services.

9.2 Related Open Source Projects

  • Valgrind: https://valgrind.org
  • heaptrack: https://github.com/KDE/heaptrack

9.3 Interview Relevance

  • Memory metrics and leak reasoning are common systems topics.

10. Resources

10.1 Essential Reading

  • pmap(1) - man 1 pmap
  • proc(5) - /proc/<pid>/maps and /proc/<pid>/smaps

10.2 Video Resources

  • Virtual memory overview lectures (search “virtual memory Linux”)

10.3 Tools & Documentation

  • smaps: /proc/<pid>/smaps

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain RSS vs VSZ.
  • I can interpret memory maps.
  • I can explain demand paging.

11.2 Implementation

  • Heap size is computed correctly.
  • Trend analysis works.
  • Reports are readable.

11.3 Growth

  • I can use the tool on a real service.
  • I can explain leak detection logic.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Report RSS and heap size over time.

Full Completion:

  • Detect linear heap growth and provide a warning.

Excellence (Going Above & Beyond):

  • Separate private/shared memory and provide leak confidence scoring.

This guide was generated from LINUX_SYSTEM_TOOLS_MASTERY.md. For the complete learning path, see the parent directory.