Project 5: System Health Monitor
Build a real-time dashboard that shows load, memory, swap, and VM stats in one view.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Beginner |
| Time Estimate | Weekend |
| Language | Bash (Alternatives: Python, Go, Rust) |
| Prerequisites | Basic shell scripting |
| Key Topics | load average, free/available memory, vmstat |
1. Learning Objectives
By completing this project, you will:
- Parse load averages from /proc/loadavg.
- Extract memory metrics from /proc/meminfo and free.
- Interpret vmstat fields for CPU and I/O pressure.
- Present metrics with thresholds and trend indicators.
2. Theoretical Foundation
2.1 Core Concepts
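- Load average: Exponentially damped average of the number of tasks that are runnable or in uninterruptible sleep (usually waiting on I/O), reported over 1, 5, and 15 minutes.
- Memory accounting: MemFree is memory not used for anything; MemAvailable estimates how much can be allocated without swapping, counting reclaimable page cache.
- vmstat: Periodic samples of run queue length, swap and block I/O, and the CPU time breakdown.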
2.2 Why This Matters
Many production slowdowns show up in these metrics as CPU saturation, memory pressure, or I/O wait. Knowing how to read them gives you a fast first signal during incidents.
2.3 Historical Context / Background
These metrics have existed in some form since early Unix and remain the foundation of monitoring systems and dashboards.
2.4 Common Misconceptions
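- “Free memory must be high”: Linux deliberately uses otherwise idle memory for page cache and reclaims it on demand, so low MemFree alone is not a problem.
- “Load average equals CPU%”: On Linux it also counts tasks in uninterruptible sleep, typically waiting on I/O, so load can be high while the CPUs are mostly idle.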
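The first misconception is easy to check on a live machine. The sketch below is a minimal illustration, not part of the original guide: the function name `mem_free_vs_available` and its optional file argument are hypothetical, and it assumes a kernel that exposes MemAvailable in /proc/meminfo.

```bash
# Print MemFree and MemAvailable as whole-number percentages of MemTotal.
# The optional argument is a hypothetical override so a saved copy of
# /proc/meminfo can be inspected instead of the live file.
mem_free_vs_available() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemFree:"      { free  = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        END {
            if (total > 0)
                printf "MemFree=%d%% MemAvailable=%d%%\n", free * 100 / total, avail * 100 / total
        }
    ' "${1:-/proc/meminfo}"
}

# Only run against the live file where it exists (Linux).
if [ -r /proc/meminfo ]; then
    mem_free_vs_available
fi
```

On a machine that has been up for a while, the two percentages often differ widely, because page cache counts toward MemAvailable but not MemFree.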
3. Project Specification
3.1 What You Will Build
A Bash dashboard that refreshes every N seconds, showing load, memory, swap, and vmstat values with basic status flags.
3.2 Functional Requirements
- Show 1/5/15 minute load averages.
- Show total/used/available memory and swap usage.
- Show vmstat fields (r, b, si, so, wa).
3.3 Non-Functional Requirements
- Performance: Minimal overhead.
- Reliability: Handles missing tools gracefully.
- Usability: Clear labels and consistent units.
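For consistent units, one option is to convert every /proc/meminfo value in a single helper. This is a sketch under assumptions: the name `kb_to_gib` is hypothetical, and it treats the "kB" in /proc/meminfo as units of 1024 bytes, so 1048576 kB make up one "G" in the output.

```bash
# Convert a kB value (as reported in /proc/meminfo) to GiB with one decimal.
# awk is used because POSIX shell arithmetic is integer-only.
kb_to_gib() {
    awk -v kb="$1" 'BEGIN { printf "%.1fG\n", kb / 1048576 }'
}

kb_to_gib 16777216
```

Routing all memory and swap values through one function like this keeps labels such as "16.0G" uniform across the dashboard.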
3.4 Example Usage / Output
$ ./health-monitor --interval 2
Load: 1.25 1.50 1.75
Mem: 12.4G used / 16G total (avail 3.4G)
Swap: 0.4G / 8G
vmstat: r=2 b=0 wa=1 si=0 so=0
3.5 Real World Outcome
You will run the script and get a concise view of system health, matching the example output in section 3.4.
4. Solution Architecture
4.1 High-Level Design
collect metrics -> compute thresholds -> render -> sleep -> repeat
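The pipeline above could be sketched as a loop like the following. The stage functions `collect`, `evaluate`, and `render` are hypothetical stubs, and the iteration-count argument is an assumption added here so the loop can be exercised without running forever.

```bash
# Hypothetical stage functions; real versions would read /proc and vmstat.
collect()  { SAMPLE="sample $1"; }
evaluate() { STATUS="OK"; }
render()   { echo "[$STATUS] $SAMPLE"; }

# Run the collect -> evaluate -> render -> sleep cycle.
# $1 = interval in seconds, $2 = number of iterations (0 means run forever).
run_loop() {
    interval=$1
    count=${2:-0}
    i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        i=$((i + 1))
        collect "$i"
        evaluate
        render
        sleep "$interval"
    done
}

# Two iterations with no delay, just to show the shape of the output.
run_loop 0 2
```

Keeping each stage in its own function makes it possible to test parsing and thresholds separately from the refresh loop.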
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Load reader | /proc/loadavg | Prefer /proc over uptime |
| Memory reader | /proc/meminfo | Use MemAvailable |
| vmstat reader | vmstat 1 2 | Use the second sample (last line) |
| Renderer | Format output | Fixed units and labels |
4.3 Data Structures
LOAD_1=; LOAD_5=; LOAD_15=
MEM_TOTAL=; MEM_AVAIL=
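One way to populate these variables is sketched below. The function names `read_load` and `read_mem` and their optional file arguments are hypothetical; the defaults point at the /proc files named in the component table.

```bash
# Fill LOAD_1, LOAD_5, LOAD_15 from the first three fields of a
# loadavg-style file; the remaining fields are discarded.
read_load() {
    read -r LOAD_1 LOAD_5 LOAD_15 _rest < "${1:-/proc/loadavg}"
}

# Fill MEM_TOTAL and MEM_AVAIL (both in kB) from a meminfo-style file.
read_mem() {
    MEM_TOTAL=$(awk '$1 == "MemTotal:" { print $2; exit }' "${1:-/proc/meminfo}")
    MEM_AVAIL=$(awk '$1 == "MemAvailable:" { print $2; exit }' "${1:-/proc/meminfo}")
}

# Only touch the live files where they exist (Linux).
if [ -r /proc/loadavg ] && [ -r /proc/meminfo ]; then
    read_load
    read_mem
    echo "load: $LOAD_1 $LOAD_5 $LOAD_15  mem: $MEM_AVAIL / $MEM_TOTAL kB available"
fi
```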
4.4 Algorithm Overview
Key Algorithm: Thresholding
- Compare load to CPU count.
- Compare MemAvailable to total.
- Flag swap in/out activity.
Complexity Analysis:
- Time: O(1) per refresh
- Space: O(1)
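The three checks might look like the sketch below. All function names are hypothetical, and the cut-off values (0.7 and 1.0 load per CPU, 20% and 10% memory available) are illustrative assumptions rather than established standards; tune them for your systems.

```bash
# Classify load relative to CPU count. $1 = load average, $2 = CPU count (> 0).
# The 0.7 and 1.0 per-CPU cut-offs are assumed values.
load_status() {
    awk -v load="$1" -v cpus="$2" 'BEGIN {
        per_cpu = load / cpus
        if (per_cpu >= 1.0)      print "ALERT"
        else if (per_cpu >= 0.7) print "WARN"
        else                     print "OK"
    }'
}

# Classify memory by the share of MemTotal still available (both in kB).
# The 10% and 20% cut-offs are assumed values.
mem_status() {
    awk -v avail="$1" -v total="$2" 'BEGIN {
        pct = avail * 100 / total
        if (pct < 10)      print "ALERT"
        else if (pct < 20) print "WARN"
        else               print "OK"
    }'
}

# Flag any swap-in ($1) or swap-out ($2) activity in the sample.
swap_status() {
    if [ "$1" -gt 0 ] || [ "$2" -gt 0 ]; then echo "WARN"; else echo "OK"; fi
}

echo "load=$(load_status 1.25 4) mem=$(mem_status 3400000 16000000) swap=$(swap_status 0 0)"
```

Each check is a constant number of comparisons, which matches the O(1) cost per refresh stated above.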
5. Implementation Guide
5.1 Development Environment Setup
uname -a
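Beyond confirming the kernel version, it can help to check that the tools the dashboard relies on are installed. The sketch below uses a hypothetical helper named `have_tool`; the same check could later serve the requirement to handle missing tools gracefully.

```bash
# Return 0 if a command is on PATH; otherwise warn on stderr and return 1.
have_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "warning: '$1' not found; that section will be skipped" >&2
    return 1
}

# Report which of the tools used in this project are present.
for tool in awk vmstat free; do
    if have_tool "$tool"; then
        echo "found: $tool"
    fi
done
```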
5.2 Project Structure
project-root/
├── health-monitor
└── README.md
5.3 The Core Question You’re Answering
“Is the system slow due to CPU, memory pressure, or I/O wait?”
5.4 Concepts You Must Understand First
Stop and research these before coding:
- Load averages
  - Compare to CPU count.
- MemAvailable
  - Why it differs from MemFree.
- vmstat fields
  - r, b, wa, si, so.
5.5 Questions to Guide Your Design
Before implementing, think through these:
- What thresholds indicate warning or alert?
- How often should you sample to avoid noise?
- Should you keep a small history for trend arrows?
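If you decide to show trend arrows, comparing the current sample with the previous one is the simplest starting point. In this sketch the function name `trend_arrow`, the ASCII symbols, and the default tolerance of 0.05 are all assumptions chosen for illustration.

```bash
# Print a trend symbol comparing the current value with the previous one.
# $1 = previous value, $2 = current value, $3 = tolerance (default 0.05).
# Changes within the tolerance count as flat, which damps sampling noise.
trend_arrow() {
    awk -v prev="$1" -v curr="$2" -v tol="${3:-0.05}" 'BEGIN {
        diff = curr - prev
        if (diff > tol)       print "^"
        else if (diff < -tol) print "v"
        else                  print "="
    }'
}

echo "load $(trend_arrow 1.25 1.50)"
```

A longer history (for example the last 5 samples) would smooth the signal further, at the cost of keeping more state between refreshes.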
5.6 Thinking Exercise
Compare raw sources
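Compare the output of uptime, /proc/loadavg, and vmstat 1 2. Confirm that uptime and /proc/loadavg report the same load averages, then check that the r and b columns in vmstat (instantaneous counts) are roughly in line with the 1-minute load.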
5.7 The Interview Questions They’ll Ask
Prepare to answer these:
- “What does load average represent?”
- “Why is low MemFree not necessarily a problem?”
- “What does high wa in vmstat mean?”
5.8 Hints in Layers
Hint 1: Use /proc
/proc/loadavg is easy to parse.
Hint 2: Use MemAvailable
It is the best signal of usable memory.
Hint 3: Use vmstat 1 2
Ignore the first data line (averages since boot) and use the second, which covers the most recent one-second interval.
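Hint 3 could be implemented along these lines. The function name `parse_vmstat` is hypothetical. As a design choice in this sketch, columns are looked up by their header names rather than by fixed position, on the assumption that the layout may differ between vmstat versions.

```bash
# Read `vmstat 1 2` output on stdin and print selected fields from the
# last data line, which is the most recent interval sample.
parse_vmstat() {
    awk '
        # Header row: remember the position of every column name.
        $1 == "r" && $2 == "b" { for (i = 1; i <= NF; i++) col[$i] = i; next }
        # Data row: keep overwriting so the last sample wins.
        $1 ~ /^[0-9]+$/ && ("r" in col) {
            r = $(col["r"]); b = $(col["b"]); wa = $(col["wa"])
            si = $(col["si"]); so = $(col["so"]); seen = 1
        }
        END { if (seen) printf "r=%s b=%s wa=%s si=%s so=%s\n", r, b, wa, si, so }
    '
}
```

Typical use would be `vmstat 1 2 | parse_vmstat`, which takes about one second because vmstat waits for the interval before printing the second sample.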
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Load average | “How Linux Works” | Ch. 8 |
| Memory stats | “Linux System Programming” | Ch. 4 |
| vmstat | “Systems Performance” | Ch. 7 |
5.10 Implementation Phases
Phase 1: Foundation (Half day)
Goals:
- Parse load and memory.
Tasks:
- Read /proc/loadavg.
- Read /proc/meminfo.
Checkpoint: Values match uptime and free.
Phase 2: Core Functionality (Half day)
Goals:
- Add vmstat and thresholds.
Tasks:
- Parse vmstat sample.
- Compute warning flags.
Checkpoint: Warnings align with actual load.
Phase 3: Polish & Edge Cases (Half day)
Goals:
- Add formatting and refresh loop.
Tasks:
- Print clear output.
- Handle missing tools.
Checkpoint: Dashboard refreshes cleanly.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Data sources | /proc vs commands | /proc | Stable parsing |
| Output | One-line vs multi-line | Multi-line | Readability |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Parsing | Validate values | Compare with free |
| Thresholds | Validate warnings | Simulate high load |
| Refresh | Validate loop | Run 1 min |
6.2 Critical Test Cases
- Load averages match /proc/loadavg.
- MemAvailable matches free output.
- vmstat fields align with vmstat output.
6.3 Test Data
Sample load: 1.25 1.50 1.75
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Using MemFree only | False alarms | Use MemAvailable |
| Parsing vmstat header | Bad values | Skip the two header lines; use the last data line |
| Comparing load to 1.0 | Wrong threshold | Normalize by CPU count |
7.2 Debugging Strategies
- Print raw /proc values alongside parsed output.
- Check CPU count from /proc/cpuinfo.
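A small sketch of both strategies follows. The function name `cpu_count` is hypothetical, and it assumes the cpuinfo layout with one line starting with "processor" per logical CPU, which holds on common x86 and ARM systems but not on every architecture; `nproc` or `getconf _NPROCESSORS_ONLN` are alternatives where available.

```bash
# Count logical CPUs by counting "processor" entries in a cpuinfo-style file.
cpu_count() {
    grep -c '^processor' "${1:-/proc/cpuinfo}"
}

# Print the raw source next to the parsed value to spot parsing mistakes.
if [ -r /proc/cpuinfo ] && [ -r /proc/loadavg ]; then
    echo "raw loadavg: $(cat /proc/loadavg)"
    echo "parsed cpus: $(cpu_count)"
fi
```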
7.3 Performance Traps
Very short intervals can cause self-induced load, since every refresh spawns several processes; use an interval of 1-2 seconds or longer.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add disk usage with df -h.
- Add CPU usage from /proc/stat.
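For the /proc/stat extension, CPU usage has to be derived from the difference between two samples, because the counters are cumulative since boot. The sketch below is one possible approach: the function name `cpu_busy_pct` is hypothetical, and it assumes the aggregate "cpu" line lists user, nice, system, idle, iowait, irq, softirq, and steal in that order, counting idle plus iowait as not busy.

```bash
# Compute the busy CPU percentage between two aggregate "cpu" lines
# taken from /proc/stat at different times. $1 = earlier, $2 = later.
cpu_busy_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            # Sum user..steal (fields 2-9); guest time is already part of user.
            for (i = 2; i <= NF && i <= 9; i++) total += $i
            t[NR] = total
            d[NR] = $5 + $6    # idle + iowait
        }
        END {
            dt = t[2] - t[1]
            di = d[2] - d[1]
            if (dt > 0) printf "%.1f\n", (dt - di) * 100 / dt
        }'
}

cpu_busy_pct "cpu 100 0 50 800 50 0 0 0 0 0" "cpu 200 0 100 1600 100 0 0 0 0 0"
```

On a live system the two arguments could come from `grep '^cpu ' /proc/stat` run twice with a sleep in between.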
8.2 Intermediate Extensions
- Track trends across 5 samples.
- Add JSON output.
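JSON output can be produced with printf alone as long as every value is numeric. The function name `emit_json` and the key names in this sketch are illustrative assumptions, not a fixed schema; string values would need proper escaping, which this sketch does not attempt.

```bash
# Emit one sample as a single-line JSON object.
# Arguments: load1 load5 load15 mem_total_kb mem_avail_kb (all numeric).
emit_json() {
    printf '{"load1":%s,"load5":%s,"load15":%s,"mem_total_kb":%s,"mem_avail_kb":%s}\n' \
        "$1" "$2" "$3" "$4" "$5"
}

emit_json 1.25 1.50 1.75 16000000 3400000
```

Printing one object per line makes the output easy to append to a log file and process later with line-oriented tools.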
8.3 Advanced Extensions
- Build a small ncurses UI.
- Add alert hooks (email/webhook).
9. Real-World Connections
9.1 Industry Applications
- Basic triage during incident response.
9.2 Related Open Source Projects
- collectd: https://collectd.org
- node_exporter: https://github.com/prometheus/node_exporter
9.3 Interview Relevance
- Load and memory interpretation is standard Linux interview material.
10. Resources
10.1 Essential Reading
- free(1) - man 1 free
- vmstat(8) - man 8 vmstat
10.2 Video Resources
- Linux performance basics (search “Linux load average”)
10.3 Tools & Documentation
- /proc/meminfo and /proc/loadavg
10.4 Related Projects in This Series
- Performance Snapshot Tool: combines these metrics into a report.
11. Self-Assessment Checklist
11.1 Understanding
- I can interpret load averages.
- I can explain MemAvailable.
- I can read vmstat fields.
11.2 Implementation
- Metrics are parsed correctly.
- Thresholds are reasonable.
- Dashboard refreshes smoothly.
11.3 Growth
- I can explain system health to a teammate.
- I can extend the dashboard with new metrics.
12. Submission / Completion Criteria
Minimum Viable Completion:
- Display load, memory, and vmstat in a loop.
Full Completion:
- Add thresholds and clear status labels.
Excellence (Going Above & Beyond):
- Add historical trends and JSON export.
This guide was generated from LINUX_SYSTEM_TOOLS_MASTERY.md. For the complete learning path, see the parent directory.