Project 5: “Works On My Machine” Debugger

Build a diagnostic tool that fingerprints system environments and diffs them across machines.

Quick Reference

Attribute	Value
Difficulty	Intermediate
Time Estimate	1-2 weeks
Language	C (Alternatives: Rust, Go)
Prerequisites	Basic C, Linux filesystem basics
Key Topics	/proc, resource limits, dynamic linking

1. Learning Objectives

By completing this project, you will:

Collect critical environment data that affects program behavior.
Parse /proc and system configuration files safely.
Produce structured output for diffing.
Explain common “works on my machine” causes with evidence.

2. Theoretical Foundation

2.1 Core Concepts

/proc and /sys: Virtual files exposing kernel state.
Resource limits: getrlimit() and /proc/self/limits.
Dynamic linking: library resolution via ld.so and LD_LIBRARY_PATH.
Environment variables: execution context affecting behavior.
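As a rough illustration of the resource-limit concept above, the sketch below queries one limit with getrlimit() and formats it as "soft/hard". The helper names format_one and format_limit are hypothetical, chosen for this example only; getrlimit(), struct rlimit, and RLIM_INFINITY are the standard POSIX interface.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Render a single limit value; RLIM_INFINITY is shown as a word
 * rather than as a very large number. (Hypothetical helper.) */
static void format_one(rlim_t v, char *out, size_t outlen)
{
    if (v == RLIM_INFINITY)
        snprintf(out, outlen, "unlimited");
    else
        snprintf(out, outlen, "%llu", (unsigned long long)v);
}

/* Write "soft/hard" for the given resource into buf.
 * Returns 0 on success, -1 if getrlimit() fails. (Hypothetical helper.) */
int format_limit(int resource, char *buf, size_t buflen)
{
    struct rlimit rl;
    char soft[32], hard[32];

    if (getrlimit(resource, &rl) != 0)
        return -1;
    format_one(rl.rlim_cur, soft, sizeof soft);
    format_one(rl.rlim_max, hard, sizeof hard);
    snprintf(buf, buflen, "%s/%s", soft, hard);
    return 0;
}
```

Calling format_limit(RLIMIT_NOFILE, buf, sizeof buf) yields the same soft and hard values that the "Max open files" row of /proc/self/limits reports for the calling process; the exact numbers vary from machine to machine, which is the point of capturing them.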

2.2 Why This Matters

Many production failures are caused by environment mismatches. A reliable fingerprint tool turns guesswork into evidence.

2.3 Historical Context / Background

Linux exposes extensive runtime state via /proc. Tooling like lsof and strace emerged to make system state observable.

2.4 Common Misconceptions

“The OS is the same, so behavior is the same.” Kernel settings and limits vary.
“Environment variables only affect the shell.” They affect libraries and runtime behavior.

3. Project Specification

3.1 What You Will Build

A CLI tool with two subcommands:

capture: produce a JSON snapshot of machine state.
diff: compare two snapshots and highlight differences.

3.2 Functional Requirements

Fingerprint capture: limits, kernel version, libc version, env vars.
/proc inspection: read /proc/self/limits, /proc/self/maps, /proc/sys/*.
Diff output: highlight mismatches with human-readable explanations.
Output format: JSON or line-delimited key/value pairs.
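One possible way to collect the kernel and libc versions named in the first requirement is sketched below. It assumes a Linux system; uname() is POSIX, while gnu_get_libc_version() exists only in glibc, so the sketch falls back to "unknown" on other C libraries. The function name capture_versions is hypothetical.

```c
#include <stdio.h>
#include <sys/utsname.h>
#ifdef __GLIBC__
#include <gnu/libc-version.h>   /* glibc-only header */
#endif

/* Fill kernel with the running kernel release and libc with the
 * C library version. Returns 0 on success, -1 if uname() fails.
 * (Hypothetical helper for illustration.) */
int capture_versions(char *kernel, size_t klen, char *libc, size_t llen)
{
    struct utsname u;

    if (uname(&u) != 0)
        return -1;
    snprintf(kernel, klen, "%s", u.release);
#ifdef __GLIBC__
    snprintf(libc, llen, "glibc %s", gnu_get_libc_version());
#else
    /* Assumption: no portable runtime query on non-glibc systems. */
    snprintf(libc, llen, "unknown");
#endif
    return 0;
}
```

The two strings can then be stored under keys such as KERNEL_VERSION and LIBC_VERSION (the second key name is an assumption; only KERNEL_VERSION appears in the example output).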

3.3 Non-Functional Requirements

Performance: completes in < 1 second on typical systems.
Reliability: handles missing files gracefully.
Usability: clear help and example usage.

3.4 Example Usage / Output

$ ./diagtool capture > local.json
$ ./diagtool diff local.json ci.json
RLIMIT_NOFILE: 65535 (local) vs 1024 (ci)
LD_LIBRARY_PATH: /opt/lib (local) vs unset (ci)

3.5 Real World Outcome

You run capture locally and on a second machine (a CI runner or a server), then run diff to get a precise list of discrepancies. Example output:

$ ./diagtool diff local.json server.json
KERNEL_VERSION: 6.6.8 (local) vs 5.15.0 (server)
RLIMIT_NPROC: 4096 (local) vs 1024 (server)

4. Solution Architecture

4.1 High-Level Design

┌──────────────┐   ┌──────────────┐   ┌──────────────┐
│   Capture    │──▶│   Snapshot   │──▶│    Diff      │
└──────────────┘   └──────────────┘   └──────────────┘
        │                  │                  │
        ▼                  ▼                  ▼
   /proc parsing       JSON writer        Comparator

Diagnostic Capture Pipeline

4.2 Key Components

Component	Responsibility	Key Decisions
Collector	Read system state	use robust file parsing
Serializer	Write JSON	stable key ordering
Diff Engine	Compare snapshots	human-friendly output

4.3 Data Structures

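#include <stddef.h>  /* size_t */

/* One captured fact, e.g. key "RLIMIT_NOFILE", value "1024". */
struct kv_pair {
    char key[128];
    char value[256];   /* longer values must be truncated or rejected */
};

/* A full capture: a heap-allocated array of pairs. */
struct snapshot {
    struct kv_pair *items;
    size_t count;
};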
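A minimal sketch of how a snapshot might be filled, assuming the two structs above. The helpers snapshot_add and snapshot_free are hypothetical names; the array grows by one slot per call, which is simple but not amortized, and is likely adequate for the few hundred keys a capture produces.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct kv_pair {
    char key[128];
    char value[256];
};

struct snapshot {
    struct kv_pair *items;
    size_t count;
};

/* Append one key/value pair, truncating strings that do not fit.
 * Returns 0 on success, -1 if memory cannot be allocated. */
int snapshot_add(struct snapshot *s, const char *key, const char *value)
{
    struct kv_pair *grown = realloc(s->items, (s->count + 1) * sizeof *grown);

    if (grown == NULL)
        return -1;              /* s->items is still valid here */
    s->items = grown;
    snprintf(grown[s->count].key, sizeof grown[s->count].key, "%s", key);
    snprintf(grown[s->count].value, sizeof grown[s->count].value, "%s", value);
    s->count++;
    return 0;
}

/* Release the array and reset the snapshot to empty. */
void snapshot_free(struct snapshot *s)
{
    free(s->items);
    s->items = NULL;
    s->count = 0;
}
```

A snapshot must start zero-initialized (for example `struct snapshot s = {0};`) so that the first realloc() call behaves like malloc().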

4.4 Algorithm Overview

Key Algorithm: snapshot diff

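Load snapshots A and B into key-value maps.
For each key in the union of both key sets, compare the two values; treat a key present on only one side as a difference.
Output each difference with context (which snapshot holds which value).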

Complexity Analysis:

Time: O(N) expected for N total keys with hash-map lookups (O(N log N) if you sort arrays and merge instead)
Space: O(N)
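The steps above can be sketched without a hash map by walking two arrays that are already sorted by key. This is a swapped-in technique (a sorted merge), not the hash-map approach the guide recommends; it is shown because it fits in a few lines of C. The function name diff_sorted and the "(a)"/"(b)" labels are hypothetical.

```c
#include <stdio.h>
#include <string.h>

struct kv_pair {
    char key[128];
    char value[256];
};

/* Compare two key-sorted arrays and print one line per difference.
 * A key found on only one side is reported as "unset" on the other.
 * Returns the number of differing keys. */
size_t diff_sorted(const struct kv_pair *a, size_t na,
                   const struct kv_pair *b, size_t nb, FILE *out)
{
    size_t i = 0, j = 0, diffs = 0;

    while (i < na || j < nb) {
        int c;

        if (i == na)
            c = 1;                       /* only b has keys left */
        else if (j == nb)
            c = -1;                      /* only a has keys left */
        else
            c = strcmp(a[i].key, b[j].key);

        if (c < 0) {
            fprintf(out, "%s: %s (a) vs unset (b)\n", a[i].key, a[i].value);
            i++;
            diffs++;
        } else if (c > 0) {
            fprintf(out, "%s: unset (a) vs %s (b)\n", b[j].key, b[j].value);
            j++;
            diffs++;
        } else {
            if (strcmp(a[i].value, b[j].value) != 0) {
                fprintf(out, "%s: %s (a) vs %s (b)\n",
                        a[i].key, a[i].value, b[j].value);
                diffs++;
            }
            i++;
            j++;
        }
    }
    return diffs;
}
```

The walk itself is linear in the total number of keys; the sort that must precede it is what makes this variant O(N log N) overall.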

5. Implementation Guide

5.1 Development Environment Setup

sudo apt-get install build-essential

5.2 Project Structure

diagtool/
├── src/
│   ├── main.c
│   ├── capture.c
│   ├── diff.c
│   └── json.c
├── tests/
│   └── test_diff.sh
├── Makefile
└── README.md

Diagnostic Tool Project Structure

5.3 The Core Question You’re Answering

“What system facts actually change program behavior across machines?”

5.4 Concepts You Must Understand First

Stop and research these before coding:

/proc filesystem
- What does /proc/self/limits represent?
- Book Reference: “How Linux Works” Ch. 8
Resource limits
- RLIMIT_NOFILE, RLIMIT_NPROC and failure modes.
- Book Reference: “TLPI” Ch. 36
Dynamic linking
- How LD_LIBRARY_PATH and rpath affect loading.
- Book Reference: “CS:APP” Ch. 7
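To make the /proc/self/limits question concrete, here is a sketch of one way to pull a single row out of that file. It assumes the usual layout in which each row starts with the limit's name (for example "Max open files") followed by whitespace-separated soft and hard values. The function names parse_limit_line and read_proc_limit are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* If line starts with name, copy the next two whitespace-separated
 * tokens into soft and hard. Returns 1 on a match, 0 otherwise. */
int parse_limit_line(const char *line, const char *name,
                     char soft[32], char hard[32])
{
    size_t n = strlen(name);

    if (strncmp(line, name, n) != 0)
        return 0;
    return sscanf(line + n, "%31s %31s", soft, hard) == 2;
}

/* Scan /proc/self/limits for one named limit.
 * Returns 1 if found, 0 if not found, -1 if the file cannot be opened. */
int read_proc_limit(const char *name, char soft[32], char hard[32])
{
    FILE *f = fopen("/proc/self/limits", "r");
    char line[256];
    int found = 0;

    if (f == NULL)
        return -1;
    while (!found && fgets(line, sizeof line, f) != NULL)
        found = parse_limit_line(line, name, soft, hard);
    fclose(f);
    return found;
}
```

Matching on the name prefix rather than on fixed column offsets keeps the parser tolerant of spacing changes, and the "unlimited" token is returned as a plain string for the caller to interpret.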

5.5 Questions to Guide Your Design

Before implementing, think through these:

Which environment variables are most likely to cause breakage?
How will you format diff output for easy scanning?
What should happen if a file is missing on one system?
How do you avoid collecting sensitive data?
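For the last question, one simple starting point is a deny-list applied to environment variable names before their values are written out. The marker strings below are assumptions chosen for illustration, not a complete or authoritative list, and the function names is_sensitive_key and redact_value are hypothetical.

```c
#include <string.h>

/* Return 1 if the variable name looks like it holds a credential.
 * Assumption: names are upper case, as is conventional for env vars. */
int is_sensitive_key(const char *key)
{
    static const char *const markers[] = {
        "SECRET", "TOKEN", "PASSWORD", "API_KEY"
    };
    size_t i;

    for (i = 0; i < sizeof markers / sizeof markers[0]; i++)
        if (strstr(key, markers[i]) != NULL)
            return 1;
    return 0;
}

/* Return the value to store: the original, or a fixed placeholder
 * when the key is considered sensitive. */
const char *redact_value(const char *key, const char *value)
{
    return is_sensitive_key(key) ? "[redacted]" : value;
}
```

Storing a placeholder instead of dropping the key still lets diff report that a variable is set on one machine and unset on the other, without revealing its contents.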

5.6 Thinking Exercise

Find a Real Difference

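Compare ulimit -n (the per-process soft limit on open file descriptors) and /proc/sys/fs/file-max (the system-wide limit on open file handles) on two machines. Note how they differ and which errors appear when each is exceeded: typically EMFILE ("Too many open files") for the per-process limit and ENFILE ("Too many open files in system") for the system-wide one.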

5.7 The Interview Questions They’ll Ask

Prepare to answer these:

“What are common causes of ‘works on my machine’ bugs?”
“How does /proc expose process limits?”
“What is LD_LIBRARY_PATH and why is it risky?”

5.8 Hints in Layers

Hint 1: Start with limits. /proc/self/limits is a goldmine for differences.

Hint 2: Stable sorting. Sort keys before output so diffs are deterministic.

Hint 3: Redact paths. Avoid dumping private file paths unless necessary.
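The sorting hint can be sketched with the standard qsort() function, assuming the kv_pair layout from section 4.3. The names cmp_by_key and sort_pairs are hypothetical. qsort() is not guaranteed to be a stable sort, but when every key is unique the resulting order is still fully deterministic.

```c
#include <stdlib.h>
#include <string.h>

struct kv_pair {
    char key[128];
    char value[256];
};

/* qsort() comparator: order pairs by key, byte-wise. */
static int cmp_by_key(const void *pa, const void *pb)
{
    const struct kv_pair *a = pa;
    const struct kv_pair *b = pb;

    return strcmp(a->key, b->key);
}

/* Sort a snapshot's pairs in place so output order never depends
 * on the order in which facts were collected. */
void sort_pairs(struct kv_pair *items, size_t count)
{
    qsort(items, count, sizeof *items, cmp_by_key);
}
```

Using strcmp() rather than a locale-aware comparison keeps the order identical across machines with different locale settings, which matters for a tool whose purpose is comparing machines.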

5.9 Books That Will Help

Topic	Book	Chapter
/proc and processes	“How Linux Works”	Ch. 8
Resource limits	“TLPI”	Ch. 36
Dynamic linking	“CS:APP”	Ch. 7

5.10 Implementation Phases

Phase 1: Foundation (2-3 days)

Goals:

Capture basic system facts
JSON serialization

Tasks:

Read kernel version and OS info.
Implement simple JSON writer.

Checkpoint: capture produces valid JSON.
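The part of a simple JSON writer most likely to produce invalid output is string escaping, since paths and environment values can contain quotes, backslashes, and control characters. The sketch below shows one way to handle it; the function name json_escape is hypothetical, and bytes at or above 0x80 are passed through unchanged on the assumption that the input is already valid UTF-8.

```c
#include <stdio.h>
#include <string.h>

/* Write s into out as a quoted JSON string.
 * Returns 0 on success, -1 if out is too small. */
int json_escape(const char *s, char *out, size_t outlen)
{
    const unsigned char *p;
    size_t n = 0;

    if (outlen < 3)
        return -1;                      /* need room for "" plus NUL */
    out[n++] = '"';
    for (p = (const unsigned char *)s; *p != '\0'; p++) {
        char esc[8];
        size_t len;

        if (*p == '"' || *p == '\\')
            snprintf(esc, sizeof esc, "\\%c", *p);
        else if (*p == '\n')
            snprintf(esc, sizeof esc, "\\n");
        else if (*p == '\t')
            snprintf(esc, sizeof esc, "\\t");
        else if (*p < 0x20)
            snprintf(esc, sizeof esc, "\\u%04x", (unsigned)*p);
        else
            snprintf(esc, sizeof esc, "%c", *p);

        len = strlen(esc);
        if (n + len + 2 > outlen)
            return -1;                  /* keep room for closing quote and NUL */
        memcpy(out + n, esc, len);
        n += len;
    }
    out[n++] = '"';
    out[n] = '\0';
    return 0;
}
```

Returning an error instead of silently truncating means a too-small buffer shows up as a failed capture rather than as a snapshot that jq rejects later.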

Phase 2: Core Functionality (3-4 days)

Goals:

/proc parsing
Diff engine

Tasks:

Parse /proc/self/limits and /proc/self/maps.
Implement diff output.

Checkpoint: diff highlights real differences.

Phase 3: Polish & Edge Cases (2-3 days)

Goals:

CLI flags
Redaction and filtering

Tasks:

Add --include/--exclude filters.
Handle missing files gracefully.

Checkpoint: Tool runs on minimal containers.
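Graceful handling of missing files might look like the sketch below: a failed open becomes a recorded placeholder value instead of an aborted capture. The function name read_first_line and the placeholder strings "<missing>" and "<unreadable>" are assumptions for illustration; buflen is assumed to be at least 1.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Read the first line of path into buf, without the trailing newline.
 * On failure, buf receives a placeholder so the snapshot still records
 * that the file was absent or unreadable.
 * Returns 0 on success, -1 if the file could not be opened. */
int read_first_line(const char *path, char *buf, size_t buflen)
{
    FILE *f = fopen(path, "r");

    if (f == NULL) {
        snprintf(buf, buflen, "%s",
                 errno == ENOENT ? "<missing>" : "<unreadable>");
        return -1;
    }
    if (fgets(buf, (int)buflen, f) == NULL)
        buf[0] = '\0';                  /* empty file: record empty value */
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';     /* strip newline if present */
    return 0;
}
```

Distinguishing a missing file from an unreadable one is useful in minimal containers, where /proc entries may be absent entirely or hidden by a restrictive mount.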

5.11 Key Implementation Decisions

Decision	Options	Recommendation	Rationale
Output format	JSON vs text	JSON	machine-readable diffs
Key storage	hash map vs array	hash map	faster diff lookup
Sensitivity	full dump vs redaction	redaction	safer by default

6. Testing Strategy

6.1 Test Categories

Category	Purpose	Examples
Unit Tests	Parser correctness	limits parser
Integration Tests	Capture+diff	run on same host
Edge Case Tests	Missing files	minimal container

6.2 Critical Test Cases

Identical snapshots: diff should be empty.
Different limits: ensure output is readable.
Missing /proc: degrade gracefully with warnings.

6.3 Test Data

{"RLIMIT_NOFILE":"1024"}

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall	Symptom	Solution
Unstable ordering	noisy diffs	sort keys
Over-collecting	sensitive output	apply filters
Parsing errors	crash on format	defensive parsing

7.2 Debugging Strategies

Log file reads to see missing paths.
Validate JSON with jq.

7.3 Performance Traps

Reading huge /proc entries can be slow; keep scope focused.

8. Extensions & Challenges

8.1 Beginner Extensions

Add CSV output.
Add a “quick summary” mode.

8.2 Intermediate Extensions

Integrate with ssh to capture remote snapshot.
Add library version comparisons using ldd.

8.3 Advanced Extensions

Build a TUI diff viewer.
Add baseline profiles for CI enforcement.

9. Real-World Connections

9.1 Industry Applications

CI/CD debugging for environment drift.
Support tooling for customer deployments.

9.2 Related Open Source Projects

neofetch: https://github.com/dylanaraps/neofetch - System info tool
strace: https://github.com/strace/strace - System call inspection

9.3 Interview Relevance

Shows systematic debugging mindset.
Demonstrates knowledge of /proc and limits.

10. Resources

10.1 Essential Reading

“How Linux Works” by Brian Ward - Ch. 8
“The Linux Programming Interface” by Michael Kerrisk - Ch. 36

10.2 Video Resources

Linux /proc deep dives - YouTube
SRE incident debugging talks

10.3 Tools & Documentation

man 2 getrlimit: Resource limits
man 5 proc: /proc layout

10.4 Related Projects in This Series

Project 1 teaches inode tracking and FD awareness.
Project 6 uses the diagnostics to compare deployment environments.

11. Self-Assessment Checklist

11.1 Understanding

I can explain why limits differ across machines.
I can parse /proc/self/limits confidently.
I can describe how dynamic linking varies.

11.2 Implementation

Capture and diff work reliably.
Output is stable and readable.
Sensitive data is handled safely.

11.3 Growth

I can explain this tool in an interview.
I can use it to debug a real issue.

12. Submission / Completion Criteria

Minimum Viable Completion:

capture produces a snapshot.
diff compares two snapshots.

Full Completion:

Includes limits, kernel, and environment info.
Robust parsing and stable output.

Excellence (Going Above & Beyond):

Remote capture and baseline enforcement.
TUI diff viewer.

This guide was generated from SPRINT_5_SYSTEMS_INTEGRATION_PROJECTS.md. For the complete learning path, see the parent directory.
