Project 4: The Process Psychic (Process Inspector)

Build a ps-like tool by parsing /proc directly.

Quick Reference

Attribute       Value
Difficulty      Intermediate
Time Estimate   Weekend
Language        C or Python (Alt: Go)
Prerequisites   File I/O, string parsing
Key Topics      /proc parsing, process states, CPU usage

1. Learning Objectives

By completing this project, you will:

  1. Scan /proc for running processes.
  2. Parse /proc/<pid>/stat or /proc/<pid>/status.
  3. Display command line, state, and memory stats.
  4. Compute CPU usage from jiffies.

2. Theoretical Foundation

2.1 Core Concepts

  • /proc: A virtual filesystem exposing kernel process data.
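  • Process states: A one-letter scheduler state, such as R (running or runnable), S (interruptible sleep), D (uninterruptible sleep, usually waiting on I/O), or Z (zombie: exited but not yet reaped by its parent).
  • Jiffies: Kernel ticks used for CPU accounting. The utime and stime values in /proc/<pid>/stat are reported in clock ticks; divide by sysconf(_SC_CLK_TCK), commonly 100, to get seconds.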

2.2 Why This Matters

On Linux, ps and top get nearly all of their data by reading and parsing files under /proc. This project removes the mystery by having you do the same.

2.3 Historical Context / Background

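The /proc filesystem first appeared in Eighth Edition Unix and was later adopted by System V Release 4, Plan 9, and Linux. On Linux it is the standard interface for process introspection, but it is not universal: macOS, for example, has no /proc.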

2.4 Common Misconceptions

  • “ps uses special syscalls”: It mostly reads /proc files.
  • “cmdline is space-separated”: It is null-separated.
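
The second point can be illustrated with a minimal Python sketch (Python being one of the languages this project allows). The function name format_cmdline is hypothetical, and decoding as UTF-8 with replacement is an assumption about how you want to display unusual bytes.

```python
def format_cmdline(raw: bytes) -> str:
    """Turn the raw bytes of /proc/<pid>/cmdline into a printable string."""
    # Arguments are separated (and normally terminated) by NUL bytes.
    args = raw.split(b"\0")
    # A trailing NUL leaves one empty element at the end; drop it.
    if args and args[-1] == b"":
        args.pop()
    # Undecodable bytes are replaced rather than raising an exception.
    return " ".join(a.decode("utf-8", errors="replace") for a in args)


if __name__ == "__main__":
    print(format_cmdline(b"/sbin/init\0splash\0"))  # prints: /sbin/init splash
```

Note that joining with spaces loses the boundary between arguments that themselves contain spaces; that is acceptable for display but not for re-executing the command.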

3. Project Specification

3.1 What You Will Build

A CLI tool that prints PID, user, state, memory usage, and command line for running processes.

3.2 Functional Requirements

  1. Enumerate numeric /proc entries.
  2. Parse process state and memory fields.
  3. Resolve UID to username.
  4. Print a process table.
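
Requirement 1 might look like the following Python sketch. The helper names pids_from_names and list_pids are illustrative, not part of any library; the filtering is kept separate from the directory read so it can be checked without a live /proc.

```python
import os


def pids_from_names(names):
    """Keep only the entries whose name is entirely ASCII digits, as ints."""
    # str.isdigit() alone also accepts non-ASCII digits, so check isascii() too.
    return sorted(int(n) for n in names if n.isascii() and n.isdigit())


def list_pids(proc_root="/proc"):
    """Return the PIDs currently visible under proc_root (Linux only)."""
    return pids_from_names(os.listdir(proc_root))
```

On a Linux system, list_pids() returns a snapshot; any PID in it may be gone by the time you open its files.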

3.3 Non-Functional Requirements

  • Reliability: Handle processes that exit mid-read.
  • Usability: Output columns align and truncate safely.
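
One way to meet the usability requirement is sketched below in Python. The column widths and the "~" truncation marker are arbitrary choices for illustration, and fit and format_row are hypothetical names.

```python
def fit(text, width):
    """Pad text to width, or cut it and mark the cut with '~'."""
    if len(text) > width:
        return text[: width - 1] + "~"
    return text.ljust(width)


def format_row(pid, user, state, cmd, cmd_width=40):
    """Build one table line; the column widths here are arbitrary choices."""
    # The last column is truncated but not padded, to avoid trailing spaces.
    last = fit(cmd, cmd_width).rstrip()
    return f"{fit(str(pid), 7)} {fit(user, 8)} {fit(state, 5)} {last}"


if __name__ == "__main__":
    print(format_row("PID", "USER", "STATE", "CMD"))
    print(format_row(1, "root", "S", "/sbin/init"))
```

Because the header goes through the same function as the data rows, the columns stay aligned if you later change a width.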

3.4 Example Usage / Output

$ ./myps
PID USER STATE CMD
1   root S     /sbin/init

3.5 Real World Outcome

You will run myps and see a table similar to ps:

$ ./myps
PID USER STATE CMD
1   root S     /sbin/init

4. Solution Architecture

4.1 High-Level Design

scan /proc -> parse stat/status -> resolve uid -> render table

4.2 Key Components

Component   Responsibility          Key Decisions
Scanner     Iterate /proc           Filter numeric dirs
Parser      Extract state, memory   status vs stat
Resolver    UID -> name             getpwuid
Renderer    Output table            Fixed columns
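
The Resolver row can be sketched in Python with the standard pwd module, which wraps getpwuid. The function name username_for and the fallback to the numeric UID are assumptions about how you might handle users with no passwd entry.

```python
import os
import pwd


def username_for(uid):
    """Map a numeric UID to a login name, falling back to the number itself."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        # No passwd entry (deleted user, container UID, ...): show the raw UID.
        return str(uid)


if __name__ == "__main__":
    print(username_for(os.getuid()))
```

Since many processes share a UID, caching the result per UID avoids repeated lookups during a scan.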

4.3 Data Structures

struct proc_info {
    pid_t pid;
    char state;
    long rss;
    char cmd[256];
};

4.4 Algorithm Overview

Key Algorithm: CPU%

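  1. Read utime + stime (fields 14 and 15 of /proc/<pid>/stat) and the system-wide total from the cpu line of /proc/stat.
  2. Sample both again after an interval.
  3. Divide the process delta by the system-wide delta and multiply by 100.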
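
The three steps above can be sketched in Python as follows. The function names are hypothetical. The sketch sums only the first eight counters of the cpu line, on the assumption that the guest columns are already included in user and nice. The result is a share of the whole machine; top's default display is instead relative to a single CPU, so the two can differ by a factor of the CPU count.

```python
def total_ticks(proc_stat_text):
    """Sum the system-wide tick counters from the first line of /proc/stat."""
    first = proc_stat_text.splitlines()[0].split()
    if first[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line first")
    # user, nice, system, idle, iowait, irq, softirq, steal. The guest
    # columns that may follow are assumed to be counted in user and nice.
    return sum(int(v) for v in first[1:9])


def cpu_percent(proc_before, proc_after, total_before, total_after):
    """Share of all CPU time used by one process between two samples."""
    elapsed = total_after - total_before
    if elapsed <= 0:
        return 0.0  # samples too close together to measure
    return 100.0 * (proc_after - proc_before) / elapsed
```

Here proc_before and proc_after are the utime + stime sums for one PID, taken at the same moments as the two /proc/stat reads.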

Complexity Analysis:

  • Time: O(n) per scan, where n is the number of processes
  • Space: O(n), one record per process

5. Implementation Guide

5.1 Development Environment Setup

gcc --version

5.2 Project Structure

project-root/
├── myps.c
└── README.md

5.3 The Core Question You’re Answering

“How does top know what’s running right now?”

5.4 Concepts You Must Understand First

Stop and research these before coding:

  1. /proc layout
  2. Null-separated cmdline
  3. Tick to seconds conversion
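
For the third item, a minimal Python sketch of the conversion is shown here. The function name ticks_to_seconds and its optional override parameter are illustrative; os.sysconf("SC_CLK_TCK") is the Python counterpart of sysconf(_SC_CLK_TCK) in C.

```python
import os


def ticks_to_seconds(ticks, ticks_per_second=None):
    """Convert a tick count from /proc/<pid>/stat into seconds."""
    if ticks_per_second is None:
        # Ask the system for the tick rate (USER_HZ); it is commonly 100.
        ticks_per_second = os.sysconf("SC_CLK_TCK")
    return ticks / ticks_per_second
```

For example, with a tick rate of 100, a process whose utime + stime is 250 has used 2.5 seconds of CPU time.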

5.5 Questions to Guide Your Design

Before implementing, think through these:

  1. Will you parse stat or status for memory values?
  2. How will you handle permissions for root-owned processes?
  3. How frequently will you sample for CPU%?

5.6 Thinking Exercise

Manual /proc

Inspect /proc/self/status and /proc/self/cmdline and compare to ps output.

5.7 The Interview Questions They’ll Ask

Prepare to answer these:

  1. “What is /proc and why is it virtual?”
  2. “How do you interpret the process state letters?”
  3. “Why is cmdline null-separated?”

5.8 Hints in Layers

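Hint 1: Numeric directories. A /proc entry is a process only if every character of its name passes isdigit.

Hint 2: cmdline parsing. Replace each \0 with a space when printing. Kernel threads have an empty cmdline, so fall back to the process name for them.

Hint 3: Race conditions. Treat ENOENT, and ESRCH if a read fails after the open succeeded, as “the process exited” and skip the entry.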
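
Hint 3 might be implemented along these lines in Python. The helper name read_proc_file is hypothetical, and the sketch assumes that a read on a vanished process's file fails with ESRCH, which Python reports as ProcessLookupError.

```python
def read_proc_file(path):
    """Return the file's bytes, or None if the process is gone or unreadable."""
    try:
        with open(path, "rb") as f:
            return f.read()
    except (FileNotFoundError, ProcessLookupError):
        # ENOENT on open or ESRCH on read: the process exited mid-scan.
        return None
    except PermissionError:
        # EACCES: the entry exists but this user may not read it.
        return None
```

The caller then skips any PID for which a required file came back as None, instead of aborting the whole scan.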

5.9 Books That Will Help

Topic   Book                                        Chapter
/proc   “The Linux Programming Interface” (TLPI)    Ch. 10, 12

5.10 Implementation Phases

Phase 1: Foundation (1-2 days)

Goals:

  • List PIDs and read cmdline.

Tasks:

  1. Scan /proc.
  2. Print PID + cmdline.

Checkpoint: PIDs and command lines agree with ps -e for long-running processes.

Phase 2: Core Functionality (2-3 days)

Goals:

  • Add state and memory.

Tasks:

  1. Parse status or stat.
  2. Add UID -> name.

Checkpoint: The state, user, and memory columns agree with ps for the same PIDs.

Phase 3: Polish & Edge Cases (1-2 days)

Goals:

  • Add CPU% sampling.

Tasks:

  1. Sample twice.
  2. Compute and display CPU%.

Checkpoint: CPU% is near zero for idle processes and close to the value top shows for a busy one.

5.11 Key Implementation Decisions

Decision   Options                 Recommendation   Rationale
Source     stat vs status          status           Labeled fields are easier to parse; utime and stime still come from stat
CPU%       single vs two samples   two samples      Measures current usage rather than a lifetime average
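
To show why status is the easier source, here is a minimal Python sketch that parses it. The function names parse_status and summarize are hypothetical; the field names State, Uid, and VmRSS are those documented in proc(5).

```python
def parse_status(text):
    """Turn the 'Key:<tab>value' lines of /proc/<pid>/status into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value.strip()
    return fields


def summarize(fields):
    """Pick out the columns this project needs."""
    return {
        "state": fields["State"].split()[0],   # "S (sleeping)" -> "S"
        "uid": int(fields["Uid"].split()[0]),  # first value is the real UID
        # VmRSS is absent for kernel threads, so default to 0 kB.
        "rss_kb": int(fields.get("VmRSS", "0 kB").split()[0]),
    }
```

Looking fields up by name means the parser keeps working when newer kernels add lines to the file.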

6. Testing Strategy

6.1 Test Categories

Category      Purpose           Examples
Parsing       Validate fields   Compare with ps
UID mapping   User names        getpwuid
CPU%          Sampling          Compare with top

6.2 Critical Test Cases

  1. Processes that exit mid-read are skipped.
  2. cmdline prints with its arguments separated by spaces.
  3. CPU% for an idle process is near zero.

6.3 Test Data

PID: self

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall                   Symptom        Solution
Parsing cmdline as text   Missing args   Split on (or replace) the null bytes
Wrong field index         Bad state      Count stat fields after the last ')', since comm may contain spaces
Permission errors         Crash          Catch EACCES and skip the entry
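
The field-index pitfall can be avoided with a parser like this Python sketch. The function name parse_stat and the choice of returned fields are illustrative; the field numbers in the comments follow proc(5).

```python
def parse_stat(line):
    """Extract a few fields from the single line of /proc/<pid>/stat."""
    # comm (field 2) is wrapped in parentheses and may itself contain spaces
    # or ')', so split around the first '(' and the LAST ')'.
    head, _, rest = line.partition("(")
    comm, _, tail = rest.rpartition(")")
    fields = tail.split()  # fields[0] is field 3 (state) in proc(5) numbering
    return {
        "pid": int(head),
        "comm": comm,
        "state": fields[0],
        "ppid": int(fields[1]),
        "utime": int(fields[11]),      # field 14, in clock ticks
        "stime": int(fields[12]),      # field 15, in clock ticks
        "rss_pages": int(fields[21]),  # field 24, in pages
    }
```

A naive line.split() would shift every index for a process named, say, "my (odd) name", which is exactly the "bad state" symptom in the table.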

7.2 Debugging Strategies

  • Print raw stat line for a known PID.
  • Compare with ps -o pid,stat,cmd.

7.3 Performance Traps

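Every scan opens and reads several files per process, so scanning /proc in a tight loop is expensive. Refresh about once per second or slower; top, for comparison, defaults to a 3-second delay.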


8. Extensions & Challenges

8.1 Beginner Extensions

  • Add filtering by user.
  • Add sorting by PID.

8.2 Intermediate Extensions

  • Add RSS/VSZ columns.
  • Add process tree view.
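
A process tree view could be built from parent PIDs as in this Python sketch. The function name render_tree and the input shape (a dict of pid -> (ppid, name)) are assumptions for illustration. Starting at PID 1 leaves out kernel threads, which descend from PID 2.

```python
from collections import defaultdict


def render_tree(procs, root=1):
    """procs maps pid -> (ppid, name); return indented lines starting at root."""
    children = defaultdict(list)
    for pid, (ppid, _name) in procs.items():
        children[ppid].append(pid)

    lines = []

    def walk(pid, depth):
        lines.append("  " * depth + f"{pid} {procs[pid][1]}")
        for child in sorted(children[pid]):
            walk(child, depth + 1)

    walk(root, 0)
    return lines


if __name__ == "__main__":
    demo = {1: (0, "init"), 10: (1, "sshd"), 11: (10, "bash"), 5: (1, "cron")}
    print("\n".join(render_tree(demo)))
```

The parent PID is field 4 of /proc/<pid>/stat, or the PPid line of /proc/<pid>/status.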

8.3 Advanced Extensions

  • Add JSON output for monitoring integration.
  • Add thread counts from the Threads field of /proc/<pid>/status.
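
For the JSON extension, one possible shape is a JSON object per line, sketched here in Python with the standard json module. The function name to_json and the key names are hypothetical; a monitoring system may expect a different schema.

```python
import json


def to_json(rows):
    """Serialize a list of per-process dicts, one JSON object per line."""
    # sort_keys keeps the output stable so it can be diffed between runs.
    return "\n".join(json.dumps(r, sort_keys=True) for r in rows)


if __name__ == "__main__":
    print(to_json([{"pid": 1, "user": "root", "state": "S", "cmd": "/sbin/init"}]))
```

One object per line lets a consumer process records as a stream without loading the whole scan.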

9. Real-World Connections

9.1 Industry Applications

  • Lightweight monitoring agents and diagnostics.

9.2 Related Open Source Projects

  • procps: https://gitlab.com/procps-ng/procps

9.3 Interview Relevance

  • /proc parsing and process states are common topics.

10. Resources

10.1 Essential Reading

  • proc(5) - man 5 proc

10.2 Video Resources

  • /proc tutorials (search “Linux /proc”)

10.3 Tools & Documentation

  • ps(1) - man 1 ps

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain /proc.
  • I can explain process states.
  • I can compute CPU% from jiffies.

11.2 Implementation

  • Tool lists processes correctly.
  • cmdline displays properly.
  • CPU% looks reasonable.

11.3 Growth

  • I can add more fields easily.
  • I can compare output with top/ps.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • List PID, user, state, and cmdline.

Full Completion:

  • Add memory and CPU% columns.

Excellence (Going Above & Beyond):

  • Add process tree and JSON output.

This guide was generated from LEARN_LINUX_UNIX_INTERNALS_DEEP_DIVE.md. For the complete learning path, see the parent directory.