Project 8: Process Genealogist

Trace a PID’s full ancestry and descendants using /proc and ps.

Quick Reference

Attribute      Value
Difficulty     Intermediate
Time Estimate  1 week
Language       Python (Alternatives: Go, Rust, C)
Prerequisites  Project 1, fork/exec basics
Key Topics     PPID chains, sessions, process groups

1. Learning Objectives

By completing this project, you will:

  1. Build parent/child trees from /proc.
  2. Trace a process back to PID 1.
  3. Inspect sessions, process groups, and controlling terminals.
  4. List a process’s open file descriptors and cwd.

2. Theoretical Foundation

2.1 Core Concepts

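  • fork/exec: A process is created by fork, which duplicates the parent; the child then usually calls exec to replace its program image.
  • Sessions and groups: Job control depends on session (SID) and process group (PGID) relationships.
  • Orphans and reparenting: When a parent exits, its children are adopted by PID 1 or, on Linux, by the nearest ancestor marked as a child subreaper.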
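
To make the fork/exec pairing concrete, here is a minimal sketch using Python's os module. The helper name spawn_and_wait is hypothetical; it only illustrates the pattern and is not part of the tool you will build.

```python
import os
import sys


def spawn_and_wait(argv):
    """Fork, exec argv in the child, and return the child's exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: fork returned 0 here. exec replaces the program image
        # but keeps the PID, PPID, open fds, and cwd inherited from the parent.
        try:
            os.execvp(argv[0], argv)
        except OSError:
            os._exit(127)  # exec failed; never fall back into parent code
    # Parent: fork returned the child's PID. Reap it to avoid a zombie.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)  # negative value: killed by that signal


def demo():
    """Run a child that prints its PPID, which should equal our own PID."""
    return spawn_and_wait(
        [sys.executable, "-c", "import os; print(os.getppid())"]
    )
```

Calling demo() from a script should print the script's own PID, because the child's PPID is the process that forked it.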

2.2 Why This Matters

Understanding lineage explains why processes have certain environments, fds, and terminals.

2.3 Historical Context / Background

Unix has tracked parent PIDs since its early versions; process groups and sessions were added later to support job control. On Linux, /proc exposes these fields directly.

2.4 Common Misconceptions

  • “PPID is always stable”: It changes when the parent exits and the child is reparented.
  • “pstree is magic”: On Linux, it just reads /proc.

3. Project Specification

3.1 What You Will Build

A CLI tool that prints a PID’s ancestry tree, descendant tree, and session details, along with open file descriptors.

3.2 Functional Requirements

  1. Resolve a PID’s parent chain to PID 1.
  2. Build a list of descendants.
  3. Show PGID, SID, and controlling terminal.
  4. List open file descriptors and cwd.
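
Requirement 3 can be met from a single file: per proc(5), /proc/<pid>/stat carries ppid, pgrp, session, and tty_nr as fields 4 through 7. The sketch below is one possible parser; StatInfo, parse_stat, and decode_tty are hypothetical names, and the bit layout for tty_nr follows the description in proc(5).

```python
from collections import namedtuple

StatInfo = namedtuple("StatInfo", "pid comm state ppid pgrp session tty_nr")


def parse_stat(text):
    """Parse the leading fields of a /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the LAST ')' rather than on whitespace.
    head, _, tail = text.rpartition(")")
    pid_str, _, comm = head.partition(" (")
    fields = tail.split()
    # fields[0] is state; ppid, pgrp, session, and tty_nr follow in order.
    return StatInfo(int(pid_str), comm, fields[0], int(fields[1]),
                    int(fields[2]), int(fields[3]), int(fields[4]))


def decode_tty(tty_nr):
    """Split tty_nr into (major, minor): major is bits 15-8,
    minor combines bits 31-20 and 7-0."""
    major = (tty_nr >> 8) & 0xFF
    minor = (tty_nr & 0xFF) | ((tty_nr >> 12) & 0xFFF00)
    return major, minor
```

A tty_nr of 0 generally means the process has no controlling terminal. A process is a session leader when its PID equals its session value, and a group leader when its PID equals pgrp.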

3.3 Non-Functional Requirements

  • Performance: Complete a full /proc scan in one pass, reading each process’s files only once.
  • Reliability: Tolerate processes that exit mid-scan without crashing.
  • Usability: Clear, tree-like output.
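
The reliability requirement mostly comes down to treating a vanished /proc entry as normal. A minimal sketch, assuming the PPid line of /proc/<pid>/status as the data source; read_ppid, scan_ppids, and the proc_root parameter (added so the scan can be pointed at a fake directory in tests) are hypothetical names.

```python
import os


def read_ppid(pid, proc_root="/proc"):
    """Return the PPid of pid, or None if the process vanished or is unreadable."""
    try:
        with open(os.path.join(proc_root, str(pid), "status")) as f:
            for line in f:
                if line.startswith("PPid:"):
                    return int(line.split()[1])
    except (FileNotFoundError, ProcessLookupError, PermissionError):
        # The process exited between listdir() and open(), or access was denied.
        return None
    return None


def scan_ppids(proc_root="/proc"):
    """Build {pid: ppid} in a single pass over the numeric entries of proc_root."""
    ppid = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'meminfo' or 'self'
        parent = read_ppid(int(entry), proc_root)
        if parent is not None:
            ppid[int(entry)] = parent
    return ppid
```

Skipping a PID that disappears is a design choice: the scan is a snapshot, so a process that exited mid-scan simply was not part of it.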

3.4 Example Usage / Output

$ ./process-genealogist 1234
systemd(1) -> sshd(521) -> bash(1199) -> python3(1234)

3.5 Real World Outcome

You will run the tool and see the full lineage and open resources for a process. Example:

$ ./process-genealogist 1234
systemd(1) -> sshd(521) -> bash(1199) -> python3(1234)

4. Solution Architecture

4.1 High-Level Design

scan /proc -> build PID->PPID map -> trace ancestry -> find descendants -> report

4.2 Key Components

Component     Responsibility       Key Decisions
Scanner       Collect stat/status  Use /proc numeric dirs
Tree builder  Build child lists    Map PPID -> children
Reporter      Render ancestry      Arrow or tree view

4.3 Data Structures

ppid = {pid: ppid}
children = {pid: [child1, child2]}
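
The children map can be derived from the ppid map by inverting it. A minimal sketch (build_children is a hypothetical name):

```python
from collections import defaultdict


def build_children(ppid):
    """Invert a {pid: ppid} map into {parent: [child, ...]} with sorted children."""
    children = defaultdict(list)
    for pid, parent in ppid.items():
        children[parent].append(pid)
    for kids in children.values():
        kids.sort()  # stable, predictable output order
    return dict(children)
```

Processes without children do not appear as keys, so look them up with children.get(pid, []).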

4.4 Algorithm Overview

Key Algorithm: Ancestry Chain

  1. Start with the target PID.
  2. Repeatedly look up the PPID until you reach PID 1 (or a PPID of 0, as with kernel threads under PID 2).
  3. Reverse the list for display.

Complexity Analysis:

  • Time: O(n) scan + O(depth) traversal
  • Space: O(n)
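
The ancestry steps above can be sketched as follows. The function names and the names dictionary (pid to command name) are hypothetical; the seen set is a defensive guard against a cycle in an inconsistent snapshot.

```python
def ancestry(pid, ppid):
    """Return the chain from the oldest known ancestor down to pid."""
    chain = []
    seen = set()
    current = pid
    # Stop when the PID is unknown (for example PPID 0) or already visited.
    while current in ppid and current not in seen:
        chain.append(current)
        seen.add(current)
        if current == 1:
            break
        current = ppid[current]
    chain.reverse()
    return chain


def format_chain(chain, names):
    """Render a chain as 'name(pid) -> name(pid)'."""
    return " -> ".join(f"{names.get(p, '?')}({p})" for p in chain)
```

For example, with ppid = {1: 0, 521: 1, 1199: 521, 1234: 1199}, ancestry(1234, ppid) returns [1, 521, 1199, 1234].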

5. Implementation Guide

5.1 Development Environment Setup

python3 --version

5.2 Project Structure

project-root/
├── process_genealogist.py
└── README.md

5.3 The Core Question You’re Answering

“Where did this process come from, and what did it inherit?”

5.4 Concepts You Must Understand First

Stop and research these before coding:

  1. PPID and reparenting
  2. Sessions and process groups
  3. /proc fields in stat and status

5.5 Questions to Guide Your Design

Before implementing, think through these:

  1. How do you handle processes that exit during scanning?
  2. How will you display ancestry and descendants separately?
  3. What resource info is most valuable to show?

5.6 Thinking Exercise

Trace your shell

Use /proc/$$/status and follow PPID until PID 1.
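
Once you have done the exercise by hand, the same walk can be sketched in Python, reading status live at each step instead of scanning everything first. status_fields, walk_to_init, and the proc_root parameter are hypothetical names.

```python
import os


def status_fields(pid, proc_root="/proc"):
    """Parse /proc/<pid>/status into a {key: value} dict of strings."""
    fields = {}
    with open(os.path.join(proc_root, str(pid), "status")) as f:
        for line in f:
            key, sep, value = line.partition(":")
            if sep:
                fields[key] = value.strip()
    return fields


def walk_to_init(pid, proc_root="/proc"):
    """Yield (pid, name) pairs from pid upward until PID 1 or a PPid of 0."""
    while pid > 0:
        try:
            st = status_fields(pid, proc_root)
        except (FileNotFoundError, ProcessLookupError):
            return  # the process exited mid-walk
        yield pid, st.get("Name", "?")
        if pid == 1:
            return
        pid = int(st["PPid"])
```

Passing os.getpid() as the starting PID is the Python counterpart of using $$ in the shell.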

5.7 The Interview Questions They’ll Ask

Prepare to answer these:

  1. “What happens to children when a parent exits?”
  2. “What is a session leader?”
  3. “How do you find which terminal owns a process?”

5.8 Hints in Layers

Hint 1: Start with status. /proc/<pid>/status exposes PPID, UID, and state.

Hint 2: Build child lists. Scan all PIDs and group by PPID.

Hint 3: Use /proc/<pid>/fd. Each symlink points to an open file, socket, or pipe.
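
For the fd hint, os.readlink resolves each entry in the fd directory, and the same call works for the cwd symlink. A minimal sketch; list_fds, read_cwd, and the proc_root parameter are hypothetical names.

```python
import os


def list_fds(pid, proc_root="/proc"):
    """Return {fd_number: target} for pid, or {} if it is gone or unreadable."""
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    fds = {}
    try:
        names = os.listdir(fd_dir)
    except (FileNotFoundError, PermissionError):
        return fds
    for name in names:
        try:
            fds[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except (FileNotFoundError, PermissionError, ValueError):
            continue  # fd closed since listdir(), unreadable, or non-numeric
    return fds


def read_cwd(pid, proc_root="/proc"):
    """Return the current working directory of pid, or None if unavailable."""
    try:
        return os.readlink(os.path.join(proc_root, str(pid), "cwd"))
    except (FileNotFoundError, PermissionError):
        return None
```

Targets are not always paths: sockets and pipes show up as strings such as socket:[12345]. Reading another user's fd directory usually requires elevated privileges, which is why permission errors return an empty result here.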

5.9 Books That Will Help

Topic             Book               Chapter
Process creation  “TLPI”             Ch. 24-28
Sessions/groups   “APUE”             Ch. 9
/proc             “How Linux Works”  Ch. 8

5.10 Implementation Phases

Phase 1: Foundation (2 days)

Goals:

  • Parse PID->PPID mapping.

Tasks:

  1. Scan /proc.
  2. Build map.

Checkpoint: Map matches ps -o pid,ppid.
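
One way to automate this checkpoint is to parse ps output into the same {pid: ppid} shape and diff the two maps. This sketch assumes a ps that accepts -e and -o pid=,ppid= (the trailing = suppresses the header); parse_ps, diff_maps, and ps_ppids are hypothetical names.

```python
import subprocess


def parse_ps(text):
    """Parse 'ps -e -o pid=,ppid=' output into {pid: ppid}."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].isdigit() and parts[1].isdigit():
            result[int(parts[0])] = int(parts[1])
    return result


def diff_maps(ours, theirs):
    """Return {pid: (our_ppid, their_ppid)} for PIDs in both maps that disagree."""
    return {pid: (ours[pid], theirs[pid])
            for pid in ours.keys() & theirs.keys()
            if ours[pid] != theirs[pid]}


def ps_ppids():
    """Run ps and return its view of the PID -> PPID map."""
    out = subprocess.run(["ps", "-e", "-o", "pid=,ppid="],
                         capture_output=True, text=True, check=True).stdout
    return parse_ps(out)
```

Only PIDs present in both maps are compared, since processes (including ps itself) start and exit between the two scans.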

Phase 2: Core Functionality (3 days)

Goals:

  • Show ancestry and descendants.

Tasks:

  1. Trace ancestry chain.
  2. Build descendant tree.

Checkpoint: Output matches pstree.
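
A minimal sketch of the Phase 2 descendant work, assuming a children map of the form {pid: [child, ...]} and a names dictionary (both hypothetical inputs built during the scan):

```python
def descendants(pid, children):
    """Return all descendants of pid in depth-first order (pid itself excluded)."""
    found = []
    stack = list(reversed(children.get(pid, [])))
    while stack:
        current = stack.pop()
        found.append(current)
        # Push children in reverse so the lowest PID is visited first.
        stack.extend(reversed(children.get(current, [])))
    return found


def render_tree(pid, children, names, depth=0):
    """Return indented lines for the subtree rooted at pid."""
    lines = ["  " * depth + f"{names.get(pid, '?')}({pid})"]
    for child in children.get(pid, []):
        lines.extend(render_tree(child, children, names, depth + 1))
    return lines
```

When comparing against pstree, use pstree -p and compare structure rather than exact text, since pstree uses its own drawing characters and may merge identical branches.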

Phase 3: Polish & Edge Cases (2 days)

Goals:

  • Add session and resource info.

Tasks:

  1. Read SID/PGID.
  2. List open files and cwd.

Checkpoint: Output shows expected fds and cwd.

5.11 Key Implementation Decisions

Decision      Options       Recommendation  Rationale
Data source   ps vs /proc   /proc           More detail
Output style  tree vs list  tree            Readability

6. Testing Strategy

6.1 Test Categories

Category   Purpose            Examples
Mapping    Validate PPIDs     Compare with ps
Tree       Validate children  Compare with pstree
Resources  Validate fds       Compare with lsof

6.2 Critical Test Cases

  1. PID with no children shows empty descendants.
  2. Orphaned process shows PID 1 (or a subreaper) as its parent.
  3. Process exits during scan without crash.

6.3 Test Data

PID chain: 1 -> 521 -> 1199 -> 1234

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall          Symptom      Solution
Missing fields   Wrong PPID   Use /proc/<pid>/stat
Races            KeyError     Catch file-not-found
Parsing cmdline  Empty names  Use /proc/<pid>/comm fallback
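
The cmdline pitfall arises because /proc/<pid>/cmdline is NUL-separated and is empty for kernel threads and zombies. A minimal sketch of the comm fallback; process_name, _read, and the proc_root parameter are hypothetical names.

```python
import os


def _read(path):
    """Read a file as bytes, returning b'' if the process is gone or unreadable."""
    try:
        with open(path, "rb") as f:
            return f.read()
    except (FileNotFoundError, ProcessLookupError, PermissionError):
        return b""


def process_name(pid, proc_root="/proc"):
    """Prefer the basename of argv[0] from cmdline; fall back to comm."""
    base = os.path.join(proc_root, str(pid))
    raw = _read(os.path.join(base, "cmdline"))
    # Arguments are separated by NUL bytes, with a trailing NUL at the end.
    args = [a for a in raw.split(b"\0") if a]
    if args:
        return os.path.basename(args[0].decode(errors="replace"))
    comm = _read(os.path.join(base, "comm")).decode(errors="replace").strip()
    return comm or "?"
```

Note that a process can overwrite its own argv, so the cmdline-derived name is whatever the process chooses to show.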

7.2 Debugging Strategies

  • Print ancestry chain step-by-step.
  • Compare with pstree -p output.

7.3 Performance Traps

Re-reading /proc for every lookup is slow; scan once per run and answer all queries from the in-memory maps.


8. Extensions & Challenges

8.1 Beginner Extensions

  • Add user name lookup.
  • Add command-line argument display.

8.2 Intermediate Extensions

  • Add process group tree.
  • Show environment variables count.

8.3 Advanced Extensions

  • Visualize output as Graphviz.
  • Add filtering by user or session.
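
As a starting point for the Graphviz extension, the ppid map translates directly into DOT: one node per process, one edge per parent/child pair. A minimal sketch (to_dot and the names dictionary are hypothetical):

```python
def to_dot(ppid, names):
    """Render a {pid: ppid} map as a Graphviz DOT digraph string."""
    lines = ["digraph processes {"]
    for pid in sorted(ppid):
        # Escape backslashes and quotes so the label stays a valid DOT string.
        label = names.get(pid, "?").replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'  {pid} [label="{label}({pid})"];')
    for pid in sorted(ppid):
        if ppid[pid] in ppid:  # skip edges to parents outside the snapshot
            lines.append(f"  {ppid[pid]} -> {pid};")
    lines.append("}")
    return "\n".join(lines)
```

Write the returned string to a file and render it with the dot command, for example dot -Tsvg.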

9. Real-World Connections

9.1 Industry Applications

  • Debugging runaway process trees and orphaned jobs.

9.2 Related Open Source Projects

  • pstree (part of psmisc): https://gitlab.com/psmisc/psmisc
  • lsof: https://github.com/lsof-org/lsof

9.3 Interview Relevance

  • Process lineage and job control are core Unix topics.

10. Resources

10.1 Essential Reading

  • proc(5) - man 5 proc
  • ps(1) - man 1 ps

10.2 Video Resources

  • Process tree explanations (search “Unix process tree”)

10.3 Tools & Documentation

  • pstree(1) - man 1 pstree

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain PPID chains.
  • I can explain sessions and process groups.
  • I can interpret /proc fields.

11.2 Implementation

  • Ancestry output is correct.
  • Descendants are correct.
  • Resource listing works.

11.3 Growth

  • I can use this to debug real services.
  • I can extend output formats.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Print a PID’s ancestry chain.

Full Completion:

  • Include descendant tree and session details.

Excellence (Going Above & Beyond):

  • Export Graphviz or JSON for visualization.

This guide was generated from LINUX_SYSTEM_TOOLS_MASTERY.md. For the complete learning path, see the parent directory.