Project 9: Zombie Hunter

Create and detect zombie processes, then build a cleanup advisory tool.

Quick Reference

Attribute      Value
Difficulty     Advanced
Time Estimate  1 week
Language       C (Alternatives: Rust, Python, Go)
Prerequisites  Projects 4 and 8
Key Topics     wait(), SIGCHLD, process states

1. Learning Objectives

By completing this project, you will:

  1. Create zombies intentionally and observe their state.
  2. Detect zombies using /proc and ps.
  3. Identify parent processes responsible for reaping.
  4. Provide remediation steps for zombie cleanup.

2. Theoretical Foundation

2.1 Core Concepts

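  • Zombie state: A child that has terminated but whose exit status the parent has not yet collected. It shows state Z and holds only a process table entry.
  • wait() family: wait(), waitpid(), and waitid() are the only proper way to collect a child's exit status and release its process table entry.
  • SIGCHLD: Signal sent to the parent when a child terminates, stops, or resumes.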
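
The three concepts above meet in the classic reaping handler. The following is a minimal sketch, not a definitive implementation: the names reaped_children, install_sigchld_reaper, and spawn_short_lived_child are hypothetical and chosen only for illustration. It loops over waitpid() with WNOHANG because standard signals are not queued, so one SIGCHLD delivery may stand for several exited children.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical counter for illustration: children reaped so far. */
static volatile sig_atomic_t reaped_children = 0;

/* SIGCHLD handler: reap every child that has already exited. */
static void on_sigchld(int signo)
{
    (void)signo;
    int saved_errno = errno;            /* waitpid() may overwrite errno */
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped_children++;
    errno = saved_errno;
}

/* Install the handler. Returns 0 on success, -1 on error. */
int install_sigchld_reaper(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    /* SA_NOCLDSTOP: only report exits; SA_RESTART: restart interrupted calls. */
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Fork a child that exits at once; the handler above reaps it. */
pid_t spawn_short_lived_child(void)
{
    pid_t pid = fork();
    if (pid == 0)
        _exit(0);
    return pid;                         /* child PID, or -1 on error */
}
```

With the handler installed, a later waitpid() on the same child fails with ECHILD, because the handler has already collected it.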

2.2 Why This Matters

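A single zombie uses almost no memory, but each one holds a PID and a process table entry. Accumulating zombies indicate a parent that is not reaping its children, and over time they can exhaust the PID space so that fork() starts to fail.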

2.3 Historical Context / Background

Unix keeps a short-lived process table entry for dead children to allow parents to read exit status.

2.4 Common Misconceptions

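  • “You can kill a zombie”: A zombie has already terminated, so signals (including SIGKILL) have no effect on it. It disappears only when its parent reaps it, or when the parent exits and the zombie is re-parented to init (or a subreaper), which then reaps it.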
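
The misconception can be checked with a short experiment. The sketch below assumes Linux behavior, where kill() on a zombie succeeds but changes nothing; the function name exit_code_after_sigkill is hypothetical. It uses waitid() with WNOWAIT to wait until the child is a zombie without reaping it, sends SIGKILL, and then shows that waitpid() still reports the original exit code.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with `code`, wait until it is a zombie, send it
 * SIGKILL, then reap it. Returns the exit code reported by waitpid(), or
 * -1 on error. If SIGKILL affected the zombie, the child would be reported
 * as killed by a signal and this function would return -1. */
int exit_code_after_sigkill(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);

    /* WNOWAIT blocks until the child has exited but leaves it unreaped,
     * so from here on the child is a zombie. */
    siginfo_t info;
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) != 0)
        return -1;

    /* The signal is accepted (the PID still exists) but has no effect. */
    if (kill(pid, SIGKILL) != 0)
        return -1;

    /* Reaping is what actually removes the zombie. */
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```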

3. Project Specification

3.1 What You Will Build

A small suite: a zombie creator, a scanner that lists zombies, and a report that points to the parent process and suggested fix.

3.2 Functional Requirements

  1. Create a zombie with a parent that never calls wait().
  2. Scan /proc for processes with state Z.
  3. Identify and report parent process and age.

3.3 Non-Functional Requirements

  • Safety: Keep test zombies in a controlled environment.
  • Reliability: Handle races where zombies disappear.
  • Usability: Provide clear remediation notes.

3.4 Example Usage / Output

$ ./zombie-hunter --scan
PID  PPID  AGE   CMD
1234 1200  5m    zombie-child

3.5 Real World Outcome

You will list zombies and see which parents are responsible. The output matches the example in Section 3.4.

4. Solution Architecture

4.1 High-Level Design

create zombie -> scan /proc -> detect state Z -> map to parent -> report

4.2 Key Components

Component       Responsibility         Key Decisions
Zombie creator  Fork and exit child    Parent sleeps
Scanner         Read /proc/<pid>/stat  Check state field
Reporter        Map PPID to cmd        Suggest remediation

4.3 Data Structures

struct ZombieInfo { pid_t pid; pid_t ppid; time_t start; };

4.4 Algorithm Overview

Key Algorithm: Zombie Scan

  1. Iterate /proc/*/stat.
  2. Parse the state field (field 3, which follows the closing parenthesis of the command name).
  3. If state == 'Z', record PID/PPID.

Complexity Analysis:

  • Time: O(n) for n processes
  • Space: O(z) zombies
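
The parse step is the part that most often goes wrong, because the command name in field 2 is wrapped in parentheses and may itself contain spaces or a ')'. The sketch below is one possible approach under that assumption: it locates the later fields from the last ')' in the line. The struct StatFields and the function parse_stat_line are hypothetical names, separate from the ZombieInfo struct in Section 4.3.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical holder for the first four fields of a stat line. */
struct StatFields {
    pid_t pid;
    char  comm[64];
    char  state;
    pid_t ppid;
};

/* Parse pid, comm, state and ppid from one /proc/<pid>/stat line.
 * Returns 0 on success, -1 on malformed input. */
int parse_stat_line(const char *line, struct StatFields *out)
{
    const char *lp = strchr(line, '(');    /* start of comm */
    const char *rp = strrchr(line, ')');   /* LAST ')' ends comm */
    if (!lp || !rp || rp < lp)
        return -1;

    long pid, ppid;
    char state;
    if (sscanf(line, "%ld", &pid) != 1)
        return -1;
    /* Fields 3 (state) and 4 (ppid) follow the closing parenthesis. */
    if (sscanf(rp + 1, " %c %ld", &state, &ppid) != 2)
        return -1;

    /* Copy comm without the parentheses, truncating if it is too long. */
    size_t n = (size_t)(rp - lp - 1);
    if (n >= sizeof out->comm)
        n = sizeof out->comm - 1;
    memcpy(out->comm, lp + 1, n);
    out->comm[n] = '\0';

    out->pid = (pid_t)pid;
    out->state = state;
    out->ppid = (pid_t)ppid;
    return 0;
}
```

A process is a zombie when the parsed state equals 'Z'; the PPID from the same line identifies the parent to report.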

5. Implementation Guide

5.1 Development Environment Setup

gcc --version

5.2 Project Structure

project-root/
├── zombie_creator.c
├── zombie_hunter.c
└── README.md

5.3 The Core Question You’re Answering

“Why do zombie processes exist, and how do you get rid of them safely?”

5.4 Concepts You Must Understand First

Stop and research these before coding:

  1. Process exit lifecycle
  2. wait()/waitpid() semantics
  3. SIGCHLD handling

5.5 Questions to Guide Your Design

Before implementing, think through these:

  1. How will you compute zombie age?
  2. What should the tool recommend if parent is PID 1?
  3. How do you avoid false positives?

5.6 Thinking Exercise

Manual zombie

Create a zombie using fork and confirm it with ps -o pid,ppid,stat,cmd.
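
A minimal sketch of the exercise follows. The function names make_zombie and run_zombie_creator are hypothetical, and the use of waitid() with WNOWAIT is an added convenience rather than a requirement: it makes the function return only once the child is really in state Z, without reaping it.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits immediately and deliberately do not reap it.
 * Returns the child's PID once it is a zombie, or -1 on error. */
pid_t make_zombie(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(0);                       /* child: exit at once */

    /* Wait for the exit without collecting it, so the PID stays in state Z. */
    siginfo_t info;
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) != 0)
        return -1;
    return pid;
}

/* Create one zombie, keep it for hold_seconds, then reap it.
 * Returns 0 on success, 1 on failure. */
int run_zombie_creator(unsigned hold_seconds)
{
    pid_t z = make_zombie();
    if (z < 0) {
        perror("make_zombie");
        return 1;
    }
    printf("zombie pid=%ld parent pid=%ld\n", (long)z, (long)getpid());
    fflush(stdout);
    sleep(hold_seconds);                /* inspect with ps in this window */
    return waitpid(z, NULL, 0) == z ? 0 : 1;   /* clean up the zombie */
}
```

Calling run_zombie_creator(60) from main() gives a one-minute window in which ps -o pid,ppid,stat,cmd should show the child with a Z in the STAT column.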

5.7 The Interview Questions They’ll Ask

Prepare to answer these:

  1. “Why can’t you kill a zombie process?”
  2. “How do you prevent zombies in your code?”
  3. “What does SIGCHLD do?”
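
For the second question, one answer besides calling wait() is to tell the kernel that exit statuses are not needed. The sketch below assumes that trade-off is acceptable and uses the SA_NOCLDWAIT flag of sigaction(); the function names are hypothetical. With the flag set, terminated children do not become zombies, and waitpid() blocks until the child is gone and then fails with ECHILD.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Ask the kernel not to keep zombies for this process's children.
 * The exit status of those children is lost. Returns 0 on success. */
int disable_child_zombies(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = SIG_DFL;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_NOCLDWAIT;
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Fork a child that exits, then try to reap it.
 * Returns 1 if there was nothing to reap (no zombie was kept),
 * 0 if the child could be reaped, -1 if fork() failed. */
int child_leaves_no_zombie(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(5);

    int status;
    if (waitpid(pid, &status, 0) == -1 && errno == ECHILD)
        return 1;
    return 0;
}
```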

5.8 Hints in Layers

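Hint 1 (Minimal zombie): The child exits immediately; the parent sleeps forever.

Hint 2 (State field): State is the 3rd field in /proc/<pid>/stat. The 2nd field (the command name in parentheses) can contain spaces, so locate the state after the last ')'.

Hint 3 (Cleanup): Terminate the parent so the zombie is re-parented and reaped, or fix the parent to call wait().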

5.9 Books That Will Help

Topic                Book     Chapter
Process termination  “TLPI”   Ch. 25-26
SIGCHLD              “APUE”   Ch. 10
Process states       “OSTEP”  Ch. 5

5.10 Implementation Phases

Phase 1: Foundation (2 days)

Goals:

  • Create a reliable zombie.

Tasks:

  1. Fork and exit child.
  2. Keep parent alive.

Checkpoint: ps shows Z state.

Phase 2: Core Functionality (3 days)

Goals:

  • Build zombie scanner.

Tasks:

  1. Scan /proc.
  2. Identify Z states.

Checkpoint: Scanner lists the test zombie.
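
The Phase 2 tasks can be sketched as a single directory walk. This is an illustrative outline, not the required design: scan_zombies and is_pid_name are hypothetical names, and the fixed-size output array is an assumption made to keep the example short. A process that exits between readdir() and fopen() is skipped rather than treated as an error.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Return 1 if the name consists only of digits (a PID directory). */
static int is_pid_name(const char *s)
{
    if (*s == '\0')
        return 0;
    for (; *s; s++)
        if (!isdigit((unsigned char)*s))
            return 0;
    return 1;
}

/* Walk /proc and store up to max zombie PIDs in out[].
 * Returns the number stored, or -1 if /proc cannot be opened. */
int scan_zombies(pid_t *out, int max)
{
    DIR *d = opendir("/proc");
    if (!d)
        return -1;

    int count = 0;
    struct dirent *e;
    while (count < max && (e = readdir(d)) != NULL) {
        if (!is_pid_name(e->d_name))
            continue;

        char path[300], line[1024];
        snprintf(path, sizeof path, "/proc/%s/stat", e->d_name);
        FILE *f = fopen(path, "r");
        if (!f)
            continue;                   /* process vanished: not an error */
        char *ok = fgets(line, sizeof line, f);
        fclose(f);
        if (!ok)
            continue;                   /* vanished while reading */

        /* The state is the first field after the last ')'. */
        char *rp = strrchr(line, ')');
        char state;
        if (!rp || sscanf(rp + 1, " %c", &state) != 1)
            continue;
        if (state == 'Z')
            out[count++] = (pid_t)strtol(e->d_name, NULL, 10);
    }
    closedir(d);
    return count;
}
```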

Phase 3: Polish & Edge Cases (2 days)

Goals:

  • Add remediation suggestions.

Tasks:

  1. Map PPID to command.
  2. Suggest wait() or kill parent.

Checkpoint: Report is actionable.
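
One way to approach the Phase 3 tasks is sketched below. It assumes the parent's name is read from /proc/<pid>/comm, which is one of several possible sources. The function names and the wording of the remediation notes are hypothetical and meant as a starting point, not as the expected output.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Read the command name of pid from /proc/<pid>/comm into buf (len > 0).
 * Returns 0 on success, -1 if the process is gone or unreadable. */
int read_comm(pid_t pid, char *buf, size_t len)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/%ld/comm", (long)pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    char *ok = fgets(buf, (int)len, f);
    fclose(f);
    if (!ok)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';     /* strip the trailing newline */
    return 0;
}

/* Pick a remediation note for a zombie whose parent is ppid.
 * The texts are illustrative wording only. */
const char *remediation_for_parent(pid_t ppid)
{
    if (ppid == 1)
        return "Parent is PID 1, which normally reaps adopted children; "
               "rescan, and if the zombie persists inspect the init process "
               "(in a container it may be an application that never waits).";
    return "Fix the parent to call wait()/waitpid() or handle SIGCHLD, "
           "or restart the parent so the zombie is re-parented and reaped.";
}
```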

5.11 Key Implementation Decisions

Decision    Options                     Recommendation  Rationale
Age source  /proc/<pid>/stat starttime  stat starttime  Stable source
Output      one-line vs table           table           Readability
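
The age calculation from starttime can be sketched as follows. starttime is field 22 of /proc/<pid>/stat and is expressed in clock ticks since boot, so the age in seconds is the system uptime minus starttime divided by the tick rate. Note that starttime records when the process started, not when it died, so the result is the process age and only an upper bound on the time spent as a zombie. All function names here are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Extract starttime (field 22) from a stat line. Returns 0 or -1. */
int stat_starttime_ticks(const char *line, unsigned long long *ticks)
{
    const char *p = strrchr(line, ')');     /* comm may contain spaces */
    if (!p)
        return -1;
    p++;
    /* The first token after ')' is field 3; walk forward to field 22. */
    for (int field = 3; field <= 22; field++) {
        while (*p == ' ')
            p++;
        if (*p == '\0' || *p == '\n')
            return -1;                      /* line too short */
        if (field == 22) {
            char *end;
            *ticks = strtoull(p, &end, 10);
            return end == p ? -1 : 0;
        }
        while (*p != '\0' && *p != ' ')
            p++;
    }
    return -1;
}

/* age = uptime - starttime / ticks_per_second */
double age_seconds(unsigned long long start_ticks, double uptime_seconds,
                   long ticks_per_second)
{
    return uptime_seconds - (double)start_ticks / (double)ticks_per_second;
}

/* Read seconds since boot from /proc/uptime; returns -1.0 on failure. */
double read_uptime_seconds(void)
{
    double up = -1.0;
    FILE *f = fopen("/proc/uptime", "r");
    if (!f)
        return -1.0;
    if (fscanf(f, "%lf", &up) != 1)
        up = -1.0;
    fclose(f);
    return up;
}

/* Combine the pieces for one stat line; returns -1.0 on any failure. */
double process_age_from_stat(const char *line)
{
    unsigned long long ticks;
    long hz = sysconf(_SC_CLK_TCK);
    double up = read_uptime_seconds();
    if (hz <= 0 || up < 0.0 || stat_starttime_ticks(line, &ticks) != 0)
        return -1.0;
    return age_seconds(ticks, up, hz);
}
```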

6. Testing Strategy

6.1 Test Categories

Category         Purpose                            Examples
Zombie creation  Ensure Z state exists              fork/exit test
Detection        Ensure scan works                  Compare with ps
Cleanup          Ensure recommendations are valid   Kill parent

6.2 Critical Test Cases

  1. Zombie detected from test program.
  2. Parent PID listed correctly.
  3. No crash if zombie disappears.

6.3 Test Data

State: Z

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall                        Symptom          Solution
Confusing zombie with stopped  Wrong state      Check field 3 in stat (Z = zombie, T = stopped)
Assuming kill -9 works         Zombie persists  Kill parent or fix wait
Races                          Missing PID      Skip PIDs that vanish mid-scan, then retry scan

7.2 Debugging Strategies

  • Use ps -o pid,ppid,stat,cmd to verify.
  • Print raw stat line for a zombie.

7.3 Performance Traps

Each scan opens one file per process, so rescanning all PIDs in a tight loop wastes CPU for no benefit; use on-demand scans.


8. Extensions & Challenges

8.1 Beginner Extensions

  • Add a summary count per parent.
  • Add JSON output.

8.2 Intermediate Extensions

  • Add a daemon mode to watch for zombies.
  • Integrate with SIGCHLD handler examples.

8.3 Advanced Extensions

  • Integrate with process supervisor from Project 12.
  • Add cross-namespace zombie scanning.

9. Real-World Connections

9.1 Industry Applications

  • Debugging daemon failures and process management bugs.

9.2 Related Open Source Projects

  • ps: https://gitlab.com/procps-ng/procps
  • systemd: https://systemd.io

9.3 Interview Relevance

  • Zombies and wait() behavior are classic Unix interview topics.

10. Resources

10.1 Essential Reading

  • wait(2) - man 2 wait
  • proc(5) - /proc/<pid>/stat

10.2 Video Resources

  • Process lifecycle explanations (search “Linux zombie process”)

10.3 Tools & Documentation

  • ps(1) - man 1 ps

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain why zombies exist.
  • I can explain wait() behavior.
  • I can detect zombies via /proc.

11.2 Implementation

  • Zombie creation works.
  • Scanner reports zombies accurately.
  • Recommendations are clear.

11.3 Growth

  • I can prevent zombies in my own code.
  • I can debug zombie accumulation in real services.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Detect zombies and print PID/PPID.

Full Completion:

  • Provide age and remediation notes.

Excellence (Going Above & Beyond):

  • Add a background watch mode with alerts.

This guide was generated from LINUX_SYSTEM_TOOLS_MASTERY.md. For the complete learning path, see the parent directory.