Project 9: Zombie Hunter
Create and detect zombie processes, then build a cleanup advisory tool.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Advanced |
| Time Estimate | 1 week |
| Language | C (Alternatives: Rust, Python, Go) |
| Prerequisites | Projects 4 and 8 |
| Key Topics | wait(), SIGCHLD, process states |
1. Learning Objectives
By completing this project, you will:
- Create zombies intentionally and observe their state.
- Detect zombies using /proc and ps.
- Identify parent processes responsible for reaping.
- Provide remediation steps for zombie cleanup.
2. Theoretical Foundation
2.1 Core Concepts
- Zombie state: A dead child that has not been reaped by the parent.
- wait() family: The only proper way to collect exit status.
- SIGCHLD: Signal sent to the parent when a child terminates, stops, or resumes.
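The three concepts above can be tied together in a small sketch: a SIGCHLD handler that loops over waitpid() with WNOHANG so that no exited child is left as a zombie. The names `reap_children`, `install_sigchld_reaper`, `reaped_count`, and `spawn_and_reap_demo` are hypothetical and chosen only for illustration; this is one possible pattern, not the only correct one.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical counter so callers can observe how many children were reaped. */
volatile sig_atomic_t reaped_count = 0;

/* SIGCHLD handler: reap every child that has already exited.
 * Standard signals are not queued, so one delivery may stand for several exits. */
void reap_children(int signo)
{
    (void)signo;
    int saved_errno = errno;              /* waitpid() may change errno */
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped_count++;
    errno = saved_errno;
}

/* Install the handler; returns 0 on success, -1 on failure. */
int install_sigchld_reaper(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = reap_children;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;  /* skip stop/continue events */
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Demo: fork one child that exits at once, then wait (up to about 2 s)
 * for the handler to reap it. Returns the number reaped, or -1 on error. */
int spawn_and_reap_demo(void)
{
    reaped_count = 0;
    if (install_sigchld_reaper() != 0)
        return -1;
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0)
        _exit(0);
    struct timespec tick = { 0, 10 * 1000 * 1000 };   /* 10 ms */
    for (int i = 0; i < 200 && reaped_count == 0; i++)
        nanosleep(&tick, NULL);
    return (int)reaped_count;
}
```

The loop matters: if two children exit close together, the parent may see only one SIGCHLD, so a handler that calls waitpid() once can still leak zombies.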
2.2 Why This Matters
Zombie accumulation indicates buggy process management. Each zombie holds a PID and a process table entry until it is reaped, so a parent that never reaps can exhaust PID space over time.
2.3 Historical Context / Background
Unix keeps a short-lived process table entry for dead children to allow parents to read exit status.
2.4 Common Misconceptions
- “You can kill a zombie”: Zombies are already dead; you must fix the parent.
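The misconception can be checked directly. The sketch below, written for Linux with /proc mounted, forks a child that exits, confirms state Z from the `State:` line of /proc/<pid>/status, sends SIGKILL, and checks that the zombie is still there until the parent calls waitpid(). The function names `proc_state` and `sigkill_does_not_reap_demo` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Return the state letter from the "State:" line of /proc/<pid>/status,
 * or 0 if the process no longer exists or the file cannot be read. */
char proc_state(pid_t pid)
{
    char path[64], line[256];
    char state = 0;
    snprintf(path, sizeof path, "/proc/%ld/status", (long)pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return 0;
    while (fgets(line, sizeof line, f)) {
        if (sscanf(line, "State: %c", &state) == 1)
            break;
    }
    fclose(f);
    return state;
}

/* Returns 1 if the zombie survives SIGKILL and disappears only after
 * waitpid(); 0 if the observation differs; -1 on setup failure. */
int sigkill_does_not_reap_demo(void)
{
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0)
        _exit(0);                       /* child dies; parent has not waited */

    struct timespec tick = { 0, 10 * 1000 * 1000 };   /* 10 ms */
    for (int i = 0; i < 200 && proc_state(child) != 'Z'; i++)
        nanosleep(&tick, NULL);
    if (proc_state(child) != 'Z')
        return -1;

    int kill_rc = kill(child, SIGKILL); /* accepted, but nothing is left to kill */
    nanosleep(&tick, NULL);
    int still_zombie = (proc_state(child) == 'Z');

    waitpid(child, NULL, 0);            /* the actual cure: reap the child */
    int gone = (proc_state(child) == 0);

    return kill_rc == 0 && still_zombie && gone;
}
```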
3. Project Specification
3.1 What You Will Build
A small suite: a zombie creator, a scanner that lists zombies, and a report that points to the parent process and suggested fix.
3.2 Functional Requirements
- Create a zombie with a parent that never calls wait().
- Scan /proc for processes with state Z.
- Identify and report parent process and age.
3.3 Non-Functional Requirements
- Safety: Keep test zombies in a controlled environment.
- Reliability: Handle races where zombies disappear.
- Usability: Provide clear remediation notes.
3.4 Example Usage / Output
$ ./zombie-hunter --scan
PID PPID AGE CMD
1234 1200 5m zombie-child
3.5 Real World Outcome
You will list zombies and see which parents are responsible. Example:
$ ./zombie-hunter --scan
PID PPID AGE CMD
1234 1200 5m zombie-child
4. Solution Architecture
4.1 High-Level Design
create zombie -> scan /proc -> detect state Z -> map to parent -> report
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Zombie creator | Fork and exit child | Parent sleeps |
| Scanner | Read /proc/ | Check state field |
| Reporter | Map PPID to cmd | Suggest remediation |
4.3 Data Structures
struct ZombieInfo { pid_t pid; pid_t ppid; time_t start; };
4.4 Algorithm Overview
Key Algorithm: Zombie Scan
- Iterate /proc/*/stat.
- Parse state field.
- If state == 'Z', record PID/PPID.
Complexity Analysis:
- Time: O(n) for n processes
- Space: O(z) zombies
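The scan steps above could be sketched roughly as follows. This is an assumption-laden outline rather than a reference implementation: the function name `scan_zombies` is hypothetical, the `start` field is left at 0 because age is a separate concern, and entries that vanish between readdir() and fopen() are simply skipped.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

struct ZombieInfo { pid_t pid; pid_t ppid; time_t start; };

/* Scan /proc and store up to `max` zombies in `out`.
 * Returns the number stored, or -1 if /proc cannot be opened. */
int scan_zombies(struct ZombieInfo *out, int max)
{
    DIR *proc = opendir("/proc");
    if (!proc)
        return -1;

    int found = 0;
    struct dirent *ent;
    while (found < max && (ent = readdir(proc)) != NULL) {
        if (!isdigit((unsigned char)ent->d_name[0]))
            continue;                   /* not a PID directory */

        char path[300], buf[1024];
        snprintf(path, sizeof path, "/proc/%s/stat", ent->d_name);
        FILE *f = fopen(path, "r");
        if (!f)
            continue;                   /* process exited mid-scan */
        size_t n = fread(buf, 1, sizeof buf - 1, f);
        fclose(f);
        buf[n] = '\0';

        /* The command name may contain spaces or ')', so parse the
         * fields that follow the LAST ')' on the line. */
        char *rp = strrchr(buf, ')');
        char state;
        int ppid;
        if (!rp || sscanf(rp + 1, " %c %d", &state, &ppid) != 2)
            continue;

        if (state == 'Z') {
            out[found].pid = (pid_t)strtol(ent->d_name, NULL, 10);
            out[found].ppid = (pid_t)ppid;
            out[found].start = 0;       /* age is filled in elsewhere */
            found++;
        }
    }
    closedir(proc);
    return found;
}
```

Each process costs one directory entry and one small file read, which matches the O(n) time bound; only zombies are stored, which matches the O(z) space bound.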
5. Implementation Guide
5.1 Development Environment Setup
gcc --version
5.2 Project Structure
project-root/
├── zombie_creator.c
├── zombie_hunter.c
└── README.md
5.3 The Core Question You’re Answering
“Why do zombie processes exist, and how do you get rid of them safely?”
5.4 Concepts You Must Understand First
Stop and research these before coding:
- Process exit lifecycle
- wait()/waitpid() semantics
- SIGCHLD handling
5.5 Questions to Guide Your Design
Before implementing, think through these:
- How will you compute zombie age?
- What should the tool recommend if the parent is PID 1?
- How do you avoid false positives?
5.6 Thinking Exercise
Manual zombie
Create a zombie using fork and confirm it with ps -o pid,ppid,stat,cmd.
5.7 The Interview Questions They’ll Ask
Prepare to answer these:
- “Why can’t you kill a zombie process?”
- “How do you prevent zombies in your code?”
- “What does SIGCHLD do?”
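For the prevention question, one answer besides calling wait() is to tell the kernel that exit statuses are not wanted. The sketch below assumes Linux (or another POSIX.1-2001 system) where explicitly setting SIGCHLD to SIG_IGN means terminated children do not become zombies; waitpid() then fails with ECHILD. The function name `autoreap_demo` is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the child was discarded by the kernel without becoming
 * waitable, 0 if waitpid() unexpectedly succeeded, -1 on setup failure. */
int autoreap_demo(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = SIG_IGN;            /* explicit ignore, not the default */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, NULL) != 0)
        return -1;

    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0)
        _exit(0);

    /* With SIGCHLD ignored, waitpid() blocks until the child is gone
     * and then reports that there is nothing to collect. */
    errno = 0;
    pid_t r = waitpid(child, NULL, 0);
    return r == -1 && errno == ECHILD;
}
```

The trade-off is that the exit status is lost, so this suits fire-and-forget children only.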
5.8 Hints in Layers
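Hint 1: Minimal zombie
Child exits immediately, parent sleeps forever.
Hint 2: State field
State is the 3rd field in /proc/<pid>/stat. The 2nd field (the command name, in parentheses) can itself contain spaces or parentheses, so find the last ')' before splitting the rest of the line.
Hint 3: Cleanup
Kill the parent or fix it to call wait().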
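As a small illustration of Hint 2, the state and PPID (fields 3 and 4) can be pulled out of a stat line by searching for the last ')' first, since the command name in field 2 may contain spaces or parentheses. The function name `parse_stat_state` is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Extract state (field 3) and PPID (field 4) from one /proc/<pid>/stat line.
 * Returns 0 on success, -1 if the line is malformed. */
int parse_stat_state(const char *line, char *state, int *ppid)
{
    /* Everything up to the last ')' is the PID and the command name. */
    const char *rp = strrchr(line, ')');
    if (!rp)
        return -1;
    return sscanf(rp + 1, " %c %d", state, ppid) == 2 ? 0 : -1;
}
```

A naive split on spaces would misread a process named `a b) c` and report the wrong state, which is why the search starts from the right.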
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
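| Process termination | “The Linux Programming Interface” (TLPI) | Ch. 25-26 |
| SIGCHLD | “Advanced Programming in the UNIX Environment” (APUE) | Ch. 10 |
| Process states | “Operating Systems: Three Easy Pieces” (OSTEP) | Ch. 5 |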
5.10 Implementation Phases
Phase 1: Foundation (2 days)
Goals:
- Create a reliable zombie.
Tasks:
- Fork and exit child.
- Keep parent alive.
Checkpoint: ps shows Z state.
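A possible starting point for Phase 1 is sketched below. Unlike the "parent sleeps forever" hint, this variant holds the zombie for a bounded number of seconds and then reaps it, which is an assumption made here to keep the demo self-cleaning. The names `make_zombie` and `hold_zombie` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits immediately and deliberately do NOT wait for it.
 * Returns the child's PID (a zombie once it has exited), or -1 on error. */
pid_t make_zombie(void)
{
    pid_t child = fork();
    if (child == 0)
        _exit(0);           /* child: terminate at once */
    return child;           /* parent: no wait() here, on purpose */
}

/* Hold the zombie for `seconds`, then reap it so the demo cleans up.
 * Returns the child's exit status, or -1 on error. */
int hold_zombie(unsigned seconds)
{
    pid_t z = make_zombie();
    if (z < 0)
        return -1;
    printf("parent %ld holding zombie %ld for %u s\n",
           (long)getpid(), (long)z, seconds);
    fflush(stdout);
    sleep(seconds);         /* window in which ps shows state Z */

    int status;
    if (waitpid(z, &status, 0) != z)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling `hold_zombie(60)` from a `main()` and running `ps -o pid,ppid,stat,cmd` in another terminal during the sleep should show the child with a `Z` in the STAT column.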
Phase 2: Core Functionality (3 days)
Goals:
- Build zombie scanner.
Tasks:
- Scan /proc.
- Identify Z states.
Checkpoint: Scanner lists the test zombie.
Phase 3: Polish & Edge Cases (2 days)
Goals:
- Add remediation suggestions.
Tasks:
- Map PPID to command.
- Suggest wait() or kill parent.
Checkpoint: Report is actionable.
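The Phase 3 tasks might look something like the sketch below: look up the parent's command name in /proc/<ppid>/comm and pick a suggestion based on the PPID. The function names `parent_command` and `remediation_for` are hypothetical, and the advice strings are illustrative wording rather than prescribed output.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Read the parent's short command name from /proc/<ppid>/comm into `out`.
 * Returns 0 on success, -1 if the entry is missing (parent already gone). */
int parent_command(pid_t ppid, char *out, size_t len)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/%ld/comm", (long)ppid);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(out, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    out[strcspn(out, "\n")] = '\0';     /* strip the trailing newline */
    return 0;
}

/* Hypothetical advice text keyed on the parent PID. */
const char *remediation_for(pid_t ppid)
{
    if (ppid == 1)
        return "parent is PID 1: normally reaped shortly; in a container, "
               "check that PID 1 actually reaps its children";
    return "fix the parent to call wait()/waitpid(), or stop the parent "
           "so the zombie is reparented and reaped";
}
```

Killing the parent works because the orphaned zombie is then adopted by another process (typically PID 1) that reaps it, so the report should present that as a last resort after fixing the parent's code.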
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Age source | /proc/ | stat starttime | Stable source |
| Output | one-line vs table | table | Readability |
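If starttime is used as the age source, the arithmetic could be sketched as below. starttime is field 22 of the stat line and is measured in clock ticks since boot, so it has to be combined with the tick rate and the system uptime. Note that it records when the process started, not when it died, so the result is process age rather than time spent as a zombie. The function names are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return field 22 (starttime, in clock ticks since boot) from a stat line,
 * or -1 if the line is malformed or too short. */
long long stat_starttime(const char *line)
{
    const char *p = strrchr(line, ')');  /* fields after comm are space-separated */
    if (!p)
        return -1;
    p++;
    /* Field 3 is the first token after ')'; skip fields 3..21. */
    for (int field = 3; field < 22; field++) {
        while (*p == ' ') p++;
        while (*p && *p != ' ') p++;
    }
    char *end;
    long long ticks = strtoll(p, &end, 10);
    return end == p ? -1 : ticks;
}

/* Read seconds since boot from /proc/uptime; returns -1.0 on failure. */
double read_uptime(void)
{
    double up = -1.0;
    FILE *f = fopen("/proc/uptime", "r");
    if (!f)
        return -1.0;
    if (fscanf(f, "%lf", &up) != 1)
        up = -1.0;
    fclose(f);
    return up;
}

/* Age in seconds = uptime minus the start time converted from ticks. */
double age_seconds(long long start_ticks, long ticks_per_sec, double uptime_sec)
{
    return uptime_sec - (double)start_ticks / (double)ticks_per_sec;
}
```

In a real scan, `ticks_per_sec` would come from `sysconf(_SC_CLK_TCK)` and `uptime_sec` from `read_uptime()`; with 30000 ticks, 100 ticks per second, and 600 s of uptime, the age works out to 300 s, which would be displayed as 5m.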
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Zombie creation | Ensure Z state exists | fork/exit test |
| Detection | Ensure scan works | Compare with ps |
| Cleanup | Ensure recommendations work | Kill parent, confirm zombie is gone |
6.2 Critical Test Cases
- Zombie detected from test program.
- Parent PID listed correctly.
- No crash if zombie disappears.
6.3 Test Data
State: Z
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Confusing zombie with stopped | Wrong state | Check field 3 in stat |
| Assuming kill -9 works | Zombie persists | Kill parent or fix wait |
| Races | PID vanishes between listing and read | Treat a missing /proc entry as an exited process and skip it |
7.2 Debugging Strategies
- Use ps -o pid,ppid,stat,cmd to verify.
- Print raw stat line for a zombie.
7.3 Performance Traps
Scanning all PIDs too often is unnecessary; use on-demand scans.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add a summary count per parent.
- Add JSON output.
8.2 Intermediate Extensions
- Add a daemon mode to watch for zombies.
- Integrate with SIGCHLD handler examples.
8.3 Advanced Extensions
- Integrate with process supervisor from Project 12.
- Add cross-namespace zombie scanning.
9. Real-World Connections
9.1 Industry Applications
- Debugging daemon failures and process management bugs.
9.2 Related Open Source Projects
- ps: https://gitlab.com/procps-ng/procps
- systemd: https://systemd.io
9.3 Interview Relevance
- Zombies and wait() behavior are classic Unix interview topics.
10. Resources
10.1 Essential Reading
- wait(2) - man 2 wait
- proc(5) - /proc/<pid>/stat
10.2 Video Resources
- Process lifecycle explanations (search “Linux zombie process”)
10.3 Tools & Documentation
- ps(1) - man 1 ps
10.4 Related Projects in This Series
- Signal Laboratory: learn SIGCHLD basics.
11. Self-Assessment Checklist
11.1 Understanding
- I can explain why zombies exist.
- I can explain wait() behavior.
- I can detect zombies via /proc.
11.2 Implementation
- Zombie creation works.
- Scanner reports zombies accurately.
- Recommendations are clear.
11.3 Growth
- I can prevent zombies in my own code.
- I can debug zombie accumulation in real services.
12. Submission / Completion Criteria
Minimum Viable Completion:
- Detect zombies and print PID/PPID.
Full Completion:
- Provide age and remediation notes.
Excellence (Going Above & Beyond):
- Add a background watch mode with alerts.
This guide was generated from LINUX_SYSTEM_TOOLS_MASTERY.md. For the complete learning path, see the parent directory.