Project 6: Fork and Exec (Process Creation)
Build a process explorer that demonstrates fork/exec/wait and visualizes process trees.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Intermediate |
| Time Estimate | 8-12 hours |
| Main Programming Language | C |
| Alternative Programming Languages | Rust, Go |
| Coolness Level | High |
| Business Potential | Medium (tooling) |
| Prerequisites | Syscall basics, /proc understanding |
| Key Topics | fork/exec, wait, process trees, zombies |
1. Learning Objectives
By completing this project, you will:
- Explain why Unix separates fork and exec.
- Implement process trees by parsing /proc.
- Demonstrate zombies and orphans safely.
- Capture exit statuses and signals correctly.
2. All Theory Needed (Per-Concept Breakdown)
Fork/Exec Model and Process Lifecycle
Fundamentals
Unix process creation is a two-step model: fork() duplicates the calling process, and exec() replaces the child’s process image with a new program. This separation enables powerful composition: between the two calls, the child can rewire file descriptors, set environment variables, or drop privileges before the new program starts. wait() lets a parent collect a child’s exit status, which reaps the child and prevents zombie accumulation. A process hierarchy emerges naturally: each process has a parent, and if the parent exits first, the child becomes an orphan and is adopted by init (PID 1) or, on Linux, by the nearest ancestor that registered itself as a subreaper. Understanding these states is essential for shells, pipelines, and job control.
Deep Dive into the concept
fork() is conceptually simple: it creates a new process with a copy of the parent’s address space and file descriptors. In practice, it is efficient because modern kernels use copy-on-write (COW). The kernel marks memory pages as read-only and shares them between parent and child until one of them writes, at which point the writer gets a private copy of that page. This keeps fork() fast even for large processes. fork() returns 0 in the child and the child’s PID in the parent (or -1 in the parent if no child could be created). This asymmetry lets the two processes take different code paths.
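A minimal sketch of the return-value asymmetry and of the isolation that copy-on-write preserves. The helper name `fork_isolation_demo` is hypothetical and not part of the project spec; it only shows that a write in the child does not reach the parent.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks, lets the child overwrite its copy of a
 * variable, and returns the value the parent still sees afterwards.
 * Returns -1 if fork or waitpid fails. */
int fork_isolation_demo(void) {
    volatile int value = 42;

    pid_t pid = fork();
    if (pid < 0) {
        return -1;                /* fork failed: no child exists */
    }
    if (pid == 0) {
        /* Child: fork() returned 0. This write lands in the child's own
         * copy of the memory; the parent's copy is untouched. */
        value = 99;
        _exit(value == 99 ? 0 : 1);
    }

    /* Parent: fork() returned the child's PID. */
    int status;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return value;                 /* unchanged in the parent */
}
```

Calling `fork_isolation_demo()` from `main` and printing the result is one way to see that the parent's value survives the child's write.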
exec() replaces the current process image. It loads a new executable, resets the address space, and initializes the stack with arguments and environment variables. This means exec() does not create a new PID; it transforms the existing process. Open file descriptors stay open across exec(), except those marked FD_CLOEXEC, which is why redirection and pipes are possible: the shell creates pipes, forks, sets up FDs in the child, and then execs the target program. The program inherits the redirected file descriptors seamlessly.
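The redirection pattern described above can be sketched as follows. This is an illustration under assumptions, not the project's required design: `run_with_stdout_to` is a hypothetical helper, and the exit codes 126 and 127 for setup and exec failures are borrowed from shell convention.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs argv[0] with stdout redirected to `path`,
 * roughly what a shell does for `cmd > file`.
 * Returns the child's exit status, or -1 on failure. */
int run_with_stdout_to(const char *path, char *const argv[]) {
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        /* Child: rewire FD 1 before exec so the new program inherits it. */
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) _exit(126);
        if (fd != STDOUT_FILENO) {
            if (dup2(fd, STDOUT_FILENO) < 0) _exit(126);
            close(fd);            /* only the duplicate on FD 1 is needed */
        }
        execv(argv[0], argv);
        _exit(127);               /* reached only if exec failed */
    }

    /* Parent: its own stdout is untouched; it just waits. */
    int status;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, passing `{"/bin/echo", "hello", NULL}` as `argv` should leave `hello` in the target file while the parent's terminal output stays unchanged.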
wait() and waitpid() are critical for process cleanup. When a child exits, it becomes a zombie: the process is dead but its exit status remains in the process table. The parent must call wait() to reap it. If the parent exits first, the child is reparented to PID 1, which eventually reaps it. Zombies are a resource leak, not a CPU leak, but too many zombies can exhaust the process table.
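One way to observe a zombie briefly and then clean it up is sketched below. It is Linux-specific and rests on two assumptions: `/proc` is mounted, and `waitid` with `WNOWAIT` is used to wait for the exit without reaping. The helper names `proc_state` and `zombie_state_demo` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: reads the state letter (third field) from
 * /proc/<pid>/stat. Returns '?' if the file cannot be read or parsed. */
char proc_state(pid_t pid) {
    char path[64], buf[512];
    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);
    FILE *f = fopen(path, "r");
    if (!f) return '?';
    char *ok = fgets(buf, sizeof buf, f);
    fclose(f);
    if (!ok) return '?';
    /* comm may contain spaces or ')', so search from the right. */
    char *rparen = strrchr(buf, ')');
    if (!rparen || rparen[1] != ' ' || rparen[2] == '\0') return '?';
    return rparen[2];
}

/* Creates a short-lived zombie, observes its state, then reaps it.
 * Returns the observed state letter. */
char zombie_state_demo(void) {
    pid_t pid = fork();
    if (pid < 0) return '?';
    if (pid == 0) _exit(0);       /* child exits immediately */

    /* Block until the child has exited, but leave it unreaped. */
    siginfo_t info;
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) < 0) {
        waitpid(pid, NULL, 0);
        return '?';
    }
    char state = proc_state(pid); /* the child is a zombie at this point */
    waitpid(pid, NULL, 0);        /* reap it: the zombie entry goes away */
    return state;
}
```

Because the parent always calls `waitpid` before returning, the zombie exists only for the duration of one `/proc` read.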
Process trees are a natural representation of process relationships. On Linux, the fourth field of /proc/<pid>/stat is the parent PID. You can build a tree by parsing every entry and then linking children to parents. This produces a snapshot of the system’s process structure and reveals shells, daemons, and nested command pipelines.
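A sketch of parsing the leading fields of a stat line. The function name `parse_stat_line` is hypothetical; the struct mirrors the one in Section 4.3. The point it illustrates is that the command name sits in parentheses and may itself contain spaces or parentheses, so the parser looks for the last `)` rather than splitting on whitespace.

```c
#include <stdio.h>
#include <string.h>

/* Same layout as the struct sketched in Section 4.3. */
struct proc_info { int pid; int ppid; char comm[32]; };

/* Hypothetical helper: parses the first fields of a /proc/<pid>/stat line:
 *   pid (comm) state ppid ...
 * Returns 0 on success, -1 on malformed input. */
int parse_stat_line(const char *line, struct proc_info *out) {
    const char *lparen = strchr(line, '(');
    const char *rparen = strrchr(line, ')');    /* last ')' ends comm */
    if (!lparen || !rparen || rparen < lparen) return -1;

    if (sscanf(line, "%d", &out->pid) != 1) return -1;

    /* Copy comm, truncating if it does not fit. */
    size_t len = (size_t)(rparen - lparen - 1);
    if (len >= sizeof out->comm) len = sizeof out->comm - 1;
    memcpy(out->comm, lparen + 1, len);
    out->comm[len] = '\0';

    /* After the closing parenthesis: state letter, then PPID. */
    char state;
    if (sscanf(rparen + 1, " %c %d", &state, &out->ppid) != 2) return -1;
    return 0;
}
```

Feeding it a line such as `4242 (tmux: server) S 1 ...` should yield PID 4242, PPID 1, and the full command name including the space.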
This project makes the fork/exec lifecycle visible. By intentionally creating zombies and orphans (briefly and safely), you see how the kernel tracks them. By printing PID relationships and exit statuses, you gain a mental model of process control that directly applies to shells, service managers, and container runtimes.
How this fits into the projects
This concept powers Section 3.2 and Section 3.7 and is used in Project 7 (shell) and Project 13 (containers).
Definitions & key terms
- fork(): create a new process by copying the current one.
- exec(): replace the current process image with a new program.
- wait(): collect exit status of a child.
- Zombie: exited process waiting to be reaped.
- Orphan: child whose parent has exited.
Mental model diagram (ASCII)
parent (PID 100)
   |
   +-- fork -> child (PID 101)
                  |
                  +-- exec -> new program
How it works (step-by-step)
- Parent calls fork.
- fork returns 0 in the child (and the child’s PID in the parent); both continue from that point.
- Child optionally rewires FDs.
- Child calls exec to run target program.
- Parent waits and collects exit status.
Minimal concrete example
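#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>
int main(void) {
    pid_t pid = fork();
    if (pid < 0) { perror("fork"); return 1; }  /* no child was created */
    if (pid == 0) {                             /* child */
        execl("/bin/ls", "ls", (char *)NULL);
        _exit(1);                               /* reached only if exec failed */
    }
    int status;
    waitpid(pid, &status, 0);                   /* parent reaps the child */
    return 0;
}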
Common misconceptions
- “exec creates a new PID”: it does not; the PID stays the same.
- “fork copies all memory immediately”: COW makes it lazy.
- “zombies are running”: zombies are dead; only metadata remains.
Check-your-understanding questions
- Why did Unix split process creation into fork and exec?
- What happens if a parent never calls wait?
- How does a shell implement `cmd > file`?
Check-your-understanding answers
- It allows the child to set up environment/FDs before running a new program.
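- Once the child exits, it stays a zombie and holds a process table entry until the parent reaps it, or until the parent itself exits and the child is reparented and reaped.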
- The shell forks, redirects stdout in the child, then execs.
Real-world applications
- Shells and pipelines.
- Service supervisors (systemd) and job control.
Where you’ll apply it
- This project: Section 3.2, Section 3.7, Section 5.10 Phase 2.
- Also used in: Project 7, Project 13.
References
- “APUE” Ch. 8
- “TLPI” Ch. 24-26
Key insights
Fork/exec is the foundation of Unix composition and process control.
Summary
By creating and observing process trees, you make the process lifecycle concrete.
Homework/Exercises to practice the concept
- Add a flag to show only children of a given PID.
- Implement a simple `pstree` view.
- Show FD inheritance by printing open FDs before exec.
Solutions to the homework/exercises
- Filter by PPID in the /proc scan.
- Build a parent->children map and print with indentation.
- Read /proc/self/fd and list descriptors.
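For the third exercise, a sketch of listing the caller's open descriptors on Linux. The helper name `list_open_fds` is hypothetical; it skips the descriptor that the directory scan itself opens, which would otherwise show up in the list.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdlib.h>

/* Hypothetical helper: stores up to max open descriptor numbers of the
 * calling process in fds (Linux-specific, reads /proc/self/fd).
 * Returns the count stored, or -1 if the directory cannot be opened. */
int list_open_fds(int *fds, int max) {
    DIR *dir = opendir("/proc/self/fd");
    if (!dir) return -1;
    int self = dirfd(dir);        /* the scan's own descriptor: skip it */

    int count = 0;
    struct dirent *ent;
    while (count < max && (ent = readdir(dir)) != NULL) {
        char *end;
        long fd = strtol(ent->d_name, &end, 10);
        if (end == ent->d_name || *end != '\0') continue;  /* "." and ".." */
        if ((int)fd == self) continue;
        fds[count++] = (int)fd;
    }
    closedir(dir);
    return count;
}
```

Calling this in the child just before exec shows which descriptors the new program would inherit.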
3. Project Specification
3.1 What You Will Build
A CLI tool that demonstrates fork/exec/wait behaviors and prints process trees from /proc. It will include demos for zombies and orphans with safe timeouts.
3.2 Functional Requirements
- Show fork/exec ordering with timestamps.
- Print exit status and signal termination.
- Build a process tree from /proc.
- Demonstrate zombies and orphans safely.
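For the orphan half of the last requirement, one possible shape is a double fork: an intermediate process exits on purpose, and the grandchild reports who adopted it. This is a sketch under assumptions, not the required design; `orphan_demo` is a hypothetical name, and the roughly two-second polling cap stands in for the "safe timeout" idea.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: creates a short-lived orphan and returns the PID
 * of the process that adopted it (often 1, or a subreaper), or -1 on
 * failure. The orphan polls for at most about 2 seconds, then exits. */
pid_t orphan_demo(void) {
    int fds[2];
    if (pipe(fds) < 0) return -1;

    pid_t middle = fork();
    if (middle < 0) { close(fds[0]); close(fds[1]); return -1; }
    if (middle == 0) {
        close(fds[0]);
        pid_t me = getpid();
        pid_t child = fork();
        if (child == 0) {
            /* Grandchild: wait until the intermediate parent is gone. */
            struct timespec tick = {0, 1000000};          /* 1 ms */
            for (int i = 0; i < 2000 && getppid() == me; i++)
                nanosleep(&tick, NULL);
            pid_t adopter = getppid();
            if (adopter == me) adopter = -1;              /* timed out */
            if (write(fds[1], &adopter, sizeof adopter) < 0) _exit(1);
            _exit(0);
        }
        _exit(child < 0 ? 1 : 0); /* exiting here orphans the grandchild */
    }

    /* Original process: reap the intermediate, then read the report. */
    close(fds[1]);
    waitpid(middle, NULL, 0);
    pid_t adopter = -1;
    if (read(fds[0], &adopter, sizeof adopter) != (ssize_t)sizeof adopter)
        adopter = -1;
    close(fds[0]);
    return adopter;
}
```

The pipe is what makes the demo observable: once orphaned, the grandchild is no longer a child of the original process, so `waitpid` cannot be used to hear back from it.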
3.3 Non-Functional Requirements
- Performance: process tree builds in <1 second.
- Reliability: safe timeouts for zombie demo.
- Usability: `./fork_explorer demo` and `./fork_explorer tree`.
3.4 Example Usage / Output
$ ./fork_explorer demo
Parent PID: 1201
fork() -> child PID 1202
Child: exec /bin/ls
file1.txt file2.txt
Parent: child 1202 exited with status 0
3.5 Data Formats / Schemas / Protocols
- **/proc/<pid>/stat** fields for PID/PPID.
3.6 Edge Cases
- Processes that exit while building the tree.
- Permission denied for some /proc entries.
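Both edge cases can be handled by treating a failed open or read of a single entry as "skip and continue" rather than as a fatal error. The sketch below assumes that policy; `scan_proc` is a hypothetical name, and only a failure to open the root directory is reported to the caller (compare the `/nope` failure demo in Section 3.7.3).

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct proc_info { int pid; int ppid; char comm[32]; };

/* Hypothetical helper: fills out[] with up to max entries read from
 * <proc_root>/<pid>/stat. Entries that vanish or are unreadable are
 * skipped. Returns the number stored, or -1 if proc_root cannot be opened. */
int scan_proc(const char *proc_root, struct proc_info *out, int max) {
    DIR *dir = opendir(proc_root);
    if (!dir) return -1;

    int count = 0;
    struct dirent *ent;
    while (count < max && (ent = readdir(dir)) != NULL) {
        char *end;
        long pid = strtol(ent->d_name, &end, 10);
        if (end == ent->d_name || *end != '\0' || pid <= 0)
            continue;             /* not a PID directory */

        char path[512], line[512];
        snprintf(path, sizeof path, "%s/%ld/stat", proc_root, pid);
        FILE *f = fopen(path, "r");
        if (!f) continue;         /* process exited, or permission denied */
        char *ok = fgets(line, sizeof line, f);
        fclose(f);
        if (!ok) continue;        /* process exited between open and read */

        /* comm may contain spaces or ')': anchor on the last ')'. */
        char *lp = strchr(line, '('), *rp = strrchr(line, ')');
        char state;
        int ppid;
        if (!lp || !rp || rp < lp) continue;
        if (sscanf(rp + 1, " %c %d", &state, &ppid) != 2) continue;

        struct proc_info *p = &out[count++];
        p->pid = (int)pid;
        p->ppid = ppid;
        size_t len = (size_t)(rp - lp - 1);
        if (len >= sizeof p->comm) len = sizeof p->comm - 1;
        memcpy(p->comm, lp + 1, len);
        p->comm[len] = '\0';
    }
    closedir(dir);
    return count;
}
```

A single pass like this yields a slightly inconsistent snapshot on a busy system, which is acceptable for a tree view as long as missing parents are tolerated when linking.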
3.7 Real World Outcome
3.7.1 How to Run (Copy/Paste)
./fork_explorer demo
./fork_explorer tree --seed 42
3.7.2 Golden Path Demo (Deterministic)
- Use `--seed 42` to pick the same demo branch order.
3.7.3 Exact terminal transcript
$ ./fork_explorer demo
Parent PID: 3000
fork() -> child PID 3001
Child: exec /bin/echo
hello
Parent: child 3001 exited with status 0
Failure demo (deterministic):
$ ./fork_explorer tree --proc /nope
error: cannot read /nope
Exit codes:
- `0` success
- `2` invalid args
- `3` /proc read error
4. Solution Architecture
4.1 High-Level Design
CLI -> Demo runner -> Process control
    -> /proc scanner -> Tree builder -> Renderer
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Demo runner | fork/exec/wait demos | timestamps for clarity |
| /proc scanner | read PIDs/PPIDs | robust error handling |
| Tree renderer | ASCII tree output | indentation style |
4.3 Data Structures (No Full Code)
struct proc_info {
    int pid;        /* process ID */
    int ppid;       /* parent process ID */
    char comm[32];  /* command name from /proc/<pid>/stat */
};
4.4 Algorithm Overview
Key Algorithm: Build process tree
- Read all PIDs from /proc.
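- Parse PPID from each stat file; the command name can contain spaces and parentheses, so locate the last ")" before splitting the remaining fields.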
- Build parent->children map.
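- Render starting from PID 1. On Linux, kernel threads hang under PID 2 (kthreadd), whose PPID is 0, so they need a second root if you want them shown.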
Complexity Analysis:
- Time: O(n) processes
- Space: O(n)
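The linking and rendering steps can be sketched with first-child and next-sibling index arrays. This is one possible layout, not the prescribed one: `render_tree` and `print_subtree` are hypothetical names, the `name(pid)` output format is an illustrative choice, and the PID-indexed lookup table trades memory proportional to the largest PID for constant-time parent lookup.

```c
#include <stdio.h>
#include <stdlib.h>

struct proc_info { int pid; int ppid; char comm[32]; };

/* Prints the subtree rooted at index i, indenting two spaces per level. */
static void print_subtree(const struct proc_info *procs, const int *first_child,
                          const int *next_sibling, int i, int depth, FILE *out) {
    fprintf(out, "%*s%s(%d)\n", depth * 2, "", procs[i].comm, procs[i].pid);
    for (int c = first_child[i]; c != -1; c = next_sibling[c])
        print_subtree(procs, first_child, next_sibling, c, depth + 1, out);
}

/* Hypothetical helper: renders the tree under root_pid from a snapshot.
 * Returns 0 on success, -1 if root_pid is missing or allocation fails. */
int render_tree(const struct proc_info *procs, int n, int root_pid, FILE *out) {
    int max_pid = 0;
    for (int i = 0; i < n; i++)
        if (procs[i].pid > max_pid) max_pid = procs[i].pid;

    /* index_of[pid] holds (index into procs) + 1; 0 means "not present". */
    int *index_of = calloc((size_t)max_pid + 1, sizeof *index_of);
    int *first_child = malloc((size_t)n * sizeof *first_child);
    int *next_sibling = malloc((size_t)n * sizeof *next_sibling);
    int rc = -1;
    if (!index_of || !first_child || !next_sibling) goto done;

    for (int i = 0; i < n; i++) {
        first_child[i] = next_sibling[i] = -1;
        if (procs[i].pid > 0) index_of[procs[i].pid] = i + 1;
    }
    /* Walk backwards so each child list ends up in input order. */
    for (int i = n - 1; i >= 0; i--) {
        int ppid = procs[i].ppid;
        if (ppid <= 0 || ppid > max_pid || index_of[ppid] == 0) continue;
        int parent = index_of[ppid] - 1;
        if (parent == i) continue;      /* ignore a self-parented entry */
        next_sibling[i] = first_child[parent];
        first_child[parent] = i;
    }
    if (root_pid > 0 && root_pid <= max_pid && index_of[root_pid] != 0) {
        print_subtree(procs, first_child, next_sibling,
                      index_of[root_pid] - 1, 0, out);
        rc = 0;
    }
done:
    free(index_of);
    free(first_child);
    free(next_sibling);
    return rc;
}
```

Entries whose parent is missing from the snapshot are simply left unlinked, which covers processes that exited mid-scan.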
5. Implementation Guide
5.1 Development Environment Setup
sudo apt-get install build-essential
5.2 Project Structure
project-root/
|-- fork_explorer.c
|-- README.md
`-- Makefile
5.3 The Core Question You’re Answering
“Why does Unix use fork+exec instead of a single spawn call, and how does that enable composition?”
5.4 Concepts You Must Understand First
- Copy-on-write semantics.
- File descriptor inheritance.
- Process states and wait.
5.5 Questions to Guide Your Design
- How will you keep demo output deterministic?
- How will you avoid leaving zombies?
- How will you handle disappearing /proc entries?
5.6 Thinking Exercise
Draw the process tree for `cat file | grep foo | wc -l`.
5.7 The Interview Questions They’ll Ask
- What is the difference between fork and exec?
- Why can fork be fast?
- What causes zombies?
5.8 Hints in Layers
Hint 1: Start with fork + wait.
Hint 2: Add exec of /bin/echo.
Hint 3: Add /proc tree parsing.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Process model | OSTEP | 5 |
| fork/exec | APUE | 8 |
| /proc | TLPI | 12 |
5.10 Implementation Phases
Phase 1: fork/exec demo (2-3 hours)
Goals: basic parent/child output.
Phase 2: wait and status (2-3 hours)
Goals: exit status handling.
Phase 3: /proc tree (3-5 hours)
Goals: tree rendering from /proc.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Tree render | ASCII vs JSON | ASCII | matches OS tools |
| Demo safety | sleep vs alarm | alarm | ensures cleanup |
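One reading of the alarm recommendation is that a demo child arms a timer on itself, so it terminates even if the surrounding logic hangs. The sketch below assumes that reading; `run_child_with_deadline` is a hypothetical name, and it assumes SIGALRM is not blocked in the inherited signal mask.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks a child that would block forever, but arms
 * alarm() first so it is terminated after `seconds` (must be > 0).
 * Returns the signal that killed the child, 0 if it exited normally,
 * or -1 on failure. */
int run_child_with_deadline(unsigned seconds) {
    if (seconds == 0) return -1;  /* alarm(0) would cancel, not arm */

    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        signal(SIGALRM, SIG_DFL); /* make sure the default action applies */
        alarm(seconds);           /* default action of SIGALRM: terminate */
        for (;;) pause();         /* stand-in for demo work that might hang */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Compared with a plain sleep, the timer keeps working even if the child is stuck in a blocking call, and the parent's `waitpid` still reaps it afterwards.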
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit | parsing /proc | known stat lines |
| Integration | demo run | capture output |
| Safety | zombie demo | verify wait cleans |
6.2 Critical Test Cases
- Process exits between scan and parse.
- Child exits with non-zero status.
- /proc read permission denied.
6.3 Test Data
/proc/1/stat (sample line)
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Output order confusion | interleaved lines | timestamps + fflush |
| Zombie lingering | ps shows Z | waitpid in parent |
| /proc parse errors | crash | robust token parsing |
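The fflush advice in the first row has a concrete cause: unflushed stdio data is part of the process image, so fork duplicates it and both processes eventually write it. A small sketch that makes the effect countable follows; `lines_written_across_fork` is a hypothetical helper, and it uses a temporary file because stdio typically fully buffers regular files.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: writes one line to a buffered stream, forks, and
 * returns how many copies of the line end up in the file (-1 on error). */
int lines_written_across_fork(int flush_before_fork) {
    FILE *f = tmpfile();
    if (!f) return -1;
    fprintf(f, "Parent PID: %ld\n", (long)getpid());
    if (flush_before_fork) fflush(f);   /* empty the buffer before forking */

    pid_t pid = fork();
    if (pid < 0) { fclose(f); return -1; }
    if (pid == 0) exit(0);        /* exit() flushes the child's buffer copy */
    waitpid(pid, NULL, 0);

    rewind(f);                    /* flushes the parent's buffer, seeks to 0 */
    int lines = 0, c;
    while ((c = fgetc(f)) != EOF)
        if (c == '\n') lines++;
    fclose(f);
    return lines;
}
```

Without the flush the line is expected to appear twice, once from each process; with the flush it appears once. The same thing happens to stdout when it is redirected to a file or pipe.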
7.2 Debugging Strategies
- Add timestamps to parent/child lines.
- Use `strace -f` to confirm fork/exec/wait order.
7.3 Performance Traps
- Re-reading /proc many times unnecessarily.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add option to filter by process name.
8.2 Intermediate Extensions
- Visualize open file descriptors per process.
8.3 Advanced Extensions
- Track process lifetime by polling /proc over time.
9. Real-World Connections
9.1 Industry Applications
- Shell implementation and process supervisors.
9.2 Related Open Source Projects
- pstree and ps source code.
9.3 Interview Relevance
- Fork/exec and process lifecycle are classic questions.
10. Resources
10.1 Essential Reading
- APUE Ch. 8
- TLPI Ch. 24-26
10.2 Video Resources
- Unix process model lectures
10.3 Tools & Documentation
- `man fork`, `man execve`, `man waitpid`
10.4 Related Projects in This Series
- Project 7 (shell)
- Project 13 (containers)
11. Self-Assessment Checklist
11.1 Understanding
- I can explain fork vs exec.
- I can explain zombies and orphans.
11.2 Implementation
- Demos run and exit cleanly.
11.3 Growth
- I can describe process trees in an interview.
12. Submission / Completion Criteria
Minimum Viable Completion:
- fork/exec demo + wait status output.
Full Completion:
- /proc tree + zombie/orphan demo.
Excellence (Going Above & Beyond):
- FD inheritance visualization and timeline output.