Project 6: Fork and Exec (Process Creation)

Build a process explorer that demonstrates fork/exec/wait and visualizes process trees.

Quick Reference

Attribute	Value
Difficulty	Intermediate
Time Estimate	8-12 hours
Main Programming Language	C
Alternative Programming Languages	Rust, Go
Coolness Level	High
Business Potential	Medium (tooling)
Prerequisites	Syscall basics, /proc understanding
Key Topics	fork/exec, wait, process trees, zombies

1. Learning Objectives

By completing this project, you will:

Explain why Unix separates fork and exec.
Implement process trees by parsing /proc.
Demonstrate zombies and orphans safely.
Capture exit statuses and signals correctly.

2. All Theory Needed (Per-Concept Breakdown)

Fork/Exec Model and Process Lifecycle

Fundamentals

Unix process creation is a two-step model: fork() duplicates the calling process, and exec() replaces the child process image with a new program. This separation enables powerful composition: the child can rewire file descriptors, set environment variables, or change privileges before executing a new program. wait() allows parents to reap child exit statuses and prevents zombie accumulation. A process hierarchy emerges naturally: each process has a parent, and if the parent exits first, the child becomes an orphan and is adopted by init (PID 1). Understanding these states is essential for shells, pipelines, and job control.

Deep Dive into the concept

fork() is conceptually simple: it creates a new process with a copy of the parent’s address space and file descriptors. In practice, it is efficient because modern kernels use copy-on-write (COW). The kernel marks memory pages as read-only and shares them between parent and child until one writes. This allows fork() to be fast even for large processes. The child receives a return value of 0, while the parent receives the child’s PID. This asymmetry allows both to execute different code paths.

exec() replaces the current process image. It loads a new executable, resets the address space, and initializes the stack with arguments and environment variables. This means exec() does not create a new PID; it transforms the existing process. The file descriptor table is preserved (unless FD_CLOEXEC is set), which is why redirection and pipes are possible: the shell creates pipes, forks, sets up FDs in the child, and then execs the target program. The program inherits the redirected file descriptors seamlessly.

wait() and waitpid() are critical for process cleanup. When a child exits, it becomes a zombie: the process is dead but its exit status remains in the process table. The parent must call wait() or waitpid() to reap it. If the parent exits first, the child is reparented to PID 1 (or, on Linux, to the nearest ancestor registered as a child subreaper), which reaps it once it exits. A zombie uses no CPU and almost no memory, but each one holds a PID and a process table entry, so enough of them can prevent new processes from being created.

Process trees are a natural representation of process relationships. On Linux, /proc/<pid>/stat includes the parent PID. You can build a tree by parsing all entries, then linking children to parents. This produces a snapshot of system process structure and reveals shells, daemons, and nested command pipelines.

This project makes the fork/exec lifecycle visible. By intentionally creating zombies and orphans (briefly and safely), you see how the kernel tracks them. By printing PID relationships and exit statuses, you gain a mental model of process control that directly applies to shells, service managers, and container runtimes.

How this fits into the projects

This concept powers Section 3.2 and Section 3.7 and is used in Project 7 (shell) and Project 13 (containers).

Definitions & key terms

fork(): create a new process by copying the current one.
exec(): replace the current process image with a new program.
wait(): collect exit status of a child.
Zombie: exited process waiting to be reaped.
Orphan: child whose parent has exited.

Mental model diagram (ASCII)

parent (PID 100)
  |
  +-- fork -> child (PID 101)
       |
       +-- exec -> new program

How it works (step-by-step)

Parent calls fork.
Child returns 0 and continues.
Child optionally rewires FDs.
Child calls exec to run target program.
Parent waits and collects exit status.

Minimal concrete example

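/* Needs <stdio.h>, <stdlib.h>, <sys/wait.h>, <unistd.h>. */
pid_t pid = fork();
if (pid < 0) {
    perror("fork");                       /* no child was created */
    exit(EXIT_FAILURE);
}
if (pid == 0) {
    execl("/bin/ls", "ls", (char *)NULL); /* returns only on failure */
    _exit(127);
}
int status;
if (waitpid(pid, &status, 0) == pid && WIFEXITED(status))
    printf("child exited with status %d\n", WEXITSTATUS(status));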

Common misconceptions

“exec creates a new PID”: it does not; the PID stays the same.
“fork copies all memory immediately”: COW makes it lazy.
“zombies are running”: zombies are dead; only metadata remains.

Check-your-understanding questions

Why did Unix split process creation into fork and exec?
What happens if a parent never calls wait?
How does a shell implement cmd > file?

Check-your-understanding answers

It allows the child to set up environment/FDs before running a new program.
Once the child exits, it stays a zombie, holding a process table entry, until the parent reaps it or the parent itself exits.
The shell forks, redirects stdout in the child, then execs.

Real-world applications

Shells and pipelines.
Service supervisors (systemd) and job control.

Where you’ll apply it

This project: Section 3.2, Section 3.7, Section 5.10 Phase 2.
Also used in: Project 7, Project 13.

References

“APUE” Ch. 8
“TLPI” Ch. 24-26

Key insights

Fork/exec is the foundation of Unix composition and process control.

Summary

By creating and observing process trees, you make the process lifecycle concrete.

Homework/Exercises to practice the concept

Add a flag to show only children of a given PID.
Implement a simple pstree view.
Show FD inheritance by printing open FDs before exec.

Solutions to the homework/exercises

Filter by PPID in the /proc scan.
Build a parent->children map and print with indentation.
Read /proc/self/fd and list descriptors.

3. Project Specification

3.1 What You Will Build

A CLI tool that demonstrates fork/exec/wait behaviors and prints process trees from /proc. It will include demos for zombies and orphans with safe timeouts.

3.2 Functional Requirements

Show fork/exec ordering with timestamps.
Print exit status and signal termination.
Build a process tree from /proc.
Demonstrate zombies and orphans safely.

3.3 Non-Functional Requirements

Performance: process tree builds in <1 second.
Reliability: safe timeouts for zombie demo.
Usability: ./fork_explorer demo and ./fork_explorer tree.

3.4 Example Usage / Output

$ ./fork_explorer demo
Parent PID: 1201
fork() -> child PID 1202
Child: exec /bin/ls
file1.txt  file2.txt
Parent: child 1202 exited with status 0

3.5 Data Formats / Schemas / Protocols

/proc/<pid>/stat fields for PID (field 1) and PPID (field 4).

3.6 Edge Cases

Processes that exit while building the tree.
Permission denied for some /proc entries.

3.7 Real World Outcome

3.7.1 How to Run (Copy/Paste)

./fork_explorer demo
./fork_explorer tree --seed 42

3.7.2 Golden Path Demo (Deterministic)

Use --seed 42 to pick the same demo branch order.

3.7.3 Exact Terminal Transcript (CLI)

$ ./fork_explorer demo
Parent PID: 3000
fork() -> child PID 3001
Child: exec /bin/echo
hello
Parent: child 3001 exited with status 0

Failure demo (deterministic):

$ ./fork_explorer tree --proc /nope
error: cannot read /nope

Exit codes:

0 success
2 invalid args
3 /proc read error

4. Solution Architecture

4.1 High-Level Design

CLI -> Demo runner -> Process control
    -> /proc scanner -> Tree builder -> Renderer

4.2 Key Components

4.3 Data Structures (No Full Code)

struct proc_info {
    int pid;
    int ppid;
    char comm[32];
};

4.4 Algorithm Overview

Key Algorithm: Build process tree

Read all PIDs from /proc.
Parse PPID from each stat file.
Build parent->children map.
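Render starting from PID 1. On Linux, kernel threads descend from PID 2 (kthreadd), whose PPID is 0, so render that root as well if you want them in the tree.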

Complexity Analysis:

Time: O(n) processes
Space: O(n)

5. Implementation Guide

5.1 Development Environment Setup

sudo apt-get install build-essential

5.2 Project Structure

project-root/
|-- fork_explorer.c
|-- README.md
`-- Makefile

5.3 The Core Question You’re Answering

“Why does Unix use fork+exec instead of a single spawn call, and how does that enable composition?”

5.4 Concepts You Must Understand First

Copy-on-write semantics.
File descriptor inheritance.
Process states and wait.

5.5 Questions to Guide Your Design

How will you keep demo output deterministic?
How will you avoid leaving zombies?
How will you handle disappearing /proc entries?

5.6 Thinking Exercise

Draw the process tree for cat file | grep foo | wc -l.

5.7 The Interview Questions They’ll Ask

What is the difference between fork and exec?
Why can fork be fast?
What causes zombies?

5.8 Hints in Layers

Hint 1: Start with fork + wait.

Hint 2: Add exec of /bin/echo.

Hint 3: Add /proc tree parsing.

5.9 Books That Will Help

5.10 Implementation Phases

Phase 1: fork/exec demo (2-3 hours)

Goals: basic parent/child output.

Phase 2: wait and status (2-3 hours)

Goals: exit status handling.

Phase 3: /proc tree (3-5 hours)

Goals: tree rendering from /proc.

5.11 Key Implementation Decisions

6. Testing Strategy

6.1 Test Categories

6.2 Critical Test Cases

Process exits between scan and parse.
Child exits with non-zero status.
/proc read permission denied.

6.3 Test Data

/proc/1/stat (sample line)

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

7.2 Debugging Strategies

Add timestamps to parent/child lines.
Use strace -f to confirm fork/exec/wait order.

7.3 Performance Traps

Re-reading /proc many times unnecessarily.

8. Extensions & Challenges

8.1 Beginner Extensions

Add option to filter by process name.

8.2 Intermediate Extensions

Visualize open file descriptors per process.

8.3 Advanced Extensions

Track process lifetime by polling /proc over time.

9. Real-World Connections

9.1 Industry Applications

Shell implementation and process supervisors.

9.2 Related Open Source Projects

pstree and ps source code.

9.3 Interview Relevance

Fork/exec and process lifecycle are classic questions.

10. Resources

10.1 Essential Reading

APUE Ch. 8
TLPI Ch. 24-26

10.2 Video Resources

Unix process model lectures

10.3 Tools & Documentation

man fork, man execve, man waitpid

11. Self-Assessment Checklist

11.1 Understanding

I can explain fork vs exec.
I can explain zombies and orphans.

11.2 Implementation

Demos run and exit cleanly.

11.3 Growth

I can describe process trees in an interview.

12. Submission / Completion Criteria

Minimum Viable Completion:

fork/exec demo + wait status output.

Full Completion:

/proc tree + zombie/orphan demo.

Excellence (Going Above & Beyond):

FD inheritance visualization and timeline output.