Project 11: Signals + Processes Sandbox

Build a harness that runs child processes in controlled modes and logs exactly which exceptional control flow events occurred and why.

Quick Reference

  Attribute        Value
  Difficulty       Advanced
  Time Estimate    1-2 weeks
  Language         C (Alternatives: Rust, Zig, C++)
  Prerequisites    Basic OS concepts, C programming, Project 10 recommended
  Key Topics       Process lifecycle, signals, signal handlers, zombies, race conditions
  CS:APP Chapter   8

Table of Contents

  1. Learning Objectives
  2. Deep Theoretical Foundation
  3. Project Specification
  4. Solution Architecture
  5. Implementation Guide
  6. Testing Strategy
  7. Common Pitfalls
  8. Extensions
  9. Real-World Connections
  10. Resources
  11. Self-Assessment Checklist

1. Learning Objectives

By completing this project, you will:

  1. Master process lifecycle: Understand fork(), exec(), wait(), and the states a process transitions through
  2. Write correct signal handlers: Design async-signal-safe handlers that don't corrupt program state
  3. Prevent zombie processes: Implement proper child reaping strategies
  4. Identify and fix race conditions: Recognize timing windows where signals can cause subtle bugs
  5. Use non-local jumps safely: Understand when setjmp/longjmp (and sigsetjmp/siglongjmp from handlers) are appropriate, and what makes them dangerous
  6. Debug signal-related issues: Use tools and techniques to diagnose ECF problems
  7. Reason about concurrent events: Think systematically about interleaved execution

2. Deep Theoretical Foundation

2.1 The Process Model and Address Spaces

A process is an instance of a running program. It is the OS's fundamental abstraction for:

  • Logical control flow: The illusion that your program has exclusive use of the CPU
  • Private address space: The illusion that your program has exclusive use of memory

                    PROCESS A                           PROCESS B
              ┌─────────────────────┐            ┌─────────────────────┐
              │    Virtual Memory   │            │    Virtual Memory   │
              │  ┌───────────────┐  │            │  ┌───────────────┐  │
              │  │     Stack     │  │            │  │     Stack     │  │
              │  ├───────────────┤  │            │  ├───────────────┤  │
              │  │               │  │            │  │               │  │
              │  │     Heap      │  │            │  │     Heap      │  │
              │  ├───────────────┤  │            │  ├───────────────┤  │
              │  │  Data + BSS   │  │            │  │  Data + BSS   │  │
              │  ├───────────────┤  │            │  ├───────────────┤  │
              │  │     Text      │  │            │  │     Text      │  │
              │  └───────────────┘  │            │  └───────────────┘  │
              │                     │            │                     │
              │   PID: 1234         │            │   PID: 5678         │
              │   PPID: 1           │            │   PPID: 1234        │
              │   State: Running    │            │   State: Stopped    │
              └─────────────────────┘            └─────────────────────┘
                         │                                  │
                         └────────────────┬─────────────────┘
                                          ▼
                              ┌───────────────────────┐
                              │   PHYSICAL MEMORY     │
                              │  (shared via paging)  │
                              └───────────────────────┘

Key insight: Each process believes it has the entire machine to itself. The kernel maintains this illusion through context switching and virtual memory.
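A small experiment makes the private-address-space illusion concrete. The sketch below assumes a POSIX system; `child_sees_private_copy` and the global `counter` are hypothetical names used only for illustration. The child overwrites a global variable and reports what it sees through its exit status, while the parent's copy stays unchanged.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo global: it lives in the Data segment, so after fork()
 * parent and child each have their own copy. */
static int counter = 5;

/* Returns the value the child saw after writing to `counter` (reported via
 * its exit status) and stores the parent's value afterwards in *parent_val.
 * Returns -1 on error. */
int child_sees_private_copy(int *parent_val)
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        counter = 42;           /* changes only the child's copy */
        _exit(counter);         /* exit status carries the child's view */
    }

    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    *parent_val = counter;      /* still 5: the child's write was private */
    return WEXITSTATUS(status);
}
```

Called from `main()`, this returns 42 (the child's view) and leaves 5 in `*parent_val` (the parent's view). Exit statuses only carry 8 bits, which is why the demo uses a small value.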

2.2 Process States and Transitions

A process exists in one of several states:

                               PROCESS STATE MACHINE

                                      fork()
                                         │
                                         ▼
                             ┌───────────────────────┐
                             │          NEW          │
                             │    (being created)    │
                             └───────────┬───────────┘
                                         │
                                         ▼
 ┌───────────────────┐       ┌───────────────────────┐         ┌───────────────────┐
 │      BLOCKED      │──────▶│         READY         │◀────────│      STOPPED      │
 │(waiting for event)│ event │   (waiting for CPU)   │ SIGCONT │ (SIGSTOP/SIGTSTP) │
 └───────────────────┘ occurs└───────────┬───────────┘         └───────────────────┘
           ▲                             │ scheduled                     ▲
           │                             ▼                               │
           │                 ┌───────────────────────┐                   │
           └─ I/O or sleep ──│        RUNNING        │─ SIGSTOP/SIGTSTP ─┘
                             │  (executing on CPU)   │
                             └───────────┬───────────┘
                                         │ exit() or signal
                                         ▼
                             ┌───────────────────────┐
                             │        ZOMBIE         │ ◀── Process terminated but not yet reaped
                             │ (terminated, awaiting │
                             │     parent reap)      │
                             └───────────┬───────────┘
                                         │ parent calls wait()
                                         ▼
                             ┌───────────────────────┐
                             │      TERMINATED       │
                             │  (fully cleaned up)   │
                             └───────────────────────┘

Critical states for this project:

  • Zombie: The process has terminated, but its parent hasn't yet called wait() or waitpid() to collect its status. Every unreaped zombie is a resource leak!
  • Stopped: Process received SIGSTOP/SIGTSTP and is suspended until SIGCONT
  • Running/Ready: Normal execution states
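The Stopped, Running, and Zombie transitions can be observed from the parent with waitpid(). The following is a minimal sketch, assuming Linux or another POSIX system that reports WCONTINUED; `trace_child_states` is a hypothetical helper, and the letters it records are an arbitrary convention for this example.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: drive one child through
 *   Running -> Stopped -> Running -> Zombie -> reaped
 * and record what waitpid() reports at each step:
 *   'S' = stopped, 'C' = continued, 'K' = killed by a signal.
 * Returns the number of events recorded, or -1 if fork() fails. */
int trace_child_states(char events[], int max)
{
    int n = 0, status;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        for (;;)
            pause();            /* child idles; it never exits on its own */
    }

    kill(pid, SIGSTOP);         /* Running -> Stopped */
    if (waitpid(pid, &status, WUNTRACED) == pid && WIFSTOPPED(status) && n < max)
        events[n++] = 'S';

    kill(pid, SIGCONT);         /* Stopped -> Ready/Running */
    if (waitpid(pid, &status, WCONTINUED) == pid && WIFCONTINUED(status) && n < max)
        events[n++] = 'C';

    kill(pid, SIGKILL);         /* Running -> Zombie */
    if (waitpid(pid, &status, 0) == pid && WIFSIGNALED(status) && n < max)
        events[n++] = 'K';      /* waitpid() reaped it: the zombie is gone */
    return n;
}
```

SIGKILL is used for the final step rather than SIGTERM so the sketch still works if the calling process happens to ignore SIGTERM (ignored dispositions are inherited across fork()).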

2.3 fork(), exec(), and wait() Family

fork() - Creating a New Process

pid_t pid = fork();

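fork() creates a new child process that is an almost exact copy of the calling process (it differs in its PID, its parent PID, and its set of pending signals, which starts out empty):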

                    BEFORE fork()                     AFTER fork()
              ┌─────────────────────┐            ┌─────────────────────┐
              │      PARENT         │            │      PARENT         │
              │    PID: 1000        │            │    PID: 1000        │
              │                     │            │    fork() returned  │
              │    int x = 5;       │            │    child_pid (1001) │
              │    fork();          │───────────▶│                     │
              │                     │            └─────────────────────┘
              │                     │                       │
              └─────────────────────┘                       │ (both continue
                                                            │  from here)
                                                            │
                                               ┌────────────┴────────────┐
                                               │                         │
                                               ▼                         ▼
                                    ┌─────────────────────┐   ┌─────────────────────┐
                                    │      PARENT         │   │      CHILD          │
                                    │    PID: 1000        │   │    PID: 1001        │
                                    │    x = 5            │   │    x = 5 (copy!)    │
                                    │    pid = 1001       │   │    pid = 0          │
                                    └─────────────────────┘   └─────────────────────┘

Key properties:

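  1. The child gets a copy of the parent's address space (shared lazily via copy-on-write, so a page is only duplicated when one side writes to it)
  2. The child inherits open file descriptors, the signal mask, and signal dispositions
  3. fork() returns twice: the child's PID in the parent, 0 in the child (or -1 in the parent, with no child created, on failure)
  4. The order in which parent and child run afterwards is non-deterministic; never rely on one running first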
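The "returns twice" property can be checked directly: the child sends its own getpid() to the parent through a pipe, and the parent compares it with the value fork() returned. This is a minimal sketch; `fork_returns_twice` is a hypothetical name, and the pipe is just one convenient way to get data back from the child.

```c
#define _XOPEN_SOURCE 700
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork once, have the child send getpid() back through
 * a pipe, and compare it with the value fork() returned in the parent.
 * Returns 1 if they match, 0 if not, -1 on error. */
int fork_returns_twice(void)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {                          /* failure: no child exists */
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                         /* child: fork() returned 0 */
        pid_t self = getpid();
        close(fds[0]);
        if (write(fds[1], &self, sizeof self) != (ssize_t)sizeof self)
            _exit(1);
        _exit(0);   /* _exit, not exit: don't flush stdio buffers copied from the parent */
    }

    /* parent: fork() returned the child's PID */
    pid_t reported = -1;
    close(fds[1]);
    ssize_t n = read(fds[0], &reported, sizeof reported);
    close(fds[0]);
    waitpid(pid, NULL, 0);                  /* always reap, or the child stays a zombie */
    if (n != (ssize_t)sizeof reported)
        return -1;
    return reported == pid;
}
```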

exec() Family - Replacing Process Image

int execve(const char *pathname, char *const argv[], char *const envp[]);  // The fundamental system call
execl, execle, execlp, execv, execvp, execvpe  // Library wrappers around execve() (execvpe is a GNU extension)

exec() replaces the current process image with a new program:

                    BEFORE exec()                     AFTER exec()
              ┌─────────────────────┐            ┌─────────────────────┐
              │      CHILD          │            │      CHILD          │
              │    PID: 1001        │            │    PID: 1001        │ (same PID!)
              │                     │            │                     │
              │   ┌─────────────┐   │            │   ┌─────────────┐   │
              │   │ Parent Code │   │            │   │  NEW CODE   │   │
              │   │   (copy)    │   │  ────────▶ │   │  /bin/ls    │   │
              │   │             │   │  execve()  │   │             │   │
              │   └─────────────┘   │            │   └─────────────┘   │
              │                     │            │                     │
              └─────────────────────┘            └─────────────────────┘

                                    exec() does NOT return on success
                                    (the old code is gone!); it returns -1 only on failure

Key insight: exec() doesn't create a new process; it transforms the existing one, which keeps its PID.
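The usual fork-then-exec pattern, sketched below under the assumption of a POSIX system, shows both outcomes of execve(): on success the child's code is replaced and its exit code comes from the new program; on failure execve() returns and the child must _exit() itself. `run_program` is a hypothetical helper, and exit code 127 mirrors the shell's "command not found" convention rather than anything execve() requires.

```c
#define _XOPEN_SOURCE 700
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;   /* the current environment, passed on unchanged */

/* Hypothetical helper: run `path` with `argv` in a child process.
 * Returns the program's exit code, 127 if execve() failed,
 * or -1 if fork()/waitpid() failed or the child did not exit normally. */
int run_program(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execve(path, argv, environ);
        /* Reached only if execve() failed: the old program image is still here. */
        _exit(127);
    }

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

execve() does not search PATH, so the caller must pass a full path such as "/bin/sh"; the execlp()/execvp() wrappers are the ones that search PATH.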

wait() Family - Reaping Child Processes

#include <sys/wait.h>

pid_t waitpid(pid_t pid, int *status, int options);
pid_t wait(int *status);  // Equivalent to waitpid(-1, status, 0)

wait() serves two purposes:

  1. Synchronization: Parent blocks until child terminates (or changes state)
  2. Cleanup: The kernel removes the child's process table entry (reaps the zombie)

// Status examination macros (applied to the int that waitpid() filled in)
WIFEXITED(status)     // True if child exited normally (exit() or return from main)
WEXITSTATUS(status)   // Low 8 bits of the exit code (only valid if WIFEXITED is true)
WIFSIGNALED(status)   // True if child was terminated by a signal
WTERMSIG(status)      // Signal number that killed the child (only if WIFSIGNALED)
WIFSTOPPED(status)    // True if child is stopped (reported only with WUNTRACED)
WSTOPSIG(status)      // Signal that stopped the child (only if WIFSTOPPED)
WIFCONTINUED(status)  // True if child was resumed by SIGCONT (reported only with WCONTINUED)

Options:

  • WNOHANG: Return 0 immediately if no child has changed state yet (non-blocking)
  • WUNTRACED: Also report children that have stopped
  • WCONTINUED: Also report stopped children that have been resumed by SIGCONT
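For the sandbox's event log, the status macros and options above can be combined into one decoder. This is a sketch with hypothetical helpers `describe_status` and `run_and_describe`; the second one polls with WNOHANG, treating a return value of 0 as "child still running", instead of blocking in waitpid().

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: turn a waitpid() status into a short log string. */
void describe_status(int status, char *buf, size_t len)
{
    if (WIFEXITED(status))
        snprintf(buf, len, "exited %d", WEXITSTATUS(status));
    else if (WIFSIGNALED(status))
        snprintf(buf, len, "killed by signal %d", WTERMSIG(status));
    else if (WIFSTOPPED(status))
        snprintf(buf, len, "stopped by signal %d", WSTOPSIG(status));
    else if (WIFCONTINUED(status))
        snprintf(buf, len, "continued");
    else
        snprintf(buf, len, "unknown status 0x%x", (unsigned)status);
}

/* Hypothetical helper: start a child that exits with `code` (code >= 0) or
 * raises signal -code (code < 0), poll for it with WNOHANG, then describe it.
 * Returns 0 on success, -1 on error. */
int run_and_describe(int code, char *buf, size_t len)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (code < 0)
            raise(-code);               /* e.g. -SIGKILL: die by signal */
        _exit(code & 0xff);
    }

    struct timespec tick = {0, 1000000}; /* 1 ms between polls */
    int status;
    pid_t r;
    /* waitpid() returns 0 while the child exists but has not changed state. */
    while ((r = waitpid(pid, &status, WNOHANG)) == 0)
        nanosleep(&tick, NULL);         /* a real harness would do other work here */
    if (r < 0)
        return -1;
    describe_status(status, buf, len);
    return 0;
}
```

For example, `run_and_describe(3, buf, sizeof buf)` leaves "exited 3" in `buf`, and `run_and_describe(-SIGKILL, buf, sizeof buf)` leaves "killed by signal 9".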

2.4 Zombies and Orphans

Zombies

A zombie is a terminated process that hasn't been reaped by its parent:

                           ZOMBIE PROCESS LIFECYCLE

    Parent                                              Child
       │                                                  │
       │  fork()                                          │
       ├──────────────────────────────────────────────────▶
       │                                                  │
       │                                                  │ (runs...)
       │                                                  │
       │                                                  │ exit(0)
       │                                                  │
       │                                   ┌──────────────────────────────┐
       │                                   │      ZOMBIE STATE            │
       │                                   │  - exit code saved           │
       │                                   │  - most resources freed      │
       │                                   │  - PID still reserved        │
       │                                   │  - entry in process table    │
       │                                   └──────────────────────────────┘
       │                                                  │
       │  (doing other work, not calling wait)            │
       │                                                  │ (stuck as zombie!)
       │                                                  │
       │  wait(&status)                                   │
       ├──────────────────────────────────────────────────▶
       │                                   ┌──────────────────────────────┐
       │                                   │     FULLY REAPED             │
       │                                   │  - PID can be reused         │
       │                                   │  - process table entry freed │
       │                                   └──────────────────────────────┘

Why zombies are bad:

  • Each zombie consumes a process table entry (limited kernel resource)
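  • PIDs are finite (on Linux the ceiling is set by /proc/sys/kernel/pid_max, whose kernel default is 32768)
  • Long-running servers that never reap their children can exhaust these resources, after which fork() fails with EAGAIN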
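A zombie can be created and inspected deliberately. This sketch uses a hypothetical helper, `zombie_then_reap`, and assumes POSIX waitid(). It passes WNOWAIT to learn that the child has terminated without reaping it. Between the two wait calls the child is a genuine zombie (on Linux, ps would list it as <defunct>). After waitpid() reaps it, there is nothing left to wait for.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: create a zombie on purpose, confirm it is still
 * waitable, reap it, then confirm it is gone.
 * Returns 0 if every step behaved as described, -1 otherwise. */
int zombie_then_reap(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(5);                               /* child terminates at once */

    /* Wait for termination WITHOUT reaping: WNOWAIT leaves the child a zombie. */
    siginfo_t info;
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) < 0)
        return -1;
    if (info.si_pid != pid || info.si_code != CLD_EXITED || info.si_status != 5)
        return -1;

    /* The zombie still holds its PID and exit code, so they can be collected. */
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status) || WEXITSTATUS(status) != 5)
        return -1;

    /* Reaped: the process-table entry is gone, so there is nothing to wait for. */
    if (waitpid(pid, &status, WNOHANG) != -1 || errno != ECHILD)
        return -1;
    return 0;
}
```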

Orphans

An orphan is a child whose parent has terminated:

    Parent (PID 1000)                    Child (PID 1001)
         │                                     │
         │  fork()                             │
         ├────────────────────────────────────▶│
         │                                     │
         │  exit(0)                            │ (still running)
         │                                     │
    [Parent dies]                              │
                                               │
                ┌──────────────────────────────┤
                │         ORPHAN               │
                │   PPID changes to 1 (init)   │
                │   init will reap when done   │
                └──────────────────────────────┤
                                               │
                                               │ exit(0)
                                               │
                          [init reaps child - no zombie leak]

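Key insight: Orphans are automatically adopted by init (PID 1) or, on Linux, by the nearest ancestor registered as a child subreaper, and the adopter reaps them when they exit. This is why orphans don't cause zombie leaks. Intentionally orphaning children is still bad practice, because the original parent can no longer learn how they exited.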
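Reparenting can be observed with a grandchild whose parent exits first. This is a minimal sketch with a hypothetical helper, `orphan_adopter`. The reported adopter is often PID 1, but on Linux it may instead be a process registered as a child subreaper or the init process of a container's PID namespace, so the code only checks that the parent changed.

```c
#define _XOPEN_SOURCE 700
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: this process -> middle child -> grandchild.
 * The middle child exits at once, orphaning the grandchild, which waits
 * for getppid() to change and reports {old parent, new parent} via a pipe.
 * Returns 0 on success, -1 on error. */
int orphan_adopter(pid_t *old_parent, pid_t *adopter)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    pid_t middle = fork();
    if (middle < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (middle == 0) {                              /* middle child */
        pid_t me = getpid();
        pid_t grandchild = fork();
        if (grandchild == 0) {                      /* grandchild */
            struct timespec tick = {0, 1000000};    /* 1 ms */
            for (int i = 0; i < 5000 && getppid() == me; i++)
                nanosleep(&tick, NULL);             /* until the middle child is gone */
            pid_t report[2] = { me, getppid() };
            close(fds[0]);
            if (write(fds[1], report, sizeof report) != (ssize_t)sizeof report)
                _exit(1);
            _exit(0);                               /* the adopter reaps us */
        }
        _exit(grandchild < 0 ? 1 : 0);              /* exit now: orphan the grandchild */
    }

    close(fds[1]);
    waitpid(middle, NULL, 0);                       /* reap the middle child */
    pid_t report[2];
    ssize_t n = read(fds[0], report, sizeof report);
    close(fds[0]);
    if (n != (ssize_t)sizeof report)
        return -1;
    *old_parent = report[0];
    *adopter = report[1];
    return 0;
}
```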

2.5 Signal Fundamentals

Signals are software interrupts that notify a process of an event:

                                SIGNAL DELIVERY

  KERNEL
  ┌──────────────────────────────────────────────────────────────────┐
  │ Signal sources:                                                  │
  │  - Hardware exceptions (SIGSEGV, SIGFPE, SIGBUS)                 │
  │  - Terminal input (Ctrl-C → SIGINT, Ctrl-Z → SIGTSTP)            │
  │  - kill() system call from another process                       │
  │  - Timer expiration (SIGALRM)                                    │
  │  - Child state change (SIGCHLD)                                  │
  └────────────────────────────────┬─────────────────────────────────┘
                                   ▼
  ┌──────────────────────────────────────────────────────────────────┐
  │ PENDING SIGNALS (per-process bitmask)                            │
  │                                                                  │
  │   Signal:   1   2   3   4   5   6   7   8   9   ... 31           │
  │           ┌───┬───┬───┬───┬───┬───┬───┬───┬───┬─────┐            │
  │  Pending: │ 0 │ 1 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ 0 │ ... │            │
  │           └───┴───┴───┴───┴───┴───┴───┴───┴───┴─────┘            │
  │                 ▲                                                │
  │                 └── SIGINT (2) pending                           │
  └────────────────────────────────┬─────────────────────────────────┘
                                   │ Delivered when:
                                   │ 1. Signal not blocked
                                   │ 2. Process scheduled to run
                                   ▼
  USER PROCESS
  ┌──────────────────────────────────────────────────────────────────┐
  │ SIGNAL HANDLER TABLE                                             │
  │                                                                  │
  │   Signal   │ Handler                                             │
  │   ─────────┼──────────────────────────────────────               │
  │   SIGINT   │ my_sigint_handler() or SIG_DFL or SIG_IGN           │
  │   SIGCHLD  │ my_sigchld_handler()                                │
  │   SIGTERM  │ SIG_DFL (terminate)                                 │
  │   ...      │ ...                                                 │
  └──────────────────────────────────────────────────────────────────┘

Common Signals

  Signal    Default action    Description
  ------    --------------    -----------
  SIGINT    Terminate         Interrupt from keyboard (Ctrl-C)
  SIGTERM   Terminate         Termination request
  SIGKILL   Terminate         Kill (cannot be caught or ignored)
  SIGSTOP   Stop              Stop process (cannot be caught or ignored)
  SIGTSTP   Stop              Stop from keyboard (Ctrl-Z)
  SIGCONT   Continue          Continue if stopped
  SIGCHLD   Ignore            Child stopped or terminated
  SIGSEGV   Terminate + core  Invalid memory reference
  SIGALRM   Terminate         Timer expired
  SIGPIPE   Terminate         Write to pipe with no reader
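The "cannot be caught" entries in the table can be checked directly. This sketch uses hypothetical helpers `install_handler` and `catchability_demo`. It installs a handler with sigaction() for SIGUSR1, a user-defined signal not listed in the table whose default action is also to terminate. The same call for SIGKILL is refused with EINVAL.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t got_usr1 = 0;

/* Keep the handler minimal: just record that the signal arrived. */
static void on_usr1(int sig)
{
    (void)sig;
    got_usr1 = 1;
}

/* Hypothetical helper: install `fn` for `sig` with sigaction().
 * Returns 0 on success, -1 with errno set on failure. */
int install_handler(int sig, void (*fn)(int))
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = fn;
    sigemptyset(&sa.sa_mask);       /* block nothing extra while the handler runs */
    sa.sa_flags = SA_RESTART;       /* restart slow system calls the handler interrupts */
    return sigaction(sig, &sa, NULL);
}

/* Hypothetical self-check: SIGUSR1 can be caught, SIGKILL cannot.
 * Returns 0 if both behave as the table says, -1 otherwise. */
int catchability_demo(void)
{
    if (install_handler(SIGUSR1, on_usr1) < 0)
        return -1;
    raise(SIGUSR1);                 /* the handler runs before raise() returns */
    if (!got_usr1)
        return -1;

    /* The kernel refuses handlers for SIGKILL (and SIGSTOP). */
    if (install_handler(SIGKILL, on_usr1) == 0 || errno != EINVAL)
        return -1;
    return 0;
}
```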

2.6 Signal Delivery and Handling

Signal States

                         SIGNAL STATE DIAGRAM

    kill()/raise(),
    hardware trap
         │
         ▼
   ┌────────────┐          ┌────────────┐   process scheduled,  ┌────────────┐
   │   Kernel   │─────────▶│   Kernel   │──────────────────────▶│    User    │
   │  records   │          │  pending   │   signal unblocked    │  handler   │
   │   signal   │          │  bit set   │                       │    runs    │
   └────────────┘          └────────────┘                       └────────────┘
        SENT                  PENDING                             DELIVERED

   NOTE: Signal is pending but NOT delivered if:
   1. Signal is blocked (in signal mask)
   2. Process is not running (not scheduled)

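Critical insight: The kernel acts on a pending, unblocked signal when it is about to return control to the process in user mode (for example, after a system call, an interrupt, or a context switch), so from the program's point of view a signal arrives between instructions, never in the middle of one. When a signal with an installed handler is delivered, the kernel:

  1. Saves the current execution context (registers and signal mask)
  2. Pushes a frame holding that context onto the user stack
  3. Jumps to the signal handler
  4. When the handler returns, restores the saved context (on Linux via the sigreturn system call) and resumes the interrupted code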
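One visible consequence of delivery is that a slow system call interrupted by a handler may return early. The sketch below assumes a POSIX system; `interrupted_read` is a hypothetical name. It blocks in read() on an empty pipe and lets SIGALRM arrive about a second later. Because SA_RESTART is deliberately not set, read() fails with EINTR after the handler returns instead of being resumed.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t alarm_fired = 0;

static void on_alarm(int sig)
{
    (void)sig;
    alarm_fired = 1;
}

/* Hypothetical demo: block in read() on an empty pipe until SIGALRM arrives.
 * Without SA_RESTART, read() is not resumed after the handler returns and
 * fails with EINTR instead. Returns read()'s errno, or -1 on setup error. */
int interrupted_read(void)
{
    int fds[2];
    char c;
    struct sigaction sa;

    if (pipe(fds) < 0)
        return -1;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                    /* deliberately no SA_RESTART */
    if (sigaction(SIGALRM, &sa, NULL) < 0)
        return -1;

    alarm(1);                           /* SIGALRM in about one second */
    ssize_t n = read(fds[0], &c, 1);    /* blocks: nobody ever writes */
    int err = (n < 0) ? errno : 0;

    close(fds[0]);
    close(fds[1]);
    return alarm_fired ? err : -1;
}
```

With SA_RESTART set instead, the kernel would transparently restart read() after the handler and the call would keep blocking; a signal-aware harness has to pick one behavior deliberately.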

2.7 Signal Masks and Blocking

Every process (strictly, every thread) has a signal mask, the set of signals currently blocked from delivery:

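#include <signal.h>

sigset_t mask, prev_mask;

// Initialize an empty set
sigemptyset(&mask);

// Add SIGCHLD to the set
sigaddset(&mask, SIGCHLD);

// Block SIGCHLD (add it to the process's signal mask), saving the old mask.
// (In a multithreaded program, use pthread_sigmask() instead.)
sigprocmask(SIG_BLOCK, &mask, &prev_mask);

// ... critical section - SIGCHLD won't be delivered here; if a child
//     exits now, SIGCHLD simply stays pending ...

// Restore previous mask (unblocks SIGCHLD; a pending SIGCHLD is delivered now)
sigprocmask(SIG_SETMASK, &prev_mask, NULL);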

                    SIGNAL BLOCKING MECHANISM

    Process Signal Mask (what's blocked), Linux x86 numbering:
      1   2   3   ...  16  17  18   ...
    ┌───┬───┬───┬─────┬───┬───┬───┬─────┐
    │ 0 │ 0 │ 0 │ ... │ 0 │ 1 │ 0 │ ... │  (SIGCHLD blocked)
    └───┴───┴───┴─────┴───┴───┴───┴─────┘
          ▲
          │ SIG_BLOCK adds to mask
          │ SIG_UNBLOCK removes from mask
          │ SIG_SETMASK replaces mask

    Pending Signals:
    ┌───┬───┬───┬─────┬───┬───┬───┬─────┐
    │ 0 │ 0 │ 0 │ ... │ 0 │ 1 │ 0 │ ... │  (SIGCHLD pending)
    └───┴───┴───┴─────┴───┴───┴───┴─────┘

    Deliverable = Pending AND NOT Blocked
    ┌───┬───┬───┬─────┬───┬───┬───┬─────┐
    │ 0 │ 0 │ 0 │ ... │ 0 │ 0 │ 0 │ ... │  (nothing deliverable now)
    └───┴───┴───┴─────┴───┴───┴───┴─────┘

    When the mask is cleared, SIGCHLD will be delivered

2.8 Async-Signal-Safe Functions

The critical safety rule: signal handlers can interrupt the main program at any point. If a handler calls a non-reentrant function that the main program was in the middle of (for example, both are inside malloc()), the shared internal state of that function can be corrupted.

                    THE ASYNC-SIGNAL-SAFETY PROBLEM

    Main program:                           Signal handler:
         │                                        │
         │  malloc() {                            │
         │    // modifying internal               │
         │    // data structures...               │
         │         │                              │
         │         │ ◀──── SIGNAL DELIVERED ──────┤
         │         │                              │
         │         │                     malloc() {
         │         │                       // CORRUPTS the same
         │         │                       // data structures!
         │         │                     }
         │         │ ◀──── HANDLER RETURNS ───────┤
         │         │                              │
         │    // continues with                   │
         │    // corrupted state!                 │
         │  }                                     │

Async-signal-safe functions are functions that can be safely called from signal handlers. The POSIX standard defines a specific list:

Safe to call in handlers:

  • _exit() (NOT exit())
  • write() (NOT printf())
  • signal(), sigaction(), sigprocmask()
  • waitpid()
  • fork(), execve(), kill()
  • open(), close(), read()
  • Many other thin system-call wrappers (check the full list in the signal-safety(7) man page on Linux before relying on one)

NOT safe (never call from handlers):

  • printf(), fprintf() - use buffered stdio state that may be mid-update
  • malloc(), free() - modify shared heap metadata and may hold internal locks
  • exit() - runs atexit handlers and flushes stdio buffers
  • Most other standard library functions

Writing to stdout safely from a handler:

// WRONG:
void handler(int sig) {
    printf("Caught signal %d\n", sig);  // NOT async-signal-safe!
}

// RIGHT:
void handler(int sig) {
    const char msg[] = "Caught signal\n";
    write(STDOUT_FILENO, msg, sizeof(msg) - 1);  // Safe!
}

2.9 Race Conditions in Signal Handling

Race conditions occur when the correctness of a program depends on the timing of events:

                    THE CLASSIC SIGCHLD RACE

    Scenario: Parent forks a child and adds it to a job list

    INCORRECT (has race):

    Parent:                                 Child:
         │                                     │
         │  pid = fork()                       │
         │       │                             │
         │       ├────────────────────────────▶│
         │       │                             │
         │       │                             │  (child runs)
         │       │                             │  (child exits IMMEDIATELY)
         │       │                             │
         │       │ ◀── SIGCHLD DELIVERED ──────┤
         │       │                             │
         │  SIGCHLD handler runs:              │
         │    deletejob(pid)  // JOB NOT FOUND!│
         │                    // because addjob│
         │                    // hasn't run yet│
         │       │                             │
         │  addjob(pid)  // job for a child    │
         │               // already reaped:    │
         │               // never deleted!     │
         │                                     │

    CORRECT (race avoided):

    Parent:                                 Child:
         │                                     │
         │  sigprocmask(SIG_BLOCK, SIGCHLD)    │
         │       │                             │
         │  pid = fork()                       │
         │       │                             │
         │       ├────────────────────────────▶│
         │       │                             │
         │  addjob(pid)  // SAFE - SIGCHLD     │  (child may exit)
         │               // is blocked!        │
         │       │                             │
         │  sigprocmask(SIG_UNBLOCK, SIGCHLD)  │
         │       │                             │
         │       │ ◀── SIGCHLD NOW DELIVERED ──┤
         │       │                             │
         │  SIGCHLD handler runs:              │
         │    deletejob(pid)  // WORKS - job   │
         │                    // was added!    │

2.10 Reentrant Functions

A function is reentrant if it can be safely interrupted and called again before the first invocation completes:

                    REENTRANCY ILLUSTRATED

    Non-reentrant function (global state):

    int global_counter = 0;

    int increment(void) {
        int temp = global_counter;   // ◀── Signal here: a handler that
        temp = temp + 1;             //     calls increment() updates
        global_counter = temp;       //     global_counter, then this store
        return global_counter;       //     overwrites it (a lost update)
    }

    Reentrant version (local state only):

    int increment(int *counter) {
        return ++(*counter);         // No hidden shared state; each caller
    }                                // passes (and owns) its own counter

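Making handlers safe:

  1. Use only async-signal-safe functions
  2. Save errno on entry and restore it before returning
  3. Block signals while the main program touches global data that a handler also uses
  4. Keep handlers simple - set a flag and return
  5. Declare flags shared with the main program as volatile sig_atomic_t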

2.11 setjmp/longjmp for Non-Local Control

setjmp and longjmp provide a way to jump directly from one function to another, bypassing normal call/return:

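#include <setjmp.h>
#include <signal.h>
#include <stdio.h>

sigjmp_buf buf;

void handler(int sig) {
    (void)sig;
    siglongjmp(buf, 1);  // Jump back to sigsetjmp, which now returns 1
}

int main(void) {
    // sigsetjmp(buf, 1) also saves the signal mask, so siglongjmp out of
    // the handler restores it (otherwise SIGINT could stay blocked)
    if (sigsetjmp(buf, 1) == 0) {
        // First time through (normal flow). Install the handler only after
        // buf is initialized, so a jump can never target an unset buffer.
        signal(SIGINT, handler);
        while (1) {
            // Do work...
        }
    } else {
        // Got here via siglongjmp from handler
        printf("Interrupted, cleaning up\n");
    }
    return 0;
}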

                    SETJMP/LONGJMP CONTROL FLOW

    ┌──────────────────────────────────────────────────────────────────┐
    │   main()                                                         │
    │        │                                                         │
    │        │  setjmp(buf) saves context                              │
    │        │       │                                                 │
    │        │       ▼                                                 │
    │   ┌────┴────────────────────────────────────────────────────┐    │
    │   │  SAVED CONTEXT:                                         │    │
    │   │  - Stack pointer                                        │    │
    │   │  - Program counter                                      │    │
    │   │  - Callee-saved registers                               │    │
    │   │  - Signal mask (if sigsetjmp)                           │    │
    │   └────┬────────────────────────────────────────────────────┘    │
    │        │                                                         │
    │        │  returns 0 (first time)                                 │
    │        │                                                         │
    │        │  ... program runs, calls other functions ...           │
    │        │                                                         │
    │        │  deep_function()                                        │
    │        │        │                                                │
    │        │        │ ◀── SIGINT ───                                 │
    │        │        │                                                │
    │        │  handler() {                                            │
    │        │    longjmp(buf, 1);  ──────────────────────────┐        │
    │        │  }                                             │        │
    │        │                                                │        │
    │        │        │ (stack frames unwound)                │        │
    │        │        │ (destructors NOT called!)             │        │
    │        │        │                                       │        │
    │        │◀───────┴───────────────────────────────────────┘        │
    │        │                                                         │
    │        │  setjmp returns 1 (second time)                         │
    │        │                                                         │
    └────────┴─────────────────────────────────────────────────────────┘

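Dangers of longjmp:

  1. Stack frames are discarded, so destructors and other cleanup code are not run
  2. Data structures that were mid-update can be left in an inconsistent state
  3. Plain setjmp/longjmp may not save and restore the signal mask; use sigsetjmp(buf, 1) and siglongjmp when jumping out of a handler
  4. Calling longjmp after the function that called setjmp has returned is undefined behavior
  5. Non-volatile local variables of the function that called setjmp, if modified between setjmp and longjmp, have indeterminate values afterwards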

2.12 Proper Error Handling with Signals

Signals can interrupt slow system calls (for example, read() on a pipe or terminal). When this happens, the call either:

  1. Fails with errno = EINTR, or
  2. Is restarted automatically, if the handler was installed with the SA_RESTART flag (some calls, such as nanosleep() or select() on Linux, are never restarted)

// Handling EINTR properly:
#include <errno.h>
#include <unistd.h>

ssize_t safe_read(int fd, void *buf, size_t count) {
    ssize_t n;
    while ((n = read(fd, buf, count)) < 0) {
        if (errno == EINTR)
            continue;  // Interrupted - retry
        else
            return -1;  // Real error
    }
    return n;
}

// Or use SA_RESTART when installing handler:
struct sigaction sa;
sa.sa_handler = handler;
sigemptyset(&sa.sa_mask);
sa.sa_flags = SA_RESTART;  // Automatically restart interrupted calls
sigaction(SIGCHLD, &sa, NULL);

3. Project Specification

3.1 What You Will Build

A process sandbox harness that:

  1. Spawns child processes in controlled modes
  2. Monitors and logs all exceptional control flow events
  3. Demonstrates zombies, orphans, and signal races
  4. Provides tools for understanding and preventing these issues

3.2 Functional Requirements

Core Features

  1. Process Spawner (sandbox spawn):
    • Normal exit mode (child exits cleanly)
    • Crash mode (child triggers SIGSEGV, SIGFPE, etc.)
    • Stop/continue mode (child responds to SIGSTOP/SIGCONT)
    • Timeout mode (child killed after timeout)
    • Zombie demonstration mode
    • Orphan demonstration mode
  2. Event Logger:
    • Timestamp all events to microsecond precision
    • Log: fork, exec, signal sent, signal delivered, state changes, wait results
    • Output machine-parseable format (for analysis)
  3. Signal Handler Laboratory (sandbox signals):
    • Demonstrate async-signal-safety violations
    • Show race condition scenarios
    • Illustrate proper vs improper signal blocking
  4. Race Condition Demonstrator (sandbox race):
    • Show the SIGCHLD/addjob race
    • Demonstrate how to fix it
    • Provide before/after comparison

3.3 Example Usage and Output

$ ./sandbox spawn --mode=normal
=== PROCESS SANDBOX: Normal Exit Mode ===

[00:00:00.000001] PARENT (PID 1000): Starting sandbox
[00:00:00.000045] PARENT (PID 1000): Forking child
[00:00:00.000089] PARENT (PID 1000): fork() returned 1001
[00:00:00.000091] CHILD  (PID 1001): Started (PPID=1000)
[00:00:00.000095] CHILD  (PID 1001): Executing /bin/echo "Hello, World!"
[00:00:00.000234] PARENT (PID 1000): exec confirmed (close-on-exec pipe closed)
Hello, World!
[00:00:00.001234] PARENT (PID 1000): Received SIGCHLD
[00:00:00.001256] PARENT (PID 1000): waitpid() returned 1001
[00:00:00.001258] PARENT (PID 1000): Child status: exited normally, code=0

Timeline:
  fork ──▶ exec ──▶ run ──▶ exit ──▶ SIGCHLD ──▶ reap

Summary: Child executed normally, no zombies created.

$ ./sandbox spawn --mode=zombie --delay=5
=== PROCESS SANDBOX: Zombie Demonstration ===

[00:00:00.000001] PARENT (PID 1000): Starting zombie demo
[00:00:00.000045] PARENT (PID 1000): Forking child
[00:00:00.000089] CHILD  (PID 1001): Started
[00:00:00.000091] CHILD  (PID 1001): Exiting immediately
[00:00:00.000095] CHILD  (PID 1001): _exit(42)
[00:00:00.000100] PARENT (PID 1000): NOT calling wait() for 5 seconds...

[Process table during delay:]
$ ps aux | grep sandbox
user  1000  ... sandbox spawn --mode=zombie
user  1001  ... [sandbox] <defunct>    ◀── ZOMBIE!

[00:00:05.000100] PARENT (PID 1000): Now calling wait()
[00:00:05.000123] PARENT (PID 1000): waitpid() returned 1001, exit code=42
[00:00:05.000125] PARENT (PID 1000): Zombie reaped

Explanation:
  Between child exit and parent wait(), the child was a ZOMBIE.
  - PID 1001 was reserved (couldn't be reused)
  - Process table entry consumed resources
  - This would be a bug in a long-running server!

Prevention:
  1. Always call wait()/waitpid() promptly
  2. Use SIGCHLD handler to reap asynchronously
  3. Use double-fork to orphan children (let init reap)

$ ./sandbox race --demonstrate
=== SIGNAL RACE CONDITION DEMONSTRATION ===

PART 1: The Race (buggy code)
[00:00:00.000001] Installing SIGCHLD handler (buggy version)
[00:00:00.000010] Forking child...
[00:00:00.000050] fork() returned 1001
[00:00:00.000051] Handler triggered: deletejob(1001)
[00:00:00.000052] ERROR: Job 1001 not in list! (not added yet)
[00:00:00.000053] addjob(1001)   ◀── Too late! Handler already ran

PROBLEM: The race occurs because:
  1. fork() returns in parent
  2. Child runs and exits BEFORE parent reaches addjob()
  3. SIGCHLD handler reaps the child and tries to delete a job that does not exist yet
  4. Parent then adds a job that will never be removed (its child is already reaped)

PART 2: The Fix (correct code)
[00:00:01.000001] Installing SIGCHLD handler (correct version)
[00:00:01.000010] Blocking SIGCHLD before fork...
[00:00:01.000015] Forking child...
[00:00:01.000050] fork() returned 1002
[00:00:01.000055] addjob(1002)   ◀── Safe! SIGCHLD blocked
[00:00:01.000060] Unblocking SIGCHLD...
[00:00:01.000062] Handler triggered: deletejob(1002)
[00:00:01.000063] Successfully removed job 1002

SOLUTION: Block SIGCHLD around fork()/addjob() to ensure
          the job is added BEFORE the handler can run.

$ ./sandbox signals --test-safety
=== ASYNC-SIGNAL-SAFETY TEST ===

Test 1: Calling printf() from handler (UNSAFE)
  [Running 1000 iterations with concurrent signals...]
  Result: Crashed after 234 iterations (corrupted stdio buffer)

Test 2: Calling write() from handler (SAFE)
  [Running 1000 iterations with concurrent signals...]
  Result: All 1000 iterations completed successfully

Test 3: Calling malloc() from handler (UNSAFE)
  [Running 1000 iterations with concurrent signals...]
  Result: Hung after 567 iterations (malloc lock deadlock)

CONCLUSION:
  Async-signal-safety is NOT optional. Use only safe functions in handlers.

3.4 Non-Functional Requirements

  • Determinism: Same sequence of events for same inputs (where possible)
  • Portability: Linux x86-64 primary target
  • Educational: Clear explanations, not just outcomes
  • Robustness: No zombie leaks from the sandbox itself

4. Solution Architecture

4.1 High-Level Design

┌──────────────────────────────────────────────────────────────────────────────────┐
│                              PROCESS SANDBOX                                     │
├──────────────────────────────────────────────────────────────────────────────────┤
│                                                                                  │
│  ┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐                │
│  │   CLI Parser     │  │  Event Logger    │  │  Signal Manager  │                │
│  │                  │  │                  │  │                  │                │
│  │  --mode=zombie   │  │  Timestamps      │  │  Handler setup   │                │
│  │  --delay=5       │  │  Event types     │  │  Mask control    │                │
│  │  --verbose       │  │  PID tracking    │  │  Safe I/O        │                │
│  └────────┬─────────┘  └────────┬─────────┘  └────────┬─────────┘                │
│           │                     │                     │                          │
│           └─────────────────────┴─────────────────────┘                          │
│                                 │                                                │
│                                 ▼                                                │
│  ┌───────────────────────────────────────────────────────────────────────────┐   │
│  │                             MODE DISPATCHER                               │   │
│  │                                                                           │   │
│  │   spawn_normal()  spawn_crash()  spawn_zombie()  spawn_race_demo()        │   │
│  │        │               │              │                │                  │   │
│  └────────┼───────────────┼──────────────┼────────────────┼──────────────────┘   │
│           │               │              │                │                      │
│           ▼               ▼              ▼                ▼                      │
│  ┌───────────────────────────────────────────────────────────────────────────┐   │
│  │                       PROCESS CONTROLLER                                  │   │
│  │                                                                           │   │
│  │   ┌────────────┐   ┌────────────┐   ┌────────────┐   ┌────────────┐       │   │
│  │   │   fork()   │──▶│   exec()   │──▶│   wait()   │──▶│  report()  │       │   │
│  │   └────────────┘   └────────────┘   └────────────┘   └────────────┘       │   │
│  │                                                                           │   │
│  └───────────────────────────────────────────────────────────────────────────┘   │
│                                                                                  │
│  ┌───────────────────────────────────────────────────────────────────────────┐   │
│  │                        REPORT GENERATOR                                   │   │
│  │                                                                           │   │
│  │   Timeline visualization   │   Explanation text   │   Prevention tips     │   │
│  └───────────────────────────────────────────────────────────────────────────┘   │
│                                                                                  │
└──────────────────────────────────────────────────────────────────────────────────┘

4.2 Key Components

Component             Responsibility                                   Key Files
CLI Parser            Parse command-line options, dispatch to modes    main.c, cli.c
Event Logger          Thread-safe, signal-safe event recording         logger.c
Signal Manager        Install handlers, manage masks, safe I/O         signals.c
Process Controller    fork/exec/wait, child modes                      process.c
Mode Implementations  Specific demo scenarios                          modes/*.c
Report Generator      Format output, explanations                      report.c

4.3 Data Structures

#include <signal.h>      /* sig_atomic_t */
#include <sys/types.h>   /* pid_t */
#include <time.h>        /* struct timespec */

#define MAX_EVENTS 1024  /* ring-buffer capacity (chosen here; any fixed size works) */

/* Event types for logging */
typedef enum {
    EVENT_FORK,
    EVENT_EXEC,
    EVENT_EXIT,
    EVENT_SIGNAL_SENT,
    EVENT_SIGNAL_RECEIVED,
    EVENT_WAIT_STARTED,
    EVENT_WAIT_RETURNED,
    EVENT_STATE_CHANGE,
    EVENT_ERROR
} event_type_t;

/* Single logged event */
typedef struct {
    struct timespec timestamp;
    event_type_t type;
    pid_t pid;
    pid_t related_pid;      /* For fork: child PID; for signal: sender */
    int signal_num;         /* For signal events */
    int status;             /* For wait events */
    char message[256];      /* Human-readable description */
} event_t;

/* Event log: a fixed-size ring buffer, so appending needs no malloc().
   head/count stay consistent only if the main program blocks signals
   while it appends (a handler could otherwise hit a half-written slot). */
typedef struct {
    event_t events[MAX_EVENTS];
    volatile sig_atomic_t head;     /* next slot to write */
    volatile sig_atomic_t count;    /* valid entries, <= MAX_EVENTS */
} event_log_t;

/* Job list entry (for race condition demos) */
typedef struct job {
    pid_t pid;
    int state;              /* RUNNING, STOPPED, DONE */
    char cmdline[256];
    struct job *next;
} job_t;

/* Sandbox configuration */
typedef struct {
    int mode;               /* Normal, crash, zombie, etc. */
    int delay_seconds;      /* For zombie demo */
    int verbose;            /* Detailed output */
    int demonstrate_race;   /* Show race condition */
    char *child_command;    /* What to exec in child */
} sandbox_config_t;

4.4 Signal Handler Design

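#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/* Signal-safe write wrapper (write() is async-signal-safe; strlen() only
   reads the caller's buffer) */
static void safe_write(const char *msg) {
    ssize_t n = write(STDOUT_FILENO, msg, strlen(msg));
    (void)n;  /* nothing useful to do on failure inside a handler */
}

/* Signal-safe integer to string: digits are built on the stack with no
   stdio or locale. buf must hold at least 12 bytes ("-2147483648" + NUL). */
static void itoa_safe(int n, char *buf) {
    char tmp[12];
    int i = 0, j = 0;
    unsigned int u = (n < 0) ? 0u - (unsigned int)n : (unsigned int)n;

    do {
        tmp[i++] = (char)('0' + u % 10);   /* least significant digit first */
        u /= 10;
    } while (u > 0);
    if (n < 0)
        buf[j++] = '-';
    while (i > 0)
        buf[j++] = tmp[--i];               /* reverse into the output */
    buf[j] = '\0';
}

/* Async-signal-safe SIGCHLD handler */
static volatile sig_atomic_t got_sigchld = 0;
static volatile sig_atomic_t child_pid = 0;     /* most recent child only */
static volatile sig_atomic_t child_status = 0;

void sigchld_handler(int sig) {
    int saved_errno = errno;  /* Save errno */
    pid_t pid;
    int status;
    char num[12];
    (void)sig;

    /* Reap ALL available children (signals can coalesce). With WUNTRACED,
       stopped children are reported too, but they are NOT reaped. */
    while ((pid = waitpid(-1, &status, WNOHANG | WUNTRACED)) > 0) {
        /* Use only signal-safe operations */
        child_pid = pid;
        child_status = status;
        got_sigchld = 1;

        /* Log using write(), not printf() */
        safe_write(WIFSTOPPED(status) ? "[HANDLER] Child stopped: " : "[HANDLER] Reaped child: ");
        itoa_safe((int)pid, num);
        safe_write(num);
        safe_write("\n");
    }

    errno = saved_errno;  /* Restore errno */
}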

5. Implementation Guide

5.1 Development Environment Setup

# Required: Linux with standard development tools
sudo apt-get install build-essential gdb

# Recommended: Valgrind for memory checking
sudo apt-get install valgrind

# Create project structure
mkdir -p sandbox/{src,include,tests,docs}
cd sandbox

5.2 Project Structure

sandbox/
├── src/
│   ├── main.c              # Entry point, CLI parsing
│   ├── logger.c            # Async-signal-safe event logging
│   ├── signals.c           # Signal handler installation/management
│   ├── process.c           # fork/exec/wait wrappers
│   ├── modes/
│   │   ├── normal.c        # Normal exit demonstration
│   │   ├── crash.c         # Crash (signal) demonstration
│   │   ├── zombie.c        # Zombie demonstration
│   │   ├── orphan.c        # Orphan demonstration
│   │   ├── race.c          # Race condition demonstration
│   │   └── safety.c        # Async-signal-safety tests
│   └── report.c            # Output formatting
├── include/
│   ├── sandbox.h           # Public interface
│   ├── logger.h            # Logging interface
│   ├── signals.h           # Signal utilities
│   └── process.h           # Process control interface
├── tests/
│   ├── test_logger.c       # Logger unit tests
│   ├── test_signals.c      # Signal handling tests
│   └── stress_test.c       # Concurrent stress tests
├── Makefile
└── README.md

5.3 Implementation Phases

Phase 1: Foundation (Days 1-3)

Goals:

  • Set up build system
  • Implement async-signal-safe logger
  • Basic process creation

Tasks:

  1. Create Makefile with proper compiler flags (-Wall -Werror -g)
  2. Implement safe_write() and basic logging
  3. Implement spawn_child() that forks and execs a command
  4. Implement basic SIGCHLD handler

Checkpoint: Can fork a child, child runs /bin/echo, parent logs events.

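/* Phase 1 checkpoint test
   install_sigchld_handler() and log_event() are the project's own helpers
   from signals.c and logger.c. */
#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    install_sigchld_handler();

    /* Block SIGCHLD so a reaping handler cannot collect the child before
       the waitpid() below does (waitpid() would then fail with ECHILD) */
    sigset_t mask, prev;
    sigemptyset(&mask);
    sigaddset(&mask, SIGCHLD);
    sigprocmask(SIG_BLOCK, &mask, &prev);

    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return 1;
    }
    if (pid == 0) {
        /* Child */
        sigprocmask(SIG_SETMASK, &prev, NULL);
        execlp("echo", "echo", "Hello from child", (char *)NULL);
        _exit(1);  /* Use _exit, not exit, after failed exec */
    }

    /* Parent */
    log_event(EVENT_FORK, getpid(), pid, 0, 0, "Forked child");

    int status;
    if (waitpid(pid, &status, 0) == pid)
        log_event(EVENT_WAIT_RETURNED, getpid(), pid, 0, status, "Child reaped");

    /* Unblock: a pending SIGCHLD now runs the handler, which finds nothing left */
    sigprocmask(SIG_SETMASK, &prev, NULL);
    return 0;
}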

Phase 2: Zombie and Orphan Modes (Days 4-6)

Goals:

  • Demonstrate zombie processes
  • Demonstrate orphan processes
  • Clear explanations of each

Tasks:

  1. Implement spawn_zombie() that delays reaping
  2. Add shell command to show process state during delay
  3. Implement spawn_orphan() that has parent exit first
  4. Track PPID changes in orphan case

Checkpoint: ./sandbox spawn --mode=zombie shows defunct process in ps.

/* Zombie demonstration core logic
   Assumes no reaping SIGCHLD handler is installed; otherwise the handler
   would collect the child and no zombie would appear. */
void demonstrate_zombie(int delay_seconds) {
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return;
    }

    if (pid == 0) {
        /* Child exits immediately */
        log_event(EVENT_EXIT, getpid(), 0, 0, 42, "Child exiting");
        _exit(42);
    }

    /* Parent delays before reaping; once the child exits it stays a zombie */
    printf("Child PID %d stays a zombie until reaped. Check with: ps -o pid,stat,cmd -p %d\n", pid, pid);
    printf("Waiting %d seconds before reaping...\n", delay_seconds);

    sleep(delay_seconds);

    int status;
    if (waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        printf("Zombie reaped. Exit status: %d\n", WEXITSTATUS(status));
}

Phase 3: Signal Handling Modes (Days 7-9)

Goals:

  • Stop/continue demonstration
  • Crash handling
  • Async-signal-safety testing

Tasks:

  1. Implement stop/continue mode using SIGSTOP/SIGCONT
  2. Implement crash mode (trigger SIGSEGV, catch in parent)
  3. Create async-signal-safety test suite
  4. Demonstrate safe vs unsafe handler functions

Checkpoint: Can send SIGSTOP, see child stop, send SIGCONT, see child resume.

/* Stop/continue demonstration */
void demonstrate_stop_continue(void) {
    pid_t pid = fork();

    if (pid == 0) {
        /* Child: loop forever, parent controls */
        while (1) {
            printf("Child running...\n");
            sleep(1);
        }
    }

    /* Parent */
    sleep(2);

    printf("Sending SIGSTOP to child %d\n", pid);
    kill(pid, SIGSTOP);

    /* Wait for stop, using WUNTRACED */
    int status;
    waitpid(pid, &status, WUNTRACED);
    if (WIFSTOPPED(status)) {
        printf("Child stopped by signal %d\n", WSTOPSIG(status));
    }

    sleep(2);

    printf("Sending SIGCONT to child %d\n", pid);
    kill(pid, SIGCONT);

    /* Let child run a bit more, then terminate */
    sleep(2);
    kill(pid, SIGTERM);
    waitpid(pid, &status, 0);
}

Phase 4: Race Condition Demonstration (Days 10-12)

Goals:

  • Show the classic SIGCHLD/addjob race
  • Demonstrate the fix with signal blocking
  • Clear before/after comparison

Tasks:

  1. Implement simple job list
  2. Create buggy version that races
  3. Create fixed version with signal blocking
  4. Run many iterations to expose race

Checkpoint: Buggy version fails detectably; fixed version succeeds.

/* Race condition demonstration */

/* Buggy version */
void spawn_buggy(void) {
    pid_t pid = fork();
    if (pid == 0) {
        _exit(0);  /* Exit immediately to maximize race window */
    }

    /* Race: SIGCHLD might arrive before addjob() */
    addjob(pid);  /* BUG: handler might run first! */
}

/* Fixed version */
void spawn_fixed(void) {
    sigset_t mask, prev;
    sigemptyset(&mask);
    sigaddset(&mask, SIGCHLD);

    /* Block SIGCHLD */
    sigprocmask(SIG_BLOCK, &mask, &prev);

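    pid_t pid = fork();
    if (pid < 0) {
        sigprocmask(SIG_SETMASK, &prev, NULL);  /* don't leave SIGCHLD blocked */
        perror("fork");
        return;
    }
    if (pid == 0) {
        /* Child: restore the original mask (the blocked mask is inherited,
           and a real shell must restore it before exec) */
        sigprocmask(SIG_SETMASK, &prev, NULL);
        _exit(0);
    }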

    /* Parent: safe to add job - SIGCHLD is blocked */
    addjob(pid);

    /* Unblock SIGCHLD - handler runs now if signal pending */
    sigprocmask(SIG_SETMASK, &prev, NULL);
}

Phase 5: Polish and Integration (Days 13-14)

Goals:

  • Complete CLI interface
  • Comprehensive output formatting
  • Documentation and explanations

Tasks:

  1. Integrate all modes into unified CLI
  2. Add timeline visualization
  3. Add explanation text for each mode
  4. Write comprehensive tests
  5. Document all features

Checkpoint: All modes work, output is educational, no memory leaks.


6. Testing Strategy

6.1 Test Categories

Category           Purpose                    Examples
Unit Tests         Test individual functions  Logger, signal mask helpers
Integration Tests  Test mode behaviors        Zombie demo produces correct output
Stress Tests       Expose race conditions     Rapid fork/exit cycles
Safety Tests       Verify signal safety       Handler doesn't corrupt state

6.2 Critical Test Scenarios

Scenario 1: Zombie Detection

#!/bin/bash
# Test that a zombie is actually created by the sandbox (not by an unrelated process)

./sandbox spawn --mode=zombie --delay=3 &
PARENT_PID=$!
sleep 1

# Look for a defunct (state Z) child of the sandbox process
if ! ps -o stat= --ppid "$PARENT_PID" | grep -q '^Z'; then
    echo "FAIL: No zombie found"
    exit 1
fi

if ! wait "$PARENT_PID"; then
    echo "FAIL: sandbox exited with an error"
    exit 1
fi
echo "PASS: Zombie was created and reaped"

Scenario 2: Signal Race Stress Test

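/* Stress test for race condition fix.
   reset_job_list() and job_list_is_consistent() are the project's own test
   helpers; spawn_fixed() is the Phase 4 function. */
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

void test_race_fix(void) {
    int failures = 0;

    for (int i = 0; i < 1000; i++) {
        reset_job_list();

        spawn_fixed();  /* Should never fail */

        /* Give handler time to run. 1 ms is a heuristic, not a guarantee:
           on a loaded machine the child may not have exited yet, so
           job_list_is_consistent() should accept a still-running job. */
        usleep(1000);

        /* Verify job was properly handled */
        if (!job_list_is_consistent()) {
            failures++;
        }
    }

    printf("Race fix test: %d/1000 iterations failed\n", failures);
    assert(failures == 0);
}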

Scenario 3: Async-Signal-Safety Test

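#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

/* Handler that calls printf() (deliberately UNSAFE) */
static void unsafe_handler(int sig) {
    printf("Caught signal %d\n", sig);
}

/* Test that unsafe functions cause problems */
void test_unsafe_handler(void) {
    /* Install handler that calls printf (UNSAFE) */
    signal(SIGUSR1, unsafe_handler);

    /* Send signals rapidly while calling printf */
    pid_t pid = fork();
    if (pid == 0) {
        for (int i = 0; i < 1000; i++) {
            kill(getppid(), SIGUSR1);
            usleep(100);
        }
        _exit(0);
    }

    /* This may eventually hang, crash, or garble output */
    for (int i = 0; i < 1000; i++) {
        printf("Iteration %d\n", i);
        usleep(100);
    }

    waitpid(pid, NULL, 0);  /* Reap the sender so the test leaves no zombie */

    /* If we get here, test didn't trigger the bug (not guaranteed) */
    printf("WARNING: Bug not triggered in this run\n");
}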

6.3 Test Data and Expected Results

Test | Input | Expected Outcome
Normal exit | --mode=normal | Child exits 0, no zombie
Zombie | --mode=zombie --delay=3 | Defunct process visible for 3 s
Crash | --mode=crash --signal=SIGSEGV | Child terminated by SIGSEGV
Race (buggy) | --race --buggy | Some iterations fail (timing-dependent)
Race (fixed) | --race --fixed | All iterations succeed

7. Common Pitfalls and Debugging

7.1 Frequent Mistakes

Pitfall | Symptom | Solution
Using exit() after a failed exec() | atexit handlers run and inherited stdio buffers are flushed again in the child | Use _exit() after exec failure
Calling printf() in a handler | Hangs, crashes, garbled output | Use write() only
Not saving errno in a handler | Mysterious failures in the main program | Save and restore errno in the handler
Single waitpid() for multiple children | Zombies accumulate | Loop with WNOHANG until no more children are ready
Not blocking signals around fork() | Race conditions | Block SIGCHLD before fork()
Forgetting WUNTRACED | Stopped children not detected | Pass WUNTRACED to waitpid() to report stopped children

7.2 Signal Race Debugging

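/* Adding debug output to expose races */

#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* snprintf() is NOT on the POSIX async-signal-safe list, so the handler
   formats numbers by hand and writes them with write() */
static void safe_puts(const char *s) {
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    write(STDERR_FILENO, s, n);
}

static void safe_putl(long v) {      /* v must be non-negative */
    char digits[24];
    size_t i = sizeof(digits);
    do {
        digits[--i] = (char)('0' + v % 10);
        v /= 10;
    } while (v > 0);
    write(STDERR_FILENO, digits + i, sizeof(digits) - i);
}

void debug_handler(int sig) {
    int saved_errno = errno;

    /* Async-signal-safe debug output: only write() and time() from libc.
       time() has 1 s resolution; clock_gettime() is also async-signal-safe. */
    safe_puts("[HANDLER] sig=");
    safe_putl(sig);
    safe_puts(", time=");
    safe_putl((long)time(NULL));
    safe_puts("\n");

    /* ... handler logic ... */

    errno = saved_errno;
}

/* In main code, also log with timestamps (fprintf is fine outside handlers) */
void spawn_with_debug(void) {
    fprintf(stderr, "[MAIN] Before fork, time=%ld\n", (long)time(NULL));

    pid_t pid = fork();

    fprintf(stderr, "[MAIN] After fork, pid=%d, time=%ld\n", (int)pid, (long)time(NULL));

    if (pid == 0) {
        _exit(0);  /* child: exit at once so SIGCHLD can race with addjob() */
    }
    if (pid > 0) {
        fprintf(stderr, "[MAIN] Before addjob, time=%ld\n", (long)time(NULL));
        addjob(pid);
        fprintf(stderr, "[MAIN] After addjob, time=%ld\n", (long)time(NULL));
    }
}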

7.3 Zombie Debugging

# Find zombies on the system
ps aux | grep defunct

# Show each zombie together with its parent's PID (PPID column)
ps -eo pid,ppid,stat,comm | awk 'NR == 1 || $3 ~ /^Z/'

# Use strace to see what parent is doing
strace -p <parent_pid>

# Check if parent is blocking on something
cat /proc/<parent_pid>/wchan

7.4 Signal Delivery Debugging

# See what signals are blocked/pending
cat /proc/<pid>/status | grep -E "Sig(Blk|Pnd|Ign|Cgt)"

# Trace signal-related syscalls and deliveries (-f also follows forked children)
strace -f -e trace=signal ./sandbox

# Send signals manually
kill -SIGCHLD <pid>
kill -STOP <pid>
kill -CONT <pid>

8. Extensions and Challenges

8.1 Beginner Extensions

  • Add color output: Use ANSI codes for different event types
  • JSON logging: Machine-readable event format
  • Multiple children: Spawn and track N children simultaneously

8.2 Intermediate Extensions

  • Process group demo: Show how SIGINT affects process groups
  • Terminal control: Demonstrate foreground/background job control
  • Timer-based timeout: Use SIGALRM for timeouts
  • Signal queue: Show how signals coalesce (send 10, receive fewer)

8.3 Advanced Extensions

  • Full job control shell: Extend to mini-shell with fg/bg
  • ptrace-based tracing: Intercept all system calls
  • Namespace isolation: Use Linux namespaces for sandbox
  • Signal forwarding proxy: Parent intercepts and logs all child signals

9. Real-World Connections

9.1 Industry Applications

  • Process supervisors (systemd, supervisord): Manage service lifecycles
  • Container runtimes (Docker, containerd): Process isolation and control
  • Job schedulers (Slurm, PBS): Batch job management
  • Debuggers (GDB, LLDB): Use ptrace and signals for control
  • Shell implementations (bash, zsh): Job control using these primitives
  • libuv: Cross-platform async I/O with process management
  • supervisord: Python process control system
  • runit: Simple Unix init scheme with process supervision
  • s6: Small, secure supervision suite

9.3 Interview Relevance

This project prepares you for questions like:

  • “What is a zombie process and how do you prevent them?”
  • “Explain the fork/exec/wait pattern”
  • “How do you handle signals safely?”
  • “What is async-signal-safety?”
  • “Describe a race condition you’ve debugged”
  • “How does job control work in a shell?”

10. Resources

10.1 Essential Reading

  • CS:APP Chapter 8: “Exceptional Control Flow” - The primary reference
  • OSTEP Chapters on Processes: ostep.org
  • The Linux Programming Interface by Michael Kerrisk: Chapters 20-22 (Signals), Chapters 24-28 (Process Creation, Termination, Monitoring, and Program Execution)
  • Advanced Programming in the UNIX Environment by Stevens: Chapters 8-10

10.2 Man Pages

man 2 fork
man 2 execve
man 2 wait
man 2 waitpid
man 7 signal
man 2 sigaction
man 2 sigprocmask
man 3 sigsetjmp
man 7 signal-safety   # List of async-signal-safe functions

10.3 Online Resources


11. Self-Assessment Checklist

Before considering this project complete, verify:

Understanding

  • I can explain the process state diagram without looking at notes
  • I can describe exactly what happens when fork() is called
  • I understand why zombies occur and how to prevent them
  • I can explain async-signal-safety and name 5 safe functions
  • I understand the SIGCHLD/addjob race and how signal blocking fixes it
  • I can describe when and why to use sigsetjmp/siglongjmp

Implementation

  • Logger is fully async-signal-safe (no printf/malloc in handlers)
  • Zombie demonstration actually shows defunct process in ps
  • Race condition demonstration reliably shows bug and fix
  • All children are properly reaped (no zombie leaks)
  • Signal masks are correctly managed around fork()

Debugging Skills

  • I can use ps to identify zombie processes
  • I can use strace to trace signal delivery
  • I can inspect /proc//status for signal state
  • I can diagnose async-signal-safety violations

Growth

  • I caught and fixed at least one subtle signal race in my code
  • I can explain my handler design decisions
  • I feel confident using fork/exec/wait in future projects

13. Real World Outcome

When you complete this project, you will have a comprehensive signals and process control sandbox. Here is representative output from running it (PIDs, timings, and failure counts will differ from run to run):

Basic Process Lifecycle Demo

$ ./procsandbox --demo lifecycle
╔═══════════════════════════════════════════════════════════════════╗
║             PROCESS LIFECYCLE DEMONSTRATION                       ║
║                    Normal Exit Scenario                           ║
╚═══════════════════════════════════════════════════════════════════╝

[00.000] PARENT (PID 12345): Starting demonstration
[00.001] PARENT (PID 12345): About to fork()...
[00.002] PARENT (PID 12345): fork() returned 12346 (child PID)
[00.002] CHILD  (PID 12346): I exist! My parent is 12345
[00.003] CHILD  (PID 12346): Doing work for 2 seconds...
[02.005] CHILD  (PID 12346): Work complete, calling exit(42)
[02.005] PARENT (PID 12345): Received SIGCHLD!
[02.006] PARENT (PID 12345): waitpid() returned 12346
[02.006] PARENT (PID 12345): Child exited normally with status 42

Timeline:
───────────────────────────────────────────────────────────────────
  0s         1s         2s
  │──────────│──────────│
  ▼ fork()              ▼ exit(42)
  ├──────────────────────┤
  │     CHILD RUNNING    │
  ├──────────────────────┤
                         ▼ SIGCHLD received
                         ▼ waitpid() reaps child
───────────────────────────────────────────────────────────────────

Zombie Process Demonstration

$ ./procsandbox --demo zombie
╔═══════════════════════════════════════════════════════════════════╗
║                  ZOMBIE PROCESS DEMONSTRATION                     ║
╚═══════════════════════════════════════════════════════════════════╝

[00.000] PARENT (PID 12345): Creating a child that will become a zombie
[00.001] CHILD  (PID 12346): I'm about to exit immediately!
[00.002] CHILD  (PID 12346): Exiting with status 0
[00.003] PARENT (PID 12345): Child exited, but I'm NOT calling wait()...
[00.003] PARENT (PID 12345): Sleeping for 10 seconds. Check process table!

          ┌─────────────────────────────────────────────────────┐
          │  RUN THIS IN ANOTHER TERMINAL:                      │
          │                                                     │
          │  $ ps aux | grep 12346                              │
          │  user   12346  0.0  0.0  0  0 ?  Z  10:00  0:00     │
          │         [procsandbox] <defunct>                     │
          │                                  ▲                  │
          │                                  │                  │
          │                          ZOMBIE! (Z state)          │
          └─────────────────────────────────────────────────────┘

[10.005] PARENT (PID 12345): Now calling waitpid() to reap the zombie...
[10.006] PARENT (PID 12345): waitpid() returned 12346 - zombie reaped!

What just happened:
  1. Child exited but parent didn't call wait()
  2. Kernel kept child's exit status in process table
  3. Child appeared as 'Z' (zombie) state in ps
  4. When parent finally called wait(), zombie was reaped
  5. Each zombie keeps its PID and process-table entry until it is reaped, so always reap your children!

Signal Handler Race Condition Demo

$ ./procsandbox --demo race
╔═══════════════════════════════════════════════════════════════════╗
║               SIGNAL HANDLER RACE CONDITION                       ║
╚═══════════════════════════════════════════════════════════════════╝

Demonstrating the classic fork/addjob race (the SIGCHLD handler runs before addjob())...

BUGGY VERSION (race condition):
───────────────────────────────────────────────────────────────────
[00.000] Parent: About to fork
[00.001] Parent: fork() returned 12346
[00.001] Child 12346: Running and exiting immediately
[00.002] Child 12346: exit(0)
[00.002] *** SIGCHLD received! Entering handler... ***
[00.002] Handler: Looking for job with PID 12346...
[00.002] Handler: ERROR - job not found in table!
[00.003] Parent: Now adding job 12346 to table... (TOO LATE!)
[00.003] Parent: Job 12346 will NEVER be reaped - ZOMBIE LEAK!

         ┌────────────────────────────────────────────────────────┐
         │  Race Timeline:                                        │
         │                                                        │
         │  Parent:  fork() ──────┬──── addjob() ───▶             │
         │                        │         ▲                     │
         │  Child:       ┌────────┴─ exit() │                     │
         │               │              SIGCHLD arrives BEFORE    │
         │  Handler: ────┴─ can't find job! addjob() runs!        │
         └────────────────────────────────────────────────────────┘

FIXED VERSION (with sigprocmask):
───────────────────────────────────────────────────────────────────
[00.000] Parent: Blocking SIGCHLD before fork
[00.001] Parent: fork() returned 12347
[00.001] Child 12347: Running and exiting immediately
[00.002] Child 12347: exit(0)
[00.002] (SIGCHLD is BLOCKED - handler cannot run yet)
[00.003] Parent: Adding job 12347 to table...
[00.003] Parent: Unblocking SIGCHLD
[00.003] *** SIGCHLD received! Entering handler... ***
[00.003] Handler: Looking for job with PID 12347... FOUND!
[00.004] Handler: Successfully reaped job 12347
[00.004] SUCCESS - No race, no zombie!

Orphan Process Demonstration

$ ./procsandbox --demo orphan
╔═══════════════════════════════════════════════════════════════════╗
║                 ORPHAN PROCESS DEMONSTRATION                      ║
╚═══════════════════════════════════════════════════════════════════╝

[00.000] PARENT (PID 12345): Creating a child that will become an orphan
[00.001] CHILD  (PID 12346): My parent is 12345
[00.002] PARENT (PID 12345): I'm exiting early, abandoning my child!
[00.003] CHILD  (PID 12346): Still running... checking my parent...
[00.004] CHILD  (PID 12346): My parent is now 1 (init/systemd adopted me!)
[00.005] CHILD  (PID 12346): I'm an orphan process
[00.006] CHILD  (PID 12346): When I exit, init will reap me

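Key insight: When a parent exits, its orphaned children are adopted by PID 1
(init or systemd), which will properly reap them when they exit. On Linux, an
ancestor that has marked itself as a child subreaper with
prctl(PR_SET_CHILD_SUBREAPER) adopts them instead, so getppid() may not return 1.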

Signal Coalescing Demo

$ ./procsandbox --demo coalesce
╔═══════════════════════════════════════════════════════════════════╗
║                  SIGNAL COALESCING DEMONSTRATION                  ║
╚═══════════════════════════════════════════════════════════════════╝

Creating 10 children that will all exit nearly simultaneously...

[00.000] Forked children: 12346 12347 12348 12349 12350
                          12351 12352 12353 12354 12355
[00.001] All children exiting now!
[00.002] SIGCHLD received - handler invoked 1 time

BUGGY handler (single waitpid call):
  Reaped: 12346
  MISSED: 12347 12348 12349 12350 12351 12352 12353 12354 12355
  Result: 9 ZOMBIE PROCESSES!

CORRECT handler (waitpid loop):
  while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
      // reap child
  }
  Reaped: 12346 12347 12348 12349 12350 12351 12352 12353 12354 12355
  Result: All children reaped, no zombies!

Lesson: Standard signals don't queue! While a SIGCHLD is already pending,
        further SIGCHLDs are discarded, so several exits can share one handler run.
        Always use a loop to reap all available children.

Async-Signal-Safety Test

$ ./procsandbox --demo async-safety
╔═══════════════════════════════════════════════════════════════════╗
║              ASYNC-SIGNAL-SAFETY DEMONSTRATION                    ║
╚═══════════════════════════════════════════════════════════════════╝

Testing what happens when you call unsafe functions in signal handlers...

Test 1: printf() in handler (UNSAFE)
───────────────────────────────────────────────────────────────────
  Running 1000 iterations with printf() in SIGALRM handler...
  [████████████████████████████] 100%
  Result: 3 DEADLOCKS detected!

  Why: printf() uses a shared stdout buffer guarded by a lock. If the signal
       arrives while main code is inside printf(), the handler's printf() can
       deadlock on that lock or corrupt the buffer, depending on the C library.

Test 2: malloc() in handler (UNSAFE)
───────────────────────────────────────────────────────────────────
  Running 1000 iterations with malloc() in SIGALRM handler...
  [████████████████████████████] 100%
  Result: 7 HEAP CORRUPTIONS detected!

  Why: malloc() updates heap data structures under a lock. If the signal
       interrupts it mid-update, the handler's malloc() can deadlock on that
       lock or operate on inconsistent heap state.

Test 3: write() in handler (SAFE)
───────────────────────────────────────────────────────────────────
  Running 1000 iterations with write() in SIGALRM handler...
  [████████████████████████████] 100%
  Result: 0 issues - SAFE!

  write() is async-signal-safe: it is a thin wrapper around the system call,
  with no user-space buffer or lock. It can still change errno, which is why
  handlers save and restore it.

14. The Core Question You’re Answering

“How do Unix signals and process control mechanisms work, and what subtle race conditions and correctness issues arise when handling asynchronous events?”

Signals are one of the most error-prone areas of systems programming. This project forces you to confront the reality that your code can be interrupted at any point by an asynchronous signal. Understanding these issues is essential for writing correct concurrent code, implementing job control in shells, and avoiding the mysterious bugs that plague programs that handle signals incorrectly.


15. Concepts You Must Understand First

Before starting this project, ensure you have a solid grasp of these foundational concepts:

Concept | Where to Learn | Why It Matters
Process creation (fork) | CS:APP 8.2, 8.4.2 | Core of all process control
Process termination (exit, wait) | CS:APP 8.4 | Proper child cleanup
Signal delivery and handling | CS:APP 8.5 | Foundation of this project
Signal sets and blocking | CS:APP 8.5.4 | Preventing race conditions
Zombie and orphan processes | CS:APP 8.4.3 | Common pitfalls
errno and its quirks | man 3 errno | Handler must preserve errno
Process groups and sessions | APUE Ch. 9 | Job control foundation
C function pointers | K&R Ch. 5.11 | Signal handler registration

16. Questions to Guide Your Design

Work through these questions before writing any code:

  1. Handler Registration: Why should you use sigaction() instead of signal()? What guarantees does sigaction() provide that signal() doesn’t?

  2. Reentrant Data Structures: If your signal handler needs to modify a shared data structure (like a job list), how will you ensure the main code doesn’t see a half-updated state?

  3. errno Preservation: If your handler calls a syscall that fails, it will overwrite errno. How do you prevent this from corrupting the main code’s error handling?

  4. Child Reaping Strategy: Should you reap children in the signal handler or in the main loop? What are the tradeoffs?

  5. waitpid Options: When should you use WNOHANG? WUNTRACED? WCONTINUED? What happens if you use blocking waitpid() in a signal handler?

  6. Pending vs. Blocked: What’s the difference between a pending signal and a blocked signal? If 5 SIGCHLDs arrive while SIGCHLD is blocked, how many will be delivered when you unblock?


17. Thinking Exercise

Before coding, trace through this scenario by hand:

volatile int counter = 0;
volatile sig_atomic_t child_count = 0;

void handler(int sig) {
    counter++;
    child_count--;
}

int main() {
    signal(SIGCHLD, handler);

    for (int i = 0; i < 5; i++) {
        if (fork() == 0) {
            exit(0);  // Child exits immediately
        }
        child_count++;
    }

    while (child_count > 0) {
        sleep(1);  // Wait for all children
    }

    printf("counter = %d\n", counter);
    return 0;
}

Questions to answer:

  1. If all 5 children exit before the parent finishes the fork loop, what value will counter have at the end?
  2. Will the while loop ever terminate? Why or why not?
  3. Whatโ€™s the fundamental bug in this code?
  4. How would you fix it?
Solution

Problem Analysis:

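  1. Counter value: Anywhere from 1 to 5, depending on timing. While a SIGCHLD is already pending, further SIGCHLDs are discarded rather than queued, so children that exit close together can share a single handler run.

  2. Loop termination: Likely will NOT terminate! child_count ends at 5 minus the number of handler runs. If even one SIGCHLD was merged with another, child_count stays above 0 and the loop sleeps forever.

  3. Fundamental bugs:
    • Race between child_count++ in main and child_count-- in the handler: child_count++ is a read-modify-write, so a handler that runs in the middle of it loses its decrement
    • Assumes 1 signal per child (standard signals coalesce!)
    • Counts signals instead of actually reaping children, so every child is left as a zombie
    • counter is a plain volatile int rather than volatile sig_atomic_t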
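  4. Fixed version: reap with a WNOHANG loop in the handler, keep SIGCHLD blocked while forking and counting, and wait with sigsuspend() instead of polling a counter:

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t child_count = 0;  /* forked but not yet reaped */

void handler(int sig) {
    (void)sig;
    int saved_errno = errno;
    /* Loop: one SIGCHLD can stand for several exited children */
    while (waitpid(-1, NULL, WNOHANG) > 0) {
        child_count--;  /* main only reads child_count with SIGCHLD blocked */
    }
    errno = saved_errno;
}

int main(void) {
    sigset_t mask, prev;
    sigemptyset(&mask);
    sigaddset(&mask, SIGCHLD);

    struct sigaction sa = {.sa_handler = handler, .sa_flags = SA_RESTART};
    sigemptyset(&sa.sa_mask);
    sigaction(SIGCHLD, &sa, NULL);

    /* Block SIGCHLD so the handler cannot run in the middle of child_count++ */
    sigprocmask(SIG_BLOCK, &mask, &prev);
    for (int i = 0; i < 5; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            break;
        }
        if (pid == 0) {
            _exit(0);  /* child exits immediately */
        }
        child_count++;
    }

    /* sigsuspend() atomically restores the old mask (SIGCHLD unblocked) and
       sleeps until a handler runs; the loop re-checks with SIGCHLD blocked */
    while (child_count > 0) {
        sigsuspend(&prev);
    }
    sigprocmask(SIG_SETMASK, &prev, NULL);

    printf("all children reaped\n");
    return 0;
}
```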

18. The Interview Questions They’ll Ask

After completing this project, you should be able to confidently answer these questions:

  1. “What is a zombie process and how do you prevent them?”
    • Zombie = exited child not yet reaped; prevent by calling wait()/waitpid() or reaping from a SIGCHLD handler
  2. “Explain the race condition between fork() and adding to a job list. How do you fix it?”
    • Block SIGCHLD before fork, add to list, then unblock
  3. “Why can’t you call printf() from a signal handler?”
    • Not async-signal-safe; uses global lock/buffer; can deadlock or corrupt
  4. “What happens if 5 signals of the same type arrive while that signal is blocked?”
    • For standard signals, only 1 is delivered when unblocked (they coalesce rather than queue); POSIX real-time signals are queued
  5. “What’s the difference between a signal being ignored vs blocked?”
    • Ignored: discarded with no effect. Blocked: held pending until unblocked
  6. “How would you implement a timeout for a blocking operation?”
    • alarm()/setitimer() with a SIGALRM handler that calls siglongjmp(), or use select()/poll() with a timeout

19. Hints in Layers

Use these hints progressively if you get stuck.

Hint Layer 1: Getting Started

  • Start simple: make a parent fork one child, have child exit, have parent wait and print status
  • Use ps aux | grep Z in another terminal to see zombies
  • WIFEXITED(), WIFSIGNALED(), WIFSTOPPED() macros decode wait status; pair them with WEXITSTATUS(), WTERMSIG(), and WSTOPSIG() to extract the details

Hint Layer 2: Signal Handling

  • Use sigaction() not signal(): struct sigaction sa = {.sa_handler = handler}; sigaction(SIGCHLD, &sa, NULL);
  • Always save/restore errno in handlers: int saved = errno; ... errno = saved;
  • Use write() for output in handlers, not printf()

Hint Layer 3: Race Prevention

  • Signal blocking pattern:
    sigset_t mask, prev;
    sigemptyset(&mask);
    sigaddset(&mask, SIGCHLD);
    sigprocmask(SIG_BLOCK, &mask, &prev);  // Block
    // ... critical section ...
    sigprocmask(SIG_SETMASK, &prev, NULL); // Restore
    
  • Blocking a signal doesn’t lose it; it becomes pending

Hint Layer 4: Common Bugs

  • Forgetting WNOHANG in handler causes blocking in signal context
  • Not looping on waitpid: while ((pid = waitpid(-1, &st, WNOHANG)) > 0)
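  • Race: a signal can arrive between a check and the action that depends on it. Use sigsuspend() to unblock and wait atomically, and re-check the condition after each wakeup.
  • SA_RESTART does not restart every call: on Linux, select(), poll(), and nanosleep() still fail with EINTR when a handler runs, and sleep() returns early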

20. Books That Will Help

Topic | Book | Specific Chapters
Signals and handlers | CS:APP (3rd ed.) | Chapter 8.5
Process control (fork/exec/wait) | CS:APP (3rd ed.) | Chapter 8.2-8.4
Race conditions | CS:APP (3rd ed.) | Chapter 8.5.6
Async-signal-safety | The Linux Programming Interface (Kerrisk) | Chapter 21.1
Signal sets and blocking | The Linux Programming Interface (Kerrisk) | Chapter 20
Process groups and sessions | Advanced Programming in the Unix Environment (Stevens) | Chapter 9
Robust signal handling | Advanced Programming in the Unix Environment (Stevens) | Chapter 10
Practical debugging | The Art of Debugging with GDB (Matloff) | Chapters on signal tracing

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Normal, zombie, and crash modes work
  • Basic event logging with timestamps
  • Zombie is visibly demonstrated
  • No memory leaks or zombie leaks from sandbox itself

Full Completion:

  • All modes implemented (normal, crash, zombie, orphan, stop/continue)
  • Race condition demonstration with clear before/after
  • Async-signal-safety test suite
  • Comprehensive output with explanations
  • Stress testing passes

Excellence (Going Above & Beyond):

  • Process group and session demonstrations
  • Timeline visualization (ASCII art or terminal graphics)
  • Integration with P12 (shell) for job control
  • Coverage of edge cases (signal coalescing, EINTR handling)

This guide was expanded from CSAPP_3E_DEEP_LEARNING_PROJECTS.md. For the complete learning path, see the project index.