Project 11: Signals + Processes Sandbox
Build a harness that runs child processes in controlled modes and logs exactly which exceptional control flow events occurred and why.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Advanced |
| Time Estimate | 1-2 weeks |
| Language | C (Alternatives: Rust, Zig, C++) |
| Prerequisites | Basic OS concepts, C programming, Project 10 recommended |
| Key Topics | Process lifecycle, signals, signal handlers, zombies, race conditions |
| CS:APP Chapters | 8 |
Table of Contents
- Learning Objectives
- Deep Theoretical Foundation
- Project Specification
- Solution Architecture
- Implementation Guide
- Testing Strategy
- Common Pitfalls
- Extensions
- Real-World Connections
- Resources
- Self-Assessment Checklist
1. Learning Objectives
By completing this project, you will:
- Master process lifecycle: Understand fork(), exec(), wait(), and the states a process transitions through
- Write correct signal handlers: Design async-signal-safe handlers that don't corrupt program state
- Prevent zombie processes: Implement proper child reaping strategies
- Identify and fix race conditions: Recognize timing windows where signals can cause subtle bugs
- Use non-local jumps safely: Understand when setjmp/longjmp are appropriate and their dangers
- Debug signal-related issues: Use tools and techniques to diagnose ECF problems
- Reason about concurrent events: Think systematically about interleaved execution
2. Deep Theoretical Foundation
2.1 The Process Model and Address Spaces
A process is an instance of a running program. It is the OS's fundamental abstraction for:
- Logical control flow: The illusion that your program has exclusive use of the CPU
- Private address space: The illusion that your program has exclusive use of memory
      PROCESS A                          PROCESS B
+---------------------+          +---------------------+
|  Virtual Memory     |          |  Virtual Memory     |
|  +---------------+  |          |  +---------------+  |
|  |     Stack     |  |          |  |     Stack     |  |
|  +-------v-------+  |          |  +-------v-------+  |
|  |               |  |          |  |               |  |
|  |     Heap      |  |          |  |     Heap      |  |
|  +---------------+  |          |  +---------------+  |
|  |  Data + BSS   |  |          |  |  Data + BSS   |  |
|  +---------------+  |          |  +---------------+  |
|  |     Text      |  |          |  |     Text      |  |
|  +---------------+  |          |  +---------------+  |
|                     |          |                     |
|  PID:   1234        |          |  PID:   5678        |
|  PPID:  1           |          |  PPID:  1234        |
|  State: Running     |          |  State: Stopped     |
+----------+----------+          +----------+----------+
           |                                |
           +----------------+---------------+
                            v
                +-----------------------+
                |    PHYSICAL MEMORY    |
                |  (shared via paging)  |
                +-----------------------+
Key insight: Each process believes it has the entire machine to itself. The kernel maintains this illusion through context switching and virtual memory.
2.2 Process States and Transitions
A process exists in one of several states:
                       PROCESS STATE MACHINE

       fork()
         |
         v
 +------------------+
 |       NEW        |
 | (being created)  |
 +--------+---------+
          |
          v
 +------------------+         SIGCONT          +--------------------+
 |      READY       |<-------------------------|      STOPPED       |
 | (waiting for CPU)|                          | (SIGSTOP/SIGTSTP)  |
 +--+-----+---------+                          +---------+----------+
    ^     |                                              ^
    |     | scheduled                                    |
    |     v                                              |
    |  +--------------------+   SIGSTOP/SIGTSTP          |
    |  |      RUNNING       |----------------------------+
    |  | (executing on CPU) |
    |  +--+-------------+---+
    |     |             |
    |     |             | I/O or sleep
    |     |             v
    |     |  +---------------------+
    |     |  |       BLOCKED       |
    |     |  | (waiting for event) |
    |     |  +----------+----------+
    |     |             | event occurs
    +-----|-------------+
          |
          | exit() or fatal signal
          v
 +------------------------+
 |         ZOMBIE         |<-- terminated but not yet reaped
 | (terminated, awaiting  |
 |     parent reap)       |
 +-----------+------------+
             | parent calls wait()
             v
 +------------------------+
 |       TERMINATED       |
 |   (fully cleaned up)   |
 +------------------------+
Critical states for this project:
- Zombie: Process has terminated but its parent hasn't called wait(). Its process-table entry and PID stay allocated until it is reaped, so unreaped zombies are a resource leak!
- Stopped: Process received SIGSTOP/SIGTSTP and is suspended until SIGCONT
- Running/Ready: Normal execution states
2.3 fork(), exec(), and wait() Family
fork() - Creating a New Process
pid_t pid = fork();
fork() creates a new process that is an almost exact copy of the calling process (the child gets its own PID):
        BEFORE fork()
+-------------------------+
| PARENT (PID 1000)       |
| int x = 5;              |
| pid = fork();           |
+------------+------------+
             |
             |  AFTER fork(): both processes continue from here
      +------+------------------------+
      |                               |
      v                               v
+-------------------------+     +-------------------------+
| PARENT                  |     | CHILD                   |
| PID: 1000               |     | PID: 1001               |
| x = 5                   |     | x = 5  (copy!)          |
| pid = 1001              |     | pid = 0                 |
+-------------------------+     +-------------------------+
Key properties:
- Child gets its own copy of the parent's memory (implemented with copy-on-write, so pages are only duplicated when one side writes)
- Child inherits open file descriptors
- fork() returns twice: child_pid in parent, 0 in child
- Execution order is non-deterministic
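A minimal sketch of these properties, assuming a POSIX system (fork_copy_demo is a hypothetical name, not part of the project): the child increments its copy of x and reports it through its exit status, while the parent's x stays 5. Passing data back through the exit status only works for small values (0 to 255).

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks; the child increments ITS copy of x and reports it via the exit
   status. Returns the child's x and stores the parent's x in *parent_x,
   or returns -1 on error. */
int fork_copy_demo(int *parent_x) {
    int x = 5;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        x = x + 1;            /* child: fork() returned 0 */
        _exit(x);             /* exit status carries the child's x */
    }
    int status;               /* parent: fork() returned the child's PID */
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    *parent_x = x;            /* still 5: the child's write was private */
    return WEXITSTATUS(status);
}
```

Which process prints or runs first is up to the scheduler; waitpid() is what makes the parent's read of the result deterministic.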
exec() Family - Replacing Process Image
execve(filename, argv, envp); // The fundamental call
execl, execle, execlp, execv, execvp, execvpe // Convenience wrappers
exec() replaces the current process image with a new program:
     BEFORE exec()                         AFTER exec()
+---------------------+              +---------------------+
| CHILD               |              | CHILD               |
| PID: 1001           |              | PID: 1001           |  (same PID!)
| +-----------------+ |   execve()   | +-----------------+ |
| | Parent code     | | -----------> | | NEW CODE        | |
| | (copy)          | |              | | /bin/ls         | |
| +-----------------+ |              | +-----------------+ |
+---------------------+              +---------------------+

exec() does NOT return on success (the old code is gone!)
Key insight: exec() doesn't create a new process - it transforms the existing one.
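The usual fork-then-exec pattern can be sketched as follows, assuming POSIX (run_command is a hypothetical helper). Because execvp() only returns when it fails, everything after it in the child is error handling; exiting with 127 follows the shell's convention for "command not found".

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs argv[0] (searched in PATH) in a child process and returns its exit
   code, or -1 if it could not be waited for or did not exit normally. */
int run_command(char *const argv[]) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        /* Only reached if exec failed: the old program image is still here */
        _exit(127);
    }
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, running {"true", NULL} yields 0, {"false", NULL} yields 1, and a nonexistent command yields 127.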
wait() Family - Reaping Child Processes
pid_t waitpid(pid_t pid, int *status, int options);
pid_t wait(int *status); // Equivalent to waitpid(-1, status, 0)
wait() serves two purposes:
- Synchronization: Parent blocks until child terminates (or changes state)
- Cleanup: Kernel removes the child's process table entry (reaps the zombie)
// Status examination macros
WIFEXITED(status)     // True if child exited normally
WEXITSTATUS(status)   // Exit code (only meaningful if WIFEXITED is true)
WIFSIGNALED(status)   // True if child was killed by a signal
WTERMSIG(status)      // Signal number that killed the child
WIFSTOPPED(status)    // True if child was stopped (reported only with WUNTRACED)
WSTOPSIG(status)      // Signal that stopped the child
WIFCONTINUED(status)  // True if child was resumed by SIGCONT (reported only with WCONTINUED)
Options:
- WNOHANG: Return immediately (returning 0) if no child has changed state (non-blocking)
- WUNTRACED: Also report stopped children
- WCONTINUED: Also report continued children
2.4 Zombies and Orphans
Zombies
A zombie is a terminated process that hasn't been reaped by its parent:
ZOMBIE PROCESS LIFECYCLE

  Parent                                   Child
    |                                           |
    | fork()                                    |
    +------------------------------------------>|
    |                                           | (runs...)
    |                                           |
    |                                           | exit(0)
    |                                           v
    |                          +------------------------------+
    |                          | ZOMBIE STATE                 |
    |                          |  - exit code saved           |
    |                          |  - most resources freed      |
    |                          |  - PID still reserved        |
    |                          |  - entry in process table    |
    |                          +------------------------------+
    |                                           |
    | (doing other work, not calling wait)  | (stuck as zombie!)
    |                                           |
    | wait(&status)                             |
    +------------------------------------------>|
    |                          +------------------------------+
    |                          | FULLY REAPED                 |
    |                          |  - PID can be reused         |
    |                          |  - process table entry freed |
    |                          +------------------------------+
Why zombies are bad:
- Each zombie consumes a process table entry (limited kernel resource)
- PIDs are finite (on Linux the limit is /proc/sys/kernel/pid_max, traditionally 32768)
- Long-running servers can exhaust these resources
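On Linux the zombie state is directly observable: the third field of /proc/<pid>/stat is the state letter, and it reads Z between the child's exit and the parent's wait. The sketch below assumes Linux's /proc layout; proc_state() and make_and_reap_zombie() are hypothetical helpers, not project APIs.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Returns the state letter from /proc/<pid>/stat ('R', 'S', 'T', 'Z', ...),
   or '?' if it cannot be read. Linux-specific. */
char proc_state(pid_t pid) {
    char path[64], line[512];
    char state = '?';
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return '?';
    if (fgets(line, sizeof line, f) != NULL) {
        /* Format is "pid (comm) state ..."; comm may contain spaces or ')',
           so locate the LAST ')' and take the letter after the next space. */
        char *close_paren = strrchr(line, ')');
        if (close_paren != NULL && close_paren[1] == ' ')
            state = close_paren[2];
    }
    fclose(f);
    return state;
}

/* Forks a child that exits immediately, polls until it shows up as a
   zombie (or ~1 s passes), then reaps it. Returns the state observed
   just before reaping; 'Z' is expected. */
char make_and_reap_zombie(void) {
    pid_t pid = fork();
    if (pid < 0)
        return '?';
    if (pid == 0)
        _exit(42);                       /* child: terminate at once */
    struct timespec tick = {0, 10 * 1000 * 1000};   /* 10 ms */
    char seen = proc_state(pid);
    for (int i = 0; i < 100 && seen != 'Z'; i++) {
        nanosleep(&tick, NULL);
        seen = proc_state(pid);
    }
    waitpid(pid, NULL, 0);               /* reaping removes the zombie */
    return seen;
}
```

This is the same information `ps` shows as `<defunct>`, and it gives the zombie mode a way to verify its own claim instead of asking the user to run `ps`.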
Orphans
An orphan is a child whose parent has terminated:
  Parent (PID 1000)                     Child (PID 1001)
    |                                           |
    | fork()                                    |
    +------------------------------------------>|
    |                                           |
    | exit(0)                                   | (still running)
    |                                           |
  [Parent dies]                                 |
                               +------------------------------+
                               | ORPHAN                       |
                               |  PPID changes to 1 (init)    |
                               |  init will reap when done    |
                               +------------------------------+
                                            |
                                            | exit(0)
                                            v
                               [init reaps child - no zombie leak]
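Key insight: Orphans are automatically adopted by init (PID 1), or on Linux by the nearest living ancestor registered as a child subreaper, which will reap them. This is why orphans don't cause zombie leaks, but orphaning children without a deliberate reason (such as the double-fork technique) is still bad practice.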
2.5 Signal Fundamentals
Signals are software interrupts that notify a process of an event:
SIGNAL DELIVERY

KERNEL
  Signal sources:
    - Hardware exceptions (SIGSEGV, SIGFPE, SIGBUS)
    - Terminal input (Ctrl-C -> SIGINT, Ctrl-Z -> SIGTSTP)
    - kill() system call from another process
    - Timer expiration (SIGALRM)
    - Child state change (SIGCHLD)
          |
          v
  PENDING SIGNALS (per-process bitmask)
    Signal:    1   2   3   4   5   6   7   8   9  ...  31
             +---+---+---+---+---+---+---+---+---+-----+
    Pending: | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... |
             +---+---+---+---+---+---+---+---+---+-----+
                   ^
                   +-- SIGINT (2) pending
          |
          |  Delivered when:
          |    1. Signal not blocked
          |    2. Process scheduled to run
          v
USER PROCESS
  SIGNAL HANDLER TABLE
    Signal   | Handler
    ---------+-------------------------------------------
    SIGINT   | my_sigint_handler() or SIG_DFL or SIG_IGN
    SIGCHLD  | my_sigchld_handler()
    SIGTERM  | SIG_DFL (terminate)
    ...      | ...
Common Signals
| Signal | Default Action | Description |
|---|---|---|
| SIGINT | Terminate | Interrupt from keyboard (Ctrl-C) |
| SIGTERM | Terminate | Termination request |
| SIGKILL | Terminate | Kill (cannot be caught or ignored) |
| SIGSTOP | Stop | Stop process (cannot be caught or ignored) |
| SIGTSTP | Stop | Stop from keyboard (Ctrl-Z) |
| SIGCONT | Continue | Continue if stopped |
| SIGCHLD | Ignore | Child stopped or terminated |
| SIGSEGV | Terminate + core | Invalid memory reference |
| SIGALRM | Terminate | Timer expired |
| SIGPIPE | Terminate | Write to pipe with no reader |
2.6 Signal Delivery and Handling
Signal States
SIGNAL STATE DIAGRAM

  SENT                     PENDING                  DELIVERED
  (kill()/raise(),                                  (process scheduled,
   hardware trap)                                    signal unblocked)

+--------------+       +--------------+          +--------------+
|    Kernel    |------>|    Kernel    |--------->|     User     |
|   records    |       |   pending    |          |   handler    |
|    signal    |       |   bit set    |          |     runs     |
+--------------+       +--------------+          +--------------+

NOTE: A signal is pending but NOT delivered if:
  1. Signal is blocked (in signal mask)
  2. Process is not running (not scheduled)
Critical insight: Signals are delivered between instructions, not during them. When a signal is delivered, the kernel:
- Saves the current execution context
- Pushes a frame onto the user stack
- Jumps to the signal handler
- When the handler returns, the sigreturn system call restores the original context
2.7 Signal Masks and Blocking
Every process has a signal mask - a set of signals currently blocked:
sigset_t mask, prev_mask;
// Initialize an empty set
sigemptyset(&mask);
// Add SIGCHLD to the set
sigaddset(&mask, SIGCHLD);
// Block SIGCHLD (add to process's signal mask)
sigprocmask(SIG_BLOCK, &mask, &prev_mask);
// ... critical section - SIGCHLD won't be delivered here ...
// Restore previous mask (unblock SIGCHLD)
sigprocmask(SIG_SETMASK, &prev_mask, NULL);
SIGNAL BLOCKING MECHANISM

Process signal mask (what's blocked):
    +--------+---------+-----+---------+-----+
    | SIGINT | SIGQUIT | ... | SIGCHLD | ... |
    +--------+---------+-----+---------+-----+
    |   0    |    0    | ... |    1    | ... |   (SIGCHLD blocked)
    +--------+---------+-----+---------+-----+
        ^
        |  SIG_BLOCK   adds to mask
        |  SIG_UNBLOCK removes from mask
        |  SIG_SETMASK replaces mask

Pending signals:
    +--------+---------+-----+---------+-----+
    |   0    |    0    | ... |    1    | ... |   (SIGCHLD pending)
    +--------+---------+-----+---------+-----+

Deliverable = Pending AND NOT Blocked:
    +--------+---------+-----+---------+-----+
    |   0    |    0    | ... |    0    | ... |   (nothing deliverable now)
    +--------+---------+-----+---------+-----+

When the mask is cleared, SIGCHLD will be delivered.
2.8 Async-Signal-Safe Functions
The critical safety rule: Signal handlers can interrupt the main program at any point. If the handler calls a function that the main program was in the middle of, corruption can occur.
THE ASYNC-SIGNAL-SAFETY PROBLEM

Main program:                         Signal handler:
  |
  | malloc() {
  |   // modifying internal
  |   // data structures...
  |
  |------- SIGNAL DELIVERED -------->+
  |                                  | malloc() {
  |                                  |   // CORRUPTS the same
  |                                  |   // data structures!
  |                                  | }
  |<------ HANDLER RETURNS ----------+
  |
  |   // continues with
  |   // corrupted state!
  | }
Async-signal-safe functions are functions that can be safely called from signal handlers. The POSIX standard defines a specific list:
Safe to call in handlers:
- _exit() (NOT exit())
- write() (NOT printf())
- signal(), sigaction(), sigprocmask()
- waitpid()
- fork(), execve(), kill()
- open(), close(), read()
- Most simple system calls (check the POSIX list before relying on one)
NOT safe (never call from handlers):
- printf(), fprintf() - use buffered I/O
- malloc(), free() - manipulate internal data structures
- exit() - runs atexit handlers and flushes stdio buffers
- Most standard library functions
Writing to stdout safely from a handler:
// WRONG:
void handler(int sig) {
printf("Caught signal %d\n", sig); // NOT async-signal-safe!
}
// RIGHT:
void handler(int sig) {
const char msg[] = "Caught signal\n";
write(STDOUT_FILENO, msg, sizeof(msg) - 1); // Safe!
}
2.9 Race Conditions in Signal Handling
Race conditions occur when the correctness of a program depends on the timing of events:
THE CLASSIC SIGCHLD RACE

Scenario: Parent forks a child and adds it to a job list

INCORRECT (has race):
  Parent                                   Child
    |
    | pid = fork()
    |--------------------------------------->+
    |                                        | (child runs)
    |                                        | (child exits IMMEDIATELY)
    |<--------- SIGCHLD DELIVERED ----------+
    |
    | SIGCHLD handler runs:
    |   deletejob(pid)  // JOB NOT FOUND! addjob() hasn't run yet
    |
    | addjob(pid)       // Adds a stale job: the child was already reaped!
    v

CORRECT (race avoided):
  Parent                                   Child
    |
    | sigprocmask(SIG_BLOCK, SIGCHLD)
    |
    | pid = fork()
    |--------------------------------------->+
    |                                        | (child may exit)
    | addjob(pid)       // SAFE - SIGCHLD is blocked!
    |
    | sigprocmask(SIG_UNBLOCK, SIGCHLD)
    |<--------- SIGCHLD NOW DELIVERED ------+
    |
    | SIGCHLD handler runs:
    |   deletejob(pid)  // WORKS - job was added!
    v
2.10 Reentrant Functions
A function is reentrant if it can be safely interrupted and called again before the first invocation completes:
REENTRANCY ILLUSTRATED

Non-reentrant function (global state):
  int global_counter = 0;
  int increment() {
      int temp = global_counter;  // <-- A signal here whose handler
      temp = temp + 1;            //     calls increment() again
      global_counter = temp;      //     loses that handler's update,
      return global_counter;      //     corrupting state
  }

Reentrant version (local state only):
  int increment(int *counter) {
      return ++(*counter);        // No hidden global state: safe as long as
  }                               // each caller passes its own counter
Making handlers safe:
- Use only async-signal-safe functions
- Save and restore errno
- Protect global data structures with signal blocking
- Keep handlers simple - set a volatile sig_atomic_t flag and return
2.11 setjmp/longjmp for Non-Local Control
setjmp and longjmp provide a way to jump directly from one function to another, bypassing normal call/return:
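#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

// sigsetjmp/siglongjmp (not plain setjmp/longjmp) also restore the signal
// mask, which matters when jumping out of a handler
static sigjmp_buf buf;

void handler(int sig) {
    (void)sig;
    siglongjmp(buf, 1);  // Jump back to sigsetjmp with return value 1
}

int main(void) {
    // Save the context BEFORE installing the handler, so the handler can
    // never jump to an uninitialized buffer
    if (sigsetjmp(buf, 1) == 0) {
        signal(SIGINT, handler);
        // First time through (normal flow)
        while (1) {
            pause();  // Do work... (here: sleep until a signal arrives)
        }
    } else {
        // Got here via siglongjmp from handler
        printf("Interrupted, cleaning up\n");
    }
    return 0;
}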
SETJMP/LONGJMP CONTROL FLOW

main()
  |
  | setjmp(buf) saves context:
  |   - Stack pointer
  |   - Program counter
  |   - Callee-saved registers
  |   - Signal mask (if sigsetjmp)
  |
  | returns 0 (first time)
  |
  | ... program runs, calls other functions ...
  |
  | deep_function()
  |     |
  |     |<--- SIGINT
  |     |
  |   handler() {
  |       longjmp(buf, 1); ---------+
  |   }                             |
  |                                 | (stack frames unwound,
  |                                 |  destructors NOT called!)
  |<--------------------------------+
  |
  | setjmp returns 1 (second time)
  v
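Dangers of longjmp:
- Stack frames are discarded; destructors (in C++) and other cleanup code are not run
- Can leave data structures in an inconsistent state
- sigsetjmp/siglongjmp are needed to properly save/restore the signal mask
- Non-volatile local variables modified after setjmp have indeterminate values after the jump
- Jumping to a setjmp point whose function has already returned is undefined behavior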
2.12 Proper Error Handling with Signals
Signals can interrupt system calls. When this happens, the call may:
- Return an error with errno = EINTR
- Automatically restart, if the handler was installed with the SA_RESTART flag
// Handling EINTR properly:
ssize_t safe_read(int fd, void *buf, size_t count) {
ssize_t n;
while ((n = read(fd, buf, count)) < 0) {
if (errno == EINTR)
continue; // Interrupted - retry
else
return -1; // Real error
}
return n;
}
// Or use SA_RESTART when installing handler:
struct sigaction sa;
sa.sa_handler = handler;
sigemptyset(&sa.sa_mask);
sa.sa_flags = SA_RESTART; // Automatically restart interrupted calls
sigaction(SIGCHLD, &sa, NULL);
3. Project Specification
3.1 What You Will Build
A process sandbox harness that:
- Spawns child processes in controlled modes
- Monitors and logs all exceptional control flow events
- Demonstrates zombies, orphans, and signal races
- Provides tools for understanding and preventing these issues
3.2 Functional Requirements
Core Features
- Process Spawner (sandbox spawn):
  - Normal exit mode (child exits cleanly)
  - Crash mode (child triggers SIGSEGV, SIGFPE, etc.)
  - Stop/continue mode (child responds to SIGSTOP/SIGCONT)
  - Timeout mode (child killed after timeout)
  - Zombie demonstration mode
  - Orphan demonstration mode
- Event Logger:
  - Timestamp all events to microsecond precision
  - Log: fork, exec, signal sent, signal delivered, state changes, wait results
  - Output a machine-parseable format (for analysis)
- Signal Handler Laboratory (sandbox signals):
  - Demonstrate async-signal-safety violations
  - Show race condition scenarios
  - Illustrate proper vs improper signal blocking
- Race Condition Demonstrator (sandbox race):
  - Show the SIGCHLD/addjob race
  - Demonstrate how to fix it
  - Provide before/after comparison
3.3 Example Usage and Output
$ ./sandbox spawn --mode=normal
=== PROCESS SANDBOX: Normal Exit Mode ===
[00:00:00.000001] PARENT (PID 1000): Starting sandbox
[00:00:00.000045] PARENT (PID 1000): Forking child
[00:00:00.000089] PARENT (PID 1000): fork() returned 1001
[00:00:00.000091] CHILD (PID 1001): Started (PPID=1000)
[00:00:00.000095] CHILD (PID 1001): Executing /bin/echo "Hello, World!"
[00:00:00.000234] CHILD (PID 1001): execve() succeeded
Hello, World!
[00:00:00.001234] PARENT (PID 1000): Received SIGCHLD
[00:00:00.001256] PARENT (PID 1000): waitpid() returned 1001
[00:00:00.001258] PARENT (PID 1000): Child status: exited normally, code=0
Timeline:
fork --> exec --> run --> exit --> SIGCHLD --> reap
Summary: Child executed normally, no zombies created.
$ ./sandbox spawn --mode=zombie --delay=5
=== PROCESS SANDBOX: Zombie Demonstration ===
[00:00:00.000001] PARENT (PID 1000): Starting zombie demo
[00:00:00.000045] PARENT (PID 1000): Forking child
[00:00:00.000089] CHILD (PID 1001): Started
[00:00:00.000091] CHILD (PID 1001): Exiting immediately
[00:00:00.000095] CHILD (PID 1001): exit(42)
[00:00:00.000100] PARENT (PID 1000): NOT calling wait() for 5 seconds...
[Process table during delay:]
$ ps aux | grep sandbox
user 1000 ... sandbox spawn --mode=zombie
user 1001 ... [sandbox] <defunct>   <-- ZOMBIE!
[00:00:05.000100] PARENT (PID 1000): Now calling wait()
[00:00:05.000123] PARENT (PID 1000): waitpid() returned 1001, status=42
[00:00:05.000125] PARENT (PID 1000): Zombie reaped
Explanation:
Between child exit and parent wait(), the child was a ZOMBIE.
- PID 1001 was reserved (couldn't be reused)
- Process table entry consumed resources
- This would be a bug in a long-running server!
Prevention:
1. Always call wait()/waitpid() promptly
2. Use SIGCHLD handler to reap asynchronously
3. Use double-fork to orphan children (let init reap)
$ ./sandbox race --demonstrate
=== SIGNAL RACE CONDITION DEMONSTRATION ===
PART 1: The Race (buggy code)
[00:00:00.000001] Installing SIGCHLD handler (buggy version)
[00:00:00.000010] Forking child...
[00:00:00.000050] fork() returned 1001
[00:00:00.000051] Handler triggered: deletejob(1001)
[00:00:00.000052] ERROR: Job 1001 not in list! (not added yet)
[00:00:00.000053] addjob(1001)   <-- Too late! Handler already ran
PROBLEM: The race occurs because:
1. fork() returns in parent
2. Child runs and exits BEFORE parent reaches addjob()
3. SIGCHLD handler runs and tries to delete non-existent job
4. Parent then adds a job for a child that was already reaped, so the entry is never removed
PART 2: The Fix (correct code)
[00:00:01.000001] Installing SIGCHLD handler (correct version)
[00:00:01.000010] Blocking SIGCHLD before fork...
[00:00:01.000015] Forking child...
[00:00:01.000050] fork() returned 1002
[00:00:01.000055] addjob(1002)   <-- Safe! SIGCHLD blocked
[00:00:01.000060] Unblocking SIGCHLD...
[00:00:01.000062] Handler triggered: deletejob(1002)
[00:00:01.000063] Successfully removed job 1002
SOLUTION: Block SIGCHLD around fork()/addjob() to ensure
the job is added BEFORE the handler can run.
$ ./sandbox signals --test-safety
=== ASYNC-SIGNAL-SAFETY TEST ===
Test 1: Calling printf() from handler (UNSAFE)
[Running 1000 iterations with concurrent signals...]
Result: Crashed after 234 iterations (corrupted stdio buffer)
Test 2: Calling write() from handler (SAFE)
[Running 1000 iterations with concurrent signals...]
Result: All 1000 iterations completed successfully
Test 3: Calling malloc() from handler (UNSAFE)
[Running 1000 iterations with concurrent signals...]
Result: Hung after 567 iterations (malloc lock deadlock)
CONCLUSION:
Async-signal-safety is NOT optional. Use only safe functions in handlers.
3.4 Non-Functional Requirements
- Determinism: Same sequence of events for same inputs (where possible)
- Portability: Linux x86-64 primary target
- Educational: Clear explanations, not just outcomes
- Robustness: No zombie leaks from the sandbox itself
4. Solution Architecture
4.1 High-Level Design
PROCESS SANDBOX

  CLI Parser             Event Logger           Signal Manager
    --mode=zombie          Timestamps             Handler setup
    --delay=5              Event types            Mask control
    --verbose              PID tracking           Safe I/O
        |                      |                      |
        +----------------------+----------------------+
                               |
                               v
  MODE DISPATCHER
    spawn_normal()   spawn_crash()   spawn_zombie()   spawn_race_demo()
          |                |               |                  |
          v                v               v                  v
  PROCESS CONTROLLER
    fork() --> exec() --> wait() --> report()

  REPORT GENERATOR
    Timeline visualization | Explanation text | Prevention tips
4.2 Key Components
| Component | Responsibility | Key Files |
|---|---|---|
| CLI Parser | Parse command-line options, dispatch to modes | main.c, cli.c |
| Event Logger | Thread-safe, signal-safe event recording | logger.c |
| Signal Manager | Install handlers, manage masks, safe I/O | signals.c |
| Process Controller | fork/exec/wait, child modes | process.c |
| Mode Implementations | Specific demo scenarios | modes/*.c |
| Report Generator | Format output, explanations | report.c |
4.3 Data Structures
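#include <signal.h>     /* sig_atomic_t */
#include <sys/types.h>  /* pid_t */
#include <time.h>       /* struct timespec */

#define MAX_EVENTS 1024 /* ring-buffer capacity (illustrative value) */

/* Event types for logging */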
typedef enum {
EVENT_FORK,
EVENT_EXEC,
EVENT_EXIT,
EVENT_SIGNAL_SENT,
EVENT_SIGNAL_RECEIVED,
EVENT_WAIT_STARTED,
EVENT_WAIT_RETURNED,
EVENT_STATE_CHANGE,
EVENT_ERROR
} event_type_t;
/* Single logged event */
typedef struct {
struct timespec timestamp;
event_type_t type;
pid_t pid;
pid_t related_pid; /* For fork: child PID; for signal: sender */
int signal_num; /* For signal events */
int status; /* For wait events */
char message[256]; /* Human-readable description */
} event_t;
/* Event log (ring buffer for async-signal-safe access) */
typedef struct {
event_t events[MAX_EVENTS];
volatile sig_atomic_t head;
volatile sig_atomic_t count;
} event_log_t;
/* Job list entry (for race condition demos) */
typedef struct job {
pid_t pid;
int state; /* RUNNING, STOPPED, DONE */
char cmdline[256];
struct job *next;
} job_t;
/* Sandbox configuration */
typedef struct {
int mode; /* Normal, crash, zombie, etc. */
int delay_seconds; /* For zombie demo */
int verbose; /* Detailed output */
int demonstrate_race; /* Show race condition */
char *child_command; /* What to exec in child */
} sandbox_config_t;
4.4 Signal Handler Design
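#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Signal-safe write wrapper (write() and strlen() are async-signal-safe) */
static void safe_write(const char *msg) {
    ssize_t unused = write(STDOUT_FILENO, msg, strlen(msg));
    (void)unused;
}

/* Signal-safe integer to string, using only the stack.
   buf must hold at least 12 chars (sign + 10 digits + NUL). */
static void itoa_safe(int n, char *buf) {
    char tmp[12];
    int i = 0, j = 0;
    unsigned int u = (n < 0) ? 0u - (unsigned int)n : (unsigned int)n;
    do {
        tmp[i++] = (char)('0' + u % 10);   /* digits in reverse order */
        u /= 10;
    } while (u > 0);
    if (n < 0)
        tmp[i++] = '-';
    while (i > 0)
        buf[j++] = tmp[--i];
    buf[j] = '\0';
}

/* Async-signal-safe SIGCHLD handler */
static volatile sig_atomic_t got_sigchld = 0;
static volatile sig_atomic_t child_pid = 0;
static volatile sig_atomic_t child_status = 0;

void sigchld_handler(int sig) {
    (void)sig;
    int saved_errno = errno;   /* waitpid()/write() may overwrite errno */
    pid_t pid;
    int status;
    char num[12];
    /* Reap ALL available children (signals can coalesce) */
    while ((pid = waitpid(-1, &status, WNOHANG | WUNTRACED)) > 0) {
        /* Only assignments to volatile sig_atomic_t variables */
        child_pid = pid;
        child_status = status;
        got_sigchld = 1;
        /* Log using write(), not printf(); a stopped child is not reaped */
        safe_write(WIFSTOPPED(status) ? "[HANDLER] Child stopped: "
                                      : "[HANDLER] Reaped child: ");
        itoa_safe((int)pid, num);
        safe_write(num);
        safe_write("\n");
    }
    errno = saved_errno;       /* Restore errno */
}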
5. Implementation Guide
5.1 Development Environment Setup
# Required: Linux with standard development tools
sudo apt-get install build-essential gdb
# Recommended: Valgrind for memory checking
sudo apt-get install valgrind
# Create project structure
mkdir -p sandbox/{src,include,tests,docs}
cd sandbox
5.2 Project Structure
sandbox/
|-- src/
|   |-- main.c          # Entry point, CLI parsing
|   |-- logger.c        # Async-signal-safe event logging
|   |-- signals.c       # Signal handler installation/management
|   |-- process.c       # fork/exec/wait wrappers
|   |-- modes/
|   |   |-- normal.c    # Normal exit demonstration
|   |   |-- crash.c     # Crash (signal) demonstration
|   |   |-- zombie.c    # Zombie demonstration
|   |   |-- orphan.c    # Orphan demonstration
|   |   |-- race.c      # Race condition demonstration
|   |   `-- safety.c    # Async-signal-safety tests
|   `-- report.c        # Output formatting
|-- include/
|   |-- sandbox.h       # Public interface
|   |-- logger.h        # Logging interface
|   |-- signals.h       # Signal utilities
|   `-- process.h       # Process control interface
|-- tests/
|   |-- test_logger.c   # Logger unit tests
|   |-- test_signals.c  # Signal handling tests
|   `-- stress_test.c   # Concurrent stress tests
|-- Makefile
`-- README.md
5.3 Implementation Phases
Phase 1: Foundation (Days 1-3)
Goals:
- Set up build system
- Implement async-signal-safe logger
- Basic process creation
Tasks:
- Create Makefile with proper compiler flags (-Wall -Werror -g)
- Implement safe_write() and basic logging
- Implement spawn_child() that forks and execs a command
- Implement basic SIGCHLD handler
Checkpoint: Can fork a child, child runs /bin/echo, parent logs events.
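/* Phase 1 checkpoint test. install_sigchld_handler() and log_event()
   are this project's own helpers. */
int main(void) {
    install_sigchld_handler();

    /* Block SIGCHLD so the handler can't reap the child before our waitpid() */
    sigset_t mask, prev;
    sigemptyset(&mask);
    sigaddset(&mask, SIGCHLD);
    sigprocmask(SIG_BLOCK, &mask, &prev);

    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return 1;
    }
    if (pid == 0) {
        /* Child: restore the original mask, then exec */
        sigprocmask(SIG_SETMASK, &prev, NULL);
        execlp("echo", "echo", "Hello from child", (char *)NULL);
        _exit(1); /* Use _exit, not exit, after failed exec */
    }
    /* Parent */
    log_event(EVENT_FORK, getpid(), pid, 0, 0, "Forked child");
    int status;
    waitpid(pid, &status, 0);
    log_event(EVENT_WAIT_RETURNED, getpid(), pid, 0, status, "Child reaped");
    sigprocmask(SIG_SETMASK, &prev, NULL);
    return 0;
}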
Phase 2: Zombie and Orphan Modes (Days 4-6)
Goals:
- Demonstrate zombie processes
- Demonstrate orphan processes
- Clear explanations of each
Tasks:
- Implement spawn_zombie() that delays reaping
- Add shell command to show process state during delay
- Implement spawn_orphan() that has parent exit first
- Track PPID changes in orphan case
Checkpoint: ./sandbox spawn --mode=zombie shows defunct process in ps.
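/* Zombie demonstration core logic.
   Assumes no SIGCHLD handler is reaping children in the background;
   otherwise the child would be reaped before it can be observed. */
void demonstrate_zombie(int delay_seconds) {
    fflush(stdout);  /* don't let the child inherit unflushed output */
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return;
    }
    if (pid == 0) {
        /* Child exits immediately */
        log_event(EVENT_EXIT, getpid(), 0, 0, 42, "Child exiting");
        _exit(42);
    }
    /* Parent delays before reaping */
    printf("Child PID %d is now a zombie. Check with: ps aux | grep %d\n", pid, pid);
    printf("Waiting %d seconds before reaping...\n", delay_seconds);
    sleep(delay_seconds);
    int status;
    if (waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        printf("Zombie reaped. Exit status: %d\n", WEXITSTATUS(status));
}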
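Phase 2's task list also asks for spawn_orphan(), which is not shown. A sketch under stated assumptions (Linux/POSIX; the function name and the pipe protocol are hypothetical): the child forks a grandchild and exits at once; the grandchild waits until its PPID changes and reports its old and new parent through a pipe. The new PPID is 1 (init) unless a subreaper adopted it, so the sketch only checks that the PPID changed.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Orphans a grandchild and reports the PPID it had before and after its
   parent exited. Returns 0 on success, -1 on error. */
int spawn_orphan(pid_t *before, pid_t *after) {
    int fds[2];
    if (pipe(fds) < 0)
        return -1;
    pid_t child = fork();
    if (child < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (child == 0) {
        pid_t me = getpid();             /* the grandchild's original parent */
        if (fork() == 0) {
            /* Grandchild: wait (up to ~3 s) for our parent to disappear */
            pid_t ppids[2] = { me, getppid() };
            struct timespec tick = {0, 10 * 1000 * 1000};   /* 10 ms */
            for (int i = 0; i < 300 && ppids[1] == me; i++) {
                nanosleep(&tick, NULL);
                ppids[1] = getppid();
            }
            ssize_t w = write(fds[1], ppids, sizeof ppids);
            (void)w;
            _exit(0);
        }
        _exit(0);                        /* child exits first: orphan created */
    }
    close(fds[1]);
    waitpid(child, NULL, 0);             /* reap the child, not the grandchild */
    pid_t ppids[2];
    ssize_t n = read(fds[0], ppids, sizeof ppids);   /* blocks until reported */
    close(fds[0]);
    if (n != (ssize_t)sizeof ppids)
        return -1;
    *before = ppids[0];
    *after = ppids[1];
    return 0;
}
```

Recording the original parent's PID in the child before forking (rather than calling getppid() in the grandchild) avoids a race in which the child exits before the grandchild first looks.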
Phase 3: Signal Handling Modes (Days 7-9)
Goals:
- Stop/continue demonstration
- Crash handling
- Async-signal-safety testing
Tasks:
- Implement stop/continue mode using SIGSTOP/SIGCONT
- Implement crash mode (trigger SIGSEGV, catch in parent)
- Create async-signal-safety test suite
- Demonstrate safe vs unsafe handler functions
Checkpoint: Can send SIGSTOP, see child stop, send SIGCONT, see child resume.
/* Stop/continue demonstration */
void demonstrate_stop_continue(void) {
pid_t pid = fork();
if (pid == 0) {
/* Child: loop forever, parent controls */
while (1) {
printf("Child running...\n");
sleep(1);
}
}
/* Parent */
sleep(2);
printf("Sending SIGSTOP to child %d\n", pid);
kill(pid, SIGSTOP);
/* Wait for stop, using WUNTRACED */
int status;
waitpid(pid, &status, WUNTRACED);
if (WIFSTOPPED(status)) {
printf("Child stopped by signal %d\n", WSTOPSIG(status));
}
sleep(2);
printf("Sending SIGCONT to child %d\n", pid);
kill(pid, SIGCONT);
/* Let child run a bit more, then terminate */
sleep(2);
kill(pid, SIGTERM);
waitpid(pid, &status, 0);
}
Phase 4: Race Condition Demonstration (Days 10-12)
Goals:
- Show the classic SIGCHLD/addjob race
- Demonstrate the fix with signal blocking
- Clear before/after comparison
Tasks:
- Implement simple job list
- Create buggy version that races
- Create fixed version with signal blocking
- Run many iterations to expose race
Checkpoint: Buggy version fails detectably; fixed version succeeds.
/* Race condition demonstration (addjob() is the project's job-list helper) */
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Buggy version */
void spawn_buggy(void) {
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return;
    }
    if (pid == 0) {
        _exit(0); /* Exit immediately to maximize race window */
    }
    /* Race: SIGCHLD might arrive before addjob() */
    addjob(pid); /* BUG: handler might run first! */
}

/* Fixed version */
void spawn_fixed(void) {
    sigset_t mask, prev;
    sigemptyset(&mask);
    sigaddset(&mask, SIGCHLD);
    /* Block SIGCHLD */
    sigprocmask(SIG_BLOCK, &mask, &prev);
    pid_t pid = fork();
    if (pid == 0) {
        /* Child inherits the blocked mask: restore it (a real child does this before exec) */
        sigprocmask(SIG_SETMASK, &prev, NULL);
        _exit(0);
    }
    /* Parent: safe to add job - SIGCHLD is blocked */
    if (pid > 0)
        addjob(pid);
    else
        perror("fork");
    /* Unblock SIGCHLD - handler runs now if signal pending */
    sigprocmask(SIG_SETMASK, &prev, NULL);
}
Phase 5: Polish and Integration (Days 13-14)
Goals:
- Complete CLI interface
- Comprehensive output formatting
- Documentation and explanations
Tasks:
- Integrate all modes into unified CLI
- Add timeline visualization
- Add explanation text for each mode
- Write comprehensive tests
- Document all features
Checkpoint: All modes work, output is educational, no memory leaks.
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | Test individual functions | Logger, signal mask helpers |
| Integration Tests | Test mode behaviors | Zombie demo produces correct output |
| Stress Tests | Expose race conditions | Rapid fork/exit cycles |
| Safety Tests | Verify signal safety | Handler doesn't corrupt state |
6.2 Critical Test Scenarios
Scenario 1: Zombie Detection
#!/bin/bash
# Test that a zombie is actually created by the sandbox (not some unrelated process)
./sandbox spawn --mode=zombie --delay=3 &
PARENT_PID=$!
sleep 1
# Look for a child of the sandbox in state Z (defunct); --ppid is procps (Linux) syntax
if ! ps -o stat= --ppid "$PARENT_PID" | grep -q Z; then
    echo "FAIL: No zombie found"
    kill "$PARENT_PID" 2>/dev/null
    exit 1
fi
wait "$PARENT_PID"
echo "PASS: Zombie was created and reaped"
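From C (for example inside a test harness), a Linux-specific alternative to parsing ps output is to read the state letter from /proc/<pid>/stat, whose third field is 'Z' for a zombie. proc_state() is a hypothetical helper; because the second field (the command name in parentheses) may itself contain spaces or ')', the sketch searches for the last ')'.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Linux-specific: returns the one-letter state ('R', 'S', 'Z', 'T', ...)
 * from /proc/<pid>/stat, or '?' if it cannot be read. */
char proc_state(pid_t pid) {
    char path[64], buf[512];
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);
    FILE *f = fopen(path, "r");
    if (!f)
        return '?';
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    /* Field 2 is "(comm)"; the state follows the LAST ')' and one space */
    char *p = strrchr(buf, ')');
    if (!p || p[1] != ' ' || p[2] == '\0')
        return '?';
    return p[2];
}
```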
Scenario 2: Signal Race Stress Test
/* Stress test for race condition fix */
void test_race_fix(void) {
int failures = 0;
for (int i = 0; i < 1000; i++) {
reset_job_list();
spawn_fixed(); /* Should never fail */
/* Give handler time to run */
usleep(1000);
/* Verify job was properly handled */
if (!job_list_is_consistent()) {
failures++;
}
}
printf("Race fix test: %d/1000 iterations failed\n", failures);
assert(failures == 0);
}
Scenario 3: Async-Signal-Safety Test
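/* Try to provoke problems from an unsafe handler (results vary run to run) */
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Deliberately UNSAFE: printf() is not async-signal-safe */
static void unsafe_handler(int sig) {
    printf("Handler got signal %d\n", sig);
}

void test_unsafe_handler(void) {
    /* Install handler that calls printf (UNSAFE) */
    signal(SIGUSR1, unsafe_handler);
    fflush(stdout);
    /* Send signals rapidly while calling printf */
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return;
    }
    if (pid == 0) {
        for (int i = 0; i < 1000; i++) {
            kill(getppid(), SIGUSR1);
            usleep(100);
        }
        _exit(0);
    }
    /* This may hang, crash, or garble output, but none of that is guaranteed */
    for (int i = 0; i < 1000; i++) {
        printf("Iteration %d\n", i);
        usleep(100);
    }
    waitpid(pid, NULL, 0);   /* reap the signal sender */
    /* If we get here, test didn't trigger the bug (not guaranteed) */
    printf("WARNING: Bug not triggered in this run\n");
}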
6.3 Test Data and Expected Results
| Test | Input | Expected Outcome |
|---|---|---|
| Normal exit | --mode=normal | Child exits 0, no zombie |
| Zombie | --mode=zombie --delay=3 | Defunct process visible for 3s |
| Crash | --mode=crash --signal=SIGSEGV | Child terminates by signal |
| Race buggy | --race --buggy | Some iterations fail |
| Race fixed | --race --fixed | All iterations succeed |
7. Common Pitfalls and Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Using exit() after failed exec() | atexit handlers run and copied stdio buffers are flushed again in the child | Use _exit() after exec failure |
| Calling printf() in handler | Hangs, crashes, garbled output | Use only async-signal-safe functions such as write() |
| Not saving errno in handler | Mysterious failures in main program | Save/restore errno in handler |
| Single waitpid() for multiple children | Zombies accumulate | Loop with WNOHANG until no more |
| Not blocking signals around fork | Race conditions | Block SIGCHLD before fork |
| Forgetting WUNTRACED | Stopped children not detected | Use WUNTRACED to catch stops |
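The first pitfall is easiest to see in the fork/exec/wait pattern itself. A minimal sketch with a hypothetical run_command() helper: the child calls _exit(127) if execvp() fails (127 mirrors the shell's "command not found" convention), and the parent decodes the status with WIFEXITED()/WIFSIGNALED(), mapping a death by signal to 128 + signal number, again following the shell convention.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the child's exit status, 128 + signal if it was killed,
 * or -1 if fork/wait failed. Status 127 means exec itself failed. */
int run_command(char *const argv[]) {
    fflush(stdout);                    /* avoid duplicating buffered output in the child */
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        /* Only reached if exec failed. _exit() skips atexit handlers and
         * does not flush the stdio buffers copied from the parent. */
        perror("execvp");
        _exit(127);
    }
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;                 /* retry only if a handler interrupted us */
    }
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return -1;
}
```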
7.2 Signal Race Debugging
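/* Adding debug output to expose races */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* snprintf() is not on the async-signal-safe list, so format integers by hand */
static size_t fmt_long(char *out, long v) {
    char tmp[24];
    size_t n = 0, len = 0;
    unsigned long u = (v < 0) ? 0UL - (unsigned long)v : (unsigned long)v;
    do { tmp[n++] = (char)('0' + u % 10); u /= 10; } while (u);
    if (v < 0) out[len++] = '-';
    while (n) out[len++] = tmp[--n];
    return len;
}

void debug_handler(int sig) {
    int saved_errno = errno;
    /* Async-signal-safe debug output: memcpy(), time() and write() only */
    char buf[100];
    size_t len = 0;
    memcpy(buf, "[HANDLER] sig=", 14); len += 14;
    len += fmt_long(buf + len, sig);
    memcpy(buf + len, ", time=", 7); len += 7;
    /* time() has 1 s resolution; clock_gettime(CLOCK_MONOTONIC) is also safe and finer */
    len += fmt_long(buf + len, (long)time(NULL));
    buf[len++] = '\n';
    write(STDERR_FILENO, buf, len);
    /* ... handler logic ... */
    errno = saved_errno;
}

/* In main code, also log with timestamps (stdio is fine outside handlers) */
void spawn_with_debug(void) {
    fprintf(stderr, "[MAIN] Before fork, time=%ld\n", (long)time(NULL));
    pid_t pid = fork();
    if (pid == 0)
        _exit(0);   /* child must not fall through into the parent's code */
    fprintf(stderr, "[MAIN] After fork, pid=%d, time=%ld\n", (int)pid, (long)time(NULL));
    if (pid > 0) {
        fprintf(stderr, "[MAIN] Before addjob, time=%ld\n", (long)time(NULL));
        addjob(pid);
        fprintf(stderr, "[MAIN] After addjob, time=%ld\n", (long)time(NULL));
    }
}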
7.3 Zombie Debugging
# Find zombies on the system (STAT column starts with Z)
ps -eo pid,ppid,stat,cmd | awk 'NR == 1 || $3 ~ /^Z/'
# See the process tree under the parent (PPID column above)
pstree -p <parent_pid>
# Use strace to see what parent is doing
strace -p <parent_pid>
# Check where the parent is blocked in the kernel
cat /proc/<parent_pid>/wchan
7.4 Signal Delivery Debugging
# See what signals are blocked/pending
cat /proc/<pid>/status | grep -E "Sig(Blk|Pnd|Ign|Cgt)"
# Trace signals with strace
strace -e signal ./sandbox
# Send signals manually
kill -SIGCHLD <pid>
kill -STOP <pid>
kill -CONT <pid>
8. Extensions and Challenges
8.1 Beginner Extensions
- Add color output: Use ANSI codes for different event types
- JSON logging: Machine-readable event format
- Multiple children: Spawn and track N children simultaneously
8.2 Intermediate Extensions
- Process group demo: Show how SIGINT affects process groups
- Terminal control: Demonstrate foreground/background job control
- Timer-based timeout: Use SIGALRM for timeouts
- Signal queue: Show how signals coalesce (send 10, receive fewer)
8.3 Advanced Extensions
- Full job control shell: Extend to mini-shell with fg/bg
- ptrace-based tracing: Intercept all system calls
- Namespace isolation: Use Linux namespaces for sandbox
- Signal forwarding proxy: Parent intercepts and logs all child signals
9. Real-World Connections
9.1 Industry Applications
- Process supervisors (systemd, supervisord): Manage service lifecycles
- Container runtimes (Docker, containerd): Process isolation and control
- Job schedulers (Slurm, PBS): Batch job management
- Debuggers (GDB, LLDB): Use ptrace and signals for control
- Shell implementations (bash, zsh): Job control using these primitives
9.2 Related Open Source Projects
- libuv: Cross-platform async I/O with process management
- supervisord: Python process control system
- runit: Simple Unix init scheme with process supervision
- s6: Small, secure supervision suite
9.3 Interview Relevance
This project prepares you for questions like:
- "What is a zombie process and how do you prevent them?"
- "Explain the fork/exec/wait pattern"
- "How do you handle signals safely?"
- "What is async-signal-safety?"
- "Describe a race condition you've debugged"
- "How does job control work in a shell?"
10. Resources
10.1 Essential Reading
- CS:APP Chapter 8: "Exceptional Control Flow" - The primary reference
- OSTEP Chapters on Processes: ostep.org
- The Linux Programming Interface by Michael Kerrisk: Chapters 20-22 (Signals), Chapters 24-28 (Process Creation, Termination, and Program Execution)
- Advanced Programming in the UNIX Environment by Stevens: Chapters 8-10
10.2 Man Pages
man 2 fork
man 2 execve
man 2 wait
man 2 waitpid
man 7 signal
man 2 sigaction
man 2 sigprocmask
man 3 sigsetjmp
man 7 signal-safety # List of async-signal-safe functions
10.3 Online Resources
10.4 Related Projects in This Series
- Previous: P10: ELF Link Map & Interposition - Understanding program execution
- Next: P12: Unix Shell with Job Control - Apply these concepts to build a real shell
11. Self-Assessment Checklist
Before considering this project complete, verify:
Understanding
- I can explain the process state diagram without looking at notes
- I can describe exactly what happens when fork() is called
- I understand why zombies occur and how to prevent them
- I can explain async-signal-safety and name 5 safe functions
- I understand the SIGCHLD/addjob race and how signal blocking fixes it
- I can describe when and why to use sigsetjmp/siglongjmp
Implementation
- Logger is fully async-signal-safe (no printf/malloc in handlers)
- Zombie demonstration actually shows defunct process in ps
- Race condition demonstration reliably shows bug and fix
- All children are properly reaped (no zombie leaks)
- Signal masks are correctly managed around fork()
Debugging Skills
- I can use ps to identify zombie processes
- I can use strace to trace signal delivery
- I can inspect /proc/<pid>/status for signal state
- I can diagnose async-signal-safety violations
Growth
- I caught and fixed at least one subtle signal race in my code
- I can explain my handler design decisions
- I feel confident using fork/exec/wait in future projects
13. Real World Outcome
When you complete this project, you will have a comprehensive signals and process control sandbox. Here is exactly what running your tool will look like:
Basic Process Lifecycle Demo
$ ./procsandbox --demo lifecycle
====================================================================
  PROCESS LIFECYCLE DEMONSTRATION
  Normal Exit Scenario
====================================================================
[00.000] PARENT (PID 12345): Starting demonstration
[00.001] PARENT (PID 12345): About to fork()...
[00.002] PARENT (PID 12345): fork() returned 12346 (child PID)
[00.002] CHILD (PID 12346): I exist! My parent is 12345
[00.003] CHILD (PID 12346): Doing work for 2 seconds...
[02.005] CHILD (PID 12346): Work complete, calling exit(42)
[02.005] PARENT (PID 12345): Received SIGCHLD!
[02.006] PARENT (PID 12345): waitpid() returned 12346
[02.006] PARENT (PID 12345): Child exited normally with status 42
Timeline:
--------------------------------------------------------------------
0s                      1s                      2s
|-----------------------|-----------------------|
v fork()                                        v exit(42)
+-----------------------------------------------+
|                 CHILD RUNNING                 |
+-----------------------------------------------+
                                                v SIGCHLD received
                                                v waitpid() reaps child
--------------------------------------------------------------------
Zombie Process Demonstration
$ ./procsandbox --demo zombie
====================================================================
  ZOMBIE PROCESS DEMONSTRATION
====================================================================
[00.000] PARENT (PID 12345): Creating a child that will become a zombie
[00.001] CHILD (PID 12346): I'm about to exit immediately!
[00.002] CHILD (PID 12346): Exiting with status 0
[00.003] PARENT (PID 12345): Child exited, but I'm NOT calling wait()...
[00.003] PARENT (PID 12345): Sleeping for 10 seconds. Check process table!
RUN THIS IN ANOTHER TERMINAL:

  $ ps aux | grep 12346
  user  12346  0.0  0.0  0  0 ?  Z  10:00  0:00 [procsandbox] <defunct>
                                 ^
                                 |
                          ZOMBIE! (Z state)
[10.005] PARENT (PID 12345): Now calling wait() to reap the zombie...
[10.006] PARENT (PID 12345): waitpid() returned 12346 - zombie reaped!
What just happened:
1. Child exited but parent didn't call wait()
2. Kernel kept child's exit status in process table
3. Child appeared as 'Z' (zombie) state in ps
4. When parent finally called wait(), zombie was reaped
5. Zombies waste kernel resources - always reap your children!
Signal Handler Race Condition Demo
$ ./procsandbox --demo race
====================================================================
  SIGNAL HANDLER RACE CONDITION
====================================================================
Demonstrating the classic add-job-before-reap race...
BUGGY VERSION (race condition):
--------------------------------------------------------------------
[00.000] Parent: About to fork
[00.001] Parent: fork() returned 12346
[00.001] Child 12346: Running and exiting immediately
[00.002] Child 12346: exit(0)
[00.002] *** SIGCHLD received! Entering handler... ***
[00.002] Handler: Looking for job with PID 12346...
[00.002] Handler: ERROR - job not found in table!
[00.003] Parent: Now adding job 12346 to table... (TOO LATE!)
[00.003] Parent: Job 12346 stays in the table forever, but the child is already gone - STALE JOB!
Race Timeline:

  Parent:   fork() ----+------------------------ addjob() --->
                       |                            ^
  Child:               +-- exit()                   |
                             |               SIGCHLD arrives BEFORE
  Handler:                   +-- can't find job!   addjob() runs!
FIXED VERSION (with sigprocmask):
--------------------------------------------------------------------
[00.000] Parent: Blocking SIGCHLD before fork
[00.001] Parent: fork() returned 12347
[00.001] Child 12347: Running and exiting immediately
[00.002] Child 12347: exit(0)
[00.002] (SIGCHLD is BLOCKED - handler cannot run yet)
[00.003] Parent: Adding job 12347 to table...
[00.003] Parent: Unblocking SIGCHLD
[00.003] *** SIGCHLD received! Entering handler... ***
[00.003] Handler: Looking for job with PID 12347... FOUND!
[00.004] Handler: Successfully reaped job 12347
[00.004] SUCCESS - No race, no zombie!
Orphan Process Demonstration
$ ./procsandbox --demo orphan
====================================================================
  ORPHAN PROCESS DEMONSTRATION
====================================================================
[00.000] PARENT (PID 12345): Creating a child that will become an orphan
[00.001] CHILD (PID 12346): My parent is 12345
[00.002] PARENT (PID 12345): I'm exiting early, abandoning my child!
[00.003] CHILD (PID 12346): Still running... checking my parent...
[00.004] CHILD (PID 12346): My parent is now 1 (init/systemd adopted me!)
[00.005] CHILD (PID 12346): I'm an orphan process
[00.006] CHILD (PID 12346): When I exit, init will reap me
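Key insight: When a parent exits, orphaned children are adopted by PID 1
(init or systemd), or on Linux by the nearest ancestor that has marked itself a
child subreaper with prctl(PR_SET_CHILD_SUBREAPER). The adopter reaps them when they exit.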
Signal Coalescing Demo
$ ./procsandbox --demo coalesce
====================================================================
  SIGNAL COALESCING DEMONSTRATION
====================================================================
Creating 10 children that will all exit nearly simultaneously...
[00.000] Forked children: 12346 12347 12348 12349 12350
12351 12352 12353 12354 12355
[00.001] All children exiting now!
[00.002] SIGCHLD received - handler invoked 1 time
BUGGY handler (single waitpid call):
Reaped: 12346
MISSED: 12347 12348 12349 12350 12351 12352 12353 12354 12355
Result: 9 ZOMBIE PROCESSES!
CORRECT handler (waitpid loop):
while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
// reap child
}
Reaped: 12346 12347 12348 12349 12350 12351 12352 12353 12354 12355
Result: All children reaped, no zombies!
Lesson: Standard signals don't queue! Multiple pending SIGCHLDs coalesce into one.
Always use a loop to reap all available children.
Async-Signal-Safety Test
$ ./procsandbox --demo async-safety
====================================================================
  ASYNC-SIGNAL-SAFETY DEMONSTRATION
====================================================================
Testing what happens when you call unsafe functions in signal handlers...
Test 1: printf() in handler (UNSAFE)
--------------------------------------------------------------------
Running 1000 iterations with printf() in SIGALRM handler...
[############################] 100%
Result: 3 DEADLOCKS detected!
Why: printf() updates shared stdio buffer state and may take a stream lock
or call malloc(). If the signal arrives while main code is inside printf(),
the handler's printf() can see a half-updated buffer or block forever on a
lock the interrupted code still holds.
Test 2: malloc() in handler (UNSAFE)
--------------------------------------------------------------------
Running 1000 iterations with malloc() in SIGALRM handler...
[############################] 100%
Result: 7 HEAP CORRUPTIONS detected!
Why: malloc() manipulates heap data structures. If interrupted
mid-update, handler's malloc() sees corrupted state.
Test 3: write() in handler (SAFE)
--------------------------------------------------------------------
Running 1000 iterations with write() in SIGALRM handler...
[############################] 100%
Result: 0 issues - SAFE!
write() is async-signal-safe: it is a thin wrapper around a system call and
touches no shared user-space state (apart from errno, which the handler should
save and restore).
14. The Core Question You're Answering
"How do Unix signals and process control mechanisms work, and what subtle race conditions and correctness issues arise when handling asynchronous events?"
Signals are one of the most error-prone areas of systems programming. This project forces you to confront the reality that your code can be interrupted at any point by an asynchronous signal. Understanding these issues is essential for writing correct concurrent code, implementing job control in shells, and avoiding the mysterious bugs that plague programs that handle signals incorrectly.
15. Concepts You Must Understand First
Before starting this project, ensure you have a solid grasp of these foundational concepts:
| Concept | Where to Learn | Why It Matters |
|---|---|---|
| Process creation (fork) | CS:APP 8.2-8.3 | Core of all process control |
| Process termination (exit, wait) | CS:APP 8.4 | Proper child cleanup |
| Signal delivery and handling | CS:APP 8.5 | Foundation of this project |
| Signal sets and blocking | CS:APP 8.5.4 | Preventing race conditions |
| Zombie and orphan processes | CS:APP 8.4.2 | Common pitfalls |
| errno and its quirks | man errno | Handler must preserve errno |
| Process groups and sessions | APUE Ch. 9 | Job control foundation |
| C function pointers | K&R Ch. 5.11 | Signal handler registration |
16. Questions to Guide Your Design
Work through these questions before writing any code:
- Handler Registration: Why should you use sigaction() instead of signal()? What guarantees does sigaction() provide that signal() doesn't?
- Reentrant Data Structures: If your signal handler needs to modify a shared data structure (like a job list), how will you ensure the main code doesn't see a half-updated state?
- errno Preservation: If your handler calls a syscall that fails, it will overwrite errno. How do you prevent this from corrupting the main code's error handling?
- Child Reaping Strategy: Should you reap children in the signal handler or in the main loop? What are the tradeoffs?
- waitpid Options: When should you use WNOHANG? WUNTRACED? WCONTINUED? What happens if you use blocking waitpid() in a signal handler?
- Pending vs. Blocked: What's the difference between a pending signal and a blocked signal? If 5 SIGCHLDs arrive while SIGCHLD is blocked, how many will be delivered when you unblock?
17. Thinking Exercise
Before coding, trace through this scenario by hand:
volatile int counter = 0;
volatile sig_atomic_t child_count = 0;
void handler(int sig) {
counter++;
child_count--;
}
int main() {
signal(SIGCHLD, handler);
for (int i = 0; i < 5; i++) {
if (fork() == 0) {
exit(0); // Child exits immediately
}
child_count++;
}
while (child_count > 0) {
sleep(1); // Wait for all children
}
printf("counter = %d\n", counter);
return 0;
}
Questions to answer:
- If all 5 children exit before the parent finishes the fork loop, what value will counter have at the end?
- Will the while loop ever terminate? Why or why not?
- What's the fundamental bug in this code?
- How would you fix it?
Solution
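Problem Analysis:
- Counter value: Anywhere from 1 to 5, and often less than 5. Standard signals coalesce: while one SIGCHLD is pending, further SIGCHLDs are discarded, so several child exits can produce a single handler call.
- Loop termination: Likely will NOT terminate! child_count reaches 5 after the fork loop, but if the handler ran only once it was decremented only once, leaving child_count = 4, and no further SIGCHLD will ever arrive. The loop waits forever.
- Fundamental bugs:
  - Race between child_count++ and the signal handler's child_count-- (++ and -- are read-modify-write, not atomic, even on a volatile sig_atomic_t)
  - Assumes 1 signal per child (signals coalesce!)
  - Uses a counter instead of actually reaping children, so every child stays a zombie
- Fixed version (the handler reaps in a loop; main keeps SIGCHLD blocked except inside sigsuspend()):
```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t live_children = 0;

void handler(int sig) {
    (void)sig;
    int saved_errno = errno;
    /* Loop: one SIGCHLD may stand for several exited children */
    while (waitpid(-1, NULL, WNOHANG) > 0) {
        live_children--;   /* main only touches it while SIGCHLD is blocked */
    }
    errno = saved_errno;
}

int main(void) {
    sigset_t mask, prev;
    sigemptyset(&mask);
    sigaddset(&mask, SIGCHLD);

    struct sigaction sa = {.sa_handler = handler, .sa_flags = SA_RESTART};
    sigemptyset(&sa.sa_mask);
    sigaction(SIGCHLD, &sa, NULL);

    /* Keep SIGCHLD blocked while forking and while checking the count */
    sigprocmask(SIG_BLOCK, &mask, &prev);
    for (int i = 0; i < 5; i++) {
        pid_t pid = fork();
        if (pid == 0) {
            _exit(0);      /* children use _exit(), not exit() */
        }
        if (pid > 0) {
            live_children++;
        }
    }
    /* sigsuspend() atomically restores the old mask and sleeps, so no SIGCHLD is lost */
    while (live_children > 0) {
        sigsuspend(&prev);
    }
    sigprocmask(SIG_SETMASK, &prev, NULL);
    printf("All children reaped\n");
    return 0;
}
```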
18. The Interview Questions They'll Ask
After completing this project, you should be able to confidently answer these questions:
- "What is a zombie process and how do you prevent them?"
  - Zombie = exited child not yet reaped; prevent by calling wait() or using SIGCHLD handler
- "Explain the race condition between fork() and adding to a job list. How do you fix it?"
  - Block SIGCHLD before fork, add to list, then unblock
- "Why can't you call printf() from a signal handler?"
  - Not async-signal-safe; it updates shared buffer state (and may take locks), so it can deadlock or corrupt output
- "What happens if 5 signals of the same type arrive while that signal is blocked?"
  - For standard signals, only 1 is delivered when unblocked (they don't queue, they coalesce); real-time signals are queued
- "What's the difference between a signal being ignored vs blocked?"
  - Ignored: discarded with no effect. Blocked: kept pending until unblocked
- "How would you implement a timeout for a blocking operation?"
  - alarm/setitimer + sigsetjmp/siglongjmp, or use select/poll with timeout
19. Hints in Layers
Use these hints progressively if you get stuck.
Hint Layer 1: Getting Started
- Start simple: make a parent fork one child, have child exit, have parent wait and print status
- Use ps aux | grep Z in another terminal to see zombies
- WIFEXITED(), WIFSIGNALED(), WIFSTOPPED() macros decode wait status
Hint Layer 2: Signal Handling
- Use sigaction() not signal():
  struct sigaction sa = {.sa_handler = handler};
  sigaction(SIGCHLD, &sa, NULL);
- Always save/restore errno in handlers:
  int saved = errno; ... errno = saved;
- Use write() for output in handlers, not printf()
Hint Layer 3: Race Prevention
- Signal blocking pattern:
  sigset_t mask, prev;
  sigemptyset(&mask);
  sigaddset(&mask, SIGCHLD);
  sigprocmask(SIG_BLOCK, &mask, &prev);   // Block
  // ... critical section ...
  sigprocmask(SIG_SETMASK, &prev, NULL);  // Restore
- Blocking a signal doesn't lose it; it becomes pending
Hint Layer 4: Common Bugs
- Forgetting WNOHANG in handler causes blocking in signal context
- Not looping on waitpid: use while ((pid = waitpid(-1, &st, WNOHANG)) > 0) { ... }
- Race: signal arrives between check and action. Always re-check after unblocking.
- SA_RESTART doesn't restart every system call: select() and nanosleep() fail with EINTR anyway, and sleep() returns early with the unslept time
20. Books That Will Help
| Topic | Book | Specific Chapters |
|---|---|---|
| Signals and handlers | CS:APP (3rd ed.) | Chapter 8.5 |
| Process control (fork/exec/wait) | CS:APP (3rd ed.) | Chapter 8.2-8.4 |
| Race conditions | CS:APP (3rd ed.) | Chapter 8.5.6 |
| Async-signal-safety | The Linux Programming Interface (Kerrisk) | Chapter 21.1 |
| Signal sets and blocking | The Linux Programming Interface (Kerrisk) | Chapter 20 |
| Process groups and sessions | Advanced Programming in the Unix Environment (Stevens) | Chapter 9 |
| Robust signal handling | Advanced Programming in the Unix Environment (Stevens) | Chapter 10 |
| Practical debugging | The Art of Debugging with GDB (Matloff) | Chapters on signal tracing |
12. Submission / Completion Criteria
Minimum Viable Completion:
- Normal, zombie, and crash modes work
- Basic event logging with timestamps
- Zombie is visibly demonstrated
- No memory leaks or zombie leaks from sandbox itself
Full Completion:
- All modes implemented (normal, crash, zombie, orphan, stop/continue)
- Race condition demonstration with clear before/after
- Async-signal-safety test suite
- Comprehensive output with explanations
- Stress testing passes
Excellence (Going Above & Beyond):
- Process group and session demonstrations
- Timeline visualization (ASCII art or terminal graphics)
- Integration with P12 (shell) for job control
- Coverage of edge cases (signal coalescing, EINTR handling)
This guide was expanded from CSAPP_3E_DEEP_LEARNING_PROJECTS.md. For the complete learning path, see the project index.