Project 6: Robust Signal Handler Framework

Build a comprehensive signal handling framework that properly handles SIGINT, SIGTERM, SIGCHLD, SIGALRM, and SIGUSR1/SIGUSR2, demonstrating async-signal-safe practices and the self-pipe trick.

Quick Reference

Attribute     | Value
Difficulty    | Level 4: Expert
Time Estimate | 1-2 Weeks (20-30 hours)
Language      | C (Alternatives: Rust with unsafe, Go with limited signal support)
Prerequisites | Process control, fork/exec, basic file descriptors, select() or poll()
Key Topics    | sigaction, async-signal-safety, self-pipe trick, signal masks, SIGCHLD handling

1. Learning Objectives

By completing this project, you will:

  1. Understand signal delivery mechanics - When signals are delivered, how they interrupt execution, and what happens to blocked signals.
  2. Master sigaction() over signal() - Why sigaction() is reliable and signal() is not.
  3. Implement async-signal-safe handlers - Know which functions are safe to call in signal context.
  4. Apply the self-pipe trick - Convert asynchronous signals into synchronous I/O events.
  5. Handle SIGCHLD correctly - Reap all zombie children without missing any.
  6. Build production-ready signal infrastructure - Patterns used in nginx, PostgreSQL, and other servers.

2. Theoretical Foundation

2.1 Core Concepts

Signals are software interrupts. When a signal arrives, the kernel suspends whatever your process is doing and runs your signal handler. This can happen at literally ANY point in your code - between any two machine instructions.

Normal Execution                 Signal Arrives
┌─────────────────┐             ┌─────────────────┐
│ instruction 1   │             │ instruction 1   │
│ instruction 2   │             │ instruction 2   │
│ instruction 3   │  ────────>  │ ═══════════════ │ SIGNAL!
│ instruction 4   │             │ handler runs... │
│ instruction 5   │             │ ═══════════════ │
└─────────────────┘             │ instruction 3   │ (resumes)
                                │ instruction 4   │
                                └─────────────────┘

The fundamental problem: If your main code is modifying a data structure and a signal fires, your handler might see that data structure in a corrupted, half-modified state. Worse, if your handler calls a function that uses locks (like printf()), and your main code was already holding that lock, you have a deadlock.

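Async-signal-safe functions are the subset of functions that POSIX guarantees are safe to call from signal handlers. The list (see man 7 signal-safety) includes write(), read(), _exit(), sigaction(), waitpid(), and most other thin system-call wrappers. NOT safe: printf(), malloc(), free(), and any other function that takes locks or keeps hidden global state. Even the safe functions may overwrite errno, so a handler that calls them should save errno on entry and restore it before returning.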
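As a rough illustration of staying inside the async-signal-safe subset, the sketch below emits a fixed message from a handler using only write(). The names safe_log and on_sigusr1 are hypothetical and not part of any library. The length is passed explicitly so the helper does not depend on any string function.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: emit a fixed message using only write(2).
 * errno is saved and restored so the interrupted code never sees
 * a value changed by the handler. */
ssize_t safe_log(int fd, const char *msg, size_t len)
{
    int saved_errno = errno;
    ssize_t n = write(fd, msg, len);
    errno = saved_errno;
    return n;
}

/* Example handler: no stdio, no allocation, no locks. */
void on_sigusr1(int signo)
{
    static const char msg[] = "got SIGUSR1\n";
    (void)signo;
    (void)safe_log(STDERR_FILENO, msg, sizeof(msg) - 1);
}
```

To try it, install on_sigusr1 with sigaction() for SIGUSR1 and send the signal with kill -USR1 from another terminal.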

2.2 Why This Matters

Every long-running server needs signal handling:

  • SIGTERM: Graceful shutdown (systemd sends this first)
  • SIGINT: Interactive interrupt (Ctrl-C)
  • SIGHUP: Reload configuration (convention for daemons)
  • SIGCHLD: Child process terminated (avoid zombies)
  • SIGPIPE: Broken pipe (don’t crash on network errors)
  • SIGALRM: Timer expired (timeouts, periodic tasks)

Getting signals wrong causes:

  • Zombie processes (missed SIGCHLD)
  • Data corruption (non-atomic updates interrupted)
  • Deadlocks (calling non-safe functions in handlers)
  • Missed signals (using signal() instead of sigaction())
  • Race conditions (checking flag, then acting, interrupted between)

2.3 Historical Context

Early UNIX signal handling was unreliable. The signal() function would reset the handler to SIG_DFL after each signal, leading to race conditions. BSD introduced reliable signals, and POSIX standardized sigaction() in 1988. Today, sigaction() is the only correct choice.

The self-pipe trick was popularized by Dan Bernstein (djb) in the 1990s and is now standard practice in event-driven servers.


3. Project Specification

3.1 What You Will Build

A signal handling framework library (siglib.h / siglib.c) and demonstration program (sigdemo.c) that:

  1. Sets up signal handlers using sigaction()
  2. Demonstrates the self-pipe trick for integrating signals with select()/poll()
  3. Handles SIGCHLD to reap all zombie children
  4. Implements graceful shutdown on SIGTERM/SIGINT
  5. Provides periodic timer functionality with SIGALRM
  6. Logs all signal activity (safely) via the self-pipe

3.2 Functional Requirements

Requirement | Description
R1          | Use sigaction() exclusively, never signal()
R2          | All signal handlers must be async-signal-safe
R3          | Implement self-pipe trick for signal-to-I/O conversion
R4          | Handle SIGCHLD with waitpid() loop (WNOHANG)
R5          | Support graceful shutdown (finish current work, cleanup)
R6          | Implement SIGALRM-based periodic timer
R7          | Handle EINTR in all blocking system calls
R8          | Provide signal masking for critical sections

3.3 Non-Functional Requirements

  • Zero memory leaks (verified with valgrind)
  • No race conditions (verified with helgrind)
  • Compile with -Wall -Wextra -Werror without warnings
  • Handle rapid signal delivery without missing any signal type (standard signals coalesce while pending, so exact per-signal counts are not guaranteed)
  • Work correctly on Linux and macOS

3.4 Example Output

# Terminal 1: Start the demo
$ ./sigdemo
Signal handler framework running (PID 12345)
Self-pipe initialized: read_fd=3, write_fd=4
Handlers installed: SIGINT SIGTERM SIGCHLD SIGALRM SIGUSR1 SIGUSR2
Starting main event loop...

# Terminal 2: Send signals
$ kill -USR1 12345
$ kill -USR2 12345
$ kill -ALRM 12345
$ kill -TERM 12345

# Terminal 1 output:
[2024-03-15 10:30:01.123] EVENT: Received SIGUSR1 (10)
[2024-03-15 10:30:02.456] EVENT: Received SIGUSR2 (12)
[2024-03-15 10:30:03.789] EVENT: Received SIGALRM (14) - timer tick
[2024-03-15 10:30:04.012] EVENT: Received SIGTERM (15)
[2024-03-15 10:30:04.013] Initiating graceful shutdown...
[2024-03-15 10:30:04.014] Completing pending work (2 items)...
[2024-03-15 10:30:04.115] Flushing buffers...
[2024-03-15 10:30:04.116] Closing resources...
[2024-03-15 10:30:04.117] Shutdown complete. Exiting with status 0.

# Child process handling
$ ./sigdemo --fork 5
Forking 5 child processes...
Child 12346 started
Child 12347 started
Child 12348 started
Child 12349 started
Child 12350 started
[2024-03-15 10:31:00.100] SIGCHLD: Child 12346 exited, status 0
[2024-03-15 10:31:00.200] SIGCHLD: Child 12347 exited, status 0
[2024-03-15 10:31:00.300] SIGCHLD: Child 12348 exited, status 0
[2024-03-15 10:31:00.400] SIGCHLD: Child 12349 exited, status 0
[2024-03-15 10:31:00.500] SIGCHLD: Child 12350 exited, status 0
All 5 children reaped. No zombies!

# Timer demonstration
$ ./sigdemo --timer 2
Setting SIGALRM every 2 seconds
[10:32:00] Timer tick #1
[10:32:02] Timer tick #2
[10:32:04] Timer tick #3
^C
Caught SIGINT. Stopping timer. Exiting.

4. Solution Architecture

4.1 High-Level Design

┌──────────────────────────────────────────────────────────────────────┐
│                         Application Code                              │
│                                                                       │
│   ┌───────────────┐    ┌───────────────┐    ┌───────────────┐       │
│   │  Main Loop    │    │ Work Handler  │    │ Shutdown      │       │
│   │  (select)     │───>│ (process data)│    │ Handler       │       │
│   └───────┬───────┘    └───────────────┘    └───────────────┘       │
│           │                                         ^                 │
│           │ reads                                   │                 │
│           v                                         │                 │
│   ┌───────────────┐                                 │                 │
│   │  Self-Pipe    │                                 │                 │
│   │  read end     │                                 │                 │
│   └───────┬───────┘                                 │                 │
└───────────┼─────────────────────────────────────────┼─────────────────┘
            │                                         │
            │                                         │ sets flag
            │                                         │
┌───────────┼─────────────────────────────────────────┼─────────────────┐
│           │                                         │                 │
│   ┌───────┴───────┐                         ┌───────┴───────┐       │
│   │  Self-Pipe    │<────────────────────────│ Signal Handler │       │
│   │  write end    │    write(pipe, "x", 1)  │ (minimal!)     │       │
│   └───────────────┘                         └───────────────┘       │
│                                                     ^                 │
│                              Signal Context         │                 │
│                                                     │                 │
└─────────────────────────────────────────────────────┼─────────────────┘
                                                      │
                                              ┌───────┴───────┐
                                              │    Kernel     │
                                              │ Signal        │
                                              │ Delivery      │
                                              └───────────────┘

4.2 Key Components

Component        | Responsibility                | Key Decisions
Signal Handler   | Set flag, write to pipe       | Must be async-signal-safe
Self-Pipe        | Convert async → sync          | Non-blocking, ignore write errors
Event Loop       | Wait on pipe + other FDs      | Use select() or poll()
Signal Processor | Handle signal in main context | Can use printf, malloc, etc.
Child Reaper     | waitpid() loop                | WNOHANG, handle multiple children
Timer Manager    | setitimer/alarm               | setitimer() with an interval re-arms itself; alarm() must be re-armed after each SIGALRM

4.3 Data Structures

/* Signal state - must use volatile sig_atomic_t for flags */
typedef struct {
    volatile sig_atomic_t got_sigint;
    volatile sig_atomic_t got_sigterm;
    volatile sig_atomic_t got_sigchld;
    volatile sig_atomic_t got_sigalrm;
    volatile sig_atomic_t got_sigusr1;
    volatile sig_atomic_t got_sigusr2;
    volatile sig_atomic_t shutdown_requested;
} signal_state_t;

/* Self-pipe for signal-to-I/O conversion */
typedef struct {
    int read_fd;
    int write_fd;
} self_pipe_t;

/* Signal framework context */
typedef struct {
    signal_state_t state;
    self_pipe_t pipe;
    sigset_t original_mask;
    int timer_interval_secs;
    void (*on_shutdown)(void *);
    void *shutdown_arg;
} siglib_context_t;

4.4 Algorithm Overview

Self-Pipe Trick:

1. Create pipe: pipe(pipefd)
2. Set both ends non-blocking, preserving existing flags: fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_NONBLOCK)
3. In signal handler:
   - Set flag (volatile sig_atomic_t)
   - write(write_fd, "x", 1)  // Wakes up select()
4. In main loop:
   - select() includes read_fd
   - When readable: read and drain pipe, check flags
   - Handle signals in main context (safe to use any functions)

SIGCHLD Handling:

// In main context after SIGCHLD flag set:
while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
    // Process exited child
    log_child_exit(pid, status);
}
// Loop handles ALL children that exited before we ran

5. Implementation Guide

5.1 Development Environment Setup

# Required packages (Ubuntu/Debian)
sudo apt-get install build-essential valgrind strace

# Verify tools
gcc --version
valgrind --version
strace --version

# Project structure
mkdir -p sighandler/{src,include,tests}
cd sighandler

5.2 Project Structure

sighandler/
├── include/
│   └── siglib.h          # Public API
├── src/
│   ├── siglib.c          # Implementation
│   └── sigdemo.c         # Demo program
├── tests/
│   ├── test_signals.c    # Unit tests
│   └── test_selfpipe.c   # Self-pipe tests
├── Makefile
└── README.md

5.3 The Core Question You’re Answering

“How do you handle asynchronous events safely in a program where a signal can interrupt literally any line of code?”

Before writing any code, internalize this: signals are not like function calls. They don’t wait for you to be ready. They interrupt execution between any two machine instructions. Your handler runs with the data structures in whatever state they were in at that instant.

5.4 Concepts You Must Understand First

Before coding, verify you can answer these:

  1. Why is printf() not async-signal-safe?
    • It uses internal locks. If interrupted while holding lock, handler calling printf() deadlocks.
  2. What’s the difference between sigaction() and signal()?
    • signal() is unreliable: handler resets to SIG_DFL after delivery on some systems
    • signal() doesn’t specify what happens to other signals during handling
    • sigaction() is fully specified by POSIX
  3. What does SA_RESTART do?
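    • If a signal interrupts a slow syscall such as read(), write(), or accept(), SA_RESTART makes the syscall restart automatically instead of returning EINTR. Some calls are never restarted: on Linux, select(), poll(), and epoll_wait() return EINTR even with SA_RESTART set.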
  4. What is volatile sig_atomic_t?
    • volatile: compiler won’t optimize away reads/writes
    • sig_atomic_t: guaranteed to be read/written atomically (no torn reads)
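For calls that SA_RESTART does not cover, or when the flag is not set, a retry loop is the usual fallback. The following is a small sketch; read_retry and write_all are hypothetical helper names.

```c
#include <errno.h>
#include <unistd.h>

/* read() that retries when a signal interrupts it before any data
 * was transferred. */
ssize_t read_retry(int fd, void *buf, size_t len)
{
    ssize_t n;
    do {
        n = read(fd, buf, len);
    } while (n == -1 && errno == EINTR);
    return n;
}

/* Write exactly len bytes, retrying on EINTR and on short writes.
 * Returns 0 on success, -1 on any other error. */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

These wrappers suit blocking descriptors. On a non-blocking descriptor, write_all() returns -1 with errno set to EAGAIN when the buffer is full, and the caller decides whether to wait.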

5.5 Questions to Guide Your Design

  1. Handler complexity: Should handlers do real work or just set flags?
    • Answer: Just set flags. All real work in main context.
  2. Signal masking: When do you need to block signals?
    • Answer: During critical sections where you modify shared state.
  3. EINTR handling: How to handle interrupted system calls?
    • Answer: Use SA_RESTART where it applies, and wrap the calls it does not cover (such as select() and poll()) in retry loops.
  4. Multiple children: What if 5 children exit before you handle SIGCHLD?
    • Answer: Possibly only one SIGCHLD is delivered, because pending standard signals are not queued. Call waitpid() in a loop with WNOHANG until it returns 0 or -1.

5.6 Thinking Exercise

Trace through signal handler deadlock:

Main code                          Signal handler
──────────                          ──────────────
printf("Status: %d", x);
  └── acquires stdio lock
       └── still holding lock...
                                    [SIGNAL ARRIVES HERE]
                                    handler() {
                                        printf("Signal!");
                                          └── tries to acquire stdio lock
                                               └── BLOCKED FOREVER!
                                    }
                                    (never returns)
  (never resumes)

This is a DEADLOCK. The process hangs forever.

Now trace the self-pipe solution:

Main code                          Signal handler
──────────                          ──────────────
fd_set readfds;
FD_ZERO(&readfds);
FD_SET(pipe_read, &readfds);
FD_SET(client_fd, &readfds);
select(nfds, &readfds, ...);
  └── blocked, waiting...
                                    [SIGNAL ARRIVES]
                                    handler() {
                                        got_signal = signo;
                                        write(pipe_write, "x", 1);
                                    }
                                    (returns immediately)
  └── select() returns!
       FD_ISSET(pipe_read) == true
       └── read(pipe_read, buf, 1);
       └── if (got_signal == SIGTERM)
              printf("Shutting down"); // SAFE HERE!

5.7 Hints in Layers

Hint 1: Setting Up sigaction()

struct sigaction sa;
memset(&sa, 0, sizeof(sa));
sa.sa_handler = signal_handler;
sigemptyset(&sa.sa_mask);
sa.sa_flags = SA_RESTART;  // Auto-restart most interrupted syscalls (not select()/poll())
if (sigaction(SIGTERM, &sa, NULL) == -1) {
    perror("sigaction");
    exit(1);
}

Hint 2: Minimal Handler Pattern

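static volatile sig_atomic_t got_sigterm = 0;
static int pipe_write_fd = -1;  // Set once the self-pipe is created

static void handler(int signo) {
    // ONLY async-signal-safe operations!
    int saved_errno = errno;    // write() may clobber errno (needs <errno.h>)
    if (signo == SIGTERM) got_sigterm = 1;
    char c = 'x';
    // write() is async-signal-safe; a full pipe (EAGAIN) is harmless here
    ssize_t n = write(pipe_write_fd, &c, 1);
    (void)n;                    // Result deliberately ignored
    errno = saved_errno;
}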

Hint 3: Non-blocking Pipe

int pipefd[2];
if (pipe(pipefd) == -1) { perror("pipe"); exit(1); }

// Set both ends non-blocking, preserving any existing flags
for (int i = 0; i < 2; i++) {
    int flags = fcntl(pipefd[i], F_GETFL);
    if (flags == -1 || fcntl(pipefd[i], F_SETFL, flags | O_NONBLOCK) == -1) {
        perror("fcntl");
        exit(1);
    }
}

Hint 4: SIGCHLD with Multiple Children

// Called from main context when got_sigchld flag is set
void reap_children(void) {
    pid_t pid;
    int status;
    // Clear the flag BEFORE reaping: a SIGCHLD that arrives during the
    // loop sets it again, so the next pass picks up that child.
    got_sigchld = 0;
    // Loop is CRITICAL - multiple children may have exited!
    while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
        if (WIFEXITED(status)) {
            printf("Child %ld exited with status %d\n",
                   (long)pid, WEXITSTATUS(status));
        } else if (WIFSIGNALED(status)) {
            printf("Child %ld killed by signal %d\n",
                   (long)pid, WTERMSIG(status));
        }
    }
}
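To exercise a reaping loop end to end, a small harness can fork several short-lived children and then collect them. The sketch below is one way to do that; spawn_children and reap_available are hypothetical names, and each child simply exits with its index as the status.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork n children that exit immediately. Returns how many started. */
int spawn_children(int n)
{
    int started = 0;
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == -1)
            break;              /* fork failed: stop spawning */
        if (pid == 0)
            _exit(i);           /* child: exit without flushing stdio */
        started++;
    }
    return started;
}

/* Reap every child that has already exited, without blocking.
 * Returns the number of children reaped in this pass. */
int reap_available(void)
{
    int reaped = 0;
    int status;

    for (;;) {
        pid_t pid = waitpid(-1, &status, WNOHANG);
        if (pid > 0) {
            reaped++;
            continue;
        }
        if (pid == -1 && errno == EINTR)
            continue;
        break;  /* 0: children still running; -1 with ECHILD: none left */
    }
    return reaped;
}
```

In the framework, reap_available() would run from the main loop whenever the SIGCHLD flag is set. Once every child has been collected, waitpid() fails with ECHILD, which is a convenient check that no zombies remain.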

5.8 The Interview Questions They’ll Ask

  1. “What functions are async-signal-safe and why?”
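    • Functions POSIX guarantees are callable from a handler: they take no locks and keep no hidden global state, so they are reentrant (errno is the exception; save and restore it)
    • Examples: write(), read(), _exit(), sigaction(), waitpid()
    • NOT safe: printf(), malloc(), free(), any function using stdio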
  2. “Explain the difference between sigaction() and signal().”
    • signal() semantics vary by system, handler may reset to SIG_DFL
    • sigaction() is fully POSIX-specified, handler persists
    • sigaction() allows specifying signal mask during handler
  3. “How do you prevent a signal handler from interrupting itself?”
    • sigaction() automatically blocks the signal being handled
    • Use sa_mask to block additional signals during handler
  4. “What is the self-pipe trick and when would you use it?”
    • Converts async signals to sync I/O events
    • Use when you have an event loop with select()/poll()/epoll()
    • Allows handling signals in main context where any function is safe
  5. “How does SIGCHLD work and why is it important to handle it?”
    • Delivered when child process terminates
    • Must call waitpid() to reap zombie
    • Must loop with WNOHANG because several exits can collapse into a single SIGCHLD

5.9 Books That Will Help

Topic                      | Book                                         | Chapter
Signal fundamentals        | “APUE” by Stevens & Rago                     | Ch. 10 (complete)
Reliable signals           | “The Linux Programming Interface” by Kerrisk | Ch. 20-22
Advanced signal techniques | “Linux System Programming” by Love           | Ch. 10
Real-world examples        | nginx source code                            | src/os/unix/ngx_process.c

6. Testing Strategy

6.1 Test Categories

Category    | Purpose              | Examples
Unit        | Individual functions | Signal registration, pipe creation
Integration | Signal flow          | Send signal, verify handler called
Stress      | Rapid signals        | 1000 signals/second, no hang and no missed signal type (counts may coalesce)
Concurrency | Race conditions      | helgrind verification

6.2 Critical Test Cases

# Test 1: Basic signal handling
./sigdemo &
PID=$!
sleep 1  # Give the handlers time to install
kill -USR1 $PID
# Expected: logs "Received SIGUSR1"

# Test 2: Graceful shutdown
kill -TERM $PID
# Expected: cleanup sequence in logs, clean exit

# Test 3: Rapid signals (stress test)
./sigdemo &
PID=$!
sleep 1  # Give the handlers time to install
for i in {1..100}; do kill -USR1 $PID; done
kill -TERM $PID
# Expected: At least one SIGUSR1 logged, then a clean shutdown. Standard
# signals coalesce while pending, so the count may be well below 100.

# Test 4: Child reaping
./sigdemo --fork 10
# Expected: All 10 children reaped, no zombies
ps aux | grep '[d]efunct'  # Should show nothing (the brackets stop grep matching itself)

# Test 5: Verify no async-signal-unsafe calls
strace -e trace=write ./sigdemo 2>&1 | grep -v "write(1"
# Handler should only use write() on pipe, never printf

# Test 6: Memory/thread safety
valgrind --leak-check=full ./sigdemo --fork 5
valgrind --tool=helgrind ./sigdemo --timer 1 &
sleep 5
kill -TERM $!
# Expected: No errors from valgrind or helgrind

7. Common Pitfalls & Debugging

Pitfall               | Symptom                    | Solution
printf() in handler   | Random hangs/deadlocks     | Use write() only, set flag
Using signal()        | Miss signals, inconsistent | Always use sigaction()
Single waitpid() call | Zombie children            | Loop with WNOHANG
Forgetting SA_RESTART | EINTR errors everywhere    | Set SA_RESTART or retry loop
Blocking pipe         | Handler blocks forever     | Set O_NONBLOCK on pipe
Non-volatile flag     | Compiler optimizes away    | Use volatile sig_atomic_t

Debugging Signal Issues:

# See which signals a process blocks, ignores, and catches (SigBlk, SigIgn, SigCgt bitmasks; Linux only)
grep Sig /proc/<PID>/status

# Trace signal delivery
strace -e trace=signal ./sigdemo

# Check for zombies
ps -eo pid,ppid,stat,comm | awk '$3 ~ /^Z/'

# Debug deadlocks
gdb -p <PID>
(gdb) thread apply all bt
# Look for threads blocked in signal handler

8. Extensions & Challenges

8.1 Beginner Extensions

  • Add SIGHUP handler that re-reads a config file
  • Implement signal counter (count each signal type received)
  • Add command-line option to list all installed handlers

8.2 Intermediate Extensions

  • Implement signalfd() alternative to self-pipe (Linux 2.6.22+)
  • Add real-time signals (SIGRTMIN to SIGRTMAX) with queuing
  • Create signal debugging tool that traces all signals to a process

8.3 Advanced Extensions

  • Implement signal-safe lock-free queue for handler-to-main communication
  • Add support for siginfo_t (SA_SIGINFO) to get sender PID
  • Create multi-threaded signal handling with dedicated signal thread

9. Resources

9.1 Essential Reading

  • “APUE” Ch. 10 - The definitive signal reference
  • “TLPI” Ch. 20-22 - Linux-specific signal details
  • POSIX async-signal-safe function list: man 7 signal-safety

9.2 Code References

  • nginx signal handling: src/os/unix/ngx_process.c
  • PostgreSQL signal handling: src/backend/postmaster/postmaster.c
  • Redis signal handling: src/server.c

9.3 Man Pages

man 2 sigaction
man 2 signal
man 7 signal
man 7 signal-safety
man 2 waitpid
man 3 sigemptyset
man 2 sigprocmask

10. Self-Assessment Checklist

Before considering this project complete, verify:

  • I can explain why printf() is unsafe in signal handlers
  • I understand the self-pipe trick and when to use it
  • My signal handlers are minimal (only async-signal-safe functions)
  • I handle SIGCHLD with a waitpid() loop
  • I use sigaction() exclusively, never signal()
  • I understand SA_RESTART and EINTR handling
  • My code compiles with -Wall -Wextra -Werror without warnings
  • valgrind reports zero errors
  • I can explain the double-fork pattern’s relationship to signals
  • I know which signals can be caught and which cannot (SIGKILL, SIGSTOP)

11. Submission / Completion Criteria

Minimum Viable Completion:

  • sigaction() used for all handlers
  • Self-pipe trick implemented
  • SIGTERM/SIGINT graceful shutdown
  • Zero compiler warnings

Full Completion:

  • SIGCHLD handling with child reaping
  • SIGALRM periodic timer
  • Comprehensive error handling (EINTR, etc.)
  • valgrind clean

Excellence (Going Above & Beyond):

  • signalfd() implementation as alternative
  • Real-time signal support
  • Signal debugging/tracing tool

This guide was generated from project_based_ideas/SYSTEMS_PROGRAMMING/ADVANCED_UNIX_PROGRAMMING_DEEP_DIVE.md. For the complete sprint overview, see the README.md in this directory.