Project 16: Concurrency Workbench
Build a server framework that can switch between concurrency models (iterative, process-per-request, thread-per-request, thread pool), with a bounded-buffer work queue and stress-test harness.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Expert |
| Time Estimate | 2-3 weeks |
| Language | C (Alternatives: Rust, Zig, Go) |
| Prerequisites | Projects 11 and 15 recommended; solid understanding of processes and I/O |
| Key Topics | Threads, synchronization, producer-consumer, deadlock avoidance, thread pools |
| CS:APP Chapters | 12 |
1. Learning Objectives
By completing this project, you will:
- Master threading fundamentals: Create, join, and manage POSIX threads with proper lifecycle management
- Implement synchronization primitives correctly: Use mutexes, semaphores, and condition variables without races
- Build a producer-consumer queue: Design a thread-safe bounded buffer with proper blocking semantics
- Compare concurrency models: Measure and explain throughput differences between iterative, process-based, and thread-based servers
- Diagnose and fix concurrency bugs: Identify race conditions, deadlocks, and starvation through systematic debugging
- Design effective stress tests: Create test harnesses that reliably expose concurrency defects
- Apply concurrency correctness discipline: Use invariants, assertions, and logging to prove correctness
2. Theoretical Foundation
2.1 Concurrency vs Parallelism
These terms are often confused, but they describe fundamentally different concepts:
CONCURRENCY vs PARALLELISM

CONCURRENCY: Dealing with multiple things at once (structure)

    Time ----------------------------------------------->

    CPU:  [Task A][Task B][Task A][Task C][Task B][Task A]

    - Single CPU interleaving multiple tasks
    - Tasks make progress "together" through time-slicing
    - About program STRUCTURE and composition

PARALLELISM: Doing multiple things at once (execution)

    Time ----------------------------------------------->

    CPU 0: [Task A][Task A][Task A][Task A]
    CPU 1: [Task B][Task B][Task B][Task B]
    CPU 2: [Task C][Task C][Task C][Task C]

    - Multiple CPUs executing simultaneously
    - True simultaneous execution
    - About program EXECUTION and performance
Key insight: You can have concurrency without parallelism (single-core with threading), parallelism without concurrency (SIMD operations), or both (multi-core with multiple threads).
Why this matters for servers:
- Concurrency allows handling multiple clients without blocking
- Parallelism allows using multiple CPU cores for throughput
- Most servers need both: concurrent structure with parallel execution
2.2 Threads vs Processes Trade-offs
Both threads and processes provide concurrent execution, but with different characteristics:
| Aspect | Processes | Threads |
|---|---|---|
| Address Space | Separate (isolated) | Shared |
| Creation Cost | Higher (fork, even with copy-on-write) | Lower (stack allocation) |
| Context Switch | Expensive (TLB flush) | Cheaper (same address space) |
| Communication | IPC required (pipes, sockets, shm) | Direct memory sharing |
| Failure Isolation | Crash isolated to process | Crash affects all threads |
| Synchronization | Not needed for isolation | Required for shared state |
| Debugging | Easier (isolated state) | Harder (shared state) |
PROCESSES vs THREADS: Memory Model

PROCESS MODEL:

    Process A         Process B         Process C
    +-----------+     +-----------+     +-----------+
    | Stack     |     | Stack     |     | Stack     |
    +-----------+     +-----------+     +-----------+
    | Heap      |     | Heap      |     | Heap      |
    +-----------+     +-----------+     +-----------+
    | Data/BSS  |     | Data/BSS  |     | Data/BSS  |
    +-----------+     +-----------+     +-----------+
    | Text      |     | Text      |     | Text      |
    +-----------+     +-----------+     +-----------+
         ^                 ^                 ^
         +----- fully ISOLATED address spaces -----+

THREAD MODEL (within one process):

    +-------------------------------------------------------------+
    | Single Process                                              |
    |   +---------+   +---------+   +---------+                   |
    |   | Stack 1 |   | Stack 2 |   | Stack 3 |  <-- each thread  |
    |   | Thread 1|   | Thread 2|   | Thread 3|      has its own  |
    |   +---------+   +---------+   +---------+      stack        |
    |        |             |             |                        |
    |        +-------------+-------------+                        |
    |                      v                                      |
    |   +-----------------------------------------------------+   |
    |   |        SHARED HEAP / SHARED DATA/BSS / SHARED TEXT  |   |
    |   +-----------------------------------------------------+   |
    +-------------------------------------------------------------+
When to use processes:
- Untrusted code execution (sandboxing)
- Legacy code that isn't thread-safe
- Maximum fault isolation required
- CPU-bound work that doesn't share state
When to use threads:
- Frequent communication between execution units
- Shared state is fundamental to the design
- Low-latency response required
- Memory efficiency matters
2.3 POSIX Threads (pthreads) Basics
The POSIX threads API provides portable threading on Unix-like systems.
Thread Lifecycle
THREAD LIFECYCLE

    pthread_create()
          |
          v
    +-----------------------------------------------------------+
    | RUNNABLE                                                  |
    | (Ready to run, waiting for CPU time from scheduler)       |
    +-----------------------------------------------------------+
          |
          +----------------+-----------------+
          v                v                 v
    +-------------+  +--------------+  +--------------+
    | RUNNING     |  | BLOCKED      |  | WAITING      |
    | (executing) |  | (mutex/sema) |  | (cond_wait)  |
    +-------------+  +--------------+  +--------------+
          |                |                 |
          +----------------+-----------------+
          |
          v
    +-----------------------------------------------------------+
    | TERMINATED                                                |
    | (Thread function returned or pthread_exit called)         |
    +-----------------------------------------------------------+
          |
          +------------------+
          v                  v
    +----------------+  +------------------+
    | JOINED         |  | DETACHED         |
    | (resources     |  | (resources       |
    |  reclaimed via |  |  auto-reclaimed  |
    |  pthread_join) |  |  on termination) |
    +----------------+  +------------------+
Essential pthreads Functions
// Thread creation
int pthread_create(pthread_t *thread, // Output: thread ID
const pthread_attr_t *attr, // Attributes (NULL for defaults)
void *(*start_routine)(void*), // Function to run
void *arg); // Argument to function
// Thread termination
void pthread_exit(void *retval); // Exit current thread with return value
// Wait for thread completion
int pthread_join(pthread_t thread, void **retval); // Block until thread exits
// Detach a thread (no join needed)
int pthread_detach(pthread_t thread);
// Get current thread ID
pthread_t pthread_self(void);
Common Patterns
// Pattern 1: Create and join (most common)
void *worker(void *arg) {
int id = *(int *)arg;
// ... do work ...
return NULL;
}
int main() {
pthread_t threads[N];
int ids[N];
// Create threads
for (int i = 0; i < N; i++) {
ids[i] = i;
pthread_create(&threads[i], NULL, worker, &ids[i]);
}
// Join all threads (wait for completion)
for (int i = 0; i < N; i++) {
pthread_join(threads[i], NULL);
}
}
// Pattern 2: Detached threads (fire and forget)
void *background_task(void *arg) {
pthread_detach(pthread_self()); // Self-detach
// ... do work ...
return NULL; // Resources auto-freed
}
2.4 Shared State and Race Conditions
When threads share memory, concurrent access without synchronization leads to race conditions.
Anatomy of a Race Condition
// Shared counter - BROKEN without synchronization
int counter = 0;
void *increment(void *arg) {
for (int i = 0; i < 1000000; i++) {
counter++; // This is NOT atomic!
}
return NULL;
}
What counter++ actually does at the machine level:
WHY counter++ IS NOT ATOMIC

    C code:    counter++;

    Assembly:  movl counter(%rip), %eax   # Load counter into register
               addl $1, %eax              # Increment register
               movl %eax, counter(%rip)   # Store back to memory

    Three separate operations that can be interleaved!

RACE CONDITION EXAMPLE (counter starts at 0):

    Thread A               Thread B
    -------------------    -------------------
    LOAD counter (0)
                           LOAD counter (0)
    ADD 1 (reg = 1)
                           ADD 1 (reg = 1)
    STORE counter (1)
                           STORE counter (1)   <-- LOST UPDATE!

    Expected: counter = 2
    Actual:   counter = 1
Types of Race Conditions
- Read-Modify-Write races: Like counter++ above
- Check-Then-Act races:
  if (ptr != NULL) {   // Thread B sets ptr = NULL here
      use(ptr);        // CRASH! (a fix is sketched below)
  }

- Data races: Any unsynchronized access where at least one access is a write
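A minimal sketch of closing that check-then-act window, assuming a mutex guards the pointer (ptr_lock and use_safely are hypothetical names):
#include <pthread.h>

static char *ptr;  // shared pointer; guarded by ptr_lock
static pthread_mutex_t ptr_lock = PTHREAD_MUTEX_INITIALIZER;

// Check and act under the same lock: no other thread can set
// ptr = NULL between the test and the use.
void use_safely(void) {
    pthread_mutex_lock(&ptr_lock);
    if (ptr != NULL) {
        /* ... use ptr ... */
    }
    pthread_mutex_unlock(&ptr_lock);
}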
2.5 Critical Sections and Mutual Exclusion
A critical section is code that accesses shared resources and must not be executed by multiple threads simultaneously.
CRITICAL SECTION CONCEPT

    Thread A                        Thread B
    -------------------             -------------------

    [Non-critical code]             [Non-critical code]
            |                               |
            v                               v
    +-----------------+             +-----------------+
    | ENTER CRITICAL  |             | ENTER CRITICAL  |
    | SECTION         |<----------->| SECTION         |
    |                 |   MUTUAL    |                 |
    | Modify shared   |  EXCLUSION  | Modify shared   |
    | resource        |  REQUIRED!  | resource        |
    |                 |             |                 |
    | EXIT CRITICAL   |             | EXIT CRITICAL   |
    | SECTION         |             | SECTION         |
    +-----------------+             +-----------------+
            |                               |
            v                               v
    [Non-critical code]             [Non-critical code]
Mutual exclusion ensures only one thread executes a critical section at a time.
Requirements for correct mutual exclusion:
- Safety: At most one thread in the critical section
- Liveness: A thread that wants to enter eventually does
- Bounded waiting: No thread waits forever (no starvation)
- No assumptions: Works regardless of thread speed or count
2.6 Mutex Locks and Spinlocks
Mutex (Mutual Exclusion Lock)
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
void *worker(void *arg) {
pthread_mutex_lock(&lock); // Acquire lock (blocks if held)
// Critical section - only one thread here at a time
counter++;
pthread_mutex_unlock(&lock); // Release lock
return NULL;
}
Mutex behavior:
- lock(): If free, acquire and continue. If held, block (sleep) until available.
- unlock(): Release lock, wake one waiting thread.
- Low CPU usage when waiting (thread sleeps).
- Higher latency due to context switch overhead.
Spinlock
pthread_spinlock_t spinlock;
pthread_spin_init(&spinlock, PTHREAD_PROCESS_PRIVATE);
void *worker(void *arg) {
pthread_spin_lock(&spinlock); // Busy-wait if held
// Critical section
counter++;
pthread_spin_unlock(&spinlock);
return NULL;
}
Spinlock behavior:
- lock(): If free, acquire. If held, spin (busy-wait) checking repeatedly.
- unlock(): Release lock.
- High CPU usage when waiting (burning cycles).
- Lower latency for short critical sections.
When to Use Each
| Use Mutex When | Use Spinlock When |
|---|---|
| Critical section is long | Critical section is very short |
| Holding across I/O | Never holding across I/O |
| Uniprocessor system | Multiprocessor with short waits |
| Priority concerns matter | Lowest latency required |
2.7 Semaphores and Their Semantics
A semaphore is a non-negative integer with two atomic operations: wait (P/down) and signal (V/up).
#include <semaphore.h>
sem_t sem;
sem_init(&sem, 0, initial_value); // Initialize with count
sem_wait(&sem); // P operation: if count > 0, decrement; else block
sem_post(&sem); // V operation: increment count, wake one waiter
SEMAPHORE OPERATIONS

    P (wait/down):                  V (signal/up):
    ---------------                 ---------------

    if (s > 0) {                    s++;
        s--;                        wake_one_waiter();
    } else {
        block_current_thread();     // Always succeeds
    }                               // Never blocks

    // May block
    // Decrements on success

BINARY SEMAPHORE (mutex equivalent, initial value = 1):

    sem_init(&sem, 0, 1);   // Can be acquired once

    sem_wait(&sem);         // Acquire (like lock)
    // critical section
    sem_post(&sem);         // Release (like unlock)

COUNTING SEMAPHORE (resource pool, initial value = N):

    sem_init(&sem, 0, 10);  // 10 resources available

    sem_wait(&sem);         // Acquire one resource
    // use resource
    sem_post(&sem);         // Return resource
Semaphore vs Mutex:
- Mutex: Locked/unlocked, must be released by the thread that acquired it
- Semaphore: Counter, can be signaled by any thread (useful for signaling)
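Because any thread may post, a semaphore initialized to 0 doubles as a one-shot signal between threads; a minimal sketch (data_ready is a hypothetical name):
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

static sem_t data_ready;  // starts at 0: "nothing ready yet"

static void *producer(void *arg) {
    /* ... prepare data ... */
    sem_post(&data_ready);       // signal: data is ready
    return NULL;
}

static void *consumer(void *arg) {
    sem_wait(&data_ready);       // blocks until the producer posts
    puts("data is ready");
    return NULL;
}

int main(void) {
    sem_init(&data_ready, 0, 0);
    pthread_t p, c;
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    sem_destroy(&data_ready);
    return 0;
}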
2.8 Condition Variables and Signaling
Condition variables enable threads to wait for a condition to become true.
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
int ready = 0;
// Waiting thread
void *waiter(void *arg) {
pthread_mutex_lock(&lock);
while (!ready) { // Always use while, not if!
pthread_cond_wait(&cond, &lock); // Atomically: unlock + sleep + relock
}
// Condition is now true
pthread_mutex_unlock(&lock);
return NULL;
}
// Signaling thread
void *signaler(void *arg) {
pthread_mutex_lock(&lock);
ready = 1;
pthread_cond_signal(&cond); // Wake one waiter
pthread_mutex_unlock(&lock);
return NULL;
}
CONDITION VARIABLE WAIT/SIGNAL

    pthread_cond_wait(&cond, &mutex):
    ---------------------------------
    1. Atomically: release mutex AND go to sleep on cond
    2. When signaled: wake up
    3. Atomically: reacquire mutex before returning

    CRITICAL: Must hold mutex when calling!
    CRITICAL: Always recheck condition in while loop (spurious wakeups)!

    pthread_cond_signal(&cond):       pthread_cond_broadcast(&cond):
    ---------------------------       ------------------------------
    Wake ONE waiting thread           Wake ALL waiting threads
    (or none if no waiters)           (each must reacquire mutex)
Why while loop, not if?
// WRONG - can miss condition change
if (!ready) {
pthread_cond_wait(&cond, &lock);
}
// ready might be false here due to:
// 1. Spurious wakeups (OS can wake thread without signal)
// 2. Another thread changed condition after we woke but before we locked
// CORRECT
while (!ready) {
pthread_cond_wait(&cond, &lock);
}
// ready is guaranteed true here
2.9 Producer-Consumer Pattern
The producer-consumer pattern is fundamental to concurrent systems: producers add work to a buffer, consumers remove and process it.
PRODUCER-CONSUMER PATTERN

    +----------+                                          +----------+
    |Producer 1|---+                                  +-->|Consumer 1|
    +----------+   |                                  |   +----------+
    +----------+   |    +------------------------+    |   +----------+
    |Producer 2|---+--->|     Bounded Buffer     |----+-->|Consumer 2|
    +----------+   |    |   [ W | W |   |   ]    |    |   +----------+
    +----------+   |    |     ^       ^          |    |   +----------+
    |Producer 3|---+    |   front    rear        |    +-->|Consumer 3|
    +----------+        +------------------------+        +----------+

    Synchronization requirements:
    -----------------------------
    1. Mutual exclusion: Only one thread modifies the buffer at a time
    2. Producer blocks: When buffer is FULL
    3. Consumer blocks: When buffer is EMPTY
Implementation with Semaphores
#define BUFFER_SIZE 10
typedef struct {
int buffer[BUFFER_SIZE];
int front; // Index for next remove
int rear; // Index for next insert
sem_t mutex; // Protects buffer access
sem_t slots; // Counts empty slots (initially BUFFER_SIZE)
sem_t items; // Counts full slots (initially 0)
} bounded_buffer_t;
void buffer_init(bounded_buffer_t *b) {
b->front = b->rear = 0;
sem_init(&b->mutex, 0, 1);
sem_init(&b->slots, 0, BUFFER_SIZE); // All slots empty
sem_init(&b->items, 0, 0); // No items yet
}
void buffer_insert(bounded_buffer_t *b, int item) {
sem_wait(&b->slots); // Wait for empty slot (decrements slots)
sem_wait(&b->mutex); // Lock buffer
b->buffer[b->rear] = item;
b->rear = (b->rear + 1) % BUFFER_SIZE;
sem_post(&b->mutex); // Unlock buffer
sem_post(&b->items); // Signal item available (increments items)
}
int buffer_remove(bounded_buffer_t *b) {
sem_wait(&b->items); // Wait for item (decrements items)
sem_wait(&b->mutex); // Lock buffer
int item = b->buffer[b->front];
b->front = (b->front + 1) % BUFFER_SIZE;
sem_post(&b->mutex); // Unlock buffer
sem_post(&b->slots); // Signal slot freed (increments slots)
return item;
}
Implementation with Condition Variables
typedef struct {
int buffer[BUFFER_SIZE];
int front, rear, count;
pthread_mutex_t lock;
pthread_cond_t not_empty;
pthread_cond_t not_full;
} bounded_buffer_cv_t;
void buffer_insert_cv(bounded_buffer_cv_t *b, int item) {
pthread_mutex_lock(&b->lock);
while (b->count == BUFFER_SIZE) { // Buffer full
pthread_cond_wait(&b->not_full, &b->lock);
}
b->buffer[b->rear] = item;
b->rear = (b->rear + 1) % BUFFER_SIZE;
b->count++;
pthread_cond_signal(&b->not_empty); // Wake a consumer
pthread_mutex_unlock(&b->lock);
}
int buffer_remove_cv(bounded_buffer_cv_t *b) {
pthread_mutex_lock(&b->lock);
while (b->count == 0) { // Buffer empty
pthread_cond_wait(&b->not_empty, &b->lock);
}
int item = b->buffer[b->front];
b->front = (b->front + 1) % BUFFER_SIZE;
b->count--;
pthread_cond_signal(&b->not_full); // Wake a producer
pthread_mutex_unlock(&b->lock);
return item;
}
2.10 Reader-Writer Locks
When data is read often but written rarely, reader-writer locks allow multiple concurrent readers but exclusive writers.
READER-WRITER LOCK SEMANTICS

    Reader wants access:            Writer wants access:
    --------------------            ---------------------

    If no writers active:           If no readers AND no writers:
        Grant read access               Grant write access
        (multiple readers OK)           (exclusive)

    If writer active:               If any readers OR a writer:
        Block                           Block

    Legal states:                   Illegal states:
    -------------                   ---------------

    [R] [R] [R] [R] <- Multiple     [R] [W]     <- Reader + Writer
    [W]             <- Single       [W] [W]     <- Multiple writers
    (empty)         <- No access    [R] [R] [W] <- Mixed
pthread_rwlock_t rwlock = PTHREAD_RWLOCK_INITIALIZER;
void *reader(void *arg) {
pthread_rwlock_rdlock(&rwlock); // Acquire read lock
// Read shared data (others can read too)
pthread_rwlock_unlock(&rwlock);
return NULL;
}
void *writer(void *arg) {
pthread_rwlock_wrlock(&rwlock); // Acquire write lock (exclusive)
// Modify shared data (no one else can access)
pthread_rwlock_unlock(&rwlock);
return NULL;
}
Starvation concerns:
- Reader preference: Writers may starve if readers keep arriving
- Writer preference: Readers may starve if writers keep arriving
- Fair: Requests handled in order (complex to implement)
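One way to get writer preference with a mutex and two condition variables; a sketch of the idea, not how pthread_rwlock is actually implemented (all names are assumptions; initialize the fields with pthread_mutex_init/pthread_cond_init before use):
#include <pthread.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  ok_to_read, ok_to_write;
    int readers;          // active readers
    int writer;           // 1 if a writer is active
    int waiting_writers;  // queued writers (this gives them preference)
} wp_rwlock_t;

void wp_read_lock(wp_rwlock_t *rw) {
    pthread_mutex_lock(&rw->lock);
    while (rw->writer || rw->waiting_writers > 0)   // yield to writers
        pthread_cond_wait(&rw->ok_to_read, &rw->lock);
    rw->readers++;
    pthread_mutex_unlock(&rw->lock);
}

void wp_read_unlock(wp_rwlock_t *rw) {
    pthread_mutex_lock(&rw->lock);
    if (--rw->readers == 0)
        pthread_cond_signal(&rw->ok_to_write);      // last reader out
    pthread_mutex_unlock(&rw->lock);
}

void wp_write_lock(wp_rwlock_t *rw) {
    pthread_mutex_lock(&rw->lock);
    rw->waiting_writers++;
    while (rw->writer || rw->readers > 0)
        pthread_cond_wait(&rw->ok_to_write, &rw->lock);
    rw->waiting_writers--;
    rw->writer = 1;
    pthread_mutex_unlock(&rw->lock);
}

void wp_write_unlock(wp_rwlock_t *rw) {
    pthread_mutex_lock(&rw->lock);
    rw->writer = 0;
    pthread_cond_signal(&rw->ok_to_write);    // prefer a waiting writer
    pthread_cond_broadcast(&rw->ok_to_read);  // readers proceed if none waits
    pthread_mutex_unlock(&rw->lock);
}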
2.11 Deadlock Conditions and Prevention
Deadlock occurs when threads wait for each other in a cycle, and none can proceed.
DEADLOCK SCENARIO

    Thread A                        Thread B
    -------------------             -------------------

    lock(mutex_A);                  lock(mutex_B);
    // ... some work ...            // ... some work ...
    lock(mutex_B);  <-- BLOCKS      lock(mutex_A);  <-- BLOCKS
    // waiting for B                // waiting for A

    +--------------------------------------------+
    |                                            |
    |  Thread A ----waits for----> mutex_B       |
    |     ^                           |          |
    |     |                           |          |
    |  held by                     held by       |
    |     |                           |          |
    |     |                           v          |
    |  mutex_A <----waits for---- Thread B       |
    |                                            |
    |         CIRCULAR WAIT = DEADLOCK           |
    +--------------------------------------------+
Four Necessary Conditions (Coffman Conditions)
All four must be present for deadlock:
- Mutual exclusion: Resources cannot be shared
- Hold and wait: Thread holds resources while waiting for more
- No preemption: Resources cannot be forcibly taken away
- Circular wait: Cycle of threads waiting for each other
Prevention Strategies
| Strategy | Breaks Condition | How |
|---|---|---|
| Lock ordering | Circular wait | Always acquire locks in global order |
| Lock timeout | Hold and wait | Give up and retry if lock not acquired in time |
| Try-lock | Hold and wait | Don't block; back off if you can't acquire |
| Lock hierarchy | Circular wait | Assign levels; only acquire lower-level locks |
| Single lock | Mutual exclusion | Use one coarse-grained lock (hurts parallelism) |
Lock ordering example:
// Define global order: mutex_A < mutex_B < mutex_C
// CORRECT: Always acquire in order
void safe_operation() {
pthread_mutex_lock(&mutex_A);
pthread_mutex_lock(&mutex_B);
// ... critical section ...
pthread_mutex_unlock(&mutex_B);
pthread_mutex_unlock(&mutex_A);
}
// WRONG: Violates ordering, risks deadlock
void unsafe_operation() {
pthread_mutex_lock(&mutex_B); // Bad: B before A
pthread_mutex_lock(&mutex_A);
}
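The try-lock row from the table above looks like this in practice; a sketch using pthread_mutex_trylock to break hold-and-wait (lock_both is a hypothetical helper):
#include <pthread.h>
#include <sched.h>

// Acquire two locks without holding one while blocking on the other.
void lock_both(pthread_mutex_t *a, pthread_mutex_t *b) {
    for (;;) {
        pthread_mutex_lock(a);
        if (pthread_mutex_trylock(b) == 0)
            return;                  // got both locks
        pthread_mutex_unlock(a);     // back off: release what we hold
        sched_yield();               // let the other thread make progress
    }
}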
2.12 Thread Pools and Work Queues
A thread pool maintains a fixed number of worker threads that pull tasks from a shared queue.
THREAD POOL ARCHITECTURE

    Clients
    -------
    +---+  +---+  +---+  +---+
    | C |  | C |  | C |  | C |   ... many clients
    +-+-+  +-+-+  +-+-+  +-+-+
      |      |      |      |
      +------+---+--+------+
                 |
                 v
    +-----------------------------------------------------------+
    |                       TASK QUEUE                          |
    |   [Task|Task|Task|Task|Task|    |    |    |    |    ]     |
    |   <--- Bounded buffer with producer-consumer sync --->    |
    +-----------------------------------------------------------+
                 |
                 v
    +-----------------------------------------------------------+
    |                       THREAD POOL                         |
    |                                                           |
    |  +--------+ +--------+ +--------+ +--------+ +--------+   |
    |  |Worker 0| |Worker 1| |Worker 2| |Worker 3| |Worker N|   |
    |  +---+----+ +---+----+ +---+----+ +---+----+ +---+----+   |
    |      |          |          |          |          |        |
    |      v          v          v          v          v        |
    |  [Execute]  [Execute]  [Waiting]  [Execute]  [Waiting]    |
    |   Task 1     Task 2    for task    Task 3    for task     |
    +-----------------------------------------------------------+

    Benefits:
    ---------
    - Bounded resource usage (fixed N threads)
    - Amortized thread creation cost
    - Backpressure when queue fills (producers block)
    - Natural load balancing across workers
Why thread pools?
| Without Pool | With Pool |
|---|---|
| Create thread per request | Reuse existing threads |
| Unbounded resource growth | Bounded thread count |
| Thread creation overhead per request | Creation cost amortized |
| Can exhaust system resources | Graceful degradation under load |
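A plausible public interface for such a pool, matching the architecture in Section 4 (the names are assumptions, not a fixed API):
// thread_pool.h (sketch)
#ifndef THREAD_POOL_H
#define THREAD_POOL_H

typedef struct thread_pool thread_pool_t;   // opaque handle

// Create num_workers threads sharing a queue of queue_size slots;
// handler runs in a worker for each submitted connection fd.
thread_pool_t *thread_pool_create(int num_workers, int queue_size,
                                  void (*handler)(int fd));

// Enqueue one fd; blocks when the queue is full (backpressure).
int thread_pool_submit(thread_pool_t *pool, int fd);

// Stop accepting work, drain the queue, join workers, free resources.
void thread_pool_shutdown(thread_pool_t *pool);

#endif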
2.13 Thread Safety and Reentrant Functions
A function is thread-safe if it can be called from multiple threads simultaneously without causing race conditions.
A function is reentrant if it doesn't use any shared state, a stricter condition than thread safety.
THREAD SAFETY CLASSIFICATION

    REENTRANT
    ---------
    - Uses only local variables (stack)
    - Doesn't access global/static data
    - Doesn't call non-reentrant functions
    - Automatically thread-safe

    Example:
        int square(int x) { return x * x; }

    THREAD-SAFE (but not reentrant)
    -------------------------------
    - Uses shared state with proper synchronization
    - May use static variables protected by locks

    Example:
        int counter = 0;
        pthread_mutex_t lock;

        int get_next() {
            pthread_mutex_lock(&lock);
            int val = counter++;
            pthread_mutex_unlock(&lock);
            return val;
        }

    NOT THREAD-SAFE
    ---------------
    - Accesses shared state without synchronization
    - Returns pointers to static storage

    Examples:
        strtok()     - uses static buffer
        localtime()  - returns pointer to static struct
        rand()       - uses static seed

    Thread-safe alternatives:
        strtok_r()     - user provides buffer
        localtime_r()  - user provides output struct
        rand_r()       - user provides seed
Guidelines for thread-safe programming:
- Prefer reentrant functions when possible
- Use _r versions of standard library functions
- Protect shared state with appropriate synchronization
- Avoid returning pointers to static storage
- Use thread-local storage for per-thread state
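For instance, the strtok replacement from the classification above in use; a small sketch:
#include <stdio.h>
#include <string.h>

int main(void) {
    char line[] = "GET /index.html HTTP/1.0";
    char *saveptr;   // caller-owned parse state instead of a static buffer

    for (char *tok = strtok_r(line, " ", &saveptr);
         tok != NULL;
         tok = strtok_r(NULL, " ", &saveptr)) {
        printf("token: %s\n", tok);
    }
    return 0;
}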
2.14 Memory Consistency Models (Basics)
Modern CPUs and compilers reorder operations for performance, which can break assumptions in concurrent code.
MEMORY ORDERING SURPRISES

    // Thread A              // Thread B
    x = 1;                   while (ready == 0) { }
    ready = 1;               print(x);

    You expect: Thread B prints 1
    Reality:    Thread B might print 0!

    Why? Compiler or CPU might reorder Thread A's stores:
        ready = 1;   // Stored first!
        x = 1;       // Stored second

WHAT CAN REORDER OPERATIONS:

    1. COMPILER: Reorders for optimization
       - Can be prevented with volatile or memory barriers

    2. CPU: Out-of-order execution, store buffers
       - Requires hardware memory barriers (mfence, etc.)

    3. CACHE: Different cores see updates at different times
       - Cache coherency protocols handle this (eventually)

SOLUTION: Use proper synchronization primitives!

    pthread_mutex_lock/unlock include necessary memory barriers
    sem_wait/post include necessary memory barriers
    C11 atomics provide explicit memory ordering

    DON'T try to roll your own lock-free code without deep expertise!
Practical advice:
- Always use pthreads primitives or C11 atomics
- Don't assume operations happen in source-code order
- Testing doesnโt prove absence of memory ordering bugs
- Lock-free programming is expert-level (avoid unless necessary)
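Applied to the ready-flag example above, C11 atomics make the required ordering explicit; a minimal sketch:
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static int x;               // plain data, published via 'ready'
static atomic_int ready;

static void *producer(void *arg) {
    x = 1;
    // Release: writes above become visible to an acquire load of 'ready'.
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

static void *consumer(void *arg) {
    // Acquire: once we observe ready == 1, we also observe x == 1.
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        ;                    // spin (fine for a demo)
    printf("x = %d\n", x);   // guaranteed to print 1
    return NULL;
}

int main(void) {
    pthread_t p, c;
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}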
3. Project Specification
3.1 What You Will Build
A Concurrency Workbench: a configurable server framework that demonstrates and compares different concurrency models under load, with built-in instrumentation and stress testing.
3.2 Server Modes
The server must support switching between these concurrency models at startup:
CONCURRENCY MODEL COMPARISON

MODE 1: ITERATIVE (baseline)

    [Request1]--->[Handle 1]--->[Request2]--->[Handle 2]---> ...

    - One request at a time
    - Simple, no concurrency bugs possible
    - Terrible for blocking I/O or slow clients

MODE 2: PROCESS-PER-REQUEST

    fork()        +--------------+
    ----+----+--->| Child Proc 1 |---> Handle Request 1
        |         +--------------+
        | Parent
        |         +--------------+
        +-------->| Child Proc 2 |---> Handle Request 2
        |         +--------------+
        |
        +--------> ...

    - Maximum isolation (crash doesn't affect parent)
    - High overhead (fork per request)
    - No shared state between requests (simple)

MODE 3: THREAD-PER-REQUEST

    +---------------------------------------------------------+
    | Main Thread                                             |
    |   accept() ---> pthread_create() ---> accept() ---> ... |
    +---------------------------------------------------------+
               |                    |
               v                    v
       +--------------+     +--------------+
       |   Thread 1   |     |   Thread 2   |
       |  Handle Req  |     |  Handle Req  |
       +--------------+     +--------------+

    - Lower overhead than fork()
    - Still unbounded thread creation
    - Shared state requires synchronization

MODE 4: THREAD POOL (most sophisticated)

    +---------------------------------------------------------+
    | Main Thread                                             |
    |   accept() ---> enqueue(conn_fd) ---> accept() ---> ... |
    +---------------------------------------------------------+
                 |
                 v
    +---------------------------------------------------------+
    | Bounded Task Queue                                      |
    |   [fd 5|fd 9|fd 3|    |    |    |    |    ]             |
    +---------------------------------------------------------+
                 |
        +--------+---------+------------------+
        v                  v                  v
    +----------+      +----------+      +----------+
    | Worker 1 |      | Worker 2 |      | Worker N |
    |  dequeue |      |  dequeue |      |  dequeue |
    |  handle  |      |  handle  |      |  handle  |
    +----------+      +----------+      +----------+

    - Bounded resource usage
    - Backpressure when overloaded (queue fills)
    - Thread creation amortized over many requests
    - Requires careful synchronization
3.3 Functional Requirements
- Mode Selection (--mode <iterative|process|thread|pool>):
  - Select concurrency model at startup
  - Default to thread pool mode
- Thread Pool Configuration (--threads <N>, --queue-size <M>):
  - Configurable worker thread count
  - Configurable bounded buffer size
- Echo Service:
  - Simple echo server for testing (read line, write line)
  - Configurable artificial delay per request (for testing concurrency)
- Instrumentation (--stats):
  - Requests handled per second
  - Average latency per request
  - Queue depth over time (for pool mode)
  - Active workers over time
- Graceful Shutdown (see the sketch after this list):
  - Handle SIGINT to stop accepting new connections
  - Complete in-progress requests
  - Clean thread/process termination
- Debug Mode (--debug):
  - Log every operation with timestamps
  - Assert invariants continuously
  - Detect and report races/deadlocks
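One way to wire the graceful-shutdown requirement above; a sketch assuming a thread-pool-mode accept loop (install_shutdown_handler and the loop outline are hypothetical names):
#include <signal.h>

static volatile sig_atomic_t shutting_down;

static void on_sigint(int sig) {
    (void)sig;
    shutting_down = 1;      // async-signal-safe: just set a flag
}

void install_shutdown_handler(void) {
    struct sigaction sa = {0};
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    // No SA_RESTART: accept() returns with EINTR so the loop sees the flag.
    sigaction(SIGINT, &sa, NULL);
}

// Accept loop outline:
//   while (!shutting_down) {
//       int fd = accept(listen_fd, NULL, NULL);
//       if (fd < 0) continue;            // EINTR on Ctrl+C
//       thread_pool_submit(pool, fd);
//   }
//   thread_pool_shutdown(pool);          // drain queue, join workers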
3.4 Non-Functional Requirements
- Correctness: No data races, no deadlocks, proper resource cleanup
- Stability: Run for hours under stress without crashes or leaks
- Observability: Every concurrency decision should be loggable
- Portability: Works on Linux (primary), macOS (stretch goal)
3.5 Example Usage
# Start server in thread pool mode with 4 workers and queue size 16
$ ./concurrency-workbench --mode pool --threads 4 --queue-size 16 --port 8080
Concurrency Workbench v1.0
Mode: Thread Pool (4 workers, queue size 16)
Listening on port 8080...
# In another terminal, run stress test
$ ./stress-test --clients 100 --requests 1000 --target localhost:8080
Running stress test: 100 concurrent clients, 1000 requests each
[========================================] 100000/100000 requests
Results:
Total time: 12.3 seconds
Requests/sec: 8130
Avg latency: 12.3 ms
P99 latency: 45.2 ms
Errors: 0
# Compare with other modes
$ ./run-comparison.sh
Mode RPS Avg Latency P99 Latency Errors
-------------------------------------------------------------
iterative 142 703.4 ms 2100 ms 0
process 1823 54.9 ms 289 ms 0
thread 6412 15.6 ms 67 ms 0
pool (4) 8130 12.3 ms 45 ms 0
pool (8) 9847 10.2 ms 38 ms 0
pool (16) 10234 9.8 ms 41 ms 0
4. Solution Architecture
4.1 High-Level Design
CONCURRENCY WORKBENCH ARCHITECTURE

    +----------------------------------------------------------------+
    | CLI Layer                                                      |
    | main.c: Argument parsing, mode selection, startup/shutdown     |
    +--------------------------------+-------------------------------+
                                     |
                                     v
    +----------------------------------------------------------------+
    | Server Abstraction                                             |
    | server.h: Common interface for all concurrency modes           |
    |                                                                |
    |   typedef struct server {                                      |
    |       void (*start)(struct server *, int port);                |
    |       void (*stop)(struct server *);                           |
    |       stats_t (*get_stats)(struct server *);                   |
    |   } server_t;                                                  |
    +--------------------------------+-------------------------------+
                                     |
            +------------------------+------------------------+
            v                        v                        v
    +---------------+       +---------------+      +---------------------+
    |   Iterative   |       |    Process    |      |    Thread Server    |
    |    Server     |       |    Server     |      |  thread_simple:     |
    |  iterative.c  |       |   process.c   |      |    thread.c         |
    +---------------+       +---------------+      |  thread_pool:       |
                                                   |    thread_pool.c    |
                                                   +----------+----------+
                                                              |
                                                              v
    +----------------------------------------------------------------+
    | Core Components                                                |
    |                                                                |
    |   Bounded Buffer      Thread Pool        Request Handler       |
    |   bounded_buf.c       threadpool.c       handler.c             |
    |   - Producer          - Worker mgmt      - Echo protocol       |
    |   - Consumer          - Task submit      - Response format     |
    |   - Blocking ops      - Stats            - Timing              |
    |                                                                |
    |   Statistics          Robust I/O         Debug/Assert          |
    |   stats.c             rio.c              debug.c               |
    |   - Counters          - Partial R/W      - Invariant check     |
    |   - Latency           - Buffering        - Race detection      |
    |   - Throughput        - Error handle     - Logging             |
    +----------------------------------------------------------------+
4.2 Key Components
| Component | Responsibility | Key Challenges |
|---|---|---|
| Bounded Buffer | Thread-safe work queue | Correct blocking on empty/full |
| Thread Pool | Worker lifecycle management | Clean shutdown, error handling |
| Request Handler | Protocol implementation | Thread-safe design |
| Statistics | Metrics collection | Lock-free or low-overhead synchronization |
| Debug System | Invariant checking | Non-intrusive, togglable |
4.3 Data Structures
// Bounded buffer for work queue
typedef struct {
int *buffer; // Array of file descriptors
int capacity; // Maximum items
int count; // Current items
int front, rear; // Circular queue indices
pthread_mutex_t lock; // Protects all fields
pthread_cond_t not_empty; // Signaled when item added
pthread_cond_t not_full; // Signaled when item removed
int shutdown; // Shutdown flag
} bounded_buffer_t;
// Thread pool
typedef struct {
pthread_t *workers; // Worker thread handles
int num_workers; // Number of workers
bounded_buffer_t *queue; // Work queue
void (*handler)(int fd); // Request handler function
// Statistics (atomics for lock-free access)
_Atomic uint64_t requests_completed;
_Atomic uint64_t total_latency_us;
_Atomic int active_workers;
int shutdown; // Shutdown flag
} thread_pool_t;
// Server statistics
typedef struct {
uint64_t total_requests;
uint64_t requests_per_second;
double avg_latency_ms;
double p50_latency_ms;
double p99_latency_ms;
int queue_depth;
int active_workers;
} stats_t;
4.4 Algorithm Overview
Thread Pool Worker Loop:
WORKER THREAD ALGORITHM

    worker_thread():

    loop:
      |
      v
    +------------------------------------------+
    | fd = bounded_buffer_get(queue)           |
    |                                          |
    | - Acquire lock                           |
    | - While queue empty AND not shutdown:    |
    |     Wait on not_empty condition          |
    | - If shutdown: return                    |
    | - Dequeue fd                             |
    | - Signal not_full                        |
    | - Release lock                           |
    +--------------------+---------------------+
                         |
                         v
    +------------------------------------------+
    | active_workers++                         |
    | start_time = now()                       |
    +--------------------+---------------------+
                         |
                         v
    +------------------------------------------+
    | handle_request(fd)                       |
    |                                          |
    | - Read client data                       |
    | - Process (echo back)                    |
    | - Write response                         |
    | - Close fd                               |
    +--------------------+---------------------+
                         |
                         v
    +------------------------------------------+
    | latency = now() - start_time             |
    | total_latency += latency                 |
    | requests_completed++                     |
    | active_workers--                         |
    +--------------------+---------------------+
                         |
                         +---> loop
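The same loop in C; a sketch using the thread_pool_t and bounded_buffer_t from Section 4.3, and assuming bounded_buffer_get returns -1 once shutdown is signaled:
#include <stdatomic.h>
#include <stdint.h>
#include <sys/time.h>

static uint64_t now_us(void) {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (uint64_t)tv.tv_sec * 1000000 + (uint64_t)tv.tv_usec;
}

static void *worker_thread(void *arg) {
    thread_pool_t *pool = arg;

    for (;;) {
        int fd = bounded_buffer_get(pool->queue);   // blocks when empty
        if (fd < 0)                                 // shutdown sentinel
            break;

        atomic_fetch_add(&pool->active_workers, 1);
        uint64_t start = now_us();

        pool->handler(fd);                          // read, echo, close fd

        atomic_fetch_add(&pool->total_latency_us, now_us() - start);
        atomic_fetch_add(&pool->requests_completed, 1);
        atomic_fetch_sub(&pool->active_workers, 1);
    }
    return NULL;
}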
5. Implementation Guide
5.1 Development Environment Setup
# Required packages (Ubuntu/Debian)
sudo apt-get install build-essential gdb valgrind
# For helgrind (race detection) and drd (thread error detection)
# Included with valgrind
# Optional: ThreadSanitizer (requires clang or recent gcc)
# Compile with: gcc -fsanitize=thread -g ...
# Create project structure
mkdir -p concurrency-workbench/{src,include,tests,scripts}
cd concurrency-workbench
5.2 Project Structure
concurrency-workbench/
├── include/
│   ├── bounded_buffer.h   # Bounded buffer interface
│   ├── thread_pool.h      # Thread pool interface
│   ├── server.h           # Server abstraction
│   ├── handler.h          # Request handler
│   ├── stats.h            # Statistics collection
│   ├── rio.h              # Robust I/O
│   └── debug.h            # Debug/assert utilities
├── src/
│   ├── main.c             # Entry point, CLI
│   ├── bounded_buffer.c   # Producer-consumer queue
│   ├── thread_pool.c      # Thread pool implementation
│   ├── server_iterative.c # Iterative server
│   ├── server_process.c   # Process-per-request server
│   ├── server_thread.c    # Thread-per-request server
│   ├── server_pool.c      # Thread pool server
│   ├── handler.c          # Echo handler
│   ├── stats.c            # Statistics
│   ├── rio.c              # Robust I/O
│   └── debug.c            # Debug utilities
├── tests/
│   ├── test_bounded_buffer.c
│   ├── test_thread_pool.c
│   ├── stress_test.c
│   └── race_detector.c
├── scripts/
│   ├── run_comparison.sh
│   └── analyze_stats.py
└── Makefile
5.3 Implementation Phases
Phase 1: Foundation (Days 1-3)
Goals:
- Set up project structure
- Implement robust I/O
- Create basic echo handler
Tasks:
- Create Makefile with proper compilation flags
- Implement rio_readn() and rio_writen() for robust I/O
- Implement basic echo handler (read line, echo back)
- Create server socket setup utilities
Checkpoint: Can create a listening socket and handle one connection manually.
Phase 2: Iterative Server (Days 4-5)
Goals:
- Implement baseline iterative server
- Add basic statistics collection
Tasks:
- Implement accept loop with single-threaded handling
- Add request counting and timing
- Test with multiple sequential clients
- Observe blocking behavior with slow client
Checkpoint: Server handles requests one at a time, reports requests/second.
Phase 3: Process-Based Server (Days 6-7)
Goals:
- Implement process-per-request model
- Handle zombie process cleanup
Tasks:
- Fork child process for each accepted connection
- Implement SIGCHLD handler for reaping
- Handle error cases (fork failure, child crash)
- Compare throughput with iterative
Checkpoint: Server handles concurrent clients via fork, no zombie processes.
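The reaping task in this phase usually takes the standard shape below; a sketch (install_sigchld_handler is a hypothetical name):
#include <errno.h>
#include <signal.h>
#include <sys/wait.h>

static void on_sigchld(int sig) {
    (void)sig;
    int saved_errno = errno;           // waitpid may clobber errno
    // Reap every child that has exited; WNOHANG keeps us from blocking.
    while (waitpid(-1, NULL, WNOHANG) > 0)
        ;
    errno = saved_errno;
}

void install_sigchld_handler(void) {
    struct sigaction sa = {0};
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;          // don't break accept() in the parent
    sigaction(SIGCHLD, &sa, NULL);
}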
Phase 4: Thread-Per-Request Server (Days 8-9)
Goals:
- Implement thread-per-request model
- Handle thread lifecycle
Tasks:
- Create detached thread for each connection
- Ensure proper resource cleanup
- Handle thread creation failure
- Compare throughput with process model
Checkpoint: Server handles concurrent clients via threads, no resource leaks.
Phase 5: Bounded Buffer (Days 10-12)
Goals:
- Implement thread-safe bounded buffer
- Test thoroughly under contention
Tasks:
- Implement circular buffer with mutex and condition variables
- Implement blocking put() and get() operations
- Add shutdown signaling
- Write comprehensive unit tests
Checkpoint: Bounded buffer passes stress tests with many producers/consumers.
Phase 6: Thread Pool (Days 13-16)
Goals:
- Implement complete thread pool
- Add graceful shutdown
Tasks:
- Create worker thread management
- Integrate bounded buffer as work queue
- Implement shutdown sequence
- Add statistics collection
Checkpoint: Thread pool handles work items, shuts down cleanly.
Phase 7: Thread Pool Server (Days 17-19)
Goals:
- Integrate thread pool with server
- Compare all models
Tasks:
- Connect accept loop to thread pool submission
- Handle queue-full backpressure
- Implement comprehensive statistics
- Run comparison benchmarks
Checkpoint: All four server modes work, with performance comparison data.
Phase 8: Stress Testing and Bug Hunting (Days 20-21)
Goals:
- Create stress test harness
- Find and fix concurrency bugs
Tasks:
- Build multi-client stress test tool
- Run extended stress tests (hours)
- Use helgrind/ThreadSanitizer to find races
- Fix all discovered issues
Checkpoint: Server runs stably under stress, no races detected.
5.4 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Synchronization | Semaphores vs Condition Variables | Condition Variables | More flexible, clearer semantics |
| Statistics | Per-operation locks vs Atomics | Atomics | Lower overhead, sufficient for counters |
| Shutdown | Immediate vs Graceful | Graceful | Real-world expectation |
| Error Handling | Abort vs Continue | Log and continue | Server should be resilient |
| Queue | Fixed array vs Dynamic | Fixed array | Predictable memory, simpler |
6. Testing Strategy
6.1 Unit Tests
// test_bounded_buffer.c
void test_single_producer_consumer() {
bounded_buffer_t buf;
bounded_buffer_init(&buf, 10);
// Producer thread
pthread_t producer;
pthread_create(&producer, NULL, producer_fn, &buf);
// Consumer in main thread
for (int i = 0; i < 100; i++) {
int item = bounded_buffer_get(&buf);
assert(item == i);
}
pthread_join(producer, NULL);
bounded_buffer_destroy(&buf);
}
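// Note: producer_fn above is not defined in the text. One plausible
// version (an assumption, matching the Phase 5 put/get API): push
// 0..99 in order so the consumer's assert(item == i) holds.
static void *producer_fn(void *arg) {
    bounded_buffer_t *buf = arg;
    for (int i = 0; i < 100; i++)
        bounded_buffer_put(buf, i);
    return NULL;
}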
void test_multiple_producers_consumers() {
// 10 producers, 10 consumers, 1000 items each
// Verify all items received exactly once
}
void test_buffer_bounds() {
bounded_buffer_t buf;
bounded_buffer_init(&buf, 5);
// Fill buffer
for (int i = 0; i < 5; i++) {
bounded_buffer_put(&buf, i); // Should not block
}
// Next put should block (test with tryput or timeout)
// ...
}
void test_shutdown() {
// Verify waiting threads unblock on shutdown
}
6.2 Integration Tests
// test_server.c
void test_echo_correctness() {
// Start server in thread
// Connect as client
// Send message, verify echo
// Close connection
// Stop server
}
void test_concurrent_clients() {
// Start server
// Launch 100 clients in parallel
// Each sends/receives 10 messages
// Verify all succeeded
}
void test_slow_client() {
// Client connects but reads slowly
// Other clients should not be affected
}
6.3 Stress Tests
# stress_test.c - Configurable stress test client
# Basic stress test
./stress-test --clients 100 --requests 1000 --target localhost:8080
# Long duration test
./stress-test --clients 50 --duration 3600 --target localhost:8080
# Burst test (all requests at once)
./stress-test --clients 1000 --requests 1 --burst --target localhost:8080
# Slow client test
./stress-test --clients 10 --delay-ms 100 --target localhost:8080
6.4 Race Detection
# Compile with ThreadSanitizer
gcc -fsanitize=thread -g -O1 -o server_tsan src/*.c -lpthread
# Run with helgrind (Valgrind tool)
valgrind --tool=helgrind ./concurrency-workbench --mode pool
# Run with DRD (Valgrind tool, often faster)
valgrind --tool=drd ./concurrency-workbench --mode pool
6.5 Critical Test Cases
- Race condition in counter:
- Multiple threads incrementing shared counter
- Verify final count matches expected
- Producer-consumer correctness:
- N producers, M consumers, K items each
- Verify each item consumed exactly once
- Deadlock scenario:
- Create conditions for potential deadlock
- Verify no hang occurs
- Shutdown under load:
- Signal shutdown while handling requests
- Verify clean termination
- Queue overflow:
- Submit more work than queue capacity
- Verify backpressure works correctly
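Test case 1 can be as small as this; a sketch that fails its assert when the mutex lines are removed and passes with them:
#include <assert.h>
#include <pthread.h>

#define NTHREADS 8
#define NITERS   100000

static long counter;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *incr(void *arg) {
    for (int i = 0; i < NITERS; i++) {
        pthread_mutex_lock(&lock);   // remove these two lines to see the race
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, incr, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    assert(counter == (long)NTHREADS * NITERS);
    return 0;
}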
7. Common Pitfalls
7.1 Deadlocks
| Pitfall | Symptom | Solution |
|---|---|---|
| Lock ordering violation | Complete hang | Always acquire locks in same order |
| Self-deadlock | Thread blocks forever | Don't lock a mutex you already hold |
| Condition variable without mutex | Corruption, deadlock | Always hold mutex when waiting |
| Forgetting to unlock | Other threads block | Use RAII pattern or careful review |
// WRONG: Can deadlock if threads acquire in different order
void transfer(account_t *from, account_t *to, int amount) {
pthread_mutex_lock(&from->lock);
pthread_mutex_lock(&to->lock); // Deadlock if other thread does to->from
// transfer
pthread_mutex_unlock(&to->lock);
pthread_mutex_unlock(&from->lock);
}
// CORRECT: Always lock by address order
void transfer(account_t *from, account_t *to, int amount) {
account_t *first = (from < to) ? from : to;
account_t *second = (from < to) ? to : from;
pthread_mutex_lock(&first->lock);
pthread_mutex_lock(&second->lock);
// transfer
pthread_mutex_unlock(&second->lock);
pthread_mutex_unlock(&first->lock);
}
7.2 Data Races
| Pitfall | Symptom | Solution |
|---|---|---|
| Unprotected shared variable | Corrupted data, crashes | Use mutex or atomics |
| Read without lock | Stale data | Lock for reads too |
| Non-atomic compound operation | Lost updates | Hold lock for entire operation |
| Flag without synchronization | Missed signals | Use atomic or condition variable |
// WRONG: Data race on 'done' flag
int done = 0;
void *producer(void *arg) {
// produce data
done = 1; // No memory barrier!
return NULL;
}
void *consumer(void *arg) {
while (!done) { // May never see update
// wait
}
// consume data - may see stale data!
}
// CORRECT: Use atomic or condition variable
_Atomic int done = 0;
void *producer(void *arg) {
// produce data
atomic_store(&done, 1);
return NULL;
}
void *consumer(void *arg) {
while (!atomic_load(&done)) {
// wait
}
// consume data
}
7.3 Resource Leaks
| Pitfall | Symptom | Solution |
|---|---|---|
| Not joining threads | Memory leak | Join or detach all threads |
| Not closing file descriptors | FD exhaustion | Close in all paths, including errors |
| Not destroying mutexes | Memory leak | Destroy in cleanup |
| Not reaping child processes | Zombie processes | Handle SIGCHLD |
// WRONG: Leaks on error
void handle_client(int fd) {
char *buf = malloc(1024);
if (read(fd, buf, 1024) < 0) {
return; // Leaks buf!
}
// ...
free(buf);
close(fd);
}
// CORRECT: Cleanup in all paths
void handle_client(int fd) {
char *buf = malloc(1024);
if (read(fd, buf, 1024) < 0) {
free(buf);
close(fd);
return;
}
// ...
free(buf);
close(fd);
}
// BETTER: Use goto for cleanup (common C pattern)
void handle_client(int fd) {
char *buf = NULL;
buf = malloc(1024);
if (read(fd, buf, 1024) < 0) {
goto cleanup;
}
// ...
cleanup:
free(buf);
close(fd);
}
7.4 Performance Pitfalls
| Pitfall | Symptom | Solution |
|---|---|---|
| Lock contention | Poor scaling | Reduce critical section, finer locks |
| False sharing | Cache thrashing | Pad data structures |
| Too many threads | Context switch overhead | Use thread pool |
| Busy waiting | High CPU, poor latency | Use condition variables |
7.5 Debugging Strategies
- Add extensive logging (with timestamps and thread IDs):
  #define DEBUG_LOG(fmt, ...) \
      fprintf(stderr, "[%lu][%p] " fmt "\n", \
              time_us(), (void*)pthread_self(), ##__VA_ARGS__)

- Use assertions liberally:

  void bounded_buffer_put(bounded_buffer_t *b, int item) {
      pthread_mutex_lock(&b->lock);
      assert(b->count >= 0 && b->count <= b->capacity);
      // ...
  }

- Introduce artificial delays to expose races:

  #ifdef DEBUG_RACES
  usleep(rand() % 1000);  // Random delay to expose timing issues
  #endif

- Use helgrind/ThreadSanitizer in CI
8. Extensions and Challenges
8.1 Beginner Extensions
- Multiple handler types: Add handlers for different protocols (HTTP, time, etc.)
- Connection timeout: Disconnect idle clients after timeout
- Logging to file: Structured logging with rotation
8.2 Intermediate Extensions
- Priority queue: High-priority requests handled first
- Dynamic thread pool: Grow/shrink based on load
- Connection limiting: Max clients per IP
- I/O multiplexing mode: Add epoll/select-based model
8.3 Advanced Extensions
- Lock-free bounded buffer: Use compare-and-swap instead of locks
- Work stealing: Workers steal from each other's queues
- NUMA awareness: Pin threads to cores, local memory
- Custom allocator: Per-thread arena for request data
8.4 Lock-Free Data Structures (Advanced)
A lock-free bounded buffer using compare-and-swap:
// WARNING: Lock-free programming is extremely difficult!
// This is for educational purposes; use tested libraries in production.
typedef struct {
_Atomic int *buffer;
_Atomic size_t head;
_Atomic size_t tail;
size_t capacity;
} lockfree_queue_t;
int lockfree_push(lockfree_queue_t *q, int item) {
    size_t tail, next;
    do {
        tail = atomic_load(&q->tail);
        next = (tail + 1) % q->capacity;
        if (next == atomic_load(&q->head)) {
            return -1; // Full
        }
    } while (!atomic_compare_exchange_weak(&q->tail, &tail, next));
    // Caveat: the slot is published (tail advanced) before the item is
    // stored, so a concurrent pop may read the slot too early. Production
    // queues add per-slot sequence numbers to close this window.
    atomic_store(&q->buffer[tail], item);
    return 0;
}
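The matching pop is not shown in the text; a sketch under the same caveat, assuming the struct above:
int lockfree_pop(lockfree_queue_t *q, int *out) {
    size_t head, next;
    do {
        head = atomic_load(&q->head);
        if (head == atomic_load(&q->tail)) {
            return -1; // Empty
        }
        next = (head + 1) % q->capacity;
        *out = atomic_load(&q->buffer[head]);  // read before claiming slot
    } while (!atomic_compare_exchange_weak(&q->head, &head, next));
    return 0;
}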
9. Real-World Connections
9.1 Industry Applications
- Web servers (nginx, Apache): Thread pools for request handling
- Databases (PostgreSQL, MySQL): Connection pools, worker processes
- Message queues (RabbitMQ, Kafka): Producer-consumer patterns
- Game servers: Thread pools for player connections
- Cloud infrastructure: Work stealing, NUMA-aware scheduling
9.2 Related Open Source Projects
- libuv: Event loop library used by Node.js
- libevent: Event notification library
- Intel TBB: Threading Building Blocks with work stealing
- jemalloc: Thread-aware memory allocator
- Seastar: High-performance async framework
9.3 Interview Relevance
This project prepares you for questions like:
- "Design a thread pool from scratch"
- "How would you implement a producer-consumer queue?"
- "What are the four conditions for deadlock? How do you prevent it?"
- "Compare process-based vs thread-based concurrency"
- "How would you debug a race condition?"
- "Explain the difference between a mutex and a semaphore"
10. Resources
10.1 Essential Reading
- CS:APP Chapter 12: "Concurrent Programming" - Core material
- OSTEP Concurrency Chapters: Free textbook with excellent explanations
- Chapter 26: Concurrency Introduction
- Chapter 27: Interlude: Thread API
- Chapter 28: Locks
- Chapter 29: Lock-based Concurrent Data Structures
- Chapter 30: Condition Variables
- Chapter 31: Semaphores
- Chapter 32: Common Concurrency Problems
- Chapter 33: Event-based Concurrency
10.2 Reference Documentation
- POSIX Threads Programming: https://computing.llnl.gov/tutorials/pthreads/
- pthread man pages: man pthread_create, man pthread_mutex_init, etc.
- GCC Atomic Builtins: GCC documentation on __atomic_* operations
10.3 Video Resources
- MIT 6.004 lectures on concurrency
- Carnegie Mellon 15-213 lectures (CS:APP course)
- "Concurrency is not Parallelism" by Rob Pike
10.4 Tools
- helgrind: Valgrind tool for detecting races
- DRD: Valgrind tool for thread errors
- ThreadSanitizer: Clang/GCC tool for race detection
- gdb: Thread-aware debugging
10.5 Related Projects in This Series
- Previous: P15 (Robust Unix I/O Toolkit) - I/O foundations
- Previous: P11 (Signals + Processes Sandbox) - Process control foundations
- Next: P17 (Capstone Proxy) - Integrates all concepts including concurrency
11. Self-Assessment Checklist
Before considering this project complete, verify:
Understanding
- I can explain the difference between concurrency and parallelism
- I can describe when to use processes vs threads
- I can implement a mutex from scratch (conceptually)
- I can explain why counter++ is not atomic
- I can describe the four conditions for deadlock
- I can implement producer-consumer with semaphores AND condition variables
- I can explain spurious wakeups and why we use while, not if
- I can describe what makes a function thread-safe vs reentrant
Implementation
- All four server modes work correctly
- Bounded buffer handles all edge cases (empty, full, shutdown)
- Thread pool shuts down gracefully
- No resource leaks (threads, fds, memory)
- Statistics are accurate under load
- Server handles client errors gracefully
Debugging
- I can use helgrind/ThreadSanitizer to find races
- I can debug deadlocks using gdb
- I have found and fixed at least one real concurrency bug
- My stress tests run for hours without issues
Performance
- I can explain why thread pool beats thread-per-request under high load
- I understand the overhead of context switches
- I can tune thread count for my workload
- I can identify lock contention as a bottleneck
12. Real World Outcome
When you complete this project, hereโs exactly what youโll see when running your concurrency workbench:
Server Mode Selection
$ ./workbench --mode iterative --port 8080
Concurrency Workbench v1.0
Mode: iterative (single-threaded, blocking)
Listening on port 8080...
Press Ctrl+C for statistics and shutdown.
[2025-12-26 10:15:23] Client 192.168.1.10:54321 connected
[2025-12-26 10:15:23] Request: GET /compute?n=100
[2025-12-26 10:15:23] Response: 200 OK (23ms)
[2025-12-26 10:15:24] Client 192.168.1.10:54321 disconnected
Comparing Concurrency Modes
$ ./workbench-bench --compare --clients 100 --requests 1000
=== CONCURRENCY MODE COMPARISON ===
Test: 100 concurrent clients, 1000 requests each
Work per request: 10ms simulated CPU + 5ms simulated I/O
Mode Throughput Avg Latency P99 Latency CPU Usage
--------------------------------------------------------------------------------
Iterative 67 req/s 1492 ms 1523 ms 12%
Process-per-Request 823 req/s 98 ms 342 ms 89%
Thread-per-Request 2,847 req/s 31 ms 87 ms 94%
Thread Pool (8) 4,123 req/s 22 ms 45 ms 97%
Thread Pool (16) 4,891 req/s 19 ms 38 ms 99%
Thread Pool (32) 4,752 req/s 20 ms 41 ms 99%
Analysis:
- Iterative: Serial processing, terrible latency (requests queue behind one another)
- Process-per-request: ~12x faster, but fork() overhead visible
- Thread-per-request: ~3.5x faster than processes (no fork overhead)
- Thread pool: Best throughput, optimal at ~16 threads (matches CPU cores; see the sizing heuristic below)
- Diminishing returns beyond CPU count (context switch overhead)
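A common rule of thumb for pool sizing (from Java Concurrency in Practice, cited in the book table later in this guide) matches the shape of this curve:

    threads ≈ cores × (1 + wait_time / compute_time)

With this workload's 5 ms of simulated I/O per 10 ms of CPU, that suggests roughly 1.5 threads per core; beyond that, extra threads mostly buy context-switch and lock-contention overhead, which is exactly the dip visible at 32 threads.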
Bounded Buffer in Action
$ ./workbench --mode pool --threads 4 --queue-size 10 --port 8080 --verbose
=== THREAD POOL INITIALIZATION ===
Queue capacity: 10 slots
Worker threads: 4
[MAIN] Starting worker threads...
Thread 0 (tid 12345) started, waiting for work
Thread 1 (tid 12346) started, waiting for work
Thread 2 (tid 12347) started, waiting for work
Thread 3 (tid 12348) started, waiting for work
[MAIN] Server ready. Queue state: [0/10 items]
=== UNDER LOAD ===
[10:15:23.001] Connection from 192.168.1.10 -> enqueue (queue: 1/10)
[10:15:23.002] Connection from 192.168.1.11 -> enqueue (queue: 2/10)
[10:15:23.002] Thread 0: dequeue, processing client 192.168.1.10
[10:15:23.003] Thread 1: dequeue, processing client 192.168.1.11
[10:15:23.003] Connection from 192.168.1.12 -> enqueue (queue: 1/10)
...
[10:15:23.100] Connection from 192.168.1.50 -> enqueue (queue: 10/10) FULL!
[10:15:23.100] Main thread BLOCKING on full queue...
[10:15:23.125] Thread 2: completed client, queue slot freed
[10:15:23.125] Main thread: enqueued, continuing accept loop
=== GRACEFUL SHUTDOWN (Ctrl+C) ===
^C
[MAIN] Shutdown signal received
[MAIN] Stopping accept loop
[MAIN] Sending poison pills to 4 workers...
[MAIN] Waiting for workers to finish current requests...
Thread 0: received poison pill, exiting
Thread 1: received poison pill, exiting
Thread 2: completing request, then exit
Thread 3: completing request, then exit
[MAIN] All workers terminated
=== FINAL STATISTICS ===
Total requests: 12,847
Successful: 12,845 (99.98%)
Failed: 2 (client disconnect)
Total time: 127.3 seconds
Throughput: 100.9 req/s
Latency percentiles:
P50: 18ms
P90: 29ms
P99: 67ms
Max: 312ms
Queue statistics:
Times full: 47
Times empty: 891
Avg occupancy: 4.2 items
Thread utilization:
Thread 0: 87% busy
Thread 1: 89% busy
Thread 2: 86% busy
Thread 3: 91% busy
Race Condition Detection
$ ./workbench --mode pool --threads 4 &
$ helgrind ./workbench-bench --clients 50 --requests 100
==12345== Helgrind, a thread error detector
==12345== Running on valgrind-3.18.1
==12345== ---Thread-Announcement------------------------------------------
==12345== Thread #3 was created
==12345== ----------------------------------------------------------------
==12345== Possible data race during write of size 8 at 0x4A3C80:
==12345== at 0x401234: update_statistics (stats.c:47)
==12345== by 0x401567: handle_request (server.c:123)
==12345== by 0x401789: worker_thread (pool.c:89)
==12345==
==12345== This conflicts with a previous read of size 8 by thread #2:
==12345== at 0x401230: update_statistics (stats.c:45)
==12345== by 0x401567: handle_request (server.c:123)
RACE DETECTED! Fix by adding a mutex around the statistics update (see Hint 3).
After fixing and re-running:
==12345== ERROR SUMMARY: 0 errors from 0 contexts
Deadlock Demonstration and Fix
$ ./deadlock-demo
=== DEADLOCK DEMONSTRATION ===
Creating two threads with lock ordering violation...
Thread A: Acquired lock_1
Thread B: Acquired lock_2
Thread A: Waiting for lock_2...
Thread B: Waiting for lock_1...
[5 seconds pass with no progress]
DEADLOCK DETECTED!
Thread A holds lock_1, wants lock_2
Thread B holds lock_2, wants lock_1
This is the classic lock-ordering deadlock (the two-thread case of dining philosophers).
=== FIX: CONSISTENT LOCK ORDERING ===
Both threads now acquire locks in order: lock_1, then lock_2
Thread A: Acquired lock_1
Thread A: Acquired lock_2
Thread A: Released lock_2
Thread A: Released lock_1
Thread B: Acquired lock_1
Thread B: Acquired lock_2
Thread B: Released lock_2
Thread B: Released lock_1
SUCCESS! Consistent ordering prevents deadlock.
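The buggy half of the demo boils down to two threads taking the same pair of locks in opposite orders; a minimal sketch, using the lock_1/lock_2 names from the output above (the sleeps only widen the race window):

#include <pthread.h>
#include <unistd.h>

pthread_mutex_t lock_1 = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t lock_2 = PTHREAD_MUTEX_INITIALIZER;

void *thread_a(void *arg) {
    pthread_mutex_lock(&lock_1);
    usleep(100000);                 // give B time to grab lock_2
    pthread_mutex_lock(&lock_2);    // blocks forever once B holds lock_2
    pthread_mutex_unlock(&lock_2);
    pthread_mutex_unlock(&lock_1);
    return NULL;
}

void *thread_b(void *arg) {
    pthread_mutex_lock(&lock_2);    // opposite order -> circular wait
    usleep(100000);
    pthread_mutex_lock(&lock_1);
    pthread_mutex_unlock(&lock_1);
    pthread_mutex_unlock(&lock_2);
    return NULL;
}

Spawn both from main with pthread_create and the hang reproduces almost every run; the fix is exactly what the output shows: make thread_b take lock_1 first, then lock_2.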
13. The Core Question Youโre Answering
โHow do I write a server that can handle thousands of simultaneous clients without blocking, crashing, or corrupting shared data?โ
This project teaches you to harness the power of concurrent execution while avoiding its pitfalls. Youโll understand why the naive approach (one thread per client) doesnโt scale, why shared mutable state is dangerous, and how synchronization primitives let multiple threads cooperate safely. These patterns are the foundation of every high-performance server, database, and operating system.
14. Concepts You Must Understand First
Before starting this project, ensure you understand these concepts:
| Concept | Why It Matters | Where to Learn |
|---|---|---|
| Process creation (fork) | Baseline for comparison | CS:APP 8.4 |
| Basic thread creation (pthread_create) | Youโll create many threads | CS:APP 12.3 |
| What a race condition is | You must avoid them | CS:APP 12.4 |
| Critical sections concept | Foundation for mutexes | CS:APP 12.4 |
| Producer-consumer pattern | Your thread pool uses this | CS:APP 12.5.2 |
| What a semaphore does | Key synchronization primitive | CS:APP 12.5 |
| Socket programming basics | Server accepts connections | CS:APP Chapter 11 |
| errno and error handling | Threaded errors are tricky | CS:APP 10.4 |
15. Questions to Guide Your Design
Work through these questions BEFORE writing code:
- Thread Pool Sizing: How many threads should be in your pool? How do you determine the optimal number?
- Queue Overflow: What happens when work arrives faster than threads can process it? Block the producer? Drop work?
- Graceful Shutdown: How do you stop worker threads without corrupting in-flight work? What's a "poison pill"?
- Error Isolation: If one request causes a segfault, what happens to other requests? To the server?
- Statistics Threading: How do you collect accurate stats without creating a bottleneck? Lock every update?
- Client Timeouts: What if a client connects but never sends data? How do you avoid blocking a worker forever?
- Lock Ordering: If you need multiple locks, in what order should they be acquired to prevent deadlock? (See the sketch after this list.)
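For the lock-ordering question, one standard discipline is to impose a single global order on locks (here, by address) and route every multi-lock acquisition through one helper. A sketch; lock_pair and unlock_pair are names introduced here:

#include <pthread.h>
#include <stdint.h>

// Acquire two mutexes in a globally consistent order (by address), so no
// two threads can ever hold them in opposite orders.
void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b) {
    if ((uintptr_t)a > (uintptr_t)b) {
        pthread_mutex_t *tmp = a; a = b; b = tmp;
    }
    pthread_mutex_lock(a);
    pthread_mutex_lock(b);
}

void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b) {
    // Release order doesn't matter: unlocking never blocks.
    pthread_mutex_unlock(a);
    pthread_mutex_unlock(b);
}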
16. Thinking Exercise
Before writing any code, trace through this scenario by hand:
You have a thread pool with 2 workers and a bounded buffer of size 3. Trace the execution:
Time 0: Request A arrives (processing time: 100ms)
Time 10: Request B arrives (processing time: 50ms)
Time 20: Request C arrives (processing time: 80ms)
Time 30: Request D arrives (processing time: 40ms)
Time 40: Request E arrives (processing time: 60ms)
Exercise: On paper, answer:
- Time 0-10: What's in the queue? Which workers are busy?
- Time 30: Request D arrives. Can it be enqueued immediately? Who's processing what?
- Time 40: Request E arrives. Queue state? Is the producer blocked?
- Time 60: Request B completes. What happens next? Who processes D?
- Time 100: Request A completes. What happens to the blocked producer (if any)?
- Final state: In what order did requests complete? Draw a timeline.
Verify your answers by implementing with verbose logging.
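A minimal timestamped, thread-tagged logging macro for that verification run (a sketch; TLOG is a name introduced here, and ##__VA_ARGS__ is a GNU/Clang extension):

#include <stdio.h>
#include <time.h>
#include <pthread.h>

// Millisecond timestamps plus the calling thread's ID, written to stderr
// so interleavings are visible as they happen.
#define TLOG(fmt, ...) do {                                        \
    struct timespec ts_;                                           \
    clock_gettime(CLOCK_MONOTONIC, &ts_);                          \
    fprintf(stderr, "[%5ld.%03ld] tid=%lu " fmt "\n",              \
            (long)ts_.tv_sec, ts_.tv_nsec / 1000000,               \
            (unsigned long)pthread_self(), ##__VA_ARGS__);         \
} while (0)

Example: TLOG("enqueue (queue: %d/%d)", count, capacity); produces lines like the verbose output in Section 12.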
17. The Interview Questions Theyโll Ask
After completing this project, youโll be ready for these common interview questions:
- โExplain the difference between concurrency and parallelism.โ
- Expected: Concurrency is about structure (dealing with many things); parallelism is about execution (doing many things)
- Bonus: Give examples of each alone and together, explain why both matter for servers
- โHow do you prevent race conditions?โ
- Expected: Mutex/lock around shared data, atomic operations, or lock-free data structures
- Bonus: Explain trade-offs, when each approach is appropriate, what makes a good critical section
- โWhat causes deadlock and how do you prevent it?โ
- Expected: Four conditions (mutual exclusion, hold and wait, no preemption, circular wait); prevent by breaking one condition
- Bonus: Lock ordering, timeout-based detection, lock-free alternatives
- โDesign a thread pool for a web server.โ
- Expected: Bounded queue, worker threads, producer-consumer pattern
- Bonus: Discuss sizing, graceful shutdown, error handling, monitoring
- โWhatโs the difference between a mutex and a semaphore?โ
- Expected: Mutex for mutual exclusion (binary, ownership); semaphore for counting resources
- Bonus: When to use each, binary semaphore vs mutex, condition variables
- โHow would you debug a threading bug that only happens under load?โ
- Expected: Thread sanitizers (TSan), logging with thread IDs, deterministic testing
- Bonus: Race detection tools, controlled thread interleavings, invariant checking
18. Hints in Layers
If youโre stuck, reveal hints one at a time:
Hint 1: Bounded Buffer with Semaphores
The producer-consumer pattern with semaphores uses three semaphores:
typedef struct {
int *buf; // Buffer array
int n; // Maximum number of slots
int front; // buf[(front+1)%n] is first item
int rear; // buf[rear%n] is last item
sem_t mutex; // Protects buffer access
sem_t slots; // Counts available slots
sem_t items; // Counts available items
} sbuf_t;
void sbuf_init(sbuf_t *sp, int n) {
sp->buf = calloc(n, sizeof(int));
sp->n = n;
sp->front = sp->rear = 0;
sem_init(&sp->mutex, 0, 1); // Binary semaphore for mutual exclusion
sem_init(&sp->slots, 0, n); // n empty slots initially
sem_init(&sp->items, 0, 0); // 0 items initially
}
Producer waits for slot, then inserts:
void sbuf_insert(sbuf_t *sp, int item) {
sem_wait(&sp->slots); // Wait for available slot
sem_wait(&sp->mutex); // Lock buffer
sp->buf[(++sp->rear) % sp->n] = item;
sem_post(&sp->mutex); // Unlock buffer
sem_post(&sp->items); // Announce available item
}
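Hint 2 below calls sbuf_remove; the consumer side mirrors the insert, waiting for an item and then freeing a slot (the same CS:APP-style sbuf pattern):

int sbuf_remove(sbuf_t *sp) {
    int item;
    sem_wait(&sp->items);                    // Wait for an available item
    sem_wait(&sp->mutex);                    // Lock buffer
    item = sp->buf[(++sp->front) % sp->n];   // Remove first item
    sem_post(&sp->mutex);                    // Unlock buffer
    sem_post(&sp->slots);                    // Announce a free slot
    return item;
}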
Hint 2: Graceful Thread Pool Shutdown
Use a โpoison pillโ pattern - a special value that tells workers to exit:
#define POISON_PILL -1
void shutdown_thread_pool(sbuf_t *pool, int num_workers) {
// Send poison pill to each worker
for (int i = 0; i < num_workers; i++) {
sbuf_insert(pool, POISON_PILL);
}
// Wait for all workers to exit
for (int i = 0; i < num_workers; i++) {
pthread_join(worker_threads[i], NULL);
}
}
void *worker_thread(void *arg) {
sbuf_t *pool = (sbuf_t *)arg;
while (1) {
int connfd = sbuf_remove(pool);
if (connfd == POISON_PILL) {
printf("Thread %lu received poison pill, exiting\n",
pthread_self());
break;
}
handle_client(connfd);
close(connfd);
}
return NULL;
}
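One ordering detail worth documenting in your design notes: stop the accept loop before enqueuing the pills, so no real work can land in the queue behind a pill; and since each worker exits after consuming exactly one pill, num_workers pills wake every thread exactly once.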
Hint 3: Thread-Safe Statistics
For statistics, you have options:
Option 1: Single mutex (simple but potentially contended):
pthread_mutex_t stats_lock = PTHREAD_MUTEX_INITIALIZER;
struct stats {
long total_requests;
long total_bytes;
double total_time;
} global_stats;
void update_stats(long bytes, double time) {
pthread_mutex_lock(&stats_lock);
global_stats.total_requests++;
global_stats.total_bytes += bytes;
global_stats.total_time += time;
pthread_mutex_unlock(&stats_lock);
}
Option 2: Per-thread counters (no contention, merge at end):
__thread struct stats thread_local_stats; // Thread-local storage
void update_stats(long bytes, double time) {
// No locking needed - each thread has its own copy
thread_local_stats.total_requests++;
thread_local_stats.total_bytes += bytes;
thread_local_stats.total_time += time;
}
void merge_all_stats(struct stats *result) {
// Called during shutdown, sum all thread-local copies
// (Requires keeping track of all thread-local instances)
}
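Option 2's merge step is the tricky part: taking the address of another thread's __thread variable is fragile, because the storage disappears when that thread exits. A simpler variant that keeps the no-contention property is an array of per-worker slots indexed by worker ID (a sketch; the names and MAX_WORKERS cap are assumptions):

#define MAX_WORKERS 64                        // assumed upper bound

static struct stats per_worker[MAX_WORKERS];  // one private slot per worker

void update_stats_slot(int worker_id, long bytes, double time) {
    // Each worker writes only its own slot: no sharing, no lock.
    per_worker[worker_id].total_requests++;
    per_worker[worker_id].total_bytes += bytes;
    per_worker[worker_id].total_time  += time;
}

void merge_all_stats(struct stats *result, int num_workers) {
    // Call only after pthread_join() on every worker.
    *result = (struct stats){0};
    for (int i = 0; i < num_workers; i++) {
        result->total_requests += per_worker[i].total_requests;
        result->total_bytes    += per_worker[i].total_bytes;
        result->total_time     += per_worker[i].total_time;
    }
}

If profiling shows false sharing between adjacent slots, pad each slot to a cache line.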
Hint 4: Testing for Race Conditions
To reliably trigger race conditions in testing:
// Add artificial delays to widen race windows
#ifdef TESTING_RACES
#define RACE_DELAY() usleep(rand() % 1000) /* rand() is not thread-safe, but here it only perturbs timing */
#else
#define RACE_DELAY()
#endif
void critical_section(void) {
pthread_mutex_lock(&lock);
RACE_DELAY(); // Makes races more likely to manifest
shared_counter++;
RACE_DELAY();
pthread_mutex_unlock(&lock);
}
Use thread sanitizer:
gcc -fsanitize=thread -g -o server server.c -lpthread
./server
Stress test pattern:
void stress_test(int num_threads, int ops_per_thread) {
    // 'worker' and 'final_count' are defined elsewhere: each worker
    // performs ops_per_thread operations on the shared final_count.
    pthread_t threads[num_threads];
    for (int i = 0; i < num_threads; i++) {
        pthread_create(&threads[i], NULL, worker, &ops_per_thread);
    }
    for (int i = 0; i < num_threads; i++) {
        pthread_join(threads[i], NULL);
    }
    // Verify the invariant: no updates were lost under contention
    assert(final_count == num_threads * ops_per_thread);
}
19. Books That Will Help
| Topic | Book | Chapter/Section |
|---|---|---|
| Concurrent programming overview | CS:APP 3rd Ed | Chapter 12.1-12.2 โConcurrent Programmingโ, โConcurrent Programming with Processesโ |
| Thread programming | CS:APP 3rd Ed | Chapter 12.3 โConcurrent Programming with Threadsโ |
| Shared variables and synchronization | CS:APP 3rd Ed | Chapter 12.4-12.5 โShared Variables in Threaded Programsโ, โSynchronizing Threadsโ |
| Thread safety | CS:APP 3rd Ed | Chapter 12.7 โThread Safetyโ |
| Races and deadlocks | CS:APP 3rd Ed | Chapter 12.7.2-12.7.3 โRacesโ, โDeadlocksโ |
| POSIX threads in depth | Programming with POSIX Threads by Butenhof | Chapters 3-7 |
| Lock-free programming | C++ Concurrency in Action by Williams | Chapter 7 โDesigning lock-free concurrent data structuresโ |
| High-performance servers | The Art of Multiprocessor Programming | Chapters 1-3 |
| Thread pools and patterns | Java Concurrency in Practice | Chapter 8 (concepts apply to C) |
20. Submission / Completion Criteria
Minimum Viable Completion:
- All four concurrency modes implemented and working
- Bounded buffer passes correctness tests
- Basic stress test completes without errors
- Performance comparison data collected
Full Completion:
- Graceful shutdown works under load
- Statistics collection is accurate
- Extended stress testing (1+ hour) passes
- Race detection tools report no issues
- Documentation explains every synchronization decision
Excellence (Going Above & Beyond):
- Lock-free data structures implemented
- Dynamic thread pool sizing
- I/O multiplexing mode added
- NUMA-aware implementation
- Formal proof of deadlock freedom
This guide was expanded from CSAPP_3E_DEEP_LEARNING_PROJECTS.md. For the complete learning path, see the project index.