Project 16: Concurrency Workbench
Build a server framework that can switch between concurrency models (iterative, process-per-request, thread-per-request, thread pool), with a bounded-buffer work queue and stress-test harness.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Expert |
| Time Estimate | 2-3 weeks |
| Language | C (Alternatives: Rust, Zig, Go) |
| Prerequisites | Projects 11 and 15 recommended; solid understanding of processes and I/O |
| Key Topics | Threads, synchronization, producer-consumer, deadlock avoidance, thread pools |
| CS:APP Chapters | 12 |
1. Learning Objectives
By completing this project, you will:
- Master threading fundamentals: Create, join, and manage POSIX threads with proper lifecycle management
- Implement synchronization primitives correctly: Use mutexes, semaphores, and condition variables without races
- Build a producer-consumer queue: Design a thread-safe bounded buffer with proper blocking semantics
- Compare concurrency models: Measure and explain throughput differences between iterative, process-based, and thread-based servers
- Diagnose and fix concurrency bugs: Identify race conditions, deadlocks, and starvation through systematic debugging
- Design effective stress tests: Create test harnesses that reliably expose concurrency defects
- Apply concurrency correctness discipline: Use invariants, assertions, and logging to prove correctness
2. Theoretical Foundation
2.1 Concurrency vs Parallelism
These terms are often confused, but they describe fundamentally different concepts:
CONCURRENCY vs PARALLELISM

CONCURRENCY: Dealing with multiple things at once (structure)

    Time ----------------------------------------------->

    CPU:  [Task A][Task B][Task A][Task C][Task B][Task A]

    - Single CPU interleaving multiple tasks
    - Tasks make progress "together" through time-slicing
    - About program STRUCTURE and composition

PARALLELISM: Doing multiple things at once (execution)

    Time ----------------------------------------------->

    CPU 0: [Task A][Task A][Task A][Task A]
    CPU 1: [Task B][Task B][Task B][Task B]
    CPU 2: [Task C][Task C][Task C][Task C]

    - Multiple CPUs executing simultaneously
    - True simultaneous execution
    - About program EXECUTION and performance
Key insight: You can have concurrency without parallelism (single-core with threading), parallelism without concurrency (SIMD operations), or both (multi-core with multiple threads).
Why this matters for servers:
- Concurrency allows handling multiple clients without blocking
- Parallelism allows using multiple CPU cores for throughput
- Most servers need both: concurrent structure with parallel execution
2.2 Threads vs Processes Trade-offs
Both threads and processes provide concurrent execution, but with different characteristics:
| Aspect | Processes | Threads |
|---|---|---|
| Address Space | Separate (isolated) | Shared |
| Creation Cost | Higher (fork, even with copy-on-write) | Lower (stack allocation) |
| Context Switch | Expensive (TLB flush) | Cheaper (same address space) |
| Communication | IPC required (pipes, sockets, shm) | Direct memory sharing |
| Failure Isolation | Crash isolated to process | Crash affects all threads |
| Synchronization | Not needed for isolation | Required for shared state |
| Debugging | Easier (isolated state) | Harder (shared state) |
PROCESSES vs THREADS: Memory Model

PROCESS MODEL:

    Process A         Process B         Process C
    +-----------+     +-----------+     +-----------+
    | Stack     |     | Stack     |     | Stack     |
    +-----------+     +-----------+     +-----------+
    | Heap      |     | Heap      |     | Heap      |
    +-----------+     +-----------+     +-----------+
    | Data/BSS  |     | Data/BSS  |     | Data/BSS  |
    +-----------+     +-----------+     +-----------+
    | Text      |     | Text      |     | Text      |
    +-----------+     +-----------+     +-----------+
         ^                 ^                 ^
         +----- fully ISOLATED address spaces -----+

THREAD MODEL (within one process):

    +-------------------------------------------------------------+
    | Single Process                                              |
    |   +---------+   +---------+   +---------+                   |
    |   | Stack 1 |   | Stack 2 |   | Stack 3 |  <-- each thread  |
    |   | Thread 1|   | Thread 2|   | Thread 3|      has its own  |
    |   +---------+   +---------+   +---------+      stack        |
    |        |             |             |                        |
    |        +-------------+-------------+                        |
    |                      v                                      |
    |   +-----------------------------------------------------+   |
    |   |        SHARED HEAP / SHARED DATA/BSS / SHARED TEXT  |   |
    |   +-----------------------------------------------------+   |
    +-------------------------------------------------------------+
When to use processes:
- Untrusted code execution (sandboxing)
- Legacy code that isn't thread-safe
- Maximum fault isolation required
- CPU-bound work that doesn't share state
When to use threads:
- Frequent communication between execution units
- Shared state is fundamental to the design
- Low-latency response required
- Memory efficiency matters
2.3 POSIX Threads (pthreads) Basics
The POSIX threads API provides portable threading on Unix-like systems.
Thread Lifecycle
THREAD LIFECYCLE

    pthread_create()
          |
          v
    +-----------------------------------------------------------+
    | RUNNABLE                                                  |
    | (Ready to run, waiting for CPU time from scheduler)       |
    +-----------------------------------------------------------+
          |
          +----------------+-----------------+
          v                v                 v
    +-------------+  +--------------+  +--------------+
    | RUNNING     |  | BLOCKED      |  | WAITING      |
    | (executing) |  | (mutex/sema) |  | (cond_wait)  |
    +-------------+  +--------------+  +--------------+
          |                |                 |
          +----------------+-----------------+
          |
          v
    +-----------------------------------------------------------+
    | TERMINATED                                                |
    | (Thread function returned or pthread_exit called)         |
    +-----------------------------------------------------------+
          |
          +------------------+
          v                  v
    +----------------+  +------------------+
    | JOINED         |  | DETACHED         |
    | (resources     |  | (resources       |
    |  reclaimed via |  |  auto-reclaimed  |
    |  pthread_join) |  |  on termination) |
    +----------------+  +------------------+
Essential pthreads Functions
// Thread creation
int pthread_create(pthread_t *thread, // Output: thread ID
const pthread_attr_t *attr, // Attributes (NULL for defaults)
void *(*start_routine)(void*), // Function to run
void *arg); // Argument to function
// Thread termination
void pthread_exit(void *retval); // Exit current thread with return value
// Wait for thread completion
int pthread_join(pthread_t thread, void **retval); // Block until thread exits
// Detach a thread (no join needed)
int pthread_detach(pthread_t thread);
// Get current thread ID
pthread_t pthread_self(void);
Common Patterns
// Pattern 1: Create and join (most common)
void *worker(void *arg) {
int id = *(int *)arg;
// ... do work ...
return NULL;
}
int main() {
pthread_t threads[N];
int ids[N];
// Create threads
for (int i = 0; i < N; i++) {
ids[i] = i;
pthread_create(&threads[i], NULL, worker, &ids[i]);
}
// Join all threads (wait for completion)
for (int i = 0; i < N; i++) {
pthread_join(threads[i], NULL);
}
}
// Pattern 2: Detached threads (fire and forget)
void *background_task(void *arg) {
pthread_detach(pthread_self()); // Self-detach
// ... do work ...
return NULL; // Resources auto-freed
}
2.4 Shared State and Race Conditions
When threads share memory, concurrent access without synchronization leads to race conditions.
Anatomy of a Race Condition
// Shared counter - BROKEN without synchronization
int counter = 0;
void *increment(void *arg) {
for (int i = 0; i < 1000000; i++) {
counter++; // This is NOT atomic!
}
return NULL;
}
What counter++ actually does at the machine level:
WHY counter++ IS NOT ATOMIC

    C code:    counter++;

    Assembly:  movl counter(%rip), %eax   # Load counter into register
               addl $1, %eax              # Increment register
               movl %eax, counter(%rip)   # Store back to memory

    Three separate operations that can be interleaved!

RACE CONDITION EXAMPLE (counter starts at 0):

    Thread A               Thread B
    -------------------    -------------------
    LOAD counter (0)
                           LOAD counter (0)
    ADD 1 (reg = 1)
                           ADD 1 (reg = 1)
    STORE counter (1)
                           STORE counter (1)   <-- LOST UPDATE!

    Expected: counter = 2
    Actual:   counter = 1
Types of Race Conditions
- Read-Modify-Write races: Like counter++ above
- Check-Then-Act races:
  if (ptr != NULL) {   // Thread B sets ptr = NULL here
      use(ptr);        // CRASH! (a fix is sketched below)
  }

- Data races: Any unsynchronized access where at least one access is a write
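A minimal sketch of closing that check-then-act window, assuming a mutex guards the pointer (ptr_lock and use_safely are hypothetical names):
#include <pthread.h>

static char *ptr;  // shared pointer; guarded by ptr_lock
static pthread_mutex_t ptr_lock = PTHREAD_MUTEX_INITIALIZER;

// Check and act under the same lock: no other thread can set
// ptr = NULL between the test and the use.
void use_safely(void) {
    pthread_mutex_lock(&ptr_lock);
    if (ptr != NULL) {
        /* ... use ptr ... */
    }
    pthread_mutex_unlock(&ptr_lock);
}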
2.5 Critical Sections and Mutual Exclusion
A critical section is code that accesses shared resources and must not be executed by multiple threads simultaneously.
CRITICAL SECTION CONCEPT

    Thread A                        Thread B
    -------------------             -------------------

    [Non-critical code]             [Non-critical code]
            |                               |
            v                               v
    +-----------------+             +-----------------+
    | ENTER CRITICAL  |             | ENTER CRITICAL  |
    | SECTION         |<----------->| SECTION         |
    |                 |   MUTUAL    |                 |
    | Modify shared   |  EXCLUSION  | Modify shared   |
    | resource        |  REQUIRED!  | resource        |
    |                 |             |                 |
    | EXIT CRITICAL   |             | EXIT CRITICAL   |
    | SECTION         |             | SECTION         |
    +-----------------+             +-----------------+
            |                               |
            v                               v
    [Non-critical code]             [Non-critical code]
Mutual exclusion ensures only one thread executes a critical section at a time.
Requirements for correct mutual exclusion:
- Safety: At most one thread in the critical section
- Liveness: A thread that wants to enter eventually does
- Bounded waiting: No thread waits forever (no starvation)
- No assumptions: Works regardless of thread speed or count
2.6 Mutex Locks and Spinlocks
Mutex (Mutual Exclusion Lock)
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
void *worker(void *arg) {
pthread_mutex_lock(&lock); // Acquire lock (blocks if held)
// Critical section - only one thread here at a time
counter++;
pthread_mutex_unlock(&lock); // Release lock
return NULL;
}
Mutex behavior:
- lock(): If free, acquire and continue. If held, block (sleep) until available.
- unlock(): Release lock, wake one waiting thread.
- Low CPU usage when waiting (thread sleeps).
- Higher latency due to context switch overhead.
Spinlock
pthread_spinlock_t spinlock;
pthread_spin_init(&spinlock, PTHREAD_PROCESS_PRIVATE);
void *worker(void *arg) {
pthread_spin_lock(&spinlock); // Busy-wait if held
// Critical section
counter++;
pthread_spin_unlock(&spinlock);
return NULL;
}
Spinlock behavior:
- lock(): If free, acquire. If held, spin (busy-wait) checking repeatedly.
- unlock(): Release lock.
- High CPU usage when waiting (burning cycles).
- Lower latency for short critical sections.
When to Use Each
| Use Mutex When | Use Spinlock When |
|---|---|
| Critical section is long | Critical section is very short |
| Holding across I/O | Never holding across I/O |
| Uniprocessor system | Multiprocessor with short waits |
| Priority concerns matter | Lowest latency required |
2.7 Semaphores and Their Semantics
A semaphore is a non-negative integer with two atomic operations: wait (P/down) and signal (V/up).
#include <semaphore.h>
sem_t sem;
sem_init(&sem, 0, initial_value); // Initialize with count
sem_wait(&sem); // P operation: if count > 0, decrement; else block
sem_post(&sem); // V operation: increment count, wake one waiter
SEMAPHORE OPERATIONS

    P (wait/down):                  V (signal/up):
    ---------------                 ---------------

    if (s > 0) {                    s++;
        s--;                        wake_one_waiter();
    } else {
        block_current_thread();     // Always succeeds
    }                               // Never blocks

    // May block
    // Decrements on success

BINARY SEMAPHORE (mutex equivalent, initial value = 1):

    sem_init(&sem, 0, 1);   // Can be acquired once

    sem_wait(&sem);         // Acquire (like lock)
    // critical section
    sem_post(&sem);         // Release (like unlock)

COUNTING SEMAPHORE (resource pool, initial value = N):

    sem_init(&sem, 0, 10);  // 10 resources available

    sem_wait(&sem);         // Acquire one resource
    // use resource
    sem_post(&sem);         // Return resource
Semaphore vs Mutex:
- Mutex: Locked/unlocked, must be released by the thread that acquired it
- Semaphore: Counter, can be signaled by any thread (useful for signaling)
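Because any thread may post, a semaphore initialized to 0 doubles as a one-shot signal between threads; a minimal sketch (data_ready is a hypothetical name):
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

static sem_t data_ready;  // starts at 0: "nothing ready yet"

static void *producer(void *arg) {
    /* ... prepare data ... */
    sem_post(&data_ready);       // signal: data is ready
    return NULL;
}

static void *consumer(void *arg) {
    sem_wait(&data_ready);       // blocks until the producer posts
    puts("data is ready");
    return NULL;
}

int main(void) {
    sem_init(&data_ready, 0, 0);
    pthread_t p, c;
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    sem_destroy(&data_ready);
    return 0;
}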
2.8 Condition Variables and Signaling
Condition variables enable threads to wait for a condition to become true.
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
int ready = 0;
// Waiting thread
void *waiter(void *arg) {
pthread_mutex_lock(&lock);
while (!ready) { // Always use while, not if!
pthread_cond_wait(&cond, &lock); // Atomically: unlock + sleep + relock
}
// Condition is now true
pthread_mutex_unlock(&lock);
return NULL;
}
// Signaling thread
void *signaler(void *arg) {
pthread_mutex_lock(&lock);
ready = 1;
pthread_cond_signal(&cond); // Wake one waiter
pthread_mutex_unlock(&lock);
return NULL;
}
CONDITION VARIABLE WAIT/SIGNAL

    pthread_cond_wait(&cond, &mutex):
    ---------------------------------
    1. Atomically: release mutex AND go to sleep on cond
    2. When signaled: wake up
    3. Atomically: reacquire mutex before returning

    CRITICAL: Must hold mutex when calling!
    CRITICAL: Always recheck condition in while loop (spurious wakeups)!

    pthread_cond_signal(&cond):       pthread_cond_broadcast(&cond):
    ---------------------------       ------------------------------
    Wake ONE waiting thread           Wake ALL waiting threads
    (or none if no waiters)           (each must reacquire mutex)
Why while loop, not if?
// WRONG - can miss condition change
if (!ready) {
pthread_cond_wait(&cond, &lock);
}
// ready might be false here due to:
// 1. Spurious wakeups (OS can wake thread without signal)
// 2. Another thread changed condition after we woke but before we locked
// CORRECT
while (!ready) {
pthread_cond_wait(&cond, &lock);
}
// ready is guaranteed true here
2.9 Producer-Consumer Pattern
The producer-consumer pattern is fundamental to concurrent systems: producers add work to a buffer, consumers remove and process it.
PRODUCER-CONSUMER PATTERN

    +----------+                                          +----------+
    |Producer 1|---+                                  +-->|Consumer 1|
    +----------+   |                                  |   +----------+
    +----------+   |    +------------------------+    |   +----------+
    |Producer 2|---+--->|     Bounded Buffer     |----+-->|Consumer 2|
    +----------+   |    |   [ W | W |   |   ]    |    |   +----------+
    +----------+   |    |     ^       ^          |    |   +----------+
    |Producer 3|---+    |   front    rear        |    +-->|Consumer 3|
    +----------+        +------------------------+        +----------+

    Synchronization requirements:
    -----------------------------
    1. Mutual exclusion: Only one thread modifies the buffer at a time
    2. Producer blocks: When buffer is FULL
    3. Consumer blocks: When buffer is EMPTY
Implementation with Semaphores
#define BUFFER_SIZE 10
typedef struct {
int buffer[BUFFER_SIZE];
int front; // Index for next remove
int rear; // Index for next insert
sem_t mutex; // Protects buffer access
sem_t slots; // Counts empty slots (initially BUFFER_SIZE)
sem_t items; // Counts full slots (initially 0)
} bounded_buffer_t;
void buffer_init(bounded_buffer_t *b) {
b->front = b->rear = 0;
sem_init(&b->mutex, 0, 1);
sem_init(&b->slots, 0, BUFFER_SIZE); // All slots empty
sem_init(&b->items, 0, 0); // No items yet
}
void buffer_insert(bounded_buffer_t *b, int item) {
sem_wait(&b->slots); // Wait for empty slot (decrements slots)
sem_wait(&b->mutex); // Lock buffer
b->buffer[b->rear] = item;
b->rear = (b->rear + 1) % BUFFER_SIZE;
sem_post(&b->mutex); // Unlock buffer
sem_post(&b->items); // Signal item available (increments items)
}
int buffer_remove(bounded_buffer_t *b) {
sem_wait(&b->items); // Wait for item (decrements items)
sem_wait(&b->mutex); // Lock buffer
int item = b->buffer[b->front];
b->front = (b->front + 1) % BUFFER_SIZE;
sem_post(&b->mutex); // Unlock buffer
sem_post(&b->slots); // Signal slot freed (increments slots)
return item;
}
Implementation with Condition Variables
typedef struct {
int buffer[BUFFER_SIZE];
int front, rear, count;
pthread_mutex_t lock;
pthread_cond_t not_empty;
pthread_cond_t not_full;
} bounded_buffer_cv_t;
void buffer_insert_cv(bounded_buffer_cv_t *b, int item) {
pthread_mutex_lock(&b->lock);
while (b->count == BUFFER_SIZE) { // Buffer full
pthread_cond_wait(&b->not_full, &b->lock);
}
b->buffer[b->rear] = item;
b->rear = (b->rear + 1) % BUFFER_SIZE;
b->count++;
pthread_cond_signal(&b->not_empty); // Wake a consumer
pthread_mutex_unlock(&b->lock);
}
int buffer_remove_cv(bounded_buffer_cv_t *b) {
pthread_mutex_lock(&b->lock);
while (b->count == 0) { // Buffer empty
pthread_cond_wait(&b->not_empty, &b->lock);
}
int item = b->buffer[b->front];
b->front = (b->front + 1) % BUFFER_SIZE;
b->count--;
pthread_cond_signal(&b->not_full); // Wake a producer
pthread_mutex_unlock(&b->lock);
return item;
}
2.10 Reader-Writer Locks
When data is read often but written rarely, reader-writer locks allow multiple concurrent readers but exclusive writers.
READER-WRITER LOCK SEMANTICS

    Reader wants access:            Writer wants access:
    --------------------            ---------------------

    If no writers active:           If no readers AND no writers:
        Grant read access               Grant write access
        (multiple readers OK)           (exclusive)

    If writer active:               If any readers OR a writer:
        Block                           Block

    Legal states:                   Illegal states:
    -------------                   ---------------

    [R] [R] [R] [R] <- Multiple     [R] [W]     <- Reader + Writer
    [W]             <- Single       [W] [W]     <- Multiple writers
    (empty)         <- No access    [R] [R] [W] <- Mixed
pthread_rwlock_t rwlock = PTHREAD_RWLOCK_INITIALIZER;
void *reader(void *arg) {
pthread_rwlock_rdlock(&rwlock); // Acquire read lock
// Read shared data (others can read too)
pthread_rwlock_unlock(&rwlock);
return NULL;
}
void *writer(void *arg) {
pthread_rwlock_wrlock(&rwlock); // Acquire write lock (exclusive)
// Modify shared data (no one else can access)
pthread_rwlock_unlock(&rwlock);
return NULL;
}
Starvation concerns:
- Reader preference: Writers may starve if readers keep arriving
- Writer preference: Readers may starve if writers keep arriving
- Fair: Requests handled in order (complex to implement)
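One way to get writer preference with a mutex and two condition variables; a sketch of the idea, not how pthread_rwlock is actually implemented (all names are assumptions; initialize the fields with pthread_mutex_init/pthread_cond_init before use):
#include <pthread.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  ok_to_read, ok_to_write;
    int readers;          // active readers
    int writer;           // 1 if a writer is active
    int waiting_writers;  // queued writers (this gives them preference)
} wp_rwlock_t;

void wp_read_lock(wp_rwlock_t *rw) {
    pthread_mutex_lock(&rw->lock);
    while (rw->writer || rw->waiting_writers > 0)   // yield to writers
        pthread_cond_wait(&rw->ok_to_read, &rw->lock);
    rw->readers++;
    pthread_mutex_unlock(&rw->lock);
}

void wp_read_unlock(wp_rwlock_t *rw) {
    pthread_mutex_lock(&rw->lock);
    if (--rw->readers == 0)
        pthread_cond_signal(&rw->ok_to_write);      // last reader out
    pthread_mutex_unlock(&rw->lock);
}

void wp_write_lock(wp_rwlock_t *rw) {
    pthread_mutex_lock(&rw->lock);
    rw->waiting_writers++;
    while (rw->writer || rw->readers > 0)
        pthread_cond_wait(&rw->ok_to_write, &rw->lock);
    rw->waiting_writers--;
    rw->writer = 1;
    pthread_mutex_unlock(&rw->lock);
}

void wp_write_unlock(wp_rwlock_t *rw) {
    pthread_mutex_lock(&rw->lock);
    rw->writer = 0;
    pthread_cond_signal(&rw->ok_to_write);    // prefer a waiting writer
    pthread_cond_broadcast(&rw->ok_to_read);  // readers proceed if none waits
    pthread_mutex_unlock(&rw->lock);
}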
2.11 Deadlock Conditions and Prevention
Deadlock occurs when threads wait for each other in a cycle, and none can proceed.
DEADLOCK SCENARIO

    Thread A                        Thread B
    -------------------             -------------------

    lock(mutex_A);                  lock(mutex_B);
    // ... some work ...            // ... some work ...
    lock(mutex_B);  <-- BLOCKS      lock(mutex_A);  <-- BLOCKS
    // waiting for B                // waiting for A

    +--------------------------------------------+
    |                                            |
    |  Thread A ----waits for----> mutex_B       |
    |     ^                           |          |
    |     |                           |          |
    |  held by                     held by       |
    |     |                           |          |
    |     |                           v          |
    |  mutex_A <----waits for---- Thread B       |
    |                                            |
    |         CIRCULAR WAIT = DEADLOCK           |
    +--------------------------------------------+
Four Necessary Conditions (Coffman Conditions)
All four must be present for deadlock:
- Mutual exclusion: Resources cannot be shared
- Hold and wait: Thread holds resources while waiting for more
- No preemption: Resources cannot be forcibly taken away
- Circular wait: Cycle of threads waiting for each other
Prevention Strategies
| Strategy | Breaks Condition | How |
|---|---|---|
| Lock ordering | Circular wait | Always acquire locks in global order |
| Lock timeout | Hold and wait | Give up and retry if lock not acquired in time |
| Try-lock | Hold and wait | Don't block; back off if you can't acquire |
| Lock hierarchy | Circular wait | Assign levels; only acquire lower-level locks |
| Single lock | Mutual exclusion | Use one coarse-grained lock (hurts parallelism) |
Lock ordering example:
// Define global order: mutex_A < mutex_B < mutex_C
// CORRECT: Always acquire in order
void safe_operation() {
pthread_mutex_lock(&mutex_A);
pthread_mutex_lock(&mutex_B);
// ... critical section ...
pthread_mutex_unlock(&mutex_B);
pthread_mutex_unlock(&mutex_A);
}
// WRONG: Violates ordering, risks deadlock
void unsafe_operation() {
pthread_mutex_lock(&mutex_B); // Bad: B before A
pthread_mutex_lock(&mutex_A);
}
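The try-lock row from the table above looks like this in practice; a sketch using pthread_mutex_trylock to break hold-and-wait (lock_both is a hypothetical helper):
#include <pthread.h>
#include <sched.h>

// Acquire two locks without holding one while blocking on the other.
void lock_both(pthread_mutex_t *a, pthread_mutex_t *b) {
    for (;;) {
        pthread_mutex_lock(a);
        if (pthread_mutex_trylock(b) == 0)
            return;                  // got both locks
        pthread_mutex_unlock(a);     // back off: release what we hold
        sched_yield();               // let the other thread make progress
    }
}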
2.12 Thread Pools and Work Queues
A thread pool maintains a fixed number of worker threads that pull tasks from a shared queue.
THREAD POOL ARCHITECTURE

    Clients
    -------
    +---+  +---+  +---+  +---+
    | C |  | C |  | C |  | C |   ... many clients
    +-+-+  +-+-+  +-+-+  +-+-+
      |      |      |      |
      +------+---+--+------+
                 |
                 v
    +-----------------------------------------------------------+
    |                       TASK QUEUE                          |
    |   [Task|Task|Task|Task|Task|    |    |    |    |    ]     |
    |   <--- Bounded buffer with producer-consumer sync --->    |
    +-----------------------------------------------------------+
                 |
                 v
    +-----------------------------------------------------------+
    |                       THREAD POOL                         |
    |                                                           |
    |  +--------+ +--------+ +--------+ +--------+ +--------+   |
    |  |Worker 0| |Worker 1| |Worker 2| |Worker 3| |Worker N|   |
    |  +---+----+ +---+----+ +---+----+ +---+----+ +---+----+   |
    |      |          |          |          |          |        |
    |      v          v          v          v          v        |
    |  [Execute]  [Execute]  [Waiting]  [Execute]  [Waiting]    |
    |   Task 1     Task 2    for task    Task 3    for task     |
    +-----------------------------------------------------------+

    Benefits:
    ---------
    - Bounded resource usage (fixed N threads)
    - Amortized thread creation cost
    - Backpressure when queue fills (producers block)
    - Natural load balancing across workers
Why thread pools?
| Without Pool | With Pool |
|---|---|
| Create thread per request | Reuse existing threads |
| Unbounded resource growth | Bounded thread count |
| Thread creation overhead per request | Creation cost amortized |
| Can exhaust system resources | Graceful degradation under load |
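A plausible public interface for such a pool, matching the architecture in Section 4 (the names are assumptions, not a fixed API):
// thread_pool.h (sketch)
#ifndef THREAD_POOL_H
#define THREAD_POOL_H

typedef struct thread_pool thread_pool_t;   // opaque handle

// Create num_workers threads sharing a queue of queue_size slots;
// handler runs in a worker for each submitted connection fd.
thread_pool_t *thread_pool_create(int num_workers, int queue_size,
                                  void (*handler)(int fd));

// Enqueue one fd; blocks when the queue is full (backpressure).
int thread_pool_submit(thread_pool_t *pool, int fd);

// Stop accepting work, drain the queue, join workers, free resources.
void thread_pool_shutdown(thread_pool_t *pool);

#endif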
2.13 Thread Safety and Reentrant Functions
A function is thread-safe if it can be called from multiple threads simultaneously without causing race conditions.
A function is reentrant if it doesn't use any shared state, a stricter condition than thread safety.
THREAD SAFETY CLASSIFICATION

    REENTRANT
    ---------
    - Uses only local variables (stack)
    - Doesn't access global/static data
    - Doesn't call non-reentrant functions
    - Automatically thread-safe

    Example:
        int square(int x) { return x * x; }

    THREAD-SAFE (but not reentrant)
    -------------------------------
    - Uses shared state with proper synchronization
    - May use static variables protected by locks

    Example:
        int counter = 0;
        pthread_mutex_t lock;

        int get_next() {
            pthread_mutex_lock(&lock);
            int val = counter++;
            pthread_mutex_unlock(&lock);
            return val;
        }

    NOT THREAD-SAFE
    ---------------
    - Accesses shared state without synchronization
    - Returns pointers to static storage

    Examples:
        strtok()     - uses static buffer
        localtime()  - returns pointer to static struct
        rand()       - uses static seed

    Thread-safe alternatives:
        strtok_r()     - user provides buffer
        localtime_r()  - user provides output struct
        rand_r()       - user provides seed
Guidelines for thread-safe programming:
- Prefer reentrant functions when possible
- Use _r versions of standard library functions
- Protect shared state with appropriate synchronization
- Avoid returning pointers to static storage
- Use thread-local storage for per-thread state
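For instance, the strtok replacement from the classification above in use; a small sketch:
#include <stdio.h>
#include <string.h>

int main(void) {
    char line[] = "GET /index.html HTTP/1.0";
    char *saveptr;   // caller-owned parse state instead of a static buffer

    for (char *tok = strtok_r(line, " ", &saveptr);
         tok != NULL;
         tok = strtok_r(NULL, " ", &saveptr)) {
        printf("token: %s\n", tok);
    }
    return 0;
}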
2.14 Memory Consistency Models (Basics)
Modern CPUs and compilers reorder operations for performance, which can break assumptions in concurrent code.
MEMORY ORDERING SURPRISES

    // Thread A              // Thread B
    x = 1;                   while (ready == 0) { }
    ready = 1;               print(x);

    You expect: Thread B prints 1
    Reality:    Thread B might print 0!

    Why? Compiler or CPU might reorder Thread A's stores:
        ready = 1;   // Stored first!
        x = 1;       // Stored second

WHAT CAN REORDER OPERATIONS:

    1. COMPILER: Reorders for optimization
       - Can be prevented with volatile or memory barriers

    2. CPU: Out-of-order execution, store buffers
       - Requires hardware memory barriers (mfence, etc.)

    3. CACHE: Different cores see updates at different times
       - Cache coherency protocols handle this (eventually)

SOLUTION: Use proper synchronization primitives!

    pthread_mutex_lock/unlock include necessary memory barriers
    sem_wait/post include necessary memory barriers
    C11 atomics provide explicit memory ordering

    DON'T try to roll your own lock-free code without deep expertise!
Practical advice:
- Always use pthreads primitives or C11 atomics
- Don't assume operations happen in source-code order
- Testing doesnโt prove absence of memory ordering bugs
- Lock-free programming is expert-level (avoid unless necessary)
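Applied to the ready-flag example above, C11 atomics make the required ordering explicit; a minimal sketch:
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static int x;               // plain data, published via 'ready'
static atomic_int ready;

static void *producer(void *arg) {
    x = 1;
    // Release: writes above become visible to an acquire load of 'ready'.
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

static void *consumer(void *arg) {
    // Acquire: once we observe ready == 1, we also observe x == 1.
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        ;                    // spin (fine for a demo)
    printf("x = %d\n", x);   // guaranteed to print 1
    return NULL;
}

int main(void) {
    pthread_t p, c;
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}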
3. Project Specification
3.1 What You Will Build
A Concurrency Workbench: a configurable server framework that demonstrates and compares different concurrency models under load, with built-in instrumentation and stress testing.
3.2 Server Modes
The server must support switching between these concurrency models at startup:
CONCURRENCY MODEL COMPARISON

MODE 1: ITERATIVE (baseline)

    [Request1]--->[Handle 1]--->[Request2]--->[Handle 2]---> ...

    - One request at a time
    - Simple, no concurrency bugs possible
    - Terrible for blocking I/O or slow clients

MODE 2: PROCESS-PER-REQUEST

    fork()        +--------------+
    ----+----+--->| Child Proc 1 |---> Handle Request 1
        |         +--------------+
        | Parent
        |         +--------------+
        +-------->| Child Proc 2 |---> Handle Request 2
        |         +--------------+
        |
        +--------> ...

    - Maximum isolation (crash doesn't affect parent)
    - High overhead (fork per request)
    - No shared state between requests (simple)

MODE 3: THREAD-PER-REQUEST

    +---------------------------------------------------------+
    | Main Thread                                             |
    |   accept() ---> pthread_create() ---> accept() ---> ... |
    +---------------------------------------------------------+
               |                    |
               v                    v
       +--------------+     +--------------+
       |   Thread 1   |     |   Thread 2   |
       |  Handle Req  |     |  Handle Req  |
       +--------------+     +--------------+

    - Lower overhead than fork()
    - Still unbounded thread creation
    - Shared state requires synchronization

MODE 4: THREAD POOL (most sophisticated)

    +---------------------------------------------------------+
    | Main Thread                                             |
    |   accept() ---> enqueue(conn_fd) ---> accept() ---> ... |
    +---------------------------------------------------------+
                 |
                 v
    +---------------------------------------------------------+
    | Bounded Task Queue                                      |
    |   [fd 5|fd 9|fd 3|    |    |    |    |    ]             |
    +---------------------------------------------------------+
                 |
        +--------+---------+------------------+
        v                  v                  v
    +----------+      +----------+      +----------+
    | Worker 1 |      | Worker 2 |      | Worker N |
    |  dequeue |      |  dequeue |      |  dequeue |
    |  handle  |      |  handle  |      |  handle  |
    +----------+      +----------+      +----------+

    - Bounded resource usage
    - Backpressure when overloaded (queue fills)
    - Thread creation amortized over many requests
    - Requires careful synchronization
3.3 Functional Requirements
- Mode Selection (--mode <iterative|process|thread|pool>):
  - Select concurrency model at startup
  - Default to thread pool mode
- Thread Pool Configuration (--threads <N>, --queue-size <M>):
  - Configurable worker thread count
  - Configurable bounded buffer size
- Echo Service:
  - Simple echo server for testing (read line, write line)
  - Configurable artificial delay per request (for testing concurrency)
- Instrumentation (--stats):
  - Requests handled per second
  - Average latency per request
  - Queue depth over time (for pool mode)
  - Active workers over time
- Graceful Shutdown (see the sketch after this list):
  - Handle SIGINT to stop accepting new connections
  - Complete in-progress requests
  - Clean thread/process termination
- Debug Mode (--debug):
  - Log every operation with timestamps
  - Assert invariants continuously
  - Detect and report races/deadlocks
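One way to wire the graceful-shutdown requirement above; a sketch assuming a thread-pool-mode accept loop (install_shutdown_handler and the loop outline are hypothetical names):
#include <signal.h>

static volatile sig_atomic_t shutting_down;

static void on_sigint(int sig) {
    (void)sig;
    shutting_down = 1;      // async-signal-safe: just set a flag
}

void install_shutdown_handler(void) {
    struct sigaction sa = {0};
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    // No SA_RESTART: accept() returns with EINTR so the loop sees the flag.
    sigaction(SIGINT, &sa, NULL);
}

// Accept loop outline:
//   while (!shutting_down) {
//       int fd = accept(listen_fd, NULL, NULL);
//       if (fd < 0) continue;            // EINTR on Ctrl+C
//       thread_pool_submit(pool, fd);
//   }
//   thread_pool_shutdown(pool);          // drain queue, join workers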
3.4 Non-Functional Requirements
- Correctness: No data races, no deadlocks, proper resource cleanup
- Stability: Run for hours under stress without crashes or leaks
- Observability: Every concurrency decision should be loggable
- Portability: Works on Linux (primary), macOS (stretch goal)
3.5 Example Usage
# Start server in thread pool mode with 4 workers and queue size 16
$ ./concurrency-workbench --mode pool --threads 4 --queue-size 16 --port 8080
Concurrency Workbench v1.0
Mode: Thread Pool (4 workers, queue size 16)
Listening on port 8080...
# In another terminal, run stress test
$ ./stress-test --clients 100 --requests 1000 --target localhost:8080
Running stress test: 100 concurrent clients, 1000 requests each
[========================================] 100000/100000 requests
Results:
Total time: 12.3 seconds
Requests/sec: 8130
Avg latency: 12.3 ms
P99 latency: 45.2 ms
Errors: 0
# Compare with other modes
$ ./run-comparison.sh
Mode RPS Avg Latency P99 Latency Errors
-------------------------------------------------------------
iterative 142 703.4 ms 2100 ms 0
process 1823 54.9 ms 289 ms 0
thread 6412 15.6 ms 67 ms 0
pool (4) 8130 12.3 ms 45 ms 0
pool (8) 9847 10.2 ms 38 ms 0
pool (16) 10234 9.8 ms 41 ms 0
4. Solution Architecture
4.1 High-Level Design
CONCURRENCY WORKBENCH ARCHITECTURE

    +----------------------------------------------------------------+
    | CLI Layer                                                      |
    | main.c: Argument parsing, mode selection, startup/shutdown     |
    +--------------------------------+-------------------------------+
                                     |
                                     v
    +----------------------------------------------------------------+
    | Server Abstraction                                             |
    | server.h: Common interface for all concurrency modes           |
    |                                                                |
    |   typedef struct server {                                      |
    |       void (*start)(struct server *, int port);                |
    |       void (*stop)(struct server *);                           |
    |       stats_t (*get_stats)(struct server *);                   |
    |   } server_t;                                                  |
    +--------------------------------+-------------------------------+
                                     |
            +------------------------+------------------------+
            v                        v                        v
    +---------------+       +---------------+      +---------------------+
    |   Iterative   |       |    Process    |      |    Thread Server    |
    |    Server     |       |    Server     |      |  thread_simple:     |
    |  iterative.c  |       |   process.c   |      |    thread.c         |
    +---------------+       +---------------+      |  thread_pool:       |
                                                   |    thread_pool.c    |
                                                   +----------+----------+
                                                              |
                                                              v
    +----------------------------------------------------------------+
    | Core Components                                                |
    |                                                                |
    |   Bounded Buffer      Thread Pool        Request Handler       |
    |   bounded_buf.c       threadpool.c       handler.c             |
    |   - Producer          - Worker mgmt      - Echo protocol       |
    |   - Consumer          - Task submit      - Response format     |
    |   - Blocking ops      - Stats            - Timing              |
    |                                                                |
    |   Statistics          Robust I/O         Debug/Assert          |
    |   stats.c             rio.c              debug.c               |
    |   - Counters          - Partial R/W      - Invariant check     |
    |   - Latency           - Buffering        - Race detection      |
    |   - Throughput        - Error handle     - Logging             |
    +----------------------------------------------------------------+
4.2 Key Components
| Component | Responsibility | Key Challenges |
|---|---|---|
| Bounded Buffer | Thread-safe work queue | Correct blocking on empty/full |
| Thread Pool | Worker lifecycle management | Clean shutdown, error handling |
| Request Handler | Protocol implementation | Thread-safe design |
| Statistics | Metrics collection | Lock-free or low-overhead synchronization |
| Debug System | Invariant checking | Non-intrusive, togglable |
4.3 Data Structures
// Bounded buffer for work queue
typedef struct {
int *buffer; // Array of file descriptors
int capacity; // Maximum items
int count; // Current items
int front, rear; // Circular queue indices
pthread_mutex_t lock; // Protects all fields
pthread_cond_t not_empty; // Signaled when item added
pthread_cond_t not_full; // Signaled when item removed
int shutdown; // Shutdown flag
} bounded_buffer_t;
// Thread pool
typedef struct {
pthread_t *workers; // Worker thread handles
int num_workers; // Number of workers
bounded_buffer_t *queue; // Work queue
void (*handler)(int fd); // Request handler function
// Statistics (atomics for lock-free access)
_Atomic uint64_t requests_completed;
_Atomic uint64_t total_latency_us;
_Atomic int active_workers;
int shutdown; // Shutdown flag
} thread_pool_t;
// Server statistics
typedef struct {
uint64_t total_requests;
uint64_t requests_per_second;
double avg_latency_ms;
double p50_latency_ms;
double p99_latency_ms;
int queue_depth;
int active_workers;
} stats_t;
4.4 Algorithm Overview
Thread Pool Worker Loop:
WORKER THREAD ALGORITHM

    worker_thread():

    loop:
      |
      v
    +------------------------------------------+
    | fd = bounded_buffer_get(queue)           |
    |                                          |
    | - Acquire lock                           |
    | - While queue empty AND not shutdown:    |
    |     Wait on not_empty condition          |
    | - If shutdown: return                    |
    | - Dequeue fd                             |
    | - Signal not_full                        |
    | - Release lock                           |
    +--------------------+---------------------+
                         |
                         v
    +------------------------------------------+
    | active_workers++                         |
    | start_time = now()                       |
    +--------------------+---------------------+
                         |
                         v
    +------------------------------------------+
    | handle_request(fd)                       |
    |                                          |
    | - Read client data                       |
    | - Process (echo back)                    |
    | - Write response                         |
    | - Close fd                               |
    +--------------------+---------------------+
                         |
                         v
    +------------------------------------------+
    | latency = now() - start_time             |
    | total_latency += latency                 |
    | requests_completed++                     |
    | active_workers--                         |
    +--------------------+---------------------+
                         |
                         +---> loop
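The same loop in C; a sketch using the thread_pool_t and bounded_buffer_t from Section 4.3, and assuming bounded_buffer_get returns -1 once shutdown is signaled:
#include <stdatomic.h>
#include <stdint.h>
#include <sys/time.h>

static uint64_t now_us(void) {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (uint64_t)tv.tv_sec * 1000000 + (uint64_t)tv.tv_usec;
}

static void *worker_thread(void *arg) {
    thread_pool_t *pool = arg;

    for (;;) {
        int fd = bounded_buffer_get(pool->queue);   // blocks when empty
        if (fd < 0)                                 // shutdown sentinel
            break;

        atomic_fetch_add(&pool->active_workers, 1);
        uint64_t start = now_us();

        pool->handler(fd);                          // read, echo, close fd

        atomic_fetch_add(&pool->total_latency_us, now_us() - start);
        atomic_fetch_add(&pool->requests_completed, 1);
        atomic_fetch_sub(&pool->active_workers, 1);
    }
    return NULL;
}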
5. Implementation Guide
5.1 Development Environment Setup
# Required packages (Ubuntu/Debian)
sudo apt-get install build-essential gdb valgrind
# For helgrind (race detection) and drd (thread error detection)
# Included with valgrind
# Optional: ThreadSanitizer (requires clang or recent gcc)
# Compile with: gcc -fsanitize=thread -g ...
# Create project structure
mkdir -p concurrency-workbench/{src,include,tests,scripts}
cd concurrency-workbench
5.2 Project Structure
concurrency-workbench/
├── include/
│   ├── bounded_buffer.h   # Bounded buffer interface
│   ├── thread_pool.h      # Thread pool interface
│   ├── server.h           # Server abstraction
│   ├── handler.h          # Request handler
│   ├── stats.h            # Statistics collection
│   ├── rio.h              # Robust I/O
│   └── debug.h            # Debug/assert utilities
├── src/
│   ├── main.c             # Entry point, CLI
│   ├── bounded_buffer.c   # Producer-consumer queue
│   ├── thread_pool.c      # Thread pool implementation
│   ├── server_iterative.c # Iterative server
│   ├── server_process.c   # Process-per-request server
│   ├── server_thread.c    # Thread-per-request server
│   ├── server_pool.c      # Thread pool server
│   ├── handler.c          # Echo handler
│   ├── stats.c            # Statistics
│   ├── rio.c              # Robust I/O
│   └── debug.c            # Debug utilities
├── tests/
│   ├── test_bounded_buffer.c
│   ├── test_thread_pool.c
│   ├── stress_test.c
│   └── race_detector.c
├── scripts/
│   ├── run_comparison.sh
│   └── analyze_stats.py
└── Makefile
5.3 Implementation Phases
Phase 1: Foundation (Days 1-3)
Goals:
- Set up project structure
- Implement robust I/O
- Create basic echo handler
Tasks:
- Create Makefile with proper compilation flags
- Implement rio_readn() and rio_writen() for robust I/O
- Implement basic echo handler (read line, echo back)
- Create server socket setup utilities
Checkpoint: Can create a listening socket and handle one connection manually.
Phase 2: Iterative Server (Days 4-5)
Goals:
- Implement baseline iterative server
- Add basic statistics collection
Tasks:
- Implement accept loop with single-threaded handling
- Add request counting and timing
- Test with multiple sequential clients
- Observe blocking behavior with slow client
Checkpoint: Server handles requests one at a time, reports requests/second.
Phase 3: Process-Based Server (Days 6-7)
Goals:
- Implement process-per-request model
- Handle zombie process cleanup
Tasks:
- Fork child process for each accepted connection
- Implement SIGCHLD handler for reaping
- Handle error cases (fork failure, child crash)
- Compare throughput with iterative
Checkpoint: Server handles concurrent clients via fork, no zombie processes.
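The reaping task in this phase usually takes the standard shape below; a sketch (install_sigchld_handler is a hypothetical name):
#include <errno.h>
#include <signal.h>
#include <sys/wait.h>

static void on_sigchld(int sig) {
    (void)sig;
    int saved_errno = errno;           // waitpid may clobber errno
    // Reap every child that has exited; WNOHANG keeps us from blocking.
    while (waitpid(-1, NULL, WNOHANG) > 0)
        ;
    errno = saved_errno;
}

void install_sigchld_handler(void) {
    struct sigaction sa = {0};
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;          // don't break accept() in the parent
    sigaction(SIGCHLD, &sa, NULL);
}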
Phase 4: Thread-Per-Request Server (Days 8-9)
Goals:
- Implement thread-per-request model
- Handle thread lifecycle
Tasks:
- Create detached thread for each connection
- Ensure proper resource cleanup
- Handle thread creation failure
- Compare throughput with process model
Checkpoint: Server handles concurrent clients via threads, no resource leaks.
Phase 5: Bounded Buffer (Days 10-12)
Goals:
- Implement thread-safe bounded buffer
- Test thoroughly under contention
Tasks:
- Implement circular buffer with mutex and condition variables
- Implement blocking put() and get() operations
- Add shutdown signaling
- Write comprehensive unit tests
Checkpoint: Bounded buffer passes stress tests with many producers/consumers.
Phase 6: Thread Pool (Days 13-16)
Goals:
- Implement complete thread pool
- Add graceful shutdown
Tasks:
- Create worker thread management
- Integrate bounded buffer as work queue
- Implement shutdown sequence
- Add statistics collection
Checkpoint: Thread pool handles work items, shuts down cleanly.
Phase 7: Thread Pool Server (Days 17-19)
Goals:
- Integrate thread pool with server
- Compare all models
Tasks:
- Connect accept loop to thread pool submission
- Handle queue-full backpressure
- Implement comprehensive statistics
- Run comparison benchmarks
Checkpoint: All four server modes work, with performance comparison data.
Phase 8: Stress Testing and Bug Hunting (Days 20-21)
Goals:
- Create stress test harness
- Find and fix concurrency bugs
Tasks:
- Build multi-client stress test tool
- Run extended stress tests (hours)
- Use helgrind/ThreadSanitizer to find races
- Fix all discovered issues
Checkpoint: Server runs stably under stress, no races detected.
5.4 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Synchronization | Semaphores vs Condition Variables | Condition Variables | More flexible, clearer semantics |
| Statistics | Per-operation locks vs Atomics | Atomics | Lower overhead, sufficient for counters |
| Shutdown | Immediate vs Graceful | Graceful | Real-world expectation |
| Error Handling | Abort vs Continue | Log and continue | Server should be resilient |
| Queue | Fixed array vs Dynamic | Fixed array | Predictable memory, simpler |
6. Testing Strategy
6.1 Unit Tests
// test_bounded_buffer.c
void test_single_producer_consumer() {
bounded_buffer_t buf;
bounded_buffer_init(&buf, 10);
// Producer thread
pthread_t producer;
pthread_create(&producer, NULL, producer_fn, &buf);
// Consumer in main thread
for (int i = 0; i < 100; i++) {
int item = bounded_buffer_get(&buf);
assert(item == i);
}
pthread_join(producer, NULL);
bounded_buffer_destroy(&buf);
}
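// Note: producer_fn above is not defined in the text. One plausible
// version (an assumption, matching the Phase 5 put/get API): push
// 0..99 in order so the consumer's assert(item == i) holds.
static void *producer_fn(void *arg) {
    bounded_buffer_t *buf = arg;
    for (int i = 0; i < 100; i++)
        bounded_buffer_put(buf, i);
    return NULL;
}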
void test_multiple_producers_consumers() {
// 10 producers, 10 consumers, 1000 items each
// Verify all items received exactly once
}
void test_buffer_bounds() {
bounded_buffer_t buf;
bounded_buffer_init(&buf, 5);
// Fill buffer
for (int i = 0; i < 5; i++) {
bounded_buffer_put(&buf, i); // Should not block
}
// Next put should block (test with tryput or timeout)
// ...
}
void test_shutdown() {
// Verify waiting threads unblock on shutdown
}
6.2 Integration Tests
// test_server.c
void test_echo_correctness() {
// Start server in thread
// Connect as client
// Send message, verify echo
// Close connection
// Stop server
}
void test_concurrent_clients() {
// Start server
// Launch 100 clients in parallel
// Each sends/receives 10 messages
// Verify all succeeded
}
void test_slow_client() {
// Client connects but reads slowly
// Other clients should not be affected
}
6.3 Stress Tests
# stress_test.c - Configurable stress test client
# Basic stress test
./stress-test --clients 100 --requests 1000 --target localhost:8080
# Long duration test
./stress-test --clients 50 --duration 3600 --target localhost:8080
# Burst test (all requests at once)
./stress-test --clients 1000 --requests 1 --burst --target localhost:8080
# Slow client test
./stress-test --clients 10 --delay-ms 100 --target localhost:8080
6.4 Race Detection
# Compile with ThreadSanitizer
gcc -fsanitize=thread -g -O1 -o server_tsan src/*.c -lpthread
# Run with helgrind (Valgrind tool)
valgrind --tool=helgrind ./concurrency-workbench --mode pool
# Run with DRD (Valgrind tool, often faster)
valgrind --tool=drd ./concurrency-workbench --mode pool
6.5 Critical Test Cases
- Race condition in counter:
- Multiple threads incrementing shared counter
- Verify final count matches expected
- Producer-consumer correctness:
- N producers, M consumers, K items each
- Verify each item consumed exactly once
- Deadlock scenario:
- Create conditions for potential deadlock
- Verify no hang occurs
- Shutdown under load:
- Signal shutdown while handling requests
- Verify clean termination
- Queue overflow:
- Submit more work than queue capacity
- Verify backpressure works correctly
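Test case 1 can be as small as this; a sketch that fails its assert when the mutex lines are removed and passes with them:
#include <assert.h>
#include <pthread.h>

#define NTHREADS 8
#define NITERS   100000

static long counter;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *incr(void *arg) {
    for (int i = 0; i < NITERS; i++) {
        pthread_mutex_lock(&lock);   // remove these two lines to see the race
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, incr, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    assert(counter == (long)NTHREADS * NITERS);
    return 0;
}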
7. Common Pitfalls
7.1 Deadlocks
| Pitfall | Symptom | Solution |
|---|---|---|
| Lock ordering violation | Complete hang | Always acquire locks in same order |
| Self-deadlock | Thread blocks forever | Don't lock a mutex you already hold |
| Condition variable without mutex | Corruption, deadlock | Always hold mutex when waiting |
| Forgetting to unlock | Other threads block | Use RAII pattern or careful review |
// WRONG: Can deadlock if threads acquire in different order
void transfer(account_t *from, account_t *to, int amount) {
pthread_mutex_lock(&from->lock);
pthread_mutex_lock(&to->lock); // Deadlock if other thread does to->from
// transfer
pthread_mutex_unlock(&to->lock);
pthread_mutex_unlock(&from->lock);
}
// CORRECT: Always lock by address order
void transfer(account_t *from, account_t *to, int amount) {
account_t *first = (from < to) ? from : to;
account_t *second = (from < to) ? to : from;
pthread_mutex_lock(&first->lock);
pthread_mutex_lock(&second->lock);
// transfer
pthread_mutex_unlock(&second->lock);
pthread_mutex_unlock(&first->lock);
}
7.2 Data Races
| Pitfall | Symptom | Solution |
|---|---|---|
| Unprotected shared variable | Corrupted data, crashes | Use mutex or atomics |
| Read without lock | Stale data | Lock for reads too |
| Non-atomic compound operation | Lost updates | Hold lock for entire operation |
| Flag without synchronization | Missed signals | Use atomic or condition variable |
// WRONG: Data race on 'done' flag
int done = 0;
void *producer(void *arg) {
// produce data
done = 1; // No memory barrier!
return NULL;
}
void *consumer(void *arg) {
while (!done) { // May never see update
// wait
}
// consume data - may see stale data!
}
// CORRECT: Use atomic or condition variable
_Atomic int done = 0;
void *producer(void *arg) {
// produce data
atomic_store(&done, 1);
return NULL;
}
void *consumer(void *arg) {
while (!atomic_load(&done)) {
// wait
}
// consume data
}
7.3 Resource Leaks
| Pitfall | Symptom | Solution |
|---|---|---|
| Not joining threads | Memory leak | Join or detach all threads |
| Not closing file descriptors | FD exhaustion | Close in all paths, including errors |
| Not destroying mutexes | Memory leak | Destroy in cleanup |
| Not reaping child processes | Zombie processes | Handle SIGCHLD |
// WRONG: Leaks on error
void handle_client(int fd) {
char *buf = malloc(1024);
if (read(fd, buf, 1024) < 0) {
return; // Leaks buf!
}
// ...
free(buf);
close(fd);
}
// CORRECT: Cleanup in all paths
void handle_client(int fd) {
char *buf = malloc(1024);
if (read(fd, buf, 1024) < 0) {
free(buf);
close(fd);
return;
}
// ...
free(buf);
close(fd);
}
// BETTER: Use goto for cleanup (common C pattern)
void handle_client(int fd) {
char *buf = NULL;
buf = malloc(1024);
if (read(fd, buf, 1024) < 0) {
goto cleanup;
}
// ...
cleanup:
free(buf);
close(fd);
}
7.4 Performance Pitfalls
| Pitfall | Symptom | Solution |
|---|---|---|
| Lock contention | Poor scaling | Reduce critical section, finer locks |
| False sharing | Cache thrashing | Pad data structures |
| Too many threads | Context switch overhead | Use thread pool |
| Busy waiting | High CPU, poor latency | Use condition variables |
7.5 Debugging Strategies
- Add extensive logging (with timestamps and thread IDs):
  #define DEBUG_LOG(fmt, ...) \
      fprintf(stderr, "[%lu][%p] " fmt "\n", \
              time_us(), (void*)pthread_self(), ##__VA_ARGS__)

- Use assertions liberally:

  void bounded_buffer_put(bounded_buffer_t *b, int item) {
      pthread_mutex_lock(&b->lock);
      assert(b->count >= 0 && b->count <= b->capacity);
      // ...
  }

- Introduce artificial delays to expose races:

  #ifdef DEBUG_RACES
  usleep(rand() % 1000);  // Random delay to expose timing issues
  #endif

- Use helgrind/ThreadSanitizer in CI
8. Extensions and Challenges
8.1 Beginner Extensions
- Multiple handler types: Add handlers for different protocols (HTTP, time, etc.)
- Connection timeout: Disconnect idle clients after timeout
- Logging to file: Structured logging with rotation
8.2 Intermediate Extensions
- Priority queue: High-priority requests handled first
- Dynamic thread pool: Grow/shrink based on load
- Connection limiting: Max clients per IP
- I/O multiplexing mode: Add epoll/select-based model
8.3 Advanced Extensions
- Lock-free bounded buffer: Use compare-and-swap instead of locks
- Work stealing: Workers steal from each other's queues
- NUMA awareness: Pin threads to cores, local memory
- Custom allocator: Per-thread arena for request data
8.4 Lock-Free Data Structures (Advanced)
A lock-free bounded buffer using compare-and-swap:
// WARNING: Lock-free programming is extremely difficult!
// This is for educational purposes; use tested libraries in production.
typedef struct {
_Atomic int *buffer;
_Atomic size_t head;
_Atomic size_t tail;
size_t capacity;
} lockfree_queue_t;
int lockfree_push(lockfree_queue_t *q, int item) {
    size_t tail, next;
    do {
        tail = atomic_load(&q->tail);
        next = (tail + 1) % q->capacity;
        if (next == atomic_load(&q->head)) {
            return -1; // Full
        }
    } while (!atomic_compare_exchange_weak(&q->tail, &tail, next));
    // Caveat: the slot is published (tail advanced) before the item is
    // stored, so a concurrent pop may read the slot too early. Production
    // queues add per-slot sequence numbers to close this window.
    atomic_store(&q->buffer[tail], item);
    return 0;
}
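The matching pop is not shown in the text; a sketch under the same caveat, assuming the struct above:
int lockfree_pop(lockfree_queue_t *q, int *out) {
    size_t head, next;
    do {
        head = atomic_load(&q->head);
        if (head == atomic_load(&q->tail)) {
            return -1; // Empty
        }
        next = (head + 1) % q->capacity;
        *out = atomic_load(&q->buffer[head]);  // read before claiming slot
    } while (!atomic_compare_exchange_weak(&q->head, &head, next));
    return 0;
}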
9. Real-World Connections
9.1 Industry Applications
- Web servers (nginx, Apache): Thread pools for request handling
- Databases (PostgreSQL, MySQL): Connection pools, worker processes
- Message queues (RabbitMQ, Kafka): Producer-consumer patterns
- Game servers: Thread pools for player connections
- Cloud infrastructure: Work stealing, NUMA-aware scheduling
9.2 Related Open Source Projects
- libuv: Event loop library used by Node.js
- libevent: Event notification library
- Intel TBB: Threading Building Blocks with work stealing
- jemalloc: Thread-aware memory allocator
- Seastar: High-performance async framework
9.3 Interview Relevance
This project prepares you for questions like:
- "Design a thread pool from scratch"
- "How would you implement a producer-consumer queue?"
- "What are the four conditions for deadlock? How do you prevent it?"
- "Compare process-based vs thread-based concurrency"
- "How would you debug a race condition?"
- "Explain the difference between a mutex and a semaphore"
10. Resources
10.1 Essential Reading
- CS:APP Chapter 12: "Concurrent Programming" - Core material
- OSTEP Concurrency Chapters: Free textbook with excellent explanations
- Chapter 26: Concurrency Introduction
- Chapter 27: Interlude: Thread API
- Chapter 28: Locks
- Chapter 29: Lock-based Concurrent Data Structures
- Chapter 30: Condition Variables
- Chapter 31: Semaphores
- Chapter 32: Common Concurrency Problems
- Chapter 33: Event-based Concurrency
10.2 Reference Documentation
- POSIX Threads Programming: https://computing.llnl.gov/tutorials/pthreads/
- pthread man pages: man pthread_create, man pthread_mutex_init, etc.
- GCC Atomic Builtins: GCC documentation on __atomic_* operations
10.3 Video Resources
- MIT 6.004 lectures on concurrency
- Carnegie Mellon 15-213 lectures (CS:APP course)
- "Concurrency is not Parallelism" by Rob Pike
10.4 Tools
- helgrind: Valgrind tool for detecting races
- DRD: Valgrind tool for thread errors
- ThreadSanitizer: Clang/GCC tool for race detection
- gdb: Thread-aware debugging
10.5 Related Projects in This Series
- Previous: P15 (Robust Unix I/O Toolkit) - I/O foundations
- Previous: P11 (Signals + Processes Sandbox) - Process control foundations
- Next: P17 (Capstone Proxy) - Integrates all concepts including concurrency
11. Self-Assessment Checklist
Before considering this project complete, verify:
Understanding
- I can explain the difference between concurrency and parallelism
- I can describe when to use processes vs threads
- I can implement a mutex from scratch (conceptually)
- I can explain why counter++ is not atomic
- I can describe the four conditions for deadlock
- I can implement producer-consumer with semaphores AND condition variables
- I can explain spurious wakeups and why we use while, not if
- I can describe what makes a function thread-safe vs reentrant
Implementation
- All four server modes work correctly
- Bounded buffer handles all edge cases (empty, full, shutdown)
- Thread pool shuts down gracefully
- No resource leaks (threads, fds, memory)
- Statistics are accurate under load
- Server handles client errors gracefully
Debugging
- I can use helgrind/ThreadSanitizer to find races
- I can debug deadlocks using gdb
- I have found and fixed at least one real concurrency bug
- My stress tests run for hours without issues
Performance
- I can explain why thread pool beats thread-per-request under high load
- I understand the overhead of context switches
- I can tune thread count for my workload
- I can identify lock contention as a bottleneck
12. Real World Outcome
When you complete this project, hereโs exactly what youโll see when running your concurrency workbench:
Server Mode Selection
$ ./workbench --mode iterative --port 8080
Concurrency Workbench v1.0
Mode: iterative (single-threaded, blocking)
Listening on port 8080...
Press Ctrl+C for statistics and shutdown.
[2025-12-26 10:15:23] Client 192.168.1.10:54321 connected
[2025-12-26 10:15:23] Request: GET /compute?n=100
[2025-12-26 10:15:23] Response: 200 OK (23ms)
[2025-12-26 10:15:24] Client 192.168.1.10:54321 disconnected
Comparing Concurrency Modes
$ ./workbench-bench --compare --clients 100 --requests 1000
=== CONCURRENCY MODE COMPARISON ===
Test: 100 concurrent clients, 1000 requests each
Work per request: 10ms simulated CPU + 5ms simulated I/O
Mode Throughput Avg Latency P99 Latency CPU Usage
--------------------------------------------------------------------------------
Iterative 67 req/s 1492 ms 1523 ms 12%
Process-per-Request 823 req/s 98 ms 342 ms 89%
Thread-per-Request 2,847 req/s 31 ms 87 ms 94%
Thread Pool (8) 4,123 req/s 22 ms 45 ms 97%
Thread Pool (16) 4,891 req/s 19 ms 38 ms 99%
Thread Pool (32) 4,752 req/s 20 ms 41 ms 99%
Analysis:
- Iterative: Serial processing, terrible latency (requests queue behind one another)
- Process-per-request: ~12x faster, but fork() overhead visible
- Thread-per-request: ~3.5x faster than processes (no fork overhead)
- Thread pool: Best throughput, optimal at ~16 threads (matches CPU cores; see the sizing heuristic below)
- Diminishing returns beyond CPU count (context switch overhead)
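A common rule of thumb for pool sizing (from Java Concurrency in Practice, cited in the book table later in this guide) matches the shape of this curve:

    threads ≈ cores × (1 + wait_time / compute_time)

With this workload's 5 ms of simulated I/O per 10 ms of CPU, that suggests roughly 1.5 threads per core; beyond that, extra threads mostly buy context-switch and lock-contention overhead, which is exactly the dip visible at 32 threads.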
Bounded Buffer in Action
$ ./workbench --mode pool --threads 4 --queue-size 10 --port 8080 --verbose
=== THREAD POOL INITIALIZATION ===
Queue capacity: 10 slots
Worker threads: 4
[MAIN] Starting worker threads...
Thread 0 (tid 12345) started, waiting for work
Thread 1 (tid 12346) started, waiting for work
Thread 2 (tid 12347) started, waiting for work
Thread 3 (tid 12348) started, waiting for work
[MAIN] Server ready. Queue state: [0/10 items]
=== UNDER LOAD ===
[10:15:23.001] Connection from 192.168.1.10 -> enqueue (queue: 1/10)
[10:15:23.002] Connection from 192.168.1.11 -> enqueue (queue: 2/10)
[10:15:23.002] Thread 0: dequeue, processing client 192.168.1.10
[10:15:23.003] Thread 1: dequeue, processing client 192.168.1.11
[10:15:23.003] Connection from 192.168.1.12 -> enqueue (queue: 1/10)
...
[10:15:23.100] Connection from 192.168.1.50 -> enqueue (queue: 10/10) FULL!
[10:15:23.100] Main thread BLOCKING on full queue...
[10:15:23.125] Thread 2: completed client, queue slot freed
[10:15:23.125] Main thread: enqueued, continuing accept loop
=== GRACEFUL SHUTDOWN (Ctrl+C) ===
^C
[MAIN] Shutdown signal received
[MAIN] Stopping accept loop
[MAIN] Sending poison pills to 4 workers...
[MAIN] Waiting for workers to finish current requests...
Thread 0: received poison pill, exiting
Thread 1: received poison pill, exiting
Thread 2: completing request, then exit
Thread 3: completing request, then exit
[MAIN] All workers terminated
=== FINAL STATISTICS ===
Total requests: 12,847
Successful: 12,845 (99.98%)
Failed: 2 (client disconnect)
Total time: 127.3 seconds
Throughput: 100.9 req/s
Latency percentiles:
P50: 18ms
P90: 29ms
P99: 67ms
Max: 312ms
Queue statistics:
Times full: 47
Times empty: 891
Avg occupancy: 4.2 items
Thread utilization:
Thread 0: 87% busy
Thread 1: 89% busy
Thread 2: 86% busy
Thread 3: 91% busy
Race Condition Detection
$ ./workbench --mode pool --threads 4 &
$ helgrind ./workbench-bench --clients 50 --requests 100
==12345== Helgrind, a thread error detector
==12345== Running on valgrind-3.18.1
==12345== ---Thread-Announcement------------------------------------------
==12345== Thread #3 was created
==12345== ----------------------------------------------------------------
==12345== Possible data race during write of size 8 at 0x4A3C80:
==12345== at 0x401234: update_statistics (stats.c:47)
==12345== by 0x401567: handle_request (server.c:123)
==12345== by 0x401789: worker_thread (pool.c:89)
==12345==
==12345== This conflicts with a previous read of size 8 by thread #2:
==12345== at 0x401230: update_statistics (stats.c:45)
==12345== by 0x401567: handle_request (server.c:123)
RACE DETECTED! Fix by adding a mutex around the statistics update (see Hint 3).
After fixing and re-running:
==12345== ERROR SUMMARY: 0 errors from 0 contexts
Deadlock Demonstration and Fix
$ ./deadlock-demo
=== DEADLOCK DEMONSTRATION ===
Creating two threads with lock ordering violation...
Thread A: Acquired lock_1
Thread B: Acquired lock_2
Thread A: Waiting for lock_2...
Thread B: Waiting for lock_1...
[5 seconds pass with no progress]
DEADLOCK DETECTED!
Thread A holds lock_1, wants lock_2
Thread B holds lock_2, wants lock_1
This is the classic lock-ordering deadlock (the two-thread case of dining philosophers).
=== FIX: CONSISTENT LOCK ORDERING ===
Both threads now acquire locks in order: lock_1, then lock_2
Thread A: Acquired lock_1
Thread A: Acquired lock_2
Thread A: Released lock_2
Thread A: Released lock_1
Thread B: Acquired lock_1
Thread B: Acquired lock_2
Thread B: Released lock_2
Thread B: Released lock_1
SUCCESS! Consistent ordering prevents deadlock.
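The buggy half of the demo boils down to two threads taking the same pair of locks in opposite orders; a minimal sketch, using the lock_1/lock_2 names from the output above (the sleeps only widen the race window):

#include <pthread.h>
#include <unistd.h>

pthread_mutex_t lock_1 = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t lock_2 = PTHREAD_MUTEX_INITIALIZER;

void *thread_a(void *arg) {
    pthread_mutex_lock(&lock_1);
    usleep(100000);                 // give B time to grab lock_2
    pthread_mutex_lock(&lock_2);    // blocks forever once B holds lock_2
    pthread_mutex_unlock(&lock_2);
    pthread_mutex_unlock(&lock_1);
    return NULL;
}

void *thread_b(void *arg) {
    pthread_mutex_lock(&lock_2);    // opposite order -> circular wait
    usleep(100000);
    pthread_mutex_lock(&lock_1);
    pthread_mutex_unlock(&lock_1);
    pthread_mutex_unlock(&lock_2);
    return NULL;
}

Spawn both from main with pthread_create and the hang reproduces almost every run; the fix is exactly what the output shows: make thread_b take lock_1 first, then lock_2.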
13. The Core Question Youโre Answering
โHow do I write a server that can handle thousands of simultaneous clients without blocking, crashing, or corrupting shared data?โ
This project teaches you to harness the power of concurrent execution while avoiding its pitfalls. Youโll understand why the naive approach (one thread per client) doesnโt scale, why shared mutable state is dangerous, and how synchronization primitives let multiple threads cooperate safely. These patterns are the foundation of every high-performance server, database, and operating system.
14. Concepts You Must Understand First
Before starting this project, ensure you understand these concepts:
| Concept | Why It Matters | Where to Learn |
|---|---|---|
| Process creation (fork) | Baseline for comparison | CS:APP 8.4 |
| Basic thread creation (pthread_create) | Youโll create many threads | CS:APP 12.3 |
| What a race condition is | You must avoid them | CS:APP 12.4 |
| Critical sections concept | Foundation for mutexes | CS:APP 12.4 |
| Producer-consumer pattern | Your thread pool uses this | CS:APP 12.5.2 |
| What a semaphore does | Key synchronization primitive | CS:APP 12.5 |
| Socket programming basics | Server accepts connections | CS:APP Chapter 11 |
| errno and error handling | Threaded errors are tricky | CS:APP 10.4 |
15. Questions to Guide Your Design
Work through these questions BEFORE writing code:
- Thread Pool Sizing: How many threads should be in your pool? How do you determine the optimal number?
- Queue Overflow: What happens when work arrives faster than threads can process it? Block the producer? Drop work?
- Graceful Shutdown: How do you stop worker threads without corrupting in-flight work? What's a "poison pill"?
- Error Isolation: If one request causes a segfault, what happens to other requests? To the server?
- Statistics Threading: How do you collect accurate stats without creating a bottleneck? Lock every update?
- Client Timeouts: What if a client connects but never sends data? How do you avoid blocking a worker forever?
- Lock Ordering: If you need multiple locks, in what order should they be acquired to prevent deadlock? (See the sketch after this list.)
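For the lock-ordering question, one standard discipline is to impose a single global order on locks (here, by address) and route every multi-lock acquisition through one helper. A sketch; lock_pair and unlock_pair are names introduced here:

#include <pthread.h>
#include <stdint.h>

// Acquire two mutexes in a globally consistent order (by address), so no
// two threads can ever hold them in opposite orders.
void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b) {
    if ((uintptr_t)a > (uintptr_t)b) {
        pthread_mutex_t *tmp = a; a = b; b = tmp;
    }
    pthread_mutex_lock(a);
    pthread_mutex_lock(b);
}

void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b) {
    // Release order doesn't matter: unlocking never blocks.
    pthread_mutex_unlock(a);
    pthread_mutex_unlock(b);
}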
16. Thinking Exercise
Before writing any code, trace through this scenario by hand:
You have a thread pool with 2 workers and a bounded buffer of size 3. Trace the execution:
Time 0: Request A arrives (processing time: 100ms)
Time 10: Request B arrives (processing time: 50ms)
Time 20: Request C arrives (processing time: 80ms)
Time 30: Request D arrives (processing time: 40ms)
Time 40: Request E arrives (processing time: 60ms)
Exercise: On paper, answer:
- Time 0-10: What's in the queue? Which workers are busy?
- Time 30: Request D arrives. Can it be enqueued immediately? Who's processing what?
- Time 40: Request E arrives. Queue state? Is the producer blocked?
- Time 60: Request B completes. What happens next? Who processes D?
- Time 100: Request A completes. What happens to the blocked producer (if any)?
- Final state: In what order did requests complete? Draw a timeline.
Verify your answers by implementing with verbose logging.
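A minimal timestamped, thread-tagged logging macro for that verification run (a sketch; TLOG is a name introduced here, and ##__VA_ARGS__ is a GNU/Clang extension):

#include <stdio.h>
#include <time.h>
#include <pthread.h>

// Millisecond timestamps plus the calling thread's ID, written to stderr
// so interleavings are visible as they happen.
#define TLOG(fmt, ...) do {                                        \
    struct timespec ts_;                                           \
    clock_gettime(CLOCK_MONOTONIC, &ts_);                          \
    fprintf(stderr, "[%5ld.%03ld] tid=%lu " fmt "\n",              \
            (long)ts_.tv_sec, ts_.tv_nsec / 1000000,               \
            (unsigned long)pthread_self(), ##__VA_ARGS__);         \
} while (0)

Example: TLOG("enqueue (queue: %d/%d)", count, capacity); produces lines like the verbose output in Section 12.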
17. The Interview Questions Theyโll Ask
After completing this project, youโll be ready for these common interview questions:
- โExplain the difference between concurrency and parallelism.โ
- Expected: Concurrency is about structure (dealing with many things); parallelism is about execution (doing many things)
- Bonus: Give examples of each alone and together, explain why both matter for servers
- โHow do you prevent race conditions?โ
- Expected: Mutex/lock around shared data, atomic operations, or lock-free data structures
- Bonus: Explain trade-offs, when each approach is appropriate, what makes a good critical section
- โWhat causes deadlock and how do you prevent it?โ
- Expected: Four conditions (mutual exclusion, hold and wait, no preemption, circular wait); prevent by breaking one condition
- Bonus: Lock ordering, timeout-based detection, lock-free alternatives
- โDesign a thread pool for a web server.โ
- Expected: Bounded queue, worker threads, producer-consumer pattern
- Bonus: Discuss sizing, graceful shutdown, error handling, monitoring
- โWhatโs the difference between a mutex and a semaphore?โ
- Expected: Mutex for mutual exclusion (binary, ownership); semaphore for counting resources
- Bonus: When to use each, binary semaphore vs mutex, condition variables
- โHow would you debug a threading bug that only happens under load?โ
- Expected: Thread sanitizers (TSan), logging with thread IDs, deterministic testing
- Bonus: Race detection tools, controlled thread interleavings, invariant checking
18. Hints in Layers
If youโre stuck, reveal hints one at a time:
Hint 1: Bounded Buffer with Semaphores
The producer-consumer pattern with semaphores uses three semaphores:
typedef struct {
int *buf; // Buffer array
int n; // Maximum number of slots
int front; // buf[(front+1)%n] is first item
int rear; // buf[rear%n] is last item
sem_t mutex; // Protects buffer access
sem_t slots; // Counts available slots
sem_t items; // Counts available items
} sbuf_t;
void sbuf_init(sbuf_t *sp, int n) {
sp->buf = calloc(n, sizeof(int));
sp->n = n;
sp->front = sp->rear = 0;
sem_init(&sp->mutex, 0, 1); // Binary semaphore for mutual exclusion
sem_init(&sp->slots, 0, n); // n empty slots initially
sem_init(&sp->items, 0, 0); // 0 items initially
}
Producer waits for slot, then inserts:
void sbuf_insert(sbuf_t *sp, int item) {
sem_wait(&sp->slots); // Wait for available slot
sem_wait(&sp->mutex); // Lock buffer
sp->buf[(++sp->rear) % sp->n] = item;
sem_post(&sp->mutex); // Unlock buffer
sem_post(&sp->items); // Announce available item
}
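Hint 2 below calls sbuf_remove; the consumer side mirrors the insert, waiting for an item and then freeing a slot (the same CS:APP-style sbuf pattern):

int sbuf_remove(sbuf_t *sp) {
    int item;
    sem_wait(&sp->items);                    // Wait for an available item
    sem_wait(&sp->mutex);                    // Lock buffer
    item = sp->buf[(++sp->front) % sp->n];   // Remove first item
    sem_post(&sp->mutex);                    // Unlock buffer
    sem_post(&sp->slots);                    // Announce a free slot
    return item;
}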
Hint 2: Graceful Thread Pool Shutdown
Use a โpoison pillโ pattern - a special value that tells workers to exit:
#define POISON_PILL -1
void shutdown_thread_pool(sbuf_t *pool, int num_workers) {
// Send poison pill to each worker
for (int i = 0; i < num_workers; i++) {
sbuf_insert(pool, POISON_PILL);
}
// Wait for all workers to exit
for (int i = 0; i < num_workers; i++) {
pthread_join(worker_threads[i], NULL);
}
}
void *worker_thread(void *arg) {
sbuf_t *pool = (sbuf_t *)arg;
while (1) {
int connfd = sbuf_remove(pool);
if (connfd == POISON_PILL) {
printf("Thread %lu received poison pill, exiting\n",
pthread_self());
break;
}
handle_client(connfd);
close(connfd);
}
return NULL;
}
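One ordering detail worth documenting in your design notes: stop the accept loop before enqueuing the pills, so no real work can land in the queue behind a pill; and since each worker exits after consuming exactly one pill, num_workers pills wake every thread exactly once.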
Hint 3: Thread-Safe Statistics
For statistics, you have options:
Option 1: Single mutex (simple but potentially contended):
pthread_mutex_t stats_lock = PTHREAD_MUTEX_INITIALIZER;
struct stats {
long total_requests;
long total_bytes;
double total_time;
} global_stats;
void update_stats(long bytes, double time) {
pthread_mutex_lock(&stats_lock);
global_stats.total_requests++;
global_stats.total_bytes += bytes;
global_stats.total_time += time;
pthread_mutex_unlock(&stats_lock);
}
Option 2: Per-thread counters (no contention, merge at end):
__thread struct stats thread_local_stats; // Thread-local storage
void update_stats(long bytes, double time) {
// No locking needed - each thread has its own copy
thread_local_stats.total_requests++;
thread_local_stats.total_bytes += bytes;
thread_local_stats.total_time += time;
}
void merge_all_stats(struct stats *result) {
// Called during shutdown, sum all thread-local copies
// (Requires keeping track of all thread-local instances)
}
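Option 2's merge step is the tricky part: taking the address of another thread's __thread variable is fragile, because the storage disappears when that thread exits. A simpler variant that keeps the no-contention property is an array of per-worker slots indexed by worker ID (a sketch; the names and MAX_WORKERS cap are assumptions):

#define MAX_WORKERS 64                        // assumed upper bound

static struct stats per_worker[MAX_WORKERS];  // one private slot per worker

void update_stats_slot(int worker_id, long bytes, double time) {
    // Each worker writes only its own slot: no sharing, no lock.
    per_worker[worker_id].total_requests++;
    per_worker[worker_id].total_bytes += bytes;
    per_worker[worker_id].total_time  += time;
}

void merge_all_stats(struct stats *result, int num_workers) {
    // Call only after pthread_join() on every worker.
    *result = (struct stats){0};
    for (int i = 0; i < num_workers; i++) {
        result->total_requests += per_worker[i].total_requests;
        result->total_bytes    += per_worker[i].total_bytes;
        result->total_time     += per_worker[i].total_time;
    }
}

If profiling shows false sharing between adjacent slots, pad each slot to a cache line.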
Hint 4: Testing for Race Conditions
To reliably trigger race conditions in testing:
// Add artificial delays to widen race windows
#ifdef TESTING_RACES
#define RACE_DELAY() usleep(rand() % 1000) /* rand() is not thread-safe, but here it only perturbs timing */
#else
#define RACE_DELAY()
#endif
void critical_section(void) {
pthread_mutex_lock(&lock);
RACE_DELAY(); // Makes races more likely to manifest
shared_counter++;
RACE_DELAY();
pthread_mutex_unlock(&lock);
}
Use thread sanitizer:
gcc -fsanitize=thread -g -o server server.c -lpthread
./server
Stress test pattern:
void stress_test(int num_threads, int ops_per_thread) {
    // 'worker' and 'final_count' are defined elsewhere: each worker
    // performs ops_per_thread operations on the shared final_count.
    pthread_t threads[num_threads];
    for (int i = 0; i < num_threads; i++) {
        pthread_create(&threads[i], NULL, worker, &ops_per_thread);
    }
    for (int i = 0; i < num_threads; i++) {
        pthread_join(threads[i], NULL);
    }
    // Verify the invariant: no updates were lost under contention
    assert(final_count == num_threads * ops_per_thread);
}
19. Books That Will Help
| Topic | Book | Chapter/Section |
|---|---|---|
| Concurrent programming overview | CS:APP 3rd Ed | Chapter 12.1-12.2 โConcurrent Programmingโ, โConcurrent Programming with Processesโ |
| Thread programming | CS:APP 3rd Ed | Chapter 12.3 โConcurrent Programming with Threadsโ |
| Shared variables and synchronization | CS:APP 3rd Ed | Chapter 12.4-12.5 โShared Variables in Threaded Programsโ, โSynchronizing Threadsโ |
| Thread safety | CS:APP 3rd Ed | Chapter 12.7 โThread Safetyโ |
| Races and deadlocks | CS:APP 3rd Ed | Chapter 12.7.2-12.7.3 โRacesโ, โDeadlocksโ |
| POSIX threads in depth | Programming with POSIX Threads by Butenhof | Chapters 3-7 |
| Lock-free programming | C++ Concurrency in Action by Williams | Chapter 7 โDesigning lock-free concurrent data structuresโ |
| High-performance servers | The Art of Multiprocessor Programming | Chapters 1-3 |
| Thread pools and patterns | Java Concurrency in Practice | Chapter 8 (concepts apply to C) |
20. Submission / Completion Criteria
Minimum Viable Completion:
- All four concurrency modes implemented and working
- Bounded buffer passes correctness tests
- Basic stress test completes without errors
- Performance comparison data collected
Full Completion:
- Graceful shutdown works under load
- Statistics collection is accurate
- Extended stress testing (1+ hour) passes
- Race detection tools report no issues
- Documentation explains every synchronization decision
Excellence (Going Above & Beyond):
- Lock-free data structures implemented
- Dynamic thread pool sizing
- I/O multiplexing mode added
- NUMA-aware implementation
- Formal proof of deadlock freedom
This guide was expanded from CSAPP_3E_DEEP_LEARNING_PROJECTS.md. For the complete learning path, see the project index.