Project 5: Thread Pool Library

Build a production-quality thread pool with futures, task queuing, and graceful shutdown to master C++ concurrency primitives.

Quick Reference

Attribute	Value
Difficulty	Advanced
Time Estimate	2-3 weeks
Language	C++
Prerequisites	std::thread basics, mutex/lock understanding, move semantics
Key Topics	std::thread, mutex, condition_variable, futures, packaged_task, producer-consumer

1. Learning Objectives

By completing this project, you will:

Master thread lifecycle management: Create, manage, and properly join worker threads
Understand synchronization primitives: Use mutex, lock_guard, unique_lock, and condition_variable correctly
Implement the producer-consumer pattern: Build a thread-safe task queue with proper signaling
Work with futures and promises: Return results from async tasks using std::future and std::packaged_task
Handle graceful shutdown: Stop threads safely without losing pending tasks or causing deadlocks
Apply perfect forwarding: Use variadic templates and std::forward for flexible task submission
Avoid common concurrency bugs: Prevent race conditions, deadlocks, and spurious wakeups

2. Theoretical Foundation

2.1 Core Concepts

Thread Pools solve a fundamental problem: Creating and destroying threads is expensive (typically 10-100 microseconds per thread). A thread pool maintains a set of worker threads that wait for and execute tasks, amortizing creation cost across many tasks.

Traditional Approach (expensive):
┌─────────────────────────────────────────────────────────────┐
│  Task 1 arrives → Create Thread → Execute → Destroy Thread  │
│  Task 2 arrives → Create Thread → Execute → Destroy Thread  │
│  Task 3 arrives → Create Thread → Execute → Destroy Thread  │
│  ...                                                         │
│  Overhead: N thread creations + N destructions               │
└─────────────────────────────────────────────────────────────┘

Thread Pool Approach (efficient):
┌─────────────────────────────────────────────────────────────┐
│  Create N workers once                                       │
│  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐         │
│  │Worker 1 │  │Worker 2 │  │Worker 3 │  │Worker 4 │         │
│  └────┬────┘  └────┬────┘  └────┬────┘  └────┬────┘         │
│       │            │            │            │               │
│       └────────────┴─────┬──────┴────────────┘               │
│                          │                                   │
│                    ┌─────▼─────┐                             │
│                    │Task Queue │ ← Tasks submitted here      │
│                    └───────────┘                             │
│  Overhead: N thread creations (once) + 0 destructions        │
└─────────────────────────────────────────────────────────────┘

The Producer-Consumer Pattern is the heart of a thread pool:

                    ┌─────────────────────────────────────┐
                    │           SHARED QUEUE              │
                    │  ┌───┬───┬───┬───┬───┬───┬───┐     │
 Producer           │  │T1 │T2 │T3 │T4 │   │   │   │     │    Consumer
 (main thread)      │  └───┴───┴───┴───┴───┴───┴───┘     │    (worker threads)
      │             │       ↑                    ↑        │         │
      │ submit()    │       │                    │        │  wait() │
      └─────────────┼───────┘                    └────────┼─────────┘
                    │                                     │
                    │  MUTEX protects queue access        │
                    │  CONDITION_VARIABLE signals work    │
                    └─────────────────────────────────────┘

Futures and Promises enable returning results:

┌──────────────┐     ┌─────────────────┐     ┌──────────────┐
│    Caller    │     │  packaged_task  │     │   Worker     │
│              │     │                 │     │              │
│  get_future()├────►│ ┌─────────────┐ │     │              │
│              │     │ │   promise   │ │     │              │
│              │     │ └──────┬──────┘ │     │              │
│              │     │        │        │     │  operator()  │
│              │     │        ▼        │◄────┤  (execute)   │
│  future.get()│◄────┤ ┌─────────────┐ │     │              │
│  (blocks)    │     │ │   future    │ │     │              │
└──────────────┘     │ └─────────────┘ │     └──────────────┘
                     └─────────────────┘

2.2 Why This Matters

Thread pools are ubiquitous in production systems:

Web servers: Nginx, Apache use thread pools for request handling
Databases: PostgreSQL, MySQL use thread pools for query execution
Game engines: Unity, Unreal use job systems (advanced thread pools)
Async frameworks: Boost.Asio, libuv use thread pools for I/O completion

Understanding thread pools teaches you:

How to think about concurrent execution
The cost/benefit tradeoffs of parallelism
How to design thread-safe interfaces
How async/await patterns work under the hood

2.3 Historical Context

Thread pools emerged from the need to handle many concurrent connections in early web servers (1990s). The “thread-per-connection” model didn’t scale beyond hundreds of clients due to:

Memory overhead: Each thread requires its own stack (typically 1-8 MB)
Context switching cost: OS scheduler overhead increases with thread count
Creation latency: Thread creation involves kernel calls and memory allocation

The Apache prefork model and later worker MPM pioneered thread pool usage in web servers. Today, nearly every high-performance server uses some form of thread pooling.

2.4 Common Misconceptions

Misconception 1: “More threads = faster execution” Reality: Beyond the number of CPU cores, adding threads increases contention and context switching. A thread pool with std::thread::hardware_concurrency() threads is often optimal.

Misconception 2: “condition_variable::notify_one() always wakes exactly one thread” Reality: Spurious wakeups can occur. Always use a predicate with wait():

// WRONG: Can wake up without work available
cv.wait(lock);

// CORRECT: Re-checks condition after wakeup
cv.wait(lock, []{ return !queue.empty() || stop; });

Misconception 3: “std::function<void()> is the only way to store tasks” Reality: std::function has overhead (type erasure, potential heap allocation). For performance-critical pools, consider std::move_only_function (C++23) or custom type-erased wrappers.

3. Project Specification

3.1 What You Will Build

A thread pool library that:

Manages a fixed number of worker threads
Accepts callable tasks with arbitrary return types
Returns std::future for each submitted task
Supports graceful shutdown (complete pending tasks)
Is thread-safe for concurrent task submission

3.2 Functional Requirements

Requirement	Description
FR-1	Constructor takes number of worker threads
FR-2	`submit()` accepts any callable with any return type
FR-3	`submit()` returns `std::future<ReturnType>`
FR-4	Multiple threads can submit tasks concurrently
FR-5	Destructor waits for all pending tasks to complete
FR-6	Workers sleep when queue is empty (no busy-waiting)
FR-7	Tasks execute in approximate FIFO order

3.3 Non-Functional Requirements

Requirement	Description
NFR-1	Submission latency < 10 microseconds for simple lambdas
NFR-2	No memory leaks (verified with Valgrind/ASan)
NFR-3	No deadlocks under any submission pattern
NFR-4	Exception in task doesn’t crash pool
NFR-5	Header-only implementation preferred

3.4 Example Usage / Output

#include "thread_pool.hpp"
#include <iostream>
#include <vector>
#include <chrono>

int main() {
    // Create pool with 4 worker threads
    ThreadPool pool(4);

    // Submit a simple lambda
    auto future1 = pool.submit([]() {
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
        return 42;
    });

    // Submit with arguments
    auto future2 = pool.submit([](int a, int b) {
        return a + b;
    }, 10, 20);

    // Submit many tasks
    std::vector<std::future<int>> results;
    for (int i = 0; i < 1000; i++) {
        results.push_back(pool.submit([i]() {
            return i * i;
        }));
    }

    // Get results (blocks until ready)
    std::cout << "future1: " << future1.get() << std::endl;  // 42
    std::cout << "future2: " << future2.get() << std::endl;  // 30

    // Collect all results
    int sum = 0;
    for (auto& f : results) {
        sum += f.get();
    }
    std::cout << "Sum of squares 0-999: " << sum << std::endl;

    // Pool destructor waits for all tasks
    return 0;
}

Expected Output:

future1: 42
future2: 30
Sum of squares 0-999: 332833500

3.5 Real World Outcome

When you run your thread pool with a benchmark:

$ ./thread_pool_bench
Thread Pool Benchmark
=====================
Pool size: 8 threads
Task count: 100,000

Sequential execution: 2.34 seconds
Thread pool execution: 0.31 seconds
Speedup: 7.5x

Task submission latency:
  Min: 0.8 us
  Max: 45.2 us
  Avg: 2.1 us
  P99: 8.3 us

Memory usage: 2.1 MB (stable, no growth)
No data races detected (ThreadSanitizer clean)

4. Solution Architecture

4.1 High-Level Design

┌─────────────────────────────────────────────────────────────────────┐
│                         ThreadPool Class                             │
│                                                                      │
│  ┌──────────────────────────────────────────────────────────────┐   │
│  │                      Public Interface                         │   │
│  │  ThreadPool(size_t num_threads)                               │   │
│  │  ~ThreadPool()                                                │   │
│  │  template<class F, class... Args>                             │   │
│  │  auto submit(F&& f, Args&&... args) -> future<result_of<F>>   │   │
│  └──────────────────────────────────────────────────────────────┘   │
│                                │                                     │
│                                ▼                                     │
│  ┌──────────────────────────────────────────────────────────────┐   │
│  │                      Private Members                          │   │
│  │                                                               │   │
│  │  vector<thread> workers_          ← N worker threads          │   │
│  │  queue<function<void()>> tasks_   ← Task queue                │   │
│  │  mutex queue_mutex_               ← Protects queue            │   │
│  │  condition_variable condition_    ← Signals workers           │   │
│  │  bool stop_                       ← Shutdown flag             │   │
│  └──────────────────────────────────────────────────────────────┘   │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘

4.2 Key Components

Component	Responsibility
Worker Loop	Continuously dequeue and execute tasks
Task Queue	Thread-safe FIFO storage for pending tasks
Submit Function	Wrap callable in packaged_task, enqueue, return future
Shutdown Logic	Signal workers, join threads, drain queue

4.3 Data Structures

Task Queue: std::queue<std::function<void()>>

Why std::function<void()>? Type erasure allows storing any callable
Why void()? We wrap actual callables in lambdas that capture packaged_task

Worker Vector: std::vector<std::thread>

Fixed size after construction
Each thread runs the same worker loop function

4.4 Algorithm Overview

Worker Loop Algorithm:

while true:
    acquire lock
    wait for (not empty OR stop flag)
    if stop AND empty:
        return  // Exit thread
    task = dequeue front
    release lock
    execute task  // Outside lock!

Submit Algorithm:

create packaged_task from callable
get future from packaged_task
acquire lock
if stopped:
    throw "pool is stopped"
enqueue lambda that invokes packaged_task
release lock
notify one worker
return future

5. Implementation Guide

5.1 Development Environment Setup

Required:

C++17 or later (for std::invoke_result_t, structured bindings)
Compiler with threading support: GCC 7+, Clang 5+, MSVC 2017+

Recommended tools:

# Install ThreadSanitizer-capable compiler
sudo apt install g++-11

# Compile with ThreadSanitizer
g++ -std=c++17 -pthread -fsanitize=thread -g main.cpp -o main

# Or use AddressSanitizer for memory issues
g++ -std=c++17 -pthread -fsanitize=address -g main.cpp -o main

5.2 Project Structure

thread_pool/
├── include/
│   └── thread_pool.hpp      # Header-only implementation
├── examples/
│   ├── basic_usage.cpp      # Simple usage examples
│   └── benchmark.cpp        # Performance testing
├── tests/
│   ├── test_submit.cpp      # Task submission tests
│   ├── test_shutdown.cpp    # Graceful shutdown tests
│   └── test_stress.cpp      # Concurrent stress tests
├── CMakeLists.txt
└── README.md

5.3 The Core Question You’re Answering

“How do I execute work asynchronously while controlling the number of threads and getting results back?”

This question underlies most concurrent programming. By building a thread pool, you learn the fundamental patterns that power async/await, parallel algorithms, and distributed task systems.

5.4 Concepts You Must Understand First

Before implementing, verify you can answer these:

What’s the difference between std::mutex and std::recursive_mutex?
- Reference: “C++ Concurrency in Action” Chapter 3
Why must you use std::unique_lock (not std::lock_guard) with condition_variable?
- Because wait() needs to unlock the mutex while sleeping
What is a spurious wakeup and how do you handle it?
- Reference: “C++ Concurrency in Action” Section 4.1.1
What does std::packaged_task do and how does it relate to std::future?
- It wraps a callable and provides a future for its result
What is perfect forwarding and why is it needed for submit()?
- Reference: “Effective Modern C++” Items 23-25

5.5 Questions to Guide Your Design

Architecture Questions:

Should the pool be copyable? Movable?
What happens if submit() is called after destruction begins?
Should workers have priority? Affinity to specific cores?

Interface Questions:

Should submit() take arguments separately or require pre-binding?
Should there be a shutdown() method separate from destructor?
How do you handle exceptions thrown by tasks?

Performance Questions:

Is std::function overhead acceptable? Alternatives?
Lock-free queue vs mutex-protected queue tradeoffs?
How to minimize contention when many threads submit simultaneously?

5.6 Thinking Exercise

Before coding, trace through this scenario on paper:

Initial state: Pool with 2 workers, empty queue, stop=false

Timeline:
T=0:  Main thread calls submit(task_A)
T=1:  Worker 1 wakes up
T=2:  Main thread calls submit(task_B)
T=3:  Main thread calls submit(task_C)
T=4:  Worker 1 starts executing task_A
T=5:  Worker 2 wakes up
T=6:  Worker 2 pops and starts task_B
T=7:  Worker 1 finishes task_A, pops task_C
T=8:  Pool destructor called while task_C running

Questions to answer:

At T=5, how does Worker 2 know there’s work?
At T=8, what prevents destructor from completing before task_C?
If task_B throws an exception at T=6, what happens to Worker 2?

5.7 Hints in Layers

Hint 1 - Starting Point (Conceptual): Think about the pool as two halves: the “producer side” (submit) and the “consumer side” (workers). They communicate through a shared queue protected by a mutex. The condition variable is the “doorbell” that wakes sleeping workers.

Hint 2 - Next Level (More Specific): The worker loop should:

Acquire the mutex
Wait on condition variable with a predicate
Check if stopping AND queue empty (if so, return)
Pop a task (while holding mutex)
Release mutex
Execute task (without mutex!)
Loop back to step 1

Hint 3 - Technical Details (Approach): For submit(), you need to:

Create a std::packaged_task<ReturnType()> by binding arguments
Get the future before moving the packaged_task
Wrap the packaged_task in a std::function<void()> (tricky: packaged_task is move-only)
Solution: Use std::make_shared<std::packaged_task<...>> and capture by shared_ptr

template<class F, class... Args>
auto submit(F&& f, Args&&... args)
    -> std::future<std::invoke_result_t<F, Args...>>
{
    using return_type = std::invoke_result_t<F, Args...>;

    auto task = std::make_shared<std::packaged_task<return_type()>>(
        std::bind(std::forward<F>(f), std::forward<Args>(args)...)
    );

    std::future<return_type> result = task->get_future();

    {
        std::unique_lock<std::mutex> lock(queue_mutex_);
        if (stop_)
            throw std::runtime_error("submit on stopped pool");
        tasks_.emplace([task]() { (*task)(); });
    }
    condition_.notify_one();

    return result;
}

Hint 4 - Tools and Debugging:

Use ThreadSanitizer to detect races: g++ -fsanitize=thread
Print thread IDs for debugging: std::this_thread::get_id()
Log queue sizes: helps identify if work distribution is fair
Use std::atomic<int> counters to track active tasks

5.8 The Interview Questions They’ll Ask

“Why use notify_one() instead of notify_all() in submit?”
- Expected answer: Only one task was added, so waking one worker is sufficient. notify_all() causes thundering herd problem.
“What happens if a task throws an exception?”
- The exception is captured in the promise and rethrown when future.get() is called. The worker thread continues to the next task.
“How would you implement task cancellation?”
- Use std::stop_token (C++20) or a custom cancellation token passed to tasks.
“How would you implement task priorities?”
- Replace std::queue with std::priority_queue. Need custom comparator.
“Why wrap packaged_task in shared_ptr instead of moving directly?”
- std::function requires copyable callable. packaged_task is move-only. Shared_ptr makes it copyable.
“What’s the difference between this and std::async?”
- std::async may create a new thread per call. Thread pool reuses threads, controlling parallelism.

5.9 Books That Will Help

Topic	Book & Chapter
Thread basics	“C++ Concurrency in Action” Ch. 2 - Anthony Williams
Mutex and locks	“C++ Concurrency in Action” Ch. 3 - Anthony Williams
Condition variables	“C++ Concurrency in Action” Ch. 4.1 - Anthony Williams
Futures and promises	“C++ Concurrency in Action” Ch. 4.2 - Anthony Williams
Thread pool design	“C++ Concurrency in Action” Ch. 9 - Anthony Williams
Perfect forwarding	“Effective Modern C++” Items 23-30 - Scott Meyers

5.10 Implementation Phases

Phase 1: Basic Pool (Day 1-3)

Implement constructor that creates worker threads
Implement worker loop that sleeps when queue empty
Implement submit() for void-returning tasks only
Test that tasks execute

Phase 2: Futures Support (Day 4-6)

Add packaged_task wrapping
Return futures from submit()
Handle tasks with return values
Test that future.get() returns correct results

Phase 3: Graceful Shutdown (Day 7-9)

Implement stop flag logic
Ensure destructor waits for pending tasks
Handle submit-during-shutdown
Test with ThreadSanitizer

Phase 4: Polish (Day 10-14)

Add exception safety
Benchmark performance
Write comprehensive tests
Document the interface

5.11 Key Implementation Decisions

Decision	Recommended Choice	Reasoning
Task storage	`std::function<void()>`	Simple, flexible, good enough for most uses
Queue type	`std::queue`	FIFO is intuitive, sufficient for general use
Lock type	`std::mutex`	Don’t need recursion, simpler than alternatives
Number of workers	Hardware concurrency	Matches CPU cores, avoids over-subscription

6. Testing Strategy

Unit Tests:

TEST(ThreadPool, ExecutesSingleTask) {
    ThreadPool pool(1);
    auto f = pool.submit([]{ return 42; });
    EXPECT_EQ(f.get(), 42);
}

TEST(ThreadPool, HandlesExceptions) {
    ThreadPool pool(1);
    auto f = pool.submit([]{ throw std::runtime_error("oops"); return 0; });
    EXPECT_THROW(f.get(), std::runtime_error);
}

TEST(ThreadPool, ConcurrentSubmissions) {
    ThreadPool pool(4);
    std::vector<std::future<int>> futures;
    for (int i = 0; i < 1000; i++) {
        futures.push_back(pool.submit([i]{ return i; }));
    }
    for (int i = 0; i < 1000; i++) {
        EXPECT_EQ(futures[i].get(), i);
    }
}

Stress Tests:

Submit 100,000 tasks concurrently from 10 threads
Run with ThreadSanitizer
Verify no tasks lost, no crashes

7. Common Pitfalls & Debugging

Problem	Symptom	Root Cause	Fix
Deadlock on shutdown	Program hangs	Workers waiting on condition, never notified	Call `notify_all()` after setting stop flag
Race condition	Sporadic wrong results	Accessing shared data without mutex	Audit all shared state access
Spurious wakeup issues	Tasks not executing	Not using predicate in wait()	Always use `wait(lock, predicate)`
Memory leak	Growing memory	Futures never consumed	Ensure all futures are `.get()`ed or destroyed
Exception crashes pool	One bad task kills workers	Exception escapes worker loop	Catch all exceptions inside worker loop

8. Extensions & Challenges

Work Stealing: Implement per-worker queues with stealing for better cache locality
Dynamic Sizing: Add/remove workers based on load
Task Dependencies: Allow specifying that task B depends on task A completing
Priority Queue: High-priority tasks execute before low-priority
Timeouts: submit_with_timeout() that fails if not completed in time
Metrics: Track queue depth, wait times, task durations

9. Real-World Connections

Boost.Asio io_context: Uses thread pool for async I/O completion
TBB task_arena: Intel’s thread pool with work stealing
std::execution (C++23): Standard parallel algorithms use similar pools
Web server request handling: Every major web server uses thread pools

10. Resources

11. Self-Assessment Checklist

Before considering this project complete, verify:

Pool executes tasks submitted by single thread
Pool executes tasks submitted by multiple threads concurrently
submit() returns working std::future
Tasks with different return types work
Exception in task doesn’t crash pool
Destructor waits for all pending tasks
No deadlocks detected (tested with ThreadSanitizer)
No memory leaks (tested with Valgrind)
Benchmark shows near-linear speedup

12. Submission / Completion Criteria

Your implementation is complete when:

All tests pass: Unit tests, stress tests, sanitizer tests
Documentation complete: Header comments, usage examples
Performance verified: Benchmark shows expected speedup
Code reviewed: No obvious race conditions or resource leaks
Can explain: You can whiteboard the design in an interview