Project 5: Thread Pool Library

Build a production-quality thread pool with futures, task queuing, and graceful shutdown to master C++ concurrency primitives.

Quick Reference

Attribute Value
Difficulty Advanced
Time Estimate 2-3 weeks
Language C++
Prerequisites std::thread basics, mutex/lock understanding, move semantics
Key Topics std::thread, mutex, condition_variable, futures, packaged_task, producer-consumer

1. Learning Objectives

By completing this project, you will:

  • Master thread lifecycle management: Create, manage, and properly join worker threads
  • Understand synchronization primitives: Use mutex, lock_guard, unique_lock, and condition_variable correctly
  • Implement the producer-consumer pattern: Build a thread-safe task queue with proper signaling
  • Work with futures and promises: Return results from async tasks using std::future and std::packaged_task
  • Handle graceful shutdown: Stop threads safely without losing pending tasks or causing deadlocks
  • Apply perfect forwarding: Use variadic templates and std::forward for flexible task submission
  • Avoid common concurrency bugs: Prevent race conditions, deadlocks, and spurious wakeups

2. Theoretical Foundation

2.1 Core Concepts

Thread Pools solve a fundamental problem: Creating and destroying threads is expensive (typically 10-100 microseconds per thread). A thread pool maintains a set of worker threads that wait for and execute tasks, amortizing creation cost across many tasks.

Traditional Approach (expensive):
┌─────────────────────────────────────────────────────────────┐
│  Task 1 arrives → Create Thread → Execute → Destroy Thread  │
│  Task 2 arrives → Create Thread → Execute → Destroy Thread  │
│  Task 3 arrives → Create Thread → Execute → Destroy Thread  │
│  ...                                                         │
│  Overhead: N thread creations + N destructions               │
└─────────────────────────────────────────────────────────────┘

Thread Pool Approach (efficient):
┌─────────────────────────────────────────────────────────────┐
│  Create N workers once                                       │
│  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐         │
│  │Worker 1 │  │Worker 2 │  │Worker 3 │  │Worker 4 │         │
│  └────┬────┘  └────┬────┘  └────┬────┘  └────┬────┘         │
│       │            │            │            │               │
│       └────────────┴─────┬──────┴────────────┘               │
│                          │                                   │
│                    ┌─────▼─────┐                             │
│                    │Task Queue │ ← Tasks submitted here      │
│                    └───────────┘                             │
│  Overhead: N thread creations (once) + 0 destructions        │
└─────────────────────────────────────────────────────────────┘

The Producer-Consumer Pattern is the heart of a thread pool:

                    ┌─────────────────────────────────────┐
                    │           SHARED QUEUE              │
                    │  ┌───┬───┬───┬───┬───┬───┬───┐     │
 Producer           │  │T1 │T2 │T3 │T4 │   │   │   │     │    Consumer
 (main thread)      │  └───┴───┴───┴───┴───┴───┴───┘     │    (worker threads)
      │             │       ↑                    ↑        │         │
      │ submit()    │       │                    │        │  wait() │
      └─────────────┼───────┘                    └────────┼─────────┘
                    │                                     │
                    │  MUTEX protects queue access        │
                    │  CONDITION_VARIABLE signals work    │
                    └─────────────────────────────────────┘

Futures and Promises enable returning results:

┌──────────────┐     ┌─────────────────┐     ┌──────────────┐
│    Caller    │     │  packaged_task  │     │   Worker     │
│              │     │                 │     │              │
│  get_future()├────►│ ┌─────────────┐ │     │              │
│              │     │ │   promise   │ │     │              │
│              │     │ └──────┬──────┘ │     │              │
│              │     │        │        │     │  operator()  │
│              │     │        ▼        │◄────┤  (execute)   │
│  future.get()│◄────┤ ┌─────────────┐ │     │              │
│  (blocks)    │     │ │   future    │ │     │              │
└──────────────┘     │ └─────────────┘ │     └──────────────┘
                     └─────────────────┘

2.2 Why This Matters

Thread pools are ubiquitous in production systems:

  • Web servers: Nginx, Apache use thread pools for request handling
  • Databases: PostgreSQL, MySQL use thread pools for query execution
  • Game engines: Unity, Unreal use job systems (advanced thread pools)
  • Async frameworks: Boost.Asio, libuv use thread pools for I/O completion

Understanding thread pools teaches you:

  • How to think about concurrent execution
  • The cost/benefit tradeoffs of parallelism
  • How to design thread-safe interfaces
  • How async/await patterns work under the hood

2.3 Historical Context

Thread pools emerged from the need to handle many concurrent connections in early web servers (1990s). The “thread-per-connection” model didn’t scale beyond hundreds of clients due to:

  1. Memory overhead: Each thread requires its own stack (typically 1-8 MB)
  2. Context switching cost: OS scheduler overhead increases with thread count
  3. Creation latency: Thread creation involves kernel calls and memory allocation

The Apache prefork model and later worker MPM pioneered thread pool usage in web servers. Today, nearly every high-performance server uses some form of thread pooling.

2.4 Common Misconceptions

Misconception 1: “More threads = faster execution” Reality: Beyond the number of CPU cores, adding threads increases contention and context switching. A thread pool with std::thread::hardware_concurrency() threads is often optimal.

Misconception 2: “condition_variable::notify_one() always wakes exactly one thread” Reality: Spurious wakeups can occur. Always use a predicate with wait():

// WRONG: Can wake up without work available
cv.wait(lock);

// CORRECT: Re-checks condition after wakeup
cv.wait(lock, []{ return !queue.empty() || stop; });

Misconception 3: “std::function<void()> is the only way to store tasks” Reality: std::function has overhead (type erasure, potential heap allocation). For performance-critical pools, consider std::move_only_function (C++23) or custom type-erased wrappers.


3. Project Specification

3.1 What You Will Build

A thread pool library that:

  • Manages a fixed number of worker threads
  • Accepts callable tasks with arbitrary return types
  • Returns std::future for each submitted task
  • Supports graceful shutdown (complete pending tasks)
  • Is thread-safe for concurrent task submission

3.2 Functional Requirements

Requirement Description
FR-1 Constructor takes number of worker threads
FR-2 submit() accepts any callable with any return type
FR-3 submit() returns std::future<ReturnType>
FR-4 Multiple threads can submit tasks concurrently
FR-5 Destructor waits for all pending tasks to complete
FR-6 Workers sleep when queue is empty (no busy-waiting)
FR-7 Tasks execute in approximate FIFO order

3.3 Non-Functional Requirements

Requirement Description
NFR-1 Submission latency < 10 microseconds for simple lambdas
NFR-2 No memory leaks (verified with Valgrind/ASan)
NFR-3 No deadlocks under any submission pattern
NFR-4 Exception in task doesn’t crash pool
NFR-5 Header-only implementation preferred

3.4 Example Usage / Output

#include "thread_pool.hpp"
#include <iostream>
#include <vector>
#include <chrono>

int main() {
    // Create pool with 4 worker threads
    ThreadPool pool(4);

    // Submit a simple lambda
    auto future1 = pool.submit([]() {
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
        return 42;
    });

    // Submit with arguments
    auto future2 = pool.submit([](int a, int b) {
        return a + b;
    }, 10, 20);

    // Submit many tasks
    std::vector<std::future<int>> results;
    for (int i = 0; i < 1000; i++) {
        results.push_back(pool.submit([i]() {
            return i * i;
        }));
    }

    // Get results (blocks until ready)
    std::cout << "future1: " << future1.get() << std::endl;  // 42
    std::cout << "future2: " << future2.get() << std::endl;  // 30

    // Collect all results
    int sum = 0;
    for (auto& f : results) {
        sum += f.get();
    }
    std::cout << "Sum of squares 0-999: " << sum << std::endl;

    // Pool destructor waits for all tasks
    return 0;
}

Expected Output:

future1: 42
future2: 30
Sum of squares 0-999: 332833500

3.5 Real World Outcome

When you run your thread pool with a benchmark:

$ ./thread_pool_bench
Thread Pool Benchmark
=====================
Pool size: 8 threads
Task count: 100,000

Sequential execution: 2.34 seconds
Thread pool execution: 0.31 seconds
Speedup: 7.5x

Task submission latency:
  Min: 0.8 us
  Max: 45.2 us
  Avg: 2.1 us
  P99: 8.3 us

Memory usage: 2.1 MB (stable, no growth)
No data races detected (ThreadSanitizer clean)

4. Solution Architecture

4.1 High-Level Design

┌─────────────────────────────────────────────────────────────────────┐
│                         ThreadPool Class                             │
│                                                                      │
│  ┌──────────────────────────────────────────────────────────────┐   │
│  │                      Public Interface                         │   │
│  │  ThreadPool(size_t num_threads)                               │   │
│  │  ~ThreadPool()                                                │   │
│  │  template<class F, class... Args>                             │   │
│  │  auto submit(F&& f, Args&&... args) -> future<result_of<F>>   │   │
│  └──────────────────────────────────────────────────────────────┘   │
│                                │                                     │
│                                ▼                                     │
│  ┌──────────────────────────────────────────────────────────────┐   │
│  │                      Private Members                          │   │
│  │                                                               │   │
│  │  vector<thread> workers_          ← N worker threads          │   │
│  │  queue<function<void()>> tasks_   ← Task queue                │   │
│  │  mutex queue_mutex_               ← Protects queue            │   │
│  │  condition_variable condition_    ← Signals workers           │   │
│  │  bool stop_                       ← Shutdown flag             │   │
│  └──────────────────────────────────────────────────────────────┘   │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘

4.2 Key Components

Component Responsibility
Worker Loop Continuously dequeue and execute tasks
Task Queue Thread-safe FIFO storage for pending tasks
Submit Function Wrap callable in packaged_task, enqueue, return future
Shutdown Logic Signal workers, join threads, drain queue

4.3 Data Structures

Task Queue: std::queue<std::function<void()>>

  • Why std::function<void()>? Type erasure allows storing any callable
  • Why void()? We wrap actual callables in lambdas that capture packaged_task

Worker Vector: std::vector<std::thread>

  • Fixed size after construction
  • Each thread runs the same worker loop function

4.4 Algorithm Overview

Worker Loop Algorithm:

while true:
    acquire lock
    wait for (not empty OR stop flag)
    if stop AND empty:
        return  // Exit thread
    task = dequeue front
    release lock
    execute task  // Outside lock!

Submit Algorithm:

create packaged_task from callable
get future from packaged_task
acquire lock
if stopped:
    throw "pool is stopped"
enqueue lambda that invokes packaged_task
release lock
notify one worker
return future

5. Implementation Guide

5.1 Development Environment Setup

Required:

  • C++17 or later (for std::invoke_result_t, structured bindings)
  • Compiler with threading support: GCC 7+, Clang 5+, MSVC 2017+

Recommended tools:

# Install ThreadSanitizer-capable compiler
sudo apt install g++-11

# Compile with ThreadSanitizer
g++ -std=c++17 -pthread -fsanitize=thread -g main.cpp -o main

# Or use AddressSanitizer for memory issues
g++ -std=c++17 -pthread -fsanitize=address -g main.cpp -o main

5.2 Project Structure

thread_pool/
├── include/
│   └── thread_pool.hpp      # Header-only implementation
├── examples/
│   ├── basic_usage.cpp      # Simple usage examples
│   └── benchmark.cpp        # Performance testing
├── tests/
│   ├── test_submit.cpp      # Task submission tests
│   ├── test_shutdown.cpp    # Graceful shutdown tests
│   └── test_stress.cpp      # Concurrent stress tests
├── CMakeLists.txt
└── README.md

5.3 The Core Question You’re Answering

“How do I execute work asynchronously while controlling the number of threads and getting results back?”

This question underlies most concurrent programming. By building a thread pool, you learn the fundamental patterns that power async/await, parallel algorithms, and distributed task systems.

5.4 Concepts You Must Understand First

Before implementing, verify you can answer these:

  1. What’s the difference between std::mutex and std::recursive_mutex?
    • Reference: “C++ Concurrency in Action” Chapter 3
  2. Why must you use std::unique_lock (not std::lock_guard) with condition_variable?
    • Because wait() needs to unlock the mutex while sleeping
  3. What is a spurious wakeup and how do you handle it?
    • Reference: “C++ Concurrency in Action” Section 4.1.1
  4. What does std::packaged_task do and how does it relate to std::future?
    • It wraps a callable and provides a future for its result
  5. What is perfect forwarding and why is it needed for submit()?
    • Reference: “Effective Modern C++” Items 23-25

5.5 Questions to Guide Your Design

Architecture Questions:

  • Should the pool be copyable? Movable?
  • What happens if submit() is called after destruction begins?
  • Should workers have priority? Affinity to specific cores?

Interface Questions:

  • Should submit() take arguments separately or require pre-binding?
  • Should there be a shutdown() method separate from destructor?
  • How do you handle exceptions thrown by tasks?

Performance Questions:

  • Is std::function overhead acceptable? Alternatives?
  • Lock-free queue vs mutex-protected queue tradeoffs?
  • How to minimize contention when many threads submit simultaneously?

5.6 Thinking Exercise

Before coding, trace through this scenario on paper:

Initial state: Pool with 2 workers, empty queue, stop=false

Timeline:
T=0:  Main thread calls submit(task_A)
T=1:  Worker 1 wakes up
T=2:  Main thread calls submit(task_B)
T=3:  Main thread calls submit(task_C)
T=4:  Worker 1 starts executing task_A
T=5:  Worker 2 wakes up
T=6:  Worker 2 pops and starts task_B
T=7:  Worker 1 finishes task_A, pops task_C
T=8:  Pool destructor called while task_C running

Questions to answer:

  1. At T=5, how does Worker 2 know there’s work?
  2. At T=8, what prevents destructor from completing before task_C?
  3. If task_B throws an exception at T=6, what happens to Worker 2?

5.7 Hints in Layers

Hint 1 - Starting Point (Conceptual): Think about the pool as two halves: the “producer side” (submit) and the “consumer side” (workers). They communicate through a shared queue protected by a mutex. The condition variable is the “doorbell” that wakes sleeping workers.

Hint 2 - Next Level (More Specific): The worker loop should:

  1. Acquire the mutex
  2. Wait on condition variable with a predicate
  3. Check if stopping AND queue empty (if so, return)
  4. Pop a task (while holding mutex)
  5. Release mutex
  6. Execute task (without mutex!)
  7. Loop back to step 1

Hint 3 - Technical Details (Approach): For submit(), you need to:

  1. Create a std::packaged_task<ReturnType()> by binding arguments
  2. Get the future before moving the packaged_task
  3. Wrap the packaged_task in a std::function<void()> (tricky: packaged_task is move-only)
  4. Solution: Use std::make_shared<std::packaged_task<...>> and capture by shared_ptr
template<class F, class... Args>
auto submit(F&& f, Args&&... args)
    -> std::future<std::invoke_result_t<F, Args...>>
{
    using return_type = std::invoke_result_t<F, Args...>;

    auto task = std::make_shared<std::packaged_task<return_type()>>(
        std::bind(std::forward<F>(f), std::forward<Args>(args)...)
    );

    std::future<return_type> result = task->get_future();

    {
        std::unique_lock<std::mutex> lock(queue_mutex_);
        if (stop_)
            throw std::runtime_error("submit on stopped pool");
        tasks_.emplace([task]() { (*task)(); });
    }
    condition_.notify_one();

    return result;
}

Hint 4 - Tools and Debugging:

  • Use ThreadSanitizer to detect races: g++ -fsanitize=thread
  • Print thread IDs for debugging: std::this_thread::get_id()
  • Log queue sizes: helps identify if work distribution is fair
  • Use std::atomic<int> counters to track active tasks

5.8 The Interview Questions They’ll Ask

  1. “Why use notify_one() instead of notify_all() in submit?”
    • Expected answer: Only one task was added, so waking one worker is sufficient. notify_all() causes thundering herd problem.
  2. “What happens if a task throws an exception?”
    • The exception is captured in the promise and rethrown when future.get() is called. The worker thread continues to the next task.
  3. “How would you implement task cancellation?”
    • Use std::stop_token (C++20) or a custom cancellation token passed to tasks.
  4. “How would you implement task priorities?”
    • Replace std::queue with std::priority_queue. Need custom comparator.
  5. “Why wrap packaged_task in shared_ptr instead of moving directly?”
    • std::function requires copyable callable. packaged_task is move-only. Shared_ptr makes it copyable.
  6. “What’s the difference between this and std::async?”
    • std::async may create a new thread per call. Thread pool reuses threads, controlling parallelism.

5.9 Books That Will Help

Topic Book & Chapter
Thread basics “C++ Concurrency in Action” Ch. 2 - Anthony Williams
Mutex and locks “C++ Concurrency in Action” Ch. 3 - Anthony Williams
Condition variables “C++ Concurrency in Action” Ch. 4.1 - Anthony Williams
Futures and promises “C++ Concurrency in Action” Ch. 4.2 - Anthony Williams
Thread pool design “C++ Concurrency in Action” Ch. 9 - Anthony Williams
Perfect forwarding “Effective Modern C++” Items 23-30 - Scott Meyers

5.10 Implementation Phases

Phase 1: Basic Pool (Day 1-3)

  • Implement constructor that creates worker threads
  • Implement worker loop that sleeps when queue empty
  • Implement submit() for void-returning tasks only
  • Test that tasks execute

Phase 2: Futures Support (Day 4-6)

  • Add packaged_task wrapping
  • Return futures from submit()
  • Handle tasks with return values
  • Test that future.get() returns correct results

Phase 3: Graceful Shutdown (Day 7-9)

  • Implement stop flag logic
  • Ensure destructor waits for pending tasks
  • Handle submit-during-shutdown
  • Test with ThreadSanitizer

Phase 4: Polish (Day 10-14)

  • Add exception safety
  • Benchmark performance
  • Write comprehensive tests
  • Document the interface

5.11 Key Implementation Decisions

Decision Recommended Choice Reasoning
Task storage std::function<void()> Simple, flexible, good enough for most uses
Queue type std::queue FIFO is intuitive, sufficient for general use
Lock type std::mutex Don’t need recursion, simpler than alternatives
Number of workers Hardware concurrency Matches CPU cores, avoids over-subscription

6. Testing Strategy

Unit Tests:

TEST(ThreadPool, ExecutesSingleTask) {
    ThreadPool pool(1);
    auto f = pool.submit([]{ return 42; });
    EXPECT_EQ(f.get(), 42);
}

TEST(ThreadPool, HandlesExceptions) {
    ThreadPool pool(1);
    auto f = pool.submit([]{ throw std::runtime_error("oops"); return 0; });
    EXPECT_THROW(f.get(), std::runtime_error);
}

TEST(ThreadPool, ConcurrentSubmissions) {
    ThreadPool pool(4);
    std::vector<std::future<int>> futures;
    for (int i = 0; i < 1000; i++) {
        futures.push_back(pool.submit([i]{ return i; }));
    }
    for (int i = 0; i < 1000; i++) {
        EXPECT_EQ(futures[i].get(), i);
    }
}

Stress Tests:

  • Submit 100,000 tasks concurrently from 10 threads
  • Run with ThreadSanitizer
  • Verify no tasks lost, no crashes

7. Common Pitfalls & Debugging

Problem Symptom Root Cause Fix
Deadlock on shutdown Program hangs Workers waiting on condition, never notified Call notify_all() after setting stop flag
Race condition Sporadic wrong results Accessing shared data without mutex Audit all shared state access
Spurious wakeup issues Tasks not executing Not using predicate in wait() Always use wait(lock, predicate)
Memory leak Growing memory Futures never consumed Ensure all futures are .get()ed or destroyed
Exception crashes pool One bad task kills workers Exception escapes worker loop Catch all exceptions inside worker loop

8. Extensions & Challenges

  1. Work Stealing: Implement per-worker queues with stealing for better cache locality
  2. Dynamic Sizing: Add/remove workers based on load
  3. Task Dependencies: Allow specifying that task B depends on task A completing
  4. Priority Queue: High-priority tasks execute before low-priority
  5. Timeouts: submit_with_timeout() that fails if not completed in time
  6. Metrics: Track queue depth, wait times, task durations

9. Real-World Connections

  • Boost.Asio io_context: Uses thread pool for async I/O completion
  • TBB task_arena: Intel’s thread pool with work stealing
  • std::execution (C++23): Standard parallel algorithms use similar pools
  • Web server request handling: Every major web server uses thread pools

10. Resources


11. Self-Assessment Checklist

Before considering this project complete, verify:

  • Pool executes tasks submitted by single thread
  • Pool executes tasks submitted by multiple threads concurrently
  • submit() returns working std::future
  • Tasks with different return types work
  • Exception in task doesn’t crash pool
  • Destructor waits for all pending tasks
  • No deadlocks detected (tested with ThreadSanitizer)
  • No memory leaks (tested with Valgrind)
  • Benchmark shows near-linear speedup

12. Submission / Completion Criteria

Your implementation is complete when:

  1. All tests pass: Unit tests, stress tests, sanitizer tests
  2. Documentation complete: Header comments, usage examples
  3. Performance verified: Benchmark shows expected speedup
  4. Code reviewed: No obvious race conditions or resource leaks
  5. Can explain: You can whiteboard the design in an interview