Project 5: Thread Pool Library
Build a production-quality thread pool with futures, task queuing, and graceful shutdown to master C++ concurrency primitives.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Advanced |
| Time Estimate | 2-3 weeks |
| Language | C++ |
| Prerequisites | std::thread basics, mutex/lock understanding, move semantics |
| Key Topics | std::thread, mutex, condition_variable, futures, packaged_task, producer-consumer |
1. Learning Objectives
By completing this project, you will:
- Master thread lifecycle management: Create, manage, and properly join worker threads
- Understand synchronization primitives: Use mutex, lock_guard, unique_lock, and condition_variable correctly
- Implement the producer-consumer pattern: Build a thread-safe task queue with proper signaling
- Work with futures and promises: Return results from async tasks using std::future and std::packaged_task
- Handle graceful shutdown: Stop threads safely without losing pending tasks or causing deadlocks
- Apply perfect forwarding: Use variadic templates and std::forward for flexible task submission
- Avoid common concurrency bugs: Prevent race conditions, deadlocks, and spurious wakeups
2. Theoretical Foundation
2.1 Core Concepts
Thread Pools solve a fundamental problem: Creating and destroying threads is expensive (typically 10-100 microseconds per thread). A thread pool maintains a set of worker threads that wait for and execute tasks, amortizing creation cost across many tasks.
Traditional Approach (expensive):
┌─────────────────────────────────────────────────────────────┐
│ Task 1 arrives → Create Thread → Execute → Destroy Thread │
│ Task 2 arrives → Create Thread → Execute → Destroy Thread │
│ Task 3 arrives → Create Thread → Execute → Destroy Thread │
│ ... │
│ Overhead: N thread creations + N destructions │
└─────────────────────────────────────────────────────────────┘
Thread Pool Approach (efficient):
┌─────────────────────────────────────────────────────────────┐
│ Create N workers once │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │Worker 1 │ │Worker 2 │ │Worker 3 │ │Worker 4 │ │
│ └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘ │
│ │ │ │ │ │
│ └────────────┴─────┬──────┴────────────┘ │
│ │ │
│ ┌─────▼─────┐ │
│ │Task Queue │ ← Tasks submitted here │
│ └───────────┘ │
│ Overhead: N thread creations (once) + 0 destructions │
└─────────────────────────────────────────────────────────────┘
The Producer-Consumer Pattern is the heart of a thread pool:
┌─────────────────────────────────────┐
│ SHARED QUEUE │
│ ┌───┬───┬───┬───┬───┬───┬───┐ │
Producer │ │T1 │T2 │T3 │T4 │ │ │ │ │ Consumer
(main thread) │ └───┴───┴───┴───┴───┴───┴───┘ │ (worker threads)
│ │ ↑ ↑ │ │
│ submit() │ │ │ │ wait() │
└─────────────┼───────┘ └────────┼─────────┘
│ │
│ MUTEX protects queue access │
│ CONDITION_VARIABLE signals work │
└─────────────────────────────────────┘
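To make the diagram concrete, here is a minimal standalone producer-consumer sketch (one producer, one consumer; the condition variable is the "doorbell" that wakes the sleeping consumer):

```cpp
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>

int main() {
    std::queue<int> q;
    std::mutex m;
    std::condition_variable cv;
    bool done = false;

    std::thread consumer([&] {
        while (true) {
            std::unique_lock<std::mutex> lock(m);
            cv.wait(lock, [&] { return !q.empty() || done; });  // predicate guards against spurious wakeups
            if (q.empty() && done) return;
            int item = q.front(); q.pop();
            lock.unlock();                                      // work happens outside the lock
            std::cout << "consumed " << item << '\n';
        }
    });

    for (int i = 0; i < 3; ++i) {                               // producer side
        { std::lock_guard<std::mutex> lock(m); q.push(i); }
        cv.notify_one();
    }
    { std::lock_guard<std::mutex> lock(m); done = true; }
    cv.notify_all();
    consumer.join();
}
```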
Futures and Promises enable returning results:
┌──────────────┐ ┌─────────────────┐ ┌──────────────┐
│ Caller │ │ packaged_task │ │ Worker │
│ │ │ │ │ │
│ get_future()├────►│ ┌─────────────┐ │ │ │
│ │ │ │ promise │ │ │ │
│ │ │ └──────┬──────┘ │ │ │
│ │ │ │ │ │ operator() │
│ │ │ ▼ │◄────┤ (execute) │
│ future.get()│◄────┤ ┌─────────────┐ │ │ │
│ (blocks) │ │ │ future │ │ │ │
└──────────────┘ │ └─────────────┘ │ └──────────────┘
└─────────────────┘
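In code, the same flow using `std::packaged_task` directly (a minimal standalone example):

```cpp
#include <future>
#include <iostream>

int main() {
    // Caller side: create the task and keep its future.
    std::packaged_task<int(int, int)> task([](int a, int b) { return a + b; });
    std::future<int> fut = task.get_future();

    // Worker side: invoking the task fulfills the internal promise.
    task(2, 3);

    std::cout << fut.get() << '\n';   // prints 5; would block if the result were not ready
}
```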
2.2 Why This Matters
Thread pools are ubiquitous in production systems:
- Web servers: Apache (worker/event MPM) handles requests on pooled threads; Nginx offloads blocking operations to a thread pool
- Databases: PostgreSQL, MySQL use thread pools for query execution
- Game engines: Unity, Unreal use job systems (advanced thread pools)
- Async frameworks: Boost.Asio, libuv use thread pools for I/O completion
Understanding thread pools teaches you:
- How to think about concurrent execution
- The cost/benefit tradeoffs of parallelism
- How to design thread-safe interfaces
- How async/await patterns work under the hood
2.3 Historical Context
Thread pools emerged from the need to handle many concurrent connections in early web servers (1990s). The “thread-per-connection” model didn’t scale beyond hundreds of clients due to:
- Memory overhead: Each thread requires its own stack (typically 1-8 MB)
- Context switching cost: OS scheduler overhead increases with thread count
- Creation latency: Thread creation involves kernel calls and memory allocation
Apache’s prefork MPM pooled whole processes; the later worker MPM brought true thread pooling to web servers. Today, nearly every high-performance server uses some form of thread pooling.
2.4 Common Misconceptions
Misconception 1: “More threads = faster execution”
Reality: Beyond the number of CPU cores, adding threads increases contention and context switching. A thread pool with std::thread::hardware_concurrency() threads is often optimal.
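A quick sizing sketch using this project's `ThreadPool` (note that `std::thread::hardware_concurrency()` may return 0 when the count cannot be determined, so a fallback is needed):

```cpp
#include <thread>
#include "thread_pool.hpp"   // this project's ThreadPool

int main() {
    unsigned n = std::thread::hardware_concurrency(); // may return 0 if undetectable
    ThreadPool pool(n != 0 ? n : 4);                  // fall back to a small default
}
```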
Misconception 2: “condition_variable::notify_one() always wakes exactly one thread”
Reality: Spurious wakeups can occur. Always use a predicate with wait():
// WRONG: Can wake up without work available
cv.wait(lock);
// CORRECT: Re-checks condition after wakeup
cv.wait(lock, []{ return !queue.empty() || stop; });
Misconception 3: “std::function<void()> is the only way to store tasks”
Reality: std::function has overhead (type erasure, potential heap allocation). For performance-critical pools, consider std::move_only_function (C++23) or custom type-erased wrappers.
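For example, a C++23 queue can store the move-only `std::packaged_task` directly, with no shared_ptr workaround (a sketch; requires a C++23 standard library):

```cpp
#include <functional>   // std::move_only_function (C++23)
#include <future>
#include <queue>

int main() {
    std::queue<std::move_only_function<void()>> tasks;

    std::packaged_task<int()> pt([] { return 42; });
    auto fut = pt.get_future();

    tasks.push(std::move(pt));   // OK: the wrapper only requires movability
    tasks.front()();             // execute; fulfills the promise
    tasks.pop();
    return fut.get() == 42 ? 0 : 1;
}
```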
3. Project Specification
3.1 What You Will Build
A thread pool library that:
- Manages a fixed number of worker threads
- Accepts callable tasks with arbitrary return types
- Returns a `std::future` for each submitted task
- Supports graceful shutdown (complete pending tasks)
- Is thread-safe for concurrent task submission
3.2 Functional Requirements
| Requirement | Description |
|---|---|
| FR-1 | Constructor takes number of worker threads |
| FR-2 | submit() accepts any callable with any return type |
| FR-3 | submit() returns std::future<ReturnType> |
| FR-4 | Multiple threads can submit tasks concurrently |
| FR-5 | Destructor waits for all pending tasks to complete |
| FR-6 | Workers sleep when queue is empty (no busy-waiting) |
| FR-7 | Tasks execute in approximate FIFO order |
3.3 Non-Functional Requirements
| Requirement | Description |
|---|---|
| NFR-1 | Submission latency < 10 microseconds for simple lambdas |
| NFR-2 | No memory leaks (verified with Valgrind/ASan) |
| NFR-3 | No deadlocks under any submission pattern |
| NFR-4 | Exception in task doesn’t crash pool |
| NFR-5 | Header-only implementation preferred |
3.4 Example Usage / Output
#include "thread_pool.hpp"
#include <iostream>
#include <vector>
#include <chrono>
int main() {
// Create pool with 4 worker threads
ThreadPool pool(4);
// Submit a simple lambda
auto future1 = pool.submit([]() {
std::this_thread::sleep_for(std::chrono::milliseconds(100));
return 42;
});
// Submit with arguments
auto future2 = pool.submit([](int a, int b) {
return a + b;
}, 10, 20);
// Submit many tasks
std::vector<std::future<int>> results;
for (int i = 0; i < 1000; i++) {
results.push_back(pool.submit([i]() {
return i * i;
}));
}
// Get results (blocks until ready)
std::cout << "future1: " << future1.get() << std::endl; // 42
std::cout << "future2: " << future2.get() << std::endl; // 30
// Collect all results
int sum = 0;
for (auto& f : results) {
sum += f.get();
}
std::cout << "Sum of squares 0-999: " << sum << std::endl;
// Pool destructor waits for all tasks
return 0;
}
Expected Output:
future1: 42
future2: 30
Sum of squares 0-999: 332833500
3.5 Real World Outcome
When you run your thread pool with a benchmark:
$ ./thread_pool_bench
Thread Pool Benchmark
=====================
Pool size: 8 threads
Task count: 100,000
Sequential execution: 2.34 seconds
Thread pool execution: 0.31 seconds
Speedup: 7.5x
Task submission latency:
Min: 0.8 us
Max: 45.2 us
Avg: 2.1 us
P99: 8.3 us
Memory usage: 2.1 MB (stable, no growth)
No data races detected (ThreadSanitizer clean)
4. Solution Architecture
4.1 High-Level Design
┌─────────────────────────────────────────────────────────────────────┐
│ ThreadPool Class │
│ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ Public Interface │ │
│ │ ThreadPool(size_t num_threads) │ │
│ │ ~ThreadPool() │ │
│ │ template<class F, class... Args> │ │
│ │ auto submit(F&& f, Args&&... args) -> future<result_of<F>> │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ Private Members │ │
│ │ │ │
│ │ vector<thread> workers_ ← N worker threads │ │
│ │ queue<function<void()>> tasks_ ← Task queue │ │
│ │ mutex queue_mutex_ ← Protects queue │ │
│ │ condition_variable condition_ ← Signals workers │ │
│ │ bool stop_ ← Shutdown flag │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
4.2 Key Components
| Component | Responsibility |
|---|---|
| Worker Loop | Continuously dequeue and execute tasks |
| Task Queue | Thread-safe FIFO storage for pending tasks |
| Submit Function | Wrap callable in packaged_task, enqueue, return future |
| Shutdown Logic | Signal workers, join threads, drain queue |
4.3 Data Structures
Task Queue: std::queue<std::function<void()>>
- Why `std::function<void()>`? Type erasure allows storing any callable
- Why `void()`? We wrap the actual callables in lambdas that capture a packaged_task
Worker Vector: std::vector<std::thread>
- Fixed size after construction
- Each thread runs the same worker loop function
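Putting these together, the private state can be declared as follows (a sketch mirroring the member names from section 4.1):

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class ThreadPool {
public:
    explicit ThreadPool(std::size_t num_threads);
    ~ThreadPool();
private:
    std::vector<std::thread> workers_;         // N worker threads
    std::queue<std::function<void()>> tasks_;  // pending tasks, FIFO
    std::mutex queue_mutex_;                   // guards tasks_ and stop_
    std::condition_variable condition_;        // wakes idle workers
    bool stop_ = false;                        // set once during shutdown
};
```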
4.4 Algorithm Overview
Worker Loop Algorithm:
while true:
acquire lock
wait for (not empty OR stop flag)
if stop AND empty:
return // Exit thread
task = dequeue front
release lock
execute task // Outside lock!
Submit Algorithm:
create packaged_task from callable
get future from packaged_task
acquire lock
if stopped:
throw "pool is stopped"
enqueue lambda that invokes packaged_task
release lock
notify one worker
return future
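A minimal C++ rendering of the worker loop (a sketch assuming the members from section 4.1; the submit side is shown in section 5.7):

```cpp
ThreadPool::ThreadPool(std::size_t num_threads) {
    for (std::size_t i = 0; i < num_threads; ++i) {
        workers_.emplace_back([this] {
            while (true) {
                std::function<void()> task;
                {
                    std::unique_lock<std::mutex> lock(queue_mutex_);
                    // Sleep until there is work or shutdown begins.
                    condition_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
                    if (stop_ && tasks_.empty())
                        return;                       // exit this worker thread
                    task = std::move(tasks_.front());
                    tasks_.pop();
                }                                     // lock released here
                task();                               // execute outside the lock!
            }
        });
    }
}
```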
5. Implementation Guide
5.1 Development Environment Setup
Required:
- C++17 or later (for `std::invoke_result_t`, structured bindings)
- Compiler with threading support: GCC 7+, Clang 5+, MSVC 2017+
Recommended tools:
# Install ThreadSanitizer-capable compiler
sudo apt install g++-11
# Compile with ThreadSanitizer
g++ -std=c++17 -pthread -fsanitize=thread -g main.cpp -o main
# Or use AddressSanitizer for memory issues
g++ -std=c++17 -pthread -fsanitize=address -g main.cpp -o main
5.2 Project Structure
thread_pool/
├── include/
│ └── thread_pool.hpp # Header-only implementation
├── examples/
│ ├── basic_usage.cpp # Simple usage examples
│ └── benchmark.cpp # Performance testing
├── tests/
│ ├── test_submit.cpp # Task submission tests
│ ├── test_shutdown.cpp # Graceful shutdown tests
│ └── test_stress.cpp # Concurrent stress tests
├── CMakeLists.txt
└── README.md
5.3 The Core Question You’re Answering
“How do I execute work asynchronously while controlling the number of threads and getting results back?”
This question underlies most concurrent programming. By building a thread pool, you learn the fundamental patterns that power async/await, parallel algorithms, and distributed task systems.
5.4 Concepts You Must Understand First
Before implementing, verify you can answer these:
- What’s the difference between `std::mutex` and `std::recursive_mutex`?
  - Reference: “C++ Concurrency in Action” Chapter 3
- Why must you use `std::unique_lock` (not `std::lock_guard`) with a condition_variable?
  - Because `wait()` needs to unlock the mutex while sleeping
- What is a spurious wakeup and how do you handle it?
  - Reference: “C++ Concurrency in Action” Section 4.1.1
- What does `std::packaged_task` do and how does it relate to `std::future`?
  - It wraps a callable and provides a future for its result
- What is perfect forwarding and why is it needed for `submit()`?
  - Reference: “Effective Modern C++” Items 23-25
5.5 Questions to Guide Your Design
Architecture Questions:
- Should the pool be copyable? Movable?
- What happens if `submit()` is called after destruction begins?
- Should workers have priority? Affinity to specific cores?
Interface Questions:
- Should `submit()` take arguments separately or require pre-binding?
- Should there be a `shutdown()` method separate from the destructor?
- How do you handle exceptions thrown by tasks?
Performance Questions:
- Is `std::function` overhead acceptable? Alternatives?
- Lock-free queue vs. mutex-protected queue tradeoffs?
- How to minimize contention when many threads submit simultaneously?
5.6 Thinking Exercise
Before coding, trace through this scenario on paper:
Initial state: Pool with 2 workers, empty queue, stop=false
Timeline:
T=0: Main thread calls submit(task_A)
T=1: Worker 1 wakes up
T=2: Main thread calls submit(task_B)
T=3: Main thread calls submit(task_C)
T=4: Worker 1 starts executing task_A
T=5: Worker 2 wakes up
T=6: Worker 2 pops and starts task_B
T=7: Worker 1 finishes task_A, pops task_C
T=8: Pool destructor called while task_C running
Questions to answer:
- At T=5, how does Worker 2 know there’s work?
- At T=8, what prevents destructor from completing before task_C?
- If task_B throws an exception at T=6, what happens to Worker 2?
5.7 Hints in Layers
Hint 1 - Starting Point (Conceptual): Think about the pool as two halves: the “producer side” (submit) and the “consumer side” (workers). They communicate through a shared queue protected by a mutex. The condition variable is the “doorbell” that wakes sleeping workers.
Hint 2 - Next Level (More Specific): The worker loop should:
- Acquire the mutex
- Wait on condition variable with a predicate
- Check if stopping AND queue empty (if so, return)
- Pop a task (while holding mutex)
- Release mutex
- Execute task (without mutex!)
- Loop back to step 1
Hint 3 - Technical Details (Approach):
For submit(), you need to:
- Create a `std::packaged_task<ReturnType()>` by binding the arguments
- Get the future before moving the packaged_task
- Wrap the packaged_task in a `std::function<void()>` (tricky: packaged_task is move-only)
- Solution: use `std::make_shared<std::packaged_task<...>>` and capture the shared_ptr
template<class F, class... Args>
auto submit(F&& f, Args&&... args)
    -> std::future<std::invoke_result_t<F, Args...>>
{
    using return_type = std::invoke_result_t<F, Args...>;

    auto task = std::make_shared<std::packaged_task<return_type()>>(
        std::bind(std::forward<F>(f), std::forward<Args>(args)...)
    );
    std::future<return_type> result = task->get_future();
    {
        std::unique_lock<std::mutex> lock(queue_mutex_);
        if (stop_)
            throw std::runtime_error("submit on stopped pool");
        tasks_.emplace([task]() { (*task)(); });
    }
    condition_.notify_one();
    return result;
}
Hint 4 - Tools and Debugging:
- Use ThreadSanitizer to detect races: `g++ -fsanitize=thread`
- Print thread IDs for debugging: `std::this_thread::get_id()`
- Log queue sizes: helps identify whether work distribution is fair
- Use `std::atomic<int>` counters to track active tasks
5.8 The Interview Questions They’ll Ask
- “Why use `notify_one()` instead of `notify_all()` in submit?”
  - Expected answer: Only one task was added, so waking one worker is sufficient. `notify_all()` causes a thundering-herd problem.
- “What happens if a task throws an exception?”
  - The exception is captured in the promise and rethrown when `future.get()` is called. The worker thread continues to the next task.
- “How would you implement task cancellation?”
  - Use `std::stop_token` (C++20) or a custom cancellation token passed to tasks.
- “How would you implement task priorities?”
  - Replace `std::queue` with `std::priority_queue` and supply a custom comparator.
- “Why wrap the packaged_task in a shared_ptr instead of moving it directly?”
  - `std::function` requires a copyable callable; `packaged_task` is move-only. The shared_ptr makes the wrapping lambda copyable.
- “What’s the difference between this and `std::async`?”
  - `std::async` may create a new thread per call. A thread pool reuses threads, controlling parallelism.
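The exception-handling answer is easy to verify with `std::packaged_task` alone; a standalone demonstration:

```cpp
#include <future>
#include <iostream>
#include <stdexcept>

int main() {
    std::packaged_task<int()> task([]() -> int {
        throw std::runtime_error("oops");   // thrown inside the task
    });
    std::future<int> fut = task.get_future();

    task();        // the exception is stored in the shared state, not propagated here
    try {
        fut.get(); // ...and rethrown at the caller's get()
    } catch (const std::runtime_error& e) {
        std::cout << "caught: " << e.what() << '\n';
    }
}
```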
5.9 Books That Will Help
| Topic | Book & Chapter |
|---|---|
| Thread basics | “C++ Concurrency in Action” Ch. 2 - Anthony Williams |
| Mutex and locks | “C++ Concurrency in Action” Ch. 3 - Anthony Williams |
| Condition variables | “C++ Concurrency in Action” Ch. 4.1 - Anthony Williams |
| Futures and promises | “C++ Concurrency in Action” Ch. 4.2 - Anthony Williams |
| Thread pool design | “C++ Concurrency in Action” Ch. 9 - Anthony Williams |
| Perfect forwarding | “Effective Modern C++” Items 23-30 - Scott Meyers |
5.10 Implementation Phases
Phase 1: Basic Pool (Day 1-3)
- Implement constructor that creates worker threads
- Implement worker loop that sleeps when queue empty
- Implement submit() for void-returning tasks only
- Test that tasks execute
Phase 2: Futures Support (Day 4-6)
- Add packaged_task wrapping
- Return futures from submit()
- Handle tasks with return values
- Test that future.get() returns correct results
Phase 3: Graceful Shutdown (Day 7-9)
- Implement stop flag logic
- Ensure destructor waits for pending tasks
- Handle submit-during-shutdown
- Test with ThreadSanitizer
Phase 4: Polish (Day 10-14)
- Add exception safety
- Benchmark performance
- Write comprehensive tests
- Document the interface
5.11 Key Implementation Decisions
| Decision | Recommended Choice | Reasoning |
|---|---|---|
| Task storage | `std::function<void()>` | Simple, flexible, good enough for most uses |
| Queue type | `std::queue` | FIFO is intuitive, sufficient for general use |
| Lock type | `std::mutex` | Don’t need recursion, simpler than alternatives |
| Number of workers | Hardware concurrency | Matches CPU cores, avoids over-subscription |
6. Testing Strategy
Unit Tests:
TEST(ThreadPool, ExecutesSingleTask) {
ThreadPool pool(1);
auto f = pool.submit([]{ return 42; });
EXPECT_EQ(f.get(), 42);
}
TEST(ThreadPool, HandlesExceptions) {
ThreadPool pool(1);
auto f = pool.submit([]{ throw std::runtime_error("oops"); return 0; });
EXPECT_THROW(f.get(), std::runtime_error);
}
TEST(ThreadPool, ConcurrentSubmissions) {
ThreadPool pool(4);
std::vector<std::future<int>> futures;
for (int i = 0; i < 1000; i++) {
futures.push_back(pool.submit([i]{ return i; }));
}
for (int i = 0; i < 1000; i++) {
EXPECT_EQ(futures[i].get(), i);
}
}
Stress Tests:
- Submit 100,000 tasks concurrently from 10 threads
- Run with ThreadSanitizer
- Verify no tasks lost, no crashes
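A sketch of such a stress test in the same GoogleTest style (the counts mirror the description above; `<atomic>`, `<thread>`, and `<vector>` are assumed to be included):

```cpp
TEST(ThreadPool, StressConcurrentSubmitters) {
    std::atomic<int> done{0};
    {
        ThreadPool pool(8);
        std::vector<std::thread> submitters;
        for (int t = 0; t < 10; ++t) {
            submitters.emplace_back([&pool, &done] {
                for (int i = 0; i < 10000; ++i)
                    pool.submit([&done] { done.fetch_add(1); });
            });
        }
        for (auto& s : submitters)
            s.join();
    }   // pool destructor waits for all pending tasks (FR-5)
    EXPECT_EQ(done.load(), 100000);   // 10 threads x 10,000 tasks, none lost
}
```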
7. Common Pitfalls & Debugging
| Problem | Symptom | Root Cause | Fix |
|---|---|---|---|
| Deadlock on shutdown | Program hangs | Workers waiting on condition, never notified | Call notify_all() after setting stop flag |
| Race condition | Sporadic wrong results | Accessing shared data without mutex | Audit all shared state access |
| Spurious wakeup issues | Tasks not executing | Not using predicate in wait() | Always use wait(lock, predicate) |
| Memory leak | Growing memory | Futures never consumed | Ensure all futures are .get()ed or destroyed |
| Exception crashes pool | One bad task kills workers | Exception escapes worker loop | Catch all exceptions inside worker loop |
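The fix in the first row, written out (a sketch consistent with the worker loop in section 4.4):

```cpp
ThreadPool::~ThreadPool() {
    {
        std::unique_lock<std::mutex> lock(queue_mutex_);
        stop_ = true;                 // set the flag under the lock...
    }
    condition_.notify_all();          // ...then wake every worker so all observe it
    for (std::thread& w : workers_)
        w.join();                     // workers finish remaining tasks, then exit
}
```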
8. Extensions & Challenges
- Work Stealing: Implement per-worker queues with stealing for better cache locality
- Dynamic Sizing: Add/remove workers based on load
- Task Dependencies: Allow specifying that task B depends on task A completing
- Priority Queue: High-priority tasks execute before low-priority
- Timeouts: `submit_with_timeout()` that fails if not completed in time
- Metrics: Track queue depth, wait times, task durations
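As a starting point for the Priority Queue extension, one possible wrapper and comparator (the `PrioritizedTask` name is illustrative, not part of the spec):

```cpp
#include <functional>
#include <queue>
#include <vector>

struct PrioritizedTask {
    int priority;                 // larger value runs first
    std::function<void()> fn;
};

struct ByPriority {
    bool operator()(const PrioritizedTask& a, const PrioritizedTask& b) const {
        return a.priority < b.priority;   // makes priority_queue a max-heap
    }
};

using TaskQueue = std::priority_queue<PrioritizedTask,
                                      std::vector<PrioritizedTask>,
                                      ByPriority>;
```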
9. Real-World Connections
- Boost.Asio io_context: Uses thread pool for async I/O completion
- TBB task_arena: Intel’s thread pool with work stealing
- Parallel algorithms with execution policies (C++17 `<execution>`): implementations typically run them on an internal thread pool
- Web server request handling: Every major web server uses thread pools
10. Resources
- C++ Concurrency in Action, 2nd Edition
- cppreference: std::thread
- cppreference: std::condition_variable
- cppreference: std::packaged_task
11. Self-Assessment Checklist
Before considering this project complete, verify:
- Pool executes tasks submitted by single thread
- Pool executes tasks submitted by multiple threads concurrently
- `submit()` returns a working `std::future`
- Tasks with different return types work
- Exception in task doesn’t crash pool
- Destructor waits for all pending tasks
- No deadlocks detected (tested with ThreadSanitizer)
- No memory leaks (tested with Valgrind)
- Benchmark shows near-linear speedup
12. Submission / Completion Criteria
Your implementation is complete when:
- All tests pass: Unit tests, stress tests, sanitizer tests
- Documentation complete: Header comments, usage examples
- Performance verified: Benchmark shows expected speedup
- Code reviewed: No obvious race conditions or resource leaks
- Can explain: You can whiteboard the design in an interview