Project 17: High-Frequency Trading Simulator (Capstone)
Quick Reference
| Attribute | Value |
|---|---|
| Project Number | 17 of 17 (Capstone) |
| Category | C++ Concurrency Mastery |
| Difficulty | Level 5: Master |
| Time Estimate | 8-12 weeks |
| Main Programming Language | C++ |
| Alternative Languages | C, Rust (no GC languages!) |
| Coolness Level | Level 5: Pure Magic (Super Cool) |
| Business Potential | 5. The “Industry Disruptor” |
| Knowledge Area | All Concurrency / Low Latency / Real-time |
| Primary Tool | GCC/Clang with C++20/23, perf, Valgrind |
| Main Books | “C++ Concurrency in Action” by Anthony Williams, “Trading and Exchanges” by Larry Harris |
Summary: Build a simulated high-frequency trading system with market data feed handlers (lock-free), order matching engine (SIMD), strategy execution (coroutines), and risk management - all with microsecond-level latency. This is the ultimate integration test of everything you have learned.
Learning Objectives
By completing this project, you will be able to:
- Architect a low-latency system where every microsecond matters and design decisions directly impact performance
- Integrate lock-free data structures for market data feeds handling millions of messages per second
- Apply SIMD operations for parallel price comparisons and order matching
- Design trading strategies as coroutines with efficient async workflows using co_await
- Implement real-time risk management with atomic operations and lock-free position tracking
- Master low-latency patterns: object pools, core pinning, NUMA awareness, branch elimination
- Measure and optimize using P50/P99/P99.9 latency metrics with sub-microsecond precision
- Answer interview questions about trading systems, low-latency design, and concurrent architecture
Theoretical Foundation
The Core Question You’re Answering
“How do you build a system where every nanosecond matters, combining lock-free data structures, SIMD operations, coroutines, and parallel algorithms into a cohesive architecture that processes millions of events per second with microsecond-level latency?”
This project forces you to answer:
- How do lock-free structures eliminate mutex overhead on the critical path?
- Why does SIMD enable matching 8 prices in the time it takes to compare 1?
- How do coroutines model trading strategies more naturally than callbacks?
- What architectural patterns eliminate memory allocations, system calls, and cache misses?
- How do you measure latency at P99.9 and identify tail latency sources?
Most developers have never built anything this performance-sensitive. After this project, you will understand what it takes to compete at the microsecond level and could interview for roles at trading firms, game engines, or embedded systems.
Why This Matters
LATENCY MATTERS IN HFT
Time Scale | What Happens
--------------------+--------------------------------------------------
1 second | Human reaction time. An eternity in HFT.
100ms | Network round-trip across continent.
10ms | Database query. OS scheduler quantum.
1 ms (1000 us) | Heavy mutex contention. Long GC pause.
100 us | Thread context switch (worst case). Garbage collection pause.
10 us | Thread wakeup. Your target quote-to-trade.
1 us (1000 ns) | Memory allocation (new/malloc). System call.
100 ns | Main memory access. L3 cache miss (~40+ ns).
10 ns | L2 cache hit. Branch misprediction. SIMD operation.
1 ns | CPU cycle at 1 GHz. Register access.
In HFT, being 10us faster can mean winning vs losing every trade.
Historical Context: The “Flash Crash” of May 6, 2010 saw the Dow drop 1000 points in minutes, driven partly by HFT algorithms. Modern markets execute millions of orders per second, with firms competing to shave microseconds off latency. Understanding these systems is essential for:
- Trading firm engineering (Jane Street, Two Sigma, Citadel)
- Game engine development (tick rates, physics updates)
- Embedded systems (automotive, aerospace)
- Real-time audio/video processing
- Any system where latency is business-critical
Common Misconceptions
- "Fast code is just about algorithms" - Wrong. Memory layout, cache behavior, branch prediction, and system architecture matter more than big-O complexity at this level.
- "Lock-free means faster" - Not always. Lock-free has overhead (CAS loops, memory barriers). It wins when contention is high or holding locks is dangerous, not universally.
- "C++ is automatically fast" - C++ gives you control, not speed. You can write slow C++ easily. Speed comes from understanding what the compiler generates and what the hardware does.
- "Latency and throughput are the same" - They are often opposed. Batching improves throughput but hurts latency. HFT cares about latency at P99.9, even at a throughput cost.
- "SIMD is just for math" - SIMD excels at data-parallel conditionals too. Comparing 8 prices at once is as useful as adding 8 numbers.
Concepts You Must Understand First
Stop and research these before coding:
1. Market Microstructure
How do financial markets actually work?
| Concept | Description | Why It Matters |
|---|---|---|
| Order Book | Sorted list of buy/sell orders by price | Central data structure you’ll build |
| Bid/Ask Spread | Gap between best buy and sell prices | Profit opportunity for market makers |
| Market Order | Execute immediately at best available price | Creates latency pressure |
| Limit Order | Execute only at specified price or better | Rests in order book |
| Matching Engine | Core component that matches buyers to sellers | Your SIMD optimization target |
ORDER BOOK VISUALIZATION
     BID SIDE                             ASK SIDE
 (buyers want to buy)               (sellers want to sell)

 Price      Quantity                Price       Quantity
 ------     --------                -------     --------
 $99.97       1,000  <-- best bid   $100.03         500  <-- best ask
 $99.95       2,500                 $100.05       1,200
 $99.90       5,000                 $100.10       3,000
 $99.85       3,200                 $100.15       2,800

 SPREAD = $0.06  (100.03 - 99.97)

 When a BUY market order arrives, it matches against the best ask.
Book Reference: “Trading and Exchanges” by Larry Harris - Ch. 1-4
2. Lock-Free SPSC Queue
The market data path must be lock-free. A Single-Producer Single-Consumer (SPSC) queue is the simplest lock-free structure.
// SPSC Queue: One thread writes, one reads
template<typename T, size_t Size>
class SPSCQueue {
alignas(64) std::array<T, Size> buffer_;
alignas(64) std::atomic<size_t> head_{0}; // Writer increments
alignas(64) std::atomic<size_t> tail_{0}; // Reader increments
// Note: head_ and tail_ on separate cache lines (64 bytes apart)
// This prevents false sharing between producer and consumer
public:
bool push(const T& item) {
size_t h = head_.load(std::memory_order_relaxed);
size_t next = (h + 1) % Size;
if (next == tail_.load(std::memory_order_acquire)) {
return false; // Queue full
}
buffer_[h] = item;
head_.store(next, std::memory_order_release);
return true;
}
std::optional<T> pop() {
size_t t = tail_.load(std::memory_order_relaxed);
if (t == head_.load(std::memory_order_acquire)) {
return std::nullopt; // Queue empty
}
T item = buffer_[t];
tail_.store((t + 1) % Size, std::memory_order_release);
return item;
}
};
Key Points:
- memory_order_acquire on the load synchronizes with memory_order_release on the store
- Cache-line alignment prevents false sharing (producer and consumer don't contend)
- No locks, no CAS loops (possible because there is exactly one producer and one consumer)
Book Reference: “C++ Concurrency in Action” Chapter 7 - Anthony Williams
3. SIMD Price Comparison
Instead of comparing prices one at a time, compare 8 at once:
#include <experimental/simd>
namespace stdx = std::experimental;
// Traditional: O(n) comparisons, one at a time
for (size_t i = 0; i < orders.size(); ++i) {
if ((side == BUY && orders[i].price >= target_price) ||
(side == SELL && orders[i].price <= target_price)) {
matches.push_back(i);
}
}
// SIMD: O(n/8) iterations, 8 comparisons per iteration
using simd_t = stdx::fixed_size_simd<int64_t, 8>;
simd_t target = target_price; // Broadcast to all lanes
for (size_t i = 0; i < orders.size(); i += 8) {
simd_t prices(&order_prices[i], stdx::element_aligned);
auto mask = (side == BUY) ? (prices >= target) : (prices <= target);
if (stdx::any_of(mask)) {
// Extract matching indices using mask
for (size_t j = 0; j < 8; ++j) {
if (mask[j]) matches.push_back(i + j);
}
}
}
Book Reference: Projects 12-14 in this series
4. Coroutines for Strategy Execution
Trading strategies are naturally expressed as async workflows:
// Strategy as a coroutine - reads like sequential code
// but executes asynchronously
Task<void> momentum_strategy(MarketDataFeed& feed, OrderRouter& router) {
while (true) {
// Wait for next price update
auto quote = co_await feed.next_quote();
// Calculate trading signal
double signal = co_await calculate_momentum(quote);
if (signal > buy_threshold) {
// Submit order and wait for fill
auto order = Order{BUY, quote.symbol, 100, quote.ask};
auto result = co_await router.submit(order);
if (result.filled) {
// Hold position
co_await sleep_for(1000ms);
// Close position
co_await router.submit(Order{SELL, quote.symbol, 100});
}
}
// Yield to other strategies
co_await next_tick();
}
}
Why coroutines over callbacks?
- Sequential code is easier to reason about
- State is preserved in coroutine frame (no explicit state machines)
- Cooperative multitasking without OS threads
- Can run thousands of strategies on few threads
Book Reference: Projects 9-11 in this series
5. Low-Latency Patterns
These patterns eliminate the sources of latency:
| Pattern | Problem Solved | Implementation |
|---|---|---|
| Object Pools | new/delete take microseconds | Pre-allocate arrays, use free lists |
| Core Pinning | OS migrates threads, thrashing caches | pthread_setaffinity_np |
| NUMA Awareness | Cross-socket memory access is 100+ ns | Allocate on local NUMA node |
| Branch Elimination | Mispredictions cost 15+ cycles | Use SIMD masks, branchless min/max |
| Cache Prefetching | L3 miss is 40+ ns | __builtin_prefetch |
| Kernel Bypass | System calls cost microseconds | DPDK for networking (optional) |
LATENCY SOURCES TO ELIMINATE
+-----------------+ +-----------------+
| Memory | | System |
| Allocation | <------ | malloc/new |
| ~1-10 us | | heap management |
+-----------------+ +-----------------+
+-----------------+ +-----------------+
| Cache | | Thread |
| Miss (L3) | <------ | Migration |
| ~40 ns | | by OS scheduler |
+-----------------+ +-----------------+
+-----------------+ +-----------------+
| Lock | | Mutex |
| Contention | <------ | contention |
| ~1-100 us | | between threads |
+-----------------+ +-----------------+
+-----------------+ +-----------------+
| Branch | | Unpredictable |
| Misprediction | <------ | if/else in |
| ~15 cycles | | hot paths |
+-----------------+ +-----------------+
Project Specification
What You’ll Build
A complete simulated HFT system with these components:
- Market Data Handler: Lock-free SPSC queue receiving simulated market data at 5M+ messages/second
- Order Book Manager: Per-instrument order books with SIMD-accelerated matching
- Strategy Engine: Multiple trading strategies implemented as coroutines
- Risk Manager: Real-time position and exposure tracking with atomic operations
- Order Router: Submits orders to simulated exchange
- Performance Monitor: Measures latency at P50/P99/P99.9
Performance Targets
| Metric | Target | Description |
|---|---|---|
| Quote-to-Trade P50 | < 5 us | Median latency from quote arrival to order submission |
| Quote-to-Trade P99 | < 10 us | 99th percentile latency |
| Quote-to-Trade P99.9 | < 50 us | 99.9th percentile (1 in 1000 events) |
| Throughput | > 1M quotes/sec | Market data processing capacity |
| Orders/Second | > 50,000 | Order submission rate |
| Cache Miss Rate | < 1% | L3 cache misses in hot path |
| Memory Allocations | 0 | Zero allocations in critical path |
Input/Output Specification
Configuration:
# trading.yaml
market_data:
source: "replay" # or "simulated", "network"
file: "nasdaq_20240115.pcap"
replay_speed: 10 # 10x real-time
instruments:
count: 8000 # Number of stocks
symbols: ["AAPL", "GOOGL", ...] # or "all"
strategies:
- name: "momentum"
enabled: true
params:
lookback_period: 100
threshold: 0.02
- name: "mean_reversion"
enabled: true
params:
window: 50
z_score_threshold: 2.0
- name: "market_making"
enabled: true
params:
spread: 0.001
position_limit: 1000
threads:
market_data: [0, 1] # Pinned to cores 0-1
strategy: [2, 3, 4, 5] # Pinned to cores 2-5
risk: [6] # Pinned to core 6
order_management: [7] # Pinned to core 7
risk:
max_position_per_symbol: 10000
max_total_exposure: 10000000 # $10M
max_loss_per_day: 100000 # $100k
Expected Output:
$ ./hft_simulator --config trading.yaml
=== High-Frequency Trading Simulator ===
Loading market data replay: nasdaq_20240115.pcap
Instruments: 8,000 stocks
Strategies: 5 (momentum, mean-reversion, arbitrage, market-making, statistical)
System Configuration:
Market data threads: 2 (pinned to cores 0-1)
Strategy threads: 4 (pinned to cores 2-5)
Risk thread: 1 (pinned to core 6)
Order management: 1 (pinned to core 7)
Data structures:
Market data queue: Lock-free SPSC, 1M slots
Order book: Lock-free per-instrument, SIMD matching
Position cache: Atomic counters, no locks
Pre-allocation complete:
Order pool: 100,000 orders pre-allocated
Quote pool: 1,000,000 quotes pre-allocated
Message pool: 500,000 messages pre-allocated
Starting replay at 10x speed...
[09:30:00.000] Market open
[09:30:00.001] Received 50,000 quotes
[09:30:00.002] Strategy signals: 23 buy, 17 sell
[09:30:00.003] Orders submitted: 40
[09:30:00.004] Orders filled: 38, rejected: 2 (risk limit)
Performance Metrics (1 second window):
Market data messages: 5.2M
Quote-to-trade latency:
P50: 2.3 us
P99: 8.1 us
P99.9: 24 us
Orders per second: 45,000
Strategy CPU utilization: 78%
Cache miss rate: 0.3%
[10:30:00.000] Replay complete
Summary:
Total PnL: $127,432 (simulated)
Trades: 2.1M
Win rate: 51.3%
Sharpe ratio: 3.2
Max drawdown: -$8,923
Latency Histogram:
< 1us: ████████████░░░░░░░░ 35%
1-5us: ██████████████████░░ 52%
5-10us: ███░░░░░░░░░░░░░░░░░ 10%
10-50us:█░░░░░░░░░░░░░░░░░░░ 2%
> 50us: ░░░░░░░░░░░░░░░░░░░░ 1%
Solution Architecture
System Overview
+-----------------------------------------------------------------------------+
| HIGH-FREQUENCY TRADING SIMULATOR |
+-----------------------------------------------------------------------------+
| |
| +------------------+ |
| | Market Data | Lock-free |
| | Feed Handler |------+ |
| | (Core 0-1) | | |
| +------------------+ | |
| v |
| +---------------+ |
| | SPSC Queue | |
| | (1M slots) | |
| +-------+-------+ |
| | |
| v |
| +-----------------------------------------------------------+ |
| | ORDER BOOK MANAGER | |
| | | |
| | Symbol Order Book (SIMD-optimized) | |
| | ------ ----------------------------------- | |
| | AAPL --> [Bid: 185.50(100), 185.49(500), ...] | |
| | [Ask: 185.52(200), 185.53(300), ...] | |
| | GOOGL --> [Bid: 142.10(50), 142.09(200), ...] | |
| | [Ask: 142.12(100), 142.15(400), ...] | |
| | ... (8000 instruments) | |
| +----------------------------+------------------------------+ |
| | |
| +----------------+----------------+ |
| | | |
| v v |
| +-------------------+ +-------------------+ |
| | Price Updates | | Trade Signals | |
| +--------+----------+ +--------+----------+ |
| | | |
| v v |
| +------------------------------------------------+ |
| | STRATEGY ENGINE (Cores 2-5) | |
| | | |
| | +-------------+ +---------------+ +-------+ | |
| | | Momentum | | Mean-Reversion| | Arb | | |
| | | (coroutine) | | (coroutine) | | (co) | | |
| | +------+------+ +-------+-------+ +---+---+ | |
| | | | | | |
| +---------+-----------------+--------------+------+ |
| | | | |
| v v v |
| +---------------------------------------------------+ |
| | ORDER QUEUE | |
| +----------------------------+----------------------+ |
| | |
| v |
| +---------------------------------------------------+ |
| | RISK MANAGER (Core 6) | |
| | | |
| | Position Limits Exposure Checks P&L Track | |
| | (atomic counters) (lock-free) (real-time) | |
| +----------------------------+-----------------------+ |
| | |
| PASS or REJECT |
| | |
| v |
| +---------------------------------------------------+ |
| | ORDER ROUTER (Core 7) | |
| +----------------------------+-----------------------+ |
| | |
| v |
| +---------------------------------------------------+ |
| | SIMULATED EXCHANGE | |
| | | |
| | Matching Engine Fill Reports Rejections | |
| +---------------------------------------------------+ |
| |
+-----------------------------------------------------------------------------+
Key Components
1. MarketDataFeed
// Market data feed handler - processes raw network data
class MarketDataFeed {
private:
SPSCQueue<Quote, 1'000'000> queue_; // Lock-free, 1M slots
std::atomic<bool> running_{true};
// Object pool for quotes (zero allocation)
ObjectPool<Quote, 1'000'000> quote_pool_;
public:
// Producer thread (network/replay)
void run_producer(DataSource& source) {
// Pin to cores 0-1
set_thread_affinity({0, 1});
while (running_.load(std::memory_order_relaxed)) {
auto raw = source.read();
if (!raw) continue;
// Get quote from pool (no allocation)
Quote* quote = quote_pool_.acquire();
parse_into(raw, *quote);
// Push to queue (lock-free)
while (!queue_.push(*quote)) {
// Queue full - apply backpressure
_mm_pause(); // CPU-friendly spin
}
quote_pool_.release(quote); // queue stores a copy, so return to pool
}
}
// Consumer interface (strategy threads)
std::optional<Quote> next_quote() {
return queue_.pop();
}
};
2. OrderBook
// Per-instrument order book with SIMD matching
class OrderBook {
private:
// Structure of arrays for SIMD access
alignas(64) std::array<int64_t, MAX_LEVELS> bid_prices_;
alignas(64) std::array<int64_t, MAX_LEVELS> bid_quantities_;
alignas(64) std::array<int64_t, MAX_LEVELS> ask_prices_;
alignas(64) std::array<int64_t, MAX_LEVELS> ask_quantities_;
std::atomic<size_t> bid_count_{0};
std::atomic<size_t> ask_count_{0};
public:
// SIMD-accelerated order matching
std::vector<Match> match_order(const Order& order) {
using simd_t = stdx::fixed_size_simd<int64_t, 8>;
std::vector<Match> matches;
const auto& prices = (order.side == BUY) ? ask_prices_ : bid_prices_;
const auto& qtys = (order.side == BUY) ? ask_quantities_ : bid_quantities_;
size_t count = (order.side == BUY) ?
ask_count_.load(std::memory_order_acquire) :
bid_count_.load(std::memory_order_acquire);
simd_t target_price = order.price;
int64_t remaining_qty = order.quantity;
for (size_t i = 0; i < count && remaining_qty > 0; i += 8) {
// Load 8 prices at once
simd_t book_prices(&prices[i], stdx::element_aligned);
// Compare 8 prices at once
auto price_match = (order.side == BUY) ?
(book_prices <= target_price) :
(book_prices >= target_price);
if (stdx::any_of(price_match)) {
// Extract matches
for (size_t j = 0; j < 8 && remaining_qty > 0; ++j) {
if (i + j < count && price_match[j] && qtys[i + j] > 0) { // guard the ragged tail past count
int64_t fill_qty = std::min(remaining_qty, qtys[i + j]);
matches.push_back({prices[i + j], fill_qty});
remaining_qty -= fill_qty;
}
}
}
}
return matches;
}
};
3. StrategyEngine (Coroutine-based)
// Coroutine task type for strategies
template<typename T>
class Task {
struct promise_type {
T value;
std::suspend_always initial_suspend() { return {}; }
std::suspend_always final_suspend() noexcept { return {}; }
Task get_return_object() {
return Task{std::coroutine_handle<promise_type>::from_promise(*this)};
}
void return_value(T v) { value = std::move(v); }
void unhandled_exception() { std::terminate(); }
};
std::coroutine_handle<promise_type> handle_;
// ...
};
// Awaitable for next market data tick
struct NextTick {
MarketDataFeed& feed_;
bool await_ready() { return feed_.has_data(); }
void await_suspend(std::coroutine_handle<> h) {
feed_.register_waiter(h);
}
Quote await_resume() { return feed_.get_quote(); }
};
// Momentum strategy as coroutine
Task<void> momentum_strategy(
MarketDataFeed& feed,
RiskManager& risk,
OrderRouter& router
) {
// State persists across co_await points
std::deque<double> price_history;
constexpr size_t LOOKBACK = 100;
constexpr double THRESHOLD = 0.02;
while (true) {
// Wait for next quote (suspends, no busy-waiting)
auto quote = co_await NextTick{feed};
// Update price history
price_history.push_back(quote.mid_price());
if (price_history.size() > LOOKBACK) {
price_history.pop_front();
}
if (price_history.size() < LOOKBACK) continue;
// Calculate momentum signal
double momentum = (price_history.back() - price_history.front())
/ price_history.front();
if (std::abs(momentum) > THRESHOLD) {
Side side = momentum > 0 ? BUY : SELL;
Order order{side, quote.symbol, 100, quote.mid_price()};
// Check risk (lock-free atomic check)
if (co_await risk.check(order)) {
co_await router.submit(order);
}
}
}
}
4. RiskManager
// Lock-free risk management
class RiskManager {
private:
// Per-symbol position tracking (atomic)
struct SymbolPosition {
alignas(64) std::atomic<int64_t> position{0};
alignas(64) std::atomic<int64_t> exposure{0};
};
std::unordered_map<SymbolId, SymbolPosition> positions_;
std::atomic<int64_t> total_exposure_{0};
std::atomic<int64_t> daily_pnl_{0};
// Limits
const int64_t max_position_per_symbol_;
const int64_t max_total_exposure_;
const int64_t max_daily_loss_;
public:
// Lock-free risk check
// NOTE: positions_ must be fully populated before trading starts;
// operator[] insertion at runtime would not be thread-safe.
bool check(const Order& order) {
auto& pos = positions_[order.symbol];
int64_t current = pos.position.load(std::memory_order_relaxed);
int64_t new_pos = current + (order.side == BUY ? order.quantity : -order.quantity);
// Position limit check
if (std::abs(new_pos) > max_position_per_symbol_) {
return false;
}
// Exposure check
int64_t current_exposure = total_exposure_.load(std::memory_order_relaxed);
int64_t order_value = order.quantity * order.price;
if (current_exposure + order_value > max_total_exposure_) {
return false;
}
// Daily loss check
if (daily_pnl_.load(std::memory_order_relaxed) < -max_daily_loss_) {
return false;
}
return true;
}
// Update position after fill (lock-free)
void update_position(const Fill& fill) {
auto& pos = positions_[fill.symbol];
// Atomic update
pos.position.fetch_add(
fill.side == BUY ? fill.quantity : -fill.quantity,
std::memory_order_relaxed
);
pos.exposure.fetch_add(
fill.quantity * fill.price,
std::memory_order_relaxed
);
total_exposure_.fetch_add(
fill.quantity * fill.price,
std::memory_order_relaxed
);
}
};
Data Structures
// Core data types - all fixed size, no allocations
struct Quote {
SymbolId symbol; // 8 bytes
int64_t bid_price; // Price in cents (fixed point)
int64_t ask_price;
int32_t bid_size;
int32_t ask_size;
uint64_t timestamp; // Nanoseconds since epoch
// Total: 40 bytes, fits in cache line
};
struct Order {
uint64_t order_id;
SymbolId symbol;
Side side; // BUY or SELL
int64_t price;
int32_t quantity;
OrderType type; // MARKET, LIMIT
uint64_t timestamp;
};
struct Fill {
uint64_t order_id;
SymbolId symbol;
Side side;
int64_t price;
int32_t quantity;
uint64_t exchange_timestamp;
uint64_t local_timestamp; // For latency measurement
};
// Object pool for zero-allocation
template<typename T, size_t N>
class ObjectPool {
private:
std::array<T, N> objects_;
std::array<T*, N> free_list_;
std::atomic<size_t> free_count_{N};
public:
ObjectPool() {
for (size_t i = 0; i < N; ++i) {
free_list_[i] = &objects_[i];
}
}
// NOTE: this two-step scheme is only safe when acquire() and release()
// each run on a single thread (as in the SPSC feed path); a multi-threaded
// pool needs a lock-free free list instead.
T* acquire() {
size_t idx = free_count_.fetch_sub(1, std::memory_order_relaxed) - 1;
return free_list_[idx];
}
void release(T* obj) {
size_t idx = free_count_.fetch_add(1, std::memory_order_relaxed);
free_list_[idx] = obj;
}
};
Implementation Guide
Phase 1: Foundation (Weeks 1-2)
Goal: Basic infrastructure without optimizations
- Project Structure:
hft_simulator/
├── CMakeLists.txt
├── src/
│   ├── main.cpp
│   ├── core/
│   │   ├── types.hpp         # Quote, Order, Fill structs
│   │   ├── spsc_queue.hpp    # Lock-free SPSC queue
│   │   ├── object_pool.hpp   # Pre-allocation pool
│   │   └── timing.hpp        # High-resolution timing
│   ├── market_data/
│   │   ├── feed.hpp
│   │   ├── feed.cpp
│   │   └── parser.cpp
│   ├── order_book/
│   │   ├── order_book.hpp
│   │   └── order_book.cpp
│   ├── strategy/
│   │   ├── task.hpp          # Coroutine Task type
│   │   ├── momentum.cpp
│   │   ├── mean_reversion.cpp
│   │   └── market_making.cpp
│   ├── risk/
│   │   ├── risk_manager.hpp
│   │   └── risk_manager.cpp
│   └── routing/
│       ├── order_router.hpp
│       └── simulated_exchange.cpp
├── tests/
│   ├── test_spsc_queue.cpp
│   ├── test_order_book.cpp
│   └── test_risk.cpp
├── benchmarks/
│   ├── bench_latency.cpp
│   └── bench_throughput.cpp
└── data/
    └── sample_data.csv

- Start with simple implementations:
- Mutex-based queue (replace with lock-free later)
- Scalar order matching (replace with SIMD later)
- Callback-based strategies (replace with coroutines later)
- Build the data flow:
- Read CSV market data
- Update order books
- Generate strategy signals
- Submit orders
Checkpoint: System runs end-to-end, but slowly (100s of microseconds latency)
Phase 2: Lock-Free Path (Weeks 3-4)
Goal: Eliminate locks from critical path
- Implement SPSC Queue:

```cpp
// Key insight: separate cache lines for producer and consumer
template<typename T, size_t Size>
class SPSCQueue {
    static_assert((Size & (Size - 1)) == 0, "Size must be power of 2");

    alignas(64) std::atomic<size_t> head_{0};
    char padding1_[64 - sizeof(std::atomic<size_t>)];
    alignas(64) std::atomic<size_t> tail_{0};
    char padding2_[64 - sizeof(std::atomic<size_t>)];
    alignas(64) std::array<T, Size> buffer_;

public:
    bool push(const T& item) {
        const size_t h = head_.load(std::memory_order_relaxed);
        const size_t next = (h + 1) & (Size - 1);

        if (next == tail_.load(std::memory_order_acquire)) {
            return false; // Full
        }
        buffer_[h] = item;
        head_.store(next, std::memory_order_release);
        return true;
    }

    bool pop(T& item) {
        const size_t t = tail_.load(std::memory_order_relaxed);
        if (t == head_.load(std::memory_order_acquire)) {
            return false; // Empty
        }
        item = buffer_[t];
        tail_.store((t + 1) & (Size - 1), std::memory_order_release);
        return true;
    }
};
```
- Lock-free position tracking:
  - Use std::atomic<int64_t> for positions
  - Use fetch_add for updates
  - No locks, no CAS loops for updates
- Benchmark lock-free vs mutex:
  - Expect 10-100x improvement under contention
Checkpoint: Critical path is lock-free, latency drops to 10-50 microseconds
Phase 3: SIMD Optimization (Weeks 5-6)
Goal: Vectorize order matching
1. **Convert to Structure of Arrays**:

```cpp
// Before: Array of Structures (AoS) - bad for SIMD
struct OrderBookLevel {
    int64_t price;
    int32_t quantity;
    uint64_t timestamp;
};
std::vector<OrderBookLevel> levels;

// After: Structure of Arrays (SoA) - SIMD-friendly
struct OrderBook {
    alignas(64) std::array<int64_t, 256> prices;
    alignas(64) std::array<int32_t, 256> quantities;
    alignas(64) std::array<uint64_t, 256> timestamps;
    size_t count;
};
```
2. **Implement SIMD matching**:
```cpp
#include <immintrin.h> // AVX2 intrinsics
// Compare 4 prices at once (AVX2, 256-bit)
std::vector<size_t> find_matching_levels_simd(
const OrderBook& book,
int64_t target_price,
Side order_side
) {
std::vector<size_t> matches;
__m256i target = _mm256_set1_epi64x(target_price);
for (size_t i = 0; i < book.count; i += 4) {
__m256i prices = _mm256_load_si256(
reinterpret_cast<const __m256i*>(&book.prices[i])
);
__m256i cmp;
if (order_side == BUY) {
// For buy: book price <= target (ask side)
cmp = _mm256_cmpgt_epi64(target, prices); // target > book
cmp = _mm256_or_si256(cmp, _mm256_cmpeq_epi64(target, prices));
} else {
// For sell: book price >= target (bid side)
cmp = _mm256_cmpgt_epi64(prices, target);
cmp = _mm256_or_si256(cmp, _mm256_cmpeq_epi64(prices, target));
}
// Extract mask and find matching indices
int mask = _mm256_movemask_pd(_mm256_castsi256_pd(cmp));
for (size_t j = 0; j < 4; ++j) {
if (mask & (1 << j)) {
matches.push_back(i + j);
}
}
}
return matches;
}
```
3. **Benchmark SIMD vs scalar**:
   - Expect 4-8x improvement for price comparisons
Checkpoint: Order matching uses SIMD, latency drops to 5-20 microseconds
Phase 4: Coroutines (Weeks 7-8)
Goal: Implement strategies as coroutines
- Create Task type:

```cpp
template<typename T>
class Task {
public:
    struct promise_type;
    using Handle = std::coroutine_handle<promise_type>;

    struct promise_type {
        T result_;
        std::exception_ptr exception_;

        Task get_return_object() { return Task{Handle::from_promise(*this)}; }
        std::suspend_never initial_suspend() noexcept { return {}; }
        std::suspend_always final_suspend() noexcept { return {}; }
        void return_value(T value) { result_ = std::move(value); }
        void unhandled_exception() { exception_ = std::current_exception(); }
    };

private:
    Handle handle_;

public:
    explicit Task(Handle h) : handle_(h) {}
    ~Task() { if (handle_) handle_.destroy(); }

    T get() {
        while (!handle_.done()) {
            handle_.resume();
        }
        if (handle_.promise().exception_) {
            std::rethrow_exception(handle_.promise().exception_);
        }
        return std::move(handle_.promise().result_);
    }
};
```
- Create awaitables:

```cpp
// Await next market data event
struct AwaitQuote {
    MarketDataFeed& feed_;

    bool await_ready() const noexcept { return feed_.has_pending(); }
    void await_suspend(std::coroutine_handle<> h) noexcept {
        feed_.register_continuation(h);
    }
    Quote await_resume() noexcept { return feed_.pop(); }
};

// Await order submission
struct AwaitOrderSubmit {
    OrderRouter& router_;
    Order order_;
    OrderResult result_;

    bool await_ready() const noexcept { return false; }
    void await_suspend(std::coroutine_handle<> h) noexcept {
        router_.submit_async(order_, [this, h](OrderResult r) {
            result_ = r;
            h.resume();
        });
    }
    OrderResult await_resume() noexcept { return result_; }
};
```
- Convert strategies:

```cpp
Task<void> mean_reversion_strategy(
    MarketDataFeed& feed,
    RiskManager& risk,
    OrderRouter& router
) {
    // Moving average state
    std::deque<double> prices;
    constexpr size_t WINDOW = 50;
    constexpr double Z_THRESHOLD = 2.0;

    while (true) {
        auto quote = co_await AwaitQuote{feed};

        prices.push_back(quote.mid_price());
        if (prices.size() > WINDOW) prices.pop_front();
        if (prices.size() < WINDOW) continue;

        // Calculate z-score
        double mean = std::accumulate(prices.begin(), prices.end(), 0.0) / WINDOW;
        double sq_sum = std::inner_product(prices.begin(), prices.end(),
                                           prices.begin(), 0.0);
        double stdev = std::sqrt(sq_sum / WINDOW - mean * mean);
        double z_score = (quote.mid_price() - mean) / stdev;

        if (std::abs(z_score) > Z_THRESHOLD) {
            Side side = z_score > 0 ? SELL : BUY;  // Mean reversion
            Order order{side, quote.symbol, 100, quote.mid_price()};
            if (risk.check(order)) {
                auto result = co_await AwaitOrderSubmit{router, order};
                // Log result...
            }
        }
    }
}
```
Checkpoint: Strategies are coroutines, code is cleaner, performance maintained
Phase 5: Low-Latency Polish (Weeks 9-10)
Goal: Eliminate remaining latency sources
1. **Core Pinning**:

```cpp
#include <pthread.h>
#include <sched.h>

void pin_to_core(int core_id) {
    cpu_set_t cpuset;
    CPU_ZERO(&cpuset);
    CPU_SET(core_id, &cpuset);

    pthread_t thread = pthread_self();
    int result = pthread_setaffinity_np(thread, sizeof(cpuset), &cpuset);
    if (result != 0) {
        throw std::runtime_error("Failed to pin thread to core");
    }
}

// In thread startup:
void market_data_thread() {
    pin_to_core(0); // Pin to core 0
    // ... processing
}
```
2. **NUMA Awareness**:

```cpp
#include <numa.h>

void* numa_local_alloc(size_t size) {
    int node = numa_node_of_cpu(sched_getcpu());
    return numa_alloc_onnode(size, node);
}

// Allocate order book on local NUMA node
OrderBook* book = new (numa_local_alloc(sizeof(OrderBook))) OrderBook();
```
3. **Cache Prefetching**:

```cpp
// Prefetch next order book levels while processing current
void process_orders(OrderBook& book) {
    for (size_t i = 0; i < book.count; i += 8) {
        // Prefetch two cache lines ahead
        __builtin_prefetch(&book.prices[i + 16], 0, 3);
        // Process current batch
        process_batch(&book.prices[i], &book.quantities[i]);
    }
}
```

4. **Branch Elimination**:

```cpp
// Before: branchy code
int64_t best_price = (side == BUY) ? ask_prices_[0] : bid_prices_[0];

// After: branchless array index (side is 0 or 1)
int64_t prices[2] = {bid_prices_[0], ask_prices_[0]};
int64_t best_price = prices[side];
```
5. **Object Pool**:
```cpp
// Zero-allocation order creation
OrderPool pool(100000); // Pre-allocate 100k orders
Order* create_order(/* params */) {
Order* order = pool.acquire();
order->symbol = symbol;
order->side = side;
// ...
return order;
}
void release_order(Order* order) {
pool.release(order);
}
```
Checkpoint: P99 latency < 10 microseconds
Phase 6: Measurement & Tuning (Weeks 11-12)
Goal: Measure, profile, and tune
1. **Latency Histogram**:
```cpp
class LatencyTracker {
    static constexpr size_t BUCKETS = 1000;  // 1ns buckets; latencies >= 1us go to overflow_
    std::array<std::atomic<uint64_t>, BUCKETS> histogram_{};
    std::atomic<uint64_t> overflow_{0};
public:
    void record(uint64_t latency_ns) {
        if (latency_ns < BUCKETS) {
            histogram_[latency_ns].fetch_add(1, std::memory_order_relaxed);
        } else {
            overflow_.fetch_add(1, std::memory_order_relaxed);
        }
    }

    uint64_t percentile(double p) const {
        uint64_t total = 0;
        for (const auto& count : histogram_) {
            total += count.load(std::memory_order_relaxed);
        }
        total += overflow_.load(std::memory_order_relaxed);
        uint64_t target = static_cast<uint64_t>(total * p);
        uint64_t cumulative = 0;
        for (size_t i = 0; i < BUCKETS; ++i) {
            cumulative += histogram_[i].load(std::memory_order_relaxed);
            if (cumulative >= target) {
                return i;
            }
        }
        return BUCKETS;  // Overflow
    }
};
```
2. **Use `perf` for profiling**:
```bash
# Record CPU cycles and cache misses
perf record -e cycles,cache-misses ./hft_simulator

# Analyze hot spots
perf report

# Check for cache misses
perf stat -e L1-dcache-load-misses,LLC-load-misses ./hft_simulator
```
3. **Flame graphs**:
```bash
# Generate flame graph
perf record -g ./hft_simulator
perf script | stackcollapse-perf.pl | flamegraph.pl > flame.svg
```
4. **Tune based on data**:
   - If the cache-miss rate is high: improve data layout, add prefetching
   - If branch misses are high: eliminate branches with SIMD or branchless code
   - If lock contention shows up: move to lock-free structures or reduce sharing
Checkpoint: Meet all performance targets
Testing Strategy
Unit Tests
```cpp
// test_spsc_queue.cpp
TEST(SPSCQueue, BasicPushPop) {
    SPSCQueue<int, 16> queue;
    EXPECT_TRUE(queue.push(42));
    int value;
    EXPECT_TRUE(queue.pop(value));
    EXPECT_EQ(value, 42);
}

TEST(SPSCQueue, FullQueue) {
    SPSCQueue<int, 4> queue;  // 4 slots, but only 3 usable
    EXPECT_TRUE(queue.push(1));
    EXPECT_TRUE(queue.push(2));
    EXPECT_TRUE(queue.push(3));
    EXPECT_FALSE(queue.push(4));  // Full
}

TEST(SPSCQueue, ConcurrentAccess) {
    SPSCQueue<int, 1024> queue;
    std::atomic<int> sum{0};
    std::thread producer([&]() {
        for (int i = 0; i < 10000; ++i) {
            while (!queue.push(i)) {
                std::this_thread::yield();
            }
        }
    });
    std::thread consumer([&]() {
        for (int i = 0; i < 10000; ++i) {
            int value;
            while (!queue.pop(value)) {
                std::this_thread::yield();
            }
            sum += value;
        }
    });
    producer.join();
    consumer.join();
    EXPECT_EQ(sum.load(), 10000 * 9999 / 2);  // Sum of 0..9999
}
```
Integration Tests
```cpp
// test_end_to_end.cpp
TEST(HFTSimulator, EndToEndLatency) {
    HFTSimulator sim("test_config.yaml");
    sim.load_market_data("test_data.csv");
    LatencyTracker tracker;
    sim.on_quote_to_trade([&](const Quote& q, const Order& o) {
        auto latency = o.timestamp - q.timestamp;
        tracker.record(latency);
    });
    sim.run();
    // Verify latency targets (nanoseconds)
    EXPECT_LT(tracker.percentile(0.50), 5000);    // P50 < 5us
    EXPECT_LT(tracker.percentile(0.99), 10000);   // P99 < 10us
    EXPECT_LT(tracker.percentile(0.999), 50000);  // P99.9 < 50us
}

TEST(HFTSimulator, RiskLimitsEnforced) {
    HFTSimulator sim("test_config.yaml");
    sim.set_position_limit("AAPL", 1000);
    // Try to exceed the limit: 100 shares per order, limit 1000
    for (int i = 0; i < 20; ++i) {
        auto result = sim.submit_order(Order{BUY, "AAPL", 100, 100'00});
        if (i < 10) {
            EXPECT_TRUE(result.accepted);
        } else {
            EXPECT_FALSE(result.accepted);
            EXPECT_EQ(result.reject_reason, RejectReason::PositionLimit);
        }
    }
}
```
Performance Benchmarks
```cpp
// bench_latency.cpp
static void BM_OrderMatching(benchmark::State& state) {
    OrderBook book;
    fill_order_book(book, 256);  // 256 levels
    Order order{BUY, "AAPL", 100, 185'50};
    for (auto _ : state) {
        auto matches = book.match_order(order);
        benchmark::DoNotOptimize(matches);
    }
}
BENCHMARK(BM_OrderMatching);

static void BM_SPSCQueue(benchmark::State& state) {
    SPSCQueue<Quote, 65536> queue;
    Quote quote{};
    for (auto _ : state) {
        queue.push(quote);
        Quote out;
        queue.pop(out);
        benchmark::DoNotOptimize(out);
    }
}
BENCHMARK(BM_SPSCQueue);
```
Common Pitfalls
1. False Sharing
Problem: Two threads accessing different atomics on the same cache line
```cpp
// WRONG: head_ and tail_ on same cache line
struct BadQueue {
    std::atomic<size_t> head_;
    std::atomic<size_t> tail_;  // False sharing!
};

// RIGHT: Pad to separate cache lines
struct GoodQueue {
    alignas(64) std::atomic<size_t> head_;
    alignas(64) std::atomic<size_t> tail_;
};
```
Symptoms: High cache miss rate, poor scalability with threads
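Rather than hard-coding 64, a portable variant can derive the padding from `std::hardware_destructive_interference_size` (C++17, in `<new>`; 64 on most x86-64 toolchains) and verify the layout at compile time. A sketch, with `PaddedIndices`/`kCacheLine` as illustrative names:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <new>

// Prefer the standard constant when the library provides it; fall back to
// the common x86-64 cache line size otherwise.
#ifdef __cpp_lib_hardware_interference_size
constexpr size_t kCacheLine = std::hardware_destructive_interference_size;
#else
constexpr size_t kCacheLine = 64;  // Fallback assumption
#endif

// Producer- and consumer-owned indices on separate cache lines, so stores
// by one thread never invalidate the line the other thread is reading.
struct PaddedIndices {
    alignas(kCacheLine) std::atomic<size_t> head_{0};
    alignas(kCacheLine) std::atomic<size_t> tail_{0};
};

// Compile-time check: the struct spans at least two cache lines
static_assert(sizeof(PaddedIndices) >= 2 * kCacheLine,
              "head_ and tail_ must not share a cache line");
```

The `static_assert` turns a subtle performance bug into a build failure if someone later removes the alignment.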
2. Memory Order Mistakes
Problem: Using relaxed ordering when synchronization needed
```cpp
// WRONG: No synchronization between push and pop
head_.store(next, std::memory_order_relaxed);      // Producer
if (t == head_.load(std::memory_order_relaxed))    // Consumer
// Data race! May read stale data

// RIGHT: Release-acquire pairing
head_.store(next, std::memory_order_release);      // Producer publishes
if (t == head_.load(std::memory_order_acquire))    // Consumer synchronizes
```
3. Allocation in Hot Path
Problem: Calling new/malloc in performance-critical code
```cpp
// WRONG: Allocates on every order
Order* create_order() {
    return new Order();  // ~1-10 microseconds!
}

// RIGHT: Use object pool
Order* create_order() {
    return order_pool_.acquire();  // ~10 nanoseconds
}
```
4. Branch Misprediction
Problem: Unpredictable branches in inner loops
```cpp
// WRONG: Hard-to-predict branch
for (auto& order : orders) {
    if (order.side == BUY) {  // Mispredicted ~50% of the time
        process_buy(order);
    } else {
        process_sell(order);
    }
}

// RIGHT: Separate loops or branchless code
for (auto& order : buy_orders) {
    process_buy(order);
}
for (auto& order : sell_orders) {
    process_sell(order);
}
```
5. System Call Overhead
Problem: Calling the kernel in hot path
```cpp
// WRONG: Clock query in the hot path (vDSO call, tens of ns or more)
auto now = std::chrono::system_clock::now();

// RIGHT: Use RDTSC for low overhead (x86)
inline uint64_t rdtsc() {
    uint32_t lo, hi;
    __asm__ volatile ("rdtsc" : "=a" (lo), "=d" (hi));
    return ((uint64_t)hi << 32) | lo;
}
```
6. Coroutine Heap Allocation
Problem: Coroutine frame allocated on heap by default
```cpp
// Problem: Each coroutine frame is heap-allocated by default
Task<void> strategy() {
    co_await next_tick();  // Frame allocation hits the heap
}

// Solution: Custom allocator in promise_type
struct promise_type {
    static void* operator new(size_t size) {
        return frame_pool.acquire(size);  // Pool allocation
    }
    static void operator delete(void* ptr, size_t size) {
        frame_pool.release(ptr, size);
    }
};
```
Extensions & Challenges
Challenge 1: Kernel Bypass with DPDK
Implement network I/O without kernel involvement:
```cpp
// DPDK provides direct NIC access, bypassing the kernel:
// ~100ns receive latency instead of ~10us through the socket stack
#include <rte_ethdev.h>

void dpdk_receive_loop() {
    struct rte_mbuf* bufs[32];
    while (running) {
        uint16_t nb_rx = rte_eth_rx_burst(port_id, 0, bufs, 32);
        for (int i = 0; i < nb_rx; ++i) {
            process_packet(rte_pktmbuf_mtod(bufs[i], uint8_t*));
            rte_pktmbuf_free(bufs[i]);
        }
    }
}
```
Challenge 2: Hardware Timestamping
Use NIC hardware timestamps for sub-microsecond accuracy:
```cpp
// Enable hardware timestamping on the NIC
struct hwtstamp_config cfg = {
    .tx_type = HWTSTAMP_TX_ON,
    .rx_filter = HWTSTAMP_FILTER_ALL,
};
ioctl(socket_fd, SIOCSHWTSTAMP, &cfg);

// Read hardware timestamp from packet (ts[2] holds the raw HW timestamp)
struct scm_timestamping* ts = (struct scm_timestamping*)CMSG_DATA(cmsg);
uint64_t hw_ns = ts->ts[2].tv_sec * 1000000000ull + ts->ts[2].tv_nsec;
```
Challenge 3: Order Book Reconstruction
Build order book from L3 market data (individual orders, not just levels):
```cpp
struct L3OrderBook {
    struct Order {
        uint64_t order_id;
        int64_t price;
        int32_t quantity;
        uint64_t timestamp;
    };

    // Map from order_id to order details
    std::unordered_map<uint64_t, Order> orders_;

    // Price levels with FIFO order queues: bids sorted high-to-low and
    // asks low-to-high, so begin() is always the best level on each side
    std::map<int64_t, std::deque<uint64_t>, std::greater<>> bid_levels_;
    std::map<int64_t, std::deque<uint64_t>> ask_levels_;

    void add_order(const Order& order, Side side);
    void modify_order(uint64_t order_id, int32_t new_quantity);
    void delete_order(uint64_t order_id);
};
```
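The add/delete bookkeeping can be sketched as follows. This is a self-contained illustration (the `L3Book` name and minimal `Entry` fields are assumptions, not the project's required interface), with the maps ordered so `begin()` is always the best price on each side:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <deque>
#include <functional>
#include <map>
#include <unordered_map>

enum class Side { Buy, Sell };

// Minimal L3 book: every resting order is tracked individually, and each
// price level keeps a FIFO queue of order ids (price-time priority).
class L3Book {
    struct Entry { int64_t price; int32_t qty; Side side; };
    std::unordered_map<uint64_t, Entry> orders_;
    std::map<int64_t, std::deque<uint64_t>, std::greater<>> bids_;  // best = highest
    std::map<int64_t, std::deque<uint64_t>> asks_;                  // best = lowest
public:
    void add(uint64_t id, Side side, int64_t price, int32_t qty) {
        orders_[id] = {price, qty, side};
        if (side == Side::Buy) bids_[price].push_back(id);
        else                   asks_[price].push_back(id);
    }

    void erase(uint64_t id) {
        auto it = orders_.find(id);
        if (it == orders_.end()) return;
        auto remove_from = [&](auto& levels) {
            auto lvl = levels.find(it->second.price);
            auto& q = lvl->second;
            q.erase(std::find(q.begin(), q.end(), id));
            if (q.empty()) levels.erase(lvl);  // Drop the emptied price level
        };
        if (it->second.side == Side::Buy) remove_from(bids_);
        else                              remove_from(asks_);
        orders_.erase(it);
    }

    bool best_bid(int64_t& px) const {
        if (bids_.empty()) return false;
        px = bids_.begin()->first;
        return true;
    }
};
```

The linear `std::find` inside a level is acceptable here because deletes usually hit the front of the queue; a production book would store each order's queue position to make deletion O(1).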
Challenge 4: Market Maker Strategy
Implement a market-making strategy that quotes both sides:
```cpp
Task<void> market_maker_strategy(
    MarketDataFeed& feed,
    RiskManager& risk,
    OrderRouter& router
) {
    constexpr double SPREAD = 0.001;  // 10 basis points
    constexpr int32_t SIZE = 100;
    std::optional<uint64_t> bid_order_id, ask_order_id;

    while (true) {
        auto quote = co_await NextTick{feed};

        // Cancel existing orders
        if (bid_order_id) co_await router.cancel(*bid_order_id);
        if (ask_order_id) co_await router.cancel(*ask_order_id);

        // Calculate new prices around the mid
        double mid = (quote.bid + quote.ask) / 2.0;
        int64_t bid_price = static_cast<int64_t>(mid * (1 - SPREAD / 2));
        int64_t ask_price = static_cast<int64_t>(mid * (1 + SPREAD / 2));

        // Quote both sides while within the position limit
        if (risk.check_position(quote.symbol) < POSITION_LIMIT) {
            auto bid_result = co_await router.submit(
                Order{BUY, quote.symbol, SIZE, bid_price}
            );
            bid_order_id = bid_result.order_id;
            auto ask_result = co_await router.submit(
                Order{SELL, quote.symbol, SIZE, ask_price}
            );
            ask_order_id = ask_result.order_id;
        }
    }
}
```
Challenge 5: Distributed HFT (Multi-Machine)
Scale across multiple machines with RDMA:
```
Machine A (Market Data)          Machine B (Strategy)
        |                                |
        |     RDMA write (no CPU)        |
        +------------------------------->|
                ~1 us latency
```
Resources
Primary Reading
| Topic | Book/Resource | Chapter/Section |
|---|---|---|
| Lock-free programming | “C++ Concurrency in Action” by Anthony Williams | Chapters 5-7 |
| SIMD programming | “Intel Intrinsics Guide” | AVX2 reference |
| Coroutines | “C++ Concurrency in Action” 2nd Ed | Chapter 13 |
| Market microstructure | “Trading and Exchanges” by Larry Harris | Chapters 1-6 |
| Low-latency design | “Trading and Exchanges” by Larry Harris | Chapter 16 |
| Memory model | “C++ Concurrency in Action” | Chapter 5 |
CppCon Talks (Essential Viewing)
| Talk | Speaker | Year | Topic |
|---|---|---|---|
| “Designing Low-Latency Systems” | Carl Cook | 2017 | HFT architecture |
| “When a Microsecond Is an Eternity” | Carl Cook | 2019 | Low-latency patterns |
| “Lock-Free Programming” | Herb Sutter | 2014 | Lock-free fundamentals |
| “C++ Atomics, From Basic to Advanced” | Fedor Pikus | 2017 | Memory model deep dive |
| “SIMD: From Zero to Hero” | Victor Ciura | 2018 | SIMD optimization |
Tools
| Tool | Purpose |
|---|---|
| `perf` | Linux profiling (cycles, cache misses) |
| `flamegraph` | Visualize CPU time distribution |
| `valgrind --tool=cachegrind` | Cache simulation |
| Google Benchmark | Microbenchmarking |
| Catch2/GoogleTest | Unit testing |
| `rdtsc` | Nanosecond-precision timing |
Online Resources
- Intel Intrinsics Guide: https://software.intel.com/sites/landingpage/IntrinsicsGuide/
- Preshing on Programming: https://preshing.com/archives/
- Lock-Free Programming: https://www.1024cores.net/
- LMAX Disruptor Paper: https://lmax-exchange.github.io/disruptor/
Self-Assessment Checklist
Before considering this project complete, verify you can:
Architecture & Design
- Draw the complete system architecture from memory
- Explain why each component uses its specific concurrency primitive
- Justify the thread/core assignment strategy
- Describe the data flow from market data to order submission
Lock-Free Programming
- Implement SPSC queue from scratch
- Explain acquire-release semantics and when to use them
- Identify and fix false sharing issues
- Explain why the queue is wait-free for single producer/consumer
SIMD Optimization
- Convert array-of-structures to structure-of-arrays
- Write SIMD price comparison code
- Explain when SIMD helps and when it doesn’t
- Measure SIMD vs scalar speedup
Coroutines
- Implement a basic `Task` coroutine type
- Create custom awaitables for market data and order submission
- Explain coroutine frame allocation and how to optimize it
- Convert callback-based code to coroutine-based
Low-Latency Patterns
- Implement object pool for zero-allocation
- Apply core pinning and measure impact
- Identify and eliminate branches in hot paths
- Use cache prefetching effectively
Performance Measurement
- Measure latency at P50/P99/P99.9
- Use perf to identify hot spots
- Measure cache miss rates
- Create latency histograms
Interview Readiness
- Explain HFT system architecture in 2 minutes
- Answer questions about lock-free vs wait-free
- Discuss tradeoffs in low-latency design
- Explain why C++ (not Java/Python) for HFT
Submission/Completion Criteria
Minimum Viable Product (8 weeks)
To consider Phase 1-4 complete:
- Functional Requirements:
- System reads market data from file/generator
- Order book updates correctly on quotes
- At least one strategy generates signals
- Risk manager enforces position limits
- Orders submitted to simulated exchange
- Performance Requirements:
- Quote-to-trade latency P99 < 50 microseconds
- Throughput > 100,000 quotes/second
- Zero allocations in critical path (verified)
- Code Quality:
- Unit tests for all core components
- Integration tests for end-to-end flow
- Benchmarks for latency and throughput
- Documentation of architecture
Full Completion (12 weeks)
To consider the project fully complete:
- All MVP requirements plus:
- Quote-to-trade latency P99 < 10 microseconds
- Throughput > 1 million quotes/second
- Multiple strategies running concurrently
- Core pinning and NUMA awareness implemented
- Advanced Features (at least 2):
- SIMD order matching
- Coroutine-based strategies
- Latency histogram with percentile reporting
- Flame graph analysis completed
- Documentation:
- Architecture design document
- Performance optimization log
- Lessons learned summary
Final Thoughts
This capstone project integrates everything from the C++ Concurrency learning path:
- Projects 1-2: Threading fundamentals, futures/promises
- Projects 3-4: Thread pools, custom synchronization primitives
- Projects 5-6: Lock-free data structures
- Projects 7-8: Parallel algorithms
- Projects 9-11: Coroutines
- Projects 12-14: SIMD
- Projects 15-16: Actor model, distributed systems
Building an HFT simulator teaches you that performance engineering is about understanding your hardware: CPU caches, memory bandwidth, branch prediction, and SIMD lanes. It is about measuring obsessively and optimizing surgically.
Even if you never work in finance, these skills transfer directly to:
- Game engines (physics, rendering)
- Database engines (query execution)
- Network infrastructure (packet processing)
- Audio/video processing (real-time)
- Any system where latency matters
The journey from “working” to “fast” to “really fast” is where you become an expert.
Previous Project: P16 - Distributed Task Scheduler
Back to: C++ Concurrency Learning Guide