Project 1: Limit Order Book Engine
Build the fundamental data structure of every exchange and trading system - a fully functional limit order book that maintains bid/ask levels, handles order matching, and reports trades with nanosecond-level performance.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Intermediate |
| Time Estimate | 1-2 weeks |
| Language | C++ or Rust |
| Prerequisites | Basic C++/Rust, understanding of maps/queues |
| Key Topics | Cache optimization, data structures, memory layout, order matching |
| Coolness Level | Level 4: Hardcore Tech Flex |
| Portfolio Value | Resume Gold |
1. Learning Objectives
By completing this project, you will:
- Understand limit order book mechanics: Explain how bids, asks, price levels, and order queues work in real exchanges
- Master cache-friendly data structure design: Implement data layouts that minimize cache misses and maximize throughput
- Learn price-time priority matching: Build a matching algorithm that correctly executes orders following exchange rules
- Gain experience with memory pooling: Eliminate malloc/new from your hot path using custom allocation strategies
- Benchmark and optimize for nanosecond latency: Use profiling tools to identify bottlenecks and measure improvements
- Build the foundation for HFT systems: This project is the core component that all subsequent projects build upon
2. Theoretical Foundation
2.1 What is a Limit Order Book?
A limit order book (LOB) is the central data structure used by all electronic exchanges to match buyers with sellers. Every stock exchange (NYSE, NASDAQ), cryptocurrency exchange (Coinbase, Binance), and futures exchange (CME) maintains an order book for each tradable instrument.
The order book contains two sides:
LIMIT ORDER BOOK: AAPL
┌─────────────────────────────────────────────────┐
│ ASKS (SELLS) │
│ │
│ Price │ Quantity │ Orders │
│ ─────────┼──────────┼───────────────────── │
│ $150.30 │ 2,500 │ [O5:500, O6:2000] │ ◄── Best Ask
│ $150.29 │ 1,200 │ [O4:1200] │
│ $150.28 │ 3,800 │ [O3:800, O7:3000] │
│ ─────────┴──────────┴───────────────────── │
│ │
│ ═══ SPREAD: $0.02 ═══ │
│ │
│ ─────────┬──────────┬───────────────────── │
│ $150.26 │ 4,100 │ [O1:1000, O2:100, ...] │ ◄── Best Bid
│ $150.25 │ 2,300 │ [O8:2300] │
│ $150.24 │ 5,600 │ [O9:1000, O10:4600] │
│ │
│ BIDS (BUYS) │
└─────────────────────────────────────────────────┘
Key Terminology:
| Term | Definition |
|---|---|
| Bid | An offer to buy at a specific price |
| Ask (Offer) | An offer to sell at a specific price |
| Best Bid | Highest price any buyer is willing to pay |
| Best Ask | Lowest price any seller is willing to accept |
| Spread | Difference between best ask and best bid |
| Price Level | All orders at the same price |
| Depth | Total quantity available at each price level |
| Liquidity | How easily you can trade without moving the price |
2.2 Price-Time Priority (FIFO)
When multiple orders exist at the same price level, they are executed in the order they arrived. This is called price-time priority or FIFO (First In, First Out).
Price Level $150.26: [Order A: 9:00:00.001] -> [Order B: 9:00:00.003] -> [Order C: 9:00:00.007]
^ ^
Arrived first Arrived second
(Matched first) (Matched second)
Why This Matters: Speed is everything. If you can submit your order 1 microsecond faster than a competitor, you get priority at that price level. This is why HFT firms spend billions on infrastructure.
2.3 Order Types and Lifecycle
Order Types You Must Support:
- Limit Order: Buy/sell at a specific price or better
BUY 100 @ $150.25- Will only execute at $150.25 or lowerSELL 100 @ $150.30- Will only execute at $150.30 or higher
- Market Order (optional for this project): Buy/sell immediately at best available price
- Takes whatever price is available
- Guaranteed execution, uncertain price
- Cancel Order: Remove a previously submitted order
- Must be O(1) or O(log n) to avoid exploitation
Order Lifecycle:
Order Lifecycle State Machine
┌───────┐ ┌──────────┐ ┌─────────────┐
│ NEW │────►│ PENDING │────►│ FILLED │
└───────┘ └──────────┘ └─────────────┘
│ │ ▲
│ │ │
│ ▼ │
│ ┌──────────┐ │
│ │ PARTIAL │───────────┘
│ └──────────┘
│ │
│ ▼
▼ ┌──────────┐
┌───────┐ │ CANCELED │
│REJECTED│ └──────────┘
└───────┘
NEW: Order received but not yet processed
PENDING: Order resting in book, waiting for match
PARTIAL: Some quantity filled, remainder still in book
FILLED: All quantity matched, order complete
CANCELED: Order removed before full execution
REJECTED: Order failed validation (invalid price, etc.)
2.4 Matching Algorithm
When a new order arrives, the matching engine attempts to match it against existing orders on the opposite side:
MATCHING ALGORITHM: New Buy Order @ $150.28 for 1,500 shares
Step 1: Check if order crosses the spread
Best Ask: $150.26 (1,000 shares)
New Buy: $150.28 (1,500 shares)
$150.28 >= $150.26 ✓ (Order crosses!)
Step 2: Match against best ask
Match: 1,000 shares @ $150.26
Trade generated: BUY 1,000 @ $150.26
Remaining: 500 shares to fill
Step 3: Move to next price level
Next Ask: $150.27 (800 shares)
Match: 500 shares @ $150.27
Trade generated: BUY 500 @ $150.27
Order fully filled!
Step 4: Update order book
- Remove filled orders from ask side
- Update price level quantities
- Broadcast market data update
Pseudocode:
function match_order(incoming_order):
while incoming_order.remaining_qty > 0:
# Get best opposite price level
opposite_level = get_best_level(opposite_side(incoming_order.side))
if opposite_level is None:
break # No more liquidity
if not prices_cross(incoming_order.price, opposite_level.price, incoming_order.side):
break # Order doesn't cross spread
# Match against orders at this level (FIFO order)
for resting_order in opposite_level.orders:
match_qty = min(incoming_order.remaining_qty, resting_order.remaining_qty)
generate_trade(incoming_order, resting_order, match_qty, opposite_level.price)
incoming_order.remaining_qty -= match_qty
resting_order.remaining_qty -= match_qty
if resting_order.remaining_qty == 0:
remove_order(resting_order)
if incoming_order.remaining_qty == 0:
break
# If level is empty, remove it
if opposite_level.total_qty == 0:
remove_level(opposite_level)
# If order has remaining quantity, add to book
if incoming_order.remaining_qty > 0:
add_to_book(incoming_order)
2.5 Why This Matters for HFT
The order book is the most performance-critical data structure in a trading system:
| Operation | Frequency | Latency Requirement | Why It Matters |
|---|---|---|---|
| Add Order | 10,000-100,000/sec | < 1 microsecond | Every strategy starts with placing orders |
| Cancel Order | 50,000-500,000/sec | < 500 nanoseconds | Rapid requoting requires instant cancellation |
| Match/Execute | Variable | < 1 microsecond | Slow matching = missed opportunities |
| Best Bid/Ask | 100,000+/sec | < 100 nanoseconds | Price check happens before every decision |
Industry Context:
- NASDAQ: Processes 1+ million messages per second per symbol
- CME: Sub-microsecond matching latency
- NYSE: Order book updates every 64 microseconds (consolidated feed)
- HFT Firms: Internal matching in 10-100 nanoseconds
Real Financial Impact:
Scenario: You're quoting both sides of AAPL
Your Quote: BID $150.00 x 1000 | ASK $150.02 x 1000
The market moves against you (AAPL drops)
You need to cancel your $150.00 bid BEFORE someone sells to you
If your cancel takes 10 microseconds and a competitor's takes 1 microsecond:
- They cancel first, you get filled on stale quote
- Loss: potentially $0.10/share x 1000 shares = $100 per incident
- This happens thousands of times per day
2.6 Historical Context
Evolution of Order Books:
Timeline of Electronic Trading
1960s-1970s: Physical trading floors, open outcry
Order books maintained on paper by specialists
1971: NASDAQ launched - first electronic exchange
Order books stored in centralized computers
1980s-1990s: Electronic Communication Networks (ECNs)
Competition drives innovation
1998: SEC Regulation ATS enables more ECNs
Market fragmentation begins
2000s: Decimalization ($0.01 increments)
Algorithmic trading emerges
Latency becomes competitive advantage
2005: Reg NMS - National Best Bid Offer requirement
Order routing becomes complex
2010s: Microsecond wars
Co-location, FPGA-based trading
Flash crashes reveal fragility
2020s: Nanosecond precision
Quantum-resistant timing
AI-driven market making
2.7 Common Misconceptions
Misconception 1: “A red-black tree or std::map is good enough for production”
Reality: std::map operations involve pointer chasing and cache misses. Production order books use flat arrays with intrusive lists, achieving 10-50x better performance.
Misconception 2: “The matching algorithm is the hard part”
Reality: The matching algorithm is straightforward (just FIFO). The hard part is the data structure organization that makes matching fast.
Misconception 3: “Memory allocation is fine if it’s fast”
Reality: Even “fast” allocators like jemalloc add 50-200ns variance. Production HFT systems use pre-allocated pools with O(1) allocation.
Misconception 4: “Optimizations don’t matter until you have a working version”
Reality: Partially true for correctness, but architecture decisions made early (data layout, allocation strategy) are very hard to change later. Design for performance from the start.
Misconception 5: “Exchanges use complex AI for matching”
Reality: Matching is deliberately simple and deterministic. Price-time priority is a well-defined, auditable algorithm. Complexity lives in the data structure, not the algorithm.
3. Project Specification
3.1 What You Will Build
A limit order book engine that:
- Maintains sorted price levels for both bids (descending) and asks (ascending)
- Supports core order operations: Add, Cancel, Modify
- Implements price-time priority matching correctly
- Generates trade events when orders cross
- Reports market data: Best Bid/Offer, depth at each level
- Achieves sub-microsecond operation times (target: < 500ns for common operations)
3.2 Functional Requirements
| ID | Requirement | Priority |
|---|---|---|
| FR1 | Accept limit orders with: symbol, side (buy/sell), price, quantity, order_id | Must Have |
| FR2 | Match incoming orders against existing book using price-time priority | Must Have |
| FR3 | Cancel orders by order_id in O(1) or O(log n) time | Must Have |
| FR4 | Report best bid and best ask prices and quantities | Must Have |
| FR5 | Generate trade events with: buyer_id, seller_id, price, quantity, timestamp | Must Have |
| FR6 | Support multiple price levels (at least 1000 levels per side) | Must Have |
| FR7 | Maintain FIFO order within each price level | Must Have |
| FR8 | Handle order modification (price change, quantity change) | Should Have |
| FR9 | Support multiple symbols (order book per symbol) | Should Have |
| FR10 | Provide depth snapshot (top N levels) | Should Have |
3.3 Non-Functional Requirements
| Requirement | Target | Measurement |
|---|---|---|
| Add Order Latency | < 500ns (p50), < 2us (p99) | Measure with rdtsc or std::chrono::high_resolution_clock |
| Cancel Order Latency | < 200ns (p50), < 1us (p99) | Same measurement approach |
| Memory Usage | < 100 bytes per order | Count struct sizes, verify with sizeof |
| Throughput | > 1 million orders/second | Benchmark with sustained load |
| Determinism | No dynamic allocation in hot path | Profile heap usage during operation |
| Correctness | Price-time priority always maintained | Property-based testing |
3.4 Example Usage / Output
$ ./orderbook
=== LIMIT ORDER BOOK ENGINE ===
> ADD BUY 100 50.25 order_001
ACK order_001 ADDED @ 50.25 x 100
BOOK: BID 50.25 x 100 | ASK -- x --
> ADD BUY 200 50.24 order_002
ACK order_002 ADDED @ 50.24 x 200
BOOK: BID 50.25 x 100 | ASK -- x --
> ADD SELL 150 50.30 order_003
ACK order_003 ADDED @ 50.30 x 150
BOOK: BID 50.25 x 100 | ASK 50.30 x 150
> ADD SELL 50 50.25 order_004
ACK order_004 ADDED @ 50.25 x 50
TRADE order_001 order_004 50@50.25
ACK order_004 FILLED
BOOK: BID 50.25 x 50 | ASK 50.30 x 150
> CANCEL order_001
ACK order_001 CANCELED (remaining 50)
BOOK: BID 50.24 x 200 | ASK 50.30 x 150
> ADD SELL 300 50.20 order_005
ACK order_005 ADDED @ 50.20 x 300
TRADE order_002 order_005 200@50.24
PARTIAL order_005 200 filled, 100 remaining @ 50.20
BOOK: BID -- x -- | ASK 50.20 x 100
> DEPTH
=== ORDER BOOK DEPTH ===
ASKS
----
50.30 | 150 (1 orders)
50.20 | 100 (1 orders)
----
SPREAD: 0.00
----
BIDS
----
(empty)
> STATS
Operations: 5 adds, 1 cancel
Trades: 2 (250 shares)
Avg add latency: 342ns (p50), 891ns (p99)
Avg cancel latency: 156ns (p50), 412ns (p99)
Memory: 5 orders x 64 bytes = 320 bytes
3.5 Real World Outcome
When complete, you will have:
-
A functional matching engine that can process orders, generate trades, and maintain consistent book state
- Benchmark results demonstrating sub-microsecond latency:
=== BENCHMARK RESULTS === Add Order: 312ns (p50), 456ns (p95), 1.2us (p99) Cancel Order: 89ns (p50), 124ns (p95), 234ns (p99) Best Bid/Ask: 12ns (p50), 18ns (p95), 31ns (p99) Throughput: 2.4M orders/second sustained -
Understanding of why your design choices affect performance
- Foundation for Project 3 (Matching Engine) which adds networking
4. Solution Architecture
4.1 High-Level Design
ORDER BOOK ENGINE ARCHITECTURE
┌─────────────────────────────────────────────────────────────────────────┐
│ ORDER BOOK ENGINE │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ ORDER INPUT │ │
│ │ │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ Add Order │ │Cancel Order │ │Modify Order │ │ │
│ │ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │ │
│ │ │ │ │ │ │
│ └──────────┼──────────────────┼──────────────────┼──────────────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ ORDER VALIDATOR │ │
│ │ - Price > 0 │ │
│ │ - Quantity > 0 │ │
│ │ - Order ID unique │ │
│ │ - Side valid (BUY/SELL) │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ MATCHING ENGINE │ │
│ │ │ │
│ │ ┌─────────────────────────────────────────────────────────┐ │ │
│ │ │ ORDER BOOK (per symbol) │ │ │
│ │ │ │ │ │
│ │ │ ┌──────────────────┐ ┌──────────────────┐ │ │ │
│ │ │ │ BIDS (Buy) │ │ ASKS (Sell) │ │ │ │
│ │ │ │ (desc by price) │ │ (asc by price) │ │ │ │
│ │ │ │ │ │ │ │ │ │
│ │ │ │ PriceLevel 50.26 │ │ PriceLevel 50.28 │ │ │ │
│ │ │ │ └─ Order Queue │ │ └─ Order Queue │ │ │ │
│ │ │ │ PriceLevel 50.25 │ │ PriceLevel 50.29 │ │ │ │
│ │ │ │ └─ Order Queue │ │ └─ Order Queue │ │ │ │
│ │ │ │ ... │ │ ... │ │ │ │
│ │ │ └──────────────────┘ └──────────────────┘ │ │ │
│ │ │ │ │ │
│ │ │ ┌──────────────────────────────────────────────┐ │ │ │
│ │ │ │ ORDER ID -> ORDER* MAP │ │ │ │
│ │ │ │ (for O(1) cancel/modify by order_id) │ │ │ │
│ │ │ └──────────────────────────────────────────────┘ │ │ │
│ │ │ │ │ │
│ │ └─────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ EVENT OUTPUT │ │
│ │ │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │
│ │ │Order ACK │ │ Trade │ │ Cancel │ │Market │ │ │
│ │ │ │ │ Event │ │ ACK │ │Data │ │ │
│ │ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │ │
│ │ │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ MEMORY POOL │ │
│ │ Pre-allocated Order objects for zero-allocation hot path │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘
4.2 Key Components
| Component | Responsibility | Performance Requirement |
|---|---|---|
| Order Validator | Validate order parameters before processing | O(1), < 50ns |
| Order Book | Maintain sorted price levels and order queues | O(log n) add, O(1) best price |
| Price Level | FIFO queue of orders at same price | O(1) add/remove from ends |
| Order Map | OrderID -> Order* for fast lookup | O(1) lookup |
| Matching Logic | Execute crossing orders, generate trades | O(m) where m = matched orders |
| Memory Pool | Pre-allocated order storage | O(1) allocate/deallocate |
| Event Publisher | Output trades, acks, market data | Non-blocking |
4.3 Data Structures
The choice of data structures is the most critical design decision. Let’s analyze the options:
Price Level Organization:
| Data Structure | Add Level | Remove Level | Find Best | Memory Overhead | Cache Behavior |
|---|---|---|---|---|---|
std::map<Price, Level> |
O(log n) | O(log n) | O(1) amortized | High (nodes) | Poor (pointer chase) |
| Sorted vector | O(n) | O(n) | O(1) | Low | Excellent |
| Skip list | O(log n) | O(log n) | O(1) | Medium | Medium |
| Array + hash map | O(1) | O(1) | O(1) | Medium | Excellent |
Recommended Approach: Price Level Array
For typical instruments, prices cluster in a range. We can use:
Price Level Array Design
Price Range: $100.00 - $200.00 at $0.01 increments = 10,000 possible levels
Array Size: 10,000 x sizeof(PriceLevel*) = 80KB (fits in L2 cache!)
Price Index = (price - base_price) / tick_size
┌────────────────────────────────────────────────────────────────┐
│ Index: 0 1 2 3 ... 9999 │
│ Price: 100.00 100.01 100.02 100.03 ... 199.99 │
│ │ │ │ │ ... │ │
│ ▼ ▼ ▼ ▼ ▼ │
│ NULL Level* NULL Level* ... NULL │
│ │ │ │
│ ▼ ▼ │
│ ┌─────┐ ┌─────┐ │
│ │Queue│ │Queue│ │
│ │O1→O2│ │O7→O8│ │
│ └─────┘ └─────┘ │
└────────────────────────────────────────────────────────────────┘
Best bid/ask tracking:
- best_bid_index: index of highest non-empty bid level
- best_ask_index: index of lowest non-empty ask level
After removing best level, scan for next non-empty level
(Typically 1-3 indices away, so O(1) in practice)
Order Queue at Each Level:
For FIFO within a price level, use an intrusive doubly-linked list:
Intrusive Linked List Design
Instead of:
std::list<Order*> // Allocates ListNode objects
Use intrusive:
struct Order {
Order* prev; // Part of Order struct itself
Order* next; // No separate allocation!
uint64_t id;
uint32_t qty;
// ...
};
Price Level:
┌─────────────────────────────────────────────────────────────┐
│ head ──► Order_A ◄──► Order_B ◄──► Order_C ◄── tail │
│ │ │ │
│ └────────── FIFO queue ───────────────┘ │
└─────────────────────────────────────────────────────────────┘
Operations:
Add to back: O(1) - tail->next = new_order; tail = new_order;
Remove any: O(1) - prev->next = next; next->prev = prev;
Get front: O(1) - return head;
Order ID Lookup:
For O(1) cancel by order_id:
// Option 1: Hash map (std::unordered_map)
std::unordered_map<OrderId, Order*> order_map;
// Pros: Works for any order ID range
// Cons: Allocation on insert, hash computation
// Option 2: Pre-allocated array (if order IDs are sequential)
std::vector<Order*> order_array; // Index = order_id
// Pros: O(1) with no hashing, cache-friendly
// Cons: Requires sequential/bounded order IDs
// Recommendation: Use hash map initially, optimize later if needed
4.4 Memory Layout
Cache efficiency requires careful struct layout:
// BAD: Wasted space due to alignment padding
struct OrderBad {
char side; // 1 byte
// 7 bytes padding
double price; // 8 bytes
uint32_t quantity; // 4 bytes
// 4 bytes padding
uint64_t order_id; // 8 bytes
Order* next; // 8 bytes
Order* prev; // 8 bytes
uint64_t timestamp; // 8 bytes
};
// sizeof = 56 bytes (with 11 bytes wasted on padding!)
// GOOD: Ordered by size to minimize padding
struct OrderGood {
uint64_t order_id; // 8 bytes
uint64_t timestamp; // 8 bytes
double price; // 8 bytes (or use int64_t fixed-point)
Order* next; // 8 bytes
Order* prev; // 8 bytes
uint32_t quantity; // 4 bytes
char side; // 1 byte
char padding[3]; // explicit padding to 8-byte boundary
};
// sizeof = 48 bytes (no wasted padding)
// BEST: Fixed-point price, minimal size
struct OrderBest {
Order* next; // 8 bytes
Order* prev; // 8 bytes
uint64_t order_id; // 8 bytes
int64_t price_ticks; // 8 bytes (price in tick units)
uint32_t quantity; // 4 bytes
uint16_t symbol_id; // 2 bytes
uint8_t side; // 1 byte
uint8_t flags; // 1 byte
};
// sizeof = 40 bytes, all useful data
Fixed-Point Pricing:
Floating-point has issues:
- Comparison can be imprecise
- Slower on some CPUs
- Wasted bits for unused precision
Use fixed-point instead:
// Price = 150.25, tick size = 0.01
// Stored as: 15025 (integer)
// To convert:
int64_t price_to_ticks(double price, double tick_size) {
return static_cast<int64_t>(price / tick_size + 0.5);
}
double ticks_to_price(int64_t ticks, double tick_size) {
return ticks * tick_size;
}
// Comparisons are now exact integer comparisons!
4.5 Algorithm Overview
Add Order Algorithm:
function add_order(order):
// Step 1: Validate
if not valid(order):
return REJECTED
// Step 2: Check for match
if order.side == BUY:
while order.remaining_qty > 0 AND best_ask exists AND order.price >= best_ask.price:
match_at_level(order, best_ask_level)
else: // SELL
while order.remaining_qty > 0 AND best_bid exists AND order.price <= best_bid.price:
match_at_level(order, best_bid_level)
// Step 3: If remaining quantity, add to book
if order.remaining_qty > 0:
level = get_or_create_level(order.side, order.price)
level.add_order(order)
order_map[order.id] = order
update_best_price(order.side)
return order.status
function match_at_level(incoming, level):
for resting_order in level.orders: // FIFO order
match_qty = min(incoming.remaining_qty, resting_order.remaining_qty)
emit_trade(incoming, resting_order, match_qty, level.price)
incoming.remaining_qty -= match_qty
resting_order.remaining_qty -= match_qty
if resting_order.remaining_qty == 0:
level.remove_order(resting_order)
order_map.remove(resting_order.id)
order_pool.deallocate(resting_order)
if incoming.remaining_qty == 0:
break
if level.is_empty():
remove_level(level)
Cancel Order Algorithm:
function cancel_order(order_id):
order = order_map.get(order_id)
if order is None:
return NOT_FOUND
level = get_level(order.side, order.price)
level.remove_order(order) // O(1) with doubly-linked list
if level.is_empty():
remove_level(level)
update_best_price(order.side)
order_map.remove(order_id)
order_pool.deallocate(order)
return CANCELED
5. Implementation Guide
5.1 Development Environment Setup
C++ Setup:
# Ubuntu/Debian
sudo apt-get update
sudo apt-get install g++ cmake gdb valgrind linux-tools-generic
# Verify compiler supports C++17
g++ --version # Need 7+ for C++17
# Install perf for profiling (Linux)
sudo apt-get install linux-tools-common
# Create project
mkdir -p orderbook/{src,include,tests,benchmarks}
cd orderbook
Rust Setup:
# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source $HOME/.cargo/env
# Create project
cargo new orderbook
cd orderbook
# Add dependencies to Cargo.toml
# [dependencies]
# criterion = "0.5" # For benchmarking
5.2 Project Structure
C++ Structure:
orderbook/
├── CMakeLists.txt
├── src/
│ ├── main.cpp # CLI entry point
│ ├── orderbook.cpp # Order book implementation
│ ├── matching_engine.cpp # Matching logic
│ └── memory_pool.cpp # Custom allocator (Phase 2)
├── include/
│ ├── order.hpp # Order struct
│ ├── orderbook.hpp # OrderBook class
│ ├── price_level.hpp # PriceLevel with order queue
│ ├── matching_engine.hpp # Matching interface
│ └── memory_pool.hpp # Pool allocator
├── tests/
│ ├── test_orderbook.cpp
│ └── test_matching.cpp
├── benchmarks/
│ └── bench_orderbook.cpp
└── README.md
Rust Structure:
orderbook/
├── Cargo.toml
├── src/
│ ├── main.rs # CLI entry point
│ ├── lib.rs # Library root
│ ├── order.rs # Order struct
│ ├── orderbook.rs # OrderBook implementation
│ ├── price_level.rs # PriceLevel
│ └── matching.rs # Matching logic
├── tests/
│ └── integration_tests.rs
├── benches/
│ └── orderbook_bench.rs
└── README.md
5.3 The Core Question You’re Answering
“How do you organize price levels and orders such that the most common operations (add order, match, cancel) are O(1) or O(log n) while maintaining cache efficiency?”
This question forces you to confront the reality that big-O notation alone is insufficient for HFT. An O(log n) tree operation with poor cache behavior can be slower than an O(n) scan of a cache-resident array.
The answer involves:
- Choosing data structures that keep hot data together in memory
- Minimizing pointer indirection (each pointer chase is a potential cache miss)
- Pre-allocating memory to avoid allocation in the hot path
- Using integer arithmetic (fixed-point prices) instead of floating-point
5.4 Concepts You Must Understand First
Before starting implementation, verify your understanding:
| Concept | Self-Assessment Question | Where to Learn |
|---|---|---|
| Hash maps | What is the expected time complexity of insert/lookup? What happens during a resize? | CS:APP Ch. 6, Language docs |
| Linked lists | How do you remove a node from a doubly-linked list in O(1)? | Any data structures textbook |
| CPU cache hierarchy | What is a cache line? Why does data locality matter? | CS:APP Ch. 6 |
| Struct alignment | Why does the order of struct members affect sizeof? | CS:APP Ch. 3.9 |
| Memory allocation | Why is malloc slow? What is a memory pool? | CS:APP Ch. 9 |
| Fixed-point arithmetic | How do you represent $150.25 as an integer? | Any numerics reference |
5.5 Questions to Guide Your Design
Data Structure Questions:
- How will you represent prices internally? (float, double, int64 ticks?)
- What data structure will hold all price levels? (map, vector, array?)
- How will you maintain FIFO order within a price level?
- How will you enable O(1) cancel-by-order-id?
- How will you track the best bid and best ask efficiently?
Memory Questions:
- Will you allocate Order objects on demand or pre-allocate?
- How much memory will your order book use with 100,000 orders?
- What is the size of your Order struct after considering alignment?
- Will your data structures fit in L1/L2/L3 cache?
Algorithm Questions:
- When a new buy order matches multiple sell orders, in what order are they filled?
- After filling the best ask level completely, how do you find the new best ask?
- What happens if a cancel arrives for an order that’s currently being matched?
- How do you handle partial fills?
Performance Questions:
- What operations happen most frequently? (add, cancel, match?)
- Where will cache misses likely occur in your design?
- How will you benchmark your implementation?
- What is your target latency for each operation?
5.6 Thinking Exercise
Before writing any code, work through this scenario by hand:
Scenario:
Start with an empty order book. Process these orders in sequence:
1. ADD BUY 100 @ 50.25 order_A
2. ADD BUY 200 @ 50.24 order_B
3. ADD BUY 150 @ 50.25 order_C
4. ADD SELL 50 @ 50.30 order_D
5. ADD SELL 300 @ 50.25 order_E
6. CANCEL order_B
7. ADD SELL 100 @ 50.20 order_F
Questions to answer on paper:
-
After step 3, draw the bid side of the book. Which orders are at price 50.25, and in what order?
-
After step 4, what does the book look like? What trades (if any) were generated?
-
After step 5, what trades were generated? List each trade with price, quantity, buyer, seller.
-
After step 5, what is the state of order_A? Order_C? Order_E?
-
After step 6, draw the full book (bids and asks).
-
After step 7, what trades were generated? What is the final book state?
Verify your answers:
- Total shares bought should equal total shares sold
- Every trade should have happened at the price of the resting order
- FIFO should have been maintained within each price level
5.7 Hints in Layers
If you’re stuck, reveal hints one at a time:
Hint 1 - Starting Direction
Start with the simplest possible implementation using standard library containers:
// C++
std::map<int64_t, std::list<Order*>> bids; // price -> orders (descending)
std::map<int64_t, std::list<Order*>> asks; // price -> orders (ascending)
std::unordered_map<uint64_t, Order*> order_lookup; // id -> order
// Rust
use std::collections::{BTreeMap, VecDeque, HashMap};
bids: BTreeMap<i64, VecDeque<Order>> // Reverse for descending
asks: BTreeMap<i64, VecDeque<Order>>
order_lookup: HashMap<u64, OrderRef>
Get correctness first. Optimize later.
Hint 2 - More Specific Guidance
For the matching logic, process orders in this sequence:
- Check if incoming order crosses the spread
- Buy order crosses if: buy_price >= best_ask_price
- Sell order crosses if: sell_price <= best_bid_price
- If it crosses, iterate through opposing levels
while (remaining_qty > 0 && crosses_spread(order)) { auto& best_level = get_best_opposing_level(order.side); match_against_level(order, best_level); } - Match against individual orders in FIFO order
void match_against_level(Order& incoming, Level& level) { while (!level.orders.empty() && incoming.remaining > 0) { Order& resting = level.orders.front(); uint32_t match_qty = std::min(incoming.remaining, resting.remaining); execute_trade(incoming, resting, match_qty, level.price); // ... update remaining quantities, remove filled orders } } - After matching, add remainder to book if any
Hint 3 - Technical Details
Price Level Array Implementation (for optimization phase):
class OrderBook {
static constexpr int64_t MIN_PRICE = 1; // $0.01
static constexpr int64_t MAX_PRICE = 1000000; // $10,000.00
static constexpr size_t NUM_LEVELS = MAX_PRICE - MIN_PRICE + 1;
std::array<PriceLevel*, NUM_LEVELS> bid_levels{};
std::array<PriceLevel*, NUM_LEVELS> ask_levels{};
int64_t best_bid_price = 0; // 0 = no bids
int64_t best_ask_price = MAX_PRICE + 1; // MAX+1 = no asks
size_t price_to_index(int64_t price) {
return static_cast<size_t>(price - MIN_PRICE);
}
PriceLevel* get_bid_level(int64_t price) {
return bid_levels[price_to_index(price)];
}
void update_best_bid_after_remove(int64_t removed_price) {
if (removed_price != best_bid_price) return;
// Scan downward for next non-empty level
for (int64_t p = removed_price - 1; p >= MIN_PRICE; --p) {
if (bid_levels[price_to_index(p)] != nullptr) {
best_bid_price = p;
return;
}
}
best_bid_price = 0; // No bids
}
};
Intrusive List for O(1) Order Removal:
struct Order {
Order* prev = nullptr;
Order* next = nullptr;
// ... other fields
};
class OrderQueue {
Order* head = nullptr;
Order* tail = nullptr;
public:
void push_back(Order* order) {
if (tail) {
tail->next = order;
order->prev = tail;
tail = order;
} else {
head = tail = order;
}
}
void remove(Order* order) {
if (order->prev) order->prev->next = order->next;
else head = order->next;
if (order->next) order->next->prev = order->prev;
else tail = order->prev;
order->prev = order->next = nullptr;
}
};
Hint 4 - Tools and Debugging
Benchmarking:
#include <chrono>
auto start = std::chrono::high_resolution_clock::now();
// ... operation
auto end = std::chrono::high_resolution_clock::now();
auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
For more accurate timing (x86):
#include <x86intrin.h>
uint64_t start = __rdtsc();
// ... operation
uint64_t end = __rdtsc();
uint64_t cycles = end - start;
// Convert to ns: cycles * (1.0 / cpu_freq_ghz)
Profiling with perf:
# Record performance counters
perf stat -e cycles,instructions,cache-misses,cache-references ./orderbook_bench
# Record call graph for flamegraph
perf record -g ./orderbook_bench
perf report
Checking struct sizes:
static_assert(sizeof(Order) == 40, "Order size unexpected");
std::cout << "Order size: " << sizeof(Order) << " bytes\n";
std::cout << "Order alignment: " << alignof(Order) << "\n";
Memory allocation tracking:
# Use valgrind massif for heap profiling
valgrind --tool=massif ./orderbook_bench
ms_print massif.out.*
5.8 The Interview Questions They’ll Ask
After completing this project, you should be able to answer:
- “Explain how a limit order book works.”
- Expected: Clear explanation of bids, asks, price levels, FIFO matching
- Bonus: Mention spread, depth, liquidity concepts
- “What data structures would you use for an order book and why?”
- Expected: Discuss tradeoffs between maps, vectors, arrays
- Bonus: Mention cache efficiency, intrusive lists, memory pools
- “How do you handle order cancellation efficiently?”
- Expected: Hash map for O(1) lookup, doubly-linked list for O(1) removal
- Bonus: Discuss order ID generation strategies
- “What is price-time priority and why is it important?”
- Expected: FIFO within price level, incentivizes speed
- Bonus: Discuss alternative algorithms (pro-rata, size-priority)
- “How would you optimize an order book for low latency?”
- Expected: Memory pooling, cache optimization, fixed-point prices
- Bonus: Discuss false sharing, branch prediction, lock-free structures
- “What happens when a market order arrives?”
- Expected: Matches against best levels until filled or liquidity exhausted
- Bonus: Discuss market vs limit, IOC, FOK order types
- “How does your design scale with 1 million price levels?”
- Expected: Discuss when array-based breaks down, alternatives
- Bonus: Discuss sparse data structures, adaptive strategies
- “What would you do differently for production?”
- Expected: Acknowledge simplifications, discuss what production adds
- Bonus: Mention FIX protocol, persistence, replication
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Cache and memory hierarchy | Computer Systems: A Programmer’s Perspective | Ch. 6 |
| Memory layout and alignment | Computer Systems: A Programmer’s Perspective | Ch. 3.9 |
| Memory management and allocators | Computer Systems: A Programmer’s Perspective | Ch. 9 |
| C++ data structures | Building Low Latency Applications with C++ | Ch. 3-4 |
| Order book design | Building Low Latency Applications with C++ | Ch. 5-6 |
| Rust data structures | Programming Rust | Ch. 16 |
| Lock-free fundamentals (for later) | Rust Atomics and Locks | Ch. 1-3 |
| Trading system architecture | Developing High-Frequency Trading Systems | Ch. 1-4 |
5.10 Implementation Phases
Phase 1: Basic Functionality (Days 1-3)
Goals:
- Get a working order book with std::map (or BTreeMap in Rust)
- Implement add, cancel, and match
- Create basic CLI for testing
Tasks:
- Define Order struct with all necessary fields
- Create PriceLevel class with order queue (use std::list or VecDeque)
- Create OrderBook class with bid/ask maps
- Implement add_order() with no matching (just add to book)
- Implement cancel_order()
- Implement get_best_bid() and get_best_ask()
- Create simple CLI to test
Checkpoint: Can add orders, cancel them, and view book state.
Phase 2: Matching Logic (Days 4-5)
Goals:
- Implement price-time priority matching
- Generate trade events
- Handle partial fills
Tasks:
- Implement crosses_spread() check
- Implement match_against_level()
- Modify add_order() to call matching before adding to book
- Create Trade struct for trade events
- Handle order state changes (partial, filled)
- Test matching with various scenarios
Checkpoint: Orders match correctly, trades are generated, book updates properly.
Phase 3: Optimization - Memory (Days 6-8)
Goals:
- Eliminate heap allocation in hot path
- Switch to intrusive linked lists
- Implement memory pool for orders
Tasks:
- Measure baseline allocation with valgrind/massif
- Convert Order to use intrusive list pointers
- Implement simple fixed-size memory pool
- Integrate pool with order book
- Verify zero allocations during order processing
- Measure improvement
Checkpoint: Adding/canceling orders allocates no heap memory.
Phase 4: Optimization - Cache (Days 9-11)
Goals:
- Switch to array-based price level storage
- Optimize struct layout
- Use fixed-point prices
Tasks:
- Implement price-to-index conversion
- Create array-based level storage
- Implement best price tracking with scan
- Optimize Order struct layout for cache lines
- Switch to int64_t fixed-point prices
- Measure cache misses before/after
Checkpoint: Cache miss rate significantly reduced.
Phase 5: Benchmarking (Days 12-14)
Goals:
- Comprehensive latency measurement
- Throughput testing
- Document performance characteristics
Tasks:
- Create benchmark harness with rdtsc or chrono
- Measure add order latency (p50, p95, p99)
- Measure cancel latency
- Measure throughput (orders/second)
- Compare against baseline (Phase 1 version)
- Document results and analysis
Checkpoint: Performance targets met, results documented.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Price representation | float, double, int64 fixed-point | int64 fixed-point | Exact comparisons, smaller struct, faster |
| Level storage | std::map, sorted vector, array | Array (then map fallback) | O(1) access, cache-friendly |
| Order queue | std::list, std::deque, intrusive list | Intrusive doubly-linked | O(1) removal, no allocation |
| Order lookup | std::unordered_map, flat_hash_map | std::unordered_map (then optimize) | Standard, works for prototyping |
| Memory | malloc, memory pool | Memory pool (Phase 3) | Predictable, fast, no fragmentation |
6. Testing Strategy
6.1 Unit Tests
| Test Category | What to Test |
|---|---|
| Order struct | Correct initialization, size, alignment |
| PriceLevel | Add/remove orders, FIFO order, total quantity |
| OrderBook | Add to empty book, add to existing level, best bid/ask |
| Matching | Single match, multi-level match, partial fill |
| Cancel | Cancel existing, cancel non-existent, cancel partial |
6.2 Property-Based Tests
Use property testing (QuickCheck-style) to find edge cases:
Property 1: Total buy quantity in trades == Total sell quantity in trades
Property 2: Trade price is always between bid and ask prices
Property 3: After any sequence of operations, book is internally consistent
Property 4: Order of trades respects FIFO within each price level
Property 5: Canceled order is never matched in subsequent trades
6.3 Critical Test Cases
Test 1: Simple Match
ADD BUY 100 @ 50.00
ADD SELL 100 @ 50.00
Expected: Trade 100 @ 50.00, book empty
Test 2: Partial Fill
ADD BUY 100 @ 50.00
ADD SELL 50 @ 50.00
Expected: Trade 50 @ 50.00, bid of 50 @ 50.00 remains
Test 3: Multi-Level Match
ADD BUY 100 @ 50.00
ADD BUY 100 @ 49.00
ADD SELL 150 @ 48.00
Expected: Trade 100 @ 50.00, Trade 50 @ 49.00,
bid of 50 @ 49.00 remains, ask of 0
Test 4: FIFO Priority
ADD BUY 100 @ 50.00 order_A
ADD BUY 100 @ 50.00 order_B
ADD SELL 100 @ 50.00 order_C
Expected: Trade order_A-order_C 100 @ 50.00,
order_B still in book
Test 5: Cancel Before Match
ADD BUY 100 @ 50.00 order_A
CANCEL order_A
ADD SELL 100 @ 50.00 order_B
Expected: No trade, ask of 100 @ 50.00
Test 6: Many Price Levels
ADD 1000 buy orders at prices 50.00-50.99 (0.01 increments)
ADD SELL 100 @ 48.00
Expected: 100 trades, best bid drops appropriately
Test 7: High Volume
ADD 100,000 buy orders at random prices
ADD 100,000 sell orders at random prices
Verify: No crashes, book consistent, all trades valid
6.4 Stress Tests
Stress Test 1: Throughput
Add 1,000,000 orders as fast as possible
Measure orders per second
Stress Test 2: Mixed Workload
70% adds, 25% cancels, 5% queries
Run for 10 seconds
Measure latency distribution
Stress Test 3: Adversarial
Add orders at 10,000 different price levels
Cancel all of them
Verify no memory leaks
7. Common Pitfalls and Debugging
| Problem | Symptom | Cause | Solution | Verification |
|---|---|---|---|---|
| Wrong match order | Orders matched out of FIFO sequence | Using wrong data structure (set, unordered) | Use ordered container or explicit timestamp comparison | Log order arrival times, verify first-in matched first |
| Memory leaks | Memory usage grows over time | Orders not freed after fill/cancel | Implement memory pool, track allocations | Run with valgrind, verify count goes to zero |
| Cache thrashing | High latency variance | Pointer-heavy data structures | Switch to array-based, intrusive lists | Check cache misses with perf stat |
| Floating point bugs | $0.10 + $0.20 != $0.30 | Using float/double for prices | Use fixed-point integer representation | Test with known problematic values |
| Best price stale | Report wrong best bid/ask | Not updating after cancel | Scan for new best after level empties | Unit test cancel at best price |
| Quantity overflow | Negative or huge quantities | Not checking for overflow | Use checked arithmetic or assert bounds | Test with MAX_INT quantities |
| ID collision | Wrong order canceled | Order IDs reused incorrectly | Use monotonic incrementing IDs | Assert uniqueness on add |
| Race conditions (future) | Corrupt state | Concurrent access without locks | Single-threaded first, then add proper sync | Run thread sanitizer |
7.1 Debugging Strategies
Print the book state:
void debug_print_book() {
std::cout << "=== BIDS ===\n";
for (const auto& [price, level] : bids) {
std::cout << price << ": ";
for (const auto& order : level.orders) {
std::cout << "[" << order.id << ":" << order.qty << "] ";
}
std::cout << "\n";
}
// ... similar for asks
}
Validate invariants:
void validate_book() {
// All orders in book should be in lookup map
// All orders in lookup should be in book
// Best bid should be highest non-empty bid level
// Best ask should be lowest non-empty ask level
// No price level should be empty
// Total quantity at level should match sum of orders
}
Trace execution:
#define TRACE(msg) std::cerr << __FUNCTION__ << ":" << __LINE__ << " " << msg << "\n"
void add_order(Order& order) {
TRACE("order_id=" << order.id << " price=" << order.price);
// ...
}
8. Extensions and Challenges
8.1 Beginner Extensions
- Add modify order: Change price or quantity of existing order
- Market order support: IOC (Immediate or Cancel) execution
- Order history: Keep log of all orders and trades
- Pretty printing: Colorized book display
8.2 Intermediate Extensions
- Multiple symbols: Order book per symbol, symbol manager
- Order types: Stop orders, stop-limit, trailing stop
- Time in force: Day, GTC, GTD, FOK, IOC
- Self-trade prevention: Prevent same user matching with self
- ITCH protocol: Parse NASDAQ ITCH data to populate book
8.3 Advanced Extensions
- Lock-free order book: Concurrent reads while writes happening
- FPGA acceleration: Move hot path to hardware
- Persistent state: Recover book state after crash
- Market data feed: Publish L2/L3 data over UDP multicast
- FIX gateway: Accept orders via FIX protocol
9. Real-World Connections
9.1 How Production Systems Differ
| Aspect | This Project | Production System |
|---|---|---|
| Orders per second | 100,000 | 10,000,000+ |
| Latency target | < 1 microsecond | < 100 nanoseconds |
| Price levels | Thousands | Millions possible |
| Symbols | 1 | 10,000+ |
| Redundancy | None | Hot-hot replication |
| Persistence | None | Every order journaled |
| Connectivity | CLI | FIX, ITCH, proprietary |
| Monitoring | printf | Prometheus, Grafana, alerting |
9.2 Real Exchange Examples
NASDAQ Matching Engine:
- Processes 10+ million orders per day per symbol
- Latency: ~50 microseconds end-to-end
- Uses ITCH protocol for market data (binary, efficient)
- FIX for order entry
- Runs on commodity Linux servers (not FPGA)
CME Globex:
- Largest futures exchange
- Sub-millisecond matching
- Uses iLink protocol (binary)
- Co-location available for HFT firms
IEX (Investors Exchange):
- Famous for “speed bump” (350 microsecond delay)
- Designed to reduce HFT advantage
- Same order book mechanics, different latency profile
9.3 Open Source References
| Project | Description | Useful For |
|---|---|---|
| LMAX Disruptor | Lock-free ring buffer | Understanding low-latency patterns |
| OpenHFT Chronicle Queue | Persisted queue | Journaling patterns |
| quickfix | FIX engine | Protocol implementation |
| nautilus_trader | Python/Rust trading | Architecture reference |
10. Resources
10.1 Essential Reading
- “Building Low Latency Applications with C++” by Sourav Ghosh (Ch. 3-6)
- Step-by-step order book implementation
- C++ specific optimizations
- “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron (Ch. 6)
- Cache hierarchy and locality
- Why memory layout matters
10.2 Articles and Papers
- “Limit Order Book Implementation for Low Latency Trading” - Alex Abosi
- Data structure comparison
- Benchmarks
- “C++ Design Patterns for Low-Latency Applications” - arXiv
- Academic treatment
- Pattern catalog
10.3 GitHub Repositories
- aspone/OrderBook - C++ reference (300-600ns)
- CppTrader - Full trading system in C++
- matching-engine - Go implementation (for comparison)
10.4 Videos
- “CppCon: Building Low Latency Trading Systems” - Various years
- “LMAX Exchange Architecture” - Martin Thompson (InfoQ)
- “Lock-Free Programming” - Herb Sutter (Channel 9)
11. Self-Assessment Checklist
Understanding
- I can explain what a limit order book is without looking at notes
- I can describe price-time priority and why it matters
- I understand why cache efficiency is critical for HFT
- I can explain the tradeoffs between different data structures
- I know what fixed-point arithmetic is and when to use it
- I understand what intrusive data structures are
Implementation
- My order book correctly maintains sorted price levels
- Orders match in FIFO order within price levels
- Cancel is O(1) using order ID lookup
- No heap allocation occurs during normal operation (Phase 3+)
- My benchmark shows sub-microsecond latency
- All test cases pass
Growth
- I profiled my code and found at least one bottleneck
- I measured and can explain my cache miss rate
- I can read and understand the reference implementations
- I could explain my design decisions in an interview
12. Submission / Completion Criteria
Minimum Viable Completion
- Order book supports add, cancel, and match operations
- Matching follows price-time priority correctly
- Basic CLI demonstrates functionality
- At least 5 unit tests pass
- Code compiles without warnings
Full Completion
- All 10 test scenarios pass
- Memory pool eliminates hot-path allocation
- Add order latency < 1 microsecond (p99)
- Cancel order latency < 500 nanoseconds (p99)
- Throughput > 500,000 orders/second
- Cache miss rate documented
Excellence (Going Above and Beyond)
- Array-based price level storage implemented
- Latency < 500ns (p99) for all operations
- Multiple symbols supported
- Benchmark comparison with baseline documented
- ITCH or FIX protocol parsing integrated
This guide was expanded from HIGH_FREQUENCY_TRADING_CPP_RUST_LEARNING_PROJECTS.md. For the complete learning path, see the project index.