Project 1: Limit Order Book Engine

Build the fundamental data structure of every exchange and trading system - a fully functional limit order book that maintains bid/ask levels, handles order matching, and reports trades with nanosecond-level performance.


Quick Reference

Attribute Value
Difficulty Intermediate
Time Estimate 1-2 weeks
Language C++ or Rust
Prerequisites Basic C++/Rust, understanding of maps/queues
Key Topics Cache optimization, data structures, memory layout, order matching
Coolness Level Level 4: Hardcore Tech Flex
Portfolio Value Resume Gold

1. Learning Objectives

By completing this project, you will:

  1. Understand limit order book mechanics: Explain how bids, asks, price levels, and order queues work in real exchanges
  2. Master cache-friendly data structure design: Implement data layouts that minimize cache misses and maximize throughput
  3. Learn price-time priority matching: Build a matching algorithm that correctly executes orders following exchange rules
  4. Gain experience with memory pooling: Eliminate malloc/new from your hot path using custom allocation strategies
  5. Benchmark and optimize for nanosecond latency: Use profiling tools to identify bottlenecks and measure improvements
  6. Build the foundation for HFT systems: This project is the core component that all subsequent projects build upon

2. Theoretical Foundation

2.1 What is a Limit Order Book?

A limit order book (LOB) is the central data structure used by all electronic exchanges to match buyers with sellers. Every stock exchange (NYSE, NASDAQ), cryptocurrency exchange (Coinbase, Binance), and futures exchange (CME) maintains an order book for each tradable instrument.

The order book contains two sides:

                    LIMIT ORDER BOOK: AAPL

    ┌─────────────────────────────────────────────────┐
    │                    ASKS (SELLS)                  │
    │                                                  │
    │  Price    │ Quantity │ Orders                   │
    │  ─────────┼──────────┼─────────────────────    │
    │  $150.30  │   2,500  │ [O5:500, O6:2000]        │ ◄── Best Ask
    │  $150.29  │   1,200  │ [O4:1200]                │
    │  $150.28  │   3,800  │ [O3:800, O7:3000]        │
    │  ─────────┴──────────┴─────────────────────    │
    │                                                  │
    │              ═══ SPREAD: $0.02 ═══               │
    │                                                  │
    │  ─────────┬──────────┬─────────────────────    │
    │  $150.26  │   4,100  │ [O1:1000, O2:100, ...]   │ ◄── Best Bid
    │  $150.25  │   2,300  │ [O8:2300]                │
    │  $150.24  │   5,600  │ [O9:1000, O10:4600]      │
    │                                                  │
    │                    BIDS (BUYS)                   │
    └─────────────────────────────────────────────────┘

Key Terminology:

Term Definition
Bid An offer to buy at a specific price
Ask (Offer) An offer to sell at a specific price
Best Bid Highest price any buyer is willing to pay
Best Ask Lowest price any seller is willing to accept
Spread Difference between best ask and best bid
Price Level All orders at the same price
Depth Total quantity available at each price level
Liquidity How easily you can trade without moving the price

2.2 Price-Time Priority (FIFO)

When multiple orders exist at the same price level, they are executed in the order they arrived. This is called price-time priority or FIFO (First In, First Out).

Price Level $150.26: [Order A: 9:00:00.001] -> [Order B: 9:00:00.003] -> [Order C: 9:00:00.007]
                            ^                          ^
                      Arrived first              Arrived second
                      (Matched first)            (Matched second)

Why This Matters: Speed is everything. If you can submit your order 1 microsecond faster than a competitor, you get priority at that price level. This is why HFT firms spend billions on infrastructure.

2.3 Order Types and Lifecycle

Order Types You Must Support:

  1. Limit Order: Buy/sell at a specific price or better
    • BUY 100 @ $150.25 - Will only execute at $150.25 or lower
    • SELL 100 @ $150.30 - Will only execute at $150.30 or higher
  2. Market Order (optional for this project): Buy/sell immediately at best available price
    • Takes whatever price is available
    • Guaranteed execution, uncertain price
  3. Cancel Order: Remove a previously submitted order
    • Must be O(1) or O(log n) to avoid exploitation

Order Lifecycle:

                   Order Lifecycle State Machine

    ┌───────┐     ┌──────────┐     ┌─────────────┐
    │ NEW   │────►│ PENDING  │────►│   FILLED    │
    └───────┘     └──────────┘     └─────────────┘
        │              │                  ▲
        │              │                  │
        │              ▼                  │
        │         ┌──────────┐           │
        │         │ PARTIAL  │───────────┘
        │         └──────────┘
        │              │
        │              ▼
        ▼         ┌──────────┐
    ┌───────┐     │ CANCELED │
    │REJECTED│    └──────────┘
    └───────┘

    NEW:       Order received but not yet processed
    PENDING:   Order resting in book, waiting for match
    PARTIAL:   Some quantity filled, remainder still in book
    FILLED:    All quantity matched, order complete
    CANCELED:  Order removed before full execution
    REJECTED:  Order failed validation (invalid price, etc.)

2.4 Matching Algorithm

When a new order arrives, the matching engine attempts to match it against existing orders on the opposite side:

MATCHING ALGORITHM: New Buy Order @ $150.28 for 1,500 shares

Step 1: Check if order crosses the spread
        Best Ask: $150.26 (1,000 shares)
        New Buy:  $150.28 (1,500 shares)
        $150.28 >= $150.26 ✓ (Order crosses!)

Step 2: Match against best ask
        Match: 1,000 shares @ $150.26
        Trade generated: BUY 1,000 @ $150.26
        Remaining: 500 shares to fill

Step 3: Move to next price level
        Next Ask: $150.27 (800 shares)
        Match: 500 shares @ $150.27
        Trade generated: BUY 500 @ $150.27
        Order fully filled!

Step 4: Update order book
        - Remove filled orders from ask side
        - Update price level quantities
        - Broadcast market data update

Pseudocode:

function match_order(incoming_order):
    while incoming_order.remaining_qty > 0:
        # Get best opposite price level
        opposite_level = get_best_level(opposite_side(incoming_order.side))

        if opposite_level is None:
            break  # No more liquidity

        if not prices_cross(incoming_order.price, opposite_level.price, incoming_order.side):
            break  # Order doesn't cross spread

        # Match against orders at this level (FIFO order)
        for resting_order in opposite_level.orders:
            match_qty = min(incoming_order.remaining_qty, resting_order.remaining_qty)

            generate_trade(incoming_order, resting_order, match_qty, opposite_level.price)

            incoming_order.remaining_qty -= match_qty
            resting_order.remaining_qty -= match_qty

            if resting_order.remaining_qty == 0:
                remove_order(resting_order)

            if incoming_order.remaining_qty == 0:
                break

        # If level is empty, remove it
        if opposite_level.total_qty == 0:
            remove_level(opposite_level)

    # If order has remaining quantity, add to book
    if incoming_order.remaining_qty > 0:
        add_to_book(incoming_order)

2.5 Why This Matters for HFT

The order book is the most performance-critical data structure in a trading system:

Operation Frequency Latency Requirement Why It Matters
Add Order 10,000-100,000/sec < 1 microsecond Every strategy starts with placing orders
Cancel Order 50,000-500,000/sec < 500 nanoseconds Rapid requoting requires instant cancellation
Match/Execute Variable < 1 microsecond Slow matching = missed opportunities
Best Bid/Ask 100,000+/sec < 100 nanoseconds Price check happens before every decision

Industry Context:

  • NASDAQ: Processes 1+ million messages per second per symbol
  • CME: Sub-microsecond matching latency
  • NYSE: Order book updates every 64 microseconds (consolidated feed)
  • HFT Firms: Internal matching in 10-100 nanoseconds

Real Financial Impact:

Scenario: You're quoting both sides of AAPL
Your Quote: BID $150.00 x 1000 | ASK $150.02 x 1000

The market moves against you (AAPL drops)
You need to cancel your $150.00 bid BEFORE someone sells to you

If your cancel takes 10 microseconds and a competitor's takes 1 microsecond:
  - They cancel first, you get filled on stale quote
  - Loss: potentially $0.10/share x 1000 shares = $100 per incident
  - This happens thousands of times per day

2.6 Historical Context

Evolution of Order Books:

Timeline of Electronic Trading

1960s-1970s:    Physical trading floors, open outcry
                Order books maintained on paper by specialists

1971:           NASDAQ launched - first electronic exchange
                Order books stored in centralized computers

1980s-1990s:    Electronic Communication Networks (ECNs)
                Competition drives innovation

1998:           SEC Regulation ATS enables more ECNs
                Market fragmentation begins

2000s:          Decimalization ($0.01 increments)
                Algorithmic trading emerges
                Latency becomes competitive advantage

2005:           Reg NMS - National Best Bid Offer requirement
                Order routing becomes complex

2010s:          Microsecond wars
                Co-location, FPGA-based trading
                Flash crashes reveal fragility

2020s:          Nanosecond precision
                Quantum-resistant timing
                AI-driven market making

2.7 Common Misconceptions

Misconception 1: “A red-black tree or std::map is good enough for production”

Reality: std::map operations involve pointer chasing and cache misses. Production order books use flat arrays with intrusive lists, achieving 10-50x better performance.

Misconception 2: “The matching algorithm is the hard part”

Reality: The matching algorithm is straightforward (just FIFO). The hard part is the data structure organization that makes matching fast.

Misconception 3: “Memory allocation is fine if it’s fast”

Reality: Even “fast” allocators like jemalloc add 50-200ns variance. Production HFT systems use pre-allocated pools with O(1) allocation.

Misconception 4: “Optimizations don’t matter until you have a working version”

Reality: Partially true for correctness, but architecture decisions made early (data layout, allocation strategy) are very hard to change later. Design for performance from the start.

Misconception 5: “Exchanges use complex AI for matching”

Reality: Matching is deliberately simple and deterministic. Price-time priority is a well-defined, auditable algorithm. Complexity lives in the data structure, not the algorithm.


3. Project Specification

3.1 What You Will Build

A limit order book engine that:

  1. Maintains sorted price levels for both bids (descending) and asks (ascending)
  2. Supports core order operations: Add, Cancel, Modify
  3. Implements price-time priority matching correctly
  4. Generates trade events when orders cross
  5. Reports market data: Best Bid/Offer, depth at each level
  6. Achieves sub-microsecond operation times (target: < 500ns for common operations)

3.2 Functional Requirements

ID Requirement Priority
FR1 Accept limit orders with: symbol, side (buy/sell), price, quantity, order_id Must Have
FR2 Match incoming orders against existing book using price-time priority Must Have
FR3 Cancel orders by order_id in O(1) or O(log n) time Must Have
FR4 Report best bid and best ask prices and quantities Must Have
FR5 Generate trade events with: buyer_id, seller_id, price, quantity, timestamp Must Have
FR6 Support multiple price levels (at least 1000 levels per side) Must Have
FR7 Maintain FIFO order within each price level Must Have
FR8 Handle order modification (price change, quantity change) Should Have
FR9 Support multiple symbols (order book per symbol) Should Have
FR10 Provide depth snapshot (top N levels) Should Have

3.3 Non-Functional Requirements

Requirement Target Measurement
Add Order Latency < 500ns (p50), < 2us (p99) Measure with rdtsc or std::chrono::high_resolution_clock
Cancel Order Latency < 200ns (p50), < 1us (p99) Same measurement approach
Memory Usage < 100 bytes per order Count struct sizes, verify with sizeof
Throughput > 1 million orders/second Benchmark with sustained load
Determinism No dynamic allocation in hot path Profile heap usage during operation
Correctness Price-time priority always maintained Property-based testing

3.4 Example Usage / Output

$ ./orderbook

=== LIMIT ORDER BOOK ENGINE ===

> ADD BUY 100 50.25 order_001
ACK order_001 ADDED @ 50.25 x 100
BOOK: BID 50.25 x 100 | ASK -- x --

> ADD BUY 200 50.24 order_002
ACK order_002 ADDED @ 50.24 x 200
BOOK: BID 50.25 x 100 | ASK -- x --

> ADD SELL 150 50.30 order_003
ACK order_003 ADDED @ 50.30 x 150
BOOK: BID 50.25 x 100 | ASK 50.30 x 150

> ADD SELL 50 50.25 order_004
ACK order_004 ADDED @ 50.25 x 50
TRADE order_001 order_004 50@50.25
ACK order_004 FILLED
BOOK: BID 50.25 x 50 | ASK 50.30 x 150

> CANCEL order_001
ACK order_001 CANCELED (remaining 50)
BOOK: BID 50.24 x 200 | ASK 50.30 x 150

> ADD SELL 300 50.20 order_005
ACK order_005 ADDED @ 50.20 x 300
TRADE order_002 order_005 200@50.24
PARTIAL order_005 200 filled, 100 remaining @ 50.20
BOOK: BID -- x -- | ASK 50.20 x 100

> DEPTH
=== ORDER BOOK DEPTH ===
        ASKS
        ----
50.30 | 150 (1 orders)
50.20 | 100 (1 orders)
        ----
      SPREAD: 0.00
        ----
        BIDS
        ----
      (empty)

> STATS
Operations: 5 adds, 1 cancel
Trades: 2 (250 shares)
Avg add latency: 342ns (p50), 891ns (p99)
Avg cancel latency: 156ns (p50), 412ns (p99)
Memory: 5 orders x 64 bytes = 320 bytes

3.5 Real World Outcome

When complete, you will have:

  1. A functional matching engine that can process orders, generate trades, and maintain consistent book state

  2. Benchmark results demonstrating sub-microsecond latency:
    === BENCHMARK RESULTS ===
    Add Order:    312ns (p50), 456ns (p95), 1.2us (p99)
    Cancel Order: 89ns (p50), 124ns (p95), 234ns (p99)
    Best Bid/Ask: 12ns (p50), 18ns (p95), 31ns (p99)
    Throughput:   2.4M orders/second sustained
    
  3. Understanding of why your design choices affect performance

  4. Foundation for Project 3 (Matching Engine) which adds networking

4. Solution Architecture

4.1 High-Level Design

                         ORDER BOOK ENGINE ARCHITECTURE

┌─────────────────────────────────────────────────────────────────────────┐
│                            ORDER BOOK ENGINE                             │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌──────────────────────────────────────────────────────────────────┐   │
│  │                         ORDER INPUT                               │   │
│  │                                                                   │   │
│  │   ┌─────────────┐    ┌─────────────┐    ┌─────────────┐         │   │
│  │   │ Add Order   │    │Cancel Order │    │Modify Order │         │   │
│  │   └──────┬──────┘    └──────┬──────┘    └──────┬──────┘         │   │
│  │          │                  │                  │                  │   │
│  └──────────┼──────────────────┼──────────────────┼──────────────────┘   │
│             │                  │                  │                      │
│             ▼                  ▼                  ▼                      │
│  ┌──────────────────────────────────────────────────────────────────┐   │
│  │                      ORDER VALIDATOR                              │   │
│  │   - Price > 0                                                     │   │
│  │   - Quantity > 0                                                  │   │
│  │   - Order ID unique                                               │   │
│  │   - Side valid (BUY/SELL)                                        │   │
│  └──────────────────────────────────────────────────────────────────┘   │
│             │                                                            │
│             ▼                                                            │
│  ┌──────────────────────────────────────────────────────────────────┐   │
│  │                      MATCHING ENGINE                              │   │
│  │                                                                   │   │
│  │   ┌─────────────────────────────────────────────────────────┐   │   │
│  │   │                  ORDER BOOK (per symbol)                 │   │   │
│  │   │                                                          │   │   │
│  │   │   ┌──────────────────┐      ┌──────────────────┐        │   │   │
│  │   │   │    BIDS (Buy)    │      │   ASKS (Sell)    │        │   │   │
│  │   │   │  (desc by price) │      │ (asc by price)   │        │   │   │
│  │   │   │                  │      │                  │        │   │   │
│  │   │   │ PriceLevel 50.26 │      │ PriceLevel 50.28 │        │   │   │
│  │   │   │   └─ Order Queue │      │   └─ Order Queue │        │   │   │
│  │   │   │ PriceLevel 50.25 │      │ PriceLevel 50.29 │        │   │   │
│  │   │   │   └─ Order Queue │      │   └─ Order Queue │        │   │   │
│  │   │   │        ...       │      │        ...       │        │   │   │
│  │   │   └──────────────────┘      └──────────────────┘        │   │   │
│  │   │                                                          │   │   │
│  │   │   ┌──────────────────────────────────────────────┐      │   │   │
│  │   │   │           ORDER ID -> ORDER* MAP             │      │   │   │
│  │   │   │   (for O(1) cancel/modify by order_id)       │      │   │   │
│  │   │   └──────────────────────────────────────────────┘      │   │   │
│  │   │                                                          │   │   │
│  │   └─────────────────────────────────────────────────────────┘   │   │
│  │                                                                   │   │
│  └──────────────────────────────────────────────────────────────────┘   │
│             │                                                            │
│             ▼                                                            │
│  ┌──────────────────────────────────────────────────────────────────┐   │
│  │                       EVENT OUTPUT                                │   │
│  │                                                                   │   │
│  │   ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐  │   │
│  │   │Order ACK │    │  Trade   │    │ Cancel   │    │Market    │  │   │
│  │   │          │    │  Event   │    │   ACK    │    │Data      │  │   │
│  │   └──────────┘    └──────────┘    └──────────┘    └──────────┘  │   │
│  │                                                                   │   │
│  └──────────────────────────────────────────────────────────────────┘   │
│                                                                          │
│  ┌──────────────────────────────────────────────────────────────────┐   │
│  │                      MEMORY POOL                                  │   │
│  │   Pre-allocated Order objects for zero-allocation hot path       │   │
│  └──────────────────────────────────────────────────────────────────┘   │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘

4.2 Key Components

Component Responsibility Performance Requirement
Order Validator Validate order parameters before processing O(1), < 50ns
Order Book Maintain sorted price levels and order queues O(log n) add, O(1) best price
Price Level FIFO queue of orders at same price O(1) add/remove from ends
Order Map OrderID -> Order* for fast lookup O(1) lookup
Matching Logic Execute crossing orders, generate trades O(m) where m = matched orders
Memory Pool Pre-allocated order storage O(1) allocate/deallocate
Event Publisher Output trades, acks, market data Non-blocking

4.3 Data Structures

The choice of data structures is the most critical design decision. Let’s analyze the options:

Price Level Organization:

Data Structure Add Level Remove Level Find Best Memory Overhead Cache Behavior
std::map<Price, Level> O(log n) O(log n) O(1) amortized High (nodes) Poor (pointer chase)
Sorted vector O(n) O(n) O(1) Low Excellent
Skip list O(log n) O(log n) O(1) Medium Medium
Array + hash map O(1) O(1) O(1) Medium Excellent

Recommended Approach: Price Level Array

For typical instruments, prices cluster in a range. We can use:

Price Level Array Design

Price Range: $100.00 - $200.00 at $0.01 increments = 10,000 possible levels
Array Size: 10,000 x sizeof(PriceLevel*) = 80KB (fits in L2 cache!)

Price Index = (price - base_price) / tick_size

┌────────────────────────────────────────────────────────────────┐
│ Index:  0      1      2      3      ...   9999                 │
│ Price: 100.00 100.01 100.02 100.03  ...  199.99               │
│        │      │      │      │       ...   │                   │
│        ▼      ▼      ▼      ▼             ▼                   │
│       NULL  Level* NULL  Level*    ...  NULL                  │
│              │            │                                    │
│              ▼            ▼                                    │
│            ┌─────┐     ┌─────┐                                │
│            │Queue│     │Queue│                                │
│            │O1→O2│     │O7→O8│                                │
│            └─────┘     └─────┘                                │
└────────────────────────────────────────────────────────────────┘

Best bid/ask tracking:
- best_bid_index: index of highest non-empty bid level
- best_ask_index: index of lowest non-empty ask level

After removing best level, scan for next non-empty level
(Typically 1-3 indices away, so O(1) in practice)

Order Queue at Each Level:

For FIFO within a price level, use an intrusive doubly-linked list:

Intrusive Linked List Design

Instead of:
  std::list<Order*>  // Allocates ListNode objects

Use intrusive:
  struct Order {
      Order* prev;    // Part of Order struct itself
      Order* next;    // No separate allocation!
      uint64_t id;
      uint32_t qty;
      // ...
  };

Price Level:
┌─────────────────────────────────────────────────────────────┐
│  head ──► Order_A ◄──► Order_B ◄──► Order_C ◄── tail       │
│           │                                     │           │
│           └────────── FIFO queue ───────────────┘          │
└─────────────────────────────────────────────────────────────┘

Operations:
  Add to back:   O(1) - tail->next = new_order; tail = new_order;
  Remove any:    O(1) - prev->next = next; next->prev = prev;
  Get front:     O(1) - return head;

Order ID Lookup:

For O(1) cancel by order_id:

// Option 1: Hash map (std::unordered_map)
std::unordered_map<OrderId, Order*> order_map;
// Pros: Works for any order ID range
// Cons: Allocation on insert, hash computation

// Option 2: Pre-allocated array (if order IDs are sequential)
std::vector<Order*> order_array;  // Index = order_id
// Pros: O(1) with no hashing, cache-friendly
// Cons: Requires sequential/bounded order IDs

// Recommendation: Use hash map initially, optimize later if needed

4.4 Memory Layout

Cache efficiency requires careful struct layout:

// BAD: Wasted space due to alignment padding
struct OrderBad {
    char side;           // 1 byte
    // 7 bytes padding
    double price;        // 8 bytes
    uint32_t quantity;   // 4 bytes
    // 4 bytes padding
    uint64_t order_id;   // 8 bytes
    Order* next;         // 8 bytes
    Order* prev;         // 8 bytes
    uint64_t timestamp;  // 8 bytes
};
// sizeof = 56 bytes (with 11 bytes wasted on padding!)

// GOOD: Ordered by size to minimize padding
struct OrderGood {
    uint64_t order_id;   // 8 bytes
    uint64_t timestamp;  // 8 bytes
    double price;        // 8 bytes (or use int64_t fixed-point)
    Order* next;         // 8 bytes
    Order* prev;         // 8 bytes
    uint32_t quantity;   // 4 bytes
    char side;           // 1 byte
    char padding[3];     // explicit padding to 8-byte boundary
};
// sizeof = 48 bytes (no wasted padding)

// BEST: Fixed-point price, minimal size
struct OrderBest {
    Order* next;           // 8 bytes
    Order* prev;           // 8 bytes
    uint64_t order_id;     // 8 bytes
    int64_t price_ticks;   // 8 bytes (price in tick units)
    uint32_t quantity;     // 4 bytes
    uint16_t symbol_id;    // 2 bytes
    uint8_t side;          // 1 byte
    uint8_t flags;         // 1 byte
};
// sizeof = 40 bytes, all useful data

Fixed-Point Pricing:

Floating-point has issues:

  • Comparison can be imprecise
  • Slower on some CPUs
  • Wasted bits for unused precision

Use fixed-point instead:

// Price = 150.25, tick size = 0.01
// Stored as: 15025 (integer)

// To convert:
int64_t price_to_ticks(double price, double tick_size) {
    return static_cast<int64_t>(price / tick_size + 0.5);
}

double ticks_to_price(int64_t ticks, double tick_size) {
    return ticks * tick_size;
}

// Comparisons are now exact integer comparisons!

4.5 Algorithm Overview

Add Order Algorithm:

function add_order(order):
    // Step 1: Validate
    if not valid(order):
        return REJECTED

    // Step 2: Check for match
    if order.side == BUY:
        while order.remaining_qty > 0 AND best_ask exists AND order.price >= best_ask.price:
            match_at_level(order, best_ask_level)
    else:  // SELL
        while order.remaining_qty > 0 AND best_bid exists AND order.price <= best_bid.price:
            match_at_level(order, best_bid_level)

    // Step 3: If remaining quantity, add to book
    if order.remaining_qty > 0:
        level = get_or_create_level(order.side, order.price)
        level.add_order(order)
        order_map[order.id] = order
        update_best_price(order.side)

    return order.status

function match_at_level(incoming, level):
    for resting_order in level.orders:  // FIFO order
        match_qty = min(incoming.remaining_qty, resting_order.remaining_qty)

        emit_trade(incoming, resting_order, match_qty, level.price)

        incoming.remaining_qty -= match_qty
        resting_order.remaining_qty -= match_qty

        if resting_order.remaining_qty == 0:
            level.remove_order(resting_order)
            order_map.remove(resting_order.id)
            order_pool.deallocate(resting_order)

        if incoming.remaining_qty == 0:
            break

    if level.is_empty():
        remove_level(level)

Cancel Order Algorithm:

function cancel_order(order_id):
    order = order_map.get(order_id)
    if order is None:
        return NOT_FOUND

    level = get_level(order.side, order.price)
    level.remove_order(order)  // O(1) with doubly-linked list

    if level.is_empty():
        remove_level(level)
        update_best_price(order.side)

    order_map.remove(order_id)
    order_pool.deallocate(order)

    return CANCELED

5. Implementation Guide

5.1 Development Environment Setup

C++ Setup:

# Ubuntu/Debian
sudo apt-get update
sudo apt-get install g++ cmake gdb valgrind linux-tools-generic

# Verify compiler supports C++17
g++ --version  # Need 7+ for C++17

# Install perf for profiling (Linux)
sudo apt-get install linux-tools-common

# Create project
mkdir -p orderbook/{src,include,tests,benchmarks}
cd orderbook

Rust Setup:

# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source $HOME/.cargo/env

# Create project
cargo new orderbook
cd orderbook

# Add dependencies to Cargo.toml
# [dependencies]
# criterion = "0.5"  # For benchmarking

5.2 Project Structure

C++ Structure:

orderbook/
├── CMakeLists.txt
├── src/
│   ├── main.cpp              # CLI entry point
│   ├── orderbook.cpp         # Order book implementation
│   ├── matching_engine.cpp   # Matching logic
│   └── memory_pool.cpp       # Custom allocator (Phase 2)
├── include/
│   ├── order.hpp             # Order struct
│   ├── orderbook.hpp         # OrderBook class
│   ├── price_level.hpp       # PriceLevel with order queue
│   ├── matching_engine.hpp   # Matching interface
│   └── memory_pool.hpp       # Pool allocator
├── tests/
│   ├── test_orderbook.cpp
│   └── test_matching.cpp
├── benchmarks/
│   └── bench_orderbook.cpp
└── README.md

Rust Structure:

orderbook/
├── Cargo.toml
├── src/
│   ├── main.rs               # CLI entry point
│   ├── lib.rs                # Library root
│   ├── order.rs              # Order struct
│   ├── orderbook.rs          # OrderBook implementation
│   ├── price_level.rs        # PriceLevel
│   └── matching.rs           # Matching logic
├── tests/
│   └── integration_tests.rs
├── benches/
│   └── orderbook_bench.rs
└── README.md

5.3 The Core Question You’re Answering

“How do you organize price levels and orders such that the most common operations (add order, match, cancel) are O(1) or O(log n) while maintaining cache efficiency?”

This question forces you to confront the reality that big-O notation alone is insufficient for HFT. An O(log n) tree operation with poor cache behavior can be slower than an O(n) scan of a cache-resident array.

The answer involves:

  1. Choosing data structures that keep hot data together in memory
  2. Minimizing pointer indirection (each pointer chase is a potential cache miss)
  3. Pre-allocating memory to avoid allocation in the hot path
  4. Using integer arithmetic (fixed-point prices) instead of floating-point

5.4 Concepts You Must Understand First

Before starting implementation, verify your understanding:

Concept Self-Assessment Question Where to Learn
Hash maps What is the expected time complexity of insert/lookup? What happens during a resize? CS:APP Ch. 6, Language docs
Linked lists How do you remove a node from a doubly-linked list in O(1)? Any data structures textbook
CPU cache hierarchy What is a cache line? Why does data locality matter? CS:APP Ch. 6
Struct alignment Why does the order of struct members affect sizeof? CS:APP Ch. 3.9
Memory allocation Why is malloc slow? What is a memory pool? CS:APP Ch. 9
Fixed-point arithmetic How do you represent $150.25 as an integer? Any numerics reference

5.5 Questions to Guide Your Design

Data Structure Questions:

  1. How will you represent prices internally? (float, double, int64 ticks?)
  2. What data structure will hold all price levels? (map, vector, array?)
  3. How will you maintain FIFO order within a price level?
  4. How will you enable O(1) cancel-by-order-id?
  5. How will you track the best bid and best ask efficiently?

Memory Questions:

  1. Will you allocate Order objects on demand or pre-allocate?
  2. How much memory will your order book use with 100,000 orders?
  3. What is the size of your Order struct after considering alignment?
  4. Will your data structures fit in L1/L2/L3 cache?

Algorithm Questions:

  1. When a new buy order matches multiple sell orders, in what order are they filled?
  2. After filling the best ask level completely, how do you find the new best ask?
  3. What happens if a cancel arrives for an order that’s currently being matched?
  4. How do you handle partial fills?

Performance Questions:

  1. What operations happen most frequently? (add, cancel, match?)
  2. Where will cache misses likely occur in your design?
  3. How will you benchmark your implementation?
  4. What is your target latency for each operation?

5.6 Thinking Exercise

Before writing any code, work through this scenario by hand:

Scenario:

Start with an empty order book. Process these orders in sequence:

1. ADD BUY  100 @ 50.25  order_A
2. ADD BUY  200 @ 50.24  order_B
3. ADD BUY  150 @ 50.25  order_C
4. ADD SELL  50 @ 50.30  order_D
5. ADD SELL 300 @ 50.25  order_E
6. CANCEL order_B
7. ADD SELL 100 @ 50.20  order_F

Questions to answer on paper:

  1. After step 3, draw the bid side of the book. Which orders are at price 50.25, and in what order?

  2. After step 4, what does the book look like? What trades (if any) were generated?

  3. After step 5, what trades were generated? List each trade with price, quantity, buyer, seller.

  4. After step 5, what is the state of order_A? Order_C? Order_E?

  5. After step 6, draw the full book (bids and asks).

  6. After step 7, what trades were generated? What is the final book state?

Verify your answers:

  • Total shares bought should equal total shares sold
  • Every trade should have happened at the price of the resting order
  • FIFO should have been maintained within each price level

5.7 Hints in Layers

If you’re stuck, reveal hints one at a time:

Hint 1 - Starting Direction

Start with the simplest possible implementation using standard library containers:

// C++
std::map<int64_t, std::list<Order*>> bids;  // price -> orders (descending)
std::map<int64_t, std::list<Order*>> asks;  // price -> orders (ascending)
std::unordered_map<uint64_t, Order*> order_lookup;  // id -> order
// Rust
use std::collections::{BTreeMap, VecDeque, HashMap};

bids: BTreeMap<i64, VecDeque<Order>>  // Reverse for descending
asks: BTreeMap<i64, VecDeque<Order>>
order_lookup: HashMap<u64, OrderRef>

Get correctness first. Optimize later.

Hint 2 - More Specific Guidance

For the matching logic, process orders in this sequence:

  1. Check if incoming order crosses the spread
    • Buy order crosses if: buy_price >= best_ask_price
    • Sell order crosses if: sell_price <= best_bid_price
  2. If it crosses, iterate through opposing levels
    while (remaining_qty > 0 && crosses_spread(order)) {
        auto& best_level = get_best_opposing_level(order.side);
        match_against_level(order, best_level);
    }
    
  3. Match against individual orders in FIFO order
    void match_against_level(Order& incoming, Level& level) {
        while (!level.orders.empty() && incoming.remaining > 0) {
            Order& resting = level.orders.front();
            uint32_t match_qty = std::min(incoming.remaining, resting.remaining);
            execute_trade(incoming, resting, match_qty, level.price);
            // ... update remaining quantities, remove filled orders
        }
    }
    
  4. After matching, add remainder to book if any
Hint 3 - Technical Details

Price Level Array Implementation (for optimization phase):

class OrderBook {
    static constexpr int64_t MIN_PRICE = 1;      // $0.01
    static constexpr int64_t MAX_PRICE = 1000000; // $10,000.00
    static constexpr size_t NUM_LEVELS = MAX_PRICE - MIN_PRICE + 1;

    std::array<PriceLevel*, NUM_LEVELS> bid_levels{};
    std::array<PriceLevel*, NUM_LEVELS> ask_levels{};

    int64_t best_bid_price = 0;  // 0 = no bids
    int64_t best_ask_price = MAX_PRICE + 1;  // MAX+1 = no asks

    size_t price_to_index(int64_t price) {
        return static_cast<size_t>(price - MIN_PRICE);
    }

    PriceLevel* get_bid_level(int64_t price) {
        return bid_levels[price_to_index(price)];
    }

    void update_best_bid_after_remove(int64_t removed_price) {
        if (removed_price != best_bid_price) return;
        // Scan downward for next non-empty level
        for (int64_t p = removed_price - 1; p >= MIN_PRICE; --p) {
            if (bid_levels[price_to_index(p)] != nullptr) {
                best_bid_price = p;
                return;
            }
        }
        best_bid_price = 0;  // No bids
    }
};

Intrusive List for O(1) Order Removal:

struct Order {
    Order* prev = nullptr;
    Order* next = nullptr;
    // ... other fields
};

class OrderQueue {
    Order* head = nullptr;
    Order* tail = nullptr;

public:
    void push_back(Order* order) {
        if (tail) {
            tail->next = order;
            order->prev = tail;
            tail = order;
        } else {
            head = tail = order;
        }
    }

    void remove(Order* order) {
        if (order->prev) order->prev->next = order->next;
        else head = order->next;

        if (order->next) order->next->prev = order->prev;
        else tail = order->prev;

        order->prev = order->next = nullptr;
    }
};
Hint 4 - Tools and Debugging

Benchmarking:

#include <chrono>

auto start = std::chrono::high_resolution_clock::now();
// ... operation
auto end = std::chrono::high_resolution_clock::now();
auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();

For more accurate timing (x86):

#include <x86intrin.h>

uint64_t start = __rdtsc();
// ... operation
uint64_t end = __rdtsc();
uint64_t cycles = end - start;
// Convert to ns: cycles * (1.0 / cpu_freq_ghz)

Profiling with perf:

# Record performance counters
perf stat -e cycles,instructions,cache-misses,cache-references ./orderbook_bench

# Record call graph for flamegraph
perf record -g ./orderbook_bench
perf report

Checking struct sizes:

static_assert(sizeof(Order) == 40, "Order size unexpected");
std::cout << "Order size: " << sizeof(Order) << " bytes\n";
std::cout << "Order alignment: " << alignof(Order) << "\n";

Memory allocation tracking:

# Use valgrind massif for heap profiling
valgrind --tool=massif ./orderbook_bench
ms_print massif.out.*

5.8 The Interview Questions They’ll Ask

After completing this project, you should be able to answer:

  1. “Explain how a limit order book works.”
    • Expected: Clear explanation of bids, asks, price levels, FIFO matching
    • Bonus: Mention spread, depth, liquidity concepts
  2. “What data structures would you use for an order book and why?”
    • Expected: Discuss tradeoffs between maps, vectors, arrays
    • Bonus: Mention cache efficiency, intrusive lists, memory pools
  3. “How do you handle order cancellation efficiently?”
    • Expected: Hash map for O(1) lookup, doubly-linked list for O(1) removal
    • Bonus: Discuss order ID generation strategies
  4. “What is price-time priority and why is it important?”
    • Expected: FIFO within price level, incentivizes speed
    • Bonus: Discuss alternative algorithms (pro-rata, size-priority)
  5. “How would you optimize an order book for low latency?”
    • Expected: Memory pooling, cache optimization, fixed-point prices
    • Bonus: Discuss false sharing, branch prediction, lock-free structures
  6. “What happens when a market order arrives?”
    • Expected: Matches against best levels until filled or liquidity exhausted
    • Bonus: Discuss market vs limit, IOC, FOK order types
  7. “How does your design scale with 1 million price levels?”
    • Expected: Discuss when array-based breaks down, alternatives
    • Bonus: Discuss sparse data structures, adaptive strategies
  8. “What would you do differently for production?”
    • Expected: Acknowledge simplifications, discuss what production adds
    • Bonus: Mention FIX protocol, persistence, replication

5.9 Books That Will Help

Topic Book Chapter
Cache and memory hierarchy Computer Systems: A Programmer’s Perspective Ch. 6
Memory layout and alignment Computer Systems: A Programmer’s Perspective Ch. 3.9
Memory management and allocators Computer Systems: A Programmer’s Perspective Ch. 9
C++ data structures Building Low Latency Applications with C++ Ch. 3-4
Order book design Building Low Latency Applications with C++ Ch. 5-6
Rust data structures Programming Rust Ch. 16
Lock-free fundamentals (for later) Rust Atomics and Locks Ch. 1-3
Trading system architecture Developing High-Frequency Trading Systems Ch. 1-4

5.10 Implementation Phases

Phase 1: Basic Functionality (Days 1-3)

Goals:

  • Get a working order book with std::map (or BTreeMap in Rust)
  • Implement add, cancel, and match
  • Create basic CLI for testing

Tasks:

  1. Define Order struct with all necessary fields
  2. Create PriceLevel class with order queue (use std::list or VecDeque)
  3. Create OrderBook class with bid/ask maps
  4. Implement add_order() with no matching (just add to book)
  5. Implement cancel_order()
  6. Implement get_best_bid() and get_best_ask()
  7. Create simple CLI to test

Checkpoint: Can add orders, cancel them, and view book state.

Phase 2: Matching Logic (Days 4-5)

Goals:

  • Implement price-time priority matching
  • Generate trade events
  • Handle partial fills

Tasks:

  1. Implement crosses_spread() check
  2. Implement match_against_level()
  3. Modify add_order() to call matching before adding to book
  4. Create Trade struct for trade events
  5. Handle order state changes (partial, filled)
  6. Test matching with various scenarios

Checkpoint: Orders match correctly, trades are generated, book updates properly.

Phase 3: Optimization - Memory (Days 6-8)

Goals:

  • Eliminate heap allocation in hot path
  • Switch to intrusive linked lists
  • Implement memory pool for orders

Tasks:

  1. Measure baseline allocation with valgrind/massif
  2. Convert Order to use intrusive list pointers
  3. Implement simple fixed-size memory pool
  4. Integrate pool with order book
  5. Verify zero allocations during order processing
  6. Measure improvement

Checkpoint: Adding/canceling orders allocates no heap memory.

Phase 4: Optimization - Cache (Days 9-11)

Goals:

  • Switch to array-based price level storage
  • Optimize struct layout
  • Use fixed-point prices

Tasks:

  1. Implement price-to-index conversion
  2. Create array-based level storage
  3. Implement best price tracking with scan
  4. Optimize Order struct layout for cache lines
  5. Switch to int64_t fixed-point prices
  6. Measure cache misses before/after

Checkpoint: Cache miss rate significantly reduced.

Phase 5: Benchmarking (Days 12-14)

Goals:

  • Comprehensive latency measurement
  • Throughput testing
  • Document performance characteristics

Tasks:

  1. Create benchmark harness with rdtsc or chrono
  2. Measure add order latency (p50, p95, p99)
  3. Measure cancel latency
  4. Measure throughput (orders/second)
  5. Compare against baseline (Phase 1 version)
  6. Document results and analysis

Checkpoint: Performance targets met, results documented.

5.11 Key Implementation Decisions

Decision Options Recommendation Rationale
Price representation float, double, int64 fixed-point int64 fixed-point Exact comparisons, smaller struct, faster
Level storage std::map, sorted vector, array Array (then map fallback) O(1) access, cache-friendly
Order queue std::list, std::deque, intrusive list Intrusive doubly-linked O(1) removal, no allocation
Order lookup std::unordered_map, flat_hash_map std::unordered_map (then optimize) Standard, works for prototyping
Memory malloc, memory pool Memory pool (Phase 3) Predictable, fast, no fragmentation

6. Testing Strategy

6.1 Unit Tests

Test Category What to Test
Order struct Correct initialization, size, alignment
PriceLevel Add/remove orders, FIFO order, total quantity
OrderBook Add to empty book, add to existing level, best bid/ask
Matching Single match, multi-level match, partial fill
Cancel Cancel existing, cancel non-existent, cancel partial

6.2 Property-Based Tests

Use property testing (QuickCheck-style) to find edge cases:

Property 1: Total buy quantity in trades == Total sell quantity in trades
Property 2: Trade price is always between bid and ask prices
Property 3: After any sequence of operations, book is internally consistent
Property 4: Order of trades respects FIFO within each price level
Property 5: Canceled order is never matched in subsequent trades

6.3 Critical Test Cases

Test 1: Simple Match
  ADD BUY 100 @ 50.00
  ADD SELL 100 @ 50.00
  Expected: Trade 100 @ 50.00, book empty

Test 2: Partial Fill
  ADD BUY 100 @ 50.00
  ADD SELL 50 @ 50.00
  Expected: Trade 50 @ 50.00, bid of 50 @ 50.00 remains

Test 3: Multi-Level Match
  ADD BUY 100 @ 50.00
  ADD BUY 100 @ 49.00
  ADD SELL 150 @ 48.00
  Expected: Trade 100 @ 50.00, Trade 50 @ 49.00,
            bid of 50 @ 49.00 remains, ask of 0

Test 4: FIFO Priority
  ADD BUY 100 @ 50.00 order_A
  ADD BUY 100 @ 50.00 order_B
  ADD SELL 100 @ 50.00 order_C
  Expected: Trade order_A-order_C 100 @ 50.00,
            order_B still in book

Test 5: Cancel Before Match
  ADD BUY 100 @ 50.00 order_A
  CANCEL order_A
  ADD SELL 100 @ 50.00 order_B
  Expected: No trade, ask of 100 @ 50.00

Test 6: Many Price Levels
  ADD 1000 buy orders at prices 50.00-50.99 (0.01 increments)
  ADD SELL 100 @ 48.00
  Expected: 100 trades, best bid drops appropriately

Test 7: High Volume
  ADD 100,000 buy orders at random prices
  ADD 100,000 sell orders at random prices
  Verify: No crashes, book consistent, all trades valid

6.4 Stress Tests

Stress Test 1: Throughput
  Add 1,000,000 orders as fast as possible
  Measure orders per second

Stress Test 2: Mixed Workload
  70% adds, 25% cancels, 5% queries
  Run for 10 seconds
  Measure latency distribution

Stress Test 3: Adversarial
  Add orders at 10,000 different price levels
  Cancel all of them
  Verify no memory leaks

7. Common Pitfalls and Debugging

Problem Symptom Cause Solution Verification
Wrong match order Orders matched out of FIFO sequence Using wrong data structure (set, unordered) Use ordered container or explicit timestamp comparison Log order arrival times, verify first-in matched first
Memory leaks Memory usage grows over time Orders not freed after fill/cancel Implement memory pool, track allocations Run with valgrind, verify count goes to zero
Cache thrashing High latency variance Pointer-heavy data structures Switch to array-based, intrusive lists Check cache misses with perf stat
Floating point bugs $0.10 + $0.20 != $0.30 Using float/double for prices Use fixed-point integer representation Test with known problematic values
Best price stale Report wrong best bid/ask Not updating after cancel Scan for new best after level empties Unit test cancel at best price
Quantity overflow Negative or huge quantities Not checking for overflow Use checked arithmetic or assert bounds Test with MAX_INT quantities
ID collision Wrong order canceled Order IDs reused incorrectly Use monotonic incrementing IDs Assert uniqueness on add
Race conditions (future) Corrupt state Concurrent access without locks Single-threaded first, then add proper sync Run thread sanitizer

7.1 Debugging Strategies

Print the book state:

void debug_print_book() {
    std::cout << "=== BIDS ===\n";
    for (const auto& [price, level] : bids) {
        std::cout << price << ": ";
        for (const auto& order : level.orders) {
            std::cout << "[" << order.id << ":" << order.qty << "] ";
        }
        std::cout << "\n";
    }
    // ... similar for asks
}

Validate invariants:

void validate_book() {
    // All orders in book should be in lookup map
    // All orders in lookup should be in book
    // Best bid should be highest non-empty bid level
    // Best ask should be lowest non-empty ask level
    // No price level should be empty
    // Total quantity at level should match sum of orders
}

Trace execution:

#define TRACE(msg) std::cerr << __FUNCTION__ << ":" << __LINE__ << " " << msg << "\n"

void add_order(Order& order) {
    TRACE("order_id=" << order.id << " price=" << order.price);
    // ...
}

8. Extensions and Challenges

8.1 Beginner Extensions

  • Add modify order: Change price or quantity of existing order
  • Market order support: IOC (Immediate or Cancel) execution
  • Order history: Keep log of all orders and trades
  • Pretty printing: Colorized book display

8.2 Intermediate Extensions

  • Multiple symbols: Order book per symbol, symbol manager
  • Order types: Stop orders, stop-limit, trailing stop
  • Time in force: Day, GTC, GTD, FOK, IOC
  • Self-trade prevention: Prevent same user matching with self
  • ITCH protocol: Parse NASDAQ ITCH data to populate book

8.3 Advanced Extensions

  • Lock-free order book: Concurrent reads while writes happening
  • FPGA acceleration: Move hot path to hardware
  • Persistent state: Recover book state after crash
  • Market data feed: Publish L2/L3 data over UDP multicast
  • FIX gateway: Accept orders via FIX protocol

9. Real-World Connections

9.1 How Production Systems Differ

Aspect This Project Production System
Orders per second 100,000 10,000,000+
Latency target < 1 microsecond < 100 nanoseconds
Price levels Thousands Millions possible
Symbols 1 10,000+
Redundancy None Hot-hot replication
Persistence None Every order journaled
Connectivity CLI FIX, ITCH, proprietary
Monitoring printf Prometheus, Grafana, alerting

9.2 Real Exchange Examples

NASDAQ Matching Engine:

  • Processes 10+ million orders per day per symbol
  • Latency: ~50 microseconds end-to-end
  • Uses ITCH protocol for market data (binary, efficient)
  • FIX for order entry
  • Runs on commodity Linux servers (not FPGA)

CME Globex:

  • Largest futures exchange
  • Sub-millisecond matching
  • Uses iLink protocol (binary)
  • Co-location available for HFT firms

IEX (Investors Exchange):

  • Famous for “speed bump” (350 microsecond delay)
  • Designed to reduce HFT advantage
  • Same order book mechanics, different latency profile

9.3 Open Source References

Project Description Useful For
LMAX Disruptor Lock-free ring buffer Understanding low-latency patterns
OpenHFT Chronicle Queue Persisted queue Journaling patterns
quickfix FIX engine Protocol implementation
nautilus_trader Python/Rust trading Architecture reference

10. Resources

10.1 Essential Reading

  • “Building Low Latency Applications with C++” by Sourav Ghosh (Ch. 3-6)
    • Step-by-step order book implementation
    • C++ specific optimizations
  • “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron (Ch. 6)
    • Cache hierarchy and locality
    • Why memory layout matters

10.2 Articles and Papers

10.3 GitHub Repositories

10.4 Videos

  • “CppCon: Building Low Latency Trading Systems” - Various years
  • “LMAX Exchange Architecture” - Martin Thompson (InfoQ)
  • “Lock-Free Programming” - Herb Sutter (Channel 9)

11. Self-Assessment Checklist

Understanding

  • I can explain what a limit order book is without looking at notes
  • I can describe price-time priority and why it matters
  • I understand why cache efficiency is critical for HFT
  • I can explain the tradeoffs between different data structures
  • I know what fixed-point arithmetic is and when to use it
  • I understand what intrusive data structures are

Implementation

  • My order book correctly maintains sorted price levels
  • Orders match in FIFO order within price levels
  • Cancel is O(1) using order ID lookup
  • No heap allocation occurs during normal operation (Phase 3+)
  • My benchmark shows sub-microsecond latency
  • All test cases pass

Growth

  • I profiled my code and found at least one bottleneck
  • I measured and can explain my cache miss rate
  • I can read and understand the reference implementations
  • I could explain my design decisions in an interview

12. Submission / Completion Criteria

Minimum Viable Completion

  • Order book supports add, cancel, and match operations
  • Matching follows price-time priority correctly
  • Basic CLI demonstrates functionality
  • At least 5 unit tests pass
  • Code compiles without warnings

Full Completion

  • All 10 test scenarios pass
  • Memory pool eliminates hot-path allocation
  • Add order latency < 1 microsecond (p99)
  • Cancel order latency < 500 nanoseconds (p99)
  • Throughput > 500,000 orders/second
  • Cache miss rate documented

Excellence (Going Above and Beyond)

  • Array-based price level storage implemented
  • Latency < 500ns (p99) for all operations
  • Multiple symbols supported
  • Benchmark comparison with baseline documented
  • ITCH or FIX protocol parsing integrated

This guide was expanded from HIGH_FREQUENCY_TRADING_CPP_RUST_LEARNING_PROJECTS.md. For the complete learning path, see the project index.