Project 1: Limit Order Book Engine

Build the fundamental data structure of every exchange and trading system - a fully functional limit order book that maintains bid/ask levels, handles order matching, and reports trades with nanosecond-level performance.

Quick Reference

Attribute	Value
Difficulty	Intermediate
Time Estimate	1-2 weeks
Language	C++ or Rust
Prerequisites	Basic C++/Rust, understanding of maps/queues
Key Topics	Cache optimization, data structures, memory layout, order matching
Coolness Level	Level 4: Hardcore Tech Flex
Portfolio Value	Resume Gold

1. Learning Objectives

By completing this project, you will:

Understand limit order book mechanics: Explain how bids, asks, price levels, and order queues work in real exchanges
Master cache-friendly data structure design: Implement data layouts that minimize cache misses and maximize throughput
Learn price-time priority matching: Build a matching algorithm that correctly executes orders following exchange rules
Gain experience with memory pooling: Eliminate malloc/new from your hot path using custom allocation strategies
Benchmark and optimize for nanosecond latency: Use profiling tools to identify bottlenecks and measure improvements
Build the foundation for HFT systems: This project is the core component that all subsequent projects build upon

2. Theoretical Foundation

2.1 What is a Limit Order Book?

A limit order book (LOB) is the central data structure used by all electronic exchanges to match buyers with sellers. Every stock exchange (NYSE, NASDAQ), cryptocurrency exchange (Coinbase, Binance), and futures exchange (CME) maintains an order book for each tradable instrument.

The order book contains two sides:

                    LIMIT ORDER BOOK: AAPL

    ┌─────────────────────────────────────────────────┐
    │                    ASKS (SELLS)                  │
    │                                                  │
    │  Price    │ Quantity │ Orders                   │
    │  ─────────┼──────────┼─────────────────────    │
    │  $150.30  │   2,500  │ [O5:500, O6:2000]        │ ◄── Best Ask
    │  $150.29  │   1,200  │ [O4:1200]                │
    │  $150.28  │   3,800  │ [O3:800, O7:3000]        │
    │  ─────────┴──────────┴─────────────────────    │
    │                                                  │
    │              ═══ SPREAD: $0.02 ═══               │
    │                                                  │
    │  ─────────┬──────────┬─────────────────────    │
    │  $150.26  │   4,100  │ [O1:1000, O2:100, ...]   │ ◄── Best Bid
    │  $150.25  │   2,300  │ [O8:2300]                │
    │  $150.24  │   5,600  │ [O9:1000, O10:4600]      │
    │                                                  │
    │                    BIDS (BUYS)                   │
    └─────────────────────────────────────────────────┘

Key Terminology:

Term	Definition
Bid	An offer to buy at a specific price
Ask (Offer)	An offer to sell at a specific price
Best Bid	Highest price any buyer is willing to pay
Best Ask	Lowest price any seller is willing to accept
Spread	Difference between best ask and best bid
Price Level	All orders at the same price
Depth	Total quantity available at each price level
Liquidity	How easily you can trade without moving the price

2.2 Price-Time Priority (FIFO)

When multiple orders exist at the same price level, they are executed in the order they arrived. This is called price-time priority or FIFO (First In, First Out).

Price Level $150.26: [Order A: 9:00:00.001] -> [Order B: 9:00:00.003] -> [Order C: 9:00:00.007]
                            ^                          ^
                      Arrived first              Arrived second
                      (Matched first)            (Matched second)

Why This Matters: Speed is everything. If you can submit your order 1 microsecond faster than a competitor, you get priority at that price level. This is why HFT firms spend billions on infrastructure.

2.3 Order Types and Lifecycle

Order Types You Must Support:

Limit Order: Buy/sell at a specific price or better
- BUY 100 @ $150.25 - Will only execute at $150.25 or lower
- SELL 100 @ $150.30 - Will only execute at $150.30 or higher
Market Order (optional for this project): Buy/sell immediately at best available price
- Takes whatever price is available
- Guaranteed execution, uncertain price
Cancel Order: Remove a previously submitted order
- Must be O(1) or O(log n) to avoid exploitation

Order Lifecycle:

                   Order Lifecycle State Machine

    ┌───────┐     ┌──────────┐     ┌─────────────┐
    │ NEW   │────►│ PENDING  │────►│   FILLED    │
    └───────┘     └──────────┘     └─────────────┘
        │              │                  ▲
        │              │                  │
        │              ▼                  │
        │         ┌──────────┐           │
        │         │ PARTIAL  │───────────┘
        │         └──────────┘
        │              │
        │              ▼
        ▼         ┌──────────┐
    ┌───────┐     │ CANCELED │
    │REJECTED│    └──────────┘
    └───────┘

    NEW:       Order received but not yet processed
    PENDING:   Order resting in book, waiting for match
    PARTIAL:   Some quantity filled, remainder still in book
    FILLED:    All quantity matched, order complete
    CANCELED:  Order removed before full execution
    REJECTED:  Order failed validation (invalid price, etc.)

2.4 Matching Algorithm

When a new order arrives, the matching engine attempts to match it against existing orders on the opposite side:

MATCHING ALGORITHM: New Buy Order @ $150.28 for 1,500 shares

Step 1: Check if order crosses the spread
        Best Ask: $150.26 (1,000 shares)
        New Buy:  $150.28 (1,500 shares)
        $150.28 >= $150.26 ✓ (Order crosses!)

Step 2: Match against best ask
        Match: 1,000 shares @ $150.26
        Trade generated: BUY 1,000 @ $150.26
        Remaining: 500 shares to fill

Step 3: Move to next price level
        Next Ask: $150.27 (800 shares)
        Match: 500 shares @ $150.27
        Trade generated: BUY 500 @ $150.27
        Order fully filled!

Step 4: Update order book
        - Remove filled orders from ask side
        - Update price level quantities
        - Broadcast market data update

Pseudocode:

function match_order(incoming_order):
    while incoming_order.remaining_qty > 0:
        # Get best opposite price level
        opposite_level = get_best_level(opposite_side(incoming_order.side))

        if opposite_level is None:
            break  # No more liquidity

        if not prices_cross(incoming_order.price, opposite_level.price, incoming_order.side):
            break  # Order doesn't cross spread

        # Match against orders at this level (FIFO order)
        for resting_order in opposite_level.orders:
            match_qty = min(incoming_order.remaining_qty, resting_order.remaining_qty)

            generate_trade(incoming_order, resting_order, match_qty, opposite_level.price)

            incoming_order.remaining_qty -= match_qty
            resting_order.remaining_qty -= match_qty

            if resting_order.remaining_qty == 0:
                remove_order(resting_order)

            if incoming_order.remaining_qty == 0:
                break

        # If level is empty, remove it
        if opposite_level.total_qty == 0:
            remove_level(opposite_level)

    # If order has remaining quantity, add to book
    if incoming_order.remaining_qty > 0:
        add_to_book(incoming_order)

2.5 Why This Matters for HFT

The order book is the most performance-critical data structure in a trading system:

Operation	Frequency	Latency Requirement	Why It Matters
Add Order	10,000-100,000/sec	< 1 microsecond	Every strategy starts with placing orders
Cancel Order	50,000-500,000/sec	< 500 nanoseconds	Rapid requoting requires instant cancellation
Match/Execute	Variable	< 1 microsecond	Slow matching = missed opportunities
Best Bid/Ask	100,000+/sec	< 100 nanoseconds	Price check happens before every decision

Industry Context:

NASDAQ: Processes 1+ million messages per second per symbol
CME: Sub-microsecond matching latency
NYSE: Order book updates every 64 microseconds (consolidated feed)
HFT Firms: Internal matching in 10-100 nanoseconds

Real Financial Impact:

Scenario: You're quoting both sides of AAPL
Your Quote: BID $150.00 x 1000 | ASK $150.02 x 1000

The market moves against you (AAPL drops)
You need to cancel your $150.00 bid BEFORE someone sells to you

If your cancel takes 10 microseconds and a competitor's takes 1 microsecond:
  - They cancel first, you get filled on stale quote
  - Loss: potentially $0.10/share x 1000 shares = $100 per incident
  - This happens thousands of times per day

2.6 Historical Context

Evolution of Order Books:

Timeline of Electronic Trading

1960s-1970s:    Physical trading floors, open outcry
                Order books maintained on paper by specialists

1971:           NASDAQ launched - first electronic exchange
                Order books stored in centralized computers

1980s-1990s:    Electronic Communication Networks (ECNs)
                Competition drives innovation

1998:           SEC Regulation ATS enables more ECNs
                Market fragmentation begins

2000s:          Decimalization ($0.01 increments)
                Algorithmic trading emerges
                Latency becomes competitive advantage

2005:           Reg NMS - National Best Bid Offer requirement
                Order routing becomes complex

2010s:          Microsecond wars
                Co-location, FPGA-based trading
                Flash crashes reveal fragility

2020s:          Nanosecond precision
                Quantum-resistant timing
                AI-driven market making

2.7 Common Misconceptions

Misconception 1: “A red-black tree or std::map is good enough for production”

Reality: std::map operations involve pointer chasing and cache misses. Production order books use flat arrays with intrusive lists, achieving 10-50x better performance.

Misconception 2: “The matching algorithm is the hard part”

Reality: The matching algorithm is straightforward (just FIFO). The hard part is the data structure organization that makes matching fast.

Misconception 3: “Memory allocation is fine if it’s fast”

Reality: Even “fast” allocators like jemalloc add 50-200ns variance. Production HFT systems use pre-allocated pools with O(1) allocation.

Misconception 4: “Optimizations don’t matter until you have a working version”

Reality: Partially true for correctness, but architecture decisions made early (data layout, allocation strategy) are very hard to change later. Design for performance from the start.

Misconception 5: “Exchanges use complex AI for matching”

Reality: Matching is deliberately simple and deterministic. Price-time priority is a well-defined, auditable algorithm. Complexity lives in the data structure, not the algorithm.

3. Project Specification

3.1 What You Will Build

A limit order book engine that:

Maintains sorted price levels for both bids (descending) and asks (ascending)
Supports core order operations: Add, Cancel, Modify
Implements price-time priority matching correctly
Generates trade events when orders cross
Reports market data: Best Bid/Offer, depth at each level
Achieves sub-microsecond operation times (target: < 500ns for common operations)

3.2 Functional Requirements

ID	Requirement	Priority
FR1	Accept limit orders with: symbol, side (buy/sell), price, quantity, order_id	Must Have
FR2	Match incoming orders against existing book using price-time priority	Must Have
FR3	Cancel orders by order_id in O(1) or O(log n) time	Must Have
FR4	Report best bid and best ask prices and quantities	Must Have
FR5	Generate trade events with: buyer_id, seller_id, price, quantity, timestamp	Must Have
FR6	Support multiple price levels (at least 1000 levels per side)	Must Have
FR7	Maintain FIFO order within each price level	Must Have
FR8	Handle order modification (price change, quantity change)	Should Have
FR9	Support multiple symbols (order book per symbol)	Should Have
FR10	Provide depth snapshot (top N levels)	Should Have

3.3 Non-Functional Requirements

Requirement	Target	Measurement
Add Order Latency	< 500ns (p50), < 2us (p99)	Measure with rdtsc or std::chrono::high_resolution_clock
Cancel Order Latency	< 200ns (p50), < 1us (p99)	Same measurement approach
Memory Usage	< 100 bytes per order	Count struct sizes, verify with sizeof
Throughput	> 1 million orders/second	Benchmark with sustained load
Determinism	No dynamic allocation in hot path	Profile heap usage during operation
Correctness	Price-time priority always maintained	Property-based testing

3.4 Example Usage / Output

$ ./orderbook

=== LIMIT ORDER BOOK ENGINE ===

> ADD BUY 100 50.25 order_001
ACK order_001 ADDED @ 50.25 x 100
BOOK: BID 50.25 x 100 | ASK -- x --

> ADD BUY 200 50.24 order_002
ACK order_002 ADDED @ 50.24 x 200
BOOK: BID 50.25 x 100 | ASK -- x --

> ADD SELL 150 50.30 order_003
ACK order_003 ADDED @ 50.30 x 150
BOOK: BID 50.25 x 100 | ASK 50.30 x 150

> ADD SELL 50 50.25 order_004
ACK order_004 ADDED @ 50.25 x 50
TRADE order_001 order_004 50@50.25
ACK order_004 FILLED
BOOK: BID 50.25 x 50 | ASK 50.30 x 150

> CANCEL order_001
ACK order_001 CANCELED (remaining 50)
BOOK: BID 50.24 x 200 | ASK 50.30 x 150

> ADD SELL 300 50.20 order_005
ACK order_005 ADDED @ 50.20 x 300
TRADE order_002 order_005 200@50.24
PARTIAL order_005 200 filled, 100 remaining @ 50.20
BOOK: BID -- x -- | ASK 50.20 x 100

> DEPTH
=== ORDER BOOK DEPTH ===
        ASKS
        ----
50.30 | 150 (1 orders)
50.20 | 100 (1 orders)
        ----
      SPREAD: 0.00
        ----
        BIDS
        ----
      (empty)

> STATS
Operations: 5 adds, 1 cancel
Trades: 2 (250 shares)
Avg add latency: 342ns (p50), 891ns (p99)
Avg cancel latency: 156ns (p50), 412ns (p99)
Memory: 5 orders x 64 bytes = 320 bytes

3.5 Real World Outcome

When complete, you will have:

A functional matching engine that can process orders, generate trades, and maintain consistent book state

Benchmark results demonstrating sub-microsecond latency:

=== BENCHMARK RESULTS ===
Add Order:    312ns (p50), 456ns (p95), 1.2us (p99)
Cancel Order: 89ns (p50), 124ns (p95), 234ns (p99)
Best Bid/Ask: 12ns (p50), 18ns (p95), 31ns (p99)
Throughput:   2.4M orders/second sustained

Understanding of why your design choices affect performance
Foundation for Project 3 (Matching Engine) which adds networking

4. Solution Architecture

4.1 High-Level Design

                         ORDER BOOK ENGINE ARCHITECTURE

┌─────────────────────────────────────────────────────────────────────────┐
│                            ORDER BOOK ENGINE                             │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌──────────────────────────────────────────────────────────────────┐   │
│  │                         ORDER INPUT                               │   │
│  │                                                                   │   │
│  │   ┌─────────────┐    ┌─────────────┐    ┌─────────────┐         │   │
│  │   │ Add Order   │    │Cancel Order │    │Modify Order │         │   │
│  │   └──────┬──────┘    └──────┬──────┘    └──────┬──────┘         │   │
│  │          │                  │                  │                  │   │
│  └──────────┼──────────────────┼──────────────────┼──────────────────┘   │
│             │                  │                  │                      │
│             ▼                  ▼                  ▼                      │
│  ┌──────────────────────────────────────────────────────────────────┐   │
│  │                      ORDER VALIDATOR                              │   │
│  │   - Price > 0                                                     │   │
│  │   - Quantity > 0                                                  │   │
│  │   - Order ID unique                                               │   │
│  │   - Side valid (BUY/SELL)                                        │   │
│  └──────────────────────────────────────────────────────────────────┘   │
│             │                                                            │
│             ▼                                                            │
│  ┌──────────────────────────────────────────────────────────────────┐   │
│  │                      MATCHING ENGINE                              │   │
│  │                                                                   │   │
│  │   ┌─────────────────────────────────────────────────────────┐   │   │
│  │   │                  ORDER BOOK (per symbol)                 │   │   │
│  │   │                                                          │   │   │
│  │   │   ┌──────────────────┐      ┌──────────────────┐        │   │   │
│  │   │   │    BIDS (Buy)    │      │   ASKS (Sell)    │        │   │   │
│  │   │   │  (desc by price) │      │ (asc by price)   │        │   │   │
│  │   │   │                  │      │                  │        │   │   │
│  │   │   │ PriceLevel 50.26 │      │ PriceLevel 50.28 │        │   │   │
│  │   │   │   └─ Order Queue │      │   └─ Order Queue │        │   │   │
│  │   │   │ PriceLevel 50.25 │      │ PriceLevel 50.29 │        │   │   │
│  │   │   │   └─ Order Queue │      │   └─ Order Queue │        │   │   │
│  │   │   │        ...       │      │        ...       │        │   │   │
│  │   │   └──────────────────┘      └──────────────────┘        │   │   │
│  │   │                                                          │   │   │
│  │   │   ┌──────────────────────────────────────────────┐      │   │   │
│  │   │   │           ORDER ID -> ORDER* MAP             │      │   │   │
│  │   │   │   (for O(1) cancel/modify by order_id)       │      │   │   │
│  │   │   └──────────────────────────────────────────────┘      │   │   │
│  │   │                                                          │   │   │
│  │   └─────────────────────────────────────────────────────────┘   │   │
│  │                                                                   │   │
│  └──────────────────────────────────────────────────────────────────┘   │
│             │                                                            │
│             ▼                                                            │
│  ┌──────────────────────────────────────────────────────────────────┐   │
│  │                       EVENT OUTPUT                                │   │
│  │                                                                   │   │
│  │   ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐  │   │
│  │   │Order ACK │    │  Trade   │    │ Cancel   │    │Market    │  │   │
│  │   │          │    │  Event   │    │   ACK    │    │Data      │  │   │
│  │   └──────────┘    └──────────┘    └──────────┘    └──────────┘  │   │
│  │                                                                   │   │
│  └──────────────────────────────────────────────────────────────────┘   │
│                                                                          │
│  ┌──────────────────────────────────────────────────────────────────┐   │
│  │                      MEMORY POOL                                  │   │
│  │   Pre-allocated Order objects for zero-allocation hot path       │   │
│  └──────────────────────────────────────────────────────────────────┘   │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘

4.2 Key Components

Component	Responsibility	Performance Requirement
Order Validator	Validate order parameters before processing	O(1), < 50ns
Order Book	Maintain sorted price levels and order queues	O(log n) add, O(1) best price
Price Level	FIFO queue of orders at same price	O(1) add/remove from ends
Order Map	OrderID -> Order* for fast lookup	O(1) lookup
Matching Logic	Execute crossing orders, generate trades	O(m) where m = matched orders
Memory Pool	Pre-allocated order storage	O(1) allocate/deallocate
Event Publisher	Output trades, acks, market data	Non-blocking

4.3 Data Structures

The choice of data structures is the most critical design decision. Let’s analyze the options:

Price Level Organization:

Data Structure	Add Level	Remove Level	Find Best	Memory Overhead	Cache Behavior
`std::map<Price, Level>`	O(log n)	O(log n)	O(1) amortized	High (nodes)	Poor (pointer chase)
Sorted vector	O(n)	O(n)	O(1)	Low	Excellent
Skip list	O(log n)	O(log n)	O(1)	Medium	Medium
Array + hash map	O(1)	O(1)	O(1)	Medium	Excellent

Recommended Approach: Price Level Array

For typical instruments, prices cluster in a range. We can use:

Price Level Array Design

Price Range: $100.00 - $200.00 at $0.01 increments = 10,000 possible levels
Array Size: 10,000 x sizeof(PriceLevel*) = 80KB (fits in L2 cache!)

Price Index = (price - base_price) / tick_size

┌────────────────────────────────────────────────────────────────┐
│ Index:  0      1      2      3      ...   9999                 │
│ Price: 100.00 100.01 100.02 100.03  ...  199.99               │
│        │      │      │      │       ...   │                   │
│        ▼      ▼      ▼      ▼             ▼                   │
│       NULL  Level* NULL  Level*    ...  NULL                  │
│              │            │                                    │
│              ▼            ▼                                    │
│            ┌─────┐     ┌─────┐                                │
│            │Queue│     │Queue│                                │
│            │O1→O2│     │O7→O8│                                │
│            └─────┘     └─────┘                                │
└────────────────────────────────────────────────────────────────┘

Best bid/ask tracking:
- best_bid_index: index of highest non-empty bid level
- best_ask_index: index of lowest non-empty ask level

After removing best level, scan for next non-empty level
(Typically 1-3 indices away, so O(1) in practice)

Order Queue at Each Level:

For FIFO within a price level, use an intrusive doubly-linked list:

Intrusive Linked List Design

Instead of:
  std::list<Order*>  // Allocates ListNode objects

Use intrusive:
  struct Order {
      Order* prev;    // Part of Order struct itself
      Order* next;    // No separate allocation!
      uint64_t id;
      uint32_t qty;
      // ...
  };

Price Level:
┌─────────────────────────────────────────────────────────────┐
│  head ──► Order_A ◄──► Order_B ◄──► Order_C ◄── tail       │
│           │                                     │           │
│           └────────── FIFO queue ───────────────┘          │
└─────────────────────────────────────────────────────────────┘

Operations:
  Add to back:   O(1) - tail->next = new_order; tail = new_order;
  Remove any:    O(1) - prev->next = next; next->prev = prev;
  Get front:     O(1) - return head;

Order ID Lookup:

For O(1) cancel by order_id:

// Option 1: Hash map (std::unordered_map)
std::unordered_map<OrderId, Order*> order_map;
// Pros: Works for any order ID range
// Cons: Allocation on insert, hash computation

// Option 2: Pre-allocated array (if order IDs are sequential)
std::vector<Order*> order_array;  // Index = order_id
// Pros: O(1) with no hashing, cache-friendly
// Cons: Requires sequential/bounded order IDs

// Recommendation: Use hash map initially, optimize later if needed

4.4 Memory Layout

Cache efficiency requires careful struct layout:

// BAD: Wasted space due to alignment padding
struct OrderBad {
    char side;           // 1 byte
    // 7 bytes padding
    double price;        // 8 bytes
    uint32_t quantity;   // 4 bytes
    // 4 bytes padding
    uint64_t order_id;   // 8 bytes
    Order* next;         // 8 bytes
    Order* prev;         // 8 bytes
    uint64_t timestamp;  // 8 bytes
};
// sizeof = 56 bytes (with 11 bytes wasted on padding!)

// GOOD: Ordered by size to minimize padding
struct OrderGood {
    uint64_t order_id;   // 8 bytes
    uint64_t timestamp;  // 8 bytes
    double price;        // 8 bytes (or use int64_t fixed-point)
    Order* next;         // 8 bytes
    Order* prev;         // 8 bytes
    uint32_t quantity;   // 4 bytes
    char side;           // 1 byte
    char padding[3];     // explicit padding to 8-byte boundary
};
// sizeof = 48 bytes (no wasted padding)

// BEST: Fixed-point price, minimal size
struct OrderBest {
    Order* next;           // 8 bytes
    Order* prev;           // 8 bytes
    uint64_t order_id;     // 8 bytes
    int64_t price_ticks;   // 8 bytes (price in tick units)
    uint32_t quantity;     // 4 bytes
    uint16_t symbol_id;    // 2 bytes
    uint8_t side;          // 1 byte
    uint8_t flags;         // 1 byte
};
// sizeof = 40 bytes, all useful data

Fixed-Point Pricing:

Floating-point has issues:

Comparison can be imprecise
Slower on some CPUs
Wasted bits for unused precision

Use fixed-point instead:

// Price = 150.25, tick size = 0.01
// Stored as: 15025 (integer)

// To convert:
int64_t price_to_ticks(double price, double tick_size) {
    return static_cast<int64_t>(price / tick_size + 0.5);
}

double ticks_to_price(int64_t ticks, double tick_size) {
    return ticks * tick_size;
}

// Comparisons are now exact integer comparisons!

4.5 Algorithm Overview

Add Order Algorithm:

function add_order(order):
    // Step 1: Validate
    if not valid(order):
        return REJECTED

    // Step 2: Check for match
    if order.side == BUY:
        while order.remaining_qty > 0 AND best_ask exists AND order.price >= best_ask.price:
            match_at_level(order, best_ask_level)
    else:  // SELL
        while order.remaining_qty > 0 AND best_bid exists AND order.price <= best_bid.price:
            match_at_level(order, best_bid_level)

    // Step 3: If remaining quantity, add to book
    if order.remaining_qty > 0:
        level = get_or_create_level(order.side, order.price)
        level.add_order(order)
        order_map[order.id] = order
        update_best_price(order.side)

    return order.status

function match_at_level(incoming, level):
    for resting_order in level.orders:  // FIFO order
        match_qty = min(incoming.remaining_qty, resting_order.remaining_qty)

        emit_trade(incoming, resting_order, match_qty, level.price)

        incoming.remaining_qty -= match_qty
        resting_order.remaining_qty -= match_qty

        if resting_order.remaining_qty == 0:
            level.remove_order(resting_order)
            order_map.remove(resting_order.id)
            order_pool.deallocate(resting_order)

        if incoming.remaining_qty == 0:
            break

    if level.is_empty():
        remove_level(level)

Cancel Order Algorithm:

function cancel_order(order_id):
    order = order_map.get(order_id)
    if order is None:
        return NOT_FOUND

    level = get_level(order.side, order.price)
    level.remove_order(order)  // O(1) with doubly-linked list

    if level.is_empty():
        remove_level(level)
        update_best_price(order.side)

    order_map.remove(order_id)
    order_pool.deallocate(order)

    return CANCELED

5. Implementation Guide

5.1 Development Environment Setup

C++ Setup:

# Ubuntu/Debian
sudo apt-get update
sudo apt-get install g++ cmake gdb valgrind linux-tools-generic

# Verify compiler supports C++17
g++ --version  # Need 7+ for C++17

# Install perf for profiling (Linux)
sudo apt-get install linux-tools-common

# Create project
mkdir -p orderbook/{src,include,tests,benchmarks}
cd orderbook

Rust Setup:

# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source $HOME/.cargo/env

# Create project
cargo new orderbook
cd orderbook

# Add dependencies to Cargo.toml
# [dependencies]
# criterion = "0.5" # For benchmarking

5.2 Project Structure

C++ Structure:

orderbook/
├── CMakeLists.txt
├── src/
│   ├── main.cpp              # CLI entry point
│   ├── orderbook.cpp         # Order book implementation
│   ├── matching_engine.cpp   # Matching logic
│   └── memory_pool.cpp       # Custom allocator (Phase 2)
├── include/
│   ├── order.hpp             # Order struct
│   ├── orderbook.hpp         # OrderBook class
│   ├── price_level.hpp       # PriceLevel with order queue
│   ├── matching_engine.hpp   # Matching interface
│   └── memory_pool.hpp       # Pool allocator
├── tests/
│   ├── test_orderbook.cpp
│   └── test_matching.cpp
├── benchmarks/
│   └── bench_orderbook.cpp
└── README.md

Rust Structure:

orderbook/
├── Cargo.toml
├── src/
│   ├── main.rs               # CLI entry point
│   ├── lib.rs                # Library root
│   ├── order.rs              # Order struct
│   ├── orderbook.rs          # OrderBook implementation
│   ├── price_level.rs        # PriceLevel
│   └── matching.rs           # Matching logic
├── tests/
│   └── integration_tests.rs
├── benches/
│   └── orderbook_bench.rs
└── README.md

5.3 The Core Question You’re Answering

“How do you organize price levels and orders such that the most common operations (add order, match, cancel) are O(1) or O(log n) while maintaining cache efficiency?”

This question forces you to confront the reality that big-O notation alone is insufficient for HFT. An O(log n) tree operation with poor cache behavior can be slower than an O(n) scan of a cache-resident array.

The answer involves:

Choosing data structures that keep hot data together in memory
Minimizing pointer indirection (each pointer chase is a potential cache miss)
Pre-allocating memory to avoid allocation in the hot path
Using integer arithmetic (fixed-point prices) instead of floating-point

5.4 Concepts You Must Understand First

Before starting implementation, verify your understanding:

Concept	Self-Assessment Question	Where to Learn
Hash maps	What is the expected time complexity of insert/lookup? What happens during a resize?	CS:APP Ch. 6, Language docs
Linked lists	How do you remove a node from a doubly-linked list in O(1)?	Any data structures textbook
CPU cache hierarchy	What is a cache line? Why does data locality matter?	CS:APP Ch. 6
Struct alignment	Why does the order of struct members affect sizeof?	CS:APP Ch. 3.9
Memory allocation	Why is malloc slow? What is a memory pool?	CS:APP Ch. 9
Fixed-point arithmetic	How do you represent $150.25 as an integer?	Any numerics reference

5.5 Questions to Guide Your Design

Data Structure Questions:

How will you represent prices internally? (float, double, int64 ticks?)
What data structure will hold all price levels? (map, vector, array?)
How will you maintain FIFO order within a price level?
How will you enable O(1) cancel-by-order-id?
How will you track the best bid and best ask efficiently?

Memory Questions:

Will you allocate Order objects on demand or pre-allocate?
How much memory will your order book use with 100,000 orders?
What is the size of your Order struct after considering alignment?
Will your data structures fit in L1/L2/L3 cache?

Algorithm Questions:

When a new buy order matches multiple sell orders, in what order are they filled?
After filling the best ask level completely, how do you find the new best ask?
What happens if a cancel arrives for an order that’s currently being matched?
How do you handle partial fills?

Performance Questions:

What operations happen most frequently? (add, cancel, match?)
Where will cache misses likely occur in your design?
How will you benchmark your implementation?
What is your target latency for each operation?

5.6 Thinking Exercise

Before writing any code, work through this scenario by hand:

Scenario:

Start with an empty order book. Process these orders in sequence:

ADD BUY  100 @ 50.25  order_A
ADD BUY  200 @ 50.24  order_B
ADD BUY  150 @ 50.25  order_C
ADD SELL  50 @ 50.30  order_D
ADD SELL 300 @ 50.25  order_E
CANCEL order_B
ADD SELL 100 @ 50.20  order_F

Questions to answer on paper:

After step 3, draw the bid side of the book. Which orders are at price 50.25, and in what order?
After step 4, what does the book look like? What trades (if any) were generated?
After step 5, what trades were generated? List each trade with price, quantity, buyer, seller.
After step 5, what is the state of order_A? Order_C? Order_E?
After step 6, draw the full book (bids and asks).
After step 7, what trades were generated? What is the final book state?

Verify your answers:

Total shares bought should equal total shares sold
Every trade should have happened at the price of the resting order
FIFO should have been maintained within each price level

5.7 Hints in Layers

If you’re stuck, reveal hints one at a time:

Hint 1 - Starting Direction

Start with the simplest possible implementation using standard library containers:

// C++
std::map<int64_t, std::list<Order*>> bids;  // price -> orders (descending)
std::map<int64_t, std::list<Order*>> asks;  // price -> orders (ascending)
std::unordered_map<uint64_t, Order*> order_lookup;  // id -> order

// Rust
use std::collections::{BTreeMap, VecDeque, HashMap};

bids: BTreeMap<i64, VecDeque<Order>>  // Reverse for descending
asks: BTreeMap<i64, VecDeque<Order>>
order_lookup: HashMap<u64, OrderRef>

Get correctness first. Optimize later.

Hint 2 - More Specific Guidance

For the matching logic, process orders in this sequence:

Check if incoming order crosses the spread
- Buy order crosses if: buy_price >= best_ask_price
- Sell order crosses if: sell_price <= best_bid_price

If it crosses, iterate through opposing levels

while (remaining_qty > 0 && crosses_spread(order)) {
    auto& best_level = get_best_opposing_level(order.side);
    match_against_level(order, best_level);
}

Match against individual orders in FIFO order

void match_against_level(Order& incoming, Level& level) {
    while (!level.orders.empty() && incoming.remaining > 0) {
        Order& resting = level.orders.front();
        uint32_t match_qty = std::min(incoming.remaining, resting.remaining);
        execute_trade(incoming, resting, match_qty, level.price);
        // ... update remaining quantities, remove filled orders
    }
}

After matching, add remainder to book if any

Hint 3 - Technical Details

Price Level Array Implementation (for optimization phase):

class OrderBook {
    static constexpr int64_t MIN_PRICE = 1;      // $0.01
    static constexpr int64_t MAX_PRICE = 1000000; // $10,000.00
    static constexpr size_t NUM_LEVELS = MAX_PRICE - MIN_PRICE + 1;

    std::array<PriceLevel*, NUM_LEVELS> bid_levels{};
    std::array<PriceLevel*, NUM_LEVELS> ask_levels{};

    int64_t best_bid_price = 0;  // 0 = no bids
    int64_t best_ask_price = MAX_PRICE + 1;  // MAX+1 = no asks

    size_t price_to_index(int64_t price) {
        return static_cast<size_t>(price - MIN_PRICE);
    }

    PriceLevel* get_bid_level(int64_t price) {
        return bid_levels[price_to_index(price)];
    }

    void update_best_bid_after_remove(int64_t removed_price) {
        if (removed_price != best_bid_price) return;
        // Scan downward for next non-empty level
        for (int64_t p = removed_price - 1; p >= MIN_PRICE; --p) {
            if (bid_levels[price_to_index(p)] != nullptr) {
                best_bid_price = p;
                return;
            }
        }
        best_bid_price = 0;  // No bids
    }
};

Intrusive List for O(1) Order Removal:

struct Order {
    Order* prev = nullptr;
    Order* next = nullptr;
    // ... other fields
};

class OrderQueue {
    Order* head = nullptr;
    Order* tail = nullptr;

public:
    void push_back(Order* order) {
        if (tail) {
            tail->next = order;
            order->prev = tail;
            tail = order;
        } else {
            head = tail = order;
        }
    }

    void remove(Order* order) {
        if (order->prev) order->prev->next = order->next;
        else head = order->next;

        if (order->next) order->next->prev = order->prev;
        else tail = order->prev;

        order->prev = order->next = nullptr;
    }
};

Hint 4 - Tools and Debugging

Benchmarking:

#include <chrono>

auto start = std::chrono::high_resolution_clock::now();
// ... operation
auto end = std::chrono::high_resolution_clock::now();
auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();

For more accurate timing (x86):

#include <x86intrin.h>

uint64_t start = __rdtsc();
// ... operation
uint64_t end = __rdtsc();
uint64_t cycles = end - start;
// Convert to ns: cycles * (1.0 / cpu_freq_ghz)

Profiling with perf:

# Record performance counters
perf stat -e cycles,instructions,cache-misses,cache-references ./orderbook_bench

# Record call graph for flamegraph
perf record -g ./orderbook_bench
perf report

Checking struct sizes:

static_assert(sizeof(Order) == 40, "Order size unexpected");
std::cout << "Order size: " << sizeof(Order) << " bytes\n";
std::cout << "Order alignment: " << alignof(Order) << "\n";

Memory allocation tracking:

# Use valgrind massif for heap profiling
valgrind --tool=massif ./orderbook_bench
ms_print massif.out.*

5.8 The Interview Questions They’ll Ask

After completing this project, you should be able to answer:

“Explain how a limit order book works.”
- Expected: Clear explanation of bids, asks, price levels, FIFO matching
- Bonus: Mention spread, depth, liquidity concepts
“What data structures would you use for an order book and why?”
- Expected: Discuss tradeoffs between maps, vectors, arrays
- Bonus: Mention cache efficiency, intrusive lists, memory pools
“How do you handle order cancellation efficiently?”
- Expected: Hash map for O(1) lookup, doubly-linked list for O(1) removal
- Bonus: Discuss order ID generation strategies
“What is price-time priority and why is it important?”
- Expected: FIFO within price level, incentivizes speed
- Bonus: Discuss alternative algorithms (pro-rata, size-priority)
“How would you optimize an order book for low latency?”
- Expected: Memory pooling, cache optimization, fixed-point prices
- Bonus: Discuss false sharing, branch prediction, lock-free structures
“What happens when a market order arrives?”
- Expected: Matches against best levels until filled or liquidity exhausted
- Bonus: Discuss market vs limit, IOC, FOK order types
“How does your design scale with 1 million price levels?”
- Expected: Discuss when array-based breaks down, alternatives
- Bonus: Discuss sparse data structures, adaptive strategies
“What would you do differently for production?”
- Expected: Acknowledge simplifications, discuss what production adds
- Bonus: Mention FIX protocol, persistence, replication

5.9 Books That Will Help

Topic	Book	Chapter
Cache and memory hierarchy	Computer Systems: A Programmer’s Perspective	Ch. 6
Memory layout and alignment	Computer Systems: A Programmer’s Perspective	Ch. 3.9
Memory management and allocators	Computer Systems: A Programmer’s Perspective	Ch. 9
C++ data structures	Building Low Latency Applications with C++	Ch. 3-4
Order book design	Building Low Latency Applications with C++	Ch. 5-6
Rust data structures	Programming Rust	Ch. 16
Lock-free fundamentals (for later)	Rust Atomics and Locks	Ch. 1-3
Trading system architecture	Developing High-Frequency Trading Systems	Ch. 1-4

5.10 Implementation Phases

Phase 1: Basic Functionality (Days 1-3)

Goals:

Get a working order book with std::map (or BTreeMap in Rust)
Implement add, cancel, and match
Create basic CLI for testing

Tasks:

Define Order struct with all necessary fields
Create PriceLevel class with order queue (use std::list or VecDeque)
Create OrderBook class with bid/ask maps
Implement add_order() with no matching (just add to book)
Implement cancel_order()
Implement get_best_bid() and get_best_ask()
Create simple CLI to test

Checkpoint: Can add orders, cancel them, and view book state.

Phase 2: Matching Logic (Days 4-5)

Goals:

Implement price-time priority matching
Generate trade events
Handle partial fills

Tasks:

Implement crosses_spread() check
Implement match_against_level()
Modify add_order() to call matching before adding to book
Create Trade struct for trade events
Handle order state changes (partial, filled)
Test matching with various scenarios

Checkpoint: Orders match correctly, trades are generated, book updates properly.

Phase 3: Optimization - Memory (Days 6-8)

Goals:

Eliminate heap allocation in hot path
Switch to intrusive linked lists
Implement memory pool for orders

Tasks:

Measure baseline allocation with valgrind/massif
Convert Order to use intrusive list pointers
Implement simple fixed-size memory pool
Integrate pool with order book
Verify zero allocations during order processing
Measure improvement

Checkpoint: Adding/canceling orders allocates no heap memory.

Phase 4: Optimization - Cache (Days 9-11)

Goals:

Switch to array-based price level storage
Optimize struct layout
Use fixed-point prices

Tasks:

Implement price-to-index conversion
Create array-based level storage
Implement best price tracking with scan
Optimize Order struct layout for cache lines
Switch to int64_t fixed-point prices
Measure cache misses before/after

Checkpoint: Cache miss rate significantly reduced.

Phase 5: Benchmarking (Days 12-14)

Goals:

Comprehensive latency measurement
Throughput testing
Document performance characteristics

Tasks:

Create benchmark harness with rdtsc or chrono
Measure add order latency (p50, p95, p99)
Measure cancel latency
Measure throughput (orders/second)
Compare against baseline (Phase 1 version)
Document results and analysis

Checkpoint: Performance targets met, results documented.

5.11 Key Implementation Decisions

Decision	Options	Recommendation	Rationale
Price representation	float, double, int64 fixed-point	int64 fixed-point	Exact comparisons, smaller struct, faster
Level storage	std::map, sorted vector, array	Array (then map fallback)	O(1) access, cache-friendly
Order queue	std::list, std::deque, intrusive list	Intrusive doubly-linked	O(1) removal, no allocation
Order lookup	std::unordered_map, flat_hash_map	std::unordered_map (then optimize)	Standard, works for prototyping
Memory	malloc, memory pool	Memory pool (Phase 3)	Predictable, fast, no fragmentation

6. Testing Strategy

6.1 Unit Tests

Test Category	What to Test
Order struct	Correct initialization, size, alignment
PriceLevel	Add/remove orders, FIFO order, total quantity
OrderBook	Add to empty book, add to existing level, best bid/ask
Matching	Single match, multi-level match, partial fill
Cancel	Cancel existing, cancel non-existent, cancel partial

6.2 Property-Based Tests

Use property testing (QuickCheck-style) to find edge cases:

Property 1: Total buy quantity in trades == Total sell quantity in trades
Property 2: Trade price is always between bid and ask prices
Property 3: After any sequence of operations, book is internally consistent
Property 4: Order of trades respects FIFO within each price level
Property 5: Canceled order is never matched in subsequent trades

6.3 Critical Test Cases

Test 1: Simple Match
  ADD BUY 100 @ 50.00
  ADD SELL 100 @ 50.00
  Expected: Trade 100 @ 50.00, book empty

Test 2: Partial Fill
  ADD BUY 100 @ 50.00
  ADD SELL 50 @ 50.00
  Expected: Trade 50 @ 50.00, bid of 50 @ 50.00 remains

Test 3: Multi-Level Match
  ADD BUY 100 @ 50.00
  ADD BUY 100 @ 49.00
  ADD SELL 150 @ 48.00
  Expected: Trade 100 @ 50.00, Trade 50 @ 49.00,
            bid of 50 @ 49.00 remains, ask of 0

Test 4: FIFO Priority
  ADD BUY 100 @ 50.00 order_A
  ADD BUY 100 @ 50.00 order_B
  ADD SELL 100 @ 50.00 order_C
  Expected: Trade order_A-order_C 100 @ 50.00,
            order_B still in book

Test 5: Cancel Before Match
  ADD BUY 100 @ 50.00 order_A
  CANCEL order_A
  ADD SELL 100 @ 50.00 order_B
  Expected: No trade, ask of 100 @ 50.00

Test 6: Many Price Levels
  ADD 1000 buy orders at prices 50.00-50.99 (0.01 increments)
  ADD SELL 100 @ 48.00
  Expected: 100 trades, best bid drops appropriately

Test 7: High Volume
  ADD 100,000 buy orders at random prices
  ADD 100,000 sell orders at random prices
  Verify: No crashes, book consistent, all trades valid

6.4 Stress Tests

Stress Test 1: Throughput
  Add 1,000,000 orders as fast as possible
  Measure orders per second

Stress Test 2: Mixed Workload
  70% adds, 25% cancels, 5% queries
  Run for 10 seconds
  Measure latency distribution

Stress Test 3: Adversarial
  Add orders at 10,000 different price levels
  Cancel all of them
  Verify no memory leaks

7. Common Pitfalls and Debugging

Problem	Symptom	Cause	Solution	Verification
Wrong match order	Orders matched out of FIFO sequence	Using wrong data structure (set, unordered)	Use ordered container or explicit timestamp comparison	Log order arrival times, verify first-in matched first
Memory leaks	Memory usage grows over time	Orders not freed after fill/cancel	Implement memory pool, track allocations	Run with valgrind, verify count goes to zero
Cache thrashing	High latency variance	Pointer-heavy data structures	Switch to array-based, intrusive lists	Check cache misses with perf stat
Floating point bugs	$0.10 + $0.20 != $0.30	Using float/double for prices	Use fixed-point integer representation	Test with known problematic values
Best price stale	Report wrong best bid/ask	Not updating after cancel	Scan for new best after level empties	Unit test cancel at best price
Quantity overflow	Negative or huge quantities	Not checking for overflow	Use checked arithmetic or assert bounds	Test with MAX_INT quantities
ID collision	Wrong order canceled	Order IDs reused incorrectly	Use monotonic incrementing IDs	Assert uniqueness on add
Race conditions (future)	Corrupt state	Concurrent access without locks	Single-threaded first, then add proper sync	Run thread sanitizer

7.1 Debugging Strategies

Print the book state:

void debug_print_book() {
    std::cout << "=== BIDS ===\n";
    for (const auto& [price, level] : bids) {
        std::cout << price << ": ";
        for (const auto& order : level.orders) {
            std::cout << "[" << order.id << ":" << order.qty << "] ";
        }
        std::cout << "\n";
    }
    // ... similar for asks
}

Validate invariants:

void validate_book() {
    // All orders in book should be in lookup map
    // All orders in lookup should be in book
    // Best bid should be highest non-empty bid level
    // Best ask should be lowest non-empty ask level
    // No price level should be empty
    // Total quantity at level should match sum of orders
}

Trace execution:

#define TRACE(msg) std::cerr << __FUNCTION__ << ":" << __LINE__ << " " << msg << "\n"

void add_order(Order& order) {
    TRACE("order_id=" << order.id << " price=" << order.price);
    // ...
}

8. Extensions and Challenges

8.1 Beginner Extensions

Add modify order: Change price or quantity of existing order
Market order support: IOC (Immediate or Cancel) execution
Order history: Keep log of all orders and trades
Pretty printing: Colorized book display

8.2 Intermediate Extensions

Multiple symbols: Order book per symbol, symbol manager
Order types: Stop orders, stop-limit, trailing stop
Time in force: Day, GTC, GTD, FOK, IOC
Self-trade prevention: Prevent same user matching with self
ITCH protocol: Parse NASDAQ ITCH data to populate book

8.3 Advanced Extensions

Lock-free order book: Concurrent reads while writes happening
FPGA acceleration: Move hot path to hardware
Persistent state: Recover book state after crash
Market data feed: Publish L2/L3 data over UDP multicast
FIX gateway: Accept orders via FIX protocol

9. Real-World Connections

9.1 How Production Systems Differ

Aspect	This Project	Production System
Orders per second	100,000	10,000,000+
Latency target	< 1 microsecond	< 100 nanoseconds
Price levels	Thousands	Millions possible
Symbols	1	10,000+
Redundancy	None	Hot-hot replication
Persistence	None	Every order journaled
Connectivity	CLI	FIX, ITCH, proprietary
Monitoring	printf	Prometheus, Grafana, alerting

9.2 Real Exchange Examples

NASDAQ Matching Engine:

Processes 10+ million orders per day per symbol
Latency: ~50 microseconds end-to-end
Uses ITCH protocol for market data (binary, efficient)
FIX for order entry
Runs on commodity Linux servers (not FPGA)

CME Globex:

Largest futures exchange
Sub-millisecond matching
Uses iLink protocol (binary)
Co-location available for HFT firms

IEX (Investors Exchange):

Famous for “speed bump” (350 microsecond delay)
Designed to reduce HFT advantage
Same order book mechanics, different latency profile

9.3 Open Source References

Project	Description	Useful For
LMAX Disruptor	Lock-free ring buffer	Understanding low-latency patterns
OpenHFT Chronicle Queue	Persisted queue	Journaling patterns
quickfix	FIX engine	Protocol implementation
nautilus_trader	Python/Rust trading	Architecture reference

10. Resources

10.1 Essential Reading

“Building Low Latency Applications with C++” by Sourav Ghosh (Ch. 3-6)
- Step-by-step order book implementation
- C++ specific optimizations
“Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron (Ch. 6)
- Cache hierarchy and locality
- Why memory layout matters

10.2 Articles and Papers

“Limit Order Book Implementation for Low Latency Trading” - Alex Abosi
- Data structure comparison
- Benchmarks
“C++ Design Patterns for Low-Latency Applications” - arXiv
- Academic treatment
- Pattern catalog

10.3 GitHub Repositories

aspone/OrderBook - C++ reference (300-600ns)
CppTrader - Full trading system in C++
matching-engine - Go implementation (for comparison)

10.4 Videos

“CppCon: Building Low Latency Trading Systems” - Various years
“LMAX Exchange Architecture” - Martin Thompson (InfoQ)
“Lock-Free Programming” - Herb Sutter (Channel 9)

11. Self-Assessment Checklist

Understanding

I can explain what a limit order book is without looking at notes
I can describe price-time priority and why it matters
I understand why cache efficiency is critical for HFT
I can explain the tradeoffs between different data structures
I know what fixed-point arithmetic is and when to use it
I understand what intrusive data structures are

Implementation

My order book correctly maintains sorted price levels
Orders match in FIFO order within price levels
Cancel is O(1) using order ID lookup
No heap allocation occurs during normal operation (Phase 3+)
My benchmark shows sub-microsecond latency
All test cases pass

Growth

I profiled my code and found at least one bottleneck
I measured and can explain my cache miss rate
I can read and understand the reference implementations
I could explain my design decisions in an interview

12. Submission / Completion Criteria

Minimum Viable Completion

Order book supports add, cancel, and match operations
Matching follows price-time priority correctly
Basic CLI demonstrates functionality
At least 5 unit tests pass
Code compiles without warnings

Full Completion

All 10 test scenarios pass
Memory pool eliminates hot-path allocation
Add order latency < 1 microsecond (p99)
Cancel order latency < 500 nanoseconds (p99)
Throughput > 500,000 orders/second
Cache miss rate documented

Excellence (Going Above and Beyond)

Array-based price level storage implemented
Latency < 500ns (p99) for all operations
Multiple symbols supported
Benchmark comparison with baseline documented
ITCH or FIX protocol parsing integrated

This guide was expanded from HIGH_FREQUENCY_TRADING_CPP_RUST_LEARNING_PROJECTS.md. For the complete learning path, see the project index.