Project 3: Simple Matching Engine

Build a complete matching engine that accepts orders via TCP, matches them using your order book, and publishes trades and market data - the core of what exchanges do, integrating networking, protocol design, and real-time state management.


Quick Reference

Attribute Value
Difficulty Advanced
Time Estimate 3-4 weeks
Languages Rust (Primary), C++, C, Go (Alternatives)
Prerequisites Project 1 (Order Book), basic networking, event-driven I/O concepts
Key Topics Event-driven I/O, protocol design, state machines, TCP networking, concurrent connections
Coolness Level Level 5: Production Exchange Core
Portfolio Value Interview Gold + Job Ready

1. Learning Objectives

By completing this project, you will:

  1. Master event-driven I/O: Understand epoll/kqueue/io_uring and why blocking I/O is unacceptable in low-latency systems
  2. Design binary protocols: Create efficient wire formats for order submission and trade reporting
  3. Implement order lifecycle state machines: Track orders from submission through acknowledgment to fill or cancel
  4. Handle concurrent TCP connections: Accept and process orders from multiple clients simultaneously without blocking
  5. Integrate complex subsystems: Combine your order book with networking layer, creating a complete mini-exchange
  6. Understand thread orchestration: Learn when to use single-threaded event loops vs. multi-threaded architectures

2. Theoretical Foundation

2.1 What is a Matching Engine?

A matching engine is the heart of any exchange. It receives orders from participants, matches buyers with sellers according to defined rules, and publishes the results. Every stock exchange, cryptocurrency exchange, and futures market runs a matching engine.

                    MATCHING ENGINE OVERVIEW

    ┌─────────────────────────────────────────────────────────────┐
    │                                                             │
    │                    MATCHING ENGINE                          │
    │                                                             │
    │  ┌───────────────────────────────────────────────────────┐ │
    │  │                 TCP ACCEPTOR                          │ │
    │  │   - Listen on port 9000                               │ │
    │  │   - Accept new client connections                     │ │
    │  │   - Non-blocking I/O                                  │ │
    │  └───────────────────┬───────────────────────────────────┘ │
    │                      │                                      │
    │                      ▼                                      │
    │  ┌───────────────────────────────────────────────────────┐ │
    │  │              PROTOCOL PARSER                          │ │
    │  │   - Deserialize binary messages                       │ │
    │  │   - Validate order fields                             │ │
    │  │   - Create Order objects                              │ │
    │  └───────────────────┬───────────────────────────────────┘ │
    │                      │                                      │
    │                      ▼                                      │
    │  ┌───────────────────────────────────────────────────────┐ │
    │  │              ORDER BOOK ENGINE                        │ │
    │  │   (Your Project 1 implementation)                     │ │
    │  │   - Add orders to book                                │ │
    │  │   - Match crossing orders                             │ │
    │  │   - Generate trade events                             │ │
    │  └───────────────────┬───────────────────────────────────┘ │
    │                      │                                      │
    │                      ▼                                      │
    │  ┌───────────────────────────────────────────────────────┐ │
    │  │              MESSAGE PUBLISHER                        │ │
    │  │   - Order acknowledgments                             │ │
    │  │   - Trade execution reports                           │ │
    │  │   - Market data broadcast                             │ │
    │  └───────────────────────────────────────────────────────┘ │
    │                                                             │
    └─────────────────────────────────────────────────────────────┘

              Clients                            Subscribers
         ┌──────────────┐                   ┌──────────────────┐
         │ Trader A     │───────────────────│ Market Data Feed │
         │ Trader B     │      TCP          │ Risk System      │
         │ Market Maker │───────────────────│ Audit Log        │
         └──────────────┘                   └──────────────────┘

2.2 Event-Driven I/O

The Problem with Blocking I/O:

BLOCKING I/O: One Thread per Connection

Thread 1: accept() ──► read() ──► [BLOCKS] ──► process() ──► write()
Thread 2: accept() ──► read() ──► [BLOCKS] ──► process() ──► write()
Thread 3: accept() ──► read() ──► [BLOCKS] ──► process() ──► write()
   ...
Thread 1000: accept() ──► read() ──► [BLOCKS] ──► process() ──► write()

Problems:
  - 1000 connections = 1000 threads
  - Each thread uses ~1MB stack (default on Linux)
  - Context switching overhead
  - Cache pollution from thread switching
  - Latency variance from scheduler decisions

Event-Driven Alternative:

EVENT-DRIVEN I/O: Single Thread, Many Connections

                    ┌─────────────────────────────────────┐
                    │          EVENT LOOP                 │
                    │                                     │
                    │   1. Wait for events (epoll_wait)   │
                    │   2. Process ready file descriptors │
                    │   3. Return to step 1               │
                    │                                     │
                    └─────────────────────────────────────┘
                                    │
           ┌────────────────────────┼────────────────────────┐
           │                        │                        │
           ▼                        ▼                        ▼
    ┌─────────────┐          ┌─────────────┐          ┌─────────────┐
    │ Connection 1│          │ Connection 2│          │ Connection N│
    │   READABLE  │          │   WRITABLE  │          │   READABLE  │
    └─────────────┘          └─────────────┘          └─────────────┘

Single thread handles ALL connections!
- No context switching
- Predictable latency
- CPU cache stays hot

Platform-Specific APIs:

Platform API Key Function Performance
Linux epoll epoll_wait() Excellent
Linux (newer) io_uring io_uring_submit() Best
macOS/BSD kqueue kevent() Excellent
Windows IOCP GetQueuedCompletionStatus() Excellent
Cross-platform libuv/mio Varies Good abstraction

2.3 Order Lifecycle State Machine

Every order progresses through a well-defined state machine:

                    ORDER STATE MACHINE

                         ┌─────────────┐
                         │   CREATED   │
                         │  (in-flight)│
                         └──────┬──────┘
                                │
              ┌─────────────────┼─────────────────┐
              │                 │                 │
              ▼                 ▼                 ▼
       ┌─────────────┐   ┌─────────────┐   ┌─────────────┐
       │  REJECTED   │   │  ACCEPTED   │   │ ACK_PENDING │
       │ (validation │   │ (in book,   │   │  (network   │
       │   failed)   │   │   resting)  │   │   delay)    │
       └─────────────┘   └──────┬──────┘   └─────────────┘
                                │
              ┌─────────────────┼─────────────────┐
              │                 │                 │
              ▼                 ▼                 ▼
       ┌─────────────┐   ┌─────────────┐   ┌─────────────┐
       │  CANCELED   │   │PARTIAL_FILL│   │   FILLED    │
       │  (by user   │   │(some qty    │   │ (complete   │
       │   request)  │   │  executed)  │   │  execution) │
       └─────────────┘   └──────┬──────┘   └─────────────┘
                                │
                                ▼
                         ┌─────────────┐
                         │   FILLED    │
                         └─────────────┘

    State Transitions:
    CREATED → REJECTED:      Validation fails (invalid price, bad symbol)
    CREATED → ACCEPTED:      Order added to book, resting
    CREATED → FILLED:        Immediate match, no resting quantity
    ACCEPTED → PARTIAL_FILL: Some quantity matched
    ACCEPTED → FILLED:       All quantity matched
    ACCEPTED → CANCELED:     Cancel request received
    PARTIAL_FILL → FILLED:   Remaining quantity matched
    PARTIAL_FILL → CANCELED: Cancel request for remainder

2.4 Binary Protocol Design

Text protocols (like FIX tag=value) are human-readable but slow. Binary protocols are faster:

TEXT PROTOCOL (FIX-like):

8=FIX.4.2|9=178|35=D|49=TRADER1|56=EXCHANGE|34=12|52=20231215-10:30:00|
11=ORDER123|21=1|55=AAPL|54=1|60=20231215-10:30:00|38=100|40=2|44=150.50|10=128|

Length: 178 bytes
Parsing: String splitting, multiple conversions
Latency: ~2-5 microseconds to parse


BINARY PROTOCOL:

┌──────────────────────────────────────────────────────────────────┐
│ Byte  0-1:  Message Length (uint16_t)  = 42                     │
│ Byte  2:    Message Type (uint8_t)     = 0x01 (NEW_ORDER)       │
│ Byte  3-10: Order ID (uint64_t)        = 123                    │
│ Byte 11-14: Symbol ID (uint32_t)       = 1 (AAPL)               │
│ Byte 15:    Side (uint8_t)             = 1 (BUY)                │
│ Byte 16-23: Price (int64_t ticks)      = 15050 ($150.50)        │
│ Byte 24-27: Quantity (uint32_t)        = 100                    │
│ Byte 28-35: Timestamp (uint64_t ns)    = 1702638600000000000    │
│ Byte 36-39: Client ID (uint32_t)       = 42                     │
│ Byte 40-41: Checksum (uint16_t)        = 0xABCD                 │
└──────────────────────────────────────────────────────────────────┘

Length: 42 bytes
Parsing: Direct memory cast (zero-copy)
Latency: ~50-100 nanoseconds to parse

Message Type Catalog:

Type Code Name Direction Description
0x01 NEW_ORDER Client → Engine Submit new order
0x02 CANCEL_ORDER Client → Engine Cancel existing order
0x03 MODIFY_ORDER Client → Engine Modify price/quantity
0x10 ORDER_ACK Engine → Client Order accepted
0x11 ORDER_REJECTED Engine → Client Order validation failed
0x12 ORDER_CANCELED Engine → Client Cancel confirmed
0x20 TRADE Engine → All Execution occurred
0x30 MARKET_DATA Engine → Subscribers Best bid/offer update

2.5 Fair Matching with Price-Time Priority

Your matching engine must be deterministic and fair:

PRICE-TIME PRIORITY MATCHING

Scenario: Three buy orders at price $150.00

Time 09:00:00.001:  Order A - BUY 100 @ $150.00
Time 09:00:00.003:  Order B - BUY 200 @ $150.00
Time 09:00:00.007:  Order C - BUY 150 @ $150.00

Order Queue at $150.00:
┌──────────┬──────────┬──────────┐
│ Order A  │ Order B  │ Order C  │
│ (first)  │ (second) │ (third)  │
│ 100 qty  │ 200 qty  │ 150 qty  │
└──────────┴──────────┴──────────┘

Incoming: SELL 250 @ $150.00

Step 1: Match with Order A (first in queue)
        Trade: 100 @ $150.00
        Order A: FILLED
        Remaining sell: 150

Step 2: Match with Order B (next in queue)
        Trade: 150 @ $150.00
        Order B: PARTIAL (50 remaining)
        Sell order: FILLED

Result:
- Order A: FILLED (was first)
- Order B: PARTIAL, 50 remaining (was second)
- Order C: UNTOUCHED (was third)

Queue at $150.00 after matching:
┌──────────┬──────────┐
│ Order B  │ Order C  │
│ 50 qty   │ 150 qty  │
└──────────┴──────────┘

2.6 Why This Matters

This is what real exchanges do. The matching engine is:

The Core Revenue Generator:

  • NYSE processes 6+ billion shares daily
  • CME handles $4.5+ trillion in futures daily
  • Every trade generates fees

The Speed Differentiator:

  • NASDAQ: ~50 microseconds end-to-end
  • CME Globex: sub-millisecond matching
  • IEX: 350 microsecond speed bump (intentional delay)

The Most Regulated Component:

  • Must be deterministic (same input = same output)
  • Must be auditable (every order/trade logged)
  • Must be fair (price-time priority strictly enforced)
  • Subject to SEC/CFTC rules

2.7 Historical Context

EVOLUTION OF MATCHING ENGINES

1792-1971:    Open Outcry
              - Traders shouting in trading pits
              - Specialists maintaining order books on paper
              - Speed: minutes per trade

1971:         NASDAQ Founded
              - First electronic quote system
              - Still phone-based trading
              - Speed: seconds per trade

1987:         Black Monday Crash
              - Electronic systems overwhelmed
              - Led to circuit breakers
              - Highlighted need for robust systems

1990s:        Electronic Communication Networks (ECNs)
              - Island, Archipelago, Instinet
              - Fully electronic matching
              - Speed: 100s of milliseconds

2000s:        Algorithmic Trading Era
              - Decimalization (2001)
              - Reg NMS (2005)
              - Co-location begins
              - Speed: milliseconds

2010s:        HFT Dominance
              - Microwave networks
              - FPGA-based trading
              - Flash crash (2010)
              - Speed: microseconds

2020s:        Nanosecond Wars
              - Kernel bypass (DPDK)
              - Lock-free everything
              - AI market making
              - Speed: nanoseconds

2.8 Common Misconceptions

Misconception 1: “Just use threads - one per connection”

Reality: Thread-per-connection doesn’t scale and introduces unpredictable latency from context switching. Event-driven architectures handle thousands of connections with a single thread.

Misconception 2: “TCP is too slow; I need UDP”

Reality: TCP is fine for order submission. The kernel handles buffering efficiently. UDP multicast is used for market data broadcasts where occasional packet loss is acceptable.

Misconception 3: “The matching algorithm is complex”

Reality: Matching is simple FIFO logic. The complexity is in the surrounding infrastructure: networking, protocol parsing, state management, and failure handling.

Misconception 4: “I need to support FIX protocol immediately”

Reality: Start with a simple binary protocol. FIX is verbose and complex. Production systems often have a FIX gateway that translates to internal binary protocols.

Misconception 5: “A single-threaded event loop can’t be fast enough”

Reality: A well-optimized single-threaded event loop can process 100,000+ orders per second. Multi-threading adds complexity and often hurts latency due to synchronization.


3. Project Specification

3.1 What You Will Build

A complete matching engine that:

  1. Accepts TCP connections from multiple clients on a configurable port
  2. Parses binary order messages with zero-copy deserialization
  3. Matches orders using your Project 1 order book with price-time priority
  4. Publishes execution reports to relevant clients
  5. Broadcasts market data to all connected subscribers
  6. Tracks order state through the complete lifecycle
  7. Provides performance metrics including latency percentiles and throughput

3.2 Functional Requirements

ID Requirement Priority
FR1 Accept TCP connections on configurable port Must Have
FR2 Parse binary NEW_ORDER messages Must Have
FR3 Parse binary CANCEL_ORDER messages Must Have
FR4 Send ORDER_ACK to client upon order acceptance Must Have
FR5 Send ORDER_REJECTED with reason code on validation failure Must Have
FR6 Send TRADE messages to both counterparties Must Have
FR7 Broadcast MARKET_DATA updates after book changes Must Have
FR8 Handle client disconnection gracefully Must Have
FR9 Support at least 100 concurrent connections Must Have
FR10 Process orders in FIFO order per connection Must Have
FR11 Support MODIFY_ORDER messages Should Have
FR12 Provide command interface for statistics Should Have
FR13 Support graceful shutdown with order draining Should Have
FR14 Log all orders and trades for audit Should Have

3.3 Non-Functional Requirements

Requirement Target Measurement
Order-to-Ack Latency < 20us (p50), < 50us (p99) Measure from message receive to ack send
Throughput > 50,000 orders/second Sustained load test
Connection Capacity 100+ concurrent Stress test with many clients
Memory per Connection < 10 KB Measure with 100 connections
CPU Usage Single core saturation Event loop on one core
Message Parsing < 100ns Benchmark protocol parser
Zero Allocation No malloc in hot path Profile heap usage

3.4 Example Usage / Output

Starting the server:

$ ./matching_engine --port 9000 --symbol AAPL

=== SIMPLE MATCHING ENGINE ===
Listening on port 9000
Symbol: AAPL
Tick size: 0.01

[09:00:00.000] Server started, waiting for connections...

Client session (using telnet or custom client):

$ telnet localhost 9000
Connected to matching engine

# Client sends binary NEW_ORDER (shown as text for readability)
> NEW_ORDER id=1 side=BUY price=150.25 qty=100
< ACK order_id=1 status=ACCEPTED

> NEW_ORDER id=2 side=BUY price=150.24 qty=200
< ACK order_id=2 status=ACCEPTED

> NEW_ORDER id=3 side=SELL price=150.30 qty=150
< ACK order_id=3 status=ACCEPTED

> NEW_ORDER id=4 side=SELL price=150.25 qty=50
< ACK order_id=4 status=ACCEPTED
< TRADE buyer=1 seller=4 price=150.25 qty=50
< MARKET_DATA bid=150.25x50 ask=150.30x150

> CANCEL id=1
< ACK order_id=1 status=CANCELED remaining=50
< MARKET_DATA bid=150.24x200 ask=150.30x150

> NEW_ORDER id=5 side=SELL price=150.20 qty=300
< ACK order_id=5 status=ACCEPTED
< TRADE buyer=2 seller=5 price=150.24 qty=200
< PARTIAL order_id=5 filled=200 remaining=100
< MARKET_DATA bid=-- ask=150.20x100

Server log output:

[09:00:01.234] Client 1 connected from 127.0.0.1:54321
[09:00:01.235] Order #1: BUY 100 @ 150.25 from client 1
[09:00:01.235] Order #1 ACCEPTED, added to book
[09:00:01.240] Order #2: BUY 200 @ 150.24 from client 1
[09:00:01.240] Order #2 ACCEPTED, added to book
[09:00:01.250] Order #3: SELL 150 @ 150.30 from client 1
[09:00:01.250] Order #3 ACCEPTED, added to book
[09:00:01.260] Order #4: SELL 50 @ 150.25 from client 1
[09:00:01.260] Trade: Order #1 x Order #4 = 50 @ 150.25
[09:00:01.260] Order #4 FILLED
[09:00:01.260] Order #1 PARTIAL, 50 remaining
...

[09:01:00.000] === PERFORMANCE STATS ===
Orders received:     50,234
Orders matched:      23,891
Trades executed:     31,456
Avg latency (p50):   12us
Avg latency (p99):   45us
Throughput:          50,234 orders/second
Memory usage:        2.4 MB
Active connections:  15

3.5 Real World Outcome

When complete, you will have:

  1. A functional mini-exchange that can accept orders, match them, and report trades

  2. Performance metrics demonstrating production-grade latency:
    === BENCHMARK RESULTS ===
    Order-to-Ack:     12us (p50), 18us (p95), 45us (p99)
    Throughput:       52,847 orders/second sustained
    Connections:      100 concurrent, 0 drops
    Memory:           2.4 MB total, 0 allocations in hot path
    CPU:              Single core, 85% utilization at peak
    
  3. Understanding of event-driven architecture and why it’s used in low-latency systems

  4. Foundation for building FIX gateways, market data systems, and trading strategies

4. Solution Architecture

4.1 High-Level Design

                    MATCHING ENGINE ARCHITECTURE

┌─────────────────────────────────────────────────────────────────────────────┐
│                                                                             │
│                          MATCHING ENGINE                                    │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                       EVENT LOOP (epoll/kqueue)                      │   │
│  │                                                                      │   │
│  │   ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐ │   │
│  │   │ Listen Socket   │    │ Client Socket 1 │    │ Client Socket N │ │   │
│  │   │ (Accept events) │    │ (Read/Write)    │    │ (Read/Write)    │ │   │
│  │   └────────┬────────┘    └────────┬────────┘    └────────┬────────┘ │   │
│  │            │                      │                      │          │   │
│  └────────────┼──────────────────────┼──────────────────────┼──────────┘   │
│               │                      │                      │              │
│               ▼                      ▼                      ▼              │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                      CONNECTION MANAGER                              │   │
│  │                                                                      │   │
│  │   ┌─────────────────────────────────────────────────────────────┐   │   │
│  │   │ Connection Table: fd → ConnectionState                      │   │   │
│  │   │                                                              │   │   │
│  │   │ ConnectionState:                                            │   │   │
│  │   │   - File descriptor                                         │   │   │
│  │   │   - Read buffer                                             │   │   │
│  │   │   - Write buffer                                            │   │   │
│  │   │   - Client ID                                               │   │   │
│  │   │   - Pending orders list                                     │   │   │
│  │   └─────────────────────────────────────────────────────────────┘   │   │
│  │                                                                      │   │
│  └──────────────────────────────┬──────────────────────────────────────┘   │
│                                 │                                          │
│                                 ▼                                          │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                      PROTOCOL PARSER                                 │   │
│  │                                                                      │   │
│  │   ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐ │   │
│  │   │ Read bytes      │───►│ Frame messages  │───►│ Deserialize     │ │   │
│  │   │ from buffer     │    │ (length prefix) │    │ to Order struct │ │   │
│  │   └─────────────────┘    └─────────────────┘    └─────────────────┘ │   │
│  │                                                                      │   │
│  └──────────────────────────────┬──────────────────────────────────────┘   │
│                                 │                                          │
│                                 ▼                                          │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                      ORDER VALIDATOR                                 │   │
│  │                                                                      │   │
│  │   - Price > 0                 - Quantity > 0                        │   │
│  │   - Valid side (BUY/SELL)     - Symbol exists                       │   │
│  │   - Order ID unique           - Client authorized                    │   │
│  │                                                                      │   │
│  └──────────────────────────────┬──────────────────────────────────────┘   │
│                                 │                                          │
│                                 ▼                                          │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                      ORDER BOOK ENGINE                               │   │
│  │                      (From Project 1)                                │   │
│  │                                                                      │   │
│  │   ┌─────────────────────────────────────────────────────────────┐   │   │
│  │   │                      ORDER BOOK                              │   │   │
│  │   │                                                              │   │   │
│  │   │   BIDS (Descending)           ASKS (Ascending)              │   │   │
│  │   │   ├── 150.25: [O1, O2]        ├── 150.28: [O5]              │   │   │
│  │   │   ├── 150.24: [O3]            ├── 150.30: [O6, O7]          │   │   │
│  │   │   └── 150.20: [O4]            └── 150.35: [O8]              │   │   │
│  │   │                                                              │   │   │
│  │   └─────────────────────────────────────────────────────────────┘   │   │
│  │                                                                      │   │
│  │   Outputs:                                                           │   │
│  │     - Order accepted/rejected events                                │   │
│  │     - Trade execution events                                        │   │
│  │     - Book update events (for market data)                          │   │
│  │                                                                      │   │
│  └──────────────────────────────┬──────────────────────────────────────┘   │
│                                 │                                          │
│                                 ▼                                          │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                      MESSAGE PUBLISHER                               │   │
│  │                                                                      │   │
│  │   ┌─────────────────────────────────────────────────────────────┐   │   │
│  │   │ Serialize events to binary:                                  │   │   │
│  │   │   - ORDER_ACK → specific client                              │   │   │
│  │   │   - TRADE → both counterparties                              │   │   │
│  │   │   - MARKET_DATA → all subscribers                            │   │   │
│  │   └─────────────────────────────────────────────────────────────┘   │   │
│  │                                                                      │   │
│  │   ┌─────────────────────────────────────────────────────────────┐   │   │
│  │   │ Queue messages to connection write buffers                   │   │   │
│  │   │ (Non-blocking, handled by event loop)                        │   │   │
│  │   └─────────────────────────────────────────────────────────────┘   │   │
│  │                                                                      │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │                      SUPPORTING SYSTEMS                              │   │
│  │                                                                      │   │
│  │   ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐ │   │
│  │   │ Order State     │    │ Statistics      │    │ Audit Logger    │ │   │
│  │   │ Tracker         │    │ Collector       │    │                 │ │   │
│  │   └─────────────────┘    └─────────────────┘    └─────────────────┘ │   │
│  │                                                                      │   │
│  │   ┌─────────────────┐    ┌─────────────────┐                        │   │
│  │   │ Memory Pool     │    │ Timestamp       │                        │   │
│  │   │ (Zero alloc)    │    │ Generator       │                        │   │
│  │   └─────────────────┘    └─────────────────┘                        │   │
│  │                                                                      │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

4.2 Key Components

Component Responsibility Performance Target
Event Loop Wait for I/O events, dispatch handlers < 1us per iteration
Connection Manager Track client connections and buffers O(1) lookup by fd
Protocol Parser Deserialize binary messages < 100ns per message
Order Validator Validate order fields < 50ns per order
Order Book Price-time priority matching < 500ns per operation
Message Publisher Serialize and queue outbound messages < 100ns per message
Order State Tracker Track order lifecycle O(1) state lookup
Statistics Collector Latency histograms, throughput Lock-free updates
Memory Pool Pre-allocated buffers and orders O(1) alloc/free

4.3 Data Structures

Connection State:

struct Connection {
    fd: RawFd,                    // File descriptor
    client_id: u32,               // Unique client identifier

    // Buffers
    read_buffer: RingBuffer,      // Incoming data (4KB typical)
    write_buffer: RingBuffer,     // Outgoing data (4KB typical)

    // State
    state: ConnectionState,       // CONNECTED, AUTHENTICATED, CLOSING
    pending_orders: Vec<u64>,     // Order IDs belonging to this client

    // Metrics
    orders_sent: u64,
    bytes_received: u64,
    last_activity: Instant,
}

enum ConnectionState {
    Connected,
    Authenticated,
    Closing,
    Closed,
}

Message Types (Binary Protocol):

MESSAGE HEADER (all messages):
┌───────────────────────────────────────────┐
│ Byte 0-1:  Length (uint16_t, big-endian)  │
│ Byte 2:    Message Type (uint8_t)         │
│ Byte 3:    Flags (uint8_t)                │
└───────────────────────────────────────────┘

NEW_ORDER (Type 0x01):
┌───────────────────────────────────────────┐
│ Header (4 bytes)                          │
│ Order ID (uint64_t)                       │  8 bytes
│ Symbol ID (uint32_t)                      │  4 bytes
│ Side (uint8_t)                            │  1 byte
│ Price (int64_t ticks)                     │  8 bytes
│ Quantity (uint32_t)                       │  4 bytes
│ Timestamp (uint64_t ns)                   │  8 bytes
│ Client Order ID (uint64_t)                │  8 bytes
└───────────────────────────────────────────┘
Total: 45 bytes

CANCEL_ORDER (Type 0x02):
┌───────────────────────────────────────────┐
│ Header (4 bytes)                          │
│ Order ID (uint64_t)                       │  8 bytes
│ Symbol ID (uint32_t)                      │  4 bytes
└───────────────────────────────────────────┘
Total: 16 bytes

ORDER_ACK (Type 0x10):
┌───────────────────────────────────────────┐
│ Header (4 bytes)                          │
│ Order ID (uint64_t)                       │  8 bytes
│ Status (uint8_t)                          │  1 byte
│ Timestamp (uint64_t ns)                   │  8 bytes
│ Remaining Qty (uint32_t)                  │  4 bytes
│ Reason Code (uint8_t) [if rejected]       │  1 byte
└───────────────────────────────────────────┘
Total: 26 bytes

TRADE (Type 0x20):
┌───────────────────────────────────────────┐
│ Header (4 bytes)                          │
│ Trade ID (uint64_t)                       │  8 bytes
│ Buy Order ID (uint64_t)                   │  8 bytes
│ Sell Order ID (uint64_t)                  │  8 bytes
│ Symbol ID (uint32_t)                      │  4 bytes
│ Price (int64_t ticks)                     │  8 bytes
│ Quantity (uint32_t)                       │  4 bytes
│ Timestamp (uint64_t ns)                   │  8 bytes
└───────────────────────────────────────────┘
Total: 52 bytes

MARKET_DATA (Type 0x30):
┌───────────────────────────────────────────┐
│ Header (4 bytes)                          │
│ Symbol ID (uint32_t)                      │  4 bytes
│ Best Bid Price (int64_t)                  │  8 bytes
│ Best Bid Qty (uint32_t)                   │  4 bytes
│ Best Ask Price (int64_t)                  │  8 bytes
│ Best Ask Qty (uint32_t)                   │  4 bytes
│ Timestamp (uint64_t ns)                   │  8 bytes
└───────────────────────────────────────────┘
Total: 40 bytes

Order State:

struct OrderState {
    order_id: u64,
    client_id: u32,
    original_qty: u32,
    remaining_qty: u32,
    filled_qty: u32,
    price: i64,
    side: Side,
    status: OrderStatus,
    created_at: u64,       // nanosecond timestamp
    last_updated: u64,
}

enum OrderStatus {
    Pending,       // In flight, not yet in book
    Accepted,      // In book, resting
    PartialFill,   // Some quantity filled
    Filled,        // Completely filled
    Canceled,      // User canceled
    Rejected,      // Validation failed
}

4.4 Algorithm Overview

Event Loop Flow:

EVENT LOOP ALGORITHM

┌─────────────────────────────────────────────────────────────────────┐
│                                                                     │
│  loop {                                                             │
│      // 1. Wait for events (blocking or with timeout)               │
│      events = epoll_wait(epfd, &events, MAX_EVENTS, timeout);      │
│                                                                     │
│      // 2. Process each ready event                                 │
│      for event in events {                                         │
│          if event.fd == listen_socket {                            │
│              // New client connection                               │
│              handle_accept(listen_socket);                         │
│          } else {                                                   │
│              // Existing client activity                            │
│              connection = connections[event.fd];                   │
│                                                                     │
│              if event.is_readable() {                              │
│                  handle_read(connection);                          │
│              }                                                      │
│              if event.is_writable() {                              │
│                  handle_write(connection);                         │
│              }                                                      │
│              if event.is_error() {                                 │
│                  handle_disconnect(connection);                    │
│              }                                                      │
│          }                                                          │
│      }                                                              │
│                                                                     │
│      // 3. Process any pending work (timers, cleanup)              │
│      process_pending_tasks();                                      │
│  }                                                                  │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

handle_read(connection):
    // Read available data into buffer
    bytes_read = recv(connection.fd, buffer, size, MSG_DONTWAIT);
    connection.read_buffer.push(buffer, bytes_read);

    // Try to parse complete messages
    while let Some(message) = parse_message(&connection.read_buffer) {
        process_message(connection, message);
    }

handle_write(connection):
    // Send queued data
    if !connection.write_buffer.is_empty() {
        bytes_sent = send(connection.fd, write_buffer.data(), MSG_DONTWAIT);
        connection.write_buffer.consume(bytes_sent);
    }

    // If buffer empty, remove from write interest
    if connection.write_buffer.is_empty() {
        epoll_ctl(epfd, EPOLL_CTL_MOD, fd, EPOLLIN);  // Read only
    }

process_message(connection, message):
    match message.type {
        NEW_ORDER => {
            order = parse_new_order(message);
            if validate(order) {
                result = order_book.add_order(order);
                publish_ack(connection, order, result);
                for trade in result.trades {
                    publish_trade(trade);
                }
                publish_market_data();
            } else {
                publish_reject(connection, order, reason);
            }
        }
        CANCEL_ORDER => {
            result = order_book.cancel_order(message.order_id);
            publish_cancel_ack(connection, message.order_id, result);
            publish_market_data();
        }
    }

Threading Model Options:

OPTION 1: SINGLE-THREADED EVENT LOOP (Recommended for learning)

┌─────────────────────────────────────────────────────────────────┐
│                      MAIN THREAD                                 │
│                                                                 │
│   ┌─────────┐    ┌─────────┐    ┌─────────┐    ┌─────────┐    │
│   │ Accept  │───►│  Read   │───►│  Match  │───►│  Write  │    │
│   │Clients  │    │Messages │    │ Orders  │    │Responses│    │
│   └─────────┘    └─────────┘    └─────────┘    └─────────┘    │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Pros: Simple, no synchronization, predictable latency
Cons: Can't use multiple cores, single point of failure

---

OPTION 2: THREAD-PER-CORE WITH LOCK-FREE QUEUES (Production)

┌─────────────────────────────────────────────────────────────────┐
│  NETWORK THREAD (Core 0)                                        │
│                                                                 │
│   ┌─────────┐    ┌─────────┐    ┌─────────┐                    │
│   │ Accept  │───►│  Read   │───►│  Queue  │──────────┐        │
│   │Clients  │    │Messages │    │to Match │          │        │
│   └─────────┘    └─────────┘    └─────────┘          │        │
│                                                       ▼        │
└───────────────────────────────────────────┬───────────────────┘
                                            │   Lock-Free
                                            │     Queue
┌───────────────────────────────────────────▼───────────────────┐
│  MATCHING THREAD (Core 1)                                      │
│                                                                 │
│   ┌─────────┐    ┌─────────┐    ┌─────────┐                    │
│   │Dequeue  │───►│  Match  │───►│  Queue  │──────────┐        │
│   │ Order   │    │ Orders  │    │Responses│          │        │
│   └─────────┘    └─────────┘    └─────────┘          │        │
│                                                       ▼        │
└───────────────────────────────────────────┬───────────────────┘
                                            │   Lock-Free
                                            │     Queue
┌───────────────────────────────────────────▼───────────────────┐
│  NETWORK THREAD (Core 0) - continues                           │
│                                                                 │
│   ┌─────────┐    ┌─────────┐                                   │
│   │Dequeue  │───►│  Write  │                                   │
│   │Response │    │to Client│                                   │
│   └─────────┘    └─────────┘                                   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Pros: Scales with cores, separates concerns
Cons: Complex, lock-free queues needed, harder to debug

---

OPTION 3: SHARDED BY SYMBOL

┌─────────────────────────────────────────────────────────────────┐
│                   ROUTER THREAD                                  │
│                                                                 │
│   Orders for AAPL ─────────────────────────► Core 1            │
│   Orders for GOOG ─────────────────────────► Core 2            │
│   Orders for MSFT ─────────────────────────► Core 3            │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Each core runs independent matching engine for its symbols.
No coordination needed between symbols.
Used by real exchanges.

5. Implementation Guide

5.1 Development Environment Setup

Linux (Ubuntu/Debian):

# Install dependencies
sudo apt-get update
sudo apt-get install build-essential cmake gdb valgrind linux-tools-generic

# For Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source $HOME/.cargo/env

# For C++
sudo apt-get install g++ libboost-dev  # Optional boost for networking

# Create project
mkdir matching_engine && cd matching_engine

macOS:

# Install Xcode command line tools
xcode-select --install

# For Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Create project
mkdir matching_engine && cd matching_engine

Rust Project Setup:

cargo new matching_engine
cd matching_engine

# Add to Cargo.toml:
# [dependencies]
# mio = "0.8"              # Cross-platform event loop
# bytes = "1.5"            # Byte buffer utilities
# criterion = "0.5"        # Benchmarking
# tracing = "0.1"          # Logging
# parking_lot = "0.12"     # Fast mutexes (if needed)

C++ Project Setup:

mkdir -p src include tests benchmarks
touch CMakeLists.txt

# CMakeLists.txt content:
# cmake_minimum_required(VERSION 3.15)
# project(matching_engine CXX)
# set(CMAKE_CXX_STANDARD 17)
# add_executable(matching_engine src/main.cpp)

5.2 Project Structure

Rust:

matching_engine/
├── Cargo.toml
├── src/
│   ├── main.rs                 # Entry point, CLI
│   ├── lib.rs                  # Library exports
│   ├── engine/
│   │   ├── mod.rs
│   │   ├── matching_engine.rs  # Core engine orchestration
│   │   └── order_book.rs       # Order book (from P1)
│   ├── network/
│   │   ├── mod.rs
│   │   ├── event_loop.rs       # epoll/kqueue wrapper
│   │   ├── connection.rs       # Connection state
│   │   └── acceptor.rs         # TCP listener
│   ├── protocol/
│   │   ├── mod.rs
│   │   ├── messages.rs         # Message types
│   │   ├── parser.rs           # Binary deserialization
│   │   └── serializer.rs       # Binary serialization
│   ├── state/
│   │   ├── mod.rs
│   │   └── order_tracker.rs    # Order lifecycle tracking
│   └── util/
│       ├── mod.rs
│       ├── buffer.rs           # Ring buffer
│       ├── pool.rs             # Memory pool
│       └── stats.rs            # Latency histogram
├── tests/
│   ├── integration_tests.rs
│   └── protocol_tests.rs
├── benches/
│   └── engine_bench.rs
└── examples/
    └── simple_client.rs

C++:

matching_engine/
├── CMakeLists.txt
├── src/
│   ├── main.cpp                # Entry point
│   ├── matching_engine.cpp     # Core orchestration
│   ├── order_book.cpp          # Order book (from P1)
│   ├── event_loop.cpp          # epoll/kqueue
│   ├── connection.cpp          # Connection state
│   ├── protocol.cpp            # Message parsing/serialization
│   └── order_tracker.cpp       # Order lifecycle
├── include/
│   ├── matching_engine.hpp
│   ├── order_book.hpp
│   ├── event_loop.hpp
│   ├── connection.hpp
│   ├── protocol.hpp
│   ├── messages.hpp
│   └── order_tracker.hpp
├── tests/
│   ├── test_protocol.cpp
│   └── test_integration.cpp
├── benchmarks/
│   └── bench_engine.cpp
└── tools/
    └── simple_client.cpp

5.3 The Core Question You’re Answering

“How do you build a system that accepts thousands of concurrent connections, processes messages with microsecond latency, and maintains consistent order book state - all without blocking?”

This question forces you to understand:

  1. Event-driven architecture: Why blocking I/O doesn’t scale
  2. State management: How to track order lifecycle across async boundaries
  3. Protocol design: Why binary protocols outperform text
  4. System integration: How to compose order book with networking

The answer involves understanding that the event loop is the heartbeat of the system - everything happens in response to I/O events, never blocking waiting for slow operations.

5.4 Concepts You Must Understand First

Concept Self-Assessment Question Where to Learn
TCP sockets How do listen(), accept(), recv(), send() work? Stevens UNIX Network Programming Ch. 4
Non-blocking I/O What happens when you set O_NONBLOCK on a socket? TLPI Ch. 63
epoll/kqueue What’s the difference between level-triggered and edge-triggered? TLPI Ch. 63
File descriptors What is a file descriptor? How does the kernel manage them? TLPI Ch. 4
Event loops How does Node.js/libuv work internally? libuv design docs
Byte ordering What is big-endian vs. little-endian? Why does it matter for networks? CS:APP Ch. 2
Memory-mapped I/O What is zero-copy networking? Any systems book
State machines How do you model object lifecycles? Any software design book

5.5 Questions to Guide Your Design

Protocol Design:

  1. How will you frame messages? (length prefix, delimiter, fixed size?)
  2. What byte ordering will you use? (network byte order = big-endian)
  3. How will you handle partial reads? (message split across packets)
  4. How will you version the protocol for future changes?
  5. What checksums or validation will you include?

Networking:

  1. Will you use epoll (Linux), kqueue (macOS), or a cross-platform library?
  2. How will you handle client disconnection mid-message?
  3. What happens if the write buffer fills up?
  4. How will you implement timeouts for idle connections?
  5. Will you support TCP_NODELAY (disable Nagle’s algorithm)?

Threading:

  1. Will you start with single-threaded or multi-threaded?
  2. If multi-threaded, how will you communicate between threads?
  3. How will you ensure order processing is fair (no client starvation)?
  4. Where are the potential contention points?

State Management:

  1. How will you track which orders belong to which client?
  2. What happens if a cancel arrives for an order being matched?
  3. How will you handle duplicate order IDs?
  4. What state needs to survive a reconnect?

5.6 Thinking Exercise

Before writing code, trace through this scenario by hand:

Setup:

  • Matching engine listening on port 9000
  • Client A connects
  • Client B connects

Event Sequence:

t=0ms:   Client A sends: NEW_ORDER id=1 BUY 100 @ 50.25
t=1ms:   Client B sends: NEW_ORDER id=2 SELL 50 @ 50.30
t=2ms:   Client A sends: NEW_ORDER id=3 BUY 200 @ 50.24
t=3ms:   Client B sends: NEW_ORDER id=4 SELL 150 @ 50.20
t=4ms:   Client A sends: CANCEL id=1
t=5ms:   (Client A disconnects unexpectedly)
t=6ms:   Client C connects
t=7ms:   Client C sends: NEW_ORDER id=5 BUY 100 @ 50.30

Questions to answer on paper:

  1. After t=0ms, what messages are sent? To whom?
  2. After t=1ms, what does the order book look like?
  3. After t=3ms, what trades occurred? What messages were sent?
  4. After t=4ms, what happens? Is Order #1 still in the book?
  5. At t=5ms, what happens to Order #3? Is it canceled automatically?
  6. After t=7ms, what trades occur?

Draw the event loop timeline:

Time  | Event             | Action                    | Messages Sent
------|-------------------|---------------------------|---------------
0ms   | Read: NEW_ORDER   | Add to book               | ACK to A
...

5.7 Hints in Layers

Hint 1 - Starting Direction

Start with the simplest possible event loop:

// Rust with mio
use mio::{Events, Poll, Interest, Token};
use mio::net::TcpListener;

fn main() {
    let mut poll = Poll::new().unwrap();
    let mut events = Events::with_capacity(1024);

    let addr = "127.0.0.1:9000".parse().unwrap();
    let mut listener = TcpListener::bind(addr).unwrap();

    poll.registry().register(&mut listener, Token(0), Interest::READABLE).unwrap();

    loop {
        poll.poll(&mut events, None).unwrap();

        for event in events.iter() {
            match event.token() {
                Token(0) => {
                    // Accept new connection
                    let (mut conn, addr) = listener.accept().unwrap();
                    println!("New connection from {}", addr);
                    // TODO: Register connection for reading
                }
                _ => {
                    // Handle client I/O
                    // TODO: Read/write from connection
                }
            }
        }
    }
}
// C++ with epoll (Linux)
#include <sys/epoll.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <fcntl.h>
#include <unistd.h>

int main() {
    int listen_fd = socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK, 0);

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(9000);
    addr.sin_addr.s_addr = INADDR_ANY;

    bind(listen_fd, (sockaddr*)&addr, sizeof(addr));
    listen(listen_fd, SOMAXCONN);

    int epoll_fd = epoll_create1(0);

    epoll_event ev{};
    ev.events = EPOLLIN;
    ev.data.fd = listen_fd;
    epoll_ctl(epoll_fd, EPOLL_CTL_ADD, listen_fd, &ev);

    epoll_event events[1024];

    while (true) {
        int n = epoll_wait(epoll_fd, events, 1024, -1);

        for (int i = 0; i < n; i++) {
            if (events[i].data.fd == listen_fd) {
                // Accept new connection
                int client_fd = accept4(listen_fd, nullptr, nullptr, SOCK_NONBLOCK);
                printf("New connection: fd=%d\n", client_fd);
                // TODO: Register for reading
            } else {
                // Handle client I/O
                // TODO: Read/write
            }
        }
    }
}

Get this skeleton working first. Then add message parsing.

Hint 2 - Message Framing

Use a length-prefixed protocol to handle message framing:

Binary Message Format:
┌─────────────────┬─────────────────┬─────────────────────────┐
│ Length (2 bytes)│ Type (1 byte)   │ Payload (Length-1 bytes)│
└─────────────────┴─────────────────┴─────────────────────────┘
fn try_parse_message(buffer: &[u8]) -> Option<(Message, usize)> {
    if buffer.len() < 3 {
        return None;  // Not enough data for header
    }

    let length = u16::from_be_bytes([buffer[0], buffer[1]]) as usize;

    if buffer.len() < 2 + length {
        return None;  // Not enough data for complete message
    }

    let msg_type = buffer[2];
    let payload = &buffer[3..2 + length];

    let message = parse_payload(msg_type, payload)?;

    Some((message, 2 + length))  // Return message and bytes consumed
}

// In your read handler:
fn handle_read(conn: &mut Connection) {
    // Read into buffer
    let bytes_read = conn.socket.read(&mut conn.read_buffer)?;
    conn.read_buffer_len += bytes_read;

    // Try to parse complete messages
    let mut consumed = 0;
    while let Some((msg, len)) = try_parse_message(&conn.read_buffer[consumed..]) {
        process_message(conn, msg);
        consumed += len;
    }

    // Shift remaining bytes to front of buffer
    if consumed > 0 {
        conn.read_buffer.copy_within(consumed.., 0);
        conn.read_buffer_len -= consumed;
    }
}

Handle the edge cases:

  • Message split across multiple read() calls
  • Multiple messages in single read() call
  • Buffer overflow (connection sending too much data)
Hint 3 - Order Book Integration

Your order book from Project 1 needs a clean interface:

pub enum OrderResult {
    Accepted {
        order_id: u64,
        remaining_qty: u32,
    },
    Rejected {
        order_id: u64,
        reason: RejectReason,
    },
    Filled {
        order_id: u64,
        trades: Vec<Trade>,
    },
    PartialFill {
        order_id: u64,
        filled_qty: u32,
        remaining_qty: u32,
        trades: Vec<Trade>,
    },
}

pub struct Trade {
    pub trade_id: u64,
    pub buy_order_id: u64,
    pub sell_order_id: u64,
    pub price: i64,
    pub quantity: u32,
    pub timestamp: u64,
}

impl OrderBook {
    pub fn add_order(&mut self, order: Order) -> OrderResult {
        // Validate
        if order.price <= 0 || order.quantity == 0 {
            return OrderResult::Rejected {
                order_id: order.id,
                reason: RejectReason::InvalidOrder,
            };
        }

        // Try to match
        let trades = self.match_order(&mut order);

        if order.remaining_qty == 0 {
            OrderResult::Filled {
                order_id: order.id,
                trades,
            }
        } else if !trades.is_empty() {
            // Add remainder to book
            self.insert_order(order.clone());
            OrderResult::PartialFill {
                order_id: order.id,
                filled_qty: order.original_qty - order.remaining_qty,
                remaining_qty: order.remaining_qty,
                trades,
            }
        } else {
            // No match, add to book
            self.insert_order(order.clone());
            OrderResult::Accepted {
                order_id: order.id,
                remaining_qty: order.remaining_qty,
            }
        }
    }

    pub fn cancel_order(&mut self, order_id: u64) -> Option<u32> {
        // Returns remaining quantity if found and canceled
    }
}

Integrate with message processing:

fn process_new_order(conn: &mut Connection, msg: NewOrderMessage) {
    let order = Order::from_message(&msg);

    let result = order_book.add_order(order);

    match result {
        OrderResult::Accepted { order_id, remaining_qty } => {
            send_ack(conn, order_id, AckStatus::Accepted, remaining_qty);
            broadcast_market_data();
        }
        OrderResult::Filled { order_id, trades } => {
            send_ack(conn, order_id, AckStatus::Filled, 0);
            for trade in trades {
                send_trade_to_both_parties(trade);
            }
            broadcast_market_data();
        }
        // ... handle other cases
    }
}
Hint 4 - Tools and Debugging

Testing with netcat/telnet:

# Start your engine
./matching_engine --port 9000

# In another terminal, connect with nc
nc localhost 9000

# You'll need to send binary data, so use xxd:
# Create a hex file for NEW_ORDER message
echo "00 2D 01 00 00 00 00 00 00 00 01 00 00 00 01 01 00 00 00 00 00 00 3A 61 00 00 00 64 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01" | xxd -r -p | nc localhost 9000

Simple test client in Python:

import socket
import struct

def send_new_order(sock, order_id, side, price_ticks, quantity):
    # Build message
    msg_type = 0x01
    symbol_id = 1
    timestamp = 0
    client_order_id = order_id

    payload = struct.pack('>QIBQQQ',  # Big-endian
        order_id,       # Q = uint64
        symbol_id,      # I = uint32
        side,           # B = uint8
        price_ticks,    # Q = int64 (use Q for unsigned, cast)
        quantity,       # I = uint32 (padded to Q)
        client_order_id
    )

    # Actually the struct format needs adjustment for your protocol
    # This is just an example

    length = len(payload) + 1  # +1 for msg_type
    header = struct.pack('>HB', length, msg_type)

    sock.sendall(header + payload)

# Connect and send orders
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(('localhost', 9000))

send_new_order(sock, order_id=1, side=1, price_ticks=15025, quantity=100)

# Read response
response = sock.recv(1024)
print(f"Response: {response.hex()}")

Latency measurement:

use std::time::Instant;

fn measure_latency<F: FnOnce() -> R, R>(name: &str, f: F) -> R {
    let start = Instant::now();
    let result = f();
    let elapsed = start.elapsed();
    println!("{}: {:?}", name, elapsed);
    result
}

// Or use rdtsc for nanosecond precision:
#[cfg(target_arch = "x86_64")]
fn rdtsc() -> u64 {
    unsafe { std::arch::x86_64::_rdtsc() }
}

Debugging with strace/dtruss:

# Linux: trace system calls
strace -f -e epoll_wait,read,write,accept4 ./matching_engine

# macOS: use dtruss
sudo dtruss -f ./matching_engine

Memory profiling:

# Check for leaks
valgrind --leak-check=full ./matching_engine

# Profile allocations
valgrind --tool=massif ./matching_engine
ms_print massif.out.*

5.8 The Interview Questions They’ll Ask

  1. “Explain event-driven I/O and why it’s used in low-latency systems.”
    • Expected: Explain epoll/kqueue, why blocking doesn’t scale, single thread handling many connections
    • Bonus: Discuss level-triggered vs edge-triggered, io_uring
  2. “How would you design a binary protocol for order submission?”
    • Expected: Length prefix, fixed-size fields, network byte order
    • Bonus: Versioning, checksums, backward compatibility
  3. “What happens when a client disconnects mid-order?”
    • Expected: Discuss order lifecycle, whether to cancel orphaned orders
    • Bonus: Session semantics, reconnection handling
  4. “How do you handle a slow client that can’t keep up with market data?”
    • Expected: Write buffer overflow, disconnect or skip updates
    • Bonus: Conflation (send latest state, not all updates), back pressure
  5. “Explain the tradeoffs between single-threaded and multi-threaded matching engines.”
    • Expected: Single is simpler, multi scales but needs synchronization
    • Bonus: Lock-free queues, per-symbol sharding
  6. “How would you add persistence so orders survive a crash?”
    • Expected: Write-ahead log, journal before processing
    • Bonus: Fsync tradeoffs, memory-mapped files, replication
  7. “What causes latency variance in your system?”
    • Expected: Garbage collection (if applicable), kernel scheduling, network jitter
    • Bonus: Cache misses, branch misprediction, allocations
  8. “How does your system ensure fair order processing?”
    • Expected: FIFO processing, timestamp ordering
    • Bonus: Per-connection queuing, round-robin between clients

5.9 Books That Will Help

Topic Book Chapter
TCP/IP and sockets The Linux Programming Interface Ch. 56-61
epoll and event-driven I/O The Linux Programming Interface Ch. 63
Non-blocking I/O UNIX Network Programming Vol. 1 Ch. 16
High-performance networking High Performance Browser Networking Ch. 1-4
Lock-free programming Rust Atomics and Locks Ch. 1-5
Matching engine design Building Low Latency Applications with C++ Ch. 7-9
Protocol design Designing Data-Intensive Applications Ch. 4
Trading systems Developing High-Frequency Trading Systems Ch. 5-7

5.10 Implementation Phases

Phase 1: Basic TCP Server (Days 1-4)

Goals:

  • Accept TCP connections
  • Echo received data back to client
  • Use epoll/kqueue/mio for event-driven I/O

Tasks:

  1. Create listening socket, bind to port
  2. Set up epoll/kqueue event loop
  3. Accept new connections, add to event loop
  4. Read data from ready connections
  5. Write data back (echo server)
  6. Handle client disconnection
  7. Test with nc/telnet

Checkpoint: Can connect multiple clients, echo works.

Phase 2: Binary Protocol (Days 5-8)

Goals:

  • Define message formats
  • Implement parsing and serialization
  • Handle framing (length-prefixed messages)

Tasks:

  1. Define message structs (NEW_ORDER, CANCEL, ACK, TRADE)
  2. Implement serialization to bytes
  3. Implement deserialization from bytes
  4. Handle partial reads (buffering)
  5. Add message validation
  6. Create test client that sends binary messages

Checkpoint: Can parse and respond to binary messages.

Phase 3: Order Book Integration (Days 9-14)

Goals:

  • Integrate your Project 1 order book
  • Process orders and generate trades
  • Track order state

Tasks:

  1. Import/adapt order book code
  2. Create order from parsed message
  3. Submit to order book, get result
  4. Generate appropriate response messages
  5. Track order lifecycle state
  6. Handle cancel requests
  7. Test matching with multiple clients

Checkpoint: Full order lifecycle works, trades generated.

Phase 4: Market Data and Publishing (Days 15-18)

Goals:

  • Broadcast market data updates
  • Send trades to both parties
  • Implement connection management

Tasks:

  1. Track best bid/offer after each order
  2. Serialize market data message
  3. Broadcast to all connected clients
  4. Track client -> order mapping
  5. Route trade messages to correct clients
  6. Handle client disconnect (clean up orders)

Checkpoint: Market data flowing, trades routed correctly.

Phase 5: Optimization and Testing (Days 19-28)

Goals:

  • Measure and optimize latency
  • Stress test with many clients
  • Document performance

Tasks:

  1. Add latency measurement (order receive to ack send)
  2. Create latency histogram
  3. Identify and fix bottlenecks
  4. Stress test with 100 concurrent clients
  5. Measure throughput (orders/second)
  6. Profile memory usage
  7. Add statistics command
  8. Document results

Checkpoint: Performance targets met, system stable under load.

5.11 Key Implementation Decisions

Decision Options Recommendation Rationale
Event library Raw epoll, mio, tokio, libuv mio (Rust) / raw epoll (C++) Learning value, control
Async runtime None, tokio, async-std None initially Understand fundamentals first
Threading Single, multi with locks, lock-free Single-threaded first Simpler, predictable
Buffer strategy Per-connection, shared pool Per-connection (simple) Avoid contention
Protocol Text, binary, Protobuf Binary (hand-coded) Learning, performance
Order book From P1, new implementation Import P1 code Builds on prior work
Memory Malloc, pool allocator Malloc first, pool later Get working first

6. Testing Strategy

6.1 Unit Tests

Test Category What to Test
Protocol parser Parse valid messages, reject malformed
Message serialization Round-trip: struct -> bytes -> struct
Connection buffer Handle partial reads, multiple messages
Order validator Reject invalid prices, quantities, sides
Order tracker State transitions, lifecycle correctness

6.2 Integration Tests

Test 1: Single Order Flow
  - Connect client
  - Send NEW_ORDER
  - Verify ACK received
  - Verify order in book

Test 2: Simple Match
  - Client A: NEW_ORDER BUY 100 @ 50.00
  - Client B: NEW_ORDER SELL 100 @ 50.00
  - Verify: Both receive TRADE message
  - Verify: Book is empty

Test 3: Partial Fill
  - Client A: NEW_ORDER BUY 100 @ 50.00
  - Client B: NEW_ORDER SELL 50 @ 50.00
  - Verify: Trade for 50 shares
  - Verify: Client A receives partial fill ack
  - Verify: 50 remaining in book

Test 4: Cancel Flow
  - Send NEW_ORDER
  - Send CANCEL
  - Verify: Cancel ACK received
  - Verify: Order removed from book

Test 5: Client Disconnect
  - Connect, send orders
  - Disconnect abruptly
  - Reconnect as new client
  - Verify: Original orders still in book (or canceled per policy)

Test 6: Multiple Clients
  - Connect 10 clients
  - Each sends 100 orders
  - Verify: All acks received
  - Verify: Trades routed to correct clients

Test 7: Market Data Broadcast
  - Connect subscriber
  - Client sends order that changes BBO
  - Verify: Subscriber receives MARKET_DATA

6.3 Stress Tests

Stress Test 1: Connection Storm
  - Connect 100 clients simultaneously
  - Each sends 1000 orders
  - Measure connection success rate
  - Measure order processing rate

Stress Test 2: Sustained Throughput
  - Single client, 10 threads
  - Send orders as fast as possible for 60 seconds
  - Measure orders/second sustained
  - Measure latency distribution

Stress Test 3: Large Book
  - Add 100,000 orders at different prices
  - Send crossing order that matches many
  - Measure matching time
  - Verify correctness

Stress Test 4: Reconnection Churn
  - Clients connect, send orders, disconnect, repeat
  - Run for 5 minutes
  - Verify no memory leaks
  - Verify no orphaned state

7. Common Pitfalls and Debugging

Problem Symptom Cause Solution Verification
Partial read Messages corrupted Not buffering across reads Accumulate in buffer until complete Log message lengths
Write buffer full Client stops receiving Slow client, fast engine Detect buffer full, close or throttle Monitor buffer size
FD leak “Too many open files” Not closing on disconnect Close fd in all error paths Count open fds
Blocking read Engine freezes Forgot O_NONBLOCK Set non-blocking on accept Test with slow client
Order mismatch Wrong client gets trade Client ID mapping wrong Track order->client mapping Log routing decisions
Stale BBO Market data wrong Not updating after match Recalculate after every book change Verify against book state
Memory growth RSS increases over time Not freeing orders Use memory pool, track allocations valgrind/massif
Latency spike P99 much higher than P50 GC, malloc, cache miss Profile, eliminate allocations Histogram analysis

7.1 Debugging Strategies

Print connection state:

fn debug_connection(conn: &Connection) {
    println!("Connection {}:", conn.fd);
    println!("  Read buffer: {} bytes", conn.read_buffer.len());
    println!("  Write buffer: {} bytes", conn.write_buffer.len());
    println!("  Pending orders: {:?}", conn.pending_orders);
    println!("  State: {:?}", conn.state);
}

Trace message flow:

fn process_message(conn: &Connection, msg: &Message) {
    tracing::info!(
        conn_id = conn.client_id,
        msg_type = ?msg.message_type(),
        "Processing message"
    );
    // ... process
    tracing::info!(
        conn_id = conn.client_id,
        result = ?result,
        "Message processed"
    );
}

Validate invariants:

fn validate_state(&self) {
    // All orders in book should have valid client mappings
    for order_id in self.order_book.all_orders() {
        assert!(self.order_tracker.get(order_id).is_some());
    }

    // All tracked orders should be in book or terminal state
    for (order_id, state) in self.order_tracker.iter() {
        match state.status {
            OrderStatus::Accepted | OrderStatus::PartialFill => {
                assert!(self.order_book.contains(order_id));
            }
            _ => {
                assert!(!self.order_book.contains(order_id));
            }
        }
    }
}

8. Extensions and Challenges

8.1 Beginner Extensions

  • Add statistics command: Print latency percentiles, throughput
  • Configurable symbols: Support multiple order books
  • Pretty logging: Colorized output, structured logging
  • Simple web UI: Show order book state in browser

8.2 Intermediate Extensions

  • FIX protocol gateway: Accept FIX messages, convert to internal format
  • Market order support: Execute immediately at best price
  • Stop orders: Trigger when price crosses threshold
  • Order types: IOC (Immediate or Cancel), FOK (Fill or Kill), GTD (Good Till Date)
  • Session management: Login/logout, sequence numbers, replay
  • UDP market data: Multicast for efficient broadcast

8.3 Advanced Extensions

  • Multi-threaded with lock-free queues: Separate network and matching threads
  • Kernel bypass (DPDK): Eliminate kernel overhead
  • Persistence: Write-ahead log, crash recovery
  • Replication: Hot standby for failover
  • Rate limiting: Protect against misbehaving clients
  • Circuit breakers: Halt trading on extreme moves
  • ITCH/OUCH protocol: Industry-standard protocols

9. Real-World Connections

9.1 How Production Systems Differ

Aspect This Project Production System
Protocol Custom binary FIX, ITCH, proprietary
Latency 10-50 microseconds 1-10 microseconds
Throughput 50K orders/sec 1M+ orders/sec
Availability Single process Replicated, hot-hot
Persistence None Every message journaled
Monitoring Print statements Prometheus, PagerDuty
Compliance None SEC/CFTC audit trails
Security None TLS, authentication, authorization

9.2 Real Exchange Architectures

NASDAQ:

  • Uses ITCH (market data) and OUCH (order entry) protocols
  • Matching engine processes millions of messages per second
  • Sub-100 microsecond end-to-end latency
  • Runs on commodity Linux servers
  • Multiple data centers for disaster recovery

NYSE:

  • Pillar trading platform (launched 2016)
  • Binary protocols over TCP
  • Co-location available
  • Strict message ordering guarantees

CME Globex:

  • World’s largest futures exchange
  • iLink protocol for order entry
  • Market data via multicast
  • Sub-millisecond matching
  • Global distribution network

IEX:

  • Famous for “speed bump” (350 microsecond delay)
  • Uses commodity hardware
  • Open source order type specifications
  • Designed to reduce HFT advantage

9.3 Open Source References

Project Language Description Learning Value
LMAX Disruptor Java Lock-free ring buffer Inter-thread communication
Aeron Java/C++ High-performance messaging Protocol design
Chronicle Queue Java Persisted queue Journaling
quickfix C++ FIX protocol engine Protocol implementation
fix8 C++ High-performance FIX Production patterns

10. Resources

10.1 Essential Reading

  • “The Linux Programming Interface” by Michael Kerrisk (Ch. 56-63)
    • Definitive guide to Linux system calls
    • Detailed epoll coverage
  • “Building Low Latency Applications with C++” by Sourav Ghosh (Ch. 7-9)
    • Matching engine implementation
    • Network I/O patterns
  • “UNIX Network Programming Vol. 1” by W. Richard Stevens (Ch. 6, 16)
    • Socket programming fundamentals
    • Non-blocking I/O

10.2 Online Resources

  • Beej’s Guide to Network Programming: Classic socket tutorial
  • Linux epoll man pages: man 7 epoll
  • mio documentation: Rust event loop library
  • tokio internals: Understanding async in Rust

10.3 Protocol Specifications

  • FIX Protocol: fixtrading.org
  • NASDAQ ITCH: Public specification available
  • CME iLink: Available to members

10.4 Videos

  • CppCon 2017: “Low Latency C++” - multiple talks
  • Strange Loop: “LMAX Architecture” - Martin Thompson
  • InfoQ: “How to Build a Matching Engine”

11. Self-Assessment Checklist

Understanding

  • I can explain event-driven I/O vs thread-per-connection
  • I understand how epoll/kqueue works (level vs edge triggered)
  • I can design a binary protocol and explain trade-offs
  • I understand order lifecycle states and transitions
  • I can explain why low-latency systems avoid allocations
  • I understand TCP flow control and buffer management

Implementation

  • My server accepts multiple concurrent TCP connections
  • Messages are correctly framed and parsed (no corruption)
  • Orders are matched with price-time priority
  • Trade messages are sent to correct clients
  • Market data is broadcast after book changes
  • Client disconnection is handled gracefully

Performance

  • Order-to-ack latency is under 50 microseconds (p99)
  • Throughput exceeds 50,000 orders/second
  • System handles 100+ concurrent connections
  • No memory leaks under sustained load
  • Latency is consistent (low variance)

Growth

  • I profiled my code and optimized at least one bottleneck
  • I can read production matching engine code
  • I could explain my design in a systems design interview
  • I understand what I would change for production

12. Submission / Completion Criteria

Minimum Viable Completion

  • TCP server accepts connections on configurable port
  • Binary protocol for NEW_ORDER and CANCEL
  • Order acknowledgments sent to clients
  • Basic matching using price-time priority
  • At least 3 integration tests passing
  • Code compiles without warnings

Full Completion

  • TRADE messages sent to both counterparties
  • MARKET_DATA broadcast after book changes
  • Client disconnection handling
  • Order state tracking (lifecycle)
  • Statistics command (latency, throughput)
  • 10+ integration tests passing
  • Order-to-ack latency < 50us (p99)
  • Throughput > 50,000 orders/second

Excellence (Going Above and Beyond)

  • Multi-threaded with lock-free queues
  • FIX protocol support
  • UDP multicast market data
  • Persistence/journaling
  • Sub-20us latency (p99)
  • 100+ concurrent connections tested
  • Comprehensive benchmarks documented

This guide was expanded from HIGH_FREQUENCY_TRADING_CPP_RUST_LEARNING_PROJECTS_SUMMARY.md. For the complete learning path, see the project index.