Project 3: Simple Matching Engine
Build a complete matching engine that accepts orders via TCP, matches them using your order book, and publishes trades and market data - a mini-exchange that ties together networking, data structures, and protocol design.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Advanced |
| Time Estimate | 3-4 weeks |
| Language | C++ or Rust |
| Prerequisites | Order book project (P01), basic networking, threading concepts |
| Key Topics | Event-driven I/O, epoll/io_uring, binary protocols, state machines, TCP/IP |
1. Learning Objectives
After completing this project, you will be able to:
- Design event-driven network architectures - Understand why blocking I/O fails at scale and implement non-blocking alternatives using epoll, kqueue, or io_uring
- Implement binary wire protocols - Create efficient serialization formats with proper framing, avoiding JSON/text overhead
- Build deterministic state machines - Model order lifecycles and connection states with predictable, testable behavior
- Handle concurrent connections efficiently - Manage thousands of clients with minimal resource usage and fair scheduling
- Integrate multiple subsystems - Connect networking, order books, matching logic, and market data distribution
- Measure and optimize latency - Profile hot paths, eliminate allocations, and achieve microsecond-level response times
- Design fair matching algorithms - Implement price-time priority correctly without introducing bias
- Broadcast market data efficiently - Publish order book updates and trade notifications to multiple subscribers
- Handle failure modes gracefully - Manage disconnections, partial reads, and malformed messages
- Reason about system correctness - Ensure no orders are lost, duplicated, or incorrectly matched
2. Theoretical Foundation
2.1 Core Concepts
Event-Driven I/O: Why Blocking Fails
Traditional blocking I/O works like this:
Thread-Per-Connection Model (Blocking)
=====================================
Client A ────┐
│ ┌──────────────────────┐
Client B ────┼───►│ Server Process │
│ ├──────────────────────┤
Client C ────┘ │ Thread 1: Client A │ ◄── Blocked waiting for data
│ Thread 2: Client B │ ◄── Blocked waiting for data
│ Thread 3: Client C │ ◄── Blocked waiting for data
│ Thread 4: Client D │ ◄── Blocked waiting for data
│ ... │
│ Thread 10000: ... │ ◄── 10,000 threads = disaster!
└──────────────────────┘
Problems:
- Each thread uses ~8KB-8MB stack memory
- Context switching between 10,000 threads is expensive
- Scheduling overhead becomes dominant
- Memory usage: 10,000 * 8MB = 80GB just for stacks!
The solution: Event-driven I/O with a reactor pattern
Event-Driven Model (Non-Blocking)
=================================
┌──────────────────────────────────────┐
Client A ────┐ │ Kernel │
│ ├──────────────────────────────────────┤
Client B ────┼───►│ epoll/kqueue/io_uring │
│ │ ┌────────────────────────────────┐ │
Client C ────┘ │ │ Event Queue │ │
│ │ ┌─────────────────────────────┴─┐│
│ │ │ fd=5: READ_READY ││
│ │ │ fd=7: WRITE_READY ││
│ │ │ fd=9: READ_READY ││
│ │ └─────────────────────────────┬─┘│
│ └────────────────────────────────┘ │
└───────────────────────┬──────────────┘
│
▼
┌──────────────────────────────────────┐
│ Single Event Loop Thread │
├──────────────────────────────────────┤
│ while (true) { │
│ events = epoll_wait(epfd); │
│ for (event in events) { │
│ if (event.readable) │
│ handle_read(event.fd); │
│ if (event.writable) │
│ handle_write(event.fd); │
│ } │
│ } │
└──────────────────────────────────────┘
Benefits:
- Single thread handles all connections
- No context switching overhead
- Minimal memory usage
- O(1) event notification with epoll
- Can handle 100,000+ connections easily
The Three I/O Multiplexing Mechanisms
epoll (Linux)
┌───────────────────────────────────────────────────────┐
│ epoll Architecture │
├───────────────────────────────────────────────────────┤
│ │
│ User Space Kernel Space │
│ ┌─────────────┐ ┌─────────────────────┐ │
│ │ Application │ │ epoll instance │ │
│ │ │ │ ┌───────────────┐ │ │
│ │ epoll_create│───────────►│ │ Interest Set │ │ │
│ │ epoll_ctl │───────────►│ │ (Red-Black │ │ │
│ │ epoll_wait │◄───────────│ │ Tree) │ │ │
│ │ │ │ └───────┬───────┘ │ │
│ └─────────────┘ │ │ │ │
│ │ ┌───────▼───────┐ │ │
│ │ │ Ready List │ │ │
│ │ │ (Linked List) │ │ │
│ │ │ Only ready fds│ │ │
│ │ └───────────────┘ │ │
│ └─────────────────────┘ │
│ │
│ Key Operations: │
│ - epoll_create1(flags) → creates epoll instance │
│ - epoll_ctl(epfd, op, fd, event) → add/mod/del fd │
│ - epoll_wait(epfd, events, max, timeout) → get ready │
│ │
│ Edge-Triggered vs Level-Triggered: │
│ - Level: "fd is ready" (keeps firing while ready) │
│ - Edge: "fd became ready" (fires once per event) │
└───────────────────────────────────────────────────────┘
io_uring (Linux 5.1+)
┌───────────────────────────────────────────────────────┐
│ io_uring Architecture │
├───────────────────────────────────────────────────────┤
│ │
│ User Space Kernel Space │
│ ┌─────────────────────────────────────────────────┐ │
│ │ Shared Memory Region │ │
│ │ ┌───────────────────┐ ┌───────────────────┐ │ │
│ │ │ Submission Queue │ │ Completion Queue │ │ │
│ │ │ (SQ) │ │ (CQ) │ │ │
│ │ │ ┌───────────────┐ │ │ ┌───────────────┐ │ │ │
│ │ │ │ SQE: read() │ │ │ │ CQE: result │ │ │ │
│ │ │ │ SQE: write() │─┼──┼►│ CQE: result │ │ │ │
│ │ │ │ SQE: accept() │ │ │ │ CQE: result │ │ │ │
│ │ │ └───────────────┘ │ │ └───────────────┘ │ │ │
│ │ └───────────────────┘ └───────────────────┘ │ │
│ └─────────────────────────────────────────────────┘ │
│ │
│ Zero System Calls for I/O! │
│ - Application writes SQEs to shared memory │
│ - Kernel processes and writes CQEs to shared memory │
│ - io_uring_enter() only needed to wake kernel │
│ │
│ Key Advantage: Batched, async I/O without syscalls │
└───────────────────────────────────────────────────────┘
kqueue (BSD/macOS)
┌───────────────────────────────────────────────────────┐
│ kqueue Architecture │
├───────────────────────────────────────────────────────┤
│ │
│ Similar to epoll but more general: │
│ - Can monitor files, sockets, processes, signals │
│ - Uses "kevents" for both registration and delivery │
│ - Single call (kevent) for both add/wait operations │
│ │
│ struct kevent { │
│ uintptr_t ident; // fd, pid, signal number │
│ int16_t filter; // EVFILT_READ, EVFILT_WRITE │
│ uint16_t flags; // EV_ADD, EV_DELETE, EV_CLEAR │
│ uint32_t fflags; // filter-specific flags │
│ intptr_t data; // filter-specific data │
│ void *udata; // user-defined data │
│ }; │
│ │
└───────────────────────────────────────────────────────┘
Binary Protocol Design
Why Text Protocols Are Slow
JSON Message (Human Readable, Slow)
===================================
{ ◄── 1 byte
"type": "new_order", ◄── 21 bytes
"symbol": "AAPL", ◄── 16 bytes
"side": "buy", ◄── 14 bytes
"price": 150.25, ◄── 17 bytes
"quantity": 100, ◄── 16 bytes
"order_id": 123456789 ◄── 22 bytes
} ◄── 1 byte
────────────
~108 bytes + parsing overhead
Parsing Steps:
1. Tokenize string (O(n) scan)
2. Build AST/parse tree
3. Validate field names (string comparison)
4. Convert "150.25" → double (expensive!)
5. Allocate strings for symbol, type
Binary Message (Machine Efficient, Fast)
========================================
┌────────────────────────────────────────────────────────┐
│ Byte│ 0 1 2 3 │ 4 5 6 7 │ 8 9 10 11 │ │
├─────┼───────────────┼───────────────┼───────────────┼─┤
│ │ MSG_TYPE │ SYMBOL │ PRICE │ │
│ │ 0x01 (NEW) │ "AAPL" │ 150.25 │ │
│ │ uint8_t │ 4 bytes │ int64 (bp) │ │
├─────┼───────────────┼───────────────┼───────────────┼─┤
│ │ QUANTITY │ ORDER_ID │ SIDE │ │
│ │ 100 │ 123456789 │ 0x01 (BUY) │ │
│ │ uint32_t │ uint64_t │ uint8_t │ │
└─────┴───────────────┴───────────────┴───────────────┴─┘
Total: 22 bytes (5x smaller than JSON!)
Parsing Steps:
1. Read header (1 memory access)
2. Cast bytes to struct (zero-copy!)
3. Done!
No string parsing, no allocation, no validation loops.
Message Framing
Length-Prefixed Framing
=======================
┌─────────────────────────────────────────────────────────┐
│ TCP Stream (no message boundaries) │
├─────────────────────────────────────────────────────────┤
│ │
│ Challenge: TCP is a byte stream, not message stream │
│ We might receive partial messages or multiple messages │
│ │
│ Single recv() might return: │
│ - Half a message │
│ - One complete message │
│ - Three messages │
│ - 2.5 messages │
│ │
└─────────────────────────────────────────────────────────┘
Solution: Length-Prefix Protocol
================================
│◄──────────── Message 1 ──────────────►│◄─ Message 2 ─►│
┌────────────┬─────────────────────────┬────────────┬────┐
│ Length │ Payload │ Length │... │
│ (4 bytes) │ (Length bytes) │ (4 bytes) │ │
│ 00 00 00 │ 01 41 41 50 4C ... │ 00 00 00 │ │
│ 16 │ │ 0A │ │
└────────────┴─────────────────────────┴────────────┴────┘
Parser State Machine:
┌─────────────────────────────────────────────────────────┐
│ │
│ ┌──────────┐ ┌──────────────────┐ │
│ │ READING │ got 4 bytes │ READING │ │
│ │ LENGTH │───────────────────►│ PAYLOAD │ │
│ │ │ │ │ │
│ └────┬─────┘ └────────┬─────────┘ │
│ │ │ │
│ │ need more │ got full │
│ │ bytes │ payload │
│ ▼ ▼ │
│ ┌──────────┐ ┌──────────────────┐ │
│ │ WAITING │◄───────────────────│ DISPATCH │ │
│ │ FOR DATA │ reset state │ MESSAGE │ │
│ └──────────┘ └──────────────────┘ │
│ │
└─────────────────────────────────────────────────────────┘
State Machine Design
Order Lifecycle State Machine
┌─────────────────────────────────────────────────────────────────────┐
│ Order Lifecycle States │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌───────────┐ │
│ ┌───────────►│ REJECTED │◄──────────────┐ │
│ │ └───────────┘ │ │
│ │ validation │ risk check │
│ │ failed │ failed │
│ │ │ │
│ ┌───────┴───────┐ ┌────────┴──────┐ │
│ │ PENDING │──────────────────────►│ ACCEPTED │ │
│ │ (received) │ validation OK │ (on book) │ │
│ └───────────────┘ └───────┬───────┘ │
│ │ │
│ ┌───────────────────────┼───────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌───────────────┐ ┌───────────────┐ ┌──────┐│
│ │ PARTIALLY │ │ FILLED │ │CANCEL││
│ │ FILLED │──────►│ (complete) │ │LED ││
│ │ │ rest └───────────────┘ └──────┘│
│ └───────┬───────┘ filled │
│ │ │
│ │ cancel request │
│ ▼ │
│ ┌───────────────┐ │
│ │ CANCELLED │ │
│ │ (partial OK) │ │
│ └───────────────┘ │
│ │
│ Valid State Transitions: │
│ PENDING → ACCEPTED, REJECTED │
│ ACCEPTED → PARTIALLY_FILLED, FILLED, CANCELLED │
│ PARTIALLY_FILLED → PARTIALLY_FILLED, FILLED, CANCELLED │
│ │
└─────────────────────────────────────────────────────────────────────┘
Connection State Machine
┌─────────────────────────────────────────────────────────────────────┐
│ Connection State Machine │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ │
│ │ │ accept() │
│ │ LISTENING │────────────────┐ │
│ │ │ │ │
│ └──────────────┘ ▼ │
│ ┌──────────────────┐ │
│ │ CONNECTED │ │
│ │ (no session) │ │
│ └────────┬─────────┘ │
│ │ │
│ │ receive LOGIN msg │
│ ▼ │
│ ┌──────────────────┐ │
│ │ AUTHENTICATING │ │
│ │ │ │
│ └────────┬─────────┘ │
│ ┌───────┴───────┐ │
│ auth fail │ │ auth success │
│ ▼ ▼ │
│ ┌──────────────┐ ┌──────────────────┐ │
│ │ DISCONNECTING│ │ AUTHENTICATED │ │
│ │ │ │ (session ready) │ │
│ └──────────────┘ └────────┬─────────┘ │
│ │ │
│ │ can send orders │
│ ▼ │
│ ┌──────────────────┐ │
│ │ TRADING │ │
│ │ (active session)│ │
│ └────────┬─────────┘ │
│ │ │
│ ┌─────────────────┼─────────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────┐ ┌──────────────┐ ┌───────────┐ │
│ │ LOGOUT │ │ HEARTBEAT │ │ ERROR │ │
│ │ requested│ │ TIMEOUT │ │ (proto) │ │
│ └─────┬────┘ └──────┬───────┘ └─────┬─────┘ │
│ │ │ │ │
│ └─────────────────┼─────────────────┘ │
│ ▼ │
│ ┌──────────────────┐ │
│ │ DISCONNECTED │ │
│ │ │ │
│ └──────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
2.2 Why This Matters
The matching engine is the heart of any electronic exchange. Every stock trade, cryptocurrency transaction, and futures contract flows through a matching engine. Understanding how to build one teaches you:
Systems Design Skills
- How to handle high concurrency with predictable latency
- How to design protocols that are both efficient and debuggable
- How to build systems that never lose data
Real-World Applicability
- NASDAQ processes 1 million messages per second
- NYSE matching engine latency is ~50 microseconds
- Cryptocurrency exchanges handle 100,000+ orders per second
Career Value
- HFT firms pay $500K+ for engineers who understand matching engines
- Exchange infrastructure knowledge is rare and valuable
- The skills transfer to any high-throughput system
2.3 Historical Context
1960s-1980s: Floor Trading
- Humans shouted orders in trading pits
- Matching was done by eye contact and hand signals
- Latency measured in seconds
1990s: Electronic Order Books
- First electronic exchanges (ISLAND, Archipelago)
- Matching engines ran on single servers
- Latency dropped to milliseconds
2000s: Co-location Era
- Traders moved servers next to exchange
- Microsecond latency became competitive advantage
- Arms race for faster matching
2010s-Present: Kernel Bypass
- FPGA-based matching engines
- User-space networking (DPDK, RDMA)
- Nanosecond-level differences matter
Your Project’s Place You will build something comparable to early 2000s exchange technology - software-based matching with microsecond latency. This is the foundation for understanding modern systems.
2.4 Common Misconceptions
Misconception 1: “Just use async/await and you’re done”
Reality: High-level async frameworks add overhead. Raw epoll/io_uring gives you control over every syscall.
Overhead Comparison:
tokio::spawn() → ~300ns task spawn overhead
raw epoll_wait() → ~50ns per wake
io_uring → ~10ns for batched I/O
For HFT, 250ns extra per operation is unacceptable.
Misconception 2: “TCP provides reliable ordering, so message order is guaranteed”
Reality: TCP guarantees byte order within a connection, not global order across connections. If Client A sends order 1 and Client B sends order 2 at the “same time,” their arrival order depends on network latency, not send time.
Misconception 3: “JSON is slow because of parsing; Protocol Buffers solve this”
Reality: Even Protocol Buffers have overhead. True HFT uses fixed-width structs with no serialization at all - just memcpy and type punning (carefully!).
Misconception 4: “Matching is just sorting orders by price”
Reality: Matching involves:
- Price-time priority (FIFO within price level)
- Pro-rata allocation for some markets
- Self-trade prevention
- Market maker obligations
- Minimum quantity constraints
- Hidden/iceberg orders
3. Project Specification
3.1 What You Will Build
A functional matching engine with these capabilities:
┌─────────────────────────────────────────────────────────────────────┐
│ Matching Engine Architecture │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ Network Layer │ │
│ │ ┌─────────────────────────────────────────────────────────┐ │ │
│ │ │ TCP Listener (epoll/kqueue/io_uring) │ │ │
│ │ │ │ │ │
│ │ │ Client A ─┐ │ │ │
│ │ │ Client B ─┼──► Accept ──► Session Manager │ │ │
│ │ │ Client C ─┘ │ │ │
│ │ └─────────────────────────────────────────────────────────┘ │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ Protocol Layer │ │
│ │ ┌─────────────────────────────────────────────────────────┐ │ │
│ │ │ Binary Protocol Parser │ │ │
│ │ │ - Message framing (length-prefixed) │ │ │
│ │ │ - Message type dispatch │ │ │
│ │ │ - Validation │ │ │
│ │ └─────────────────────────────────────────────────────────┘ │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ Order Processing Layer │ │
│ │ ┌───────────────────────────────────────────────────────┐ │ │
│ │ │ Order Validator ──► Order Book ──► Matching Engine │ │ │
│ │ │ │ │ │ │ │ │
│ │ │ ▼ ▼ ▼ │ │ │
│ │ │ Rejections Book Updates Trades │ │ │
│ │ └───────────────────────────────────────────────────────┘ │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ Market Data Layer │ │
│ │ ┌─────────────────────────────────────────────────────────┐ │ │
│ │ │ - Trade broadcasts │ │ │
│ │ │ - Order book snapshots │ │ │
│ │ │ - Best bid/offer updates │ │ │
│ │ └─────────────────────────────────────────────────────────┘ │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
3.2 Functional Requirements
Order Management
- Accept NEW_ORDER messages with: symbol, side (buy/sell), price, quantity, order_id
- Accept CANCEL_ORDER messages with: order_id
- Send ORDER_ACK confirming receipt with status (accepted/rejected)
- Send FILL messages when orders match
- Send CANCEL_ACK confirming cancellations
Matching Logic
- Implement price-time priority (best price first, then FIFO within price)
- Match incoming orders against resting orders on opposite side
- Support limit orders (execute at specified price or better)
- Handle partial fills correctly
Market Data
- Broadcast TRADE messages to all connected clients
- Broadcast BBO (best bid/offer) updates when top-of-book changes
- Support order book snapshot requests
Connection Management
- Accept multiple simultaneous TCP connections
- Implement session management (login, heartbeat, logout)
- Handle disconnections gracefully (cancel-on-disconnect optional)
3.3 Non-Functional Requirements
Performance Targets
- Handle 50,000+ orders per second sustained
- Achieve p99 latency under 100 microseconds (order in to ack out)
- Support 100+ simultaneous connections
- Zero memory allocations in the hot path
Reliability
- No lost orders - every order gets a response
- No duplicate matches - each order matched at most once
- Deterministic behavior - same input produces same output
Observability
- Log all orders and trades with timestamps
- Report latency percentiles (p50, p95, p99)
- Track order counts and throughput
3.4 Example Usage / Output
Starting the Engine
$ ./matching_engine --port 9000 --symbols AAPL,MSFT,GOOG
[2024-01-15 09:30:00.000] [INFO] Matching Engine v0.1.0
[2024-01-15 09:30:00.001] [INFO] Loading symbols: AAPL, MSFT, GOOG
[2024-01-15 09:30:00.002] [INFO] Listening on 0.0.0.0:9000
[2024-01-15 09:30:00.002] [INFO] Using epoll for I/O (Linux detected)
[2024-01-15 09:30:00.002] [INFO] Ready to accept connections
Client Connection and Trading
$ ./trading_client --connect localhost:9000
Connected to matching engine at 127.0.0.1:9000
Session ID: 1001
> login trader_alice password123
LOGIN_ACK: session=1001 status=AUTHENTICATED
> order buy AAPL 100 150.25
ORDER_ACK: order_id=1 status=ACCEPTED symbol=AAPL side=BUY qty=100 price=150.25
[ORDER] Resting on book: BUY 100 AAPL @ 150.25
> order sell AAPL 50 150.25
ORDER_ACK: order_id=2 status=ACCEPTED symbol=AAPL side=SELL qty=50 price=150.25
FILL: order_id=2 match_id=1 exec_qty=50 exec_price=150.25 leaves_qty=0
FILL: order_id=1 match_id=1 exec_qty=50 exec_price=150.25 leaves_qty=50
[TRADE] AAPL: 50 @ 150.25 (buyer=1, seller=2)
> book AAPL
ORDER_BOOK: AAPL
BID ASK
50 @ 150.25 (order 1) (empty)
> cancel 1
CANCEL_ACK: order_id=1 status=CANCELLED cancelled_qty=50
> logout
LOGOUT_ACK: session=1001 status=LOGGED_OUT
Connection closed.
Server Statistics
$ ./matching_engine --port 9000 --stats-interval 10
[STATS] 10-second interval ending 09:30:10.000
+------------------------------------------+
| Metric | Value |
+------------------------------------------+
| Orders received | 524,312 |
| Orders matched | 489,001 |
| Orders cancelled | 35,311 |
| Trades executed | 245,678 |
| Active connections | 47 |
| Active orders (resting) | 12,456 |
+------------------------------------------+
| Throughput | 52,431/sec |
+------------------------------------------+
| Latency (order→ack) |
| p50 | 8 μs |
| p95 | 23 μs |
| p99 | 45 μs |
| p99.9 | 112 μs |
| max | 1,245 μs |
+------------------------------------------+
3.5 Real World Outcome
Upon completion, you will have:
- A Working Mini-Exchange: Connect multiple clients and trade against each other
- Understanding of Exchange Internals: Know how NASDAQ, NYSE, and crypto exchanges work
- Production-Grade Components: Reusable network layer, protocol parser, matching logic
- Performance Profiling Skills: Know how to measure and optimize latency
- Portfolio Piece: Demonstrate systems programming expertise to potential employers
4. Solution Architecture
4.1 High-Level Design
┌─────────────────────────────────────────────────────────────────────────┐
│ Matching Engine Components │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ Main Event Loop │ │
│ │ ┌───────────────────────────────────────────────────────────┐ │ │
│ │ │ while running: │ │ │
│ │ │ events = io_poll.wait(timeout=1ms) │ │ │
│ │ │ for event in events: │ │ │
│ │ │ if event.is_accept: │ │ │
│ │ │ handle_new_connection() │ │ │
│ │ │ elif event.is_readable: │ │ │
│ │ │ handle_client_data(event.fd) │ │ │
│ │ │ elif event.is_writable: │ │ │
│ │ │ flush_pending_writes(event.fd) │ │ │
│ │ │ check_heartbeats() │ │ │
│ │ │ maybe_print_stats() │ │ │
│ │ └───────────────────────────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌───────────────┼───────────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌───────────────────┐ ┌──────────────────┐ ┌─────────────────────┐ │
│ │ SessionManager │ │ OrderRouter │ │ MarketDataPub │ │
│ │ │ │ │ │ │ │
│ │ - connections[] │ │ - validate() │ │ - subscribers[] │ │
│ │ - authenticate() │ │ - route_to_book()│ │ - broadcast_trade() │ │
│ │ - heartbeat() │ │ │ │ - broadcast_bbo() │ │
│ └───────────────────┘ └────────┬─────────┘ └─────────────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────────────────────────┐ │
│ │ Order Books (per symbol) │ │
│ │ ┌─────────────────────────────────┐ │ │
│ │ │ AAPL Book │ │ │
│ │ │ ┌───────────┐ ┌───────────┐ │ │ │
│ │ │ │ Bids │ │ Asks │ │ │ │
│ │ │ │ 150.25:50 │ │ 150.30:25 │ │ │ │
│ │ │ │ 150.20:100│ │ 150.35:75 │ │ │ │
│ │ │ └───────────┘ └───────────┘ │ │ │
│ │ └─────────────────────────────────┘ │ │
│ │ ┌─────────────────────────────────┐ │ │
│ │ │ MSFT Book ... │ │ │
│ │ └─────────────────────────────────┘ │ │
│ └───────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘
4.2 Key Components
| Component | Responsibility | Key Design Decisions |
|---|---|---|
| EventLoop | Polls for I/O events, drives main execution | Single-threaded vs multi-threaded, timer resolution |
| SessionManager | Tracks connections, authentication state | Session ID assignment, timeout handling |
| ProtocolParser | Deserializes binary messages, validates | Buffer management, partial read handling |
| OrderRouter | Validates orders, routes to correct book | Pre-trade risk checks, symbol lookup |
| OrderBook | Maintains price levels, matches orders | Data structure choice (map vs array), price representation |
| MatchingEngine | Executes matching algorithm | Price-time priority implementation, self-trade prevention |
| MarketDataPublisher | Broadcasts trades, quotes to subscribers | Multicast vs unicast, throttling |
| Logger | Records all activity with timestamps | File vs memory buffer, async logging |
4.3 Data Structures
Core Data Structures
====================
Order (24-32 bytes, cache-line friendly)
┌──────────────────────────────────────────────────────────┐
│ order_id: u64 │ 8 bytes │ Unique order identifier │
│ symbol_id: u32 │ 4 bytes │ Internal symbol index │
│ price: i64 │ 8 bytes │ Price in basis points │
│ quantity: u32 │ 4 bytes │ Order quantity │
│ leaves_qty: u32 │ 4 bytes │ Remaining quantity │
│ side: u8 │ 1 byte │ BUY=1, SELL=2 │
│ type: u8 │ 1 byte │ LIMIT=1, MARKET=2 │
│ status: u8 │ 1 byte │ See state machine │
│ _padding: u8 │ 1 byte │ Alignment │
│ client_id: u32 │ 4 bytes │ Owner session │
│ timestamp: u64 │ 8 bytes │ Arrival time (nanos) │
└──────────────────────────────────────────────────────────┘
Total: 44 bytes (fits in cache line with padding)
PriceLevel
┌──────────────────────────────────────────────────────────┐
│ price: i64 │ Price for this level │
│ total_quantity: u64 │ Sum of all order qtys │
│ order_count: u32 │ Number of orders │
│ orders: LinkedList<Order*> │ FIFO queue of orders │
│ next: *PriceLevel │ Linked list of levels │
│ prev: *PriceLevel │ Doubly-linked │
└──────────────────────────────────────────────────────────┘
OrderBook (per symbol)
┌──────────────────────────────────────────────────────────┐
│ symbol_id: u32 │
│ bids: BTreeMap<Price, PriceLevel> │ Sorted high→low │
│ asks: BTreeMap<Price, PriceLevel> │ Sorted low→high │
│ orders: HashMap<OrderId, Order*> │ Fast lookup │
│ best_bid: Option<i64> │ Cached BBO │
│ best_ask: Option<i64> │ Cached BBO │
│ last_trade_price: i64 │
│ last_trade_qty: u32 │
└──────────────────────────────────────────────────────────┘
Session (per connection)
┌──────────────────────────────────────────────────────────┐
│ session_id: u64 │
│ fd: i32 │ Socket descriptor │
│ state: ConnectionState │ See state machine │
│ read_buffer: RingBuffer │ Incoming data │
│ write_buffer: RingBuffer │ Outgoing data │
│ last_activity: u64 │ For heartbeat │
│ orders: Vec<OrderId> │ Orders from this sess │
└──────────────────────────────────────────────────────────┘
4.4 Algorithm Overview
Matching Algorithm (Price-Time Priority)
match_order(incoming_order):
if incoming_order.side == BUY:
opposite_book = asks
matches_price = lambda ask_price: incoming_order.price >= ask_price
else:
opposite_book = bids
matches_price = lambda bid_price: incoming_order.price <= bid_price
remaining_qty = incoming_order.quantity
trades = []
while remaining_qty > 0:
best_level = opposite_book.peek_best()
if best_level is None:
break // No more orders to match
if not matches_price(best_level.price):
break // Price doesn't cross
// Match against orders at this price level (FIFO)
while remaining_qty > 0 and not best_level.is_empty():
resting_order = best_level.front()
match_qty = min(remaining_qty, resting_order.leaves_qty)
match_price = resting_order.price // Passive side's price
// Create trade
trade = Trade(
buyer = incoming_order if BUY else resting_order,
seller = resting_order if BUY else incoming_order,
price = match_price,
quantity = match_qty
)
trades.append(trade)
// Update quantities
remaining_qty -= match_qty
resting_order.leaves_qty -= match_qty
if resting_order.leaves_qty == 0:
best_level.pop_front()
resting_order.status = FILLED
else:
resting_order.status = PARTIALLY_FILLED
if best_level.is_empty():
opposite_book.remove_level(best_level.price)
// Handle remaining quantity on incoming order
if remaining_qty > 0:
incoming_order.leaves_qty = remaining_qty
add_to_book(incoming_order) // Becomes resting order
incoming_order.status = ACCEPTED
else:
incoming_order.status = FILLED
return trades
5. Implementation Guide
5.1 Development Environment Setup
Linux (Recommended)
# System packages
sudo apt-get update
sudo apt-get install -y build-essential cmake ninja-build
sudo apt-get install -y liburing-dev # For io_uring (Linux 5.1+)
# C++ toolchain
sudo apt-get install -y g++-12 clang-15
sudo apt-get install -y libfmt-dev libspdlog-dev
# Rust toolchain
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
rustup default stable
rustup component add rust-analyzer
# Performance tools
sudo apt-get install -y linux-tools-common linux-tools-generic
sudo apt-get install -y perf # Profiling
cargo install flamegraph # Flame graphs
cargo install cargo-criterion # Benchmarking
macOS
# Xcode command line tools
xcode-select --install
# Homebrew packages
brew install cmake ninja llvm
brew install fmt spdlog
# Rust toolchain
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
5.2 Project Structure
matching_engine/
├── Cargo.toml # Rust project (or CMakeLists.txt for C++)
├── src/
│ ├── main.rs # Entry point
│ ├── lib.rs # Library exports
│ ├── config.rs # Configuration parsing
│ │
│ ├── network/
│ │ ├── mod.rs
│ │ ├── event_loop.rs # epoll/kqueue/io_uring abstraction
│ │ ├── tcp_listener.rs # Connection acceptance
│ │ ├── session.rs # Per-connection state
│ │ └── buffer.rs # Ring buffer for I/O
│ │
│ ├── protocol/
│ │ ├── mod.rs
│ │ ├── messages.rs # Message type definitions
│ │ ├── parser.rs # Binary protocol parsing
│ │ └── serializer.rs # Binary protocol writing
│ │
│ ├── order/
│ │ ├── mod.rs
│ │ ├── order.rs # Order struct and states
│ │ ├── validator.rs # Pre-trade validation
│ │ └── id_generator.rs # Unique ID generation
│ │
│ ├── book/
│ │ ├── mod.rs
│ │ ├── order_book.rs # OrderBook implementation
│ │ ├── price_level.rs # PriceLevel implementation
│ │ └── matching.rs # Matching algorithm
│ │
│ ├── market_data/
│ │ ├── mod.rs
│ │ ├── publisher.rs # Market data distribution
│ │ └── snapshot.rs # Order book snapshots
│ │
│ └── metrics/
│ ├── mod.rs
│ ├── latency.rs # Latency histograms
│ └── counters.rs # Throughput tracking
│
├── tests/
│ ├── integration/
│ │ ├── matching_tests.rs
│ │ ├── network_tests.rs
│ │ └── protocol_tests.rs
│ └── unit/
│ └── *.rs
│
├── benches/
│ ├── matching_bench.rs
│ └── protocol_bench.rs
│
└── tools/
├── client/ # Test client
└── load_generator/ # Performance testing
5.3 The Core Question You’re Answering
“How do you build a system that handles thousands of concurrent connections with deterministic, microsecond-level latency?”
This question encompasses:
- Why can’t we just spawn a thread per connection?
- How do operating systems notify us of I/O readiness efficiently?
- What data structures enable O(1) or O(log n) operations under load?
- How do we avoid the unpredictability of garbage collection and memory allocation?
- How do we measure latency accurately without the measurement affecting results?
5.4 Concepts You Must Understand First
Before writing code, verify you understand these concepts:
| Concept | Self-Check Question | Where to Learn |
|---|---|---|
| TCP socket states | What’s the difference between ESTABLISHED and CLOSE_WAIT? | Stevens, “TCP/IP Illustrated” Ch. 2 |
| Non-blocking I/O | What does EAGAIN/EWOULDBLOCK mean? | Kerrisk, “Linux Programming Interface” Ch. 63 |
| File descriptor readiness | What does “ready for reading” mean for a socket? | Kerrisk Ch. 63 |
| Edge vs level triggering | Why would epoll EPOLLET cause problems if we don’t drain the buffer? | man epoll(7) |
| Byte ordering | Why do we need htonl/ntohl? | Stevens Ch. 3 |
| Memory alignment | Why might a misaligned read cause a bus error? | CS:APP Ch. 3 |
| Cache lines | How does false sharing slow down concurrent code? | CS:APP Ch. 6 |
5.5 Questions to Guide Your Design
Work through these before coding:
Networking
- Should you use blocking or non-blocking sockets? Why?
- When a client connects, what socket options should you set (TCP_NODELAY, SO_REUSEADDR)?
- How will you handle partial reads? Partial writes?
- What happens if your write buffer fills up?
Protocol
- How will you detect message boundaries in the TCP stream?
- What happens if you receive a malformed message?
- How will you represent prices? (Floats? Fixed-point? Basis points?)
- Should message types be extensible (like protobuf) or fixed (like FIX)?
Matching
- What is price-time priority and why is it fair?
- How will you handle a market order that partially fills?
- Should you match synchronously (inline) or asynchronously (queue)?
- How will you prevent self-trading (same user buying from themselves)?
State Management
- How will you track which orders belong to which session?
- What happens to orders when a session disconnects?
- How will you handle out-of-sequence messages?
- How will you recover from a crash?
5.6 Thinking Exercise
Before coding, trace through this scenario by hand:
T=0: Client A connects, sends LOGIN
T=1: Client B connects, sends LOGIN
T=5: Client A sends: BUY 100 AAPL @ 150.00
T=6: Client A sends: BUY 50 AAPL @ 150.00
T=7: Client B sends: SELL 75 AAPL @ 150.00
T=8: Client B sends: SELL 100 AAPL @ 149.50
T=10: Client A sends: CANCEL order from T=6
Answer on paper:
- What is the order book state after T=6? (Bids and asks)
- What trades occur at T=7? Who is the buyer/seller for each trade?
- What is the order book state after T=7?
- What trades occur at T=8?
- What happens at T=10? Can the cancel succeed?