Project 6: Capstone - Full HFT Trading System
Build a complete electronic trading ecosystem that integrates all previous projects into a production-grade system with end-to-end latency measurement, risk management, and real-time monitoring. This is what HFT firms actually build.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Advanced |
| Time Estimate | 2-3 months (after completing P01-P05) |
| Languages | C++ and/or Rust |
| Prerequisites | All previous projects (P01-P05), system design experience |
| Key Topics | System integration, end-to-end latency, risk management, monitoring, multi-threaded orchestration |
| Coolness Level | Level 5: Production-Grade HFT |
| Portfolio Value | Career-Defining Project |
1. Learning Objectives
By completing this capstone project, you will:
- Master system integration with minimal latency: understand how to connect multiple high-performance components without introducing bottleneck delays
- Implement end-to-end latency instrumentation: build comprehensive timing infrastructure that measures every hop in the trading pipeline
- Design failure handling and resilience patterns: create systems that gracefully handle component failures without data loss
- Build configuration management for trading systems: implement hot-reloadable configuration for strategy parameters and risk limits
- Orchestrate multi-threaded systems: master CPU pinning, affinity, and coordination between specialized threads
- Understand production trading concerns: experience the full complexity of what HFT firms actually deploy
- Build real-time monitoring dashboards: create visibility into system health, P&L, and latency metrics
- Implement risk management layers: add pre-trade and post-trade risk controls that prevent catastrophic losses
- Connect theory to practice: see how all previous project knowledge combines into a working trading system
- Prepare for HFT engineering roles: build exactly what you would build on day one at a trading firm
2. Theoretical Foundation
2.1 Core Concepts
System Architecture: The Trading Pipeline
A complete HFT system is not a single program but an orchestrated collection of specialized components, each optimized for a specific task:
COMPLETE HFT TRADING SYSTEM ARCHITECTURE
==========================================
┌─────────────────────────────────────────────────────────────────────────────────────┐
│ EXTERNAL WORLD │
│ │
│ ┌─────────────────┐ ┌─────────────────────┐ │
│ │ Real Market │ │ Exchange │ │
│ │ Data Feeds │ │ Simulator │ │
│ │ (or Exchange │ │ (Matching │ │
│ │ Simulator) │ │ Engine) │ │
│ └────────┬────────┘ └──────────┬──────────┘ │
│ │ ▲ │
│ │ Market Data Order Submission│ │
└────────────┼────────────────────────────────────────────────────────┼───────────────┘
│ │
┌────────────┼────────────────────────────────────────────────────────┼───────────────┐
│ ▼ │ │
│ ┌─────────────────────────────────────────────────────────────────┴────────┐ │
│ │ MARKET DATA FEED HANDLER │ │
│ │ │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ Protocol │───►│ Normalize │───►│ Lock-Free │ │ │
│ │ │ Parser │ │ & Validate │ │ Publisher │ │ │
│ │ └─────────────┘ └─────────────┘ └──────┬──────┘ │ │
│ │ │ │ │
│ │ Timestamp: T1 (market data arrival) │ │ │
│ └────────────────────────────────────────────────┼──────────────────────────┘ │
│ │ │
│ Lock-Free SPSC Queue │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────────────┐ │
│ │ STRATEGY ENGINE │ │
│ │ │ │
│ │ ┌────────────────────────────────────────────────────────────────────┐ │ │
│ │ │ Market Data Consumer │ │ │
│ │ │ ┌─────────────────────────────────────────────────────────────┐ │ │ │
│ │ │ │ Order Book │ Price │ Signal │ Order │ │ │ │
│ │ │ │ Reconstruction │ Analysis │ Generation │ Decision │ │ │ │
│ │ │ └─────────────────────────────────────────────────────────────┘ │ │ │
│ │ └────────────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ Timestamp: T2 (signal generated) │ │
│ └──────────────────────────────────────────┬───────────────────────────────────┘ │
│ │ │
│ Order Commands │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────────────┐ │
│ │ RISK MANAGEMENT LAYER │ │
│ │ │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ Position │ │ Order Rate │ │ P&L Limit │ │ Exposure │ │ │
│ │ │ Limits │ │ Limiting │ │ Check │ │ Check │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │ │
│ │ │ │
│ │ Risk Check: PASS/REJECT │ │
│ └──────────────────────────────────────────┬───────────────────────────────────┘ │
│ │ │
│ Approved Orders │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────────────┐ │
│ │ TRADING GATEWAY │ │
│ │ │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ Order │ │ Session │ │ Wire │ │ ACK/Fill │ │ │
│ │ │ Validation │ │ Management │ │ Protocol │ │ Handler │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │ │
│ │ │ │
│ │ Timestamp: T3 (order sent) │ │
│ └──────────────────────────────────────────┬───────────────────────────────────┘ │
│ │ │
│ To Exchange ────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────────────────┐ │
│ │ PERFORMANCE MONITORING DASHBOARD │ │
│ │ │ │
│ │ ┌─────────────────────────────────────────────────────────────────────┐ │ │
│ │ │ Orders/sec │ Latency p50/p99 │ Position │ P&L │ Risk │ Fills │ │ │
│ │ │ 12,456 │ 45us / 120us │ +500 │ +$23│ OK │ 234 │ │ │
│ │ └─────────────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────────────┘ │
│ │
│ YOUR TRADING SYSTEM │
└──────────────────────────────────────────────────────────────────────────────────────┘
Component Integration: The Challenge
Each component from previous projects was optimized in isolation. The capstone challenge is making them work together without losing the performance gains:
INTEGRATION CHALLENGES
======================
Challenge 1: Data Flow Without Copying
--------------------------------------
Feed Handler ──► Strategy ──► Gateway
│ │ │
│ Copy? │ Copy? │
└──────────────┴────────────┘
WRONG: memcpy(strategy_buffer, feed_handler_data, size);
memcpy(gateway_buffer, strategy_order, size);
Total: 2 copies per message = tens to hundreds of nanoseconds each once cache misses are counted
RIGHT: Zero-copy message passing with shared memory regions
or lock-free queues with pre-allocated slots
Challenge 2: Thread Coordination
--------------------------------
┌──────────────────────────────────────────────────────────────┐
│ │
│ Core 0: Feed Handler ──┐ │
│ │ │
│ Core 1: Strategy ◄──┴──► Lock-Free Queue │
│ Engine │ │
│ │ │
│ Core 2: Gateway ◄───────┘ │
│ │
│ Core 3: Risk + Monitoring (lower priority) │
│ │
│ PROBLEM: If threads share a core, context switching │
│ adds 1-10 microseconds of latency jitter │
│ │
│ SOLUTION: Pin each thread to its own core and isolate those │
│ cores from the scheduler with the isolcpus kernel parameter │
│ │
└──────────────────────────────────────────────────────────────┘
Challenge 3: Failure Isolation
------------------------------
What happens when one component fails?
┌─────────────────────────────────────────────────────────────────┐
│ │
│ Feed Handler │
│ │ │
│ ▼ │
│ Strategy Engine ──────────────────────────────► Gateway │
│ │ │ │
│ │ CRASH! │ │
│ ▼ │ │
│ Risk Layer ─────────── Circuit Breaker ─────────────┘ │
│ │
│ Requirements: │
│ - Other components continue running │
│ - Orders in-flight complete or cancel │
│ - Position state preserved │
│ - Automatic recovery or manual intervention │
│ │
└─────────────────────────────────────────────────────────────────┘
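The "lock-free queues with pre-allocated slots" option from Challenge 1 can be sketched as a single-producer/single-consumer ring buffer. This is a minimal illustration using only the Rust standard library, not the queue from Project 2; the name `SpscRing` and its methods are hypothetical. Each message is written once into a slot that was allocated at startup, so the hot path performs no allocation and no intermediate buffer copies.

```rust
use std::cell::UnsafeCell;
use std::mem::MaybeUninit;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Fixed-capacity single-producer/single-consumer ring buffer (hypothetical sketch).
/// All N slots exist from construction; push and pop never allocate.
pub struct SpscRing<T, const N: usize> {
    slots: [UnsafeCell<MaybeUninit<T>>; N],
    head: AtomicUsize, // next slot to read; written only by the consumer
    tail: AtomicUsize, // next slot to write; written only by the producer
}

// Sound only if exactly one thread pushes and exactly one thread pops.
unsafe impl<T: Send, const N: usize> Sync for SpscRing<T, N> {}

impl<T: Copy, const N: usize> SpscRing<T, N> {
    pub fn new() -> Self {
        assert!(N.is_power_of_two(), "capacity must be a power of two");
        Self {
            slots: std::array::from_fn(|_| UnsafeCell::new(MaybeUninit::uninit())),
            head: AtomicUsize::new(0),
            tail: AtomicUsize::new(0),
        }
    }

    /// Producer side. Returns false when the ring is full.
    pub fn try_push(&self, value: T) -> bool {
        let tail = self.tail.load(Ordering::Relaxed);
        let head = self.head.load(Ordering::Acquire);
        if tail.wrapping_sub(head) == N {
            return false;
        }
        // The slot is owned by the producer until `tail` is published below.
        unsafe { (*self.slots[tail & (N - 1)].get()).write(value) };
        self.tail.store(tail.wrapping_add(1), Ordering::Release);
        true
    }

    /// Consumer side. Returns None when the ring is empty.
    pub fn try_pop(&self) -> Option<T> {
        let head = self.head.load(Ordering::Relaxed);
        let tail = self.tail.load(Ordering::Acquire);
        if head == tail {
            return None;
        }
        // The Acquire load of `tail` makes the producer's write visible here.
        let value = unsafe { (*self.slots[head & (N - 1)].get()).assume_init_read() };
        self.head.store(head.wrapping_add(1), Ordering::Release);
        Some(value)
    }
}
```

A feed handler thread would call `try_push` and the strategy thread would spin on `try_pop`. A production version would likely also pad `head` and `tail` onto separate cache lines to avoid false sharing, which this sketch omits for brevity.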
End-to-End Latency Measurement
Understanding where time is spent is critical:
END-TO-END LATENCY BREAKDOWN
============================
T1 T2 T3 T4 T5
│ │ │ │ │
Market ───────►│◄── Feed ────►│◄── Strategy ──►│◄── Risk ───────►│◄── Gateway ──►│ Exchange
Data │ Handler │ Engine │ Check │ │
Arrival │ │ │ │ │
│ │ │ │ │
│◄─────────────►│◄───────────────►│◄───────────────►│◄─────────────►│
Hop 1 Hop 2 Hop 3 Hop 4
(~5us) (~20us) (~2us) (~10us)
│◄──────────────────────────────────────────────────────────────►│
Total: ~37us internal
+ Network RTT to exchange
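Measurement Points (matching the hops in the diagram above):
- T1: When bytes arrive from market data source (socket timestamp)
- T2: When the feed handler publishes the parsed update to the strategy queue
- T3: When strategy makes trading decision
- T4: When risk check completes
- T5: When order bytes written to socket
Exchange acknowledgment receipt is recorded separately; the interval from T5 to the acknowledgment measures the network round trip rather than internal latency.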
Instrumentation Requirements:
- Hardware timestamps if available (NIC timestamping)
- Monotonic clock with nanosecond resolution
- Lock-free logging to avoid affecting measurements
- Histogram storage for percentile calculation
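The instrumentation requirements above can be sketched as three small pieces: a monotonic nanosecond clock, a per-message record of the five timestamps, and a histogram for percentiles. The names `HopStamps`, `LatencyHistogram`, and `now_nanos` are hypothetical, and the linear 1-microsecond bucket layout is a simplification chosen for readability; an HDR-style histogram would cover a wider range with bounded memory.

```rust
use std::time::Instant;

/// Nanoseconds elapsed since a process-local epoch, from the monotonic clock.
pub fn now_nanos(epoch: Instant) -> u64 {
    epoch.elapsed().as_nanos() as u64
}

/// Timestamps T1..T5 for one message, in nanoseconds since the epoch.
#[derive(Clone, Copy, Debug)]
pub struct HopStamps {
    pub t: [u64; 5],
}

impl HopStamps {
    /// Durations of the four hops: feed handler, strategy, risk, gateway.
    pub fn hops(&self) -> [u64; 4] {
        let mut out = [0u64; 4];
        for i in 0..4 {
            out[i] = self.t[i + 1].saturating_sub(self.t[i]);
        }
        out
    }

    /// Internal end-to-end latency, T1 to T5.
    pub fn total(&self) -> u64 {
        self.t[4].saturating_sub(self.t[0])
    }
}

/// Histogram with one bucket per microsecond; the last bucket absorbs overflow.
pub struct LatencyHistogram {
    buckets: Vec<u64>,
    count: u64,
}

impl LatencyHistogram {
    /// Allocates all buckets up front so `record` never allocates.
    pub fn new(max_micros: usize) -> Self {
        Self { buckets: vec![0; max_micros + 1], count: 0 }
    }

    pub fn record(&mut self, nanos: u64) {
        let idx = ((nanos / 1_000) as usize).min(self.buckets.len() - 1);
        self.buckets[idx] += 1;
        self.count += 1;
    }

    /// Percentile as a fraction num/den, e.g. (99, 100) for p99 or (999, 1000) for p999.
    /// Returns the bucket (in microseconds) containing the sample at that rank.
    pub fn percentile(&self, num: u64, den: u64) -> Option<u64> {
        if self.count == 0 || den == 0 {
            return None;
        }
        let rank = ((self.count * num + den - 1) / den).max(1);
        let mut seen = 0u64;
        for (micros, &n) in self.buckets.iter().enumerate() {
            seen += n;
            if seen >= rank {
                return Some(micros as u64);
            }
        }
        None
    }
}
```

With the figures from the breakdown diagram, stamps of 0, 5, 25, 27, and 37 microseconds yield hops of 5, 20, 2, and 10 microseconds and a total of 37. This histogram is not thread-safe; one option is to have the hot path push `HopStamps` through a lock-free queue and let the monitoring thread own the histogram.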
2.2 Why This Matters
This is not an academic exercise. It is the kind of system you would work on from day one at a trading firm.
Jane Street, Citadel, Two Sigma, Tower Research, DRW: these firms all run systems that look remarkably similar to what you are building. The primary differences:
| Aspect | This Project | Production HFT |
|---|---|---|
| Market Data | Simulated feed or delayed | Direct exchange feeds ($50K+/month) |
| Networking | TCP/IP kernel stack | DPDK/RDMA kernel bypass |
| Hardware | Consumer CPU | Xeon with AVX-512, FPGA acceleration |
| Latency | ~50 microseconds | ~1-10 microseconds |
| Reliability | Single machine | Redundant, geographically distributed |
| Scale | 1-10 symbols | 10,000+ symbols |
The architecture and patterns are identical. What you learn here transfers directly.
2.3 Historical Context
EVOLUTION OF TRADING SYSTEMS
============================
1970s-1990s: Monolithic Systems
┌─────────────────────────────────────────┐
│ Single Application │
│ ┌───────────────────────────────────┐ │
│ │ Data + Strategy + Orders + Risk │ │
│ │ (All in one process) │ │
│ └───────────────────────────────────┘ │
│ │
│ - Easy to understand │
│ - Hard to scale │
│ - Single point of failure │
│ - Latency: milliseconds to seconds │
└─────────────────────────────────────────┘
2000s: Service-Oriented Architecture
┌─────────────────────────────────────────┐
│ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐│
│ │ Data │ │Strat │ │Order │ │ Risk ││
│ │ Svc │◄─│ Svc │─►│ Svc │◄─│ Svc ││
│ └──────┘ └──────┘ └──────┘ └──────┘│
│ │ │ │ │ │
│ └─────────┴─────────┴─────────┘ │
│ Message Bus (MQ) │
│ │
│ - Scalable │
│ - Resilient │
│ - Added latency from serialization │
│ - Latency: 100s of microseconds │
└─────────────────────────────────────────┘
2010s-Present: Ultra-Low-Latency Microservices
┌─────────────────────────────────────────┐
│ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐│
│ │ Feed │──│Strat │──│ Risk │──│ Gate ││
│ │ Hand │ │ Eng │ │ Mgr │ │ way ││
│ └──────┘ └──────┘ └──────┘ └──────┘│
│ │ │ │ │ │
│ └─────────┴─────────┴─────────┘ │
│ Lock-Free Shared Memory │
│ CPU Affinity, Kernel Bypass │
│ │
│ - Each component optimized separately │
│ - Zero-copy data passing │
│ - Deterministic latency │
│ - Latency: microseconds to nanoseconds │
└─────────────────────────────────────────┘
2.4 Common Misconceptions
Misconception 1: “Just glue the components together with function calls”
Reality: Inter-component communication design is critical. Naive function calls couple components tightly, making them hard to test, profile, or run in isolation.
Misconception 2: “More threads = more performance”
Reality: More threads often means more contention, context switching, and cache pollution. The fastest systems use the minimum number of threads, each pinned to a dedicated CPU core.
Misconception 3: “Microservices mean separate processes”
Reality: In HFT, “microservices” often means separate threads or modules sharing a single address space with lock-free communication. Crossing a process boundary through sockets or pipes adds latency that the hot path cannot afford.
Misconception 4: “Monitoring adds overhead, skip it in production”
Reality: You cannot optimize what you cannot measure. Production systems have more monitoring, not less. The key is non-intrusive monitoring (lock-free counters, async logging).
Misconception 5: “Risk management is just position limits”
Reality: Comprehensive risk management includes: order rate limiting, P&L limits, position limits, exposure limits, correlation limits, and circuit breakers at multiple levels.
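Order rate limiting, one of the controls listed under Misconception 5, is often implemented as a token bucket. The sketch below is one possible shape, assuming the caller supplies a monotonic nanosecond timestamp; `OrderRateLimiter` and its parameters are hypothetical names. Integer arithmetic keeps the check free of allocation and floating point.

```rust
const NANOS_PER_SEC: u64 = 1_000_000_000;

/// Token bucket: allows bursts of up to `burst` orders, refilled at `rate_per_sec`.
/// Tokens are stored scaled by 1e9 so that refill needs no division.
pub struct OrderRateLimiter {
    rate_per_sec: u64,
    max_scaled: u64,
    tokens_scaled: u64,
    last_nanos: u64,
}

impl OrderRateLimiter {
    pub fn new(rate_per_sec: u64, burst: u64, now_nanos: u64) -> Self {
        let max_scaled = burst.saturating_mul(NANOS_PER_SEC);
        Self { rate_per_sec, max_scaled, tokens_scaled: max_scaled, last_nanos: now_nanos }
    }

    /// Returns true and consumes one token if an order may be sent now.
    pub fn try_acquire(&mut self, now_nanos: u64) -> bool {
        // elapsed nanoseconds * orders per second == tokens scaled by 1e9
        let elapsed = now_nanos.saturating_sub(self.last_nanos);
        self.last_nanos = self.last_nanos.max(now_nanos);
        let refill = elapsed.saturating_mul(self.rate_per_sec);
        self.tokens_scaled = self.tokens_scaled.saturating_add(refill).min(self.max_scaled);
        if self.tokens_scaled >= NANOS_PER_SEC {
            self.tokens_scaled -= NANOS_PER_SEC;
            true
        } else {
            false
        }
    }
}
```

With a rate of 10,000 orders per second, one token is restored every 100 microseconds. A rejected `try_acquire` would typically be counted as a risk reject rather than queued, so a runaway strategy cannot build a backlog of orders.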
3. Project Specification
3.1 What You Will Build
A complete electronic trading ecosystem with six integrated components:
COMPLETE SYSTEM COMPONENT MAP
=============================
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│ │
│ COMPONENT 1: EXCHANGE SIMULATOR │
│ ┌──────────────────────────────────────────────────────────────────────────────────┐ │
│ │ Reuse: Project 3 (Matching Engine) │ │
│ │ Enhancements: │ │
│ │ - Synthetic market data generation │ │
│ │ - Configurable latency injection │ │
│ │ - Multiple symbol support │ │
│ │ - Order book snapshots for reconciliation │ │
│ └──────────────────────────────────────────────────────────────────────────────────┘ │
│ │
│ COMPONENT 2: MARKET DATA FEED HANDLER │
│ ┌──────────────────────────────────────────────────────────────────────────────────┐ │
│ │ Reuse: Project 2 (Lock-Free Market Data Handler) │ │
│ │ Enhancements: │ │
│ │ - Connect to exchange simulator │ │
│ │ - Normalize data format │ │
│ │ - Lock-free publishing to strategy │ │
│ │ - Timestamp injection for latency measurement │ │
│ └──────────────────────────────────────────────────────────────────────────────────┘ │
│ │
│ COMPONENT 3: TRADING GATEWAY │
│ ┌──────────────────────────────────────────────────────────────────────────────────┐ │
│ │ New Component (uses patterns from Project 3) │ │
│ │ Features: │ │
│ │ - Order validation and enrichment │ │
│ │ - Session management with exchange │ │
│ │ - Wire protocol (binary order format) │ │
│ │ - Fill handling and position updates │ │
│ │ - Connection resilience (reconnection, failover) │ │
│ └──────────────────────────────────────────────────────────────────────────────────┘ │
│ │
│ COMPONENT 4: TRADING STRATEGY ENGINE │
│ ┌──────────────────────────────────────────────────────────────────────────────────┐ │
│ │ Reuse: Project 4 (Backtester) strategy interface │ │
│ │ Enhancements: │ │
│ │ - Live market data consumption │ │
│ │ - Order book reconstruction │ │
│ │ - Signal generation │ │
│ │ - Order management (working orders tracking) │ │
│ │ - At least one simple strategy (mean reversion, momentum) │ │
│ └──────────────────────────────────────────────────────────────────────────────────┘ │
│ │
│ COMPONENT 5: RISK MANAGEMENT LAYER │
│ ┌──────────────────────────────────────────────────────────────────────────────────┐ │
│ │ New Component │ │
│ │ Features: │ │
│ │ - Pre-trade risk checks (before order sent) │ │
│ │ - Position limits per symbol and total │ │
│ │ - Order rate limiting │ │
│ │ - P&L limits (stop trading if loss exceeds threshold) │ │
│ │ - Circuit breakers │ │
│ │ - Real-time position and P&L calculation │ │
│ └──────────────────────────────────────────────────────────────────────────────────┘ │
│ │
│ COMPONENT 6: PERFORMANCE MONITORING DASHBOARD │
│ ┌──────────────────────────────────────────────────────────────────────────────────┐ │
│ │ New Component │ │
│ │ Features: │ │
│ │ - Real-time latency percentiles (p50, p95, p99, p999) │ │
│ │ - Orders per second, fills per second │ │
│ │ - Position and P&L display │ │
│ │ - Risk status indicators │ │
│ │ - Component health monitoring │ │
│ │ - Alert generation │ │
│ └──────────────────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────────────────┘
3.2 Functional Requirements
Per-Component Requirements
Component 1: Exchange Simulator
| ID | Requirement | Priority |
|---|---|---|
| EX-1 | Accept order submissions via TCP (binary protocol) | Must Have |
| EX-2 | Maintain order book per symbol (reuse P01 code) | Must Have |
| EX-3 | Execute matching with price-time priority | Must Have |
| EX-4 | Publish trade and market data updates | Must Have |
| EX-5 | Generate synthetic market activity (background orders) | Should Have |
| EX-6 | Support configurable latency injection for testing | Nice to Have |
Component 2: Market Data Feed Handler
| ID | Requirement | Priority |
|---|---|---|
| MD-1 | Connect to exchange simulator market data port | Must Have |
| MD-2 | Parse binary market data protocol | Must Have |
| MD-3 | Normalize to internal format | Must Have |
| MD-4 | Publish via lock-free SPSC queue (reuse P02 code) | Must Have |
| MD-5 | Add receive timestamp for latency tracking | Must Have |
| MD-6 | Handle reconnection on disconnect | Should Have |
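A feed handler usually also watches the message sequence numbers so that lost updates are noticed before the strategy acts on a stale book. The following is a minimal sketch of that check, assuming each message carries a monotonically increasing `sequence_num` as in the MarketDataUpdate layout of section 4.3; `GapDetector` and `SeqCheck` are hypothetical names.

```rust
/// Result of checking one incoming sequence number.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum SeqCheck {
    InOrder,
    /// One or more messages were skipped; `missing` is how many.
    Gap { missing: u64 },
    /// The sequence number is older than expected (duplicate or late arrival).
    Stale,
}

pub struct GapDetector {
    next_expected: Option<u64>,
    /// Number of gap events seen so far (not the number of missing messages).
    pub gaps: u64,
}

impl GapDetector {
    pub fn new() -> Self {
        Self { next_expected: None, gaps: 0 }
    }

    pub fn observe(&mut self, seq: u64) -> SeqCheck {
        match self.next_expected {
            // The first message establishes the baseline.
            None => {
                self.next_expected = Some(seq + 1);
                SeqCheck::InOrder
            }
            Some(expected) if seq == expected => {
                self.next_expected = Some(seq + 1);
                SeqCheck::InOrder
            }
            Some(expected) if seq > expected => {
                self.gaps += 1;
                self.next_expected = Some(seq + 1);
                SeqCheck::Gap { missing: seq - expected }
            }
            Some(_) => SeqCheck::Stale,
        }
    }
}
```

On a `Gap`, a reasonable policy is to mark the affected order book as unreliable and request a snapshot; the `gaps` counter is the kind of value the dashboard could expose.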
Component 3: Trading Gateway
| ID | Requirement | Priority |
|---|---|---|
| GW-1 | Accept orders from strategy via internal interface | Must Have |
| GW-2 | Validate order parameters | Must Have |
| GW-3 | Serialize to wire protocol and send to exchange | Must Have |
| GW-4 | Handle fills and acknowledgments | Must Have |
| GW-5 | Update position state on fills | Must Have |
| GW-6 | Implement order timeout and retry logic | Should Have |
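Requirement GW-3 can be illustrated with a fixed-length little-endian encoding of a new-order message. The 28-byte layout below is an assumption made for this sketch, not a protocol defined by the project; the numeric codes (NEW=1, BUY=1, SELL=2) follow the OrderCommand table in section 4.3.

```rust
/// Length of the hypothetical new-order wire message.
pub const NEW_ORDER_LEN: usize = 28;
const MSG_NEW_ORDER: u8 = 1;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct NewOrder {
    pub side: u8,          // BUY=1, SELL=2
    pub order_type: u8,    // LIMIT=1, MARKET=2
    pub time_in_force: u8, // DAY=1, IOC=2, GTC=3
    pub symbol_id: u32,
    pub quantity: u32,
    pub order_id: u64,
    pub price: i64,
}

fn read_u32(b: &[u8]) -> u32 {
    u32::from_le_bytes([b[0], b[1], b[2], b[3]])
}

fn read_8(b: &[u8]) -> [u8; 8] {
    let mut a = [0u8; 8];
    a.copy_from_slice(&b[..8]);
    a
}

impl NewOrder {
    /// Writes the message into a caller-owned buffer; no allocation.
    pub fn encode(&self, buf: &mut [u8; NEW_ORDER_LEN]) {
        buf[0] = MSG_NEW_ORDER;
        buf[1] = self.side;
        buf[2] = self.order_type;
        buf[3] = self.time_in_force;
        buf[4..8].copy_from_slice(&self.symbol_id.to_le_bytes());
        buf[8..12].copy_from_slice(&self.quantity.to_le_bytes());
        buf[12..20].copy_from_slice(&self.order_id.to_le_bytes());
        buf[20..28].copy_from_slice(&self.price.to_le_bytes());
    }

    /// Returns None for a wrong length, wrong message type, or invalid side.
    pub fn decode(buf: &[u8]) -> Option<Self> {
        if buf.len() != NEW_ORDER_LEN || buf[0] != MSG_NEW_ORDER {
            return None;
        }
        if buf[1] != 1 && buf[1] != 2 {
            return None;
        }
        Some(Self {
            side: buf[1],
            order_type: buf[2],
            time_in_force: buf[3],
            symbol_id: read_u32(&buf[4..8]),
            quantity: read_u32(&buf[8..12]),
            order_id: u64::from_le_bytes(read_8(&buf[12..20])),
            price: i64::from_le_bytes(read_8(&buf[20..28])),
        })
    }
}
```

The gateway would call `encode` into a reusable buffer before writing to the socket, and the exchange simulator would call `decode` on receipt. Fixing the byte order explicitly keeps the format independent of the host CPU.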
Component 4: Trading Strategy Engine
| ID | Requirement | Priority |
|---|---|---|
| ST-1 | Consume market data from feed handler queue | Must Have |
| ST-2 | Reconstruct order book from updates | Must Have |
| ST-3 | Generate trading signals based on simple strategy | Must Have |
| ST-4 | Submit orders via risk layer | Must Have |
| ST-5 | Track working orders (sent but not filled) | Must Have |
| ST-6 | Implement at least one quantitative strategy | Should Have |
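For ST-3 and ST-6, a mean-reversion signal is one of the simpler strategies to start with. The sketch below computes a rolling z-score over a fixed window and signals when the latest price deviates by more than a threshold, in the spirit of the `window=100, threshold=2.0 sigma` parameters shown in the example startup log. The `MeanReversion` type is hypothetical, and using `f64` prices inside the strategy is an assumption; the message formats carry fixed-point integers.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Signal {
    Buy,
    Sell,
    Hold,
}

/// Rolling-window z-score signal. The window buffer is allocated once in `new`.
pub struct MeanReversion {
    window: Vec<f64>,
    next: usize,
    filled: usize,
    sum: f64,
    sum_sq: f64,
    threshold: f64,
}

impl MeanReversion {
    pub fn new(window: usize, threshold: f64) -> Self {
        assert!(window >= 2, "window must hold at least two prices");
        Self { window: vec![0.0; window], next: 0, filled: 0, sum: 0.0, sum_sq: 0.0, threshold }
    }

    /// Scores `price` against the window of earlier prices, then adds it to the window.
    pub fn on_price(&mut self, price: f64) -> Signal {
        let mut signal = Signal::Hold;
        if self.filled == self.window.len() {
            let n = self.filled as f64;
            let mean = self.sum / n;
            let variance = (self.sum_sq / n - mean * mean).max(0.0);
            let std_dev = variance.sqrt();
            if std_dev > 0.0 {
                let z = (price - mean) / std_dev;
                if z > self.threshold {
                    signal = Signal::Sell; // price is unusually high: expect reversion down
                } else if z < -self.threshold {
                    signal = Signal::Buy; // price is unusually low: expect reversion up
                }
            }
        }
        // Slide the window: evict the oldest price once the buffer is full.
        if self.filled == self.window.len() {
            let old = self.window[self.next];
            self.sum -= old;
            self.sum_sq -= old * old;
        } else {
            self.filled += 1;
        }
        self.window[self.next] = price;
        self.sum += price;
        self.sum_sq += price * price;
        self.next = (self.next + 1) % self.window.len();
        signal
    }
}
```

No signal is produced until the window is full. The running sum and sum of squares can lose precision over long runs with large prices; Welford's online algorithm or periodic recomputation are common remedies.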
Component 5: Risk Management Layer
| ID | Requirement | Priority |
|---|---|---|
| RK-1 | Check position limits before order submission | Must Have |
| RK-2 | Enforce order rate limits | Must Have |
| RK-3 | Track real-time P&L | Must Have |
| RK-4 | Stop trading when P&L limit breached | Must Have |
| RK-5 | Reject orders that violate risk parameters | Must Have |
| RK-6 | Implement circuit breaker on repeated failures | Should Have |
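Requirements RK-1, RK-4, and RK-5 can be sketched as a single pre-trade check that runs before an order reaches the gateway. The version below is deliberately simplified: it tracks one symbol, ignores working orders, and uses hypothetical names (`RiskManager`, `RiskLimits`, `RiskDecision`). Quantities are signed (positive for buys) and P&L is in cents.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum RiskDecision {
    Approved,
    RejectedPositionLimit,
    RejectedLossLimit,
    RejectedTradingHalted,
}

pub struct RiskLimits {
    /// Maximum absolute position, in shares.
    pub max_position: i64,
    /// Maximum tolerated loss, as a positive number of cents.
    pub max_loss: i64,
}

pub struct RiskManager {
    limits: RiskLimits,
    position: i64,
    pnl: i64,
    halted: bool,
}

impl RiskManager {
    pub fn new(limits: RiskLimits) -> Self {
        Self { limits, position: 0, pnl: 0, halted: false }
    }

    /// Pre-trade check for an order of `signed_qty` shares.
    pub fn check_order(&mut self, signed_qty: i64) -> RiskDecision {
        if self.halted {
            return RiskDecision::RejectedTradingHalted;
        }
        if self.pnl <= -self.limits.max_loss {
            // Breaching the loss limit latches the halt until someone resets it.
            self.halted = true;
            return RiskDecision::RejectedLossLimit;
        }
        let projected = self.position + signed_qty;
        if projected.abs() > self.limits.max_position {
            return RiskDecision::RejectedPositionLimit;
        }
        RiskDecision::Approved
    }

    /// Called by the gateway when a fill arrives.
    pub fn on_fill(&mut self, signed_qty: i64) {
        self.position += signed_qty;
    }

    /// Called whenever total P&L (realized plus unrealized) is recomputed.
    pub fn update_pnl(&mut self, pnl_cents: i64) {
        self.pnl = pnl_cents;
    }
}
```

With the limits from the example startup log (`max_position=1000`, `max_loss=$5000`), a position of +500 allows a further buy of 50 but rejects a buy of 600. A fuller version would include open order quantity in the projection and would still allow orders that reduce the position after a halt.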
Component 6: Performance Monitoring Dashboard
| ID | Requirement | Priority |
|---|---|---|
| MO-1 | Display orders per second | Must Have |
| MO-2 | Display latency percentiles (p50, p95, p99) | Must Have |
| MO-3 | Display current position per symbol | Must Have |
| MO-4 | Display realized and unrealized P&L | Must Have |
| MO-5 | Display risk status (limits, breaches) | Must Have |
| MO-6 | Update in real-time (sub-second refresh) | Should Have |
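One low-overhead way to feed the dashboard is for the hot path to bump atomic counters and for the monitoring thread to take periodic snapshots and derive rates from the differences. The sketch below assumes that split; `Metrics`, `Snapshot`, and `orders_per_sec` are hypothetical names, and the counters shown are a subset of what a full system would track.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Counters written by trading threads and read by the monitoring thread.
#[derive(Default)]
pub struct Metrics {
    pub orders_sent: AtomicU64,
    pub orders_filled: AtomicU64,
    pub orders_rejected: AtomicU64,
}

/// Point-in-time copy of the counters, tagged with a monotonic timestamp.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct Snapshot {
    pub orders_sent: u64,
    pub orders_filled: u64,
    pub orders_rejected: u64,
    pub at_nanos: u64,
}

impl Metrics {
    /// Hot-path increment. Relaxed ordering suffices for a statistics counter.
    pub fn record_order_sent(&self) {
        self.orders_sent.fetch_add(1, Ordering::Relaxed);
    }

    pub fn record_fill(&self) {
        self.orders_filled.fetch_add(1, Ordering::Relaxed);
    }

    /// Called by the monitoring thread; counters are read individually,
    /// so the snapshot is approximate rather than atomic as a whole.
    pub fn snapshot(&self, now_nanos: u64) -> Snapshot {
        Snapshot {
            orders_sent: self.orders_sent.load(Ordering::Relaxed),
            orders_filled: self.orders_filled.load(Ordering::Relaxed),
            orders_rejected: self.orders_rejected.load(Ordering::Relaxed),
            at_nanos: now_nanos,
        }
    }
}

/// Orders per second between two snapshots; 0 if no time has elapsed.
pub fn orders_per_sec(prev: &Snapshot, cur: &Snapshot) -> u64 {
    let dt = cur.at_nanos.saturating_sub(prev.at_nanos);
    if dt == 0 {
        return 0;
    }
    let delta = cur.orders_sent.saturating_sub(prev.orders_sent);
    ((delta as u128 * 1_000_000_000u128) / dt as u128) as u64
}
```

Taking a snapshot once per refresh interval satisfies MO-6 without adding locks to the trading threads. If several threads write counters at high rates, placing each counter on its own cache line may reduce contention.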
3.3 Non-Functional Requirements
| Requirement | Target | Measurement |
|---|---|---|
| End-to-end latency (p50) | < 50 microseconds | Market data arrival to order sent |
| End-to-end latency (p99) | < 200 microseconds | Same measurement |
| Throughput | > 10,000 orders/second | Sustained for 60 seconds |
| Memory stability | No growth over time | Monitor RSS over 1 hour run |
| Recovery time | < 1 second | From component crash to resumed operation |
| Dashboard update rate | > 1 Hz | Visual refresh frequency |
| Zero allocation in hot path | 0 mallocs | Profile during steady state |
3.4 Example Usage / Output
System Startup
$ ./hft_system --config system.toml
[2024-06-15 09:30:00.000] [SYSTEM] HFT Trading System v1.0.0
[2024-06-15 09:30:00.001] [SYSTEM] Loading configuration: system.toml
[2024-06-15 09:30:00.002] [SYSTEM] Risk limits: max_position=1000, max_loss=$5000
[2024-06-15 09:30:00.003] [SYSTEM] Symbols: AAPL, MSFT, GOOG, AMZN
[2024-06-15 09:30:00.005] [EXCH ] Exchange simulator starting on port 9000
[2024-06-15 09:30:00.006] [EXCH ] Order book initialized for 4 symbols
[2024-06-15 09:30:00.010] [MD ] Feed handler connecting to localhost:9000
[2024-06-15 09:30:00.011] [MD ] Connected, receiving market data
[2024-06-15 09:30:00.015] [GW ] Gateway connecting to localhost:9000
[2024-06-15 09:30:00.016] [GW ] Connected, ready to send orders
[2024-06-15 09:30:00.020] [STRAT ] Strategy engine starting: MeanReversion
[2024-06-15 09:30:00.021] [STRAT ] Parameters: window=100, threshold=2.0 sigma
[2024-06-15 09:30:00.025] [RISK ] Risk manager initialized
[2024-06-15 09:30:00.030] [DASH ] Dashboard available at http://localhost:8080
[2024-06-15 09:30:00.031] [SYSTEM] All components ready. Trading enabled.
Live Operation (Dashboard View)
╔══════════════════════════════════════════════════════════════════════════════════════╗
║ HFT TRADING SYSTEM DASHBOARD ║
╠══════════════════════════════════════════════════════════════════════════════════════╣
║ ║
║ THROUGHPUT LATENCY (microseconds) ║
║ ┌─────────────────────────┐ ┌─────────────────────────┐ ║
║ │ Orders/sec: 12,456 │ │ p50: 42 │ ║
║ │ Fills/sec: 3,234 │ │ p95: 89 │ ║
║ │ Cancels/sec: 9,222 │ │ p99: 156 │ ║
║ │ Rejects/sec: 0 │ │ p999: 312 │ ║
║ └─────────────────────────┘ └─────────────────────────┘ ║
║ ║
║ POSITIONS ║
║ ┌───────────────────────────────────────────────────────────────────────────┐ ║
║ │ Symbol │ Position │ Avg Price │ Market │ Unrealized │ Realized │ Total │ ║
║ ├───────────────────────────────────────────────────────────────────────────┤ ║
║ │ AAPL │ +500 │ 150.25 │ 150.30 │ +$25.00 │ +$45.00 │ +$70.00 │ ║
║ │ MSFT │ -200 │ 380.10 │ 379.90 │ +$40.00 │ +$12.00 │ +$52.00 │ ║
║ │ GOOG │ +100 │ 140.50 │ 140.25 │ -$25.00 │ +$8.00 │ -$17.00 │ ║
║ │ AMZN │ 0 │ 0.00 │ 178.45 │ $0.00 │ +$23.00 │ +$23.00 │ ║
║ ├───────────────────────────────────────────────────────────────────────────┤ ║
║ │ TOTAL │ │ │ │ +$40.00 │ +$88.00 │+$128.00 │ ║
║ └───────────────────────────────────────────────────────────────────────────┘ ║
║ ║
║ RISK STATUS ║
║ ┌───────────────────────────────────────────────────────────────────────────┐ ║
║ │ Position Limit: ████████░░░░░░░░░░░░ 40% (400/1000) [OK] │ ║
║ │ Order Rate: ████████████░░░░░░░░ 62% (12456/20000 per sec) [OK] │ ║
║ │ Daily P&L: ████████████████████ +$128 / -$5000 limit [OK] │ ║
║ │ Circuit Breaker: [ARMED] │ ║
║ └───────────────────────────────────────────────────────────────────────────┘ ║
║ ║
║ RECENT TRADES ║
║ ┌───────────────────────────────────────────────────────────────────────────┐ ║
║ │ 09:31:45.234 │ AAPL │ BUY │ 50 │ 150.30 │ +$0.25/sh │ Latency: 38us │ ║
║ │ 09:31:45.189 │ MSFT │ SELL │ 25 │ 379.90 │ +$0.20/sh │ Latency: 45us │ ║
║ │ 09:31:45.156 │ AAPL │ BUY │ 100 │ 150.28 │ +$0.03/sh │ Latency: 41us │ ║
║ │ 09:31:45.098 │ GOOG │ BUY │ 50 │ 140.25 │ -$0.25/sh │ Latency: 52us │ ║
║ │ 09:31:44.987 │ AAPL │ SELL │ 75 │ 150.31 │ +$0.06/sh │ Latency: 39us │ ║
║ └───────────────────────────────────────────────────────────────────────────┘ ║
║ ║
║ [P] Pause Trading [R] Resume [F] Flatten All [Q] Quit ║
╚══════════════════════════════════════════════════════════════════════════════════════╝
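The position rows in the dashboard above follow from average-cost accounting. The sketch below shows one way to compute them, assuming prices are held as integer cents and quantities are signed (positive for long); the `Position` methods here are hypothetical and simplified relative to the Position record in section 4.3.

```rust
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct Position {
    pub quantity: i64,     // signed: +long / -short
    pub avg_price: i64,    // average entry price in cents
    pub realized_pnl: i64, // cents
}

impl Position {
    /// Applies a fill of `signed_qty` shares (positive = buy) at `price` cents.
    pub fn apply_fill(&mut self, signed_qty: i64, price: i64) {
        if signed_qty == 0 {
            return;
        }
        let same_direction = self.quantity == 0 || (self.quantity > 0) == (signed_qty > 0);
        let new_qty = self.quantity + signed_qty;
        if same_direction {
            // Opening or adding: volume-weighted average (integer division truncates).
            let cost = self.avg_price * self.quantity.abs() + price * signed_qty.abs();
            self.avg_price = cost / new_qty.abs();
        } else {
            // Reducing, closing, or flipping: realize P&L on the closed quantity.
            let direction = self.quantity.signum();
            let closed = signed_qty.abs().min(self.quantity.abs());
            self.realized_pnl += (price - self.avg_price) * closed * direction;
            if new_qty == 0 {
                self.avg_price = 0;
            } else if new_qty.signum() != direction {
                // Flipped through zero: the remainder opens at the fill price.
                self.avg_price = price;
            }
        }
        self.quantity = new_qty;
    }

    /// Mark-to-market P&L in cents at the given market price.
    pub fn unrealized_pnl(&self, market_price: i64) -> i64 {
        (market_price - self.avg_price) * self.quantity
    }
}
```

For the AAPL row, a long position of 500 at 150.25 marked at 150.30 gives 5 cents times 500 shares, or $25.00. For the MSFT row, a short position of 200 at 380.10 marked at 379.90 gives $40.00, since a short gains when the price falls.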
Log Output
[09:31:45.234000] [STRAT ] Signal: BUY AAPL (price=150.30, signal_strength=2.3)
[09:31:45.234001] [RISK ] Order check: BUY 50 AAPL @ MKT -> APPROVED
[09:31:45.234004] [GW ] Order sent: id=12456 BUY 50 AAPL @ MKT
[09:31:45.234038] [GW ] Fill: id=12456 50 @ 150.30
[09:31:45.234038] [RISK ] Position update: AAPL +500 (limit: 1000)
[09:31:45.234038] [RISK ] P&L update: +$128.00 (limit: -$5000)
[09:31:45.234039] [STRAT ] Order 12456 filled, updating state
[09:31:45.234039] [PERF ] Trade latency: 38us (signal to fill)
3.5 Real World Outcome
When complete, you will have:
- A complete, working trading system that you can actually use for paper trading
- End-to-end latency measurements showing exactly where time is spent
- Real-time P&L tracking with position management
- Risk controls that prevent runaway losses
- A monitoring dashboard that visualizes system health
- Portfolio-ready demonstration of systems programming expertise
- Interview talking points for every aspect of trading system design
4. Solution Architecture
4.1 High-Level Design
FULL SYSTEM DATA FLOW ARCHITECTURE
==================================
┌─────────────────────────────────────┐
│ EXCHANGE SIMULATOR │
│ │
│ ┌───────────────┐ │
┌────────────►│ │ Matching │◄────────────┐ │
│ │ │ Engine │ │ │
│ │ └───────────────┘ │ │
│ │ │ │ │
│ │ ▼ Market Data │ │
│ │ ┌───────────────┐ │ │
│ │ │ MD Publisher │ │ │
│ │ └───────┬───────┘ │ │
│ │ │ │ │
│ └───────────┼─────────────────────┘ │
│ │ │
│ TCP Port 9001 │ TCP Port 9000 │
│ │ │
│ ▼ │
┌─────────────────────┴─────────────────────────────────────────────────────────────────┐
│ │
│ TRADING SYSTEM PROCESS │
│ │
│ ┌────────────────────────────────────────────────────────────────────────────────┐ │
│ │ NETWORK I/O THREAD (Core 0) │ │
│ │ ┌──────────────────────────────────────────────────────────────────────────┐ │ │
│ │ │ Event Loop (epoll/io_uring) │ │ │
│ │ │ │ │ │
│ │ │ ┌─────────────────────┐ ┌─────────────────────┐ │ │ │
│ │ │ │ Market Data Socket │ │ Order Socket │ │ │ │
│ │ │ │ (Read from Exch) │ │ (Write to Exch) │ │ │ │
│ │ │ └──────────┬──────────┘ └──────────▲──────────┘ │ │ │
│ │ │ │ │ │ │ │
│ │ │ │ Raw bytes │ Order bytes │ │ │
│ │ │ ▼ │ │ │ │
│ │ │ ┌─────────────────────┐ ┌─────────┴──────────┐ │ │ │
│ │ │ │ Protocol Parser │ │ Protocol Serializer│ │ │ │
│ │ │ │ (FEED HANDLER) │ │ (GATEWAY) │ │ │ │
│ │ │ └──────────┬──────────┘ └──────────▲─────────┘ │ │ │
│ │ │ │ │ │ │ │
│ │ │ │ Parsed updates │ Order commands │ │ │
│ │ │ ▼ │ │ │ │
│ │ │ ┌────────────────────────────────────────────┴──────────────────────┐ │ │ │
│ │ │ │ LOCK-FREE SPSC QUEUES (inter-thread communication) │ │ │ │
│ │ │ │ │ │ │ │
│ │ │ │ Market Data Queue Order Command Queue │ │ │ │
│ │ │ │ ┌─────────────┐ ┌─────────────┐ │ │ │ │
│ │ │ │ │ █ █ █ ░ ░ ░ │ │ ░ ░ █ █ ░ ░ │ │ │ │ │
│ │ │ │ └──────┬──────┘ └──────▲──────┘ │ │ │ │
│ │ │ │ │ │ │ │ │ │
│ │ │ └──────────┼────────────────────────┼───────────────────────────────┘ │ │ │
│ │ └─────────────┼────────────────────────┼──────────────────────────────────┘ │ │
│ │ │ │ │ │
│ │ │ │ │ │
│ │ ┌─────────────┼────────────────────────┼──────────────────────────────────┐ │ │
│ │ │ STRATEGY THREAD (Core 1) │ │ │ │
│ │ │ │ │ │ │ │
│ │ │ ▼ │ │ │ │
│ │ │ ┌─────────────────────┐ │ │ │ │
│ │ │ │ Order Book │ │ │ │ │
│ │ │ │ Reconstruction │ │ │ │ │
│ │ │ └──────────┬──────────┘ │ │ │ │
│ │ │ │ │ │ │ │
│ │ │ ▼ │ │ │ │
│ │ │ ┌─────────────────────┐ │ │ │ │
│ │ │ │ STRATEGY ENGINE │ │ │ │ │
│ │ │ │ - Signal generation │ │ │ │ │
│ │ │ │ - Order management │ │ │ │ │
│ │ │ └──────────┬──────────┘ │ │ │ │
│ │ │ │ │ │ │ │
│ │ │ │ Order request │ │ │ │
│ │ │ ▼ │ │ │ │
│ │ │ ┌─────────────────────┐ │ │ │ │
│ │ │ │ RISK MANAGER │─────────────┘ │ │ │
│ │ │ │ - Pre-trade checks │ (writes to order queue if approved) │ │ │
│ │ │ │ - Position tracking │ │ │ │
│ │ │ │ - P&L calculation │ │ │ │
│ │ │ └─────────────────────┘ │ │ │
│ │ │ │ │ │
│ │ └───────────────────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ ┌───────────────────────────────────────────────────────────────────────────┐ │ │
│ │ │ MONITORING THREAD (Core 2 or shared) │ │ │
│ │ │ │ │ │
│ │ │ ┌─────────────────────────────────────────────────────────────────────┐ │ │ │
│ │ │ │ DASHBOARD │ │ │ │
│ │ │ │ - Reads metrics from lock-free counters │ │ │ │
│ │ │ │ - Computes latency percentiles │ │ │ │
│ │ │ │ - Renders TUI or serves HTTP │ │ │ │
│ │ │ └─────────────────────────────────────────────────────────────────────┘ │ │ │
│ │ │ │ │ │
│ │ └────────────────────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ └────────────────────────────────────────────────────────────────────────────────┘ │
│ │
└────────────────────────────────────────────────────────────────────────────────────────┘
4.2 Key Components
| Component | Responsibility | Thread | Core Affinity |
|---|---|---|---|
| Network I/O | Socket reads/writes, protocol parsing | Dedicated | Core 0 (isolated) |
| Feed Handler | Market data parsing, normalization | Part of Network I/O | Core 0 |
| Strategy Engine | Signal generation, order decisions | Dedicated | Core 1 (isolated) |
| Risk Manager | Pre-trade checks, position tracking | Part of Strategy | Core 1 |
| Gateway | Order serialization, fill handling | Part of Network I/O | Core 0 |
| Dashboard | Metrics aggregation, display | Shared | Any available core |
| Exchange Sim | Order matching and market data publication (separate process) | Own threads | Own cores |
4.3 Data Structures
CROSS-COMPONENT MESSAGE TYPES
=============================
MarketDataUpdate (Published by Feed Handler, Consumed by Strategy)
┌──────────────────────────────────────────────────────────────────────────┐
│ Field │ Type │ Size │ Description │
├──────────────────────────────────────────────────────────────────────────┤
│ symbol_id │ u32 │ 4 │ Internal symbol identifier │
│ update_type │ u8 │ 1 │ TRADE=1, BBO=2, BOOK=3 │
│ bid_price │ i64 │ 8 │ Best bid (basis points) │
│ bid_qty │ u32 │ 4 │ Best bid quantity │
│ ask_price │ i64 │ 8 │ Best ask (basis points) │
│ ask_qty │ u32 │ 4 │ Best ask quantity │
│ last_price │ i64 │ 8 │ Last trade price │
│ last_qty │ u32 │ 4 │ Last trade quantity │
│ exchange_time │ u64 │ 8 │ Exchange timestamp (nanos) │
│ receive_time │ u64 │ 8 │ Local receive timestamp (nanos) │
│ sequence_num │ u64 │ 8 │ Sequence for gap detection │
├──────────────────────────────────────────────────────────────────────────┤
│ Total │ │ 65 │ (padded to 72 for alignment) │
└──────────────────────────────────────────────────────────────────────────┘
OrderCommand (Published by Strategy, Consumed by Gateway)
┌──────────────────────────────────────────────────────────────────────────┐
│ Field │ Type │ Size │ Description │
├──────────────────────────────────────────────────────────────────────────┤
│ command_type │ u8 │ 1 │ NEW=1, CANCEL=2, REPLACE=3 │
│ symbol_id │ u32 │ 4 │ Internal symbol identifier │
│ order_id │ u64 │ 8 │ Client order ID │
│ side │ u8 │ 1 │ BUY=1, SELL=2 │
│ order_type │ u8 │ 1 │ LIMIT=1, MARKET=2 │
│ price │ i64 │ 8 │ Limit price (basis points) │
│ quantity │ u32 │ 4 │ Order quantity │
│ time_in_force │ u8 │ 1 │ DAY=1, IOC=2, GTC=3 │
│ strategy_time │ u64 │ 8 │ Decision timestamp (nanos) │
│ padding │ - │ 4 │ Alignment padding │
├──────────────────────────────────────────────────────────────────────────┤
│ Total │ │ 40 │ (fits in cache line) │
└──────────────────────────────────────────────────────────────────────────┘
ExecutionReport (From Gateway to Strategy/Risk)
┌──────────────────────────────────────────────────────────────────────────┐
│ Field │ Type │ Size │ Description │
├──────────────────────────────────────────────────────────────────────────┤
│ report_type │ u8 │ 1 │ ACK=1, FILL=2, CANCEL=3, REJ=4 │
│ order_id │ u64 │ 8 │ Client order ID │
│ exec_id │ u64 │ 8 │ Execution ID (from exchange) │
│ symbol_id │ u32 │ 4 │ Symbol identifier │
│ side │ u8 │ 1 │ BUY=1, SELL=2 │
│ exec_price │ i64 │ 8 │ Execution price │
│ exec_qty │ u32 │ 4 │ Executed quantity │
│ leaves_qty │ u32 │ 4 │ Remaining quantity │
│ order_status │ u8 │ 1 │ NEW, PARTIAL, FILLED, CANCELED │
│ reject_reason │ u8 │ 1 │ If rejected, why │
│ exchange_time │ u64 │ 8 │ Exchange timestamp │
│ receive_time │ u64 │ 8 │ Local receive timestamp │
├──────────────────────────────────────────────────────────────────────────┤
│ Total │ │ 56 │ (padded to 64 for alignment) │
└──────────────────────────────────────────────────────────────────────────┘
Position (Maintained by Risk Manager)
┌──────────────────────────────────────────────────────────────────────────┐
│ Field │ Type │ Size │ Description │
├──────────────────────────────────────────────────────────────────────────┤
│ symbol_id │ u32 │ 4 │ Symbol identifier │
│ quantity │ i64 │ 8 │ Signed quantity (+long/-short) │
│ avg_price │ i64 │ 8 │ Volume-weighted average price │
│ realized_pnl │ i64 │ 8 │ Realized P&L (basis points) │
│ unrealized_pnl │ i64 │ 8 │ Mark-to-market P&L │
│ last_update_time │ u64 │ 8 │ Last update timestamp │
├──────────────────────────────────────────────────────────────────────────┤
│ Total │ │ 44 │ (padded to 48) │
└──────────────────────────────────────────────────────────────────────────┘
Metrics (Lock-free counters for Dashboard)
┌──────────────────────────────────────────────────────────────────────────┐
│ Field │ Type │ Description │
├──────────────────────────────────────────────────────────────────────────┤
│ orders_sent │ AtomicU64 │ Total orders submitted │
│ orders_filled │ AtomicU64 │ Total orders filled │
│ orders_rejected │ AtomicU64 │ Total orders rejected │
│ shares_traded │ AtomicU64 │ Total shares traded │
│ latency_samples │ LockFreeHist │ Latency histogram (HDR Histogram) │
│ last_latency_us │ AtomicU64 │ Most recent latency │
│ market_data_msgs │ AtomicU64 │ Market data messages received │
│ md_sequence_gaps │ AtomicU64 │ Detected sequence gaps │
└──────────────────────────────────────────────────────────────────────────┘
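A subset of these counters can be sketched with plain `AtomicU64` fields. The example assumes `Relaxed` ordering is sufficient because each counter is an independent statistic read by the dashboard; the method names and the `simulate_fills` driver are hypothetical and only exist to show several writer threads sharing one structure.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

#[derive(Debug, Default)]
pub struct Metrics {
    pub orders_sent: AtomicU64,
    pub orders_filled: AtomicU64,
    pub orders_rejected: AtomicU64,
    pub shares_traded: AtomicU64,
    pub last_latency_us: AtomicU64,
}

impl Metrics {
    pub fn on_order_sent(&self) {
        self.orders_sent.fetch_add(1, Ordering::Relaxed);
    }

    pub fn on_fill(&self, shares: u64, latency_us: u64) {
        self.orders_filled.fetch_add(1, Ordering::Relaxed);
        self.shares_traded.fetch_add(shares, Ordering::Relaxed);
        self.last_latency_us.store(latency_us, Ordering::Relaxed);
    }
}

/// Hypothetical driver: several writer threads update one shared `Metrics`.
pub fn simulate_fills(threads: usize, fills_per_thread: u64, shares: u64) -> Arc<Metrics> {
    let metrics = Arc::new(Metrics::default());
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let m = Arc::clone(&metrics);
            thread::spawn(move || {
                for _ in 0..fills_per_thread {
                    m.on_order_sent();
                    m.on_fill(shares, 42);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().expect("writer thread panicked");
    }
    metrics
}
```

With `Relaxed` ordering a dashboard read may briefly see `orders_filled` updated before `shares_traded`; that is usually acceptable for display, but not for anything the risk manager depends on.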
4.4 Algorithm Overview: System Orchestration
SYSTEM STARTUP SEQUENCE
=======================
Phase 1: Configuration Loading
┌─────────────────────────────────────────────────────────────────────────┐
│ 1. Parse command line arguments │
│ 2. Load TOML/YAML configuration file │
│ 3. Validate configuration (symbols, limits, addresses) │
│ 4. Initialize logging subsystem │
└─────────────────────────────────────────────────────────────────────────┘
Phase 2: Infrastructure Setup
┌─────────────────────────────────────────────────────────────────────────┐
│ 1. Pre-allocate memory pools (orders, messages) │
│ 2. Create lock-free queues with appropriate capacity │
│ 3. Initialize shared metrics structures │
│ 4. Set up signal handlers (SIGINT, SIGTERM) │
└─────────────────────────────────────────────────────────────────────────┘
Phase 3: Component Initialization
┌─────────────────────────────────────────────────────────────────────────┐
│ 1. Start exchange simulator (if enabled) │
│ 2. Initialize order books for all symbols │
│ 3. Initialize risk manager with limits │
│ 4. Initialize strategy with parameters │
│ 5. Start monitoring dashboard │
└─────────────────────────────────────────────────────────────────────────┘
Phase 4: Connection Establishment
┌─────────────────────────────────────────────────────────────────────────┐
│ 1. Connect feed handler to market data source │
│ 2. Wait for initial order book snapshot │
│ 3. Connect gateway to order entry │
│ 4. Authenticate sessions │
│ 5. Verify connectivity with heartbeats │
└─────────────────────────────────────────────────────────────────────────┘
Phase 5: Trading Enable
┌─────────────────────────────────────────────────────────────────────────┐
│ 1. Enable strategy to generate signals │
│ 2. Enable gateway to send orders │
│ 3. Start latency measurement │
│ 4. Begin dashboard updates │
│ 5. Log "TRADING ENABLED" │
└─────────────────────────────────────────────────────────────────────────┘
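The five phases above can be driven by a small loop that stops at the first failure, so trading is never enabled on a partially initialized system. This is a sketch only: the `Phase` enum, `StartupError`, and `run_startup` are hypothetical names, and the real work of each phase is passed in as a closure.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Phase {
    ConfigLoading,
    InfrastructureSetup,
    ComponentInit,
    ConnectionEstablishment,
    TradingEnable,
}

pub const STARTUP_ORDER: [Phase; 5] = [
    Phase::ConfigLoading,
    Phase::InfrastructureSetup,
    Phase::ComponentInit,
    Phase::ConnectionEstablishment,
    Phase::TradingEnable,
];

#[derive(Debug, PartialEq, Eq)]
pub struct StartupError {
    pub failed: Phase,
    pub reason: String,
    /// Phases that already succeeded; tear these down in reverse order.
    pub completed: Vec<Phase>,
}

/// Runs the phases in order and aborts at the first one that fails.
pub fn run_startup<F>(mut run_phase: F) -> Result<(), StartupError>
where
    F: FnMut(Phase) -> Result<(), String>,
{
    let mut completed = Vec::new();
    for phase in STARTUP_ORDER {
        match run_phase(phase) {
            Ok(()) => completed.push(phase),
            Err(reason) => {
                return Err(StartupError { failed: phase, reason, completed });
            }
        }
    }
    Ok(())
}
```

Returning the list of completed phases gives the caller enough information to shut down connections and threads that were already started before reporting the error.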
MAIN EVENT LOOP (per thread)
============================
Network I/O Thread:
┌─────────────────────────────────────────────────────────────────────────┐
│ loop { │
│ events = epoll_wait(timeout=0); // Busy-poll: a blocking wait would delay queued orders │
│ │
│ for event in events { │
│ if event.fd == market_data_socket { │
│ bytes = recv(fd); │
│ for msg in parse_market_data(bytes) { │
│ msg.receive_time = now_nanos(); │
│ md_queue.push(msg); // Lock-free │
│ } │
│ } │
│ │
│ if event.fd == order_socket { │
│ bytes = recv(fd); │
│ for exec in parse_executions(bytes) { │
│ exec.receive_time = now_nanos(); │
│ exec_queue.push(exec); // Lock-free │
│ } │
│ } │
│ } │
│ │
│ // Send any pending orders │
│ while let Some(order) = order_queue.pop() { │
│ bytes = serialize_order(order); │
│ send(order_socket, bytes); │
│ } │
│ } │
└─────────────────────────────────────────────────────────────────────────┘
Strategy Thread:
┌─────────────────────────────────────────────────────────────────────────┐
│ loop { │
│ // Process market data │
│ while let Some(md) = md_queue.pop() { │
│ order_book.update(md); │
│ strategy.on_market_data(md, &order_book); │
│ } │
│ │
│ // Process execution reports │
│ while let Some(exec) = exec_queue.pop() { │
│ risk_manager.on_execution(exec); │
│ strategy.on_execution(exec); │
│ │
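│ // Measure latency. ExecutionReport has no strategy_time field, │
│ // so look up the send timestamp recorded for this order_id. │
│ if exec.report_type == FILL { │
│ sent_time = order_manager.strategy_time(exec.order_id); │
│ latency = exec.receive_time - sent_time; │
│ metrics.record_latency(latency); │
│ } │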
│ } │
│ │
│ // Generate orders if strategy has signals │
│ if let Some(signal) = strategy.get_signal() { │
│ if risk_manager.check(signal) == APPROVED { │
│ order = create_order(signal); │
│ order.strategy_time = now_nanos(); │
│ order_queue.push(order); │
│ } │
│ } │
│ } │
└─────────────────────────────────────────────────────────────────────────┘
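Since the ExecutionReport layout defined earlier carries no `strategy_time`, the order-to-fill latency has to be computed by remembering when each order left the strategy. One allocation-free way to do that is a fixed-size table indexed by order ID. The sketch assumes order IDs are nonzero and assigned sequentially, and that fewer than `N` orders are in flight at once; `InFlightTimes` is a hypothetical name.

```rust
/// Fixed-size table mapping order_id -> strategy timestamp (nanoseconds).
/// Slot 0 values mean "empty"; a newer order overwrites an older one that
/// maps to the same slot.
pub struct InFlightTimes<const N: usize> {
    slots: [(u64, u64); N], // (order_id, strategy_time_ns)
}

impl<const N: usize> InFlightTimes<N> {
    pub fn new() -> Self {
        Self { slots: [(0, 0); N] }
    }

    /// Called when the strategy pushes an order onto the order queue.
    pub fn on_order_sent(&mut self, order_id: u64, strategy_time_ns: u64) {
        self.slots[(order_id as usize) % N] = (order_id, strategy_time_ns);
    }

    /// Called for each fill; returns the latency if the order is still tracked.
    /// The entry is kept so that later partial fills can also be measured.
    pub fn on_fill(&self, order_id: u64, receive_time_ns: u64) -> Option<u64> {
        let (stored_id, sent_ns) = self.slots[(order_id as usize) % N];
        if stored_id != order_id {
            return None; // Unknown order, or its slot was reused
        }
        Some(receive_time_ns.saturating_sub(sent_ns))
    }
}
```

Both timestamps must come from the same monotonic clock in the same process for the subtraction to be meaningful.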
5. Implementation Guide
5.1 Development Environment Setup
# Linux (Ubuntu 22.04+) - Recommended
# System packages
sudo apt-get update
sudo apt-get install -y build-essential cmake ninja-build
sudo apt-get install -y liburing-dev # io_uring support
sudo apt-get install -y pkg-config libssl-dev
# C++ toolchain
sudo apt-get install -y g++-12 clang-15
sudo apt-get install -y libfmt-dev libspdlog-dev
# Rust toolchain
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source $HOME/.cargo/env
rustup default stable
rustup component add rust-analyzer clippy rustfmt
# Performance tools
sudo apt-get install -y linux-tools-common linux-tools-generic
cargo install flamegraph hyperfine
# Development convenience: rebuild and re-run on file changes
cargo install cargo-watch
# CPU isolation (optional, for production-like testing)
# Edit /etc/default/grub:
# GRUB_CMDLINE_LINUX="isolcpus=1,2,3 nohz_full=1,2,3"
# sudo update-grub && reboot
5.2 Project Structure
hft_system/
├── Cargo.toml # Workspace root
├── Cargo.lock
├── config/
│ ├── default.toml # Default configuration
│ ├── paper_trading.toml # Paper trading config
│ └── backtest.toml # Backtesting config
│
├── crates/
│ ├── common/ # Shared types and utilities
│ │ ├── Cargo.toml
│ │ └── src/
│ │ ├── lib.rs
│ │ ├── types.rs # Order, Trade, Position types
│ │ ├── protocol.rs # Wire protocol definitions
│ │ ├── config.rs # Configuration parsing
│ │ └── time.rs # High-precision timing
│ │
│ ├── orderbook/ # From Project 1
│ │ ├── Cargo.toml
│ │ └── src/
│ │ ├── lib.rs
│ │ ├── book.rs
│ │ └── level.rs
│ │
│ ├── lockfree/ # From Project 2
│ │ ├── Cargo.toml
│ │ └── src/
│ │ ├── lib.rs
│ │ ├── spsc_queue.rs
│ │ └── mpsc_queue.rs
│ │
│ ├── matching_engine/ # From Project 3
│ │ ├── Cargo.toml
│ │ └── src/
│ │ ├── lib.rs
│ │ ├── engine.rs
│ │ ├── network.rs
│ │ └── protocol.rs
│ │
│ ├── allocator/ # From Project 5
│ │ ├── Cargo.toml
│ │ └── src/
│ │ ├── lib.rs
│ │ ├── pool.rs
│ │ └── arena.rs
│ │
│ ├── feed_handler/ # NEW: Market data processing
│ │ ├── Cargo.toml
│ │ └── src/
│ │ ├── lib.rs
│ │ ├── parser.rs
│ │ ├── normalizer.rs
│ │ └── publisher.rs
│ │
│ ├── strategy/ # NEW: Trading strategies
│ │ ├── Cargo.toml
│ │ └── src/
│ │ ├── lib.rs
│ │ ├── engine.rs
│ │ ├── signals.rs
│ │ ├── strategies/
│ │ │ ├── mod.rs
│ │ │ ├── mean_reversion.rs
│ │ │ └── momentum.rs
│ │ └── order_manager.rs
│ │
│ ├── gateway/ # NEW: Order routing
│ │ ├── Cargo.toml
│ │ └── src/
│ │ ├── lib.rs
│ │ ├── connection.rs
│ │ ├── session.rs
│ │ └── serializer.rs
│ │
│ ├── risk/ # NEW: Risk management
│ │ ├── Cargo.toml
│ │ └── src/
│ │ ├── lib.rs
│ │ ├── position.rs
│ │ ├── limits.rs
│ │ ├── pnl.rs
│ │ └── circuit_breaker.rs
│ │
│ ├── dashboard/ # NEW: Monitoring
│ │ ├── Cargo.toml
│ │ └── src/
│ │ ├── lib.rs
│ │ ├── metrics.rs
│ │ ├── latency.rs
│ │ ├── tui.rs # Terminal UI
│ │ └── http.rs # Optional web server
│ │
│ └── exchange_sim/ # Enhanced matching engine
│ ├── Cargo.toml
│ └── src/
│ ├── lib.rs
│ ├── simulator.rs
│ ├── synthetic_data.rs
│ └── latency_injection.rs
│
├── src/
│ ├── main.rs # System entry point
│ └── orchestrator.rs # Component coordination
│
├── tests/
│ ├── integration/
│ │ ├── full_system_test.rs
│ │ ├── latency_test.rs
│ │ └── risk_test.rs
│ └── scenarios/
│ ├── normal_trading.rs
│ ├── high_volume.rs
│ └── failure_recovery.rs
│
├── benches/
│ ├── end_to_end_latency.rs
│ ├── throughput.rs
│ └── component_latency.rs
│
└── scripts/
├── run_paper_trading.sh
├── run_backtest.sh
└── analyze_latency.py
5.3 The Core Question You’re Answering
“How do you integrate multiple high-performance components into a cohesive system without losing the latency guarantees each component achieved in isolation?”
This is the fundamental challenge of production systems engineering. Individual components can be fast, but:
- Inter-component communication adds latency
- Coordination overhead introduces jitter
- Shared resources create contention
- Failure in one component affects others
The answer requires:
- Lock-free communication between components
- CPU affinity (ideally on isolated cores) to minimize context switches and cross-core migration
- Careful memory layout to avoid false sharing
- Robust failure handling with circuit breakers
- Comprehensive instrumentation to identify bottlenecks
5.4 Concepts You Must Understand First
| Concept | Self-Assessment Question | Where to Learn |
|---|---|---|
| Lock-free queues | Can you implement a SPSC queue from memory? | P02, Rust Atomics and Locks Ch. 5 |
| Order book mechanics | Can you explain price-time priority? | P01, Building Low Latency Applications Ch. 5 |
| Event-driven I/O | Can you explain epoll edge vs level triggering? | P03, Linux Programming Interface Ch. 63 |
| CPU affinity | What is the cost of context switching? | CS:APP Ch. 8, Linux man sched_setaffinity |
| Memory ordering | What does acquire-release mean? | Rust Atomics and Locks Ch. 3 |
| HDR Histogram | How do you compute percentiles efficiently? | HdrHistogram documentation |
| Position tracking | How do you calculate FIFO P&L? | Any trading operations book |
| Risk controls | What is a circuit breaker pattern? | Release It! by Michael Nygard |
5.5 Questions to Guide Your Design
System Architecture:
- How many threads will you use and why?
- Which components share state and how will you synchronize access?
- What is your strategy for handling component failures?
- How will you ensure deterministic behavior for testing?
Inter-Component Communication:
- Will you use queues, shared memory, or function calls?
- What is the maximum queue depth and what happens when full?
- How will you handle message backpressure?
- How will you pass timestamps across component boundaries?
Latency Measurement:
- Where exactly will you capture timestamps?
- How will you avoid measurement affecting the thing being measured?
- What percentiles will you track (p50, p95, p99, p999)?
- How will you identify which component adds the most latency?
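The event-loop pseudocode calls `now_nanos()` wherever a timestamp is captured. A minimal sketch of such a helper, assuming all timestamps are compared within a single process, can be built on the monotonic `std::time::Instant`; the `EPOCH` static is an illustrative detail.

```rust
use std::sync::OnceLock;
use std::time::Instant;

/// Reference point, fixed on the first call to `now_nanos`.
static EPOCH: OnceLock<Instant> = OnceLock::new();

/// Nanoseconds elapsed since the first call in this process.
/// Based on a monotonic clock, so it never jumps backwards when the
/// wall clock is adjusted (for example by NTP).
pub fn now_nanos() -> u64 {
    let epoch = *EPOCH.get_or_init(Instant::now);
    epoch.elapsed().as_nanos() as u64
}
```

Because the value is relative to process start, it cannot be compared with exchange timestamps directly; use it for intervals measured entirely on the local machine.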
Risk Management:
- What happens when position limits are breached?
- How do you handle orders in flight when limits are hit?
- What is your P&L calculation methodology?
- How will the strategy be notified of risk events?
Failure Handling:
- What happens when the exchange connection drops?
- What happens when the strategy crashes?
- How do you restart without losing position state?
- How do you prevent “zombie orders” (sent but not tracked)?
5.6 Thinking Exercise: Trace a Trade Through the System
Before writing code, trace through this complete scenario on paper:
T=0: System starts, connects to exchange
T=100: Market data arrives: AAPL bid=150.00 ask=150.02
T=101: Market data update: AAPL bid=150.01 ask=150.02
T=102: Strategy detects: bid crossed moving average, signal to BUY
T=103: Risk check: position=0, limit=1000, APPROVED
T=104: Order created: BUY 100 AAPL @ 150.02 LIMIT
T=105: Order serialized and sent to exchange
T=110: Exchange acknowledges order (order_id=12345)
T=115: Exchange fills order: 100 @ 150.02
T=116: Fill message received
T=117: Position updated: AAPL +100 @ 150.02
T=118: Strategy notified of fill
Answer on paper:
- At each timestamp, which component is active?
- What messages pass between components at each step?
- What is the end-to-end latency (T=101 to T=116)?
- Where could latency be reduced?
- What happens if the exchange rejects the order at T=110?
- What happens if market data arrives at T=114 showing price=150.10?
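Once you have answered on paper, the per-hop arithmetic for this trace can be checked with a few lines of code. The sketch below copies the timestamps from the scenario (units are whatever `T` counts) and splits the end-to-end figure into hops; the struct and function names are hypothetical.

```rust
/// Timestamps copied from the trace above.
pub struct TradeTrace {
    pub md_received: u64,   // T=101
    pub signal: u64,        // T=102
    pub risk_approved: u64, // T=103
    pub order_created: u64, // T=104
    pub order_sent: u64,    // T=105
    pub exchange_ack: u64,  // T=110
    pub fill_received: u64, // T=116
}

pub const TRACE: TradeTrace = TradeTrace {
    md_received: 101,
    signal: 102,
    risk_approved: 103,
    order_created: 104,
    order_sent: 105,
    exchange_ack: 110,
    fill_received: 116,
};

/// Latency of each hop, in trace time units.
pub fn hop_latencies(t: &TradeTrace) -> [(&'static str, u64); 6] {
    [
        ("market data -> signal", t.signal - t.md_received),
        ("signal -> risk approval", t.risk_approved - t.signal),
        ("risk approval -> order created", t.order_created - t.risk_approved),
        ("order created -> order sent", t.order_sent - t.order_created),
        ("order sent -> exchange ack", t.exchange_ack - t.order_sent),
        ("exchange ack -> fill received", t.fill_received - t.exchange_ack),
    ]
}

/// Time spent inside your own system before the order leaves.
pub fn tick_to_trade(t: &TradeTrace) -> u64 {
    t.order_sent - t.md_received
}

/// Market data in to fill received.
pub fn end_to_end(t: &TradeTrace) -> u64 {
    t.fill_received - t.md_received
}
```

In this trace only 4 of the 15 units are spent before the order is sent; the remaining 11 are network and exchange time, which tells you where optimization effort can and cannot help.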
5.7 Hints in Layers
Hint 1: Start with Two Components
Don’t try to build all six components at once. Start with:
- Exchange Simulator (modified P03) + Simple Client
- Verify you can submit orders and receive fills
Then add one component at a time:
- Add Feed Handler (receive market data)
- Add Strategy (generate orders from market data)
- Add Risk Manager (wrap strategy)
- Add Gateway (replace simple client)
- Add Dashboard (last, non-critical path)
Each addition should not break existing functionality.
Hint 2: Lock-Free Queue Design
For inter-component communication, use bounded SPSC queues:
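use std::cell::UnsafeCell;
use std::mem::MaybeUninit;
use std::sync::atomic::{AtomicUsize, Ordering};

// Keeps each index on its own 64-byte cache line to avoid false sharing
#[repr(align(64))]
struct CachePadded<T>(T);

// Pre-allocated queue. N must be a power of two so that `% N` stays
// consistent when the counters wrap around.
struct SpscQueue<T, const N: usize> {
    buffer: [UnsafeCell<MaybeUninit<T>>; N],
    head: CachePadded<AtomicUsize>, // Write position (producer)
    tail: CachePadded<AtomicUsize>, // Read position (consumer)
}

// Sound only under the SPSC contract: one producer thread, one consumer thread
unsafe impl<T: Send, const N: usize> Sync for SpscQueue<T, N> {}

impl<T, const N: usize> SpscQueue<T, N> {
    // Producer is the only writer of head
    fn push(&self, item: T) -> Result<(), T> {
        let head = self.head.0.load(Ordering::Relaxed);
        let tail = self.tail.0.load(Ordering::Acquire);
        if head.wrapping_sub(tail) >= N {
            return Err(item); // Full
        }
        unsafe {
            (*self.buffer[head % N].get()).write(item);
        }
        // Release publishes the written slot to the consumer
        self.head.0.store(head.wrapping_add(1), Ordering::Release);
        Ok(())
    }

    // Consumer is the only writer of tail
    fn pop(&self) -> Option<T> {
        let tail = self.tail.0.load(Ordering::Relaxed);
        let head = self.head.0.load(Ordering::Acquire);
        if tail == head {
            return None; // Empty
        }
        let item = unsafe { (*self.buffer[tail % N].get()).assume_init_read() };
        // Release hands the slot back to the producer
        self.tail.0.store(tail.wrapping_add(1), Ordering::Release);
        Some(item)
    }
}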
Key: Separate cache lines for producer and consumer state.
Hint 3: Thread Pinning
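Pin each hot-path thread to a dedicated core so the scheduler cannot migrate it. Combine this with isolcpus (see 5.1) to keep other tasks off that core: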
// Linux
use libc::{cpu_set_t, sched_setaffinity, CPU_SET, CPU_ZERO};
fn pin_to_core(core_id: usize) {
unsafe {
let mut set: cpu_set_t = std::mem::zeroed();
CPU_ZERO(&mut set);
CPU_SET(core_id, &mut set);
let result = sched_setaffinity(0, std::mem::size_of::<cpu_set_t>(), &set);
assert_eq!(result, 0, "Failed to set CPU affinity");
}
}
// In thread startup
std::thread::spawn(move || {
pin_to_core(1); // Pin to core 1
// ... strategy loop
});
Verify with taskset or htop that threads stay on assigned cores.
Hint 4: Latency Measurement
Use HDR Histogram for efficient percentile tracking:
use hdrhistogram::Histogram;
struct LatencyTracker {
histogram: Histogram<u64>,
}
impl LatencyTracker {
fn new() -> Self {
// Track latencies from 1 ns to 10 s (values are in nanoseconds), with 3 significant figures
Self {
histogram: Histogram::new_with_bounds(1, 10_000_000_000, 3).unwrap(),
}
}
fn record(&mut self, latency_ns: u64) {
let _ = self.histogram.record(latency_ns);
}
fn get_percentiles(&self) -> LatencyStats {
LatencyStats {
p50: self.histogram.value_at_quantile(0.50),
p95: self.histogram.value_at_quantile(0.95),
p99: self.histogram.value_at_quantile(0.99),
p999: self.histogram.value_at_quantile(0.999),
max: self.histogram.max(),
}
}
}
For multi-threaded access, use lock-free recorders or thread-local histograms.
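To illustrate the idea of a lock-free recorder, the sketch below uses a deliberately simplified histogram with one atomic counter per power-of-two bucket. It is not HdrHistogram and has far coarser resolution (each bucket spans a factor of two); `Log2Histogram` is a hypothetical name. It shows only that recording can be a single `fetch_add`, while percentile queries run on the dashboard side.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// One counter per power-of-two bucket: bucket i holds values in [2^i, 2^(i+1)).
pub struct Log2Histogram {
    buckets: [AtomicU64; 64],
}

impl Log2Histogram {
    pub fn new() -> Self {
        Self { buckets: std::array::from_fn(|_| AtomicU64::new(0)) }
    }

    /// Hot path: one atomic increment, no locks, no allocation.
    pub fn record(&self, latency_ns: u64) {
        // floor(log2(value)); zero is counted in the first bucket.
        let idx = (63 - latency_ns.max(1).leading_zeros()) as usize;
        self.buckets[idx].fetch_add(1, Ordering::Relaxed);
    }

    /// Upper bound of the bucket containing quantile `q` (0.0..=1.0).
    /// Returns 0 when nothing has been recorded.
    pub fn value_at_quantile(&self, q: f64) -> u64 {
        let total: u64 = self.buckets.iter().map(|b| b.load(Ordering::Relaxed)).sum();
        if total == 0 {
            return 0;
        }
        let target = ((q.clamp(0.0, 1.0) * total as f64).ceil() as u64).max(1);
        let mut seen = 0u64;
        for (i, bucket) in self.buckets.iter().enumerate() {
            seen += bucket.load(Ordering::Relaxed);
            if seen >= target {
                return if i == 63 { u64::MAX } else { (1u64 << (i + 1)) - 1 };
            }
        }
        u64::MAX
    }
}
```

The trade-off is precision for simplicity: a reported p99 of 1023 ns only says the true value lies between 512 and 1023 ns. For finer resolution, keep one real HDR histogram per thread and merge them periodically off the hot path.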
Hint 5: Risk Manager Interface
The risk manager should be synchronous (inline) for pre-trade checks:
pub struct RiskManager {
positions: HashMap<SymbolId, Position>,
limits: RiskLimits,
metrics: Arc<Metrics>,
}
#[derive(Debug, Clone, Copy)]
pub enum RiskDecision {
Approved,
Rejected { reason: RiskRejectReason },
Modified { new_qty: u32, reason: &'static str },
}
impl RiskManager {
pub fn check_order(&mut self, order: &OrderCommand) -> RiskDecision {
// 1. Position limit check
let current_pos = self.positions.get(&order.symbol_id).map(|p| p.quantity).unwrap_or(0);
let proposed_pos = if order.side == Side::Buy {
current_pos + order.quantity as i64
} else {
current_pos - order.quantity as i64
};
if proposed_pos.abs() > self.limits.max_position_per_symbol {
return RiskDecision::Rejected {
reason: RiskRejectReason::PositionLimit,
};
}
// 2. Total exposure check
// 3. Order rate limit check
// 4. P&L limit check
// ...
RiskDecision::Approved
}
pub fn on_fill(&mut self, fill: &ExecutionReport) {
// Update position
// Recalculate P&L
// Check for circuit breaker triggers
}
}
Risk checks MUST be fast (< 1 microsecond). No I/O, no locks.
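The order rate limit check (step 3 in the comments above) can meet the same constraints with a fixed-window counter. This is a sketch under stated assumptions: the caller passes in a monotonic timestamp so the check itself never reads a clock, and `OrderRateLimiter` and its fields are hypothetical names.

```rust
/// Allows at most `max_orders` per window of `window_ns` nanoseconds.
pub struct OrderRateLimiter {
    max_orders: u32,
    window_ns: u64,
    window_start_ns: u64,
    count: u32,
}

impl OrderRateLimiter {
    pub fn new(max_orders: u32, window_ns: u64) -> Self {
        Self { max_orders, window_ns, window_start_ns: 0, count: 0 }
    }

    /// Returns true if an order may be sent at `now_ns` and counts it.
    /// Integer comparisons only: no I/O, no locks, no allocation.
    pub fn try_acquire(&mut self, now_ns: u64) -> bool {
        if now_ns.saturating_sub(self.window_start_ns) >= self.window_ns {
            // Start a new window at the current time.
            self.window_start_ns = now_ns;
            self.count = 0;
        }
        if self.count < self.max_orders {
            self.count += 1;
            true
        } else {
            false
        }
    }
}
```

A fixed window permits a burst of up to twice the limit across a window boundary; a token bucket smooths that out at the cost of slightly more arithmetic.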
Hint 6: Simple Strategy Implementation
Start with a simple mean reversion strategy:
pub struct MeanReversionStrategy {
window_size: usize,
price_history: VecDeque<f64>,
threshold_sigma: f64,
position: i64,
max_position: i64,
}
impl MeanReversionStrategy {
pub fn on_market_data(&mut self, md: &MarketDataUpdate) -> Option<Signal> {
let mid_price = (md.bid_price + md.ask_price) as f64 / 2.0;
self.price_history.push_back(mid_price);
if self.price_history.len() > self.window_size {
self.price_history.pop_front();
}
if self.price_history.len() < self.window_size {
return None; // Not enough data
}
let mean = self.price_history.iter().sum::<f64>() / self.window_size as f64;
let variance = self.price_history.iter()
.map(|p| (p - mean).powi(2))
.sum::<f64>() / self.window_size as f64;
let std_dev = variance.sqrt();
if std_dev == 0.0 {
return None; // Flat window: z-score is undefined
}
let z_score = (mid_price - mean) / std_dev;
// Generate signals
if z_score < -self.threshold_sigma && self.position < self.max_position {
return Some(Signal::Buy { strength: z_score.abs() });
}
if z_score > self.threshold_sigma && self.position > -self.max_position {
return Some(Signal::Sell { strength: z_score.abs() });
}
None
}
}
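Note: Use fixed-point (integer tick) arithmetic for prices in production. Binary floating point cannot represent most decimal prices exactly, so sums and equality comparisons drift.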
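As a small illustration of the fixed-point approach, the sketch below stores prices as integer ticks, assuming a scale of 10,000 ticks per currency unit. The scale and the helper names are assumptions for this example, not values taken from the protocol tables.

```rust
/// Assumed scale: 1 tick = 0.0001 currency units.
pub const SCALE: i64 = 10_000;

/// Builds a price from whole units and ten-thousandths, e.g. (150, 200) = 150.02.
pub fn to_ticks(whole: i64, ten_thousandths: i64) -> i64 {
    whole * SCALE + ten_thousandths
}

/// Integer mid price; an odd sum truncates the half tick.
pub fn mid_ticks(bid: i64, ask: i64) -> i64 {
    (bid + ask) / 2
}

/// Formats a non-negative tick price with four decimal places.
pub fn format_price(ticks: i64) -> String {
    format!("{}.{:04}", ticks / SCALE, ticks % SCALE)
}
```

With this representation `to_ticks(0, 1_000) + to_ticks(0, 2_000)` equals `to_ticks(0, 3_000)` exactly, whereas `0.1 + 0.2 == 0.3` is false in `f64`. Statistics such as the z-score can still be computed in floating point, as long as order prices and position values stay in ticks.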
5.8 The Interview Questions They’ll Ask
After completing this project, you’ll be prepared for senior/staff-level systems interviews:
System Design:
- “Design a trading system that handles 1 million orders per second.”
- Expected: Component separation, lock-free communication, CPU affinity, batching
- “How would you achieve sub-10-microsecond latency?”
- Expected: Kernel bypass (DPDK), FPGA, co-location, memory pre-allocation
- “How do you ensure no orders are lost during a system restart?”
- Expected: Write-ahead logging, checkpointing, replay from sequence numbers
Concurrency:
- “Explain your inter-component communication design.”
- Expected: Lock-free queues, memory ordering, cache line separation
- “How do you measure latency without affecting performance?”
- Expected: Hardware timestamps, lock-free logging, sampling, HDR Histogram
Risk Management:
- “What risk controls would you implement for a trading system?”
- Expected: Position limits, P&L limits, rate limiting, circuit breakers, kill switch
- “How do you handle partial fills in position tracking?”
- Expected: FIFO matching, average price calculation, realized vs unrealized P&L
Operations:
- “How would you debug a latency spike in production?”
- Expected: Trace analysis, timestamp correlation, profiling, cache analysis
- “How do you deploy changes without disrupting trading?”
- Expected: Blue-green deployment, feature flags, gradual rollout, instant rollback
- “What happens if your strategy goes haywire?”
- Expected: Circuit breakers, position limits, human kill switch, automated monitoring
5.9 Books That Will Help
| Topic | Book | Relevant Chapters |
|---|---|---|
| Trading system architecture | Building Low Latency Applications with C++ | Full book, especially Ch. 8-10 |
| Lock-free programming | Rust Atomics and Locks | Ch. 4-7 |
| Systems performance | Systems Performance, 2nd Ed. by Brendan Gregg | Ch. 3, 6-7 |
| Resilience patterns | Release It! by Michael Nygard | Ch. 4-5 |
| Trading operations | The Man Who Solved the Market (for context) | Full book |
| Exchange mechanics | Trading and Exchanges by Larry Harris | Ch. 6-10 |
| Performance monitoring | The Art of Capacity Planning | Ch. 2-4 |
| Linux internals | Linux Programming Interface | Ch. 62-63 |
5.10 Implementation Phases
Phase 1: Foundation (Week 1-2)
Goals:
- Set up project structure with all crates
- Verify previous project code compiles and integrates
- Create shared types and configuration system
Tasks:
- Create Cargo workspace with all crates
- Define shared types in common crate (Order, Trade, Position)
- Implement configuration loading (TOML parser)
- Create logging infrastructure (spdlog or tracing)
- Set up basic test framework
- Verify P01-P05 code integrates cleanly
Checkpoint: All crates compile, shared types defined, config loads.
Phase 2: Exchange Simulator Enhancement (Week 3-4)
Goals:
- Enhance matching engine to work as standalone exchange
- Add synthetic market data generation
- Implement binary protocol for orders and market data
Tasks:
- Add multi-symbol support to matching engine
- Create synthetic order generator (random walks, patterns)
- Define and implement wire protocol for orders
- Define and implement wire protocol for market data
- Add latency injection for testing
- Create simple test client
Checkpoint: Exchange runs standalone, generates data, accepts orders.
Phase 3: Feed Handler + Gateway (Week 5-6)
Goals:
- Build market data feed handler with lock-free publishing
- Build order gateway with session management
- Verify round-trip communication with exchange
Tasks:
- Implement feed handler connection and parsing
- Add timestamp injection at receive
- Implement lock-free publishing queue
- Implement gateway connection and serialization
- Add session management (login, heartbeat)
- Test full round-trip: send order, receive fill
Checkpoint: Can send orders and receive fills through gateway.
Phase 4: Strategy Engine + Risk (Week 7-8)
Goals:
- Implement strategy engine with order book reconstruction
- Build risk management layer
- Integrate with feed handler and gateway
Tasks:
- Implement order book reconstruction from updates
- Create strategy interface and mean reversion implementation
- Implement risk manager with position and P&L tracking
- Wire strategy to consume market data and produce orders
- Add order management (tracking working orders)
- Integrate with gateway for order submission
Checkpoint: Strategy generates orders based on market data.
Phase 5: Dashboard + Metrics (Week 9-10)
Goals:
- Build real-time monitoring dashboard
- Implement comprehensive latency tracking
- Add health monitoring for all components
Tasks:
- Add lock-free metrics counters to all components
- Implement latency histogram (HDR Histogram)
- Create TUI dashboard with ratatui
- Display positions, P&L, latency, throughput
- Add risk status indicators
- Implement alert generation
Checkpoint: Dashboard shows live system status.
Phase 6: Integration + Optimization (Week 11-12)
Goals:
- Full system integration
- CPU affinity and performance optimization
- Comprehensive testing
Tasks:
- Configure CPU affinity for all threads
- Profile and optimize hot paths
- Implement failure handling and recovery
- Run extended duration tests
- Document latency characteristics
- Create paper trading configuration
Checkpoint: System runs for hours without degradation.
5.11 Key Implementation Decisions
| Decision | Option A | Option B | Recommendation |
|---|---|---|---|
| Threading model | Single-threaded | Multi-threaded with affinity | Multi-threaded for realism |
| Inter-component | Function calls | Lock-free queues | Lock-free queues |
| Configuration | Compile-time | Runtime (TOML) | Runtime for flexibility |
| Dashboard | TUI (ratatui) | Web (HTTP server) | TUI for simplicity |
| Latency tracking | Simple stats | HDR Histogram | HDR Histogram |
| Strategy | Built-in | Plugin system | Built-in (simpler) |
| Logging | Synchronous | Async (lock-free) | Async to avoid blocking |
| Exchange | Simulated | Real market data | Simulated first |
6. Testing Strategy
6.1 Unit Tests
Test each component in isolation:
Feed Handler:
- Parse valid market data messages
- Handle malformed messages gracefully
- Verify timestamp injection accuracy
Strategy:
- Signal generation under known market conditions
- Order book reconstruction correctness
- Edge cases (empty book, one-sided book)
Risk Manager:
- Position limit enforcement
- P&L calculation correctness
- Circuit breaker triggering
Gateway:
- Order serialization correctness
- Fill handling and position updates
- Connection management
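As one example of what a Risk Manager unit test could target, here is a minimal latched loss breaker. It is a sketch only: the type names, the single `max_loss` threshold, and the manual `reset` are assumptions, and a real breaker would likely watch more signals than total P&L.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BreakerState {
    Closed, // Trading allowed
    Open,   // Trading halted
}

/// Trips when total P&L falls to or below `-max_loss` and stays tripped
/// until an operator resets it.
pub struct LossCircuitBreaker {
    max_loss: i64, // Positive threshold, same fixed-point unit as P&L
    state: BreakerState,
}

impl LossCircuitBreaker {
    pub fn new(max_loss: i64) -> Self {
        Self { max_loss, state: BreakerState::Closed }
    }

    /// Call after every P&L recalculation.
    pub fn on_pnl_update(&mut self, total_pnl: i64) -> BreakerState {
        if total_pnl <= -self.max_loss {
            self.state = BreakerState::Open;
        }
        self.state
    }

    pub fn allows_trading(&self) -> bool {
        self.state == BreakerState::Closed
    }

    /// Manual reset only: a P&L recovery must not silently re-enable trading.
    pub fn reset(&mut self) {
        self.state = BreakerState::Closed;
    }
}
```

The property worth testing is the latch: once the breaker opens, a later P&L improvement must not close it again without an explicit reset.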
6.2 Integration Tests
Test component interactions:
Feed Handler + Strategy:
- Market data flows to strategy
- Strategy sees correct order book state
- No message loss under load
Strategy + Risk + Gateway:
- Orders flow through risk to gateway
- Risk rejections are handled correctly
- Position updates flow back to strategy
Full System:
- End-to-end order flow
- Latency meets targets
- No memory leaks over extended runs
6.3 Chaos Testing
Simulate failures:
Network Failures:
- Market data connection drops
- Order connection drops
- Connection restored after timeout
Component Failures:
- Strategy thread panics
- Risk manager rejects all orders
- Gateway backs up
Resource Exhaustion:
- Queue fills up
- Memory pressure
- CPU overload
6.4 Performance Tests
Latency Testing:
- Measure p50, p95, p99, p999 latency
- Identify latency outliers and causes
- Test under various load levels
Throughput Testing:
- Maximum orders per second
- Sustained throughput over 60+ seconds
- Throughput under memory pressure
Stability Testing:
- 24-hour continuous run
- Memory usage over time
- Latency stability over time
7. Common Pitfalls & Debugging
| Problem | Symptom | Cause | Solution |
|---|---|---|---|
| Latency spikes | Occasional 10x+ latency | Heap allocation, page faults, context switches | Eliminate hot-path allocations, pin CPUs |
| Queue overflow | Missing messages | Producer faster than consumer | Larger queues, backpressure |
| Position mismatch | P&L looks wrong | Missed fills or double-counting | Sequence numbers, reconciliation |
| False sharing | Poor multi-thread perf | Data written by different threads shares a cache line | Align and pad per-thread data to the cache line size (64 bytes on x86-64) |
| Lock contention | Throughput ceiling | Hidden locks in libraries | Profile, replace with lock-free |
| Memory growth | Eventual OOM | Unbounded caches/histories | Bound all collections |
| Time drift | Incorrect latency | Wall clock vs monotonic | Use Instant, not SystemTime |
| Protocol mismatch | Garbled data | Byte order, field size errors | Test with known good data |
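For the protocol mismatch row, "test with known good data" can be as simple as encoding a message and comparing it with a hand-written byte array. The sketch below uses a hypothetical 13-byte little-endian header (message type, order ID, quantity); the real wire layout and byte order are whatever your protocol definition specifies.

```rust
/// Hypothetical header: msg_type (u8) + order_id (u64) + quantity (u32),
/// all little-endian, no padding.
pub const HEADER_LEN: usize = 13;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct OrderHeader {
    pub msg_type: u8,
    pub order_id: u64,
    pub quantity: u32,
}

pub fn encode(h: &OrderHeader) -> [u8; HEADER_LEN] {
    let mut buf = [0u8; HEADER_LEN];
    buf[0] = h.msg_type;
    buf[1..9].copy_from_slice(&h.order_id.to_le_bytes());
    buf[9..13].copy_from_slice(&h.quantity.to_le_bytes());
    buf
}

/// Returns None if the buffer is too short to hold a full header.
pub fn decode(buf: &[u8]) -> Option<OrderHeader> {
    if buf.len() < HEADER_LEN {
        return None;
    }
    let mut id = [0u8; 8];
    id.copy_from_slice(&buf[1..9]);
    let mut qty = [0u8; 4];
    qty.copy_from_slice(&buf[9..13]);
    Some(OrderHeader {
        msg_type: buf[0],
        order_id: u64::from_le_bytes(id),
        quantity: u32::from_le_bytes(qty),
    })
}
```

Writing each field with an explicit `to_le_bytes` call, rather than copying a struct's memory, keeps the wire format independent of compiler padding and host endianness.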
Debugging Strategies
# CPU profiling
perf record -g ./hft_system
perf report
# Latency profiling
cargo flamegraph --bin hft_system
# Memory profiling
valgrind --tool=massif ./hft_system
heaptrack ./hft_system
# Cache analysis
perf stat -e cache-misses,cache-references ./hft_system
# Thread scheduling
perf sched record ./hft_system
perf sched latency
# System calls
strace -c ./hft_system
8. Extensions & Challenges
Beginner Extensions
- Add more trading strategies (momentum, pairs)
- Implement multiple risk limit types
- Add trade logging to file
- Create configuration UI
Intermediate Extensions
- Real market data integration (delayed quotes)
- Multiple exchange connections
- Strategy parameter optimization
- Web-based dashboard
- Historical data replay
Advanced Extensions
- Machine learning signal generation
- io_uring for all I/O
- FIX protocol support
- Multi-asset correlation strategies
- Geographic distribution
Expert Extensions
- Kernel bypass networking (DPDK)
- FPGA acceleration
- Co-located deployment
- Sub-microsecond latency target
- Live trading with real money
9. Real-World Connections
How Production Systems Differ
| Aspect | This Project | Jane Street / Citadel / Two Sigma |
|---|---|---|
| Market data | Simulated | Direct feeds ($millions/year) |
| Network | TCP kernel stack | DPDK/RDMA, kernel bypass |
| Hardware | Consumer CPU | Xeon, FPGA, custom ASICs |
| Latency | ~50 microseconds | ~1-10 microseconds |
| Symbols | 4-10 | 10,000+ |
| Strategies | 1-2 | 100s running simultaneously |
| Risk | Basic limits | ML-based, real-time VaR |
| Team | 1 person | 50-500 engineers |
| Budget | $0 | $100M+ infrastructure |
Architectural Patterns You’re Learning
These patterns are exactly what production systems use:
- Lock-free communication - Disruptor pattern (LMAX)
- CPU affinity - Universal in HFT
- Memory pre-allocation - No malloc in hot path
- HDR Histogram - Industry standard for latency
- Circuit breakers - Pre-trade risk controls and kill switches are widely expected by exchanges and regulators
- Separate control/data planes - Dashboard vs trading
Companies Building Similar Systems
- Jane Street - OCaml-based trading systems
- Citadel Securities - C++ market making
- Two Sigma - Python + C++ quantitative trading
- Tower Research - Ultra-low-latency C++
- DRW - Multi-asset trading technology
- Optiver - Market making across exchanges
- IMC - Technology-focused trading
10. Resources
Essential Reading
- Building Low Latency Applications with C++ by Sourav Ghosh - The primary reference for this project
- Rust Atomics and Locks by Mara Bos - Lock-free programming in Rust
- Systems Performance by Brendan Gregg - Performance analysis methodology
Code References
- PacktPublishing/Building-Low-Latency-Applications-with-CPP - Full C++ trading system
- LMAX Disruptor - High-performance ring buffer
- HdrHistogram - Latency histograms
- ratatui - TUI framework for dashboard
Articles and Books
- “The LMAX Architecture” - Martin Fowler (lock-free patterns)
- “Trading and Exchanges” - Market microstructure fundamentals
- “Flash Boys” - Context on HFT industry (narrative, not technical)
Video Resources
- CppCon talks on low-latency systems
- Rust Nation talks on performance
- Jane Street tech talks (YouTube)
11. Self-Assessment Checklist
Understanding
- I can explain the role of each component in the system
- I can describe how data flows from market data to order submission
- I understand why lock-free queues are necessary
- I can explain the risk management requirements
- I understand end-to-end latency measurement
- I can describe failure modes and recovery strategies
Implementation
- All six components are implemented and integrated
- Lock-free queues pass stress tests
- CPU affinity is configured and verified
- Latency meets targets (p99 < 200us)
- Risk management prevents limit breaches
- Dashboard shows real-time metrics
- System runs for hours without memory growth
Operations
- I can start and stop the system cleanly
- I can configure via TOML files
- I can interpret latency histograms
- I can diagnose performance issues
- I can explain the system in an interview
12. Submission / Completion Criteria
Minimum Viable Completion
- Exchange simulator running with synthetic data
- Feed handler receiving and publishing market data
- Strategy generating orders from signals
- Risk manager performing basic checks
- Gateway sending orders and receiving fills
- Orders execute end-to-end
Full Completion
- All components integrated with lock-free communication
- CPU affinity configured for critical threads
- Latency < 100 microseconds p99
- Dashboard displaying live metrics
- Risk limits enforced correctly
- Runs for 1+ hour without issues
- Throughput > 10,000 orders/second
Excellence (Going Above and Beyond)
- Multiple trading strategies implemented
- Latency < 50 microseconds p99
- Paper trading with real (delayed) market data
- Web-based dashboard option
- Comprehensive chaos testing
- Automated regression testing
- Performance comparison documentation
This capstone project represents the culmination of the HFT learning path. Completing it demonstrates mastery of systems programming, concurrent computing, and trading system architecture. The skills you develop are directly applicable to roles at quantitative trading firms, exchanges, and high-performance computing companies.
Reference: HIGH_FREQUENCY_TRADING_CPP_RUST_LEARNING_PROJECTS_SUMMARY.md | Project Index