Project 1: Stock Data API Client
Build a client that downloads, normalizes, and stores market data.
Project Overview
| Attribute | Value |
|---|---|
| Difficulty | Level 1: Beginner |
| Time Estimate | Weekend |
| Main Language | Python |
| Alternative Languages | JavaScript, Go |
| Knowledge Area | Data retrieval |
| Tools | HTTP client, CSV storage |
| Main Book | “Python for Data Analysis” by Wes McKinney |
What you’ll build: A CLI that pulls OHLCV data for a symbol and stores it in a clean, consistent format.
Why it teaches quant: Every model starts with data. If you can’t fetch and normalize it, nothing else matters.
Core challenges you’ll face:
- Handling API rate limits and pagination
- Normalizing timestamps and time zones
- Managing missing or partial data
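Pagination is often the first of these challenges to surface. The loop below is a minimal sketch, assuming a cursor-based API: `fetch_page`, the `results` key, and the `next_cursor` key are hypothetical names, so substitute whatever your provider actually returns.

```python
def fetch_all_pages(fetch_page, max_pages=1000):
    """Collect rows from every page of a cursor-paginated endpoint.

    fetch_page is a hypothetical callable: it takes a cursor (None for the
    first page) and returns a dict with "results" and an optional "next_cursor".
    """
    rows = []
    cursor = None
    for _ in range(max_pages):
        page = fetch_page(cursor)
        rows.extend(page["results"])
        cursor = page.get("next_cursor")
        if not cursor:
            return rows
    # Guard against an API that never stops handing out cursors.
    raise RuntimeError("pagination did not terminate")
```

The `max_pages` cap is a design choice: it turns a runaway loop into a visible error instead of an endless download.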
Real World Outcome
You will request a symbol and receive a clean dataset ready for analysis.
Example Output:
$ python fetch.py --symbol AAPL --start 2023-01-01 --end 2023-12-31
Downloaded 250 rows
Saved to data/AAPL.csv
Verification steps:
- Check for missing dates
- Verify columns and data types
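Both verification steps can be automated. The following is a rough sketch using only the standard library; the column names in `EXPECTED_COLUMNS` are an assumed schema, not something the project prescribes, and the holiday set must be supplied by you.

```python
import csv
import io
from datetime import date, timedelta

# Assumed schema for the saved file; adjust to match your own output.
EXPECTED_COLUMNS = ["date", "open", "high", "low", "close", "volume"]

def verify_rows(csv_text, holidays=frozenset()):
    """Return (problems, missing_dates) for CSV text holding daily bars."""
    reader = csv.DictReader(io.StringIO(csv_text))
    problems = []
    if reader.fieldnames != EXPECTED_COLUMNS:
        problems.append(f"unexpected columns: {reader.fieldnames}")
        return problems, []
    seen = []
    for line_no, row in enumerate(reader, start=2):
        try:
            # Each field must parse as its intended type.
            seen.append(date.fromisoformat(row["date"]))
            for col in ("open", "high", "low", "close"):
                float(row[col])
            int(row["volume"])
        except (ValueError, TypeError) as exc:
            problems.append(f"line {line_no}: {exc}")
    missing = []
    if seen:
        present = set(seen)
        day = min(seen)
        # Any weekday in range that is neither present nor a holiday is a gap.
        while day <= max(seen):
            if day.weekday() < 5 and day not in present and day not in holidays:
                missing.append(day)
            day += timedelta(days=1)
    return problems, missing
```

An empty `problems` list and an empty `missing` list together suggest the file is structurally sound; they do not prove the prices themselves are correct.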
The Core Question You’re Answering
“How do I turn raw market API responses into reliable time series?”
Data consistency is the foundation of quant work.
Concepts You Must Understand First
Stop and research these before coding:
- OHLCV structure
  - What do open, high, low, close, and volume represent?
  - Book Reference: “Trading and Exchanges” by Larry Harris, Ch. 3
- Time zones and trading calendars
  - Why do trading days differ from calendar days?
  - Book Reference: “Python for Data Analysis” by Wes McKinney, Ch. 11
- Data normalization
  - Why do you standardize column names and types?
  - Book Reference: “Python for Data Analysis” by Wes McKinney, Ch. 7
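To make the normalization idea concrete, here is a minimal sketch that maps one raw bar onto a standard schema. The short field names (`t`, `o`, `h`, `l`, `c`, `v`) and the epoch-seconds timestamp are assumptions about a hypothetical provider, not a description of any particular API.

```python
from datetime import datetime, timezone

# Hypothetical mapping from a provider's field names to a standard schema.
FIELD_MAP = {
    "t": "timestamp", "o": "open", "h": "high",
    "l": "low", "c": "close", "v": "volume",
}

def normalize_bar(raw):
    """Rename fields, coerce types, and express the timestamp in UTC."""
    bar = {FIELD_MAP[k]: v for k, v in raw.items() if k in FIELD_MAP}
    missing = set(FIELD_MAP.values()) - set(bar)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    # Assumption: the provider sends the timestamp as epoch seconds.
    ts = datetime.fromtimestamp(int(bar["timestamp"]), tz=timezone.utc)
    out = {
        "timestamp": ts.isoformat(),
        "open": float(bar["open"]),
        "high": float(bar["high"]),
        "low": float(bar["low"]),
        "close": float(bar["close"]),
        "volume": int(bar["volume"]),
    }
    # Basic OHLC sanity check: high must bound the bar from above, low from below.
    if not (out["low"] <= min(out["open"], out["close"])
            and max(out["open"], out["close"]) <= out["high"]):
        raise ValueError("inconsistent OHLC values")
    return out
```

Coercing types at the boundary means every later stage can rely on floats, ints, and one timestamp format, whatever the provider sent.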
Questions to Guide Your Design
- Storage format
  - Will you use CSV, Parquet, or a database?
  - How will you version or overwrite data?
- Error handling
  - How will you handle missing days or API failures?
  - Will you retry or fail fast?
Thinking Exercise
Trading Days
List the U.S. market holidays for a given year, then estimate how many trading days remain once weekends and those holidays are removed.
Questions while working:
- Why does data sometimes skip weekdays?
- How do you align multiple symbols?
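Once you have worked the exercise by hand, a short script can check your answer. This sketch hard-codes the NYSE full-day closures for 2023 purely for illustration; a real pipeline would take its calendar from a maintained source, and early-close days are ignored here.

```python
from datetime import date, timedelta

# NYSE full-day closures in 2023, hard-coded for illustration only.
NYSE_HOLIDAYS_2023 = {
    date(2023, 1, 2),    # New Year's Day (observed)
    date(2023, 1, 16),   # Martin Luther King Jr. Day
    date(2023, 2, 20),   # Washington's Birthday
    date(2023, 4, 7),    # Good Friday
    date(2023, 5, 29),   # Memorial Day
    date(2023, 6, 19),   # Juneteenth
    date(2023, 7, 4),    # Independence Day
    date(2023, 9, 4),    # Labor Day
    date(2023, 11, 23),  # Thanksgiving Day
    date(2023, 12, 25),  # Christmas Day
}

def trading_days(start, end, holidays):
    """Return every weekday in [start, end] that is not a holiday."""
    days = []
    day = start
    while day <= end:
        if day.weekday() < 5 and day not in holidays:
            days.append(day)
        day += timedelta(days=1)
    return days

days_2023 = trading_days(date(2023, 1, 1), date(2023, 12, 31), NYSE_HOLIDAYS_2023)
print(len(days_2023))  # 260 weekdays minus 10 holidays = 250
```

The same list of dates doubles as a shared index: reindexing each symbol against it is one way to align multiple symbols and expose gaps.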
The Interview Questions They’ll Ask
Prepare to answer these:
- “What is OHLCV data used for?”
- “How do you handle missing trading days?”
- “Why is time zone normalization critical?”
- “What is survivorship bias in datasets?”
- “How do you validate data quality?”
Hints in Layers
Hint 1 (Starting Point): Start with one symbol and a short date range.
Hint 2 (Next Level): Add retry logic for rate limits.
Hint 3 (Technical Details): Normalize timestamps to a single timezone and format.
Hint 4 (Tools/Debugging): Log API responses for a sample request.
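For the retry hint, one common approach is exponential backoff. The sketch below is deliberately independent of any HTTP library: `RateLimited` is a hypothetical exception that your own fetch function would raise when the API signals a rate limit (for example, on an HTTP 429 response).

```python
import time

class RateLimited(Exception):
    """Hypothetical error raised by your fetch function when rate limited."""

def fetch_with_retry(fetch, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(), retrying on RateLimited with exponentially growing delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except RateLimited:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            # Delays grow as base_delay * 1, 2, 4, ... between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
```

Passing `sleep` as a parameter keeps the function testable without real waiting. Only rate-limit errors are retried here; other failures propagate immediately, which is a fail-fast choice you may want to revisit.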
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Market data | “Trading and Exchanges” by Larry Harris | Ch. 3 |
| Time series | “Python for Data Analysis” by Wes McKinney | Ch. 11 |
| Data cleaning | “Python for Data Analysis” by Wes McKinney | Ch. 7 |
Implementation Hints
- Keep API keys in environment variables.
- Always sort by timestamp before saving.
- Add a metadata header with source and fetch date.
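The three hints above can be combined in a small save routine. This is a sketch under stated assumptions: the environment variable name `MARKET_API_KEY` is made up, the column list is an assumed schema, and the metadata goes into a sidecar JSON file rather than a header inside the CSV so that standard CSV readers still parse the data file unchanged.

```python
import csv
import json
import os
from datetime import datetime, timezone
from pathlib import Path

# Assumed schema; timestamps are ISO 8601 strings in a single timezone.
COLUMNS = ["timestamp", "open", "high", "low", "close", "volume"]

def load_api_key(var="MARKET_API_KEY"):
    """Read the API key from the environment (variable name is hypothetical)."""
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"set {var} before running")
    return key

def save_bars(bars, path, source):
    """Write bars sorted by timestamp, plus a sidecar file with provenance."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # ISO 8601 strings in one timezone and format sort chronologically as text.
    ordered = sorted(bars, key=lambda b: b["timestamp"])
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(ordered)
    meta = {
        "source": source,
        "fetched_at": datetime.now(timezone.utc).isoformat(),
        "rows": len(ordered),
    }
    meta_path = path.with_suffix(".meta.json")  # data/AAPL.csv -> data/AAPL.meta.json
    meta_path.write_text(json.dumps(meta, indent=2))
    return meta_path
```

If you prefer a true in-file header, comment lines at the top of the CSV also work, provided every reader of the file is told to skip them.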
Learning Milestones
- First milestone: You can fetch and store OHLCV data.
- Second milestone: You can normalize timestamps and handle missing days.
- Final milestone: You can build a reliable data pipeline for analysis.