Project 1: Stock Data API Client

Build a client that downloads, normalizes, and stores market data.


Project Overview

Attribute              Value
Difficulty             Level 1: Beginner
Time Estimate          Weekend
Main Language          Python
Alternative Languages  JavaScript, Go
Knowledge Area         Data retrieval
Tools                  HTTP client, CSV storage
Main Book              “Python for Data Analysis” by Wes McKinney

What you’ll build: A CLI that pulls OHLCV data for a symbol and stores it in a clean, consistent format.

Why it teaches quant: Every model starts with data. If you can’t fetch and normalize it, nothing else matters.

Core challenges you’ll face:

  • Handling API rate limits and pagination
  • Normalizing timestamps and time zones
  • Managing missing or partial data
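As a rough sketch of the pagination challenge, the loop below keeps requesting pages until the API stops returning a cursor. The page shape (`rows`, `next_cursor`) and the `fetch_page` callable are hypothetical; real providers use their own field names and may paginate by offset or date instead, so treat this as one possible shape rather than any particular API's behavior.

```python
from typing import Callable, Iterator, Optional

# Hypothetical page shape: {"rows": [...], "next_cursor": "..." or None}.
# Real providers name these fields differently; check your API's docs.
Page = dict


def iter_rows(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[dict]:
    """Yield rows from every page until the API reports no further cursor."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield from page["rows"]
        cursor = page.get("next_cursor")
        if cursor is None:
            break


# Stand-in for a real HTTP call so the sketch runs offline (made-up rows).
_FAKE_PAGES = {
    None: {"rows": [{"date": "2023-01-03"}, {"date": "2023-01-04"}], "next_cursor": "p2"},
    "p2": {"rows": [{"date": "2023-01-05"}], "next_cursor": None},
}


def fake_fetch_page(cursor: Optional[str]) -> Page:
    return _FAKE_PAGES[cursor]


rows = list(iter_rows(fake_fetch_page))
print(len(rows))  # 3
```

Passing the fetch function in as an argument keeps the paging logic testable without network access.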

Real World Outcome

You will request a symbol and receive a clean dataset ready for analysis.

Example Output:

$ python fetch.py --symbol AAPL --start 2023-01-01 --end 2023-12-31
Downloaded 250 rows
Saved to data/AAPL.csv
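One way to parse the flags shown in the example is with `argparse` from the standard library. This is a minimal sketch: the flag names come from the example above, while upper-casing the symbol and rejecting a start date after the end date are assumptions about what you might want.

```python
import argparse
from datetime import date


def parse_args(argv=None):
    """Parse --symbol, --start and --end; dates must be ISO formatted (YYYY-MM-DD)."""
    parser = argparse.ArgumentParser(description="Fetch OHLCV data for one symbol.")
    parser.add_argument("--symbol", required=True, type=str.upper)  # assumption: normalize case
    parser.add_argument("--start", required=True, type=date.fromisoformat)
    parser.add_argument("--end", required=True, type=date.fromisoformat)
    args = parser.parse_args(argv)
    if args.start > args.end:
        parser.error("--start must not be after --end")
    return args


# Passing an explicit list makes the parser easy to try without a shell.
args = parse_args(["--symbol", "aapl", "--start", "2023-01-01", "--end", "2023-12-31"])
print(args.symbol, args.start, args.end)  # AAPL 2023-01-01 2023-12-31
```

In `fetch.py` you would call `parse_args()` with no arguments so it reads from the command line.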

Verification steps:

  • Check for missing dates
  • Verify columns and data types
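The two verification steps can be sketched with the standard library alone. The column layout below is an assumption (the project does not fix one), and the sample rows are made up. Note that the gap check flags every absent weekday, so market holidays will show up too; you would filter those against a holiday list.

```python
import csv
import io
from datetime import date, timedelta

EXPECTED_COLUMNS = ["date", "open", "high", "low", "close", "volume"]  # assumed layout


def verify(csv_text: str):
    """Check columns and types, then return weekdays missing between first and last row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    seen = []
    for row in reader:
        seen.append(date.fromisoformat(row["date"]))  # raises on a malformed date
        for col in ("open", "high", "low", "close"):
            float(row[col])                           # raises on a non-numeric price
        int(row["volume"])                            # raises on a non-integer volume
    if not seen:
        return []
    have = set(seen)
    missing = []
    day = min(seen)
    while day <= max(seen):
        if day.weekday() < 5 and day not in have:     # Monday=0 ... Friday=4
            missing.append(day)
        day += timedelta(days=1)
    return missing


# Made-up sample with Thursday 2023-01-05 left out on purpose.
sample = """date,open,high,low,close,volume
2023-01-03,10.0,11.0,9.5,10.5,1000
2023-01-04,10.5,11.5,10.0,11.0,1200
2023-01-06,11.0,12.0,10.8,11.9,900
"""
missing = verify(sample)
print(missing)  # [datetime.date(2023, 1, 5)]
```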

The Core Question You’re Answering

“How do I turn raw market API responses into reliable time series?”

Data consistency is the foundation of quant work.


Concepts You Must Understand First

Stop and research these before coding:

  1. OHLCV structure
    • What do open, high, low, close, and volume represent?
    • Book Reference: “Trading and Exchanges” by Larry Harris, Ch. 3
  2. Time zones and trading calendars
    • Why do trading days differ from calendar days?
    • Book Reference: “Python for Data Analysis” by Wes McKinney, Ch. 11
  3. Data normalization
    • Why do you standardize column names and types?
    • Book Reference: “Python for Data Analysis” by Wes McKinney, Ch. 4
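To make the first concept concrete, here is a small sketch of an OHLCV record with a sanity check. The class name `Bar` and its field names are illustrative choices, not something a particular API or book prescribes. The check encodes what the fields mean: the high is at least as large as the open and close, and the low is at most as large as either.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Bar:
    """One trading period: opening, highest, lowest, closing price and traded volume."""
    day: date
    open: float
    high: float
    low: float
    close: float
    volume: int

    def is_consistent(self) -> bool:
        # The range [low, high] must contain both the open and the close.
        return (
            self.low <= min(self.open, self.close)
            and self.high >= max(self.open, self.close)
            and self.volume >= 0
        )


good = Bar(date(2023, 1, 3), 10.0, 11.0, 9.5, 10.5, 1000)  # made-up values
bad = Bar(date(2023, 1, 4), 10.0, 9.0, 9.5, 10.5, 1000)    # high is below the open
print(good.is_consistent(), bad.is_consistent())  # True False
```

Running a check like this on every row is a cheap way to catch swapped columns after renaming.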

Questions to Guide Your Design

  1. Storage format
    • Will you use CSV, Parquet, or a database?
    • How will you version or overwrite data?
  2. Error handling
    • How will you handle missing days or API failures?
    • Will you retry or fail fast?

Thinking Exercise

Trading Days

List the U.S. market holidays for a given year and estimate how many trading days remain once weekends and those holidays are removed.

Questions while working:

  • Why does data sometimes skip weekdays?
  • How do you align multiple symbols?
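Once you have a holiday list, the estimate can be checked with a few lines of code. This sketch counts weekdays in a range and subtracts the holidays you pass in. It only demonstrates one week around Independence Day 2023, so you still need to supply the full holiday set for a whole year.

```python
from datetime import date, timedelta


def count_trading_days(start: date, end: date, holidays: set) -> int:
    """Count weekdays in [start, end] that are not in `holidays`."""
    count = 0
    day = start
    while day <= end:
        if day.weekday() < 5 and day not in holidays:  # Monday=0 ... Friday=4
            count += 1
        day += timedelta(days=1)
    return count


# Monday to Friday of the week containing July 4, 2023 (a Tuesday, market closed).
holidays = {date(2023, 7, 4)}
n = count_trading_days(date(2023, 7, 3), date(2023, 7, 7), holidays)
print(n)  # 4
```

The skipped Tuesday is exactly the kind of weekday gap the first question asks about.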

The Interview Questions They’ll Ask

Prepare to answer these:

  1. “What is OHLCV data used for?”
  2. “How do you handle missing trading days?”
  3. “Why is time zone normalization critical?”
  4. “What is survivorship bias in datasets?”
  5. “How do you validate data quality?”

Hints in Layers

Hint 1 (Starting Point): Start with one symbol and a short date range.

Hint 2 (Next Level): Add retry logic for rate limits.
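A minimal sketch of retry logic with exponential backoff follows. `RateLimited` is a hypothetical exception standing in for whatever your HTTP client raises on a 429 response, and the attempt count and delays are arbitrary starting values. The `sleep` parameter exists only so the demo can run without actually waiting.

```python
import time


class RateLimited(Exception):
    """Hypothetical error a client would raise on an HTTP 429 response."""


def with_retries(call, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Run `call`, waiting 1x, 2x, 4x ... base_delay between rate-limited attempts."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the failure
            sleep(base_delay * 2 ** attempt)


# Demo: a fake request that is rate limited twice, then succeeds.
attempts = []
delays = []


def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise RateLimited()
    return "ok"


result = with_retries(flaky, sleep=delays.append)  # record delays instead of sleeping
print(result, delays)  # ok [1.0, 2.0]
```

If the API sends a header saying how long to wait, honoring that value is usually better than a fixed schedule.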

Hint 3 (Technical Details): Normalize timestamps to a single timezone and format.
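As one possible approach, the sketch below converts timestamps to UTC in ISO 8601 format. It assumes the API returns naive timestamps in exchange-local time (`America/New_York` here). Check what your provider actually sends, since many already return UTC or epoch seconds. It needs Python 3.9+ for `zoneinfo`, plus the `tzdata` package on systems without a time zone database.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

NEW_YORK = ZoneInfo("America/New_York")  # assumption: source timestamps are exchange-local


def to_utc_iso(local_text: str, tz: ZoneInfo = NEW_YORK) -> str:
    """Interpret a naive ISO timestamp in `tz` and return it as a UTC ISO 8601 string."""
    naive = datetime.fromisoformat(local_text)
    aware = naive.replace(tzinfo=tz)  # attach the zone without shifting the clock time
    return aware.astimezone(timezone.utc).isoformat()


winter = to_utc_iso("2023-01-03 16:00:00")  # Eastern Standard Time, UTC-5
summer = to_utc_iso("2023-07-03 13:00:00")  # Eastern Daylight Time, UTC-4
print(winter)  # 2023-01-03T21:00:00+00:00
print(summer)  # 2023-07-03T17:00:00+00:00
```

The two offsets differ because of daylight saving time, which is why a fixed "minus five hours" adjustment is not enough.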

Hint 4 (Tools/Debugging): Log API responses for a sample request.


Books That Will Help

Topic          Book                                         Chapter
Market data    “Trading and Exchanges” by Larry Harris      Ch. 3
Time series    “Python for Data Analysis” by Wes McKinney   Ch. 11
Data cleaning  “Python for Data Analysis” by Wes McKinney   Ch. 4

Implementation Hints

  • Keep API keys in environment variables.
  • Always sort by timestamp before saving.
  • Add a metadata header with source and fetch date.
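The three hints above might fit together as in the sketch below. The environment variable name `MARKET_API_KEY`, the `#`-prefixed header lines, and the column order are all assumptions made for illustration, and the rows are made up. Sorting on the date string works because ISO dates sort chronologically as text.

```python
import csv
import io
import os
from datetime import date

# Hypothetical variable name; the value is None when the variable is unset.
API_KEY = os.environ.get("MARKET_API_KEY")

COLUMNS = ["date", "open", "high", "low", "close", "volume"]  # assumed layout


def write_csv(rows, out, source, fetched_on):
    """Write a commented metadata header, then rows sorted by date."""
    out.write(f"# source: {source}\n")
    out.write(f"# fetched: {fetched_on.isoformat()}\n")
    writer = csv.DictWriter(out, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in sorted(rows, key=lambda r: r["date"]):
        writer.writerow(row)


# Made-up rows, deliberately out of order.
rows = [
    {"date": "2023-01-04", "open": 10.5, "high": 11.5, "low": 10.0, "close": 11.0, "volume": 1200},
    {"date": "2023-01-03", "open": 10.0, "high": 11.0, "low": 9.5, "close": 10.5, "volume": 1000},
]
buf = io.StringIO()
write_csv(rows, buf, source="example-provider", fetched_on=date(2024, 1, 2))
lines = buf.getvalue().splitlines()
print(lines[0])  # # source: example-provider
print(lines[3])  # 2023-01-03,10.0,11.0,9.5,10.5,1000
```

Comment lines in a CSV are not part of the format, so whatever reads the file has to skip them. pandas, for example, can do this with `read_csv(..., comment="#")`. A separate sidecar file for metadata is an equally reasonable choice.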

Learning Milestones

  1. First milestone: You can fetch and store OHLCV data.
  2. Second milestone: You can normalize timestamps and handle missing days.
  3. Final milestone: You can build a reliable data pipeline for analysis.