Project 1: Real-Time Earthquake Monitor
Build a reliable geospatial pipeline that ingests live USGS earthquake events and publishes a trustworthy interactive map plus operational summary.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 1: Beginner (The Tinkerer) |
| Time Estimate | 6-8 hours |
| Main Programming Language | Python (Alternatives: TypeScript, R, Julia) |
| Coolness Level | Level 3: Genuinely Clever |
| Business Potential | 2. The “Micro-SaaS / Pro Tool” |
| Prerequisites | Python basics, JSON parsing, CRS fundamentals |
| Key Topics | GeoJSON, CRS hygiene, filtering, map communication |
1. Learning Objectives
By completing this project, you will:
- Ingest and validate real-time geospatial event feeds.
- Normalize event schema into a reproducible table shape.
- Apply deterministic spatial and temporal filters.
- Publish an interactive map artifact and an operational run summary.
- Explain why display correctness does not guarantee analytical correctness.
2. All Theory Needed (Per-Concept Breakdown)
2.1 GeoJSON Event Modeling
Fundamentals
- GeoJSON FeatureCollections contain features, each with geometry and properties.
- Earthquake events are points with coordinates in lon/lat order.
- RFC 7946 specifies WGS84 longitude/latitude in decimal degrees for all coordinates, which is what makes feeds interoperable.
Deep Dive
The main risk in live-feed projects is schema trust. Feeds evolve, fields may be null, and optional fields may appear or disappear without warning. You need a canonical schema contract that defines which fields are required, optional, and derived. A robust flow validates at ingest, normalizes into canonical records, and logs rejects by reason. This protects downstream mapping and analytics from silent corruption.
GeoJSON is excellent for interchange but not a full analytics contract. For example, geometry may be valid while key properties are missing. Treat geometry validity and attribute completeness as separate QA checks. Also, do not trust coordinate order from upstream transformations blindly; enforce explicit (lon, lat) mapping and test with known landmarks.
How this fits in this project
- Feed ingestion, normalization table, reject logging.
Minimal concrete example
Pseudo-contract:
required: event_id, event_time_utc, magnitude, geometry_point
optional: depth_km, place_text, source_url
rejected_if: missing geometry OR missing magnitude
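The pseudo-contract above could be enforced with a small validator along these lines. This is a minimal sketch, not the reference implementation: the property names `mag` and `time` and the feature-level `id` are assumed to follow the USGS GeoJSON feed layout, and the reject-reason strings and function names are hypothetical.

```python
from collections import Counter
from typing import Any, Optional


def reject_reason(feature: dict[str, Any]) -> Optional[str]:
    """Return None when the feature meets the contract, else a reason code."""
    if not feature.get("id"):
        return "missing_event_id"
    geometry = feature.get("geometry") or {}
    coords = geometry.get("coordinates")
    # Geometry must be a Point carrying at least (lon, lat).
    if (
        geometry.get("type") != "Point"
        or not isinstance(coords, (list, tuple))
        or len(coords) < 2
    ):
        return "missing_geometry"
    props = feature.get("properties") or {}
    if props.get("mag") is None:  # assumed USGS property name
        return "missing_magnitude"
    if props.get("time") is None:  # assumed USGS property name
        return "missing_event_time"
    return None


def split_valid_and_rejected(features):
    """Separate valid features and count rejects by reason."""
    valid, rejects = [], Counter()
    for feature in features:
        reason = reject_reason(feature)
        if reason is None:
            valid.append(feature)
        else:
            rejects[reason] += 1
    return valid, rejects
```

Returning a reason code rather than a boolean keeps geometry validity and attribute completeness visible as separate QA signals in the reject log.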
2.2 CRS and Filtering Invariants
Fundamentals
- Feed coordinates are typically EPSG:4326.
- BBox filters assume coordinate order and unit interpretation.
- Distance-based filters require a projected CRS in meters, or an explicit geodesic calculation.
Deep Dive
Many earthquake dashboards only need point display, but analytics often add radius filters, nearest-facility metrics, or cluster distance thresholds. Those operations give wrong results when performed directly in angular degrees, because a degree of longitude covers less ground distance as latitude increases. For this project, establish a two-phase CRS policy: ingest in the source CRS, and reproject only when metric computations are needed. Keep this policy explicit so that later extensions do not inherit hidden assumptions.
How this fits in this project
- BBox correctness, optional distance-based enrichment.
Minimal concrete example
if operation in [distance, buffer, nearest]:
    reproject to local meter-based CRS first
else:
    keep source CRS for map display/export
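To see why degrees cannot be used as distances, it can help to compute a metric distance directly. The sketch below does not reproject; it swaps in the haversine great-circle formula on a spherical Earth, which is a dependency-free approximation and not a replacement for a proper projected CRS. The function names and the `longitude`/`latitude` keys are assumptions for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation


def haversine_km(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in kilometers between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def within_radius(events, center_lon, center_lat, radius_km):
    """Keep events whose epicenter lies within radius_km of the center."""
    return [
        e for e in events
        if haversine_km(e["longitude"], e["latitude"], center_lon, center_lat)
        <= radius_km
    ]
```

One degree of longitude spans about 111 km at the equator but only about 56 km at 60 degrees latitude, which is exactly the distortion a naive degree-based radius filter would ignore.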
2.3 Operational Observability
Fundamentals
- Live feeds require run-level telemetry.
- Counts should be logged at each stage.
Deep Dive
A successful exit code is not enough. You need stage-level observability to detect schema drift, feed outages, and over-restrictive filters. Minimum operational counters: raw events, valid events, rejected events by reason, filtered events, exported events. Add runtime and source timestamp metadata to every output. This turns a toy map into an operable geospatial service.
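One way to hold the minimum counters is a single summary object with a few cross-stage consistency checks. This is a sketch under assumptions: the class name, field names, and the specific invariants are illustrative choices, and `filtered_events` is taken to mean the rows remaining after filtering.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class RunSummary:
    """Stage counters for one pipeline run (hypothetical field names)."""
    raw_events: int = 0
    valid_events: int = 0
    rejected_by_reason: dict[str, int] = field(default_factory=dict)
    filtered_events: int = 0  # rows remaining after filters
    exported_events: int = 0
    runtime_seconds: float = 0.0
    generated_at_utc: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def check_invariants(self) -> list[str]:
        """Return human-readable problems; an empty list means consistent."""
        problems = []
        rejected = sum(self.rejected_by_reason.values())
        if self.valid_events + rejected != self.raw_events:
            problems.append("valid + rejected != raw")
        if self.filtered_events > self.valid_events:
            problems.append("filtered > valid")
        if self.exported_events != self.filtered_events:
            problems.append("exported != filtered")
        return problems

    def to_json(self) -> str:
        """Serialize with sorted keys so diffs between runs stay stable."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

Checking that valid plus rejected equals raw catches records that vanish silently between stages, which a bare exit code would never reveal.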
3. Project Specification
3.1 What You Will Build
A command-line pipeline that:
- fetches the latest USGS earthquake feed,
- validates and normalizes records,
- applies time/magnitude/region filters,
- exports interactive map and summary outputs.
Included:
- deterministic run logs,
- map with event popups,
- export artifacts for downstream workflows.
Excluded:
- push notifications,
- multi-source feed fusion,
- complex hazard forecasting.
3.2 Functional Requirements
- Fetch GeoJSON from configured endpoint with timeout/retry policy.
- Validate each feature against canonical schema.
- Support filters: --hours, --min-mag, --bbox.
- Render map markers with consistent style rules.
- Emit summary file with QA counters.
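The timeout/retry requirement might look like the following standard-library sketch. The function names, the exponential backoff schedule, and the choice to inject the `fetch` and `sleep` callables (so tests never touch the network) are assumptions, not part of the specification.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Any, Callable


def http_get_json(url: str, timeout_s: float = 10.0) -> dict[str, Any]:
    """Fetch a URL and decode its JSON body, with a socket timeout."""
    with urllib.request.urlopen(url, timeout=timeout_s) as response:
        return json.load(response)


def fetch_with_retry(
    fetch: Callable[[str], dict[str, Any]],
    url: str,
    attempts: int = 3,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Call fetch(url), retrying transient failures with exponential backoff."""
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(url)
        except (urllib.error.URLError, TimeoutError) as exc:
            last_error = exc
            if attempt < attempts - 1:
                # Delays grow as base, 2*base, 4*base, ...
                sleep(base_delay_s * 2 ** attempt)
    raise RuntimeError(f"feed fetch failed after {attempts} attempts") from last_error
```

In live mode the call would be `fetch_with_retry(http_get_json, configured_url)`, while fixture mode can pass a function that reads a local file instead.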
3.3 Non-Functional Requirements
- Reliability: Graceful handling for empty feed and transient HTTP failures.
- Reproducibility: Frozen fixture mode for deterministic testing.
- Interpretability: Every output includes filter configuration and generation timestamp.
3.4 Data Formats / Schemas
Input: GeoJSON FeatureCollection
Canonical output columns:
- event_id (string)
- event_time_utc (ISO-8601)
- magnitude (float)
- depth_km (float | null)
- longitude (float)
- latitude (float)
- place (string | null)
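Mapping one validated feature onto these columns could be sketched as below. It assumes the USGS feed layout: a feature-level `id`, `properties.mag`, `properties.place`, `properties.time` in milliseconds since the Unix epoch, and coordinates ordered `[longitude, latitude, depth_km]`. The function name is hypothetical.

```python
from datetime import datetime, timezone
from typing import Any


def normalize_feature(feature: dict[str, Any]) -> dict[str, Any]:
    """Convert one validated GeoJSON feature into a canonical row."""
    props = feature["properties"]
    coords = feature["geometry"]["coordinates"]
    # Explicit (lon, lat) unpacking: GeoJSON order is longitude first.
    lon, lat = float(coords[0]), float(coords[1])
    depth = float(coords[2]) if len(coords) > 2 and coords[2] is not None else None
    # Assumed: the feed reports time as epoch milliseconds.
    event_time = datetime.fromtimestamp(props["time"] / 1000, tz=timezone.utc)
    return {
        "event_id": str(feature["id"]),
        "event_time_utc": event_time.isoformat(),
        "magnitude": float(props["mag"]),
        "depth_km": depth,
        "longitude": lon,
        "latitude": lat,
        "place": props.get("place"),
    }
```

Naming the longitude and latitude columns separately, rather than carrying a coordinate pair forward, makes a swapped axis order much easier to spot in the CSV output.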
3.5 Edge Cases
- Missing geometry
- Null magnitude
- Time parse failure
- BBox crossing antimeridian
- Empty result after filtering
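The antimeridian case is the least obvious of these. A minimal sketch, assuming the `--bbox` value is ordered `min_lon,min_lat,max_lon,max_lat` and that a west edge greater than the east edge signals a crossing (the convention RFC 7946 describes for bounding boxes); the function names are hypothetical.

```python
def parse_bbox(text: str) -> tuple[float, float, float, float]:
    """Parse 'min_lon,min_lat,max_lon,max_lat' into four floats."""
    parts = [float(p) for p in text.split(",")]
    if len(parts) != 4:
        raise ValueError(f"bbox needs 4 comma-separated numbers, got {len(parts)}")
    min_lon, min_lat, max_lon, max_lat = parts
    if min_lat > max_lat:
        raise ValueError("bbox min_lat must not exceed max_lat")
    return min_lon, min_lat, max_lon, max_lat


def in_bbox(lon: float, lat: float, bbox: tuple[float, float, float, float]) -> bool:
    """Test a point against a bbox, handling antimeridian-crossing boxes."""
    min_lon, min_lat, max_lon, max_lat = bbox
    if not (min_lat <= lat <= max_lat):
        return False
    if min_lon <= max_lon:
        return min_lon <= lon <= max_lon
    # West edge > east edge: the box wraps across +/-180 degrees.
    return lon >= min_lon or lon <= max_lon
```

With this rule, a box such as `170,-50,-170,-10` covers a 20-degree-wide band around the date line instead of being treated as empty or as nearly the whole globe.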
3.6 Real World Outcome
3.6.1 How to Run (Copy/Paste)
$ python run_project1_pipeline.py --hours 24 --min-mag 3.5 --bbox "-125,24,-66,50"
3.6.2 Golden Path Demo (Deterministic)
$ python run_project1_pipeline.py --fixture fixtures/usgs_sample_2026_02_11.json --hours 24 --min-mag 4.0
[INFO] fixture rows=120
[INFO] valid rows=116 rejected=4
[INFO] filtered rows=37
[INFO] outputs written: map/html + csv + summary
[DONE] runtime=2.1s
3.6.3 Exact Terminal Transcript (Live)
$ python run_project1_pipeline.py --hours 24 --min-mag 3.5 --bbox "-125,24,-66,50"
[INFO] Fetching USGS feed
[INFO] Retrieved 142 events
[INFO] Validated 139 records (3 rejected)
[INFO] Region filter kept 57 records
[INFO] Wrote outputs/earthquakes_latest.html
[INFO] Wrote outputs/earthquakes_latest.csv
[INFO] Wrote outputs/earthquakes_summary.txt
[DONE] Completed in 8.4 seconds
4. Solution Architecture
4.1 High-Level Design
Feed Fetcher -> Schema Validator -> Normalizer -> Spatial/Temporal Filter -> Map Renderer -> Exporter
4.2 Key Components
| Component | Responsibility | Key Decision |
|---|---|---|
| Fetcher | Retrieve feed payload | Retry/backoff strategy |
| Validator | Enforce required fields | Reject and log policy |
| Filter Engine | Apply time/mag/bbox filters | Deterministic ordering |
| Renderer | Build interactive map | Style rules by magnitude |
| Reporter | Emit QA summary | Stage counters and runtime |
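For the Renderer's "style rules by magnitude" decision, keeping the rules in one pure function makes them testable without building a map. The thresholds, colors, and radius formula below are illustrative assumptions, not values prescribed by this project; the returned dictionary is plain data that a marker-drawing call could consume.

```python
def marker_style(magnitude: float) -> dict:
    """Map a magnitude to marker styling (hypothetical thresholds)."""
    if magnitude >= 6.0:
        color = "red"
    elif magnitude >= 4.5:
        color = "orange"
    else:
        color = "green"
    # Radius in pixels grows with magnitude but never drops below 3.
    radius = max(3.0, 2.0 * magnitude)
    return {"color": color, "radius": radius, "fill": True}
```

Because the same magnitude always yields the same style, two runs over the same fixture produce visually identical maps.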
4.3 Algorithm Overview
- Fetch payload.
- Parse features.
- Validate and normalize.
- Apply filters.
- Render and export.
Complexity:
- Time: O(n)
- Space: O(n)
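The "apply filters" step, with the deterministic ordering named in the component table, could be sketched as follows. It assumes rows already use the canonical columns with `event_time_utc` as an offset-aware ISO-8601 string, and it takes the reference time as a parameter so fixture runs do not depend on the wall clock. Note that the final sort adds an O(n log n) term on top of the linear scan.

```python
from datetime import datetime, timedelta


def filter_events(rows, now_utc: datetime, hours: float, min_mag: float):
    """Keep recent, large-enough events and return them in a stable order."""
    cutoff = now_utc - timedelta(hours=hours)
    kept = []
    for row in rows:
        event_time = datetime.fromisoformat(row["event_time_utc"])
        if event_time >= cutoff and row["magnitude"] >= min_mag:
            kept.append((event_time, row["event_id"], row))
    # Sort by time, then event_id, so equal timestamps still order the same way.
    kept.sort(key=lambda item: item[:2])
    return [row for _, _, row in kept]
```

Passing `now_utc` explicitly is what lets a frozen fixture from a past date produce the same filtered count on every run.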
5. Implementation Guide
5.1 Development Environment Setup
$ mamba create -n geo-p01 python=3.11 geopandas folium requests -y
$ mamba activate geo-p01
5.2 Project Structure
project1/
├── src/
│ ├── ingest.py
│ ├── validate.py
│ ├── filter.py
│ ├── render.py
│ └── main.py
├── fixtures/
├── outputs/
└── tests/
5.3 The Core Question You Are Answering
“How do I produce a map that remains correct when the upstream feed is noisy and changing?”
5.4 Concepts You Must Understand First
- GeoJSON schema expectations.
- CRS and coordinate order invariants.
- Deterministic filtering and logging.
5.5 Questions to Guide Your Design
- What should happen when the feed returns malformed rows?
- Which counters prove that filtering is behaving correctly?
- How will you distinguish data outage from true low-seismic activity?
5.6 Thinking Exercise
Trace one malformed event through your pipeline and decide exactly where and how it is rejected.
5.7 Interview Questions
- Why should a map pipeline have reject logs?
- What is the difference between display CRS and analysis CRS?
- How do you make live-feed outputs testable?
5.8 Hints in Layers
- Hint 1: Implement validator first.
- Hint 2: Add fixture mode before live mode.
- Hint 3: Separate normalization from filtering.
- Hint 4: Emit stage counters in one summary object.
6. Testing Strategy
6.1 Test Categories
| Category | Purpose |
|---|---|
| Unit | Validate schema checks and filter behavior |
| Integration | Verify end-to-end artifacts are generated |
| Edge Case | Handle null fields, empty results, malformed geometry |
6.2 Critical Test Cases
- Fixture with missing magnitude should increment reject counter.
- BBox filter should include known in-bounds event and exclude out-of-bounds event.
- Empty filtered set should still produce summary file with zero count.
7. Common Pitfalls & Debugging
| Pitfall | Symptom | Fix |
|---|---|---|
| Lat/lon swapped | Markers appear in wrong region | Enforce explicit coordinate mapping |
| Silent schema drift | Sudden count collapse | Add required-field validation metrics |
| Timezone mismatch | Events unexpectedly excluded | Normalize all times to UTC |
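The timezone fix in the last row can be made concrete with a small helper. This sketch assumes timestamps arrive as ISO-8601 strings with an explicit numeric offset, and it rejects naive timestamps rather than guessing their zone; the function name is hypothetical.

```python
from datetime import datetime, timezone


def to_utc_iso(text: str) -> str:
    """Convert an offset-aware ISO-8601 timestamp to its UTC equivalent."""
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        # A naive time is ambiguous; treat it as a reject, not as local time.
        raise ValueError(f"timestamp has no UTC offset: {text!r}")
    return parsed.astimezone(timezone.utc).isoformat()
```

For example, `to_utc_iso("2026-02-11T01:30:00-08:00")` yields `2026-02-11T09:30:00+00:00`, so an `--hours` cutoff computed in UTC compares like with like.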
8. Extensions & Challenges
- Add depth-based thematic layers.
- Add historical baseline comparison for anomaly detection.
- Add lightweight API endpoint for latest summary.
9. Real-World Connections
- Disaster-response monitoring dashboards.
- Insurance and risk exposure awareness tooling.
- Public alerting and situational awareness portals.
10. Resources
- USGS Earthquake API: https://earthquake.usgs.gov/fdsnws/event/1/
- GeoJSON RFC 7946: https://www.rfc-editor.org/rfc/rfc7946
- GeoPandas docs: https://geopandas.org/en/stable/docs/user_guide.html
- Folium docs: https://python-visualization.github.io/folium/latest/
11. Self-Assessment Checklist
- I can explain the feed schema and reject policy.
- I can reproduce identical output from a fixture.
- I can justify CRS choices for any metric operation.
- I can diagnose sudden count anomalies using stage counters.
12. Submission / Completion Criteria
Minimum Viable Completion
- Deterministic fixture run works end-to-end.
- Live run produces map, csv, and summary.
Full Completion
- Includes robust reject logging and edge-case handling.
- Includes concise runbook for operators.
Excellence
- Adds automated regression check for feed schema drift.
- Adds trend comparison against prior run baseline.