Project 4: The Data Synthesizer - Merging and Joining Datasets
Build a notebook that merges multiple datasets into one unified table for analysis.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Advanced |
| Time Estimate | Weekend |
| Language | Python |
| Prerequisites | Projects 1-3 |
| Key Topics | joins, keys, merge types |
| Output | Unified dataset + analysis |
Learning Objectives
- Perform inner and left joins and explain which rows each one keeps.
- Join tables whose key columns have different names.
- Validate row counts after every join.
- Use the merged data to answer an analysis question.
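The first two objectives can be sketched as follows. This is a minimal example, not the project's required solution: the `customers` and `orders` tables and all column names are hypothetical sample data, chosen so the key is called `customer_id` on one side and `cust_id` on the other.

```python
import pandas as pd

# Hypothetical sample data; table and column names are illustrative only.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "name": ["Ana", "Ben", "Cho"],
})
orders = pd.DataFrame({
    "order_id": [10, 11, 12],
    "cust_id": [1, 1, 4],  # same key as customer_id, but a different column name
    "amount": [25.0, 40.0, 15.0],
})

# Inner join: keeps only rows whose key appears in both tables.
inner = customers.merge(
    orders, left_on="customer_id", right_on="cust_id", how="inner"
)

# Left join: keeps every customer; order columns are NaN where nothing matched.
left = customers.merge(
    orders, left_on="customer_id", right_on="cust_id", how="left"
)

print(len(customers), len(inner), len(left))  # 3 2 4
```

Customer 1 has two orders, so the left join returns four rows from three customers. That is expected here, but it is exactly the kind of row-count change worth checking after each join.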
The Core Question You’re Answering
“How do I combine tables correctly without losing or duplicating data?”
Project Specification
Functional Requirements
- Perform an inner join on a shared key.
- Perform a left join and use it to find unmatched rows.
- Join three tables into one.
- Produce a summary table from the merged data.
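One possible way to cover these requirements in a single pass is shown below. The three tables (`customers`, `orders`, `products`) and their columns are assumptions made up for illustration. The `indicator=True` argument of `DataFrame.merge` adds a `_merge` column that labels each row as `both`, `left_only`, or `right_only`.

```python
import pandas as pd

# Hypothetical sample tables; names and values are illustrative only.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "region": ["North", "South", "North"],
})
orders = pd.DataFrame({
    "order_id": [10, 11, 12],
    "customer_id": [1, 1, 2],
    "product_id": [100, 101, 100],
})
products = pd.DataFrame({
    "product_id": [100, 101],
    "price": [5.0, 8.0],
})

# Left join with an indicator column to find customers that have no orders.
flagged = customers.merge(orders, on="customer_id", how="left", indicator=True)
unmatched = flagged.loc[flagged["_merge"] == "left_only", "customer_id"].tolist()

# Three-table join: chain two merges, one key at a time.
merged = (
    orders
    .merge(customers, on="customer_id", how="inner")
    .merge(products, on="product_id", how="inner")
)

# Summary table: total price per region.
summary = merged.groupby("region", as_index=False)["price"].sum()

print(unmatched)  # [3]
print(summary)
```

Starting the chain from `orders` keeps one row per order, which makes the expected row count after each step easy to state in advance.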
Implementation Guide
Questions to Guide Your Design
- Which keys uniquely identify rows?
- What is the expected row count after each join?
- How will you detect duplicate amplification (rows multiplying because a join key is not unique)?
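These questions can be explored with a few lines of pandas. The sketch below uses hypothetical `orders` and `customers` tables in which `customer_id` is deliberately duplicated on the right side. It relies on `Series.is_unique`, `Series.duplicated`, and the `validate` argument of `DataFrame.merge`, which raises `pandas.errors.MergeError` when the declared relationship does not hold.

```python
import pandas as pd

# Hypothetical data: customer_id 2 appears twice in the lookup table.
orders = pd.DataFrame({"order_id": [10, 11], "customer_id": [1, 2]})
customers = pd.DataFrame({
    "customer_id": [1, 2, 2],
    "name": ["Ana", "Ben", "Ben B."],
})

# 1. Does the key uniquely identify rows in the lookup table?
key_is_unique = customers["customer_id"].is_unique
duplicate_keys = customers[customers["customer_id"].duplicated(keep=False)]

# 2. With a unique right-hand key, a left join should return len(orders) rows.
merged = orders.merge(customers, on="customer_id", how="left")
amplified = len(merged) > len(orders)

# 3. Ask pandas to enforce the expected relationship instead of checking by hand.
try:
    orders.merge(customers, on="customer_id", how="left", validate="many_to_one")
    validation_failed = False
except pd.errors.MergeError:
    validation_failed = True

print(key_is_unique, amplified, validation_failed)  # False True True
```

Comparing `len(merged)` with `len(orders)` catches amplification after the fact, while `validate` stops the merge before the inflated table is ever produced.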
Testing Strategy
- Validate row counts after each join against the expected count.
- Identify unmatched rows and confirm they are the ones you expect.
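Both checks can be written as plain assertions in a notebook cell. The example below is a sketch on hypothetical `left_df` and `right_df` frames; in a real project the expected values would come from what you know about your own data.

```python
import pandas as pd

# Hypothetical frames: id 3 has no match on the right side.
left_df = pd.DataFrame({"id": [1, 2, 3], "x": ["a", "b", "c"]})
right_df = pd.DataFrame({"id": [1, 2], "y": [0.1, 0.2]})

merged = left_df.merge(right_df, on="id", how="left", indicator=True)

# Row count check: the right key is unique, so a left join must not add rows.
assert len(merged) == len(left_df), "left join changed the row count"

# Unmatched check: list the left rows that found no partner.
unmatched_ids = merged.loc[merged["_merge"] == "left_only", "id"].tolist()
assert unmatched_ids == [3], f"unexpected unmatched ids: {unmatched_ids}"
```

If either assertion fails, the message points at the join that broke the expectation, which is usually faster than discovering the problem in the final summary.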
Extensions
- Produce a join audit report (row counts and match status for each join).
- Write a reusable merge utility.
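Both extensions could be combined into one small helper. The function below, `audited_merge`, is a hypothetical name and design, not part of pandas. It wraps `DataFrame.merge` with `indicator=True` and returns the merged frame together with a dictionary of counts. It assumes neither input already has a column named `_merge`.

```python
import pandas as pd


def audited_merge(left, right, on, how="left", validate=None):
    """Hypothetical helper: merge two frames and return (merged, audit)."""
    merged = left.merge(
        right, on=on, how=how, validate=validate, indicator=True
    )
    status = merged["_merge"]
    audit = {
        "left_rows": len(left),
        "right_rows": len(right),
        "merged_rows": len(merged),
        "both": int((status == "both").sum()),
        "left_only": int((status == "left_only").sum()),
        "right_only": int((status == "right_only").sum()),
    }
    # Drop the helper column so the result looks like an ordinary merge.
    return merged.drop(columns="_merge"), audit


# Usage on hypothetical data.
left = pd.DataFrame({"id": [1, 2, 3]})
right = pd.DataFrame({"id": [2, 3, 4], "v": [20, 30, 40]})
merged, audit = audited_merge(left, right, on="id", how="outer")
print(audit)
```

Returning the audit as a dictionary keeps the helper easy to log or to collect into a DataFrame when several joins are chained.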
This guide was generated from LEARN_PANDAS_DEEP_DIVE.md. For the complete learning path, see the parent directory LEARN_PANDAS_DEEP_DIVE/README.md.