Project 4: The Data Synthesizer - Merging and Joining Datasets

Build a notebook that merges multiple datasets into one unified table for analysis.


Quick Reference

Attribute       Value
Difficulty      Advanced
Time Estimate   Weekend
Language        Python
Prerequisites   Projects 1-3
Key Topics      joins, keys, merge types
Output          Unified dataset + analysis

Learning Objectives

  1. Perform inner and left joins.
  2. Join tables whose key columns have different names.
  3. Validate row counts after joins.
  4. Use the merged data to answer an analytical question.
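
As a minimal sketch of the first two objectives, the example below compares an inner join with a left join when the key column has a different name in each table. The tables and column names (`customers`, `orders`, `customer_id`, `cust_id`) are hypothetical sample data, not part of the project specification.

```python
import pandas as pd

# Hypothetical sample data; all names here are assumptions for illustration.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "name": ["Ada", "Ben", "Cy"],
})
orders = pd.DataFrame({
    "order_id": [10, 11, 12],
    "cust_id": [1, 1, 4],  # same key as customer_id, but named differently
    "amount": [25.0, 40.0, 15.0],
})

# Inner join: keeps only keys present in both tables.
inner = customers.merge(
    orders, left_on="customer_id", right_on="cust_id", how="inner"
)

# Left join: keeps every customer; order columns are NaN where no order matched.
left = customers.merge(
    orders, left_on="customer_id", right_on="cust_id", how="left"
)

print(len(inner))  # 2: customer 1 matches two orders
print(len(left))   # 4: two rows for customer 1, one each for customers 2 and 3
```

Note that the left join returns more rows than `customers` has, because customer 1 matches two orders. That is expected for a one-to-many relationship, and it is why the row count needs checking after each join.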

The Core Question You’re Answering

“How do I combine tables correctly without losing or duplicating data?”


Project Specification

Functional Requirements

  1. Perform an inner join on a shared key.
  2. Use a left join to find unmatched rows.
  3. Join three tables into one.
  4. Produce a summary table from the merged data.
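
One possible way to cover these requirements in a notebook is sketched below. The three tables (`customers`, `orders`, `products`) and their columns are invented for illustration; substitute the datasets you actually choose.

```python
import pandas as pd

# Hypothetical sample tables; names and values are assumptions.
customers = pd.DataFrame({"customer_id": [1, 2, 3], "region": ["EU", "US", "EU"]})
orders = pd.DataFrame({
    "order_id": [10, 11, 12, 13],
    "customer_id": [1, 1, 2, 2],
    "product_id": [100, 101, 100, 101],
    "quantity": [2, 1, 5, 3],
})
products = pd.DataFrame({"product_id": [100, 101], "price": [4.0, 10.0]})

# Left join with indicator=True adds a "_merge" column that labels each row
# as "both", "left_only", or "right_only".
flagged = customers.merge(orders, on="customer_id", how="left", indicator=True)
unmatched = flagged.loc[flagged["_merge"] == "left_only", "customer_id"].tolist()

# Three-table join: chain two inner merges on their shared keys.
merged = (
    orders
    .merge(customers, on="customer_id", how="inner")
    .merge(products, on="product_id", how="inner")
)
merged["revenue"] = merged["quantity"] * merged["price"]

# Summary table: total revenue per region.
summary = merged.groupby("region", as_index=False)["revenue"].sum()

print(unmatched)  # [3]: customer 3 has no orders
print(summary)
```

Starting the chain from `orders` keeps one row per order, so the merged row count should equal `len(orders)` as long as the keys in `customers` and `products` are unique.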

Implementation Guide

Questions to Guide Your Design

  1. Which keys uniquely identify rows?
  2. What is the expected row count after each join?
  3. How will you detect duplicate amplification (extra rows created when a join key is not unique)?
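
The questions above can be explored with a small experiment. The sketch below uses hypothetical data in which the lookup table contains a repeated key, then shows three checks: testing key uniqueness, comparing row counts, and asking pandas to enforce the expected relationship through the `validate` parameter of `merge`.

```python
import pandas as pd
from pandas.errors import MergeError

# Hypothetical data: product 100 appears twice in the lookup table.
orders = pd.DataFrame({"order_id": [1, 2, 3], "product_id": [100, 100, 101]})
products = pd.DataFrame({
    "product_id": [100, 100, 101],
    "price": [4.0, 4.5, 10.0],
})

# Question 1: is the key unique on the lookup side?
key_is_unique = products["product_id"].is_unique
dup_mask = products["product_id"].duplicated(keep=False)
dup_keys = products.loc[dup_mask, "product_id"].unique().tolist()

# Questions 2 and 3: a left join from orders should return len(orders) rows.
# More rows than that means a duplicated key amplified the result.
naive = orders.merge(products, on="product_id", how="left")
amplified = len(naive) > len(orders)

# validate="many_to_one" makes pandas raise MergeError when the right-hand
# key is not unique, instead of silently multiplying rows.
try:
    orders.merge(products, on="product_id", how="left", validate="many_to_one")
    caught = False
except MergeError:
    caught = True

print(key_is_unique, dup_keys)  # False [100]
print(len(orders), len(naive))  # 3 5
print(amplified, caught)        # True True
```

How to resolve the duplicates (dropping rows, aggregating, or choosing a more specific key) depends on the data and is left as a design decision.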

Testing Strategy

  • Row counts are validated after each join.
  • Unmatched rows are identified.

Extensions

  • Join audit report.
  • Reusable merge utility.
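
Both extensions could be combined into one helper. The function below, `audited_merge`, is a hypothetical sketch rather than a prescribed design: it wraps `DataFrame.merge`, uses `indicator=True` to count matched and unmatched rows, and returns the merged table together with a small audit dictionary. For simplicity it assumes the key has the same name in both tables and that neither table already has a column called `_merge`.

```python
import pandas as pd

def audited_merge(left, right, on, how="left", validate=None):
    """Merge two DataFrames and return (merged, report).

    Hypothetical helper for illustration; the report keys are assumptions.
    """
    merged = left.merge(right, on=on, how=how, validate=validate, indicator=True)
    counts = merged["_merge"].value_counts()
    report = {
        "left_rows": len(left),
        "right_rows": len(right),
        "merged_rows": len(merged),
        "matched": int(counts.get("both", 0)),
        "left_only": int(counts.get("left_only", 0)),
        "right_only": int(counts.get("right_only", 0)),
    }
    # Drop the helper column so the result looks like an ordinary merge.
    return merged.drop(columns="_merge"), report

# Example usage with hypothetical data.
customers = pd.DataFrame({"customer_id": [1, 2, 3]})
orders = pd.DataFrame({"customer_id": [1, 1, 4], "amount": [25.0, 40.0, 15.0]})

result, report = audited_merge(customers, orders, on="customer_id", how="outer")
print(report)
```

With an outer join the report shows both sides of the mismatch: here two merged rows match, customers 2 and 3 have no orders, and one order refers to an unknown customer 4.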

This guide was generated from LEARN_PANDAS_DEEP_DIVE.md. For the complete learning path, see the parent directory LEARN_PANDAS_DEEP_DIVE/README.md.