Project 2: The Data Janitor - Cleaning a Messy Dataset

Build a notebook that diagnoses and fixes missing values, incorrect dtypes, and duplicate rows.

Quick Reference

Attribute	Value
Difficulty	Intermediate
Time Estimate	Weekend
Language	Python
Prerequisites	Project 1
Key Topics	missing data, dtype conversion, cleaning
Output	Cleaned dataset + report

Learning Objectives

Detect missing values and data quality issues.
Apply fill/drop strategies by column.
Convert columns to correct dtypes.
Normalize string columns.
Produce a documented cleaning report.
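
The dtype-conversion and string-normalization objectives can be sketched as follows. This is a minimal illustration, not a prescribed solution: the `raw` frame, its column names, and the `normalize_text` helper are all hypothetical, and the choice to coerce unparseable entries to missing values is one option among several.

```python
import pandas as pd

# Hypothetical messy data; the column names are illustrative only.
raw = pd.DataFrame({
    "city": ["  New York", "new york ", "BOSTON", None],
    "price": ["10.5", "n/a", "7", "3.25"],
    "signup": ["2024-01-05", "not a date", "2024-02-10", None],
})

def normalize_text(col: pd.Series) -> pd.Series:
    """Trim whitespace and lower-case; missing values stay missing."""
    return col.str.strip().str.lower()

cleaned = raw.assign(
    city=normalize_text(raw["city"]),
    # errors="coerce" turns unparseable entries into NaN / NaT instead of raising.
    price=pd.to_numeric(raw["price"], errors="coerce"),
    signup=pd.to_datetime(raw["signup"], errors="coerce"),
)

print(cleaned.dtypes)
```

Coercion hides bad input as missing data, so it is worth counting missing values before and after the conversion to see how many entries failed to parse.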

The Core Question You’re Answering

“How do I turn messy real-world data into a clean, analyzable table?”

Project Specification

Functional Requirements

Missing-value report per column.
At least two cleaning strategies applied.
Type conversion for numeric and date columns.
Duplicate detection and handling.
Save cleaned output.
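
One way the requirements above might fit together is sketched below. The dataset, the choice of `order_id` as the critical column, the median fill, the `"unknown"` label, and the output location are all assumptions made for illustration; your own data will call for different choices.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical dataset: "order_id" is treated as critical, "amount" as fillable.
df = pd.DataFrame({
    "order_id": [1, 2, 2, None, 5],
    "amount": [20.0, None, None, 15.0, 40.0],
    "region": ["east", "west", "west", "east", None],
})

def missing_report(frame: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column."""
    counts = frame.isna().sum()
    return pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(1),
    })

report_before = missing_report(df)

# Strategy 1: drop rows that lack the critical identifier.
step1 = df.dropna(subset=["order_id"])
# Strategy 2: fill the numeric column with its median and the text column with a label.
step2 = step1.fillna({"amount": step1["amount"].median(), "region": "unknown"})

# Duplicates: count fully identical rows, then keep the first occurrence of each.
n_duplicates = int(step2.duplicated().sum())
clean = step2.drop_duplicates().reset_index(drop=True)

# Save the result; the temp-directory path is a placeholder.
out_path = Path(tempfile.gettempdir()) / "cleaned.csv"
clean.to_csv(out_path, index=False)

print(report_before)
print(missing_report(clean))
```

Running `missing_report` both before and after cleaning gives you the two tables a cleaning report needs side by side.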

Implementation Guide

Questions to Guide Your Design

Which columns are critical and cannot be missing?
When should you drop rows versus fill in values?
How will you document decisions?
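
One possible answer to these questions is to write the per-column decisions down as data and have the cleaning code record what each one did. The sketch below assumes a hypothetical `POLICY` dictionary and `apply_policy` helper; the column names, actions, and reasons are invented for illustration.

```python
import pandas as pd

# Hypothetical per-column policy: names, actions, and reasons are illustrative.
POLICY = {
    "customer_id": {"action": "drop", "reason": "identifier; a row is unusable without it"},
    "age": {"action": "fill_median", "reason": "numeric; the median resists outliers"},
    "segment": {"action": "fill_value", "value": "unknown", "reason": "categorical; keep the row"},
}

def apply_policy(frame: pd.DataFrame, policy: dict) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Apply each column rule in order and return (cleaned frame, decision log)."""
    out = frame.copy()
    log = []
    for column, rule in policy.items():
        missing_before = int(out[column].isna().sum())
        rows_before = len(out)
        if rule["action"] == "drop":
            out = out.dropna(subset=[column])
        elif rule["action"] == "fill_median":
            out = out.assign(**{column: out[column].fillna(out[column].median())})
        elif rule["action"] == "fill_value":
            out = out.assign(**{column: out[column].fillna(rule["value"])})
        else:
            raise ValueError(f"unknown action: {rule['action']}")
        log.append({
            "column": column,
            "action": rule["action"],
            "missing_before": missing_before,
            "rows_removed": rows_before - len(out),
            "reason": rule["reason"],
        })
    return out, pd.DataFrame(log)

customers = pd.DataFrame({
    "customer_id": [101, None, 103, 104],
    "age": [34.0, 51.0, None, 28.0],
    "segment": ["retail", "retail", None, "wholesale"],
})
cleaned, decisions = apply_policy(customers, POLICY)
print(decisions)
```

The `decisions` frame can be pasted into the notebook as the cleaning report, so the rationale for each drop or fill sits next to its measured effect.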

Testing Strategy

Missing-value counts are lower after cleaning than before.
Every column has its intended dtype.
No unintended duplicate rows remain.
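
These three checks could be automated along the following lines. The `check_cleaning` helper, the `EXPECTED` mapping, and the sample frames are hypothetical; the dtype predicates come from `pandas.api.types`.

```python
import pandas as pd

def check_cleaning(before: pd.DataFrame, after: pd.DataFrame, expected: dict) -> list[str]:
    """Return descriptions of failed checks; an empty list means everything passed."""
    failures = []
    # Missing counts should not grow in any column present in both frames.
    for column in after.columns.intersection(before.columns):
        if after[column].isna().sum() > before[column].isna().sum():
            failures.append(f"{column}: missing count increased")
    # Each listed column should satisfy its dtype predicate.
    for column, is_expected_dtype in expected.items():
        if not is_expected_dtype(after[column]):
            failures.append(f"{column}: unexpected dtype {after[column].dtype}")
    # No fully identical rows should remain.
    if after.duplicated().any():
        failures.append("duplicate rows remain")
    return failures

# Hypothetical expectations: column name -> dtype predicate.
EXPECTED = {
    "amount": pd.api.types.is_numeric_dtype,
    "day": pd.api.types.is_datetime64_any_dtype,
}

before = pd.DataFrame({
    "amount": ["1", "2", "2", None],
    "day": ["2024-01-01", "2024-01-02", "2024-01-02", "2024-01-03"],
})
after = before.assign(
    amount=pd.to_numeric(before["amount"], errors="coerce"),
    day=pd.to_datetime(before["day"], errors="coerce"),
).drop_duplicates()
after = after.fillna({"amount": after["amount"].median()})

print(check_cleaning(before, after, EXPECTED))   # cleaned frame
print(check_cleaning(before, before, EXPECTED))  # uncleaned frame, for contrast
```

Note that coercing bad values to missing can legitimately raise a column's missing count partway through a pipeline, so a check like this is most meaningful when run on the final output.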

Extensions

Data quality score.
Unit tests for cleaning steps.
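
The source does not define the data quality score, so the sketch below uses one assumed definition: the average of completeness (share of non-missing cells) and uniqueness (share of rows that are not repeats of an earlier row). The `quality_score` function and its equal weighting are illustrative choices, not a standard metric.

```python
import pandas as pd

def quality_score(frame: pd.DataFrame) -> dict:
    """Assumed score: mean of cell completeness and row uniqueness, each in [0, 1]."""
    if frame.empty:
        return {"completeness": 0.0, "uniqueness": 0.0, "score": 0.0}
    # Share of cells that hold a value.
    completeness = 1.0 - float(frame.isna().to_numpy().mean())
    # Share of rows that are not exact repeats of an earlier row.
    uniqueness = 1.0 - float(frame.duplicated().mean())
    return {
        "completeness": round(completeness, 3),
        "uniqueness": round(uniqueness, 3),
        "score": round((completeness + uniqueness) / 2, 3),
    }

# Hypothetical sample: 2 of 8 cells missing, 1 of 4 rows duplicated.
sample = pd.DataFrame({
    "a": [1, 1, None, 4],
    "b": ["x", "x", "y", None],
})
result = quality_score(sample)
print(result)
```

Computing the score before and after cleaning gives a single number to track, though it should supplement the per-column report rather than replace it.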

This guide was generated from LEARN_PANDAS_DEEP_DIVE.md. For the complete learning path, see the parent directory LEARN_PANDAS_DEEP_DIVE/README.md.