Project 1: The First DataFrame - Exploring and Filtering

Build a notebook that loads a dataset, inspects its schema, and answers basic questions with filters and sorting.

Quick Reference

Attribute	Value
Difficulty	Beginner
Time Estimate	A few hours
Language	Python
Prerequisites	Basic Python
Key Topics	DataFrame, Series, indexing, filtering
Output	Notebook report

Learning Objectives

By completing this project, you will:

Load CSV data into a DataFrame.
Inspect the schema with .info() and summary statistics with .describe().
Select rows/columns with .loc and .iloc.
Filter with boolean masks and compound conditions.
Sort results to answer top-N questions.
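
The first two objectives can be sketched as follows. The movie-style columns and values are hypothetical sample data, read from an in-memory string so the snippet runs without a file; in the project you would pass the path to your own CSV instead.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for data/dataset.csv.
CSV_TEXT = """title,year,genre,rating,votes
Alpha,2008,Drama,7.9,12000
Bravo,2012,Comedy,6.4,8500
Charlie,2015,Drama,8.6,43000
Delta,2019,Action,7.2,27000
Echo,2011,Drama,8.1,15500
"""

# pd.read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(CSV_TEXT))

# .info() prints column names, dtypes, and non-null counts.
df.info()

# .describe() returns summary statistics for the numeric columns by default.
summary = df.describe()
print(summary)
```

Reading .info() first tends to surface type problems early, for example a numeric column that was parsed as text.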

The Core Question You’re Answering

“How do I turn a raw table into answers without writing loops?”
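
To make the question concrete, here is a small sketch that answers "which titles are rated 8.0 or higher?" twice: once with an explicit loop and once with a boolean mask. The data is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data.
ratings = pd.DataFrame({
    "title": ["Alpha", "Bravo", "Charlie", "Echo"],
    "rating": [7.9, 6.4, 8.6, 8.1],
})

# Loop version: visit every row and test it by hand.
high_loop = []
for _, row in ratings.iterrows():
    if row["rating"] >= 8.0:
        high_loop.append(row["title"])

# Vectorized version: the comparison runs on the whole column at once,
# and .loc keeps only the rows where the mask is True.
high_vectorized = ratings.loc[ratings["rating"] >= 8.0, "title"].tolist()

print(high_loop)
print(high_vectorized)
```

Both approaches give the same titles; the vectorized form is shorter and is the style the rest of this project practices.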

Concepts You Must Understand First

Concept	Why It Matters	Where to Learn
Series vs DataFrame	Core data model	Pandas docs
Indexing	Label-based `.loc` vs position-based `.iloc` selection	Pandas docs
Boolean masks	Filtering logic	Pandas docs
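
The three concepts above can be seen side by side in one small sketch. The frame and its string index labels are hypothetical; they are chosen so that labels and positions are easy to tell apart.

```python
import pandas as pd

# Hypothetical data with string labels as the index.
df = pd.DataFrame(
    {
        "title": ["Alpha", "Bravo", "Charlie"],
        "year": [2008, 2012, 2015],
        "rating": [7.9, 6.4, 8.6],
    },
    index=["a", "b", "c"],
)

# Series vs DataFrame: one column name gives a Series,
# a list of column names gives a DataFrame.
ratings = df["rating"]
ratings_frame = df[["rating"]]

# .loc selects by label, .iloc selects by integer position.
by_label = df.loc["b", "title"]
by_position = df.iloc[1, 0]

# .loc slices include the end label; .iloc slices exclude the end position.
loc_slice = df.loc["a":"b"]
iloc_slice = df.iloc[0:1]

# A boolean mask is a Series of True/False values aligned to the rows.
mask = df["year"] > 2010
recent = df[mask]
```

The slicing difference is a common source of off-by-one surprises, so it is worth checking row counts whenever you switch between the two accessors.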

Project Specification

Functional Requirements

Load a dataset with at least 5 columns.
Inspect schema and summary stats.
Answer at least 5 questions with filters.
Sort results for top/bottom analysis.
Summarize findings in markdown.
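
As a rough illustration of the filtering and sorting requirements, the sketch below answers a few questions with compound conditions. The dataset and the questions are hypothetical; substitute ones that fit your own data.

```python
import pandas as pd

# Hypothetical sample data with five columns.
movies = pd.DataFrame({
    "title": ["Alpha", "Bravo", "Charlie", "Delta", "Echo"],
    "year": [2008, 2012, 2015, 2019, 2011],
    "genre": ["Drama", "Comedy", "Drama", "Action", "Drama"],
    "rating": [7.9, 6.4, 8.6, 7.2, 8.1],
    "votes": [12000, 8500, 43000, 27000, 15500],
})

# AND: dramas rated 8.0 or higher. Each condition needs its own
# parentheses because & binds more tightly than == and >=.
good_dramas = movies[(movies["genre"] == "Drama") & (movies["rating"] >= 8.0)]

# OR: released before 2010, or with more than 25,000 votes.
old_or_popular = movies[(movies["year"] < 2010) | (movies["votes"] > 25000)]

# NOT: everything that is not a drama.
not_drama = movies[~(movies["genre"] == "Drama")]

# Membership: genre is one of several values.
comedy_or_action = movies[movies["genre"].isin(["Comedy", "Action"])]

# Bottom analysis: the two lowest-rated titles.
bottom_two = movies.sort_values("rating", ascending=True).head(2)
```

Use &, |, and ~ rather than Python's and, or, and not; the keywords do not operate element-wise on a Series and raise an error.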

Example Output

A notebook section titled “Top 10 highest-rated movies after 2010” with a sorted table.
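
One way such a section might be produced is sketched below. The data is hypothetical, "after 2010" is read as strictly later than 2010, and TOP_N is set to 3 only because the sample has five rows.

```python
import pandas as pd

# Hypothetical sample data.
movies = pd.DataFrame({
    "title": ["Alpha", "Bravo", "Charlie", "Delta", "Echo"],
    "year": [2008, 2012, 2015, 2019, 2011],
    "rating": [7.9, 6.4, 8.6, 7.2, 8.1],
})

TOP_N = 3  # the example output above uses 10

top_recent = (
    movies[movies["year"] > 2010]            # keep releases after 2010
    .sort_values("rating", ascending=False)  # highest rating first
    .head(TOP_N)                             # keep the first TOP_N rows
    .reset_index(drop=True)                  # renumber rows 0..TOP_N-1
)

print(top_recent[["title", "year", "rating"]])
```

DataFrame.nlargest(TOP_N, "rating") is an alternative to the sort_values and head pair when you only need the top rows.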

Implementation Guide

Project Structure

project-root/
├── data/
│   └── dataset.csv
├── notebooks/
│   └── analysis.ipynb
└── README.md
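
With this layout, the notebook sits one directory away from the data. The sketch below rebuilds the layout in a temporary directory so that it runs anywhere, then resolves the CSV path relative to the notebooks folder. The file contents are placeholders, and it assumes the notebook's working directory is notebooks/, which depends on how you launch Jupyter.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Recreate the layout in a temporary directory (for illustration only).
project_root = Path(tempfile.mkdtemp())
(project_root / "data").mkdir()
(project_root / "notebooks").mkdir()
(project_root / "data" / "dataset.csv").write_text(
    "title,year,rating\nAlpha,2008,7.9\nBravo,2012,6.4\n"
)

# From notebooks/, the data folder is one level up.
notebook_dir = project_root / "notebooks"
data_path = notebook_dir.parent / "data" / "dataset.csv"

df = pd.read_csv(data_path)
print(df.head())
```

In the real notebook, Path.cwd().parent / "data" / "dataset.csv" would play the role of data_path under the same assumption.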

Questions to Guide Your Design

What questions are meaningful for this dataset?
What filters make sense for stakeholders?
How will you present results clearly?

Testing Strategy

Filters return expected rows.
Sorting is correct.
Notebook runs top-to-bottom.
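
The first two checks can be written as plain assertions inside the notebook. This is a minimal sketch on hypothetical data, not a full test suite.

```python
import pandas as pd

# Hypothetical sample data.
movies = pd.DataFrame({
    "title": ["Alpha", "Bravo", "Charlie", "Delta", "Echo"],
    "year": [2008, 2012, 2015, 2019, 2011],
    "rating": [7.9, 6.4, 8.6, 7.2, 8.1],
})

recent = movies[movies["year"] > 2010]

# Filter check 1: every row that was kept satisfies the condition.
assert (recent["year"] > 2010).all()

# Filter check 2: no matching row was dropped.
assert len(recent) == (movies["year"] > 2010).sum()

ranked = recent.sort_values("rating", ascending=False)

# Sort check: ratings never increase from one row to the next.
assert ranked["rating"].is_monotonic_decreasing
```

For the third check, restarting the kernel and running all cells is usually enough; running jupyter nbconvert --to notebook --execute notebooks/analysis.ipynb from the project root is a command-line alternative.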

Extensions

Add a chart for one metric.
Export filtered dataset to CSV.
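
The export extension might look like the sketch below. It writes the CSV to a string so the snippet leaves no files behind; the data and the output file name mentioned in the comment are hypothetical.

```python
import io

import pandas as pd

# Hypothetical sample data.
movies = pd.DataFrame({
    "title": ["Alpha", "Bravo", "Charlie", "Delta", "Echo"],
    "year": [2008, 2012, 2015, 2019, 2011],
    "rating": [7.9, 6.4, 8.6, 7.2, 8.1],
})

filtered = movies[movies["rating"] >= 8.0]

# With no path argument, to_csv returns the CSV text.
# index=False leaves the row labels out of the file.
csv_text = filtered.to_csv(index=False)

# In the project you would write to disk instead, for example:
# filtered.to_csv("data/filtered.csv", index=False)

# Read the text back to confirm the export round-trips.
reloaded = pd.read_csv(io.StringIO(csv_text))
print(reloaded)
```

Leaving out index=False adds an unnamed index column to the file, which then shows up as an extra column when the CSV is read back.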

This guide was generated from LEARN_PANDAS_DEEP_DIVE.md. For the complete learning path, see the parent directory LEARN_PANDAS_DEEP_DIVE/README.md.