Project 1: The First DataFrame - Exploring and Filtering
Build a notebook that loads a dataset, inspects its schema, and answers basic questions with filters and sorting.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Beginner |
| Time Estimate | A few hours |
| Language | Python |
| Prerequisites | Basic Python |
| Key Topics | DataFrame, Series, indexing, filtering |
| Output | Notebook report |
Learning Objectives
By completing this project, you will:
- Load CSV data into a DataFrame.
- Inspect schema with `.info()` and `.describe()`.
- Select rows/columns with `.loc` and `.iloc`.
- Filter with boolean masks and compound conditions.
- Sort results to answer top-N questions.
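The objectives above can be walked through in one short sketch. The CSV text, the column names (`title`, `year`, `rating`, `votes`, `genre`), and the thresholds are hypothetical stand-ins for whatever dataset you choose; in the project you would pass a file path to `pd.read_csv` instead of an in-memory buffer.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a real CSV file.
CSV_TEXT = """title,year,rating,votes,genre
Alpha,2008,7.1,1200,Drama
Bravo,2012,8.4,5300,Sci-Fi
Charlie,2015,6.9,800,Comedy
Delta,2019,8.9,9100,Drama
Echo,2011,7.8,2400,Sci-Fi
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(CSV_TEXT))

# Inspect the schema: column names, dtypes, and non-null counts.
df.info()

# Summary statistics for the numeric columns.
summary = df.describe()
print(summary)

# Label-based selection: index labels 0 through 2 (inclusive), two columns.
by_label = df.loc[0:2, ["title", "rating"]]

# Position-based selection: first three rows, first two columns.
by_position = df.iloc[:3, :2]

# Compound boolean mask: wrap each condition in parentheses and join with &.
mask = (df["year"] > 2010) & (df["rating"] >= 8.0)
recent_hits = df[mask]

# Sort the filtered rows to answer a top-N question.
top_two = recent_hits.sort_values("rating", ascending=False).head(2)
print(top_two)
```

The parentheses around each condition matter because `&` binds more tightly than comparison operators in Python.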
The Core Question You’re Answering
“How do I turn a raw table into answers without writing loops?”
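One way to see what "without writing loops" means is to answer the same question twice, once row by row and once with a boolean mask. The `city` and `sales` columns and the threshold of 100 are made up for illustration.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame(
    {
        "city": ["Lima", "Oslo", "Kyoto", "Quito"],
        "sales": [120, 80, 150, 95],
    }
)

# Loop version: visit every row and collect the matches by hand.
looped = []
for _, row in df.iterrows():
    if row["sales"] > 100:
        looped.append(row["city"])

# Vectorized version: build a True/False mask, then select with it.
mask = df["sales"] > 100
vectorized = df.loc[mask, "city"].tolist()

print(looped)
print(vectorized)
```

Both approaches return the same cities here. The mask version states the condition once for the whole column, which is the habit this project is meant to build.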
Concepts You Must Understand First
| Concept | Why It Matters | Where to Learn |
|---|---|---|
| Series vs DataFrame | Core data model | Pandas docs |
| Indexing | `.loc` vs `.iloc` | Pandas docs |
| Boolean masks | Filtering logic | Pandas docs |
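The three concepts in the table can be compared side by side in a small sketch. The frame below is hypothetical and uses string index labels so that the difference between label-based and position-based selection is visible.

```python
import pandas as pd

# Hypothetical frame with string labels as the index.
df = pd.DataFrame(
    {"price": [3.5, 1.2, 4.8, 2.0], "stock": [10, 0, 7, 3]},
    index=["apple", "bean", "cherry", "date"],
)

# Series vs DataFrame: single brackets give a Series (one column),
# double brackets give a DataFrame (a table with one column).
col = df["price"]
sub = df[["price"]]

# .loc slices by label and includes the end label.
loc_rows = df.loc["apple":"cherry"]

# .iloc slices by integer position and excludes the end position.
iloc_rows = df.iloc[0:2]

# A boolean mask is a Series of True/False values sharing the frame's index.
in_stock = df["stock"] > 0
available = df[in_stock]

print(type(col).__name__, type(sub).__name__)
print(len(loc_rows), len(iloc_rows))
print(available)
```

The inclusive end on `.loc` slices is a common surprise for readers used to Python list slicing, so it is worth checking row counts after any label slice.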
Project Specification
Functional Requirements
- Load a dataset with at least 5 columns.
- Inspect schema and summary stats.
- Answer at least 5 questions with filters.
- Sort results for top/bottom analysis.
- Summarize findings in markdown.
Example Output
A notebook section titled “Top 10 highest-rated movies after 2010” with a sorted table.
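A minimal sketch of how such a section might be produced, assuming a movie dataset with `title`, `year`, and `rating` columns. The helper `top_rated_after` and the sample rows are hypothetical; the demo asks for the top 3 only because the sample is tiny.

```python
import pandas as pd


def top_rated_after(df: pd.DataFrame, year: int, n: int = 10) -> pd.DataFrame:
    """Return the n highest-rated rows released strictly after `year`."""
    recent = df[df["year"] > year]
    ranked = recent.sort_values("rating", ascending=False)
    # reset_index gives the report table a clean 0..n-1 index.
    return ranked.head(n).reset_index(drop=True)


# Hypothetical sample rows for illustration only.
movies = pd.DataFrame(
    {
        "title": ["A", "B", "C", "D", "E", "F"],
        "year": [2009, 2011, 2014, 2010, 2018, 2021],
        "rating": [9.0, 7.5, 8.8, 8.1, 6.4, 8.2],
    }
)

top_recent = top_rated_after(movies, year=2010, n=3)
print(top_recent)
```

Here "after 2010" is read as strictly greater than 2010, so the 2010 release is excluded. If your question means "2010 onward", use `>=` and say so in the section title.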
Implementation Guide
Project Structure
project-root/
├── data/
│ └── dataset.csv
├── notebooks/
│ └── analysis.ipynb
└── README.md
Questions to Guide Your Design
- What questions are meaningful for this dataset?
- What filters make sense for stakeholders?
- How will you present results clearly?
Testing Strategy
- Filters return expected rows.
- Sorting is correct.
- Notebook runs top-to-bottom.
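The first two checks can be written as plain assertions in a notebook cell. The sketch below uses a hypothetical `name`/`score`/`year` frame; the idea is to verify that every returned row satisfies the filter, that no qualifying row was left out, and that the sort order holds.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cy", "Di"],
        "score": [82, 65, 91, 70],
        "year": [2023, 2023, 2022, 2023],
    }
)

filtered = df[(df["score"] >= 70) & (df["year"] == 2023)]

# Every returned row satisfies both conditions.
assert (filtered["score"] >= 70).all()
assert (filtered["year"] == 2023).all()

# No qualifying row was left behind in the excluded set.
excluded = df.drop(filtered.index)
assert not ((excluded["score"] >= 70) & (excluded["year"] == 2023)).any()

# Sorting is correct: scores never increase down the table.
ranked = df.sort_values("score", ascending=False)
assert ranked["score"].is_monotonic_decreasing

print("all checks passed")
```

For the third check, restarting the kernel and running all cells is usually enough. Executing the notebook from the command line with `jupyter nbconvert --to notebook --execute notebooks/analysis.ipynb` is another option, assuming `nbconvert` is installed.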
Extensions
- Add a chart for one metric.
- Export filtered dataset to CSV.
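The export extension could look roughly like the sketch below. It writes to a temporary directory so the example cleans up after itself; the file name `filtered.csv`, the columns, and the rating threshold are assumptions, and in the project you would likely write under `data/` instead.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame(
    {
        "title": ["Alpha", "Bravo", "Charlie", "Delta"],
        "rating": [7.1, 8.4, 6.9, 8.9],
    }
)

filtered = df[df["rating"] >= 8.0]

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "filtered.csv"
    # index=False leaves the original row labels out of the file.
    filtered.to_csv(out_path, index=False)
    # Read the file back to confirm the export contains the expected rows.
    round_trip = pd.read_csv(out_path)

print(round_trip)
```

Passing `index=False` avoids an extra unnamed column appearing when the file is read back.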
This guide was generated from LEARN_PANDAS_DEEP_DIVE.md. For the complete learning path, see the parent directory LEARN_PANDAS_DEEP_DIVE/README.md.