Project 1: The First DataFrame - Exploring and Filtering

Build a notebook that loads a dataset, inspects its schema, and answers basic questions with filters and sorting.


Quick Reference

Attribute       Value
Difficulty      Beginner
Time Estimate   A few hours
Language        Python
Prerequisites   Basic Python
Key Topics      DataFrame, Series, indexing, filtering
Output          Notebook report

Learning Objectives

By completing this project, you will:

  1. Load CSV data into a DataFrame.
  2. Inspect the schema with .info() and summary statistics with .describe().
  3. Select rows/columns with .loc and .iloc.
  4. Filter with boolean masks and compound conditions.
  5. Sort results to answer top-N questions.
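
The first three objectives (loading, inspecting, selecting) can be sketched as follows. This is a minimal illustration, not a required layout: the inline CSV text and its column names (title, year, genre, rating, votes) are invented stand-ins for whatever dataset you choose.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for data/dataset.csv.
CSV_TEXT = """title,year,genre,rating,votes
Alpha,2008,Drama,7.1,1200
Bravo,2012,Comedy,6.4,800
Charlie,2015,Drama,8.3,5400
Delta,2019,Action,7.8,3100
Echo,2021,Comedy,8.9,950
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(CSV_TEXT))

# .info() prints column names, dtypes, and non-null counts.
df.info()

# .describe() returns summary statistics for the numeric columns.
summary = df.describe()

# .loc selects by label; the end label of a slice is included.
first_titles = df.loc[0:2, ["title", "rating"]]

# .iloc selects by integer position; the end position is excluded.
first_rows = df.iloc[0:2, 0:2]
```

With a real file, the only change should be passing a path such as data/dataset.csv to read_csv instead of the StringIO object.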

The Core Question You’re Answering

“How do I turn a raw table into answers without writing loops?”
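
One way to see what "without writing loops" means is to answer the same question twice. The sketch below uses an invented four-row table; the loop and the boolean mask should give the same titles, but the mask states the condition once for the whole column.

```python
import pandas as pd

# Hypothetical toy table; column names are illustrative only.
df = pd.DataFrame({
    "title": ["Alpha", "Bravo", "Charlie", "Delta"],
    "rating": [7.1, 6.4, 8.3, 7.8],
})

# Loop version: visit each row and test it by hand.
looped = []
for _, row in df.iterrows():
    if row["rating"] >= 7.5:
        looped.append(row["title"])

# Vectorized version: one comparison builds a boolean mask for the column,
# and .loc uses that mask to pick the matching rows.
mask = df["rating"] >= 7.5
vectorized = df.loc[mask, "title"].tolist()
```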


Concepts You Must Understand First

Concept               Why It Matters    Where to Learn
Series vs DataFrame   Core data model   Pandas docs
Indexing              .loc vs .iloc     Pandas docs
Boolean masks         Filtering logic   Pandas docs
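
The three concepts above are easier to tell apart on a table whose row labels are not plain integers. The following is a small sketch with made-up labels and values, intended only to show the distinctions.

```python
import pandas as pd

df = pd.DataFrame(
    {"year": [2008, 2012, 2015, 2019], "rating": [7.1, 6.4, 8.3, 7.8]},
    index=["Alpha", "Bravo", "Charlie", "Delta"],  # hypothetical row labels
)

# Series vs DataFrame: one column name gives a Series,
# a list of column names gives a DataFrame.
ratings = df["rating"]
subset = df[["rating"]]

# .loc slices by label and includes the end label.
by_label = df.loc["Alpha":"Charlie"]

# .iloc slices by position and excludes the end position.
by_position = df.iloc[0:2]

# A boolean mask is a Series of True/False values aligned to the index.
# Combine conditions with & (and), | (or), ~ (not), and wrap each
# comparison in parentheses because of operator precedence.
mask = (df["year"] > 2010) & ~(df["rating"] < 7.0)
selected = df[mask]
```

Here by_label has three rows and by_position has two, which is the most common source of off-by-one surprises when switching between .loc and .iloc.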

Project Specification

Functional Requirements

  1. Load a dataset with at least 5 columns.
  2. Inspect schema and summary stats.
  3. Answer at least 5 questions with filters.
  4. Sort results for top/bottom analysis.
  5. Summarize findings in markdown.

Example Output

A notebook section titled “Top 10 highest-rated movies after 2010” with a sorted table.
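
A table like that could be produced with a filter followed by a sort. The sketch below assumes columns named title, year, and rating, and the helper top_rated_after is a hypothetical name introduced here for illustration; the sample rows are invented and the demo asks for the top 3 only because the toy table is small.

```python
import pandas as pd


def top_rated_after(movies: pd.DataFrame, year: int, n: int = 10) -> pd.DataFrame:
    """Return the n highest-rated rows released after `year`, best first."""
    recent = movies[movies["year"] > year]            # strictly after `year`
    ranked = recent.sort_values("rating", ascending=False)
    return ranked.head(n).reset_index(drop=True)      # renumber rows 0..n-1


# Hypothetical sample data.
movies = pd.DataFrame({
    "title": ["Alpha", "Bravo", "Charlie", "Delta", "Echo", "Foxtrot"],
    "year": [2008, 2012, 2015, 2019, 2021, 2010],
    "rating": [9.5, 6.4, 8.3, 7.8, 8.9, 9.0],
})

top = top_rated_after(movies, 2010, n=3)
print(top)
```

Note that "after 2010" is read here as year > 2010, so a 2010 release is excluded; if your question means "2010 or later", the comparison would be >= instead.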


Implementation Guide

Project Structure

project-root/
├── data/
│   └── dataset.csv
├── notebooks/
│   └── analysis.ipynb
└── README.md

Questions to Guide Your Design

  1. What questions are meaningful for this dataset?
  2. What filters make sense for stakeholders?
  3. How will you present results clearly?

Testing Strategy

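  • Filters return exactly the expected rows: every returned row meets the condition and no matching row is missing.
  • Sorted results are in the intended order (ascending or descending).
  • Notebook runs top-to-bottom from a fresh kernel without errors.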
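
The first two checks can be written as plain assert statements in a notebook cell. This is one possible sketch on invented data, not a prescribed test suite.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "title": ["Alpha", "Bravo", "Charlie", "Delta"],
    "year": [2008, 2012, 2015, 2019],
    "rating": [7.1, 6.4, 8.3, 7.8],
})

mask = df["year"] > 2010
filtered = df[mask]
ranked = filtered.sort_values("rating", ascending=False)

# Filter check: the row count matches the mask, and every kept row
# satisfies the condition.
assert len(filtered) == mask.sum()
assert (filtered["year"] > 2010).all()

# Sort check: ratings never increase from one row to the next.
assert ranked["rating"].is_monotonic_decreasing

# Sorting should reorder rows, not add or drop any.
assert sorted(ranked.index) == sorted(filtered.index)
```

For the third check, restarting the kernel and running all cells is usually enough; running `jupyter nbconvert --to notebook --execute notebooks/analysis.ipynb` from the project root is another option if nbconvert is installed.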

Extensions

  • Add a chart for one metric.
  • Export filtered dataset to CSV.
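
The export extension might look like the sketch below, which covers only the CSV step. The temporary directory stands in for the project's data/ folder, and the file name filtered.csv is an assumption made for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "title": ["Alpha", "Bravo", "Charlie", "Delta"],
    "year": [2008, 2012, 2015, 2019],
    "rating": [7.1, 6.4, 8.3, 7.8],
})

filtered = df[df["rating"] >= 7.5]

# Temporary directory used here as a stand-in for data/.
out_dir = Path(tempfile.mkdtemp())
out_path = out_dir / "filtered.csv"  # hypothetical file name

# index=False keeps the row labels out of the file.
filtered.to_csv(out_path, index=False)

# Read the file back to confirm that the export round-trips.
round_trip = pd.read_csv(out_path)
```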

This guide was generated from LEARN_PANDAS_DEEP_DIVE.md. For the complete learning path, see the parent directory LEARN_PANDAS_DEEP_DIVE/README.md.