Project 1: Describing Data - The mtcars Dataset
Build a descriptive statistics report for a classic dataset.
Project Overview
| Attribute | Value |
|---|---|
| Difficulty | Level 1: Beginner |
| Time Estimate | Weekend |
| Main Language | R |
| Alternative Languages | Python, Julia |
| Knowledge Area | Descriptive statistics |
| Tools | RStudio |
| Main Book | “R for Data Science” by Wickham & Grolemund |
What you’ll build: A summary report of the mtcars dataset with key statistics and interpretations.
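Why it teaches stats: Descriptive statistics are the foundation of any analysis. You need to know each variable's center, spread, and range before you can model it or compare groups.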
Core challenges you’ll face:
- Computing summary measures correctly
- Interpreting units and context
- Presenting results clearly
Real World Outcome
You will produce a short report that explains averages, spreads, and notable values.
Example Output:
Mean mpg: 20.1
Median mpg: 19.2
Std dev mpg: 6.0
Verification steps:
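- Cross-check each statistic against the built-in summary(mtcars) output
- Confirm the units and meaning of each column in the dataset documentation (?mtcars); for example, mpg is miles per US gallon and wt is weight in units of 1000 lbs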
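A minimal sketch of how the example output above might be produced and cross-checked, using only base R and the built-in mtcars data frame. The name `mpg_stats` is illustrative and not part of the project.

```r
# mtcars ships with base R (datasets package), so nothing needs to be loaded.
mpg <- mtcars$mpg

# Compute the three headline statistics directly.
mpg_stats <- c(
  mean   = mean(mpg),
  median = median(mpg),
  sd     = sd(mpg)      # sample standard deviation (n - 1 denominator)
)
print(round(mpg_stats, 1))

# Cross-check against the built-in summary, which reports mean and median
# (it does not report the standard deviation).
builtin <- summary(mpg)
stopifnot(
  isTRUE(all.equal(mpg_stats[["mean"]],   builtin[["Mean"]])),
  isTRUE(all.equal(mpg_stats[["median"]], builtin[["Median"]]))
)
```

If the `stopifnot()` call passes silently, the hand-computed values agree with the built-in summary.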
The Core Question You’re Answering
“What is this dataset telling me at a glance?”
This is the first step in any data analysis.
Concepts You Must Understand First
Stop and research these before coding:
- Mean vs median
  - When does the median tell a better story?
  - Book Reference: “R for Data Science”, Ch. 7
- Variance and standard deviation
  - What does spread mean in practice?
  - Book Reference: “OpenIntro Statistics” by Diez et al., Ch. 2
- Summary tables
  - How do you summarize categorical vs numeric data?
  - Book Reference: “R for Data Science”, Ch. 5
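To make the first two concepts concrete, here is a small base R sketch. It compares mean and median on hp, a column where the two differ noticeably, and rebuilds variance and standard deviation by hand. The choice of hp and the names `manual_var` and `manual_sd` are illustrative assumptions.

```r
# hp (gross horsepower) has a long right tail: a few powerful cars
# pull the mean above the median.
hp <- mtcars$hp
hp_mean   <- mean(hp)
hp_median <- median(hp)
cat("hp mean:", round(hp_mean, 1), " hp median:", hp_median, "\n")

# Sample variance: sum of squared deviations from the mean, divided by n - 1.
# Standard deviation is its square root, so it is in the same units as the data.
mpg <- mtcars$mpg
n <- length(mpg)
manual_var <- sum((mpg - mean(mpg))^2) / (n - 1)
manual_sd  <- sqrt(manual_var)

# The hand-rolled values should agree with the built-ins.
stopifnot(
  isTRUE(all.equal(manual_var, var(mpg))),
  isTRUE(all.equal(manual_sd,  sd(mpg)))
)
cat("mpg variance:", round(manual_var, 2), " mpg sd:", round(manual_sd, 2), "\n")
```

When mean and median diverge like this, reporting both (and saying why they differ) is usually more informative than reporting either alone.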
Questions to Guide Your Design
- Metric selection
  - Which statistics are most meaningful for each column?
  - How will you handle categorical variables?
- Reporting format
  - Will you output a table or a narrative report?
  - How will you ensure reproducibility?
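One possible way to handle the categorical columns, sketched in base R. Treating cyl and am as categories is a design assumption rather than a requirement; the am labels follow the coding given in ?mtcars (0 = automatic, 1 = manual). The name `cars` is illustrative.

```r
# Work on a copy so the built-in dataset is left untouched.
cars <- mtcars

# cyl and am are stored as numbers but behave like categories.
cars$cyl <- factor(cars$cyl)
cars$am  <- factor(cars$am, levels = c(0, 1), labels = c("automatic", "manual"))

# Counts and proportions are the natural summaries for categorical columns.
cyl_counts <- table(cars$cyl)
am_props   <- prop.table(table(cars$am))
print(cyl_counts)
print(round(am_props, 2))

# summary() now reports quartiles for numeric columns and counts for factors.
print(summary(cars[, c("mpg", "cyl", "am")]))
```

Doing the conversion in code, rather than editing the data by hand, also helps with reproducibility: rerunning the script rebuilds the same tables from the raw dataset.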
Thinking Exercise
Outliers
Find the maximum and minimum mpg values and decide if they are outliers.
Questions while working:
- What counts as an outlier here?
- How does it affect the mean?
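A hedged sketch of one way to approach this exercise. It uses the common 1.5 × IQR rule of thumb, which is only one of several outlier definitions and is an assumption here, not something the project prescribes. The names `fences`, `flagged`, and `without_max` are illustrative.

```r
mpg <- mtcars$mpg

# Extremes, kept with their row names so the cars can be identified.
extremes <- mtcars[mpg %in% range(mpg), "mpg", drop = FALSE]
print(extremes)

# Rule of thumb: flag values more than 1.5 * IQR below the first quartile
# or above the third quartile (quartiles from quantile()'s default method).
q <- quantile(mpg, c(0.25, 0.75))
fences <- c(lower = q[[1]] - 1.5 * IQR(mpg),
            upper = q[[2]] + 1.5 * IQR(mpg))
print(fences)

flagged <- mtcars[mpg < fences[["lower"]] | mpg > fences[["upper"]],
                  "mpg", drop = FALSE]
print(flagged)

# How much does the largest value move the mean compared with the median?
without_max <- mpg[mpg != max(mpg)]
cat("mean shift:  ", mean(mpg) - mean(without_max), "\n")
cat("median shift:", median(mpg) - median(without_max), "\n")
```

Note that quantile() and boxplot.stats() use slightly different quartile definitions, so a value close to a fence can be flagged by one and not the other. That is a useful reminder that "outlier" depends on the rule you choose.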
The Interview Questions They’ll Ask
Prepare to answer these:
- “What is the difference between mean and median?”
- “Why do we use standard deviation?”
- “How do you summarize categorical data?”
- “What is an outlier?”
- “Why are descriptive statistics not enough on their own?”
Hints in Layers
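Hint 1 (Starting Point): Start with summary() to get baseline stats.
Hint 2 (Next Level): Compute mean(), median(), and sd() for key variables such as mpg, hp, and wt.
Hint 3 (Technical Details): Use grouped summaries for factor-like columns such as cyl (number of cylinders).
Hint 4 (Tools/Debugging): Compare your results against known references, such as the summary() output and the dataset documentation (?mtcars).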
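The grouped summary mentioned in the hints could look roughly like the following. This is a sketch in base R using aggregate(). The dplyr verbs group_by() and summarise() covered in “R for Data Science” are an equally valid route, and the name `by_cyl` is illustrative.

```r
# Group mpg by number of cylinders and compute several statistics per group.
by_cyl <- aggregate(
  mpg ~ cyl,
  data = mtcars,
  FUN  = function(x) c(n = length(x), mean = mean(x),
                       median = median(x), sd = sd(x))
)

# aggregate() stores the statistics as a single matrix column; flatten it
# into ordinary columns (mpg.n, mpg.mean, ...) so the table prints cleanly.
by_cyl <- do.call(data.frame, by_cyl)
print(by_cyl)
```

Including the group size n alongside each mean makes it easier to judge how much weight a group's statistics deserve.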
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Summary stats | “R for Data Science” | Ch. 7 |
| Variability | “OpenIntro Statistics” | Ch. 2 |
| Tables | “R for Data Science” | Ch. 5 |
Implementation Hints
- Keep code in an R Markdown report.
- Label units clearly.
- Interpret numbers in plain language.
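As one way to keep units attached to the numbers, the sketch below builds labeled, plain-language lines that could be placed in an R Markdown chunk. The unit strings are taken from ?mtcars; the helper `describe()` and the vector `var_units` are hypothetical names introduced for illustration.

```r
# Units as given in the dataset documentation (?mtcars).
var_units <- c(mpg = "miles per US gallon",
               hp  = "gross horsepower",
               wt  = "1000 lbs")

# Hypothetical helper: one labeled line per variable, rounded to one decimal.
describe <- function(name) {
  x <- mtcars[[name]]
  sprintf("%s: mean %.1f, median %.1f, SD %.1f (%s)",
          name, mean(x), median(x), sd(x), var_units[[name]])
}

report_lines <- vapply(names(var_units), describe, character(1))
writeLines(report_lines)
```

Keeping the units in a single named vector means a correction only has to be made in one place.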
Learning Milestones
- First milestone: You can compute basic summaries.
- Second milestone: You can interpret variability.
- Final milestone: You can communicate insights clearly.