Project 1: Describing Data - The mtcars Dataset

Build a descriptive statistics report for a classic dataset.


Project Overview

Attribute               Value
Difficulty              Level 1: Beginner
Time Estimate           Weekend
Main Language           R
Alternative Languages   Python, Julia
Knowledge Area          Descriptive statistics
Tools                   RStudio
Main Book               “R for Data Science” by Wickham & Grolemund

What you’ll build: A summary report of the mtcars dataset with key statistics and interpretations.

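Why it teaches stats: Descriptive statistics are the foundation of any analysis. Before you model or test anything, you need to know each variable's typical value, its spread, and its unusual observations.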

Core challenges you’ll face:

  • Computing summary measures correctly
  • Interpreting units and context
  • Presenting results clearly

Real World Outcome

You will produce a short report that explains averages, spreads, and notable values.

Example Output:

Mean mpg: 20.1
Median mpg: 19.2
Std dev mpg: 6.0
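The example output can be reproduced with base R alone. This is a minimal sketch: it uses the built-in mtcars data frame and rounds to one decimal place. Note that sd() is the sample standard deviation, which divides by n - 1.

```r
# mtcars ships with base R (datasets package); data() copies it into the workspace
data(mtcars)

# Center and spread of fuel economy (miles per US gallon)
mpg_mean   <- mean(mtcars$mpg)
mpg_median <- median(mtcars$mpg)
mpg_sd     <- sd(mtcars$mpg)   # sample SD, n - 1 denominator

# Print rounded to one decimal place, matching the report format above
cat(sprintf("Mean mpg: %.1f\n", mpg_mean))
cat(sprintf("Median mpg: %.1f\n", mpg_median))
cat(sprintf("Std dev mpg: %.1f\n", mpg_sd))
```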

Verification steps:

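  • Cross-check your hand-computed numbers against the built-in summary() output
  • Check units and context against the dataset documentation (?mtcars): for example, mpg is miles per US gallon and wt is weight in 1000 lbs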
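Here is a sketch of the first verification step, assuming base R only. It compares summary() against mean() and median(). In R 3.4.0 and later, summary() stores unrounded values and rounds only when printing. Older versions rounded the stored values, so an exact comparison there may need a looser tolerance.

```r
data(mtcars)

# summary() reports min, quartiles, median, mean, and max for a numeric vector
mpg_summary <- summary(mtcars$mpg)
print(mpg_summary)

# Cross-check: hand-computed mean and median should agree with summary()
stopifnot(
  isTRUE(all.equal(as.numeric(mpg_summary["Mean"]), mean(mtcars$mpg))),
  isTRUE(all.equal(as.numeric(mpg_summary["Median"]), median(mtcars$mpg)))
)

# Sanity check on units: the range should be plausible for miles per US gallon
print(range(mtcars$mpg))
```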

The Core Question You’re Answering

“What is this dataset telling me at a glance?”

This is the first step in any data analysis.


Concepts You Must Understand First

Stop and research these before coding:

  1. Mean vs median
    • When does the median tell a better story?
    • Book Reference: “R for Data Science”, Ch. 7
  2. Variance and standard deviation
    • What does spread mean in practice?
    • Book Reference: “OpenIntro Statistics” by Diez et al., Ch. 2
  3. Summary tables
    • How do you summarize categorical vs numeric data?
    • Book Reference: “R for Data Science”, Ch. 5
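The three concepts above can be tried directly on mtcars. This sketch uses gross horsepower (hp) for the mean-versus-median and variance parts. It uses cyl as an example of a number that is really a category. Choosing these particular columns is an illustrative assumption, not a requirement of the project.

```r
data(mtcars)

# 1. Mean vs median: hp is right-skewed (a few very powerful cars),
#    which pulls the mean above the median.
hp <- mtcars$hp
cat(sprintf("hp mean = %.1f, median = %.1f\n", mean(hp), median(hp)))

# 2. Variance and standard deviation: sample variance is the sum of squared
#    deviations divided by n - 1; sd is its square root, in the data's units.
hp_var <- sum((hp - mean(hp))^2) / (length(hp) - 1)
stopifnot(isTRUE(all.equal(hp_var, var(hp))),
          isTRUE(all.equal(sqrt(hp_var), sd(hp))))

# 3. Categorical vs numeric: cyl is stored as a number but is a category,
#    so count its levels with table() instead of averaging it.
cyl_counts <- table(mtcars$cyl)
print(cyl_counts)
print(round(prop.table(cyl_counts), 2))  # share of cars per cylinder group
```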

Questions to Guide Your Design

  1. Metric selection
    • Which statistics are most meaningful for each column?
    • How will you handle categorical variables?
  2. Reporting format
    • Will you output a table or a narrative report?
    • How will you ensure reproducibility?
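One way to handle categorical variables is to convert the numeric codes into labelled factors before summarizing. The labels come from the ?mtcars help page: am is 0 = automatic, 1 = manual; vs is 0 = V-shaped, 1 = straight. The copy name mtcars_f is a hypothetical choice. Leaving carb numeric is also an assumption you may want to revisit. Calling sessionInfo() at the end of a report is a common, lightweight aid to reproducibility.

```r
data(mtcars)

# Work on a copy; mtcars_f is a hypothetical name for the factor-coded version
mtcars_f <- mtcars
mtcars_f$cyl  <- factor(mtcars_f$cyl)
mtcars_f$gear <- factor(mtcars_f$gear)
mtcars_f$am   <- factor(mtcars_f$am, levels = c(0, 1),
                        labels = c("automatic", "manual"))
mtcars_f$vs   <- factor(mtcars_f$vs, levels = c(0, 1),
                        labels = c("V-shaped", "straight"))

# summary() now gives counts for factors and quantiles/mean for numeric columns
print(summary(mtcars_f[, c("mpg", "cyl", "am", "vs")]))

# Record R version and attached packages so the report can be reproduced
print(sessionInfo())
```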

Thinking Exercise

Outliers

Find the maximum and minimum mpg values and decide if they are outliers.

Questions while working:

  • What counts as an outlier here?
  • How does it affect the mean?
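A sketch of the outlier exercise follows. It uses Tukey's 1.5 × IQR fences, which are one common rule of thumb rather than the only definition of an outlier. The quartiles come from quantile() with its default type 7. boxplot() instead computes its hinges with fivenum(), and these can differ slightly, so the two approaches may disagree on borderline points.

```r
data(mtcars)
mpg_vals <- mtcars$mpg

# Which cars hold the extreme values? Ties are kept, so every car at the
# minimum or maximum is listed.
print(mtcars[mpg_vals == min(mpg_vals), "mpg", drop = FALSE])
print(mtcars[mpg_vals == max(mpg_vals), "mpg", drop = FALSE])

# Tukey's fences: flag values more than 1.5 * IQR below Q1 or above Q3
q     <- quantile(mpg_vals, c(0.25, 0.75))  # default type-7 quartiles
iqr   <- q[[2]] - q[[1]]
lower <- q[[1]] - 1.5 * iqr
upper <- q[[2]] + 1.5 * iqr
flagged <- mtcars[mpg_vals < lower | mpg_vals > upper, "mpg", drop = FALSE]
print(flagged)

# Effect on the mean: drop the single largest value and recompute
mean_all    <- mean(mpg_vals)
mean_no_max <- mean(mpg_vals[-which.max(mpg_vals)])
cat(sprintf("Mean with max: %.2f, without max: %.2f\n", mean_all, mean_no_max))
```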

The Interview Questions They’ll Ask

Prepare to answer these:

  1. “What is the difference between mean and median?”
  2. “Why do we use standard deviation?”
  3. “How do you summarize categorical data?”
  4. “What is an outlier?”
  5. “Why are descriptive statistics alone not enough?”

Hints in Layers

Hint 1 (Starting Point): Start with summary() to get baseline statistics.

Hint 2 (Next Level): Compute the mean, median, and standard deviation for key variables.

Hint 3 (Technical Details): Use grouped summaries for categorical variables such as cyl (number of cylinders).

Hint 4 (Tools/Debugging): Compare your results against known references, such as summary() output and the ?mtcars help page.
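This is a base-R sketch of the grouped summary from Hint 3, computing mpg statistics within each cylinder group. The helper name describe() is hypothetical. The group_by() and summarise() functions from dplyr, covered in “R for Data Science” Ch. 5, are an equivalent route if you prefer the tidyverse.

```r
data(mtcars)

# Hypothetical helper: the statistics the report uses, for one numeric vector
describe <- function(x) {
  c(n = length(x), mean = mean(x), median = median(x), sd = sd(x))
}

# split() makes one mpg vector per cylinder count; sapply() applies describe()
# to each and binds the results into a matrix with one column per group.
by_cyl <- sapply(split(mtcars$mpg, mtcars$cyl), describe)

# Transpose so each row is a cylinder group, then round for display
print(round(t(by_cyl), 2))
```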


Books That Will Help

Topic           Book                      Chapter
Summary stats   “R for Data Science”      Ch. 7
Variability     “OpenIntro Statistics”    Ch. 2
Tables          “R for Data Science”      Ch. 5

Implementation Hints

  • Keep code in an R Markdown report.
  • Label units clearly.
  • Interpret numbers in plain language.
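To label units and interpret numbers in plain language, you can generate report sentences directly from the data. This sketch assumes the unit wording shown in var_units, which paraphrases the ?mtcars help page. The helper describe_in_words() is hypothetical. In an R Markdown report, the same values could instead be placed in prose with inline `r ...` code.

```r
data(mtcars)

# Units paraphrased from the mtcars help page (?mtcars)
var_units <- c(mpg = "miles per US gallon", hp = "gross hp", wt = "thousand lbs")

# Hypothetical helper: describe one numeric column in a plain-language sentence
describe_in_words <- function(df, var) {
  x <- df[[var]]
  sprintf("%s: average %.1f %s (median %.1f, SD %.1f, range %.1f to %.1f).",
          var, mean(x), var_units[[var]], median(x), sd(x), min(x), max(x))
}

# One sentence per variable, each ending with a newline
for (v in names(var_units)) writeLines(describe_in_words(mtcars, v))
```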

Learning Milestones

  1. First milestone: You can compute basic summaries.
  2. Second milestone: You can interpret variability.
  3. Final milestone: You can communicate insights clearly.