Project 5: Does More Studying Mean Higher Grades?

Build a regression analysis to test the relationship between study time and grades.


Project Overview

Attribute               Value
Difficulty              Level 2: Intermediate
Time Estimate           Weekend
Main Language           Python
Alternative Languages   R, JavaScript
Knowledge Area          Regression and correlation
Tools                   Plotting library
Main Book               “OpenIntro Statistics” by Diez et al.

What you’ll build: A simple linear regression of grades on study time, with diagnostic plots.

Why it teaches stats: It ties correlation, regression, and interpretation together.

Core challenges you’ll face:

  • Fitting the regression line
  • Interpreting slope in context
  • Checking residuals and assumptions

Real World Outcome

You will produce a scatter plot with a fitted line, report the slope and R-squared, and discuss the model’s limitations.

Example Output:

Slope: 3.2 points/hour
R-squared: 0.48
Saved plot: study_vs_grades.png

Verification steps:

  • Compare predicted vs observed values
  • Check residual patterns

The Core Question You’re Answering

“How much of the variation in grades does study time explain?”

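It is a regression question with two parts: how many grade points an extra hour of study is associated with (the slope), and how much of the spread in grades the linear model accounts for (R-squared).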


Concepts You Must Understand First

Stop and research these before coding:

  1. Correlation vs causation
    • Why doesn’t correlation prove causation?
    • Book Reference: “OpenIntro Statistics” Ch. 3
  2. Regression slope
    • What does the slope mean in the data’s units (grade points per hour of study)?
    • Book Reference: “OpenIntro Statistics” Ch. 7
  3. Residuals
    • How do residuals reveal model problems?
    • Book Reference: “OpenIntro Statistics” Ch. 7

Questions to Guide Your Design

  1. Data quality
    • How will you handle outliers and missing data?
    • How will you check for nonlinearity?
  2. Interpretation
    • How will you explain R-squared to a non-technical reader?
    • What limitations will you highlight?
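A minimal sketch of one possible data-quality policy, assuming records arrive as (hours, grade) pairs with `None` marking a missing value. The plausible ranges (`HOURS_RANGE`, `GRADE_RANGE`) and the `clean` helper are hypothetical; set them from your own data source. It drops incomplete rows (complete-case analysis), rejects impossible values, and logs every rejected row so the report can state how much data was lost.

```python
# Hypothetical raw records (made up): (hours_studied, grade); None = missing.
raw = [(1, 58), (2, None), (3, 67), (None, 70), (4, 66), (5, 173), (6, 75), (-1, 60)]

# Assumed plausible ranges; adjust to your course and survey.
HOURS_RANGE = (0, 40)
GRADE_RANGE = (0, 100)


def in_range(value, bounds):
    low, high = bounds
    return low <= value <= high


def clean(records):
    """Hypothetical helper: split records into usable rows and rejected rows."""
    kept, rejected = [], []
    for hours, grade in records:
        if hours is None or grade is None:
            rejected.append(((hours, grade), "missing value"))
        elif not (in_range(hours, HOURS_RANGE) and in_range(grade, GRADE_RANGE)):
            rejected.append(((hours, grade), "out of plausible range"))
        else:
            kept.append((hours, grade))
    return kept, rejected


kept, rejected = clean(raw)
print(f"Kept {len(kept)} of {len(raw)} rows")
for row, reason in rejected:
    print(f"  dropped {row}: {reason}")
```

Complete-case analysis can bias the fit if missingness is related to grades, so list it among the limitations. Out-of-range values are usually entry errors; points that are valid but unusual are a separate question for the outlier check.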

Thinking Exercise

Slope Meaning

If the slope is 3.2 points/hour, how many extra hours of study correspond to a 10-point increase in predicted grade?
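A worked version of the arithmetic, assuming a fitted line ŷ = b0 + b1·x with b1 = 3.2 points/hour: because the line is straight, a change in predicted grade is the slope times the change in hours, so the question reduces to one division.

```latex
\hat{y} = b_0 + b_1 x
\;\Rightarrow\;
\Delta\hat{y} = b_1\,\Delta x
\;\Rightarrow\;
\Delta x = \frac{\Delta\hat{y}}{b_1}
         = \frac{10\ \text{points}}{3.2\ \text{points/hour}}
         = 3.125\ \text{hours} \approx 3.1\ \text{hours}
```

Read this as a difference in predicted grade between students whose study time differs by about 3.1 hours; it does not show that studying 3.1 more hours would cause a 10-point gain.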

Questions while working:

  • Does this relationship make sense at extremes?
  • What other factors might matter?

The Interview Questions They’ll Ask

Prepare to answer these:

  1. “What does the slope represent?”
  2. “What is R-squared?”
  3. “Why doesn’t correlation imply causation?”
  4. “How do you check regression assumptions?”
  5. “How can a single outlier affect the fitted line?”

Hints in Layers

Hint 1 (Starting Point): Plot study hours against grades.

Hint 2 (Next Level): Fit a regression line and compute R-squared.

Hint 3 (Technical Details): Inspect residuals to detect curvature or heteroscedasticity (residual spread that changes with study time).

Hint 4 (Tools/Debugging): Try removing outliers, refit, and compare the slopes with and without them.


Books That Will Help

Topic         Book                     Chapter
Correlation   “OpenIntro Statistics”   Ch. 3
Regression    “OpenIntro Statistics”   Ch. 7
Residuals     “OpenIntro Statistics”   Ch. 7

Implementation Hints

  • Use a simple linear model first.
  • Label axes with units.
  • Discuss limitations in the output report.

Learning Milestones

  1. First milestone: You can fit a regression line.
  2. Second milestone: You can interpret coefficients.
  3. Final milestone: You can explain limitations and assumptions.