Project 5: Does More Studying Mean Higher Grades?
Build a regression analysis to test the relationship between study time and grades.
Project Overview
| Attribute | Value |
|---|---|
| Difficulty | Level 2: Intermediate |
| Time Estimate | Weekend |
| Main Language | Python |
| Alternative Languages | R, JavaScript |
| Knowledge Area | Regression and correlation |
| Tools | Plotting library |
| Main Book | “OpenIntro Statistics” by Diez et al. |
What you’ll build: A regression model that estimates how study time predicts grades, with diagnostic plots.
Why it teaches stats: It ties correlation, regression, and interpretation together.
Core challenges you’ll face:
- Fitting the regression line
- Interpreting slope in context
- Checking residuals and assumptions
Real World Outcome
You will produce a scatter plot with a fitted line, report slope and R-squared, and discuss limitations.
Example Output:
Slope: 3.2 points/hour
R-squared: 0.48
Saved plot: study_vs_grades.png
Verification steps:
- Compare predicted vs observed values
- Check residual patterns
The Core Question You’re Answering
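“How much of the variation in grades does study time explain?”
This is a real-world regression question. R-squared answers “how much,” and the slope tells you how many grade points each extra hour of study is associated with.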
Concepts You Must Understand First
Stop and research these before coding:
- Correlation vs causation
  - Why doesn’t correlation prove causation?
  - Book Reference: “OpenIntro Statistics” Ch. 3
- Regression slope
  - What does the slope mean in the units of your data?
  - Book Reference: “OpenIntro Statistics” Ch. 7
- Residuals
  - How do residuals reveal model problems?
  - Book Reference: “OpenIntro Statistics” Ch. 7
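The standard least-squares results for a single predictor connect these three concepts. In the notation below, x is study hours, y is grade, s_x and s_y are the sample standard deviations, and r is the correlation. Because r has no units, the slope inherits the units of s_y / s_x, which here is grade points per study hour:

```latex
\begin{aligned}
\hat{y} &= b_0 + b_1 x, \qquad b_1 = r\,\frac{s_y}{s_x}, \qquad b_0 = \bar{y} - b_1 \bar{x} \\
e_i &= y_i - \hat{y}_i \\
R^2 &= 1 - \frac{\sum_i e_i^2}{\sum_i (y_i - \bar{y})^2} = r^2 \quad \text{(one predictor, with intercept)}
\end{aligned}
```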
Questions to Guide Your Design
- Data quality
  - How will you handle outliers and missing data?
  - How will you check for nonlinearity?
- Interpretation
  - How will you explain R-squared to a non-technical reader?
  - What limitations will you highlight?
Thinking Exercise
Slope Meaning
If the slope is 3.2 points per hour, how many extra hours of study correspond to a 10-point increase in the predicted grade?
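The exercise is a rearrangement of the fitted line. The general form is shown below; substituting the numbers is left to you. The bracket checks the units.

```latex
\hat{y} = b_0 + b_1 x
\;\Longrightarrow\;
\Delta\hat{y} = b_1\,\Delta x
\;\Longrightarrow\;
\Delta x = \frac{\Delta\hat{y}}{b_1}
\qquad \left[\text{hours} = \frac{\text{points}}{\text{points/hour}}\right]
```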
Questions while working:
- Does this relationship still make sense at the extremes, such as zero hours or very long study times?
- What other factors might matter?
The Interview Questions They’ll Ask
Prepare to answer these:
- “What does the slope represent?”
- “What is R-squared?”
- “Why doesn’t correlation imply causation?”
- “How do you check regression assumptions?”
- “How can a single outlier affect the fitted line?”
Hints in Layers
Hint 1 (Starting Point): Plot study hours against grades.
Hint 2 (Next Level): Fit a regression line and compute R-squared.
Hint 3 (Technical Details): Plot residuals against fitted values to detect curvature or heteroscedasticity (residual spread that changes with the fitted value).
Hint 4 (Tools/Debugging): Refit the model with and without suspected outliers, then compare the slopes.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Correlation | “OpenIntro Statistics” | Ch. 3 |
| Regression | “OpenIntro Statistics” | Ch. 7 |
| Residuals | “OpenIntro Statistics” | Ch. 7 |
Implementation Hints
- Use a simple linear model first.
- Label axes with units.
- Discuss limitations in the output report.
Learning Milestones
- First milestone: You can fit a regression line.
- Second milestone: You can interpret coefficients.
- Final milestone: You can explain limitations and assumptions.