Project 11: Resampling and Modern Methods Lab
Build a lab for bootstrap, permutation tests, cross-validation, and Monte Carlo risk simulation.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 3: Advanced |
| Time Estimate | 2 weeks |
| Main Programming Language | Python |
| Alternative Programming Languages | R |
| Coolness Level | Level 4: Hardcore Tech Flex |
| Business Potential | 1. The “Resume Gold” |
| Prerequisites | Projects 7-10 |
| Key Topics | Bootstrap, permutation tests, CV, Monte Carlo |
1. Learning Objectives
- Estimate uncertainty with bootstrap replicates.
- Build permutation-based significance logic.
- Design leakage-safe cross-validation workflows.
- Use Monte Carlo to stress policy decisions.
2. All Theory Needed (Per-Concept Breakdown)
2.1 Bootstrap and Permutation Logic
- Fundamentals: Resampling approximates unknown sampling distributions from observed data.
- Deep Dive into the concept: Bootstrap and permutation answer different questions — bootstrap quantifies the variability of an estimate ("how uncertain is this number?"), while a permutation test evaluates a null hypothesis ("could this difference arise from label shuffling alone?"). Choosing the wrong one can invalidate conclusions.
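The two ideas can be sketched in a few lines of NumPy. This is a minimal illustration, not the lab's required implementation; the synthetic data and function names are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_ci(x, stat=np.mean, n_boot=2000, alpha=0.05):
    """Percentile bootstrap CI: resample x with replacement, recompute stat."""
    reps = np.array([stat(rng.choice(x, size=len(x), replace=True))
                     for _ in range(n_boot)])
    return np.quantile(reps, [alpha / 2, 1 - alpha / 2])

def permutation_pvalue(a, b, n_perm=2000):
    """Two-sided permutation test for a difference in means.

    Only valid when labels are exchangeable under the null."""
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        count += abs(perm[:len(a)].mean() - perm[len(a):].mean()) >= observed
    return (count + 1) / (n_perm + 1)  # add-one smoothing avoids p = 0

# Synthetic example: two groups with a true mean shift of 0.5.
a = rng.normal(0.0, 1.0, 100)
b = rng.normal(0.5, 1.0, 100)
lo, hi = bootstrap_ci(a)
print(f"bootstrap 95% CI for mean(a): [{lo:.3f}, {hi:.3f}]")
print(f"permutation p-value: {permutation_pvalue(a, b):.4f}")
```

Note that the bootstrap answers "how wide is the uncertainty around mean(a)?", while the permutation test answers "is the a-vs-b difference surprising under the null?" — the lab should keep these roles separate.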
2.2 Cross-Validation and Simulation
- Fundamentals: CV estimates generalization; Monte Carlo explores uncertain scenarios.
- Deep Dive into the concept: Fold leakage (preprocessing fit on the full dataset, or the same entity appearing in both train and test folds) and weak scenario design are the most common failure modes.
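A leakage-safe pattern is to put all preprocessing inside a scikit-learn `Pipeline`, so the scaler is refit on each training fold only. A short sketch on synthetic data (the dataset and fold counts are illustrative assumptions):

```python
# Leakage-safe repeated CV: StandardScaler lives inside the Pipeline,
# so it is fit per training fold, never on the full dataset.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, random_state=0)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print(f"AUC: {scores.mean():.3f} +/- {scores.std():.3f} over {len(scores)} splits")
```

Scaling outside the pipeline (fitting on all of `X` before splitting) is exactly the leakage pattern the split registry in this project should help you catch.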
3. Project Specification
3.1 What You Will Build
A command-line lab that compares model metrics with bootstrap intervals, permutation p-values, repeated CV, and scenario simulations.
3.2 Functional Requirements
- Bootstrap CI module for arbitrary metrics.
- Permutation test module for model delta significance.
- Repeated CV module with split registry.
- Monte Carlo scenario engine for decision risk.
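The scenario-engine requirement can be sketched as a minimal Monte Carlo decision check. All numbers and names below are illustrative assumptions, not part of the spec:

```python
# Hedged sketch: sample an uncertain metric lift, map it to business value,
# and report the probability that launching the new model loses money.
import numpy as np

rng = np.random.default_rng(42)
n_sim = 10_000

# Assumed scenario inputs -- replace with your own calibrated estimates.
lift = rng.normal(loc=0.004, scale=0.006, size=n_sim)   # uncertain AUC delta
value_per_point = 50_000                                 # $ per 0.001 AUC
migration_cost = 120_000                                 # one-off launch cost

net = lift * 1000 * value_per_point - migration_cost
print(f"P(net loss) = {(net < 0).mean():.2%}")
print(f"median net  = ${np.median(net):,.0f}")
```

Even this toy version shows why Monte Carlo matters for decisions: the expected value can be positive while the loss probability is still uncomfortably high.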
3.3 Non-Functional Requirements
- Deterministic runs and replicate logs.
- Convergence diagnostics for replicate count.
3.4 Example Usage / Output
```
$ python resampling_lab.py --task model_compare
Bootstrap AUC CI A: [0.782, 0.811]
Bootstrap AUC CI B: [0.776, 0.807]
Permutation p-value: 0.118
Recommendation: no strong evidence A > B
```
3.5 Real World Outcome
You produce robust model-comparison decisions that report uncertainty instead of overconfident point scores.
4. Solution Architecture
Input task -> resampling orchestrator -> metric distribution store -> decision report
5. Implementation Guide
5.1 Development Environment Setup
```
pip install numpy scipy scikit-learn
```
5.2 Project Structure
```
P11/
  resampling_lab.py
  scenarios/
  outputs/
```
5.3 The Core Question You Are Answering
“How likely is this model difference to persist beyond this sample?”
5.4 Concepts You Must Understand First
- Bootstrap interval construction
- Exchangeability in permutation testing
- Cross-validation leakage patterns
- Monte Carlo scenario design
5.5 Questions to Guide Your Design
- How many replicates are enough for stable estimates?
- Which metric is decision-critical?
- How will you detect leakage early?
5.6 Thinking Exercise
Design a scenario where naive random CV would be invalid and explain a safer split strategy.
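One possible answer, sketched with scikit-learn: grouped data (e.g. many rows per patient) where random folds put the same entity on both sides of the split. The patient setup here is an assumed example:

```python
# Naive random KFold vs GroupKFold on grouped data: KFold lets the same
# patient appear in both train and test; GroupKFold keeps groups intact.
import numpy as np
from sklearn.model_selection import GroupKFold, KFold

groups = np.repeat(np.arange(10), 5)                 # 10 patients, 5 rows each
X = np.random.default_rng(0).normal(size=(50, 3))

results = {}
for name, splitter in [("KFold", KFold(5, shuffle=True, random_state=0)),
                       ("GroupKFold", GroupKFold(5))]:
    leaks = 0
    for tr, te in splitter.split(X, groups=groups):  # KFold ignores groups
        leaks += len(set(groups[tr]) & set(groups[te]))
    results[name] = leaks
    print(f"{name}: {leaks} patient overlaps across train/test folds")
```

The same logic applies to time series, where a time-ordered splitter such as `TimeSeriesSplit` replaces random folds.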
5.7 The Interview Questions They’ll Ask
- Why does bootstrap not fix sampling bias?
- When is permutation invalid?
- How do you know if replicates are enough?
- Why can CV overestimate production performance?
- What decisions does Monte Carlo improve?
5.8 Hints in Layers
- Hint 1: Start with one scalar metric.
- Hint 2: Add permutation null generation.
- Hint 3: Add repeated CV and split logging.
- Hint 4: Add replicate convergence plots.
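Hint 4 can start as a simple table before it becomes a plot: track how the percentile-CI endpoints move as the replicate count B grows, and stop when they stabilize within a tolerance. The data and tolerance here are illustrative assumptions:

```python
# Replicate-convergence diagnostic: reuse one large pool of bootstrap
# replicates and report the CI endpoints at increasing prefixes B.
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(0.8, 0.05, size=200)          # e.g. per-split AUC scores
reps = np.array([rng.choice(x, len(x)).mean() for _ in range(20_000)])

for B in (100, 500, 2000, 10_000, 20_000):
    lo, hi = np.quantile(reps[:B], [0.025, 0.975])
    print(f"B={B:>6}: CI = [{lo:.4f}, {hi:.4f}]")
```

When consecutive rows agree to the precision you report, adding more replicates only burns compute.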
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Bootstrap | Efron & Tibshirani, *An Introduction to the Bootstrap* | Ch. 1-3 |
| Resampling in ML | ISLR | Ch. 5 (Resampling Methods) |
| Simulation | Think Stats | Simulation chapters |
6. Testing Strategy
- Deterministic seed tests.
- Known synthetic effect recovery tests.
- Leakage injection tests.
7. Common Pitfalls & Debugging
| Pitfall | Symptom | Solution |
|---|---|---|
| Too few replicates | unstable CI endpoints | increase B, monitor convergence |
| Leakage across folds | inflated CV metrics | strict split registry and pipeline isolation |
| Wrong null in permutation | misleading p-values | explicit hypothesis encoding |
8. Extensions & Challenges
- Add block bootstrap for time dependence.
- Add Bayesian bootstrap comparison.
9. Real-World Connections
- Model selection for fraud detection.
- Risk-adjusted launch decisions.
10. Resources
- Efron & Tibshirani, *An Introduction to the Bootstrap*
- James et al., *An Introduction to Statistical Learning* (ISLR)
11. Self-Assessment Checklist
- I can choose the right resampling method by question type.
- I can explain uncertainty in model comparisons.
- My pipeline catches leakage risks.
12. Submission / Completion Criteria
Minimum: bootstrap + permutation + repeated CV on one dataset.
Full: includes Monte Carlo policy simulation and convergence diagnostics.