Project 11: Epidemic Simulator (Exponential Growth and Compartments)
Build a scenario-driven simulator that models infection spread, peak load, and intervention effects over time.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 2: Beginner-Intermediate |
| Time Estimate | Weekend to 1 week |
| Language | Python (Alternatives: JavaScript, R) |
| Prerequisites | Exponential functions, rates, percentages, basic plotting |
| Key Topics | Discrete-time simulation, SIR dynamics, reproduction number, parameter sensitivity |
1. Learning Objectives
By completing this project, you will:
- Explain why small rate changes can produce dramatically different epidemic outcomes.
- Implement a discrete SIR-style model with explicit update equations.
- Track and interpret peak infection timing, peak size, and total affected population.
- Compare intervention scenarios (contact reduction, faster recovery, delayed response).
- Communicate model limits and assumptions clearly, like an engineer.
2. Theoretical Foundation
2.1 Core Concepts
- Exponential growth: Early spread can look slow, then accelerate rapidly due to compounding.
- Compartment models: Population is partitioned into Susceptible (S), Infected (I), Recovered (R).
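- Reproduction numbers: R0 is the expected number of secondary infections caused by one infected person in a fully susceptible population; the time-varying Re adjusts that figure for susceptible depletion and interventions.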
- Finite populations: Growth eventually slows because susceptible people are depleted.
- Discrete-time updates: Simulations step day-by-day, balancing infection and recovery flows.
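These concepts fit together in one common discrete-time SIR form. The form assumes three things: frequency-dependent transmission (new infections proportional to S·I/N), a one-day time step, and per-day rates β and γ. The symbols are illustrative rather than taken from a specific reference.

$$
\begin{aligned}
\Delta_{\text{inf}}(t) &= \beta\,\frac{S_t\, I_t}{N}, \qquad \Delta_{\text{rec}}(t) = \gamma\, I_t \\
S_{t+1} &= S_t - \Delta_{\text{inf}}(t) \\
I_{t+1} &= I_t + \Delta_{\text{inf}}(t) - \Delta_{\text{rec}}(t) \\
R_{t+1} &= R_t + \Delta_{\text{rec}}(t) \\
R_0 &= \frac{\beta}{\gamma}, \qquad R_e(t) = R_0\,\frac{S_t}{N}
\end{aligned}
$$

Two properties follow from these updates:

- Adding the three update lines shows that the flows cancel, so S + I + R = N on every day.
- Since I_{t+1} - I_t = I_t(βS_t/N - γ), the infected count grows exactly when R_e(t) > 1 and shrinks once R_e(t) < 1.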
2.2 Why This Matters
This project turns a headline concept into measurable curves. You learn to reason about growth, delay, saturation, and tradeoffs in public-health style systems. These same modeling skills appear in finance, infrastructure capacity planning, and marketing diffusion.
2.3 Historical Context / Background
Compartment models have been used for over a century to reason about infectious spread. During recent global outbreaks, these models became mainstream in dashboards and policy discussions, making mathematical literacy around growth and assumptions a practical skill.
2.4 Common Misconceptions
- Misconception: Exponential means forever. Reality: Real systems saturate due to constraints.
- Misconception: One R0 number predicts everything. Reality: Transmission changes with behavior and interventions.
- Misconception: Curves are exact forecasts. Reality: They are scenario tools based on assumptions.
3. Project Specification
3.1 What You Will Build
A deterministic simulator that:
- Runs one or more SIR scenarios over configurable days
- Produces daily S, I, R values and summary metrics
- Compares baseline and intervention scenarios
- Exports table and chart outputs for review
3.2 Functional Requirements
- Scenario input: Accept population size and parameters (infection rate, recovery rate, initial infected).
- State updates: Compute S, I, R per time step with explicit formulas.
- Metrics: Report peak infected value, peak day, and cumulative recovered.
- Scenario comparison: Support baseline vs intervention in the same run.
- Output artifacts: Save daily series table and a labeled curve visualization.
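The standard library alone can save the daily series table. This is a minimal sketch with the following assumptions:

- The series is a list of (day, S, I, R) tuples.
- `write_series_csv` is a hypothetical helper name.
- The two-decimal rounding applies only to the file, not to the simulation state.

```python
import csv
from pathlib import Path


def write_series_csv(path, series):
    """Write one row per simulated day; `series` holds (day, S, I, R) tuples."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # e.g. create ./output/ if missing
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["day", "S", "I", "R"])
        for day, s, i, r in series:
            # Round for display only; the simulator keeps full precision.
            writer.writerow([day, f"{s:.2f}", f"{i:.2f}", f"{r:.2f}"])
```

For example, `write_series_csv("output/p11-baseline.csv", series)` would produce the series file shown in the transcript below.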
3.3 Non-Functional Requirements
- Performance: Simulate at least 365 days in under a few seconds.
- Reliability: State values never go negative and total population stays conserved.
- Usability: Outputs must make peak timing and intervention impact obvious.
3.4 Example Usage / Output
Pseudo terminal transcript:
```text
$ epidemic-sim --scenario baseline --days 180
Population: 10000
Initial state: S=9990 I=10 R=0
Peak infected: day 47, count 3124
Final state: S=2114 I=23 R=7863
Saved series: ./output/p11-baseline.csv
Saved chart: ./output/p11-baseline.png
$ epidemic-sim --scenario baseline --scenario intervention_30pct_contact_drop
Comparison summary:
- Baseline peak: day 47, 3124 infected
- Intervention peak: day 66, 1810 infected
```
3.5 Real World Outcome
After finishing, you can run two scenarios and immediately answer practical questions: when the wave peaks, how high it goes, and how much an intervention delays or reduces that peak. Your output should include clearly labeled curves where baseline and intervention diverge in an interpretable way.
4. Solution Architecture
4.1 High-Level Design
```text
Scenario Parameters
|
v
State Initializer (initial S, I, R)
|
v
Daily Update Engine
|
+--> Metrics Extractor (peak day, peak size, totals)
|
v
Series Export (table) + Chart Renderer
```
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Scenario Loader | Parse input parameters and intervention windows | Keep schema small and explicit |
| Update Engine | Apply daily SIR transitions | Enforce conservation checks each step |
| Metrics Module | Compute peak and final outcomes | Store both absolute and percentage metrics |
| Comparator | Align baseline vs intervention outputs | Use same timeline and initial state |
| Reporter | Export chart/table and narrative summary | Include assumptions in every report |
4.3 Data Structures
```text
Scenario:
  name: string
  population: integer
  beta: decimal             # transmission rate, per day
  gamma: decimal            # recovery rate, per day
  initial_infected: integer
  duration_days: integer
  interventions: list of {start_day, end_day, beta_multiplier}
State:
  day: integer
  S: decimal
  I: decimal
  R: decimal
RunSummary:
  peak_day: integer
  peak_infected: decimal
  final_S: decimal
  final_I: decimal
  final_R: decimal
```
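In Python, these structures map naturally onto dataclasses. The sketch below is one possible translation, and it makes a few assumptions:

- Decimals become `float`.
- Intervention windows become a small frozen dataclass.
- Treating `end_day` as inclusive is an assumption the schema above does not settle.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Intervention:
    start_day: int          # first day the multiplier applies
    end_day: int            # last day it applies (inclusive: an assumption)
    beta_multiplier: float  # e.g. 0.70 means a 30% contact reduction


@dataclass
class Scenario:
    name: str
    population: int
    beta: float             # transmission rate, per day
    gamma: float            # recovery rate, per day
    initial_infected: int
    duration_days: int
    # default_factory gives every Scenario its own empty list
    interventions: list[Intervention] = field(default_factory=list)


@dataclass
class State:
    day: int
    S: float
    I: float
    R: float


@dataclass
class RunSummary:
    peak_day: int
    peak_infected: float
    final_S: float
    final_I: float
    final_R: float
```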
4.4 Algorithm Overview
Key Algorithm: Daily SIR Step
- Compute new infections from current S, I, and transmission factor.
- Compute new recoveries from current I and recovery factor.
- Update S, I, R using mass-conserving transitions.
- Clamp tiny negative values caused by floating-point round-off to zero so reported values stay non-negative.
- Record state and update peak metrics.
Complexity Analysis:
- Time: O(D) per scenario, where D is days simulated
- Space: O(D) if storing full series, O(1) if stream-only summary
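The daily step above can be sketched in Python as follows. This is one possible implementation under these assumptions:

- Transmission is frequency-dependent (new infections = beta·S·I/N).
- Both rates are per day.
- `sir_step` and `simulate` are hypothetical names.

The sketch stores the full series (O(D) space) and updates the peak as it goes.

```python
from dataclasses import dataclass


@dataclass
class State:
    day: int
    S: float
    I: float
    R: float


def sir_step(state: State, beta: float, gamma: float, n: float) -> State:
    """Advance one day using a discrete, frequency-dependent SIR update."""
    # Both flows are computed from the current state before anything changes.
    new_infections = beta * state.S * state.I / n
    new_recoveries = gamma * state.I
    # Never move more people out of a compartment than it holds.
    new_infections = min(new_infections, state.S)
    new_recoveries = min(new_recoveries, state.I)
    s = state.S - new_infections
    i = state.I + new_infections - new_recoveries
    r = state.R + new_recoveries
    # Defensive clamp for round-off; with the min() guards above it is normally a no-op.
    s, i, r = (max(x, 0.0) for x in (s, i, r))
    return State(state.day + 1, s, i, r)


def simulate(n: int, beta: float, gamma: float, initial_infected: int, days: int):
    """Return (series, peak_day, peak_infected) for one scenario."""
    state = State(0, float(n - initial_infected), float(initial_infected), 0.0)
    series = [state]
    peak_day, peak_infected = 0, state.I
    for _ in range(days):
        state = sir_step(state, beta, gamma, n)
        total = state.S + state.I + state.R
        # Fail fast if S + I + R drifts away from N.
        assert abs(total - n) < 1e-6 * n, f"day {state.day}: S+I+R={total}"
        series.append(state)
        if state.I > peak_infected:
            peak_day, peak_infected = state.day, state.I
    return series, peak_day, peak_infected


if __name__ == "__main__":
    series, peak_day, peak_infected = simulate(1000, 0.30, 0.10, 5, 120)
    print(f"Peak infected: day {peak_day}, count {peak_infected:.0f}")
```

A fuller version would package the peak and final values into a `RunSummary` rather than returning a tuple.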
5. Implementation Guide
5.1 Development Environment Setup
Pseudo setup checklist:
1) Install the Python runtime and a plotting library (for example, Matplotlib).
2) Create folders: scenarios/, output/, tests/.
3) Run a 10-day sanity scenario to validate conservation output.
5.2 Project Structure
```text
p11-epidemic-simulator/
├── scenarios/
│   ├── baseline.scenario
│   └── intervention.scenario
├── src/
│   ├── state_update_engine
│   ├── metrics
│   ├── comparator
│   └── report_writer
├── output/
└── tests/
```
5.3 The Core Question You’re Answering
“How do compounding transmission and recovery rates shape the full lifecycle of an outbreak?”
5.4 Concepts You Must Understand First
- Exponential growth and doubling time
- Can you explain why repeated multiplication accelerates?
- Can you estimate doubling time from a daily growth rate?
- Book Reference: “Calculus” by James Stewart, Ch. 1
- Compartment dynamics (SIR)
- What does each compartment represent physically?
- Which flows are allowed between compartments?
- Book Reference: “Mathematical Models in Biology” by Leah Edelstein-Keshet, Ch. 2
- Rates and units
- Are your parameters per-day or per-week?
- What breaks if units are inconsistent?
- Book Reference: “Modeling Infectious Diseases in Humans and Animals” by Keeling and Rohani, Ch. 1
- Model assumptions and limits
- What real behaviors are ignored (age structure, mobility networks, reinfection)?
- How should uncertainty be communicated?
- Book Reference: “An Introduction to Infectious Disease Modelling” by Emilia Vynnycky and Richard White, Ch. 1-3
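The doubling-time and units questions both reduce to one-line formulas. A quantity growing by a fraction r per day doubles after ln 2 / ln(1 + r) days. An average infectious period of D days corresponds to a per-day recovery rate of 1/D. The helper names below are hypothetical.

```python
import math


def doubling_time(daily_growth_rate):
    """Days for a quantity growing by `daily_growth_rate` per day (0.20 = 20%) to double."""
    if daily_growth_rate <= 0:
        raise ValueError("doubling time is only defined for positive growth")
    return math.log(2) / math.log(1 + daily_growth_rate)


def gamma_from_infectious_period(days):
    """Per-day recovery rate, assuming an average infectious period of `days` days."""
    return 1.0 / days
```

For example, 20% daily growth doubles in about 3.8 days. A 10-day infectious period gives gamma = 0.1 per day. Entering "10" as the recovery rate, or mixing a per-week gamma with a per-day beta, is exactly the unit mismatch that breaks the model.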
5.5 Questions to Guide Your Design
- How will you prove that S + I + R remains equal to total population each day?
- Where will you encode interventions so scenario comparisons stay reproducible?
- Should your state values be integers, decimals, or both (for reporting vs math stability)?
- What summary metrics are most useful for non-technical readers?
5.6 Thinking Exercise
Before coding, reason through this manually:
Population = 1000
Initial: S=990, I=10, R=0
Assume day-level transitions produce:
new_infections = 12
new_recoveries = 3
Compute next-day S, I, R and verify conservation.
Then repeat the calculation once more, starting from the updated S, I, R values.
Questions:
- Why can I still grow even while recoveries happen?
- Under what condition does I begin to decline?
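After working the first step by hand, you can confirm it with a few lines of Python. The flows here are the fixed values given in the exercise; they are not derived from beta and gamma.

```python
N = 1000
S, I, R = 990, 10, 0
new_infections, new_recoveries = 12, 3  # flows given in the exercise

# Apply all three transitions from the same starting state.
S, I, R = S - new_infections, I + new_infections - new_recoveries, R + new_recoveries
assert S + I + R == N  # conservation holds
print(S, I, R)  # 978 19 3: I rose because 12 new infections exceed 3 recoveries
```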
5.7 The Interview Questions They’ll Ask
- What assumptions make the SIR model useful but limited?
- How do beta and gamma affect peak size and timing?
- Why can two scenarios with close parameters diverge so much?
- What does it mean when Re falls below 1?
- How would you validate a simulator before trusting its output?
5.8 Hints in Layers
Hint 1: Start with conservation checks
Log S + I + R every day and fail fast if it drifts.
Hint 2: Add one intervention only
Start with one beta multiplier applied over a single day range before adding complex policy schedules.
Hint 3: Compare baseline and intervention on one chart
Visual divergence is often easier to debug than raw tables.
Hint 4: Keep assumptions visible
Print scenario assumptions at the top of every output report.
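Hint 2's single-window multiplier fits in one small function. This sketch makes two assumptions that the hints do not specify:

- Windows are inclusive on both ends.
- Overlapping windows multiply together.

`effective_beta` is a hypothetical name.

```python
def effective_beta(day, base_beta, interventions):
    """Return the transmission rate in force on `day`.

    `interventions` is a list of (start_day, end_day, beta_multiplier) tuples.
    """
    beta = base_beta
    for start_day, end_day, multiplier in interventions:
        if start_day <= day <= end_day:  # inclusive window (assumption)
            beta *= multiplier           # overlaps compound (assumption)
    return beta
```

With a single window such as `[(20, 80, 0.70)]`, beta stays at its baseline value outside days 20 to 80 and drops to 70% of it inside. Keeping the window in scenario data rather than in the loop is what makes comparisons reproducible.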
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Exponential growth intuition | “Calculus” by James Stewart | Ch. 1 |
| Compartment model basics | “Mathematical Models in Biology” by Leah Edelstein-Keshet | Ch. 2 |
| Infectious disease model structure | “Modeling Infectious Diseases in Humans and Animals” by Keeling and Rohani | Ch. 1-3 |
| Interpreting assumptions | “An Introduction to Infectious Disease Modelling” by Vynnycky and White | Ch. 1-4 |
5.10 Implementation Phases
Phase 1: Foundation (3-5 hours)
Goals:
- Implement baseline SIR step logic
- Verify conservation and non-negative constraints
Tasks:
- Build state update loop for one scenario.
- Emit daily state series to terminal or table file.
Checkpoint: 30-day baseline run executes with valid state invariants.
Phase 2: Core Functionality (4-6 hours)
Goals:
- Add metrics and scenario comparison
- Generate plot-ready outputs
Tasks:
- Implement peak detection and summary metrics.
- Add baseline vs intervention comparator.
Checkpoint: Comparison report clearly shows different peak day/count.
Phase 3: Polish and Communication (3-5 hours)
Goals:
- Improve output clarity and interpretation
- Document assumptions and model limits
Tasks:
- Add annotations for peak day and intervention windows.
- Add assumptions block to output summary.
Checkpoint: A reader can understand outcome and caveats without reading source logic.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Time model | Continuous (differential equations), discrete daily | Discrete daily | Fits high-school math and is easier to validate |
| Value type | Integer-only, decimal | Decimal for math + rounded display | Avoids truncation artifacts |
| Intervention modeling | Hard-coded, scenario-driven | Scenario-driven | Enables reproducible experiments |
| Output emphasis | Table only, chart only, both | Both | Numeric audit + visual intuition |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | Validate update formulas | Single-step SIR transitions |
| Integration Tests | Validate scenario execution | 180-day baseline + intervention run |
| Edge Case Tests | Validate boundary behavior | Zero infected, extreme rates, tiny populations |
6.2 Critical Test Cases
- Population conservation: S + I + R equals N for every day.
- Non-negativity: No compartment becomes negative under valid parameters.
- Threshold behavior: Scenarios with lower effective transmission produce lower or delayed peaks.
- Deterministic reproducibility: Same scenario file yields same metrics and outputs.
6.3 Test Data
Scenario A: N=1000, beta=0.30, gamma=0.10, I0=5, days=120
Scenario B: same as A, but beta multiplier 0.70 from day 20-80
Scenario C: low transmission stress case (peak should remain small)
Scenario D: near-zero recovery stress case (long infectious tail)
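The critical cases above can be written as plain test functions that run under pytest or be called directly. The sketch below relies on several assumptions:

- It inlines a compact simulator so it runs on its own.
- It treats intervention windows as inclusive.
- It uses beta = 0.08 (R0 = 0.8) as a stand-in for the unspecified Scenario C.
- `run` and `peak` are hypothetical helpers.

```python
def run(n, beta, gamma, i0, days, interventions=()):
    """Return a list of (day, S, I, R); interventions are (start, end, multiplier)."""
    s, i, r = float(n - i0), float(i0), 0.0
    series = [(0, s, i, r)]
    for day in range(days):
        b = beta
        for start, end, mult in interventions:
            if start <= day <= end:
                b *= mult
        new_inf = min(b * s * i / n, s)
        new_rec = gamma * i
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        series.append((day + 1, s, i, r))
    return series


def peak(series):
    """Return (day, infected) for the first day with the highest I."""
    day, _, infected, _ = max(series, key=lambda row: row[2])
    return day, infected


N = 1000
A = dict(n=N, beta=0.30, gamma=0.10, i0=5, days=120)  # Scenario A


def test_conservation_and_non_negativity():
    for _, s, i, r in run(**A):
        assert abs(s + i + r - N) < 1e-6
        assert min(s, i, r) >= 0


def test_intervention_lowers_or_delays_peak():
    base_day, base_peak = peak(run(**A))
    b_day, b_peak = peak(run(**A, interventions=[(20, 80, 0.70)]))  # Scenario B
    assert b_peak < base_peak or b_day > base_day


def test_low_transmission_never_grows():
    day, value = peak(run(N, 0.08, 0.10, 5, 120))  # R0 = 0.8
    assert day == 0 and value == 5


def test_deterministic_reproducibility():
    assert run(**A) == run(**A)
```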
7. Common Pitfalls and Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Unit mismatch in rates | Unrealistic explosion or immediate collapse | Standardize parameters to per-day units |
| Missing conservation checks | Silent drift in total population | Assert S+I+R equals N each step |
| Overinterpreting outputs | False confidence in precise forecasts | Label assumptions and uncertainty clearly |
| Hard-coded interventions | Difficult scenario comparison | Use external scenario definitions |
7.2 Debugging Strategies
- Run a tiny toy population (N=20) and inspect each day manually.
- Compare one-day hand calculations to simulator output.
- Print transmission and recovery contributions separately before state update.
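For the N=20 toy run, printing the two flows before they are applied makes hand checking straightforward. This is a sketch only:

- The parameters (beta = 0.5, gamma = 0.2, one initial case) are illustrative assumptions chosen for easy arithmetic.
- `trace` is a hypothetical helper name.

```python
def trace(n=20, beta=0.5, gamma=0.2, i0=1, days=5):
    """Print each day's state and flows, then apply the flows; return the flows."""
    s, i, r = n - i0, float(i0), 0.0
    rows = []
    for day in range(days):
        new_inf = min(beta * s * i / n, s)  # transmission contribution
        new_rec = gamma * i                 # recovery contribution
        print(f"day {day}: S={s:.3f} I={i:.3f} R={r:.3f} "
              f"+inf={new_inf:.3f} +rec={new_rec:.3f}")
        rows.append((day, new_inf, new_rec))
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
    return rows
```

The day 0 line should show +inf=0.475 (0.5 · 19 · 1 / 20) and +rec=0.200. Both values are easy to reproduce by hand before trusting later days.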
7.3 Performance Traps
Re-rendering plots inside the simulation loop (for example, once per simulated day) wastes time. Store the series first, then plot once per scenario at the end.
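This is a minimal sketch of the "simulate first, plot once" pattern. It assumes Matplotlib is installed and that each run is stored as a list of (day, S, I, R) tuples. `plot_scenarios` is a hypothetical helper that does the following:

- Draws every scenario on one chart, as Hint 3 suggests.
- Marks each peak.
- Shades an optional intervention window.
- Writes a single image file.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend: render to files without a display
import matplotlib.pyplot as plt


def plot_scenarios(path, runs, intervention_window=None):
    """Plot I(t) for each named run on one chart and save it once.

    `runs` maps a scenario name to a list of (day, S, I, R) tuples.
    """
    fig, ax = plt.subplots(figsize=(8, 4.5))
    for name, series in runs.items():
        days = [row[0] for row in series]
        infected = [row[2] for row in series]
        (line,) = ax.plot(days, infected, label=name)
        # Annotate the peak in the same color as its curve.
        k = max(range(len(infected)), key=infected.__getitem__)
        ax.annotate(f"peak day {days[k]}", xy=(days[k], infected[k]),
                    xytext=(5, 5), textcoords="offset points", color=line.get_color())
    if intervention_window is not None:
        start, end = intervention_window
        ax.axvspan(start, end, alpha=0.15, color="gray", label="intervention")
    ax.set_xlabel("day")
    ax.set_ylabel("infected (I)")
    ax.legend()
    fig.savefig(path, dpi=120, bbox_inches="tight")
    plt.close(fig)  # free memory when plotting many scenarios
```

Call it once after all scenarios finish, for example `plot_scenarios("output/p11-compare.png", {"baseline": a, "intervention": b}, (20, 80))`.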
8. Extensions and Challenges
8.1 Beginner Extensions
- Add hospitalization proxy curve derived from infected counts.
- Add a compact plain-text daily report for terminal review.
8.2 Intermediate Extensions
- Add SEIR compartment (Exposed) and compare to SIR.
- Add vaccination rollout schedule with start-day and capacity constraints.
8.3 Advanced Extensions
- Add age-group compartments with different contact matrices.
- Add stochastic runs and confidence bands from multiple simulations.
9. Real-World Connections
9.1 Industry Applications
- Public health planning: Scenario analysis for capacity and intervention timing.
- Operations forecasting: Similar dynamics appear in support queues and outage propagation.
- Risk communication: Converting models into actionable visual narratives.
9.2 Related Open Source Projects
- EpiModel: https://www.epimodel.org/ - Network and compartment modeling ecosystem.
- Covasim: https://covasim.org/ - Agent-based epidemic simulation framework.
- Our World in Data: https://ourworldindata.org/ - Public data context for interpreting epidemic curves.
9.3 Interview Relevance
- You can explain compounding growth clearly with both equations and curves.
- You can discuss model assumptions and uncertainty responsibly.
- You can justify design choices in simulation systems (state updates, validation, outputs).
10. Resources
10.1 Essential Reading
- “Calculus” by James Stewart - Ch. 1 for exponential growth and rates.
- “Mathematical Models in Biology” by Leah Edelstein-Keshet - Ch. 2 for compartment model foundations.
- “Modeling Infectious Diseases in Humans and Animals” by Keeling and Rohani - Ch. 1-3 for parameter interpretation.
10.2 Video Resources
- Khan Academy lessons on exponential growth and differential thinking.
- Introductory talks on SIR model intuition (search: “SIR model explained visually”).
10.3 Tools and Documentation
- NumPy documentation: https://numpy.org/doc/ - Numeric arrays and calculations.
- Pandas documentation: https://pandas.pydata.org/docs/ - Time-series tables and summaries.
- Matplotlib documentation: https://matplotlib.org/stable/users/index.html - Curve plotting and annotations.
10.4 Related Projects in This Series
- P05-monte-carlo-casino.md: Probability and simulation thinking.
- P07-derivative-explorer.md: Rate-of-change intuition used in interpreting curve slopes.
- P08-area-estimator.md: Accumulation concepts for totals over time.
11. Self-Assessment Checklist
11.1 Understanding
- I can explain why exponential growth accelerates.
- I can describe SIR flows and what each parameter controls.
- I can explain why interventions shift both peak height and peak timing.
11.2 Implementation
- My simulator preserves total population each day.
- My outputs include baseline and intervention comparison metrics.
- My charts clearly label peaks and intervention windows.
11.3 Growth
- I can communicate model assumptions and limits without overclaiming certainty.
- I can propose sensible next refinements (SEIR, stochasticity, age groups).
- I can defend this project as a rigorous math-modeling exercise in interviews.
12. Submission / Completion Criteria
Minimum Viable Completion:
- Baseline SIR simulation runs for configurable duration.
- Daily S, I, R outputs are generated and conserved.
- Peak day and peak infected metrics are reported.
Full Completion:
- All minimum criteria plus:
- Baseline vs intervention scenario comparison is implemented.
- Chart output includes readable labels and peak annotations.
- A short assumptions section is included in output report.
Excellence (Going Above and Beyond):
- Multiple intervention schedules can be compared in one run.
- Additional compartment or stochastic extension is implemented.
- A concise analysis note interprets tradeoffs and uncertainty clearly.
This guide was expanded from LEARN_HIGH_SCHOOL_MATH_WITH_PYTHON.md. For the full sequence, see README.md.