Project 11: Epidemic Simulator (Exponential Growth and Compartments)

Build a scenario-driven simulator that models infection spread, peak load, and intervention effects over time.

Quick Reference

Attribute	Value
Difficulty	Level 2: Beginner-Intermediate
Time Estimate	Weekend to 1 week
Language	Python (Alternatives: JavaScript, R)
Prerequisites	Exponential functions, rates, percentages, basic plotting
Key Topics	Discrete-time simulation, SIR dynamics, reproduction number, parameter sensitivity

1. Learning Objectives

By completing this project, you will:

Explain why small rate changes can produce dramatically different epidemic outcomes.
Implement a discrete SIR-style model with explicit update equations.
Track and interpret peak infection timing, peak size, and total affected population.
Compare intervention scenarios (contact reduction, faster recovery, delayed response).
Communicate model limits and assumptions clearly, like an engineer.

2. Theoretical Foundation

2.1 Core Concepts

Exponential growth: Early spread can look slow, then accelerate rapidly due to compounding.
Compartment models: Population is partitioned into Susceptible (S), Infected (I), Recovered (R).
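Reproduction numbers: R0 is the expected number of secondary infections caused by one case in a fully susceptible population; the time-varying Re is the same quantity given current susceptibility and behavior.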
Finite populations: Growth eventually slows because susceptible people are depleted.
Discrete-time updates: Simulations step day-by-day, balancing infection and recovery flows.
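
To make the reproduction-number idea concrete, here is a minimal sketch. It assumes the standard SIR relationships R0 = beta / gamma and Re = R0 * S / N, which this guide does not state explicitly; the function names are illustrative only.

```python
def basic_reproduction_number(beta: float, gamma: float) -> float:
    """R0 under the assumed standard SIR form: beta / gamma."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return beta / gamma


def effective_reproduction_number(
    beta: float, gamma: float, susceptible: float, population: float
) -> float:
    """Re scales R0 by the fraction of the population still susceptible."""
    return basic_reproduction_number(beta, gamma) * susceptible / population


# Per-day rates: beta=0.30, gamma=0.10.
r0 = basic_reproduction_number(0.30, 0.10)
# Same rates once half of a 1000-person population is no longer susceptible.
re_half = effective_reproduction_number(0.30, 0.10, 500, 1000)
print(f"R0 = {r0:.2f}, Re at S/N = 0.5: {re_half:.2f}")
```

With these rates Re is still above 1 when half the population is susceptible, so infections would keep growing at that point; growth stops only once Re drops below 1.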

2.2 Why This Matters

This project turns a headline concept into measurable curves. You learn to reason about growth, delay, saturation, and tradeoffs in public-health-style systems. These same modeling skills appear in finance, infrastructure capacity planning, and marketing diffusion.

2.3 Historical Context / Background

Compartment models have been used for over a century to reason about infectious spread. During recent global outbreaks, these models became mainstream in dashboards and policy discussions, making mathematical literacy around growth and assumptions a practical skill.

2.4 Common Misconceptions

Misconception: Exponential means forever. Reality: Real systems saturate due to constraints.
Misconception: One R0 number predicts everything. Reality: Transmission changes with behavior and interventions.
Misconception: Curves are exact forecasts. Reality: They are scenario tools based on assumptions.

3. Project Specification

3.1 What You Will Build

A deterministic simulator that:

Runs one or more SIR scenarios over configurable days
Produces daily S, I, R values and summary metrics
Compares baseline and intervention scenarios
Exports table and chart outputs for review

3.2 Functional Requirements

Scenario input: Accept population size and parameters (infection rate, recovery rate, initial infected).
State updates: Compute S, I, R per time step with explicit formulas.
Metrics: Report peak infected value, peak day, and cumulative recovered.
Scenario comparison: Support baseline vs intervention in the same run.
Output artifacts: Save daily series table and a labeled curve visualization.
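
The "daily series table" requirement could be met with a plain CSV file. The sketch below uses only the standard library; the column names, the two-decimal rounding, and the function name `write_series_csv` are assumptions, not part of the specification.

```python
import csv
import io


def write_series_csv(rows, out_file):
    """Write (day, S, I, R) rows to an open text file as CSV."""
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(["day", "S", "I", "R"])
    for day, s, i, r in rows:
        # Round for display only; keep full precision inside the simulation.
        writer.writerow([day, f"{s:.2f}", f"{i:.2f}", f"{r:.2f}"])


# An in-memory buffer stands in for a real output file here.
buffer = io.StringIO()
write_series_csv([(0, 9990.0, 10.0, 0.0)], buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, pass a file opened with `open(path, "w", newline="")` in place of the buffer.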

3.3 Non-Functional Requirements

Performance: Simulate at least 365 days within a few seconds.
Reliability: State values never go negative and total population stays conserved.
Usability: Outputs must make peak timing and intervention impact obvious.
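
The reliability requirement can be enforced with a small guard that runs after every step. This is a sketch under assumptions: the tolerance value and the name `check_invariants` are illustrative choices, not requirements of the spec.

```python
import math


def check_invariants(s: float, i: float, r: float, population: float,
                     tol: float = 1e-6) -> None:
    """Raise ValueError if a compartment is negative or the total drifts."""
    if min(s, i, r) < -tol:
        raise ValueError(f"negative compartment: S={s}, I={i}, R={r}")
    if not math.isclose(s + i + r, population, rel_tol=0.0, abs_tol=tol):
        raise ValueError(f"population drift: S+I+R={s + i + r}, N={population}")


# A state that sums to the population passes silently.
check_invariants(2114, 23, 7863, 10000)

try:
    check_invariants(2114, 23, 7900, 10000)  # total is 10037, not 10000
    drift_detected = False
except ValueError:
    drift_detected = True
print("drift detected:", drift_detected)
```

Raising an exception rather than logging a warning makes the simulator fail fast, so a drifting run never reaches the report stage.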

3.4 Example Usage / Output

Pseudo terminal transcript:

$ epidemic-sim --scenario baseline --days 180
Population: 10000
Initial state: S=9990 I=10 R=0
Peak infected: day 47, count 3124
Final state: S=2114 I=23 R=7863
Saved series: ./output/p11-baseline.csv
Saved chart:  ./output/p11-baseline.png

$ epidemic-sim --scenario baseline --scenario intervention_30pct_contact_drop
Comparison summary:
- Baseline peak: day 47, 3124 infected
- Intervention peak: day 66, 1810 infected

3.5 Real World Outcome

After finishing, you can run two scenarios and immediately answer practical questions: when the wave peaks, how high it goes, and how much an intervention delays or reduces that peak. Your output should include clear labeled curves where baseline and intervention diverge in an interpretable way.

4. Solution Architecture

4.1 High-Level Design

Scenario Parameters
   |
   v
State Initializer (initial S, I, R)
   |
   v
Daily Update Engine
   |
   +--> Metrics Extractor (peak day, peak size, totals)
   |
   v
Series Export (table) + Chart Renderer

4.2 Key Components

Component	Responsibility	Key Decisions
Scenario Loader	Parse input parameters and intervention windows	Keep schema small and explicit
Update Engine	Apply daily SIR transitions	Enforce conservation checks each step
Metrics Module	Compute peak and final outcomes	Store both absolute and percentage metrics
Comparator	Align baseline vs intervention outputs	Use same timeline and initial state
Reporter	Export chart/table and narrative summary	Include assumptions in every report

4.3 Data Structures

Scenario:
  name: string
  population: integer
  beta: decimal        # transmission factor
  gamma: decimal       # recovery factor
  initial_infected: integer
  duration_days: integer
  interventions: list of {start_day, end_day, beta_multiplier}

State:
  day: integer
  S: decimal
  I: decimal
  R: decimal

RunSummary:
  peak_day: integer
  peak_infected: decimal
  final_S: decimal
  final_I: decimal
  final_R: decimal
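
One possible Python rendering of these structures uses frozen dataclasses. Mapping "decimal" to `float`, the validation rules, and the helper `initial_state` are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intervention:
    start_day: int
    end_day: int
    beta_multiplier: float


@dataclass(frozen=True)
class Scenario:
    name: str
    population: int
    beta: float               # transmission factor, assumed per day
    gamma: float              # recovery factor, assumed per day
    initial_infected: int
    duration_days: int
    interventions: tuple = ()  # tuple of Intervention objects

    def __post_init__(self):
        if not 0 <= self.initial_infected <= self.population:
            raise ValueError("initial_infected must be between 0 and population")
        if self.beta < 0 or self.gamma < 0:
            raise ValueError("beta and gamma must be non-negative")


@dataclass(frozen=True)
class State:
    day: int
    S: float
    I: float
    R: float


@dataclass(frozen=True)
class RunSummary:
    peak_day: int
    peak_infected: float
    final_S: float
    final_I: float
    final_R: float


def initial_state(scenario: Scenario) -> State:
    """Everyone who is not initially infected starts as susceptible."""
    infected = float(scenario.initial_infected)
    return State(day=0, S=scenario.population - infected, I=infected, R=0.0)


scenario = Scenario("intervention", 1000, 0.30, 0.10, 5, 120,
                    interventions=(Intervention(20, 80, 0.70),))
start = initial_state(scenario)
print(start)
```

Frozen dataclasses keep scenarios immutable, which helps baseline and intervention runs stay comparable.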

4.4 Algorithm Overview

Key Algorithm: Daily SIR Step

Compute new infections from current S, I, and transmission factor.
Compute new recoveries from current I and recovery factor.
Update S, I, R using mass-conserving transitions.
Clamp near-zero floating-point artifacts so reported values stay stable.
Record state and update peak metrics.

Complexity Analysis:

Time: O(D) per scenario, where D is days simulated
Space: O(D) if storing full series, O(1) if stream-only summary
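
The five steps above can be sketched as follows. The guide does not fix the infection formula, so this example assumes the common form new_infections = beta * S * I / N and new_recoveries = gamma * I, with both flows capped so no compartment goes negative. The name `sir_step` and the `eps` threshold are illustrative.

```python
def sir_step(s, i, r, beta, gamma, population, eps=1e-9):
    """Advance one day and return the next (S, I, R)."""
    # Step 1: new infections (assumed form), never more than S.
    new_infections = min(beta * s * i / population, s)
    # Step 2: new recoveries, never more than I.
    new_recoveries = min(gamma * i, i)
    # Step 3: mass-conserving transitions S -> I -> R.
    nxt = (s - new_infections,
           i + new_infections - new_recoveries,
           r + new_recoveries)
    # Step 4: snap tiny floating-point residue to exactly zero.
    return tuple(0.0 if abs(x) < eps else x for x in nxt)


# Step 5: record peak metrics while stepping (stream-only, O(1) space).
s, i, r = 995.0, 5.0, 0.0
peak_day, peak_infected = 0, i
for day in range(1, 121):
    s, i, r = sir_step(s, i, r, beta=0.30, gamma=0.10, population=1000)
    if i > peak_infected:
        peak_day, peak_infected = day, i
print(f"peak on day {peak_day} with about {peak_infected:.0f} infected")
```

Appending each state to a list inside the loop gives the O(D) full-series variant instead.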

5. Implementation Guide

5.1 Development Environment Setup

Pseudo setup checklist:

1) Prepare runtime and plotting dependency.
2) Create folders: scenarios/, output/, tests/.
3) Run a 10-day sanity scenario to validate conservation output.

5.2 Project Structure

p11-epidemic-simulator/
├── scenarios/
│   ├── baseline.scenario
│   └── intervention.scenario
├── src/
│   ├── state_update_engine
│   ├── metrics
│   ├── comparator
│   └── report_writer
├── output/
└── tests/

5.3 The Core Question You’re Answering

“How do compounding transmission and recovery rates shape the full lifecycle of an outbreak?”

5.4 Concepts You Must Understand First

Exponential growth and doubling time
- Can you explain why repeated multiplication accelerates?
- Can you estimate doubling time from a daily growth rate?
- Book Reference: “Calculus” by James Stewart, Ch. 1
Compartment dynamics (SIR)
- What does each compartment represent physically?
- Which flows are allowed between compartments?
- Book Reference: “Mathematical Models in Biology” by Leah Edelstein-Keshet, Ch. 2
Rates and units
- Are your parameters per-day or per-week?
- What breaks if units are inconsistent?
- Book Reference: “Modeling Infectious Diseases in Humans and Animals” by Keeling and Rohani, Ch. 1
Model assumptions and limits
- What real behaviors are ignored (age structure, mobility networks, reinfection)?
- How should uncertainty be communicated?
- Book Reference: “An Introduction to Infectious Disease Modelling” by Emilia Vynnycky and Richard White, Ch. 1-3
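
As a quick check on the doubling-time question, a quantity that grows by a fixed fraction g each day doubles after ln(2) / ln(1 + g) days. The sketch below applies that formula; the function name is illustrative.

```python
import math


def doubling_time(daily_growth_rate: float) -> float:
    """Days for a quantity growing by a fixed fraction per day to double."""
    if daily_growth_rate <= 0:
        raise ValueError("growth rate must be positive for doubling")
    return math.log(2) / math.log(1 + daily_growth_rate)


for rate in (0.05, 0.10, 0.20):
    print(f"{rate:.0%} per day -> doubles in about {doubling_time(rate):.1f} days")
```

At 10% daily growth, for example, the doubling time is a little over 7 days.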

5.5 Questions to Guide Your Design

How will you prove that S + I + R remains equal to total population each day?
Where will you encode interventions so scenario comparisons stay reproducible?
Should your state values be integers, decimals, or both (for reporting vs math stability)?
What summary metrics are most useful for non-technical readers?

5.6 Thinking Exercise

Before coding, reason through this manually:

Population = 1000
Initial: S=990, I=10, R=0
Assume day-level transitions produce:
  new_infections = 12
  new_recoveries = 3

Compute next-day S, I, R and verify conservation.
Then repeat the step once more, starting from the state you just computed.

Questions:

Why can the infected count I still grow even while recoveries happen?
Under what condition does I begin to decline?
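
Once you have worked the first step by hand, a few lines of code can confirm it. This sketch only moves the stated flows between compartments; `apply_flows` is a hypothetical helper name.

```python
def apply_flows(s, i, r, new_infections, new_recoveries):
    """Move people S -> I and I -> R; the total S + I + R is unchanged."""
    return (s - new_infections,
            i + new_infections - new_recoveries,
            r + new_recoveries)


day0 = (990, 10, 0)
day1 = apply_flows(*day0, new_infections=12, new_recoveries=3)
print(day1)       # (978, 19, 3)
print(sum(day1))  # 1000
```

The infected count rises by 9 here because new infections (12) exceed new recoveries (3).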

5.7 The Interview Questions They’ll Ask

What assumptions make the SIR model useful but limited?
How do beta and gamma affect peak size and timing?
Why can two scenarios with close parameters diverge so much?
What does it mean when Re falls below 1?
How would you validate a simulator before trusting its output?

5.8 Hints in Layers

Hint 1: Start with conservation checks
Log S + I + R every day and fail fast if it drifts.

Hint 2: Add one intervention only
First implement a single day-range beta multiplier before adding complex policy schedules.

Hint 3: Compare baseline and intervention on one chart
Visual divergence is often easier to debug than raw tables.

Hint 4: Keep assumptions visible
Print scenario assumptions at the top of every output report.
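
For Hint 2, a single day-range beta multiplier can be a small pure function. In this sketch, windows are (start_day, end_day, multiplier) tuples. It assumes both ends of a window are inclusive and that overlapping windows multiply together; the scenario schema does not specify either point.

```python
def effective_beta(day, base_beta, interventions):
    """Return beta for `day`, scaled by every window that covers it."""
    beta = base_beta
    for start_day, end_day, multiplier in interventions:
        if start_day <= day <= end_day:  # assumption: inclusive on both ends
            beta *= multiplier
    return beta


windows = [(20, 80, 0.70)]
for day in (19, 20, 80, 81):
    print(day, round(effective_beta(day, 0.30, windows), 4))
```

Keeping this lookup separate from the update step means the baseline run is simply the same code with an empty window list.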

5.9 Books That Will Help

Topic	Book	Chapter
Exponential growth intuition	“Calculus” by James Stewart	Ch. 1
Compartment model basics	“Mathematical Models in Biology” by Leah Edelstein-Keshet	Ch. 2
Infectious disease model structure	“Modeling Infectious Diseases in Humans and Animals” by Keeling and Rohani	Ch. 1-3
Interpreting assumptions	“An Introduction to Infectious Disease Modelling” by Vynnycky and White	Ch. 1-4

5.10 Implementation Phases

Phase 1: Foundation (3-5 hours)

Goals:

Implement baseline SIR step logic
Verify conservation and non-negative constraints

Tasks:

Build state update loop for one scenario.
Emit daily state series to terminal or table file.

Checkpoint: 30-day baseline run executes with valid state invariants.

Phase 2: Core Functionality (4-6 hours)

Goals:

Add metrics and scenario comparison
Generate plot-ready outputs

Tasks:

Implement peak detection and summary metrics.
Add baseline vs intervention comparator.

Checkpoint: Comparison report clearly shows different peak day/count.
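
Peak detection for Phase 2 can be a single pass over the stored series. The sketch below assumes the series is a list of (day, S, I, R) tuples and reports the earliest day on ties; the dictionary keys and the tiny sample series are illustrative.

```python
def summarize(series):
    """Return peak and final metrics from a list of (day, S, I, R) tuples."""
    if not series:
        raise ValueError("series is empty")
    # max() returns the first maximal row, so ties resolve to the earliest day.
    peak_day, _, peak_infected, _ = max(series, key=lambda row: row[2])
    _, final_s, final_i, final_r = series[-1]
    population = final_s + final_i + final_r
    return {
        "peak_day": peak_day,
        "peak_infected": peak_infected,
        "peak_pct": 100.0 * peak_infected / population,
        "final_S": final_s,
        "final_I": final_i,
        "final_R": final_r,
    }


sample = [(0, 990, 10, 0), (1, 978, 19, 3), (2, 970, 21, 9), (3, 966, 18, 16)]
summary = summarize(sample)
print(summary["peak_day"], summary["peak_infected"], f"{summary['peak_pct']:.1f}%")
```

Running `summarize` on both the baseline and the intervention series yields the two rows needed for the comparison report.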

Phase 3: Polish and Communication (3-5 hours)

Goals:

Improve output clarity and interpretation
Document assumptions and model limits

Tasks:

Add annotations for peak day and intervention windows.
Add assumptions block to output summary.

Checkpoint: A reader can understand outcome and caveats without reading source logic.

5.11 Key Implementation Decisions

Decision	Options	Recommendation	Rationale
Time model	Continuous differential, discrete daily	Discrete daily	Fits high-school math and is easier to validate
Value type	Integer-only, decimal	Decimal for math + rounded display	Avoids truncation artifacts
Intervention modeling	Hard-coded, scenario-driven	Scenario-driven	Enables reproducible experiments
Output emphasis	Table only, chart only, both	Both	Numeric audit + visual intuition

6. Testing Strategy

6.1 Test Categories

Category	Purpose	Examples
Unit Tests	Validate update formulas	Single-step SIR transitions
Integration Tests	Validate scenario execution	180-day baseline + intervention run
Edge Case Tests	Validate boundary behavior	Zero infected, extreme rates, tiny populations

6.2 Critical Test Cases

Population conservation: S + I + R equals N for every day.
Non-negativity: No compartment becomes negative under valid parameters.
Threshold behavior: When Re starts below 1, the infected count declines from the first day; more generally, scenarios with lower effective transmission produce lower or delayed peaks.
Deterministic reproducibility: Same scenario file yields same metrics and outputs.
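
Several of these cases translate directly into small test functions. The sketch below assumes a capped daily update (flows limited to the size of the source compartment); `sir_step` and `run` are illustrative stand-ins for your own engine.

```python
import math


def sir_step(s, i, r, beta, gamma, population):
    """One assumed daily step; flows are capped so nothing goes negative."""
    new_infections = min(beta * s * i / population, s)
    new_recoveries = min(gamma * i, i)
    return s - new_infections, i + new_infections - new_recoveries, r + new_recoveries


def run(population, beta, gamma, initial_infected, days):
    """Return the full list of (S, I, R) states, including day 0."""
    state = (float(population - initial_infected), float(initial_infected), 0.0)
    series = [state]
    for _ in range(days):
        state = sir_step(*state, beta, gamma, population)
        series.append(state)
    return series


def test_conservation_and_non_negativity():
    # Includes deliberately extreme and degenerate rates as edge cases.
    for beta, gamma in [(0.30, 0.10), (5.0, 1.5), (0.0, 0.0)]:
        for s, i, r in run(1000, beta, gamma, 5, 120):
            assert min(s, i, r) >= 0.0
            assert math.isclose(s + i + r, 1000, abs_tol=1e-6)


def test_deterministic():
    assert run(1000, 0.30, 0.10, 5, 120) == run(1000, 0.30, 0.10, 5, 120)


def test_zero_infected_stays_flat():
    assert all(state == (1000.0, 0.0, 0.0) for state in run(1000, 0.30, 0.10, 0, 30))


test_conservation_and_non_negativity()
test_deterministic()
test_zero_infected_stays_flat()
print("all checks passed")
```

The functions follow the `test_` naming convention, so a runner such as pytest would also collect them if they lived in the tests/ folder.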

6.3 Test Data

Scenario A: N=1000, beta=0.30, gamma=0.10, I0=5, days=120
Scenario B: same as A, but beta multiplier 0.70 from day 20-80
Scenario C: low transmission stress case (peak should remain small)
Scenario D: near-zero recovery stress case (long infectious tail)
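
Scenarios A and B can be compared with a compact runner. This sketch assumes the update form new_infections = beta * S * I / N, applies the multiplier to transitions out of days 20 through 80 inclusive, and uses a hypothetical function name; your own engine may differ on each of these points.

```python
def run_scenario(population, beta, gamma, initial_infected, days, interventions=()):
    """Return (peak_day, peak_infected) for an assumed discrete SIR run."""
    s, i, r = float(population - initial_infected), float(initial_infected), 0.0
    peak_day, peak_infected = 0, i
    for day in range(days):
        # Beta used for the transition out of `day`, scaled inside any window.
        day_beta = beta
        for start_day, end_day, multiplier in interventions:
            if start_day <= day <= end_day:
                day_beta *= multiplier
        new_infections = min(day_beta * s * i / population, s)
        new_recoveries = min(gamma * i, i)
        s, i, r = (s - new_infections,
                   i + new_infections - new_recoveries,
                   r + new_recoveries)
        if i > peak_infected:
            peak_day, peak_infected = day + 1, i
    return peak_day, peak_infected


peak_a = run_scenario(1000, 0.30, 0.10, 5, 120)
peak_b = run_scenario(1000, 0.30, 0.10, 5, 120, interventions=[(20, 80, 0.70)])
print(f"A: day {peak_a[0]}, {peak_a[1]:.0f} infected")
print(f"B: day {peak_b[0]}, {peak_b[1]:.0f} infected")
```

Under these assumptions the two runs are identical up to day 20, and Scenario B should then show a lower peak than Scenario A.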

7. Common Pitfalls and Debugging

7.1 Frequent Mistakes

Pitfall	Symptom	Solution
Unit mismatch in rates	Unrealistic explosion or immediate collapse	Standardize parameters to per-day units
Missing conservation checks	Silent drift in total population	Assert S+I+R equals N each step
Overinterpreting outputs	False confidence in precise forecasts	Label assumptions and uncertainty clearly
Hard-coded interventions	Difficult scenario comparison	Use external scenario definitions

7.2 Debugging Strategies

Run a tiny toy population (N=20) and inspect each day manually.
Compare one-day hand calculations to simulator output.
Print transmission and recovery contributions separately before state update.
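
A per-day trace on a toy population might look like the sketch below. It assumes the same update form as elsewhere in this guide's examples (new_infections = beta * S * I / N, new_recoveries = gamma * I); the row layout and the name `trace_steps` are illustrative.

```python
def trace_steps(population, beta, gamma, initial_infected, days):
    """Yield one dict per day with both flows listed separately."""
    s, i, r = float(population - initial_infected), float(initial_infected), 0.0
    for day in range(days):
        new_infections = min(beta * s * i / population, s)
        new_recoveries = min(gamma * i, i)
        # Report the flows computed from today's state before applying them.
        yield {"day": day, "S": s, "I": i, "R": r,
               "new_infections": new_infections,
               "new_recoveries": new_recoveries}
        s, i, r = (s - new_infections,
                   i + new_infections - new_recoveries,
                   r + new_recoveries)


rows = list(trace_steps(population=20, beta=0.30, gamma=0.10,
                        initial_infected=1, days=5))
for row in rows:
    print(f"day {row['day']}: S={row['S']:.3f} I={row['I']:.3f} R={row['R']:.3f} "
          f"infections={row['new_infections']:.3f} "
          f"recoveries={row['new_recoveries']:.3f}")
```

On day 0 the infection flow is 0.30 * 19 * 1 / 20 = 0.285 and the recovery flow is 0.10 * 1 = 0.1, which is easy to confirm by hand.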

7.3 Performance Traps

Re-rendering heavy plots inside every simulation loop wastes time. Store series first, plot once per scenario at the end.

8. Extensions and Challenges

8.1 Beginner Extensions

Add hospitalization proxy curve derived from infected counts.
Add a compact plain-text daily report for terminal review.

8.2 Intermediate Extensions

Add an Exposed (E) compartment to form an SEIR model and compare it to SIR.
Add vaccination rollout schedule with start-day and capacity constraints.
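
For the SEIR extension, one possible daily step is sketched below. The parameter `sigma` (the assumed per-day rate at which exposed people become infectious) does not appear in the original scenario schema, and the update form is the same assumed beta * S * I / N style; treat both as hypothetical.

```python
def seir_step(s, e, i, r, beta, sigma, gamma, population):
    """One assumed daily SEIR step: S -> E -> I -> R with capped flows."""
    new_exposed = min(beta * s * i / population, s)
    new_infectious = min(sigma * e, e)
    new_recoveries = min(gamma * i, i)
    return (s - new_exposed,
            e + new_exposed - new_infectious,
            i + new_infectious - new_recoveries,
            r + new_recoveries)


state = (995.0, 0.0, 5.0, 0.0)
peak_day, peak_infected = 0, state[2]
for day in range(1, 181):
    state = seir_step(*state, beta=0.30, sigma=0.20, gamma=0.10, population=1000)
    if state[2] > peak_infected:
        peak_day, peak_infected = day, state[2]
print(f"SEIR peak: day {peak_day}, about {peak_infected:.0f} infectious")
print(f"total after run: {sum(state):.6f}")
```

Running the SIR model with the same beta and gamma gives a direct view of how the extra compartment changes peak timing and size.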

8.3 Advanced Extensions

Add age-group compartments with different contact matrices.
Add stochastic runs and confidence bands from multiple simulations.

9. Real-World Connections

9.1 Industry Applications

Public health planning: Scenario analysis for capacity and intervention timing.
Operations forecasting: Similar dynamics appear in support queues and outage propagation.
Risk communication: Converting models into actionable visual narratives.

9.2 Related Open Source Projects

EpiModel: https://www.epimodel.org/ - Network and compartment modeling ecosystem.
Covasim: https://covasim.org/ - Agent-based epidemic simulation framework.
Our World in Data: https://ourworldindata.org/ - Public data context for interpreting epidemic curves.

9.3 Interview Relevance

You can explain compounding growth clearly with both equations and curves.
You can discuss model assumptions and uncertainty responsibly.
You can justify design choices in simulation systems (state updates, validation, outputs).

10. Resources

10.1 Essential Reading

“Calculus” by James Stewart - Ch. 1 for exponential growth and rates.
“Mathematical Models in Biology” by Leah Edelstein-Keshet - Ch. 2 for compartment model foundations.
“Modeling Infectious Diseases in Humans and Animals” by Keeling and Rohani - Ch. 1-3 for parameter interpretation.

10.2 Video Resources

Khan Academy lessons on exponential growth and differential thinking.
Introductory talks on SIR model intuition (search: “SIR model explained visually”).

10.3 Tools and Documentation

NumPy documentation: https://numpy.org/doc/ - Numeric arrays and calculations.
Pandas documentation: https://pandas.pydata.org/docs/ - Time-series tables and summaries.
Matplotlib documentation: https://matplotlib.org/stable/users/index.html - Curve plotting and annotations.

10.4 Related Projects in This Series

P05-monte-carlo-casino.md: Probability and simulation thinking.
P07-derivative-explorer.md: Rate-of-change intuition used in interpreting curve slopes.
P08-area-estimator.md: Accumulation concepts for totals over time.

11. Self-Assessment Checklist

11.1 Understanding

I can explain why exponential growth accelerates.
I can describe SIR flows and what each parameter controls.
I can explain why interventions shift both peak height and peak timing.

11.2 Implementation

My simulator preserves total population each day.
My outputs include baseline and intervention comparison metrics.
My charts clearly label peaks and intervention windows.

11.3 Growth

I can communicate model assumptions and limits without overclaiming certainty.
I can propose sensible next refinements (SEIR, stochasticity, age groups).
I can defend this project as a rigorous math-modeling exercise in interviews.

12. Submission / Completion Criteria

Minimum Viable Completion:

Baseline SIR simulation runs for configurable duration.
Daily S, I, R outputs are generated and conserved.
Peak day and peak infected metrics are reported.

Full Completion:

All minimum criteria plus:
Baseline vs intervention scenario comparison is implemented.
Chart output includes readable labels and peak annotations.
A short assumptions section is included in output report.

Excellence (Going Above and Beyond):

Multiple intervention schedules can be compared in one run.
Additional compartment or stochastic extension is implemented.
A concise analysis note interprets tradeoffs and uncertainty clearly.

This guide was expanded from LEARN_HIGH_SCHOOL_MATH_WITH_PYTHON.md. For the full sequence, see README.md.
