Project 11: Epidemic Simulator (Exponential Growth and Compartments)

Build a scenario-driven simulator that models infection spread, peak load, and intervention effects over time.

Quick Reference

Attribute     | Value
Difficulty    | Level 2: Beginner-Intermediate
Time Estimate | Weekend to 1 week
Language      | Python (Alternatives: JavaScript, R)
Prerequisites | Exponential functions, rates, percentages, basic plotting
Key Topics    | Discrete-time simulation, SIR dynamics, reproduction number, parameter sensitivity

1. Learning Objectives

By completing this project, you will:

  1. Explain why small rate changes can produce dramatically different epidemic outcomes.
  2. Implement a discrete SIR-style model with explicit update equations.
  3. Track and interpret peak infection timing, peak size, and total affected population.
  4. Compare intervention scenarios (contact reduction, faster recovery, delayed response).
  5. Communicate model limits and assumptions clearly, like an engineer.

2. Theoretical Foundation

2.1 Core Concepts

  • Exponential growth: Early spread can look slow, then accelerate rapidly due to compounding.
  • Compartment models: Population is partitioned into Susceptible (S), Infected (I), Recovered (R).
  • Reproduction numbers: R0 and time-varying Re summarize expected secondary infections.
  • Finite populations: Growth eventually slows because susceptible people are depleted.
  • Discrete-time updates: Simulations step day-by-day, balancing infection and recovery flows.
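
The reproduction-number bullet can be made concrete with a small sketch. It assumes the standard SIR relationships R0 = beta / gamma and Re = R0 * S / N, with beta and gamma expressed per day. The function names are illustrative and not part of the project spec.

```python
def basic_reproduction_number(beta: float, gamma: float) -> float:
    """R0 = beta / gamma for the classic SIR model (assumed here)."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return beta / gamma


def effective_reproduction_number(beta: float, gamma: float,
                                  susceptible: float, population: float) -> float:
    """Re = R0 * S / N: shrinks as susceptible people are depleted."""
    return basic_reproduction_number(beta, gamma) * susceptible / population


r0 = basic_reproduction_number(beta=0.30, gamma=0.10)
re_half = effective_reproduction_number(0.30, 0.10, susceptible=500, population=1000)
re_late = effective_reproduction_number(0.30, 0.10, susceptible=300, population=1000)
print(f"R0={r0:.2f}  Re(S=500)={re_half:.2f}  Re(S=300)={re_late:.2f}")
```

With these example rates, Re drops below 1 once fewer than about a third of the population is still susceptible, which is the point where the infected count starts to fall.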

2.2 Why This Matters

This project turns a headline concept into measurable curves. You learn to reason about growth, delay, saturation, and tradeoffs in public-health-style systems. The same modeling skills appear in finance, infrastructure capacity planning, and marketing diffusion.

2.3 Historical Context / Background

Compartment models have been used for over a century to reason about infectious spread. During recent global outbreaks, these models became mainstream in dashboards and policy discussions, making mathematical literacy around growth and assumptions a practical skill.

2.4 Common Misconceptions

  • Misconception: Exponential means forever. Reality: Real systems saturate due to constraints.
  • Misconception: One R0 number predicts everything. Reality: Transmission changes with behavior and interventions.
  • Misconception: Curves are exact forecasts. Reality: They are scenario tools based on assumptions.

3. Project Specification

3.1 What You Will Build

A deterministic simulator that:

  • Runs one or more SIR scenarios over configurable days
  • Produces daily S, I, R values and summary metrics
  • Compares baseline and intervention scenarios
  • Exports table and chart outputs for review

3.2 Functional Requirements

  1. Scenario input: Accept population size and parameters (infection rate, recovery rate, initial infected).
  2. State updates: Compute S, I, R per time step with explicit formulas.
  3. Metrics: Report peak infected value, peak day, and cumulative recovered.
  4. Scenario comparison: Support baseline vs intervention in the same run.
  5. Output artifacts: Save daily series table and a labeled curve visualization.
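
For the output-artifact requirement, a daily series table could be written with the standard-library csv module. This is a minimal sketch: the column names, the two-decimal rounding, and the helper name write_series_csv are assumptions, not a prescribed format.

```python
import csv
import io


def write_series_csv(rows, handle):
    """Write daily states as CSV with a header row.

    `rows` is an iterable of (day, S, I, R) tuples. Values are rounded
    for display only; the simulation itself should keep full precision.
    """
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["day", "S", "I", "R"])
    for day, s, i, r in rows:
        writer.writerow([day, f"{s:.2f}", f"{i:.2f}", f"{r:.2f}"])


# Demo with an in-memory buffer; use open(path, "w", newline="") for a real file.
demo_rows = [(0, 990.0, 10.0, 0.0), (1, 987.03, 11.97, 1.0)]
buffer = io.StringIO()
write_series_csv(demo_rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```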

3.3 Non-Functional Requirements

  • Performance: Simulate at least 365 days in under a few seconds.
  • Reliability: State values never go negative and total population stays conserved.
  • Usability: Outputs must make peak timing and intervention impact obvious.
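
The reliability requirement can be enforced with a small guard called after every step. The sketch below is one possible shape; the tolerance value and the function name check_invariants are assumptions.

```python
import math


def check_invariants(s, i, r, population, tol=1e-6):
    """Raise ValueError if a compartment is negative or the total drifts."""
    for name, value in (("S", s), ("I", i), ("R", r)):
        if value < -tol:
            raise ValueError(f"{name} is negative: {value}")
    total = s + i + r
    if not math.isclose(total, population, rel_tol=0.0, abs_tol=tol):
        raise ValueError(f"population drifted: {total} != {population}")


check_invariants(990.0, 10.0, 0.0, population=1000)  # passes silently

try:
    check_invariants(990.0, 12.0, 0.0, population=1000)  # two people too many
except ValueError as exc:
    drift_message = str(exc)
    print(drift_message)
```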

3.4 Example Usage / Output

Pseudo terminal transcript:

$ epidemic-sim --scenario baseline --days 180
Population: 10000
Initial state: S=9990 I=10 R=0
Peak infected: day 47, count 3124
Final state: S=2114 I=23 R=7863
Saved series: ./output/p11-baseline.csv
Saved chart:  ./output/p11-baseline.png

$ epidemic-sim --scenario baseline --scenario intervention_30pct_contact_drop
Comparison summary:
- Baseline peak: day 47, 3124 infected
- Intervention peak: day 66, 1810 infected

3.5 Real World Outcome

After finishing, you can run two scenarios and immediately answer practical questions: when the wave peaks, how high it goes, and how much an intervention delays or reduces that peak. Your output should include clear labeled curves where baseline and intervention diverge in an interpretable way.


4. Solution Architecture

4.1 High-Level Design

Scenario Parameters
   |
   v
State Initializer (initial S, I, R)
   |
   v
Daily Update Engine
   |
   +--> Metrics Extractor (peak day, peak size, totals)
   |
   v
Series Export (table) + Chart Renderer

4.2 Key Components

Component       | Responsibility | Key Decisions
Scenario Loader | Parse input parameters and intervention windows | Keep schema small and explicit
Update Engine   | Apply daily SIR transitions | Enforce conservation checks each step
Metrics Module  | Compute peak and final outcomes | Store both absolute and percentage metrics
Comparator      | Align baseline vs intervention outputs | Use same timeline and initial state
Reporter        | Export chart/table and narrative summary | Include assumptions in every report

4.3 Data Structures

Scenario:
  name: string
  population: integer
  beta: decimal        # transmission rate, per day
  gamma: decimal       # recovery rate, per day
  initial_infected: integer
  duration_days: integer
  interventions: list of {start_day, end_day, beta_multiplier}

State:
  day: integer
  S: decimal
  I: decimal
  R: decimal

RunSummary:
  peak_day: integer
  peak_infected: decimal
  final_S: decimal
  final_I: decimal
  final_R: decimal
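
In Python, the Scenario and State schemas above map naturally onto dataclasses. The following is a sketch, not a required layout: the validation rules, the inclusive end_day, the initial_state helper, and the example beta and gamma values are all assumptions. RunSummary could follow the same pattern.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Intervention:
    start_day: int
    end_day: int            # treated as inclusive in this sketch
    beta_multiplier: float


@dataclass(frozen=True)
class Scenario:
    name: str
    population: int
    beta: float             # transmission rate (per day assumed)
    gamma: float            # recovery rate (per day assumed)
    initial_infected: int
    duration_days: int
    interventions: List[Intervention] = field(default_factory=list)

    def __post_init__(self):
        if not 0 <= self.initial_infected <= self.population:
            raise ValueError("initial_infected must be between 0 and population")
        if self.beta < 0 or self.gamma < 0:
            raise ValueError("beta and gamma must be non-negative")


@dataclass
class State:
    day: int
    S: float
    I: float
    R: float


def initial_state(scenario: Scenario) -> State:
    """Day-0 state: everyone not initially infected is susceptible."""
    infected = float(scenario.initial_infected)
    return State(day=0, S=scenario.population - infected, I=infected, R=0.0)


# Example values only; beta and gamma here are not taken from the transcript.
baseline = Scenario("baseline", 10000, 0.30, 0.10, 10, 180)
start = initial_state(baseline)
print(start)
```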

4.4 Algorithm Overview

Key Algorithm: Daily SIR Step

  1. Compute new infections from current S, I, and transmission factor.
  2. Compute new recoveries from current I and recovery factor.
  3. Update S, I, R using mass-conserving transitions.
  4. Clamp tiny negative floating-point artifacts to zero so reported values stay stable.
  5. Record state and update peak metrics.
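
The steps above can be sketched as follows. This version assumes a forward-Euler daily update with new infections equal to beta * S * I / N, which is one common discretization and not the only valid one. Instead of clamping after the fact, it caps each flow at the size of its source compartment, which keeps values non-negative without breaking conservation.

```python
def sir_step(s, i, r, beta, gamma, population):
    """Advance one day with a forward-Euler SIR update (assumed form)."""
    new_infections = beta * s * i / population
    new_recoveries = gamma * i
    # Cap flows so no compartment can be overdrawn when rates are extreme.
    new_infections = min(new_infections, s)
    new_recoveries = min(new_recoveries, i)
    s_next = s - new_infections
    i_next = i + new_infections - new_recoveries
    r_next = r + new_recoveries
    return s_next, i_next, r_next


def run(population, beta, gamma, initial_infected, days):
    """Simulate `days` steps and return the full series plus peak metrics."""
    s, i, r = float(population - initial_infected), float(initial_infected), 0.0
    series = [(0, s, i, r)]
    peak_day, peak_infected = 0, i
    for day in range(1, days + 1):
        s, i, r = sir_step(s, i, r, beta, gamma, population)
        series.append((day, s, i, r))
        if i > peak_infected:
            peak_day, peak_infected = day, i
    return series, peak_day, peak_infected


series, peak_day, peak_infected = run(1000, 0.30, 0.10, 5, 120)
print(f"peak on day {peak_day} with {peak_infected:.1f} infected")
```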

Complexity Analysis:

  • Time: O(D) per scenario, where D is days simulated
  • Space: O(D) if storing full series, O(1) if stream-only summary
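
The space tradeoff can be illustrated with a generator: consuming the stream directly keeps memory constant, while wrapping it in list() stores the full series. This sketch reuses the same assumed forward-Euler update; iter_states and summarize are hypothetical names.

```python
def iter_states(population, beta, gamma, initial_infected, days):
    """Yield (day, S, I, R) one day at a time without storing the series."""
    s, i, r = float(population - initial_infected), float(initial_infected), 0.0
    yield 0, s, i, r
    for day in range(1, days + 1):
        new_infections = min(beta * s * i / population, s)
        new_recoveries = min(gamma * i, i)
        s = s - new_infections
        i = i + new_infections - new_recoveries
        r = r + new_recoveries
        yield day, s, i, r


def summarize(states):
    """Fold a state stream into peak and final values using O(1) memory."""
    peak_day, peak_infected, last = 0, float("-inf"), None
    for day, s, i, r in states:
        if i > peak_infected:
            peak_day, peak_infected = day, i
        last = (s, i, r)
    return {"peak_day": peak_day, "peak_infected": peak_infected, "final": last}


stream_summary = summarize(iter_states(1000, 0.30, 0.10, 5, 365))  # O(1) space
full_series = list(iter_states(1000, 0.30, 0.10, 5, 365))          # O(D) space
print(stream_summary["peak_day"], len(full_series))
```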

5. Implementation Guide

5.1 Development Environment Setup

Pseudo setup checklist:

1) Prepare runtime and plotting dependency.
2) Create folders: scenarios/, output/, tests/.
3) Run a 10-day sanity scenario to validate conservation output.

5.2 Project Structure

p11-epidemic-simulator/
├── scenarios/
│   ├── baseline.scenario
│   └── intervention.scenario
├── src/
│   ├── state_update_engine
│   ├── metrics
│   ├── comparator
│   └── report_writer
├── output/
└── tests/

5.3 The Core Question You’re Answering

“How do compounding transmission and recovery rates shape the full lifecycle of an outbreak?”

5.4 Concepts You Must Understand First

  1. Exponential growth and doubling time
    • Can you explain why repeated multiplication accelerates?
    • Can you estimate doubling time from a daily growth rate?
    • Book Reference: “Calculus” by James Stewart, Ch. 1
  2. Compartment dynamics (SIR)
    • What does each compartment represent physically?
    • Which flows are allowed between compartments?
    • Book Reference: “Mathematical Models in Biology” by Leah Edelstein-Keshet, Ch. 2
  3. Rates and units
    • Are your parameters per-day or per-week?
    • What breaks if units are inconsistent?
    • Book Reference: “Modeling Infectious Diseases in Humans and Animals” by Keeling and Rohani, Ch. 1
  4. Model assumptions and limits
    • What real behaviors are ignored (age structure, mobility networks, reinfection)?
    • How should uncertainty be communicated?
    • Book Reference: “An Introduction to Infectious Disease Modelling” by Emilia Vynnycky and Richard White, Ch. 1-3
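
For the doubling-time question, a quantity that grows by a fixed fraction g each day doubles after ln(2) / ln(1 + g) days. The sketch below applies that formula; the 20% daily growth rate is just an example value.

```python
import math


def doubling_time(daily_growth_rate):
    """Days for a quantity to double when it grows by a fixed fraction per day."""
    if daily_growth_rate <= 0:
        raise ValueError("growth rate must be positive")
    return math.log(2) / math.log(1 + daily_growth_rate)


t_double = doubling_time(0.20)  # 20% growth per day
print(f"doubling time: {t_double:.2f} days")
```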

5.5 Questions to Guide Your Design

  1. How will you prove that S + I + R remains equal to total population each day?
  2. Where will you encode interventions so scenario comparisons stay reproducible?
  3. Should your state values be integers, decimals, or both (for reporting vs math stability)?
  4. What summary metrics are most useful for non-technical readers?

5.6 Thinking Exercise

Before coding, reason through this manually:

Population = 1000
Initial: S=990, I=10, R=0
Assume day-level transitions produce:
  new_infections = 12
  new_recoveries = 3

Compute next-day S, I, R and verify conservation.
Then repeat the step once more, starting from the S, I, R values you just computed.

Questions:

  • Why can the infected count I still grow even while recoveries happen?
  • Under what condition does I begin to decline?
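
Once you have worked the exercise by hand, a few lines of Python can check the first step. The helper name apply_flows is hypothetical; it simply moves the given flows between compartments.

```python
def apply_flows(s, i, r, new_infections, new_recoveries):
    """Move people S -> I and I -> R; the total is unchanged by construction."""
    return (s - new_infections,
            i + new_infections - new_recoveries,
            r + new_recoveries)


day0 = (990, 10, 0)
day1 = apply_flows(*day0, new_infections=12, new_recoveries=3)
print(day1, sum(day1))  # (978, 19, 3) 1000
```

I rises by 9 because new infections (12) exceed new recoveries (3). It starts to decline only when recoveries outnumber new infections in a step.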

5.7 The Interview Questions They’ll Ask

  1. What assumptions make the SIR model useful but limited?
  2. How do beta and gamma affect peak size and timing?
  3. Why can two scenarios with close parameters diverge so much?
  4. What does it mean when Re falls below 1?
  5. How would you validate a simulator before trusting its output?

5.8 Hints in Layers

Hint 1: Start with conservation checks
Log S + I + R every day and fail fast if it drifts.

Hint 2: Add one intervention only
First implement a single day-range beta multiplier before adding complex policy schedules.

Hint 3: Compare baseline and intervention on one chart
Visual divergence is often easier to debug than raw tables.

Hint 4: Keep assumptions visible
Print scenario assumptions at the top of every output report.
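
Hint 2 can be sketched as a lookup that scales beta inside a day range. Two choices here are assumptions the spec leaves open: windows are inclusive on both ends, and overlapping windows multiply together.

```python
def effective_beta(base_beta, day, interventions):
    """Return beta for `day`, scaled by every intervention window covering it.

    `interventions` is a list of (start_day, end_day, beta_multiplier) tuples.
    """
    beta = base_beta
    for start_day, end_day, multiplier in interventions:
        if start_day <= day <= end_day:
            beta *= multiplier
    return beta


windows = [(20, 80, 0.70)]
betas = {day: effective_beta(0.30, day, windows) for day in (19, 20, 80, 81)}
print(betas)
```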

5.9 Books That Will Help

Topic | Book | Chapter
Exponential growth intuition | “Calculus” by James Stewart | Ch. 1
Compartment model basics | “Mathematical Models in Biology” by Leah Edelstein-Keshet | Ch. 2
Infectious disease model structure | “Modeling Infectious Diseases in Humans and Animals” by Keeling and Rohani | Ch. 1-3
Interpreting assumptions | “An Introduction to Infectious Disease Modelling” by Vynnycky and White | Ch. 1-4

5.10 Implementation Phases

Phase 1: Foundation (3-5 hours)

Goals:

  • Implement baseline SIR step logic
  • Verify conservation and non-negative constraints

Tasks:

  1. Build state update loop for one scenario.
  2. Emit daily state series to terminal or table file.

Checkpoint: 30-day baseline run executes with valid state invariants.

Phase 2: Core Functionality (4-6 hours)

Goals:

  • Add metrics and scenario comparison
  • Generate plot-ready outputs

Tasks:

  1. Implement peak detection and summary metrics.
  2. Add baseline vs intervention comparator.

Checkpoint: Comparison report clearly shows different peak day/count.

Phase 3: Polish and Communication (3-5 hours)

Goals:

  • Improve output clarity and interpretation
  • Document assumptions and model limits

Tasks:

  1. Add annotations for peak day and intervention windows.
  2. Add assumptions block to output summary.

Checkpoint: A reader can understand outcome and caveats without reading source logic.

5.11 Key Implementation Decisions

Decision | Options | Recommendation | Rationale
Time model | Continuous differential, discrete daily | Discrete daily | Fits high-school math and is easier to validate
Value type | Integer-only, decimal | Decimal for math + rounded display | Avoids truncation artifacts
Intervention modeling | Hard-coded, scenario-driven | Scenario-driven | Enables reproducible experiments
Output emphasis | Table only, chart only, both | Both | Numeric audit + visual intuition

6. Testing Strategy

6.1 Test Categories

Category          | Purpose | Examples
Unit Tests        | Validate update formulas | Single-step SIR transitions
Integration Tests | Validate scenario execution | 180-day baseline + intervention run
Edge Case Tests   | Validate boundary behavior | Zero infected, extreme rates, tiny populations

6.2 Critical Test Cases

  1. Population conservation: S + I + R equals N for every day.
  2. Non-negativity: No compartment becomes negative under valid parameters.
  3. Threshold behavior: Scenarios with lower effective transmission produce lower (and typically later) peaks; in the standard SIR formulation, infections never grow when beta / gamma is below 1.
  4. Deterministic reproducibility: Same scenario file yields same metrics and outputs.

6.3 Test Data

Scenario A: N=1000, beta=0.30, gamma=0.10, I0=5, days=120
Scenario B: same as A, but beta multiplier 0.70 from day 20-80
Scenario C: low transmission stress case (peak should remain small)
Scenario D: near-zero recovery stress case (long infectious tail)
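
Scenarios A and B can be turned into a small comparison test. The sketch below assumes a forward-Euler update, inclusive intervention windows, and that the step into a given day uses that day's beta. It checks only properties that follow from the model and asserts no specific peak values.

```python
def simulate_peak(population, beta, gamma, initial_infected, days, windows=()):
    """Return (peak_day, peak_infected) for one scenario.

    `windows` holds (start_day, end_day, beta_multiplier) tuples.
    """
    s, i, r = float(population - initial_infected), float(initial_infected), 0.0
    peak_day, peak_infected = 0, i
    for day in range(1, days + 1):
        beta_today = beta
        for start, end, multiplier in windows:
            if start <= day <= end:
                beta_today *= multiplier
        new_infections = min(beta_today * s * i / population, s)
        new_recoveries = min(gamma * i, i)
        s = s - new_infections
        i = i + new_infections - new_recoveries
        r = r + new_recoveries
        # Conservation check on every step, as the critical test cases require.
        assert abs(s + i + r - population) < 1e-6
        if i > peak_infected:
            peak_day, peak_infected = day, i
    return peak_day, peak_infected


day_a, peak_a = simulate_peak(1000, 0.30, 0.10, 5, 120)                    # Scenario A
day_b, peak_b = simulate_peak(1000, 0.30, 0.10, 5, 120, [(20, 80, 0.70)])  # Scenario B
print(f"A: day {day_a}, {peak_a:.1f} infected")
print(f"B: day {day_b}, {peak_b:.1f} infected")
```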

7. Common Pitfalls and Debugging

7.1 Frequent Mistakes

Pitfall | Symptom | Solution
Unit mismatch in rates | Unrealistic explosion or immediate collapse | Standardize parameters to per-day units
Missing conservation checks | Silent drift in total population | Assert S+I+R equals N each step
Overinterpreting outputs | False confidence in precise forecasts | Label assumptions and uncertainty clearly
Hard-coded interventions | Difficult scenario comparison | Use external scenario definitions

7.2 Debugging Strategies

  • Run a tiny toy population (N=20) and inspect each day manually.
  • Compare one-day hand calculations to simulator output.
  • Print transmission and recovery contributions separately before state update.

7.3 Performance Traps

Re-rendering heavy plots inside every simulation loop wastes time. Store series first, plot once per scenario at the end.


8. Extensions and Challenges

8.1 Beginner Extensions

  • Add hospitalization proxy curve derived from infected counts.
  • Add a compact plain-text daily report for terminal review.
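
A hospitalization proxy can be as simple as a fixed fraction of the infected count. In the sketch below, the 5% fraction, the capacity figure, and the sample infected values are placeholders for illustration, not epidemiological estimates.

```python
def hospitalization_proxy(infected_series, hospitalized_fraction=0.05):
    """Scale daily infected counts by a fixed fraction (illustrative value)."""
    if not 0.0 <= hospitalized_fraction <= 1.0:
        raise ValueError("hospitalized_fraction must be between 0 and 1")
    return [hospitalized_fraction * infected for infected in infected_series]


def days_over_capacity(hospital_series, capacity):
    """Return the day indices on which the proxy load exceeds capacity."""
    return [day for day, load in enumerate(hospital_series) if load > capacity]


infected = [10.0, 200.0, 3124.0, 900.0]  # made-up sample values
hospital = hospitalization_proxy(infected)
over = days_over_capacity(hospital, capacity=100)
print(hospital, over)
```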

8.2 Intermediate Extensions

  • Add SEIR compartment (Exposed) and compare to SIR.
  • Add vaccination rollout schedule with start-day and capacity constraints.
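
For the SEIR extension, one more flow is needed: Exposed people become Infected at a rate that this sketch calls sigma. The name sigma, its example value, and the forward-Euler form are assumptions; only the added E compartment comes from the extension itself.

```python
def seir_step(s, e, i, r, beta, sigma, gamma, population):
    """One forward-Euler day of an SEIR model (assumed discretization)."""
    new_exposed = min(beta * s * i / population, s)   # S -> E
    new_infectious = min(sigma * e, e)                # E -> I
    new_recoveries = min(gamma * i, i)                # I -> R
    return (s - new_exposed,
            e + new_exposed - new_infectious,
            i + new_infectious - new_recoveries,
            r + new_recoveries)


state = (995.0, 0.0, 5.0, 0.0)  # S, E, I, R
seir_peak_day, seir_peak = 0, state[2]
for day in range(1, 201):
    state = seir_step(*state, beta=0.30, sigma=0.20, gamma=0.10, population=1000)
    if state[2] > seir_peak:
        seir_peak_day, seir_peak = day, state[2]
print(f"SEIR peak: day {seir_peak_day}, {seir_peak:.1f} infected")
```

Running the SIR model with the same beta and gamma and comparing the two peaks is a natural first experiment.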

8.3 Advanced Extensions

  • Add age-group compartments with different contact matrices.
  • Add stochastic runs and confidence bands from multiple simulations.

9. Real-World Connections

9.1 Industry Applications

  • Public health planning: Scenario analysis for capacity and intervention timing.
  • Operations forecasting: Similar dynamics appear in support queues and outage propagation.
  • Risk communication: Converting models into actionable visual narratives.

9.2 Interview Relevance

  • You can explain compounding growth clearly with both equations and curves.
  • You can discuss model assumptions and uncertainty responsibly.
  • You can justify design choices in simulation systems (state updates, validation, outputs).

10. Resources

10.1 Essential Reading

  • “Calculus” by James Stewart - Ch. 1 for exponential growth and rates.
  • “Mathematical Models in Biology” by Leah Edelstein-Keshet - Ch. 2 for compartment model foundations.
  • “Modeling Infectious Diseases in Humans and Animals” by Keeling and Rohani - Ch. 1-3 for parameter interpretation.

10.2 Video Resources

  • Khan Academy lessons on exponential growth and differential thinking.
  • Introductory talks on SIR model intuition (search: “SIR model explained visually”).

10.3 Tools and Documentation


11. Self-Assessment Checklist

11.1 Understanding

  • I can explain why exponential growth accelerates.
  • I can describe SIR flows and what each parameter controls.
  • I can explain why interventions shift both peak height and peak timing.

11.2 Implementation

  • My simulator preserves total population each day.
  • My outputs include baseline and intervention comparison metrics.
  • My charts clearly label peaks and intervention windows.

11.3 Growth

  • I can communicate model assumptions and limits without overclaiming certainty.
  • I can propose sensible next refinements (SEIR, stochasticity, age groups).
  • I can defend this project as a rigorous math-modeling exercise in interviews.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Baseline SIR simulation runs for configurable duration.
  • Daily S, I, R outputs are generated and conserved.
  • Peak day and peak infected metrics are reported.

Full Completion:

  • All minimum criteria plus:
  • Baseline vs intervention scenario comparison is implemented.
  • Chart output includes readable labels and peak annotations.
  • A short assumptions section is included in output report.

Excellence (Going Above and Beyond):

  • Multiple intervention schedules can be compared in one run.
  • Additional compartment or stochastic extension is implemented.
  • A concise analysis note interprets tradeoffs and uncertainty clearly.

This guide was expanded from LEARN_HIGH_SCHOOL_MATH_WITH_PYTHON.md. For the full sequence, see README.md.