Project 10: Numerical Integration Visualizer

A tool that computes definite integrals numerically using various methods (rectangles, trapezoids, Simpson’s rule), visualizing the approximation and error.

Quick Reference

Attribute                         | Value
----------------------------------+--------------------------------------------------
Difficulty                        | Level 2: Intermediate (The Developer)
Main Programming Language         | Python
Alternative Programming Languages | C, Julia, Rust
Coolness Level                    | Level 3: Genuinely Clever
Business Potential                | 1. The “Resume Gold” (Educational/Personal Brand)
Knowledge Area                    | Numerical Methods / Calculus
Software or Tool                  | Integration Calculator
Main Book                         | “Numerical Recipes” by Press et al.

1. Learning Objectives

By completing this project, you will:

  1. Translate math definitions into deterministic implementation steps.
  2. Build validation checks that make correctness observable.
  3. Diagnose numerical, logical, and data-shape failures early.
  4. Explain tradeoffs in interviews using evidence from your own build.

2. All Theory Needed (Per-Concept Breakdown)

This project applies the following theory clusters:

  • Symbolic-to-numeric translation (expressions, data shapes, invariants)
  • Stability constraints (precision, scaling, stopping criteria)
  • Optimization or inference logic (depending on project objective)
  • Evaluation discipline (error analysis, test coverage, reproducibility)

Concept A: Mathematical Representation Discipline

Fundamentals A math expression is not executable until you define representation, ordering, and domain constraints. The same equation can be represented as a token stream, tree, matrix pipeline, or probability graph. Choosing representation determines what bugs you can catch early.

Deep Dive into the concept Most project failures begin before algorithm selection: they start with ambiguous representation. If your parser cannot distinguish unary minus from subtraction, your calculator fails. If your matrix dimensions are implicit rather than validated, your linear algebra pipeline fails silently. If your probabilistic assumptions (independence, stationarity, or class priors) are not explicit, your inference can look accurate on one split and collapse on another. The core implementation move is to treat representation as a contract. Define each object with shape, domain, and semantic intent. Then enforce invariants at boundaries: input parser, preprocessing, training loop, evaluation stage. This makes debugging local instead of global.

How this fits this project You will encode each operation with explicit contracts and invariant checks.

Definitions & key terms

  • Invariant: Property that must hold before and after each operation.
  • Shape contract: Expected dimensional structure of vectors/matrices/tensors.
  • Domain constraint: Allowed value range (for example log input > 0).
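
These constraints can be made executable. A minimal sketch, assuming nothing about the project's final design (the `checked_*` helper names are illustrative):

```python
import math

def checked_log(x: float) -> float:
    # Domain constraint: log is defined only for x > 0.
    if x <= 0:
        raise ValueError(f"domain violation: log requires x > 0, got {x}")
    return math.log(x)

def checked_dot(u: list, v: list) -> float:
    # Shape contract: both vectors must have the same length.
    if len(u) != len(v):
        raise ValueError(f"shape mismatch: len {len(u)} vs len {len(v)}")
    return sum(a * b for a, b in zip(u, v))
```

Because each check raises a distinct diagnostic at the boundary, a failure points directly at the violated contract instead of surfacing later as a wrong number.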

Mental model diagram

User Input -> Representation Layer -> Validated Operation -> Observable Output
              (tokens/shapes)        (invariants pass)       (tests/plots/logs)

How it works

  1. Parse/ingest data into typed structures.
  2. Validate shape/domain invariants.
  3. Execute operation.
  4. Compare observed output with expected behavior.
  5. Record failure signature if mismatch appears.

Minimal concrete example

PSEUDOCODE
read expression
tokenize with precedence rules
if token sequence invalid -> return syntax error
evaluate tree
if domain violation -> return bounded diagnostic
print value and confidence check
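
The pseudocode above can be made concrete with Python's `ast` module. This is a sketch restricted to pure arithmetic; the function names and the exact set of allowed operators are illustrative assumptions, not the project's required parser design:

```python
import ast
import operator

# Allowed node-type -> operation mapping; anything else is rejected.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv,
        ast.Pow: operator.pow, ast.USub: operator.neg}

def _eval(node):
    # Walk the parsed tree, allowing only numeric constants and arithmetic ops.
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return float(node.value)
    if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.operand))
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left, right = _eval(node.left), _eval(node.right)
        if isinstance(node.op, ast.Div) and right == 0:
            raise ValueError("domain violation: division by zero")  # bounded diagnostic
        return _OPS[type(node.op)](left, right)
    raise ValueError(f"unsupported construct: {type(node).__name__}")

def safe_eval(expr: str) -> float:
    try:
        tree = ast.parse(expr, mode="eval")
    except SyntaxError:
        raise ValueError(f"syntax error in {expr!r}")
    return _eval(tree.body)
```

Note that the parser handles unary minus correctly by construction: `safe_eval("-2**2")` returns -4.0, matching standard precedence.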

Common misconceptions

  • “If it runs once, representation is correct.” -> false.
  • “Type checks are enough without shape checks.” -> false.

Check-your-understanding questions

  1. Which invariant catches division-by-zero earliest?
  2. Why does shape validation belong at boundaries rather than only in core logic?
  3. Predict failure if tokenization ignores unary minus.

Check-your-understanding answers

  1. Domain check on denominator before operation execution.
  2. Boundary validation keeps errors local and diagnostic.
  3. Expressions like -2^2 get misinterpreted and produce wrong precedence behavior.

Real-world applications Feature preprocessing, model-serving input validation, and experiment-tracking schema enforcement.

Where you’ll apply it This project and every downstream project in the sprint.

References

  • CSAPP (Bryant & O’Hallaron), floating-point chapter
  • Math for Programmers (Paul Orland), representation-oriented chapters

Key insight Correct representation reduces the complexity of every later decision.

Summary Stable ML math implementations start with explicit contracts, not implicit assumptions.

Homework/Exercises

  1. Write five invariants for your project.
  2. Build a failing test input for each invariant.

Solutions

  1. Include at least one shape, one domain, one convergence, one reproducibility, and one output-range invariant.
  2. Each failing input should trigger exactly one diagnostic to keep root-cause analysis clean.

3. Build Blueprint

  1. Scope the smallest end-to-end slice that produces visible output.
  2. Add deterministic tests and edge-case probes.
  3. Layer complexity only after baseline behavior is stable.
  4. Add metrics logging before optimization.
  5. Run failure drills: perturb inputs, scale values, and check stability.
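
Step 5's scaling drill can be automated. A sketch, assuming a `left_riemann` implementation like the one in the hints (the `scaling_drill` helper is illustrative):

```python
def left_riemann(f, a, b, n):
    h = (b - a) / n
    return h * sum(f(a + i * h) for i in range(n))

def scaling_drill(method, f, a, b, n, k=10.0):
    # Integration is linear: integrating k*f must give k times the original
    # result, so any drift here signals a numerical or logic problem.
    base = method(f, a, b, n)
    scaled = method(lambda x: k * f(x), a, b, n)
    return abs(scaled - k * base)
```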

4. Real-World Outcome (Target)

$ python integrate.py "x^2" 0 3

Computing ∫₀³ x² dx

Method        | n=10    | n=100   | n=1000  | Exact
--------------+---------+---------+---------+-------
Left Riemann  | 7.695   | 8.865   | 8.987   | 9.000
Right Riemann | 10.395  | 9.135   | 9.014   | 9.000
Trapezoidal   | 9.045   | 9.000   | 9.000   | 9.000
Simpson's     | 9.000   | 9.000   | 9.000   | 9.000

[Visual: Area under x² from 0 to 3, with rectangles/trapezoids overlaid]
[Animation: More rectangles → better approximation]

Implementation Hints: Left Riemann sum:

def left_riemann(f, a, b, n):
    dx = (b - a) / n
    return sum(f(a + i*dx) * dx for i in range(n))

Trapezoidal: (f(left) + f(right)) / 2 * dx for each interval

Simpson’s rule (for even n):

∫f ≈ (dx/3) * [f(x₀) + 4f(x₁) + 2f(x₂) + 4f(x₃) + ... + f(xₙ)]

(alternating 4s and 2s, 1s at ends)
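
One way to keep that weight pattern honest is to generate it explicitly. A sketch (the function names are illustrative, not required structure):

```python
def simpson_weights(n):
    # Composite Simpson weights on n (even) intervals: 1, 4, 2, 4, ..., 2, 4, 1
    if n % 2:
        raise ValueError("Simpson's rule requires an even n")
    w = [4 if i % 2 else 2 for i in range(n + 1)]
    w[0] = w[-1] = 1
    return w

def simpsons(f, a, b, n):
    h = (b - a) / n
    return h / 3 * sum(wi * f(a + i * h) for i, wi in enumerate(simpson_weights(n)))
```

Separating the weights from the sum makes the 1-4-2-4-1 pattern directly testable on its own.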

Learning milestones:

  1. Rectangles approximate area → You understand integration geometrically
  2. More rectangles = better approximation → You understand limits
  3. Simpson’s converges much faster → You understand higher-order methods

5. Core Design Notes from Main Guide

Core Question

How can we compute the area under any curve when we cannot find an antiderivative, and why do different approximation methods converge at dramatically different rates?

Integration is fundamentally about accumulation: adding up infinitely many infinitesimally small pieces. When exact antiderivatives are unavailable (which is most of the time for real-world functions), we approximate by summing finite rectangles, trapezoids, or parabolas. The profound insight is that different approximation strategies converge to the true value at vastly different speeds. Simpson’s rule uses parabolas and achieves O(h^4) accuracy, while rectangles give only O(h). Understanding this prepares you for the numerical methods that underpin scientific computing.
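
These rates can be checked empirically by estimating the exponent p in error ≈ C·h^p from two grid sizes. A sketch, assuming minimal versions of the integrators (the `observed_order` helper is illustrative):

```python
import math

def left_riemann(f, a, b, n):
    h = (b - a) / n
    return h * sum(f(a + i * h) for i in range(n))

def simpsons(f, a, b, n):
    h = (b - a) / n
    return h / 3 * (f(a) + f(b)
                    + 4 * sum(f(a + i * h) for i in range(1, n, 2))
                    + 2 * sum(f(a + i * h) for i in range(2, n, 2)))

def observed_order(method, f, a, b, exact, n=64):
    # Halving h should divide the error by 2**p; solve for p.
    e1 = abs(method(f, a, b, n) - exact)
    e2 = abs(method(f, a, b, 2 * n) - exact)
    return math.log2(e1 / e2)
```

Note the choice of test functions matters: on x² Simpson's is exact (error 0, so no order can be measured), and on sin over [0, π] the left Riemann sum coincides with the trapezoidal rule by symmetry, so each method needs a function that exposes its generic behavior.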

Concepts You Must Understand First

Stop and research these before coding:

  1. The definite integral as a limit of Riemann sums
    • What does the integral symbol mean geometrically?
    • Why is the limit of the sum as rectangles get infinitely thin equal to the area?
    • What are left, right, and midpoint Riemann sums?
    • Book Reference: “Calculus” Chapter 5 - James Stewart
  2. The trapezoidal rule: using linear interpolation
    • Why is a trapezoid a better approximation than a rectangle?
    • What is the formula for the trapezoidal rule?
    • What is the error order (O(h^2))?
    • Book Reference: “Numerical Recipes” Chapter 4 - Press et al.
  3. Simpson’s rule: using quadratic interpolation
    • Why does fitting a parabola through three points give better accuracy?
    • What is the 1-4-1 weighting pattern?
    • Why does Simpson’s rule require an even number of intervals?
    • Book Reference: “Numerical Recipes” Chapter 4 - Press et al.
  4. Error analysis: why higher-order methods are better
    • What does O(h^n) error mean for convergence?
    • Why does Simpson’s rule (O(h^4)) need far fewer points than rectangles (O(h))?
    • When might higher-order methods NOT be better?
    • Book Reference: “Numerical Methods” Chapter 7 - Burden & Faires
  5. Adaptive integration: concentrating effort where needed
    • Why use more subintervals where the function changes rapidly?
    • What is the basic idea of adaptive quadrature?
    • How do you estimate local error to guide adaptation?
    • Book Reference: “Numerical Recipes” Chapter 4 - Press et al.

Questions to Guide Your Design

Before implementing, think through these:

  1. How will you represent the function to integrate? Lambda functions? Strings that get parsed? Callable objects?

  2. How will you handle functions with singularities or discontinuities? Division by zero at endpoints? Jump discontinuities?

  3. What comparison metrics will you show? Error vs. n? Computation time? Visual accuracy?

  4. How will you visualize each method? Overlaid rectangles/trapezoids on the function? Separate plots?

  5. Will you implement adaptive integration? This is more complex but shows the power of error estimation.

  6. How will you handle the “exact” value for comparison? Use scipy.integrate.quad as ground truth? Symbolic integration when possible?

Thinking Exercise

Before writing any code, compute the integral of x^2 from 0 to 3 by hand using each method with n=2:

True value: integral of x^2 from 0 to 3 = [x^3/3] from 0 to 3 = 27/3 = 9.0

Left Riemann (n=2):
  Interval width: h = (3-0)/2 = 1.5
  Left endpoints: x = 0, x = 1.5
  f(0) = 0, f(1.5) = 2.25
  Sum = (0 + 2.25) * 1.5 = 3.375
  Error = |9 - 3.375| = 5.625

Right Riemann (n=2):
  Right endpoints: x = 1.5, x = 3
  f(1.5) = 2.25, f(3) = 9
  Sum = (2.25 + 9) * 1.5 = 16.875
  Error = |9 - 16.875| = 7.875

Trapezoidal (n=2):
  Average of left and right = (3.375 + 16.875) / 2 = 10.125
  Or: h/2 * [f(0) + 2*f(1.5) + f(3)] = 1.5/2 * [0 + 4.5 + 9] = 10.125
  Error = |9 - 10.125| = 1.125  (Much better!)

Simpson's (n=2):
  Uses parabola through (0,0), (1.5, 2.25), (3, 9)
  = h/3 * [f(0) + 4*f(1.5) + f(3)] = 1.5/3 * [0 + 9 + 9] = 9.0
  Error = 0  (Exact! Simpson's is exact for polynomials up to degree 3)

This example shows why Simpson’s rule is so powerful!
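
The hand computation above can be replayed in a few lines of Python, mirroring the n=2 formulas exactly:

```python
def f(x):
    return x * x

h = 1.5  # (3 - 0) / 2 intervals

left = h * (f(0.0) + f(1.5))                      # left endpoints
right = h * (f(1.5) + f(3.0))                     # right endpoints
trap = h / 2 * (f(0.0) + 2 * f(1.5) + f(3.0))     # trapezoidal
simpson = h / 3 * (f(0.0) + 4 * f(1.5) + f(3.0))  # 1-4-1 weights
```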

Interview Questions

  1. “What is numerical integration and when do you need it?” Expected answer: Approximating the definite integral when no closed-form antiderivative exists, or when the function is only known at discrete points.

  2. “Explain the difference between left Riemann sums, trapezoidal rule, and Simpson’s rule.” Expected answer: Left Riemann uses rectangles with height at left endpoint (O(h) error). Trapezoidal uses trapezoids (linear interpolation, O(h^2) error). Simpson’s uses parabolas (quadratic interpolation, O(h^4) error).

  3. “Why does Simpson’s rule converge faster than the trapezoidal rule?” Expected answer: Simpson’s uses higher-order polynomial approximation (quadratics vs. lines). The error terms cancel out more effectively, giving O(h^4) vs O(h^2).

  4. “What is adaptive integration?” Expected answer: Using more subintervals where the function changes rapidly and fewer where it is smooth. Based on local error estimation.

  5. “When might the trapezoidal rule outperform Simpson’s rule?” Expected answer: When the function has discontinuities or sharp corners that violate smoothness assumptions. Also when function evaluations are very expensive and the higher accuracy does not justify extra points.

  6. “How do you verify the correctness of a numerical integration routine?” Expected answer: Test on functions with known integrals. Check that doubling n reduces error by the expected factor (h becomes h/2, so O(h^4) error should drop by 16x for Simpson’s).

Hints in Layers (Treat as pseudocode guidance)

Hint 1: Start with the three basic implementations:

def left_riemann(f, a, b, n):
    h = (b - a) / n
    return h * sum(f(a + i*h) for i in range(n))

def trapezoidal(f, a, b, n):
    h = (b - a) / n
    return h * (0.5*f(a) + sum(f(a + i*h) for i in range(1, n)) + 0.5*f(b))

def simpsons(f, a, b, n):
    if n % 2:
        raise ValueError("Simpson's rule requires an even number of intervals")
    h = (b - a) / n
    return h/3 * (f(a) + f(b) +
                  4*sum(f(a + i*h) for i in range(1, n, 2)) +  # odd nodes
                  2*sum(f(a + i*h) for i in range(2, n, 2)))   # even interior nodes

Hint 2: For visualization, draw rectangles/trapezoids:

import matplotlib.pyplot as plt
import matplotlib.patches as patches

# For rectangles (left Riemann): height f(x_left), width h
for i in range(n):
    x_left = a + i*h
    rect = patches.Rectangle((x_left, 0), h, f(x_left),
                             edgecolor='black', alpha=0.4)  # styling is illustrative
    ax.add_patch(rect)

Hint 3: Create a convergence plot:

ns = [2, 4, 8, 16, 32, 64, 128, 256]
errors = [abs(simpsons(f, a, b, n) - true_value) for n in ns]
plt.loglog(ns, errors, 'o-')  # Log-log shows power law clearly

Hint 4: Good test functions:

  • x^2 from 0 to 1 (exact = 1/3, smooth)
  • sin(x) from 0 to pi (exact = 2, smooth periodic)
  • sqrt(x) from 0 to 1 (exact = 2/3, infinite derivative at 0)
  • 1/(1+x^2) from 0 to 1 (exact = pi/4, smooth but interesting)
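
These can be wired into a table-driven check. A sketch assuming the trapezoidal implementation from Hint 1 (`CASES` and `max_error` are illustrative names):

```python
import math

def trapezoidal(f, a, b, n):
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))

# (integrand, a, b, exact value)
CASES = [
    (lambda x: x**2, 0.0, 1.0, 1/3),
    (math.sin, 0.0, math.pi, 2.0),
    (math.sqrt, 0.0, 1.0, 2/3),
    (lambda x: 1/(1 + x**2), 0.0, 1.0, math.pi/4),
]

def max_error(method, n=1000):
    # Worst-case error across the whole test suite for a given method.
    return max(abs(method(f, a, b, n) - exact) for f, a, b, exact in CASES)
```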

Hint 5: For adaptive integration, the basic idea:

def adaptive(f, a, b, tol):
    mid = (a + b) / 2
    whole = simpsons(f, a, b, 2)    # one Simpson step over [a, b]
    left = simpsons(f, a, mid, 2)   # refined: two half-width steps
    right = simpsons(f, mid, b, 2)
    # |whole - (left + right)| estimates the local error; production versions
    # refine this test with Richardson extrapolation (a factor of 15).
    if abs(whole - (left + right)) < tol:
        return left + right
    else:
        return adaptive(f, a, mid, tol/2) + adaptive(f, mid, b, tol/2)

Books That Will Help

Topic                         | Book                                         | Chapter
------------------------------+----------------------------------------------+------------
Riemann Sums and Theory       | “Calculus” - James Stewart                   | Chapter 5
Numerical Integration Methods | “Numerical Recipes” - Press et al.           | Chapter 4
Error Analysis                | “Numerical Methods” - Burden & Faires        | Chapter 4
Adaptive Quadrature           | “Numerical Recipes” - Press et al.           | Chapter 4
Gaussian Quadrature           | “Numerical Linear Algebra” - Trefethen & Bau | Chapter 19
Applications in ML            | “Pattern Recognition and ML” - Bishop        | Appendix A
Visualization                 | “Math for Programmers” - Paul Orland         | Chapter 8


6. Validation, Pitfalls, and Completion

Common Pitfalls and Debugging

Problem 1: “Outputs drift after a few iterations”

  • Why: Hidden numerical instability (unscaled features, aggressive step size, or repeated subtraction of nearly equal values).
  • Fix: Normalize inputs, reduce step size, and track relative error rather than only absolute error.
  • Quick test: Run the same task with two scales of input (for example x and 10x) and compare normalized error curves.

Problem 2: “Results are inconsistent across runs”

  • Why: Random seeds, data split randomness, or non-deterministic ordering are uncontrolled.
  • Fix: Set seeds, log configuration, and store split indices and hyperparameters with each run.
  • Quick test: Re-run three times with the same seed and confirm metrics remain inside a tight tolerance band.

Problem 3: “The project works on the demo case but fails on edge cases”

  • Why: Tests only cover happy-path inputs.
  • Fix: Add adversarial inputs (empty values, extreme ranges, near-singular matrices, rare classes).
  • Quick test: Build an edge-case test matrix and ensure every scenario reports expected behavior.

Definition of Done

  • Core functionality works on reference inputs
  • Edge cases are tested and documented
  • Results are reproducible (seeded and versioned configuration)
  • Performance or convergence behavior is measured and explained
  • A short retrospective explains what failed first and how you fixed it

7. Extension Ideas

  1. Add a stress-test mode with adversarial inputs.
  2. Add a short benchmark report (runtime + memory + error trend).
  3. Add a reproducibility bundle (seed, config, and fixed test corpus).

8. Why This Project Matters


This project is valuable because it creates observable evidence of mathematical reasoning under real implementation constraints.