Project 8: “The Property Based Testing Suite” — Advanced Testing

Attribute | Value
File | KIRO_CLI_LEARNING_PROJECTS.md
Main Programming Language | Python (Hypothesis) or TypeScript (fast-check)
Coolness Level | Level 4: Hardcore Tech Flex
Difficulty | Level 3: Advanced
Knowledge Area | Advanced Testing

What you’ll build: A booking system tested with PBT to check that no two bookings for the same room ever overlap.

Why it teaches PBT: It exposes subtle edge cases AI might miss.

Success criteria:

  • PBT finds at least one real bug before you fix it.

Real World Outcome

You’ll implement a room booking system where property-based testing automatically generates hundreds of randomized scenarios (thousands of individual bookings), exposing edge cases like timezone boundaries, concurrent bookings, and off-by-one errors that example-based tests would miss.

Example: Booking System Test Output

# test_booking.py
from hypothesis import given, strategies as st
from datetime import datetime, timedelta
from booking import BookingSystem

@given(
    bookings=st.lists(
        st.tuples(
            st.datetimes(min_value=datetime(2025,1,1), max_value=datetime(2025,12,31)),
            st.integers(min_value=1, max_value=8)  # duration in hours
        ),
        min_size=2,
        max_size=50
    )
)
def test_no_overlapping_bookings(bookings):
    system = BookingSystem()

    for start, duration in bookings:
        end = start + timedelta(hours=duration)
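        # book() must refuse (not store) any request that overlaps an existing booking
        system.book("room-A", start, end)

    # Property: no two accepted bookings for the same room overlap.
    # Checked with an independent half-open interval test rather than
    # Booking.overlaps(), so a bug in the code under test cannot hide in the oracle.
    all_bookings = system.get_bookings("room-A")
    for i, booking1 in enumerate(all_bookings):
        for booking2 in all_bookings[i+1:]:
            assert not (booking1.start < booking2.end and booking2.start < booking1.end), \
                f"Overlap detected: {booking1} and {booking2}"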

When you run the tests:

$ pytest test_booking.py -v

test_booking.py::test_no_overlapping_bookings FAILED

================================= FAILURES =================================
test_no_overlapping_bookings - AssertionError

Falsifying example:
  bookings = [
    (datetime(2025, 3, 15, 14, 0, 0), 2),  # 14:00-16:00
    (datetime(2025, 3, 15, 15, 59, 59), 1) # 15:59:59-16:59:59
  ]

AssertionError: Overlap detected:
  Booking(start=2025-03-15 14:00:00, end=2025-03-15 16:00:00)
  Booking(start=2025-03-15 15:59:59, end=2025-03-15 16:59:59)

Hypothesis found a counterexample after 147 test cases.
Shrunk input to minimal failing case.

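The bug revealed: these two bookings overlap by exactly one second (15:59:59 to 16:00:00), but the overlap check that book() relies on mishandled second-level precision and accepted the second booking. The fix is the standard half-open interval test on full-precision datetimes, in which [start, end) ranges overlap exactly when each one starts before the other ends:

def overlaps(self, other):
    # Half-open intervals [start, end): strict < means a booking ending at 16:00
    # does not conflict with one starting at 16:00, while any shared second does.
    return self.start < other.end and self.end > other.start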
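For reference, here is a minimal in-memory sketch of the booking.py module the test imports, using the fixed check. The Booking dataclass, the uuid-based booking IDs, and the decision to return None for a rejected request (rather than raise) are assumptions made for illustration, not a prescribed design.

```python
# booking.py: minimal in-memory sketch (hypothetical design, not a reference solution)
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Booking:
    start: datetime
    end: datetime
    booking_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def duration(self) -> timedelta:
        return self.end - self.start

    def overlaps(self, other: "Booking") -> bool:
        # Half-open intervals [start, end): back-to-back bookings do not overlap.
        return self.start < other.end and self.end > other.start


class BookingSystem:
    def __init__(self) -> None:
        # booking_id -> (room_id, Booking)
        self._bookings: Dict[str, Tuple[str, Booking]] = {}

    def book(self, room_id: str, start: datetime, end: datetime) -> Optional[str]:
        """Store the booking and return its id, or return None if it is rejected."""
        if end <= start:
            return None  # empty or reversed time range
        candidate = Booking(start, end)
        if any(existing.overlaps(candidate) for existing in self.get_bookings(room_id)):
            return None  # conflicts with an active booking for this room
        self._bookings[candidate.booking_id] = (room_id, candidate)
        return candidate.booking_id

    def cancel(self, booking_id: str) -> bool:
        # True only if an active booking with this id existed and was removed.
        return self._bookings.pop(booking_id, None) is not None

    def get_bookings(self, room_id: str) -> List[Booking]:
        # Active bookings for the room, in chronological order.
        return sorted(
            (b for r, b in self._bookings.values() if r == room_id),
            key=lambda b: b.start,
        )
```

With this saved as booking.py next to test_booking.py, the property test runs as written. Returning None keeps the test loop simple; raising a custom exception is an equally reasonable design, as long as the test catches it.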

After fixing:

$ pytest test_booking.py -v

test_booking.py::test_no_overlapping_bookings PASSED

Hypothesis ran 100 examples (2,847 generated bookings in total)
All properties hold ✓

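Property-based testing generated 100 randomized booking scenarios (2,847 bookings in total) and found no counterexample. That is strong evidence rather than proof: the invariant held for every input tried, but those inputs are a sample of the space, not all of it.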


The Core Question You’re Answering

“How do I test properties that must hold for ALL possible inputs, not just the examples I thought of?”

Traditional example-based testing forces you to imagine edge cases. You write tests for:

  • Normal case: 2pm-3pm
  • Boundary case: Midnight
  • Edge case: Leap year February 29th

But you’ll always miss combinations. Property-based testing inverts this: you state the invariant (no overlaps), and the framework generates inputs designed to break it.

This project teaches you to think in properties (universal truths) rather than examples (specific scenarios).
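As a small illustration of thinking in properties, the sketch below states three universal truths about a half-open overlap check and lets Hypothesis try to break them. The standalone overlaps() helper and the intervals strategy are assumptions for the example; in the project you would point the same properties at your own Booking.overlaps().

```python
from datetime import datetime, timedelta

from hypothesis import given, strategies as st


def overlaps(a_start, a_end, b_start, b_end):
    # Half-open intervals [start, end) overlap when each starts before the other ends.
    return a_start < b_end and b_start < a_end


# Hypothetical strategy: a non-empty interval of 1 minute to 8 hours within 2025.
intervals = st.tuples(
    st.datetimes(min_value=datetime(2025, 1, 1), max_value=datetime(2025, 12, 31)),
    st.timedeltas(min_value=timedelta(minutes=1), max_value=timedelta(hours=8)),
).map(lambda pair: (pair[0], pair[0] + pair[1]))


@given(intervals, intervals)
def test_overlap_is_symmetric(a, b):
    # Property: the order of the two bookings never matters.
    assert overlaps(*a, *b) == overlaps(*b, *a)


@given(intervals)
def test_interval_overlaps_itself(a):
    # Property: any non-empty booking overlaps itself.
    assert overlaps(*a, *a)


@given(intervals, st.timedeltas(min_value=timedelta(minutes=1), max_value=timedelta(hours=8)))
def test_back_to_back_intervals_do_not_overlap(a, next_duration):
    # Property: a booking that starts exactly when another ends does not conflict.
    start, end = a
    assert not overlaps(start, end, end, end + next_duration)
```

The back-to-back property is the one that distinguishes a correct strict < check from an inclusive <= one.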


Concepts You Must Understand First

Stop and research these before coding:

  1. Property-Based Testing (PBT) vs Example-Based Testing
    • What is a “property” in the context of testing?
    • How does random generation differ from hand-crafted examples?
    • What is “shrinking” and why is it critical for debugging?
    • Book Reference: “Property-Based Testing with PropEr, Erlang, and Elixir” by Fred Hebert - Ch. 1
  2. Test Generators and Strategies
    • How do you define the space of valid inputs?
    • What constraints ensure generated data is realistic?
    • How do you generate dependent values (end time > start time)?
    • Web Reference: Hypothesis Documentation - Strategies
  3. Invariants and Postconditions
    • What makes a good invariant (universally true property)?
    • How do you express “for all X, property P holds”?
    • When should you test state transitions vs final outcomes?
    • Book Reference: “Growing Object-Oriented Software, Guided by Tests” Ch. 19
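For the question about dependent values (end time > start time), a common Hypothesis pattern is to draw the independent value first and derive the dependent one from it, so every example is valid by construction. A minimal sketch using @st.composite; the booking_times name and the 15-minute to 8-hour bounds are illustrative assumptions.

```python
from datetime import datetime, timedelta

from hypothesis import given, strategies as st


@st.composite
def booking_times(draw, min_start=datetime(2025, 1, 1), max_start=datetime(2025, 12, 31)):
    # Draw the independent value first...
    start = draw(st.datetimes(min_value=min_start, max_value=max_start))
    # ...then derive the dependent value from it, so end > start always holds.
    minutes = draw(st.integers(min_value=15, max_value=8 * 60))
    return start, start + timedelta(minutes=minutes)


@given(booking_times())
def test_generated_ranges_are_valid(times):
    start, end = times
    assert end > start
    assert end - start <= timedelta(hours=8)
```

Constructing valid data this way is usually preferable to generating arbitrary pairs and discarding bad ones with .filter() or assume(): heavy filtering wastes examples and can trigger Hypothesis's filter_too_much health check.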

Questions to Guide Your Design

Before implementing, think through these:

  1. System Properties
    • What invariants must ALWAYS hold in your booking system?
        • No overlapping bookings for the same room
        • Booking end time > start time
        • Cannot book in the past
        • Total bookings <= room capacity
    • Which of these can be violated by bad inputs vs implementation bugs?
  2. Test Data Generation
    • How do you generate realistic datetime ranges?
    • Should you test with timezones, or UTC only?
    • How do you ensure generated bookings have variety (short, long, overnight)?
    • Do you need to generate user IDs, or just time ranges?
  3. Shrinking Strategy
    • When a test fails with 50 bookings, how do you find the minimal failing case?
    • Should you shrink by removing bookings, or simplifying time ranges?
    • How do you preserve the failure while reducing complexity?
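One way to answer the data-generation questions is to generate timezone-aware UTC datetimes and mix several duration profiles with st.one_of, so short, long, and overnight bookings all appear. The profile boundaries below (15 to 60 minutes, 1 to 8 hours, and an evening start that ends the next morning) are illustrative assumptions. Note that st.datetimes() expects naive min_value/max_value bounds and attaches the timezone drawn from its timezones argument.

```python
from datetime import datetime, timedelta, timezone

from hypothesis import given, strategies as st

# Timezone-aware UTC starts within 2025 (avoids local-time surprises on CI machines).
utc_starts = st.datetimes(
    min_value=datetime(2025, 1, 1),
    max_value=datetime(2025, 12, 30),
    timezones=st.just(timezone.utc),
)

short_durations = st.timedeltas(min_value=timedelta(minutes=15), max_value=timedelta(hours=1))
long_durations = st.timedeltas(min_value=timedelta(hours=1), max_value=timedelta(hours=8))


def with_duration(durations):
    # Pair a UTC start with a duration drawn from the given profile.
    return st.tuples(utc_starts, durations).map(lambda p: (p[0], p[0] + p[1]))


@st.composite
def overnight(draw):
    # Start between 18:00 and 23:00 and end between 01:00 and 09:00 the next day.
    day = draw(utc_starts).replace(hour=0, minute=0, second=0, microsecond=0)
    start = day + timedelta(hours=draw(st.integers(min_value=18, max_value=23)))
    end = day + timedelta(days=1, hours=draw(st.integers(min_value=1, max_value=9)))
    return start, end


# One strategy that mixes all three profiles.
booking_ranges = st.one_of(
    with_duration(short_durations), with_duration(long_durations), overnight()
)


@given(booking_ranges)
def test_ranges_are_utc_and_positive(times):
    start, end = times
    assert start.utcoffset() == timedelta(0) and end.utcoffset() == timedelta(0)
    assert end > start
```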

Thinking Exercise

Property Discovery: Booking System Invariants

Given a booking system with this interface:

class BookingSystem:
    def book(room_id, start, end) -> booking_id
    def cancel(booking_id) -> bool
    def get_bookings(room_id) -> List[Booking]

List all properties that should ALWAYS hold:

Temporal Properties:

  1. For any booking: booking.end > booking.start
  2. Cannot book a time in the past relative to system time

Collision Properties:

  3. No two active bookings for the same room overlap
  4. After canceling booking X, X no longer blocks its time slot: a new request for the same room and exactly that slot succeeds

State Properties:

  5. Total active bookings equals successful book() calls minus successful cancel() calls
  6. get_bookings() returns bookings in chronological order

Now design PBT tests for each:

Property | Generator Strategy | Assertion
1. End > Start | Generate (start, start + positive_delta) | assert booking.end > booking.start
3. No overlaps | Generate list of (start, duration) tuples | Pairwise overlap check
5. Booking count | Generate sequence of book/cancel actions | assert len(get_bookings) == expected
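The first row of the table, together with the “cannot book in the past” property, can be sketched by generating both valid and deliberately invalid requests. validate_request() is a hypothetical helper, and taking now as a parameter instead of reading the system clock is a design assumption that keeps the property deterministic.

```python
from datetime import datetime, timedelta

from hypothesis import given, strategies as st


def validate_request(start: datetime, end: datetime, now: datetime) -> bool:
    # Hypothetical validator: a non-empty range that does not start in the past.
    return end > start and start >= now


times = st.datetimes(min_value=datetime(2025, 1, 1), max_value=datetime(2025, 12, 31))
positive = st.timedeltas(min_value=timedelta(seconds=1), max_value=timedelta(hours=8))
non_positive = st.timedeltas(min_value=timedelta(hours=-8), max_value=timedelta(0))


@given(now=times, lead=positive, duration=positive)
def test_future_positive_ranges_are_accepted(now, lead, duration):
    start = now + lead
    assert validate_request(start, start + duration, now)


@given(now=times, lead=positive, duration=non_positive)
def test_empty_or_reversed_ranges_are_rejected(now, lead, duration):
    start = now + lead
    assert not validate_request(start, start + duration, now)


@given(now=times, lag=positive, duration=positive)
def test_past_starts_are_rejected(now, lag, duration):
    start = now - lag
    assert not validate_request(start, start + duration, now)
```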

The Interview Questions They’ll Ask

  1. “Explain the difference between property-based testing and fuzzing. When would you use each?”

  2. “How would you write a property-based test for a sorting algorithm without reimplementing the sort?”

  3. “What strategies would you use to generate valid JSON that conforms to a specific schema?”

  4. “Describe how shrinking works in Hypothesis/QuickCheck and why it’s essential for debugging.”

  5. “How would you test a distributed system’s consistency guarantees using property-based testing?”

  6. “What are the limitations of PBT? Name scenarios where example-based tests are superior.”
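Question 2 has a well-known answer worth sketching: instead of recomputing the expected output, check properties that every correct sort must satisfy. The output is in non-decreasing order, and it is a permutation of the input (same elements, same multiplicities). my_sort() is a hypothetical stand-in for the implementation under test.

```python
from collections import Counter

from hypothesis import given, strategies as st


def my_sort(xs):
    # Stand-in for the implementation under test (insertion sort).
    result = []
    for x in xs:
        i = len(result)
        while i > 0 and result[i - 1] > x:
            i -= 1
        result.insert(i, x)
    return result


@given(st.lists(st.integers()))
def test_sort_properties(xs):
    out = my_sort(xs)
    # Property 1: the output is in non-decreasing order.
    assert all(a <= b for a, b in zip(out, out[1:]))
    # Property 2: the output is a permutation of the input (nothing lost or duplicated).
    assert Counter(out) == Counter(xs)
```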


Hints in Layers

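Hint 1: Start with Simple Properties. Before testing complex booking logic, verify basic properties:

@given(
    st.datetimes(min_value=datetime(2025, 1, 1), max_value=datetime(2030, 12, 31)),
    # Bounded durations keep start + duration far from datetime.max (no OverflowError)
    st.timedeltas(min_value=timedelta(hours=1), max_value=timedelta(days=7)),
)
def test_booking_duration_positive(start, duration):
    end = start + duration
    booking = Booking(start, end)
    assert booking.duration() > timedelta(0)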

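Hint 2: Use Composite Strategies. Generate bookings that meet domain constraints:

valid_booking = st.builds(
    # Booking takes (start, end), so derive end from a positive duration
    lambda start, duration: Booking(start, start + duration),
    start=st.datetimes(min_value=datetime(2025, 1, 1), max_value=datetime(2030, 12, 31)),
    duration=st.integers(min_value=1, max_value=8).map(lambda h: timedelta(hours=h)),
)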

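Hint 3: Test State Machines. Model booking workflows as state transitions:

class BookingStateMachine(RuleBasedStateMachine):
    def __init__(self):
        super().__init__()
        self.system = BookingSystem()

    @rule(start=st.datetimes(min_value=datetime(2025, 1, 1), max_value=datetime(2025, 12, 31)),
          duration=st.integers(min_value=1, max_value=8).map(lambda h: timedelta(hours=h)))
    def book_room(self, start, duration):
        self.system.book("room-A", start, start + duration)
        # Invariant: check no overlaps after every booking (an @invariant() method)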
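The Hint 3 skeleton can be grown into a complete sketch: a RuleBasedStateMachine that interleaves book and cancel steps, tracks which booking IDs it expects to be active, and checks the no-overlap, booking-count, and chronological-order properties as invariants after every step. To stay self-contained it includes TinyBookingSystem, a hypothetical single-room stand-in; in the project you would construct your own BookingSystem in __init__ instead. The rule bounds are also assumptions.

```python
from datetime import datetime, timedelta
from itertools import combinations

from hypothesis import strategies as st
from hypothesis.stateful import Bundle, RuleBasedStateMachine, invariant, rule


class TinyBookingSystem:
    """Hypothetical stand-in for the real BookingSystem (ignores room_id: one room)."""

    def __init__(self):
        self._next_id = 0
        self._active = {}  # booking_id -> (start, end)

    def book(self, room_id, start, end):
        if end <= start or any(s < end and start < e for s, e in self._active.values()):
            return None  # rejected: invalid range or conflict
        self._next_id += 1
        self._active[self._next_id] = (start, end)
        return self._next_id

    def cancel(self, booking_id):
        return self._active.pop(booking_id, None) is not None

    def get_bookings(self, room_id):
        return sorted(self._active.values())


class BookingStateMachine(RuleBasedStateMachine):
    booking_ids = Bundle("booking_ids")

    def __init__(self):
        super().__init__()
        self.system = TinyBookingSystem()
        self.model_active = set()  # ids we expect to be active

    # A narrow one-week window makes conflicts (and rejections) common.
    @rule(
        target=booking_ids,
        start=st.datetimes(min_value=datetime(2025, 1, 1), max_value=datetime(2025, 1, 7)),
        hours=st.integers(min_value=1, max_value=8),
    )
    def book_room(self, start, hours):
        booking_id = self.system.book("room-A", start, start + timedelta(hours=hours))
        if booking_id is not None:
            self.model_active.add(booking_id)
        # None (rejected) also enters the bundle, so cancelling unknown ids is exercised.
        return booking_id

    @rule(booking_id=booking_ids)
    def cancel_booking(self, booking_id):
        cancelled = self.system.cancel(booking_id)
        # Cancel succeeds exactly when the model says the booking is active.
        assert cancelled == (booking_id in self.model_active)
        self.model_active.discard(booking_id)

    @invariant()
    def no_overlaps(self):
        for (s1, e1), (s2, e2) in combinations(self.system.get_bookings("room-A"), 2):
            assert not (s1 < e2 and s2 < e1)

    @invariant()
    def count_matches_model(self):
        assert len(self.system.get_bookings("room-A")) == len(self.model_active)

    @invariant()
    def chronological_order(self):
        starts = [s for s, _ in self.system.get_bookings("room-A")]
        assert starts == sorted(starts)


# pytest collects this unittest.TestCase subclass and runs the state machine.
TestBookingStateMachine = BookingStateMachine.TestCase
```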

Hint 4: Shrinking and Debugging. When a test fails, Hypothesis automatically simplifies the input. Example:

Initial failure: 50 bookings
Shrunk to: 2 bookings (minimal reproduction)
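To build intuition for shrinking, here is a deliberately naive shrinker that repeatedly deletes one booking and keeps the deletion whenever the failure persists. It is a toy, not how Hypothesis works internally: Hypothesis shrinks the underlying sequence of random choices, which also simplifies individual values rather than only list length. The has_overlap() predicate and the sample data are assumptions.

```python
from datetime import datetime, timedelta


def has_overlap(bookings):
    # The failing property: some pair of (start, end) bookings overlaps.
    return any(
        a[0] < b[1] and b[0] < a[1]
        for i, a in enumerate(bookings)
        for b in bookings[i + 1:]
    )


def shrink_by_deletion(items, still_fails):
    """Greedy toy shrinker: drop one element at a time while the failure persists."""
    assert still_fails(items), "start from a failing input"
    changed = True
    while changed:
        changed = False
        for i in range(len(items)):
            candidate = items[:i] + items[i + 1:]
            if still_fails(candidate):
                items, changed = candidate, True
                break  # restart the scan on the smaller input
    return items


# 48 back-to-back one-hour bookings (no overlaps) plus two that overlap by one second.
base = datetime(2025, 3, 1)
bookings = [(base + timedelta(hours=h), base + timedelta(hours=h + 1)) for h in range(48)]
bookings.insert(10, (datetime(2025, 3, 15, 14), datetime(2025, 3, 15, 16)))
bookings.insert(30, (datetime(2025, 3, 15, 15, 59, 59), datetime(2025, 3, 15, 16, 59, 59)))

minimal = shrink_by_deletion(bookings, has_overlap)
print(len(bookings), "->", len(minimal))  # prints: 50 -> 2
```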

Books That Will Help

Topic | Book | Chapter
PBT Fundamentals | “Property-Based Testing with PropEr, Erlang, and Elixir” by Fred Hebert | Ch. 1-3
Hypothesis (Python) | “Effective Python” by Brett Slatkin | Item 76
QuickCheck (Haskell) | “Learn You a Haskell for Great Good!” by Miran Lipovača | Ch. 11
State Machine Testing | “Hypothesis Documentation” (online) | Stateful Testing Guide

Common Pitfalls & Debugging

Problem 1: “Tests pass locally but fail in CI due to timezone differences”

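  • Why: st.datetimes() generates naive datetimes by default, and any code that compares them with datetime.now() or converts them with astimezone() interprets them in the machine’s local timezone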
  • Fix: Always use UTC for test data: st.datetimes(timezones=st.just(timezone.utc))
  • Quick test: export TZ=America/New_York && pytest

Problem 2: “Hypothesis generates unrealistic edge cases (year 9999)”

  • Why: st.datetimes() defaults to the full datetime.min to datetime.max range (years 1 to 9999)
  • Fix: Constrain generators to realistic bounds: min_value=datetime(2025,1,1), max_value=datetime(2030,12,31)
  • Quick test: Add @settings(verbosity=Verbosity.verbose) to see generated values

Problem 3: “Test fails intermittently with different shrunk examples”

  • Why: Property relies on system state (database, clock)
  • Fix: Use deterministic seeds and isolated test fixtures
  • Quick test: @given(...) @settings(derandomize=True)

Problem 4: “Shrinking takes too long (>30 seconds)”

  • Why: Complex data structures with many interdependencies
  • Fix: Simplify generators or use @settings(max_examples=50) during development
  • Quick test: Monitor shrinking with --hypothesis-show-statistics
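Several of these fixes can be combined with Hypothesis settings profiles, so CI runs deterministically while local runs stay fast. A sketch of a conftest.py; the profile names, the HYPOTHESIS_PROFILE environment variable, and the example counts are assumptions, while settings.register_profile(), settings.load_profile(), derandomize, max_examples, deadline, and verbosity are part of Hypothesis’s settings API.

```python
# conftest.py (sketch): choose a Hypothesis profile per environment.
import os

from hypothesis import Verbosity, settings

# CI: deterministic generation so a failure reproduces on every run.
settings.register_profile("ci", derandomize=True, max_examples=200, deadline=None)
# Local development: fewer examples for a faster feedback loop.
settings.register_profile("dev", max_examples=50)
# Debugging: print every generated example.
settings.register_profile("debug", max_examples=10, verbosity=Verbosity.verbose)

settings.load_profile(os.getenv("HYPOTHESIS_PROFILE", "dev"))
```

In CI, run HYPOTHESIS_PROFILE=ci pytest; the Hypothesis pytest plugin also accepts --hypothesis-profile=ci to select a registered profile directly.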

Definition of Done

  • Implemented booking system with book(), cancel(), and get_bookings() methods
  • Property test verifies no overlapping bookings (with Hypothesis generating 100+ examples)
  • Property test found at least one real bug (documented in README)
  • Tests use constrained datetime generation (realistic time ranges)
  • Shrinking produces minimal failing examples (verified manually)
  • README explains each property being tested and why it matters
  • CI runs PBT with fixed seed for reproducible failures
  • Coverage report shows the generated inputs exercise every branch of the booking logic, including the conflict-rejection paths