Project 8: “The Property Based Testing Suite” — Advanced Testing
| Attribute | Value |
|---|---|
| File | KIRO_CLI_LEARNING_PROJECTS.md |
| Main Programming Language | Python (Hypothesis) or TypeScript (fast-check) |
| Coolness Level | Level 4: Hardcore Tech Flex |
| Difficulty | Level 3: Advanced |
| Knowledge Area | Advanced Testing |
What you’ll build: A booking system tested with property-based testing (PBT) to check that no two bookings for the same room ever overlap.
Why it teaches PBT: It exposes subtle edge cases AI might miss.
Success criteria:
- PBT finds at least one real bug before you fix it.
Real World Outcome
You’ll implement a room booking system where property-based testing automatically generates thousands of test cases, exposing edge cases like timezone boundaries, concurrent bookings, and off-by-one errors that example-based tests would miss.
Example: Booking System Test Output
# test_booking.py
from hypothesis import given, strategies as st
from datetime import datetime, timedelta
from booking import BookingSystem

@given(
    bookings=st.lists(
        st.tuples(
            st.datetimes(min_value=datetime(2025, 1, 1), max_value=datetime(2025, 12, 31)),
            st.integers(min_value=1, max_value=8),  # duration in hours
        ),
        min_size=2,
        max_size=50,
    )
)
def test_no_overlapping_bookings(bookings):
    system = BookingSystem()
    for start, duration in bookings:
        end = start + timedelta(hours=duration)
        system.book("room-A", start, end)

    # Property: No two bookings should overlap
    all_bookings = system.get_bookings("room-A")
    for i, booking1 in enumerate(all_bookings):
        for booking2 in all_bookings[i + 1:]:
            assert not booking1.overlaps(booking2), \
                f"Overlap detected: {booking1} and {booking2}"
When you run the tests:
$ pytest test_booking.py -v
test_booking.py::test_no_overlapping_bookings FAILED
================================= FAILURES =================================
test_no_overlapping_bookings - AssertionError
Falsifying example:
bookings = [
(datetime(2025, 3, 15, 14, 0, 0), 2), # 14:00-16:00
(datetime(2025, 3, 15, 15, 59, 59), 1) # 15:59:59-16:59:59
]
AssertionError: Overlap detected:
Booking(start=2025-03-15 14:00:00, end=2025-03-15 16:00:00)
Booking(start=2025-03-15 15:59:59, end=2025-03-15 16:59:59)
Hypothesis found a counterexample after 147 test cases.
Shrunk input to minimal failing case.
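The bug revealed: the two bookings overlap by one second (15:59:59 to 16:00:00), yet the system accepted both, so the conflict check missed a second-level boundary case. The fix is to compare the full datetimes as half-open intervals [start, end), so that back-to-back bookings are allowed but any shared instant is rejected:
def overlaps(self, other):
    # Half-open intervals [start, end): strict comparisons, so a booking
    # ending at 16:00 does not collide with one starting at 16:00
    return self.start < other.end and self.end > other.start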
After fixing:
$ pytest test_booking.py -v
test_booking.py::test_no_overlapping_bookings PASSED
Hypothesis ran 100 test cases (2,847 examples total)
All properties hold ✓
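Property-based testing generated 2,847 booking combinations and found no counterexample to your invariant. That is strong evidence rather than a proof: the property was checked only on the inputs that were generated.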
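For reference, here is a minimal in-memory sketch of a `booking` module that would satisfy the test above. Everything in it is an assumption made for illustration: the `Booking` fields, integer booking ids, and the choice to have `book()` return `None` for a rejected request (raising an exception would be an equally reasonable design).

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import count


@dataclass(frozen=True)
class Booking:
    booking_id: int
    room_id: str
    start: datetime
    end: datetime

    def overlaps(self, other: "Booking") -> bool:
        # Half-open intervals [start, end): touching endpoints do not overlap.
        return self.start < other.end and other.start < self.end


class BookingSystem:
    def __init__(self):
        self._bookings = {}      # booking_id -> Booking
        self._ids = count(1)     # hypothetical id scheme: 1, 2, 3, ...

    def book(self, room_id, start, end):
        """Return a new booking id, or None if the request is rejected."""
        if end <= start:
            return None
        candidate = Booking(next(self._ids), room_id, start, end)
        for existing in self._bookings.values():
            if existing.room_id == room_id and existing.overlaps(candidate):
                return None
        self._bookings[candidate.booking_id] = candidate
        return candidate.booking_id

    def cancel(self, booking_id):
        # True only if the booking existed and was removed.
        return self._bookings.pop(booking_id, None) is not None

    def get_bookings(self, room_id):
        # Active bookings for one room, in chronological order.
        return sorted(
            (b for b in self._bookings.values() if b.room_id == room_id),
            key=lambda b: b.start,
        )
```

With this sketch, the falsifying example from the report is rejected at `book()` time, while a booking that starts exactly at 16:00 is still accepted.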
The Core Question You’re Answering
“How do I test properties that must hold for ALL possible inputs, not just the examples I thought of?”
Traditional example-based testing forces you to imagine edge cases. You write tests for:
- Normal case: 2pm-3pm
- Boundary case: Midnight
- Edge case: Leap year February 29th
But you’ll always miss combinations. Property-based testing inverts this: you state the invariant (no overlaps), and the framework generates inputs designed to break it.
This project teaches you to think in properties (universal truths) rather than examples (specific scenarios).
Concepts You Must Understand First
Stop and research these before coding:
- Property-Based Testing (PBT) vs Example-Based Testing
- What is a “property” in the context of testing?
- How does random generation differ from hand-crafted examples?
- What is “shrinking” and why is it critical for debugging?
- Book Reference: “Property-Based Testing with PropEr, Erlang, and Elixir” by Fred Hebert - Ch. 1
- Test Generators and Strategies
- How do you define the space of valid inputs?
- What constraints ensure generated data is realistic?
- How do you generate dependent values (end time > start time)?
- Web Reference: Hypothesis Documentation - Strategies
- Invariants and Postconditions
- What makes a good invariant (universally true property)?
- How do you express “for all X, property P holds”?
- When should you test state transitions vs final outcomes?
- Book Reference: “Growing Object-Oriented Software, Guided by Tests” Ch. 19
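The question about dependent values (end time > start time) can be sketched as follows. One common approach is to generate the independent parts (a start and a positive duration) and derive the dependent part from them, instead of generating two datetimes and filtering. The bounds and the test name below are arbitrary choices for illustration.

```python
from datetime import datetime, timedelta

from hypothesis import given, strategies as st

starts = st.datetimes(min_value=datetime(2025, 1, 1), max_value=datetime(2025, 12, 31))
durations = st.timedeltas(min_value=timedelta(minutes=15), max_value=timedelta(hours=8))

# Draw (start, duration), then derive the dependent value end = start + duration.
time_ranges = st.tuples(starts, durations).map(lambda pair: (pair[0], pair[0] + pair[1]))


@given(time_ranges)
def test_end_is_after_start(time_range):
    start, end = time_range
    assert end > start
    assert end - start <= timedelta(hours=8)


if __name__ == "__main__":
    # Calling a @given-decorated function with no arguments runs the property.
    test_end_is_after_start()
```

Deriving the end from the start keeps every generated example valid, so no examples are wasted on rejected inputs.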
Questions to Guide Your Design
Before implementing, think through these:
- System Properties
- What invariants must ALWAYS hold in your booking system?
- No overlapping bookings for the same room
- Booking end time > start time
- Cannot book in the past
- Total bookings <= room capacity
- Which of these can be violated by bad inputs vs implementation bugs?
- Test Data Generation
- How do you generate realistic datetime ranges?
- Should you test with timezones, or UTC only?
- How do you ensure generated bookings have variety (short, long, overnight)?
- Do you need to generate user IDs, or just time ranges?
- Shrinking Strategy
- When a test fails with 50 bookings, how do you find the minimal failing case?
- Should you shrink by removing bookings, or simplifying time ranges?
- How do you preserve the failure while reducing complexity?
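One way to think about "bad inputs vs implementation bugs" is to separate preconditions (checked on each request) from invariants (checked on the system's state). The sketch below uses hypothetical names (`InvalidBookingRequest`, `validate_request`, `has_overlap`) and represents bookings as plain `(start, end)` tuples; it is not tied to any particular design.

```python
from datetime import datetime, timedelta


class InvalidBookingRequest(ValueError):
    """Raised when the caller's input breaks a precondition."""


def validate_request(start, end, now):
    # Preconditions: violations here come from bad input, not from a bug.
    if end <= start:
        raise InvalidBookingRequest("end must be after start")
    if start < now:
        raise InvalidBookingRequest("cannot book in the past")


def has_overlap(bookings):
    # Invariant check: once sorted by start, it is enough to compare neighbours,
    # because each booking must end before the next one begins.
    ordered = sorted(bookings)
    return any(
        prev_end > next_start
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    )
```

A property test can then assert two different things: invalid requests are always rejected with `InvalidBookingRequest`, and `has_overlap` is never true for the stored bookings no matter which valid requests were made.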
Thinking Exercise
Property Discovery: Booking System Invariants
Given a booking system with this interface:
class BookingSystem:
    def book(self, room_id, start, end): ...        # returns a booking_id
    def cancel(self, booking_id) -> bool: ...
    def get_bookings(self, room_id) -> "List[Booking]": ...
List all properties that should ALWAYS hold:
Temporal Properties:
- For any booking: booking.end > booking.start
- Cannot book a time in the past relative to system time
Collision Properties:
- No two active bookings for the same room overlap
- After canceling booking X, its time slot is free again and overlap checks must ignore X
State Properties:
- Total active bookings equals successful book() calls minus successful cancel() calls
- get_bookings() returns bookings in chronological order
Now design PBT tests for each:
| Property | Generator Strategy | Assertion |
|---|---|---|
| End > Start | Generate (start, start + positive_delta) | assert booking.end > booking.start |
| No overlaps | Generate list of (start, duration) tuples | Pairwise overlap check |
| Booking count | Generate sequence of book/cancel actions | assert len(get_bookings(room_id)) == expected |
The Interview Questions They’ll Ask
- “Explain the difference between property-based testing and fuzzing. When would you use each?”
- “How would you write a property-based test for a sorting algorithm without reimplementing the sort?”
- “What strategies would you use to generate valid JSON that conforms to a specific schema?”
- “Describe how shrinking works in Hypothesis/QuickCheck and why it’s essential for debugging.”
- “How would you test a distributed system’s consistency guarantees using property-based testing?”
- “What are the limitations of PBT? Name scenarios where example-based tests are superior.”
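As one possible answer to the sorting question, the sketch below checks properties of the output instead of comparing against a second sort. `my_sort` is a hypothetical stand-in for the implementation under test.

```python
from collections import Counter

from hypothesis import given, strategies as st


def my_sort(items):
    # Hypothetical implementation under test; swap in your own.
    return sorted(items)


@given(st.lists(st.integers()))
def test_sort_properties(items):
    result = my_sort(items)
    # Property 1: the output is in non-decreasing order.
    assert all(a <= b for a, b in zip(result, result[1:]))
    # Property 2: the output is a permutation of the input (same multiset).
    assert Counter(result) == Counter(items)
    # Property 3: sorting an already sorted list changes nothing.
    assert my_sort(result) == result


if __name__ == "__main__":
    test_sort_properties()
```

Properties 1 and 2 together pin down the result completely for totally ordered values, which is why no reference implementation is needed.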
Hints in Layers
Hint 1: Start with Simple Properties. Before testing complex booking logic, verify basic properties:
@given(
    st.datetimes(max_value=datetime(2030, 12, 31)),
    st.timedeltas(min_value=timedelta(hours=1), max_value=timedelta(hours=8)),
)
def test_booking_duration_positive(start, duration):
    end = start + duration  # both inputs are bounded, so this cannot overflow datetime.max
    booking = Booking(start, end)
    assert booking.duration() > timedelta(0)
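Hint 2: Use Composite Strategies. Generate bookings that meet domain constraints:
@st.composite
def valid_booking(draw):
    start = draw(st.datetimes(min_value=datetime(2025, 1, 1), max_value=datetime(2030, 12, 31)))
    hours = draw(st.integers(min_value=1, max_value=8))
    # Build the end from the start so end > start always holds
    return Booking(start, start + timedelta(hours=hours))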
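Hint 3: Test State Machines. Model booking workflows as state transitions:
from hypothesis.stateful import RuleBasedStateMachine, rule

class BookingStateMachine(RuleBasedStateMachine):
    def __init__(self):
        super().__init__()
        self.system = BookingSystem()

    @rule(start=st.datetimes(min_value=datetime(2025, 1, 1), max_value=datetime(2030, 12, 31)),
          hours=st.integers(min_value=1, max_value=8))
    def book_room(self, start, hours):
        self.system.book("room-A", start, start + timedelta(hours=hours))
        # Invariant: check no overlaps after every booking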
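A fuller, self-contained sketch of the state-machine idea follows. It adds a cancel rule and expresses the checks as `@invariant()` methods, which Hypothesis runs after every step. The tiny `BookingSystem` here is a stand-in written only so the example runs on its own (single room, bookings stored as `(start, end)` tuples), and the model set `active` is an assumption about how you might track expected state.

```python
from datetime import datetime, timedelta

from hypothesis import settings, strategies as st
from hypothesis.stateful import (
    RuleBasedStateMachine,
    invariant,
    rule,
    run_state_machine_as_test,
)


class BookingSystem:
    """Tiny single-room stand-in so the sketch runs on its own."""

    def __init__(self):
        self.bookings = {}  # booking_id -> (start, end)
        self.next_id = 1

    def book(self, start, end):
        if any(start < e and s < end for s, e in self.bookings.values()):
            return None  # reject an overlapping request
        booking_id = self.next_id
        self.next_id += 1
        self.bookings[booking_id] = (start, end)
        return booking_id

    def cancel(self, booking_id):
        return self.bookings.pop(booking_id, None) is not None


class BookingMachine(RuleBasedStateMachine):
    def __init__(self):
        super().__init__()
        self.system = BookingSystem()
        self.active = set()  # model: ids we expect to be active

    @rule(
        start=st.datetimes(min_value=datetime(2025, 1, 1), max_value=datetime(2025, 1, 7)),
        hours=st.integers(min_value=1, max_value=8),
    )
    def book_room(self, start, hours):
        booking_id = self.system.book(start, start + timedelta(hours=hours))
        if booking_id is not None:
            self.active.add(booking_id)

    @rule(data=st.data())
    def cancel_room(self, data):
        if self.active:
            booking_id = data.draw(st.sampled_from(sorted(self.active)))
            assert self.system.cancel(booking_id)
            self.active.remove(booking_id)

    @invariant()
    def count_matches_model(self):
        assert len(self.system.bookings) == len(self.active)

    @invariant()
    def no_overlaps(self):
        ordered = sorted(self.system.bookings.values())
        assert all(prev[1] <= nxt[0] for prev, nxt in zip(ordered, ordered[1:]))


# pytest collects this TestCase class automatically.
TestBookingMachine = BookingMachine.TestCase

if __name__ == "__main__":
    run_state_machine_as_test(
        BookingMachine, settings=settings(max_examples=25, deadline=None)
    )
```

The narrow one-week window for `start` is deliberate: it makes collisions likely, so the rejection path in `book()` is actually exercised.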
Hint 4: Shrinking and Debugging. When a test fails, Hypothesis automatically simplifies the input to a smaller case that still fails. Example:
Initial failure: 50 bookings
Shrunk to: 2 bookings (minimal reproduction)
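To build intuition for what "shrunk to 2 bookings" means, here is a hand-rolled greedy shrinker. It is a teaching sketch only and is not how Hypothesis shrinks internally; the function names and the sample data are made up for illustration.

```python
from datetime import datetime, timedelta


def has_overlap(bookings):
    # Bookings are (start, end) tuples; after sorting, compare neighbours.
    ordered = sorted(bookings)
    return any(
        a_end > b_start for (_, a_end), (b_start, _) in zip(ordered, ordered[1:])
    )


def shrink(bookings, still_fails):
    """Greedy shrink: drop one booking at a time while the failure persists."""
    current = list(bookings)
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if still_fails(candidate):
                current = candidate
                changed = True
                break
    return current


base = datetime(2025, 3, 15, 8, 0)
# 49 back-to-back one-hour bookings (no overlaps) ...
bookings = [(base + timedelta(hours=h), base + timedelta(hours=h + 1)) for h in range(49)]
# ... plus one booking that straddles two of them.
bookings.append((base + timedelta(hours=5, minutes=30), base + timedelta(hours=6, minutes=30)))

minimal = shrink(bookings, has_overlap)
print(len(bookings), "->", len(minimal))  # prints: 50 -> 2
```

The shrinker answers the design question "remove bookings or simplify time ranges?" with the first option only; a real shrinker would also try to simplify the remaining values.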
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| PBT Fundamentals | “Property-Based Testing with PropEr, Erlang, and Elixir” by Fred Hebert | Ch. 1-3 |
| Hypothesis (Python) | “Effective Python” by Brett Slatkin | Item 76 |
| QuickCheck (Haskell) | “Learn You a Haskell for Great Good!” by Miran Lipovača | Ch. 11 |
| State Machine Testing | “Hypothesis Documentation” (online) | Stateful Testing Guide |
Common Pitfalls & Debugging
Problem 1: “Tests pass locally but fail in CI due to timezone differences”
- Why: st.datetimes() generates naive datetimes by default, which the code under test then interprets in the machine's local timezone
- Fix: Always use timezone-aware UTC values for test data: st.datetimes(timezones=st.just(timezone.utc))
- Quick test: export TZ=America/New_York && pytest
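The naive-versus-aware distinction behind Problem 1 can be seen with the standard library alone. In this sketch, `to_utc` and the fixed-offset `NEW_YORK_WINTER` zone are illustrative assumptions; a real system would more likely use `zoneinfo` for named zones.

```python
from datetime import datetime, timedelta, timezone

NEW_YORK_WINTER = timezone(timedelta(hours=-5))  # fixed offset, for illustration only


def to_utc(moment):
    """Normalize an aware datetime to UTC; refuse naive ones outright."""
    if moment.tzinfo is None:
        raise ValueError("naive datetime: timezone is ambiguous")
    return moment.astimezone(timezone.utc)


local = datetime(2025, 1, 15, 9, 0, tzinfo=NEW_YORK_WINTER)
print(to_utc(local))  # 2025-01-15 14:00:00+00:00
```

Rejecting naive datetimes at the boundary means the result no longer depends on the `TZ` setting of the machine running the tests.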
Problem 2: “Hypothesis generates unrealistic edge cases (year 9999)”
- Why: Default datetime range is too broad
- Fix: Constrain generators to realistic bounds: st.datetimes(min_value=datetime(2025,1,1), max_value=datetime(2030,12,31))
- Quick test: Add @settings(verbosity=Verbosity.verbose) to see generated values
Problem 3: “Test fails intermittently with different shrunk examples”
- Why: Property relies on system state (database, clock)
- Fix: Use deterministic seeds and isolated test fixtures
- Quick test: Decorate the test with both @given(...) and @settings(derandomize=True)
Problem 4: “Shrinking takes too long (>30 seconds)”
- Why: Complex data structures with many interdependencies
- Fix: Simplify generators or use @settings(max_examples=50) during development
- Quick test: Monitor shrinking with pytest --hypothesis-show-statistics
Definition of Done
- Implemented booking system with book(), cancel(), and get_bookings() methods
- Property test verifies no overlapping bookings (with Hypothesis generating 100+ examples)
- Property test found at least one real bug (documented in README)
- Tests use constrained datetime generation (realistic time ranges)
- Shrinking produces minimal failing examples (verified manually)
- README explains each property being tested and why it matters
- CI runs PBT with fixed seed for reproducible failures
- Coverage report shows all edge cases exercised by generated inputs
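For the reproducible-CI item, one option is Hypothesis settings profiles registered in `conftest.py`. The profile names (`dev`, `ci_repro`), the example counts, and the `HYPOTHESIS_PROFILE` environment variable are assumptions chosen for this sketch; `derandomize=True` makes Hypothesis derive its examples deterministically from the test itself instead of from a random seed.

```python
# conftest.py
import os

from hypothesis import settings

# Fast feedback while developing.
settings.register_profile("dev", max_examples=50)
# Deterministic runs for CI: the same examples are generated on every run.
settings.register_profile("ci_repro", max_examples=200, derandomize=True)

# Pick the profile from the environment, defaulting to the development one.
settings.load_profile(os.getenv("HYPOTHESIS_PROFILE", "dev"))
```

In CI you would then set `HYPOTHESIS_PROFILE=ci_repro` before invoking pytest.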