Project 5: Discriminated Unions Parser
Build a webhook handler that uses discriminated unions to parse different event types (orders, payments, refunds), each with different fields, all validated correctly based on a type discriminator.
Learning Objectives
By completing this project, you will:
- Master discriminated unions in Pydantic - Parse polymorphic data based on a discriminator field
- Understand union matching modes - Learn left_to_right, smart, and discriminator modes
- Use Literal types for type discrimination - Define fixed values that identify model variants
- Implement callable discriminators - Handle complex dispatch logic for dynamic type selection
- Design robust fallback handling - Gracefully handle unknown or malformed event types
- Generate proper OpenAPI schemas - Ensure unions produce correct API documentation
Deep Theoretical Foundation
What Are Discriminated Unions and Why They Matter
Real-world APIs send different data shapes through the same endpoint. Webhooks from Stripe, GitHub, or e-commerce platforms deliver orders, payments, and refunds to a single /webhooks endpoint. Each event has different fields but arrives at the same location.
                          Webhook Payload
                                 |
                                 v
                      +---------------------+
                      |  { "type": "..." }  |
                      +---------------------+
                                 |
       +-------------------------+-------------------------+
       |                         |                         |
       v                         v                         v
type="order.created" type="payment.succeeded"    type="refund.issued"
       |                         |                         |
       v                         v                         v
+-------------+           +-------------+           +-------------+
| OrderEvent  |           | PaymentEvent|           | RefundEvent |
| - order_id  |           | - payment_id|           | - refund_id |
| - customer  |           | - amount    |           | - reason    |
| - items     |           | - method    |           | - amount    |
+-------------+           +-------------+           +-------------+

A discriminated union (or "tagged union") uses a shared field to determine which type applies. Pydantic's discriminator mode provides O(1) lookup instead of trying each model sequentially.
Union Matching Modes in Pydantic V2
Pydantic V2 offers three distinct modes for matching data against union types. Understanding these modes is crucial for choosing the right approach for your use case.
Mode 1: Left-to-Right (Explicit)
This mode tries each model in the order specified until one succeeds. It was the default in Pydantic V1.
from typing import Annotated, Union
from pydantic import BaseModel, Field

class Dog(BaseModel):
    name: str
    breed: str

class Cat(BaseModel):
    name: str
    color: str

# Tries Dog first, then Cat. In Pydantic V2 this mode must be requested explicitly.
Pet = Annotated[Union[Dog, Cat], Field(union_mode="left_to_right")]
# Problem: {"name": "Fluffy", "breed": "Siamese", "color": "cream"} matches Dog!
# Even if you meant Cat with breed being an extra field
Problems with left-to-right:
- Order matters and is fragile to refactoring
- Extra fields may cause wrong match
- Performance degrades with many options (O(n) checking)
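The wrong-match problem can be reproduced in a few lines. This is a minimal sketch, assuming Pydantic V2, where left-to-right matching is requested with `Field(union_mode="left_to_right")`; the `Owner` wrapper model and the sample payload are hypothetical and exist only for illustration.

```python
from typing import Union

from pydantic import BaseModel, Field


class Dog(BaseModel):
    name: str
    breed: str


class Cat(BaseModel):
    name: str
    color: str


class Owner(BaseModel):
    # Hypothetical wrapper model; left-to-right tries Dog first, then Cat.
    pet: Union[Dog, Cat] = Field(union_mode="left_to_right")


# The payload carries fields of both models. Dog validates first, and the
# extra "color" key is ignored by default, so Cat is never tried.
owner = Owner(pet={"name": "Fluffy", "breed": "Siamese", "color": "cream"})
matched = type(owner.pet).__name__
print(matched)
```

Swapping the order to `Union[Cat, Dog]` would flip the result, which is why this mode is fragile under refactoring.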
Mode 2: Smart Mode (Default in V2)
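Pydantic tries the union members and picks the "best match". In recent V2 releases it prefers, for models, the member that validates the most fields from the input, then exact type matches over coerced ones, and falls back to left-to-right order on a tie.
# Smart mode is the default in Pydantic V2
Pet = Union[Dog, Cat]
# Pydantic examines:
# 1. How many fields each model validates from the input
# 2. Exact type matches vs coercions
# 3. Left-to-right order as the tie-breaker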
Smart mode works well for simple cases but can be unpredictable for complex overlapping models.
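The difference between smart and left-to-right matching is easiest to see with scalar types. The sketch below assumes Pydantic V2; the `SmartId` and `OrderedId` models are hypothetical names used only for this comparison.

```python
from typing import Union

from pydantic import BaseModel, Field


class SmartId(BaseModel):
    # Default smart mode: an exact type match wins over a coercion.
    id: Union[int, str]


class OrderedId(BaseModel):
    # Left-to-right: the first member that validates wins, coercion included.
    id: Union[int, str] = Field(union_mode="left_to_right")


smart = SmartId(id="123")
ordered = OrderedId(id="123")
print(repr(smart.id), repr(ordered.id))
```

In smart mode the string stays a string because `str` matches exactly, while left-to-right coerces it to the integer `123` because `int` is tried first.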
Mode 3: Discriminator Mode (Recommended for APIs)
This mode explicitly tells Pydantic which field determines the type, providing O(1) lookup.
from typing import Annotated, Union, Literal
from pydantic import BaseModel, Field

class Dog(BaseModel):
    pet_type: Literal["dog"]  # Discriminator value
    name: str
    breed: str

class Cat(BaseModel):
    pet_type: Literal["cat"]  # Discriminator value
    name: str
    color: str

# Use discriminator parameter
Pet = Annotated[
    Union[Dog, Cat],
    Field(discriminator="pet_type")
]
# Now parsing is O(1) - directly maps to correct model
Benefits of discriminator mode:
- O(1) lookup instead of O(n) checking
- Deterministic behavior with no ambiguity
- Clear OpenAPI schema with discriminator defined
- Better error messages when discriminator value is invalid
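To see these benefits in practice, a discriminated union can be validated directly with a `TypeAdapter`. This is a minimal sketch assuming Pydantic V2; the `pet_adapter` name and the sample payloads are illustrative.

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field, TypeAdapter, ValidationError


class Dog(BaseModel):
    pet_type: Literal["dog"]
    name: str
    breed: str


class Cat(BaseModel):
    pet_type: Literal["cat"]
    name: str
    color: str


Pet = Annotated[Union[Dog, Cat], Field(discriminator="pet_type")]
pet_adapter = TypeAdapter(Pet)

# The discriminator value selects Cat directly; Dog is never attempted.
cat = pet_adapter.validate_python({"pet_type": "cat", "name": "Fluffy", "color": "cream"})

try:
    pet_adapter.validate_python({"pet_type": "bird", "name": "Tweety"})
    error_types = []
except ValidationError as exc:
    # A single, specific error instead of one error per union member.
    error_types = [err["type"] for err in exc.errors()]

print(type(cat).__name__, error_types)
```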
Literal Types for Type Discrimination
Literal types from the typing module define exact values a field can have. This is essential for discriminated unions because each model variant must have a unique Literal value.
from typing import Literal
from pydantic import BaseModel
# Field must be exactly "dog" - not "Dog", not "DOG"
pet_type: Literal["dog"]
# Multiple allowed values in one Literal
status: Literal["pending", "approved", "rejected"]
# Numeric literals work too
priority: Literal[1, 2, 3]
In discriminated unions, each model variant must have a unique Literal value:
class OrderCreatedEvent(BaseModel):
    type: Literal["order.created"]  # Unique discriminator
    order_id: str
    items: list[str]

class OrderCancelledEvent(BaseModel):
    type: Literal["order.cancelled"]  # Different value
    order_id: str
    reason: str

class OrderShippedEvent(BaseModel):
    type: Literal["order.shipped"]  # Another unique value
    order_id: str
    tracking_number: str
Literal type validation:
# This succeeds
OrderCreatedEvent(type="order.created", order_id="123", items=["a"])
# This fails with validation error - wrong literal value
OrderCreatedEvent(type="order.shipped", order_id="123", items=["a"])
# Error: Input should be 'order.created'
Callable Discriminators for Complex Dispatch
Sometimes the discriminator logic is more complex than a simple field value. Pydantic V2 supports callable discriminators that compute the type from the data.
from typing import Annotated, Any, Union
from pydantic import BaseModel, Discriminator, Tag

class CreditCardPayment(BaseModel):
    method: str  # e.g. "card" or a legacy value such as "credit_card"
    card_last_four: str
    amount: float

class BankTransfer(BaseModel):
    method: str  # e.g. "bank" or a legacy value such as "ach"
    account_number: str
    routing_number: str
    amount: float

class CryptoPayment(BaseModel):
    method: str  # e.g. "crypto" or a legacy value such as "btc"
    wallet_address: str
    amount: float

def get_payment_type(data: Any) -> str:
    """Complex logic to determine payment type."""
    # The callable receives a dict for raw input and a model instance otherwise
    if isinstance(data, dict):
        method = data.get("method", "")
    else:
        method = getattr(data, "method", "")
    method = str(method).lower()
    # Handle legacy formats
    if method in ("credit_card", "debit_card", "card"):
        return "card"
    elif method in ("ach", "wire", "bank", "bank_transfer"):
        return "bank"
    elif method in ("btc", "eth", "crypto", "cryptocurrency"):
        return "crypto"
    else:
        return "unknown"  # Matches no Tag, so the custom error below is raised

# Callable discriminator with custom error handling.
# Each union member needs a Tag matching a value the callable can return.
Payment = Annotated[
    Union[
        Annotated[CreditCardPayment, Tag("card")],
        Annotated[BankTransfer, Tag("bank")],
        Annotated[CryptoPayment, Tag("crypto")],
    ],
    Discriminator(
        get_payment_type,
        custom_error_type="invalid_payment_method",
        custom_error_message="Unknown payment method"
    )
]
When to use callable discriminators:
- Legacy data normalization - Old systems use different naming conventions
- Computed discrimination - Type determined by multiple fields
- Version handling - Different API versions have different schemas
- Fallback logic - Default to a specific type when uncertain
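The legacy-normalization case can be sketched end to end as follows. This assumes Pydantic V2, where a callable `Discriminator` is paired with `Tag` annotations on each union member; the trimmed-down models, the `payment_tag` function, and the sample payloads are hypothetical.

```python
from typing import Annotated, Any, Union

from pydantic import BaseModel, Discriminator, Tag, TypeAdapter, ValidationError


class CardPayment(BaseModel):
    method: str
    card_last_four: str


class BankPayment(BaseModel):
    method: str
    account_number: str


def payment_tag(data: Any) -> str:
    # The callable may receive a dict (raw input) or a model instance.
    if isinstance(data, dict):
        method = data.get("method", "")
    else:
        method = getattr(data, "method", "")
    method = str(method).lower()
    if method in ("credit_card", "debit_card", "card"):
        return "card"
    if method in ("ach", "wire", "bank", "bank_transfer"):
        return "bank"
    return "unknown"  # matches no Tag, so validation fails with the custom error


Payment = Annotated[
    Union[
        Annotated[CardPayment, Tag("card")],
        Annotated[BankPayment, Tag("bank")],
    ],
    Discriminator(
        payment_tag,
        custom_error_type="invalid_payment_method",
        custom_error_message="Unknown payment method",
    ),
]
payment_adapter = TypeAdapter(Payment)

# A legacy, upper-case method name still routes to the card model.
legacy = payment_adapter.validate_python({"method": "CREDIT_CARD", "card_last_four": "4242"})

try:
    payment_adapter.validate_python({"method": "cheque"})
    payment_errors = []
except ValidationError as exc:
    payment_errors = [err["type"] for err in exc.errors()]

print(type(legacy).__name__, payment_errors)
```

Note that the callable only selects the model; the stored `method` value is left as received, so any normalization of the field itself would need a separate validator.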
Fallback Handling for Unknown Types
Production systems must handle unknown event types gracefully. When webhooks evolve or new event types are added, your system should not crash.
from pydantic import BaseModel, Field, model_validator
from typing import Union, Literal, Any

# Known event types with specific Literal discriminators
class OrderEvent(BaseModel):
    type: Literal["order.created", "order.updated"]
    order_id: str
    data: dict

class PaymentEvent(BaseModel):
    type: Literal["payment.succeeded", "payment.failed"]
    payment_id: str
    amount: float

# Fallback for unknown events
class UnknownEvent(BaseModel):
    """Catches any event type we don't explicitly handle."""
    type: str  # Not Literal - accepts any string
    raw_data: dict = Field(default_factory=dict)

    @model_validator(mode="before")
    @classmethod
    def capture_raw(cls, data: Any) -> Any:
        """Preserve the original payload for debugging."""
        # A "before" validator can also receive a model instance; pass it through
        if not isinstance(data, dict):
            return data
        return {
            "type": data.get("type", "unknown"),
            "raw_data": data
        }
Fallback strategies:
- Log and acknowledge - Store unknown events for later analysis
- Graceful degradation - Extract only common fields
- Alert and retry - Queue for manual review
- Version negotiation - Request event in compatible format
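The "log and acknowledge" strategy might look like the sketch below: try the discriminated union first, and on any validation failure log the payload and wrap it in the fallback model. It assumes Pydantic V2; the `parse_event` helper, the logger name, and the simplified models are hypothetical.

```python
import logging
from typing import Annotated, Any, Literal, Union

from pydantic import BaseModel, Field, TypeAdapter, ValidationError

logger = logging.getLogger("webhooks")  # hypothetical logger name


class OrderEvent(BaseModel):
    type: Literal["order.created", "order.updated"]
    order_id: str


class PaymentEvent(BaseModel):
    type: Literal["payment.succeeded", "payment.failed"]
    payment_id: str
    amount: float


class UnknownEvent(BaseModel):
    type: str
    raw_data: dict[str, Any] = Field(default_factory=dict)


KnownEvent = Annotated[Union[OrderEvent, PaymentEvent], Field(discriminator="type")]
known_adapter = TypeAdapter(KnownEvent)


def parse_event(payload: dict[str, Any]) -> BaseModel:
    """Return a known event, or an UnknownEvent that keeps the raw payload."""
    try:
        return known_adapter.validate_python(payload)
    except ValidationError as exc:
        # Log and acknowledge: keep the payload for later analysis.
        logger.warning("Unrecognized webhook payload: %s", exc.errors()[0]["type"])
        return UnknownEvent(type=str(payload.get("type", "unknown")), raw_data=payload)


known = parse_event({"type": "order.created", "order_id": "ord_1"})
unknown = parse_event({"type": "inventory.adjusted", "sku": "A1"})
print(type(known).__name__, type(unknown).__name__)
```

One design consequence worth noting: this fallback also swallows known event types with malformed fields, so a stricter variant could re-raise when the `type` value is recognized.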
OpenAPI Schema Generation for Unions
Discriminated unions generate proper OpenAPI 3.0+ schemas with discriminator mappings. This ensures Swagger/OpenAPI documentation correctly represents your polymorphic data.
from pydantic import BaseModel, Field
from typing import Annotated, Union, Literal

class Dog(BaseModel):
    pet_type: Literal["dog"]
    breed: str

class Cat(BaseModel):
    pet_type: Literal["cat"]
    color: str

Pet = Annotated[Union[Dog, Cat], Field(discriminator="pet_type")]

class PetContainer(BaseModel):
    pet: Pet

# Generate schema
print(PetContainer.model_json_schema())
Output includes discriminator information:
{
  "properties": {
    "pet": {
      "discriminator": {
        "mapping": {
          "cat": "#/$defs/Cat",
          "dog": "#/$defs/Dog"
        },
        "propertyName": "pet_type"
      },
      "oneOf": [
        {"$ref": "#/$defs/Dog"},
        {"$ref": "#/$defs/Cat"}
      ]
    }
  },
  "$defs": {
    "Dog": {...},
    "Cat": {...}
  }
}
Because the discriminator mapping is part of the schema, Swagger/OpenAPI documentation can present each variant of the union separately, making your API self-documenting and easier for consumers to understand.
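A schema check like the following can guard against the discriminator silently disappearing from the docs after a refactor. It is a small sketch assuming Pydantic V2's `model_json_schema()`; the variable names are illustrative.

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field


class Dog(BaseModel):
    pet_type: Literal["dog"]
    breed: str


class Cat(BaseModel):
    pet_type: Literal["cat"]
    color: str


class PetContainer(BaseModel):
    pet: Annotated[Union[Dog, Cat], Field(discriminator="pet_type")]


schema = PetContainer.model_json_schema()
pet_schema = schema["properties"]["pet"]

# The discriminator block names the field and maps each value to a definition.
discriminator = pet_schema["discriminator"]
variant_refs = {variant["$ref"] for variant in pet_schema["oneOf"]}
print(discriminator["propertyName"], sorted(discriminator["mapping"]))
```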
Project Specification
Functional Requirements
Build a webhook handler that:
- Parses multiple event types - Orders, payments, and customer events
- Uses discriminated unions - Type field determines which model validates the data
- Handles unknown events gracefully - Unknown types are captured, not rejected
- Provides event routing - Use Python's singledispatch for type-safe handlers
- Generates correct OpenAPI docs - Discriminator mapping in schema
Webhook Event Structure
{
  "id": "evt_123abc",
  "type": "order.created",
  "timestamp": "2024-01-15T10:30:00Z",
  "version": "1.0",
  "data": {
    "order_id": "ord_456",
    "customer_id": "cust_789",
    "items": ["widget"],
    "total": 99.99
  }
}
Event Types to Implement
Order Events: order.created, order.updated, order.cancelled, order.shipped
Payment Events: payment.succeeded, payment.failed, payment.refunded
Customer Events: customer.created, customer.updated, customer.deleted
API Endpoints
POST /webhooks - Receive and process webhook events
GET /webhooks/events - List recent events (filtered by type)
GET /webhooks/events/{id} - Get specific event details
POST /webhooks/replay/{id} - Re-process a specific event
Solution Architecture
Component Diagram
+-----------------------------------------------------------------+
|                         Webhook Router                          |
|  POST /webhooks -> parse_event() -> route_event() -> handle     |
+-----------------------------------------------------------------+
                                 |
                                 v
+-----------------------------------------------------------------+
|                          Event Parser                           |
|  Raw JSON --> WebhookEvent (Discriminated Union)                |
|                     |                                           |
|      +--------------+----------------+---------------+          |
|      v              v                v               v          |
|  OrderEvent    PaymentEvent    CustomerEvent    UnknownEvent    |
+-----------------------------------------------------------------+
                                 |
                                 v
+-----------------------------------------------------------------+
|                         Event Handlers                          |
|  @singledispatch                                                |
|  def handle_event(event: BaseEvent): ...                        |
|                                                                 |
|  @handle_event.register(OrderCreatedEvent)                      |
|  def _(event):  # Process order...                              |
+-----------------------------------------------------------------+

Project Structure
webhook-handler/
├── app/
│   ├── main.py
│   ├── models/
│   │   ├── base.py          # BaseEvent
│   │   ├── orders.py        # Order events
│   │   ├── payments.py      # Payment events
│   │   ├── customers.py     # Customer events
│   │   ├── unknown.py       # Fallback model
│   │   └── union.py         # Discriminated union
│   ├── handlers/
│   │   ├── registry.py      # singledispatch base
│   │   ├── orders.py        # Order handlers
│   │   └── payments.py      # Payment handlers
│   ├── routes/
│   │   └── webhooks.py      # API endpoints
│   └── store/
│       └── event_store.py   # Event persistence
└── tests/
Phased Implementation Guide
Phase 1: Base Models (1-2 hours)
Create models/base.py:
from pydantic import BaseModel, Field
from datetime import datetime

class BaseEvent(BaseModel):
    id: str = Field(..., description="Unique event identifier")
    timestamp: datetime
    version: str = Field(default="1.0")
Create order/payment event models with Literal types for discrimination.
Checkpoint: Individual event models validate correctly.
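One possible shape for the order models is sketched below. The `OrderItem` fields (`sku`, `quantity`, `price`) are an assumption taken from the test payloads later in this project, not a fixed specification; payment and customer events would follow the same pattern with their own `Literal` values.

```python
from datetime import datetime
from typing import Literal

from pydantic import BaseModel, Field


class BaseEvent(BaseModel):
    id: str = Field(..., description="Unique event identifier")
    timestamp: datetime
    version: str = Field(default="1.0")


class OrderItem(BaseModel):
    # Assumed item shape, based on the test payloads in this project.
    sku: str
    quantity: int
    price: float


class OrderCreatedData(BaseModel):
    order_id: str
    customer_id: str
    items: list[OrderItem]
    total: float


class OrderCreatedEvent(BaseEvent):
    type: Literal["order.created"]  # discriminator value for this variant
    data: OrderCreatedData


event = OrderCreatedEvent.model_validate(
    {
        "id": "evt_123",
        "type": "order.created",
        "timestamp": "2024-01-15T10:30:00Z",
        "data": {
            "order_id": "ord_456",
            "customer_id": "cust_789",
            "items": [{"sku": "A", "quantity": 1, "price": 10.0}],
            "total": 10.0,
        },
    }
)
print(event.type, event.data.order_id, event.version)
```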
Phase 2: Discriminated Union (1-2 hours)
Create models/union.py:
from typing import Annotated, Union
from pydantic import Field

WebhookEvent = Annotated[
    Union[OrderCreatedEvent, PaymentSucceededEvent, ...],
    Field(discriminator="type")
]
Checkpoint: Union routes to correct event types based on type field.
Phase 3: Fallback Handling (1 hour)
Create UnknownEvent model and wrapper class:
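from typing import Union
from pydantic import BaseModel, TypeAdapter, ValidationError

class WebhookPayload(BaseModel):
    # Either a known event from the discriminated union or the fallback model
    event: Union[WebhookEvent, UnknownEvent]

    @classmethod
    def parse_event(cls, data: dict) -> "WebhookPayload":
        try:
            # Try discriminated union
            return cls(event=TypeAdapter(WebhookEvent).validate_python(data))
        except ValidationError:
            return cls(event=UnknownEvent.model_validate(data))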
Checkpoint: Unknown events are captured, not rejected.
Phase 4: Event Handlers (2 hours)
Create handlers with singledispatch:
from functools import singledispatch

@singledispatch
def handle_event(event: BaseEvent) -> dict:
    return {"status": "unhandled"}

@handle_event.register(OrderCreatedEvent)
def _(event: OrderCreatedEvent) -> dict:
    return {"status": "processed", "order_id": event.data.order_id}
Checkpoint: Events route to correct handlers by type.
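The dispatch behavior, including the default handler for unregistered types, can be checked with a self-contained sketch. The flattened models here (with `order_id` directly on the event) are a simplification for illustration, not the project's final models.

```python
from functools import singledispatch
from typing import Literal

from pydantic import BaseModel


class BaseEvent(BaseModel):
    id: str


class OrderCreatedEvent(BaseEvent):
    type: Literal["order.created"]
    order_id: str  # simplified: the project nests this under `data`


class PaymentSucceededEvent(BaseEvent):
    type: Literal["payment.succeeded"]
    payment_id: str


@singledispatch
def handle_event(event: object) -> dict:
    # Default implementation: used for any class without a registered handler.
    return {"status": "unhandled"}


@handle_event.register(OrderCreatedEvent)
def _(event: OrderCreatedEvent) -> dict:
    return {"status": "processed", "order_id": event.order_id}


order_result = handle_event(
    OrderCreatedEvent(id="evt_1", type="order.created", order_id="ord_1")
)
payment_result = handle_event(
    PaymentSucceededEvent(id="evt_2", type="payment.succeeded", payment_id="pay_1")
)
print(order_result, payment_result)
```

Dispatch is based on the class of the first argument, so a handler registered for a base class would also serve its subclasses unless a more specific handler exists.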
Phase 5: FastAPI Integration (2 hours)
Create routes/webhooks.py:
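from fastapi import APIRouter, Request

router = APIRouter(prefix="/webhooks")  # assumed prefix, matching POST /webhooks

@router.post("")
async def receive_webhook(request: Request):
    payload = await request.json()
    event_wrapper = WebhookPayload.parse_event(payload)
    event = event_wrapper.event
    result = handle_event(event)
    # Spread result first so the handler's "status" cannot overwrite "received";
    # UnknownEvent has no id attribute, hence getattr
    return {**result, "status": "received", "event_id": getattr(event, "id", None), "event_type": event.type}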
Checkpoint: API accepts and processes webhook events.
Phase 6: Event Store and Replay (2 hours)
Implement event storage, querying, and replay functionality.
Checkpoint: Full webhook system with storage and replay.
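A starting point for the store could be an in-memory class like the sketch below. `InMemoryEventStore` and its method names are hypothetical; a production system would more likely persist events in a database, and replay would pass the stored payload back through the normal parsing and handling path.

```python
from typing import Any, Callable, Optional


class InMemoryEventStore:
    """Hypothetical in-memory store keyed by event id."""

    def __init__(self) -> None:
        # Plain dicts preserve insertion order, so listing is chronological.
        self._events: dict[str, dict[str, Any]] = {}

    def save(self, payload: dict[str, Any]) -> None:
        self._events[payload["id"]] = payload

    def get(self, event_id: str) -> Optional[dict[str, Any]]:
        return self._events.get(event_id)

    def list_events(self, event_type: Optional[str] = None) -> list[dict[str, Any]]:
        events = list(self._events.values())
        if event_type is None:
            return events
        return [event for event in events if event.get("type") == event_type]

    def replay(self, event_id: str, handler: Callable[[dict[str, Any]], Any]) -> Any:
        payload = self.get(event_id)
        if payload is None:
            raise KeyError(event_id)
        # Re-run the stored raw payload through the supplied handler.
        return handler(payload)


store = InMemoryEventStore()
store.save({"id": "evt_1", "type": "order.created"})
store.save({"id": "evt_2", "type": "payment.succeeded"})

order_events = store.list_events("order.created")
replayed = store.replay("evt_1", lambda payload: payload["type"])
print(len(order_events), replayed)
```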
Testing Strategy
Unit Tests for Models
import pytest
from datetime import datetime
from pydantic import ValidationError, TypeAdapter
from app.models.orders import OrderCreatedEvent, OrderCreatedData
from app.models.payments import PaymentSucceededEvent
from app.models.unknown import UnknownEvent
from app.models.union import WebhookEvent

class TestOrderModels:
    def test_order_created_valid(self):
        event = OrderCreatedEvent(
            id="evt_123",
            type="order.created",
            timestamp=datetime.now(),
            data=OrderCreatedData(
                order_id="ord_456",
                customer_id="cust_789",
                items=[{"sku": "A", "quantity": 1, "price": 10.0}],
                total=10.0
            )
        )
        assert event.type == "order.created"
        assert event.data.order_id == "ord_456"

    def test_order_created_wrong_type_literal(self):
        """Literal type must match exactly."""
        with pytest.raises(ValidationError) as exc:
            OrderCreatedEvent(
                id="evt_123",
                type="order.updated",  # Wrong literal!
                timestamp=datetime.now(),
                data={"order_id": "123", "customer_id": "456", "items": [], "total": 0}
            )
        assert "type" in str(exc.value)

class TestDiscriminatedUnion:
    @pytest.fixture
    def adapter(self):
        return TypeAdapter(WebhookEvent)

    def test_parses_order_created(self, adapter):
        data = {
            "id": "evt_1",
            "type": "order.created",
            "timestamp": "2024-01-15T10:30:00Z",
            "data": {"order_id": "ord_1", "customer_id": "cust_1", "items": [], "total": 0}
        }
        event = adapter.validate_python(data)
        assert isinstance(event, OrderCreatedEvent)

    def test_parses_payment_succeeded(self, adapter):
        data = {
            "id": "evt_2",
            "type": "payment.succeeded",
            "timestamp": "2024-01-15T10:30:00Z",
            "data": {"payment_id": "pay_1", "order_id": "ord_1", "amount": 99.99, "method": "card"}
        }
        event = adapter.validate_python(data)
        assert isinstance(event, PaymentSucceededEvent)

    def test_invalid_discriminator_value(self, adapter):
        """Unknown type in discriminated union raises error."""
        data = {
            "id": "evt_3",
            "type": "inventory.adjusted",  # Not in union
            "timestamp": "2024-01-15T10:30:00Z",
            "data": {}
        }
        with pytest.raises(ValidationError):
            adapter.validate_python(data)

class TestUnknownEventFallback:
    def test_captures_unknown_type(self):
        from app.models.union import WebhookPayload
        data = {
            "id": "evt_unknown",
            "type": "some.future.event",
            "timestamp": "2024-01-15T10:30:00Z",
            "data": {"key": "value"}
        }
        payload = WebhookPayload.parse_event(data)
        assert isinstance(payload.event, UnknownEvent)
        assert payload.event.type == "some.future.event"
Integration Tests
import pytest
from fastapi.testclient import TestClient
from app.main import app

client = TestClient(app)

class TestWebhookEndpoint:
    def test_receive_order_created(self):
        response = client.post("/webhooks", json={
            "id": "evt_test_1",
            "type": "order.created",
            "timestamp": "2024-01-15T10:30:00Z",
            "data": {
                "order_id": "ord_test",
                "customer_id": "cust_test",
                "items": [{"sku": "TEST", "quantity": 1, "price": 10.0}],
                "total": 10.0
            }
        })
        assert response.status_code == 200
        data = response.json()
        assert data["status"] == "received"
        assert data["event_type"] == "order.created"

    def test_receive_unknown_event_accepted(self):
        """Unknown events should be accepted, not rejected."""
        response = client.post("/webhooks", json={
            "id": "evt_unknown",
            "type": "some.unknown.type",
            "timestamp": "2024-01-15T10:30:00Z",
            "data": {"arbitrary": "data"}
        })
        assert response.status_code == 200
        data = response.json()
        assert "warning" in data or data["status"] == "received"

    def test_validation_error_missing_id(self):
        """Missing required fields should return 422."""
        response = client.post("/webhooks", json={
            "type": "order.created",
            "timestamp": "2024-01-15T10:30:00Z"
            # Missing: id, data
        })
        assert response.status_code == 422

    def test_list_events_by_type(self):
        response = client.get("/webhooks/events?type=order.created")
        assert response.status_code == 200

    def test_replay_event(self):
        # Create an event first
        client.post("/webhooks", json={
            "id": "evt_replay",
            "type": "order.created",
            "timestamp": "2024-01-15T10:30:00Z",
            "data": {"order_id": "ord_1", "customer_id": "c_1", "items": [], "total": 0}
        })
        # Replay it
        response = client.post("/webhooks/replay/evt_replay")
        assert response.status_code == 200
Common Pitfalls and Debugging
Pitfall 1: Overlapping Literal Values
Problem: Two models have the same Literal value causing duplicate discriminator errors.
Solution: Each model must have a unique Literal value.
Pitfall 2: Missing Discriminator Field
Problem: Data lacks the discriminator field.
Solution: Validate presence before parsing or use a wrapper with explicit error handling.
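Both options can be sketched briefly: letting Pydantic report the missing tag, or checking for the field up front to return a friendlier message. This assumes Pydantic V2; `safe_parse` and the two small models are hypothetical.

```python
from typing import Annotated, Literal, Optional, Union

from pydantic import BaseModel, Field, TypeAdapter, ValidationError


class OrderCreated(BaseModel):
    type: Literal["order.created"]
    order_id: str


class PaymentSucceeded(BaseModel):
    type: Literal["payment.succeeded"]
    payment_id: str


Event = Annotated[Union[OrderCreated, PaymentSucceeded], Field(discriminator="type")]
event_adapter = TypeAdapter(Event)


def safe_parse(payload: dict) -> tuple[Optional[BaseModel], Optional[str]]:
    """Return (event, None) on success or (None, reason) on failure."""
    if "type" not in payload:
        return None, "missing discriminator field 'type'"
    try:
        return event_adapter.validate_python(payload), None
    except ValidationError as exc:
        return None, exc.errors()[0]["type"]


# What Pydantic itself reports when the discriminator is absent.
try:
    event_adapter.validate_python({"order_id": "ord_1"})
    raw_error = None
except ValidationError as exc:
    raw_error = exc.errors()[0]["type"]

missing = safe_parse({"order_id": "ord_1"})
parsed = safe_parse({"type": "order.created", "order_id": "ord_1"})
print(raw_error, missing[1])
```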
Pitfall 3: Fallback Model Matches First
Problem: UnknownEvent with type: str matches everything if placed first.
Solution: Put specific Literal types first in the Union, fallback last.
Pitfall 4: Callable Discriminator Returns None
Problem: Callable returns None when field is missing.
Solution: Always return a valid string; handle missing gracefully.
Pitfall 5: Nested Data Validation Errors
Problem: Outer event is valid but nested data fails.
Debugging: Check full error path in ValidationError.errors().
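Each entry in `errors()` carries a `loc` tuple that spells out the path from the outer model to the failing field. The sketch below uses two hypothetical models to show this; when validating through a discriminated union, the matched tag typically appears in the path as well.

```python
from typing import Literal

from pydantic import BaseModel, ValidationError


class OrderData(BaseModel):
    order_id: str
    total: float


class OrderCreated(BaseModel):
    type: Literal["order.created"]
    data: OrderData


try:
    OrderCreated.model_validate(
        {"type": "order.created", "data": {"order_id": "ord_1", "total": "not-a-number"}}
    )
    error_paths = []
except ValidationError as exc:
    # "loc" is the path to the bad field; "type" is the machine-readable error kind.
    error_paths = [(err["loc"], err["type"]) for err in exc.errors()]

print(error_paths)
```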
Extensions and Challenges
Extension 1: Webhook Signature Verification
Implement HMAC signature verification for security:
import hashlib
import hmac

def verify_signature(payload: bytes, signature: str, secret: str) -> bool:
    expected = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(f"sha256={expected}", signature)
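A quick round trip shows how the check behaves. The `sign` helper, the `sha256=` prefix convention, and the sample secret are assumptions for illustration; real providers document their own header names and signature formats.

```python
import hashlib
import hmac


def sign(payload: bytes, secret: str) -> str:
    # Hypothetical sender-side helper producing a "sha256=<hex>" signature.
    digest = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    return f"sha256={digest}"


def verify_signature(payload: bytes, signature: str, secret: str) -> bool:
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(sign(payload, secret), signature)


body = b'{"id": "evt_1", "type": "order.created"}'
secret = "whsec_example"  # hypothetical shared secret

good = verify_signature(body, sign(body, secret), secret)
tampered = verify_signature(body + b" ", sign(body, secret), secret)
wrong_secret = verify_signature(body, sign(body, "other"), secret)
print(good, tampered, wrong_secret)
```

The signature should be computed over the raw request bytes, before JSON parsing, since re-serializing the payload can change whitespace or key order.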
Extension 2: Event Versioning
Handle multiple versions with compound discriminator:
def get_event_key(data: dict) -> str:
    return f"{data.get('type')}:{data.get('version', '1.0')}"
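Wired into a union, the compound key could be used as sketched below. This assumes Pydantic V2's callable `Discriminator` with `Tag` annotations; the two versioned models and the `currency` field are hypothetical.

```python
from typing import Annotated, Any, Literal, Union

from pydantic import BaseModel, Discriminator, Tag, TypeAdapter


class OrderCreatedV1(BaseModel):
    type: Literal["order.created"]
    version: Literal["1.0"] = "1.0"
    order_id: str


class OrderCreatedV2(BaseModel):
    type: Literal["order.created"]
    version: Literal["2.0"]
    order_id: str
    currency: str  # hypothetical field added in version 2.0


def get_event_key(data: Any) -> str:
    # Handle both raw dicts and already-built model instances.
    if isinstance(data, dict):
        return f"{data.get('type')}:{data.get('version', '1.0')}"
    return f"{getattr(data, 'type', None)}:{getattr(data, 'version', '1.0')}"


VersionedEvent = Annotated[
    Union[
        Annotated[OrderCreatedV1, Tag("order.created:1.0")],
        Annotated[OrderCreatedV2, Tag("order.created:2.0")],
    ],
    Discriminator(get_event_key),
]
versioned_adapter = TypeAdapter(VersionedEvent)

# A payload without a version falls back to "1.0".
v1 = versioned_adapter.validate_python({"type": "order.created", "order_id": "ord_1"})
v2 = versioned_adapter.validate_python(
    {"type": "order.created", "version": "2.0", "order_id": "ord_2", "currency": "USD"}
)
print(type(v1).__name__, type(v2).__name__)
```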
Extension 3: Retry Queue
Implement retry logic for failed event processing with exponential backoff.
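As a starting point, a synchronous retry helper might look like this. The function names, default delays, and the injectable `sleep` parameter are all assumptions; a real queue would usually schedule retries asynchronously and add jitter.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delays(max_attempts: int, base: float = 0.5, cap: float = 30.0) -> list[float]:
    """Delay before each retry: base * 2**n, capped at `cap` seconds."""
    return [min(cap, base * (2 ** n)) for n in range(max_attempts - 1)]


def process_with_retry(
    handler: Callable[[], T],
    max_attempts: int = 4,
    base: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delays = backoff_delays(max_attempts, base)
    for attempt in range(max_attempts):
        try:
            return handler()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller dead-letter the event
            sleep(delays[attempt])
    raise AssertionError("loop always returns or raises")


calls = {"count": 0}


def flaky() -> str:
    # Fails twice, then succeeds, to exercise the retry path.
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "processed"


slept: list[float] = []
outcome = process_with_retry(flaky, sleep=slept.append)  # record delays instead of sleeping
print(outcome, slept)
```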
Extension 4: Event Sourcing
Store events as source of truth and replay to rebuild state.
Extension 5: Outbound Webhooks
Build the sending side with signature generation and delivery tracking.
Real-World Connections
Where This Pattern Appears
- Payment Processors: Stripe, PayPal, Square webhooks
- E-commerce: Shopify, WooCommerce event notifications
- Communication: Twilio, SendGrid status callbacks
- Infrastructure: GitHub webhooks, AWS SNS, CloudEvents
Production Considerations
- Idempotency: Handle duplicate deliveries
- Ordering: Events may arrive out of order
- Rate Limiting: Protect against webhook floods
- Dead Letter Queue: Handle permanently failed events
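Idempotency, the first item above, can be sketched by remembering results per event id. `IdempotentProcessor` is a hypothetical in-memory wrapper; a production version would keep the seen ids in a shared store with an expiry.

```python
from typing import Any, Callable


class IdempotentProcessor:
    """Hypothetical wrapper that remembers handler results by event id."""

    def __init__(self, handler: Callable[[dict[str, Any]], dict[str, Any]]) -> None:
        self._handler = handler
        self._results: dict[str, dict[str, Any]] = {}

    def process(self, payload: dict[str, Any]) -> dict[str, Any]:
        event_id = payload["id"]
        if event_id in self._results:
            # Duplicate delivery: return the stored result without re-running.
            return {**self._results[event_id], "duplicate": True}
        result = self._handler(payload)
        self._results[event_id] = result
        return {**result, "duplicate": False}


handled: list[str] = []


def record_order(payload: dict[str, Any]) -> dict[str, Any]:
    handled.append(payload["id"])
    return {"status": "processed"}


processor = IdempotentProcessor(record_order)
first = processor.process({"id": "evt_1", "type": "order.created"})
second = processor.process({"id": "evt_1", "type": "order.created"})
print(first, second, handled)
```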
Self-Assessment Checklist
Core Understanding
- Can I explain discriminated unions and their benefits?
- Can I describe Pydantic's three union matching modes?
- Can I use Literal types for discrimination?
- Can I implement callable discriminators?
Implementation Skills
- Can I define discriminated unions with multiple event types?
- Can I implement fallback handling for unknown types?
- Can I use singledispatch for type-safe routing?
- Can I generate correct OpenAPI schemas?
Mastery Indicators
- System handles all known event types correctly
- Unknown events are gracefully captured
- OpenAPI documentation is complete
- Event handlers are type-safe
Resources
Books
- "Robust Python" by Patrick Viafore - Chapter 6 on Union Types