Project 7: Build Your Own Mini-Pydantic
Build a simplified version of Pydantic that uses type hints for validation, without using Pydantic itself. This reveals exactly how Pydantic works under the hood.
Learning Objectives
By completing this project, you will:
- Master Python metaclasses - Understand how __new__ and metaclasses enable class customization at definition time
- Deeply understand the typing module - Use get_type_hints, get_origin, and get_args to introspect type annotations at runtime
- Implement runtime type checking - Build a complete type validation system from scratch
- Learn the descriptor protocol - Use __get__ and __set__ for field-level validation
- Understand error aggregation patterns - Collect and report all validation errors, not just the first one
- Comprehend Pydantic V2 architecture - Appreciate how pydantic-core (Rust) achieves high performance
Deep Theoretical Foundation
Python Metaclasses and __new__
Every class in Python is an instance of a metaclass. By default, this is type. When you write class Foo: pass, Python actually calls type('Foo', (), {}) behind the scenes.
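To make this concrete, both spellings below produce an equivalent class (Bar is just an illustrative name):

```python
# Writing a class statement...
class Foo:
    x = 1

# ...is exactly what this metaclass call does behind the scenes:
Bar = type('Bar', (), {'x': 1})

print(type(Foo))               # <class 'type'> - Foo is an instance of type
print(Bar.x)                   # 1
print(isinstance(Bar(), Bar))  # True
```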
Python's Type Hierarchy:

    +----------+
    |   type   |  <-- Metaclass (all classes are instances of it)
    +----+-----+
         | instance of
         v
    +----------+
    |  object  |  <-- Base class (all objects inherit from it)
    +----+-----+
         | inherits from
         v
    +----------+
    |   Foo    |  <-- Your class
    +----+-----+
         | instance of
         v
    +----------+
    | foo_obj  |  <-- Instance of your class
    +----------+
Understanding __new__ vs __init__:
class MyMeta(type):
def __new__(mcs, name, bases, namespace):
"""
Called BEFORE the class is created.
Parameters:
- mcs: The metaclass itself (like self, but for metaclasses)
- name: The name of the class being created
- bases: Tuple of parent classes
- namespace: Dict of class attributes (methods, fields, etc.)
Returns:
- The newly created class object
"""
print(f"Creating class: {name}")
print(f"Bases: {bases}")
print(f"Namespace keys: {list(namespace.keys())}")
# This is where Pydantic inspects type hints and sets up validation
cls = super().__new__(mcs, name, bases, namespace)
# Now we can modify the class after creation
cls.__custom_attribute__ = "Added by metaclass"
return cls
def __init__(cls, name, bases, namespace):
"""
Called AFTER the class is created.
Usually used for additional setup.
"""
super().__init__(name, bases, namespace)
class MyClass(metaclass=MyMeta):
x: int = 10
# Output:
# Creating class: MyClass
# Bases: ()
# Namespace keys: ['__module__', '__qualname__', '__annotations__', 'x']
Why Pydantic uses metaclasses:
- Intercept class creation - Extract type annotations before any instance is created
- Modify class behavior - Replace __init__ with a validating version
- Add class-level attributes - Store field information, validators, schema data
- Ensure consistency - Every subclass automatically gets validation behavior
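The "replace __init__ with a validating version" idea can be sketched in a few lines. This is a toy illustration, not Pydantic's actual internals; CheckedMeta and Point are made-up names:

```python
class CheckedMeta(type):
    """Toy metaclass: wraps __init__ so annotated fields are type-checked."""
    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        hints = namespace.get('__annotations__', {})

        def __init__(self, **data):
            # Every field must be provided; no defaults in this sketch
            for field, expected in hints.items():
                value = data[field]
                if not isinstance(value, expected):
                    raise TypeError(f"{field} must be {expected.__name__}")
                setattr(self, field, value)

        cls.__init__ = __init__  # Every subclass gets validation for free
        return cls

class Point(metaclass=CheckedMeta):
    x: int
    y: int

p = Point(x=1, y=2)      # OK
# Point(x=1, y="two")    # would raise TypeError
```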
The typing Module Internals
Python's typing module provides tools for type hints, but these types are not enforced at runtime by default. Pydantic uses introspection to enforce them.
Key functions for runtime type introspection:
from typing import (
get_type_hints, get_origin, get_args,
Union, Optional, List, Dict, Literal
)
get_type_hints() - Resolves annotations, including forward references:
class User:
name: str
age: "int" # Forward reference as string
hints = get_type_hints(User)
# {'name': <class 'str'>, 'age': <class 'int'>}
# Handles forward references that __annotations__ cannot:
class Node:
value: int
next: "Node" # Self-reference
# __annotations__ gives: {'value': int, 'next': 'Node'}
# get_type_hints() gives: {'value': int, 'next': <class 'Node'>}
get_origin() - Gets the base of a generic type:
from typing import get_origin
get_origin(List[int]) # list
get_origin(Dict[str, int]) # dict
get_origin(Optional[str]) # typing.Union
get_origin(Union[int, str]) # typing.Union
get_origin(Literal["a", "b"]) # typing.Literal
get_origin(int) # None (not a generic)
get_origin(str) # None
get_args() - Gets the type arguments:
from typing import get_args
get_args(List[int]) # (int,)
get_args(Dict[str, int]) # (str, int)
get_args(Optional[str]) # (str, NoneType)
get_args(Union[int, str, None]) # (int, str, NoneType)
get_args(Literal["a", "b"]) # ('a', 'b')
get_args(int) # () (empty tuple)
Type checking flowchart:
Runtime Type Checking Flow:

    Input: expected_type, value

        +------------------------+
        | origin = get_origin(   |
        |     expected_type)     |
        +-----------+------------+
                    |
                    v
        +------------------------+
        |    Is origin None?     |
        +-----+------------+-----+
              | Yes        | No
              v            v
    +------------------+  +----------------------------------+
    | Simple type:     |  | Generic type:                    |
    | isinstance(      |  | args = get_args(expected_type)   |
    |   value, type)   |  +----------------+-----------------+
    +------------------+                   |
                                           v
                        +----------------------------------+
                        | origin is Union?                 |
                        |   -> Try each arg                |
                        | origin is list?                  |
                        |   -> Check each item             |
                        | origin is dict?                  |
                        |   -> Check keys and values       |
                        +----------------------------------+
How Pydantic V2 Works Internally
Pydantic V2 is a complete rewrite with a Rust core (pydantic-core) for performance:
Pydantic V2 Architecture:

    +---------------------------------------------------------+
    |                      Python Layer                       |
    |  - BaseModel class                                      |
    |  - @field_validator, @model_validator                   |
    |  - Field() configuration                                |
    |  - JSON Schema generation                               |
    +----------------------------+----------------------------+
                                 |
                                 | PyO3 bindings
                                 v
    +---------------------------------------------------------+
    |                  pydantic-core (Rust)                   |
    |  - SchemaValidator: core validation logic               |
    |  - SchemaSerializer: serialization logic                |
    |  - CoreSchema: internal schema representation           |
    |  - 5-50x faster than the Python-based V1                |
    +---------------------------------------------------------+

    Why Rust?
    - No Python GIL limitations
    - Memory-safe without garbage-collection overhead
    - Direct memory access for JSON parsing
    - Compile-time optimizations
The validation pipeline in detail:
- Schema Building (at class definition time):
  1. Extract type hints from __annotations__
  2. Process Field() configurations
  3. Collect validators (@field_validator, @model_validator)
  4. Build the internal CoreSchema representation
  5. Compile validation functions in Rust
- Validation (at instantiation time):
  1. Receive input data (dict, JSON, object)
  2. For JSON: parse directly in Rust (faster than json.loads)
  3. Apply type coercion based on the schema
  4. Run field validators
  5. Run model validators
  6. Construct the Python object
# What happens when you do:
user = User(name="John", age="25")
# Internally:
# 1. User.__init__ is called (replaced by Pydantic)
# 2. Input converted to dict if needed
# 3. pydantic-core.SchemaValidator.validate_python() called
# 4. Rust code: parse "25" -> 25 (coercion)
# 5. Rust code: validate name is str, age is int
# 6. Python validators run (if any)
# 7. User instance created with validated data
Runtime Type Checking in Python
Python is dynamically typed - the interpreter doesn't enforce type hints. For runtime enforcement, you must:
- Extract type information from annotations
- Check values against expected types
- Handle coercion (converting compatible types)
- Report errors clearly
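The checking part of these steps can be sketched as one recursive function - a simplified, coercion-free version of the validator this project builds later:

```python
from typing import Union, get_origin, get_args

def check_type(value, expected) -> bool:
    """Recursively check a value against a (possibly generic) type hint."""
    origin = get_origin(expected)
    if origin is None:                  # Simple type: plain isinstance works
        return isinstance(value, expected)
    if origin is Union:                 # Union[...] / Optional[X]: any arm may match
        return any(check_type(value, arg) for arg in get_args(expected))
    if origin is list:                  # list[X]: check container, then each item
        (item_type,) = get_args(expected)
        return isinstance(value, list) and all(
            check_type(item, item_type) for item in value
        )
    if origin is dict:                  # dict[K, V]: check keys and values
        key_type, val_type = get_args(expected)
        return isinstance(value, dict) and all(
            check_type(k, key_type) and check_type(v, val_type)
            for k, v in value.items()
        )
    return isinstance(value, origin)    # Fallback: bare container check

print(check_type([1, 2], list[int]))        # True
print(check_type([1, "x"], list[int]))      # False
print(check_type(None, Union[str, None]))   # True
```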
The isinstance() challenge with generics:
# This works for simple types:
isinstance(42, int) # True
isinstance("hello", str) # True
# This does NOT work for generics:
isinstance([1, 2, 3], list[int]) # TypeError!
isinstance({"a": 1}, Dict[str, int]) # TypeError!
# You must check the container and its contents separately:
def is_list_of(value, item_type):
if not isinstance(value, list):
return False
return all(isinstance(item, item_type) for item in value)
Type coercion strategies:
def coerce_value(value, target_type):
"""
Attempt to convert value to target_type.
Pydantic does this by default (strict mode disables it).
"""
# String to int
if target_type is int and isinstance(value, str):
try:
return int(value)
except ValueError:
raise TypeError(f"Cannot convert '{value}' to int")
# String to bool
if target_type is bool and isinstance(value, str):
if value.lower() in ('true', '1', 'yes', 'on'):
return True
if value.lower() in ('false', '0', 'no', 'off'):
return False
raise TypeError(f"Cannot convert '{value}' to bool")
# Float to int (truncate)
if target_type is int and isinstance(value, float):
return int(value)
# String to float
if target_type is float and isinstance(value, str):
try:
return float(value)
except ValueError:
raise TypeError(f"Cannot convert '{value}' to float")
# Already correct type
if isinstance(value, target_type):
return value
raise TypeError(f"Expected {target_type.__name__}, got {type(value).__name__}")
Descriptor Protocol for Field Validation
Descriptors control attribute access. They implement __get__, __set__, and/or __delete__:
class ValidatedField:
    def __set_name__(self, owner, name):
        # Called when the descriptor is assigned to a class attribute
        self.name = name
        self.private_name = f"__{name}"

    def __get__(self, obj, objtype=None):
        # Called when the attribute is accessed: obj.field
        if obj is None:
            return self  # Class-level access
        return getattr(obj, self.private_name)

    def __set__(self, obj, value):
        # Called when the attribute is assigned: obj.field = value
        validated = self.validate(value)
        setattr(obj, self.private_name, validated)

class User:
    age = ValidatedField(int, min=0, max=150)

user = User()
user.age = 25     # Calls ValidatedField.__set__
print(user.age)   # Calls ValidatedField.__get__
A complete validated field descriptor:
class TypedField:
"""Descriptor that validates type on assignment."""
def __init__(self, expected_type, required=True, default=None):
self.expected_type = expected_type
self.required = required
self.default = default
def __set_name__(self, owner, name):
self.name = name
self.private_name = f"_field_{name}"
def __get__(self, obj, objtype=None):
if obj is None:
return self
return getattr(obj, self.private_name, self.default)
def __set__(self, obj, value):
if value is None and not self.required:
setattr(obj, self.private_name, None)
return
if not isinstance(value, self.expected_type):
raise TypeError(
f"Field '{self.name}' expected {self.expected_type.__name__}, "
f"got {type(value).__name__}"
)
setattr(obj, self.private_name, value)
class Person:
name = TypedField(str)
age = TypedField(int)
email = TypedField(str, required=False)
person = Person()
person.name = "John" # OK
person.age = 30 # OK
person.age = "thirty" # TypeError!
Error Aggregation Patterns
Good validation reports ALL errors, not just the first one. This requires:
- Collecting errors during validation
- Continuing after errors instead of raising immediately
- Structured error representation with paths
class ValidationError(Exception):
    def __init__(self, errors: list[dict]):
        self.errors = errors
        message = f"{len(errors)} validation error(s)"
        super().__init__(message)

# During validation:
errors = []

for field_name, expected_type in fields.items():
    try:
        validate_field(field_name, expected_type, value)
    except TypeError as e:
        errors.append({
            "loc": (field_name,),
            "msg": str(e),
            "type": "type_error",
            "input": value
        })
        # DON'T raise - continue to the next field

if errors:
    raise ValidationError(errors)

# Result: the user sees ALL errors at once
Nested error paths:
# For nested structures like:
{
"user": {
"address": {
"zip_code": "invalid"
}
}
}
# Error location should be:
("user", "address", "zip_code")
# Which displays as:
"user.address.zip_code"
Project Specification
Functional Requirements
Build a validation library called mini_pydantic that:
- Defines validated models using type hints
- Validates data on instantiation like Pydantic's BaseModel
- Supports basic types - str, int, float, bool
- Supports containers - list[T], dict[K, V], Optional[T]
- Supports nested models - Models containing other models
- Coerces types by default - Convert "123" to 123
- Aggregates all errors - Report every validation failure
- Provides model_dump() - Convert back to dictionary
Core API
from mini_pydantic import MiniModel, ValidationError
from typing import Optional, List
class Address(MiniModel):
street: str
city: str
country: str = "USA" # Default value
class User(MiniModel):
name: str
age: int
email: Optional[str] = None
tags: List[str] = []
address: Optional[Address] = None
# Valid usage
user = User(
name="John",
age="25", # Coerced to int
address={"street": "123 Main St", "city": "NYC"} # Nested validation
)
print(user.name) # "John"
print(user.age) # 25 (int, not str)
print(user.address.city) # "NYC"
print(user.model_dump()) # Returns dict
# Validation errors
try:
User(
age="not-a-number",
tags="should-be-list"
)
except ValidationError as e:
for error in e.errors:
print(f"{error['loc']}: {error['msg']}")
# ('name',): Field required
# ('age',): invalid literal for int() with base 10: 'not-a-number'
# ('tags',): Expected list, got str
Expected Behavior
| Input | Expected Type | Behavior |
|---|---|---|
| "123" | int | Coerce to 123 |
| "true" | bool | Coerce to True |
| 1.5 | int | Coerce to 1 |
| {"a": 1} | NestedModel | Recursively validate |
| None | Optional[T] | Allow None |
| [1, 2, 3] | List[int] | Validate each item |
| Missing field | required | Add to errors |
| Missing field | has default | Use default |
Solution Architecture
Component Diagram
+-------------------------------------------------------------------+
| MiniModelMeta (Metaclass)                                         |
|                                                                   |
| __new__(mcs, name, bases, namespace):                             |
|   1. Extract __annotations__                                      |
|   2. Call get_type_hints() for forward refs                       |
|   3. Identify required vs optional fields                         |
|   4. Store field info in __field_types__                          |
|   5. Create class with super().__new__()                          |
|   6. Return modified class                                        |
+-------------------------------------------------------------------+
          |
          | is the metaclass of
          v
+-------------------------------------------------------------------+
| MiniModel (Base Class)                                            |
|                                                                   |
| __init__(self, **data):                                           |
|   1. Check required fields are present                            |
|   2. For each field in __field_types__:                           |
|      a. Get value from data or default                            |
|      b. Validate using TypeValidator                              |
|      c. Set attribute                                             |
|   3. Raise ValidationError if any errors                          |
|                                                                   |
| model_dump(self) -> dict:                                         |
|   Recursively convert to dictionary                               |
+-------------------------------------------------------------------+
          |
          | delegates to
          v
+-------------------------------------------------------------------+
| TypeValidator (Helper)                                            |
|                                                                   |
| validate(expected_type, value, field_name) -> validated_value     |
|                                                                   |
| - Handles: str, int, float, bool                                  |
| - Handles: Optional[T] (Union with None)                          |
| - Handles: List[T]                                                |
| - Handles: Dict[K, V]                                             |
| - Handles: nested MiniModel subclasses                            |
| - Performs type coercion                                          |
| - Raises descriptive errors                                       |
+-------------------------------------------------------------------+
          |
          | raises
          v
+-------------------------------------------------------------------+
| ValidationError (Exception)                                       |
|                                                                   |
| __init__(self, errors: list[dict]):                               |
|   self.errors = errors                                            |
|                                                                   |
| Error dict structure:                                             |
|   {                                                               |
|     "loc": ("field", "subfield"),  # Path tuple                   |
|     "msg": "Error message",                                       |
|     "type": "error_type",                                         |
|     "input": original_value                                       |
|   }                                                               |
+-------------------------------------------------------------------+
Data Flow
User(name="John", age="25", address={"city": "NYC"})
        |
        v
+------------------------------------------------------------------+
| MiniModel.__init__                                               |
|                                                                  |
| 1. Check required fields:                                        |
|    - name: present                                               |
|    - age: present                                                |
|    - email: Optional, missing -> use None                        |
|    - address: present                                            |
|                                                                  |
| 2. Validate each field:                                          |
|    name: str <- "John"                                           |
|      isinstance("John", str) -> True -> "John"                   |
|                                                                  |
|    age: int <- "25"                                              |
|      isinstance("25", int) -> False                              |
|      Coerce: int("25") -> 25                                     |
|                                                                  |
|    address: Optional[Address] <- {"city": "NYC"}                 |
|      isinstance(value, Address) -> False                         |
|      Address is a MiniModel subclass                             |
|      Recursively: Address(**{"city": "NYC"})                     |
|      -> returns a validated Address instance                     |
|                                                                  |
| 3. Set attributes:                                               |
|    self.name = "John"                                            |
|    self.age = 25                                                 |
|    self.address = Address(city="NYC", ...)                       |
+------------------------------------------------------------------+
Phased Implementation Guide
Phase 1: Basic Structure (1-2 hours)
Goal: Create the metaclass and base class skeleton.
# mini_pydantic/__init__.py
from .core import MiniModel, ValidationError
# mini_pydantic/core.py
from typing import get_type_hints, get_origin, get_args, Union
class ValidationError(Exception):
"""Raised when validation fails."""
def __init__(self, errors: list):
self.errors = errors
super().__init__(f"{len(errors)} validation error(s)")
class MiniModelMeta(type):
"""Metaclass that processes type hints at class creation."""
def __new__(mcs, name, bases, namespace):
cls = super().__new__(mcs, name, bases, namespace)
# Skip processing for the base MiniModel class
if name == 'MiniModel':
return cls
# Store processed field information
cls.__field_types__ = {}
cls.__required_fields__ = set()
cls.__optional_fields__ = set()
cls.__field_defaults__ = {}
# TODO: Process annotations in next phase
return cls
class MiniModel(metaclass=MiniModelMeta):
"""Base class for validated models."""
def __init__(self, **data):
# TODO: Implement validation in next phase
pass
def model_dump(self) -> dict:
# TODO: Implement serialization in next phase
return {}
Checkpoint: Class can be defined without errors.
class User(MiniModel):
name: str
age: int
# Should not raise any errors
Phase 2: Type Hint Extraction (1-2 hours)
Goal: Extract and process type annotations.
class MiniModelMeta(type):
def __new__(mcs, name, bases, namespace):
cls = super().__new__(mcs, name, bases, namespace)
if name == 'MiniModel':
return cls
# Get type hints (handles forward references)
try:
hints = get_type_hints(cls)
except Exception:
hints = getattr(cls, '__annotations__', {})
cls.__field_types__ = hints
cls.__required_fields__ = set()
cls.__optional_fields__ = set()
cls.__field_defaults__ = {}
for field_name, field_type in hints.items():
# Check if field has default value
if hasattr(cls, field_name):
cls.__optional_fields__.add(field_name)
cls.__field_defaults__[field_name] = getattr(cls, field_name)
# Check if type is Optional (Union with None)
elif mcs._is_optional(field_type):
cls.__optional_fields__.add(field_name)
cls.__field_defaults__[field_name] = None
else:
cls.__required_fields__.add(field_name)
return cls
@staticmethod
def _is_optional(field_type) -> bool:
"""Check if type is Optional[X] (Union[X, None])."""
origin = get_origin(field_type)
if origin is Union:
args = get_args(field_type)
return type(None) in args
return False
Checkpoint: Fields are correctly classified as required/optional.
class User(MiniModel):
name: str # Required
age: int # Required
email: Optional[str] = None # Optional
role: str = "user" # Optional (has default)
print(User.__required_fields__) # {'name', 'age'}
print(User.__optional_fields__) # {'email', 'role'}
Phase 3: Basic Validation (2-3 hours)
Goal: Validate simple types and required fields.
class MiniModel(metaclass=MiniModelMeta):
def __init__(self, **data):
errors = []
# Check required fields
for field in self.__required_fields__:
if field not in data:
errors.append({
'loc': (field,),
'msg': 'Field required',
'type': 'missing'
})
# Validate and set fields
for field_name, field_type in self.__field_types__.items():
if field_name in data:
value = data[field_name]
try:
validated = self._validate_field(field_name, field_type, value)
setattr(self, field_name, validated)
except Exception as e:
errors.append({
'loc': (field_name,),
'msg': str(e),
'type': 'validation_error',
'input': value
})
elif field_name in self.__optional_fields__:
default = self.__field_defaults__.get(field_name)
setattr(self, field_name, default)
if errors:
raise ValidationError(errors)
def _validate_field(self, name: str, expected_type, value):
"""Validate a single field value."""
# Basic type validation with coercion
if expected_type is int:
if isinstance(value, int) and not isinstance(value, bool):
return value
return int(value) # Coerce
if expected_type is str:
if isinstance(value, str):
return value
return str(value) # Coerce
if expected_type is float:
if isinstance(value, (int, float)):
return float(value)
return float(value) # Coerce from string
if expected_type is bool:
if isinstance(value, bool):
return value
if isinstance(value, str):
if value.lower() in ('true', '1', 'yes'):
return True
if value.lower() in ('false', '0', 'no'):
return False
return bool(value)
# Direct type check as fallback
if isinstance(value, expected_type):
return value
raise ValueError(
f"Expected {expected_type.__name__}, got {type(value).__name__}"
)
Checkpoint: Basic validation works.
user = User(name="John", age="25")
print(user.age) # 25 (int, coerced from str)
try:
User(age=30) # Missing 'name'
except ValidationError as e:
print(e.errors) # [{'loc': ('name',), 'msg': 'Field required', ...}]
Phase 4: Complex Types (2-3 hours)
Goal: Handle Optional, List, Dict, and nested models.
def _validate_field(self, name: str, expected_type, value):
"""Validate a single field value."""
origin = get_origin(expected_type)
# Handle Optional[X] (Union[X, None])
if origin is Union:
args = get_args(expected_type)
if value is None and type(None) in args:
return None
# Try each type in the union
for arg in args:
if arg is type(None):
continue
try:
return self._validate_field(name, arg, value)
except Exception:
continue
type_names = [a.__name__ for a in args if a is not type(None)]
raise ValueError(f"Expected one of {type_names}, got {type(value).__name__}")
# Handle list[X]
if origin is list:
if not isinstance(value, list):
raise ValueError(f"Expected list, got {type(value).__name__}")
item_type = get_args(expected_type)[0] if get_args(expected_type) else object
validated_items = []
for i, item in enumerate(value):
try:
validated_items.append(
self._validate_field(f"{name}[{i}]", item_type, item)
)
except Exception as e:
raise ValueError(f"Item {i}: {e}")
return validated_items
# Handle dict[K, V]
if origin is dict:
if not isinstance(value, dict):
raise ValueError(f"Expected dict, got {type(value).__name__}")
args = get_args(expected_type)
if len(args) == 2:
key_type, value_type = args
validated_dict = {}
for k, v in value.items():
validated_key = self._validate_field(f"{name}.key", key_type, k)
validated_val = self._validate_field(f"{name}[{k}]", value_type, v)
validated_dict[validated_key] = validated_val
return validated_dict
return value
# Handle nested MiniModel
if isinstance(expected_type, type) and issubclass(expected_type, MiniModel):
if isinstance(value, expected_type):
return value
if isinstance(value, dict):
return expected_type(**value)
raise ValueError(f"Expected {expected_type.__name__} or dict")
# Basic types (from Phase 3)
# ... (previous implementation)
Checkpoint: Complex types work.
class Address(MiniModel):
city: str
country: str = "USA"
class User(MiniModel):
tags: List[str]
address: Optional[Address] = None
user = User(
tags=["python", "validation"],
address={"city": "NYC"}
)
print(user.address.city) # "NYC"
Phase 5: Error Aggregation (1-2 hours)
Goal: Collect all errors with proper paths.
Improve the __init__ to track error paths for nested structures:
def __init__(self, **data):
errors = []
# Check required fields
for field in self.__required_fields__:
if field not in data:
errors.append({
'loc': (field,),
'msg': 'Field required',
'type': 'missing'
})
# Validate and set fields
for field_name, field_type in self.__field_types__.items():
if field_name in data:
value = data[field_name]
try:
validated = self._validate_field(field_name, field_type, value)
setattr(self, field_name, validated)
except ValidationError as nested_error:
# Nested model validation error - prepend path
for error in nested_error.errors:
error['loc'] = (field_name,) + error['loc']
errors.append(error)
except Exception as e:
errors.append({
'loc': (field_name,),
'msg': str(e),
'type': 'validation_error',
'input': value
})
elif field_name in self.__optional_fields__:
# Handle defaults that might be mutable
default = self.__field_defaults__.get(field_name)
if isinstance(default, list):
default = list(default) # Copy
elif isinstance(default, dict):
default = dict(default) # Copy
setattr(self, field_name, default)
if errors:
raise ValidationError(errors)
Checkpoint: Nested errors have full paths.
try:
User(address={"city": 123}) # city should be str
except ValidationError as e:
print(e.errors[0]['loc']) # ('address', 'city')
Phase 6: Serialization and Polish (1-2 hours)
Goal: Implement model_dump(), __repr__, and edge cases.
def model_dump(self) -> dict:
"""Convert model to dictionary."""
result = {}
for field_name in self.__field_types__:
value = getattr(self, field_name, None)
if isinstance(value, MiniModel):
result[field_name] = value.model_dump()
elif isinstance(value, list):
result[field_name] = [
v.model_dump() if isinstance(v, MiniModel) else v
for v in value
]
elif isinstance(value, dict):
result[field_name] = {
k: v.model_dump() if isinstance(v, MiniModel) else v
for k, v in value.items()
}
else:
result[field_name] = value
return result
def __repr__(self):
fields = ', '.join(
f'{name}={getattr(self, name)!r}'
for name in self.__field_types__
)
return f'{self.__class__.__name__}({fields})'
def __eq__(self, other):
if not isinstance(other, self.__class__):
return False
return self.model_dump() == other.model_dump()
Final Checkpoint: Full functionality works.
user = User(name="John", age=30, address={"city": "NYC"})
print(user) # User(name='John', age=30, ...)
print(user.model_dump()) # {'name': 'John', 'age': 30, ...}
Testing Strategy
Unit Tests
# tests/test_basic_types.py
import pytest
from mini_pydantic import MiniModel, ValidationError
class SimpleModel(MiniModel):
name: str
age: int
score: float
active: bool
def test_valid_simple_model():
model = SimpleModel(name="John", age=30, score=95.5, active=True)
assert model.name == "John"
assert model.age == 30
assert model.score == 95.5
assert model.active is True
def test_type_coercion():
model = SimpleModel(name="John", age="30", score="95.5", active="true")
assert model.age == 30
assert isinstance(model.age, int)
assert model.score == 95.5
assert model.active is True
def test_missing_required_field():
with pytest.raises(ValidationError) as exc:
SimpleModel(name="John")
errors = exc.value.errors
assert len(errors) == 3 # age, score, active
locs = {e['loc'] for e in errors}
assert ('age',) in locs
assert ('score',) in locs
def test_invalid_type():
with pytest.raises(ValidationError) as exc:
SimpleModel(name="John", age="not-a-number", score=1.0, active=True)
assert any(e['loc'] == ('age',) for e in exc.value.errors)
# tests/test_optional_types.py
from typing import Optional
class OptionalModel(MiniModel):
required_field: str
optional_field: Optional[str] = None
default_field: str = "default"
def test_optional_with_none():
model = OptionalModel(required_field="test")
assert model.optional_field is None
assert model.default_field == "default"
def test_optional_with_value():
model = OptionalModel(
required_field="test",
optional_field="provided"
)
assert model.optional_field == "provided"
# tests/test_nested_models.py
class Address(MiniModel):
city: str
country: str = "USA"
class Person(MiniModel):
name: str
address: Address
def test_nested_model_from_dict():
person = Person(
name="John",
address={"city": "NYC"}
)
assert person.address.city == "NYC"
assert person.address.country == "USA"
def test_nested_model_instance():
addr = Address(city="Boston")
person = Person(name="Jane", address=addr)
assert person.address.city == "Boston"
def test_nested_validation_error():
with pytest.raises(ValidationError) as exc:
Person(name="John", address={"city": 123})
errors = exc.value.errors
assert any(
e['loc'] == ('address', 'city')
for e in errors
)
# tests/test_list_types.py
from typing import List
class TaggedItem(MiniModel):
name: str
tags: List[str]
def test_list_validation():
item = TaggedItem(name="test", tags=["a", "b", "c"])
assert item.tags == ["a", "b", "c"]
def test_list_coercion():
item = TaggedItem(name="test", tags=[1, 2, 3])
assert item.tags == ["1", "2", "3"]
def test_list_type_error():
with pytest.raises(ValidationError):
TaggedItem(name="test", tags="not-a-list")
Integration Tests
# tests/test_integration.py
class Address(MiniModel):
street: str
city: str
zip_code: str
class Company(MiniModel):
name: str
employees: int
class Person(MiniModel):
name: str
age: int
email: Optional[str] = None
addresses: List[Address] = []
employer: Optional[Company] = None
def test_complex_model():
data = {
"name": "John Doe",
"age": 30,
"email": "john@example.com",
"addresses": [
{"street": "123 Main St", "city": "NYC", "zip_code": "10001"},
{"street": "456 Oak Ave", "city": "LA", "zip_code": "90001"}
],
"employer": {
"name": "TechCorp",
"employees": 500
}
}
person = Person(**data)
assert person.name == "John Doe"
assert len(person.addresses) == 2
assert person.addresses[0].city == "NYC"
assert person.employer.name == "TechCorp"
def test_model_dump_roundtrip():
person = Person(
name="Jane",
age=25,
addresses=[{"street": "789 Pine", "city": "Chicago", "zip_code": "60601"}]
)
dumped = person.model_dump()
restored = Person(**dumped)
assert restored.name == person.name
assert restored.addresses[0].city == person.addresses[0].city
def test_all_errors_collected():
with pytest.raises(ValidationError) as exc:
Person(
age="not-a-number",
addresses=[{"street": 123}] # city and zip_code missing, street wrong type
)
errors = exc.value.errors
# Should have: name missing, age invalid, address[0] issues
assert len(errors) >= 3
# Verify error locations are correct
locs = [e['loc'] for e in errors]
assert ('name',) in locs
assert ('age',) in locs
Common Pitfalls and Debugging
Pitfall 1: Forward Reference Resolution Fails
Problem: Models referencing each other fail with NameError.
class Node(MiniModel):
value: int
next: "Node" # Forward reference
Solution: Call model_rebuild() after all classes are defined:
@classmethod
def model_rebuild(cls):
"""Rebuild the model to resolve forward references."""
try:
cls.__field_types__ = get_type_hints(cls)
except NameError:
pass # Still unresolvable
Pitfall 2: Mutable Default Values
Problem: List/dict defaults are shared between instances.
class User(MiniModel):
tags: List[str] = [] # DANGER: Same list for all instances!
Solution: Copy mutable defaults in __init__:
if isinstance(default, (list, dict)):
default = type(default)(default) # Create a copy
Pitfall 3: Bool is Subclass of Int
Problem: isinstance(True, int) returns True.
# This incorrectly accepts True for an int field
model = SimpleModel(age=True) # age becomes True, not 1
Solution: Check for bool explicitly before int:
if expected_type is int:
if isinstance(value, bool): # Check bool FIRST
raise ValueError("Expected int, got bool")
if isinstance(value, int):
return value
return int(value)
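Wrapped as a standalone helper (the name coerce_int is illustrative), the fix looks like this:

```python
def coerce_int(value):
    """Coerce a value to int, rejecting bool (which is a subclass of int)."""
    if isinstance(value, bool):          # must come BEFORE the int check
        raise TypeError("Expected int, got bool")
    if isinstance(value, int):
        return value
    return int(value)

print(coerce_int("42"))   # 42
try:
    coerce_int(True)
except TypeError as e:
    print(e)              # Expected int, got bool
```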
Pitfall 4: Union Order Matters
Problem: Union[int, str] always matches int first.
# "123" gets coerced to 123 because int is tried first
Solution: Document this behavior or implement "best match" logic:

# Option 1: Try exact type match first
for arg in args:
if isinstance(value, arg):
return value
# Option 2: Prefer the most specific type
# (Complex to implement correctly)
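Option 1 can be sketched as a standalone helper (validate_union is an illustrative name; real Pydantic's "smart mode" matching is considerably more involved):

```python
from typing import Union, get_args

def validate_union(value, union_type):
    """Validate against a Union: exact type match first, then coercion."""
    args = get_args(union_type)
    # Pass 1: exact type match, no coercion at all
    for arg in args:
        if type(value) is arg:
            return value
    # Pass 2: fall back to left-to-right coercion
    for arg in args:
        try:
            return arg(value)
        except (TypeError, ValueError):
            continue
    raise ValueError(f"{value!r} does not fit {union_type}")

print(repr(validate_union("123", Union[int, str])))  # '123' - exact str match wins
print(validate_union(4.5, Union[int, str]))          # 4 - coerced via int()
```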
Pitfall 5: Recursive Models Cause Infinite Loops
Problem: A self-referencing model whose recursive field is required gives validation no natural stopping point.
class Node(MiniModel):
value: int
children: List["Node"]  # required - every node must itself supply children
Solution: Always use Optional for recursive references:
class Node(MiniModel):
value: int
children: Optional[List["Node"]] = None # Safe termination
Debugging Tips
Print type inspection results:
from typing import get_origin, get_args
def debug_type(t):
print(f"Type: {t}")
print(f"Origin: {get_origin(t)}")
print(f"Args: {get_args(t)}")
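Running the helper on a few common annotations shows exactly what the validator sees (output shown for Python 3.8+):

```python
from typing import List, Optional, get_args, get_origin

def debug_type(t):
    print(f"Type: {t}")
    print(f"Origin: {get_origin(t)}")
    print(f"Args: {get_args(t)}")

debug_type(List[int])       # Origin: <class 'list'>, Args: (<class 'int'>,)
debug_type(Optional[str])   # Origin: typing.Union, Args: (<class 'str'>, <class 'NoneType'>)
```

Note that Optional[str] reports Union as its origin: Optional[X] is just shorthand for Union[X, None], which is why a Union handler covers both.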
Add verbose logging to _validate_field:
def _validate_field(self, name, expected_type, value):
print(f"Validating {name}: {value!r} against {expected_type}")
# ... validation code ...
Extensions and Challenges
Extension 1: Field Constraints
Add validation constraints like Pydantic's Field():
class Field:
def __init__(self, *,
default=...,
min_length=None,
max_length=None,
ge=None,
le=None,
pattern=None):
self.default = default
self.min_length = min_length
# ... store all constraints
class User(MiniModel):
name: str = Field(min_length=1, max_length=100)
age: int = Field(ge=0, le=150)
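One way the constraint check itself could look; the metaclass would call validate() after type validation succeeds. A minimal sketch (method name and error format are assumptions, and real Pydantic's Field accepts many more options):

```python
class Field:
    """Holds per-field constraints and checks a value against them."""
    def __init__(self, *, default=..., min_length=None, max_length=None,
                 ge=None, le=None):
        self.default = default
        self.min_length = min_length
        self.max_length = max_length
        self.ge = ge
        self.le = le

    def validate(self, name, value):
        """Return a list of constraint violations (empty = valid)."""
        errors = []
        if self.min_length is not None and len(value) < self.min_length:
            errors.append(f"{name}: length must be >= {self.min_length}")
        if self.max_length is not None and len(value) > self.max_length:
            errors.append(f"{name}: length must be <= {self.max_length}")
        if self.ge is not None and value < self.ge:
            errors.append(f"{name}: must be >= {self.ge}")
        if self.le is not None and value > self.le:
            errors.append(f"{name}: must be <= {self.le}")
        return errors

print(Field(ge=0, le=150).validate("age", 200))   # ['age: must be <= 150']
print(Field(min_length=1).validate("name", ""))   # ['name: length must be >= 1']
```

Returning a list rather than raising fits the error-aggregation pattern: the model can extend its error list with every field's violations before raising once.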
Extension 2: Custom Validators
Add decorator-based validators:
class User(MiniModel):
email: str
@field_validator('email')
@classmethod
def validate_email(cls, v):
if '@' not in v:
raise ValueError('Invalid email')
return v.lower()
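The decorator itself only needs to tag the function; the metaclass (or __init__) then collects the tags. A sketch under those assumptions (the attribute name __validates__ and the helper collect_validators are illustrative):

```python
def field_validator(field_name):
    """Tag a (class)method as a validator for the given field."""
    def decorator(method):
        # unwrap classmethod to reach the underlying function
        func = getattr(method, "__func__", method)
        func.__validates__ = field_name
        return method
    return decorator

def collect_validators(cls):
    """Gather all tagged validators from a class namespace."""
    validators = {}
    for attr in vars(cls).values():
        fn = getattr(attr, "__func__", attr)   # unwrap classmethod objects
        if hasattr(fn, "__validates__"):
            validators[fn.__validates__] = fn
    return validators

class User:
    @field_validator("email")
    @classmethod
    def validate_email(cls, v):
        if "@" not in v:
            raise ValueError("Invalid email")
        return v.lower()

vs = collect_validators(User)
print(vs["email"](User, "John@Example.com"))  # john@example.com
```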
Extension 3: Strict Mode
Disable type coercion:
class StrictUser(MiniModel):
age: int
model_config = {'strict': True}
StrictUser(age="25") # Raises error instead of coercing
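Inside the validator, strict mode is essentially one extra branch: accept an exact instance, otherwise either coerce (lax) or fail (strict). A sketch for scalar types (the helper name validate_scalar is an assumption):

```python
def validate_scalar(value, expected_type, strict=False):
    """Validate a scalar, optionally refusing type coercion."""
    # bool is a subclass of int, so reject it explicitly (see Pitfall 3)
    if isinstance(value, bool) and expected_type is not bool:
        raise TypeError(f"Expected {expected_type.__name__}, got bool")
    if isinstance(value, expected_type):
        return value
    if strict:
        raise TypeError(
            f"Expected {expected_type.__name__}, got {type(value).__name__}")
    return expected_type(value)   # lax mode: attempt coercion

print(validate_scalar("25", int))               # 25 (coerced)
try:
    validate_scalar("25", int, strict=True)
except TypeError as e:
    print(e)                                    # Expected int, got str
```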
Extension 4: JSON Schema Generation
Generate JSON Schema from models:
schema = User.model_json_schema()
# Returns:
# {
# "type": "object",
# "properties": {
# "name": {"type": "string"},
# "age": {"type": "integer"}
# },
# "required": ["name", "age"]
# }
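A minimal generator can walk the type hints and map Python types to JSON Schema type names. A sketch covering scalars only (nested models, List, Dict, and Optional are left as the actual exercise; the "required if no class-level default" rule is an assumption of this sketch):

```python
from typing import get_type_hints

# Python type -> JSON Schema type name (subset)
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}

def model_json_schema(cls):
    """Generate a minimal JSON Schema dict from a class's annotations."""
    properties, required = {}, []
    for name, tp in get_type_hints(cls).items():
        properties[name] = {"type": JSON_TYPES.get(tp, "object")}
        if not hasattr(cls, name):        # no class-level default => required
            required.append(name)
    return {"type": "object", "properties": properties, "required": required}

class User:
    name: str
    age: int
    nickname: str = "anon"

print(model_json_schema(User))
# {'type': 'object',
#  'properties': {'name': {'type': 'string'}, 'age': {'type': 'integer'},
#                 'nickname': {'type': 'string'}},
#  'required': ['name', 'age']}
```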
Extension 5: Alias Support
Support different names for input/output:
class User(MiniModel):
user_name: str = Field(alias="userName")
# Accepts {"userName": "John"}
# Outputs {"user_name": "John"} or {"userName": "John"} based on config
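The input side of aliasing can be a simple key-translation pass before validation runs. A sketch (apply_aliases is an illustrative helper, not part of the project's required API):

```python
class Field:
    """Minimal field marker carrying only an alias (sketch)."""
    def __init__(self, alias=None):
        self.alias = alias

def apply_aliases(cls, data):
    """Rename aliased input keys to their field names."""
    out = dict(data)
    for name, field in vars(cls).items():
        if isinstance(field, Field) and field.alias in out:
            out[name] = out.pop(field.alias)
    return out

class User:
    user_name = Field(alias="userName")

print(apply_aliases(User, {"userName": "John"}))  # {'user_name': 'John'}
```

Output-side aliasing is the mirror image: model_dump() would consult a config flag (e.g. by_alias) to decide which key to emit.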
Real-World Connections
Where These Concepts Appear
- Pydantic - The library youโre reimplementing (production-grade)
- dataclasses - Python stdlib, no validation but similar structure
- attrs - Third-party, with optional validators
- marshmallow - Serialization/validation, schema-based approach
- SQLModel - Pydantic + SQLAlchemy combined
- FastAPI - Uses Pydantic for request/response validation
Industry Applications
- API Development - Request validation in web frameworks
- Configuration Management - Type-safe config loading
- Data Pipelines - Schema enforcement between stages
- LLM Applications - Structured output parsing
- ORM Integration - Database model validation
Production Considerations
If you wanted to use your mini-pydantic in production, you would need:
- Performance optimization - Compile validators, cache type info
- Thread safety - Careful with mutable class-level state
- Error message quality - Clear, actionable messages
- Documentation - Rich docstrings, type stubs
- Edge cases - Handle every possible input gracefully
Self-Assessment Checklist
Core Understanding
- Can I explain why metaclasses are used instead of regular inheritance?
- Can I describe the difference between __annotations__ and get_type_hints()?
- Can I explain how get_origin() and get_args() decompose generic types?
- Can I describe the descriptor protocol and when to use it?
Implementation Skills
- Can I implement a metaclass that processes type annotations?
- Can I validate nested model structures recursively?
- Can I handle Optional, List, and Dict types correctly?
- Can I implement type coercion for basic types?
Design Understanding
- Can I explain why error aggregation is better than fail-fast?
- Can I describe how to handle mutable default values safely?
- Can I explain the tradeoffs of type coercion vs strict mode?
- Can I design an extensible validation architecture?
Mastery Indicators
- My implementation handles all basic Python types correctly
- Nested models work with proper error path reporting
- All validation errors are collected before raising
- The code is well-organized with clear separation of concerns
- I understand the limitations compared to real Pydantic
Resources
Documentation
Books
- โFluent Pythonโ by Luciano Ramalho - Chapters 23 (Descriptors) and 24 (Metaclasses)
- โPython in a Nutshellโ by Alex Martelli - Comprehensive reference
- โRobust Pythonโ by Patrick Viafore - Type hints and validation
Source Code Study
- pydantic-core - Rust implementation
- pydantic - Python layer
- typeguard - Runtime type checking