Project 7: Build Your Own Mini-Pydantic

Build a simplified version of Pydantic that uses type hints for validation, without using Pydantic itself. Doing so reveals the core mechanisms Pydantic relies on under the hood.


Learning Objectives

By completing this project, you will:

  1. Master Python metaclasses - Understand how __new__ and metaclasses enable class customization at definition time
  2. Deeply understand the typing module - Use get_type_hints, get_origin, get_args to introspect type annotations at runtime
  3. Implement runtime type checking - Build a complete type validation system from scratch
  4. Learn the descriptor protocol - Use __get__ and __set__ for field-level validation
  5. Understand error aggregation patterns - Collect and report all validation errors, not just the first one
  6. Comprehend Pydantic V2 architecture - Appreciate how pydantic-core (Rust) achieves high performance

Deep Theoretical Foundation

Python Metaclasses and __new__

Every class in Python is an instance of a metaclass. By default, this is type. When you write class Foo: pass, Python effectively calls type('Foo', (), namespace) behind the scenes, where namespace is the dict produced by executing the class body.
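
To make that equivalence concrete, the short sketch below creates a class twice: once with a class statement and once by calling type directly. The names Foo, FooDynamic, and the attribute x are illustrative only.

```python
# Two ways to create the same kind of class object.
class Foo:
    x = 1

# type(name, bases, namespace) creates a class dynamically.
FooDynamic = type("FooDynamic", (), {"x": 1})

print(type(Foo) is type)          # True: Foo is an instance of type
print(type(FooDynamic) is type)   # True
print(FooDynamic.__bases__)       # (<class 'object'>,)
print(FooDynamic().x)             # 1
```

An empty bases tuple still yields a class that inherits from object, which is why both classes behave the same way.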

┌──────────────────────────────────────────────────────────────────┐
│                    Python's Type Hierarchy                       │
│                                                                  │
│     ┌──────────┐                                                 │
│     │   type   │  ◄── Metaclass (all classes are instances)      │
│     └────┬─────┘                                                 │
│          │ instance of                                           │
│          ▼                                                       │
│     ┌──────────┐                                                 │
│     │  object  │  ◄── Base class (all objects inherit)           │
│     └────┬─────┘                                                 │
│          │ inherits from                                         │
│          ▼                                                       │
│     ┌──────────┐                                                 │
│     │   Foo    │  ◄── Your class                                 │
│     └────┬─────┘                                                 │
│          │ instance of                                           │
│          ▼                                                       │
│     ┌──────────┐                                                 │
│     │ foo_obj  │  ◄── Instance of your class                     │
│     └──────────┘                                                 │
└──────────────────────────────────────────────────────────────────┘

Understanding __new__ vs __init__:

class MyMeta(type):
    def __new__(mcs, name, bases, namespace):
        """
        Called BEFORE the class is created.

        Parameters:
        - mcs: The metaclass itself (like self, but for metaclasses)
        - name: The name of the class being created
        - bases: Tuple of parent classes
        - namespace: Dict of class attributes (methods, fields, etc.)

        Returns:
        - The newly created class object
        """
        print(f"Creating class: {name}")
        print(f"Bases: {bases}")
        print(f"Namespace keys: {list(namespace.keys())}")

        # This is where Pydantic inspects type hints and sets up validation
        cls = super().__new__(mcs, name, bases, namespace)

        # Now we can modify the class after creation
        cls.__custom_attribute__ = "Added by metaclass"

        return cls

    def __init__(cls, name, bases, namespace):
        """
        Called AFTER the class is created.
        Usually used for additional setup.
        """
        super().__init__(name, bases, namespace)


class MyClass(metaclass=MyMeta):
    x: int = 10

# Output (exact namespace keys vary by Python version):
# Creating class: MyClass
# Bases: ()
# Namespace keys: ['__module__', '__qualname__', '__annotations__', 'x']

Why Pydantic uses metaclasses:

  1. Intercept class creation - Extract type annotations before any instance is created
  2. Modify class behavior - Replace __init__ with a validating version
  3. Add class-level attributes - Store field information, validators, schema data
  4. Ensure consistency - Every subclass automatically gets validation behavior

The typing Module Internals

Python's typing module provides tools for type hints, but these types are not enforced at runtime by default. Pydantic uses introspection to enforce them.

Key functions for runtime type introspection:

from typing import (
    get_type_hints, get_origin, get_args,
    Union, Optional, List, Dict, Literal
)

get_type_hints() - Resolves annotations, including forward references:

class User:
    name: str
    age: "int"  # Forward reference as string

hints = get_type_hints(User)
# {'name': <class 'str'>, 'age': <class 'int'>}

# Handles forward references that __annotations__ cannot:
class Node:
    value: int
    next: "Node"  # Self-reference

# __annotations__ gives: {'value': int, 'next': 'Node'}
# get_type_hints() gives: {'value': int, 'next': <class 'Node'>}

get_origin() - Gets the base of a generic type:

from typing import get_origin

get_origin(List[int])           # list
get_origin(Dict[str, int])      # dict
get_origin(Optional[str])       # typing.Union
get_origin(Union[int, str])     # typing.Union
get_origin(Literal["a", "b"])   # typing.Literal
get_origin(int)                 # None (not a generic)
get_origin(str)                 # None

get_args() - Gets the type arguments:

from typing import get_args

get_args(List[int])              # (int,)
get_args(Dict[str, int])         # (str, int)
get_args(Optional[str])          # (str, NoneType)
get_args(Union[int, str, None])  # (int, str, NoneType)
get_args(Literal["a", "b"])      # ('a', 'b')
get_args(int)                    # () (empty tuple)
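
As a small illustration of combining get_origin and get_args, the helper below detects Optional[T] and returns the inner type. unwrap_optional is a hypothetical name for this sketch; it assumes that Optional[T] is represented as a two-member union containing NoneType, and it also accepts the "T | None" spelling on Python 3.10+.

```python
import types
from typing import List, Optional, Union, get_args, get_origin

# typing.Union covers Optional[T] and Union[...]; types.UnionType covers
# "T | None" on Python 3.10+ (the getattr fallback keeps older versions working).
_UNION_ORIGINS = (Union, getattr(types, "UnionType", Union))


def unwrap_optional(tp):
    """Return (inner_type, True) for Optional[T], else (tp, False)."""
    if get_origin(tp) in _UNION_ORIGINS:
        args = get_args(tp)
        non_none = [arg for arg in args if arg is not type(None)]
        if len(args) == 2 and len(non_none) == 1:
            return non_none[0], True
    return tp, False


print(unwrap_optional(Optional[str]))       # (<class 'str'>, True)
print(unwrap_optional(int)[1])              # False: plain type
print(unwrap_optional(List[int])[1])        # False: generic, but not a union
print(unwrap_optional(Union[int, str])[1])  # False: union without None
```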

Type checking flowchart:

┌──────────────────────────────────────────────────────────────────┐
│                    Runtime Type Checking Flow                    │
│                                                                  │
│   Input: expected_type, value                                    │
│                                                                  │
│   ┌─────────────────────┐                                        │
│   │ origin = get_origin │                                        │
│   │ (expected_type)     │                                        │
│   └──────────┬──────────┘                                        │
│              │                                                   │
│              ▼                                                   │
│   ┌──────────────────────────────────────────────────┐           │
│   │ Is origin None?                                  │           │
│   └────────┬─────────────────────┬───────────────────┘           │
│            │ Yes                 │ No                            │
│            ▼                     ▼                               │
│   ┌────────────────┐   ┌────────────────────────────────┐        │
│   │ Simple type    │   │ Generic type                   │        │
│   │ isinstance     │   │ args = get_args(expected_type) │        │
│   │ (value, type)  │   └────────────────┬───────────────┘        │
│   └────────────────┘                    │                        │
│                                         ▼                        │
│                        ┌────────────────────────────────┐        │
│                        │ origin is Union?               │        │
│                        │   → Try each arg               │        │
│                        │ origin is list?                │        │
│                        │   → Check each item            │        │
│                        │ origin is dict?                │        │
│                        │   → Check keys and values      │        │
│                        └────────────────────────────────┘        │
└──────────────────────────────────────────────────────────────────┘
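
The flow above translates fairly directly into a recursive function. The following is a minimal sketch, not Pydantic's implementation: check_type is a hypothetical name, it only checks (no coercion), and it supports just the Union, list, and dict branches shown in the flowchart.

```python
from typing import Any, Dict, List, Optional, Union, get_args, get_origin


def check_type(value: Any, expected_type: Any) -> bool:
    """Return True if value matches expected_type (checks only, no coercion)."""
    origin = get_origin(expected_type)

    if origin is None:                       # simple type such as int or str
        if expected_type is Any:
            return True
        return isinstance(value, expected_type)

    args = get_args(expected_type)

    if origin is Union:                      # Union[...] and Optional[...]
        return any(check_type(value, arg) for arg in args)

    if origin is list:                       # List[T] or list[T]
        if not isinstance(value, list):
            return False
        return all(check_type(item, args[0]) for item in value) if args else True

    if origin is dict:                       # Dict[K, V] or dict[K, V]
        if not isinstance(value, dict):
            return False
        if not args:
            return True
        key_type, value_type = args
        return all(
            check_type(k, key_type) and check_type(v, value_type)
            for k, v in value.items()
        )

    raise NotImplementedError(f"Unsupported generic type: {expected_type!r}")


print(check_type([1, 2, 3], List[int]))             # True
print(check_type([1, "two"], List[int]))            # False
print(check_type(None, Optional[str]))              # True
print(check_type({"a": [1, 2]}, Dict[str, List[int]]))  # True
```

Recursion is what lets nested generics such as Dict[str, List[int]] work without any extra code.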

How Pydantic V2 Works Internally

Pydantic V2 is a complete rewrite with a Rust core (pydantic-core) for performance:

┌──────────────────────────────────────────────────────────────────┐
│                     Pydantic V2 Architecture                     │
│                                                                  │
│   ┌────────────────────────────────────────────────────────┐     │
│   │                      Python Layer                      │     │
│   │  - BaseModel class                                     │     │
│   │  - @field_validator, @model_validator                  │     │
│   │  - Field() configuration                               │     │
│   │  - JSON Schema generation                              │     │
│   └────────────────────────────────────────────────────────┘     │
│                              │                                   │
│                              │ PyO3 bindings                     │
│                              ▼                                   │
│   ┌────────────────────────────────────────────────────────┐     │
│   │                  pydantic-core (Rust)                  │     │
│   │  - SchemaValidator: Core validation logic              │     │
│   │  - SchemaSerializer: Serialization logic               │     │
│   │  - CoreSchema: Internal schema representation          │     │
│   │  - 5-50x faster than Python-based V1                   │     │
│   └────────────────────────────────────────────────────────┘     │
│                                                                  │
│   Why Rust?                                                      │
│   - Native code avoids interpreter overhead in hot paths         │
│   - Memory-safe without garbage collection overhead              │
│   - Direct memory access for JSON parsing                        │
│   - Compile-time optimizations                                   │
└──────────────────────────────────────────────────────────────────┘

The validation pipeline in detail:

  1. Schema Building (at class definition time):
    • Extract type hints from __annotations__
    • Process Field() configurations
    • Collect validators (@field_validator, @model_validator)
    • Build internal CoreSchema representation
    • Build a Rust SchemaValidator from the CoreSchema
  2. Validation (at instantiation time):
    • Receive input data (dict, JSON, object)
    • For JSON: Parse directly in Rust (no intermediate json.loads step)
    • Apply type coercion based on schema
    • Run field validators
    • Run model validators
    • Construct Python object

# What happens when you do:
user = User(name="John", age="25")

# Internally:
# 1. User.__init__ is called (replaced by Pydantic)
# 2. Input converted to dict if needed
# 3. pydantic-core.SchemaValidator.validate_python() called
# 4. Rust code: parse "25" -> 25 (coercion)
# 5. Rust code: validate name is str, age is int
# 6. Python validators run (if any)
# 7. User instance created with validated data
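
The two-phase idea (build a schema once, then reuse it for every validation) can be imitated in pure Python. This is a toy stand-in, not the pydantic-core API: build_schema, validate, and USER_SCHEMA are hypothetical names, and each plain type simply doubles as its own converter.

```python
from typing import Any, Callable, Dict

Converter = Callable[[Any], Any]


def build_schema(annotations: Dict[str, type]) -> Dict[str, Converter]:
    """Phase 1 (definition time): map each field to a converter callable."""
    # Each plain type doubles as its own converter, e.g. int("25") -> 25.
    return dict(annotations)


def validate(schema: Dict[str, Converter], data: Dict[str, Any]) -> Dict[str, Any]:
    """Phase 2 (instantiation time): apply the prebuilt converters."""
    return {name: convert(data[name]) for name, convert in schema.items()}


# Built once, reused for every input.
USER_SCHEMA = build_schema({"name": str, "age": int})

print(validate(USER_SCHEMA, {"name": "John", "age": "25"}))  # {'name': 'John', 'age': 25}
print(validate(USER_SCHEMA, {"name": "Ann", "age": 31}))     # {'name': 'Ann', 'age': 31}
```

Doing the expensive inspection work once per class, rather than once per instance, is the main design point this sketch is meant to show.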

Runtime Type Checking in Python

Python is dynamically typed - the interpreter doesn't enforce type hints. For runtime enforcement, you must:

  1. Extract type information from annotations
  2. Check values against expected types
  3. Handle coercion (converting compatible types)
  4. Report errors clearly

The isinstance() challenge with generics:

# This works for simple types:
isinstance(42, int)       # True
isinstance("hello", str)  # True

# This does NOT work for generics:
isinstance([1, 2, 3], list[int])  # TypeError!
isinstance({"a": 1}, Dict[str, int])  # TypeError!

# You must check the container and its contents separately:
def is_list_of(value, item_type):
    if not isinstance(value, list):
        return False
    return all(isinstance(item, item_type) for item in value)
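
Two related details are worth a short sketch: the same container-then-contents pattern applies to dicts, and isinstance treats bool as an int because bool is a subclass of int. The helper names is_dict_of and is_strict_int are hypothetical.

```python
def is_strict_int(value):
    """True for ints but not bools (bool is a subclass of int)."""
    return isinstance(value, int) and not isinstance(value, bool)


def is_dict_of(value, key_type, value_type):
    """Check the container first, then every key and value."""
    if not isinstance(value, dict):
        return False
    return all(
        isinstance(k, key_type) and isinstance(v, value_type)
        for k, v in value.items()
    )


print(isinstance(True, int))             # True (often surprising)
print(is_strict_int(True))               # False
print(is_strict_int(42))                 # True
print(is_dict_of({"a": 1}, str, int))    # True
print(is_dict_of({"a": "1"}, str, int))  # False
```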

Type coercion strategies:

def coerce_value(value, target_type):
    """
    Attempt to convert value to target_type.
    Pydantic's default (lax) mode performs similar conversions;
    strict mode disables them.
    """
    # String to int
    if target_type is int and isinstance(value, str):
        try:
            return int(value)
        except ValueError:
            raise TypeError(f"Cannot convert '{value}' to int")

    # String to bool
    if target_type is bool and isinstance(value, str):
        if value.lower() in ('true', '1', 'yes', 'on'):
            return True
        if value.lower() in ('false', '0', 'no', 'off'):
            return False
        raise TypeError(f"Cannot convert '{value}' to bool")

    # Float to int (truncates; Pydantic V2 instead rejects floats with a fractional part)
    if target_type is int and isinstance(value, float):
        return int(value)

    # String to float
    if target_type is float and isinstance(value, str):
        try:
            return float(value)
        except ValueError:
            raise TypeError(f"Cannot convert '{value}' to float")

    # Already correct type
    if isinstance(value, target_type):
        return value

    raise TypeError(f"Expected {target_type.__name__}, got {type(value).__name__}")

Descriptor Protocol for Field Validation

Descriptors control attribute access. They implement __get__, __set__, and/or __delete__:

┌──────────────────────────────────────────────────────────────────┐
│                    Descriptor Protocol                           │
│                                                                  │
│   class ValidatedField:                                          │
│       # __init__ and validate() omitted for brevity              │
│       def __set_name__(self, owner, name):                       │
│           # Called when descriptor is assigned to class attr     │
│           self.name = name                                       │
│           self.private_name = f"__{name}"                        │
│                                                                  │
│       def __get__(self, obj, objtype=None):                      │
│           # Called when attribute is accessed: obj.field         │
│           if obj is None:                                        │
│               return self  # Class-level access                  │
│           return getattr(obj, self.private_name)                 │
│                                                                  │
│       def __set__(self, obj, value):                             │
│           # Called when attribute is assigned: obj.field = val   │
│           validated = self.validate(value)                       │
│           setattr(obj, self.private_name, validated)             │
│                                                                  │
│   class User:                                                    │
│       age = ValidatedField(int, min=0, max=150)                  │
│                                                                  │
│   user = User()                                                  │
│   user.age = 25      # Calls ValidatedField.__set__              │
│   print(user.age)    # Calls ValidatedField.__get__              │
└──────────────────────────────────────────────────────────────────┘

A complete validated field descriptor:

class TypedField:
    """Descriptor that validates type on assignment."""

    def __init__(self, expected_type, required=True, default=None):
        self.expected_type = expected_type
        self.required = required
        self.default = default

    def __set_name__(self, owner, name):
        self.name = name
        self.private_name = f"_field_{name}"

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.private_name, self.default)

    def __set__(self, obj, value):
        if value is None and not self.required:
            setattr(obj, self.private_name, None)
            return

        if not isinstance(value, self.expected_type):
            raise TypeError(
                f"Field '{self.name}' expected {self.expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
        setattr(obj, self.private_name, value)


class Person:
    name = TypedField(str)
    age = TypedField(int)
    email = TypedField(str, required=False)

person = Person()
person.name = "John"     # OK
person.age = 30          # OK
person.age = "thirty"    # TypeError!
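
The same pattern extends from type checks to value constraints such as minimum and maximum bounds. The descriptor below is a minimal sketch under that assumption; BoundedInt, min_value, and max_value are hypothetical names, and the class is redefined here so the example runs on its own.

```python
class BoundedInt:
    """Hypothetical descriptor: an int field with optional inclusive bounds."""

    def __init__(self, min_value=None, max_value=None):
        self.min_value = min_value
        self.max_value = max_value

    def __set_name__(self, owner, name):
        self.name = name
        self.private_name = f"_field_{name}"

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.private_name)

    def __set__(self, obj, value):
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(
                f"Field '{self.name}' expected int, got {type(value).__name__}"
            )
        if self.min_value is not None and value < self.min_value:
            raise ValueError(
                f"Field '{self.name}' must be >= {self.min_value}, got {value}"
            )
        if self.max_value is not None and value > self.max_value:
            raise ValueError(
                f"Field '{self.name}' must be <= {self.max_value}, got {value}"
            )
        setattr(obj, self.private_name, value)


class Person:
    age = BoundedInt(min_value=0, max_value=150)


person = Person()
person.age = 25
try:
    person.age = 200
except ValueError as exc:
    print(exc)       # Field 'age' must be <= 150, got 200
print(person.age)    # 25 (the rejected assignment left the old value intact)
```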

Error Aggregation Patterns

Good validation reports ALL errors, not just the first one. This requires:

  1. Collecting errors during validation
  2. Continuing after errors instead of raising immediately
  3. Structured error representation with paths

┌──────────────────────────────────────────────────────────────────┐
│                  Error Aggregation Pattern                       │
│                                                                  │
│   class ValidationError(Exception):                              │
│       def __init__(self, errors: list[dict]):                    │
│           self.errors = errors                                   │
│           message = f"{len(errors)} validation error(s)"         │
│           super().__init__(message)                              │
│                                                                  │
│   # During validation:                                           │
│   errors = []                                                    │
│                                                                  │
│   for field_name, expected_type in fields.items():               │
│       value = data.get(field_name)                               │
│       try:                                                       │
│           validate_field(field_name, expected_type, value)       │
│       except TypeError as e:                                     │
│           errors.append({                                        │
│               "loc": (field_name,),                              │
│               "msg": str(e),                                     │
│               "type": "type_error",                              │
│               "input": value                                     │
│           })                                                     │
│           # DON'T raise - continue to next field                 │
│                                                                  │
│   if errors:                                                     │
│       raise ValidationError(errors)                              │
│                                                                  │
│   # Result: User sees ALL errors at once                         │
└──────────────────────────────────────────────────────────────────┘
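
A self-contained, runnable version of this pattern might look like the sketch below. validate_fields and the _MISSING sentinel are hypothetical names; the check is a plain isinstance test with no coercion, which keeps the focus on collecting errors.

```python
class ValidationError(Exception):
    """Carries every collected error, not just the first one."""

    def __init__(self, errors):
        self.errors = errors
        super().__init__(f"{len(errors)} validation error(s)")


_MISSING = object()  # sentinel: distinguishes "absent" from an explicit None


def validate_fields(fields, data):
    errors = []
    validated = {}
    for field_name, expected_type in fields.items():
        value = data.get(field_name, _MISSING)
        if value is _MISSING:
            errors.append({"loc": (field_name,), "msg": "Field required",
                           "type": "missing", "input": None})
            continue  # keep going: collect instead of raising
        if not isinstance(value, expected_type):
            errors.append({
                "loc": (field_name,),
                "msg": f"Expected {expected_type.__name__}, got {type(value).__name__}",
                "type": "type_error",
                "input": value,
            })
            continue
        validated[field_name] = value
    if errors:
        raise ValidationError(errors)
    return validated


try:
    validate_fields(
        {"name": str, "age": int, "tags": list},
        {"age": "not-a-number", "tags": "should-be-list"},
    )
except ValidationError as exc:
    print(exc)                         # 3 validation error(s)
    for error in exc.errors:
        print(error["loc"], error["msg"])
# ('name',) Field required
# ('age',) Expected int, got str
# ('tags',) Expected list, got str
```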

Nested error paths:

# For nested structures like:
{
    "user": {
        "address": {
            "zip_code": "invalid"
        }
    }
}

# Error location should be:
("user", "address", "zip_code")

# Which displays as:
"user.address.zip_code"

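One way to build such paths is to let each nested validator report locations relative to itself and have the caller prepend its own field name. The sketch below assumes the error dict structure used above; format_loc and prefix_errors are hypothetical helper names, and the error message is invented for the example.

```python
def format_loc(loc):
    """Render a location tuple as a dotted path."""
    return ".".join(str(part) for part in loc)


def prefix_errors(prefix, errors):
    """Return copies of errors with prefix prepended to each location."""
    return [{**error, "loc": (prefix, *error["loc"])} for error in errors]


# Innermost validator reports a location relative to itself.
zip_errors = [{"loc": ("zip_code",), "msg": "Expected 5 digits",
               "type": "value_error", "input": "invalid"}]

# Each enclosing level prepends the field it was validating.
address_errors = prefix_errors("address", zip_errors)
user_errors = prefix_errors("user", address_errors)

print(user_errors[0]["loc"])              # ('user', 'address', 'zip_code')
print(format_loc(user_errors[0]["loc"]))  # user.address.zip_code
print(format_loc(("tags", 0)))            # tags.0 (list indices work too)
```
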
Project Specification

Functional Requirements

Build a validation library called mini_pydantic that:

  1. Defines validated models using type hints
  2. Validates data on instantiation like Pydantic's BaseModel
  3. Supports basic types - str, int, float, bool
  4. Supports containers - list[T], dict[K, V], Optional[T]
  5. Supports nested models - Models containing other models
  6. Coerces types by default - Convert "123" to 123
  7. Aggregates all errors - Report every validation failure
  8. Provides model_dump() - Convert back to dictionary

Core API

from mini_pydantic import MiniModel, ValidationError
from typing import Optional, List

class Address(MiniModel):
    street: str
    city: str
    country: str = "USA"  # Default value

class User(MiniModel):
    name: str
    age: int
    email: Optional[str] = None
    tags: List[str] = []  # Mutable default: copy it per instance
    address: Optional[Address] = None

# Valid usage
user = User(
    name="John",
    age="25",  # Coerced to int
    address={"street": "123 Main St", "city": "NYC"}  # Nested validation
)

print(user.name)         # "John"
print(user.age)          # 25 (int, not str)
print(user.address.city) # "NYC"
print(user.model_dump()) # Returns dict

# Validation errors
try:
    User(
        age="not-a-number",
        tags="should-be-list"
    )
except ValidationError as e:
    for error in e.errors:
        print(f"{error['loc']}: {error['msg']}")
# ('name',): Field required
# ('age',): invalid literal for int() with base 10: 'not-a-number'
# ('tags',): Expected list, got str

Expected Behavior

Input            Expected Type    Behavior
-------------    -------------    --------------------
"123"            int              Coerce to 123
"true"           bool             Coerce to True
1.5              int              Coerce to 1
{"a": 1}         NestedModel      Recursively validate
None             Optional[T]      Allow None
[1, 2, 3]        List[int]        Validate each item
Missing field    required         Add to errors
Missing field    has default      Use default

Solution Architecture

Component Diagram

┌──────────────────────────────────────────────────────────────────┐
│                    MiniModelMeta (Metaclass)                     │
│                                                                  │
│  __new__(mcs, name, bases, namespace):                           │
│    1. Create class with super().__new__()                        │
│    2. Extract __annotations__                                    │
│    3. Call get_type_hints() for forward refs                     │
│    4. Identify required vs optional fields                       │
│    5. Store field info in __field_types__                        │
│    6. Return modified class                                      │
└──────────────────────────────────────────────────────────────────┘
                              │
                              │ uses as metaclass
                              ▼
┌──────────────────────────────────────────────────────────────────┐
│                      MiniModel (Base Class)                      │
│                                                                  │
│  __init__(self, **data):                                         │
│    1. Check required fields are present                          │
│    2. For each field in __field_types__:                         │
│       a. Get value from data or default                          │
│       b. Validate using TypeValidator                            │
│       c. Set attribute                                           │
│    3. Raise ValidationError if any errors                        │
│                                                                  │
│  model_dump(self) -> dict:                                       │
│    Recursively convert to dictionary                             │
└──────────────────────────────────────────────────────────────────┘
                              │
                              │ delegates to
                              ▼
┌──────────────────────────────────────────────────────────────────┐
│                      TypeValidator (Helper)                      │
│                                                                  │
│  validate(expected_type, value, field_name) -> validated_value   │
│                                                                  │
│  - Handles: str, int, float, bool                                │
│  - Handles: Optional[T] (Union with None)                        │
│  - Handles: List[T]                                              │
│  - Handles: Dict[K, V]                                           │
│  - Handles: Nested MiniModel subclasses                          │
│  - Performs type coercion                                        │
│  - Raises descriptive errors                                     │
└──────────────────────────────────────────────────────────────────┘
                              │
                              │ raises
                              ▼
┌──────────────────────────────────────────────────────────────────┐
│                    ValidationError (Exception)                   │
│                                                                  │
│  __init__(self, errors: list[dict]):                             │
│    self.errors = errors                                          │
│                                                                  │
│  Error dict structure:                                           │
│    {                                                             │
│      "loc": ("field", "subfield"),  # Path tuple                 │
│      "msg": "Error message",                                     │
│      "type": "error_type",                                       │
│      "input": original_value                                     │
│    }                                                             │
└──────────────────────────────────────────────────────────────────┘

Data Flow

addr = {"street": "123 Main St", "city": "NYC"}
User(name="John", age="25", address=addr)
                    │
                    ▼
┌──────────────────────────────────────────────────────────────────┐
│                        MiniModel.__init__                        │
│                                                                  │
│  1. Check required fields:                                       │
│     - name: present ✓                                            │
│     - age: present ✓                                             │
│     - email: Optional, missing → use None                        │
│     - address: present ✓                                         │
│                                                                  │
│  2. Validate each field:                                         │
│     ┌────────────────────────────────────────────────────────┐   │
│     │ name: str ← "John"                                     │   │
│     │   → isinstance("John", str) → True → "John"            │   │
│     └────────────────────────────────────────────────────────┘   │
│     ┌────────────────────────────────────────────────────────┐   │
│     │ age: int ← "25"                                        │   │
│     │   → isinstance("25", int) → False                      │   │
│     │   → Coerce: int("25") → 25                             │   │
│     └────────────────────────────────────────────────────────┘   │
│     ┌────────────────────────────────────────────────────────┐   │
│     │ address: Optional[Address] ← addr                      │   │
│     │   → isinstance(addr, Address) → False                  │   │
│     │   → Address is MiniModel subclass                      │   │
│     │   → Recursively: Address(**addr)                       │   │
│     │   → Returns validated Address instance                 │   │
│     └────────────────────────────────────────────────────────┘   │
│                                                                  │
│  3. Set attributes:                                              │
│     self.name = "John"                                           │
│     self.age = 25                                                │
│     self.address = <Address instance>                            │
└──────────────────────────────────────────────────────────────────┘

Phased Implementation Guide

Phase 1: Basic Structure (1-2 hours)

Goal: Create the metaclass and base class skeleton.

# mini_pydantic/__init__.py
from .core import MiniModel, ValidationError

# mini_pydantic/core.py
from typing import get_type_hints, get_origin, get_args, Union


class ValidationError(Exception):
    """Raised when validation fails."""

    def __init__(self, errors: list):
        self.errors = errors
        super().__init__(f"{len(errors)} validation error(s)")


class MiniModelMeta(type):
    """Metaclass that processes type hints at class creation."""

    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)

        # Skip processing for the base MiniModel class
        if name == 'MiniModel':
            return cls

        # Store processed field information
        cls.__field_types__ = {}
        cls.__required_fields__ = set()
        cls.__optional_fields__ = set()
        cls.__field_defaults__ = {}

        # TODO: Process annotations in next phase

        return cls


class MiniModel(metaclass=MiniModelMeta):
    """Base class for validated models."""

    def __init__(self, **data):
        # TODO: Implement validation in next phase
        pass

    def model_dump(self) -> dict:
        # TODO: Implement serialization in next phase
        return {}

Checkpoint: Class can be defined without errors.

class User(MiniModel):
    name: str
    age: int

# Should not raise any errors
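
To see why a metaclass is a convenient hook here, it can help to watch `__new__` fire at class-definition time, before any instance exists. The following is a minimal sketch that is independent of mini_pydantic; `RecordingMeta`, `created`, `Base`, and `Child` are illustrative names, not part of the project.

```python
created = []  # Names of classes built by the metaclass, in definition order


class RecordingMeta(type):
    """Hypothetical metaclass that logs every class it creates."""

    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        created.append(name)
        return cls


class Base(metaclass=RecordingMeta):
    pass


class Child(Base):  # Subclasses inherit the metaclass automatically
    label: str


# No instance has been created, yet __new__ has already run twice.
print(created)               # ['Base', 'Child']
print(type(Child).__name__)  # RecordingMeta
```

This is the same moment at which MiniModelMeta gets to inspect annotations and precompute field metadata.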

Phase 2: Type Hint Extraction (1-2 hours)

Goal: Extract and process type annotations.

class MiniModelMeta(type):
    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)

        if name == 'MiniModel':
            return cls

        # Get type hints (handles forward references)
        try:
            hints = get_type_hints(cls)
        except Exception:
            hints = getattr(cls, '__annotations__', {})

        cls.__field_types__ = hints
        cls.__required_fields__ = set()
        cls.__optional_fields__ = set()
        cls.__field_defaults__ = {}

        for field_name, field_type in hints.items():
            # Check if field has default value
            if hasattr(cls, field_name):
                cls.__optional_fields__.add(field_name)
                cls.__field_defaults__[field_name] = getattr(cls, field_name)
            # Check if type is Optional (Union with None)
            elif mcs._is_optional(field_type):
                cls.__optional_fields__.add(field_name)
                cls.__field_defaults__[field_name] = None
            else:
                cls.__required_fields__.add(field_name)

        return cls

    @staticmethod
    def _is_optional(field_type) -> bool:
        """Check if type is Optional[X] (Union[X, None])."""
        origin = get_origin(field_type)
        if origin is Union:
            args = get_args(field_type)
            return type(None) in args
        return False

Checkpoint: Fields are correctly classified as required/optional.

from typing import Optional

class User(MiniModel):
    name: str                      # Required
    age: int                       # Required
    email: Optional[str] = None    # Optional
    role: str = "user"             # Optional (has default)

print(User.__required_fields__)  # {'name', 'age'}
print(User.__optional_fields__)  # {'email', 'role'}

Phase 3: Basic Validation (2-3 hours)

Goal: Validate simple types and required fields.

class MiniModel(metaclass=MiniModelMeta):
    def __init__(self, **data):
        errors = []

        # Check required fields
        for field in self.__required_fields__:
            if field not in data:
                errors.append({
                    'loc': (field,),
                    'msg': 'Field required',
                    'type': 'missing'
                })

        # Validate and set fields
        for field_name, field_type in self.__field_types__.items():
            if field_name in data:
                value = data[field_name]
                try:
                    validated = self._validate_field(field_name, field_type, value)
                    setattr(self, field_name, validated)
                except Exception as e:
                    errors.append({
                        'loc': (field_name,),
                        'msg': str(e),
                        'type': 'validation_error',
                        'input': value
                    })
            elif field_name in self.__optional_fields__:
                default = self.__field_defaults__.get(field_name)
                setattr(self, field_name, default)

        if errors:
            raise ValidationError(errors)

    def _validate_field(self, name: str, expected_type, value):
        """Validate a single field value."""
        # Basic type validation with coercion
        if expected_type is int:
            if isinstance(value, int) and not isinstance(value, bool):
                return value
            return int(value)  # Coerce

        if expected_type is str:
            if isinstance(value, str):
                return value
            return str(value)  # Coerce

        if expected_type is float:
            return float(value)  # Accepts int, float, and numeric strings

        if expected_type is bool:
            if isinstance(value, bool):
                return value
            if isinstance(value, str):
                if value.lower() in ('true', '1', 'yes'):
                    return True
                if value.lower() in ('false', '0', 'no'):
                    return False
                # Without this, any non-empty string (even "maybe") becomes True
                raise ValueError(f"Cannot interpret {value!r} as bool")
            return bool(value)

        # Direct type check as fallback
        if isinstance(value, expected_type):
            return value

        raise ValueError(
            f"Expected {expected_type.__name__}, got {type(value).__name__}"
        )

Checkpoint: Basic validation works.

user = User(name="John", age="25")
print(user.age)  # 25 (int, coerced from str)

try:
    User(age=30)  # Missing 'name'
except ValidationError as e:
    print(e.errors)  # [{'loc': ('name',), 'msg': 'Field required', ...}]

Phase 4: Complex Types (2-3 hours)

Goal: Handle Optional, List, Dict, and nested models.

def _validate_field(self, name: str, expected_type, value):
    """Validate a single field value."""
    origin = get_origin(expected_type)

    # Handle Optional[X] (Union[X, None])
    if origin is Union:
        args = get_args(expected_type)
        if value is None and type(None) in args:
            return None

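        # Try each type in the union, remembering nested-model errors
        nested_error = None
        for arg in args:
            if arg is type(None):
                continue
            try:
                return self._validate_field(name, arg, value)
            except ValidationError as e:
                nested_error = e  # Keep field-level detail instead of discarding it
            except Exception:
                continue

        if nested_error is not None:
            raise nested_error

        type_names = [getattr(a, '__name__', repr(a)) for a in args if a is not type(None)]
        raise ValueError(f"Expected one of {type_names}, got {type(value).__name__}")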

    # Handle list[X]
    if origin is list:
        if not isinstance(value, list):
            raise ValueError(f"Expected list, got {type(value).__name__}")

        item_type = get_args(expected_type)[0] if get_args(expected_type) else object
        validated_items = []

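        for i, item in enumerate(value):
            try:
                validated_items.append(
                    self._validate_field(f"{name}[{i}]", item_type, item)
                )
            except ValidationError as nested_error:
                # Nested model failed: record the index and keep the error list
                for error in nested_error.errors:
                    error['loc'] = (i,) + error['loc']
                raise
            except Exception as e:
                raise ValueError(f"Item {i}: {e}") from e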

        return validated_items

    # Handle dict[K, V]
    if origin is dict:
        if not isinstance(value, dict):
            raise ValueError(f"Expected dict, got {type(value).__name__}")

        args = get_args(expected_type)
        if len(args) == 2:
            key_type, value_type = args
            validated_dict = {}
            for k, v in value.items():
                validated_key = self._validate_field(f"{name}.key", key_type, k)
                validated_val = self._validate_field(f"{name}[{k}]", value_type, v)
                validated_dict[validated_key] = validated_val
            return validated_dict
        return value

    # Handle nested MiniModel
    if isinstance(expected_type, type) and issubclass(expected_type, MiniModel):
        if isinstance(value, expected_type):
            return value
        if isinstance(value, dict):
            return expected_type(**value)
        raise ValueError(f"Expected {expected_type.__name__} or dict")

    # Basic types (from Phase 3)
    # ... (previous implementation)

Checkpoint: Complex types work.

from typing import List, Optional

class Address(MiniModel):
    city: str
    country: str = "USA"

class User(MiniModel):
    tags: List[str]
    address: Optional[Address] = None

user = User(
    tags=["python", "validation"],
    address={"city": "NYC"}
)
print(user.address.city)  # "NYC"
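
The recursion in `_validate_field` can be easier to follow in isolation. Here is a minimal standalone sketch of the same decomposition idea using `get_origin` and `get_args`. It is deliberately strict (no coercion) and only understands lists, dicts, and plain classes; `check` and its `path` parameter are illustrative names, not part of the project.

```python
from typing import Dict, List, get_args, get_origin


def check(expected_type, value, path="value"):
    """Recursively check value against a list/dict/plain-class annotation."""
    origin = get_origin(expected_type)

    if origin is list:
        if not isinstance(value, list):
            raise TypeError(f"{path}: expected list, got {type(value).__name__}")
        (item_type,) = get_args(expected_type) or (object,)
        return [check(item_type, item, f"{path}[{i}]") for i, item in enumerate(value)]

    if origin is dict:
        if not isinstance(value, dict):
            raise TypeError(f"{path}: expected dict, got {type(value).__name__}")
        key_type, value_type = get_args(expected_type) or (object, object)
        return {
            check(key_type, k, f"{path}.key"): check(value_type, v, f"{path}[{k!r}]")
            for k, v in value.items()
        }

    # Plain class: a direct isinstance check ends the recursion.
    if not isinstance(value, expected_type):
        raise TypeError(
            f"{path}: expected {expected_type.__name__}, got {type(value).__name__}"
        )
    return value


print(check(Dict[str, List[int]], {"a": [1, 2]}))  # {'a': [1, 2]}

try:
    check(Dict[str, List[int]], {"a": [1, "x"]})
except TypeError as exc:
    print(exc)  # value['a'][1]: expected int, got str
```

Threading a path string through the recursion is one simple way to get error locations; the project itself uses `loc` tuples instead.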

Phase 5: Error Aggregation (1-2 hours)

Goal: Collect all errors with proper paths.

Improve the __init__ to track error paths for nested structures:

def __init__(self, **data):
    errors = []

    # Check required fields
    for field in self.__required_fields__:
        if field not in data:
            errors.append({
                'loc': (field,),
                'msg': 'Field required',
                'type': 'missing'
            })

    # Validate and set fields
    for field_name, field_type in self.__field_types__.items():
        if field_name in data:
            value = data[field_name]
            try:
                validated = self._validate_field(field_name, field_type, value)
                setattr(self, field_name, validated)
            except ValidationError as nested_error:
                # Nested model validation error - prepend path
                for error in nested_error.errors:
                    error['loc'] = (field_name,) + error['loc']
                    errors.append(error)
            except Exception as e:
                errors.append({
                    'loc': (field_name,),
                    'msg': str(e),
                    'type': 'validation_error',
                    'input': value
                })
        elif field_name in self.__optional_fields__:
            # Handle defaults that might be mutable
            default = self.__field_defaults__.get(field_name)
            if isinstance(default, list):
                default = list(default)  # Copy
            elif isinstance(default, dict):
                default = dict(default)  # Copy
            setattr(self, field_name, default)

    if errors:
        raise ValidationError(errors)

Checkpoint: Nested errors have full paths.

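class Profile(MiniModel):
    address: Address  # Address from the Phase 4 checkpoint

try:
    Profile(address={})  # 'city' is missing; a city of 123 would just be coerced to "123"
except ValidationError as e:
    print(e.errors[0]['loc'])  # ('address', 'city')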
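
Once errors carry a full `loc` tuple, rendering them for humans is straightforward. The helper below is a hedged sketch of one possible format; `format_errors` is a hypothetical name, and the dotted-path layout is an arbitrary choice rather than a reproduction of Pydantic's output.

```python
def format_errors(errors):
    """Render a list of error dicts as one readable line per error."""
    lines = []
    for error in errors:
        # Join path parts with dots; list indexes are converted with str().
        location = ".".join(str(part) for part in error["loc"])
        lines.append(f"{location}: {error['msg']} [{error['type']}]")
    return "\n".join(lines)


sample = [
    {"loc": ("name",), "msg": "Field required", "type": "missing"},
    {"loc": ("address", "city"), "msg": "Field required", "type": "missing"},
]
print(format_errors(sample))
# name: Field required [missing]
# address.city: Field required [missing]
```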

Phase 6: Serialization and Polish (1-2 hours)

Goal: Implement model_dump(), __repr__, and edge cases.

def model_dump(self) -> dict:
    """Convert model to dictionary."""
    result = {}
    for field_name in self.__field_types__:
        value = getattr(self, field_name, None)

        if isinstance(value, MiniModel):
            result[field_name] = value.model_dump()
        elif isinstance(value, list):
            result[field_name] = [
                v.model_dump() if isinstance(v, MiniModel) else v
                for v in value
            ]
        elif isinstance(value, dict):
            result[field_name] = {
                k: v.model_dump() if isinstance(v, MiniModel) else v
                for k, v in value.items()
            }
        else:
            result[field_name] = value

    return result

def __repr__(self):
    fields = ', '.join(
        f'{name}={getattr(self, name)!r}'
        for name in self.__field_types__
    )
    return f'{self.__class__.__name__}({fields})'

def __eq__(self, other):
    if not isinstance(other, self.__class__):
        return False
    return self.model_dump() == other.model_dump()

Final Checkpoint: Full functionality works.

# Assumes User declares name: str, age: int, address: Optional[Address] = None
user = User(name="John", age=30, address={"city": "NYC"})
print(user)  # User(name='John', age=30, ...)
print(user.model_dump())  # {'name': 'John', 'age': 30, ...}
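
The `model_dump` above descends only one level into lists and dicts, so a list of lists of models would come back with model instances still inside. One way to lift that limit is a small recursive helper. This is a sketch under assumptions: `dump_value` is a hypothetical name, it detects models by duck typing on `model_dump`, and `FakeModel` is a stand-in so the example runs without mini_pydantic.

```python
def dump_value(value):
    """Recursively convert models and containers into plain Python data."""
    if hasattr(value, "model_dump"):
        return value.model_dump()
    if isinstance(value, (list, tuple)):
        return [dump_value(item) for item in value]
    if isinstance(value, dict):
        return {key: dump_value(item) for key, item in value.items()}
    return value


class FakeModel:
    """Stand-in for a MiniModel instance; only model_dump() matters here."""

    def __init__(self, **fields):
        self.fields = fields

    def model_dump(self):
        return {name: dump_value(v) for name, v in self.fields.items()}


nested = {"teams": [[FakeModel(city="NYC")], [FakeModel(city="LA")]]}
print(dump_value(nested))
# {'teams': [[{'city': 'NYC'}], [{'city': 'LA'}]]}
```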

Testing Strategy

Unit Tests

# tests/test_basic_types.py
import pytest
from mini_pydantic import MiniModel, ValidationError


class SimpleModel(MiniModel):
    name: str
    age: int
    score: float
    active: bool


def test_valid_simple_model():
    model = SimpleModel(name="John", age=30, score=95.5, active=True)
    assert model.name == "John"
    assert model.age == 30
    assert model.score == 95.5
    assert model.active is True


def test_type_coercion():
    model = SimpleModel(name="John", age="30", score="95.5", active="true")
    assert model.age == 30
    assert isinstance(model.age, int)
    assert model.score == 95.5
    assert model.active is True


def test_missing_required_field():
    with pytest.raises(ValidationError) as exc:
        SimpleModel(name="John")

    errors = exc.value.errors
    assert len(errors) == 3  # age, score, active
    locs = {e['loc'] for e in errors}
    assert ('age',) in locs
    assert ('score',) in locs


def test_invalid_type():
    with pytest.raises(ValidationError) as exc:
        SimpleModel(name="John", age="not-a-number", score=1.0, active=True)

    assert any(e['loc'] == ('age',) for e in exc.value.errors)


# tests/test_optional_types.py
from typing import Optional

from mini_pydantic import MiniModel


class OptionalModel(MiniModel):
    required_field: str
    optional_field: Optional[str] = None
    default_field: str = "default"


def test_optional_with_none():
    model = OptionalModel(required_field="test")
    assert model.optional_field is None
    assert model.default_field == "default"


def test_optional_with_value():
    model = OptionalModel(
        required_field="test",
        optional_field="provided"
    )
    assert model.optional_field == "provided"


# tests/test_nested_models.py
import pytest
from mini_pydantic import MiniModel, ValidationError

class Address(MiniModel):
    city: str
    country: str = "USA"


class Person(MiniModel):
    name: str
    address: Address


def test_nested_model_from_dict():
    person = Person(
        name="John",
        address={"city": "NYC"}
    )
    assert person.address.city == "NYC"
    assert person.address.country == "USA"


def test_nested_model_instance():
    addr = Address(city="Boston")
    person = Person(name="Jane", address=addr)
    assert person.address.city == "Boston"


def test_nested_validation_error():
    with pytest.raises(ValidationError) as exc:
        Person(name="John", address={})  # 'city' is missing (123 would be coerced to "123")

    errors = exc.value.errors
    assert any(
        e['loc'] == ('address', 'city')
        for e in errors
    )


# tests/test_list_types.py
from typing import List

import pytest
from mini_pydantic import MiniModel, ValidationError


class TaggedItem(MiniModel):
    name: str
    tags: List[str]


def test_list_validation():
    item = TaggedItem(name="test", tags=["a", "b", "c"])
    assert item.tags == ["a", "b", "c"]


def test_list_coercion():
    item = TaggedItem(name="test", tags=[1, 2, 3])
    assert item.tags == ["1", "2", "3"]


def test_list_type_error():
    with pytest.raises(ValidationError):
        TaggedItem(name="test", tags="not-a-list")

Integration Tests

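# tests/test_integration.py
from typing import List, Optional

import pytest
from mini_pydantic import MiniModel, ValidationError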

class Address(MiniModel):
    street: str
    city: str
    zip_code: str

class Company(MiniModel):
    name: str
    employees: int

class Person(MiniModel):
    name: str
    age: int
    email: Optional[str] = None
    addresses: List[Address] = []
    employer: Optional[Company] = None


def test_complex_model():
    data = {
        "name": "John Doe",
        "age": 30,
        "email": "john@example.com",
        "addresses": [
            {"street": "123 Main St", "city": "NYC", "zip_code": "10001"},
            {"street": "456 Oak Ave", "city": "LA", "zip_code": "90001"}
        ],
        "employer": {
            "name": "TechCorp",
            "employees": 500
        }
    }

    person = Person(**data)

    assert person.name == "John Doe"
    assert len(person.addresses) == 2
    assert person.addresses[0].city == "NYC"
    assert person.employer.name == "TechCorp"


def test_model_dump_roundtrip():
    person = Person(
        name="Jane",
        age=25,
        addresses=[{"street": "789 Pine", "city": "Chicago", "zip_code": "60601"}]
    )

    dumped = person.model_dump()
    restored = Person(**dumped)

    assert restored.name == person.name
    assert restored.addresses[0].city == person.addresses[0].city


def test_all_errors_collected():
    with pytest.raises(ValidationError) as exc:
        Person(
            age="not-a-number",
            addresses=[{"street": 123}]  # street is coerced to "123"; city and zip_code are missing
        )

    errors = exc.value.errors

    # Should have: name missing, age invalid, addresses[0] issues
    assert len(errors) >= 3

    # Verify error locations are correct
    locs = [e['loc'] for e in errors]
    assert ('name',) in locs
    assert ('age',) in locs

Common Pitfalls and Debugging

Pitfall 1: Forward Reference Resolution Fails

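Problem: A model that references itself (or a class defined later) cannot be resolved while its class is still being created. get_type_hints() raises NameError, and the metaclass falls back to the raw string annotation.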

class Node(MiniModel):
    value: int
    next: "Node"  # Forward reference

Solution: Call model_rebuild() after all classes are defined:

@classmethod
def model_rebuild(cls):
    """Rebuild the model to resolve forward references."""
    try:
        cls.__field_types__ = get_type_hints(cls)
    except NameError:
        pass  # Still unresolvable
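
The difference between the raw annotation and the resolved hint can be observed on a plain class, without any metaclass. This is a minimal sketch; passing `localns` explicitly is an assumption made here so the lookup does not depend on where the snippet is executed.

```python
from typing import Optional, get_type_hints


class Node:
    value: int
    next: Optional["Node"] = None  # "Node" is not bound yet while the body runs


# The raw annotation still contains the unresolved forward reference.
print(Node.__annotations__["next"])

# Once the class exists, get_type_hints can evaluate the string.
hints = get_type_hints(Node, localns={"Node": Node})
print(hints["next"] == Optional[Node])  # True
print(hints["value"] is int)            # True
```

A `model_rebuild()` implementation may also want to recompute the required/optional sets, since a string annotation such as "Optional[Node]" is not recognized as optional until it has been resolved.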

Pitfall 2: Mutable Default Values

Problem: List/dict defaults are shared between instances.

class User(MiniModel):
    tags: List[str] = []  # DANGER: Same list for all instances!

Solution: Copy mutable defaults in __init__:

if isinstance(default, (list, dict)):
    default = type(default)(default)  # Create a copy
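
The sharing problem and a deeper fix can both be demonstrated without MiniModel. Note that `type(default)(default)` makes a shallow copy, so nested containers inside the default are still shared; `copy.deepcopy` is one way around that. The names `Shared` and `fresh_default` below are illustrative only.

```python
import copy


class Shared:
    tags = []  # One list object stored on the class


a, b = Shared(), Shared()
a.tags.append("admin")
print(b.tags)  # ['admin'] - b sees a's change


def fresh_default(default):
    """Return an independent copy so instances never share state."""
    return copy.deepcopy(default)


defaults = {"tags": [], "meta": {"roles": []}}
first = fresh_default(defaults["meta"])
first["roles"].append("admin")
print(defaults["meta"])  # {'roles': []} - the stored default is untouched
```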

Pitfall 3: Bool is Subclass of Int

Problem: isinstance(True, int) returns True.

# A naive isinstance(value, int) check accepts True for an int field
model = SimpleModel(name="John", age=True, score=1.0, active=True)  # age stays True, not 1

Solution: Check for bool explicitly before int:

if expected_type is int:
    if isinstance(value, bool):  # Check bool FIRST
        raise ValueError("Expected int, got bool")
    if isinstance(value, int):
        return value
    return int(value)

Pitfall 4: Union Order Matters

Problem: Union[int, str] tries int first, so any value that int() accepts never reaches str.

# "123" gets coerced to 123 because int is tried first

Solution: Document this behavior or implement "best match" logic:

# Option 1: Try exact type match first
for arg in args:
    if isinstance(value, arg):
        return value

# Option 2: Prefer the most specific type
# (Complex to implement correctly)
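
Option 1 can be fleshed out as a two-pass strategy: look for an exact type match first, and only then fall back to coercion in declaration order. The sketch below works on plain constructor types such as `int`, `str`, and `float`; `validate_union` is a hypothetical helper, not part of the project code.

```python
def validate_union(candidates, value):
    """Pick a union member: exact type match first, then coercion in order."""
    # Pass 1: return the value untouched if it already has one of the types.
    # Comparing with type(...) is keeps True from matching int.
    for candidate in candidates:
        if type(value) is candidate:
            return value

    # Pass 2: fall back to coercion, trying candidates left to right.
    for candidate in candidates:
        try:
            return candidate(value)
        except (TypeError, ValueError):
            continue

    names = ", ".join(c.__name__ for c in candidates)
    raise ValueError(f"Expected one of ({names}), got {type(value).__name__}")


print(repr(validate_union((int, str), "123")))    # '123' (stays a string)
print(repr(validate_union((int, float), "3.5")))  # 3.5 (int fails, float succeeds)
```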

Pitfall 5: Recursive Models Without a Terminating Value

Problem: A required self-reference has no terminating value, so no finite input can satisfy it.

class Node(MiniModel):
    value: int
    next: "Node"  # Required: every node demands another node

Solution: Make the recursive reference Optional so that None ends the chain (a List["Node"] field terminates on its own with an empty list):

class Node(MiniModel):
    value: int
    next: Optional["Node"] = None  # Safe termination

Debugging Tips

Print type inspection results:

from typing import get_origin, get_args

def debug_type(t):
    print(f"Type: {t}")
    print(f"Origin: {get_origin(t)}")
    print(f"Args: {get_args(t)}")

Add verbose logging to _validate_field:

def _validate_field(self, name, expected_type, value):
    print(f"Validating {name}: {value!r} against {expected_type}")
    # ... validation code ...

Extensions and Challenges

Extension 1: Field Constraints

Add validation constraints like Pydantic's Field():

class Field:
    def __init__(self, *,
                 default=...,
                 min_length=None,
                 max_length=None,
                 ge=None,
                 le=None,
                 pattern=None):
        self.default = default
        self.min_length = min_length
        # ... store all constraints

class User(MiniModel):
    name: str = Field(min_length=1, max_length=100)
    age: int = Field(ge=0, le=150)
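
One possible way to make such a `Field` do something is to give it a method that checks a value against its stored constraints. The sketch below assumes the constructor signature shown above; the `check` method, its messages, and the use of `re.fullmatch` for `pattern` are choices made for illustration, not Pydantic's behavior.

```python
import re


class Field:
    """Hypothetical constraint holder mirroring the signature above."""

    def __init__(self, *, default=..., min_length=None, max_length=None,
                 ge=None, le=None, pattern=None):
        self.default = default  # Ellipsis is used here to mean "no default"
        self.min_length = min_length
        self.max_length = max_length
        self.ge = ge
        self.le = le
        self.pattern = pattern

    def check(self, value):
        """Return a list of human-readable constraint violations."""
        problems = []
        if self.min_length is not None and len(value) < self.min_length:
            problems.append(f"length must be >= {self.min_length}")
        if self.max_length is not None and len(value) > self.max_length:
            problems.append(f"length must be <= {self.max_length}")
        if self.ge is not None and value < self.ge:
            problems.append(f"must be >= {self.ge}")
        if self.le is not None and value > self.le:
            problems.append(f"must be <= {self.le}")
        if self.pattern is not None and not re.fullmatch(self.pattern, value):
            problems.append(f"must match {self.pattern!r}")
        return problems


age = Field(ge=0, le=150)
print(age.check(30))   # []
print(age.check(200))  # ['must be <= 150']
```

The metaclass would also need to recognize `Field` instances, because the Phase 2 logic would otherwise store the `Field` object itself as the field's default value.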

Extension 2: Custom Validators

Add decorator-based validators:

class User(MiniModel):
    email: str

    @field_validator('email')
    @classmethod
    def validate_email(cls, v):
        if '@' not in v:
            raise ValueError('Invalid email')
        return v.lower()
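
A decorator like this can be built by tagging the function with the field names it applies to and collecting the tags later (for example, inside the metaclass). The following is a rough sketch under that assumption; `field_validator`, `collect_validators`, and the `__validator_fields__` attribute are hypothetical names, and a plain class stands in for MiniModel.

```python
def field_validator(*field_names):
    """Hypothetical decorator: tag a function with the fields it validates."""
    def decorator(func):
        # Unwrap classmethod objects so the tag lands on the plain function.
        target = getattr(func, "__func__", func)
        target.__validator_fields__ = field_names
        return func
    return decorator


def collect_validators(cls):
    """Map each field name to the names of the methods tagged for it."""
    found = {}
    for attr_name, attr in vars(cls).items():
        target = getattr(attr, "__func__", attr)
        for field in getattr(target, "__validator_fields__", ()):
            found.setdefault(field, []).append(attr_name)
    return found


class User:
    email: str

    @field_validator("email")
    @classmethod
    def validate_email(cls, v):
        if "@" not in v:
            raise ValueError("Invalid email")
        return v.lower()


print(collect_validators(User))                 # {'email': ['validate_email']}
print(User.validate_email("John@Example.com"))  # john@example.com
```

During `__init__`, each collected validator could then be called after type validation, with any `ValueError` appended to the error list for that field.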

Extension 3: Strict Mode

Disable type coercion:

class StrictUser(MiniModel):
    age: int

    model_config = {'strict': True}

StrictUser(age="25")  # Raises error instead of coercing
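
At the level of a single type, strict mode amounts to skipping the coercion fallback. Here is a small sketch for `int`; `validate_int` is a hypothetical helper, and how the `strict` flag is read from `model_config` (for example via `getattr(cls, 'model_config', {}).get('strict', False)`) is an assumption left to the implementer.

```python
def validate_int(value, *, strict=False):
    """Validate an int, coercing only when strict mode is off."""
    if isinstance(value, int) and not isinstance(value, bool):
        return value
    if strict:
        raise ValueError(f"Expected int, got {type(value).__name__}")
    return int(value)  # Lax mode: coerce, e.g. "25" -> 25


print(validate_int("25"))  # 25

try:
    validate_int("25", strict=True)
except ValueError as exc:
    print(exc)  # Expected int, got str
```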

Extension 4: JSON Schema Generation

Generate JSON Schema from models:

schema = User.model_json_schema()
# Returns:
# {
#   "type": "object",
#   "properties": {
#     "name": {"type": "string"},
#     "age": {"type": "integer"}
#   },
#   "required": ["name", "age"]
# }
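
For flat models with primitive fields, a schema like the one above can be derived directly from the type hints. This sketch assumes a simple type-to-name mapping (`JSON_TYPES`) and treats a field as required when the class defines no default for it; `json_schema` is a hypothetical function, and nested models, lists, and Optional fields are left out.

```python
from typing import get_type_hints

# Assumed mapping from Python types to JSON Schema type names.
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def json_schema(cls):
    """Build a flat JSON Schema dict from a class's annotations."""
    hints = get_type_hints(cls)
    properties = {name: {"type": JSON_TYPES[tp]} for name, tp in hints.items()}
    # A field counts as required when the class defines no default for it.
    required = [name for name in hints if not hasattr(cls, name)]
    return {"type": "object", "properties": properties, "required": required}


class User:
    name: str
    age: int
    active: bool = True


print(json_schema(User))
```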

Extension 5: Alias Support

Support different names for input/output:

class User(MiniModel):
    user_name: str = Field(alias="userName")

# Accepts {"userName": "John"}
# Outputs {"user_name": "John"} or {"userName": "John"} based on config
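
Alias handling can be reduced to two dictionary translations: alias to field name on the way in, and field name to alias on the way out. The helpers below are a sketch with hypothetical names (`rename_input`, `dump_by_alias`); they assume the model has already collected a field-name-to-alias mapping from its `Field` declarations.

```python
def rename_input(data, aliases):
    """Translate alias keys (e.g. 'userName') to field names before validation."""
    field_for_alias = {alias: field for field, alias in aliases.items()}
    return {field_for_alias.get(key, key): value for key, value in data.items()}


def dump_by_alias(values, aliases):
    """Translate field names back to their aliases for output."""
    return {aliases.get(field, field): value for field, value in values.items()}


aliases = {"user_name": "userName"}  # field name -> alias
incoming = rename_input({"userName": "John"}, aliases)
print(incoming)                          # {'user_name': 'John'}
print(dump_by_alias(incoming, aliases))  # {'userName': 'John'}
```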

Real-World Connections

Where These Concepts Appear

  1. Pydantic - The library you're reimplementing (production-grade)
  2. dataclasses - Python stdlib, no validation but similar structure
  3. attrs - Third-party, with optional validators
  4. marshmallow - Serialization/validation, schema-based approach
  5. SQLModel - Pydantic + SQLAlchemy combined
  6. FastAPI - Uses Pydantic for request/response validation

Industry Applications

  • API Development - Request validation in web frameworks
  • Configuration Management - Type-safe config loading
  • Data Pipelines - Schema enforcement between stages
  • LLM Applications - Structured output parsing
  • ORM Integration - Database model validation

Production Considerations

If you wanted to use your mini-pydantic in production, you would need:

  1. Performance optimization - Compile validators, cache type info
  2. Thread safety - Careful with mutable class-level state
  3. Error message quality - Clear, actionable messages
  4. Documentation - Rich docstrings, type stubs
  5. Edge cases - Handle every possible input gracefully

Self-Assessment Checklist

Core Understanding

  • Can I explain why metaclasses are used instead of regular inheritance?
  • Can I describe the difference between __annotations__ and get_type_hints()?
  • Can I explain how get_origin() and get_args() decompose generic types?
  • Can I describe the descriptor protocol and when to use it?

Implementation Skills

  • Can I implement a metaclass that processes type annotations?
  • Can I validate nested model structures recursively?
  • Can I handle Optional, List, and Dict types correctly?
  • Can I implement type coercion for basic types?

Design Understanding

  • Can I explain why error aggregation is better than fail-fast?
  • Can I describe how to handle mutable default values safely?
  • Can I explain the tradeoffs of type coercion vs strict mode?
  • Can I design an extensible validation architecture?

Mastery Indicators

  • My implementation handles all basic Python types correctly
  • Nested models work with proper error path reporting
  • All validation errors are collected before raising
  • The code is well-organized with clear separation of concerns
  • I understand the limitations compared to real Pydantic

Resources

Documentation

Books

  • "Fluent Python" by Luciano Ramalho - Chapters 23 (Descriptors) and 24 (Metaclasses)
  • "Python in a Nutshell" by Alex Martelli - Comprehensive reference
  • "Robust Python" by Patrick Viafore - Type hints and validation

Source Code Study

Advanced Topics