Project 8: LLM Structured Output

Build a system that uses Pydantic to define structured outputs for LLMs, ensuring the AI returns validated, type-safe data instead of arbitrary text.

Learning Objectives

By completing this project, you will:

Understand the problem of unstructured LLM output - Why raw text responses are unreliable for production systems
Master JSON Schema generation with Pydantic - Use model_json_schema() to create schemas that guide LLM responses
Implement schema injection in prompts - Techniques for instructing LLMs to follow specific output formats
Integrate with OpenAI and Anthropic APIs - Use function calling and structured output features
Apply the Instructor library pattern - Understand how Instructor patches LLM clients for automatic validation
Build retry and self-correction strategies - Handle malformed responses gracefully with automatic retries

Deep Theoretical Foundation

The Problem of Unstructured LLM Output

Large Language Models are fundamentally text generation systems. When you ask an LLM to “extract the person’s name and age from this text,” you might get:

Attempt 1: "The person's name is John and they are 30 years old."
Attempt 2: "Name: John, Age: 30"
Attempt 3: "John (30)"
Attempt 4: "I found that the individual named John is thirty years old."

All correct semantically, but none are reliably parseable by code. This unpredictability is catastrophic for production systems that need to:

Store extracted data in databases
Chain LLM outputs to other services
Validate business rules on extracted information
Provide consistent API responses

┌─────────────────────────────────────────────────────────────────┐
│                 THE STRUCTURED OUTPUT PROBLEM                    │
│                                                                  │
│   ┌───────────────────┐                                         │
│   │    LLM Prompt     │                                         │
│   │  "Extract user    │                                         │
│   │   info from text" │                                         │
│   └─────────┬─────────┘                                         │
│             │                                                    │
│             ▼                                                    │
│   ┌───────────────────┐                                         │
│   │   LLM Response    │                                         │
│   │  (Free-form text) │                                         │
│   └─────────┬─────────┘                                         │
│             │                                                    │
│    ┌────────┴────────┐                                          │
│    │                 │                                          │
│    ▼                 ▼                                          │
│  "Name: John"    "The user                                      │
│  "Age: 30"        John is 30"                                   │
│                                                                  │
│             │                 │                                  │
│             ▼                 ▼                                  │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │                    YOUR CODE                             │   │
│   │   regex? string parsing? prayer?                        │   │
│   │                                                          │   │
│   │   name = ???  # How do you reliably extract this?       │   │
│   │   age = ???   # What if it says "thirty" instead of 30? │   │
│   └─────────────────────────────────────────────────────────┘   │
│                                                                  │
│                        THE NIGHTMARE                             │
└─────────────────────────────────────────────────────────────────┘

The Solution: Schema-Guided Generation

The solution is to tell the LLM exactly what structure we expect, and have the LLM API enforce that structure:

┌─────────────────────────────────────────────────────────────────┐
│                 THE STRUCTURED OUTPUT SOLUTION                   │
│                                                                  │
│   ┌───────────────────┐      ┌───────────────────┐              │
│   │  Pydantic Model   │ ───► │   JSON Schema     │              │
│   │                   │      │                   │              │
│   │  class User:      │      │  {                │              │
│   │    name: str      │      │    "properties":  │              │
│   │    age: int       │      │      "name": {}   │              │
│   │                   │      │      "age": {}    │              │
│   └───────────────────┘      └─────────┬─────────┘              │
│                                        │                         │
│                                        ▼                         │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │                      LLM API                             │   │
│   │                                                          │   │
│   │   - Prompt: "Extract user info from: 'John is 30'"      │   │
│   │   - Schema: {"name": str, "age": int}                   │   │
│   │   - Mode: JSON / Function Calling / Structured Output   │   │
│   │                                                          │   │
│   └─────────────────────────────────────────────────────────┘   │
│                              │                                   │
│                              ▼                                   │
│                    {"name": "John", "age": 30}                  │
│                              │                                   │
│                              ▼                                   │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │                   Pydantic Validation                    │   │
│   │                                                          │   │
│   │   user = User.model_validate_json(response)              │   │
│   │   # Guaranteed to have correct types!                    │   │
│   │                                                          │   │
│   └─────────────────────────────────────────────────────────┘   │
│                                                                  │
│                         RELIABLE OUTPUT                          │
└─────────────────────────────────────────────────────────────────┘

JSON Schema Generation with model_json_schema()

Pydantic can generate JSON Schema from any model, which becomes the bridge between your Python types and the LLM:

from pydantic import BaseModel, Field
from typing import Literal, Optional
from datetime import date

class Person(BaseModel):
    """A person extracted from text."""
    name: str = Field(..., description="The person's full name")
    age: int = Field(..., ge=0, le=150, description="Age in years")
    occupation: Optional[str] = Field(None, description="Job or profession")

# Generate JSON Schema
schema = Person.model_json_schema()
print(json.dumps(schema, indent=2))

Output:

{
  "title": "Person",
  "description": "A person extracted from text.",
  "type": "object",
  "properties": {
    "name": {
      "type": "string",
      "description": "The person's full name",
      "title": "Name"
    },
    "age": {
      "type": "integer",
      "minimum": 0,
      "maximum": 150,
      "description": "Age in years",
      "title": "Age"
    },
    "occupation": {
      "type": "string",
      "description": "Job or profession",
      "title": "Occupation",
      "default": null
    }
  },
  "required": ["name", "age"]
}

Key Insights:

Field descriptions become schema descriptions - The LLM reads these to understand what each field means
Constraints are encoded - ge=0, le=150 becomes minimum and maximum
Optional fields have defaults - The LLM knows it can omit them
Types are enforced - age: int means the JSON must have an integer, not a string

Schema Injection Strategies

There are three main strategies for getting LLMs to produce structured output:

Strategy 1: System Prompt with Schema

The simplest approach: include the schema in the system prompt and ask for JSON:

def build_prompt(schema: dict, user_prompt: str) -> list[dict]:
    return [
        {
            "role": "system",
            "content": f"""You are a helpful assistant that always responds with valid JSON.

Your response must conform to this JSON schema:
{json.dumps(schema, indent=2)}

Respond ONLY with valid JSON, no other text or explanation."""
        },
        {
            "role": "user",
            "content": user_prompt
        }
    ]

Pros:

Works with any LLM that can output JSON
Simple to implement

Cons:

LLM might still produce invalid JSON
No guarantee schema is followed exactly
Nested or complex schemas often fail

Strategy 2: OpenAI Function Calling

OpenAI’s function calling feature was originally designed for tool use, but works excellently for structured extraction:

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "user", "content": "Extract person info from: John Smith is 30 years old."}
    ],
    functions=[
        {
            "name": "extract_person",
            "description": "Extract person information from text",
            "parameters": Person.model_json_schema()
        }
    ],
    function_call={"name": "extract_person"}  # Force this function
)

# Response is in function_call.arguments as JSON string
person_json = response.choices[0].message.function_call.arguments
person = Person.model_validate_json(person_json)

Pros:

More reliable than raw JSON mode
LLM is “trained” to produce function arguments

Cons:

Function calling adds token overhead
Not all models support it

Strategy 3: OpenAI Structured Outputs (Newest)

OpenAI’s newest feature guarantees schema compliance:

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",  # Must be this model or newer
    messages=[
        {"role": "user", "content": "Extract person info from: John Smith is 30."}
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "person_response",
            "strict": True,
            "schema": Person.model_json_schema()
        }
    }
)

person = Person.model_validate_json(response.choices[0].message.content)

Pros:

Guaranteed valid JSON matching schema
Fastest and most reliable

Cons:

Only available on newest models
Some schema features not supported in strict mode

The Instructor Library Pattern

The Instructor library wraps OpenAI/Anthropic clients to automate the structured output pattern:

┌─────────────────────────────────────────────────────────────────┐
│                   INSTRUCTOR ARCHITECTURE                        │
│                                                                  │
│   ┌───────────────────────────────────────────────────────────┐ │
│   │                     Your Code                              │ │
│   │                                                            │ │
│   │   @instructor.patch                                        │ │
│   │   client = OpenAI()                                        │ │
│   │                                                            │ │
│   │   user = client.chat.completions.create(                   │ │
│   │       model="gpt-4",                                       │ │
│   │       response_model=User,  # <- Pydantic model!           │ │
│   │       messages=[...]                                       │ │
│   │   )                                                        │ │
│   │   # user is a validated User instance                      │ │
│   │                                                            │ │
│   └───────────────────────────────────────────────────────────┘ │
│                              │                                   │
│                              ▼                                   │
│   ┌───────────────────────────────────────────────────────────┐ │
│   │                   Instructor Internals                     │ │
│   │                                                            │ │
│   │   1. Extract JSON Schema from User model                   │ │
│   │   2. Choose strategy (function calling / JSON mode)        │ │
│   │   3. Add schema to API call                                │ │
│   │   4. Make API request                                      │ │
│   │   5. Parse JSON response                                   │ │
│   │   6. Validate with User.model_validate()                   │ │
│   │   7. If validation fails, retry with error feedback        │ │
│   │   8. Return validated User instance                        │ │
│   │                                                            │ │
│   └───────────────────────────────────────────────────────────┘ │
│                              │                                   │
│                              ▼                                   │
│   ┌───────────────────────────────────────────────────────────┐ │
│   │                     OpenAI API                             │ │
│   └───────────────────────────────────────────────────────────┘ │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Instructor provides:

Automatic schema injection - No manual JSON Schema handling
Retry with feedback - If LLM produces invalid JSON, it retries with the error message
Multiple modes - Function calling, JSON mode, tool use
Streaming support - Partial objects as they generate
Validation hooks - Custom validators that trigger retries

Retry and Self-Correction Strategies

LLMs are probabilistic - even with schemas, they sometimes produce invalid output. A robust system needs retry logic:

class RetryStrategy:
    """Configurable retry strategy for LLM structured output."""

    def __init__(
        self,
        max_retries: int = 3,
        include_error_in_retry: bool = True,
        exponential_backoff: bool = True
    ):
        self.max_retries = max_retries
        self.include_error_in_retry = include_error_in_retry
        self.exponential_backoff = exponential_backoff

The self-correction pattern:

┌─────────────────────────────────────────────────────────────────┐
│                    SELF-CORRECTION LOOP                          │
│                                                                  │
│   ┌─────────────┐                                               │
│   │   Attempt 1 │                                               │
│   └──────┬──────┘                                               │
│          │                                                       │
│          ▼                                                       │
│   ┌─────────────────┐     ┌─────────────────┐                   │
│   │  LLM Response   │ ──► │    Validate     │                   │
│   │  {"name": "Jo   │     │    with         │                   │
│   │   "age": "30"}  │     │    Pydantic     │                   │
│   └─────────────────┘     └────────┬────────┘                   │
│                                    │                             │
│                              ✗ Invalid!                          │
│                    "age should be int, got str"                  │
│                                    │                             │
│                                    ▼                             │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │   Attempt 2 (with error feedback)                        │   │
│   │                                                          │   │
│   │   System: "Your previous response had errors:            │   │
│   │   - age: Input should be a valid integer                │   │
│   │   Please fix and try again."                            │   │
│   │                                                          │   │
│   └─────────────────────────────────────────────────────────┘   │
│                                    │                             │
│                                    ▼                             │
│   ┌─────────────────┐     ┌─────────────────┐                   │
│   │  LLM Response   │ ──► │    Validate     │                   │
│   │  {"name": "John"│     │    with         │                   │
│   │   "age": 30}    │     │    Pydantic     │                   │
│   └─────────────────┘     └────────┬────────┘                   │
│                                    │                             │
│                              ✓ Valid!                            │
│                                    │                             │
│                                    ▼                             │
│                           Return User instance                   │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Handling Complex Nested Types with LLMs

Complex schemas present challenges for LLMs. Consider:

from pydantic import BaseModel, Field
from typing import List, Optional, Literal
from datetime import datetime

class Address(BaseModel):
    street: str
    city: str
    country: str = Field(..., description="ISO 3166-1 alpha-2 country code")
    postal_code: Optional[str] = None

class ContactMethod(BaseModel):
    type: Literal["email", "phone", "social"]
    value: str
    is_primary: bool = False

class Person(BaseModel):
    name: str
    age: int
    addresses: List[Address] = Field(default_factory=list)
    contacts: List[ContactMethod] = Field(default_factory=list)
    metadata: dict = Field(default_factory=dict)

This schema has:

Nested objects (Address, ContactMethod)
Lists of objects
Literal types for enums
Optional fields with defaults
Arbitrary dict fields

Strategies for Complex Schemas:

Break into steps - Extract simple fields first, then complex ones:

# Step 1: Extract basic info
basic_info = extract(BasicPerson, text)

# Step 2: Extract addresses with context
addresses = extract(List[Address], text, context=basic_info)

# Step 3: Combine
full_person = Person(**basic_info.model_dump(), addresses=addresses)

Use description heavily - LLMs rely on descriptions for context:

class Address(BaseModel):
    """A physical mailing address. Extract from the text any mention
    of where the person lives or works."""

    street: str = Field(..., description="Street address including number")
    city: str = Field(..., description="City name, not abbreviated")

Provide examples - Include example outputs in the prompt:

EXAMPLES = """
Example input: "John lives at 123 Main St in NYC"
Example output: {"street": "123 Main St", "city": "New York City", "country": "US"}
"""

Simplify when possible - Use flatter schemas if nesting isn’t essential

Comparing LLM Providers for Structured Output

Provider	Method	Reliability	Speed	Cost
OpenAI GPT-4o	Structured Outputs	Highest	Fast	$$$
OpenAI GPT-4	Function Calling	High	Medium	$$$
OpenAI GPT-3.5	JSON Mode	Medium	Fast	$
Anthropic Claude	Tool Use	High	Medium	$$
Local (Ollama)	JSON Mode	Variable	Depends	Free

Token Efficiency Considerations

Structured output has token overhead:

┌─────────────────────────────────────────────────────────────────┐
│                    TOKEN BREAKDOWN                               │
│                                                                  │
│   Standard Prompt:                                              │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │ System: "You are a helpful assistant"           ~10 tokens │
│   │ User: "Extract name and age from: John is 30"  ~15 tokens │
│   │ Response: "Name: John, Age: 30"                ~10 tokens │
│   └─────────────────────────────────────────────────────────┘   │
│   Total: ~35 tokens                                             │
│                                                                  │
│   Structured Output:                                            │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │ System: "You are a helpful assistant that outputs JSON" │   │
│   │ System: + JSON Schema (varies)                ~50-200 tokens│
│   │ User: "Extract name and age from: John is 30"  ~15 tokens │
│   │ Response: {"name": "John", "age": 30}          ~15 tokens │
│   └─────────────────────────────────────────────────────────┘   │
│   Total: ~100-250 tokens                                        │
│                                                                  │
│   Trade-off: 3-7x more tokens for guaranteed structure          │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Project Specification

Functional Requirements

Build a structured LLM output system that:

Defines extraction schemas with Pydantic - Multiple domains (people, events, products)
Supports multiple LLM providers - OpenAI, Anthropic, with a consistent interface
Implements retry with self-correction - Automatic retries with error feedback
Handles complex nested types - Lists, nested objects, optional fields
Provides validation feedback - Clear error messages when extraction fails
Supports streaming - Partial results for long extractions

Use Cases to Implement

Use Case 1: Document Entity Extraction

Extract structured entities from documents:

class Entity(BaseModel):
    """An entity mentioned in the document."""
    name: str = Field(..., description="Entity name as it appears in text")
    type: Literal["person", "organization", "location", "date", "product"]
    context: str = Field(..., description="Sentence where entity appears")
    confidence: float = Field(..., ge=0, le=1)

class DocumentAnalysis(BaseModel):
    """Complete analysis of a document."""
    summary: str = Field(..., max_length=500)
    entities: List[Entity]
    key_topics: List[str]
    sentiment: Literal["positive", "negative", "neutral"]

Use Case 2: Structured Data Transformation

Convert unstructured text to database-ready records:

class ProductListing(BaseModel):
    """A product extracted from a listing description."""
    title: str = Field(..., max_length=200)
    price: float = Field(..., ge=0)
    currency: str = Field("USD", pattern=r'^[A-Z]{3}$')
    category: str
    features: List[str] = Field(default_factory=list)
    in_stock: bool = True

class ProductCatalog(BaseModel):
    """Multiple products from a catalog page."""
    products: List[ProductListing]
    source_url: Optional[str] = None

Use Case 3: Conversational Response Structuring

Structure chatbot responses for downstream processing:

class Intent(BaseModel):
    """Detected user intent."""
    category: Literal["question", "command", "feedback", "other"]
    action: Optional[str] = Field(None, description="Specific action requested")
    entities: dict = Field(default_factory=dict)

class StructuredResponse(BaseModel):
    """A chatbot response with structured metadata."""
    text: str = Field(..., description="Response text to show user")
    intent: Intent
    follow_up_questions: List[str] = Field(default_factory=list)
    requires_human: bool = False
    confidence: float = Field(..., ge=0, le=1)

CLI Interface

# Extract entities from text
$ llm-extract --schema entities --input document.txt --output entities.json

# Extract from stdin with custom schema
$ cat document.txt | llm-extract --schema-file custom_schema.py --model Person

# Interactive mode with streaming
$ llm-extract --interactive --schema chat_response

# Batch processing
$ llm-extract --schema products --input-dir listings/ --output-dir extracted/

API Interface

from structured_llm import StructuredLLM, RetryConfig

# Initialize with configuration
llm = StructuredLLM(
    provider="openai",
    model="gpt-4o",
    retry_config=RetryConfig(max_retries=3)
)

# Simple extraction
person = llm.extract(
    schema=Person,
    text="John Smith is a 30-year-old software engineer from NYC."
)

# Batch extraction
products = llm.extract_many(
    schema=ProductListing,
    texts=listing_texts,
    concurrency=5
)

# With custom prompt
analysis = llm.extract(
    schema=DocumentAnalysis,
    text=document,
    system_prompt="You are an expert document analyst...",
    examples=[
        ("Example input...", {"summary": "...", "entities": [...]})
    ]
)

Solution Architecture

Component Design

┌─────────────────────────────────────────────────────────────────┐
│                         StructuredLLM                            │
│                     (Main Entry Point)                           │
│                                                                  │
│  - extract(schema, text) -> Model                               │
│  - extract_many(schema, texts) -> List[Model]                   │
│  - stream(schema, text) -> AsyncIterator[PartialModel]          │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘
                              │
        ┌─────────────────────┼─────────────────────┐
        ▼                     ▼                     ▼
┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐
│  SchemaBuilder    │ │   PromptBuilder   │ │   RetryHandler    │
│                   │ │                   │ │                   │
│ - to_json_schema  │ │ - build_system    │ │ - with_retries    │
│ - to_function     │ │ - build_user      │ │ - format_error    │
│ - from_pydantic   │ │ - inject_schema   │ │ - should_retry    │
└───────────────────┘ └───────────────────┘ └───────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                        LLM Providers                             │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐  │
│  │ OpenAIProvider  │  │ AnthropicProvider│ │  OllamaProvider │  │
│  │                 │  │                 │  │                 │  │
│  │ - function_call │  │ - tool_use      │  │ - json_mode     │  │
│  │ - structured    │  │                 │  │                 │  │
│  │ - json_mode     │  │                 │  │                 │  │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                       ResponseParser                             │
│                                                                  │
│  - parse_json(response) -> dict                                 │
│  - validate(dict, schema) -> Model | ValidationError            │
│  - extract_from_function_call(response) -> dict                 │
└─────────────────────────────────────────────────────────────────┘

Provider Abstraction

from abc import ABC, abstractmethod
from pydantic import BaseModel
from typing import Type, TypeVar, AsyncIterator

T = TypeVar('T', bound=BaseModel)

class LLMProvider(ABC):
    """Abstract base class for LLM providers."""

    @abstractmethod
    def complete(
        self,
        messages: list[dict],
        schema: dict,
        **kwargs
    ) -> str:
        """Get completion from LLM."""
        pass

    @abstractmethod
    async def stream(
        self,
        messages: list[dict],
        schema: dict,
        **kwargs
    ) -> AsyncIterator[str]:
        """Stream completion from LLM."""
        pass

    @property
    @abstractmethod
    def supports_function_calling(self) -> bool:
        """Whether this provider supports function calling."""
        pass

    @property
    @abstractmethod
    def supports_structured_output(self) -> bool:
        """Whether this provider supports strict structured output."""
        pass


class OpenAIProvider(LLMProvider):
    """OpenAI API provider."""

    def __init__(self, model: str = "gpt-4o", api_key: str = None):
        self.model = model
        self.client = openai.OpenAI(api_key=api_key)

    @property
    def supports_function_calling(self) -> bool:
        return True

    @property
    def supports_structured_output(self) -> bool:
        return "gpt-4o" in self.model  # Only latest models

    def complete(self, messages: list[dict], schema: dict, **kwargs) -> str:
        if self.supports_structured_output:
            return self._complete_structured(messages, schema, **kwargs)
        elif self.supports_function_calling:
            return self._complete_function_call(messages, schema, **kwargs)
        else:
            return self._complete_json_mode(messages, schema, **kwargs)

Retry Handler Design

from dataclasses import dataclass
from typing import Callable, TypeVar
from pydantic import ValidationError
import time

T = TypeVar('T')

@dataclass
class RetryConfig:
    """Configuration for retry behavior."""
    max_retries: int = 3
    initial_delay: float = 0.5
    exponential_base: float = 2.0
    include_error_feedback: bool = True
    max_delay: float = 30.0

class RetryHandler:
    """Handles retries with self-correction feedback."""

    def __init__(self, config: RetryConfig, provider: LLMProvider):
        self.config = config
        self.provider = provider

    def with_retries(
        self,
        func: Callable[[], T],
        on_error: Callable[[Exception, int], list[dict]] = None
    ) -> T:
        """Execute function with retries."""
        last_error = None
        messages = None

        for attempt in range(self.config.max_retries + 1):
            try:
                if attempt > 0 and last_error and on_error:
                    # Add error feedback to messages
                    messages = on_error(last_error, attempt)

                return func(messages)

            except (ValidationError, json.JSONDecodeError) as e:
                last_error = e

                if attempt < self.config.max_retries:
                    delay = min(
                        self.config.initial_delay * (self.config.exponential_base ** attempt),
                        self.config.max_delay
                    )
                    time.sleep(delay)

        raise last_error

    def format_validation_error(self, error: ValidationError) -> str:
        """Format validation error for LLM feedback."""
        lines = ["Your previous response had validation errors:"]
        for err in error.errors():
            field_path = ".".join(str(p) for p in err["loc"])
            lines.append(f"- {field_path}: {err['msg']}")
        lines.append("\nPlease fix these issues and try again.")
        return "\n".join(lines)

Project Structure

structured_llm/
├── src/
│   └── structured_llm/
│       ├── __init__.py
│       ├── client.py           # Main StructuredLLM class
│       ├── schemas.py          # Schema building utilities
│       ├── prompts.py          # Prompt construction
│       ├── retry.py            # Retry handling
│       ├── streaming.py        # Streaming support
│       │
│       ├── providers/
│       │   ├── __init__.py
│       │   ├── base.py         # Abstract provider
│       │   ├── openai.py       # OpenAI implementation
│       │   ├── anthropic.py    # Anthropic implementation
│       │   └── ollama.py       # Ollama implementation
│       │
│       └── examples/
│           ├── entities.py     # Entity extraction schemas
│           ├── products.py     # Product extraction schemas
│           └── chat.py         # Chat response schemas
│
├── tests/
│   ├── test_schemas.py
│   ├── test_retry.py
│   ├── test_providers.py
│   └── test_integration.py
│
├── examples/
│   ├── simple_extraction.py
│   ├── batch_processing.py
│   └── streaming_example.py
│
├── pyproject.toml
└── README.md

Phased Implementation Guide

Phase 1: Core Schema Infrastructure (2-3 hours)

Goal: Build the foundation for schema handling.

Create base Pydantic models for extraction:

# src/structured_llm/schemas.py
from pydantic import BaseModel, Field
from typing import Type, Any
import json

def model_to_json_schema(model: Type[BaseModel]) -> dict:
    """Convert Pydantic model to JSON Schema for LLM."""
    schema = model.model_json_schema()
    # Clean up schema for LLM consumption
    return _clean_schema(schema)

def _clean_schema(schema: dict) -> dict:
    """Remove Pydantic-specific fields that confuse LLMs."""
    # Remove $defs if not needed
    # Simplify title fields
    # etc.
    pass

Create example extraction schemas:

# src/structured_llm/examples/entities.py
class Person(BaseModel):
    """A person extracted from text."""
    name: str = Field(..., description="Full name")
    age: Optional[int] = Field(None, ge=0, le=150)
    occupation: Optional[str] = None

Write tests for schema generation:

def test_simple_schema():
    schema = model_to_json_schema(Person)
    assert schema["properties"]["name"]["type"] == "string"
    assert "age" in schema["properties"]

Checkpoint: Can generate clean JSON schemas from Pydantic models.

Phase 2: Prompt Construction (2 hours)

Goal: Build reliable prompt templates.

Create prompt builder:

# src/structured_llm/prompts.py
class PromptBuilder:
    def __init__(self, schema: dict, examples: list = None):
        self.schema = schema
        self.examples = examples or []

    def build_system_prompt(self) -> str:
        """Build system prompt with schema injection."""
        pass

    def build_extraction_prompt(self, text: str) -> str:
        """Build user prompt for extraction."""
        pass

    def build_retry_prompt(self, error: str) -> str:
        """Build prompt for retry with error feedback."""
        pass

Test prompt construction with different schemas.

Checkpoint: Prompts correctly include schema and examples.

Phase 3: OpenAI Provider (3-4 hours)

Goal: Implement OpenAI integration with multiple modes.

Create abstract provider base:

# src/structured_llm/providers/base.py
class LLMProvider(ABC):
    @abstractmethod
    def complete(self, messages: list, schema: dict) -> str:
        pass

Implement OpenAI provider with three modes:
- JSON mode (basic)
- Function calling
- Structured outputs (if model supports)

Test with real API calls:

def test_openai_extraction():
    provider = OpenAIProvider(model="gpt-4o")
    result = provider.complete(
        messages=[{"role": "user", "content": "John is 30"}],
        schema=Person.model_json_schema()
    )
    person = Person.model_validate_json(result)
    assert person.name == "John"

Checkpoint: Can extract structured data via OpenAI.

Phase 4: Retry and Self-Correction (2-3 hours)

Goal: Handle failures gracefully.

Implement retry handler with exponential backoff
Add error feedback for self-correction
Create validation error formatter

Test retry behavior:

def test_retry_on_validation_error():
    # Mock LLM to return invalid then valid
    handler = RetryHandler(RetryConfig(max_retries=2))
    result = handler.with_retries(mock_extraction)
    assert result is not None

Checkpoint: System recovers from malformed responses.

Phase 5: Main Client API (2-3 hours)

Goal: Create the unified StructuredLLM interface.

Implement main client:

# src/structured_llm/client.py
class StructuredLLM:
    def __init__(
        self,
        provider: str = "openai",
        model: str = "gpt-4o",
        retry_config: RetryConfig = None
    ):
        self.provider = self._create_provider(provider, model)
        self.retry = RetryHandler(retry_config or RetryConfig())

    def extract(
        self,
        schema: Type[T],
        text: str,
        system_prompt: str = None
    ) -> T:
        """Extract structured data from text."""
        pass

    def extract_many(
        self,
        schema: Type[T],
        texts: list[str],
        concurrency: int = 5
    ) -> list[T]:
        """Extract from multiple texts in parallel."""
        pass

Add high-level convenience methods
Write comprehensive integration tests

Checkpoint: Can use simple API for extractions.

Phase 6: Additional Providers and Streaming (3-4 hours)

Goal: Support more providers and streaming.

Implement Anthropic provider:

# src/structured_llm/providers/anthropic.py
class AnthropicProvider(LLMProvider):
    def complete(self, messages, schema):
        # Use tool_use for structured output
        pass

Add streaming support:

async def stream(
    self,
    schema: Type[T],
    text: str
) -> AsyncIterator[PartialModel[T]]:
    """Stream partial results as they generate."""
    pass

Create CLI tool for command-line usage

Checkpoint: Full-featured structured LLM system.

Testing Strategy

Unit Tests

# tests/test_schemas.py
import pytest
from pydantic import BaseModel, Field, ValidationError
from structured_llm.schemas import model_to_json_schema

class TestSchemaGeneration:
    def test_simple_model(self):
        class Simple(BaseModel):
            name: str
            age: int

        schema = model_to_json_schema(Simple)
        assert schema["type"] == "object"
        assert "name" in schema["properties"]
        assert schema["properties"]["age"]["type"] == "integer"

    def test_nested_model(self):
        class Address(BaseModel):
            city: str

        class Person(BaseModel):
            name: str
            address: Address

        schema = model_to_json_schema(Person)
        # Verify nested schema is properly included
        assert "address" in schema["properties"]

    def test_optional_fields(self):
        class WithOptional(BaseModel):
            required: str
            optional: Optional[str] = None

        schema = model_to_json_schema(WithOptional)
        assert "required" in schema.get("required", [])
        assert "optional" not in schema.get("required", [])

    def test_field_descriptions(self):
        class WithDescriptions(BaseModel):
            name: str = Field(..., description="The person's name")

        schema = model_to_json_schema(WithDescriptions)
        assert schema["properties"]["name"]["description"] == "The person's name"


# tests/test_retry.py
class TestRetryHandler:
    def test_succeeds_first_try(self):
        config = RetryConfig(max_retries=3)
        handler = RetryHandler(config, mock_provider)

        attempts = []
        def succeeding_func(messages=None):
            attempts.append(1)
            return {"name": "John", "age": 30}

        result = handler.with_retries(succeeding_func)
        assert len(attempts) == 1
        assert result["name"] == "John"

    def test_retries_on_validation_error(self):
        config = RetryConfig(max_retries=3)
        handler = RetryHandler(config, mock_provider)

        attempts = []
        def failing_then_succeeding(messages=None):
            attempts.append(1)
            if len(attempts) < 2:
                raise ValidationError(...)
            return {"name": "John", "age": 30}

        result = handler.with_retries(failing_then_succeeding)
        assert len(attempts) == 2

    def test_exhausts_retries(self):
        config = RetryConfig(max_retries=2)
        handler = RetryHandler(config, mock_provider)

        def always_failing(messages=None):
            raise ValidationError(...)

        with pytest.raises(ValidationError):
            handler.with_retries(always_failing)

    def test_error_feedback_included(self):
        config = RetryConfig(include_error_feedback=True)
        handler = RetryHandler(config, mock_provider)

        error_messages = []
        def capture_messages(error, attempt):
            error_messages.append(handler.format_validation_error(error))
            return [{"role": "system", "content": error_messages[-1]}]

        # ... test that error messages are properly formatted


# tests/test_prompts.py
class TestPromptBuilder:
    def test_schema_injection(self):
        schema = {"properties": {"name": {"type": "string"}}}
        builder = PromptBuilder(schema)

        system_prompt = builder.build_system_prompt()
        assert "name" in system_prompt
        assert "string" in system_prompt

    def test_examples_included(self):
        builder = PromptBuilder(
            schema={},
            examples=[("Input text", {"output": "value"})]
        )

        prompt = builder.build_system_prompt()
        assert "Input text" in prompt
        assert "output" in prompt

Integration Tests

# tests/test_integration.py
import pytest
from structured_llm import StructuredLLM
from structured_llm.examples.entities import Person, DocumentAnalysis

@pytest.mark.integration
class TestOpenAIIntegration:
    """Tests that require actual API calls."""

    @pytest.fixture
    def client(self):
        return StructuredLLM(provider="openai", model="gpt-4o-mini")

    def test_simple_extraction(self, client):
        person = client.extract(
            schema=Person,
            text="John Smith is a 30-year-old software engineer."
        )

        assert isinstance(person, Person)
        assert person.name == "John Smith"
        assert person.age == 30
        assert person.occupation == "software engineer"

    def test_missing_optional_fields(self, client):
        person = client.extract(
            schema=Person,
            text="Someone named Alice was mentioned."
        )

        assert person.name == "Alice"
        assert person.age is None  # Not mentioned

    def test_complex_nested_extraction(self, client):
        analysis = client.extract(
            schema=DocumentAnalysis,
            text="""
            Apple Inc. announced today that CEO Tim Cook will present
            the new iPhone at their Cupertino headquarters. Analysts
            expect strong sales despite economic headwinds.
            """
        )

        assert len(analysis.entities) > 0
        assert any(e.type == "organization" for e in analysis.entities)
        assert any(e.type == "person" for e in analysis.entities)

    def test_batch_extraction(self, client):
        texts = [
            "John is 25.",
            "Mary is 30.",
            "Bob is 45."
        ]

        people = client.extract_many(
            schema=Person,
            texts=texts,
            concurrency=3
        )

        assert len(people) == 3
        assert all(isinstance(p, Person) for p in people)

    def test_retry_on_malformed_response(self, client):
        # This tests the retry mechanism with a tricky prompt
        # that might produce invalid output on first try

        person = client.extract(
            schema=Person,
            text="The age is thirty and name is 123"  # Tricky!
        )

        # Should eventually succeed
        assert isinstance(person, Person)


@pytest.mark.integration
class TestAnthropicIntegration:
    @pytest.fixture
    def client(self):
        return StructuredLLM(provider="anthropic", model="claude-3-sonnet")

    def test_simple_extraction(self, client):
        person = client.extract(
            schema=Person,
            text="Jane Doe is 28 years old."
        )

        assert person.name == "Jane Doe"
        assert person.age == 28

Mock Tests for Offline Development

# tests/test_with_mocks.py
from unittest.mock import Mock, patch
from structured_llm import StructuredLLM

class TestWithMocks:
    def test_provider_called_correctly(self):
        mock_provider = Mock()
        mock_provider.complete.return_value = '{"name": "Test", "age": 25}'

        with patch('structured_llm.client.OpenAIProvider', return_value=mock_provider):
            client = StructuredLLM()
            person = client.extract(Person, "Test is 25")

        mock_provider.complete.assert_called_once()
        args = mock_provider.complete.call_args
        assert "Test is 25" in str(args)

    def test_schema_included_in_request(self):
        mock_provider = Mock()
        mock_provider.complete.return_value = '{"name": "Test", "age": 25}'

        with patch('structured_llm.client.OpenAIProvider', return_value=mock_provider):
            client = StructuredLLM()
            client.extract(Person, "Some text")

        call_kwargs = mock_provider.complete.call_args.kwargs
        assert "schema" in call_kwargs
        assert "name" in call_kwargs["schema"]["properties"]

Common Pitfalls and Debugging

Pitfall 1: Schema Too Complex for LLM

Problem: LLM fails to produce valid JSON for deeply nested or complex schemas.

Symptom:

ValidationError: 5 validation errors for ComplexModel
  nested.deeply.field1: Field required
  nested.deeply.field2: Input should be a valid string
  ...

Solution: Break extraction into steps or simplify schema:

# Instead of one complex schema
class ComplexPerson(BaseModel):
    name: str
    addresses: List[Address]
    employment_history: List[Job]
    education: List[Degree]

# Use simpler sequential extraction
basic_info = client.extract(BasicPerson, text)
addresses = client.extract(List[Address], text)
# Combine afterward

Pitfall 2: Field Descriptions Not Clear Enough

Problem: LLM misunderstands what a field should contain.

Symptom: Extracted values are technically valid but semantically wrong.

Solution: Add detailed descriptions and examples:

# BAD
class Product(BaseModel):
    price: float

# GOOD
class Product(BaseModel):
    price: float = Field(
        ...,
        description="Price in USD as a decimal number (e.g., 29.99). "
                    "Do NOT include currency symbol or thousands separators.",
        examples=[29.99, 149.00, 9.50]
    )

Pitfall 3: Optional Fields Treated as Required

Problem: LLM returns null/empty for required fields or omits optional fields entirely.

Symptom:

ValidationError: 1 validation error for Person
  occupation: Field required

Solution: Be explicit about optionality in descriptions:

class Person(BaseModel):
    name: str = Field(..., description="REQUIRED: The person's full name")
    occupation: Optional[str] = Field(
        None,
        description="OPTIONAL: Job title if mentioned, otherwise omit or set to null"
    )

Pitfall 4: JSON Parsing Failures

Problem: LLM includes markdown formatting or extra text around JSON.

Symptom:

json.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Example bad output:

Here's the extracted data:
```json
{"name": "John"}

**Solution**: Strip markdown and extract JSON:
```python
def extract_json(response: str) -> str:
    """Extract JSON from potentially wrapped response."""
    # Try direct parse first
    try:
        json.loads(response)
        return response
    except json.JSONDecodeError:
        pass

    # Look for JSON in code blocks
    import re
    json_match = re.search(r'```(?:json)?\s*([\s\S]*?)\s*```', response)
    if json_match:
        return json_match.group(1)

    # Look for JSON object/array
    json_match = re.search(r'(\{[\s\S]*\}|\[[\s\S]*\])', response)
    if json_match:
        return json_match.group(1)

    raise ValueError(f"Could not extract JSON from: {response[:100]}...")

Pitfall 5: Retry Loop Never Succeeds

Problem: Same validation error occurs on every retry.

Symptom: Max retries exceeded with identical errors.

Solution: Ensure error feedback is actually reaching the LLM:

def build_retry_prompt(self, original_messages: list, error: ValidationError) -> list:
    """Build prompt that includes error feedback."""
    error_feedback = {
        "role": "user",
        "content": f"""Your previous response was invalid.

Errors:
{self.format_validation_error(error)}

Please provide a corrected response that fixes these issues.
Remember to:
1. Return ONLY valid JSON
2. Include all required fields
3. Use correct data types (integers for ages, strings for names, etc.)
"""
    }

    return original_messages + [error_feedback]

Pitfall 6: Rate Limiting and Timeout Issues

Problem: API calls fail due to rate limits or timeouts.

Symptom:

openai.RateLimitError: Rate limit exceeded

Solution: Implement backoff and retry for rate limits:

import time
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(
    stop=stop_after_attempt(5),
    wait=wait_exponential(multiplier=1, min=1, max=60),
    retry=lambda e: isinstance(e, openai.RateLimitError)
)
def call_with_retry(self, *args, **kwargs):
    return self.client.chat.completions.create(*args, **kwargs)

Pitfall 7: Token Limit Exceeded

Problem: Schema + prompt + response exceeds model’s context window.

Symptom:

openai.BadRequestError: This model's maximum context length is 8192 tokens

Solution: Estimate and manage token usage:

def estimate_tokens(self, text: str) -> int:
    """Rough token estimate (4 chars per token average)."""
    return len(text) // 4

def check_token_budget(self, schema: dict, text: str, max_tokens: int = 8000):
    schema_tokens = self.estimate_tokens(json.dumps(schema))
    text_tokens = self.estimate_tokens(text)
    overhead = 500  # System prompt, formatting

    available_for_response = max_tokens - schema_tokens - text_tokens - overhead

    if available_for_response < 500:
        raise ValueError(
            f"Input too long. Schema: {schema_tokens}, Text: {text_tokens}, "
            f"Only {available_for_response} tokens left for response."
        )

Extensions and Challenges

Extension 1: Streaming Partial Results

Implement streaming for long extractions to show progress:

from typing import AsyncIterator
from pydantic import BaseModel

class PartialModel:
    """Represents a partially extracted model."""
    def __init__(self, model_class: type, partial_data: dict):
        self.model_class = model_class
        self.partial_data = partial_data
        self.is_complete = False

    def get_partial(self) -> dict:
        return self.partial_data

    def finalize(self) -> BaseModel:
        return self.model_class.model_validate(self.partial_data)

async def stream_extraction(
    self,
    schema: Type[T],
    text: str
) -> AsyncIterator[PartialModel[T]]:
    """Stream partial results as they generate."""
    buffer = ""
    partial_data = {}

    async for chunk in self.provider.stream(messages, schema):
        buffer += chunk

        # Try to parse partial JSON
        try:
            partial_data = parse_partial_json(buffer)
            yield PartialModel(schema, partial_data)
        except:
            continue

    # Final complete result
    final = PartialModel(schema, partial_data)
    final.is_complete = True
    yield final

Extension 2: Schema Evolution and Versioning

Handle schema changes gracefully:

class SchemaRegistry:
    """Manage multiple versions of extraction schemas."""

    def __init__(self):
        self.schemas: dict[str, dict[str, type]] = {}

    def register(self, name: str, version: str, schema: type):
        if name not in self.schemas:
            self.schemas[name] = {}
        self.schemas[name][version] = schema

    def get(self, name: str, version: str = "latest") -> type:
        if version == "latest":
            versions = sorted(self.schemas[name].keys())
            version = versions[-1]
        return self.schemas[name][version]

    def migrate(self, data: dict, from_version: str, to_version: str) -> dict:
        """Migrate extracted data between schema versions."""
        pass

# Usage
registry = SchemaRegistry()
registry.register("person", "1.0", PersonV1)
registry.register("person", "2.0", PersonV2)  # Added new fields

schema = registry.get("person", "latest")

Extension 3: Extraction with Confidence Scores

Add confidence scoring to extractions:

from typing import Generic, TypeVar

T = TypeVar('T', bound=BaseModel)

class ExtractionResult(BaseModel, Generic[T]):
    """Extraction result with confidence metadata."""
    data: T
    confidence: float = Field(..., ge=0, le=1)
    extraction_notes: Optional[str] = None
    fields_uncertain: list[str] = Field(default_factory=list)

class ConfidenceAwareExtractor:
    def extract_with_confidence(
        self,
        schema: Type[T],
        text: str
    ) -> ExtractionResult[T]:
        # First extraction
        data = self.extract(schema, text)

        # Ask LLM to rate confidence
        confidence_schema = ConfidenceRating
        rating = self.extract(
            confidence_schema,
            f"Rate your confidence in this extraction:\n{data.model_dump_json()}"
        )

        return ExtractionResult(
            data=data,
            confidence=rating.overall_confidence,
            fields_uncertain=rating.uncertain_fields
        )

Extension 4: Multi-LLM Consensus

Use multiple LLMs and take consensus:

class ConsensusExtractor:
    """Extract with multiple LLMs and take consensus."""

    def __init__(self, providers: list[LLMProvider]):
        self.providers = providers

    def extract_consensus(
        self,
        schema: Type[T],
        text: str,
        min_agreement: float = 0.66
    ) -> T:
        results = []
        for provider in self.providers:
            try:
                result = provider.extract(schema, text)
                results.append(result)
            except Exception:
                continue

        if not results:
            raise ValueError("All providers failed")

        # Find consensus (simplified - real impl would be smarter)
        return self._find_consensus(results, min_agreement)

    def _find_consensus(self, results: list[T], min_agreement: float) -> T:
        # Compare results field by field
        # Return most common values
        pass

Extension 5: Extraction Pipeline DSL

Create a domain-specific language for complex extraction pipelines:

from dataclasses import dataclass

@dataclass
class ExtractionStep:
    schema: type
    depends_on: list[str] = None
    condition: Callable = None

class ExtractionPipeline:
    """Define multi-step extraction pipelines."""

    def __init__(self):
        self.steps: dict[str, ExtractionStep] = {}

    def add_step(self, name: str, step: ExtractionStep):
        self.steps[name] = step
        return self

    def run(self, text: str) -> dict[str, BaseModel]:
        results = {}

        for name, step in self._topological_sort():
            # Check condition
            if step.condition and not step.condition(results):
                continue

            # Build context from dependencies
            context = {
                dep: results[dep].model_dump()
                for dep in (step.depends_on or [])
            }

            # Extract
            results[name] = self.client.extract(
                step.schema,
                text,
                context=context
            )

        return results

# Usage
pipeline = ExtractionPipeline()
pipeline.add_step("basic", ExtractionStep(BasicInfo))
pipeline.add_step("details", ExtractionStep(
    DetailedInfo,
    depends_on=["basic"],
    condition=lambda r: r["basic"].needs_details
))

Real-World Connections

Where This Pattern Appears

AI-Powered Data Entry - Extracting form data from documents
Chatbot Response Structuring - Making chatbot outputs machine-readable
Content Classification - Categorizing content with consistent schemas
API Response Generation - Using LLMs to generate structured API responses
ETL Pipelines - Transforming unstructured to structured data

Industry Examples

Anthropic Claude Tool Use - Structured outputs via tool definitions
OpenAI Structured Outputs - Native JSON Schema compliance
LangChain Output Parsers - Framework for structured LLM outputs
Instructor Library - Popular library for Pydantic + LLM integration
Outlines - Constrained text generation with regex/JSON

Production Considerations

Cost Management
- Schema injection adds tokens (and cost)
- Cache frequent extractions
- Use cheaper models for simple schemas
Latency
- Retries add latency
- Consider async/parallel extraction
- Streaming for user-facing applications
Reliability
- Always have fallback behavior
- Log failed extractions for analysis
- Monitor extraction success rates
Security
- Validate extracted data before use
- Don’t trust LLM output for security decisions
- Sanitize inputs to prevent prompt injection

Self-Assessment Checklist

Core Understanding

Can I explain why structured LLM output is important for production systems?
Can I describe the difference between JSON mode, function calling, and structured outputs?
Can I explain how Pydantic’s JSON Schema generation works?
Can I describe the self-correction retry pattern?

Implementation Skills

Can I generate a clean JSON Schema from a Pydantic model?
Can I build prompts that reliably produce structured output?
Can I implement retry logic with error feedback?
Can I handle complex nested schemas?

Provider Knowledge

Can I integrate with OpenAI’s structured output features?
Can I use Anthropic’s tool use for structured extraction?
Can I abstract providers behind a common interface?

Production Readiness

Can I handle edge cases (empty input, no matches, partial data)?
Can I manage token budgets for complex schemas?
Can I implement streaming for long extractions?
Can I test structured extraction without API calls?

Mastery Indicators

System handles all validation errors gracefully
Extraction succeeds on complex real-world text
Provider abstraction allows easy switching
Tests cover both success and failure cases
Documentation is comprehensive

Resources

Documentation

Libraries

instructor - Structured outputs for LLMs
outlines - Constrained text generation
marvin - AI functions with Pydantic
langchain - LLM framework with output parsers

Books and Articles

“AI Engineering” by Chip Huyen - Comprehensive LLM engineering guide
OpenAI Cookbook - Practical examples
Anthropic Prompt Engineering Guide

LangChain Output Parsers
Guardrails AI - Validation for LLM outputs
LMQL - Query language for LLMs with constraints

Project 8: LLM Structured Output

Learning Objectives

Deep Theoretical Foundation

The Problem of Unstructured LLM Output

The Solution: Schema-Guided Generation

JSON Schema Generation with model_json_schema()

Schema Injection Strategies

Strategy 1: System Prompt with Schema

Strategy 2: OpenAI Function Calling

Strategy 3: OpenAI Structured Outputs (Newest)

The Instructor Library Pattern

Retry and Self-Correction Strategies

Handling Complex Nested Types with LLMs

Comparing LLM Providers for Structured Output

Token Efficiency Considerations

Project Specification

Functional Requirements

Use Cases to Implement

Use Case 1: Document Entity Extraction

Use Case 2: Structured Data Transformation

Use Case 3: Conversational Response Structuring

CLI Interface

API Interface

Solution Architecture

Component Design

Provider Abstraction

Retry Handler Design

Project Structure

Phased Implementation Guide

Phase 1: Core Schema Infrastructure (2-3 hours)

Phase 2: Prompt Construction (2 hours)

Phase 3: OpenAI Provider (3-4 hours)

Phase 4: Retry and Self-Correction (2-3 hours)

Phase 5: Main Client API (2-3 hours)

Phase 6: Additional Providers and Streaming (3-4 hours)

Testing Strategy

Unit Tests

Integration Tests

Mock Tests for Offline Development

Common Pitfalls and Debugging

Pitfall 1: Schema Too Complex for LLM

Pitfall 2: Field Descriptions Not Clear Enough

Pitfall 3: Optional Fields Treated as Required

Pitfall 4: JSON Parsing Failures

Pitfall 5: Retry Loop Never Succeeds

Pitfall 6: Rate Limiting and Timeout Issues

Pitfall 7: Token Limit Exceeded

Extensions and Challenges

Extension 1: Streaming Partial Results

Extension 2: Schema Evolution and Versioning

Extension 3: Extraction with Confidence Scores

Extension 4: Multi-LLM Consensus

Extension 5: Extraction Pipeline DSL

Real-World Connections

Where This Pattern Appears

Industry Examples

Production Considerations

Self-Assessment Checklist

Core Understanding

Implementation Skills

Provider Knowledge

Production Readiness

Mastery Indicators

Resources

Documentation

Libraries

Books and Articles

Related Projects