Project 8: LLM Structured Output
Build a system that uses Pydantic to define structured outputs for LLMs, ensuring the AI returns validated, type-safe data instead of arbitrary text.
Learning Objectives
By completing this project, you will:
- Understand the problem of unstructured LLM output - Why raw text responses are unreliable for production systems
- Master JSON Schema generation with Pydantic - Use model_json_schema() to create schemas that guide LLM responses
- Implement schema injection in prompts - Techniques for instructing LLMs to follow specific output formats
- Integrate with OpenAI and Anthropic APIs - Use function calling and structured output features
- Apply the Instructor library pattern - Understand how Instructor patches LLM clients for automatic validation
- Build retry and self-correction strategies - Handle malformed responses gracefully with automatic retries
Deep Theoretical Foundation
The Problem of Unstructured LLM Output
Large Language Models are fundamentally text generation systems. When you ask an LLM to "extract the person's name and age from this text," you might get:
Attempt 1: "The person's name is John and they are 30 years old."
Attempt 2: "Name: John, Age: 30"
Attempt 3: "John (30)"
Attempt 4: "I found that the individual named John is thirty years old."
All are semantically correct, but none is reliably parseable by code. This unpredictability is catastrophic for production systems that need to:
- Store extracted data in databases
- Chain LLM outputs to other services
- Validate business rules on extracted information
- Provide consistent API responses
THE STRUCTURED OUTPUT PROBLEM

    LLM Prompt: "Extract user info from text"
                    |
                    v
    LLM Response (free-form text)
          |                        |
          v                        v
    "Name: John"            "The user John is 30"
    "Age: 30"
          |                        |
          +-----------+------------+
                      v
    YOUR CODE: regex? string parsing? prayer?
        name = ???   # How do you reliably extract this?
        age  = ???   # What if it says "thirty" instead of 30?

    => THE NIGHTMARE
The Solution: Schema-Guided Generation
The solution is to tell the LLM exactly what structure we expect, and have the LLM API enforce that structure:
THE STRUCTURED OUTPUT SOLUTION

    Pydantic Model                 JSON Schema
      class User:        --->        {"properties": {
        name: str                       "name": {...},
        age: int                        "age":  {...}}}
                      |
                      v
    LLM API
      - Prompt: "Extract user info from: 'John is 30'"
      - Schema: {"name": str, "age": int}
      - Mode:   JSON / Function Calling / Structured Output
                      |
                      v
    {"name": "John", "age": 30}
                      |
                      v
    Pydantic Validation
      user = User.model_validate_json(response)
      # Guaranteed to have correct types!

    => RELIABLE OUTPUT
JSON Schema Generation with model_json_schema()
Pydantic can generate JSON Schema from any model, which becomes the bridge between your Python types and the LLM:
import json
from pydantic import BaseModel, Field
from typing import Literal, Optional
from datetime import date
class Person(BaseModel):
"""A person extracted from text."""
name: str = Field(..., description="The person's full name")
age: int = Field(..., ge=0, le=150, description="Age in years")
occupation: Optional[str] = Field(None, description="Job or profession")
# Generate JSON Schema
schema = Person.model_json_schema()
print(json.dumps(schema, indent=2))
Output:
{
"title": "Person",
"description": "A person extracted from text.",
"type": "object",
"properties": {
"name": {
"type": "string",
"description": "The person's full name",
"title": "Name"
},
"age": {
"type": "integer",
"minimum": 0,
"maximum": 150,
"description": "Age in years",
"title": "Age"
},
"occupation": {
"type": "string",
"description": "Job or profession",
"title": "Occupation",
"default": null
}
},
"required": ["name", "age"]
}
Key Insights:
- Field descriptions become schema descriptions - The LLM reads these to understand what each field means
- Constraints are encoded - ge=0, le=150 becomes minimum and maximum in the schema
- Optional fields have defaults - The LLM knows it can omit them
- Types are enforced - age: int means the JSON must have an integer, not a string
Schema Injection Strategies
There are three main strategies for getting LLMs to produce structured output:
Strategy 1: System Prompt with Schema
The simplest approach: include the schema in the system prompt and ask for JSON:
def build_prompt(schema: dict, user_prompt: str) -> list[dict]:
return [
{
"role": "system",
"content": f"""You are a helpful assistant that always responds with valid JSON.
Your response must conform to this JSON schema:
{json.dumps(schema, indent=2)}
Respond ONLY with valid JSON, no other text or explanation."""
},
{
"role": "user",
"content": user_prompt
}
]
Pros:
- Works with any LLM that can output JSON
- Simple to implement
Cons:
- LLM might still produce invalid JSON
- No guarantee schema is followed exactly
- Nested or complex schemas often fail
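To make Strategy 1 concrete, here is a minimal end-to-end sketch. It assumes the OpenAI Python SDK's JSON mode (response_format={"type": "json_object"}) and reuses the build_prompt helper and Person model defined above; the model name is just an example.

```python
from openai import OpenAI
from pydantic import ValidationError

client = OpenAI()

def extract_person(text: str) -> Person:
    messages = build_prompt(Person.model_json_schema(), f"Extract person info from: {text}")
    response = client.chat.completions.create(
        model="gpt-4o-mini",                      # any JSON-mode-capable model
        messages=messages,
        response_format={"type": "json_object"},  # valid JSON, but the schema is not enforced
    )
    raw = response.choices[0].message.content
    try:
        return Person.model_validate_json(raw)    # raises if fields are missing or mistyped
    except ValidationError as exc:
        # Strategy 1 gives no guarantees, so the caller must handle (or retry) failures
        raise RuntimeError(f"Response did not match the Person schema: {exc}") from exc
```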
Strategy 2: OpenAI Function Calling
OpenAI's function calling feature was originally designed for tool use, but works excellently for structured extraction:
from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(
model="gpt-4",
messages=[
{"role": "user", "content": "Extract person info from: John Smith is 30 years old."}
],
functions=[
{
"name": "extract_person",
"description": "Extract person information from text",
"parameters": Person.model_json_schema()
}
],
function_call={"name": "extract_person"} # Force this function
)
# Response is in function_call.arguments as JSON string
person_json = response.choices[0].message.function_call.arguments
person = Person.model_validate_json(person_json)
Pros:
- More reliable than raw JSON mode
- LLM is "trained" to produce function arguments
Cons:
- Function calling adds token overhead
- Not all models support it
Strategy 3: OpenAI Structured Outputs (Newest)
OpenAI's newest feature guarantees schema compliance:
from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(
model="gpt-4o-2024-08-06", # Must be this model or newer
messages=[
{"role": "user", "content": "Extract person info from: John Smith is 30."}
],
response_format={
"type": "json_schema",
"json_schema": {
"name": "person_response",
"strict": True,
"schema": Person.model_json_schema()
}
}
)
person = Person.model_validate_json(response.choices[0].message.content)
Pros:
- Guaranteed valid JSON matching schema
- Fastest and most reliable
Cons:
- Only available on newest models
- Some schema features not supported in strict mode
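Recent versions of the OpenAI Python SDK also wrap this mode in a convenience helper that accepts a Pydantic model directly. Treat the following as a sketch: the helper currently lives under the beta namespace and its interface may change.

```python
from openai import OpenAI

client = OpenAI()

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[{"role": "user", "content": "Extract person info from: John Smith is 30."}],
    response_format=Person,   # the Pydantic model is converted to a strict JSON schema for you
)
person = completion.choices[0].message.parsed   # a Person instance, or None if the model refused
```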
The Instructor Library Pattern
The Instructor library wraps OpenAI/Anthropic clients to automate the structured output pattern:
INSTRUCTOR ARCHITECTURE

    Your Code
      client = instructor.patch(OpenAI())

      user = client.chat.completions.create(
          model="gpt-4",
          response_model=User,   # <- Pydantic model!
          messages=[...]
      )
      # user is a validated User instance
                      |
                      v
    Instructor Internals
      1. Extract JSON Schema from User model
      2. Choose strategy (function calling / JSON mode)
      3. Add schema to API call
      4. Make API request
      5. Parse JSON response
      6. Validate with User.model_validate()
      7. If validation fails, retry with error feedback
      8. Return validated User instance
                      |
                      v
    OpenAI API
Instructor provides:
- Automatic schema injection - No manual JSON Schema handling
- Retry with feedback - If LLM produces invalid JSON, it retries with the error message
- Multiple modes - Function calling, JSON mode, tool use
- Streaming support - Partial objects as they generate
- Validation hooks - Custom validators that trigger retries
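A minimal usage sketch follows. Newer Instructor releases expose instructor.from_openai(), while older releases use the instructor.patch() style shown in the diagram; the model name and retry count are illustrative.

```python
import instructor
from openai import OpenAI
from pydantic import BaseModel

class User(BaseModel):
    name: str
    age: int

client = instructor.from_openai(OpenAI())   # wrap the client so it accepts response_model

user = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=User,                    # Instructor-specific: target Pydantic model
    max_retries=2,                          # re-ask with validation feedback on failure
    messages=[{"role": "user", "content": "Extract: John is 30"}],
)
assert isinstance(user, User)
```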
Retry and Self-Correction Strategies
LLMs are probabilistic - even with schemas, they sometimes produce invalid output. A robust system needs retry logic:
class RetryStrategy:
"""Configurable retry strategy for LLM structured output."""
def __init__(
self,
max_retries: int = 3,
include_error_in_retry: bool = True,
exponential_backoff: bool = True
):
self.max_retries = max_retries
self.include_error_in_retry = include_error_in_retry
self.exponential_backoff = exponential_backoff
The self-correction pattern:
SELF-CORRECTION LOOP

    Attempt 1
        |
        v
    LLM Response: {"name": "John", "age": "30"}
        |
        v
    Validate with Pydantic  -->  Invalid!
                                 "age should be int, got str"
        |
        v
    Attempt 2 (with error feedback)
      System: "Your previous response had errors:
               - age: Input should be a valid integer
               Please fix and try again."
        |
        v
    LLM Response: {"name": "John", "age": 30}
        |
        v
    Validate with Pydantic  -->  Valid!
        |
        v
    Return User instance
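The loop above can be expressed in a few lines. This is a bare-bones sketch assuming a call_llm(messages) -> str function of your own; the RetryHandler described later adds backoff and configuration on top of the same idea.

```python
import json
from pydantic import BaseModel, ValidationError

def extract_with_correction(schema: type[BaseModel], messages: list[dict],
                            call_llm, max_retries: int = 3):
    """Validate each response; on failure, feed the errors back and try again."""
    for attempt in range(max_retries + 1):
        raw = call_llm(messages)
        try:
            return schema.model_validate_json(raw)
        except (ValidationError, json.JSONDecodeError) as exc:
            if attempt == max_retries:
                raise
            # Show the model its own output plus the validation errors so it can fix them
            messages = messages + [
                {"role": "assistant", "content": raw},
                {"role": "user", "content": (
                    f"Your previous response had errors:\n{exc}\n"
                    "Please return corrected JSON only."
                )},
            ]
```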
Handling Complex Nested Types with LLMs
Complex schemas present challenges for LLMs. Consider:
from pydantic import BaseModel, Field
from typing import List, Optional, Literal
from datetime import datetime
class Address(BaseModel):
street: str
city: str
country: str = Field(..., description="ISO 3166-1 alpha-2 country code")
postal_code: Optional[str] = None
class ContactMethod(BaseModel):
type: Literal["email", "phone", "social"]
value: str
is_primary: bool = False
class Person(BaseModel):
name: str
age: int
addresses: List[Address] = Field(default_factory=list)
contacts: List[ContactMethod] = Field(default_factory=list)
metadata: dict = Field(default_factory=dict)
This schema has:
- Nested objects (Address, ContactMethod)
- Lists of objects
- Literal types for enums
- Optional fields with defaults
- Arbitrary dict fields
Strategies for Complex Schemas:
- Break into steps - Extract simple fields first, then complex ones:
    # Step 1: Extract basic info
    basic_info = extract(BasicPerson, text)
    # Step 2: Extract addresses with context
    addresses = extract(List[Address], text, context=basic_info)
    # Step 3: Combine
    full_person = Person(**basic_info.model_dump(), addresses=addresses)

- Use descriptions heavily - LLMs rely on descriptions for context:

    class Address(BaseModel):
        """A physical mailing address. Extract from the text any mention
        of where the person lives or works."""
        street: str = Field(..., description="Street address including number")
        city: str = Field(..., description="City name, not abbreviated")

- Provide examples - Include example outputs in the prompt:

    EXAMPLES = """
    Example input: "John lives at 123 Main St in NYC"
    Example output: {"street": "123 Main St", "city": "New York City", "country": "US"}
    """

- Simplify when possible - Use flatter schemas if nesting isn't essential
Comparing LLM Providers for Structured Output
| Provider | Method | Reliability | Speed | Cost |
|---|---|---|---|---|
| OpenAI GPT-4o | Structured Outputs | Highest | Fast | $$$ |
| OpenAI GPT-4 | Function Calling | High | Medium | $$$ |
| OpenAI GPT-3.5 | JSON Mode | Medium | Fast | $ |
| Anthropic Claude | Tool Use | High | Medium | $$ |
| Local (Ollama) | JSON Mode | Variable | Depends | Free |
Token Efficiency Considerations
Structured output has token overhead:
TOKEN BREAKDOWN

    Standard Prompt:
      System:   "You are a helpful assistant"                   ~10 tokens
      User:     "Extract name and age from: John is 30"         ~15 tokens
      Response: "Name: John, Age: 30"                           ~10 tokens
      Total:                                                     ~35 tokens

    Structured Output:
      System:   "You are a helpful assistant that outputs JSON"
                + JSON Schema (varies)                           ~50-200 tokens
      User:     "Extract name and age from: John is 30"          ~15 tokens
      Response: {"name": "John", "age": 30}                      ~15 tokens
      Total:                                                      ~100-250 tokens

    Trade-off: 3-7x more tokens for guaranteed structure
Project Specification
Functional Requirements
Build a structured LLM output system that:
- Defines extraction schemas with Pydantic - Multiple domains (people, events, products)
- Supports multiple LLM providers - OpenAI, Anthropic, with a consistent interface
- Implements retry with self-correction - Automatic retries with error feedback
- Handles complex nested types - Lists, nested objects, optional fields
- Provides validation feedback - Clear error messages when extraction fails
- Supports streaming - Partial results for long extractions
Use Cases to Implement
Use Case 1: Document Entity Extraction
Extract structured entities from documents:
class Entity(BaseModel):
"""An entity mentioned in the document."""
name: str = Field(..., description="Entity name as it appears in text")
type: Literal["person", "organization", "location", "date", "product"]
context: str = Field(..., description="Sentence where entity appears")
confidence: float = Field(..., ge=0, le=1)
class DocumentAnalysis(BaseModel):
"""Complete analysis of a document."""
summary: str = Field(..., max_length=500)
entities: List[Entity]
key_topics: List[str]
sentiment: Literal["positive", "negative", "neutral"]
Use Case 2: Structured Data Transformation
Convert unstructured text to database-ready records:
class ProductListing(BaseModel):
"""A product extracted from a listing description."""
title: str = Field(..., max_length=200)
price: float = Field(..., ge=0)
currency: str = Field("USD", pattern=r'^[A-Z]{3}$')
category: str
features: List[str] = Field(default_factory=list)
in_stock: bool = True
class ProductCatalog(BaseModel):
"""Multiple products from a catalog page."""
products: List[ProductListing]
source_url: Optional[str] = None
Use Case 3: Conversational Response Structuring
Structure chatbot responses for downstream processing:
class Intent(BaseModel):
"""Detected user intent."""
category: Literal["question", "command", "feedback", "other"]
action: Optional[str] = Field(None, description="Specific action requested")
entities: dict = Field(default_factory=dict)
class StructuredResponse(BaseModel):
"""A chatbot response with structured metadata."""
text: str = Field(..., description="Response text to show user")
intent: Intent
follow_up_questions: List[str] = Field(default_factory=list)
requires_human: bool = False
confidence: float = Field(..., ge=0, le=1)
CLI Interface
# Extract entities from text
$ llm-extract --schema entities --input document.txt --output entities.json
# Extract from stdin with custom schema
$ cat document.txt | llm-extract --schema-file custom_schema.py --model Person
# Interactive mode with streaming
$ llm-extract --interactive --schema chat_response
# Batch processing
$ llm-extract --schema products --input-dir listings/ --output-dir extracted/
API Interface
from structured_llm import StructuredLLM, RetryConfig
# Initialize with configuration
llm = StructuredLLM(
provider="openai",
model="gpt-4o",
retry_config=RetryConfig(max_retries=3)
)
# Simple extraction
person = llm.extract(
schema=Person,
text="John Smith is a 30-year-old software engineer from NYC."
)
# Batch extraction
products = llm.extract_many(
schema=ProductListing,
texts=listing_texts,
concurrency=5
)
# With custom prompt
analysis = llm.extract(
schema=DocumentAnalysis,
text=document,
system_prompt="You are an expert document analyst...",
examples=[
("Example input...", {"summary": "...", "entities": [...]})
]
)
Solution Architecture
Component Design
StructuredLLM (Main Entry Point)
  - extract(schema, text) -> Model
  - extract_many(schema, texts) -> List[Model]
  - stream(schema, text) -> AsyncIterator[PartialModel]
        |
        +--------------------------+--------------------------+
        v                          v                          v
  SchemaBuilder              PromptBuilder               RetryHandler
    - to_json_schema           - build_system              - with_retries
    - to_function              - build_user                - format_error
    - from_pydantic            - inject_schema             - should_retry
        |
        v
  LLM Providers
    OpenAIProvider           AnthropicProvider           OllamaProvider
      - function_call          - tool_use                  - json_mode
      - structured
      - json_mode
        |
        v
  ResponseParser
    - parse_json(response) -> dict
    - validate(dict, schema) -> Model | ValidationError
    - extract_from_function_call(response) -> dict
Provider Abstraction
from abc import ABC, abstractmethod
from typing import Type, TypeVar, AsyncIterator

import openai
from pydantic import BaseModel
T = TypeVar('T', bound=BaseModel)
class LLMProvider(ABC):
"""Abstract base class for LLM providers."""
@abstractmethod
def complete(
self,
messages: list[dict],
schema: dict,
**kwargs
) -> str:
"""Get completion from LLM."""
pass
@abstractmethod
async def stream(
self,
messages: list[dict],
schema: dict,
**kwargs
) -> AsyncIterator[str]:
"""Stream completion from LLM."""
pass
@property
@abstractmethod
def supports_function_calling(self) -> bool:
"""Whether this provider supports function calling."""
pass
@property
@abstractmethod
def supports_structured_output(self) -> bool:
"""Whether this provider supports strict structured output."""
pass
class OpenAIProvider(LLMProvider):
"""OpenAI API provider."""
def __init__(self, model: str = "gpt-4o", api_key: str = None):
self.model = model
self.client = openai.OpenAI(api_key=api_key)
@property
def supports_function_calling(self) -> bool:
return True
@property
def supports_structured_output(self) -> bool:
return "gpt-4o" in self.model # Only latest models
def complete(self, messages: list[dict], schema: dict, **kwargs) -> str:
if self.supports_structured_output:
return self._complete_structured(messages, schema, **kwargs)
elif self.supports_function_calling:
return self._complete_function_call(messages, schema, **kwargs)
else:
return self._complete_json_mode(messages, schema, **kwargs)
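The three _complete_* helpers are referenced but not shown. A sketch of two of them under the same assumptions (the method names are carried over from the dispatch code above; note that strict structured outputs additionally require the schema to set additionalProperties: false):

```python
    def _complete_structured(self, messages: list[dict], schema: dict, **kwargs) -> str:
        # Strict structured outputs: the API guarantees the reply matches the schema.
        response = self.client.chat.completions.create(
            model=self.model,
            messages=messages,
            response_format={
                "type": "json_schema",
                "json_schema": {"name": "extraction", "strict": True, "schema": schema},
            },
            **kwargs,
        )
        return response.choices[0].message.content

    def _complete_json_mode(self, messages: list[dict], schema: dict, **kwargs) -> str:
        # JSON mode: valid JSON is guaranteed, schema compliance is not (validate afterwards).
        response = self.client.chat.completions.create(
            model=self.model,
            messages=messages,
            response_format={"type": "json_object"},
            **kwargs,
        )
        return response.choices[0].message.content
```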
Retry Handler Design
import json
import time
from dataclasses import dataclass
from typing import Callable, TypeVar

from pydantic import ValidationError
T = TypeVar('T')
@dataclass
class RetryConfig:
"""Configuration for retry behavior."""
max_retries: int = 3
initial_delay: float = 0.5
exponential_base: float = 2.0
include_error_feedback: bool = True
max_delay: float = 30.0
class RetryHandler:
"""Handles retries with self-correction feedback."""
def __init__(self, config: RetryConfig, provider: LLMProvider):
self.config = config
self.provider = provider
def with_retries(
self,
        func: Callable[..., T],
on_error: Callable[[Exception, int], list[dict]] = None
) -> T:
"""Execute function with retries."""
last_error = None
messages = None
for attempt in range(self.config.max_retries + 1):
try:
if attempt > 0 and last_error and on_error:
# Add error feedback to messages
messages = on_error(last_error, attempt)
return func(messages)
except (ValidationError, json.JSONDecodeError) as e:
last_error = e
if attempt < self.config.max_retries:
delay = min(
self.config.initial_delay * (self.config.exponential_base ** attempt),
self.config.max_delay
)
time.sleep(delay)
raise last_error
def format_validation_error(self, error: ValidationError) -> str:
"""Format validation error for LLM feedback."""
lines = ["Your previous response had validation errors:"]
for err in error.errors():
field_path = ".".join(str(p) for p in err["loc"])
lines.append(f"- {field_path}: {err['msg']}")
lines.append("\nPlease fix these issues and try again.")
return "\n".join(lines)
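How these pieces connect is implied rather than spelled out. A hedged wiring sketch, where provider is any LLMProvider and Person is the target schema (format_validation_error handles ValidationError; JSON decode errors would need their own formatting):

```python
base_messages = [{"role": "user", "content": "Extract person info from: John is 30"}]
handler = RetryHandler(RetryConfig(max_retries=3), provider)

def attempt(messages=None):
    raw = provider.complete(messages or base_messages, Person.model_json_schema())
    return Person.model_validate_json(raw)   # raises ValidationError on bad output

def on_error(error, attempt_number):
    # Re-send the original prompt plus formatted validation feedback
    feedback = handler.format_validation_error(error)
    return base_messages + [{"role": "user", "content": feedback}]

person = handler.with_retries(attempt, on_error=on_error)
```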
Project Structure
structured_llm/
├── src/
│   └── structured_llm/
│       ├── __init__.py
│       ├── client.py           # Main StructuredLLM class
│       ├── schemas.py          # Schema building utilities
│       ├── prompts.py          # Prompt construction
│       ├── retry.py            # Retry handling
│       ├── streaming.py        # Streaming support
│       │
│       ├── providers/
│       │   ├── __init__.py
│       │   ├── base.py         # Abstract provider
│       │   ├── openai.py       # OpenAI implementation
│       │   ├── anthropic.py    # Anthropic implementation
│       │   └── ollama.py       # Ollama implementation
│       │
│       └── examples/
│           ├── entities.py     # Entity extraction schemas
│           ├── products.py     # Product extraction schemas
│           └── chat.py         # Chat response schemas
│
├── tests/
│   ├── test_schemas.py
│   ├── test_retry.py
│   ├── test_providers.py
│   └── test_integration.py
│
├── examples/
│   ├── simple_extraction.py
│   ├── batch_processing.py
│   └── streaming_example.py
│
├── pyproject.toml
└── README.md
Phased Implementation Guide
Phase 1: Core Schema Infrastructure (2-3 hours)
Goal: Build the foundation for schema handling.
- Create base Pydantic models for extraction:
    # src/structured_llm/schemas.py
    from pydantic import BaseModel, Field
    from typing import Type, Any
    import json

    def model_to_json_schema(model: Type[BaseModel]) -> dict:
        """Convert Pydantic model to JSON Schema for LLM."""
        schema = model.model_json_schema()
        # Clean up schema for LLM consumption
        return _clean_schema(schema)

    def _clean_schema(schema: dict) -> dict:
        """Remove Pydantic-specific fields that confuse LLMs."""
        # Remove $defs if not needed
        # Simplify title fields
        # etc.
        # (one possible implementation is sketched after this list)
        pass

- Create example extraction schemas:

    # src/structured_llm/examples/entities.py
    class Person(BaseModel):
        """A person extracted from text."""
        name: str = Field(..., description="Full name")
        age: Optional[int] = Field(None, ge=0, le=150)
        occupation: Optional[str] = None

- Write tests for schema generation:

    def test_simple_schema():
        schema = model_to_json_schema(Person)
        assert schema["properties"]["name"]["type"] == "string"
        assert "age" in schema["properties"]
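The _clean_schema stub in step 1 is left empty. One possible minimal cleanup, a sketch that only strips Pydantic's auto-generated titles and an empty $defs section:

```python
import copy

def _clean_schema(schema: dict) -> dict:
    """One possible cleanup: drop auto-generated titles and an empty $defs section."""
    cleaned = copy.deepcopy(schema)

    def strip_titles(node):
        if isinstance(node, dict):
            node.pop("title", None)            # Pydantic adds a title to every model/field
            for value in node.values():
                strip_titles(value)
        elif isinstance(node, list):
            for item in node:
                strip_titles(item)

    strip_titles(cleaned)
    if not cleaned.get("$defs"):
        cleaned.pop("$defs", None)             # remove the key only when it is empty
    return cleaned
```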
Checkpoint: Can generate clean JSON schemas from Pydantic models.
Phase 2: Prompt Construction (2 hours)
Goal: Build reliable prompt templates.
- Create prompt builder:
    # src/structured_llm/prompts.py
    class PromptBuilder:
        def __init__(self, schema: dict, examples: list = None):
            self.schema = schema
            self.examples = examples or []

        def build_system_prompt(self) -> str:
            """Build system prompt with schema injection."""
            pass

        def build_extraction_prompt(self, text: str) -> str:
            """Build user prompt for extraction."""
            pass

        def build_retry_prompt(self, error: str) -> str:
            """Build prompt for retry with error feedback."""
            pass

- Test prompt construction with different schemas.
Checkpoint: Prompts correctly include schema and examples.
Phase 3: OpenAI Provider (3-4 hours)
Goal: Implement OpenAI integration with multiple modes.
- Create abstract provider base:
    # src/structured_llm/providers/base.py
    class LLMProvider(ABC):
        @abstractmethod
        def complete(self, messages: list, schema: dict) -> str:
            pass

- Implement OpenAI provider with three modes:
- JSON mode (basic)
- Function calling
- Structured outputs (if model supports)
- Test with real API calls:
    def test_openai_extraction():
        provider = OpenAIProvider(model="gpt-4o")
        result = provider.complete(
            messages=[{"role": "user", "content": "John is 30"}],
            schema=Person.model_json_schema()
        )
        person = Person.model_validate_json(result)
        assert person.name == "John"
Checkpoint: Can extract structured data via OpenAI.
Phase 4: Retry and Self-Correction (2-3 hours)
Goal: Handle failures gracefully.
- Implement retry handler with exponential backoff
- Add error feedback for self-correction
- Create validation error formatter
- Test retry behavior:
    def test_retry_on_validation_error():
        # Mock LLM to return invalid then valid
        handler = RetryHandler(RetryConfig(max_retries=2))
        result = handler.with_retries(mock_extraction)
        assert result is not None
Checkpoint: System recovers from malformed responses.
Phase 5: Main Client API (2-3 hours)
Goal: Create the unified StructuredLLM interface.
- Implement main client:
    # src/structured_llm/client.py
    class StructuredLLM:
        def __init__(
            self,
            provider: str = "openai",
            model: str = "gpt-4o",
            retry_config: RetryConfig = None
        ):
            self.provider = self._create_provider(provider, model)
            self.retry = RetryHandler(retry_config or RetryConfig())

        def extract(
            self,
            schema: Type[T],
            text: str,
            system_prompt: str = None
        ) -> T:
            """Extract structured data from text."""
            pass

        def extract_many(
            self,
            schema: Type[T],
            texts: list[str],
            concurrency: int = 5
        ) -> list[T]:
            """Extract from multiple texts in parallel."""
            pass

- Add high-level convenience methods
- Write comprehensive integration tests
Checkpoint: Can use simple API for extractions.
Phase 6: Additional Providers and Streaming (3-4 hours)
Goal: Support more providers and streaming.
- Implement Anthropic provider:
    # src/structured_llm/providers/anthropic.py
    class AnthropicProvider(LLMProvider):
        def complete(self, messages, schema):
            # Use tool_use for structured output
            pass

- Add streaming support:
    async def stream(
        self,
        schema: Type[T],
        text: str
    ) -> AsyncIterator[PartialModel[T]]:
        """Stream partial results as they generate."""
        pass

- Create CLI tool for command-line usage
Checkpoint: Full-featured structured LLM system.
Testing Strategy
Unit Tests
# tests/test_schemas.py
import pytest
from pydantic import BaseModel, Field, ValidationError
from structured_llm.schemas import model_to_json_schema
class TestSchemaGeneration:
def test_simple_model(self):
class Simple(BaseModel):
name: str
age: int
schema = model_to_json_schema(Simple)
assert schema["type"] == "object"
assert "name" in schema["properties"]
assert schema["properties"]["age"]["type"] == "integer"
def test_nested_model(self):
class Address(BaseModel):
city: str
class Person(BaseModel):
name: str
address: Address
schema = model_to_json_schema(Person)
# Verify nested schema is properly included
assert "address" in schema["properties"]
def test_optional_fields(self):
class WithOptional(BaseModel):
required: str
optional: Optional[str] = None
schema = model_to_json_schema(WithOptional)
assert "required" in schema.get("required", [])
assert "optional" not in schema.get("required", [])
def test_field_descriptions(self):
class WithDescriptions(BaseModel):
name: str = Field(..., description="The person's name")
schema = model_to_json_schema(WithDescriptions)
assert schema["properties"]["name"]["description"] == "The person's name"
# tests/test_retry.py
class TestRetryHandler:
def test_succeeds_first_try(self):
config = RetryConfig(max_retries=3)
handler = RetryHandler(config, mock_provider)
attempts = []
def succeeding_func(messages=None):
attempts.append(1)
return {"name": "John", "age": 30}
result = handler.with_retries(succeeding_func)
assert len(attempts) == 1
assert result["name"] == "John"
def test_retries_on_validation_error(self):
config = RetryConfig(max_retries=3)
handler = RetryHandler(config, mock_provider)
attempts = []
def failing_then_succeeding(messages=None):
attempts.append(1)
if len(attempts) < 2:
raise ValidationError(...)
return {"name": "John", "age": 30}
result = handler.with_retries(failing_then_succeeding)
assert len(attempts) == 2
def test_exhausts_retries(self):
config = RetryConfig(max_retries=2)
handler = RetryHandler(config, mock_provider)
def always_failing(messages=None):
raise ValidationError(...)
with pytest.raises(ValidationError):
handler.with_retries(always_failing)
def test_error_feedback_included(self):
config = RetryConfig(include_error_feedback=True)
handler = RetryHandler(config, mock_provider)
error_messages = []
def capture_messages(error, attempt):
error_messages.append(handler.format_validation_error(error))
return [{"role": "system", "content": error_messages[-1]}]
# ... test that error messages are properly formatted
# tests/test_prompts.py
class TestPromptBuilder:
def test_schema_injection(self):
schema = {"properties": {"name": {"type": "string"}}}
builder = PromptBuilder(schema)
system_prompt = builder.build_system_prompt()
assert "name" in system_prompt
assert "string" in system_prompt
def test_examples_included(self):
builder = PromptBuilder(
schema={},
examples=[("Input text", {"output": "value"})]
)
prompt = builder.build_system_prompt()
assert "Input text" in prompt
assert "output" in prompt
Integration Tests
# tests/test_integration.py
import pytest
from structured_llm import StructuredLLM
from structured_llm.examples.entities import Person, DocumentAnalysis
@pytest.mark.integration
class TestOpenAIIntegration:
"""Tests that require actual API calls."""
@pytest.fixture
def client(self):
return StructuredLLM(provider="openai", model="gpt-4o-mini")
def test_simple_extraction(self, client):
person = client.extract(
schema=Person,
text="John Smith is a 30-year-old software engineer."
)
assert isinstance(person, Person)
assert person.name == "John Smith"
assert person.age == 30
assert person.occupation == "software engineer"
def test_missing_optional_fields(self, client):
person = client.extract(
schema=Person,
text="Someone named Alice was mentioned."
)
assert person.name == "Alice"
assert person.age is None # Not mentioned
def test_complex_nested_extraction(self, client):
analysis = client.extract(
schema=DocumentAnalysis,
text="""
Apple Inc. announced today that CEO Tim Cook will present
the new iPhone at their Cupertino headquarters. Analysts
expect strong sales despite economic headwinds.
"""
)
assert len(analysis.entities) > 0
assert any(e.type == "organization" for e in analysis.entities)
assert any(e.type == "person" for e in analysis.entities)
def test_batch_extraction(self, client):
texts = [
"John is 25.",
"Mary is 30.",
"Bob is 45."
]
people = client.extract_many(
schema=Person,
texts=texts,
concurrency=3
)
assert len(people) == 3
assert all(isinstance(p, Person) for p in people)
def test_retry_on_malformed_response(self, client):
# This tests the retry mechanism with a tricky prompt
# that might produce invalid output on first try
person = client.extract(
schema=Person,
text="The age is thirty and name is 123" # Tricky!
)
# Should eventually succeed
assert isinstance(person, Person)
@pytest.mark.integration
class TestAnthropicIntegration:
@pytest.fixture
def client(self):
return StructuredLLM(provider="anthropic", model="claude-3-sonnet")
def test_simple_extraction(self, client):
person = client.extract(
schema=Person,
text="Jane Doe is 28 years old."
)
assert person.name == "Jane Doe"
assert person.age == 28
Mock Tests for Offline Development
# tests/test_with_mocks.py
from unittest.mock import Mock, patch
from structured_llm import StructuredLLM
class TestWithMocks:
def test_provider_called_correctly(self):
mock_provider = Mock()
mock_provider.complete.return_value = '{"name": "Test", "age": 25}'
with patch('structured_llm.client.OpenAIProvider', return_value=mock_provider):
client = StructuredLLM()
person = client.extract(Person, "Test is 25")
mock_provider.complete.assert_called_once()
args = mock_provider.complete.call_args
assert "Test is 25" in str(args)
def test_schema_included_in_request(self):
mock_provider = Mock()
mock_provider.complete.return_value = '{"name": "Test", "age": 25}'
with patch('structured_llm.client.OpenAIProvider', return_value=mock_provider):
client = StructuredLLM()
client.extract(Person, "Some text")
call_kwargs = mock_provider.complete.call_args.kwargs
assert "schema" in call_kwargs
assert "name" in call_kwargs["schema"]["properties"]
Common Pitfalls and Debugging
Pitfall 1: Schema Too Complex for LLM
Problem: LLM fails to produce valid JSON for deeply nested or complex schemas.
Symptom:
ValidationError: 5 validation errors for ComplexModel
nested.deeply.field1: Field required
nested.deeply.field2: Input should be a valid string
...
Solution: Break extraction into steps or simplify schema:
# Instead of one complex schema
class ComplexPerson(BaseModel):
name: str
addresses: List[Address]
employment_history: List[Job]
education: List[Degree]
# Use simpler sequential extraction
basic_info = client.extract(BasicPerson, text)
addresses = client.extract(List[Address], text)
# Combine afterward
Pitfall 2: Field Descriptions Not Clear Enough
Problem: LLM misunderstands what a field should contain.
Symptom: Extracted values are technically valid but semantically wrong.
Solution: Add detailed descriptions and examples:
# BAD
class Product(BaseModel):
price: float
# GOOD
class Product(BaseModel):
price: float = Field(
...,
description="Price in USD as a decimal number (e.g., 29.99). "
"Do NOT include currency symbol or thousands separators.",
examples=[29.99, 149.00, 9.50]
)
Pitfall 3: Optional Fields Treated as Required
Problem: LLM returns null/empty for required fields or omits optional fields entirely.
Symptom:
ValidationError: 1 validation error for Person
occupation: Field required
Solution: Be explicit about optionality in descriptions:
class Person(BaseModel):
name: str = Field(..., description="REQUIRED: The person's full name")
occupation: Optional[str] = Field(
None,
description="OPTIONAL: Job title if mentioned, otherwise omit or set to null"
)
Pitfall 4: JSON Parsing Failures
Problem: LLM includes markdown formatting or extra text around JSON.
Symptom:
json.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Example bad output:
Here's the extracted data:
```json
{"name": "John"}
```

Solution: Strip markdown and extract JSON:
def extract_json(response: str) -> str:
"""Extract JSON from potentially wrapped response."""
# Try direct parse first
try:
json.loads(response)
return response
except json.JSONDecodeError:
pass
# Look for JSON in code blocks
import re
json_match = re.search(r'```(?:json)?\s*([\s\S]*?)\s*```', response)
if json_match:
return json_match.group(1)
# Look for JSON object/array
json_match = re.search(r'(\{[\s\S]*\}|\[[\s\S]*\])', response)
if json_match:
return json_match.group(1)
raise ValueError(f"Could not extract JSON from: {response[:100]}...")
Pitfall 5: Retry Loop Never Succeeds
Problem: Same validation error occurs on every retry.
Symptom: Max retries exceeded with identical errors.
Solution: Ensure error feedback is actually reaching the LLM:
def build_retry_prompt(self, original_messages: list, error: ValidationError) -> list:
"""Build prompt that includes error feedback."""
error_feedback = {
"role": "user",
"content": f"""Your previous response was invalid.
Errors:
{self.format_validation_error(error)}
Please provide a corrected response that fixes these issues.
Remember to:
1. Return ONLY valid JSON
2. Include all required fields
3. Use correct data types (integers for ages, strings for names, etc.)
"""
}
return original_messages + [error_feedback]
Pitfall 6: Rate Limiting and Timeout Issues
Problem: API calls fail due to rate limits or timeouts.
Symptom:
openai.RateLimitError: Rate limit exceeded
Solution: Implement backoff and retry for rate limits:
import openai
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

@retry(
    stop=stop_after_attempt(5),
    wait=wait_exponential(multiplier=1, min=1, max=60),
    retry=retry_if_exception_type(openai.RateLimitError)
)
def call_with_retry(self, *args, **kwargs):
    return self.client.chat.completions.create(*args, **kwargs)
Pitfall 7: Token Limit Exceeded
Problem: Schema + prompt + response exceeds the model's context window.
Symptom:
openai.BadRequestError: This model's maximum context length is 8192 tokens
Solution: Estimate and manage token usage:
def estimate_tokens(self, text: str) -> int:
"""Rough token estimate (4 chars per token average)."""
return len(text) // 4
def check_token_budget(self, schema: dict, text: str, max_tokens: int = 8000):
schema_tokens = self.estimate_tokens(json.dumps(schema))
text_tokens = self.estimate_tokens(text)
overhead = 500 # System prompt, formatting
available_for_response = max_tokens - schema_tokens - text_tokens - overhead
if available_for_response < 500:
raise ValueError(
f"Input too long. Schema: {schema_tokens}, Text: {text_tokens}, "
f"Only {available_for_response} tokens left for response."
)
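The four-characters-per-token heuristic is rough; for OpenAI models the tiktoken library, if installed, gives exact counts. A small sketch:

```python
import tiktoken

def count_tokens(text: str, model: str = "gpt-4o") -> int:
    """Exact token count for OpenAI models; falls back to a default encoding if unknown."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        encoding = tiktoken.get_encoding("o200k_base")
    return len(encoding.encode(text))
```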
Extensions and Challenges
Extension 1: Streaming Partial Results
Implement streaming for long extractions to show progress:
from typing import AsyncIterator
from pydantic import BaseModel
class PartialModel:
"""Represents a partially extracted model."""
def __init__(self, model_class: type, partial_data: dict):
self.model_class = model_class
self.partial_data = partial_data
self.is_complete = False
def get_partial(self) -> dict:
return self.partial_data
def finalize(self) -> BaseModel:
return self.model_class.model_validate(self.partial_data)
async def stream_extraction(
self,
schema: Type[T],
text: str
) -> AsyncIterator[PartialModel[T]]:
"""Stream partial results as they generate."""
buffer = ""
partial_data = {}
async for chunk in self.provider.stream(messages, schema):
buffer += chunk
# Try to parse partial JSON
try:
partial_data = parse_partial_json(buffer)
yield PartialModel(schema, partial_data)
        except ValueError:
            # Buffer is not yet parseable JSON; keep accumulating chunks
            continue
# Final complete result
final = PartialModel(schema, partial_data)
final.is_complete = True
yield final
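parse_partial_json is referenced above but not defined. A naive sketch that closes dangling strings and brackets, good enough for progress display but not for final validation:

```python
import json

def parse_partial_json(buffer: str) -> dict:
    """Best-effort parse of an incomplete JSON object streamed from the LLM."""
    text = buffer.strip()
    if not text.startswith("{"):
        raise ValueError("no JSON object started yet")
    stack, in_string, escaped = [], False, False
    for ch in text:                       # track open braces/brackets, ignoring string contents
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append("}" if ch == "{" else "]")
        elif ch in "}]" and stack:
            stack.pop()
    if in_string:
        text += '"'                       # close an unterminated string
    text += "".join(reversed(stack))      # close whatever is still open, innermost first
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError("buffer not yet parseable") from exc
```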
Extension 2: Schema Evolution and Versioning
Handle schema changes gracefully:
class SchemaRegistry:
"""Manage multiple versions of extraction schemas."""
def __init__(self):
self.schemas: dict[str, dict[str, type]] = {}
def register(self, name: str, version: str, schema: type):
if name not in self.schemas:
self.schemas[name] = {}
self.schemas[name][version] = schema
def get(self, name: str, version: str = "latest") -> type:
if version == "latest":
versions = sorted(self.schemas[name].keys())
version = versions[-1]
return self.schemas[name][version]
def migrate(self, data: dict, from_version: str, to_version: str) -> dict:
"""Migrate extracted data between schema versions."""
pass
# Usage
registry = SchemaRegistry()
registry.register("person", "1.0", PersonV1)
registry.register("person", "2.0", PersonV2) # Added new fields
schema = registry.get("person", "latest")
Extension 3: Extraction with Confidence Scores
Add confidence scoring to extractions:
from typing import Generic, TypeVar
T = TypeVar('T', bound=BaseModel)
class ExtractionResult(BaseModel, Generic[T]):
"""Extraction result with confidence metadata."""
data: T
confidence: float = Field(..., ge=0, le=1)
extraction_notes: Optional[str] = None
fields_uncertain: list[str] = Field(default_factory=list)
class ConfidenceAwareExtractor:
def extract_with_confidence(
self,
schema: Type[T],
text: str
) -> ExtractionResult[T]:
# First extraction
data = self.extract(schema, text)
# Ask LLM to rate confidence
confidence_schema = ConfidenceRating
rating = self.extract(
confidence_schema,
f"Rate your confidence in this extraction:\n{data.model_dump_json()}"
)
return ExtractionResult(
data=data,
confidence=rating.overall_confidence,
fields_uncertain=rating.uncertain_fields
)
Extension 4: Multi-LLM Consensus
Use multiple LLMs and take consensus:
class ConsensusExtractor:
"""Extract with multiple LLMs and take consensus."""
def __init__(self, providers: list[LLMProvider]):
self.providers = providers
def extract_consensus(
self,
schema: Type[T],
text: str,
min_agreement: float = 0.66
) -> T:
results = []
for provider in self.providers:
try:
result = provider.extract(schema, text)
results.append(result)
except Exception:
continue
if not results:
raise ValueError("All providers failed")
# Find consensus (simplified - real impl would be smarter)
return self._find_consensus(results, min_agreement)
def _find_consensus(self, results: list[T], min_agreement: float) -> T:
# Compare results field by field
# Return most common values
pass
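_find_consensus is left as a stub above. A simple field-wise majority vote, which is only one of many reasonable definitions of consensus, could serve as its body (shown here as a standalone helper):

```python
from collections import Counter
from pydantic import BaseModel

def find_consensus(results: list[BaseModel], min_agreement: float) -> BaseModel:
    """Pick, per field, the value most providers agree on; fail below the threshold."""
    model_cls = type(results[0])
    dumps = [r.model_dump() for r in results]
    consensus = {}
    for field in model_cls.model_fields:
        votes = Counter(repr(d.get(field)) for d in dumps)     # repr() makes values hashable
        winner, count = votes.most_common(1)[0]
        if count / len(dumps) < min_agreement:
            raise ValueError(f"No consensus on field '{field}' ({count}/{len(dumps)} agree)")
        consensus[field] = next(d.get(field) for d in dumps if repr(d.get(field)) == winner)
    return model_cls.model_validate(consensus)
```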
Extension 5: Extraction Pipeline DSL
Create a domain-specific language for complex extraction pipelines:
from dataclasses import dataclass
from typing import Callable
@dataclass
class ExtractionStep:
schema: type
depends_on: list[str] = None
condition: Callable = None
class ExtractionPipeline:
"""Define multi-step extraction pipelines."""
def __init__(self):
self.steps: dict[str, ExtractionStep] = {}
def add_step(self, name: str, step: ExtractionStep):
self.steps[name] = step
return self
def run(self, text: str) -> dict[str, BaseModel]:
results = {}
for name, step in self._topological_sort():
# Check condition
if step.condition and not step.condition(results):
continue
# Build context from dependencies
context = {
dep: results[dep].model_dump()
for dep in (step.depends_on or [])
}
# Extract
results[name] = self.client.extract(
step.schema,
text,
context=context
)
return results
# Usage
pipeline = ExtractionPipeline()
pipeline.add_step("basic", ExtractionStep(BasicInfo))
pipeline.add_step("details", ExtractionStep(
DetailedInfo,
depends_on=["basic"],
condition=lambda r: r["basic"].needs_details
))
Real-World Connections
Where This Pattern Appears
- AI-Powered Data Entry - Extracting form data from documents
- Chatbot Response Structuring - Making chatbot outputs machine-readable
- Content Classification - Categorizing content with consistent schemas
- API Response Generation - Using LLMs to generate structured API responses
- ETL Pipelines - Transforming unstructured to structured data
Industry Examples
- Anthropic Claude Tool Use - Structured outputs via tool definitions
- OpenAI Structured Outputs - Native JSON Schema compliance
- LangChain Output Parsers - Framework for structured LLM outputs
- Instructor Library - Popular library for Pydantic + LLM integration
- Outlines - Constrained text generation with regex/JSON
Production Considerations
- Cost Management
- Schema injection adds tokens (and cost)
- Cache frequent extractions
- Use cheaper models for simple schemas
- Latency
- Retries add latency
- Consider async/parallel extraction
- Streaming for user-facing applications
- Reliability
- Always have fallback behavior
- Log failed extractions for analysis
- Monitor extraction success rates
- Security
- Validate extracted data before use
- Don't trust LLM output for security decisions
- Sanitize inputs to prevent prompt injection
Self-Assessment Checklist
Core Understanding
- Can I explain why structured LLM output is important for production systems?
- Can I describe the difference between JSON mode, function calling, and structured outputs?
- Can I explain how Pydantic's JSON Schema generation works?
- Can I describe the self-correction retry pattern?
Implementation Skills
- Can I generate a clean JSON Schema from a Pydantic model?
- Can I build prompts that reliably produce structured output?
- Can I implement retry logic with error feedback?
- Can I handle complex nested schemas?
Provider Knowledge
- Can I integrate with OpenAI's structured output features?
- Can I use Anthropic's tool use for structured extraction?
- Can I abstract providers behind a common interface?
Production Readiness
- Can I handle edge cases (empty input, no matches, partial data)?
- Can I manage token budgets for complex schemas?
- Can I implement streaming for long extractions?
- Can I test structured extraction without API calls?
Mastery Indicators
- System handles all validation errors gracefully
- Extraction succeeds on complex real-world text
- Provider abstraction allows easy switching
- Tests cover both success and failure cases
- Documentation is comprehensive
Resources
Documentation
- Pydantic JSON Schema
- OpenAI Function Calling
- OpenAI Structured Outputs
- Anthropic Tool Use
- Instructor Library
Libraries
- instructor - Structured outputs for LLMs
- outlines - Constrained text generation
- marvin - AI functions with Pydantic
- langchain - LLM framework with output parsers
Books and Articles
- "AI Engineering" by Chip Huyen - Comprehensive LLM engineering guide
- OpenAI Cookbook - Practical examples
- Anthropic Prompt Engineering Guide
Related Projects
- LangChain Output Parsers
- Guardrails AI - Validation for LLM outputs
- LMQL - Query language for LLMs with constraints