Project 1: The Basic Information Extractor

Build a PydanticAI pipeline that extracts structured fields from text with strict validation.

Quick Reference

Attribute	Value
Difficulty	Level 1: Beginner
Time Estimate	5-7 hours
Language	Python
Prerequisites	Pydantic basics, JSON Schema
Key Topics	structured output, validation, retries

1. Learning Objectives

By completing this project, you will:

Define Pydantic models for extraction.
Validate LLM outputs against a schema.
Implement retry logic for invalid outputs.
Log extraction failures for debugging.
Measure extraction accuracy on sample text.

2. Theoretical Foundation

2.1 Why PydanticAI

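Schema-first extraction turns free-form, probabilistic LLM output into typed data you can trust: you declare the expected fields as a Pydantic model, and every model response is validated against that model before the rest of your code uses it.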
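
To make the idea concrete, here is a minimal sketch that uses plain Pydantic (v2 API) without any agent. The `SupportTicket` model and its fields are hypothetical and only illustrate how a reply is either parsed into typed data or rejected with per-field errors.

```python
from pydantic import BaseModel, Field, ValidationError


class SupportTicket(BaseModel):
    """Hypothetical schema for a support ticket; the fields are illustrative."""

    customer_name: str
    order_id: int
    issue: str = Field(min_length=1)


# A well-formed reply parses into a typed object.
good = '{"customer_name": "Ada", "order_id": 1042, "issue": "Package arrived damaged"}'
ticket = SupportTicket.model_validate_json(good)

# A reply with a wrong type and a missing field is rejected, not silently accepted.
bad = '{"customer_name": "Ada", "order_id": "unknown"}'
try:
    SupportTicket.model_validate_json(bad)
    problems = []
except ValidationError as exc:
    # Each error records which field failed; collect the field names.
    problems = sorted(str(err["loc"][0]) for err in exc.errors())
```

After this runs, `ticket.order_id` is an `int` and `problems` names the two fields that failed, which is the information a retry or a log entry needs.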

3. Project Specification

3.1 What You Will Build

A CLI tool that takes unstructured text (emails, tickets) and outputs validated JSON.

3.2 Functional Requirements

A Pydantic model with required fields.
An output parser tied to that schema.
A retry strategy that runs on validation errors.
Error logs that capture the raw model output on failure.
An evaluation run on a small dataset.

3.3 Non-Functional Requirements

Deterministic mode for testing.
Clear error messages for invalid outputs.
Configurable schema for new domains.
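
One possible way to meet the first and third requirements is sketched below, under stated assumptions: `SCHEMAS`, `ReplayModel`, and `extract` are hypothetical names, and the "model" is a stand-in that replays recorded replies rather than calling a live LLM. Tests become repeatable because the same input always yields the same reply, and a new domain is added by registering one more Pydantic model.

```python
import hashlib

from pydantic import BaseModel


class Ticket(BaseModel):
    order_id: int
    issue: str


class Email(BaseModel):
    sender: str
    subject: str


# Hypothetical registry: supporting a new domain means adding one entry here.
SCHEMAS: dict[str, type[BaseModel]] = {"ticket": Ticket, "email": Email}


class ReplayModel:
    """Stand-in for an LLM client that replays recorded replies (hypothetical)."""

    def __init__(self, recorded: dict[str, str]) -> None:
        # Key by a hash of the input so long texts do not bloat the lookup table.
        self._replies = {self._key(text): reply for text, reply in recorded.items()}

    @staticmethod
    def _key(text: str) -> str:
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def __call__(self, text: str) -> str:
        key = self._key(text)
        if key not in self._replies:
            raise LookupError("no recorded reply for this input")
        return self._replies[key]


def extract(domain: str, text: str, model) -> BaseModel:
    """Look up the schema for the domain and validate the model's reply against it."""
    schema = SCHEMAS[domain]
    return schema.model_validate_json(model(text))
```

Usage: build `ReplayModel({"Order 7 is late": '{"order_id": 7, "issue": "late"}'})` and call `extract("ticket", "Order 7 is late", model)` as often as needed; each call returns an equal `Ticket`.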

4. Solution Architecture

4.1 Components

Component	Responsibility
Schema Model	Define fields and types
Agent	Generate structured output
Validator	Enforce schema
Logger	Track failures

5. Implementation Guide

5.1 Project Structure

LEARN_PYDANTIC_AI/P01-basic-extractor/
├── src/
│   ├── models.py
│   ├── agent.py
│   ├── validate.py
│   └── cli.py
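
As a rough idea of what `cli.py` could contain, the sketch below uses only the standard library. The `extract` function is a placeholder (an assumption, not the project's real logic); in the finished tool it would call the agent and the validator. The argument names are likewise illustrative.

```python
import argparse
import json
import sys


def extract(text: str) -> dict:
    """Placeholder for the agent plus validation step (hypothetical)."""
    first_line = text.splitlines()[0] if text else ""
    return {"char_count": len(text), "first_line": first_line}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Extract structured fields from text.")
    parser.add_argument("path", nargs="?", help="input file; reads stdin when omitted")
    parser.add_argument("--indent", type=int, default=2, help="JSON indentation")
    return parser


def render(text: str, indent: int = 2) -> str:
    """Run extraction and serialize the result as JSON."""
    return json.dumps(extract(text), indent=indent)


def main(argv=None) -> int:
    args = build_parser().parse_args(argv)
    if args.path:
        with open(args.path, encoding="utf-8") as handle:
            text = handle.read()
    else:
        text = sys.stdin.read()
    print(render(text, args.indent))
    return 0
```

Keeping `render` separate from `main` means the extraction path can be tested without touching files or stdin. A script entry point would call `raise SystemExit(main())`.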

5.2 Implementation Phases

Phase 1: Schema (2h)

Define extraction fields.
Checkpoint: sample data validates.

Phase 2: Agent + validation (2-3h)

Run extraction with Pydantic validation.
Checkpoint: invalid output triggers retry.
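
The Phase 2 checkpoint can be exercised with a hand-written retry loop like the sketch below. It uses plain Pydantic and a caller-supplied `call_model` function instead of any retry feature of the agent library, so the mechanism is visible. `Contact`, `extract_with_retry`, and `call_model` are hypothetical names.

```python
from typing import Callable

from pydantic import BaseModel, ValidationError


class Contact(BaseModel):
    name: str
    email: str


def extract_with_retry(
    text: str,
    call_model: Callable[[str], str],
    max_attempts: int = 3,
) -> Contact:
    """Ask the model for JSON and retry with error feedback until it validates."""
    prompt = text
    last_error = None
    for _ in range(max_attempts):
        raw = call_model(prompt)
        try:
            return Contact.model_validate_json(raw)
        except ValidationError as exc:
            last_error = exc
            # Feed the validation errors back so the next attempt can correct them.
            prompt = (
                f"{text}\n\nPrevious reply was invalid:\n{exc}\n"
                "Return corrected JSON only."
            )
    raise RuntimeError(f"no valid output after {max_attempts} attempts") from last_error
```

In a test, `call_model` can be a fake that returns an incomplete reply first and a valid one second; the function should then return a `Contact` after exactly two calls.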

Phase 3: Evaluation (1-2h)

Score accuracy on sample inputs.
Checkpoint: report shows success rate.
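
One simple way to produce the Phase 3 report is to compare each extracted record with a hand-labeled expected record. The sketch below assumes records are plain dictionaries and that a failed extraction is represented as `None`; the `score` function and its report keys are hypothetical.

```python
from typing import Optional


def score(predictions: list[Optional[dict]], expected: list[dict]) -> dict:
    """Compute exact-match rate and per-field accuracy against labeled records."""
    if len(predictions) != len(expected):
        raise ValueError("predictions and expected must have the same length")
    exact = 0
    field_hits: dict[str, int] = {}
    field_totals: dict[str, int] = {}
    for predicted, gold in zip(predictions, expected):
        if predicted == gold:
            exact += 1
        for name, value in gold.items():
            field_totals[name] = field_totals.get(name, 0) + 1
            # A failed extraction (None) counts as a miss for every field.
            if predicted is not None and predicted.get(name) == value:
                field_hits[name] = field_hits.get(name, 0) + 1
    total = len(expected)
    return {
        "exact_match": exact / total if total else 0.0,
        "per_field": {n: field_hits.get(n, 0) / field_totals[n] for n in field_totals},
    }
```

Per-field accuracy is worth reporting next to the overall rate because it shows which fields the model struggles with, which in turn suggests where the schema or prompt needs work.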

6. Testing Strategy

6.1 Test Categories

Category	Purpose	Examples
Unit	Schema validation	Input with a missing required field fails validation
Integration	Agent end to end	Valid, schema-conforming JSON is produced
Regression	Retry behavior	Pipeline recovers from an invalid output

6.2 Critical Test Cases

Missing required field triggers retry.
Extra fields are either rejected or ignored, consistently with the chosen schema policy.
Output matches the schema exactly.
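
The extra-field case depends on how the model is configured. The sketch below shows both policies using Pydantic v2's `ConfigDict(extra=...)`; the invoice models and the payload are hypothetical examples.

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class StrictInvoice(BaseModel):
    # Unknown keys cause a validation error.
    model_config = ConfigDict(extra="forbid")
    invoice_id: str
    total: float


class LenientInvoice(BaseModel):
    # Unknown keys are dropped silently.
    model_config = ConfigDict(extra="ignore")
    invoice_id: str
    total: float


payload = {"invoice_id": "INV-7", "total": 12.5, "note": "not in schema"}

lenient = LenientInvoice.model_validate(payload)

try:
    StrictInvoice.model_validate(payload)
    strict_error_types = []
except ValidationError as exc:
    strict_error_types = [err["type"] for err in exc.errors()]

# "Matches the schema exactly": the dumped keys equal the declared fields.
matches_schema = set(lenient.model_dump()) == set(LenientInvoice.model_fields)
```

Whichever policy the project picks, a test should pin it down so that a later configuration change does not alter behavior unnoticed.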

7. Common Pitfalls & Debugging

Pitfall	Symptom	Fix
Loose schema	Inconsistent output	Tighten field types and constraints
Over-strict schema	Many validation failures	Make non-essential fields optional
Hidden errors	Failures are hard to debug	Log the raw model output
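
For the last row, a minimal sketch of raw-output logging with the standard `logging` module follows. The logger name, the `parse_or_log` helper, and the `source_id` parameter are assumptions for illustration; JSON parsing stands in for full schema validation to keep the example dependency-free.

```python
import io
import json
import logging

logger = logging.getLogger("extractor")


def parse_or_log(raw: str, source_id: str):
    """Parse a model reply as JSON; on failure, log the raw text and return None."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        # Log the raw reply verbatim so the failure can be reproduced later.
        logger.warning("extraction failed id=%s error=%s raw=%r", source_id, exc.msg, raw)
        return None


# Demo: capture log output in memory instead of a file.
stream = io.StringIO()
logger.addHandler(logging.StreamHandler(stream))
result = parse_or_log("Sure! Here is the JSON:", source_id="ticket-2")
```

Logging the input identifier together with the raw reply makes it possible to find the offending input and replay it during debugging.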

8. Extensions & Challenges

Beginner

Add a second schema type.
Add CSV export.
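
As a starting point for the CSV export, validated records (for example, the dictionaries produced by a Pydantic model's `model_dump()`) can be written with the standard `csv` module. The `to_csv` helper below is a hypothetical sketch.

```python
import csv
import io


def to_csv(records: list[dict], fieldnames: list[str]) -> str:
    """Serialize records to CSV text, keeping only the listed columns."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops keys that are not listed in fieldnames.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

Values containing commas are quoted automatically, and keys missing from a record are written as empty cells.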

Intermediate

Add confidence scores.
Add human review queue.

Advanced

Add schema auto-inference.
Add active learning for hard cases.

9. Real-World Connections

Support workflows rely on structured extraction.
Compliance needs validated fields.

10. Resources

PydanticAI docs
JSON Schema references

11. Self-Assessment Checklist

I can define Pydantic schemas for extraction.
I can validate and retry invalid outputs.
I can measure extraction accuracy.

12. Submission / Completion Criteria

Minimum Completion:

Schema-validated extraction
Retry on invalid output

Full Completion:

Evaluation report
Error logging

Excellence:

Confidence scoring
Human review workflow

This guide was generated from project_based_ideas/AI_AGENTS_LLM_RAG/LEARN_PYDANTIC_AI.md.