Project 2: The Smart Classifier and Tagger
Build a PydanticAI classifier that assigns categories and tags with schema validation.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 2: Intermediate |
| Time Estimate | 6-10 hours |
| Language | Python |
| Prerequisites | Project 1, Pydantic models |
| Key Topics | classification, tagging, constrained outputs |
1. Learning Objectives
By completing this project, you will:
- Define enums for categories and tags.
- Enforce label constraints with validation.
- Add confidence scores for classifications.
- Evaluate accuracy on a labeled dataset.
- Log ambiguous or low-confidence cases.
2. Theoretical Foundation
2.1 Constrained Classification
Constraining model outputs to a fixed set of known labels improves reliability: downstream systems can branch on exact values instead of parsing free-form text, and invalid labels are caught at validation time rather than at run time.
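For instance, a `Literal`-typed Pydantic field rejects anything outside the allowed set at validation time. A minimal sketch; the label values are placeholders:

```python
# A minimal sketch: a Literal-typed field rejects any label outside
# the allowed set, so downstream code only ever sees known values.
from typing import Literal

from pydantic import BaseModel, ValidationError


class Label(BaseModel):
    category: Literal["bug", "feature", "question"]


Label(category="bug")            # accepted
try:
    Label(category="banana")     # rejected at validation time
except ValidationError as exc:
    print("invalid label:", exc.error_count(), "error(s)")
```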
3. Project Specification
3.1 What You Will Build
A classifier that labels inputs with a primary category and optional tags, validated by schema.
3.2 Functional Requirements
- Category enum with allowed values.
- Optional list of tags drawn from an allowed set.
- Confidence score in output.
- Evaluation against labeled samples.
- Ambiguity handling when confidence is low.
3.3 Non-Functional Requirements
- Deterministic mode for testing (see the sketch after this list).
- Clear error logs for invalid labels.
- Configurable labels for new domains.
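For the deterministic-mode requirement, PydanticAI ships a `TestModel` that fabricates schema-valid output without calling a real LLM. A sketch, assuming the `classifier_agent` from `src/agent.py` and a recent release where results are exposed on `.output` (older releases use `.data`):

```python
# A sketch of deterministic testing with PydanticAI's TestModel,
# which fabricates schema-valid output instead of calling an LLM.
from pydantic_ai.models.test import TestModel


def test_classifier_is_deterministic():
    # Temporarily swap the real model for TestModel.
    with classifier_agent.override(model=TestModel()):
        result = classifier_agent.run_sync("Sample input text")
    # TestModel output always satisfies the output schema.
    assert result.output is not None
```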
4. Solution Architecture
4.1 Components
| Component | Responsibility |
|---|---|
| Label Schema | Define allowed labels |
| Classifier Agent | Generate category/tags |
| Validator | Enforce schema |
| Evaluator | Measure accuracy |
5. Implementation Guide
5.1 Project Structure
```
LEARN_PYDANTIC_AI/P02-classifier-tagger/
├── src/
│   ├── labels.py
│   ├── agent.py
│   ├── validate.py
│   └── eval.py
```
5.2 Implementation Phases
Phase 1: Label schema (2-3h)
- Define enums and tags.
- Checkpoint: invalid labels rejected.
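A sketch of `src/labels.py` under these constraints; the category and tag values are illustrative placeholders for your domain:

```python
# src/labels.py: a sketch of the label schema. The category and tag
# values are illustrative; adapt them to your domain.
from enum import Enum

from pydantic import BaseModel, Field


class Category(str, Enum):
    BUG = "bug"
    FEATURE = "feature"
    QUESTION = "question"


class Tag(str, Enum):
    URGENT = "urgent"
    UI = "ui"
    BACKEND = "backend"


class Classification(BaseModel):
    category: Category
    # Tags are optional but must come from the Tag enum.
    tags: list[Tag] = Field(default_factory=list)
    # Confidence is bounded to [0, 1] at the schema level.
    confidence: float = Field(ge=0.0, le=1.0)
```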
Phase 2: Classification (2-4h)
- Run agent and validate output.
- Checkpoint: classifier returns labels + confidence.
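A sketch of `src/agent.py` wiring the schema into the agent. The model name is an example; recent PydanticAI releases accept `output_type` (earlier ones used `result_type`) and expose results on `.output` (earlier `.data`):

```python
# src/agent.py: a sketch wiring the Classification schema into a
# PydanticAI agent so replies are validated before use.
from pydantic_ai import Agent

from labels import Classification

classifier_agent = Agent(
    "openai:gpt-4o",
    output_type=Classification,
    system_prompt=(
        "Classify the input into exactly one category, add any "
        "applicable tags, and report your confidence from 0 to 1."
    ),
)


def classify(text: str) -> Classification:
    # run_sync validates the model's reply against Classification;
    # out-of-enum labels raise instead of leaking downstream.
    result = classifier_agent.run_sync(text)
    return result.output
```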
Phase 3: Evaluation (2-3h)
- Compare against labeled samples.
- Checkpoint: accuracy report created.
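A sketch of `src/eval.py`, assuming the `classify` helper from the Phase 2 sketch and a simple `(text, expected_category)` sample format:

```python
# src/eval.py: a sketch of accuracy evaluation over labeled samples.
# The dataset format and 0.5 threshold are illustrative assumptions.
from agent import classify
from labels import Category


def evaluate(samples: list[tuple[str, Category]]) -> float:
    correct = 0
    low_confidence: list[str] = []
    for text, expected in samples:
        prediction = classify(text)
        if prediction.category == expected:
            correct += 1
        if prediction.confidence < 0.5:  # illustrative threshold
            low_confidence.append(text)
    accuracy = correct / len(samples) if samples else 0.0
    print(f"accuracy: {accuracy:.2%}, ambiguous: {len(low_confidence)}")
    return accuracy
```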
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit | Schema validation | Invalid label raises a validation error |
| Integration | Agent output | Returned labels stay within the enum |
| Regression | Evaluation | Accuracy stays stable across runs |
6.2 Critical Test Cases
- Output label outside enum is rejected.
- Low confidence triggers ambiguity handling.
- Tags remain within allowed list.
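These cases map directly onto the Phase 1 schema. A sketch with pytest, reusing the illustrative `Classification` model:

```python
# A sketch of the critical test cases. Pydantic's ValidationError
# covers out-of-enum categories, disallowed tags, and bad confidence.
import pytest
from pydantic import ValidationError

from labels import Classification


def test_category_outside_enum_is_rejected():
    with pytest.raises(ValidationError):
        Classification(category="not-a-label", confidence=0.9)


def test_tags_outside_allowed_list_are_rejected():
    with pytest.raises(ValidationError):
        Classification(category="bug", tags=["made-up"], confidence=0.9)


def test_confidence_must_be_in_range():
    with pytest.raises(ValidationError):
        Classification(category="bug", confidence=1.5)
```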
7. Common Pitfalls & Debugging
| Pitfall | Symptom | Fix |
|---|---|---|
| Label drift | unknown labels appear over time | enforce enum constraints in the schema |
| Overconfident outputs | high confidence on wrong labels | calibrate confidence against labeled data |
| Sparse tags | relevant tags missing from outputs | expand the allowed tag list |
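For the overconfidence pitfall, one mitigation is threshold-based routing of low-confidence predictions to human review. A sketch; the threshold and logger name are assumptions to calibrate against your evaluation set:

```python
# A sketch of low-confidence routing. The 0.6 threshold and logger
# name are assumptions; tune the threshold on labeled data.
import logging

from labels import Classification

logger = logging.getLogger("classifier")
CONFIDENCE_THRESHOLD = 0.6  # illustrative; calibrate before trusting


def route(prediction: Classification) -> Classification | None:
    if prediction.confidence < CONFIDENCE_THRESHOLD:
        # Flag for human review instead of trusting the label.
        logger.warning(
            "ambiguous: %s (confidence=%.2f)",
            prediction.category.value,
            prediction.confidence,
        )
        return None
    return prediction
```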
8. Extensions & Challenges
Beginner
- Add multi-label classification.
- Add CSV import for data.
Intermediate
- Add active learning for low-confidence samples.
- Add label explanations.
Advanced
- Add hierarchical labels.
- Add drift detection in production.
9. Real-World Connections
- Support triage relies on accurate labels.
- Content moderation needs constrained outputs.
10. Resources
- PydanticAI docs
- Classification evaluation guides
11. Self-Assessment Checklist
- I can constrain outputs to enums.
- I can validate and score classifications.
- I can evaluate accuracy reliably.
12. Submission / Completion Criteria
Minimum Completion:
- Schema-validated classifier
- Confidence scoring
Full Completion:
- Evaluation report
- Ambiguity handling
Excellence:
- Active learning workflow
- Drift detection
This guide was generated from project_based_ideas/AI_AGENTS_LLM_RAG/LEARN_PYDANTIC_AI.md.