Project 14: Distilling a Coding Agent
Distill a larger coding agent into a smaller, cheaper model while preserving task success rates.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 5: Expert |
| Time Estimate | 3-4 weeks |
| Language | Python |
| Prerequisites | Distillation, eval harnesses |
| Key Topics | agent distillation, evals, regression testing |
1. Learning Objectives
By completing this project, you will:
- Capture teacher traces for coding tasks.
- Train a student model on agent traces.
- Evaluate task success rates.
- Measure latency and cost improvements.
- Prevent regressions with eval harnesses.
2. Theoretical Foundation
2.1 Distilling Agents
Agent distillation records a teacher agent's interaction traces (observations, tool calls, and the actions it takes) and uses them as supervised training data for a smaller student model, with the goal of preserving task-level behavior at a fraction of the inference cost. Concretely, each step of a teacher trace can become one (context, action) training pair for the student.
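To make this concrete, here is a minimal sketch of how a single teacher trace can be flattened into supervised (context, action) pairs. The `TraceStep` type and its field names are illustrative assumptions, not a fixed schema:

```python
from dataclasses import dataclass

@dataclass
class TraceStep:
    observation: str   # task state / tool output shown to the teacher
    action: str        # the teacher's response (code edit, tool call, ...)

def trace_to_examples(task_prompt: str, steps: list[TraceStep]) -> list[tuple[str, str]]:
    """Turn one teacher trace into (input, target) pairs for the student."""
    examples = []
    context = task_prompt
    for step in steps:
        context += "\n" + step.observation
        examples.append((context, step.action))  # student learns to imitate the action
        context += "\n" + step.action            # actions become part of later context
    return examples

if __name__ == "__main__":
    steps = [TraceStep("Test fails: assert add(2, 2) == 4", "Edit add() to return a + b")]
    for inp, tgt in trace_to_examples("Fix the failing test in calc.py", steps):
        print(inp, "->", tgt)
```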
3. Project Specification
3.1 What You Will Build
A distillation pipeline that learns from a teacher coding agent and evaluates the student on coding tasks.
3.2 Functional Requirements
- Trace collection from the teacher agent (a trace-schema sketch follows this list).
- Student training on the collected traces.
- Evaluation harness for coding tasks.
- Regression metrics versus the teacher.
- Cost/latency report.
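One way to satisfy the trace-collection and reproducibility requirements together is an append-only JSONL store. The `Trace` schema below is a hypothetical starting point, not a prescribed format:

```python
import json
from dataclasses import dataclass, asdict
from pathlib import Path

@dataclass
class Trace:
    task_id: str
    steps: list[dict]    # e.g. [{"observation": ..., "action": ...}, ...]
    outcome: str         # "pass" | "fail" -- the task-level success signal
    teacher_model: str   # recorded for reproducibility
    cost_usd: float
    latency_s: float

def append_trace(path: Path, trace: Trace) -> None:
    """Append one trace as a JSON line; JSONL keeps the store append-only and diffable."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(trace)) + "\n")

def load_traces(path: Path) -> list[Trace]:
    """Reload every stored trace for training or re-evaluation."""
    with path.open(encoding="utf-8") as f:
        return [Trace(**json.loads(line)) for line in f]
```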
3.3 Non-Functional Requirements
- Deterministic evaluation on fixed tasks.
- Clear success criteria for tasks.
- Trace storage for reproducibility.
4. Solution Architecture
4.1 Components
| Component | Responsibility |
|---|---|
| Trace Collector | Record teacher behavior |
| Trainer | Distill student model |
| Evaluator | Run coding task suite |
| Reporter | Compare costs and accuracy |
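A sketch of the Reporter's comparison step, assuming each per-task result dict carries `passed`, `cost_usd`, and `latency_s` fields (these names are illustrative):

```python
def compare(teacher_results: list[dict], student_results: list[dict]) -> dict:
    """Summarize success-rate, cost, and latency deltas between teacher and student runs."""
    def rate(rs): return sum(r["passed"] for r in rs) / len(rs)
    def mean(rs, key): return sum(r[key] for r in rs) / len(rs)
    return {
        "teacher_success": rate(teacher_results),
        "student_success": rate(student_results),
        "success_delta": rate(student_results) - rate(teacher_results),
        "cost_ratio": mean(student_results, "cost_usd") / mean(teacher_results, "cost_usd"),
        "latency_ratio": mean(student_results, "latency_s") / mean(teacher_results, "latency_s"),
    }
```

Ratios below 1.0 indicate the student is cheaper or faster than the teacher on the same task suite.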
5. Implementation Guide
5.1 Project Structure
QUANTIZATION_DISTILLATION_INFERENCE_OPTIMIZATION_MASTERY/P14-distill-agent/
├── src/
│ ├── collect.py
│ ├── train.py
│ ├── eval.py
│ └── report.py
5.2 Implementation Phases
Phase 1: Trace collection (8-12h)
- Run teacher agent on tasks.
- Checkpoint: traces stored with outcomes.
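A minimal sketch of the Phase 1 loop; `run_teacher` is a hypothetical stand-in for however you invoke the teacher agent (API client, subprocess, etc.):

```python
# src/collect.py -- sketch only
import json
import time
from pathlib import Path

def run_teacher(task: dict) -> dict:
    """Placeholder: call the teacher agent and return its steps and outcome."""
    raise NotImplementedError

def collect(tasks: list[dict], out: Path) -> None:
    with out.open("a", encoding="utf-8") as f:
        for task in tasks:
            start = time.monotonic()
            result = run_teacher(task)  # {"steps": [...], "outcome": "pass"/"fail", ...}
            result["task_id"] = task["id"]
            result["latency_s"] = time.monotonic() - start
            f.write(json.dumps(result) + "\n")  # checkpoint: traces stored with outcomes
```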
Phase 2: Distillation (8-12h)
- Train student on traces.
- Checkpoint: student reproduces teacher actions on held-out traces.
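A minimal fine-tuning sketch using the Hugging Face transformers API, consuming the (context, action) pairs produced in Section 2.1. The student model name is an assumption; substitute your own:

```python
# src/train.py -- supervised fine-tuning sketch; batch size 1 for clarity.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def fine_tune(pairs: list[tuple[str, str]], model_name: str = "Qwen/Qwen2.5-Coder-0.5B"):
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    opt = torch.optim.AdamW(model.parameters(), lr=1e-5)
    model.train()
    for context, action in pairs:
        prompt_ids = tok(context, return_tensors="pt").input_ids
        full_ids = tok(context + action, return_tensors="pt").input_ids
        labels = full_ids.clone()
        labels[:, : prompt_ids.shape[1]] = -100  # only action tokens contribute to the loss
        loss = model(input_ids=full_ids, labels=labels).loss
        loss.backward()
        opt.step()
        opt.zero_grad()
    return model
```

Masking the prompt tokens with `-100` is the standard way to train the student to imitate only the teacher's actions, not to regenerate the context.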
Phase 3: Evaluation (8-12h)
- Compare student vs teacher on task suite.
- Checkpoint: regression report produced.
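A sketch of the Phase 3 comparison; `run_agent` is a hypothetical hook for executing a task with either model, and tasks are sorted to keep runs deterministic:

```python
# src/eval.py -- deterministic comparison run, sketch only
import json
from pathlib import Path

def run_agent(model, task: dict) -> bool:
    """Placeholder: execute the task with `model` and return pass/fail."""
    raise NotImplementedError

def regression_report(teacher, student, tasks: list[dict], out: Path) -> dict:
    rows = []
    for task in sorted(tasks, key=lambda t: t["id"]):  # fixed order => reproducible runs
        t_pass, s_pass = run_agent(teacher, task), run_agent(student, task)
        rows.append({"task_id": task["id"], "teacher": t_pass, "student": s_pass,
                     "regression": t_pass and not s_pass})
    out.write_text(json.dumps(rows, indent=2))  # checkpoint: regression report produced
    return {"regressions": sum(r["regression"] for r in rows), "total": len(rows)}
```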
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit | Trace format correctness | Schema validation |
| Integration | End-to-end evaluation | Task success rates |
| Regression | Metric stability | Comparison against teacher baselines |
6.2 Critical Test Cases
- Student meets the target success-rate threshold on the fixed task suite.
- Cost per task decreases relative to the teacher.
- Regression alerts fire when a previously passing task fails.
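These cases translate naturally into a small pytest suite. The 90% threshold, the `compare` helper from the Section 4.1 sketch, and the fixtures (which would live in a `conftest.py`) are all illustrative assumptions:

```python
# tests/test_regression.py -- sketch; thresholds are illustrative
from report import compare  # the comparison helper sketched in Section 4.1

def test_student_meets_success_threshold(teacher_results, student_results):
    summary = compare(teacher_results, student_results)
    assert summary["student_success"] >= 0.90 * summary["teacher_success"]

def test_cost_per_task_decreases(teacher_results, student_results):
    summary = compare(teacher_results, student_results)
    assert summary["cost_ratio"] < 1.0

def test_no_new_task_failures(regression_rows):
    assert not any(row["regression"] for row in regression_rows)
```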
7. Common Pitfalls & Debugging
| Pitfall | Symptom | Fix |
|---|---|---|
| Low student accuracy | Task failures | Improve trace quality and coverage |
| Overfitting to traces | Poor generalization to unseen tasks | Increase task diversity |
| Missing evals | Silent regressions | Maintain and extend the task suite |
8. Extensions & Challenges
Beginner
- Add task difficulty tiers.
- Add trace filtering.
Intermediate
- Add active learning for new tasks.
- Add task-specific adapters.
Advanced
- Add multi-teacher distillation.
- Add long-horizon agent tasks.
9. Real-World Connections
- Coding copilots benefit from distilled models for cost.
- Enterprise agents need regression-safe distillation.
10. Resources
- Knowledge distillation papers (e.g., Hinton et al., 2015, "Distilling the Knowledge in a Neural Network")
- Agent evaluation frameworks and coding benchmarks (e.g., SWE-bench)
11. Self-Assessment Checklist
- I can collect and store agent traces.
- I can train a student model from traces.
- I can evaluate regressions reliably.
12. Submission / Completion Criteria
Minimum Completion:
- Trace collection + student training
Full Completion:
- Evaluation suite + reports
Excellence:
- Multi-teacher distillation
- Long-horizon tasks
This guide was generated from project_based_ideas/AI_AGENTS_LLM_RAG/QUANTIZATION_DISTILLATION_INFERENCE_OPTIMIZATION_MASTERY.md.