Project 18: Cost-Latency-Aware Model Router
Build a model routing layer that chooses an inference strategy for each task based on quality, latency, and budget constraints.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 3: Advanced |
| Time Estimate | 10-20 hours |
| Language | TypeScript (alt: Python) |
| Prerequisites | Projects 9, 17 |
| Key Topics | routing policy, objective functions, fallback systems |
Learning Objectives
- Classify tasks into route policy classes.
- Optimize model choice using multi-objective constraints.
- Handle outages and regressions with deterministic fallbacks.
- Evaluate router outcomes against quality and cost baselines.
The Core Question You’re Answering
“How do you avoid paying premium-model prices for every step while keeping quality stable?”
Concepts You Must Understand First
| Concept | Why It Matters | Where to Learn |
|---|---|---|
| Multi-objective routing | Balance conflicting SLOs | optimization references |
| Route explainability | Required for debugging and governance | policy engineering patterns |
| Drift monitoring | Keeps routing decisions current | telemetry + eval integration |
Theoretical Foundation
Task Class + Policy Budget + Historical Quality -> Router -> Model Tier -> Outcome Feedback
Routing is a closed-loop control problem, not a static configuration: the outcome of each routed request feeds back into later routing decisions.
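One way to picture the feedback edge of that loop is a running quality estimate per task class and tier that each outcome nudges. The sketch below assumes an exponential moving average; `recordOutcome`, `QualityTable`, the `mini` tier name, and the smoothing factor `ALPHA = 0.2` are hypothetical choices for illustration, not part of the project spec.

```typescript
// Hypothetical names throughout; ALPHA is an assumed smoothing factor.
type QualityTable = Map<string, number>; // key: `${taskClass}:${tier}`

const ALPHA = 0.2; // weight given to the newest observation

// Blend a newly observed quality score into the running estimate.
function recordOutcome(
  table: QualityTable,
  taskClass: string,
  tier: string,
  observedQuality: number,
): number {
  const key = `${taskClass}:${tier}`;
  const previous = table.get(key);
  // The first observation seeds the estimate; later ones are smoothed.
  const updated =
    previous === undefined
      ? observedQuality
      : (1 - ALPHA) * previous + ALPHA * observedQuality;
  table.set(key, updated);
  return updated;
}

const estimates: QualityTable = new Map();
recordOutcome(estimates, "support_with_references", "mini", 0.8);
recordOutcome(estimates, "support_with_references", "mini", 0.6);
// The estimate moves from 0.8 toward 0.6, landing at roughly 0.76.
console.log(estimates.get("support_with_references:mini"));
```

A router that reads from this table instead of a hard-coded quality number reacts to regressions without a config change, which is the practical meaning of "closed loop" here.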
Project Specification
What You’ll Build
A router that:
- Supports 3+ model tiers
- Uses class-specific latency/cost/quality thresholds
- Logs route rationale and fallback events
- Learns from weekly eval refreshes
Functional Requirements
- Task classification
- Rule-based or score-based routing decision
- Fallback cascade and circuit breaker
- Route decision artifact storage
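The routing decision and its stored artifact could look roughly like the following. This is a minimal sketch, assuming a constraint filter followed by a cheapest-feasible choice; the interfaces, `decideRoute`, the 0.75 quality floor, and the `tiny-draft` and `premium-large` tiers are hypothetical. Only `fast-reasoner-mini` and its estimates come from the sample output in this project.

```typescript
// Interface and tier names are illustrative assumptions.
interface TierEstimate {
  tier: string;
  qualityEst: number; // 0..1, e.g. from historical evals
  latencyEstMs: number; // expected p95 latency
  costEstUsd: number; // expected cost per request
}

interface ClassPolicy {
  minQuality: number;
  maxLatencyMs: number;
  maxCostUsd: number;
}

interface RouteDecision {
  selected: string | null;
  rationale: string[]; // one entry per tier explaining accept or reject
}

function decideRoute(policy: ClassPolicy, tiers: TierEstimate[]): RouteDecision {
  const rationale: string[] = [];
  const feasible: TierEstimate[] = [];
  for (const t of tiers) {
    const violations: string[] = [];
    if (t.qualityEst < policy.minQuality) violations.push("quality below floor");
    if (t.latencyEstMs > policy.maxLatencyMs) violations.push("latency over budget");
    if (t.costEstUsd > policy.maxCostUsd) violations.push("cost over budget");
    if (violations.length === 0) {
      feasible.push(t);
      rationale.push(`${t.tier}: feasible`);
    } else {
      rationale.push(`${t.tier}: rejected (${violations.join(", ")})`);
    }
  }
  // Among feasible tiers, prefer the cheapest; break ties by higher quality.
  feasible.sort((a, b) => a.costEstUsd - b.costEstUsd || b.qualityEst - a.qualityEst);
  const best = feasible[0];
  return { selected: best ? best.tier : null, rationale };
}

const candidates: TierEstimate[] = [
  { tier: "tiny-draft", qualityEst: 0.62, latencyEstMs: 900, costEstUsd: 0.002 },
  { tier: "fast-reasoner-mini", qualityEst: 0.81, latencyEstMs: 2400, costEstUsd: 0.013 },
  { tier: "premium-large", qualityEst: 0.93, latencyEstMs: 5200, costEstUsd: 0.09 },
];
const decision = decideRoute(
  { minQuality: 0.75, maxLatencyMs: 3000, maxCostUsd: 0.02 },
  candidates,
);
// Serialized form that could be written to a decision artifact file.
console.log(JSON.stringify(decision, null, 2));
```

With these assumed numbers only `fast-reasoner-mini` passes all three constraints, so it is selected, and the rationale records why the other two tiers were rejected.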
Non-Functional Requirements
- Stable behavior under traffic spikes
- Fast rollback for bad route policies
- Reproducible offline replay
Real World Outcome
$ node p18_router.js --task "draft customer response with citations"
[class] support_with_references
[policy] max_cost=$0.02 p95<3s
[selected] fast-reasoner-mini
[quality_est] 0.81 [latency_est] 2.4s [cost_est] $0.013
[artifact] route_decision.json
Architecture Overview
Task Classifier -> Policy Engine -> Router -> Model Adapter -> Feedback Store
Implementation Guide
Phase 1: Static Routing Rules
- Define task classes and the per-class latency, cost, and quality thresholds.
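A first pass at this phase can be as small as a keyword classifier plus a lookup table. The sketch below is one possible shape, not a prescribed design: the class names other than `support_with_references`, the keyword patterns, the `small-fast` and `premium-large` tiers, and all thresholds except the `$0.02` and 3 s values from the sample output are assumptions.

```typescript
// Class names, keywords, and most thresholds are illustrative assumptions.
type TaskClass = "support_with_references" | "summarization" | "general";

interface StaticRule {
  tier: string;
  maxCostUsd: number;
  maxP95Ms: number;
}

// Ordered list: the first matching pattern decides the class.
const KEYWORDS: Array<[TaskClass, RegExp]> = [
  ["support_with_references", /\b(customer|citations?|references?)\b/i],
  ["summarization", /\b(summarize|summary)\b/i],
];

const RULES: Record<TaskClass, StaticRule> = {
  support_with_references: { tier: "fast-reasoner-mini", maxCostUsd: 0.02, maxP95Ms: 3000 },
  summarization: { tier: "small-fast", maxCostUsd: 0.005, maxP95Ms: 1500 },
  // Unrecognized tasks go to the strongest tier as a conservative default.
  general: { tier: "premium-large", maxCostUsd: 0.1, maxP95Ms: 8000 },
};

function classify(task: string): TaskClass {
  for (const [taskClass, pattern] of KEYWORDS) {
    if (pattern.test(task)) return taskClass;
  }
  return "general";
}

function routeStatic(task: string): { taskClass: TaskClass } & StaticRule {
  const taskClass = classify(task);
  return { taskClass, ...RULES[taskClass] };
}

console.log(routeStatic("draft customer response with citations"));
```

Typing `RULES` as `Record<TaskClass, StaticRule>` makes the compiler reject a class that has no rule, which is a cheap guard while the class list is still changing.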
Phase 2: Fallback and Circuit Breakers
- Add a fallback cascade and circuit breakers so routing keeps working during provider outages.
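A per-tier circuit breaker combined with an ordered fallback cascade might be sketched as follows. The failure threshold, cooldown, `ModelCall` signature, stub provider, and the `premium-large` tier name are all assumptions made for illustration; a real adapter would call an actual provider client.

```typescript
// A sketch only: thresholds, tier names, and the stub provider are assumptions.
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;

  constructor(failureThreshold: number, cooldownMs: number) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
  }

  // True when closed, or when open long enough to permit a trial call.
  allows(now: number): boolean {
    return this.openedAt === null || now - this.openedAt >= this.cooldownMs;
  }

  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure(now: number): void {
    this.failures += 1;
    if (this.failures >= this.failureThreshold) this.openedAt = now;
  }
}

type ModelCall = (tier: string, text: string) => Promise<string>;

// Try tiers in order, skipping any whose breaker is open.
async function callWithFallback(
  cascade: string[],
  breakers: Map<string, CircuitBreaker>,
  call: ModelCall,
  text: string,
  now: () => number = Date.now,
): Promise<{ tier: string; output: string; events: string[] }> {
  const events: string[] = [];
  for (const tier of cascade) {
    const breaker = breakers.get(tier);
    if (breaker === undefined) {
      events.push(`${tier}: skipped (no breaker configured)`);
      continue;
    }
    if (!breaker.allows(now())) {
      events.push(`${tier}: skipped (circuit open)`);
      continue;
    }
    try {
      const output = await call(tier, text);
      breaker.recordSuccess();
      return { tier, output, events };
    } catch (err) {
      breaker.recordFailure(now());
      events.push(`${tier}: failed (${err instanceof Error ? err.message : String(err)})`);
    }
  }
  throw new Error(`all tiers exhausted: ${events.join("; ")}`);
}

// Stub provider: the first tier is "down", the second answers.
const stubCall: ModelCall = async (tier, text) => {
  if (tier === "fast-reasoner-mini") throw new Error("provider outage");
  return `[${tier}] ${text}`;
};
const breakers = new Map<string, CircuitBreaker>([
  ["fast-reasoner-mini", new CircuitBreaker(3, 30_000)],
  ["premium-large", new CircuitBreaker(3, 30_000)],
]);
callWithFallback(["fast-reasoner-mini", "premium-large"], breakers, stubCall, "hello")
  .then((result) => console.log(result.tier, result.events));
```

Passing the clock in as `now` keeps the breaker deterministic under test, which matters for the failure-injection and replay requirements. The returned `events` list is the raw material for logging fallback events.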
Phase 3: Feedback Loop
- Retune route policies periodically from eval results and telemetry.
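One simple tuning rule is to recompute, for each task class, the cheapest tier whose mean eval quality clears a floor. The sketch below assumes that rule; `EvalRow`, `retuneDefaults`, the minimum-sample guard, and the example tiers and scores are hypothetical, and other tuning strategies would fit this phase equally well.

```typescript
// Names, the selection rule, and all sample values are illustrative assumptions.
interface EvalRow {
  taskClass: string;
  tier: string;
  quality: number; // graded score in 0..1
}

// Returns a new default tier per class. Classes with no qualifying tier are
// omitted so the caller can keep its current default for them.
function retuneDefaults(
  rows: EvalRow[],
  tiersCheapestFirst: string[],
  qualityFloor: number,
  minSamples: number,
): Map<string, string> {
  // Accumulate sum and count per (class, tier).
  const stats = new Map<string, { sum: number; n: number }>();
  const classes = new Set<string>();
  for (const row of rows) {
    classes.add(row.taskClass);
    const key = `${row.taskClass}:${row.tier}`;
    const s = stats.get(key) ?? { sum: 0, n: 0 };
    s.sum += row.quality;
    s.n += 1;
    stats.set(key, s);
  }

  const defaults = new Map<string, string>();
  classes.forEach((taskClass) => {
    // Walk tiers cheapest first and stop at the first one that qualifies.
    for (const tier of tiersCheapestFirst) {
      const s = stats.get(`${taskClass}:${tier}`);
      if (s !== undefined && s.n >= minSamples && s.sum / s.n >= qualityFloor) {
        defaults.set(taskClass, tier);
        break;
      }
    }
  });
  return defaults;
}

const weeklyRows: EvalRow[] = [
  { taskClass: "faq", tier: "small", quality: 0.86 },
  { taskClass: "faq", tier: "small", quality: 0.9 },
  { taskClass: "legal", tier: "small", quality: 0.55 },
  { taskClass: "legal", tier: "small", quality: 0.6 },
  { taskClass: "legal", tier: "large", quality: 0.9 },
  { taskClass: "legal", tier: "large", quality: 0.94 },
];
const tuned = retuneDefaults(weeklyRows, ["small", "large"], 0.8, 2);
console.log(tuned.get("faq"), tuned.get("legal"));
```

The minimum-sample guard keeps a single lucky eval result from demoting a class to a cheaper tier, which ties into the quality floor listed under the pitfalls.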
Testing Strategy
- Replay tests on historical workloads
- Failure injection for provider outage
- Quality regression checks against gold set
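A replay test can be sketched as a pure function over recorded outcomes, so the same workload always yields the same summary. The record shape, `replay`, the `premium` and `mini` tier names, and the sample numbers below are assumptions for illustration; real records would come from logged traffic and the gold set.

```typescript
// Record shape, tier names, and sample values are hypothetical.
interface ReplayRecord {
  taskClass: string;
  // Outcome recorded offline for each tier on this task.
  outcomes: Record<string, { quality: number; costUsd: number }>;
}

type Policy = (taskClass: string) => string;

interface ReplaySummary {
  meanQuality: number;
  totalCostUsd: number;
  missing: number; // records with no recorded outcome for the chosen tier
}

function replay(records: ReplayRecord[], policy: Policy): ReplaySummary {
  let qualitySum = 0;
  let totalCostUsd = 0;
  let scored = 0;
  let missing = 0;
  for (const record of records) {
    const outcome = record.outcomes[policy(record.taskClass)];
    if (outcome === undefined) {
      missing += 1;
      continue;
    }
    qualitySum += outcome.quality;
    totalCostUsd += outcome.costUsd;
    scored += 1;
  }
  return { meanQuality: scored === 0 ? 0 : qualitySum / scored, totalCostUsd, missing };
}

const recorded: ReplayRecord[] = [
  {
    taskClass: "faq",
    outcomes: { premium: { quality: 0.9, costUsd: 0.05 }, mini: { quality: 0.88, costUsd: 0.01 } },
  },
  {
    taskClass: "legal",
    outcomes: { premium: { quality: 0.92, costUsd: 0.06 }, mini: { quality: 0.6, costUsd: 0.01 } },
  },
];

const baseline = replay(recorded, () => "premium");
const candidate = replay(recorded, (c) => (c === "faq" ? "mini" : "premium"));

// Regression gate: cheaper, with at most a 0.02 drop in mean quality (assumed tolerance).
const passes =
  candidate.totalCostUsd < baseline.totalCostUsd &&
  baseline.meanQuality - candidate.meanQuality <= 0.02;
console.log({ baseline, candidate, passes });
```

Because `replay` has no clock, network, or randomness, the same gate can run in CI before a policy rollout.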
Common Pitfalls & Debugging
| Pitfall | Symptom | Fix |
|---|---|---|
| Cost-only optimization | quality collapse | enforce quality floor constraints |
| Route oscillation | unstable decisions | add hysteresis and smoothing windows |
| Poor fallback design | cascading failures | explicit fallback graph + circuit breaker |
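The hysteresis fix for route oscillation can be illustrated with a switching margin: the router keeps its current tier unless a challenger beats it by a clear gap. This is a minimal sketch under that assumption; `chooseWithHysteresis`, the score scale, and the 0.05 margin are hypothetical.

```typescript
// Margin value, score scale, and tier names are illustrative assumptions.
interface TierScore {
  tier: string;
  score: number; // higher is better, e.g. a combined quality/cost/latency score
}

// Keep the current tier unless the best challenger wins by at least `margin`.
function chooseWithHysteresis(
  current: string | null,
  scores: TierScore[],
  margin: number,
): string | null {
  let best: TierScore | null = null;
  let incumbent: TierScore | null = null;
  for (const s of scores) {
    if (best === null || s.score > best.score) best = s;
    if (s.tier === current) incumbent = s;
  }
  if (best === null) return current; // nothing to choose from
  if (incumbent === null) return best.tier; // no valid incumbent: take the best
  return best.score - incumbent.score >= margin ? best.tier : incumbent.tier;
}

// A small lead does not trigger a switch; a large one does.
console.log(
  chooseWithHysteresis("a", [{ tier: "a", score: 0.8 }, { tier: "b", score: 0.82 }], 0.05),
  chooseWithHysteresis("a", [{ tier: "a", score: 0.8 }, { tier: "b", score: 0.9 }], 0.05),
);
```

Feeding this function smoothed scores rather than per-request scores covers the "smoothing windows" half of the fix.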
Interview Questions They’ll Ask
- Why not one model for everything?
- How do you quantify route quality?
- How do you avoid routing thrash?
- How do you validate policy changes safely?
Hints in Layers
- Hint 1: Start with 2 tiers and static policies.
- Hint 2: Record every route rationale.
- Hint 3: Replay traffic before policy rollout.
- Hint 4: Add guardrails for minimum quality.
Submission / Completion Criteria
Minimum Completion
- Router selects among multiple tiers using an explicit policy
Full Completion
- Fallback resilience plus improvements validated by offline replay
Excellence
- Automated route tuning with regression-safe rollout