Project 18: Cost-Latency-Aware Model Router

Build a model routing layer that chooses an inference strategy for each task based on quality, latency, and budget constraints.

Quick Reference

Attribute	Value
Difficulty	Level 3: Advanced
Time Estimate	10-20 hours
Language	TypeScript (alt: Python)
Prerequisites	Projects 9, 17
Key Topics	routing policy, objective functions, fallback systems

Learning Objectives

Classify tasks into route policy classes.
Optimize model choice using multi-objective constraints.
Handle outages and regressions with deterministic fallbacks.
Evaluate router outcomes against quality and cost baselines.

The Core Question You’re Answering

“How do you avoid paying premium-model prices for every step while keeping quality stable?”

Concepts You Must Understand First

Concept	Why It Matters	Where to Learn
Multi-objective routing	Balance conflicting SLOs	optimization references
Route explainability	Required for debugging and governance	policy engineering patterns
Drift monitoring	Keeps routing decisions current	telemetry + eval integration

Theoretical Foundation

Task Class + Policy Budget + Historical Quality -> Router -> Model Tier -> Outcome Feedback

Routing is a closed-loop control problem, not a static configuration: outcome feedback from each routed task should inform later routing decisions.

Project Specification

What You’ll Build

A router that:

Supports 3+ model tiers
Uses class-specific latency/cost/quality thresholds
Logs route rationale and fallback events
Learns from weekly eval refreshes

Functional Requirements

Task classification
Rule-based or score-based routing decision
Fallback cascade and circuit breaker
Route decision artifact storage
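
One way to read the routing-decision requirement is as constrained selection: discard tiers that violate the class policy, then pick the cheapest survivor and keep the rejection reasons as the route rationale. The sketch below assumes that reading. The type names, the `small-draft` and `premium-reasoner` tiers, all of their numbers, and the 0.75 quality floor are illustrative assumptions; only `fast-reasoner-mini` and its estimates come from the example run in this document.

```typescript
// Hypothetical per-tier statistics; in a real router these come from evals and telemetry.
interface TierStats {
  name: string;
  qualityEst: number; // estimated quality score in [0, 1]
  p95LatencySec: number;
  costPerCallUsd: number;
}

// Hypothetical per-class policy thresholds.
interface ClassPolicy {
  minQuality: number;
  maxP95LatencySec: number;
  maxCostUsd: number;
}

interface Rejection {
  tier: string;
  reason: string;
}

interface RouteDecision {
  selected: string | null; // null means no tier satisfies the policy
  rejected: Rejection[]; // rationale for every tier that was ruled out
}

function selectTier(tiers: TierStats[], policy: ClassPolicy): RouteDecision {
  const rejected: Rejection[] = [];
  const feasible: TierStats[] = [];
  for (const t of tiers) {
    if (t.qualityEst < policy.minQuality) {
      rejected.push({ tier: t.name, reason: "below quality floor" });
    } else if (t.p95LatencySec > policy.maxP95LatencySec) {
      rejected.push({ tier: t.name, reason: "over latency budget" });
    } else if (t.costPerCallUsd > policy.maxCostUsd) {
      rejected.push({ tier: t.name, reason: "over cost budget" });
    } else {
      feasible.push(t);
    }
  }
  // Cheapest feasible tier wins; ties are broken by higher estimated quality.
  feasible.sort(
    (a, b) => a.costPerCallUsd - b.costPerCallUsd || b.qualityEst - a.qualityEst
  );
  const best = feasible[0];
  return { selected: best ? best.name : null, rejected };
}

// Illustrative inputs (assumed values except for fast-reasoner-mini).
const exampleTiers: TierStats[] = [
  { name: "small-draft", qualityEst: 0.62, p95LatencySec: 0.9, costPerCallUsd: 0.002 },
  { name: "fast-reasoner-mini", qualityEst: 0.81, p95LatencySec: 2.4, costPerCallUsd: 0.013 },
  { name: "premium-reasoner", qualityEst: 0.93, p95LatencySec: 5.1, costPerCallUsd: 0.06 },
];
const examplePolicy: ClassPolicy = { minQuality: 0.75, maxP95LatencySec: 3, maxCostUsd: 0.02 };

console.log(selectTier(exampleTiers, examplePolicy));
```

Treating quality as a hard floor rather than one term in a weighted score keeps a cheap tier from winning on price alone. A weighted score is an equally valid design if the thresholds are soft.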

Non-Functional Requirements

Stable behavior under traffic spikes
Fast rollback for bad route policies
Reproducible offline replay

Real World Outcome

$ node p18_router.js --task "draft customer response with citations"
[class] support_with_references
[policy] max_cost=$0.02 p95<3s
[selected] fast-reasoner-mini
[quality_est] 0.81 [latency_est] 2.4s [cost_est] $0.013
[artifact] route_decision.json
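
The transcript above does not show what route_decision.json contains. A minimal sketch of one possible shape follows, assuming the artifact simply mirrors the printed fields. Every field name here is hypothetical, and the timestamp is injected rather than read from the system clock so that offline replay stays reproducible.

```typescript
// Hypothetical artifact schema; the field names are assumptions, not a fixed format.
interface RouteDecisionArtifact {
  task: string;
  taskClass: string;
  policy: { maxCostUsd: number; maxP95LatencySec: number };
  selected: string;
  estimates: { quality: number; latencySec: number; costUsd: number };
  fallbackEvents: string[]; // empty when the first-choice tier served the request
  decidedAt: string; // ISO 8601 timestamp
}

// The clock is passed in so replays can reuse the original decision time.
function makeArtifact(
  input: Omit<RouteDecisionArtifact, "decidedAt">,
  now: Date
): RouteDecisionArtifact {
  return { ...input, decidedAt: now.toISOString() };
}

function serializeArtifact(artifact: RouteDecisionArtifact): string {
  return JSON.stringify(artifact, null, 2);
}

// Values copied from the example run above.
const exampleArtifactInput: Omit<RouteDecisionArtifact, "decidedAt"> = {
  task: "draft customer response with citations",
  taskClass: "support_with_references",
  policy: { maxCostUsd: 0.02, maxP95LatencySec: 3 },
  selected: "fast-reasoner-mini",
  estimates: { quality: 0.81, latencySec: 2.4, costUsd: 0.013 },
  fallbackEvents: [],
};

console.log(serializeArtifact(makeArtifact(exampleArtifactInput, new Date())));
```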

Architecture Overview

Task Classifier -> Policy Engine -> Router -> Model Adapter -> Feedback Store

Implementation Guide

Phase 1: Static Routing Rules

Define task classes and the latency, cost, and quality thresholds for each class.
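
As a starting point for this phase, the sketch below pairs a keyword-based classifier with a static class-to-tier table. It assumes rule-based classification is good enough for a first pass. The keyword lists, the `code_generation` and `general` classes, and the `premium-reasoner` and `small-draft` tier names are made up for illustration; `support_with_references`, `fast-reasoner-mini`, and its two thresholds come from the example run in this document.

```typescript
type TaskClass = "support_with_references" | "code_generation" | "general";

interface ClassRule {
  taskClass: TaskClass;
  keywords: string[]; // a task matches if it contains any keyword
}

// Hypothetical keyword rules; the first matching rule wins.
const CLASS_RULES: ClassRule[] = [
  { taskClass: "support_with_references", keywords: ["citation", "reference", "customer"] },
  { taskClass: "code_generation", keywords: ["refactor", "function", "unit test"] },
];

function classifyTask(task: string): TaskClass {
  const text = task.toLowerCase();
  for (const rule of CLASS_RULES) {
    if (rule.keywords.some((k) => text.indexOf(k) !== -1)) {
      return rule.taskClass;
    }
  }
  return "general"; // default class when no rule matches
}

interface StaticRoute {
  tier: string;
  maxCostUsd: number;
  maxP95LatencySec: number;
}

// Static class-to-tier table; all values except the first row are assumptions.
const STATIC_ROUTES: Record<TaskClass, StaticRoute> = {
  support_with_references: { tier: "fast-reasoner-mini", maxCostUsd: 0.02, maxP95LatencySec: 3 },
  code_generation: { tier: "premium-reasoner", maxCostUsd: 0.1, maxP95LatencySec: 10 },
  general: { tier: "small-draft", maxCostUsd: 0.005, maxP95LatencySec: 2 },
};

function routeStatic(task: string): { taskClass: TaskClass; route: StaticRoute } {
  const taskClass = classifyTask(task);
  return { taskClass, route: STATIC_ROUTES[taskClass] };
}

console.log(routeStatic("draft customer response with citations"));
```

Typing the table as `Record<TaskClass, StaticRoute>` makes the compiler reject a new class that has no route, which is a cheap guard while the class list is still changing.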

Phase 2: Fallback and Circuit Breakers

Add a fallback cascade and circuit breakers so routing stays resilient during provider outages.
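
A minimal sketch of this phase, under simplifying assumptions: model calls are synchronous functions that throw on failure (a real adapter would be async), the breaker opens after a fixed number of consecutive failures, and one trial call is allowed after a cooldown. `CircuitBreaker`, `TierEntry`, and `callWithFallback` are hypothetical names, not part of any library.

```typescript
class CircuitBreaker {
  private failures = 0;
  private openedAtMs: number | null = null;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;

  constructor(failureThreshold: number, cooldownMs: number) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
  }

  // Closed breakers are always available; open ones allow a trial call after the cooldown.
  isAvailable(nowMs: number): boolean {
    if (this.openedAtMs === null) return true;
    return nowMs - this.openedAtMs >= this.cooldownMs;
  }

  recordSuccess(): void {
    this.failures = 0;
    this.openedAtMs = null;
  }

  // Opens (or re-opens) the breaker once consecutive failures reach the threshold.
  recordFailure(nowMs: number): void {
    this.failures += 1;
    if (this.failures >= this.failureThreshold) this.openedAtMs = nowMs;
  }
}

// Synchronous stand-in for a model adapter; throws on provider failure.
type ModelCall = (prompt: string) => string;

interface TierEntry {
  name: string;
  call: ModelCall;
  breaker: CircuitBreaker;
}

interface CascadeResult {
  tier: string;
  output: string;
  events: string[]; // fallback events, suitable for the route decision log
}

// Tries tiers in a fixed order, so the same failures always produce the same route.
function callWithFallback(cascade: TierEntry[], prompt: string, nowMs: number): CascadeResult {
  const events: string[] = [];
  for (const entry of cascade) {
    if (!entry.breaker.isAvailable(nowMs)) {
      events.push(`${entry.name}: skipped (circuit open)`);
      continue;
    }
    try {
      const output = entry.call(prompt);
      entry.breaker.recordSuccess();
      return { tier: entry.name, output, events };
    } catch (err) {
      entry.breaker.recordFailure(nowMs);
      const reason = err instanceof Error ? err.message : String(err);
      events.push(`${entry.name}: failed (${reason}), falling back`);
    }
  }
  throw new Error("all tiers unavailable: " + events.join("; "));
}
```

Passing `nowMs` in explicitly, instead of calling `Date.now()` inside the breaker, keeps failure-injection tests and offline replay deterministic.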

Phase 3: Feedback Loop

Tune routes periodically using eval results and telemetry.
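
One simple way to fold a periodic eval refresh into the router is an exponential moving average over each (class, tier) quality estimate. The sketch below assumes that approach; it is one option among several, and `QualityEstimate`, `updateEstimate`, `belowFloor`, and the smoothing factor `alpha` are hypothetical names chosen for illustration.

```typescript
interface QualityEstimate {
  taskClass: string;
  tier: string;
  quality: number; // current smoothed estimate in [0, 1]
  samples: number; // total eval samples folded in so far
}

// Blends the mean of a new eval batch into the previous estimate.
// alpha near 1 reacts quickly to new evals; alpha near 0 changes slowly.
function updateEstimate(
  prev: QualityEstimate,
  evalScores: number[],
  alpha: number
): QualityEstimate {
  if (evalScores.length === 0) return prev; // nothing new to learn from
  const batchMean = evalScores.reduce((sum, x) => sum + x, 0) / evalScores.length;
  return {
    ...prev,
    quality: (1 - alpha) * prev.quality + alpha * batchMean,
    samples: prev.samples + evalScores.length,
  };
}

// Flags a (class, tier) pair whose estimate has drifted below the class quality floor.
function belowFloor(estimate: QualityEstimate, minQuality: number): boolean {
  return estimate.quality < minQuality;
}
```

For example, starting from a quality estimate of 0.8, a batch averaging 0.6 with `alpha = 0.5` moves the estimate to roughly 0.7, which would fall below a 0.75 floor and make the tier a candidate for demotion in that class.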

Testing Strategy

Replay tests on historical workloads
Failure injection for provider outage
Quality regression checks against gold set
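
A replay test can be sketched as a pure function over recorded outcomes: run a candidate policy over a historical workload, then compare mean quality and total cost against a baseline policy. The harness below assumes each record already carries per-tier outcomes from logs or offline evals. All type and function names, and the `maxQualityDrop` tolerance, are hypothetical.

```typescript
interface TierOutcome {
  quality: number;
  costUsd: number;
}

interface ReplayRecord {
  taskClass: string;
  // Recorded or offline-evaluated outcome for each tier on this request.
  outcomes: Record<string, TierOutcome>;
}

// A policy is reduced to "class in, tier out" so replay stays deterministic.
type RoutePolicy = (taskClass: string) => string;

interface ReplaySummary {
  meanQuality: number;
  totalCostUsd: number;
}

function replay(workload: ReplayRecord[], policy: RoutePolicy): ReplaySummary {
  let qualitySum = 0;
  let costSum = 0;
  for (const rec of workload) {
    const tier = policy(rec.taskClass);
    const outcome = rec.outcomes[tier];
    if (outcome === undefined) {
      throw new Error(`no recorded outcome for tier ${tier}`);
    }
    qualitySum += outcome.quality;
    costSum += outcome.costUsd;
  }
  return {
    meanQuality: workload.length > 0 ? qualitySum / workload.length : 0,
    totalCostUsd: costSum,
  };
}

// A candidate passes if it costs no more than the baseline and loses at most
// maxQualityDrop of mean quality.
function passesGate(
  candidate: ReplaySummary,
  baseline: ReplaySummary,
  maxQualityDrop: number
): boolean {
  return (
    candidate.meanQuality >= baseline.meanQuality - maxQualityDrop &&
    candidate.totalCostUsd <= baseline.totalCostUsd
  );
}
```

Running the same gate in CI before a policy rollout gives the regression check a concrete pass/fail condition.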

Common Pitfalls & Debugging

Pitfall	Symptom	Fix
Cost-only optimization	quality collapse	enforce quality floor constraints
Route oscillation	unstable decisions	add hysteresis and smoothing windows
Poor fallback design	cascading failures	explicit fallback graph + circuit breaker
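
The route-oscillation fix in the table can be sketched as two small pieces: a moving average that smooths noisy per-tier scores, and a switch rule that keeps the incumbent tier unless a challenger beats it by a margin. This is an illustrative sketch; `movingAverage`, `chooseWithHysteresis`, and the `margin` parameter are assumed names, and how the margin is tuned is left open.

```typescript
// Averages the most recent windowSize samples; returns 0 when there is no data yet.
function movingAverage(samples: number[], windowSize: number): number {
  const recent = samples.slice(-windowSize);
  if (recent.length === 0) return 0;
  return recent.reduce((sum, x) => sum + x, 0) / recent.length;
}

interface TierScore {
  tier: string;
  score: number; // smoothed score, higher is better
}

// Keeps the current tier unless some challenger exceeds its score by more than margin.
// If several challengers clear the bar, the highest-scoring one is chosen.
function chooseWithHysteresis(current: string, scores: TierScore[], margin: number): string {
  let incumbentScore = -Infinity; // an unscored incumbent loses to any challenger
  for (const s of scores) {
    if (s.tier === current) incumbentScore = s.score;
  }
  let choice = current;
  let bar = incumbentScore + margin;
  for (const s of scores) {
    if (s.tier !== current && s.score > bar) {
      choice = s.tier;
      bar = s.score;
    }
  }
  return choice;
}
```

With a margin of 0.05, a challenger that alternates between 0.82 and 0.78 against an incumbent at 0.80 never triggers a switch, while a sustained 0.90 does.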

Interview Questions They’ll Ask

Why not one model for everything?
How do you quantify route quality?
How do you avoid routing thrash?
How do you validate policy changes safely?

Hints in Layers

Hint 1: Start with 2 tiers and static policies.
Hint 2: Record every route rationale.
Hint 3: Replay traffic before policy rollout.
Hint 4: Add guardrails for minimum quality.

Submission / Completion Criteria

Minimum Completion

Router selects between multiple tiers with explicit policy

Full Completion

Fallback resilience + replay-validated improvements

Excellence

Automated route tuning with regression-safe rollout