Project 18: Cost-Latency-Aware Model Router

Build a model routing layer that chooses an inference strategy for each task based on quality, latency, and budget constraints.


Quick Reference

Attribute        Value
Difficulty       Level 3: Advanced
Time Estimate    10-20 hours
Language         TypeScript (alt: Python)
Prerequisites    Projects 9, 17
Key Topics       routing policy, objective functions, fallback systems

Learning Objectives

  1. Classify tasks into route policy classes.
  2. Optimize model choice using multi-objective constraints.
  3. Handle outages and regressions with deterministic fallbacks.
  4. Evaluate router outcomes against quality and cost baselines.

The Core Question You’re Answering

“How do you avoid paying premium-model prices for every step while keeping quality stable?”


Concepts You Must Understand First

Concept                  Why It Matters                         Where to Learn
Multi-objective routing  Balance conflicting SLOs               optimization references
Route explainability     Required for debugging and governance  policy engineering patterns
Drift monitoring         Keeps routing decisions current        telemetry + eval integration

Theoretical Foundation

Task Class + Policy Budget + Historical Quality -> Router -> Model Tier -> Outcome Feedback

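Routing is a closed-loop control problem, not a static configuration: the outcome of each routed request feeds back into the quality history that informs later decisions.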


Project Specification

What You’ll Build

A router that:

  • Supports 3+ model tiers
  • Uses class-specific latency/cost/quality thresholds
  • Logs route rationale and fallback events
  • Learns from weekly eval refreshes
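One way to read the threshold requirement is as a feasibility filter followed by a cost minimization. The sketch below assumes exactly that; the `Tier`, `ClassPolicy`, and `selectTier` names, the tier figures for `small` and `premium`, and the `minQuality` value of 0.8 are all hypothetical and chosen only for illustration.

```typescript
// Hypothetical tier and policy shapes; none of these names come from the spec.
interface Tier {
  name: string;
  qualityEst: number;    // 0..1, e.g. from the latest eval refresh
  p95LatencySec: number; // estimated p95 latency
  costUsd: number;       // estimated cost per request
}

interface ClassPolicy {
  minQuality: number;
  maxP95LatencySec: number;
  maxCostUsd: number;
}

interface RouteDecision {
  selected: string | null; // null when no tier satisfies the policy
  rationale: string[];     // one entry per tier, plus the final outcome
}

// Keep only tiers that satisfy every threshold, then pick the cheapest one.
function selectTier(tiers: Tier[], policy: ClassPolicy): RouteDecision {
  const rationale: string[] = [];
  const feasible = tiers.filter((t) => {
    const ok =
      t.qualityEst >= policy.minQuality &&
      t.p95LatencySec <= policy.maxP95LatencySec &&
      t.costUsd <= policy.maxCostUsd;
    rationale.push(`${t.name}: ${ok ? "feasible" : "rejected"}`);
    return ok;
  });
  if (feasible.length === 0) {
    rationale.push("no tier satisfies the policy");
    return { selected: null, rationale };
  }
  const best = feasible.reduce((a, b) => (b.costUsd < a.costUsd ? b : a));
  rationale.push(`selected ${best.name} as cheapest feasible tier`);
  return { selected: best.name, rationale };
}

// Illustrative numbers only; minQuality is an assumed quality floor.
const exampleTiers: Tier[] = [
  { name: "small", qualityEst: 0.7, p95LatencySec: 0.8, costUsd: 0.002 },
  { name: "fast-reasoner-mini", qualityEst: 0.81, p95LatencySec: 2.4, costUsd: 0.013 },
  { name: "premium", qualityEst: 0.93, p95LatencySec: 5.0, costUsd: 0.06 },
];
const supportPolicy: ClassPolicy = { minQuality: 0.8, maxP95LatencySec: 3, maxCostUsd: 0.02 };
const exampleDecision = selectTier(exampleTiers, supportPolicy);
```

With these figures, `small` misses the assumed quality floor and `premium` exceeds both the latency and cost limits, so the middle tier is selected. Returning the rationale alongside the choice keeps the decision explainable without a second lookup.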

Functional Requirements

  1. Task classification
  2. Rule-based or score-based routing decision
  3. Fallback cascade and circuit breaker
  4. Route decision artifact storage
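The first requirement, task classification, can start as plain keyword rules before any learned classifier is introduced. The sketch below assumes that approach; the class names other than `support_with_references`, the keyword lists, and the `classifyTask` helper are hypothetical.

```typescript
// Hypothetical class names and keyword rules, for illustration only.
type TaskClass = "support_with_references" | "code_generation" | "general";

interface ClassRule {
  taskClass: TaskClass;
  keywords: string[]; // every keyword must appear in the lower-cased task text
}

const CLASS_RULES: ClassRule[] = [
  { taskClass: "support_with_references", keywords: ["customer", "citations"] },
  { taskClass: "code_generation", keywords: ["implement"] },
];

// First matching rule wins; unmatched tasks fall into "general".
function classifyTask(task: string): TaskClass {
  const text = task.toLowerCase();
  for (const rule of CLASS_RULES) {
    if (rule.keywords.every((k) => text.indexOf(k) !== -1)) {
      return rule.taskClass;
    }
  }
  return "general";
}
```

Because rules are checked in array order, the classifier is deterministic and easy to replay, which matters once route decisions are stored as artifacts.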

Non-Functional Requirements

  • Stable behavior under traffic spikes
  • Fast rollback for bad route policies
  • Reproducible offline replay

Real World Outcome

$ node p18_router.js --task "draft customer response with citations"
[class] support_with_references
[policy] max_cost=$0.02 p95<3s
[selected] fast-reasoner-mini
[quality_est] 0.81 [latency_est] 2.4s [cost_est] $0.013
[artifact] route_decision.json
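The example run names `route_decision.json` but does not define its contents. The sketch below shows one possible shape, assuming the artifact simply mirrors the logged fields; the `RouteDecisionArtifact` interface and all of its field names are hypothetical.

```typescript
// Hypothetical schema: the example output names route_decision.json
// but does not define its fields.
interface RouteDecisionArtifact {
  task: string;
  taskClass: string;
  policy: { maxCostUsd: number; maxP95LatencySec: number };
  selected: string;
  estimates: { quality: number; latencySec: number; costUsd: number };
  fallbackEvents: string[]; // empty when the first choice succeeded
}

// Values copied from the example run above.
const artifact: RouteDecisionArtifact = {
  task: "draft customer response with citations",
  taskClass: "support_with_references",
  policy: { maxCostUsd: 0.02, maxP95LatencySec: 3 },
  selected: "fast-reasoner-mini",
  estimates: { quality: 0.81, latencySec: 2.4, costUsd: 0.013 },
  fallbackEvents: [],
};

// Pretty-printed JSON, ready to be written to route_decision.json.
const serialized: string = JSON.stringify(artifact, null, 2);
```

Storing the policy next to the estimates lets an offline replay check whether the decision was consistent with the thresholds in force at the time.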

Architecture Overview

Task Classifier -> Policy Engine -> Router -> Model Adapter -> Feedback Store

Implementation Guide

Phase 1: Static Routing Rules

  • Define task classes and per-class latency, cost, and quality thresholds.
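A static rule table is enough for this phase. The sketch below assumes two tiers and a lookup keyed by task class; the tier names, the `summarization` class, the threshold values, and the `ruleFor` helper are hypothetical.

```typescript
// Hypothetical two-tier setup with one static rule per task class.
type TierName = "small" | "premium";

interface StaticRule {
  tier: TierName;
  maxCostUsd: number;
  maxP95LatencySec: number;
}

// Unknown classes go to the premium tier so quality fails safe (an assumed default).
const DEFAULT_RULE: StaticRule = { tier: "premium", maxCostUsd: 0.1, maxP95LatencySec: 10 };

const STATIC_RULES: { [taskClass: string]: StaticRule } = {
  support_with_references: { tier: "small", maxCostUsd: 0.02, maxP95LatencySec: 3 },
  summarization: { tier: "small", maxCostUsd: 0.01, maxP95LatencySec: 2 },
};

// Look up the rule for a class, ignoring inherited keys such as "toString".
function ruleFor(taskClass: string): StaticRule {
  const known = Object.prototype.hasOwnProperty.call(STATIC_RULES, taskClass);
  const rule = known ? STATIC_RULES[taskClass] : undefined;
  return rule ?? DEFAULT_RULE;
}
```

Keeping the table as data rather than branching code makes a bad policy easy to roll back: restoring the previous table restores the previous behavior.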

Phase 2: Fallback and Circuit Breakers

  • Add a fallback cascade and circuit breakers so routing stays available during provider outages.
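A minimal sketch of this phase, assuming a consecutive-failure circuit breaker and an ordered fallback list. The `CircuitBreaker` class, the `routeWithFallback` function, and the synchronous `ModelCall` stand-in are hypothetical; a real adapter would be asynchronous. Time is passed in explicitly so the behavior is reproducible in tests.

```typescript
// Opens after `threshold` consecutive failures; permits a trial call once
// `cooldownMs` has passed since the breaker opened.
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly threshold: number;
  private readonly cooldownMs: number;

  constructor(threshold: number, cooldownMs: number) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
  }

  canCall(now: number): boolean {
    if (this.openedAt === null) return true;
    return now - this.openedAt >= this.cooldownMs;
  }

  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure(now: number): void {
    this.failures += 1;
    if (this.failures >= this.threshold) this.openedAt = now;
  }
}

type ModelCall = (task: string) => string; // synchronous stand-in for a provider call

interface TierEntry {
  name: string;
  call: ModelCall;
  breaker: CircuitBreaker;
}

interface FallbackResult {
  tier: string;
  output: string;
  events: string[]; // fallback events to log with the route decision
}

// Try tiers in a fixed order, skipping any whose breaker is open.
function routeWithFallback(task: string, cascade: TierEntry[], now: number): FallbackResult {
  const events: string[] = [];
  for (const entry of cascade) {
    if (!entry.breaker.canCall(now)) {
      events.push(`${entry.name}: skipped (circuit open)`);
      continue;
    }
    try {
      const output = entry.call(task);
      entry.breaker.recordSuccess();
      return { tier: entry.name, output, events };
    } catch {
      entry.breaker.recordFailure(now);
      events.push(`${entry.name}: failed, falling back`);
    }
  }
  throw new Error("all tiers unavailable: " + events.join("; "));
}

// Simulated outage: the primary always fails, the backup always succeeds.
const failingPrimary: ModelCall = () => { throw new Error("provider outage"); };
const healthyBackup: ModelCall = (task) => `backup handled: ${task}`;
const cascade: TierEntry[] = [
  { name: "primary", call: failingPrimary, breaker: new CircuitBreaker(2, 1000) },
  { name: "backup", call: healthyBackup, breaker: new CircuitBreaker(2, 1000) },
];
const attempt1 = routeWithFallback("summarize ticket", cascade, 0);
const attempt2 = routeWithFallback("summarize ticket", cascade, 10);
const attempt3 = routeWithFallback("summarize ticket", cascade, 20);
```

In this run the first two attempts call the primary and fall back after it fails. By the third attempt the breaker has opened, so the primary is skipped without being called, which is what prevents a slow outage from adding latency to every request.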

Phase 3: Feedback Loop

  • Retune routes periodically using eval results and telemetry, for example on the weekly eval refresh.
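One simple way to fold a periodic eval refresh into the router's quality estimates is an exponential moving average. That choice is an assumption, not something the project prescribes, and the `refreshQuality` and `applyEvalRefresh` names and the `alpha` parameter are hypothetical.

```typescript
type QualityByTier = { [tier: string]: number };

// Blend the previous estimate with a fresh eval score.
// alpha near 1 trusts the new eval; alpha near 0 changes slowly.
function refreshQuality(prev: number, evalScore: number, alpha: number): number {
  return alpha * evalScore + (1 - alpha) * prev;
}

// Return updated estimates; tiers without a fresh score keep their old value.
function applyEvalRefresh(
  estimates: QualityByTier,
  evalScores: QualityByTier,
  alpha: number
): QualityByTier {
  const next: QualityByTier = {};
  for (const tier of Object.keys(estimates)) {
    const prev = estimates[tier];
    if (prev === undefined) continue;
    const score = evalScores[tier];
    next[tier] = score === undefined ? prev : refreshQuality(prev, score, alpha);
  }
  return next;
}

// Illustrative numbers: only "small" was re-evaluated this week.
const refreshed = applyEvalRefresh({ small: 0.8, premium: 0.9 }, { small: 0.6 }, 0.5);
```

Here the estimate for `small` moves halfway from 0.8 toward the new score of 0.6, while `premium` is unchanged. The function returns a new object instead of mutating the old one, so the previous estimates remain available for rollback.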

Testing Strategy

  • Replay tests on historical workloads
  • Failure injection for provider outage
  • Quality regression checks against gold set
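A replay test can be sketched as a pure function over logged records, which also keeps it reproducible offline. The version below assumes each historical record stores per-tier outcomes, for example from earlier eval runs; the `ReplayRecord` shape, the `replay` function, and the sample workload are hypothetical.

```typescript
// Hypothetical replay record: per-tier outcomes logged for one historical task.
interface TierOutcome {
  quality: number;
  costUsd: number;
}

interface ReplayRecord {
  taskClass: string;
  outcomes: { [tier: string]: TierOutcome };
}

type RoutePolicy = (taskClass: string) => string; // returns a tier name

interface ReplaySummary {
  totalCostUsd: number;
  meanQuality: number;
  floorViolations: number; // records whose quality fell below the floor
}

// Run a policy over the workload and aggregate cost and quality.
function replay(workload: ReplayRecord[], policy: RoutePolicy, qualityFloor: number): ReplaySummary {
  let totalCostUsd = 0;
  let qualitySum = 0;
  let floorViolations = 0;
  let counted = 0;
  for (const record of workload) {
    const outcome = record.outcomes[policy(record.taskClass)];
    if (outcome === undefined) continue; // no logged outcome for that tier
    totalCostUsd += outcome.costUsd;
    qualitySum += outcome.quality;
    if (outcome.quality < qualityFloor) floorViolations += 1;
    counted += 1;
  }
  const meanQuality = counted === 0 ? 0 : qualitySum / counted;
  return { totalCostUsd, meanQuality, floorViolations };
}

// Illustrative workload with made-up numbers.
const sampleWorkload: ReplayRecord[] = [
  { taskClass: "simple", outcomes: { small: { quality: 0.85, costUsd: 0.002 }, premium: { quality: 0.9, costUsd: 0.06 } } },
  { taskClass: "hard", outcomes: { small: { quality: 0.5, costUsd: 0.002 }, premium: { quality: 0.9, costUsd: 0.06 } } },
];
const baselineRun = replay(sampleWorkload, () => "premium", 0.8);
const candidateRun = replay(sampleWorkload, (c) => (c === "simple" ? "small" : "premium"), 0.8);
```

Comparing `candidateRun` with `baselineRun` gives the cost saving and the quality change of a proposed policy before it sees live traffic. A policy that sent every task to `small` would show up immediately as a floor violation on the `hard` record.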

Common Pitfalls & Debugging

Pitfall                 Symptom             Fix
Cost-only optimization  quality collapse    enforce quality floor constraints
Route oscillation       unstable decisions  add hysteresis and smoothing windows
Poor fallback design    cascading failures  explicit fallback graph + circuit breaker
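The hysteresis fix for route oscillation can be sketched as a switching margin: the router leaves its current tier only when a challenger's score beats it by a clear gap. The `chooseWithHysteresis` function, the score values, and the 0.05 margin are hypothetical, and smoothing windows are not covered here.

```typescript
type ScoreByTier = { [tier: string]: number };

// Stay on `current` unless another tier's score exceeds it by more than `margin`.
// If several challengers clear the bar, the highest-scoring one wins.
function chooseWithHysteresis(current: string, scores: ScoreByTier, margin: number): string {
  const currentScore = scores[current];
  let choice = current;
  // A current tier with no score can be replaced by any scored tier.
  let bar = currentScore === undefined ? -Infinity : currentScore + margin;
  for (const tier of Object.keys(scores)) {
    const s = scores[tier];
    if (tier !== current && s !== undefined && s > bar) {
      choice = tier;
      bar = s;
    }
  }
  return choice;
}

// Small score differences do not cause a switch; a clear gap does.
const staysPut = chooseWithHysteresis("a", { a: 0.8, b: 0.82 }, 0.05);
const switches = chooseWithHysteresis("a", { a: 0.8, b: 0.9 }, 0.05);
```

With a margin of 0.05, a challenger at 0.82 does not displace the current tier at 0.80, while one at 0.90 does. The margin trades responsiveness for stability, so it is worth tuning against replayed traffic.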

Interview Questions They’ll Ask

  1. Why not one model for everything?
  2. How do you quantify route quality?
  3. How do you avoid routing thrash?
  4. How do you validate policy changes safely?

Hints in Layers

  • Hint 1: Start with 2 tiers and static policies.
  • Hint 2: Record every route rationale.
  • Hint 3: Replay traffic before policy rollout.
  • Hint 4: Add guardrails for minimum quality.

Submission / Completion Criteria

Minimum Completion

  • Router selects between multiple tiers with explicit policy

Full Completion

  • Fallback resilience + replay-validated improvements

Excellence

  • Automated route tuning with regression-safe rollout