P04: Multi-Provider Model Router
P04: Multi-Provider Model Router
Build a smart API gateway that dynamically routes prompts to the optimal LLM (GPT-4 for reasoning, Claude for long context, Gemini for vision) based on task analysis, with automatic fallback handling, cost tracking, and a real-time dashboard.
Overview
| Attribute | Value |
|---|---|
| Difficulty | Intermediate |
| Time Estimate | 1-2 weeks |
| Language | TypeScript (recommended), Python, Go |
| Prerequisites | AI SDK basics (Projects 1-3), Multiple API keys (OpenAI, Anthropic, Google) |
| Primary Book | โDesigning Data-Intensive Applicationsโ by Martin Kleppmann |
Learning Objectives
By completing this project, you will:
- Master provider abstraction - Understand how the AI SDK normalizes different provider APIs into a unified interface
- Implement intelligent routing - Build a task classifier that determines the optimal model for each request
- Build resilient fallback chains - Create fault-tolerant systems that gracefully degrade when providers fail
- Design cost optimization strategies - Route simple tasks to cheaper models while preserving quality for complex ones
- Implement production telemetry - Track token usage, latency, costs, and success rates across providers
- Understand rate limiting - Handle quota exhaustion and implement backoff strategies
- Build real-time observability - Create a dashboard showing routing decisions and system health
Theoretical Foundation
Part 1: Provider Abstraction Pattern
The core insight of the AI SDK is that despite surface differences, all LLM providers do fundamentally the same thing: accept a prompt and return a response. The SDK exploits this commonality.
YOUR APPLICATION
|
v
+---------------------------+
| AI SDK Unified API |
| |
| generateText() |
| generateObject() |
| streamText() |
+-------------+-------------+
|
+-------------+-------------+
| Provider Adapter Layer |
| |
| Normalizes: |
| - Authentication |
| - Request format |
| - Response structure |
| - Error types |
| - Token counting |
+--+-------+-------+-------++
| | | |
v v v v
+------+ +------+ +------+ +------+
|OpenAI| |Claude| |Gemini| |Cohere|
+------+ +------+ +------+ +------+
What the abstraction normalizes:
| Aspect | OpenAI Format | Anthropic Format | AI SDK Unified |
|---|---|---|---|
| Model ID | gpt-4-turbo |
claude-3-opus-20240229 |
openai('gpt-4-turbo') or anthropic('claude-3-opus') |
| System Message | messages[0].role = 'system' |
Separate system parameter |
system: 'You are...' |
| Token Usage | usage.total_tokens |
usage.input_tokens + output_tokens |
usage.totalTokens |
| Streaming | SSE with data: [DONE] |
SSE with event: message_stop |
Unified async iterator |
Why this matters:
- Switch providers with a single line change
- Test against multiple providers without code changes
- Implement fallback chains trivially
- Compare provider performance objectively
Part 2: Model Capabilities Landscape
Different models excel at different tasks. Understanding these strengths is essential for intelligent routing.
CAPABILITY MATRIX (as of 2024)
Reasoning Context Vision Speed Cost
| | | | |
GPT-4 Turbo โโโโโโโโโ โโโโโโโ โโโโโโโ โโโโโ โโโโโ
GPT-4o โโโโโโโโโ โโโโโโโ โโโโโโโโ โโโโโโ โโโโโ
Claude 3 Opus โโโโโโโโโ โโโโโโโโ โโโโโโโโ โโโโโโ โโโโโ
Claude 3.5 Sonnet โโโโโโโโโ โโโโโโโโ โโโโโโโโ โโโโโโ โโโโโ
Gemini 1.5 Pro โโโโโโโโโ โโโโโโโโโ โโโโโโโโ โโโโโโ โโโโโ
GPT-3.5 Turbo โโโโโโโโโ โโโโโโโโ โโโโโโโ โโโโโโ โโโโโ
Legend: โ = Strong โ = Weak/None
Routing heuristics:
- Complex reasoning (math, logic puzzles) -> Claude Opus, GPT-4
- Long documents (100k+ tokens) -> Claude, Gemini 1.5 Pro
- Vision tasks (image analysis) -> GPT-4o, Gemini Pro Vision
- Simple tasks (classification, formatting) -> GPT-3.5, Claude Haiku
- Cost-sensitive -> Haiku, GPT-3.5, Gemini Flash
Part 3: Fallback Chain Patterns
Production systems fail. Your router must handle:
- API rate limits
- Provider outages
- Model-specific errors
- Network timeouts
FALLBACK CHAIN ARCHITECTURE
Incoming Request
|
v
+------------------+
| Task Classifier | "What type of task is this?"
+--------+---------+
|
v
+------------------+
| Route to Primary | Based on task type
+--------+---------+
|
v
+--------+---------+
| Try Primary |
| (Claude Opus) |
+--------+---------+
|
Success? ------> Return response
|
No
|
v
+--------+---------+
| Try Secondary |
| (GPT-4 Turbo) |
+--------+---------+
|
Success? ------> Return response + log fallback
|
No
|
v
+--------+---------+
| Try Tertiary |
| (Gemini Pro) |
+--------+---------+
|
Success? ------> Return response + log degradation
|
No
|
v
+--------+---------+
| Graceful Failure |
| Return error + |
| retry guidance |
+------------------+
Fallback strategies:
| Strategy | Description | Use Case |
|---|---|---|
| Ordered Chain | Try providers in fixed priority order | General purpose |
| Capability Match | Find next provider with same capability | Vision, long context |
| Cost Escalation | Start cheap, escalate on failure | Cost-sensitive apps |
| Geographic | Route by latency/region | Global deployments |
Part 4: Cost Optimization
AI costs add up fast. Smart routing saves money.
COST PER 1M TOKENS (Input/Output)
+------------------+----------+-----------+
| Model | Input | Output |
+------------------+----------+-----------+
| GPT-4 Turbo | $10.00 | $30.00 |
| GPT-4o | $5.00 | $15.00 |
| GPT-3.5 Turbo | $0.50 | $1.50 |
| Claude 3 Opus | $15.00 | $75.00 |
| Claude 3 Sonnet | $3.00 | $15.00 |
| Claude 3 Haiku | $0.25 | $1.25 |
| Gemini 1.5 Pro | $3.50 | $10.50 |
| Gemini 1.5 Flash | $0.35 | $1.05 |
+------------------+----------+-----------+
COST OPTIMIZATION FLOW:
Request arrives
|
v
+--------------+ +--------------+
| Complexity | --> | Simple? | --> Use Haiku/3.5
| Analysis | | (< 50 words | (10-60x cheaper)
+--------------+ | response) |
+--------------+
|
v No
+--------------+
| Medium? | --> Use Sonnet/4o
| (analysis, | (3-5x cheaper)
| summaries) |
+--------------+
|
v No
+--------------+
| Complex? | --> Use Opus/GPT-4
| (reasoning, | (full capability)
| creativity) |
+--------------+
Part 5: Telemetry and Observability
You cannot optimize what you cannot measure. A production router needs:
TELEMETRY COLLECTION POINTS
Request Flow
|
v
+---+---+
| Entry | Record: timestamp, request_id, prompt_length
+---+---+
|
v
+---+---+
|Classify| Record: inferred_task_type, selected_provider
+---+---+
|
v
+---+---+
| Route | Record: primary_provider, fallback_triggered?
+---+---+
|
v
+---+---+
| LLM | Record: provider, model, latency_ms
+---+---+ input_tokens, output_tokens, cost
|
v
+---+---+
| Exit | Record: success/failure, total_latency,
+---+---+ error_type (if failed)
METRICS TO TRACK:
+----------------------------------+-------------------+
| Metric | Why |
+----------------------------------+-------------------+
| requests_per_provider | Usage distribution|
| latency_p50, p95, p99 | Performance SLAs |
| cost_per_provider | Budget tracking |
| fallback_rate | Reliability |
| error_rate_per_provider | Provider health |
| tokens_per_request | Usage patterns |
| cost_savings (vs single provider)| ROI justification |
+----------------------------------+-------------------+
Part 6: Rate Limiting and Quota Management
Every provider imposes limits. Your router must handle them gracefully.
RATE LIMIT HANDLING
+-------------------+ +-------------------+
| Request Arrives | -------> | Check Rate Limit |
+-------------------+ | State |
+--------+----------+
|
+-----------------------+-----------------------+
| | |
v v v
+-------+-------+ +-------+-------+ +-------+-------+
| Under Limit | | Near Limit | | Over Limit |
| (proceed) | | (warn + queue)| | (backoff) |
+---------------+ +---------------+ +-------+-------+
|
+---------------+---------------+
| |
v v
+-------+-------+ +-------+-------+
| Exponential | | Route to |
| Backoff | | Alternative |
+---------------+ +---------------+
RATE LIMIT TRACKING STRUCTURE:
{
"openai": {
"requests_per_minute": { "limit": 500, "used": 423, "reset_at": "..." },
"tokens_per_minute": { "limit": 90000, "used": 67500, "reset_at": "..." }
},
"anthropic": {
"requests_per_minute": { "limit": 1000, "used": 234, "reset_at": "..." },
"tokens_per_minute": { "limit": 100000, "used": 45000, "reset_at": "..." }
}
}
Project Specification
What Youโre Building
A REST API gateway that:
- Accepts prompts with optional capability hints
- Classifies the task type using an LLM
- Routes to the optimal provider based on task and cost constraints
- Implements fallback chains when providers fail
- Tracks all requests with detailed telemetry
- Exposes a dashboard showing routing decisions and costs
API Endpoints
POST /api/route
Request: {
prompt: string,
capability?: "reasoning" | "vision" | "long-context" | "fast" | "cheap",
maxCost?: number,
images?: string[] // base64 encoded for vision tasks
}
Response: {
response: string,
metadata: {
provider: string,
model: string,
latency_ms: number,
input_tokens: number,
output_tokens: number,
cost: number,
fallback_used: boolean,
original_provider?: string
}
}
GET /api/stats
Response: {
total_requests: number,
requests_by_provider: { [provider: string]: number },
total_cost: number,
cost_by_provider: { [provider: string]: number },
average_latency_ms: number,
fallback_rate: number,
error_rate: number
}
GET /api/health
Response: {
status: "healthy" | "degraded" | "unhealthy",
providers: {
[provider: string]: {
status: "up" | "down" | "rate_limited",
last_success: string,
error_rate_1h: number
}
}
}
Task Classification Schema
const TaskClassification = z.object({
taskType: z.enum([
'simple_qa', // Simple questions, lookups
'summarization', // Text summarization
'analysis', // Document/data analysis
'reasoning', // Logic, math, complex thinking
'creative', // Writing, brainstorming
'code', // Programming tasks
'vision', // Image understanding
'conversation' // Multi-turn chat
]),
complexity: z.enum(['simple', 'medium', 'complex']),
estimatedOutputTokens: z.number(),
requiresVision: z.boolean(),
requiresLongContext: z.boolean(),
reasoning: z.string() // Why this classification
});
Provider Configuration
interface ProviderConfig {
name: string;
models: {
primary: string;
fallback?: string;
};
capabilities: ('reasoning' | 'vision' | 'long-context' | 'fast' | 'cheap')[];
costPer1kTokens: {
input: number;
output: number;
};
rateLimits: {
requestsPerMinute: number;
tokensPerMinute: number;
};
enabled: boolean;
}
// Example configuration
const providers: ProviderConfig[] = [
{
name: 'anthropic',
models: { primary: 'claude-3-opus-20240229', fallback: 'claude-3-sonnet-20240229' },
capabilities: ['reasoning', 'long-context'],
costPer1kTokens: { input: 0.015, output: 0.075 },
rateLimits: { requestsPerMinute: 1000, tokensPerMinute: 100000 },
enabled: true
},
// ... more providers
];
Dashboard Requirements
- Real-time request count by provider (bar chart)
- Cumulative cost over time (line chart)
- Latency distribution histogram
- Fallback event timeline
- Provider health status indicators
- Recent routing decisions table
Real World Outcome
CLI Request Example
$ curl -X POST http://localhost:3000/api/route \
-H "Content-Type: application/json" \
-d '{
"prompt": "Explain the mathematical proof of the Pythagorean theorem using geometric reasoning",
"capability": "reasoning"
}'
{
"response": "The Pythagorean theorem states that in a right triangle, the square of the hypotenuse equals the sum of squares of the other two sides (a^2 + b^2 = c^2).\n\n**Geometric Proof by Rearrangement:**\n\n1. Consider a square with side length (a + b)...",
"metadata": {
"provider": "anthropic",
"model": "claude-3-opus-20240229",
"latency_ms": 2847,
"input_tokens": 24,
"output_tokens": 312,
"cost": 0.0239,
"fallback_used": false,
"task_classification": {
"taskType": "reasoning",
"complexity": "complex",
"estimatedOutputTokens": 300,
"requiresVision": false,
"requiresLongContext": false
}
}
}
Fallback Scenario
$ curl -X POST http://localhost:3000/api/route \
-H "Content-Type: application/json" \
-d '{
"prompt": "What is 2 + 2?",
"capability": "fast"
}'
# When primary provider (Claude Haiku) is rate-limited:
{
"response": "2 + 2 = 4",
"metadata": {
"provider": "openai",
"model": "gpt-3.5-turbo",
"latency_ms": 342,
"input_tokens": 8,
"output_tokens": 6,
"cost": 0.000013,
"fallback_used": true,
"original_provider": "anthropic",
"fallback_reason": "rate_limit_exceeded"
}
}
Stats Endpoint
$ curl http://localhost:3000/api/stats
{
"total_requests": 15847,
"requests_by_provider": {
"anthropic": 8234,
"openai": 5612,
"google": 2001
},
"total_cost": 127.45,
"cost_by_provider": {
"anthropic": 89.23,
"openai": 31.18,
"google": 7.04
},
"average_latency_ms": 1847,
"fallback_rate": 0.034,
"error_rate": 0.008,
"cost_savings_vs_opus_only": 412.67
}
Dashboard Mockup
+------------------------------------------------------------------+
| MODEL ROUTER DASHBOARD |
+------------------------------------------------------------------+
| |
| Provider Distribution (24h) Cost Over Time |
| +--------------------------+ +---------------------------+ |
| | Anthropic โโโโโโโโ 52%| | ___/ | |
| | OpenAI โโโโโ 35% | | ___/ | |
| | Google โโ 13% | | ___/ | |
| +--------------------------+ | ___/ | |
| | ___/ | |
| Latency Distribution +---------------------------+ |
| +---------------------------+ $0 $127 |
| | ___ | |
| | / \ | Provider Health |
| | / \__ | +------------------------+ |
| |___/ \____ | | Anthropic [OK] | |
| +---------------------------+ | OpenAI [OK] | |
| 0ms 500ms 1s 2s 5s | Google [DEGRADED]| |
| +------------------------+ |
| Recent Routing Decisions |
| +--------------------------------------------------------------+|
| | Time | Task Type | Provider | Fallback | Cost | Lat ||
| |----------|------------|-----------|----------|--------|------||
| | 14:23:01 | reasoning | anthropic | No | $0.024 | 2.8s ||
| | 14:22:58 | simple_qa | openai | No | $0.001 | 0.3s ||
| | 14:22:45 | vision | google | Yes* | $0.012 | 1.5s ||
| | 14:22:32 | code | anthropic | No | $0.018 | 2.1s ||
| +--------------------------------------------------------------+|
| * Original: openai (rate_limited) |
+------------------------------------------------------------------+
Solution Architecture
High-Level Architecture
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ API GATEWAY โ
โ โ
โโโโโโโโโโโโโ โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ Client โโโโโโโโโโโโโโโโ Request Handler โ โ
โ (curl/app)โ โ โโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโ โ
โโโโโโโโโโโโโ โ โ โ
โ โผ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ Task Classifier โ โ
โ โ (generateObject + schema) โ โ
โ โโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโ โ
โ โ โ
โ โผ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ Routing Engine โ โ
โ โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ โ
โ โ โ Capability Matcher โ โ โ
โ โ โ Cost Optimizer โ โ โ
โ โ โ Rate Limit Checker โ โ โ
โ โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ โ
โ โโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโ โ
โ โ โ
โ โผ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ Provider Executor โ โ
โ โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ โ
โ โ โ Fallback Chain Manager โ โ โ
โ โ โ Error Handler โ โ โ
โ โ โ Retry Logic โ โ โ
โ โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ โ
โ โโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโ โ
โ โ โ
โ โผ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ Telemetry Collector โ โ
โ โ (usage, cost, latency, errors) โ โ
โ โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ โ
โ โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ โ โ
โผ โผ โผ
โโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโ
โ OpenAI โ โ Anthropic โ โ Google โ
โ GPT-4/3.5 โ โ Claude โ โ Gemini โ
โโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโ โโโโโโโโโโโโโโโโโ
Task Classifier Design
The classifier uses generateObject to analyze the incoming prompt:
TASK CLASSIFIER FLOW
Input: "Explain the mathematical proof..."
โ
โผ
โโโโโโโโโโโโโโโโโโโโโโโโ
โ Classification โ
โ Prompt Template โ
โ โ
โ "Analyze this prompt โ
โ and determine: โ
โ - Task type โ
โ - Complexity โ
โ - Required caps โ
โ ..." โ
โโโโโโโโโโโโฌโโโโโโโโโโโโ
โ
โผ
โโโโโโโโโโโโโโโโโโโโโโโโ
โ generateObject() โ
โ + TaskSchema โ
โ โ
โ Uses fast model: โ
โ gpt-3.5 or haiku โ
โ (minimize cost) โ
โโโโโโโโโโโโฌโโโโโโโโโโโโ
โ
โผ
โโโโโโโโโโโโโโโโโโโโโโโโ
โ Classification โ
โ Result: โ
โ โ
โ taskType: reasoningโ
โ complexity: complexโ
โ vision: false โ
โ longContext: false โ
โโโโโโโโโโโโโโโโโโโโโโโโ
Fallback Chain Implementation
FALLBACK EXECUTION WITH CIRCUIT BREAKER
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Provider Chain โ
โ โ
โ โโโโโโโโโโโโ โโโโโโโโโโโโ โโโโโโโโโโโโ โโโโโโโโโโโ โ
โ โ Primary โโโโโถโSecondary โโโโโถโ Tertiary โโโโโถโ Error โ โ
โ โ Claude โ โ GPT-4 โ โ Gemini โ โ Handler โ โ
โ โโโโโโฌโโโโโโ โโโโโโฌโโโโโโ โโโโโโฌโโโโโโ โโโโโโโโโโโ โ
โ โ โ โ โ
โ โผ โผ โผ โ
โ โโโโโโโโโโโ โโโโโโโโโโโ โโโโโโโโโโโ โ
โ โ Circuit โ โ Circuit โ โ Circuit โ โ
โ โ Breaker โ โ Breaker โ โ Breaker โ โ
โ โ โ โ โ โ โ โ
โ โ CLOSED โ โ CLOSED โ โ HALF- โ โ
โ โ โ โ โ โ OPEN โ โ
โ โโโโโโโโโโโ โโโโโโโโโโโ โโโโโโโโโโโ โ
โ โ
โ Circuit States: โ
โ - CLOSED: Normal operation โ
โ - OPEN: Too many failures, skip provider โ
โ - HALF-OPEN: Testing if provider recovered โ
โ โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
File Structure
model-router/
โโโ src/
โ โโโ index.ts # Main application entry
โ โโโ server.ts # Express/Hono server setup
โ โ
โ โโโ classifier/
โ โ โโโ index.ts # Task classification logic
โ โ โโโ schema.ts # Zod schemas for classification
โ โ โโโ prompts.ts # Classification prompt templates
โ โ
โ โโโ router/
โ โ โโโ index.ts # Main routing logic
โ โ โโโ capabilities.ts # Capability matching
โ โ โโโ cost-optimizer.ts # Cost-based routing
โ โ โโโ rate-limiter.ts # Rate limit tracking
โ โ
โ โโโ providers/
โ โ โโโ index.ts # Provider registry
โ โ โโโ config.ts # Provider configurations
โ โ โโโ openai.ts # OpenAI adapter
โ โ โโโ anthropic.ts # Anthropic adapter
โ โ โโโ google.ts # Google AI adapter
โ โ
โ โโโ executor/
โ โ โโโ index.ts # Request execution
โ โ โโโ fallback-chain.ts # Fallback logic
โ โ โโโ circuit-breaker.ts # Circuit breaker pattern
โ โ โโโ retry.ts # Retry with backoff
โ โ
โ โโโ telemetry/
โ โ โโโ index.ts # Telemetry aggregation
โ โ โโโ metrics.ts # Metric definitions
โ โ โโโ cost-tracker.ts # Cost calculation
โ โ โโโ store.ts # In-memory metrics store
โ โ
โ โโโ api/
โ โ โโโ routes.ts # API route definitions
โ โ โโโ handlers.ts # Request handlers
โ โ โโโ middleware.ts # Auth, logging, etc.
โ โ
โ โโโ dashboard/
โ โโโ index.html # Dashboard UI
โ โโโ styles.css # Dashboard styles
โ โโโ charts.ts # Chart rendering
โ
โโโ tests/
โ โโโ classifier.test.ts # Classification tests
โ โโโ router.test.ts # Routing tests
โ โโโ fallback.test.ts # Fallback chain tests
โ โโโ cost.test.ts # Cost calculation tests
โ โโโ mocks/
โ โโโ providers.ts # Mock provider responses
โ โโโ fixtures.ts # Test data
โ
โโโ .env.example # Environment variables template
โโโ package.json
โโโ tsconfig.json
โโโ README.md
The Core Question Youโre Answering
โHow do I build a smart system that routes requests to the optimal LLM?โ
This question decomposes into:
- How do I classify tasks? What makes a prompt โneed reasoningโ vs โneed speedโ?
- How do I compare providers? What are the meaningful dimensions?
- How do I handle failures? What happens when the โbestโ provider is down?
- How do I track costs? Different pricing, different token counting.
- How do I prove it works? What metrics show the system is making good decisions?
Concepts You Must Understand First
Before writing code, ensure you understand:
| Concept | Why It Matters | Reference |
|---|---|---|
| Provider Abstraction | The SDKโs core value proposition | AI SDK Providers |
| Structured Output | Task classification requires typed results | AI SDK generateObject |
| Error Types | Different errors need different handling | AI SDK Error Handling |
| Circuit Breaker Pattern | Prevents cascading failures | โRelease It!โ Ch. 5 |
| Rate Limiting | Every API has limits | Provider documentation |
| Token Economics | Cost = f(input_tokens, output_tokens, model) | Provider pricing pages |
Questions to Guide Your Design
Before coding, answer these questions:
-
What classifier model should you use? The classifier runs on every request. Using GPT-4 to classify before routing to GPT-4 is wasteful. How do you keep classification cheap?
-
How granular should task types be? โreasoningโ is broad. โmathematical reasoningโ vs โethical reasoningโ might route differently. Whereโs the right balance?
-
What happens when all providers fail? Queue the request? Return an error? Cache a previous response?
-
How do you test routing logic? You canโt make real API calls in unit tests. How do you mock providers while testing real routing decisions?
-
How fresh should rate limit data be? Checking limits on every request adds latency. Caching limits risks going over. Whatโs the right strategy?
-
How do you handle provider-specific features? Claude has โsystemโ as a separate parameter. OpenAI uses it as the first message. Does your abstraction hide this or expose it?
Thinking Exercise
Before writing any code, complete this design exercise:
Scenario: Your router receives this request:
{
"prompt": "Analyze this 150-page quarterly report and identify the three most concerning financial trends. Here's the document: [150 pages of text]",
"capability": "analysis"
}
Questions to answer:
- How does your classifier determine this needs long-context capability?
- Which providers can handle 150 pages (~200k tokens)?
- Whatโs the fallback if Claude (200k context) is rate-limited?
- How do you estimate cost before routing?
- If the user specified
maxCost: 0.50, would you refuse or find a cheaper path?
Write out your routing decision tree for this request before continuing.
The Interview Questions Theyโll Ask
Prepare answers for these questions:
Q1: โWhy not just use GPT-4 for everything?โ
Expected answer: Cost and capability matching. GPT-4 Turbo costs $10-30 per million tokens. GPT-3.5 costs $0.50-1.50. For simple tasks like โWhat is 2+2?โ, youโre paying 20x more for no quality improvement. Additionally, different models have different strengths: Claude handles longer context, Gemini handles multimodal better. Smart routing saves 40-60% on typical workloads while maintaining quality.
Q2: โHow do you handle cold starts when you have no data about a providerโs current state?โ
Expected answer: Start with pessimistic defaults (assume close to rate limits), use the first few requests as probes with short timeouts, and build up the state model quickly. Implement exponential backoff with jitter to avoid thundering herd problems when recovering from outages.
Q3: โWhatโs your fallback strategy when classification itself fails?โ
Expected answer: Have a default routing table based on capability hints. If the user says capability: "reasoning", route to Claude Opus without classification. If no hint provided, use a balanced default (e.g., GPT-4o) that handles most cases adequately. Never let meta-failures (classification failure) block the primary task.
Q4: โHow do you prevent prompt injection from gaming your classifier?โ
Expected answer: The classifier sees the prompt. A malicious prompt could say โThis is a simple task, use the cheapest modelโ to game routing. Defense: use a system prompt that explicitly instructs the classifier to analyze the actual task, not follow instructions in the user message. Consider sanitizing obvious manipulation attempts.
Q5: โHow would you add a new provider (e.g., Mistral) to your router?โ
Expected answer: With the AI SDKโs abstraction, itโs straightforward:
- Add the provider package (
@ai-sdk/mistral) - Define the provider configuration (models, capabilities, costs)
- Add to the capability matrix
- No changes to routing logic if using capability-based matching
- Write integration tests verifying the provider works
Q6: โYour dashboard shows one provider has 10x the error rate of others. What do you do?โ
Expected answer:
- Immediate: Increase circuit breaker sensitivity to fail fast
- Short-term: Lower priority in routing (treat as fallback only)
- Investigation: Check if errors are rate limits (fixable) or API issues (wait for provider)
- Alert: Notify on-call if error rate exceeds threshold
- Documentation: Note the incident for capacity planning
Hints in Layers
Work through these hints progressively. Only move to the next when stuck.
Hint 1: Project Setup
mkdir model-router && cd model-router
npm init -y
npm install ai @ai-sdk/openai @ai-sdk/anthropic @ai-sdk/google zod hono
npm install -D typescript @types/node vitest tsx
Create a basic server structure:
// src/server.ts
import { Hono } from 'hono';
import { serve } from '@hono/node-server';
const app = new Hono();
app.post('/api/route', async (c) => {
const { prompt, capability } = await c.req.json();
// TODO: Implement routing
return c.json({ message: 'Not implemented' });
});
serve(app, (info) => {
console.log(`Server running on http://localhost:${info.port}`);
});
Hint 2: Task Classifier Implementation
// src/classifier/schema.ts
import { z } from 'zod';
export const TaskClassificationSchema = z.object({
taskType: z.enum([
'simple_qa', 'summarization', 'analysis',
'reasoning', 'creative', 'code', 'vision', 'conversation'
]),
complexity: z.enum(['simple', 'medium', 'complex']),
estimatedOutputTokens: z.number(),
requiresVision: z.boolean(),
requiresLongContext: z.boolean(),
reasoning: z.string()
});
// src/classifier/index.ts
import { generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { TaskClassificationSchema } from './schema';
export async function classifyTask(prompt: string, images?: string[]) {
const { object } = await generateObject({
model: openai('gpt-3.5-turbo'), // Use cheap model for classification
schema: TaskClassificationSchema,
prompt: `Analyze this prompt and classify it:
PROMPT: ${prompt}
Consider:
- What type of task is being requested?
- How complex is the expected response?
- Does it require vision capabilities? ${images ? 'Images are provided.' : 'No images provided.'}
- Does it require processing a very long context (100k+ tokens)?
Provide your classification.`
});
return object;
}
Hint 3: Provider Registry
// src/providers/config.ts
export interface ProviderConfig {
name: string;
createModel: (modelId: string) => LanguageModel;
models: {
default: string;
fast: string;
cheap: string;
};
capabilities: Set<string>;
costPer1kTokens: { input: number; output: number };
maxContextTokens: number;
}
export const providers: Record<string, ProviderConfig> = {
anthropic: {
name: 'anthropic',
createModel: (id) => anthropic(id),
models: {
default: 'claude-3-opus-20240229',
fast: 'claude-3-sonnet-20240229',
cheap: 'claude-3-haiku-20240307'
},
capabilities: new Set(['reasoning', 'long-context', 'code', 'creative']),
costPer1kTokens: { input: 0.015, output: 0.075 },
maxContextTokens: 200000
},
openai: {
name: 'openai',
createModel: (id) => openai(id),
models: {
default: 'gpt-4-turbo',
fast: 'gpt-4o',
cheap: 'gpt-3.5-turbo'
},
capabilities: new Set(['reasoning', 'vision', 'code', 'fast']),
costPer1kTokens: { input: 0.01, output: 0.03 },
maxContextTokens: 128000
},
google: {
name: 'google',
createModel: (id) => google(id),
models: {
default: 'gemini-1.5-pro',
fast: 'gemini-1.5-flash',
cheap: 'gemini-1.5-flash'
},
capabilities: new Set(['vision', 'long-context', 'fast', 'cheap']),
costPer1kTokens: { input: 0.0035, output: 0.0105 },
maxContextTokens: 1000000
}
};
Hint 4: Routing Engine
// src/router/index.ts
import { providers, ProviderConfig } from '../providers/config';
import { TaskClassification } from '../classifier/schema';
export interface RoutingDecision {
provider: ProviderConfig;
model: string;
fallbackChain: Array<{ provider: ProviderConfig; model: string }>;
reasoning: string;
}
export function routeTask(
classification: TaskClassification,
capability?: string,
maxCost?: number
): RoutingDecision {
// 1. Filter providers by required capabilities
let candidates = Object.values(providers).filter(p => {
if (classification.requiresVision && !p.capabilities.has('vision')) return false;
if (classification.requiresLongContext && p.maxContextTokens < 100000) return false;
if (capability && !p.capabilities.has(capability)) return false;
return true;
});
// 2. Sort by suitability
candidates.sort((a, b) => {
// Prefer cheaper for simple tasks
if (classification.complexity === 'simple') {
return a.costPer1kTokens.output - b.costPer1kTokens.output;
}
// Prefer capability match for complex tasks
if (capability && a.capabilities.has(capability) && !b.capabilities.has(capability)) {
return -1;
}
return 0;
});
// 3. Build fallback chain
const primary = candidates[0];
const fallbacks = candidates.slice(1).map(p => ({
provider: p,
model: classification.complexity === 'simple' ? p.models.cheap : p.models.default
}));
return {
provider: primary,
model: classification.complexity === 'simple' ? primary.models.cheap : primary.models.default,
fallbackChain: fallbacks,
reasoning: `Selected ${primary.name} for ${classification.taskType} (${classification.complexity})`
};
}
Hint 5: Fallback Chain Executor
// src/executor/fallback-chain.ts
import { generateText } from 'ai';
import { RoutingDecision } from '../router';
import { TelemetryCollector } from '../telemetry';
export interface ExecutionResult {
response: string;
provider: string;
model: string;
fallbackUsed: boolean;
originalProvider?: string;
usage: { inputTokens: number; outputTokens: number };
latencyMs: number;
}
export async function executeWithFallback(
prompt: string,
decision: RoutingDecision,
telemetry: TelemetryCollector
): Promise<ExecutionResult> {
const attempts = [
{ provider: decision.provider, model: decision.model },
...decision.fallbackChain
];
let lastError: Error | null = null;
for (let i = 0; i < attempts.length; i++) {
const { provider, model } = attempts[i];
const startTime = Date.now();
try {
const result = await generateText({
model: provider.createModel(model),
prompt
});
const latencyMs = Date.now() - startTime;
telemetry.recordSuccess(provider.name, model, {
inputTokens: result.usage.promptTokens,
outputTokens: result.usage.completionTokens,
latencyMs
});
return {
response: result.text,
provider: provider.name,
model,
fallbackUsed: i > 0,
originalProvider: i > 0 ? attempts[0].provider.name : undefined,
usage: {
inputTokens: result.usage.promptTokens,
outputTokens: result.usage.completionTokens
},
latencyMs
};
} catch (error) {
lastError = error as Error;
telemetry.recordFailure(provider.name, model, error);
// Check if error is retryable
if (!isRetryableError(error)) {
throw error; // Don't try fallbacks for non-retryable errors
}
}
}
throw new Error(`All providers failed. Last error: ${lastError?.message}`);
}
function isRetryableError(error: unknown): boolean {
if (error instanceof Error) {
// Rate limit and server errors are retryable
return error.message.includes('rate limit') ||
error.message.includes('503') ||
error.message.includes('timeout');
}
return false;
}
Hint 6: Telemetry and Dashboard
// src/telemetry/index.ts
export class TelemetryCollector {
private metrics: {
requests: Map<string, number>;
costs: Map<string, number>;
latencies: number[];
errors: Map<string, number>;
fallbacks: number;
};
constructor() {
this.metrics = {
requests: new Map(),
costs: new Map(),
latencies: [],
errors: new Map(),
fallbacks: 0
};
}
recordSuccess(provider: string, model: string, data: {
inputTokens: number;
outputTokens: number;
latencyMs: number;
}) {
// Increment request count
const key = `${provider}:${model}`;
this.metrics.requests.set(key, (this.metrics.requests.get(key) || 0) + 1);
// Calculate and record cost
const cost = this.calculateCost(provider, data.inputTokens, data.outputTokens);
this.metrics.costs.set(provider, (this.metrics.costs.get(provider) || 0) + cost);
// Record latency
this.metrics.latencies.push(data.latencyMs);
}
recordFailure(provider: string, model: string, error: unknown) {
const key = `${provider}:${model}`;
this.metrics.errors.set(key, (this.metrics.errors.get(key) || 0) + 1);
}
recordFallback() {
this.metrics.fallbacks++;
}
getStats() {
const totalRequests = Array.from(this.metrics.requests.values())
.reduce((a, b) => a + b, 0);
const totalCost = Array.from(this.metrics.costs.values())
.reduce((a, b) => a + b, 0);
const avgLatency = this.metrics.latencies.length > 0
? this.metrics.latencies.reduce((a, b) => a + b, 0) / this.metrics.latencies.length
: 0;
return {
totalRequests,
requestsByProvider: Object.fromEntries(this.metrics.requests),
totalCost: Math.round(totalCost * 1000) / 1000,
costByProvider: Object.fromEntries(this.metrics.costs),
averageLatencyMs: Math.round(avgLatency),
fallbackRate: totalRequests > 0 ? this.metrics.fallbacks / totalRequests : 0,
errorRate: this.calculateErrorRate()
};
}
private calculateCost(provider: string, inputTokens: number, outputTokens: number): number {
const costs = {
anthropic: { input: 0.015, output: 0.075 },
openai: { input: 0.01, output: 0.03 },
google: { input: 0.0035, output: 0.0105 }
};
const rate = costs[provider as keyof typeof costs] || { input: 0, output: 0 };
return (inputTokens * rate.input + outputTokens * rate.output) / 1000;
}
private calculateErrorRate(): number {
const totalErrors = Array.from(this.metrics.errors.values())
.reduce((a, b) => a + b, 0);
const totalRequests = Array.from(this.metrics.requests.values())
.reduce((a, b) => a + b, 0);
return totalRequests > 0 ? totalErrors / (totalRequests + totalErrors) : 0;
}
}
Phased Implementation Guide
Phase 1: Foundation (Days 1-2)
Goal: Basic request handling with single provider
Tasks:
- Set up project with TypeScript, Hono, and AI SDK
- Create basic
/api/routeendpoint - Implement single-provider routing (OpenAI only)
- Add basic error handling
- Return structured response with metadata
Milestone: Can send a prompt and get a response from OpenAI
Verification:
curl -X POST http://localhost:3000/api/route \
-H "Content-Type: application/json" \
-d '{"prompt": "What is 2+2?"}'
# Should return: { "response": "4", "metadata": { "provider": "openai", ... } }
Phase 2: Multi-Provider Routing (Days 3-4)
Goal: Route to different providers based on capability
Tasks:
- Add Anthropic and Google providers
- Implement task classifier with
generateObject - Build capability matching logic
- Implement basic routing decision engine
- Support capability hints in API
Milestone: Different prompts route to different providers
Verification:
# Reasoning task -> Claude
curl -X POST http://localhost:3000/api/route \
-d '{"prompt": "Solve this logic puzzle...", "capability": "reasoning"}'
# metadata.provider should be "anthropic"
# Vision task -> GPT-4o or Gemini
curl -X POST http://localhost:3000/api/route \
-d '{"prompt": "Describe this image", "capability": "vision", "images": ["base64..."]}'
# metadata.provider should be "openai" or "google"
Phase 3: Fallback Chains (Days 5-6)
Goal: Gracefully handle provider failures
Tasks:
- Implement fallback chain executor
- Add error classification (retryable vs fatal)
- Implement circuit breaker pattern
- Add retry with exponential backoff
- Track fallback events
Milestone: Requests succeed even when primary provider fails
Verification:
// In test file - mock OpenAI to fail
mock(openai).rejects(new Error('rate limit exceeded'));
// Request should still succeed via Anthropic
const response = await fetch('/api/route', { ... });
expect(response.metadata.fallbackUsed).toBe(true);
expect(response.metadata.provider).toBe('anthropic');
Phase 4: Telemetry and Cost Tracking (Days 7-8)
Goal: Full observability of routing decisions
Tasks:
- Implement TelemetryCollector class
- Track requests, costs, latency per provider
- Calculate cost savings vs single-provider baseline
- Create
/api/statsendpoint - Add
/api/healthendpoint
Milestone: Can see detailed stats and costs
Verification:
# Run 100 test requests
for i in {1..100}; do
curl -X POST http://localhost:3000/api/route \
-d '{"prompt": "Random prompt '$i'"}' &
done
wait
# Check stats
curl http://localhost:3000/api/stats
# Should show distribution across providers, total cost, latencies
Phase 5: Dashboard and Polish (Days 9-10)
Goal: Visual dashboard and production hardening
Tasks:
- Create HTML dashboard with charts
- Add WebSocket for real-time updates
- Implement rate limit tracking and pre-emptive routing
- Add request queuing for overloaded providers
- Write comprehensive tests
- Document API and configuration
Milestone: Production-ready router with dashboard
Verification:
- Dashboard shows live request distribution
- Charts update in real-time
- Rate limits are respected
- All tests pass
Testing Strategy
Provider Mock Testing
// tests/mocks/providers.ts
import { vi } from 'vitest';
export function createMockProvider(options: {
name: string;
shouldFail?: boolean;
failAfter?: number;
latencyMs?: number;
}) {
let callCount = 0;
return {
generateText: vi.fn().mockImplementation(async ({ prompt }) => {
callCount++;
if (options.latencyMs) {
await new Promise(r => setTimeout(r, options.latencyMs));
}
if (options.shouldFail) {
throw new Error(`${options.name} provider failed`);
}
if (options.failAfter && callCount > options.failAfter) {
throw new Error(`${options.name} rate limited`);
}
return {
text: `Response from ${options.name}`,
usage: { promptTokens: 10, completionTokens: 20 }
};
})
};
}
// tests/fallback.test.ts
import { describe, it, expect, beforeEach } from 'vitest';
import { createMockProvider } from './mocks/providers';
import { executeWithFallback } from '../src/executor/fallback-chain';
describe('Fallback Chain', () => {
it('falls back to secondary when primary fails', async () => {
const primary = createMockProvider({ name: 'primary', shouldFail: true });
const secondary = createMockProvider({ name: 'secondary' });
const result = await executeWithFallback('test prompt', {
provider: primary,
model: 'test-model',
fallbackChain: [{ provider: secondary, model: 'backup-model' }],
reasoning: 'test'
}, mockTelemetry);
expect(result.fallbackUsed).toBe(true);
expect(result.provider).toBe('secondary');
expect(primary.generateText).toHaveBeenCalledTimes(1);
expect(secondary.generateText).toHaveBeenCalledTimes(1);
});
it('exhausts all providers before failing', async () => {
const failingProviders = [
createMockProvider({ name: 'p1', shouldFail: true }),
createMockProvider({ name: 'p2', shouldFail: true }),
createMockProvider({ name: 'p3', shouldFail: true })
];
await expect(executeWithFallback('test', {
provider: failingProviders[0],
model: 'model',
fallbackChain: failingProviders.slice(1).map(p => ({ provider: p, model: 'model' })),
reasoning: 'test'
}, mockTelemetry)).rejects.toThrow('All providers failed');
});
});
Cost Calculation Tests
// tests/cost.test.ts
import { describe, it, expect } from 'vitest';
import { TelemetryCollector } from '../src/telemetry';
describe('Cost Tracking', () => {
it('calculates OpenAI costs correctly', () => {
const telemetry = new TelemetryCollector();
// 1000 input tokens, 500 output tokens on GPT-4
telemetry.recordSuccess('openai', 'gpt-4-turbo', {
inputTokens: 1000,
outputTokens: 500,
latencyMs: 1000
});
const stats = telemetry.getStats();
// Input: 1000 * $0.01/1k = $0.01
// Output: 500 * $0.03/1k = $0.015
// Total: $0.025
expect(stats.totalCost).toBeCloseTo(0.025, 3);
});
it('aggregates costs across providers', () => {
const telemetry = new TelemetryCollector();
telemetry.recordSuccess('openai', 'gpt-4', { inputTokens: 1000, outputTokens: 500, latencyMs: 100 });
telemetry.recordSuccess('anthropic', 'claude-3-opus', { inputTokens: 1000, outputTokens: 500, latencyMs: 100 });
const stats = telemetry.getStats();
expect(stats.costByProvider['openai']).toBeDefined();
expect(stats.costByProvider['anthropic']).toBeDefined();
expect(stats.totalCost).toBeGreaterThan(0);
});
});
Integration Tests
// tests/integration.test.ts
import { describe, it, expect, beforeAll, afterAll } from 'vitest';
describe('Model Router Integration', () => {
let server: ReturnType<typeof startServer>;
beforeAll(async () => {
server = await startServer({ port: 3001 });
});
afterAll(() => {
server.close();
});
it('routes simple queries to cheap models', async () => {
const response = await fetch('http://localhost:3001/api/route', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ prompt: 'What is 2+2?' })
});
const data = await response.json();
// Simple math should route to cheap model
expect(['gpt-3.5-turbo', 'claude-3-haiku']).toContain(data.metadata.model);
});
it('routes complex reasoning to powerful models', async () => {
const response = await fetch('http://localhost:3001/api/route', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
prompt: 'Prove that the square root of 2 is irrational using proof by contradiction.',
capability: 'reasoning'
})
});
const data = await response.json();
// Complex reasoning should use powerful model
expect(['claude-3-opus', 'gpt-4-turbo', 'gpt-4']).toContain(
data.metadata.model.replace(/-\d{8}$/, '') // Strip date suffix
);
});
});
Common Pitfalls and Debugging
Pitfall 1: Classification Bottleneck
Symptom: Every request has 200-500ms overhead before routing
Cause: Using expensive model for classification
Fix: Use the cheapest, fastest model for classification (GPT-3.5 Turbo, Claude Haiku). Classification doesnโt need reasoning power.
Pitfall 2: Ignoring Provider Error Types
Symptom: Fallback triggers on every error, including auth failures
Cause: Not distinguishing retryable vs permanent errors
Fix: Categorize errors:
- Retryable: rate limit, timeout, 503
- Non-retryable: 401 (auth), 400 (bad request), 404
function isRetryable(error: Error): boolean {
const message = error.message.toLowerCase();
return message.includes('rate limit') ||
message.includes('timeout') ||
message.includes('503') ||
message.includes('overloaded');
}
Pitfall 3: Thundering Herd on Recovery
Symptom: After provider recovers, it immediately gets overloaded again
Cause: All queued requests hitting the provider at once
Fix: Implement gradual ramp-up with circuit breaker half-open state. Only allow 1-2 test requests through before fully opening.
Pitfall 4: Stale Rate Limit Data
Symptom: Still hitting rate limits despite tracking
Cause: Rate limit state not updated between requests
Fix: Update rate limit state synchronously. Use atomic operations if multi-threaded.
Pitfall 5: Token Count Mismatch
Symptom: Cost calculations are wrong
Cause: Different tokenizers for different providers
Fix: Use provider-specific token counting or estimate conservatively. The AI SDK returns actual usage in the response; use that for cost calculation, not estimates.
Pitfall 6: Missing Timeout Handling
Symptom: Requests hang indefinitely when provider is slow
Cause: No timeout on LLM calls
Fix: Wrap all provider calls in timeout:
const result = await Promise.race([
generateText({ model, prompt }),
new Promise((_, reject) =>
setTimeout(() => reject(new Error('Timeout')), 30000)
)
]);
Pitfall 7: Hardcoded Provider URLs
Symptom: Canโt test with local models or proxies
Cause: Provider URLs hardcoded in SDK initialization
Fix: Use environment variables for base URLs. The AI SDK supports custom base URLs for all providers.
Extensions and Challenges
Extension 1: Semantic Caching
Add a semantic cache that returns cached responses for semantically similar prompts:
// Before routing
const cachedResponse = await semanticCache.lookup(prompt);
if (cachedResponse && cachedResponse.similarity > 0.95) {
return cachedResponse.response;
}
// After successful response
await semanticCache.store(prompt, response);
This can reduce costs by 30-50% for repetitive queries.
Extension 2: A/B Testing Framework
Implement A/B testing to compare provider performance:
interface ABTest {
name: string;
variants: Array<{
provider: string;
weight: number; // 0-1, must sum to 1
}>;
}
// Route 50% to Claude, 50% to GPT-4 for reasoning tasks
const test: ABTest = {
name: 'reasoning-provider-comparison',
variants: [
{ provider: 'anthropic', weight: 0.5 },
{ provider: 'openai', weight: 0.5 }
]
};
Track quality metrics per variant to determine winner.
Extension 3: Cost Budget Enforcement
Implement per-user or per-project cost budgets:
interface Budget {
userId: string;
dailyLimit: number;
monthlyLimit: number;
used: { daily: number; monthly: number };
}
// Before routing
if (budget.used.daily + estimatedCost > budget.dailyLimit) {
throw new BudgetExceededError('Daily budget exceeded');
}
Extension 4: Quality Scoring
Implement automatic quality scoring to detect when cheap models underperform:
// After response
const qualityScore = await evaluateResponse(prompt, response);
if (qualityScore < threshold) {
// Re-route to more powerful model
return await routeWithUpgrade(prompt, 'complex');
}
Books That Will Help
| Topic | Book | Chapter | Why |
|---|---|---|---|
| Data encoding & APIs | โDesigning Data-Intensive Applicationsโ by Martin Kleppmann | Ch. 4 (Encoding & Evolution) | Understand how to version APIs and handle schema changes |
| Fault tolerance | โDesigning Data-Intensive Applicationsโ by Martin Kleppmann | Ch. 9 (Consistency & Consensus) | Deep understanding of failure modes |
| Stability patterns | โRelease It!, 2nd Editionโ by Michael Nygard | Ch. 5 (Stability Patterns) | Circuit breakers, bulkheads, timeouts |
| TypeScript patterns | โProgramming TypeScriptโ by Boris Cherny | Ch. 4 (Functions) | Type-safe function design |
| Error handling | โProgramming TypeScriptโ by Boris Cherny | Ch. 7 (Error Handling) | Error types and recovery |
| API design | โRESTful Web APIsโ by Leonard Richardson | Ch. 8-10 | Designing hypermedia APIs |
Reading order for this project:
- โRelease It!โ Ch. 5 - Stability patterns (1 hour) - Core resilience concepts
- Kleppmann Ch. 4 - Encoding (1 hour) - API versioning
- AI SDK Provider docs (30 min) - Provider abstraction
- AI SDK Error Handling docs (30 min) - Error types
- Then start coding
Self-Assessment Checklist
Core Understanding
- I can explain why provider abstraction saves development time
- I understand the trade-offs between different routing strategies
- I can describe when to use fallback vs retry
- I know how circuit breakers prevent cascading failures
- I understand why classification needs to be fast and cheap
- I can calculate the cost of an LLM request given token counts
Implementation Skills
- My classifier correctly categorizes different task types
- My router selects appropriate providers based on capabilities
- My fallback chain executes correctly when primary fails
- My telemetry accurately tracks costs and latencies
- My circuit breaker prevents requests to failing providers
- My API returns detailed metadata about routing decisions
Production Readiness
- I handle all error types appropriately (retryable vs fatal)
- I implement timeouts on all provider calls
- I track rate limits and route proactively
- I have comprehensive test coverage
- My dashboard updates in real-time
- I can demonstrate cost savings vs single-provider approach
Teaching Test
Can you explain to someone else:
- Why use multiple LLM providers instead of just one?
- How do you decide which model to use for a given task?
- What happens when a provider fails mid-request?
- How do you measure if your routing is actually saving money?
- Whatโs a circuit breaker and why do you need one?
Resources
Primary
- AI SDK Providers Documentation - Official provider setup
- AI SDK Error Handling - Error types and recovery
- โRelease It!โ by Michael Nygard - Stability patterns bible
Provider Pricing
- OpenAI Pricing - Current GPT model prices
- Anthropic Pricing - Claude model prices
- Google AI Pricing - Gemini model prices
Patterns
- Circuit Breaker Pattern - Martin Fowlerโs explanation
- Retry Pattern - Microsoftโs architecture guide
Tools
When you complete this project, you will have built a production-grade AI routing system. Youโll understand how to leverage multiple LLM providers effectively, implement resilient fallback patterns, and optimize costs while maintaining quality. These skills are directly applicable to any organization running AI workloads at scale.