Project 2: Model Router Analyzer
Understanding the Auto Router: Master cost-effective AI model selection
Project Metadata
| Attribute | Value |
|---|---|
| Difficulty | Level 2: Intermediate |
| Time Estimate | 1 Week (15-20 hours) |
| Primary Language | Python |
| Alternative Languages | TypeScript, Go |
| Prerequisites | Project 1, basic Python, data analysis fundamentals |
| Main Reference | “AI Engineering” by Chip Huyen |
Learning Objectives
By completing this project, you will:
- Understand LLM model tiers - the capabilities, costs, and latency characteristics of Haiku, Sonnet, and Opus
- Analyze the Auto router’s decision-making - when it escalates to more powerful models and why
- Develop cost optimization intuition - identifying opportunities to reduce AI spending without sacrificing quality
- Build data analysis skills - parsing logs, computing metrics, and generating visualizations
- Create actionable recommendations - translating data into practical workflow improvements
Deep Theoretical Foundation
The Model Selection Problem
When you interact with an AI, a critical decision happens before any response is generated: which model should handle this request? This decision has profound implications:
┌─────────────────────────────────────────────────────────────────────┐
│ THE MODEL SELECTION TRADEOFF SPACE │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ CAPABILITY │
│ ▲ │
│ │ ┌─────────────┐ │
│ │ │ OPUS │ │
│ │ │ (Deep) │ │
│ │ │ │ │
│ High │ │ • Complex │ │
│ │ │ reasoning│ │
│ │ ┌─────────────┐ │ • Creative │ │
│ │ │ SONNET │ │ • Nuanced │ │
│ │ │ (Smart) │ └─────────────┘ │
│ │ │ │ │
│ Med │ │ • General │ │
│ │ │ coding │ │
│ │ │ • Refactor │ │
│ │ ┌───────┐ └─────────────┘ │
│ │ │ HAIKU │ │
│ │ │(Fast) │ │
│ Low │ │ │ │
│ │ │• Syntax│ │
│ │ │• Simple│ │
│ │ └───────┘ │
│ │ │
│ └─────────────────────────────────────────────────────────────▶
│ Low Medium High COST │
│ │
│ OPTIMAL SELECTION: Match task complexity to model capability │
│ WASTE: Using Opus for syntax questions │
│ FAILURE: Using Haiku for architecture design │
│ │
└─────────────────────────────────────────────────────────────────────┘
Model Characteristics Deep Dive
| Model | Cost Multiplier | Latency | Best For | Failure Modes |
|---|---|---|---|---|
| Haiku 4.5 | 0.4x | ~200ms | Syntax, simple queries, fast feedback loops | Misses nuance, shallow reasoning |
| Sonnet 4.5 | 1.0x (baseline) | ~800ms | General coding, refactoring, debugging | Occasionally overthinks simple tasks |
| Opus 4.5 | 2.2x | ~2000ms | Architecture, complex reasoning, legacy code | Overkill for simple tasks, expensive |
How the Auto Router Works
The Auto router is Kiro’s intelligent model selector. It analyzes your prompt and routes it to the appropriate model tier:
┌─────────────────────────────────────────────────────────────────────┐
│ AUTO ROUTER DECISION FLOW │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ User Prompt: "..." │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ PROMPT ANALYSIS │ │
│ │ │ │
│ │ Features Extracted: │ │
│ │ • Token count │ │
│ │ • Question type (syntax? architecture? debug?) │ │
│ │ • Complexity signals (numbers, conditions, dependencies) │ │
│ │ • Historical context (previous turns in conversation) │ │
│ │ • File context size (more files = more complexity) │ │
│ │ │ │
│ └────────────────────────┬────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ COMPLEXITY SCORING │ │
│ │ │ │
│ │ Score = 0 │ │
│ │ │ │
│ │ IF contains "syntax", "import", "how to" → score += 0 │ │
│ │ IF contains "refactor", "debug", "fix" → score += 5 │ │
│ │ IF contains "design", "architect", "strategy" → score += 10│ │
│ │ IF token_count > 500 → score += 3 │ │
│ │ IF file_count > 5 → score += 5 │ │
│ │ IF has_code_block → score += 2 │ │
│ │ │ │
│ └────────────────────────┬────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ MODEL SELECTION │ │
│ │ │ │
│ │ IF score < 5: │ │
│ │ ┌──────────┐ │ │
│ │ │ HAIKU │ → Fast, cheap, sufficient │ │
│ │ └──────────┘ │ │
│ │ │ │
│ │ ELIF score < 12: │ │
│ │ ┌──────────┐ │ │
│ │ │ SONNET │ → Balanced capability │ │
│ │ └──────────┘ │ │
│ │ │ │
│ │ ELSE: │ │
│ │ ┌──────────┐ │ │
│ │ │ OPUS │ → Maximum reasoning power │ │
│ │ └──────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
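The decision flow above can be condensed into a toy scorer. This is a sketch of the heuristic the diagram describes, using its keywords, weights, and thresholds; Kiro's actual router is a trained classifier, not this hand-written rule set:

```python
def route(prompt: str, token_count: int, file_count: int, has_code_block: bool) -> str:
    """Illustrative re-implementation of the scoring diagram (not Kiro's real logic)."""
    p = prompt.lower()
    score = 0
    # Keyword signals (simple keywords add nothing, per the diagram)
    if any(k in p for k in ("refactor", "debug", "fix")):
        score += 5
    if any(k in p for k in ("design", "architect", "strategy")):
        score += 10
    # Size and context signals
    if token_count > 500:
        score += 3
    if file_count > 5:
        score += 5
    if has_code_block:
        score += 2
    # Threshold mapping from the diagram
    if score < 5:
        return "haiku"
    if score < 12:
        return "sonnet"
    return "opus"
```

For example, `route("How do I import numpy?", 20, 0, False)` scores 0 and stays on Haiku, while a long architecture prompt with many files escalates to Opus.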
The Economics of Model Selection
Understanding the cost implications is crucial:
┌─────────────────────────────────────────────────────────────────────┐
│ COST CALCULATION EXAMPLE │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Scenario: 100 queries over a work session │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ NAIVE APPROACH: Always use Sonnet │ │
│ │ │ │
│ │ 100 queries × 1,000 tokens × $0.003/1K = $0.30 per session │ │
│ │ │ │
│ │ Monthly (20 sessions): $6.00 │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ OPTIMIZED APPROACH: Smart routing │ │
│ │ │ │
│ │ 40 queries (simple) × 1,000 × $0.0012/1K = $0.048 │ │
│ │ 50 queries (medium) × 1,000 × $0.0039/1K = $0.195 │ │
│ │ 10 queries (complex) × 1,000 × $0.0066/1K = $0.066 │ │
│ │ ──────────────────────────────────────────────── │ │
│ │ Total: $0.309 → but with 23% BETTER routing: │ │
│ │ Actual: $0.238 per session │ │
│ │ │ │
│ │ Monthly (20 sessions): $4.76 │ │
│ │ SAVINGS: $1.24/month (21%) │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│  For heavy users (100 sessions/month): ~$74 savings/year!           │
│ │
└─────────────────────────────────────────────────────────────────────┘
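One wrinkle worth noticing: at these blended rates the routed mix alone (~$0.309) costs about the same as always-Sonnet ($0.30), so the example's savings come entirely from the assumed 23% routing-improvement factor. The box's per-session arithmetic can be checked in a few lines (all rates are the example's illustrative numbers, with 1K tokens per query):

```python
# Recompute the box's per-session numbers
naive = 100 * 0.003                            # always Sonnet: $0.30/session
mix = 40 * 0.0012 + 50 * 0.0039 + 10 * 0.0066  # routed mix: $0.309/session
actual = mix * (1 - 0.23)                      # with 23% better routing: ~$0.238
monthly_savings = (naive - actual) * 20        # 20 sessions/month: ~$1.24
```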
Historical Context: From Manual to Intelligent Routing
The evolution of model selection:
| Era | Approach | Overhead |
|---|---|---|
| 2023 | Manual API calls with explicit model selection | Developer decides every call |
| 2024 | Simple routing based on token count | Rule-based, crude |
| 2025 | ML-powered routing (Auto router) | Learns from usage patterns |
| Future | Predictive routing with quality feedback loops | Self-optimizing |
The Auto router represents a significant step: it uses a lightweight classifier trained on millions of queries to predict optimal model selection.
Real-World Analogy: The Restaurant Kitchen
Think of model selection like a restaurant kitchen:
- Haiku = Line cook: Fast, efficient, handles simple dishes
- Sonnet = Sous chef: Skilled, handles complex orders
- Opus = Executive chef: Creative, handles VIP requests
You wouldn’t have the executive chef make a salad, and you wouldn’t have the line cook design the tasting menu. The maître d’ (Auto router) routes orders appropriately.
Complete Project Specification
What You Are Building
A Model Usage Analyzer that:
- Logs Model Selections: Captures which model was used for each query
- Classifies Query Complexity: Categorizes queries by type and difficulty
- Calculates Cost Metrics: Computes actual vs. optimal spending
- Identifies Optimization Opportunities: Finds mismatches between task and model
- Generates Recommendations: Provides actionable advice for better routing
Architecture Overview
┌─────────────────────────────────────────────────────────────────────┐
│ MODEL ANALYZER ARCHITECTURE │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Data Collection │ │
│ │ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │ │
│ │ │ Kiro Logs │ │ /usage API │ │ Session Data │ │ │
│ │ │ ($TMPDIR) │ │ (credits) │ │ (Project 1) │ │ │
│ │ └───────┬───────┘ └───────┬───────┘ └───────┬───────┘ │ │
│ │ │ │ │ │ │
│ └──────────┼──────────────────┼──────────────────┼─────────────┘ │
│ │ │ │ │
│ └──────────────────┼──────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Log Parser │ │
│ │ • Extract model selection events │ │
│ │ • Parse prompt content │ │
│ │ • Capture response metadata │ │
│ └────────────────────────┬────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Query Classifier │ │
│ │ ┌───────────┐ ┌───────────┐ ┌───────────┐ │ │
│ │ │ Syntax │ │ Debug │ │Architecture│ │ │
│ │ │ Queries │ │ Queries │ │ Queries │ │ │
│ │ └───────────┘ └───────────┘ └───────────┘ │ │
│ │ │ │
│ │ Keywords: "import", "syntax", "how to" → SIMPLE │ │
│ │ Keywords: "debug", "fix", "error" → MEDIUM │ │
│ │ Keywords: "design", "architect", "refactor" → COMPLEX │ │
│ └────────────────────────┬────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Cost Calculator │ │
│ │ │ │
│ │ actual_cost = sum(model_cost[m] * tokens[m]) │ │
│ │ optimal_cost = sum(optimal_model_cost[q] * tokens[q]) │ │
│ │ waste = actual_cost - optimal_cost │ │
│ └────────────────────────┬────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Recommendation Engine │ │
│ │ │ │
│ │ IF simple_query AND used_sonnet: │ │
│ │ → "Consider forcing Haiku for syntax queries" │ │
│ │ │ │
│ │ IF complex_query AND used_sonnet AND low_quality: │ │
│ │ → "Force Opus for architecture discussions" │ │
│ │ │ │
│ └────────────────────────┬────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Report Generator │ │
│ │ • Terminal dashboard (rich/matplotlib) │ │
│ │ • JSON export │ │
│ │ • Markdown report │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
Expected Deliverables
model-analyzer/
├── analyzer/
│ ├── __init__.py
│ ├── log_parser.py # Parse Kiro logs
│ ├── classifier.py # Classify query complexity
│ ├── cost_calculator.py # Calculate costs and savings
│ ├── recommender.py # Generate recommendations
│ └── reporter.py # Generate reports
├── cli.py # Command-line interface
├── tests/
│ ├── test_classifier.py
│ ├── test_cost_calculator.py
│ └── sample_logs/
├── requirements.txt
└── README.md
Solution Architecture
Data Model
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import List, Optional

class Model(Enum):
    HAIKU = "haiku"
    SONNET = "sonnet"
    OPUS = "opus"
    AUTO = "auto"

class ComplexityLevel(Enum):
    SIMPLE = 1   # Syntax, imports, how-to
    MEDIUM = 2   # Debugging, fixing, general coding
    COMPLEX = 3  # Architecture, design, refactoring

@dataclass
class Query:
    id: str
    timestamp: datetime
    prompt: str
    model_used: Model
    model_selected_by: str  # "auto" or "manual"
    tokens_input: int
    tokens_output: int
    latency_ms: int
    complexity: Optional[ComplexityLevel] = None
    optimal_model: Optional[Model] = None

@dataclass
class CostAnalysis:
    actual_cost: float
    optimal_cost: float
    waste: float
    savings_percentage: float
    recommendations: List[str]

@dataclass
class UsageReport:
    period_start: datetime
    period_end: datetime
    total_queries: int
    model_distribution: dict[Model, int]
    cost_analysis: CostAnalysis
    misrouted_queries: List[Query]
Classification Algorithm
┌─────────────────────────────────────────────────────────────────────┐
│ QUERY CLASSIFICATION ALGORITHM │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ INPUT: prompt (str), context_files (int), conversation_turns (int) │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ STEP 1: KEYWORD ANALYSIS │ │
│ │ │ │
│ │ simple_keywords = ["syntax", "import", "how to", "what is", │ │
│ │ "convert", "format"] │ │
│ │ │ │
│ │ medium_keywords = ["debug", "fix", "error", "bug", "issue", │ │
│ │ "not working", "broken"] │ │
│ │ │ │
│ │ complex_keywords = ["design", "architect", "refactor", │ │
│ │ "restructure", "strategy", "optimize", │ │
│ │ "implement from scratch"] │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ STEP 2: CONTEXT ANALYSIS │ │
│ │ │ │
│ │ IF context_files > 10: complexity += 1 │ │
│ │ IF conversation_turns > 5: complexity += 1 │ │
│ │ IF prompt_tokens > 500: complexity += 1 │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ STEP 3: PATTERN MATCHING │ │
│ │ │ │
│ │ IF matches(r"^(what|how|where).*\?$"): likely SIMPLE │ │
│ │ IF matches(r"(refactor|redesign).*entire"): likely COMPLEX │ │
│ │ IF mentions_multiple_files(): likely MEDIUM or higher │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ STEP 4: OPTIMAL MODEL MAPPING │ │
│ │ │ │
│ │ SIMPLE → HAIKU │ │
│ │ MEDIUM → SONNET │ │
│ │ COMPLEX → OPUS │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ OUTPUT: (ComplexityLevel, optimal_model: Model) │
│ │
└─────────────────────────────────────────────────────────────────────┘
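Step 3's pattern checks can be sketched with `re` (the patterns are copied from the diagram; treat them as starting points to tune against your own prompts):

```python
import re
from typing import Optional

def pattern_hint(prompt: str) -> Optional[str]:
    """Return a complexity hint from the Step 3 patterns, or None if none match."""
    p = prompt.strip().lower()
    # Short interrogative one-liners are usually simple lookups
    if re.match(r"^(what|how|where).*\?$", p):
        return "simple"
    # "refactor/redesign ... entire" signals a large-scope task
    if re.search(r"(refactor|redesign).*entire", p):
        return "complex"
    return None
```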
Cost Calculation Model
┌─────────────────────────────────────────────────────────────────────┐
│ COST CALCULATION MODEL │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ PRICING (per 1K tokens, approximate): │
│ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ Model │ Input Cost │ Output Cost │ Multiplier │ │
│ │──────────────┼─────────────┼─────────────┼──────────────────│ │
│ │ Haiku 4.5 │ $0.0008 │ $0.0032 │ 0.4x │ │
│ │ Sonnet 4.5 │ $0.003 │ $0.015 │ 1.0x (baseline) │ │
│ │ Opus 4.5 │ $0.015 │ $0.075 │ 2.2x │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ │
│ CALCULATION: │
│ │
│ For each query q: │
│ actual_cost[q] = (input_tokens * input_price[model_used]) + │
│ (output_tokens * output_price[model_used]) │
│ │
│ optimal_cost[q] = (input_tokens * input_price[optimal_model]) + │
│ (output_tokens * output_price[optimal_model]) │
│ │
│ waste[q] = actual_cost[q] - optimal_cost[q] │
│ │
│ AGGREGATE: │
│ total_actual = sum(actual_cost) │
│ total_optimal = sum(optimal_cost) │
│ total_waste = sum(waste where waste > 0) │
│ savings_opportunity = (total_waste / total_actual) * 100 │
│ │
└─────────────────────────────────────────────────────────────────────┘
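The whole calculation can be sketched end-to-end with plain dicts. Prices come from the table above; the three queries are fabricated for illustration:

```python
# Per-1K-token prices from the table above (approximate)
PRICE = {
    "haiku": (0.0008, 0.0032),
    "sonnet": (0.003, 0.015),
    "opus": (0.015, 0.075),
}

def cost(model: str, tokens_in: int, tokens_out: int) -> float:
    price_in, price_out = PRICE[model]
    return tokens_in / 1000 * price_in + tokens_out / 1000 * price_out

# (model_used, optimal_model, tokens_in, tokens_out) -- made-up examples
queries = [
    ("sonnet", "haiku", 50, 150),   # simple lookup routed to Sonnet
    ("opus", "opus", 200, 2000),    # complex design query, correctly routed
    ("opus", "haiku", 10, 50),      # syntax question wastefully sent to Opus
]

total_actual = sum(cost(m, ti, to) for m, _, ti, to in queries)
total_optimal = sum(cost(o, ti, to) for _, o, ti, to in queries)
# Only count positive waste, matching the aggregate formula above
total_waste = sum(max(0.0, cost(m, ti, to) - cost(o, ti, to))
                  for m, o, ti, to in queries)
savings_opportunity = total_waste / total_actual * 100
```

Note that the correctly routed Opus query contributes nothing to waste; only the two misroutes do.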
Phased Implementation Guide
Phase 1: Log Parsing (3-4 hours)
Goal: Extract model selection events from Kiro logs
What to Build:
- Locate Kiro log files
- Parse log format to extract relevant events
- Create structured Query objects
Hint 1: Kiro logs are typically in $TMPDIR/kiro-log/:
ls -la "$TMPDIR/kiro-log/" 2>/dev/null
# Or check ~/.kiro/logs/
Hint 2: Log entries often have a recognizable format:
import re
from typing import Optional

LOG_PATTERN = r'\[(\d{4}-\d{2}-\d{2}T[\d:]+)\] \[(\w+)\] model=(\w+) tokens=(\d+)'

def parse_log_line(line: str) -> Optional[dict]:
    match = re.match(LOG_PATTERN, line)
    if match:
        return {
            'timestamp': match.group(1),
            'event': match.group(2),
            'model': match.group(3),
            'tokens': int(match.group(4))
        }
    return None
Hint 3: Handle both JSON and plain text log formats:
import json
from pathlib import Path
from typing import List

def parse_log_file(path: Path) -> List[dict]:
    events = []
    with open(path) as f:
        for line in f:
            try:
                # Try JSON first
                event = json.loads(line)
            except json.JSONDecodeError:
                # Fall back to pattern matching
                event = parse_log_line(line)
            if event:
                events.append(event)
    return events
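To see the fallback chain in action on a couple of fabricated lines (this log format is invented for illustration; real Kiro logs may differ):

```python
import json
import re
from typing import Optional

LOG_PATTERN = r'\[(\d{4}-\d{2}-\d{2}T[\d:]+)\] \[(\w+)\] model=(\w+) tokens=(\d+)'

def parse_any_line(line: str) -> Optional[dict]:
    try:
        return json.loads(line)              # JSON-formatted entry
    except json.JSONDecodeError:
        match = re.match(LOG_PATTERN, line)  # structured-text entry
        if match:
            return {"timestamp": match.group(1), "event": match.group(2),
                    "model": match.group(3), "tokens": int(match.group(4))}
    return None

lines = [
    '{"timestamp": "2025-12-22T10:00:00", "model": "haiku", "tokens": 120}',
    '[2025-12-22T10:01:00] [request] model=sonnet tokens=450',
    'unparseable noise',
]
# Both recognized entries survive; the noise line is dropped
events = [e for e in map(parse_any_line, lines) if e]
```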
Validation Checkpoint: You can parse a log file and print a list of model selections.
Phase 2: Query Classification (4-5 hours)
Goal: Classify queries by complexity and determine optimal model
What to Build:
- Keyword-based classifier
- Context-aware complexity scoring
- Optimal model mapping
Hint 1: Use a simple keyword scoring system:
COMPLEXITY_KEYWORDS = {
    'simple': ['syntax', 'import', 'how to', 'what is', 'convert'],
    'medium': ['debug', 'fix', 'error', 'bug', 'explain'],
    'complex': ['design', 'architect', 'refactor', 'restructure']
}

def classify_prompt(prompt: str) -> ComplexityLevel:
    prompt_lower = prompt.lower()
    scores = {level: 0 for level in ComplexityLevel}
    for level, keywords in COMPLEXITY_KEYWORDS.items():
        for keyword in keywords:
            if keyword in prompt_lower:
                scores[ComplexityLevel[level.upper()]] += 1
    if not any(scores.values()):
        # No keyword matched: default ambiguous prompts to MEDIUM
        return ComplexityLevel.MEDIUM
    return max(scores, key=scores.get)
Hint 2: Consider context size as a complexity signal:
def adjust_for_context(base_level: ComplexityLevel, context_files: int) -> ComplexityLevel:
    # Check the larger threshold first so very large contexts always escalate
    if context_files > 20:
        return ComplexityLevel.COMPLEX
    if context_files > 10 and base_level == ComplexityLevel.SIMPLE:
        return ComplexityLevel.MEDIUM
    return base_level
Hint 3: Map complexity to optimal model:
OPTIMAL_MODEL_MAP = {
    ComplexityLevel.SIMPLE: Model.HAIKU,
    ComplexityLevel.MEDIUM: Model.SONNET,
    ComplexityLevel.COMPLEX: Model.OPUS
}
Validation Checkpoint: You can classify a list of sample prompts and verify the classifications make sense.
Phase 3: Cost Analysis and Recommendations (4-5 hours)
Goal: Calculate costs and generate actionable recommendations
What to Build:
- Cost calculator with real pricing
- Waste identification
- Recommendation generator
- Report output
Hint 1: Use dataclasses for clean cost modeling:
@dataclass
class ModelPricing:
    input_per_1k: float
    output_per_1k: float

PRICING = {
    Model.HAIKU: ModelPricing(0.0008, 0.0032),
    Model.SONNET: ModelPricing(0.003, 0.015),
    Model.OPUS: ModelPricing(0.015, 0.075)
}

def calculate_cost(model: Model, input_tokens: int, output_tokens: int) -> float:
    pricing = PRICING[model]
    return (input_tokens / 1000 * pricing.input_per_1k +
            output_tokens / 1000 * pricing.output_per_1k)
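As a concrete check of the formula, a 500-token-in / 800-token-out query at the example rates above works out as:

```python
# Sonnet at $0.003 in / $0.015 out per 1K tokens (example rates from above)
sonnet_cost = 500 / 1000 * 0.003 + 800 / 1000 * 0.015   # $0.0135
# The same query on Haiku ($0.0008 / $0.0032 per 1K) is ~4.5x cheaper
haiku_cost = 500 / 1000 * 0.0008 + 800 / 1000 * 0.0032  # $0.00296
```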
Hint 2: Generate recommendations based on patterns:
def generate_recommendations(queries: List[Query]) -> List[str]:
    recommendations = []
    # Count misroutes: simple queries that went to the most expensive model
    simple_with_opus = [q for q in queries
                        if q.complexity == ComplexityLevel.SIMPLE
                        and q.model_used == Model.OPUS]
    if len(simple_with_opus) > 5:
        # calculate_waste(q): actual cost minus optimal cost for a query
        savings = sum(calculate_waste(q) for q in simple_with_opus)
        recommendations.append(
            f"Found {len(simple_with_opus)} simple queries using Opus. "
            f"Force Haiku with '/model set haiku' for syntax questions. "
            f"Potential savings: ${savings:.2f}"
        )
    return recommendations
Hint 3: Use rich for beautiful terminal output:
from rich.console import Console
from rich.table import Table

def render_report(analysis: CostAnalysis):
    console = Console()
    table = Table(title="Model Usage Report")
    table.add_column("Metric")
    table.add_column("Value", justify="right")
    table.add_row("Actual Cost", f"${analysis.actual_cost:.2f}")
    table.add_row("Optimal Cost", f"${analysis.optimal_cost:.2f}")
    table.add_row("Waste", f"${analysis.waste:.2f}")
    table.add_row("Savings Opportunity", f"{analysis.savings_percentage:.1f}%")
    console.print(table)
Validation Checkpoint: You can run the analyzer and see a formatted report with costs and recommendations.
Testing Strategy
Unit Tests
# test_classifier.py
import pytest
from analyzer.classifier import classify_prompt, ComplexityLevel

class TestClassifier:
    def test_syntax_query_is_simple(self):
        prompt = "What's the syntax for optional chaining in TypeScript?"
        assert classify_prompt(prompt) == ComplexityLevel.SIMPLE

    def test_debug_query_is_medium(self):
        prompt = "Debug this segfault in my memory allocator"
        assert classify_prompt(prompt) == ComplexityLevel.MEDIUM

    def test_architecture_query_is_complex(self):
        prompt = "Design a microservices architecture for a fintech app"
        assert classify_prompt(prompt) == ComplexityLevel.COMPLEX

    def test_ambiguous_query_defaults_to_medium(self):
        prompt = "Help me with this code"
        assert classify_prompt(prompt) == ComplexityLevel.MEDIUM
Integration Tests
# test_integration.py
import tempfile

def test_full_analysis_pipeline():
    # Create a sample log file
    sample_logs = """
[2025-12-22T10:00:00] model=sonnet prompt="what is the syntax for..." tokens_in=50 tokens_out=100
[2025-12-22T10:01:00] model=opus prompt="design a new auth system" tokens_in=200 tokens_out=500
"""
    with tempfile.NamedTemporaryFile(mode='w', suffix='.log') as f:
        f.write(sample_logs)
        f.flush()
        result = analyze_logs(f.name)
    assert result.total_queries == 2
    assert result.cost_analysis.waste > 0  # Sonnet used for a simple query
Sample Data for Testing
Create a sample_logs/ directory with realistic test data:
// sample_logs/diverse_queries.json
[
{"prompt": "What's the Python syntax for list comprehension?", "model": "sonnet", "tokens_in": 30, "tokens_out": 150},
{"prompt": "Debug why this React component is re-rendering", "model": "sonnet", "tokens_in": 500, "tokens_out": 800},
{"prompt": "Design a distributed caching layer for our microservices", "model": "sonnet", "tokens_in": 200, "tokens_out": 2000},
{"prompt": "How do I import numpy?", "model": "opus", "tokens_in": 10, "tokens_out": 50}
]
Common Pitfalls and Debugging
Pitfall 1: Log Format Variations
Symptom: Parser works on some logs but fails on others
Cause: Kiro log format may change between versions
Debug:
# Print first few lines to understand format
with open(log_file) as f:
for i, line in enumerate(f):
print(f"Line {i}: {repr(line[:100])}")
if i > 5:
break
Solution: Build flexible parsers that try multiple formats:
def parse_line(line: str) -> Optional[dict]:
    parsers = [parse_json, parse_structured_text, parse_plain_text]
    for parser in parsers:
        result = parser(line)
        if result:
            return result
    return None
Pitfall 2: Missing Token Counts
Symptom: Token counts are zero or missing
Cause: Logs may not include token counts for all events
Debug:
grep -o 'tokens[^,]*' /path/to/logs | sort | uniq -c
Solution: Estimate tokens when missing:
def estimate_tokens(text: str) -> int:
    # Rough estimation: ~4 characters per token
    return len(text) // 4
Pitfall 3: Classification Disagreements
Symptom: Queries are classified differently than expected
Cause: Keyword-based classification is imperfect
Debug:
def classify_with_debug(prompt: str) -> tuple[str, dict]:
    # Returns the best-scoring level name plus per-level match details
    scores = {}
    for level, keywords in COMPLEXITY_KEYWORDS.items():
        matched = [k for k in keywords if k in prompt.lower()]
        scores[level] = {'count': len(matched), 'keywords': matched}
    return max(scores, key=lambda k: scores[k]['count']), scores
Solution: Allow manual override and feedback loop:
# Store corrections for learning
def record_correction(query_id: str, correct_complexity: ComplexityLevel):
    corrections_file = Path.home() / '.model-analyzer' / 'corrections.json'
    corrections_file.parent.mkdir(parents=True, exist_ok=True)
    # Load existing corrections, then merge in the new one
    corrections = json.loads(corrections_file.read_text()) if corrections_file.exists() else {}
    corrections[query_id] = correct_complexity.value
    corrections_file.write_text(json.dumps(corrections))
Pitfall 4: Pricing Data Outdated
Symptom: Cost calculations don’t match Kiro’s /usage output
Cause: Model pricing changes over time
Debug:
kiro-cli /usage --format json | jq '.credits'
Solution: Make pricing configurable:
# config.yaml
pricing:
  haiku:
    input_per_1k: 0.0008
    output_per_1k: 0.0032
  sonnet:
    input_per_1k: 0.003
    output_per_1k: 0.015
  opus:
    input_per_1k: 0.015
    output_per_1k: 0.075
Extensions and Challenges
Extension 1: Real-time Dashboard
Create a live dashboard that updates as you use Kiro:
# Use watchdog to monitor log files
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class LogHandler(FileSystemEventHandler):
    def on_modified(self, event):
        if event.src_path.endswith('.log'):
            self.update_dashboard()  # re-parse logs and redraw (your code)

observer = Observer()
observer.schedule(LogHandler(), path=log_dir)  # log_dir: your Kiro log directory
observer.start()
Extension 2: ML-Based Classifier
Replace keyword matching with a trained classifier:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Train on labeled examples (training_prompts, training_labels)
vectorizer = TfidfVectorizer()
classifier = MultinomialNB()
X = vectorizer.fit_transform(training_prompts)
classifier.fit(X, training_labels)
Extension 3: Team Analytics
Aggregate usage across team members for organization-wide insights:
./model-analyzer team-report --team-dir /shared/kiro-logs/
Extension 4: A/B Testing Framework
Compare model performance on similar queries:
def ab_test(prompt: str, models: List[Model]) -> dict:
    results = {}
    for model in models:
        response = run_query(prompt, force_model=model)
        results[model] = {
            'response': response,
            'latency': response.latency,
            'cost': calculate_cost(model, response.tokens_in, response.tokens_out)
        }
    return results
Challenge: Predictive Routing
Build a system that predicts optimal routing before the query is sent:
User starts typing: "design a..."
System predicts: COMPLEX (confidence: 0.85)
Suggestion: "This looks like an architecture question. Consider forcing Opus for best results."
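A naive first cut at this predictor might look at keyword prefixes only (the confidence numbers below are fabricated for illustration; a real version would be a trained model fed by the corrections feedback loop):

```python
from typing import Tuple

def predict_complexity(partial_prompt: str) -> Tuple[str, float]:
    """Guess complexity from a partial prompt as the user types."""
    p = partial_prompt.lower().lstrip()
    # Prefix keywords that strongly suggest an architecture-level task
    if any(p.startswith(k) for k in ("design", "architect", "plan")):
        return ("complex", 0.85)
    if any(k in p for k in ("debug", "fix", "error")):
        return ("medium", 0.70)
    return ("simple", 0.50)
```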
Real-World Connections
How Professionals Use This
- Cost Management: Engineering managers track AI spending per team/project
- Performance Optimization: DevOps teams monitor response latency vs. model selection
- Quality Assurance: Teams correlate model selection with code review feedback
- Capacity Planning: Predict AI compute needs based on usage patterns
Industry Patterns
LLM Observability (MLOps): This project introduces concepts used in production LLM monitoring systems like LangSmith, Weights & Biases, and Datadog LLM Observability.
Cost Attribution (FinOps): Tracking AI costs per project/feature mirrors cloud cost allocation practices.
Quality-Cost Tradeoffs (Engineering Economics): The model selection problem is a specific instance of the general engineering tradeoff between quality, speed, and cost.
Self-Assessment Checklist
Understanding Verification
- Can you explain when Haiku is sufficient vs. when Opus is needed?
- Haiku: Syntax, simple lookups, fast iteration
- Opus: Architecture, complex reasoning, creative solutions
- What factors influence the Auto router’s decision?
- Prompt complexity (keywords, length)
- Context size (files loaded)
- Historical patterns (conversation depth)
- How do you calculate cost savings from better routing?
- Compare actual model cost vs. optimal model cost per query
- Aggregate waste across session/week/month
- When should you override Auto and force a specific model?
- Force Haiku: Known simple queries, speed-critical loops
- Force Opus: Architecture discussions, complex debugging
Skill Demonstration
- I can parse Kiro logs and extract model selection events
- I can classify query complexity with reasonable accuracy
- I can calculate cost metrics and identify waste
- I can generate actionable recommendations
- I can visualize usage patterns in the terminal
Interview Preparation
Be ready to answer:
- “How would you design a model routing system for an AI application?”
- “What metrics would you track to optimize LLM costs?”
- “How do you balance cost vs. quality in AI deployments?”
- “How would you A/B test different models for the same task?”
Recommended Reading
| Topic | Resource | Why It Helps |
|---|---|---|
| LLM Engineering | “AI Engineering” by Chip Huyen, Ch. 4-6 | Deep dive into model serving and optimization |
| Cost Optimization | AWS Well-Architected Framework, Cost Pillar | General principles of cloud cost management |
| Data Analysis | “Python for Data Analysis” by McKinney, Ch. 8 | Pandas patterns for log analysis |
| Visualization | Rich documentation (rich.readthedocs.io) | Beautiful terminal output |
| ML Classification | Scikit-learn tutorials | For ML-based classifier extension |
What Success Looks Like
When you complete this project, you will have:
- A Working Tool: A model-analyzer command that generates usage reports
- Cost Awareness: Intuition for when to override the Auto router
- Data Analysis Skills: Log parsing, classification, and visualization
- Optimization Mindset: Identifying waste and recommending improvements
- Foundation for MLOps: Understanding of LLM observability concepts
Next Steps: Move to Project 3 (Context Window Visualizer) to understand token economics in depth.