
AI SDK LEARNING PROJECTS


Learning the AI SDK (Vercel) Deeply

Goal: Master the Vercel AI SDK through hands-on projects that teach its core concepts by building real applications. By the end of these projects, you will understand how to generate text and structured data from LLMs, implement real-time streaming interfaces, build autonomous agents that use tools, and create production-ready AI systems with proper error handling, cost tracking, and multi-provider support.


Why the AI SDK Matters

In 2023, when ChatGPT exploded onto the scene, developers scrambled to build AI-powered applications. The problem? Every LLM provider had a different API. OpenAI used one format, Anthropic another, Google yet another. Code written for one provider couldn’t be ported to another without significant rewrites.

Vercel’s AI SDK solved this problem with a radical idea: a unified TypeScript interface that abstracts provider differences. Write once, run on any model. But it’s not just about abstraction—the SDK provides:

  • Type-safe structured output with Zod schemas
  • First-class streaming with Server-Sent Events and React hooks
  • Tool calling that lets LLMs take actions, not just generate text
  • Agent loops that run autonomously until tasks complete

Today, the AI SDK powers thousands of production applications. Understanding it deeply means understanding how modern AI applications are built.

The AI SDK in the Ecosystem

┌─────────────────────────────────────────────────────────────────────────────┐
│                         YOUR APPLICATION                                      │
│                                                                               │
│   ┌───────────────────────────────────────────────────────────────────────┐  │
│   │                         AI SDK (Unified API)                           │  │
│   │                                                                        │  │
│   │   ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  │  │
│   │   │ generateText │  │ streamText  │  │generateObject│ │ streamObject│  │  │
│   │   │   Batch      │  │  Real-time  │  │ Structured  │  │  Streaming  │  │  │
│   │   │   Output     │  │  Streaming  │  │   Output    │  │  Structured │  │  │
│   │   └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘  │  │
│   │                                                                        │  │
│   │   ┌──────────────────────────────┐  ┌────────────────────────────────┐│  │
│   │   │       Tools & Agents          │  │     React/Vue/Svelte Hooks    ││  │
│   │   │   stopWhen, prepareStep,      │  │   useChat, useCompletion,     ││  │
│   │   │   tool(), Agent class         │  │   useObject                   ││  │
│   │   └──────────────────────────────┘  └────────────────────────────────┘│  │
│   │                                                                        │  │
│   └────────────────────────────────┬──────────────────────────────────────┘  │
│                                    │                                          │
│                    Provider Abstraction Layer                                 │
│                                    │                                          │
│   ┌────────────┬──────────────┬────┴─────┬──────────────┬────────────────┐   │
│   │            │              │          │              │                │   │
│   ▼            ▼              ▼          ▼              ▼                ▼   │
│ ┌──────┐   ┌──────────┐   ┌───────┐  ┌───────┐   ┌──────────┐   ┌───────┐   │
│ │OpenAI│   │Anthropic │   │Google │  │Mistral│   │ Cohere   │   │ Local │   │
│ │ GPT  │   │ Claude   │   │Gemini │  │       │   │          │   │Models │   │
│ └──────┘   └──────────┘   └───────┘  └───────┘   └──────────┘   └───────┘   │
└─────────────────────────────────────────────────────────────────────────────┘

AI SDK Ecosystem


Core Concepts Deep Dive

Before diving into projects, you must understand the fundamental concepts that make the AI SDK powerful. Each concept builds on the previous one—don’t skip ahead.

1. Text Generation: The Foundation

At its core, the AI SDK does one thing: sends prompts to LLMs and gets responses back. But HOW you get those responses matters enormously.

┌────────────────────────────────────────────────────────────────────────────┐
│                    TEXT GENERATION PATTERNS                                  │
├────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   generateText (Blocking)                 streamText (Real-time)             │
│   ─────────────────────────               ──────────────────────             │
│                                                                              │
│   Client          Server                  Client          Server             │
│      │               │                       │               │               │
│      │──── POST ────►│                       │──── POST ────►│               │
│      │               │                       │               │               │
│      │   (waiting)   │ ◄─────────────────┐   │   (waiting)   │ ◄──────────┐  │
│      │               │ Processing LLM    │   │               │ Start LLM  │  │
│      │               │ response...       │   │◄── token ─────│            │  │
│      │               │ (could be 10s+)   │   │◄── token ─────│ streaming  │  │
│      │               │                   │   │◄── token ─────│            │  │
│      │◄─ COMPLETE ───│ ──────────────────┘   │◄── token ─────│            │  │
│      │               │                       │◄── [done] ────│ ───────────┘  │
│      │               │                       │               │               │
│                                                                              │
│   USE WHEN:                               USE WHEN:                          │
│   • Background processing                 • Interactive UIs                  │
│   • Batch operations                      • Chat interfaces                  │
│   • Email drafting                        • Real-time feedback               │
│   • Agent tool calls                      • Long-form generation             │
│                                                                              │
└────────────────────────────────────────────────────────────────────────────┘

Text Generation Patterns

Key Insight: generateText blocks until the full response is ready. streamText returns an async iterator that yields tokens as they’re generated. For a 500-word response, generateText makes the user wait 5-10 seconds for anything to appear; streamText shows the first word in milliseconds.

import { generateText, streamText } from 'ai';
import { openai } from '@ai-sdk/openai';

// Blocking - waits for complete response
const { text } = await generateText({
  model: openai('gpt-4'),
  prompt: 'Explain quantum computing in 500 words'
});
console.log(text); // Full response after ~10 seconds

// Streaming - yields tokens as they arrive
const { textStream } = streamText({
  model: openai('gpt-4'),
  prompt: 'Explain quantum computing in 500 words'
});
for await (const chunk of textStream) {
  process.stdout.write(chunk); // Each word appears immediately
}

2. Structured Output: Type-Safe AI

Raw text from LLMs is messy. Ask for JSON and you might get markdown; ask for a number and you might get “approximately 42.” generateObject solves this by enforcing Zod schemas:

┌────────────────────────────────────────────────────────────────────────────┐
│                    STRUCTURED OUTPUT FLOW                                    │
├────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   User Input                  Schema Definition              Typed Output    │
│   ──────────                  ─────────────────              ────────────    │
│                                                                              │
│   "Spent $45.50 on      ┌─────────────────────┐         {                    │
│    dinner with client    │  z.object({         │           amount: 45.50,    │
│    at Italian            │    amount: z.number │           category:         │
│    restaurant            │    category: z.enum │             "dining",       │
│    last Tuesday"         │    vendor: z.string │           vendor: "Italian  │
│                          │    date: z.date()   │             Restaurant",    │
│         │                │  })                 │           date: Date        │
│         │                └──────────┬──────────┘         }                   │
│         │                           │                       ▲                │
│         │                           │                       │                │
│         └───────────────────────────┼───────────────────────┘                │
│                                     │                                        │
│                              ┌──────┴──────┐                                 │
│                              │ generateObject│                                │
│                              │    + LLM     │                                │
│                              └─────────────┘                                 │
│                                                                              │
│   The LLM "sees" the schema and generates valid data.                        │
│   If validation fails, AI SDK throws AI_NoObjectGeneratedError.              │
│                                                                              │
└────────────────────────────────────────────────────────────────────────────┘

Structured Output Flow

Key Insight: Schema descriptions are prompt engineering. The LLM reads your schema including field descriptions to understand what you want. Better descriptions = better extraction.

const expenseSchema = z.object({
  amount: z.number().describe('The monetary amount spent in dollars'),
  category: z.enum(['dining', 'travel', 'office', 'entertainment'])
    .describe('The expense category for accounting'),
  vendor: z.string().describe('The business name where money was spent'),
  date: z.date().describe('When the expense occurred')
});

const { object } = await generateObject({
  model: openai('gpt-4'),
  schema: expenseSchema,
  prompt: 'Spent $45.50 on dinner with client at Italian restaurant last Tuesday'
});

// object is fully typed: { amount: number, category: "dining" | ..., ... }

3. Tools: AI That Takes Action

Text generation is passive—the AI talks, you listen. Tools make AI active—the AI can DO things.

┌────────────────────────────────────────────────────────────────────────────┐
│                       TOOL CALLING FLOW                                      │
├────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   ┌──────────────────────────────────────────────────────────────────────┐  │
│   │                           Tool Registry                               │  │
│   │                                                                       │  │
│   │   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐               │  │
│   │   │ getWeather  │   │ searchWeb   │   │ sendEmail   │               │  │
│   │   │             │   │             │   │             │               │  │
│   │   │ description:│   │ description:│   │ description:│               │  │
│   │   │ "Get current│   │ "Search the │   │ "Send an    │               │  │
│   │   │  weather    │   │  web for    │   │  email to   │               │  │
│   │   │  for city"  │   │  information│   │  a recipient│               │  │
│   │   │             │   │  "          │   │  "          │               │  │
│   │   │ input:      │   │ input:      │   │ input:      │               │  │
│   │   │  {city}     │   │  {query}    │   │  {to,subj,  │               │  │
│   │   │             │   │             │   │   body}     │               │  │
│   │   └─────────────┘   └─────────────┘   └─────────────┘               │  │
│   └──────────────────────────────────────────────────────────────────────┘  │
│                                    │                                         │
│                                    │ LLM sees descriptions                   │
│                                    │ and chooses which to call               │
│                                    ▼                                         │
│   ┌──────────────────────────────────────────────────────────────────────┐  │
│   │  User: "What's the weather in Tokyo and email it to john@example.com" │  │
│   │                                                                        │  │
│   │  LLM Reasoning:                                                        │  │
│   │   1. I need weather data → call getWeather({city: "Tokyo"})           │  │
│   │   2. I need to send email → call sendEmail({to: "john@...", ...})     │  │
│   │                                                                        │  │
│   │  SDK executes tools, returns results to LLM                            │  │
│   │  LLM generates final response incorporating tool results               │  │
│   └──────────────────────────────────────────────────────────────────────┘  │
│                                                                              │
└────────────────────────────────────────────────────────────────────────────┘

Tool Calling Flow

Key Insight: The LLM decides WHEN and WHICH tools to call based on your descriptions. You don’t control the flow—you define capabilities and let the LLM orchestrate.

const tools = {
  getWeather: tool({
    description: 'Get current weather for a city',
    parameters: z.object({
      city: z.string().describe('City name')
    }),
    execute: async ({ city }) => {
      const response = await fetch(`https://api.weather.com/${city}`);
      return response.json();
    }
  })
};

const { text, toolCalls } = await generateText({
  model: openai('gpt-4'),
  tools,
  prompt: 'What is the weather in Tokyo?'
});
// The LLM chose to call getWeather; see the note below on letting it incorporate the result
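
Note that by default generateText runs a single step: the model emits the tool call and the SDK executes it, but the model does not get a second turn to weave the result into its reply. To allow that, you permit additional steps. A minimal sketch, reusing the tools object from above and assuming the AI SDK v5-style stopWhen API (older releases use maxSteps instead):

import { generateText, stepCountIs } from 'ai';
import { openai } from '@ai-sdk/openai';

// Step 1: the model calls getWeather. Step 2: it writes a reply using the tool result.
const { text, steps } = await generateText({
  model: openai('gpt-4'),
  tools, // the tool registry defined above
  stopWhen: stepCountIs(2), // allow one follow-up step after the tool call
  prompt: 'What is the weather in Tokyo?'
});
console.log(text); // now includes the fetched weather data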

4. Agents: Autonomous AI

A tool call is a single action. An agent is an LLM in a loop, calling tools repeatedly until a task is complete.

┌────────────────────────────────────────────────────────────────────────────┐
│                         AGENT LOOP ARCHITECTURE                              │
├────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   User Goal: "Research quantum computing and write a summary"               │
│                                                                              │
│   ┌────────────────────────────────────────────────────────────────────┐    │
│   │                        AGENT LOOP                                   │    │
│   │                                                                     │    │
│   │   ┌─────────────────────────────────────────────────────────────┐  │    │
│   │   │  prepareStep: Inject accumulated context                     │  │    │
│   │   │    • "You have learned: [facts from previous steps]"        │  │    │
│   │   │    • "Sources visited: [urls]"                              │  │    │
│   │   └──────────────────────────┬──────────────────────────────────┘  │    │
│   │                              │                                      │    │
│   │                              ▼                                      │    │
│   │   ┌─────────────────────────────────────────────────────────────┐  │    │
│   │   │  LLM Decision: What should I do next?                        │  │    │
│   │   │                                                              │  │    │
│   │   │  Step 1: "I need to search" → webSearch("quantum computing")│  │    │
│   │   │  Step 2: "I should read this" → readPage("nature.com/...")  │  │    │
│   │   │  Step 3: "I found facts" → extractFacts(content)            │  │    │
│   │   │  Step 4: "Need more info" → webSearch("quantum error...")   │  │    │
│   │   │  ...                                                         │  │    │
│   │   │  Step N: "I have enough" → synthesize final answer          │  │    │
│   │   └──────────────────────────┬──────────────────────────────────┘  │    │
│   │                              │                                      │    │
│   │                              ▼                                      │    │
│   │   ┌─────────────────────────────────────────────────────────────┐  │    │
│   │   │  stopWhen: Check if agent should terminate                   │  │    │
│   │   │    • hasToolCall('synthesize') → true: STOP                 │  │    │
│   │   │    • stepCount > maxSteps → true: STOP                      │  │    │
│   │   │    • otherwise → false: CONTINUE LOOP                       │  │    │
│   │   └─────────────────────────────────────────────────────────────┘  │    │
│   │                                                                     │    │
│   └────────────────────────────────────────────────────────────────────┘    │
│                                                                              │
│   Output: Complete research summary with citations                           │
│                                                                              │
└────────────────────────────────────────────────────────────────────────────┘

Agent Loop Architecture

Key Insight: stopWhen and prepareStep are your control mechanisms. prepareStep injects state before each iteration; stopWhen decides when to stop. The agent is autonomous between these boundaries.

import { generateText, hasToolCall } from 'ai';
import { openai } from '@ai-sdk/openai';

const { text, steps } = await generateText({
  model: openai('gpt-4'),
  tools: { search, readPage, synthesize },
  stopWhen: hasToolCall('synthesize'), // Stop when the synthesis tool is called
  prepareStep: async ({ steps }) => {
    // Inject knowledge accumulated in previous steps before each new step
    const facts = extractFacts(steps);
    return {
      system: `You are a research agent. Facts learned so far: ${facts}`
    };
  },
  prompt: 'Research quantum computing and write a summary'
});

5. Provider Abstraction: Write Once, Run Anywhere

Different LLM providers have different APIs, capabilities, and quirks. The AI SDK normalizes them:

┌────────────────────────────────────────────────────────────────────────────┐
│                     PROVIDER ABSTRACTION                                     │
├────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   YOUR CODE (unchanged)                                                      │
│   ─────────────────────                                                      │
│                                                                              │
│   const result = await generateText({                                        │
│     model: provider('model-name'),  ◄── Only this line changes              │
│     prompt: 'Your prompt here'                                               │
│   });                                                                        │
│                                                                              │
│   ┌────────────────────────────────────────────────────────────────────┐    │
│   │                    Provider Implementations                         │    │
│   │                                                                     │    │
│   │   openai('gpt-4')          → OpenAI REST API                       │    │
│   │   anthropic('claude-3')    → Anthropic Messages API                │    │
│   │   google('gemini-pro')     → Google Generative AI API              │    │
│   │   mistral('mistral-large') → Mistral La Plateforme API             │    │
│   │   ollama('llama2')         → Local Ollama HTTP API                 │    │
│   │                                                                     │    │
│   │   Each provider handles:                                            │    │
│   │   • Authentication (API keys, tokens)                              │    │
│   │   • Request format translation                                     │    │
│   │   • Response normalization                                         │    │
│   │   • Streaming protocol differences                                 │    │
│   │   • Error mapping to AI SDK error types                            │    │
│   │                                                                     │    │
│   └────────────────────────────────────────────────────────────────────┘    │
│                                                                              │
│   USE CASE: Fallback chains, cost optimization, capability routing          │
│                                                                              │
│   // Try Claude for reasoning, fall back to GPT-4                           │
│   try {                                                                      │
│     return await generateText({ model: anthropic('claude-3-opus') });       │
│   } catch {                                                                  │
│     return await generateText({ model: openai('gpt-4') });                  │
│   }                                                                          │
│                                                                              │
└────────────────────────────────────────────────────────────────────────────┘

Provider Abstraction
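
The fallback chain sketched in the diagram takes only a few real lines. A minimal sketch; the model names are illustrative, so substitute whichever models your API keys can reach:

import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';
import { anthropic } from '@ai-sdk/anthropic';

// Try Claude first; fall back to an OpenAI model if the call fails.
// Only the `model` argument differs between the two branches.
async function generateWithFallback(prompt: string) {
  try {
    return await generateText({ model: anthropic('claude-3-5-sonnet-latest'), prompt });
  } catch (primaryError) {
    console.warn('Anthropic call failed, falling back to OpenAI:', primaryError);
    return await generateText({ model: openai('gpt-4o'), prompt });
  }
}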

6. Streaming Architecture: Server-Sent Events

Understanding HOW streaming works is crucial for building real-time AI interfaces:

┌────────────────────────────────────────────────────────────────────────────┐
│                    STREAMING DATA FLOW                                       │
├────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│   Browser                 Next.js API Route              LLM Provider        │
│     │                          │                              │              │
│     │── POST /api/chat ───────►│                              │              │
│     │                          │── streamText() ─────────────►│              │
│     │                          │                              │              │
│     │                          │◄─ AsyncIterableStream ───────│              │
│     │                          │   (yields token by token)    │              │
│     │                          │                              │              │
│     │                   ┌──────┴──────┐                       │              │
│     │                   │ toDataStream│                       │              │
│     │                   │  Response() │                       │              │
│     │                   └──────┬──────┘                       │              │
│     │                          │                              │              │
│     │◄─ SSE: data: {"type":"text","value":"The"} ─────────────│              │
│     │◄─ SSE: data: {"type":"text","value":" quantum"} ────────│              │
│     │◄─ SSE: data: {"type":"text","value":" computer"} ───────│              │
│     │◄─ SSE: data: {"type":"finish"} ─────────────────────────│              │
│     │                          │                              │              │
│   ┌─┴─┐                        │                              │              │
│   │useChat hook               │                              │              │
│   │processes SSE              │                              │              │
│   │updates React state        │                              │              │
│   │triggers re-render         │                              │              │
│   └───┘                        │                              │              │
│                                                                              │
│   SSE Format:                                                                │
│   ───────────                                                                │
│   event: message                                                             │
│   data: {"type":"text-delta","textDelta":"The"}                              │
│                                                                              │
│   data: {"type":"text-delta","textDelta":" answer"}                          │
│                                                                              │
│   data: {"type":"finish","finishReason":"stop"}                              │
│                                                                              │
└────────────────────────────────────────────────────────────────────────────┘

Key Insight: Server-Sent Events are unidirectional (server → client), simpler than WebSockets, and perfect for LLM streaming. The AI SDK handles all the serialization and React state management.
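
To ground the diagram, here is a minimal sketch of the server half: a Next.js App Router handler (an assumed app/api/chat/route.ts) that turns the streamText result into the SSE response a useChat client consumes. The toDataStreamResponse helper matches the naming used elsewhere in this guide; newer SDK releases may expose it under a different name, so check your installed version.

// app/api/chat/route.ts (assumed path)
import { streamText } from 'ai';
import { openai } from '@ai-sdk/openai';

export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = streamText({
    model: openai('gpt-4o'),
    messages, // conversation history sent by the useChat hook
  });

  // Serializes the token stream as Server-Sent Events for the client.
  return result.toDataStreamResponse();
}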


Concept Summary Table

Concept Cluster | What You Need to Internalize
Text Generation | generateText is blocking, streamText is real-time. Both are the foundation for all LLM interactions.
Structured Output | generateObject transforms unstructured text into typed, validated data. Zod schemas guide LLM output. Schema descriptions are prompt engineering.
Tool Calling | Tools are functions the LLM can invoke. The LLM decides WHEN and WHICH tool to call based on descriptions. You define capabilities; the LLM orchestrates.
Agent Loop | An agent is an LLM in a loop, calling tools until a task is complete. stopWhen and prepareStep are your control mechanisms.
Provider Abstraction | Switch between OpenAI, Anthropic, Google with one line. The SDK normalizes API differences, auth, streaming protocols.
Streaming Architecture | SSE transport, AsyncIterableStream, token-by-token delivery. React hooks (useChat, useCompletion) handle client-side state.
Error Handling | AI_NoObjectGeneratedError, provider failures, stream errors. Production AI needs graceful degradation and retry logic.
Telemetry | Track tokens, costs, latency per request. Essential for production AI systems and cost optimization.
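
The last two rows are the ones teams forget until production. As a small illustration of the telemetry row, every result object exposes usage data you can log per request; the exact field names vary by SDK version (promptTokens/completionTokens in older releases, inputTokens/outputTokens in newer ones), so treat this as a sketch:

import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

const result = await generateText({
  model: openai('gpt-4o-mini'),
  prompt: 'Summarize the AI SDK in one sentence.'
});

// Log token usage and finish reason for cost tracking and debugging.
console.log({
  finishReason: result.finishReason,
  usage: result.usage,
});
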

Deep Dive Reading By Concept

Text Generation
  • “JavaScript: The Definitive Guide” by David Flanagan - Ch. 13 (Asynchronous JavaScript, Promises, async/await)
  • AI SDK generateText docs
  • AI SDK streamText docs
Structured Output
  • “Programming TypeScript” by Boris Cherny - Ch. 3 (Types), Ch. 6 (Advanced Types)
  • AI SDK generateObject docs
  • Zod documentation - Schema validation patterns
Tool Calling
  • “Building LLM Apps” by Harrison Chase (LangChain blog series)
  • AI SDK Tools and Tool Calling
  • How to build AI Agents with Vercel
Agent Loop
  • “ReAct: Synergizing Reasoning and Acting” (Yao et al.) - The academic foundation
  • AI SDK Agents docs
  • “Artificial Intelligence: A Modern Approach” by Russell & Norvig - Ch. 2 (Intelligent Agents)
Provider Abstraction
  • “Design Patterns” by Gang of Four - Adapter pattern
  • AI SDK Providers docs
  • “Designing Data-Intensive Applications” by Martin Kleppmann - Ch. 4 (Encoding and Evolution)
Streaming Architecture
  • “JavaScript: The Definitive Guide” by David Flanagan - Ch. 13 (Async Iteration), Ch. 15.11 (Server-Sent Events)
  • “Node.js Design Patterns” by Mario Casciaro - Ch. 6 (Streams)
  • MDN Server-Sent Events
  • AI SDK UI hooks docs
Error Handling
  • “Programming TypeScript” by Boris Cherny - Ch. 7 (Handling Errors)
  • “Release It!, 2nd Edition” by Michael Nygard - Ch. 5 (Stability Patterns)
  • AI SDK Error Handling docs
Telemetry
  • AI SDK Telemetry docs
  • “Designing Data-Intensive Applications” by Martin Kleppmann - Ch. 1 (Reliability, Observability)
  • OpenTelemetry documentation for observability patterns

Project 1: AI-Powered Expense Tracker CLI

📖 View Detailed Guide →

  • File: AI_SDK_LEARNING_PROJECTS.md
  • Programming Language: TypeScript
  • Coolness Level: Level 2: Practical but Forgettable
  • Business Potential: Level 2: The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 1: Beginner
  • Knowledge Area: Generative AI / CLI Tools
  • Software or Tool: AI SDK / Zod
  • Main Book: “Programming TypeScript” by Boris Cherny

What you’ll build: A command-line tool where you describe expenses in natural language (“Spent $45.50 on dinner with client at Italian restaurant”) and it extracts, categorizes, and stores structured expense records.

Why it teaches AI SDK: This forces you to understand generateObject and Zod schemas at their core. You’ll see how the LLM transforms unstructured human text into validated, typed data—the bread and butter of real AI applications.

Core challenges you’ll face:

  • Designing Zod schemas that guide LLM output effectively (maps to structured output)
  • Handling validation errors when the LLM produces invalid data (maps to error handling)
  • Adding schema descriptions to improve extraction accuracy (maps to prompt engineering)
  • Supporting multiple categories and edge cases (maps to schema design)

Key Concepts:

  • Zod Schema Design: AI SDK Generating Structured Data Docs
  • TypeScript Type Inference: “Programming TypeScript” by Boris Cherny - Ch. 3
  • CLI Development: “Command-Line Rust” by Ken Youens-Clark (patterns apply to TS too)

Difficulty: Beginner | Time estimate: Weekend | Prerequisites: Basic TypeScript, npm/pnpm

Learning milestones:

  1. First generateObject call returns parsed expense → you understand schema-to-output mapping
  2. Adding descriptions to schema fields improves extraction → you grasp how LLMs consume schemas
  3. Handling AI_NoObjectGeneratedError gracefully → you understand AI SDK error patterns

Real World Outcome

When you run the CLI, here’s exactly what you’ll see in your terminal:

$ expense "Coffee with team $23.40 at Starbucks this morning"

✓ Expense recorded

┌─────────────────────────────────────────────────────────────────┐
│                        EXPENSE RECORD                            │
├─────────────────────────────────────────────────────────────────┤
│  Amount:     $23.40                                              │
│  Category:   dining                                              │
│  Vendor:     Starbucks                                           │
│  Date:       2025-12-22                                          │
│  Notes:      Coffee with team                                    │
├─────────────────────────────────────────────────────────────────┤
│  ID:         exp_a7f3b2c1                                        │
│  Created:    2025-12-22T10:34:12Z                                │
└─────────────────────────────────────────────────────────────────┘

Saved to ~/.expenses/2025-12.json

Try more complex natural language inputs:

$ expense "Took an Uber from airport to hotel, $67.80, for the Chicago conference trip"

✓ Expense recorded

┌─────────────────────────────────────────────────────────────────┐
│                        EXPENSE RECORD                            │
├─────────────────────────────────────────────────────────────────┤
│  Amount:     $67.80                                              │
│  Category:   travel                                              │
│  Vendor:     Uber                                                │
│  Date:       2025-12-22                                          │
│  Notes:      Airport to hotel, Chicago conference                │
├─────────────────────────────────────────────────────────────────┤
│  ID:         exp_b8e4c3d2                                        │
│  Created:    2025-12-22T10:35:45Z                                │
└─────────────────────────────────────────────────────────────────┘

Generate reports:

$ expense report --month 2025-12

┌─────────────────────────────────────────────────────────────────┐
│              EXPENSE REPORT: December 2025                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  SUMMARY BY CATEGORY                                             │
│  ───────────────────                                             │
│  dining        │████████████████     │  $234.50  (12 expenses)  │
│  travel        │████████████         │  $567.80  (5 expenses)   │
│  office        │████                 │  $89.20   (3 expenses)   │
│  entertainment │██                   │  $45.00   (2 expenses)   │
│  ─────────────────────────────────────────────────────────────  │
│  TOTAL                                 $936.50  (22 expenses)   │
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Exported to ~/.expenses/report-2025-12.csv

Handle errors gracefully:

$ expense "bought something"

⚠ Could not extract expense details

Missing information:
  • Amount: No monetary value found
  • Vendor: No vendor/merchant identified

Please include at least an amount, e.g.:
  expense "bought lunch $15 at Chipotle"

The Core Question You’re Answering

“How do I transform messy, unstructured human text into clean, typed, validated data structures using AI?”

This is THE fundamental pattern of modern AI applications. Every chatbot that fills out forms, every assistant that creates calendar events, every tool that extracts data from documents—they all use this pattern. You describe something in plain English, and the AI SDK + LLM extracts structured data.

Before you write code, understand: generateObject is not just “LLM call with schema.” The schema itself is part of the prompt. The LLM sees your Zod schema including field names, types, and descriptions. Better schemas = better extraction.

Concepts You Must Understand First

Stop and research these before coding:

  1. Zod Schemas as LLM Instructions
    • What is a Zod schema and how does TypeScript infer types from it?
    • How does generateObject send the schema to the LLM?
    • Why do .describe() methods on schema fields improve extraction?
    • Reference: Zod documentation - Start here
  2. generateObject vs generateText
    • When would you use generateText vs generateObject?
    • What happens internally when you call generateObject?
    • What is AI_NoObjectGeneratedError and when does it occur?
    • Reference: AI SDK generateObject docs
  3. TypeScript Type Inference
    • How does z.infer<typeof schema> work? (a short example follows this list)
    • Why is this important for type-safe AI applications?
    • Book Reference: “Programming TypeScript” by Boris Cherny - Ch. 3 (Types)
  4. Error Handling in AI Systems
    • What happens when the LLM generates data that doesn’t match the schema?
    • How do you handle partial matches or missing fields?
    • What’s the difference between validation errors and generation errors?
    • Book Reference: “Programming TypeScript” by Boris Cherny - Ch. 7 (Handling Errors)
  5. CLI Design Patterns
    • How do you parse command-line arguments in Node.js?
    • What makes a good CLI user experience?
    • Book Reference: “Command-Line Rust” by Ken Youens-Clark - Ch. 1-2 (patterns apply to TypeScript)
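
Picking up question 3 above, here is a small illustration of how z.infer derives a compile-time type from a runtime schema; the fields are just an example:

import { z } from 'zod';

const expenseSchema = z.object({
  amount: z.number(),
  vendor: z.string(),
  notes: z.string().optional(),
});

// The static type is derived from the runtime schema, so there is one source of truth.
type Expense = z.infer<typeof expenseSchema>;
// type Expense = { amount: number; vendor: string; notes?: string }

const e: Expense = { amount: 4.5, vendor: 'Starbucks' }; // type-checks without notes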

Questions to Guide Your Design

Before implementing, think through these:

  1. Schema Design
    • What fields does an expense record need? (amount, category, vendor, date, notes?)
    • What data types should each field be? (number, enum, string, Date?)
    • Which fields are required vs optional?
    • How do you handle ambiguous categories? (Is “Uber” travel or transportation?)
  2. Natural Language Parsing
    • How many ways can someone describe “$45.50”? (“45.50”, “$45.50”, “forty-five fifty”, “about 45 bucks”)
    • How do you handle relative dates? (“yesterday”, “last Tuesday”, “this morning”)
    • What if the vendor is implied but not stated? (“got coffee” → Starbucks?)
  3. Storage and Persistence
    • Where do you store expenses? (JSON file, SQLite, in-memory?)
    • How do you organize by month/year for reporting?
    • How do you handle concurrent writes?
  4. Error Recovery
    • What do you do when extraction fails completely?
    • How do you handle partial extraction (got amount but no vendor)?
    • Should you prompt the user for missing information?
  5. CLI Interface
    • What commands do you need? (add, list, report, export?)
    • How do you handle interactive vs non-interactive modes?
    • What output formats do you support? (JSON, table, CSV?)

Thinking Exercise

Before coding, design your schema on paper:

// Start with this skeleton and fill in the blanks:

const expenseSchema = z.object({
  // What fields do you need?
  // What types should they be?
  // What descriptions will help the LLM understand what you want?

  amount: z.number().describe('???'),
  category: z.enum(['???']).describe('???'),
  vendor: z.string().describe('???'),
  date: z.string().describe('???'), // or z.date()?
  notes: z.string().optional().describe('???'),
});

// Now trace through these inputs:
// 1. "Coffee $4.50 at Starbucks"
// 2. "Spent around 50 bucks on office supplies at Amazon yesterday"
// 3. "Uber to airport"  ← No amount! What happens?
// 4. "Bought stuff"     ← Very ambiguous! What happens?

Questions while tracing:

  • Which inputs will extract cleanly?
  • Which will cause validation errors?
  • How would you modify your schema to handle more edge cases?
  • What descriptions would help the LLM interpret “around 50 bucks”?

The Interview Questions They’ll Ask

Prepare to answer these:

  1. “What is the difference between generateText and generateObject?”
    • generateText returns unstructured text. generateObject returns a typed object validated against a Zod schema. Use generateObject when you need structured, validated data.
  2. “How does Zod work with the AI SDK?”
    • Zod schemas define the expected structure. The AI SDK serializes the schema (including descriptions) and sends it to the LLM. The LLM generates JSON matching the schema. The SDK validates the response and returns a typed object.
  3. “What happens if the LLM generates invalid data?”
    • The SDK throws AI_NoObjectGeneratedError. You can catch this and retry, prompt for more information, or fall back gracefully.
  4. “How do schema descriptions affect LLM output quality?”
    • Descriptions are essentially prompt engineering embedded in your type definitions. Clear descriptions with examples dramatically improve extraction accuracy.
  5. “How would you handle partial extraction?”
    • Use optional fields (.optional()) for non-critical data. For required fields, catch the validation error and prompt the user for the missing information.
  6. “What are the tradeoffs of different expense categories?”
    • z.enum() limits categories but ensures consistency. z.string() is flexible but may result in inconsistent categorization. A middle ground: use z.enum() with a catch-all “other” category.

Hints in Layers

Hint 1 (Basic Setup): Start with the simplest possible schema and a single command:

import { generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

const expenseSchema = z.object({
  amount: z.number(),
  vendor: z.string(),
});

const { object } = await generateObject({
  model: openai('gpt-4o-mini'),
  schema: expenseSchema,
  prompt: process.argv[2], // "Coffee $5 at Starbucks"
});

console.log(object);

Run it and see what you get. Does it work? What’s missing?

Hint 2 (Add Descriptions): Descriptions dramatically improve extraction:

const expenseSchema = z.object({
  amount: z.number()
    .describe('The monetary amount spent in US dollars. Extract from phrases like "$45.50", "45 dollars", "about 50 bucks".'),
  vendor: z.string()
    .describe('The business or merchant name where the purchase was made.'),
  category: z.enum(['dining', 'travel', 'office', 'entertainment', 'other'])
    .describe('The expense category. Use "dining" for restaurants and coffee shops, "travel" for transportation and hotels.'),
});

Hint 3 (Handle Errors): Wrap your call in try/catch:

import { NoObjectGeneratedError } from 'ai';

try {
  const { object } = await generateObject({ ... });
  console.log('✓ Expense recorded');
  console.log(object);
} catch (error) {
  // The error's name property is 'AI_NoObjectGeneratedError'; the exported class is NoObjectGeneratedError.
  if (NoObjectGeneratedError.isInstance(error)) {
    console.log('⚠ Could not extract expense details');
    console.log('Please include an amount and vendor.');
  } else {
    throw error;
  }
}

Hint 4 (Add Persistence): Store expenses in a JSON file:

import { readFileSync, writeFileSync, existsSync } from 'fs';

// Expense here is the schema's inferred type plus metadata, e.g.
// type Expense = z.infer<typeof expenseSchema> & { id: string; createdAt: Date };
const EXPENSES_FILE = './expenses.json';

function loadExpenses(): Expense[] {
  if (!existsSync(EXPENSES_FILE)) return [];
  return JSON.parse(readFileSync(EXPENSES_FILE, 'utf-8'));
}

function saveExpense(expense: Expense) {
  const expenses = loadExpenses();
  // crypto.randomUUID() is global in Node 19+; otherwise import { randomUUID } from 'node:crypto'
  expenses.push({ ...expense, id: crypto.randomUUID(), createdAt: new Date() });
  writeFileSync(EXPENSES_FILE, JSON.stringify(expenses, null, 2));
}

Hint 5 (Build the Report Command): Group expenses by category:

const expenses = loadExpenses();
// Object.groupBy requires Node 21+ (ES2024); on older runtimes, group with a reduce instead
const byCategory = Object.groupBy(expenses, (e) => e.category);

for (const [category, items = []] of Object.entries(byCategory)) {
  const total = items.reduce((sum, e) => sum + e.amount, 0);
  console.log(`${category}: $${total.toFixed(2)} (${items.length} expenses)`);
}

Books That Will Help

Topic | Book | Chapter
TypeScript fundamentals | “Programming TypeScript” by Boris Cherny | Ch. 3 (Types), Ch. 6 (Advanced Types)
Error handling patterns | “Programming TypeScript” by Boris Cherny | Ch. 7 (Handling Errors)
Zod and validation | Zod documentation | Entire guide
CLI design patterns | “Command-Line Rust” by Ken Youens-Clark | Ch. 1-2 (patterns apply to TS)
Async/await patterns | “JavaScript: The Definitive Guide” by David Flanagan | Ch. 13 (Asynchronous JavaScript)
AI SDK structured output | AI SDK Docs | Generating Structured Data

Recommended reading order:

  1. Zod documentation (30 min) - Understand schema basics
  2. AI SDK generateObject docs (30 min) - Understand the API
  3. Boris Cherny Ch. 3 (1 hour) - Deep TypeScript types
  4. Then start coding!

Project 2: Real-Time Document Summarizer with Streaming UI

📖 View Detailed Guide →

  • File: AI_SDK_LEARNING_PROJECTS.md
  • Main Programming Language: TypeScript
  • Alternative Programming Languages: JavaScript, Python, Go
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: Level 2: The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 2: Intermediate (The Developer)
  • Knowledge Area: Web Streaming, AI Integration
  • Software or Tool: Next.js, AI SDK, React
  • Main Book: “JavaScript: The Definitive Guide” by David Flanagan

What you’ll build: A web application where users paste long documents (articles, papers, transcripts) and watch summaries generate in real-time, character by character, with a progress indicator and section-by-section breakdown.

Why it teaches AI SDK: streamText is what makes AI apps feel alive. You’ll implement the streaming pipeline end-to-end: from the SDK’s async iterators through Server-Sent Events to React state updates. This is how ChatGPT-style UIs work.

Core challenges you’ll face:

  • Implementing SSE streaming from Next.js API routes (maps to streaming architecture)
  • Consuming streams on the client with proper cleanup (maps to async iteration)
  • Handling partial updates and rendering in-progress text (maps to state management)
  • Graceful error handling mid-stream (maps to error boundaries)

Resources for key challenges:

  • “The AI SDK UI docs on useChat/useCompletion” - Shows the React hooks that handle streaming
  • “MDN Server-Sent Events guide” - Foundation for understanding the transport layer

Key Concepts:

  • Streaming Responses: AI SDK streamText Docs
  • React Server Components: “Learning React, 2nd Edition” by Eve Porcello - Ch. 12
  • Async Iterators: “JavaScript: The Definitive Guide” by David Flanagan - Ch. 13

Difficulty: Beginner-Intermediate | Time estimate: 1 week | Prerequisites: React/Next.js basics, TypeScript

Real world outcome:

  • Paste a 5,000-word article and watch the summary stream in real-time
  • See a “Summarizing…” indicator with word count progress
  • Final output shows key points, main themes, and a one-paragraph summary
  • Copy button to grab the summary for use elsewhere

Learning milestones:

  1. First stream renders tokens in real-time → you understand async iteration
  2. Implementing abort controller cancels mid-stream → you grasp cleanup patterns
  3. Adding streaming structured output with streamObject → you combine both patterns

Real World Outcome

When you open the web app in your browser, here’s exactly what you’ll see and experience:

Initial State:

┌─────────────────────────────────────────────────────────────────────┐
│  📄 Document Summarizer                                              │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  Paste your document here:                                           │
│  ┌────────────────────────────────────────────────────────────────┐ │
│  │                                                                 │ │
│  │  Paste or type your document text...                           │ │
│  │                                                                 │ │
│  │                                                                 │ │
│  │                                                                 │ │
│  └────────────────────────────────────────────────────────────────┘ │
│                                                                      │
│  Document length: 0 words                   [✨ Summarize]           │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘

After Pasting a Document (5,000+ words):

┌─────────────────────────────────────────────────────────────────────┐
│  📄 Document Summarizer                                              │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  Paste your document here:                                           │
│  ┌────────────────────────────────────────────────────────────────┐ │
│  │ The field of quantum computing has seen remarkable progress    │ │
│  │ over the past decade. Recent breakthroughs in error           │ │
│  │ correction, qubit stability, and algorithmic development      │ │
│  │ have brought us closer than ever to practical quantum         │ │
│  │ advantage. This comprehensive analysis examines...            │ │
│  │ [... 5,234 more words ...]                                    │ │
│  └────────────────────────────────────────────────────────────────┘ │
│                                                                      │
│  Document length: 5,847 words               [✨ Summarize]           │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘

While Streaming (the magic happens!):

┌─────────────────────────────────────────────────────────────────────┐
│  📄 Document Summarizer                                              │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  📝 Summary                                                          │
│  ─────────────────────────────────────────────────────────────────  │
│  ⏳ Generating...                           Progress: 234 words      │
│  ┌────────────────────────────────────────────────────────────────┐ │
│  │                                                                 │ │
│  │ ## Key Points                                                   │ │
│  │                                                                 │ │
│  │ The article examines recent quantum computing breakthroughs,   │ │
│  │ focusing on three critical areas:                              │ │
│  │                                                                 │ │
│  │ 1. **Error Correction**: IBM's new surface code approach       │ │
│  │    achieves 99.5% fidelity, a significant improvement over     │ │
│  │    previous methods. This breakthrough addresses one of the█   │ │
│  │                                                                 │ │
│  └────────────────────────────────────────────────────────────────┘ │
│                                                                      │
│  [⏹ Cancel]                                                          │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘

The cursor (█) moves in real-time as each token arrives from the LLM. The user watches the summary build word by word—this is the “ChatGPT effect” that makes AI feel alive.

Completed Summary:

┌─────────────────────────────────────────────────────────────────────┐
│  📄 Document Summarizer                                              │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  📝 Summary                                           ✓ Complete     │
│  ─────────────────────────────────────────────────────────────────  │
│  Generated in 4.2s                          Total: 312 words         │
│  ┌────────────────────────────────────────────────────────────────┐ │
│  │                                                                 │ │
│  │ ## Key Points                                                   │ │
│  │                                                                 │ │
│  │ The article examines recent quantum computing breakthroughs,   │ │
│  │ focusing on three critical areas:                              │ │
│  │                                                                 │ │
│  │ 1. **Error Correction**: IBM's new surface code approach       │ │
│  │    achieves 99.5% fidelity, a significant improvement...       │ │
│  │                                                                 │ │
│  │ 2. **Qubit Scaling**: Google's 1,000-qubit processor           │ │
│  │    demonstrates exponential progress in hardware capacity...   │ │
│  │                                                                 │ │
│  │ 3. **Commercial Applications**: First production deployments   │ │
│  │    in drug discovery and financial modeling show...            │ │
│  │                                                                 │ │
│  │ ## Main Themes                                                  │ │
│  │ - Race between IBM, Google, and emerging startups              │ │
│  │ - Shift from theoretical to practical quantum advantage        │ │
│  │ - Growing investment from pharmaceutical and finance sectors   │ │
│  │                                                                 │ │
│  │ ## One-Paragraph Summary                                        │ │
│  │ Quantum computing is transitioning from experimental to        │ │
│  │ practical, with major players achieving key milestones in      │ │
│  │ error correction and scaling that enable real-world use cases. │ │
│  │                                                                 │ │
│  └────────────────────────────────────────────────────────────────┘ │
│                                                                      │
│  [📋 Copy to Clipboard]      [🔄 Summarize Again]      [📄 New Doc]  │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘

Error State (mid-stream failure):

┌─────────────────────────────────────────────────────────────────────┐
│  📄 Document Summarizer                                              │
├─────────────────────────────────────────────────────────────────────┤
│                                                                      │
│  📝 Summary                                           ⚠️ Error       │
│  ─────────────────────────────────────────────────────────────────  │
│  Stopped after 2.1s                         Partial: 156 words       │
│  ┌────────────────────────────────────────────────────────────────┐ │
│  │                                                                 │ │
│  │ ## Key Points                                                   │ │
│  │                                                                 │ │
│  │ The article examines recent quantum computing breakthroughs,   │ │
│  │ focusing on three critical areas:                              │ │
│  │                                                                 │ │
│  │ 1. **Error Correction**: IBM's new surface code approach       │ │
│  │    achieves 99.5% fidelity...                                  │ │
│  │                                                                 │ │
│  │ ─────────────────────────────────────────────────────────────  │ │
│  │ ⚠️ Stream interrupted: Connection timeout                       │ │
│  │    Showing partial results above.                              │ │
│  │                                                                 │ │
│  └────────────────────────────────────────────────────────────────┘ │
│                                                                      │
│  [🔄 Retry]                  [📋 Copy Partial]         [📄 New Doc]  │
│                                                                      │
└─────────────────────────────────────────────────────────────────────┘

Key UX behaviors to implement:

  1. The text area scrolls automatically to keep the cursor visible
  2. Word count updates in real-time as tokens arrive
  3. “Cancel” button appears only during streaming
  4. Partial results are preserved even on error
  5. Copy button works even during streaming (copies current content)

The Core Question You’re Answering

“How do I stream LLM responses in real-time to create responsive, interactive UIs?”

This is about understanding the entire streaming pipeline from the AI SDK’s async iterators through Server-Sent Events to React state updates. You’re not just calling an API—you’re building a real-time data flow that makes AI feel alive and responsive.

Concepts You Must Understand First

  1. Server-Sent Events (SSE) - The transport layer, how events flow from server to client over HTTP
  2. Async Iterators - The for await...of pattern, AsyncIterableStream in JavaScript
  3. React State with Streams - Updating state incrementally as chunks arrive without causing excessive re-renders
  4. AbortController - Cancellation patterns for stopping streams mid-flight (a hand-rolled sketch follows this list)
  5. Next.js API Routes - Server-side streaming setup with proper headers and response handling
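
To see what the SDK hooks do for you, here is a hand-rolled sketch of the client side of concepts 1, 2, and 4: fetch an assumed /api/summarize route, read the response body chunk by chunk, and wire a Cancel button to an AbortController:

// Plain web APIs, no SDK hooks; onChunk would append partial text to React state.
const controller = new AbortController();

async function summarize(document: string, onChunk: (text: string) => void) {
  const res = await fetch('/api/summarize', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ document }),
    signal: controller.signal, // a Cancel button calls controller.abort()
  });

  const reader = res.body!.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    onChunk(decoder.decode(value, { stream: true }));
  }
}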

Questions to Guide Your Design

  1. How do you send streaming responses from Next.js API routes?
  2. How do you consume Server-Sent Events on the client side?
  3. What happens if the user navigates away mid-stream? (Memory leaks, cleanup)
  4. How do you show a loading state vs partial content? (UX considerations)
  5. What do you do when the stream errors halfway through?
  6. How do you handle backpressure if the client can’t keep up with the stream?

Thinking Exercise

Draw a diagram of the data flow:

  1. User pastes text and clicks “Summarize”
  2. Client sends POST request to /api/summarize with document text
  3. API route calls streamText() from AI SDK
  4. AI SDK returns an AsyncIterableStream
  5. Next.js converts this to Server-Sent Events (SSE) via toDataStreamResponse()
  6. Browser EventSource/fetch receives SSE chunks
  7. React hook (useChat/useCompletion) processes each chunk
  8. State updates trigger re-renders
  9. UI shows progressive text with cursor indicator
  10. Stream completes or user cancels with AbortController

Now trace what happens when:

  • The network connection drops mid-stream
  • The user clicks “Cancel”
  • Two requests are made simultaneously
  • The LLM returns an error after 50 tokens

The Interview Questions They’ll Ask

  1. “Explain the difference between WebSockets and Server-Sent Events”
    • Expected answer: SSE is unidirectional (server → client), simpler, built on HTTP, auto-reconnects. WebSockets are bidirectional, require protocol upgrade, more complex but better for chat-like interactions.
  2. “How would you implement cancellation for a streaming request?”
    • Expected answer: Use AbortController on the client, pass signal to fetch, clean up EventSource. On server, handle abort signals in the stream processing.
  3. “What happens if the stream errors mid-response?”
    • Expected answer: Partial data is already rendered, need error boundary to catch and display error state, possibly implement retry logic, show user what was received + error message.
  4. “How do you handle back-pressure in streaming?”
    • Expected answer: Browser EventSource buffers automatically, but you need to consider state update batching in React, potentially throttle/debounce updates, use React 18 transitions for non-urgent updates.
  5. “Why use Server-Sent Events instead of polling?”
    • Expected answer: Lower latency, less server load, real-time updates, no missed messages between polls, built-in reconnection.

Hints in Layers

Hint 1 (Basic Setup): Use the AI SDK’s toDataStreamResponse() helper to convert the stream into a format Next.js can send via SSE.
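
For example, a minimal sketch of the route handler (the app/api/summarize/route.ts path and the { prompt } body shape are assumptions; this follows the AI SDK v4-style streamText API):

import { streamText } from 'ai';
import { openai } from '@ai-sdk/openai';

export async function POST(req: Request) {
  const { prompt } = await req.json(); // the pasted document text

  const result = streamText({
    model: openai('gpt-4o-mini'),
    system: 'You are a summarizer. Produce a concise summary of the document.',
    prompt
  });

  // Converts the AI SDK stream into a Server-Sent Events response.
  return result.toDataStreamResponse();
}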

Hint 2 (Client Integration): The AI SDK provides useChat or useCompletion hooks that handle SSE consumption, state management, and cleanup automatically.
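
A minimal client sketch using that hook (imported here from '@ai-sdk/react'; older releases export it from 'ai/react'), pointed at the route above; by default it posts { prompt } to the API:

'use client';
import { useCompletion } from '@ai-sdk/react';

export function Summarizer() {
  const { completion, input, handleInputChange, handleSubmit, isLoading, stop } =
    useCompletion({ api: '/api/summarize' });

  return (
    <form onSubmit={handleSubmit}>
      <textarea value={input} onChange={handleInputChange} placeholder="Paste your document..." />
      <button type="submit" disabled={isLoading}>Summarize</button>
      {isLoading && <button type="button" onClick={stop}>Cancel</button>}
      {/* Grows as chunks arrive; preserved even if the stream errors. */}
      <p>{completion}</p>
    </form>
  );
}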

Hint 3 (Cancellation): Implement AbortController on the client side and pass the signal to your fetch request. The AI SDK hooks support this via the stop() function they return.
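
To see what the hooks do for you, here is a bare-fetch cancellation sketch. It assumes the server returns a plain text stream (e.g. via the result's toTextStreamResponse() instead of toDataStreamResponse()), so the chunks decode to readable text:

async function summarize(docText: string, controller: AbortController) {
  const res = await fetch('/api/summarize', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt: docText }),
    signal: controller.signal // controller.abort() cancels the request and stream
  });

  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  let text = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    text += decoder.decode(value, { stream: true });
    // Update UI state with the partial text here.
  }
  return text;
}

// In a "Cancel" click handler: controller.abort();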

Hint 4 (Error Handling): Add React Error Boundaries around your streaming component, and handle errors in the onError callback of the AI SDK hooks. Consider showing partial results even when errors occur.

Hint 5 (Progress Tracking): The streamText response includes token counts and metadata. Use onFinish callback to track completion, and parse the streaming chunks to count words/tokens for progress indicators.
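
A server-side sketch of that callback on streamText (field names follow the AI SDK's options; verify against your SDK version):

const result = streamText({
  model: openai('gpt-4o-mini'),
  prompt,
  onFinish: ({ usage, finishReason }) => {
    // Runs once the stream completes; useful for logging and cost tracking.
    console.log('finishReason:', finishReason);
    console.log('total tokens:', usage.totalTokens);
  }
});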

Hint 6 (Performance): Use React 18’s useTransition for non-urgent state updates to prevent janky UI. Consider useDeferredValue for the streaming text to keep the UI responsive.
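
For example, deferring the streamed text keeps typing and clicks responsive while chunks arrive (completion is the value returned by useCompletion above):

import { useDeferredValue } from 'react';

function StreamingText({ completion }: { completion: string }) {
  // React may interrupt re-renders of the deferred value to prioritize input.
  const deferred = useDeferredValue(completion);
  return <p>{deferred}</p>;
}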

Books That Will Help

| Topic | Book | Chapter/Section |
| --- | --- | --- |
| Async JavaScript & Iterators | “JavaScript: The Definitive Guide” by David Flanagan | Ch. 13 (Asynchronous JavaScript) |
| Server-Sent Events | “JavaScript: The Definitive Guide” by David Flanagan | Ch. 15.11 (Server-Sent Events) |
| React State Management | “Learning React, 2nd Edition” by Alex Banks and Eve Porcello | Ch. 8 (Hooks), Ch. 12 (React and Server) |
| Streaming in Node.js | “Node.js Design Patterns, 3rd Edition” by Mario Casciaro | Ch. 6 (Streams) |
| Error Handling Patterns | “Release It!, 2nd Edition” by Michael Nygard | Ch. 5 (Stability Patterns) |
| Web APIs & Fetch | “JavaScript: The Definitive Guide” by David Flanagan | Ch. 15 (Web APIs) |
| React 18 Concurrent Features | “Learning React, 2nd Edition” by Alex Banks and Eve Porcello | Ch. 8 (useTransition, useDeferredValue) |

Recommended reading order:

  1. Start with Flanagan Ch. 13 to understand async/await and async iterators
  2. Read Flanagan Ch. 15.11 for SSE fundamentals
  3. Move to Porcello Ch. 8 for React hooks patterns
  4. Then tackle the AI SDK documentation with this foundation

Online Resources:


Project 3: Code Review Agent with Tool Calling

📖 View Detailed Guide →

  • File: AI_SDK_LEARNING_PROJECTS.md
  • Main Programming Language: TypeScript
  • Alternative Programming Languages: Python, Go, JavaScript
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: Level 2: The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 2: Intermediate (The Developer)
  • Knowledge Area: AI Agents, Tool Calling
  • Software or Tool: AI SDK, GitHub API, CLI
  • Main Book: “Building LLM Agents” by Harrison Chase (LangChain blog series)

What you’ll build: A CLI agent that takes a GitHub PR URL or local diff, then autonomously reads files, analyzes code patterns, checks for issues, and generates a structured code review with specific line-by-line feedback.

Why it teaches AI SDK: This is your first real agent—an LLM in a loop calling tools. You’ll define tools for file reading, pattern searching, and issue tracking. The LLM decides which tools to call and when, not you. This is where AI SDK becomes powerful.

Core challenges you’ll face:

  • Defining tool schemas that the LLM can understand and invoke correctly (maps to tool definition)
  • Implementing the agent loop with maxSteps or stopWhen (maps to agent architecture)
  • Managing context as tools return data back to the LLM (maps to conversation state)
  • Handling tool execution failures gracefully (maps to error recovery)

Resources for key challenges:

  • “AI SDK Agents documentation” - The canonical reference for agentic patterns
  • “Building LLM Agents” by Harrison Chase (LangChain blog series) - Mental models for agent design

Key Concepts:

Difficulty: Intermediate
Time estimate: 1-2 weeks
Prerequisites: Completed Projects 1-2, Git basics

Learning milestones:

  1. LLM calls your readFile tool → you understand tool invocation flow
  2. Agent makes multiple tool calls in sequence → you grasp the agentic loop
  3. Using onStepFinish to log agent progress → you understand observability patterns
  4. Agent decides it’s “done” autonomously → you understand completion conditions

Real World Outcome

When you run the CLI, here’s exactly what you’ll see in your terminal:

$ review-agent https://github.com/acme/widget-api/pull/47

🔍 Code Review Agent initialized
📋 Fetching PR #47 from acme/widget-api...

[Step 1] 🔧 Tool: fetchPRMetadata
         Title: "Add user authentication middleware"
         Files changed: 5
         Additions: 234, Deletions: 12

[Step 2] 🔧 Tool: readFile("src/middleware/auth.ts")
         Reading 89 lines...

[Step 3] 🔧 Tool: readFile("src/routes/users.ts")
         Reading 156 lines...

[Step 4] 🔧 Tool: searchPattern("password", "*.ts")
         Found 3 matches across 2 files

[Step 5] 🔧 Tool: readFile("src/utils/crypto.ts")
         Reading 45 lines...

[Step 6] 🤔 Agent reasoning: "I've reviewed the authentication
         implementation. I noticed potential security issues with
         password handling and a missing rate limiter. Let me
         generate the review."

[Step 7] 🔧 Tool: generateReview

═══════════════════════════════════════════════════════════════════════
                    CODE REVIEW: PR #47
            "Add user authentication middleware"
═══════════════════════════════════════════════════════════════════════

## 🔴 Security Issues (2)

### 1. Plaintext Password Logging
**File:** `src/middleware/auth.ts:34`
**Severity:** Critical

console.log(`Login attempt for user: ${email}, password: ${password}`);
                                                        ^^^^^^^^

Issue: Password is logged in plaintext. This exposes credentials in logs. Fix: Remove password from log statement or mask it.

2. Missing Rate Limiting

File: src/routes/users.ts:12-45
Severity: High

The /login endpoint has no rate limiting, making it vulnerable to brute-force attacks.

Recommended: Add rate limiting middleware (e.g., express-rate-limit).


🟡 Code Quality (3)

1. Magic Number

File: src/middleware/auth.ts:56

const TOKEN_EXPIRY = 86400;  // What is this number?

Suggestion: Use named constant: const ONE_DAY_IN_SECONDS = 86400;

2. Missing Error Type

File: src/utils/crypto.ts:23

} catch (e) {
    throw e;  // No type narrowing
}

Suggestion: Use catch (e: unknown) and proper error handling.

3. Inconsistent Async Pattern

File: src/routes/users.ts:78
Mixing .then() and async/await. Prefer consistent async/await.


🟢 Suggestions (2)

1. Add Input Validation

File: src/routes/users.ts:15
Consider adding Zod schema validation for the login request body.

2. Extract JWT Secret

File: src/middleware/auth.ts:8
JWT_SECRET should come from environment variables, not hardcoded.


Summary

Category Count
🔴 Security Issues 2
🟡 Code Quality 3
🟢 Suggestions 2

Overall: This PR introduces authentication but has critical security issues that must be addressed before merging.

Recommendation: Request changes

═══════════════════════════════════════════════════════════════════════

📁 Full review saved to: review-pr-47.md
🔗 Ready to post as PR comment? [y/N]


If the user confirms, the agent posts the review as a GitHub comment:

$ y

📤 Posting review to GitHub...
✓ Review posted: https://github.com/acme/widget-api/pull/47#issuecomment-1234567

Done! Agent completed in 12.3s (7 steps, 3 files analyzed)

The Core Question You’re Answering

“How do I build an AI that autonomously takes actions, not just generates text?”

This is the paradigm shift from AI as a “fancy autocomplete” to AI as an “autonomous agent.” You’re not just asking the LLM to write a review—you’re giving it tools to fetch PRs, read files, search patterns, and letting it decide what to do next.

The LLM is now in control of the flow. It chooses which files to read. It decides when it has enough information. It determines when to stop. Your job is to define the tools and constraints, then let the agent work.

Concepts You Must Understand First

Stop and research these before coding:

  1. Tool Definition with the AI SDK
    • What is the tool() function and how do you define a tool?
    • How does the LLM “see” your tool? (description + parameters schema)
    • What’s the difference between execute and generate in tools?
    • Reference: AI SDK Tools and Tool Calling
  2. Agent Loop with stopWhen
    • What does stopWhen do in generateText?
    • How does the agent loop work internally?
    • What is hasToolCall() and how do you use it?
    • Reference: AI SDK Agents
  3. Context Management
    • How do tool results get fed back to the LLM?
    • What happens if the context gets too long?
    • How do you use onStepFinish for observability?
    • Reference: AI SDK Agent Events
  4. GitHub API Basics
    • How do you fetch PR metadata with the GitHub REST API?
    • How do you get the list of changed files in a PR?
    • How do you read file contents from a specific commit?
    • Reference: GitHub REST API - Pull Requests
  5. Error Handling in Agents
    • What happens if a tool fails mid-execution?
    • How do you implement retry logic for transient failures?
    • How do you handle LLM errors vs tool errors?
    • Book Reference: “Release It!, 2nd Edition” by Michael Nygard - Ch. 5

Questions to Guide Your Design

Before implementing, think through these:

  1. What tools does a code review agent need?
    • fetchPRMetadata: Get PR title, description, files changed
    • readFile: Read a specific file’s contents
    • searchPattern: Search for patterns across files (like grep)
    • getDiff: Get the diff for a specific file
    • generateReview: Final tool that triggers review synthesis
  2. How does the agent know what to review?
    • Start with the list of changed files from the PR
    • Agent decides which files are important to read
    • Agent searches for patterns that indicate issues (e.g., “TODO”, “password”, “console.log”)
  3. How does the agent know when to stop?
    • Use stopWhen: hasToolCall('generateReview')
    • Agent calls generateReview when it has gathered enough information
    • Add maxSteps as a safety limit
  4. How do you structure the review output?
    • Use generateObject with a schema for the review
    • Categories: security issues, code quality, suggestions
    • Each issue has: file, line, description, severity, suggested fix
  5. How do you handle large PRs?
    • Limit the number of files to analyze
    • Summarize file contents if too long
    • Prioritize files by extension (.ts > .md)

Thinking Exercise

Design your tools on paper before implementing:

// Define your tool schemas:

const tools = {
  fetchPRMetadata: tool({
    description: '???', // What should this say?
    parameters: z.object({
      prUrl: z.string().describe('???')
    }),
    execute: async ({ prUrl }) => {
      // What does this return?
      // { title, description, filesChanged, additions, deletions }
    }
  }),

  readFile: tool({
    description: '???',
    parameters: z.object({
      path: z.string().describe('???')
    }),
    execute: async ({ path }) => {
      // Return file contents as string
    }
  }),

  searchPattern: tool({
    description: '???',
    parameters: z.object({
      pattern: z.string(),
      glob: z.string().optional()
    }),
    execute: async ({ pattern, glob }) => {
      // Return matches: [{ file, line, match }]
    }
  }),

  generateReview: tool({
    description: 'Generate the final code review. Call this when you have gathered enough information.',
    parameters: z.object({
      summary: z.string(),
      issues: z.array(issueSchema),
      recommendation: z.enum(['approve', 'request-changes', 'comment'])
    }),
    execute: async (review) => review // Just return the structured review
  })
};

// Trace through a simple PR with 2 files changed:
// 1. What tool does the agent call first?
// 2. How does it decide which file to read?
// 3. When does it decide it has enough information?
// 4. What triggers the generateReview call?

The Interview Questions They’ll Ask

Prepare to answer these:

  1. “What is an AI agent and how is it different from a simple LLM call?”
    • An agent is an LLM in a loop that can call tools. Unlike a single LLM call that just generates text, an agent can take actions (read files, make API calls) and iterate until a task is complete. The agent autonomously decides which actions to take.
  2. “How do you define a tool for the AI SDK?”
    • Use the tool() function with a description (tells LLM when to use it), a Zod parameters schema (defines the input), and an execute function (performs the action). The description is critical—it’s prompt engineering for tool selection.
  3. “What is stopWhen and how does it work?”
    • stopWhen is a condition that determines when the agent loop terminates. Common patterns: hasToolCall('finalTool') stops when a specific tool is called, or a custom function that checks step count or context.
  4. “How do you handle context growth in agents?”
    • Use prepareStep to summarize or filter previous steps. Limit tool output size. Implement context windowing. For code review: only include relevant file snippets, not entire files.
  5. “What happens if a tool fails during agent execution?”
    • The error is returned to the LLM as a tool result. The LLM can decide to retry, try a different approach, or handle the error gracefully. You can also implement retry logic in the tool’s execute function.
  6. “How would you test an AI agent?”
    • Mock the LLM responses to test tool orchestration. Test tools in isolation. Use deterministic prompts for reproducible behavior. Log all steps for debugging. Implement integration tests with real LLM calls for end-to-end validation.

Hints in Layers

Hint 1: Start with a single tool Get the agent loop working with just one tool:

import { generateText, tool } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

const tools = {
  readFile: tool({
    description: 'Read a file from the repository',
    parameters: z.object({
      path: z.string().describe('Path to the file')
    }),
    execute: async ({ path }) => {
      // For now, just return mock content
      return `Contents of ${path}: // TODO: implement`;
    }
  })
};

const { text, steps } = await generateText({
  model: openai('gpt-4'),
  tools,
  prompt: 'Read the file src/index.ts and tell me what it does.'
});

console.log('Steps:', steps.length);
console.log('Result:', text);

Run this and observe how the LLM calls your tool.

Hint 2: Add the agent loop with stopWhen

import { hasToolCall } from 'ai';

const tools = {
  readFile: tool({ ... }),
  generateSummary: tool({
    description: 'Generate the final summary. Call this when done.',
    parameters: z.object({
      summary: z.string()
    }),
    execute: async ({ summary }) => summary
  })
};

const { text, steps } = await generateText({
  model: openai('gpt-4'),
  tools,
  stopWhen: hasToolCall('generateSummary'),
  prompt: 'Read src/index.ts and src/utils.ts, then generate a summary.'
});

Hint 3: Add observability with onStepFinish

const { text, steps } = await generateText({
  model: openai('gpt-4'),
  tools,
  stopWhen: hasToolCall('generateSummary'),
  onStepFinish: ({ stepType, toolCalls }) => {
    console.log(`[Step] Type: ${stepType}`);
    for (const call of toolCalls || []) {
      console.log(`  Tool: ${call.toolName}(${JSON.stringify(call.args)})`);
    }
  },
  prompt: 'Review the PR...'
});

Hint 4: Connect to real GitHub API

const fetchPRMetadata = tool({
  description: 'Fetch metadata for a GitHub Pull Request',
  parameters: z.object({
    owner: z.string(),
    repo: z.string(),
    prNumber: z.number()
  }),
  execute: async ({ owner, repo, prNumber }) => {
    const response = await fetch(
      `https://api.github.com/repos/${owner}/${repo}/pulls/${prNumber}`,
      { headers: { Authorization: `token ${process.env.GITHUB_TOKEN}` } }
    );
    const pr = await response.json();
    return {
      title: pr.title,
      body: pr.body,
      changedFiles: pr.changed_files,
      additions: pr.additions,
      deletions: pr.deletions
    };
  }
});
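
You will likely also want the list of changed files (the GitHub .../pulls/{number}/files endpoint) so the agent can decide what to read next. A sketch reusing the tool and z imports from Hint 1:

const listChangedFiles = tool({
  description: 'List the files changed in a GitHub Pull Request',
  parameters: z.object({
    owner: z.string(),
    repo: z.string(),
    prNumber: z.number()
  }),
  execute: async ({ owner, repo, prNumber }) => {
    const response = await fetch(
      `https://api.github.com/repos/${owner}/${repo}/pulls/${prNumber}/files`,
      { headers: { Authorization: `token ${process.env.GITHUB_TOKEN}` } }
    );
    const files = await response.json();
    // Return just enough for the LLM to prioritize which files to read.
    return files.map((f: any) => ({
      filename: f.filename,
      status: f.status,
      additions: f.additions,
      deletions: f.deletions
    }));
  }
});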

Hint 5: Structure the review output

const reviewSchema = z.object({
  securityIssues: z.array(z.object({
    file: z.string(),
    line: z.number(),
    severity: z.enum(['critical', 'high', 'medium', 'low']),
    description: z.string(),
    suggestedFix: z.string()
  })),
  codeQuality: z.array(z.object({
    file: z.string(),
    line: z.number(),
    description: z.string(),
    suggestion: z.string()
  })),
  recommendation: z.enum(['approve', 'request-changes', 'comment']),
  summary: z.string()
});

const generateReview = tool({
  description: 'Generate the final structured code review',
  parameters: reviewSchema,
  execute: async (review) => review
});

Books That Will Help

| Topic | Book | Chapter |
| --- | --- | --- |
| Agent mental models | “Artificial Intelligence: A Modern Approach” by Russell & Norvig | Ch. 2 (Intelligent Agents) |
| ReAct pattern | “ReAct: Synergizing Reasoning and Acting” (Yao et al.) | The academic paper |
| Error handling | “Release It!, 2nd Edition” by Michael Nygard | Ch. 5 (Stability Patterns) |
| Tool design | AI SDK Tools Docs | Entire section |
| Agent loops | AI SDK Agents Docs | stopWhen, prepareStep |
| TypeScript patterns | “Programming TypeScript” by Boris Cherny | Ch. 4 (Functions), Ch. 7 (Error Handling) |
| GitHub API | GitHub REST API Docs | Pull Requests, Contents |
| CLI development | “Command-Line Rust” by Ken Youens-Clark | Ch. 1-3 (patterns apply) |

Recommended reading order:

  1. AI SDK Tools and Tool Calling docs (30 min) - Understand tool definition
  2. AI SDK Agents docs (30 min) - Understand stopWhen and loop control
  3. Russell & Norvig Ch. 2 (1 hour) - Deep mental model for agents
  4. GitHub Pull Requests API (30 min) - Understand the data you’ll work with
  5. Then start coding!

Project 4: Multi-Provider Model Router

📖 View Detailed Guide →

  • File: AI_SDK_LEARNING_PROJECTS.md
  • Main Programming Language: TypeScript
  • Alternative Programming Languages: Python, Go, JavaScript
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: Level 3: The “Service & Support” Model
  • Difficulty: Level 2: Intermediate (The Developer)
  • Knowledge Area: API Gateway, AI Integration
  • Software or Tool: AI SDK, OpenAI, Anthropic, Google AI
  • Main Book: “Designing Data-Intensive Applications” by Martin Kleppmann

What you’ll build: A smart API gateway that accepts prompts and dynamically routes them to the optimal model (GPT-4 for reasoning, Claude for long context, Gemini for vision) based on task analysis, with fallback handling and cost tracking.

Why it teaches AI SDK: The SDK’s provider abstraction is its killer feature. You’ll implement a system that uses generateObject to classify tasks, then routes to different providers—all through the unified API. You’ll deeply understand how the SDK normalizes provider differences.

Core challenges you’ll face:

  • Configuring multiple providers with their API keys and settings (maps to provider setup)
  • Building a task classifier that determines optimal model (maps to structured output)
  • Implementing fallback logic when primary provider fails (maps to error handling)
  • Tracking token usage and costs across providers (maps to telemetry)

Key Concepts:

Difficulty: Intermediate
Time estimate: 1-2 weeks
Prerequisites: Multiple API keys (OpenAI, Anthropic, Google), completed Projects 1-3

Real world outcome:

  • REST API endpoint that accepts { prompt, preferredCapability: "reasoning" | "vision" | "long-context" }
  • Automatically selects the best model, falls back on failure
  • Dashboard showing requests per provider, costs, latency, and success rates
  • Cost savings visible when cheaper models handle simple tasks

Learning milestones:

  1. Swapping providers with one line change → you understand the abstraction value
  2. Fallback chain executes on provider error → you grasp resilience patterns
  3. Telemetry shows cost per request → you understand production observability

Real World Outcome

When you run your Multi-Provider Model Router, here’s exactly what you’ll see and experience:

Testing the Router via HTTP:

$ curl -X POST http://localhost:3000/api/route \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Explain quantum entanglement in simple terms",
    "preferredCapability": "reasoning"
  }'

{
  "provider": "openai",
  "model": "gpt-4-turbo",
  "response": "Quantum entanglement is a phenomenon where two particles...",
  "metadata": {
    "latency_ms": 1247,
    "tokens_used": 156,
    "cost_usd": 0.00468,
    "fallback_attempted": false,
    "routing_reason": "capability_match"
  }
}

Vision Task Automatically Routes to Gemini:

$ curl -X POST http://localhost:3000/api/route \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "What objects are in this image?",
    "image_url": "https://example.com/photo.jpg",
    "preferredCapability": "vision"
  }'

{
  "provider": "google",
  "model": "gemini-2.0-flash-001",
  "response": "The image contains: a wooden table, a laptop computer...",
  "metadata": {
    "latency_ms": 892,
    "tokens_used": 89,
    "cost_usd": 0.00089,
    "fallback_attempted": false,
    "routing_reason": "vision_capability"
  }
}

Fallback Chain in Action (Primary Provider Down):

$ curl -X POST http://localhost:3000/api/route \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Summarize this 50-page legal document...",
    "preferredCapability": "long-context"
  }'

{
  "provider": "anthropic",
  "model": "claude-3-7-sonnet-20250219",
  "response": "This legal document outlines a commercial lease agreement...",
  "metadata": {
    "latency_ms": 3421,
    "tokens_used": 1247,
    "cost_usd": 0.03741,
    "fallback_attempted": true,
    "fallback_chain": [
      {
        "provider": "openai",
        "model": "gpt-4-turbo",
        "error": "Rate limit exceeded (429)",
        "timestamp": "2025-12-27T10:23:41Z"
      },
      {
        "provider": "anthropic",
        "model": "claude-3-7-sonnet-20250219",
        "status": "success",
        "timestamp": "2025-12-27T10:23:44Z"
      }
    ],
    "routing_reason": "fallback_success"
  }
}

Dashboard View (Running at http://localhost:3000/dashboard):

┌──────────────────────────────────────────────────────────────────────────┐
│                    🎯 Multi-Provider Router Dashboard                    │
│                     Last updated: 2025-12-27 10:25:43                    │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  📊 PROVIDER STATISTICS (Last 24 Hours)                                 │
│  ────────────────────────────────────────────────────────────────────   │
│                                                                          │
│  Provider    │ Requests │ Success │ Avg Latency │ Cost      │ Uptime   │
│  ──────────────────────────────────────────────────────────────────────│
│  OpenAI      │   1,247  │  98.2%  │    1.2s     │ $12.34   │  99.8%   │
│  Anthropic   │    834   │  99.7%  │    2.1s     │ $28.91   │ 100.0%   │
│  Google      │    423   │  97.4%  │    0.9s     │  $4.23   │  98.1%   │
│  ──────────────────────────────────────────────────────────────────────│
│  TOTAL       │   2,504  │  98.6%  │    1.5s     │ $45.48   │  99.3%   │
│                                                                          │
│  💰 COST SAVINGS                                                         │
│  ────────────────────────────────────────────────────────────────────   │
│                                                                          │
│  If all requests used GPT-4: $89.23                                     │
│  Actual cost with routing:    $45.48                                    │
│  Savings:                     $43.75 (49.0%)                            │
│                                                                          │
│  🔄 ROUTING BREAKDOWN                                                    │
│  ────────────────────────────────────────────────────────────────────   │
│                                                                          │
│  reasoning        ████████████████░░░░  62% → OpenAI GPT-4             │
│  vision           ████████░░░░░░░░░░░░  27% → Google Gemini            │
│  long-context     ████░░░░░░░░░░░░░░░░  11% → Anthropic Claude         │
│                                                                          │
│  ⚠️ RECENT FALLBACKS (Last 2 Hours)                                     │
│  ────────────────────────────────────────────────────────────────────   │
│                                                                          │
│  10:23:41 │ OpenAI → Anthropic  │ Rate limit (429)                     │
│  09:47:12 │ Google → OpenAI     │ Timeout (>5s)                        │
│  09:12:34 │ OpenAI → Anthropic  │ Model unavailable (503)              │
│                                                                          │
│  📈 LIVE REQUEST RATE                                                    │
│  ────────────────────────────────────────────────────────────────────   │
│                                                                          │
│  10:20  ▂▄▆█▆▄▂                                                         │
│  10:21  ▄▆█▆▄▂▁                                                         │
│  10:22  ▆█▆▄▂▁▂                                                         │
│  10:23  █▆▄▂▁▂▄                                                         │
│  10:24  ▆▄▂▁▂▄▆                                                         │
│  10:25  ▄▂▁▂▄▆█ ← Current                                               │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘

CLI Tool Output:

$ ai-router stats --provider openai

OpenAI Provider Statistics
─────────────────────────────────────────────────
Status:              ✅ Healthy
Last request:        23 seconds ago
Requests today:      1,247
Success rate:        98.2% (1,225 successful)
Average latency:     1.24s
P50 latency:         1.12s
P95 latency:         2.34s
P99 latency:         4.56s

Cost today:          $12.34
Total tokens:        1,247,834
  - Input tokens:    847,234 ($8.47)
  - Output tokens:   400,600 ($3.87)

Models used:
  - gpt-4-turbo:        89% (1,110 requests)
  - gpt-3.5-turbo:      11% (137 requests)

Recent errors:
  [10:23:41] Rate limit exceeded (429)
  [08:15:23] Timeout after 30s
  [07:42:11] Invalid API key (401)

Config File (router.config.json):

{
  "providers": {
    "openai": {
      "apiKey": "${OPENAI_API_KEY}",
      "models": {
        "reasoning": "gpt-4-turbo",
        "fallback": "gpt-3.5-turbo"
      },
      "timeout": 30000,
      "maxRetries": 2,
      "circuitBreaker": {
        "failureThreshold": 5,
        "resetTimeout": 60000
      }
    },
    "anthropic": {
      "apiKey": "${ANTHROPIC_API_KEY}",
      "models": {
        "long-context": "claude-3-7-sonnet-20250219",
        "reasoning": "claude-3-5-sonnet-20241022"
      },
      "timeout": 60000,
      "maxRetries": 2
    },
    "google": {
      "apiKey": "${GOOGLE_AI_API_KEY}",
      "models": {
        "vision": "gemini-2.0-flash-001",
        "reasoning": "gemini-2.0-pro-001"
      },
      "timeout": 20000,
      "maxRetries": 3
    }
  },
  "routing": {
    "defaultProvider": "openai",
    "fallbackChain": ["openai", "anthropic", "google"],
    "capabilityMapping": {
      "reasoning": ["openai", "anthropic"],
      "vision": ["google", "openai"],
      "long-context": ["anthropic", "google"]
    },
    "costOptimization": {
      "enabled": true,
      "preferCheaperModels": true,
      "costThreshold": 0.05
    }
  },
  "telemetry": {
    "enabled": true,
    "logLevel": "info",
    "metricsRetention": "7d",
    "exportFormat": "prometheus"
  }
}

Key behaviors you’ll implement:

  1. Request classifier analyzes the prompt and determines optimal provider
  2. Primary provider is attempted first based on capability match
  3. If primary fails, automatic fallback to secondary providers in chain
  4. All requests logged with timing, cost, and routing decisions
  5. Dashboard updates in real-time showing provider health and costs
  6. Circuit breaker pattern prevents cascading failures
  7. Cost tracking per request, per provider, and aggregate

The Core Question You’re Answering

“How do I build resilient, production-grade AI systems that don’t go down when a single provider fails?”

In production, relying on a single LLM provider is like having a single point of failure in your infrastructure. OpenAI, Anthropic, and Google have all experienced downtime in 2025. When your primary provider hits rate limits, experiences an outage, or becomes slow, what happens to your users?

This project teaches you the production architecture pattern that 78% of enterprises use: multi-provider routing with automatic fallback. You’re not just calling an LLM—you’re building an intelligent gateway that:

  • Routes requests to the optimal model based on task requirements
  • Automatically fails over when providers are down
  • Tracks costs across all providers to optimize spend
  • Provides observability into your AI infrastructure

The AI SDK’s provider abstraction makes this possible without writing provider-specific code. Change one configuration line, and you’ve switched from OpenAI to Anthropic. This is the power of abstraction.

Concepts You Must Understand First

Stop and research these before coding:

  1. API Gateway Pattern
    • What is an API gateway and why do you need one?
    • How does a gateway differ from a simple proxy?
    • What responsibilities belong in the gateway layer vs application layer?
    • Book Reference: “Designing Data-Intensive Applications” by Martin Kleppmann — Ch. 4 (Encoding and Evolution) & Ch. 12 (The Future of Data Systems)
  2. Circuit Breaker Pattern
    • What is a circuit breaker and how does it prevent cascading failures?
    • What are the three states: Closed, Open, Half-Open?
    • How do you determine failure thresholds and reset timeouts?
    • When should you open the circuit vs retry?
    • Book Reference: “Release It!, 2nd Edition” by Michael Nygard — Ch. 5 (Stability Patterns)
  3. Fallback Chains and Resilience
    • What’s the difference between retrying the same provider vs falling back to another?
    • How do you design a fallback hierarchy?
    • What happens if all providers in the chain fail?
    • How do you avoid infinite loops in fallback logic?
    • Book Reference: “Building Microservices, 2nd Edition” by Sam Newman — Ch. 11 (Resiliency)
  4. Provider Abstraction in AI SDK
    • How does the AI SDK normalize differences between OpenAI, Anthropic, and Google?
    • What is the unified interface that all providers implement?
    • How do you configure multiple providers in one application?
    • What provider-specific features can’t be abstracted?
    • Reference: AI SDK Providers Documentation
  5. Structured Output with generateObject
    • How do you use generateObject to classify tasks?
    • What’s the difference between generateObject and generateText?
    • How do you define a Zod schema for structured output?
    • Why is structured output better than parsing text for classification?
    • Reference: AI SDK Structured Outputs
  6. Telemetry and Observability
    • What metrics should you track for LLM requests? (latency, tokens, cost, errors)
    • How do you implement request tracing across providers?
    • What’s the difference between metrics, logs, and traces?
    • How do you aggregate costs across different pricing models?
    • Book Reference: “Designing Data-Intensive Applications” by Martin Kleppmann — Ch. 1 (Reliable, Scalable, and Maintainable Applications)
  7. Rate Limiting and Quotas
    • Why do LLM providers rate limit, and how do you handle it?
    • What’s the difference between per-second and per-day limits?
    • How do you implement client-side rate limiting?
    • When should you retry vs fallback on rate limit errors?
    • Reference: OpenAI Rate Limits, Anthropic Rate Limits
  8. Cost Optimization Strategies
    • How do you calculate cost per request for different models?
    • When should you route to a cheaper model vs a more capable one?
    • How do you balance cost, latency, and quality?
    • What’s the ROI of using a routing layer? (Hint: 40-50% cost reduction)
    • Book Reference: “AI Engineering” by Chip Huyen — Ch. 9 (Cost Optimization)

Questions to Guide Your Design

Before implementing, think through these:

  1. Task Classification
    • How do you determine which capability a request needs? (reasoning, vision, long-context)
    • Should classification use an LLM or rule-based logic?
    • What if the user’s preferredCapability doesn’t match the actual task?
    • How do you handle requests that need multiple capabilities?
  2. Routing Logic
    • Given a capability (e.g., “reasoning”), how do you choose between GPT-4 and Claude?
    • Should you always route to the “best” model or consider cost?
    • How do you handle provider-specific features (e.g., Anthropic’s tool use)?
    • What’s your routing strategy: round-robin, least-latency, cost-based, or capability-based?
  3. Fallback Chain Design
    • What’s the order of your fallback chain? Primary → Secondary → Tertiary
    • Do you retry the same model or switch models within a provider?
    • How many retries before giving up?
    • Should fallback preserve the same model capability or accept degradation?
  4. Error Handling
    • How do you distinguish between retryable errors (503, 429) and non-retryable (401, 400)?
    • What do you return to the user when all providers fail?
    • Should you log PII from failed requests for debugging?
    • How do you prevent error amplification across providers?
  5. Telemetry Collection
    • What data do you capture per request? (provider, model, latency, tokens, cost, status)
    • How do you calculate token costs across providers with different pricing?
    • Where do you store telemetry? (In-memory, database, metrics service)
    • How do you export metrics for monitoring tools like Prometheus or Datadog?
  6. Circuit Breaker Configuration
    • What’s your failure threshold? (e.g., 5 failures in 60 seconds opens the circuit)
    • How long should the circuit stay open before trying half-open?
    • Should circuit state be per-provider or per-model?
    • How do you reset the circuit breaker on success?
  7. Security and API Keys
    • How do you securely store API keys for multiple providers?
    • Should keys be in environment variables, config files, or a secrets manager?
    • How do you rotate keys without downtime?
    • What happens if an API key is revoked mid-request?

Thinking Exercise

Draw the request flow diagram on paper:

User Request
    |
    v
[1] Task Classifier (generateObject)
    |
    v
[2] Capability Detection
    |
    +-- reasoning    → Primary: OpenAI GPT-4
    +-- vision       → Primary: Google Gemini
    +-- long-context → Primary: Anthropic Claude
    |
    v
[3] Check Circuit Breaker State
    |
    +-- OPEN   → Skip provider, use fallback
    +-- CLOSED → Proceed to provider
    |
    v
[4] Primary Provider Request
    |
    +-- Success (200)
    |   |
    |   v
    |   [7] Log Telemetry → Return Response
    |
    +-- Retryable Error (429, 503, timeout)
    |   |
    |   v
    |   [5] Retry or Fallback?
    |       |
    |       +-- Retry (attempt 1, 2) → [4]
    |       +-- Fallback → [6]
    |
    +-- Non-Retryable Error (401, 400)
        |
        v
        [8] Return Error to User
    |
    v
[6] Fallback Chain
    |
    +-- Secondary Provider (Anthropic)
    |   |
    |   +-- Success → [7]
    |   +-- Failure → Tertiary Provider
    |
    +-- Tertiary Provider (Google)
        |
        +-- Success → [7]
        +-- Failure → [8] All providers failed
    |
    v
[7] Update Telemetry
    |
    +-- Increment provider request count
    +-- Record latency (end_time - start_time)
    +-- Calculate cost (tokens * price_per_token)
    +-- Log routing decision and fallback chain
    |
    v
[8] Return to User

Now trace this scenario:

A user sends a vision task request. OpenAI’s vision model times out after 5 seconds. You fall back to Google Gemini, which succeeds in 0.9 seconds. Walk through each step:

  1. What does the task classifier detect?
  2. Which provider is tried first? Why?
  3. What happens when OpenAI times out?
  4. How does the circuit breaker react?
  5. Why does Gemini become the fallback?
  6. What telemetry is logged?
  7. What does the user see in the response metadata?

Additional scenarios to trace:

  • OpenAI rate limit (429) → Retry or fallback immediately?
  • All providers return 503 → What do you return to user?
  • Request takes 10s on primary, 2s on fallback → Do you set a deadline and fallback preemptively?
  • User requests “reasoning” but prompt contains an image → Does classifier override user preference?

The Interview Questions They’ll Ask

Prepare to answer these:

  1. “Explain the difference between retries and fallbacks in distributed systems.”
    • Expected answer: Retries attempt the same operation again (same provider/model) hoping for a transient error to resolve. Fallbacks switch to an alternative provider/model when the primary fails. Retries are for temporary issues (network blip), fallbacks are for sustained failures (provider outage). Use retries with exponential backoff for 5xx errors, fallbacks for provider unavailability.
  2. “How would you implement a circuit breaker for an LLM provider?”
    • Expected answer: Track failures per provider in a sliding time window. When failures exceed threshold (e.g., 5 in 60s), open the circuit—reject requests immediately without calling the provider. After a timeout (e.g., 30s), transition to half-open and allow one test request. If successful, close the circuit; if failed, reopen. This prevents cascading failures and gives the provider time to recover.
  3. “How do you calculate cost per request when tokens vary per provider?”
    • Expected answer: Each provider has different pricing (e.g., OpenAI: $0.01/1K input tokens, Anthropic: $0.015/1K). Capture usage.promptTokens and usage.completionTokens from the response, multiply by the provider’s price per token, and sum. Store pricing in a config map keyed by provider + model. Track cumulative cost per provider and aggregate across all providers.
  4. “What’s the tradeoff between cost optimization and latency?”
    • Expected answer: Cheaper models (GPT-3.5, Claude Haiku) are faster but less capable. Expensive models (GPT-4, Claude Opus) are slower but produce better output. A router can optimize cost by routing simple tasks to cheap models and complex tasks to expensive ones. However, task classification adds latency (~200ms). For latency-critical apps, pre-select models. For cost-critical apps, classify every request.
  5. “How would you handle a scenario where OpenAI is down for 2 hours?”
    • Expected answer: Circuit breaker opens immediately after threshold failures (e.g., 5 failures in 60s). All OpenAI requests automatically route to fallback providers (Anthropic, Google). Telemetry logs the fallback chain. Monitor dashboard shows OpenAI at 0% uptime, fallback providers handling load. When OpenAI recovers, half-open state allows test requests to gradually restore traffic. Users experience minimal disruption—just slightly higher costs (if fallback is more expensive) or different response styles.
  6. “Why use the AI SDK’s provider abstraction instead of calling APIs directly?”
    • Expected answer: Provider abstraction eliminates vendor lock-in and reduces code complexity. Without it, you’d write OpenAI-specific code (openai.chat.completions.create), Anthropic-specific code (anthropic.messages.create), etc. With AI SDK, you use generateText() for all providers—the SDK handles API differences. This makes fallback trivial: just switch the provider parameter. It also future-proofs your code when new providers emerge.
  7. “How do you prevent infinite loops in fallback chains?”
    • Expected answer: Set a maximum depth for the fallback chain (e.g., 3 providers). Track which providers have been tried in the current request. If all providers fail, return an error instead of retrying from the beginning. Use a visited set to prevent circular fallbacks. Additionally, implement request timeouts at the gateway level (e.g., 30s total) to abort even if fallback logic hasn’t finished.
  8. “What metrics would you expose in a production LLM gateway?”
    • Expected answer:
      • Availability: Success rate per provider (requests succeeded / total requests)
      • Latency: P50, P95, P99 response times per provider
      • Cost: Total spend per provider, per model, per day
      • Throughput: Requests per second, tokens per second
      • Errors: Error rate by type (4xx, 5xx, timeout)
      • Fallbacks: Fallback invocation count, fallback success rate
      • Circuit Breaker: Current state per provider (open/closed/half-open)

      Export these in Prometheus format or push to a metrics service (Datadog, CloudWatch).

Hints in Layers

Hint 1 (Start with Provider Setup): Configure multiple providers using the AI SDK’s provider system:

import { openai } from '@ai-sdk/openai';
import { anthropic } from '@ai-sdk/anthropic';
import { google } from '@ai-sdk/google';

const providers = {
  openai: openai('gpt-4-turbo'),
  anthropic: anthropic('claude-3-7-sonnet-20250219'),
  google: google('gemini-2.0-flash-001')
};

Hint 2 (Task Classifier with generateObject): Use generateObject to analyze the prompt and determine optimal capability:

import { generateObject } from 'ai';
import { z } from 'zod';

const classification = await generateObject({
  model: openai('gpt-4o-mini'), // Fast, cheap model for classification
  schema: z.object({
    capability: z.enum(['reasoning', 'vision', 'long-context']),
    reasoning: z.string(),
    confidence: z.number().min(0).max(1)
  }),
  prompt: `Analyze this request and determine the required capability: ${userPrompt}`
});

Hint 3 (Implement Fallback Logic): Wrap provider calls in a try-catch with fallback chain:

async function routeRequest(prompt: string, capability: string) {
  const chain = getProviderChain(capability); // ['openai', 'anthropic', 'google']

  for (const providerName of chain) {
    try {
      const provider = providers[providerName];
      const result = await generateText({
        model: provider,
        prompt: prompt,
        maxRetries: 2 // Retry same provider twice before fallback
      });

      return {
        provider: providerName,
        result,
        fallbackAttempted: chain.indexOf(providerName) > 0
      };
    } catch (error) {
      if (isLastProvider(providerName, chain)) {
        throw new Error('All providers failed');
      }
      // Log error and continue to next provider
      console.error(`Provider ${providerName} failed:`, error);
    }
  }
}

Hint 4 (Cost Tracking): Create a pricing map and calculate cost from token usage:

const pricing = {
  'openai/gpt-4-turbo': { input: 0.01, output: 0.03 }, // per 1K tokens
  'anthropic/claude-3-7-sonnet-20250219': { input: 0.015, output: 0.075 },
  'google/gemini-2.0-flash-001': { input: 0.001, output: 0.002 }
};

function calculateCost(usage: any, provider: string, model: string) {
  const key = `${provider}/${model}`;
  const prices = pricing[key];

  const inputCost = (usage.promptTokens / 1000) * prices.input;
  const outputCost = (usage.completionTokens / 1000) * prices.output;

  return inputCost + outputCost;
}

Hint 5 (Circuit Breaker Pattern): Implement a simple circuit breaker per provider:

class CircuitBreaker {
  private failures = 0;
  private lastFailureTime = 0;
  private state: 'CLOSED' | 'OPEN' | 'HALF_OPEN' = 'CLOSED';

  constructor(
    private threshold = 5,
    private resetTimeout = 60000 // 1 minute
  ) {}

  async execute<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === 'OPEN') {
      if (Date.now() - this.lastFailureTime > this.resetTimeout) {
        this.state = 'HALF_OPEN';
      } else {
        throw new Error('Circuit breaker is OPEN');
      }
    }

    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  private onSuccess() {
    this.failures = 0;
    this.state = 'CLOSED';
  }

  private onFailure() {
    this.failures++;
    this.lastFailureTime = Date.now();
    if (this.failures >= this.threshold) {
      this.state = 'OPEN';
    }
  }
}

const circuitBreakers = {
  openai: new CircuitBreaker(),
  anthropic: new CircuitBreaker(),
  google: new CircuitBreaker()
};
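
A sketch of wiring the breaker into the fallback loop from Hint 3, so a provider whose circuit is open is skipped instead of being called again (providerName, providers, and prompt are the variables used there):

// Throws immediately with "Circuit breaker is OPEN" while the provider is
// considered unhealthy, letting the fallback loop move on to the next one.
const result = await circuitBreakers[providerName].execute(() =>
  generateText({
    model: providers[providerName],
    prompt
  })
);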

Hint 6 (Telemetry with onFinish): Use the AI SDK’s onFinish callback to capture metrics:

const result = await generateText({
  model: provider,
  prompt: prompt,
  onFinish: ({ text, usage, finishReason }) => {
    const cost = calculateCost(usage, providerName, modelName);

    telemetry.log({
      provider: providerName,
      model: modelName,
      tokens: usage.totalTokens,
      cost: cost,
      latency: Date.now() - startTime,
      finishReason: finishReason,
      timestamp: new Date().toISOString()
    });
  }
});

Hint 7 (Build a Simple Dashboard): Create an HTTP endpoint that aggregates telemetry:

app.get('/api/stats', (req, res) => {
  const stats = {
    providers: telemetry.getProviderStats(),
    totalCost: telemetry.getTotalCost(),
    requestCount: telemetry.getRequestCount(),
    fallbackRate: telemetry.getFallbackRate(),
    recentErrors: telemetry.getRecentErrors(10)
  };

  res.json(stats);
});

Use a frontend (React, Vue, or even static HTML) to poll this endpoint and display charts.

Books That Will Help

| Topic | Book | Chapter/Section |
| --- | --- | --- |
| API Gateway Patterns | “Designing Data-Intensive Applications” by Martin Kleppmann | Ch. 4 (Encoding and Evolution), Ch. 12 (The Future of Data Systems) |
| Circuit Breaker Pattern | “Release It!, 2nd Edition” by Michael Nygard | Ch. 5 (Stability Patterns: Circuit Breaker) |
| Fallback and Resilience | “Building Microservices, 2nd Edition” by Sam Newman | Ch. 11 (Microservices at Scale: Resiliency) |
| Distributed System Design | “Designing Data-Intensive Applications” by Martin Kleppmann | Ch. 1 (Reliable, Scalable, and Maintainable Applications) |
| Observability and Telemetry | “Designing Data-Intensive Applications” by Martin Kleppmann | Ch. 1 (Maintainability: Operability) |
| Error Handling Strategies | “Release It!, 2nd Edition” by Michael Nygard | Ch. 4 (Stability Antipatterns), Ch. 5 (Stability Patterns) |
| Cost Optimization | “AI Engineering” by Chip Huyen | Ch. 9 (Model Deployment and Serving: Cost Optimization) |
| Production AI Systems | “AI Engineering” by Chip Huyen | Ch. 8 (Model Deployment), Ch. 10 (Infrastructure and Tooling) |
| Service Reliability | “Fundamentals of Software Architecture” by Mark Richards and Neal Ford | Ch. 10 (Architectural Characteristics: Reliability) |
| TypeScript for Production | “Effective TypeScript” by Dan Vanderkam | Ch. 6 (Types Declarations and @types), Ch. 7 (Write and Run Your Code) |

Recommended reading order:

  1. Start with Kleppmann Ch. 1 to understand what makes systems maintainable and observable
  2. Read Nygard Ch. 5 to master circuit breakers and retry patterns
  3. Move to Newman Ch. 11 for resilience strategies in distributed systems
  4. Dive into Huyen Ch. 9 for AI-specific cost optimization techniques
  5. Reference AI SDK documentation alongside coding

Online Resources:

Sources for Further Research:


Project 5: Autonomous Research Agent with Memory

📖 View Detailed Guide →

  • File: AI_SDK_LEARNING_PROJECTS.md
  • Main Programming Language: TypeScript
  • Alternative Programming Languages: Python, Go, JavaScript
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: Level 4: The “Open Core” Infrastructure
  • Difficulty: Level 3: Advanced (The Engineer)
  • Knowledge Area: AI Agents, Knowledge Graphs
  • Software or Tool: AI SDK, Web Search APIs, Graph Databases
  • Main Book: “Graph Algorithms the Fun Way” by Jeremy Kubica

What you’ll build: An agent that takes a research question, autonomously searches the web, reads pages, extracts facts, maintains a knowledge graph of discovered information, and synthesizes a final research report with citations.

Why it teaches AI SDK: This is a complex multi-tool agent with state management. You’ll implement tools for web search, page reading, fact extraction, and graph updates. The agent must decide when to search more vs. when to synthesize—real autonomous decision-making.

Core challenges you’ll face:

  • Building tools that interact with external APIs (search, fetch) (maps to tool implementation)
  • Maintaining state across agent iterations (knowledge graph) (maps to agent state)
  • Using prepareStep to inject context before each iteration (maps to loop control)
  • Implementing stopWhen for intelligent termination (maps to completion criteria)

Resources for key challenges:

  • “AI SDK 5 Agent documentation” - The stopWhen and prepareStep APIs
  • “ReAct: Synergizing Reasoning and Acting” (Yao et al.) - The academic foundation for tool-using agents

Key Concepts:

  • Agent Loop Control: AI SDK Agents - Advanced
  • State Management: “Fluent Python, 2nd Edition” by Luciano Ramalho - Ch. 22 (patterns apply)
  • Knowledge Graphs: “Graph Algorithms the Fun Way” by Jeremy Kubica - Ch. 2-3

Difficulty: Advanced
Time estimate: 2-3 weeks
Prerequisites: Completed Projects 1-4

Real world outcome:

  • Ask “What are the latest developments in quantum computing 2025?”
  • Watch the agent search, read articles, extract facts, build connections
  • Final output: 2-3 page research report with inline citations, confidence scores
  • Knowledge graph visualization showing how facts connect

Learning milestones:

  1. Agent chains 5+ tool calls to gather information → you understand complex orchestration
  2. prepareStep injects accumulated knowledge → you grasp context management
  3. Agent decides “enough research, time to write” → you understand autonomous planning
  4. Structured citations extracted with generateObject → you combine all SDK patterns

Real World Outcome

When you run the research agent, here’s exactly what you’ll see in your terminal:

$ research "What are the latest breakthroughs in quantum computing as of 2025?"

🔬 Research Agent initialized
📚 Knowledge graph: empty

[Step 1] 🔍 Tool: webSearch("quantum computing breakthroughs 2025")
         → Found 15 relevant results

[Step 2] 📖 Tool: readPage("https://nature.com/quantum-error-correction-2025")
         → Extracted 12 facts about error correction advances

[Step 3] 📝 Tool: addToGraph({
           entity: "IBM Quantum",
           relation: "achieved",
           target: "1000+ qubit processor"
         })
         → Knowledge graph: 12 nodes, 8 edges

[Step 4] 🔍 Tool: webSearch("Google quantum supremacy 2025")
         → Found 8 relevant results

...

[Step 12] 🤔 Agent reasoning: "I have gathered sufficient information
          on error correction, qubit scaling, and commercial applications.
          Time to synthesize the research report."

[Step 13] 📊 Tool: synthesizeReport()

═══════════════════════════════════════════════════════════════════════
                 RESEARCH REPORT: QUANTUM COMPUTING 2025
═══════════════════════════════════════════════════════════════════════

## Executive Summary

Quantum computing achieved several major milestones in 2025, with
breakthroughs in error correction, qubit scaling, and commercial...

## Key Findings

### 1. Error Correction (High Confidence: 0.92)
IBM and Google independently demonstrated...

### 2. Commercial Applications (Medium Confidence: 0.78)
First production use cases emerged in...

## Knowledge Graph Visualization

    ┌─────────────┐      achieved      ┌────────────────────┐
    │  IBM Quantum │ ────────────────► │ 1000+ qubit proc.  │
    └──────┬──────┘                    └────────────────────┘
           │
    competes with
           │
           ▼
    ┌──────────────┐     published     ┌────────────────────┐
    │ Google Quant │ ────────────────► │ Error correction   │
    └──────────────┘                   │ breakthrough       │
                                       └────────────────────┘

## Sources

[1] Nature: "Quantum Error Correction Advances" (2025-03-15)
    Confidence: 0.95
    https://nature.com/quantum-error-correction-2025

[2] ArXiv: "Scaling Quantum Processors" (2025-06-22)
    Confidence: 0.88
    ...

═══════════════════════════════════════════════════════════════════════

📁 Full report saved to: research_quantum_2025-12-22.md
📊 Knowledge graph exported to: knowledge_graph.json

The Core Question You’re Answering

“How do I build an agent that autonomously explores, learns, and synthesizes information?”

This is about understanding complex multi-tool agents with state management, autonomous decision-making, and knowledge accumulation. You’re not just calling tools—you’re building a system that thinks, learns, and decides when it knows enough.

Concepts You Must Understand First

  1. Multi-Tool Orchestration - Coordinating multiple tools with different purposes (search, read, extract, store)
  2. Agent State Management - Maintaining state (knowledge graph) across iterations
  3. prepareStep - Injecting accumulated context before each LLM call
  4. stopWhen - Intelligent termination conditions based on agent reasoning
  5. Knowledge Graphs - Representing and querying accumulated facts as entities and relationships

Here’s the research loop as an ASCII diagram:

┌──────────────────────────────────────────────────────────────────┐
│                    RESEARCH AGENT ARCHITECTURE                    │
├──────────────────────────────────────────────────────────────────┤
│                                                                   │
│   ┌─────────────────────────────────────────────────────────┐    │
│   │                    AGENT STATE                           │    │
│   │  ┌───────────────┐  ┌────────────────┐  ┌────────────┐  │    │
│   │  │ Knowledge     │  │ Sources        │  │ Confidence │  │    │
│   │  │ Graph         │  │ Collected      │  │ Scores     │  │    │
│   │  └───────────────┘  └────────────────┘  └────────────┘  │    │
│   └─────────────────────────────────────────────────────────┘    │
│                              ▲                                    │
│                              │ prepareStep injects state          │
│                              │                                    │
│   ┌──────────────────────────┴───────────────────────────────┐   │
│   │                      AGENT LOOP                           │   │
│   │                                                           │   │
│   │   ┌──────┐    ┌─────────────────────────────────────┐    │   │
│   │   │ LLM  │ ──►│ Tools: search, read, extract, graph │    │   │
│   │   └──▲───┘    └───────────────────┬─────────────────┘    │   │
│   │      │                            │                       │   │
│   │      └────────────────────────────┘                       │   │
│   │                                                           │   │
│   │   stopWhen: agent says "research complete"                │   │
│   └───────────────────────────────────────────────────────────┘   │
│                                                                   │
│   Output: Synthesized report + Knowledge graph + Citations        │
└──────────────────────────────────────────────────────────────────┘

Questions to Guide Your Design

  1. What tools does a research agent need?
    • webSearch: Find relevant sources on the web
    • readPage: Extract content from URLs
    • extractFacts: Parse content into structured facts with generateObject
    • addToGraph: Store facts as knowledge graph nodes/edges
    • queryGraph: Find related information already collected
    • synthesizeReport: Generate final output with citations
  2. How do you represent the knowledge graph?
    • Nodes: entities (people, organizations, concepts, technologies)
    • Edges: relationships (achieved, published, competes with, enables)
    • Metadata: confidence scores, source URLs, timestamps
    • Consider: in-memory Map, SQLite with graph queries, or Neo4j
  3. How does the agent know when to stop searching and start writing?
    • stopWhen condition: “I have sufficient information to answer the question”
    • Agent reasons about coverage: multiple sources, key topics addressed, confidence threshold
    • Step limit as safety: maxSteps to prevent infinite loops
  4. How do you assign confidence scores to facts?
    • Source credibility: .edu/.gov = high, blogs = medium
    • Corroboration: multiple sources = higher confidence
    • Recency: newer sources = higher confidence for current events
    • Extract confidence as part of the fact schema
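
One possible first cut at the confidence heuristics in question 4 is sketched below; the domain checks, weights, and caps are made-up values you would tune against your own sources.

// Heuristic confidence score for an extracted fact; all weights are illustrative
interface FactEvidence {
  sourceUrl: string;
  publishedAt?: Date;
  corroboratingSources: number; // how many other sources state the same fact
}

function scoreConfidence(evidence: FactEvidence): number {
  const host = new URL(evidence.sourceUrl).hostname;

  // Source credibility: .edu/.gov high, everything else medium
  let score = /\.(edu|gov)$/.test(host) ? 0.8 : 0.5;

  // Corroboration: each independent source adds a little, capped
  score += Math.min(evidence.corroboratingSources * 0.05, 0.15);

  // Recency: penalize sources older than roughly two years
  if (evidence.publishedAt) {
    const ageYears = (Date.now() - evidence.publishedAt.getTime()) / (365 * 24 * 3600 * 1000);
    if (ageYears > 2) score -= 0.1;
  }

  return Math.min(1, Math.max(0, score));
}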

Thinking Exercise

Design the knowledge graph data structure before implementing:

// What should your types look like?
interface KnowledgeNode {
  id: string;
  type: 'entity' | 'concept' | 'event';
  name: string;
  description: string;
  sourceUrls: string[];
  confidence: number;
}

interface KnowledgeEdge {
  from: string;  // node id
  relation: string;
  to: string;    // node id
  confidence: number;
  sourceUrl: string;
}

// How will you query it?
// How will you update it?
// How will you serialize it for prepareStep?
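
One way to answer those three questions is a small class over the types above; the query and serialization strategies here are just one reasonable choice, not the only one.

class KnowledgeGraph {
  private nodes = new Map<string, KnowledgeNode>();
  private edges: KnowledgeEdge[] = [];

  // Update: upsert nodes by id, merging sources and keeping the higher confidence
  addNode(node: KnowledgeNode): void {
    const existing = this.nodes.get(node.id);
    if (existing) {
      existing.sourceUrls = [...new Set([...existing.sourceUrls, ...node.sourceUrls])];
      existing.confidence = Math.max(existing.confidence, node.confidence);
    } else {
      this.nodes.set(node.id, node);
    }
  }

  addEdge(edge: KnowledgeEdge): void {
    this.edges.push(edge);
  }

  // Query: substring match over node names, plus the edges touching those nodes
  query(term: string): { nodes: KnowledgeNode[]; edges: KnowledgeEdge[] } {
    const matched = [...this.nodes.values()].filter((n) =>
      n.name.toLowerCase().includes(term.toLowerCase())
    );
    const ids = new Set(matched.map((n) => n.id));
    return { nodes: matched, edges: this.edges.filter((e) => ids.has(e.from) || ids.has(e.to)) };
  }

  // Serialize: compact text the agent can read when prepareStep injects it
  toPromptContext(): string {
    return this.edges
      .map((e) => {
        const from = this.nodes.get(e.from)?.name ?? e.from;
        const to = this.nodes.get(e.to)?.name ?? e.to;
        return `${from} --${e.relation}--> ${to} (confidence ${e.confidence.toFixed(2)})`;
      })
      .join('\n');
  }
}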

The Interview Questions They’ll Ask

  1. “How do you maintain state across agent iterations?”
    • Answer: Use prepareStep to inject the serialized knowledge graph as context
    • The LLM sees what it has already learned before deciding the next action
    • State lives outside the agent loop, updated after each tool call
  2. “What is prepareStep and when would you use it?”
    • Answer: prepareStep is a callback that runs before each agent iteration
    • It lets you inject dynamic context (like accumulated knowledge)
    • Use it when the agent needs to “remember” previous findings
  3. “How would you implement a research termination condition?”
    • Answer: stopWhen with agent reasoning: “Do I have enough information?”
    • Agent evaluates coverage of key topics, number of sources, confidence levels
    • Fallback: maxSteps limit to prevent runaway loops
  4. “How do you handle conflicting information from different sources?”
    • Answer: Track confidence scores, store multiple facts with different sources
    • Flag conflicts in the knowledge graph (contradicts relationship)
    • Let the synthesis tool weigh evidence and present both views

Hints in Layers

Hint 1: Start with search + readPage tools only

  • Get the basic agent loop working: search → read → search → read
  • Don’t worry about knowledge graphs yet
  • Just accumulate raw text in an array

Hint 2: Add a simple in-memory fact store

  • Define a Facts array with { fact: string, source: string }
  • Add an extractFacts tool that uses generateObject (sketched below)
  • Store facts in memory, no graph yet
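
For Hint 2, an extractFacts tool might look like the sketch below. It assumes AI SDK 5-style tool() and generateObject; the model choice and the collectedFacts array are placeholders.

import { generateObject, tool } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

// Hint 2's in-memory fact store
const collectedFacts: { fact: string; source: string }[] = [];

const extractFacts = tool({
  description: 'Extract structured facts from page content and store them',
  inputSchema: z.object({
    content: z.string(),
    sourceUrl: z.string(),
  }),
  execute: async ({ content, sourceUrl }) => {
    const { object } = await generateObject({
      model: openai('gpt-4o-mini'),
      schema: z.object({
        facts: z.array(z.object({ fact: z.string(), confidence: z.number().min(0).max(1) })),
      }),
      prompt: `Extract discrete factual claims from the following text:\n\n${content}`,
    });
    for (const f of object.facts) {
      collectedFacts.push({ fact: f.fact, source: sourceUrl });
    }
    return { stored: object.facts.length };
  },
});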

Hint 3: Use prepareStep to inject accumulated facts

  • Before each LLM call, serialize facts to text
  • Inject as context: “So far you have learned: [facts]”
  • Agent now “remembers” what it found

Hint 4: Add synthesizeReport as the final tool

  • When agent decides it’s done, it calls synthesizeReport
  • This tool uses generateObject to structure the final report
  • Include citations by matching facts to their source URLs

Hint 5: Upgrade to a real knowledge graph

  • Replace Facts array with nodes and edges
  • Add a queryGraph tool so the agent can search its own memory (sketched below)
  • Visualize with ASCII or export to JSON for external tools
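
For Hint 5, queryGraph can stay tiny. The sketch below reuses the KnowledgeNode and KnowledgeEdge types from the thinking exercise, with plain arrays standing in for the real graph store.

import { tool } from 'ai';
import { z } from 'zod';

// Plain arrays stand in for the real graph store in this sketch
const nodes: KnowledgeNode[] = [];
const edges: KnowledgeEdge[] = [];

const queryGraph = tool({
  description: 'Search knowledge already collected before searching the web again',
  inputSchema: z.object({ term: z.string() }),
  execute: async ({ term }) => {
    const matched = nodes.filter((n) => n.name.toLowerCase().includes(term.toLowerCase()));
    const ids = new Set(matched.map((n) => n.id));
    return {
      known: matched.map((n) => `${n.name}: ${n.description}`),
      relations: edges
        .filter((e) => ids.has(e.from) || ids.has(e.to))
        .map((e) => `${e.from} ${e.relation} ${e.to}`),
    };
  },
});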

Books That Will Help

Topic | Book | Chapter
Knowledge Graphs | "Graph Algorithms the Fun Way" by Jeremy Kubica | Ch. 2-3 (Graph representation)
Agent Patterns | "Building LLM Apps" by Harrison Chase | Agent loops, tool design
ReAct Pattern | "ReAct: Synergizing Reasoning and Acting" (paper) | The academic foundation
State Management | "Fluent Python, 2nd Edition" by Luciano Ramalho | Ch. 22 (patterns apply to TS)
Async Iteration | "JavaScript: The Definitive Guide" by David Flanagan | Ch. 13 (agent loop internals)
Web Scraping | "Web Scraping with Python" by Ryan Mitchell | Ch. 2-4 (readPage implementation)
Structured Output | "Programming TypeScript" by Boris Cherny | Ch. 3 (Zod schemas for facts)

Project Comparison

Project | Difficulty | Time | Depth of Understanding | Fun Factor
Expense Tracker CLI | Beginner | Weekend | ⭐⭐ | ⭐⭐⭐
Streaming Summarizer | Beginner-Intermediate | 1 week | ⭐⭐⭐ | ⭐⭐⭐⭐
Code Review Agent | Intermediate | 1-2 weeks | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐
Model Router | Intermediate | 1-2 weeks | ⭐⭐⭐⭐ | ⭐⭐⭐
Research Agent | Advanced | 2-3 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐

Recommendation

If your goal is to learn the AI SDK deeply, I recommend this progression:

  1. Start with Project 1 (Expense Tracker) - Gets you comfortable with the core API patterns in a low-risk CLI environment. You’ll have something working in a weekend.

  2. Move to Project 2 (Streaming Summarizer) - Adds the streaming dimension and web UI integration. This is where AI apps become fun.

  3. Tackle Project 3 (Code Review Agent) - This is the inflection point where you go from “using AI” to “building AI systems.” Tool calling changes everything.

  4. Projects 4-5 based on your interests - Model Router if you’re building production systems; Research Agent if you want to push agent capabilities.


Final Overall Project: Personal AI Command Center

What you’ll build: A unified personal AI assistant hub with multiple specialized agents (research agent, code helper, email manager, calendar assistant) that can be invoked via CLI, web UI, or API. Each agent has its own tools and state, but they can collaborate and share context through a central orchestration layer.

Why it teaches everything: This is the synthesis project. You’ll use:

  • generateText/streamText for real-time interactions
  • generateObject for structured task routing and data extraction
  • Tools for each agent’s specific capabilities
  • Agent loops for autonomous task completion
  • Provider abstraction to route different tasks to optimal models
  • Telemetry for usage tracking and debugging
  • Streaming UI for interactive web interface

Core challenges you’ll face:

  • Designing an agent orchestration layer that routes to specialized agents (maps to architecture)
  • Implementing shared context/memory across agents (maps to state management)
  • Building a unified tool registry that agents can discover (maps to tool design)
  • Creating a streaming web UI with multiple concurrent agent conversations (maps to real-time systems)
  • Implementing cost controls and rate limiting across providers (maps to production concerns)
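
Cost controls can start very small: a tracker fed with the token usage the SDK reports on each result (exact usage field names depend on your SDK version). The sketch below keeps everything in memory, and the per-million-token prices are illustrative placeholders, not real rates.

// Per-agent cost tracking; the prices are illustrative placeholders, not real rates
interface Usage {
  inputTokens: number;
  outputTokens: number;
}

const PRICE_PER_MILLION_TOKENS = {
  'claude-opus': { input: 15, output: 75 },
  'gpt-4o': { input: 2.5, output: 10 },
  'gpt-4o-mini': { input: 0.15, output: 0.6 },
} as const;

class CostTracker {
  private totals = new Map<string, number>();  // agent name -> USD spent
  private budgets = new Map<string, number>(); // agent name -> USD cap

  setBudget(agent: string, usd: number): void {
    this.budgets.set(agent, usd);
  }

  record(agent: string, model: keyof typeof PRICE_PER_MILLION_TOKENS, usage: Usage): void {
    const price = PRICE_PER_MILLION_TOKENS[model];
    const cost =
      (usage.inputTokens / 1_000_000) * price.input +
      (usage.outputTokens / 1_000_000) * price.output;
    this.totals.set(agent, (this.totals.get(agent) ?? 0) + cost);
  }

  overBudget(agent: string): boolean {
    const budget = this.budgets.get(agent);
    return budget !== undefined && (this.totals.get(agent) ?? 0) >= budget;
  }

  report(): Record<string, number> {
    return Object.fromEntries(this.totals);
  }
}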

Key Concepts:

  • Multi-Agent Architecture: AI SDK 6 Agent Abstraction docs
  • Event-Driven Architecture: “Designing Data-Intensive Applications” by Martin Kleppmann - Ch. 11
  • React Concurrent Features: “Learning React, 2nd Edition” by Eve Porcello - Ch. 8
  • API Design: “Design and Build Great Web APIs” by Mike Amundsen - Ch. 3-5

Difficulty: Advanced | Time estimate: 1 month+ | Prerequisites: All previous projects

Real world outcome:

  • Web dashboard showing all your agents and their status
  • Natural language command: “Research quantum computing, then draft an email to my team summarizing it”
  • Watch agents collaborate: research agent gathers info → email agent drafts message
  • CLI access: ai research "topic", ai email draft "context"
  • API endpoint for integration with other tools
  • Usage dashboard showing costs, requests, model usage by agent

Learning milestones:

  1. Single agent works end-to-end → you’ve internalized the agent pattern
  2. Two agents share context successfully → you understand inter-agent communication
  3. Web UI streams multiple agent responses → you’ve mastered concurrent streaming
  4. Cost tracking shows optimization opportunities → you think about production AI systems
  5. Someone else can use your command center → you’ve built a real product

Real World Outcome

Here is exactly what the web dashboard and CLI look like:

┌─────────────────────────────────────────────────────────────────────────────┐
│  🤖 Personal AI Command Center                              [Dashboard]     │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ACTIVE AGENTS                                  RECENT ACTIVITY              │
│  ─────────────                                  ───────────────              │
│  🔬 Research Agent    [Idle]                    10:34 Drafted email to team  │
│  📧 Email Agent       [Processing...]           10:32 Research completed     │
│  📅 Calendar Agent    [Idle]                    10:28 Scheduled meeting      │
│  💻 Code Helper       [Idle]                    10:15 Reviewed PR #234       │
│                                                                              │
│  ─────────────────────────────────────────────────────────────────────────  │
│                                                                              │
│  CURRENT TASK: Drafting email summary of quantum research                   │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                                                                      │    │
│  │  📧 Email Agent streaming...                                         │    │
│  │                                                                      │    │
│  │  Subject: Quantum Computing Research Summary                         │    │
│  │                                                                      │    │
│  │  Hi Team,                                                            │    │
│  │                                                                      │    │
│  │  I wanted to share some exciting findings from my research on█       │    │
│  │                                                                      │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                              │
│  ─────────────────────────────────────────────────────────────────────────  │
│                                                                              │
│  COST TRACKING (This Month)                                                 │
│  ───────────────────────────                                                │
│  Total: $23.45                                                              │
│  ├── Research Agent:  $12.30 (Claude Opus)                                  │
│  ├── Email Agent:     $5.20 (GPT-4)                                         │
│  ├── Calendar Agent:  $2.15 (GPT-3.5)                                       │
│  └── Code Helper:     $3.80 (Claude Sonnet)                                 │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

CLI access:

$ ai "Research quantum computing, then draft an email to my team summarizing it"

🤖 Orchestrator analyzing task...
📋 Execution plan:
   1. Research Agent → gather quantum computing info
   2. Email Agent → draft summary email

[Research Agent] 🔬 Starting research...
[Research Agent] ✓ Completed (12 facts gathered)

[Email Agent] 📧 Drafting email...
[Email Agent] ✓ Draft ready

Would you like me to send this email? [y/N]

The Core Question You’re Answering

“How do I build a system where multiple specialized agents collaborate to complete complex tasks?”

This is the synthesis of everything you’ve learned…

Concepts You Must Understand First

  1. Multi-Agent Orchestration - Coordinating multiple agents
  2. Agent-to-Agent Communication - Sharing context between agents
  3. Task Decomposition - Breaking complex tasks into agent subtasks
  4. Unified Tool Registry - Agents discovering and using shared tools
  5. Streaming with Multiple Agents - Concurrent streaming in web UI
  6. Cost Management - Tracking and controlling costs across agents

The architecture looks like this:

┌─────────────────────────────────────────────────────────────────────────┐
│                      AI COMMAND CENTER ARCHITECTURE                      │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│   User Input: "Research X, then email summary to team"                   │
│        │                                                                 │
│        ▼                                                                 │
│   ┌─────────────────────────────────────────────────────────────────┐   │
│   │                    ORCHESTRATION LAYER                           │   │
│   │                                                                  │   │
│   │   Task Decomposition → Agent Selection → Execution Plan          │   │
│   └──────────────────────────────┬──────────────────────────────────┘   │
│                                  │                                       │
│          ┌───────────────────────┼───────────────────────┐              │
│          │                       │                       │              │
│          ▼                       ▼                       ▼              │
│   ┌─────────────┐        ┌─────────────┐        ┌─────────────┐        │
│   │  Research   │        │   Email     │        │  Calendar   │        │
│   │   Agent     │        │   Agent     │        │   Agent     │        │
│   │             │        │             │        │             │        │
│   │ Tools:      │        │ Tools:      │        │ Tools:      │        │
│   │ - search    │        │ - compose   │        │ - schedule  │        │
│   │ - read      │        │ - send      │        │ - check     │        │
│   │ - extract   │        │ - list      │        │ - invite    │        │
│   └──────┬──────┘        └──────┬──────┘        └──────┬──────┘        │
│          │                      │                      │                │
│          └──────────────────────┴──────────────────────┘                │
│                                 │                                        │
│                                 ▼                                        │
│   ┌─────────────────────────────────────────────────────────────────┐   │
│   │                     SHARED CONTEXT STORE                         │   │
│   │                                                                  │   │
│   │   Accumulated knowledge, user preferences, conversation history  │   │
│   └─────────────────────────────────────────────────────────────────┘   │
│                                                                          │
│   ┌─────────────────────────────────────────────────────────────────┐   │
│   │                     PROVIDER ABSTRACTION                         │   │
│   │         OpenAI  │  Anthropic  │  Google  │  Local Models         │   │
│   └─────────────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────────┘

Questions to Guide Your Design

  1. How do agents communicate with each other?
  2. How do you handle agent failures in a chain?
  3. How do you stream multiple agent outputs to the UI?
  4. How do you implement cost controls per agent?

Thinking Exercise

Design the orchestration layer before implementing: how does it decompose tasks, and how does it select the right agents?
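
As a starting point, here is one possible shape for that layer: generateObject produces a typed execution plan, and a loop runs the selected agents in order while passing results through a shared context store. The agent registry and its stub implementations are assumptions for illustration.

import { generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

// Hypothetical agent registry: each agent takes an instruction plus shared context
type AgentFn = (instruction: string, context: Map<string, unknown>) => Promise<string>;

const agents: Record<'research' | 'email' | 'calendar' | 'code', AgentFn> = {
  research: async (instruction) => `research results for: ${instruction}`,
  email: async (instruction, context) =>
    `draft email based on: ${String(context.get('research:lastResult') ?? instruction)}`,
  calendar: async (instruction) => `calendar action: ${instruction}`,
  code: async (instruction) => `code help: ${instruction}`,
};

const planSchema = z.object({
  steps: z.array(
    z.object({
      agent: z.enum(['research', 'email', 'calendar', 'code']),
      instruction: z.string(),
    })
  ),
});

async function orchestrate(task: string): Promise<Map<string, unknown>> {
  // 1. Task decomposition: ask for a typed execution plan instead of free-form text
  const { object: plan } = await generateObject({
    model: openai('gpt-4o'),
    schema: planSchema,
    prompt: `Break this task into ordered steps for the agents research, email, calendar, code: ${task}`,
  });

  // 2. Execution: run agents in order, sharing results through a context store
  const context = new Map<string, unknown>();
  for (const step of plan.steps) {
    const result = await agents[step.agent](step.instruction, context);
    context.set(`${step.agent}:lastResult`, result);
  }
  return context;
}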

The Interview Questions They’ll Ask

  1. “How would you design a multi-agent system?”
  2. “How do you handle context sharing between agents?”
  3. “What’s your strategy for cost control in production AI?”
  4. “How would you test a multi-agent system?”

Hints in Layers

  • Hint 1: Start with one agent end-to-end
  • Hint 2: Add a simple orchestrator that routes to agents
  • Hint 3: Implement shared context store
  • Hint 4: Add concurrent streaming to the web UI
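
For Hint 4, the server side of concurrent streaming can be as small as the sketch below: each agent's textStream is consumed independently and every chunk is tagged with the agent name before being rendered. The render function, prompts, and models are placeholders.

import { streamText } from 'ai';
import { openai } from '@ai-sdk/openai';

// Push a tagged chunk to the UI however you like (SSE, WebSocket, React state...)
function render(agent: string, chunk: string): void {
  process.stdout.write(`[${agent}] ${chunk}`);
}

async function pipe(agent: string, stream: AsyncIterable<string>): Promise<void> {
  for await (const chunk of stream) render(agent, chunk);
}

async function runConcurrently(): Promise<void> {
  // streamText returns immediately; the text arrives through textStream
  const research = streamText({
    model: openai('gpt-4o'),
    prompt: 'Summarize the latest quantum computing results.',
  });
  const email = streamText({
    model: openai('gpt-4o-mini'),
    prompt: 'Draft a short status update email for the team.',
  });

  // Both streams are consumed at the same time; chunks interleave as they arrive
  await Promise.all([
    pipe('research', research.textStream),
    pipe('email', email.textStream),
  ]);
}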

Books That Will Help

Topic | Book | Chapter
Event-Driven Architecture | "Designing Data-Intensive Applications" | Ch. 11
Multi-Agent Systems | "Artificial Intelligence: A Modern Approach" | Ch. 2
API Design | "Design and Build Great Web APIs" | Ch. 3-5
React Patterns | "Learning React" by Eve Porcello | Ch. 8, 12

Sources