AI SDK LEARNING PROJECTS
Learning the AI SDK (Vercel) Deeply
Goal: Master the Vercel AI SDK through hands-on projects that teach its core concepts by building real applications. By the end of these projects, you will understand how to generate text and structured data from LLMs, implement real-time streaming interfaces, build autonomous agents that use tools, and create production-ready AI systems with proper error handling, cost tracking, and multi-provider support.
Why the AI SDK Matters
In 2023, when ChatGPT exploded onto the scene, developers scrambled to build AI-powered applications. The problem? Every LLM provider had a different API. OpenAI used one format, Anthropic another, Google yet another. Code written for one provider couldn’t be ported to another without significant rewrites.
Vercel’s AI SDK solved this problem with a radical idea: a unified TypeScript interface that abstracts provider differences. Write once, run on any model. But it’s not just about abstraction—the SDK provides:
- Type-safe structured output with Zod schemas
- First-class streaming with Server-Sent Events and React hooks
- Tool calling that lets LLMs take actions, not just generate text
- Agent loops that run autonomously until tasks complete
Today, the AI SDK powers thousands of production applications. Understanding it deeply means understanding how modern AI applications are built.
The AI SDK in the Ecosystem
┌─────────────────────────────────────────────────────────────────────────────┐
│ YOUR APPLICATION │
│ │
│ ┌───────────────────────────────────────────────────────────────────────┐ │
│ │ AI SDK (Unified API) │ │
│ │ │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ generateText │ │ streamText │ │generateObject│ │ streamObject│ │ │
│ │ │ Batch │ │ Real-time │ │ Structured │ │ Streaming │ │ │
│ │ │ Output │ │ Streaming │ │ Output │ │ Structured │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │ │
│ │ │ │
│ │ ┌──────────────────────────────┐ ┌────────────────────────────────┐│ │
│ │ │ Tools & Agents │ │ React/Vue/Svelte Hooks ││ │
│ │ │ stopWhen, prepareStep, │ │ useChat, useCompletion, ││ │
│ │ │ tool(), Agent class │ │ useObject ││ │
│ │ └──────────────────────────────┘ └────────────────────────────────┘│ │
│ │ │ │
│ └────────────────────────────────┬──────────────────────────────────────┘ │
│ │ │
│ Provider Abstraction Layer │
│ │ │
│ ┌────────────┬──────────────┬────┴─────┬──────────────┬────────────────┐ │
│ │ │ │ │ │ │ │
│ ▼ ▼ ▼ ▼ ▼ ▼ │
│ ┌──────┐ ┌──────────┐ ┌───────┐ ┌───────┐ ┌──────────┐ ┌───────┐ │
│ │OpenAI│ │Anthropic │ │Google │ │Mistral│ │ Cohere │ │ Local │ │
│ │ GPT │ │ Claude │ │Gemini │ │ │ │ │ │Models │ │
│ └──────┘ └──────────┘ └───────┘ └───────┘ └──────────┘ └───────┘ │
└─────────────────────────────────────────────────────────────────────────────┘

Core Concepts Deep Dive
Before diving into projects, you must understand the fundamental concepts that make the AI SDK powerful. Each concept builds on the previous one—don’t skip ahead.
1. Text Generation: The Foundation
At its core, the AI SDK does one thing: sends prompts to LLMs and gets responses back. But HOW you get those responses matters enormously.
┌────────────────────────────────────────────────────────────────────────────┐
│ TEXT GENERATION PATTERNS │
├────────────────────────────────────────────────────────────────────────────┤
│ │
│ generateText (Blocking) streamText (Real-time) │
│ ───────────────────────── ────────────────────── │
│ │
│ Client Server Client Server │
│ │ │ │ │ │
│ │──── POST ────►│ │──── POST ────►│ │
│ │ │ │ │ │
│ │ (waiting) │ ◄─────────────────┐ │ (waiting) │ ◄──────────┐ │
│ │ │ Processing LLM │ │ │ Start LLM │ │
│ │ │ response... │ │◄── token ─────│ │ │
│ │ │ (could be 10s+) │ │◄── token ─────│ streaming │ │
│ │ │ │ │◄── token ─────│ │ │
│ │◄─ COMPLETE ───│ ──────────────────┘ │◄── token ─────│ │ │
│ │ │ │◄── [done] ────│ ───────────┘ │
│ │ │ │ │ │
│ │
│ USE WHEN: USE WHEN: │
│ • Background processing • Interactive UIs │
│ • Batch operations • Chat interfaces │
│ • Email drafting • Real-time feedback │
│ • Agent tool calls • Long-form generation │
│ │
└────────────────────────────────────────────────────────────────────────────┘

Key Insight: generateText blocks until the full response is ready. streamText returns immediately with a textStream (an async iterable) that yields tokens as they're generated. For a 500-word response, generateText makes the user wait 5-10 seconds before anything appears; streamText shows the first words within a few hundred milliseconds.
// Blocking - waits for complete response
const { text } = await generateText({
model: openai('gpt-4'),
prompt: 'Explain quantum computing in 500 words'
});
console.log(text); // Full response after ~10 seconds
// Streaming - yields tokens as they arrive
const { textStream } = await streamText({
model: openai('gpt-4'),
prompt: 'Explain quantum computing in 500 words'
});
for await (const chunk of textStream) {
process.stdout.write(chunk); // Each word appears immediately
}
2. Structured Output: Type-Safe AI
Raw text from LLMs is messy. You ask for JSON, you might get markdown. You ask for a number, you might get “approximately 42.” generateObject solves this by enforcing Zod schemas:
┌────────────────────────────────────────────────────────────────────────────┐
│ STRUCTURED OUTPUT FLOW │
├────────────────────────────────────────────────────────────────────────────┤
│ │
│ User Input Schema Definition Typed Output │
│ ────────── ───────────────── ──────────── │
│ │
│ "Spent $45.50 on ┌─────────────────────┐ { │
│ dinner with client │ z.object({ │ amount: 45.50, │
│ at Italian │ amount: z.number │ category: │
│ restaurant │ category: z.enum │ "dining", │
│ last Tuesday" │ vendor: z.string │ vendor: "Italian │
│ │ date: z.date() │ Restaurant", │
│ │ │ }) │ date: Date │
│ │ └──────────┬──────────┘ } │
│ │ │ ▲ │
│ │ │ │ │
│ └───────────────────────────┼───────────────────────┘ │
│ │ │
│ ┌──────┴──────┐ │
│ │ generateObject│ │
│ │ + LLM │ │
│ └─────────────┘ │
│ │
│ The LLM "sees" the schema and generates valid data. │
│ If validation fails, AI SDK throws AI_NoObjectGeneratedError. │
│ │
└────────────────────────────────────────────────────────────────────────────┘

Key Insight: Schema descriptions are prompt engineering. The LLM reads your schema including field descriptions to understand what you want. Better descriptions = better extraction.
const expenseSchema = z.object({
amount: z.number().describe('The monetary amount spent in dollars'),
category: z.enum(['dining', 'travel', 'office', 'entertainment'])
.describe('The expense category for accounting'),
vendor: z.string().describe('The business name where money was spent'),
date: z.date().describe('When the expense occurred')
});
const { object } = await generateObject({
model: openai('gpt-4'),
schema: expenseSchema,
prompt: 'Spent $45.50 on dinner with client at Italian restaurant last Tuesday'
});
// object is fully typed: { amount: number, category: "dining" | ..., ... }
3. Tools: AI That Takes Action
Text generation is passive—the AI talks, you listen. Tools make AI active—the AI can DO things.
┌────────────────────────────────────────────────────────────────────────────┐
│ TOOL CALLING FLOW │
├────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────────────────────────────────────────────────────────────┐ │
│ │ Tool Registry │ │
│ │ │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ getWeather │ │ searchWeb │ │ sendEmail │ │ │
│ │ │ │ │ │ │ │ │ │
│ │ │ description:│ │ description:│ │ description:│ │ │
│ │ │ "Get current│ │ "Search the │ │ "Send an │ │ │
│ │ │ weather │ │ web for │ │ email to │ │ │
│ │ │ for city" │ │ information│ │ a recipient│ │ │
│ │ │ │ │ " │ │ " │ │ │
│ │ │ input: │ │ input: │ │ input: │ │ │
│ │ │ {city} │ │ {query} │ │ {to,subj, │ │ │
│ │ │ │ │ │ │ body} │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ │ │
│ └──────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ │ LLM sees descriptions │
│ │ and chooses which to call │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────────┐ │
│ │ User: "What's the weather in Tokyo and email it to john@example.com" │ │
│ │ │ │
│ │ LLM Reasoning: │ │
│ │ 1. I need weather data → call getWeather({city: "Tokyo"}) │ │
│ │ 2. I need to send email → call sendEmail({to: "john@...", ...}) │ │
│ │ │ │
│ │ SDK executes tools, returns results to LLM │ │
│ │ LLM generates final response incorporating tool results │ │
│ └──────────────────────────────────────────────────────────────────────┘ │
│ │
└────────────────────────────────────────────────────────────────────────────┘

Key Insight: The LLM decides WHEN and WHICH tools to call based on your descriptions. You don’t control the flow—you define capabilities and let the LLM orchestrate.
const tools = {
getWeather: tool({
description: 'Get current weather for a city',
parameters: z.object({
city: z.string().describe('City name')
}),
execute: async ({ city }) => {
const response = await fetch(`https://api.weather.com/${city}`);
return response.json();
}
})
};
const { text, toolCalls } = await generateText({
model: openai('gpt-4'),
tools,
prompt: 'What is the weather in Tokyo?'
});
// LLM called getWeather, got result, and incorporated it into response
4. Agents: Autonomous AI
A tool call is a single action. An agent is an LLM in a loop, calling tools repeatedly until a task is complete.
┌────────────────────────────────────────────────────────────────────────────┐
│ AGENT LOOP ARCHITECTURE │
├────────────────────────────────────────────────────────────────────────────┤
│ │
│ User Goal: "Research quantum computing and write a summary" │
│ │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │ AGENT LOOP │ │
│ │ │ │
│ │ ┌─────────────────────────────────────────────────────────────┐ │ │
│ │ │ prepareStep: Inject accumulated context │ │ │
│ │ │ • "You have learned: [facts from previous steps]" │ │ │
│ │ │ • "Sources visited: [urls]" │ │ │
│ │ └──────────────────────────┬──────────────────────────────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌─────────────────────────────────────────────────────────────┐ │ │
│ │ │ LLM Decision: What should I do next? │ │ │
│ │ │ │ │ │
│ │ │ Step 1: "I need to search" → webSearch("quantum computing")│ │ │
│ │ │ Step 2: "I should read this" → readPage("nature.com/...") │ │ │
│ │ │ Step 3: "I found facts" → extractFacts(content) │ │ │
│ │ │ Step 4: "Need more info" → webSearch("quantum error...") │ │ │
│ │ │ ... │ │ │
│ │ │ Step N: "I have enough" → synthesize final answer │ │ │
│ │ └──────────────────────────┬──────────────────────────────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌─────────────────────────────────────────────────────────────┐ │ │
│ │ │ stopWhen: Check if agent should terminate │ │ │
│ │ │ • hasToolCall('synthesize') → true: STOP │ │ │
│ │ │ • stepCount > maxSteps → true: STOP │ │ │
│ │ │ • otherwise → false: CONTINUE LOOP │ │ │
│ │ └─────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ └────────────────────────────────────────────────────────────────────┘ │
│ │
│ Output: Complete research summary with citations │
│ │
└────────────────────────────────────────────────────────────────────────────┘

Key Insight: stopWhen and prepareStep are your control mechanisms. prepareStep injects state before each iteration; stopWhen decides when to stop. The agent is autonomous between these boundaries.
const { text, steps } = await generateText({
model: openai('gpt-4'),
tools: { search, readPage, synthesize },
stopWhen: hasToolCall('synthesize'), // Stop when synthesis tool is called
prepareStep: async ({ previousSteps }) => {
// Inject accumulated knowledge before each step
const facts = extractFacts(previousSteps);
return {
system: `You are a research agent. Facts learned so far: ${facts}`
};
},
prompt: 'Research quantum computing and write a summary'
});
5. Provider Abstraction: Write Once, Run Anywhere
Different LLM providers have different APIs, capabilities, and quirks. The AI SDK normalizes them:
┌────────────────────────────────────────────────────────────────────────────┐
│ PROVIDER ABSTRACTION │
├────────────────────────────────────────────────────────────────────────────┤
│ │
│ YOUR CODE (unchanged) │
│ ───────────────────── │
│ │
│ const result = await generateText({ │
│ model: provider('model-name'), ◄── Only this line changes │
│ prompt: 'Your prompt here' │
│ }); │
│ │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │ Provider Implementations │ │
│ │ │ │
│ │ openai('gpt-4') → OpenAI REST API │ │
│ │ anthropic('claude-3') → Anthropic Messages API │ │
│ │ google('gemini-pro') → Google Generative AI API │ │
│ │ mistral('mistral-large') → Mistral La Plateforme API │ │
│ │ ollama('llama2') → Local Ollama HTTP API │ │
│ │ │ │
│ │ Each provider handles: │ │
│ │ • Authentication (API keys, tokens) │ │
│ │ • Request format translation │ │
│ │ • Response normalization │ │
│ │ • Streaming protocol differences │ │
│ │ • Error mapping to AI SDK error types │ │
│ │ │ │
│ └────────────────────────────────────────────────────────────────────┘ │
│ │
│ USE CASE: Fallback chains, cost optimization, capability routing │
│ │
│ // Try Claude for reasoning, fall back to GPT-4 │
│ try { │
│ return await generateText({ model: anthropic('claude-3-opus') }); │
│ } catch { │
│ return await generateText({ model: openai('gpt-4') }); │
│ } │
│ │
└────────────────────────────────────────────────────────────────────────────┘
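The fallback snippet sketched inside the diagram looks like this as real code. A minimal sketch, assuming the official `@ai-sdk/openai` and `@ai-sdk/anthropic` provider packages and current model IDs (swap in whichever models you actually use):

```typescript
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';
import { anthropic } from '@ai-sdk/anthropic';

// Same call shape for every provider; only the model reference changes.
async function generateWithFallback(prompt: string) {
  try {
    return await generateText({ model: anthropic('claude-3-5-sonnet-20240620'), prompt });
  } catch (error) {
    console.warn('Anthropic call failed, falling back to OpenAI:', error);
    return await generateText({ model: openai('gpt-4o'), prompt });
  }
}
```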

6. Streaming Architecture: Server-Sent Events
Understanding HOW streaming works is crucial for building real-time AI interfaces:
┌────────────────────────────────────────────────────────────────────────────┐
│ STREAMING DATA FLOW │
├────────────────────────────────────────────────────────────────────────────┤
│ │
│ Browser Next.js API Route LLM Provider │
│ │ │ │ │
│ │── POST /api/chat ───────►│ │ │
│ │ │── streamText() ─────────────►│ │
│ │ │ │ │
│ │ │◄─ AsyncIterableStream ───────│ │
│ │ │ (yields token by token) │ │
│ │ │ │ │
│ │ ┌──────┴──────┐ │ │
│ │ │ toDataStream│ │ │
│ │ │ Response() │ │ │
│ │ └──────┬──────┘ │ │
│ │ │ │ │
│ │◄─ SSE: data: {"type":"text","value":"The"} ─────────────│ │
│ │◄─ SSE: data: {"type":"text","value":" quantum"} ────────│ │
│ │◄─ SSE: data: {"type":"text","value":" computer"} ───────│ │
│ │◄─ SSE: data: {"type":"finish"} ─────────────────────────│ │
│ │ │ │ │
│ ┌─┴─┐ │ │ │
│ │useChat hook │ │ │
│ │processes SSE │ │ │
│ │updates React state │ │ │
│ │triggers re-render │ │ │
│ └───┘ │ │ │
│ │
│ SSE Format: │
│ ─────────── │
│ event: message │
│ data: {"type":"text-delta","textDelta":"The"} │
│ │
│ data: {"type":"text-delta","textDelta":" answer"} │
│ │
│ data: {"type":"finish","finishReason":"stop"} │
│ │
└────────────────────────────────────────────────────────────────────────────┘
Key Insight: Server-Sent Events are unidirectional (server → client), simpler than WebSockets, and perfect for LLM streaming. The AI SDK handles all the serialization and React state management.
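To make the diagram concrete, here is a sketch of the server half: a Next.js App Router route that calls streamText and returns an SSE response. The helper name toDataStreamResponse matches AI SDK 4.x; other versions name this helper differently, so treat it as an assumption to check against your installed version:

```typescript
// app/api/chat/route.ts (Next.js App Router)
import { streamText } from 'ai';
import { openai } from '@ai-sdk/openai';

export async function POST(req: Request) {
  const { prompt } = await req.json();

  // Kick off the LLM call; tokens become available on result.textStream immediately
  const result = streamText({
    model: openai('gpt-4o-mini'),
    prompt,
  });

  // Wrap the token stream in an SSE-style response that useChat/useCompletion understand
  return result.toDataStreamResponse();
}
```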
Concept Summary Table
| Concept Cluster | What You Need to Internalize |
|---|---|
| Text Generation | generateText is blocking, streamText is real-time. Both are the foundation for all LLM interactions. |
| Structured Output | generateObject transforms unstructured text into typed, validated data. Zod schemas guide LLM output. Schema descriptions are prompt engineering. |
| Tool Calling | Tools are functions the LLM can invoke. The LLM decides WHEN and WHICH tool to call based on descriptions. You define capabilities; the LLM orchestrates. |
| Agent Loop | An agent is an LLM in a loop, calling tools until a task is complete. stopWhen and prepareStep are your control mechanisms. |
| Provider Abstraction | Switch between OpenAI, Anthropic, Google with one line. The SDK normalizes API differences, auth, streaming protocols. |
| Streaming Architecture | SSE transport, AsyncIterableStream, token-by-token delivery. React hooks (useChat, useCompletion) handle client-side state. |
| Error Handling | AI_NoObjectGeneratedError, provider failures, stream errors. Production AI needs graceful degradation and retry logic. |
| Telemetry | Track tokens, costs, latency per request. Essential for production AI systems and cost optimization. |
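For the Error Handling and Telemetry rows, the cheapest starting point is the usage metadata every generateText result carries. A minimal sketch; the exact property names (promptTokens vs inputTokens) vary between AI SDK versions, so verify against your version's types:

```typescript
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

const result = await generateText({
  model: openai('gpt-4o-mini'),
  prompt: 'Summarize the AI SDK in one sentence.',
});

// Log per-request token usage and completion status for cost tracking
console.log('usage:', result.usage);           // e.g. { promptTokens, completionTokens, totalTokens } in 4.x
console.log('finish reason:', result.finishReason);
```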
Deep Dive Reading By Concept
| Concept | Book Chapters & Resources |
|---|---|
| Text Generation | • “JavaScript: The Definitive Guide” by David Flanagan - Ch. 13 (Asynchronous JavaScript, Promises, async/await) • AI SDK generateText docs • AI SDK streamText docs |
| Structured Output | • “Programming TypeScript” by Boris Cherny - Ch. 3 (Types), Ch. 6 (Advanced Types) • AI SDK generateObject docs • Zod documentation - Schema validation patterns |
| Tool Calling | • “Building LLM Apps” by Harrison Chase (LangChain blog series) • AI SDK Tools and Tool Calling • How to build AI Agents with Vercel |
| Agent Loop | • “ReAct: Synergizing Reasoning and Acting” (Yao et al.) - The academic foundation • AI SDK Agents docs • “Artificial Intelligence: A Modern Approach” by Russell & Norvig - Ch. 2 (Intelligent Agents) |
| Provider Abstraction | • “Design Patterns” by Gang of Four - Adapter pattern • AI SDK Providers docs • “Designing Data-Intensive Applications” by Martin Kleppmann - Ch. 4 (Encoding and Evolution) |
| Streaming Architecture | • “JavaScript: The Definitive Guide” by David Flanagan - Ch. 13 (Async Iteration), Ch. 15.11 (Server-Sent Events) • “Node.js Design Patterns” by Mario Casciaro - Ch. 6 (Streams) • MDN Server-Sent Events • AI SDK UI hooks docs |
| Error Handling | • “Programming TypeScript” by Boris Cherny - Ch. 7 (Handling Errors) • “Release It!, 2nd Edition” by Michael Nygard - Ch. 5 (Stability Patterns) • AI SDK Error Handling docs |
| Telemetry | • AI SDK Telemetry docs • “Designing Data-Intensive Applications” by Martin Kleppmann - Ch. 1 (Reliability, Observability) • OpenTelemetry documentation for observability patterns |
Project 1: AI-Powered Expense Tracker CLI
- File: AI_SDK_LEARNING_PROJECTS.md
- Programming Language: TypeScript
- Coolness Level: Level 2: Practical but Forgettable
- Business Potential: Level 2: The “Micro-SaaS / Pro Tool”
- Difficulty: Level 1: Beginner
- Knowledge Area: Generative AI / CLI Tools
- Software or Tool: AI SDK / Zod
- Main Book: “Programming TypeScript” by Boris Cherny
What you’ll build: A command-line tool where you describe expenses in natural language (“Spent $45.50 on dinner with client at Italian restaurant”) and it extracts, categorizes, and stores structured expense records.
Why it teaches AI SDK: This forces you to understand generateObject and Zod schemas at their core. You’ll see how the LLM transforms unstructured human text into validated, typed data—the bread and butter of real AI applications.
Core challenges you’ll face:
- Designing Zod schemas that guide LLM output effectively (maps to structured output)
- Handling validation errors when the LLM produces invalid data (maps to error handling)
- Adding schema descriptions to improve extraction accuracy (maps to prompt engineering)
- Supporting multiple categories and edge cases (maps to schema design)
Key Concepts:
- Zod Schema Design: AI SDK Generating Structured Data Docs
- TypeScript Type Inference: “Programming TypeScript” by Boris Cherny - Ch. 3
- CLI Development: “Command-Line Rust” by Ken Youens-Clark (patterns apply to TS too)
Difficulty: Beginner
Time estimate: Weekend
Prerequisites: Basic TypeScript, npm/pnpm
Learning milestones:
- First `generateObject` call returns parsed expense → you understand schema-to-output mapping
- Adding descriptions to schema fields improves extraction → you grasp how LLMs consume schemas
- Handling `AI_NoObjectGeneratedError` gracefully → you understand AI SDK error patterns
Real World Outcome
When you run the CLI, here’s exactly what you’ll see in your terminal:
$ expense "Coffee with team $23.40 at Starbucks this morning"
✓ Expense recorded
┌─────────────────────────────────────────────────────────────────┐
│ EXPENSE RECORD │
├─────────────────────────────────────────────────────────────────┤
│ Amount: $23.40 │
│ Category: dining │
│ Vendor: Starbucks │
│ Date: 2025-12-22 │
│ Notes: Coffee with team │
├─────────────────────────────────────────────────────────────────┤
│ ID: exp_a7f3b2c1 │
│ Created: 2025-12-22T10:34:12Z │
└─────────────────────────────────────────────────────────────────┘
Saved to ~/.expenses/2025-12.json
Try more complex natural language inputs:
$ expense "Took an Uber from airport to hotel, $67.80, for the Chicago conference trip"
✓ Expense recorded
┌─────────────────────────────────────────────────────────────────┐
│ EXPENSE RECORD │
├─────────────────────────────────────────────────────────────────┤
│ Amount: $67.80 │
│ Category: travel │
│ Vendor: Uber │
│ Date: 2025-12-22 │
│ Notes: Airport to hotel, Chicago conference │
├─────────────────────────────────────────────────────────────────┤
│ ID: exp_b8e4c3d2 │
│ Created: 2025-12-22T10:35:45Z │
└─────────────────────────────────────────────────────────────────┘
Generate reports:
$ expense report --month 2025-12
┌─────────────────────────────────────────────────────────────────┐
│ EXPENSE REPORT: December 2025 │
├─────────────────────────────────────────────────────────────────┤
│ │
│ SUMMARY BY CATEGORY │
│ ─────────────────── │
│ dining │████████████████ │ $234.50 (12 expenses) │
│ travel │████████████ │ $567.80 (5 expenses) │
│ office │████ │ $89.20 (3 expenses) │
│ entertainment │██ │ $45.00 (2 expenses) │
│ ───────────────────────────────────────────────────────────── │
│ TOTAL $936.50 (22 expenses) │
│ │
└─────────────────────────────────────────────────────────────────┘
Exported to ~/.expenses/report-2025-12.csv
Handle errors gracefully:
$ expense "bought something"
⚠ Could not extract expense details
Missing information:
• Amount: No monetary value found
• Vendor: No vendor/merchant identified
Please include at least an amount, e.g.:
expense "bought lunch $15 at Chipotle"
The Core Question You’re Answering
“How do I transform messy, unstructured human text into clean, typed, validated data structures using AI?”
This is THE fundamental pattern of modern AI applications. Every chatbot that fills out forms, every assistant that creates calendar events, every tool that extracts data from documents—they all use this pattern. You describe something in plain English, and the AI SDK + LLM extracts structured data.
Before you write code, understand: generateObject is not just “LLM call with schema.” The schema itself is part of the prompt. The LLM sees your Zod schema including field names, types, and descriptions. Better schemas = better extraction.
Concepts You Must Understand First
Stop and research these before coding:
- Zod Schemas as LLM Instructions
  - What is a Zod schema and how does TypeScript infer types from it?
  - How does `generateObject` send the schema to the LLM?
  - Why do `.describe()` methods on schema fields improve extraction?
  - Reference: Zod documentation - Start here
- generateObject vs generateText
  - When would you use `generateText` vs `generateObject`?
  - What happens internally when you call `generateObject`?
  - What is `AI_NoObjectGeneratedError` and when does it occur?
  - Reference: AI SDK generateObject docs
- TypeScript Type Inference
  - How does `z.infer<typeof schema>` work?
  - Why is this important for type-safe AI applications?
  - Book Reference: “Programming TypeScript” by Boris Cherny - Ch. 3 (Types)
- Error Handling in AI Systems
  - What happens when the LLM generates data that doesn't match the schema?
  - How do you handle partial matches or missing fields?
  - What's the difference between validation errors and generation errors?
  - Book Reference: “Programming TypeScript” by Boris Cherny - Ch. 7 (Handling Errors)
- CLI Design Patterns
  - How do you parse command-line arguments in Node.js?
  - What makes a good CLI user experience?
  - Book Reference: “Command-Line Rust” by Ken Youens-Clark - Ch. 1-2 (patterns apply to TypeScript)
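A quick sketch of the type-inference point, using a stripped-down version of the expense schema from earlier:

```typescript
import { z } from 'zod';

const expenseSchema = z.object({
  amount: z.number().describe('The monetary amount spent in dollars'),
  vendor: z.string().describe('The business where the money was spent'),
});

// z.infer derives the TypeScript type from the schema, so the schema is the
// single source of truth for both runtime validation and compile-time types.
type Expense = z.infer<typeof expenseSchema>;
// type Expense = { amount: number; vendor: string }
```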
Questions to Guide Your Design
Before implementing, think through these:
- Schema Design
- What fields does an expense record need? (amount, category, vendor, date, notes?)
- What data types should each field be? (number, enum, string, Date?)
- Which fields are required vs optional?
- How do you handle ambiguous categories? (Is “Uber” travel or transportation?)
- Natural Language Parsing
- How many ways can someone describe “$45.50”? (“45.50”, “$45.50”, “forty-five fifty”, “about 45 bucks”)
- How do you handle relative dates? (“yesterday”, “last Tuesday”, “this morning”)
- What if the vendor is implied but not stated? (“got coffee” → Starbucks?)
- Storage and Persistence
- Where do you store expenses? (JSON file, SQLite, in-memory?)
- How do you organize by month/year for reporting?
- How do you handle concurrent writes?
- Error Recovery
- What do you do when extraction fails completely?
- How do you handle partial extraction (got amount but no vendor)?
- Should you prompt the user for missing information?
- CLI Interface
- What commands do you need? (`add`, `list`, `report`, `export`?)
- How do you handle interactive vs non-interactive modes?
- What output formats do you support? (JSON, table, CSV?)
Thinking Exercise
Before coding, design your schema on paper:
// Start with this skeleton and fill in the blanks:
const expenseSchema = z.object({
// What fields do you need?
// What types should they be?
// What descriptions will help the LLM understand what you want?
amount: z.number().describe('???'),
category: z.enum(['???']).describe('???'),
vendor: z.string().describe('???'),
date: z.string().describe('???'), // or z.date()?
notes: z.string().optional().describe('???'),
});
// Now trace through these inputs:
// 1. "Coffee $4.50 at Starbucks"
// 2. "Spent around 50 bucks on office supplies at Amazon yesterday"
// 3. "Uber to airport" ← No amount! What happens?
// 4. "Bought stuff" ← Very ambiguous! What happens?
Questions while tracing:
- Which inputs will extract cleanly?
- Which will cause validation errors?
- How would you modify your schema to handle more edge cases?
- What descriptions would help the LLM interpret “around 50 bucks”?
The Interview Questions They’ll Ask
Prepare to answer these:
- “What is the difference between generateText and generateObject?”
- `generateText` returns unstructured text. `generateObject` returns a typed object validated against a Zod schema. Use `generateObject` when you need structured, validated data.
- “How does Zod work with the AI SDK?”
- Zod schemas define the expected structure. The AI SDK serializes the schema (including descriptions) and sends it to the LLM. The LLM generates JSON matching the schema. The SDK validates the response and returns a typed object.
- “What happens if the LLM generates invalid data?”
- The SDK throws `AI_NoObjectGeneratedError`. You can catch this and retry, prompt for more information, or fall back gracefully.
- “How do schema descriptions affect LLM output quality?”
- Descriptions are essentially prompt engineering embedded in your type definitions. Clear descriptions with examples dramatically improve extraction accuracy.
- “How would you handle partial extraction?”
- Use optional fields (`.optional()`) for non-critical data. For required fields, catch the validation error and prompt the user for the missing information.
- “What are the tradeoffs of different expense categories?”
- `z.enum()` limits categories but ensures consistency. `z.string()` is flexible but may result in inconsistent categorization. A middle ground: use `z.enum()` with a catch-all “other” category.
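To illustrate the last two answers, one way to make the schema more forgiving is optional fields plus a catch-all category. A sketch (the field choices are an assumption, not the project's required schema):

```typescript
import { z } from 'zod';

// Optional vendor and an "other" category keep extraction from failing outright
const lenientExpenseSchema = z.object({
  amount: z.number()
    .describe('Amount in dollars; give a best numeric guess for loose phrasing like "around 50 bucks"'),
  vendor: z.string().optional()
    .describe('Merchant name, only if one was mentioned'),
  category: z.enum(['dining', 'travel', 'office', 'entertainment', 'other'])
    .describe('Pick "other" when no listed category clearly applies'),
});
```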
Hints in Layers
Hint 1: Basic Setup
Start with the simplest possible schema and a single command:
import { generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';
const expenseSchema = z.object({
amount: z.number(),
vendor: z.string(),
});
const { object } = await generateObject({
model: openai('gpt-4o-mini'),
schema: expenseSchema,
prompt: process.argv[2], // "Coffee $5 at Starbucks"
});
console.log(object);
Run it and see what you get. Does it work? What’s missing?
Hint 2: Add Descriptions
Descriptions dramatically improve extraction:
const expenseSchema = z.object({
amount: z.number()
.describe('The monetary amount spent in US dollars. Extract from phrases like "$45.50", "45 dollars", "about 50 bucks".'),
vendor: z.string()
.describe('The business or merchant name where the purchase was made.'),
category: z.enum(['dining', 'travel', 'office', 'entertainment', 'other'])
.describe('The expense category. Use "dining" for restaurants and coffee shops, "travel" for transportation and hotels.'),
});
Hint 3: Handle Errors
Wrap your call in try/catch:
import { NoObjectGeneratedError } from 'ai';

try {
  const { object } = await generateObject({ /* model, schema, prompt as above */ });
  console.log('✓ Expense recorded');
  console.log(object);
} catch (error) {
  // The exported class is NoObjectGeneratedError; the thrown error's name is "AI_NoObjectGeneratedError"
  if (NoObjectGeneratedError.isInstance(error)) {
    console.log('⚠ Could not extract expense details');
    console.log('Please include an amount and vendor.');
  } else {
    throw error;
  }
}
Hint 4: Add Persistence
Store expenses in a JSON file:
import { readFileSync, writeFileSync, existsSync } from 'fs';
import { randomUUID } from 'crypto';
import { z } from 'zod';

// Reuses expenseSchema from the earlier hints; stored records also get an id and timestamp
type Expense = z.infer<typeof expenseSchema> & { id?: string; createdAt?: string };

const EXPENSES_FILE = './expenses.json';

function loadExpenses(): Expense[] {
  if (!existsSync(EXPENSES_FILE)) return [];
  return JSON.parse(readFileSync(EXPENSES_FILE, 'utf-8'));
}

function saveExpense(expense: Expense) {
  const expenses = loadExpenses();
  expenses.push({ ...expense, id: randomUUID(), createdAt: new Date().toISOString() });
  writeFileSync(EXPENSES_FILE, JSON.stringify(expenses, null, 2));
}
Hint 5: Build the Report Command
Group expenses by category:
const expenses = loadExpenses();

// Object.groupBy requires Node 21+ / ES2024; use a reduce-based groupBy on older runtimes
const byCategory = Object.groupBy(expenses, (e) => e.category);

for (const [category, items] of Object.entries(byCategory)) {
  const total = (items ?? []).reduce((sum, e) => sum + e.amount, 0);
  console.log(`${category}: $${total.toFixed(2)} (${(items ?? []).length} expenses)`);
}
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| TypeScript fundamentals | “Programming TypeScript” by Boris Cherny | Ch. 3 (Types), Ch. 6 (Advanced Types) |
| Error handling patterns | “Programming TypeScript” by Boris Cherny | Ch. 7 (Handling Errors) |
| Zod and validation | Zod documentation | Entire guide |
| CLI design patterns | “Command-Line Rust” by Ken Youens-Clark | Ch. 1-2 (patterns apply to TS) |
| Async/await patterns | “JavaScript: The Definitive Guide” by David Flanagan | Ch. 13 (Asynchronous JavaScript) |
| AI SDK structured output | AI SDK Docs | Generating Structured Data |
Recommended reading order:
- Zod documentation (30 min) - Understand schema basics
- AI SDK generateObject docs (30 min) - Understand the API
- Boris Cherny Ch. 3 (1 hour) - Deep TypeScript types
- Then start coding!
Project 2: Real-Time Document Summarizer with Streaming UI
- File: AI_SDK_LEARNING_PROJECTS.md
- Main Programming Language: TypeScript
- Alternative Programming Languages: JavaScript, Python, Go
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: Level 2: The “Micro-SaaS / Pro Tool”
- Difficulty: Level 2: Intermediate (The Developer)
- Knowledge Area: Web Streaming, AI Integration
- Software or Tool: Next.js, AI SDK, React
- Main Book: “JavaScript: The Definitive Guide” by David Flanagan
What you’ll build: A web application where users paste long documents (articles, papers, transcripts) and watch summaries generate in real-time, character by character, with a progress indicator and section-by-section breakdown.
Why it teaches AI SDK: streamText is what makes AI apps feel alive. You’ll implement the streaming pipeline end-to-end: from the SDK’s async iterators through Server-Sent Events to React state updates. This is how ChatGPT-style UIs work.
Core challenges you’ll face:
- Implementing SSE streaming from Next.js API routes (maps to streaming architecture)
- Consuming streams on the client with proper cleanup (maps to async iteration)
- Handling partial updates and rendering in-progress text (maps to state management)
- Graceful error handling mid-stream (maps to error boundaries)
Resources for key challenges:
- “The AI SDK UI docs on useChat/useCompletion” - Shows the React hooks that handle streaming
- “MDN Server-Sent Events guide” - Foundation for understanding the transport layer
Key Concepts:
- Streaming Responses: AI SDK streamText Docs
- React Server Components: “Learning React, 2nd Edition” by Eve Porcello - Ch. 12
- Async Iterators: “JavaScript: The Definitive Guide” by David Flanagan - Ch. 13
Difficulty: Beginner-Intermediate
Time estimate: 1 week
Prerequisites: React/Next.js basics, TypeScript
Real world outcome:
- Paste a 5,000-word article and watch the summary stream in real-time
- See a “Summarizing…” indicator with word count progress
- Final output shows key points, main themes, and a one-paragraph summary
- Copy button to grab the summary for use elsewhere
Learning milestones:
- First stream renders tokens in real-time → you understand async iteration
- Implementing abort controller cancels mid-stream → you grasp cleanup patterns
- Adding streaming structured output with `streamObject` → you combine both patterns
Real World Outcome
When you open the web app in your browser, here’s exactly what you’ll see and experience:
Initial State:
┌─────────────────────────────────────────────────────────────────────┐
│ 📄 Document Summarizer │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Paste your document here: │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ Paste or type your document text... │ │
│ │ │ │
│ │ │ │
│ │ │ │
│ └────────────────────────────────────────────────────────────────┘ │
│ │
│ Document length: 0 words [✨ Summarize] │
│ │
└─────────────────────────────────────────────────────────────────────┘
After Pasting a Document (5,000+ words):
┌─────────────────────────────────────────────────────────────────────┐
│ 📄 Document Summarizer │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Paste your document here: │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ The field of quantum computing has seen remarkable progress │ │
│ │ over the past decade. Recent breakthroughs in error │ │
│ │ correction, qubit stability, and algorithmic development │ │
│ │ have brought us closer than ever to practical quantum │ │
│ │ advantage. This comprehensive analysis examines... │ │
│ │ [... 5,234 more words ...] │ │
│ └────────────────────────────────────────────────────────────────┘ │
│ │
│ Document length: 5,847 words [✨ Summarize] │
│ │
└─────────────────────────────────────────────────────────────────────┘
While Streaming (the magic happens!):
┌─────────────────────────────────────────────────────────────────────┐
│ 📄 Document Summarizer │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ 📝 Summary │
│ ───────────────────────────────────────────────────────────────── │
│ ⏳ Generating... Progress: 234 words │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ ## Key Points │ │
│ │ │ │
│ │ The article examines recent quantum computing breakthroughs, │ │
│ │ focusing on three critical areas: │ │
│ │ │ │
│ │ 1. **Error Correction**: IBM's new surface code approach │ │
│ │ achieves 99.5% fidelity, a significant improvement over │ │
│ │ previous methods. This breakthrough addresses one of the█ │ │
│ │ │ │
│ └────────────────────────────────────────────────────────────────┘ │
│ │
│ [⏹ Cancel] │
│ │
└─────────────────────────────────────────────────────────────────────┘
The cursor (█) moves in real-time as each token arrives from the LLM. The user watches the summary build word by word—this is the “ChatGPT effect” that makes AI feel alive.
Completed Summary:
┌─────────────────────────────────────────────────────────────────────┐
│ 📄 Document Summarizer │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ 📝 Summary ✓ Complete │
│ ───────────────────────────────────────────────────────────────── │
│ Generated in 4.2s Total: 312 words │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ ## Key Points │ │
│ │ │ │
│ │ The article examines recent quantum computing breakthroughs, │ │
│ │ focusing on three critical areas: │ │
│ │ │ │
│ │ 1. **Error Correction**: IBM's new surface code approach │ │
│ │ achieves 99.5% fidelity, a significant improvement... │ │
│ │ │ │
│ │ 2. **Qubit Scaling**: Google's 1,000-qubit processor │ │
│ │ demonstrates exponential progress in hardware capacity... │ │
│ │ │ │
│ │ 3. **Commercial Applications**: First production deployments │ │
│ │ in drug discovery and financial modeling show... │ │
│ │ │ │
│ │ ## Main Themes │ │
│ │ - Race between IBM, Google, and emerging startups │ │
│ │ - Shift from theoretical to practical quantum advantage │ │
│ │ - Growing investment from pharmaceutical and finance sectors │ │
│ │ │ │
│ │ ## One-Paragraph Summary │ │
│ │ Quantum computing is transitioning from experimental to │ │
│ │ practical, with major players achieving key milestones in │ │
│ │ error correction and scaling that enable real-world use cases. │ │
│ │ │ │
│ └────────────────────────────────────────────────────────────────┘ │
│ │
│ [📋 Copy to Clipboard] [🔄 Summarize Again] [📄 New Doc] │
│ │
└─────────────────────────────────────────────────────────────────────┘
Error State (mid-stream failure):
┌─────────────────────────────────────────────────────────────────────┐
│ 📄 Document Summarizer │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ 📝 Summary ⚠️ Error │
│ ───────────────────────────────────────────────────────────────── │
│ Stopped after 2.1s Partial: 156 words │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ ## Key Points │ │
│ │ │ │
│ │ The article examines recent quantum computing breakthroughs, │ │
│ │ focusing on three critical areas: │ │
│ │ │ │
│ │ 1. **Error Correction**: IBM's new surface code approach │ │
│ │ achieves 99.5% fidelity... │ │
│ │ │ │
│ │ ───────────────────────────────────────────────────────────── │ │
│ │ ⚠️ Stream interrupted: Connection timeout │ │
│ │ Showing partial results above. │ │
│ │ │ │
│ └────────────────────────────────────────────────────────────────┘ │
│ │
│ [🔄 Retry] [📋 Copy Partial] [📄 New Doc] │
│ │
└─────────────────────────────────────────────────────────────────────┘
Key UX behaviors to implement:
- The text area scrolls automatically to keep the cursor visible
- Word count updates in real-time as tokens arrive
- “Cancel” button appears only during streaming
- Partial results are preserved even on error
- Copy button works even during streaming (copies current content)
The Core Question You’re Answering
“How do I stream LLM responses in real-time to create responsive, interactive UIs?”
This is about understanding the entire streaming pipeline from the AI SDK’s async iterators through Server-Sent Events to React state updates. You’re not just calling an API—you’re building a real-time data flow that makes AI feel alive and responsive.
Concepts You Must Understand First
- Server-Sent Events (SSE) - The transport layer, how events flow from server to client over HTTP
- Async Iterators - The `for await...of` pattern, AsyncIterableStream in JavaScript
- React State with Streams - Updating state incrementally as chunks arrive without causing excessive re-renders
- AbortController - Cancellation patterns for stopping streams mid-flight
- Next.js API Routes - Server-side streaming setup with proper headers and response handling
Questions to Guide Your Design
- How do you send streaming responses from Next.js API routes?
- How do you consume Server-Sent Events on the client side?
- What happens if the user navigates away mid-stream? (Memory leaks, cleanup)
- How do you show a loading state vs partial content? (UX considerations)
- What do you do when the stream errors halfway through?
- How do you handle backpressure if the client can’t keep up with the stream?
Thinking Exercise
Draw a diagram of the data flow:
- User pastes text and clicks “Summarize”
- Client sends POST request to `/api/summarize` with document text
- API route calls `streamText()` from the AI SDK
- AI SDK returns an AsyncIterableStream
- Next.js converts this to Server-Sent Events (SSE) via `toDataStreamResponse()`
- Browser EventSource/fetch receives SSE chunks
- React hook (useChat/useCompletion) processes each chunk
- State updates trigger re-renders
- UI shows progressive text with cursor indicator
- Stream completes or user cancels with AbortController
Now trace what happens when:
- The network connection drops mid-stream
- The user clicks “Cancel”
- Two requests are made simultaneously
- The LLM returns an error after 50 tokens
The Interview Questions They’ll Ask
- “Explain the difference between WebSockets and Server-Sent Events”
- Expected answer: SSE is unidirectional (server → client), simpler, built on HTTP, auto-reconnects. WebSockets are bidirectional, require protocol upgrade, more complex but better for chat-like interactions.
- “How would you implement cancellation for a streaming request?”
- Expected answer: Use AbortController on the client, pass signal to fetch, clean up EventSource. On server, handle abort signals in the stream processing.
- “What happens if the stream errors mid-response?”
- Expected answer: Partial data is already rendered, need error boundary to catch and display error state, possibly implement retry logic, show user what was received + error message.
- “How do you handle back-pressure in streaming?”
- Expected answer: Browser EventSource buffers automatically, but you need to consider state update batching in React, potentially throttle/debounce updates, use React 18 transitions for non-urgent updates.
- “Why use Server-Sent Events instead of polling?”
- Expected answer: Lower latency, less server load, real-time updates, no missed messages between polls, built-in reconnection.
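Before reaching for the hooks below, it can help to see the mechanism they wrap with nothing but standard Web APIs: a fetch whose body is read chunk by chunk, cancellable through an AbortController. A sketch (the /api/summarize endpoint is this project's assumption):

```typescript
const controller = new AbortController();

async function streamSummary(documentText: string) {
  const res = await fetch('/api/summarize', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt: documentText }),
    signal: controller.signal, // controller.abort() cancels the request mid-stream
  });

  // Read the response body incrementally instead of waiting for it to finish
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    console.log(decoder.decode(value, { stream: true })); // raw SSE chunk text
  }
}

// e.g. wire a Cancel button to: controller.abort();
```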
Hints in Layers
Hint 1 (Basic Setup): Use the AI SDK’s toDataStreamResponse() helper to convert the stream into a format Next.js can send via SSE.
Hint 2 (Client Integration): The AI SDK provides useChat or useCompletion hooks that handle SSE consumption, state management, and cleanup automatically.
Hint 3 (Cancellation): Implement AbortController on the client side and pass the signal to your fetch request. The AI SDK hooks support this with the abort() function they return.
Hint 4 (Error Handling): Add React Error Boundaries around your streaming component, and handle errors in the onError callback of the AI SDK hooks. Consider showing partial results even when errors occur.
Hint 5 (Progress Tracking): The streamText response includes token counts and metadata. Use onFinish callback to track completion, and parse the streaming chunks to count words/tokens for progress indicators.
Hint 6 (Performance): Use React 18’s useTransition for non-urgent state updates to prevent janky UI. Consider useDeferredValue for the streaming text to keep the UI responsive.
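Putting Hints 1 through 4 together on the client, here is a sketch using the useCompletion hook. The import path and option names follow the AI SDK UI docs for recent releases ('ai/react', later '@ai-sdk/react'); verify them against your installed version:

```tsx
'use client';
import { useCompletion } from 'ai/react'; // '@ai-sdk/react' in newer releases

export function Summarizer() {
  const { completion, input, handleInputChange, handleSubmit, stop, isLoading, error } =
    useCompletion({
      api: '/api/summarize', // the streaming route from Hint 1
      onError: (err) => console.error('stream failed:', err),
    });

  return (
    <form onSubmit={handleSubmit}>
      <textarea value={input} onChange={handleInputChange} placeholder="Paste your document..." />
      <button type="submit" disabled={isLoading}>✨ Summarize</button>
      {isLoading && <button type="button" onClick={stop}>⏹ Cancel</button>}
      {/* completion grows token by token as SSE chunks arrive, so partial text survives errors */}
      <pre>{completion}</pre>
      {error && <p>⚠ {error.message} (showing partial results above)</p>}
    </form>
  );
}
```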
Books That Will Help
| Topic | Book | Chapter/Section |
|---|---|---|
| Async JavaScript & Iterators | “JavaScript: The Definitive Guide” by David Flanagan | Ch. 13 (Asynchronous JavaScript) |
| Server-Sent Events | “JavaScript: The Definitive Guide” by David Flanagan | Ch. 15.11 (Server-Sent Events) |
| React State Management | “Learning React, 2nd Edition” by Eve Porcello | Ch. 8 (Hooks), Ch. 12 (React and Server) |
| Streaming in Node.js | “Node.js Design Patterns, 3rd Edition” by Mario Casciaro | Ch. 6 (Streams) |
| Error Handling Patterns | “Release It!, 2nd Edition” by Michael Nygard | Ch. 5 (Stability Patterns) |
| Web APIs & Fetch | “JavaScript: The Definitive Guide” by David Flanagan | Ch. 15 (Web APIs) |
| React 18 Concurrent Features | “Learning React, 2nd Edition” by Eve Porcello | Ch. 8 (useTransition, useDeferredValue) |
Recommended reading order:
- Start with Flanagan Ch. 13 to understand async/await and async iterators
- Read Flanagan Ch. 15.11 for SSE fundamentals
- Move to Porcello Ch. 8 for React hooks patterns
- Then tackle the AI SDK documentation with this foundation
Online Resources:
- MDN Server-Sent Events
- AI SDK streamText Documentation
- AI SDK UI Hooks
- React 18 Working Group: useTransition
Project 3: Code Review Agent with Tool Calling
- File: AI_SDK_LEARNING_PROJECTS.md
- Main Programming Language: TypeScript
- Alternative Programming Languages: Python, Go, JavaScript
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: Level 2: The “Micro-SaaS / Pro Tool”
- Difficulty: Level 2: Intermediate (The Developer)
- Knowledge Area: AI Agents, Tool Calling
- Software or Tool: AI SDK, GitHub API, CLI
- Main Book: “Building LLM Agents” by Harrison Chase (LangChain blog series)
What you’ll build: A CLI agent that takes a GitHub PR URL or local diff, then autonomously reads files, analyzes code patterns, checks for issues, and generates a structured code review with specific line-by-line feedback.
Why it teaches AI SDK: This is your first real agent—an LLM in a loop calling tools. You’ll define tools for file reading, pattern searching, and issue tracking. The LLM decides which tools to call and when, not you. This is where AI SDK becomes powerful.
Core challenges you’ll face:
- Defining tool schemas that the LLM can understand and invoke correctly (maps to tool definition)
- Implementing the agent loop with `maxSteps` or `stopWhen` (maps to agent architecture)
- Managing context as tools return data back to the LLM (maps to conversation state)
- Handling tool execution failures gracefully (maps to error recovery)
Resources for key challenges:
- “AI SDK Agents documentation” - The canonical reference for agentic patterns
- “Building LLM Agents” by Harrison Chase (LangChain blog series) - Mental models for agent design
Key Concepts:
- Tool Definition: AI SDK Tools and Tool Calling
- Agent Loop: AI SDK Agents
- Git/GitHub API: GitHub REST API documentation for PR data
Difficulty: Intermediate
Time estimate: 1-2 weeks
Prerequisites: Completed Projects 1-2, Git basics
Learning milestones:
- LLM calls your `readFile` tool → you understand tool invocation flow
- Agent makes multiple tool calls in sequence → you grasp the agentic loop
- Using `onStepFinish` to log agent progress → you understand observability patterns
- Agent decides it's “done” autonomously → you understand completion conditions
Real World Outcome
When you run the CLI, here’s exactly what you’ll see in your terminal:
$ review-agent https://github.com/acme/widget-api/pull/47
🔍 Code Review Agent initialized
📋 Fetching PR #47 from acme/widget-api...
[Step 1] 🔧 Tool: fetchPRMetadata
Title: "Add user authentication middleware"
Files changed: 5
Additions: 234, Deletions: 12
[Step 2] 🔧 Tool: readFile("src/middleware/auth.ts")
Reading 89 lines...
[Step 3] 🔧 Tool: readFile("src/routes/users.ts")
Reading 156 lines...
[Step 4] 🔧 Tool: searchPattern("password", "*.ts")
Found 3 matches across 2 files
[Step 5] 🔧 Tool: readFile("src/utils/crypto.ts")
Reading 45 lines...
[Step 6] 🤔 Agent reasoning: "I've reviewed the authentication
implementation. I noticed potential security issues with
password handling and a missing rate limiter. Let me
generate the review."
[Step 7] 🔧 Tool: generateReview
═══════════════════════════════════════════════════════════════════════
CODE REVIEW: PR #47
"Add user authentication middleware"
═══════════════════════════════════════════════════════════════════════
## 🔴 Security Issues (2)
### 1. Plaintext Password Logging
**File:** `src/middleware/auth.ts:34`
**Severity:** Critical
```typescript
console.log(`Login attempt for user: ${email}, password: ${password}`);
                                                          ^^^^^^^^^^^
```
**Issue:** Password is logged in plaintext. This exposes credentials in logs.
**Fix:** Remove password from the log statement or mask it.

### 2. Missing Rate Limiting
**File:** `src/routes/users.ts:12-45`
**Severity:** High

The `/login` endpoint has no rate limiting, making it vulnerable to brute-force attacks.
**Recommended:** Add rate limiting middleware (e.g., express-rate-limit).

## 🟡 Code Quality (3)

### 1. Magic Number
**File:** `src/middleware/auth.ts:56`
`const TOKEN_EXPIRY = 86400; // What is this number?`
**Suggestion:** Use a named constant: `const ONE_DAY_IN_SECONDS = 86400;`

### 2. Missing Error Type
**File:** `src/utils/crypto.ts:23`
`} catch (e) { throw e; } // No type narrowing`
**Suggestion:** Use `catch (e: unknown)` and proper error handling.

### 3. Inconsistent Async Pattern
**File:** `src/routes/users.ts:78`
Mixing `.then()` and async/await. Prefer consistent async/await.

## 🟢 Suggestions (2)

### 1. Add Input Validation
**File:** `src/routes/users.ts:15`
Consider adding Zod schema validation for the login request body.

### 2. Extract JWT Secret
**File:** `src/middleware/auth.ts:8`
`JWT_SECRET` should come from environment variables, not hardcoded.

## Summary

| Category | Count |
|---|---|
| 🔴 Security Issues | 2 |
| 🟡 Code Quality | 3 |
| 🟢 Suggestions | 2 |

**Overall:** This PR introduces authentication but has critical security issues that must be addressed before merging.
**Recommendation:** Request changes
═══════════════════════════════════════════════════════════════════════
📁 Full review saved to: review-pr-47.md
🔗 Ready to post as PR comment? [y/N]
If the user confirms, the agent posts the review as a GitHub comment:
```bash
$ y
📤 Posting review to GitHub...
✓ Review posted: https://github.com/acme/widget-api/pull/47#issuecomment-1234567

Done! Agent completed in 12.3s (7 steps, 3 files analyzed)
```
The Core Question You’re Answering
“How do I build an AI that autonomously takes actions, not just generates text?”
This is the paradigm shift from AI as a “fancy autocomplete” to AI as an “autonomous agent.” You’re not just asking the LLM to write a review—you’re giving it tools to fetch PRs, read files, search patterns, and letting it decide what to do next.
The LLM is now in control of the flow. It chooses which files to read. It decides when it has enough information. It determines when to stop. Your job is to define the tools and constraints, then let the agent work.
Concepts You Must Understand First
Stop and research these before coding:
- Tool Definition with the AI SDK
  - What is the `tool()` function and how do you define a tool?
  - How does the LLM “see” your tool? (description + parameters schema)
  - What's the difference between `execute` and `generate` in tools?
  - Reference: AI SDK Tools and Tool Calling
- Agent Loop with stopWhen
  - What does `stopWhen` do in `generateText`?
  - How does the agent loop work internally?
  - What is `hasToolCall()` and how do you use it?
  - Reference: AI SDK Agents
- Context Management
  - How do tool results get fed back to the LLM?
  - What happens if the context gets too long?
  - How do you use `onStepFinish` for observability?
  - Reference: AI SDK Agent Events
- GitHub API Basics
  - How do you fetch PR metadata with the GitHub REST API?
  - How do you get the list of changed files in a PR?
  - How do you read file contents from a specific commit?
  - Reference: GitHub REST API - Pull Requests
- Error Handling in Agents
  - What happens if a tool fails mid-execution?
  - How do you implement retry logic for transient failures?
  - How do you handle LLM errors vs tool errors?
  - Book Reference: “Release It!, 2nd Edition” by Michael Nygard - Ch. 5
Questions to Guide Your Design
Before implementing, think through these:
- What tools does a code review agent need?
  - `fetchPRMetadata`: Get PR title, description, files changed
  - `readFile`: Read a specific file's contents
  - `searchPattern`: Search for patterns across files (like `grep`)
  - `getDiff`: Get the diff for a specific file
  - `generateReview`: Final tool that triggers review synthesis
- How does the agent know what to review?
  - Start with the list of changed files from the PR
  - Agent decides which files are important to read
  - Agent searches for patterns that indicate issues (e.g., “TODO”, “password”, “console.log”)
- How does the agent know when to stop?
  - Use `stopWhen: hasToolCall('generateReview')`
  - Agent calls `generateReview` when it has gathered enough information
  - Add `maxSteps` as a safety limit
- How do you structure the review output?
  - Use `generateObject` with a schema for the review
  - Categories: security issues, code quality, suggestions
  - Each issue has: file, line, description, severity, suggested fix
- How do you handle large PRs?
  - Limit the number of files to analyze
  - Summarize file contents if too long
  - Prioritize files by extension (`.ts` > `.md`)
Design your tools on paper before implementing:
// Define your tool schemas:
const tools = {
fetchPRMetadata: tool({
description: '???', // What should this say?
parameters: z.object({
prUrl: z.string().describe('???')
}),
execute: async ({ prUrl }) => {
// What does this return?
// { title, description, filesChanged, additions, deletions }
}
}),
readFile: tool({
description: '???',
parameters: z.object({
path: z.string().describe('???')
}),
execute: async ({ path }) => {
// Return file contents as string
}
}),
searchPattern: tool({
description: '???',
parameters: z.object({
pattern: z.string(),
glob: z.string().optional()
}),
execute: async ({ pattern, glob }) => {
// Return matches: [{ file, line, match }]
}
}),
generateReview: tool({
description: 'Generate the final code review. Call this when you have gathered enough information.',
parameters: z.object({
summary: z.string(),
issues: z.array(issueSchema),
recommendation: z.enum(['approve', 'request-changes', 'comment'])
}),
execute: async (review) => review // Just return the structured review
})
};
// Trace through a simple PR with 2 files changed:
// 1. What tool does the agent call first?
// 2. How does it decide which file to read?
// 3. When does it decide it has enough information?
// 4. What triggers the generateReview call?
The Interview Questions They’ll Ask
Prepare to answer these:
- “What is an AI agent and how is it different from a simple LLM call?”
- An agent is an LLM in a loop that can call tools. Unlike a single LLM call that just generates text, an agent can take actions (read files, make API calls) and iterate until a task is complete. The agent autonomously decides which actions to take.
- “How do you define a tool for the AI SDK?”
- Use the `tool()` function with a description (tells the LLM when to use it), a Zod parameters schema (defines the input), and an execute function (performs the action). The description is critical: it's prompt engineering for tool selection.
- “What is stopWhen and how does it work?”
- `stopWhen` is a condition that determines when the agent loop terminates. Common patterns: `hasToolCall('finalTool')` stops when a specific tool is called, or a custom function that checks step count or context.
- “How do you handle context growth in agents?”
- Use `prepareStep` to summarize or filter previous steps. Limit tool output size. Implement context windowing. For code review: only include relevant file snippets, not entire files.
- “What happens if a tool fails during agent execution?”
- The error is returned to the LLM as a tool result. The LLM can decide to retry, try a different approach, or handle the error gracefully. You can also implement retry logic in the tool’s execute function.
- “How would you test an AI agent?”
- Mock the LLM responses to test tool orchestration. Test tools in isolation. Use deterministic prompts for reproducible behavior. Log all steps for debugging. Implement integration tests with real LLM calls for end-to-end validation.
Hints in Layers
Hint 1: Start with a single tool
Get the agent loop working with just one tool:
import { generateText, tool } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';
const tools = {
readFile: tool({
description: 'Read a file from the repository',
parameters: z.object({
path: z.string().describe('Path to the file')
}),
execute: async ({ path }) => {
// For now, just return mock content
return `Contents of ${path}: // TODO: implement`;
}
})
};
const { text, steps } = await generateText({
model: openai('gpt-4'),
tools,
prompt: 'Read the file src/index.ts and tell me what it does.'
});
console.log('Steps:', steps.length);
console.log('Result:', text);
Run this and observe how the LLM calls your tool.
Hint 2: Add the agent loop with stopWhen
import { hasToolCall } from 'ai';
const tools = {
readFile: tool({ ... }),
generateSummary: tool({
description: 'Generate the final summary. Call this when done.',
parameters: z.object({
summary: z.string()
}),
execute: async ({ summary }) => summary
})
};
const { text, steps } = await generateText({
model: openai('gpt-4'),
tools,
stopWhen: hasToolCall('generateSummary'),
prompt: 'Read src/index.ts and src/utils.ts, then generate a summary.'
});
Hint 3: Add observability with onStepFinish
const { text, steps } = await generateText({
model: openai('gpt-4'),
tools,
stopWhen: hasToolCall('generateSummary'),
onStepFinish: ({ stepType, toolCalls }) => {
console.log(`[Step] Type: ${stepType}`);
for (const call of toolCalls || []) {
console.log(` Tool: ${call.toolName}(${JSON.stringify(call.args)})`);
}
},
prompt: 'Review the PR...'
});
Hint 4: Connect to real GitHub API
const fetchPRMetadata = tool({
description: 'Fetch metadata for a GitHub Pull Request',
parameters: z.object({
owner: z.string(),
repo: z.string(),
prNumber: z.number()
}),
execute: async ({ owner, repo, prNumber }) => {
const response = await fetch(
`https://api.github.com/repos/${owner}/${repo}/pulls/${prNumber}`,
{ headers: { Authorization: `token ${process.env.GITHUB_TOKEN}` } }
);
const pr = await response.json();
return {
title: pr.title,
body: pr.body,
changedFiles: pr.changed_files,
additions: pr.additions,
deletions: pr.deletions
};
}
});
Hint 5: Structure the review output
const reviewSchema = z.object({
securityIssues: z.array(z.object({
file: z.string(),
line: z.number(),
severity: z.enum(['critical', 'high', 'medium', 'low']),
description: z.string(),
suggestedFix: z.string()
})),
codeQuality: z.array(z.object({
file: z.string(),
line: z.number(),
description: z.string(),
suggestion: z.string()
})),
recommendation: z.enum(['approve', 'request-changes', 'comment']),
summary: z.string()
});
const generateReview = tool({
description: 'Generate the final structured code review',
parameters: reviewSchema,
execute: async (review) => review
});
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Agent mental models | “Artificial Intelligence: A Modern Approach” by Russell & Norvig | Ch. 2 (Intelligent Agents) |
| ReAct pattern | “ReAct: Synergizing Reasoning and Acting” (Yao et al.) | The academic paper |
| Error handling | “Release It!, 2nd Edition” by Michael Nygard | Ch. 5 (Stability Patterns) |
| Tool design | AI SDK Tools Docs | Entire section |
| Agent loops | AI SDK Agents Docs | stopWhen, prepareStep |
| TypeScript patterns | “Programming TypeScript” by Boris Cherny | Ch. 4 (Functions), Ch. 7 (Error Handling) |
| GitHub API | GitHub REST API Docs | Pull Requests, Contents |
| CLI development | “Command-Line Rust” by Ken Youens-Clark | Ch. 1-3 (patterns apply) |
Recommended reading order:
- AI SDK Tools and Tool Calling docs (30 min) - Understand tool definition
- AI SDK Agents docs (30 min) - Understand stopWhen and loop control
- Russell & Norvig Ch. 2 (1 hour) - Deep mental model for agents
- GitHub Pull Requests API (30 min) - Understand the data you’ll work with
- Then start coding!
Project 4: Multi-Provider Model Router
- File: AI_SDK_LEARNING_PROJECTS.md
- Main Programming Language: TypeScript
- Alternative Programming Languages: Python, Go, JavaScript
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: Level 3: The “Service & Support” Model
- Difficulty: Level 2: Intermediate (The Developer)
- Knowledge Area: API Gateway, AI Integration
- Software or Tool: AI SDK, OpenAI, Anthropic, Google AI
- Main Book: “Designing Data-Intensive Applications” by Martin Kleppmann
What you’ll build: A smart API gateway that accepts prompts and dynamically routes them to the optimal model (GPT-4 for reasoning, Claude for long context, Gemini for vision) based on task analysis, with fallback handling and cost tracking.
Why it teaches AI SDK: The SDK’s provider abstraction is its killer feature. You’ll implement a system that uses generateObject to classify tasks, then routes to different providers—all through the unified API. You’ll deeply understand how the SDK normalizes provider differences.
Core challenges you’ll face:
- Configuring multiple providers with their API keys and settings (maps to provider setup)
- Building a task classifier that determines optimal model (maps to structured output)
- Implementing fallback logic when primary provider fails (maps to error handling)
- Tracking token usage and costs across providers (maps to telemetry)
Key Concepts:
- Provider Configuration: AI SDK Providers
- Error Handling: AI SDK Error Handling
- Usage Tracking: AI SDK Telemetry
- API Gateway Patterns: “Designing Data-Intensive Applications” by Martin Kleppmann - Ch. 4
Difficulty: Intermediate Time estimate: 1-2 weeks Prerequisites: Multiple API keys (OpenAI, Anthropic, Google), completed Projects 1-3
Real world outcome:
- REST API endpoint that accepts { prompt, preferredCapability: "reasoning" | "vision" | "long-context" }
- Automatically selects the best model, falls back on failure
- Dashboard showing requests per provider, costs, latency, and success rates
- Cost savings visible when cheaper models handle simple tasks
Learning milestones:
- Swapping providers with one line change → you understand the abstraction value
- Fallback chain executes on provider error → you grasp resilience patterns
- Telemetry shows cost per request → you understand production observability
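To make the routing idea concrete before you start, here is a minimal sketch. The capability labels, model IDs, and fallback chains are illustrative assumptions; the SDK pieces used are generateObject for classification and generateText for the actual call.

```typescript
import { generateText, generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { anthropic } from '@ai-sdk/anthropic';
import { google } from '@ai-sdk/google';
import { z } from 'zod';

// 1. Classify the task with a small, cheap model (model choice is illustrative).
async function classifyTask(prompt: string) {
  const { object } = await generateObject({
    model: openai('gpt-4o-mini'),
    schema: z.object({
      capability: z.enum(['reasoning', 'vision', 'long-context', 'simple'])
    }),
    prompt: `Classify which capability this task needs:\n\n${prompt}`
  });
  return object.capability;
}

// 2. Map each capability to an ordered fallback chain (assumed mapping).
const modelChains = {
  reasoning: [openai('gpt-4'), anthropic('claude-3-5-sonnet-latest')],
  'long-context': [anthropic('claude-3-5-sonnet-latest'), google('gemini-1.5-pro')],
  vision: [google('gemini-1.5-pro'), openai('gpt-4o')],
  simple: [openai('gpt-4o-mini')]
};

// 3. Route the prompt, falling back to the next model on any error.
export async function route(prompt: string) {
  const capability = await classifyTask(prompt);
  let lastError: unknown;
  for (const model of modelChains[capability]) {
    try {
      const { text, usage } = await generateText({ model, prompt });
      return { text, usage, capability, modelId: model.modelId };
    } catch (error) {
      lastError = error; // try the next provider in the chain
    }
  }
  throw lastError;
}
```

The classification step is intentionally cheap: spending a fraction of a cent on a small model to pick the right expensive model is usually where the cost savings in the dashboard come from.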
Project 5: Autonomous Research Agent with Memory
- File: AI_SDK_LEARNING_PROJECTS.md
- Main Programming Language: TypeScript
- Alternative Programming Languages: Python, Go, JavaScript
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: Level 4: The “Open Core” Infrastructure
- Difficulty: Level 3: Advanced (The Engineer)
- Knowledge Area: AI Agents, Knowledge Graphs
- Software or Tool: AI SDK, Web Search APIs, Graph Databases
- Main Book: “Graph Algorithms the Fun Way” by Jeremy Kubica
What you’ll build: An agent that takes a research question, autonomously searches the web, reads pages, extracts facts, maintains a knowledge graph of discovered information, and synthesizes a final research report with citations.
Why it teaches AI SDK: This is a complex multi-tool agent with state management. You’ll implement tools for web search, page reading, fact extraction, and graph updates. The agent must decide when to search more vs. when to synthesize—real autonomous decision-making.
Core challenges you’ll face:
- Building tools that interact with external APIs (search, fetch) (maps to tool implementation)
- Maintaining state across agent iterations (knowledge graph) (maps to agent state)
- Using prepareStep to inject context before each iteration (maps to loop control)
- Implementing stopWhen for intelligent termination (maps to completion criteria)
Resources for key challenges:
- “AI SDK 5 Agent documentation” - The stopWhen and prepareStep APIs
- “ReAct: Synergizing Reasoning and Acting” (Yao et al.) - The academic foundation for tool-using agents
Key Concepts:
- Agent Loop Control: AI SDK Agents - Advanced
- State Management: “Fluent Python, 2nd Edition” by Luciano Ramalho - Ch. 22 (patterns apply)
- Knowledge Graphs: “Graph Algorithms the Fun Way” by Jeremy Kubica - Ch. 2-3
Difficulty: Advanced Time estimate: 2-3 weeks Prerequisites: Completed Projects 1-4
Real world outcome:
- Ask “What are the latest developments in quantum computing 2025?”
- Watch the agent search, read articles, extract facts, build connections
- Final output: 2-3 page research report with inline citations, confidence scores
- Knowledge graph visualization showing how facts connect
Learning milestones:
- Agent chains 5+ tool calls to gather information → you understand complex orchestration
- prepareStep injects accumulated knowledge → you grasp context management
- Agent decides “enough research, time to write” → you understand autonomous planning
- Structured citations extracted with generateObject → you combine all SDK patterns
Real World Outcome
Example command-line output, showing the agent’s steps:
$ research "What are the latest breakthroughs in quantum computing as of 2025?"
🔬 Research Agent initialized
📚 Knowledge graph: empty
[Step 1] 🔍 Tool: webSearch("quantum computing breakthroughs 2025")
→ Found 15 relevant results
[Step 2] 📖 Tool: readPage("https://nature.com/quantum-error-correction-2025")
→ Extracted 12 facts about error correction advances
[Step 3] 📝 Tool: addToGraph({
entity: "IBM Quantum",
relation: "achieved",
target: "1000+ qubit processor"
})
→ Knowledge graph: 12 nodes, 8 edges
[Step 4] 🔍 Tool: webSearch("Google quantum supremacy 2025")
→ Found 8 relevant results
...
[Step 12] 🤔 Agent reasoning: "I have gathered sufficient information
on error correction, qubit scaling, and commercial applications.
Time to synthesize the research report."
[Step 13] 📊 Tool: synthesizeReport()
═══════════════════════════════════════════════════════════════════════
RESEARCH REPORT: QUANTUM COMPUTING 2025
═══════════════════════════════════════════════════════════════════════
## Executive Summary
Quantum computing achieved several major milestones in 2025, with
breakthroughs in error correction, qubit scaling, and commercial...
## Key Findings
### 1. Error Correction (High Confidence: 0.92)
IBM and Google independently demonstrated...
### 2. Commercial Applications (Medium Confidence: 0.78)
First production use cases emerged in...
## Knowledge Graph Visualization
┌─────────────┐ achieved ┌────────────────────┐
│ IBM Quantum │ ────────────────► │ 1000+ qubit proc. │
└──────┬──────┘ └────────────────────┘
│
competes with
│
▼
┌──────────────┐ published ┌────────────────────┐
│ Google Quant │ ────────────────► │ Error correction │
└──────────────┘ │ breakthrough │
└────────────────────┘
## Sources
[1] Nature: "Quantum Error Correction Advances" (2025-03-15)
Confidence: 0.95
https://nature.com/quantum-error-correction-2025
[2] ArXiv: "Scaling Quantum Processors" (2025-06-22)
Confidence: 0.88
...
═══════════════════════════════════════════════════════════════════════
📁 Full report saved to: research_quantum_2025-12-22.md
📊 Knowledge graph exported to: knowledge_graph.json
The Core Question You’re Answering
“How do I build an agent that autonomously explores, learns, and synthesizes information?”
This is about understanding complex multi-tool agents with state management, autonomous decision-making, and knowledge accumulation. You’re not just calling tools—you’re building a system that thinks, learns, and decides when it knows enough.
Concepts You Must Understand First
- Multi-Tool Orchestration - Coordinating multiple tools with different purposes (search, read, extract, store)
- Agent State Management - Maintaining state (knowledge graph) across iterations
- prepareStep - Injecting accumulated context before each LLM call
- stopWhen - Intelligent termination conditions based on agent reasoning
- Knowledge Graphs - Representing and querying accumulated facts as entities and relationships
The research loop at a glance:
┌──────────────────────────────────────────────────────────────────┐
│ RESEARCH AGENT ARCHITECTURE │
├──────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ AGENT STATE │ │
│ │ ┌───────────────┐ ┌────────────────┐ ┌────────────┐ │ │
│ │ │ Knowledge │ │ Sources │ │ Confidence │ │ │
│ │ │ Graph │ │ Collected │ │ Scores │ │ │
│ │ └───────────────┘ └────────────────┘ └────────────┘ │ │
│ └─────────────────────────────────────────────────────────┘ │
│ ▲ │
│ │ prepareStep injects state │
│ │ │
│ ┌──────────────────────────┴───────────────────────────────┐ │
│ │ AGENT LOOP │ │
│ │ │ │
│ │ ┌──────┐ ┌─────────────────────────────────────┐ │ │
│ │ │ LLM │ ──►│ Tools: search, read, extract, graph │ │ │
│ │ └──▲───┘ └───────────────────┬─────────────────┘ │ │
│ │ │ │ │ │
│ │ └────────────────────────────┘ │ │
│ │ │ │
│ │ stopWhen: agent says "research complete" │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │
│ Output: Synthesized report + Knowledge graph + Citations │
└──────────────────────────────────────────────────────────────────┘
Questions to Guide Your Design
- What tools does a research agent need?
- webSearch: Find relevant sources on the web
- readPage: Extract content from URLs
- extractFacts: Parse content into structured facts with generateObject
- addToGraph: Store facts as knowledge graph nodes/edges
- queryGraph: Find related information already collected
- synthesizeReport: Generate final output with citations
- How do you represent the knowledge graph?
- Nodes: entities (people, organizations, concepts, technologies)
- Edges: relationships (achieved, published, competes with, enables)
- Metadata: confidence scores, source URLs, timestamps
- Consider: in-memory Map, SQLite with graph queries, or Neo4j
- How does the agent know when to stop searching and start writing?
- stopWhen condition: “I have sufficient information to answer the question”
- Agent reasons about coverage: multiple sources, key topics addressed, confidence threshold
- Step limit as safety: maxSteps to prevent infinite loops
- How do you assign confidence scores to facts?
- Source credibility: .edu/.gov = high, blogs = medium
- Corroboration: multiple sources = higher confidence
- Recency: newer sources = higher confidence for current events
- Extract confidence as part of the fact schema
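Those heuristics can be folded into one small scoring function. The domain lists and weights below are assumptions for illustration; tune them for your sources.

```typescript
// Illustrative confidence scoring: source credibility + corroboration + recency.
interface FactEvidence {
  sourceUrl: string;
  publishedAt?: Date;
  corroboratingSources: number; // how many other sources state the same fact
}

function scoreConfidence({ sourceUrl, publishedAt, corroboratingSources }: FactEvidence): number {
  const host = new URL(sourceUrl).hostname;

  // Base credibility by domain (assumed weights).
  let score = 0.5;
  if (host.endsWith('.edu') || host.endsWith('.gov')) score = 0.8;
  else if (host.endsWith('nature.com') || host.endsWith('arxiv.org')) score = 0.75;

  // Corroboration: each additional source adds a little, capped at +0.15.
  score += Math.min(corroboratingSources * 0.05, 0.15);

  // Recency: penalize sources older than a year for current-events questions.
  if (publishedAt) {
    const ageInDays = (Date.now() - publishedAt.getTime()) / 86_400_000;
    if (ageInDays > 365) score -= 0.1;
  }

  return Math.max(0, Math.min(1, score));
}
```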
Thinking Exercise
Design the knowledge graph data structure before implementing:
// What should your types look like?
interface KnowledgeNode {
id: string;
type: 'entity' | 'concept' | 'event';
name: string;
description: string;
sourceUrls: string[];
confidence: number;
}
interface KnowledgeEdge {
from: string; // node id
relation: string;
to: string; // node id
confidence: number;
sourceUrl: string;
}
// How will you query it?
// How will you update it?
// How will you serialize it for prepareStep?
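One possible answer to those three questions, assuming an in-memory store built on the interfaces above; the class and method names are hypothetical.

```typescript
// In-memory graph over the KnowledgeNode/KnowledgeEdge interfaces above.
class KnowledgeGraph {
  private nodes = new Map<string, KnowledgeNode>();
  private edges: KnowledgeEdge[] = [];

  addNode(node: KnowledgeNode) {
    this.nodes.set(node.id, node); // update: re-adding a node overwrites it
  }

  addEdge(edge: KnowledgeEdge) {
    this.edges.push(edge);
  }

  // Query: everything connected to a node, e.g. "what do we know about IBM Quantum?"
  neighbors(nodeId: string): KnowledgeEdge[] {
    return this.edges.filter((e) => e.from === nodeId || e.to === nodeId);
  }

  // Serialize for prepareStep: a compact text summary the LLM can read as context.
  toContext(): string {
    return this.edges
      .map((e) => {
        const from = this.nodes.get(e.from)?.name ?? e.from;
        const to = this.nodes.get(e.to)?.name ?? e.to;
        return `- ${from} ${e.relation} ${to} (confidence ${e.confidence.toFixed(2)})`;
      })
      .join('\n');
  }
}
```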
The Interview Questions They’ll Ask
- “How do you maintain state across agent iterations?”
- Answer: Use prepareStep to inject the serialized knowledge graph as context
- The LLM sees what it has already learned before deciding the next action
- State lives outside the agent loop, updated after each tool call
- “What is prepareStep and when would you use it?”
- Answer: prepareStep is a callback that runs before each agent iteration
- It lets you inject dynamic context (like accumulated knowledge)
- Use it when the agent needs to “remember” previous findings
- “How would you implement a research termination condition?”
- Answer: stopWhen with agent reasoning: “Do I have enough information?”
- Agent evaluates coverage of key topics, number of sources, confidence levels
- Fallback: maxSteps limit to prevent runaway loops
- “How do you handle conflicting information from different sources?”
- Answer: Track confidence scores, store multiple facts with different sources
- Flag conflicts in the knowledge graph (contradicts relationship)
- Let the synthesis tool weigh evidence and present both views
Hints in Layers
Hint 1: Start with search + readPage tools only
- Get the basic agent loop working: search → read → search → read
- Don’t worry about knowledge graphs yet
- Just accumulate raw text in an array
Hint 2: Add a simple in-memory fact store
- Define a Facts array with { fact: string, source: string }
- Add an extractFacts tool that uses generateObject
- Store facts in memory, no graph yet
Hint 3: Use prepareStep to inject accumulated facts
- Before each LLM call, serialize facts to text
- Inject as context: “So far you have learned: [facts]”
- Agent now “remembers” what it found
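Hints 2 and 3 combined in one sketch. The facts array and recordFact tool are illustrative; the prepareStep option is described in the Agents docs cited earlier, so check its exact signature and return shape for your SDK version.

```typescript
import { generateText, tool } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

// Hint 2: a simple in-memory fact store.
const facts: { fact: string; source: string }[] = [];

const recordFact = tool({
  description: 'Store a fact you learned, with its source URL',
  parameters: z.object({
    fact: z.string(),
    source: z.string()
  }),
  execute: async ({ fact, source }) => {
    facts.push({ fact, source });
    return `Stored. You now know ${facts.length} facts.`;
  }
});

// Hint 3: inject the accumulated facts before each step.
// NOTE: the system-prompt override returned here is an assumption;
// verify against the AI SDK Agents documentation.
const { text } = await generateText({
  model: openai('gpt-4'),
  tools: { recordFact /* plus webSearch, readPage, ... */ },
  prepareStep: async () => ({
    system:
      'You are a research agent.\nSo far you have learned:\n' +
      facts.map((f) => `- ${f.fact} (${f.source})`).join('\n')
  }),
  prompt: 'Research the latest developments in quantum computing.'
});
```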
Hint 4: Add synthesizeReport as the final tool
- When agent decides it’s done, it calls synthesizeReport
- This tool uses generateObject to structure the final report
- Include citations by matching facts to their source URLs
Hint 5: Upgrade to a real knowledge graph
- Replace Facts array with nodes and edges
- Add queryGraph tool so agent can search its own memory
- Visualize with ASCII or export to JSON for external tools
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Knowledge Graphs | “Graph Algorithms the Fun Way” by Jeremy Kubica | Ch. 2-3 (Graph representation) |
| Agent Patterns | “Building LLM Apps” by Harrison Chase | Agent loops, tool design |
| ReAct Pattern | “ReAct: Synergizing Reasoning and Acting” (paper) | The academic foundation |
| State Management | “Fluent Python, 2nd Edition” by Luciano Ramalho | Ch. 22 (patterns apply to TS) |
| Async Iteration | “JavaScript: The Definitive Guide” by David Flanagan | Ch. 13 (agent loop internals) |
| Web Scraping | “Web Scraping with Python” by Ryan Mitchell | Ch. 2-4 (readPage implementation) |
| Structured Output | “Programming TypeScript” by Boris Cherny | Ch. 3 (Zod schemas for facts) |
Project Comparison
| Project | Difficulty | Time | Depth of Understanding | Fun Factor |
|---|---|---|---|---|
| Expense Tracker CLI | Beginner | Weekend | ⭐⭐ | ⭐⭐⭐ |
| Streaming Summarizer | Beginner-Intermediate | 1 week | ⭐⭐⭐ | ⭐⭐⭐⭐ |
| Code Review Agent | Intermediate | 1-2 weeks | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Model Router | Intermediate | 1-2 weeks | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| Research Agent | Advanced | 2-3 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
Recommendation
Based on learning the AI SDK deeply, I recommend this progression:
- Start with Project 1 (Expense Tracker) - Gets you comfortable with the core API patterns in a low-risk CLI environment. You’ll have something working in a weekend.
- Move to Project 2 (Streaming Summarizer) - Adds the streaming dimension and web UI integration. This is where AI apps become fun.
- Tackle Project 3 (Code Review Agent) - This is the inflection point where you go from “using AI” to “building AI systems.” Tool calling changes everything.
- Projects 4-5 based on your interests - Model Router if you’re building production systems; Research Agent if you want to push agent capabilities.
Final Overall Project: Personal AI Command Center
What you’ll build: A unified personal AI assistant hub with multiple specialized agents (research agent, code helper, email manager, calendar assistant) that can be invoked via CLI, web UI, or API. Each agent has its own tools and state, but they can collaborate and share context through a central orchestration layer.
Why it teaches everything: This is the synthesis project. You’ll use:
- generateText/streamText for real-time interactions
- generateObject for structured task routing and data extraction
- Tools for each agent’s specific capabilities
- Agent loops for autonomous task completion
- Provider abstraction to route different tasks to optimal models
- Telemetry for usage tracking and debugging
- Streaming UI for interactive web interface
Core challenges you’ll face:
- Designing an agent orchestration layer that routes to specialized agents (maps to architecture)
- Implementing shared context/memory across agents (maps to state management; see the sketch after this list)
- Building a unified tool registry that agents can discover (maps to tool design)
- Creating a streaming web UI with multiple concurrent agent conversations (maps to real-time systems)
- Implementing cost controls and rate limiting across providers (maps to production concerns)
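A minimal sketch of that shared context store, assuming a plain in-memory class; the entry shape and method names are illustrative, not an SDK API.

```typescript
// Shared context that specialized agents read from and write to.
interface ContextEntry {
  agent: string;                          // which agent produced this entry
  kind: 'fact' | 'draft' | 'preference';  // coarse type used for filtering
  content: string;
  createdAt: Date;
}

class SharedContext {
  private entries: ContextEntry[] = [];

  add(entry: Omit<ContextEntry, 'createdAt'>) {
    this.entries.push({ ...entry, createdAt: new Date() });
  }

  // Agents pull only what is relevant to them, keeping each prompt small.
  forAgent(kinds: ContextEntry['kind'][]): string {
    return this.entries
      .filter((e) => kinds.includes(e.kind))
      .map((e) => `[${e.agent}] ${e.content}`)
      .join('\n');
  }
}

// Usage: the research agent stores facts, the email agent reads them back.
const shared = new SharedContext();
shared.add({ agent: 'research', kind: 'fact', content: 'IBM shipped a 1000+ qubit processor.' });
const emailContext = shared.forAgent(['fact', 'preference']);
```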
Key Concepts:
- Multi-Agent Architecture: AI SDK 6 Agent Abstraction docs
- Event-Driven Architecture: “Designing Data-Intensive Applications” by Martin Kleppmann - Ch. 11
- React Concurrent Features: “Learning React, 2nd Edition” by Eve Porcello - Ch. 8
- API Design: “Design and Build Great Web APIs” by Mike Amundsen - Ch. 3-5
Difficulty: Advanced Time estimate: 1 month+ Prerequisites: All previous projects
Real world outcome:
- Web dashboard showing all your agents and their status
- Natural language command: “Research quantum computing, then draft an email to my team summarizing it”
- Watch agents collaborate: research agent gathers info → email agent drafts message
- CLI access: ai research "topic", ai email draft "context"
- API endpoint for integration with other tools
- Usage dashboard showing costs, requests, model usage by agent
Learning milestones:
- Single agent works end-to-end → you’ve internalized the agent pattern
- Two agents share context successfully → you understand inter-agent communication
- Web UI streams multiple agent responses → you’ve mastered concurrent streaming
- Cost tracking shows optimization opportunities → you think about production AI systems
- Someone else can use your command center → you’ve built a real product
Real World Outcome
Here is what the web dashboard and CLI look like:
┌─────────────────────────────────────────────────────────────────────────────┐
│ 🤖 Personal AI Command Center [Dashboard] │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ACTIVE AGENTS RECENT ACTIVITY │
│ ───────────── ─────────────── │
│ 🔬 Research Agent [Idle] 10:34 Drafted email to team │
│ 📧 Email Agent [Processing...] 10:32 Research completed │
│ 📅 Calendar Agent [Idle] 10:28 Scheduled meeting │
│ 💻 Code Helper [Idle] 10:15 Reviewed PR #234 │
│ │
│ ───────────────────────────────────────────────────────────────────────── │
│ │
│ CURRENT TASK: Drafting email summary of quantum research │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ 📧 Email Agent streaming... │ │
│ │ │ │
│ │ Subject: Quantum Computing Research Summary │ │
│ │ │ │
│ │ Hi Team, │ │
│ │ │ │
│ │ I wanted to share some exciting findings from my research on█ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ───────────────────────────────────────────────────────────────────────── │
│ │
│ COST TRACKING (This Month) │
│ ─────────────────────────── │
│ Total: $23.45 │
│ ├── Research Agent: $12.30 (Claude Opus) │
│ ├── Email Agent: $5.20 (GPT-4) │
│ ├── Calendar Agent: $2.15 (GPT-3.5) │
│ └── Code Helper: $3.80 (Claude Sonnet) │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
CLI access:
$ ai "Research quantum computing, then draft an email to my team summarizing it"
🤖 Orchestrator analyzing task...
📋 Execution plan:
1. Research Agent → gather quantum computing info
2. Email Agent → draft summary email
[Research Agent] 🔬 Starting research...
[Research Agent] ✓ Completed (12 facts gathered)
[Email Agent] 📧 Drafting email...
[Email Agent] ✓ Draft ready
Would you like me to send this email? [y/N]
The Core Question You’re Answering
“How do I build a system where multiple specialized agents collaborate to complete complex tasks?”
This is the synthesis of everything you’ve learned…
Concepts You Must Understand First
- Multi-Agent Orchestration - Coordinating multiple agents
- Agent-to-Agent Communication - Sharing context between agents
- Task Decomposition - Breaking complex tasks into agent subtasks
- Unified Tool Registry - Agents discovering and using shared tools
- Streaming with Multiple Agents - Concurrent streaming in web UI
- Cost Management - Tracking and controlling costs across agents
The architecture at a glance:
┌─────────────────────────────────────────────────────────────────────────┐
│ AI COMMAND CENTER ARCHITECTURE │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ User Input: "Research X, then email summary to team" │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ ORCHESTRATION LAYER │ │
│ │ │ │
│ │ Task Decomposition → Agent Selection → Execution Plan │ │
│ └──────────────────────────────┬──────────────────────────────────┘ │
│ │ │
│ ┌───────────────────────┼───────────────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Research │ │ Email │ │ Calendar │ │
│ │ Agent │ │ Agent │ │ Agent │ │
│ │ │ │ │ │ │ │
│ │ Tools: │ │ Tools: │ │ Tools: │ │
│ │ - search │ │ - compose │ │ - schedule │ │
│ │ - read │ │ - send │ │ - check │ │
│ │ - extract │ │ - list │ │ - invite │ │
│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
│ │ │ │ │
│ └──────────────────────┴──────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ SHARED CONTEXT STORE │ │
│ │ │ │
│ │ Accumulated knowledge, user preferences, conversation history │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ PROVIDER ABSTRACTION │ │
│ │ OpenAI │ Anthropic │ Google │ Local Models │ │
│ └─────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘


Questions to Guide Your Design
- How do agents communicate with each other?
- How do you handle agent failures in a chain?
- How do you stream multiple agent outputs to the UI?
- How do you implement cost controls per agent?
Thinking Exercise
Design the orchestration layer before implementing - how does it decompose tasks and select agents?
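As a starting point for that exercise, here is one way to turn a command into a typed execution plan with generateObject. The agent names and plan shape are assumptions for illustration, not a prescribed design.

```typescript
import { generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

// The orchestrator first turns the user's command into a typed plan.
const planSchema = z.object({
  steps: z.array(
    z.object({
      agent: z.enum(['research', 'email', 'calendar', 'code']),
      instruction: z.string(),
      dependsOnPreviousStep: z.boolean()
    })
  )
});

export async function planTask(command: string) {
  const { object: plan } = await generateObject({
    model: openai('gpt-4'),
    schema: planSchema,
    prompt: `Break this command into steps for the available agents:\n\n"${command}"`
  });
  return plan;
}

// "Research quantum computing, then draft an email to my team summarizing it"
// should yield a research step followed by an email step that depends on it.
```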
The Interview Questions They’ll Ask
- “How would you design a multi-agent system?”
- “How do you handle context sharing between agents?”
- “What’s your strategy for cost control in production AI?”
- “How would you test a multi-agent system?”
Hints in Layers
- Hint 1: Start with one agent end-to-end
- Hint 2: Add a simple orchestrator that routes to agents
- Hint 3: Implement shared context store
- Hint 4: Add concurrent streaming to the web UI
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Event-Driven Architecture | “Designing Data-Intensive Applications” | Ch. 11 |
| Multi-Agent Systems | “Artificial Intelligence: A Modern Approach” | Ch. 2 |
| API Design | “Design and Build Great Web APIs” | Ch. 3-5 |
| React Patterns | “Learning React” by Eve Porcello | Ch. 8, 12 |