Project 3: Code Review Agent with Tool Calling
Deep Dive Guide | AI SDK Learning Projects | Estimated Time: 1-2 weeks
Table of Contents
- Learning Objectives
- Deep Theoretical Foundation
- Complete Project Specification
- Real World Outcome
- Solution Architecture
- Phased Implementation Guide
- Testing Strategy
- Common Pitfalls and Debugging
- Extensions and Challenges
- Resources
- Self-Assessment Checklist
Learning Objectives
By completing this project, you will master:
- Agent Architecture: Understanding how LLM agents differ from simple LLM calls, including the perception-action loop, tool invocation, and autonomous decision-making
- Tool Definition: Creating well-designed tools using the AI SDK's tool() function with proper descriptions that guide LLM behavior
- Agent Loop Control: Implementing stopWhen, maxSteps, and onStepFinish to control and observe agent execution
- Context Management: Handling growing conversation context as tools return data, preventing context overflow
- The ReAct Pattern: Implementing the Reasoning + Acting paradigm where the LLM reasons about its next step before taking action
- External API Integration: Connecting agents to real-world APIs (GitHub) for practical utility
- Error Recovery: Building resilient agents that gracefully handle tool failures and API errors
Deep Theoretical Foundation
What is an AI Agent?
An AI agent is fundamentally different from a simple LLM call. While a single LLM call is like asking a question and receiving an answer, an agent is like having an assistant who can actually do things in the world.
Russell & Norvigâs Definition (from âArtificial Intelligence: A Modern Approachâ, Ch. 2):
âAn agent is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators.â
For LLM agents, this translates to:
- Sensors: The information provided to the LLM (prompts, tool results, context)
- Actuators: The tools the agent can invoke (API calls, file operations, calculations)
- Environment: The external world (GitHub, file systems, databases)
THE AGENT PARADIGM
+-----------------+ +-----------------+
| Environment | | LLM Agent |
| | | |
| - GitHub API |<--------| Actuators: |
| - File System | Actions | - fetchPR() |
| - Databases | | - readFile() |
| | | - searchCode() |
| |-------->| |
| | Percepts| Sensors: |
| | | - Tool Results |
| | | - Error Msgs |
+-----------------+ +-----------------+
|
v
+-----------------+
| Decision Logic |
| (The LLM) |
| |
| "What should I |
| do next?" |
+-----------------+
The Agent Loop: Perception-Decision-Action
Every agent operates in a continuous loop until it achieves its goal or reaches a termination condition:
THE AGENT LOOP
+------------------------------------------+
| |
v |
+----------+ +----------+ +----------+ |
| PERCEIVE |---->| DECIDE |---->| ACT |--+
+----------+ +----------+ +----------+
| | |
| | |
Read context LLM reasons Execute tool
from tools about next and capture
and history action result
Detailed Flow:
Step 1: PERCEIVE
+--------------------------------------------------+
| Context includes: |
| - Original user prompt |
| - System instructions |
| - Previous tool calls and their results |
| - Any errors from previous steps |
+--------------------------------------------------+
|
v
Step 2: DECIDE
+--------------------------------------------------+
| LLM evaluates: |
| - "Do I have enough information to complete?" |
| - "Which tool should I call next?" |
| - "What parameters should I pass?" |
| - "Should I stop and respond to the user?" |
+--------------------------------------------------+
|
v
Step 3: ACT
+--------------------------------------------------+
| Options: |
| A) Call a tool with specific parameters |
| B) Generate final text response (terminate) |
| C) Call special "done" tool (terminate) |
+--------------------------------------------------+
|
v
Loop back to PERCEIVE
(unless terminated)
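Before turning to the AI SDK's abstractions, it can help to see this loop as plain code. The sketch below is a conceptual, hand-rolled version of the perceive-decide-act cycle; the callModel function and Decision type are hypothetical stand-ins for a real LLM call, not the SDK's actual implementation:
// Hypothetical shapes standing in for a real LLM call and its tool-call output.
type Decision =
  | { kind: 'tool'; name: string; args: unknown }
  | { kind: 'final'; text: string };

type ToolFn = (args: unknown) => Promise<unknown>;

declare function callModel(messages: string[]): Promise<Decision>; // assumption: provided elsewhere

// A minimal perceive-decide-act loop. Each iteration feeds the growing
// message history (perception) to the model, which either picks a tool
// (action) or produces a final answer (termination).
async function runAgent(
  prompt: string,
  tools: Record<string, ToolFn>,
  maxSteps = 10
): Promise<string> {
  const messages: string[] = [prompt];            // PERCEIVE: context starts with the user prompt
  for (let step = 0; step < maxSteps; step++) {
    const decision = await callModel(messages);   // DECIDE: the LLM reasons over the full history
    if (decision.kind === 'final') {
      return decision.text;                       // terminate: LLM chose to respond
    }
    const result = await tools[decision.name](decision.args);  // ACT: execute the chosen tool
    messages.push(`Tool ${decision.name} returned: ${JSON.stringify(result)}`); // feed result back
  }
  return 'Stopped: maxSteps reached';             // safety limit
}
generateText with tools does essentially this for you, plus message formatting, tool-call parsing, and the stopWhen/maxSteps controls covered below.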
The tool() Function: How LLMs Invoke Tools
The tool() function is how you give capabilities to your agent. It has three critical components:
- Description: Tells the LLM when and why to use this tool
- Parameters Schema: Defines what the tool accepts (using Zod)
- Execute Function: The code that runs when the tool is called
import { tool } from 'ai';
import { z } from 'zod';
const readFile = tool({
// DESCRIPTION: This is prompt engineering!
// The LLM reads this to decide when to call the tool
description: 'Read the contents of a file from the repository. ' +
'Use this when you need to examine source code. ' +
'Returns the full file contents as a string.',
// PARAMETERS: The LLM generates these values
parameters: z.object({
path: z.string().describe('Path to the file, e.g., "src/index.ts"')
}),
// EXECUTE: Your code that runs when the tool is called
execute: async ({ path }) => {
const content = await fs.readFile(path, 'utf-8');
return content;
}
});
How the LLM "Sees" Your Tool:
When you define tools, the AI SDK converts them to a schema that the LLM understands:
YOUR CODE WHAT THE LLM SEES
const readFile = tool({ {
description: 'Read...', "name": "readFile",
parameters: z.object({ "description": "Read the contents
path: z.string() of a file from the repository.
.describe('...') Use this when you need to
}), examine source code...",
execute: async ({ path }) => { "parameters": {
... "type": "object",
} "properties": {
}); "path": {
"type": "string",
"description": "Path..."
}
},
"required": ["path"]
}
}
Tool Descriptions as Prompt Engineering:
The description is the most important part of a tool definition. It's literally prompt engineering that guides the LLM's tool selection:
POOR DESCRIPTION GOOD DESCRIPTION
"Read file" "Read the contents of a source code
file from the repository. Use this
tool when you need to examine
implementation details, understand
code structure, or find specific
patterns. Returns the full file
as a string. For large files
(>500 lines), consider using
searchPattern first to locate
specific areas of interest."
Problems: Benefits:
- LLM doesn't know when - Clear use case
to use it - Explains return value
- No guidance on purpose - Suggests alternatives
- No context about - Helps LLM make decisions
return value
How stopWhen and maxSteps Work
The agent loop needs termination conditions. The AI SDK provides two mechanisms:
stopWhen: A function that examines each step and returns true when the loop should end.
import { generateText, hasToolCall } from 'ai';
const result = await generateText({
model: openai('gpt-4'),
tools: { readFile, searchPattern, generateReview },
// Stop when the agent calls generateReview
stopWhen: hasToolCall('generateReview'),
prompt: 'Review this PR...'
});
How stopWhen Works Internally:
Agent Loop with stopWhen
+---------+ +---------+ +---------+
| Step 1 |---->| Step 2 |---->| Step 3 |
| readFile| | search | |generateR|
+---------+ +---------+ +---------+
| | |
v v v
+----------+ +----------+ +----------+
| stopWhen | | stopWhen | | stopWhen |
| returns | | returns | | returns |
| false | | false | | TRUE! |
+----------+ +----------+ +----------+
| | |
v v v
Continue Continue STOP
|
v
Return result
maxSteps: A safety limit preventing infinite loops.
const result = await generateText({
model: openai('gpt-4'),
tools,
maxSteps: 10, // Absolute maximum iterations
stopWhen: hasToolCall('generateReview'),
prompt: 'Review this PR...'
});
Combined Flow:
For each step:
1. Check: steps >= maxSteps?
|
+-- Yes --> STOP (safety limit)
|
+-- No --> Continue
2. LLM generates response
3. Check: stopWhen(step) === true?
|
+-- Yes --> STOP (goal reached)
|
+-- No --> Execute tools, continue loop
Context Management and Conversation State
As the agent works, context grows. Each tool call adds to the conversation history:
CONTEXT GROWTH ACROSS ITERATIONS
Step 1:
+---------------------------+
| System: "You are a code |
| reviewer..." |
| User: "Review PR #47" |
| Assistant: Call readFile |
| Tool Result: [89 lines] | <-- +89 lines added
+---------------------------+
~200 tokens
Step 2:
+---------------------------+
| [Previous context] |
| Assistant: Call readFile |
| (another file) |
| Tool Result: [156 lines] | <-- +156 lines added
+---------------------------+
~600 tokens
Step 3:
+---------------------------+
| [Previous context] |
| Assistant: Call search |
| Tool Result: [3 matches] | <-- +30 lines added
+---------------------------+
~800 tokens
...
Step N:
+---------------------------+
| [All previous context] |
| |
| RISK: Context exceeds |
| model's context window! |
+---------------------------+
~128,000 tokens?
Managing Context with prepareStep:
The AI SDK allows you to preprocess context before each step:
const result = await generateText({
model: openai('gpt-4'),
tools,
prepareStep: async ({ steps }) => {
// Summarize old steps to reduce context
if (steps.length > 5) {
return {
steps: [
// Keep only summary of old steps
summarizeSteps(steps.slice(0, -2)),
// Keep last 2 steps in full
...steps.slice(-2)
]
};
}
return { steps };
},
prompt: 'Review this PR...'
});
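The summarizeSteps helper above is left undefined. One possible sketch, assuming each step exposes text and toolResults fields as in the example (the exact step shape depends on your SDK version):
// Hypothetical step shape matching the example above; adjust to your SDK version.
interface AgentStep {
  text?: string;
  toolResults?: Array<{ toolName: string; result: unknown }>;
}

// Collapse several old steps into one compact synthetic step so the
// context window holds a short summary instead of full tool outputs.
function summarizeSteps(steps: AgentStep[]): AgentStep {
  const lines = steps.map((step, i) => {
    const tools = (step.toolResults ?? [])
      .map(r => `${r.toolName} -> ${JSON.stringify(r.result).slice(0, 100)}`) // truncate large results
      .join('; ');
    return `Step ${i + 1}: ${step.text?.slice(0, 80) ?? ''} ${tools}`.trim();
  });
  return { text: `Summary of earlier steps:\n${lines.join('\n')}` };
}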
The ReAct Pattern: Reasoning + Acting
ReAct (from the paper "ReAct: Synergizing Reasoning and Acting in Language Models" by Yao et al.) is a paradigm where the LLM explicitly reasons about its actions:
TRADITIONAL AGENT REACT AGENT
User: Review this PR User: Review this PR
| |
v v
Call readFile Thought: "I should first
| read the changed files
v to understand what was
Call searchPattern modified."
| |
v v
Return result Action: readFile
|
v
Observation: [file contents]
|
v
Thought: "I see password
handling. Let me search
for security patterns."
|
v
Action: searchPattern
|
v
... continues with
explicit reasoning
ReAct in AI SDK:
You can encourage ReAct behavior through your system prompt:
const systemPrompt = `
You are a code review agent. For each action you take:
1. THOUGHT: Explain your reasoning for the next action
2. ACTION: Call the appropriate tool
3. OBSERVATION: Analyze the tool result
4. Repeat until you have enough information
Always explain your thinking before acting.
`;
const result = await generateText({
model: openai('gpt-4'),
system: systemPrompt,
tools,
stopWhen: hasToolCall('generateReview'),
prompt: 'Review PR #47'
});
This produces agent traces like:
[Step 1] Thought: "I need to start by fetching the PR metadata
to understand what files were changed."
Action: fetchPRMetadata({prUrl: "..."})
[Step 2] Observation: "5 files changed, mostly in src/auth/"
Thought: "Authentication changes are security-sensitive.
I should read the main auth file first."
Action: readFile({path: "src/auth/middleware.ts"})
[Step 3] Observation: "I see password handling on line 34"
Thought: "Password handling is critical. Let me search
for any console.log statements that might leak credentials."
Action: searchPattern({pattern: "console.log.*password"})
Complete Project Specification
Overview
Build a CLI tool that functions as an autonomous code review agent. Given a GitHub Pull Request URL or local git diff, the agent will:
- Fetch PR metadata and list of changed files
- Autonomously decide which files to read and analyze
- Search for common code issues and security patterns
- Generate a structured code review with line-specific feedback
- Optionally post the review as a GitHub comment
Functional Requirements
| Requirement | Description | Priority |
|---|---|---|
| PR Input | Accept GitHub PR URL (e.g., github.com/org/repo/pull/123) | P0 |
| Local Diff | Accept local git diff as alternative input | P1 |
| File Reading | Read individual files from the PR's head commit | P0 |
| Pattern Search | Search codebase for specific patterns (security, code smells) | P0 |
| Review Generation | Produce structured review with categories and line numbers | P0 |
| GitHub Integration | Post review as PR comment | P1 |
| Progress Logging | Show agent's reasoning and tool calls in real-time | P0 |
| Rate Limiting | Handle GitHub API rate limits gracefully | P1 |
| Large PR Handling | Gracefully handle PRs with many changed files | P1 |
Non-Functional Requirements
| Requirement | Target | Rationale |
|---|---|---|
| Latency | < 30s for typical PR (5-10 files) | User experience |
| Token Efficiency | < 50K tokens per review | Cost control |
| Reliability | Graceful degradation on API failures | Production readiness |
| Observability | Full trace of agent decisions | Debugging |
Tool Definitions
The agent requires these core tools:
+-------------------+----------------------------------------+
| Tool | Purpose |
+-------------------+----------------------------------------+
| fetchPRMetadata | Get PR title, description, file list |
| getChangedFiles | List all files modified in the PR |
| readFile | Read contents of a specific file |
| getDiff | Get unified diff for a file |
| searchPattern | Grep-like search across files |
| generateReview | Produce final structured review |
+-------------------+----------------------------------------+
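The generateReview tool's parameters define the structure the agent must produce. A minimal Zod schema sketch, with field names that are illustrative but chosen to match the sample output below and the recommendation values used later in testing:
// src/schemas/review.ts (sketch; field names are illustrative)
import { z } from 'zod';

export const issueSchema = z.object({
  title: z.string().describe('Short name of the issue'),
  file: z.string().describe('File path, e.g., "src/middleware/auth.ts"'),
  line: z.number().optional().describe('Line number, if applicable'),
  severity: z.enum(['critical', 'high', 'medium', 'low']),
  description: z.string().describe('What the problem is and why it matters'),
  suggestion: z.string().describe('How to fix it')
});

export const reviewSchema = z.object({
  summary: z.string().describe('One-paragraph overview of the PR'),
  securityIssues: z.array(issueSchema),
  codeQuality: z.array(issueSchema),
  suggestions: z.array(issueSchema),
  recommendation: z.enum(['approve', 'request-changes', 'comment'])
});

// Inferred TypeScript type, handy for formatter and GitHub client code.
export type Review = z.infer<typeof reviewSchema>;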
Real World Outcome
When you run the CLI, here's exactly what you'll see in your terminal:
$ review-agent https://github.com/acme/widget-api/pull/47
Code Review Agent initialized
Fetching PR #47 from acme/widget-api...
[Step 1] Tool: fetchPRMetadata
Title: "Add user authentication middleware"
Files changed: 5
Additions: 234, Deletions: 12
[Step 2] Tool: readFile("src/middleware/auth.ts")
Reading 89 lines...
[Step 3] Tool: readFile("src/routes/users.ts")
Reading 156 lines...
[Step 4] Tool: searchPattern("password", "*.ts")
Found 3 matches across 2 files
[Step 5] Tool: readFile("src/utils/crypto.ts")
Reading 45 lines...
[Step 6] Agent reasoning: "I've reviewed the authentication
implementation. I noticed potential security issues with
password handling and a missing rate limiter. Let me
generate the review."
[Step 7] Tool: generateReview
=====================================================================
CODE REVIEW: PR #47
"Add user authentication middleware"
=====================================================================
## Security Issues (2)
### 1. Plaintext Password Logging
**File:** `src/middleware/auth.ts:34`
**Severity:** Critical
console.log(`Login attempt for user: ${email}, password: ${password}`);
                                                          ^^^^^^^^^^^
Issue: Password is logged in plaintext. This exposes credentials in logs.
Fix: Remove the password from the log statement or mask it.
### 2. Missing Rate Limiting
**File:** `src/routes/users.ts:12-45`
**Severity:** High
The /login endpoint has no rate limiting, making it vulnerable to
brute-force attacks.
Recommended: Add rate limiting middleware (e.g., express-rate-limit).
## Code Quality (3)
### 1. Magic Number
**File:** `src/middleware/auth.ts:56`
const TOKEN_EXPIRY = 86400; // What is this number?
Suggestion: Use a named constant: const ONE_DAY_IN_SECONDS = 86400;
### 2. Missing Error Type
**File:** `src/utils/crypto.ts:23`
} catch (e) {
  throw e; // No type narrowing
}
Suggestion: Use catch (e: unknown) and proper error handling.
### 3. Inconsistent Async Pattern
**File:** `src/routes/users.ts:78`
Mixing .then() and async/await. Prefer consistent async/await.
## Suggestions (2)
### 1. Add Input Validation
**File:** `src/routes/users.ts:15`
Consider adding Zod schema validation for the login request body.
### 2. Extract JWT Secret
**File:** `src/middleware/auth.ts:8`
JWT_SECRET should come from environment variables, not hardcoded.
## Summary
| Category | Count |
|---|---|
| Security Issues | 2 |
| Code Quality | 3 |
| Suggestions | 2 |
Overall: This PR introduces authentication but has critical security issues that must be addressed before merging.
Recommendation: Request changes
=====================================================================
Full review saved to: review-pr-47.md
Ready to post as PR comment? [y/N]
If the user confirms, the agent posts the review as a GitHub comment:
y
Posting review to GitHub...
Review posted: https://github.com/acme/widget-api/pull/47#issuecomment-1234567
Done! Agent completed in 12.3s (7 steps, 3 files analyzed)
Solution Architecture
High-Level Architecture
CODE REVIEW AGENT ARCHITECTURE
+------------------------------------------------------------------+
| CLI INTERFACE |
| $ review-agent <pr-url> [--output=file] [--post] |
+------------------------------------------------------------------+
|
v
+------------------------------------------------------------------+
| INPUT PARSER |
| - Extract owner, repo, PR number from URL |
| - Validate GitHub token |
+------------------------------------------------------------------+
|
v
+------------------------------------------------------------------+
| AGENT ORCHESTRATOR |
| |
| +------------------+ +------------------+ |
| | System Prompt | | User Message | |
| | "You are a code | | "Review PR #47 | |
| | reviewer..." | | from repo X" | |
| +------------------+ +------------------+ |
| | | |
| +----------+----------+ |
| | |
| v |
| +--------------------------------------------------+ |
| | generateText() | |
| | | |
| | model: openai('gpt-4') | |
| | tools: { fetchPR, readFile, search, review } | |
| | stopWhen: hasToolCall('generateReview') | |
| | maxSteps: 15 | |
| | onStepFinish: logProgress | |
| +--------------------------------------------------+ |
+------------------------------------------------------------------+
|
v
+------------------------------------------------------------------+
| TOOL REGISTRY |
| |
| +------------+ +------------+ +------------+ +------------+ |
| | fetchPR | | readFile | | search | | getDiff | |
| | Metadata | | | | Pattern | | | |
| +------------+ +------------+ +------------+ +------------+ |
| | | | | |
| +---------------+---------------+---------------+ |
| | |
| v |
| +------------------+ |
| | generateReview | |
| | (Terminal Tool) | |
| +------------------+ |
+------------------------------------------------------------------+
|
v
+------------------------------------------------------------------+
| EXTERNAL SERVICES |
| |
| +------------------+ +------------------+ |
| | GitHub API | | OpenAI API | |
| | | | | |
| | - /pulls/:id | | - chat/complete | |
| | - /contents/:path| | with tools | |
| | - /comments | | | |
| +------------------+ +------------------+ |
+------------------------------------------------------------------+
Tool Registry Design
TOOL REGISTRY PATTERN
+------------------------------------------------------------------+
| tools/index.ts |
| |
| export const tools = { |
| fetchPRMetadata, // from ./github.ts |
| getChangedFiles, // from ./github.ts |
| readFile, // from ./files.ts |
| getDiff, // from ./files.ts |
| searchPattern, // from ./search.ts |
| generateReview, // from ./review.ts (terminal) |
| }; |
+------------------------------------------------------------------+
|
|
+-------------------------+-------------------------+
| | |
v v v
+-----------+ +-----------+ +-----------+
|github.ts | | files.ts | | search.ts |
| | | | | |
|fetchPR | |readFile | |searchPat |
|Metadata() | |() | |tern() |
| | | | | |
|getChanged | |getDiff() | | |
|Files() | | | | |
+-----------+ +-----------+ +-----------+
| | |
v v v
+-----------+ +-----------+ +-----------+
| GitHubAPI | | Node fs | |child_proc |
| (Octokit) | | promises | |exec('rg') |
+-----------+ +-----------+ +-----------+
Agent Loop Internals
AGENT LOOP DETAILED FLOW
generateText() called
|
v
+------+------+
| Initialize |
| context |
| messages=[] |
+-------------+
|
v
+------+------+
| step = 0 |<----------------------------------+
+-------------+ |
| |
v |
+------+------+ |
| step++ | |
+-------------+ |
| |
v |
+------+------+ |
| step > |--Yes--> Return { text, steps } |
| maxSteps? | |
+-------------+ |
|No |
v |
+------+------+ |
| LLM call | |
| with tools | |
| & context | |
+-------------+ |
| |
v |
+------+------+ |
| Response | |
| has tool |--No--> Return { text, steps } |
| calls? | (LLM chose to respond) |
+-------------+ |
|Yes |
v |
+------+------+ |
| stopWhen |--Yes--> Return { text, steps } |
| (toolCall)? | (Goal reached) |
+-------------+ |
|No |
v |
+------+------+ |
| Execute | |
| tool(s) | |
+-------------+ |
| |
v |
+------+------+ |
| onStep | |
| Finish() | |
+-------------+ |
| |
v |
+------+------+ |
| Append to | |
| context: | |
| - assistant | |
| message | |
| - tool |-----------------------------------+
| results |
+-------------+
GitHub API Integration Points
GITHUB API INTEGRATION
PR URL: github.com/acme/widget-api/pull/47
|
v
+------------------------------------------+
| URL PARSER |
| owner = "acme" |
| repo = "widget-api" |
| prNumber = 47 |
+------------------------------------------+
|
+--------------------+--------------------+
| | |
v v v
GET /repos/{owner}/{repo}/pulls/{prNumber}
+------------------------------------------+
| Response: |
| { |
| title: "Add auth middleware", |
| body: "...", |
| head: { sha: "abc123" }, |
| base: { sha: "def456" }, |
| changed_files: 5, |
| additions: 234, |
| deletions: 12 |
| } |
+------------------------------------------+
GET /repos/{owner}/{repo}/pulls/{prNumber}/files
+------------------------------------------+
| Response: |
| [ |
| { |
| filename: "src/auth.ts", |
| status: "modified", |
| additions: 89, |
| deletions: 4, |
| patch: "@@ -10,4 +10,89 @@..." |
| }, |
| ... |
| ] |
+------------------------------------------+
GET /repos/{owner}/{repo}/contents/{path}?ref={sha}
+------------------------------------------+
| Response: |
| { |
| content: "base64-encoded-content", |
| encoding: "base64" |
| } |
+------------------------------------------+
POST /repos/{owner}/{repo}/issues/{prNumber}/comments
+------------------------------------------+
| Request Body: |
| { |
| body: "## Code Review\n\n..." |
| } |
+------------------------------------------+
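The agent's readFile tool reads files from the PR's head commit via the contents endpoint shown above. A sketch using Octokit (error handling kept minimal; assumes the path points at a regular file, not a directory):
// src/tools/files.ts (sketch)
import { Octokit } from '@octokit/rest';
import { tool } from 'ai';
import { z } from 'zod';

const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });

export const readFile = tool({
  description: 'Read the contents of a file from the PR head commit. ' +
               'Use this to examine implementation details of changed files.',
  parameters: z.object({
    owner: z.string(),
    repo: z.string(),
    path: z.string().describe('Path within the repository'),
    ref: z.string().describe('Commit SHA, typically the PR head SHA')
  }),
  execute: async ({ owner, repo, path, ref }) => {
    const { data } = await octokit.repos.getContent({ owner, repo, path, ref });
    // The contents API returns base64 for files; directories return an array.
    if (Array.isArray(data) || !('content' in data)) {
      return `ERROR: ${path} is not a regular file`;
    }
    return Buffer.from(data.content, 'base64').toString('utf-8');
  }
});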
Recommended File Structure
review-agent/
|-- package.json
|-- tsconfig.json
|-- .env # GITHUB_TOKEN, OPENAI_API_KEY
|-- .env.example
|
|-- src/
| |-- index.ts # CLI entry point
| |-- agent.ts # Main agent orchestration
| |
| |-- tools/
| | |-- index.ts # Tool registry
| | |-- github.ts # fetchPRMetadata, getChangedFiles
| | |-- files.ts # readFile, getDiff
| | |-- search.ts # searchPattern
| | |-- review.ts # generateReview (terminal tool)
| |
| |-- lib/
| | |-- github-client.ts # Octokit wrapper
| | |-- url-parser.ts # Parse PR URLs
| | |-- formatter.ts # Format review output
| |
| |-- schemas/
| | |-- review.ts # Zod schemas for review structure
| | |-- issue.ts # Schema for individual issues
| |
| |-- prompts/
| | |-- system.ts # System prompt for agent
|
|-- tests/
| |-- unit/
| | |-- tools/
| | | |-- github.test.ts
| | | |-- files.test.ts
| | | |-- search.test.ts
| |
| |-- integration/
| | |-- agent.test.ts # Full agent tests with mocked LLM
| |
| |-- fixtures/
| |-- sample-pr.json # Sample PR metadata
| |-- sample-files/ # Sample files to review
|
|-- README.md
Phased Implementation Guide
Phase 1: Foundation (Days 1-2)
Goal: Get a minimal agent loop working with one tool.
Milestone: Agent calls readFile and returns file contents.
// src/index.ts - Minimal viable agent
import { generateText, tool } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';
import * as fs from 'fs/promises';
const readFile = tool({
description: 'Read a file from the local filesystem',
parameters: z.object({
path: z.string().describe('Path to the file')
}),
execute: async ({ path }) => {
try {
return await fs.readFile(path, 'utf-8');
} catch (error) {
return `Error reading file: ${error}`;
}
}
});
async function main() {
const { text, steps } = await generateText({
model: openai('gpt-4'),
tools: { readFile },
prompt: 'Read the file package.json and tell me the project name'
});
console.log('Steps taken:', steps.length);
console.log('Result:', text);
}
main();
Checklist:
- Project initialized with TypeScript
- AI SDK and OpenAI provider installed
- Single tool defined and working
- Agent successfully calls tool and uses result
Phase 2: Tool Suite (Days 3-5)
Goal: Implement all tools needed for code review.
Milestone: Agent can fetch PR metadata, read files, and search patterns.
Tasks:
- Set up GitHub API client (Octokit)
- Implement fetchPRMetadata tool
- Implement getChangedFiles tool
- Implement searchPattern tool (see the sketch after the code below)
- Implement getDiff tool
- Add URL parser for PR URLs (see the sketch after the checklist)
// src/tools/github.ts
import { Octokit } from '@octokit/rest';
import { tool } from 'ai';
import { z } from 'zod';
const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });
export const fetchPRMetadata = tool({
description: 'Fetch metadata for a GitHub Pull Request including title, ' +
'description, and statistics. Use this first to understand ' +
'what the PR is about.',
parameters: z.object({
owner: z.string().describe('Repository owner (user or org)'),
repo: z.string().describe('Repository name'),
prNumber: z.number().describe('Pull request number')
}),
execute: async ({ owner, repo, prNumber }) => {
const { data } = await octokit.pulls.get({ owner, repo, pull_number: prNumber });
return {
title: data.title,
body: data.body,
changedFiles: data.changed_files,
additions: data.additions,
deletions: data.deletions,
headSha: data.head.sha,
baseSha: data.base.sha
};
}
});
export const getChangedFiles = tool({
description: 'Get the list of files changed in a Pull Request. ' +
'Returns filenames, status (added/modified/deleted), ' +
'and line change counts.',
parameters: z.object({
owner: z.string(),
repo: z.string(),
prNumber: z.number()
}),
execute: async ({ owner, repo, prNumber }) => {
const { data } = await octokit.pulls.listFiles({
owner,
repo,
pull_number: prNumber
});
return data.map(f => ({
filename: f.filename,
status: f.status,
additions: f.additions,
deletions: f.deletions
}));
}
});
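The searchPattern tool (from ./search.ts) is routed through ripgrep in the tool registry diagram. A minimal sketch, assuming ripgrep (rg) is installed and the PR's files are checked out locally:
// src/tools/search.ts (sketch; assumes ripgrep is installed and the repo is checked out locally)
import { tool } from 'ai';
import { z } from 'zod';
import { execFile } from 'child_process';
import { promisify } from 'util';

const exec = promisify(execFile);

export const searchPattern = tool({
  description: 'Search the repository for a regex pattern. ' +
               'Use this to find security-sensitive code (e.g., logged secrets) or repeated ' +
               'code smells. Returns matching lines with file names and line numbers.',
  parameters: z.object({
    pattern: z.string().describe('Regex pattern, e.g., "console.log.*password"'),
    glob: z.string().default('*.ts').describe('File glob to limit the search')
  }),
  execute: async ({ pattern, glob }) => {
    try {
      // -n: line numbers, -g: glob filter, --max-count: cap matches per file
      const { stdout } = await exec('rg', ['-n', '--max-count', '20', '-g', glob, pattern]);
      return stdout || 'No matches found';
    } catch {
      // rg exits with a non-zero code when there are no matches; report that as data
      return 'No matches found';
    }
  }
});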
Checklist:
- GitHub client configured with authentication
- All 5 tools implemented and tested individually
- URL parser extracts owner/repo/PR from various URL formats
- Error handling for API failures
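The URL parser (lib/url-parser.ts) only needs to pull owner, repo, and PR number out of the common URL form; a minimal sketch:
// src/lib/url-parser.ts (sketch)
export interface PRRef {
  owner: string;
  repo: string;
  prNumber: number;
}

// Accepts forms like https://github.com/acme/widget-api/pull/47
// (with or without protocol or trailing path segments).
export function parsePRUrl(url: string): PRRef {
  const match = url.match(/github\.com\/([^/]+)\/([^/]+)\/pull\/(\d+)/);
  if (!match) {
    throw new Error(`Could not parse PR URL: ${url}`);
  }
  return { owner: match[1], repo: match[2], prNumber: Number(match[3]) };
}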
Phase 3: Agent Loop (Days 6-8)
Goal: Wire tools into agent with proper termination.
Milestone: Agent autonomously reviews a PR and produces structured output.
Tasks:
- Create system prompt for code review (see the sketch after the checklist below)
- Implement generateReview terminal tool
- Configure stopWhen with hasToolCall
- Add onStepFinish for progress logging
- Add maxSteps safety limit
// src/agent.ts
import { generateText, hasToolCall } from 'ai';
import { openai } from '@ai-sdk/openai';
import { tools } from './tools';
import { systemPrompt } from './prompts/system';
import { parsePRUrl } from './lib/url-parser';
export async function reviewPR(prUrl: string) {
const { owner, repo, prNumber } = parsePRUrl(prUrl);
const result = await generateText({
model: openai('gpt-4'),
system: systemPrompt,
tools,
maxSteps: 15,
stopWhen: hasToolCall('generateReview'),
onStepFinish: ({ stepType, toolCalls, text }) => {
console.log(`[Step] ${stepType}`);
if (toolCalls) {
for (const call of toolCalls) {
console.log(` Tool: ${call.toolName}(${JSON.stringify(call.args)})`);
}
}
if (text) {
console.log(` Reasoning: ${text.slice(0, 100)}...`);
}
},
prompt: `Review Pull Request #${prNumber} from ${owner}/${repo}.
Start by fetching the PR metadata.`
});
// Extract the review from the final tool call
const reviewStep = result.steps.find(s =>
s.toolCalls?.some(tc => tc.toolName === 'generateReview')
);
return reviewStep?.toolCalls?.find(tc => tc.toolName === 'generateReview')?.args;
}
Checklist:
- System prompt guides agent behavior
- Agent calls tools in logical sequence
- Progress visible in terminal
- Agent terminates when calling generateReview
- Structured review extracted from result
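The system prompt referenced in the tasks above (prompts/system.ts) can combine the reviewer role, the ReAct instructions from earlier, and an explicit termination rule. One possible version (wording is illustrative, not prescribed by the SDK):
// src/prompts/system.ts (one possible version)
export const systemPrompt = `
You are an experienced code reviewer focused on security, correctness, and maintainability.

Process:
1. Start by fetching the PR metadata and the list of changed files.
2. Read the most security-sensitive or heavily changed files first.
3. Use searchPattern to check for risky patterns (logged secrets, missing validation).
4. Before each tool call, briefly explain your reasoning (THOUGHT), then act (ACTION),
   then analyze the result (OBSERVATION).
5. When you have enough information, call generateReview exactly once with a complete,
   structured review. Do not call any other tool after generateReview.

Keep feedback specific: cite file paths and line numbers, and suggest concrete fixes.
`;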
Phase 4: Polish and CLI (Days 9-11)
Goal: Production-ready CLI with formatting and GitHub posting.
Milestone: Full CLI experience as shown in Real World Outcome.
Tasks:
- Build CLI interface with commander.js
- Format review output with colors (chalk)
- Save review to markdown file
- Implement GitHub comment posting
- Add interactive confirmation prompts
- Handle edge cases (large PRs, private repos)
// src/index.ts
import { Command } from 'commander';
import chalk from 'chalk';
import { reviewPR } from './agent';
import { postReviewComment } from './lib/github-client';
import { formatReview } from './lib/formatter';
import * as fs from 'fs/promises';
const program = new Command();
program
.name('review-agent')
.description('AI-powered code review agent')
.argument('<pr-url>', 'GitHub PR URL')
.option('-o, --output <file>', 'Save review to file')
.option('-p, --post', 'Post review as GitHub comment')
.action(async (prUrl, options) => {
console.log(chalk.blue(' Code Review Agent initialized'));
const review = await reviewPR(prUrl);
const formatted = formatReview(review);
console.log(formatted);
if (options.output) {
await fs.writeFile(options.output, formatted);
console.log(chalk.green(` Saved to ${options.output}`));
}
if (options.post) {
const url = await postReviewComment(prUrl, review);
console.log(chalk.green(` Posted: ${url}`));
}
});
program.parse();
Checklist:
- CLI parses arguments correctly
- Output is formatted and colored
- Review saved to file when requested
- GitHub posting works with confirmation
- Error messages are helpful
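The postReviewComment helper used by the CLI above is a thin wrapper over GitHub's issue-comment endpoint. A sketch, assuming the Review type and formatReview come from the other modules in this project:
// src/lib/github-client.ts (sketch; Review type and formatReview are assumed from other modules)
import { Octokit } from '@octokit/rest';
import { parsePRUrl } from './url-parser';
import { formatReview } from './formatter';
import type { Review } from '../schemas/review';  // assumption: z.infer type exported with the schema

const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });

// Posts the review as a regular PR comment and returns its URL.
// (Top-level PR comments use the issues endpoint; inline comments would use pulls.createReview.)
export async function postReviewComment(prUrl: string, review: Review): Promise<string> {
  const { owner, repo, prNumber } = parsePRUrl(prUrl);
  const { data } = await octokit.issues.createComment({
    owner,
    repo,
    issue_number: prNumber,
    body: formatReview(review)  // reuse the same formatter as the terminal output
  });
  return data.html_url;
}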
Phase 5: Robustness (Days 12-14)
Goal: Handle edge cases and improve reliability.
Milestone: Agent gracefully handles failures and large PRs.
Tasks:
- Add retry logic for transient failures
- Implement context windowing for large PRs
- Add rate limiting for GitHub API
- Handle binary files gracefully
- Add timeout handling
- Write comprehensive tests
Checklist:
- Retries on transient API failures
- Large PRs handled without context overflow
- Rate limits respected
- Binary files skipped with message
- Operations timeout after reasonable period
- Tests cover happy path and error cases
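For the retry and timeout tasks in this phase, a small generic helper is often enough. A sketch with exponential backoff; the retryable-status check is an assumption and should be tuned to the errors you actually observe:
// src/lib/retry.ts (sketch)
// Retries an async operation on transient failures with exponential backoff.
export async function withRetry<T>(
  fn: () => Promise<T>,
  { attempts = 3, baseDelayMs = 500 } = {}
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      const status = (error as { status?: number }).status;
      // Assumption: retry rate limits and server errors; rethrow other client errors immediately.
      const retryable = status === undefined || status === 403 || status === 429 || status >= 500;
      if (!retryable || attempt === attempts - 1) throw error;
      await new Promise(resolve => setTimeout(resolve, baseDelayMs * 2 ** attempt));
    }
  }
  throw lastError;
}

// Usage: wrap flaky GitHub calls inside tool execute functions, e.g.
// const { data } = await withRetry(() => octokit.pulls.get({ owner, repo, pull_number: prNumber }));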
Testing Strategy
Testing AI Agents: The Challenge
Testing agents is uniquely challenging because:
- LLM responses are non-deterministic
- Tool orchestration is dynamic
- External APIs add complexity
- The agent makes autonomous decisions
Mocking LLM Responses
Create predictable test scenarios by mocking the model:
// tests/mocks/model.ts
import { MockLanguageModelV1 } from 'ai/test';
export function createMockModel(responses: string[]) {
let callIndex = 0;
return new MockLanguageModelV1({
doGenerate: async () => {
const response = responses[callIndex++];
// Parse response to determine if it's text or tool call
if (response.startsWith('TOOL:')) {
const [_, toolName, args] = response.match(/TOOL:(\w+):(.+)/)!;
return {
type: 'tool-call',
toolCalls: [{
toolName,
args: JSON.parse(args)
}]
};
}
return { type: 'text', text: response };
}
});
}
// tests/integration/agent.test.ts
import { createMockModel } from '../mocks/model';
describe('Code Review Agent', () => {
it('calls fetchPRMetadata first', async () => {
const mockModel = createMockModel([
'TOOL:fetchPRMetadata:{"owner":"acme","repo":"api","prNumber":1}'
]);
const result = await reviewPR('https://github.com/acme/api/pull/1', {
model: mockModel
});
expect(result.steps[0].toolCalls[0].toolName).toBe('fetchPRMetadata');
});
});
Testing Tool Execution
Test tools in isolation from the LLM:
// tests/unit/tools/github.test.ts
import { fetchPRMetadata } from '../../../src/tools/github';
import nock from 'nock';
describe('fetchPRMetadata', () => {
beforeEach(() => {
nock('https://api.github.com')
.get('/repos/acme/api/pulls/47')
.reply(200, {
title: 'Add auth',
body: 'Description',
changed_files: 5,
additions: 100,
deletions: 10,
head: { sha: 'abc123' },
base: { sha: 'def456' }
});
});
afterEach(() => {
nock.cleanAll();
});
it('returns structured PR metadata', async () => {
const result = await fetchPRMetadata.execute({
owner: 'acme',
repo: 'api',
prNumber: 47
});
expect(result).toEqual({
title: 'Add auth',
body: 'Description',
changedFiles: 5,
additions: 100,
deletions: 10,
headSha: 'abc123',
baseSha: 'def456'
});
});
it('handles API errors gracefully', async () => {
nock.cleanAll();
nock('https://api.github.com')
.get('/repos/acme/api/pulls/999')
.reply(404, { message: 'Not Found' });
await expect(fetchPRMetadata.execute({
owner: 'acme',
repo: 'api',
prNumber: 999
})).rejects.toThrow(/Not Found/);
});
});
Integration Testing with Real LLM
For integration tests, use deterministic prompts and validate behavior:
// tests/integration/agent-real.test.ts
describe('Agent Integration (Real LLM)', () => {
// Skip in CI, run manually
it.skip('reviews a real PR end-to-end', async () => {
const review = await reviewPR(
'https://github.com/your-test-repo/pull/1'
);
// Validate structure, not exact content
expect(review).toHaveProperty('securityIssues');
expect(review).toHaveProperty('codeQuality');
expect(review).toHaveProperty('recommendation');
expect(['approve', 'request-changes', 'comment'])
.toContain(review.recommendation);
});
});
Test Matrix
+---------------------+---------------+---------------+---------------+
| Test Type | Determinism | Speed | Coverage |
+---------------------+---------------+---------------+---------------+
| Unit (tools) | Deterministic | Fast (ms) | Tool logic |
| Unit (mocked LLM) | Deterministic | Fast (ms) | Orchestration |
| Integration (mock) | Deterministic | Medium (s) | Full flow |
| Integration (real) | Non-determ | Slow (10s+) | E2E behavior |
+---------------------+---------------+---------------+---------------+
Common Pitfalls and Debugging
Pitfall 1: Poor Tool Descriptions
Symptom: LLM calls wrong tool or ignores available tools.
Cause: Tool descriptions don't clearly explain when to use them.
Solution: Write descriptions as if explaining to a new team member.
// BAD
description: 'Read file'
// GOOD
description: 'Read the contents of a source code file from the repository. ' +
'Use this when you need to examine implementation details or ' +
'understand code structure. Returns the full file contents as text. ' +
'For very large files, consider using searchPattern first.'
Pitfall 2: Context Overflow
Symptom: Agent crashes with "context length exceeded" or responses become confused.
Cause: Tool results accumulate without summarization.
Solution: Implement context management with prepareStep or limit tool output size.
const readFile = tool({
description: 'Read file contents',
parameters: z.object({ path: z.string() }),
execute: async ({ path }) => {
const content = await fs.readFile(path, 'utf-8');
const lines = content.split('\n');
// Limit returned content
if (lines.length > 100) {
return `[First 100 lines of ${lines.length} total]\n` +
lines.slice(0, 100).join('\n') +
'\n...[truncated]';
}
return content;
}
});
Pitfall 3: Missing Stop Condition
Symptom: Agent runs forever or hits maxSteps without meaningful result.
Cause: No clear termination condition, or the LLM doesn't understand when to stop.
Solution: Use explicit terminal tool with clear description.
const generateReview = tool({
description: 'Generate the final code review. ' +
'IMPORTANT: Call this tool when you have gathered enough ' +
'information to write a complete review. Do not call any ' +
'other tools after this.',
parameters: reviewSchema,
execute: async (review) => review
});
// In agent:
stopWhen: hasToolCall('generateReview')
Pitfall 4: No Error Recovery
Symptom: Agent crashes when GitHub API returns error.
Cause: Tools don't handle errors gracefully.
Solution: Return errors as data, let LLM decide next action.
const readFile = tool({
description: 'Read file. Returns error message if file not found.',
parameters: z.object({ path: z.string() }),
execute: async ({ path }) => {
try {
return await fs.readFile(path, 'utf-8');
} catch (error) {
// Return error as data, don't throw
return `ERROR: Could not read file ${path}: ${error instanceof Error ? error.message : String(error)}`;
}
}
});
This allows the LLM to reason: "The file doesn't exist, let me try a different approach."
Pitfall 5: Non-Deterministic Testing
Symptom: Tests pass/fail randomly.
Cause: Testing with real LLM without controlling for non-determinism.
Solution: Mock LLM for deterministic tests, use behavioral assertions for real LLM tests.
// DON'T assert exact output
expect(review.summary).toBe('This PR has security issues');
// DO assert structure and reasonable behavior
expect(review.summary).toBeDefined();
expect(review.summary.length).toBeGreaterThan(20);
Pitfall 6: Forgetting Rate Limits
Symptom: Agent works for small PRs but fails on larger ones.
Cause: GitHub API rate limiting kicks in.
Solution: Implement rate limiting and caching.
import Bottleneck from 'bottleneck';
const limiter = new Bottleneck({
minTime: 100, // Minimum 100ms between requests
maxConcurrent: 3
});
const octokit = new Octokit({
auth: process.env.GITHUB_TOKEN,
request: {
fetch: async (url, options) => {
return limiter.schedule(() => fetch(url, options));
}
}
});
Pitfall 7: Ignoring onStepFinish
Symptom: No visibility into what agent is doing; hard to debug.
Cause: No observability hooks implemented.
Solution: Always use onStepFinish for logging.
const result = await generateText({
model,
tools,
onStepFinish: ({ stepType, toolCalls, text, usage }) => {
console.log(`[${new Date().toISOString()}] Step: ${stepType}`);
console.log(` Tokens: ${usage?.totalTokens}`);
if (toolCalls?.length) {
for (const call of toolCalls) {
console.log(` Tool: ${call.toolName}`);
console.log(` Args: ${JSON.stringify(call.args, null, 2)}`);
}
}
if (text) {
console.log(` Reasoning: ${text.substring(0, 200)}`);
}
},
prompt: '...'
});
Extensions and Challenges
Extension 1: Multi-Language Support
Challenge: Extend the agent to review code in multiple languages with language-specific rules.
Implementation Ideas:
- Add language detection tool
- Create language-specific prompt templates
- Implement language-specific pattern searches (e.g., Go's error handling, Python's type hints)
const detectLanguage = tool({
description: 'Detect the primary programming language of a file',
parameters: z.object({ path: z.string() }),
execute: async ({ path }) => {
const ext = path.split('.').pop() ?? '';
const languageMap: Record<string, string> = {
'ts': 'TypeScript',
'tsx': 'TypeScript/React',
'js': 'JavaScript',
'py': 'Python',
'go': 'Go',
'rs': 'Rust'
};
return languageMap[ext] ?? 'Unknown';
}
});
Extension 2: Learning from Feedback
Challenge: Allow users to rate reviews and use that feedback to improve future reviews.
Implementation Ideas:
- Store reviews and their ratings in a database
- Include highly-rated past reviews as few-shot examples
- Use RAG to retrieve relevant past reviews
USER FEEDBACK LOOP
Agent Review -> User Rating -> Store in DB
^ |
| v
+-------- RAG Retrieval -------+
"For this authentication PR, here are examples
of highly-rated reviews on similar PRs..."
Extension 3: CI/CD Integration
Challenge: Run the agent automatically on every PR via GitHub Actions.
Implementation Ideas:
- Create GitHub Action workflow
- Run agent on PR open/update events
- Post review as PR check or comment
- Handle concurrent executions
# .github/workflows/ai-review.yml
name: AI Code Review
on:
pull_request:
types: [opened, synchronize]
jobs:
review:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Run AI Review
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
run: |
npx review-agent ${{ github.event.pull_request.html_url }} --post
Extension 4: Diff-Aware Review
Challenge: Focus the review only on changed lines, not entire files.
Implementation Ideas:
- Parse unified diff format
- Extract only changed lines with context
- Generate line-specific comments that map to diff hunks
- Use GitHub's review API for inline comments
const getDiffContext = tool({
description: 'Get the specific lines changed in a file with surrounding context',
parameters: z.object({
path: z.string(),
contextLines: z.number().default(3)
}),
execute: async ({ path, contextLines }) => {
const diff = await getDiff(path);
const hunks = parseDiff(diff);
return hunks.map(hunk => ({
startLine: hunk.newStart,
endLine: hunk.newStart + hunk.newLines,
oldContent: hunk.oldLines,
newContent: hunk.newLines,
context: hunk.context
}));
}
});
Resources
Books
| Book | Relevant Chapters | What You'll Learn |
|---|---|---|
| "Artificial Intelligence: A Modern Approach" by Russell & Norvig | Ch. 2: Intelligent Agents | Deep theoretical foundation for agents, PEAS framework, agent types |
| "Programming TypeScript" by Boris Cherny | Ch. 4: Functions, Ch. 7: Error Handling | Type-safe function design, error handling patterns for tools |
| "Release It!, 2nd Edition" by Michael Nygard | Ch. 5: Stability Patterns | Circuit breakers, timeouts, retry logic for resilient agents |
| "Command-Line Rust" by Ken Youens-Clark | Ch. 1-3 | CLI design patterns (applicable to any language) |
| "Designing Data-Intensive Applications" by Martin Kleppmann | Ch. 1-2 | Thinking about reliability and maintainability |
Papers
- "ReAct: Synergizing Reasoning and Acting in Language Models" by Yao et al. - The foundational paper on ReAct agents
- "Toolformer: Language Models Can Teach Themselves to Use Tools" - Understanding how LLMs learn tool use
Documentation
- AI SDK Tools and Tool Calling - Canonical reference for tool definition
- AI SDK Agents - Agent loop, stopWhen, prepareStep
- GitHub REST API - Pull Requests - PR metadata, files, comments
- GitHub REST API - Contents - Reading file contents
- Octokit.js Documentation - GitHub API client for Node.js
Videos and Courses
- AI SDK Official YouTube tutorials
- "Building AI Agents" series on LangChain's YouTube channel
- "Prompt Engineering for Tool Use" - Anthropic's documentation
Recommended Reading Order
- AI SDK Tools Docs (30 min) - Understand tool definition syntax
- AI SDK Agents Docs (30 min) - Understand stopWhen and loop control
- Russell & Norvig Ch. 2 (1-2 hours) - Deep mental model for agents
- GitHub Pull Requests API (30 min) - Understand the data you'll work with
- Cherny Ch. 7 (1 hour) - TypeScript error handling for robust tools
- Nygard Ch. 5 (1 hour) - Stability patterns for production readiness
- Start coding!
Self-Assessment Checklist
Use this checklist to verify your understanding before considering the project complete:
Conceptual Understanding
- Can you explain the difference between an LLM call and an AI agent?
- Agent: LLM in a loop that can take actions via tools
- LLM call: Single request/response without actions
- Can you draw the agent loop from memory?
- Perceive (context) -> Decide (LLM reasoning) -> Act (tool or respond) -> Loop
- Can you explain how the LLM "sees" your tool definitions?
- Tools are converted to JSON schema with name, description, parameters
- Description is prompt engineering for tool selection
- Can you explain what stopWhen does and when to use it?
- Checks each step to determine if loop should terminate
- Use with terminal tools like generateReview
- Can you explain the ReAct pattern?
- Reasoning + Acting: LLM explicitly reasons before each action
- Thought -> Action -> Observation cycle
Implementation Skills
- Can you implement a tool with proper description and parameters?
- Description explains when/why to use
- Zod schema with .describe() for each parameter
- Execute function handles errors gracefully
- Can you set up an agent loop with proper termination?
- maxSteps for safety limit
- stopWhen for goal-based termination
- Terminal tool that signals completion
- Can you implement onStepFinish for observability?
- Log step type, tool calls, reasoning
- Track token usage
- Can you handle context growth in long-running agents?
- Limit tool output size
- Use prepareStep for summarization
- Prioritize relevant information
- Can you integrate with the GitHub API?
- Fetch PR metadata
- Read file contents at specific commits
- Post comments
Testing and Debugging
- Can you test tools in isolation?
- Mock external APIs with nock
- Test success and error cases
- Can you test agent orchestration with mocked LLM?
- MockLanguageModelV1 for deterministic responses
- Verify tool call sequence
- Can you debug an agent that isn't calling the right tools?
- Check tool descriptions
- Verify prompt clarity
- Use onStepFinish to trace decisions
- Can you handle tool failures gracefully?
- Return errors as data, don't throw
- Let LLM decide recovery strategy
Production Readiness
- Does your agent have proper error handling?
- API failures don't crash the agent
- User sees helpful error messages
- Does your agent handle edge cases?
- Large PRs with many files
- Binary files
- Private repositories
- Rate limiting
- Is your agent observable?
- Progress logged in real-time
- Token usage tracked
- Execution time measured
Summary
Building a code review agent teaches you the fundamental patterns of AI agent development:
- Tool Definition: How to give capabilities to an LLM through well-designed tools
- Agent Loop: How the perceive-decide-act cycle works in practice
- Context Management: How to handle growing conversation state
- Termination: How to know when an agent should stop
- Observability: How to understand what an agent is doing
- Resilience: How to build agents that gracefully handle failures
This project bridges the gap between "AI generates text" and "AI takes actions." You're not just building a code reviewer - you're learning patterns that apply to any autonomous AI system: research agents, data analysis agents, customer support agents, and beyond.
The skills you develop here - designing tool interfaces, managing context, handling non-determinism, testing autonomous systems - are increasingly valuable as AI agents become central to software development.
Next Steps After Completion:
- Project 4: Multi-Provider Model Router (apply tool patterns to API routing)
- Project 5: Semantic Search Pipeline (combine agents with embeddings)
- Project 6: Real-time AI Dashboard (agents in streaming contexts)
This guide is part of the AI SDK Learning Projects series. For the full project list, see AI_SDK_LEARNING_PROJECTS.md.