Project 3: Code Review Agent with Tool Calling

Deep Dive Guide · AI SDK Learning Projects · Estimated Time: 1-2 weeks

Table of Contents

  1. Learning Objectives
  2. Deep Theoretical Foundation
  3. Complete Project Specification
  4. Real World Outcome
  5. Solution Architecture
  6. Phased Implementation Guide
  7. Testing Strategy
  8. Common Pitfalls and Debugging
  9. Extensions and Challenges
  10. Resources
  11. Self-Assessment Checklist

Learning Objectives

By completing this project, you will master:

  • Agent Architecture: Understanding how LLM agents differ from simple LLM calls, including the perception-action loop, tool invocation, and autonomous decision-making
  • Tool Definition: Creating well-designed tools using the AI SDK’s tool() function with proper descriptions that guide LLM behavior
  • Agent Loop Control: Implementing stopWhen, maxSteps, and onStepFinish to control and observe agent execution
  • Context Management: Handling growing conversation context as tools return data, preventing context overflow
  • The ReAct Pattern: Implementing the Reasoning + Acting paradigm where the LLM reasons about its next step before taking action
  • External API Integration: Connecting agents to real-world APIs (GitHub) for practical utility
  • Error Recovery: Building resilient agents that gracefully handle tool failures and API errors

Deep Theoretical Foundation

What is an AI Agent?

An AI agent is fundamentally different from a simple LLM call. While a single LLM call is like asking a question and receiving an answer, an agent is like having an assistant who can actually do things in the world.

Russell & Norvig’s Definition (from “Artificial Intelligence: A Modern Approach”, Ch. 2):

“An agent is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators.”

For LLM agents, this translates to:

  • Sensors: The information provided to the LLM (prompts, tool results, context)
  • Actuators: The tools the agent can invoke (API calls, file operations, calculations)
  • Environment: The external world (GitHub, file systems, databases)
                         THE AGENT PARADIGM

    +-----------------+         +-----------------+
    |   Environment   |         |    LLM Agent    |
    |                 |         |                 |
    |  - GitHub API   |<--------|  Actuators:     |
    |  - File System  | Actions |  - fetchPR()    |
    |  - Databases    |         |  - readFile()   |
    |                 |         |  - searchCode() |
    |                 |-------->|                 |
    |                 | Percepts|  Sensors:       |
    |                 |         |  - Tool Results |
    |                 |         |  - Error Msgs   |
    +-----------------+         +-----------------+
                                       |
                                       v
                              +-----------------+
                              |  Decision Logic |
                              |  (The LLM)      |
                              |                 |
                              |  "What should I |
                              |   do next?"     |
                              +-----------------+

The Agent Loop: Perception-Decision-Action

Every agent operates in a continuous loop until it achieves its goal or reaches a termination condition:

                    THE AGENT LOOP

         +------------------------------------------+
         |                                          |
         v                                          |
    +----------+     +----------+     +----------+  |
    | PERCEIVE |---->| DECIDE   |---->|   ACT    |--+
    +----------+     +----------+     +----------+
         |                |                |
         |                |                |
    Read context    LLM reasons      Execute tool
    from tools      about next       and capture
    and history     action           result


    Detailed Flow:

    Step 1: PERCEIVE
    +--------------------------------------------------+
    | Context includes:                                 |
    | - Original user prompt                           |
    | - System instructions                            |
    | - Previous tool calls and their results          |
    | - Any errors from previous steps                 |
    +--------------------------------------------------+
                           |
                           v
    Step 2: DECIDE
    +--------------------------------------------------+
    | LLM evaluates:                                    |
    | - "Do I have enough information to complete?"    |
    | - "Which tool should I call next?"               |
    | - "What parameters should I pass?"               |
    | - "Should I stop and respond to the user?"       |
    +--------------------------------------------------+
                           |
                           v
    Step 3: ACT
    +--------------------------------------------------+
    | Options:                                          |
    | A) Call a tool with specific parameters          |
    | B) Generate final text response (terminate)      |
    | C) Call special "done" tool (terminate)          |
    +--------------------------------------------------+
                           |
                           v
                   Loop back to PERCEIVE
                   (unless terminated)
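
The AI SDK runs this loop for you, but its skeleton is worth seeing once in plain code. Here is a minimal, self-contained sketch, assuming a hypothetical callLLM stand-in for a real model call; none of these names come from the SDK:

type ToolCall = { toolName: string; args: unknown };
type LLMResponse = { text?: string; toolCalls: ToolCall[] };
type Tool = { execute: (args: unknown) => Promise<unknown> };

declare function callLLM(history: string[]): Promise<LLMResponse>;

async function agentLoop(
  prompt: string,
  tools: Record<string, Tool>,
  maxSteps = 10
): Promise<string> {
  // PERCEIVE: the context starts as just the user prompt
  const history: string[] = [prompt];

  for (let step = 0; step < maxSteps; step++) {
    // DECIDE: the model either picks tools to call or answers directly
    const response = await callLLM(history);

    if (response.toolCalls.length === 0) {
      return response.text ?? '';  // Terminate: the model chose to respond
    }

    for (const call of response.toolCalls) {
      // ACT: run the tool, then feed the result back as the next percept
      const result = await tools[call.toolName].execute(call.args);
      history.push(`Tool ${call.toolName} returned: ${JSON.stringify(result)}`);
    }
  }

  throw new Error('maxSteps reached without a final answer');
}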

The tool() Function: How LLMs Invoke Tools

The tool() function is how you give capabilities to your agent. It has three critical components:

  1. Description: Tells the LLM when and why to use this tool
  2. Parameters Schema: Defines what the tool accepts (using Zod)
  3. Execute Function: Implements what the tool actually does when invoked
import { tool } from 'ai';
import { z } from 'zod';
import * as fs from 'fs/promises';

const readFile = tool({
  // DESCRIPTION: This is prompt engineering!
  // The LLM reads this to decide when to call the tool
  description: 'Read the contents of a file from the repository. ' +
               'Use this when you need to examine source code. ' +
               'Returns the full file contents as a string.',

  // PARAMETERS: The LLM generates these values
  parameters: z.object({
    path: z.string().describe('Path to the file, e.g., "src/index.ts"')
  }),

  // EXECUTE: Your code that runs when the tool is called
  execute: async ({ path }) => {
    const content = await fs.readFile(path, 'utf-8');
    return content;
  }
});

How the LLM “Sees” Your Tool:

When you define tools, the AI SDK converts them to a schema that the LLM understands:

    YOUR CODE                         WHAT THE LLM SEES

    const readFile = tool({           {
      description: 'Read...',           "name": "readFile",
      parameters: z.object({            "description": "Read the contents
        path: z.string()                  of a file from the repository.
          .describe('...')                Use this when you need to
      }),                                 examine source code...",
      execute: async ({ path }) => {    "parameters": {
        ...                               "type": "object",
      }                                   "properties": {
    });                                     "path": {
                                              "type": "string",
                                              "description": "Path..."
                                            }
                                          },
                                          "required": ["path"]
                                        }
                                      }

Tool Descriptions as Prompt Engineering:

The description is the most important part of a tool definition. It’s literally prompt engineering that guides the LLM’s tool selection:

    POOR DESCRIPTION                  GOOD DESCRIPTION

    "Read file"                       "Read the contents of a source code
                                       file from the repository. Use this
                                       tool when you need to examine
                                       implementation details, understand
                                       code structure, or find specific
                                       patterns. Returns the full file
                                       as a string. For large files
                                       (>500 lines), consider using
                                       searchPattern first to locate
                                       specific areas of interest."

    Problems:                         Benefits:
    - LLM doesn't know when          - Clear use case
      to use it                      - Explains return value
    - No guidance on purpose         - Suggests alternatives
    - No context about               - Helps LLM make decisions
      return value

How stopWhen and maxSteps Work

The agent loop needs termination conditions. The AI SDK provides two mechanisms:

stopWhen: A function that examines each step and returns true when the loop should end.

import { generateText, hasToolCall } from 'ai';

const result = await generateText({
  model: openai('gpt-4'),
  tools: { readFile, searchPattern, generateReview },

  // Stop when the agent calls generateReview
  stopWhen: hasToolCall('generateReview'),

  prompt: 'Review this PR...'
});

How stopWhen Works Internally:

    Agent Loop with stopWhen

    +---------+     +---------+     +---------+
    | Step 1  |---->| Step 2  |---->| Step 3  |
    | readFile|     | search  |     |generateR|
    +---------+     +---------+     +---------+
         |               |               |
         v               v               v
    +----------+    +----------+    +----------+
    | stopWhen |    | stopWhen |    | stopWhen |
    | returns  |    | returns  |    | returns  |
    | false    |    | false    |    | TRUE!    |
    +----------+    +----------+    +----------+
         |               |               |
         v               v               v
      Continue        Continue         STOP
                                        |
                                        v
                                   Return result

maxSteps: A safety limit preventing infinite loops.

const result = await generateText({
  model: openai('gpt-4'),
  tools,
  maxSteps: 10,  // Absolute maximum iterations
  stopWhen: hasToolCall('generateReview'),
  prompt: 'Review this PR...'
});

Combined Flow:

    For each step:

    1. Check: steps >= maxSteps?
       |
       +-- Yes --> STOP (safety limit)
       |
       +-- No --> Continue

    2. LLM generates response

    3. Check: stopWhen(step) === true?
       |
       +-- Yes --> STOP (goal reached)
       |
       +-- No --> Execute tools, continue loop

Context Management and Conversation State

As the agent works, context grows. Each tool call adds to the conversation history:

    CONTEXT GROWTH ACROSS ITERATIONS

    Step 1:
    +---------------------------+
    | System: "You are a code   |
    |   reviewer..."            |
    | User: "Review PR #47"     |
    | Assistant: Call readFile  |
    | Tool Result: [89 lines]   |  <-- +89 lines added
    +---------------------------+
    ~200 tokens

    Step 2:
    +---------------------------+
    | [Previous context]        |
    | Assistant: Call readFile  |
    |   (another file)          |
    | Tool Result: [156 lines]  |  <-- +156 lines added
    +---------------------------+
    ~600 tokens

    Step 3:
    +---------------------------+
    | [Previous context]        |
    | Assistant: Call search    |
    | Tool Result: [3 matches]  |  <-- +30 lines added
    +---------------------------+
    ~800 tokens

    ...

    Step N:
    +---------------------------+
    | [All previous context]    |
    |                           |
    | RISK: Context exceeds     |
    | model's context window!   |
    +---------------------------+
    ~128,000 tokens?

Managing Context with prepareStep:

The AI SDK allows you to preprocess context before each step:

const result = await generateText({
  model: openai('gpt-4'),
  tools,
  prepareStep: async ({ steps }) => {
    // Summarize old steps to reduce context
    if (steps.length > 5) {
      return {
        steps: [
          // Keep only summary of old steps
          summarizeSteps(steps.slice(0, -2)),
          // Keep last 2 steps in full
          ...steps.slice(-2)
        ]
      };
    }
    return { steps };
  },
  prompt: 'Review this PR...'
});
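
The summarizeSteps helper above is left undefined. One naive, illustrative version folds old steps into a single synthetic step whose text records what happened; the real step shape depends on your SDK version, so treat these fields as assumptions:

// Hypothetical summarizer: collapse old steps into one synthetic step.
interface AgentStep {
  text?: string;
  toolCalls?: { toolName: string; args: unknown }[];
  toolResults?: { result: unknown }[];
}

function summarizeSteps(steps: AgentStep[]): AgentStep {
  const lines = steps.map((step, i) => {
    const tools = step.toolCalls?.map(c => c.toolName).join(', ') ?? 'no tools';
    const results = JSON.stringify(step.toolResults ?? []).slice(0, 150);
    return `Step ${i + 1}: called [${tools}], results (truncated): ${results}`;
  });
  return { text: `Summary of earlier steps:\n${lines.join('\n')}` };
}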

The ReAct Pattern: Reasoning + Acting

ReAct (from the paper “ReAct: Synergizing Reasoning and Acting in Language Models” by Yao et al.) is a paradigm where the LLM explicitly reasons about its actions:

    TRADITIONAL AGENT              REACT AGENT

    User: Review this PR           User: Review this PR
           |                              |
           v                              v
    Call readFile                  Thought: "I should first
           |                        read the changed files
           v                        to understand what was
    Call searchPattern              modified."
           |                              |
           v                              v
    Return result                  Action: readFile
                                          |
                                          v
                                   Observation: [file contents]
                                          |
                                          v
                                   Thought: "I see password
                                    handling. Let me search
                                    for security patterns."
                                          |
                                          v
                                   Action: searchPattern
                                          |
                                          v
                                   ... continues with
                                   explicit reasoning

ReAct in AI SDK:

You can encourage ReAct behavior through your system prompt:

const systemPrompt = `
You are a code review agent. For each action you take:

1. THOUGHT: Explain your reasoning for the next action
2. ACTION: Call the appropriate tool
3. OBSERVATION: Analyze the tool result
4. Repeat until you have enough information

Always explain your thinking before acting.
`;

const result = await generateText({
  model: openai('gpt-4'),
  system: systemPrompt,
  tools,
  stopWhen: hasToolCall('generateReview'),
  prompt: 'Review PR #47'
});

This produces agent traces like:

[Step 1] Thought: "I need to start by fetching the PR metadata
         to understand what files were changed."
         Action: fetchPRMetadata({prUrl: "..."})

[Step 2] Observation: "5 files changed, mostly in src/auth/"
         Thought: "Authentication changes are security-sensitive.
         I should read the main auth file first."
         Action: readFile({path: "src/auth/middleware.ts"})

[Step 3] Observation: "I see password handling on line 34"
         Thought: "Password handling is critical. Let me search
         for any console.log statements that might leak credentials."
         Action: searchPattern({pattern: "console.log.*password"})

Complete Project Specification

Overview

Build a CLI tool that functions as an autonomous code review agent. Given a GitHub Pull Request URL or local git diff, the agent will:

  1. Fetch PR metadata and list of changed files
  2. Autonomously decide which files to read and analyze
  3. Search for common code issues and security patterns
  4. Generate a structured code review with line-specific feedback
  5. Optionally post the review as a GitHub comment

Functional Requirements

| Requirement | Description | Priority |
|-------------|-------------|----------|
| PR Input | Accept GitHub PR URL (e.g., github.com/org/repo/pull/123) | P0 |
| Local Diff | Accept local git diff as alternative input | P1 |
| File Reading | Read individual files from the PR’s head commit | P0 |
| Pattern Search | Search codebase for specific patterns (security, code smells) | P0 |
| Review Generation | Produce structured review with categories and line numbers | P0 |
| GitHub Integration | Post review as PR comment | P1 |
| Progress Logging | Show agent’s reasoning and tool calls in real-time | P0 |
| Rate Limiting | Handle GitHub API rate limits gracefully | P1 |
| Large PR Handling | Gracefully handle PRs with many changed files | P1 |

Non-Functional Requirements

| Requirement | Target | Rationale |
|-------------|--------|-----------|
| Latency | < 30s for typical PR (5-10 files) | User experience |
| Token Efficiency | < 50K tokens per review | Cost control |
| Reliability | Graceful degradation on API failures | Production readiness |
| Observability | Full trace of agent decisions | Debugging |

Tool Definitions

The agent requires these core tools:

+-------------------+----------------------------------------+
|      Tool         |              Purpose                   |
+-------------------+----------------------------------------+
| fetchPRMetadata   | Get PR title, description, file list   |
| getChangedFiles   | List all files modified in the PR      |
| readFile          | Read contents of a specific file       |
| getDiff           | Get unified diff for a file            |
| searchPattern     | Grep-like search across files          |
| generateReview    | Produce final structured review        |
+-------------------+----------------------------------------+

Real World Outcome

When you run the CLI, here’s exactly what you’ll see in your terminal:

$ review-agent https://github.com/acme/widget-api/pull/47

 Code Review Agent initialized
 Fetching PR #47 from acme/widget-api...

[Step 1] Tool: fetchPRMetadata
         Title: "Add user authentication middleware"
         Files changed: 5
         Additions: 234, Deletions: 12

[Step 2] Tool: readFile("src/middleware/auth.ts")
         Reading 89 lines...

[Step 3] Tool: readFile("src/routes/users.ts")
         Reading 156 lines...

[Step 4] Tool: searchPattern("password", "*.ts")
         Found 3 matches across 2 files

[Step 5] Tool: readFile("src/utils/crypto.ts")
         Reading 45 lines...

[Step 6] Agent reasoning: "I've reviewed the authentication
         implementation. I noticed potential security issues with
         password handling and a missing rate limiter. Let me
         generate the review."

[Step 7] Tool: generateReview

=====================================================================
                    CODE REVIEW: PR #47
            "Add user authentication middleware"
=====================================================================

## Security Issues (2)

### 1. Plaintext Password Logging
**File:** `src/middleware/auth.ts:34`
**Severity:** Critical

```typescript
console.log(`Login attempt for user: ${email}, password: ${password}`);
                                                        ^^^^^^^^
```

**Issue:** Password is logged in plaintext. This exposes credentials in logs.
**Fix:** Remove the password from the log statement or mask it.

### 2. Missing Rate Limiting
**File:** `src/routes/users.ts:12-45`
**Severity:** High

The `/login` endpoint has no rate limiting, making it vulnerable to brute-force attacks.

**Recommended:** Add rate limiting middleware (e.g., express-rate-limit).


## Code Quality (3)

### 1. Magic Number
**File:** `src/middleware/auth.ts:56`

```typescript
const TOKEN_EXPIRY = 86400;  // What is this number?
```

**Suggestion:** Use a named constant: `const ONE_DAY_IN_SECONDS = 86400;`

### 2. Missing Error Type
**File:** `src/utils/crypto.ts:23`

```typescript
} catch (e) {
    throw e;  // No type narrowing
}
```

**Suggestion:** Use `catch (e: unknown)` and proper error handling.

### 3. Inconsistent Async Pattern
**File:** `src/routes/users.ts:78`

Mixing `.then()` and async/await. Prefer consistent async/await.

## Suggestions (2)

### 1. Add Input Validation
**File:** `src/routes/users.ts:15`

Consider adding Zod schema validation for the login request body.

### 2. Extract JWT Secret
**File:** `src/middleware/auth.ts:8`

`JWT_SECRET` should come from environment variables, not hardcoded.


## Summary

| Category | Count |
|----------|-------|
| Security Issues | 2 |
| Code Quality | 3 |
| Suggestions | 2 |

**Overall:** This PR introduces authentication but has critical security issues that must be addressed before merging.

**Recommendation:** Request changes

=====================================================================

Full review saved to: review-pr-47.md
Ready to post as PR comment? [y/N]


If the user confirms, the agent posts the review as a GitHub comment:

$ y

 Posting review to GitHub...
 Review posted: https://github.com/acme/widget-api/pull/47#issuecomment-1234567

Done! Agent completed in 12.3s (7 steps, 3 files analyzed)

Solution Architecture

High-Level Architecture

                    CODE REVIEW AGENT ARCHITECTURE

    +------------------------------------------------------------------+
    |                         CLI INTERFACE                             |
    |  $ review-agent <pr-url> [--output=file] [--post]                |
    +------------------------------------------------------------------+
                                  |
                                  v
    +------------------------------------------------------------------+
    |                      INPUT PARSER                                 |
    |  - Extract owner, repo, PR number from URL                       |
    |  - Validate GitHub token                                         |
    +------------------------------------------------------------------+
                                  |
                                  v
    +------------------------------------------------------------------+
    |                     AGENT ORCHESTRATOR                           |
    |                                                                   |
    |   +------------------+    +------------------+                    |
    |   |   System Prompt  |    |   User Message   |                    |
    |   |  "You are a code |    |  "Review PR #47  |                    |
    |   |   reviewer..."   |    |   from repo X"   |                    |
    |   +------------------+    +------------------+                    |
    |              |                     |                              |
    |              +----------+----------+                              |
    |                         |                                         |
    |                         v                                         |
    |   +--------------------------------------------------+           |
    |   |                 generateText()                    |           |
    |   |                                                   |           |
    |   |  model: openai('gpt-4')                          |           |
    |   |  tools: { fetchPR, readFile, search, review }    |           |
    |   |  stopWhen: hasToolCall('generateReview')         |           |
    |   |  maxSteps: 15                                    |           |
    |   |  onStepFinish: logProgress                       |           |
    |   +--------------------------------------------------+           |
    +------------------------------------------------------------------+
                                  |
                                  v
    +------------------------------------------------------------------+
    |                      TOOL REGISTRY                                |
    |                                                                   |
    |  +------------+  +------------+  +------------+  +------------+  |
    |  | fetchPR    |  | readFile   |  | search     |  | getDiff    |  |
    |  | Metadata   |  |            |  | Pattern    |  |            |  |
    |  +------------+  +------------+  +------------+  +------------+  |
    |        |               |               |               |          |
    |        +---------------+---------------+---------------+          |
    |                        |                                          |
    |                        v                                          |
    |               +------------------+                                |
    |               | generateReview   |                                |
    |               | (Terminal Tool)  |                                |
    |               +------------------+                                |
    +------------------------------------------------------------------+
                                  |
                                  v
    +------------------------------------------------------------------+
    |                    EXTERNAL SERVICES                              |
    |                                                                   |
    |  +------------------+           +------------------+              |
    |  |   GitHub API     |           |    OpenAI API    |              |
    |  |                  |           |                  |              |
    |  | - /pulls/:id     |           | - chat/complete  |              |
    |  | - /contents/:path|           |   with tools     |              |
    |  | - /comments      |           |                  |              |
    |  +------------------+           +------------------+              |
    +------------------------------------------------------------------+

Tool Registry Design

                        TOOL REGISTRY PATTERN

    +------------------------------------------------------------------+
    |                        tools/index.ts                             |
    |                                                                   |
    |  export const tools = {                                          |
    |    fetchPRMetadata,  // from ./github.ts                         |
    |    getChangedFiles,  // from ./github.ts                         |
    |    readFile,         // from ./files.ts                          |
    |    getDiff,          // from ./files.ts                          |
    |    searchPattern,    // from ./search.ts                         |
    |    generateReview,   // from ./review.ts (terminal)              |
    |  };                                                              |
    +------------------------------------------------------------------+
                                  |
                                  |
        +-------------------------+-------------------------+
        |                         |                         |
        v                         v                         v
    +-----------+           +-----------+           +-----------+
    |github.ts  |           | files.ts  |           | search.ts |
    |           |           |           |           |           |
    |fetchPR    |           |readFile   |           |searchPat  |
    |Metadata() |           |()         |           |tern()     |
    |           |           |           |           |           |
    |getChanged |           |getDiff()  |           |           |
    |Files()    |           |           |           |           |
    +-----------+           +-----------+           +-----------+
        |                         |                         |
        v                         v                         v
    +-----------+           +-----------+           +-----------+
    | GitHubAPI |           | Node fs   |           |child_proc |
    | (Octokit) |           | promises  |           |exec('rg') |
    +-----------+           +-----------+           +-----------+

Agent Loop Internals

                    AGENT LOOP DETAILED FLOW

    generateText() called
           |
           v
    +------+------+
    | Initialize  |
    | context     |
    | messages=[] |
    +-------------+
           |
           v
    +------+------+
    | step = 0    |<----------------------------------+
    +-------------+                                   |
           |                                          |
           v                                          |
    +------+------+                                   |
    | step++      |                                   |
    +-------------+                                   |
           |                                          |
           v                                          |
    +------+------+                                   |
    | step >      |--Yes--> Return { text, steps }   |
    | maxSteps?   |                                   |
    +-------------+                                   |
           |No                                        |
           v                                          |
    +------+------+                                   |
    | LLM call    |                                   |
    | with tools  |                                   |
    | & context   |                                   |
    +-------------+                                   |
           |                                          |
           v                                          |
    +------+------+                                   |
    | Response    |                                   |
    | has tool    |--No--> Return { text, steps }    |
    | calls?      |        (LLM chose to respond)    |
    +-------------+                                   |
           |Yes                                       |
           v                                          |
    +------+------+                                   |
    | stopWhen    |--Yes--> Return { text, steps }   |
    | (toolCall)? |        (Goal reached)            |
    +-------------+                                   |
           |No                                        |
           v                                          |
    +------+------+                                   |
    | Execute     |                                   |
    | tool(s)     |                                   |
    +-------------+                                   |
           |                                          |
           v                                          |
    +------+------+                                   |
    | onStep      |                                   |
    | Finish()    |                                   |
    +-------------+                                   |
           |                                          |
           v                                          |
    +------+------+                                   |
    | Append to   |                                   |
    | context:    |                                   |
    | - assistant |                                   |
    |   message   |                                   |
    | - tool      |-----------------------------------+
    |   results   |
    +-------------+

GitHub API Integration Points

                    GITHUB API INTEGRATION

    PR URL: github.com/acme/widget-api/pull/47
                         |
                         v
    +------------------------------------------+
    |          URL PARSER                       |
    |  owner = "acme"                          |
    |  repo = "widget-api"                     |
    |  prNumber = 47                           |
    +------------------------------------------+
                         |
    +--------------------+--------------------+
    |                    |                    |
    v                    v                    v

    GET /repos/{owner}/{repo}/pulls/{prNumber}
    +------------------------------------------+
    | Response:                                 |
    | {                                        |
    |   title: "Add auth middleware",          |
    |   body: "...",                           |
    |   head: { sha: "abc123" },               |
    |   base: { sha: "def456" },               |
    |   changed_files: 5,                      |
    |   additions: 234,                        |
    |   deletions: 12                          |
    | }                                        |
    +------------------------------------------+

    GET /repos/{owner}/{repo}/pulls/{prNumber}/files
    +------------------------------------------+
    | Response:                                 |
    | [                                        |
    |   {                                      |
    |     filename: "src/auth.ts",             |
    |     status: "modified",                  |
    |     additions: 89,                       |
    |     deletions: 4,                        |
    |     patch: "@@ -10,4 +10,89 @@..."       |
    |   },                                     |
    |   ...                                    |
    | ]                                        |
    +------------------------------------------+

    GET /repos/{owner}/{repo}/contents/{path}?ref={sha}
    +------------------------------------------+
    | Response:                                 |
    | {                                        |
    |   content: "base64-encoded-content",     |
    |   encoding: "base64"                     |
    | }                                        |
    +------------------------------------------+

    POST /repos/{owner}/{repo}/issues/{prNumber}/comments
    +------------------------------------------+
    | Request Body:                             |
    | {                                        |
    |   body: "## Code Review\n\n..."          |
    | }                                        |
    +------------------------------------------+
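
The URL parser at the top of this flow is small enough to sketch in full. A minimal version, assuming only the standard https://github.com/{owner}/{repo}/pull/{number} form (real-world inputs have more variants worth handling):

// src/lib/url-parser.ts - minimal sketch
export interface PRRef {
  owner: string;
  repo: string;
  prNumber: number;
}

export function parsePRUrl(url: string): PRRef {
  // Accepts forms like https://github.com/acme/widget-api/pull/47
  const match = url.match(/github\.com\/([^/]+)\/([^/]+)\/pull\/(\d+)/);
  if (!match) {
    throw new Error(`Not a recognizable GitHub PR URL: ${url}`);
  }
  const [, owner, repo, prNumber] = match;
  return { owner, repo, prNumber: Number(prNumber) };
}
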
Project Structure

review-agent/
|-- package.json
|-- tsconfig.json
|-- .env                      # GITHUB_TOKEN, OPENAI_API_KEY
|-- .env.example
|
|-- src/
|   |-- index.ts              # CLI entry point
|   |-- agent.ts              # Main agent orchestration
|   |
|   |-- tools/
|   |   |-- index.ts          # Tool registry
|   |   |-- github.ts         # fetchPRMetadata, getChangedFiles
|   |   |-- files.ts          # readFile, getDiff
|   |   |-- search.ts         # searchPattern
|   |   |-- review.ts         # generateReview (terminal tool)
|   |
|   |-- lib/
|   |   |-- github-client.ts  # Octokit wrapper
|   |   |-- url-parser.ts     # Parse PR URLs
|   |   |-- formatter.ts      # Format review output
|   |
|   |-- schemas/
|   |   |-- review.ts         # Zod schemas for review structure
|   |   |-- issue.ts          # Schema for individual issues
|   |
|   |-- prompts/
|   |   |-- system.ts         # System prompt for agent
|
|-- tests/
|   |-- unit/
|   |   |-- tools/
|   |   |   |-- github.test.ts
|   |   |   |-- files.test.ts
|   |   |   |-- search.test.ts
|   |
|   |-- integration/
|   |   |-- agent.test.ts     # Full agent tests with mocked LLM
|   |
|   |-- fixtures/
|       |-- sample-pr.json    # Sample PR metadata
|       |-- sample-files/     # Sample files to review
|
|-- README.md

Phased Implementation Guide

Phase 1: Foundation (Days 1-2)

Goal: Get a minimal agent loop working with one tool.

Milestone: Agent calls readFile and returns file contents.

// src/index.ts - Minimal viable agent
import { generateText, tool } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';
import * as fs from 'fs/promises';

const readFile = tool({
  description: 'Read a file from the local filesystem',
  parameters: z.object({
    path: z.string().describe('Path to the file')
  }),
  execute: async ({ path }) => {
    try {
      return await fs.readFile(path, 'utf-8');
    } catch (error) {
      return `Error reading file: ${error}`;
    }
  }
});

async function main() {
  const { text, steps } = await generateText({
    model: openai('gpt-4'),
    tools: { readFile },
    maxSteps: 3,  // without this, the loop stops after the first tool call
    prompt: 'Read the file package.json and tell me the project name'
  });

  console.log('Steps taken:', steps.length);
  console.log('Result:', text);
}

main();

Checklist:

  • Project initialized with TypeScript
  • AI SDK and OpenAI provider installed
  • Single tool defined and working
  • Agent successfully calls tool and uses result

Phase 2: Tool Suite (Days 3-5)

Goal: Implement all tools needed for code review.

Milestone: Agent can fetch PR metadata, read files, and search patterns.

Tasks:

  1. Set up GitHub API client (Octokit)
  2. Implement fetchPRMetadata tool
  3. Implement getChangedFiles tool
  4. Implement searchPattern tool
  5. Implement getDiff tool
  6. Add URL parser for PR URLs
// src/tools/github.ts
import { Octokit } from '@octokit/rest';
import { tool } from 'ai';
import { z } from 'zod';

const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });

export const fetchPRMetadata = tool({
  description: 'Fetch metadata for a GitHub Pull Request including title, ' +
               'description, and statistics. Use this first to understand ' +
               'what the PR is about.',
  parameters: z.object({
    owner: z.string().describe('Repository owner (user or org)'),
    repo: z.string().describe('Repository name'),
    prNumber: z.number().describe('Pull request number')
  }),
  execute: async ({ owner, repo, prNumber }) => {
    const { data } = await octokit.pulls.get({ owner, repo, pull_number: prNumber });
    return {
      title: data.title,
      body: data.body,
      changedFiles: data.changed_files,
      additions: data.additions,
      deletions: data.deletions,
      headSha: data.head.sha,
      baseSha: data.base.sha
    };
  }
});

export const getChangedFiles = tool({
  description: 'Get the list of files changed in a Pull Request. ' +
               'Returns filenames, status (added/modified/deleted), ' +
               'and line change counts.',
  parameters: z.object({
    owner: z.string(),
    repo: z.string(),
    prNumber: z.number()
  }),
  execute: async ({ owner, repo, prNumber }) => {
    const { data } = await octokit.pulls.listFiles({
      owner,
      repo,
      pull_number: prNumber
    });
    return data.map(f => ({
      filename: f.filename,
      status: f.status,
      additions: f.additions,
      deletions: f.deletions
    }));
  }
});
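
searchPattern from the task list is not shown above. A minimal sketch that shells out to ripgrep, as the architecture diagram suggests; it assumes `rg` is on the PATH and the PR branch is checked out locally:

// src/tools/search.ts - minimal sketch; assumes ripgrep (`rg`) is installed
import { tool } from 'ai';
import { z } from 'zod';
import { execFile } from 'child_process';
import { promisify } from 'util';

const execFileAsync = promisify(execFile);

export const searchPattern = tool({
  description: 'Search all files in the checked-out repository for a regex pattern. ' +
               'Returns matching lines with file paths and line numbers. ' +
               'Use this to find security-sensitive patterns or code smells.',
  parameters: z.object({
    pattern: z.string().describe('Regex pattern, e.g., "console\\.log.*password"'),
    glob: z.string().optional().describe('Optional file glob, e.g., "*.ts"')
  }),
  execute: async ({ pattern, glob }) => {
    const args = ['--line-number', '--no-heading', pattern];
    if (glob) args.push('--glob', glob);
    try {
      const { stdout } = await execFileAsync('rg', args);
      // Cap the output so one search cannot flood the context window
      return stdout.split('\n').filter(Boolean).slice(0, 50).join('\n');
    } catch {
      // rg exits non-zero when nothing matches; report that as data
      return `No matches found for pattern: ${pattern}`;
    }
  }
});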

Checklist:

  • GitHub client configured with authentication
  • All 5 tools implemented and tested individually
  • URL parser extracts owner/repo/PR from various URL formats
  • Error handling for API failures

Phase 3: Agent Loop (Days 6-8)

Goal: Wire tools into agent with proper termination.

Milestone: Agent autonomously reviews a PR and produces structured output.

Tasks:

  1. Create system prompt for code review
  2. Implement generateReview terminal tool
  3. Configure stopWhen with hasToolCall
  4. Add onStepFinish for progress logging
  5. Add maxSteps safety limit
// src/agent.ts
import { generateText, hasToolCall } from 'ai';
import { openai } from '@ai-sdk/openai';
import { tools } from './tools';
import { systemPrompt } from './prompts/system';
import { parsePRUrl } from './lib/url-parser';

export async function reviewPR(prUrl: string) {
  const { owner, repo, prNumber } = parsePRUrl(prUrl);

  const result = await generateText({
    model: openai('gpt-4'),
    system: systemPrompt,
    tools,
    maxSteps: 15,
    stopWhen: hasToolCall('generateReview'),
    onStepFinish: ({ stepType, toolCalls, text }) => {
      console.log(`[Step] ${stepType}`);
      if (toolCalls) {
        for (const call of toolCalls) {
          console.log(`  Tool: ${call.toolName}(${JSON.stringify(call.args)})`);
        }
      }
      if (text) {
        console.log(`  Reasoning: ${text.slice(0, 100)}...`);
      }
    },
    prompt: `Review Pull Request #${prNumber} from ${owner}/${repo}.
             Start by fetching the PR metadata.`
  });

  // Extract the review from the generateReview tool call
  const reviewStep = result.steps.find(s =>
    s.toolCalls?.some(tc => tc.toolName === 'generateReview')
  );

  return reviewStep?.toolCalls?.find(tc => tc.toolName === 'generateReview')?.args;
}
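
The generateReview terminal tool and reviewSchema referenced above are not shown. A sketch of both; the review shape here is an assumption and should match whatever your formatter expects:

// src/tools/review.ts - sketch; the review shape is illustrative
import { tool } from 'ai';
import { z } from 'zod';

const issueSchema = z.object({
  file: z.string().describe('Path and line, e.g., "src/auth.ts:34"'),
  severity: z.enum(['critical', 'high', 'medium', 'low']),
  title: z.string(),
  detail: z.string().describe('What is wrong and how to fix it')
});

export const reviewSchema = z.object({
  securityIssues: z.array(issueSchema),
  codeQuality: z.array(issueSchema),
  suggestions: z.array(issueSchema),
  summary: z.string(),
  recommendation: z.enum(['approve', 'request-changes', 'comment'])
});

export const generateReview = tool({
  description: 'Produce the final structured code review. ' +
               'Call this exactly once, when you have gathered enough information. ' +
               'Do not call any other tools after it.',
  parameters: reviewSchema,
  // The arguments ARE the review; echo them back so the orchestrator can extract them
  execute: async (review) => review
});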

Checklist:

  • System prompt guides agent behavior
  • Agent calls tools in logical sequence
  • Progress visible in terminal
  • Agent terminates when calling generateReview
  • Structured review extracted from result

Phase 4: Polish and CLI (Days 9-11)

Goal: Production-ready CLI with formatting and GitHub posting.

Milestone: Full CLI experience as shown in Real World Outcome.

Tasks:

  1. Build CLI interface with commander.js
  2. Format review output with colors (chalk)
  3. Save review to markdown file
  4. Implement GitHub comment posting
  5. Add interactive confirmation prompts
  6. Handle edge cases (large PRs, private repos)
// src/index.ts
import { Command } from 'commander';
import chalk from 'chalk';
import * as fs from 'fs/promises';
import { reviewPR } from './agent';
import { postReviewComment } from './lib/github-client';
import { formatReview } from './lib/formatter';

const program = new Command();

program
  .name('review-agent')
  .description('AI-powered code review agent')
  .argument('<pr-url>', 'GitHub PR URL')
  .option('-o, --output <file>', 'Save review to file')
  .option('-p, --post', 'Post review as GitHub comment')
  .action(async (prUrl, options) => {
    console.log(chalk.blue(' Code Review Agent initialized'));

    const review = await reviewPR(prUrl);
    const formatted = formatReview(review);

    console.log(formatted);

    if (options.output) {
      await fs.writeFile(options.output, formatted);
      console.log(chalk.green(` Saved to ${options.output}`));
    }

    if (options.post) {
      const url = await postReviewComment(prUrl, formatted);
      console.log(chalk.green(` Posted: ${url}`));
    }
  });

program.parse();
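
postReviewComment, imported at the top of this file, wraps the issues-comment endpoint from the API diagram. A minimal sketch that takes pre-formatted markdown as the body and reuses the parsePRUrl sketch from earlier:

// src/lib/github-client.ts - minimal sketch
import { Octokit } from '@octokit/rest';
import { parsePRUrl } from './url-parser';

const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });

export async function postReviewComment(prUrl: string, body: string): Promise<string> {
  const { owner, repo, prNumber } = parsePRUrl(prUrl);
  // A PR is an issue for commenting purposes, hence issues.createComment
  const { data } = await octokit.issues.createComment({
    owner,
    repo,
    issue_number: prNumber,
    body
  });
  return data.html_url;
}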

Checklist:

  • CLI parses arguments correctly
  • Output is formatted and colored
  • Review saved to file when requested
  • GitHub posting works with confirmation
  • Error messages are helpful

Phase 5: Robustness (Days 12-14)

Goal: Handle edge cases and improve reliability.

Milestone: Agent gracefully handles failures and large PRs.

Tasks:

  1. Add retry logic for transient failures (see the sketch after this list)
  2. Implement context windowing for large PRs
  3. Add rate limiting for GitHub API
  4. Handle binary files gracefully
  5. Add timeout handling
  6. Write comprehensive tests
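
For task 1, a self-contained retry helper with exponential backoff might look like this (a sketch; tune the attempt count and delays to your needs):

// Hypothetical helper: retry a flaky async call with exponential backoff.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 500
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      // Backoff schedule: 500ms, 1000ms, 2000ms, ...
      await new Promise(resolve => setTimeout(resolve, baseDelayMs * 2 ** i));
    }
  }
  throw lastError;
}

// Usage inside a tool's execute (the octokit call is illustrative):
// const { data } = await withRetry(() =>
//   octokit.pulls.get({ owner, repo, pull_number: prNumber })
// );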

Checklist:

  • Retries on transient API failures
  • Large PRs handled without context overflow
  • Rate limits respected
  • Binary files skipped with message
  • Operations timeout after reasonable period
  • Tests cover happy path and error cases

Testing Strategy

Testing AI Agents: The Challenge

Testing agents is uniquely challenging because:

  1. LLM responses are non-deterministic
  2. Tool orchestration is dynamic
  3. External APIs add complexity
  4. The agent makes autonomous decisions

Mocking LLM Responses

Create predictable test scenarios by mocking the model:

// tests/mocks/model.ts
import { MockLanguageModelV1 } from 'ai/test';

export function createMockModel(responses: string[]) {
  let callIndex = 0;

  return new MockLanguageModelV1({
    doGenerate: async () => {
      const response = responses[callIndex++];
      // Test fixtures encode tool calls as "TOOL:<name>:<json-args>"
      if (response.startsWith('TOOL:')) {
        const [, toolName, args] = response.match(/TOOL:(\w+):(.+)/)!;
        return {
          finishReason: 'tool-calls' as const,
          usage: { promptTokens: 0, completionTokens: 0 },
          rawCall: { rawPrompt: null, rawSettings: {} },
          toolCalls: [{
            toolCallType: 'function' as const,
            toolCallId: `call-${callIndex}`,
            toolName,
            args  // the model protocol passes args as a JSON string
          }]
        };
      }
      return {
        finishReason: 'stop' as const,
        usage: { promptTokens: 0, completionTokens: 0 },
        rawCall: { rawPrompt: null, rawSettings: {} },
        text: response
      };
    }
  });
}

// tests/integration/agent.test.ts
import { createMockModel } from '../mocks/model';

describe('Code Review Agent', () => {
  it('calls fetchPRMetadata first', async () => {
    const mockModel = createMockModel([
      'TOOL:fetchPRMetadata:{"owner":"acme","repo":"api","prNumber":1}'
    ]);

    const result = await reviewPR('https://github.com/acme/api/pull/1', {
      model: mockModel
    });

    expect(result.steps[0].toolCalls[0].toolName).toBe('fetchPRMetadata');
  });
});

Testing Tool Execution

Test tools in isolation from the LLM:

// tests/unit/tools/github.test.ts
import { fetchPRMetadata } from '../../../src/tools/github';
import nock from 'nock';

describe('fetchPRMetadata', () => {
  beforeEach(() => {
    nock('https://api.github.com')
      .get('/repos/acme/api/pulls/47')
      .reply(200, {
        title: 'Add auth',
        body: 'Description',
        changed_files: 5,
        additions: 100,
        deletions: 10,
        head: { sha: 'abc123' },
        base: { sha: 'def456' }
      });
  });

  afterEach(() => {
    nock.cleanAll();
  });

  it('returns structured PR metadata', async () => {
    const result = await fetchPRMetadata.execute({
      owner: 'acme',
      repo: 'api',
      prNumber: 47
    });

    expect(result).toEqual({
      title: 'Add auth',
      body: 'Description',
      changedFiles: 5,
      additions: 100,
      deletions: 10,
      headSha: 'abc123',
      baseSha: 'def456'
    });
  });

  it('handles API errors gracefully', async () => {
    nock.cleanAll();
    nock('https://api.github.com')
      .get('/repos/acme/api/pulls/999')
      .reply(404, { message: 'Not Found' });

    await expect(fetchPRMetadata.execute({
      owner: 'acme',
      repo: 'api',
      prNumber: 999
    })).rejects.toThrow(/Not Found/);
  });
});

Integration Testing with Real LLM

For integration tests, use deterministic prompts and validate behavior:

// tests/integration/agent-real.test.ts
describe('Agent Integration (Real LLM)', () => {
  // Skip in CI, run manually
  it.skip('reviews a real PR end-to-end', async () => {
    const review = await reviewPR(
      'https://github.com/your-test-repo/pull/1'
    );

    // Validate structure, not exact content
    expect(review).toHaveProperty('securityIssues');
    expect(review).toHaveProperty('codeQuality');
    expect(review).toHaveProperty('recommendation');
    expect(['approve', 'request-changes', 'comment'])
      .toContain(review.recommendation);
  });
});

Test Matrix

+---------------------+---------------+---------------+---------------+
|    Test Type        | Determinism   | Speed         | Coverage      |
+---------------------+---------------+---------------+---------------+
| Unit (tools)        | Deterministic | Fast (ms)     | Tool logic    |
| Unit (mocked LLM)   | Deterministic | Fast (ms)     | Orchestration |
| Integration (mock)  | Deterministic | Medium (s)    | Full flow     |
| Integration (real)  | Non-determ    | Slow (10s+)   | E2E behavior  |
+---------------------+---------------+---------------+---------------+

Common Pitfalls and Debugging

Pitfall 1: Poor Tool Descriptions

Symptom: LLM calls wrong tool or ignores available tools.

Cause: Tool descriptions don’t clearly explain when to use them.

Solution: Write descriptions as if explaining to a new team member.

// BAD
description: 'Read file'

// GOOD
description: 'Read the contents of a source code file from the repository. ' +
             'Use this when you need to examine implementation details or ' +
             'understand code structure. Returns the full file contents as text. ' +
             'For very large files, consider using searchPattern first.'

Pitfall 2: Context Overflow

Symptom: Agent crashes with “context length exceeded” or responses become confused.

Cause: Tool results accumulate without summarization.

Solution: Implement context management with prepareStep or limit tool output size.

const readFile = tool({
  description: 'Read file contents',
  parameters: z.object({ path: z.string() }),
  execute: async ({ path }) => {
    const content = await fs.readFile(path, 'utf-8');
    const lines = content.split('\n');

    // Limit returned content
    if (lines.length > 100) {
      return `[First 100 lines of ${lines.length} total]\n` +
             lines.slice(0, 100).join('\n') +
             '\n...[truncated]';
    }
    return content;
  }
});

Pitfall 3: Missing Stop Condition

Symptom: Agent runs forever or hits maxSteps without meaningful result.

Cause: No clear termination condition or LLM doesn’t understand when to stop.

Solution: Use explicit terminal tool with clear description.

const generateReview = tool({
  description: 'Generate the final code review. ' +
               'IMPORTANT: Call this tool when you have gathered enough ' +
               'information to write a complete review. Do not call any ' +
               'other tools after this.',
  parameters: reviewSchema,
  execute: async (review) => review
});

// In agent:
stopWhen: hasToolCall('generateReview')

Pitfall 4: No Error Recovery

Symptom: Agent crashes when GitHub API returns error.

Cause: Tools don’t handle errors gracefully.

Solution: Return errors as data, let LLM decide next action.

const readFile = tool({
  description: 'Read file. Returns error message if file not found.',
  parameters: z.object({ path: z.string() }),
  execute: async ({ path }) => {
    try {
      return await fs.readFile(path, 'utf-8');
    } catch (error) {
      // Return error as data, don't throw
      const message = error instanceof Error ? error.message : String(error);
      return `ERROR: Could not read file ${path}: ${message}`;
    }
  }
});

This allows the LLM to reason: “The file doesn’t exist, let me try a different approach.”

Pitfall 5: Non-Deterministic Testing

Symptom: Tests pass/fail randomly.

Cause: Testing with real LLM without controlling for non-determinism.

Solution: Mock LLM for deterministic tests, use behavioral assertions for real LLM tests.

// DON'T assert exact output
expect(review.summary).toBe('This PR has security issues');

// DO assert structure and reasonable behavior
expect(review.summary).toBeDefined();
expect(review.summary.length).toBeGreaterThan(20);

Pitfall 6: Forgetting Rate Limits

Symptom: Agent works for small PRs but fails on larger ones.

Cause: GitHub API rate limiting kicks in.

Solution: Implement rate limiting and caching.

import Bottleneck from 'bottleneck';
import { Octokit } from '@octokit/rest';
const limiter = new Bottleneck({
  minTime: 100,  // Minimum 100ms between requests
  maxConcurrent: 3
});

const octokit = new Octokit({
  auth: process.env.GITHUB_TOKEN,
  request: {
    fetch: async (url, options) => {
      return limiter.schedule(() => fetch(url, options));
    }
  }
});

Pitfall 7: Ignoring onStepFinish

Symptom: No visibility into what agent is doing; hard to debug.

Cause: No observability hooks implemented.

Solution: Always use onStepFinish for logging.

const result = await generateText({
  model,
  tools,
  onStepFinish: ({ stepType, toolCalls, text, usage }) => {
    console.log(`[${new Date().toISOString()}] Step: ${stepType}`);
    console.log(`  Tokens: ${usage?.totalTokens}`);

    if (toolCalls?.length) {
      for (const call of toolCalls) {
        console.log(`  Tool: ${call.toolName}`);
        console.log(`  Args: ${JSON.stringify(call.args, null, 2)}`);
      }
    }

    if (text) {
      console.log(`  Reasoning: ${text.substring(0, 200)}`);
    }
  },
  prompt: '...'
});

Extensions and Challenges

Extension 1: Multi-Language Support

Challenge: Extend the agent to review code in multiple languages with language-specific rules.

Implementation Ideas:

  • Add language detection tool
  • Create language-specific prompt templates
  • Implement language-specific pattern searches (e.g., Go’s error handling, Python’s type hints)
const detectLanguage = tool({
  description: 'Detect the primary programming language of a file',
  parameters: z.object({ path: z.string() }),
  execute: async ({ path }) => {
    const ext = path.split('.').pop() ?? '';
    const languageMap: Record<string, string> = {
      'ts': 'TypeScript',
      'tsx': 'TypeScript/React',
      'js': 'JavaScript',
      'py': 'Python',
      'go': 'Go',
      'rs': 'Rust'
    };
    return languageMap[ext] ?? 'Unknown';
  }
  }
});

Extension 2: Learning from Feedback

Challenge: Allow users to rate reviews and use that feedback to improve future reviews.

Implementation Ideas:

  • Store reviews and their ratings in a database
  • Include highly-rated past reviews as few-shot examples
  • Use RAG to retrieve relevant past reviews
USER FEEDBACK LOOP

  Agent Review -> User Rating -> Store in DB
       ^                              |
       |                              v
       +-------- RAG Retrieval -------+

  "For this authentication PR, here are examples
   of highly-rated reviews on similar PRs..."

Extension 3: CI/CD Integration

Challenge: Run the agent automatically on every PR via GitHub Actions.

Implementation Ideas:

  • Create GitHub Action workflow
  • Run agent on PR open/update events
  • Post review as PR check or comment
  • Handle concurrent executions
# .github/workflows/ai-review.yml
name: AI Code Review
on:
  pull_request:
    types: [opened, synchronize]

jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run AI Review
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          npx review-agent ${{ github.event.pull_request.html_url }} --post

Extension 4: Diff-Aware Review

Challenge: Focus the review only on changed lines, not entire files.

Implementation Ideas:

  • Parse unified diff format
  • Extract only changed lines with context
  • Generate line-specific comments that map to diff hunks
  • Use GitHub’s review API for inline comments
const getDiffContext = tool({
  description: 'Get the specific lines changed in a file with surrounding context',
  parameters: z.object({
    path: z.string(),
    contextLines: z.number().default(3)
  }),
  execute: async ({ path, contextLines }) => {
    // getDiff and parseDiff are helpers you would implement yourself
    // (see the hunk-header sketch after this block for a starting point)
    const diff = await getDiff(path);
    const hunks = parseDiff(diff);

    return hunks.map(hunk => ({
      startLine: hunk.newStart,
      // newLines is the hunk's line count, not its content
      endLine: hunk.newStart + hunk.newLines,
      oldContent: hunk.oldContent,
      newContent: hunk.newContent,
      context: hunk.context
    }));
  }
});
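
The parseDiff helper above is hypothetical. The hunk headers of a unified diff are easy to extract with a regex; a minimal sketch that pulls only the line ranges (when a count is omitted from the header, it defaults to 1):

// Minimal unified-diff hunk-header parser (illustrative; ignores hunk bodies)
interface HunkRange {
  oldStart: number;
  oldLines: number;
  newStart: number;
  newLines: number;
}

function parseHunkHeaders(diff: string): HunkRange[] {
  const headerPattern = /^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@/gm;
  const hunks: HunkRange[] = [];
  for (const m of diff.matchAll(headerPattern)) {
    hunks.push({
      oldStart: Number(m[1]),
      oldLines: Number(m[2] ?? 1),
      newStart: Number(m[3]),
      newLines: Number(m[4] ?? 1)
    });
  }
  return hunks;
}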

Resources

Books

| Book | Relevant Chapters | What You’ll Learn |
|------|-------------------|-------------------|
| “Artificial Intelligence: A Modern Approach” by Russell & Norvig | Ch. 2: Intelligent Agents | Deep theoretical foundation for agents, PEAS framework, agent types |
| “Programming TypeScript” by Boris Cherny | Ch. 4: Functions, Ch. 7: Error Handling | Type-safe function design, error handling patterns for tools |
| “Release It!, 2nd Edition” by Michael Nygard | Ch. 5: Stability Patterns | Circuit breakers, timeouts, retry logic for resilient agents |
| “Command-Line Rust” by Ken Youens-Clark | Ch. 1-3 | CLI design patterns (applicable to any language) |
| “Designing Data-Intensive Applications” by Martin Kleppmann | Ch. 1-2 | Thinking about reliability and maintainability |

Papers

  • “ReAct: Synergizing Reasoning and Acting in Language Models” by Yao et al. - The foundational paper on ReAct agents
  • “Toolformer: Language Models Can Teach Themselves to Use Tools” - Understanding how LLMs learn tool use

Videos and Courses

  • AI SDK Official YouTube tutorials
  • “Building AI Agents” series on LangChain’s YouTube channel
  • “Prompt Engineering for Tool Use” - Anthropic’s documentation

Suggested Reading Order
  1. AI SDK Tools Docs (30 min) - Understand tool definition syntax
  2. AI SDK Agents Docs (30 min) - Understand stopWhen and loop control
  3. Russell & Norvig Ch. 2 (1-2 hours) - Deep mental model for agents
  4. GitHub Pull Requests API (30 min) - Understand the data you’ll work with
  5. Cherny Ch. 7 (1 hour) - TypeScript error handling for robust tools
  6. Nygard Ch. 5 (1 hour) - Stability patterns for production readiness
  7. Start coding!

Self-Assessment Checklist

Use this checklist to verify your understanding before considering the project complete:

Conceptual Understanding

  • Can you explain the difference between an LLM call and an AI agent?
    • Agent: LLM in a loop that can take actions via tools
    • LLM call: Single request/response without actions
  • Can you draw the agent loop from memory?
    • Perceive (context) -> Decide (LLM reasoning) -> Act (tool or respond) -> Loop
  • Can you explain how the LLM “sees” your tool definitions?
    • Tools are converted to JSON schema with name, description, parameters
    • Description is prompt engineering for tool selection
  • Can you explain what stopWhen does and when to use it?
    • Checks each step to determine if loop should terminate
    • Use with terminal tools like generateReview
  • Can you explain the ReAct pattern?
    • Reasoning + Acting: LLM explicitly reasons before each action
    • Thought -> Action -> Observation cycle

Implementation Skills

  • Can you implement a tool with proper description and parameters?
    • Description explains when/why to use
    • Zod schema with .describe() for each parameter
    • Execute function handles errors gracefully
  • Can you set up an agent loop with proper termination?
    • maxSteps for safety limit
    • stopWhen for goal-based termination
    • Terminal tool that signals completion
  • Can you implement onStepFinish for observability?
    • Log step type, tool calls, reasoning
    • Track token usage
  • Can you handle context growth in long-running agents?
    • Limit tool output size
    • Use prepareStep for summarization
    • Prioritize relevant information
  • Can you integrate with the GitHub API?
    • Fetch PR metadata
    • Read file contents at specific commits
    • Post comments

Testing and Debugging

  • Can you test tools in isolation?
    • Mock external APIs with nock
    • Test success and error cases
  • Can you test agent orchestration with mocked LLM?
    • MockLanguageModelV1 for deterministic responses
    • Verify tool call sequence
  • Can you debug an agent that isn’t calling the right tools?
    • Check tool descriptions
    • Verify prompt clarity
    • Use onStepFinish to trace decisions
  • Can you handle tool failures gracefully?
    • Return errors as data, don’t throw
    • Let LLM decide recovery strategy

Production Readiness

  • Does your agent have proper error handling?
    • API failures don’t crash the agent
    • User sees helpful error messages
  • Does your agent handle edge cases?
    • Large PRs with many files
    • Binary files
    • Private repositories
    • Rate limiting
  • Is your agent observable?
    • Progress logged in real-time
    • Token usage tracked
    • Execution time measured

Summary

Building a code review agent teaches you the fundamental patterns of AI agent development:

  1. Tool Definition: How to give capabilities to an LLM through well-designed tools
  2. Agent Loop: How the perceive-decide-act cycle works in practice
  3. Context Management: How to handle growing conversation state
  4. Termination: How to know when an agent should stop
  5. Observability: How to understand what an agent is doing
  6. Resilience: How to build agents that gracefully handle failures

This project bridges the gap between “AI generates text” and “AI takes actions.” You’re not just building a code reviewer - you’re learning patterns that apply to any autonomous AI system: research agents, data analysis agents, customer support agents, and beyond.

The skills you develop here - designing tool interfaces, managing context, handling non-determinism, testing autonomous systems - are increasingly valuable as AI agents become central to software development.

Next Steps After Completion:

  • Project 4: Multi-Provider Model Router (apply tool patterns to API routing)
  • Project 5: Semantic Search Pipeline (combine agents with embeddings)
  • Project 6: Real-time AI Dashboard (agents in streaming contexts)

This guide is part of the AI SDK Learning Projects series. For the full project list, see AI_SDK_LEARNING_PROJECTS.md.