Project 3: Code Review Agent with Tool Calling
Deep Dive Guide | AI SDK Learning Projects | Estimated Time: 1-2 weeks
Table of Contents
- Learning Objectives
- Deep Theoretical Foundation
- Complete Project Specification
- Real World Outcome
- Solution Architecture
- Phased Implementation Guide
- Testing Strategy
- Common Pitfalls and Debugging
- Extensions and Challenges
- Resources
- Self-Assessment Checklist
Learning Objectives
By completing this project, you will master:
- Agent Architecture: Understanding how LLM agents differ from simple LLM calls, including the perception-action loop, tool invocation, and autonomous decision-making
- Tool Definition: Creating well-designed tools using the AI SDK's tool() function with proper descriptions that guide LLM behavior
- Agent Loop Control: Implementing stopWhen, maxSteps, and onStepFinish to control and observe agent execution
- Context Management: Handling growing conversation context as tools return data, preventing context overflow
- The ReAct Pattern: Implementing the Reasoning + Acting paradigm where the LLM reasons about its next step before taking action
- External API Integration: Connecting agents to real-world APIs (GitHub) for practical utility
- Error Recovery: Building resilient agents that gracefully handle tool failures and API errors
Deep Theoretical Foundation
What is an AI Agent?
An AI agent is fundamentally different from a simple LLM call. While a single LLM call is like asking a question and receiving an answer, an agent is like having an assistant who can actually do things in the world.
Russell & Norvigâs Definition (from âArtificial Intelligence: A Modern Approachâ, Ch. 2):
âAn agent is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators.â
For LLM agents, this translates to:
- Sensors: The information provided to the LLM (prompts, tool results, context)
- Actuators: The tools the agent can invoke (API calls, file operations, calculations)
- Environment: The external world (GitHub, file systems, databases)
THE AGENT PARADIGM
+-----------------+ +-----------------+
| Environment | | LLM Agent |
| | | |
| - GitHub API |<--------| Actuators: |
| - File System | Actions | - fetchPR() |
| - Databases | | - readFile() |
| | | - searchCode() |
| |-------->| |
| | Percepts| Sensors: |
| | | - Tool Results |
| | | - Error Msgs |
+-----------------+ +-----------------+
|
v
+-----------------+
| Decision Logic |
| (The LLM) |
| |
| "What should I |
| do next?" |
+-----------------+
The Agent Loop: Perception-Decision-Action
Every agent operates in a continuous loop until it achieves its goal or reaches a termination condition:
THE AGENT LOOP
+------------------------------------------+
| |
v |
+----------+ +----------+ +----------+ |
| PERCEIVE |---->| DECIDE |---->| ACT |--+
+----------+ +----------+ +----------+
| | |
| | |
Read context LLM reasons Execute tool
from tools about next and capture
and history action result
Detailed Flow:
Step 1: PERCEIVE
+--------------------------------------------------+
| Context includes: |
| - Original user prompt |
| - System instructions |
| - Previous tool calls and their results |
| - Any errors from previous steps |
+--------------------------------------------------+
|
v
Step 2: DECIDE
+--------------------------------------------------+
| LLM evaluates: |
| - "Do I have enough information to complete?" |
| - "Which tool should I call next?" |
| - "What parameters should I pass?" |
| - "Should I stop and respond to the user?" |
+--------------------------------------------------+
|
v
Step 3: ACT
+--------------------------------------------------+
| Options: |
| A) Call a tool with specific parameters |
| B) Generate final text response (terminate) |
| C) Call special "done" tool (terminate) |
+--------------------------------------------------+
|
v
Loop back to PERCEIVE
(unless terminated)
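Before turning to the AI SDK's abstractions, it can help to see this loop as plain code. The sketch below is a conceptual, hand-rolled version of the perceive-decide-act cycle; the callModel function and Decision type are hypothetical stand-ins for a real LLM call, not the SDK's actual implementation:
// Hypothetical shapes standing in for a real LLM call and its tool-call output.
type Decision =
  | { kind: 'tool'; name: string; args: unknown }
  | { kind: 'final'; text: string };

type ToolFn = (args: unknown) => Promise<unknown>;

declare function callModel(messages: string[]): Promise<Decision>; // assumption: provided elsewhere

// A minimal perceive-decide-act loop. Each iteration feeds the growing
// message history (perception) to the model, which either picks a tool
// (action) or produces a final answer (termination).
async function runAgent(
  prompt: string,
  tools: Record<string, ToolFn>,
  maxSteps = 10
): Promise<string> {
  const messages: string[] = [prompt];            // PERCEIVE: context starts with the user prompt
  for (let step = 0; step < maxSteps; step++) {
    const decision = await callModel(messages);   // DECIDE: the LLM reasons over the full history
    if (decision.kind === 'final') {
      return decision.text;                       // terminate: LLM chose to respond
    }
    const result = await tools[decision.name](decision.args);  // ACT: execute the chosen tool
    messages.push(`Tool ${decision.name} returned: ${JSON.stringify(result)}`); // feed result back
  }
  return 'Stopped: maxSteps reached';             // safety limit
}
generateText with tools does essentially this for you, plus message formatting, tool-call parsing, and the stopWhen/maxSteps controls covered below.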
The tool() Function: How LLMs Invoke Tools
The tool() function is how you give capabilities to your agent. It has three critical components:
- Description: Tells the LLM when and why to use this tool
- Parameters Schema: Defines what the tool accepts (using Zod)
- Execute Function: The code that runs when the tool is called
import { tool } from 'ai';
import { z } from 'zod';
const readFile = tool({
// DESCRIPTION: This is prompt engineering!
// The LLM reads this to decide when to call the tool
description: 'Read the contents of a file from the repository. ' +
'Use this when you need to examine source code. ' +
'Returns the full file contents as a string.',
// PARAMETERS: The LLM generates these values
parameters: z.object({
path: z.string().describe('Path to the file, e.g., "src/index.ts"')
}),
// EXECUTE: Your code that runs when the tool is called
execute: async ({ path }) => {
const content = await fs.readFile(path, 'utf-8');
return content;
}
});
How the LLM "Sees" Your Tool:
When you define tools, the AI SDK converts them to a schema that the LLM understands:
YOUR CODE WHAT THE LLM SEES
const readFile = tool({ {
description: 'Read...', "name": "readFile",
parameters: z.object({ "description": "Read the contents
path: z.string() of a file from the repository.
.describe('...') Use this when you need to
}), examine source code...",
execute: async ({ path }) => { "parameters": {
... "type": "object",
} "properties": {
}); "path": {
"type": "string",
"description": "Path..."
}
},
"required": ["path"]
}
}
Tool Descriptions as Prompt Engineering:
The description is the most important part of a tool definition. It's literally prompt engineering that guides the LLM's tool selection:
POOR DESCRIPTION GOOD DESCRIPTION
"Read file" "Read the contents of a source code
file from the repository. Use this
tool when you need to examine
implementation details, understand
code structure, or find specific
patterns. Returns the full file
as a string. For large files
(>500 lines), consider using
searchPattern first to locate
specific areas of interest."
Problems: Benefits:
- LLM doesn't know when - Clear use case
to use it - Explains return value
- No guidance on purpose - Suggests alternatives
- No context about - Helps LLM make decisions
return value
How stopWhen and maxSteps Work
The agent loop needs termination conditions. The AI SDK provides two mechanisms:
stopWhen: A function that examines each step and returns true when the loop should end.
import { generateText, hasToolCall } from 'ai';
const result = await generateText({
model: openai('gpt-4'),
tools: { readFile, searchPattern, generateReview },
// Stop when the agent calls generateReview
stopWhen: hasToolCall('generateReview'),
prompt: 'Review this PR...'
});
How stopWhen Works Internally:
Agent Loop with stopWhen
+---------+ +---------+ +---------+
| Step 1 |---->| Step 2 |---->| Step 3 |
| readFile| | search | |generateR|
+---------+ +---------+ +---------+
| | |
v v v
+----------+ +----------+ +----------+
| stopWhen | | stopWhen | | stopWhen |
| returns | | returns | | returns |
| false | | false | | TRUE! |
+----------+ +----------+ +----------+
| | |
v v v
Continue Continue STOP
|
v
Return result
maxSteps: A safety limit preventing infinite loops.
const result = await generateText({
model: openai('gpt-4'),
tools,
maxSteps: 10, // Absolute maximum iterations
stopWhen: hasToolCall('generateReview'),
prompt: 'Review this PR...'
});
Combined Flow:
For each step:
1. Check: steps >= maxSteps?
|
+-- Yes --> STOP (safety limit)
|
+-- No --> Continue
2. LLM generates response
3. Check: stopWhen(step) === true?
|
+-- Yes --> STOP (goal reached)
|
+-- No --> Execute tools, continue loop
Context Management and Conversation State
As the agent works, context grows. Each tool call adds to the conversation history:
CONTEXT GROWTH ACROSS ITERATIONS
Step 1:
+---------------------------+
| System: "You are a code |
| reviewer..." |
| User: "Review PR #47" |
| Assistant: Call readFile |
| Tool Result: [89 lines] | <-- +89 lines added
+---------------------------+
~200 tokens
Step 2:
+---------------------------+
| [Previous context] |
| Assistant: Call readFile |
| (another file) |
| Tool Result: [156 lines] | <-- +156 lines added
+---------------------------+
~600 tokens
Step 3:
+---------------------------+
| [Previous context] |
| Assistant: Call search |
| Tool Result: [3 matches] | <-- +30 lines added
+---------------------------+
~800 tokens
...
Step N:
+---------------------------+
| [All previous context] |
| |
| RISK: Context exceeds |
| model's context window! |
+---------------------------+
~128,000 tokens?
Managing Context with prepareStep:
The AI SDK allows you to preprocess context before each step:
const result = await generateText({
model: openai('gpt-4'),
tools,
prepareStep: async ({ steps }) => {
// Summarize old steps to reduce context
if (steps.length > 5) {
return {
steps: [
// Keep only summary of old steps
summarizeSteps(steps.slice(0, -2)),
// Keep last 2 steps in full
...steps.slice(-2)
]
};
}
return { steps };
},
prompt: 'Review this PR...'
});
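The summarizeSteps helper above is left undefined. One possible sketch, assuming each step exposes text and toolResults fields as in the example (the exact step shape depends on your SDK version):
// Hypothetical step shape matching the example above; adjust to your SDK version.
interface AgentStep {
  text?: string;
  toolResults?: Array<{ toolName: string; result: unknown }>;
}

// Collapse several old steps into one compact synthetic step so the
// context window holds a short summary instead of full tool outputs.
function summarizeSteps(steps: AgentStep[]): AgentStep {
  const lines = steps.map((step, i) => {
    const tools = (step.toolResults ?? [])
      .map(r => `${r.toolName} -> ${JSON.stringify(r.result).slice(0, 100)}`) // truncate large results
      .join('; ');
    return `Step ${i + 1}: ${step.text?.slice(0, 80) ?? ''} ${tools}`.trim();
  });
  return { text: `Summary of earlier steps:\n${lines.join('\n')}` };
}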
The ReAct Pattern: Reasoning + Acting
ReAct (from the paper "ReAct: Synergizing Reasoning and Acting in Language Models" by Yao et al.) is a paradigm where the LLM explicitly reasons about its actions:
TRADITIONAL AGENT REACT AGENT
User: Review this PR User: Review this PR
| |
v v
Call readFile Thought: "I should first
| read the changed files
v to understand what was
Call searchPattern modified."
| |
v v
Return result Action: readFile
|
v
Observation: [file contents]
|
v
Thought: "I see password
handling. Let me search
for security patterns."
|
v
Action: searchPattern
|
v
... continues with
explicit reasoning
ReAct in AI SDK:
You can encourage ReAct behavior through your system prompt:
const systemPrompt = `
You are a code review agent. For each action you take:
1. THOUGHT: Explain your reasoning for the next action
2. ACTION: Call the appropriate tool
3. OBSERVATION: Analyze the tool result
4. Repeat until you have enough information
Always explain your thinking before acting.
`;
const result = await generateText({
model: openai('gpt-4'),
system: systemPrompt,
tools,
stopWhen: hasToolCall('generateReview'),
prompt: 'Review PR #47'
});
This produces agent traces like:
[Step 1] Thought: "I need to start by fetching the PR metadata
to understand what files were changed."
Action: fetchPRMetadata({prUrl: "..."})
[Step 2] Observation: "5 files changed, mostly in src/auth/"
Thought: "Authentication changes are security-sensitive.
I should read the main auth file first."
Action: readFile({path: "src/auth/middleware.ts"})
[Step 3] Observation: "I see password handling on line 34"
Thought: "Password handling is critical. Let me search
for any console.log statements that might leak credentials."
Action: searchPattern({pattern: "console.log.*password"})
Complete Project Specification
Overview
Build a CLI tool that functions as an autonomous code review agent. Given a GitHub Pull Request URL or local git diff, the agent will:
- Fetch PR metadata and list of changed files
- Autonomously decide which files to read and analyze
- Search for common code issues and security patterns
- Generate a structured code review with line-specific feedback
- Optionally post the review as a GitHub comment
Functional Requirements
| Requirement | Description | Priority |
|---|---|---|
| PR Input | Accept GitHub PR URL (e.g., github.com/org/repo/pull/123) | P0 |
| Local Diff | Accept local git diff as alternative input | P1 |
| File Reading | Read individual files from the PR's head commit | P0 |
| Pattern Search | Search codebase for specific patterns (security, code smells) | P0 |
| Review Generation | Produce structured review with categories and line numbers | P0 |
| GitHub Integration | Post review as PR comment | P1 |
| Progress Logging | Show agent's reasoning and tool calls in real-time | P0 |
| Rate Limiting | Handle GitHub API rate limits gracefully | P1 |
| Large PR Handling | Gracefully handle PRs with many changed files | P1 |
Non-Functional Requirements
| Requirement | Target | Rationale |
|---|---|---|
| Latency | < 30s for typical PR (5-10 files) | User experience |
| Token Efficiency | < 50K tokens per review | Cost control |
| Reliability | Graceful degradation on API failures | Production readiness |
| Observability | Full trace of agent decisions | Debugging |
Tool Definitions
The agent requires these core tools:
+-------------------+----------------------------------------+
| Tool | Purpose |
+-------------------+----------------------------------------+
| fetchPRMetadata | Get PR title, description, file list |
| getChangedFiles | List all files modified in the PR |
| readFile | Read contents of a specific file |
| getDiff | Get unified diff for a file |
| searchPattern | Grep-like search across files |
| generateReview | Produce final structured review |
+-------------------+----------------------------------------+
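The generateReview tool's parameters define the structure the agent must produce. A minimal Zod schema sketch, with field names that are illustrative but chosen to match the sample output below and the recommendation values used later in testing:
// src/schemas/review.ts (sketch; field names are illustrative)
import { z } from 'zod';

export const issueSchema = z.object({
  title: z.string().describe('Short name of the issue'),
  file: z.string().describe('File path, e.g., "src/middleware/auth.ts"'),
  line: z.number().optional().describe('Line number, if applicable'),
  severity: z.enum(['critical', 'high', 'medium', 'low']),
  description: z.string().describe('What the problem is and why it matters'),
  suggestion: z.string().describe('How to fix it')
});

export const reviewSchema = z.object({
  summary: z.string().describe('One-paragraph overview of the PR'),
  securityIssues: z.array(issueSchema),
  codeQuality: z.array(issueSchema),
  suggestions: z.array(issueSchema),
  recommendation: z.enum(['approve', 'request-changes', 'comment'])
});

// Inferred TypeScript type, handy for formatter and GitHub client code.
export type Review = z.infer<typeof reviewSchema>;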
Real World Outcome
When you run the CLI, here's exactly what you'll see in your terminal:
$ review-agent https://github.com/acme/widget-api/pull/47
Code Review Agent initialized
Fetching PR #47 from acme/widget-api...
[Step 1] Tool: fetchPRMetadata
Title: "Add user authentication middleware"
Files changed: 5
Additions: 234, Deletions: 12
[Step 2] Tool: readFile("src/middleware/auth.ts")
Reading 89 lines...
[Step 3] Tool: readFile("src/routes/users.ts")
Reading 156 lines...
[Step 4] Tool: searchPattern("password", "*.ts")
Found 3 matches across 2 files
[Step 5] Tool: readFile("src/utils/crypto.ts")
Reading 45 lines...
[Step 6] Agent reasoning: "I've reviewed the authentication
implementation. I noticed potential security issues with
password handling and a missing rate limiter. Let me
generate the review."
[Step 7] Tool: generateReview
=====================================================================
CODE REVIEW: PR #47
"Add user authentication middleware"
=====================================================================
## Security Issues (2)
### 1. Plaintext Password Logging
**File:** `src/middleware/auth.ts:34`
**Severity:** Critical
console.log(`Login attempt for user: ${email}, password: ${password}`);
                                                          ^^^^^^^^^^^
Issue: Password is logged in plaintext. This exposes credentials in logs.
Fix: Remove the password from the log statement or mask it.
### 2. Missing Rate Limiting
**File:** `src/routes/users.ts:12-45`
**Severity:** High
The /login endpoint has no rate limiting, making it vulnerable to
brute-force attacks.
Recommended: Add rate limiting middleware (e.g., express-rate-limit).
## Code Quality (3)
### 1. Magic Number
**File:** `src/middleware/auth.ts:56`
const TOKEN_EXPIRY = 86400; // What is this number?
Suggestion: Use a named constant: const ONE_DAY_IN_SECONDS = 86400;
### 2. Missing Error Type
**File:** `src/utils/crypto.ts:23`
} catch (e) {
  throw e; // No type narrowing
}
Suggestion: Use catch (e: unknown) and proper error handling.
### 3. Inconsistent Async Pattern
**File:** `src/routes/users.ts:78`
Mixing .then() and async/await. Prefer consistent async/await.
## Suggestions (2)
### 1. Add Input Validation
**File:** `src/routes/users.ts:15`
Consider adding Zod schema validation for the login request body.
### 2. Extract JWT Secret
**File:** `src/middleware/auth.ts:8`
JWT_SECRET should come from environment variables, not hardcoded.
## Summary
| Category | Count |
|---|---|
| Security Issues | 2 |
| Code Quality | 3 |
| Suggestions | 2 |
Overall: This PR introduces authentication but has critical security issues that must be addressed before merging.
Recommendation: Request changes
=====================================================================
Full review saved to: review-pr-47.md
Ready to post as PR comment? [y/N]
If the user confirms, the agent posts the review as a GitHub comment:
y
Posting review to GitHub...
Review posted: https://github.com/acme/widget-api/pull/47#issuecomment-1234567
Done! Agent completed in 12.3s (7 steps, 3 files analyzed)
Solution Architecture
High-Level Architecture
CODE REVIEW AGENT ARCHITECTURE
+------------------------------------------------------------------+
| CLI INTERFACE |
| $ review-agent <pr-url> [--output=file] [--post] |
+------------------------------------------------------------------+
|
v
+------------------------------------------------------------------+
| INPUT PARSER |
| - Extract owner, repo, PR number from URL |
| - Validate GitHub token |
+------------------------------------------------------------------+
|
v
+------------------------------------------------------------------+
| AGENT ORCHESTRATOR |
| |
| +------------------+ +------------------+ |
| | System Prompt | | User Message | |
| | "You are a code | | "Review PR #47 | |
| | reviewer..." | | from repo X" | |
| +------------------+ +------------------+ |
| | | |
| +----------+----------+ |
| | |
| v |
| +--------------------------------------------------+ |
| | generateText() | |
| | | |
| | model: openai('gpt-4') | |
| | tools: { fetchPR, readFile, search, review } | |
| | stopWhen: hasToolCall('generateReview') | |
| | maxSteps: 15 | |
| | onStepFinish: logProgress | |
| +--------------------------------------------------+ |
+------------------------------------------------------------------+
|
v
+------------------------------------------------------------------+
| TOOL REGISTRY |
| |
| +------------+ +------------+ +------------+ +------------+ |
| | fetchPR | | readFile | | search | | getDiff | |
| | Metadata | | | | Pattern | | | |
| +------------+ +------------+ +------------+ +------------+ |
| | | | | |
| +---------------+---------------+---------------+ |
| | |
| v |
| +------------------+ |
| | generateReview | |
| | (Terminal Tool) | |
| +------------------+ |
+------------------------------------------------------------------+
|
v
+------------------------------------------------------------------+
| EXTERNAL SERVICES |
| |
| +------------------+ +------------------+ |
| | GitHub API | | OpenAI API | |
| | | | | |
| | - /pulls/:id | | - chat/complete | |
| | - /contents/:path| | with tools | |
| | - /comments | | | |
| +------------------+ +------------------+ |
+------------------------------------------------------------------+
Tool Registry Design
TOOL REGISTRY PATTERN
+------------------------------------------------------------------+
| tools/index.ts |
| |
| export const tools = { |
| fetchPRMetadata, // from ./github.ts |
| getChangedFiles, // from ./github.ts |
| readFile, // from ./files.ts |
| getDiff, // from ./files.ts |
| searchPattern, // from ./search.ts |
| generateReview, // from ./review.ts (terminal) |
| }; |
+------------------------------------------------------------------+
|
|
+-------------------------+-------------------------+
| | |
v v v
+-----------+ +-----------+ +-----------+
|github.ts | | files.ts | | search.ts |
| | | | | |
|fetchPR | |readFile | |searchPat |
|Metadata() | |() | |tern() |
| | | | | |
|getChanged | |getDiff() | | |
|Files() | | | | |
+-----------+ +-----------+ +-----------+
| | |
v v v
+-----------+ +-----------+ +-----------+
| GitHubAPI | | Node fs | |child_proc |
| (Octokit) | | promises | |exec('rg') |
+-----------+ +-----------+ +-----------+
Agent Loop Internals
AGENT LOOP DETAILED FLOW
generateText() called
|
v
+------+------+
| Initialize |
| context |
| messages=[] |
+-------------+
|
v
+------+------+
| step = 0 |<----------------------------------+
+-------------+ |
| |
v |
+------+------+ |
| step++ | |
+-------------+ |
| |
v |
+------+------+ |
| step > |--Yes--> Return { text, steps } |
| maxSteps? | |
+-------------+ |
|No |
v |
+------+------+ |
| LLM call | |
| with tools | |
| & context | |
+-------------+ |
| |
v |
+------+------+ |
| Response | |
| has tool |--No--> Return { text, steps } |
| calls? | (LLM chose to respond) |
+-------------+ |
|Yes |
v |
+------+------+ |
| stopWhen |--Yes--> Return { text, steps } |
| (toolCall)? | (Goal reached) |
+-------------+ |
|No |
v |
+------+------+ |
| Execute | |
| tool(s) | |
+-------------+ |
| |
v |
+------+------+ |
| onStep | |
| Finish() | |
+-------------+ |
| |
v |
+------+------+ |
| Append to | |
| context: | |
| - assistant | |
| message | |
| - tool |-----------------------------------+
| results |
+-------------+
GitHub API Integration Points
GITHUB API INTEGRATION
PR URL: github.com/acme/widget-api/pull/47
|
v
+------------------------------------------+
| URL PARSER |
| owner = "acme" |
| repo = "widget-api" |
| prNumber = 47 |
+------------------------------------------+
|
+--------------------+--------------------+
| | |
v v v
GET /repos/{owner}/{repo}/pulls/{prNumber}
+------------------------------------------+
| Response: |
| { |
| title: "Add auth middleware", |
| body: "...", |
| head: { sha: "abc123" }, |
| base: { sha: "def456" }, |
| changed_files: 5, |
| additions: 234, |
| deletions: 12 |
| } |
+------------------------------------------+
GET /repos/{owner}/{repo}/pulls/{prNumber}/files
+------------------------------------------+
| Response: |
| [ |
| { |
| filename: "src/auth.ts", |
| status: "modified", |
| additions: 89, |
| deletions: 4, |
| patch: "@@ -10,4 +10,89 @@..." |
| }, |
| ... |
| ] |
+------------------------------------------+
GET /repos/{owner}/{repo}/contents/{path}?ref={sha}
+------------------------------------------+
| Response: |
| { |
| content: "base64-encoded-content", |
| encoding: "base64" |
| } |
+------------------------------------------+
POST /repos/{owner}/{repo}/issues/{prNumber}/comments
+------------------------------------------+
| Request Body: |
| { |
| body: "## Code Review\n\n..." |
| } |
+------------------------------------------+
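The agent's readFile tool reads files from the PR's head commit via the contents endpoint shown above. A sketch using Octokit (error handling kept minimal; assumes the path points at a regular file, not a directory):
// src/tools/files.ts (sketch)
import { Octokit } from '@octokit/rest';
import { tool } from 'ai';
import { z } from 'zod';

const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });

export const readFile = tool({
  description: 'Read the contents of a file from the PR head commit. ' +
               'Use this to examine implementation details of changed files.',
  parameters: z.object({
    owner: z.string(),
    repo: z.string(),
    path: z.string().describe('Path within the repository'),
    ref: z.string().describe('Commit SHA, typically the PR head SHA')
  }),
  execute: async ({ owner, repo, path, ref }) => {
    const { data } = await octokit.repos.getContent({ owner, repo, path, ref });
    // The contents API returns base64 for files; directories return an array.
    if (Array.isArray(data) || !('content' in data)) {
      return `ERROR: ${path} is not a regular file`;
    }
    return Buffer.from(data.content, 'base64').toString('utf-8');
  }
});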
Recommended File Structure
review-agent/
|-- package.json
|-- tsconfig.json
|-- .env # GITHUB_TOKEN, OPENAI_API_KEY
|-- .env.example
|
|-- src/
| |-- index.ts # CLI entry point
| |-- agent.ts # Main agent orchestration
| |
| |-- tools/
| | |-- index.ts # Tool registry
| | |-- github.ts # fetchPRMetadata, getChangedFiles
| | |-- files.ts # readFile, getDiff
| | |-- search.ts # searchPattern
| | |-- review.ts # generateReview (terminal tool)
| |
| |-- lib/
| | |-- github-client.ts # Octokit wrapper
| | |-- url-parser.ts # Parse PR URLs
| | |-- formatter.ts # Format review output
| |
| |-- schemas/
| | |-- review.ts # Zod schemas for review structure
| | |-- issue.ts # Schema for individual issues
| |
| |-- prompts/
| | |-- system.ts # System prompt for agent
|
|-- tests/
| |-- unit/
| | |-- tools/
| | | |-- github.test.ts
| | | |-- files.test.ts
| | | |-- search.test.ts
| |
| |-- integration/
| | |-- agent.test.ts # Full agent tests with mocked LLM
| |
| |-- fixtures/
| |-- sample-pr.json # Sample PR metadata
| |-- sample-files/ # Sample files to review
|
|-- README.md
Phased Implementation Guide
Phase 1: Foundation (Days 1-2)
Goal: Get a minimal agent loop working with one tool.
Milestone: Agent calls readFile and returns file contents.
// src/index.ts - Minimal viable agent
import { generateText, tool } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';
import * as fs from 'fs/promises';
const readFile = tool({
description: 'Read a file from the local filesystem',
parameters: z.object({
path: z.string().describe('Path to the file')
}),
execute: async ({ path }) => {
try {
return await fs.readFile(path, 'utf-8');
} catch (error) {
return `Error reading file: ${error}`;
}
}
});
async function main() {
const { text, steps } = await generateText({
model: openai('gpt-4'),
tools: { readFile },
prompt: 'Read the file package.json and tell me the project name'
});
console.log('Steps taken:', steps.length);
console.log('Result:', text);
}
main();
Checklist:
- Project initialized with TypeScript
- AI SDK and OpenAI provider installed
- Single tool defined and working
- Agent successfully calls tool and uses result
Phase 2: Tool Suite (Days 3-5)
Goal: Implement all tools needed for code review.
Milestone: Agent can fetch PR metadata, read files, and search patterns.
Tasks:
- Set up GitHub API client (Octokit)
- Implement fetchPRMetadata tool
- Implement getChangedFiles tool
- Implement searchPattern tool (see the sketch after the code below)
- Implement getDiff tool
- Add URL parser for PR URLs (see the sketch after the checklist)
// src/tools/github.ts
import { Octokit } from '@octokit/rest';
import { tool } from 'ai';
import { z } from 'zod';
const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });
export const fetchPRMetadata = tool({
description: 'Fetch metadata for a GitHub Pull Request including title, ' +
'description, and statistics. Use this first to understand ' +
'what the PR is about.',
parameters: z.object({
owner: z.string().describe('Repository owner (user or org)'),
repo: z.string().describe('Repository name'),
prNumber: z.number().describe('Pull request number')
}),
execute: async ({ owner, repo, prNumber }) => {
const { data } = await octokit.pulls.get({ owner, repo, pull_number: prNumber });
return {
title: data.title,
body: data.body,
changedFiles: data.changed_files,
additions: data.additions,
deletions: data.deletions,
headSha: data.head.sha,
baseSha: data.base.sha
};
}
});
export const getChangedFiles = tool({
description: 'Get the list of files changed in a Pull Request. ' +
'Returns filenames, status (added/modified/deleted), ' +
'and line change counts.',
parameters: z.object({
owner: z.string(),
repo: z.string(),
prNumber: z.number()
}),
execute: async ({ owner, repo, prNumber }) => {
const { data } = await octokit.pulls.listFiles({
owner,
repo,
pull_number: prNumber
});
return data.map(f => ({
filename: f.filename,
status: f.status,
additions: f.additions,
deletions: f.deletions
}));
}
});
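The searchPattern tool (from ./search.ts) is routed through ripgrep in the tool registry diagram. A minimal sketch, assuming ripgrep (rg) is installed and the PR's files are checked out locally:
// src/tools/search.ts (sketch; assumes ripgrep is installed and the repo is checked out locally)
import { tool } from 'ai';
import { z } from 'zod';
import { execFile } from 'child_process';
import { promisify } from 'util';

const exec = promisify(execFile);

export const searchPattern = tool({
  description: 'Search the repository for a regex pattern. ' +
               'Use this to find security-sensitive code (e.g., logged secrets) or repeated ' +
               'code smells. Returns matching lines with file names and line numbers.',
  parameters: z.object({
    pattern: z.string().describe('Regex pattern, e.g., "console.log.*password"'),
    glob: z.string().default('*.ts').describe('File glob to limit the search')
  }),
  execute: async ({ pattern, glob }) => {
    try {
      // -n: line numbers, -g: glob filter, --max-count: cap matches per file
      const { stdout } = await exec('rg', ['-n', '--max-count', '20', '-g', glob, pattern]);
      return stdout || 'No matches found';
    } catch {
      // rg exits with a non-zero code when there are no matches; report that as data
      return 'No matches found';
    }
  }
});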
Checklist:
- GitHub client configured with authentication
- All 5 tools implemented and tested individually
- URL parser extracts owner/repo/PR from various URL formats
- Error handling for API failures
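The URL parser (lib/url-parser.ts) only needs to pull owner, repo, and PR number out of the common URL form; a minimal sketch:
// src/lib/url-parser.ts (sketch)
export interface PRRef {
  owner: string;
  repo: string;
  prNumber: number;
}

// Accepts forms like https://github.com/acme/widget-api/pull/47
// (with or without protocol or trailing path segments).
export function parsePRUrl(url: string): PRRef {
  const match = url.match(/github\.com\/([^/]+)\/([^/]+)\/pull\/(\d+)/);
  if (!match) {
    throw new Error(`Could not parse PR URL: ${url}`);
  }
  return { owner: match[1], repo: match[2], prNumber: Number(match[3]) };
}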
Phase 3: Agent Loop (Days 6-8)
Goal: Wire tools into agent with proper termination.
Milestone: Agent autonomously reviews a PR and produces structured output.
Tasks:
- Create system prompt for code review (see the sketch after the checklist below)
- Implement generateReview terminal tool
- Configure stopWhen with hasToolCall
- Add onStepFinish for progress logging
- Add maxSteps safety limit
// src/agent.ts
import { generateText, hasToolCall } from 'ai';
import { openai } from '@ai-sdk/openai';
import { tools } from './tools';
import { systemPrompt } from './prompts/system';
import { parsePRUrl } from './lib/url-parser';
export async function reviewPR(prUrl: string) {
const { owner, repo, prNumber } = parsePRUrl(prUrl);
const result = await generateText({
model: openai('gpt-4'),
system: systemPrompt,
tools,
maxSteps: 15,
stopWhen: hasToolCall('generateReview'),
onStepFinish: ({ stepType, toolCalls, text }) => {
console.log(`[Step] ${stepType}`);
if (toolCalls) {
for (const call of toolCalls) {
console.log(` Tool: ${call.toolName}(${JSON.stringify(call.args)})`);
}
}
if (text) {
console.log(` Reasoning: ${text.slice(0, 100)}...`);
}
},
prompt: `Review Pull Request #${prNumber} from ${owner}/${repo}.
Start by fetching the PR metadata.`
});
// Extract the review from the final tool call
const reviewStep = result.steps.find(s =>
s.toolCalls?.some(tc => tc.toolName === 'generateReview')
);
return reviewStep?.toolCalls?.find(tc => tc.toolName === 'generateReview')?.args;
}
Checklist:
- System prompt guides agent behavior
- Agent calls tools in logical sequence
- Progress visible in terminal
- Agent terminates when calling generateReview
- Structured review extracted from result
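The system prompt referenced in the tasks above (prompts/system.ts) can combine the reviewer role, the ReAct instructions from earlier, and an explicit termination rule. One possible version (wording is illustrative, not prescribed by the SDK):
// src/prompts/system.ts (one possible version)
export const systemPrompt = `
You are an experienced code reviewer focused on security, correctness, and maintainability.

Process:
1. Start by fetching the PR metadata and the list of changed files.
2. Read the most security-sensitive or heavily changed files first.
3. Use searchPattern to check for risky patterns (logged secrets, missing validation).
4. Before each tool call, briefly explain your reasoning (THOUGHT), then act (ACTION),
   then analyze the result (OBSERVATION).
5. When you have enough information, call generateReview exactly once with a complete,
   structured review. Do not call any other tool after generateReview.

Keep feedback specific: cite file paths and line numbers, and suggest concrete fixes.
`;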
Phase 4: Polish and CLI (Days 9-11)
Goal: Production-ready CLI with formatting and GitHub posting.
Milestone: Full CLI experience as shown in Real World Outcome.
Tasks:
- Build CLI interface with commander.js
- Format review output with colors (chalk)
- Save review to markdown file
- Implement GitHub comment posting
- Add interactive confirmation prompts
- Handle edge cases (large PRs, private repos)
// src/index.ts
import { Command } from 'commander';
import chalk from 'chalk';
import { reviewPR } from './agent';
import { postReviewComment } from './lib/github-client';
import { formatReview } from './lib/formatter';
import * as fs from 'fs/promises';
const program = new Command();
program
.name('review-agent')
.description('AI-powered code review agent')
.argument('<pr-url>', 'GitHub PR URL')
.option('-o, --output <file>', 'Save review to file')
.option('-p, --post', 'Post review as GitHub comment')
.action(async (prUrl, options) => {
console.log(chalk.blue(' Code Review Agent initialized'));
const review = await reviewPR(prUrl);
const formatted = formatReview(review);
console.log(formatted);
if (options.output) {
await fs.writeFile(options.output, formatted);
console.log(chalk.green(` Saved to ${options.output}`));
}
if (options.post) {
const url = await postReviewComment(prUrl, review);
console.log(chalk.green(` Posted: ${url}`));
}
});
program.parse();
Checklist:
- CLI parses arguments correctly
- Output is formatted and colored
- Review saved to file when requested
- GitHub posting works with confirmation
- Error messages are helpful
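The postReviewComment helper used by the CLI above is a thin wrapper over GitHub's issue-comment endpoint. A sketch, assuming the Review type and formatReview come from the other modules in this project:
// src/lib/github-client.ts (sketch; Review type and formatReview are assumed from other modules)
import { Octokit } from '@octokit/rest';
import { parsePRUrl } from './url-parser';
import { formatReview } from './formatter';
import type { Review } from '../schemas/review';  // assumption: z.infer type exported with the schema

const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });

// Posts the review as a regular PR comment and returns its URL.
// (Top-level PR comments use the issues endpoint; inline comments would use pulls.createReview.)
export async function postReviewComment(prUrl: string, review: Review): Promise<string> {
  const { owner, repo, prNumber } = parsePRUrl(prUrl);
  const { data } = await octokit.issues.createComment({
    owner,
    repo,
    issue_number: prNumber,
    body: formatReview(review)  // reuse the same formatter as the terminal output
  });
  return data.html_url;
}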
Phase 5: Robustness (Days 12-14)
Goal: Handle edge cases and improve reliability.
Milestone: Agent gracefully handles failures and large PRs.
Tasks:
- Add retry logic for transient failures
- Implement context windowing for large PRs
- Add rate limiting for GitHub API
- Handle binary files gracefully
- Add timeout handling
- Write comprehensive tests
Checklist:
- Retries on transient API failures
- Large PRs handled without context overflow
- Rate limits respected
- Binary files skipped with message
- Operations timeout after reasonable period
- Tests cover happy path and error cases
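For the retry and timeout tasks in this phase, a small generic helper is often enough. A sketch with exponential backoff; the retryable-status check is an assumption and should be tuned to the errors you actually observe:
// src/lib/retry.ts (sketch)
// Retries an async operation on transient failures with exponential backoff.
export async function withRetry<T>(
  fn: () => Promise<T>,
  { attempts = 3, baseDelayMs = 500 } = {}
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      const status = (error as { status?: number }).status;
      // Assumption: retry rate limits and server errors; rethrow other client errors immediately.
      const retryable = status === undefined || status === 403 || status === 429 || status >= 500;
      if (!retryable || attempt === attempts - 1) throw error;
      await new Promise(resolve => setTimeout(resolve, baseDelayMs * 2 ** attempt));
    }
  }
  throw lastError;
}

// Usage: wrap flaky GitHub calls inside tool execute functions, e.g.
// const { data } = await withRetry(() => octokit.pulls.get({ owner, repo, pull_number: prNumber }));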
Testing Strategy
Testing AI Agents: The Challenge
Testing agents is uniquely challenging because:
- LLM responses are non-deterministic
- Tool orchestration is dynamic
- External APIs add complexity
- The agent makes autonomous decisions
Mocking LLM Responses
Create predictable test scenarios by mocking the model:
// tests/mocks/model.ts
import { MockLanguageModelV1 } from 'ai/test';
export function createMockModel(responses: string[]) {
let callIndex = 0;
return new MockLanguageModelV1({
doGenerate: async () => {
const response = responses[callIndex++];
// Parse response to determine if it's text or tool call
if (response.startsWith('TOOL:')) {
const [_, toolName, args] = response.match(/TOOL:(\w+):(.+)/)!;
return {
type: 'tool-call',
toolCalls: [{
toolName,
args: JSON.parse(args)
}]
};
}
return { type: 'text', text: response };
}
});
}
// tests/integration/agent.test.ts
import { createMockModel } from '../mocks/model';
describe('Code Review Agent', () => {
it('calls fetchPRMetadata first', async () => {
const mockModel = createMockModel([
'TOOL:fetchPRMetadata:{"owner":"acme","repo":"api","prNumber":1}'
]);
const result = await reviewPR('https://github.com/acme/api/pull/1', {
model: mockModel
});
expect(result.steps[0].toolCalls[0].toolName).toBe('fetchPRMetadata');
});
});
Testing Tool Execution
Test tools in isolation from the LLM:
// tests/unit/tools/github.test.ts
import { fetchPRMetadata } from '../../../src/tools/github';
import nock from 'nock';
describe('fetchPRMetadata', () => {
beforeEach(() => {
nock('https://api.github.com')
.get('/repos/acme/api/pulls/47')
.reply(200, {
title: 'Add auth',
body: 'Description',
changed_files: 5,
additions: 100,
deletions: 10,
head: { sha: 'abc123' },
base: { sha: 'def456' }
});
});
afterEach(() => {
nock.cleanAll();
});
it('returns structured PR metadata', async () => {
const result = await fetchPRMetadata.execute({
owner: 'acme',
repo: 'api',
prNumber: 47
});
expect(result).toEqual({
title: 'Add auth',
body: 'Description',
changedFiles: 5,
additions: 100,
deletions: 10,
headSha: 'abc123',
baseSha: 'def456'
});
});
it('handles API errors gracefully', async () => {
nock.cleanAll();
nock('https://api.github.com')
.get('/repos/acme/api/pulls/999')
.reply(404, { message: 'Not Found' });
await expect(fetchPRMetadata.execute({
owner: 'acme',
repo: 'api',
prNumber: 999
})).rejects.toThrow(/Not Found/);
});
});
Integration Testing with Real LLM
For integration tests, use deterministic prompts and validate behavior:
// tests/integration/agent-real.test.ts
describe('Agent Integration (Real LLM)', () => {
// Skip in CI, run manually
it.skip('reviews a real PR end-to-end', async () => {
const review = await reviewPR(
'https://github.com/your-test-repo/pull/1'
);
// Validate structure, not exact content
expect(review).toHaveProperty('securityIssues');
expect(review).toHaveProperty('codeQuality');
expect(review).toHaveProperty('recommendation');
expect(['approve', 'request-changes', 'comment'])
.toContain(review.recommendation);
});
});
Test Matrix
+---------------------+---------------+---------------+---------------+
| Test Type | Determinism | Speed | Coverage |
+---------------------+---------------+---------------+---------------+
| Unit (tools) | Deterministic | Fast (ms) | Tool logic |
| Unit (mocked LLM) | Deterministic | Fast (ms) | Orchestration |
| Integration (mock) | Deterministic | Medium (s) | Full flow |
| Integration (real) | Non-determ | Slow (10s+) | E2E behavior |
+---------------------+---------------+---------------+---------------+
Common Pitfalls and Debugging
Pitfall 1: Poor Tool Descriptions
Symptom: LLM calls wrong tool or ignores available tools.
Cause: Tool descriptions don't clearly explain when to use them.
Solution: Write descriptions as if explaining to a new team member.
// BAD
description: 'Read file'
// GOOD
description: 'Read the contents of a source code file from the repository. ' +
'Use this when you need to examine implementation details or ' +
'understand code structure. Returns the full file contents as text. ' +
'For very large files, consider using searchPattern first.'
Pitfall 2: Context Overflow
Symptom: Agent crashes with "context length exceeded" or responses become confused.
Cause: Tool results accumulate without summarization.
Solution: Implement context management with prepareStep or limit tool output size.
const readFile = tool({
description: 'Read file contents',
parameters: z.object({ path: z.string() }),
execute: async ({ path }) => {
const content = await fs.readFile(path, 'utf-8');
const lines = content.split('\n');
// Limit returned content
if (lines.length > 100) {
return `[First 100 lines of ${lines.length} total]\n` +
lines.slice(0, 100).join('\n') +
'\n...[truncated]';
}
return content;
}
});
Pitfall 3: Missing Stop Condition
Symptom: Agent runs forever or hits maxSteps without meaningful result.
Cause: No clear termination condition, or the LLM doesn't understand when to stop.
Solution: Use explicit terminal tool with clear description.
const generateReview = tool({
description: 'Generate the final code review. ' +
'IMPORTANT: Call this tool when you have gathered enough ' +
'information to write a complete review. Do not call any ' +
'other tools after this.',
parameters: reviewSchema,
execute: async (review) => review
});
// In agent:
stopWhen: hasToolCall('generateReview')
Pitfall 4: No Error Recovery
Symptom: Agent crashes when GitHub API returns error.
Cause: Tools don't handle errors gracefully.
Solution: Return errors as data, let LLM decide next action.
const readFile = tool({
description: 'Read file. Returns error message if file not found.',
parameters: z.object({ path: z.string() }),
execute: async ({ path }) => {
try {
return await fs.readFile(path, 'utf-8');
} catch (error) {
// Return error as data, don't throw
return `ERROR: Could not read file ${path}: ${error instanceof Error ? error.message : String(error)}`;
}
}
});
This allows the LLM to reason: "The file doesn't exist, let me try a different approach."
Pitfall 5: Non-Deterministic Testing
Symptom: Tests pass/fail randomly.
Cause: Testing with real LLM without controlling for non-determinism.
Solution: Mock LLM for deterministic tests, use behavioral assertions for real LLM tests.
// DON'T assert exact output
expect(review.summary).toBe('This PR has security issues');
// DO assert structure and reasonable behavior
expect(review.summary).toBeDefined();
expect(review.summary.length).toBeGreaterThan(20);
Pitfall 6: Forgetting Rate Limits
Symptom: Agent works for small PRs but fails on larger ones.
Cause: GitHub API rate limiting kicks in.
Solution: Implement rate limiting and caching.
import Bottleneck from 'bottleneck';
const limiter = new Bottleneck({
minTime: 100, // Minimum 100ms between requests
maxConcurrent: 3
});
const octokit = new Octokit({
auth: process.env.GITHUB_TOKEN,
request: {
fetch: async (url, options) => {
return limiter.schedule(() => fetch(url, options));
}
}
});
Pitfall 7: Ignoring onStepFinish
Symptom: No visibility into what agent is doing; hard to debug.
Cause: No observability hooks implemented.
Solution: Always use onStepFinish for logging.
const result = await generateText({
model,
tools,
onStepFinish: ({ stepType, toolCalls, text, usage }) => {
console.log(`[${new Date().toISOString()}] Step: ${stepType}`);
console.log(` Tokens: ${usage?.totalTokens}`);
if (toolCalls?.length) {
for (const call of toolCalls) {
console.log(` Tool: ${call.toolName}`);
console.log(` Args: ${JSON.stringify(call.args, null, 2)}`);
}
}
if (text) {
console.log(` Reasoning: ${text.substring(0, 200)}`);
}
},
prompt: '...'
});
Extensions and Challenges
Extension 1: Multi-Language Support
Challenge: Extend the agent to review code in multiple languages with language-specific rules.
Implementation Ideas:
- Add language detection tool
- Create language-specific prompt templates
- Implement language-specific pattern searches (e.g., Go's error handling, Python's type hints)
const detectLanguage = tool({
description: 'Detect the primary programming language of a file',
parameters: z.object({ path: z.string() }),
execute: async ({ path }) => {
const ext = path.split('.').pop() ?? '';
const languageMap: Record<string, string> = {
'ts': 'TypeScript',
'tsx': 'TypeScript/React',
'js': 'JavaScript',
'py': 'Python',
'go': 'Go',
'rs': 'Rust'
};
return languageMap[ext] ?? 'Unknown';
}
});
Extension 2: Learning from Feedback
Challenge: Allow users to rate reviews and use that feedback to improve future reviews.
Implementation Ideas:
- Store reviews and their ratings in a database
- Include highly-rated past reviews as few-shot examples
- Use RAG to retrieve relevant past reviews
USER FEEDBACK LOOP
Agent Review -> User Rating -> Store in DB
^ |
| v
+-------- RAG Retrieval -------+
"For this authentication PR, here are examples
of highly-rated reviews on similar PRs..."
Extension 3: CI/CD Integration
Challenge: Run the agent automatically on every PR via GitHub Actions.
Implementation Ideas:
- Create GitHub Action workflow
- Run agent on PR open/update events
- Post review as PR check or comment
- Handle concurrent executions
# .github/workflows/ai-review.yml
name: AI Code Review
on:
pull_request:
types: [opened, synchronize]
jobs:
review:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Run AI Review
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
run: |
npx review-agent ${{ github.event.pull_request.html_url }} --post
Extension 4: Diff-Aware Review
Challenge: Focus the review only on changed lines, not entire files.
Implementation Ideas:
- Parse unified diff format
- Extract only changed lines with context
- Generate line-specific comments that map to diff hunks
- Use GitHub's review API for inline comments
const getDiffContext = tool({
description: 'Get the specific lines changed in a file with surrounding context',
parameters: z.object({
path: z.string(),
contextLines: z.number().default(3)
}),
execute: async ({ path, contextLines }) => {
const diff = await getDiff(path);
const hunks = parseDiff(diff);
return hunks.map(hunk => ({
startLine: hunk.newStart,
endLine: hunk.newStart + hunk.newLines,
oldContent: hunk.oldLines,
newContent: hunk.newLines,
context: hunk.context
}));
}
});
Resources
Books
| Book | Relevant Chapters | What You'll Learn |
|---|---|---|
| "Artificial Intelligence: A Modern Approach" by Russell & Norvig | Ch. 2: Intelligent Agents | Deep theoretical foundation for agents, PEAS framework, agent types |
| "Programming TypeScript" by Boris Cherny | Ch. 4: Functions, Ch. 7: Error Handling | Type-safe function design, error handling patterns for tools |
| "Release It!, 2nd Edition" by Michael Nygard | Ch. 5: Stability Patterns | Circuit breakers, timeouts, retry logic for resilient agents |
| "Command-Line Rust" by Ken Youens-Clark | Ch. 1-3 | CLI design patterns (applicable to any language) |
| "Designing Data-Intensive Applications" by Martin Kleppmann | Ch. 1-2 | Thinking about reliability and maintainability |
Papers
- "ReAct: Synergizing Reasoning and Acting in Language Models" by Yao et al. - The foundational paper on ReAct agents
- "Toolformer: Language Models Can Teach Themselves to Use Tools" - Understanding how LLMs learn tool use
Documentation
- AI SDK Tools and Tool Calling - Canonical reference for tool definition
- AI SDK Agents - Agent loop, stopWhen, prepareStep
- GitHub REST API - Pull Requests - PR metadata, files, comments
- GitHub REST API - Contents - Reading file contents
- Octokit.js Documentation - GitHub API client for Node.js
Videos and Courses
- AI SDK Official YouTube tutorials
- "Building AI Agents" series on LangChain's YouTube channel
- "Prompt Engineering for Tool Use" - Anthropic's documentation
Recommended Reading Order
- AI SDK Tools Docs (30 min) - Understand tool definition syntax
- AI SDK Agents Docs (30 min) - Understand stopWhen and loop control
- Russell & Norvig Ch. 2 (1-2 hours) - Deep mental model for agents
- GitHub Pull Requests API (30 min) - Understand the data you'll work with
- Cherny Ch. 7 (1 hour) - TypeScript error handling for robust tools
- Nygard Ch. 5 (1 hour) - Stability patterns for production readiness
- Start coding!
Self-Assessment Checklist
Use this checklist to verify your understanding before considering the project complete:
Conceptual Understanding
- Can you explain the difference between an LLM call and an AI agent?
- Agent: LLM in a loop that can take actions via tools
- LLM call: Single request/response without actions
- Can you draw the agent loop from memory?
- Perceive (context) -> Decide (LLM reasoning) -> Act (tool or respond) -> Loop
- Can you explain how the LLM "sees" your tool definitions?
- Tools are converted to JSON schema with name, description, parameters
- Description is prompt engineering for tool selection
- Can you explain what stopWhen does and when to use it?
- Checks each step to determine if loop should terminate
- Use with terminal tools like generateReview
- Can you explain the ReAct pattern?
- Reasoning + Acting: LLM explicitly reasons before each action
- Thought -> Action -> Observation cycle
Implementation Skills
- Can you implement a tool with proper description and parameters?
- Description explains when/why to use
- Zod schema with .describe() for each parameter
- Execute function handles errors gracefully
- Can you set up an agent loop with proper termination?
- maxSteps for safety limit
- stopWhen for goal-based termination
- Terminal tool that signals completion
- Can you implement onStepFinish for observability?
- Log step type, tool calls, reasoning
- Track token usage
- Can you handle context growth in long-running agents?
- Limit tool output size
- Use prepareStep for summarization
- Prioritize relevant information
- Can you integrate with the GitHub API?
- Fetch PR metadata
- Read file contents at specific commits
- Post comments
Testing and Debugging
- Can you test tools in isolation?
- Mock external APIs with nock
- Test success and error cases
- Can you test agent orchestration with mocked LLM?
- MockLanguageModelV1 for deterministic responses
- Verify tool call sequence
- Can you debug an agent that isn't calling the right tools?
- Check tool descriptions
- Verify prompt clarity
- Use onStepFinish to trace decisions
- Can you handle tool failures gracefully?
- Return errors as data, don't throw
- Let LLM decide recovery strategy
Production Readiness
- Does your agent have proper error handling?
- API failures don't crash the agent
- User sees helpful error messages
- Does your agent handle edge cases?
- Large PRs with many files
- Binary files
- Private repositories
- Rate limiting
- Is your agent observable?
- Progress logged in real-time
- Token usage tracked
- Execution time measured
Summary
Building a code review agent teaches you the fundamental patterns of AI agent development:
- Tool Definition: How to give capabilities to an LLM through well-designed tools
- Agent Loop: How the perceive-decide-act cycle works in practice
- Context Management: How to handle growing conversation state
- Termination: How to know when an agent should stop
- Observability: How to understand what an agent is doing
- Resilience: How to build agents that gracefully handle failures
This project bridges the gap between "AI generates text" and "AI takes actions." You're not just building a code reviewer - you're learning patterns that apply to any autonomous AI system: research agents, data analysis agents, customer support agents, and beyond.
The skills you develop here - designing tool interfaces, managing context, handling non-determinism, testing autonomous systems - are increasingly valuable as AI agents become central to software development.
Next Steps After Completion:
- Project 4: Multi-Provider Model Router (apply tool patterns to API routing)
- Project 5: Semantic Search Pipeline (combine agents with embeddings)
- Project 6: Real-time AI Dashboard (agents in streaming contexts)
This guide is part of the AI SDK Learning Projects series. For the full project list, see AI_SDK_LEARNING_PROJECTS.md.