Project 25: Code Review Workflow (Multi-Agent Review)
Build a multi-agent code review system where specialized agents (security, performance, style) review code in parallel and synthesize their findings into actionable feedback.
Learning Objectives
By completing this project, you will:
- Master multi-agent orchestration patterns for parallel task execution
- Design specialized AI agents with focused expertise and custom configurations
- Implement result synthesis combining findings from multiple sources
- Apply severity ranking algorithms to prioritize code review feedback
- Understand agent delegation patterns for complex workflows
Deep Theoretical Foundation
The Code Review Challenge
Traditional code review has fundamental limitations:
Traditional Code Review:
┌─────────────────────────────────────────────────────────────────────┐
│ Human Reviewer │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐│
│ │ Security │ │ Performance │ │ Style │ │ Logic ││
│ │ Focus │ │ Focus │ │ Focus │ │ Focus ││
│ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘│
│ │ │ │ │ │
│ └───────────────┴───────────────┴───────────────┘ │
│ │ │
│ ▼ │
│ Single Brain Tries │
│ to Cover Everything │
│ │
│ Problems: │
│ • Cognitive overload │
│ • Expertise gaps │
│ • Inconsistent focus │
│ • Time constraints │
│ • Fatigue │
└─────────────────────────────────────────────────────────────────────┘
Multi-Agent Review:
┌─────────────────────────────────────────────────────────────────────┐
│ Coordinator Agent │
│ │ │
│ ┌────────────────────┼────────────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Security │ │ Performance │ │ Style │ │
│ │ Agent │ │ Agent │ │ Agent │ │
│ │ │ │ │ │ │ │
│ │ • OWASP │ │ • O(n) vs │ │ • ESLint │ │
│ │ • Injection │ │ O(n^2) │ │ • Prettier │ │
│ │ • Auth │ │ • Memory │ │ • Naming │ │
│ │ • Crypto │ │ • Caching │ │ • DRY │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │ │ │ │
│ └────────────────────┼────────────────────┘ │
│ ▼ │
│ ┌─────────────────┐ │
│ │ Synthesize │ │
│ │ & Prioritize │ │
│ └─────────────────┘ │
│ │
│ Benefits: │
│ • Deep expertise per domain │
│ • Parallel execution │
│ • Consistent focus │
│ • No fatigue │
│ • Comprehensive coverage │
└─────────────────────────────────────────────────────────────────────┘
Multi-Agent Architectures
There are several patterns for organizing multiple agents:
Pattern 1: PARALLEL (Your Project)
┌─────────────────────────────────────────────────────────────────────┐
│ │
│ ┌───────────────┐ │
│ │ Coordinator │ │
│ └───────────────┘ │
│ │ │
│ ┌──────────────┼──────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Agent A │ │ Agent B │ │ Agent C │ ← Run in parallel│
│ └──────────┘ └──────────┘ └──────────┘ │
│ │ │ │ │
│ └──────────────┼──────────────┘ │
│ ▼ │
│ ┌───────────────┐ │
│ │ Synthesize │ │
│ └───────────────┘ │
│ │
│ Use case: Independent tasks, time-sensitive │
└─────────────────────────────────────────────────────────────────────┘
Pattern 2: SEQUENTIAL (Pipeline)
┌─────────────────────────────────────────────────────────────────────┐
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Agent A │──►│ Agent B │──►│ Agent C │──►│ Agent D │ │
│ │ (Parse) │ │(Analyze) │ │(Suggest) │ │ (Format) │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
│ │
│ Use case: Each step depends on previous output │
└─────────────────────────────────────────────────────────────────────┘
Pattern 3: HIERARCHICAL (Tree)
┌─────────────────────────────────────────────────────────────────────┐
│ │
│ ┌───────────────┐ │
│ │ Manager │ │
│ └───────────────┘ │
│ │ │
│ ┌──────────────┼──────────────┐ │
│ ▼ ▼ ▼ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Lead A │ │ Lead B │ │ Lead C │ │
│ └──────────┘ └──────────┘ └──────────┘ │
│ │ │ │
│ ┌─────┴─────┐ ┌─────┴─────┐ │
│ ▼ ▼ ▼ ▼ │
│ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ │
│ │Worker 1│ │Worker 2│ │Worker 3│ │Worker 4│ │
│ └────────┘ └────────┘ └────────┘ └────────┘ │
│ │
│ Use case: Large teams, complex delegation │
└─────────────────────────────────────────────────────────────────────┘
Pattern 4: DEBATE (Adversarial)
┌─────────────────────────────────────────────────────────────────────┐
│ │
│ ┌──────────┐ ┌──────────┐ │
│ │ Agent A │◄──── Debate/Challenge ─────►│ Agent B │ │
│ │(Advocate)│ │ (Critic) │ │
│ └──────────┘ └──────────┘ │
│ │ │ │
│ └────────────────┬───────────────────────┘ │
│ ▼ │
│ ┌──────────────┐ │
│ │ Judge │ │
│ │ (Arbiter) │ │
│ └──────────────┘ │
│ │
│ Use case: Exploring trade-offs, finding edge cases │
└─────────────────────────────────────────────────────────────────────┘
Specialized Agent Design
Each agent needs a focused configuration that shapes its expertise:
Agent Specialization Architecture:
┌─────────────────────────────────────────────────────────────────────┐
│ SECURITY AGENT │
├─────────────────────────────────────────────────────────────────────┤
│ System Prompt: │
│ "You are a security-focused code reviewer. Your expertise: │
│ - OWASP Top 10 vulnerabilities │
│ - Input validation and sanitization │
│ - Authentication and authorization │
│ - Cryptographic best practices │
│ - SQL injection, XSS, CSRF prevention │
│ │
│ For each issue, rate severity: CRITICAL, HIGH, MEDIUM, LOW │
│ Provide specific remediation steps." │
│ │
│ Focus Areas: │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Injection │ │ Broken │ │ Sensitive │ │ Broken │ │
│ │ Flaws │ │ Auth │ │ Data Expose │ │ Access │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ PERFORMANCE AGENT │
├─────────────────────────────────────────────────────────────────────┤
│ System Prompt: │
│ "You are a performance-focused code reviewer. Your expertise: │
│ - Algorithmic complexity (Big O notation) │
│ - Memory management and leaks │
│ - Database query optimization │
│ - Caching strategies │
│ - Async/parallel execution opportunities │
│ │
│ For each issue, estimate impact: 10x, 5x, 2x, marginal │
│ Suggest benchmarks to verify improvements." │
│ │
│ Focus Areas: │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ O(n^2) → │ │ N+1 │ │ Memory │ │ Blocking │ │
│ │ O(n) │ │ Queries │ │ Leaks │ │ I/O │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ STYLE AGENT │
├─────────────────────────────────────────────────────────────────────┤
│ System Prompt: │
│ "You are a code style and quality reviewer. Your expertise: │
│ - Naming conventions and clarity │
│ - Code organization and modularity │
│ - DRY principle adherence │
│ - Documentation completeness │
│ - Consistency with project patterns │
│ │
│ Group issues by: formatting, naming, structure, documentation │
│ Reference relevant style guides when applicable." │
│ │
│ Focus Areas: │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Naming │ │ Code │ │ Missing │ │ DRY │ │
│ │ Conventions │ │ Smells │ │ Docs │ │ Violations │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
Result Synthesis
Combining findings from multiple agents requires careful prioritization:
Synthesis Algorithm:
┌─────────────────────────────────────────────────────────────────────┐
│ INPUT: Agent Findings │
│ │
│ Security: [Finding1, Finding2, Finding3] │
│ Performance: [Finding4, Finding5] │
│ Style: [Finding6, Finding7, Finding8, Finding9] │
└─────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ STEP 1: Normalize Severity │
│ │
│ Map each agent's severity to common scale (1-10): │
│ │
│ Security: Performance: Style: │
│ CRITICAL = 10 10x impact = 9 Blocking = 6 │
│ HIGH = 8 5x impact = 7 Major = 4 │
│ MEDIUM = 5 2x impact = 5 Minor = 2 │
│ LOW = 3 marginal = 2 Nitpick = 1 │
└─────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ STEP 2: Deduplicate │
│ │
│ Detect overlapping findings (same line, similar issue): │
│ │
│ Security: "SQL injection at line 42" │
│ Performance: "Unparameterized query at line 42" ← MERGE │
│ │
│ Result: Combined finding with both perspectives │
└─────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ STEP 3: Weight by Category │
│ │
│ Apply category multipliers (configurable): │
│ │
│ Security findings: ×1.5 (most critical) │
│ Performance findings: ×1.2 (important) │
│ Style findings: ×1.0 (baseline) │
│ │
│ Final score = normalized_severity × category_weight │
└─────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ STEP 4: Sort and Group │
│ │
│ 1. SQL Injection (Security - CRITICAL) Score: 15.0 │
│ 2. Missing Auth Check (Security - HIGH) Score: 12.0 │
│ 3. N+1 Query (Performance - HIGH) Score: 10.8 │
│ 4. Unused import (Style - Minor) Score: 2.0 │
│ ... │
│ │
│ Group by file for developer convenience │
└─────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ OUTPUT: Prioritized Review │
│ │
│ MUST FIX (Score > 10): │
│ 1. SQL Injection in getUserById() │
│ 2. Missing auth check in deleteUser() │
│ │
│ SHOULD FIX (Score 5-10): │
│ 3. N+1 query in getOrdersWithItems() │
│ │
│ CONSIDER (Score < 5): │
│ 4. Unused imports │
│ 5. Variable naming suggestions │
└─────────────────────────────────────────────────────────────────────┘
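In code, Steps 1 and 3 can be condensed into a single scoring helper. This sketch simply transcribes the mapping tables above; the function and variable names are illustrative:

```typescript
// Step 1: per-agent severity scales mapped onto a common 1-10 scale.
const severityScale: Record<string, Record<string, number>> = {
  security:    { critical: 10, high: 8, medium: 5, low: 3 },
  performance: { '10x': 9, '5x': 7, '2x': 5, marginal: 2 },
  style:       { blocking: 6, major: 4, minor: 2, nitpick: 1 },
};

// Step 3: configurable category multipliers.
const categoryWeight: Record<string, number> = {
  security: 1.5,
  performance: 1.2,
  style: 1.0,
};

// Final score = normalized severity x category weight (unknown inputs score 0).
function priorityScore(agent: string, severity: string): number {
  const normalized = severityScale[agent]?.[severity.toLowerCase()] ?? 0;
  return normalized * (categoryWeight[agent] ?? 1.0);
}
```

This reproduces the worked scores in Step 4: a security CRITICAL scores 10 × 1.5 = 15.0, and a performance 10x-impact finding scores 9 × 1.2 = 10.8.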
Kiro Subagent Spawning
Kiro CLI supports spawning subagents for parallel execution:
Subagent Spawning Patterns:
Method 1: CLI Subprocess
┌─────────────────────────────────────────────────────────────────────┐
│ const reviews = await Promise.all([ │
│ $`kiro-cli --agent security --print "${prompt}"`, │
│ $`kiro-cli --agent performance --print "${prompt}"`, │
│ $`kiro-cli --agent style --print "${prompt}"`, │
│ ]); │
│ │
│ Pros: Simple, isolated │
│ Cons: Startup overhead per agent │
└─────────────────────────────────────────────────────────────────────┘
Method 2: Agent Configuration
┌─────────────────────────────────────────────────────────────────────┐
│ // .kiro/agents/review-coordinator.yaml │
│ name: review-coordinator │
│ system_prompt: | │
│ You coordinate code reviews by delegating to specialized agents. │
│ │
│ allowed_tools: │
│ - spawn_subagent │
│ │
│ subagents: │
│ - security-reviewer │
│ - performance-reviewer │
│ - style-reviewer │
└─────────────────────────────────────────────────────────────────────┘
Method 3: Direct Invocation
┌─────────────────────────────────────────────────────────────────────┐
│ > "Review this PR with all specialized agents" │
│ │
│ [Coordinator] Spawning security-reviewer... │
│ [Coordinator] Spawning performance-reviewer... │
│ [Coordinator] Spawning style-reviewer... │
│ │
│ [Waiting for subagents...] │
│ │
│ [security-reviewer] Found 3 issues │
│ [performance-reviewer] Found 2 issues │
│ [style-reviewer] Found 5 issues │
│ │
│ [Coordinator] Synthesizing findings... │
└─────────────────────────────────────────────────────────────────────┘
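Method 1 can be reproduced without the zx-style `$` helper using Node's built-in child_process. This is a sketch: the `--agent`/`--print` flags mirror the example above, and the configurable `cli` parameter is an addition of this sketch so it can be exercised without kiro-cli installed.

```typescript
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const run = promisify(execFile);

// Spawn one CLI process per agent and await them together.
// Each subprocess is fully isolated, at the cost of startup overhead.
async function reviewInParallel(prompt: string, cli = 'kiro-cli'): Promise<string[]> {
  const agents = ['security', 'performance', 'style'];
  const results = await Promise.all(
    agents.map((agent) => run(cli, ['--agent', agent, '--print', prompt]))
  );
  return results.map((r) => r.stdout);
}
```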
Real-World Analogy: The Architecture Review Board
Think of this system like a corporate Architecture Review Board:
- The Coordinator is the meeting chair who assigns the agenda
- Security Agent is the security architect who focuses only on threats
- Performance Agent is the performance engineer who watches for bottlenecks
- Style Agent is the tech lead who maintains coding standards
- The Synthesis is the meeting minutes that prioritize action items
Each expert reviews independently, then they meet to consolidate feedback.
Historical Context
Code review automation has evolved:
Code Review Evolution:
1970s: Fagan Inspections
└─► Formal, meeting-based reviews
1990s: Lightweight Reviews
└─► Email-based, async reviews
2000s: Tool-Assisted (Crucible, Review Board)
└─► Web interfaces, inline comments
2010s: Pull Request Workflow
└─► GitHub/GitLab integrated reviews
2020s: AI Linters (Codacy, DeepSource)
└─► Automated issue detection
2024+: Multi-Agent AI Review ◄─── YOU ARE HERE
└─► Specialized AI agents with synthesis
Book References
For deeper understanding:
- “Working Effectively with Legacy Code” by Michael Feathers - Code analysis techniques
- “Clean Code” by Robert C. Martin - Style and quality principles
- “Secure Coding in C and C++” by Robert C. Seacord - Security review patterns
- “High Performance Browser Networking” by Ilya Grigorik - Performance analysis
Complete Project Specification
What You Are Building
A multi-agent code review system that:
- Accepts code for review (file, diff, or PR)
- Spawns specialized agents in parallel
- Collects and normalizes findings from each agent
- Synthesizes a prioritized report with actionable feedback
- Optionally applies fixes for certain issue types
Functional Requirements
| Feature | Behavior |
|---|---|
| Input | Accept file path, git diff, or GitHub PR URL |
| Parallel Review | Run security, performance, style agents simultaneously |
| Findings Format | Standardized structure with line numbers, severity |
| Prioritization | Rank issues by weighted severity |
| Output | Clear, actionable review comments |
Non-Functional Requirements
- Latency: Complete review within 60 seconds for a typical PR
- Accuracy: Minimize false positives while catching real issues
- Extensibility: Easy to add new specialized agents
- Integration: Work with GitHub PR workflow
Solution Architecture
High-Level Component Diagram
┌─────────────────────────────────────────────────────────────────────┐
│ User Request │
│ │
│ "Review PR #42 with all agents" │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────┐│
│ │ Coordinator Agent ││
│ │ ││
│ │ 1. Parse request (PR #42) ││
│ │ 2. Fetch code diff ││
│ │ 3. Spawn subagents ││
│ │ 4. Collect results ││
│ │ 5. Synthesize report ││
│ │ ││
│ └─────────────────────────────────────────────────────────────────┘│
│ │ │
│ │ Parallel Spawn │
│ ▼ │
│ ┌──────────────┬──────────────┬──────────────┐ │
│ │ Security │ Performance │ Style │ │
│ │ Agent │ Agent │ Agent │ │
│ │ │ │ │ │
│ │ Input: Diff │ Input: Diff │ Input: Diff │ │
│ │ │ │ │ │
│ │ Output: │ Output: │ Output: │ │
│ │ [{finding}] │ [{finding}] │ [{finding}] │ │
│ └──────────────┴──────────────┴──────────────┘ │
│ │ │ │ │
│ └──────────────┼──────────────┘ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────┐│
│ │ Synthesizer ││
│ │ ││
│ │ • Normalize severities ││
│ │ • Deduplicate findings ││
│ │ • Apply category weights ││
│ │ • Sort by priority ││
│ │ • Format output ││
│ │ ││
│ └─────────────────────────────────────────────────────────────────┘│
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────┐│
│ │ Prioritized Report ││
│ │ ││
│ │ MUST FIX: ││
│ │ 1. SQL Injection (Security - CRITICAL) ││
│ │ 2. Missing auth check (Security - HIGH) ││
│ │ ││
│ │ SHOULD FIX: ││
│ │ 3. N+1 query (Performance - HIGH) ││
│ │ ... ││
│ └─────────────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────────────┘
Data Flow: Complete Review Cycle
1. Input Processing
┌─────────────────────────────────────────────────────────────────┐
│ Input: "Review PR #42" │
│ │
│ Coordinator: │
│ 1. Parse: PR number = 42 │
│ 2. Fetch: gh pr diff 42 > diff.patch │
│ 3. Extract: Changed files and line ranges │
│ │
│ Output: │
│ { │
│ "files": ["src/api/users.ts", "src/services/auth.ts"], │
│ "diff": "...unified diff content...", │
│ "additions": 150, │
│ "deletions": 23 │
│ } │
└─────────────────────────────────────────────────────────────────┘
│
▼
2. Parallel Agent Execution
┌─────────────────────────────────────────────────────────────────┐
│ Promise.all([ │
│ securityAgent.review(context), // 15 seconds │
│ performanceAgent.review(context), // 12 seconds │
│ styleAgent.review(context), // 8 seconds │
│ ]) │
│ │
│ Total time: ~15 seconds (parallel) │
│ Sequential would be: ~35 seconds │
└─────────────────────────────────────────────────────────────────┘
│
▼
3. Raw Findings Collection
┌─────────────────────────────────────────────────────────────────┐
│ Security Agent Output: │
│ [ │
│ { │
│ "type": "SQL_INJECTION", │
│ "severity": "CRITICAL", │
│ "file": "src/api/users.ts", │
│ "line": 42, │
│ "message": "User input directly interpolated in SQL", │
│ "suggestion": "Use parameterized queries" │
│ }, │
│ ... │
│ ] │
│ │
│ Performance Agent Output: │
│ [ │
│ { │
│ "type": "N_PLUS_1", │
│ "severity": "HIGH", │
│ "file": "src/services/orders.ts", │
│ "line": 78, │
│ "message": "Query inside loop creates N+1 problem", │
│ "suggestion": "Use eager loading or batch query" │
│ }, │
│ ... │
│ ] │
│ │
│ Style Agent Output: │
│ [ │
│ { │
│ "type": "NAMING", │
│ "severity": "LOW", │
│ "file": "src/api/users.ts", │
│ "line": 15, │
│ "message": "Variable 'x' is not descriptive", │
│ "suggestion": "Rename to 'userId' or 'userIndex'" │
│ }, │
│ ... │
│ ] │
└─────────────────────────────────────────────────────────────────┘
│
▼
4. Synthesis and Output
┌─────────────────────────────────────────────────────────────────┐
│ MULTI-AGENT CODE REVIEW - PR #42 │
│ ════════════════════════════════════════════════════════════════│
│ │
│ Summary: 10 issues found (1 critical, 2 high, 7 low) │
│ │
│ MUST FIX (Critical): │
│ ──────────────────────────────────────────────────────────────── │
│ 1. [Security] SQL Injection │
│ File: src/api/users.ts:42 │
│ Issue: User input directly interpolated in SQL query │
│ Fix: Use parameterized query with $1, $2 placeholders │
│ │
│ SHOULD FIX (High): │
│ ──────────────────────────────────────────────────────────────── │
│ 2. [Performance] N+1 Query │
│ File: src/services/orders.ts:78 │
│ Issue: Database query inside loop │
│ Fix: Use .include() for eager loading │
│ │
│ 3. [Security] Missing Rate Limiting │
│ File: src/api/auth.ts:15 │
│ Issue: Login endpoint has no rate limit │
│ Fix: Add rate-limiter-flexible middleware │
│ │
│ CONSIDER (Low): │
│ ──────────────────────────────────────────────────────────────── │
│ 4-10. [Style] Various naming and formatting issues │
│ │
└─────────────────────────────────────────────────────────────────┘
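The parallel-vs-sequential timing claim in step 2 (roughly 15 seconds versus roughly 35 seconds) can be sanity-checked with stub agents that only sleep; the helper names here are illustrative:

```typescript
// Stub "agents" that just sleep, to check that Promise.all wall time
// tracks the slowest agent rather than the sum of all agents.
const delay = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function timeParallel(delaysMs: number[]): Promise<number> {
  const start = Date.now();
  await Promise.all(delaysMs.map((d) => delay(d)));
  return Date.now() - start;
}
```

With delays of 150 ms, 120 ms, and 80 ms, the measured wall time lands near 150 ms, not the 350 ms a sequential run would take.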
Key Interfaces
// Finding from any agent
interface Finding {
agent: 'security' | 'performance' | 'style';
type: string;
severity: 'critical' | 'high' | 'medium' | 'low';
file: string;
line: number;
endLine?: number;
message: string;
suggestion: string;
codeSnippet?: string;
references?: string[];
}
// Review context passed to agents
interface ReviewContext {
diff: string;
files: FileContext[];
metadata: {
prNumber?: number;
baseBranch: string;
headBranch: string;
author: string;
};
}
interface FileContext {
path: string;
content: string;
diff: string;
changedLines: number[];
}
// Synthesized report
interface SynthesizedReport {
summary: {
total: number;
bySeverity: Record<string, number>;
byAgent: Record<string, number>;
};
findings: PrioritizedFinding[];
suggestedActions: Action[];
}
interface PrioritizedFinding extends Finding {
priority: number; // Computed score
relatedFindings?: Finding[]; // Merged duplicates
}
Agent Configuration Files
# .kiro/agents/security-reviewer.yaml
name: security-reviewer
system_prompt: |
You are a security-focused code reviewer with expertise in:
- OWASP Top 10 vulnerabilities
- Authentication and authorization flaws
- Input validation and output encoding
- Cryptographic weaknesses
- Information disclosure
When reviewing code:
1. Focus ONLY on security issues
2. Rate each finding: CRITICAL, HIGH, MEDIUM, LOW
3. Provide specific, actionable remediation
4. Reference CWE numbers when applicable
Output format: JSON array of findings.
allowed_tools:
- read_file
- search_codebase
model: claude-sonnet-4-20250514 # Fast, capable
---
# .kiro/agents/performance-reviewer.yaml
name: performance-reviewer
system_prompt: |
You are a performance-focused code reviewer with expertise in:
- Algorithmic complexity (Big O)
- Database query optimization
- Memory management
- Caching strategies
- Async/parallel execution
When reviewing code:
1. Focus ONLY on performance issues
2. Estimate impact: 10x, 5x, 2x, marginal
3. Suggest benchmarks to verify
4. Provide specific optimization techniques
Output format: JSON array of findings.
allowed_tools:
- read_file
- search_codebase
model: claude-sonnet-4-20250514
---
# .kiro/agents/style-reviewer.yaml
name: style-reviewer
system_prompt: |
You are a code style and quality reviewer with expertise in:
- Naming conventions
- Code organization
- DRY principle
- Documentation
- Consistency
When reviewing code:
1. Focus ONLY on style and quality issues
2. Reference project style guides
3. Distinguish: blocking vs. suggestions
4. Keep suggestions constructive
Output format: JSON array of findings.
allowed_tools:
- read_file
- search_codebase
model: claude-haiku-4-20250514 # Fast, good for style
---
# .kiro/agents/review-coordinator.yaml
name: review-coordinator
system_prompt: |
You are the code review coordinator. Your role:
1. Parse user review requests
2. Delegate to specialized agents
3. Collect and synthesize findings
4. Present prioritized report
You have access to these subagents:
- security-reviewer
- performance-reviewer
- style-reviewer
allowed_tools:
- spawn_subagent
- read_file
- gh_cli
model: claude-sonnet-4-20250514
Phased Implementation Guide
Phase 1: Single Agent Review (Days 1-3)
Goal: Create one working review agent (start with security).
Tasks:
- Create security-reviewer agent configuration
- Implement review prompt that outputs JSON findings
- Parse agent output into structured findings
- Test with sample code containing known vulnerabilities
- Format findings for display
Hints:
- Start with a hardcoded file path for testing
- Use JSON mode for structured output
- Include example findings in the system prompt
Starter Agent Prompt:
const securityReviewPrompt = `
Review this code for security vulnerabilities:
\`\`\`typescript
${codeContent}
\`\`\`
Return a JSON array of findings:
[
{
"type": "SQL_INJECTION",
"severity": "CRITICAL",
"line": 42,
"message": "User input directly in SQL",
"suggestion": "Use parameterized queries"
}
]
If no issues found, return empty array: []
`;
Phase 2: Multiple Agents (Days 4-6)
Goal: Add performance and style agents, run in parallel.
Tasks:
- Create performance-reviewer agent configuration
- Create style-reviewer agent configuration
- Implement parallel execution with Promise.all
- Collect results from all agents
- Handle agent failures gracefully
Hints:
- Each agent should have isolated context
- Use timeouts to prevent hanging agents
- Log which agent produced which findings
Parallel Execution:
async function runAllAgents(context: ReviewContext): Promise<Finding[]> {
const agents = ['security', 'performance', 'style'];
const results = await Promise.allSettled(
agents.map(agent =>
runAgent(agent, context).catch(err => {
console.error(`${agent} agent failed:`, err);
return [];
})
)
);
return results
.filter((r): r is PromiseFulfilledResult<Finding[]> => r.status === 'fulfilled')
.flatMap(r => r.value);
}
Phase 3: Coordinator Agent (Days 7-9)
Goal: Create the orchestrating coordinator agent.
Tasks:
- Create review-coordinator agent configuration
- Implement PR/diff fetching logic
- Build context object for subagents
- Implement subagent spawning
- Collect results from subagents
Hints:
- The coordinator needs access to the gh CLI
- Pass minimal context to subagents (just what they need)
- Track timing for each agent
Coordinator Flow:
class ReviewCoordinator {
async review(request: string): Promise<SynthesizedReport> {
// 1. Parse request
const { prNumber, files } = this.parseRequest(request);
// 2. Fetch context
const context = await this.fetchContext(prNumber);
// 3. Spawn subagents in parallel
const findings = await this.runAllAgents(context);
// 4. Synthesize
return this.synthesize(findings);
}
}
Phase 4: Synthesis and Output (Days 10-14)
Goal: Implement finding synthesis and prioritized output.
Tasks:
- Implement severity normalization
- Detect and merge duplicate findings
- Apply category weights
- Sort by computed priority
- Format beautiful output
Hints:
- Duplicates often have same file and similar line numbers
- Use fuzzy matching for message similarity
- Group by file for developer convenience
Synthesis Implementation:
function synthesize(findings: Finding[]): SynthesizedReport {
// Normalize severities to 1-10 scale
const normalized = findings.map(f => ({
...f,
normalizedSeverity: normalizeSeverity(f.agent, f.severity),
}));
// Deduplicate (same file + similar line + similar message)
const deduplicated = deduplicateFindings(normalized);
// Apply category weights
const weighted = deduplicated.map(f => ({
...f,
priority: f.normalizedSeverity * getCategoryWeight(f.agent),
}));
// Sort by priority
weighted.sort((a, b) => b.priority - a.priority);
return {
summary: computeSummary(weighted),
findings: weighted,
suggestedActions: generateActions(weighted),
};
}
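synthesize relies on a deduplicateFindings helper that the guide leaves open. One possible greedy merge, shown here with a deliberately minimal finding shape (the full Finding interface from the spec carries more fields):

```typescript
interface MiniFinding {
  agent: string;
  file: string;
  line: number;
  message: string;
  normalizedSeverity: number;
  relatedFindings?: MiniFinding[];
}

// Greedy merge: the first finding in a similar group becomes the primary;
// later matches are attached as relatedFindings, and the primary keeps the
// highest severity seen. "Similar" here means same file and a line within 5;
// a message-similarity check could tighten this further.
function deduplicateFindings(findings: MiniFinding[]): MiniFinding[] {
  const merged: MiniFinding[] = [];
  for (const f of findings) {
    const existing = merged.find(
      (m) => m.file === f.file && Math.abs(m.line - f.line) <= 5
    );
    if (existing) {
      existing.relatedFindings = [...(existing.relatedFindings ?? []), f];
      existing.normalizedSeverity = Math.max(existing.normalizedSeverity, f.normalizedSeverity);
    } else {
      merged.push({ ...f });
    }
  }
  return merged;
}
```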
Testing Strategy
Unit Tests
describe('FindingSynthesizer', () => {
describe('normalizeSeverity', () => {
it('maps security CRITICAL to 10', () => {
expect(normalizeSeverity('security', 'critical')).toBe(10);
});
it('maps style low to 2', () => {
expect(normalizeSeverity('style', 'low')).toBe(2);
});
});
describe('deduplicateFindings', () => {
it('merges findings on same line', () => {
const findings = [
{ agent: 'security', file: 'a.ts', line: 42, message: 'SQL injection' },
{ agent: 'performance', file: 'a.ts', line: 42, message: 'Slow query' },
];
const deduped = deduplicateFindings(findings);
expect(deduped).toHaveLength(1);
expect(deduped[0].relatedFindings).toHaveLength(1);
});
});
});
Integration Tests
describe('Full Review Pipeline', () => {
it('reviews a PR with all agents', async () => {
const coordinator = new ReviewCoordinator();
// Review a known test PR
const report = await coordinator.review('Review PR #1');
expect(report.findings.length).toBeGreaterThan(0);
expect(report.summary.byAgent).toHaveProperty('security');
expect(report.summary.byAgent).toHaveProperty('performance');
expect(report.summary.byAgent).toHaveProperty('style');
});
});
Manual Testing
# 1. Start coordinator agent
kiro-cli --agent review-coordinator
# 2. Review a local file
> "Review src/api/users.ts for all issues"
# 3. Review a PR
> "Review PR #42 with all agents"
# 4. Verify output format and prioritization
# Should see categorized, prioritized findings
Common Pitfalls and Debugging
Pitfall 1: Agents Return Inconsistent Formats
Symptom: JSON parsing fails on some agent outputs
Prevention:
function parseAgentOutput(output: string, agent: string): Finding[] {
try {
// Try to extract JSON from markdown code blocks
const jsonMatch = output.match(/```json\n?([\s\S]*?)\n?```/);
const json = jsonMatch ? jsonMatch[1] : output;
const findings = JSON.parse(json);
// Validate structure
return findings.filter(f =>
f.type && f.severity && f.line && f.message
).map(f => ({
...f,
agent,
}));
} catch (e) {
console.error(`Failed to parse ${agent} output:`, e);
return [];
}
}
Pitfall 2: Subagent Times Out
Symptom: One slow agent blocks entire review
Solution:
async function runAgentWithTimeout(agent: string, context: ReviewContext, timeoutMs = 30000) {
const controller = new AbortController();
const timeout = setTimeout(() => controller.abort(), timeoutMs);
try {
return await runAgent(agent, context, { signal: controller.signal });
} catch (e) {
if (e instanceof Error && e.name === 'AbortError') {
console.warn(`${agent} agent timed out after ${timeoutMs}ms`);
return [];
}
throw e;
} finally {
clearTimeout(timeout);
}
}
Pitfall 3: Duplicate Findings Not Detected
Symptom: Same issue reported by multiple agents separately
Solution:
function isSimilarFinding(a: Finding, b: Finding): boolean {
// Same file
if (a.file !== b.file) return false;
// Similar line (within 5 lines)
if (Math.abs(a.line - b.line) > 5) return false;
// Similar message (fuzzy match)
const similarity = stringSimilarity(a.message, b.message);
return similarity > 0.6;
}
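stringSimilarity is not defined in this guide; one dependency-free option is the Sorensen-Dice coefficient over character bigrams, which scores near-duplicate messages well above the 0.6 threshold used above:

```typescript
// Sorensen-Dice similarity over character bigrams: 1.0 for identical
// strings, 0.0 for strings that share no bigrams.
function stringSimilarity(a: string, b: string): number {
  const s = a.toLowerCase();
  const t = b.toLowerCase();
  if (s.length < 2 || t.length < 2) return s === t ? 1 : 0;
  const bigrams = (str: string): Map<string, number> => {
    const counts = new Map<string, number>();
    for (let i = 0; i < str.length - 1; i++) {
      const bg = str.slice(i, i + 2);
      counts.set(bg, (counts.get(bg) ?? 0) + 1);
    }
    return counts;
  };
  const sCounts = bigrams(s);
  const tCounts = bigrams(t);
  let overlap = 0;
  for (const [bg, count] of sCounts) {
    overlap += Math.min(count, tCounts.get(bg) ?? 0);
  }
  return (2 * overlap) / (s.length - 1 + t.length - 1);
}
```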
Extensions and Challenges
Extension 1: GitHub Integration
Post review comments directly to PRs:
async function postToGitHub(report: SynthesizedReport, prNumber: number) {
for (const finding of report.findings) {
// Option A: post a plain PR comment
await $`gh pr comment ${prNumber} --body ${formatComment(finding)}`;
// Option B: post an inline review comment via the REST API instead
// await $`gh api repos/:owner/:repo/pulls/${prNumber}/comments -f body="${finding.message}" -f path="${finding.file}" -F line=${finding.line}`;
}
}
Extension 2: Learning from Feedback
Track which findings developers actually fix:
interface FindingFeedback {
  findingId: string;
  findingType: string;
  wasFixed: boolean;
  wasHelpful: boolean;
  comment?: string;
}
// Use feedback to tune severity weights:
// raise weights for types that are frequently fixed,
// lower them for types that are often dismissed.
function updateWeights(feedback: FindingFeedback[]): Map<string, number> {
  const fixRates = new Map<string, number>();
  for (const type of new Set(feedback.map(f => f.findingType))) {
    const ofType = feedback.filter(f => f.findingType === type);
    fixRates.set(type, ofType.filter(f => f.wasFixed).length / ofType.length);
  }
  return fixRates;
}
Extension 3: Custom Agents
Allow users to define project-specific agents:
# .kiro/agents/react-reviewer.yaml
name: react-reviewer
system_prompt: |
You are a React specialist. Review for:
- Hook rules violations
- State management anti-patterns
- Performance issues (missing memo, key props)
- Accessibility issues
Extension 4: Auto-Fix Capability
For certain issues, apply fixes automatically:
interface AutoFix {
type: string;
pattern: RegExp;
replacement: string | ((match: string) => string);
}
const autoFixes: AutoFix[] = [
{
type: 'MISSING_AWAIT',
pattern: /(?<!await\s)(fetch\()/g,
replacement: 'await $1',
},
];
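Applying the fix table could look like the sketch below. The AutoFix interface is restated so the snippet stands alone, and any automatically rewritten file should be re-reviewed before committing:

```typescript
interface AutoFix {
  type: string;
  pattern: RegExp;
  replacement: string | ((match: string) => string);
}

// Run every fix over the source in order; later fixes see earlier rewrites.
// The ternary narrows the string | function union for replace's overloads.
function applyAutoFixes(source: string, fixes: AutoFix[]): string {
  return fixes.reduce(
    (code, fix) =>
      typeof fix.replacement === 'string'
        ? code.replace(fix.pattern, fix.replacement)
        : code.replace(fix.pattern, fix.replacement),
    source
  );
}
```

With the MISSING_AWAIT rule above, `fetch(` calls not already preceded by `await ` gain the keyword, while correctly awaited calls are left untouched.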
Extension 5: Review History
Track review trends over time:
Review Trends (Last 30 Days):
┌─────────────────────────────────────────────────────────────────┐
│ Total Reviews: 47 │
│ Total Findings: 234 │
│ │
│ Top Issue Types: │
│ 1. N+1 Queries 45 ████████████ │
│ 2. Missing Auth 23 ██████ │
│ 3. Hardcoded Values 18 █████ │
│ │
│ Trend: Security issues ↓ 15%, Performance issues ↑ 8% │
└─────────────────────────────────────────────────────────────────┘
Real-World Connections
Industry Adoption
Multi-agent review patterns are used by:
- Amazon CodeGuru: Security and performance analysis
- DeepSource: Multiple analyzers running in parallel
- Codacy: Rule-based multi-category checks
- Snyk Code: Security-focused AI review
Production Considerations
| Concern | Solution |
|---|---|
| Cost | Use cheaper models for style, expensive for security |
| Latency | Parallel execution, aggressive timeouts |
| Accuracy | Track false positive rates, tune prompts |
| Coverage | Add new agents for project-specific patterns |
| Integration | GitHub Actions, GitLab CI/CD, Bitbucket |
Self-Assessment Checklist
Knowledge Verification
- Can you explain the parallel vs. sequential multi-agent patterns?
- How do you design agent specialization through system prompts?
- What is the finding synthesis process?
- Why is severity normalization important?
- How do you handle agent failures gracefully?
Implementation Verification
- All three agents run in parallel successfully
- Findings are properly attributed to their source agent
- Duplicate findings are detected and merged
- Output is sorted by priority
- The system handles agent timeouts gracefully
Quality Verification
- Security agent catches common vulnerabilities
- Performance agent identifies complexity issues
- Style agent flags inconsistencies
- False positive rate is acceptable
- Report is actionable and clear
Integration Verification
- Works with local files
- Works with git diffs
- Works with GitHub PRs
- Results can be posted as PR comments
Summary
Building a multi-agent code review system teaches you:
- Agent Orchestration: Coordinating multiple AI agents in parallel
- Specialization Design: Creating focused agents with deep expertise
- Result Synthesis: Combining and prioritizing findings from multiple sources
- Production Patterns: Handling timeouts, failures, and inconsistencies
The multi-agent pattern you have learned here applies far beyond code review: it works for any complex task that benefits from multiple specialized perspectives, such as security audits, documentation review, test planning, and more.
Next Project: P26-mdflow-workflow-engine.md - Executable markdown workflows with AI