Project 25: Code Review Workflow (Multi-Agent Review)
Build a multi-agent code review system where specialized agents (security, performance, style) review code in parallel and synthesize their findings into actionable feedback.
Learning Objectives
By completing this project, you will:
- Master multi-agent orchestration patterns for parallel task execution
- Design specialized AI agents with focused expertise and custom configurations
- Implement result synthesis combining findings from multiple sources
- Apply severity ranking algorithms to prioritize code review feedback
- Understand agent delegation patterns for complex workflows
Deep Theoretical Foundation
The Code Review Challenge
Traditional code review has fundamental limitations:
Traditional Code Review:
┌─────────────────────────────────────────────────────────────────────┐
│ Human Reviewer │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐│
│ │ Security │ │ Performance │ │ Style │ │ Logic ││
│ │ Focus │ │ Focus │ │ Focus │ │ Focus ││
│ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘│
│ │ │ │ │ │
│ └───────────────┴───────────────┴───────────────┘ │
│ │ │
│ ▼ │
│ Single Brain Tries │
│ to Cover Everything │
│ │
│ Problems: │
│ • Cognitive overload │
│ • Expertise gaps │
│ • Inconsistent focus │
│ • Time constraints │
│ • Fatigue │
└─────────────────────────────────────────────────────────────────────┘
Multi-Agent Review:
┌─────────────────────────────────────────────────────────────────────┐
│ Coordinator Agent │
│ │ │
│ ┌────────────────────┼────────────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Security │ │ Performance │ │ Style │ │
│ │ Agent │ │ Agent │ │ Agent │ │
│ │ │ │ │ │ │ │
│ │ • OWASP │ │ • O(n) vs │ │ • ESLint │ │
│ │ • Injection │ │ O(n^2) │ │ • Prettier │ │
│ │ • Auth │ │ • Memory │ │ • Naming │ │
│ │ • Crypto │ │ • Caching │ │ • DRY │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │ │ │ │
│ └────────────────────┼────────────────────┘ │
│ ▼ │
│ ┌─────────────────┐ │
│ │ Synthesize │ │
│ │ & Prioritize │ │
│ └─────────────────┘ │
│ │
│ Benefits: │
│ • Deep expertise per domain │
│ • Parallel execution │
│ • Consistent focus │
│ • No fatigue │
│ • Comprehensive coverage │
└─────────────────────────────────────────────────────────────────────┘
Multi-Agent Architectures
There are several patterns for organizing multiple agents:
Pattern 1: PARALLEL (Your Project)
┌─────────────────────────────────────────────────────────────────────┐
│ │
│ ┌───────────────┐ │
│ │ Coordinator │ │
│ └───────────────┘ │
│ │ │
│ ┌──────────────┼──────────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Agent A │ │ Agent B │ │ Agent C │ ← Run in parallel│
│ └──────────┘ └──────────┘ └──────────┘ │
│ │ │ │ │
│ └──────────────┼──────────────┘ │
│ ▼ │
│ ┌───────────────┐ │
│ │ Synthesize │ │
│ └───────────────┘ │
│ │
│ Use case: Independent tasks, time-sensitive │
└─────────────────────────────────────────────────────────────────────┘
Pattern 2: SEQUENTIAL (Pipeline)
┌─────────────────────────────────────────────────────────────────────┐
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Agent A │──►│ Agent B │──►│ Agent C │──►│ Agent D │ │
│ │ (Parse) │ │(Analyze) │ │(Suggest) │ │ (Format) │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
│ │
│ Use case: Each step depends on previous output │
└─────────────────────────────────────────────────────────────────────┘
Pattern 3: HIERARCHICAL (Tree)
┌─────────────────────────────────────────────────────────────────────┐
│ │
│ ┌───────────────┐ │
│ │ Manager │ │
│ └───────────────┘ │
│ │ │
│ ┌──────────────┼──────────────┐ │
│ ▼ ▼ ▼ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Lead A │ │ Lead B │ │ Lead C │ │
│ └──────────┘ └──────────┘ └──────────┘ │
│ │ │ │
│ ┌─────┴─────┐ ┌─────┴─────┐ │
│ ▼ ▼ ▼ ▼ │
│ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ │
│ │Worker 1│ │Worker 2│ │Worker 3│ │Worker 4│ │
│ └────────┘ └────────┘ └────────┘ └────────┘ │
│ │
│ Use case: Large teams, complex delegation │
└─────────────────────────────────────────────────────────────────────┘
Pattern 4: DEBATE (Adversarial)
┌─────────────────────────────────────────────────────────────────────┐
│ │
│ ┌──────────┐ ┌──────────┐ │
│ │ Agent A │◄──── Debate/Challenge ─────►│ Agent B │ │
│ │(Advocate)│ │ (Critic) │ │
│ └──────────┘ └──────────┘ │
│ │ │ │
│ └────────────────┬───────────────────────┘ │
│ ▼ │
│ ┌──────────────┐ │
│ │ Judge │ │
│ │ (Arbiter) │ │
│ └──────────────┘ │
│ │
│ Use case: Exploring trade-offs, finding edge cases │
└─────────────────────────────────────────────────────────────────────┘
Specialized Agent Design
Each agent needs a focused configuration that shapes its expertise:
Agent Specialization Architecture:
┌─────────────────────────────────────────────────────────────────────┐
│ SECURITY AGENT │
├─────────────────────────────────────────────────────────────────────┤
│ System Prompt: │
│ "You are a security-focused code reviewer. Your expertise: │
│ - OWASP Top 10 vulnerabilities │
│ - Input validation and sanitization │
│ - Authentication and authorization │
│ - Cryptographic best practices │
│ - SQL injection, XSS, CSRF prevention │
│ │
│ For each issue, rate severity: CRITICAL, HIGH, MEDIUM, LOW │
│ Provide specific remediation steps." │
│ │
│ Focus Areas: │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Injection │ │ Broken │ │ Sensitive │ │ Broken │ │
│ │ Flaws │ │ Auth │ │ Data Expose │ │ Access │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ PERFORMANCE AGENT │
├─────────────────────────────────────────────────────────────────────┤
│ System Prompt: │
│ "You are a performance-focused code reviewer. Your expertise: │
│ - Algorithmic complexity (Big O notation) │
│ - Memory management and leaks │
│ - Database query optimization │
│ - Caching strategies │
│ - Async/parallel execution opportunities │
│ │
│ For each issue, estimate impact: 10x, 5x, 2x, marginal │
│ Suggest benchmarks to verify improvements." │
│ │
│ Focus Areas: │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ O(n^2) → │ │ N+1 │ │ Memory │ │ Blocking │ │
│ │ O(n) │ │ Queries │ │ Leaks │ │ I/O │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ STYLE AGENT │
├─────────────────────────────────────────────────────────────────────┤
│ System Prompt: │
│ "You are a code style and quality reviewer. Your expertise: │
│ - Naming conventions and clarity │
│ - Code organization and modularity │
│ - DRY principle adherence │
│ - Documentation completeness │
│ - Consistency with project patterns │
│ │
│ Group issues by: formatting, naming, structure, documentation │
│ Reference relevant style guides when applicable." │
│ │
│ Focus Areas: │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Naming │ │ Code │ │ Missing │ │ DRY │ │
│ │ Conventions │ │ Smells │ │ Docs │ │ Violations │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
Result Synthesis
Combining findings from multiple agents requires careful prioritization:
Synthesis Algorithm:
┌─────────────────────────────────────────────────────────────────────┐
│ INPUT: Agent Findings │
│ │
│ Security: [Finding1, Finding2, Finding3] │
│ Performance: [Finding4, Finding5] │
│ Style: [Finding6, Finding7, Finding8, Finding9] │
└─────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ STEP 1: Normalize Severity │
│ │
│ Map each agent's severity to common scale (1-10): │
│ │
│ Security: Performance: Style: │
│ CRITICAL = 10 10x impact = 9 Blocking = 6 │
│ HIGH = 8 5x impact = 7 Major = 4 │
│ MEDIUM = 5 2x impact = 5 Minor = 2 │
│ LOW = 3 marginal = 2 Nitpick = 1 │
└─────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ STEP 2: Deduplicate │
│ │
│ Detect overlapping findings (same line, similar issue): │
│ │
│ Security: "SQL injection at line 42" │
│ Performance: "Unparameterized query at line 42" ← MERGE │
│ │
│ Result: Combined finding with both perspectives │
└─────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ STEP 3: Weight by Category │
│ │
│ Apply category multipliers (configurable): │
│ │
│ Security findings: ×1.5 (most critical) │
│ Performance findings: ×1.2 (important) │
│ Style findings: ×1.0 (baseline) │
│ │
│ Final score = normalized_severity × category_weight │
└─────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ STEP 4: Sort and Group │
│ │
│ 1. SQL Injection (Security - CRITICAL) Score: 15.0 │
│ 2. Missing Auth Check (Security - HIGH) Score: 12.0 │
│ 3. N+1 Query (Performance - HIGH) Score: 10.8 │
│ 4. Unused import (Style - Minor) Score: 2.0 │
│ ... │
│ │
│ Group by file for developer convenience │
└─────────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ OUTPUT: Prioritized Review │
│ │
│ MUST FIX (Score > 10): │
│ 1. SQL Injection in getUserById() │
│ 2. Missing auth check in deleteUser() │
│ │
│ SHOULD FIX (Score 5-10): │
│ 3. N+1 query in getOrdersWithItems() │
│ │
│ CONSIDER (Score < 5): │
│ 4. Unused imports │
│ 5. Variable naming suggestions │
└─────────────────────────────────────────────────────────────────────┘
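In code, Steps 1 and 3 can be condensed into a single scoring helper. This sketch simply transcribes the mapping tables above; the function and variable names are illustrative:

```typescript
// Step 1: per-agent severity scales mapped onto a common 1-10 scale.
const severityScale: Record<string, Record<string, number>> = {
  security:    { critical: 10, high: 8, medium: 5, low: 3 },
  performance: { '10x': 9, '5x': 7, '2x': 5, marginal: 2 },
  style:       { blocking: 6, major: 4, minor: 2, nitpick: 1 },
};

// Step 3: configurable category multipliers.
const categoryWeight: Record<string, number> = {
  security: 1.5,
  performance: 1.2,
  style: 1.0,
};

// Final score = normalized severity x category weight (unknown inputs score 0).
function priorityScore(agent: string, severity: string): number {
  const normalized = severityScale[agent]?.[severity.toLowerCase()] ?? 0;
  return normalized * (categoryWeight[agent] ?? 1.0);
}
```

This reproduces the worked scores in Step 4: a security CRITICAL scores 10 × 1.5 = 15.0, and a performance 10x-impact finding scores 9 × 1.2 = 10.8.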
Kiro Subagent Spawning
Kiro CLI supports spawning subagents for parallel execution:
Subagent Spawning Patterns:
Method 1: CLI Subprocess
┌─────────────────────────────────────────────────────────────────────┐
│ const reviews = await Promise.all([ │
│ $`kiro-cli --agent security --print "${prompt}"`, │
│ $`kiro-cli --agent performance --print "${prompt}"`, │
│ $`kiro-cli --agent style --print "${prompt}"`, │
│ ]); │
│ │
│ Pros: Simple, isolated │
│ Cons: Startup overhead per agent │
└─────────────────────────────────────────────────────────────────────┘
Method 2: Agent Configuration
┌─────────────────────────────────────────────────────────────────────┐
│ // .kiro/agents/review-coordinator.yaml │
│ name: review-coordinator │
│ system_prompt: | │
│ You coordinate code reviews by delegating to specialized agents. │
│ │
│ allowed_tools: │
│ - spawn_subagent │
│ │
│ subagents: │
│ - security-reviewer │
│ - performance-reviewer │
│ - style-reviewer │
└─────────────────────────────────────────────────────────────────────┘
Method 3: Direct Invocation
┌─────────────────────────────────────────────────────────────────────┐
│ > "Review this PR with all specialized agents" │
│ │
│ [Coordinator] Spawning security-reviewer... │
│ [Coordinator] Spawning performance-reviewer... │
│ [Coordinator] Spawning style-reviewer... │
│ │
│ [Waiting for subagents...] │
│ │
│ [security-reviewer] Found 3 issues │
│ [performance-reviewer] Found 2 issues │
│ [style-reviewer] Found 5 issues │
│ │
│ [Coordinator] Synthesizing findings... │
└─────────────────────────────────────────────────────────────────────┘
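Method 1 can be reproduced without the zx-style `$` helper using Node's built-in child_process. This is a sketch: the `--agent`/`--print` flags mirror the example above, and the configurable `cli` parameter is an addition of this sketch so it can be exercised without kiro-cli installed.

```typescript
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const run = promisify(execFile);

// Spawn one CLI process per agent and await them together.
// Each subprocess is fully isolated, at the cost of startup overhead.
async function reviewInParallel(prompt: string, cli = 'kiro-cli'): Promise<string[]> {
  const agents = ['security', 'performance', 'style'];
  const results = await Promise.all(
    agents.map((agent) => run(cli, ['--agent', agent, '--print', prompt]))
  );
  return results.map((r) => r.stdout);
}
```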
Real-World Analogy: The Architecture Review Board
Think of this system like a corporate Architecture Review Board:
- The Coordinator is the meeting chair who assigns the agenda
- Security Agent is the security architect who focuses only on threats
- Performance Agent is the performance engineer who watches for bottlenecks
- Style Agent is the tech lead who maintains coding standards
- The Synthesis is the meeting minutes that prioritize action items
Each expert reviews independently, then they meet to consolidate feedback.
Historical Context
Code review automation has evolved:
Code Review Evolution:
1970s: Fagan Inspections
└─► Formal, meeting-based reviews
1990s: Lightweight Reviews
└─► Email-based, async reviews
2000s: Tool-Assisted (Crucible, Review Board)
└─► Web interfaces, inline comments
2010s: Pull Request Workflow
└─► GitHub/GitLab integrated reviews
2020s: AI Linters (Codacy, DeepSource)
└─► Automated issue detection
2024+: Multi-Agent AI Review ◄─── YOU ARE HERE
└─► Specialized AI agents with synthesis
Book References
For deeper understanding:
- “Working Effectively with Legacy Code” by Michael Feathers - Code analysis techniques
- “Clean Code” by Robert C. Martin - Style and quality principles
- “Secure Coding in C and C++” by Robert C. Seacord - Security review patterns
- “High Performance Browser Networking” by Ilya Grigorik - Performance analysis
Complete Project Specification
What You Are Building
A multi-agent code review system that:
- Accepts code for review (file, diff, or PR)
- Spawns specialized agents in parallel
- Collects and normalizes findings from each agent
- Synthesizes a prioritized report with actionable feedback
- Optionally applies fixes for certain issue types
Functional Requirements
| Feature | Behavior |
|---|---|
| Input | Accept file path, git diff, or GitHub PR URL |
| Parallel Review | Run security, performance, style agents simultaneously |
| Findings Format | Standardized structure with line numbers, severity |
| Prioritization | Rank issues by weighted severity |
| Output | Clear, actionable review comments |
Non-Functional Requirements
- Latency: Complete review within 60 seconds for a typical PR
- Accuracy: Minimize false positives while catching real issues
- Extensibility: Easy to add new specialized agents
- Integration: Work with GitHub PR workflow
Solution Architecture
High-Level Component Diagram
┌─────────────────────────────────────────────────────────────────────┐
│ User Request │
│ │
│ "Review PR #42 with all agents" │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────┐│
│ │ Coordinator Agent ││
│ │ ││
│ │ 1. Parse request (PR #42) ││
│ │ 2. Fetch code diff ││
│ │ 3. Spawn subagents ││
│ │ 4. Collect results ││
│ │ 5. Synthesize report ││
│ │ ││
│ └─────────────────────────────────────────────────────────────────┘│
│ │ │
│ │ Parallel Spawn │
│ ▼ │
│ ┌──────────────┬──────────────┬──────────────┐ │
│ │ Security │ Performance │ Style │ │
│ │ Agent │ Agent │ Agent │ │
│ │ │ │ │ │
│ │ Input: Diff │ Input: Diff │ Input: Diff │ │
│ │ │ │ │ │
│ │ Output: │ Output: │ Output: │ │
│ │ [{finding}] │ [{finding}] │ [{finding}] │ │
│ └──────────────┴──────────────┴──────────────┘ │
│ │ │ │ │
│ └──────────────┼──────────────┘ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────┐│
│ │ Synthesizer ││
│ │ ││
│ │ • Normalize severities ││
│ │ • Deduplicate findings ││
│ │ • Apply category weights ││
│ │ • Sort by priority ││
│ │ • Format output ││
│ │ ││
│ └─────────────────────────────────────────────────────────────────┘│
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────┐│
│ │ Prioritized Report ││
│ │ ││
│ │ MUST FIX: ││
│ │ 1. SQL Injection (Security - CRITICAL) ││
│ │ 2. Missing auth check (Security - HIGH) ││
│ │ ││
│ │ SHOULD FIX: ││
│ │ 3. N+1 query (Performance - HIGH) ││
│ │ ... ││
│ └─────────────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────────────┘
Data Flow: Complete Review Cycle
1. Input Processing
┌─────────────────────────────────────────────────────────────────┐
│ Input: "Review PR #42" │
│ │
│ Coordinator: │
│ 1. Parse: PR number = 42 │
│ 2. Fetch: gh pr diff 42 > diff.patch │
│ 3. Extract: Changed files and line ranges │
│ │
│ Output: │
│ { │
│ "files": ["src/api/users.ts", "src/services/auth.ts"], │
│ "diff": "...unified diff content...", │
│ "additions": 150, │
│ "deletions": 23 │
│ } │
└─────────────────────────────────────────────────────────────────┘
│
▼
2. Parallel Agent Execution
┌─────────────────────────────────────────────────────────────────┐
│ Promise.all([ │
│ securityAgent.review(context), // 15 seconds │
│ performanceAgent.review(context), // 12 seconds │
│ styleAgent.review(context), // 8 seconds │
│ ]) │
│ │
│ Total time: ~15 seconds (parallel) │
│ Sequential would be: ~35 seconds │
└─────────────────────────────────────────────────────────────────┘
│
▼
3. Raw Findings Collection
┌─────────────────────────────────────────────────────────────────┐
│ Security Agent Output: │
│ [ │
│ { │
│ "type": "SQL_INJECTION", │
│ "severity": "CRITICAL", │
│ "file": "src/api/users.ts", │
│ "line": 42, │
│ "message": "User input directly interpolated in SQL", │
│ "suggestion": "Use parameterized queries" │
│ }, │
│ ... │
│ ] │
│ │
│ Performance Agent Output: │
│ [ │
│ { │
│ "type": "N_PLUS_1", │
│ "severity": "HIGH", │
│ "file": "src/services/orders.ts", │
│ "line": 78, │
│ "message": "Query inside loop creates N+1 problem", │
│ "suggestion": "Use eager loading or batch query" │
│ }, │
│ ... │
│ ] │
│ │
│ Style Agent Output: │
│ [ │
│ { │
│ "type": "NAMING", │
│ "severity": "LOW", │
│ "file": "src/api/users.ts", │
│ "line": 15, │
│ "message": "Variable 'x' is not descriptive", │
│ "suggestion": "Rename to 'userId' or 'userIndex'" │
│ }, │
│ ... │
│ ] │
└─────────────────────────────────────────────────────────────────┘
│
▼
4. Synthesis and Output
┌─────────────────────────────────────────────────────────────────┐
│ MULTI-AGENT CODE REVIEW - PR #42 │
│ ════════════════════════════════════════════════════════════════│
│ │
│ Summary: 10 issues found (1 critical, 2 high, 7 low) │
│ │
│ MUST FIX (Critical): │
│ ──────────────────────────────────────────────────────────────── │
│ 1. [Security] SQL Injection │
│ File: src/api/users.ts:42 │
│ Issue: User input directly interpolated in SQL query │
│ Fix: Use parameterized query with $1, $2 placeholders │
│ │
│ SHOULD FIX (High): │
│ ──────────────────────────────────────────────────────────────── │
│ 2. [Performance] N+1 Query │
│ File: src/services/orders.ts:78 │
│ Issue: Database query inside loop │
│ Fix: Use .include() for eager loading │
│ │
│ 3. [Security] Missing Rate Limiting │
│ File: src/api/auth.ts:15 │
│ Issue: Login endpoint has no rate limit │
│ Fix: Add rate-limiter-flexible middleware │
│ │
│ CONSIDER (Low): │
│ ──────────────────────────────────────────────────────────────── │
│ 4-10. [Style] Various naming and formatting issues │
│ │
└─────────────────────────────────────────────────────────────────┘
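The parallel-vs-sequential timing claim in step 2 (roughly 15 seconds versus roughly 35 seconds) can be sanity-checked with stub agents that only sleep; the helper names here are illustrative:

```typescript
// Stub "agents" that just sleep, to check that Promise.all wall time
// tracks the slowest agent rather than the sum of all agents.
const delay = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function timeParallel(delaysMs: number[]): Promise<number> {
  const start = Date.now();
  await Promise.all(delaysMs.map((d) => delay(d)));
  return Date.now() - start;
}
```

With delays of 150 ms, 120 ms, and 80 ms, the measured wall time lands near 150 ms, not the 350 ms a sequential run would take.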
Key Interfaces
// Finding from any agent
interface Finding {
agent: 'security' | 'performance' | 'style';
type: string;
severity: 'critical' | 'high' | 'medium' | 'low';
file: string;
line: number;
endLine?: number;
message: string;
suggestion: string;
codeSnippet?: string;
references?: string[];
}
// Review context passed to agents
interface ReviewContext {
diff: string;
files: FileContext[];
metadata: {
prNumber?: number;
baseBranch: string;
headBranch: string;
author: string;
};
}
interface FileContext {
path: string;
content: string;
diff: string;
changedLines: number[];
}
// Synthesized report
interface SynthesizedReport {
summary: {
total: number;
bySeverity: Record<string, number>;
byAgent: Record<string, number>;
};
findings: PrioritizedFinding[];
suggestedActions: Action[];
}
interface PrioritizedFinding extends Finding {
priority: number; // Computed score
relatedFindings?: Finding[]; // Merged duplicates
}
Agent Configuration Files
# .kiro/agents/security-reviewer.yaml
name: security-reviewer
system_prompt: |
You are a security-focused code reviewer with expertise in:
- OWASP Top 10 vulnerabilities
- Authentication and authorization flaws
- Input validation and output encoding
- Cryptographic weaknesses
- Information disclosure
When reviewing code:
1. Focus ONLY on security issues
2. Rate each finding: CRITICAL, HIGH, MEDIUM, LOW
3. Provide specific, actionable remediation
4. Reference CWE numbers when applicable
Output format: JSON array of findings.
allowed_tools:
- read_file
- search_codebase
model: claude-sonnet-4-20250514 # Fast, capable
---
# .kiro/agents/performance-reviewer.yaml
name: performance-reviewer
system_prompt: |
You are a performance-focused code reviewer with expertise in:
- Algorithmic complexity (Big O)
- Database query optimization
- Memory management
- Caching strategies
- Async/parallel execution
When reviewing code:
1. Focus ONLY on performance issues
2. Estimate impact: 10x, 5x, 2x, marginal
3. Suggest benchmarks to verify
4. Provide specific optimization techniques
Output format: JSON array of findings.
allowed_tools:
- read_file
- search_codebase
model: claude-sonnet-4-20250514
---
# .kiro/agents/style-reviewer.yaml
name: style-reviewer
system_prompt: |
You are a code style and quality reviewer with expertise in:
- Naming conventions
- Code organization
- DRY principle
- Documentation
- Consistency
When reviewing code:
1. Focus ONLY on style and quality issues
2. Reference project style guides
3. Distinguish: blocking vs. suggestions
4. Keep suggestions constructive
Output format: JSON array of findings.
allowed_tools:
- read_file
- search_codebase
model: claude-haiku-4-20250514 # Fast, good for style
---
# .kiro/agents/review-coordinator.yaml
name: review-coordinator
system_prompt: |
You are the code review coordinator. Your role:
1. Parse user review requests
2. Delegate to specialized agents
3. Collect and synthesize findings
4. Present prioritized report
You have access to these subagents:
- security-reviewer
- performance-reviewer
- style-reviewer
allowed_tools:
- spawn_subagent
- read_file
- gh_cli
model: claude-sonnet-4-20250514
Phased Implementation Guide
Phase 1: Single Agent Review (Days 1-3)
Goal: Create one working review agent (start with security).
Tasks:
- Create security-reviewer agent configuration
- Implement review prompt that outputs JSON findings
- Parse agent output into structured findings
- Test with sample code containing known vulnerabilities
- Format findings for display
Hints:
- Start with a hardcoded file path for testing
- Use JSON mode for structured output
- Include example findings in the system prompt
Starter Agent Prompt:
const securityReviewPrompt = `
Review this code for security vulnerabilities:
\`\`\`typescript
${codeContent}
\`\`\`
Return a JSON array of findings:
[
{
"type": "SQL_INJECTION",
"severity": "CRITICAL",
"line": 42,
"message": "User input directly in SQL",
"suggestion": "Use parameterized queries"
}
]
If no issues found, return empty array: []
`;
Phase 2: Multiple Agents (Days 4-6)
Goal: Add performance and style agents, run in parallel.
Tasks:
- Create performance-reviewer agent configuration
- Create style-reviewer agent configuration
- Implement parallel execution with Promise.all
- Collect results from all agents
- Handle agent failures gracefully
Hints:
- Each agent should have isolated context
- Use timeouts to prevent hanging agents
- Log which agent produced which findings
Parallel Execution:
async function runAllAgents(context: ReviewContext): Promise<Finding[]> {
const agents = ['security', 'performance', 'style'];
const results = await Promise.allSettled(
agents.map(agent =>
runAgent(agent, context).catch(err => {
console.error(`${agent} agent failed:`, err);
return [];
})
)
);
return results
.filter((r): r is PromiseFulfilledResult<Finding[]> => r.status === 'fulfilled')
.flatMap(r => r.value);
}
Phase 3: Coordinator Agent (Days 7-9)
Goal: Create the orchestrating coordinator agent.
Tasks:
- Create review-coordinator agent configuration
- Implement PR/diff fetching logic
- Build context object for subagents
- Implement subagent spawning
- Collect results from subagents
Hints:
- The coordinator needs access to the gh CLI
- Pass minimal context to subagents (just what they need)
- Track timing for each agent
Coordinator Flow:
class ReviewCoordinator {
async review(request: string): Promise<SynthesizedReport> {
// 1. Parse request
const { prNumber, files } = this.parseRequest(request);
// 2. Fetch context
const context = await this.fetchContext(prNumber);
// 3. Spawn subagents in parallel
const findings = await this.runAllAgents(context);
// 4. Synthesize
return this.synthesize(findings);
}
}
Phase 4: Synthesis and Output (Days 10-14)
Goal: Implement finding synthesis and prioritized output.
Tasks:
- Implement severity normalization
- Detect and merge duplicate findings
- Apply category weights
- Sort by computed priority
- Format beautiful output
Hints:
- Duplicates often have same file and similar line numbers
- Use fuzzy matching for message similarity
- Group by file for developer convenience
Synthesis Implementation:
function synthesize(findings: Finding[]): SynthesizedReport {
// Normalize severities to 1-10 scale
const normalized = findings.map(f => ({
...f,
normalizedSeverity: normalizeSeverity(f.agent, f.severity),
}));
// Deduplicate (same file + similar line + similar message)
const deduplicated = deduplicateFindings(normalized);
// Apply category weights
const weighted = deduplicated.map(f => ({
...f,
priority: f.normalizedSeverity * getCategoryWeight(f.agent),
}));
// Sort by priority
weighted.sort((a, b) => b.priority - a.priority);
return {
summary: computeSummary(weighted),
findings: weighted,
suggestedActions: generateActions(weighted),
};
}
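synthesize relies on a deduplicateFindings helper that the guide leaves open. One possible greedy merge, shown here with a deliberately minimal finding shape (the full Finding interface from the spec carries more fields):

```typescript
interface MiniFinding {
  agent: string;
  file: string;
  line: number;
  message: string;
  normalizedSeverity: number;
  relatedFindings?: MiniFinding[];
}

// Greedy merge: the first finding in a similar group becomes the primary;
// later matches are attached as relatedFindings, and the primary keeps the
// highest severity seen. "Similar" here means same file and a line within 5;
// a message-similarity check could tighten this further.
function deduplicateFindings(findings: MiniFinding[]): MiniFinding[] {
  const merged: MiniFinding[] = [];
  for (const f of findings) {
    const existing = merged.find(
      (m) => m.file === f.file && Math.abs(m.line - f.line) <= 5
    );
    if (existing) {
      existing.relatedFindings = [...(existing.relatedFindings ?? []), f];
      existing.normalizedSeverity = Math.max(existing.normalizedSeverity, f.normalizedSeverity);
    } else {
      merged.push({ ...f });
    }
  }
  return merged;
}
```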
Testing Strategy
Unit Tests
describe('FindingSynthesizer', () => {
describe('normalizeSeverity', () => {
it('maps security CRITICAL to 10', () => {
expect(normalizeSeverity('security', 'critical')).toBe(10);
});
it('maps style low to 2', () => {
expect(normalizeSeverity('style', 'low')).toBe(2);
});
});
describe('deduplicateFindings', () => {
it('merges findings on same line', () => {
const findings = [
{ agent: 'security', file: 'a.ts', line: 42, message: 'SQL injection' },
{ agent: 'performance', file: 'a.ts', line: 42, message: 'Slow query' },
];
const deduped = deduplicateFindings(findings);
expect(deduped).toHaveLength(1);
expect(deduped[0].relatedFindings).toHaveLength(1);
});
});
});
Integration Tests
describe('Full Review Pipeline', () => {
it('reviews a PR with all agents', async () => {
const coordinator = new ReviewCoordinator();
// Review a known test PR
const report = await coordinator.review('Review PR #1');
expect(report.findings.length).toBeGreaterThan(0);
expect(report.summary.byAgent).toHaveProperty('security');
expect(report.summary.byAgent).toHaveProperty('performance');
expect(report.summary.byAgent).toHaveProperty('style');
});
});
Manual Testing
# 1. Start coordinator agent
kiro-cli --agent review-coordinator
# 2. Review a local file
> "Review src/api/users.ts for all issues"
# 3. Review a PR
> "Review PR #42 with all agents"
# 4. Verify output format and prioritization
# Should see categorized, prioritized findings
Common Pitfalls and Debugging
Pitfall 1: Agents Return Inconsistent Formats
Symptom: JSON parsing fails on some agent outputs
Prevention:
function parseAgentOutput(output: string, agent: string): Finding[] {
try {
// Try to extract JSON from markdown code blocks
const jsonMatch = output.match(/```json\n?([\s\S]*?)\n?```/);
const json = jsonMatch ? jsonMatch[1] : output;
const findings = JSON.parse(json);
// Validate structure
return findings.filter(f =>
f.type && f.severity && f.line && f.message
).map(f => ({
...f,
agent,
}));
} catch (e) {
console.error(`Failed to parse ${agent} output:`, e);
return [];
}
}
Pitfall 2: Subagent Times Out
Symptom: One slow agent blocks entire review
Solution:
async function runAgentWithTimeout(agent: string, context: ReviewContext, timeoutMs = 30000) {
const controller = new AbortController();
const timeout = setTimeout(() => controller.abort(), timeoutMs);
try {
return await runAgent(agent, context, { signal: controller.signal });
} catch (e) {
if (e instanceof Error && e.name === 'AbortError') {
console.warn(`${agent} agent timed out after ${timeoutMs}ms`);
return [];
}
throw e;
} finally {
clearTimeout(timeout);
}
}
Pitfall 3: Duplicate Findings Not Detected
Symptom: Same issue reported by multiple agents separately
Solution:
function isSimilarFinding(a: Finding, b: Finding): boolean {
// Same file
if (a.file !== b.file) return false;
// Similar line (within 5 lines)
if (Math.abs(a.line - b.line) > 5) return false;
// Similar message (fuzzy match)
const similarity = stringSimilarity(a.message, b.message);
return similarity > 0.6;
}
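stringSimilarity is not defined in this guide; one dependency-free option is the Sorensen-Dice coefficient over character bigrams, which scores near-duplicate messages well above the 0.6 threshold used above:

```typescript
// Sorensen-Dice similarity over character bigrams: 1.0 for identical
// strings, 0.0 for strings that share no bigrams.
function stringSimilarity(a: string, b: string): number {
  const s = a.toLowerCase();
  const t = b.toLowerCase();
  if (s.length < 2 || t.length < 2) return s === t ? 1 : 0;
  const bigrams = (str: string): Map<string, number> => {
    const counts = new Map<string, number>();
    for (let i = 0; i < str.length - 1; i++) {
      const bg = str.slice(i, i + 2);
      counts.set(bg, (counts.get(bg) ?? 0) + 1);
    }
    return counts;
  };
  const sCounts = bigrams(s);
  const tCounts = bigrams(t);
  let overlap = 0;
  for (const [bg, count] of sCounts) {
    overlap += Math.min(count, tCounts.get(bg) ?? 0);
  }
  return (2 * overlap) / (s.length - 1 + t.length - 1);
}
```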
Extensions and Challenges
Extension 1: GitHub Integration
Post review comments directly to PRs:
async function postToGitHub(report: SynthesizedReport, prNumber: number) {
for (const finding of report.findings) {
// Option A: post a plain PR comment
await $`gh pr comment ${prNumber} --body ${formatComment(finding)}`;
// Option B: post an inline review comment via the REST API instead
// await $`gh api repos/:owner/:repo/pulls/${prNumber}/comments -f body="${finding.message}" -f path="${finding.file}" -F line=${finding.line}`;
}
}
Extension 2: Learning from Feedback
Track which findings developers actually fix:
interface FindingFeedback {
  findingId: string;
  findingType: string;
  wasFixed: boolean;
  wasHelpful: boolean;
  comment?: string;
}
// Use feedback to tune severity weights:
// raise weights for types that are frequently fixed,
// lower them for types that are often dismissed.
function updateWeights(feedback: FindingFeedback[]): Map<string, number> {
  const fixRates = new Map<string, number>();
  for (const type of new Set(feedback.map(f => f.findingType))) {
    const ofType = feedback.filter(f => f.findingType === type);
    fixRates.set(type, ofType.filter(f => f.wasFixed).length / ofType.length);
  }
  return fixRates;
}
Extension 3: Custom Agents
Allow users to define project-specific agents:
# .kiro/agents/react-reviewer.yaml
name: react-reviewer
system_prompt: |
You are a React specialist. Review for:
- Hook rules violations
- State management anti-patterns
- Performance issues (missing memo, key props)
- Accessibility issues
Extension 4: Auto-Fix Capability
For certain issues, apply fixes automatically:
interface AutoFix {
type: string;
pattern: RegExp;
replacement: string | ((match: string) => string);
}
const autoFixes: AutoFix[] = [
{
type: 'MISSING_AWAIT',
pattern: /(?<!await\s)(fetch\()/g,
replacement: 'await $1',
},
];
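Applying the fix table could look like the sketch below. The AutoFix interface is restated so the snippet stands alone, and any automatically rewritten file should be re-reviewed before committing:

```typescript
interface AutoFix {
  type: string;
  pattern: RegExp;
  replacement: string | ((match: string) => string);
}

// Run every fix over the source in order; later fixes see earlier rewrites.
// The ternary narrows the string | function union for replace's overloads.
function applyAutoFixes(source: string, fixes: AutoFix[]): string {
  return fixes.reduce(
    (code, fix) =>
      typeof fix.replacement === 'string'
        ? code.replace(fix.pattern, fix.replacement)
        : code.replace(fix.pattern, fix.replacement),
    source
  );
}
```

With the MISSING_AWAIT rule above, `fetch(` calls not already preceded by `await ` gain the keyword, while correctly awaited calls are left untouched.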
Extension 5: Review History
Track review trends over time:
Review Trends (Last 30 Days):
┌─────────────────────────────────────────────────────────────────┐
│ Total Reviews: 47 │
│ Total Findings: 234 │
│ │
│ Top Issue Types: │
│ 1. N+1 Queries 45 ████████████ │
│ 2. Missing Auth 23 ██████ │
│ 3. Hardcoded Values 18 █████ │
│ │
│ Trend: Security issues ↓ 15%, Performance issues ↑ 8% │
└─────────────────────────────────────────────────────────────────┘
Real-World Connections
Industry Adoption
Multi-agent review patterns are used by:
- Amazon CodeGuru: Security and performance analysis
- DeepSource: Multiple analyzers running in parallel
- Codacy: Rule-based multi-category checks
- Snyk Code: Security-focused AI review
Production Considerations
| Concern | Solution |
|---|---|
| Cost | Use cheaper models for style, expensive for security |
| Latency | Parallel execution, aggressive timeouts |
| Accuracy | Track false positive rates, tune prompts |
| Coverage | Add new agents for project-specific patterns |
| Integration | GitHub Actions, GitLab CI/CD, Bitbucket |
Self-Assessment Checklist
Knowledge Verification
- Can you explain the parallel vs. sequential multi-agent patterns?
- How do you design agent specialization through system prompts?
- What is the finding synthesis process?
- Why is severity normalization important?
- How do you handle agent failures gracefully?
Implementation Verification
- All three agents run in parallel successfully
- Findings are properly attributed to their source agent
- Duplicate findings are detected and merged
- Output is sorted by priority
- The system handles agent timeouts gracefully
Quality Verification
- Security agent catches common vulnerabilities
- Performance agent identifies complexity issues
- Style agent flags inconsistencies
- False positive rate is acceptable
- Report is actionable and clear
Integration Verification
- Works with local files
- Works with git diffs
- Works with GitHub PRs
- Results can be posted as PR comments
Summary
Building a multi-agent code review system teaches you:
- Agent Orchestration: Coordinating multiple AI agents in parallel
- Specialization Design: Creating focused agents with deep expertise
- Result Synthesis: Combining and prioritizing findings from multiple sources
- Production Patterns: Handling timeouts, failures, and inconsistencies
The multi-agent pattern you have learned here applies far beyond code review: it works for any complex task that benefits from multiple specialized perspectives, such as security audits, documentation review, test planning, and more.
Next Project: P26-mdflow-workflow-engine.md - Executable markdown workflows with AI