Project 37: Multi-Agent Orchestrator - Parallel Claude Swarm
Build an orchestration system that spawns multiple Claude Code instances in headless mode, assigns them specialized tasks, coordinates their work through shared state, handles failures gracefully, and combines their outputs into coherent results.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Master |
| Time Estimate | 1 month+ |
| Languages | TypeScript (Alternatives: Python, Go) |
| Prerequisites | All previous projects, understanding of distributed systems concepts |
| Key Topics | Multi-agent systems, distributed coordination, fault tolerance, consensus |
| Knowledge Area | Distributed Systems / Agent Orchestration |
| Software/Tools | Claude Code Headless, Task API |
| Coolness Level | Level 5: Pure Magic (Super Cool) |
| Business Potential | 4. The “Open Core” Infrastructure |
1. Project Overview
This is the pinnacle of Claude Code automation. You will build a system that coordinates multiple Claude Code instances working in parallel - like a swarm of AI workers each with specialized capabilities, all collaborating on complex tasks that no single agent could accomplish efficiently.
What makes this Master level:
- Distributed systems thinking is required
- Failure handling must be robust
- Coordination problems are subtle and complex
- Performance must be optimized across multiple agents
- State must be synchronized across processes
2. Real World Outcome
You will have a multi-agent orchestration system capable of complex, parallelized workflows:
Example: Large-Scale Code Migration
$ claude-swarm run ./migration-plan.yaml
+------------------------------------------------------------------+
| CLAUDE SWARM ORCHESTRATOR |
+------------------------------------------------------------------+
Loading migration plan...
Target: Migrate JavaScript codebase to TypeScript
Spawning agent swarm:
+-------------------------------------------------------------------+
| Agent ID | Role | Assigned Work | Status |
+----------+-----------+----------------------------------+--------+
| agent-1 | Analyzer | Scanning codebase for types | INIT |
| agent-2 | Analyzer | Identifying external deps | INIT |
| agent-3 | Converter | Converting /src/utils (15 files) | INIT |
| agent-4 | Converter | Converting /src/components (32) | INIT |
| agent-5 | Converter | Converting /src/services (8) | INIT |
| agent-6 | Validator | Type-checking converted files | WAIT |
| agent-7 | Reviewer | Reviewing conversion quality | WAIT |
+----------+-----------+----------------------------------+--------+
Progress:
[================------------] 60% - 33/55 files converted
Agent Status (real-time):
agent-1: COMPLETE - Found 127 type patterns
agent-2: COMPLETE - 15 deps need @types packages
agent-3: WORKING - 12/15 files done
agent-4: WORKING - 20/32 files done
agent-5: COMPLETE - All services converted
agent-6: WAITING - Need more completed files
agent-7: WORKING - 5 files reviewed
WARNING agent-4 error on /src/components/DataGrid.jsx:
"Complex HOC pattern needs manual intervention"
-> Added to manual-review queue
-> Agent-4 continuing with remaining files...
...
MIGRATION COMPLETE!
+------------------------------------------------------------------+
| FINAL RESULTS |
+------------------------------------------------------------------+
| Files auto-converted: | 52/55 |
| Files need manual review: | 3 |
| Type errors: | 0 |
| Generated artifacts: | tsconfig.json, types/*.d.ts |
| Total time: | 4m 32s (vs ~2h sequential) |
| Parallel speedup: | ~26x |
+------------------------------------------------------------------+
Report: ./migration-report.html
Example: Parallel Codebase Analysis
$ claude-swarm analyze --depth=deep ./large-monorepo
Swarm Configuration:
- 4 Architecture Analyzers
- 2 Security Auditors
- 2 Performance Profilers
- 1 Documentation Reviewer
- 1 Result Aggregator
[====================] 100% Complete
Analysis Report Generated:
- Architecture diagram: ./reports/architecture.svg
- Security findings: 3 high, 7 medium, 12 low
- Performance hotspots: 5 identified
- Documentation coverage: 67%
- Tech debt estimate: 340 story points
3. The Core Question You Are Answering
“How do you coordinate multiple AI agents to accomplish more than one agent could alone?”
This is not just parallelization for speed. It is about specialization - one agent that excels at analysis, another at conversion, another at review. Together they produce higher quality results than one agent trying to do everything.
Sub-questions to consider:
- How do you partition work without losing context?
- How do agents share discoveries without overwhelming each other?
- How do you handle conflicting recommendations from different agents?
- What happens when one agent fails mid-task?
4. Concepts You Must Understand First
Stop and research these before coding:
4.1 Agent Specialization
Questions to answer:
- How do you make an agent “expert” in a task?
- What is the tradeoff between generalist and specialist agents?
- How do you define agent boundaries?
Key insight: Specialization comes from output styles and focused system prompts. An “Analyzer” agent gets a different persona than a “Converter” agent.
Reference: “Multi-Agent Systems” by Wooldridge - Chapters 4-5
4.2 Coordination Patterns
Questions to answer:
- What is the difference between orchestration and choreography?
- How do you handle shared state across agents?
- What synchronization primitives do you need?
+-----------------------------------------------------------------------+
| ORCHESTRATION vs CHOREOGRAPHY |
+-----------------------------------------------------------------------+
| |
| ORCHESTRATION (Central Control): |
| |
| +---------------+ |
| | Orchestrator | <-- Single point of coordination |
| +---------------+ |
| / | \ |
| v v v |
| +------+ +------+ +------+ |
| |Agent1| |Agent2| |Agent3| <-- Agents receive commands |
| +------+ +------+ +------+ |
| |
| Pros: Simple control flow, easy to debug |
| Cons: Single point of failure, bottleneck potential |
| |
+-----------------------------------------------------------------------+
| |
| CHOREOGRAPHY (Decentralized): |
| |
| +------+ +------+ |
| |Agent1| <-> |Agent2| <-- Agents communicate directly |
| +------+ +------+ |
| ^ ^ |
| \ / |
| v v |
| +------+ |
| |Agent3| |
| +------+ |
| |
| Pros: No single point of failure, scalable |
| Cons: Complex to debug, emergent behavior |
| |
+-----------------------------------------------------------------------+
Reference: “Designing Data-Intensive Applications” by Kleppmann - Chapters 8-9
4.3 Failure Modes in Distributed Systems
Questions to answer:
- What happens when one agent fails?
- How do you implement retries with backoff?
- When should the whole swarm fail vs. continue?
The Eight Fallacies of Distributed Computing (Peter Deutsch):
- The network is reliable
- Latency is zero
- Bandwidth is infinite
- The network is secure
- Topology does not change
- There is one administrator
- Transport cost is zero
- The network is homogeneous
Reference: “Release It!” by Michael Nygard - Chapters 4-5
4.4 Consensus and Coordination Algorithms
Simplified concepts to understand:
- Leader Election: How do agents decide who coordinates?
- Distributed Locks: How do you prevent duplicate work?
- Work Stealing: How do idle agents get more work?
Reference: “Designing Data-Intensive Applications” Chapter 9 - Consistency and Consensus
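To make the "distributed lock" idea concrete, here is a minimal sketch of claiming a task through exclusive file creation. It assumes all agents share one filesystem; `tryClaimTask`, the `.lock` naming, and the task and agent names are hypothetical, and stale locks left behind by crashed agents are not handled.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

/**
 * Try to claim a task by creating `<lockDir>/<taskId>.lock` exclusively.
 * The "wx" flag makes the open fail with EEXIST if the file already exists,
 * so at most one caller can win the claim for a given task ID.
 * Assumption: taskId is safe to use as a file name.
 */
function tryClaimTask(lockDir: string, taskId: string, agentName: string): boolean {
  const lockPath = path.join(lockDir, `${taskId}.lock`);
  try {
    const fd = fs.openSync(lockPath, "wx");
    fs.writeSync(fd, agentName); // record the owner for debugging
    fs.closeSync(fd);
    return true;
  } catch (err) {
    if ((err as { code?: string }).code === "EEXIST") {
      return false; // another agent already holds this task
    }
    throw err; // unexpected I/O error: surface it
  }
}

// Usage: two agents race for the same task; only the first claim succeeds.
const lockDir = fs.mkdtempSync(path.join(os.tmpdir(), "swarm-locks-"));
const first = tryClaimTask(lockDir, "convert-utils", "agent-3");
const second = tryClaimTask(lockDir, "convert-utils", "agent-4");
console.log({ first, second });
```

A real swarm would also need a way to expire locks whose owner has died, for example by storing a timestamp next to the owner name.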
5. Questions to Guide Your Design
Before implementing, think through these:
5.1 Work Division
- How do you partition work across agents?
  - By file? By directory? By task type?
- What is the optimal number of agents for a task?
  - Too few: Slow
  - Too many: Coordination overhead dominates
- How do you handle uneven workloads?
  - Some files are 10 lines, others are 1000
  - Work stealing vs. static assignment
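One way to think about uneven workloads under static assignment is a greedy "largest first" partition. The sketch below is an illustration only: it assumes line count is a usable proxy for effort, and `FileTask` and `partitionBySize` are hypothetical names.

```typescript
interface FileTask {
  path: string;
  lines: number; // assumed proxy for how long the file takes to process
}

/**
 * Greedy "longest processing time first" partitioning: sort tasks by size
 * (largest first) and always hand the next task to the least-loaded agent.
 */
function partitionBySize(tasks: FileTask[], agentCount: number): FileTask[][] {
  if (agentCount < 1) throw new Error("agentCount must be at least 1");
  const buckets: FileTask[][] = Array.from({ length: agentCount }, () => []);
  const loads: number[] = Array.from({ length: agentCount }, () => 0);
  const sorted = [...tasks].sort((a, b) => b.lines - a.lines);
  for (const task of sorted) {
    // Find the agent with the smallest total so far
    let target = 0;
    for (let i = 1; i < agentCount; i++) {
      if (loads[i] < loads[target]) target = i;
    }
    buckets[target].push(task);
    loads[target] += task.lines;
  }
  return buckets;
}

// Usage: one 1000-line file and several small ones, split across two agents.
const sampleTasks: FileTask[] = [
  { path: "a.js", lines: 1000 },
  { path: "b.js", lines: 10 },
  { path: "c.js", lines: 400 },
  { path: "d.js", lines: 300 },
  { path: "e.js", lines: 250 },
  { path: "f.js", lines: 40 },
];
const plan = partitionBySize(sampleTasks, 2);
console.log(plan.map((bucket) => bucket.map((t) => t.path)));
```

Static partitioning like this only balances the estimate; work stealing corrects for estimates that turn out wrong at run time.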
5.2 Communication
- How do agents share results?
  - Shared filesystem?
  - Message queue?
  - In-memory (Redis)?
- What is the message format?
  - Structured JSON?
  - Streaming updates?
  - Event sourcing?
- How do you handle ordering and conflicts?
  - Two agents find the same bug
  - One agent’s output invalidates another’s
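For the "two agents find the same bug" case, a structured message format makes duplicates easy to detect. The following is a sketch under assumptions: the `Finding` shape and the fingerprint rule (same kind, file, and line) are hypothetical choices, not a prescribed format.

```typescript
interface Finding {
  agent: string;
  kind: "bug" | "type-info" | "dependency";
  file: string;
  line: number;
  detail: string;
}

type MergedFinding = Finding & { reportedBy: string[] };

/** Two findings are treated as the same if kind, file and line all match. */
function fingerprint(f: Finding): string {
  return `${f.kind}:${f.file}:${f.line}`;
}

/**
 * Merge findings from several agents. The first report of each fingerprint
 * is kept; later duplicates only add their agent name to `reportedBy`.
 */
function dedupeFindings(findings: Finding[]): MergedFinding[] {
  const merged = new Map<string, MergedFinding>();
  for (const f of findings) {
    const key = fingerprint(f);
    const existing = merged.get(key);
    if (existing) {
      if (existing.reportedBy.indexOf(f.agent) === -1) existing.reportedBy.push(f.agent);
    } else {
      merged.set(key, { ...f, reportedBy: [f.agent] });
    }
  }
  return Array.from(merged.values());
}

// Usage: agent-6 and agent-7 both report the same location.
const deduped = dedupeFindings([
  { agent: "agent-6", kind: "bug", file: "src/DataGrid.jsx", line: 42, detail: "unchecked null" },
  { agent: "agent-7", kind: "bug", file: "src/DataGrid.jsx", line: 42, detail: "possible null deref" },
  { agent: "agent-7", kind: "bug", file: "src/Modal.jsx", line: 7, detail: "missing key prop" },
]);
console.log(deduped.map((f) => `${fingerprint(f)} <- ${f.reportedBy.join(", ")}`));
```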
5.3 Progress and Observability
- How do you track overall progress?
  - Not just “tasks complete” but “meaningful progress”
- How do you visualize agent status?
  - Real-time dashboard?
  - CLI output?
- What do you log for debugging?
  - Too little: Cannot diagnose failures
  - Too much: Information overload
6. Thinking Exercise: Design Agent Topologies
Consider three coordination patterns and when each is appropriate:
Pattern A: Hub and Spoke
+--------------+
| Orchestrator |
+--------------+
/ | \
/ | \
v v v
+-------+ +-------+ +-------+
|Agent 1| |Agent 2| |Agent 3|
+-------+ +-------+ +-------+
Characteristics:
- Central control over all agents
- Simple to implement and reason about
- Orchestrator can become bottleneck
- Single point of failure
Use when:
- Tasks are independent
- Central state coordination is required
- Debugging visibility is important
Pattern B: Pipeline
+-------+ +-------+ +-------+ +--------+
|Agent 1| --> |Agent 2| --> |Agent 3| --> | Result |
+-------+ +-------+ +-------+ +--------+
(Analyze) (Convert) (Validate)
Characteristics:
- Each agent has a specialized role
- Output of one feeds input of next
- Natural for staged workflows
- Bottleneck at slowest stage
Use when:
- Tasks have natural phases
- Each phase has different requirements
- Order matters
Pattern C: Mesh
+-------+ <---------> +-------+
|Agent 1| |Agent 2|
+-------+ +-------+
^ ^
| |
v v
+-------+ <---------> +-------+
|Agent 3| |Agent 4|
+-------+ +-------+
Characteristics:
- All agents can communicate
- Most flexible but most complex
- Emergent coordination
- Difficult to debug
Use when:
- Tasks require collaboration
- No clear hierarchy
- Maximum parallelism needed
Design Questions
For each pattern, answer:
- How do you implement it with Claude Code headless?
- What are the failure modes?
- How do you monitor progress?
- What coordination primitives are needed?
7. The Interview Questions They Will Ask
Prepare to answer these:
- “How would you handle a situation where agents produce conflicting results?”
  Think about: Voting, confidence scores, human escalation, domain-specific merge strategies
- “What is your strategy for debugging a multi-agent system?”
  Think about: Correlation IDs, distributed tracing, replay capabilities, deterministic testing
- “How do you prevent agents from duplicating work?”
  Think about: Work queues, distributed locks, idempotent operations, task fingerprinting
- “What is the tradeoff between agent count and coordination overhead?”
  Think about: Amdahl’s Law, communication costs, diminishing returns, optimal parallelism
- “How would you implement checkpointing for long-running swarms?”
  Think about: State persistence, resume semantics, partial completion, crash recovery
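For the agent-count question, it helps to put numbers on Amdahl's Law: if a fraction p of the work can run in parallel, the best possible speedup with n agents is 1 / ((1 - p) + p / n). The sketch below simply evaluates that formula; the 90% figure is an example value, and real swarms will do worse because the formula ignores coordination cost.

```typescript
/**
 * Upper bound on speedup from Amdahl's Law.
 * parallelFraction: share of the work that can be spread across agents (0..1).
 * agents: number of agents working in parallel (>= 1).
 */
function amdahlSpeedup(parallelFraction: number, agents: number): number {
  if (parallelFraction < 0 || parallelFraction > 1) {
    throw new RangeError("parallelFraction must be between 0 and 1");
  }
  if (agents < 1) throw new RangeError("agents must be at least 1");
  return 1 / (1 - parallelFraction + parallelFraction / agents);
}

// Example: 90% of the job parallelizes, 10% is serial (planning, merging).
for (const n of [1, 2, 5, 10, 50]) {
  console.log(`${n} agents -> ${amdahlSpeedup(0.9, n).toFixed(2)}x`);
}
```

With 90% parallel work, five agents give at most about 3.57x, and no agent count can exceed 10x, which is the diminishing-returns argument in one line.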
8. Hints in Layers
Only read when stuck:
Hint 1: Start with Two Agents
Build the simplest possible orchestration: one agent produces, one agent reviews. Get that working first before scaling up.
// Simplest possible multi-agent
async function simplestSwarm() {
const producer = spawnAgent("producer", "Write code for X");
const result = await waitForResult(producer);
const reviewer = spawnAgent("reviewer", `Review this code: ${result}`);
const review = await waitForResult(reviewer);
return { code: result, review };
}
Hint 2: Use Session IDs
Each headless Claude session has an ID. Use --resume to have agents continue from where they left off.
# First interaction
claude -p "Start analyzing codebase" --output-format json
# Returns session_id in output
# Continue same session
claude --resume <session_id> -p "Now focus on security issues"
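On the orchestrator side, the two commands above translate into "parse the ID, then build the follow-up arguments". This is a sketch only: it assumes the JSON output is a single object with a string `session_id` field, as the comment above describes, and `extractSessionId` and `resumeArgs` are hypothetical helper names.

```typescript
/**
 * Pull `session_id` out of the JSON printed by a headless run.
 * Returns null when stdout is not valid JSON or the field is missing.
 * Assumption: the output is one JSON object with a string `session_id`.
 */
function extractSessionId(stdout: string): string | null {
  try {
    const parsed: unknown = JSON.parse(stdout);
    if (typeof parsed === "object" && parsed !== null) {
      const id = (parsed as { session_id?: unknown }).session_id;
      return typeof id === "string" ? id : null;
    }
    return null;
  } catch {
    return null; // not JSON at all, e.g. an error message on stdout
  }
}

/** Build the argument list for a follow-up prompt in the same session. */
function resumeArgs(sessionId: string, prompt: string): string[] {
  return ["--resume", sessionId, "-p", prompt, "--output-format", "json"];
}

// Usage with a made-up output line; pass the result to spawn("claude", args).
const sampleOutput = '{"session_id":"abc123","result":"analysis started"}';
const sessionId = extractSessionId(sampleOutput);
if (sessionId !== null) {
  console.log(resumeArgs(sessionId, "Now focus on security issues"));
}
```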
Hint 3: File-Based Coordination
The simplest shared state is files. Agents write to specific locations, others read when ready.
/tmp/swarm-work/
  /agent-1/
    output.json        # Agent 1's results
    status.json        # Agent 1's current status
  /agent-2/
    output.json
    status.json
  /shared/
    work-queue.json    # Remaining work
    completed.json     # Finished items
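With files as shared state, a reader can catch a writer halfway through a write. A common way around that is to write to a temporary file and rename it into place. The sketch below assumes a POSIX-style filesystem where a rename within one directory replaces the target in a single step; the `AgentStatus` fields are hypothetical.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

interface AgentStatus {
  agent: string;
  state: "working" | "complete" | "failed";
  done: number;
  total: number;
}

/** Write status.json via a temp file + rename so readers never see a partial file. */
function writeStatus(agentDir: string, status: AgentStatus): void {
  fs.mkdirSync(agentDir, { recursive: true });
  const finalPath = path.join(agentDir, "status.json");
  const tmpPath = `${finalPath}.${process.pid}.tmp`;
  fs.writeFileSync(tmpPath, JSON.stringify(status, null, 2));
  fs.renameSync(tmpPath, finalPath); // replaces any previous status.json
}

/** Read an agent's status, or null if it has not written one yet. */
function readStatus(agentDir: string): AgentStatus | null {
  try {
    const raw = fs.readFileSync(path.join(agentDir, "status.json"), "utf-8");
    return JSON.parse(raw) as AgentStatus;
  } catch {
    return null; // missing or unreadable: treat as "no status yet"
  }
}

// Usage: the agent writes, the orchestrator polls.
const swarmRoot = fs.mkdtempSync(path.join(os.tmpdir(), "swarm-work-"));
const agent1Dir = path.join(swarmRoot, "agent-1");
writeStatus(agent1Dir, { agent: "agent-1", state: "working", done: 12, total: 15 });
console.log(readStatus(agent1Dir));
```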
Hint 4: Output Styles for Specialization
Give each agent a different output style that focuses them on their task.
# styles/analyzer.md
You are a code analyzer. Focus ONLY on:
- Identifying patterns
- Cataloging dependencies
- Finding type information
Output structured JSON, no explanations.
# styles/converter.md
You are a code converter. Focus ONLY on:
- Converting JavaScript to TypeScript
- Adding type annotations
- Preserving functionality
Be conservative - flag uncertain conversions.
9. Books That Will Help
| Topic | Book | Chapters | Why It Helps |
|---|---|---|---|
| Distributed Systems | “Designing Data-Intensive Applications” by Kleppmann | Ch. 8-9 | Understanding coordination, consistency, consensus |
| Agent Systems | “Multi-Agent Systems” by Wooldridge | Ch. 4-5 | Agent communication, coordination protocols |
| Resilience | “Release It!” by Nygard | Ch. 4-5 | Failure patterns, circuit breakers, bulkheads |
| Concurrency | “Seven Concurrency Models in Seven Weeks” by Butcher | All | Different coordination paradigms |
| AI Agents | “Artificial Intelligence: A Modern Approach” by Russell & Norvig | Ch. 2 | Agent architectures and design |
10. Architecture Deep Dive
10.1 System Architecture
+------------------------------------------------------------------------+
| MULTI-AGENT ORCHESTRATION SYSTEM |
+------------------------------------------------------------------------+
| |
| +------------------------------------------------------------------+ |
| | ORCHESTRATOR | |
| | (TypeScript/Node.js) | |
| | | |
| | +------------+ +-------------+ +------------+ +------------+ | |
| | | Work Queue | | State Store | | Agent Pool | | Result Agg | | |
| | +------------+ +-------------+ +------------+ +------------+ | |
| +------------------------------------------------------------------+ |
| | | | |
| | spawn | read/write | results |
| v v v |
| +------------------------------------------------------------------+ |
| | AGENT LAYER | |
| | | |
| | +----------+ +----------+ +----------+ +----------+ | |
| | | Agent 1 | | Agent 2 | | Agent 3 | | Agent N | | |
| | | Headless | | Headless | | Headless | | Headless | | |
| | | Claude | | Claude | | Claude | | Claude | | |
| | +----------+ +----------+ +----------+ +----------+ | |
| | | | | | | |
| +------------------------------------------------------------------+ |
| | | | | |
| v v v v |
| +------------------------------------------------------------------+ |
| | SHARED STATE LAYER | |
| | | |
| | +---------------+ +---------------+ +-------------------+ | |
| | | File System | | Redis/SQLite | | Message Queue | | |
| | | (Simplest) | | (Structured) | | (Event-Driven) | | |
| | +---------------+ +---------------+ +-------------------+ | |
| +------------------------------------------------------------------+ |
| |
+------------------------------------------------------------------------+
10.2 Agent Spawning Pattern
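import { spawn, ChildProcess } from 'child_process';
import { createInterface } from 'readline';

interface AgentConfig {
  name: string;
  role: 'analyzer' | 'converter' | 'reviewer' | 'validator';
  outputStyle: string;
  workDir: string;
}

interface AgentHandle {
  name: string;
  process: ChildProcess;
  sessionId: string | null;
  status: 'starting' | 'running' | 'complete' | 'failed';
  startTime: number;
}

// getPromptForRole() and getToolsForRole() are helpers you define per role.
async function spawnAgent(config: AgentConfig): Promise<AgentHandle> {
  const { name, role, outputStyle, workDir } = config;
  // Flag names follow this guide; confirm them with `claude --help` for your CLI version.
  const proc = spawn('claude', [
    '-p', getPromptForRole(role),
    '--output-format', 'stream-json',
    '--verbose', // some CLI versions require this for stream-json in print mode
    '--output-style', outputStyle,
    '--allowedTools', getToolsForRole(role).join(',')
  ], {
    cwd: workDir, // run the agent inside its working directory
    env: { ...process.env, CLAUDE_AGENT_NAME: name }
  });

  const handle: AgentHandle = {
    name,
    process: proc,
    sessionId: null,
    status: 'starting',
    startTime: Date.now()
  };

  // Read stdout line by line. Raw 'data' chunks can end in the middle of a
  // line, which would make JSON.parse throw on the partial text.
  const lines = createInterface({ input: proc.stdout });
  lines.on('line', (line) => {
    if (!line.trim()) return;
    let event: { session_id?: string };
    try {
      event = JSON.parse(line);
    } catch {
      return; // ignore output that is not JSON
    }
    if (event.session_id && !handle.sessionId) {
      handle.sessionId = event.session_id;
      handle.status = 'running';
    }
  });

  // Record the final state so the orchestrator can detect failures.
  proc.on('error', () => { handle.status = 'failed'; });
  proc.on('exit', (code) => { handle.status = code === 0 ? 'complete' : 'failed'; });

  return handle;
}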
10.3 Result Aggregation
interface AgentResult {
agentName: string;
role: string;
output: any;
duration: number;
errors: string[];
}
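// Shape of the conflict records produced by detectConflicts
interface Conflict {
  type: 'file_conflict';
  file: string;
  agents: string[];
  resolution: 'manual_review';
}
interface MergedResult {
  success: boolean;
  outputs: Map<string, any>;
  conflicts: Conflict[];
  summary: string;
}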
async function aggregateResults(
agents: AgentHandle[]
): Promise<MergedResult> {
// Wait for all agents to complete
const results = await Promise.allSettled(
agents.map(a => waitForCompletion(a))
);
const successful: AgentResult[] = [];
const failed: { agent: string; error: Error }[] = [];
results.forEach((result, idx) => {
if (result.status === 'fulfilled') {
successful.push(result.value);
} else {
failed.push({
agent: agents[idx].name,
error: result.reason
});
}
});
// Detect conflicts between agent outputs
const conflicts = detectConflicts(successful);
// Merge based on role-specific strategies
const merged = mergeByRole(successful);
return {
success: failed.length === 0,
outputs: merged,
conflicts,
summary: generateSummary(successful, failed, conflicts)
};
}
function detectConflicts(results: AgentResult[]): Conflict[] {
const conflicts: Conflict[] = [];
// Example: Two agents modified the same file
const fileModifications = new Map<string, AgentResult[]>();
for (const result of results) {
const files = result.output.modifiedFiles || [];
for (const file of files) {
const existing = fileModifications.get(file) || [];
existing.push(result);
fileModifications.set(file, existing);
}
}
for (const [file, agents] of fileModifications) {
if (agents.length > 1) {
conflicts.push({
type: 'file_conflict',
file,
agents: agents.map(a => a.agentName),
resolution: 'manual_review'
});
}
}
return conflicts;
}
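The detection code above sends every conflict to manual review. If agents also report how confident they are, some conflicts can be settled automatically. The following is a sketch under that assumption: the `confidence` score, the `Candidate` and `Resolution` shapes, and the 0.2 margin are all hypothetical.

```typescript
interface Candidate {
  agent: string;
  file: string;
  confidence: number; // assumed self-reported score in [0, 1]
}

type Resolution =
  | { file: string; action: "accept"; winner: string }
  | { file: string; action: "manual_review"; agents: string[] };

/**
 * Accept the most confident candidate only if it beats the runner-up by at
 * least `minMargin`; otherwise escalate the file to a human.
 */
function resolveConflict(candidates: Candidate[], minMargin = 0.2): Resolution {
  if (candidates.length === 0) throw new Error("no candidates to resolve");
  const sorted = [...candidates].sort((a, b) => b.confidence - a.confidence);
  const best = sorted[0];
  const runnerUp = sorted.length > 1 ? sorted[1] : undefined;
  if (runnerUp === undefined || best.confidence - runnerUp.confidence >= minMargin) {
    return { file: best.file, action: "accept", winner: best.agent };
  }
  return { file: best.file, action: "manual_review", agents: sorted.map((c) => c.agent) };
}

// Usage: a clear winner is accepted, a close call is escalated.
const clearCase = resolveConflict([
  { agent: "agent-3", file: "src/utils/date.ts", confidence: 0.9 },
  { agent: "agent-4", file: "src/utils/date.ts", confidence: 0.6 },
]);
const closeCase = resolveConflict([
  { agent: "agent-3", file: "src/utils/date.ts", confidence: 0.8 },
  { agent: "agent-4", file: "src/utils/date.ts", confidence: 0.7 },
]);
console.log(clearCase, closeCase);
```

Self-reported confidence is a weak signal, so a conservative margin and a manual-review fallback are the safer defaults.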
10.4 Failure Handling
interface RetryConfig {
  maxRetries: number; // total number of attempts, including the first
  backoffMs: number;
  backoffMultiplier: number;
  maxBackoffMs: number;
}

// Promise-based delay used between attempts
const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(
  operation: () => Promise<T>,
  config: RetryConfig,
  context: string
): Promise<T> {
  let lastError: Error | undefined;
  let delay = config.backoffMs;
  for (let attempt = 0; attempt < config.maxRetries; attempt++) {
    try {
      return await operation();
    } catch (error) {
      // Normalize non-Error throwables so .message is always available
      lastError = error instanceof Error ? error : new Error(String(error));
      console.error(
        `[${context}] Attempt ${attempt + 1} failed: ${lastError.message}`
      );
      if (attempt < config.maxRetries - 1) {
        console.log(`[${context}] Retrying in ${delay}ms...`);
        await sleep(delay);
        // Exponential backoff, capped at maxBackoffMs
        delay = Math.min(delay * config.backoffMultiplier, config.maxBackoffMs);
      }
    }
  }
  throw new Error(
    `[${context}] All ${config.maxRetries} attempts failed. ` +
    `Last error: ${lastError?.message ?? 'operation was never attempted'}`
  );
}
// Circuit breaker for agent spawning
class AgentCircuitBreaker {
private failures = 0;
private lastFailure = 0;
private state: 'closed' | 'open' | 'half-open' = 'closed';
constructor(
private threshold: number = 3,
private resetTimeMs: number = 30000
) {}
async execute<T>(operation: () => Promise<T>): Promise<T> {
if (this.state === 'open') {
if (Date.now() - this.lastFailure > this.resetTimeMs) {
this.state = 'half-open';
} else {
throw new Error('Circuit breaker is open - agent spawning disabled');
}
}
try {
const result = await operation();
this.onSuccess();
return result;
} catch (error) {
this.onFailure();
throw error;
}
}
private onSuccess() {
this.failures = 0;
this.state = 'closed';
}
private onFailure() {
this.failures++;
this.lastFailure = Date.now();
if (this.failures >= this.threshold) {
this.state = 'open';
}
}
}
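Retries and circuit breakers only help if a stuck agent eventually counts as a failure. A timeout wrapper provides that. This is a sketch, not part of the code above: `withTimeout` and `TimeoutError` are hypothetical names, and the `onTimeout` callback is where you would kill the child process.

```typescript
class TimeoutError extends Error {
  constructor(label: string, ms: number) {
    super(`${label} timed out after ${ms}ms`);
    this.name = "TimeoutError";
  }
}

/**
 * Settle with the result of `work`, or reject with a TimeoutError if it has
 * not settled within `ms`. `onTimeout` runs once when the deadline passes,
 * e.g. to send SIGTERM to the agent's process.
 */
function withTimeout<T>(
  work: Promise<T>,
  ms: number,
  label: string,
  onTimeout?: () => void
): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => {
      if (onTimeout) onTimeout();
      reject(new TimeoutError(label, ms));
    }, ms);
    work.then(
      (value) => {
        clearTimeout(timer); // finished in time: cancel the deadline
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      }
    );
  });
}

// Usage sketch with the names from this section:
//   await withTimeout(waitForCompletion(handle), 300000, handle.name,
//                     () => handle.process.kill("SIGTERM"));
```

Wrapping the call in `withRetry` then turns a hung agent into an ordinary retryable failure.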
11. Implementation Milestones
Milestone 1: Two Agents Coordinate Successfully (Week 1-2)
Goal: Basic orchestration works
Deliverables:
- Spawn one producer agent
- Spawn one reviewer agent
- Producer output feeds to reviewer
- Combined result is meaningful
- Basic error handling
Validation: Run migration on 5-file test project
Milestone 2: Failure is Handled Gracefully (Week 2-3)
Goal: Resilience works
Deliverables:
- Agent timeout handling
- Retry with exponential backoff
- Work redistribution on failure
- Partial results on failure
- Circuit breaker for spawning
Validation: Kill random agents during run, system recovers
Milestone 3: N Agents Scale Efficiently (Week 3-4)
Goal: Full swarm capability
Deliverables:
- Dynamic agent count based on work
- Work stealing for load balancing
- Progress reporting dashboard
- Performance metrics
- Conflict detection and resolution
Validation: 50-file project completes close to 5x faster with 5 agents than with a single agent
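For the "dynamic agent count based on work" deliverable, a simple starting point is to derive the count from the queue size and clamp it. This is a sketch with hypothetical names; `itemsPerAgent` is a tuning value you would choose from your own measurements.

```typescript
interface ScalingLimits {
  minAgents: number;
  maxAgents: number;
  itemsPerAgent: number; // assumed target batch size per agent
}

/**
 * Pick an agent count for `workItems` pending items: roughly one agent per
 * `itemsPerAgent` items, clamped to [minAgents, maxAgents], and never more
 * agents than there are items.
 */
function chooseAgentCount(workItems: number, limits: ScalingLimits): number {
  if (workItems <= 0) return 0;
  const wanted = Math.ceil(workItems / limits.itemsPerAgent);
  const clamped = Math.max(limits.minAgents, Math.min(limits.maxAgents, wanted));
  return Math.min(workItems, clamped);
}

// Usage with example limits.
const limits: ScalingLimits = { minAgents: 2, maxAgents: 10, itemsPerAgent: 10 };
for (const items of [0, 1, 5, 55, 500]) {
  console.log(`${items} items -> ${chooseAgentCount(items, limits)} agents`);
}
```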
12. Testing Strategy
Unit Tests
describe('Agent Spawning', () => {
it('should spawn agent with correct configuration', async () => {
const agent = await spawnAgent({
name: 'test-agent',
role: 'analyzer',
outputStyle: 'json',
workDir: '/tmp/test'
});
expect(agent.status).toBe('starting');
expect(agent.process.pid).toBeDefined();
});
it('should parse session ID from output', async () => {
// Mock Claude output
const mockOutput = '{"session_id": "abc123", "type": "init"}\n';
// ...
});
});
Integration Tests
describe('Multi-Agent Migration', () => {
it('should complete migration with multiple converters', async () => {
const result = await runSwarm({
task: 'migrate',
target: './test-fixtures/small-project',
agentCount: 3
});
expect(result.success).toBe(true);
expect(result.filesConverted).toBe(10);
expect(result.errors).toHaveLength(0);
});
it('should handle agent failure gracefully', async () => {
const result = await runSwarm({
task: 'migrate',
target: './test-fixtures/small-project',
agentCount: 3,
simulateFailure: { agentIndex: 1, afterMs: 1000 }
});
expect(result.success).toBe(true);
expect(result.recoveries).toBe(1);
});
});
Chaos Tests
describe('Chaos Engineering', () => {
it('should survive random agent terminations', async () => {
const swarm = startSwarm({ agentCount: 5 });
// Kill random agents every 5 seconds
const chaos = setInterval(() => {
const randomAgent = Math.floor(Math.random() * 5);
swarm.agents[randomAgent]?.process.kill('SIGTERM');
}, 5000);
const result = await swarm.complete();
clearInterval(chaos);
expect(result.success).toBe(true);
});
});
13. Configuration Schema
# swarm-config.yaml
swarm:
  name: "code-migration"
  maxAgents: 10
  minAgents: 2
agents:
  analyzer:
    count: 2
    style: "styles/analyzer.md"
    tools: ["Read", "Glob", "Grep"]
    timeout: 300000  # 5 minutes
  converter:
    count: 5
    style: "styles/converter.md"
    tools: ["Read", "Write", "Edit", "Bash"]
    timeout: 600000  # 10 minutes
  validator:
    count: 2
    style: "styles/validator.md"
    tools: ["Bash"]
    timeout: 180000  # 3 minutes
  reviewer:
    count: 1
    style: "styles/reviewer.md"
    tools: ["Read", "Glob"]
    timeout: 300000  # 5 minutes
coordination:
  stateBackend: "sqlite"        # or "redis", "filesystem"
  workDistribution: "pull"      # or "push"
  conflictResolution: "manual"  # or "auto-merge", "last-write-wins"
resilience:
  retries: 3
  backoff:
    initial: 1000
    multiplier: 2
    max: 30000
  circuitBreaker:
    threshold: 5
    resetTime: 60000
observability:
  logLevel: "info"
  metricsPort: 9090
  dashboard: true
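To see what the `resilience.backoff` values mean in practice, it can help to print the delay schedule they produce. The sketch below assumes the values are milliseconds and that each delay is the previous one times `multiplier`, capped at `max`; `backoffSchedule` is a hypothetical helper.

```typescript
interface BackoffConfig {
  initial: number;    // first delay in ms
  multiplier: number; // growth factor per retry
  max: number;        // upper bound for any single delay in ms
}

/** Delays (in ms) to wait before retry 1, 2, ..., `retries`. */
function backoffSchedule(retries: number, cfg: BackoffConfig): number[] {
  const delays: number[] = [];
  let delay = cfg.initial;
  for (let i = 0; i < retries; i++) {
    delays.push(Math.min(delay, cfg.max));
    delay *= cfg.multiplier;
  }
  return delays;
}

// Values taken from the configuration above.
const backoff: BackoffConfig = { initial: 1000, multiplier: 2, max: 30000 };
console.log(backoffSchedule(3, backoff));
console.log(backoffSchedule(7, backoff));
```

With three retries the cap never applies; it only matters once the retry count grows, which is worth knowing before raising `retries`.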
14. Example Swarm Definitions
Migration Swarm
# swarms/migration.yaml
name: JavaScript to TypeScript Migration
description: Convert JS files to TS with type inference
phases:
  - name: analysis
    agents:
      - role: dependency-analyzer
        task: "Identify all external dependencies and their @types packages"
      - role: pattern-analyzer
        task: "Find common patterns that indicate types (PropTypes, JSDoc, etc.)"
  - name: conversion
    depends_on: [analysis]
    agents:
      - role: converter
        count: dynamic  # Scale based on file count
        task: "Convert assigned files to TypeScript"
  - name: validation
    depends_on: [conversion]
    agents:
      - role: type-checker
        task: "Run tsc and collect type errors"
      - role: test-runner
        task: "Run existing tests to verify behavior"
  - name: review
    depends_on: [validation]
    agents:
      - role: reviewer
        task: "Review conversions for quality and consistency"
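The `depends_on` entries define a dependency graph, so the orchestrator needs to turn them into an execution order. One possible approach, sketched here with hypothetical names, groups phases into "waves": every phase in a wave has all of its dependencies in earlier waves, so phases within a wave could run concurrently.

```typescript
interface Phase {
  name: string;
  depends_on?: string[];
}

/**
 * Group phases into waves. Each wave contains the phases whose dependencies
 * are all satisfied by earlier waves. Throws if no progress is possible,
 * which means a cycle or a depends_on entry naming an unknown phase.
 */
function planWaves(phases: Phase[]): string[][] {
  const done = new Set<string>();
  const waves: string[][] = [];
  while (done.size < phases.length) {
    const ready = phases
      .filter((p) => !done.has(p.name))
      .filter((p) => (p.depends_on ?? []).every((dep) => done.has(dep)))
      .map((p) => p.name);
    if (ready.length === 0) {
      throw new Error("Dependency cycle or unknown phase in depends_on");
    }
    for (const name of ready) done.add(name);
    waves.push(ready);
  }
  return waves;
}

// The migration definition above is a straight chain, one phase per wave.
const migrationWaves = planWaves([
  { name: "analysis" },
  { name: "conversion", depends_on: ["analysis"] },
  { name: "validation", depends_on: ["conversion"] },
  { name: "review", depends_on: ["validation"] },
]);
console.log(migrationWaves);
```

If two phases depended only on `analysis`, they would land in the same wave and could be started together.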
Analysis Swarm
# swarms/codebase-analysis.yaml
name: Deep Codebase Analysis
description: Comprehensive analysis of large codebase
agents:
  - role: architecture-mapper
    count: 2
    task: "Map module dependencies and identify architectural patterns"
  - role: security-auditor
    count: 2
    task: "Find security vulnerabilities and unsafe patterns"
  - role: performance-profiler
    count: 1
    task: "Identify performance bottlenecks and optimization opportunities"
  - role: code-quality-reviewer
    count: 2
    task: "Assess code quality, duplication, and maintainability"
  - role: documentation-checker
    count: 1
    task: "Evaluate documentation coverage and accuracy"
aggregation:
  strategy: merge-by-category
  output: comprehensive-report
15. Observability Dashboard
+------------------------------------------------------------------------+
| CLAUDE SWARM DASHBOARD |
+------------------------------------------------------------------------+
| |
| TASK: code-migration-2024-01-15 |
| STATUS: RUNNING |
| ELAPSED: 00:04:32 |
| |
| +-------------------------------------------------------------------+ |
| | OVERALL PROGRESS | |
| | [================------------] 60% (33/55 files) | |
| +-------------------------------------------------------------------+ |
| |
| +-------------------------------------------------------------------+ |
| | AGENT STATUS | |
| +-------------------------------------------------------------------+ |
| | Agent | Status | Progress | CPU | Mem | Duration | |
| |----------------|-----------|----------|------|-------|-----------| |
| | analyzer-1 | COMPLETE | 100% | - | - | 1:23 | |
| | analyzer-2 | COMPLETE | 100% | - | - | 1:45 | |
| | converter-1 | WORKING | 80% | 45% | 234MB | 3:12 | |
| | converter-2 | WORKING | 62% | 52% | 198MB | 3:08 | |
| | converter-3 | COMPLETE | 100% | - | - | 2:56 | |
| | validator-1 | WAITING | 0% | 2% | 45MB | - | |
| | reviewer-1 | WORKING | 15% | 12% | 87MB | 0:45 | |
| +-------------------------------------------------------------------+ |
| |
| +-------------------------------------------------------------------+ |
| | RECENT EVENTS | |
| +-------------------------------------------------------------------+ |
| | 04:30:12 | converter-3 | Completed /src/services/auth.ts | |
| | 04:30:08 | converter-2 | Converting /src/components/Modal.tsx | |
| | 04:29:55 | converter-1 | WARNING: Complex HOC in DataGrid.jsx | |
| | 04:29:45 | reviewer-1 | Reviewed /src/utils/helpers.ts - PASS | |
| +-------------------------------------------------------------------+ |
| |
| +-------------------------------------------------------------------+ |
| | METRICS | |
| +-------------------------------------------------------------------+ |
| | Files/minute: 8.2 | |
| | Errors: 1 | |
| | Warnings: 3 | |
| | Manual review: 1 file | |
| | Est. remaining: 2:45 | |
| +-------------------------------------------------------------------+ |
| |
+------------------------------------------------------------------------+
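The overall progress line in a dashboard like this can be rendered with a few lines of string handling. This is a sketch with a hypothetical `renderProgress` helper; the 28-character width and the rounding choices are assumptions.

```typescript
/**
 * Render a text progress bar such as "[=====-----] 50% (5/10 files)".
 * The filled part is rounded down so the bar never shows more than is done.
 */
function renderProgress(done: number, total: number, width = 28): string {
  const ratio = total > 0 ? Math.min(1, Math.max(0, done / total)) : 0;
  const filled = Math.floor(ratio * width);
  const bar = "=".repeat(filled) + "-".repeat(width - filled);
  return `[${bar}] ${Math.round(ratio * 100)}% (${done}/${total} files)`;
}

// Usage with the numbers shown in the dashboard.
console.log(renderProgress(33, 55));
```

For a live CLI view you would redraw this line on every status change, for example by polling each agent's status file.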
16. Common Pitfalls and Solutions
Pitfall 1: Context Window Exhaustion
Problem: Agents run out of context when processing large files.
Solution:
// Split large files before assigning to agents
function splitLargeFiles(files: string[], maxLines: number = 500): Task[] {
const tasks: Task[] = [];
for (const file of files) {
const lines = readFileSync(file, 'utf-8').split('\n');
if (lines.length > maxLines) {
// Split into logical chunks (by function/class boundaries)
const chunks = splitByBoundaries(lines, maxLines);
chunks.forEach((chunk, i) => {
tasks.push({
type: 'partial',
file,
chunk: i,
totalChunks: chunks.length,
content: chunk
});
});
} else {
tasks.push({ type: 'complete', file });
}
}
return tasks;
}
Pitfall 2: Race Conditions in Shared State
Problem: Multiple agents update the same state simultaneously.
Solution:
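// Use atomic operations with file locking
import { readFileSync, writeFileSync } from 'fs';
import { lock } from 'proper-lockfile';

async function updateSharedState(
  statePath: string,
  updater: (state: any) => any
): Promise<void> {
  // lock() resolves to a release function. The state file must already exist,
  // and `retries` lets a contending agent wait instead of failing immediately.
  const release = await lock(statePath, { retries: 5 });
  try {
    const current = JSON.parse(readFileSync(statePath, 'utf-8'));
    const updated = updater(current);
    writeFileSync(statePath, JSON.stringify(updated, null, 2));
  } finally {
    // Always release, even if the updater throws
    await release();
  }
}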
Pitfall 3: Agent Starvation
Problem: Some agents finish quickly and sit idle while others are overloaded.
Solution:
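// Implement work stealing
// Note: stealWork() and assignWork() are not part of the AgentHandle interface
// from section 10.2; this sketch assumes you extend it with them and add
// 'idle-no-work' to its status union.
async function workStealing(idle: AgentHandle, busy: AgentHandle[]): Promise<void> {
  for (const agent of busy) {
    const stolen = await agent.stealWork(1);
    if (stolen.length > 0) {
      await idle.assignWork(stolen);
      return;
    }
  }
  // No work to steal - mark agent as truly idle
  idle.status = 'idle-no-work';
}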
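The function above leaves open what "stealing" does to the victim's queue. One common arrangement, sketched here with hypothetical names, gives each agent its own queue: the owner takes from the front, a thief takes from the back, and the victim always keeps at least one item so it is never left idle by a steal.

```typescript
class WorkQueue<T> {
  private items: T[];

  constructor(initial: T[] = []) {
    this.items = [...initial];
  }

  get size(): number {
    return this.items.length;
  }

  /** Owner side: take the next item from the front. */
  next(): T | undefined {
    return this.items.shift();
  }

  /** Thief side: take up to `count` items from the back, leaving at least one. */
  steal(count: number): T[] {
    const available = Math.max(0, this.items.length - 1);
    const n = Math.min(count, available);
    return n > 0 ? this.items.splice(this.items.length - n, n) : [];
  }

  push(items: T[]): void {
    this.items.push(...items);
  }
}

/** Move half of the busiest peer's backlog to the idle queue. Returns false if nothing could be stolen. */
function rebalance<T>(idle: WorkQueue<T>, peers: WorkQueue<T>[]): boolean {
  const victim = peers
    .filter((q) => q !== idle)
    .sort((a, b) => b.size - a.size)[0];
  if (!victim) return false;
  const stolen = victim.steal(Math.ceil(victim.size / 2));
  if (stolen.length === 0) return false;
  idle.push(stolen);
  return true;
}

// Usage: one agent has six files queued, the other has none.
const busyQueue = new WorkQueue(["f1", "f2", "f3", "f4", "f5", "f6"]);
const idleQueue = new WorkQueue<string>();
const moved = rebalance(idleQueue, [busyQueue, idleQueue]);
console.log({ moved, busy: busyQueue.size, idle: idleQueue.size });
```

This version is in-process only; across separate agent processes the queues would live in the shared state layer and each steal would need the same locking as any other shared update.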
17. Extension Ideas
Once you complete the basic swarm:
- Adaptive Scaling: Automatically adjust agent count based on work queue depth
- Agent Memory: Agents remember context from previous runs
- Specialization Learning: Agents develop expertise over time
- Cross-Project Swarms: Coordinate across multiple repositories
- Human-in-the-Loop: Escalation points for human decisions
- Visual Swarm Builder: GUI for designing swarm topologies
18. Success Criteria
You have mastered this project when:
- Your swarm can coordinate 5+ agents on a real task
- Failures are handled gracefully without data loss
- Speedup grows with agent count, and you can explain why it stays sub-linear (serial phases, coordination overhead)
- You can explain every coordination decision
- You have implemented at least two topology patterns
- You can debug a multi-agent failure from logs alone
- Your swarm has completed a real-world migration or analysis
Source
This project is part of the Claude Code Mastery: 40 Projects learning path.