Project 8: Multi-Agent Collaboration (The Teamwork)
Build a small "research team" where specialized agents (Researcher, Writer, Critic) collaborate through an orchestrated loop to produce a better final artifact than any single agent.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 5: Master |
| Time Estimate | 35–55 hours |
| Language | Python |
| Prerequisites | Strong prompt/tool fundamentals, trace/debug habits, eval mindset |
| Key Topics | role specialization, inter-agent protocols, shared memory, iterative refinement, conflict resolution |
1. Learning Objectives
By completing this project, you will:
- Implement multi-agent orchestration with explicit roles and goals.
- Design a protocol for agent communication (messages, handoffs, critique format).
- Build shared memory so agents can collaborate on the same evidence base.
- Add iteration limits and quality thresholds to prevent endless debates.
- Evaluate collaboration quality (does the team actually improve output?).
2. Theoretical Foundation
2.1 Core Concepts
- Division of labor: Specialization reduces cognitive load; one agent gathers evidence, one synthesizes, one critiques.
- Communication protocols: Without structure, agents produce redundant text. You want structured handoffs: evidence lists, outlines, critique checklists.
- Shared memory: If the Writer cannot see what the Researcher found, the system fails. Shared state needs provenance and versioning.
- Iteration & convergence: Multi-agent loops need stopping criteria: max rounds, a minimum score, or diminishing returns (see the sketch after this list).
- Failure modes: Groupthink, oscillation, and "critic paralysis" are common; orchestration logic must manage them.
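To make convergence concrete, here is a minimal sketch of stopping logic, assuming numeric rubric scores per round. All names are illustrative, not part of this project's required API:

```python
def should_stop(round_num: int, scores: list[float],
                max_rounds: int = 3, threshold: float = 8.0,
                min_gain: float = 0.25) -> bool:
    """Stop on max rounds, quality threshold, or diminishing returns."""
    if round_num >= max_rounds:
        return True
    if scores and scores[-1] >= threshold:
        return True
    # Diminishing returns: the last revision barely moved the score.
    if len(scores) >= 2 and scores[-1] - scores[-2] < min_gain:
        return True
    return False
```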
2.2 Why This Matters
Complex assistant tasks (planning a trip, writing a proposal, designing a system) benefit from multiple perspectives and internal checks. Multi-agent systems are a practical way to add "checks and balances."
2.3 Common Misconceptions
- "More agents = better." Adding agents increases coordination overhead; keep the team small and roles sharp.
- "The critic should be harsh." The critic should be constructive and grounded in criteria.
- "Agents can share context implicitly." They can't; you must implement memory sharing.
3. Project Specification
3.1 What You Will Build
A CLI tool that accepts a topic and produces a final report (blog post, memo, plan) by orchestrating:
- Researcher: gathers sources and extracts factual bullets.
- Writer: produces a draft from research.
- Critic: reviews against a rubric and requests revisions.
3.2 Functional Requirements
- Roles: at least 3 agents with distinct prompts and responsibilities.
- Shared memory: research artifacts stored and referenced by ID.
- Orchestrator: runs the loop: Research → Draft → Critique → Revise (bounded).
- Rubric: critic outputs a structured evaluation (scores + actionable feedback); see the example after this list.
- Citations: final output includes a sources section if web research is enabled.
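As a concrete example, the critic's structured evaluation might look like this; the field names mirror the Critique dataclass defined in section 4.3, and the values are illustrative:

```python
# Illustrative critic output, matching the Critique dataclass in section 4.3.
example_critique = {
    "scores": {"accuracy": 7, "clarity": 8, "completeness": 6},
    "must_fix": ["Claim about rooftop yields cites no evidence ID"],
    "nice_to_have": ["Tighten the introduction to two sentences"],
}
```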
3.3 Non-Functional Requirements
- Determinism: run the Critic and rubric scoring at low temperature.
- Observability: store an execution trace of agent messages and decisions.
- Cost control: cap tokens per agent turn and cap iterations.
- Quality control: enforce minimum evidence count and citation discipline.
3.4 Example Usage / Output
python multi_agent_research.py --topic "Sustainable urban agriculture solutions"
Output artifacts:
- report.md (final report)
- trace.jsonl (all agent messages and tool calls)
- evidence.json (normalized sources + snippets)
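For illustration, one trace.jsonl line could carry fields like these (the schema is an assumption, not prescribed by this guide):

```python
# Illustrative trace.jsonl record; all field names are assumptions.
example_event = {
    "ts": "2025-06-01T10:32:07Z",   # ISO timestamp of the event
    "round": 1,                      # collaboration round number
    "agent": "critic",               # which role produced the message
    "type": "critique",              # message kind (message, tool_call, ...)
    "content": {"scores": {"accuracy": 7}, "must_fix": ["cite source S2"]},
}
```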
4. Solution Architecture
4.1 High-Level Design
┌────────────┐        ┌─────────────────────┐
│   CLI/UI   │───────▶│    Orchestrator     │
└────────────┘        │  (rounds + policy)  │
                      └──────────┬──────────┘
                                 │
          ┌──────────────────────┼──────────────────────┐
          ▼                      ▼                      ▼
   ┌──────────────┐      ┌──────────────┐      ┌──────────────┐
   │  Researcher  │      │    Writer    │      │    Critic    │
   └──────┬───────┘      └──────┬───────┘      └──────┬───────┘
          │ evidence            │ drafts              │ rubric feedback
          ▼                     ▼                     ▼
      ┌───────────────────────────────────────────────────┐
      │               Shared Memory / Store               │
      └───────────────────────────────────────────────────┘
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Orchestrator | control order + stopping | max rounds; thresholds; timeouts |
| Agent prompts | define roles | sharp responsibilities; structured outputs |
| Shared store | persist evidence and drafts | version by round; provenance |
| Rubric/evals | measure quality | criteria: accuracy, clarity, completeness |
4.3 Data Structures
from dataclasses import dataclass

@dataclass(frozen=True)
class SourceItem:
    id: str
    url: str
    title: str
    snippet: str

@dataclass(frozen=True)
class Critique:
    scores: dict[str, int]  # e.g., {"accuracy": 8, "clarity": 7}
    must_fix: list[str]
    nice_to_have: list[str]
4.4 Algorithm Overview
Key Algorithm: bounded collaboration loop (sketched in code after this list)
- Researcher gathers N sources and extracts M evidence bullets.
- Writer drafts the output using only the evidence store.
- Critic scores against the rubric and returns a structured critique.
- If the score is below threshold and rounds remain: Writer revises using the critique.
- Stop when the threshold is met or max rounds are reached; emit the final report.
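A minimal sketch of this loop in Python, assuming each agent object exposes the methods shown (these interfaces are illustrative, not prescribed):

```python
def run_team(topic: str, researcher, writer, critic,
             max_rounds: int = 3, threshold: int = 8) -> str:
    """Bounded collaboration loop: Research -> Draft -> Critique -> Revise."""
    evidence = researcher.gather(topic)            # list of evidence bullets
    draft = writer.draft(topic, evidence)
    for round_num in range(1, max_rounds + 1):
        critique = critic.review(draft, evidence)  # returns a Critique (4.3)
        if min(critique.scores.values()) >= threshold:
            break                                  # quality threshold met
        draft = writer.revise(draft, critique, evidence)
    return draft
```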
Complexity Analysis:
- Time: O(rounds × agent_turns) model/tool calls
- Space: O(evidence + traces)
5. Implementation Guide
5.1 Development Environment Setup
python -m venv .venv
source .venv/bin/activate
pip install pydantic rich
5.2 Project Structure
multi-agent-team/
โโโ src/
โ โโโ cli.py
โ โโโ orchestrator.py
โ โโโ agents/
โ โ โโโ researcher.py
โ โ โโโ writer.py
โ โ โโโ critic.py
โ โโโ memory.py
โ โโโ evals.py
โโโ data/
โโโ runs/
5.3 Implementation Phases
Phase 1: Roles + trace logging (8–12h)
Goals:
- Run a fixed pipeline with three agents and store traces.
Tasks:
- Implement agent wrappers with structured inputs/outputs.
- Store every message in trace.jsonl with timestamps.
- Produce a report from a fixed evidence set (no web tool yet).
Checkpoint: Given a seed evidence file, output is stable and traceable.
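A possible shape for the Phase 1 trace logger, assuming an append-only JSONL file; the record fields are an assumption, not a required schema:

```python
import json
import time
from pathlib import Path

class TraceLogger:
    """Append-only JSONL trace of agent messages (a sketch, not prescriptive)."""

    def __init__(self, path: str = "runs/trace.jsonl"):
        self.path = Path(path)
        self.path.parent.mkdir(parents=True, exist_ok=True)

    def log(self, agent: str, event_type: str, content: dict) -> None:
        # One JSON object per line; timestamps make rounds reconstructable.
        record = {"ts": time.time(), "agent": agent,
                  "type": event_type, "content": content}
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```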
Phase 2: Shared memory + rubric-driven revision (10–15h)
Goals:
- Critic drives measurable improvements.
Tasks:
- Implement shared store with versions per round.
- Define rubric and parse critic output with validation.
- Implement revision loop with max rounds and thresholds.
Checkpoint: Round 2 output is demonstrably better on rubric criteria.
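One minimal way to version the shared store by round, reusing the SourceItem and Critique dataclasses from section 4.3 (an in-memory sketch; a real run would also persist to runs/):

```python
class SharedStore:
    """In-memory shared store, versioned by round (an illustrative sketch)."""

    def __init__(self) -> None:
        self.evidence: dict[str, SourceItem] = {}   # source ID -> SourceItem
        self.drafts: dict[int, str] = {}            # round -> draft text
        self.critiques: dict[int, Critique] = {}    # round -> Critique

    def add_evidence(self, item: SourceItem) -> None:
        # Provenance lives on the item itself (ID, URL, title, snippet).
        self.evidence[item.id] = item

    def save_round(self, n: int, draft: str, critique: Critique) -> None:
        # Each round's draft and critique is stored under its round number,
        # so earlier versions stay inspectable.
        self.drafts[n] = draft
        self.critiques[n] = critique
```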
Phase 3: Real research tools + citation discipline (12–28h)
Goals:
- Use browsing/search tools and maintain provenance.
Tasks:
- Integrate a web search/fetch tool (or reuse Project 5 components).
- Normalize sources and store evidence bullets with URLs.
- Enforce a "no evidence → no claim" rule in the Writer prompt.
Checkpoint: Final report includes citations tied to evidence.
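To enforce citation discipline mechanically rather than by prompt alone, the orchestrator can diff the draft's citations against the evidence store. A sketch, assuming [S1]-style citation markers (the marker syntax is an assumption):

```python
import re

def find_unbacked_citations(draft: str, evidence_ids: set[str]) -> list[str]:
    """Return citation IDs used in the draft that have no stored source."""
    cited = set(re.findall(r"\[(S\d+)\]", draft))  # assumes [S1]-style markers
    return sorted(cited - evidence_ids)
```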
5.4 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Memory | shared text blob vs structured store | structured store | provenance + constraints |
| Critique | freeform vs JSON rubric | JSON rubric | stable iteration |
| Stopping | fixed rounds vs threshold | threshold + max rounds | prevents infinite loops |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit | parsing/validation | critique JSON parsing, store versioning |
| Replay | deterministic behavior | run with cached sources and fixed temps |
| Quality | eval harness | rubric score monotonicity across rounds |
6.2 Critical Test Cases
- Convergence: system stops when quality threshold met.
- No hallucinated citations: all citations correspond to stored sources (see the test sketch after this list).
- Critic usefulness: critic output contains actionable, specific fixes.
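A unit test for the citation rule, reusing the hypothetical find_unbacked_citations helper from Phase 3 with fixture data:

```python
def test_no_hallucinated_citations():
    """Every [S#] citation in the draft must map to a stored source."""
    evidence_ids = {"S1", "S2"}
    draft = "Rooftop farms cut transport emissions [S1] but vary in yield [S3]."
    # [S3] has no stored source, so the checker must flag it.
    assert find_unbacked_citations(draft, evidence_ids) == ["S3"]
```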
7. Common Pitfalls & Debugging
| Pitfall | Symptom | Solution |
|---|---|---|
| Agents repeat themselves | bloated traces | require structured outputs and concise formats |
| Critic nitpicks | endless revisions | "must fix" vs "nice to have" separation |
| Missing shared context | writer invents facts | enforce evidence-only writing policy |
| Runaway cost | too many rounds | hard caps on rounds/tokens |
Debugging strategies:
- Inspect trace and identify where protocol breaks (e.g., unstructured outputs).
- Add small "contract tests" for agent output schemas (see the pydantic sketch after this list).
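Since pydantic is already installed in section 5.1, a contract test can validate critic output against a schema. A sketch assuming the pydantic v2 API:

```python
from pydantic import BaseModel, Field

class CritiqueModel(BaseModel):
    """Contract for critic output; invalid JSON fails loudly, not silently."""
    scores: dict[str, int]
    must_fix: list[str] = Field(default_factory=list)
    nice_to_have: list[str] = Field(default_factory=list)

def test_critique_contract():
    raw = '{"scores": {"accuracy": 8}, "must_fix": [], "nice_to_have": []}'
    critique = CritiqueModel.model_validate_json(raw)  # pydantic v2 API
    assert critique.scores["accuracy"] == 8
```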
8. Extensions & Challenges
8.1 Beginner Extensions
- Add a "Summarizer" agent to compress evidence.
- Add a "Fact-checker" agent that verifies claims against sources.
8.2 Intermediate Extensions
- Add parallel research: multiple research subagents gather evidence concurrently (see the asyncio sketch after this list).
- Add disagreement resolution: critic flags contradictions and asks for more research.
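A sketch of the parallel-research fan-out with asyncio, assuming a hypothetical async gather method on the researcher agent:

```python
import asyncio

async def parallel_research(sub_questions: list[str], researcher) -> list:
    """Fan out one research call per sub-question, then merge the evidence."""
    # `gather_async` is an assumed method on the researcher agent.
    tasks = [researcher.gather_async(q) for q in sub_questions]
    per_question = await asyncio.gather(*tasks)
    return [item for batch in per_question for item in batch]  # flatten
```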
8.3 Advanced Extensions
- Add task decomposition: orchestrator splits topic into sub-questions automatically.
- Add score-based model selection per role (cheap researcher, strong critic).
9. Real-World Connections
9.1 Industry Applications
- Content pipelines (research → draft → editorial review).
- Multi-agent customer support (triage, resolution, QA).
9.2 Interview Relevance
- Multi-agent orchestration, protocols, shared memory, and cost controls.
10. Resources
10.1 Essential Reading
- Multi-Agent Systems with AutoGen (Victor Dibia) – roles and orchestration
- Building AI Agents (Packt) – agent loops, tool use
10.2 Tools & Documentation
- CrewAI / AutoGen docs (agents, tasks, tools)
- LangGraph for explicit state machines
10.3 Related Projects in This Series
- Previous: Project 7 (code agent) – traceability and safety rails
- Next: Project 12 (self-improving) – recursive capability growth with strict sandboxing
11. Self-Assessment Checklist
- I can explain why the team improves output vs a single agent.
- I can show a structured communication protocol and trace.
- I can cap cost and still converge to acceptable quality.
- I can enforce evidence/citation discipline across agents.
12. Submission / Completion Criteria
Minimum Viable Completion:
- Three agents with distinct roles
- Orchestrated loop with trace logging
- Final report output
Full Completion:
- Shared memory store with provenance
- Rubric-driven revision loop with thresholds
- Optional web research with citations
Excellence (Going Above & Beyond):
- Parallel research subagents + contradiction resolution + eval harness
This guide was generated from project_based_ideas/AI_PERSONAL_ASSISTANTS_MASTERY.md. For the complete sprint overview, see project_based_ideas/AI_PERSONAL_ASSISTANTS_MASTERY/README.md.