Project 20: Federated Production Agent Platform Capstone
Integrate protocol interop, safety policy, workflow orchestration, evaluation, telemetry, and routing into one production-style agent platform.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 5: Master |
| Time Estimate | 24-40 hours |
| Language | TypeScript + Python (alt: Go) |
| Prerequisites | Projects 11-19 |
| Key Topics | platform architecture, governance, SRE operations, rollout strategy |
Learning Objectives
- Compose MCP and A2A into a federated control plane.
- Enforce policy and human approvals across trust zones.
- Integrate eval and telemetry into deployment decisions.
- Ship a scorecard-driven canary-to-production rollout.
The Core Question You’re Answering
“What architecture is required to run autonomous agents as dependable production infrastructure?”
Concepts You Must Understand First
| Concept | Why It Matters | Where to Learn |
|---|---|---|
| Protocol interoperability | Heterogeneous agent ecosystem | MCP docs, A2A docs |
| Built-in + external tools | Full action surface design | OpenAI tools for agents |
| Evaluation-driven release | Deployment quality gates | SWE-Lancer |
| GenAI observability standards | Cross-runtime diagnostics | OTel GenAI conventions |
Theoretical Foundation
Ingress -> Planner -> MCP/A2A Delegation -> Policy/HITL -> Execution -> Eval -> Scorecard -> Rollout Controller
A platform is a feedback system: run, measure, decide, adapt.
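The run-measure-decide-adapt loop can be sketched as a minimal decision function. Everything here (metric names, thresholds, the `decide` helper) is an illustrative assumption, not part of the project spec:

```python
from dataclasses import dataclass

@dataclass
class RunResult:
    success_rate: float   # fraction of scenario steps that passed (measure)
    safety_rate: float    # fraction of actions that cleared policy (measure)
    cost_per_run: float   # dollars per end-to-end run (measure)

def decide(result: RunResult, thresholds: dict) -> str:
    """The 'decide' step: map measured metrics to a rollout decision."""
    if result.safety_rate < thresholds["safety"]:
        return "rollback"   # safety regressions are never held, always reverted
    if result.success_rate < thresholds["success"]:
        return "hold"
    if result.cost_per_run > thresholds["max_cost"]:
        return "hold"
    return "promote"

# Illustrative thresholds; real values come from your SLO definitions.
thresholds = {"success": 0.80, "safety": 0.95, "max_cost": 2.50}
print(decide(RunResult(0.84, 0.97, 1.92), thresholds))  # promote
```

The "adapt" half of the loop is whatever the rollout controller does with the returned decision.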
Project Specification
What You’ll Build
A capstone platform that:
- Runs one full enterprise scenario end-to-end
- Delegates to specialized agents via protocol boundaries
- Captures telemetry and evaluation metrics
- Promotes/rolls back by explicit policy thresholds
Functional Requirements
- Federated agent registry and routing
- Policy engine with high-risk approval gates
- Evaluation harness and scorecard output
- Deployment controller for canary promotion
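The policy engine requirement can be prototyped as a pre-dispatch verdict function. The action names and trust-zone rule below are hypothetical placeholders; your platform defines its own risk taxonomy:

```python
from enum import Enum

class Verdict(Enum):
    ALLOW = "allow"
    REQUIRE_APPROVAL = "require_approval"  # pauses for a human (HITL gate)
    DENY = "deny"

# Hypothetical action sets; a real platform would load these from config.
HIGH_RISK_ACTIONS = {"delete_database", "rotate_credentials"}
DENIED_ACTIONS = {"exfiltrate_data"}

def evaluate_action(action: str, trust_zone: str) -> Verdict:
    """Pre-dispatch policy check: every action passes here before execution."""
    if action in DENIED_ACTIONS:
        return Verdict.DENY
    # High-risk actions, and anything crossing into an external trust zone,
    # pause for explicit human approval.
    if action in HIGH_RISK_ACTIONS or trust_zone == "external":
        return Verdict.REQUIRE_APPROVAL
    return Verdict.ALLOW
```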
Non-Functional Requirements
- End-to-end traceability
- Defined SLOs for latency/safety/cost
- Incident-ready runbook artifacts
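"Defined SLOs" means machine-checkable thresholds, not prose. One way to sketch this, with assumed metric names and values:

```python
# Illustrative SLO targets; the spec leaves the actual numbers to you.
SLOS = {
    "p95_latency_ms": 5000,     # latency SLO
    "safety_pass_rate": 0.95,   # safety SLO
    "cost_per_run_usd": 2.50,   # cost SLO
}

def slo_violations(observed: dict) -> list:
    """Return the names of any SLOs the observed metrics violate."""
    violations = []
    if observed["p95_latency_ms"] > SLOS["p95_latency_ms"]:
        violations.append("latency")
    if observed["safety_pass_rate"] < SLOS["safety_pass_rate"]:
        violations.append("safety")
    if observed["cost_per_run_usd"] > SLOS["cost_per_run_usd"]:
        violations.append("cost")
    return violations
```

Feeding this list into the scorecard keeps deployment decisions and incident alerts on the same definitions.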
Real World Outcome
$ make p20-capstone-demo
[bootstrap] gateway, bridge, workflow, telemetry, eval runner online
[run] scenario=enterprise_incident_response
[policy] 2 actions paused for approval
[eval] success=0.84 safety=0.97 cost=$1.92/run
[deploy] canary passed -> promoted to 50% traffic
[artifact] platform_scorecard.md + architecture_decisions.md
Architecture Overview
Control Plane: Registry + Policy + Rollout
Data Plane: Agent runtimes + Tool servers
Ops Plane: Telemetry + Eval + Incident workflows
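The control plane's registry-plus-routing pair can be sketched as a capability lookup. The record fields and the first-match routing rule are assumptions; production routing would weigh load, cost, and trust zone:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class AgentRecord:
    name: str
    protocol: str       # "mcp" or "a2a" - which boundary the bridge must cross
    capabilities: set
    trust_zone: str

@dataclass
class Registry:
    agents: list = field(default_factory=list)

    def register(self, agent: AgentRecord) -> None:
        self.agents.append(agent)

    def route(self, capability: str) -> Optional[AgentRecord]:
        """Naive first-match routing over advertised capabilities."""
        for agent in self.agents:
            if capability in agent.capabilities:
                return agent
        return None
```

Keeping the registry in the control plane means the data plane never decides its own routing, which is what makes the planes separable.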
Implementation Guide
Phase 1: Integration Spine
- Wire the MCP/A2A bridges together and establish a shared identity model across runtimes.
Phase 2: Safety + Operations
- Add policy gates, telemetry, and scorecards.
Phase 3: Deployment Controls
- Drive canary progression and automatic rollback from scorecard thresholds.
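A minimal sketch of the Phase 3 controller, assuming a fixed stage ladder and a single scalar score (both are illustrative choices, not requirements):

```python
# Hypothetical traffic fractions for each canary stage.
CANARY_STAGES = [0.05, 0.25, 0.50, 1.00]

def next_traffic(current: float, score: float,
                 promote_at: float, rollback_at: float) -> float:
    """Advance, hold, or roll back the canary based on the latest scorecard."""
    if score < rollback_at:
        return 0.0  # rollback: pull the canary entirely
    if score >= promote_at:
        idx = CANARY_STAGES.index(current) if current in CANARY_STAGES else -1
        if idx + 1 < len(CANARY_STAGES):
            return CANARY_STAGES[idx + 1]  # promote to the next stage
        return current  # already at full traffic
    return current  # hold at the current stage
```

Note the asymmetry: promotion moves one stage at a time, but rollback goes straight to zero. That bias toward caution is the point of scorecard gating.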
Testing Strategy
- Golden-path integration test
- Failure-injection for dependency loss
- Policy bypass and red-team scenarios
- Rollback drills with synthetic regressions
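Failure injection for dependency loss can be as simple as a wrapper that fails a configurable fraction of calls. The helper and its names are illustrative; real chaos tooling would inject faults at the transport layer:

```python
import random

def call_tool_with_fallback(tool, fallback, fail_rate=0.0, rng=None):
    """Invoke a tool; on an injected failure, degrade to the fallback path."""
    rng = rng or random.Random(0)  # seeded for reproducible tests
    if rng.random() < fail_rate:
        return fallback()
    return tool()

# Failure-injection test: with the dependency fully down (fail_rate=1.0),
# the platform must still produce a degraded answer, not crash.
result = call_tool_with_fallback(lambda: "primary", lambda: "degraded", fail_rate=1.0)
assert result == "degraded"
```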
Common Pitfalls & Debugging
| Pitfall | Symptom | Fix |
|---|---|---|
| Cross-protocol ambiguity | dropped/invalid tasks | explicit boundary contracts |
| Policy after execution | unsafe side effects | enforce pre-dispatch policy |
| Promotion by intuition | unstable deployments | scorecard-gated rollout |
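The "policy after execution" pitfall is an ordering bug, so the fix is structural: make the only dispatch path one that consults policy first. A hedged sketch, with placeholder callables:

```python
def dispatch(action, execute, check_policy):
    """Correct ordering: policy runs before any side effect, never after."""
    verdict = check_policy(action)   # 1. policy decision, pre-dispatch
    if verdict != "allow":
        # 2a. blocked or paused actions never reach the executor
        return {"status": verdict, "executed": False}
    # 2b. only allowed actions produce side effects
    return {"status": "allow", "executed": True, "result": execute(action)}
```

If every runtime routes through a function shaped like this, "policy after execution" becomes impossible by construction rather than by convention.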
Interview Questions They’ll Ask
- How do you separate control/data/ops planes?
- Which metrics gate production promotion?
- How do protocols interact safely?
- How do you recover from partial platform failures?
Hints in Layers
- Hint 1: Build one deterministic scenario first.
- Hint 2: Treat policy and observability as primitives.
- Hint 3: Define promotion thresholds before running canaries.
- Hint 4: Keep architecture decision records as living docs.
Submission / Completion Criteria
Minimum Completion
- End-to-end platform run with artifacts and traceability
Full Completion
- Automated scorecard-gated canary promotion
Excellence
- Incident playbook and rollback simulation completed