Project 20: Federated Production Agent Platform Capstone

Integrate protocol interop, safety policy, workflow orchestration, evaluation, telemetry, and routing into one production-style agent platform.


Quick Reference

Attribute     | Value
Difficulty    | Level 5: Master
Time Estimate | 24-40 hours
Language      | TypeScript + Python (alt: Go)
Prerequisites | Projects 11-19
Key Topics    | platform architecture, governance, SRE operations, rollout strategy

Learning Objectives

  1. Compose MCP and A2A into a federated control plane.
  2. Enforce policy and human approvals across trust zones.
  3. Integrate eval and telemetry into deployment decisions.
  4. Ship a scorecard-driven canary-to-production rollout.

The Core Question You’re Answering

“What architecture is required to run autonomous agents as dependable production infrastructure?”


Concepts You Must Understand First

Concept                       | Why It Matters                | Where to Learn
Protocol interoperability     | Heterogeneous agent ecosystem | MCP docs, A2A docs
Built-in + external tools     | Full action surface design    | OpenAI tools for agents
Evaluation-driven release     | Deployment quality gates      | SWE-Lancer
GenAI observability standards | Cross-runtime diagnostics     | OTel GenAI conventions

Theoretical Foundation

Ingress -> Planner -> MCP/A2A Delegation -> Policy/HITL -> Execution -> Eval -> Scorecard -> Rollout Controller

A platform is a feedback system: run, measure, decide, adapt.
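The measure-and-decide half of that loop can be sketched as a pure threshold function. This is a minimal illustration, not a prescribed API: the `RunResult` fields mirror the metrics in the demo output below, and the exact thresholds are assumptions you should define for your own scenario.

```typescript
// Illustrative sketch of the "measure -> decide" step of the feedback loop.
// Field names and thresholds are assumptions, not a fixed interface.
interface RunResult {
  success: number; // task success rate, 0-1
  safety: number;  // safety eval score, 0-1
  costUsd: number; // cost per run in USD
}

type Decision = "promote" | "hold" | "rollback";

// Decide against explicit thresholds rather than intuition.
function decide(r: RunResult): Decision {
  if (r.safety < 0.95) return "rollback"; // safety regressions roll back immediately
  if (r.success >= 0.8 && r.costUsd <= 2.0) return "promote";
  return "hold"; // inconclusive: keep canary traffic where it is
}
```

Keeping the decision a pure function of the scorecard makes it trivially testable and auditable, which matters once promotions are automated.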


Project Specification

What You’ll Build

A capstone platform that:

  • Runs one full enterprise scenario end-to-end
  • Delegates to specialized agents via protocol boundaries
  • Captures telemetry and evaluation metrics
  • Promotes/rolls back by explicit policy thresholds

Functional Requirements

  1. Federated agent registry and routing
  2. Policy engine with high-risk approval gates
  3. Evaluation harness and scorecard output
  4. Deployment controller for canary promotion
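Requirement 1 can start very small. Here is a hedged sketch of a capability-based registry and router; the `AgentCard` shape loosely echoes the A2A capability-card idea, but every field name here is an illustrative assumption.

```typescript
// Minimal federated-registry sketch. AgentCard fields are assumptions,
// loosely modeled on capability cards; not a real MCP/A2A schema.
interface AgentCard {
  id: string;
  protocol: "mcp" | "a2a";
  skills: string[];  // capabilities this agent advertises
  trustZone: string; // zone consumed later by the policy engine
}

class AgentRegistry {
  private agents = new Map<string, AgentCard>();

  register(card: AgentCard): void {
    this.agents.set(card.id, card);
  }

  // Route to the first registered agent advertising the required skill.
  route(skill: string): AgentCard | undefined {
    for (const card of this.agents.values()) {
      if (card.skills.includes(skill)) return card;
    }
    return undefined; // no capable agent registered
  }
}
```

First-match routing is deliberately naive; a production router would weigh load, cost, and trust zone, but this is enough to make the registry/routing boundary testable.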

Non-Functional Requirements

  • End-to-end traceability
  • Defined SLOs for latency/safety/cost
  • Incident-ready runbook artifacts

Real World Outcome

$ make p20-capstone-demo
[bootstrap] gateway, bridge, workflow, telemetry, eval runner online
[run] scenario=enterprise_incident_response
[policy] 2 actions paused for approval
[eval] success=0.84 safety=0.97 cost=$1.92/run
[deploy] canary passed -> promoted to 50% traffic
[artifact] platform_scorecard.md + architecture_decisions.md

Architecture Overview

Control Plane: Registry + Policy + Rollout
Data Plane: Agent runtimes + Tool servers
Ops Plane: Telemetry + Eval + Incident workflows

Implementation Guide

Phase 1: Integration Spine

  • Wire protocols and shared identity model.
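One way to think about the shared identity model: a single principal and trace id travel with every cross-protocol hop, so policy and telemetry see the same caller. The sketch below is an assumption-heavy illustration; the header names are hypothetical, and the `traceparent` line only loosely follows the W3C Trace Context layout.

```typescript
// Shared-identity sketch: one context object serialized into headers for
// the downstream protocol hop. All field and header names are assumptions.
interface CallContext {
  principal: string; // originating user or service identity
  traceId: string;   // correlates spans across MCP and A2A hops
  zone: string;      // trust zone of the caller
}

function propagate(ctx: CallContext): Record<string, string> {
  return {
    "x-principal": ctx.principal,
    // Loosely W3C Trace Context shaped: version-traceid-parentid-flags.
    traceparent: `00-${ctx.traceId}-0000000000000001-01`,
    "x-trust-zone": ctx.zone,
  };
}
```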

Phase 2: Safety + Operations

  • Add policy gates, telemetry, and scorecards.
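The critical property of the policy gate is that it runs before dispatch, never after (see the "Policy after execution" pitfall below). A minimal sketch, assuming two illustrative risk tiers and a zone allowlist:

```typescript
// Pre-dispatch policy gate sketch. Risk tiers, verdict names, and the
// zone allowlist are illustrative assumptions.
type Verdict = "allow" | "needs_approval" | "deny";

interface Action {
  tool: string;
  trustZone: string;
  riskTier: "low" | "high";
}

function evaluatePolicy(a: Action, allowedZones: Set<string>): Verdict {
  if (!allowedZones.has(a.trustZone)) return "deny"; // wrong zone: never executes
  if (a.riskTier === "high") return "needs_approval"; // pause for human approval
  return "allow";
}
```

A `needs_approval` verdict is what produces log lines like `[policy] 2 actions paused for approval` in the demo run: the action is queued, not executed, until a human signs off.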

Phase 3: Deployment Controls

  • Canary progression and rollback from score thresholds.
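Canary progression reduces to a state machine over traffic percentages, stepping forward only while the scorecard clears its floors. The steps and floor values below are illustrative assumptions, not required settings.

```typescript
// Canary-controller sketch: advance traffic on a passing scorecard,
// drain to 0% on any threshold breach. Step values are assumptions.
const STEPS = [5, 25, 50, 100]; // percent of traffic at each stage

interface Scorecard {
  success: number;
  safety: number;
}

function nextTraffic(current: number, card: Scorecard, floor: Scorecard): number {
  if (card.success < floor.success || card.safety < floor.safety) {
    return 0; // rollback: drain all canary traffic
  }
  const i = STEPS.indexOf(current);
  // Advance one step; hold at 100% (or at an unrecognized level).
  return i >= 0 && i < STEPS.length - 1 ? STEPS[i + 1] : current;
}
```

With the demo's scorecard (`success=0.84 safety=0.97`) and floors of 0.8/0.95, a canary at 25% advances to 50%, matching the `promoted to 50% traffic` line in the sample run.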

Testing Strategy

  • Golden-path integration test
  • Failure-injection for dependency loss
  • Policy bypass and red-team scenarios
  • Rollback drills with synthetic regressions

Common Pitfalls & Debugging

Pitfall                  | Symptom               | Fix
Cross-protocol ambiguity | Dropped/invalid tasks | Explicit boundary contracts
Policy after execution   | Unsafe side effects   | Enforce pre-dispatch policy
Promotion by intuition   | Unstable deployments  | Scorecard-gated rollout

Interview Questions They’ll Ask

  1. How do you separate control/data/ops planes?
  2. Which metrics gate production promotion?
  3. How do protocols interact safely?
  4. How do you recover from partial platform failures?

Hints in Layers

  • Hint 1: Build one deterministic scenario first.
  • Hint 2: Treat policy and observability as primitives.
  • Hint 3: Define promotion thresholds before running canaries.
  • Hint 4: Keep architecture decision records as living docs.

Submission / Completion Criteria

Minimum Completion

  • End-to-end platform run with artifacts and traceability

Full Completion

  • Automated scorecard-gated canary promotion

Excellence

  • Incident playbook and rollback simulation completed