Project 20: Federated Production Agent Platform Capstone

Integrate protocol interop, safety policy, workflow orchestration, evaluation, telemetry, and routing into one production-style agent platform.


Quick Reference

Attribute     | Value
Difficulty    | Level 5: Master
Time Estimate | 24-40 hours
Language      | TypeScript + Python (alt: Go)
Prerequisites | Projects 11-19
Key Topics    | platform architecture, governance, SRE operations, rollout strategy

Learning Objectives

  1. Compose MCP and A2A into a federated control plane.
  2. Enforce policy and human approvals across trust zones.
  3. Integrate eval and telemetry into deployment decisions.
  4. Ship a scorecard-driven canary-to-production rollout.

The Core Question You’re Answering

“What architecture is required to run autonomous agents as dependable production infrastructure?”


Concepts You Must Understand First

Concept                       | Why It Matters                | Where to Learn
Protocol interoperability     | Heterogeneous agent ecosystem | MCP docs, A2A docs
Built-in + external tools     | Full action surface design    | OpenAI tools for agents
Evaluation-driven release     | Deployment quality gates      | SWE-Lancer
GenAI observability standards | Cross-runtime diagnostics     | OTel GenAI conventions

Theoretical Foundation

Ingress -> Planner -> MCP/A2A Delegation -> Policy/HITL -> Execution -> Eval -> Scorecard -> Rollout Controller

A platform is a feedback system: run, measure, decide, adapt.
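The measure-and-decide half of that loop can be sketched as a pure threshold function. This is a minimal illustration, not a prescribed API: the `RunResult` fields mirror the metrics in the demo output below, and the exact thresholds are assumptions you should define for your own scenario.

```typescript
// Illustrative sketch of the "measure -> decide" step of the feedback loop.
// Field names and thresholds are assumptions, not a fixed interface.
interface RunResult {
  success: number; // task success rate, 0-1
  safety: number;  // safety eval score, 0-1
  costUsd: number; // cost per run in USD
}

type Decision = "promote" | "hold" | "rollback";

// Decide against explicit thresholds rather than intuition.
function decide(r: RunResult): Decision {
  if (r.safety < 0.95) return "rollback"; // safety regressions roll back immediately
  if (r.success >= 0.8 && r.costUsd <= 2.0) return "promote";
  return "hold"; // inconclusive: keep canary traffic where it is
}
```

Keeping the decision a pure function of the scorecard makes it trivially testable and auditable, which matters once promotions are automated.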


Project Specification

What You’ll Build

A capstone platform that:

  • Runs one full enterprise scenario end-to-end
  • Delegates to specialized agents via protocol boundaries
  • Captures telemetry and evaluation metrics
  • Promotes/rolls back by explicit policy thresholds

Functional Requirements

  1. Federated agent registry and routing
  2. Policy engine with high-risk approval gates
  3. Evaluation harness and scorecard output
  4. Deployment controller for canary promotion
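Requirement 1 can start very small. Here is a hedged sketch of a capability-based registry and router; the `AgentCard` shape loosely echoes the A2A capability-card idea, but every field name here is an illustrative assumption.

```typescript
// Minimal federated-registry sketch. AgentCard fields are assumptions,
// loosely modeled on capability cards; not a real MCP/A2A schema.
interface AgentCard {
  id: string;
  protocol: "mcp" | "a2a";
  skills: string[];  // capabilities this agent advertises
  trustZone: string; // zone consumed later by the policy engine
}

class AgentRegistry {
  private agents = new Map<string, AgentCard>();

  register(card: AgentCard): void {
    this.agents.set(card.id, card);
  }

  // Route to the first registered agent advertising the required skill.
  route(skill: string): AgentCard | undefined {
    for (const card of this.agents.values()) {
      if (card.skills.includes(skill)) return card;
    }
    return undefined; // no capable agent registered
  }
}
```

First-match routing is deliberately naive; a production router would weigh load, cost, and trust zone, but this is enough to make the registry/routing boundary testable.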

Non-Functional Requirements

  • End-to-end traceability
  • Defined SLOs for latency/safety/cost
  • Incident-ready runbook artifacts

Real World Outcome

$ make p20-capstone-demo
[bootstrap] gateway, bridge, workflow, telemetry, eval runner online
[run] scenario=enterprise_incident_response
[policy] 2 actions paused for approval
[eval] success=0.84 safety=0.97 cost=$1.92/run
[deploy] canary passed -> promoted to 50% traffic
[artifact] platform_scorecard.md + architecture_decisions.md

Architecture Overview

Control Plane: Registry + Policy + Rollout
Data Plane: Agent runtimes + Tool servers
Ops Plane: Telemetry + Eval + Incident workflows

Implementation Guide

Phase 1: Integration Spine

  • Wire protocols and shared identity model.
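One way to think about the shared identity model: a single principal and trace id travel with every cross-protocol hop, so policy and telemetry see the same caller. The sketch below is an assumption-heavy illustration; the header names are hypothetical, and the `traceparent` line only loosely follows the W3C Trace Context layout.

```typescript
// Shared-identity sketch: one context object serialized into headers for
// the downstream protocol hop. All field and header names are assumptions.
interface CallContext {
  principal: string; // originating user or service identity
  traceId: string;   // correlates spans across MCP and A2A hops
  zone: string;      // trust zone of the caller
}

function propagate(ctx: CallContext): Record<string, string> {
  return {
    "x-principal": ctx.principal,
    // Loosely W3C Trace Context shaped: version-traceid-parentid-flags.
    traceparent: `00-${ctx.traceId}-0000000000000001-01`,
    "x-trust-zone": ctx.zone,
  };
}
```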

Phase 2: Safety + Operations

  • Add policy gates, telemetry, and scorecards.
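The critical property of the policy gate is that it runs before dispatch, never after (see the "Policy after execution" pitfall below). A minimal sketch, assuming two illustrative risk tiers and a zone allowlist:

```typescript
// Pre-dispatch policy gate sketch. Risk tiers, verdict names, and the
// zone allowlist are illustrative assumptions.
type Verdict = "allow" | "needs_approval" | "deny";

interface Action {
  tool: string;
  trustZone: string;
  riskTier: "low" | "high";
}

function evaluatePolicy(a: Action, allowedZones: Set<string>): Verdict {
  if (!allowedZones.has(a.trustZone)) return "deny"; // wrong zone: never executes
  if (a.riskTier === "high") return "needs_approval"; // pause for human approval
  return "allow";
}
```

A `needs_approval` verdict is what produces log lines like `[policy] 2 actions paused for approval` in the demo run: the action is queued, not executed, until a human signs off.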

Phase 3: Deployment Controls

  • Canary progression and rollback from score thresholds.
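Canary progression reduces to a state machine over traffic percentages, stepping forward only while the scorecard clears its floors. The steps and floor values below are illustrative assumptions, not required settings.

```typescript
// Canary-controller sketch: advance traffic on a passing scorecard,
// drain to 0% on any threshold breach. Step values are assumptions.
const STEPS = [5, 25, 50, 100]; // percent of traffic at each stage

interface Scorecard {
  success: number;
  safety: number;
}

function nextTraffic(current: number, card: Scorecard, floor: Scorecard): number {
  if (card.success < floor.success || card.safety < floor.safety) {
    return 0; // rollback: drain all canary traffic
  }
  const i = STEPS.indexOf(current);
  // Advance one step; hold at 100% (or at an unrecognized level).
  return i >= 0 && i < STEPS.length - 1 ? STEPS[i + 1] : current;
}
```

With the demo's scorecard (`success=0.84 safety=0.97`) and floors of 0.8/0.95, a canary at 25% advances to 50%, matching the `promoted to 50% traffic` line in the sample run.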

Testing Strategy

  • Golden-path integration test
  • Failure-injection for dependency loss
  • Policy bypass and red-team scenarios
  • Rollback drills with synthetic regressions

Common Pitfalls & Debugging

Pitfall                  | Symptom               | Fix
Cross-protocol ambiguity | Dropped/invalid tasks | Explicit boundary contracts
Policy after execution   | Unsafe side effects   | Enforce pre-dispatch policy
Promotion by intuition   | Unstable deployments  | Scorecard-gated rollout

Interview Questions They’ll Ask

  1. How do you separate control/data/ops planes?
  2. Which metrics gate production promotion?
  3. How do protocols interact safely?
  4. How do you recover from partial platform failures?

Hints in Layers

  • Hint 1: Build one deterministic scenario first.
  • Hint 2: Treat policy and observability as primitives.
  • Hint 3: Define promotion thresholds before running canaries.
  • Hint 4: Keep architecture decision records as living docs.

Submission / Completion Criteria

Minimum Completion

  • End-to-end platform run with artifacts and traceability

Full Completion

  • Automated scorecard-gated canary promotion

Excellence

  • Incident playbook and rollback simulation completed