Project 11: Human Approval Console for Risky Tool Calls

A review queue that pauses risky agent actions, asks a reviewer for approval, and resumes execution with immutable audit events.

Quick Reference

Attribute                          Value
---------                          -----
Difficulty                         Level 3: Advanced
Time Estimate                      12-24 hours
Main Programming Language         TypeScript
Alternative Programming Languages  Python, Elixir
Coolness Level                     Level 4: Very Cool
Business Potential                 4. Enterprise Trust Feature
Key Topics                         AI SDK 6, AI Gateway, reliability, observability

1. Learning Objectives

  1. Build an observable, policy-aware AI workflow with measurable behavior.
  2. Apply AI SDK 6 primitives with AI Gateway routing in a production-style architecture.
  3. Implement failure handling, cost controls, and evidence-oriented logging.
  4. Validate correctness with reproducible tests and explicit acceptance criteria.

2. All Theory Needed (Per-Concept Breakdown)

Concept A: Runtime Contract Design

Fundamentals

A runtime contract defines what must be true at each boundary: input shape, output shape, errors, and non-functional constraints. In AI systems this matters because model outputs vary and provider behavior changes over time. A contract gives your pipeline a stable internal surface even when model vendors evolve rapidly.

Deep dive into the concept

The key pattern is to separate capability selection from business intent. Your product asks for intent (summarize, classify, extract), then a policy layer maps that intent to a model class and provider route. The contract layer enforces schema validation and emits structured failure reasons so downstream components can recover predictably. You should include invariants like timeout budgets, maximum token spend, and mandatory trace IDs. Combine this with deterministic policy tables for fallback and quality tiers. This prevents accidental coupling to one provider and keeps upgrades controlled.

How this fits into the project

This project applies contracts to model routing, policy checks, and downstream output consumers.

Definitions & key terms

  • Contract: explicit runtime guarantee.
  • Invariant: rule that must always hold.
  • Degradation: planned quality reduction to preserve availability.

Mental model diagram

intent -> policy -> model route -> validation -> publish result
                     |
                     -> fallback route -> validation -> publish result

How it works

  1. Accept request with required metadata.
  2. Resolve policy to primary route.
  3. Execute with strict timeout/cost budget.
  4. Validate output contract.
  5. Fallback or fail with explicit reason.

Minimal concrete example

IF request.priority = low AND spend_pct > 90 THEN route = low_cost
ELSE route = balanced
VALIDATE output.schema
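The same routing rule, rendered as a TypeScript function (the names `resolveRoute`, `Priority`, and the route labels are illustrative):

```typescript
type Priority = "low" | "normal" | "high";

// Route low-priority work to a cheap model class once 90% of budget is spent.
function resolveRoute(priority: Priority, spendPct: number): "low_cost" | "balanced" {
  return priority === "low" && spendPct > 90 ? "low_cost" : "balanced";
}
```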

Common misconceptions

  • “Provider abstraction means all models behave the same” -> false. Contracts still required.

Check-your-understanding questions

  1. Why separate intent from route?
  2. What invariant protects cost overruns?
  3. What should happen when validation fails?

Check-your-understanding answers

  1. It prevents product logic from depending on provider specifics.
  2. Explicit per-request and per-window token/cost limits.
  3. Retry/fallback or explicit failure with reason code.

Real-world applications

  • AI support platforms, compliance assistants, internal copilots.

Where you’ll apply it

  • This project implementation and Project 16 system integration.

References

  • AI SDK docs: https://ai-sdk.dev/docs/introduction
  • AI Gateway docs: https://vercel.com/docs/ai-gateway

Key insights

Contract-first architecture is the difference between demo AI and production AI.

Summary

Design contracts before prompts.

Homework/Exercises

  • Write three invariants for latency, cost, and safety.

Solutions

  • Example: p95 < 2.5s, run cost < $0.15, no high-risk tool without approval.
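The example invariants can be encoded as a machine-checkable config plus a checker that emits reason codes. The thresholds are copied from the example above; the structure (`invariants`, `violations`, the code names) is hypothetical:

```typescript
const invariants = {
  p95LatencyMs: 2500,        // p95 < 2.5s
  maxRunCostUsd: 0.15,       // run cost < $0.15
  requireApprovalForHighRiskTools: true,
} as const;

interface RunStats {
  p95LatencyMs: number;
  runCostUsd: number;
  highRiskApproved: boolean;
}

// Return every violated invariant as an explicit reason code.
function violations(stats: RunStats): string[] {
  const out: string[] = [];
  if (stats.p95LatencyMs >= invariants.p95LatencyMs) out.push("LATENCY_BUDGET_EXCEEDED");
  if (stats.runCostUsd >= invariants.maxRunCostUsd) out.push("COST_BUDGET_EXCEEDED");
  if (invariants.requireApprovalForHighRiskTools && !stats.highRiskApproved) {
    out.push("UNAPPROVED_HIGH_RISK_TOOL");
  }
  return out;
}
```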

3. Project Specification

3.1 What You Will Build

A complete implementation of the Human Approval Console for Risky Tool Calls, with routing policies, observability, and operational runbooks.
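The core mechanism of this build is the approval gate: a risky tool call is recorded, paused as a pending promise, and only continues after a reviewer decision, with every step appended to an immutable audit log. The following is a minimal in-memory sketch under those assumptions; all names (`ApprovalQueue`, `AuditEvent`, etc.) are hypothetical, and a real system would persist both the queue and the audit trail:

```typescript
type Decision = "approved" | "denied";

interface AuditEvent {
  readonly at: number;
  readonly kind: "requested" | "decided";
  readonly detail: string;
}

class ApprovalQueue {
  private pending = new Map<string, (d: Decision) => void>();
  private readonly audit: AuditEvent[] = []; // append-only audit log

  // Pause the agent here until a reviewer calls decide() for this id.
  requestApproval(id: string, toolName: string): Promise<Decision> {
    this.record("requested", `${id}:${toolName}`);
    return new Promise((resolve) => this.pending.set(id, resolve));
  }

  // Reviewer action: resolves the paused call and audits the decision.
  decide(id: string, decision: Decision): void {
    this.record("decided", `${id}:${decision}`);
    this.pending.get(id)?.(decision);
    this.pending.delete(id);
  }

  events(): readonly AuditEvent[] {
    return this.audit;
  }

  private record(kind: AuditEvent["kind"], detail: string): void {
    this.audit.push(Object.freeze({ at: Date.now(), kind, detail }));
  }
}
```

Suspending the tool call as an unresolved promise keeps the orchestrator code linear: `const d = await queue.requestApproval(runId, tool)` reads as a normal step, while the reviewer console resolves it out of band.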

3.2 Architecture

Client -> API -> Orchestrator -> AI Gateway -> Provider(s)
                 |
                 +-> Policy Engine
                 +-> Metrics/Tracing

3.3 Milestones

  1. Baseline happy-path flow.
  2. Error handling and fallback.
  3. Metrics and cost reports.
  4. Security and approval controls.
  5. Load/reliability validation.

4. Validation and Definition of Done

  • End-to-end scenario passes with deterministic outputs where expected.
  • Failure scenarios emit correct reason codes.
  • Dashboard shows latency, cost, and failure breakdown.
  • Runbook includes rollback and incident steps.

5. Interview-Grade Discussion Prompts

  1. Which invariants mattered most and why?
  2. How did you choose fallback order?
  3. How would this design evolve for 10x traffic?

6. Next Steps

  • Add tenant isolation tests.
  • Add canary policy rollout.
  • Add regression eval suite in CI.