Project 27: Autonomy Boundaries and Self-Improvement Guardrails

Build policy controls that limit autonomous action by reversibility, risk, and economic consequence.


Quick Reference

Attribute      | Value
Difficulty     | Level 4: Expert
Time Estimate  | 10-18 hours
Language       | Python (alt: TypeScript)
Prerequisites  | Projects 6, 16, 22
Key Topics     | autonomy levels, human checkpoints, safe adaptation, TCO analysis

Learning Objectives

  1. Define explicit autonomy levels and transition rules.
  2. Route irreversible/high-risk actions to human checkpoints.
  3. Bound self-improvement loops with policy and eval gates.
  4. Compare agent and human economics, including incident externalities.

The Core Question You’re Answering

“When should an agent act alone, and when must humans remain in the loop?”


Concepts You Must Understand First

Concept                     | Why It Matters                   | Where to Learn
Reversibility analysis      | controls irreversible damage     | risk management methods
Human checkpoint routing    | keeps risk acceptable            | approval workflow design
Safe adaptation boundaries  | prevents policy drift            | AI risk frameworks
Full-cost modeling          | avoids false automation savings  | operations economics

Theoretical Foundation

Task Risk + Reversibility + Confidence + Blast Radius -> Autonomy Policy -> Action or Human Review

Higher autonomy must be earned with measured evidence.
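
The flow above can be sketched as a small decision function. This is a minimal illustration, not a prescribed design: the level names other than review_only (which appears in the sample output), the 0-to-1 signal scales, and every threshold are assumptions chosen for readability.

```python
from dataclasses import dataclass
from enum import IntEnum


class AutonomyLevel(IntEnum):
    # Hypothetical levels, ordered from least to most autonomous.
    REVIEW_ONLY = 0
    ACT_WITH_APPROVAL = 1
    ACT_AND_NOTIFY = 2
    FULL_AUTO = 3


@dataclass(frozen=True)
class TaskSignals:
    risk: float           # 0.0 (benign) .. 1.0 (severe)
    reversibility: float  # 0.0 (irreversible) .. 1.0 (trivially undone)
    confidence: float     # 0.0 .. 1.0, the agent's measured reliability on this task type
    blast_radius: float   # 0.0 (single record) .. 1.0 (organization-wide)


def decide_autonomy(s: TaskSignals) -> AutonomyLevel:
    # Hard rules come first: irreversible or high-risk work never runs unattended,
    # no matter how confident the agent is. Thresholds are illustrative assumptions.
    if s.reversibility < 0.3 or s.risk > 0.7:
        return AutonomyLevel.REVIEW_ONLY
    if s.blast_radius > 0.5 or s.confidence < 0.6:
        return AutonomyLevel.ACT_WITH_APPROVAL
    if s.confidence < 0.9:
        return AutonomyLevel.ACT_AND_NOTIFY
    return AutonomyLevel.FULL_AUTO
```

Checking the hard rules before confidence means a highly confident agent still cannot act alone on an irreversible task, which matches the idea that autonomy is earned per risk class rather than globally.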


Project Specification

What You’ll Build

A boundary manager that:

  • Assigns autonomy levels
  • Applies mandatory approvals
  • Restricts adaptive changes
  • Produces economic decision reports

Functional Requirements

  1. Risk/reversibility scoring engine
  2. Human checkpoint router
  3. Adaptation gate controls
  4. Human-vs-agent cost model

Non-Functional Requirements

  • Auditable policy decisions
  • Unambiguous approval routing (each action resolves to a single approval path)
  • Clear rollback policy

Real World Outcome

$ python p27_autonomy_guard.py --scenario "vendor_contract_update"
[risk] high reversibility=low
[autonomy] level=review_only
[checkpoint] legal_ops_required=true
[adaptation] online_learning_blocked=true
[economics] agent=$6.70 human=$14.20 incident_adjusted=agent_not_preferred

Architecture Overview

Policy Matrix -> Risk Scorer -> Approval Router -> Action Dispatcher -> Audit Log

Implementation Guide

Phase 1: Policy Matrix

  • Define autonomy levels and allowed actions.
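
One possible shape for the policy matrix is a read-only mapping from autonomy level to the set of actions allowed at that level. The level names, action names, and version string below are hypothetical placeholders; only review_only appears in the sample output.

```python
from types import MappingProxyType

# Hypothetical version tag; bump it whenever the matrix changes.
POLICY_VERSION = "v1"

# Read-only mapping so runtime code cannot widen permissions by accident.
POLICY_MATRIX = MappingProxyType({
    "review_only": frozenset({"draft", "summarize"}),
    "act_with_approval": frozenset({"draft", "summarize", "update_record"}),
    "full_auto": frozenset({"draft", "summarize", "update_record", "send_notification"}),
})


def is_allowed(level: str, action: str) -> bool:
    # Deny by default: unknown levels and unknown actions are both rejected.
    return action in POLICY_MATRIX.get(level, frozenset())
```

Keeping the matrix as data rather than scattered conditionals makes it easier to review, diff, and version alongside the decisions it produces.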

Phase 2: Runtime Routing

  • Add approval and dispatch controls.
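
A minimal sketch of approval routing with a hard block before dispatch follows. The routing table, the on_call_reviewer fallback, and the Action fields are assumptions for illustration; the legal_ops approver for vendor_contract_update mirrors the sample output.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Set


class ApprovalRequired(Exception):
    """Raised when an action reaches dispatch without the approval it needs."""


@dataclass
class Action:
    name: str
    high_risk: bool
    approvals: Set[str] = field(default_factory=set)  # approver groups that signed off


# Hypothetical routing table: action name -> approver group.
REQUIRED_APPROVER = {"vendor_contract_update": "legal_ops"}


def required_approver(action: Action) -> Optional[str]:
    # Explicit routes win; any other high-risk action falls back to a default reviewer.
    if action.name in REQUIRED_APPROVER:
        return REQUIRED_APPROVER[action.name]
    return "on_call_reviewer" if action.high_risk else None


def dispatch(action: Action, execute: Callable[[Action], str]) -> str:
    # The check lives inside dispatch, so there is no code path that executes first
    # and asks for approval afterwards.
    approver = required_approver(action)
    if approver is not None and approver not in action.approvals:
        raise ApprovalRequired(f"{action.name} needs approval from {approver}")
    return execute(action)
```

Raising an exception, rather than logging a warning, is what makes the checkpoint enforced instead of cosmetic.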

Phase 3: Economic Layer

  • Add an incident-adjusted total cost of ownership (TCO) model.
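
One simple way to make the comparison incident-adjusted is to add an expected incident cost to each side's direct cost. In the sketch below, the direct costs reuse the figures from the sample output; the incident probabilities and the $500 incident cost are invented for illustration and would need to come from your own incident data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostProfile:
    direct_cost: float           # per-task cost (compute or labor)
    incident_probability: float  # chance that a task causes an incident
    incident_cost: float         # average cost of one incident
    maintenance_cost: float = 0.0  # per-task share of upkeep (monitoring, retraining, reviews)

    def expected_cost(self) -> float:
        # Expected cost = direct + maintenance + probability-weighted incident cost.
        return (self.direct_cost + self.maintenance_cost
                + self.incident_probability * self.incident_cost)


def preference(agent: CostProfile, human: CostProfile) -> str:
    # Ties go to the human, a conservative assumption.
    if agent.expected_cost() < human.expected_cost():
        return "agent_preferred"
    return "agent_not_preferred"


# Illustrative numbers only: the agent is cheaper per task but errs more often.
agent = CostProfile(direct_cost=6.70, incident_probability=0.02, incident_cost=500.0)
human = CostProfile(direct_cost=14.20, incident_probability=0.002, incident_cost=500.0)
print(preference(agent, human))
```

Under these assumed inputs the agent's expected cost is about $16.70 against about $15.20 for the human, so the cheaper direct cost does not survive the adjustment.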

Testing Strategy

  • High-risk action routing tests
  • Policy bypass tests
  • Adaptation drift tests

Common Pitfalls & Debugging

Pitfall               | Symptom                      | Fix
Cosmetic checkpoints  | risky actions still execute  | hard-block pre-dispatch
Optimistic economics  | over-automation              | include incident and maintenance costs
Unbounded adaptation  | policy drift                 | gate every adaptation change
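
The fix for unbounded adaptation, gating every change, can be sketched as a single function that each proposed self-modification must pass. The allowlist of adaptable targets, the proposal fields, and the no-regression rule are assumptions; a real gate would plug in the project's own eval suite.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AdaptationProposal:
    target: str                # what the change touches, e.g. "prompt_template"
    baseline_score: float      # eval score of the current configuration
    candidate_score: float     # eval score with the proposed change applied
    safety_evals_passed: bool  # result of a separate safety eval run


# Hypothetical allowlist: anything not listed here (including the policy itself)
# cannot be changed by the agent.
ADAPTABLE_TARGETS = frozenset({"prompt_template", "retrieval_top_k"})


def gate_adaptation(p: AdaptationProposal) -> Tuple[bool, str]:
    # Returns (accepted, reason) so that every rejection is explainable in the audit log.
    if p.target not in ADAPTABLE_TARGETS:
        return False, "target_not_adaptable"
    if not p.safety_evals_passed:
        return False, "safety_evals_failed"
    if p.candidate_score < p.baseline_score:
        return False, "eval_regression"
    return True, "accepted"
```

Because the autonomy policy is deliberately absent from the allowlist, an adaptation loop gated this way cannot loosen its own guardrails.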

Interview Questions They’ll Ask

  1. How do you define irreversibility for an automated action?
  2. What should always require human approval?
  3. How do you constrain self-improving loops?
  4. How do you compare agent and human total cost?

Hints in Layers

  • Hint 1: Keep autonomy levels finite and explicit.
  • Hint 2: Treat approvals as workflow primitives.
  • Hint 3: Require policy version tags on decisions.
  • Hint 4: Stress-test economics with incident scenarios.
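
As an illustration of Hint 3, each decision can be written to the audit log as a JSON line that carries the policy version it was made under. The field names and the "v1" tag are hypothetical; the scenario, level, and approver values echo the sample output.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def audit_record(scenario: str, level: str, approver: Optional[str],
                 policy_version: str, now: Optional[datetime] = None) -> str:
    # `now` is injectable so tests can produce deterministic records.
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    record = {
        "scenario": scenario,
        "autonomy_level": level,
        "required_approver": approver,
        "policy_version": policy_version,  # ties the decision to the rules in force
        "timestamp": timestamp,
    }
    # sort_keys keeps the line stable across runs, which helps diffing and hashing.
    return json.dumps(record, sort_keys=True)


print(audit_record("vendor_contract_update", "review_only", "legal_ops", "v1"))
```

With the version recorded on every line, a later policy change does not make earlier decisions ambiguous during an audit.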

Submission / Completion Criteria

Minimum Completion

  • Enforced autonomy matrix + checkpoint routing

Full Completion

  • Adaptation limits + incident-adjusted economics

Excellence

  • Governance-ready autonomy policy pack