Project 27: Autonomy Boundaries and Self-Improvement Guardrails
Build policy controls that limit autonomous action by reversibility, risk, and economic consequence.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 4: Expert |
| Time Estimate | 10-18 hours |
| Language | Python (alt: TypeScript) |
| Prerequisites | Projects 6, 16, 22 |
| Key Topics | autonomy levels, human checkpoints, safe adaptation, TCO analysis |
Learning Objectives
- Define explicit autonomy levels and transition rules.
- Route irreversible/high-risk actions to human checkpoints.
- Bound self-improvement loops with policy and eval gates.
- Compare agent and human economics, including incident externalities.
The Core Question You’re Answering
“When should an agent act alone, and when must humans remain in the loop?”
Concepts You Must Understand First
| Concept | Why It Matters | Where to Learn |
|---|---|---|
| Reversibility analysis | Identifies actions that cannot be undone, so irreversible damage is contained | risk management methods |
| Human checkpoint routing | Keeps residual risk at a level the organization accepts | approval workflow design |
| Safe adaptation boundaries | Prevents self-improvement from drifting outside policy | AI risk frameworks |
| Full-cost modeling | Avoids counting automation savings that incidents later erase | operations economics |
Theoretical Foundation
Task Risk + Reversibility + Confidence + Blast Radius -> Autonomy Policy -> Action or Human Review
Higher autonomy must be earned with measured evidence.
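One way to read the mapping above is as a pure function from task signals to an autonomy level. The sketch below is illustrative only: the level names (other than `review_only`, which appears in the sample output), the 0-to-1 scales, and every threshold are assumptions chosen for the example, not values prescribed by this project.

```python
from dataclasses import dataclass
from enum import IntEnum


class AutonomyLevel(IntEnum):
    # Hypothetical levels; only "review_only" is named in the project text.
    REVIEW_ONLY = 0        # agent drafts, a human executes
    APPROVAL_REQUIRED = 1  # agent executes only after sign-off
    AUTONOMOUS = 2         # agent executes and logs


@dataclass(frozen=True)
class TaskSignals:
    risk: float           # 0.0 (benign) .. 1.0 (severe)
    reversibility: float  # 0.0 (irreversible) .. 1.0 (trivially undone)
    confidence: float     # 0.0 .. 1.0, measured on evals
    blast_radius: float   # 0.0 (one record) .. 1.0 (org-wide)


def decide_level(s: TaskSignals) -> AutonomyLevel:
    # Irreversible or high-risk work stays at review, whatever the confidence.
    # Thresholds here are assumed for illustration.
    if s.reversibility < 0.3 or s.risk > 0.7:
        return AutonomyLevel.REVIEW_ONLY
    # Full autonomy needs high confidence, low risk, and a small blast radius.
    if s.confidence >= 0.9 and s.blast_radius <= 0.2 and s.risk <= 0.3:
        return AutonomyLevel.AUTONOMOUS
    return AutonomyLevel.APPROVAL_REQUIRED
```

Note that confidence can only raise the level once the risk and reversibility checks have passed; it never overrides them.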
Project Specification
What You’ll Build
A boundary manager that:
- Assigns autonomy levels
- Applies mandatory approvals
- Restricts adaptive changes
- Produces economic decision reports
Functional Requirements
- Risk/reversibility scoring engine
- Human checkpoint router
- Adaptation gate controls
- Human-vs-agent cost model
Non-Functional Requirements
- Auditable policy decisions
- Unambiguous approval routing (each action resolves to exactly one checkpoint, or to none)
- Clear rollback policy
Real World Outcome
$ python p27_autonomy_guard.py --scenario "vendor_contract_update"
[risk] level=high reversibility=low
[autonomy] level=review_only
[checkpoint] legal_ops_required=true
[adaptation] online_learning_blocked=true
[economics] agent=$6.70 human=$14.20 incident_adjusted=agent_not_preferred
Architecture Overview
Policy Matrix -> Risk Scorer -> Approval Router -> Action Dispatcher -> Audit Log
Implementation Guide
Phase 1: Policy Matrix
- Define autonomy levels and allowed actions.
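A policy matrix can start as a read-only mapping from autonomy level to the set of actions that level may perform. The following is a minimal sketch; the level names beyond `review_only` and all action names are hypothetical placeholders.

```python
from types import MappingProxyType

# Hypothetical matrix: level -> actions the agent may perform at that level.
# MappingProxyType and frozenset keep the policy read-only at runtime.
POLICY_MATRIX = MappingProxyType({
    "review_only": frozenset({"draft", "summarize"}),
    "approval_required": frozenset({"draft", "summarize", "update_record"}),
    "autonomous": frozenset(
        {"draft", "summarize", "update_record", "send_notification"}
    ),
})


def is_allowed(level: str, action: str) -> bool:
    # Deny by default: an unknown level or action is never permitted.
    return action in POLICY_MATRIX.get(level, frozenset())
```

Keeping the matrix as data rather than scattered `if` statements makes it easier to version, diff, and review.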
Phase 2: Runtime Routing
- Add approval and dispatch controls.
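For runtime routing, one option is to place the approval check inside the dispatcher itself, so that no caller can reach a handler without passing it. This sketch assumes hypothetical `Action` and `Dispatcher` types and a plain list as the audit log.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


class ApprovalRequired(Exception):
    """Raised when an action reaches dispatch without a recorded approval."""


@dataclass
class Action:
    name: str
    needs_approval: bool
    approved_by: Optional[str] = None  # set by the human checkpoint


@dataclass
class Dispatcher:
    handlers: dict[str, Callable[[], str]]
    audit_log: list[str] = field(default_factory=list)

    def dispatch(self, action: Action) -> str:
        # The check runs before any handler is looked up or called.
        if action.needs_approval and action.approved_by is None:
            self.audit_log.append(f"BLOCKED {action.name}")
            raise ApprovalRequired(action.name)
        result = self.handlers[action.name]()
        self.audit_log.append(
            f"EXECUTED {action.name} approved_by={action.approved_by}"
        )
        return result
```

Both the blocked attempt and the eventual execution are written to the log, which supports the auditability requirement.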
Phase 3: Economic Layer
- Include incident-adjusted TCO model.
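An incident-adjusted comparison can be sketched as direct cost plus review overhead plus expected incident cost. The direct costs below reuse the $6.70 and $14.20 figures from the sample output; the incident rates and the $500 incident cost are invented for illustration and would need to come from your own measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostProfile:
    direct_cost: float    # per-task inference or labor cost
    review_cost: float    # per-task human oversight cost
    incident_rate: float  # assumed probability that a task causes an incident
    incident_cost: float  # assumed cleanup cost of one incident

    def expected_cost(self) -> float:
        # Expected value: the incident term is weighted by its probability.
        return (
            self.direct_cost
            + self.review_cost
            + self.incident_rate * self.incident_cost
        )


def preferred(agent: CostProfile, human: CostProfile) -> str:
    # Ties go to the human, on the assumption that equal cost favors oversight.
    return "agent" if agent.expected_cost() < human.expected_cost() else "human"


# Hypothetical rates: the agent is cheaper per task but fails more often.
agent = CostProfile(6.70, 0.0, incident_rate=0.02, incident_cost=500.0)
human = CostProfile(14.20, 0.0, incident_rate=0.002, incident_cost=500.0)
```

Under these assumed rates the agent's expected cost is roughly $16.70 against $15.20 for the human, so the ranking by direct cost alone is reversed.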
Testing Strategy
- High-risk action routing tests
- Policy bypass tests
- Adaptation drift tests
Common Pitfalls & Debugging
| Pitfall | Symptom | Fix |
|---|---|---|
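| Cosmetic checkpoints | Risky actions still execute without approval | Enforce the approval check inside the dispatcher, before any handler runs |
| Optimistic economics | Tasks are automated that cost more once incidents are counted | Include incident and maintenance costs in the model |
| Unbounded adaptation | Agent behavior drifts away from approved policy | Gate every adaptation change |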
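Gating every adaptation change might look like the check below, which accepts a proposed change only when policy permits its kind, safety evals pass, and the eval score does not regress. The change kinds, field names, and tolerance are assumptions made for this sketch.

```python
from dataclasses import dataclass

# Hypothetical allow-list: kinds of self-modification policy permits at all.
ALLOWED_CHANGE_KINDS = frozenset({"prompt_template", "retrieval_config"})


@dataclass(frozen=True)
class ProposedChange:
    kind: str
    baseline_score: float    # eval score before the change
    candidate_score: float   # eval score with the change applied
    safety_evals_passed: bool


def gate(change: ProposedChange, max_regression: float = 0.0) -> tuple[bool, str]:
    # Policy gate first: some kinds of change are never self-applied.
    if change.kind not in ALLOWED_CHANGE_KINDS:
        return False, "change kind not permitted by policy"
    # Safety evals are a hard requirement, not a score to trade off.
    if not change.safety_evals_passed:
        return False, "safety evals failed"
    # Eval gate: reject anything that regresses beyond the tolerance.
    if change.candidate_score < change.baseline_score - max_regression:
        return False, "eval regression"
    return True, "accepted"
```

Returning a reason string alongside the verdict gives the audit log something concrete to record for each rejected change.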
Interview Questions They’ll Ask
- How do you define irreversibility for an automated action?
- What should always require human approval?
- How do you constrain self-improving loops?
- How do you compare agent and human total cost?
Hints in Layers
- Hint 1: Keep autonomy levels finite and explicit.
- Hint 2: Treat approvals as workflow primitives.
- Hint 3: Require policy version tags on decisions.
- Hint 4: Stress-test economics with incident scenarios.
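As an illustration of Hint 3, a decision record can refuse to serialize unless it carries a policy version tag. The field names and the JSON-lines format here are assumptions, not a required schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class DecisionRecord:
    scenario: str
    autonomy_level: str
    approver: Optional[str]  # None when no human checkpoint was required
    policy_version: str      # identifies the policy that produced this decision
    decided_at: str          # ISO 8601 timestamp supplied by the caller


def to_audit_line(record: DecisionRecord) -> str:
    # Reject untagged decisions so every log entry is traceable to a policy.
    if not record.policy_version:
        raise ValueError("decision must carry a policy version tag")
    # sort_keys gives a stable field order, which keeps log diffs readable.
    return json.dumps(asdict(record), sort_keys=True)
```

With the tag in every record, a later review can replay a decision against the exact policy version that was in force at the time.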
Submission / Completion Criteria
Minimum Completion
- Enforced autonomy matrix + checkpoint routing
Full Completion
- Adaptation limits + incident-adjusted economics
Excellence
- Governance-ready autonomy policy pack