Project 6: Tool Safety Gatekeeper

Build a gatekeeper that intercepts agent tool use and enforces policy approvals.

Quick Reference

Attribute | Value
Difficulty | Level 4
Time Estimate | 16-24 hours
Language | Python (Alternatives: TypeScript, Go)
Prerequisites | Role design, logging, basic policy modeling
Key Topics | Safety policies, approvals, audit logs

1. Learning Objectives

By completing this project, you will:

  1. Define tool categories and risk levels.
  2. Implement a policy engine for approvals.
  3. Build an audit log of all tool requests.
  4. Add escalation paths for high-risk actions.

2. Theoretical Foundation

2.1 Core Concepts

  • Policy Enforcement: Rules that gate tool usage.
  • Risk Scoring: Categorizing tools by impact.
  • Auditability: Recording decisions for review.

2.2 Why This Matters

Agents can cause real-world impact through tool calls. A gatekeeper ensures safety, compliance, and accountability.

2.3 Historical Context / Background

Control planes and policy engines are standard in security-critical systems, and the same patterns map directly to LLM agent tool use.

2.4 Common Misconceptions

  • “Tool use is always safe.” In reality, tools can trigger irreversible actions such as file writes or external API calls.
  • “Policies only slow systems down.” Well-designed policies add little latency and prevent catastrophic errors.

3. Project Specification

3.1 What You Will Build

A policy gatekeeper that intercepts tool requests, evaluates them against rules, and either approves, blocks, or escalates.

3.2 Functional Requirements

  1. Tool Registry: Record tools and risk levels.
  2. Policy Engine: Approve, block, or escalate.
  3. Audit Log: Persist decisions with reasons.
  4. Escalation Workflow: Human or supervisor review.

3.3 Non-Functional Requirements

  • Security: No tool call bypasses the gatekeeper.
  • Transparency: All decisions explainable.
  • Reliability: A safe fallback (default deny) if policy evaluation fails.

3.4 Example Usage / Output

$ request-tool --tool "write_file" --reason "update report"
[Gatekeeper] decision: ESCALATE (risk: high)

3.5 Real World Outcome

You can demonstrate that risky tool requests are blocked or escalated, and safe ones are approved with full audit trails.


4. Solution Architecture

4.1 High-Level Design

Agent -> Tool Request -> Policy Engine -> Approve/Block/Escalate -> Audit Log

4.2 Key Components

Component | Responsibility | Key Decisions
Tool Registry | Define risk levels | Static config
Policy Engine | Apply rules | Rule-based checks
Audit Log | Persist decisions | Append-only logs
Escalation Handler | Human review | Manual approval

4.3 Data Structures

Pseudo-structures:

STRUCT ToolRequest:
  tool_name
  risk_level
  justification
  requester_role

STRUCT PolicyDecision:
  decision
  reason
  timestamp
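
The pseudo-structures above can be sketched as Python dataclasses. The field names follow the structures; the enum values and timestamp format are assumptions for illustration:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum

class Decision(Enum):
    APPROVE = "approve"
    BLOCK = "block"
    ESCALATE = "escalate"

@dataclass
class ToolRequest:
    tool_name: str
    risk_level: str          # e.g. "low", "medium", "high" (assumed scale)
    justification: str
    requester_role: str

@dataclass
class PolicyDecision:
    decision: Decision
    reason: str
    # Timestamp defaults to UTC at creation time (an assumed convention)
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
```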

4.4 Algorithm Overview

Policy Evaluation

  1. Read tool risk level.
  2. Apply rule set.
  3. Approve, block, or escalate.
  4. Log decision.

Complexity Analysis:

  • Time: O(R) per request, where R is the number of rules
  • Space: O(L), where L is the number of logged decisions
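
The four evaluation steps above can be sketched as a small rule-based function. The registry contents and risk thresholds here are illustrative assumptions, not a prescribed design:

```python
# Minimal rule-based policy engine: unknown tools are blocked by default,
# high-risk tools escalate, everything else is approved.
TOOL_RISK = {                 # illustrative registry; real rules belong in config
    "read_file": "low",
    "write_file": "high",
    "send_email": "high",
}

def evaluate(tool_name: str) -> tuple[str, str]:
    """Return (decision, reason) for a tool request."""
    risk = TOOL_RISK.get(tool_name)
    if risk is None:          # default deny: unregistered tools never run
        return "BLOCK", f"unknown tool: {tool_name}"
    if risk == "high":        # high risk requires human review
        return "ESCALATE", f"risk level is {risk}"
    return "APPROVE", f"risk level is {risk}"
```

Every returned decision should then be written to the audit log, which keeps the evaluation itself side-effect-free and easy to test.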

5. Implementation Guide

5.1 Development Environment Setup

Use a simple configuration file to define tool rules and risk levels.
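
For example, a minimal JSON config might look like the following (the file layout and field names are assumptions; adapt them to your registry):

```json
{
  "tools": {
    "read_file":  {"risk": "low"},
    "write_file": {"risk": "high"},
    "send_email": {"risk": "high"}
  },
  "default_decision": "block",
  "escalate_at": "high"
}
```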

5.2 Project Structure

project-root/
├── tools/
├── policies/
├── audit/
├── escalation/
└── logs/

5.3 The Core Question You’re Answering

“How do I allow agents to act while preventing unsafe actions?”

5.4 Concepts You Must Understand First

  1. Policy enforcement
    • How to define and apply tool rules.
    • Book Reference: “Release It!” - Ch. 4
  2. Escalation
    • When to require human review.
    • Book Reference: “Clean Architecture” - Ch. 11

5.5 Questions to Guide Your Design

  1. Risk levels
    • Which tools are low vs high risk?
  2. Approval criteria
    • What conditions trigger escalation?

5.6 Thinking Exercise

Design a policy table with three tools and specify their risk levels and approval rules.

5.7 The Interview Questions They’ll Ask

  1. “How do you enforce tool-use policies?”
  2. “What is the difference between block and escalate?”
  3. “How do you audit tool actions?”
  4. “How do you prevent policy bypass?”
  5. “How do you update policies safely?”

5.8 Hints in Layers

Hint 1: Define tool categories. Start with read-only vs. write tools.

Hint 2: Add escalation. High-risk tools require approval before execution.

Hint 3: Log decisions. Record all requests with their decisions and reasons.

Hint 4: Add policy tests. Test that risky tools are blocked or escalated.


5.9 Books That Will Help

Topic | Book | Chapter
Reliability and safety | “Release It!” | Ch. 4

5.10 Implementation Phases

Phase 1: Foundation (4-6 hours)

Goals:

  • Define tool registry
  • Implement policy engine

Tasks:

  1. Create tool risk list
  2. Implement rule evaluation

Checkpoint: Policy decisions returned for sample requests.

Phase 2: Core Functionality (6-8 hours)

Goals:

  • Add audit logging
  • Add escalation flow

Tasks:

  1. Log decisions
  2. Implement escalation queue

Checkpoint: Audit log records every tool request.
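
An append-only audit log can be as simple as one JSON object per line. The file path and field names below are assumptions for illustration:

```python
import json
from datetime import datetime, timezone

def log_decision(path: str, tool: str, decision: str, reason: str) -> None:
    """Append one decision as a JSON line; earlier entries are never rewritten."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tool": tool,
        "decision": decision,
        "reason": reason,
    }
    # Opening in append mode ("a") keeps the log append-only.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
```

JSON-lines files are easy to grep by tool name during debugging (see 7.2) and can later be shipped to a proper log store without changing the writer.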

Phase 3: Polish & Edge Cases (4-6 hours)

Goals:

  • Add policy updates
  • Add alerts

Tasks:

  1. Support policy reload
  2. Alert on high-risk escalations

Checkpoint: Updates take effect without restart.
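
One simple way to reload policies without a restart is to check the config file's modification time on each access. The file format and class shape below are assumptions:

```python
import json
import os

class PolicyStore:
    """Reload the policy file whenever its modification time changes."""

    def __init__(self, path: str):
        self.path = path
        self._mtime = 0.0     # sentinel forces a load on first access
        self._rules: dict = {}

    def rules(self) -> dict:
        mtime = os.path.getmtime(self.path)
        if mtime != self._mtime:   # file changed: reload without a restart
            with open(self.path, encoding="utf-8") as f:
                self._rules = json.load(f)
            self._mtime = mtime
        return self._rules
```

Checking mtime per request is cheap; for heavier setups a file-watcher or an explicit reload endpoint would serve the same purpose.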

5.11 Key Implementation Decisions

Decision | Options | Recommendation | Rationale
Policy model | Hardcoded vs config | Config | Easier updates
Escalation | Auto-approve vs human | Human for high risk | Safety

6. Testing Strategy

6.1 Test Categories

Category | Purpose | Examples
Unit Tests | Rule evaluation | High-risk tool escalates
Integration Tests | Audit logging | Decision recorded
Edge Case Tests | Unknown tool | Blocked by default

6.2 Critical Test Cases

  1. Unknown tool is blocked.
  2. High-risk tool triggers escalation.
  3. Low-risk tool is approved.

6.3 Test Data

Tool: write_file
Risk: high
Expected: escalate
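
The three critical test cases above can be written as pytest-style functions. The `evaluate()` function and its return values are assumptions standing in for your own policy engine:

```python
# Sketch of the three critical test cases, assuming an evaluate() that
# returns one of "APPROVE", "BLOCK", "ESCALATE".
RISK = {"read_file": "low", "write_file": "high"}

def evaluate(tool_name: str) -> str:
    risk = RISK.get(tool_name)
    if risk is None:
        return "BLOCK"        # unknown tool is blocked by default
    if risk == "high":
        return "ESCALATE"     # high-risk tool requires review
    return "APPROVE"

def test_unknown_tool_blocked():
    assert evaluate("launch_missiles") == "BLOCK"

def test_high_risk_escalates():
    assert evaluate("write_file") == "ESCALATE"

def test_low_risk_approved():
    assert evaluate("read_file") == "APPROVE"
```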

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall | Symptom | Solution
No default deny | Unknown tools allowed | Block by default
Missing logs | No audit trail | Log every request
Policy drift | Inconsistent rules | Centralized config

7.2 Debugging Strategies

  • Review audit logs by tool name.
  • Compare requests to policy rules.

7.3 Performance Traps

  • Excessive escalation can slow system throughput.

8. Extensions & Challenges

8.1 Beginner Extensions

  • Add tool categories in UI.
  • Add simple approval queue.

8.2 Intermediate Extensions

  • Add risk scoring based on task context.
  • Add per-role permissions.

8.3 Advanced Extensions

  • Add anomaly detection for tool usage.
  • Integrate with external policy engines.

9. Real-World Connections

9.1 Industry Applications

  • AI copilots with safe tool usage
  • Compliance-driven automation
  • Open Policy Agent (policy enforcement patterns)

9.2 Interview Relevance

  • Safety and control in agent systems are a common interview topic.

10. Resources

10.1 Essential Reading

  • “Release It!” - reliability and safety

10.2 Tools & Documentation

  • Open Policy Agent docs: https://www.openpolicyagent.org/
  • Previous Project: Knowledge Ledger (P05)
  • Next Project: Swarm Simulation Sandbox (P07)

11. Self-Assessment Checklist

11.1 Understanding

  • I can define tool risk levels and policies

11.2 Implementation

  • Every tool request is audited

11.3 Growth

  • I can describe trade-offs between speed and safety

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Policy engine intercepts tool calls

Full Completion:

  • Escalation and auditing are implemented

Excellence (Going Above & Beyond):

  • Risk scoring and anomaly detection added

This guide was generated from LEARN_COMPLEX_MULTI_AGENT_SYSTEMS_DEEP_DIVE.md. For the complete learning path, see the README.