Project 13: Tool Permission Firewall

Action-denial logs, approval queue traces, and policy coverage matrix.

Quick Reference

Attribute Value
Difficulty Level 3: Advanced
Time Estimate 5-10 days (capstone: 3-5 weeks)
Main Programming Language TypeScript
Alternative Programming Languages Go, Python
Coolness Level Level 5: Security Architecture
Business Potential 5. Security Product
Knowledge Area Authorization for Agents
Software or Tool Policy engine + approval workflow
Main Book Zero Trust Architecture (NIST SP 800-207)
Concept Clusters Instruction Hierarchy and Injection Defense; Tool Calling and MCP Interoperability

1. Learning Objectives

By completing this project, you will:

  1. Design a policy firewall that enforces least-privilege capability-based permissions for AI tool calls with default-deny semantics.
  2. Build a risk classification system that assigns tool actions to tiers (read-only, state-modifying, irreversible, external-facing) based on side effects, blast radius, and reversibility.
  3. Implement a deterministic authorization engine that evaluates tool requests against role-scoped policies, argument constraints, and contextual risk factors before allowing execution.
  4. Design and operate a human approval queue with timeout policies, escalation chains, and audit logging for high-risk tool actions that exceed automated authorization thresholds.
  5. Produce immutable decision logs that capture every allow, deny, and escalation event with policy rule references, enabling compliance audits and incident forensics.
  6. Measure firewall effectiveness through policy coverage matrices, false-positive rates, queue latency metrics, and approval throughput dashboards.

2. All Theory Needed (Per-Concept Breakdown)

Action Risk Classification

Fundamentals Action Risk Classification is the foundational mental model for this project because every tool call an AI agent can make carries a different level of potential harm. A read-only database lookup has near-zero blast radius; a bank transfer to an external account is irreversible and high-cost. Without a systematic classification scheme, firewall policies devolve into ad-hoc allowlists that miss dangerous edge cases. This concept forces you to evaluate every tool action along four dimensions: side effects (does it change state?), blast radius (how many systems or users are affected?), reversibility (can the action be undone?), and cost (financial or reputational exposure). The resulting risk tier determines whether the action is auto-allowed, requires argument validation, or must be routed to human approval. For this project, risk classification is the first gate in the pipeline; everything downstream depends on it being accurate and deterministic.

Deep Dive into the concept At depth, Action Risk Classification requires building a structured risk matrix that maps every registered tool to a risk tier. The four canonical tiers are: LOW (read-only operations with no side effects, such as lookups and searches), MEDIUM (state-modifying operations that are reversible, such as updating a record or toggling a feature flag), HIGH (irreversible or high-cost operations, such as deleting data or processing financial transactions), and CRITICAL (external-facing operations that cross trust boundaries, such as sending emails to customers, making API calls to third-party systems, or transferring funds externally).
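The four canonical tiers can be sketched as an ordered numeric enum, so that "highest tier wins" comparisons reduce to simple numeric checks. This is a minimal sketch; the names match the tiers above, but the enum shape itself is an illustrative assumption, not a prescribed schema.

```typescript
// The four canonical tiers as an ordered enum: higher number = higher risk.
enum RiskTier {
  LOW = 0,
  MEDIUM = 1,
  HIGH = 2,
  CRITICAL = 3,
}

// "Conflicting rules: highest tier wins" becomes a one-line comparison.
function maxTier(a: RiskTier, b: RiskTier): RiskTier {
  return a > b ? a : b;
}

console.log(RiskTier[maxTier(RiskTier.MEDIUM, RiskTier.HIGH)]); // HIGH
```

Keeping the tiers ordered also makes context modifiers ("shift one tier up") straightforward arithmetic rather than string juggling.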

The classification is not static. The same tool can carry different risk depending on context. A crm_update tool is MEDIUM risk when updating a note field, but becomes HIGH risk when changing a customer’s billing address because that affects downstream invoice routing. This means the classifier must inspect not just the tool name but also the arguments being passed. Argument-sensitive classification requires a rule set that can match on field names, value ranges, and cross-field combinations. For example, bank_transfer is MEDIUM when amount < 500 and destination is internal, but CRITICAL when amount >= 500 or destination is external.

Dynamic risk escalation is the most subtle part. Consider an agent operating in a sandbox environment versus production: the same tool call should be classified differently depending on the execution context. Your risk classifier needs an environment dimension (sandbox, staging, production) that shifts tier boundaries upward in higher-stakes environments such as production, where mistakes have real consequences. Similarly, repeated failed attempts for the same tool within a session should escalate the risk score, since this pattern may indicate adversarial probing.

The risk matrix itself should be a declarative data structure, not scattered if-else logic. This makes it auditable (security reviewers can inspect the matrix without reading code), testable (you can enumerate all tool-tier mappings and verify completeness), and versionable (changes to classification rules are tracked as configuration diffs). A well-designed matrix also reveals coverage gaps: tools that lack explicit classification rules default to the highest tier (deny-by-default), which is the safe failure mode.
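A minimal sketch of such a declarative matrix in TypeScript, with argument-sensitive escalation rules expressed as data rather than scattered if-else logic. Tool names and rule IDs echo the examples in this section; the exact field names are assumptions, not a fixed schema.

```typescript
// Risk tiers as string literals; the matrix maps tool names to entries.
type Tier = "LOW" | "MEDIUM" | "HIGH" | "CRITICAL";

interface EscalationRule {
  id: string;
  // Predicate over the raw argument payload; escalates the tier on match.
  when: (args: Record<string, unknown>) => boolean;
  escalateTo: Tier;
}

interface MatrixEntry {
  baseTier: Tier;
  escalations: EscalationRule[];
}

const riskMatrix: Record<string, MatrixEntry> = {
  crm_lookup: { baseTier: "LOW", escalations: [] },
  crm_update: {
    baseTier: "MEDIUM",
    escalations: [
      { id: "RC-002", when: (a) => a.field === "billing", escalateTo: "HIGH" },
    ],
  },
  bank_transfer: {
    baseTier: "HIGH",
    escalations: [
      { id: "RC-004", when: (a) => Number(a.amount) >= 500, escalateTo: "CRITICAL" },
      { id: "RC-005", when: (a) => a.destination !== "internal", escalateTo: "CRITICAL" },
    ],
  },
};

// Deny-by-default: a tool missing from the matrix is treated as CRITICAL.
function baseTierFor(tool: string): Tier {
  return riskMatrix[tool]?.baseTier ?? "CRITICAL";
}
```

Because the matrix is plain data, reviewers can diff it in version control, and a completeness test can enumerate every registered tool and assert it has an explicit entry.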

Finally, risk classification feeds directly into observability. Every classification decision should emit a structured log entry containing the tool name, the tier assigned, the arguments that influenced the decision, and the rule that matched. Over time, these logs reveal patterns: which tools are most frequently called, which classifications are most often overridden by reviewers, and where the matrix has blind spots.

How this fits into the project Action Risk Classification is the entry point of the P13 firewall pipeline. It determines the policy path for every incoming tool request and directly shapes what the approval queue and audit logger need to handle.

Definitions & key terms

  • Risk tier: a categorical level (LOW, MEDIUM, HIGH, CRITICAL) assigned to a tool action based on its potential impact.
  • Blast radius: the scope of systems, data, or users affected if the action produces an unintended outcome.
  • Reversibility: whether the effects of an action can be undone programmatically within a bounded time window.
  • Argument-sensitive classification: risk tier assignment that depends on the specific arguments passed to a tool, not just the tool name.
  • Coverage gap: a tool or argument combination that has no explicit classification rule and therefore defaults to the highest risk tier.

Mental model diagram (ASCII)

Tool Call Request
       |
       v
+-------------------------------+
| Risk Classifier               |
|                               |
|  Tool Name ----+              |
|  Arguments ----|-> Rule Match |
|  Context ------+              |
|                               |
|  Rules:                       |
|   read_* ---------> LOW       |
|   update_* -------> MEDIUM    |
|   delete_* -------> HIGH      |
|   external_* -----> CRITICAL  |
|   amount > $500 --> CRITICAL  |
|   env=prod -------> tier + 1  |
+-------------------------------+
       |
       v
  Risk Tier Decision
  (LOW | MEDIUM | HIGH | CRITICAL)
       |
       v
  Policy Engine (next stage)

How it works (step-by-step, with invariants and failure modes)

  1. Receive a tool call request with tool name, arguments, actor identity, and session context.
  2. Look up the tool in the risk matrix. If no explicit rule exists, assign CRITICAL (deny-by-default invariant).
  3. Evaluate argument-sensitive rules: check value ranges, allowlists, and cross-field constraints.
  4. Apply context modifiers: escalate tier if environment is production, if session has prior failures, or if actor lacks elevated privileges.
  5. Emit classification decision with tier, matched rule ID, and contributing factors.
  6. Pass the tier and metadata to the authorization engine for policy evaluation.

Invariants: every tool call must receive exactly one tier assignment; no tool call may proceed without classification; classification must be deterministic for identical inputs.

Failure modes: missing tool in registry (defaults to CRITICAL), argument parsing failure (defaults to CRITICAL with error reason), conflicting rules (highest tier wins).
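The six steps, invariants, and failure modes above can be sketched as a single deterministic function. The matrix shape, rule IDs, and the one-step production bump are illustrative assumptions under this project's design, not a fixed implementation.

```typescript
type Tier = "LOW" | "MEDIUM" | "HIGH" | "CRITICAL";
const ORDER: Tier[] = ["LOW", "MEDIUM", "HIGH", "CRITICAL"];

interface Rule { id: string; when: (args: any) => boolean; tier: Tier }
interface Entry { baseTier: Tier; rules: Rule[] }

const matrix: Record<string, Entry> = {
  bank_transfer: {
    baseTier: "MEDIUM",
    rules: [{ id: "RC-004", when: (a) => a.amount >= 500, tier: "CRITICAL" }],
  },
};

interface Decision { tier: Tier; ruleId: string; factors: string[] }

function classify(tool: string, args: any, env: string): Decision {
  const entry = matrix[tool];
  // Step 2: deny-by-default invariant — unregistered tools are CRITICAL.
  if (!entry) return { tier: "CRITICAL", ruleId: "DEFAULT", factors: ["unregistered tool"] };

  let tier = entry.baseTier;
  let ruleId = "BASE";
  const factors: string[] = [];

  // Step 3: argument-sensitive rules; conflicting rules -> highest tier wins.
  for (const r of entry.rules) {
    if (r.when(args) && ORDER.indexOf(r.tier) > ORDER.indexOf(tier)) {
      tier = r.tier;
      ruleId = r.id;
      factors.push(`matched ${r.id}`);
    }
  }

  // Step 4: context modifier — production shifts the tier up one step.
  if (env === "production" && tier !== "CRITICAL") {
    tier = ORDER[ORDER.indexOf(tier) + 1];
    factors.push("env=production");
  }

  // Step 5: emit the decision; step 6 hands it to the policy engine.
  return { tier, ruleId, factors };
}
```

Note that the function is pure: identical inputs always yield identical decisions, satisfying the determinism invariant.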

Minimal concrete example

Risk Matrix (excerpt):
+---------------------+----------+----------------------------+
| Tool                | Base Tier| Argument Escalation Rules  |
+---------------------+----------+----------------------------+
| crm_lookup          | LOW      | none                       |
| crm_update          | MEDIUM   | field=billing -> HIGH      |
| user_delete         | HIGH     | none                       |
| bank_transfer       | HIGH     | amount>=500 -> CRITICAL    |
|                     |          | dest=external -> CRITICAL  |
| email_send_customer | CRITICAL | none                       |
+---------------------+----------+----------------------------+

Classification trace:
  tool: bank_transfer
  args: {amount: 20000, destination: "ext_22"}
  matched_rule: "amount>=500 -> CRITICAL"
  context_modifier: "env=production -> no further escalation (already CRITICAL)"
  final_tier: CRITICAL
  rule_id: RC-004

Common misconceptions

  • “Tool name alone is sufficient for risk classification.” In reality, arguments often determine the true risk. A database_query tool is LOW risk for SELECT but HIGH risk if it accepts raw SQL that could include DELETE statements.
  • “Risk tiers are permanent and never need updating.” New tools, changed business rules, and incidents regularly require matrix revisions. Treat the matrix as a living configuration artifact.
  • “All HIGH-risk actions are equally dangerous.” There is a meaningful difference between deleting a draft document and deleting a customer account. Sub-tier granularity or numeric risk scores within tiers can improve policy precision.

Check-your-understanding questions

  1. Why should a tool with no explicit classification rule default to CRITICAL rather than LOW?
  2. Give an example where the same tool name should produce different risk tiers depending on its arguments.
  3. How does execution context (sandbox vs. production) affect risk classification?

Check-your-understanding answers

  1. Defaulting to CRITICAL enforces deny-by-default. If a new tool is registered without classification rules, it is blocked until someone explicitly reviews and assigns the correct tier. Defaulting to LOW would silently allow potentially dangerous actions.
  2. A payment_refund tool with amount=5 is MEDIUM (small reversible refund), but with amount=50000 it is CRITICAL (large financial exposure). The argument value determines the blast radius.
  3. In sandbox, a user_delete might stay at MEDIUM because no real data is affected. In production, the same call is HIGH or CRITICAL because real user data would be permanently lost. The classifier applies a context modifier that shifts tiers upward in higher-stakes environments such as production.

Real-world applications

  • Enterprise AI copilots that can trigger CRM updates, refunds, and account modifications on behalf of support agents.
  • Autonomous coding agents that need different permission levels for reading files, writing files, and executing shell commands.
  • Healthcare AI assistants where read-only chart access is low-risk but modifying treatment plans is high-risk and requires physician approval.

Where you’ll apply it

  • The risk classifier is the first component you build in P13. It feeds tier decisions into the authorization engine (Concept 2) and determines which requests enter the approval queue (Concept 3).

References

  • NIST SP 800-207 Zero Trust Architecture, Section 3 (trust evaluation and policy decision points)
  • OWASP LLM Top 10 2025, LLM06 (Excessive Agency) and LLM07 (Insecure Plugin Design)
  • “Security Engineering” by Ross Anderson, Chapter 4 (access control fundamentals)

Key insights Risk classification must be argument-aware and context-sensitive rather than a simple tool-name lookup, because the same tool can be safe or dangerous depending on what it is asked to do and where.

Summary Action Risk Classification assigns every tool call to a risk tier based on side effects, blast radius, reversibility, cost, argument values, and execution context. It uses a declarative risk matrix with deny-by-default semantics, argument-sensitive escalation rules, and context modifiers. This classification is the first gate in the firewall pipeline and determines all downstream policy decisions.

Homework/Exercises to practice the concept

  • Design a risk matrix for 8 tools in a customer support system, assigning base tiers and at least 3 argument-sensitive escalation rules.
  • Write a classification trace for a file_delete tool called with path="/tmp/cache" versus path="/etc/passwd" and explain why the tiers differ.
  • Identify 3 coverage gaps in a sample risk matrix and explain what incidents each gap could cause.

Solutions to the homework/exercises

  • Risk matrix: tools like ticket_lookup (LOW), ticket_update_notes (MEDIUM), ticket_reassign (MEDIUM), refund_process (HIGH, escalate to CRITICAL if amount > 1000), account_close (HIGH), email_customer (CRITICAL), payment_charge (CRITICAL), and database_export (HIGH, escalate to CRITICAL if the table contains PII). Each escalation rule should reference a specific argument field and threshold.
  • file_delete trace: /tmp/cache is MEDIUM (temporary, recoverable by rebuilding the cache) while /etc/passwd is CRITICAL (system file, irreversible, catastrophic blast radius).
  • Coverage gaps typically appear in newly added tools, tools with overly broad argument schemas, and tools that can be chained to achieve higher-privilege outcomes than their individual tiers suggest.

Deterministic Tool Authorization

Fundamentals Deterministic Tool Authorization is the policy engine at the heart of the firewall. Once a tool call has been classified by risk tier, the authorization engine must decide whether to allow, deny, or escalate the request. This decision must be deterministic: the same request with the same context must always produce the same outcome. The engine implements a capability-based security model where each actor (the AI agent, a specific user session, or a role) holds a set of capabilities that define which tools they can invoke and with what argument constraints. Unlike role-based access control where permissions are coarse-grained and attached to broad roles, capability-based authorization scopes permissions to specific tool-action-argument combinations. This granularity is essential because AI agents often need narrow permissions (e.g., can read CRM records but cannot update billing fields) that do not map cleanly to traditional roles.

Deep Dive into the concept At depth, Deterministic Tool Authorization requires designing a policy engine with three layers: rule definition, rule evaluation, and decision caching.

Rule definition is the declarative layer. Each policy rule specifies: a subject (who is requesting, e.g., “assistant” role, session “s_001”), an action (which tool and operation), a set of argument constraints (allowlists, range checks, cross-field validations), and a decision (ALLOW, DENY, or ESCALATE). Rules are evaluated in a defined order. The engine uses a deny-by-default model: if no rule explicitly allows an action, the action is denied. This is the most critical invariant in the entire system. A common mistake is defaulting to allow and then trying to enumerate all dangerous cases. The deny-by-default model means you only need to enumerate what is safe, which is a much smaller and more auditable set.

Rule evaluation order matters because rules can overlap or leave gaps. Consider a rule that allows refund_process for amounts under 100 and another that escalates refund_process for amounts over 500. What happens for amount=250? Neither rule matches, so deny-by-default applies, which may surprise operators who expected a middle tier. For genuine overlaps you need explicit precedence: more specific rules take priority over general rules, and DENY/ESCALATE rules take priority over ALLOW rules at the same specificity level. This conflict resolution strategy must be documented and tested because subtle ordering bugs can silently permit dangerous actions.

Argument validation goes beyond schema checking. Schema validation confirms that amount is a number, but the authorization engine must also enforce business constraints: is the amount within the allowed range for this actor? Is the destination account on the allowlist? Do the arguments violate any cross-field constraints (e.g., internal transfers cannot exceed 10,000 while external transfers cannot exceed 500)? These are not type checks; they are policy checks that require domain-specific rule logic.

Authorization caching is an optimization concern with correctness implications. If you cache authorization decisions, you must invalidate the cache when policies change, when actor permissions are revoked, or when session context changes. A stale cache that grants access based on a revoked permission is a security vulnerability. The safest approach is to re-evaluate policies on every request and use caching only for the rule lookup step (not the decision step), or to implement explicit cache invalidation triggered by policy update events.

The final consideration is latency. The authorization engine sits in the critical path of every tool call. If evaluation takes too long, developers will be tempted to bypass it or move it to an async path, which defeats its purpose. Design the rule set for fast evaluation: index rules by tool name for O(1) lookup, evaluate argument constraints with simple comparisons, and keep the total rule count manageable (hundreds, not thousands). If the rule set grows large, partition it by tool namespace.
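The indexing and caching guidance above can be sketched as a Map keyed by tool name, with lookup-cache entries keyed by policy version so that a version bump invalidates stale entries implicitly. Class and field names are assumptions for illustration.

```typescript
interface Rule {
  actor: string;
  tool: string;
  decision: "ALLOW" | "DENY" | "ESCALATE";
}

class PolicyIndex {
  private byTool = new Map<string, Rule[]>();
  private cache = new Map<string, Rule[]>();

  constructor(public version: string, rules: Rule[]) {
    // Index rules by tool name once at load time for O(1) lookup.
    for (const r of rules) {
      const bucket = this.byTool.get(r.tool) ?? [];
      bucket.push(r);
      this.byTool.set(r.tool, bucket);
    }
  }

  // Cache only the *rule lookup*, never the final decision. Keying the
  // cache entry by policy version means a new PolicyIndex (new version)
  // can never serve entries computed under the old rule set.
  rulesFor(tool: string): Rule[] {
    const key = `${this.version}:${tool}`;
    if (!this.cache.has(key)) {
      this.cache.set(key, this.byTool.get(tool) ?? []);
    }
    return this.cache.get(key)!;
  }
}
```

Because decisions themselves are never cached, revoking an actor's permissions takes effect on the very next request with no invalidation machinery needed.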

How this fits into the project This concept is the core policy engine of P13. It receives risk tier assignments from the classifier (Concept 1) and routes high-risk decisions to the approval queue (Concept 3). Every authorization decision is logged for audit.

Definitions & key terms

  • Capability: a specific permission tuple of (subject, tool, action, argument_constraints) that grants access to a narrow operation.
  • Deny-by-default: the policy invariant that any request without an explicit ALLOW rule is automatically denied.
  • Rule precedence: the evaluation order that determines which rule applies when multiple rules match the same request. More specific rules win; DENY/ESCALATE wins over ALLOW at the same level.
  • Argument constraint: a business-logic check on tool arguments that goes beyond type validation, such as range limits, allowlists, and cross-field rules.
  • Policy version: a snapshot identifier for the complete rule set, enabling rollback and audit trail correlation.

Mental model diagram (ASCII)

Tool Request (tool, args, actor, session, risk_tier)
       |
       v
+-------------------------------------------+
| Authorization Engine                      |
|                                           |
|  1. Lookup rules by tool name             |
|     +---> No rules found? -> DENY         |
|                                           |
|  2. Filter rules by actor/role match      |
|     +---> No matching actor? -> DENY      |
|                                           |
|  3. Evaluate argument constraints         |
|     +---> amount > limit? -> ESCALATE     |
|     +---> dest not in allowlist? -> DENY  |
|                                           |
|  4. Apply precedence (specific > general) |
|     +---> DENY/ESCALATE > ALLOW           |
|                                           |
|  5. Emit decision + rule_id + reason      |
+-------------------------------------------+
       |
       v
  ALLOW -> Execute tool
  DENY  -> Return error with reason
  ESCALATE -> Route to approval queue

How it works (step-by-step, with invariants and failure modes)

  1. Receive a tool request with the risk tier from the classifier, the actor identity, session context, and full argument payload.
  2. Look up all policy rules that match the tool name. If no rules exist, DENY (deny-by-default invariant).
  3. Filter matching rules by actor/role. If the actor has no matching rules, DENY.
  4. For each matching rule, evaluate argument constraints: range checks, allowlist membership, cross-field validations.
  5. If multiple rules match, apply precedence: most specific rule wins. If there is a tie, DENY or ESCALATE wins over ALLOW.
  6. Emit the authorization decision with the matched rule ID, the reason (which constraint matched or failed), and the policy version.
  7. If ALLOW, pass the request to the tool execution layer. If DENY, return a structured error. If ESCALATE, route to the approval queue.

Invariants: no tool executes without an explicit ALLOW decision; every decision references a policy version and rule ID; identical inputs always produce identical decisions.

Failure modes: policy file fails to load (deny all requests and alert operators), rule conflict not resolvable (deny and log conflict details), actor identity missing or forged (deny with AUTH_MISSING reason).

Minimal concrete example

Policy rules for "refund_process" tool:
+-----+----------------+--------------------+---------------------------+-----------+
| #   | Actor          | Tool               | Argument Constraints      | Decision  |
+-----+----------------+--------------------+---------------------------+-----------+
| R01 | support_agent  | refund_process     | amount <= 100             | ALLOW     |
| R02 | support_agent  | refund_process     | 100 < amount <= 500       | ALLOW     |
| R03 | support_agent  | refund_process     | amount > 500              | ESCALATE  |
| R04 | support_lead   | refund_process     | amount <= 5000            | ALLOW     |
| R05 | support_lead   | refund_process     | amount > 5000             | ESCALATE  |
| R06 | assistant      | refund_process     | any                       | ESCALATE  |
+-----+----------------+--------------------+---------------------------+-----------+

Evaluation trace:
  request: {tool: "refund_process", actor: "support_agent", args: {amount: 750}}
  matching_rules: [R01, R02, R03]
  constraint_eval: R01 fails (750 > 100), R02 fails (750 > 500), R03 passes (750 > 500)
  decision: ESCALATE (rule R03)
  policy_version: "pol_v2.1"
  reason: "Refund amount exceeds support_agent threshold of $500"
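The rule table and evaluation trace above can be sketched as follows. This is a simplification that skips specificity ranking and keys precedence only on decision severity (DENY over ESCALATE over ALLOW); the rule IDs mirror the table, while the code shape is an assumption.

```typescript
type Decision = "ALLOW" | "DENY" | "ESCALATE";

interface PolicyRule {
  id: string;
  actor: string;
  tool: string;
  constraint: (args: { amount: number }) => boolean;
  decision: Decision;
}

const rules: PolicyRule[] = [
  { id: "R01", actor: "support_agent", tool: "refund_process", constraint: a => a.amount <= 100, decision: "ALLOW" },
  { id: "R02", actor: "support_agent", tool: "refund_process", constraint: a => a.amount > 100 && a.amount <= 500, decision: "ALLOW" },
  { id: "R03", actor: "support_agent", tool: "refund_process", constraint: a => a.amount > 500, decision: "ESCALATE" },
  { id: "R06", actor: "assistant", tool: "refund_process", constraint: () => true, decision: "ESCALATE" },
];

// Tie-breaking severity: safety decisions always beat permissive ones.
const SEVERITY: Record<Decision, number> = { ALLOW: 0, ESCALATE: 1, DENY: 2 };

function authorize(tool: string, actor: string, args: { amount: number }) {
  // Steps 2-4: lookup by tool, filter by actor, evaluate constraints.
  // An empty match set means DENY (deny-by-default invariant).
  const matched = rules.filter(
    r => r.tool === tool && r.actor === actor && r.constraint(args)
  );
  if (matched.length === 0) return { decision: "DENY" as Decision, ruleId: "DEFAULT" };
  // Step 5: among matching rules, the most severe decision wins.
  const winner = matched.reduce((a, b) =>
    SEVERITY[b.decision] > SEVERITY[a.decision] ? b : a
  );
  return { decision: winner.decision, ruleId: winner.id };
}
```

Running the trace from above, authorize("refund_process", "support_agent", { amount: 750 }) matches only R03 and escalates, while an actor with no rules at all is denied outright.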

Common misconceptions

  • “Role-based access is sufficient for AI tool authorization.” Roles are too coarse. An AI assistant with a “support” role should not automatically have the same permissions as a human support agent. Capability-based authorization lets you scope permissions precisely.
  • “If the risk tier is LOW, no authorization check is needed.” Even LOW-risk actions need authorization to prevent privilege accumulation. An agent that can make unlimited read-only calls could still exfiltrate sensitive data through volume.
  • “Authorization decisions can be cached indefinitely for performance.” Policy changes, permission revocations, and session context changes all invalidate cached decisions. A stale ALLOW cache is a security vulnerability.

Check-your-understanding questions

  1. Why does deny-by-default produce a safer system than allow-by-default with a denylist?
  2. What happens when two policy rules match the same request with conflicting decisions (one ALLOW, one DENY)?
  3. Why must argument constraints go beyond schema validation?

Check-your-understanding answers

  1. With deny-by-default, new tools and edge cases are automatically blocked until someone explicitly reviews them. With allow-by-default, any tool you forget to denylist is silently permitted, which means security degrades as the system grows.
  2. The DENY rule takes precedence. In the conflict resolution strategy, DENY and ESCALATE always win over ALLOW at the same specificity level. This ensures that safety constraints are never overridden by permissive rules.
  3. Schema validation only checks types (is amount a number?). Argument constraints check business logic (is amount within the allowed range for this actor? is destination on the approved list?). Without argument constraints, a structurally valid request with dangerous values would be approved.

Real-world applications

  • Cloud provider IAM systems that evaluate policy documents with conditions, resource ARNs, and action wildcards for every API call.
  • Payment processing systems where transaction authorization depends on amount thresholds, merchant categories, and cardholder risk profiles.
  • Kubernetes admission controllers that evaluate pod specifications against policy rules before allowing deployment.

Where you’ll apply it

  • The authorization engine is the second component in P13. It consumes risk tier assignments from the classifier and produces ALLOW/DENY/ESCALATE decisions that determine whether tools execute or requests enter the approval queue.

References

  • NIST SP 800-207 Zero Trust Architecture, Section 4 (policy engine and policy administrator)
  • “Security Engineering” by Ross Anderson, Chapter 4 (access control) and Chapter 8 (capability-based systems)
  • AWS IAM Policy Evaluation Logic documentation (deny-by-default with explicit allow)
  • Open Policy Agent (OPA) Rego language design for declarative policy evaluation

Key insights The authorization engine must be deterministic, deny-by-default, and argument-aware, because AI agents generate unpredictable tool call combinations that cannot be anticipated by coarse role-based rules alone.

Summary Deterministic Tool Authorization implements a capability-based policy engine that evaluates every tool request against scoped rules with argument constraints, deny-by-default semantics, and explicit precedence ordering. It produces ALLOW, DENY, or ESCALATE decisions with full traceability, ensuring that no tool executes without passing a deterministic policy check.

Homework/Exercises to practice the concept

  • Write a complete policy rule set (at least 10 rules) for a system with 5 tools and 3 actor roles, including argument constraints and escalation thresholds.
  • Design a conflict resolution test: create 3 scenarios where multiple rules match and trace through the precedence logic to determine the final decision.
  • Describe a cache invalidation strategy for authorization decisions and identify 3 events that must trigger invalidation.

Solutions to the homework/exercises

  • The policy rule set should cover all tool-actor combinations with explicit rules, leaving no gaps (deny-by-default catches anything missing). Argument constraints should include at least one range check, one allowlist, and one cross-field constraint.
  • For conflict resolution: each scenario should show the matching rules, their specificity ranking, and the final decision with the rule ID that won.
  • Common invalidation triggers are: policy version update (all cached decisions for the old version are invalid), actor permission revocation (cached ALLOWs for that actor are invalid), and session termination (all session-scoped cache entries are cleared).

Human Approval and Exception Handling

Fundamentals Human Approval and Exception Handling is the safety valve that prevents the firewall from becoming either a bottleneck or a rubber stamp. When the authorization engine produces an ESCALATE decision, the request enters a human approval queue. This concept covers the design of that queue, the workflow around reviewer decisions, timeout and escalation policies, exception handling for edge cases, and the audit trail that makes every decision traceable. The fundamental tension is between safety (requiring human review for risky actions) and velocity (not blocking agents on slow human response). A well-designed approval system resolves this tension through tiered SLAs, automatic timeout policies, and escalation chains that ensure no request languishes indefinitely. Without this concept, the firewall either blocks everything (frustrating users) or auto-approves everything (defeating its purpose).

Deep Dive into the concept At depth, Human Approval and Exception Handling requires designing five interconnected subsystems: the approval queue, the reviewer workflow, timeout and escalation policies, exception override mechanics, and audit logging.

The approval queue is a persistent, ordered collection of pending requests. Each queue entry contains the original tool request, the risk classification, the authorization decision (ESCALATE), the policy rule that triggered escalation, the full argument payload (with sensitive fields redacted), and a timestamp. The queue must be durable; if the system restarts, pending approvals must survive. It must also be observable: operators need to see queue depth, average wait time, and oldest pending item at a glance.

The reviewer workflow defines who can approve or deny requests and what information they see. Reviewers need enough context to make an informed decision: what tool is being called, with what arguments, by which agent, in what session context, and why the policy engine escalated it. The UI should present the policy rule explanation, the risk tier, and any relevant history (e.g., “this agent has made 3 similar requests in the last hour”). Reviewers choose APPROVE (with optional justification note) or DENY (with required reason). Both decisions are final and immutable once recorded.

Timeout and escalation policies prevent queue stagnation. Every queue entry has a deadline. If no reviewer acts before the deadline, the system follows a configured policy: either auto-deny (safest default), auto-escalate to a higher-level reviewer, or auto-approve with a warning flag (only for low-consequence escalations). The timeout duration should be risk-proportional: CRITICAL actions get a 5-minute timeout before escalation to a manager, while HIGH actions might get 30 minutes. Escalation chains define who gets notified at each stage: primary reviewer, backup reviewer, team lead, on-call manager. If the entire chain expires without action, the request is denied with an ESCALATION_TIMEOUT reason code.
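The risk-proportional timeouts and escalation chain described above can be sketched as follows. The minute values match the examples in the text; the reviewer role names and function shapes are illustrative assumptions.

```typescript
type EscalatableTier = "HIGH" | "CRITICAL";

// Risk-proportional deadlines: CRITICAL gets 5 minutes, HIGH gets 30.
const TIMEOUT_MINUTES: Record<EscalatableTier, number> = { CRITICAL: 5, HIGH: 30 };

// Ordered escalation chain; each timeout advances one position.
const CHAIN = ["primary_reviewer", "backup_reviewer", "team_lead", "oncall_manager"];

// Given how many reviewers have already timed out, return the next
// reviewer to notify, or the terminal auto-deny reason code once the
// whole chain is exhausted.
function nextStep(timedOut: number): { reviewer: string } | { deny: string } {
  return timedOut < CHAIN.length
    ? { reviewer: CHAIN[timedOut] }
    : { deny: "ESCALATION_TIMEOUT" };
}

// Compute the absolute deadline for a queue entry from its creation time.
function deadline(createdAt: Date, tier: EscalatableTier): Date {
  return new Date(createdAt.getTime() + TIMEOUT_MINUTES[tier] * 60_000);
}
```

Keeping the chain and timeouts as configuration data (rather than code) lets operators tune SLAs per tier without redeploying the firewall.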

Exception handling covers the cases where the normal approval flow breaks down. What if a reviewer approves an action but the underlying tool has become unavailable? What if a reviewer approves a batch of actions but one has already been superseded by a newer request? What if a reviewer needs to override a DENY decision from the policy engine with documented justification? Exception overrides require elevated privileges and mandatory justification text. Every override is flagged in the audit log with OVERRIDE status, making it easy to audit and review during security postmortems.

Balancing automation with human oversight is an ongoing calibration exercise. If too many actions escalate to human review, reviewers suffer approval fatigue and start rubber-stamping (defeating the purpose of the firewall). If too few actions escalate, dangerous actions slip through. The key metric is the false escalation rate: the percentage of escalated actions that reviewers approve without modification. If this rate exceeds 90%, the escalation threshold is probably too aggressive and the policy rules should be loosened for those specific tool-argument combinations. Conversely, if reviewers deny more than 30% of escalated actions, the automation is letting too many risky requests through.
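The calibration metric above is simple arithmetic; a minimal sketch, with the 90% and 30% thresholds taken from the text and the function names as assumptions:

```typescript
interface ReviewOutcome {
  decision: "APPROVE" | "DENY";
  modified: boolean; // did the reviewer alter the request before approving?
}

// False escalation rate: share of escalated actions approved unmodified.
function falseEscalationRate(outcomes: ReviewOutcome[]): number {
  if (outcomes.length === 0) return 0;
  const approvedUnmodified = outcomes.filter(
    o => o.decision === "APPROVE" && !o.modified
  ).length;
  return approvedUnmodified / outcomes.length;
}

// Apply the calibration heuristics from the text.
function calibrationHint(falseEscRate: number, denyRate: number): string {
  if (falseEscRate > 0.9) return "loosen escalation thresholds";
  if (denyRate > 0.3) return "tighten automated authorization";
  return "within target band";
}
```

Computing this per tool-argument combination (not just globally) pinpoints which specific rules are generating low-signal escalations.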

Audit logging ties everything together. Every event in the approval lifecycle must be recorded: request received, request queued, reviewer assigned, reviewer decision, timeout triggered, escalation fired, exception override applied. Each log entry includes the trace ID (linking back to the original tool request), the actor, the timestamp, the decision, and the justification. These logs are append-only and immutable. They serve three purposes: compliance (demonstrating that risky actions were reviewed), incident forensics (tracing how a bad action was approved), and policy tuning (identifying patterns in reviewer decisions that suggest rule adjustments).
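A minimal sketch of the append-only log described above. Immutability is approximated here with Object.freeze and an append-only API surface; a production system would back this with write-once storage. Field names are assumptions drawn from the event list in the text.

```typescript
interface AuditEvent {
  traceId: string;        // links back to the original tool request
  actor: string;
  event: string;          // e.g. "QUEUED", "APPROVED", "OVERRIDE"
  timestamp: string;      // ISO 8601
  justification?: string; // required for overrides and denials
}

class AuditLog {
  private entries: AuditEvent[] = [];

  // The only write path: entries are copied, frozen, and appended.
  // There is deliberately no update or delete API.
  append(e: AuditEvent): void {
    this.entries.push(Object.freeze({ ...e }));
  }

  // Forensics: reconstruct the full lifecycle of one tool request.
  byTrace(traceId: string): AuditEvent[] {
    return this.entries.filter(e => e.traceId === traceId);
  }
}
```

The byTrace view is what stitches classifier, authorization, and approval events into the end-to-end record that compliance audits and postmortems need.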

How this fits into the project This concept is the third and final stage of the P13 firewall pipeline. It receives ESCALATE decisions from the authorization engine (Concept 2) and manages the human review process. Its audit logs integrate with the decision logs from the classifier and authorization engine to provide end-to-end traceability.

Definitions & key terms

  • Approval queue: a persistent, ordered collection of tool requests awaiting human review, with metadata for display, timeout, and escalation.
  • Escalation chain: an ordered list of reviewers who are notified progressively if the preceding reviewer does not act within the timeout window.
  • Timeout policy: a configured deadline for reviewer action, after which the system auto-denies, auto-escalates, or (rarely) auto-approves with a flag.
  • Exception override: a mechanism allowing a privileged reviewer to reverse a policy DENY decision with mandatory documented justification.
  • Approval fatigue: the degradation of review quality when reviewers are presented with too many low-signal escalation requests, leading to rubber-stamping.
  • False escalation rate: the percentage of escalated actions that are approved without modification, indicating overly aggressive escalation thresholds.
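The terms above could map onto a data model like the following sketch; field names follow the examples later in this section but are not a fixed schema:

```typescript
// Hypothetical shapes for an approval queue entry and its escalation chain.
interface EscalationLevel {
  reviewer: string;       // role or reviewer id, e.g. "ops_team"
  timeoutSeconds: number; // deadline for this level before escalating
}

interface ApprovalQueueEntry {
  traceId: string;                 // links back to the original tool request
  tool: string;
  args: Record<string, unknown>;   // sensitive fields pre-redacted for display
  riskTier: "LOW" | "MEDIUM" | "HIGH" | "CRITICAL";
  matchedRule: string;             // e.g. "R03"
  chain: EscalationLevel[];        // ordered escalation chain
  chainIndex: number;              // current position in the chain
  createdAt: string;               // ISO-8601 timestamp
  status: "PENDING" | "APPROVED" | "DENIED" | "AUTO_DENIED";
}

const example: ApprovalQueueEntry = {
  traceId: "trc_p13_099",
  tool: "bank_transfer",
  args: { amount: 20000, destination: "ext_22" },
  riskTier: "CRITICAL",
  matchedRule: "R03",
  chain: [
    { reviewer: "ops_team", timeoutSeconds: 300 },
    { reviewer: "manager", timeoutSeconds: 300 },
  ],
  chainIndex: 0,
  createdAt: "2025-01-15T10:32:00Z",
  status: "PENDING",
};
```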

Mental model diagram (ASCII)

ESCALATE Decision (from auth engine)
       |
       v
+--------------------------------------------------+
| Approval Queue                                   |
|                                                  |
|  +--------------------------------------------+  |
|  | Pending Request                            |  |
|  | tool: bank_transfer                        |  |
|  | args: {amount: 20000, dest: "ext_22"}      |  |
|  | risk: CRITICAL                             |  |
|  | rule: R03 (amount > $500 external)         |  |
|  | timeout: 5 min -> escalate to manager      |  |
|  | created: 2025-01-15T10:32:00Z              |  |
|  +--------------------------------------------+  |
+--------------------------------------------------+
       |
       v
+--------------------------------------------------+
| Reviewer Decision                                |
|                                                  |
|  [APPROVE + justification]                       |
|      -> Execute tool, log APPROVED               |
|                                                  |
|  [DENY + reason]                                 |
|      -> Return error to agent, log DENIED        |
|                                                  |
|  [NO ACTION within timeout]                      |
|      -> Escalation chain fires                   |
|      -> Next reviewer gets 5 min                 |
|      -> Chain exhausted? -> AUTO_DENY            |
+--------------------------------------------------+
       |
       v
+--------------------------------------------------+
| Audit Log (append-only)                          |
|                                                  |
|  trace_id: trc_p13_099                           |
|  events:                                         |
|    - QUEUED at T+0s by policy_engine             |
|    - ASSIGNED to reviewer_1 at T+2s              |
|    - TIMEOUT at T+300s (no action)               |
|    - ESCALATED to manager_1 at T+301s            |
|    - APPROVED by manager_1 at T+420s             |
|      justification: "Verified with customer"     |
|    - EXECUTED at T+421s                          |
+--------------------------------------------------+

How it works (step-by-step, with invariants and failure modes)

  1. Authorization engine emits ESCALATE decision. The request is persisted to the approval queue with full context, risk tier, matched rule, and timeout deadline.
  2. The queue assigns the request to the first reviewer in the escalation chain and sends a notification (UI alert, email, or Slack message).
  3. The reviewer examines the request context in the approval UI: tool name, arguments (sensitive fields redacted), risk tier, policy rule explanation, and session history.
  4. The reviewer chooses APPROVE (with optional justification) or DENY (with required reason).
  5. If APPROVE: the tool executes, and the approval decision is logged with the reviewer identity and justification.
  6. If DENY: the agent receives a structured error with APPROVAL_DENIED reason code, and the denial is logged.
  7. If no action before timeout: the request escalates to the next reviewer in the chain. If the chain is exhausted, the request is auto-denied with ESCALATION_TIMEOUT reason.
  8. Every event (queue, assign, timeout, escalate, approve, deny) is recorded in the append-only audit log.

Invariants: no ESCALATED request executes without an explicit APPROVE decision from a reviewer; every queue entry has a finite timeout; the audit log is append-only and contains every state transition.

Failure modes: queue storage failure (deny all pending requests and alert ops), reviewer notification delivery failure (escalation timer still runs, backup reviewer is notified), approval for an already-cancelled request (check request state before execution, log STALE_APPROVAL).
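The timeout-escalation transition in steps 2 and 7 can be sketched as a pure state function (shapes and names are hypothetical):

```typescript
// Advances a queue entry when the current reviewer's timeout expires.
interface ChainLevel { reviewer: string; timeoutSeconds: number; }
interface QueueState {
  chain: ChainLevel[];
  chainIndex: number;
  status: "PENDING" | "AUTO_DENIED";
  reasonCode?: string;
}

function onTimeout(state: QueueState): QueueState {
  const next = state.chainIndex + 1;
  if (next >= state.chain.length) {
    // Chain exhausted: safe default is deny, never auto-approve.
    return { ...state, status: "AUTO_DENIED", reasonCode: "ESCALATION_TIMEOUT" };
  }
  // Advance to the next reviewer in the chain; request stays pending.
  return { ...state, chainIndex: next, status: "PENDING" };
}
```

Keeping this transition pure makes the invariant ("every queue entry has a finite timeout") testable without a clock: each timeout strictly advances `chainIndex` until the chain is exhausted.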

Minimal concrete example

Approval queue entry:
  trace_id: trc_p13_099
  tool: bank_transfer
  args: {amount: 20000, destination: "ext_22"}  # dest_name redacted
  actor: assistant
  session: s_001
  risk_tier: CRITICAL
  escalation_rule: R03
  escalation_reason: "External transfer exceeds $500 threshold"
  timeout_policy:
    level_1: reviewer=ops_team, timeout=300s
    level_2: reviewer=manager, timeout=300s
    level_3: auto_deny

Reviewer decision:
  reviewer: manager_1
  action: APPROVE
  justification: "Customer confirmed transfer via phone callback ref#12345"
  decided_at: 2025-01-15T10:39:00Z

Audit log entry:
  trace_id: trc_p13_099
  policy_version: pol_v2.1
  events:
    - {type: QUEUED, at: "10:32:00Z", by: "policy_engine"}
    - {type: ASSIGNED, at: "10:32:02Z", to: "ops_team"}
    - {type: TIMEOUT, at: "10:37:02Z"}
    - {type: ESCALATED, at: "10:37:03Z", to: "manager_1"}
    - {type: APPROVED, at: "10:39:00Z", by: "manager_1",
       justification: "Customer confirmed via callback ref#12345"}
    - {type: EXECUTED, at: "10:39:01Z"}
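A minimal sketch of an append-only log in this spirit. The hash chaining shown here is one possible tamper-evidence mechanism, an assumption of this sketch rather than something the spec mandates:

```typescript
import { createHash } from "node:crypto";

// Append-only audit log: entries can be added, never mutated or removed.
// Each entry's hash covers the previous hash, so silent edits break the chain.
interface AuditEvent { type: string; at: string; by?: string; justification?: string; }

class AuditLog {
  private entries: { event: AuditEvent; hash: string }[] = [];

  append(event: AuditEvent): string {
    const prev = this.entries.at(-1)?.hash ?? "genesis";
    const hash = createHash("sha256")
      .update(prev + JSON.stringify(event))
      .digest("hex");
    this.entries.push({ event, hash });
    return hash;
  }

  // Recomputes the chain; any tampered entry invalidates everything after it.
  verify(): boolean {
    let prev = "genesis";
    for (const { event, hash } of this.entries) {
      const expect = createHash("sha256").update(prev + JSON.stringify(event)).digest("hex");
      if (expect !== hash) return false;
      prev = hash;
    }
    return true;
  }
}
```

In production the entries would go to durable append-only storage (e.g. a WORM bucket or an insert-only table); the in-memory array here is only for illustration.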

Common misconceptions

  • “Human approval means every risky action gets reviewed.” In practice, only actions that exceed the automated authorization threshold are escalated. If the policy engine can deterministically allow an action based on its rules, no human needs to see it. The approval queue handles only the residual cases.
  • “Approval fatigue is an education problem.” It is a design problem. If reviewers approve 95% of escalated requests without meaningful review, the escalation threshold is wrong. The fix is to tune the policy rules so that fewer low-signal requests reach the queue.
  • “Audit logs are only useful for compliance.” They are the primary debugging tool for security incidents. When a bad action was approved, the audit log shows who approved it, what context they saw, and what justification they provided. This is essential for improving the policy rules.

Check-your-understanding questions

  1. What should happen when a timeout expires and no reviewer has acted?
  2. Why must DENY decisions require a reason but APPROVE decisions can have optional justification?
  3. How do you detect and mitigate approval fatigue?

Check-your-understanding answers

  1. The request escalates to the next reviewer in the chain. If the entire chain is exhausted, the request is auto-denied with an ESCALATION_TIMEOUT reason. Auto-deny is the safe default because it is better to block a legitimate action (which can be retried) than to allow a dangerous one.
  2. DENY reasons are essential for the agent (to understand why and potentially reformulate) and for policy tuning (to identify rules that are too permissive). APPROVE justifications are optional because the approval itself is the meaningful signal, but they are strongly recommended for CRITICAL tier actions to support audit review.
  3. Monitor the false escalation rate (approved-without-modification / total-escalated). If it exceeds 80-90%, the escalation rules are too aggressive. Review the most frequently approved tool-argument combinations and consider adding explicit ALLOW rules for those patterns.

Real-world applications

  • Financial transaction approval workflows where transfers above a threshold require manager sign-off with dual control.
  • Healthcare order entry systems where certain medication orders require pharmacist review before dispensing.
  • Cloud infrastructure changes where production deployments require approval from a designated release manager.
  • Content moderation systems where borderline content is escalated to human reviewers with SLA targets.

Where you’ll apply it

  • The approval queue and audit logger are the final components of P13. They integrate with the risk classifier (Concept 1) and authorization engine (Concept 2) to form the complete firewall pipeline.

References

  • NIST SP 800-207 Zero Trust Architecture, Section 5 (trust algorithm and policy decision points)
  • “Designing Secure Systems” principles of separation of duties and dual control
  • SOC 2 Type II audit trail requirements for access control decisions
  • Google SRE book, Chapter 14 (managing incidents and escalation)

Key insights The approval queue must balance safety against velocity through tiered timeouts and escalation chains, while audit logs provide the forensic backbone that makes every decision traceable and every policy gap discoverable.

Summary Human Approval and Exception Handling manages the lifecycle of escalated tool requests through a persistent approval queue, tiered reviewer workflows with timeout and escalation policies, exception override mechanisms, and immutable audit logging. It prevents both over-blocking (approval fatigue) and under-blocking (rubber-stamping) through measurable thresholds and continuous policy calibration.

Homework/Exercises to practice the concept

  • Design an escalation chain for 3 risk tiers (HIGH, CRITICAL, CRITICAL+EXTERNAL) with specific timeout durations, reviewer roles, and fallback decisions.
  • Create a mock audit log for a scenario where a request times out at the first level, escalates to a manager, and is approved with justification. Include all event types and timestamps.
  • Calculate the false escalation rate from a sample of 50 escalated requests (40 approved, 8 denied, 2 timed out) and recommend whether the escalation threshold should be adjusted.

Solutions to the homework/exercises

  • Escalation chain: show increasing urgency. HIGH might have a 30-minute timeout with auto-deny fallback; CRITICAL a 5-minute timeout escalating to manager then auto-deny; CRITICAL+EXTERNAL a 2-minute timeout with immediate manager notification and auto-deny if the entire chain exhausts.
  • Mock audit log: include QUEUED, ASSIGNED, TIMEOUT, ESCALATED, ASSIGNED (second reviewer), and APPROVED events with realistic timestamps showing the timeout gap.
  • False escalation rate: 40 approved out of 50 total = 80%, which is at the threshold. If the rate stays above 80%, review the most common tool-argument patterns in the approved set and consider adding explicit ALLOW rules for the safest ones to reduce queue load.

3. Project Specification

3.1 What You Will Build

A policy firewall that enforces capability-based permissions and human approval for risky tool actions.

3.2 Functional Requirements

  1. Evaluate each tool call against capability policy rules.
  2. Allow low-risk actions automatically with trace logging.
  3. Route high-risk actions to human approval queue.
  4. Record immutable decision logs for audit review.

3.3 Non-Functional Requirements

  • Performance: Policy check latency under 100 ms for synchronous decisions.
  • Reliability: Policy engine returns deterministic decisions for same context.
  • Security/Policy: No privileged action executes without explicit allow decision.
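The security requirement implies that the policy engine's fall-through case is DENY. A minimal sketch of that shape, with illustrative rules mirroring the examples in section 3.4 (rule structure and names are assumptions of this sketch):

```typescript
// Deterministic, default-deny policy evaluation sketch.
type Decision = "ALLOW" | "DENY" | "ESCALATE";

interface PolicyRule {
  id: string;
  matches: (tool: string, args: Record<string, unknown>) => boolean;
  decision: Decision;
}

function evaluate(
  rules: PolicyRule[],
  tool: string,
  args: Record<string, unknown>,
): { decision: Decision; rule: string } {
  for (const rule of rules) {
    if (rule.matches(tool, args)) return { decision: rule.decision, rule: rule.id };
  }
  // Unknown tools or unmatched contexts never auto-allow.
  return { decision: "DENY", rule: "DEFAULT_DENY" };
}

const rules: PolicyRule[] = [
  { id: "LOW_RISK_READ_ONLY", matches: (t) => t === "crm_lookup", decision: "ALLOW" },
  {
    id: "HIGH_RISK_TXN",
    matches: (t, a) => t === "bank_transfer" && Number(a.amount) > 500,
    decision: "ESCALATE",
  },
];
```

Because `evaluate` is a pure function of rules and request, the determinism requirement holds for free: the same context always yields the same decision.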

3.4 Example Usage / Output

Browser URL: http://localhost:3013/firewall/approvals

+--------------------------------------------------------------------------------+
| Tool Firewall - Pending Approvals                                              |
+----------------------+--------+------------+----------------+------------------+
| Action               | Risk   | Requester  | Rule           | Actions          |
+----------------------+--------+------------+----------------+------------------+
| bank_transfer $20k   | HIGH   | assistant  | HIGH_RISK_TXN  | [Approve] [Deny] |
| delete_user_account  | HIGH   | assistant  | ACCOUNT_DELETE | [Approve] [Deny] |
+--------------------------------------------------------------------------------+
| Trace Detail: trc_p13_099                                                      |
| Requested tool: bank_transfer                                                  |
| Policy reason: Human approval required for external transfer over $500         |
+--------------------------------------------------------------------------------+
$ curl -s http://localhost:3000/v1/tool/execute \
  -H 'content-type: application/json' \
  -d '{
  "tool": "crm_lookup",
  "arguments": {"customer_id": "cus_1001"},
  "actor": "assistant",
  "session_id": "s_001"
}' | jq
{
  "decision": "ALLOW",
  "policy_rule": "LOW_RISK_READ_ONLY",
  "trace_id": "trc_p13_001"
}

3.5 Data Formats / Schemas / Protocols

  • Tool execution request JSON with actor/session/tool/arguments.
  • Decision log JSONL including rule id, risk score, and action.
  • Approval queue records with reviewer identity and decision timestamp.
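These records might be typed roughly as follows; the field names follow the example payloads in this spec but are not a normative schema:

```typescript
// Sketch of the wire and log formats listed above.
interface ToolExecutionRequest {
  tool: string;
  arguments: Record<string, unknown>;
  actor: string;
  session_id: string;
}

interface DecisionLogLine {
  trace_id: string;
  rule_id: string;
  risk_score: number;
  action: "ALLOW" | "DENY" | "ESCALATE";
  at: string; // ISO-8601
}

// JSONL: one JSON object per line, no embedded newlines.
const toJsonl = (line: DecisionLogLine): string => JSON.stringify(line);
```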

3.6 Edge Cases

  • Tool arguments omit fields needed for policy evaluation.
  • Same high-risk request retried repeatedly to bypass queue.
  • Policy update occurs while action is pending approval.
  • Reviewer denies request after partial external side effect.
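For the second edge case (repeated retries of the same high-risk request), one mitigation is to key the queue on a canonical fingerprint of the request, so retries attach to the existing pending entry instead of spawning new approvals. A sketch under that assumption:

```typescript
import { createHash } from "node:crypto";

// Canonical fingerprint: same tool + same top-level args + same session => same key.
// Note: the sorted-key replacer canonicalizes top-level keys only; deeply nested
// args would need recursive canonicalization.
function requestFingerprint(tool: string, args: Record<string, unknown>, session: string): string {
  const canonicalArgs = JSON.stringify(args, Object.keys(args).sort());
  return createHash("sha256").update(`${tool}|${canonicalArgs}|${session}`).digest("hex");
}

const pending = new Map<string, { traceId: string; retries: number }>();

// Returns the existing entry for a retried request instead of enqueueing again.
function enqueueOnce(tool: string, args: Record<string, unknown>, session: string, traceId: string) {
  const key = requestFingerprint(tool, args, session);
  const existing = pending.get(key);
  if (existing) {
    existing.retries += 1; // retry attaches to the original pending approval
    return existing;
  }
  const entry = { traceId, retries: 0 };
  pending.set(key, entry);
  return entry;
}
```

A high retry count on a single fingerprint is itself a useful signal worth surfacing to the reviewer.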

3.7 Real World Outcome

This project is complete when both UI workflow and backend policy enforcement are visible and auditable.

3.7.1 How to Run (Copy/Paste)

$ npm run dev --workspace p13-tool-firewall

3.7.2 Golden Path Demo (Deterministic)

Use the provided fixture payload and pre-seeded queue/data so UI counts and API responses are reproducible.

3.7.3 Browser Flow

  • Open: http://localhost:3013/firewall/approvals
  • Verify these visible states:
  • Queue table shows pending actions with columns: Action, Risk, Requester, Policy Rule, Created At.
  • Each row has Approve, Deny, and View Trace buttons.
  • Right-side panel shows full argument payload, redacted secrets, and policy explanation.
+--------------------------------------------------------------------------------+
| Tool Firewall - Pending Approvals                                              |
+----------------------+--------+------------+----------------+------------------+
| Action               | Risk   | Requester  | Rule           | Actions          |
+----------------------+--------+------------+----------------+------------------+
| bank_transfer $20k   | HIGH   | assistant  | HIGH_RISK_TXN  | [Approve] [Deny] |
| delete_user_account  | HIGH   | assistant  | ACCOUNT_DELETE | [Approve] [Deny] |
+--------------------------------------------------------------------------------+
| Trace Detail: trc_p13_099                                                      |
| Requested tool: bank_transfer                                                  |
| Policy reason: Human approval required for external transfer over $500         |
+--------------------------------------------------------------------------------+

3.7.4 API Behavior (Success + Error)

$ curl -s http://localhost:3000/v1/tool/execute \
  -H 'content-type: application/json' \
  -d '{
  "tool": "crm_lookup",
  "arguments": {"customer_id": "cus_1001"},
  "actor": "assistant",
  "session_id": "s_001"
}' | jq
{
  "decision": "ALLOW",
  "policy_rule": "LOW_RISK_READ_ONLY",
  "trace_id": "trc_p13_001"
}
$ curl -s http://localhost:3000/v1/tool/execute \
  -H 'content-type: application/json' \
  -d '{
  "tool": "bank_transfer",
  "arguments": {"amount": 20000, "destination": "ext_22"},
  "actor": "assistant",
  "session_id": "s_001"
}' | jq
{
  "error": {
    "code": "APPROVAL_REQUIRED",
    "message": "High-risk action queued for human approval.",
    "trace_id": "trc_p13_099",
    "project": "P13"
  }
}

4. Solution Architecture

4.1 High-Level Design

User Input / Trigger
        |
        v
+-------------------------+
|      Policy Engine      |
+-------------------------+
        |
        v
+-------------------------+
|     Approval Queue      |
+-------------------------+
        |
        v
+-------------------------+
|      Audit Logger       |
+-------------------------+
        |
        v
Artifacts / API / UI / Logs

4.2 Key Components

| Component | Responsibility | Key Decisions |
|-----------|----------------|---------------|
| Policy Engine | Computes allow/deny/escalate decision. | Use explicit default-deny behavior. |
| Approval Queue | Stores and routes high-risk actions to humans. | Decouple queue from real-time request path. |
| Audit Logger | Persists decision and reviewer metadata. | Immutable logs simplify compliance reviews. |

4.3 Data Structures (No Full Code)

P13_Request:
- trace_id
- input payload/context
- policy profile

P13_Decision:
- status (ALLOW | DENY | RETRY | ESCALATE | PROMOTE | ROLLBACK)
- reason_code
- artifact pointers

4.4 Algorithm Overview

Key algorithm: Policy-aware decision pipeline

  1. Normalize input and attach deterministic trace metadata.
  2. Run contract/schema validation and project-specific core checks.
  3. Apply policy gates and decide: success, retry, deny, escalate, or rollback.
  4. Persist artifacts and publish operational metrics.
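A compact sketch of how these stages could compose (stage boundaries and names are illustrative, not prescribed):

```typescript
type Status = "ALLOW" | "DENY" | "ESCALATE";

interface TraceCtx { traceId: string; tool: string; args: Record<string, unknown>; }

// 1) Normalize input and attach deterministic trace metadata.
function normalize(raw: { tool?: string; arguments?: Record<string, unknown> }, seq: number): TraceCtx | null {
  if (!raw.tool) return null;
  return { traceId: `trc_p13_${String(seq).padStart(3, "0")}`, tool: raw.tool, args: raw.arguments ?? {} };
}

// 2) Contract validation: fields required for policy evaluation must be present.
function validate(ctx: TraceCtx, required: string[]): boolean {
  return required.every((f) => f in ctx.args);
}

// 3) Policy gate (default deny) producing 4) a persist-ready decision record.
function decide(ctx: TraceCtx, valid: boolean): { trace_id: string; status: Status; reason: string } {
  if (!valid) return { trace_id: ctx.traceId, status: "DENY", reason: "MISSING_POLICY_FIELDS" };
  const readOnly = ctx.tool.endsWith("_lookup"); // toy risk heuristic for the sketch
  return readOnly
    ? { trace_id: ctx.traceId, status: "ALLOW", reason: "LOW_RISK_READ_ONLY" }
    : { trace_id: ctx.traceId, status: "ESCALATE", reason: "REQUIRES_REVIEW" };
}
```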

Complexity Analysis (conceptual):

  • Time: O(n) over fixture/request items in a batch run.
  • Space: O(n) for traces and report artifacts.

5. Implementation Guide

5.1 Development Environment Setup

# 1) Install dependencies
# 2) Prepare fixtures under fixtures/
# 3) Run the project command(s) listed in section 3.7

5.2 Project Structure

p13/
├── src/
├── fixtures/
├── policies/
├── out/
└── README.md

5.3 The Core Question You’re Answering

“How do I prevent unsafe side effects when models request privileged tool actions?”

This question matters because it forces the project to produce objective evidence instead of relying on subjective prompt impressions.

5.4 Concepts You Must Understand First

  1. Capability-based authorization
    • Why does this concept matter for P13?
    • Book Reference: NIST Zero Trust Architecture (SP 800-207)
  2. Risk-tiered approval workflows
    • Why does this concept matter for P13?
    • Book Reference: Operational governance patterns
  3. Policy decision logging
    • Why does this concept matter for P13?
    • Book Reference: Auditability and compliance controls

5.5 Questions to Guide Your Design

  1. Boundary and contracts
    • What is the smallest safe contract surface for tool permission firewall?
    • Which failure reasons must be explicit and machine-readable?
  2. Runtime policy
    • What is allowed automatically, what needs retry, and what must escalate?
    • Which policy checks must happen before any side effect?
  3. Evidence and observability
    • What traces/metrics are required for fast incident triage?
    • What specific thresholds trigger rollback or human review?

5.6 Thinking Exercise

Pre-Mortem for Tool Permission Firewall

Before implementing, write down 10 ways this project can fail in production. Classify each failure into: contract, policy, security, or operations.

Questions to answer:

  • Which failures can be prevented before runtime?
  • Which failures require runtime detection and escalation?

5.7 The Interview Questions They’ll Ask

  1. “How does capability-based access differ from role-based access here?”
  2. “What actions should always require human approval?”
  3. “How do you design an auditable policy decision record?”
  4. “What is your strategy for policy versioning?”
  5. “How do you prevent approval fatigue?”

5.8 Hints in Layers

Hint 1: Model proposals are untrusted Treat tool calls as requests, not commands.

Hint 2: Default deny Unknown tools or contexts should never auto-allow.

Hint 3: Explain decisions Policy rule ids and rationale must be visible to reviewers.

Hint 4: Measure queue health Track pending age and reviewer throughput.

5.9 Books That Will Help

| Topic | Book | Chapter |
|-------|------|---------|
| Zero trust model | NIST SP 800-207 Zero Trust Architecture | Core sections |
| Security engineering | “Security Engineering” by Ross Anderson | Authorization chapters |
| Operational reliability | “Site Reliability Engineering” by Google | Operational response chapters |

5.10 Implementation Phases

Phase 1: Foundation

  • Define contracts, policy profiles, and deterministic fixtures.
  • Build the core execution path and baseline artifact output.
  • Checkpoint: One golden-path scenario runs end-to-end with trace id and artifact.

Phase 2: Core Functionality

  • Add project-specific evaluation/routing/verification logic.
  • Add error paths with unified reason codes.
  • Checkpoint: Golden-path and one failure-path both behave deterministically.

Phase 3: Operational Hardening

  • Add metrics, trend reporting, and release/rollback or escalation gates.
  • Document runbook and incident/debug flow.
  • Checkpoint: Team member can reproduce output from clean checkout.

5.11 Key Implementation Decisions

| Decision | Options | Recommendation | Rationale |
|----------|---------|----------------|-----------|
| Validation order | Late checks vs early checks | Early checks | Fail-fast saves cost and reduces unsafe execution |
| Failure handling | Silent retries vs explicit reason codes | Explicit reason codes | Enables automation and faster debugging |
| Rollout/escalation | Manual-only vs policy-driven | Policy-driven with manual override | Balances speed and safety |

6. Testing Strategy

6.1 Test Categories

| Category | Purpose | Examples |
|----------|---------|----------|
| Unit Tests | Validate deterministic building blocks | schema checks, policy gates, parser behaviors |
| Integration Tests | Verify end-to-end project path | golden-path command/API flow |
| Edge Case Tests | Ensure robust failure handling | malformed fixture, blocked policy action |

6.2 Critical Test Cases

  1. Golden path succeeds and emits expected artifact shape.
  2. High-risk/invalid path returns deterministic error with reason code.
  3. Replay with same seed/config yields same decision summary.
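Test case 3 (replay determinism) can be asserted against any pure decision function; the stand-in engine below is illustrative only:

```typescript
// Replay determinism: the same fixtures must yield an identical decision summary.
function evaluateFixture(fixture: { tool: string; amount?: number }): string {
  // Stand-in for the real policy engine: pure, no clock or randomness.
  if (fixture.tool === "crm_lookup") return "ALLOW:LOW_RISK_READ_ONLY";
  if (fixture.tool === "bank_transfer" && (fixture.amount ?? 0) > 500) return "ESCALATE:R03";
  return "DENY:DEFAULT_DENY";
}

function replayCheck(fixtures: { tool: string; amount?: number }[]): boolean {
  const run = () => fixtures.map(evaluateFixture).join("\n");
  return run() === run(); // two replays must produce byte-identical summaries
}
```

If the real engine consults wall-clock time or external state, the replay test should inject those as fixed fixture inputs rather than reading them live.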

6.3 Test Data

fixtures/golden_case.*
fixtures/failure_case.*
fixtures/edge_cases/*

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

| Pitfall | Symptom | Solution |
|---------|---------|----------|
| “High-risk action executed automatically” | Default policy fallback is allow. | Switch to explicit deny-by-default. |
| “Reviewers cannot understand why action was blocked” | Decision lacks rule explanation. | Attach policy rationale and matching condition in UI. |
| “Queue backlog grows” | Escalation rules are too broad. | Tune risk thresholds and auto-allow low-risk read-only calls. |

7.2 Debugging Strategies

  • Re-run deterministic fixtures with fixed seed and compare trace ids.
  • Diff latest artifacts against last known-good baseline.
  • Isolate whether failure is contract, policy, or runtime dependency related.

7.3 Performance Traps

  • Unbounded retries inflate latency and cost.
  • Overly broad logging can slow hot paths.
  • Missing cache/canonicalization can create avoidable compute churn.

8. Extensions & Challenges

8.1 Beginner Extensions

  • Add one new fixture category and expected outcome labels.
  • Add one new reason code with deterministic validation.

8.2 Intermediate Extensions

  • Add dashboard-ready trend exports.
  • Add automated regression diff against previous run artifacts.

8.3 Advanced Extensions

  • Integrate with rollout gates or human approval workflows.
  • Add chaos-style fault injection and recovery assertions.

9. Real-World Connections

9.1 Industry Applications

  • PromptOps platform teams operating AI features under compliance constraints.
  • Internal AI governance tooling for release safety and incident response.
  • LangChain/LangSmith style eval and tracing workflows.
  • OpenTelemetry-based observability stacks for decision traces.

9.2 Interview Relevance

  • Demonstrates ability to convert probabilistic model behavior into deterministic software guarantees.
  • Shows practical production-thinking: contracts, policies, monitoring, and operational controls.

10. Resources

10.1 Essential Reading

  • OpenAI/Anthropic/Google provider docs for structured outputs, tool calling, and prompt controls.
  • OWASP LLM Top 10 and NIST AI RMF guidance for safety and governance.

10.2 Video Resources

  • Talks on LLM eval systems, PromptOps, and AI safety operations.

10.3 Tools & Documentation

  • JSON schema validators, policy engines, and tracing infrastructure docs.
  • Previous projects: build specialized primitives.
  • Next projects: integrate these primitives into broader operational systems.

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain the core risk boundaries and policy gates for this project.
  • I can explain the artifact format and why each field exists.
  • I can justify the release/escalation criteria.

11.2 Implementation

  • Golden-path and failure-path flows both work.
  • Deterministic artifacts are produced and reproducible.
  • Observability fields are present for debugging and audits.

11.3 Growth

  • I can describe one tradeoff I made and why.
  • I can explain this project design in an interview setting.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Golden path works with deterministic output artifact.
  • At least one failure-path scenario returns unified error shape/reason code.
  • Core metrics are emitted and documented.

Full Completion:

  • Includes automated tests, trend reporting, and reproducible runbook.
  • Includes operational thresholds for promote/rollback or escalate/approve.

Excellence (Above & Beyond):

  • Integrates with adjacent projects (registry, rollout, firewall, HITL) cleanly.
  • Demonstrates incident drill replay and fast root-cause workflow.