Project 14: Browser and Computer-Use Agent Sandbox

Build a constrained browser-operating agent with policy gates, redaction, and human takeover.

Quick Reference

Attribute	Value
Difficulty	Level 4: Expert
Time Estimate	12-24 hours
Language	Python (alt: TypeScript)
Prerequisites	Projects 2, 6, 9
Key Topics	computer use tools, action safety, UI drift recovery

Learning Objectives

Execute browser tasks through a controlled action layer.
Enforce policy checks before each UI action.
Handle UI drift and ambiguous states robustly.
Support operator takeover and resume.

The Core Question You’re Answering

“How do you safely operationalize an agent that clicks and types in real interfaces?”

Concepts You Must Understand First

Concept	Why It Matters	Where to Learn
Computer-use tools	High-impact action channel	OpenAI tools for agents
Action policy gates	Prevent irreversible mistakes	Guardrails architecture references
UI state verification	Detect drift and stale targets	Browser automation best practices

Theoretical Foundation

Task -> Plan -> Action Proposal -> Policy Gate -> Execute -> Verify -> Repeat

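UI actions require stronger guardrails than read-only API calls: a click or keystroke can trigger side effects that cannot be undone, and the page may have changed between planning and execution.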
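The loop above can be sketched in a few lines. This is a minimal illustration, not a reference design: the `Action` type, the `run_loop` function, and the stubbed `allow`, `execute`, and `verify` callables are all hypothetical names, and no real browser is involved.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Action:
    kind: str    # e.g. "navigate", "click", "type", "submit"
    target: str  # URL or selector the action applies to


def run_loop(
    proposals: Iterable[Action],
    allow: Callable[[Action], bool],
    execute: Callable[[Action], None],
    verify: Callable[[Action], bool],
) -> List[str]:
    """Gate, execute, and verify each proposed action; return a trace."""
    trace: List[str] = []
    for action in proposals:
        if not allow(action):      # policy gate runs before execution
            trace.append(f"blocked:{action.kind}")
            continue
        execute(action)
        if not verify(action):     # stop once the page is not as expected
            trace.append(f"verify_failed:{action.kind}")
            break
        trace.append(f"ok:{action.kind}")
    return trace


# Stub wiring: the executor only records what it was asked to do.
executed: List[Action] = []
demo_trace = run_loop(
    [
        Action("navigate", "https://www.example.com/pricing"),
        Action("submit", "#checkout-form"),
        Action("click", "#tab-annual"),
    ],
    allow=lambda a: a.kind != "submit",
    execute=executed.append,
    verify=lambda a: True,
)
print(demo_trace)
```

In this sketch a blocked action is skipped and a failed verification halts the run. Both behaviors are design choices; a real runner might instead queue blocked actions for approval.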

Project Specification

What You’ll Build

A sandbox runner that:

Opens a controlled browser session
Executes allowed actions only
Logs screenshots and action traces
Requires approval for risky actions

Functional Requirements

Domain allow/deny policy
Action-type policy (click, type, submit)
Post-action verification checks
Human takeover endpoint
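One way to combine the domain policy and the action-type policy is a single evaluator that returns a decision string. The sketch below assumes glob-style domain patterns (matching the `payments.*` and `admin.*` examples in this project) and a hypothetical three-valued result of `allow`, `needs_approval`, or `deny`. The `evaluate` function and the policy tables are illustrative names only.

```python
from fnmatch import fnmatch
from urllib.parse import urlparse

# Assumed policy tables; real values would come from configuration.
BLOCKED_DOMAINS = ["payments.*", "admin.*"]
ACTION_POLICY = {
    "navigate": "allow",
    "click": "allow",
    "type": "allow",
    "submit": "needs_approval",
}


def evaluate(kind: str, url: str) -> str:
    """Return 'allow', 'needs_approval', or 'deny' for one proposed action."""
    host = urlparse(url).hostname or ""
    # Domain deny rules win over any action-type rule.
    if any(fnmatch(host, pattern) for pattern in BLOCKED_DOMAINS):
        return "deny"
    # Action kinds that are not listed are denied by default.
    return ACTION_POLICY.get(kind, "deny")


print(evaluate("navigate", "https://payments.example.com/checkout"))
print(evaluate("submit", "https://www.example.com/contact"))
print(evaluate("click", "https://www.example.com/pricing"))
```

Denying unknown action kinds by default keeps the policy safe when the agent proposes something the table does not cover.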

Non-Functional Requirements

Sensitive data redaction
Deterministic replay mode for one golden scenario
Clear incident trail for each run
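Redaction is easiest to enforce if it happens at the point where trace lines are written. The sketch below is one possible approach under simple assumptions: the two regular expressions are rough illustrative patterns for email addresses and card-like digit runs, not a complete sensitive-data detector, and `redact` and `trace_line` are hypothetical helper names.

```python
import json
import re

# Rough illustrative patterns; a real deployment needs a broader detector.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
CARD = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")  # 13 to 16 digits, optional separators


def redact(text: str) -> str:
    """Replace email addresses and card-like digit runs with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return CARD.sub("[CARD]", text)


def trace_line(step: int, kind: str, detail: str) -> str:
    """Build one NDJSON trace line with the detail field redacted."""
    record = {"step": step, "kind": kind, "detail": redact(detail)}
    # sort_keys gives a stable key order, which keeps replay diffs readable.
    return json.dumps(record, sort_keys=True)


print(trace_line(1, "type", "typed alice@example.com into #email"))
print(redact("card 4111 1111 1111 1111 ok"))
```

The card pattern can over-match other long digit runs such as order numbers. For an incident trail, over-redaction is usually the safer failure mode.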

Real World Outcome

$ python p14_computer_use.py --task "collect pricing tiers"
[sandbox] browser session started
[policy] blocked domains: payments.*, admin.*
[action] navigate -> extract_table -> navigate -> extract_table
[action] blocked submit action (manual approval required)
[artifact] pricing_matrix.csv + action_trace.ndjson

Architecture Overview

Agent Planner
   |
Action Queue -> Policy Evaluator -> Browser Sandbox -> Verifier -> Trace Store

Implementation Guide

Phase 1: Read-Only Extraction

Build read-only extraction flows first, with no form submissions.

Phase 2: Policy + Verification

Gate actions and verify page-state invariants.
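Page-state invariants can be expressed as named predicates over a snapshot of the page. The following sketch assumes a hypothetical `PageState` snapshot (URL, title, and the set of visible selectors) that the sandbox would populate after each action. The invariant names and selectors are made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List


@dataclass(frozen=True)
class PageState:
    url: str
    title: str
    visible_selectors: FrozenSet[str]


Invariant = Callable[[PageState], bool]


def check_invariants(state: PageState, invariants: Dict[str, Invariant]) -> List[str]:
    """Return the names of the invariants that the snapshot violates."""
    return [name for name, check in invariants.items() if not check(state)]


# Hypothetical invariants for a pricing-extraction step.
PRICING_INVARIANTS: Dict[str, Invariant] = {
    "on_expected_page": lambda s: s.url.startswith("https://www.example.com/pricing"),
    "pricing_table_present": lambda s: "table#pricing" in s.visible_selectors,
}

snapshot = PageState(
    url="https://www.example.com/pricing",
    title="Pricing",
    visible_selectors=frozenset({"nav", "footer"}),
)
failures = check_invariants(snapshot, PRICING_INVARIANTS)
print(failures)
```

Returning the names of failed invariants, rather than a single boolean, gives the trace something specific to record when a step fails.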

Phase 3: Human Takeover

Pause on high-risk actions and resume safely.
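Pause and resume can be modeled as a small state machine around the action queue. This is a sketch under stated assumptions: the `SandboxRun` class, the `RunState` enum, and the `HIGH_RISK` set are hypothetical, actions are reduced to their kind strings, and "executing" an action only appends to a log.

```python
from collections import deque
from enum import Enum
from typing import Iterable, List

HIGH_RISK = {"submit"}  # assumption: which action kinds need an operator


class RunState(Enum):
    RUNNING = "running"
    PAUSED = "paused"
    FINISHED = "finished"


class SandboxRun:
    def __init__(self, actions: Iterable[str]) -> None:
        self.queue = deque(actions)
        self.log: List[str] = []
        self.state = RunState.RUNNING if self.queue else RunState.FINISHED

    def step(self) -> None:
        """Execute the next action, or pause if it is high risk."""
        if self.state is not RunState.RUNNING:
            return
        kind = self.queue[0]
        if kind in HIGH_RISK:
            self.state = RunState.PAUSED  # action stays at the head of the queue
            self.log.append(f"paused:{kind}")
            return
        self._execute()

    def resume(self, approved: bool) -> None:
        """Operator decision for the action that caused the pause."""
        if self.state is not RunState.PAUSED:
            raise RuntimeError("resume() called while not paused")
        if approved:
            self._execute()
        else:
            self.log.append(f"rejected:{self.queue.popleft()}")
            self.state = RunState.RUNNING if self.queue else RunState.FINISHED

    def _execute(self) -> None:
        self.log.append(f"executed:{self.queue.popleft()}")
        self.state = RunState.RUNNING if self.queue else RunState.FINISHED


run = SandboxRun(["navigate", "submit", "click"])
run.step()                  # executes navigate
run.step()                  # pauses on submit
run.resume(approved=False)  # operator rejects the submit
run.step()                  # executes click
print(run.log, run.state)
```

A real runner would likely re-verify page state before continuing after `resume`, since the operator may have changed the page while the agent was paused.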

Testing Strategy

UI drift tests (changed selectors)
Deceptive prompt/button tests
Replay tests with redacted traces
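A UI drift test can be written without a browser by resolving targets against a fake page. The sketch below assumes a hypothetical `resolve_target` helper that tries an ordered list of candidate selectors and raises `DriftError` when none match. The selectors and the set-based page model are illustrative only.

```python
from typing import Sequence, Set


class DriftError(Exception):
    """Raised when no candidate locator matches the current page."""


def resolve_target(candidates: Sequence[str], present: Set[str]) -> str:
    """Return the first candidate selector that exists on the page."""
    for selector in candidates:
        if selector in present:
            return selector
    raise DriftError(f"none of {list(candidates)} found on page")


# Primary selector first, then a fallback that survives an id rename.
PRICING_TABLE = ["#pricing-table", "table[data-test='pricing']"]


def test_primary_selector() -> None:
    assert resolve_target(PRICING_TABLE, {"#pricing-table", "nav"}) == "#pricing-table"


def test_fallback_after_drift() -> None:
    drifted_page = {"table[data-test='pricing']", "nav"}
    assert resolve_target(PRICING_TABLE, drifted_page) == "table[data-test='pricing']"


def test_unrecoverable_drift_is_reported() -> None:
    try:
        resolve_target(PRICING_TABLE, {"nav"})
    except DriftError:
        return
    raise AssertionError("expected DriftError")


for test in (test_primary_selector, test_fallback_after_drift, test_unrecoverable_drift_is_reported):
    test()
print("drift tests passed")
```

Raising a dedicated error on unrecoverable drift lets the runner escalate to an operator instead of clicking a stale or wrong target.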

Common Pitfalls & Debugging

Pitfall	Symptom	Fix
Policy only at planning stage	unsafe action executes	enforce policy at execution boundary
Silent extraction failures	incomplete dataset	add post-action assertions
Stuck in modal loops	repeated retries	max retries + operator escalation
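The "max retries + operator escalation" fix from the table can be sketched as a bounded retry wrapper. The function name `act_with_retries`, the return strings, and the default of three attempts are assumptions for illustration.

```python
from typing import Callable


def act_with_retries(attempt: Callable[[int], bool], max_retries: int = 3) -> str:
    """Call attempt(n) up to max_retries times, then escalate."""
    for n in range(1, max_retries + 1):
        if attempt(n):
            return f"ok_after_{n}"
    # Retries exhausted: hand control to a human instead of looping forever.
    return "escalate_to_operator"


# A modal that never closes exhausts the retries.
print(act_with_retries(lambda n: False))
# A flaky dismiss that succeeds on the second try.
print(act_with_retries(lambda n: n >= 2))
```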

Interview Questions They’ll Ask

What risks are unique to computer-use agents?
How do you design safe action permissions?
How do you recover from UI drift?
What should be logged for compliance?

Hints in Layers

Hint 1: Begin with read-only browsing tasks.
Hint 2: Require verification after every action.
Hint 3: Add per-action confidence thresholds.
Hint 4: Build operator pause/resume into v1.
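Hint 3 can be illustrated with a small lookup of thresholds per action kind. The threshold values below are arbitrary placeholders rather than recommendations, and the sketch assumes the planner attaches a confidence score between 0 and 1 to each proposed action. `decide` is a hypothetical helper.

```python
# Placeholder thresholds: riskier action kinds demand higher confidence.
THRESHOLDS = {"navigate": 0.5, "click": 0.7, "type": 0.8, "submit": 0.95}


def decide(kind: str, confidence: float) -> str:
    """Return 'proceed' when confidence clears the threshold for this kind."""
    threshold = THRESHOLDS.get(kind)
    if threshold is None:
        return "escalate"  # unknown action kinds always go to an operator
    return "proceed" if confidence >= threshold else "escalate"


print(decide("click", 0.9))
print(decide("submit", 0.9))
```

Confidence thresholds complement the policy gate and do not replace it: a high-confidence `submit` should still pass through approval.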

Submission / Completion Criteria

Minimum Completion

Safe read-only extraction flow with full trace

Full Completion

Policy-gated write actions + human takeover

Excellence

Robust drift handling with minimal false positives