Project 2: Policy Decision Engine (Building a PDP)
Project Overview
What you're building: A standalone "Brain" server that functions as the Policy Decision Point (PDP) in a Zero Trust Architecture. It receives authorization requests (e.g., "Can User A perform Action B on Resource C?") and returns "Allow" or "Deny" decisions based on dynamic rules stored in a database or JSON file. This server doesn't handle actual traffic - it only makes decisions.
Why it matters: In traditional security, authorization logic is scattered across applications - each service implements its own access control. This creates inconsistency, makes auditing nearly impossible, and means changing a policy requires updating multiple codebases. A centralized PDP solves this by becoming the single source of truth for all authorization decisions.
Real-world applications:
- Enterprise authorization systems (like Google's Zanzibar or Airbnb's Himeji)
- Cloud IAM policy evaluation (AWS IAM, Google Cloud IAP)
- API gateway authorization layers
- Microservices access control in service meshes
- Zero Trust Network Access (ZTNA) decision engines
+------------------------------------------------------------------+
|                   Policy Decision Point (PDP)                    |
+------------------------------------------------------------------+
|                                                                  |
|   PEP (Proxy/Gateway)                  PDP (This Project)        |
|            |                                    |                |
|            |  Authorization Request:            |                |
|            |  "Can Alice push to                |                |
|            |   kernel-repo at 3AM?"             |                |
|            |----------------------------------->|                |
|            |                                    |                |
|            |                              +-----+-----+          |
|            |                              | Evaluate  |          |
|            |                              | Policies  |          |
|            |                              +-----+-----+          |
|            |                                    |                |
|            |  Decision Response:                |                |
|            |  { "decision": "DENY",             |                |
|            |    "reason": "Outside              |                |
|            |               allowed hours" }     |                |
|            |<-----------------------------------|                |
|            |                                    |                |
|            v                                                     |
|   +------------------+                                           |
|   | Enforce Decision |                                           |
|   |  Block Request   |                                           |
|   +------------------+                                           |
|                                                                  |
+------------------------------------------------------------------+
Learning Objectives
By completing this project, you will be able to:
- Explain the difference between RBAC, ABAC, and ReBAC and when to use each access control model
- Design and implement a policy evaluation engine that supports complex, context-aware rules
- Build a high-performance authorization service capable of sub-5ms response times
- Implement policy hot-reloading without service restart or dropped connections
- Create audit logs that capture the complete decision trail for compliance requirements
- Handle policy conflicts using precedence rules and explicit conflict resolution strategies
- Integrate data enrichment from external sources (device health, threat intelligence) into decisions
Prerequisites
Required knowledge:
- Proficiency in Go or Rust (or Python for prototyping)
- Understanding of JSON and REST APIs
- Basic knowledge of boolean logic and conditional expressions
- Familiarity with HTTP servers and request handling
Helpful but not required:
- Experience with Open Policy Agent (OPA) or similar policy engines
- Understanding of database query optimization
- Knowledge of caching strategies
System requirements:
- Linux (preferred) or macOS
- Go 1.21+ or Rust 1.70+
- curl for testing
- Redis (optional, for caching)
Deep Theoretical Foundation
Access Control Models: RBAC vs ABAC vs ReBAC
Before building a policy engine, you must understand the three fundamental access control paradigms:
+------------------------------------------------------------------+
| Access Control Model Comparison |
+------------------------------------------------------------------+
RBAC (Role-Based Access Control)
================================
Simple, coarse-grained. Users have Roles, Roles have Permissions.
User: Alice
|
+---> Role: Developer
|
+---> Permissions: [read:code, write:code, read:docs]
Decision Logic:
IF user.role == "developer" THEN allow(read:code)
Pros: Simple to understand, easy to audit
Cons: Role explosion, lacks context awareness
+------------------------------------------------------------------+
ABAC (Attribute-Based Access Control)
=====================================
Fine-grained. Decisions based on attributes of Subject, Resource,
Action, and Environment.
Subject Attributes:     Resource Attributes:
+------------------+    +------------------+
| user: alice      |    | type: repository |
| role: developer  |    | owner: kernel    |
| clearance: L2    |    | sensitivity: high|
| device: secure   |    | classification: C|
+------------------+    +------------------+
Environment Attributes: Action Attributes:
+------------------+    +------------------+
| time: 14:00      |    | operation: push  |
| location: NYC    |    | scope: main      |
| ip: 10.0.1.5     |    | method: POST     |
+------------------+    +------------------+
Decision Logic:
IF subject.role == "developer"
AND resource.sensitivity <= subject.clearance
AND environment.time BETWEEN 08:00 AND 20:00
AND subject.device == "secure"
THEN allow(action)
Pros: Highly flexible, context-aware, fine-grained
Cons: Complex to manage, harder to audit
+------------------------------------------------------------------+
ReBAC (Relationship-Based Access Control)
=========================================
Graph-based. Access determined by relationships between entities.
Used by Google Zanzibar, Airbnb Himeji, Ory Keto.
alice --[member]--> team-kernel --[owner]--> kernel-repo
                                                  ^
bob --[viewer]------------------------------------+
Decision Logic:
Can(alice, push, kernel-repo)?
-> alice is member of team-kernel
-> team-kernel is owner of kernel-repo
-> owners can push
-> ALLOW
Pros: Natural for hierarchies, handles delegation well
Cons: Graph traversal complexity, eventual consistency issues
+------------------------------------------------------------------+
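To make the three models concrete, here is a minimal Python prototype of each decision rule above. The sample data, the `LEVELS` table that puts clearance and sensitivity on one ordered scale, and the `GRANTS` mapping from relations to actions are illustrative assumptions, not part of any real system's API.

```python
from collections import deque

# --- RBAC: users -> roles -> permissions (illustrative data) ---
USER_ROLES = {"alice": {"developer"}}
ROLE_PERMS = {"developer": {"read:code", "write:code", "read:docs"}}

def rbac_allows(user: str, permission: str) -> bool:
    """Allow if any of the user's roles grants the permission."""
    return any(permission in ROLE_PERMS.get(role, set())
               for role in USER_ROLES.get(user, set()))

# --- ABAC: one rule over subject, resource, and environment attributes ---
# Assumption: clearance and sensitivity share one ordered scale.
LEVELS = {"L1": 1, "L2": 2, "L3": 3}

def abac_allows(subject: dict, resource: dict, env: dict) -> bool:
    return (subject["role"] == "developer"
            and LEVELS[resource["sensitivity"]] <= LEVELS[subject["clearance"]]
            and 8 <= env["hour"] < 20
            and subject["device"] == "secure")

# --- ReBAC: relationship tuples (subject, relation, object) ---
TUPLES = {
    ("alice", "member", "team-kernel"),
    ("team-kernel", "owner", "kernel-repo"),
    ("bob", "viewer", "kernel-repo"),
}
# Assumption: which relation on an object grants which actions.
GRANTS = {"owner": {"push", "read"}, "viewer": {"read"}}

def rebac_allows(user: str, action: str, obj: str) -> bool:
    """Breadth-first walk over 'member' edges, then look for a granting relation."""
    seen, queue = {user}, deque([user])
    while queue:
        entity = queue.popleft()
        for subj, rel, target in TUPLES:
            if subj != entity:
                continue
            if target == obj and action in GRANTS.get(rel, set()):
                return True
            if rel == "member" and target not in seen:
                seen.add(target)
                queue.append(target)
    return False
```

`rebac_allows("alice", "push", "kernel-repo")` follows alice -> team-kernel -> kernel-repo and succeeds through the `owner` relation, while bob's `viewer` edge grants only `read`. Zanzibar-style systems index the tuples rather than scanning them as this sketch does.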
The Policy Decision Point in Zero Trust Architecture
The PDP sits at the heart of the Zero Trust Control Plane:
+------------------------------------------------------------------+
|              Zero Trust Control Plane Architecture               |
+------------------------------------------------------------------+
|                                                                  |
|  CONTROL PLANE                                                   |
|  +------------------------------------------------------------+  |
|  |                                                            |  |
|  |  +----------------+        +---------------------------+   |  |
|  |  |     Policy     |        |    Policy Information     |   |  |
|  |  |  Administrator |        |       Points (PIPs)       |   |  |
|  |  |      (PA)      |        | +--------+   +--------+   |   |  |
|  |  +-------+--------+        | |Identity|   |Device  |   |   |  |
|  |          |                 | |Store   |   |Health  |   |   |  |
|  |          v                 | +--------+   +--------+   |   |  |
|  |  +----------------+        | +--------+   +--------+   |   |  |
|  |  |     Policy     |<-------| |Threat  |   |Time/   |   |   |  |
|  |  |    Decision    |        | |Intel   |   |Location|   |   |  |
|  |  |      Point     |        | +--------+   +--------+   |   |  |
|  |  |      (PDP)     |        +---------------------------+   |  |
|  |  +-------+--------+                                        |  |
|  |          |                                                 |  |
|  +----------|-------------------------------------------------+  |
|             | Decision (Allow/Deny + Reason)                     |
|             v                                                    |
|  +------------------------------------------------------------+  |
|  |  DATA PLANE                                                |  |
|  |                                                            |  |
|  |  [Subject] --> [PEP/Gateway] --> [Resource]                |  |
|  |                      ^                                     |  |
|  |                      |                                     |  |
|  |              Enforces Decision                             |  |
|  +------------------------------------------------------------+  |
|                                                                  |
+------------------------------------------------------------------+
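The PDP can only decide on what the PIPs tell it. Here is a minimal enrichment step, assuming each PIP is a plain function that returns extra subject attributes. `identity_pip` and `device_pip` are hypothetical stand-ins with hard-coded data, and the rule that PIP values override caller-supplied ones is a design assumption.

```python
from typing import Callable, List, Tuple

# Hypothetical PIPs with hard-coded data; real ones would call an identity
# store, an MDM/device-health API, or a threat-intel feed.
def identity_pip(req: dict) -> dict:
    groups = {"alice@example.com": ["team-kernel"]}
    return {"groups": groups.get(req["subject"]["id"], [])}

def device_pip(req: dict) -> dict:
    health = {"MBP-42": "secure"}
    device_id = req.get("environment", {}).get("device_id")
    return {"device_health": health.get(device_id, "unknown")}

# Each PIP is paired with the attributes to use if it fails.
PIPS: List[Tuple[Callable[[dict], dict], dict]] = [
    (identity_pip, {"groups": []}),
    (device_pip, {"device_health": "unknown"}),
]

def enrich(req: dict) -> dict:
    """Return a copy of the request with PIP attributes merged into the subject.

    PIP values overwrite caller-supplied ones, so a PEP cannot vouch for its
    own device. A failing PIP contributes its fallback values ('unknown'),
    which policies should treat as untrusted (fail closed)."""
    subject = dict(req["subject"])
    for pip, fallback in PIPS:
        try:
            attrs = pip(req)
        except Exception:
            attrs = fallback
        subject.update(attrs)
    return {**req, "subject": subject}
```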
Policy Languages and Domain-Specific Languages (DSLs)
Production policy engines use specialized languages for expressing rules:
+------------------------------------------------------------------+
| Policy Language Comparison |
+------------------------------------------------------------------+
OPA Rego (Open Policy Agent)
============================
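Logic-based, Datalog-inspired language. Widely adopted; OPA is a graduated CNCF project.
# Rego v1 syntax (OPA 1.0+ requires the `if` keyword and `:=` for defaults)
package authz
default allow := false
allow if {
    input.subject.role == "admin"
}
allow if {
    input.subject.role == "developer"
    input.resource.type == "code"
    is_business_hours
}
# time.clock returns [hour, minute, second]; a bare nanosecond timestamp is read in UTC
is_business_hours if {
    now := time.now_ns()
    hour := time.clock(now)[0]
    hour >= 8
    hour < 20
}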
+------------------------------------------------------------------+
Cedar (AWS Verified Permissions)
================================
Type-safe, analyzable policy language from AWS.
permit (
    principal in Group::"developers",
    action == Action::"push",
    resource in Repository::"kernel"
) when {
    context.time.hour >= 8 &&
    context.time.hour < 20 &&
    principal.device_health == "secure"
};
+------------------------------------------------------------------+
JSON-Based Rules (What you'll build)
====================================
Simpler format for learning - you'll implement an evaluator for this.
{
  "id": "dev-push-hours",
  "effect": "allow",
  "subjects": {
    "roles": ["developer", "admin"]
  },
  "actions": ["push", "commit"],
  "resources": {
    "types": ["repository"]
  },
  "conditions": {
    "time_range": {"start": "08:00", "end": "20:00"},
    "device_health": ["secure"]
  }
}
+------------------------------------------------------------------+
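A first evaluator for a rule like the one above fits in a few lines. This sketch assumes the field names from the Policy Rule Format Specification later in this project (`types`, and `device_health` as a list). It treats a missing or empty constraint as "match anything" and handles only same-day time windows (start earlier than end), compared in UTC.

```python
from datetime import datetime, timezone

RULE = {
    "id": "dev-push-hours",
    "effect": "allow",
    "subjects": {"roles": ["developer", "admin"]},
    "actions": ["push", "commit"],
    "resources": {"types": ["repository"]},
    "conditions": {
        "time_range": {"start": "08:00", "end": "20:00"},
        "device_health": ["secure"],
    },
}

def _allowed(constraint, value) -> bool:
    """Missing/empty constraint matches anything; '*' is a wildcard."""
    return not constraint or "*" in constraint or value in constraint

def rule_matches(rule: dict, req: dict) -> bool:
    subj, res = req["subject"], req["resource"]
    subjects = rule.get("subjects", {})
    conds = rule.get("conditions", {})
    # Subject must hold at least one of the listed roles.
    if subjects.get("roles") and not set(subjects["roles"]) & set(subj.get("roles", [])):
        return False
    if not _allowed(rule.get("actions"), req["action"]):
        return False
    if not _allowed(rule.get("resources", {}).get("types"), res.get("type")):
        return False
    # A missing device_health is not in the list, so the rule does not match.
    if "device_health" in conds and subj.get("device_health") not in conds["device_health"]:
        return False
    if "time_range" in conds:
        # Same-day window only, compared in UTC as zero-padded "HH:MM" strings.
        ts = datetime.fromisoformat(req["environment"]["timestamp"].replace("Z", "+00:00"))
        hhmm = ts.astimezone(timezone.utc).strftime("%H:%M")
        window = conds["time_range"]
        if not window["start"] <= hhmm < window["end"]:
            return False
    return True
```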
Context-Aware Authorization
Context turns static RBAC into adaptive security: the same role can receive different answers depending on device, time, network, and risk:
+------------------------------------------------------------------+
|                 Context Dimensions for Decisions                 |
+------------------------------------------------------------------+
|                                                                  |
|  WHO (Subject Context)               WHAT (Resource Context)     |
|  +------------------------+          +------------------------+  |
|  | Identity: alice@corp   |          | ID: kernel-repo        |  |
|  | Roles: [developer, sre]|          | Type: repository       |  |
|  | Groups: [team-kernel]  |          | Owner: linux-foundation|  |
|  | Clearance: L3          |          | Sensitivity: critical  |  |
|  | MFA: verified          |          | Classification: public |  |
|  | Session Age: 2h        |          | Data Types: [code]     |  |
|  +------------------------+          +------------------------+  |
|                                                                  |
|  HOW (Action Context)                WHERE/WHEN (Environment)    |
|  +------------------------+          +------------------------+  |
|  | Operation: git_push    |          | Time: 2024-12-26T14:00 |  |
|  | Method: POST           |          | Day: Thursday          |  |
|  | Scope: refs/heads/main |          | Location: NYC office   |  |
|  | Commit Count: 3        |          | IP: 10.0.1.45          |  |
|  | Files Changed: 12      |          | Network: corporate     |  |
|  +------------------------+          | Device ID: MBP-42      |  |
|                                      | Device Health: secure  |  |
|  WHY (Risk Context)                  | Risk Score: 0.15       |  |
|  +------------------------+          +------------------------+  |
|  | Threat Level: low      |                                      |
|  | Recent Failures: 0     |                                      |
|  | Anomaly Score: 0.2     |                                      |
|  | Active Incidents: 0    |                                      |
|  +------------------------+                                      |
|                                                                  |
|  DECISION MATRIX:                                                |
|  +------------------------------------------------------------+  |
|  | Rule: "Critical repos require device health + work hours"  |  |
|  |                                                            |  |
|  |   subject.roles CONTAINS "developer"    -> TRUE            |  |
|  |   resource.sensitivity == "critical"    -> TRUE            |  |
|  |   environment.device_health == "secure" -> TRUE            |  |
|  |   environment.time IN [08:00, 20:00]    -> TRUE            |  |
|  |   environment.risk_score < 0.5          -> TRUE            |  |
|  |                                                            |  |
|  |   ALL CONDITIONS MET -> ALLOW                              |  |
|  +------------------------------------------------------------+  |
|                                                                  |
+------------------------------------------------------------------+
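The decision matrix doubles as an audit trail if the engine records every condition's result instead of stopping at the first failure. This sketch assumes the conditions are expressed as labelled Python predicates, a hypothetical representation; a real engine would build them from the policy file.

```python
from typing import Callable, List, Tuple

Check = Tuple[str, Callable[[dict], bool]]

# The five conditions from the decision matrix, as (label, predicate) pairs.
CHECKS: List[Check] = [
    ('subject.roles CONTAINS "developer"', lambda c: "developer" in c["subject"]["roles"]),
    ('resource.sensitivity == "critical"', lambda c: c["resource"]["sensitivity"] == "critical"),
    ('environment.device_health == "secure"', lambda c: c["environment"]["device_health"] == "secure"),
    ("environment.time IN [08:00, 20:00]", lambda c: "08:00" <= c["environment"]["time"] <= "20:00"),
    ("environment.risk_score < 0.5", lambda c: c["environment"]["risk_score"] < 0.5),
]

def _safe(pred: Callable[[dict], bool], ctx: dict) -> bool:
    try:
        return bool(pred(ctx))
    except (KeyError, TypeError):
        return False  # missing attribute => condition not met (fail closed)

def evaluate_with_trace(ctx: dict, checks: List[Check] = CHECKS) -> Tuple[str, List[Tuple[str, bool]]]:
    """Evaluate every check (no short-circuit) so the log shows all results."""
    trace = [(label, _safe(pred, ctx)) for label, pred in checks]
    decision = "ALLOW" if all(ok for _, ok in trace) else "DENY"
    return decision, trace
```

Printing `trace` line by line reproduces the matrix above. If `risk_score` is 0.7, the last row becomes False and the decision becomes DENY, and the log shows exactly which condition failed.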
Policy Conflict Resolution
When multiple policies apply, you need clear resolution rules:
+------------------------------------------------------------------+
| Policy Conflict Resolution |
+------------------------------------------------------------------+
| |
| STRATEGY 1: Deny Overrides (Most Secure) |
| ========================================= |
| If ANY policy says DENY, the final decision is DENY. |
| Used by: AWS IAM, most enterprise systems |
| |
| Policy A: ALLOW (developer can read) |
| Policy B: DENY (no access after 10PM) |
| Final: DENY |
| |
| +----------------------------------------------------------+ |
| | for policy in matching_policies: | |
| |     if policy.effect == DENY: | |
| |         return DENY, policy.reason | |
| | any_allow = any(p.effect == ALLOW for p in matching_policies) | |
| | return ALLOW if any_allow else DENY | |
| +----------------------------------------------------------+ |
| |
| STRATEGY 2: First Match (Order Matters) |
| ======================================== |
| First policy that matches determines the outcome. |
| Used by: Firewall rules, some RBAC systems |
| |
| Policy 1: DENY contractors after hours |
| Policy 2: ALLOW developers to push |
| Request: Contractor pushing at 3AM |
| Final: DENY (matched Policy 1 first) |
| |
| STRATEGY 3: Most Specific Wins |
| ============================== |
| Policy with most specific match takes precedence. |
| Used by: URL routing, some ABAC systems |
| |
| Policy A: General "developers can read repos" |
| Policy B: Specific "Alice cannot read kernel-repo" |
| Final: DENY for Alice on kernel-repo (more specific) |
| |
| STRATEGY 4: Priority/Weight Based |
| ================================= |
| Each policy has an explicit priority number. |
| |
| Policy A: priority=100, ALLOW developers |
| Policy B: priority=200, DENY after hours |
| Final: Higher priority (200) wins -> DENY |
| |
+------------------------------------------------------------------+
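Each strategy is a small function over the policies that already matched a request. In this sketch, policies are plain dicts. The `specificity` field (for example, the number of constrained fields) is a hypothetical score the engine would compute. Breaking ties in favour of DENY is an assumption rather than part of the strategies above. An empty match list always yields DENY (default deny).

```python
from typing import List, Optional, Tuple

Decision = Tuple[str, Optional[str]]  # ("ALLOW" | "DENY", id of the deciding policy)

def deny_overrides(matched: List[dict]) -> Decision:
    for p in matched:
        if p["effect"] == "deny":
            return "DENY", p["id"]
    for p in matched:
        if p["effect"] == "allow":
            return "ALLOW", p["id"]
    return "DENY", None  # nothing matched: default deny

def first_match(matched: List[dict]) -> Decision:
    """`matched` must be in policy-file order."""
    if not matched:
        return "DENY", None
    return matched[0]["effect"].upper(), matched[0]["id"]

def _winner(matched: List[dict], field: str) -> Decision:
    if not matched:
        return "DENY", None
    # Highest score wins; on a tie, a deny (True) ranks above an allow (False).
    best = max(matched, key=lambda p: (p[field], p["effect"] == "deny"))
    return best["effect"].upper(), best["id"]

def most_specific(matched: List[dict]) -> Decision:
    return _winner(matched, "specificity")

def priority_based(matched: List[dict]) -> Decision:
    return _winner(matched, "priority")
```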
Fail-Open vs Fail-Closed Decisions
Critical design choice when the PDP cannot make a decision:
+------------------------------------------------------------------+
| Failure Mode Decision Matrix |
+------------------------------------------------------------------+
| |
| FAIL-CLOSED (Deny by Default) |
| ============================= |
| If PDP is unavailable or uncertain, DENY the request. |
| |
| Pros: Cons: |
| - More secure - Availability impact |
| - No unauthorized access - User frustration |
| - Meets compliance requirements - Business disruption |
| |
| When to use: |
| - Financial systems |
| - Healthcare data |
| - Critical infrastructure |
| - Government/classified systems |
| |
| +----------------------------------------------------------+ |
| | def decide(request) -> Decision: | |
| |     try: | |
| |         return evaluate_policies(request) | |
| |     except (Timeout, Error): | |
| |         log.error("PDP failure, denying request") | |
| |         return DENY("System unavailable") | |
| +----------------------------------------------------------+ |
| |
+------------------------------------------------------------------+
| |
| FAIL-OPEN (Allow by Default) |
| ============================= |
| If PDP is unavailable, ALLOW the request. |
| |
| Pros: Cons: |
| - Better availability - Security risk |
| - No business disruption - Compliance violations |
| - User experience preserved - Audit gaps |
| |
| When to use (with extreme caution): |
| - Read-only public data |
| - Non-sensitive operations |
| - When availability > confidentiality |
| |
| NEVER use for: |
| - Write operations |
| - Sensitive data access |
| - Administrative actions |
| |
+------------------------------------------------------------------+
| |
| HYBRID: Cached Fallback |
| ======================== |
| Use cached decisions during PDP outages. |
| |
| +----------------------------------------------------------+ |
| | def decide(request) -> Decision: | |
| |     cache_key = hash((request.subject, request.action, | |
| |                       request.resource)) | |
| |     try: | |
| |         decision = evaluate_policies(request) | |
| |         cache.set(cache_key, decision, ttl=300)  # seconds | |
| |         return decision | |
| |     except Timeout: | |
| |         if (cached := cache.get(cache_key)) is not None: | |
| |             log.warning("Using cached decision") | |
| |             return cached | |
| |         return DENY("No cached decision available") | |
| +----------------------------------------------------------+ |
| |
+------------------------------------------------------------------+
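The following is a runnable version of the hybrid pattern that uses a thread pool to put a deadline on evaluation. Assumptions: `evaluate` is whatever function runs your policies, the 5 ms timeout and 300 s TTL defaults are illustrative, and the cache is a process-local dict. Errors other than timeouts fail closed, as in the fail-closed pseudocode.

```python
import concurrent.futures as cf
import time
from typing import Callable, Dict, Tuple

_pool = cf.ThreadPoolExecutor(max_workers=4)
_cache: Dict[Tuple[str, str, str], Tuple[dict, float]] = {}  # key -> (decision, expires_at)

def decide(request: dict, evaluate: Callable[[dict], dict],
           timeout_s: float = 0.005, ttl_s: float = 300.0) -> dict:
    """Evaluate with a deadline; on timeout serve a fresh cached decision, else DENY."""
    key = (request["subject"]["id"], request["action"], request["resource"]["id"])
    future = _pool.submit(evaluate, request)
    try:
        decision = future.result(timeout=timeout_s)
    except cf.TimeoutError:
        # The worker keeps running in the background; only this caller stops waiting.
        cached = _cache.get(key)
        if cached is not None and cached[1] > time.monotonic():
            return {**cached[0], "cached": True}
        return {"decision": "DENY", "reason": "PDP timeout, no cached decision available"}
    except Exception as exc:
        return {"decision": "DENY", "reason": f"PDP error: {exc}"}  # fail closed
    _cache[key] = (decision, time.monotonic() + ttl_s)
    return decision
```

Marking fallback answers with `"cached": True` lets the PEP and the audit log distinguish a live decision from a stale one.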
Caching Strategies for Authorization
Performance is critical - every request in your system calls the PDP:
+------------------------------------------------------------------+
| Authorization Caching Strategies |
+------------------------------------------------------------------+
| |
| LAYER 1: PEP-Side Cache (Sidecar/Local) |
| ======================================== |
| |
| +----------+ +----------+ +----------+ |
| | PEP |--->| Local |--->| PDP | |
| | Gateway | | Cache | | Server | |
| +----------+ +----------+ +----------+ |
| |
| - TTL: 30s - 5min (depends on sensitivity) |
| - Cache Key: hash(subject, action, resource) |
| - Invalidation: on policy change, session end |
| |
| Latency: ~0.1ms (cache hit) vs ~5ms (PDP call) |
| |
+------------------------------------------------------------------+
| |
| LAYER 2: Distributed Cache (Redis/Memcached) |
| ============================================= |
| |
| +--------+ +--------+ +---------+ +--------+ |
| | PEP 1 |--->| | | |<---| PEP 2 | |
| +--------+ | Redis |<---| PDP | +--------+ |
| +--------+ | Cluster| | Cluster | +--------+ |
| | PEP 3 |--->| | | |<---| PEP 4 | |
| +--------+ +--------+ +---------+ +--------+ |
| |
| - Shared cache across all PEPs |
| - TTL: configurable per policy sensitivity |
| - Pub/Sub for instant invalidation |
| |
+------------------------------------------------------------------+
| |
| CACHE INVALIDATION STRATEGIES |
| ============================== |
| |
| 1. Time-Based (TTL): |
| cache.set(key, decision, ttl=300) # 5 minutes |
| |
| 2. Event-Based (Push Invalidation): |
| on_policy_change -> redis.publish("invalidate", policy_id) |
| on_session_end -> redis.delete(session_cache_key) |
| |
| 3. Version-Based: |
| cache_key = f"{subject}:{action}:{resource}:v{policy_ver}" |
| # Old keys expire, new keys created on policy update |
| |
| 4. Hierarchical: |
| cache["org:acme:*"] -> clear all cached Acme decisions |
| cache["org:acme:user:alice"] -> clear Alice's cache only |
| |
+------------------------------------------------------------------+
| |
| CACHE CONSISTENCY vs LATENCY TRADEOFF |
| ====================================== |
| |
| High Sensitivity Resource (PII, Financial): |
| - TTL: 0 (no cache) or 30 seconds max |
| - Immediate invalidation on policy change |
| - Accept higher latency for security |
| |
| Medium Sensitivity Resource (Internal Docs): |
| - TTL: 5 minutes |
| - Eventual consistency acceptable |
| |
| Low Sensitivity Resource (Public API): |
| - TTL: 30 minutes |
| - Stale reads acceptable |
| |
+------------------------------------------------------------------+
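Version-based invalidation and per-sensitivity TTLs combine naturally in one small class. This is a process-local sketch. The `TTL_BY_SENSITIVITY` mapping of the request schema's sensitivity levels onto the tiers above is an assumption, and the injectable `clock` exists only to make expiry testable.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# Assumed mapping of the schema's sensitivity levels onto the tiers above (seconds).
TTL_BY_SENSITIVITY = {"critical": 0, "confidential": 30, "internal": 300, "public": 1800}

class DecisionCache:
    """Process-local decision cache with TTLs and version-based invalidation."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._data: Dict[str, Tuple[str, float]] = {}  # key -> (decision, expires_at)
        self.policy_version = 1

    def _key(self, subject: str, action: str, resource: str) -> str:
        return f"{subject}:{action}:{resource}:v{self.policy_version}"

    def get(self, subject: str, action: str, resource: str) -> Optional[str]:
        entry = self._data.get(self._key(subject, action, resource))
        if entry is None or entry[1] <= self._clock():
            return None
        return entry[0]

    def set(self, subject: str, action: str, resource: str, decision: str, ttl_s: float) -> None:
        if ttl_s <= 0:
            return  # TTL 0 means "never cache" (high-sensitivity resources)
        self._data[self._key(subject, action, resource)] = (decision, self._clock() + ttl_s)

    def bump_version(self) -> None:
        """Call on every policy change: keys for the old version are never read again."""
        self.policy_version += 1

    def purge_expired(self) -> None:
        """Drop expired entries, including old-version keys once their TTL passes."""
        now = self._clock()
        self._data = {k: v for k, v in self._data.items() if v[1] > now}
```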
Complete Project Specification
Request/Response JSON Schemas
Authorization Request Schema
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "AuthorizationRequest",
"type": "object",
"required": ["subject", "action", "resource"],
"properties": {
"request_id": {
"type": "string",
"description": "Unique identifier for this request (for tracing)"
},
"subject": {
"type": "object",
"description": "The entity requesting access",
"required": ["id"],
"properties": {
"id": { "type": "string" },
"type": { "type": "string", "enum": ["user", "service", "device"] },
"roles": { "type": "array", "items": { "type": "string" } },
"groups": { "type": "array", "items": { "type": "string" } },
"attributes": { "type": "object" },
"device_health": {
"type": "string",
"enum": ["secure", "at_risk", "compromised", "unknown"]
},
"mfa_verified": { "type": "boolean" },
"session_age_seconds": { "type": "integer" }
}
},
"action": {
"type": "string",
"description": "The operation being requested"
},
"resource": {
"type": "object",
"description": "The target of the action",
"required": ["id"],
"properties": {
"id": { "type": "string" },
"type": { "type": "string" },
"owner": { "type": "string" },
"sensitivity": {
"type": "string",
"enum": ["public", "internal", "confidential", "critical"]
},
"attributes": { "type": "object" }
}
},
"environment": {
"type": "object",
"description": "Contextual information about the request",
"properties": {
"timestamp": { "type": "string", "format": "date-time" },
"ip_address": { "type": "string" },
"location": { "type": "string" },
"network_type": {
"type": "string",
"enum": ["corporate", "vpn", "public", "unknown"]
},
"user_agent": { "type": "string" }
}
}
}
}
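Rejecting malformed requests before evaluation keeps missing fields from silently turning into "no match". This hand-written sketch checks only a subset of the schema above: required IDs, the `action` string, `mfa_verified` as a boolean, and the enums. In production you might load the schema itself and validate with a library such as the third-party `jsonschema` package (`jsonschema.validate`).

```python
from typing import List

SUBJECT_TYPES = {"user", "service", "device"}
DEVICE_HEALTH = {"secure", "at_risk", "compromised", "unknown"}
SENSITIVITY = {"public", "internal", "confidential", "critical"}
NETWORK_TYPES = {"corporate", "vpn", "public", "unknown"}

def _obj(value) -> dict:
    return value if isinstance(value, dict) else {}

def validate_request(req) -> List[str]:
    """Return schema violations for an AuthorizationRequest (empty list = valid)."""
    if not isinstance(req, dict):
        return ["request must be a JSON object"]
    errors = []
    subject, resource = _obj(req.get("subject")), _obj(req.get("resource"))
    env = _obj(req.get("environment"))
    if not isinstance(subject.get("id"), str):
        errors.append("subject.id is required and must be a string")
    if not isinstance(req.get("action"), str):
        errors.append("action is required and must be a string")
    if not isinstance(resource.get("id"), str):
        errors.append("resource.id is required and must be a string")
    if "mfa_verified" in subject and not isinstance(subject["mfa_verified"], bool):
        errors.append("subject.mfa_verified must be a boolean")
    for path, value, allowed in [
        ("subject.type", subject.get("type"), SUBJECT_TYPES),
        ("subject.device_health", subject.get("device_health"), DEVICE_HEALTH),
        ("resource.sensitivity", resource.get("sensitivity"), SENSITIVITY),
        ("environment.network_type", env.get("network_type"), NETWORK_TYPES),
    ]:
        # Non-strings (e.g. lists) are rejected before the set lookup.
        if value is not None and (not isinstance(value, str) or value not in allowed):
            errors.append(f"{path} must be one of {sorted(allowed)}")
    return errors
```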
Authorization Response Schema
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "AuthorizationResponse",
"type": "object",
"required": ["decision", "request_id", "evaluated_at"],
"properties": {
"decision": {
"type": "string",
"enum": ["ALLOW", "DENY"]
},
"request_id": {
"type": "string",
"description": "Echo of the request ID for correlation"
},
"reason": {
"type": "string",
"description": "Human-readable explanation of the decision"
},
"matched_policy": {
"type": "string",
"description": "ID of the policy that determined this decision"
},
"evaluated_at": {
"type": "string",
"format": "date-time"
},
"evaluation_time_ms": {
"type": "number",
"description": "Time taken to evaluate (for performance monitoring)"
},
"obligations": {
"type": "array",
"description": "Actions the PEP must take if allowing (e.g., log, notify)",
"items": {
"type": "object",
"properties": {
"action": { "type": "string" },
"parameters": { "type": "object" }
}
}
}
}
}
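The following sketch assembles a response in this shape. It assumes the deciding policy dict (if any) is passed in and that `started` was captured with `time.perf_counter()` when evaluation began. Obligations are forwarded only when their `on` field from the policy format (`allow`, `deny`, or `both`) fits the decision. Generating a UUID when the caller omitted `request_id` is an added assumption, since the schema requires the field.

```python
import time
import uuid
from datetime import datetime, timezone
from typing import Optional

def build_response(request: dict, decision: str, policy: Optional[dict], started: float) -> dict:
    """Assemble an AuthorizationResponse; a missing `on` is treated as "both"."""
    resp = {
        "decision": decision,
        "request_id": request.get("request_id") or str(uuid.uuid4()),
        "evaluated_at": datetime.now(timezone.utc).isoformat(timespec="milliseconds").replace("+00:00", "Z"),
        "evaluation_time_ms": round((time.perf_counter() - started) * 1000.0, 3),
    }
    if policy is None:
        resp["reason"] = "No matching policy (default deny)"
        return resp
    name = policy.get("name")
    resp["matched_policy"] = policy["id"]
    resp["reason"] = f"Matched policy '{policy['id']}'" + (f": {name}" if name else "")
    obligations = [
        {"action": o["action"], "parameters": o.get("parameters", {})}
        for o in policy.get("obligations", [])
        if o.get("on", "both") in (decision.lower(), "both")
    ]
    if obligations:
        resp["obligations"] = obligations
    return resp
```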
Policy Rule Format Specification
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Policy",
"type": "object",
"required": ["id", "effect"],
"properties": {
"id": {
"type": "string",
"description": "Unique policy identifier"
},
"name": {
"type": "string",
"description": "Human-readable policy name"
},
"description": {
"type": "string"
},
"effect": {
"type": "string",
"enum": ["allow", "deny"]
},
"priority": {
"type": "integer",
"default": 100,
"description": "Higher priority policies are evaluated first"
},
"subjects": {
"type": "object",
"description": "Conditions on the requesting entity",
"properties": {
"ids": { "type": "array", "items": { "type": "string" } },
"roles": { "type": "array", "items": { "type": "string" } },
"groups": { "type": "array", "items": { "type": "string" } },
"types": { "type": "array", "items": { "type": "string" } },
"attributes": {
"type": "object",
"additionalProperties": true
}
}
},
"actions": {
"type": "array",
"items": { "type": "string" },
"description": "List of actions this policy applies to"
},
"resources": {
"type": "object",
"description": "Conditions on the target resource",
"properties": {
"ids": { "type": "array", "items": { "type": "string" } },
"types": { "type": "array", "items": { "type": "string" } },
"owners": { "type": "array", "items": { "type": "string" } },
"sensitivity": { "type": "array", "items": { "type": "string" } },
"attributes": {
"type": "object",
"additionalProperties": true
}
}
},
"conditions": {
"type": "object",
"description": "Additional conditions that must be met",
"properties": {
"time_range": {
"type": "object",
"properties": {
"start": { "type": "string", "pattern": "^([01][0-9]|2[0-3]):[0-5][0-9]$" },
"end": { "type": "string", "pattern": "^([01][0-9]|2[0-3]):[0-5][0-9]$" },
"timezone": { "type": "string" },
"days": {
"type": "array",
"items": { "type": "string", "enum": ["Mon","Tue","Wed","Thu","Fri","Sat","Sun"] }
}
}
},
"device_health": {
"type": "array",
"items": { "type": "string", "enum": ["secure", "at_risk"] }
},
"mfa_required": { "type": "boolean" },
"network_types": {
"type": "array",
"items": { "type": "string" }
},
"max_session_age_seconds": { "type": "integer" },
"custom": {
"type": "object",
"description": "Custom JSONPath-based conditions"
}
}
},
"obligations": {
"type": "array",
"description": "Actions to take when this policy matches",
"items": {
"type": "object",
"properties": {
"on": { "type": "string", "enum": ["allow", "deny", "both"] },
"action": { "type": "string" },
"parameters": { "type": "object" }
}
}
}
}
}
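The `time_range` condition is the easiest one to get wrong: windows such as `22:00`-`06:00` wrap past midnight, and `timezone` changes which hour it is. This sketch uses the standard-library `zoneinfo` module. The specification leaves three points open, and the sketch assumes answers for each: a missing `timezone` means UTC, naive timestamps are treated as UTC, and `days` applies to the request's local calendar day.

```python
from datetime import datetime, time, timezone
from zoneinfo import ZoneInfo

DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]  # index = datetime.weekday()

def _hhmm(value: str) -> time:
    hours, minutes = map(int, value.split(":"))
    return time(hours, minutes)

def in_time_range(window: dict, timestamp: str) -> bool:
    """Evaluate a policy `time_range` against an ISO 8601 request timestamp."""
    ts = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)  # assumption: naive timestamps are UTC
    tz = ZoneInfo(window["timezone"]) if window.get("timezone") else timezone.utc
    local = ts.astimezone(tz)
    if window.get("days") and DAY_NAMES[local.weekday()] not in window["days"]:
        return False
    start, end, now = _hhmm(window["start"]), _hhmm(window["end"]), local.time()
    if start <= end:
        return start <= now < end          # same-day window, end exclusive
    return now >= start or now < end       # window wraps past midnight (e.g. 22:00-06:00)
```

With `"timezone": "America/New_York"`, a `2024-12-26T14:00:00Z` request is evaluated as 09:00 local time (EST). That lookup needs the system time zone database or the `tzdata` package.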
Example Policies for Common Scenarios
{
"policies": [
{
"id": "admin-full-access",
"name": "Administrators have full access",
"description": "Global admin override - use with caution",
"effect": "allow",
"priority": 1000,
"subjects": {
"roles": ["admin", "super-admin"]
},
"actions": ["*"],
"resources": {}
},
{
"id": "dev-push-business-hours",
"name": "Developers can push during business hours",
"effect": "allow",
"priority": 100,
"subjects": {
"roles": ["developer", "sre"]
},
"actions": ["push", "merge"],
"resources": {
"types": ["repository"]
},
"conditions": {
"time_range": {
"start": "08:00",
"end": "20:00",
"timezone": "America/New_York",
"days": ["Mon", "Tue", "Wed", "Thu", "Fri"]
},
"device_health": ["secure"]
}
},
{
"id": "block-critical-after-hours",
"name": "Block access to critical resources after hours",
"effect": "deny",
"priority": 200,
"subjects": {},
"actions": ["*"],
"resources": {
"sensitivity": ["critical"]
},
"conditions": {
"time_range": {
"start": "22:00",
"end": "06:00"
}
}
},
{
"id": "require-mfa-for-sensitive",
"name": "Require MFA for confidential resources",
"effect": "deny",
"priority": 300,
"subjects": {
"attributes": {
"mfa_verified": false
}
},
"actions": ["*"],
"resources": {
"sensitivity": ["confidential", "critical"]
},
"obligations": [
{
"on": "deny",
"action": "require_mfa",
"parameters": { "redirect": "/auth/mfa" }
}
]
},
{
"id": "service-mesh-internal",
"name": "Services can communicate within mesh",
"effect": "allow",
"priority": 50,
"subjects": {
"types": ["service"],
"groups": ["internal-services"]
},
"actions": ["call", "invoke"],
"resources": {
"types": ["service", "api"]
},
"conditions": {
"network_types": ["corporate", "vpn"]
}
},
{
"id": "compromised-device-block",
"name": "Block all access from compromised devices",
"effect": "deny",
"priority": 1100,
"subjects": {
"attributes": {
"device_health": "compromised"
}
},
"actions": ["*"],
"resources": {},
"obligations": [
{
"on": "deny",
"action": "alert_security_team",
"parameters": { "severity": "high" }
}
]
}
]
}
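Replaying the command-line scenarios below is a good acceptance test. This sketch hard-codes three of the example policies and handles only the fields they use. It assumes the behaviour those scenarios imply:
- Policies are tried in descending `priority` and the first match wins, with deny first on a tie.
- No match means DENY.
- Keys in `subjects.attributes` are looked up on the subject's top-level fields first (where the request schema puts `mfa_verified`), then in its `attributes` object.
- Windows without a `timezone` are read in UTC.
- A request without a timestamp uses the PDP's own clock.

```python
from datetime import datetime, timezone
from typing import Optional

# Three of the example policies (subset), as Python dicts.
POLICIES = [
    {"id": "admin-full-access", "effect": "allow", "priority": 1000,
     "subjects": {"roles": ["admin", "super-admin"]}, "actions": ["*"], "resources": {}},
    {"id": "block-critical-after-hours", "effect": "deny", "priority": 200,
     "subjects": {}, "actions": ["*"], "resources": {"sensitivity": ["critical"]},
     "conditions": {"time_range": {"start": "22:00", "end": "06:00"}}},
    {"id": "require-mfa-for-sensitive", "effect": "deny", "priority": 300,
     "subjects": {"attributes": {"mfa_verified": False}}, "actions": ["*"],
     "resources": {"sensitivity": ["confidential", "critical"]}},
]

def _allowed(constraint, value) -> bool:
    return not constraint or "*" in constraint or value in constraint

def _attr(entity: dict, name: str):
    # Top-level field first (e.g. subject.mfa_verified), then the free-form attributes map.
    return entity[name] if name in entity else entity.get("attributes", {}).get(name)

def _utc_hhmm(ts: Optional[str]) -> str:
    # Fall back to the PDP's own clock when the request carries no timestamp.
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00")) if ts else datetime.now(timezone.utc)
    return dt.astimezone(timezone.utc).strftime("%H:%M")

def matches(p: dict, req: dict) -> bool:
    subj, res = req["subject"], req["resource"]
    ps, pr = p.get("subjects", {}), p.get("resources", {})
    if ps.get("roles") and not set(ps["roles"]) & set(subj.get("roles", [])):
        return False
    if any(_attr(subj, k) != v for k, v in ps.get("attributes", {}).items()):
        return False
    if not _allowed(p.get("actions"), req["action"]):
        return False
    if not (_allowed(pr.get("types"), res.get("type"))
            and _allowed(pr.get("sensitivity"), res.get("sensitivity"))):
        return False
    window = p.get("conditions", {}).get("time_range")
    if window:
        now = _utc_hhmm(req.get("environment", {}).get("timestamp"))
        start, end = window["start"], window["end"]
        inside = start <= now < end if start <= end else (now >= start or now < end)
        if not inside:
            return False
    return True

def decide(req: dict, policies=POLICIES) -> dict:
    """Highest priority first, deny before allow on ties; first match wins; default deny."""
    ordered = sorted(policies, key=lambda p: (-p.get("priority", 100), p["effect"] != "deny"))
    for p in ordered:
        if matches(p, req):
            return {"decision": p["effect"].upper(), "matched_policy": p["id"]}
    return {"decision": "DENY", "matched_policy": None}
```

Exact-match attribute semantics have one consequence worth noticing. A request that omits `mfa_verified` does not match `require-mfa-for-sensitive` at all. Whether a missing attribute should count as a failed check is a decision to make explicitly.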
Performance Requirements
| Metric | Target | Acceptable | Unacceptable |
|---|---|---|---|
| P50 Latency | < 2ms | < 5ms | > 10ms |
| P99 Latency | < 10ms | < 25ms | > 50ms |
| Throughput | > 10,000 req/s | > 5,000 req/s | < 1,000 req/s |
| Error Rate | < 0.01% | < 0.1% | > 1% |
| Policy Reload | < 100ms | < 500ms | > 1s |
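Before optimizing, measure. This sketch is an in-process micro-benchmark that assumes `decide` is your evaluation function. It excludes HTTP and JSON overhead, so treat its numbers as a lower bound when comparing against the table; use an HTTP load generator for end-to-end figures. Throughput here is single-threaded (calls divided by total measured time).

```python
import statistics
import time
from typing import Callable, Dict

def benchmark(decide: Callable[[dict], dict], request: dict, n: int = 10_000) -> Dict[str, float]:
    """Time n in-process calls to `decide` and report latency percentiles in ms."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        decide(request)
        samples.append((time.perf_counter() - start) * 1000.0)
    cuts = statistics.quantiles(samples, n=100)  # 99 cut points: cuts[49] ~ P50, cuts[98] ~ P99
    total_s = max(sum(samples) / 1000.0, 1e-9)
    return {"p50_ms": cuts[49], "p99_ms": cuts[98], "throughput_rps": n / total_s}
```

Compare `p50_ms` and `p99_ms` against the Target column (< 2 ms and < 10 ms) using a realistic request and the full policy set loaded.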
Real World Outcome
By the end of this project, you will have a high-performance authorization microservice. You can integrate this with the Proxy from Project 1 to create a complete Zero Trust flow.
What you will see:
- A REST API: Listening on port 9090
- Dynamic Policy Loading: Edit the JSON policy file on disk and trigger a reload; the PDP changes its decisions without a restart
- Detailed Decision Logs: The PDP prints why it allowed or denied a request, which is essential for security auditing
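Here is a minimal hot-reload sketch. It assumes the `{"policies": [...]}` file layout used by the example policies and a polling watcher rather than a third-party file-notification library. Each reload builds a new immutable tuple and swaps a single reference, so in-flight evaluations keep the old set. A file that fails to parse leaves the previous policies in place.

```python
import json
import os
import threading
from typing import Optional

class PolicyStore:
    """Active policy set, swapped atomically on reload.

    Readers take a reference to `self.policies` (an immutable tuple), so an
    evaluation that is already running keeps the old set while a reload
    prepares the new one."""

    def __init__(self, path: str):
        self.path = path
        self._lock = threading.Lock()  # serialises reloads; readers never block
        self._mtime_ns = 0
        self.policies: tuple = ()
        self.reload()

    def reload(self) -> int:
        """Parse and validate the file, then swap. On error, raise and keep the old set."""
        with self._lock:
            mtime_ns = os.stat(self.path).st_mtime_ns
            with open(self.path, encoding="utf-8") as f:
                data = json.load(f)
            new = tuple(sorted(data["policies"], key=lambda p: -p.get("priority", 100)))
            if len({p["id"] for p in new}) != len(new):
                raise ValueError("duplicate policy id")
            self.policies = new  # a single reference assignment is the swap
            self._mtime_ns = mtime_ns
            return len(new)

    def watch(self, interval_s: float = 1.0, stop: Optional[threading.Event] = None) -> None:
        """Poll the file's mtime and reload on change; meant to run in a daemon thread."""
        stop = stop or threading.Event()
        while not stop.wait(interval_s):
            try:
                if os.stat(self.path).st_mtime_ns != self._mtime_ns:
                    self.reload()
            except (OSError, ValueError, KeyError, TypeError):
                pass  # unreadable or invalid file: keep serving the previous policies
```

Write policy updates to a temporary file and rename it over the original with `os.replace`, so the watcher never reads a half-written file. An `/admin/reload-policies` handler can simply call `store.reload()` and report the returned count.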
Command-Line Examples
# 1. Start your Policy Engine
$ ./zta-pdp --policy-file ./policies.json --port 9090
[INFO] PDP v1.0.0 starting...
[INFO] Loading policies from ./policies.json
[INFO] Loaded 6 security policies
[INFO] Policy Engine listening on :9090
[INFO] Healthcheck endpoint: GET /health
[INFO] Decision endpoint: POST /v1/decide
# 2. Check health status
$ curl http://localhost:9090/health
{
"status": "healthy",
"policies_loaded": 6,
"uptime_seconds": 45,
"version": "1.0.0"
}
# 3. Developer pushing to repo during business hours (ALLOWED)
$ curl -s -X POST http://localhost:9090/v1/decide \
-H "Content-Type: application/json" \
-d '{
"request_id": "req-001",
"subject": {
"id": "alice@example.com",
"type": "user",
"roles": ["developer"],
"device_health": "secure"
},
"action": "push",
"resource": {
"id": "kernel-repo",
"type": "repository",
"sensitivity": "internal"
},
"environment": {
"timestamp": "2024-12-26T14:00:00Z",
"network_type": "corporate"
}
}' | jq .
{
"decision": "ALLOW",
"request_id": "req-001",
"reason": "Matched policy 'dev-push-business-hours': Developers can push during business hours",
"matched_policy": "dev-push-business-hours",
"evaluated_at": "2024-12-26T14:00:01.234Z",
"evaluation_time_ms": 0.45
}
# 4. Same developer pushing at 3 AM (DENIED)
$ curl -s -X POST http://localhost:9090/v1/decide \
-H "Content-Type: application/json" \
-d '{
"request_id": "req-002",
"subject": {
"id": "alice@example.com",
"type": "user",
"roles": ["developer"],
"device_health": "secure"
},
"action": "push",
"resource": {
"id": "kernel-repo",
"type": "repository",
"sensitivity": "critical"
},
"environment": {
"timestamp": "2024-12-26T03:00:00Z",
"network_type": "vpn"
}
}' | jq .
{
"decision": "DENY",
"request_id": "req-002",
"reason": "Matched policy 'block-critical-after-hours': Block access to critical resources after hours",
"matched_policy": "block-critical-after-hours",
"evaluated_at": "2024-12-26T03:00:01.567Z",
"evaluation_time_ms": 0.38
}
# 5. User without MFA accessing confidential resource (DENIED with obligation)
$ curl -s -X POST http://localhost:9090/v1/decide \
-H "Content-Type: application/json" \
-d '{
"request_id": "req-003",
"subject": {
"id": "bob@example.com",
"type": "user",
"roles": ["viewer"],
"mfa_verified": false
},
"action": "read",
"resource": {
"id": "financial-reports",
"type": "document",
"sensitivity": "confidential"
},
"environment": {
"timestamp": "2024-12-26T10:00:00Z"
}
}' | jq .
{
"decision": "DENY",
"request_id": "req-003",
"reason": "Matched policy 'require-mfa-for-sensitive': Require MFA for confidential resources",
"matched_policy": "require-mfa-for-sensitive",
"evaluated_at": "2024-12-26T10:00:01.890Z",
"evaluation_time_ms": 0.52,
"obligations": [
{
"action": "require_mfa",
"parameters": { "redirect": "/auth/mfa" }
}
]
}
# 6. Admin accessing everything (ALLOWED - admin override)
$ curl -s -X POST http://localhost:9090/v1/decide \
-H "Content-Type: application/json" \
-d '{
"request_id": "req-004",
"subject": {
"id": "charlie@example.com",
"type": "user",
"roles": ["admin"]
},
"action": "delete",
"resource": {
"id": "production-database",
"type": "database",
"sensitivity": "critical"
},
"environment": {
"timestamp": "2024-12-26T03:00:00Z"
}
}' | jq .
{
"decision": "ALLOW",
"request_id": "req-004",
"reason": "Matched policy 'admin-full-access': Administrators have full access",
"matched_policy": "admin-full-access",
"evaluated_at": "2024-12-26T03:00:01.234Z",
"evaluation_time_ms": 0.28
}
# 7. Hot reload policies (update JSON file, then trigger reload)
$ curl -X POST http://localhost:9090/admin/reload-policies
{
"status": "reloaded",
"policies_loaded": 7,
"reload_time_ms": 45.2
}
# 8. View audit log (last 10 decisions)
$ curl -s 'http://localhost:9090/admin/audit?limit=10' | jq .
{
"decisions": [
{
"request_id": "req-004",
"subject_id": "charlie@example.com",
"action": "delete",
"resource_id": "production-database",
"decision": "ALLOW",
"matched_policy": "admin-full-access",
"timestamp": "2024-12-26T03:00:01.234Z"
},
...
]
}
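You can prototype the HTTP surface used in these examples with Python's standard library alone. This sketch implements `GET /health`, `POST /v1/decide`, and `GET /admin/audit` with an in-memory audit ring buffer. `evaluate` is a placeholder that always denies. The reload endpoint and the `policies_loaded` count are left out. Call `serve()` to start it on port 9090.

```python
import json
import time
from collections import deque
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse

START = time.monotonic()
AUDIT: deque = deque(maxlen=1000)  # in-memory ring buffer of recent decisions

def evaluate(req: dict) -> dict:
    """Placeholder evaluator: default deny. Replace with a real policy engine."""
    return {"decision": "DENY", "reason": "No matching policy (default deny)",
            "matched_policy": None}

class PDPHandler(BaseHTTPRequestHandler):
    def _send(self, status: int, body: dict) -> None:
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_GET(self) -> None:
        url = urlparse(self.path)
        if url.path == "/health":
            self._send(200, {"status": "healthy",
                             "uptime_seconds": int(time.monotonic() - START)})
        elif url.path == "/admin/audit":
            try:
                limit = max(0, int(parse_qs(url.query).get("limit", ["10"])[0]))
            except ValueError:
                limit = 10
            newest_first = list(AUDIT)[::-1]
            self._send(200, {"decisions": newest_first[:limit]})
        else:
            self._send(404, {"error": "not found"})

    def do_POST(self) -> None:
        if urlparse(self.path).path != "/v1/decide":
            self._send(404, {"error": "not found"})
            return
        try:
            length = int(self.headers.get("Content-Length", 0))
            req = json.loads(self.rfile.read(length))
            result = {**evaluate(req), "request_id": req.get("request_id")}
            AUDIT.append({
                "request_id": result["request_id"],
                "subject_id": req["subject"]["id"],
                "action": req["action"],
                "resource_id": req["resource"]["id"],
                "decision": result["decision"],
                "matched_policy": result.get("matched_policy"),
                "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
            })
        except (ValueError, KeyError, TypeError, AttributeError) as exc:
            # Malformed request: fail closed and say why.
            self._send(400, {"decision": "DENY", "reason": f"bad request: {exc!r}"})
            return
        self._send(200, result)

    def log_message(self, format: str, *args) -> None:
        pass  # silence the default per-request stderr logging

def serve(host: str = "127.0.0.1", port: int = 9090) -> None:
    # Localhost by default: the /admin endpoints have no authentication in this sketch.
    ThreadingHTTPServer((host, port), PDPHandler).serve_forever()
```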
The Core Question You're Answering
"How do I build a system that can answer 'Is this user allowed to do this action on this resource?' in milliseconds, with policies that are flexible, auditable, and maintainable?"
This question sits at the heart of every secure system. Every time someone clicks "Submit," every API call, every database query - somewhere, something must decide: allowed or denied. Traditional approaches scatter this logic across codebases, turning security audits into nightmares and policy changes into dangerous multi-week projects. A well-designed Policy Decision Engine centralizes this critical logic, making authorization decisions consistent, traceable, and adaptable to changing business requirements without touching application code.
Concepts You Must Understand First
Before writing any code, you need to internalize these foundational concepts. For each one, make sure you can answer the associated questions.
1. Policy Decision Point (PDP) Architecture
The question to answer: What is the PDP's role in the request lifecycle, and why must it be stateless?
The PDP is the "brain" that evaluates authorization requests. It receives context (who, what, where, when) and returns a decision. Understanding why this component must be isolated, fast, and horizontally scalable is essential.
- How does the PDP differ from a Policy Enforcement Point (PEP)?
- Why should the PDP only return decisions and never enforce them itself (for example, by calling back into the data path)?
- What happens to your system's reliability if the PDP becomes a single point of failure?
Read: "Zero Trust Networks" by Gilman & Barth, Chapter 3: The Zero Trust Control Plane - specifically the sections on policy architecture and the separation of concerns between decision and enforcement.
2. Access Control Models: ABAC vs RBAC vs PBAC
The question to answer: When is RBAC sufficient, and when does the complexity of ABAC become necessary?
Role-Based Access Control (RBAC) assigns permissions to roles, then assigns roles to users. Attribute-Based Access Control (ABAC) evaluates policies based on arbitrary attributes of subjects, resources, actions, and context. Policy-Based Access Control (PBAC) uses explicit policy statements that can combine elements of both.
- What is "role explosion" and why does it plague large RBAC systems?
- How would you express âcontractors can only access non-production resources during business hoursâ in RBAC vs ABAC?
- What attributes beyond âroleâ might influence an authorization decision?
Read: âSecurity in Computingâ by Charles Pfleeger, Chapter 4: Access Control - covers the theoretical foundations of access control matrices, capabilities, and access control lists.
3. Policy Languages: Rego, Cedar, and XACML Concepts
The question to answer: What makes a policy language expressive enough to capture real security requirements, yet analyzable enough to prove properties about the policies?
Production systems use specialized Domain-Specific Languages (DSLs) for expressing policies. Understanding why these exist - and their tradeoffs - helps you design a sensible policy format even if you start with JSON.
- What is Datalog, and why do languages like Rego build on it?
- How does Cedar's type system help prevent policy errors?
- Why is it valuable to be able to answer "Can this policy ever allow access to resource X?"
Read: "Zero Trust Networks" by Gilman & Barth, Chapter 3 - discusses policy engines and their role. Additionally, explore the Open Policy Agent (OPA) documentation on Rego's evaluation model.
4. Rule Evaluation and Conflict Resolution
The question to answer: When Policy A says ALLOW and Policy B says DENY, what should happen and why?
Real systems have many policies, and they overlap. Multiple policies might apply to a single request. The conflict resolution strategy you choose fundamentally affects your security posture.
- What does "deny overrides" mean, and why is it the most common strategy?
- How does policy priority ordering work, and when is it necessary?
- What is the "default deny" principle, and how does it relate to fail-closed behavior?
Read: "Foundations of Information Security" by Jason Andress, Chapter 5: Authentication and Authorization - covers authorization principles and the logic behind access decisions.
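Deny overrides with a default deny needs only a few lines. A minimal Go sketch, assuming policies have already been filtered to those whose subject, action, resource, and conditions all match; MatchedPolicy and resolveDenyOverrides are illustrative names. Priority here only decides which policy is reported as the reason; any matching DENY wins:

```go
package main

import "sort"

type Effect string

const (
	Allow Effect = "ALLOW"
	Deny  Effect = "DENY"
)

// MatchedPolicy is a policy that already passed target and condition checks.
type MatchedPolicy struct {
	ID       string
	Priority int
	Effect   Effect
}

// resolveDenyOverrides returns the decision and the ID of the policy that
// determined it. Any matching DENY wins; among the winners, the highest
// priority one is reported. With no matches, the result is the default DENY.
func resolveDenyOverrides(matched []MatchedPolicy) (Effect, string) {
	sorted := append([]MatchedPolicy(nil), matched...) // don't reorder the caller's slice
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].Priority > sorted[j].Priority })
	for _, p := range sorted {
		if p.Effect == Deny {
			return Deny, p.ID
		}
	}
	if len(sorted) > 0 {
		return Allow, sorted[0].ID // all matches are ALLOW; report the highest priority one
	}
	return Deny, "" // default deny: no policy matched
}
```

With Policy C (ALLOW, 150) and Policy B (DENY, 200) matching, as in Request 3 of the exercise below, this returns DENY via B. A lower-priority DENY would still win; letting priority override a DENY is a different combining strategy (first applicable), not deny overrides.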
5. Caching Strategies for Policy Decisions
The question to answer: How do you cache authorization decisions without creating security holes?
Every millisecond counts when every request requires an authorization check. Caching can reduce latency by 10-100x, but caching security decisions introduces risks: stale permissions, delayed revocation, cache poisoning.
- What should the cache key be for an authorization decision?
- When a user's permissions are revoked, how quickly must cached decisions be invalidated?
- What is cache stampede, and how do you prevent it during policy reloads?
Read: "Designing Data-Intensive Applications, 2nd Ed" by Martin Kleppmann, Chapters 1-2 - covers caching patterns, consistency tradeoffs, and the challenges of distributed state.
6. Audit Logging for Compliance
The question to answer: What information must you log to reconstruct exactly why access was granted or denied?
Authorization decisions are among the most security-sensitive events in a system. Regulators, auditors, and incident responders need to understand who accessed what, when, and why.
- What is the difference between an audit log and an application log?
- Why must audit logs be append-only and tamper-evident?
- What fields are essential in an authorization audit record?
Read: "Security in Computing" by Charles Pfleeger, sections on auditing and accountability - discusses what makes an audit trail useful for security analysis.
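Append-only storage by itself is not tamper-evident. One common technique is a hash chain, where each record stores the hash of its predecessor. A minimal in-memory Go sketch, assuming the field set shown (an illustration of essential fields, not a standard schema) and a single writer:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
)

// AuditRecord holds the fields an authorization audit entry typically needs;
// the exact set here is an illustrative assumption.
type AuditRecord struct {
	RequestID     string `json:"request_id"`
	SubjectID     string `json:"subject_id"`
	Action        string `json:"action"`
	ResourceID    string `json:"resource_id"`
	Decision      string `json:"decision"`
	MatchedPolicy string `json:"matched_policy"`
	PolicyVersion int    `json:"policy_version"`
	Timestamp     string `json:"timestamp"`
	PrevHash      string `json:"prev_hash"`
	Hash          string `json:"hash"`
}

// hashRecord hashes the record's JSON encoding with its Hash field cleared.
func hashRecord(r AuditRecord) string {
	r.Hash = ""
	b, _ := json.Marshal(r) // cannot fail for a struct of strings and ints
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// AuditLog chains each record to the previous one by hash.
type AuditLog struct {
	records []AuditRecord
}

func (l *AuditLog) Append(r AuditRecord) AuditRecord {
	if n := len(l.records); n > 0 {
		r.PrevHash = l.records[n-1].Hash
	}
	r.Hash = hashRecord(r)
	l.records = append(l.records, r)
	return r
}

// Verify returns the index of the first record whose chain is broken, or -1.
func (l *AuditLog) Verify() int {
	prev := ""
	for i, r := range l.records {
		if r.PrevHash != prev || hashRecord(r) != r.Hash {
			return i
		}
		prev = r.Hash
	}
	return -1
}
```

Editing any record breaks its own hash and every link after it. An attacker who can rewrite the entire log can recompute the chain, so the latest hash should also be published somewhere the PDP host cannot modify.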
Questions to Guide Your Design
These questions should shape your implementation decisions. Answer them before and during your build.
Architecture Questions
- Where does your PDP fit? Draw a diagram showing how requests flow from a user through authentication, to your PDP, to the protected resource. Where is the PEP in this flow?
- What is your API contract? What fields are required in an authorization request? What does a response look like? What HTTP status codes do you return?
- How do you handle unknown fields? If a request includes an attribute your PDP doesn't understand, do you ignore it, reject the request, or log a warning?
- What is your failure mode? If your PDP crashes, throws an exception, or times out, is the request allowed or denied?
Policy Design Questions
- How expressive is your policy language? Can you express "Developers can push to non-production repos during NYC business hours from devices that passed health checks in the last 24 hours"?
- How do you handle wildcards? Does actions: ["*"] match any action? Does resources: {} (empty object) match all resources or no resources?
- What happens when no policy matches? Is this an implicit deny, or an error condition?
- How do you test policies before deploying them? Can you simulate a decision without affecting production?
Performance Questions
- What is your latency budget? If you have 100ms for the entire request, how much can the PDP consume?
- How do you scale? Can you run multiple PDP instances? Do they share state?
- What can you precompute? Can you compile policies into faster data structures at load time?
- What can you cache? Are there request patterns that repeat frequently enough to cache?
Operational Questions
- How do you update policies without downtime? What happens to in-flight requests during a policy reload?
- How do you know your PDP is healthy? What metrics do you expose? What does a health check verify?
- How do you debug a wrong decision? Can you replay a request and see exactly which policy matched and why?
Thinking Exercise
Before you write code, work through this scenario with pencil and paper. This exercises your understanding of policy evaluation.
Scenario: Multi-Policy Evaluation
You have four policies loaded in your PDP:
Policy A (priority: 100, effect: ALLOW)
- subjects: roles contain "developer"
- actions: ["read", "push"]
- resources: type = "repository"
- conditions: none
Policy B (priority: 200, effect: DENY)
- subjects: all
- actions: ["push", "merge", "delete"]
- resources: sensitivity = "critical"
- conditions: time outside 08:00-18:00
Policy C (priority: 150, effect: ALLOW)
- subjects: roles contain "admin"
- actions: ["*"]
- resources: all
- conditions: none
Policy D (priority: 300, effect: DENY)
- subjects: device_health = "compromised"
- actions: ["*"]
- resources: all
- conditions: none
Now trace through these authorization requests. For each one, identify:
- Which policies match the subject, action, and resource?
- Which of those also pass their condition checks?
- Using "deny overrides" with priority ordering, what is the final decision?
Request 1:
{
"subject": { "id": "alice", "roles": ["developer"], "device_health": "secure" },
"action": "push",
"resource": { "id": "web-app", "type": "repository", "sensitivity": "internal" },
"environment": { "timestamp": "2024-12-26T14:00:00Z" }
}
Request 2:
{
"subject": { "id": "bob", "roles": ["developer"], "device_health": "secure" },
"action": "push",
"resource": { "id": "payment-service", "type": "repository", "sensitivity": "critical" },
"environment": { "timestamp": "2024-12-26T22:00:00Z" }
}
Request 3:
{
"subject": { "id": "charlie", "roles": ["admin"], "device_health": "secure" },
"action": "delete",
"resource": { "id": "payment-service", "type": "repository", "sensitivity": "critical" },
"environment": { "timestamp": "2024-12-26T22:00:00Z" }
}
Request 4:
{
"subject": { "id": "diana", "roles": ["admin"], "device_health": "compromised" },
"action": "read",
"resource": { "id": "docs", "type": "document", "sensitivity": "public" },
"environment": { "timestamp": "2024-12-26T10:00:00Z" }
}
Work through each request step by step. Write down your reasoning. Then check your answers:
Request 1: ALLOW (via Policy A)
- Policy A matches: developer, push, repository - no conditions - ALLOW
- Policy B: matches push, but sensitivity is "internal" not "critical" - no match
- Policy C: not admin - no match
- Policy D: device not compromised - no match
- Only Policy A applies. Decision: ALLOW
Request 2: DENY (via Policy B)
- Policy A: matches subject, action, resource - ALLOW candidate
- Policy B: matches (push action, critical sensitivity, time 22:00 is outside 08:00-18:00) - DENY candidate
- Policy C: not admin - no match
- Policy D: device not compromised - no match
- Policy B has higher priority (200 > 100) and is DENY. With deny-overrides: DENY
Request 3: DENY (via Policy B)
- Policy A: the action is "delete", which is not in Policy A's actions ["read", "push"] - no match
- Policy B: "delete" is in ["push", "merge", "delete"], sensitivity is critical, and 22:00 is outside 08:00-18:00 - DENY candidate
- Policy C: admin, wildcard action - ALLOW candidate
- Policy D: device not compromised - no match
- Both B and C match. B has priority 200, C has priority 150. But this is deny-overrides, so ANY deny causes denial regardless of priority. Decision: DENY
Request 4: DENY (via Policy D)
- Policy A: admin not in ["developer"] - no match
- Policy B: action "read" not in ["push", "merge", "delete"] - no match
- Policy C: admin, wildcard action - ALLOW candidate
- Policy D: device_health is compromised - DENY candidate
- Both C and D match. D has higher priority (300) and is DENY. With deny-overrides: DENY
If your answers differed, review the matching logic. Pay special attention to:
- Whether empty conditions mean "always true" or "never match"
- How "deny overrides" interacts with priority
- The difference between "no match" and "match with opposite effect"
Hints in Layers
If you get stuck, reveal hints progressively. Try to solve problems yourself first.
Hint 1: Start With the Simplest Possible Version
Don't start with JSON policy files. Start with hardcoded policies in your code. Create a /v1/decide endpoint that:
- Accepts a POST request with JSON body
- Parses subject, action, and resource from the body
- Has ONE hardcoded rule: "if subject.roles contains 'admin', return ALLOW"
- Returns DENY for everything else
Once this works, you've proven your HTTP layer, JSON parsing, and decision response format. Only then add policy loading.
Hint 2: Policy Matching is Just Set Intersection
Most policy matching reduces to: "Does the request attribute intersect with the policy's allowed set?"
func matchesSubject(request, policy):
if policy.subjects.roles is empty:
return true // Empty means "any"
return intersection(request.subject.roles, policy.subjects.roles) is not empty
The same pattern applies to actions and resource types. An empty constraint means "match all." A non-empty constraint means "at least one must match."
Watch out for the wildcard action "*" - handle it explicitly.
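The role check above, as a minimal Go sketch (the function name is illustrative). Building the lookup set on every call keeps the example short; a real engine would build it once when policies are loaded:

```go
package main

// matchesRoles implements the set-intersection rule: an empty policy role
// list means "any subject"; otherwise at least one request role must appear
// in the policy's list.
func matchesRoles(requestRoles, policyRoles []string) bool {
	if len(policyRoles) == 0 {
		return true // empty constraint matches all
	}
	allowed := make(map[string]struct{}, len(policyRoles))
	for _, r := range policyRoles {
		allowed[r] = struct{}{}
	}
	for _, r := range requestRoles {
		if _, ok := allowed[r]; ok {
			return true
		}
	}
	return false
}
```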
Hint 3: Conditions Are Just Boolean Expressions
Each condition type is a function that returns true or false:
func evaluateConditions(request, policy):
for each condition in policy.conditions:
if condition is time_range:
if not isInTimeRange(request.environment.timestamp, condition.start, condition.end):
return false
if condition is device_health:
if request.subject.device_health not in condition.allowed_states:
return false
// ... more condition types
return true // All conditions passed
Start by supporting just one condition type (time_range is a good first choice). Add more incrementally.
Hint 4: Use RWMutex for Policy Hot-Reload
The classic reader-writer problem: many goroutines read policies (evaluating decisions), but occasionally one goroutine writes (reloading policies).
type PolicyStore struct {
mu sync.RWMutex
policies []Policy
}
func (ps *PolicyStore) Evaluate(req Request) Decision {
ps.mu.RLock() // Multiple readers can hold this simultaneously
defer ps.mu.RUnlock()
// Read from ps.policies safely
}
func (ps *PolicyStore) Reload(newPolicies []Policy) {
ps.mu.Lock() // Exclusive lock - blocks all readers
defer ps.mu.Unlock()
ps.policies = newPolicies
}
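The key insight: RLock doesn't block other RLocks, so decision evaluation remains parallel. Readers block only while a Reload is waiting for or holding the write lock (Go's RWMutex holds back new readers once a writer is waiting, so a reload cannot be starved), and that window is brief because Reload only swaps a slice.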
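An alternative to RWMutex, not required by this hint, is to publish immutable copy-on-write snapshots through atomic.Pointer from sync/atomic (Go 1.19+), so readers never block at all. A minimal sketch with a placeholder Policy type; the type and method names are illustrative:

```go
package main

import "sync/atomic"

type Policy struct {
	ID string
}

// PolicySnapshot is never modified after it has been published.
type PolicySnapshot struct {
	Version  int
	Policies []Policy
}

// AtomicPolicyStore publishes snapshots through an atomic pointer, so
// readers never take a lock, not even during a reload.
type AtomicPolicyStore struct {
	current atomic.Pointer[PolicySnapshot]
}

// Load returns the current snapshot (nil before the first Reload).
func (s *AtomicPolicyStore) Load() *PolicySnapshot {
	return s.current.Load()
}

// Reload publishes a new snapshot. The caller must not modify policies
// afterwards. Two concurrent Reload calls could compute the same Version,
// so serialize reloads (for example with a mutex) in real code.
func (s *AtomicPolicyStore) Reload(policies []Policy) {
	version := 1
	if old := s.current.Load(); old != nil {
		version = old.Version + 1
	}
	s.current.Store(&PolicySnapshot{Version: version, Policies: policies})
}
```

A request should call Load() once and evaluate against that snapshot, so a single decision never mixes two policy versions; the old snapshot is garbage-collected once no in-flight request still references it.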
Hint 5: Cache Key Design Matters
A naive cache key might be: hash(entire_request). But this has poor hit rates because timestamps differ on every request.
Better approach: Cache based on the attributes that actually affect the decision:
func cacheKey(request):
    relevant = {
        "subject_id": request.subject.id,
        "subject_roles": sorted(request.subject.roles),
        "device_health": request.subject.device_health,
        "action": request.action,
        "resource_id": request.resource.id,
        "resource_type": request.resource.type,
        "resource_sensitivity": request.resource.sensitivity,
        // Note: NOT the full timestamp, only its hour
        "hour_bucket": request.environment.timestamp.hour
    }
    return sha256(json(relevant))
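The "hour bucket" means decisions are cached per hour, which is appropriate for policies whose time ranges start and end on the hour. Adjust granularity based on your policy precision. Every attribute a policy or condition reads (device health, resource sensitivity, and so on) must be part of the key; otherwise a cached ALLOW can be replayed for a request that should be denied.
Also consider: what happens when a user's roles change? You need a way to invalidate their cached decisions. One pattern: include a "cache version" in the key, and increment it on role changes.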
Solution Architecture
Component Diagram
+------------------------------------------------------------------+
| PDP Architecture Overview |
+------------------------------------------------------------------+
| |
| EXTERNAL INTERNAL |
| +------------------+ +---------------------------+ |
| | Policy JSON | | Policy Engine | |
| | File/DB |----------->| | |
| +------------------+ | +---------------------+ | |
| ^ | | Policy Store | | |
| | | | (In-Memory + Index) | | |
| +------------------+ | +---------------------+ | |
| | File Watcher / | | | | |
| | Admin API |------------| v | |
| +------------------+ | +---------------------+ | |
| | | Rule Evaluator | | |
| +------------------+ | | - Condition Matcher | | |
| | HTTP Server |----------->| | - Conflict Resolver | | |
| | :9090 | | +---------------------+ | |
| +------------------+ | | | |
| ^ | v | |
| | | +---------------------+ | |
| +------------------+ | | Decision Cache | | |
| | PEP / Gateway | | | (Optional Redis) | | |
| | (Project 1) | | +---------------------+ | |
| +------------------+ | | | |
| | v | |
| +------------------+ | +---------------------+ | |
| | Policy Info |<-----------| | Audit Logger | | |
| | Points (PIPs): | | | (Append-Only Log) | | |
| | - Device Health | | +---------------------+ | |
| | - Threat Intel | | | |
| +------------------+ +---------------------------+ |
| |
+------------------------------------------------------------------+
Policy Evaluation Flow
+------------------------------------------------------------------+
| Policy Evaluation Flow |
+------------------------------------------------------------------+
| |
| 1. REQUEST RECEIVED |
| +----------------------------------------------------------+ |
| | POST /v1/decide | |
| | { subject, action, resource, environment } | |
| +-------------------------+--------------------------------+ |
| | |
| v |
| 2. REQUEST VALIDATION |
| +----------------------------------------------------------+ |
| | - Validate JSON schema | |
| | - Normalize fields (lowercase, trim) | |
| | - Set defaults for missing optional fields | |
| | - Generate request_id if not provided | |
| +-------------------------+--------------------------------+ |
| | |
| v |
| 3. CACHE CHECK (Optional) |
| +----------------------------------------------------------+ |
| | cache_key = hash(subject.id, action, resource.id) | |
| | if cache.has(cache_key): | |
| | return cache.get(cache_key) ----------------------->|-->|
| +-------------------------+--------------------------------+ |
| | |
| v |
| 4. DATA ENRICHMENT (Optional) |
| +----------------------------------------------------------+ |
| | - Fetch device health from Device Trust Service | |
| | - Fetch user attributes from Identity Store | |
| | - Fetch threat intel for IP address | |
| | - Attach enriched data to request context | |
| +-------------------------+--------------------------------+ |
| | |
| v |
| 5. POLICY MATCHING |
| +----------------------------------------------------------+ |
| | for policy in policies (sorted by priority DESC): | |
| | if matches_subject(request, policy) AND | |
| | matches_action(request, policy) AND | |
| | matches_resource(request, policy): | |
| | add to candidate_policies | |
| +-------------------------+--------------------------------+ |
| | |
| v |
| 6. CONDITION EVALUATION |
| +----------------------------------------------------------+ |
| | for policy in candidate_policies: | |
| | if evaluate_conditions(request, policy.conditions): | |
| | add to matching_policies | |
| +-------------------------+--------------------------------+ |
| | |
| v |
| 7. CONFLICT RESOLUTION |
| +----------------------------------------------------------+ |
| | Strategy: Deny Overrides | |
| | | |
| | if any(p.effect == DENY for p in matching_policies): | |
| | decision = DENY | |
| | matched_policy = first DENY policy | |
| | elif any(p.effect == ALLOW for p in matching_policies): | |
| | decision = ALLOW | |
| | matched_policy = first ALLOW policy | |
| | else: | |
| | decision = DENY # Default deny if no match | |
| | reason = "No matching policy found" | |
| +-------------------------+--------------------------------+ |
| | |
| v |
| 8. RESPONSE GENERATION |
| +----------------------------------------------------------+ |
| | response = { | |
| | decision: ALLOW/DENY, | |
| | reason: matched_policy.description, | |
| | matched_policy: matched_policy.id, | |
| | obligations: matched_policy.obligations, | |
| | evaluated_at: now(), | |
| | evaluation_time_ms: elapsed | |
| | } | |
| +-------------------------+--------------------------------+ |
| | |
| v |
| 9. CACHE UPDATE & AUDIT |
| +----------------------------------------------------------+ |
| | cache.set(cache_key, response, ttl=300) | |
| | audit_log.append(request, response) | |
| +-------------------------+--------------------------------+ |
| | |
| v |
| 10. RETURN RESPONSE |
| +----------------------------------------------------------+ |
| | HTTP 200 OK | |
| | Content-Type: application/json | |
| | { decision, reason, ... } | |
| +----------------------------------------------------------+ |
| |
+------------------------------------------------------------------+
Key Design Decisions
| Decision | Choice | Rationale |
|---|---|---|
| Language | Go or Rust | Performance-critical, concurrent, minimal GC pauses |
| Policy Format | JSON | Human-readable, tooling support, easy hot-reload |
| Storage | In-memory with file backing | Sub-millisecond lookups, persistence on restart |
| Conflict Strategy | Deny Overrides | Security-first, predictable behavior |
| Caching | Optional Redis/local | Trade consistency for performance when acceptable |
| API Style | REST + JSON | Universal compatibility, easy debugging |
| Audit Log | Append-only, hash-chained file | Tamper-evident (each entry commits to the hash of the previous one), compliance-friendly |
Phased Implementation Guide
Phase 1: Basic Allow/Deny Endpoint
Goal: Create a minimal server that accepts authorization requests and returns hardcoded decisions.
Deliverable: Working HTTP server with /v1/decide endpoint.
Steps:
- Set up project structure (Go module or Rust crate)
- Create request/response structs from the JSON schemas
- Implement HTTP handler for POST /v1/decide
- Validate incoming JSON structure
- Return hardcoded DENY for all requests (fail-closed default)
- Add health check endpoint GET /health
Verification:
# Server should start
$ ./zta-pdp --port 9090
[INFO] PDP listening on :9090
# Health check works
$ curl http://localhost:9090/health
{"status": "healthy"}
# Decision endpoint returns DENY
$ curl -X POST http://localhost:9090/v1/decide \
-d '{"subject":{"id":"test"},"action":"read","resource":{"id":"test"}}' \
-H "Content-Type: application/json"
{"decision": "DENY", "reason": "No policies configured"}
Phase 2: Simple Role-Based Rules
Goal: Implement basic RBAC - if user has required role, allow access.
Deliverable: Working policy evaluation with role matching.
Steps:
- Create Policy struct matching the schema
- Load policies from a JSON file at startup
- Implement matches_subject() - check if subject roles intersect with policy roles
- Implement matches_action() - check if action is in policy actions
- Implement matches_resource() - check if resource type matches
- Return ALLOW if any policy matches with effect: "allow"
Sample Policy File (policies.json):
{
"policies": [
{
"id": "admin-all",
"effect": "allow",
"subjects": { "roles": ["admin"] },
"actions": ["*"],
"resources": {}
},
{
"id": "dev-read",
"effect": "allow",
"subjects": { "roles": ["developer"] },
"actions": ["read"],
"resources": { "types": ["repository"] }
}
]
}
Verification:
# Admin can do anything
$ curl -X POST http://localhost:9090/v1/decide \
-d '{"subject":{"id":"alice","roles":["admin"]},"action":"delete","resource":{"id":"db"}}' \
-H "Content-Type: application/json"
{"decision": "ALLOW", "matched_policy": "admin-all"}
# Developer can read repos
$ curl -X POST http://localhost:9090/v1/decide \
-d '{"subject":{"id":"bob","roles":["developer"]},"action":"read","resource":{"id":"repo","type":"repository"}}' \
-H "Content-Type: application/json"
{"decision": "ALLOW", "matched_policy": "dev-read"}
# Developer cannot delete
$ curl -X POST http://localhost:9090/v1/decide \
-d '{"subject":{"id":"bob","roles":["developer"]},"action":"delete","resource":{"id":"repo"}}' \
-H "Content-Type: application/json"
{"decision": "DENY", "reason": "No matching policy"}
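The sample policy file maps directly onto Go structs with encoding/json. A minimal sketch of loading and validating it, assuming these validation rules (non-empty unique id, effect of "allow" or "deny"); the Priority field is included for later phases and defaults to 0 when absent. Type and function names are illustrative:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Policy mirrors the sample policies.json.
type Policy struct {
	ID       string `json:"id"`
	Effect   string `json:"effect"`
	Priority int    `json:"priority"`
	Subjects struct {
		Roles []string `json:"roles"`
	} `json:"subjects"`
	Actions   []string `json:"actions"`
	Resources struct {
		Types []string `json:"types"` // empty means "any resource type"
	} `json:"resources"`
}

type PolicyFile struct {
	Policies []Policy `json:"policies"`
}

// parsePolicies decodes and validates a policy document. Unknown fields are
// ignored (forward compatibility); missing or duplicate IDs and invalid
// effects reject the whole file.
func parsePolicies(data []byte) ([]Policy, error) {
	var pf PolicyFile
	if err := json.Unmarshal(data, &pf); err != nil {
		return nil, fmt.Errorf("invalid policy JSON: %w", err)
	}
	seen := make(map[string]bool)
	for i, p := range pf.Policies {
		if p.ID == "" {
			return nil, fmt.Errorf("policy %d: missing id", i)
		}
		if seen[p.ID] {
			return nil, fmt.Errorf("policy %q: duplicate id", p.ID)
		}
		seen[p.ID] = true
		if p.Effect != "allow" && p.Effect != "deny" {
			return nil, fmt.Errorf("policy %q: effect must be \"allow\" or \"deny\"", p.ID)
		}
	}
	return pf.Policies, nil
}
```

Because parsePolicies returns an error instead of a partial list, the same function can back the hot-reload path in Phase 4: on error, keep serving the old policies.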
Phase 3: Attribute-Based Rules with JSONPath
Goal: Add condition evaluation for time-based, device health, and custom attributes.
Deliverable: Full ABAC support with complex conditions.
Steps:
- Implement evaluate_conditions() function
- Add time range condition checking
- Add device health condition checking
- Add network type condition checking
- Implement JSONPath extraction for custom conditions
- Add conflict resolution (deny overrides allow)
- Add policy priority sorting
Key Functions:
evaluate_time_range(request, condition) -> bool
evaluate_device_health(request, condition) -> bool
evaluate_network_type(request, condition) -> bool
evaluate_custom_condition(request, jsonpath_expr, expected_value) -> bool
Verification:
# Request during business hours with secure device - ALLOW
$ curl -X POST http://localhost:9090/v1/decide \
-d '{
"subject":{"id":"alice","roles":["developer"],"device_health":"secure"},
"action":"push",
"resource":{"id":"repo","type":"repository"},
"environment":{"timestamp":"2024-12-26T14:00:00Z"}
}' \
-H "Content-Type: application/json"
{"decision": "ALLOW"}
# Same request at 3 AM - DENY
$ curl -X POST http://localhost:9090/v1/decide \
-d '{
"subject":{"id":"alice","roles":["developer"],"device_health":"secure"},
"action":"push",
"resource":{"id":"repo","type":"repository"},
"environment":{"timestamp":"2024-12-26T03:00:00Z"}
}' \
-H "Content-Type: application/json"
{"decision": "DENY", "reason": "Outside allowed time range"}
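Full JSONPath is a large specification, and many engines start with a small subset. A minimal Go sketch that supports only dot-separated object keys (no array indices, filters, or wildcards), applied to the request decoded once into map[string]any; decodeRequest, lookupPath, and evaluateCustomCondition are hypothetical names, and a real implementation might use a JSONPath library instead:

```go
package main

import (
	"encoding/json"
	"reflect"
	"strings"
)

// decodeRequest decodes the request body once so every condition can reuse it.
func decodeRequest(body []byte) (map[string]any, error) {
	var doc map[string]any
	err := json.Unmarshal(body, &doc)
	return doc, err
}

// lookupPath resolves a path such as "$.subject.department". Only object
// keys are supported; anything else reports "not found".
func lookupPath(doc map[string]any, path string) (any, bool) {
	var cur any = doc
	for _, key := range strings.Split(strings.TrimPrefix(path, "$."), ".") {
		obj, ok := cur.(map[string]any)
		if !ok {
			return nil, false // cannot descend into a non-object
		}
		next, found := obj[key]
		if !found {
			return nil, false
		}
		cur = next
	}
	return cur, true
}

// evaluateCustomCondition compares the value at path with expected. A missing
// field or a type mismatch evaluates to false. JSON numbers decode as float64,
// so numeric expectations must be float64 as well.
func evaluateCustomCondition(doc map[string]any, path string, expected any) bool {
	got, ok := lookupPath(doc, path)
	return ok && reflect.DeepEqual(got, expected)
}
```

reflect.DeepEqual is used instead of == because comparing two interface values that both hold maps or slices with == panics at run time.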
Phase 4: Policy Hot-Reloading
Goal: Update policies without restarting the server.
Deliverable: File watcher + admin reload endpoint.
Steps:
- Implement file watcher using fsnotify (Go) or notify (Rust)
- Create read-write lock for policy store
- On file change: parse new policies, validate, atomic swap
- Add POST /admin/reload-policies endpoint for manual reload
- Add policy version tracking
- Graceful handling of parse errors (keep old policies)
Implementation Pattern (Go):
type PolicyStore struct {
policies []Policy
mu sync.RWMutex
version int
}
func (ps *PolicyStore) Reload(path string) error {
newPolicies, err := loadFromFile(path)
if err != nil {
return err // Keep old policies
}
ps.mu.Lock()
defer ps.mu.Unlock()
ps.policies = newPolicies
ps.version++
return nil
}
func (ps *PolicyStore) Evaluate(req Request) Decision {
ps.mu.RLock()
defer ps.mu.RUnlock()
// Use ps.policies...
}
Verification:
# Initial state: 2 policies
$ curl http://localhost:9090/health
{"policies_loaded": 2, "policy_version": 1}
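# Modify policies.json: add a new policy inside the "policies" array
# (appending text after the closing brace, e.g. with >>, would produce invalid JSON)
$ $EDITOR policies.json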
# Automatic reload detected
[INFO] File change detected, reloading policies...
[INFO] Loaded 3 policies (version 2)
# Or manual reload
$ curl -X POST http://localhost:9090/admin/reload-policies
{"status": "reloaded", "policies_loaded": 3, "policy_version": 2}
Phase 5: Performance Optimization and Caching
Goal: Achieve sub-5ms P99 latency at 10,000+ requests/second.
Deliverable: Optimized engine with caching and metrics.
Steps:
- Add request timing metrics (evaluation_time_ms in response)
- Implement in-memory LRU cache for decisions
- Add policy index by action type for faster matching
- Implement connection pooling if using external data sources
- Add /metrics endpoint (Prometheus format)
- Profile and optimize hot paths
Caching Strategy:
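Cache Key: SHA256 over subject.id, action, and resource.id plus every other attribute a policy reads (roles, device_health, resource sensitivity, hour bucket), joined with an unambiguous separator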
Cache TTL: 300 seconds (configurable per resource sensitivity)
Invalidation: On policy reload, clear entire cache
Optimization Techniques:
- Pre-compile regex patterns at policy load time
- Use sync.Pool for request/response object reuse
- Avoid allocations in the hot path
- Index policies by action for O(1) lookup
- Use atomic counters for metrics
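One way to index policies by action, sketched in Go under the assumption that a policy's actions list holds concrete names, contains "*", or is empty (meaning any action). Wildcard policies are merged into every per-action list at load time and each list is pre-sorted by priority, so a lookup is a single map access with no per-request sorting; the names are illustrative:

```go
package main

import "sort"

type Policy struct {
	ID       string
	Priority int
	Actions  []string // empty or containing "*" means any action
}

// ActionIndex maps each concrete action to the policies that could apply to
// it; wildcard holds the policies that apply to every action.
type ActionIndex struct {
	byAction map[string][]Policy
	wildcard []Policy
}

func byPriority(ps []Policy) {
	sort.SliceStable(ps, func(i, j int) bool { return ps[i].Priority > ps[j].Priority })
}

func containsString(xs []string, x string) bool {
	for _, v := range xs {
		if v == x {
			return true
		}
	}
	return false
}

// BuildActionIndex runs once at policy load time.
func BuildActionIndex(policies []Policy) *ActionIndex {
	idx := &ActionIndex{byAction: make(map[string][]Policy)}
	for _, p := range policies {
		if len(p.Actions) == 0 || containsString(p.Actions, "*") {
			idx.wildcard = append(idx.wildcard, p)
			continue
		}
		for _, a := range p.Actions {
			idx.byAction[a] = append(idx.byAction[a], p)
		}
	}
	// Merge wildcard policies into every per-action list and sort once.
	for a, ps := range idx.byAction {
		merged := append(ps, idx.wildcard...)
		byPriority(merged)
		idx.byAction[a] = merged
	}
	byPriority(idx.wildcard)
	return idx
}

// Candidates returns the policies that could apply to action, highest
// priority first. The returned slice is shared; callers must not modify it.
func (idx *ActionIndex) Candidates(action string) []Policy {
	if ps, ok := idx.byAction[action]; ok {
		return ps
	}
	return idx.wildcard // unknown action: only wildcard policies can apply
}
```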
Verification:
# Benchmark with wrk or hey
$ hey -n 100000 -c 100 -m POST \
-H "Content-Type: application/json" \
-d '{"subject":{"id":"test","roles":["dev"]},"action":"read","resource":{"id":"repo"}}' \
http://localhost:9090/v1/decide
Summary:
Requests/sec: 12345.67
Latency:
50%: 1.2ms
99%: 4.8ms
# Check metrics
$ curl http://localhost:9090/metrics
pdp_requests_total{decision="ALLOW"} 50000
pdp_requests_total{decision="DENY"} 50000
pdp_evaluation_seconds{quantile="0.5"} 0.0012
pdp_evaluation_seconds{quantile="0.99"} 0.0048
pdp_cache_hits_total 75000
pdp_cache_misses_total 25000
Testing Strategy
Unit Tests
Test individual components in isolation:
- Policy Parsing:
- Valid JSON parses correctly
- Invalid JSON returns error
- Missing required fields fail validation
- Unknown fields are ignored (forward compatibility)
- Subject Matching:
- Role intersection works correctly
- Empty policy roles match all subjects
- Case sensitivity handling
- Condition Evaluation:
- Time range: inside, outside, edge cases (midnight crossing)
- Device health: exact match, array membership
- JSONPath: valid paths, missing fields, type mismatches
- Conflict Resolution:
- DENY overrides ALLOW
- Priority ordering works
- No-match returns default DENY
Integration Tests
Test the complete request-response cycle:
- API Contract:
- Request validation rejects malformed JSON
- Response matches expected schema
- HTTP status codes are correct (200, 400, 500)
- End-to-End Scenarios:
- Admin bypasses all restrictions
- Time-based denial works
- MFA requirement triggers obligation
- Unknown subjects are denied
- Hot Reload:
- New policies take effect immediately
- Invalid policy file doesn't crash server
- Concurrent requests during reload work correctly
Performance Tests
# Load test with realistic payload
$ cat <<EOF > payload.json
{
"subject": {"id": "alice", "roles": ["developer"], "device_health": "secure"},
"action": "push",
"resource": {"id": "repo-123", "type": "repository", "sensitivity": "internal"},
"environment": {"timestamp": "2024-12-26T14:00:00Z", "network_type": "corporate"}
}
EOF
# Sustained load test (10 minutes)
$ hey -z 600s -c 100 -m POST \
-H "Content-Type: application/json" \
-D payload.json \
http://localhost:9090/v1/decide
# Spike test (sudden burst)
$ hey -n 50000 -c 500 -m POST ...
# Measure memory under load
$ while true; do ps -o rss= -p $(pgrep zta-pdp); sleep 5; done
Verification Against Reference Implementation
Compare your PDP decisions with Open Policy Agent (OPA):
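# Run OPA with equivalent Rego policies (assumed: package authz with default allow := false)
$ opa run --server policies.rego
# Test the same input against both; map each answer to true/false before diffing
# (OPA's Data API expects the document wrapped as {"input": ...} and listens on :8181 by default)
$ diff <(curl -s http://localhost:9090/v1/decide -H "Content-Type: application/json" -d @input.json | jq '.decision == "ALLOW"') \
<(jq '{input: .}' input.json | curl -s http://localhost:8181/v1/data/authz -d @- | jq '.result.allow')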
Common Pitfalls and Debugging
Pitfall 1: Policy Never Matches
Symptom: All requests return DENY even when they should match.
Cause: Field name mismatch or case sensitivity.
Solution:
# Add debug logging to show what's being compared
[DEBUG] Checking policy 'dev-read':
[DEBUG] Subject roles: ["Developer"] vs Policy roles: ["developer"]
[DEBUG] -> No intersection (case mismatch!)
# Fix: Normalize to lowercase at parse time (apply the same normalization to policy roles at load time)
subject.roles = subject.roles.map(r => r.toLowerCase())
Pitfall 2: Time Zone Issues
Symptom: Time-based policies allow/deny at wrong times.
Cause: Server timezone differs from policy timezone.
Solution:
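// Always evaluate times in the policy's specified timezone
loc, err := time.LoadLocation(condition.TimeRange.Timezone)
if err != nil {
    // Unknown timezone: better rejected at policy load time; here, surface the error so the caller can DENY
    return false, err // enclosing evaluator assumed to return (bool, error)
}
requestTime := request.Environment.Timestamp.In(loc)
hour := requestTime.Hour()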
Pitfall 3: Cache Stampede on Policy Reload
Symptom: Latency spike after policy reload as cache is empty.
Cause: All cached decisions invalidated simultaneously.
Solution:
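// Option 1: Probabilistic early expiration (spreads out TTL-driven refreshes;
// it does not help after the whole cache is cleared on reload)
func shouldRefreshEarly(remaining time.Duration) bool {
    // Probability of refreshing early rises toward 1 as the remaining TTL nears 0
    return rand.Float64() < math.Exp(-remaining.Seconds()/60.0)
}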
// Option 2: Background cache warming
func afterReload() {
go warmCache(commonRequests)
}
Pitfall 4: Wildcard Action Matching
Symptom: actions: ["*"] doesn't match all actions.
Cause: Wildcard treated as literal string.
Solution:
func matchesAction(request Request, policy Policy) bool {
if len(policy.Actions) == 0 {
return true // Empty = match all
}
for _, action := range policy.Actions {
if action == "*" || action == request.Action {
return true
}
}
return false
}
Pitfall 5: Race Condition on Policy Reload
Symptom: Intermittent panics or wrong decisions during reload.
Cause: Reading policies while they're being written.
Solution:
// Use RWMutex - multiple readers OR single writer
type PolicyStore struct {
mu sync.RWMutex
policies []Policy
}
func (ps *PolicyStore) Evaluate(req Request) Decision {
ps.mu.RLock() // Acquire read lock
defer ps.mu.RUnlock()
// Safe to read ps.policies
}
func (ps *PolicyStore) Reload(newPolicies []Policy) {
    ps.mu.Lock() // Acquire write lock; waits for in-flight readers to finish
    defer ps.mu.Unlock()
    ps.policies = newPolicies // Safe swap: no reader holds the lock here
}
Debugging Checklist
- Enable verbose logging: Log every policy check with matched/unmatched reason
- Add request echo: Include parsed request in response for verification
- Use curl with -v: See exact request being sent
- Check policy loading: Verify policies.json is valid JSON
- Verify timestamps: Ensure client and server clocks are synchronized
- Test with minimal policy: Single policy to isolate matching logic
Extensions and Challenges
Extension 1: gRPC API
Replace REST with gRPC for lower latency and type safety.
Hints:
- Define .proto file matching the JSON schemas
- Use streaming for bulk decisions
- Implement health checking protocol
- Target < 1ms P50 latency
Extension 2: Policy Simulation Mode
Add endpoint to test "what if" scenarios without affecting production.
$ curl -X POST http://localhost:9090/v1/simulate \
-d '{
"request": { ... },
"proposed_policies": [ ... ]
}'
{"would_allow": true, "matched_policy": "new-policy-draft"}
Extension 3: Policy Analytics Dashboard
Build a web UI showing:
- Decision distribution over time
- Most-matched policies
- Most-denied subjects/resources
- Latency percentiles
Extension 4: Distributed Policy Sync
Multiple PDP instances sharing policies via:
- etcd/Consul for configuration
- Raft consensus for consistency
- Merkle trees for efficient diff
Extension 5: Machine Learning Risk Scoring
Integrate with a risk scoring model:
- Input: Subject + context features
- Output: Risk score 0.0 - 1.0
- Use in conditions: risk_score < 0.5
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Authorization Logic | "Foundations of Information Security" by Jason Andress | Ch. 5: Authentication and Authorization |
| Policy as Code | "Zero Trust Networks" by Gilman & Barth | Ch. 3: The Zero Trust Control Plane |
| System Performance | "Designing Data-Intensive Applications, 2nd Ed" by Kleppmann | Ch. 1-2: Foundations |
| Access Control Models | "Security in Computing" by Charles Pfleeger | Ch. 4: Access Control |
| Go Performance | "Learning Go, 2nd Edition" by Jon Bodner | Ch. 12: Performance |
| Concurrent Data Structures | "Algorithms, Fourth Edition" by Sedgewick & Wayne | Ch. 3: Searching (Sec. 3.4: Hash Tables) |
| Rule Engines & Logic | "Design Patterns" by Gamma et al. | Ch. 5: Strategy Pattern |
| Cryptographic Foundations | "Serious Cryptography, 2nd Edition" by Aumasson | Ch. 5: MACs and Authentication |
Interview Questions
Questions you should be able to answer after completing this project:
- "What is the difference between Authentication and Authorization?"
- Authentication: Verifying WHO you are (identity)
- Authorization: Verifying WHAT you can do (permissions)
- The PDP handles authorization and assumes authentication has already happened
- "Explain Attribute-Based Access Control (ABAC) with a real-world example."
- Example: "Developers can push to non-production repos during business hours from secure devices"
- Combines subject attributes (role, device health), resource attributes (environment), and context (time)
- More flexible than RBAC and able to express complex, context-dependent policies
- "How do you handle a 'policy conflict' (e.g., one rule says ALLOW, another says DENY)?"
- Strategy 1: Deny overrides (most secure; in AWS IAM an explicit Deny always overrides any Allow)
- Strategy 2: First match (order-dependent, like firewall rules)
- Strategy 3: Most specific wins (requires specificity scoring)
- Strategy 4: Explicit priority numbers
- "Why is centralized policy management key to Zero Trust?"
- Single source of truth for all authorization decisions
- Consistent enforcement across all services
- Easier auditing and compliance
- Policy updates take effect everywhere as soon as the PDP reloads them (subject to any decision-cache TTLs)
- Decouples authorization logic from application code
- "What happens if the PDP is unavailable?"
- Fail-closed: Deny all requests (secure, but hurts availability)
- Fail-open: Allow all requests (dangerous; should never be used for sensitive resources)
- Cached fallback: Use the last known decision for a bounded period (a tradeoff between security and availability)
- "How do you achieve sub-5ms latency in a policy engine?"
- In-memory policy storage (no database queries on the request path)
- Policy indexing by action type
- Decision caching with an appropriate TTL
- Object pooling to reduce GC pressure
- Precompiling regexes at policy load time instead of in the hot path
- "What is a Policy Information Point (PIP)?"
- An external data source that enriches authorization requests
- Examples: Device health service, threat intelligence feed, HR database
- The PDP queries PIPs to get current context for decisions
- PIP responses should be cached (with a TTL) so lookups do not add latency to every decision
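The conflict-resolution strategies in the interview questions can be compared side by side. This Go sketch assumes policy matching has already produced a list of matched policies (the Match type and function names are hypothetical) and implements deny-overrides, first-match, and explicit priority; most-specific-wins is omitted because it needs a specificity score that is not defined here. Every strategy defaults to DENY when nothing matched.

```go
package main

import "fmt"

type Effect string

const (
	Allow Effect = "ALLOW"
	Deny  Effect = "DENY"
)

// Match is a policy that already matched the request (hypothetical shape).
type Match struct {
	ID       string
	Effect   Effect
	Priority int // higher wins; used only by byPriority
}

// denyOverrides: any matching DENY wins; ALLOW only if at least one policy
// allowed and none denied; default deny when nothing matched.
func denyOverrides(ms []Match) Effect {
	decision := Deny
	for _, m := range ms {
		if m.Effect == Deny {
			return Deny
		}
		decision = Allow
	}
	return decision
}

// firstMatch: the first matching policy in list order decides (firewall-style).
func firstMatch(ms []Match) Effect {
	if len(ms) == 0 {
		return Deny
	}
	return ms[0].Effect
}

// byPriority: the highest Priority wins; ties are broken in favor of DENY.
func byPriority(ms []Match) Effect {
	if len(ms) == 0 {
		return Deny
	}
	best := ms[0]
	for _, m := range ms[1:] {
		if m.Priority > best.Priority || (m.Priority == best.Priority && m.Effect == Deny) {
			best = m
		}
	}
	return best.Effect
}

func main() {
	ms := []Match{
		{ID: "dev-push", Effect: Allow, Priority: 10},
		{ID: "deny-after-hours", Effect: Deny, Priority: 5},
	}
	fmt.Println(denyOverrides(ms), firstMatch(ms), byPriority(ms))
}
```

With the two sample matches in main, the three strategies print DENY, ALLOW, and ALLOW respectively, which is why the chosen strategy needs to be explicit and documented.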
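Two of the latency techniques from the sub-5ms question, indexing by action and keeping regex compilation out of the hot path, fit in a small Go sketch. The Policy fields and the Index type are hypothetical; patterns are compiled once at load time, so an invalid pattern fails the policy load rather than a live request. Patterns should be anchored (`^...$`), since regexp's MatchString matches substrings.

```go
package main

import (
	"fmt"
	"regexp"
)

// Policy is a hypothetical shape; ResourcePattern is compiled once at load time.
type Policy struct {
	ID              string
	Action          string
	ResourcePattern string
	resourceRE      *regexp.Regexp
}

// Index maps action -> candidate policies, so a request only scans the
// policies for its own action instead of the whole set.
type Index map[string][]*Policy

// BuildIndex copies the policies, compiles every pattern up front, and
// groups them by action. A bad pattern fails the load, not a request.
func BuildIndex(policies []Policy) (Index, error) {
	owned := append([]Policy(nil), policies...) // the index owns its copy
	idx := Index{}
	for i := range owned {
		p := &owned[i]
		re, err := regexp.Compile(p.ResourcePattern)
		if err != nil {
			return nil, fmt.Errorf("policy %s: %w", p.ID, err)
		}
		p.resourceRE = re
		idx[p.Action] = append(idx[p.Action], p)
	}
	return idx, nil
}

// Candidates returns the IDs of policies whose action and resource match.
func (idx Index) Candidates(action, resource string) []string {
	var ids []string
	for _, p := range idx[action] {
		if p.resourceRE.MatchString(resource) {
			ids = append(ids, p.ID)
		}
	}
	return ids
}

func main() {
	idx, err := BuildIndex([]Policy{
		{ID: "push-kernel", Action: "push", ResourcePattern: `^kernel-.*$`},
		{ID: "read-all", Action: "read", ResourcePattern: `^.*$`},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(idx.Candidates("push", "kernel-repo"))
}
```

Because the index is rebuilt from scratch on each load, it pairs naturally with the atomic-swap hot reload: build the new Index, then swap the pointer.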
Self-Assessment Checklist
Before considering this project complete, verify you can:
- Explain the difference between RBAC, ABAC, and ReBAC with concrete examples
- Describe the role of PDP, PEP, and PIP in Zero Trust architecture
- Implement a policy evaluation engine that matches subjects, actions, and resources
- Handle time-based conditions with proper timezone handling
- Implement conflict resolution using deny-overrides strategy
- Hot-reload policies without dropping requests or crashing
- Achieve < 5ms P99 latency under load
- Write comprehensive audit logs for every decision
- Handle failure modes gracefully (fail-closed by default)
- Test your PDP against complex scenarios (after hours, compromised devices, etc.)
- Explain when caching is appropriate and when it creates security risks
- Integrate your PDP with the Identity-Aware Proxy from Project 1
- Answer all the interview questions listed above confidently