Project 2: Policy Decision Engine (Building a PDP)

Project Overview

What you’re building: A standalone “Brain” server that functions as the Policy Decision Point (PDP) in Zero Trust Architecture. It receives authorization requests (e.g., “Can User A perform Action B on Resource C?”) and returns “Allow” or “Deny” decisions based on dynamic rules stored in a database or JSON file. This server doesn’t handle actual traffic - it only makes decisions.

Why it matters: In traditional security, authorization logic is scattered across applications - each service implements its own access control. This creates inconsistency, makes auditing nearly impossible, and means changing a policy requires updating multiple codebases. A centralized PDP solves this by becoming the single source of truth for all authorization decisions.

Real-world applications:

Enterprise authorization systems (like Google’s Zanzibar or Airbnb’s Himeji)
Cloud IAM policy evaluation (AWS IAM, Google Cloud IAM)
API gateway authorization layers
Microservices access control in service meshes
Zero Trust Network Access (ZTNA) decision engines

+------------------------------------------------------------------+
|                    Policy Decision Point (PDP)                   |
+------------------------------------------------------------------+
|                                                                  |
|   PEP (Proxy/Gateway)          PDP (This Project)                |
|        |                           |                             |
|        |  Authorization Request:   |                             |
|        |  "Can Alice push to       |                             |
|        |   kernel-repo at 3AM?"    |                             |
|        |-------------------------->|                             |
|        |                           |                             |
|        |                     +-----|-----+                       |
|        |                     | Evaluate  |                       |
|        |                     | Policies  |                       |
|        |                     +-----|-----+                       |
|        |                           |                             |
|        |  Decision Response:       |                             |
|        |  { "decision": "DENY",    |                             |
|        |    "reason": "Outside     |                             |
|        |     allowed hours" }      |                             |
|        |<--------------------------|                             |
|        |                           |                             |
|        V                           |                             |
|   +------------------+             |                             |
|   | Enforce Decision |             |                             |
|   | Block Request    |             |                             |
|   +------------------+             |                             |
|                                                                  |
+------------------------------------------------------------------+

Learning Objectives

By completing this project, you will be able to:

Explain the difference between RBAC, ABAC, and ReBAC and when to use each access control model
Design and implement a policy evaluation engine that supports complex, context-aware rules
Build a high-performance authorization service capable of sub-5ms response times
Implement policy hot-reloading without service restart or dropped connections
Create audit logs that capture the complete decision trail for compliance requirements
Handle policy conflicts using precedence rules and explicit conflict resolution strategies
Integrate data enrichment from external sources (device health, threat intelligence) into decisions

Prerequisites

Required knowledge:

Proficiency in Go or Rust (or Python for prototyping)
Understanding of JSON and REST APIs
Basic knowledge of boolean logic and conditional expressions
Familiarity with HTTP servers and request handling

Helpful but not required:

Experience with Open Policy Agent (OPA) or similar policy engines
Understanding of database query optimization
Knowledge of caching strategies

System requirements:

Linux (preferred) or macOS
Go 1.21+ or Rust 1.70+
curl for testing
Redis (optional, for caching)

Deep Theoretical Foundation

Access Control Models: RBAC vs ABAC vs ReBAC

Before building a policy engine, you must understand the three fundamental access control paradigms:

+------------------------------------------------------------------+
|                    Access Control Model Comparison               |
+------------------------------------------------------------------+

RBAC (Role-Based Access Control)
================================
Simple, coarse-grained. Users have Roles, Roles have Permissions.

   User: Alice
      |
      +---> Role: Developer
               |
               +---> Permissions: [read:code, write:code, read:docs]

   Decision Logic:
   IF user.role == "developer" THEN allow(read:code)

   Pros: Simple to understand, easy to audit
   Cons: Role explosion, lacks context awareness

+------------------------------------------------------------------+

ABAC (Attribute-Based Access Control)
=====================================
Fine-grained. Decisions based on attributes of Subject, Resource,
Action, and Environment.

   Subject Attributes:        Resource Attributes:
   +------------------+       +------------------+
   | user: alice      |       | type: repository |
   | role: developer  |       | owner: kernel    |
   | clearance: L2    |       | sensitivity: high|
   | device: secure   |       | classification: C|
   +------------------+       +------------------+

   Environment Attributes:    Action Attributes:
   +------------------+       +------------------+
   | time: 14:00      |       | operation: push  |
   | location: NYC    |       | scope: main      |
   | ip: 10.0.1.5     |       | method: POST     |
   +------------------+       +------------------+

   Decision Logic:
   IF subject.role == "developer"
      AND resource.sensitivity <= subject.clearance
      AND environment.time BETWEEN 08:00 AND 20:00
      AND subject.device == "secure"
   THEN allow(action)

   Pros: Highly flexible, context-aware, fine-grained
   Cons: Complex to manage, harder to audit

+------------------------------------------------------------------+

ReBAC (Relationship-Based Access Control)
=========================================
Graph-based. Access determined by relationships between entities.
Used by Google Zanzibar, Airbnb Himeji, Ory Keto.

   alice --[member]--> team-kernel --[owner]--> kernel-repo
                                         |
   bob --[viewer]------------------------|

   Decision Logic:
   Can(alice, push, kernel-repo)?
     -> alice is member of team-kernel
     -> team-kernel is owner of kernel-repo
     -> owners can push
     -> ALLOW

   Pros: Natural for hierarchies, handles delegation well
   Cons: Graph traversal complexity, eventual consistency issues

+------------------------------------------------------------------+
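
The three models above can be compared side by side in a few lines of Python. This is a minimal sketch, not a reference implementation: the `SENSITIVITY` and `CLEARANCE` orderings, the `GRANTS` table, and all function names are assumptions introduced for illustration, since the diagrams do not say how a sensitivity such as "high" compares to a clearance such as "L2".

```python
from datetime import time

# Hypothetical sample data mirroring the diagrams above.
ROLE_PERMISSIONS = {"developer": {"read:code", "write:code", "read:docs"}}

# Assumed orderings so sensitivity and clearance can be compared as numbers.
SENSITIVITY = {"low": 1, "medium": 2, "high": 3}
CLEARANCE = {"L1": 1, "L2": 2, "L3": 3}


def rbac_allows(role: str, permission: str) -> bool:
    """RBAC: the role alone determines the permission set."""
    return permission in ROLE_PERMISSIONS.get(role, set())


def abac_allows(subject: dict, resource: dict, env: dict) -> bool:
    """ABAC: every attribute condition must hold."""
    return (
        subject["role"] == "developer"
        and SENSITIVITY[resource["sensitivity"]] <= CLEARANCE[subject["clearance"]]
        and time(8, 0) <= env["time"] <= time(20, 0)
        and subject["device"] == "secure"
    )


# ReBAC: relationship tuples stored as (object, relation) -> subjects.
TUPLES = {
    ("team-kernel", "member"): {"alice"},
    ("kernel-repo", "owner"): {"team-kernel"},
    ("kernel-repo", "viewer"): {"bob"},
}
# Assumed mapping from an action to the relations that grant it.
GRANTS = {"push": {"owner"}, "read": {"owner", "viewer"}}


def rebac_allows(user: str, action: str, obj: str) -> bool:
    """ReBAC: allow if the user holds a granting relation directly or via a team."""
    for relation in GRANTS.get(action, ()):
        for holder in TUPLES.get((obj, relation), ()):
            if holder == user or user in TUPLES.get((holder, "member"), ()):
                return True
    return False
```

Under these assumed orderings, Alice's L2 clearance is below the "high" sensitivity of the repository in the ABAC diagram, so `abac_allows` denies her even though her role, device, and time of day all pass. The ReBAC check only follows one level of team membership; a production system would need a bounded graph traversal.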

The Policy Decision Point in Zero Trust Architecture

The PDP sits at the heart of the Zero Trust Control Plane:

+------------------------------------------------------------------+
|              Zero Trust Control Plane Architecture               |
+------------------------------------------------------------------+
|                                                                  |
|                     CONTROL PLANE                                |
|   +----------------------------------------------------------+   |
|   |                                                          |   |
|   |   +----------------+     +---------------------------+   |   |
|   |   |     Policy     |     |    Policy Information     |   |   |
|   |   |  Administrator |     |    Points (PIPs)          |   |   |
|   |   |      (PA)      |     |                           |   |   |
|   |   +-------+--------+     |  +--------+ +--------+    |   |   |
|   |           |              |  |Identity| |Device  |    |   |   |
|   |           |              |  |Store   | |Health  |    |   |   |
|   |           v              |  +--------+ +--------+    |   |   |
|   |   +----------------+     |  +--------+ +--------+    |   |   |
|   |   |    Policy      |<----|  |Threat  | |Time/   |    |   |   |
|   |   |   Decision     |     |  |Intel   | |Location|    |   |   |
|   |   |    Point       |     |  +--------+ +--------+    |   |   |
|   |   |    (PDP)       |     +---------------------------+   |   |
|   |   +-------+--------+                                     |   |
|   |           |                                              |   |
|   +-----------|----------------------------------------------+   |
|               | Decision (Allow/Deny + Reason)                   |
|               v                                                  |
|   +----------------------------------------------------------+   |
|   |                      DATA PLANE                          |   |
|   |                                                          |   |
|   |   [Subject] --> [PEP/Gateway] --> [Resource]             |   |
|   |                      ^                                   |   |
|   |                      |                                   |   |
|   |               Enforces Decision                          |   |
|   +----------------------------------------------------------+   |
|                                                                  |
+------------------------------------------------------------------+
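
The arrow from the PIPs into the PDP can be read as an enrichment step that runs before policy evaluation. The sketch below illustrates the idea under stated assumptions: a PIP is modelled as a plain callable, the function name `enrich` is hypothetical, and a failing PIP is mapped to "unknown" so that policies can deny on it rather than silently skip the check.

```python
from typing import Callable, Dict

# A PIP here is any callable that takes the request and returns one attribute value.
Pip = Callable[[dict], object]


def enrich(request: dict, pips: Dict[str, Pip]) -> dict:
    """Return a copy of the request with PIP attributes merged into `environment`."""
    env = dict(request.get("environment", {}))
    for name, lookup in pips.items():
        try:
            env[name] = lookup(request)
        except Exception:
            # A failing PIP yields "unknown" so that policies can deny on it.
            env[name] = "unknown"
    return {**request, "environment": env}
```

A call such as `enrich(request, {"device_health": device_lookup})` leaves the original request untouched, which keeps the audit log able to record both what the PEP sent and what the PDP actually evaluated.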

Policy Languages and Domain-Specific Languages (DSLs)

Production policy engines use specialized languages for expressing rules:

+------------------------------------------------------------------+
|                    Policy Language Comparison                    |
+------------------------------------------------------------------+

OPA Rego (Open Policy Agent)
============================
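Logic-based, Datalog-inspired language. Widely adopted; OPA is a
CNCF graduated project. The example uses Rego v1 syntax, in which
rule bodies require the "if" keyword.

   package authz

   import rego.v1

   default allow := false

   allow if {
       input.subject.role == "admin"
   }

   allow if {
       input.subject.role == "developer"
       input.resource.type == "code"
       is_business_hours
   }

   is_business_hours if {
       now := time.now_ns()
       hour := time.clock(now)[0]   # [hour, minute, second], UTC
       hour >= 8
       hour < 20
   }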

+------------------------------------------------------------------+

Cedar (AWS Verified Permissions)
================================
Type-safe, analyzable policy language from AWS.

   permit (
       principal in Group::"developers",
       action == Action::"push",
       resource in Repository::"kernel"
   ) when {
       context.time.hour >= 8 &&
       context.time.hour < 20 &&
       principal.device_health == "secure"
   };

+------------------------------------------------------------------+

JSON-Based Rules (What you'll build)
====================================
Simpler format for learning - you'll implement an evaluator for this.

   {
     "id": "dev-push-hours",
     "effect": "allow",
     "subjects": {
       "roles": ["developer", "admin"]
     },
     "actions": ["push", "commit"],
     "resources": {
       "types": ["repository"]
     },
     "conditions": {
       "time_range": {"start": "08:00", "end": "20:00"},
       "device_health": "secure"
     }
   }

+------------------------------------------------------------------+
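
An evaluator for a single JSON rule of this shape might look like the sketch below. It is written under assumptions: the rule uses the `types` list from the Policy Rule Format Specification, `device_health` is read from the subject as in the request schema, the timestamp is compared in whatever timezone it arrives in, and overnight windows (start later than end) are not handled. The function name `policy_matches` is hypothetical.

```python
from datetime import datetime

# The sample rule as a Python dict.
POLICY = {
    "id": "dev-push-hours",
    "effect": "allow",
    "subjects": {"roles": ["developer", "admin"]},
    "actions": ["push", "commit"],
    "resources": {"types": ["repository"]},
    "conditions": {
        "time_range": {"start": "08:00", "end": "20:00"},
        "device_health": "secure",
    },
}


def policy_matches(policy: dict, request: dict) -> bool:
    """Return True when every clause of the policy matches the request."""
    subject, resource = request["subject"], request["resource"]

    roles = policy.get("subjects", {}).get("roles")
    if roles is not None and not set(roles) & set(subject.get("roles", [])):
        return False

    actions = policy.get("actions")
    if actions is not None and request["action"] not in actions:
        return False

    types = policy.get("resources", {}).get("types")
    if types is not None and resource.get("type") not in types:
        return False

    cond = policy.get("conditions", {})
    if "device_health" in cond and subject.get("device_health") != cond["device_health"]:
        return False
    if "time_range" in cond:
        # Zero-padded "HH:MM" strings compare correctly as plain strings.
        ts = datetime.fromisoformat(request["environment"]["timestamp"])
        hhmm = ts.strftime("%H:%M")
        if not cond["time_range"]["start"] <= hhmm < cond["time_range"]["end"]:
            return False
    return True
```

A clause that is absent from the policy is treated as "matches anything", so a rule with only `effect` and `actions` applies to every subject and resource. Whether that default is acceptable is a design decision worth making explicitly.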

Context-Aware Authorization

Context turns a static role check into a decision that adapts to who is asking, what they are touching, how, where and when, and at what risk:

+------------------------------------------------------------------+
|                    Context Dimensions for Decisions              |
+------------------------------------------------------------------+
|                                                                  |
|   WHO (Subject Context)           WHAT (Resource Context)        |
|   +------------------------+      +------------------------+     |
|   | Identity: alice@corp   |      | ID: kernel-repo        |     |
|   | Roles: [dev, sre]      |      | Type: repository       |     |
|   | Groups: [team-kernel]  |      | Owner: linux-foundation|     |
|   | Clearance: L3          |      | Sensitivity: critical  |     |
|   | MFA: verified          |      | Classification: public |     |
|   | Session Age: 2h        |      | Data Types: [code]     |     |
|   +------------------------+      +------------------------+     |
|                                                                  |
|   HOW (Action Context)            WHERE/WHEN (Environment)       |
|   +------------------------+      +------------------------+     |
|   | Operation: git_push    |      | Time: 2024-12-26T14:00 |     |
|   | Method: POST           |      | Day: Thursday          |     |
|   | Scope: refs/heads/main |      | Location: NYC office   |     |
|   | Commit Count: 3        |      | IP: 10.0.1.45          |     |
|   | Files Changed: 12      |      | Network: corporate     |     |
|   +------------------------+      | Device ID: MBP-42      |     |
|                                   | Device Health: secure  |     |
|   WHY (Risk Context)              | Risk Score: 0.15       |     |
|   +------------------------+      +------------------------+     |
|   | Threat Level: low      |                                     |
|   | Recent Failures: 0     |                                     |
|   | Anomaly Score: 0.2     |                                     |
|   | Active Incidents: 0    |                                     |
|   +------------------------+                                     |
|                                                                  |
|   DECISION MATRIX:                                               |
|   +----------------------------------------------------------+   |
|   | Rule: "Critical repos require device health + work hours"|   |
|   |                                                          |   |
|   | subject.roles CONTAINS "developer"           -> TRUE     |   |
|   | resource.sensitivity == "critical"           -> TRUE     |   |
|   | environment.device_health == "secure"        -> TRUE     |   |
|   | environment.time IN [08:00, 20:00]           -> TRUE     |   |
|   | environment.risk_score < 0.5                 -> TRUE     |   |
|   |                                                          |   |
|   | ALL CONDITIONS MET -> ALLOW                              |   |
|   +----------------------------------------------------------+   |
|                                                                  |
+------------------------------------------------------------------+
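
The decision matrix above doubles as an audit trail if each condition is recorded along with its result. The following is a sketch of that idea, assuming a context dictionary with `subject`, `resource`, and `environment` keys and an ISO 8601 `time` string; the function name and the trace format are illustrative only.

```python
from datetime import datetime, time


def evaluate_with_trace(ctx: dict):
    """Evaluate the sample rule and return (decision, per-condition trace)."""
    subject, resource, env = ctx["subject"], ctx["resource"], ctx["environment"]
    now = datetime.fromisoformat(env["time"]).time()
    checks = [
        ('subject.roles CONTAINS "developer"', "developer" in subject["roles"]),
        ('resource.sensitivity == "critical"', resource["sensitivity"] == "critical"),
        ('environment.device_health == "secure"', env["device_health"] == "secure"),
        ("environment.time IN [08:00, 20:00]", time(8, 0) <= now <= time(20, 0)),
        ("environment.risk_score < 0.5", env["risk_score"] < 0.5),
    ]
    # Mirrors the matrix: ALLOW only when every condition is TRUE.
    decision = "ALLOW" if all(ok for _, ok in checks) else "DENY"
    return decision, checks
```

Returning the full list rather than stopping at the first failure costs a few extra comparisons, but it lets the audit log show every condition that failed, not just the first one.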

Policy Conflict Resolution

When multiple policies apply, you need clear resolution rules:

+------------------------------------------------------------------+
|                    Policy Conflict Resolution                    |
+------------------------------------------------------------------+
|                                                                  |
|   STRATEGY 1: Deny Overrides (Most Secure)                       |
|   =========================================                      |
|   If ANY policy says DENY, the final decision is DENY.           |
|   Used by: AWS IAM, most enterprise systems                      |
|                                                                  |
|   Policy A: ALLOW (developer can read)                           |
|   Policy B: DENY (no access after 10PM)                          |
|   Final:    DENY                                                 |
|                                                                  |
|   +----------------------------------------------------------+   |
|   |   for policy in matching_policies:                       |   |
|   |       if policy.effect == DENY:                          |   |
|   |           return DENY, policy.reason                     |   |
|   |   return ALLOW if matching_policies else DENY            |   |
|   +----------------------------------------------------------+   |
|                                                                  |
|   STRATEGY 2: First Match (Order Matters)                        |
|   ========================================                       |
|   First policy that matches determines the outcome.              |
|   Used by: Firewall rules, some RBAC systems                     |
|                                                                  |
|   Policy 1: DENY contractors after hours                         |
|   Policy 2: ALLOW developers to push                             |
|   Request:  Contractor pushing at 3AM                            |
|   Final:    DENY (matched Policy 1 first)                        |
|                                                                  |
|   STRATEGY 3: Most Specific Wins                                 |
|   ==============================                                 |
|   Policy with most specific match takes precedence.              |
|   Used by: URL routing, some ABAC systems                        |
|                                                                  |
|   Policy A: General "developers can read repos"                  |
|   Policy B: Specific "Alice cannot read kernel-repo"             |
|   Final:    DENY for Alice on kernel-repo (more specific)        |
|                                                                  |
|   STRATEGY 4: Priority/Weight Based                              |
|   =================================                              |
|   Each policy has an explicit priority number.                   |
|                                                                  |
|   Policy A: priority=100, ALLOW developers                       |
|   Policy B: priority=200, DENY after hours                       |
|   Final:    Higher priority (200) wins -> DENY                   |
|                                                                  |
+------------------------------------------------------------------+
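
The four strategies can be sketched as small functions over the list of policies that matched a request. This is an illustration under assumptions: the `Policy` dataclass, the `specificity` score, the default deny when nothing matches, and the "prefer deny on a tie" rule are all choices made here, not requirements stated above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Policy:
    id: str
    effect: str            # "allow" or "deny"
    priority: int = 100
    specificity: int = 0   # hypothetical score, e.g. number of constrained fields


Result = Tuple[str, Optional[str]]  # (decision, id of the deciding policy)
DEFAULT_DENY: Result = ("DENY", None)


def deny_overrides(matching: List[Policy]) -> Result:
    for p in matching:
        if p.effect == "deny":
            return "DENY", p.id
    # No deny matched, so any remaining match is an allow.
    return ("ALLOW", matching[0].id) if matching else DEFAULT_DENY


def first_match(matching: List[Policy]) -> Result:
    if not matching:
        return DEFAULT_DENY
    return matching[0].effect.upper(), matching[0].id


def _pick(matching: List[Policy], attr: str) -> Result:
    if not matching:
        return DEFAULT_DENY
    # On a tie, prefer deny so that ambiguity never widens access.
    best = max(matching, key=lambda p: (getattr(p, attr), p.effect == "deny"))
    return best.effect.upper(), best.id


def most_specific(matching: List[Policy]) -> Result:
    return _pick(matching, "specificity")


def highest_priority(matching: List[Policy]) -> Result:
    return _pick(matching, "priority")
```

All four return the ID of the deciding policy alongside the decision, which is what the `matched_policy` field of an authorization response needs.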

Fail-Open vs Fail-Closed Decisions

What the system does when the PDP cannot reach a decision is a critical design choice:

+------------------------------------------------------------------+
|                    Failure Mode Decision Matrix                  |
+------------------------------------------------------------------+
|                                                                  |
|   FAIL-CLOSED (Deny by Default)                                  |
|   =============================                                  |
|   If PDP is unavailable or uncertain, DENY the request.          |
|                                                                  |
|   Pros:                           Cons:                          |
|   - More secure                   - Availability impact          |
|   - No unauthorized access        - User frustration             |
|   - Meets compliance requirements - Business disruption          |
|                                                                  |
|   When to use:                                                   |
|   - Financial systems                                            |
|   - Healthcare data                                              |
|   - Critical infrastructure                                      |
|   - Government/classified systems                                |
|                                                                  |
|   +----------------------------------------------------------+   |
|   |   def decide(request) -> Decision:                       |   |
|   |       try:                                               |   |
|   |           return evaluate_policies(request)              |   |
|   |       except (Timeout, Error):                           |   |
|   |           log.error("PDP failure, denying request")      |   |
|   |           return DENY("System unavailable")              |   |
|   +----------------------------------------------------------+   |
|                                                                  |
+------------------------------------------------------------------+
|                                                                  |
|   FAIL-OPEN (Allow by Default)                                   |
|   =============================                                  |
|   If PDP is unavailable, ALLOW the request.                      |
|                                                                  |
|   Pros:                           Cons:                          |
|   - Better availability           - Security risk                |
|   - No business disruption        - Compliance violations        |
|   - User experience preserved     - Audit gaps                   |
|                                                                  |
|   When to use (with extreme caution):                            |
|   - Read-only public data                                        |
|   - Non-sensitive operations                                     |
|   - When availability > confidentiality                          |
|                                                                  |
|   NEVER use for:                                                 |
|   - Write operations                                             |
|   - Sensitive data access                                        |
|   - Administrative actions                                       |
|                                                                  |
+------------------------------------------------------------------+
|                                                                  |
|   HYBRID: Cached Fallback                                        |
|   ========================                                       |
|   Use cached decisions during PDP outages.                       |
|                                                                  |
|   +----------------------------------------------------------+   |
|   |   def decide(request) -> Decision:                       |   |
|   |       cache_key = hash(request.subject, request.action,  |   |
|   |                        request.resource)                 |   |
|   |       try:                                               |   |
|   |           decision = evaluate_policies(request)          |   |
|   |           cache.set(cache_key, decision, ttl=300)        |   |
|   |           return decision                                |   |
|   |       except Timeout:                                    |   |
|   |           cached = cache.get(cache_key)                  |   |
|   |           if cached is not None:                         |   |
|   |               log.warn("Using cached decision")          |   |
|   |               return cached                              |   |
|   |           return DENY("No cached decision available")    |   |
|   +----------------------------------------------------------+   |
|                                                                  |
+------------------------------------------------------------------+
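
A runnable version of the fail-closed and cached-fallback patterns might look like the sketch below. The `TTLCache` class, the tuple-shaped decision, and the use of Python's built-in `TimeoutError` are assumptions made for illustration. Note that the cache key covers only subject, action, and resource, as in the pseudocode above, so a cached ALLOW can outlive a time-of-day or device-health condition until its TTL expires.

```python
import time
from typing import Callable, Dict, Hashable, Tuple

Decision = Tuple[str, str]  # (decision, reason)


class TTLCache:
    """Tiny in-memory cache; `clock` is injectable for tests."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._items: Dict[Hashable, Tuple[Decision, float]] = {}

    def set(self, key, value, ttl):
        self._items[key] = (value, self._clock() + ttl)

    def get(self, key):
        item = self._items.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self._clock() >= expires_at:
            del self._items[key]
            return None
        return value


def decide(request: dict, evaluate, cache: TTLCache, ttl: float = 300.0) -> Decision:
    key = (request["subject"], request["action"], request["resource"])
    try:
        decision = evaluate(request)
    except TimeoutError:
        # Hybrid path: serve a recent decision if one exists, otherwise deny.
        cached = cache.get(key)
        if cached is not None:
            return cached
        return ("DENY", "No cached decision available")
    except Exception:
        # Fail closed on any other evaluation error.
        return ("DENY", "System unavailable")
    cache.set(key, decision, ttl)
    return decision
```

Only timeouts fall back to the cache here; other errors deny immediately, on the reasoning that a crash in policy evaluation may mean the policies themselves are in a bad state.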

Caching Strategies for Authorization

Performance is critical - every request in your system calls the PDP:

+------------------------------------------------------------------+
|                    Authorization Caching Strategies              |
+------------------------------------------------------------------+
|                                                                  |
|   LAYER 1: PEP-Side Cache (Sidecar/Local)                        |
|   ========================================                       |
|                                                                  |
|   +----------+    +----------+    +----------+                   |
|   |   PEP    |--->|  Local   |--->|   PDP    |                   |
|   | Gateway  |    |  Cache   |    |  Server  |                   |
|   +----------+    +----------+    +----------+                   |
|                                                                  |
|   - TTL: 30s - 5min (depends on sensitivity)                     |
|   - Cache Key: hash(subject, action, resource)                   |
|   - Invalidation: on policy change, session end                  |
|                                                                  |
|   Latency: ~0.1ms (cache hit) vs ~5ms (PDP call)                 |
|                                                                  |
+------------------------------------------------------------------+
|                                                                  |
|   LAYER 2: Distributed Cache (Redis/Memcached)                   |
|   =============================================                  |
|                                                                  |
|   +--------+    +--------+    +---------+    +--------+          |
|   | PEP 1  |--->|        |    |         |<---| PEP 2  |          |
|   +--------+    |  Redis |<---|   PDP   |    +--------+          |
|   +--------+    | Cluster|    | Cluster |    +--------+          |
|   | PEP 3  |--->|        |    |         |<---| PEP 4  |          |
|   +--------+    +--------+    +---------+    +--------+          |
|                                                                  |
|   - Shared cache across all PEPs                                 |
|   - TTL: configurable per policy sensitivity                     |
|   - Pub/Sub for instant invalidation                             |
|                                                                  |
+------------------------------------------------------------------+
|                                                                  |
|   CACHE INVALIDATION STRATEGIES                                  |
|   ==============================                                 |
|                                                                  |
|   1. Time-Based (TTL):                                           |
|      cache.set(key, decision, ttl=300)  # 5 minutes              |
|                                                                  |
|   2. Event-Based (Push Invalidation):                            |
|      on_policy_change -> redis.publish("invalidate", policy_id)  |
|      on_session_end -> redis.delete(session_cache_key)           |
|                                                                  |
|   3. Version-Based:                                              |
|      cache_key = f"{subject}:{action}:{resource}:v{policy_ver}"  |
|      # Old keys expire, new keys created on policy update        |
|                                                                  |
|   4. Hierarchical:                                               |
|      cache["org:acme:*"]           -> clear all Acme policies    |
|      cache["org:acme:user:alice"]  -> clear Alice's cache only   |
|                                                                  |
+------------------------------------------------------------------+
|                                                                  |
|   CACHE CONSISTENCY vs LATENCY TRADEOFF                          |
|   ======================================                         |
|                                                                  |
|   High Sensitivity Resource (PII, Financial):                    |
|   - TTL: 0 (no cache) or 30 seconds max                          |
|   - Immediate invalidation on policy change                      |
|   - Accept higher latency for security                           |
|                                                                  |
|   Medium Sensitivity Resource (Internal Docs):                   |
|   - TTL: 5 minutes                                               |
|   - Eventual consistency acceptable                              |
|                                                                  |
|   Low Sensitivity Resource (Public API):                         |
|   - TTL: 30 minutes                                              |
|   - Stale reads acceptable                                       |
|                                                                  |
+------------------------------------------------------------------+
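
Version-based invalidation and per-sensitivity TTLs combine naturally, as the sketch below tries to show. The class name, the "high/medium/low" tier labels, and the choice to treat an unknown tier as uncacheable are assumptions. Unlike a store with native expiry such as Redis, this in-memory dictionary never purges old entries; it only makes them unreachable.

```python
import time

# TTLs in seconds, taken from the tiers above (30 s is the upper bound for high).
TTL_BY_SENSITIVITY = {"high": 30, "medium": 300, "low": 1800}


class VersionedDecisionCache:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._items = {}
        self.policy_version = 1

    def _key(self, subject, action, resource):
        return f"{subject}:{action}:{resource}:v{self.policy_version}"

    def put(self, subject, action, resource, decision, sensitivity):
        # Unknown sensitivity gets TTL 0, i.e. it is never served from cache.
        ttl = TTL_BY_SENSITIVITY.get(sensitivity, 0)
        key = self._key(subject, action, resource)
        self._items[key] = (decision, self._clock() + ttl)

    def get(self, subject, action, resource):
        item = self._items.get(self._key(subject, action, resource))
        if item is None or self._clock() >= item[1]:
            return None
        return item[0]

    def on_policy_change(self):
        """Bump the version: entries under old keys become unreachable."""
        self.policy_version += 1
```

The appeal of the version bump is that invalidation is a single integer increment with no need to enumerate keys; the cost is a cold cache for every subject after each policy update.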

Complete Project Specification

Request/Response JSON Schemas

Authorization Request Schema

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "AuthorizationRequest",
  "type": "object",
  "required": ["subject", "action", "resource"],
  "properties": {
    "request_id": {
      "type": "string",
      "description": "Unique identifier for this request (for tracing)"
    },
    "subject": {
      "type": "object",
      "description": "The entity requesting access",
      "required": ["id"],
      "properties": {
        "id": { "type": "string" },
        "type": { "type": "string", "enum": ["user", "service", "device"] },
        "roles": { "type": "array", "items": { "type": "string" } },
        "groups": { "type": "array", "items": { "type": "string" } },
        "attributes": { "type": "object" },
        "device_health": {
          "type": "string",
          "enum": ["secure", "at_risk", "compromised", "unknown"]
        },
        "mfa_verified": { "type": "boolean" },
        "session_age_seconds": { "type": "integer" }
      }
    },
    "action": {
      "type": "string",
      "description": "The operation being requested"
    },
    "resource": {
      "type": "object",
      "description": "The target of the action",
      "required": ["id"],
      "properties": {
        "id": { "type": "string" },
        "type": { "type": "string" },
        "owner": { "type": "string" },
        "sensitivity": {
          "type": "string",
          "enum": ["public", "internal", "confidential", "critical"]
        },
        "attributes": { "type": "object" }
      }
    },
    "environment": {
      "type": "object",
      "description": "Contextual information about the request",
      "properties": {
        "timestamp": { "type": "string", "format": "date-time" },
        "ip_address": { "type": "string" },
        "location": { "type": "string" },
        "network_type": {
          "type": "string",
          "enum": ["corporate", "vpn", "public", "unknown"]
        },
        "user_agent": { "type": "string" }
      }
    }
  }
}

Authorization Response Schema

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "AuthorizationResponse",
  "type": "object",
  "required": ["decision", "request_id", "evaluated_at"],
  "properties": {
    "decision": {
      "type": "string",
      "enum": ["ALLOW", "DENY"]
    },
    "request_id": {
      "type": "string",
      "description": "Echo of the request ID for correlation; generated by the PDP when the request omits it"
    },
    "reason": {
      "type": "string",
      "description": "Human-readable explanation of the decision"
    },
    "matched_policy": {
      "type": "string",
      "description": "ID of the policy that determined this decision"
    },
    "evaluated_at": {
      "type": "string",
      "format": "date-time"
    },
    "evaluation_time_ms": {
      "type": "number",
      "description": "Time taken to evaluate (for performance monitoring)"
    },
    "obligations": {
      "type": "array",
      "description": "Actions the PEP must take if allowing (e.g., log, notify)",
      "items": {
        "type": "object",
        "properties": {
          "action": { "type": "string" },
          "parameters": { "type": "object" }
        }
      }
    }
  }
}
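
The two schemas can be exercised with a small amount of hand-written code before bringing in a full JSON Schema validator. The sketch below checks only the `required` fields of the request and assembles a response carrying the three required response fields. The function names are hypothetical, and generating a UUID when `request_id` is absent is an assumption that reconciles an optional request field with a required response field.

```python
import time
import uuid
from datetime import datetime, timezone


def validate_request(req: dict) -> list:
    """Return a list of problems; empty means the required fields are present."""
    errors = []
    for field in ("subject", "action", "resource"):
        if field not in req:
            errors.append(f"missing required field: {field}")
    for field in ("subject", "resource"):
        if isinstance(req.get(field), dict) and "id" not in req[field]:
            errors.append(f"missing required field: {field}.id")
    if "action" in req and not isinstance(req["action"], str):
        errors.append("action must be a string")
    return errors


def build_response(req, decision, reason, matched_policy=None, started=None):
    """Assemble a response with the three required fields plus optional detail."""
    if decision not in ("ALLOW", "DENY"):
        raise ValueError("decision must be ALLOW or DENY")
    response = {
        "decision": decision,
        # Assumption: generate an ID when the caller did not send one.
        "request_id": req.get("request_id") or str(uuid.uuid4()),
        "reason": reason,
        "evaluated_at": datetime.now(timezone.utc).isoformat(),
    }
    if matched_policy is not None:
        response["matched_policy"] = matched_policy
    if started is not None:
        # `started` is a time.perf_counter() reading taken before evaluation.
        response["evaluation_time_ms"] = (time.perf_counter() - started) * 1000.0
    return response
```

A request that fails `validate_request` is a natural candidate for a DENY response with the error list as the reason, which keeps malformed input on the fail-closed path.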

Policy Rule Format Specification

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Policy",
  "type": "object",
  "required": ["id", "effect"],
  "properties": {
    "id": {
      "type": "string",
      "description": "Unique policy identifier"
    },
    "name": {
      "type": "string",
      "description": "Human-readable policy name"
    },
    "description": {
      "type": "string"
    },
    "effect": {
      "type": "string",
      "enum": ["allow", "deny"]
    },
    "priority": {
      "type": "integer",
      "default": 100,
      "description": "Higher priority policies are evaluated first"
    },
    "subjects": {
      "type": "object",
      "description": "Conditions on the requesting entity",
      "properties": {
        "ids": { "type": "array", "items": { "type": "string" } },
        "roles": { "type": "array", "items": { "type": "string" } },
        "groups": { "type": "array", "items": { "type": "string" } },
        "types": { "type": "array", "items": { "type": "string" } },
        "attributes": {
          "type": "object",
          "additionalProperties": true
        }
      }
    },
    "actions": {
      "type": "array",
      "items": { "type": "string" },
      "description": "List of actions this policy applies to"
    },
    "resources": {
      "type": "object",
      "description": "Conditions on the target resource",
      "properties": {
        "ids": { "type": "array", "items": { "type": "string" } },
        "types": { "type": "array", "items": { "type": "string" } },
        "owners": { "type": "array", "items": { "type": "string" } },
        "sensitivity": { "type": "array", "items": { "type": "string" } },
        "attributes": {
          "type": "object",
          "additionalProperties": true
        }
      }
    },
    "conditions": {
      "type": "object",
      "description": "Additional conditions that must be met",
      "properties": {
        "time_range": {
          "type": "object",
          "properties": {
            "start": { "type": "string", "pattern": "^[0-2][0-9]:[0-5][0-9]$" },
            "end": { "type": "string", "pattern": "^[0-2][0-9]:[0-5][0-9]$" },
            "timezone": { "type": "string" },
            "days": {
              "type": "array",
              "items": { "type": "string", "enum": ["Mon","Tue","Wed","Thu","Fri","Sat","Sun"] }
            }
          }
        },
        "device_health": {
          "type": "array",
          "items": { "type": "string", "enum": ["secure", "at_risk"] }
        },
        "mfa_required": { "type": "boolean" },
        "network_types": {
          "type": "array",
          "items": { "type": "string" }
        },
        "max_session_age_seconds": { "type": "integer" },
        "custom": {
          "type": "object",
          "description": "Custom JSONPath-based conditions"
        }
      }
    },
    "obligations": {
      "type": "array",
      "description": "Actions to take when this policy matches",
      "items": {
        "type": "object",
        "properties": {
          "on": { "type": "string", "enum": ["allow", "deny", "both"] },
          "action": { "type": "string" },
          "parameters": { "type": "object" }
        }
      }
    }
  }
}

Example Policies for Common Scenarios

{
  "policies": [
    {
      "id": "admin-full-access",
      "name": "Administrators have full access",
      "description": "Global admin override - use with caution",
      "effect": "allow",
      "priority": 1000,
      "subjects": {
        "roles": ["admin", "super-admin"]
      },
      "actions": ["*"],
      "resources": {}
    },
    {
      "id": "dev-push-business-hours",
      "name": "Developers can push during business hours",
      "effect": "allow",
      "priority": 100,
      "subjects": {
        "roles": ["developer", "sre"]
      },
      "actions": ["push", "merge"],
      "resources": {
        "types": ["repository"]
      },
      "conditions": {
        "time_range": {
          "start": "08:00",
          "end": "20:00",
          "timezone": "America/New_York",
          "days": ["Mon", "Tue", "Wed", "Thu", "Fri"]
        },
        "device_health": ["secure"]
      }
    },
    {
      "id": "block-critical-after-hours",
      "name": "Block access to critical resources after hours",
      "effect": "deny",
      "priority": 200,
      "subjects": {},
      "actions": ["*"],
      "resources": {
        "sensitivity": ["critical"]
      },
      "conditions": {
        "time_range": {
          "start": "22:00",
          "end": "06:00"
        }
      }
    },
    {
      "id": "require-mfa-for-sensitive",
      "name": "Require MFA for confidential resources",
      "effect": "deny",
      "priority": 300,
      "subjects": {
        "attributes": {
          "mfa_verified": false
        }
      },
      "actions": ["*"],
      "resources": {
        "sensitivity": ["confidential", "critical"]
      },
      "obligations": [
        {
          "on": "deny",
          "action": "require_mfa",
          "parameters": { "redirect": "/auth/mfa" }
        }
      ]
    },
    {
      "id": "service-mesh-internal",
      "name": "Services can communicate within mesh",
      "effect": "allow",
      "priority": 50,
      "subjects": {
        "types": ["service"],
        "groups": ["internal-services"]
      },
      "actions": ["call", "invoke"],
      "resources": {
        "types": ["service", "api"]
      },
      "conditions": {
        "network_types": ["corporate", "vpn"]
      }
    },
    {
      "id": "compromised-device-block",
      "name": "Block all access from compromised devices",
      "effect": "deny",
      "priority": 999,
      "subjects": {
        "attributes": {
          "device_health": "compromised"
        }
      },
      "actions": ["*"],
      "resources": {},
      "obligations": [
        {
          "on": "deny",
          "action": "alert_security_team",
          "parameters": { "severity": "high" }
        }
      ]
    }
  ]
}
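
To make the format concrete, here is a minimal Go sketch of loading a policy document like the one above. It is an illustration under stated assumptions, not a reference implementation: the Policy struct covers only a few schema fields, parsePolicies is a hypothetical helper name, and the "no-priority" entry is invented purely to show the schema default of 100 being applied.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Policy mirrors a subset of the policy schema; subjects, resources,
// conditions and obligations are omitted to keep the sketch short.
type Policy struct {
	ID       string   `json:"id"`
	Name     string   `json:"name"`
	Effect   string   `json:"effect"`
	Priority *int     `json:"priority"` // pointer so "absent" differs from 0
	Actions  []string `json:"actions"`
}

type policyFile struct {
	Policies []Policy `json:"policies"`
}

// parsePolicies (hypothetical helper) decodes a policy document, checks the
// required fields, applies the default priority of 100, and returns the
// policies sorted by priority, highest first.
func parsePolicies(data []byte) ([]Policy, error) {
	var f policyFile
	if err := json.Unmarshal(data, &f); err != nil {
		return nil, fmt.Errorf("parse policies: %w", err)
	}
	seen := make(map[string]bool)
	for i := range f.Policies {
		p := &f.Policies[i]
		if p.ID == "" {
			return nil, fmt.Errorf("policy %d: missing id", i)
		}
		if seen[p.ID] {
			return nil, fmt.Errorf("policy %q: duplicate id", p.ID)
		}
		seen[p.ID] = true
		if p.Effect != "allow" && p.Effect != "deny" {
			return nil, fmt.Errorf("policy %q: invalid effect %q", p.ID, p.Effect)
		}
		if p.Priority == nil {
			def := 100
			p.Priority = &def
		}
	}
	sort.SliceStable(f.Policies, func(i, j int) bool {
		return *f.Policies[i].Priority > *f.Policies[j].Priority
	})
	return f.Policies, nil
}

func main() {
	doc := []byte(`{"policies":[
	  {"id":"service-mesh-internal","effect":"allow","priority":50},
	  {"id":"admin-full-access","effect":"allow","priority":1000},
	  {"id":"no-priority","effect":"deny"}
	]}`)
	policies, err := parsePolicies(doc)
	if err != nil {
		panic(err)
	}
	for _, p := range policies {
		fmt.Println(*p.Priority, p.ID)
	}
}
```

Rejecting the whole file on the first invalid policy is a deliberate choice in this sketch: a half-loaded policy set is harder to reason about than a failed reload that leaves the previous set active.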

Performance Requirements

Metric	Target	Acceptable	Unacceptable
P50 Latency	< 2ms	< 5ms	> 10ms
P99 Latency	< 10ms	< 25ms	> 50ms
Throughput	> 10,000 req/s	> 5,000 req/s	< 1,000 req/s
Error Rate	< 0.01%	< 0.1%	> 1%
Policy Reload	< 100ms	< 500ms	> 1s

Real World Outcome

By the end of this project, you will have a high-performance authorization microservice. You can integrate this with the Proxy from Project 1 to create a complete Zero Trust flow.

What you will see:

A REST API: Listening on port 9090
Dynamic Policy Loading: Change the JSON file on disk and the PDP picks up the new rules without a restart, either through a file watcher or through the admin reload endpoint
Detailed Decision Logs: The PDP prints why it allowed or denied a request, which is essential for security auditing

Command-Line Examples

# 1. Start your Policy Engine
$ ./zta-pdp --policy-file ./policies.json --port 9090
[INFO] PDP v1.0.0 starting...
[INFO] Loading policies from ./policies.json
[INFO] Loaded 6 security policies
[INFO] Policy Engine listening on :9090
[INFO] Healthcheck endpoint: GET /health
[INFO] Decision endpoint: POST /v1/decide

# 2. Check health status
$ curl http://localhost:9090/health
{
  "status": "healthy",
  "policies_loaded": 6,
  "uptime_seconds": 45,
  "version": "1.0.0"
}

# 3. Developer pushing to repo during business hours (ALLOWED)
$ curl -s -X POST http://localhost:9090/v1/decide \
  -H "Content-Type: application/json" \
  -d '{
    "request_id": "req-001",
    "subject": {
      "id": "alice@example.com",
      "type": "user",
      "roles": ["developer"],
      "device_health": "secure"
    },
    "action": "push",
    "resource": {
      "id": "kernel-repo",
      "type": "repository",
      "sensitivity": "internal"
    },
    "environment": {
      "timestamp": "2024-12-26T14:00:00Z",
      "network_type": "corporate"
    }
  }' | jq .

{
  "decision": "ALLOW",
  "request_id": "req-001",
  "reason": "Matched policy 'dev-push-business-hours': Developers can push during business hours",
  "matched_policy": "dev-push-business-hours",
  "evaluated_at": "2024-12-26T14:00:01.234Z",
  "evaluation_time_ms": 0.45
}

# 4. Same developer pushing at 3 AM (DENIED)
$ curl -s -X POST http://localhost:9090/v1/decide \
  -H "Content-Type: application/json" \
  -d '{
    "request_id": "req-002",
    "subject": {
      "id": "alice@example.com",
      "type": "user",
      "roles": ["developer"],
      "device_health": "secure"
    },
    "action": "push",
    "resource": {
      "id": "kernel-repo",
      "type": "repository",
      "sensitivity": "critical"
    },
    "environment": {
      "timestamp": "2024-12-26T03:00:00Z",
      "network_type": "vpn"
    }
  }' | jq .

{
  "decision": "DENY",
  "request_id": "req-002",
  "reason": "Matched policy 'block-critical-after-hours': Block access to critical resources after hours",
  "matched_policy": "block-critical-after-hours",
  "evaluated_at": "2024-12-26T03:00:01.567Z",
  "evaluation_time_ms": 0.38
}

# 5. User without MFA accessing confidential resource (DENIED with obligation)
$ curl -s -X POST http://localhost:9090/v1/decide \
  -H "Content-Type: application/json" \
  -d '{
    "request_id": "req-003",
    "subject": {
      "id": "bob@example.com",
      "type": "user",
      "roles": ["viewer"],
      "mfa_verified": false
    },
    "action": "read",
    "resource": {
      "id": "financial-reports",
      "type": "document",
      "sensitivity": "confidential"
    },
    "environment": {
      "timestamp": "2024-12-26T10:00:00Z"
    }
  }' | jq .

{
  "decision": "DENY",
  "request_id": "req-003",
  "reason": "Matched policy 'require-mfa-for-sensitive': Require MFA for confidential resources",
  "matched_policy": "require-mfa-for-sensitive",
  "evaluated_at": "2024-12-26T10:00:01.890Z",
  "evaluation_time_ms": 0.52,
  "obligations": [
    {
      "action": "require_mfa",
      "parameters": { "redirect": "/auth/mfa" }
    }
  ]
}

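# 6. Admin accessing everything (ALLOWED - admin override)
# Assumes priority-based resolution in which the priority-1000 admin policy wins.
# Under strict deny-overrides, 'block-critical-after-hours' also matches this
# request (critical resource at 03:00) and the decision would be DENY.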
$ curl -s -X POST http://localhost:9090/v1/decide \
  -H "Content-Type: application/json" \
  -d '{
    "request_id": "req-004",
    "subject": {
      "id": "charlie@example.com",
      "type": "user",
      "roles": ["admin"]
    },
    "action": "delete",
    "resource": {
      "id": "production-database",
      "type": "database",
      "sensitivity": "critical"
    },
    "environment": {
      "timestamp": "2024-12-26T03:00:00Z"
    }
  }' | jq .

{
  "decision": "ALLOW",
  "request_id": "req-004",
  "reason": "Matched policy 'admin-full-access': Administrators have full access",
  "matched_policy": "admin-full-access",
  "evaluated_at": "2024-12-26T03:00:01.234Z",
  "evaluation_time_ms": 0.28
}

# 7. Hot reload policies (update JSON file, then trigger reload)
$ curl -X POST http://localhost:9090/admin/reload-policies
{
  "status": "reloaded",
  "policies_loaded": 7,
  "reload_time_ms": 45.2
}

# 8. View audit log (last 10 decisions)
$ curl -s "http://localhost:9090/admin/audit?limit=10" | jq .
{
  "decisions": [
    {
      "request_id": "req-004",
      "subject_id": "charlie@example.com",
      "action": "delete",
      "resource_id": "production-database",
      "decision": "ALLOW",
      "matched_policy": "admin-full-access",
      "timestamp": "2024-12-26T03:00:01.234Z"
    },
    ...
  ]
}

The Core Question You’re Answering

“How do I build a system that can answer ‘Is this user allowed to do this action on this resource?’ in milliseconds, with policies that are flexible, auditable, and maintainable?”

This question sits at the heart of every secure system. Every time someone clicks “Submit,” every API call, every database query - somewhere, something must decide: allowed or denied. Traditional approaches scatter this logic across codebases, making security audits nightmares and policy changes dangerous multi-week projects. A well-designed Policy Decision Engine centralizes this critical logic, making authorization decisions consistent, traceable, and adaptable to changing business requirements without touching application code.

Concepts You Must Understand First

Before writing any code, you need to internalize these foundational concepts. For each one, make sure you can answer the associated questions.

1. Policy Decision Point (PDP) Architecture

The question to answer: What is the PDP’s role in the request lifecycle, and why must it be stateless?

The PDP is the “brain” that evaluates authorization requests. It receives context (who, what, where, when) and returns a decision. Understanding why this component must be isolated, fast, and horizontally scalable is essential.

How does the PDP differ from a Policy Enforcement Point (PEP)?
Why should the PDP never make network calls to enforce decisions?
What happens to your system’s reliability if the PDP becomes a single point of failure?

Read: “Zero Trust Networks” by Gilman & Barth, Chapter 3: The Zero Trust Control Plane - specifically the sections on policy architecture and the separation of concerns between decision and enforcement.

2. Access Control Models: ABAC vs RBAC vs PBAC

The question to answer: When is RBAC sufficient, and when does the complexity of ABAC become necessary?

Role-Based Access Control (RBAC) assigns permissions to roles, then assigns roles to users. Attribute-Based Access Control (ABAC) evaluates policies based on arbitrary attributes of subjects, resources, actions, and context. Policy-Based Access Control (PBAC) uses explicit policy statements that can combine elements of both.

What is “role explosion” and why does it plague large RBAC systems?
How would you express “contractors can only access non-production resources during business hours” in RBAC vs ABAC?
What attributes beyond “role” might influence an authorization decision?

Read: “Security in Computing” by Charles Pfleeger, Chapter 4: Access Control - covers the theoretical foundations of access control matrices, capabilities, and access control lists.

3. Policy Languages: Rego, Cedar, and XACML Concepts

The question to answer: What makes a policy language expressive enough to capture real security requirements, yet analyzable enough to prove properties about the policies?

Production systems use specialized Domain-Specific Languages (DSLs) for expressing policies. Understanding why these exist - and their tradeoffs - helps you design a sensible policy format even if you start with JSON.

What is Datalog, and why do languages like Rego build on it?
How does Cedar’s type system help prevent policy errors?
Why is it valuable to be able to answer “Can this policy ever allow access to resource X?”

Read: “Zero Trust Networks” by Gilman & Barth, Chapter 3 - discusses policy engines and their role. Additionally, explore the Open Policy Agent (OPA) documentation on Rego’s evaluation model.

4. Rule Evaluation and Conflict Resolution

The question to answer: When Policy A says ALLOW and Policy B says DENY, what should happen and why?

Real systems have many policies, and they overlap. Multiple policies might apply to a single request. The conflict resolution strategy you choose fundamentally affects your security posture.

What does “deny overrides” mean, and why is it the most common strategy?
How does policy priority ordering work, and when is it necessary?
What is the “default deny” principle, and how does it relate to fail-closed behavior?

Read: “Foundations of Information Security” by Jason Andress, Chapter 5: Authentication and Authorization - covers authorization principles and the logic behind access decisions.
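
As a rough illustration of the deny-overrides strategy, the Go sketch below resolves a set of policies that have already matched a request. The Policy and Decision types and the resolve function are hypothetical names chosen for this example; the two policies in main reuse IDs and priorities from the example policies earlier in this guide.

```go
package main

import "fmt"

type Policy struct {
	ID       string
	Effect   string // "allow" or "deny"
	Priority int
}

type Decision struct {
	Decision      string // "ALLOW" or "DENY"
	MatchedPolicy string // empty when nothing matched
	Reason        string
}

// resolve applies deny-overrides to policies that already matched the request
// and passed their conditions. Priority never flips the outcome here; it only
// selects which policy is reported as the matched one.
func resolve(matching []Policy) Decision {
	var bestDeny, bestAllow *Policy
	for i := range matching {
		p := &matching[i]
		switch p.Effect {
		case "deny":
			if bestDeny == nil || p.Priority > bestDeny.Priority {
				bestDeny = p
			}
		case "allow":
			if bestAllow == nil || p.Priority > bestAllow.Priority {
				bestAllow = p
			}
		}
	}
	switch {
	case bestDeny != nil:
		return Decision{Decision: "DENY", MatchedPolicy: bestDeny.ID, Reason: "denied by " + bestDeny.ID}
	case bestAllow != nil:
		return Decision{Decision: "ALLOW", MatchedPolicy: bestAllow.ID, Reason: "allowed by " + bestAllow.ID}
	default:
		// Default deny: no policy matched, so the request fails closed.
		return Decision{Decision: "DENY", Reason: "No matching policy found"}
	}
}

func main() {
	matching := []Policy{
		{ID: "admin-full-access", Effect: "allow", Priority: 1000},
		{ID: "block-critical-after-hours", Effect: "deny", Priority: 200},
	}
	fmt.Printf("%+v\n", resolve(matching))
	fmt.Printf("%+v\n", resolve(nil))
}
```

Note that the allow policy loses even though its priority is higher. If you want a high-priority allow to beat a lower-priority deny, you are describing a different strategy (priority-first, or first-applicable), and you should pick one explicitly rather than mixing the two.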

5. Caching Strategies for Policy Decisions

The question to answer: How do you cache authorization decisions without creating security holes?

Every millisecond counts when every request requires an authorization check. Caching can reduce latency by 10-100x, but caching security decisions introduces risks: stale permissions, delayed revocation, cache poisoning.

What should the cache key be for an authorization decision?
When a user’s permissions are revoked, how quickly must cached decisions be invalidated?
What is cache stampede, and how do you prevent it during policy reloads?

Read: “Designing Data-Intensive Applications, 2nd Ed” by Martin Kleppmann, Chapters 1-2 - covers caching patterns, consistency tradeoffs, and the challenges of distributed state.

6. Audit Logging for Compliance

The question to answer: What information must you log to reconstruct exactly why access was granted or denied?

Authorization decisions are among the most security-sensitive events in a system. Regulators, auditors, and incident responders need to understand who accessed what, when, and why.

What is the difference between an audit log and an application log?
Why must audit logs be append-only and tamper-evident?
What fields are essential in an authorization audit record?

Read: “Security in Computing” by Charles Pfleeger, sections on auditing and accountability - discusses what makes an audit trail useful for security analysis.
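
One common way to make a log tamper-evident is a hash chain, where each record stores the hash of the record before it. The Go sketch below shows the idea with the fields from the audit log example earlier in this guide; the PrevHash and Hash fields and the function names are assumptions added for illustration, not part of the API described above.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// AuditRecord carries the audit fields plus two hypothetical chaining fields.
type AuditRecord struct {
	RequestID     string `json:"request_id"`
	SubjectID     string `json:"subject_id"`
	Action        string `json:"action"`
	ResourceID    string `json:"resource_id"`
	Decision      string `json:"decision"`
	MatchedPolicy string `json:"matched_policy"`
	Timestamp     string `json:"timestamp"`
	PrevHash      string `json:"prev_hash"`
	Hash          string `json:"hash,omitempty"`
}

// digest hashes the record with its own Hash field blanked out.
func digest(r AuditRecord) string {
	r.Hash = ""
	b, _ := json.Marshal(r) // a struct of strings cannot fail to marshal
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// appendRecord links r to the current tail of the log and returns the new log.
func appendRecord(log []AuditRecord, r AuditRecord) []AuditRecord {
	r.PrevHash = ""
	if len(log) > 0 {
		r.PrevHash = log[len(log)-1].Hash
	}
	r.Hash = digest(r)
	return append(log, r)
}

// verify returns the index of the first record whose hash or link is broken,
// or -1 if the whole chain is intact.
func verify(log []AuditRecord) int {
	prev := ""
	for i, r := range log {
		if r.PrevHash != prev || r.Hash != digest(r) {
			return i
		}
		prev = r.Hash
	}
	return -1
}

func main() {
	var log []AuditRecord
	log = appendRecord(log, AuditRecord{RequestID: "req-003", SubjectID: "bob@example.com", Decision: "DENY"})
	log = appendRecord(log, AuditRecord{RequestID: "req-004", SubjectID: "charlie@example.com", Decision: "ALLOW"})
	fmt.Println("intact:", verify(log) == -1)

	log[0].Decision = "ALLOW" // simulate someone editing history
	fmt.Println("first broken record:", verify(log))
}
```

A chain like this only detects edits if the latest hash is also stored somewhere the attacker cannot rewrite, for example a separate system or a periodically published checkpoint.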

Questions to Guide Your Design

These questions should shape your implementation decisions. Answer them before and during your build.

Architecture Questions

Where does your PDP fit? Draw a diagram showing how requests flow from a user through authentication, to your PDP, to the protected resource. Where is the PEP in this flow?
What is your API contract? What fields are required in an authorization request? What does a response look like? What HTTP status codes do you return?
How do you handle unknown fields? If a request includes an attribute your PDP doesn’t understand, do you ignore it, reject the request, or log a warning?
What is your failure mode? If your PDP crashes, throws an exception, or times out, is the request allowed or denied?
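
One possible answer to the failure-mode question is "fail closed" on the PEP side. The Go sketch below shows a caller that treats every failure as DENY. The function name decideOrDeny, the millisecond timeout parameter, and the unreachable address used in main are assumptions for illustration; the /v1/decide path and the "decision" field come from the examples above.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

type decisionResponse struct {
	Decision string `json:"decision"`
}

// decideOrDeny posts body to the PDP and returns "ALLOW" only when the PDP
// answers 200 with decision "ALLOW" before the timeout. Every other outcome
// (network error, timeout, non-200 status, malformed JSON) becomes "DENY".
func decideOrDeny(url string, body []byte, timeoutMS int) string {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(timeoutMS)*time.Millisecond)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return "DENY"
	}
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "DENY"
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return "DENY"
	}
	var d decisionResponse
	if err := json.NewDecoder(resp.Body).Decode(&d); err != nil {
		return "DENY"
	}
	if d.Decision == "ALLOW" {
		return "ALLOW"
	}
	return "DENY"
}

func main() {
	body := []byte(`{"request_id":"req-001","action":"push"}`)
	// Nothing is expected to listen on this port, so the call fails closed.
	fmt.Println(decideOrDeny("http://127.0.0.1:1/v1/decide", body, 200))
}
```

The timeout here is the PDP's share of the overall latency budget. Failing closed trades availability for safety, which is why the PDP itself needs redundancy if you choose this mode.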

Policy Design Questions

How expressive is your policy language? Can you express “Developers can push to non-production repos during NYC business hours from devices that passed health checks in the last 24 hours”?
How do you handle wildcards? Does actions: ["*"] match any action? Does resources: {} (empty object) match all resources or no resources?
What happens when no policy matches? Is this an implicit deny, or an error condition?
How do you test policies before deploying them? Can you simulate a decision without affecting production?

Performance Questions

What is your latency budget? If you have 100ms for the entire request, how much can the PDP consume?
How do you scale? Can you run multiple PDP instances? Do they share state?
What can you precompute? Can you compile policies into faster data structures at load time?
What can you cache? Are there request patterns that repeat frequently enough to cache?

Operational Questions

How do you update policies without downtime? What happens to in-flight requests during a policy reload?
How do you know your PDP is healthy? What metrics do you expose? What does a health check verify?
How do you debug a wrong decision? Can you replay a request and see exactly which policy matched and why?

Thinking Exercise

Before you write code, work through this scenario with pencil and paper. This exercises your understanding of policy evaluation.

Scenario: Multi-Policy Evaluation

You have four policies loaded in your PDP:

Policy A (priority: 100, effect: ALLOW)
  - subjects: roles contain "developer"
  - actions: ["read", "push"]
  - resources: type = "repository"
  - conditions: none

Policy B (priority: 200, effect: DENY)
  - subjects: all
  - actions: ["push", "merge", "delete"]
  - resources: sensitivity = "critical"
  - conditions: time outside 08:00-18:00

Policy C (priority: 150, effect: ALLOW)
  - subjects: roles contain "admin"
  - actions: ["*"]
  - resources: all
  - conditions: none

Policy D (priority: 300, effect: DENY)
  - subjects: device_health = "compromised"
  - actions: ["*"]
  - resources: all
  - conditions: none

Now trace through these authorization requests. For each one, identify:

Which policies match the subject, action, and resource?
Which of those also pass their condition checks?
Using “deny overrides” with priority ordering, what is the final decision?

Request 1:

{
  "subject": { "id": "alice", "roles": ["developer"], "device_health": "secure" },
  "action": "push",
  "resource": { "id": "web-app", "type": "repository", "sensitivity": "internal" },
  "environment": { "timestamp": "2024-12-26T14:00:00Z" }
}

Request 2:

{
  "subject": { "id": "bob", "roles": ["developer"], "device_health": "secure" },
  "action": "push",
  "resource": { "id": "payment-service", "type": "repository", "sensitivity": "critical" },
  "environment": { "timestamp": "2024-12-26T22:00:00Z" }
}

Request 3:

{
  "subject": { "id": "charlie", "roles": ["admin"], "device_health": "secure" },
  "action": "delete",
  "resource": { "id": "payment-service", "type": "repository", "sensitivity": "critical" },
  "environment": { "timestamp": "2024-12-26T22:00:00Z" }
}

Request 4:

{
  "subject": { "id": "diana", "roles": ["admin"], "device_health": "compromised" },
  "action": "read",
  "resource": { "id": "docs", "type": "document", "sensitivity": "public" },
  "environment": { "timestamp": "2024-12-26T10:00:00Z" }
}

Work through each request step by step. Write down your reasoning. Then check your answers:

Request 1: ALLOW (via Policy A)

Policy A matches: developer, push, repository - no conditions - ALLOW
Policy B: matches push, but sensitivity is “internal” not “critical” - no match
Policy C: not admin - no match
Policy D: device not compromised - no match
Only Policy A applies. Decision: ALLOW

Request 2: DENY (via Policy B)

Policy A: matches subject, action, resource - ALLOW candidate
Policy B: matches (push action, critical sensitivity, time 22:00 is outside 08:00-18:00) - DENY candidate
Policy C: not admin - no match
Policy D: device not compromised - no match
Policy B has higher priority (200 > 100) and is DENY. With deny-overrides: DENY

Request 3: DENY (via Policy B)

Policy A: subject is an admin, not a developer, and “delete” is not in [“read”, “push”] - no match
Policy B: delete is in [“push”, “merge”, “delete”], sensitivity is critical, time 22:00 is outside hours - DENY candidate
Policy C: admin, wildcard action - ALLOW candidate
Policy D: device not compromised - no match
Both B and C match. B has priority 200, C has priority 150. But this is deny-overrides, so ANY deny causes denial regardless of priority. Decision: DENY

Request 4: DENY (via Policy D)

Policy A: admin not in [“developer”] - no match
Policy B: action “read” not in [“push”, “merge”, “delete”] - no match
Policy C: admin, wildcard action - ALLOW candidate
Policy D: device_health is compromised - DENY candidate
Both C and D match. D has higher priority (300) and is DENY. With deny-overrides: DENY

If your answers differed, review the matching logic. Pay special attention to:

Whether empty conditions mean “always true” or “never match”
How “deny overrides” interacts with priority
The difference between “no match” and “match with opposite effect”

Hints in Layers

If you get stuck, reveal hints progressively. Try to solve problems yourself first.

Hint 1: Start With the Simplest Possible Version

Don’t start with JSON policy files. Start with hardcoded policies in your code. Create a /v1/decide endpoint that:

Accepts a POST request with JSON body
Parses subject, action, and resource from the body
Has ONE hardcoded rule: “if subject.roles contains ‘admin’, return ALLOW”
Returns DENY for everything else

Once this works, you’ve proven your HTTP layer, JSON parsing, and decision response format. Only then add policy loading.
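
A minimal Go sketch of this first version might look like the following. The type names and reason strings are assumptions for illustration, and the `slices` package requires Go 1.21 or later. To keep the example self-terminating, main exercises the handler in-process with httptest instead of opening a port.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"slices"
	"strings"
)

type Subject struct {
	ID    string   `json:"id"`
	Roles []string `json:"roles"`
}

type Request struct {
	RequestID string  `json:"request_id"`
	Subject   Subject `json:"subject"`
	Action    string  `json:"action"`
}

type Response struct {
	Decision  string `json:"decision"`
	RequestID string `json:"request_id"`
	Reason    string `json:"reason"`
}

// decide holds the single hardcoded rule: admins are allowed, everyone else is denied.
func decide(req Request) Response {
	if slices.Contains(req.Subject.Roles, "admin") {
		return Response{Decision: "ALLOW", RequestID: req.RequestID, Reason: "subject has role admin"}
	}
	return Response{Decision: "DENY", RequestID: req.RequestID, Reason: "no rule allows this request"}
}

// handleDecide is the HTTP layer: POST only, JSON in, JSON out.
func handleDecide(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	var req Request
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, "invalid JSON body", http.StatusBadRequest)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	_ = json.NewEncoder(w).Encode(decide(req))
}

func main() {
	// Exercise the handler in-process. A real server would instead register it
	// with http.HandleFunc("/v1/decide", handleDecide) and call
	// http.ListenAndServe(":9090", nil).
	body := `{"request_id":"req-004","subject":{"id":"charlie@example.com","roles":["admin"]},"action":"delete"}`
	rec := httptest.NewRecorder()
	handleDecide(rec, httptest.NewRequest(http.MethodPost, "/v1/decide", strings.NewReader(body)))
	fmt.Printf("%d %s", rec.Code, rec.Body.String())
}
```

Keeping decide as a pure function, separate from the handler, pays off later: you can unit-test decisions without HTTP and swap the hardcoded rule for loaded policies without touching the transport code.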

Hint 2: Policy Matching is Just Set Intersection

Most policy matching reduces to: “Does the request attribute intersect with the policy’s allowed set?”

func matchesSubject(request, policy):
    if policy.subjects.roles is empty:
        return true  // Empty means "any"
    return intersection(request.subject.roles, policy.subjects.roles) is not empty

The same pattern applies to actions and resource types. An empty constraint means “match all.” A non-empty constraint means “at least one must match.”

Watch out for the wildcard action "*" - handle it explicitly.
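
The same idea can be sketched in Go, with the wildcard handled explicitly. The flat Request and Policy shapes and the helper names are simplifications assumed for this example, and treating "*" as a wildcard in every constraint (not only actions) is a design choice of the sketch. It needs Go 1.21 or later for the `slices` package.

```go
package main

import (
	"fmt"
	"slices"
)

// matchesAny implements the rule from the hint: an empty constraint matches
// everything, a constraint containing "*" matches everything, and otherwise at
// least one requested value must appear in the allowed set.
func matchesAny(allowed []string, requested ...string) bool {
	if len(allowed) == 0 || slices.Contains(allowed, "*") {
		return true
	}
	for _, r := range requested {
		if slices.Contains(allowed, r) {
			return true
		}
	}
	return false
}

// Request and Policy are flattened for brevity; the real schema nests these
// fields under subject and resource.
type Request struct {
	Roles        []string
	Action       string
	ResourceType string
}

type Policy struct {
	Roles         []string
	Actions       []string
	ResourceTypes []string
}

// matches checks subject, action and resource constraints; conditions are
// evaluated separately.
func matches(req Request, p Policy) bool {
	return matchesAny(p.Roles, req.Roles...) &&
		matchesAny(p.Actions, req.Action) &&
		matchesAny(p.ResourceTypes, req.ResourceType)
}

func main() {
	devPush := Policy{
		Roles:         []string{"developer", "sre"},
		Actions:       []string{"push", "merge"},
		ResourceTypes: []string{"repository"},
	}
	adminAll := Policy{Roles: []string{"admin", "super-admin"}, Actions: []string{"*"}}

	alice := Request{Roles: []string{"developer"}, Action: "push", ResourceType: "repository"}
	fmt.Println(matches(alice, devPush))  // true: role, action and type all intersect
	fmt.Println(matches(alice, adminAll)) // false: alice has no admin role
}
```

Be careful with "empty means any" for deny policies: an accidentally empty constraint widens a policy to everyone, so it is worth validating policies at load time rather than relying on authors to remember the rule.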

Hint 3: Conditions Are Just Boolean Expressions

Each condition type is a function that returns true or false:

func evaluateConditions(request, policy):
    for each condition in policy.conditions:
        if condition is time_range:
            if not isInTimeRange(request.environment.timestamp, condition.start, condition.end):
                return false
        if condition is device_health:
            if request.subject.device_health not in condition.allowed_states:
                return false
        // ... more condition types
    return true  // All conditions passed

Start by supporting just one condition type (time_range is a good first choice). Add more incrementally.
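
The time_range check has two details that are easy to get wrong: converting the request timestamp into the policy's timezone, and handling windows that wrap past midnight such as 22:00-06:00. Here is one way it could look in Go. The function names are hypothetical, treating a missing timezone as UTC is an assumption of this sketch, and the `days` field is left out for brevity.

```go
package main

import (
	"fmt"
	"time"
	_ "time/tzdata" // embed the timezone database so LoadLocation works on minimal hosts
)

// minutesOfDay parses "HH:MM" into minutes since midnight.
func minutesOfDay(hhmm string) (int, error) {
	t, err := time.Parse("15:04", hhmm)
	if err != nil {
		return 0, fmt.Errorf("invalid time %q: %w", hhmm, err)
	}
	return t.Hour()*60 + t.Minute(), nil
}

// inTimeRange reports whether the RFC 3339 timestamp, converted to the given
// IANA timezone, falls in [start, end). When start is later than end the
// window is treated as wrapping past midnight. An empty timezone means UTC.
func inTimeRange(timestamp, start, end, timezone string) (bool, error) {
	ts, err := time.Parse(time.RFC3339, timestamp)
	if err != nil {
		return false, fmt.Errorf("invalid timestamp: %w", err)
	}
	if timezone == "" {
		timezone = "UTC"
	}
	loc, err := time.LoadLocation(timezone)
	if err != nil {
		return false, fmt.Errorf("invalid timezone: %w", err)
	}
	s, err := minutesOfDay(start)
	if err != nil {
		return false, err
	}
	e, err := minutesOfDay(end)
	if err != nil {
		return false, err
	}
	local := ts.In(loc)
	now := local.Hour()*60 + local.Minute()
	if s <= e {
		return now >= s && now < e, nil // same-day window
	}
	return now >= s || now < e, nil // overnight window
}

func main() {
	// 14:00 UTC is 09:00 in New York in December, inside 08:00-20:00.
	fmt.Println(inTimeRange("2024-12-26T14:00:00Z", "08:00", "20:00", "America/New_York"))
	// 03:00 UTC falls inside the overnight window 22:00-06:00.
	fmt.Println(inTimeRange("2024-12-26T03:00:00Z", "22:00", "06:00", ""))
}
```

Returning an error instead of a bare false lets the caller decide how a malformed condition should behave. For an allow policy, treating an error as "condition not met" is the safe default; for a deny policy the safe default is the opposite.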

Hint 4: Use RWMutex for Policy Hot-Reload

The classic reader-writer problem: many goroutines read policies (evaluating decisions), but occasionally one goroutine writes (reloading policies).

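import "sync"

// PolicyStore guards the active policy set so reloads are safe under concurrent reads.
type PolicyStore struct {
    mu       sync.RWMutex
    policies []Policy
}

func (ps *PolicyStore) Evaluate(req Request) Decision {
    ps.mu.RLock()         // Multiple readers can hold this simultaneously
    defer ps.mu.RUnlock()
    // Read from ps.policies safely. evaluate is a placeholder for your own
    // matching and conflict-resolution function.
    return evaluate(ps.policies, req)
}

// Reload swaps in a fully parsed policy set. Parse and validate the file
// before calling this so the write lock is held only for the assignment.
func (ps *PolicyStore) Reload(newPolicies []Policy) {
    ps.mu.Lock()          // Exclusive lock - blocks all readers
    defer ps.mu.Unlock()
    ps.policies = newPolicies
}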

The key insight: RLock doesn’t block other RLocks, so decision evaluation remains parallel. Only during the brief moment of Reload do readers block.

Hint 5: Cache Key Design Matters

A naive cache key might be: hash(entire_request). But this has poor hit rates because timestamps differ on every request.

Better approach: Cache based on the attributes that actually affect the decision:

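func cacheKey(request):
    relevant = {
        "subject_id": request.subject.id,
        "subject_roles": sorted(request.subject.roles),
        // Every attribute a policy can test must be in the key,
        // otherwise a cached ALLOW can outlive the facts it was based on
        "device_health": request.subject.device_health,
        "action": request.action,
        "resource_id": request.resource.id,
        "resource_type": request.resource.type,
        "resource_sensitivity": request.resource.sensitivity,
        // Note: NOT including the full timestamp, only the date and hour
        "hour_bucket": truncate_to_hour(request.environment.timestamp)
    }
    return sha256(json(relevant))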

The “hour bucket” means a cached decision is reused for at most one hour, which fits time-range policies whose boundaries fall on the hour. If your policies use finer boundaries (for example 08:30), shrink the bucket to match.

Also consider: what happens when a user’s roles change? You need a way to invalidate their cached decisions. One pattern: include a “cache version” in the key, and increment it on role changes.
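
Here is one way the key derivation could look in Go, including a version number for invalidation. This is a sketch under assumptions: the flat Request shape, the field set, and the policyVersion parameter are illustrative, and the bucket is computed in UTC with the date included so that day-of-week policies do not share entries across days. It needs Go 1.21 or later for the `slices` package.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"slices"
	"time"
)

// Request is flattened for brevity; Timestamp is an RFC 3339 string.
type Request struct {
	SubjectID, DeviceHealth, Action        string
	ResourceID, ResourceType, Sensitivity string
	Roles                                  []string
	Timestamp                              string
}

// keyFields lists everything the decision may depend on. If a policy can test
// an attribute, that attribute has to appear here.
type keyFields struct {
	PolicyVersion uint64   `json:"policy_version"`
	SubjectID     string   `json:"subject_id"`
	Roles         []string `json:"subject_roles"`
	DeviceHealth  string   `json:"device_health"`
	Action        string   `json:"action"`
	ResourceID    string   `json:"resource_id"`
	ResourceType  string   `json:"resource_type"`
	Sensitivity   string   `json:"resource_sensitivity"`
	HourBucket    string   `json:"hour_bucket"`
}

// cacheKey hashes the decision-relevant attributes. Bumping policyVersion on
// every policy reload or role change makes all older entries unreachable.
func cacheKey(req Request, policyVersion uint64) (string, error) {
	ts, err := time.Parse(time.RFC3339, req.Timestamp)
	if err != nil {
		return "", fmt.Errorf("invalid timestamp: %w", err)
	}
	roles := slices.Clone(req.Roles) // sort a copy so the caller's slice is untouched
	slices.Sort(roles)
	b, err := json.Marshal(keyFields{
		PolicyVersion: policyVersion,
		SubjectID:     req.SubjectID,
		Roles:         roles,
		DeviceHealth:  req.DeviceHealth,
		Action:        req.Action,
		ResourceID:    req.ResourceID,
		ResourceType:  req.ResourceType,
		Sensitivity:   req.Sensitivity,
		HourBucket:    ts.UTC().Format("2006-01-02T15"), // date plus hour
	})
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	req := Request{
		SubjectID: "alice@example.com", Roles: []string{"developer"}, DeviceHealth: "secure",
		Action: "push", ResourceID: "kernel-repo", ResourceType: "repository",
		Sensitivity: "internal", Timestamp: "2024-12-26T14:00:00Z",
	}
	key, err := cacheKey(req, 1)
	fmt.Println(key, err)
}
```

A single global version is the bluntest invalidation tool: every reload empties the cache in effect. A per-subject version is finer-grained but means the PDP has to learn about role changes from the identity store.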

Solution Architecture

Component Diagram

+------------------------------------------------------------------+
|                    PDP Architecture Overview                      |
+------------------------------------------------------------------+
|                                                                  |
|   EXTERNAL                        INTERNAL                       |
|   +------------------+            +---------------------------+  |
|   |  Policy JSON     |            |      Policy Engine        |  |
|   |  File/DB         |----------->|                           |  |
|   +------------------+            |  +---------------------+  |  |
|           ^                       |  | Policy Store        |  |  |
|           |                       |  | (In-Memory + Index) |  |  |
|   +------------------+            |  +---------------------+  |  |
|   | File Watcher /   |            |           |               |  |
|   | Admin API        |------------|           v               |  |
|   +------------------+            |  +---------------------+  |  |
|                                   |  | Rule Evaluator      |  |  |
|   +------------------+            |  | - Condition Matcher |  |  |
|   |  HTTP Server     |----------->|  | - Conflict Resolver |  |  |
|   |  :9090           |            |  +---------------------+  |  |
|   +------------------+            |           |               |  |
|           ^                       |           v               |  |
|           |                       |  +---------------------+  |  |
|   +------------------+            |  | Decision Cache      |  |  |
|   |  PEP / Gateway   |            |  | (Optional Redis)    |  |  |
|   |  (Project 1)     |            |  +---------------------+  |  |
|   +------------------+            |           |               |  |
|                                   |           v               |  |
|   +------------------+            |  +---------------------+  |  |
|   | Policy Info      |<-----------|  | Audit Logger        |  |  |
|   | Points (PIPs):   |            |  | (Append-Only Log)   |  |  |
|   | - Device Health  |            |  +---------------------+  |  |
|   | - Threat Intel   |            |                           |  |
|   +------------------+            +---------------------------+  |
|                                                                  |
+------------------------------------------------------------------+

Policy Evaluation Flow

+------------------------------------------------------------------+
|                    Policy Evaluation Flow                         |
+------------------------------------------------------------------+
|                                                                  |
|   1. REQUEST RECEIVED                                            |
|   +----------------------------------------------------------+   |
|   | POST /v1/decide                                          |   |
|   | { subject, action, resource, environment }               |   |
|   +-------------------------+--------------------------------+   |
|                             |                                    |
|                             v                                    |
|   2. REQUEST VALIDATION                                          |
|   +----------------------------------------------------------+   |
|   | - Validate JSON schema                                   |   |
|   | - Normalize fields (lowercase, trim)                     |   |
|   | - Set defaults for missing optional fields               |   |
|   | - Generate request_id if not provided                    |   |
|   +-------------------------+--------------------------------+   |
|                             |                                    |
|                             v                                    |
|   3. CACHE CHECK (Optional)                                      |
|   +----------------------------------------------------------+   |
|   | cache_key = hash(decision-relevant request attributes)   |   |
|   | if cache.has(cache_key):                                 |   |
|   |     return cache.get(cache_key)  ----------------------->|-->|
|   +-------------------------+--------------------------------+   |
|                             |                                    |
|                             v                                    |
|   4. DATA ENRICHMENT (Optional)                                  |
|   +----------------------------------------------------------+   |
|   | - Fetch device health from Device Trust Service          |   |
|   | - Fetch user attributes from Identity Store              |   |
|   | - Fetch threat intel for IP address                      |   |
|   | - Attach enriched data to request context                |   |
|   +-------------------------+--------------------------------+   |
|                             |                                    |
|                             v                                    |
|   5. POLICY MATCHING                                             |
|   +----------------------------------------------------------+   |
|   | for policy in policies (sorted by priority DESC):        |   |
|   |     if matches_subject(request, policy) AND              |   |
|   |        matches_action(request, policy) AND               |   |
|   |        matches_resource(request, policy):                |   |
|   |            add to candidate_policies                     |   |
|   +-------------------------+--------------------------------+   |
|                             |                                    |
|                             v                                    |
|   6. CONDITION EVALUATION                                        |
|   +----------------------------------------------------------+   |
|   | for policy in candidate_policies:                        |   |
|   |     if evaluate_conditions(request, policy.conditions):  |   |
|   |         add to matching_policies                         |   |
|   +-------------------------+--------------------------------+   |
|                             |                                    |
|                             v                                    |
|   7. CONFLICT RESOLUTION                                         |
|   +----------------------------------------------------------+   |
|   | Strategy: Deny Overrides                                 |   |
|   |                                                          |   |
|   | if any(p.effect == DENY for p in matching_policies):     |   |
|   |     decision = DENY                                      |   |
|   |     matched_policy = first DENY policy                   |   |
|   | elif any(p.effect == ALLOW for p in matching_policies):  |   |
|   |     decision = ALLOW                                     |   |
|   |     matched_policy = first ALLOW policy                  |   |
|   | else:                                                    |   |
|   |     decision = DENY  # Default deny if no match          |   |
|   |     reason = "No matching policy found"                  |   |
|   +-------------------------+--------------------------------+   |
|                             |                                    |
|                             v                                    |
|   8. RESPONSE GENERATION                                         |
|   +----------------------------------------------------------+   |
|   | response = {                                             |   |
|   |     decision: ALLOW/DENY,                                |   |
|   |     reason: matched_policy.description,                  |   |
|   |     matched_policy: matched_policy.id,                   |   |
|   |     obligations: matched_policy.obligations,             |   |
|   |     evaluated_at: now(),                                 |   |
|   |     evaluation_time_ms: elapsed                          |   |
|   | }                                                        |   |
|   +-------------------------+--------------------------------+   |
|                             |                                    |
|                             v                                    |
|   9. CACHE UPDATE & AUDIT                                        |
|   +----------------------------------------------------------+   |
|   | cache.set(cache_key, response, ttl=300)                  |   |
|   | audit_log.append(request, response)                      |   |
|   +-------------------------+--------------------------------+   |
|                             |                                    |
|                             v                                    |
|   10. RETURN RESPONSE                                            |
|   +----------------------------------------------------------+   |
|   | HTTP 200 OK                                              |   |
|   | Content-Type: application/json                           |   |
|   | { decision, reason, ... }                                |   |
|   +----------------------------------------------------------+   |
|                                                                  |
+------------------------------------------------------------------+
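
Step 7 of the flow above can be sketched in Go roughly as follows. This is a minimal illustration, not the reference implementation: the `Effect`, `Policy`, and `Decision` types and the `resolve` function are hypothetical names, and the input is assumed to be the policies that already passed matching and condition evaluation (steps 5 and 6).

```go
package main

// Effect is the outcome a policy asks for. The spellings here are
// illustrative; the project's schema may differ.
type Effect string

const (
	Allow Effect = "ALLOW"
	Deny  Effect = "DENY"
)

// Policy holds only the fields needed for conflict resolution.
type Policy struct {
	ID     string
	Effect Effect
}

// Decision is the result of step 7.
type Decision struct {
	Effect        Effect
	MatchedPolicy string // empty when no policy matched
	Reason        string
}

// resolve applies deny-overrides to the already-matching policies.
func resolve(matching []Policy) Decision {
	var firstAllow *Policy
	for i := range matching {
		p := &matching[i]
		if p.Effect == Deny {
			// Any DENY wins immediately, regardless of order.
			return Decision{Effect: Deny, MatchedPolicy: p.ID, Reason: "denied by policy " + p.ID}
		}
		if p.Effect == Allow && firstAllow == nil {
			firstAllow = p
		}
	}
	if firstAllow != nil {
		return Decision{Effect: Allow, MatchedPolicy: firstAllow.ID, Reason: "allowed by policy " + firstAllow.ID}
	}
	// Default deny: nothing matched.
	return Decision{Effect: Deny, Reason: "No matching policy found"}
}
```

Because the zero-match case falls through to DENY, an empty or failed policy load still yields a fail-closed engine.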

Key Design Decisions

Decision	Choice	Rationale
Language	Go or Rust	Performance-critical, concurrent; low GC pauses in Go, no GC at all in Rust
Policy Format	JSON	Human-readable, tooling support, easy hot-reload
Storage	In-memory with file backing	Sub-millisecond lookups, persistence on restart
Conflict Strategy	Deny Overrides	Security-first, predictable behavior
Caching	Optional Redis/local	Trade consistency for performance when acceptable
API Style	REST + JSON	Universal compatibility, easy debugging
Audit Log	Append-only file	Simple and compliance-friendly; tamper-evident only if entries are also hash-chained or shipped to write-once storage
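
One common way to make an append-only audit log tamper-evident is to chain entries by hash. The sketch below assumes a hypothetical `AuditEntry` shape and helper names (`appendEntry`, `verifyChain`); it keeps the log in memory to stay short, whereas the real log would be written to the file.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// AuditEntry is one record of a hash-chained log. Hash covers PrevHash
// and Payload, so editing an earlier entry breaks every later link.
type AuditEntry struct {
	PrevHash string
	Payload  string
	Hash     string
}

// entryHash binds a payload to the hash of the entry before it.
// prevHash is hex, so the newline separator cannot be ambiguous.
func entryHash(prevHash, payload string) string {
	sum := sha256.Sum256([]byte(prevHash + "\n" + payload))
	return hex.EncodeToString(sum[:])
}

// appendEntry returns the log with one more chained entry.
func appendEntry(log []AuditEntry, payload string) []AuditEntry {
	prev := ""
	if len(log) > 0 {
		prev = log[len(log)-1].Hash
	}
	return append(log, AuditEntry{PrevHash: prev, Payload: payload, Hash: entryHash(prev, payload)})
}

// verifyChain returns the index of the first inconsistent entry, or -1
// if the whole chain is intact.
func verifyChain(log []AuditEntry) int {
	prev := ""
	for i, e := range log {
		if e.PrevHash != prev || e.Hash != entryHash(e.PrevHash, e.Payload) {
			return i
		}
		prev = e.Hash
	}
	return -1
}
```

Chaining alone does not detect truncation of the tail; periodically publishing the latest hash elsewhere is one way to cover that case.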

Phased Implementation Guide

Phase 1: Basic Allow/Deny Endpoint

Goal: Create a minimal server that accepts authorization requests and returns hardcoded decisions.

Deliverable: Working HTTP server with /v1/decide endpoint.

Steps:

Set up project structure (Go module or Rust crate)
Create request/response structs from the JSON schemas
Implement HTTP handler for POST /v1/decide
Validate incoming JSON structure
Return hardcoded DENY for all requests (fail-closed default)
Add health check endpoint GET /health

Verification:

# Server should start
$ ./zta-pdp --port 9090
[INFO] PDP listening on :9090

# Health check works
$ curl http://localhost:9090/health
{"status": "healthy"}

# Decision endpoint returns DENY
$ curl -X POST http://localhost:9090/v1/decide \
  -d '{"subject":{"id":"test"},"action":"read","resource":{"id":"test"}}' \
  -H "Content-Type: application/json"
{"decision": "DENY", "reason": "No policies configured"}
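
A minimal Go sketch of the Phase 1 steps might look like this. It uses only the standard library; the type names, the error reasons ("Malformed JSON", "Missing required field"), the 1 MiB body cap, and the split into a pure `decide` function plus a thin handler are all assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"io"
	"net/http"
)

// Entity carries only the id for now; roles, type, etc. come in later phases.
type Entity struct {
	ID string `json:"id"`
}

type DecisionRequest struct {
	Subject  Entity `json:"subject"`
	Action   string `json:"action"`
	Resource Entity `json:"resource"`
}

type DecisionResponse struct {
	Decision string `json:"decision"`
	Reason   string `json:"reason"`
}

// decide validates the raw body and returns a response plus HTTP status.
// Phase 1 is fail-closed: every path returns DENY.
func decide(body []byte) (DecisionResponse, int) {
	var req DecisionRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return DecisionResponse{Decision: "DENY", Reason: "Malformed JSON"}, http.StatusBadRequest
	}
	if req.Subject.ID == "" || req.Action == "" || req.Resource.ID == "" {
		return DecisionResponse{Decision: "DENY", Reason: "Missing required field"}, http.StatusBadRequest
	}
	return DecisionResponse{Decision: "DENY", Reason: "No policies configured"}, http.StatusOK
}

func handleDecide(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	// Cap the body so a client cannot exhaust memory.
	body, err := io.ReadAll(http.MaxBytesReader(w, r.Body, 1<<20))
	if err != nil {
		http.Error(w, "cannot read body", http.StatusBadRequest)
		return
	}
	resp, status := decide(body)
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	json.NewEncoder(w).Encode(resp)
}

// newMux wires the two Phase 1 routes.
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/v1/decide", handleDecide)
	mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		w.Write([]byte(`{"status": "healthy"}`))
	})
	return mux
}
```

To serve it, a `main` function would call `http.ListenAndServe(":9090", newMux())`. Keeping `decide` free of HTTP types makes it easy to unit test without starting a server.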

Phase 2: Simple Role-Based Rules

Goal: Implement basic RBAC - if the user holds a role the policy requires, allow access.

Deliverable: Working policy evaluation with role matching.

Steps:

Create Policy struct matching the schema
Load policies from a JSON file at startup
Implement matches_subject() - check if subject roles intersect with policy roles
Implement matches_action() - check if action is in policy actions
Implement matches_resource() - check if resource type matches
Return ALLOW if any policy matches with effect: "allow"

Sample Policy File (policies.json):

{
  "policies": [
    {
      "id": "admin-all",
      "effect": "allow",
      "subjects": { "roles": ["admin"] },
      "actions": ["*"],
      "resources": {}
    },
    {
      "id": "dev-read",
      "effect": "allow",
      "subjects": { "roles": ["developer"] },
      "actions": ["read"],
      "resources": { "types": ["repository"] }
    }
  ]
}
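
As a rough sketch, the sample file above could be parsed and matched in Go like this. The struct layout mirrors the sample JSON; the simplified `Request` type, the helper names, and the rule that an empty list or `"*"` matches everything are assumptions for illustration. Deny policies and priorities are left for Phase 3.

```go
package main

import "encoding/json"

// Policy mirrors the fields used in the sample policies.json.
type Policy struct {
	ID       string `json:"id"`
	Effect   string `json:"effect"`
	Subjects struct {
		Roles []string `json:"roles"`
	} `json:"subjects"`
	Actions   []string `json:"actions"`
	Resources struct {
		Types []string `json:"types"`
	} `json:"resources"`
}

type policyFile struct {
	Policies []Policy `json:"policies"`
}

// Request is a flattened view of the incoming authorization request.
type Request struct {
	Roles        []string
	Action       string
	ResourceType string
}

func parsePolicies(data []byte) ([]Policy, error) {
	var f policyFile
	if err := json.Unmarshal(data, &f); err != nil {
		return nil, err
	}
	return f.Policies, nil
}

// matchesAny reports whether list is empty (match all), holds "*", or holds v.
func matchesAny(list []string, v string) bool {
	if len(list) == 0 {
		return true
	}
	for _, item := range list {
		if item == "*" || item == v {
			return true
		}
	}
	return false
}

// matchesSubject checks that the subject's roles intersect the policy's roles.
func matchesSubject(req Request, p Policy) bool {
	if len(p.Subjects.Roles) == 0 {
		return true
	}
	for _, role := range req.Roles {
		if matchesAny(p.Subjects.Roles, role) {
			return true
		}
	}
	return false
}

// firstAllow returns the ID of the first matching allow policy, or "".
func firstAllow(req Request, policies []Policy) string {
	for _, p := range policies {
		if p.Effect == "allow" && matchesSubject(req, p) &&
			matchesAny(p.Actions, req.Action) &&
			matchesAny(p.Resources.Types, req.ResourceType) {
			return p.ID
		}
	}
	return ""
}
```

An empty return value corresponds to the default DENY with "No matching policy".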

Verification:

# Admin can do anything
$ curl -X POST http://localhost:9090/v1/decide \
  -d '{"subject":{"id":"alice","roles":["admin"]},"action":"delete","resource":{"id":"db"}}' \
  -H "Content-Type: application/json"
{"decision": "ALLOW", "matched_policy": "admin-all"}

# Developer can read repos
$ curl -X POST http://localhost:9090/v1/decide \
  -d '{"subject":{"id":"bob","roles":["developer"]},"action":"read","resource":{"id":"repo","type":"repository"}}' \
  -H "Content-Type: application/json"
{"decision": "ALLOW", "matched_policy": "dev-read"}

# Developer cannot delete
$ curl -X POST http://localhost:9090/v1/decide \
  -d '{"subject":{"id":"bob","roles":["developer"]},"action":"delete","resource":{"id":"repo"}}' \
  -H "Content-Type: application/json"
{"decision": "DENY", "reason": "No matching policy"}

Phase 3: Attribute-Based Rules with JSONPath

Goal: Add condition evaluation for time-based, device health, and custom attributes.

Deliverable: Full ABAC support with complex conditions.

Steps:

Implement evaluate_conditions() function
Add time range condition checking
Add device health condition checking
Add network type condition checking
Implement JSONPath extraction for custom conditions
Add conflict resolution (deny overrides allow)
Add policy priority sorting

Key Functions:

evaluate_time_range(request, condition) -> bool
evaluate_device_health(request, condition) -> bool
evaluate_network_type(request, condition) -> bool
evaluate_custom_condition(request, jsonpath_expr, expected_value) -> bool

Verification:

# Request during business hours with secure device - ALLOW
$ curl -X POST http://localhost:9090/v1/decide \
  -d '{
    "subject":{"id":"alice","roles":["developer"],"device_health":"secure"},
    "action":"push",
    "resource":{"id":"repo","type":"repository"},
    "environment":{"timestamp":"2024-12-26T14:00:00Z"}
  }' \
  -H "Content-Type: application/json"
{"decision": "ALLOW"}

# Same request at 3 AM - DENY
$ curl -X POST http://localhost:9090/v1/decide \
  -d '{
    "subject":{"id":"alice","roles":["developer"],"device_health":"secure"},
    "action":"push",
    "resource":{"id":"repo","type":"repository"},
    "environment":{"timestamp":"2024-12-26T03:00:00Z"}
  }' \
  -H "Content-Type: application/json"
{"decision": "DENY", "reason": "Outside allowed time range"}

Phase 4: Policy Hot-Reloading

Goal: Update policies without restarting the server.

Deliverable: File watcher + admin reload endpoint.

Steps:

Implement file watcher using fsnotify (Go) or notify (Rust)
Create read-write lock for policy store
On file change: parse new policies, validate, atomic swap
Add POST /admin/reload-policies endpoint for manual reload
Add policy version tracking
Graceful handling of parse errors (keep old policies)

Implementation Pattern (Go):

type PolicyStore struct {
    policies []Policy
    mu       sync.RWMutex
    version  int
}

func (ps *PolicyStore) Reload(path string) error {
    newPolicies, err := loadFromFile(path)
    if err != nil {
        return err  // Keep old policies
    }
    ps.mu.Lock()
    defer ps.mu.Unlock()
    ps.policies = newPolicies
    ps.version++
    return nil
}

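func (ps *PolicyStore) Evaluate(req Request) Decision {
    ps.mu.RLock()
    defer ps.mu.RUnlock()
    var decision Decision // should stay DENY unless a policy allows
    // Use ps.policies to fill in decision...
    return decision
}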
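
The "validate" step before the swap is not spelled out above. One possible shape is sketched below; the specific checks (non-empty unique IDs, effect restricted to allow or deny) and the trimmed-down `Policy` type are assumptions, and a real validator would likely check conditions and priorities too.

```go
package main

import (
	"fmt"
	"strings"
)

// Policy is reduced to the fields this validator inspects.
type Policy struct {
	ID     string
	Effect string
}

// validatePolicies returns an error describing the first problem found,
// or nil if the set is safe to swap in.
func validatePolicies(policies []Policy) error {
	seen := make(map[string]bool, len(policies))
	for i, p := range policies {
		if p.ID == "" {
			return fmt.Errorf("policy %d: missing id", i)
		}
		if seen[p.ID] {
			return fmt.Errorf("policy %q: duplicate id", p.ID)
		}
		seen[p.ID] = true
		switch strings.ToLower(p.Effect) {
		case "allow", "deny":
			// recognised effect
		default:
			return fmt.Errorf("policy %q: effect must be allow or deny, got %q", p.ID, p.Effect)
		}
	}
	return nil
}
```

Running this between loading the file and taking the write lock means a bad file is rejected while the old policies keep serving requests.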

Verification:

# Initial state: 2 policies
$ curl http://localhost:9090/health
{"policies_loaded": 2, "policy_version": 1}

# Modify policies.json (add a new policy inside the "policies" array;
# appending with >> would leave the file as invalid JSON)
$ $EDITOR policies.json

# Automatic reload detected
[INFO] File change detected, reloading policies...
[INFO] Loaded 3 policies (version 2)

# Or manual reload
$ curl -X POST http://localhost:9090/admin/reload-policies
{"status": "reloaded", "policies_loaded": 3, "policy_version": 2}

Phase 5: Performance Optimization and Caching

Goal: Achieve sub-5ms P99 latency at 10,000+ requests/second.

Deliverable: Optimized engine with caching and metrics.

Steps:

Add request timing metrics (evaluation_time_ms in response)
Implement in-memory LRU cache for decisions
Add policy index by action type for faster matching
Implement connection pooling if using external data sources
Add /metrics endpoint (Prometheus format)
Profile and optimize hot paths

Caching Strategy:

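Cache Key: SHA256 over subject.id, action, and resource.id, joined with a delimiter or length prefix, plus every attribute the policy conditions read (roles, device health, network type, time window). A key built from the three IDs alone would let an ALLOW cached at 2 PM be served for the same request at 3 AM.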
Cache TTL: 300 seconds (configurable per resource sensitivity)
Invalidation: On policy reload, clear entire cache

Optimization Techniques:

Pre-compile regex patterns at policy load time
Use sync.Pool for request/response object reuse
Avoid allocations in the hot path
Index policies by action for O(1) lookup
Use atomic counters for metrics
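
Indexing policies by action might look like the following sketch. The `ActionIndex` type and its method names are hypothetical. Wildcard policies are kept in a separate bucket so they remain candidates for every action, which is easy to forget when building such an index.

```go
package main

// Policy is reduced to the fields the index needs.
type Policy struct {
	ID      string
	Actions []string
}

// ActionIndex maps an action name to the policies that list it. Policies
// with "*" or with no actions go in wildcard and apply to every request.
type ActionIndex struct {
	byAction map[string][]*Policy
	wildcard []*Policy
}

// isWildcard reports whether the policy applies to all actions.
func isWildcard(p *Policy) bool {
	if len(p.Actions) == 0 {
		return true
	}
	for _, a := range p.Actions {
		if a == "*" {
			return true
		}
	}
	return false
}

// buildIndex is meant to run once per policy load, not per request.
func buildIndex(policies []Policy) *ActionIndex {
	idx := &ActionIndex{byAction: make(map[string][]*Policy)}
	for i := range policies {
		p := &policies[i]
		if isWildcard(p) {
			idx.wildcard = append(idx.wildcard, p)
			continue
		}
		for _, a := range p.Actions {
			idx.byAction[a] = append(idx.byAction[a], p)
		}
	}
	return idx
}

// Candidates returns the policies that could match the action: one map
// lookup plus the wildcard bucket, instead of a scan over all policies.
func (idx *ActionIndex) Candidates(action string) []*Policy {
	exact := idx.byAction[action]
	out := make([]*Policy, 0, len(exact)+len(idx.wildcard))
	out = append(out, exact...)
	out = append(out, idx.wildcard...)
	return out
}
```

The candidate list is not in priority order, so it would still need sorting (or the buckets would need to be pre-sorted at load time) before evaluation.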

Verification:

# Benchmark with wrk or hey
$ hey -n 100000 -c 100 -m POST \
  -H "Content-Type: application/json" \
  -d '{"subject":{"id":"test","roles":["dev"]},"action":"read","resource":{"id":"repo"}}' \
  http://localhost:9090/v1/decide

Summary:
  Requests/sec:	12345.67
  Latency:
    50%: 1.2ms
    99%: 4.8ms

# Check metrics
$ curl http://localhost:9090/metrics
pdp_requests_total{decision="ALLOW"} 50000
pdp_requests_total{decision="DENY"} 50000
pdp_evaluation_seconds{quantile="0.5"} 0.0012
pdp_evaluation_seconds{quantile="0.99"} 0.0048
pdp_cache_hits_total 75000
pdp_cache_misses_total 25000
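
The counter lines in that output could be produced with atomic counters along these lines. This sketch hand-renders the text format rather than using a Prometheus client library, skips the latency quantiles, and assumes Go 1.19 or later for `atomic.Int64`. The `Metrics` type and its method names are hypothetical.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Metrics holds lock-free counters that are safe to update from many
// request goroutines. Always pass it by pointer; it must not be copied.
type Metrics struct {
	allow       atomic.Int64
	deny        atomic.Int64
	cacheHits   atomic.Int64
	cacheMisses atomic.Int64
}

// RecordDecision counts one evaluated request. Anything other than
// "ALLOW" is counted as a DENY.
func (m *Metrics) RecordDecision(decision string, cacheHit bool) {
	if decision == "ALLOW" {
		m.allow.Add(1)
	} else {
		m.deny.Add(1)
	}
	if cacheHit {
		m.cacheHits.Add(1)
	} else {
		m.cacheMisses.Add(1)
	}
}

const metricsFormat = `pdp_requests_total{decision="ALLOW"} %d
pdp_requests_total{decision="DENY"} %d
pdp_cache_hits_total %d
pdp_cache_misses_total %d
`

// Render returns the counters as text for the /metrics endpoint.
func (m *Metrics) Render() string {
	return fmt.Sprintf(metricsFormat,
		m.allow.Load(), m.deny.Load(), m.cacheHits.Load(), m.cacheMisses.Load())
}
```

Atomic adds avoid taking a mutex on every request, which matters when the hot path is measured in microseconds.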

Testing Strategy

Unit Tests

Test individual components in isolation:

Policy Parsing:
- Valid JSON parses correctly
- Invalid JSON returns error
- Missing required fields fail validation
- Unknown fields are ignored (forward compatibility)
Subject Matching:
- Role intersection works correctly
- Empty policy roles match all subjects
- Case sensitivity handling
Condition Evaluation:
- Time range: inside, outside, edge cases (midnight crossing)
- Device health: exact match, array membership
- JSONPath: valid paths, missing fields, type mismatches
Conflict Resolution:
- DENY overrides ALLOW
- Priority ordering works
- No-match returns default DENY

Integration Tests

Test the complete request-response cycle:

API Contract:
- Request validation rejects malformed JSON
- Response matches expected schema
- HTTP status codes are correct (200, 400, 500)
End-to-End Scenarios:
- Admin bypasses all restrictions
- Time-based denial works
- MFA requirement triggers obligation
- Unknown subjects are denied
Hot Reload:
- New policies take effect immediately
- Invalid policy file doesn’t crash server
- Concurrent requests during reload work correctly

Performance Tests

# Load test with realistic payload
$ cat <<EOF > payload.json
{
  "subject": {"id": "alice", "roles": ["developer"], "device_health": "secure"},
  "action": "push",
  "resource": {"id": "repo-123", "type": "repository", "sensitivity": "internal"},
  "environment": {"timestamp": "2024-12-26T14:00:00Z", "network_type": "corporate"}
}
EOF

# Sustained load test (10 minutes)
$ hey -z 600s -c 100 -m POST \
  -H "Content-Type: application/json" \
  -D payload.json \
  http://localhost:9090/v1/decide

# Spike test (sudden burst)
$ hey -n 50000 -c 500 -m POST ...

# Measure memory under load
$ while true; do ps -o rss= -p $(pgrep zta-pdp); sleep 5; done

Verification Against Reference Implementation

Compare your PDP decisions with Open Policy Agent (OPA):

# Run OPA with equivalent Rego policies
$ opa run --server policies.rego

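# Test same input against both. OPA expects the request wrapped as
# {"input": ...} and returns a boolean, so map it to ALLOW/DENY first.
# (localhost:8181 is OPA's default listen address; adjust to your setup.)
$ diff <(curl -s -X POST http://localhost:9090/v1/decide \
           -H "Content-Type: application/json" -d @input.json | jq -r .decision) \
       <(jq '{input: .}' input.json | curl -s -X POST http://localhost:8181/v1/data/authz \
           -H "Content-Type: application/json" -d @- \
           | jq -r 'if .result.allow then "ALLOW" else "DENY" end')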

Common Pitfalls and Debugging

Pitfall 1: Policy Never Matches

Symptom: All requests return DENY even when they should match.

Cause: Field name mismatch or case sensitivity.

Solution:

# Add debug logging to show what's being compared
[DEBUG] Checking policy 'dev-read':
[DEBUG]   Subject roles: ["Developer"] vs Policy roles: ["developer"]
[DEBUG]   -> No intersection (case mismatch!)

# Fix: Normalize to lowercase at parse time (Go, using the strings package)
for i, r := range subject.Roles {
    subject.Roles[i] = strings.ToLower(r)
}

Pitfall 2: Time Zone Issues

Symptom: Time-based policies allow/deny at wrong times.

Cause: Server timezone differs from policy timezone.

Solution:

// Always evaluate times in the policy's specified timezone
loc, err := time.LoadLocation(condition.TimeRange.Timezone)
if err != nil {
    return false // unknown timezone in the policy: fail closed
}
requestTime := request.Environment.Timestamp.In(loc)
hour := requestTime.Hour()

Pitfall 3: Cache Stampede on Policy Reload

Symptom: Latency spike after policy reload as cache is empty.

Cause: All cached decisions invalidated simultaneously.

Solution:

// Option 1: Probabilistic early expiration (staggers refreshes as TTLs run out)
func shouldFetch(remaining time.Duration) bool {
    // Refresh probability rises toward 1 as the remaining TTL approaches 0.
    return rand.Float64() < math.Exp(-remaining.Seconds()/60.0)
}

// Option 2: Background cache warming
func afterReload() {
    go warmCache(commonRequests)
}

Pitfall 4: Wildcard Action Matching

Symptom: actions: ["*"] doesn’t match all actions.

Cause: Wildcard treated as literal string.

Solution:

func matchesAction(request Request, policy Policy) bool {
    if len(policy.Actions) == 0 {
        return true  // Empty = match all
    }
    for _, action := range policy.Actions {
        if action == "*" || action == request.Action {
            return true
        }
    }
    return false
}

Pitfall 5: Race Condition on Policy Reload

Symptom: Intermittent panics or wrong decisions during reload.

Cause: Reading policies while they’re being written.

Solution:

// Use RWMutex - multiple readers OR single writer
type PolicyStore struct {
    mu       sync.RWMutex
    policies []Policy
}

func (ps *PolicyStore) Evaluate(req Request) Decision {
    ps.mu.RLock()          // Acquire read lock
    defer ps.mu.RUnlock()
    var decision Decision
    // Safe to read ps.policies while filling in decision
    return decision
}

func (ps *PolicyStore) Reload(newPolicies []Policy) {
    ps.mu.Lock()           // Acquire write lock
    defer ps.mu.Unlock()
    ps.policies = newPolicies // Swap the slice while holding the lock
}

Debugging Checklist

Enable verbose logging: Log every policy check with matched/unmatched reason
Add request echo: Include parsed request in response for verification
Use curl with -v: See exact request being sent
Check policy loading: Verify policies.json is valid JSON
Verify timestamps: Ensure client and server clocks are synchronized
Test with minimal policy: Single policy to isolate matching logic

Extensions and Challenges

Extension 1: gRPC API

Replace REST with gRPC for lower latency and type safety.

Hints:

Define .proto file matching the JSON schemas
Use streaming for bulk decisions
Implement health checking protocol
Target < 1ms P50 latency

Extension 2: Policy Simulation Mode

Add endpoint to test “what if” scenarios without affecting production.

$ curl -X POST http://localhost:9090/v1/simulate \
  -d '{
    "request": { ... },
    "proposed_policies": [ ... ]
  }'
{"would_allow": true, "matched_policy": "new-policy-draft"}

Extension 3: Policy Analytics Dashboard

Build a web UI showing:

Decision distribution over time
Most-matched policies
Most-denied subjects/resources
Latency percentiles

Extension 4: Distributed Policy Sync

Multiple PDP instances sharing policies via:

etcd/Consul for configuration
Raft consensus for consistency
Merkle trees for efficient diff

Extension 5: Machine Learning Risk Scoring

Integrate with a risk scoring model:

Input: Subject + context features
Output: Risk score 0.0 - 1.0
Use in conditions: risk_score < 0.5

Books That Will Help

Topic	Book	Chapter
Authorization Logic	“Foundations of Information Security” by Jason Andress	Ch. 5: Authentication and Authorization
Policy as Code	“Zero Trust Networks” by Gilman & Barth	Ch. 3: The Zero Trust Control Plane
System Performance	“Designing Data-Intensive Applications, 2nd Ed” by Kleppmann	Ch. 1-2: Foundations
Access Control Models	“Security in Computing” by Charles Pfleeger	Ch. 4: Access Control
Go Performance	“Learning Go, 2nd Edition” by Jon Bodner	Ch. 12: Performance
Concurrent Data Structures	“Algorithms, Fourth Edition” by Sedgewick & Wayne	Ch. 3: Searching (Section 3.4: Hash Tables)
Rule Engines & Logic	“Design Patterns” by Gamma et al.	Ch. 5: Behavioral Patterns (Strategy)
Cryptographic Foundations	“Serious Cryptography, 2nd Edition” by Aumasson	Ch. 5: MACs and Authentication

Interview Questions

Questions you should be able to answer after completing this project:

“What is the difference between Authentication and Authorization?”
- Authentication: Verifying WHO you are (identity)
- Authorization: Verifying WHAT you can do (permissions)
- The PDP handles authorization and assumes authentication has already happened
“Explain Attribute-Based Access Control (ABAC) with a real-world example.”
- Example: “Developers can push to non-production repos during business hours from secure devices”
- Combines subject attributes (role, device health), resource attributes (environment), and context (time)
- More flexible than RBAC, can express complex policies
“How do you handle ‘Policy Conflict’ (e.g., one rule says ALLOW, another says DENY)?”
- Strategy 1: Deny overrides (most secure; AWS IAM treats an explicit deny this way)
- Strategy 2: First match (order-dependent, like firewalls)
- Strategy 3: Most specific wins (requires specificity scoring)
- Strategy 4: Explicit priority numbers
“Why is centralized policy management key to Zero Trust?”
- Single source of truth for all authorization decisions
- Consistent enforcement across all services
- Easier auditing and compliance
- Policy updates take effect everywhere at once (bounded by any decision-cache TTL)
- Decouples authorization logic from application code
“What happens if the PDP is unavailable?”
- Fail-closed: Deny all requests (secure but impacts availability)
- Fail-open: Allow all requests (dangerous, should never be used for sensitive resources)
- Cached fallback: Use last known decision (tradeoff between security and availability)
“How do you achieve sub-5ms latency in a policy engine?”
- In-memory policy storage (no database queries)
- Policy indexing by action type
- Decision caching with appropriate TTL
- Object pooling to reduce GC pressure
- Avoid regex compilation in hot path
“What is a Policy Information Point (PIP)?”
- External data source that enriches authorization requests
- Examples: Device health service, threat intelligence feed, HR database
- PDP queries PIPs to get current context for decisions
- PIP responses should be cached to keep external lookups off the hot path

Self-Assessment Checklist

Before considering this project complete, verify you can: