Project 2: Policy Decision Engine (Building a PDP)

Project Overview

What you’re building: A standalone “Brain” server that functions as the Policy Decision Point (PDP) in Zero Trust Architecture. It receives authorization requests (e.g., “Can User A perform Action B on Resource C?”) and returns “Allow” or “Deny” decisions based on dynamic rules stored in a database or JSON file. This server doesn’t handle actual traffic - it only makes decisions.

Why it matters: In traditional security, authorization logic is scattered across applications - each service implements its own access control. This creates inconsistency, makes auditing nearly impossible, and means changing a policy requires updating multiple codebases. A centralized PDP solves this by becoming the single source of truth for all authorization decisions.

Real-world applications:

  • Enterprise authorization systems (like Google’s Zanzibar or Airbnb’s Himeji)
  • Cloud IAM policy evaluation (AWS IAM, Google Cloud IAP)
  • API gateway authorization layers
  • Microservices access control in service meshes
  • Zero Trust Network Access (ZTNA) decision engines

+------------------------------------------------------------------+
|                    Policy Decision Point (PDP)                   |
+------------------------------------------------------------------+
|                                                                  |
|   PEP (Proxy/Gateway)          PDP (This Project)                |
|        |                           |                             |
|        |  Authorization Request:   |                             |
|        |  "Can Alice push to       |                             |
|        |   kernel-repo at 3AM?"    |                             |
|        |-------------------------->|                             |
|        |                           |                             |
|        |                     +-----|-----+                       |
|        |                     | Evaluate  |                       |
|        |                     | Policies  |                       |
|        |                     +-----|-----+                       |
|        |                           |                             |
|        |  Decision Response:       |                             |
|        |  { "decision": "DENY",    |                             |
|        |    "reason": "Outside     |                             |
|        |     allowed hours" }      |                             |
|        |<--------------------------|                             |
|        |                           |                             |
|        V                           |                             |
|   +------------------+             |                             |
|   | Enforce Decision |             |                             |
|   | Block Request    |             |                             |
|   +------------------+             |                             |
|                                                                  |
+------------------------------------------------------------------+

Learning Objectives

By completing this project, you will be able to:

  1. Explain the difference between RBAC, ABAC, and ReBAC and when to use each access control model
  2. Design and implement a policy evaluation engine that supports complex, context-aware rules
  3. Build a high-performance authorization service capable of sub-5ms response times
  4. Implement policy hot-reloading without service restart or dropped connections
  5. Create audit logs that capture the complete decision trail for compliance requirements
  6. Handle policy conflicts using precedence rules and explicit conflict resolution strategies
  7. Integrate data enrichment from external sources (device health, threat intelligence) into decisions

Prerequisites

Required knowledge:

  • Proficiency in Go or Rust (or Python for prototyping)
  • Understanding of JSON and REST APIs
  • Basic knowledge of boolean logic and conditional expressions
  • Familiarity with HTTP servers and request handling

Helpful but not required:

  • Experience with Open Policy Agent (OPA) or similar policy engines
  • Understanding of database query optimization
  • Knowledge of caching strategies

System requirements:

  • Linux (preferred) or macOS
  • Go 1.21+ or Rust 1.70+
  • curl for testing
  • Redis (optional, for caching)

Deep Theoretical Foundation

Access Control Models: RBAC vs ABAC vs ReBAC

Before building a policy engine, you must understand the three fundamental access control paradigms:

+------------------------------------------------------------------+
|                    Access Control Model Comparison               |
+------------------------------------------------------------------+

RBAC (Role-Based Access Control)
================================
Simple, coarse-grained. Users have Roles, Roles have Permissions.

   User: Alice
      |
      +---> Role: Developer
               |
               +---> Permissions: [read:code, write:code, read:docs]

   Decision Logic:
   IF user.role == "developer" THEN allow(read:code)

   Pros: Simple to understand, easy to audit
   Cons: Role explosion, lacks context awareness

+------------------------------------------------------------------+

ABAC (Attribute-Based Access Control)
=====================================
Fine-grained. Decisions based on attributes of Subject, Resource,
Action, and Environment.

   Subject Attributes:        Resource Attributes:
   +------------------+       +------------------+
   | user: alice      |       | type: repository |
   | role: developer  |       | owner: kernel    |
   | clearance: L2    |       | sensitivity: high|
   | device: secure   |       | classification: C|
   +------------------+       +------------------+

   Environment Attributes:    Action Attributes:
   +------------------+       +------------------+
   | time: 14:00      |       | operation: push  |
   | location: NYC    |       | scope: main      |
   | ip: 10.0.1.5     |       | method: POST     |
   +------------------+       +------------------+

   Decision Logic:
   IF subject.role == "developer"
      AND resource.sensitivity <= subject.clearance
      AND environment.time BETWEEN 08:00 AND 20:00
      AND subject.device == "secure"
   THEN allow(action)

   Pros: Highly flexible, context-aware, fine-grained
   Cons: Complex to manage, harder to audit

+------------------------------------------------------------------+

ReBAC (Relationship-Based Access Control)
=========================================
Graph-based. Access determined by relationships between entities.
Implemented by systems such as Google Zanzibar, Airbnb Himeji, and Ory Keto.

   alice --[member]--> team-kernel --[owner]--> kernel-repo
                                         |
   bob --[viewer]------------------------|

   Decision Logic:
   Can(alice, push, kernel-repo)?
     -> alice is member of team-kernel
     -> team-kernel is owner of kernel-repo
     -> owners can push
     -> ALLOW

   Pros: Natural for hierarchies, handles delegation well
   Cons: Graph traversal complexity, eventual consistency issues

+------------------------------------------------------------------+
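
To make the contrast concrete, the following is a minimal Python sketch (Python only because it is listed above as acceptable for prototyping) that answers an access question under each of the three models. Everything in it is an illustrative assumption rather than part of the project specification: the sample data, the numeric orderings used to compare sensitivity with clearance, the mapping from relations to actions, and the function names.

```python
from collections import deque

# --- RBAC: user -> roles -> permissions (hypothetical sample data) ---
ROLE_PERMISSIONS = {"developer": {"read:code", "write:code", "read:docs"}}
USER_ROLES = {"alice": {"developer"}}

def rbac_allows(user: str, permission: str) -> bool:
    """Allow if any of the user's roles grants the permission."""
    return any(permission in ROLE_PERMISSIONS.get(role, set())
               for role in USER_ROLES.get(user, set()))

# --- ABAC: compare attributes of subject, resource, and environment ---
# Assumed orderings so that sensitivity and clearance can be compared.
SENSITIVITY_LEVEL = {"low": 1, "medium": 2, "high": 3}
CLEARANCE_LEVEL = {"L1": 1, "L2": 2, "L3": 3}

def abac_allows(subject: dict, resource: dict, environment: dict) -> bool:
    """Mirror of the ABAC decision logic above; every condition must hold."""
    return (
        subject["role"] == "developer"
        and SENSITIVITY_LEVEL[resource["sensitivity"]]
            <= CLEARANCE_LEVEL[subject["clearance"]]
        # Zero-padded HH:MM strings compare correctly as plain strings.
        and "08:00" <= environment["time"] <= "20:00"
        and subject["device"] == "secure"
    )

# --- ReBAC: relationship tuples (subject, relation, object) form a graph ---
TUPLES = {
    ("alice", "member", "team-kernel"),
    ("team-kernel", "owner", "kernel-repo"),
    ("bob", "viewer", "kernel-repo"),
}
# Assumed mapping from a relation on the resource to the actions it grants.
RELATION_ACTIONS = {"owner": {"push", "read"}, "viewer": {"read"}}

def rebac_allows(user: str, action: str, resource: str) -> bool:
    """Breadth-first search from the user through 'member' edges, checking
    whether the user or any group it belongs to holds a granting relation."""
    seen, queue = {user}, deque([user])
    while queue:
        node = queue.popleft()
        for subj, relation, obj in TUPLES:
            if subj != node:
                continue
            if obj == resource and action in RELATION_ACTIONS.get(relation, set()):
                return True
            if relation == "member" and obj not in seen:
                seen.add(obj)
                queue.append(obj)
    return False
```

Under these assumed orderings, the subject shown in the ABAC diagram (clearance L2) would be denied on a high-sensitivity resource, which illustrates why an ABAC engine needs an explicit, documented ordering for any attribute it compares.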

The Policy Decision Point in Zero Trust Architecture

The PDP sits at the heart of the Zero Trust Control Plane:

+------------------------------------------------------------------+
|              Zero Trust Control Plane Architecture               |
+------------------------------------------------------------------+
|                                                                  |
|                     CONTROL PLANE                                |
|   +----------------------------------------------------------+   |
|   |                                                          |   |
|   |   +----------------+     +---------------------------+   |   |
|   |   |     Policy     |     |    Policy Information     |   |   |
|   |   |  Administrator |     |    Points (PIPs)          |   |   |
|   |   |      (PA)      |     |                           |   |   |
|   |   +-------+--------+     |  +--------+ +--------+    |   |   |
|   |           |              |  |Identity| |Device  |    |   |   |
|   |           |              |  |Store   | |Health  |    |   |   |
|   |           v              |  +--------+ +--------+    |   |   |
|   |   +----------------+     |  +--------+ +--------+    |   |   |
|   |   |    Policy      |<----|  |Threat  | |Time/   |    |   |   |
|   |   |   Decision     |     |  |Intel   | |Location|    |   |   |
|   |   |    Point       |     |  +--------+ +--------+    |   |   |
|   |   |    (PDP)       |     +---------------------------+   |   |
|   |   +-------+--------+                                     |   |
|   |           |                                              |   |
|   +-----------|----------------------------------------------+   |
|               | Decision (Allow/Deny + Reason)                   |
|               v                                                  |
|   +----------------------------------------------------------+   |
|   |                      DATA PLANE                          |   |
|   |                                                          |   |
|   |   [Subject] --> [PEP/Gateway] --> [Resource]             |   |
|   |                      ^                                   |   |
|   |                      |                                   |   |
|   |               Enforces Decision                          |   |
|   +----------------------------------------------------------+   |
|                                                                  |
+------------------------------------------------------------------+

Policy Languages and Domain-Specific Languages (DSLs)

Production policy engines use specialized languages for expressing rules:

+------------------------------------------------------------------+
|                    Policy Language Comparison                    |
+------------------------------------------------------------------+

OPA Rego (Open Policy Agent)
============================
Logic-based, Datalog-inspired language. Widely adopted; OPA is a CNCF graduated project.

   package authz

   import rego.v1

   default allow := false

   allow if {
       input.subject.role == "admin"
   }

   allow if {
       input.subject.role == "developer"
       input.resource.type == "code"
       is_business_hours
   }

   is_business_hours if {
       # time.clock returns [hour, minute, second] in UTC
       now := time.now_ns()
       hour := time.clock(now)[0]
       hour >= 8
       hour < 20
   }

+------------------------------------------------------------------+

Cedar (Amazon Verified Permissions)
===================================
Type-safe, analyzable policy language from AWS.

   permit (
       principal in Group::"developers",
       action == Action::"push",
       resource in Repository::"kernel"
   ) when {
       context.time.hour >= 8 &&
       context.time.hour < 20 &&
       principal.device_health == "secure"
   };

+------------------------------------------------------------------+

JSON-Based Rules (What you'll build)
====================================
Simpler format for learning - you'll implement an evaluator for this.

   {
     "id": "dev-push-hours",
     "effect": "allow",
     "subjects": {
       "roles": ["developer", "admin"]
     },
     "actions": ["push", "commit"],
     "resources": {
       "types": ["repository"]
     },
     "conditions": {
       "time_range": {"start": "08:00", "end": "20:00"},
       "device_health": "secure"
     }
   }

+------------------------------------------------------------------+
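
As a rough illustration of what "implementing an evaluator" for the JSON format could involve, here is a minimal Python sketch that checks one rule against one request. It is a sketch under stated assumptions, not the project's required design: the function name `rule_matches` is hypothetical, the rule uses the plural `types` list from the Policy Rule Format Specification, a clause missing from the rule is treated as a wildcard, `device_health` is read from the subject as in the request schema, and the timestamp is assumed to already be in the policy's time zone.

```python
from datetime import datetime

RULE = {
    "id": "dev-push-hours",
    "effect": "allow",
    "subjects": {"roles": ["developer", "admin"]},
    "actions": ["push", "commit"],
    "resources": {"types": ["repository"]},
    "conditions": {
        "time_range": {"start": "08:00", "end": "20:00"},
        "device_health": "secure",
    },
}

def rule_matches(rule: dict, request: dict) -> bool:
    """Return True when every clause present in the rule matches the request.
    A clause absent from the rule is treated as a wildcard (assumption)."""
    subject, resource = request["subject"], request["resource"]

    roles = rule.get("subjects", {}).get("roles")
    if roles is not None and not set(roles) & set(subject.get("roles", [])):
        return False

    actions = rule.get("actions")
    if actions is not None and request["action"] not in actions:
        return False

    types = rule.get("resources", {}).get("types")
    if types is not None and resource.get("type") not in types:
        return False

    conditions = rule.get("conditions", {})
    if "device_health" in conditions:
        if subject.get("device_health") != conditions["device_health"]:
            return False
    if "time_range" in conditions:
        stamp = request.get("environment", {}).get("timestamp")
        if stamp is None:
            return False  # cannot prove the time condition, so no match
        # Assumes the timestamp is already in the policy's time zone.
        now = datetime.fromisoformat(stamp).strftime("%H:%M")
        window = conditions["time_range"]
        if not (window["start"] <= now < window["end"]):
            return False
    return True
```

Because this rule has effect `allow`, a request that fails to match it is simply not granted by this rule; whether the overall answer is DENY depends on the other policies and on the conflict resolution strategy in use.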

Context-Aware Authorization

Context transforms simple RBAC into intelligent, adaptive security:

+------------------------------------------------------------------+
|                    Context Dimensions for Decisions              |
+------------------------------------------------------------------+
|                                                                  |
|   WHO (Subject Context)           WHAT (Resource Context)        |
|   +------------------------+      +------------------------+     |
|   | Identity: alice@corp   |      | ID: kernel-repo        |     |
|   | Roles: [dev, sre]      |      | Type: repository       |     |
|   | Groups: [team-kernel]  |      | Owner: linux-foundation|     |
|   | Clearance: L3          |      | Sensitivity: critical  |     |
|   | MFA: verified          |      | Classification: public |     |
|   | Session Age: 2h        |      | Data Types: [code]     |     |
|   +------------------------+      +------------------------+     |
|                                                                  |
|   HOW (Action Context)            WHERE/WHEN (Environment)       |
|   +------------------------+      +------------------------+     |
|   | Operation: git_push    |      | Time: 2024-12-26T14:00 |     |
|   | Method: POST           |      | Day: Thursday          |     |
|   | Scope: refs/heads/main |      | Location: NYC office   |     |
|   | Commit Count: 3        |      | IP: 10.0.1.45          |     |
|   | Files Changed: 12      |      | Network: corporate     |     |
|   +------------------------+      | Device ID: MBP-42      |     |
|                                   | Device Health: secure  |     |
|   WHY (Risk Context)              | Risk Score: 0.15       |     |
|   +------------------------+      +------------------------+     |
|   | Threat Level: low      |                                     |
|   | Recent Failures: 0     |                                     |
|   | Anomaly Score: 0.2     |                                     |
|   | Active Incidents: 0    |                                     |
|   +------------------------+                                     |
|                                                                  |
|   DECISION MATRIX:                                               |
|   +---------------------------------------------------------+   |
|   | Rule: "Critical repos require device health + work hours"|   |
|   |                                                         |   |
|   | subject.roles CONTAINS "developer"           -> TRUE    |   |
|   | resource.sensitivity == "critical"           -> TRUE    |   |
|   | environment.device_health == "secure"        -> TRUE    |   |
|   | environment.time IN [08:00, 20:00]           -> TRUE    |   |
|   | environment.risk_score < 0.5                 -> TRUE    |   |
|   |                                                         |   |
|   | ALL CONDITIONS MET -> ALLOW                             |   |
|   +---------------------------------------------------------+   |
|                                                                  |
+------------------------------------------------------------------+
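
The decision matrix above can be sketched in Python as a list of conditions evaluated against a nested context, keeping a per-condition trace in the same spirit as the TRUE/FALSE column. This is only one possible shape: the tuple layout, the operator names (`contains`, `eq`, `lt`, `between`), and the helper names are assumptions made for illustration.

```python
import operator

# Each condition: (dotted attribute path, operator name, expected value).
CRITICAL_REPO_RULE = [
    ("subject.roles", "contains", "developer"),
    ("resource.sensitivity", "eq", "critical"),
    ("environment.device_health", "eq", "secure"),
    ("environment.time", "between", ("08:00", "20:00")),
    ("environment.risk_score", "lt", 0.5),
]

OPERATORS = {
    "contains": lambda actual, expected: expected in actual,
    "eq": operator.eq,
    "lt": operator.lt,
    "between": lambda actual, bounds: bounds[0] <= actual <= bounds[1],
}

def lookup(context: dict, path: str):
    """Resolve a dotted path such as 'subject.roles' in nested dicts."""
    value = context
    for part in path.split("."):
        value = value[part]
    return value

def evaluate_with_trace(conditions, context):
    """Evaluate every condition without short-circuiting so the trace is
    complete. A missing or mistyped attribute counts as a failed condition."""
    trace = []
    for path, op, expected in conditions:
        try:
            passed = bool(OPERATORS[op](lookup(context, path), expected))
        except (KeyError, TypeError):
            passed = False
        trace.append((f"{path} {op} {expected!r}", passed))
    decision = "ALLOW" if all(passed for _, passed in trace) else "DENY"
    return decision, trace
```

Evaluating all conditions instead of stopping at the first failure costs a little time, but it yields a full record of which conditions held, which is useful for the audit logging objective listed earlier.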

Policy Conflict Resolution

When multiple policies apply, you need clear resolution rules:

+------------------------------------------------------------------+
|                    Policy Conflict Resolution                    |
+------------------------------------------------------------------+
|                                                                  |
|   STRATEGY 1: Deny Overrides (Most Secure)                       |
|   =========================================                      |
|   If ANY policy says DENY, the final decision is DENY.           |
|   Used by: AWS IAM, most enterprise systems                      |
|                                                                  |
|   Policy A: ALLOW (developer can read)                           |
|   Policy B: DENY (no access after 10PM)                          |
|   Final:    DENY                                                 |
|                                                                  |
|   +----------------------------------------------------------+   |
|   |   for policy in matching_policies:                       |   |
|   |       if policy.effect == DENY:                          |   |
|   |           return DENY, policy.reason                     |   |
|   |   return ALLOW if matching_policies else DENY            |   |
|   +----------------------------------------------------------+   |
|                                                                  |
|   STRATEGY 2: First Match (Order Matters)                        |
|   ========================================                       |
|   First policy that matches determines the outcome.              |
|   Used by: Firewall rules, some RBAC systems                     |
|                                                                  |
|   Policy 1: DENY contractors after hours                         |
|   Policy 2: ALLOW developers to push                             |
|   Request:  Contractor pushing at 3AM                            |
|   Final:    DENY (matched Policy 1 first)                        |
|                                                                  |
|   STRATEGY 3: Most Specific Wins                                 |
|   ==============================                                 |
|   Policy with most specific match takes precedence.              |
|   Used by: URL routing, some ABAC systems                        |
|                                                                  |
|   Policy A: General "developers can read repos"                  |
|   Policy B: Specific "Alice cannot read kernel-repo"             |
|   Final:    DENY for Alice on kernel-repo (more specific)        |
|                                                                  |
|   STRATEGY 4: Priority/Weight Based                              |
|   =================================                              |
|   Each policy has an explicit priority number.                   |
|                                                                  |
|   Policy A: priority=100, ALLOW developers                       |
|   Policy B: priority=200, DENY after hours                       |
|   Final:    Higher priority (200) wins -> DENY                   |
|                                                                  |
+------------------------------------------------------------------+
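
Three of the four strategies can be sketched in a few lines of Python each. The `Policy` dataclass, the function names, and the tie-breaking rule for equal priorities are assumptions made for this illustration. "Most specific wins" is left out because it depends on a definition of specificity that this section does not give. Each function takes only the policies that already matched the request and returns a (decision, policy id) pair.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Policy:
    id: str
    effect: str          # "allow" or "deny"
    priority: int = 100

def deny_overrides(matching):
    """Any matching deny wins; otherwise allow only if something allows."""
    for policy in matching:
        if policy.effect == "deny":
            return "DENY", policy.id
    for policy in matching:
        if policy.effect == "allow":
            return "ALLOW", policy.id
    return "DENY", None  # nothing matched: default deny

def first_match(matching):
    """The first matching policy in list order decides."""
    if not matching:
        return "DENY", None
    first = matching[0]
    return first.effect.upper(), first.id

def highest_priority(matching):
    """Highest priority number wins; a tie goes to deny (assumption)."""
    if not matching:
        return "DENY", None
    winner = max(matching, key=lambda p: (p.priority, p.effect == "deny"))
    return winner.effect.upper(), winner.id
```

All three fall back to DENY when no policy matches, which keeps the engine deny-by-default regardless of the strategy chosen.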

Fail-Open vs Fail-Closed Decisions

A critical design choice is what the system does when the PDP cannot reach a decision:

+------------------------------------------------------------------+
|                    Failure Mode Decision Matrix                  |
+------------------------------------------------------------------+
|                                                                  |
|   FAIL-CLOSED (Deny by Default)                                  |
|   =============================                                  |
|   If PDP is unavailable or uncertain, DENY the request.          |
|                                                                  |
|   Pros:                           Cons:                          |
|   - More secure                   - Availability impact          |
|   - No unauthorized access        - User frustration             |
|   - Meets compliance requirements - Business disruption          |
|                                                                  |
|   When to use:                                                   |
|   - Financial systems                                            |
|   - Healthcare data                                              |
|   - Critical infrastructure                                      |
|   - Government/classified systems                                |
|                                                                  |
|   +----------------------------------------------------------+   |
|   |   def decide(request) -> Decision:                       |   |
|   |       try:                                               |   |
|   |           return evaluate_policies(request)              |   |
|   |       except (Timeout, Error):                           |   |
|   |           log.error("PDP failure, denying request")      |   |
|   |           return DENY("System unavailable")              |   |
|   +----------------------------------------------------------+   |
|                                                                  |
+------------------------------------------------------------------+
|                                                                  |
|   FAIL-OPEN (Allow by Default)                                   |
|   =============================                                  |
|   If PDP is unavailable, ALLOW the request.                      |
|                                                                  |
|   Pros:                           Cons:                          |
|   - Better availability           - Security risk                |
|   - No business disruption        - Compliance violations        |
|   - User experience preserved     - Audit gaps                   |
|                                                                  |
|   When to use (with extreme caution):                            |
|   - Read-only public data                                        |
|   - Non-sensitive operations                                     |
|   - When availability > confidentiality                          |
|                                                                  |
|   NEVER use for:                                                 |
|   - Write operations                                             |
|   - Sensitive data access                                        |
|   - Administrative actions                                       |
|                                                                  |
+------------------------------------------------------------------+
|                                                                  |
|   HYBRID: Cached Fallback                                        |
|   ========================                                       |
|   Use cached decisions during PDP outages.                       |
|                                                                  |
|   +----------------------------------------------------------+   |
|   |   def decide(request) -> Decision:                       |   |
|   |       cache_key = hash(request.subject, request.action,  |   |
|   |                        request.resource)                 |   |
|   |       try:                                               |   |
|   |           decision = evaluate_policies(request)          |   |
|   |           cache.set(cache_key, decision, ttl=300)        |   |
|   |           return decision                                |   |
|   |       except Timeout:                                    |   |
|   |           if cached := cache.get(cache_key):             |   |
|   |               log.warn("Using cached decision")          |   |
|   |               return cached                              |   |
|   |           return DENY("No cached decision available")    |   |
|   +----------------------------------------------------------+   |
|                                                                  |
+------------------------------------------------------------------+
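
The hybrid approach can be written as runnable Python roughly as follows. This is a sketch under assumptions, not a prescribed implementation: `PDPUnavailable`, `FallbackDecider`, and the injected `evaluate` callable are hypothetical names, the cache is a plain in-process dict, and the cache key uses only the subject ID, action, and resource ID, so a cached decision ignores time-of-day and other environment conditions.

```python
import time

class PDPUnavailable(Exception):
    """Raised by the (hypothetical) evaluator on timeout or backend failure."""

class FallbackDecider:
    def __init__(self, evaluate, ttl_seconds=300, clock=time.monotonic):
        self._evaluate = evaluate    # callable(request) -> "ALLOW" or "DENY"
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so tests can control time
        self._cache = {}             # key -> (decision, expires_at)

    @staticmethod
    def _key(request):
        # Assumption: IDs alone identify the decision being cached.
        return (request["subject"]["id"], request["action"],
                request["resource"]["id"])

    def decide(self, request):
        key = self._key(request)
        try:
            decision = self._evaluate(request)
        except PDPUnavailable:
            cached = self._cache.get(key)
            if cached and cached[1] > self._clock():
                return cached[0], "cached decision (PDP unavailable)"
            # Fail closed when there is nothing fresh to fall back on.
            return "DENY", "PDP unavailable and no fresh cached decision"
        except Exception:
            # Unexpected evaluator error: fail closed, do not consult the cache.
            return "DENY", "evaluation error"
        self._cache[key] = (decision, self._clock() + self._ttl)
        return decision, "evaluated"
```

Only an explicit unavailability error triggers the cached path here. Any other exception is treated as a possible bug in evaluation and denied outright, which keeps the fallback from masking incorrect policy logic.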

Caching Strategies for Authorization

Performance is critical because every request in your system triggers a PDP call:

+------------------------------------------------------------------+
|                    Authorization Caching Strategies              |
+------------------------------------------------------------------+
|                                                                  |
|   LAYER 1: PEP-Side Cache (Sidecar/Local)                        |
|   ========================================                       |
|                                                                  |
|   +----------+    +----------+    +----------+                   |
|   |   PEP    |--->|  Local   |--->|   PDP    |                   |
|   | Gateway  |    |  Cache   |    |  Server  |                   |
|   +----------+    +----------+    +----------+                   |
|                                                                  |
|   - TTL: 30s - 5min (depends on sensitivity)                     |
|   - Cache Key: hash(subject, action, resource)                   |
|   - Invalidation: on policy change, session end                  |
|                                                                  |
|   Latency: ~0.1ms (cache hit) vs ~5ms (PDP call)                 |
|                                                                  |
+------------------------------------------------------------------+
|                                                                  |
|   LAYER 2: Distributed Cache (Redis/Memcached)                   |
|   =============================================                  |
|                                                                  |
|   +--------+    +--------+    +---------+    +--------+          |
|   | PEP 1  |--->|        |    |         |<---| PEP 2  |          |
|   +--------+    |  Redis |<---|   PDP   |    +--------+          |
|   +--------+    | Cluster|    | Cluster |    +--------+          |
|   | PEP 3  |--->|        |    |         |<---| PEP 4  |          |
|   +--------+    +--------+    +---------+    +--------+          |
|                                                                  |
|   - Shared cache across all PEPs                                 |
|   - TTL: configurable per policy sensitivity                     |
|   - Pub/Sub for instant invalidation                             |
|                                                                  |
+------------------------------------------------------------------+
|                                                                  |
|   CACHE INVALIDATION STRATEGIES                                  |
|   ==============================                                 |
|                                                                  |
|   1. Time-Based (TTL):                                           |
|      cache.set(key, decision, ttl=300)  # 5 minutes              |
|                                                                  |
|   2. Event-Based (Push Invalidation):                            |
|      on_policy_change -> redis.publish("invalidate", policy_id)  |
|      on_session_end -> redis.delete(session_cache_key)           |
|                                                                  |
|   3. Version-Based:                                              |
|      cache_key = f"{subject}:{action}:{resource}:v{policy_ver}"  |
|      # Old keys expire, new keys created on policy update        |
|                                                                  |
|   4. Hierarchical:                                               |
|      cache["org:acme:*"]           -> clear all Acme policies    |
|      cache["org:acme:user:alice"]  -> clear Alice's cache only   |
|                                                                  |
+------------------------------------------------------------------+
|                                                                  |
|   CACHE CONSISTENCY vs LATENCY TRADEOFF                          |
|   ======================================                         |
|                                                                  |
|   High Sensitivity Resource (PII, Financial):                    |
|   - TTL: 0 (no cache) or 30 seconds max                          |
|   - Immediate invalidation on policy change                      |
|   - Accept higher latency for security                           |
|                                                                  |
|   Medium Sensitivity Resource (Internal Docs):                   |
|   - TTL: 5 minutes                                               |
|   - Eventual consistency acceptable                              |
|                                                                  |
|   Low Sensitivity Resource (Public API):                         |
|   - TTL: 30 minutes                                              |
|   - Stale reads acceptable                                       |
|                                                                  |
+------------------------------------------------------------------+
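
The time-based and version-based strategies combine naturally, as in the following minimal in-process Python sketch. The class name `DecisionCache`, the `TTL_BY_SENSITIVITY` table (with values taken from the tradeoff summary above), and the injectable clock are assumptions for illustration; a Redis-backed cache would look different.

```python
import time

# TTLs in seconds, following the sensitivity tiers listed above.
TTL_BY_SENSITIVITY = {"high": 30, "medium": 300, "low": 1800}

class DecisionCache:
    """TTL cache keyed on subject, action, resource, and policy version."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock          # injectable so tests can control time
        self._entries = {}           # key -> (decision, expires_at)
        self.policy_version = 1

    def _key(self, subject, action, resource):
        return f"{subject}:{action}:{resource}:v{self.policy_version}"

    def get(self, subject, action, resource):
        key = self._key(subject, action, resource)
        entry = self._entries.get(key)
        if entry is None:
            return None
        decision, expires_at = entry
        if expires_at <= self._clock():
            del self._entries[key]   # lazily evict the expired entry
            return None
        return decision

    def set(self, subject, action, resource, decision, ttl_seconds):
        if ttl_seconds <= 0:
            return                   # a TTL of 0 means "do not cache"
        key = self._key(subject, action, resource)
        self._entries[key] = (decision, self._clock() + ttl_seconds)

    def bump_policy_version(self):
        """Call on policy change: keys for the old version are never read
        again, although they stay in memory until separately cleaned up."""
        self.policy_version += 1
```

Bumping the version invalidates everything at once without scanning the cache. The cost is that stale entries linger until something removes them, so a real deployment would pair this with TTL expiry in the backing store.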

Complete Project Specification

Request/Response JSON Schemas

Authorization Request Schema

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "AuthorizationRequest",
  "type": "object",
  "required": ["subject", "action", "resource"],
  "properties": {
    "request_id": {
      "type": "string",
      "description": "Unique identifier for this request (for tracing)"
    },
    "subject": {
      "type": "object",
      "description": "The entity requesting access",
      "required": ["id"],
      "properties": {
        "id": { "type": "string" },
        "type": { "type": "string", "enum": ["user", "service", "device"] },
        "roles": { "type": "array", "items": { "type": "string" } },
        "groups": { "type": "array", "items": { "type": "string" } },
        "attributes": { "type": "object" },
        "device_health": {
          "type": "string",
          "enum": ["secure", "at_risk", "compromised", "unknown"]
        },
        "mfa_verified": { "type": "boolean" },
        "session_age_seconds": { "type": "integer" }
      }
    },
    "action": {
      "type": "string",
      "description": "The operation being requested"
    },
    "resource": {
      "type": "object",
      "description": "The target of the action",
      "required": ["id"],
      "properties": {
        "id": { "type": "string" },
        "type": { "type": "string" },
        "owner": { "type": "string" },
        "sensitivity": {
          "type": "string",
          "enum": ["public", "internal", "confidential", "critical"]
        },
        "attributes": { "type": "object" }
      }
    },
    "environment": {
      "type": "object",
      "description": "Contextual information about the request",
      "properties": {
        "timestamp": { "type": "string", "format": "date-time" },
        "ip_address": { "type": "string" },
        "location": { "type": "string" },
        "network_type": {
          "type": "string",
          "enum": ["corporate", "vpn", "public", "unknown"]
        },
        "user_agent": { "type": "string" }
      }
    }
  }
}
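
For a prototype, a few of the schema's constraints can be checked by hand with only the standard library, as in the sketch below. It is deliberately partial: it covers the required fields and two of the enums, and the function name `validate_request` and the error message wording are assumptions. A full implementation would more likely hand the schema to a JSON Schema validator library.

```python
DEVICE_HEALTH = ("secure", "at_risk", "compromised", "unknown")
SENSITIVITY = ("public", "internal", "confidential", "critical")

def validate_request(request: dict) -> list:
    """Return a list of problems; an empty list means the request passed
    these (partial) checks."""
    errors = []
    for field in ("subject", "action", "resource"):
        if field not in request:
            errors.append(f"missing required field: {field}")

    if "action" in request and not isinstance(request["action"], str):
        errors.append("action must be a string")

    for field in ("subject", "resource"):
        if field not in request:
            continue
        value = request[field]
        if not isinstance(value, dict):
            errors.append(f"{field} must be an object")
        elif not isinstance(value.get("id"), str):
            errors.append(f"{field}.id is required and must be a string")

    subject = request.get("subject")
    if isinstance(subject, dict):
        health = subject.get("device_health")
        if health is not None and health not in DEVICE_HEALTH:
            errors.append(f"subject.device_health has unknown value: {health!r}")

    resource = request.get("resource")
    if isinstance(resource, dict):
        sensitivity = resource.get("sensitivity")
        if sensitivity is not None and sensitivity not in SENSITIVITY:
            errors.append(f"resource.sensitivity has unknown value: {sensitivity!r}")
    return errors

# A request shaped like the running example in this guide.
SAMPLE_REQUEST = {
    "request_id": "req-001",
    "subject": {"id": "alice", "type": "user", "roles": ["developer"],
                "device_health": "secure", "mfa_verified": True},
    "action": "push",
    "resource": {"id": "kernel-repo", "type": "repository",
                 "sensitivity": "critical"},
    "environment": {"timestamp": "2024-12-26T14:00:00+00:00",
                    "network_type": "corporate"},
}
```

Returning a list of problems instead of raising on the first one lets the PDP report every issue in a single DENY reason, which is easier to debug from the PEP side.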

Authorization Response Schema

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "AuthorizationResponse",
  "type": "object",
  "required": ["decision", "request_id", "evaluated_at"],
  "properties": {
    "decision": {
      "type": "string",
      "enum": ["ALLOW", "DENY"]
    },
    "request_id": {
      "type": "string",
      "description": "Echo of the request ID for correlation"
    },
    "reason": {
      "type": "string",
      "description": "Human-readable explanation of the decision"
    },
    "matched_policy": {
      "type": "string",
      "description": "ID of the policy that determined this decision"
    },
    "evaluated_at": {
      "type": "string",
      "format": "date-time"
    },
    "evaluation_time_ms": {
      "type": "number",
      "description": "Time taken to evaluate (for performance monitoring)"
    },
    "obligations": {
      "type": "array",
      "description": "Actions the PEP must take if allowing (e.g., log, notify)",
      "items": {
        "type": "object",
        "properties": {
          "action": { "type": "string" },
          "parameters": { "type": "object" }
        }
      }
    }
  }
}
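
A response that satisfies this schema could be assembled as in the following Python sketch. The helper name `build_response` and its parameters are hypothetical. One point is an explicit assumption rather than something the schemas state: because the response requires `request_id` while the request does not, the sketch generates an ID when the caller omitted one.

```python
import time
import uuid
from datetime import datetime, timezone

def build_response(request, decision, reason, matched_policy=None,
                   started_at=None, obligations=None):
    """Assemble a response dict with the three required fields plus any
    optional fields that were supplied."""
    if decision not in ("ALLOW", "DENY"):
        raise ValueError(f"invalid decision: {decision!r}")
    response = {
        "decision": decision,
        # Assumption: generate an ID when the request did not carry one.
        "request_id": request.get("request_id") or str(uuid.uuid4()),
        "reason": reason,
        "evaluated_at": datetime.now(timezone.utc).isoformat(),
    }
    if matched_policy is not None:
        response["matched_policy"] = matched_policy
    if started_at is not None:
        # started_at is a time.perf_counter() reading taken before evaluation.
        elapsed = time.perf_counter() - started_at
        response["evaluation_time_ms"] = elapsed * 1000.0
    if obligations:
        response["obligations"] = obligations
    return response
```

A caller would typically record `started_at = time.perf_counter()` on receiving the request, run the evaluation, and then pass that value through so `evaluation_time_ms` reflects the full decision path.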

Policy Rule Format Specification

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Policy",
  "type": "object",
  "required": ["id", "effect"],
  "properties": {
    "id": {
      "type": "string",
      "description": "Unique policy identifier"
    },
    "name": {
      "type": "string",
      "description": "Human-readable policy name"
    },
    "description": {
      "type": "string"
    },
    "effect": {
      "type": "string",
      "enum": ["allow", "deny"]
    },
    "priority": {
      "type": "integer",
      "default": 100,
      "description": "Higher priority policies are evaluated first"
    },
    "subjects": {
      "type": "object",
      "description": "Conditions on the requesting entity",
      "properties": {
        "ids": { "type": "array", "items": { "type": "string" } },
        "roles": { "type": "array", "items": { "type": "string" } },
        "groups": { "type": "array", "items": { "type": "string" } },
        "types": { "type": "array", "items": { "type": "string" } },
        "attributes": {
          "type": "object",
          "additionalProperties": true
        }
      }
    },
    "actions": {
      "type": "array",
      "items": { "type": "string" },
      "description": "List of actions this policy applies to"
    },
    "resources": {
      "type": "object",
      "description": "Conditions on the target resource",
      "properties": {
        "ids": { "type": "array", "items": { "type": "string" } },
        "types": { "type": "array", "items": { "type": "string" } },
        "owners": { "type": "array", "items": { "type": "string" } },
        "sensitivity": { "type": "array", "items": { "type": "string" } },
        "attributes": {
          "type": "object",
          "additionalProperties": true
        }
      }
    },
    "conditions": {
      "type": "object",
      "description": "Additional conditions that must be met",
      "properties": {
        "time_range": {
          "type": "object",
          "properties": {
            "start": { "type": "string", "pattern": "^([01][0-9]|2[0-3]):[0-5][0-9]$" },
            "end": { "type": "string", "pattern": "^([01][0-9]|2[0-3]):[0-5][0-9]$" },
            "timezone": { "type": "string" },
            "days": {
              "type": "array",
              "items": { "type": "string", "enum": ["Mon","Tue","Wed","Thu","Fri","Sat","Sun"] }
            }
          }
        },
        "device_health": {
          "type": "array",
          "items": { "type": "string", "enum": ["secure", "at_risk"] }
        },
        "mfa_required": { "type": "boolean" },
        "network_types": {
          "type": "array",
          "items": { "type": "string" }
        },
        "max_session_age_seconds": { "type": "integer" },
        "custom": {
          "type": "object",
          "description": "Custom JSONPath-based conditions"
        }
      }
    },
    "obligations": {
      "type": "array",
      "description": "Actions to take when this policy matches",
      "items": {
        "type": "object",
        "properties": {
          "on": { "type": "string", "enum": ["allow", "deny", "both"] },
          "action": { "type": "string" },
          "parameters": { "type": "object" }
        }
      }
    }
  }
}
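
One practical detail when loading policies that follow this schema: a plain JSON decoder does not enforce "required" fields or apply the "default" for priority. The sketch below shows one way to do both in Go. It models only a subset of the schema, and the names Policy, parsePolicies, and defaultPriority are illustrative assumptions rather than part of the specification.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Policy mirrors a subset of the schema above. Priority is a pointer so an
// omitted field can be told apart from an explicit 0.
type Policy struct {
	ID       string   `json:"id"`
	Name     string   `json:"name,omitempty"`
	Effect   string   `json:"effect"`
	Priority *int     `json:"priority,omitempty"`
	Actions  []string `json:"actions,omitempty"`
}

// defaultPriority matches the "default": 100 declared in the schema.
const defaultPriority = 100

// parsePolicies decodes a {"policies": [...]} document, checks the required
// fields, rejects duplicate IDs, and fills in the default priority.
func parsePolicies(data []byte) ([]Policy, error) {
	var doc struct {
		Policies []Policy `json:"policies"`
	}
	if err := json.Unmarshal(data, &doc); err != nil {
		return nil, fmt.Errorf("decode policies: %w", err)
	}
	seen := make(map[string]bool)
	for i := range doc.Policies {
		p := &doc.Policies[i]
		if p.ID == "" {
			return nil, fmt.Errorf("policy %d: missing id", i)
		}
		if seen[p.ID] {
			return nil, fmt.Errorf("policy %q: duplicate id", p.ID)
		}
		seen[p.ID] = true
		if p.Effect != "allow" && p.Effect != "deny" {
			return nil, fmt.Errorf("policy %q: effect must be allow or deny", p.ID)
		}
		if p.Priority == nil {
			d := defaultPriority
			p.Priority = &d
		}
	}
	return doc.Policies, nil
}

func main() {
	policies, err := parsePolicies([]byte(`{"policies":[{"id":"p1","effect":"allow"}]}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(policies[0].ID, *policies[0].Priority)
}
```

Rejecting the whole file on the first invalid policy is a deliberate choice here: a partially loaded policy set can silently drop a deny rule.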

Example Policies for Common Scenarios

{
  "policies": [
    {
      "id": "admin-full-access",
      "name": "Administrators have full access",
      "description": "Global admin override - use with caution",
      "effect": "allow",
      "priority": 1000,
      "subjects": {
        "roles": ["admin", "super-admin"]
      },
      "actions": ["*"],
      "resources": {}
    },
    {
      "id": "dev-push-business-hours",
      "name": "Developers can push during business hours",
      "effect": "allow",
      "priority": 100,
      "subjects": {
        "roles": ["developer", "sre"]
      },
      "actions": ["push", "merge"],
      "resources": {
        "types": ["repository"]
      },
      "conditions": {
        "time_range": {
          "start": "08:00",
          "end": "20:00",
          "timezone": "America/New_York",
          "days": ["Mon", "Tue", "Wed", "Thu", "Fri"]
        },
        "device_health": ["secure"]
      }
    },
    {
      "id": "block-critical-after-hours",
      "name": "Block access to critical resources after hours",
      "effect": "deny",
      "priority": 200,
      "subjects": {},
      "actions": ["*"],
      "resources": {
        "sensitivity": ["critical"]
      },
      "conditions": {
        "time_range": {
          "start": "22:00",
          "end": "06:00"
        }
      }
    },
    {
      "id": "require-mfa-for-sensitive",
      "name": "Require MFA for confidential resources",
      "effect": "deny",
      "priority": 300,
      "subjects": {
        "attributes": {
          "mfa_verified": false
        }
      },
      "actions": ["*"],
      "resources": {
        "sensitivity": ["confidential", "critical"]
      },
      "obligations": [
        {
          "on": "deny",
          "action": "require_mfa",
          "parameters": { "redirect": "/auth/mfa" }
        }
      ]
    },
    {
      "id": "service-mesh-internal",
      "name": "Services can communicate within mesh",
      "effect": "allow",
      "priority": 50,
      "subjects": {
        "types": ["service"],
        "groups": ["internal-services"]
      },
      "actions": ["call", "invoke"],
      "resources": {
        "types": ["service", "api"]
      },
      "conditions": {
        "network_types": ["corporate", "vpn"]
      }
    },
    {
      "id": "compromised-device-block",
      "name": "Block all access from compromised devices",
      "effect": "deny",
      "priority": 999,
      "subjects": {
        "attributes": {
          "device_health": "compromised"
        }
      },
      "actions": ["*"],
      "resources": {},
      "obligations": [
        {
          "on": "deny",
          "action": "alert_security_team",
          "parameters": { "severity": "high" }
        }
      ]
    }
  ]
}

Performance Requirements

Metric          Target           Acceptable       Unacceptable
P50 Latency     < 2ms            < 5ms            > 10ms
P99 Latency     < 10ms           < 25ms           > 50ms
Throughput      > 10,000 req/s   > 5,000 req/s    < 1,000 req/s
Error Rate      < 0.01%          < 0.1%           > 1%
Policy Reload   < 100ms          < 500ms          > 1s
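
To compare your own measurements against the P50 and P99 targets, you need a percentile calculation over recorded latencies. The following is a minimal sketch using the nearest-rank method; the function name percentile and the sample values are made up for illustration, and a real load test would collect far more samples.

```go
package main

import (
	"fmt"
	"sort"
)

// percentile returns the nearest-rank p-th percentile (1 <= p <= 100) of the
// given latencies in milliseconds. It returns 0 for an empty sample set.
func percentile(latenciesMs []float64, p int) float64 {
	if len(latenciesMs) == 0 {
		return 0
	}
	// Sort a copy so the caller's slice keeps its original order.
	sorted := append([]float64(nil), latenciesMs...)
	sort.Float64s(sorted)

	// ceil(p/100 * n) computed in integer arithmetic to avoid float rounding.
	rank := (p*len(sorted) + 99) / 100
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

func main() {
	// Hypothetical evaluation times in milliseconds.
	samples := []float64{0.45, 0.38, 0.52, 0.28, 1.9, 0.41, 0.33, 7.5, 0.47, 0.36}
	fmt.Printf("P50=%.2fms P99=%.2fms\n", percentile(samples, 50), percentile(samples, 99))
}
```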

Real World Outcome

By the end of this project, you will have a high-performance authorization microservice. You can integrate this with the Proxy from Project 1 to create a complete Zero Trust flow.

What you will see:

  1. A REST API: Listening on port 9090
  2. Dynamic Policy Loading: Change a JSON file on disk, and the PDP will immediately change its decisions without a restart
  3. Detailed Decision Logs: The PDP prints why it allowed or denied a request, which is essential for security auditing

Command-Line Examples

# 1. Start your Policy Engine
$ ./zta-pdp --policy-file ./policies.json --port 9090
[INFO] PDP v1.0.0 starting...
[INFO] Loading policies from ./policies.json
[INFO] Loaded 6 security policies
[INFO] Policy Engine listening on :9090
[INFO] Healthcheck endpoint: GET /health
[INFO] Decision endpoint: POST /v1/decide

# 2. Check health status
$ curl http://localhost:9090/health
{
  "status": "healthy",
  "policies_loaded": 6,
  "uptime_seconds": 45,
  "version": "1.0.0"
}

# 3. Developer pushing to repo during business hours (ALLOWED)
$ curl -s -X POST http://localhost:9090/v1/decide \
  -H "Content-Type: application/json" \
  -d '{
    "request_id": "req-001",
    "subject": {
      "id": "alice@example.com",
      "type": "user",
      "roles": ["developer"],
      "device_health": "secure"
    },
    "action": "push",
    "resource": {
      "id": "kernel-repo",
      "type": "repository",
      "sensitivity": "internal"
    },
    "environment": {
      "timestamp": "2024-12-26T14:00:00Z",
      "network_type": "corporate"
    }
  }' | jq .

{
  "decision": "ALLOW",
  "request_id": "req-001",
  "reason": "Matched policy 'dev-push-business-hours': Developers can push during business hours",
  "matched_policy": "dev-push-business-hours",
  "evaluated_at": "2024-12-26T14:00:01.234Z",
  "evaluation_time_ms": 0.45
}

# 4. Same developer pushing at 3 AM (DENIED)
$ curl -s -X POST http://localhost:9090/v1/decide \
  -H "Content-Type: application/json" \
  -d '{
    "request_id": "req-002",
    "subject": {
      "id": "alice@example.com",
      "type": "user",
      "roles": ["developer"],
      "device_health": "secure"
    },
    "action": "push",
    "resource": {
      "id": "kernel-repo",
      "type": "repository",
      "sensitivity": "critical"
    },
    "environment": {
      "timestamp": "2024-12-26T03:00:00Z",
      "network_type": "vpn"
    }
  }' | jq .

{
  "decision": "DENY",
  "request_id": "req-002",
  "reason": "Matched policy 'block-critical-after-hours': Block access to critical resources after hours",
  "matched_policy": "block-critical-after-hours",
  "evaluated_at": "2024-12-26T03:00:01.567Z",
  "evaluation_time_ms": 0.38
}

# 5. User without MFA accessing confidential resource (DENIED with obligation)
$ curl -s -X POST http://localhost:9090/v1/decide \
  -H "Content-Type: application/json" \
  -d '{
    "request_id": "req-003",
    "subject": {
      "id": "bob@example.com",
      "type": "user",
      "roles": ["viewer"],
      "mfa_verified": false
    },
    "action": "read",
    "resource": {
      "id": "financial-reports",
      "type": "document",
      "sensitivity": "confidential"
    },
    "environment": {
      "timestamp": "2024-12-26T10:00:00Z"
    }
  }' | jq .

{
  "decision": "DENY",
  "request_id": "req-003",
  "reason": "Matched policy 'require-mfa-for-sensitive': Require MFA for confidential resources",
  "matched_policy": "require-mfa-for-sensitive",
  "evaluated_at": "2024-12-26T10:00:01.890Z",
  "evaluation_time_ms": 0.52,
  "obligations": [
    {
      "action": "require_mfa",
      "parameters": { "redirect": "/auth/mfa" }
    }
  ]
}

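# 6. Admin accessing everything (ALLOWED - admin override)
# Note: this outcome assumes the highest-priority matching policy wins (admin-full-access, priority 1000).
# Under strict deny-overrides, block-critical-after-hours also matches at 03:00 and the result would be DENY.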
$ curl -s -X POST http://localhost:9090/v1/decide \
  -H "Content-Type: application/json" \
  -d '{
    "request_id": "req-004",
    "subject": {
      "id": "charlie@example.com",
      "type": "user",
      "roles": ["admin"]
    },
    "action": "delete",
    "resource": {
      "id": "production-database",
      "type": "database",
      "sensitivity": "critical"
    },
    "environment": {
      "timestamp": "2024-12-26T03:00:00Z"
    }
  }' | jq .

{
  "decision": "ALLOW",
  "request_id": "req-004",
  "reason": "Matched policy 'admin-full-access': Administrators have full access",
  "matched_policy": "admin-full-access",
  "evaluated_at": "2024-12-26T03:00:01.234Z",
  "evaluation_time_ms": 0.28
}

# 7. Hot reload policies (update JSON file, then trigger reload)
$ curl -X POST http://localhost:9090/admin/reload-policies
{
  "status": "reloaded",
  "policies_loaded": 7,
  "reload_time_ms": 45.2
}

# 8. View audit log (last 10 decisions)
$ curl -s "http://localhost:9090/admin/audit?limit=10" | jq .
{
  "decisions": [
    {
      "request_id": "req-004",
      "subject_id": "charlie@example.com",
      "action": "delete",
      "resource_id": "production-database",
      "decision": "ALLOW",
      "matched_policy": "admin-full-access",
      "timestamp": "2024-12-26T03:00:01.234Z"
    },
    ...
  ]
}

The Core Question You’re Answering

“How do I build a system that can answer ‘Is this user allowed to do this action on this resource?’ in milliseconds, with policies that are flexible, auditable, and maintainable?”

This question sits at the heart of every secure system. Every time someone clicks “Submit,” every API call, every database query - somewhere, something must decide: allowed or denied. Traditional approaches scatter this logic across codebases, making security audits nightmares and policy changes dangerous multi-week projects. A well-designed Policy Decision Engine centralizes this critical logic, making authorization decisions consistent, traceable, and adaptable to changing business requirements without touching application code.


Concepts You Must Understand First

Before writing any code, you need to internalize these foundational concepts. For each one, make sure you can answer the associated questions.

1. Policy Decision Point (PDP) Architecture

The question to answer: What is the PDP’s role in the request lifecycle, and why must it be stateless?

The PDP is the “brain” that evaluates authorization requests. It receives context (who, what, where, when) and returns a decision. Understanding why this component must be isolated, fast, and horizontally scalable is essential.

  • How does the PDP differ from a Policy Enforcement Point (PEP)?
  • Why should the PDP never make network calls to enforce decisions?
  • What happens to your system’s reliability if the PDP becomes a single point of failure?

Read: “Zero Trust Networks” by Gilman & Barth, Chapter 3: The Zero Trust Control Plane - specifically the sections on policy architecture and the separation of concerns between decision and enforcement.

2. Access Control Models: ABAC vs RBAC vs PBAC

The question to answer: When is RBAC sufficient, and when does the complexity of ABAC become necessary?

Role-Based Access Control (RBAC) assigns permissions to roles, then assigns roles to users. Attribute-Based Access Control (ABAC) evaluates policies based on arbitrary attributes of subjects, resources, actions, and context. Policy-Based Access Control (PBAC) uses explicit policy statements that can combine elements of both.

  • What is “role explosion” and why does it plague large RBAC systems?
  • How would you express “contractors can only access non-production resources during business hours” in RBAC vs ABAC?
  • What attributes beyond “role” might influence an authorization decision?

Read: “Security in Computing” by Charles Pfleeger, Chapter 4: Access Control - covers the theoretical foundations of access control matrices, capabilities, and access control lists.

3. Policy Languages: Rego, Cedar, and XACML Concepts

The question to answer: What makes a policy language expressive enough to capture real security requirements, yet analyzable enough to prove properties about the policies?

Production systems use specialized Domain-Specific Languages (DSLs) for expressing policies. Understanding why these exist - and their tradeoffs - helps you design a sensible policy format even if you start with JSON.

  • What is Datalog, and why do languages like Rego build on it?
  • How does Cedar’s type system help prevent policy errors?
  • Why is it valuable to be able to answer “Can this policy ever allow access to resource X?”

Read: “Zero Trust Networks” by Gilman & Barth, Chapter 3 - discusses policy engines and their role. Additionally, explore the Open Policy Agent (OPA) documentation on Rego’s evaluation model.

4. Rule Evaluation and Conflict Resolution

The question to answer: When Policy A says ALLOW and Policy B says DENY, what should happen and why?

Real systems have many policies, and they overlap. Multiple policies might apply to a single request. The conflict resolution strategy you choose fundamentally affects your security posture.

  • What does “deny overrides” mean, and why is it the most common strategy?
  • How does policy priority ordering work, and when is it necessary?
  • What is the “default deny” principle, and how does it relate to fail-closed behavior?

Read: “Foundations of Information Security” by Jason Andress, Chapter 5: Authentication and Authorization - covers authorization principles and the logic behind access decisions.

5. Caching Strategies for Policy Decisions

The question to answer: How do you cache authorization decisions without creating security holes?

Every millisecond counts when every request requires an authorization check. Caching can reduce latency by 10-100x, but caching security decisions introduces risks: stale permissions, delayed revocation, cache poisoning.

  • What should the cache key be for an authorization decision?
  • When a user’s permissions are revoked, how quickly must cached decisions be invalidated?
  • What is cache stampede, and how do you prevent it during policy reloads?

Read: “Designing Data-Intensive Applications, 2nd Ed” by Martin Kleppmann, Chapters 1-2 - covers caching patterns, consistency tradeoffs, and the challenges of distributed state.

6. Audit Logging for Compliance

The question to answer: What information must you log to reconstruct exactly why access was granted or denied?

Authorization decisions are among the most security-sensitive events in a system. Regulators, auditors, and incident responders need to understand who accessed what, when, and why.

  • What is the difference between an audit log and an application log?
  • Why must audit logs be append-only and tamper-evident?
  • What fields are essential in an authorization audit record?

Read: “Security in Computing” by Charles Pfleeger, sections on auditing and accountability - discusses what makes an audit trail useful for security analysis.
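
One common way to make a log tamper-evident is a hash chain: each record stores the hash of the previous record, so editing any earlier entry breaks every link after it. The sketch below illustrates the idea in memory. The record fields follow the audit output shown earlier; PrevHash, Hash, appendRecord, and verifyChain are assumptions introduced for this example.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// AuditRecord carries the decision fields plus two chain fields.
type AuditRecord struct {
	RequestID     string `json:"request_id"`
	SubjectID     string `json:"subject_id"`
	Action        string `json:"action"`
	ResourceID    string `json:"resource_id"`
	Decision      string `json:"decision"`
	MatchedPolicy string `json:"matched_policy"`
	Timestamp     string `json:"timestamp"`
	PrevHash      string `json:"prev_hash"`
	Hash          string `json:"hash"`
}

// recordHash hashes every field except Hash itself.
func recordHash(r AuditRecord) string {
	r.Hash = ""
	b, _ := json.Marshal(r) // a struct of strings cannot fail to marshal
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// appendRecord links r to the last record and returns the extended log.
func appendRecord(entries []AuditRecord, r AuditRecord) []AuditRecord {
	r.PrevHash = ""
	if len(entries) > 0 {
		r.PrevHash = entries[len(entries)-1].Hash
	}
	r.Hash = recordHash(r)
	return append(entries, r)
}

// verifyChain returns the index of the first inconsistent record, or -1 if
// the chain is intact.
func verifyChain(entries []AuditRecord) int {
	prev := ""
	for i, r := range entries {
		if r.PrevHash != prev || r.Hash != recordHash(r) {
			return i
		}
		prev = r.Hash
	}
	return -1
}

func main() {
	var entries []AuditRecord
	entries = appendRecord(entries, AuditRecord{RequestID: "req-003", Decision: "DENY"})
	entries = appendRecord(entries, AuditRecord{RequestID: "req-004", Decision: "ALLOW"})
	fmt.Println("intact:", verifyChain(entries) == -1)

	entries[0].Decision = "ALLOW" // simulate someone rewriting history
	fmt.Println("first broken record:", verifyChain(entries))
}
```

A chain like this detects edits but not wholesale replacement or truncation of the tail, so the latest hash usually needs to be stored somewhere the PDP host cannot rewrite.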


Questions to Guide Your Design

These questions should shape your implementation decisions. Answer them before and during your build.

Architecture Questions

  1. Where does your PDP fit? Draw a diagram showing how requests flow from a user through authentication, to your PDP, to the protected resource. Where is the PEP in this flow?

  2. What is your API contract? What fields are required in an authorization request? What does a response look like? What HTTP status codes do you return?

  3. How do you handle unknown fields? If a request includes an attribute your PDP doesn’t understand, do you ignore it, reject the request, or log a warning?

  4. What is your failure mode? If your PDP crashes, throws an exception, or times out, is the request allowed or denied?
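
For the failure-mode question, a fail-closed design converts every abnormal outcome into DENY. The sketch below wraps an evaluation function with a deadline and panic recovery. It is one possible shape, not a prescribed one: evalFunc, decideFailClosed, and the demo evaluators are hypothetical names.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

type Decision string

const (
	Allow Decision = "ALLOW"
	Deny  Decision = "DENY"
)

// evalFunc stands in for the real policy evaluation (hypothetical signature).
type evalFunc func(ctx context.Context) (Decision, error)

// decideFailClosed runs eval under a deadline. Errors, panics, timeouts, and
// any result other than Allow all collapse to Deny.
func decideFailClosed(eval evalFunc, timeout time.Duration) (Decision, string) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	type result struct {
		decision Decision
		err      error
	}
	done := make(chan result, 1) // buffered so a late goroutine can still exit
	go func() {
		defer func() {
			if r := recover(); r != nil {
				done <- result{Deny, fmt.Errorf("panic: %v", r)}
			}
		}()
		d, err := eval(ctx)
		done <- result{d, err}
	}()

	select {
	case res := <-done:
		if res.err != nil {
			return Deny, "evaluation error: " + res.err.Error()
		}
		if res.decision != Allow {
			return Deny, "policy decision"
		}
		return Allow, "policy decision"
	case <-ctx.Done():
		return Deny, "evaluation timed out"
	}
}

// Demo evaluators covering each failure mode.
func allowEval(ctx context.Context) (Decision, error) { return Allow, nil }
func failingEval(ctx context.Context) (Decision, error) {
	return Allow, errors.New("policy store unavailable") // error wins over the value
}
func panickingEval(ctx context.Context) (Decision, error) { panic("nil policy") }
func slowEval(ctx context.Context) (Decision, error) {
	<-ctx.Done() // simulates a dependency that never answers in time
	return Deny, ctx.Err()
}

func main() {
	cases := []struct {
		name string
		eval evalFunc
	}{
		{"allow", allowEval}, {"error", failingEval},
		{"panic", panickingEval}, {"timeout", slowEval},
	}
	for _, c := range cases {
		d, reason := decideFailClosed(c.eval, 20*time.Millisecond)
		fmt.Printf("%-8s %s (%s)\n", c.name, d, reason)
	}
}
```

The PEP needs the same discipline: if it cannot reach the PDP at all, it should also treat the missing answer as a denial.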

Policy Design Questions

  1. How expressive is your policy language? Can you express “Developers can push to non-production repos during NYC business hours from devices that passed health checks in the last 24 hours”?

  2. How do you handle wildcards? Does actions: ["*"] match any action? Does resources: {} (empty object) match all resources or no resources?

  3. What happens when no policy matches? Is this an implicit deny, or an error condition?

  4. How do you test policies before deploying them? Can you simulate a decision without affecting production?

Performance Questions

  1. What is your latency budget? If you have 100ms for the entire request, how much can the PDP consume?

  2. How do you scale? Can you run multiple PDP instances? Do they share state?

  3. What can you precompute? Can you compile policies into faster data structures at load time?

  4. What can you cache? Are there request patterns that repeat frequently enough to cache?

Operational Questions

  1. How do you update policies without downtime? What happens to in-flight requests during a policy reload?

  2. How do you know your PDP is healthy? What metrics do you expose? What does a health check verify?

  3. How do you debug a wrong decision? Can you replay a request and see exactly which policy matched and why?


Thinking Exercise

Before you write code, work through this scenario with pencil and paper. This exercises your understanding of policy evaluation.

Scenario: Multi-Policy Evaluation

You have four policies loaded in your PDP:

Policy A (priority: 100, effect: ALLOW)
  - subjects: roles contain "developer"
  - actions: ["read", "push"]
  - resources: type = "repository"
  - conditions: none

Policy B (priority: 200, effect: DENY)
  - subjects: all
  - actions: ["push", "merge", "delete"]
  - resources: sensitivity = "critical"
  - conditions: time outside 08:00-18:00

Policy C (priority: 150, effect: ALLOW)
  - subjects: roles contain "admin"
  - actions: ["*"]
  - resources: all
  - conditions: none

Policy D (priority: 300, effect: DENY)
  - subjects: device_health = "compromised"
  - actions: ["*"]
  - resources: all
  - conditions: none

Now trace through these authorization requests. For each one, identify:

  1. Which policies match the subject, action, and resource?
  2. Which of those also pass their condition checks?
  3. Using “deny overrides” with priority ordering, what is the final decision?

Request 1:

{
  "subject": { "id": "alice", "roles": ["developer"], "device_health": "secure" },
  "action": "push",
  "resource": { "id": "web-app", "type": "repository", "sensitivity": "internal" },
  "environment": { "timestamp": "2024-12-26T14:00:00Z" }
}

Request 2:

{
  "subject": { "id": "bob", "roles": ["developer"], "device_health": "secure" },
  "action": "push",
  "resource": { "id": "payment-service", "type": "repository", "sensitivity": "critical" },
  "environment": { "timestamp": "2024-12-26T22:00:00Z" }
}

Request 3:

{
  "subject": { "id": "charlie", "roles": ["admin"], "device_health": "secure" },
  "action": "delete",
  "resource": { "id": "payment-service", "type": "repository", "sensitivity": "critical" },
  "environment": { "timestamp": "2024-12-26T22:00:00Z" }
}

Request 4:

{
  "subject": { "id": "diana", "roles": ["admin"], "device_health": "compromised" },
  "action": "read",
  "resource": { "id": "docs", "type": "document", "sensitivity": "public" },
  "environment": { "timestamp": "2024-12-26T10:00:00Z" }
}

Work through each request step by step. Write down your reasoning. Then check your answers:

Request 1: ALLOW (via Policy A)

  • Policy A matches: developer, push, repository - no conditions - ALLOW
  • Policy B: matches push, but sensitivity is “internal” not “critical” - no match
  • Policy C: not admin - no match
  • Policy D: device not compromised - no match
  • Only Policy A applies. Decision: ALLOW

Request 2: DENY (via Policy B)

  • Policy A: matches subject, action, resource - ALLOW candidate
  • Policy B: matches (push action, critical sensitivity, time 22:00 is outside 08:00-18:00) - DENY candidate
  • Policy C: not admin - no match
  • Policy D: device not compromised - no match
  • Policy B has higher priority (200 > 100) and is DENY. With deny-overrides: DENY

Request 3: DENY (via Policy B)

  • Policy A: subject is an admin, not a developer, and “delete” is not in [“read”, “push”] - no match
  • Policy B: delete is in [“push”, “merge”, “delete”], sensitivity is critical, time 22:00 is outside hours - DENY candidate
  • Policy C: admin, wildcard action - ALLOW candidate
  • Policy D: device not compromised - no match
  • Both B and C match. B has priority 200, C has priority 150. But this is deny-overrides, so ANY deny causes denial regardless of priority. Decision: DENY

Request 4: DENY (via Policy D)

  • Policy A: admin not in [“developer”] - no match
  • Policy B: action “read” not in [“push”, “merge”, “delete”] - no match
  • Policy C: admin, wildcard action - ALLOW candidate
  • Policy D: device_health is compromised - DENY candidate
  • Both C and D match. D has higher priority (300) and is DENY. With deny-overrides: DENY

If your answers differed, review the matching logic. Pay special attention to:

  • Whether empty conditions mean “always true” or “never match”
  • How “deny overrides” interacts with priority
  • The difference between “no match” and “match with opposite effect”
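
If you want to check your paper trace mechanically, the four policies and requests can be encoded in a few lines of Go. This is a sketch under simplifying assumptions: each policy is reduced to a predicate function, the timestamp is reduced to an hour of day, and the names Request, Policy, resolve, and exercisePolicies are invented for this example. Priority is used only to choose which matching policy gets reported, since under deny-overrides any matching DENY wins.

```go
package main

import (
	"fmt"
	"sort"
)

// Request is flattened to the fields the four exercise policies look at.
type Request struct {
	Roles        []string
	DeviceHealth string
	Action       string
	ResourceType string
	Sensitivity  string
	Hour         int // hour of day taken from environment.timestamp
}

// Policy pairs an effect with a predicate covering subject, action,
// resource, and conditions.
type Policy struct {
	ID       string
	Priority int
	Effect   string // "ALLOW" or "DENY"
	Matches  func(Request) bool
}

func contains(list []string, v string) bool {
	for _, item := range list {
		if item == v {
			return true
		}
	}
	return false
}

// resolve applies deny-overrides: any matching DENY wins, otherwise the
// highest-priority matching ALLOW, otherwise default deny.
func resolve(policies []Policy, req Request) (decision, matched string) {
	ordered := append([]Policy(nil), policies...)
	sort.SliceStable(ordered, func(i, j int) bool {
		return ordered[i].Priority > ordered[j].Priority
	})
	firstAllow := ""
	for _, p := range ordered {
		if !p.Matches(req) {
			continue
		}
		if p.Effect == "DENY" {
			return "DENY", p.ID
		}
		if firstAllow == "" {
			firstAllow = p.ID
		}
	}
	if firstAllow != "" {
		return "ALLOW", firstAllow
	}
	return "DENY", "" // no policy matched
}

var exercisePolicies = []Policy{
	{"A", 100, "ALLOW", func(r Request) bool {
		return contains(r.Roles, "developer") &&
			contains([]string{"read", "push"}, r.Action) && r.ResourceType == "repository"
	}},
	{"B", 200, "DENY", func(r Request) bool {
		return contains([]string{"push", "merge", "delete"}, r.Action) &&
			r.Sensitivity == "critical" && (r.Hour < 8 || r.Hour >= 18)
	}},
	{"C", 150, "ALLOW", func(r Request) bool { return contains(r.Roles, "admin") }},
	{"D", 300, "DENY", func(r Request) bool { return r.DeviceHealth == "compromised" }},
}

func main() {
	requests := []Request{
		{[]string{"developer"}, "secure", "push", "repository", "internal", 14},
		{[]string{"developer"}, "secure", "push", "repository", "critical", 22},
		{[]string{"admin"}, "secure", "delete", "repository", "critical", 22},
		{[]string{"admin"}, "compromised", "read", "document", "public", 10},
	}
	for i, r := range requests {
		d, p := resolve(exercisePolicies, r)
		fmt.Printf("Request %d: %s (policy %s)\n", i+1, d, p)
	}
}
```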

Hints in Layers

If you get stuck, reveal hints progressively. Try to solve problems yourself first.

Hint 1: Start With the Simplest Possible Version

Don’t start with JSON policy files. Start with hardcoded policies in your code. Create a /v1/decide endpoint that:

  1. Accepts a POST request with JSON body
  2. Parses subject, action, and resource from the body
  3. Has ONE hardcoded rule: “if subject.roles contains ‘admin’, return ALLOW”
  4. Returns DENY for everything else

Once this works, you’ve proven your HTTP layer, JSON parsing, and decision response format. Only then add policy loading.
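
The four steps above could look roughly like this in Go. It is a minimal sketch, not a reference implementation: the type and function names are assumptions, and main exercises the handler in-process with net/http/httptest so the program exits on its own. A real service would call http.ListenAndServe(":9090", mux) instead.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

type Subject struct {
	ID    string   `json:"id"`
	Roles []string `json:"roles"`
}

type DecisionRequest struct {
	RequestID string  `json:"request_id"`
	Subject   Subject `json:"subject"`
	Action    string  `json:"action"`
}

type DecisionResponse struct {
	Decision  string `json:"decision"`
	RequestID string `json:"request_id"`
	Reason    string `json:"reason"`
}

// decide holds the single hardcoded rule: admins are allowed, everyone else
// is denied.
func decide(req DecisionRequest) DecisionResponse {
	for _, role := range req.Subject.Roles {
		if role == "admin" {
			return DecisionResponse{
				Decision:  "ALLOW",
				RequestID: req.RequestID,
				Reason:    "hardcoded rule: subject has role admin",
			}
		}
	}
	return DecisionResponse{
		Decision:  "DENY",
		RequestID: req.RequestID,
		Reason:    "default deny: no rule matched",
	}
}

// handleDecide accepts POST with a JSON body and writes the decision as JSON.
func handleDecide(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	var req DecisionRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, "invalid JSON body", http.StatusBadRequest)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(decide(req))
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/v1/decide", handleDecide)

	// In-process demo request; swap for http.ListenAndServe(":9090", mux)
	// to serve real traffic.
	body := `{"request_id":"req-004","subject":{"id":"charlie@example.com","roles":["admin"]},"action":"delete"}`
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest(http.MethodPost, "/v1/decide", strings.NewReader(body)))
	fmt.Printf("%d %s", rec.Code, rec.Body.String())
}
```

Keeping decide as a pure function separate from the HTTP handler makes it easy to unit test now and easy to replace with the policy engine later.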

Hint 2: Policy Matching is Just Set Intersection

Most policy matching reduces to: “Does the request attribute intersect with the policy’s allowed set?”

func matchesSubject(request, policy):
    if policy.subjects.roles is empty:
        return true  // Empty means "any"
    return intersection(request.subject.roles, policy.subjects.roles) is not empty

The same pattern applies to actions and resource types. An empty constraint means “match all.” A non-empty constraint means “at least one must match.”

Watch out for the wildcard action "*" - handle it explicitly.
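
In Go, the same idea with explicit wildcard handling might look like the sketch below. The helper name matchesAny is an assumption; the semantics (empty means any, "*" matches everything, otherwise at least one value must intersect) follow the description above.

```go
package main

import "fmt"

// matchesAny reports whether a request's values satisfy a policy constraint.
// An empty constraint matches everything, "*" matches everything, and
// otherwise at least one requested value must appear in the allowed set.
func matchesAny(requested, allowed []string) bool {
	if len(allowed) == 0 {
		return true // empty means "any"
	}
	set := make(map[string]struct{}, len(allowed))
	for _, a := range allowed {
		if a == "*" {
			return true // explicit wildcard
		}
		set[a] = struct{}{}
	}
	for _, r := range requested {
		if _, ok := set[r]; ok {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(matchesAny([]string{"developer"}, []string{"developer", "sre"})) // roles intersect
	fmt.Println(matchesAny([]string{"delete"}, []string{"*"}))                   // wildcard action
	fmt.Println(matchesAny([]string{"viewer"}, []string{"admin"}))               // no overlap
}
```

For a single-valued field such as action, pass a one-element slice. Note that an empty constraint matching everything is a policy-format decision you should document, because it makes a forgotten field behave like a wildcard.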

Hint 3: Conditions Are Just Boolean Expressions

Each condition type is a function that returns true or false:

func evaluateConditions(request, policy):
    for each condition in policy.conditions:
        if condition is time_range:
            if not isInTimeRange(request.environment.timestamp, condition.start, condition.end):
                return false
        if condition is device_health:
            if request.subject.device_health not in condition.allowed_states:
                return false
        // ... more condition types
    return true  // All conditions passed

Start by supporting just one condition type (time_range is a good first choice). Add more incrementally.
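
The time_range check has two traps worth seeing in code: windows that wrap past midnight (such as 22:00 to 06:00) and timezone conversion. The sketch below handles both. Several choices are assumptions, not requirements from the schema: an empty timezone means UTC, the start is inclusive and the end exclusive, and for an overnight window the "days" list refers to the day the window started.

```go
package main

import (
	"fmt"
	"time"
	_ "time/tzdata" // embeds the IANA database so LoadLocation works on minimal hosts
)

// TimeRange mirrors the time_range condition from the policy schema.
type TimeRange struct {
	Start, End string   // "HH:MM"
	Timezone   string   // IANA name; empty is treated as UTC (assumption)
	Days       []string // "Mon".."Sun"; empty means every day
}

func minutesOfDay(hhmm string) (int, error) {
	t, err := time.Parse("15:04", hhmm)
	if err != nil {
		return 0, err
	}
	return t.Hour()*60 + t.Minute(), nil
}

// inTimeRange reports whether an RFC 3339 timestamp falls inside tr.
func inTimeRange(timestamp string, tr TimeRange) (bool, error) {
	ts, err := time.Parse(time.RFC3339, timestamp)
	if err != nil {
		return false, fmt.Errorf("timestamp: %w", err)
	}
	loc := time.UTC
	if tr.Timezone != "" {
		if loc, err = time.LoadLocation(tr.Timezone); err != nil {
			return false, fmt.Errorf("timezone: %w", err)
		}
	}
	start, err := minutesOfDay(tr.Start)
	if err != nil {
		return false, fmt.Errorf("start: %w", err)
	}
	end, err := minutesOfDay(tr.End)
	if err != nil {
		return false, fmt.Errorf("end: %w", err)
	}

	local := ts.In(loc)
	now := local.Hour()*60 + local.Minute()
	day := local.Weekday()

	var inWindow bool
	if start <= end {
		inWindow = now >= start && now < end
	} else {
		// Overnight window: matches late evening or early morning.
		inWindow = now >= start || now < end
		if now < end {
			day = (day + 6) % 7 // the window began on the previous day
		}
	}
	if !inWindow || len(tr.Days) == 0 {
		return inWindow, nil
	}
	name := day.String()[:3] // "Thursday" -> "Thu"
	for _, d := range tr.Days {
		if d == name {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	business := TimeRange{Start: "08:00", End: "20:00", Timezone: "America/New_York",
		Days: []string{"Mon", "Tue", "Wed", "Thu", "Fri"}}
	night := TimeRange{Start: "22:00", End: "06:00"}

	ok, err := inTimeRange("2024-12-26T14:00:00Z", business)
	fmt.Println("business hours:", ok, err)
	ok, err = inTimeRange("2024-12-26T03:00:00Z", night)
	fmt.Println("after hours:", ok, err)
}
```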

Hint 4: Use RWMutex for Policy Hot-Reload

The classic reader-writer problem: many goroutines read policies (evaluating decisions), but occasionally one goroutine writes (reloading policies).

type PolicyStore struct {
    mu       sync.RWMutex
    policies []Policy
}

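func (ps *PolicyStore) Evaluate(req Request) Decision {
    ps.mu.RLock()         // Multiple readers can hold this simultaneously
    defer ps.mu.RUnlock()
    // Read from ps.policies safely; evaluate stands for your matching and
    // conflict-resolution logic (not shown here)
    return evaluate(ps.policies, req)
}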

func (ps *PolicyStore) Reload(newPolicies []Policy) {
    ps.mu.Lock()          // Exclusive lock - blocks all readers
    defer ps.mu.Unlock()
    ps.policies = newPolicies
}

The key insight: RLock doesn’t block other RLocks, so decision evaluation remains parallel. Only during the brief moment of Reload do readers block.
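
To keep that blocking window as short as possible, do the slow work (reading, parsing, validating) before taking the write lock, and hold the lock only for the slice swap. The runnable sketch below shows this with a reduced Policy type; ReloadJSON and Snapshot are hypothetical method names. A failed parse leaves the current policies in place.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

type Policy struct {
	ID     string `json:"id"`
	Effect string `json:"effect"`
}

type PolicyStore struct {
	mu       sync.RWMutex
	policies []Policy
}

// Snapshot returns the current policy slice. Reload replaces the slice
// rather than mutating it, so callers can keep reading after the lock is
// released.
func (ps *PolicyStore) Snapshot() []Policy {
	ps.mu.RLock()
	defer ps.mu.RUnlock()
	return ps.policies
}

// ReloadJSON parses and validates outside the lock, then swaps the slice
// under the write lock. On error the old policies stay active.
func (ps *PolicyStore) ReloadJSON(data []byte) (int, error) {
	var doc struct {
		Policies []Policy `json:"policies"`
	}
	if err := json.Unmarshal(data, &doc); err != nil {
		return 0, fmt.Errorf("reload rejected, keeping current policies: %w", err)
	}
	for _, p := range doc.Policies {
		if p.ID == "" || (p.Effect != "allow" && p.Effect != "deny") {
			return 0, fmt.Errorf("reload rejected: invalid policy %q", p.ID)
		}
	}
	ps.mu.Lock()
	ps.policies = doc.Policies // the only work done while readers are blocked
	ps.mu.Unlock()
	return len(doc.Policies), nil
}

func main() {
	store := &PolicyStore{}
	store.ReloadJSON([]byte(`{"policies":[{"id":"a","effect":"allow"}]}`))

	// Readers keep evaluating while a reload happens.
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				_ = len(store.Snapshot())
			}
		}()
	}
	n, err := store.ReloadJSON([]byte(`{"policies":[{"id":"a","effect":"allow"},{"id":"b","effect":"deny"}]}`))
	wg.Wait()
	fmt.Println("policies loaded:", n, "error:", err)
}
```

Each in-flight evaluation sees either the complete old set or the complete new set, never a mixture.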

Hint 5: Cache Key Design Matters

A naive cache key might be: hash(entire_request). But this has poor hit rates because timestamps differ on every request.

Better approach: Cache based on the attributes that actually affect the decision:

func cacheKey(request):
    relevant = {
        "subject_id": request.subject.id,
        "subject_roles": sorted(request.subject.roles),
        "action": request.action,
        "resource_id": request.resource.id,
        "resource_type": request.resource.type,
        // Note: NOT including timestamp, only hour
        "hour_bucket": request.environment.timestamp.hour
    }
    return sha256(json(relevant))

The “hour bucket” means decisions are cached per hour, which is appropriate for hourly time-range policies. Adjust granularity based on your policy precision.

Also consider: what happens when a user’s roles change? You need a way to invalidate their cached decisions. One pattern: include a “cache version” in the key, and increment it on role changes.
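
A Go version of this key, including the "cache version" idea, might look like the sketch below. It deviates from the pseudocode in a few labeled ways that you may or may not want: it adds device health and resource sensitivity because the example policies depend on them, and its hour bucket includes the date so that policies with a "days" condition are not served a stale answer from another day. KeyInput and its field names are assumptions for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"sort"
	"time"
)

// KeyInput lists the attributes that can change a decision.
type KeyInput struct {
	SubjectID    string
	Roles        []string
	DeviceHealth string
	Action       string
	ResourceID   string
	Sensitivity  string
	Timestamp    string // RFC 3339, as in environment.timestamp
	Version      int    // bumped on role or policy changes to invalidate old entries
}

// cacheKey hashes a canonical form of the input: roles are sorted and the
// timestamp is truncated to the hour (UTC).
func cacheKey(in KeyInput) (string, error) {
	ts, err := time.Parse(time.RFC3339, in.Timestamp)
	if err != nil {
		return "", fmt.Errorf("timestamp: %w", err)
	}
	roles := append([]string(nil), in.Roles...) // copy before sorting
	sort.Strings(roles)

	relevant := struct {
		SubjectID    string   `json:"subject_id"`
		Roles        []string `json:"subject_roles"`
		DeviceHealth string   `json:"device_health"`
		Action       string   `json:"action"`
		ResourceID   string   `json:"resource_id"`
		Sensitivity  string   `json:"sensitivity"`
		HourBucket   string   `json:"hour_bucket"`
		Version      int      `json:"version"`
	}{
		in.SubjectID, roles, in.DeviceHealth, in.Action, in.ResourceID,
		in.Sensitivity, ts.UTC().Truncate(time.Hour).Format(time.RFC3339), in.Version,
	}
	b, err := json.Marshal(relevant)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	key, err := cacheKey(KeyInput{
		SubjectID: "alice@example.com", Roles: []string{"sre", "developer"},
		DeviceHealth: "secure", Action: "push", ResourceID: "kernel-repo",
		Sensitivity: "internal", Timestamp: "2024-12-26T14:05:00Z", Version: 1,
	})
	fmt.Println(key, err)
}
```

If any policy uses boundaries that are not on the hour (for example 08:30), an hourly bucket can return a wrong cached answer near the boundary, so shrink the bucket to match your finest policy granularity.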


Solution Architecture

Component Diagram

+------------------------------------------------------------------+
|                    PDP Architecture Overview                      |
+------------------------------------------------------------------+
|                                                                  |
|   EXTERNAL                        INTERNAL                       |
|   +------------------+            +---------------------------+  |
|   |  Policy JSON     |            |      Policy Engine        |  |
|   |  File/DB         |----------->|                           |  |
|   +------------------+            |  +---------------------+  |  |
|           ^                       |  | Policy Store        |  |  |
|           |                       |  | (In-Memory + Index) |  |  |
|   +------------------+            |  +---------------------+  |  |
|   | File Watcher /   |            |           |               |  |
|   | Admin API        |------------|           v               |  |
|   +------------------+            |  +---------------------+  |  |
|                                   |  | Rule Evaluator      |  |  |
|   +------------------+            |  | - Condition Matcher |  |  |
|   |  HTTP Server     |----------->|  | - Conflict Resolver |  |  |
|   |  :9090           |            |  +---------------------+  |  |
|   +------------------+            |           |               |  |
|           ^                       |           v               |  |
|           |                       |  +---------------------+  |  |
|   +------------------+            |  | Decision Cache      |  |  |
|   |  PEP / Gateway   |            |  | (Optional Redis)    |  |  |
|   |  (Project 1)     |            |  +---------------------+  |  |
|   +------------------+            |           |               |  |
|                                   |           v               |  |
|   +------------------+            |  +---------------------+  |  |
|   | Policy Info      |<-----------|  | Audit Logger        |  |  |
|   | Points (PIPs):   |            |  | (Append-Only Log)   |  |  |
|   | - Device Health  |            |  +---------------------+  |  |
|   | - Threat Intel   |            |                           |  |
|   +------------------+            +---------------------------+  |
|                                                                  |
+------------------------------------------------------------------+

Policy Evaluation Flow

+------------------------------------------------------------------+
|                    Policy Evaluation Flow                         |
+------------------------------------------------------------------+
|                                                                  |
|   1. REQUEST RECEIVED                                            |
|   +----------------------------------------------------------+   |
|   | POST /v1/decide                                          |   |
|   | { subject, action, resource, environment }               |   |
|   +-------------------------+--------------------------------+   |
|                             |                                    |
|                             v                                    |
|   2. REQUEST VALIDATION                                          |
|   +----------------------------------------------------------+   |
|   | - Validate JSON schema                                   |   |
|   | - Normalize fields (lowercase, trim)                     |   |
|   | - Set defaults for missing optional fields               |   |
|   | - Generate request_id if not provided                    |   |
|   +-------------------------+--------------------------------+   |
|                             |                                    |
|                             v                                    |
|   3. CACHE CHECK (Optional)                                      |
|   +----------------------------------------------------------+   |
|   | cache_key = hash(decision-relevant attributes)           |   |
|   | if cache.has(cache_key):                                 |   |
|   |     return cache.get(cache_key)  ----------------------->|-->|
|   +-------------------------+--------------------------------+   |
|                             |                                    |
|                             v                                    |
|   4. DATA ENRICHMENT (Optional)                                  |
|   +----------------------------------------------------------+   |
|   | - Fetch device health from Device Trust Service          |   |
|   | - Fetch user attributes from Identity Store              |   |
|   | - Fetch threat intel for IP address                      |   |
|   | - Attach enriched data to request context                |   |
|   +-------------------------+--------------------------------+   |
|                             |                                    |
|                             v                                    |
|   5. POLICY MATCHING                                             |
|   +----------------------------------------------------------+   |
|   | for policy in policies (sorted by priority DESC):        |   |
|   |     if matches_subject(request, policy) AND              |   |
|   |        matches_action(request, policy) AND               |   |
|   |        matches_resource(request, policy):                |   |
|   |            add to candidate_policies                     |   |
|   +-------------------------+--------------------------------+   |
|                             |                                    |
|                             v                                    |
|   6. CONDITION EVALUATION                                        |
|   +----------------------------------------------------------+   |
|   | for policy in candidate_policies:                        |   |
|   |     if evaluate_conditions(request, policy.conditions):  |   |
|   |         add to matching_policies                         |   |
|   +-------------------------+--------------------------------+   |
|                             |                                    |
|                             v                                    |
|   7. CONFLICT RESOLUTION                                         |
|   +----------------------------------------------------------+   |
|   | Strategy: Deny Overrides                                 |   |
|   |                                                          |   |
|   | if any(p.effect == DENY for p in matching_policies):     |   |
|   |     decision = DENY                                      |   |
|   |     matched_policy = first DENY policy                   |   |
|   | elif any(p.effect == ALLOW for p in matching_policies):  |   |
|   |     decision = ALLOW                                     |   |
|   |     matched_policy = first ALLOW policy                  |   |
|   | else:                                                    |   |
|   |     decision = DENY  # Default deny if no match          |   |
|   |     reason = "No matching policy found"                  |   |
|   +-------------------------+--------------------------------+   |
|                             |                                    |
|                             v                                    |
|   8. RESPONSE GENERATION                                         |
|   +----------------------------------------------------------+   |
|   | response = {                                             |   |
|   |     decision: ALLOW/DENY,                                |   |
|   |     reason: matched_policy.description,                  |   |
|   |     matched_policy: matched_policy.id,                   |   |
|   |     obligations: matched_policy.obligations,             |   |
|   |     evaluated_at: now(),                                 |   |
|   |     evaluation_time_ms: elapsed                          |   |
|   | }                                                        |   |
|   +-------------------------+--------------------------------+   |
|                             |                                    |
|                             v                                    |
|   9. CACHE UPDATE & AUDIT                                        |
|   +----------------------------------------------------------+   |
|   | cache.set(cache_key, response, ttl=300)                  |   |
|   | audit_log.append(request, response)                      |   |
|   +-------------------------+--------------------------------+   |
|                             |                                    |
|                             v                                    |
|   10. RETURN RESPONSE                                            |
|   +----------------------------------------------------------+   |
|   | HTTP 200 OK                                              |   |
|   | Content-Type: application/json                           |   |
|   | { decision, reason, ... }                                |   |
|   +----------------------------------------------------------+   |
|                                                                  |
+------------------------------------------------------------------+
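
Step 7 of the flow above can be sketched in Go as follows. This is a minimal illustration, not the reference implementation: the `Policy` and `Decision` shapes and the `resolve` name are assumptions made for the example, and the input slice is assumed to hold only policies that already passed steps 5 and 6, sorted by priority.

```go
package main

// Effect is the outcome a policy asks for when it matches.
type Effect string

const (
	Allow Effect = "ALLOW"
	Deny  Effect = "DENY"
)

// Policy carries only the fields this sketch needs (hypothetical shape).
type Policy struct {
	ID       string
	Effect   Effect
	Priority int
}

// Decision mirrors a subset of the response built in step 8.
type Decision struct {
	Effect        Effect
	MatchedPolicy string
	Reason        string
}

// resolve applies deny-overrides to policies that already passed
// matching and condition evaluation. The slice is assumed to be
// sorted by priority, highest first.
func resolve(matching []Policy) Decision {
	var firstAllow *Policy
	for i := range matching {
		p := &matching[i]
		if p.Effect == Deny {
			// Any DENY wins immediately, regardless of ALLOW policies.
			return Decision{Effect: Deny, MatchedPolicy: p.ID, Reason: "denied by policy " + p.ID}
		}
		if p.Effect == Allow && firstAllow == nil {
			firstAllow = p
		}
	}
	if firstAllow != nil {
		return Decision{Effect: Allow, MatchedPolicy: firstAllow.ID, Reason: "allowed by policy " + firstAllow.ID}
	}
	// Nothing matched: default deny.
	return Decision{Effect: Deny, Reason: "No matching policy found"}
}
```

Because the no-match branch carries no matched policy, the response builder in step 8 should take its reason from the decision rather than from `matched_policy.description`.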

Key Design Decisions

Decision          | Choice                      | Rationale
------------------+-----------------------------+----------------------------------------------------
Language          | Go or Rust                  | Performance-critical, concurrent, minimal GC pauses
Policy Format     | JSON                        | Human-readable, tooling support, easy hot-reload
Storage           | In-memory with file backing | Sub-millisecond lookups, persistence on restart
Conflict Strategy | Deny Overrides              | Security-first, predictable behavior
Caching           | Optional Redis/local        | Trade consistency for performance when acceptable
API Style         | REST + JSON                 | Universal compatibility, easy debugging
Audit Log         | Append-only file            | Tamper-evident, compliance-friendly

Phased Implementation Guide

Phase 1: Basic Allow/Deny Endpoint

Goal: Create a minimal server that accepts authorization requests and returns hardcoded decisions.

Deliverable: Working HTTP server with /v1/decide endpoint.

Steps:

  1. Set up project structure (Go module or Rust crate)
  2. Create request/response structs from the JSON schemas
  3. Implement HTTP handler for POST /v1/decide
  4. Validate incoming JSON structure
  5. Return hardcoded DENY for all requests (fail-closed default)
  6. Add health check endpoint GET /health

Verification:

# Server should start
$ ./zta-pdp --port 9090
[INFO] PDP listening on :9090

# Health check works
$ curl http://localhost:9090/health
{"status": "healthy"}

# Decision endpoint returns DENY
$ curl -X POST http://localhost:9090/v1/decide \
  -d '{"subject":{"id":"test"},"action":"read","resource":{"id":"test"}}' \
  -H "Content-Type: application/json"
{"decision": "DENY", "reason": "No policies configured"}
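
A minimal sketch of the Phase 1 server using only the Go standard library is shown below. The type names, the `parseRequest` and `newMux` helpers, and the 1 MiB body limit are assumptions made for illustration; only the routes and the response bodies come from the verification output above.

```go
package main

import (
	"encoding/json"
	"errors"
	"io"
	"net/http"
	"strings"
)

// Request models only the fields used in the Phase 1 curl example.
type Request struct {
	Subject struct {
		ID string `json:"id"`
	} `json:"subject"`
	Action   string `json:"action"`
	Resource struct {
		ID string `json:"id"`
	} `json:"resource"`
}

// Response is the Phase 1 decision body.
type Response struct {
	Decision string `json:"decision"`
	Reason   string `json:"reason"`
}

// parseRequest decodes the body, normalizes the action (lowercase, trim),
// and rejects requests that are missing a required field.
func parseRequest(body []byte) (Request, error) {
	var req Request
	if err := json.Unmarshal(body, &req); err != nil {
		return req, err
	}
	req.Action = strings.ToLower(strings.TrimSpace(req.Action))
	if req.Subject.ID == "" || req.Action == "" || req.Resource.ID == "" {
		return req, errors.New("subject.id, action and resource.id are required")
	}
	return req, nil
}

// decideHandler serves POST /v1/decide and always answers DENY.
func decideHandler(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	// Cap the body at 1 MiB (an arbitrary limit chosen for this sketch).
	body, err := io.ReadAll(http.MaxBytesReader(w, r.Body, 1<<20))
	if err != nil {
		http.Error(w, "unreadable body", http.StatusBadRequest)
		return
	}
	if _, err := parseRequest(body); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(Response{Decision: "DENY", Reason: "No policies configured"})
}

// newMux wires the two Phase 1 routes.
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/v1/decide", decideHandler)
	mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		io.WriteString(w, `{"status": "healthy"}`)
	})
	return mux
}
```

A `main` function would then only need to call `http.ListenAndServe(":9090", newMux())`. Keeping `parseRequest` separate from the handler makes the validation rules easy to unit test without starting a server.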

Phase 2: Simple Role-Based Rules

Goal: Implement basic RBAC - if user has required role, allow access.

Deliverable: Working policy evaluation with role matching.

Steps:

  1. Create Policy struct matching the schema
  2. Load policies from a JSON file at startup
  3. Implement matches_subject() - check if subject roles intersect with policy roles
  4. Implement matches_action() - check if action is in policy actions
  5. Implement matches_resource() - check if resource type matches
  6. Return ALLOW if any policy matches with effect: "allow"

Sample Policy File (policies.json):

{
  "policies": [
    {
      "id": "admin-all",
      "effect": "allow",
      "subjects": { "roles": ["admin"] },
      "actions": ["*"],
      "resources": {}
    },
    {
      "id": "dev-read",
      "effect": "allow",
      "subjects": { "roles": ["developer"] },
      "actions": ["read"],
      "resources": { "types": ["repository"] }
    }
  ]
}

Verification:

# Admin can do anything
$ curl -X POST http://localhost:9090/v1/decide \
  -d '{"subject":{"id":"alice","roles":["admin"]},"action":"delete","resource":{"id":"db"}}' \
  -H "Content-Type: application/json"
{"decision": "ALLOW", "matched_policy": "admin-all"}

# Developer can read repos
$ curl -X POST http://localhost:9090/v1/decide \
  -d '{"subject":{"id":"bob","roles":["developer"]},"action":"read","resource":{"id":"repo","type":"repository"}}' \
  -H "Content-Type: application/json"
{"decision": "ALLOW", "matched_policy": "dev-read"}

# Developer cannot delete
$ curl -X POST http://localhost:9090/v1/decide \
  -d '{"subject":{"id":"bob","roles":["developer"]},"action":"delete","resource":{"id":"repo"}}' \
  -H "Content-Type: application/json"
{"decision": "DENY", "reason": "No matching policy"}
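
The three matchers from the steps above might look like the sketch below. The struct shapes follow the sample policies.json, while the flattened `Request` type and the `decide` helper are hypothetical simplifications. An empty role, action, or resource-type list is treated as "match all", which is the behavior the `admin-all` policy relies on.

```go
package main

// Types mirror the sample policies.json; only the fields used here.
type SubjectMatch struct {
	Roles []string `json:"roles"`
}

type ResourceMatch struct {
	Types []string `json:"types"`
}

type Policy struct {
	ID        string        `json:"id"`
	Effect    string        `json:"effect"`
	Subjects  SubjectMatch  `json:"subjects"`
	Actions   []string      `json:"actions"`
	Resources ResourceMatch `json:"resources"`
}

// Request is a flattened view of the authorization request (assumption).
type Request struct {
	Roles        []string
	Action       string
	ResourceType string
}

// contains reports whether want appears in list (exact match).
func contains(list []string, want string) bool {
	for _, v := range list {
		if v == want {
			return true
		}
	}
	return false
}

// matchesSubject: an empty role list matches every subject;
// otherwise the request roles must intersect the policy roles.
func matchesSubject(req Request, p Policy) bool {
	if len(p.Subjects.Roles) == 0 {
		return true
	}
	for _, r := range req.Roles {
		if contains(p.Subjects.Roles, r) {
			return true
		}
	}
	return false
}

// matchesAction: empty list or "*" matches any action.
func matchesAction(req Request, p Policy) bool {
	return len(p.Actions) == 0 || contains(p.Actions, "*") || contains(p.Actions, req.Action)
}

// matchesResource: an empty type list matches every resource.
func matchesResource(req Request, p Policy) bool {
	return len(p.Resources.Types) == 0 || contains(p.Resources.Types, req.ResourceType)
}

// decide returns the first allow policy that matches, or false for default deny.
func decide(req Request, policies []Policy) (allowed bool, matched string) {
	for _, p := range policies {
		if p.Effect == "allow" && matchesSubject(req, p) && matchesAction(req, p) && matchesResource(req, p) {
			return true, p.ID
		}
	}
	return false, ""
}
```

With the two sample policies loaded, this reproduces the three verification results: the admin delete is allowed by `admin-all`, the developer read by `dev-read`, and the developer delete falls through to default deny.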

Phase 3: Attribute-Based Rules with JSONPath

Goal: Add condition evaluation for time-based, device health, and custom attributes.

Deliverable: Full ABAC support with complex conditions.

Steps:

  1. Implement evaluate_conditions() function
  2. Add time range condition checking
  3. Add device health condition checking
  4. Add network type condition checking
  5. Implement JSONPath extraction for custom conditions
  6. Add conflict resolution (deny overrides allow)
  7. Add policy priority sorting

Key Functions:

evaluate_time_range(request, condition) -> bool
evaluate_device_health(request, condition) -> bool
evaluate_network_type(request, condition) -> bool
evaluate_custom_condition(request, jsonpath_expr, expected_value) -> bool
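
As a rough illustration of `evaluate_custom_condition`, the sketch below resolves a dotted path such as `$.subject.device_health` against the decoded request. It handles object keys only and is a stand-in for a real JSONPath library, not an implementation of JSONPath; the function names and the choice to treat a missing field as "condition not met" are assumptions.

```go
package main

import (
	"encoding/json"
	"reflect"
	"strings"
)

// lookupPath walks a decoded JSON document using a dotted path.
// Only object keys are supported (no array indexes or filters).
func lookupPath(doc map[string]any, path string) (any, bool) {
	path = strings.TrimPrefix(path, "$.")
	var cur any = doc
	for _, key := range strings.Split(path, ".") {
		obj, ok := cur.(map[string]any)
		if !ok {
			return nil, false // tried to descend into a non-object
		}
		cur, ok = obj[key]
		if !ok {
			return nil, false // field is missing
		}
	}
	return cur, true
}

// evaluateCustomCondition returns true only when the path exists and its
// value equals expected. Note that encoding/json decodes every JSON number
// as float64, so numeric expectations must be float64 as well.
func evaluateCustomCondition(requestJSON []byte, path string, expected any) bool {
	var doc map[string]any
	if err := json.Unmarshal(requestJSON, &doc); err != nil {
		return false
	}
	got, ok := lookupPath(doc, path)
	if !ok {
		return false
	}
	return reflect.DeepEqual(got, expected)
}
```

Returning false for missing fields and type mismatches keeps the condition fail-closed for allow policies; a deny policy built on the same condition would need the opposite default, which is worth covering in the unit tests.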

Verification:

# Request during business hours with secure device - ALLOW
$ curl -X POST http://localhost:9090/v1/decide \
  -d '{
    "subject":{"id":"alice","roles":["developer"],"device_health":"secure"},
    "action":"push",
    "resource":{"id":"repo","type":"repository"},
    "environment":{"timestamp":"2024-12-26T14:00:00Z"}
  }' \
  -H "Content-Type: application/json"
{"decision": "ALLOW"}

# Same request at 3 AM - DENY
$ curl -X POST http://localhost:9090/v1/decide \
  -d '{
    "subject":{"id":"alice","roles":["developer"],"device_health":"secure"},
    "action":"push",
    "resource":{"id":"repo","type":"repository"},
    "environment":{"timestamp":"2024-12-26T03:00:00Z"}
  }' \
  -H "Content-Type: application/json"
{"decision": "DENY", "reason": "Outside allowed time range"}

Phase 4: Policy Hot-Reloading

Goal: Update policies without restarting the server.

Deliverable: File watcher + admin reload endpoint.

Steps:

  1. Implement file watcher using fsnotify (Go) or notify (Rust)
  2. Create read-write lock for policy store
  3. On file change: parse new policies, validate, atomic swap
  4. Add POST /admin/reload-policies endpoint for manual reload
  5. Add policy version tracking
  6. Graceful handling of parse errors (keep old policies)

Implementation Pattern (Go):

type PolicyStore struct {
    policies []Policy
    mu       sync.RWMutex
    version  int
}

func (ps *PolicyStore) Reload(path string) error {
    newPolicies, err := loadFromFile(path)
    if err != nil {
        return err  // Keep old policies
    }
    ps.mu.Lock()
    defer ps.mu.Unlock()
    ps.policies = newPolicies
    ps.version++
    return nil
}

func (ps *PolicyStore) Evaluate(req Request) Decision {
    ps.mu.RLock()
    defer ps.mu.RUnlock()
    // Use ps.policies...
}

Verification:

# Initial state: 2 policies
$ curl http://localhost:9090/health
{"policies_loaded": 2, "policy_version": 1}

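# Modify policies.json (add a new policy to the "policies" array)
# Note: appending with >> would leave the file as invalid JSON, so edit it in place
$ $EDITOR policies.json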

# Automatic reload detected
[INFO] File change detected, reloading policies...
[INFO] Loaded 3 policies (version 2)

# Or manual reload
$ curl -X POST http://localhost:9090/admin/reload-policies
{"status": "reloaded", "policies_loaded": 3, "policy_version": 2}

Phase 5: Performance Optimization and Caching

Goal: Achieve sub-5ms P99 latency at 10,000+ requests/second.

Deliverable: Optimized engine with caching and metrics.

Steps:

  1. Add request timing metrics (evaluation_time_ms in response)
  2. Implement in-memory LRU cache for decisions
  3. Add policy index by action type for faster matching
  4. Implement connection pooling if using external data sources
  5. Add /metrics endpoint (Prometheus format)
  6. Profile and optimize hot paths

Caching Strategy:

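Cache Key: SHA256 over every request attribute the policies read (subject.id, roles, device_health, action, resource.id, resource.type). Keying on the three IDs alone would keep returning a cached ALLOW after a role or device-health change. Decisions that depend on time conditions should bypass the cache or use a TTL shorter than the time window's granularity.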
Cache TTL: 300 seconds (configurable per resource sensitivity)
Invalidation: On policy reload, clear entire cache
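
A cache key along these lines could be derived as in the sketch below. The `KeyInput` field list is an assumption: which attributes belong in the key depends on what the loaded policies actually read. Each field is length-prefixed before hashing so that two different requests cannot produce the same byte stream.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// KeyInput lists the request attributes that influence a decision.
// The exact set is an assumption and must track what your policies read.
type KeyInput struct {
	SubjectID    string
	Roles        []string
	DeviceHealth string
	Action       string
	ResourceID   string
	ResourceType string
}

// cacheKey returns a hex SHA-256 digest of the decision-relevant fields.
// Fields are written as "<len>:<value>;" so that ("ab", "c") and
// ("a", "bc") hash differently, unlike plain concatenation.
func cacheKey(in KeyInput) string {
	roles := append([]string(nil), in.Roles...) // copy before sorting
	sort.Strings(roles)                         // role order must not change the key
	h := sha256.New()
	write := func(s string) { fmt.Fprintf(h, "%d:%s;", len(s), s) }
	write(in.SubjectID)
	write(in.Action)
	write(in.ResourceID)
	write(in.ResourceType)
	write(in.DeviceHealth)
	for _, r := range roles {
		write(r)
	}
	return hex.EncodeToString(h.Sum(nil))
}
```

Environment attributes such as the timestamp are deliberately left out of the key, because including them would make nearly every request unique. That is why time-dependent decisions are better left uncached or cached very briefly.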

Optimization Techniques:

  • Pre-compile regex patterns at policy load time
  • Use sync.Pool for request/response object reuse
  • Avoid allocations in the hot path
  • Index policies by action for O(1) average-case candidate lookup (wildcard policies need their own bucket)
  • Use atomic counters for metrics
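
The action index from the list above could be built once per policy load, roughly as sketched here. The `ActionIndex` type and its method names are hypothetical. Policies that list `"*"` or no actions at all go into a separate wildcard bucket, since they are candidates for every request.

```go
package main

// Policy is trimmed to the fields the index needs.
type Policy struct {
	ID      string
	Actions []string
}

// ActionIndex maps an action name to the policies that list it.
type ActionIndex struct {
	byAction map[string][]*Policy
	wildcard []*Policy // policies with "*" or an empty action list
}

// buildIndex runs at policy load time, not per request.
func buildIndex(policies []Policy) *ActionIndex {
	idx := &ActionIndex{byAction: make(map[string][]*Policy)}
	for i := range policies {
		p := &policies[i]
		wild := len(p.Actions) == 0
		for _, a := range p.Actions {
			if a == "*" {
				wild = true
				break
			}
		}
		if wild {
			idx.wildcard = append(idx.wildcard, p)
			continue
		}
		for _, a := range p.Actions {
			idx.byAction[a] = append(idx.byAction[a], p)
		}
	}
	return idx
}

// candidates returns the policies that could match the action. The two
// slices are returned separately so the lookup itself allocates nothing.
func (idx *ActionIndex) candidates(action string) (exact, wildcard []*Policy) {
	return idx.byAction[action], idx.wildcard
}
```

The index only narrows the candidate set. Subject, resource, and condition checks still run on each candidate, and priority order has to be restored across the two slices if the response must report the highest-priority match.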

Verification:

# Benchmark with wrk or hey
$ hey -n 100000 -c 100 -m POST \
  -H "Content-Type: application/json" \
  -d '{"subject":{"id":"test","roles":["dev"]},"action":"read","resource":{"id":"repo"}}' \
  http://localhost:9090/v1/decide

Summary:
  Requests/sec:	12345.67
  Latency:
    50%: 1.2ms
    99%: 4.8ms

# Check metrics
$ curl http://localhost:9090/metrics
pdp_requests_total{decision="ALLOW"} 50000
pdp_requests_total{decision="DENY"} 50000
pdp_evaluation_seconds{quantile="0.5"} 0.0012
pdp_evaluation_seconds{quantile="0.99"} 0.0048
pdp_cache_hits_total 75000
pdp_cache_misses_total 25000

Testing Strategy

Unit Tests

Test individual components in isolation:

  1. Policy Parsing:
    • Valid JSON parses correctly
    • Invalid JSON returns error
    • Missing required fields fail validation
    • Unknown fields are ignored (forward compatibility)
  2. Subject Matching:
    • Role intersection works correctly
    • Empty policy roles match all subjects
    • Case differences are handled (e.g., "Developer" matches "developer" after normalization)
  3. Condition Evaluation:
    • Time range: inside, outside, edge cases (midnight crossing)
    • Device health: exact match, array membership
    • JSONPath: valid paths, missing fields, type mismatches
  4. Conflict Resolution:
    • DENY overrides ALLOW
    • Priority ordering works
    • No-match returns default DENY

Integration Tests

Test the complete request-response cycle:

  1. API Contract:
    • Request validation rejects malformed JSON
    • Response matches expected schema
    • HTTP status codes are correct (200, 400, 500)
  2. End-to-End Scenarios:
    • Admin bypasses all restrictions
    • Time-based denial works
    • MFA requirement triggers obligation
    • Unknown subjects are denied
  3. Hot Reload:
    • New policies take effect immediately
    • Invalid policy file doesn’t crash server
    • Concurrent requests during reload work correctly

Performance Tests

# Load test with realistic payload
$ cat <<EOF > payload.json
{
  "subject": {"id": "alice", "roles": ["developer"], "device_health": "secure"},
  "action": "push",
  "resource": {"id": "repo-123", "type": "repository", "sensitivity": "internal"},
  "environment": {"timestamp": "2024-12-26T14:00:00Z", "network_type": "corporate"}
}
EOF

# Sustained load test (10 minutes)
$ hey -z 600s -c 100 -m POST \
  -H "Content-Type: application/json" \
  -D payload.json \
  http://localhost:9090/v1/decide

# Spike test (sudden burst)
$ hey -n 50000 -c 500 -m POST ...

# Measure memory under load
$ while true; do ps -o rss= -p $(pgrep zta-pdp); sleep 5; done

Verification Against Reference Implementation

Compare your PDP decisions with Open Policy Agent (OPA):

# Run OPA with equivalent Rego policies
$ opa run --server policies.rego

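# Test same input against both. OPA expects the request under an "input" key and
# returns a boolean, so normalize both sides before comparing.
$ diff <(curl -s your-pdp/v1/decide -d @input.json | jq '.decision == "ALLOW"') \
       <(jq '{input: .}' input.json | curl -s opa/v1/data/authz -d @- | jq .result.allow)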

Common Pitfalls and Debugging

Pitfall 1: Policy Never Matches

Symptom: All requests return DENY even when they should match.

Cause: Field name mismatch or case sensitivity.

Solution:

# Add debug logging to show what's being compared
[DEBUG] Checking policy 'dev-read':
[DEBUG]   Subject roles: ["Developer"] vs Policy roles: ["developer"]
[DEBUG]   -> No intersection (case mismatch!)

# Fix: Normalize to lowercase at parse time
for i, r := range subject.Roles {
    subject.Roles[i] = strings.ToLower(r)
}

Pitfall 2: Time Zone Issues

Symptom: Time-based policies allow/deny at wrong times.

Cause: Server timezone differs from policy timezone.

Solution:

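// Always evaluate times in the policy's specified timezone
loc, err := time.LoadLocation(condition.TimeRange.Timezone)
if err != nil {
    return false // Unknown timezone: treat the condition as unmet instead of panicking on a nil location
}
requestTime := request.Environment.Timestamp.In(loc)
hour := requestTime.Hour()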

Pitfall 3: Cache Stampede on Policy Reload

Symptom: Latency spike after policy reload as cache is empty.

Cause: All cached decisions invalidated simultaneously.

Solution:

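// Option 1: Probabilistic early expiration
// remaining is the time left before the entry expires; the refresh
// probability rises toward 1 as remaining approaches zero.
func shouldFetch(remaining time.Duration) bool {
    return rand.Float64() < math.Exp(-remaining.Seconds()/60.0)
}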

// Option 2: Background cache warming
func afterReload() {
    go warmCache(commonRequests)
}

Pitfall 4: Wildcard Action Matching

Symptom: actions: ["*"] doesn’t match all actions.

Cause: Wildcard treated as literal string.

Solution:

func matchesAction(request Request, policy Policy) bool {
    if len(policy.Actions) == 0 {
        return true  // Empty = match all
    }
    for _, action := range policy.Actions {
        if action == "*" || action == request.Action {
            return true
        }
    }
    return false
}

Pitfall 5: Race Condition on Policy Reload

Symptom: Intermittent panics or wrong decisions during reload.

Cause: Reading policies while they’re being written.

Solution:

// Use RWMutex - multiple readers OR single writer
type PolicyStore struct {
    mu       sync.RWMutex
    policies []Policy
}

func (ps *PolicyStore) Evaluate(req Request) Decision {
    ps.mu.RLock()          // Acquire read lock
    defer ps.mu.RUnlock()
    // Safe to read ps.policies
}

func (ps *PolicyStore) Reload(newPolicies []Policy) {
    ps.mu.Lock()              // Acquire write lock
    defer ps.mu.Unlock()
    ps.policies = newPolicies // Swap the whole slice; readers never see a partial update
}

Debugging Checklist

  1. Enable verbose logging: Log every policy check with matched/unmatched reason
  2. Add request echo: Include parsed request in response for verification
  3. Use curl with -v: See exact request being sent
  4. Check policy loading: Verify policies.json is valid JSON
  5. Verify timestamps: Ensure client and server clocks are synchronized
  6. Test with minimal policy: Single policy to isolate matching logic

Extensions and Challenges

Extension 1: gRPC API

Replace REST with gRPC for lower latency and type safety.

Hints:

  • Define .proto file matching the JSON schemas
  • Use streaming for bulk decisions
  • Implement health checking protocol
  • Target < 1ms P50 latency

Extension 2: Policy Simulation Mode

Add endpoint to test “what if” scenarios without affecting production.

$ curl -X POST http://localhost:9090/v1/simulate \
  -d '{
    "request": { ... },
    "proposed_policies": [ ... ]
  }'
{"would_allow": true, "matched_policy": "new-policy-draft"}

Extension 3: Policy Analytics Dashboard

Build a web UI showing:

  • Decision distribution over time
  • Most-matched policies
  • Most-denied subjects/resources
  • Latency percentiles

Extension 4: Distributed Policy Sync

Multiple PDP instances sharing policies via:

  • etcd/Consul for configuration
  • Raft consensus for consistency
  • Merkle trees for efficient diff

Extension 5: Machine Learning Risk Scoring

Integrate with a risk scoring model:

  • Input: Subject + context features
  • Output: Risk score 0.0 - 1.0
  • Use in conditions: risk_score < 0.5

Books That Will Help

Topic                      | Book                                                         | Chapter
---------------------------+--------------------------------------------------------------+-------------------------------------
Authorization Logic        | “Foundations of Information Security” by Jason Andress       | Ch. 5: Authentication and Authorization
Policy as Code             | “Zero Trust Networks” by Gilman & Barth                      | Ch. 3: The Zero Trust Control Plane
System Performance         | “Designing Data-Intensive Applications, 2nd Ed” by Kleppmann | Ch. 1-2: Foundations
Access Control Models      | “Security in Computing” by Charles Pfleeger                  | Ch. 4: Access Control
Go Performance             | “Learning Go, 2nd Edition” by Jon Bodner                     | Ch. 12: Performance
Concurrent Data Structures | “Algorithms, Fourth Edition” by Sedgewick & Wayne            | Ch. 4: Hash Tables
Rule Engines & Logic       | “Design Patterns” by Gamma et al.                            | Ch. 5: Strategy Pattern
Cryptographic Foundations  | “Serious Cryptography, 2nd Edition” by Aumasson              | Ch. 5: MACs and Authentication

Interview Questions

Questions you should be able to answer after completing this project:

  1. “What is the difference between Authentication and Authorization?”
    • Authentication: Verifying WHO you are (identity)
    • Authorization: Verifying WHAT you can do (permissions)
    • The PDP handles authorization, assumes authentication already happened
  2. “Explain Attribute-Based Access Control (ABAC) with a real-world example.”
    • Example: “Developers can push to non-production repos during business hours from secure devices”
    • Combines subject attributes (role, device health), resource attributes (environment), and context (time)
    • More flexible than RBAC, can express complex policies
  3. “How do you handle ‘Policy Conflict’ (e.g., one rule says ALLOW, another says DENY)?”
    • Strategy 1: Deny overrides (most secure, what AWS uses)
    • Strategy 2: First match (order-dependent, like firewalls)
    • Strategy 3: Most specific wins (requires specificity scoring)
    • Strategy 4: Explicit priority numbers
  4. “Why is centralized policy management key to Zero Trust?”
    • Single source of truth for all authorization decisions
    • Consistent enforcement across all services
    • Easier auditing and compliance
    • Policy updates take effect everywhere at once, with no per-service redeploy (bounded by any decision-cache TTL)
    • Decouples authorization logic from application code
  5. “What happens if the PDP is unavailable?”
    • Fail-closed: Deny all requests (secure but impacts availability)
    • Fail-open: Allow all requests (dangerous, should never be used for sensitive resources)
    • Cached fallback: Use last known decision (tradeoff between security and availability)
  6. “How do you achieve sub-5ms latency in a policy engine?”
    • In-memory policy storage (no database queries)
    • Policy indexing by action type
    • Decision caching with appropriate TTL
    • Object pooling to reduce GC pressure
    • Avoid regex compilation in hot path
  7. “What is a Policy Information Point (PIP)?”
    • External data source that enriches authorization requests
    • Examples: Device health service, threat intelligence feed, HR database
    • PDP queries PIPs to get current context for decisions
    • PIP responses should be cached (with short TTLs) so lookups do not add latency to every decision

Self-Assessment Checklist

Before considering this project complete, verify you can:

  • Explain the difference between RBAC, ABAC, and ReBAC with concrete examples
  • Describe the role of PDP, PEP, and PIP in Zero Trust architecture
  • Implement a policy evaluation engine that matches subjects, actions, and resources
  • Handle time-based conditions with proper timezone handling
  • Implement conflict resolution using deny-overrides strategy
  • Hot-reload policies without dropping requests or crashing
  • Achieve < 5ms P99 latency under load
  • Write comprehensive audit logs for every decision
  • Handle failure modes gracefully (fail-closed by default)
  • Test your PDP against complex scenarios (after hours, compromised devices, etc.)
  • Explain when caching is appropriate and when it creates security risks
  • Integrate your PDP with the Identity-Aware Proxy from Project 1
  • Answer all the interview questions listed above confidently