Project 6: Continuous Authentication Monitor (Behavioral Zero Trust)
The Core Zero Trust Principle: "Trust is not a one-time decision. It is continuously evaluated based on behavior and context." This project teaches you that authentication at login is just the beginning: real security requires watching every action, learning normal patterns, and detecting when something feels wrong.
Project Overview
| Attribute | Value |
|---|---|
| Difficulty | Level 4: Expert |
| Time Estimate | Extended Sprint (40-60 hours) |
| Primary Language | Python |
| Alternative Languages | Go, Node.js |
| Coolness Level | Level 5: Pure Magic |
| Business Potential | Industry Disruptor |
| Knowledge Area | Data Science / Security Analytics |
| Software/Tools | Log Processing, Anomaly Detection, Geolocation APIs |
| Main Book | "Security in Computing" by Pfleeger - Chapter 7 |
Learning Objectives
By completing this project, you will be able to:
- Implement User and Entity Behavior Analytics (UEBA): Build a system that learns "normal" behavior patterns and detects deviations that indicate account compromise.
- Calculate Continuous Trust Scores: Move beyond binary authentication to real-time risk assessment that considers location, time, device, and behavior.
- Solve the Impossible Travel Problem: Implement the Haversine formula to detect physically impossible login sequences (NYC to London in 20 minutes).
- Build Statistical Baselines: Use moving averages, standard deviations, and z-scores to define what "normal" means for each user.
- Design Real-Time Event Processing: Architect a stream processing system that can analyze access logs and trigger alerts within seconds.
- Implement Token Revocation at Scale: Propagate security decisions to all Policy Enforcement Points (PEPs) in your infrastructure simultaneously.
- Balance Security and Usability: Tune detection thresholds to minimize false positives while catching real threats.
Deep Theoretical Foundation
User and Entity Behavior Analytics (UEBA)
UEBA represents a paradigm shift in security from signature-based detection to behavior-based detection. Traditional security asks "Is this a known attack pattern?" UEBA asks "Is this behavior normal for this entity?"
+------------------------------------------------------------------+
| EVOLUTION OF THREAT DETECTION |
+------------------------------------------------------------------+
GENERATION 1: Signature-Based (1990s-2000s)
==========================================
Known Attack Patterns Database
+---------------------------+
| SQL Injection: ' OR 1=1 |
| XSS: <script>alert()</script> |
| Known Malware Hashes |
+---------------------------+
|
v
[Request] ---> [Pattern Match?] ---> [Alert if Match]
Weakness: Cannot detect unknown attacks (zero-days)
Weakness: Attackers simply modify patterns slightly
GENERATION 2: Rule-Based (2000s-2010s)
=====================================
Static Rules
+---------------------------+
| IF failed_logins > 5 |
| THEN lock_account |
| IF request_rate > 100/s |
| THEN rate_limit |
+---------------------------+
|
v
[Request] ---> [Rule Evaluation] ---> [Action]
Weakness: Requires human to write every rule
Weakness: Attackers learn the thresholds
GENERATION 3: Behavior Analytics (2015-Present) - This Project
==============================================================
Learned Behavioral Baselines (per-user)
+---------------------------+
| Alice: Login NYC 9-5 |
| Alice: Uses Chrome/macOS |
| Alice: Accesses /api/v2 |
| Alice: ~50 requests/hour |
+---------------------------+
|
v
[Request] ---> [Compare to Baseline] ---> [Calculate Risk Score]
|
v
[Risk Score > Threshold?] ---> [Alert + Revoke Access]
Strength: Detects novel attacks (no signature needed)
Strength: Personalized to each user's patterns
Strength: Adapts as behavior naturally changes
+------------------------------------------------------------------+
Why UEBA Matters for Zero Trust:
Zero Trust mandates continuous verification. But what does "verification" mean after initial authentication? You cannot ask users to re-enter passwords every 5 minutes. Instead, you verify their behavior matches their identity.
Traditional Authentication:
===========================
08:00 - User: alice, Password: ******, MFA: 123456
--> ACCESS GRANTED (for 8 hours)
08:01 - 16:00: No further verification
(Attacker could have taken over at 08:05)
Continuous Authentication (UEBA):
=================================
08:00 - User: alice, Password: ******, MFA: 123456
--> Initial access granted, monitoring begins
--> Trust Score: 100
08:05 - Request from NYC, Chrome/macOS, normal pattern
--> Trust Score: 100
08:10 - Request from NYC, same device
--> Trust Score: 100
08:15 - ALERT: Request from LONDON, different device
--> Impossible Travel detected (NYC->London in 15 min)
--> Trust Score: 0
--> ACCESS REVOKED
--> MFA step-up required
Continuous Authentication vs Point-in-Time
The phrase "authenticate once, trust forever" is the antithesis of Zero Trust. Here's why continuous authentication is essential:
+------------------------------------------------------------------+
| THE PROBLEM WITH POINT-IN-TIME AUTH |
+------------------------------------------------------------------+
Attack Scenario: Session Hijacking
==================================
08:00 - Legitimate user Alice authenticates
Browser receives session token: JWT_TOKEN_ABC
08:01 - Alice's laptop infected with malware
Malware extracts JWT_TOKEN_ABC
08:02 - Attacker uses JWT_TOKEN_ABC from different location
Server sees: Valid token, correct signature, not expired
Result: ACCESS GRANTED
08:03 - 16:00: Attacker has full access as "Alice"
WHY TRADITIONAL AUTH FAILED:
- Token was valid (correct signature)
- Token was not expired
- No mechanism to detect context change
+------------------------------------------------------------------+
Solution: Continuous Context Evaluation
=======================================
Every request is evaluated against:
1. LOCATION CONTEXT
- Is this IP address consistent with recent activity?
- Is travel time between locations physically possible?
- Is this a known VPN exit node or Tor node?
2. TEMPORAL CONTEXT
- Does this match the user's typical working hours?
- Is this an unusual day of week for this user?
- How long since the last activity? (session staleness)
3. DEVICE CONTEXT
- Is this the same browser fingerprint?
- Is this the same user-agent string?
- Has the device health changed?
4. BEHAVIORAL CONTEXT
- Is the request rate normal for this user?
- Are they accessing typical resources?
- Is the data volume expected?
+------------------------------------------------------------------+
The Impossible Travel Algorithm
The "Impossible Travel" detection is one of the most powerful signals in UEBA. It catches scenarios where a user's credentials appear to be used from two distant locations within an impossibly short time.
The Haversine Formula:
The Haversine formula calculates the great-circle distance between two points on a sphere given their latitude and longitude. This is the shortest distance between two points on Earthâs surface.
+------------------------------------------------------------------+
| HAVERSINE FORMULA |
+------------------------------------------------------------------+
Given:
- Point A: (lat1, lon1) in radians
- Point B: (lat2, lon2) in radians
- R = Earth's radius = 6,371 km
Formula:
d = 2R * arcsin( sqrt( sin^2((lat2-lat1)/2) +
                       cos(lat1) * cos(lat2) * sin^2((lon2-lon1)/2) ) )
Step-by-step calculation:
1. Convert degrees to radians:
lat_rad = lat_deg * (pi / 180)
2. Calculate differences:
dlat = lat2 - lat1
dlon = lon2 - lon1
3. Apply Haversine:
a = sin^2(dlat/2) + cos(lat1) * cos(lat2) * sin^2(dlon/2)
c = 2 * arcsin(sqrt(a))
distance = R * c
+------------------------------------------------------------------+
| EXAMPLE CALCULATION |
+------------------------------------------------------------------+
Location A: New York City
Latitude: 40.7128 N
Longitude: 74.0060 W
Location B: London
Latitude: 51.5074 N
Longitude: 0.1278 W
Step 1: Convert to radians
NYC: lat1 = 0.7102 rad, lon1 = -1.2918 rad
London: lat2 = 0.8997 rad, lon2 = -0.0022 rad
Step 2: Calculate
dlat = 0.1895 rad
dlon = 1.2896 rad
a = sin^2(0.0948) + cos(0.7102) * cos(0.8997) * sin^2(0.6448)
a = 0.00897 + 0.764 * 0.621 * 0.361
a = 0.00897 + 0.171
a = 0.180
c = 2 * arcsin(sqrt(0.180))
c = 2 * arcsin(0.424)
c = 2 * 0.438
c = 0.876 rad
distance = 6371 * 0.876 = 5,581 km
Result: NYC to London = 5,581 km (approximately)
+------------------------------------------------------------------+
Implementation in Python:

import math

def haversine_distance(lat1: float, lon1: float,
                       lat2: float, lon2: float) -> float:
    """
    Calculate the great-circle distance between two points
    on Earth using the Haversine formula.

    Args:
        lat1, lon1: Coordinates of point 1 (in degrees)
        lat2, lon2: Coordinates of point 2 (in degrees)

    Returns:
        Distance in kilometers
    """
    R = 6371  # Earth's radius in kilometers

    # Convert to radians
    lat1_rad = math.radians(lat1)
    lat2_rad = math.radians(lat2)
    dlat = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)

    # Haversine formula
    a = (math.sin(dlat / 2) ** 2 +
         math.cos(lat1_rad) * math.cos(lat2_rad) *
         math.sin(dlon / 2) ** 2)
    c = 2 * math.asin(math.sqrt(a))
    return R * c
Speed Calculation and Thresholds:
+------------------------------------------------------------------+
| IMPOSSIBLE TRAVEL DETECTION |
+------------------------------------------------------------------+
Given two log entries:
Entry 1: User Alice, IP 1.1.1.1 (NYC), Time: 10:00:00
Entry 2: User Alice, IP 8.8.8.8 (London), Time: 10:20:00
Step 1: Geolocate IPs
1.1.1.1 -> NYC (40.7128, -74.0060)
8.8.8.8 -> London (51.5074, -0.1278)
Step 2: Calculate distance
Distance = haversine(40.7128, -74.0060, 51.5074, -0.1278)
Distance = 5,581 km
Step 3: Calculate time difference
Time diff = 10:20:00 - 10:00:00 = 20 minutes = 0.333 hours
Step 4: Calculate required speed
Speed = Distance / Time = 5,581 km / 0.333 hours = 16,743 km/h
Step 5: Compare to threshold
Commercial aircraft: ~900 km/h
Supersonic jet: ~2,200 km/h
"Impossible" threshold: 1,500 km/h (generous buffer)
16,743 km/h >> 1,500 km/h
VERDICT: IMPOSSIBLE TRAVEL DETECTED
ACTION: Revoke session, require re-authentication
+------------------------------------------------------------------+
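The full detection pipeline above (geolocate, distance, time delta, speed, threshold) can be sketched end to end. The names haversine_km and is_impossible_travel, and the one-minute clamp on the time gap (a guard against clock skew), are illustrative choices, not a prescribed API:

```python
import math

EARTH_RADIUS_KM = 6371
IMPOSSIBLE_SPEED_KMH = 1500  # generous buffer over any commercial flight

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two points in degrees."""
    lat1_rad, lat2_rad = math.radians(lat1), math.radians(lat2)
    dlat = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)
    a = (math.sin(dlat / 2) ** 2 +
         math.cos(lat1_rad) * math.cos(lat2_rad) * math.sin(dlon / 2) ** 2)
    return EARTH_RADIUS_KM * 2 * math.asin(math.sqrt(a))

def is_impossible_travel(lat1, lon1, t1, lat2, lon2, t2,
                         threshold_kmh=IMPOSSIBLE_SPEED_KMH):
    """Return (flagged, speed_kmh) for two sightings with Unix timestamps."""
    hours = abs(t2 - t1) / 3600
    hours = max(hours, 1 / 60)  # clamp sub-minute gaps (clock skew guard)
    speed = haversine_km(lat1, lon1, lat2, lon2) / hours
    return speed > threshold_kmh, speed

# NYC at 10:00:00, London at 10:20:00 -> roughly 16,700 km/h, flagged
flagged, speed = is_impossible_travel(40.7128, -74.0060, 0,
                                      51.5074, -0.1278, 20 * 60)
```

A legitimate NYC-to-Boston trip (~306 km over two hours) comes out around 150 km/h and passes cleanly under the same threshold.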
| THRESHOLD CONSIDERATIONS |
+------------------------------------------------------------------+
Conservative (Fewer False Positives):
Speed > 2,500 km/h = Impossible
Rationale: Accounts for supersonic travel, timing errors
Aggressive (More Security):
Speed > 1,000 km/h = Suspicious
Rationale: Commercial flights average 800-900 km/h
Adaptive (Best Practice):
Calculate based on:
- User's typical travel patterns
- Known airports near each location
- Time of day (red-eye flights vs business hours)
- Account sensitivity level
+------------------------------------------------------------------+
Edge Cases to Handle:
+------------------------------------------------------------------+
| IMPOSSIBLE TRAVEL: EDGE CASES |
+------------------------------------------------------------------+
CASE 1: VPN Usage
=================
User connects to VPN, appears to "travel" instantly.
Solution:
- Maintain list of known VPN/proxy IP ranges
- Flag VPN traffic but don't immediately alert
- Use device fingerprint as secondary signal
- Allow user to register "I use VPN" preference
CASE 2: Mobile Network Handoff
==============================
Mobile carrier assigns different exit IPs that geolocate differently.
Solution:
- Use ASN (Autonomous System Number) grouping
- Same carrier ASN = same logical location
- Set minimum distance threshold (e.g., >500km to alert)
CASE 3: Shared IP (NAT/CGNAT)
=============================
Multiple users share the same public IP (corporate NAT, CGNAT).
Solution:
- Combine with device fingerprint
- Use behavioral signals (typing patterns, request patterns)
- Maintain "known office IP" allowlist
CASE 4: Legitimate Fast Travel
==============================
User genuinely flies from NYC to Boston (1 hour flight).
Solution:
- Speed threshold must account for fastest commercial travel
- Use 1,500 km/h as baseline (faster than any commercial flight)
- Consider "airport proximity" in calculations
CASE 5: Clock Synchronization
=============================
Log timestamps from different servers have clock skew.
Solution:
- Use centralized time service (NTP synchronized)
- Include microsecond precision in timestamps
- Require minimum time difference (e.g., 1 minute) before analysis
+------------------------------------------------------------------+
Statistical Anomaly Detection
Building a baseline of "normal" behavior requires understanding statistical concepts. Here's how to detect when behavior deviates significantly from the norm.
Standard Deviation and Z-Scores:
+------------------------------------------------------------------+
| STATISTICAL BASELINE BUILDING |
+------------------------------------------------------------------+
THE CONCEPT:
============
Normal behavior clusters around an average (mean).
The spread of that behavior is the standard deviation.
A z-score tells us how many standard deviations from the mean a
value is.
value - mean
z-score = ---------------
standard_deviation
If |z-score| > 2: Value is unusual (outside ~95% of a normal distribution)
If |z-score| > 3: Value is extremely unusual (outside ~99.7%)
EXAMPLE: Login Hour Detection
=============================
Alice's login history (30 days):
Hours: [9, 9, 10, 9, 8, 9, 10, 9, 9, 10, 8, 9, 9, 10, 9,
9, 8, 9, 10, 9, 9, 9, 10, 8, 9, 9, 10, 9, 9, 9]
Calculate mean (average):
mean = sum(hours) / count = 273 / 30 = 9.1
Calculate standard deviation:
variance = sum((x - mean)^2) / count
variance = 10.7 / 30 = 0.36
std_dev = sqrt(0.36) = 0.60
Now evaluate new logins:
Login at 9:00 AM:
z-score = (9 - 9.1) / 0.60 = -0.17
Interpretation: Perfectly normal
Login at 10:00 AM:
z-score = (10 - 9.1) / 0.60 = 1.5
Interpretation: Slightly unusual but acceptable
Login at 3:00 AM:
z-score = (3 - 9.1) / 0.60 = -10.2
Interpretation: EXTREMELY UNUSUAL (10+ standard deviations!)
Action: Trigger MFA step-up
+------------------------------------------------------------------+
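The baseline math above maps directly onto Python's standard library: statistics.pstdev is the population standard deviation the variance formula describes. zscore is an illustrative helper name, fed with Alice's 30 days of login hours from the example:

```python
import statistics

def zscore(value: float, history: list) -> float:
    """Standard deviations between `value` and the mean of `history`."""
    mean = statistics.fmean(history)
    std = statistics.pstdev(history)  # population std dev, as in the text
    return 0.0 if std == 0 else (value - mean) / std

# Alice's 30 days of login hours
hours = [9, 9, 10, 9, 8, 9, 10, 9, 9, 10, 8, 9, 9, 10, 9,
         9, 8, 9, 10, 9, 9, 9, 10, 8, 9, 9, 10, 9, 9, 9]
```

A 9 AM login scores near zero, a 10 AM login stays under two standard deviations, and a 3 AM login lands far past any reasonable threshold.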
Exponential Moving Average (EMA):
For behavior that changes over time, use an exponential moving average to weight recent behavior more heavily:
+------------------------------------------------------------------+
| EXPONENTIAL MOVING AVERAGE (EMA) |
+------------------------------------------------------------------+
THE FORMULA:
============
EMA_today = (value_today * alpha) + (EMA_yesterday * (1 - alpha))
Where alpha (smoothing factor) = 2 / (N + 1)
N = number of periods (e.g., 30 days)
alpha for 30-day EMA = 2/31 = 0.0645
WHY EMA vs SIMPLE AVERAGE:
==========================
Simple Average: All historical data weighted equally
- User's behavior from 6 months ago = today's behavior
- Doesn't adapt well to legitimate changes
Exponential Moving Average: Recent data weighted more heavily
- Yesterday's behavior matters more than last month's
- Naturally adapts as user's patterns evolve
- Older data "decays" exponentially
EXAMPLE: Request Rate Baseline
==============================
Day 1: Requests = 50 -> EMA = 50 (initialization)
Day 2: Requests = 60 -> EMA = 60*0.065 + 50*0.935 = 50.6
Day 3: Requests = 55 -> EMA = 55*0.065 + 50.6*0.935 = 50.9
...
Day 30: Requests = 80 -> EMA = ~65
If Day 31 shows 500 requests:
z-score = (500 - 65) / baseline_std_dev = Very High
Action: Alert - unusual activity volume
+------------------------------------------------------------------+
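One EMA update step, as a minimal sketch of the formula above (ema_update is an illustrative name; the loop replays Days 2 and 3 of the walkthrough):

```python
def ema_update(prev_ema: float, value: float, n_periods: int = 30) -> float:
    """One EMA step: alpha = 2 / (N + 1) weights recent observations more."""
    alpha = 2 / (n_periods + 1)
    return value * alpha + prev_ema * (1 - alpha)

ema = 50.0                  # Day 1: initialize with the first observation
for requests in (60, 55):   # Days 2 and 3 from the walkthrough
    ema = ema_update(ema, requests)
```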
Time-Based Patterns:
+------------------------------------------------------------------+
| TIME-BASED BEHAVIORAL PATTERNS |
+------------------------------------------------------------------+
PATTERN 1: Hour of Day
======================
Track when user typically accesses the system.
Hour | Frequency | Probability
------|-----------|------------
08:00 | 12 | 0.02
09:00 | 85 | 0.17
10:00 | 95 | 0.19
11:00 | 78 | 0.16
... | ... | ...
03:00 | 0 | 0.00
Request at 3 AM: Probability = 0.00
Action: High-risk flag
PATTERN 2: Day of Week
======================
Track weekday vs weekend behavior.
Day | Requests | Typical
----------|----------|--------
Monday | 120 | Yes
Tuesday | 115 | Yes
Wednesday | 118 | Yes
Thursday | 122 | Yes
Friday | 110 | Yes
Saturday | 2 | No
Sunday | 0 | No
Request on Sunday: Atypical
Combine with other signals for risk score
PATTERN 3: Request Velocity
===========================
Track how quickly requests come in bursts.
Normal pattern:
09:00-09:05: 5 requests
09:05-09:10: 8 requests
09:10-09:15: 4 requests
Average: 5.7 requests per 5 minutes
Anomalous pattern:
09:00-09:05: 150 requests <- Automated script?
z-score = (150 - 5.7) / 2.1 = 68.7
Action: Rate limit + alert
+------------------------------------------------------------------+
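Pattern 1 reduces to an empirical hour-of-day histogram. The sketch below uses invented sample counts loosely shaped like the table (not its exact figures); hour_probabilities is an illustrative name:

```python
from collections import Counter

def hour_probabilities(login_hours: list) -> dict:
    """Empirical probability of a login at each hour of day (0-23)."""
    counts = Counter(login_hours)
    total = len(login_hours)
    return {hour: counts[hour] / total for hour in range(24)}

# Invented sample history concentrated in business hours
history = [8] * 12 + [9] * 85 + [10] * 95 + [11] * 78
probs = hour_probabilities(history)
# probs[3] is 0.0: a 3 AM request has no precedent -> high-risk flag
```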
Trust Score Calculation
A composite trust score combines multiple signals into a single value that the Policy Decision Point (PDP) can use for access decisions.
+------------------------------------------------------------------+
| TRUST SCORE ARCHITECTURE |
+------------------------------------------------------------------+
SCORE COMPONENTS:
=================
1. LOCATION SCORE (0-100)
- Same city as usual: 100
- Same country, different city: 80
- Different country, expected travel: 60
- Different country, unexpected: 20
- Impossible travel detected: 0
2. TEMPORAL SCORE (0-100)
- Normal working hours: 100
- Off-hours but precedent exists: 70
- First time at this hour: 40
- 3 AM on a holiday: 10
3. DEVICE SCORE (0-100)
- Known device, healthy: 100
- Known device, unhealthy: 50
- Unknown device, expected type: 40
- Unknown device, unexpected type: 20
4. BEHAVIORAL SCORE (0-100)
- Request pattern matches baseline: 100
- Slightly elevated activity: 80
- Unusual resources accessed: 50
- Massive deviation from norm: 10
COMPOSITE CALCULATION:
======================
Method 1: Weighted Average
--------------------------
trust_score = (
location_score * 0.30 +
temporal_score * 0.20 +
device_score * 0.25 +
behavioral_score * 0.25
)
Method 2: Minimum Signal (More Secure)
--------------------------------------
trust_score = min(location, temporal, device, behavioral)
Rationale: One critical failure = high risk
Method 3: Multiplicative (Most Sensitive)
-----------------------------------------
trust_score = (loc/100) * (temp/100) * (dev/100) * (behav/100) * 100
Effect: Any low score dramatically reduces final score
POLICY ACTIONS BASED ON SCORE:
==============================
Trust Score | Action
------------|--------------------------------------------
90-100 | Full access, no additional verification
70-89 | Access granted, activity logged
50-69 | Limited access, step-up MFA for sensitive ops
30-49 | Read-only access, alert security team
0-29 | Access denied, session terminated, MFA required
+------------------------------------------------------------------+
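All three composite methods fit in one small sketch; composite_trust and its method strings are illustrative names, with the weights taken from Method 1 above:

```python
def composite_trust(location: float, temporal: float,
                    device: float, behavioral: float,
                    method: str = "weighted") -> float:
    """Combine four 0-100 signal scores using one of the three methods."""
    scores = (location, temporal, device, behavioral)
    if method == "weighted":
        weights = (0.30, 0.20, 0.25, 0.25)
        return sum(s * w for s, w in zip(scores, weights))
    if method == "minimum":
        return min(scores)  # one critical failure = high risk
    if method == "multiplicative":
        result = 100.0
        for s in scores:
            result *= s / 100  # any low score drags the product down
        return result
    raise ValueError(f"unknown method: {method}")
```

With impossible travel zeroing the location score, the weighted method still returns 70 (logged access), while the minimum and multiplicative methods drop straight to 0: a concrete illustration of why the stricter methods suit critical signals.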
Score Decay Over Time:
Trust should decay when there's no activity to verify:
+------------------------------------------------------------------+
| TRUST SCORE DECAY |
+------------------------------------------------------------------+
CONCEPT:
========
The longer a session is idle, the less confident we are that the
original user is still in control.
Decay Formula (Exponential):
current_score = initial_score * e^(-lambda * time_since_activity)
Where lambda = decay rate constant
Example with lambda = 0.01 per minute:
Time Idle | Trust Score (started at 100)
----------|------------------------------
0 min | 100
5 min | 95
15 min | 86
30 min | 74
60 min | 55
120 min | 30
At 120 minutes idle: Force re-authentication
IMPLEMENTATION:
===============
import math
import time

class TrustScore:
    def __init__(self, initial_score: float, decay_rate: float = 0.01):
        self.initial_score = initial_score
        self.decay_rate = decay_rate
        self.last_activity = time.time()

    def get_current_score(self) -> float:
        time_idle = time.time() - self.last_activity
        minutes_idle = time_idle / 60
        decay_factor = math.exp(-self.decay_rate * minutes_idle)
        return self.initial_score * decay_factor

    def refresh(self, new_score: float):
        self.initial_score = new_score
        self.last_activity = time.time()
+------------------------------------------------------------------+
Real-Time Event Processing
Continuous authentication requires processing log events in real-time, not in batch. Here are the key architectural patterns:
+------------------------------------------------------------------+
| REAL-TIME EVENT PROCESSING ARCHITECTURE |
+------------------------------------------------------------------+
STREAM PROCESSING CONCEPTS:
===========================
Unlike batch processing (process all logs nightly), stream processing
handles events as they arrive, with sub-second latency.
Key Components:
1. EVENT PRODUCER
- PEP proxies emit access logs
- Each log entry = one event
- Format: JSON with timestamp, user, IP, resource, action
2. MESSAGE QUEUE / EVENT BUS
- Kafka, Redis Streams, RabbitMQ
- Decouples producers from consumers
- Provides durability and replay capability
3. STREAM PROCESSOR
- Consumes events in real-time
- Maintains state (user baselines, current sessions)
- Emits alerts and revocation signals
4. SINK / ACTION LAYER
- Writes alerts to database
- Publishes revocation to PEPs
- Updates dashboards
ARCHITECTURE DIAGRAM:
=====================
+--------+ +--------+ +--------+
| PEP 1 | | PEP 2 | | PEP 3 | (Policy Enforcement Points)
+---+----+ +---+----+ +---+----+
| | |
v v v
+----------------------------------+
| MESSAGE QUEUE | (Redis Streams / Kafka)
| "access-log" topic/stream |
+----------------+-----------------+
|
v
+----------------------------------+
| CONTINUOUS AUTH MONITOR | (This Project)
| |
| +----------------------------+ |
| | Event Parser | |
| +-------------+--------------+ |
| | |
| +-------------v--------------+ |
| | User Baseline Store | | (Per-user behavioral models)
| +-------------+--------------+ |
| | |
| +-------------v--------------+ |
| | Anomaly Detector | | (Z-scores, impossible travel)
| +-------------+--------------+ |
| | |
| +-------------v--------------+ |
| | Trust Score Calculator | |
| +-------------+--------------+ |
| | |
+----------------+-----------------+
|
+--------+--------+
| |
v v
+---------------+ +---------------+
| Dashboard | | Revocation | (Real-time UI + PEP signals)
| (WebSocket) | | Publisher |
+---------------+ +-------+-------+
|
+----------------+----------------+
v v v
+--------+ +--------+ +--------+
| PEP 1 | | PEP 2 | | PEP 3 |
+--------+ +--------+ +--------+
+------------------------------------------------------------------+
Event-Driven Architecture:
+------------------------------------------------------------------+
| PUB/SUB FOR REVOCATION |
+------------------------------------------------------------------+
PROBLEM:
========
When impossible travel is detected, we must revoke the session
across ALL PEP proxies within seconds.
Option 1: Poll-Based (Slow)
---------------------------
PEPs periodically check: "Is this token still valid?"
- Poll interval: 30 seconds
- Worst case: 30 second window for attacker
Option 2: Push-Based (Fast) - Preferred
---------------------------------------
Monitor publishes: "REVOKE user:alice session:XYZ"
All PEPs receive immediately and terminate session.
- Latency: < 100ms
- No polling overhead
IMPLEMENTATION WITH REDIS PUB/SUB:
==================================
Monitor (Publisher):
-------------------
def revoke_user_session(user_id: str, session_id: str, reason: str):
    message = {
        "action": "REVOKE",
        "user_id": user_id,
        "session_id": session_id,
        "reason": reason,
        "timestamp": time.time()
    }
    redis_client.publish("session-revocations", json.dumps(message))

PEP (Subscriber):
-----------------
def handle_revocation(message):
    data = json.loads(message)
    if data["action"] == "REVOKE":
        session_store.invalidate(
            user_id=data["user_id"],
            session_id=data["session_id"]
        )
        log.warning(f"Session revoked: {data['reason']}")

# Subscribe to channel
pubsub = redis_client.pubsub()
pubsub.subscribe("session-revocations")
for message in pubsub.listen():
    if message["type"] == "message":
        handle_revocation(message["data"])
FAILSAFE: SHORT-LIVED TOKENS
============================
Even with push revocation, use short-lived tokens (5-15 minutes)
as defense in depth. If push fails, token expires quickly.
+------------------------------------------------------------------+
Token Revocation Strategies
When the monitor detects an anomaly, it must revoke the compromised token. Here are the strategies:
+------------------------------------------------------------------+
| TOKEN REVOCATION STRATEGIES |
+------------------------------------------------------------------+
STRATEGY 1: Token Blacklist (Redis-Based)
=========================================
How it works:
- Maintain a set of revoked token IDs (JTI claims)
- On each request, PEP checks: Is this JTI in the blacklist?
- TTL on blacklist entries matches token lifetime
Redis Implementation:
# Revoke a token
SADD revoked_tokens:<user_id> <token_jti>
EXPIRE revoked_tokens:<user_id> 3600 # 1 hour TTL
# Check if revoked
SISMEMBER revoked_tokens:<user_id> <token_jti>
Pros:
+ Simple to implement
+ Fast O(1) lookup
+ Distributed across PEPs
Cons:
- Requires network call on every request
- Blacklist can grow large
STRATEGY 2: Short-Lived Tokens with Refresh
============================================
How it works:
- Access tokens live only 5-15 minutes
- Refresh tokens live longer (hours/days)
- On revocation: invalidate refresh token
- Access token expires naturally
Implementation:
Access Token: { exp: now + 15 minutes, ... }
Refresh Token: { jti: "refresh-xyz", exp: now + 24 hours }
# To revoke: Add refresh token to blacklist
SADD revoked_refresh_tokens <refresh_token_jti>
Pros:
+ Small blast radius (15 min max)
+ Fewer blacklist entries (only refresh tokens)
+ Can skip blacklist check for access tokens
Cons:
- 15 minute window of exposure
- More token refresh traffic
STRATEGY 3: Pushed Invalidation (Recommended for ZTA)
=====================================================
How it works:
- On revocation: Publish message to all PEPs immediately
- PEPs maintain local cache of revoked sessions
- No centralized check needed after initial push
Implementation:
Monitor:
redis.publish("revocations", {
"user": "alice",
"session": "sess-123",
"reason": "impossible_travel"
})
PEP:
on_revocation_message(msg):
    local_revocation_cache.add(msg.session)

on_request(token):
    if token.session_id in local_revocation_cache:
        return DENY
    # Continue normal validation
Pros:
+ Near-instant revocation (< 100ms)
+ No centralized bottleneck
+ PEPs remain autonomous
Cons:
- Requires reliable message delivery
- PEPs must subscribe to revocation channel
STRATEGY 4: JWT `jti` Claim Tracking
====================================
Every JWT should include a unique identifier (jti claim):
{
"sub": "alice@example.com",
"jti": "550e8400-e29b-41d4-a716-446655440000",
"exp": 1703980800,
"iat": 1703977200
}
This enables:
- Per-token revocation (not per-user)
- Audit trail of specific tokens
- Replay attack prevention
+------------------------------------------------------------------+
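Strategy 1 can be sketched in memory to make the TTL semantics concrete; a production deployment would use the shared Redis SADD/EXPIRE commands shown above so all PEPs see the same blacklist. TokenBlacklist is an illustrative name:

```python
import time

class TokenBlacklist:
    """In-memory sketch of Strategy 1; production would use shared Redis."""

    def __init__(self):
        self._revoked = {}  # token jti -> blacklist entry expiry (epoch secs)

    def revoke(self, jti: str, ttl_seconds: float = 3600) -> None:
        # TTL should match the remaining token lifetime
        self._revoked[jti] = time.time() + ttl_seconds

    def is_revoked(self, jti: str) -> bool:
        expiry = self._revoked.get(jti)
        if expiry is None:
            return False
        if time.time() >= expiry:
            del self._revoked[jti]  # entry outlived the token; drop it
            return False
        return True
```

Expiring blacklist entries alongside the tokens they shadow keeps the set from growing without bound, which is the main operational cost of this strategy.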
IP Geolocation
Converting IP addresses to geographic coordinates is fundamental to impossible travel detection.
+------------------------------------------------------------------+
| IP GEOLOCATION |
+------------------------------------------------------------------+
HOW IT WORKS:
=============
1. IP Address Registration
- IANA allocates IP blocks to Regional Internet Registries (RIRs)
- RIRs allocate to ISPs
- ISPs assign to customers in specific regions
2. Geolocation Databases
- Companies like MaxMind map IP ranges to locations
- Based on: ISP registration data, user-submitted data,
network topology analysis
3. Lookup Process
- Input: IP address (e.g., 8.8.8.8)
- Output: Latitude, Longitude, City, Country, ASN
MAXMIND GEOLITE2 DATABASE:
==========================
Free database with city-level accuracy.
Installation (Python):
pip install geoip2
# Download GeoLite2-City.mmdb from MaxMind
Usage:
import geoip2.database
reader = geoip2.database.Reader('GeoLite2-City.mmdb')
response = reader.city('128.101.101.101')
print(response.city.name) # Minneapolis
print(response.country.name) # United States
print(response.location.latitude) # 44.9778
print(response.location.longitude) # -93.2650
print(response.location.accuracy_radius) # 20 (km)
ACCURACY CONSIDERATIONS:
========================
Accuracy varies significantly:
Location Type | Typical Accuracy
-------------------|------------------
Home ISP | City-level (~50km)
Mobile carrier | Region-level (~100km)
Corporate network | Building-level (if registered)
VPN/Proxy | Exit node location (not user)
Tor | Random exit node
CGNAT | ISP hub location (not user)
HANDLING INACCURACY:
====================
1. Use accuracy_radius from database
- If accuracy_radius > 100km, reduce confidence
- Don't trigger impossible travel if both locations uncertain
2. Buffer distance calculations
- NYC to Boston = 340km
- If accuracy_radius = 50km each, effective minimum = 240km
- Use: effective_distance = distance - (radius1 + radius2)
3. Flag VPN/proxy IPs
- Maintain list of known VPN provider IP ranges
- Don't use these for location-based decisions
- Fall back to other signals (device, behavior)
+------------------------------------------------------------------+
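The distance-buffering rule from point 2 is one line of arithmetic; effective_distance_km is a hypothetical helper name:

```python
def effective_distance_km(distance_km: float,
                          radius_a_km: float, radius_b_km: float) -> float:
    """Shrink a measured distance by both accuracy radii; clamp at zero."""
    return max(0.0, distance_km - (radius_a_km + radius_b_km))

# NYC -> Boston measured at 340 km with 50 km uncertainty on each end
buffered = effective_distance_km(340, 50, 50)  # 240 km
```

Clamping at zero matters: two uncertain geolocations of the same physical place should never produce a negative distance that confuses the speed calculation.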
| PRIVACY CONSIDERATIONS |
+------------------------------------------------------------------+
IP geolocation involves privacy-sensitive data:
1. DATA RETENTION
- Don't store raw IPs longer than necessary
- Hash or anonymize for long-term analytics
- Follow GDPR/CCPA requirements
2. USER TRANSPARENCY
- Inform users their location is monitored
- Provide mechanism to report false positives
- Allow users to pre-register travel
3. GRANULARITY
- Store city/region, not exact coordinates
- Precision for security, not surveillance
+------------------------------------------------------------------+
References to Books in User's Library
The following books provide deeper theoretical foundations:
"Security in Computing" by Charles Pfleeger - Chapter 7: Network Security
- Authentication protocols and their weaknesses
- Session management and replay attacks
- Understanding why continuous verification matters
"Designing Data-Intensive Applications" by Martin Kleppmann - Chapter 11: Stream Processing
- Event streaming architectures
- Exactly-once processing semantics
- Time handling in distributed systems
- Why batch processing is insufficient for security
"Foundations of Information Security" by Jason Andress - Chapter 8: Intrusion Detection
- Signature vs anomaly-based detection
- False positive and false negative tradeoffs
- Building detection systems that scale
Complete Project Specification
Functional Requirements
| ID | Requirement | Priority | Acceptance Criteria |
|---|---|---|---|
| FR-1 | Ingest access logs in real-time | P0 | Process logs within 1 second of generation |
| FR-2 | Detect impossible travel | P0 | Flag when travel speed > 1500 km/h |
| FR-3 | Build user behavioral baselines | P0 | Track login times, locations, request patterns per user |
| FR-4 | Calculate continuous trust scores | P0 | Composite score from location, time, device, behavior |
| FR-5 | Trigger session revocation | P0 | Publish revocation signal within 100ms of detection |
| FR-6 | Expose real-time dashboard | P1 | WebSocket-based trust score visualization |
| FR-7 | Handle VPN/proxy edge cases | P1 | Flag but don't auto-revoke for known VPN IPs |
| FR-8 | Support MFA step-up triggers | P1 | Return step-up recommendation for medium-risk scores |
| FR-9 | Persist audit trail | P2 | Store all detections for forensic analysis |
| FR-10 | Configure thresholds | P2 | Admin API to adjust detection sensitivity |
Non-Functional Requirements
| ID | Requirement | Target | Rationale |
|---|---|---|---|
| NFR-1 | Detection latency | < 2 seconds | Minimize attacker window |
| NFR-2 | Revocation propagation | < 100ms | All PEPs must receive signal fast |
| NFR-3 | False positive rate | < 1% | Avoid user frustration |
| NFR-4 | Throughput | 10,000 events/sec | Handle enterprise scale |
| NFR-5 | Availability | 99.9% | Security system must be reliable |
| NFR-6 | Baseline convergence | < 7 days | Learn user patterns within a week |
Log Event Schema
{
"event_id": "evt-550e8400-e29b-41d4",
"timestamp": "2024-12-27T10:15:00.123Z",
"user_id": "alice@example.com",
"session_id": "sess-4412-XA",
"token_jti": "jwt-889923",
"source_ip": "203.0.113.45",
"user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
"request": {
"method": "GET",
"path": "/api/v2/sensitive-data",
"query_params": {},
"body_size_bytes": 0
},
"response": {
"status_code": 200,
"body_size_bytes": 4532
},
"pep_id": "proxy-east-1",
"device_fingerprint": "fp-abc123"
}
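Before wiring in a full validation library, the required-field check for this schema can be sketched with the stdlib alone (the real implementation can use pydantic, which is in the dependency list; the subset of required fields below is chosen for illustration):

```python
# Sketch: stdlib-only validation of an access-log event.
# The required-field subset below is illustrative, not the full schema.
from datetime import datetime

REQUIRED_FIELDS = {"event_id", "timestamp", "user_id", "session_id", "source_ip"}

def parse_event(raw: dict) -> dict:
    missing = REQUIRED_FIELDS - raw.keys()
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    event = dict(raw)
    # Normalize the RFC 3339 "Z" suffix so fromisoformat() accepts it
    event["timestamp"] = datetime.fromisoformat(
        raw["timestamp"].replace("Z", "+00:00")
    )
    return event

sample = {
    "event_id": "evt-550e8400-e29b-41d4",
    "timestamp": "2024-12-27T10:15:00.123Z",
    "user_id": "alice@example.com",
    "session_id": "sess-4412-XA",
    "source_ip": "203.0.113.45",
}
parsed = parse_event(sample)
```

Rejecting malformed events at the door keeps the downstream detectors free of defensive None-checks.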
Alert Event Schema
{
"alert_id": "alert-123456",
"timestamp": "2024-12-27T10:20:00.456Z",
"user_id": "alice@example.com",
"alert_type": "impossible_travel",
"severity": "critical",
"details": {
"location_a": {
"ip": "203.0.113.45",
"city": "New York",
"country": "US",
"coordinates": [40.7128, -74.0060]
},
"location_b": {
"ip": "185.34.22.11",
"city": "London",
"country": "GB",
"coordinates": [51.5074, -0.1278]
},
"time_difference_seconds": 1200,
"distance_km": 5581,
"required_speed_kmh": 16743
},
"trust_score_before": 100,
"trust_score_after": 0,
"action_taken": "session_revoked",
"session_id": "sess-4412-XA"
}
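You can sanity-check the alert's numbers directly: required speed is just distance divided by elapsed time.

```python
# Verifying the alert fields above: 5,581 km in 1,200 seconds (20 minutes)
distance_km = 5581
time_difference_seconds = 1200
required_speed_kmh = distance_km / (time_difference_seconds / 3600)
# -> 16743.0 km/h, matching required_speed_kmh in the schema
```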
Architecture Diagram
+------------------------------------------------------------------+
| CONTINUOUS AUTHENTICATION MONITOR |
+------------------------------------------------------------------+
POLICY ENFORCEMENT POINTS
+--------+ +--------+ +--------+ +--------+ +--------+
| PEP 1 | | PEP 2 | | PEP 3 | | PEP 4 | | PEP 5 |
+---+----+ +---+----+ +---+----+ +---+----+ +---+----+
| | | | |
| Access Logs (JSON over Redis Streams) |
v v v v v
+------------------------------------------------------------------+
| REDIS STREAMS |
| "access-events" stream |
+--------------------------------+---------------------------------+
|
v
+------------------------------------------------------------------+
| CONTINUOUS AUTH MONITOR (Python) |
| |
| +------------------------+ +--------------------------+ |
| | Stream Consumer | | Baseline Store | |
| | (asyncio/aioredis) | | (Redis Hash per user) | |
| +------------+-----------+ +-----------+--------------+ |
| | | |
| v v |
| +------------------------+ +--------------------------+ |
| | Event Parser |--->| Baseline Updater | |
| | - Extract user, IP | | - EMA calculations | |
| | - Geolocate IP | | - Time pattern updates | |
| +------------+-----------+ +--------------------------+ |
| | |
| v |
| +------------------------+ |
| | Anomaly Detectors | |
| | +------------------+ | |
| | | Impossible Travel| | |
| | +------------------+ | |
| | | Temporal Anomaly | | |
| | +------------------+ | |
| | | Behavior Anomaly | | |
| | +------------------+ | |
| +------------+-----------+ |
| | |
| v |
| +------------------------+ |
| | Trust Score Engine | |
| | - Weighted composite | |
| | - Score decay | |
| +------------+-----------+ |
| | |
| +-------+-------+ |
| | | |
| v v |
| +---------+ +------------+ |
| | Alert | | Revocation | |
| | Writer | | Publisher | |
| +---------+ +-----+------+ |
| | |
+------------------------+------------------------------------------+
|
| Redis Pub/Sub "session-revocations"
v
+--------+ +--------+ +--------+ +--------+ +--------+
| PEP 1 | | PEP 2 | | PEP 3 | | PEP 4 | | PEP 5 |
+--------+ +--------+ +--------+ +--------+ +--------+
(All PEPs subscribe and instantly terminate revoked sessions)
+------------------------------------------------------------------+
| SUPPORTING SYSTEMS |
+------------------------------------------------------------------+
+----------------+ +----------------+ +----------------+
| GeoIP DB | | PostgreSQL | | Dashboard UI |
| (MaxMind) | | (Audit Logs) | | (React/WS) |
+----------------+ +----------------+ +----------------+
Real World Outcome
When you complete this project, you will have an intelligent security monitor that detects account takeover in real-time. Here is exactly what you will see:
Starting the Monitor
$ ./zta-monitor --config /etc/zta-monitor/config.yaml
+------------------------------------------------------------------+
| ZERO TRUST CONTINUOUS AUTHENTICATION MONITOR |
+------------------------------------------------------------------+
[INFO] 2024-12-27T10:00:00Z Monitor v1.0.0 starting...
[INFO] 2024-12-27T10:00:00Z Connecting to Redis at localhost:6379
[INFO] 2024-12-27T10:00:00Z Loading GeoIP database: /data/GeoLite2-City.mmdb
[INFO] 2024-12-27T10:00:01Z GeoIP database loaded (3.2M entries)
[INFO] 2024-12-27T10:00:01Z Subscribing to stream: access-events
[INFO] 2024-12-27T10:00:01Z Publishing revocations to: session-revocations
[INFO] 2024-12-27T10:00:01Z Dashboard WebSocket on port 8080
[INFO] 2024-12-27T10:00:01Z Ready. Monitoring for anomalies...
[INFO] 2024-12-27T10:00:01Z Loaded baselines for 50 users from cache.
Normal Activity
[LOG] 2024-12-27T10:05:00Z User: alice@corp.com | IP: 203.0.113.45 (NYC) | Trust: 100
[LOG] 2024-12-27T10:05:05Z User: alice@corp.com | IP: 203.0.113.45 (NYC) | Trust: 100
[LOG] 2024-12-27T10:05:10Z User: bob@corp.com | IP: 198.51.100.22 (LA) | Trust: 100
[LOG] 2024-12-27T10:05:15Z User: alice@corp.com | IP: 203.0.113.45 (NYC) | Trust: 100
[BASELINE] alice@corp.com: Updated location baseline (NYC confirmed)
[BASELINE] alice@corp.com: Updated temporal baseline (10:00-11:00 typical)
Anomaly Detection: Impossible Travel
[LOG] 2024-12-27T10:20:00Z User: alice@corp.com | IP: 185.34.22.11 (LONDON)
+------------------------------------------------------------------+
| CRITICAL ALERT: IMPOSSIBLE TRAVEL |
+------------------------------------------------------------------+
| |
| User: alice@corp.com |
| Session ID: sess-4412-XA |
| |
| Location A: New York, US (203.0.113.45) |
| Time A: 2024-12-27T10:05:15Z |
| |
| Location B: London, GB (185.34.22.11) |
| Time B: 2024-12-27T10:20:00Z |
| |
| Time Elapsed: 15 minutes |
| Distance: 5,581 km |
| Required Speed: 22,324 km/h |
| |
| Verdict: PHYSICALLY IMPOSSIBLE |
| (Max feasible: 1,500 km/h) |
| |
| Trust Score: 100 -> 0 |
| |
+------------------------------------------------------------------+
[ACTION] 2024-12-27T10:20:00.052Z Publishing revocation for alice@corp.com (sess-4412-XA)
[ACTION] 2024-12-27T10:20:00.053Z Revocation published to 'session-revocations' channel
[AUDIT] 2024-12-27T10:20:00.055Z Alert written to PostgreSQL (alert-123456)
PEP Response (In Another Terminal)
# Terminal 2: PEP Proxy Log
$ ./zta-proxy --log-level info
[INFO] 2024-12-27T10:00:00Z PEP Proxy listening on :8080
[INFO] 2024-12-27T10:00:00Z Subscribed to 'session-revocations' channel
[INFO] 2024-12-27T10:05:15Z ALLOW alice@corp.com GET /api/data (Trust: 100)
[INFO] 2024-12-27T10:10:22Z ALLOW alice@corp.com GET /api/users (Trust: 100)
[REVOCATION] 2024-12-27T10:20:00.053Z Received GLOBAL_REVOKE:
User: alice@corp.com
Session: sess-4412-XA
Reason: Impossible travel detected
[INFO] 2024-12-27T10:20:00.054Z Session terminated: sess-4412-XA
[INFO] 2024-12-27T10:20:00.054Z Removed from active sessions cache
Attacker Experience
# Terminal 3: Attacker's request (using stolen token from London)
$ curl -i -H "Authorization: Bearer eyJhbG..." http://api.corp.com/data
HTTP/1.1 401 Unauthorized
Content-Type: application/json
X-ZT-Reason: session_revoked
{
"error": "Session Terminated",
"reason": "Suspicious login activity detected (Impossible Travel).",
"alert_id": "alert-123456",
"remediation": "Your session has been terminated for security. Please authenticate with MFA to restore access.",
"mfa_url": "https://auth.corp.com/mfa?user=alice@corp.com"
}
Trust Score Dashboard
$ curl http://localhost:8080/api/dashboard/sessions | jq .
{
"active_sessions": [
{
"user": "bob@corp.com",
"session_id": "sess-7721-BC",
"trust_score": 95,
"last_activity": "2024-12-27T10:19:55Z",
"location": "Los Angeles, US",
"device": "Chrome/macOS",
"status": "active"
},
{
"user": "charlie@corp.com",
"session_id": "sess-9921-DE",
"trust_score": 72,
"last_activity": "2024-12-27T10:18:30Z",
"location": "Chicago, US",
"device": "Firefox/Windows",
"status": "active",
"warnings": ["Unusual access time (outside 9-5)"]
}
],
"recent_alerts": [
{
"alert_id": "alert-123456",
"user": "alice@corp.com",
"type": "impossible_travel",
"severity": "critical",
"timestamp": "2024-12-27T10:20:00Z",
"action_taken": "session_revoked"
}
],
"statistics": {
"total_sessions_monitored": 147,
"alerts_last_hour": 1,
"average_trust_score": 94.2,
"false_positive_rate_30d": 0.3
}
}
The Core Question You're Answering
"How can I continuously verify a user's identity throughout their session, detecting anomalies that suggest account compromise or session hijacking?"
One-time login authentication creates a dangerous assumption: that whoever presented valid credentials at 9 AM is still the legitimate user at 3 PM. In reality, session tokens can be stolen, devices can be compromised, and attackers can hijack authenticated sessions within minutes of the initial login. Continuous authentication closes this gap by treating every action as an opportunity to verify identity, not through intrusive re-authentication, but through behavioral signals that are unique to each user.
Concepts You Must Understand First
Before diving into implementation, ensure you have a solid grasp of these foundational concepts:
1. User and Entity Behavior Analytics (UEBA)
UEBA is the practice of establishing behavioral baselines for users (and entities like devices, services) and detecting deviations that may indicate compromise. Unlike signature-based detection that looks for known attack patterns, UEBA asks "is this behavior normal for THIS specific user?" This personalization is what makes it powerful against novel attacks.
Key insight: Every user has a unique behavioral fingerprint - the hours they work, the resources they access, the locations they connect from, and the patterns of their requests.
2. Behavioral Baselines and Anomaly Detection
A baseline represents "normal" behavior for a user, built from historical observations. Anomaly detection identifies when current behavior deviates significantly from this baseline. The challenge is distinguishing between legitimate changes (user traveling for work) and malicious deviations (attacker using stolen credentials).
Key insight: Baselines must evolve over time to accommodate legitimate behavioral changes while remaining sensitive to sudden, unexplained shifts.
3. Impossible Travel Detection and the Haversine Formula
Impossible travel detection catches scenarios where a user appears to be in two geographically distant locations within an impossibly short time frame. The Haversine formula calculates the great-circle distance between two points on Earth's surface, which combined with elapsed time gives the required travel speed.
Key insight: A login from New York at 10:00 AM followed by a login from London at 10:15 AM requires travel at 22,000+ km/h - clearly impossible and strong evidence of credential theft.
4. Session Management and Token Lifecycle
Sessions represent authenticated user contexts, typically maintained via tokens (JWTs, session cookies). Understanding how tokens are issued, validated, refreshed, and revoked is crucial. Continuous authentication adds a new dimension: tokens that were valid can become invalid based on behavioral signals, not just expiration.
Key insight: Revocation must be fast (sub-second) and distributed across all Policy Enforcement Points simultaneously.
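On the PEP side, consuming that revocation signal can be as simple as a local set of revoked session IDs, fed by the pub/sub subscriber and consulted on every request. A minimal sketch (class and method names are illustrative):

```python
# Sketch: PEP-local revocation cache. The subscriber loop that feeds
# on_revocation_message() from Redis Pub/Sub is omitted here.
class RevocationCache:
    def __init__(self):
        self._revoked: set[str] = set()

    def on_revocation_message(self, message: dict) -> None:
        # Called by the pub/sub subscriber for each REVOKE message
        if message.get("action") == "REVOKE":
            self._revoked.add(message["session_id"])

    def is_revoked(self, session_id: str) -> bool:
        # O(1) membership check on the request hot path
        return session_id in self._revoked

cache = RevocationCache()
cache.on_revocation_message({"action": "REVOKE", "session_id": "sess-4412-XA"})
```

Because the check is a local set lookup, a PEP can reject a revoked token without adding a network round-trip to every request.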
5. Risk Scoring Algorithms
Risk scores combine multiple signals into a single value that drives access decisions. Signals include location familiarity, temporal patterns, device recognition, and behavioral consistency. The scoring algorithm must balance sensitivity (catching real attacks) with specificity (avoiding false positives).
Key insight: A single suspicious signal may reduce trust moderately, but multiple concurrent anomalies should trigger immediate revocation.
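That key insight can be made concrete: a weighted average absorbs a single mild anomaly, while a penalty for multiple concurrent low component scores collapses the composite. The weights and the 0.25 penalty factor here are illustrative, not prescribed values:

```python
# Sketch: weighted composite trust score with multi-anomaly escalation.
def composite_trust(location: int, temporal: int, behavioral: int) -> int:
    weights = {"location": 0.4, "temporal": 0.3, "behavioral": 0.3}
    score = (weights["location"] * location
             + weights["temporal"] * temporal
             + weights["behavioral"] * behavioral)
    # Count signals that are individually suspicious (below 50)
    anomalies = sum(1 for s in (location, temporal, behavioral) if s < 50)
    if anomalies >= 2:
        score *= 0.25  # multiple concurrent anomalies: collapse trust
    return round(score)
```

One suspicious signal (location 20) leaves a user in "step-up MFA" territory; two suspicious signals push the score into revocation range.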
6. Real-Time Event Processing
Continuous authentication requires processing access logs as they happen, not in nightly batches. Stream processing architectures consume events in real-time, maintain stateful computations (user baselines), and emit decisions within milliseconds. Understanding event streaming patterns (pub/sub, consumer groups) is essential.
Key insight: The latency between an attack and detection must be measured in seconds, not hours.
Questions to Guide Your Design
As you design your Continuous Authentication Monitor, work through these questions:
Event Collection Architecture
- What data points will you capture in each access log event?
- How will you ensure PEPs (Policy Enforcement Points) reliably deliver logs to your monitor?
- What happens if log delivery is delayed or out of order?
- How will you handle high-volume event streams without dropping events?
Scoring and Decision Logic
- How will you weight different signals (location, time, device, behavior) in your composite score?
- What thresholds trigger different responses (step-up MFA vs immediate revocation)?
- How should trust scores decay over time during periods of inactivity?
- How will you handle the cold-start problem for new users with no baseline?
Response Actions
- How will you propagate revocation decisions to all PEPs in sub-second time?
- What remediation options will you offer users who are flagged (MFA step-up, support contact)?
- How will you prevent alert fatigue for security teams?
- What audit trail will you maintain for compliance and forensics?
Edge Cases
- How will you handle legitimate VPN usage that changes apparent location?
- What about mobile users whose carrier assigns different IP ranges?
- How will you detect and handle coordinated attacks across multiple accounts?
- What if your monitor itself is targeted for denial of service?
Thinking Exercise
Before writing any code, work through this design exercise on paper:
Design a Behavioral Baseline on Paper
Imagine you're building a baseline for user "Alice" who works as a software engineer:
- Temporal Profile:
- What hours does Alice typically log in? Draw a histogram of login times.
- What days of the week does she work? How would you represent this?
- How would you update this histogram with new data while giving more weight to recent behavior?
- Location Profile:
- Alice primarily works from home (NYC) but visits the office (Boston) twice a month and occasionally travels. How would you represent her "normal" locations?
- What data structure captures location familiarity with confidence levels?
- How far can she be from a known location before you consider it anomalous?
- Behavioral Profile:
- Alice typically makes 50-80 API requests per hour during work hours. How would you represent this as a baseline?
- She accesses the /api/code and /api/docs endpoints most frequently. How would you track resource access patterns?
- Calculate: If her mean is 65 requests/hour with standard deviation of 12, what z-score would 200 requests/hour produce?
- Anomaly Scenarios:
- Write out the detection logic for: âAlice logs in at 3 AM from an IP in Romaniaâ
- What signals fire? What would you set her trust score to?
- What action would you take?
This paper exercise forces you to think through the data structures and algorithms before touching code.
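To check your answer to the z-score calculation in the behavioral profile above:

```python
# z-score: how many standard deviations an observation is from the mean.
def z_score(x: float, mean: float, stddev: float) -> float:
    return (x - mean) / stddev

z = z_score(200, mean=65, stddev=12)
# (200 - 65) / 12 = 11.25 -- far beyond the usual 3-sigma anomaly cutoff
```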
Hints in Layers
Use these hints progressively - try to solve problems yourself before revealing the next layer.
Hint 1: Start with the Haversine Function
Your first piece of working code should be the distance calculation. This is pure math with no external dependencies - perfect for test-driven development. Write tests first with known distances (NYC to London = ~5,580 km) and implement until tests pass.
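If you want a reference implementation to check your tests against, here is a minimal sketch (the 6,371 km mean Earth radius is a common convention; other radii shift results by a few tenths of a percent):

```python
import math

def haversine_distance(lat1: float, lon1: float,
                       lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points in km."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# NYC -> London: roughly 5,570 km
d = haversine_distance(40.7128, -74.0060, 51.5074, -0.1278)
```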
Hint 2: Use Redis Streams for Event Ingestion
Don't build a custom message queue. Redis Streams (XREAD, XADD) provide durable, ordered event streaming with consumer groups. Your monitor can use XREAD BLOCK to wait for new events, process them, and acknowledge with XACK. This handles the "reliable delivery" problem for you.
Hint 3: Store Baselines as Redis Hashes
For each user, store their baseline as a Redis Hash (HSET/HGETALL). Keys like baseline:{user_id} with fields for last_location, hour_histogram, request_ema, etc. This gives you atomic updates and fast reads without a separate database for the hot path.
Hint 4: Use Pub/Sub for Revocation Broadcasting
When you detect an anomaly and need to revoke a session, publish to a Redis Pub/Sub channel. All PEPs subscribe to this channel and receive the revocation message within milliseconds. This is faster than polling a blacklist and scales horizontally.
Hint 5: Implement Score Components Independently First
Build and test each score component (location_score, temporal_score, behavioral_score) as independent functions before combining them. This makes debugging easier and allows you to tune each component separately. Only after each component works should you implement the composite scoring logic.
Solution Architecture
Component Breakdown
continuous-auth-monitor/
+-- main.py # Application entry point
+-- config/
| +-- config.py # Configuration loading
| +-- config.yaml # Default configuration
+-- ingestion/
| +-- stream_consumer.py # Redis Streams consumer
| +-- event_parser.py # Parse log events
+-- geolocation/
| +-- geoip_service.py # MaxMind GeoIP lookup
| +-- distance.py # Haversine calculations
+-- baseline/
| +-- store.py # User baseline storage
| +-- updater.py # EMA baseline updates
| +-- models.py # Baseline data structures
+-- detection/
| +-- impossible_travel.py # Travel speed detection
| +-- temporal_anomaly.py # Time-based detection
| +-- behavioral_anomaly.py # Request pattern detection
| +-- detector_pipeline.py # Orchestrates all detectors
+-- scoring/
| +-- trust_score.py # Composite score calculation
| +-- decay.py # Score decay over time
+-- actions/
| +-- revocation_publisher.py # Redis Pub/Sub publisher
| +-- alert_writer.py # PostgreSQL audit log
| +-- mfa_trigger.py # MFA step-up integration
+-- api/
| +-- dashboard.py # REST + WebSocket API
| +-- admin.py # Configuration endpoints
+-- tests/
+-- test_haversine.py
+-- test_baseline.py
+-- test_impossible_travel.py
+-- test_integration.py
Data Flow
+------------------------------------------------------------------+
| DETAILED DATA FLOW |
+------------------------------------------------------------------+
1. LOG EVENT ARRIVES
+----------------------------------------------------------+
| Redis Stream: XREAD BLOCK 0 STREAMS access-events $ |
+----------------------------------------------------------+
|
v
2. EVENT PARSING
+----------------------------------------------------------+
| Extract: user_id, session_id, source_ip, timestamp |
| Validate: Required fields present, timestamp parseable |
+----------------------------------------------------------+
|
v
3. GEOLOCATION
+----------------------------------------------------------+
| GeoIP Lookup: source_ip -> (latitude, longitude, city) |
| Cache: Store IP->Location mappings (1 hour TTL) |
+----------------------------------------------------------+
|
v
4. FETCH USER BASELINE
+----------------------------------------------------------+
| Redis Hash: HGETALL baseline:{user_id} |
| Contents: last_location, typical_hours, avg_requests, |
| request_stddev, last_activity_time |
+----------------------------------------------------------+
|
v
5. RUN ANOMALY DETECTORS (Parallel)
+----------------------------------------------------------+
| Impossible Travel: Compare current location to last |
| Temporal Anomaly: Compare login hour to baseline |
| Behavioral: Compare request patterns to baseline |
+----------------------------------------------------------+
|
v
6. CALCULATE TRUST SCORE
+----------------------------------------------------------+
| Composite: weight * location_score + |
| weight * temporal_score + |
| weight * behavioral_score |
| Apply decay based on idle time |
+----------------------------------------------------------+
|
v
7. DECISION
+----------------------------------------------------------+
| If trust_score < threshold: |
| -> Publish revocation |
| -> Write alert to audit log |
| -> Notify dashboard via WebSocket |
| Else: |
| -> Update baseline with new data |
| -> Update last_location, last_activity |
+----------------------------------------------------------+
|
v
8. UPDATE BASELINE (if not anomaly)
+----------------------------------------------------------+
| Redis Hash: HSET baseline:{user_id} |
| last_location = current_location |
| last_activity = current_time |
| hour_histogram = updated histogram |
| request_ema = new EMA value |
+----------------------------------------------------------+
Phased Implementation Guide
Phase 1: Log Ingestion and Geolocation (8-10 hours)
Goal: Consume access logs from Redis Streams and enrich with geolocation.
Milestone: Print each log entry with city and coordinates.
Steps:
- Set up Python project with dependencies:
pip install aioredis geoip2 pydantic
- Create Redis Streams consumer:
import asyncio
import aioredis

async def consume_events():
    # from_url() is synchronous in aioredis 2.x; do not await it
    redis = aioredis.from_url("redis://localhost")
    last_id = "0"
    while True:
        events = await redis.xread(
            {"access-events": last_id},
            block=1000  # Wait up to 1 second
        )
        for stream, messages in events:
            for msg_id, data in messages:
                yield data
                last_id = msg_id
- Integrate MaxMind GeoIP:
import geoip2.database
import geoip2.errors

class GeoIPService:
    def __init__(self, db_path: str):
        self.reader = geoip2.database.Reader(db_path)

    def lookup(self, ip: str) -> dict:
        try:
            response = self.reader.city(ip)
            return {
                "city": response.city.name,
                "country": response.country.iso_code,
                "latitude": response.location.latitude,
                "longitude": response.location.longitude,
                "accuracy_km": response.location.accuracy_radius
            }
        except geoip2.errors.AddressNotFoundError:
            return None
- Implement Haversine distance function (from theory section).
- Create event parser with geolocation enrichment.
Verification:
# Publish test event
$ redis-cli XADD access-events "*" user_id alice@corp.com source_ip 8.8.8.8
# Monitor should output:
[LOG] User: alice@corp.com | IP: 8.8.8.8 | Location: Mountain View, US (37.4056, -122.0775)
Phase 2: User Baseline Building (8-10 hours)
Goal: Build and update per-user behavioral baselines.
Milestone: Show userâs typical login hours and locations.
Steps:
- Design baseline data model:
from dataclasses import dataclass
from datetime import datetime
from typing import List, Dict

@dataclass
class UserBaseline:
    user_id: str
    last_location: tuple            # (lat, lon, city)
    last_activity: datetime
    hour_histogram: Dict[int, int]  # hour -> count
    location_history: List[tuple]   # Recent locations
    request_count_ema: float
    request_count_stddev: float
- Implement baseline storage in Redis:
class BaselineStore:
    def __init__(self, redis_client):
        self.redis = redis_client

    async def get_baseline(self, user_id: str) -> UserBaseline:
        data = await self.redis.hgetall(f"baseline:{user_id}")
        if not data:
            return None
        return UserBaseline.from_dict(data)

    async def update_baseline(self, user_id: str, event: dict):
        # Update hour histogram
        hour = event["timestamp"].hour
        await self.redis.hincrby(f"baseline:{user_id}", f"hour:{hour}", 1)
        # Update last location
        await self.redis.hset(f"baseline:{user_id}", mapping={
            "last_lat": event["latitude"],
            "last_lon": event["longitude"],
            "last_city": event["city"],
            "last_activity": event["timestamp"].isoformat()
        })
- Implement EMA calculation for request rates.
- Add baseline initialization for new users.
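The EMA update in the steps above can be sketched as a one-line recurrence; the smoothing factor alpha = 0.2 is an assumed tuning knob, not a required value:

```python
# Exponentially weighted moving average: recent observations count more.
def update_ema(previous_ema: float, observation: float,
               alpha: float = 0.2) -> float:
    return alpha * observation + (1 - alpha) * previous_ema

ema = 65.0  # requests/hour baseline so far
for hourly_count in [70, 60, 68]:
    ema = update_ema(ema, hourly_count)
```

Because only the running value is stored, the EMA fits naturally into a single Redis Hash field per user.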
Verification:
$ curl http://localhost:8080/api/baseline/alice@corp.com
{
"user_id": "alice@corp.com",
"typical_hours": [9, 10, 11, 14, 15, 16],
"typical_locations": ["New York", "Boston"],
"baseline_confidence": 0.85,
"events_analyzed": 127
}
Phase 3: Impossible Travel Detection (6-8 hours)
Goal: Detect when a user appears in two distant locations too quickly.
Milestone: Trigger alert for NYC->London in 15 minutes.
Steps:
- Implement the detector:
from dataclasses import dataclass
from datetime import datetime

@dataclass
class ImpossibleTravelAlert:
    user_id: str
    location_a: dict
    location_b: dict
    time_difference_seconds: int
    distance_km: float
    speed_kmh: float
    threshold_kmh: float = 1500

class ImpossibleTravelDetector:
    def __init__(self, speed_threshold_kmh: float = 1500):
        self.threshold = speed_threshold_kmh

    def detect(self, current_event: dict, last_location: tuple,
               last_time: datetime) -> ImpossibleTravelAlert:
        current_loc = (current_event["latitude"], current_event["longitude"])
        distance = haversine_distance(
            last_location[0], last_location[1],
            current_loc[0], current_loc[1]
        )
        time_diff = (current_event["timestamp"] - last_time).total_seconds()
        hours = time_diff / 3600
        if hours == 0:
            speed = float('inf')
        else:
            speed = distance / hours
        if speed > self.threshold:
            return ImpossibleTravelAlert(
                user_id=current_event["user_id"],
                location_a={"city": last_location[2], "coords": last_location[:2]},
                location_b={"city": current_event["city"], "coords": current_loc},
                time_difference_seconds=int(time_diff),
                distance_km=distance,
                speed_kmh=speed
            )
        return None
- Add edge case handling (VPN IPs, same-city tolerance).
- Integrate with baseline store.
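For the same-city tolerance in the edge-case step above, a minimal guard suffices: below a minimum distance, two different IPs are treated as local churn rather than travel. The 50 km default is an assumption:

```python
# Skip impossible-travel evaluation for short hops (same metro area).
def should_check_travel(distance_km: float,
                        min_distance_km: float = 50.0) -> bool:
    return distance_km >= min_distance_km
```

Calling this before the speed calculation avoids flagging a user who switches from office Wi-Fi to a mobile carrier across town.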
Verification:
# Simulate two events
$ redis-cli XADD access-events "*" user_id alice source_ip 1.1.1.1 timestamp "2024-12-27T10:00:00Z"
$ redis-cli XADD access-events "*" user_id alice source_ip 8.8.8.8 timestamp "2024-12-27T10:15:00Z"
# Monitor should detect:
[ALERT] Impossible Travel: alice | NYC -> London | 15 min | 5581 km | 22324 km/h
Phase 4: Trust Score Calculation (6-8 hours)
Goal: Calculate composite trust scores from multiple signals.
Milestone: Display real-time trust scores for active sessions.
Steps:
- Implement individual score calculators:
def location_score(current_loc: tuple, baseline: UserBaseline) -> int:
    """Score based on location familiarity."""
    known_cities = [loc[2] for loc in baseline.location_history]
    if current_loc[2] in known_cities:
        return 100
    # Calculate distance to nearest known location
    min_distance = min(
        haversine_distance(*current_loc[:2], *known[:2])
        for known in baseline.location_history
    )
    if min_distance < 100:     # Same metro area
        return 80
    elif min_distance < 500:   # Same region
        return 60
    elif min_distance < 1000:  # Same country, probably
        return 40
    else:
        return 20

def temporal_score(hour: int, baseline: UserBaseline) -> int:
    """Score based on login time typicality."""
    total_events = sum(baseline.hour_histogram.values())
    if total_events == 0:
        return 50  # No baseline yet
    hour_count = baseline.hour_histogram.get(hour, 0)
    frequency = hour_count / total_events
    if frequency > 0.1:  # >10% of logins at this hour
        return 100
    elif frequency > 0.05:
        return 80
    elif frequency > 0.01:
        return 50
    else:
        return 20
- Implement composite score calculator with weights.
- Add score decay function.
- Integrate with session state storage.
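The decay step above can be sketched as exponential decay toward a neutral floor, so an idle session's trust erodes and forces re-verification after long gaps. The 30-minute half-life and floor of 50 are illustrative choices:

```python
# Exponential trust decay during inactivity.
def decayed_score(score: float, idle_seconds: float,
                  half_life_seconds: float = 1800.0,
                  floor: float = 50.0) -> float:
    if score <= floor:
        return score
    factor = 0.5 ** (idle_seconds / half_life_seconds)
    return floor + (score - floor) * factor

# After one half-life, a score of 100 sits halfway to the floor: 75
s = decayed_score(100, idle_seconds=1800)
```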
Verification:
$ curl http://localhost:8080/api/sessions/active
[
{
"session_id": "sess-4412-XA",
"user": "alice@corp.com",
"trust_score": 95,
"score_breakdown": {
"location": 100,
"temporal": 90,
"behavioral": 95
}
}
]
Phase 5: Revocation Signal Broadcasting (4-6 hours)
Goal: Publish revocation signals that PEPs can consume.
Milestone: Verify PEP receives and acts on revocation within 100ms.
Steps:
- Implement revocation publisher:
import json
from datetime import datetime

class RevocationPublisher:
    def __init__(self, redis_client):
        self.redis = redis_client
        self.channel = "session-revocations"

    async def revoke(self, user_id: str, session_id: str, reason: str):
        message = {
            "action": "REVOKE",
            "user_id": user_id,
            "session_id": session_id,
            "reason": reason,
            "timestamp": datetime.utcnow().isoformat()
        }
        await self.redis.publish(self.channel, json.dumps(message))
- Create PEP subscription handler (for testing).
- Add latency measurement.
- Implement retry logic for failed publishes.
Verification:
# Terminal 1: Subscribe to revocations
$ redis-cli SUBSCRIBE session-revocations
# Terminal 2: Trigger revocation
$ curl -X POST http://localhost:8080/api/revoke/alice@corp.com/sess-4412-XA
# Terminal 1 should immediately show:
1) "message"
2) "session-revocations"
3) "{\"action\":\"REVOKE\",\"user_id\":\"alice@corp.com\",...}"
Phase 6: Dashboard and Alerting (6-8 hours)
Goal: Real-time visibility into trust scores and alerts.
Milestone: WebSocket-connected dashboard showing live session states.
Steps:
- Create FastAPI application with WebSocket support:
from fastapi import FastAPI, WebSocket
from fastapi.websockets import WebSocketDisconnect

app = FastAPI()
active_connections = []

@app.websocket("/ws/dashboard")
async def dashboard_websocket(websocket: WebSocket):
    await websocket.accept()
    active_connections.append(websocket)
    try:
        while True:
            # Keep the connection alive
            await websocket.receive_text()
    except WebSocketDisconnect:
        active_connections.remove(websocket)

async def broadcast_update(data: dict):
    for connection in active_connections:
        await connection.send_json(data)
- Implement REST endpoints for session listing.
- Create alert history endpoint.
- Add admin API for threshold configuration.
Testing Strategy
Unit Tests
# tests/test_haversine.py
import pytest
from geolocation.distance import haversine_distance
def test_haversine_nyc_to_london():
# NYC coordinates
nyc = (40.7128, -74.0060)
# London coordinates
london = (51.5074, -0.1278)
distance = haversine_distance(*nyc, *london)
# Expected: ~5,581 km (allow 1% tolerance)
assert 5500 < distance < 5650
def test_haversine_same_point():
point = (40.7128, -74.0060)
distance = haversine_distance(*point, *point)
assert distance == 0
def test_haversine_antipodal_points():
# Maximum possible distance
point_a = (0, 0)
point_b = (0, 180)
distance = haversine_distance(*point_a, *point_b)
# Half Earth circumference: ~20,000 km
assert 19900 < distance < 20100
# tests/test_impossible_travel.py
import pytest
from datetime import datetime, timedelta
from detection.impossible_travel import ImpossibleTravelDetector
def test_impossible_travel_detected():
detector = ImpossibleTravelDetector(speed_threshold_kmh=1500)
current_event = {
"user_id": "alice",
"latitude": 51.5074, # London
"longitude": -0.1278,
"city": "London",
"timestamp": datetime(2024, 12, 27, 10, 20, 0)
}
last_location = (40.7128, -74.0060, "New York") # NYC
last_time = datetime(2024, 12, 27, 10, 5, 0) # 15 min earlier
alert = detector.detect(current_event, last_location, last_time)
assert alert is not None
assert alert.speed_kmh > 20000 # Way over threshold
assert alert.distance_km > 5500
def test_legitimate_travel_not_flagged():
detector = ImpossibleTravelDetector(speed_threshold_kmh=1500)
# NYC to Boston, 6 hours later (possible by car)
current_event = {
"user_id": "alice",
"latitude": 42.3601, # Boston
"longitude": -71.0589,
"city": "Boston",
"timestamp": datetime(2024, 12, 27, 16, 0, 0)
}
last_location = (40.7128, -74.0060, "New York")
last_time = datetime(2024, 12, 27, 10, 0, 0) # 6 hours earlier
alert = detector.detect(current_event, last_location, last_time)
    assert alert is None  # No alert for ~50 km/h travel
Integration Tests
# tests/test_integration.py
import pytest
import asyncio
import json
import aioredis
@pytest.fixture
async def redis_client():
client = aioredis.from_url("redis://localhost")  # synchronous in aioredis 2.x
yield client
await client.close()
@pytest.mark.asyncio
async def test_end_to_end_impossible_travel(redis_client):
    """Test complete flow from log ingestion to revocation."""
    # Subscribe to revocations
    pubsub = redis_client.pubsub()
    await pubsub.subscribe("session-revocations")

    # Publish first event (NYC)
    await redis_client.xadd("access-events", {
        "user_id": "test-user",
        "source_ip": "203.0.113.45",
        "session_id": "test-session",
        "timestamp": "2024-12-27T10:00:00Z"
    })

    # Wait for baseline update
    await asyncio.sleep(1)

    # Publish second event (London, 15 min later)
    await redis_client.xadd("access-events", {
        "user_id": "test-user",
        "source_ip": "185.34.22.11",
        "session_id": "test-session",
        "timestamp": "2024-12-27T10:15:00Z"
    })

    # Wait for revocation
    message = await asyncio.wait_for(
        pubsub.get_message(ignore_subscribe_messages=True),
        timeout=5.0
    )
    assert message is not None
    data = json.loads(message["data"])
    assert data["action"] == "REVOKE"
    assert data["user_id"] == "test-user"
False Positive Testing
# tests/test_false_positives.py
from datetime import datetime, timedelta
from detection.impossible_travel import ImpossibleTravelDetector

def test_vpn_should_not_trigger_immediate_revoke():
    """VPN usage should flag but not auto-revoke."""
    # Known VPN IP range
    vpn_ip = "104.238.130.1"  # Example VPN provider
    detector = ImpossibleTravelDetector()
    detector.vpn_ips = load_vpn_ip_ranges()  # project helper for VPN ranges
    result = detector.detect_with_context(
        current_ip=vpn_ip,
        last_location=(40.7128, -74.0060, "NYC"),
        last_time=datetime.now() - timedelta(minutes=5)
    )
    assert result.action == "FLAG"  # Not REVOKE
    assert result.requires_mfa_stepup is True
def test_same_city_different_ip():
    """Different IPs in same city should not flag."""
    # Two different IPs both in NYC
    detector = ImpossibleTravelDetector(min_distance_km=50)
    current = {
        "latitude": 40.7580,  # Midtown
        "longitude": -73.9855,
        "timestamp": datetime.now()
    }
    last = (40.7128, -74.0060, "NYC")  # Downtown
    last_time = datetime.now() - timedelta(minutes=10)

    alert = detector.detect(current, last, last_time)
    assert alert is None  # Only ~5 km apart, below min_distance_km
Performance Testing
# Generate load test events
$ python scripts/generate_events.py --count 100000 --users 1000
# Run benchmark
$ hyperfine --warmup 3 'python -c "from main import process_batch; process_batch()"'
# Expected: Process 10,000 events/second on single core
Common Pitfalls and Debugging
Pitfall 1: VPN False Positives
Symptom: Users with VPNs constantly flagged for impossible travel.
Cause: VPN exit nodes in different countries than user's actual location.
Solution:
class VPNAwareDetector:
    def __init__(self):
        self.vpn_asns = self.load_vpn_asn_list()
        self.vpn_ip_ranges = self.load_vpn_ip_ranges()

    def is_vpn(self, ip: str) -> bool:
        # Check if IP belongs to known VPN provider
        if ip_in_ranges(ip, self.vpn_ip_ranges):
            return True
        # Check ASN
        asn = self.get_asn(ip)
        return asn in self.vpn_asns

    def detect(self, event, last_location, last_time):
        if self.is_vpn(event["source_ip"]):
            # Don't use location for VPN IPs;
            # fall back to other signals (device, behavior)
            return self.detect_via_behavior(event)
        return self.detect_via_location(event, last_location, last_time)
Additional: Allow users to pre-register VPN usage in their profile.
Pitfall 2: Clock Synchronization Issues
Symptom: Events appear out of order; impossible travel detected for legitimate sequences.
Cause: Different PEP servers have unsynchronized clocks.
Solution:
def validate_event_sequence(current: dict, previous: dict) -> bool:
    """Require minimum time gap before analysis."""
    time_diff = (current["timestamp"] - previous["timestamp"]).total_seconds()
    # If events are within 60 seconds, don't analyze for travel:
    # clock skew could cause false ordering
    if abs(time_diff) < 60:
        return False
    # If current event is before previous, log warning
    if time_diff < 0:
        log.warning(f"Event timestamp ordering issue: {current} before {previous}")
        return False
    return True
Infrastructure: Ensure all servers use NTP with tight synchronization (chrony or ntpd).
Pitfall 3: Baseline Cold Start
Symptom: New users immediately flagged for anomalies.
Cause: No baseline exists to compare against.
Solution:
class BaselineManager:
    MINIMUM_EVENTS_FOR_BASELINE = 10
    COLD_START_TRUST_SCORE = 70  # Not full trust, but not denied

    def has_sufficient_baseline(self, user_id: str) -> bool:
        baseline = self.get_baseline(user_id)
        if not baseline:
            return False
        return baseline.event_count >= self.MINIMUM_EVENTS_FOR_BASELINE

    def get_trust_score(self, user_id: str, event: dict) -> int:
        if not self.has_sufficient_baseline(user_id):
            # New user - can't detect anomalies yet.
            # Apply conservative trust score but allow access.
            return self.COLD_START_TRUST_SCORE
        return self.calculate_trust_score(user_id, event)
Consider: "Onboarding mode" for new users with reduced sensitivity.
Pitfall 4: Alert Fatigue
Symptom: Security team ignores alerts because too many are false positives.
Cause: Thresholds too aggressive; legitimate edge cases not handled.
Solution:
from collections import defaultdict

class AlertManager:
    def __init__(self):
        self.alert_counts = defaultdict(int)  # user_id -> alerts today
        self.suppression_rules = []

    def should_alert(self, alert: Alert) -> bool:
        # Check suppression rules
        for rule in self.suppression_rules:
            if rule.matches(alert):
                return False
        # Rate limit per user
        user_alerts_today = self.alert_counts[alert.user_id]
        if user_alerts_today > 5:
            log.info(f"Suppressing alert for {alert.user_id} - rate limited")
            return False
        # Require minimum severity
        if alert.severity < Severity.MEDIUM:
            return False
        return True

    def add_suppression_rule(self, rule: SuppressionRule):
        """Allow admins to suppress known false positive patterns."""
        self.suppression_rules.append(rule)
Tuning: Track false positive rate and adjust thresholds quarterly.
Pitfall 5: Memory Leak in Baseline Store
Symptom: Monitor memory grows unbounded over time.
Cause: Storing unlimited location history per user.
Solution:
class UserBaseline:
    MAX_LOCATION_HISTORY = 100
    MAX_HOUR_HISTOGRAM_SIZE = 24  # Always bounded

    def add_location(self, location: tuple):
        self.location_history.append(location)
        # Prune old entries
        if len(self.location_history) > self.MAX_LOCATION_HISTORY:
            self.location_history = self.location_history[-self.MAX_LOCATION_HISTORY:]

    def add_to_hour_histogram(self, hour: int):
        self.hour_histogram[hour] = self.hour_histogram.get(hour, 0) + 1
        # Use exponential decay to prevent unbounded growth
        if sum(self.hour_histogram.values()) > 1000:
            for h in self.hour_histogram:
                self.hour_histogram[h] = int(self.hour_histogram[h] * 0.9)
Debugging Commands
# Check user's current baseline
$ redis-cli HGETALL baseline:alice@corp.com
# View recent events for a user
$ redis-cli XRANGE access-events - + COUNT 10 | grep alice
# Test geolocation
$ python -c "from geolocation import GeoIPService; g = GeoIPService('GeoLite2-City.mmdb'); print(g.lookup('8.8.8.8'))"
# Monitor revocation channel
$ redis-cli SUBSCRIBE session-revocations
# Check detection latency
$ python scripts/measure_latency.py
# View alert history
$ curl "http://localhost:8080/api/alerts?user=alice@corp.com&limit=10"
Extensions and Challenges
Extension 1: Machine Learning for Behavior Modeling
Replace statistical baselines with ML models that learn complex patterns.
Approach:
from sklearn.ensemble import IsolationForest
import numpy as np

class MLBehaviorModel:
    def __init__(self):
        self.model = IsolationForest(
            contamination=0.01,  # Expect 1% anomalies
            random_state=42
        )

    def train(self, user_events: list):
        """Train on user's historical events."""
        features = self.extract_features(user_events)
        self.model.fit(features)

    def extract_features(self, events: list) -> np.ndarray:
        return np.array([
            [
                e["hour"],
                e["day_of_week"],
                e["latitude"],
                e["longitude"],
                e["request_rate"],
                e["session_duration"]
            ]
            for e in events
        ])

    def is_anomaly(self, event: dict) -> bool:
        features = self.extract_features([event])
        prediction = self.model.predict(features)
        return prediction[0] == -1  # -1 = anomaly
Why It's Better: Can detect complex, multi-dimensional anomalies that simple statistics miss.
Extension 2: User-Agent and Browser Fingerprinting
Add device fingerprinting as an additional signal.
Implementation:
import hashlib

class DeviceFingerprinter:
    def extract_fingerprint(self, headers: dict) -> str:
        """Create a hash of device characteristics."""
        components = [
            headers.get("User-Agent", ""),
            headers.get("Accept-Language", ""),
            headers.get("Accept-Encoding", ""),
            headers.get("Sec-CH-UA", ""),  # Client hints
            headers.get("Sec-CH-UA-Platform", "")
        ]
        return hashlib.sha256("|".join(components).encode()).hexdigest()[:16]

    def is_new_device(self, user_id: str, fingerprint: str) -> bool:
        known_devices = self.get_known_devices(user_id)
        return fingerprint not in known_devices
Use Case: Even if IP changes (VPN), same device fingerprint = higher trust.
Extension 3: Keystroke Dynamics
Analyze typing patterns for continuous authentication.
Concept:
+------------------------------------------------------------------+
| KEYSTROKE DYNAMICS |
+------------------------------------------------------------------+
Measurable characteristics:
- Key hold time (dwell time): How long each key is pressed
- Inter-key interval: Time between releasing one key and pressing next
- Typing speed: Characters per minute
- Error patterns: Backspace frequency, correction patterns
Example baseline for "alice":
- Average dwell time: 85ms
- Average inter-key interval: 120ms
- Typing speed: 65 WPM
- Error rate: 3%
Anomaly detection:
- Current session dwell time: 45ms (much faster)
- Current speed: 120 WPM
- Conclusion: Possibly a different person typing, or automated tool
+------------------------------------------------------------------+
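The dwell-time comparison sketched in the box can be reduced to a few lines. The following is a minimal illustration, not the project's implementation: the 85 ms baseline and the 30% deviation threshold are assumed example values.

```python
import statistics

def keystroke_anomaly(dwell_times_ms, baseline_mean_ms=85.0,
                      max_relative_deviation=0.30):
    """Flag a session whose mean key hold time deviates more than
    max_relative_deviation from the user's stored baseline."""
    session_mean = statistics.mean(dwell_times_ms)
    deviation = abs(session_mean - baseline_mean_ms) / baseline_mean_ms
    return deviation > max_relative_deviation

# A ~45 ms session against an 85 ms baseline deviates ~47% -> flagged
```

In practice you would compare the full distributions (or z-scores per key pair), not just the means, but the single-statistic version already catches the "much faster typist" case above.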
Extension 4: Mouse Movement Analysis
Track mouse movement patterns as behavioral biometric.
Signals to analyze:
- Mouse speed distribution
- Cursor path characteristics (straight lines vs curves)
- Click patterns (single vs double click timing)
- Scroll behavior
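One of the listed signals, cursor path characteristics, can be quantified as the ratio of straight-line distance to total distance travelled: scripted cursor movement tends to score near 1.0, while human movement is curvier. A minimal sketch (the function name is illustrative):

```python
import math

def path_efficiency(points):
    """Straight-line distance / travelled distance for a cursor path.
    1.0 means a perfectly straight line (suspiciously bot-like)."""
    if len(points) < 2:
        return 1.0
    travelled = sum(math.dist(points[i], points[i + 1])
                    for i in range(len(points) - 1))
    if travelled == 0:
        return 1.0
    return math.dist(points[0], points[-1]) / travelled
```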
Extension 5: Integration with SIEM
Send alerts to enterprise SIEM for correlation.
import requests

class SIEMIntegration:
    def __init__(self, siem_endpoint: str, api_key: str):
        self.endpoint = siem_endpoint
        self.api_key = api_key

    def send_alert(self, alert: Alert):
        # Format as Common Event Format (CEF)
        cef_message = self.format_cef(alert)
        requests.post(
            f"{self.endpoint}/api/events",
            headers={"Authorization": f"Bearer {self.api_key}"},
            json={"event": cef_message}
        )

    def format_cef(self, alert: Alert) -> str:
        return (
            f"CEF:0|ZeroTrust|AuthMonitor|1.0|"
            f"{alert.type}|{alert.severity}|"
            f"src={alert.source_ip} "
            f"duser={alert.user_id} "
            f"msg={alert.message}"
        )
Real-World Connections
Microsoft Defender for Identity
Microsoft's UEBA solution for detecting identity-based attacks.
How it compares:
- Uses similar signals (location, time, behavior)
- Integrates with Active Directory
- Your project: More customizable, works with any identity provider
Reference: https://docs.microsoft.com/en-us/defender-for-identity/
AWS GuardDuty
Amazon's threat detection service using ML.
Relevant features:
- Unusual API calls detection
- Impossible travel detection for AWS Console logins
- Automated remediation via EventBridge
Your project: Applies same concepts to your custom applications.
Okta ThreatInsight
Okta's built-in threat detection for identity.
Capabilities:
- Credential stuffing detection
- Brute force protection
- Suspicious location alerts
Your project: Building similar capabilities from scratch teaches you how these work.
Google BeyondCorp
Google's Zero Trust implementation.
Relevant concepts:
- Device trust signals
- Context-aware access
- Continuous verification
Reference: https://cloud.google.com/beyondcorp
Interview Questions
Question 1: What is User and Entity Behavior Analytics (UEBA)?
Strong Answer: UEBA is a security approach that establishes behavioral baselines for users and entities, then detects deviations that may indicate compromise. Unlike signature-based detection that looks for known attack patterns, UEBA asks "is this behavior normal for this user?"
Key components:
- Baseline building (what's normal)
- Anomaly detection (what's different)
- Risk scoring (how serious is the deviation)
- Response automation (what to do about it)
Follow-up: "What's the difference between UEBA and traditional IDS?"
Traditional IDS uses signatures and rules. UEBA learns per-entity baselines and detects novel attacks without prior signatures.
Question 2: Explain the Impossible Travel detection algorithm.
Strong Answer:
- Extract location from IP address using geolocation database
- Calculate great-circle distance between current and last location using Haversine formula
- Calculate elapsed time between events
- Compute required travel speed: distance / time
- If speed exceeds threshold (e.g., 1500 km/h), flag as impossible
Edge cases to handle:
- VPN usage (IP doesn't reflect actual location)
- Mobile network IP changes
- Clock synchronization between servers
- Same-city IP changes
Formula: Speed = Distance / Time, where Distance = 2R * arcsin(sqrt(a)) and a = sin^2(Δφ/2) + cos(φ1) * cos(φ2) * sin^2(Δλ/2), with φ = latitude, λ = longitude, R ≈ 6371 km.
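The five steps above can be sketched end to end as follows. This is a hypothetical standalone version for illustration, not the project's detector class:

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two coordinates, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def required_speed_kmh(lat1, lon1, t1, lat2, lon2, t2):
    """Speed needed to cover the distance in the elapsed time."""
    hours = (t2 - t1).total_seconds() / 3600
    if hours <= 0:
        return float("inf")  # zero or negative gap: treat as impossible
    return haversine_km(lat1, lon1, lat2, lon2) / hours

# NYC login at 10:05, London login at 10:20: ~5570 km in 15 minutes,
# far above any 1500 km/h threshold -> impossible travel
speed = required_speed_kmh(
    40.7128, -74.0060, datetime(2024, 12, 27, 10, 5),
    51.5074, -0.1278, datetime(2024, 12, 27, 10, 20),
)
```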
Question 3: How do you balance security with user experience in continuous authentication?
Strong Answer: Three key strategies:
- Risk-based responses:
- Low risk: No friction
- Medium risk: Step-up MFA for sensitive operations only
- High risk: Force re-authentication
- Critical (impossible travel): Immediate revocation
- False positive reduction:
- Combine multiple signals before acting
- Allow users to pre-register travel or VPN usage
- Implement suppression rules for known patterns
- Tune thresholds based on measured false positive rates
- Graceful degradation:
- Never completely lock out based on single signal
- Provide clear remediation path (MFA step-up)
- Track and learn from user feedback on false positives
Question 4: How would you handle token revocation at scale?
Strong Answer: Push-based approach using pub/sub:
- Monitor publishes revocation to Redis Pub/Sub
- All PEPs subscribe and receive within milliseconds
- PEPs maintain local cache of revoked sessions
- No centralized bottleneck
Defense in depth:
- Short-lived access tokens (5-15 minutes)
- Even without push, token expires quickly
- Blacklist only needs to store refresh token revocations
Challenges at scale:
- Network partitions could miss revocations
- Solution: Combine push with short TTL as fallback
- Consider consistent hashing for revocation distribution
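The "local cache of revoked sessions plus short token TTL" combination can be sketched like this (class and method names are illustrative; the 900-second TTL matches the 15-minute access tokens mentioned above):

```python
import time

class RevocationCache:
    """PEP-side cache of revoked session IDs, populated from pub/sub.
    Entries only need to outlive the access-token TTL: once the token
    has expired on its own, the normal expiry check rejects it anyway."""

    def __init__(self, token_ttl_seconds=900):
        self.ttl = token_ttl_seconds
        self._revoked = {}  # session_id -> revocation timestamp

    def revoke(self, session_id, now=None):
        self._revoked[session_id] = time.time() if now is None else now

    def is_revoked(self, session_id, now=None):
        now = time.time() if now is None else now
        revoked_at = self._revoked.get(session_id)
        if revoked_at is None:
            return False
        if now - revoked_at > self.ttl:
            # Safe to prune: the token itself expired before this point
            del self._revoked[session_id]
            return False
        return True
```

The pruning step is what keeps the cache bounded: it never holds more entries than sessions revoked within one token lifetime.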
Question 5: What is a trust score and how do you calculate it?
Strong Answer: A trust score is a composite value (0-100) representing confidence that a session is legitimate.
Components:
- Location score: Based on familiarity of current location
- Temporal score: Based on typicality of current time
- Device score: Based on known device fingerprint
- Behavioral score: Based on request pattern normality
Calculation methods:
- Weighted average: sum(score * weight)
- Minimum signal: min(all_scores) - most secure
- Multiplicative: product(scores/100) * 100 - most sensitive
Decay: Trust decays over time without activity, forcing re-verification.
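A sketch of the three combination methods plus decay; the signal weights and the 30-minute half-life are illustrative assumptions, not values from the project:

```python
def weighted_average(scores, weights):
    """Balanced: each signal contributes in proportion to its weight."""
    return sum(scores[k] * weights[k] for k in scores) / sum(weights.values())

def minimum_signal(scores):
    """Most secure: the weakest signal caps the whole score."""
    return min(scores.values())

def multiplicative(scores):
    """Most sensitive: any low signal drags the product down sharply."""
    product = 1.0
    for s in scores.values():
        product *= s / 100
    return product * 100

def decayed(score, idle_seconds, half_life_seconds=1800):
    """Halve trust for every half-life of inactivity (illustrative rate)."""
    return score * 0.5 ** (idle_seconds / half_life_seconds)

scores = {"location": 90, "temporal": 80, "device": 100, "behavior": 70}
weights = {"location": 0.3, "temporal": 0.2, "device": 0.3, "behavior": 0.2}
```

On this example the three methods give roughly 87, 70, and 50: the same signals, ordered from most lenient to most sensitive, which is exactly the tuning knob the answer describes.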
Question 6: How do you handle the cold start problem for new users?
Strong Answer: New users have no baseline to compare against, creating challenges:
Strategies:
- Onboarding period: Apply conservative trust score (e.g., 70) during first N events
- Organization-wide baseline: Compare to typical behavior for similar roles
- Explicit profiling: Ask users to register expected locations/devices
- Higher MFA frequency: Require more verification until baseline established
- Supervised learning: Use labeled data from existing users to train models
Minimum baseline threshold: Typically 10-20 events before anomaly detection activates.
Question 7: Describe the z-score and how it applies to anomaly detection.
Strong Answer: Z-score measures how many standard deviations a value is from the mean:
z = (value - mean) / standard_deviation
Interpretation:
- |z| < 2: Normal (within ~95% of data)
- |z| > 2: Unusual (outside ~95%)
- |z| > 3: Very unusual (outside ~99.7%)
Application in UEBA:
- Calculate mean and stddev of user's login hours
- For new login, compute z-score
- z > 3 for 3 AM login = anomaly alert
Limitations:
- Assumes normal distribution
- Sensitive to outliers in baseline
- Use robust alternatives (MAD) for non-normal data
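Both the plain z-score and the MAD-based robust alternative are short to implement; the 0.6745 constant rescales MAD so the result is comparable to a standard-deviation-based z-score:

```python
import statistics

def z_score(value, history):
    """Standard score: deviations from the mean in stdev units."""
    return (value - statistics.mean(history)) / statistics.stdev(history)

def robust_z_score(value, history):
    """Modified z-score using the median absolute deviation (MAD);
    far less sensitive to outliers contaminating the baseline."""
    med = statistics.median(history)
    mad = statistics.median([abs(x - med) for x in history])
    if mad == 0:
        return 0.0
    return 0.6745 * (value - med) / mad

# Typical login hours for a user who works 9-to-5;
# a 3 AM login scores |z| >> 3 under both measures -> anomaly alert
login_hours = [9, 9, 10, 10, 11, 9, 10, 10, 9, 11]
```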
Question 8: How does stream processing differ from batch processing for security monitoring?
Strong Answer: Batch processing:
- Process logs nightly
- Detection latency: hours to days
- Simpler architecture
- Good for forensics, not prevention
Stream processing:
- Process events as they arrive
- Detection latency: seconds
- More complex (state management, exactly-once)
- Essential for active defense
For security:
- Batch: Historical analysis, compliance reporting
- Stream: Real-time threat detection, active response
- Best practice: Use both (Lambda architecture)
Question 9: What signals beyond location would you use for continuous authentication?
Strong Answer:
- Device fingerprint: Browser/OS characteristics
- Temporal patterns: Typical login hours, day of week
- Request behavior: Frequency, resources accessed, data volume
- Session characteristics: Duration, activity patterns
- Biometrics (advanced): Keystroke dynamics, mouse movements
- Network context: VPN, corporate network, public WiFi
- Device health: From endpoint agent (antivirus, patch level)
Composite approach: No single signal is definitive; combine for confidence.
Question 10: How would you measure the effectiveness of a UEBA system?
Strong Answer: Metrics:
- Detection rate: % of real attacks caught (from red team exercises)
- False positive rate: % of legitimate users flagged
- Detection latency: Time from attack start to alert
- Mean time to respond: Alert to revocation
- User friction: MFA step-ups per user per day
Testing methods:
- Red team exercises (inject known attacks)
- Replay historical attacks with known labels
- A/B testing detection thresholds
- User feedback on false positives
Target: <1% false positive rate while maintaining >95% detection rate.
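Given labeled outcomes from a red-team replay, the two headline metrics reduce to a few lines. The `(was_attack, was_flagged)` pair format is an assumption for illustration:

```python
def evaluate(results):
    """results: list of (was_attack, was_flagged) pairs from a labeled run.
    Returns detection rate (true positive rate) and false positive rate."""
    attack_flags = [flagged for attack, flagged in results if attack]
    benign_flags = [flagged for attack, flagged in results if not attack]
    return {
        "detection_rate":
            sum(attack_flags) / len(attack_flags) if attack_flags else 0.0,
        "false_positive_rate":
            sum(benign_flags) / len(benign_flags) if benign_flags else 0.0,
    }

# 2 attacks (1 caught), 2 benign sessions (1 wrongly flagged)
metrics = evaluate([(True, True), (True, False), (False, False), (False, True)])
```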
Resources and Self-Assessment
Books
| Book | Author | Relevant Chapters |
|---|---|---|
| Security in Computing | Charles Pfleeger | Ch. 7: Network Security, Ch. 8: Intrusion Detection |
| Designing Data-Intensive Applications | Martin Kleppmann | Ch. 11: Stream Processing |
| Foundations of Information Security | Jason Andress | Ch. 8: Intrusion Detection |
| Zero Trust Networks | Gilman & Barth | Ch. 6: Runtime Security |
| The Art of Software Security Assessment | Dowd, McDonald, Schuh | Ch. 11: Network Protocols |
Tools and Libraries
| Tool | Purpose |
|---|---|
| MaxMind GeoLite2 | IP geolocation database |
| Redis Streams | Event streaming |
| geoip2 (Python) | GeoIP database client |
| aioredis | Async Redis client for Python |
| FastAPI | Dashboard API |
| pytest-asyncio | Async testing |
RFCs and Standards
| Document | Topic |
|---|---|
| RFC 8693 | OAuth 2.0 Token Exchange (for step-up auth) |
| NIST SP 800-207 | Zero Trust Architecture |
| MITRE ATT&CK | Credential access techniques |
Self-Assessment Checklist
Before considering this project complete, verify you can:
Conceptual Understanding:
- Explain UEBA and how it differs from signature-based detection
- Describe the impossible travel algorithm with mathematical formulas
- Explain trust scores and how multiple signals combine
- Articulate the tradeoff between security and usability
- Describe token revocation strategies and their tradeoffs
Implementation Skills:
- Calculate great-circle distance using Haversine formula
- Build behavioral baselines using exponential moving averages
- Calculate z-scores for anomaly detection
- Process events from a stream (Redis Streams or similar)
- Publish messages via pub/sub for distributed revocation
Integration Capability:
- Integrate with GeoIP database for location lookup
- Connect to Policy Enforcement Points for revocation
- Expose REST and WebSocket APIs for dashboard
- Store audit trails in persistent database
Operational Readiness:
- Handle VPN and proxy edge cases
- Manage the cold start problem for new users
- Tune thresholds to minimize false positives
- Measure and optimize detection latency
Security Thinking:
- Consider how an attacker might evade detection
- Design for fail-closed behavior
- Handle clock synchronization issues
- Protect the monitor itself from compromise
The Core Question You've Answered
"How do I know that the person using this session is still the same person who authenticated?"
This is THE fundamental question of continuous authentication. By building this project, you've learned that authentication is not an event - it's a continuous process. Every action provides evidence about identity, and your monitor is constantly evaluating that evidence.
You've built a system that watches behavior, learns what's normal, and raises the alarm when something feels wrong - not because it matches a known attack pattern, but because it deviates from expected behavior. This is the essence of Zero Trust.
The Trust Score is the new access decision. It's not "does this user have permission?" but "how confident are we that this is actually that user, right now, in this context?"
Your Continuous Authentication Monitor answers that question for every request, enabling truly adaptive, risk-based access control.
Project Guide Version 1.0 - December 2024