Project 5: Device Trust & Health Attestation

The Core Question: “How do I know the device connecting to my system is secure, and not a compromised laptop pretending to be trusted?”


Table of Contents

  1. Learning Objectives
  2. Deep Theoretical Foundation
  3. Complete Project Specification
  4. Real World Outcome
  5. The Core Question You’re Answering
  6. Concepts You Must Understand First
  7. Questions to Guide Your Design
  8. Thinking Exercise
  9. Hints in Layers
  10. Solution Architecture
  11. Phased Implementation Guide
  12. Testing Strategy
  13. Common Pitfalls & Debugging
  14. Extensions & Challenges
  15. Books That Will Help
  16. Interview Questions
  17. Self-Assessment Checklist

Project Overview

Attribute               Value
Difficulty              Level 2: Intermediate
Time Estimate           Weekend (8-16 hours)
Primary Language        Go or Python
Alternative Languages   Swift (macOS), PowerShell (Windows)
Knowledge Area          Endpoint Security / Operating Systems
Software/Tools          OS APIs (Disk encryption, Firewall, Patch level)
Main Book               “Zero Trust Security” by Andravous

Learning Objectives

By completing this project, you will master:

  1. Device Trust Fundamentals: Understand why Zero Trust requires device verification, not just user authentication. Learn that a valid user on a compromised device is still a threat.

  2. Endpoint Posture Assessment: Query operating system state to determine security configuration: disk encryption status, firewall state, patch level, and antivirus presence.

  3. Cross-Platform System Programming: Use OS-specific APIs and commands (Linux, macOS, Windows) to gather security telemetry. Understand why abstraction layers are necessary.

  4. Cryptographic Report Signing: Sign health reports so that malicious software cannot forge a “healthy” status. Understand the role of device keys in attestation.

  5. Continuous vs Point-in-Time Verification: Implement real-time monitoring that detects security posture changes and can trigger access revocation immediately.

  6. Trust Score Calculation: Move beyond binary “trusted/untrusted” to nuanced scoring that allows risk-based access decisions.

  7. Integration with Policy Decision Points: Design your agent’s output to be consumable by the Policy Decision Point (PDP) you built in Project 2.


Deep Theoretical Foundation

Device Trust in Zero Trust Architecture

In traditional perimeter security, if you were on the corporate network, your device was implicitly trusted. Zero Trust Architecture (ZTA) fundamentally rejects this assumption.

Traditional Model:                    Zero Trust Model:
==================                    ==================

   Internet                              Internet
      |                                     |
   [Firewall]                          [Identity-Aware Proxy]
      |                                     |
   "Trust Zone"                        Per-request verification:
      |                                     |
   All devices                         1. Is this user authenticated?
   trusted by                          2. Is this DEVICE healthy?
   network                             3. Is this request context normal?
   location                            4. Is this resource appropriate?
                                            |
                                       Access granted (or denied)

The Key Insight: In ZTA, you don’t trust a USER alone. You trust the combination of:

  • A verified identity (from Project 1)
  • A healthy device (this project)
  • An appropriate context (time, location, behavior)
  • A specific resource request

This is called “compound identity” or “context-aware access.”

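NIST SP 800-207 reflects this in its tenets (Section 2.1): access to resources is determined by dynamic policy, which includes the observable state of the client identity, the application or service, and the requesting asset. In other words, the trust decision considers both who is asking and the state of the device they are asking from.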

Why Device Trust Matters

Consider this attack scenario:

Attacker Strategy:
==================

1. Compromise employee's personal laptop (via phishing malware)
2. Wait for employee to VPN into corporate network
3. Use employee's valid credentials to access sensitive data
4. Exfiltrate data through the VPN tunnel

Traditional security sees:
- Valid username: alice@company.com [CHECK]
- Valid password: ******** [CHECK]
- Valid MFA token: 123456 [CHECK]
- VPN connection: Established [CHECK]

Result: ACCESS GRANTED (but laptop is compromised!)

---

With Device Trust:
==================

1. Attacker compromises laptop
2. Employee attempts VPN connection
3. Device Health Agent reports:
   - Disk encryption: ENABLED [CHECK]
   - Firewall: DISABLED [FAIL] - attacker disabled it
   - OS patch level: 47 days old [WARN]
   - Unknown process: rat.exe [CRITICAL]
   - Antivirus: Definitions 30 days old [WARN]

Trust Score: 23/100 (Below threshold of 70)

Result: ACCESS DENIED, Security team notified

Endpoint Posture Checks: What to Measure

A comprehensive device health assessment examines multiple dimensions:

+------------------------------------------------------------------+
|                    DEVICE HEALTH DIMENSIONS                       |
+------------------------------------------------------------------+
|                                                                   |
|  1. DISK ENCRYPTION                                               |
|     - Full disk encryption enabled?                               |
|     - Recovery key escrowed?                                      |
|     - Encryption algorithm strength?                              |
|                                                                   |
|  2. FIREWALL STATUS                                               |
|     - Host-based firewall enabled?                                |
|     - Inbound connections blocked?                                |
|     - Outbound filtering configured?                              |
|                                                                   |
|  3. OPERATING SYSTEM                                              |
|     - Current patch level?                                        |
|     - Days since last update?                                     |
|     - Known CVEs affecting this version?                          |
|                                                                   |
|  4. ANTIVIRUS/EDR                                                 |
|     - Antivirus present?                                          |
|     - Real-time scanning enabled?                                 |
|     - Definition age?                                             |
|                                                                   |
|  5. SECURE BOOT                                                   |
|     - Secure Boot enabled?                                        |
|     - Boot chain verified?                                        |
|     - UEFI firmware version?                                      |
|                                                                   |
|  6. RUNNING PROCESSES                                             |
|     - Any unauthorized processes?                                 |
|     - Suspicious network connections?                             |
|     - Elevated privilege usage?                                   |
|                                                                   |
|  7. DEVICE IDENTITY                                               |
|     - Hardware serial number match?                               |
|     - TPM attestation valid?                                      |
|     - Device registered in inventory?                             |
|                                                                   |
+------------------------------------------------------------------+

TPM and Hardware Roots of Trust

A Trusted Platform Module (TPM) is a hardware security chip that provides cryptographic operations and secure key storage. It’s fundamental to hardware-based device attestation.

TPM Architecture:
=================

+------------------------------------------------------------------+
|                     TRUSTED PLATFORM MODULE                       |
+------------------------------------------------------------------+
|                                                                   |
|  +------------------+     +------------------+                    |
|  |   Endorsement    |     |   Storage Root   |                    |
|  |   Key (EK)       |     |   Key (SRK)      |                    |
|  |   (Unique to     |     |   (User-         |                    |
|  |    this TPM)     |     |    controlled)   |                    |
|  +------------------+     +------------------+                    |
|                                                                   |
|  +------------------+     +------------------+                    |
|  |   Platform       |     |   Attestation    |                    |
|  |   Configuration  |     |   Identity Key   |                    |
|  |   Registers      |     |   (AIK)          |                    |
|  |   (PCRs)         |     |                  |                    |
|  +------------------+     +------------------+                    |
|                                                                   |
|  PCR Values:                                                      |
|  PCR[0] = Hash(BIOS/UEFI firmware)                               |
|  PCR[1] = Hash(BIOS configuration)                               |
|  PCR[2] = Hash(Option ROMs)                                       |
|  PCR[3] = Hash(Option ROM configuration)                          |
|  PCR[4] = Hash(MBR/bootloader)                                    |
|  PCR[5] = Hash(MBR/bootloader configuration)                      |
|  PCR[7] = Hash(Secure Boot policy)                                |
|  ...                                                              |
|                                                                   |
+------------------------------------------------------------------+
|                                                                   |
|  Operations:                                                      |
|  - TPM_Extend(PCR, data) - Add measurement to PCR                |
|  - TPM_Quote(PCRs, AIK) - Sign current PCR values                |
|  - TPM_Seal(data, PCRs) - Encrypt data, only decrypt if PCRs     |
|                           match specified values                  |
|                                                                   |
+------------------------------------------------------------------+
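
The extend operation listed above is a hash chain, which is what lets a verifier replay a boot log and recompute the expected PCR value. A minimal Python sketch of that idea, assuming a SHA-256 PCR bank that starts at all zeros and a boot log given as an ordered list of raw event blobs. The `extend` and `replay_log` names are illustrative and are not part of any TPM library:

```python
import hashlib

PCR_SIZE = 32  # bytes in a SHA-256 PCR


def extend(pcr: bytes, measurement: bytes) -> bytes:
    """Return the new PCR value: SHA-256(old PCR || SHA-256(measurement))."""
    digest = hashlib.sha256(measurement).digest()
    return hashlib.sha256(pcr + digest).digest()


def replay_log(events: list[bytes]) -> bytes:
    """Recompute the PCR value a verifier should expect for an ordered boot log."""
    pcr = bytes(PCR_SIZE)  # assumption: the PCR resets to all zeros at power-on
    for event in events:
        pcr = extend(pcr, event)
    return pcr


# Hypothetical boot logs: same stages, one tampered stage, and a reordering.
clean = replay_log([b"firmware-v1", b"bootloader-v1"])
tampered = replay_log([b"firmware-v1", b"bootloader-EVIL"])
reordered = replay_log([b"bootloader-v1", b"firmware-v1"])
```

Because each value depends on everything extended before it, changing or reordering any stage yields a different final PCR, and software cannot set a PCR back to a chosen value without a reset.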

How TPM Attestation Works:

Remote Attestation Flow:
========================

Device                              Verifier (PDP)
======                              ==============

1. PDP requests attestation with nonce
                                    <--- Send nonce (random challenge)

2. Device gathers measurements
   - Boot log
   - PCR values
   - Current state

3. TPM signs PCR values + nonce with AIK
   [PCR0..PCR7 | nonce] ---> TPM ---> [Signature]

4. Send signed quote + boot log
   ---> [Quote + Signature + Boot Log]

5.                                  Verify AIK signature
                                    Replay boot log to compute expected PCRs
                                    Compare computed vs received PCRs

6.                                  <--- Trust decision

Why the nonce? It prevents replay attacks: an attacker can't
send yesterday's "healthy" quote today.

For This Project: While full TPM attestation is complex (and platform-specific), we’ll implement a simplified version using asymmetric cryptography. The agent will have a private key, and the PDP will verify signatures using the corresponding public key.
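
The nonce-based challenge-response flow can be sketched with the standard library alone. Note the loud assumption: Python's standard library has no asymmetric signatures, so this sketch uses HMAC-SHA256 with a shared secret as a stand-in for the device key pair. A real agent would sign with a private key and the PDP would verify with the public key. The `Verifier` class and `sign_report` function are hypothetical names:

```python
import hashlib
import hmac
import secrets

# Stand-in for the device key pair (assumption): a shared HMAC secret.
DEVICE_KEY = secrets.token_bytes(32)


def sign_report(key: bytes, report: bytes, nonce: str) -> bytes:
    """Device side: bind the report bytes to the verifier's nonce."""
    return hmac.new(key, report + b"|" + nonce.encode(), hashlib.sha256).digest()


class Verifier:
    """PDP side: issues single-use nonces and checks signed reports."""

    def __init__(self, key: bytes) -> None:
        self._key = key
        self._pending: set[str] = set()

    def challenge(self) -> str:
        nonce = secrets.token_hex(16)
        self._pending.add(nonce)
        return nonce

    def verify(self, report: bytes, nonce: str, tag: bytes) -> bool:
        if nonce not in self._pending:
            return False  # unknown or already-used nonce: possible replay
        self._pending.discard(nonce)  # each nonce is accepted at most once
        expected = sign_report(self._key, report, nonce)
        return hmac.compare_digest(expected, tag)


pdp = Verifier(DEVICE_KEY)
nonce = pdp.challenge()
report = b'{"firewall": "enabled"}'
tag = sign_report(DEVICE_KEY, report, nonce)

first_attempt = pdp.verify(report, nonce, tag)   # fresh report is accepted
replay_attempt = pdp.verify(report, nonce, tag)  # identical bytes are rejected
```

The second call fails even though the signature is valid, because the nonce was consumed by the first call.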

Attestation Concepts

Attestation is the process of providing evidence about a system’s state to a remote verifier. There are several models:

Attestation Models:
===================

1. SOFTWARE-BASED ATTESTATION
   - Agent collects state using OS APIs
   - Signs report with software key
   - WEAKNESS: Compromised OS can lie
   - GOOD FOR: Deployment simplicity, most threats

2. HARDWARE-BASED ATTESTATION (TPM)
   - TPM measures boot process
   - Measurements stored in PCRs
   - TPM signs quote
   - STRENGTH: Kernel can't forge boot measurements
   - WEAKNESS: Expensive, complex, not universal

3. HYBRID ATTESTATION
   - TPM attests boot integrity
   - Software agent attests runtime state
   - Best of both worlds
   - MOST REALISTIC for enterprises

For this project, we implement SOFTWARE-BASED ATTESTATION
with strong signing, preparing for TPM integration later.

Key Attestation Properties:

Property          Description                         Our Implementation
Freshness         Report is recent, not replayed      Include timestamp + nonce
Authenticity      Report comes from claimed device    Sign with device private key
Integrity         Report hasn’t been modified         Signature covers all fields
Non-repudiation   Device can’t deny sending report    Asymmetric signature

Trust Scores vs Binary Decisions

Traditional access control is binary: ALLOW or DENY. Zero Trust often uses continuous trust scores that enable nuanced decisions:

Binary Decision:                    Trust Score Decision:
================                    ====================

Firewall disabled?                  Base score: 50 points
--> DENY ACCESS                     Firewall disabled: -20 points
                                    OS patch level: -10 points
                                    Antivirus current: +10 points
                                    Known device: +30 points
                                    ---------------------------
                                    Total: 60/100

                                    Policy rules:
                                    - Score < 50: DENY all access
                                    - Score 50-69: Read-only access
                                    - Score 70-90: Normal access
                                    - Score > 90: Full access

Result: One issue locks             Result: Degraded access,
the user out entirely               user can still work
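
The policy rules in the diagram amount to a lookup from score to access level. A minimal Python sketch, assuming a score of exactly 70 counts as normal access (in line with the threshold of 70 used in the earlier attack scenario). The `access_level` function and its return labels are illustrative, not a fixed API:

```python
def access_level(score: int) -> str:
    """Map a 0-100 trust score to an access tier using the example policy rules."""
    if score < 50:
        return "deny"
    if score < 70:
        return "read-only"
    if score <= 90:
        return "normal"
    return "full"


# A few sample scores around the tier boundaries.
levels = {score: access_level(score) for score in (23, 50, 69, 70, 90, 91)}
```

Keeping the thresholds in one function makes it easy for the PDP to change policy without touching the agent.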

Trust Score Components:

# Example scoring model
TRUST_COMPONENTS = {
    "disk_encryption": {
        "enabled": +20,
        "disabled": -30,
        "unknown": -10
    },
    "firewall": {
        "enabled": +15,
        "disabled": -25,
        "unknown": -5
    },
    "os_patch_level": {
        "current": +15,             # Within 7 days
        "slightly_outdated": +5,    # 8-30 days
        "outdated": -10,            # 31-90 days
        "critically_outdated": -30  # More than 90 days
    },
    "antivirus": {
        "present_current": +15,
        "present_outdated": +5,
        "not_present": -20
    },
    "known_device": {
        "registered": +25,
        "unregistered": -15
    },
    "secure_boot": {
        "enabled": +10,
        "disabled": -5,
        "unsupported": 0
    }
}

# Base score: 50 (neutral)
# Final score = base + sum of component points, clamped to the range 0-100
# (unclamped, the sum could range from -75 to 150)
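
One way to turn the scoring model into a number is to add the points for each observed state to the base score and clamp the result. The sketch below assumes the agent reports exactly one state per component. It restates the same point values compactly so it runs on its own, and `trust_score` is a hypothetical helper name:

```python
BASE_SCORE = 50

# Same point values as the example scoring model, written compactly.
TRUST_COMPONENTS = {
    "disk_encryption": {"enabled": 20, "disabled": -30, "unknown": -10},
    "firewall": {"enabled": 15, "disabled": -25, "unknown": -5},
    "os_patch_level": {
        "current": 15,
        "slightly_outdated": 5,
        "outdated": -10,
        "critically_outdated": -30,
    },
    "antivirus": {"present_current": 15, "present_outdated": 5, "not_present": -20},
    "known_device": {"registered": 25, "unregistered": -15},
    "secure_boot": {"enabled": 10, "disabled": -5, "unsupported": 0},
}


def trust_score(observed: dict[str, str]) -> int:
    """Add each observed state's points to the base score, then clamp to 0-100."""
    raw = BASE_SCORE + sum(
        TRUST_COMPONENTS[component][state] for component, state in observed.items()
    )
    return max(0, min(100, raw))


degraded = trust_score({
    "disk_encryption": "enabled",     # +20
    "firewall": "disabled",           # -25
    "os_patch_level": "outdated",     # -10
    "antivirus": "present_outdated",  # +5
    "known_device": "registered",     # +25
    "secure_boot": "unsupported",     # +0
})  # 50 + 15 = 65
```

An unrecognized component or state raises `KeyError` here. Whether to fail loudly or fall back to an "unknown" penalty is a design decision the sketch leaves open.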

Continuous vs Point-in-Time Verification

Point-in-Time: Check device health once (at login, at VPN connection).

Continuous: Constantly monitor device health and react to changes.

Point-in-Time Verification:
===========================

08:00 - User logs in
        Health check: PASS
        Access granted

10:00 - User disables firewall
        (No verification)

12:00 - Attacker exfiltrates data
        (Still has access)

16:00 - User logs out

Problem: 6-hour window of vulnerable access

---

Continuous Verification:
========================

08:00 - User logs in
        Health check: PASS
        Access granted
        Monitoring begins

10:00 - User disables firewall
        Agent detects change
        Sends updated health report
        PDP recalculates trust score
        Score drops below threshold
        Access revoked

10:01 - User attempts data access
        ACCESS DENIED

10:05 - User re-enables firewall
        Agent detects improvement
        Sends updated report
        Access restored

Advantage: Real-time response to security changes

Implementation Approaches:

Polling Model:                      Event-Driven Model:
==============                      ==================

while True:                         register_event_handlers(
    report = collect_posture()          on_firewall_change,
    send_to_pdp(report)                 on_encryption_change,
    sleep(60)  # Every minute           on_patch_installed,
                                        on_process_started
                                    )
Pros: Simple to implement           Pros: Immediate detection
Cons: Delay in detection,           Cons: OS-specific, complex
      wasted resources

For this project: Start with polling,
then add event-driven for bonus points
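
The polling model becomes more useful once each cycle is compared with the previous one. A minimal Python sketch, assuming posture is a flat dictionary of check name to boolean. The `collect` and `report` callables, and the injectable `sleep`, are hypothetical hooks added so the loop can be exercised without waiting:

```python
import time
from typing import Callable, Optional

Posture = dict[str, bool]


def diff_posture(previous: Posture, current: Posture) -> dict[str, tuple]:
    """Return {check: (old, new)} for every check whose value changed."""
    keys = previous.keys() | current.keys()
    return {
        key: (previous.get(key), current.get(key))
        for key in sorted(keys)
        if previous.get(key) != current.get(key)
    }


def poll(
    collect: Callable[[], Posture],
    report: Callable[[Posture, dict], None],
    interval: float,
    cycles: int,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Collect posture every `interval` seconds and report it with any changes."""
    previous: Optional[Posture] = None
    for _ in range(cycles):
        current = collect()
        changes = diff_posture(previous, current) if previous is not None else {}
        report(current, changes)
        previous = current
        sleep(interval)


# Simulated run: the firewall is switched off before the third poll.
snapshots = iter([
    {"firewall": True, "disk_encryption": True},
    {"firewall": True, "disk_encryption": True},
    {"firewall": False, "disk_encryption": True},
])
sent = []
poll(lambda: next(snapshots), lambda posture, changes: sent.append(changes),
     interval=60, cycles=3, sleep=lambda _seconds: None)
```

In the simulated run only the third report carries a change entry for `firewall`, which is the signal the PDP would use to recalculate the trust score.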

BYOD Considerations

Bring Your Own Device (BYOD) policies introduce unique device trust challenges:

Corporate Device:                   Personal Device (BYOD):
=================                   =======================

- Company controls configuration    - User controls configuration
- Can enforce encryption           - Can only REQUEST encryption
- Can install EDR                  - Cannot mandate EDR
- Full visibility                  - Privacy concerns
- Easier to attest                 - Harder to attest

BYOD Trust Strategies:
======================

1. CONTAINER APPROACH
   - Work apps in isolated container
   - Container enforces policies
   - Personal apps unrestricted

2. RISK-BASED ACCESS
   - Lower trust score for BYOD
   - Limited access to sensitive data
   - More frequent re-authentication

3. MANAGED APPS ONLY
   - Work through web browser
   - No local data storage
   - Minimal device requirements

Our Agent's BYOD Mode:
======================
- Collect only security-relevant data
- No personal app inspection
- Report device type: "personal" vs "corporate"
- Let PDP apply appropriate policy

Complete Project Specification

Functional Requirements

Core Features (Must Have):

Feature                    Description                                                   Priority
Disk encryption status     Query FileVault (macOS), BitLocker (Windows), LUKS (Linux)    P0
Firewall status            Query system firewall enabled/disabled state                  P0
OS patch level             Determine days since last OS update                           P0
Health report generation   Create structured JSON report                                 P0
Report signing             Sign reports with device private key                          P0
HTTP endpoint              Expose health status via local API                            P1
Continuous monitoring      Detect posture changes in real-time                           P1
PDP integration            Send reports to Policy Decision Point                         P2

Health Report Schema:

{
    "report_version": "1.0",
    "device_id": "device-unique-identifier",
    "timestamp": "2025-12-27T10:30:00Z",
    "nonce": "random-string-from-pdp",
    "posture": {
        "disk_encryption": {
            "enabled": true,
            "algorithm": "AES-256-XTS",
            "recovery_key_escrowed": true
        },
        "firewall": {
            "enabled": true,
            "profile": "public"
        },
        "os": {
            "name": "macOS",
            "version": "14.2.1",
            "last_update": "2025-12-20T00:00:00Z",
            "days_since_update": 7,
            "known_cves": []
        },
        "antivirus": {
            "present": true,
            "product": "CrowdStrike Falcon",
            "definitions_age_days": 1
        },
        "secure_boot": {
            "enabled": true,
            "mode": "full"
        }
    },
    "trust_score": 87,
    "issues": [
        {"severity": "warning", "message": "OS update available"}
    ],
    "signature": "base64-encoded-signature"
}
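
Some schema fields are derived from others. For example, `days_since_update` follows from `last_update` and the report `timestamp`. A small Python sketch of that calculation, assuming both values use the UTC `...Z` format shown in the schema (the helper names are illustrative):

```python
from datetime import datetime, timezone


def parse_utc(stamp: str) -> datetime:
    """Parse a timestamp like 2025-12-27T10:30:00Z into an aware UTC datetime."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def days_since_update(last_update: str, report_time: str) -> int:
    """Whole days elapsed between the last OS update and the report timestamp."""
    return (parse_utc(report_time) - parse_utc(last_update)).days


# Values taken from the example report above.
days = days_since_update("2025-12-20T00:00:00Z", "2025-12-27T10:30:00Z")  # 7
```

Computing the value from the two timestamps, rather than trusting a separately collected number, keeps the report internally consistent.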

Non-Functional Requirements

Requirement        Target                       Rationale
Collection time    < 2 seconds                  Agent shouldn’t slow down login
Memory usage       < 50 MB                      Background process
CPU usage          < 1% average                 Shouldn’t drain battery
Report size        < 10 KB                      Network efficiency
Update frequency   Configurable (default 60s)   Balance freshness vs overhead

Real World Outcome

When your agent runs, here’s what you’ll see:

Starting the Agent

$ sudo ./device-health-agent --config /etc/device-health/config.yaml

[2025-12-27T10:30:00Z] Device Health Agent v1.0.0
[2025-12-27T10:30:00Z] Loading configuration from /etc/device-health/config.yaml
[2025-12-27T10:30:00Z] Device ID: device-abc123
[2025-12-27T10:30:00Z] PDP endpoint: https://pdp.company.com/device-health
[2025-12-27T10:30:00Z] Monitoring interval: 60 seconds
[2025-12-27T10:30:00Z] Starting initial posture collection...

+------------------------------------------------------------------+
|                    DEVICE HEALTH REPORT                           |
+------------------------------------------------------------------+
|                                                                   |
|  Device ID:        device-abc123                                  |
|  Timestamp:        2025-12-27T10:30:01Z                          |
|  Trust Score:      87/100 [TRUSTED]                              |
|                                                                   |
+------------------------------------------------------------------+
|  POSTURE CHECKS                                                   |
+------------------------------------------------------------------+
|                                                                   |
|  [PASS] Disk Encryption    FileVault enabled (AES-256-XTS)       |
|  [PASS] Firewall           Enabled (Stealth mode active)          |
|  [WARN] OS Patch Level     7 days since last update              |
|  [PASS] Antivirus          CrowdStrike Falcon (defs: 1 day old)  |
|  [PASS] Secure Boot        Enabled (Full security)               |
|                                                                   |
+------------------------------------------------------------------+
|  ISSUES                                                           |
+------------------------------------------------------------------+
|                                                                   |
|  [WARNING] OS update available (macOS 14.2.2)                    |
|                                                                   |
+------------------------------------------------------------------+

[2025-12-27T10:30:01Z] Health report signed with device key
[2025-12-27T10:30:01Z] Report sent to PDP: 200 OK
[2025-12-27T10:30:01Z] PDP response: {"access_level": "full", "message": "Device trusted"}
[2025-12-27T10:30:01Z] Entering continuous monitoring mode...

Detecting a Security Change

[2025-12-27T10:45:22Z] POSTURE CHANGE DETECTED
[2025-12-27T10:45:22Z] Change: firewall.enabled: true -> false

+------------------------------------------------------------------+
|                    POSTURE CHANGE ALERT                           |
+------------------------------------------------------------------+
|                                                                   |
|  Change Type:     DEGRADATION                                     |
|  Component:       Firewall                                        |
|  Previous State:  Enabled                                         |
|  Current State:   DISABLED                                        |
|                                                                   |
|  Trust Score Impact: 87 -> 62 (-25 points)                       |
|                                                                   |
+------------------------------------------------------------------+

[2025-12-27T10:45:22Z] Generating emergency health report...
[2025-12-27T10:45:22Z] Report sent to PDP: 200 OK
[2025-12-27T10:45:22Z] PDP response: {"access_level": "restricted", "message": "Firewall must be enabled for full access"}

+------------------------------------------------------------------+
|                    ACCESS LEVEL CHANGED                           |
+------------------------------------------------------------------+
|                                                                   |
|  Previous Access:  FULL                                           |
|  Current Access:   RESTRICTED (read-only)                         |
|                                                                   |
|  To restore full access, enable the system firewall:             |
|    macOS:   System Settings > Network > Firewall > Turn On       |
|    Linux:   sudo ufw enable                                       |
|    Windows: netsh advfirewall set allprofiles state on           |
|                                                                   |
+------------------------------------------------------------------+

Querying the Local API

$ curl http://localhost:8080/health

{
    "device_id": "device-abc123",
    "timestamp": "2025-12-27T10:50:00Z",
    "trust_score": 62,
    "access_level": "restricted",
    "posture": {
        "disk_encryption": {"status": "pass", "enabled": true},
        "firewall": {"status": "fail", "enabled": false},
        "os_patch_level": {"status": "warn", "days": 7},
        "antivirus": {"status": "pass", "present": true},
        "secure_boot": {"status": "pass", "enabled": true}
    },
    "issues": [
        {"severity": "critical", "message": "Firewall is disabled"},
        {"severity": "warning", "message": "OS update available"}
    ]
}
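
A local endpoint like this can be prototyped with Python's built-in `http.server`. The sketch below is one possible shape, not the agent's required design: it serves a placeholder snapshot (`CURRENT_HEALTH`, a hypothetical module-level variable) on the loopback interface only, and `fetch_health` is a small client helper added for demonstration:

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder snapshot; a real agent would refresh this from its collectors.
CURRENT_HEALTH = {"device_id": "device-abc123", "trust_score": 62,
                  "access_level": "restricted"}


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/health":
            self.send_error(404, "Not Found")
            return
        body = json.dumps(CURRENT_HEALTH).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # keep per-request logs out of the agent's console output


def start_server(port: int = 8080) -> HTTPServer:
    """Serve /health on 127.0.0.1 in a background thread and return the server."""
    server = HTTPServer(("127.0.0.1", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_health(port: int) -> dict:
    """Client helper: GET /health from the local agent and decode the JSON."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", "/health")
        return json.loads(conn.getresponse().read())
    finally:
        conn.close()
```

Binding to `127.0.0.1` rather than all interfaces keeps the posture data from being readable by other machines on the network. Passing `port=0` lets the OS pick a free port, which is handy in tests.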

The Core Question You’re Answering

“How do I know the device connecting to my system is secure, and not a compromised laptop pretending to be trusted?”

This is the fundamental question of endpoint security in Zero Trust Architecture. Traditional security assumed that if a device was on the corporate network, it could be trusted. Zero Trust rejects this assumption entirely.

Consider the scenario: An employee’s laptop gets infected with malware through a phishing attack. The attacker now has valid credentials, can pass MFA (the malware waits for the user to authenticate), and connects through legitimate network paths. Traditional security sees everything as valid. But if you’re checking device health, you’ll notice the firewall was disabled, an unknown process is running, and the antivirus definitions are weeks old.

This project teaches you to build the mechanism that answers: “Even though the user authenticated correctly, should we trust the device they’re using?”


Concepts You Must Understand First

Before diving into implementation, ensure you understand these foundational concepts:

1. TPM and Hardware Roots of Trust

A Trusted Platform Module (TPM) is a dedicated hardware chip that provides cryptographic operations isolated from the main CPU. It contains:

  • Endorsement Key (EK): A unique, manufacturer-installed key that identifies this specific TPM
  • Platform Configuration Registers (PCRs): Special registers that store measurements of the boot process
  • Attestation Identity Key (AIK): A key used to sign quotes (reports of PCR values)

The TPM creates a “chain of trust” from hardware up through software. Each boot stage measures the next stage before executing it. If malware modifies the bootloader, the measurements will differ from expected values.

Why it matters: Software-based attestation can be fooled by a compromised OS, which can simply lie about its security state. TPM boot measurements are recorded before the OS loads, and PCRs can only be extended, never overwritten, so a rootkit that starts later cannot rewrite them to claim a clean boot.

2. Endpoint Posture Checks

“Posture” refers to the security configuration state of a device. Key dimensions include:

  • Disk Encryption: Is data at rest encrypted? Can a thief with physical access read the drive?
  • Firewall Status: Is the host-based firewall enabled? Are inbound connections blocked?
  • OS Patch Level: How many days since the last security update? Are known CVEs affecting this version?
  • Antivirus/EDR Presence: Is endpoint protection software running? Are definitions current?
  • Secure Boot: Is the boot chain verified? Can unauthorized bootloaders run?

Each dimension contributes to overall device trustworthiness. A device might have encryption enabled but an outdated OS, leading to a moderate trust score rather than binary pass/fail.

3. Attestation Protocols

Attestation is the process of providing cryptographically verifiable evidence about system state to a remote party. Key properties:

  • Freshness: The report is recent (includes timestamp and nonce to prevent replay)
  • Authenticity: The report genuinely came from the claimed device (signed with device key)
  • Integrity: The report hasn’t been modified in transit (signature covers all fields)
  • Non-repudiation: The device cannot deny having sent the report (asymmetric cryptography)

The verifier sends a random nonce (challenge). The device includes this nonce in its signed report, which proves the report was generated after the challenge was issued.

4. Trust Score Calculation

Rather than binary “trusted” or “untrusted,” modern ZTA uses numerical trust scores that enable graduated access:

  • Score 0-50: Deny all access
  • Score 51-70: Read-only access to non-sensitive resources
  • Score 71-85: Standard access
  • Score 86-100: Full access including sensitive operations

Each posture dimension contributes positive or negative points. Disk encryption enabled: +20. Firewall disabled: -25. The final score determines access level.

Why scores over binary? A user whose only issue is a 10-day-old OS patch shouldn’t be locked out entirely. They can work with reduced privileges while a reminder prompts them to update.

5. Continuous vs Point-in-Time Verification

Point-in-time: Check device health once at login/VPN connection. Problem: What if security degrades after login?

Continuous: Monitor device state constantly, react to changes immediately. If the firewall gets disabled at 2pm, access is restricted at 2:01pm, not at next login.

Continuous verification can be implemented via:

  • Polling (check every N seconds)
  • Event-driven (subscribe to OS notifications about security changes)
  • Hybrid (poll regularly, but also react to events)
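
The hybrid option can be approximated with a single wait that ends on either a timer or an event. A rough Python sketch, assuming some OS-specific hook (not shown) calls `notify()` when a security setting changes. `HybridMonitor` and its callbacks are hypothetical names:

```python
import threading
from typing import Callable


class HybridMonitor:
    """Report on a fixed interval, but wake early when an event is signalled."""

    def __init__(self, collect: Callable[[], dict],
                 report: Callable[[str, dict], None], interval: float) -> None:
        self._collect = collect
        self._report = report
        self._interval = interval
        self._wake = threading.Event()

    def notify(self) -> None:
        """Called by an OS event hook to request an immediate re-check."""
        self._wake.set()

    def run(self, cycles: int) -> None:
        reason = "startup"
        for _ in range(cycles):
            self._report(reason, self._collect())
            # Blocks until notify() is called or the interval elapses.
            fired = self._wake.wait(timeout=self._interval)
            if fired:
                self._wake.clear()
            reason = "event" if fired else "timer"


log = []
monitor = HybridMonitor(lambda: {"firewall": True},
                        lambda reason, posture: log.append(reason),
                        interval=0.01)
monitor.notify()   # simulate an OS event arriving before the first wait
monitor.run(3)     # log becomes ["startup", "event", "timer"]
```

The timer path acts as a safety net: if an event hook fails or a platform offers no notification for some check, the next scheduled poll still catches the change.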

Questions to Guide Your Design

Before writing code, think through these questions:

Data Collection

  1. How will you query disk encryption status on your target OS? What command or API returns this information?
  2. How do you determine if the firewall is enabled vs just installed? What’s the difference between having firewall software and having it actively blocking traffic?
  3. How do you find when the OS was last updated? Is this the same as when the last package was installed?
  4. What elevated permissions does your agent need? Can you drop privileges after startup?

Report Integrity

  1. What cryptographic algorithm will you use to sign reports? Why that choice?
  2. How do you prevent an attacker from replaying yesterday’s “healthy” report today?
  3. Where is the device’s private key stored? Who has access to it? Could malware extract it?
  4. How does the PDP know which public key corresponds to which device?

Change Detection

  1. How often should you poll for changes? What’s the tradeoff between responsiveness and resource usage?
  2. What constitutes a “significant” change worth reporting immediately vs waiting for the next scheduled report?
  3. How do you handle temporary state changes (e.g., firewall briefly disabled during software installation)?

Integration

  1. What format should your health report use? JSON? What schema?
  2. How will your agent communicate with the PDP? Push reports? Wait to be polled?
  3. What happens if the PDP is unreachable? Queue reports? Continue with cached trust level?

Thinking Exercise

Before implementing, work through this design exercise on paper:

Design a Trust Scoring Model

Create a scoring model for device health that balances security with usability.

Your task: Define point values for each posture dimension and justify your choices.

Consider:

  1. What’s your base score? (Starting point before any checks)
  2. Which security failures are critical (immediate access denial)?
  3. Which are warnings (reduced score but not blocking)?
  4. How do you handle “unknown” states (couldn’t check a dimension)?

Example framework:

Base Score: 50

Disk Encryption:
  - Enabled with strong algorithm: +25
  - Enabled with weak algorithm: +15
  - In progress: +10
  - Disabled: -40
  - Unknown/check failed: -10

[Continue for other dimensions...]

Questions to answer:

  • What’s the minimum score for any access? Why?
  • What score grants full access? Why?
  • If someone has perfect security except firewall disabled, what happens?
  • If someone has all checks as “unknown” (new OS version broke your queries), what access do they get?

Write out your complete scoring model before implementing it in code.
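Once the model exists on paper, it translates almost directly into a lookup table. The sketch below uses the disk-encryption values from the example framework; the firewall values, the state names, and the "critical means score 0" policy are placeholders invented for illustration, not recommended numbers.

```python
from typing import Dict

BASE_SCORE = 50

# Disk-encryption points come from the example framework above;
# the firewall points are placeholders for this sketch.
POINTS: Dict[str, Dict[str, int]] = {
    "disk_encryption": {
        "strong": 25, "weak": 15, "in_progress": 10,
        "disabled": -40, "unknown": -10,
    },
    "firewall": {"enabled": 15, "disabled": -20, "unknown": -10},
}

# States that deny access outright regardless of the total (assumed policy).
CRITICAL = {("disk_encryption", "disabled")}


def score(states: Dict[str, str]) -> int:
    """Return a 0-100 trust score, or 0 if any critical failure is present."""
    total = BASE_SCORE
    for dimension, table in POINTS.items():
        state = states.get(dimension, "unknown")
        if (dimension, state) in CRITICAL:
            return 0
        # Unrecognized state names are treated like "unknown".
        total += table.get(state, table["unknown"])
    return max(0, min(100, total))
```

With these numbers, a device whose checks all come back "unknown" lands at 30, which is exactly the kind of outcome the last question above asks you to decide on deliberately.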


Hints in Layers

If you get stuck, reveal these hints progressively:

Hint 1: Getting Started with System Queries

Start with a single platform (your own). Don’t try to support all three OSes initially.

For macOS: The fdesetup command handles FileVault queries. The Application Firewall is controlled by socketfilterfw. The softwareupdate command shows pending updates.

For Linux: Look for LUKS volumes with lsblk -f (FSTYPE crypto_LUKS) or cryptsetup status; an entry under /dev/mapper/ alone can also be plain LVM. Query ufw or firewall-cmd depending on distro. Look at /var/lib/apt/periodic/ timestamps for update recency.

For Windows: PowerShell’s Get-BitLockerVolume and Get-NetFirewallProfile cmdlets are your friends.

Hint 2: Structuring Your Agent

Create an abstraction layer early:

Collector Interface:
  - Collect() returns (Status, error)

Platform-specific implementations:
  - DarwinDiskEncryptionCollector
  - LinuxDiskEncryptionCollector
  - WindowsDiskEncryptionCollector

Factory function:
  - NewDiskEncryptionCollector() returns appropriate impl for runtime.GOOS

This pattern lets you add Windows support later without restructuring.
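If you are working in Python rather than Go, the same shape can be sketched with an abstract base class and a lookup keyed on `platform.system()`. The class names mirror the outline above; the `collect()` bodies are placeholders, and the injectable `system` parameter is an assumption added to make the factory testable.

```python
import platform
from abc import ABC, abstractmethod
from typing import Dict, Optional, Type


class Collector(ABC):
    @abstractmethod
    def collect(self) -> dict:
        """Return the current state for one posture dimension."""


class DarwinDiskEncryptionCollector(Collector):
    def collect(self) -> dict:
        return {"platform": "darwin"}  # placeholder; real code would call fdesetup


class LinuxDiskEncryptionCollector(Collector):
    def collect(self) -> dict:
        return {"platform": "linux"}   # placeholder; real code would call cryptsetup


_DISK_COLLECTORS: Dict[str, Type[Collector]] = {
    "Darwin": DarwinDiskEncryptionCollector,
    "Linux": LinuxDiskEncryptionCollector,
}


def new_disk_encryption_collector(system: Optional[str] = None) -> Collector:
    """Pick an implementation by OS name; `system` is injectable for tests."""
    system = system or platform.system()
    try:
        return _DISK_COLLECTORS[system]()
    except KeyError:
        raise NotImplementedError(f"no disk encryption collector for {system}")
```

Adding Windows later means writing one more class and one more dictionary entry; callers of the factory do not change.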

Hint 3: Signing Reports Correctly

The message you sign must include everything that needs protection:

  • The report content itself (serialized to JSON)
  • The timestamp
  • The nonce from the verifier

Use a consistent format: message = report_json || "|" || timestamp || "|" || nonce

Sign the message bytes, not the string representation of each component separately.
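A minimal sketch of building those bytes in Python follows. The function name `build_message` is hypothetical, and the choice of sorted keys with compact separators is an assumption: any serialization works as long as signer and verifier produce identical bytes.

```python
import json


def build_message(report: dict, timestamp: str, nonce: str) -> bytes:
    """Return the exact bytes to sign: report_json || "|" || timestamp || "|" || nonce."""
    # Sorted keys and compact separators so signer and verifier
    # serialize the same report to the same bytes.
    report_json = json.dumps(report, sort_keys=True, separators=(",", ":"))
    if "|" in timestamp or "|" in nonce:
        # A delimiter inside a field would make the message ambiguous.
        raise ValueError("timestamp and nonce must not contain '|'")
    return f"{report_json}|{timestamp}|{nonce}".encode("utf-8")
```

Because the keys are sorted, two dictionaries with the same content but different insertion order produce the same message, which avoids spurious verification failures.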

Hint 4: Continuous Monitoring Strategy

Start with simple polling:

loop every 60 seconds:
    new_state = collect_all()
    if new_state != cached_state:
        report_change(cached_state, new_state)
        cached_state = new_state

Only add event-driven monitoring after polling works. Events are OS-specific and more complex.

For change detection, compare the serialized state or implement Equals() methods. Exclude fields that differ on every collection, such as collection timestamps or floating-point progress values; otherwise every poll looks like a change.
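One way to keep volatile fields out of the comparison is to diff a filtered view of each state. This is a sketch under assumptions: states are flat dictionaries, and `collected_at` is a made-up name for a per-collection timestamp field.

```python
from typing import Any, Dict, Iterable, Tuple

VOLATILE_FIELDS = ("collected_at",)  # assumed name for a per-collection timestamp


def stable_view(state: Dict[str, Any],
                volatile: Iterable[str] = VOLATILE_FIELDS) -> Dict[str, Any]:
    """Copy of state without fields that change on every collection."""
    skip = set(volatile)
    return {k: v for k, v in state.items() if k not in skip}


def changed_fields(old: Dict[str, Any], new: Dict[str, Any],
                   volatile: Iterable[str] = VOLATILE_FIELDS) -> Dict[str, Tuple[Any, Any]]:
    """Map each differing field to (old_value, new_value)."""
    volatile = tuple(volatile)
    a, b = stable_view(old, volatile), stable_view(new, volatile)
    return {
        k: (a.get(k), b.get(k))
        for k in sorted(a.keys() | b.keys())
        if a.get(k) != b.get(k)
    }
```

An empty result means nothing worth reporting changed, even if the timestamps differ.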

Hint 5: Error Handling and Graceful Degradation

Not every check will succeed every time. Handle failures gracefully:

  • If a specific collector fails, report that dimension as “unknown” rather than crashing
  • If you can’t reach the PDP, queue reports and retry with exponential backoff
  • If permissions change mid-run, log the error and continue with reduced collection
  • Set a maximum report age; if you haven’t successfully sent a report in N minutes, alert locally

Consider what happens if your agent itself is compromised. Defense in depth: the agent is just one input to the PDP’s decision.
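The first two bullets above can be sketched in a few lines of Python. The function names and the `{"status": "unknown"}` shape are assumptions for illustration; a production agent would likely also add random jitter to the delays.

```python
from typing import Callable, Dict, Iterator


def collect_all(collectors: Dict[str, Callable[[], dict]]) -> Dict[str, dict]:
    """Run every collector; a failing one yields "unknown" instead of aborting."""
    results = {}
    for name, collect in collectors.items():
        try:
            results[name] = collect()
        except Exception as exc:  # broad on purpose: one bad collector must not stop the rest
            results[name] = {"status": "unknown", "error": str(exc)}
    return results


def backoff_delays(base: float = 1.0, cap: float = 300.0) -> Iterator[float]:
    """Yield retry delays in seconds: base, 2*base, 4*base, ... capped at `cap`."""
    delay = base
    while True:
        yield min(delay, cap)
        delay *= 2
```

The retry loop would pull the next delay from `backoff_delays()` after each failed send and start a fresh generator after a success.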


Solution Architecture

System Architecture Diagram

+------------------------------------------------------------------+
|                     DEVICE HEALTH AGENT                           |
+------------------------------------------------------------------+
|                                                                   |
|  +----------------+    +------------------+    +---------------+  |
|  | Posture        |    | Report           |    | HTTP          |  |
|  | Collectors     |    | Generator        |    | Server        |  |
|  |----------------|    |------------------|    |---------------|  |
|  | - Disk Encrypt |    | - Schema         |    | - /health     |  |
|  | - Firewall     |--->| - Scoring        |--->| - /report     |  |
|  | - OS Patches   |    | - Signing        |    | - /status     |  |
|  | - Antivirus    |    |                  |    |               |  |
|  +----------------+    +------------------+    +---------------+  |
|         |                      |                      |          |
|         v                      v                      |          |
|  +----------------+    +------------------+           |          |
|  | OS Abstraction |    | Crypto Layer     |           |          |
|  |----------------|    |------------------|           |          |
|  | - Linux impl   |    | - Key management |           |          |
|  | - macOS impl   |    | - Ed25519 signing|           |          |
|  | - Windows impl |    | - Verification   |           |          |
|  +----------------+    +------------------+           |          |
|                                                       |          |
+------------------------------------------------------------------+
          |                       |                     |
          v                       v                     v
    +----------+          +------------+         +-----------+
    | OS APIs  |          | Key Store  |         | PDP       |
    | Commands |          | (file/TPM) |         | Service   |
    +----------+          +------------+         +-----------+

Module Breakdown

device-health-agent/
+-- main.go                    # Entry point, CLI handling
+-- config/
|   +-- config.go              # Configuration loading
|   +-- config.yaml            # Default configuration
+-- collectors/
|   +-- interface.go           # Collector interface definition
|   +-- disk_encryption.go     # Disk encryption status
|   +-- firewall.go            # Firewall status
|   +-- os_patches.go          # OS patch level
|   +-- antivirus.go           # Antivirus status
|   +-- secure_boot.go         # Secure boot status
+-- collectors/platform/
|   +-- linux.go               # Linux implementations
|   +-- darwin.go              # macOS implementations
|   +-- windows.go             # Windows implementations
+-- report/
|   +-- generator.go           # Report generation
|   +-- schema.go              # Report data structures
|   +-- signer.go              # Cryptographic signing
|   +-- scorer.go              # Trust score calculation
+-- monitor/
|   +-- watcher.go             # Continuous monitoring
|   +-- events.go              # Change detection
+-- api/
|   +-- server.go              # HTTP API server
|   +-- handlers.go            # Request handlers
+-- crypto/
|   +-- keys.go                # Key management
|   +-- sign.go                # Signing operations
+-- client/
|   +-- pdp.go                 # PDP communication client
+-- tests/
    +-- collectors_test.go
    +-- report_test.go
    +-- integration_test.go

Data Flow Diagram

                 Configuration
                      |
                      v
+------------------+  +------------------+
|     Startup      |->|   Key Loading    |
+------------------+  +--------+---------+
                               |
                               v
          +--------------------+--------------------+
          |                    |                    |
          v                    v                    v
+------------------+  +------------------+  +------------------+
| Disk Encryption  |  |    Firewall      |  |   OS Patches     |
| Collector        |  |    Collector     |  |   Collector      |
+--------+---------+  +--------+---------+  +--------+---------+
         |                     |                     |
         +---------------------+---------------------+
                               |
                               v
                    +--------------------+
                    | Report Generator   |
                    |   - Aggregate      |
                    |   - Score          |
                    |   - Format         |
                    +--------+-----------+
                             |
                             v
                    +--------------------+
                    | Report Signer      |
                    |   - Add timestamp  |
                    |   - Add nonce      |
                    |   - Sign with key  |
                    +--------+-----------+
                             |
            +----------------+----------------+
            |                                 |
            v                                 v
    +----------------+               +----------------+
    | HTTP API       |               | PDP Client     |
    | (localhost)    |               | (remote)       |
    +----------------+               +----------------+
            |                                 |
            v                                 v
    Local consumers                  Policy Decision
    (CLI, browser)                   Point

Phased Implementation Guide

Phase 1: Query Disk Encryption Status (2-3 hours)

Goal: Determine if the system’s disk is encrypted using OS-specific commands.

Milestone: Running ./agent --check disk prints encryption status.

OS-Specific Commands

macOS (FileVault):

# Check FileVault status
$ fdesetup status
FileVault is On.

# Extended status (illustrative output; the real format varies by macOS version and is not guaranteed to be JSON)
$ fdesetup status -extended
{
    "FileVaultStatus": "On",
    "Encryption Conversion Progress": 100.0,
    "Encryption Type": "AES-XTS"
}

# Using diskutil
$ diskutil apfs list | grep -A5 "FileVault"

Linux (LUKS):

# Check if root filesystem is on encrypted volume
$ lsblk -f
NAME        FSTYPE      LABEL   MOUNTPOINT
sda
+-sda1      ext4        boot    /boot
+-sda2      crypto_LUKS
  +-root    ext4        root    /

# Check LUKS status
$ sudo cryptsetup status /dev/mapper/root
/dev/mapper/root is active.
  type:    LUKS2
  cipher:  aes-xts-plain64
  keysize: 512 bits

# Using dmsetup
$ sudo dmsetup status --target crypt
root: 0 209715200 crypt

Windows (BitLocker):

# PowerShell command
Get-BitLockerVolume -MountPoint "C:" | Select-Object VolumeStatus, EncryptionMethod

# manage-bde command
manage-bde -status C:

# Example output:
# Volume C: []
#     Conversion Status:    Fully Encrypted
#     Percentage Encrypted: 100%
#     Encryption Method:    XTS-AES 256
#     Protection Status:    Protection On
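If you shell out to manage-bde from an agent, the "Key: value" lines are easy to parse. The sketch below works on the example output shown above; real output contains more fields and is localized, so treat the field names as assumptions tied to an English-language system.

```python
from typing import Dict

SAMPLE = """\
Volume C: []
    Conversion Status:    Fully Encrypted
    Percentage Encrypted: 100%
    Encryption Method:    XTS-AES 256
    Protection Status:    Protection On
"""


def parse_manage_bde(output: str) -> Dict[str, str]:
    """Turn indented 'Key: value' lines into a dict, skipping everything else."""
    fields = {}
    for line in output.splitlines():
        if not line.startswith((" ", "\t")) or ":" not in line:
            continue  # volume header or blank line
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def is_protected(fields: Dict[str, str]) -> bool:
    # Both conditions matter: a fully encrypted volume whose protection
    # is off (suspended) is effectively unprotected.
    return (fields.get("Conversion Status") == "Fully Encrypted"
            and fields.get("Protection Status") == "Protection On")
```

Checking both the conversion status and the protection status avoids reporting a suspended volume as healthy.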

Go Implementation (macOS example)

package collectors

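import (
    "fmt"
    "os/exec"
    "runtime"
    "strings"
)

type DiskEncryptionStatus struct {
    Enabled   bool   `json:"enabled"`
    Algorithm string `json:"algorithm,omitempty"`
    Progress  int    `json:"progress,omitempty"` // 0-100
}

func CheckDiskEncryption() (*DiskEncryptionStatus, error) {
    // Dispatch on the OS. checkDiskEncryptionLinux and parseProgress are
    // helpers assumed to live elsewhere in this package (not shown here).
    switch runtime.GOOS {
    case "darwin":
        // FileVault check continues below
    case "linux":
        return checkDiskEncryptionLinux()
    default:
        return nil, fmt.Errorf("unsupported OS: %s", runtime.GOOS)
    }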

    cmd := exec.Command("fdesetup", "status")
    output, err := cmd.Output()
    if err != nil {
        return nil, fmt.Errorf("failed to check FileVault: %w", err)
    }

    outputStr := string(output)
    status := &DiskEncryptionStatus{}

    if strings.Contains(outputStr, "FileVault is On") {
        status.Enabled = true
        status.Algorithm = "AES-XTS" // Default for FileVault 2
        status.Progress = 100
    } else if strings.Contains(outputStr, "Encryption in progress") {
        status.Enabled = true
        status.Progress = parseProgress(outputStr)
    } else {
        status.Enabled = false
    }

    return status, nil
}

Python Implementation (Linux example)

import subprocess
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class DiskEncryptionStatus:
    enabled: bool
    algorithm: Optional[str] = None
    luks_version: Optional[str] = None

def check_disk_encryption() -> DiskEncryptionStatus:
    """Check if root filesystem is encrypted using LUKS."""

    # First, find the root device
    result = subprocess.run(
        ["findmnt", "-n", "-o", "SOURCE", "/"],
        capture_output=True, text=True
    )
    root_device = result.stdout.strip()

    # A /dev/mapper/ path means device-mapper, which can be LUKS or plain LVM;
    # cryptsetup confirms whether the mapping is actually encrypted.
    # (Root on LVM-on-LUKS is not detected by this simple check.)
    if "/dev/mapper/" in root_device:
        try:
            # check=True makes a non-zero exit raise CalledProcessError
            result = subprocess.run(
                ["sudo", "cryptsetup", "status", root_device],
                capture_output=True, text=True, check=True
            )
        except (subprocess.CalledProcessError, FileNotFoundError):
            return DiskEncryptionStatus(enabled=False)

        # Parse cipher info; an active crypt mapping reports a cipher
        cipher_match = re.search(r"cipher:\s+(\S+)", result.stdout)
        if "is active" in result.stdout and cipher_match:
            return DiskEncryptionStatus(
                enabled=True,
                algorithm=cipher_match.group(1),
                luks_version="LUKS2" if "LUKS2" in result.stdout else "LUKS1"
            )

    return DiskEncryptionStatus(enabled=False)

Phase 2: Query Firewall Status (2-3 hours)

Goal: Determine if the host-based firewall is enabled.

Milestone: Running ./agent --check firewall prints firewall status.

OS-Specific Commands

macOS (Application Firewall):

# Check firewall state
$ /usr/libexec/ApplicationFirewall/socketfilterfw --getglobalstate
Firewall is enabled. (State = 1)

# Check stealth mode
$ /usr/libexec/ApplicationFirewall/socketfilterfw --getstealthmode
Stealth mode enabled.

# Check block all incoming
$ /usr/libexec/ApplicationFirewall/socketfilterfw --getblockall
Block all DISABLED!

Linux (iptables/ufw/firewalld):

# Check UFW (Ubuntu)
$ sudo ufw status
Status: active

To                         Action      From
--                         ------      ----
22/tcp                     ALLOW       Anywhere

# Check iptables (any Linux)
$ sudo iptables -L -n | head -5
Chain INPUT (policy DROP)
target     prot opt source               destination
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0

# Check firewalld (RHEL/CentOS)
$ firewall-cmd --state
running

Windows (Windows Firewall):

# PowerShell
Get-NetFirewallProfile | Select-Object Name, Enabled

# Example output:
# Name    Enabled
# ----    -------
# Domain     True
# Private    True
# Public     True

# Using netsh
netsh advfirewall show allprofiles state
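A Python agent that shells out to PowerShell can parse the two-column table shown above. This sketch works on that example output only; the "every profile must be enabled" rule in `firewall_enabled` is an assumed policy, not a Windows requirement.

```python
from typing import Dict

SAMPLE = """\
Name    Enabled
----    -------
Domain     True
Private    True
Public     True
"""


def parse_firewall_profiles(output: str) -> Dict[str, bool]:
    """Map each profile name to its Enabled flag from the two-column table."""
    profiles = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 2 or parts[1] not in ("True", "False"):
            continue  # header, separator, or blank line
        profiles[parts[0]] = parts[1] == "True"
    return profiles


def firewall_enabled(profiles: Dict[str, bool]) -> bool:
    # Assumed policy: every profile must be on, and at least one must exist.
    return bool(profiles) and all(profiles.values())
```

Requiring a non-empty result means that unparseable output is never mistaken for a healthy firewall.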

Go Implementation

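// FirewallStatus is the result type used below (fields inferred from usage).
// This file needs the imports "fmt", "os/exec", "runtime", and "strings".
type FirewallStatus struct {
    Enabled     bool   `json:"enabled"`
    StealthMode bool   `json:"stealth_mode,omitempty"`
    Type        string `json:"type,omitempty"` // "ufw", "firewalld", "iptables"
}

func CheckFirewall() (*FirewallStatus, error) {
    switch runtime.GOOS {
    case "darwin":
        return checkFirewallMacOS()
    case "linux":
        return checkFirewallLinux()
    case "windows":
        return checkFirewallWindows() // not shown; see the PowerShell commands above
    default:
        return nil, fmt.Errorf("unsupported OS: %s", runtime.GOOS)
    }
}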

func checkFirewallMacOS() (*FirewallStatus, error) {
    cmd := exec.Command(
        "/usr/libexec/ApplicationFirewall/socketfilterfw",
        "--getglobalstate",
    )
    output, err := cmd.Output()
    if err != nil {
        return nil, err
    }

    status := &FirewallStatus{}
    if strings.Contains(string(output), "State = 1") {
        status.Enabled = true

        // Check stealth mode
        stealthCmd := exec.Command(
            "/usr/libexec/ApplicationFirewall/socketfilterfw",
            "--getstealthmode",
        )
        stealthOutput, _ := stealthCmd.Output()
        status.StealthMode = strings.Contains(string(stealthOutput), "enabled")
    }

    return status, nil
}

func checkFirewallLinux() (*FirewallStatus, error) {
    // Try ufw first
    cmd := exec.Command("ufw", "status")
    output, err := cmd.Output()
    if err == nil && strings.Contains(string(output), "Status: active") {
        return &FirewallStatus{Enabled: true, Type: "ufw"}, nil
    }

    // Try firewalld
    cmd = exec.Command("firewall-cmd", "--state")
    output, err = cmd.Output()
    if err == nil && strings.Contains(string(output), "running") {
        return &FirewallStatus{Enabled: true, Type: "firewalld"}, nil
    }

    // Check iptables policy
    cmd = exec.Command("iptables", "-L", "INPUT", "-n")
    output, err = cmd.Output()
    if err == nil && strings.Contains(string(output), "policy DROP") {
        return &FirewallStatus{Enabled: true, Type: "iptables"}, nil
    }

    return &FirewallStatus{Enabled: false}, nil
}

Phase 3: Check OS Patch Level (2-3 hours)

Goal: Determine how recently the OS was updated and if updates are pending.

Milestone: Running ./agent --check patches prints days since last update.

OS-Specific Commands

macOS:

# Get last update date from install history
$ softwareupdate --history | head -5
Display Name                                      Version    Date
------------                                      -------    ----
macOS Sonoma 14.2.1                              14.2.1     12/20/2024, 10:30:00 AM

# Check for pending updates
$ softwareupdate -l
Software Update found the following new or updated software:
* Label: macOS Sonoma 14.2.2
    Title: macOS Sonoma 14.2.2, Version: 14.2.2, Size: 1.2GB

# System profiler approach
$ system_profiler SPInstallHistoryDataType | head -20

Linux (Debian/Ubuntu):

# Last apt update
$ stat /var/lib/apt/periodic/update-success-stamp
Access: 2025-12-20 10:30:00.000000000 +0000
Modify: 2025-12-20 10:30:00.000000000 +0000

# Check pending updates
$ apt list --upgradable 2>/dev/null | wc -l

# Last package installation
$ ls -lt /var/log/apt/history.log
-rw-r--r-- 1 root root 12345 Dec 20 10:30 /var/log/apt/history.log

# Or parse dpkg.log
$ grep " install " /var/log/dpkg.log | tail -1
2025-12-20 10:30:00 install linux-image-6.1.0-15-amd64:amd64 6.1.0-15

Linux (RHEL/CentOS):

# Last yum/dnf update
$ sudo dnf history | head -5
ID     | Command line             | Date and time    | Action(s)
--------------------------------------------------------------------------------
   105 | update                   | 2025-12-20 10:30 | Upgrade

# Check pending updates
$ dnf check-update | wc -l
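To turn the history table into a "days since last update" number, you can parse the pipe-separated rows. The sketch below assumes the column layout of the example output above (ID, command line, date and time, action) and would need adjusting if your dnf version prints the date in another column.

```python
from datetime import datetime
from typing import Optional

SAMPLE = """\
ID     | Command line             | Date and time    | Action(s)
--------------------------------------------------------------------------------
   105 | update                   | 2025-12-20 10:30 | Upgrade
"""


def last_transaction_time(history: str) -> Optional[datetime]:
    """Return the newest timestamp found in pipe-separated history rows."""
    newest = None
    for line in history.splitlines():
        cols = [c.strip() for c in line.split("|")]
        if len(cols) < 4 or not cols[0].isdigit():
            continue  # header or separator row
        try:
            when = datetime.strptime(cols[2], "%Y-%m-%d %H:%M")
        except ValueError:
            continue
        if newest is None or when > newest:
            newest = when
    return newest


def days_since(when: datetime, now: datetime) -> int:
    """Whole days elapsed between two timestamps."""
    return (now - when).days
```

Passing `now` explicitly instead of calling `datetime.now()` inside the function keeps the calculation deterministic in tests.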

Windows:

# Get update history
Get-HotFix | Sort-Object InstalledOn -Descending | Select-Object -First 5

# Example output:
# Source        Description      HotFixID      InstalledBy          InstalledOn
# ------        -----------      --------      -----------          -----------
# DESKTOP-ABC   Security Update  KB5034123     NT AUTHORITY\SYSTEM  12/20/2025

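# Export the Windows Update trace logs to a readable WindowsUpdate.log file
Get-WindowsUpdateLog

# Using wmic (deprecated on recent Windows releases; prefer Get-HotFix)
wmic qfe get InstalledOn | sort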

Python Implementation

import subprocess
import os
from datetime import datetime, timedelta
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class PatchStatus:
    last_update: datetime
    days_since_update: int
    pending_updates: int
    security_updates_pending: int
    os_version: str

def check_patch_level_linux() -> PatchStatus:
    """Check patch level on Debian/Ubuntu systems."""

    # The stamp records the last successful package-index refresh
    # (`apt update`), which is not the same as installing upgrades.
    stamp_file = "/var/lib/apt/periodic/update-success-stamp"

    if os.path.exists(stamp_file):
        last_update = datetime.fromtimestamp(os.path.getmtime(stamp_file))
    else:
        # Fallback to dpkg.log
        last_update = get_last_dpkg_install()

    days_since = (datetime.now() - last_update).days

    # List upgradable packages once and reuse the output for both counts
    result = subprocess.run(
        ["apt", "list", "--upgradable"],
        capture_output=True, text=True
    )
    # Package lines start with "name/suite"; the "Listing..." header does not
    packages = [line for line in result.stdout.splitlines()
                if "/" in line.split(" ")[0]]
    pending = len(packages)

    # Heuristic: count packages whose suite name mentions "security"
    security = sum(1 for line in packages if 'security' in line.lower())

    # Get OS version, with a default in case PRETTY_NAME is missing
    os_version = "unknown"
    with open('/etc/os-release') as f:
        for line in f:
            if line.startswith('PRETTY_NAME='):
                os_version = line.split('=', 1)[1].strip().strip('"')
                break

    return PatchStatus(
        last_update=last_update,
        days_since_update=days_since,
        pending_updates=pending,
        security_updates_pending=security,
        os_version=os_version
    )

def get_last_dpkg_install() -> datetime:
    """Parse dpkg.log to find last install/upgrade."""
    log_file = "/var/log/dpkg.log"
    last_date = datetime.min

    with open(log_file) as f:
        for line in f:
            if " install " in line or " upgrade " in line:
                try:
                    date_str = line.split()[0] + " " + line.split()[1]
                    date = datetime.strptime(date_str, "%Y-%m-%d %H:%M:%S")
                    if date > last_date:
                        last_date = date
                except ValueError:
                    continue

    return last_date

Phase 4: Sign Health Reports (2-3 hours)

Goal: Cryptographically sign health reports so they cannot be forged.

Milestone: Reports include a signature that can be verified by the PDP.

Key Generation

# Generate Ed25519 keypair (recommended for signing)
$ openssl genpkey -algorithm ED25519 -out device_private.pem
$ openssl pkey -in device_private.pem -pubout -out device_public.pem

# View the keys
$ cat device_public.pem
-----BEGIN PUBLIC KEY-----
MCowBQYDK2VwAyEAxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx=
-----END PUBLIC KEY-----

Go Implementation

package crypto

import (
    "crypto/ed25519"
    "crypto/x509"
    "encoding/base64"
    "encoding/json"
    "encoding/pem"
    "fmt"
    "os"
    "time"
)

type SignedReport struct {
    Report    json.RawMessage `json:"report"`
    Timestamp time.Time       `json:"timestamp"`
    Nonce     string          `json:"nonce"`
    Signature string          `json:"signature"`
}

type Signer struct {
    privateKey ed25519.PrivateKey
}

func NewSigner(keyPath string) (*Signer, error) {
    keyBytes, err := os.ReadFile(keyPath)
    if err != nil {
        return nil, fmt.Errorf("failed to read private key: %w", err)
    }

    block, _ := pem.Decode(keyBytes)
    if block == nil {
        return nil, fmt.Errorf("failed to decode PEM block")
    }

    key, err := x509.ParsePKCS8PrivateKey(block.Bytes)
    if err != nil {
        return nil, fmt.Errorf("failed to parse private key: %w", err)
    }

    edKey, ok := key.(ed25519.PrivateKey)
    if !ok {
        return nil, fmt.Errorf("key is not Ed25519")
    }

    return &Signer{privateKey: edKey}, nil
}

func (s *Signer) SignReport(report interface{}, nonce string) (*SignedReport, error) {
    reportJSON, err := json.Marshal(report)
    if err != nil {
        return nil, err
    }

    timestamp := time.Now().UTC()

    // Create the message to sign: report || timestamp || nonce
    message := fmt.Sprintf("%s|%s|%s",
        string(reportJSON),
        timestamp.Format(time.RFC3339),
        nonce,
    )

    signature := ed25519.Sign(s.privateKey, []byte(message))

    return &SignedReport{
        Report:    reportJSON,
        Timestamp: timestamp,
        Nonce:     nonce,
        Signature: base64.StdEncoding.EncodeToString(signature),
    }, nil
}

Python Implementation

import json
import base64
from datetime import datetime, timezone
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from dataclasses import dataclass

@dataclass
class SignedReport:
    report: dict
    timestamp: str
    nonce: str
    signature: str

class ReportSigner:
    def __init__(self, private_key_path: str):
        with open(private_key_path, 'rb') as f:
            self.private_key = serialization.load_pem_private_key(
                f.read(),
                password=None
            )

    def sign_report(self, report: dict, nonce: str) -> SignedReport:
        # The verifier must re-serialize `report` with the same options
        # (sorted keys, default separators) to reproduce the signed bytes.
        report_json = json.dumps(report, sort_keys=True)
        # datetime.utcnow() is deprecated since Python 3.12; use an aware UTC time
        timestamp = datetime.now(timezone.utc).isoformat().replace('+00:00', 'Z')

        # Message to sign: report || timestamp || nonce
        message = f"{report_json}|{timestamp}|{nonce}"

        signature = self.private_key.sign(message.encode())
        signature_b64 = base64.b64encode(signature).decode()

        return SignedReport(
            report=report,
            timestamp=timestamp,
            nonce=nonce,
            signature=signature_b64
        )

Verification on PDP Side

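func VerifyReport(signedReport *SignedReport, publicKeyPath string) error {
    // Load and parse the device's public key, failing closed on any error
    keyBytes, err := os.ReadFile(publicKeyPath)
    if err != nil {
        return fmt.Errorf("failed to read public key: %w", err)
    }
    block, _ := pem.Decode(keyBytes)
    if block == nil {
        return fmt.Errorf("failed to decode PEM block")
    }
    pubKey, err := x509.ParsePKIXPublicKey(block.Bytes)
    if err != nil {
        return fmt.Errorf("failed to parse public key: %w", err)
    }
    edPubKey, ok := pubKey.(ed25519.PublicKey)
    if !ok {
        return fmt.Errorf("key is not Ed25519")
    }

    // Reconstruct the message exactly as the signer built it
    message := fmt.Sprintf("%s|%s|%s",
        string(signedReport.Report),
        signedReport.Timestamp.Format(time.RFC3339),
        signedReport.Nonce,
    )

    // Decode signature
    signature, err := base64.StdEncoding.DecodeString(signedReport.Signature)
    if err != nil {
        return fmt.Errorf("failed to decode signature: %w", err)
    }

    // Verify
    if !ed25519.Verify(edPubKey, []byte(message), signature) {
        return fmt.Errorf("signature verification failed")
    }

    // Check timestamp freshness (within 5 minutes, in either direction)
    age := time.Since(signedReport.Timestamp)
    if age > 5*time.Minute || age < -5*time.Minute {
        return fmt.Errorf("report timestamp outside allowed window: %v", age)
    }

    // The caller must still confirm that signedReport.Nonce is one it issued
    // and has not seen before; the signature alone does not prevent replay.
    return nil
}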
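The timestamp window limits how long a captured report stays valid, but within that window it can still be replayed unless the verifier tracks nonces. A minimal in-memory sketch of that bookkeeping, in Python, follows; the class name `NonceStore` and its TTL default are assumptions, and a real PDP would likely persist this state and purge expired entries.

```python
import secrets
import time
from typing import Callable, Dict


class NonceStore:
    """Issues single-use nonces and rejects reuse or late arrival."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock                  # injectable so tests can control time
        self._issued: Dict[str, float] = {}  # nonce -> issue time

    def issue(self) -> str:
        """Create a fresh random nonce and remember when it was issued."""
        nonce = secrets.token_urlsafe(16)
        self._issued[nonce] = self._clock()
        return nonce

    def consume(self, nonce: str) -> bool:
        """True exactly once per issued nonce, and only within the TTL."""
        issued_at = self._issued.pop(nonce, None)  # pop makes the nonce single-use
        if issued_at is None:
            return False
        return self._clock() - issued_at <= self._ttl
```

The PDP would call `issue()` when it challenges a device and `consume()` on the nonce in the returned report, accepting the report only if both the signature check and `consume()` succeed.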

Phase 5: Continuous Monitoring with Change Detection (3-4 hours)

Goal: Monitor system state continuously and detect posture changes within one polling interval.

Milestone: Agent detects and reports when firewall is disabled.

Architecture

Continuous Monitoring Architecture:
===================================

+------------------------------------------------------------------+
|                      MONITORING LOOP                              |
+------------------------------------------------------------------+
|                                                                   |
|  +------------------+                                             |
|  | Initial State    |---+                                         |
|  | Collection       |   |                                         |
|  +------------------+   |                                         |
|                         v                                         |
|                   +-----------+                                   |
|                   | State     |                                   |
|                   | Cache     |<---------+                        |
|                   +-----------+          |                        |
|                         |                |                        |
|                         | Compare        |                        |
|                         v                |                        |
|  +------------------+   |   +------------------+                  |
|  | Polling Timer    |-->+-->| Change Detector  |                  |
|  | (60s interval)   |       +--------+---------+                  |
|  +------------------+                |                            |
|                                      v                            |
|                             +------------------+                  |
|                             | Change Detected? |                  |
|                             +--------+---------+                  |
|                                      |                            |
|                      +---------------+---------------+            |
|                      |                               |            |
|                      v                               v            |
|               +-----------+                   +------------+      |
|               | No Change |                   | Change!    |      |
|               | Sleep     |                   | Send Alert |      |
|               +-----------+                   +-----+------+      |
|                                                     |             |
|                                                     v             |
|                                              +------------+       |
|                                              | Update     |       |
|                                              | PDP        |       |
|                                              +------------+       |
|                                                                   |
+------------------------------------------------------------------+

Go Implementation

package monitor

import (
    "context"
    "log"
    "reflect"
    "sync"
    "time"
)

type PostureChange struct {
    Component     string      `json:"component"`
    PreviousState interface{} `json:"previous_state"`
    CurrentState  interface{} `json:"current_state"`
    Timestamp     time.Time   `json:"timestamp"`
    Severity      string      `json:"severity"` // "critical", "warning", "info"
}

type Monitor struct {
    collectors    map[string]Collector
    previousState map[string]interface{}
    interval      time.Duration
    onChange      func(change PostureChange)
    mu            sync.RWMutex
}

func NewMonitor(interval time.Duration, onChange func(PostureChange)) *Monitor {
    return &Monitor{
        collectors:    make(map[string]Collector),
        previousState: make(map[string]interface{}),
        interval:      interval,
        onChange:      onChange,
    }
}

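func (m *Monitor) RegisterCollector(name string, collector Collector) {
    // Take the lock so registration is safe while the monitor is running
    m.mu.Lock()
    defer m.mu.Unlock()
    m.collectors[name] = collector
}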

func (m *Monitor) Start(ctx context.Context) {
    // Initial collection
    m.collectAll()

    ticker := time.NewTicker(m.interval)
    defer ticker.Stop()

    for {
        select {
        case <-ctx.Done():
            log.Println("Monitor stopped")
            return
        case <-ticker.C:
            m.checkForChanges()
        }
    }
}

func (m *Monitor) collectAll() {
    m.mu.Lock()
    defer m.mu.Unlock()

    for name, collector := range m.collectors {
        state, err := collector.Collect()
        if err != nil {
            log.Printf("Error collecting %s: %v", name, err)
            continue
        }
        m.previousState[name] = state
    }
}

func (m *Monitor) checkForChanges() {
    m.mu.Lock()
    defer m.mu.Unlock()

    for name, collector := range m.collectors {
        currentState, err := collector.Collect()
        if err != nil {
            log.Printf("Error collecting %s: %v", name, err)
            continue
        }

        previousState, exists := m.previousState[name]
        if !exists {
            m.previousState[name] = currentState
            continue
        }

        if !reflect.DeepEqual(previousState, currentState) {
            change := PostureChange{
                Component:     name,
                PreviousState: previousState,
                CurrentState:  currentState,
                Timestamp:     time.Now(),
                Severity:      m.determineSeverity(name, previousState, currentState),
            }

            // Notify callback
            if m.onChange != nil {
                m.onChange(change)
            }

            // Update cached state
            m.previousState[name] = currentState
        }
    }
}

func (m *Monitor) determineSeverity(component string, prev, curr interface{}) string {
    // Firewall disabled is critical
    if component == "firewall" {
        prevFw, okPrev := prev.(*FirewallStatus)
        currFw, okCurr := curr.(*FirewallStatus)
        // Guard against failed assertions and nil pointers before dereferencing
        if okPrev && okCurr && prevFw != nil && currFw != nil &&
            prevFw.Enabled && !currFw.Enabled {
            return "critical"
        }
    }

    // Encryption disabled is critical
    if component == "disk_encryption" {
        prevEnc, okPrev := prev.(*DiskEncryptionStatus)
        currEnc, okCurr := curr.(*DiskEncryptionStatus)
        if okPrev && okCurr && prevEnc != nil && currEnc != nil &&
            prevEnc.Enabled && !currEnc.Enabled {
            return "critical"
        }
    }

    return "warning"
}
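The loop above reports every difference between two polls, including a firewall that is briefly disabled during a software install. One way to filter such blips is to require a new value to persist for several consecutive polls before reporting it. The Python sketch below illustrates the idea; the class name `Debouncer` and the default of two polls are assumptions.

```python
from typing import Any, Optional, Tuple


class Debouncer:
    """Confirms a new value only after it is seen on `required` consecutive polls."""

    def __init__(self, initial: Any, required: int = 2):
        self.confirmed = initial
        self._candidate: Any = None
        self._count = 0
        self._required = required

    def observe(self, value: Any) -> Optional[Tuple[Any, Any]]:
        """Return (old, new) when a change is confirmed, else None."""
        if value == self.confirmed:
            self._candidate, self._count = None, 0  # blip ended; reset
            return None
        if self._count > 0 and value == self._candidate:
            self._count += 1
        else:
            self._candidate, self._count = value, 1
        if self._count >= self._required:
            old, self.confirmed = self.confirmed, value
            self._candidate, self._count = None, 0
            return (old, value)
        return None
```

The tradeoff is latency: with a 60-second interval and two required polls, a genuine downgrade is reported up to a minute later, so you may prefer to debounce only non-critical components.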

Event-Driven Alternative (macOS example)

For immediate detection without polling, use OS notification APIs:

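// macOS: approximate firewall change detection with the Endpoint Security framework.
// Requires root and the com.apple.developer.endpoint-security.client entitlement.
import EndpointSecurity

class FirewallMonitor {
    var client: OpaquePointer?

    func start() {
        var newClient: OpaquePointer?

        let result = es_new_client(&newClient) { _, message in
            // Called for every subscribed event, i.e. every process exec,
            // not only firewall changes
            if message.pointee.event_type == ES_EVENT_TYPE_NOTIFY_EXEC {
                // Filter here (e.g. on the executed binary's path) before
                // re-checking firewall state; handleFirewallChange is not shown
                self.handleFirewallChange()
            }
        }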

        if result == ES_NEW_CLIENT_RESULT_SUCCESS {
            self.client = newClient

            // Subscribe to events
            let events: [es_event_type_t] = [
                ES_EVENT_TYPE_NOTIFY_EXEC
            ]
            es_subscribe(client!, events, UInt32(events.count))
        }
    }
}

Testing Strategy

Unit Tests: Collector Mocking

// tests/collectors_test.go
package collectors_test

import (
    "testing"
    "github.com/stretchr/testify/assert"
    "github.com/stretchr/testify/mock"
)

// Mock command executor for testing
type MockCommandExecutor struct {
    mock.Mock
}

func (m *MockCommandExecutor) Execute(cmd string, args ...string) ([]byte, error) {
    callArgs := m.Called(cmd, args)
    return callArgs.Get(0).([]byte), callArgs.Error(1)
}

func TestDiskEncryptionCheck_MacOS_Enabled(t *testing.T) {
    executor := new(MockCommandExecutor)
    executor.On("Execute", "fdesetup", []string{"status"}).
        Return([]byte("FileVault is On."), nil)

    collector := NewDiskEncryptionCollector(executor)
    status, err := collector.Collect()

    assert.NoError(t, err)
    assert.True(t, status.Enabled)
}

func TestDiskEncryptionCheck_MacOS_Disabled(t *testing.T) {
    executor := new(MockCommandExecutor)
    executor.On("Execute", "fdesetup", []string{"status"}).
        Return([]byte("FileVault is Off."), nil)

    collector := NewDiskEncryptionCollector(executor)
    status, err := collector.Collect()

    assert.NoError(t, err)
    assert.False(t, status.Enabled)
}

func TestFirewallCheck_Linux_UFWEnabled(t *testing.T) {
    executor := new(MockCommandExecutor)
    executor.On("Execute", "ufw", []string{"status"}).
        Return([]byte("Status: active\n"), nil)

    collector := NewFirewallCollector(executor)
    status, err := collector.Collect()

    assert.NoError(t, err)
    assert.True(t, status.Enabled)
    assert.Equal(t, "ufw", status.Type)
}
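
The mocked tests above only work if collectors receive their command runner through an interface. The following is a minimal sketch of that seam, not the project's actual implementation: the `CommandExecutor` interface, `ExecExecutor`, `FakeExecutor`, and `fileVaultEnabled` are hypothetical names chosen to match the `Execute` method used in the mock.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// CommandExecutor abstracts running an external command so collectors can be
// tested without touching the real system. (Assumed interface, not from the guide.)
type CommandExecutor interface {
	Execute(cmd string, args ...string) ([]byte, error)
}

// ExecExecutor runs real commands via os/exec.
type ExecExecutor struct{}

func (ExecExecutor) Execute(cmd string, args ...string) ([]byte, error) {
	return exec.Command(cmd, args...).Output()
}

// FakeExecutor returns canned output keyed by the full command line.
type FakeExecutor struct {
	Outputs map[string]string
}

func (f FakeExecutor) Execute(cmd string, args ...string) ([]byte, error) {
	key := strings.Join(append([]string{cmd}, args...), " ")
	out, ok := f.Outputs[key]
	if !ok {
		return nil, fmt.Errorf("unexpected command: %q", key)
	}
	return []byte(out), nil
}

// fileVaultEnabled is a hypothetical collector helper that depends only on
// the interface, so either executor can be injected.
func fileVaultEnabled(ex CommandExecutor) (bool, error) {
	out, err := ex.Execute("fdesetup", "status")
	if err != nil {
		return false, err
	}
	return strings.Contains(strings.ToLower(string(out)), "filevault is on"), nil
}

func main() {
	fake := FakeExecutor{Outputs: map[string]string{
		"fdesetup status": "FileVault is On.\n",
	}}
	enabled, err := fileVaultEnabled(fake)
	fmt.Println(enabled, err) // true <nil>
}
```

In production code the collector would receive `ExecExecutor{}`; in tests it receives a fake or a testify mock, which keeps unit tests fast and independent of the host OS.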

Integration Tests: Real System Queries

// tests/integration_test.go
// Run with: go test -tags integration ./...
//go:build integration

package tests

import (
    "runtime"
    "testing"
)

func TestRealDiskEncryptionStatus(t *testing.T) {
    if runtime.GOOS != "darwin" {
        t.Skip("macOS-specific test")
    }

    collector := NewDiskEncryptionCollector(nil) // Real executor
    status, err := collector.Collect()

    if err != nil {
        t.Fatalf("Failed to collect disk encryption status: %v", err)
    }

    t.Logf("Disk encryption enabled: %v", status.Enabled)
    if status.Enabled {
        t.Logf("Algorithm: %s", status.Algorithm)
    }
}

func TestRealFirewallStatus(t *testing.T) {
    collector := NewFirewallCollector(nil)
    status, err := collector.Collect()

    if err != nil {
        t.Fatalf("Failed to collect firewall status: %v", err)
    }

    t.Logf("Firewall enabled: %v", status.Enabled)
}

Signature Verification Tests

func TestReportSigningAndVerification(t *testing.T) {
    // Generate test keypair
    pubKey, privKey, _ := ed25519.GenerateKey(nil)

    // Create signer with private key
    signer := &Signer{privateKey: privKey}

    report := map[string]interface{}{
        "disk_encryption": true,
        "firewall": true,
    }

    signedReport, err := signer.SignReport(report, "test-nonce-123")
    assert.NoError(t, err)

    // Verify signature
    verifier := &Verifier{publicKey: pubKey}
    err = verifier.Verify(signedReport)
    assert.NoError(t, err)
}

func TestTamperedReportRejected(t *testing.T) {
    pubKey, privKey, _ := ed25519.GenerateKey(nil)
    signer := &Signer{privateKey: privKey}

    report := map[string]interface{}{"firewall": true}
    signedReport, _ := signer.SignReport(report, "nonce")

    // Tamper with the report
    signedReport.Report = []byte(`{"firewall": false}`)

    verifier := &Verifier{publicKey: pubKey}
    err := verifier.Verify(signedReport)
    assert.Error(t, err)
    assert.Contains(t, err.Error(), "signature verification failed")
}

Continuous Monitoring Tests

func TestChangeDetection(t *testing.T) {
    changes := make(chan PostureChange, 10)

    monitor := NewMonitor(100*time.Millisecond, func(c PostureChange) {
        changes <- c
    })

    // Start with firewall enabled
    mockCollector := &MockCollector{}
    mockCollector.SetState(&FirewallStatus{Enabled: true})
    monitor.RegisterCollector("firewall", mockCollector)

    ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
    defer cancel()

    go monitor.Start(ctx)

    // Wait for initial collection
    time.Sleep(50 * time.Millisecond)

    // Simulate firewall being disabled
    mockCollector.SetState(&FirewallStatus{Enabled: false})

    // Wait for change detection
    select {
    case change := <-changes:
        assert.Equal(t, "firewall", change.Component)
        assert.Equal(t, "critical", change.Severity)
    case <-time.After(1 * time.Second):
        t.Fatal("Expected change notification not received")
    }
}
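
The change-detection test above depends on a `MockCollector` whose state is swapped while the monitor goroutine is polling. A minimal sketch of such a mock follows, assuming collectors expose `Collect() (interface{}, error)` as in the monitor code; the `SetError` and `Calls` helpers are additions for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// FirewallStatus mirrors the struct used in the test; only Enabled is needed here.
type FirewallStatus struct {
	Enabled bool
}

// MockCollector returns whatever state the test last set. The mutex matters:
// the test goroutine calls SetState while the monitor goroutine calls Collect.
type MockCollector struct {
	mu    sync.Mutex
	state interface{}
	err   error
	calls int
}

// SetState replaces the state returned by later Collect calls.
func (c *MockCollector) SetState(s interface{}) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.state = s
}

// SetError makes later Collect calls fail, to exercise error handling.
func (c *MockCollector) SetError(err error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.err = err
}

// Collect returns the current state and counts the call.
func (c *MockCollector) Collect() (interface{}, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.calls++
	return c.state, c.err
}

// Calls reports how many times Collect has run.
func (c *MockCollector) Calls() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.calls
}

func main() {
	c := &MockCollector{}
	c.SetState(&FirewallStatus{Enabled: true})
	s, _ := c.Collect()
	fmt.Println(s.(*FirewallStatus).Enabled, c.Calls()) // true 1
}
```

Running the monitoring tests with `go test -race` will flag a mock (or a monitor) that skips this locking.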

Common Pitfalls & Debugging

Pitfall 1: Permission Errors on System Queries

Symptom: Agent fails with “operation not permitted” or “access denied.”

Cause: Many security queries require elevated privileges.

Solution:

# Bad: Running as normal user
$ ./device-health-agent
Error: fdesetup: operation not permitted

# Good: Running with appropriate permissions
$ sudo ./device-health-agent
FileVault is On.

# Better: Use capabilities (Linux) instead of full root
$ sudo setcap 'cap_dac_read_search+ep' ./device-health-agent

In Code:

func checkPermissions() error {
    if runtime.GOOS == "darwin" || runtime.GOOS == "linux" {
        if os.Geteuid() != 0 {
            return fmt.Errorf("this agent requires root privileges. Run with sudo")
        }
    }
    return nil
}

Pitfall 2: OS Command Parsing Fragility

Symptom: Agent works on one machine but fails on another.

Cause: Command output format varies between OS versions.

# macOS 13.x
$ fdesetup status
FileVault is On.

# macOS 14.x (hypothetical change)
$ fdesetup status
FileVault: Enabled
Encryption Type: APFS

# An exact string match on the old format breaks!

Solution:

// Bad: Fragile exact match
if output == "FileVault is On." {
    return true
}

// Good: Flexible matching
func isFileVaultEnabled(output string) bool {
    output = strings.ToLower(output)
    return strings.Contains(output, "filevault is on") ||
           strings.Contains(output, "filevault: enabled") ||
           strings.Contains(output, "fully encrypted")
}
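
Where a tool offers machine-readable output, parsing that is usually sturdier than matching human-readable text. As a sketch under stated assumptions: on Linux, `lsblk --json -o NAME,TYPE` prints a JSON device tree in which dm-crypt/LUKS mappings have type `crypt`. The helper names below are hypothetical, and finding any `crypt` device does not prove the root filesystem is encrypted.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// blockDevice matches the fields requested with `lsblk --json -o NAME,TYPE`.
type blockDevice struct {
	Name     string        `json:"name"`
	Type     string        `json:"type"`
	Children []blockDevice `json:"children"`
}

type lsblkOutput struct {
	BlockDevices []blockDevice `json:"blockdevices"`
}

// hasCryptDevice walks the device tree looking for a dm-crypt mapping.
func hasCryptDevice(devs []blockDevice) bool {
	for _, d := range devs {
		if d.Type == "crypt" || hasCryptDevice(d.Children) {
			return true
		}
	}
	return false
}

// parseLsblkEncryption reports whether the lsblk JSON contains any
// encrypted (type "crypt") device.
func parseLsblkEncryption(raw []byte) (bool, error) {
	var out lsblkOutput
	if err := json.Unmarshal(raw, &out); err != nil {
		return false, fmt.Errorf("parse lsblk output: %w", err)
	}
	return hasCryptDevice(out.BlockDevices), nil
}

func main() {
	// Sample output shape; real output comes from the command executor.
	sample := []byte(`{"blockdevices":[{"name":"sda","type":"disk","children":[
		{"name":"sda1","type":"part"},
		{"name":"sda2","type":"part","children":[{"name":"cryptroot","type":"crypt"}]}]}]}`)
	encrypted, err := parseLsblkEncryption(sample)
	fmt.Println(encrypted, err) // true <nil>
}
```

A parse failure here surfaces as an explicit error instead of a silent "not encrypted", which is the safer failure mode for a posture check.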

Pitfall 3: Clock Skew Breaking Signature Verification

Symptom: Valid signatures rejected with “report is too old.”

Cause: Clock skew between agent and PDP.

Solution:

// Allow for clock skew (up to 5 minutes in either direction)
func verifyTimestamp(reportTime time.Time) error {
    now := time.Now()
    skew := 5 * time.Minute

    if reportTime.Before(now.Add(-skew)) {
        return fmt.Errorf("report is too old: %v ago", now.Sub(reportTime))
    }

    if reportTime.After(now.Add(skew)) {
        return fmt.Errorf("report is from the future: %v ahead", reportTime.Sub(now))
    }

    return nil
}

Also consider using NTP and monitoring clock drift.
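
Freshness can also be enforced without trusting the device clock at all: the verifier issues a one-time nonce and measures its lifetime on its own clock. The sketch below illustrates that idea; `NonceStore`, its methods, and the 30-second lifetime are assumptions for illustration rather than part of the project specification.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
	"sync"
	"time"
)

// NonceStore tracks nonces issued by the verifier (PDP). Expiry is measured
// on the verifier's clock only, so agent clock skew is irrelevant.
type NonceStore struct {
	mu     sync.Mutex
	ttl    time.Duration
	issued map[string]time.Time // nonce -> expiry time
	now    func() time.Time     // replaceable for tests
}

func NewNonceStore(ttl time.Duration) *NonceStore {
	return &NonceStore{ttl: ttl, issued: make(map[string]time.Time), now: time.Now}
}

// Issue creates a random 128-bit nonce and records when it expires.
func (s *NonceStore) Issue() (string, error) {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	nonce := hex.EncodeToString(buf)
	s.mu.Lock()
	defer s.mu.Unlock()
	s.issued[nonce] = s.now().Add(s.ttl)
	return nonce, nil
}

// Consume accepts a nonce at most once and only before it expires.
func (s *NonceStore) Consume(nonce string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	expiry, ok := s.issued[nonce]
	if !ok {
		return errors.New("unknown or already used nonce")
	}
	delete(s.issued, nonce) // single use, even if expired
	if s.now().After(expiry) {
		return errors.New("nonce expired")
	}
	return nil
}

func main() {
	store := NewNonceStore(30 * time.Second)
	n, _ := store.Issue()
	fmt.Println(store.Consume(n)) // <nil>
	fmt.Println(store.Consume(n)) // unknown or already used nonce
}
```

A timestamp check with a skew allowance is still useful for reports the agent pushes unprompted; the nonce path covers reports the PDP explicitly requests.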


Pitfall 4: Race Conditions in Continuous Monitoring

Symptom: Duplicate change notifications or missed changes.

Cause: Reading and writing state without proper synchronization.

Solution:

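// Bad: No synchronization
func (m *Monitor) checkForChanges() {
    for name, collector := range m.collectors {
        current, _ := collector.Collect()
        if !reflect.DeepEqual(current, m.previousState[name]) {  // Race: unsynchronized read
            m.onChange(PostureChange{Component: name, CurrentState: current})
            m.previousState[name] = current                       // Race: unsynchronized write
        }
    }
}

// Good: Update shared state under the lock, notify after releasing it
func (m *Monitor) checkForChanges() {
    var changes []PostureChange

    m.mu.Lock()
    for name, collector := range m.collectors {
        current, err := collector.Collect()
        if err != nil {
            continue // skip this round and keep the last known state
        }
        previous := m.previousState[name]
        if !reflect.DeepEqual(current, previous) {
            changes = append(changes, PostureChange{
                Component:     name,
                PreviousState: previous,
                CurrentState:  current,
                Timestamp:     time.Now(),
            })
            m.previousState[name] = current
        }
    }
    m.mu.Unlock()

    // Run callbacks outside the lock so a callback that calls back into
    // the Monitor cannot deadlock
    for _, change := range changes {
        m.onChange(change)
    }
}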

Pitfall 5: Agent Itself Becoming an Attack Vector

Symptom: Attacker uses the agent to exfiltrate data or pivot.

Cause: The agent runs with elevated privileges and exposes an HTTP API.

Solution:

// Bind only to localhost
server := &http.Server{
    Addr: "127.0.0.1:8080",  // NOT 0.0.0.0:8080
}

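// Require authentication for API
func authMiddleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        token := r.Header.Get("X-Agent-Token")
        // Constant-time comparison (crypto/subtle) avoids leaking the token via timing
        if subtle.ConstantTimeCompare([]byte(token), []byte(expectedToken)) != 1 {
            http.Error(w, "Unauthorized", http.StatusUnauthorized)
            return
        }
        next.ServeHTTP(w, r)
    })
}

// Drop privileges after startup
func dropPrivileges() error {
    // On Linux, switch to unprivileged user after binding port
    if runtime.GOOS == "linux" {
        // Drop the group first: once the UID is unprivileged, Setgid would fail.
        // Never ignore these errors, or the agent may silently keep running as root.
        if err := syscall.Setgid(nobodyGid); err != nil {
            return fmt.Errorf("setgid: %w", err)
        }
        if err := syscall.Setuid(nobodyUid); err != nil {
            return fmt.Errorf("setuid: %w", err)
        }
    }
    return nil
}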

Debugging Commands

# Test individual collectors
$ ./device-health-agent --check disk --debug
[DEBUG] Executing: fdesetup status
[DEBUG] Raw output: "FileVault is On.\n"
[DEBUG] Parsed result: {Enabled: true, Algorithm: "AES-XTS"}

# View full report without sending to PDP
$ ./device-health-agent --dry-run --json | jq .

# Verify signature manually
$ ./device-health-agent --verify-key /path/to/public.pem --report report.json

# Monitor system calls (Linux)
$ strace -f ./device-health-agent --check firewall

# Monitor system calls (macOS; dtruss needs root and may be restricted by SIP)
$ sudo dtruss ./device-health-agent --check disk

Extensions & Challenges

Extension 1: TPM Integration

Integrate with the Trusted Platform Module for hardware-backed attestation:

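// Use go-tpm library (this is the legacy API; recent go-tpm releases
// expose it as github.com/google/go-tpm/legacy/tpm2)
import (
    "github.com/google/go-tpm/tpm2"
    "github.com/google/go-tpm/tpmutil"
)

func getTPMAttestation(nonce []byte) (*TPMQuote, error) {
    // Open TPM device
    tpm, err := tpm2.OpenTPM("/dev/tpmrm0")
    if err != nil {
        return nil, err
    }
    defer tpm.Close()

    // Get AIK handle (assumes key already created)
    aikHandle := tpmutil.Handle(0x81010001)

    sel := tpm2.PCRSelection{
        Hash: tpm2.AlgSHA256,
        PCRs: []int{0, 1, 2, 7},  // Boot PCRs
    }

    // Quote PCRs with nonce
    quote, sig, err := tpm2.Quote(
        tpm,
        aikHandle,
        "",     // AIK password
        "",     // unused
        nonce,  // qualifying data: binds the quote to this request
        sel,
        tpm2.AlgNull,
    )
    if err != nil {
        return nil, err
    }

    // Read the quoted PCR values so the verifier can recompute the digest
    pcrValues, err := tpm2.ReadPCRs(tpm, sel)
    if err != nil {
        return nil, err
    }

    return &TPMQuote{
        Quote:     quote,
        Signature: sig,
        PCRs:      pcrValues,
    }, nil
}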

Extension 2: CVE Vulnerability Scanning

Check installed software versions against known CVEs:

import requests
from packaging import version

def check_cves(installed_packages: dict) -> list:
    """Check packages against NVD (National Vulnerability Database)."""

    vulnerabilities = []

    for package, pkg_version in installed_packages.items():
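        # Query NVD API (simplified; unauthenticated requests are rate-limited)
        response = requests.get(
            "https://services.nvd.nist.gov/rest/json/cves/2.0",
            params={
                "keywordSearch": package,
                "resultsPerPage": 20
            },
            timeout=30,
        )
        response.raise_for_status()  # fail loudly instead of parsing an error page

        cves = response.json().get("vulnerabilities", [])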

        for cve in cves:
            cve_data = cve.get("cve", {})
            # Check if our version is affected
            if is_version_affected(pkg_version, cve_data):
                vulnerabilities.append({
                    "cve_id": cve_data.get("id"),
                    "package": package,
                    "severity": get_cvss_severity(cve_data),
                    "description": cve_data.get("descriptions", [{}])[0].get("value")
                })

    return vulnerabilities

Extension 3: EDR Integration

Integrate with endpoint detection tools like CrowdStrike, Carbon Black, or osquery:

// Query osquery for security posture (uses github.com/osquery/osquery-go)
func queryOsquery() (map[string]interface{}, error) {
    // Connect to the osquery extension socket
    client, err := osquery.NewClient("/var/osquery/osquery.em", 10*time.Second)
    if err != nil {
        return nil, err
    }
    defer client.Close()

    // Check for processes running from unusual locations
    // (a heuristic: expect false positives)
    resp, err := client.Query(`
        SELECT name, path, pid
        FROM processes
        WHERE path NOT LIKE '/usr/%'
        AND path NOT LIKE '/System/%'
        AND path NOT LIKE '/Applications/%'
    `)
    if err != nil {
        return nil, err
    }
    if resp.Status.Code != 0 {
        return nil, fmt.Errorf("osquery error: %s", resp.Status.Message)
    }

    return map[string]interface{}{
        "suspicious_processes": resp.Response,
    }, nil
}

Extension 4: Mobile Device Support (iOS/Android)

Extend the agent concept to mobile devices:

// iOS: device integrity check using Apple's DeviceCheck framework
import DeviceCheck

class iOSPostureChecker {
    func checkDeviceIntegrity(completion: @escaping (DevicePosture) -> Void) {
        let currentDevice = DCDevice.current

        // Check if device supports DeviceCheck
        guard currentDevice.isSupported else {
            completion(DevicePosture(trusted: false, reason: "DeviceCheck not supported"))
            return
        }

        // Generate device token
        currentDevice.generateToken { token, error in
            guard let token = token else {
                // Always report a result; otherwise the caller waits forever
                completion(DevicePosture(trusted: false, reason: "Token generation failed"))
                return
            }

            // Send token to your server for validation
            self.validateToken(token) { isValid in
                completion(DevicePosture(
                    trusted: isValid,
                    jailbroken: self.checkJailbreak(),
                    passcodeSet: self.checkPasscode()
                ))
            }
        }
    }

    private func checkJailbreak() -> Bool {
        let jailbreakPaths = [
            "/Applications/Cydia.app",
            "/private/var/lib/apt",
            "/usr/sbin/sshd"
        ]
        return jailbreakPaths.contains { FileManager.default.fileExists(atPath: $0) }
    }
}

Extension 5: Behavioral Anomaly Detection

Add behavioral monitoring to detect compromised devices:

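from collections import deque
from statistics import mean, stdev

class BehavioralMonitor:
    def __init__(self, window_size: int = 100, normal_connection_rate: float = 10.0):
        self.network_connections = deque(maxlen=window_size)
        self.process_starts = deque(maxlen=window_size)
        self.file_accesses = deque(maxlen=window_size)
        # Baseline connections per minute used by detect_anomalies().
        # 10.0 is a placeholder; measure a real baseline for your fleet.
        self.normal_connection_rate = normal_connection_rate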

    def record_event(self, event_type: str, data: dict):
        if event_type == "network":
            self.network_connections.append(data)
        elif event_type == "process":
            self.process_starts.append(data)
        elif event_type == "file":
            self.file_accesses.append(data)

        # Check for anomalies
        return self.detect_anomalies()

    def detect_anomalies(self) -> list:
        anomalies = []

        # Check for unusual number of outbound connections
        if len(self.network_connections) > 10:
            connections_per_minute = self.calculate_rate(self.network_connections)
            if connections_per_minute > self.normal_connection_rate * 3:
                anomalies.append({
                    "type": "excessive_network",
                    "rate": connections_per_minute,
                    "severity": "warning"
                })

        # Check for processes from unusual locations
        for proc in list(self.process_starts)[-10:]:
            if self.is_suspicious_path(proc.get("path")):
                anomalies.append({
                    "type": "suspicious_process",
                    "path": proc.get("path"),
                    "severity": "critical"
                })

        return anomalies

Books That Will Help

Primary Reading

| Book | Author | Relevant Chapters |
|------|--------|-------------------|
| Zero Trust Security | Andravous | Ch. 5 (Device Trust), Ch. 7 (Continuous Verification) |
| Zero Trust Networks | Evan Gilman, Doug Barth | Ch. 4 (Device Trust), Ch. 8 (Endpoint Security) |
| Security in Computing | Charles Pfleeger | Ch. 3 (Authentication), Ch. 6 (Operating Systems) |

Secondary Reading

| Book | Author | Why It Helps |
|------|--------|--------------|
| The Linux Programming Interface | Michael Kerrisk | System programming fundamentals |
| Serious Cryptography, 2nd Edition | Jean-Philippe Aumasson | Understanding signatures and attestation |
| Practical Binary Analysis | Dennis Andriesse | Low-level system inspection techniques |
| macOS Internals | Jonathan Levin | macOS security architecture |
| Windows Internals, 7th Edition | Russinovich et al. | Windows security subsystems |

Standards and Specifications

| Document | Source | Content |
|----------|--------|---------|
| NIST SP 800-207 | NIST | Zero Trust Architecture definition |
| NIST SP 800-123 | NIST | Guide to General Server Security |
| CIS Benchmarks | CIS | Security configuration baselines |
| TCG TPM 2.0 Specification | TCG | Hardware attestation |

Interview Questions

After completing this project, you should be able to answer:

Conceptual Questions

  1. “Why is device trust important in Zero Trust Architecture?”

    A valid user on a compromised device is still a threat. ZTA requires verifying both identity AND device health before granting access. A stolen laptop with valid credentials should not have the same access as a healthy corporate device.

  2. “What’s the difference between software-based and hardware-based attestation?”

    Software attestation relies on agent-collected data and software-held keys: it is simpler, but a compromised OS can lie or leak the key. Hardware attestation uses the TPM to measure and sign boot state: it is more secure, but more complex. Because measurement starts in firmware before the OS loads, a compromised kernel can’t forge those measurements.

  3. “How do you prevent an attacker from forging a healthy device report?”

    Sign reports with a device-specific private key that only the legitimate agent possesses. Include a nonce from the verifier to prevent replay attacks. Include timestamps to detect stale reports. For maximum security, use TPM-backed keys that cannot be extracted even with root access.

  4. “What’s the advantage of continuous verification over point-in-time checks?”

    Point-in-time checks can be bypassed by changing device state after authentication. Continuous monitoring detects changes in real time and can revoke access immediately. This shrinks the window between compromise and access revocation from hours to seconds.

  5. “How would you handle BYOD devices in a Zero Trust model?”

    Apply a lower base trust score, limit access to less sensitive resources, require more frequent re-authentication, use containerization for work data, and collect only security-relevant telemetry to respect privacy.
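
The answer to question 3 (signed reports bound to a verifier nonce) can be sketched in a few lines. This is an illustration under assumptions, not the agent's actual wire format: `SignedReport`, `envelope`, `generateDeviceKey`, `signReport`, and `verifyReport` are hypothetical names, and the key lives in memory here rather than in a TPM or keychain.

```go
package main

import (
	"crypto/ed25519"
	"encoding/json"
	"errors"
	"fmt"
	"time"
)

// envelope is what actually gets signed: posture data plus freshness fields.
type envelope struct {
	Report    map[string]bool `json:"report"`
	Nonce     string          `json:"nonce"`
	Timestamp int64           `json:"timestamp"`
}

// SignedReport carries the exact signed bytes, so the verifier never has to
// re-serialize JSON and risk a different byte ordering.
type SignedReport struct {
	Payload   []byte
	Signature []byte
}

// generateDeviceKey creates an Ed25519 key pair using crypto/rand.
func generateDeviceKey() (ed25519.PublicKey, ed25519.PrivateKey, error) {
	return ed25519.GenerateKey(nil)
}

// signReport binds the report to the verifier's nonce and the current time.
func signReport(priv ed25519.PrivateKey, report map[string]bool, nonce string) (*SignedReport, error) {
	payload, err := json.Marshal(envelope{Report: report, Nonce: nonce, Timestamp: time.Now().Unix()})
	if err != nil {
		return nil, err
	}
	return &SignedReport{Payload: payload, Signature: ed25519.Sign(priv, payload)}, nil
}

// verifyReport checks the signature first, then the nonce the verifier issued.
func verifyReport(pub ed25519.PublicKey, sr *SignedReport, wantNonce string) (map[string]bool, error) {
	if !ed25519.Verify(pub, sr.Payload, sr.Signature) {
		return nil, errors.New("signature verification failed")
	}
	var env envelope
	if err := json.Unmarshal(sr.Payload, &env); err != nil {
		return nil, err
	}
	if env.Nonce != wantNonce {
		return nil, errors.New("nonce mismatch: possible replay")
	}
	return env.Report, nil
}

func main() {
	pub, priv, err := generateDeviceKey()
	if err != nil {
		panic(err)
	}
	sr, _ := signReport(priv, map[string]bool{"firewall": true}, "nonce-1")
	_, err = verifyReport(pub, sr, "nonce-1")
	fmt.Println(err) // <nil>
	_, err = verifyReport(pub, sr, "nonce-2")
	fmt.Println(err) // nonce mismatch: possible replay
}
```

Verifying the signature before parsing the payload means untrusted bytes never reach the JSON decoder unless they came from the holder of the device key.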

Technical Questions

  1. “How would you query disk encryption status on macOS?”

    Use fdesetup status, which prints “FileVault is On.” or “FileVault is Off.” Parse the output to determine the enabled state. For more detail, fdesetup status -extended prints additional information such as conversion progress; check the output format on your macOS version before parsing it.

  2. “What are Platform Configuration Registers (PCRs) in a TPM?”

    PCRs are special registers that store cryptographic measurements of the boot process. They can only be extended (new_value = hash(old_value || measurement)), not overwritten. This creates a chain of trust from firmware through bootloader to kernel. Standard PCRs include 0 (BIOS), 4 (bootloader), and 7 (Secure Boot policy).

  3. “How would you detect that a firewall was just disabled?”

    Poll the firewall status periodically (e.g., every 60 seconds) or use OS event APIs for immediate notification. When a change is detected, generate an updated health report, recalculate the trust score, and send the report to the PDP for access re-evaluation.

  4. “Why use Ed25519 for report signing instead of RSA?”

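    Ed25519 provides roughly 128-bit security with 256-bit keys (RSA needs about 3072 bits for the same level), fast key generation and signing, compact 64-byte signatures, deterministic signatures (no per-signature random nonce, unlike ECDSA), and a design that resists timing attacks. It’s also simpler to use correctly, with no padding schemes or parameters to choose.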

  5. “How would you integrate this agent with the Policy Decision Point from Project 2?”

    The agent exposes a /health endpoint or pushes reports to a PDP endpoint. Reports include structured posture data and a trust score. The PDP stores the device public key, verifies report signatures, and uses the trust score as an input to access decisions. The PDP can request fresh reports with a nonce to prevent replay.
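
The PCR extend rule from question 2 is easy to model in software. The sketch below simulates a SHA-256 PCR bank to show why the final value depends on every measurement and on their order; it does not talk to a real TPM, and the event names are made up.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// extendPCR models the TPM extend operation for a SHA-256 bank:
// new_value = SHA256(old_value || measurement).
func extendPCR(old, measurement [32]byte) [32]byte {
	buf := make([]byte, 0, 64)
	buf = append(buf, old[:]...)
	buf = append(buf, measurement[:]...)
	return sha256.Sum256(buf)
}

// replayLog recomputes a PCR from an event log, the way a verifier checks
// that a quoted PCR value matches the boot events the device claims.
func replayLog(events [][]byte) [32]byte {
	var pcr [32]byte // static boot PCRs start as all zeros
	for _, e := range events {
		pcr = extendPCR(pcr, sha256.Sum256(e))
	}
	return pcr
}

func main() {
	good := replayLog([][]byte{[]byte("firmware"), []byte("bootloader"), []byte("kernel")})
	swapped := replayLog([][]byte{[]byte("bootloader"), []byte("firmware"), []byte("kernel")})

	fmt.Println(hex.EncodeToString(good[:]))
	fmt.Println(good == swapped) // false: order changes the result
}
```

Because there is no operation that sets a PCR to a chosen value, malware that loads later in the boot chain can only add measurements; it cannot erase the ones that recorded it.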


Self-Assessment Checklist

Before considering this project complete, verify your understanding:

Conceptual Understanding

  • Can you explain why device trust is a pillar of Zero Trust Architecture?
  • Can you describe at least 5 security posture checks and why each matters?
  • Can you explain the difference between software and hardware attestation?
  • Can you articulate why trust scores are better than binary decisions?
  • Can you explain the security benefits of continuous vs point-in-time verification?

Implementation Skills

  • Can you query disk encryption status on at least two operating systems?
  • Can you query firewall status on at least two operating systems?
  • Can you determine OS patch level and days since last update?
  • Can you generate and sign a structured health report?
  • Can you verify a signed report on the receiving end?

Cross-Platform Development

  • Does your agent work on macOS?
  • Does your agent work on Linux (at least Ubuntu)?
  • Do you have a clear abstraction layer for OS-specific code?
  • Can you add Windows support without major refactoring?

Security Considerations

  • Are your reports signed with strong cryptography (Ed25519 or similar)?
  • Do you include timestamps and nonces to prevent replay attacks?
  • Does your agent require appropriate permissions without running as root unnecessarily?
  • Is your local HTTP API bound to localhost only?
  • Have you considered how an attacker might try to forge healthy reports?

Real-World Readiness

  • Can your agent run as a background service/daemon?
  • Does it handle system sleep/wake cycles gracefully?
  • Does it recover from temporary errors (network issues, permission changes)?
  • Can it be configured without recompiling (config file, env vars)?
  • Could you deploy this to a fleet of 1000 devices?

Integration Capability

  • Does your report format match what a PDP would expect?
  • Can you extend the agent with new collectors without major changes?
  • Is the trust scoring model configurable?
  • Could this integrate with your Project 2 Policy Decision Engine?
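
One way to keep the trust scoring model configurable is to treat it as data: a list of weighted checks plus tier thresholds. The sketch below shows the idea; the `Check` type, the weights, and the 90/60 thresholds are illustrative assumptions, not values prescribed by this guide.

```go
package main

import "fmt"

// Check is one posture result with a configurable weight.
type Check struct {
	Name   string
	Weight int
	Passed bool
}

// trustScore returns the weighted share of passed checks on a 0-100 scale.
func trustScore(checks []Check) int {
	total, earned := 0, 0
	for _, c := range checks {
		if c.Weight <= 0 {
			continue // ignore disabled or misconfigured checks
		}
		total += c.Weight
		if c.Passed {
			earned += c.Weight
		}
	}
	if total == 0 {
		return 0 // no evidence means no trust
	}
	return earned * 100 / total
}

// accessTier maps a score to an access level instead of a yes/no answer.
func accessTier(score int) string {
	switch {
	case score >= 90:
		return "full"
	case score >= 60:
		return "limited"
	default:
		return "deny"
	}
}

func main() {
	checks := []Check{
		{Name: "disk_encryption", Weight: 40, Passed: true},
		{Name: "firewall", Weight: 30, Passed: false},
		{Name: "os_patched", Weight: 30, Passed: true},
	}
	score := trustScore(checks)
	fmt.Println(score, accessTier(score)) // 70 limited
}
```

Loading the weights and thresholds from a config file lets the PDP owner tune policy (for example, a lower ceiling for BYOD devices) without rebuilding the agent.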

The Core Question You’ve Answered

“How do I know the device connecting to my system is secure, and not a compromised laptop pretending to be trusted?”

This is THE fundamental question of endpoint security in Zero Trust. By building this device health agent, you have mastered:

  1. System Introspection: Querying OS state to determine security posture
  2. Cross-Platform Programming: Abstracting OS-specific APIs behind clean interfaces
  3. Cryptographic Attestation: Signing reports so they cannot be forged
  4. Continuous Security: Monitoring for changes and reacting in real-time
  5. Risk-Based Decisions: Moving beyond binary trust to nuanced scoring

You now understand that in Zero Trust, the question is never “Is this user authorized?” but rather “Is this user, on this device, at this time, in this context, authorized for this specific action?”

Your device health agent is one critical input to that complex decision.


Project Guide Version 1.0 - December 2025