Project 5: Device Trust & Health Attestation
The Core Question: "How do I know the device connecting to my system is secure, and not a compromised laptop pretending to be trusted?"
Table of Contents
- Learning Objectives
- Deep Theoretical Foundation
- Complete Project Specification
- Real World Outcome
- The Core Question You're Answering
- Concepts You Must Understand First
- Questions to Guide Your Design
- Thinking Exercise
- Hints in Layers
- Solution Architecture
- Phased Implementation Guide
- Testing Strategy
- Common Pitfalls & Debugging
- Extensions & Challenges
- Books That Will Help
- Interview Questions
- Self-Assessment Checklist
Project Overview
| Attribute | Value |
|---|---|
| Difficulty | Level 2: Intermediate |
| Time Estimate | Weekend (8-16 hours) |
| Primary Language | Go or Python |
| Alternative Languages | Swift (macOS), PowerShell (Windows) |
| Knowledge Area | Endpoint Security / Operating Systems |
| Software/Tools | OS APIs (Disk encryption, Firewall, Patch level) |
| Main Book | "Zero Trust Security" by Andravous |
Learning Objectives
By completing this project, you will master:
- Device Trust Fundamentals: Understand why Zero Trust requires device verification, not just user authentication. Learn that a valid user on a compromised device is still a threat.
- Endpoint Posture Assessment: Query operating system state to determine security configuration: disk encryption status, firewall state, patch level, and antivirus presence.
- Cross-Platform System Programming: Use OS-specific APIs and commands (Linux, macOS, Windows) to gather security telemetry. Understand why abstraction layers are necessary.
- Cryptographic Report Signing: Sign health reports so that malicious software cannot forge a "healthy" status. Understand the role of device keys in attestation.
- Continuous vs Point-in-Time Verification: Implement real-time monitoring that detects security posture changes and can trigger access revocation immediately.
- Trust Score Calculation: Move beyond binary "trusted/untrusted" to nuanced scoring that allows risk-based access decisions.
- Integration with Policy Decision Points: Design your agent's output to be consumable by the Policy Decision Point (PDP) you built in Project 2.
Deep Theoretical Foundation
Device Trust in Zero Trust Architecture
In traditional perimeter security, if you were on the corporate network, your device was implicitly trusted. Zero Trust Architecture (ZTA) fundamentally rejects this assumption.
Traditional Model:
==================
Internet
|
[Firewall]
|
"Trust Zone"
|
All devices trusted by network location
Zero Trust Model:
=================
Internet
|
[Identity-Aware Proxy]
|
Per-request verification:
1. Is this user authenticated?
2. Is this DEVICE healthy?
3. Is this request context normal?
4. Is this resource appropriate?
|
Access granted (or denied)
The Key Insight: In ZTA, you don't trust a USER; you trust the combination of:
- A verified identity (from Project 1)
- A healthy device (this project)
- An appropriate context (time, location, behavior)
- A specific resource request
This is called "compound identity" or "context-aware access."
NIST SP 800-207 makes the same point in its discussion of the trust algorithm (Section 3.3): an access decision should consider both the identity of the requester and the state of the device making the request.
Why Device Trust Matters
Consider this attack scenario:
Attacker Strategy:
==================
1. Compromise employee's personal laptop (via phishing malware)
2. Wait for employee to VPN into corporate network
3. Use employee's valid credentials to access sensitive data
4. Exfiltrate data through the VPN tunnel
Traditional security sees:
- Valid username: alice@company.com [CHECK]
- Valid password: ******** [CHECK]
- Valid MFA token: 123456 [CHECK]
- VPN connection: Established [CHECK]
Result: ACCESS GRANTED (but laptop is compromised!)
---
With Device Trust:
==================
1. Attacker compromises laptop
2. Employee attempts VPN connection
3. Device Health Agent reports:
- Disk encryption: ENABLED [CHECK]
- Firewall: DISABLED [FAIL] - attacker disabled it
- OS patch level: 47 days old [WARN]
- Unknown process: rat.exe [CRITICAL]
- Antivirus: Definitions 30 days old [WARN]
Trust Score: 23/100 (Below threshold of 70)
Result: ACCESS DENIED, Security team notified
Endpoint Posture Checks: What to Measure
A comprehensive device health assessment examines multiple dimensions:
+------------------------------------------------------------------+
| DEVICE HEALTH DIMENSIONS |
+------------------------------------------------------------------+
| |
| 1. DISK ENCRYPTION |
| - Full disk encryption enabled? |
| - Recovery key escrowed? |
| - Encryption algorithm strength? |
| |
| 2. FIREWALL STATUS |
| - Host-based firewall enabled? |
| - Inbound connections blocked? |
| - Outbound filtering configured? |
| |
| 3. OPERATING SYSTEM |
| - Current patch level? |
| - Days since last update? |
| - Known CVEs affecting this version? |
| |
| 4. ANTIVIRUS/EDR |
| - Antivirus present? |
| - Real-time scanning enabled? |
| - Definition age? |
| |
| 5. SECURE BOOT |
| - Secure Boot enabled? |
| - Boot chain verified? |
| - UEFI firmware version? |
| |
| 6. RUNNING PROCESSES |
| - Any unauthorized processes? |
| - Suspicious network connections? |
| - Elevated privilege usage? |
| |
| 7. DEVICE IDENTITY |
| - Hardware serial number match? |
| - TPM attestation valid? |
| - Device registered in inventory? |
| |
+------------------------------------------------------------------+
TPM and Hardware Roots of Trust
A Trusted Platform Module (TPM) is a hardware security chip that provides cryptographic operations and secure key storage. It's fundamental to hardware-based device attestation.
TPM Architecture:
=================
+------------------------------------------------------------------+
| TRUSTED PLATFORM MODULE |
+------------------------------------------------------------------+
| |
| +------------------+ +------------------+ |
| | Endorsement | | Storage Root | |
| | Key (EK) | | Key (SRK) | |
| | (Unique to | | (User- | |
| | this TPM) | | controlled) | |
| +------------------+ +------------------+ |
| |
| +------------------+ +------------------+ |
| | Platform | | Attestation | |
| | Configuration | | Identity Key | |
| | Registers | | (AIK) | |
| | (PCRs) | | | |
| +------------------+ +------------------+ |
| |
| PCR Values: |
| PCR[0] = Hash(BIOS/UEFI firmware) |
| PCR[1] = Hash(BIOS configuration) |
| PCR[2] = Hash(Option ROMs) |
| PCR[3] = Hash(Option ROM configuration) |
| PCR[4] = Hash(MBR/bootloader) |
| PCR[5] = Hash(MBR/bootloader configuration) |
| PCR[7] = Hash(Secure Boot policy) |
| ... |
| |
+------------------------------------------------------------------+
| |
| Operations: |
| - TPM_Extend(PCR, data) - Add measurement to PCR |
| - TPM_Quote(PCRs, AIK) - Sign current PCR values |
| - TPM_Seal(data, PCRs) - Encrypt data, only decrypt if PCRs |
| match specified values |
| |
+------------------------------------------------------------------+
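The extend operation can be illustrated in software. The following is a minimal sketch, assuming a SHA-256 PCR bank that starts as 32 zero bytes; the names `pcr_extend` and `replay_log` are illustrative and do not come from any TPM library. A real event log records the digest of each component rather than the component itself, so treat the raw byte strings here as stand-ins.

```python
import hashlib

PCR_SIZE = 32  # bytes in a SHA-256 PCR bank (assumption for this sketch)


def pcr_extend(pcr: bytes, data: bytes) -> bytes:
    """Fold a new measurement into a PCR: new = SHA256(old || SHA256(data))."""
    measurement = hashlib.sha256(data).digest()
    return hashlib.sha256(pcr + measurement).digest()


def replay_log(events: list[bytes]) -> bytes:
    """Recompute the final PCR value from an ordered list of measured blobs."""
    pcr = bytes(PCR_SIZE)  # the register starts as all zeros
    for event in events:
        pcr = pcr_extend(pcr, event)
    return pcr
```

Because each new value depends on the previous one, a PCR can only be extended, never set directly. Replaying the same events in the same order reproduces the value, while a single changed or reordered stage produces a different result. This is what lets a verifier compare a replayed boot log against the quoted PCRs.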
How TPM Attestation Works:
Remote Attestation Flow:
========================
Device Verifier (PDP)
====== ==============
1. PDP requests attestation with nonce
<--- Send nonce (random challenge)
2. Device gathers measurements
- Boot log
- PCR values
- Current state
3. TPM signs PCR values + nonce with AIK
[PCR0..PCR7 | nonce] ---> TPM ---> [Signature]
4. Send signed quote + boot log
---> [Quote + Signature + Boot Log]
5. Verify AIK signature
Replay boot log to compute expected PCRs
Compare computed vs received PCRs
6. <--- Trust decision
Why the nonce? Prevents replay attacks--attacker can't
send yesterday's "healthy" quote today.
For This Project: While full TPM attestation is complex (and platform-specific), we'll implement a simplified version using asymmetric cryptography. The agent will have a private key, and the PDP will verify signatures using the corresponding public key.
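The freshness half of the flow can be sketched on the verifier side without any cryptography. This is one possible approach, assuming the PDP keeps issued nonces in memory; the class name `NonceRegistry`, the 30-second lifetime, and the injectable clock are all illustrative choices.

```python
import secrets
import time


class NonceRegistry:
    """Tracks challenges the verifier has issued and accepts each at most once."""

    def __init__(self, ttl_seconds: float = 30.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._issued = {}    # nonce -> time it was issued

    def issue(self) -> str:
        """Create a fresh random challenge and remember when it was sent."""
        nonce = secrets.token_hex(16)  # 128 bits of randomness
        self._issued[nonce] = self._clock()
        return nonce

    def consume(self, nonce: str) -> bool:
        """Return True only for a known, unexpired, not-yet-used nonce."""
        issued_at = self._issued.pop(nonce, None)  # pop makes the nonce single-use
        if issued_at is None:
            return False
        return (self._clock() - issued_at) <= self._ttl
```

A report is accepted only if `consume` returns True for the nonce it carries and the signature over the report (including that nonce) verifies. Replaying an old report fails the first check because its nonce has already been removed.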
Attestation Concepts
Attestation is the process of providing evidence about a system's state to a remote verifier. There are several models:
Attestation Models:
===================
1. SOFTWARE-BASED ATTESTATION
- Agent collects state using OS APIs
- Signs report with software key
- WEAKNESS: Compromised OS can lie
- GOOD FOR: Deployment simplicity, most threats
2. HARDWARE-BASED ATTESTATION (TPM)
- TPM measures boot process
- Measurements stored in PCRs
- TPM signs quote
- STRENGTH: Kernel can't forge boot measurements
- WEAKNESS: Expensive, complex, not universal
3. HYBRID ATTESTATION
- TPM attests boot integrity
- Software agent attests runtime state
- Best of both worlds
- MOST REALISTIC for enterprises
For this project, we implement SOFTWARE-BASED ATTESTATION
with strong signing, preparing for TPM integration later.
Key Attestation Properties:
| Property | Description | Our Implementation |
|---|---|---|
| Freshness | Report is recent, not replayed | Include timestamp + nonce |
| Authenticity | Report comes from claimed device | Sign with device private key |
| Integrity | Report hasn't been modified | Signature covers all fields |
| Non-repudiation | Device can't deny sending report | Asymmetric signature |
Trust Scores vs Binary Decisions
Traditional access control is binary: ALLOW or DENY. Zero Trust often uses continuous trust scores that enable nuanced decisions:
Binary Decision:
================
Firewall disabled?
--> DENY ACCESS
Result: One issue locks the user out entirely
Trust Score Decision:
=====================
Base score: 50
Firewall disabled: -20 points
OS patch level: -10 points
Antivirus current: +10 points
Known device: +30 points
---------------------------
Total: 60/100
Policy rules:
- Score < 50: DENY all access
- Score 50-69: Read-only access
- Score 70-90: Normal access
- Score > 90: Full access
Result: One issue doesn't lock the user out entirely; access is degraded
to read-only, but the user can still work
Trust Score Components:
# Example scoring model
TRUST_COMPONENTS = {
"disk_encryption": {
"enabled": +20,
"disabled": -30,
"unknown": -10
},
"firewall": {
"enabled": +15,
"disabled": -25,
"unknown": -5
},
"os_patch_level": {
"current": +15, # Within 7 days
"slightly_outdated": +5, # 8-30 days
"outdated": -10, # 31-90 days
"critically_outdated": -30 # More than 90 days
},
"antivirus": {
"present_current": +15,
"present_outdated": +5,
"not_present": -20
},
"known_device": {
"registered": +25,
"unregistered": -15
},
"secure_boot": {
"enabled": +10,
"disabled": -5,
"unsupported": 0
}
}
# Base score: 50 (neutral)
# Raw range with this table: -75 (50 - 125) to 150 (50 + 100)
# Final score is clamped to 0-100
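A table like this still needs a function that applies it. The sketch below shows one way to do that, assuming a base score of 50 and a clamp to the 0-100 range. The names `POINTS` and `trust_score` are illustrative, and only three components are included to keep the example short.

```python
BASE_SCORE = 50

# Trimmed-down copy of the scoring table, for illustration only.
POINTS = {
    "disk_encryption": {"enabled": 20, "disabled": -30, "unknown": -10},
    "firewall": {"enabled": 15, "disabled": -25, "unknown": -5},
    "known_device": {"registered": 25, "unregistered": -15},
}


def trust_score(observed: dict, points: dict = POINTS, base: int = BASE_SCORE) -> int:
    """Sum the points for each observed state and clamp the result to 0-100."""
    total = base
    for component, state in observed.items():
        # A KeyError here means a collector reported a state the table does not cover.
        total += points[component][state]
    return max(0, min(100, total))
```

For instance, `trust_score({"firewall": "disabled"})` starts at 50 and subtracts 25. Letting an unmapped state raise an error instead of silently scoring zero is a deliberate choice in this sketch: a missing table entry is a bug worth surfacing.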
Continuous vs Point-in-Time Verification
Point-in-Time: Check device health once (at login, at VPN connection).
Continuous: Constantly monitor device health and react to changes.
Point-in-Time Verification:
===========================
08:00 - User logs in
Health check: PASS
Access granted
10:00 - User disables firewall
(No verification)
12:00 - Attacker exfiltrates data
(Still has access)
16:00 - User logs out
Problem: 6-hour window of vulnerable access
---
Continuous Verification:
========================
08:00 - User logs in
Health check: PASS
Access granted
Monitoring begins
10:00 - User disables firewall
Agent detects change
Sends updated health report
PDP recalculates trust score
Score drops below threshold
Access revoked
10:01 - User attempts data access
ACCESS DENIED
10:05 - User re-enables firewall
Agent detects improvement
Sends updated report
Access restored
Advantage: Real-time response to security changes
Implementation Approaches:
Polling Model:
==============
while True:
    report = collect_posture()
    send_to_pdp(report)
    sleep(60)  # Every minute
Pros: Simple to implement
Cons: Delay in detection, wasted resources
Event-Driven Model:
===================
register_event_handlers(
    on_firewall_change,
    on_encryption_change,
    on_patch_installed,
    on_process_started
)
Pros: Immediate detection
Cons: OS-specific, complex
For this project: Start with polling,
then add event-driven for bonus points
BYOD Considerations
Bring Your Own Device (BYOD) policies introduce unique device trust challenges:
Corporate Device:
=================
- Company controls configuration
- Can enforce encryption
- Can install EDR
- Full visibility
- Easier to attest
Personal Device (BYOD):
=======================
- User controls configuration
- Can only REQUEST encryption
- Cannot mandate EDR
- Privacy concerns
- Harder to attest
BYOD Trust Strategies:
======================
1. CONTAINER APPROACH
- Work apps in isolated container
- Container enforces policies
- Personal apps unrestricted
2. RISK-BASED ACCESS
- Lower trust score for BYOD
- Limited access to sensitive data
- More frequent re-authentication
3. MANAGED APPS ONLY
- Work through web browser
- No local data storage
- Minimal device requirements
Our Agent's BYOD Mode:
======================
- Collect only security-relevant data
- No personal app inspection
- Report device type: "personal" vs "corporate"
- Let PDP apply appropriate policy
Complete Project Specification
Functional Requirements
Core Features (Must Have):
| Feature | Description | Priority |
|---|---|---|
| Disk encryption status | Query FileVault (macOS), BitLocker (Windows), LUKS (Linux) | P0 |
| Firewall status | Query system firewall enabled/disabled state | P0 |
| OS patch level | Determine days since last OS update | P0 |
| Health report generation | Create structured JSON report | P0 |
| Report signing | Sign reports with device private key | P0 |
| HTTP endpoint | Expose health status via local API | P1 |
| Continuous monitoring | Detect posture changes in real-time | P1 |
| PDP integration | Send reports to Policy Decision Point | P2 |
Health Report Schema:
{
"report_version": "1.0",
"device_id": "device-unique-identifier",
"timestamp": "2025-12-27T10:30:00Z",
"nonce": "random-string-from-pdp",
"posture": {
"disk_encryption": {
"enabled": true,
"algorithm": "AES-256-XTS",
"recovery_key_escrowed": true
},
"firewall": {
"enabled": true,
"profile": "public"
},
"os": {
"name": "macOS",
"version": "14.2.1",
"last_update": "2025-12-20T00:00:00Z",
"days_since_update": 7,
"known_cves": []
},
"antivirus": {
"present": true,
"product": "CrowdStrike Falcon",
"definitions_age_days": 1
},
"secure_boot": {
"enabled": true,
"mode": "full"
}
},
"trust_score": 87,
"issues": [
{"severity": "warning", "message": "OS update available"}
],
"signature": "base64-encoded-signature"
}
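Derived fields such as `days_since_update` can be recomputed by whoever receives the report instead of being taken on trust. A minimal sketch, assuming timestamps are ISO 8601 strings with a trailing `Z` as in the schema above; the helper names are illustrative.

```python
from datetime import datetime


def parse_utc(stamp: str) -> datetime:
    """Parse an ISO 8601 timestamp that ends in 'Z' (UTC)."""
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


def days_since_update(report: dict) -> int:
    """Whole days between the last OS update and the report timestamp."""
    reported_at = parse_utc(report["timestamp"])
    last_update = parse_utc(report["posture"]["os"]["last_update"])
    return (reported_at - last_update).days


def is_consistent(report: dict) -> bool:
    """Check that the claimed days_since_update matches the two timestamps."""
    return report["posture"]["os"]["days_since_update"] == days_since_update(report)
```

Run against the sample report, the two timestamps are 7 days and 10.5 hours apart, which truncates to the 7 that the report claims.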
Non-Functional Requirements
| Requirement | Target | Rationale |
|---|---|---|
| Collection time | < 2 seconds | Agent shouldn't slow down login |
| Memory usage | < 50 MB | Background process |
| CPU usage | < 1% average | Shouldn't drain battery |
| Report size | < 10 KB | Network efficiency |
| Update frequency | Configurable (default 60s) | Balance freshness vs overhead |
Real World Outcome
When your agent runs, here's what you'll see:
Starting the Agent
$ sudo ./device-health-agent --config /etc/device-health/config.yaml
[2025-12-27T10:30:00Z] Device Health Agent v1.0.0
[2025-12-27T10:30:00Z] Loading configuration from /etc/device-health/config.yaml
[2025-12-27T10:30:00Z] Device ID: device-abc123
[2025-12-27T10:30:00Z] PDP endpoint: https://pdp.company.com/device-health
[2025-12-27T10:30:00Z] Monitoring interval: 60 seconds
[2025-12-27T10:30:00Z] Starting initial posture collection...
+------------------------------------------------------------------+
| DEVICE HEALTH REPORT |
+------------------------------------------------------------------+
| |
| Device ID: device-abc123 |
| Timestamp: 2025-12-27T10:30:01Z |
| Trust Score: 87/100 [TRUSTED] |
| |
+------------------------------------------------------------------+
| POSTURE CHECKS |
+------------------------------------------------------------------+
| |
| [PASS] Disk Encryption FileVault enabled (AES-256-XTS) |
| [PASS] Firewall Enabled (Stealth mode active) |
| [WARN] OS Patch Level 7 days since last update |
| [PASS] Antivirus CrowdStrike Falcon (defs: 1 day old) |
| [PASS] Secure Boot Enabled (Full security) |
| |
+------------------------------------------------------------------+
| ISSUES |
+------------------------------------------------------------------+
| |
| [WARNING] OS update available (macOS 14.2.2) |
| |
+------------------------------------------------------------------+
[2025-12-27T10:30:01Z] Health report signed with device key
[2025-12-27T10:30:01Z] Report sent to PDP: 200 OK
[2025-12-27T10:30:01Z] PDP response: {"access_level": "full", "message": "Device trusted"}
[2025-12-27T10:30:01Z] Entering continuous monitoring mode...
Detecting a Security Change
[2025-12-27T10:45:22Z] POSTURE CHANGE DETECTED
[2025-12-27T10:45:22Z] Change: firewall.enabled: true -> false
+------------------------------------------------------------------+
| POSTURE CHANGE ALERT |
+------------------------------------------------------------------+
| |
| Change Type: DEGRADATION |
| Component: Firewall |
| Previous State: Enabled |
| Current State: DISABLED |
| |
| Trust Score Impact: 87 -> 62 (-25 points) |
| |
+------------------------------------------------------------------+
[2025-12-27T10:45:22Z] Generating emergency health report...
[2025-12-27T10:45:22Z] Report sent to PDP: 200 OK
[2025-12-27T10:45:22Z] PDP response: {"access_level": "restricted", "message": "Firewall must be enabled for full access"}
+------------------------------------------------------------------+
| ACCESS LEVEL CHANGED |
+------------------------------------------------------------------+
| |
| Previous Access: FULL |
| Current Access: RESTRICTED (read-only) |
| |
| To restore full access, enable the system firewall: |
| macOS: System Settings > Network > Firewall > Turn On |
| Linux: sudo ufw enable |
| Windows: netsh advfirewall set allprofiles state on |
| |
+------------------------------------------------------------------+
Querying the Local API
$ curl http://localhost:8080/health
{
"device_id": "device-abc123",
"timestamp": "2025-12-27T10:50:00Z",
"trust_score": 62,
"access_level": "restricted",
"posture": {
"disk_encryption": {"status": "pass", "enabled": true},
"firewall": {"status": "fail", "enabled": false},
"os_patch_level": {"status": "warn", "days": 7},
"antivirus": {"status": "pass", "present": true},
"secure_boot": {"status": "pass", "enabled": true}
},
"issues": [
{"severity": "critical", "message": "Firewall is disabled"},
{"severity": "warning", "message": "OS update available"}
]
}
The Core Question You're Answering
"How do I know the device connecting to my system is secure, and not a compromised laptop pretending to be trusted?"
This is the fundamental question of endpoint security in Zero Trust Architecture. Traditional security assumed that if a device was on the corporate network, it could be trusted. Zero Trust rejects this assumption entirely.
Consider the scenario: An employee's laptop gets infected with malware through a phishing attack. The attacker now has valid credentials, can pass MFA (the malware waits for the user to authenticate), and connects through legitimate network paths. Traditional security sees everything as valid. But if you're checking device health, you'll notice the firewall was disabled, an unknown process is running, and the antivirus definitions are weeks old.
This project teaches you to build the mechanism that answers: "Even though the user authenticated correctly, should we trust the device they're using?"
Concepts You Must Understand First
Before diving into implementation, ensure you understand these foundational concepts:
1. TPM and Hardware Roots of Trust
A Trusted Platform Module (TPM) is a dedicated hardware chip that provides cryptographic operations isolated from the main CPU. It contains:
- Endorsement Key (EK): A unique, manufacturer-installed key that identifies this specific TPM
- Platform Configuration Registers (PCRs): Special registers that store measurements of the boot process
- Attestation Identity Key (AIK): A key used to sign quotes (reports of PCR values)
The TPM creates a "chain of trust" from hardware up through software. Each boot stage measures the next stage before executing it. If malware modifies the bootloader, the measurements will differ from expected values.
Why it matters: Software-based attestation can be fooled by a compromised OS, because the OS can simply lie about its security state. TPM boot measurements are recorded before the OS loads, so a rootkit cannot later forge a claim of "I was a clean boot."
2. Endpoint Posture Checks
"Posture" refers to the security configuration state of a device. Key dimensions include:
- Disk Encryption: Is data at rest encrypted? Can a thief with physical access read the drive?
- Firewall Status: Is the host-based firewall enabled? Are inbound connections blocked?
- OS Patch Level: How many days since the last security update? Are known CVEs affecting this version?
- Antivirus/EDR Presence: Is endpoint protection software running? Are definitions current?
- Secure Boot: Is the boot chain verified? Can unauthorized bootloaders run?
Each dimension contributes to overall device trustworthiness. A device might have encryption enabled but an outdated OS, leading to a moderate trust score rather than binary pass/fail.
3. Attestation Protocols
Attestation is the process of providing cryptographically verifiable evidence about system state to a remote party. Key properties:
- Freshness: The report is recent (includes timestamp and nonce to prevent replay)
- Authenticity: The report genuinely came from the claimed device (signed with device key)
- Integrity: The report hasn't been modified in transit (signature covers all fields)
- Non-repudiation: The device cannot deny having sent the report (asymmetric cryptography)
The verifier sends a random nonce (challenge), the device includes this nonce in its signed report, proving the report was generated after the challenge.
4. Trust Score Calculation
Rather than binary "trusted" or "untrusted," modern ZTA uses numerical trust scores that enable graduated access:
- Score 0-50: Deny all access
- Score 51-70: Read-only access to non-sensitive resources
- Score 71-85: Standard access
- Score 86-100: Full access including sensitive operations
Each posture dimension contributes positive or negative points. Disk encryption enabled: +20. Firewall disabled: -25. The final score determines access level.
Why scores over binary? A user whose only issue is a 10-day-old OS patch shouldn't be locked out entirely. They can work with reduced privileges while a reminder prompts them to update.
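The four bands listed in this section translate directly into a lookup. A small sketch, assuming integer scores in 0-100; the level names (`deny`, `read_only`, `standard`, `full`) are placeholders for whatever your PDP expects.

```python
from bisect import bisect_left

# Inclusive upper bound of each band: 0-50, 51-70, 71-85, 86-100.
BAND_UPPER_BOUNDS = [50, 70, 85, 100]
BAND_LEVELS = ["deny", "read_only", "standard", "full"]


def access_level(score: int) -> str:
    """Map a trust score to the access level of the band it falls in."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    # bisect_left returns the index of the first bound that is >= score.
    return BAND_LEVELS[bisect_left(BAND_UPPER_BOUNDS, score)]
```

Keeping the bounds in a list rather than in a chain of `if` statements makes the thresholds easy to load from configuration later.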
5. Continuous vs Point-in-Time Verification
Point-in-time: Check device health once at login/VPN connection. Problem: What if security degrades after login?
Continuous: Monitor device state constantly, react to changes immediately. If the firewall gets disabled at 2pm, access is restricted at 2:01pm, not at next login.
Continuous verification can be implemented via:
- Polling (check every N seconds)
- Event-driven (subscribe to OS notifications about security changes)
- Hybrid (poll regularly, but also react to events)
Questions to Guide Your Design
Before writing code, think through these questions:
Data Collection
- How will you query disk encryption status on your target OS? What command or API returns this information?
- How do you determine if the firewall is enabled vs just installed? What's the difference between having firewall software and having it actively blocking traffic?
- How do you find when the OS was last updated? Is this the same as when the last package was installed?
- What elevated permissions does your agent need? Can you drop privileges after startup?
Report Integrity
- What cryptographic algorithm will you use to sign reports? Why that choice?
- How do you prevent an attacker from replaying yesterday's "healthy" report today?
- Where is the device's private key stored? Who has access to it? Could malware extract it?
- How does the PDP know which public key corresponds to which device?
Change Detection
- How often should you poll for changes? What's the tradeoff between responsiveness and resource usage?
- What constitutes a "significant" change worth reporting immediately vs waiting for the next scheduled report?
- How do you handle temporary state changes (e.g., firewall briefly disabled during software installation)?
Integration
- What format should your health report use? JSON? What schema?
- How will your agent communicate with the PDP? Push reports? Wait to be polled?
- What happens if the PDP is unreachable? Queue reports? Continue with cached trust level?
Thinking Exercise
Before implementing, work through this design exercise on paper:
Design a Trust Scoring Model
Create a scoring model for device health that balances security with usability.
Your task: Define point values for each posture dimension and justify your choices.
Consider:
- What's your base score? (Starting point before any checks)
- Which security failures are critical (immediate access denial)?
- Which are warnings (reduced score but not blocking)?
- How do you handle "unknown" states (couldn't check a dimension)?
Example framework:
Base Score: 50
Disk Encryption:
- Enabled with strong algorithm: +25
- Enabled with weak algorithm: +15
- In progress: +10
- Disabled: -40
- Unknown/check failed: -10
[Continue for other dimensions...]
Questions to answer:
- What's the minimum score for any access? Why?
- What score grants full access? Why?
- If someone has perfect security except firewall disabled, what happens?
- If someone has all checks as "unknown" (new OS version broke your queries), what access do they get?
Write out your complete scoring model before implementing it in code.
Hints in Layers
If you get stuck, reveal these hints progressively:
Hint 1: Getting Started with System Queries
Start with a single platform (your own). Don't try to support all three OSes initially.
For macOS: The fdesetup command handles FileVault queries. The Application Firewall is controlled by socketfilterfw. The softwareupdate command shows pending updates.
For Linux: Check /dev/mapper/ for LUKS encryption. Query ufw or firewall-cmd depending on distro. Look at /var/lib/apt/periodic/ timestamps for update recency.
For Windows: PowerShell's Get-BitLockerVolume and Get-NetFirewallProfile cmdlets are your friends.
Hint 2: Structuring Your Agent
Create an abstraction layer early:
Collector Interface:
- Collect() returns (Status, error)
Platform-specific implementations:
- DarwinDiskEncryptionCollector
- LinuxDiskEncryptionCollector
- WindowsDiskEncryptionCollector
Factory function:
- NewDiskEncryptionCollector() returns appropriate impl for runtime.GOOS
This pattern lets you add Windows support later without restructuring.
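If you are working in Python rather than Go, the same pattern might look like the sketch below. The class names mirror the list above; `new_disk_encryption_collector` and the stubbed `collect` bodies are illustrative placeholders, and `platform.system()` plays the role of `runtime.GOOS`.

```python
import platform
from abc import ABC, abstractmethod
from typing import Optional


class DiskEncryptionCollector(ABC):
    @abstractmethod
    def collect(self) -> dict:
        """Return the posture for this dimension, e.g. {"enabled": True}."""


class DarwinDiskEncryptionCollector(DiskEncryptionCollector):
    def collect(self) -> dict:
        raise NotImplementedError("query FileVault here")


class LinuxDiskEncryptionCollector(DiskEncryptionCollector):
    def collect(self) -> dict:
        raise NotImplementedError("query LUKS here")


class WindowsDiskEncryptionCollector(DiskEncryptionCollector):
    def collect(self) -> dict:
        raise NotImplementedError("query BitLocker here")


# Keys are the values platform.system() returns on each OS.
_IMPLEMENTATIONS = {
    "Darwin": DarwinDiskEncryptionCollector,
    "Linux": LinuxDiskEncryptionCollector,
    "Windows": WindowsDiskEncryptionCollector,
}


def new_disk_encryption_collector(system: Optional[str] = None) -> DiskEncryptionCollector:
    """Pick the implementation for the given (or current) operating system."""
    system = system or platform.system()
    if system not in _IMPLEMENTATIONS:
        raise ValueError(f"unsupported platform: {system}")
    return _IMPLEMENTATIONS[system]()
```

Accepting `system` as an optional argument lets tests exercise every branch of the factory from a single machine.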
Hint 3: Signing Reports Correctly
The message you sign must include everything that needs protection:
- The report content itself (serialized to JSON)
- The timestamp
- The nonce from the verifier
Use a consistent format: message = report_json || "|" || timestamp || "|" || nonce
Sign the message bytes, not the string representation of each component separately.
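The sketch below shows one way to build those message bytes deterministically. It makes a loud simplifying assumption: HMAC-SHA256 with a shared secret stands in for the asymmetric device signature so that the example runs on the standard library alone. For the actual project you would swap `sign` and `verify` for a private-key and public-key pair. All function names are illustrative.

```python
import base64
import hashlib
import hmac
import json


def signing_input(report: dict, timestamp: str, nonce: str) -> bytes:
    """Build the exact bytes to sign: report_json | timestamp | nonce."""
    unsigned = {k: v for k, v in report.items() if k != "signature"}
    # sort_keys and fixed separators make serialization deterministic, so the
    # signer and the verifier derive identical bytes from equal reports.
    report_json = json.dumps(unsigned, sort_keys=True, separators=(",", ":"))
    return "|".join([report_json, timestamp, nonce]).encode("utf-8")


def sign(message: bytes, key: bytes) -> str:
    """Stand-in for the device-key signature (HMAC, NOT asymmetric)."""
    digest = hmac.new(key, message, hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def verify(message: bytes, signature: str, key: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign(message, key), signature)
```

Excluding the `signature` field before serializing avoids a circular dependency, and the canonical JSON form means key order in the in-memory dict does not affect the result.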
Hint 4: Continuous Monitoring Strategy
Start with simple polling:
loop every 60 seconds:
    new_state = collect_all()
    if new_state != cached_state:
        report_change(cached_state, new_state)
        cached_state = new_state
Only add event-driven monitoring after polling works. Events are OS-specific and more complex.
For change detection, compare the serialized state or implement Equals() methods. Be careful with floating-point comparisons or timestamps that change every collection.
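One way to sidestep the timestamp problem is to flatten each state into dotted paths and skip the fields that are expected to change on every run. A minimal sketch; the set of volatile keys and the function names are assumptions for illustration.

```python
# Fields that change on every collection and should not count as posture changes.
VOLATILE_KEYS = {"timestamp", "nonce", "signature"}


def flatten(state: dict, prefix: str = "") -> dict:
    """Turn nested dicts into {"firewall.enabled": True} style paths."""
    flat = {}
    for key, value in state.items():
        if key in VOLATILE_KEYS:
            continue
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{path}."))
        else:
            flat[path] = value
    return flat


def diff_states(old: dict, new: dict) -> list:
    """List every path whose value differs between two collections."""
    before, after = flatten(old), flatten(new)
    changes = []
    for path in sorted(before.keys() | after.keys()):
        if before.get(path) != after.get(path):
            changes.append(f"{path}: {before.get(path)!r} -> {after.get(path)!r}")
    return changes
```

An empty list means nothing worth reporting happened. A non-empty list doubles as the log message for the change alert.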
Hint 5: Error Handling and Graceful Degradation
Not every check will succeed every time. Handle failures gracefully:
- If a specific collector fails, report that dimension as "unknown" rather than crashing
- If you can't reach the PDP, queue reports and retry with exponential backoff
- If permissions change mid-run, log the error and continue with reduced collection
- Set a maximum report age; if you haven't successfully sent a report in N minutes, alert locally
Consider what happens if your agent itself is compromised. Defense in depth: the agent is just one input to the PDP's decision.
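The first two bullets can be sketched in a few lines. This is an illustration under assumptions, not a prescribed design: collectors are modeled as zero-argument callables, and the function names and default delays are made up for the example.

```python
from typing import Callable, Dict, List


def collect_all(collectors: Dict[str, Callable[[], dict]]) -> dict:
    """Run every collector; a failure becomes an "unknown" entry, not a crash."""
    posture = {}
    for name, collector in collectors.items():
        try:
            posture[name] = collector()
        except Exception as exc:  # broad on purpose: one bad check must not stop the rest
            posture[name] = {"status": "unknown", "error": str(exc)}
    return posture


def backoff_delays(base: float = 1.0, cap: float = 300.0, attempts: int = 6) -> List[float]:
    """Delays for successive retries: base, 2*base, 4*base, ... up to cap."""
    return [min(cap, base * 2 ** attempt) for attempt in range(attempts)]
```

In production you would usually add random jitter to each delay so that many agents recovering from the same outage do not all retry at once.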
Solution Architecture
System Architecture Diagram
+------------------------------------------------------------------+
| DEVICE HEALTH AGENT |
+------------------------------------------------------------------+
| |
| +----------------+ +------------------+ +---------------+ |
| | Posture | | Report | | HTTP | |
| | Collectors | | Generator | | Server | |
| |----------------| |------------------| |---------------| |
| | - Disk Encrypt | | - Schema | | - /health | |
| | - Firewall |--->| - Scoring |--->| - /report | |
| | - OS Patches | | - Signing | | - /status | |
| | - Antivirus | | | | | |
| +----------------+ +------------------+ +---------------+ |
| | | | |
| v v | |
| +----------------+ +------------------+ | |
| | OS Abstraction | | Crypto Layer | | |
| |----------------| |------------------| | |
| | - Linux impl | | - Key management | | |
| | - macOS impl | | - Ed25519 signing| | |
| | - Windows impl | | - Verification | | |
| +----------------+ +------------------+ | |
| | |
+------------------------------------------------------------------+
| | |
v v v
+----------+ +------------+ +-----------+
| OS APIs | | Key Store | | PDP |
| Commands | | (file/TPM) | | Service |
+----------+ +------------+ +-----------+
Module Breakdown
device-health-agent/
+-- main.go # Entry point, CLI handling
+-- config/
| +-- config.go # Configuration loading
| +-- config.yaml # Default configuration
+-- collectors/
| +-- interface.go # Collector interface definition
| +-- disk_encryption.go # Disk encryption status
| +-- firewall.go # Firewall status
| +-- os_patches.go # OS patch level
| +-- antivirus.go # Antivirus status
| +-- secure_boot.go # Secure boot status
+-- collectors/platform/
| +-- linux.go # Linux implementations
| +-- darwin.go # macOS implementations
| +-- windows.go # Windows implementations
+-- report/
| +-- generator.go # Report generation
| +-- schema.go # Report data structures
| +-- signer.go # Cryptographic signing
| +-- scorer.go # Trust score calculation
+-- monitor/
| +-- watcher.go # Continuous monitoring
| +-- events.go # Change detection
+-- api/
| +-- server.go # HTTP API server
| +-- handlers.go # Request handlers
+-- crypto/
| +-- keys.go # Key management
| +-- sign.go # Signing operations
+-- client/
| +-- pdp.go # PDP communication client
+-- tests/
+-- collectors_test.go
+-- report_test.go
+-- integration_test.go
Data Flow Diagram
Configuration
|
v
+------------------+ +------------------+
| Startup |->| Key Loading |
+------------------+ +--------+---------+
|
v
+--------------------+--------------------+
| | |
v v v
+------------------+ +------------------+ +------------------+
| Disk Encryption | | Firewall | | OS Patches |
| Collector | | Collector | | Collector |
+--------+---------+ +--------+---------+ +--------+---------+
| | |
+---------------------+---------------------+
|
v
+--------------------+
| Report Generator |
| - Aggregate |
| - Score |
| - Format |
+--------+-----------+
|
v
+--------------------+
| Report Signer |
| - Add timestamp |
| - Add nonce |
| - Sign with key |
+--------+-----------+
|
+----------------+----------------+
| |
v v
+----------------+ +----------------+
| HTTP API | | PDP Client |
| (localhost) | | (remote) |
+----------------+ +----------------+
| |
v v
Local consumers Policy Decision
(CLI, browser) Point
Phased Implementation Guide
Phase 1: Query Disk Encryption Status (2-3 hours)
Goal: Determine if the system's disk is encrypted using OS-specific commands.
Milestone: Running ./agent --check disk prints encryption status.
OS-Specific Commands
macOS (FileVault):
# Check FileVault status
$ fdesetup status
FileVault is On.
# Extended status (the output below is illustrative; verify the exact format on your macOS version)
$ fdesetup status -extended
{
"FileVaultStatus": "On",
"Encryption Conversion Progress": 100.0,
"Encryption Type": "AES-XTS"
}
# Using diskutil
$ diskutil apfs list | grep -A5 "FileVault"
Linux (LUKS):
# Check if root filesystem is on encrypted volume
$ lsblk -f
NAME FSTYPE LABEL MOUNTPOINT
sda
+-sda1 ext4 boot /boot
+-sda2 crypto_LUKS
+-root ext4 root /
# Check LUKS status
$ sudo cryptsetup status /dev/mapper/root
/dev/mapper/root is active.
type: LUKS2
cipher: aes-xts-plain64
keysize: 512 bits
# Using dmsetup
$ sudo dmsetup status --target crypt
root: 0 209715200 crypt
Windows (BitLocker):
# PowerShell command
Get-BitLockerVolume -MountPoint "C:" | Select-Object VolumeStatus, EncryptionMethod
# manage-bde command
manage-bde -status C:
# Example output:
# Volume C: []
# Conversion Status: Fully Encrypted
# Percentage Encrypted: 100%
# Encryption Method: XTS-AES 256
# Protection Status: Protection On
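The `manage-bde -status` output shown above is a list of "Key: Value" lines, so it can be parsed without regular expressions. The following is a minimal sketch; the function name `parse_manage_bde` is hypothetical, and it assumes English-language output like the example (localized Windows installations print translated labels).

```python
def parse_manage_bde(output: str) -> dict:
    """Parse `manage-bde -status` text into a small status dict."""
    fields = {}
    for line in output.splitlines():
        # Split on the first colon only, so values may contain colons
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip().lower()] = value.strip()
    return {
        "enabled": fields.get("protection status", "").lower() == "protection on",
        "fully_encrypted": fields.get("conversion status", "").lower() == "fully encrypted",
        "algorithm": fields.get("encryption method"),
    }


# Sample text copied from the example output above
SAMPLE = """\
Volume C: []
    Conversion Status:    Fully Encrypted
    Percentage Encrypted: 100%
    Encryption Method:    XTS-AES 256
    Protection Status:    Protection On
"""

status = parse_manage_bde(SAMPLE)
```

Checking both the protection status and the conversion status matters: a volume can be fully encrypted while protection is suspended.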
Go Implementation (macOS example)
package collectors

import (
    "fmt"
    "os/exec"
    "runtime"
    "strings"
)

// DiskEncryptionStatus describes the encryption state of the system volume.
type DiskEncryptionStatus struct {
    Enabled   bool   `json:"enabled"`
    Algorithm string `json:"algorithm,omitempty"`
    Progress  int    `json:"progress,omitempty"` // 0-100
}

// CheckDiskEncryption reports FileVault status on macOS. Every other platform
// is delegated to checkDiskEncryptionLinux; that helper and parseProgress are
// assumed to be defined elsewhere in this package.
func CheckDiskEncryption() (*DiskEncryptionStatus, error) {
    // Check if running on macOS
    if runtime.GOOS != "darwin" {
        return checkDiskEncryptionLinux()
    }
    cmd := exec.Command("fdesetup", "status")
    output, err := cmd.Output()
    if err != nil {
        return nil, fmt.Errorf("failed to check FileVault: %w", err)
    }
    outputStr := string(output)
    status := &DiskEncryptionStatus{}
    if strings.Contains(outputStr, "FileVault is On") {
        status.Enabled = true
        status.Algorithm = "AES-XTS" // Default for FileVault 2
        status.Progress = 100
    } else if strings.Contains(outputStr, "Encryption in progress") {
        status.Enabled = true
        status.Progress = parseProgress(outputStr)
    } else {
        status.Enabled = false
    }
    return status, nil
}
Python Implementation (Linux example)
import re
import subprocess
from dataclasses import dataclass
from typing import Optional


@dataclass
class DiskEncryptionStatus:
    enabled: bool
    algorithm: Optional[str] = None
    luks_version: Optional[str] = None


def check_disk_encryption() -> DiskEncryptionStatus:
    """Check if root filesystem is encrypted using LUKS."""
    # First, find the root device
    result = subprocess.run(
        ["findmnt", "-n", "-o", "SOURCE", "/"],
        capture_output=True, text=True
    )
    root_device = result.stdout.strip()
    # A device-mapper path suggests encryption, but LVM volumes also live
    # under /dev/mapper, so confirm with cryptsetup before reporting "enabled"
    if "/dev/mapper/" in root_device:
        try:
            # check=True makes a non-zero exit raise CalledProcessError
            result = subprocess.run(
                ["sudo", "cryptsetup", "status", root_device],
                capture_output=True, text=True, check=True
            )
        except (subprocess.CalledProcessError, FileNotFoundError):
            # cryptsetup failed or is not installed: treat as not encrypted
            return DiskEncryptionStatus(enabled=False)
        if "is active" in result.stdout:
            # Parse cipher info
            cipher_match = re.search(r"cipher:\s+(\S+)", result.stdout)
            algorithm = cipher_match.group(1) if cipher_match else "unknown"
            return DiskEncryptionStatus(
                enabled=True,
                algorithm=algorithm,
                luks_version="LUKS2" if "LUKS2" in result.stdout else "LUKS1"
            )
    return DiskEncryptionStatus(enabled=False)
Phase 2: Query Firewall Status (2-3 hours)
Goal: Determine if the host-based firewall is enabled.
Milestone: Running ./agent --check firewall prints firewall status.
OS-Specific Commands
macOS (Application Firewall):
# Check firewall state
$ /usr/libexec/ApplicationFirewall/socketfilterfw --getglobalstate
Firewall is enabled. (State = 1)
# Check stealth mode
$ /usr/libexec/ApplicationFirewall/socketfilterfw --getstealthmode
Stealth mode enabled.
# Check block all incoming
$ /usr/libexec/ApplicationFirewall/socketfilterfw --getblockall
Block all DISABLED!
Linux (iptables/ufw/firewalld):
# Check UFW (Ubuntu)
$ sudo ufw status
Status: active
To Action From
-- ------ ----
22/tcp ALLOW Anywhere
# Check iptables (any Linux)
$ sudo iptables -L -n | head -5
Chain INPUT (policy DROP)
target prot opt source destination
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0
# Check firewalld (RHEL/CentOS)
$ firewall-cmd --state
running
Windows (Windows Firewall):
# PowerShell
Get-NetFirewallProfile | Select-Object Name, Enabled
# Example output:
# Name Enabled
# ---- -------
# Domain True
# Private True
# Public True
# Using netsh
netsh advfirewall show allprofiles state
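The Go implementation that follows calls a Windows check without showing it. As a rough sketch of that step, the table printed by `Get-NetFirewallProfile | Select-Object Name, Enabled` can be parsed as below. The function names are hypothetical, and the parser assumes the two-column text layout from the example output above.

```python
def parse_firewall_profiles(table: str) -> dict:
    """Map each profile name (Domain, Private, Public) to its Enabled flag."""
    profiles = {}
    for line in table.splitlines():
        parts = line.split()
        # Data rows have exactly two columns; header and separator rows
        # never end in True or False, so they are skipped
        if len(parts) == 2 and parts[1] in ("True", "False"):
            profiles[parts[0]] = parts[1] == "True"
    return profiles


def firewall_enabled(profiles: dict) -> bool:
    """Treat the firewall as enabled only if every profile is enabled."""
    return bool(profiles) and all(profiles.values())


# Sample text copied from the example output above
SAMPLE = """\
Name    Enabled
----    -------
Domain     True
Private    True
Public     True
"""

profiles = parse_firewall_profiles(SAMPLE)
```

Requiring all profiles to be enabled is the stricter choice: a laptop with only the Public profile disabled is exposed exactly when it joins an untrusted network.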
Go Implementation
func CheckFirewall() (*FirewallStatus, error) {
switch runtime.GOOS {
case "darwin":
return checkFirewallMacOS()
case "linux":
return checkFirewallLinux()
case "windows":
return checkFirewallWindows()
default:
return nil, fmt.Errorf("unsupported OS: %s", runtime.GOOS)
}
}
func checkFirewallMacOS() (*FirewallStatus, error) {
cmd := exec.Command(
"/usr/libexec/ApplicationFirewall/socketfilterfw",
"--getglobalstate",
)
output, err := cmd.Output()
if err != nil {
return nil, err
}
status := &FirewallStatus{}
if strings.Contains(string(output), "State = 1") {
status.Enabled = true
// Check stealth mode
stealthCmd := exec.Command(
"/usr/libexec/ApplicationFirewall/socketfilterfw",
"--getstealthmode",
)
stealthOutput, _ := stealthCmd.Output()
status.StealthMode = strings.Contains(string(stealthOutput), "enabled")
}
return status, nil
}
func checkFirewallLinux() (*FirewallStatus, error) {
// Try ufw first
cmd := exec.Command("ufw", "status")
output, err := cmd.Output()
if err == nil && strings.Contains(string(output), "Status: active") {
return &FirewallStatus{Enabled: true, Type: "ufw"}, nil
}
// Try firewalld
cmd = exec.Command("firewall-cmd", "--state")
output, err = cmd.Output()
if err == nil && strings.Contains(string(output), "running") {
return &FirewallStatus{Enabled: true, Type: "firewalld"}, nil
}
// Check iptables policy
cmd = exec.Command("iptables", "-L", "INPUT", "-n")
output, err = cmd.Output()
if err == nil && strings.Contains(string(output), "policy DROP") {
return &FirewallStatus{Enabled: true, Type: "iptables"}, nil
}
return &FirewallStatus{Enabled: false}, nil
}
Phase 3: Check OS Patch Level (2-3 hours)
Goal: Determine how recently the OS was updated and if updates are pending.
Milestone: Running ./agent --check patches prints days since last update.
OS-Specific Commands
macOS:
# Get last update date from install history
$ softwareupdate --history | head -5
Display Name Version Date
------------ ------- ----
macOS Sonoma 14.2.1 14.2.1 12/20/2024, 10:30:00 AM
# Check for pending updates
$ softwareupdate -l
Software Update found the following new or updated software:
* Label: macOS Sonoma 14.2.2
Title: macOS Sonoma 14.2.2, Version: 14.2.2, Size: 1.2GB
# System profiler approach
$ system_profiler SPInstallHistoryDataType | head -20
Linux (Debian/Ubuntu):
# Last apt update
$ stat /var/lib/apt/periodic/update-success-stamp
Access: 2025-12-20 10:30:00.000000000 +0000
Modify: 2025-12-20 10:30:00.000000000 +0000
# Check pending updates
$ apt list --upgradable 2>/dev/null | wc -l
# Last package installation
$ ls -lt /var/log/apt/history.log
-rw-r--r-- 1 root root 12345 Dec 20 10:30 /var/log/apt/history.log
# Or parse dpkg.log
$ grep " install " /var/log/dpkg.log | tail -1
2025-12-20 10:30:00 install linux-image-6.1.0-15-amd64:amd64 6.1.0-15
Linux (RHEL/CentOS):
# Last yum/dnf update
$ sudo dnf history | head -5
ID | Command line | Date and time | Action(s)
--------------------------------------------------------------------------------
105 | update | 2025-12-20 10:30 | Upgrade
# Check pending updates
$ dnf check-update | wc -l
Windows:
# Get update history
Get-HotFix | Sort-Object InstalledOn -Descending | Select-Object -First 5
# Example output:
# Source Description HotFixID InstalledBy InstalledOn
# ------ ----------- -------- ----------- -----------
# DESKTOP-ABC Security Update KB5034123 NT AUTHORITY\SYSTEM 12/20/2025
# Check Windows Update service
Get-WindowsUpdateLog
# Using wmic
wmic qfe get InstalledOn | sort
Python Implementation
import os
import subprocess
from dataclasses import dataclass
from datetime import datetime


@dataclass
class PatchStatus:
    last_update: datetime
    days_since_update: int
    pending_updates: int
    security_updates_pending: int
    os_version: str


def check_patch_level_linux() -> PatchStatus:
    """Check patch level on Debian/Ubuntu systems."""
    # Find last successful apt update. Note that this stamp records the last
    # package index refresh, not the last time packages were installed.
    stamp_file = "/var/lib/apt/periodic/update-success-stamp"
    if os.path.exists(stamp_file):
        last_update = datetime.fromtimestamp(os.path.getmtime(stamp_file))
    else:
        # Fallback to dpkg.log
        last_update = get_last_dpkg_install()
    days_since = (datetime.now() - last_update).days

    # List upgradable packages once and reuse the output for both counts
    result = subprocess.run(
        ["apt", "list", "--upgradable"],
        capture_output=True, text=True
    )
    # Package lines contain "/" (name/suite); this skips the "Listing..." header
    packages = [line for line in result.stdout.splitlines() if "/" in line]
    pending = len(packages)
    # Heuristic: count packages that come from a "-security" suite
    security = sum(1 for line in packages if "security" in line.lower())

    # Get OS version, with a fallback so the name is always bound
    os_version = "unknown"
    with open("/etc/os-release") as f:
        for line in f:
            if line.startswith("PRETTY_NAME="):
                os_version = line.split("=", 1)[1].strip().strip('"')
                break
    return PatchStatus(
        last_update=last_update,
        days_since_update=days_since,
        pending_updates=pending,
        security_updates_pending=security,
        os_version=os_version
    )


def get_last_dpkg_install() -> datetime:
    """Parse dpkg.log to find last install/upgrade."""
    log_file = "/var/log/dpkg.log"
    last_date = datetime.min
    with open(log_file) as f:
        for line in f:
            if " install " in line or " upgrade " in line:
                parts = line.split()
                try:
                    date = datetime.strptime(
                        parts[0] + " " + parts[1], "%Y-%m-%d %H:%M:%S"
                    )
                except (IndexError, ValueError):
                    continue
                if date > last_date:
                    last_date = date
    return last_date
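The implementation above covers Debian and Ubuntu only. For macOS, the date column of `softwareupdate --history` can be parsed in a similar way. This is a minimal sketch with hypothetical function names; it assumes the US-style date format shown in the macOS example earlier in this phase, which may differ under other locale settings.

```python
import re
from datetime import datetime
from typing import Optional

# Matches dates such as "12/20/2024, 10:30:00 AM"
_DATE_RE = re.compile(r"(\d{1,2}/\d{1,2}/\d{4}),\s+(\d{1,2}:\d{2}:\d{2})\s?([AP]M)")


def last_update_from_history(history: str) -> Optional[datetime]:
    """Return the newest install date found in the history text, if any."""
    dates = []
    for match in _DATE_RE.finditer(history):
        stamp = f"{match.group(1)} {match.group(2)} {match.group(3)}"
        dates.append(datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p"))
    return max(dates) if dates else None


def days_since(last_update: datetime, now: datetime) -> int:
    """Whole days elapsed between the last update and now."""
    return (now - last_update).days


# Sample text copied from the example output earlier in this phase
SAMPLE = """\
Display Name Version Date
------------ ------- ----
macOS Sonoma 14.2.1 14.2.1 12/20/2024, 10:30:00 AM
"""

last = last_update_from_history(SAMPLE)
```

Passing `now` explicitly keeps the function deterministic, which makes it easy to unit test without touching the system clock.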
Phase 4: Sign Health Reports (2-3 hours)
Goal: Cryptographically sign health reports so they cannot be forged.
Milestone: Reports include a signature that can be verified by the PDP.
Key Generation
# Generate Ed25519 keypair (recommended for signing)
$ openssl genpkey -algorithm ED25519 -out device_private.pem
$ openssl pkey -in device_private.pem -pubout -out device_public.pem
# View the keys
$ cat device_public.pem
-----BEGIN PUBLIC KEY-----
MCowBQYDK2VwAyEAxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx=
-----END PUBLIC KEY-----
Go Implementation
package crypto
import (
"crypto/ed25519"
"crypto/x509"
"encoding/base64"
"encoding/json"
"encoding/pem"
"fmt"
"os"
"time"
)
type SignedReport struct {
Report json.RawMessage `json:"report"`
Timestamp time.Time `json:"timestamp"`
Nonce string `json:"nonce"`
Signature string `json:"signature"`
}
type Signer struct {
privateKey ed25519.PrivateKey
}
func NewSigner(keyPath string) (*Signer, error) {
keyBytes, err := os.ReadFile(keyPath)
if err != nil {
return nil, fmt.Errorf("failed to read private key: %w", err)
}
block, _ := pem.Decode(keyBytes)
if block == nil {
return nil, fmt.Errorf("failed to decode PEM block")
}
key, err := x509.ParsePKCS8PrivateKey(block.Bytes)
if err != nil {
return nil, fmt.Errorf("failed to parse private key: %w", err)
}
edKey, ok := key.(ed25519.PrivateKey)
if !ok {
return nil, fmt.Errorf("key is not Ed25519")
}
return &Signer{privateKey: edKey}, nil
}
func (s *Signer) SignReport(report interface{}, nonce string) (*SignedReport, error) {
reportJSON, err := json.Marshal(report)
if err != nil {
return nil, err
}
timestamp := time.Now().UTC()
// Create the message to sign: report || timestamp || nonce
message := fmt.Sprintf("%s|%s|%s",
string(reportJSON),
timestamp.Format(time.RFC3339),
nonce,
)
signature := ed25519.Sign(s.privateKey, []byte(message))
return &SignedReport{
Report: reportJSON,
Timestamp: timestamp,
Nonce: nonce,
Signature: base64.StdEncoding.EncodeToString(signature),
}, nil
}
Python Implementation
import base64
import json
from dataclasses import dataclass
from datetime import datetime, timezone

from cryptography.hazmat.primitives import serialization


@dataclass
class SignedReport:
    report: dict
    timestamp: str
    nonce: str
    signature: str


class ReportSigner:
    def __init__(self, private_key_path: str):
        with open(private_key_path, 'rb') as f:
            # For an Ed25519 PEM file this returns an Ed25519PrivateKey
            self.private_key = serialization.load_pem_private_key(
                f.read(),
                password=None
            )

    def sign_report(self, report: dict, nonce: str) -> SignedReport:
        # Sorted keys and compact separators give a deterministic encoding;
        # the verifier must serialize the report the same way before checking
        report_json = json.dumps(report, sort_keys=True, separators=(",", ":"))
        # Second-precision UTC timestamp, matching the RFC 3339 format
        # that the Go signer and verifier use
        timestamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        # Message to sign: report || timestamp || nonce
        message = f"{report_json}|{timestamp}|{nonce}"
        signature = self.private_key.sign(message.encode())
        signature_b64 = base64.b64encode(signature).decode()
        return SignedReport(
            report=report,
            timestamp=timestamp,
            nonce=nonce,
            signature=signature_b64
        )
Verification on PDP Side
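func VerifyReport(signedReport *SignedReport, publicKeyPath string) error {
    // Load public key
    keyBytes, err := os.ReadFile(publicKeyPath)
    if err != nil {
        return fmt.Errorf("failed to read public key: %w", err)
    }
    block, _ := pem.Decode(keyBytes)
    if block == nil {
        return fmt.Errorf("failed to decode PEM block")
    }
    pubKey, err := x509.ParsePKIXPublicKey(block.Bytes)
    if err != nil {
        return fmt.Errorf("failed to parse public key: %w", err)
    }
    edPubKey, ok := pubKey.(ed25519.PublicKey)
    if !ok {
        return fmt.Errorf("key is not Ed25519")
    }
    // Reconstruct the message exactly as the signer built it
    message := fmt.Sprintf("%s|%s|%s",
        string(signedReport.Report),
        signedReport.Timestamp.Format(time.RFC3339),
        signedReport.Nonce,
    )
    // Decode signature
    signature, err := base64.StdEncoding.DecodeString(signedReport.Signature)
    if err != nil {
        return fmt.Errorf("failed to decode signature: %w", err)
    }
    // Verify
    if !ed25519.Verify(edPubKey, []byte(message), signature) {
        return fmt.Errorf("signature verification failed")
    }
    // Check timestamp freshness (within 5 minutes)
    age := time.Since(signedReport.Timestamp)
    if age > 5*time.Minute {
        return fmt.Errorf("report is too old: %v", age)
    }
    // The caller should also confirm that Nonce is a value it issued
    return nil
}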
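The signature and timestamp checks above do not by themselves stop a replay inside the five-minute window; that is the job of the nonce. A minimal sketch of how a PDP might issue and consume single-use nonces follows. The `NonceStore` class and its method names are hypothetical, and it keeps state in memory, which assumes a single PDP process.

```python
import secrets
import time


class NonceStore:
    """Issues single-use nonces that expire after ttl seconds."""

    def __init__(self, ttl: float = 300.0, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock  # injectable so tests can control time
        self._issued = {}    # nonce -> expiry time

    def issue(self) -> str:
        """Create a fresh random nonce for the agent to include in its report."""
        nonce = secrets.token_urlsafe(16)
        self._issued[nonce] = self._clock() + self._ttl
        return nonce

    def consume(self, nonce: str) -> bool:
        """Return True once for a known, unexpired nonce; False otherwise."""
        # pop() removes the nonce, so a replayed report fails the second time
        expiry = self._issued.pop(nonce, None)
        return expiry is not None and self._clock() <= expiry
```

In use, the PDP would call `issue()` when it challenges the agent and `consume()` on the nonce in the returned report, rejecting the report if the result is False.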
Phase 5: Continuous Monitoring with Change Detection (3-4 hours)
Goal: Monitor system state continuously and detect changes in real-time.
Milestone: Agent detects and reports when firewall is disabled.
Architecture
Continuous Monitoring Architecture:
===================================
+------------------------------------------------------------------+
| MONITORING LOOP |
+------------------------------------------------------------------+
| |
| +------------------+ |
| | Initial State |---+ |
| | Collection | | |
| +------------------+ | |
| v |
| +-----------+ |
| | State | |
| | Cache |<---------+ |
| +-----------+ | |
| | | |
| | Compare | |
| v | |
| +------------------+ | +------------------+ |
| | Polling Timer |-->+-->| Change Detector | |
| | (60s interval) | +--------+---------+ |
| +------------------+ | |
| v |
| +------------------+ |
| | Change Detected? | |
| +--------+---------+ |
| | |
| +---------------+---------------+ |
| | | |
| v v |
| +-----------+ +------------+ |
| | No Change | | Change! | |
| | Sleep | | Send Alert | |
| +-----------+ +-----+------+ |
| | |
| v |
| +------------+ |
| | Update | |
| | PDP | |
| +------------+ |
| |
+------------------------------------------------------------------+
Go Implementation
package monitor
import (
"context"
"log"
"reflect"
"sync"
"time"
)
type PostureChange struct {
Component string `json:"component"`
PreviousState interface{} `json:"previous_state"`
CurrentState interface{} `json:"current_state"`
Timestamp time.Time `json:"timestamp"`
Severity string `json:"severity"` // "critical", "warning", "info"
}
type Monitor struct {
collectors map[string]Collector
previousState map[string]interface{}
interval time.Duration
onChange func(change PostureChange)
mu sync.RWMutex
}
func NewMonitor(interval time.Duration, onChange func(PostureChange)) *Monitor {
return &Monitor{
collectors: make(map[string]Collector),
previousState: make(map[string]interface{}),
interval: interval,
onChange: onChange,
}
}
func (m *Monitor) RegisterCollector(name string, collector Collector) {
m.collectors[name] = collector
}
func (m *Monitor) Start(ctx context.Context) {
// Initial collection
m.collectAll()
ticker := time.NewTicker(m.interval)
defer ticker.Stop()
for {
select {
case <-ctx.Done():
log.Println("Monitor stopped")
return
case <-ticker.C:
m.checkForChanges()
}
}
}
func (m *Monitor) collectAll() {
m.mu.Lock()
defer m.mu.Unlock()
for name, collector := range m.collectors {
state, err := collector.Collect()
if err != nil {
log.Printf("Error collecting %s: %v", name, err)
continue
}
m.previousState[name] = state
}
}
func (m *Monitor) checkForChanges() {
m.mu.Lock()
defer m.mu.Unlock()
for name, collector := range m.collectors {
currentState, err := collector.Collect()
if err != nil {
log.Printf("Error collecting %s: %v", name, err)
continue
}
previousState, exists := m.previousState[name]
if !exists {
m.previousState[name] = currentState
continue
}
if !reflect.DeepEqual(previousState, currentState) {
change := PostureChange{
Component: name,
PreviousState: previousState,
CurrentState: currentState,
Timestamp: time.Now(),
Severity: m.determineSeverity(name, previousState, currentState),
}
// Notify callback
if m.onChange != nil {
m.onChange(change)
}
// Update cached state
m.previousState[name] = currentState
}
}
}
func (m *Monitor) determineSeverity(component string, prev, curr interface{}) string {
    // Firewall disabled is critical
    if component == "firewall" {
        prevFw, okPrev := prev.(*FirewallStatus)
        currFw, okCurr := curr.(*FirewallStatus)
        // Guard against unexpected types or nil pointers before dereferencing
        if okPrev && okCurr && prevFw != nil && currFw != nil &&
            prevFw.Enabled && !currFw.Enabled {
            return "critical"
        }
    }
    // Encryption disabled is critical
    if component == "disk_encryption" {
        prevEnc, okPrev := prev.(*DiskEncryptionStatus)
        currEnc, okCurr := curr.(*DiskEncryptionStatus)
        if okPrev && okCurr && prevEnc != nil && currEnc != nil &&
            prevEnc.Enabled && !currEnc.Enabled {
            return "critical"
        }
    }
    return "warning"
}
Event-Driven Alternative (macOS example)
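For immediate detection without polling, use OS notification APIs:
// macOS: react to process executions using the Endpoint Security framework.
// Endpoint Security has no firewall-specific event, so this sketch re-checks
// the firewall whenever a process starts. The client must run as root and
// hold the com.apple.developer.endpoint-security.client entitlement.
import EndpointSecurity

class FirewallMonitor {
    var client: OpaquePointer?

    func start() {
        var newClient: OpaquePointer?
        let result = es_new_client(&newClient) { _, message in
            // Called for every subscribed event (here: each process exec)
            if message.pointee.event_type == ES_EVENT_TYPE_NOTIFY_EXEC {
                // Re-check firewall state; a real agent would first filter
                // on the executable path to avoid needless work
                self.handleFirewallChange()
            }
        }
        if result == ES_NEW_CLIENT_RESULT_SUCCESS, let newClient = newClient {
            self.client = newClient
            // Subscribe to events
            let events: [es_event_type_t] = [
                ES_EVENT_TYPE_NOTIFY_EXEC
            ]
            es_subscribe(newClient, events, UInt32(events.count))
        }
    }
}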
Testing Strategy
Unit Tests: Collector Mocking
// tests/collectors_test.go
package collectors_test
import (
"testing"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/mock"
)
// Mock command executor for testing
type MockCommandExecutor struct {
mock.Mock
}
func (m *MockCommandExecutor) Execute(cmd string, args ...string) ([]byte, error) {
callArgs := m.Called(cmd, args)
return callArgs.Get(0).([]byte), callArgs.Error(1)
}
func TestDiskEncryptionCheck_MacOS_Enabled(t *testing.T) {
executor := new(MockCommandExecutor)
executor.On("Execute", "fdesetup", []string{"status"}).
Return([]byte("FileVault is On."), nil)
collector := NewDiskEncryptionCollector(executor)
status, err := collector.Collect()
assert.NoError(t, err)
assert.True(t, status.Enabled)
}
func TestDiskEncryptionCheck_MacOS_Disabled(t *testing.T) {
executor := new(MockCommandExecutor)
executor.On("Execute", "fdesetup", []string{"status"}).
Return([]byte("FileVault is Off."), nil)
collector := NewDiskEncryptionCollector(executor)
status, err := collector.Collect()
assert.NoError(t, err)
assert.False(t, status.Enabled)
}
func TestFirewallCheck_Linux_UFWEnabled(t *testing.T) {
executor := new(MockCommandExecutor)
executor.On("Execute", "ufw", []string{"status"}).
Return([]byte("Status: active\n"), nil)
collector := NewFirewallCollector(executor)
status, err := collector.Collect()
assert.NoError(t, err)
assert.True(t, status.Enabled)
assert.Equal(t, "ufw", status.Type)
}
Integration Tests: Real System Queries
// tests/integration_test.go
//go:build integration
package tests
import (
"runtime"
"testing"
)
func TestRealDiskEncryptionStatus(t *testing.T) {
if runtime.GOOS != "darwin" {
t.Skip("macOS-specific test")
}
collector := NewDiskEncryptionCollector(nil) // Real executor
status, err := collector.Collect()
if err != nil {
t.Fatalf("Failed to collect disk encryption status: %v", err)
}
t.Logf("Disk encryption enabled: %v", status.Enabled)
if status.Enabled {
t.Logf("Algorithm: %s", status.Algorithm)
}
}
func TestRealFirewallStatus(t *testing.T) {
collector := NewFirewallCollector(nil)
status, err := collector.Collect()
if err != nil {
t.Fatalf("Failed to collect firewall status: %v", err)
}
t.Logf("Firewall enabled: %v", status.Enabled)
}
Signature Verification Tests
func TestReportSigningAndVerification(t *testing.T) {
// Generate test keypair
pubKey, privKey, _ := ed25519.GenerateKey(nil)
// Create signer with private key
signer := &Signer{privateKey: privKey}
report := map[string]interface{}{
"disk_encryption": true,
"firewall": true,
}
signedReport, err := signer.SignReport(report, "test-nonce-123")
assert.NoError(t, err)
// Verify signature
verifier := &Verifier{publicKey: pubKey}
err = verifier.Verify(signedReport)
assert.NoError(t, err)
}
func TestTamperedReportRejected(t *testing.T) {
pubKey, privKey, _ := ed25519.GenerateKey(nil)
signer := &Signer{privateKey: privKey}
report := map[string]interface{}{"firewall": true}
signedReport, _ := signer.SignReport(report, "nonce")
// Tamper with the report
signedReport.Report = []byte(`{"firewall": false}`)
verifier := &Verifier{publicKey: pubKey}
err := verifier.Verify(signedReport)
assert.Error(t, err)
assert.Contains(t, err.Error(), "signature verification failed")
}
Continuous Monitoring Tests
func TestChangeDetection(t *testing.T) {
changes := make(chan PostureChange, 10)
monitor := NewMonitor(100*time.Millisecond, func(c PostureChange) {
changes <- c
})
// Start with firewall enabled
mockCollector := &MockCollector{}
mockCollector.SetState(&FirewallStatus{Enabled: true})
monitor.RegisterCollector("firewall", mockCollector)
ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
defer cancel()
go monitor.Start(ctx)
// Wait for initial collection
time.Sleep(50 * time.Millisecond)
// Simulate firewall being disabled
mockCollector.SetState(&FirewallStatus{Enabled: false})
// Wait for change detection
select {
case change := <-changes:
assert.Equal(t, "firewall", change.Component)
assert.Equal(t, "critical", change.Severity)
case <-time.After(1 * time.Second):
t.Fatal("Expected change notification not received")
}
}
Common Pitfalls & Debugging
Pitfall 1: Permission Errors on System Queries
Symptom: Agent fails with "operation not permitted" or "access denied."
Cause: Many security queries require elevated privileges.
Solution:
# Bad: Running as normal user
$ ./device-health-agent
Error: fdesetup: operation not permitted
# Good: Running with appropriate permissions
$ sudo ./device-health-agent
FileVault is On.
# Better: Use capabilities (Linux) instead of full root
$ sudo setcap 'cap_dac_read_search+ep' ./device-health-agent
In Code:
func checkPermissions() error {
if runtime.GOOS == "darwin" || runtime.GOOS == "linux" {
if os.Geteuid() != 0 {
return fmt.Errorf("this agent requires root privileges. Run with sudo")
}
}
return nil
}
Pitfall 2: OS Command Parsing Fragility
Symptom: Agent works on one machine but fails on another.
Cause: Command output format varies between OS versions.
# macOS 13.x
$ fdesetup status
FileVault is On.
# macOS 14.x (hypothetical change)
$ fdesetup status
FileVault: Enabled
Encryption Type: APFS
# An exact string match breaks!
Solution:
// Bad: Fragile exact match
if output == "FileVault is On." {
return true
}
// Good: Flexible matching
func isFileVaultEnabled(output string) bool {
output = strings.ToLower(output)
return strings.Contains(output, "filevault is on") ||
strings.Contains(output, "filevault: enabled") ||
strings.Contains(output, "fully encrypted")
}
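One further hardening step is to distinguish "off" from "unrecognized output", so that a format change surfaces as an error instead of silently reporting an unencrypted disk. The sketch below illustrates the idea; the enum and function names are hypothetical, and the "filevault: enabled" and "filevault: disabled" markers mirror the hypothetical format change shown above, not real macOS output.

```python
from enum import Enum


class Encryption(Enum):
    ON = "on"
    OFF = "off"
    UNKNOWN = "unknown"  # output did not match any known format


# Lower-case substrings that indicate each state
_ON_MARKERS = ("filevault is on", "filevault: enabled", "fully encrypted")
_OFF_MARKERS = ("filevault is off", "filevault: disabled")


def classify_filevault(output: str) -> Encryption:
    """Classify fdesetup output, returning UNKNOWN for unfamiliar text."""
    text = output.lower()
    if any(marker in text for marker in _ON_MARKERS):
        return Encryption.ON
    if any(marker in text for marker in _OFF_MARKERS):
        return Encryption.OFF
    return Encryption.UNKNOWN
```

A PDP can then treat UNKNOWN as a reason to lower trust and alert the agent's maintainers, which is safer than guessing in either direction.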
Pitfall 3: Clock Skew Breaking Signature Verification
Symptom: Valid signatures rejected with "report is too old."
Cause: Clock skew between agent and PDP.
Solution:
// Allow for clock skew (up to 5 minutes in either direction)
func verifyTimestamp(reportTime time.Time) error {
now := time.Now()
skew := 5 * time.Minute
if reportTime.Before(now.Add(-skew)) {
return fmt.Errorf("report is too old: %v ago", now.Sub(reportTime))
}
if reportTime.After(now.Add(skew)) {
return fmt.Errorf("report is from the future: %v ahead", reportTime.Sub(now))
}
return nil
}
Also consider using NTP and monitoring clock drift.
Pitfall 4: Race Conditions in Continuous Monitoring
Symptom: Duplicate change notifications or missed changes.
Cause: Reading and writing state without proper synchronization.
Solution:
// Bad: No synchronization
func (m *Monitor) checkForChanges() {
for name, collector := range m.collectors {
current := collector.Collect()
if current != m.previousState[name] { // Race!
m.onChange(...)
m.previousState[name] = current // Race!
}
}
}
// Good: Proper locking
func (m *Monitor) checkForChanges() {
m.mu.Lock()
defer m.mu.Unlock()
for name, collector := range m.collectors {
current, _ := collector.Collect()
previous := m.previousState[name]
if !reflect.DeepEqual(current, previous) {
m.onChange(...)
m.previousState[name] = current
}
}
}
Pitfall 5: Agent Itself Becoming an Attack Vector
Symptom: Attacker uses the agent to exfiltrate data or pivot.
Cause: Agent runs with elevated privileges and exposes HTTP API.
Solution:
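// Bind only to localhost
server := &http.Server{
    Addr: "127.0.0.1:8080", // NOT 0.0.0.0:8080
}

// Require authentication for API
func authMiddleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        token := r.Header.Get("X-Agent-Token")
        // Constant-time comparison (crypto/subtle) avoids leaking the
        // token through response timing
        if subtle.ConstantTimeCompare([]byte(token), []byte(expectedToken)) != 1 {
            http.Error(w, "Unauthorized", http.StatusUnauthorized)
            return
        }
        next.ServeHTTP(w, r)
    })
}

// Drop privileges after startup
func dropPrivileges() error {
    // On Linux, switch to unprivileged user after binding port.
    // Drop the group first: once the UID changes, Setgid is no longer allowed.
    if runtime.GOOS == "linux" {
        if err := syscall.Setgid(nobodyGid); err != nil {
            return fmt.Errorf("setgid failed: %w", err)
        }
        if err := syscall.Setuid(nobodyUid); err != nil {
            return fmt.Errorf("setuid failed: %w", err)
        }
    }
    return nil
}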
Debugging Commands
# Test individual collectors
$ ./device-health-agent --check disk --debug
[DEBUG] Executing: fdesetup status
[DEBUG] Raw output: "FileVault is On.\n"
[DEBUG] Parsed result: {Enabled: true, Algorithm: "AES-XTS"}
# View full report without sending to PDP
$ ./device-health-agent --dry-run --json | jq .
# Verify signature manually
$ ./device-health-agent --verify-key /path/to/public.pem --report report.json
# Monitor system calls (Linux)
$ strace -f ./device-health-agent --check firewall
# Monitor system calls (macOS)
$ dtruss ./device-health-agent --check disk
Extensions & Challenges
Extension 1: TPM Integration
Integrate with the Trusted Platform Module for hardware-backed attestation:
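// Use go-tpm library. This is the legacy TPM 2.0 API; newer go-tpm releases
// provide it under github.com/google/go-tpm/legacy/tpm2.
import (
    "github.com/google/go-tpm/tpm2"
    "github.com/google/go-tpm/tpmutil"
)

func getTPMAttestation(nonce []byte) (*TPMQuote, error) {
    // Open TPM device
    tpm, err := tpm2.OpenTPM("/dev/tpmrm0")
    if err != nil {
        return nil, err
    }
    defer tpm.Close()
    // Get AIK handle (assumes key already created)
    aikHandle := tpmutil.Handle(0x81010001)
    // Boot PCRs to include in the quote
    sel := tpm2.PCRSelection{
        Hash: tpm2.AlgSHA256,
        PCRs: []int{0, 1, 2, 7},
    }
    // Quote PCRs with nonce
    quote, sig, err := tpm2.Quote(
        tpm,
        aikHandle,
        "", // AIK password
        "", // data qualifier
        nonce,
        sel,
        tpm2.AlgNull,
    )
    if err != nil {
        return nil, err
    }
    // Read the PCR values the verifier will compare against the quote
    pcrValues, err := tpm2.ReadPCRs(tpm, sel)
    if err != nil {
        return nil, err
    }
    return &TPMQuote{
        Quote:     quote,
        Signature: sig,
        PCRs:      pcrValues,
    }, nil
}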
Extension 2: CVE Vulnerability Scanning
Check installed software versions against known CVEs:
import requests


def check_cves(installed_packages: dict) -> list:
    """Check packages against NVD (National Vulnerability Database)."""
    vulnerabilities = []
    for package, pkg_version in installed_packages.items():
        # Query NVD API (simplified; unauthenticated requests are rate limited)
        response = requests.get(
            "https://services.nvd.nist.gov/rest/json/cves/2.0",
            params={
                "keywordSearch": package,
                "resultsPerPage": 20
            },
            timeout=30
        )
        response.raise_for_status()
        cves = response.json().get("vulnerabilities", [])
        for cve in cves:
            cve_data = cve.get("cve", {})
            # Check if our version is affected. is_version_affected and
            # get_cvss_severity are helpers you need to supply.
            if is_version_affected(pkg_version, cve_data):
                # Guard against a missing or empty descriptions list
                descriptions = cve_data.get("descriptions") or [{}]
                vulnerabilities.append({
                    "cve_id": cve_data.get("id"),
                    "package": package,
                    "severity": get_cvss_severity(cve_data),
                    "description": descriptions[0].get("value")
                })
    return vulnerabilities
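The `is_version_affected` helper is left undefined above. Its core is a range comparison against the version bounds that NVD attaches to each CPE match entry (`versionStartIncluding`, `versionStartExcluding`, `versionEndIncluding`, `versionEndExcluding`). The sketch below covers only that comparison, using hypothetical function names and a deliberately simple numeric version parser; real package versions (epochs, suffixes such as `-1ubuntu2`) need a proper version library.

```python
def parse_version(text: str, width: int = 4) -> tuple:
    """Turn '1.2.10' into (1, 2, 10, 0); non-numeric parts count as 0."""
    parts = [int(p) if p.isdigit() else 0 for p in text.split(".")]
    # Pad or truncate so that '1.2' and '1.2.0' compare as equal
    return tuple((parts + [0] * width)[:width])


def version_in_range(installed: str, cpe_match: dict) -> bool:
    """Check an installed version against the bounds of one cpeMatch entry."""
    v = parse_version(installed)
    checks = [
        ("versionStartIncluding", lambda bound: v >= bound),
        ("versionStartExcluding", lambda bound: v > bound),
        ("versionEndIncluding", lambda bound: v <= bound),
        ("versionEndExcluding", lambda bound: v < bound),
    ]
    for key, satisfied in checks:
        bound = cpe_match.get(key)
        if bound is not None and not satisfied(parse_version(bound)):
            return False
    # No bound was violated. An entry with no bounds at all matches every
    # version here, so callers should also compare the CPE name itself.
    return True
```

A full `is_version_affected` would walk the CVE's configuration data, pick the entries whose CPE name refers to the package, and call `version_in_range` on each.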
Extension 3: EDR Integration
Integrate with endpoint detection tools like CrowdStrike, Carbon Black, or osquery:
// Query osquery for security posture (uses github.com/osquery/osquery-go)
func queryOsquery() (map[string]interface{}, error) {
    client, err := osquery.NewClient("/var/osquery/osquery.em", 10*time.Second)
    if err != nil {
        return nil, err
    }
    defer client.Close()
    // Check for processes running from unusual locations
    resp, err := client.Query(`
        SELECT name, path, pid
        FROM processes
        WHERE path NOT LIKE '/usr/%'
        AND path NOT LIKE '/System/%'
        AND path NOT LIKE '/Applications/%'
    `)
    if err != nil {
        return nil, err
    }
    if resp.Status.Code != 0 {
        return nil, fmt.Errorf("osquery error: %s", resp.Status.Message)
    }
    return map[string]interface{}{
        "suspicious_processes": resp.Response,
    }, nil
}
Extension 4: Mobile Device Support (iOS/Android)
Extend the agent concept to mobile devices:
// iOS: device integrity check using the DeviceCheck framework
import DeviceCheck

class iOSPostureChecker {
    func checkDeviceIntegrity(completion: @escaping (DevicePosture) -> Void) {
        let currentDevice = DCDevice.current
        // Check if device supports DeviceCheck
        guard currentDevice.isSupported else {
            completion(DevicePosture(trusted: false, reason: "DeviceCheck not supported"))
            return
        }
        // Generate device token
        currentDevice.generateToken { token, error in
            guard let token = token else {
                // Report the failure instead of leaving the caller waiting
                completion(DevicePosture(trusted: false, reason: "Token generation failed"))
                return
            }
            // Send token to your server for validation
            self.validateToken(token) { isValid in
                completion(DevicePosture(
                    trusted: isValid,
                    jailbroken: self.checkJailbreak(),
                    passcodeSet: self.checkPasscode()
                ))
            }
        }
    }

    // Heuristic only: these paths are common on jailbroken devices,
    // but their absence does not prove the device is intact
    private func checkJailbreak() -> Bool {
        let jailbreakPaths = [
            "/Applications/Cydia.app",
            "/private/var/lib/apt",
            "/usr/sbin/sshd"
        ]
        return jailbreakPaths.contains { FileManager.default.fileExists(atPath: $0) }
    }
}
Extension 5: Behavioral Anomaly Detection
Add behavioral monitoring to detect compromised devices:
from collections import deque


class BehavioralMonitor:
    # calculate_rate() and is_suspicious_path() are helper methods that
    # you need to supply; they are not shown here.
    def __init__(self, normal_connection_rate: float, window_size: int = 100):
        # Baseline connections per minute, measured during normal operation
        self.normal_connection_rate = normal_connection_rate
        self.network_connections = deque(maxlen=window_size)
        self.process_starts = deque(maxlen=window_size)
        self.file_accesses = deque(maxlen=window_size)

    def record_event(self, event_type: str, data: dict) -> list:
        if event_type == "network":
            self.network_connections.append(data)
        elif event_type == "process":
            self.process_starts.append(data)
        elif event_type == "file":
            self.file_accesses.append(data)
        # Check for anomalies
        return self.detect_anomalies()

    def detect_anomalies(self) -> list:
        anomalies = []
        # Check for unusual number of outbound connections
        if len(self.network_connections) > 10:
            connections_per_minute = self.calculate_rate(self.network_connections)
            if connections_per_minute > self.normal_connection_rate * 3:
                anomalies.append({
                    "type": "excessive_network",
                    "rate": connections_per_minute,
                    "severity": "warning"
                })
        # Check for processes from unusual locations
        for proc in list(self.process_starts)[-10:]:
            if self.is_suspicious_path(proc.get("path")):
                anomalies.append({
                    "type": "suspicious_process",
                    "path": proc.get("path"),
                    "severity": "critical"
                })
        return anomalies
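The rate calculation used above is not shown. One possible sketch, assuming each recorded event carries a timestamp in seconds, is given below. The function names and the factor of 3 (taken from the comparison above) are illustrative.

```python
from typing import Sequence


def events_per_minute(timestamps: Sequence[float]) -> float:
    """Average event rate over the window covered by the timestamps."""
    if len(timestamps) < 2:
        return 0.0
    span = timestamps[-1] - timestamps[0]  # seconds from first to last event
    if span <= 0:
        return 0.0
    # n events span n-1 intervals; scale from per-second to per-minute
    return (len(timestamps) - 1) * 60.0 / span


def is_excessive(timestamps: Sequence[float], baseline_per_minute: float,
                 factor: float = 3.0) -> bool:
    """Flag the window when its rate exceeds factor times the baseline."""
    return events_per_minute(timestamps) > baseline_per_minute * factor
```

A fixed multiple of the baseline is a blunt threshold; it is easy to reason about, but a device with bursty yet legitimate traffic may need a longer window or a per-device baseline.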
Books That Will Help
Primary Reading
| Book | Author | Relevant Chapters |
|---|---|---|
| Zero Trust Security | Andravous | Ch. 5 (Device Trust), Ch. 7 (Continuous Verification) |
| Zero Trust Networks | Evan Gilman, Doug Barth | Ch. 4 (Device Trust), Ch. 8 (Endpoint Security) |
| Security in Computing | Charles Pfleeger | Ch. 3 (Authentication), Ch. 6 (Operating Systems) |
Secondary Reading
| Book | Author | Why It Helps |
|---|---|---|
| The Linux Programming Interface | Michael Kerrisk | System programming fundamentals |
| Serious Cryptography, 2nd Edition | Jean-Philippe Aumasson | Understanding signatures and attestation |
| Practical Binary Analysis | Dennis Andriesse | Low-level system inspection techniques |
| macOS Internals | Jonathan Levin | macOS security architecture |
| Windows Internals, 7th Edition | Russinovich et al. | Windows security subsystems |
Standards and Specifications
| Document | Source | Content |
|---|---|---|
| NIST SP 800-207 | NIST | Zero Trust Architecture definition |
| NIST SP 800-123 | NIST | Guide to General Server Security |
| CIS Benchmarks | CIS | Security configuration baselines |
| TCG TPM 2.0 Specification | TCG | Hardware attestation |
Interview Questions
After completing this project, you should be able to answer:
Conceptual Questions
- "Why is device trust important in Zero Trust Architecture?"
A valid user on a compromised device is still a threat. ZTA requires verifying both identity AND device health before granting access. A stolen laptop with valid credentials should not have the same access as a healthy corporate device.
- "What's the difference between software-based and hardware-based attestation?"
Software attestation uses agent-collected data and software keys. It is simpler, but a compromised OS could lie. Hardware attestation uses the TPM to measure and sign boot state. It is more secure but more complex. The TPM measures boot integrity before the OS loads, so a compromised kernel can't forge those measurements.
- "How do you prevent an attacker from forging a healthy device report?"
Sign reports with a device-specific private key that only the legitimate agent possesses. Include a nonce from the verifier to prevent replay attacks. Include timestamps to detect stale reports. For maximum security, use TPM-backed keys that cannot be extracted even with root access.
- "What's the advantage of continuous verification over point-in-time checks?"
Point-in-time checks can be bypassed by changing device state after authentication. Continuous monitoring detects changes in real time and can revoke access immediately. This shrinks the window between compromise and access revocation from hours to seconds.
- "How would you handle BYOD devices in a Zero Trust model?"
Apply a lower base trust score, limit access to less sensitive resources, require more frequent re-authentication, use containerization for work data, and collect only security-relevant telemetry to respect privacy.
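The nonce and timestamp checks from the forgery answer above can be sketched on the verifier side as follows. This is a minimal illustration that assumes the report's signature has already been verified. The class name `FreshnessChecker`, its methods, the report fields `nonce` and `timestamp`, and the 60-second window are all assumptions made for the example, not part of the project specification.

```python
import secrets
import time


class FreshnessChecker:
    """Tracks verifier-issued nonces and rejects replayed or stale reports.

    Hypothetical helper: names and the default window are illustrative.
    """

    def __init__(self, max_age_seconds=60.0):
        self.max_age_seconds = max_age_seconds
        self._outstanding = {}  # nonce -> time at which it was issued

    def issue_nonce(self, now=None):
        """Create a random nonce the agent must echo back in its report."""
        now = time.time() if now is None else now
        nonce = secrets.token_hex(16)
        self._outstanding[nonce] = now
        return nonce

    def check(self, report, now=None):
        """Return (ok, reason). Each nonce is consumed on first use."""
        now = time.time() if now is None else now
        nonce = report.get("nonce")
        if not isinstance(nonce, str):
            return False, "missing_nonce"
        # pop() removes the nonce, so a replayed report fails the second time
        issued_at = self._outstanding.pop(nonce, None)
        if issued_at is None:
            return False, "unknown_or_reused_nonce"
        if now - issued_at > self.max_age_seconds:
            return False, "nonce_expired"
        ts = report.get("timestamp")
        if not isinstance(ts, (int, float)) or abs(now - ts) > self.max_age_seconds:
            return False, "stale_timestamp"
        return True, "fresh"
```

A verifier would call `issue_nonce()` when requesting a report, then call `check()` only after the signature over the whole report (nonce and timestamp included) has been validated. If the nonce were not covered by the signature, an attacker could splice a fresh nonce onto an old report.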
Technical Questions
- "How would you query disk encryption status on macOS?"
Use `fdesetup status`, which prints "FileVault is On." or "FileVault is Off." Parse the output to determine the enabled state. For more detail, use `fdesetup status -extended`, which prints more detailed status information.
- "What are Platform Configuration Registers (PCRs) in a TPM?"
PCRs are special registers that store cryptographic measurements of the boot process. They can only be extended (`new_value = hash(old_value || measurement)`, where `||` is concatenation), not overwritten. This creates a chain of trust from firmware through bootloader to kernel. Standard PCRs include 0 (BIOS), 4 (bootloader), and 7 (Secure Boot policy).
- "How would you detect that a firewall was just disabled?"
Poll the firewall status periodically (e.g., every 60 seconds) or use OS event APIs for immediate notification. When a change is detected, generate an updated health report, recalculate the trust score, and send to the PDP for access re-evaluation.
- "Why use Ed25519 for report signing instead of RSA?"
Ed25519 provides 128-bit security with 256-bit keys (vs 3072-bit RSA), faster signing and verification, deterministic signatures (no random nonce needed), and resistance to timing attacks. It's also simpler to implement correctly.
- "How would you integrate this agent with the Policy Decision Point from Project 2?"
The agent exposes a `/health` endpoint or pushes reports to a PDP endpoint. Reports include structured posture data and a trust score. The PDP stores the device public key, verifies report signatures, and uses the trust score as an input to access decisions. The PDP can request fresh reports with a nonce to prevent replay.
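To make the PCR extend rule from the TPM question above concrete, here is a small software model of it. This is only a sketch: a real TPM performs the extend in hardware, and the function names `extend` and `replay`, the choice of SHA-256, and the sample component names are assumptions made for illustration.

```python
import hashlib

PCR_SIZE = 32  # a SHA-256 PCR bank holds 32-byte digests


def extend(pcr_value: bytes, measurement: bytes) -> bytes:
    """Model of extend: new_value = SHA-256(old_value || SHA-256(measurement)).

    The component is hashed first, then that digest is folded into the PCR.
    """
    digest = hashlib.sha256(measurement).digest()
    return hashlib.sha256(pcr_value + digest).digest()


def replay(event_log) -> bytes:
    """Recompute the final PCR value from an ordered list of measurements."""
    pcr = bytes(PCR_SIZE)  # this model starts from an all-zero register
    for measurement in event_log:
        pcr = extend(pcr, measurement)
    return pcr


if __name__ == "__main__":
    # Hypothetical boot components, standing in for real firmware images
    good = replay([b"firmware", b"bootloader", b"kernel"])
    tampered = replay([b"firmware", b"evil-bootloader", b"kernel"])
    print("match" if good == tampered else "mismatch")
```

Because each new value depends on the previous one, changing any component (or the order of components) changes the final value, and there is no operation that sets the register back to a chosen value. A verifier that knows the expected measurements can replay them the same way and compare the result with the value the TPM reports.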
Self-Assessment Checklist
Before considering this project complete, verify your understanding:
Conceptual Understanding
- Can you explain why device trust is a pillar of Zero Trust Architecture?
- Can you describe at least 5 security posture checks and why each matters?
- Can you explain the difference between software and hardware attestation?
- Can you articulate why trust scores are better than binary decisions?
- Can you explain the security benefits of continuous vs point-in-time verification?
Implementation Skills
- Can you query disk encryption status on at least two operating systems?
- Can you query firewall status on at least two operating systems?
- Can you determine OS patch level and days since last update?
- Can you generate and sign a structured health report?
- Can you verify a signed report on the receiving end?
Cross-Platform Development
- Does your agent work on macOS?
- Does your agent work on Linux (at least Ubuntu)?
- Do you have a clear abstraction layer for OS-specific code?
- Can you add Windows support without major refactoring?
Security Considerations
- Are your reports signed with strong cryptography (Ed25519 or similar)?
- Do you include timestamps and nonces to prevent replay attacks?
- Does your agent require appropriate permissions without running as root unnecessarily?
- Is your local HTTP API bound to localhost only?
- Have you considered how an attacker might try to forge healthy reports?
Real-World Readiness
- Can your agent run as a background service/daemon?
- Does it handle system sleep/wake cycles gracefully?
- Does it recover from temporary errors (network issues, permission changes)?
- Can it be configured without recompiling (config file, env vars)?
- Could you deploy this to a fleet of 1000 devices?
Integration Capability
- Does your report format match what a PDP would expect?
- Can you extend the agent with new collectors without major changes?
- Is the trust scoring model configurable?
- Could this integrate with your Project 2 Policy Decision Engine?
The Core Question You've Answered
"How do I know the device connecting to my system is secure, and not a compromised laptop pretending to be trusted?"
This is THE fundamental question of endpoint security in Zero Trust. By building this device health agent, you have mastered:
- System Introspection: Querying OS state to determine security posture
- Cross-Platform Programming: Abstracting OS-specific APIs behind clean interfaces
- Cryptographic Attestation: Signing reports so they cannot be forged
- Continuous Security: Monitoring for changes and reacting in real-time
- Risk-Based Decisions: Moving beyond binary trust to nuanced scoring
You now understand that in Zero Trust, the question is never "Is this user authorized?" but rather "Is this user, on this device, at this time, in this context, authorized for this specific action?"
Your device health agent is one critical input to that complex decision.
Project Guide Version 1.0 - December 2025