Project 10: Cross-View File Audit

Compare filesystem API listings with raw disk scans for hidden files.

Quick Reference

Attribute Value
Difficulty Level 3
Time Estimate 1-2 weeks
Main Programming Language Bash
Alternative Programming Languages Python, PowerShell
Coolness Level Level 3
Business Potential Level 2
Prerequisites OS internals basics, CLI usage, logging familiarity
Key Topics file enumeration, raw scans

1. Learning Objectives

By completing this project, you will:

  1. Build a repeatable workflow for a cross-view file audit.
  2. Generate reports with deterministic outputs.
  3. Translate findings into actionable recommendations.

2. All Theory Needed (Per-Concept Breakdown)

Cross-View Detection and Independent Sources of Truth

Fundamentals

Cross-view detection compares two or more independent perspectives of system state to detect manipulation. Rootkits hide by intercepting the interfaces your tools use. If you only rely on one view, a rootkit can lie to you consistently. Cross-view techniques break that illusion by comparing a user-space list to a raw kernel list, or a filesystem API to a raw disk scan, or a socket table to packet captures. Any mismatch is a signal that warrants investigation. Cross-view does not require certainty; it is about exposing contradictions.

Deep Dive into the concept

Rootkits operate by tampering with enumerators. On Windows, a kernel rootkit might filter the list of processes returned by NtQuerySystemInformation. On Linux, it might hook getdents to hide files. On any platform, the API layer is a chokepoint. Cross-view detection exploits the fact that it is hard to perfectly falsify every independent source. If you compare the output of an OS API with raw memory scanning or direct disk reads, you can discover objects that exist but are hidden.

The challenge is that independent views are rarely perfectly aligned. Processes can exit between scans; files can be created and deleted; network connections can be short-lived. Cross-view detection therefore requires correlation logic. You must normalize identifiers (PIDs, inode numbers, connection 5-tuples), apply time windows, and handle transient artifacts. A well-designed cross-view tool uses tolerance thresholds and timestamps to reduce noise without ignoring real anomalies.
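
As a concrete illustration of a correlation window, the sketch below flags only entries that are hidden in two passes taken a short interval apart, which filters out files created or deleted between views. It is a minimal Bash sketch; collect_api_view and collect_raw_view are hypothetical placeholders for your own collectors, each assumed to print one normalized path per line.

# Hypothetical collectors: each prints one normalized path per line.
hidden_in_pass() {                 # paths in the raw view but not in the API view
  comm -13 <(collect_api_view | sort -u) <(collect_raw_view | sort -u)
}

hidden_in_pass > diff_1.txt
sleep 30                           # correlation window between the two passes
hidden_in_pass > diff_2.txt

# Only entries hidden in both passes survive; transient files drop out.
comm -12 diff_1.txt diff_2.txt > persistent_hidden.txt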

A strong cross-view strategy uses at least one view that is difficult for the rootkit to tamper with. For example, scanning raw memory structures or parsing raw disk blocks bypasses normal API hooks. Out-of-band monitoring (hypervisor introspection, external packet capture) provides another strong view. You can also use data from a different privilege domain, such as an EDR agent running with kernel access compared with a user-space tool. The key is to establish independence; otherwise you may be comparing two views that are both compromised.

Cross-view detection is not a verdict; it is a lead. When you see a mismatch, you must investigate and validate. That may include additional scans, signature checks, baseline comparisons, or memory forensics. For defenders, the value is speed: cross-view techniques can rapidly surface anomalies in a large system without needing full reverse engineering. In rootkit defense, cross-view is your primary tactic for detecting stealth.

How this fits into the project

You will apply this in Section 3.2 (Functional Requirements), Section 4.4 (Algorithm Overview), and Section 6.2 (Critical Test Cases). Also used in: P09-cross-view-process-audit, P10-cross-view-file-audit, P11-network-stealth-detection, P12-memory-forensics-triage.

Definitions & key terms

  • Cross-view: Comparing multiple independent sources of system state to detect inconsistencies.
  • Enumerator: An interface that lists system objects (processes, files, connections).
  • Independent view: A data source the rootkit cannot easily tamper with in the same way.
  • Correlation window: A time or state window used to match objects across views.

Mental model diagram

View A (OS API)       --> [List A] --+
                                     +--> [Diff Engine] --> [Anomalies]
View B (Raw/External) --> [List B] --+

How it works (step-by-step)

  1. Collect list A using standard OS APIs.
  2. Collect list B using raw memory/disk or external telemetry.
  3. Normalize identifiers and timestamps.
  4. Diff the lists and flag mismatches.
  5. Validate anomalies with additional checks or baselines.

Minimal concrete example

api_processes: [1234, 1240, 1302]
memscan_processes: [1234, 1240, 1302, 1310]
hidden_candidates: [1310]
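
For this project the two views are a filesystem API listing and a raw on-disk directory listing. The Bash sketch below shows one directory on an ext4 volume; /dev/sdX1 and the audited path are placeholders, GNU find and debugfs (from e2fsprogs) are assumed to be available, and the awk field position assumes debugfs's parsable ls output of the form /inode/mode/uid/gid/name/size/.

# View A: what the mounted filesystem API reports for /etc (names only, one level deep)
find /etc -mindepth 1 -maxdepth 1 -printf '%f\n' | sort -u > api_list.txt

# View B: directory entries read straight from the block device, bypassing the VFS
debugfs -R 'ls -p /etc' /dev/sdX1 2>/dev/null \
  | awk -F/ 'NF >= 7 {print $6}' \
  | grep -vE '^\.{1,2}$' | sort -u > raw_list.txt

# Names present only in the raw view are hidden-file candidates
comm -13 api_list.txt raw_list.txt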

Common misconceptions

  • “Cross-view results are definitive.” They are indicators that require validation.
  • “One extra view is always enough.” Independence matters more than quantity.
  • “False positives mean the method is useless.” They usually indicate poor correlation logic.

Check-your-understanding questions

  • Why is independence between views critical?
  • What is a common cause of false positives in cross-view diffing?
  • How do you validate a suspected hidden artifact?

Check-your-understanding answers

  • If both views share a compromised interface, the rootkit can lie consistently in both.
  • Timing differences or transient objects that appear in one view but not the other.
  • Use additional telemetry, memory analysis, or disk scans to corroborate.

Real-world applications

  • Hidden process detection in incident response.
  • Filesystem integrity scanning in compromised systems.

Where you’ll apply it

You will apply this in Section 3.2 (Functional Requirements), Section 4.4 (Algorithm Overview), and Section 6.2 (Critical Test Cases). Also used in: P09-cross-view-process-audit, P10-cross-view-file-audit, P11-network-stealth-detection, P12-memory-forensics-triage.

References

  • The Art of Memory Forensics - cross-view process analysis
  • Rootkit detection research papers on cross-view diffing

Key insights

Cross-view checks reveal lies by forcing a system to contradict itself.

Summary

Compare independent sources of truth to expose hidden processes, files, or connections.

Homework/Exercises to practice the concept

  • Design a diff algorithm that tolerates short-lived processes.
  • List two truly independent views for file enumeration on your OS.

Solutions to the homework/exercises

  • Use time windows and PID reuse checks to avoid false positives.
  • Combine filesystem APIs with raw disk parsing or offline scans.

3. Project Specification

3.1 What You Will Build

A tool (or documented workflow) that compares filesystem API listings with raw disk scans to surface hidden files.

3.2 Functional Requirements

  1. Collect required system artifacts for the task.
  2. Normalize data and produce a report output.
  3. Provide a deterministic golden-path demo.
  4. Include explicit failure handling and exit codes.

3.3 Non-Functional Requirements

  • Performance: Complete within a typical maintenance window.
  • Reliability: Outputs must be deterministic and versioned.
  • Usability: Clear CLI output and documentation.

3.4 Example Usage / Output

$ ./P10-cross-view-file-audit.sh --report
[ok] report generated

3.5 Data Formats / Schemas / Protocols

Report JSON schema with fields: timestamp, host, findings, severity, remediation.
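
An illustrative instance of that schema (all values below are made up for demonstration):

{
  "timestamp": "2024-01-01T00:00:00Z",
  "host": "lab-host-01",
  "findings": [
    {
      "id": "F-001",
      "description": "Directory entry present in raw disk scan but missing from API listing",
      "severity": "high",
      "evidence": {"path": "/etc/.hidden.so", "views": ["raw"]},
      "remediation": "Image the disk and inspect the file offline"
    }
  ],
  "severity": "high",
  "remediation": "See per-finding guidance"
}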

3.6 Edge Cases

  • Missing permissions or insufficient privileges.
  • Tooling not installed (e.g., no raw disk access utility such as debugfs available).
  • Empty data sets (no files found in the audited scope).

3.7 Real World Outcome

A deterministic report output stored in a case directory with hashes.

3.7.1 How to Run (Copy/Paste)

./P10-cross-view-file-audit.sh --out reports/P10-cross-view-file-audit.json

3.7.2 Golden Path Demo (Deterministic)

  • Report file exists and includes findings with severity.

3.7.3 Failure Demo

$ ./P10-cross-view-file-audit.sh --out /readonly/report.json
[error] cannot write report file
exit code: 2

Exit Codes:

  • 0 success
  • 2 output error
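
A minimal sketch of how the wrapper script could produce these exit codes; the error message mirrors the failure demo above, the default output path is illustrative, and the argument parsing is deliberately naive:

#!/usr/bin/env bash
out="${2:-reports/P10-cross-view-file-audit.json}"   # invoked as: script --out <path>

if ! touch "$out" 2>/dev/null; then                  # probe that the path is writable
  echo "[error] cannot write report file" >&2
  exit 2
fi

# ... collect both views, diff them, and write the findings into "$out" ...
echo "[ok] report generated"
exit 0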

4. Solution Architecture

4.1 High-Level Design

[Collector] -> [Analyzer] -> [Report]

4.2 Key Components

Component Responsibility Key Decisions
Collector Collects raw artifacts Prefer OS-native tools
Analyzer Normalizes and scores findings Deterministic rules
Reporter Outputs report JSON + Markdown

4.3 Data Structures (No Full Code)

finding = { id, description, severity, evidence, remediation }

4.4 Algorithm Overview

Key Algorithm: Normalize and Score (a scoring sketch follows the steps below)

  1. Collect artifacts.
  2. Normalize fields.
  3. Apply scoring rules.
  4. Output report.
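
A minimal sketch of step 3, assuming the diff results already sit in hidden_candidates.txt; the severity rules are placeholders chosen only to show deterministic scoring:

# Placeholder rule set: anything hidden under a system path scores high.
while IFS= read -r path; do
  case "$path" in
    /etc/*|/lib/*|/usr/*) sev="high" ;;
    *)                    sev="medium" ;;
  esac
  printf '%s\t%s\n' "$sev" "$path"
done < hidden_candidates.txt | sort > scored_findings.tsv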

Complexity Analysis:

  • Time: O(n) for n artifacts.
  • Space: O(n) for report.

5. Implementation Guide

5.1 Development Environment Setup

# optional: Python virtual environment (only needed for the Python alternative)
python3 -m venv .venv && source .venv/bin/activate
# install OS-specific tools as needed (e.g., e2fsprogs for debugfs)

5.2 Project Structure

project/
|-- P10-cross-view-file-audit.sh
|-- src/
|   `-- main.py
|-- reports/
`-- README.md

5.3 The Core Question You’re Answering

“Can you detect hidden files or directories?”

This project turns theory into a repeatable, auditable workflow.

5.4 Concepts You Must Understand First

  • Relevant OS security controls
  • Detection workflows
  • Evidence handling

5.5 Questions to Guide Your Design

  1. What data sources are trusted for this task?
  2. How will you normalize differences across OS versions?
  3. What is a high-confidence signal vs noise?

5.6 Thinking Exercise

Sketch a pipeline from data collection to report output.

5.7 The Interview Questions They’ll Ask

  1. What is the main trust boundary in this project?
  2. How do you validate findings?
  3. What would you automate in production?

5.8 Hints in Layers

Hint 1: Start with a small, deterministic dataset.

Hint 2: Normalize output fields early.

Hint 3: Add a failure path with clear exit codes.


5.9 Books That Will Help

Topic Book Chapter
Rootkit defense Practical Malware Analysis Rootkit chapters
OS internals Operating Systems: Three Easy Pieces Processes and files

5.10 Implementation Phases

Phase 1: Data Collection (3-4 days)

Goals: Collect raw artifacts reliably.

Tasks:

  1. Identify OS-native tools.
  2. Capture sample data.

Checkpoint: Raw dataset stored.

Phase 2: Analysis & Reporting (4-5 days)

Goals: Normalize and score findings.

Tasks:

  1. Build analyzer.
  2. Generate report.

Checkpoint: Deterministic report generated.
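
One way to hit this checkpoint, assuming jq is installed and the analyzer writes one JSON finding object per line to work/findings.jsonl (a hypothetical intermediate file); -S sorts object keys and sort_by(.id) fixes the finding order, so repeated runs over the same data differ only in the timestamp. The severity roll-up rule is a placeholder.

jq -n -S \
  --arg timestamp "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --arg host "$(hostname)" \
  --slurpfile findings work/findings.jsonl \
  '{timestamp: $timestamp,
    host: $host,
    findings: ($findings | sort_by(.id)),
    severity: (if ($findings | length) > 0 then "high" else "info" end),
    remediation: "review each finding"}' \
  > reports/P10-cross-view-file-audit.json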

Phase 3: Validation (2-3 days)

Goals: Validate rules and handle edge cases.

Tasks:

  1. Add failure tests.
  2. Document runbook.

Checkpoint: Failure cases documented.

5.11 Key Implementation Decisions

Decision Options Recommendation Rationale
Report format JSON, CSV JSON Structured and diffable
Scoring Simple, Weighted Weighted Prioritize high risk findings

6. Testing Strategy

6.1 Test Categories

Category Purpose Examples
Unit Tests Parser logic Sample data parsing
Integration Tests End-to-end run Generate report
Edge Case Tests Missing permissions Error path

6.2 Critical Test Cases

  1. Report generated with deterministic ordering.
  2. Exit code indicates failure on invalid output path (see the sketch after this list).
  3. At least one high-risk finding is flagged in test data.
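
A minimal sketch of test case 2 as a plain shell check, reusing the read-only path from the failure demo (assumed to exist in the test environment):

./P10-cross-view-file-audit.sh --out /readonly/report.json
status=$?
if [ "$status" -ne 2 ]; then
  echo "FAIL: expected exit code 2, got $status"
  exit 1
fi
echo "PASS: invalid output path rejected with exit code 2"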

6.3 Test Data

Provide a small fixture file with one known suspicious artifact.
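
One possible fixture layout: two pre-sorted name lists in which the raw view contains a single planted entry (all names are illustrative):

fixtures/api_list.txt:
hosts
passwd
shadow

fixtures/raw_list.txt (one planted suspicious entry):
.hidden.so
hosts
passwd
shadow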

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall Symptom Solution
Noisy results Too many alerts Add normalization and thresholds
Missing permissions Script fails Detect and warn early

7.2 Debugging Strategies

  • Log raw inputs before normalization.
  • Add verbose mode to show rule evaluation.

7.3 Performance Traps

Scanning large datasets without filtering can be slow; restrict scope to critical paths.
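
One way to restrict scope, sketched in Bash with GNU find; the directory list is only an example and should match the paths you actually care about:

# Illustrative scope: audit only a few sensitive directories, one level deep.
SCOPE_DIRS=(/etc /lib/modules /usr/bin)
for dir in "${SCOPE_DIRS[@]}"; do
  find "$dir" -xdev -mindepth 1 -maxdepth 1 -printf '%p\n'
done | sort -u > api_list.txt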


8. Extensions & Challenges

8.1 Beginner Extensions

  • Add a Markdown summary report.

8.2 Intermediate Extensions

  • Add a JSON schema validator for output.

8.3 Advanced Extensions

  • Integrate with a SIEM or ticketing system.

9. Real-World Connections

9.1 Industry Applications

  • Security operations audits and detection validation.
  • osquery - endpoint inventory

9.2 Interview Relevance

  • Discussing detection workflows and auditability.

10. Resources

10.1 Essential Reading

  • Practical Malware Analysis - rootkit detection chapters

10.2 Video Resources

  • Conference talks on rootkit detection

10.3 Tools & Documentation

  • OS-native logging and audit tools

11. Self-Assessment Checklist

11.1 Understanding

  • I can describe the trust boundary for this task.

11.2 Implementation

  • Report generation is deterministic.

11.3 Growth

  • I can explain how to operationalize this check.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Report created and contains at least one finding.

Full Completion:

  • Findings are categorized with remediation guidance.

Excellence (Going Above & Beyond):

  • Integrated into a broader toolkit or pipeline.