Project 10: Cross-View File Audit

Compare filesystem API listings with raw disk scans for hidden files.

Quick Reference

Attribute Value
Difficulty Level 3
Time Estimate 1-2 weeks
Main Programming Language Bash
Alternative Programming Languages Python, PowerShell
Coolness Level Level 3
Business Potential Level 2
Prerequisites OS internals basics, CLI usage, logging familiarity
Key Topics file enumeration, raw scans

1. Learning Objectives

By completing this project, you will:

  1. Build a repeatable workflow for a cross-view file audit.
  2. Generate reports with deterministic outputs.
  3. Translate findings into actionable recommendations.

2. All Theory Needed (Per-Concept Breakdown)

Cross-View Detection and Independent Sources of Truth

Fundamentals

Cross-view detection compares two or more independent perspectives of system state to detect manipulation. Rootkits hide by intercepting the interfaces your tools use. If you only rely on one view, a rootkit can lie to you consistently. Cross-view techniques break that illusion by comparing a user-space list to a raw kernel list, or a filesystem API to a raw disk scan, or a socket table to packet captures. Any mismatch is a signal that warrants investigation. Cross-view does not require certainty; it is about exposing contradictions.

Deep Dive into the concept

Rootkits operate by tampering with enumerators. On Windows, a kernel rootkit might filter the list of processes returned by NtQuerySystemInformation. On Linux, it might hook getdents to hide files. On any platform, the API layer is a chokepoint. Cross-view detection exploits the fact that it is hard to perfectly falsify every independent source. If you compare the output of an OS API with raw memory scanning or direct disk reads, you can discover objects that exist but are hidden.

The challenge is that independent views are rarely perfectly aligned. Processes can exit between scans; files can be created and deleted; network connections can be short-lived. Cross-view detection therefore requires correlation logic. You must normalize identifiers (PIDs, inode numbers, connection 5-tuples), apply time windows, and handle transient artifacts. A well-designed cross-view tool uses tolerance thresholds and timestamps to reduce noise without ignoring real anomalies.
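
As a concrete illustration of a correlation window, the sketch below flags only entries that are hidden in two passes taken a short interval apart, which filters out files created or deleted between views. It is a minimal Bash sketch; collect_api_view and collect_raw_view are hypothetical placeholders for your own collectors, each assumed to print one normalized path per line.

# Hypothetical collectors: each prints one normalized path per line.
hidden_in_pass() {                 # paths in the raw view but not in the API view
  comm -13 <(collect_api_view | sort -u) <(collect_raw_view | sort -u)
}

hidden_in_pass > diff_1.txt
sleep 30                           # correlation window between the two passes
hidden_in_pass > diff_2.txt

# Only entries hidden in both passes survive; transient files drop out.
comm -12 diff_1.txt diff_2.txt > persistent_hidden.txt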

A strong cross-view strategy uses at least one view that is difficult for the rootkit to tamper with. For example, scanning raw memory structures or parsing raw disk blocks bypasses normal API hooks. Out-of-band monitoring (hypervisor introspection, external packet capture) provides another strong view. You can also use data from a different privilege domain, such as an EDR agent running with kernel access compared with a user-space tool. The key is to establish independence; otherwise you may be comparing two views that are both compromised.

Cross-view detection is not a verdict; it is a lead. When you see a mismatch, you must investigate and validate. That may include additional scans, signature checks, baseline comparisons, or memory forensics. For defenders, the value is speed: cross-view techniques can rapidly surface anomalies in a large system without needing full reverse engineering. In rootkit defense, cross-view is your primary tactic for detecting stealth.

How this fits into the project

You will apply this in Section 3.2 (Functional Requirements), Section 4.4 (Algorithm Overview), and Section 6.2 (Critical Test Cases). Also used in: P09-cross-view-process-audit, P10-cross-view-file-audit, P11-network-stealth-detection, P12-memory-forensics-triage.

Definitions & key terms

  • Cross-view: Comparing multiple independent sources of system state to detect inconsistencies.
  • Enumerator: An interface that lists system objects (processes, files, connections).
  • Independent view: A data source the rootkit cannot easily tamper with in the same way.
  • Correlation window: A time or state window used to match objects across views.

Mental model diagram

View A (OS API)       --> [List A] --+
                                     +--> [Diff Engine] --> [Anomalies]
View B (Raw/External) --> [List B] --+

How it works (step-by-step)

  1. Collect list A using standard OS APIs.
  2. Collect list B using raw memory/disk or external telemetry.
  3. Normalize identifiers and timestamps.
  4. Diff the lists and flag mismatches.
  5. Validate anomalies with additional checks or baselines.

Minimal concrete example

api_processes: [1234, 1240, 1302]
memscan_processes: [1234, 1240, 1302, 1310]
hidden_candidates: [1310]
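
For this project the two views are a filesystem API listing and a raw on-disk directory listing. The Bash sketch below shows one directory on an ext4 volume; /dev/sdX1 and the audited path are placeholders, GNU find and debugfs (from e2fsprogs) are assumed to be available, and the awk field position assumes debugfs's parsable ls output of the form /inode/mode/uid/gid/name/size/.

# View A: what the mounted filesystem API reports for /etc (names only, one level deep)
find /etc -mindepth 1 -maxdepth 1 -printf '%f\n' | sort -u > api_list.txt

# View B: directory entries read straight from the block device, bypassing the VFS
debugfs -R 'ls -p /etc' /dev/sdX1 2>/dev/null \
  | awk -F/ 'NF >= 7 {print $6}' \
  | grep -vE '^\.{1,2}$' | sort -u > raw_list.txt

# Names present only in the raw view are hidden-file candidates
comm -13 api_list.txt raw_list.txt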

Common misconceptions

  • “Cross-view results are definitive.” They are indicators that require validation.
  • “One extra view is always enough.” Independence matters more than quantity.
  • “False positives mean the method is useless.” They usually indicate poor correlation logic.

Check-your-understanding questions

  • Why is independence between views critical?
  • What is a common cause of false positives in cross-view diffing?
  • How do you validate a suspected hidden artifact?

Check-your-understanding answers

  • If both views share a compromised interface, the rootkit can lie consistently in both.
  • Timing differences or transient objects that appear in one view but not the other.
  • Use additional telemetry, memory analysis, or disk scans to corroborate.

Real-world applications

  • Hidden process detection in incident response.
  • Filesystem integrity scanning in compromised systems.

Where you’ll apply it

You will apply this in Section 3.2 (Functional Requirements), Section 4.4 (Algorithm Overview), and Section 6.2 (Critical Test Cases). Also used in: P09-cross-view-process-audit, P10-cross-view-file-audit, P11-network-stealth-detection, P12-memory-forensics-triage.

References

  • The Art of Memory Forensics - cross-view process analysis
  • Rootkit detection research papers on cross-view diffing

Key insights

Cross-view checks reveal lies by forcing a system to contradict itself.

Summary

Compare independent sources of truth to expose hidden processes, files, or connections.

Homework/Exercises to practice the concept

  • Design a diff algorithm that tolerates short-lived processes.
  • List two truly independent views for file enumeration on your OS.

Solutions to the homework/exercises

  • Use time windows and PID reuse checks to avoid false positives.
  • Combine filesystem APIs with raw disk parsing or offline scans.

3. Project Specification

3.1 What You Will Build

A tool (or documented workflow) that compares filesystem API listings with raw disk scans to surface hidden files.

3.2 Functional Requirements

  1. Collect required system artifacts for the task.
  2. Normalize data and produce a report output.
  3. Provide a deterministic golden-path demo.
  4. Include explicit failure handling and exit codes.

3.3 Non-Functional Requirements

  • Performance: Complete within a typical maintenance window.
  • Reliability: Outputs must be deterministic and versioned.
  • Usability: Clear CLI output and documentation.

3.4 Example Usage / Output

$ ./P10-cross-view-file-audit.sh --report
[ok] report generated

3.5 Data Formats / Schemas / Protocols

Report JSON schema with fields: timestamp, host, findings, severity, remediation.
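
An illustrative instance of that schema (all values below are made up for demonstration):

{
  "timestamp": "2024-01-01T00:00:00Z",
  "host": "lab-host-01",
  "findings": [
    {
      "id": "F-001",
      "description": "Directory entry present in raw disk scan but missing from API listing",
      "severity": "high",
      "evidence": {"path": "/etc/.hidden.so", "views": ["raw"]},
      "remediation": "Image the disk and inspect the file offline"
    }
  ],
  "severity": "high",
  "remediation": "See per-finding guidance"
}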

3.6 Edge Cases

  • Missing permissions or insufficient privileges.
  • Tooling not installed (e.g., no raw disk access utility such as debugfs available).
  • Empty data sets (no files found in the audited scope).

3.7 Real World Outcome

A deterministic report output stored in a case directory with hashes.

3.7.1 How to Run (Copy/Paste)

./P10-cross-view-file-audit.sh --out reports/P10-cross-view-file-audit.json

3.7.2 Golden Path Demo (Deterministic)

  • Report file exists and includes findings with severity.

3.7.3 Failure Demo

$ ./P10-cross-view-file-audit.sh --out /readonly/report.json
[error] cannot write report file
exit code: 2

Exit Codes:

  • 0 success
  • 2 output error
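
A minimal sketch of how the wrapper script could produce these exit codes; the error message mirrors the failure demo above, the default output path is illustrative, and the argument parsing is deliberately naive:

#!/usr/bin/env bash
out="${2:-reports/P10-cross-view-file-audit.json}"   # invoked as: script --out <path>

if ! touch "$out" 2>/dev/null; then                  # probe that the path is writable
  echo "[error] cannot write report file" >&2
  exit 2
fi

# ... collect both views, diff them, and write the findings into "$out" ...
echo "[ok] report generated"
exit 0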

4. Solution Architecture

4.1 High-Level Design

[Collector] -> [Analyzer] -> [Report]

4.2 Key Components

Component Responsibility Key Decisions
Collector Collects raw artifacts Prefer OS-native tools
Analyzer Normalizes and scores findings Deterministic rules
Reporter Outputs report JSON + Markdown

4.3 Data Structures (No Full Code)

finding = { id, description, severity, evidence, remediation }

4.4 Algorithm Overview

Key Algorithm: Normalize and Score (a scoring sketch follows the steps below)

  1. Collect artifacts.
  2. Normalize fields.
  3. Apply scoring rules.
  4. Output report.
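
A minimal sketch of step 3, assuming the diff results already sit in hidden_candidates.txt; the severity rules are placeholders chosen only to show deterministic scoring:

# Placeholder rule set: anything hidden under a system path scores high.
while IFS= read -r path; do
  case "$path" in
    /etc/*|/lib/*|/usr/*) sev="high" ;;
    *)                    sev="medium" ;;
  esac
  printf '%s\t%s\n' "$sev" "$path"
done < hidden_candidates.txt | sort > scored_findings.tsv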

Complexity Analysis:

  • Time: O(n) for n artifacts.
  • Space: O(n) for report.

5. Implementation Guide

5.1 Development Environment Setup

# optional: Python virtual environment (only needed for the Python alternative)
python3 -m venv .venv && source .venv/bin/activate
# install OS-specific tools as needed (e.g., e2fsprogs for debugfs)

5.2 Project Structure

project/
|-- P10-cross-view-file-audit.sh
|-- src/
|   `-- main.py
|-- reports/
`-- README.md

5.3 The Core Question You’re Answering

“Can you detect hidden files or directories?”

This project turns theory into a repeatable, auditable workflow.

5.4 Concepts You Must Understand First

  • Relevant OS security controls
  • Detection workflows
  • Evidence handling

5.5 Questions to Guide Your Design

  1. What data sources are trusted for this task?
  2. How will you normalize differences across OS versions?
  3. What is a high-confidence signal vs noise?

5.6 Thinking Exercise

Sketch a pipeline from data collection to report output.

5.7 The Interview Questions They’ll Ask

  1. What is the main trust boundary in this project?
  2. How do you validate findings?
  3. What would you automate in production?

5.8 Hints in Layers

Hint 1: Start with a small, deterministic dataset.

Hint 2: Normalize output fields early.

Hint 3: Add a failure path with clear exit codes.


5.9 Books That Will Help

Topic Book Chapter
Rootkit defense Practical Malware Analysis Rootkit chapters
OS internals Operating Systems: Three Easy Pieces Processes and files

5.10 Implementation Phases

Phase 1: Data Collection (3-4 days)

Goals: Collect raw artifacts reliably.

Tasks:

  1. Identify OS-native tools.
  2. Capture sample data.

Checkpoint: Raw dataset stored.

Phase 2: Analysis & Reporting (4-5 days)

Goals: Normalize and score findings.

Tasks:

  1. Build analyzer.
  2. Generate report.

Checkpoint: Deterministic report generated.
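
One way to hit this checkpoint, assuming jq is installed and the analyzer writes one JSON finding object per line to work/findings.jsonl (a hypothetical intermediate file); -S sorts object keys and sort_by(.id) fixes the finding order, so repeated runs over the same data differ only in the timestamp. The severity roll-up rule is a placeholder.

jq -n -S \
  --arg timestamp "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --arg host "$(hostname)" \
  --slurpfile findings work/findings.jsonl \
  '{timestamp: $timestamp,
    host: $host,
    findings: ($findings | sort_by(.id)),
    severity: (if ($findings | length) > 0 then "high" else "info" end),
    remediation: "review each finding"}' \
  > reports/P10-cross-view-file-audit.json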

Phase 3: Validation (2-3 days)

Goals: Validate rules and handle edge cases.

Tasks:

  1. Add failure tests.
  2. Document runbook.

Checkpoint: Failure cases documented.

5.11 Key Implementation Decisions

Decision Options Recommendation Rationale
Report format JSON, CSV JSON Structured and diffable
Scoring Simple, Weighted Weighted Prioritize high risk findings

6. Testing Strategy

6.1 Test Categories

Category Purpose Examples
Unit Tests Parser logic Sample data parsing
Integration Tests End-to-end run Generate report
Edge Case Tests Missing permissions Error path

6.2 Critical Test Cases

  1. Report generated with deterministic ordering.
  2. Exit code indicates failure on invalid output path (see the sketch after this list).
  3. At least one high-risk finding is flagged in test data.
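
A minimal sketch of test case 2 as a plain shell check, reusing the read-only path from the failure demo (assumed to exist in the test environment):

./P10-cross-view-file-audit.sh --out /readonly/report.json
status=$?
if [ "$status" -ne 2 ]; then
  echo "FAIL: expected exit code 2, got $status"
  exit 1
fi
echo "PASS: invalid output path rejected with exit code 2"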

6.3 Test Data

Provide a small fixture file with one known suspicious artifact.
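
One possible fixture layout: two pre-sorted name lists in which the raw view contains a single planted entry (all names are illustrative):

fixtures/api_list.txt:
hosts
passwd
shadow

fixtures/raw_list.txt (one planted suspicious entry):
.hidden.so
hosts
passwd
shadow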

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall Symptom Solution
Noisy results Too many alerts Add normalization and thresholds
Missing permissions Script fails Detect and warn early

7.2 Debugging Strategies

  • Log raw inputs before normalization.
  • Add verbose mode to show rule evaluation.

7.3 Performance Traps

Scanning large datasets without filtering can be slow; restrict scope to critical paths.
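
One way to restrict scope, sketched in Bash with GNU find; the directory list is only an example and should match the paths you actually care about:

# Illustrative scope: audit only a few sensitive directories, one level deep.
SCOPE_DIRS=(/etc /lib/modules /usr/bin)
for dir in "${SCOPE_DIRS[@]}"; do
  find "$dir" -xdev -mindepth 1 -maxdepth 1 -printf '%p\n'
done | sort -u > api_list.txt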


8. Extensions & Challenges

8.1 Beginner Extensions

  • Add a Markdown summary report.

8.2 Intermediate Extensions

  • Add a JSON schema validator for output.

8.3 Advanced Extensions

  • Integrate with a SIEM or ticketing system.

9. Real-World Connections

9.1 Industry Applications

  • Security operations audits and detection validation.
  • osquery - endpoint inventory

9.2 Interview Relevance

  • Discussing detection workflows and auditability.

10. Resources

10.1 Essential Reading

  • Practical Malware Analysis - rootkit detection chapters

10.2 Video Resources

  • Conference talks on rootkit detection

10.3 Tools & Documentation

  • OS-native logging and audit tools

11. Self-Assessment Checklist

11.1 Understanding

  • I can describe the trust boundary for this task.

11.2 Implementation

  • Report generation is deterministic.

11.3 Growth

  • I can explain how to operationalize this check.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Report created and contains at least one finding.

Full Completion:

  • Findings are categorized with remediation guidance.

Excellence (Going Above & Beyond):

  • Integrated into a broader toolkit or pipeline.