Project 8: Forensic Analyzer (Capstone)

Build a forensic investigation tool that collects timelines, detects suspicious patterns, hashes evidence, and preserves metadata.

Quick Reference

Attribute Value
Difficulty Expert
Time Estimate 2 weeks
Main Programming Language Bash
Alternative Programming Languages Python
Coolness Level Level 5 - “digital detective”
Business Potential Very High (incident response)
Prerequisites find, grep, timestamps, hashing
Key Topics forensic timelines, evidence handling, suspicious pattern detection

1. Learning Objectives

By completing this project, you will:

  1. Build a deterministic timeline of file changes using find -printf.
  2. Detect suspicious code patterns with context and file filtering.
  3. Preserve evidence with timestamps and generate hash manifests.
  4. Produce a structured forensic report with clear phases.
  5. Avoid contaminating evidence by using safe copy and read-only operations.

2. All Theory Needed (Per-Concept Breakdown)

2.1 Forensic Timestamps and Timeline Construction

Fundamentals

Forensics depends on time. The key timestamps are mtime (content modified), ctime (metadata changed), and atime (last access). Each has a different meaning. A timeline uses these timestamps to reconstruct what changed and when. find -printf '%T+' gives you sortable timestamps, but you must know which timestamp you are using and why. In incident response, choosing the wrong timestamp can lead to incorrect conclusions.

Think of timestamps as different lenses on the same event. mtime tells you when bytes changed, ctime tells you when metadata changed, and atime tells you when a file was read. The timeline you build is only as accurate as your choice of lens. A forensic tool must therefore be explicit about which timestamp it uses and should surface multiple timestamps when possible.

Deep Dive into the concept

A forensic timeline is a structured list of events derived from filesystem metadata. The primary timestamps are mtime, ctime, and atime. mtime reflects content modification, ctime reflects metadata changes (permissions, ownership, renames), and atime reflects reads. Many systems mount with noatime or relatime, so atime may not be reliable. ctime is often misunderstood; it is not creation time. A timeline tool must document this explicitly and should capture both mtime and ctime to give investigators multiple signals.

Timeline construction is about ordering. find -printf '%T+ %p\n' outputs a timestamp and path per line, which can be sorted lexicographically to produce a timeline. This works because ISO-like timestamps sort in chronological order. If you need ctime, you can use %C+ or %C@ depending on your find implementation. Be aware that GNU and BSD find differ. On macOS, you may need gfind for consistent -printf fields. This is why tool portability must be documented.

A forensic timeline must also define its time window. For example, “files modified in the last 48 hours” is a common incident response window. find -mtime -2 approximates this using 24-hour increments. If you require precise cutoffs, use -newermt with explicit timestamps. For a deterministic demo, you should use fixed timestamps and fixtures, so that your output is stable and testable.
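The bucket-vs-cutoff distinction can be sketched with a tiny fixture (assumes GNU find and touch; -newermt takes an explicit timestamp, unlike -mtime's 24-hour buckets):

```shell
# Create a fixture with one file outside and one inside the window.
dir=$(mktemp -d)
touch -d '2026-01-01 10:00:00' "$dir/old.php"
touch -d '2026-01-03 10:00:00' "$dir/new.php"

# Precise cutoff: select files modified at or after a fixed timestamp.
# Unlike -mtime -2, this does not depend on when the scan runs.
find "$dir" -type f -newermt '2026-01-02 00:00:00' -printf '%f\n'
# -> new.php
```

Because the cutoff is explicit, the same command over the same fixture always selects the same files, which is what makes the demo deterministic.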

Another nuance is that timestamps can change during analysis if you accidentally write to files. The forensic principle is to avoid modifying evidence. Your tool should open files read-only and avoid operations that update atime (which might still happen depending on mount options). If you can, use stat without reading file contents. For this project, we accept that reading a file might update atime, and we document that limitation.
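A small sketch of metadata-only inspection (assumes GNU stat and its -c format flags): stat reads the inode, never the file contents, so it cannot change mtime and does not touch atime.

```shell
# Inspect all three timestamps without reading file contents.
f=$(mktemp)
echo 'evidence' > "$f"
# %x = atime, %y = mtime, %z = ctime (GNU stat format specifiers).
stat -c 'atime=%x mtime=%y ctime=%z %n' "$f"
```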

Finally, the timeline must include size and path for context. A line like 2026-01-01T10:00:00 /var/www/html/upload.php (2.1 KB) is more useful than a bare timestamp. Include size and permissions in the report to help identify suspicious changes, such as a sudden size jump or permission escalation.

Another important concept is clock drift. If a system’s clock is wrong, timestamps can mislead investigators. This is why real forensic workflows correlate filesystem timelines with external evidence such as logs or NTP offsets. In this project, you assume the system clock is correct, but you should still record the scan time and optionally the system timezone so that future reviewers can interpret the timeline correctly.
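Recording the scan time and timezone can be as simple as a header block (a minimal sketch; the field names are illustrative, not a required format):

```shell
# Record when and in which timezone the scan ran, so future reviewers
# can interpret the timeline even if the host clock drifted.
report=$(mktemp)
{
  printf 'SCAN_TIME=%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  printf 'TIMEZONE=%s\n'  "$(date +%Z)"
} > "$report"
cat "$report"
```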

It is also useful to distinguish between selection and ordering. Selection is your time window filter (-mtime -2), while ordering is how you sort the timeline. Selection errors will omit relevant files; ordering errors will misrepresent sequence. Both matter. Document your selection semantics, especially the fact that -mtime uses 24-hour buckets, and use a sortable timestamp format to avoid ordering mistakes.

Another detail is that different filesystems may expose timestamps with different precision. Some record nanoseconds, others only seconds. When you compare two timestamps that appear equal, they might still differ at a finer granularity. If you include high-precision timestamps, your report may look noisy; if you truncate, you may lose ordering detail. Decide which precision you want and apply it consistently. For deterministic demos, truncate to seconds; for real investigations, keep full precision if available.
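One way to truncate to seconds consistently: GNU find's %T+ field ends in fractional seconds, which an anchored sed expression can strip without ever touching dots inside file paths (a sketch assuming GNU find and sed):

```shell
dir=$(mktemp -d)
touch "$dir/a.php"
# %T+ prints e.g. 2026-01-01+10:00:00.1234567890; the sed pattern is
# anchored at line start, so only the timestamp's fraction is removed.
find "$dir" -type f -printf '%T+ %p\n' \
  | sed -E 's/^([0-9-]+\+[0-9:]+)\.[0-9]+/\1/' \
  | sort -r
```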

Finally, timeline accuracy is affected by how files are written. Some applications write files by creating a temp file and renaming it. This can change ctime and mtime in ways that look like separate events. A forensic analyst should be aware of this behavior and treat timeline entries as indicators, not absolute truths. Your tool should note this limitation in the report so users interpret the timeline correctly.
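The write-via-rename effect is easy to demonstrate (GNU stat; %Y is mtime and %Z is ctime, both as epoch seconds):

```shell
# Many editors and daemons replace files atomically: write a temp file,
# then rename it over the original. Both timestamps then reflect the
# new file and the rename, not an in-place edit.
dir=$(mktemp -d)
printf 'v1\n' > "$dir/app.conf"
stat -c 'original: mtime=%Y ctime=%Z' "$dir/app.conf"
sleep 1
printf 'v2\n' > "$dir/app.conf.tmp"
mv "$dir/app.conf.tmp" "$dir/app.conf"   # atomic replace
stat -c 'replaced: mtime=%Y ctime=%Z' "$dir/app.conf"
```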

How this fits into the project

The forensic analyzer generates a timeline as its first phase, and uses it to select files for deeper inspection.

Definitions & key terms

  • mtime: last content modification time.
  • ctime: last metadata change time.
  • atime: last access time.
  • timeline: ordered list of events by time.

Mental model diagram (ASCII)

files -> stat metadata -> timestamp + path -> sort -> timeline

How it works (step-by-step)

  1. Use find to select files within a time window.
  2. Output timestamp, size, and path with -printf.
  3. Sort output descending for recent-first view.
  4. Save timeline in report.
  5. Invariant: timestamps in the report are derived from the same source field (mtime or ctime).
  6. Failure modes: incorrect time window semantics, clock skew, or files modified during scan.

Minimal concrete example

find /var/www -type f -mtime -2 -printf '%T+ %s %p\n' | sort -r

Common misconceptions

  • “ctime is creation time” -> False; it is metadata change time.
  • “atime is always reliable” -> False; it may be disabled.
  • “find -mtime -2 means exactly 48 hours” -> False; it uses 24-hour buckets.

Check-your-understanding questions

  1. What does ctime represent in forensics?
  2. Why might atime be unreliable?
  3. How do you make a timeline output deterministic?
  4. Why include size and path in the timeline?

Check-your-understanding answers

  1. It reflects metadata changes like permissions or ownership.
  2. It may be disabled or updated lazily by the filesystem.
  3. Use fixed fixtures, fixed timestamps, and stable sorting.
  4. They provide context for suspicious changes.

Real-world applications

  • Incident response investigations.
  • File integrity monitoring.
  • Compliance audits for unauthorized changes.

Where you will apply it

References

  • The Linux Programming Interface (Kerrisk), Chapter 15
  • man find
  • man stat

Key insights

Timelines are only as good as your timestamp semantics.

Summary

A forensic timeline is a structured, timestamped list of file events. Use the right timestamp and document it.

Homework/Exercises to practice the concept

  1. Build a timeline of the last 24 hours in a test directory.
  2. Compare mtime vs ctime for a file after chmod.
  3. Sort a timeline and verify ordering.

Solutions to the homework/exercises

  1. find . -type f -mtime -1 -printf '%T+ %p\n' | sort -r
  2. chmod 600 file; stat file and compare times.
  3. Use sort and inspect the output manually.

2.2 Evidence Handling, Hashing, and Chain of Custody

Fundamentals

Evidence must be preserved without modification. This means copying files with timestamps intact (cp -p), storing them in a dedicated evidence directory, and generating cryptographic hashes to prove integrity. A hash manifest records file paths and hashes so investigators can verify that evidence has not been altered. Chain of custody is the documented process of collecting, transferring, and storing evidence.

In practice, evidence handling is about trust. If you cannot prove that a file is unchanged, you cannot rely on it for investigation or legal purposes. That is why hashing and metadata preservation are non-negotiable. Even in this simplified project, treat evidence handling as a first-class requirement, not a nice-to-have.

Deep Dive into the concept

Evidence handling is about trust. When you collect files from a compromised system, you must prove that the evidence is unchanged. The standard approach is to copy files while preserving metadata, then compute hashes. cp -p preserves timestamps and modes, which is essential for forensics. You should also capture the original path in the evidence directory to maintain context. For example, /var/www/html/upload.php becomes evidence/var/www/html/upload.php.

Hashing provides integrity. Common hashes include SHA-256 and SHA-1. MD5 is weak for cryptographic security, but still often used as a quick integrity check. For a modern forensic tool, use SHA-256. Your manifest should record the hash, file size, and original path. This allows future verification. A mismatch indicates tampering or corruption.

Chain of custody is a process, not just a file. Your report should include when evidence was collected, by whom (if applicable), and how it was stored. In this project, you simulate this by recording timestamps and a run ID. A deterministic demo should use fixed timestamps, but real runs should record actual times.
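The run ID and custody record might look like this (a hypothetical layout; the field names and file name are illustrative, and a deterministic demo would substitute fixed timestamps):

```shell
# Simulated chain-of-custody record keyed by a run ID derived from
# the collection time.
base=$(mktemp -d)
run_id="evidence_$(date +%Y%m%d_%H%M%S)"
mkdir -p "$base/$run_id"
{
  printf 'RUN_ID=%s\n'       "$run_id"
  printf 'COLLECTED_AT=%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  printf 'COLLECTED_BY=%s\n' "$(id -un)"
} > "$base/$run_id/CUSTODY.txt"
cat "$base/$run_id/CUSTODY.txt"
```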

Another detail is read-only handling. If you open files, you may update atime. You can mitigate this by mounting evidence sources read-only or by using tools that do not update access times. For this project, you acknowledge this limitation and focus on preserving metadata on copied evidence.

Finally, you must avoid contaminating evidence with your own output. Store reports and hash manifests outside of the evidence directory, or in a clearly separate folder, to avoid mixing analysis artifacts with raw evidence. This is a subtle but important practice.

There is also a workflow aspect: evidence collection should be repeatable. If you run the tool twice, you should get two distinct evidence bundles, each with its own run ID and manifests. This keeps cases isolated and prevents accidental overwriting. It also allows you to compare evidence across runs if needed. A good forensic tool makes this separation explicit in its directory naming scheme.

Another nuance is choice of hash algorithm and verification. SHA-256 is the default because it is widely supported and secure. Your tool should also include a verification mode (sha256sum -c) so that evidence can be validated later. Even if you do not implement a full verification command, the manifest format should be compatible with standard tools so verification is easy.
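Keeping the manifest in the standard sha256sum format makes verification a one-liner with stock tools, as this sketch shows:

```shell
# Write the manifest in sha256sum's native "HASH  PATH" format so
# "sha256sum -c" can verify the evidence later.
base=$(mktemp -d)
echo 'suspicious payload' > "$base/upload.php"
( cd "$base" && sha256sum upload.php > MANIFEST.sha256 )

# Verification mode: exits non-zero if any listed file changed.
( cd "$base" && sha256sum -c MANIFEST.sha256 )
# -> upload.php: OK
```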

Chain of custody also implies access control. Evidence directories should be created with restrictive permissions (for example, chmod 700) to prevent accidental modification. If you are working in a multi-user environment, store evidence in a location with controlled access. Even in a local project, you should treat evidence as sensitive data. This is a good habit for real incident response.
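Creating the evidence directory owner-only from the start avoids a window where it is world-readable; mkdir's -m flag sets the mode atomically:

```shell
# Owner-only evidence directory (mode 700) created in one step.
base=$(mktemp -d)
mkdir -m 700 "$base/evidence"
stat -c '%a %n' "$base/evidence"
```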

Lastly, consider that copying files can change their logical context. A file’s path is part of the evidence, not just its contents. That is why mirroring the directory structure is important: it preserves the original context for later analysis. Your report should record both the original path and the evidence path so reviewers can correlate them.
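GNU cp can mirror the original path under the evidence directory in one step (a sketch assuming GNU coreutils; --parents recreates the source path, -p preserves mtime and permissions):

```shell
base=$(mktemp -d)
cd "$base"
mkdir -p var/www/html evidence
printf '<?php echo 1; ?>\n' > var/www/html/upload.php

# Recreate the full source path under evidence/ with metadata intact.
cp -p --parents var/www/html/upload.php evidence/
find evidence -type f
# -> evidence/var/www/html/upload.php
```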

How this fits into the project

The forensic analyzer creates an evidence directory, copies files with cp -p, and generates a hash manifest as part of the final report.

Definitions & key terms

  • chain of custody: documented history of evidence handling.
  • hash manifest: file listing hashes and paths.
  • evidence preservation: copying with timestamps and permissions intact.
  • integrity check: verifying files have not changed.

Mental model diagram (ASCII)

source files -> cp -p -> evidence/ -> sha256sum -> manifest

How it works (step-by-step)

  1. Create an evidence directory with a run ID.
  2. Copy suspicious files with cp -p.
  3. Compute hashes and write a manifest.
  4. Record metadata in the report.
  5. Invariant: evidence files are read-only copies with preserved timestamps.
  6. Failure modes: missing permissions, partial copies, or hashes not matching.

Minimal concrete example

mkdir -p evidence/var/www/html
cp -p /var/www/html/upload.php evidence/var/www/html/upload.php
sha256sum evidence/var/www/html/upload.php >> hashes.txt

Common misconceptions

  • “Hashes are optional” -> False; integrity is essential.
  • “Copying files is enough” -> False; metadata must be preserved.
  • “Reports can live in evidence dir” -> Bad practice; separate them.

Check-your-understanding questions

  1. Why is cp -p required for forensics?
  2. What does a hash manifest prove?
  3. Why should evidence and reports be separated?
  4. Why is MD5 not ideal for security?

Check-your-understanding answers

  1. It preserves timestamps and permissions.
  2. That evidence files have not changed since collection.
  3. To avoid contaminating evidence with analysis artifacts.
  4. MD5 is vulnerable to collisions; SHA-256 is stronger.

Real-world applications

  • Incident response investigations.
  • Legal evidence preservation.
  • Audit trails for sensitive systems.

Where you will apply it

  • In this project: see §3.7 (evidence preservation) and §5.10 (phases).
  • Also used in: P06-system-janitor.md for safe operations.

References

  • NIST Digital Forensics guidelines
  • man cp
  • man sha256sum

Key insights

Evidence is only useful if you can prove it was not modified.

Summary

Preserve metadata, compute hashes, and document collection to maintain evidence integrity.

Homework/Exercises to practice the concept

  1. Copy a file with and without -p and compare timestamps.
  2. Generate a SHA-256 hash and verify it later.
  3. Build a manifest file with path, hash, and size.

Solutions to the homework/exercises

  1. cp file copy1; cp -p file copy2; stat file copy1 copy2
  2. sha256sum file > manifest; sha256sum -c manifest
  3. printf '%s %s %s\n' file "$(sha256sum file | cut -d' ' -f1)" "$(stat -c %s file)" >> manifest

2.3 Suspicious Pattern Detection and Contextual Matching

Fundamentals

Pattern detection is how you locate likely malicious code or artifacts. Common signatures include eval(, base64_decode(, system(, and long base64 strings. Grep can search recursively with -r or via find file lists. Context is important: a match in a comment is less concerning than a match in executable code. A forensic scanner should capture line numbers, filenames, and context lines to support investigation.

Pattern detection is triage. It is designed to surface items that deserve human review, not to prove compromise. That is why context and file type filters are as important as the patterns themselves. A match without context is just a rumor; a match with context is evidence you can evaluate.

Deep Dive into the concept

Suspicious pattern detection is heuristic. You are not proving maliciousness; you are prioritizing inspection. Patterns should be based on real-world attack techniques. For web shells, indicators include eval, assert, and base64_decode in PHP, or child_process.exec in Node.js. For privilege escalation, look for chmod 777 in scripts or unexpected sudo usage.

Context matters. A match in a documentation file is not the same as a match in source code. That is why your tool should filter by extension and avoid scanning binary or vendor files. Use find to select only relevant file types, then apply grep with -n and -H. Add -C to include surrounding lines for context. This helps investigators decide whether the match is benign.

False positives are inevitable. To reduce them, anchor patterns to code-like contexts, such as \b(eval|assert)\s*\( or base64_decode\s*\(. Avoid matching in comments by filtering lines that start with comment markers, or run separate scans for code and comments. This is still imperfect, but it improves signal.
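Anchoring and comment filtering might look like this (a sketch; the fixture file and second-pass comment filter are illustrative, and the filter is imperfect by design):

```shell
dir=$(mktemp -d)
cat > "$dir/shell.php" <<'EOF'
<?php
// calls eval( in a comment only
$payload = base64_decode("ZXZhbA==");
eval($payload);
EOF

# Anchor each indicator to a call site (name followed by "(") and drop
# hits whose line begins as a comment. grep -n prefixes "LINENO:".
grep -nE '\b(eval|assert|base64_decode)\s*\(' "$dir/shell.php" \
  | grep -vE '^[0-9]+:\s*(//|#|\*)'
```

The comment on line 2 is discarded; the calls on lines 3 and 4 survive, which is exactly the signal boost the paragraph above describes.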

Another challenge is obfuscation. Attackers may split keywords across lines or use string concatenation to evade simple grep. This tool will not catch advanced obfuscation, and that limitation should be stated. The goal is rapid triage, not complete detection.

Finally, detection results should be included in the report with severity labels. For example, base64_decode in PHP might be “High” severity, while eval in a test file might be “Medium”. This helps prioritize manual review.

Another important consideration is scope. A web root might contain PHP, JavaScript, HTML, and configuration files. You should decide which file types are in scope for suspicious pattern detection and exclude the rest. Scanning everything increases noise and slows down the tool. For a forensic tool, you want high signal, not maximum coverage.

Also consider version control artifacts. Attackers sometimes hide malicious code in .git directories or in compiled assets. If you exclude those directories, you might miss evidence. This is a trade-off between performance and completeness. A pragmatic approach is to keep .git excluded by default but allow an override flag. Document this choice so investigators understand the limitations.
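The default-exclude-with-override trade-off can be sketched like this (the --scan-git flag is hypothetical; here it is modeled as a plain variable):

```shell
dir=$(mktemp -d)
mkdir -p "$dir/.git" "$dir/html"
echo 'eval($x);' > "$dir/html/a.php"
echo 'eval($x);' > "$dir/.git/hook.php"

# .git is excluded by default; flipping scan_git=1 (the hypothetical
# override flag) trades noise for completeness.
scan_git=0
if [ "$scan_git" -eq 1 ]; then
  grep -rn --include='*.php' -E 'eval\(' "$dir"
else
  grep -rn --include='*.php' --exclude-dir='.git' -E 'eval\(' "$dir"
fi
```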

Pattern detection can also benefit from a severity map. For example, base64_decode might be high, eval critical, and system high. If you include the pattern name and severity in the report, investigators can quickly sort by risk. This is similar to the code auditor project but tuned for incident response. In a breach scenario, you want the fastest path to the most suspicious files.
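A severity map can live in a tiny rules file and be joined to the grep output at report time (the tab-separated "regex, severity" format is a hypothetical choice):

```shell
dir=$(mktemp -d)
printf '<?php eval($c); system($c); ?>\n' > "$dir/a.php"

# Hypothetical rules file: one "regex<TAB>severity" pair per line.
rules=$(mktemp)
printf 'eval\\(\tCRITICAL\nsystem\\(\tHIGH\nbase64_decode\\(\tHIGH\n' > "$rules"

# Prefix each hit with its severity so investigators can sort by risk.
while IFS=$'\t' read -r pattern severity; do
  grep -rn --include='*.php' -E "$pattern" "$dir" | sed "s|^|$severity |"
done < "$rules"
```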

Finally, remember that attackers can obfuscate. A string like ba + se64_decode will bypass simple regexes. This tool does not attempt deobfuscation, and that limitation should be explicit. The goal is to surface obvious indicators quickly, not to guarantee detection. A good forensic tool is transparent about what it does and does not catch.

In practice, investigators often maintain a small whitelist of known-safe files or directories (for example, vendor libraries or trusted frameworks). Incorporating a whitelist into your scan can reduce noise, but it also introduces risk if the whitelist is too broad. The safest approach is to keep whitelists narrow, document them in the report, and make them optional. This preserves the integrity of the scan while giving experienced analysts tools to focus their attention.
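A narrow whitelist can be applied as a post-filter on the scan output (a sketch; the whitelist file of known-safe path fragments is hypothetical):

```shell
dir=$(mktemp -d)
mkdir -p "$dir/vendor" "$dir/html"
echo 'eval($x);' > "$dir/vendor/lib.php"
echo 'eval($x);' > "$dir/html/a.php"

# Whitelist of known-safe path fragments; grep -vFf drops any hit
# whose line contains one of them, leaving the scan itself untouched.
wl=$(mktemp)
printf '%s\n' "$dir/vendor/" > "$wl"
grep -rn --include='*.php' -E 'eval\(' "$dir" | grep -vFf "$wl"
```

Filtering after the scan, rather than excluding paths from it, keeps the raw results available if the whitelist ever needs to be audited.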

How this fits into the project

The forensic analyzer uses pattern detection to identify suspicious files that should be copied into the evidence bundle.

Definitions & key terms

  • suspicious pattern: regex indicative of malicious behavior.
  • context lines: lines around a match for understanding.
  • false positive: match that is not malicious.
  • obfuscation: hiding patterns to avoid detection.

Mental model diagram (ASCII)

file list -> grep patterns -> context -> suspicious list -> evidence copy

How it works (step-by-step)

  1. Build a list of source files to scan.
  2. Apply suspicious regex patterns with grep.
  3. Capture filename, line, and context.
  4. Add matching files to evidence list.
  5. Invariant: suspicious matches are tied to file paths and line numbers.
  6. Failure modes: overbroad patterns, missing file types, or matches only in comments.

Minimal concrete example

grep -rn -E 'eval\(|base64_decode\(' /var/www --include='*.php'

Common misconceptions

  • “Pattern matches prove compromise” -> False; they are indicators.
  • “Regex can detect all malware” -> False; obfuscation exists.
  • “Scanning all files is better” -> False; noise overwhelms signal.

Check-your-understanding questions

  1. Why are context lines important in forensic scans?
  2. How can you reduce false positives in pattern detection?
  3. Why should you avoid scanning binaries?
  4. What is the limitation of simple regex scanners?

Check-your-understanding answers

  1. They show surrounding code to evaluate intent.
  2. Use extension filters and anchored patterns.
  3. Binary matches are noisy and usually irrelevant.
  4. Obfuscation can evade simple patterns.

Real-world applications

  • Web shell detection in compromised sites.
  • Rapid triage during incident response.
  • Identifying suspicious scripts in cron jobs.

Where you will apply it

References

  • OWASP PHP Security Cheat Sheet
  • man grep
  • Black Hat Bash (Aleks/Farhi), Chapter 10

Key insights

Pattern detection is triage. It guides investigation but does not prove guilt.

Summary

Use focused patterns, context lines, and file filters to surface suspicious code quickly.

Homework/Exercises to practice the concept

  1. Build a regex list for PHP web shell indicators.
  2. Scan a fixture directory and review context.
  3. Classify findings by severity.

Solutions to the homework/exercises

  1. Patterns: eval\(, base64_decode\(, gzinflate\(, assert\(.
  2. grep -rn -C 2 -E '...' ./fixtures/www --include='*.php'
  3. Add a severity label to each pattern in a small rules file.

3. Project Specification

3.1 What You Will Build

A forensic analysis script that generates a timeline of recent changes, scans for suspicious patterns, hashes evidence files, and preserves them in an evidence bundle with a final report.

3.2 Functional Requirements

  1. Timeline: list files modified in the last N hours with timestamps and sizes.
  2. Pattern scan: detect suspicious code patterns with context.
  3. Hashing: generate SHA-256 hashes for evidence files.
  4. Evidence preservation: copy files with cp -p into an evidence folder.
  5. Report: produce a structured report with phases and counts.

3.3 Non-Functional Requirements

  • Forensic safety: avoid modifying evidence where possible.
  • Reliability: partial failures logged, not hidden.
  • Usability: clear phase headings and deterministic output.

3.4 Example Usage / Output

$ ./investigate.sh /var/www --hours 48

3.5 Data Formats / Schemas / Protocols

Report format:

PHASE 1: TIMELINE
2026-01-01T10:00:00 2100 /var/www/html/upload.php

PHASE 2: SUSPICIOUS PATTERNS
/var/www/html/upload.php:15: base64_decode(...)

PHASE 3: HASHES
SHA256 <hash> <path>

3.6 Edge Cases

  • Files modified during scan.
  • Permission denied on evidence copy.
  • No suspicious patterns found.

3.7 Real World Outcome

3.7.1 How to Run (Copy/Paste)

./investigate.sh /var/www --hours 48

3.7.2 Golden Path Demo (Deterministic)

Use a fixture directory and a fixed run timestamp 2026-01-01T12:00:00.

3.7.3 If CLI: exact terminal transcript

$ ./investigate.sh ./fixtures/www --hours 48
[2026-01-01T12:00:00] TARGET=./fixtures/www
[2026-01-01T12:00:00] WINDOW=48h
[2026-01-01T12:00:00] EVIDENCE=evidence_20260101_120000
[2026-01-01T12:00:00] REPORT=REPORT.txt
[2026-01-01T12:00:00] DONE

$ head -12 evidence_20260101_120000/REPORT.txt
PHASE 1: TIMELINE
2026-01-01T10:00:00 2100 ./fixtures/www/html/upload.php

PHASE 2: SUSPICIOUS PATTERNS
./fixtures/www/html/upload.php:15: $payload = base64_decode("ZXZhbCgkX1BPU1RbJ2NtZCddKTs=");

PHASE 3: HASHES
SHA256 3f6d... ./fixtures/www/html/upload.php

Failure demo (no target):

$ ./investigate.sh /no/such/path
[2026-01-01T12:00:00] ERROR: target not found
EXIT_CODE=2

Exit codes:

  • 0: success
  • 1: partial success (errors logged)
  • 2: invalid arguments or missing path
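The exit-code contract above might be sketched as follows (argument validation only; the phase logic is elided, and the function name is illustrative):

```shell
# Sketch of the exit-code contract: 2 for bad input, 1 for a partial
# run with logged errors, 0 for a clean run.
investigate() {
  local target=${1:-}
  if [ -z "$target" ] || [ ! -d "$target" ]; then
    echo "ERROR: target not found" >&2
    return 2                      # invalid arguments or missing path
  fi
  local errors=0
  # ... run phases; count recoverable failures in "errors" ...
  if [ "$errors" -gt 0 ]; then
    return 1                      # partial success, errors logged
  fi
  return 0                        # success
}

investigate /no/such/path 2>/dev/null; echo "EXIT_CODE=$?"
# -> EXIT_CODE=2
```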

4. Solution Architecture

4.1 High-Level Design

scan -> timeline -> pattern scan -> evidence copy -> hash -> report

4.2 Key Components

Component Responsibility Key Decisions
Timeline collect recent files use mtime window
Scanner suspicious patterns file type filters
Evidence copy files preserve metadata
Hasher generate hashes SHA-256
Reporter final report phase sections

4.3 Data Structures (No Full Code)

case: {timeline, suspicious, hashes, errors}

4.4 Algorithm Overview

Key Algorithm: Forensic Case Builder

  1. Generate timeline by mtime window.
  2. Scan for suspicious patterns.
  3. Copy matched files into evidence bundle.
  4. Hash evidence files and record manifest.
  5. Write a report with phase summaries.
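The five steps above can be sketched end to end over a fixture tree (paths, file names, and report layout are illustrative, not the full tool; assumes GNU find, grep, and coreutils):

```shell
# Minimal case-builder pipeline: timeline -> scan -> copy -> hash -> report.
base=$(mktemp -d); cd "$base"
mkdir -p www/html
printf '<?php eval(base64_decode($_POST["c"])); ?>\n' > www/html/upload.php

run="evidence_run"
mkdir -m 700 "$run"

# 1. Timeline (the mtime window is elided: fixture files are all recent).
find www -type f -printf '%T+ %s %p\n' | sort -r > "$run/timeline.txt"

# 2. Suspicious pattern scan.
grep -rn --include='*.php' -E 'eval\(|base64_decode\(' www > "$run/suspicious.txt"

# 3. Copy matched files, mirroring paths and preserving metadata.
cut -d: -f1 "$run/suspicious.txt" | sort -u \
  | while IFS= read -r f; do cp -p --parents "$f" "$run/"; done

# 4. Hash the evidence copies into a manifest.
( cd "$run" && find www -type f -exec sha256sum {} + > MANIFEST.sha256 )

# 5. Report with phase sections.
{
  echo 'PHASE 1: TIMELINE';            cat "$run/timeline.txt"
  echo 'PHASE 2: SUSPICIOUS PATTERNS'; cat "$run/suspicious.txt"
  echo 'PHASE 3: HASHES';              cat "$run/MANIFEST.sha256"
} > "$run/REPORT.txt"
head -3 "$run/REPORT.txt"
```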

Complexity Analysis:

  • Time: O(n log n) for sorting timelines
  • Space: O(k) for evidence files

5. Implementation Guide

5.1 Development Environment Setup

# Requires find, grep, cp, sha256sum

5.2 Project Structure

project-root/
├── investigate.sh
├── fixtures/
│   └── www/
└── README.md

5.3 The Core Question You’re Answering

“How do you conduct a systematic forensic investigation using only grep and find?”

5.4 Concepts You Must Understand First

  1. Forensic timestamps and timelines
  2. Evidence handling and hashing
  3. Suspicious pattern detection

5.5 Questions to Guide Your Design

  1. What time window defines the incident?
  2. Which patterns indicate likely compromise?
  3. How will you preserve evidence without modifying it?

5.6 Thinking Exercise

Design the phases of a forensic investigation, including timeline, pattern scan, hashing, and evidence preservation.

5.7 The Interview Questions They’ll Ask

  1. “What does ctime tell you in forensics?”
  2. “Why are hashes required for evidence?”
  3. “How do you avoid contaminating evidence?”

5.8 Hints in Layers

Hint 1: Timeline

find /var/www -type f -mtime -2 -printf '%T+ %s %p\n' | sort -r

Hint 2: Suspicious patterns

grep -rn -E 'eval\(|base64_decode\(' /var/www --include='*.php'

Hint 3: Hashes

find /var/www -type f -mtime -2 -exec sha256sum {} + > hashes.txt

Hint 4: Preserve timestamps

mkdir -p evidence/var/www/html
cp -p /var/www/html/upload.php evidence/var/www/html/upload.php

5.9 Books That Will Help

Topic Book Chapter
File attributes The Linux Programming Interface (Kerrisk) Ch. 15
Regex The Linux Command Line (Shotts) Ch. 19
Incident response Black Hat Bash (Aleks/Farhi) Ch. 10

5.10 Implementation Phases

Phase 1: Foundation (2-3 days)

Goals:

  • Timeline generation
  • Basic report skeleton

Tasks:

  1. Implement mtime window selection.
  2. Output sorted timeline section.

Checkpoint: timeline report matches fixtures.

Phase 2: Core Functionality (4-5 days)

Goals:

  • Pattern detection and evidence copy

Tasks:

  1. Run suspicious pattern scans.
  2. Copy matched files with cp -p.

Checkpoint: evidence directory mirrors original paths.

Phase 3: Polish & Edge Cases (2-3 days)

Goals:

  • Hashing and final report

Tasks:

  1. Generate SHA-256 manifest.
  2. Add summary counts and error section.

Checkpoint: report contains all phases and counts.

5.11 Key Implementation Decisions

Decision Options Recommendation Rationale
Hash algorithm SHA-256 vs MD5 SHA-256 stronger integrity
Timeline metric mtime vs ctime mtime with ctime optional clearer meaning
Evidence storage flat vs mirrored mirrored preserves context

6. Testing Strategy

6.1 Test Categories

Category Purpose Examples
Unit Tests pattern rules suspicious regexes
Integration Tests end-to-end case fixture www
Edge Case Tests no matches empty suspicious list

6.2 Critical Test Cases

  1. No suspicious patterns: report still generated.
  2. Permission denied: error logged, exit code 1.
  3. Hash manifest: matches evidence files count.

6.3 Test Data

fixtures/www/html/upload.php
fixtures/www/html/index.php

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall Symptom Solution
Wrong timestamp semantics timeline misleading document mtime vs ctime
Evidence contamination timestamps change use cp -p and separate output
Overbroad patterns too many matches tighten regex and filter file types

7.2 Debugging Strategies

  • Run timeline and pattern scans separately first.
  • Verify evidence directory paths match originals.
  • Check hash manifest with sha256sum -c.

7.3 Performance Traps

Scanning entire web roots can be large. Restrict file types and time windows.


8. Extensions & Challenges

8.1 Beginner Extensions

  • Add --include and --exclude file type flags.
  • Add a summary of file permissions in the timeline.

8.2 Intermediate Extensions

  • Add support for ctime or btime where available.
  • Add a JSON report output.

8.3 Advanced Extensions

  • Integrate with a hash whitelist for known-good files.
  • Add a signature database for common web shells.

9. Real-World Connections

9.1 Industry Applications

  • Incident response and breach investigations.
  • Digital forensics in legal contexts.
  • Security monitoring and threat hunting.

9.2 Industry Tools

  • sleuthkit: digital forensics toolkit.
  • osquery: system auditing and investigation.

9.3 Interview Relevance

  • Forensic timestamp semantics.
  • Evidence handling and hashing.
  • Safe pattern detection techniques.

10. Resources

10.1 Essential Reading

  • The Linux Programming Interface (Kerrisk), Chapter 15
  • Black Hat Bash (Aleks/Farhi), Chapter 10

10.2 Video Resources

  • “Digital Forensics Basics” (YouTube)
  • “Incident Response with Unix Tools” (talk)

10.3 Tools & Documentation

  • man find
  • man sha256sum
  • man grep

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain mtime vs ctime.
  • I can explain hash manifests.
  • I can explain why evidence should be copied with metadata.

11.2 Implementation

  • Timeline, pattern scan, and hashing all work.
  • Evidence is preserved with timestamps.
  • Report is deterministic.

11.3 Growth

  • I documented at least one limitation.
  • I can explain this project in an interview.
  • I can propose a stronger detection strategy.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • Timeline and suspicious pattern scan produced.
  • Evidence copied and hashed.

Full Completion:

  • Report includes all phases with counts and error handling.
  • Deterministic output with fixed run headers.

Excellence (Going Above & Beyond):

  • JSON report plus hash verification step.
  • Signature-based detection and whitelist support.