Project 6: System Janitor
Build a safe cleanup tool that removes old temp files, prunes empty directories, and fixes risky permissions.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Intermediate |
| Time Estimate | 4-8 hours |
| Main Programming Language | Bash |
| Alternative Programming Languages | Python |
| Coolness Level | Level 3 - “ops cleaner” |
| Business Potential | High (operations automation) |
| Prerequisites | permissions, find actions, safety |
| Key Topics | safe deletion, dry run, permissions, logging |
1. Learning Objectives
By completing this project, you will:
- Design a safe cleanup workflow with dry-run and confirmation.
- Use find -exec and -delete correctly and safely.
- Identify and fix world-writable or risky permission settings.
- Log all actions for auditability.
- Handle permission errors without breaking the run.
2. All Theory Needed (Per-Concept Breakdown)
2.1 Safe Deletion, Dry Runs, and Transactional Thinking
Fundamentals
Deletion is irreversible. A cleanup script must treat deletion as a transaction: first identify candidates, then review, then execute. find -delete is fast but dangerous because a mistake can remove too much. Safer approaches use find ... -print for a dry run, then -exec rm -- {} + once you are confident. A janitor script should default to dry-run mode and require explicit confirmation for destructive actions.
The core idea is to separate selection from execution. Selection is reversible; execution is not. A good janitor tool makes these phases explicit so you can reason about what will happen before it happens. This is the same safety philosophy used in database migrations and infrastructure changes: plan, review, then apply.
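The split between selection and execution can be sketched in a few lines. This is an illustration under stated assumptions, not the project's reference implementation: `janitor_run` and its mode names are hypothetical, and the demo relies on GNU `find` and GNU `touch -d`.

```shell
# Sketch only: janitor_run and its mode names are hypothetical.

janitor_run() {
    local target="$1" days="$2" mode="${3:-dry-run}"

    if [ "$mode" = "execute" ]; then
        # Execution: irreversible, so it must be requested explicitly.
        find "$target" -type f -mtime +"$days" -exec rm -- {} +
    else
        # Selection: only prints what would be removed.
        find "$target" -type f -mtime +"$days" -print
    fi
}

# Demo on a throwaway directory with one old and one fresh file.
demo=$(mktemp -d)
touch -d '40 days ago' "$demo/old.log"
touch "$demo/new.log"

planned=$(janitor_run "$demo" 30)        # default mode: nothing is deleted
if [ -e "$demo/old.log" ]; then survived_plan=yes; else survived_plan=no; fi

janitor_run "$demo" 30 execute           # explicit opt-in removes old.log
remaining=$(ls "$demo")
rm -rf -- "$demo"
```

Because both branches share the same predicates, the dry-run output is exactly the set the execute branch would act on, provided the tree does not change in between.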
Deep Dive into the concept
In production systems, cleanup automation is both necessary and risky. Temporary directories fill up, stale logs consume disk space, and old caches slow backups. The obvious answer is to delete old files, but the risk is accidental deletion of important data. That is why cleanup tools are designed in phases: discovery, review, and execution. Each phase should be explicit and logged.
A dry run is a simulation. It shows what would be deleted without actually deleting. A good janitor tool provides a --dry-run flag and prints a summary. This is essential for trust. When the user decides to execute, the tool should confirm and then proceed. In scripts, this may be a --force flag. The default should always be safe.
find -delete is powerful but has a critical semantic: it implicitly uses -depth. That means it deletes children before parents, which is necessary for directories. However, it also means a mis-specified path can remove an entire tree quickly. -exec rm -- {} + is more explicit and allows logging each deletion. When you use rm, always pass -- before the filename list to prevent option injection.
Another safe pattern is to generate a candidate list, save it to disk, and then operate on that list. This provides an audit trail and allows you to re-run with confidence. For example: find ... -print0 > candidates.list then xargs -0 rm -- < candidates.list. You can also check the list size, diff it between runs, or sample it before execution.
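A possible shape for the list-based workflow, assuming GNU `find`, `tr`, and `xargs`. The directory layout and file names are invented for the demo; `-r` is a GNU `xargs` option that skips the command when the list is empty.

```shell
# Sketch: write candidates to a NUL-delimited list, review it, then act on it.
work=$(mktemp -d)
mkdir "$work/tree"
touch -d '45 days ago' "$work/tree/a.tmp" "$work/tree/-rf" "$work/tree/has space.tmp"
touch "$work/tree/fresh.tmp"

list="$work/candidates.list"

# Phase 1: selection. -print0 keeps odd filenames (spaces, newlines) intact.
find "$work/tree" -type f -mtime +30 -print0 > "$list"

# Phase 2: review. Each candidate ends with one NUL byte, so count those.
candidate_count=$(tr -cd '\0' < "$list" | wc -c)

# Phase 3: execution, reading exactly the reviewed list.
xargs -0 -r rm -- < "$list"

left=$(ls "$work/tree")
rm -rf -- "$work"
```

Keeping the list outside the tree being cleaned avoids the list itself becoming a candidate on a later run.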
Time-based deletion has its own pitfalls. -mtime +30 is often read as "older than 30 days", but find measures age in whole 24-hour periods since the file's modification time and discards the fraction, so +30 matches only files that are at least 31 days old. If you need more precise cutoffs, use -mmin or -newermt with explicit timestamps. A cleanup tool must document which time semantics it uses and why.
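The difference between these predicates is easiest to see on files with known ages. A small sketch, assuming GNU `find`, `touch`, and `date`; the file names are made up for the demo.

```shell
# Sketch: the same three files seen through -mtime, -mmin, and -newermt.
t=$(mktemp -d)
touch -d '732 hours ago' "$t/age-30.5d"   # 30.5 days old
touch -d '756 hours ago' "$t/age-31.5d"   # 31.5 days old
touch "$t/fresh"

# -mtime truncates age to whole days: 30.5 becomes 30, which is not > 30.
by_mtime=$(find "$t" -type f -mtime +30)

# -mmin works in minutes: 30 days = 43200 minutes, so both old files match.
mmin_count=$(find "$t" -type f -mmin +43200 | wc -l)

# -newermt compares against an explicit timestamp; negate it for "older than".
cutoff=$(date -d '720 hours ago' '+%Y-%m-%d %H:%M:%S')
newermt_count=$(find "$t" -type f ! -newermt "$cutoff" | wc -l)

rm -rf -- "$t"
```

Here `-mtime +30` selects only the 31.5-day-old file, while the minute-based and timestamp-based cutoffs select both old files.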
Finally, error handling: deletion can fail due to permissions, concurrent file changes, or locks. The tool should log failures and continue. It should also exit with a distinct code if any failures occurred so that automation can detect partial success.
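One way to keep going after a failed deletion and still report it, sketched with a hypothetical `delete_all` helper. The failure is simulated with a path that has already vanished, which stands in for the concurrent-change case.

```shell
# Sketch: delete each path, record the outcome, never abort midway.
# delete_all is a hypothetical helper, not part of any standard tool.
delete_all() {
    local failures=0 path
    for path in "$@"; do
        if rm -- "$path" 2>/dev/null; then
            printf 'DELETE %s OK\n' "$path"
        else
            printf 'DELETE %s FAIL\n' "$path"
            failures=$((failures + 1))
        fi
    done
    # Status 0 on full success, 1 if anything failed (partial success).
    [ "$failures" -eq 0 ]
}

d=$(mktemp -d)
touch "$d/a" "$d/b"

# The middle path does not exist, simulating a file removed by someone else.
report=$(delete_all "$d/a" "$d/vanished" "$d/b")
status=$?

left=$(ls -A "$d")
rm -rf -- "$d"
```

The caller can pass `status` straight through as the script's exit code so schedulers can distinguish a clean run from a partial one.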
Another subtlety is that cleanup policies can conflict. For example, you might want to delete files older than 30 days, but you might also want to keep the most recent 10 files regardless of age. These policies require ordering and precedence. A safe janitor tool should document its policy hierarchy and implement it deterministically. Even if you start with a simple policy, design the script so it can evolve without risky rewrites.
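A sketch of one possible precedence order, where retention wins over age: the newest N files are protected first, and only the rest are tested against the age limit. `select_candidates` is a hypothetical name, the code assumes GNU `find -printf`, and it does not handle filenames containing newlines.

```shell
# Sketch: rule 1 (keep the newest $keep files) takes precedence over
# rule 2 (select files older than $days). Assumes no newlines in names.
select_candidates() {
    local dir="$1" days="$2" keep="$3"
    find "$dir" -maxdepth 1 -type f -printf '%T@\t%p\n' |
        sort -rn |                        # newest first
        tail -n +"$((keep + 1))" |        # rule 1: skip the protected files
        cut -f2- |
        while IFS= read -r path; do
            # rule 2: of the unprotected files, print those past the age limit
            find "$path" -maxdepth 0 -mtime +"$days" -print
        done
}

r=$(mktemp -d)
for age in 35 40 45 50; do
    touch -d "$age days ago" "$r/d$age.log"
done

candidates=$(select_candidates "$r" 30 2)            # two newest are protected
all_old_count=$(select_candidates "$r" 30 0 | wc -l) # no retention rule
rm -rf -- "$r"
```

All four files exceed the 30-day limit, yet with `keep=2` only the two oldest are selected, which makes the precedence visible and testable.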
You should also think about idempotence. If the script runs twice in a row, the second run should not crash or behave unpredictably. It should simply find fewer candidates. This is important for scheduled jobs. Logging should also be append-only or should use a fresh log per run to avoid confusion.
A final operational concern is target scoping. A janitor tool should never accept an empty or root path without explicit confirmation, because a typo can turn a cleanup script into a system-wiper. Defensive checks like “refuse to run on /” or “require --force for system paths” are common in safe automation. Another safe practice is to canonicalize the target path (using realpath) and require it to be under an allowlisted base directory. These are not theoretical concerns; many cleanup incidents come from a missing variable or a mis-specified path.
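These scoping checks might look like the following. `validate_target` and the idea of passing a single allowlisted base are assumptions made for illustration; the sketch relies on GNU `realpath -e`, and returns 2 to line up with an "invalid arguments" exit code.

```shell
# Sketch: refuse empty, root, nonexistent, or out-of-scope targets.
# validate_target is hypothetical; it prints the canonical path on success.
validate_target() {
    local raw="$1" base="$2" target
    [ -n "$raw" ] || { echo "refusing empty target" >&2; return 2; }
    target=$(realpath -e -- "$raw") || { echo "no such target: $raw" >&2; return 2; }
    [ "$target" != "/" ] || { echo "refusing to run on /" >&2; return 2; }
    base=$(realpath -e -- "$base") || return 2
    # Trailing slashes stop /srv/tmpfoo from passing as a child of /srv/tmp.
    case "$target/" in
        "$base"/*) printf '%s\n' "$target" ;;
        *) echo "target outside allowlisted base: $target" >&2; return 2 ;;
    esac
}

base=$(mktemp -d)
mkdir "$base/cache"
expected="$(realpath -e -- "$base")/cache"

ok=$(validate_target "$base/cache/../cache" "$base"); ok_status=$?
validate_target "" "$base" 2>/dev/null;         empty_status=$?
validate_target / "$base" 2>/dev/null;          root_status=$?
validate_target "$base/.." "$base" 2>/dev/null; escape_status=$?
rm -rf -- "$base"
```

Canonicalizing before comparing is what defeats `..` segments and symlinked paths; comparing raw strings would not.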
How this fits into the project
The janitor script uses a dry-run mode and a safe deletion phase with logs and exit codes.
Definitions & key terms
- dry run: simulation mode that does not modify data.
- -delete: find action that removes matched paths.
- transactional: staged workflow with explicit review.
- option injection: filenames treated as command flags.
Mental model diagram (ASCII)
scan -> candidate list -> review -> delete -> log
How it works (step-by-step)
- Build a candidate list with find predicates.
- If dry-run, print counts and exit.
- If execute, delete with rm -- or -delete.
- Log every deletion and error.
- Exit with code indicating success or partial failure.
- Invariant: no destructive action occurs in dry-run mode.
- Failure modes: incorrect target path, overly broad predicates, or permission errors.
Minimal concrete example
find /tmp -type f -mtime +30 -print
find /tmp -type f -mtime +30 -exec rm -- {} +
Common misconceptions
- “-delete is always safe” -> False; it is unforgiving.
- “Dry runs are optional” -> False in automation.
- “rm is safe without --” -> False if filenames start with -.
Check-your-understanding questions
- Why should a cleanup tool default to dry-run?
- What does -mtime +30 actually mean?
- Why use rm -- when deleting files?
- How should the script behave if some deletions fail?
Check-your-understanding answers
- It prevents accidental destructive actions.
- Files whose age in whole 24-hour periods (fraction discarded) is greater than 30, that is, files modified at least 31 days ago.
- It prevents filenames from being interpreted as options.
- Continue, log failures, and return a partial-success code.
Real-world applications
- Automated cleanup in CI servers.
- Temp directory management on shared systems.
- Disk usage control in production.
Where you will apply it
- In this project: see §3.2 (requirements) and §3.7 (golden output).
- Also used in: P05-the-pipeline.md.
References
- The Linux Command Line (Shotts), Chapter 9
- man find
- man rm
Key insights
Safe deletion is a workflow, not a single command.
Summary
A janitor script must prioritize safety: dry run first, explicit execution, and comprehensive logging.
Homework/Exercises to practice the concept
- Create a test tree and simulate deletion with -print.
- Add a --force flag that switches from dry-run to delete.
- Confirm that filenames starting with - are handled safely.
Solutions to the homework/exercises
- find ./test -type f -mtime +0 -print
- Use a shell flag that switches to -exec rm --.
- touch -- -bad; rm -- -bad
2.2 Permissions, Ownership, and Risky Modes
Fundamentals
Permissions determine who can read, write, or execute a file. The janitor tool must detect risky permissions such as world-writable files or directories without the sticky bit. Ownership matters too: deleting files you do not own may fail or cause unintended side effects. Understanding rwx, special bits (setuid, setgid, sticky), and find -perm is essential to safely fix permissions and avoid breaking systems.
Permissions are part of the security posture of a system. A cleanup script that blindly changes permissions can do more harm than good. That is why permission fixes should be explicit, logged, and optional by default. The script should surface risky permissions and let an operator decide whether to fix them, unless a policy mandates automatic remediation.
Deep Dive into the concept
Unix permission bits are encoded in the inode mode field. The basic rwx bits apply to user, group, and others. World-writable files (-perm -o+w) are risky because any user can modify them. World-writable directories are even riskier if they do not have the sticky bit set, because any user can delete or rename other users’ files. The sticky bit (+t) is commonly used on /tmp to prevent such abuse. A janitor tool can detect world-writable directories without the sticky bit and fix them by adding +t or removing write permission for others.
Setuid and setgid bits elevate privileges when a program is executed. A janitor script should not change these bits lightly; instead, it should report them. Removing setuid bits can break system tools; leaving them in world-writable locations is a security risk. The tool should flag them in the report rather than automatically modify them, unless a safe policy is defined.
Ownership affects deletion and permission changes. find will match files you cannot delete, and rm will fail if you lack permission. Your script must log these failures and continue. If you are running with elevated privileges (like sudo), you must be even more cautious because your script can delete anything. That is why explicit path restrictions and dry-run previews are critical.
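find -perm has subtle semantics. -perm -mode matches files that have all of the listed bits set, while -perm /mode matches files that have any of them set; -perm mode with no prefix requires an exact match. For a single bit such as o+w the two prefixed forms select the same files; they diverge for multi-bit masks such as 0022. For a janitor tool, -perm -0002 is a precise way to identify world-writable paths, but you must interpret the results carefully: a world-writable directory that carries the sticky bit, such as /tmp, is expected rather than risky.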
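The two prefixes are easiest to compare on files with known modes. A minimal sketch, assuming GNU `find`; the file names are invented to describe each mode.

```shell
# Sketch: "-" means all listed bits, "/" means any listed bit.
p=$(mktemp -d)
touch "$p/group_only" "$p/other_only" "$p/both" "$p/neither"
chmod 0660 "$p/group_only"   # group-writable only
chmod 0606 "$p/other_only"   # world-writable only
chmod 0666 "$p/both"         # group- and world-writable
chmod 0600 "$p/neither"

# 0022 = group write + other write.
all_bits=$(find "$p" -type f -perm -0022)            # needs both bits
any_bit_count=$(find "$p" -type f -perm /0022 | wc -l)  # needs either bit

# With a single bit, "-0002" and "/0002" select the same files.
world_dash=$(find "$p" -type f -perm -0002 | sort)
world_slash=$(find "$p" -type f -perm /0002 | sort)

rm -rf -- "$p"
```

Only `both` has every bit in 0022, three files have at least one of them, and the single-bit queries agree with each other.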
Finally, permission fixes should be logged and reversible. If you change permissions, record the old mode and the new mode so you can undo changes if needed. This is part of an audit trail. A janitor tool that silently changes permissions is hard to trust.
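Recording the old mode before changing it could look like this. `fix_sticky` and the `old=`/`new=` fields are hypothetical choices for the sketch, and `stat -c '%a'` is the GNU form for printing an octal mode.

```shell
# Sketch: capture the mode before and after, then append one audit line.
# fix_sticky and the old=/new= log fields are hypothetical.
fix_sticky() {
    local dir="$1" log="$2" old new
    old=$(stat -c '%a' -- "$dir")        # octal mode before the change
    chmod +t -- "$dir" || return 1
    new=$(stat -c '%a' -- "$dir")        # octal mode after the change
    printf '%s CHMOD %s old=%s new=%s\n' \
        "$(date +%Y-%m-%dT%H:%M:%S)" "$dir" "$old" "$new" >> "$log"
}

d=$(mktemp -d)
mkdir "$d/shared"
chmod 0777 "$d/shared"

fix_sticky "$d/shared" "$d/changes.log"

mode_after=$(stat -c '%a' -- "$d/shared")
logline=$(tail -n 1 "$d/changes.log")
rm -rf -- "$d"
```

With the old mode on record, an operator can reverse a change by hand or with a small script that replays the log.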
Another important detail is group ownership. Many systems rely on group permissions for shared access. If your tool changes permissions without considering group ownership, you can break collaboration workflows. A safer approach is to log group ownership and avoid changing group permissions unless explicitly requested. This is especially important on shared servers or CI machines.
You should also be aware of filesystem ACLs (access control lists). ACLs can grant or deny permissions independent of the classic rwx bits. find -perm does not capture ACLs. If your environment uses ACLs, your permission report may be incomplete. A good janitor tool should at least document this limitation and avoid claiming that it provides a full security audit.
Additionally, remember that permission semantics can be influenced by umask and default ACLs. If a directory has a default ACL, new files may inherit permissions that look risky but are intentional. Your tool should avoid blindly “fixing” such files unless policy explicitly demands it. A safe approach is to flag and report, then let a human decide. The sticky bit on shared directories is another example: adding it to /tmp is correct, but adding it to application directories may be inappropriate. Context matters.
How this fits into the project
The janitor report highlights world-writable files and optionally fixes them with explicit logging.
Definitions & key terms
- mode bits: permission bits stored in the inode.
- sticky bit: directory flag preventing users from deleting others’ files.
- setuid/setgid: execution flags that change effective UID/GID.
- world-writable: writable by “others”.
Mental model diagram (ASCII)
mode bits: [u rwx][g rwx][o rwx] + special bits
How it works (step-by-step)
- Use find -perm -0002 to locate world-writable files.
- Filter directories and check sticky bit status.
- Apply fixes (chmod) only if policy allows.
- Log old and new modes.
- Invariant: permission changes are explicit and logged.
- Failure modes: unintended permission changes, ACLs masking real access, or insufficient privileges.
Minimal concrete example
find /tmp -type d -perm -0002 ! -perm -1000 -print
Common misconceptions
- “chmod 777 is fine for temp” -> False; use sticky bit instead.
- “setuid files are always malicious” -> False; many system tools use them.
- “Permissions changes are harmless” -> False; they can break apps.
Check-your-understanding questions
- Why is a world-writable directory without sticky bit risky?
- How do you find world-writable files with find?
- Why should permission changes be logged?
- What is the difference between -perm -0002 and -perm /0002?
Check-your-understanding answers
- Any user can delete or rename other users’ files.
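- find . -perm -0002.
- To allow auditing and rollback.
- -perm -0002 requires every listed bit to be set; -perm /0002 matches if any listed bit is set. With a single bit they select the same files; the difference appears with multi-bit masks such as 0022.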
Real-world applications
- Hardening temp directories on shared systems.
- Detecting risky permission changes after incidents.
- Compliance checks for shared folders.
Where you will apply it
- In this project: see §3.2 (requirements) and §7.1 (pitfalls).
- Also used in: P01-digital-census.md.
References
- The Linux Command Line (Shotts), Chapter 9
- man chmod
- man find
Key insights
Permissions are policy encoded in bits; your tool must respect that policy.
Summary
A janitor tool must detect risky permissions and apply fixes only with clear intent and logging.
Homework/Exercises to practice the concept
- Create a world-writable directory and detect it with find.
- Add sticky bit and verify the change.
- Build a log line that records old and new modes.
Solutions to the homework/exercises
- mkdir test; chmod 777 test; find . -type d -perm -0002
- chmod +t test; ls -ld test
- printf 'CHMOD %s old=0777 new=1777\n' test >> changes.log
3. Project Specification
3.1 What You Will Build
A CLI cleanup tool that deletes old files, removes empty directories, fixes risky permissions, and logs every action for review.
3.2 Functional Requirements
- Dry-run mode: list candidates without deletion.
- Deletion: remove files older than threshold.
- Empty dir cleanup: remove empty directories.
- Permission fixes: flag or fix world-writable paths.
- Logging: record all actions and errors.
3.3 Non-Functional Requirements
- Safety: default to dry-run.
- Reliability: continues on errors.
- Usability: clear summary output.
3.4 Example Usage / Output
$ ./janitor.sh /tmp --days 30 --fix-perms
3.5 Data Formats / Schemas / Protocols
Log format:
TIME ACTION PATH RESULT
2026-01-01T12:00:00 DELETE /tmp/old.log OK
2026-01-01T12:00:00 CHMOD /tmp/shared 1777
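A small logging helper that emits this format might look as follows. `log_action`, `JANITOR_LOG`, and `JANITOR_NOW` are hypothetical names; the timestamp override exists only so that fixture runs produce deterministic output.

```shell
# Sketch: append one "TIME ACTION PATH RESULT" record per call.
# log_action, JANITOR_LOG, and JANITOR_NOW are hypothetical names.
log_action() {
    # JANITOR_NOW, if set, pins the timestamp for reproducible test output.
    local now="${JANITOR_NOW:-$(date +%Y-%m-%dT%H:%M:%S)}"
    printf '%s %s %s %s\n' "$now" "$1" "$2" "$3" >> "$JANITOR_LOG"
}

JANITOR_LOG=$(mktemp)
JANITOR_NOW=2026-01-01T12:00:00

log_action DELETE /tmp/old.log OK
log_action CHMOD /tmp/shared 1777

logged=$(cat "$JANITOR_LOG")
rm -f -- "$JANITOR_LOG"
```

A space-separated format is easy to read but ambiguous for paths that contain spaces; a tab or another delimiter would be a safer choice if such paths are expected.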
3.6 Edge Cases
- Files created during the run.
- Permission denied on delete.
- Symlink loops (avoid following symlinks).
3.7 Real World Outcome
3.7.1 How to Run (Copy/Paste)
./janitor.sh /tmp --days 30 --fix-perms
3.7.2 Golden Path Demo (Deterministic)
Use a fixture directory and a fixed timestamp in logs: 2026-01-01T12:00:00.
3.7.3 If CLI: exact terminal transcript
$ ./janitor.sh ./fixtures/tmp --days 30 --fix-perms --force
[2026-01-01T12:00:00] TARGET=./fixtures/tmp
[2026-01-01T12:00:00] MODE=EXECUTE
[2026-01-01T12:00:00] DELETE_FILES=2
[2026-01-01T12:00:00] REMOVE_DIRS=1
[2026-01-01T12:00:00] PERMS_FIXED=1
[2026-01-01T12:00:00] LOG=janitor.log
[2026-01-01T12:00:00] DONE
Failure demo (permission denied):
$ ./janitor.sh /root --days 30
[2026-01-01T12:00:00] ERROR: permission denied
EXIT_CODE=1
Exit codes:
- 0: success
- 1: partial success with errors
- 2: invalid arguments
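Code 2 is usually decided during argument parsing, before anything touches the filesystem. A sketch of such a parser follows; `parse_args` and the variable names are hypothetical, and it returns a status instead of exiting so it can be tested in isolation.

```shell
# Sketch: validate arguments up front and return 2 on any problem.
# parse_args and the TARGET/DAYS/MODE/FIX_PERMS variables are hypothetical.
parse_args() {
    TARGET='' DAYS=30 MODE=dry-run FIX_PERMS=no
    while [ $# -gt 0 ]; do
        case "$1" in
            --days)
                [ $# -ge 2 ] || return 2
                # Reject empty or non-numeric thresholds.
                case "$2" in ''|*[!0-9]*) return 2 ;; esac
                DAYS=$2; shift 2 ;;
            --force)     MODE=execute; shift ;;
            --dry-run)   MODE=dry-run; shift ;;
            --fix-perms) FIX_PERMS=yes; shift ;;
            -*)          return 2 ;;          # unknown option
            *)
                [ -z "$TARGET" ] || return 2  # only one target allowed
                TARGET=$1; shift ;;
        esac
    done
    [ -n "$TARGET" ] || return 2              # a target is mandatory
}

parse_args /tmp --days 30 --fix-perms; ok_status=$?
ok_mode=$MODE; ok_days=$DAYS; ok_fix=$FIX_PERMS

parse_args /tmp --days abc; bad_days_status=$?
parse_args --days 7;        no_target_status=$?
parse_args /tmp --force;    force_mode=$MODE
```

Note that the mode stays `dry-run` unless `--force` is given, which keeps the safe default in one place.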
4. Solution Architecture
4.1 High-Level Design
scan -> dry-run list -> confirm -> delete/fix -> log -> summary
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Selector | choose candidates | mtime threshold |
| Deleter | remove files | rm -- or -delete |
| Dir cleaner | remove empty dirs | find -type d -empty |
| Perm fixer | adjust risky perms | log changes |
4.3 Data Structures (No Full Code)
log line: TIME ACTION PATH RESULT
4.4 Algorithm Overview
Key Algorithm: Cleanup Workflow
- Select candidate files by age.
- Dry-run or execute based on flag.
- Remove empty directories.
- Fix or report permissions.
- Write summary and exit code.
Complexity Analysis:
- Time: O(n)
- Space: O(1) streaming
5. Implementation Guide
5.1 Development Environment Setup
# Requires find, rm, chmod
5.2 Project Structure
project-root/
├── janitor.sh
├── fixtures/
│ └── tmp/
└── README.md
5.3 The Core Question You’re Answering
“How do I automate cleanup tasks without risking accidental data loss?”
5.4 Concepts You Must Understand First
- Safe deletion workflows
- Permissions and risky modes
5.5 Questions to Guide Your Design
- What directories are safe to clean?
- Should permission fixes be automatic or reported?
- How will you log changes for audit?
5.6 Thinking Exercise
Design a two-phase cleanup process: dry-run output and execution. Decide what summary counts you will include.
5.7 The Interview Questions They’ll Ask
- “Why is -delete dangerous?”
- “How do you test a deletion command safely?”
- “What is the sticky bit for?”
5.8 Hints in Layers
Hint 1: Dry run
find /tmp -type f -mtime +30 -print
Hint 2: Safe delete
find /tmp -type f -mtime +30 -exec rm -- {} +
Hint 3: Remove empty dirs
find /tmp -type d -empty -exec rmdir -- {} +
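A variation worth knowing for nested empty directories, sketched under the assumption of GNU `find`. The batched `{} +` form tests every directory before any `rmdir` runs, so a parent that only contains an empty child is not yet empty when it is tested. Combining `-depth` with `\;` removes children first and re-tests parents afterwards; the directory names below are invented.

```shell
# Sketch: remove nested empty directories in a single pass.
e=$(mktemp -d)
mkdir -p "$e/a/b/c" "$e/keep"
touch "$e/keep/file"

# -depth visits children before parents, and `-exec ... \;` runs rmdir
# immediately, so a parent emptied by the removal is caught when tested.
# -mindepth 1 protects the target directory itself.
find "$e" -mindepth 1 -depth -type d -empty -exec rmdir -- {} \;

left=$(find "$e" -mindepth 1 | sort)
rm -rf -- "$e"
```

The trade-off is one `rmdir` process per directory, which is usually acceptable because empty directories are far rarer than files.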
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Permissions | The Linux Command Line (Shotts) | Ch. 9 |
| Find basics | The Linux Command Line (Shotts) | Ch. 17 |
| System ops | How Linux Works (Ward) | Ch. 4 |
5.10 Implementation Phases
Phase 1: Foundation (1-2 hours)
Goals:
- Implement dry-run mode
- Build candidate list
Tasks:
- Parse args and thresholds.
- Emit candidate list and summary.
Checkpoint: dry-run prints correct file list.
Phase 2: Core Functionality (2-3 hours)
Goals:
- Execute deletion and cleanup
Tasks:
- Delete files with rm --.
- Remove empty directories.
Checkpoint: files removed as expected.
Phase 3: Polish & Edge Cases (1-2 hours)
Goals:
- Permission fixes and logging
Tasks:
- Detect world-writable paths.
- Log permission changes.
Checkpoint: log includes old and new modes.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Default mode | dry-run vs execute | dry-run | safer by default |
| Permission fixes | auto vs report | report by default | avoid breaking apps |
| Delete method | -delete vs rm | rm with -- | explicit and safe |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | arg parsing | days threshold |
| Integration Tests | cleanup run | fixture tmp dir |
| Edge Case Tests | permission denied | protected file |
6.2 Critical Test Cases
- Dry-run: no files deleted.
- Execute: files deleted and logged.
- Permission denied: error logged, exit code 1.
6.3 Test Data
fixtures/tmp/old.log
fixtures/tmp/empty/
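Fixtures need controlled modification times, otherwise age-based tests pass or fail depending on when the repository was checked out. A possible setup helper, assuming GNU `touch -d`; `make_fixtures`, `recent.log`, and the `shared/` directory are hypothetical additions to the two paths listed above.

```shell
# Sketch: build a fixture tree with known ages and modes.
# make_fixtures, recent.log, and shared/ are hypothetical additions.
make_fixtures() {
    local root="$1"
    mkdir -p "$root/tmp/empty" "$root/tmp/shared"
    chmod 0755 "$root/tmp" "$root/tmp/empty"       # independent of umask
    touch -d '45 days ago' "$root/tmp/old.log"     # deletion candidate
    touch "$root/tmp/recent.log"                   # must survive
    touch "$root/tmp/shared/keep"                  # keeps shared/ non-empty
    chmod 0777 "$root/tmp/shared"                  # world-writable, no sticky bit
}

fx=$(mktemp -d)
make_fixtures "$fx"

old=$(find "$fx/tmp" -type f -mtime +30)
empty=$(find "$fx/tmp" -type d -empty)
risky=$(find "$fx/tmp" -type d -perm -0002 ! -perm -1000)
rm -rf -- "$fx"
```

Each of the three queries should return exactly one path, which gives the integration tests one known candidate per feature.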
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Running without dry-run | accidental deletion | default to dry-run |
| Missing prune | deletes too much | restrict target path |
| No logging | no audit trail | log every action |
7.2 Debugging Strategies
- Add a verbose flag to print actions.
- Test on a fixture directory first.
- Use set -x to trace.
7.3 Performance Traps
Large temp trees can be slow. Use pruning and limit traversal depth when safe.
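Pruning and depth limits can be combined in one traversal. A sketch assuming GNU `find`; the directory name `protected` and the depth of 3 are arbitrary choices for the demo.

```shell
# Sketch: skip a subtree entirely, cap the depth, and stay on one filesystem.
s=$(mktemp -d)
mkdir -p "$s/a/b/c/d" "$s/protected"
touch -d '60 days ago' "$s/a/old.tmp" "$s/a/b/c/d/deep.tmp" "$s/protected/old.tmp"

# -xdev:      do not cross into other mounted filesystems
# -maxdepth:  do not descend more than 3 levels below the target
# -prune:     do not enter directories named "protected" at all
found=$(find "$s" -xdev -maxdepth 3 \
            -type d -name protected -prune \
            -o -type f -mtime +30 -print)

rm -rf -- "$s"
```

Only `a/old.tmp` is reported: the file under `protected/` is pruned and `deep.tmp` lies below the depth limit. A depth cap also hides legitimate candidates, so it is only safe when the layout of the tree is known.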
8. Extensions & Challenges
8.1 Beginner Extensions
- Add --confirm prompt before delete.
- Add --exclude patterns.
8.2 Intermediate Extensions
- Add --report-only mode with CSV output.
- Add --xdev support.
8.3 Advanced Extensions
- Add retention policies and scheduled runs.
- Integrate with systemd timers.
9. Real-World Connections
9.1 Industry Applications
- Disk usage control on servers.
- Safe cleanup scripts in CI environments.
- Compliance around temp data retention.
9.2 Related Open Source Projects
- logrotate: log cleanup and rotation.
- tmpreaper: temp directory cleanup.
9.3 Interview Relevance
- Safe automation practices.
- Permission and ownership reasoning.
- Error handling in scripts.
10. Resources
10.1 Essential Reading
- How Linux Works (Ward), Chapter 4
- The Linux Command Line (Shotts), Chapter 9
10.2 Video Resources
- “Safe Deletion in Unix” (YouTube)
- “Permissions and Sticky Bit” (tutorial)
10.3 Tools & Documentation
- man find
- man chmod
- man rm
10.4 Related Projects in This Series
- P01-digital-census.md - permissions audit
- P05-the-pipeline.md - safe pipelines
11. Self-Assessment Checklist
11.1 Understanding
- I can explain why dry-run is essential.
- I can explain sticky bit behavior.
- I can explain -mtime semantics.
11.2 Implementation
- Dry-run and execute modes work.
- All deletions are logged.
- Permission issues are handled gracefully.
11.3 Growth
- I can propose safer defaults.
- I documented at least one edge case.
- I can explain this project in an interview.
12. Submission / Completion Criteria
Minimum Viable Completion:
- Dry-run and deletion modes implemented.
- Log output records actions.
Full Completion:
- Permission fixes detected or applied.
- Deterministic summary report included.
Excellence (Going Above & Beyond):
- Scheduled cleanup policy with retention rules.
- Rollback plan for permission changes.