Project 10: Parallel Directory Traversal
A lock-free parallel directory walker that uses a work-stealing queue to distribute directory traversal across multiple threads.
Quick Reference
| Attribute | Value |
|---|---|
| Primary Language | Rust |
| Alternative Languages | C++, Go |
| Difficulty | Level 3: Advanced |
| Time Estimate | 2 weeks |
| Knowledge Area | Concurrency / File Systems |
| Tooling | Custom Implementation (or study ignore crate) |
| Prerequisites | Basic threading, understanding of file systems |
What You Will Build
A lock-free parallel directory walker that uses a work-stealing queue to distribute directory traversal across multiple threads.
Why It Matters
This project builds core skills that appear repeatedly in real-world systems and tooling.
Core Challenges
- Work-stealing queues → maps to load balancing
- Lock-free synchronization → maps to avoiding contention
- Directory entry batching → maps to reducing syscall overhead
- Respecting .gitignore → maps to ignore pattern matching
Key Concepts
- Work-Stealing: “The Art of Multiprocessor Programming” Chapter 16 - Herlihy & Shavit
- Crossbeam Deque: Rust’s work-stealing deque implementation
- getdents64: Batched directory reading syscall
- Gitignore Semantics: Complex rules about inheritance and negation
Real-World Outcome
$ ./parallel_walker /home/user/code
Walking 2.1 million files across 150,000 directories...
Thread 0: 523,456 files processed
Thread 1: 518,234 files processed
Thread 2: 530,123 files processed
Thread 3: 528,187 files processed
Total time: 1.2 seconds
Throughput: 1.75 million files/second
Benchmark comparison:
Single-threaded: 4.8 seconds
4 threads: 1.2 seconds
Speedup: 4x (linear with thread count!)
Work-stealing statistics:
Steals performed: 1,234
Average queue depth: 45 directories
Implementation Guide
- Reproduce the simplest happy-path scenario.
- Build the smallest working version of the core feature.
- Add input validation and error handling.
- Add instrumentation/logging to confirm behavior.
- Refactor into clean modules with tests.
Milestones
- Milestone 1: Minimal working program that runs end-to-end.
- Milestone 2: Correct outputs for typical inputs.
- Milestone 3: Robust handling of edge cases.
- Milestone 4: Clean structure and documented usage.
Validation Checklist
- Output matches the real-world outcome example
- Handles invalid inputs safely
- Provides clear errors and exit codes
- Repeatable results across runs
References
- Main guide:
TEXT_SEARCH_TOOLS_DEEP_DIVE.md - “The Art of Multiprocessor Programming” by Herlihy & Shavit