Project 10: Parallel Directory Traversal

A lock-free parallel directory walker that uses a work-stealing queue to distribute directory traversal across multiple threads.

Quick Reference

Attribute Value
Primary Language Rust
Alternative Languages C++, Go
Difficulty Level 3: Advanced
Time Estimate 2 weeks
Knowledge Area Concurrency / File Systems
Tooling Custom Implementation (or study ignore crate)
Prerequisites Basic threading, understanding of file systems

What You Will Build

A lock-free parallel directory walker that uses a work-stealing queue to distribute directory traversal across multiple threads.

Why It Matters

This project builds core skills that appear repeatedly in real-world systems and tooling.

Core Challenges

  • Work-stealing queues → maps to load balancing
  • Lock-free synchronization → maps to avoiding contention
  • Directory entry batching → maps to reducing syscall overhead
  • Respecting .gitignore → maps to ignore pattern matching

Key Concepts

  • Work-Stealing: “The Art of Multiprocessor Programming” Chapter 16 - Herlihy & Shavit
  • Crossbeam Deque: Rust’s work-stealing deque implementation
  • getdents64: Batched directory reading syscall
  • Gitignore Semantics: Complex rules about inheritance and negation

Real-World Outcome

$ ./parallel_walker /home/user/code
Walking 2.1 million files across 150,000 directories...

Thread 0: 523,456 files processed
Thread 1: 518,234 files processed
Thread 2: 530,123 files processed
Thread 3: 528,187 files processed

Total time: 1.2 seconds
Throughput: 1.75 million files/second

Benchmark comparison:
  Single-threaded: 4.8 seconds
  4 threads:       1.2 seconds
  Speedup: 4x (linear with thread count!)

Work-stealing statistics:
  Steals performed: 1,234
  Average queue depth: 45 directories

Implementation Guide

  1. Reproduce the simplest happy-path scenario.
  2. Build the smallest working version of the core feature.
  3. Add input validation and error handling.
  4. Add instrumentation/logging to confirm behavior.
  5. Refactor into clean modules with tests.

Milestones

  • Milestone 1: Minimal working program that runs end-to-end.
  • Milestone 2: Correct outputs for typical inputs.
  • Milestone 3: Robust handling of edge cases.
  • Milestone 4: Clean structure and documented usage.

Validation Checklist

  • Output matches the real-world outcome example
  • Handles invalid inputs safely
  • Provides clear errors and exit codes
  • Repeatable results across runs

References

  • Main guide: TEXT_SEARCH_TOOLS_DEEP_DIVE.md
  • “The Art of Multiprocessor Programming” by Herlihy & Shavit