Project 3: Fearless Concurrent Web Scraper (Data Races Impossible)
A highly concurrent web scraper that fetches hundreds of pages simultaneously, extracts data, and aggregates results, with data races ruled out at compile time by Rust's ownership rules and the Send/Sync traits.
Quick Reference
| Attribute | Value |
|---|---|
| Primary Language | Rust |
| Alternative Languages | Go (for comparison), Python (to feel the pain) |
| Difficulty | Level 2: Intermediate |
| Time Estimate | 1-2 weeks |
| Knowledge Area | Concurrency / Async / Networking |
| Tooling | tokio, reqwest, scraper |
| Prerequisites | Project 1 completed, basic HTTP understanding |
What You Will Build
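A command-line tool that reads a list of URLs from a file, fetches them with a configurable number of concurrent workers, extracts data from each page, and writes the aggregated results to a JSON file. Failed requests are logged separately rather than aborting the run.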
Why It Matters
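Sharing state between tasks, limiting request rates, aggregating results, and propagating errors are problems that recur in most networked services and tooling. In Rust, the data-race part of those problems surfaces as compile-time errors instead of intermittent runtime bugs.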
Core Challenges
- Sharing state between async tasks → maps to Arc, Mutex, and the Send/Sync traits
- Handling rate limiting without blocking → maps to async/await and tokio runtime
- Aggregating results from many concurrent operations → maps to channels and message passing
- Graceful error handling across tasks → maps to Result propagation in async contexts
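The first challenge, sharing state between concurrent workers, can be sketched with the standard library alone. This is a minimal illustration, not the project's implementation: it uses `std::thread` in place of tokio tasks so that it runs without dependencies, and `fake_fetch` and `run_workers` are hypothetical names standing in for a real HTTP call and a real worker pool. The same `Arc<Mutex<...>>` pattern and the same `Send` requirement apply when the closure is handed to `tokio::spawn`.

```rust
use std::collections::{HashMap, VecDeque};
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for an HTTP fetch: returns a fake "page size".
fn fake_fetch(url: &str) -> usize {
    url.len()
}

/// Runs `workers` threads that pull URLs from a shared queue and record
/// results in a shared map. Returns the map once every worker has finished.
fn run_workers(urls: Vec<String>, workers: usize) -> HashMap<String, usize> {
    let queue = Arc::new(Mutex::new(VecDeque::from(urls)));
    let results = Arc::new(Mutex::new(HashMap::new()));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            // Each worker gets its own reference-counted handle to the state.
            let queue = Arc::clone(&queue);
            let results = Arc::clone(&results);
            thread::spawn(move || loop {
                // Hold the queue lock only long enough to take one URL.
                let next = queue.lock().unwrap().pop_front();
                let Some(url) = next else { break };
                let size = fake_fetch(&url);
                results.lock().unwrap().insert(url, size);
            })
        })
        .collect();

    for handle in handles {
        handle.join().expect("worker panicked");
    }

    // Every worker has exited and dropped its clone, so the Arc is unique.
    Arc::try_unwrap(results)
        .expect("workers still hold references")
        .into_inner()
        .unwrap()
}

fn main() {
    let urls: Vec<String> = (0..20)
        .map(|i| format!("https://example.com/page/{i}"))
        .collect();
    let results = run_workers(urls, 4);
    println!("scraped {} pages", results.len());
}
```

Replacing `Arc` with `Rc`, or removing the `Mutex`, makes this fail to compile: `thread::spawn` requires its closure to be `Send`, which is how the compiler rejects unsynchronized sharing before the program ever runs.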
Key Concepts
- Send and Sync traits: “Rust Atomics and Locks” Chapter 1 - Mara Bos
- Async/Await: “Asynchronous Programming in Rust” - Rust Async Book (online)
- Arc and Mutex: “The Rust Programming Language” Chapter 16 - Steve Klabnik and Carol Nichols
- Channels (mpsc): “Programming Rust, 2nd Edition” Chapter 19 - Jim Blandy, Jason Orendorff, and Leonora F. S. Tindall
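As a rough sketch of the channel concept listed above, the following example aggregates per-page outcomes through `std::sync::mpsc` and keeps failures as values instead of panics. It is an illustration under assumptions: `Page`, `fake_scrape`, and `scrape_all` are hypothetical names, the "fetch" is faked, and one OS thread per URL is used only to keep the example dependency-free. A tokio version would typically use `tokio::sync::mpsc` and spawned tasks in the same shape.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical result of scraping one page.
#[derive(Debug)]
struct Page {
    url: String,
    title: String,
}

/// Hypothetical stand-in for fetch + parse; URLs containing "bad" fail.
fn fake_scrape(url: &str) -> Result<Page, String> {
    if url.contains("bad") {
        Err(format!("{url}: request failed"))
    } else {
        Ok(Page {
            url: url.to_string(),
            title: format!("Title of {url}"),
        })
    }
}

/// Spawns one thread per URL; each sends its Result over a channel.
/// Returns successes and error messages separately.
fn scrape_all(urls: Vec<String>) -> (Vec<Page>, Vec<String>) {
    let (tx, rx) = mpsc::channel();
    for url in urls {
        let tx = tx.clone();
        thread::spawn(move || {
            // A send error only means the receiver is gone; ignore it.
            let _ = tx.send(fake_scrape(&url));
        });
    }
    // Drop the original sender so iteration over `rx` ends once all workers finish.
    drop(tx);

    let mut pages = Vec::new();
    let mut errors = Vec::new();
    for outcome in rx {
        match outcome {
            Ok(page) => pages.push(page),
            Err(e) => errors.push(e),
        }
    }
    (pages, errors)
}

fn main() {
    let urls = vec![
        "https://example.com/a".to_string(),
        "https://bad.example.com/".to_string(),
        "https://example.com/b".to_string(),
    ];
    let (pages, errors) = scrape_all(urls);
    for page in &pages {
        println!("ok: {} ({})", page.url, page.title);
    }
    for e in &errors {
        eprintln!("failed: {e}");
    }
}
```

Because only the receiving side owns the result vectors, no lock is needed for aggregation. Note that the order of `pages` depends on which worker finishes first, so it can differ between runs.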
Real-World Outcome
$ cargo run -- --urls urls.txt --workers 50 --output results.json
🕷️ Fearless Web Scraper v1.0
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
[░░░░░░░░░░░░░░░░░░░░] 0/500 pages
... scraping ...
[████████████████████] 500/500 pages
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✅ Completed in 4.2 seconds
• 500 pages scraped
• 50 concurrent workers
• 0 data races (guaranteed by Rust!)
• 12 failed requests (logged to errors.log)
Results written to results.json
Implementation Guide
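- Reproduce the simplest happy-path scenario: fetch one URL and print what was extracted.
- Build the smallest working version of the core feature: fetch a list of URLs concurrently and collect the results.
- Add input validation and error handling, so that a bad URL or a failed request is reported without stopping the other tasks.
- Add instrumentation/logging to confirm behavior, such as progress counts and a log of failed requests.
- Refactor into clean modules with tests.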
Milestones
- Milestone 1: Minimal working program that runs end-to-end.
- Milestone 2: Correct outputs for typical inputs.
- Milestone 3: Robust handling of edge cases.
- Milestone 4: Clean structure and documented usage.
Validation Checklist
- Output matches the real-world outcome example
- Handles invalid inputs safely
- Provides clear errors and exit codes
- Repeatable results across runs, regardless of the order in which tasks complete
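Several of the checklist items above can be exercised before any network code exists. The sketch below is one possible approach, not a prescribed one: `parse_url_list` is a hypothetical helper, the scheme check is deliberately simpler than a real URL parser, and the exit code `2` for invalid input is an assumed convention.

```rust
use std::process::ExitCode;

/// Parses the contents of a URL list (one URL per line).
/// Blank lines are skipped; any line that does not start with http:// or
/// https:// is reported with its 1-based line number.
fn parse_url_list(contents: &str) -> Result<Vec<String>, Vec<String>> {
    let mut urls = Vec::new();
    let mut problems = Vec::new();
    for (idx, line) in contents.lines().enumerate() {
        let line = line.trim();
        if line.is_empty() {
            continue;
        }
        if line.starts_with("http://") || line.starts_with("https://") {
            urls.push(line.to_string());
        } else {
            problems.push(format!("line {}: not an http(s) URL: {line}", idx + 1));
        }
    }
    // Sort and deduplicate so the same input always yields the same work list.
    urls.sort();
    urls.dedup();
    if problems.is_empty() {
        Ok(urls)
    } else {
        Err(problems)
    }
}

fn main() -> ExitCode {
    let input = "https://example.com/b\n\nhttps://example.com/a\nftp://example.com/c\n";
    match parse_url_list(input) {
        Ok(urls) => {
            println!("{} URLs accepted", urls.len());
            ExitCode::SUCCESS
        }
        Err(problems) => {
            for p in &problems {
                eprintln!("error: {p}");
            }
            // Assumed convention: exit code 2 signals invalid input.
            ExitCode::from(2)
        }
    }
}
```

Sorting the work list (and, in the same spirit, sorting results before writing them out) is one simple way to keep output stable even though tasks finish in a different order on each run.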
References
- Main guide: LEARN_RUST_DEEP_DIVE.md
- “Rust Atomics and Locks” by Mara Bos