Project 2: Log File Analyzer
A tool that parses a web server log file, counts the hits from each unique IP address, and prints the top 10 most frequent visitors.
Quick Reference
| Attribute | Value |
|---|---|
| Primary Language | C++ |
| Alternative Languages | Go, Python |
| Difficulty | Level 2: Intermediate |
| Time Estimate | 1-2 weeks |
| Knowledge Area | Data Processing / Performance / Hashing |
| Tooling | N/A |
| Prerequisites | Project 1, basic understanding of what a hash map is. |
What You Will Build
A tool that parses a web server log file, counts the hits from each unique IP address, and prints the top 10 most frequent visitors.
Why It Matters
This project builds skills that recur in real-world systems and tooling: parsing text input, counting by key with a hash map, and selecting the top N results.
Core Challenges
- Parsing unstructured log lines → maps to advanced `std::string` manipulation, possibly regex
- Counting occurrences efficiently → maps to choosing `std::unordered_map` for its O(1) average access time
- Finding the “top N” items → maps to sorting a copy of the data or using `std::partial_sort`
- Handling large files → maps to efficient I/O and memory management
Key Concepts
- `std::unordered_map`: Hash-based key-value storage. (cppreference.com)
- `std::partial_sort`: A more efficient way to find the “top N” than a full sort. (cppreference.com)
- Time Complexity (O(1) vs O(log n)): Understanding why `unordered_map` is faster than `map` for this problem.
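One way to apply `std::partial_sort` to the “top N” step is sketched below. It copies the map entries into a vector and orders only the first `n` positions. The function name `top_n`, the `HitCount` alias, and the alphabetical tie-break are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using HitCount = std::pair<std::string, std::size_t>;

// Returns up to n entries ordered by descending hit count.
std::vector<HitCount> top_n(
    const std::unordered_map<std::string, std::size_t>& hits, std::size_t n) {
    // Copy the entries: an unordered_map cannot be sorted in place.
    std::vector<HitCount> entries(hits.begin(), hits.end());
    n = std::min(n, entries.size());
    const auto middle = entries.begin() + static_cast<std::ptrdiff_t>(n);

    // Only [begin, middle) ends up sorted; the rest is left in unspecified order.
    std::partial_sort(entries.begin(), middle, entries.end(),
                      [](const HitCount& a, const HitCount& b) {
                          if (a.second != b.second) return a.second > b.second;
                          return a.first < b.first;  // tie-break for repeatable output
                      });
    entries.erase(middle, entries.end());
    return entries;
}
```

The explicit tie-break matters because `std::unordered_map` iteration order is unspecified, so without it two IPs with equal counts could swap places between runs.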
Real-World Outcome
$ ./log_analyzer access.log
... processing 1,000,000 log entries ...
Top 10 IP Addresses:
1. 172.16.0.10: 54,231 hits
2. 192.168.1.5: 21,098 hits
3. 10.0.0.1: 15,786 hits
...
Implementation Guide
- Reproduce the simplest happy-path scenario.
- Build the smallest working version of the core feature.
- Add input validation and error handling.
- Add instrumentation/logging to confirm behavior.
- Refactor into clean modules with tests.
Milestones
- Milestone 1: Minimal working program that runs end-to-end.
- Milestone 2: Correct outputs for typical inputs.
- Milestone 3: Robust handling of edge cases.
- Milestone 4: Clean structure and documented usage.
Validation Checklist
- Output matches the real-world outcome example
- Handles invalid inputs safely
- Provides clear errors and exit codes
- Repeatable results across runs
References
- Main guide: `LEARN_CPP_STL_DEEP_DIVE.md`
- “The C++ Programming Language” by Bjarne Stroustrup