Project 2: Log File Analyzer

A tool that parses a web server log file, counts the hits from each unique IP address, and prints the top 10 most frequent visitors.

Quick Reference

Attribute               Value
Primary Language        C++
Alternative Languages   Go, Python
Difficulty              Level 2: Intermediate
Time Estimate           1-2 weeks
Knowledge Area          Data Processing / Performance / Hashing
Tooling                 N/A
Prerequisites           Project 1; basic understanding of what a hash map is

What You Will Build

A command-line program, log_analyzer, that reads a web server access log, counts the hits from each unique IP address, and prints the 10 addresses with the most hits.

Why It Matters

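Counting events by key and reporting the most frequent ones is a pattern that recurs in log analysis, monitoring, and other data-processing tools. This project practices it on inputs of around a million lines, where the choice of container and algorithm starts to matter.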

Core Challenges

  • Parsing unstructured log lines → maps to advanced std::string manipulation, possibly regex
  • Counting occurrences efficiently → maps to choosing std::unordered_map for its O(1) average access time
  • Finding the “top N” items → maps to sorting a copy of the data or using std::partial_sort
  • Handling large files → maps to efficient I/O and memory management

Key Concepts

  • std::unordered_map: Hash-based key-value storage. (cppreference.com)
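  • std::partial_sort: Sorts only the first N elements into place, which takes roughly O(n log N) comparisons instead of the O(n log n) of a full sort. (cppreference.com)
  • Time Complexity (O(1) vs O(log n)): std::unordered_map lookups and inserts are O(1) on average (O(n) in the worst case), while std::map is O(log n) per operation. Because this problem never needs the keys in sorted order, unordered_map is usually the faster choice.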
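
To make the std::partial_sort idea concrete, here is one possible sketch of a "top N" step. The name top_n and the HitCount alias are hypothetical, and the tie-break on the IP string is an assumption added so that equal counts come out in the same order on every run.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using HitCount = std::pair<std::string, std::size_t>;

// Hypothetical helper: returns up to n entries with the highest counts,
// ordered from most to least frequent.
std::vector<HitCount> top_n(
        const std::unordered_map<std::string, std::size_t>& hits,
        std::size_t n) {
    // An unordered_map cannot be sorted in place, so copy it to a vector.
    std::vector<HitCount> entries(hits.begin(), hits.end());
    n = std::min(n, entries.size());

    // Only the first n positions end up sorted; the remaining elements are
    // left in an unspecified order.
    std::partial_sort(
        entries.begin(),
        entries.begin() + static_cast<std::ptrdiff_t>(n),
        entries.end(),
        [](const HitCount& a, const HitCount& b) {
            if (a.second != b.second) return a.second > b.second;
            return a.first < b.first;  // assumed tie-break: IP as a string
        });

    entries.resize(n);
    return entries;
}
```

Calling top_n(hits, 10) on the counting map yields the rows for the final report. If fewer than 10 distinct IPs exist, the result is simply shorter.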

Real-World Outcome

$ ./log_analyzer access.log
... processing 1,000,000 log entries ...
Top 10 IP Addresses:
  1. 172.16.0.10: 54,231 hits
  2. 192.168.1.5: 21,098 hits
  3. 10.0.0.1: 15,786 hits
  ...
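
The sample output above shows counts with thousands separators (54,231). One way to produce them without depending on the system locale is a small formatting helper. The name with_commas is hypothetical, and this is only a sketch of one approach.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: formats 54231 as "54,231".
std::string with_commas(std::size_t value) {
    std::string digits = std::to_string(value);
    // Walk from the right and insert a comma before each group of three
    // digits, stopping once three or fewer digits remain on the left.
    for (std::size_t pos = digits.size(); pos > 3; pos -= 3) {
        digits.insert(pos - 3, 1, ',');
    }
    return digits;
}
```

Each report line could then be built from the rank, the IP, and with_commas(count) followed by " hits".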

Implementation Guide

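  1. Start with the simplest happy path: read a small, well-formed log file line by line and print the IP address from each line.
  2. Build the smallest working version of the core feature: count hits per IP in a std::unordered_map and print the top 10.
  3. Add input validation and error handling for missing files, unreadable files, and malformed lines.
  4. Add instrumentation/logging, such as the number of lines processed and skipped, to confirm behavior.
  5. Refactor into clean modules (parsing, counting, reporting) with tests.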

Milestones

  • Milestone 1: Minimal working program that runs end-to-end.
  • Milestone 2: Correct outputs for typical inputs.
  • Milestone 3: Robust handling of edge cases.
  • Milestone 4: Clean structure and documented usage.

Validation Checklist

  • Output matches the real-world outcome example
  • Handles invalid inputs safely
  • Provides clear errors and exit codes
  • Repeatable results across runs, including a consistent order for IPs with equal hit counts
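
For the "clear errors and exit codes" item, a sketch along the following lines may help. The specific exit code values, the message wording, and the function name run are all assumptions of this example, not requirements of the project.

```cpp
#include <fstream>
#include <iostream>

// Assumed exit codes for this sketch only.
constexpr int kExitOk = 0;
constexpr int kExitIoError = 1;
constexpr int kExitUsage = 2;

// Hypothetical entry point, kept separate from main() so tests can call it.
int run(int argc, const char* const argv[]) {
    if (argc != 2) {
        std::cerr << "usage: log_analyzer <logfile>\n";
        return kExitUsage;
    }

    std::ifstream in(argv[1]);
    if (!in) {
        std::cerr << "error: cannot open '" << argv[1] << "'\n";
        return kExitIoError;
    }

    // ... counting and reporting would go here ...
    return kExitOk;
}
```

A main function can then simply forward its arguments with return run(argc, argv);, and a shell check such as running the program on a missing file followed by echo $? confirms the non-zero status.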

References

  • Main guide: LEARN_CPP_STL_DEEP_DIVE.md
  • “The C++ Programming Language” by Bjarne Stroustrup