Project 3: Concordance Generator

Quick Reference

Attribute | Value
Primary Language | C++
Alternative Languages | Python
Difficulty | Level 2: Intermediate
Time Estimate | Weekend
Knowledge Area | Text Processing / Compound Data Structures
Tooling | N/A
Prerequisites | Project 1

What You Will Build

A tool that reads a text file and generates a concordance: an alphabetical list of every unique word, each followed by the line numbers on which it appears, with no line number listed twice for the same word.

Why It Matters

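Tokenizing text, normalizing words, and indexing them by location are recurring tasks in text-processing tools. This project practices all three, and shows how composing two standard containers (a map whose values are sets) provides alphabetical ordering and duplicate removal without hand-written sorting code.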

Core Challenges

  • Splitting lines into words → maps to string manipulation, stringstream
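  • Normalizing words → maps to using std::transform with a lambda to convert to lowercase, plus the erase-remove idiom (std::remove_if followed by erase) to strip punctuation, since std::transform cannot remove elements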
  • Storing unique, sorted line numbers for each word → maps to leveraging std::set’s automatic sorting and uniqueness
  • Storing words in alphabetical order → maps to leveraging std::map’s automatic key sorting
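
The first two challenges can be sketched as a pair of small helpers. This is a minimal sketch, not a prescribed design: the names normalize_word and split_words are hypothetical, and stripping every punctuation character is an assumption (it turns "don't" into "dont", which may or may not be what you want).

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: lowercase a token and drop its punctuation characters.
std::string normalize_word(std::string word) {
    // std::transform rewrites each character in place; it cannot shrink the string.
    std::transform(word.begin(), word.end(), word.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    // Erase-remove idiom: std::remove_if moves the kept characters to the front,
    // then erase() trims the leftover tail.
    word.erase(std::remove_if(word.begin(), word.end(),
                              [](unsigned char c) { return std::ispunct(c) != 0; }),
               word.end());
    return word;
}

// Hypothetical helper: split a line on whitespace and normalize each token.
std::vector<std::string> split_words(const std::string& line) {
    std::istringstream stream(line);
    std::vector<std::string> words;
    std::string token;
    while (stream >> token) {  // operator>> skips runs of whitespace
        std::string word = normalize_word(token);
        if (!word.empty()) {   // a token such as "--" normalizes to nothing
            words.push_back(std::move(word));
        }
    }
    return words;
}
```

The lambdas take unsigned char because passing a negative char value to std::tolower or std::ispunct is undefined behavior.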

Key Concepts

  • std::set: A container for unique, sorted elements. (cppreference.com)
  • Nested STL Containers: e.g., std::map<std::string, std::set<int>>, which maps each word to the set of line numbers on which it appears.
  • std::transform: Applying an operation to every element in a range. (cppreference.com)
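
To illustrate how the nested container behaves, here is a small sketch. The alias Concordance and the function record are hypothetical names chosen for this example.

```cpp
#include <map>
#include <set>
#include <string>

// Each word maps to the set of line numbers on which it occurs.
using Concordance = std::map<std::string, std::set<int>>;

// Hypothetical helper: note that `word` occurs on `line_number`.
void record(Concordance& index, const std::string& word, int line_number) {
    // operator[] creates an empty set the first time a word is seen;
    // set::insert leaves the set unchanged if the number is already present.
    index[word].insert(line_number);
}
```

Calling record(index, "was", 2), record(index, "it", 1), record(index, "was", 1), and record(index, "was", 2) in that order leaves two keys. Iterating the map visits "it" before "was", and the set for "was" holds 1 and 2 exactly once each, so no explicit sort or duplicate check is needed.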

Real-World Outcome

Sample input:

It was the best of times.
It was the worst of times.
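
For the two sample lines above, an end-to-end pass might look like the following sketch. The function names are hypothetical, line numbers are assumed to start at 1, and the "word: n, n" output layout is an assumption, since the source does not specify a format.

```cpp
#include <algorithm>
#include <cctype>
#include <istream>
#include <map>
#include <ostream>
#include <set>
#include <sstream>
#include <string>

using Concordance = std::map<std::string, std::set<int>>;

// Read the stream line by line and index every normalized word.
Concordance build_concordance(std::istream& in) {
    Concordance index;
    std::string line;
    int line_number = 0;
    while (std::getline(in, line)) {
        ++line_number;  // assumption: the first line is line 1
        std::istringstream words(line);
        std::string word;
        while (words >> word) {
            // Lowercase, then strip punctuation with the erase-remove idiom.
            std::transform(word.begin(), word.end(), word.begin(),
                           [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
            word.erase(std::remove_if(word.begin(), word.end(),
                                      [](unsigned char c) { return std::ispunct(c) != 0; }),
                       word.end());
            if (!word.empty()) {
                index[word].insert(line_number);
            }
        }
    }
    return index;
}

// Print one "word: n, n, ..." line per entry (assumed layout).
void print_concordance(const Concordance& index, std::ostream& out) {
    for (const auto& entry : index) {  // std::map iterates in key order
        out << entry.first << ':';
        const char* separator = " ";
        for (int n : entry.second) {   // std::set iterates in ascending order
            out << separator << n;
            separator = ", ";
        }
        out << '\n';
    }
}
```

Fed the sample input, this sketch prints:

best: 1
it: 1, 2
of: 1, 2
the: 1, 2
times: 1, 2
was: 1, 2
worst: 2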

Implementation Guide

  1. Reproduce the simplest happy-path scenario.
  2. Build the smallest working version of the core feature.
  3. Add input validation and error handling.
  4. Add instrumentation/logging to confirm behavior.
  5. Refactor into clean modules with tests.

Milestones

  • Milestone 1: Minimal working program that runs end-to-end.
  • Milestone 2: Correct outputs for typical inputs.
  • Milestone 3: Robust handling of edge cases.
  • Milestone 4: Clean structure and documented usage.

Validation Checklist

  • Output matches the real-world outcome example
  • Handles invalid inputs safely
  • Provides clear errors and exit codes
  • Repeatable results across runs

References

  • Main guide: LEARN_CPP_STL_DEEP_DIVE.md
  • “Effective STL” by Scott Meyers