Project 13: Fuzzy String Matcher (fzf-style)

A fuzzy matcher that finds items matching a pattern with characters in order but not necessarily contiguous. Score matches by quality (gaps, position of matches, word boundaries).

Quick Reference

Attribute Value
Primary Language Go
Alternative Languages Rust, C++, Python
Difficulty Level 3: Advanced
Time Estimate 2 weeks
Knowledge Area Algorithms / String Matching
Tooling Custom Implementation
Prerequisites Dynamic programming, string algorithms

What You Will Build

A fuzzy matcher that finds items matching a pattern with characters in order but not necessarily contiguous. Score matches by quality (gaps, position of matches, word boundaries).

Why It Matters

This project builds core skills that appear repeatedly in real-world systems and tooling.

Core Challenges

  • Subsequence matching → maps to dynamic programming
  • Scoring function design → maps to heuristics for “good” matches
  • Word boundary bonuses → maps to camelCase and snake_case awareness
  • Performance optimization → maps to early termination, SIMD for candidate filtering

Key Concepts

  • Longest Common Subsequence: Base algorithm for fuzzy matching
  • Scoring Heuristics: Position bonuses, gap penalties, boundary bonuses
  • Smith-Waterman: Scoring inspiration from bioinformatics
  • fzf algo.go: The actual implementation to study

Real-World Outcome

$ ./fuzzy_matcher
Input: fzf
Candidates: [fuzzy_finder, foozbaz, fizzbuzz, FuzzyZoneFinder]

Matches (sorted by score):
  1. FuzzyZoneFinder   score=95  (f-z-f at word boundaries!)
  2. fuzzy_finder      score=78  (f-z-f with small gaps)
  3. fizzbuzz          score=45  (f-z at start, z far)
  4. foozbaz           score=32  (f-z far apart, z before a)

Score breakdown for "FuzzyZoneFinder":
  'F' at position 0: +20 (start bonus)
  'z' at position 3: +15 (after capital, word boundary)
  'f' at position 10: +18 (camelCase boundary)
  Gap penalty: -3 (3 chars between z and f)
  Total: 95

Performance:
  10,000 candidates: 2ms
  100,000 candidates: 18ms
  1,000,000 candidates: 150ms

Implementation Guide

  1. Reproduce the simplest happy-path scenario.
  2. Build the smallest working version of the core feature.
  3. Add input validation and error handling.
  4. Add instrumentation/logging to confirm behavior.
  5. Refactor into clean modules with tests.

Milestones

  • Milestone 1: Minimal working program that runs end-to-end.
  • Milestone 2: Correct outputs for typical inputs.
  • Milestone 3: Robust handling of edge cases.
  • Milestone 4: Clean structure and documented usage.

Validation Checklist

  • Output matches the real-world outcome example
  • Handles invalid inputs safely
  • Provides clear errors and exit codes
  • Repeatable results across runs

References

  • Main guide: TEXT_SEARCH_TOOLS_DEEP_DIVE.md
  • fzf source code (algo.go)