Learn Text Search Tools: From Naive Search to Production-Grade Performance

Goal: Deeply understand why modern search tools like ripgrep can be 5-10x faster than traditional grep—not through magic, but through systematic application of algorithm theory, SIMD vectorization, parallel systems programming, and careful I/O optimization. After completing these projects, you’ll internalize the principles that make search fast: finite automata over backtracking, SIMD literal extraction, lock-free parallelism, and adaptive I/O strategies. You’ll be able to build tools that search gigabytes per second, understand performance bottlenecks in any text processing system, and answer “why is this slow?” with concrete evidence.


Introduction: What This Guide Covers

Text search tools are command-line systems that scan bytes in files, apply pattern engines (literals, regex, fuzzy scoring), and return matches with context and metadata fast enough to keep humans in flow and systems in production. You are not just learning how to “find a string”; you are learning how to design a pipeline that pushes the CPU, memory hierarchy, and filesystem to their limits without sacrificing correctness.

What you will build by the end of this guide:

  • A family of string matching engines (naive, Boyer-Moore, Aho-Corasick) you can benchmark and reason about
  • A regex engine (NFA + lazy DFA) and a literal extraction optimizer
  • SIMD-accelerated literal search (memchr/memmem/Teddy-style)
  • A parallel directory walker with ignore rules and binary detection
  • A fuzzy matcher + interactive TUI search tool (fzf-style)
  • A production-grade ripgrep-style tool that integrates everything

Scope (what’s included):

  • Exact and fuzzy pattern matching algorithms
  • Regex engine architecture (NFA/DFA, linear-time guarantees)
  • SIMD vectorization and cache-aware search loops
  • Parallel traversal, file filtering, and I/O strategy selection
  • Line-oriented output, context windows, and UX/CLI ergonomics

Out of scope (for this guide):

  • Full-text indexing engines (Lucene/Elasticsearch), ranking, stemming, BM25
  • Distributed search clusters and inverted index maintenance
  • NLP pipelines or semantic search

The Big Picture (One-Screen Mental Model)

Pattern ---> Parse ---> Optimize ----> Execute ----> Format ----> Output
    |          |            |             |            |            |
    |          |            |             |            |            +--> Context lines, colors, stats
    |          |            |             |            +--> Line/byte offsets, file paths
    |          |            |             +--> Regex engine / SIMD literal scan
    |          |            +--> Literal extraction, prefilters, heuristics
    |          +--> Regex AST, glob/ignore rules
    +--> CLI flags, encodings, case-folding

Key Terms You’ll See Everywhere

  • Haystack: The text or file(s) you’re searching.
  • Needle: The pattern (literal/regex/fuzzy) you’re searching for.
  • Literal: A fixed substring that must appear in every match.
  • Prefilter: A fast check (often SIMD) that narrows candidates.
  • Automaton: A finite-state machine that recognizes patterns.
  • Line-oriented: Matches are reported per line, not per byte stream.

How to Use This Guide

  1. Read the Theory Primer once. It’s a mini-book: concepts first, projects later.
  2. Pick a learning path (Algorithm, Performance, Tool Builder, or Interview) and do projects in order.
  3. Always benchmark. Each project must be validated against a baseline (Project 1) and a real tool (grep/rg).
  4. Optimize only after correctness. Use tests, fuzzing, and diff outputs against existing tools.
  5. Keep a lab notebook. Record inputs, flags, timing, and why something is faster/slower.

Why Text Search Tools Matter

The Modern Problem It Solves (Start Here)

Modern software systems generate massive text streams: logs, traces, codebases, datasets, config files, and test artifacts. When something breaks, the fastest path to understanding is often “search the whole tree for a clue.” A tool that’s 5x faster doesn’t just save seconds—it collapses feedback loops, makes debugging viable under pressure, and changes how you work.

Real-World Impact (Benchmarks You Can Reproduce)

  • Single huge file search: In Andrew Gallant’s ripgrep benchmark suite, searching a 9.3GB file for a literal took ~1.786s with rg vs ~5.119s with GNU grep on the same corpus. (The benchmark uses the full English subtitle corpus; the suite also includes 1GB samples.)
  • Context output is expensive: In the same suite, asking for --context 2 adds noticeable overhead; tools that compute line numbers or context lines pay for it every match.
  • Unicode correctness vs speed tradeoff: The benchmarks show that tools that cut Unicode corners sometimes look fast but fail to find the right matches. Correctness matters.

Why CLI Search Still Wins (Even With IDEs)

  • Works over SSH on remote servers
  • Composes with shell pipelines and automation
  • Searches non-indexed or generated files (logs, data, binaries with text sections)
  • Scales to directories too large for IDE indexing
  • Runs in CI/CD, containers, and ephemeral environments

Context & Evolution (History Belongs Here)

Ken Thompson’s original grep (early 1970s) used Thompson NFA construction to guarantee linear-time regex matching. As codebases and log volumes exploded, tools evolved from single-threaded grep to smarter, parallel, SIMD-accelerated tools like ag and ripgrep. The key shifts: finite automata over backtracking, literal prefilters, and parallel filesystem traversal.

Evolution Timeline:

1973: grep       (single-threaded, NFA)
2005: ack        (code-aware, Perl regex)
2011: ag         (C + mmap + pthreads + .gitignore)
2016: ripgrep    (Rust + SIMD + lock-free parallelism)
2020s: modern    (NVMe-aware, Unicode-correct, SIMD everywhere)

The Performance Gap (What You’ll Learn to Explain)

Naive string search:      O(n×m) worst case
Boyer-Moore:              O(n/m) best case (sublinear!)
Thompson NFA regex:       O(n×m) worst case, guaranteed (never exponential)
DFA regex:                O(n) but state explosion risk
Backtracking regex:       Exponential worst case (catastrophic)
SIMD literal scan:        16-32 bytes per compare (hardware parallelism)

Understanding why these gaps exist is the entire point of this guide.


Prerequisites & Background Knowledge

Essential Prerequisites (Must Have)

Before diving into these projects, you should have:

  1. Solid C or Rust Programming: You’ll be working with low-level details, pointers, and manual memory management
  2. Basic Algorithm Analysis: Understanding Big-O notation, worst/average/best case complexity
  3. String Fundamentals: What a string is in memory (array of bytes), null termination, UTF-8 basics
  4. Command-Line Comfort: Familiarity with grep, basic regex syntax, shell pipes
  5. Debugging Skills: Using gdb/lldb or rust debugger to step through code

Helpful But Not Required

These concepts will be learned through the projects:

  • SIMD programming (SSE2/AVX2 intrinsics)
  • Automata theory (NFA, DFA, finite state machines)
  • Advanced regex engines (Thompson construction, subset construction)
  • Parallel programming (pthreads, work-stealing, lock-free data structures)
  • Systems programming (mmap, read syscalls, page faults)

Self-Assessment Questions

Stop and answer these before starting. If you can’t answer 80%, review prerequisites first:

  • Can you explain what O(n×m) means for string matching?
  • Do you know the difference between a pointer and a value?
  • Can you write a basic string search in C (even if naive)?
  • Do you understand what grep "pattern" file does at a high level?
  • Can you explain what SIMD means conceptually (even if you haven’t used it)?
  • Do you know what a regex is and have you used basic patterns (.*, \w+, [a-z])?
  • Have you used a debugger to step through code line-by-line?
  • Can you read x86 assembly at a basic level (mov, cmp, jmp)?

If you answered “no” to more than 2, spend time with these first:

  • For C/pointers: “C Programming Language” by K&R, chapters 1-6
  • For algorithms: “Algorithms, Fourth Edition” by Sedgewick & Wayne, Ch. 1-3
  • For debugging: “The Art of Debugging with GDB, DDD, and Eclipse”

Development Environment Setup

Required Tools:

  • C Compiler: gcc or clang with optimization flags (-O3, -march=native)
  • Rust (for later projects): rustc 1.70+, cargo
  • Debugger: gdb (Linux), lldb (macOS)
  • Profiler: perf (Linux), Instruments (macOS), or valgrind
  • Text Editor/IDE: Any with syntax highlighting

Recommended Tools:

  • Benchmarking: hyperfine for comparing performance
  • SIMD Reference: Intel Intrinsics Guide (online)
  • Git: For version control and testing on large repos
  • Large test files: Generate with base64 /dev/urandom | head -c 100M > test.txt

System Requirements:

  • x86_64 CPU with SSE2 (almost all modern CPUs)
  • AVX2 for advanced SIMD projects (Intel Haswell+ or AMD Excavator+)
  • 8GB+ RAM for building large projects
  • Linux or macOS (Windows with WSL works too)

Time Investment

Per-project estimates (for someone with prerequisites):

Phase                  Projects  Total Time  Skill Level Required
Phase 1 (Algorithms)   1-3       2-4 weeks   Beginner → Advanced
Phase 2 (Regex)        4-6       4-6 weeks   Advanced → Expert
Phase 3 (SIMD)         7-9       4-6 weeks   Expert → Master
Phase 4 (I/O)          10-12     3-4 weeks   Advanced → Expert
Phase 5 (Fuzzy)        13-14     3-5 weeks   Advanced
Phase 6 (Integration)  15        2-3 months  Master

Total commitment: 4-8 months of dedicated part-time work (10-15 hours/week)

Realistically:

  • Working through all 15 projects sequentially is a major undertaking
  • Most people should pick 3-5 projects that interest them
  • The capstone (Project 15) requires completing at least Projects 1, 4, 7, 10

Important Reality Check

This is HARD. These projects will make you struggle. That’s the point.

You will:

  • Get stuck on subtle bugs for hours
  • Rewrite code multiple times before it works
  • See performance that’s worse than expected initially
  • Feel frustrated when SIMD code crashes mysteriously

This is normal. Professional developers building tools like ripgrep went through the same process. The difference between giving up and succeeding is persisting through the frustration.

Tips for success:

  1. Start with working (even if slow) code, then optimize
  2. Test correctness before optimizing for speed
  3. Use assert() liberally to catch bugs early
  4. Compare your output with existing tools (grep, rg) on the same input
  5. Read the source code of real tools when stuck
  6. Take breaks when frustrated—your subconscious will work on the problem

Big Picture / Mental Model

Text search tools are pipelines. Every speedup comes from improving one stage without breaking the contract of the whole pipeline.

                      +------------------------+
Query & Flags ------> |  Pattern Parsing       |
(regex/literal/fuzzy) |  + Normalization       |
                      +-----------+------------+
                                  |
                                  v
                      +-----------+------------+
                      | Optimization Layer     |
                      | - Literal extraction   |
                      | - Prefilter selection  |
                      | - DFA/NFA compilation  |
                      +-----------+------------+
                                  |
                                  v
+------------------+     +---------+--------+     +------------------+
| Directory Walker | --> | I/O Strategy     | --> | Execution Engine |
| - ignore rules   |     | - read vs mmap   |     | - SIMD scan      |
| - file types     |     | - buffer sizing  |     | - regex engine   |
+------------------+     +---------+--------+     +--------+---------+
                                   |                       |
                                   v                       v
                          +--------+-------+      +--------+---------+
                          | Line & Context |      | Output Formatter |
                          | - line index   |      | - colors, stats  |
                          | - context      |      | - paths, offsets |
                          +--------+-------+      +--------+---------+
                                   \_______________________/
                                               |
                                               v
                                          Final Results

Mental model to keep in your head:

  • Traversal decides what files you even consider.
  • I/O strategy decides how fast bytes enter your process.
  • Execution engine decides how fast you can scan those bytes.
  • Line/context/output decides how much extra work you do after a match.

Appendix A: Tool Landscape & Performance Notes

The following sections are optional context. Skim them now or return after the Theory Primer.

The Landscape: Tool Comparison

Tool                      Year  Language  Regex Engine                       Key Innovation
grep                      1974  C         Hand-rolled DFA/NFA + Boyer-Moore  Original, Unix philosophy
egrep                     1976  C         DFA                                First deterministic regex
GNU grep                  1988  C         Hybrid DFA/NFA + Boyer-Moore       Tuned Boyer-Moore
ack                       2005  Perl      Perl regex                         First “code-aware” grep
The Silver Searcher (ag)  2011  C         PCRE (JIT)                         mmap + pthreads + gitignore
fzf                       2013  Go        Custom fuzzy                       Interactive + worker goroutines
ripgrep (rg)              2016  Rust      regex-automata (DFA)               Teddy SIMD + parallel + Unicode
fd                        2017  Rust      regex                              Parallel traversal + gitignore

Why Performance Differs: The Core Factors

1. Regex Engine Architecture (Biggest Factor)

Backtracking (NFA) Engines: PCRE, Perl, Python, Java, .NET

  • Can express everything (backreferences, lookahead)
  • Catastrophic backtracking: Exponential time on pathological patterns
  • Example: (a+)+$ on aaaaaaaaaaaaaaaaaaaaaaaaaaaa! takes SECONDS

Finite Automata (DFA) Engines: RE2, Rust regex, GNU grep (partial)

  • Linear time guaranteed: O(n) where n = input length
  • Cannot do backreferences, but who uses those in grep?
  • Key insight: “Only ever in one state at a time”

2. Literal Optimization (The Secret Weapon)

Most “regex” searches are actually literal strings or have literal prefixes:

  • TODO → literal
  • function.*async → literal prefix “function”
  • \berror\b → literal “error” + boundary check

The Strategy: Find the literal fast (SIMD), THEN run the regex engine.

3. SIMD Vectorization

Modern CPUs can compare 16-32 bytes simultaneously:

  • SSE2: 128-bit registers (16 bytes at once)
  • AVX2: 256-bit registers (32 bytes at once)

SIMD lets you compare 16-32 bytes per instruction; real throughput is often limited by memory bandwidth.

4. Parallelization

  • ag: pthreads, up to 8 workers
  • ripgrep: Lock-free parallel directory traversal via crossbeam
  • fzf: Worker goroutines for matching

5. I/O Strategy

  • mmap: Map file directly into memory (good for single large files)
  • read + buffer: Traditional buffered I/O (good for many small files)
  • Adaptive: ripgrep chooses automatically

6. Smart Filtering

  • Respect .gitignore (skip node_modules, build artifacts)
  • Skip binary files
  • Skip hidden files by default


Theory Primer (Read This Before Coding)

This section is the mini-book that turns the projects into deliberate practice. If you only read one thing before coding, read this. Each chapter is a mental model you will use repeatedly while building and profiling your tools.

Chapter 1: String Matching Foundations (Naive Search + Complexity)

Fundamentals

String matching is the problem of finding every occurrence of a pattern (the needle) inside a text (the haystack). At the machine level, this is nothing more than comparing bytes. The naive approach simply tries every possible alignment: for each position in the text, compare the next m bytes to the m-byte pattern. This yields a simple and correct algorithm, but it can do an enormous amount of redundant work. The key thing to internalize is that the naive algorithm ignores information. When it compares text[i] and pattern[0], and then later compares text[i+1] and pattern[0], it is re-learning the same facts about the text. Every faster algorithm is a more disciplined way of remembering and reusing this information.

The complexity model used throughout this guide is the standard RAM model with n = len(text) and m = len(pattern). The naive algorithm is worst-case O(n×m), and the worst case happens on inputs that almost match repeatedly, such as pattern aaaaab against text aaaaaaaaaaaaaaaaab. In practice, the naive method can be very fast when m is small and the CPU cache is hot because it is branch-predictable and sequential. This is why standard libraries often use naive or near-naive implementations for very short patterns.

Understanding this baseline gives you a reference point for every optimization you will build. You should be able to estimate, before running code, how many comparisons are done, where the hot loops are, and how cache lines move through the CPU. That is what makes a baseline valuable: it is not just “slow”; it is explainably slow.

Deep Dive into the Concept

At a deeper level, string matching is about information flow. Every time the naive algorithm compares characters, it receives one bit of information: equal or not equal. If the algorithm does not store any structure derived from these bits, it must re-discover them at the next alignment. The naive algorithm uses only two pieces of state: the current position in the text and the current position in the pattern. That is why its worst-case behavior is quadratic. It has no memory of partial matches beyond the current alignment.

Consider the alignment of pattern at position i. If the first k characters match and then a mismatch happens at k+1, the naive algorithm throws away the fact that the prefix pattern[0..k] matched text[i..i+k]. But that information is not useless. It tells you about the structure of the text around i and the internal structure of the pattern. Algorithms like KMP and Aho-Corasick exploit exactly this idea by building a failure function or automaton that keeps track of the longest proper prefix that is also a suffix. Even when you do not implement KMP in this guide, you need to know that it exists, because it explains why linear time is possible.
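
To make "remembering partial matches" concrete, here is a minimal sketch of KMP's failure function (KMP itself is not one of this guide's projects; the sketch only shows the structure that makes linear time possible):

#include <stdio.h>

/* fail[k] = length of the longest proper prefix of pat[0..k] that is
   also a suffix of it -- the "remembered partial match". */
void build_failure(const char *pat, int m, int *fail) {
    fail[0] = 0;
    for (int k = 1, len = 0; k < m; k++) {
        while (len > 0 && pat[k] != pat[len])
            len = fail[len - 1];            /* fall back to a shorter border */
        if (pat[k] == pat[len]) len++;      /* extend the current border */
        fail[k] = len;
    }
}

int main(void) {
    int fail[6];
    build_failure("ababab", 6, fail);
    for (int i = 0; i < 6; i++) printf("%d ", fail[i]); /* 0 0 1 2 3 4 */
    printf("\n");
    return 0;
}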

Another subtlety is overlapping matches. Suppose pattern ana occurs in banana at positions 1 and 3. A naive algorithm with a “jump to end of match” strategy will miss the second match. Correct matching must treat overlaps carefully. This matters later for regex engines and Aho-Corasick, where matches can overlap in complex ways. For correctness, you must define: do you return all matches, first match, or leftmost-longest? Each choice changes the algorithm and its output semantics.

Finally, note that the naive algorithm’s real cost is not just comparison count. It is also branch misprediction and memory bandwidth. In random text, mismatches happen quickly and branches are predictable. In worst-case text, branches are unpredictable and inner loops do much more work, causing pipeline stalls. This is why many optimized algorithms become branchless and rely on SIMD; they trade a few extra operations for a predictable pipeline.

In short: the naive algorithm is a benchmark, a mental model, and a correctness oracle. Every project uses it as a reference, either explicitly or implicitly.

How This Fits in Projects

  • Project 1 uses the naive algorithm directly as the baseline you will measure everything against.
  • Projects 2 and 3 exist because naive search does not reuse information across alignments.
  • Projects 4-6 build on the same idea, but for regex patterns instead of fixed strings.

Definitions & Key Terms

  • Alignment: A proposed position of the pattern over the text.
  • Overlap: When two matches share text positions.
  • Worst case: Inputs that maximize the number of comparisons.
  • Prefix / Suffix: Beginning or ending substring of a string.
  • Failure function: A rule that says how far to shift after a mismatch.

Mental Model Diagram

Text:    a a a a a a a a b
Pattern: a a a a a b

Alignment i=0:  a a a a a b
                ^ ^ ^ ^ ^ x   five matches, then mismatch at the last char

Alignment i=1:    a a a a a b
                  ^ ^ ^ ^ ^ x   same prefix re-checked, mismatch again

Naive: compare almost everything again.
Optimized: reuse prefix/suffix knowledge to skip.

How It Works (Step-by-Step)

  1. For each position i from 0 to n-m, attempt to match.
  2. Compare pattern[j] to text[i+j] for j from 0 to m-1.
  3. If a mismatch occurs, break and advance i by one.
  4. If all m characters match, record a match at i and continue.

Invariant: At each alignment i, the algorithm has verified pattern[0..k] == text[i..i+k] for the current k.

Failure mode: On highly repetitive input, the same prefix is rechecked many times.

Minimal Concrete Example

#include <string.h>

// Naive search: returns index of first match, or -1 if absent
int naive_find(const char *text, const char *pat) {
    int n = strlen(text), m = strlen(pat);
    if (m == 0) return 0;                  // empty pattern matches at 0
    for (int i = 0; i <= n - m; i++) {     // try every alignment
        int j = 0;
        while (j < m && text[i + j] == pat[j]) j++;
        if (j == m) return i;              // all m pattern bytes matched
    }
    return -1;
}
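
The function above stops at the first match. A sketch of the all-matches variant discussed earlier, which advances by one position after every hit so that overlaps are reported:

#include <stdio.h>
#include <string.h>

// Report every match, including overlapping ones.
void naive_find_all(const char *text, const char *pat) {
    int n = strlen(text), m = strlen(pat);
    if (m == 0) return;
    for (int i = 0; i + m <= n; i++)
        if (memcmp(text + i, pat, m) == 0)
            printf("match at %d\n", i);
}

int main(void) {
    naive_find_all("banana", "ana");  /* reports 1 and 3 (overlapping) */
    return 0;
}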

Common Misconceptions

  • “Naive is always slow” → It can be extremely fast for short patterns and random text.
  • “Worst case rarely matters” → In text tools, adversarial inputs (logs, repetitive data) are common.
  • “Overlaps are trivial” → Overlapping matches are easy to miss if you shift too far.

Check-Your-Understanding Questions

  1. Why does aaaaab in aaaaaaaaaab cause many comparisons?
  2. How many alignments are tested for text length n and pattern length m?
  3. What happens if the pattern is empty?
  4. How does naive search handle overlapping matches?

Check-Your-Understanding Answers

  1. Because the prefix matches at almost every position before failing at the last character.
  2. n-m+1 alignments are tested.
  3. Most definitions return 0 (the empty pattern matches at position 0).
  4. It will still find overlaps because it advances by one each time.

Real-World Applications

  • Baseline for every substring search implementation.
  • Simple pattern checking in parsers and config loaders.
  • Fast searches in small buffers (e.g., protocol headers).

Where You’ll Apply It

  • Project 1 (Naive baseline)
  • As correctness oracle in Projects 2-15

References

  • “Algorithms, Fourth Edition” by Sedgewick & Wayne — String searching chapter
  • “Algorithms in C” by Sedgewick — String processing sections
  • “Computer Systems: A Programmer’s Perspective” — performance implications of loops

Key Insight

The naive algorithm is not just a baseline; it is your correctness oracle and performance yardstick.

Summary

You now understand the baseline: a correct, straightforward algorithm with worst-case quadratic behavior. Every other project is an attempt to avoid re-checking the same characters by storing knowledge about the pattern or the text. Always start here when debugging or benchmarking.

Homework / Exercises to Practice the Concept

  1. Implement naive search that returns all matches (including overlaps).
  2. Generate a worst-case input for a given pattern and measure comparisons.
  3. Modify naive search to count character comparisons precisely.

Solutions to the Homework / Exercises

  1. After each match, increment i by 1 instead of m.
  2. For pattern a...ab, use text a...ab repeated with one extra a.
  3. Add a counter increment inside the inner comparison loop.

Chapter 2: Skip-Based Sublinear Search (Boyer-Moore Family)

Fundamentals

Boyer-Moore is the archetypal example of “search faster by skipping.” Instead of scanning left-to-right, it aligns the pattern with the text and compares from right to left. When a mismatch occurs, it uses precomputed tables to skip the pattern forward by more than one position. The two classic heuristics are the bad character rule (how far to shift based on the mismatched character) and the good suffix rule (how far to shift based on matched suffixes). The result is a search algorithm that is sublinear on average: it can skip entire segments of the text without reading them.

The core mental model: if you know a character in the text does not appear in the pattern (or appears only in certain positions), you can move the pattern forward enough that the mismatched character lines up with its last possible occurrence. The right-to-left scan makes this possible because you have already compared the suffix of the pattern at the time of the mismatch.

In real-world tools, Boyer-Moore is often used for literal matching (exact strings) or as a fast path before more expensive checks. GNU grep and many standard libraries use Boyer-Moore or a variant like Boyer-Moore-Horspool when the pattern is long enough to benefit from skipping.

Deep Dive into the Concept

The bad character rule is the simplest: when pattern[j] != text[i+j], shift the pattern so that the rightmost occurrence of text[i+j] in the pattern aligns with text[i+j]. If the character does not appear in the pattern, you can shift the pattern beyond that character entirely. This is precomputed as a table of size equal to the alphabet (typically 256 for bytes). The cost of building this table is O(m + |alphabet|) and pays for itself when the pattern is used many times.

The good suffix rule is more subtle. If you have matched a suffix S of the pattern and then hit a mismatch, you want to shift the pattern so that another occurrence of S aligns with the same text substring. If S does not occur elsewhere in the pattern, you instead align the longest suffix of S that is also a prefix of the pattern. This requires computing border arrays (prefix-suffix overlaps). This rule captures information about the internal structure of the pattern, which the bad character rule cannot.

The magic of Boyer-Moore is that these shifts are safe: they never skip a possible match. The algorithm is correct because every shift preserves the invariant that any skipped alignment could not have matched based on the mismatch evidence you already have. This is a common pattern in algorithm design: use a mismatch as a certificate that a range of alignments is impossible.

In practice, many implementations use the Horspool variant, which drops the good suffix rule and uses only a simplified bad character shift based on the last character of the pattern. This is much simpler to implement and still fast for many workloads, especially when the alphabet is large and the pattern is reasonably long. However, for adversarial or highly repetitive patterns, the full Boyer-Moore algorithm can be dramatically better.

How This Fits in Projects

  • Project 2 is a direct implementation of Boyer-Moore with both heuristics.
  • Project 6 (Literal Extraction) often feeds literals into Boyer-Moore-like scanners.
  • Project 15 (ripgrep clone) uses skip-based ideas in literal prefilters.

Definitions & Key Terms

  • Bad character rule: shift pattern based on mismatched character.
  • Good suffix rule: shift pattern based on matched suffix.
  • Border: a substring that is both a prefix and suffix.
  • Horspool: simplified Boyer-Moore using only one shift rule.

Mental Model Diagram

Text:    a b c x a b c a b c
Pattern:         a b c a b d
                           ^ compare right-to-left: text 'c' vs pattern 'd'

Bad character rule:
  mismatched text char = 'c'
  rightmost 'c' in the pattern is at index 2
  shift the pattern so that index 2 lines up with that text 'c'

How It Works (Step-by-Step)

  1. Precompute bad-character shift table.
  2. Precompute good-suffix shift table.
  3. Align pattern with text at position i.
  4. Compare from right to left until mismatch or match.
  5. If mismatch at j, compute shift = max(bad_char, good_suffix).
  6. Advance i by shift and repeat.

Invariant: Shifts never skip a valid alignment.

Failure mode: If tables are computed incorrectly, the algorithm can miss matches.

Minimal Concrete Example

// Horspool-style bad-character table for bytes: table[c] holds the safe
// shift when text byte c sits under the pattern's last position.
void build_bad_char(const char *pat, int m, int table[256]) {
    for (int i = 0; i < 256; i++) table[i] = m;   // absent bytes: shift past them
    for (int i = 0; i < m - 1; i++)               // exclude last char so shifts stay > 0
        table[(unsigned char)pat[i]] = m - 1 - i;
}
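
Given that table, a minimal Horspool-style scan loop might look like the following sketch (it assumes the build_bad_char above and a non-empty pattern):

// Compare right to left; on mismatch, shift by the table entry for the
// text byte currently under the pattern's last position.
int horspool_find(const char *text, int n, const char *pat, int m) {
    int table[256];
    build_bad_char(pat, m, table);
    int i = 0;
    while (i <= n - m) {
        int j = m - 1;
        while (j >= 0 && text[i + j] == pat[j]) j--;  /* right-to-left compare */
        if (j < 0) return i;                          /* full match */
        i += table[(unsigned char)text[i + m - 1]];   /* provably safe skip */
    }
    return -1;
}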

Common Misconceptions

  • “Boyer-Moore is always faster” → It can be worse for very short patterns.
  • “Good suffix is optional” → Optional for speed, but it matters in repetitive patterns.
  • “Skipping is unsafe” → The shifts are provably safe due to mismatch evidence.

Check-Your-Understanding Questions

  1. Why does scanning from right to left enable bigger skips?
  2. What happens if the mismatched character does not appear in the pattern?
  3. How does the good suffix rule differ from the bad character rule?

Check-Your-Understanding Answers

  1. Because you already matched a suffix, you can use its structure to shift safely.
  2. You can shift the pattern past the mismatched character entirely.
  3. The good suffix rule uses matched suffix structure, not the mismatched character.

Real-World Applications

  • Literal search in grep, ag, and ripgrep fast paths.
  • Fast scanning in parsers and file format detectors.

Where You’ll Apply It

  • Project 2 (Boyer-Moore)
  • Project 6 (literal extraction + prefilter scanning)

References

  • Boyer and Moore, “A fast string searching algorithm” (1977)
  • “Algorithms on Strings, Trees, and Sequences” by Dan Gusfield — Boyer-Moore chapter
  • “Algorithms, Fourth Edition” by Sedgewick — substring search variants

Key Insight

Boyer-Moore is about turning a mismatch into a proof that you can skip ahead.

Summary

Boyer-Moore shows that substring search can be sublinear by exploiting information gathered at mismatches. This idea recurs in literal prefilters, SIMD scanning, and even regex optimization.

Homework / Exercises to Practice the Concept

  1. Implement Horspool and compare it to full Boyer-Moore on short vs long patterns.
  2. Visualize shifts for a pattern with repeated prefixes.
  3. Create a pattern where good suffix dramatically helps.

Solutions to the Homework / Exercises

  1. Horspool wins on simplicity, full BM wins on repetitive patterns.
  2. Track suffix borders and show how good suffix avoids small shifts.
  3. Use pattern like ABABABX against text with repeated ABABAB.

Chapter 3: Multi-Pattern Automata (Aho-Corasick)

Fundamentals

Aho-Corasick solves the problem of matching many patterns at once. Instead of running a separate search for each pattern, it builds a trie of all patterns and augments it with failure links. The resulting automaton processes the text in a single pass, emitting matches whenever a pattern ends at the current position. This is the algorithm behind fgrep and many malware scanners, spam filters, and intrusion detection systems.

The key insight is that a trie already encodes all prefixes of all patterns. When a mismatch occurs, you do not restart from the root; you follow a failure link to the longest proper suffix that is also a prefix. This turns multiple searches into one linear scan. The runtime is O(n + m + z) where n is the text length, m is the total length of all patterns, and z is the number of matches.

Deep Dive into the Concept

The automaton has three parts: the trie transitions, the failure links, and the output function. The trie transitions are straightforward: for each node and character, you have either a next node or you fall back. The failure link from a node points to the longest suffix of that node’s string that is also a prefix in the trie. This is essentially the same idea as KMP’s failure function, generalized to a trie.

When scanning the text, you maintain a current state in the trie. For each input character, you follow a transition if it exists; otherwise you follow failure links until you can transition or you return to the root. Each time you reach a state, you emit all patterns that end at that state or any of its failure-linked suffix states. This is how overlapping and nested matches are handled automatically. For example, if patterns include “he”, “she”, and “hers”, the automaton can emit all relevant matches when scanning “ushers” without any backtracking.

A common implementation detail is the choice between sparse transitions (hash map or sorted vector) vs dense transitions (array of size 256). Dense transitions are fast but memory-heavy; sparse transitions are memory-friendly but slower. Many production implementations use a hybrid: build a sparse trie, then generate a dense transition table for hot states.

The algorithm is also used for virus signatures and network filtering. It scales to thousands or millions of patterns. That is why this project is more than a toy: it is a building block of real systems.

How This Fits in Projects

  • Project 3 is the direct implementation of Aho-Corasick.
  • Project 6 can use Aho-Corasick as a literal prefilter for regex sets.
  • Project 15 can use multi-pattern scanning for multiple literals.

Definitions & Key Terms

  • Trie: Tree that stores strings by shared prefixes.
  • Failure link: Edge to the longest suffix that is also a prefix.
  • Output function: The set of patterns that end at a state.
  • Alphabet: Set of possible characters (often bytes).

Mental Model Diagram

Trie (patterns: he, she, his, hers)
root
 ├─ h ─┬─ e* ─ r ─ s*
 │     └─ i ─ s*
 └─ s ─ h ─ e*

Failure links connect states to longest suffix that is also a prefix.

How It Works (Step-by-Step)

  1. Insert all patterns into a trie; mark terminal nodes.
  2. Build failure links using BFS from the root.
  3. For each node, merge output sets from its failure link.
  4. Scan text one character at a time, following transitions.
  5. Emit all patterns in the current node’s output set.

Invariant: The current node always represents the longest suffix of the processed text that is also a prefix in the trie.

Failure mode: Incorrect failure links cause missed matches or infinite loops.

Minimal Concrete Example

// Pseudocode for scanning
state = root
for c in text:
    while state != root and no edge(state, c):
        state = fail[state]
    if edge(state, c): state = edge(state, c)
    for pat in output[state]: report match
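
For readers who want something compilable, here is a minimal dense-table sketch in C. The BFS both sets failure links and rewrites missing edges as failure transitions, so the scan loop needs no explicit fallback. MAXS and the bitmask output format are arbitrary choices for the sketch:

#include <stdio.h>

#define MAXS 1024       /* total trie states; an arbitrary cap for this sketch */

int go_[MAXS][256];     /* trie edges; after build_links, a full transition table */
int fail_[MAXS];
unsigned out_[MAXS];    /* bitmask of pattern indexes ending at each state */
int nstates = 1;        /* state 0 is the root */

void insert_pattern(const char *pat, int idx) {
    int s = 0;
    for (; *pat; pat++) {
        unsigned char c = (unsigned char)*pat;
        if (!go_[s][c]) go_[s][c] = nstates++;
        s = go_[s][c];
    }
    out_[s] |= 1u << idx;
}

void build_links(void) {
    int queue[MAXS], head = 0, tail = 0;
    for (int c = 0; c < 256; c++)
        if (go_[0][c]) queue[tail++] = go_[0][c];   /* depth-1 states fail to root */
    while (head < tail) {
        int s = queue[head++];
        out_[s] |= out_[fail_[s]];                  /* inherit matches from suffix state */
        for (int c = 0; c < 256; c++) {
            int t = go_[s][c];
            if (t) { fail_[t] = go_[fail_[s]][c]; queue[tail++] = t; }
            else   { go_[s][c] = go_[fail_[s]][c]; }  /* missing edge becomes failure move */
        }
    }
}

void scan(const char *text) {
    int s = 0;
    for (int i = 0; text[i]; i++) {
        s = go_[s][(unsigned char)text[i]];
        if (out_[s]) printf("match ends at index %d (pattern mask %u)\n", i, out_[s]);
    }
}

int main(void) {
    insert_pattern("he", 0); insert_pattern("she", 1); insert_pattern("hers", 2);
    build_links();
    scan("ushers");   /* "he" and "she" end at index 3; "hers" ends at index 5 */
    return 0;
}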

Common Misconceptions

  • “Aho-Corasick is only for dictionaries” → It is for any multi-pattern set.
  • “Failure links are backtracking” → They are deterministic transitions, not backtracking.
  • “Memory usage is unavoidable” → You can compress transitions or use sparse maps.

Check-Your-Understanding Questions

  1. Why is the runtime linear in the text length?
  2. What does a failure link represent in the trie?
  3. How do overlapping patterns get detected?

Check-Your-Understanding Answers

  1. Each input character triggers a bounded number of transitions; the scan never rewinds.
  2. The longest proper suffix that is also a prefix in the trie.
  3. Output sets include patterns from the current state and its suffix states.

Real-World Applications

  • Malware signature scanning
  • Spam keyword detection
  • Multiple keyword search in logs

Where You’ll Apply It

  • Project 3 (Aho-Corasick)
  • Optional prefilter in Project 15

References

  • Aho & Corasick, “Efficient string matching: An aid to bibliographic search” (1975)
  • “Algorithms, Fourth Edition” — multi-pattern searching
  • “Algorithms in C” — trie and automaton sections

Key Insight

Aho-Corasick turns many searches into one automaton scan.

Summary

By building a trie with failure links, you get a deterministic automaton that finds all patterns in one pass. This is the multi-pattern analog of KMP and a core tool for real systems.

Homework / Exercises to Practice the Concept

  1. Implement Aho-Corasick for a small alphabet and compare memory usage of dense vs sparse transitions.
  2. Add reporting of overlapping matches and verify with a test corpus.
  3. Add case-insensitive matching by normalizing input.

Solutions to the Homework / Exercises

  1. Dense transitions use 256 * states; sparse transitions use only existing edges.
  2. Output sets must include suffix matches via failure links.
  3. Normalize both pattern and input to lower case at insertion and scan.

Chapter 4: Regex Engines and Automata (NFA, DFA, and Literal Extraction)

Fundamentals

Regular expressions are patterns over strings. The central design choice in a regex engine is backtracking vs automata. Backtracking engines (PCRE, Perl, many language runtimes) are expressive but can explode exponentially on certain inputs. Automata-based engines (Thompson NFA, RE2, Rust regex) compile the regex into a state machine that guarantees linear-time matching, at the cost of giving up features like backreferences. Modern text search tools prioritize predictable performance over maximal expressiveness, because worst-case performance can become a denial-of-service vector.

Thompson’s construction turns a regex into an NFA by composing small fragments (literal, concatenation, alternation, and Kleene star). An NFA can be simulated in linear time by tracking the set of active states after each character. A DFA is obtained by subset construction: each DFA state represents a set of NFA states. The DFA can be faster at runtime but can also explode in size. Many tools use a hybrid: build an NFA, lazily build DFA states on demand, and fall back to NFA when the DFA grows too large.

Literal extraction is the practical bridge between regex theory and performance. If you can extract a literal substring that must appear in any match, you can prefilter with a fast literal search (often SIMD) and only run the full regex engine on candidate regions. This is why ripgrep is fast even with regex patterns: it does not run the full engine everywhere.

Deep Dive into the Concept

A regex is first parsed into an abstract syntax tree (AST). Operators like concatenation and alternation build tree nodes. Thompson’s construction then maps each node into an NFA fragment with one start and one accept state, with epsilon (empty) transitions connecting them. The resulting NFA has a number of states proportional to the regex length. During matching, the engine maintains a set of active states. For each input character, it computes the epsilon-closure (all states reachable via epsilon transitions) and then the set of transitions on that character.

The NFA simulation can be optimized with bitsets when the number of states is small, or with sparse sets when it is large. The critical invariant is that the set of active states after processing i characters represents all possible positions in the regex after consuming those i characters. There is no backtracking; every possible path is tracked simultaneously.

DFA construction is a power-set transform. In practice, constructing the entire DFA can be impossible for large patterns because the number of states can be exponential. A lazy DFA builds states only as needed for the input. If the DFA grows too big, the engine can fall back to the NFA simulation or use a hybrid strategy. This is what Rust’s regex ecosystem does, giving you near-DFA speed without catastrophic memory blowups.

Literal extraction is conceptually simple but practically subtle. You need to analyze the regex to find substrings that are guaranteed to appear in any match. For example, in foo.*bar, both foo and bar are literals, but only foo is guaranteed to appear before bar. The engine may choose the rarest literal for prefiltering. When a regex contains alternation, the set of required literals can be computed via intersection of literals from each branch. The more precise the extraction, the fewer candidate regions the regex engine must examine.
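
A sketch of the prefilter idea for foo.*bar, with strstr standing in for the SIMD literal scan and a cheap check standing in for the regex engine, which only runs from each candidate position onward:

#include <stdio.h>
#include <string.h>

// Prefiltered search for foo.*bar (sketch; a real engine would verify
// with its compiled automaton rather than a second strstr).
void search_foo_dotstar_bar(const char *hay) {
    for (const char *p = strstr(hay, "foo"); p; p = strstr(p + 1, "foo")) {
        if (strstr(p + 3, "bar"))                 /* verify the rest of the pattern */
            printf("match starts at offset %ld\n", (long)(p - hay));
    }
}

int main(void) {
    search_foo_dotstar_bar("xx foo yy bar zz foo zz");  /* only the first foo matches */
    return 0;
}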

Finally, regex engines must define match semantics: leftmost-longest vs leftmost-first, greedy vs lazy quantifiers, and how to handle Unicode character classes. Text search tools typically use leftmost-first with greedy quantifiers but enforce linear-time behavior by disallowing backreferences and look-around features that would require backtracking.

How This Fits in Projects

  • Project 4 (NFA regex engine) implements Thompson construction.
  • Project 5 (lazy DFA) implements subset construction on demand.
  • Project 6 (literal extraction) builds prefilters used by the engine.
  • Project 15 integrates all three for performance.

Definitions & Key Terms

  • Regex AST: Parse tree representing regex structure.
  • NFA: Nondeterministic finite automaton (multiple possible states).
  • DFA: Deterministic finite automaton (single state at a time).
  • Epsilon transition: Transition that consumes no input.
  • Leftmost-first: Match semantics where the earliest match is preferred.
  • Literal extraction: Finding required substrings in a regex.

Mental Model Diagram

Regex: a(b|c)*d

Thompson NFA (conceptual):
  (a) -> split -> (b) -> loop -> (d)
           \-> (c) -/

At runtime: track all possible states simultaneously (no backtracking).

How It Works (Step-by-Step)

  1. Parse regex into AST.
  2. Convert AST to NFA using Thompson construction.
  3. Optionally compile lazy DFA states from NFA.
  4. Extract required literals for prefiltering.
  5. Scan input: prefilter hits trigger full regex verification.

Invariant: Active NFA states always represent all possible parse positions.

Failure mode: Incorrect literal extraction can drop valid matches.

Minimal Concrete Example

Regex: ab*c
Input: abbbc

NFA states:
S0 -a-> S1
S1 -b-> S1 (loop)
S1 -c-> S2 (accept)

Scan: a -> {S1}, b -> {S1}, b -> {S1}, b -> {S1}, c -> {S2}
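
The same three-state machine, hand-coded as a bitset simulation (a sketch with the transitions hard-coded; a real engine derives them from the regex AST):

#include <stdio.h>

int match_ab_star_c(const char *input) {
    unsigned active = 1u << 0;                      /* start in S0 */
    for (const char *p = input; *p; p++) {
        unsigned next = 0;
        if ((active & (1u << 0)) && *p == 'a') next |= 1u << 1;  /* S0 -a-> S1 */
        if ((active & (1u << 1)) && *p == 'b') next |= 1u << 1;  /* S1 -b-> S1 */
        if ((active & (1u << 1)) && *p == 'c') next |= 1u << 2;  /* S1 -c-> S2 */
        active = next;
        if (!active) return 0;                      /* no live states: fail early */
    }
    return (active & (1u << 2)) != 0;               /* accept if S2 is active */
}

int main(void) {
    printf("%d\n", match_ab_star_c("abbbc"));  /* 1 */
    printf("%d\n", match_ab_star_c("ac"));     /* 1: b* matches zero b's */
    printf("%d\n", match_ab_star_c("abb"));    /* 0: never reaches accept */
    return 0;
}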

Common Misconceptions

  • “DFA is always better” → DFA can explode in size for complex regexes.
  • “Backtracking is required for regex” → Regular expressions can be matched by automata.
  • “Literal extraction is optional” → It is the main performance lever in regex search tools.

Check-Your-Understanding Questions

  1. Why can backtracking be exponential in the worst case?
  2. What does a DFA state represent in terms of the NFA?
  3. Why does literal extraction speed up regex search?

Check-Your-Understanding Answers

  1. Because backtracking explores many overlapping paths for ambiguous regexes.
  2. A DFA state is a set of NFA states.
  3. It filters the text to only candidate regions that must contain the literal.

Real-World Applications

  • Fast regex in ripgrep, grep, RE2
  • Log filtering, security scanning, and code search

Where You’ll Apply It

  • Projects 4-6 and the regex pipeline in Project 15

References

  • Russ Cox, “Regular Expression Matching Can Be Simple And Fast”
  • RE2 documentation (linear-time guarantee)
  • “Compilers: Principles, Techniques, and Tools” (the Dragon Book) — regex to automata

Key Insight

Regex performance is about automata, not backtracking.

Summary

Regex engines can be fast and safe if they use automata. Thompson NFA gives linear-time guarantees; lazy DFA gives speed without state explosion; literal extraction reduces work. These are the core ideas behind production regex tools.

Homework / Exercises to Practice the Concept

  1. Build a Thompson NFA for a regex with alternation and star.
  2. Implement a tiny NFA simulation using sets of states.
  3. Extract literals from a regex with alternation and compare to ground truth.

Solutions to the Homework / Exercises

  1. Draw fragments for literals, connect with split/merge epsilon states.
  2. Maintain a set of active states and update per input character.
  3. The required literal set is the intersection of branch literals.

Chapter 5: SIMD and Vectorized Literal Scanning

Fundamentals

SIMD (Single Instruction, Multiple Data) lets a CPU compare 16 or 32 bytes in a single instruction. This is the foundation for fast literal search. Instead of comparing one byte at a time, you load a vector of bytes and compare them all at once, producing a bitmask of matches. This bitmask is then used to compute candidate positions quickly. Libraries like memchr use SIMD internally and are heavily tuned for modern CPUs.

The key mental model is that SIMD turns the inner loop of string search into data-parallel operations. When combined with branchless code and prefetching, SIMD can approach memory bandwidth limits. This is why literal extraction is so powerful: if your regex contains a literal, you can scan for it at SIMD speed, and only run the slower regex engine at match positions.

Deep Dive into the Concept

SIMD literal scanning usually happens in two stages: fast rejection and verification. In fast rejection, you scan for a rare byte (often the last byte of the pattern) using SIMD. This yields candidate positions. In verification, you compare the full pattern at those positions. The idea is identical to Boyer-Moore, but performed in parallel on many bytes at once.

The memchr family of functions focuses on finding a single byte. memmem extends this to find a substring. For memmem, SIMD is used to find candidate positions of the first and last byte simultaneously, and then confirm the interior bytes. The Teddy algorithm (used in Hyperscan) extends this further to multiple literals at once by building bitmasks for several bytes and combining them with vector operations. This is especially effective when you have many literals (like regex alternations) and want to quickly filter candidates.

SIMD also affects line counting (Project 12). Counting newlines is equivalent to memchr for '\n', but you must also map byte offsets back to line numbers. The fastest approach is to count newlines in chunks (SIMD), compute prefix sums, and build a sparse index so you can translate byte offsets to line numbers quickly.
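
A sketch of that offset-to-line mapping with a scalar counter standing in for the SIMD step (the chunk size and index layout are arbitrary choices):

#include <stddef.h>

#define CHUNK 65536   /* index granularity */

/* cum[c] = number of newlines in data[0 .. end of chunk c) */
void build_line_index(const char *data, size_t n, size_t *cum) {
    size_t total = 0;
    for (size_t c = 0; c * CHUNK < n; c++) {
        size_t end = (c + 1) * CHUNK < n ? (c + 1) * CHUNK : n;
        for (size_t i = c * CHUNK; i < end; i++)   /* SIMD would count this chunk at once */
            if (data[i] == '\n') total++;
        cum[c] = total;
    }
}

/* Translate a byte offset to a 1-based line number: one table lookup,
   then a short scan inside a single chunk. */
size_t line_of_offset(const char *data, size_t off, const size_t *cum) {
    size_t c = off / CHUNK;
    size_t line = c ? cum[c - 1] : 0;              /* newlines before this chunk */
    for (size_t i = c * CHUNK; i < off; i++)
        if (data[i] == '\n') line++;
    return line + 1;
}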

The final performance constraint is memory bandwidth. Even perfect SIMD cannot exceed the speed at which data can be pulled from memory. That is why real-world throughput is often “limited by RAM” rather than CPU. Your goal is to reduce instruction overhead so that the loop is limited by memory bandwidth, not by computation.

How This Fits in Projects

  • Project 7 (memchr) is the SIMD baseline.
  • Project 8 (memmem) generalizes to substring search.
  • Project 9 (Teddy) handles multi-literal SIMD prefilters.
  • Project 12 uses SIMD to count newlines and accelerate line mapping.

Definitions & Key Terms

  • SIMD: Vectorized operations over multiple bytes.
  • Vector register: 128-bit (SSE2) or 256-bit (AVX2) register.
  • Bitmask: Result of SIMD compare; each bit indicates a match.
  • Prefilter: Fast scan that narrows candidate positions.
  • Teddy algorithm: Multi-literal SIMD matcher from Hyperscan.

Mental Model Diagram

Bytes:  [h e l l o ...]
SIMD:   compare 16 bytes at once to target byte 'o'
Mask:   0001000000000000 (bit set where byte == 'o')

How It Works (Step-by-Step)

  1. Load a SIMD vector from the text.
  2. Compare it to the target byte (or bytes) in parallel.
  3. Convert the comparison result to a bitmask.
  4. For each set bit, verify the full pattern at that position.
  5. Advance by vector width and repeat.

Invariant: The SIMD scan never misses a candidate position.

Failure mode: Incorrect alignment or boundary handling can skip matches.

Minimal Concrete Example

// Pseudocode: SIMD find byte using 16-byte vectors
for (i = 0; i + 16 <= n; i += 16) {
    vec = load16(text + i);
    mask = compare_eq(vec, target_byte);
    while (mask) { pos = ctz(mask); report(i + pos); mask &= mask - 1; }
}
// handle the final n % 16 bytes with a scalar loop
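
The same loop with real SSE2 intrinsics, including the scalar tail the pseudocode glosses over (a sketch, not a tuned memchr; __builtin_ctz is a gcc/clang builtin):

#include <emmintrin.h>   /* SSE2 intrinsics */
#include <stdio.h>

void find_byte_sse2(const unsigned char *text, size_t n, unsigned char target) {
    __m128i needle = _mm_set1_epi8((char)target);        /* broadcast target byte */
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)(text + i));
        int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(chunk, needle));
        while (mask) {                                   /* one bit per matching byte */
            int pos = __builtin_ctz(mask);               /* index of lowest set bit */
            printf("match at %zu\n", i + (size_t)pos);
            mask &= mask - 1;                            /* clear that bit */
        }
    }
    for (; i < n; i++)                                   /* scalar tail: final n % 16 bytes */
        if (text[i] == target) printf("match at %zu\n", i);
}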

Common Misconceptions

  • “SIMD always gives 32x speedup” → Real speedup is limited by memory bandwidth.
  • “SIMD is only for experts” → You can use portable intrinsics or libraries.
  • “Alignment doesn’t matter” → Unaligned loads are fine, but boundary handling is critical.

Check-Your-Understanding Questions

  1. Why is SIMD most effective for rare literals?
  2. What does the bitmask represent after a SIMD compare?
  3. Why must you still verify the full pattern after a candidate hit?

Check-Your-Understanding Answers

  1. Rare literals reduce the number of candidate verifications.
  2. Each bit indicates whether a byte position matched the target.
  3. SIMD only checks a subset of bytes; full verification ensures correctness.

Real-World Applications

  • High-speed string scanning in ripgrep and fd
  • Network intrusion detection (Hyperscan)
  • Line counting and log filtering

Where You’ll Apply It

  • Projects 7-9 and Project 12

References

  • memchr crate documentation and source
  • Hyperscan papers (Teddy/FDR literal matchers)
  • “Computer Systems: A Programmer’s Perspective” — memory hierarchy

Key Insight

SIMD turns literal search into a bandwidth problem rather than a CPU problem.

Summary

SIMD lets you check many bytes in parallel and forms the fastest known path for literal scanning. The art is to combine SIMD with smart candidate verification and to stay within memory bandwidth limits.

Homework / Exercises to Practice the Concept

  1. Implement SIMD memchr and compare against strchr.
  2. Extend it to memmem using a two-byte prefilter.
  3. Build a newline counter using SIMD and validate counts.

Solutions to the Homework / Exercises

  1. Use vector compares and bitmasks; verify correctness with randomized tests.
  2. First scan for the last byte, then verify the rest.
  3. Count '\n' mask bits per chunk and sum, with a scalar fallback for tails.

Chapter 6: Parallel Traversal, Filtering, and I/O Strategy

Fundamentals

The fastest pattern engine in the world is useless if you cannot feed it data quickly. In real-world search tools, the bottleneck often shifts from regex execution to directory traversal and I/O. Modern SSDs can handle hundreds of thousands of file operations per second, but a single thread cannot saturate them. Parallel traversal is therefore essential. Tools like ripgrep and fd use work-stealing queues: each worker thread maintains a queue of directories; idle threads steal work from busy ones. This distributes filesystem work evenly and scales with CPU cores.

File filtering is equally important. Ignoring node_modules, .git, build artifacts, and binary files can reduce the search space by orders of magnitude. That is why ignore rules (.gitignore, .ignore, –glob) are not a UX feature but a performance requirement. A fast search tool spends as much effort on deciding what not to read as it spends on actual matching.

Finally, once files are chosen, you must decide how to read them. For large files, mmap can be faster because it avoids copying and leverages page cache directly. For many small files, read with a reusable buffer and batched syscalls is faster because mmap has per-file setup overhead. Production tools adaptively choose the strategy based on file size and count.

Deep Dive into the Concept

Parallel traversal is a scheduling problem. The cost of a directory is not just the number of entries but the depth of subdirectories and the filesystem latency. A static partition (“thread 1 gets /src, thread 2 gets /tests”) fails when one subtree is much heavier. Work-stealing solves this by letting threads dynamically rebalance. Each thread has a deque of work; it pushes new subdirectories to its own deque (LIFO) and steals from the other end (FIFO) of another thread’s deque. This minimizes contention and keeps caches hot.

Ignore rules introduce complexity: you must interpret .gitignore files in the context of directory hierarchy, apply global ignore patterns, and allow overrides. Rules cascade: a child directory inherits the parent’s ignore rules, then adds its own. This requires a stack of ignore matchers as you descend the tree. The correct behavior is surprisingly subtle, but a correct implementation can be reused across tools (as ripgrep does with its ignore crate).

Binary file detection is another filter. Most tools stop searching a file when they encounter a NUL byte or when a file fails UTF-8 validation. This is a policy decision with performance implications: strict UTF-8 validation costs CPU, but prevents garbage output. Many tools implement a heuristic: if a file contains a NUL byte in the first N bytes, treat it as binary and skip.

I/O strategy selection is about amortization. mmap costs a syscall and page table setup but allows zero-copy access. read costs a syscall per read and copies into user buffers, but can be faster for many small files because the overhead of mapping each file is high. Tools often pick a threshold (e.g., 1MB) above which they use mmap. The threshold is hardware-dependent, so production tools sometimes benchmark and cache the result or choose adaptively.
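
A minimal sketch of such a selector, with a hypothetical scan callback and an assumed 1 MiB threshold:

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define MMAP_THRESHOLD (1 << 20)   /* 1 MiB; the right cutoff is hardware-dependent */

/* scan() is whatever search engine consumes the bytes. */
void search_file(const char *path, void (*scan)(const char *, size_t)) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return;
    struct stat st;
    if (fstat(fd, &st) == 0 && st.st_size >= MMAP_THRESHOLD) {
        char *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (data != MAP_FAILED) {
            scan(data, (size_t)st.st_size);        /* zero-copy scan over the mapping */
            munmap(data, st.st_size);
        }
    } else {
        static char buf[1 << 16];                  /* reusable 64 KiB read buffer */
        ssize_t got;
        while ((got = read(fd, buf, sizeof buf)) > 0)
            scan(buf, (size_t)got);                /* caveat: matches can span chunks */
    }
    close(fd);
}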

How This Fits in Projects

  • Project 10 implements parallel traversal with work stealing.
  • Project 11 builds a file-size-based mmap vs read selector.
  • Project 12 depends on efficient line mapping after I/O.
  • Project 15 integrates traversal, filtering, and I/O strategy.

Definitions & Key Terms

  • Work stealing: Idle threads steal tasks from busy threads.
  • Deque: Double-ended queue used for work distribution.
  • Ignore rules: Patterns that exclude files from search.
  • Binary detection: Heuristic to skip non-text files.
  • Page cache: Kernel cache of file data.

Mental Model Diagram

Worker 0 deque: [/src, /tests, /docs]
Worker 1 deque: []
Worker 2 deque: []

Worker 1 steals /docs from Worker 0
Worker 2 steals /tests from Worker 0

All workers proceed in parallel.

How It Works (Step-by-Step)

  1. Start with a root directory in a shared queue.
  2. Each worker pops a directory, reads entries, applies ignore rules.
  3. For each subdirectory, push onto local deque.
  4. If a worker runs out of work, steal from another deque.
  5. For each file, choose mmap or read based on size.

Invariant: Each file is visited once and ignore rules are applied consistently.

Failure mode: Incorrect rule scoping can skip or include wrong files.

Minimal Concrete Example

Root /project
- .gitignore: node_modules/
- src/main.c
- node_modules/lib.js (ignored)

Traversal should enqueue src/, ignore node_modules/.
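
To make the scheduling idea concrete, here is a deliberately simplified parallel walker in C: one shared, mutex-protected stack instead of per-thread deques with stealing, and no ignore rules. It shows dynamic load balancing, not ripgrep's lock-free design:

#include <dirent.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_DIRS 65536   /* no overflow handling; this is a sketch */

static char *stack_[MAX_DIRS];
static int top_ = 0, active_ = 0;       /* active_ = workers inside a directory */
static pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;

static void push_dir(char *path) {
    pthread_mutex_lock(&mu);
    stack_[top_++] = path;
    pthread_cond_signal(&cv);           /* wake one idle worker */
    pthread_mutex_unlock(&mu);
}

static char *take_dir(void) {           /* NULL means traversal is complete */
    pthread_mutex_lock(&mu);
    while (top_ == 0 && active_ > 0)    /* idle, but work may still appear */
        pthread_cond_wait(&cv, &mu);
    char *p = NULL;
    if (top_ > 0) { p = stack_[--top_]; active_++; }
    pthread_mutex_unlock(&mu);
    return p;
}

static void done_dir(void) {
    pthread_mutex_lock(&mu);
    if (--active_ == 0) pthread_cond_broadcast(&cv);   /* let idle workers exit */
    pthread_mutex_unlock(&mu);
}

static void *worker(void *arg) {
    (void)arg;
    char *dir;
    while ((dir = take_dir()) != NULL) {
        DIR *d = opendir(dir);
        if (d) {
            struct dirent *e;
            while ((e = readdir(d)) != NULL) {
                if (e->d_name[0] == '.') continue;     /* hidden files, ".", ".." */
                char *child = malloc(strlen(dir) + strlen(e->d_name) + 2);
                sprintf(child, "%s/%s", dir, e->d_name);
                if (e->d_type == DT_DIR) push_dir(child);  /* DT_UNKNOWN needs stat() */
                else { printf("%s\n", child); free(child); }
            }
            closedir(d);
        }
        free(dir);
        done_dir();
    }
    return NULL;
}

int main(void) {
    push_dir(strdup("."));
    pthread_t t[4];
    for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
    return 0;
}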

Common Misconceptions

  • “Traversal is trivial” → It is a major performance bottleneck in large trees.
  • “Ignore rules are just UX” → They are essential for speed and correctness.
  • “mmap is always faster” → mmap is slower for many small files.

Check-Your-Understanding Questions

  1. Why does work-stealing outperform static partitioning?
  2. How do ignore rules compose across directories?
  3. When is mmap worse than read?

Check-Your-Understanding Answers

  1. Because directory sizes are unpredictable and work stealing balances load dynamically.
  2. Rules inherit from parents and can be overridden by child .gitignore files.
  3. When files are small and the per-file mapping overhead dominates.

Real-World Applications

  • High-performance file search (rg, fd, ag)
  • Backup tools and file indexers

Where You’ll Apply It

  • Projects 10-12 and the integration in Project 15

References

  • “The Linux Programming Interface” — directory traversal and file I/O
  • “Advanced Programming in the UNIX Environment” — filesystem APIs
  • ripgrep and fd source code for work-stealing traversal

Key Insight

In real tools, traversal and I/O are often the real bottleneck, not regex.

Summary

Parallel traversal and adaptive I/O are essential for exploiting modern hardware. The best regex engine in the world is wasted if you cannot feed it bytes quickly or if you spend time scanning files you should have ignored.

Homework / Exercises to Practice the Concept

  1. Implement a single-threaded walker and measure throughput on a large tree.
  2. Add work-stealing and compare speedup vs thread count.
  3. Implement .gitignore parsing for a small subset of patterns.

Solutions to the Homework / Exercises

  1. Use readdir and record total files/sec.
  2. Expect near-linear speedup until I/O saturation.
  3. Support *, ?, and directory suffix / as a minimum.

Chapter 7: Fuzzy Matching and Interactive UX

Fundamentals

Fuzzy matching is not about exact matches; it is about ranking plausible matches. Given a query like “fzf” and a candidate like “FuzzyZoneFinder”, you want to score it highly because the letters appear in order and align with word boundaries. This is a fundamentally different problem from exact matching or regex. The algorithm must balance three goals: correctness (matches are valid subsequences), quality (better matches rank higher), and speed (interactive latency under 10ms).

Most fuzzy matchers are variants of subsequence matching. The simplest check is: can you walk through the candidate and consume each query character in order? For scoring, you add bonuses for matches at word boundaries, consecutive matches, or matches near the start; and penalties for large gaps. This is similar to sequence alignment in bioinformatics (Smith-Waterman), but tuned for interactive performance.

Deep Dive into the Concept

The scoring function is the heart of fuzzy matching. A good scoring function encodes human intuition: “matches at word boundaries are more important”, “consecutive matches feel better”, and “matches near the start are more relevant”. fzf’s algorithm uses a dynamic programming (DP) matrix where each cell stores the best score for matching the first i query characters within the first j candidate characters. You can optimize this by only keeping a sliding window or by precomputing character positions.

Performance is critical. An interactive fuzzy finder must score tens or hundreds of thousands of candidates as the user types. That means you need fast rejection (discard candidates that cannot match), incremental updates (reuse previous computations when the query extends), and parallelism (worker pools). The architecture of fzf uses a reader goroutine that streams candidates, a matcher pool that scores chunks, and a merger that keeps the top-K results. When the query changes, in-flight scoring is canceled and restarted.

Terminal UI adds its own constraints: you must render without flicker, handle raw input mode, and support resizing. The UI loop must be responsive even when matchers are busy. This leads to a classic producer-consumer design: input updates the query state, matchers work asynchronously, and the renderer paints the latest results. This architecture is useful beyond fuzzy search; it is the template for many interactive CLI tools.

How This Fits in Projects

  • Project 13 builds the scoring algorithm and subsequence matcher.
  • Project 14 integrates the matcher into a full interactive TUI.
  • Project 15 may reuse fuzzy matching for file selection or filters.

Definitions & Key Terms

  • Subsequence: Characters appear in order but not necessarily contiguously.
  • Gap penalty: Negative score for characters between matches.
  • Boundary bonus: Positive score for matching after separators or case changes.
  • Top-K: Keeping only the best K matches for display.

Mental Model Diagram

Query:      f z f
Candidate:  F u z z y Z o n e F i n d e r
Match path: ^         ^       ^
Score: bonus for word boundaries, penalty for gaps

How It Works (Step-by-Step)

  1. Verify the query is a subsequence of the candidate.
  2. Use DP to compute the best alignment score.
  3. Apply bonuses: start-of-word, camelCase, consecutive matches.
  4. Apply penalties: long gaps, late matches.
  5. Sort candidates by score, keep top-K.

Invariant: All reported matches are valid subsequences.

Failure mode: Poor scoring makes results feel “wrong” even if correct.
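
To ground steps 1-4, here is a greedy one-pass scoring sketch. Greedy alignment is an approximation of the DP described above (it can pick a worse path than the optimal alignment), and the weights are hypothetical rather than fzf's actual constants:

#include <ctype.h>

// Walk the candidate, consume query chars in order, add a bonus when a match
// lands on a word boundary, subtract a capped gap penalty.
// Returns -1 if the query is not a subsequence at all.
int fuzzy_score(const char *query, const char *cand) {
    int score = 0, gap = 0;
    const char *q = query;
    for (const char *c = cand; *c && *q; c++) {
        if (tolower((unsigned char)*c) == tolower((unsigned char)*q)) {
            int boundary = (c == cand) ||
                           !isalnum((unsigned char)c[-1]) ||   // after _ . / space
                           (islower((unsigned char)c[-1]) &&
                            isupper((unsigned char)*c));       // camelCase edge
            score += 16 + (boundary ? 8 : 0) - gap;            // hypothetical weights
            gap = 0;
            q++;
        } else if (q != query) {
            if (gap < 4) gap++;              // cap the per-match gap penalty
        }
    }
    return *q ? -1 : score;
}

Swapping the greedy walk for the DP matrix is exactly what lets a matcher prefer a boundary match later in the string over an earlier mid-word one.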

Minimal Concrete Example

Query: "rb"
Candidate: "ReadBuffer"
Matches: R (start bonus) + b (camelCase bonus)
Score higher than "crab" due to boundary bonuses.

Common Misconceptions

  • “Fuzzy matching is just substring search” → It is subsequence + scoring.
  • “Higher score always means shorter distance” → Boundary bonuses can outweigh distance.
  • “Interactive performance is optional” → Latency is the product; slow feels broken.

Check-Your-Understanding Questions

  1. What makes a match “valid” in fuzzy matching?
  2. Why do boundary bonuses matter?
  3. How do you keep latency low when scoring many candidates?

Check-Your-Understanding Answers

  1. The query must appear as a subsequence of the candidate.
  2. They align with how humans parse words and identifiers.
  3. Use fast rejection, parallelism, and incremental updates.

Real-World Applications

  • fzf, IDE quick-open, fuzzy file search
  • Command palette search in editors

Where You’ll Apply It

  • Projects 13 and 14

References

  • fzf source code (algo.go)
  • Smith-Waterman style sequence alignment (conceptual influence)

Key Insight

Fuzzy matching is a UX algorithm: speed and ranking quality matter as much as correctness.

Summary

Fuzzy matching trades exactness for usability. The right scoring heuristics and architecture make the tool feel instant and intelligent. This is where algorithms meet user experience.

Homework / Exercises to Practice the Concept

  1. Implement a subsequence matcher and compare results to fzf on small inputs.
  2. Add boundary bonuses and observe ranking changes.
  3. Add cancellation to your matcher pool when queries change.

Solutions to the Homework / Exercises

  1. Ensure all query characters appear in order; test with randomized cases.
  2. Check for underscores, spaces, or camelCase boundaries and add bonus.
  3. Use a shared atomic query version and ignore stale results.
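
Solution 3 in sketch form, using C11 atomics (the function names here are hypothetical):

#include <stdatomic.h>
#include <stdbool.h>

static atomic_uint query_version;            // bumped once per keystroke

// Input thread: called whenever the query string changes.
void on_query_changed(void) {
    atomic_fetch_add(&query_version, 1);     // invalidates all in-flight work
}

// Worker: snapshot the version, do the expensive scoring, and publish the
// result only if no newer query arrived in the meantime.
bool worker_score_chunk(int (*score_fn)(int chunk_id), int chunk_id, int *out) {
    unsigned v = atomic_load(&query_version);
    int result = score_fn(chunk_id);         // scoring happens here
    if (atomic_load(&query_version) != v)
        return false;                        // stale: drop the result
    *out = result;
    return true;
}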

Glossary (High-Signal)

  • Alignment: A candidate position where a pattern is tested against text.
  • Automaton: A finite-state machine that recognizes a set of strings.
  • Backtracking: Regex strategy that explores one path at a time and may be exponential.
  • DFA/NFA: Deterministic/nondeterministic finite automata.
  • Literal extraction: Finding substrings required by a regex to speed up matching.
  • Prefilter: Fast scan to narrow candidate positions before expensive checks.
  • SIMD: Vectorized CPU instructions that operate on many bytes at once.
  • Work stealing: Parallel scheduling strategy where idle threads steal tasks.
  • Page cache: Kernel cache of recently accessed file data.
  • Line-oriented output: Reporting matches with line numbers and context.

Concept Summary Table

This section provides a map of the mental models you will build during these projects.

| Concept Cluster | What You Need to Internalize |
| --- | --- |
| String Matching Foundations | Why naive search is O(n×m), where worst cases come from, and how overlaps work. |
| Skip-Based Sublinear Search | How Boyer-Moore turns mismatches into safe skips and why long patterns can be faster. |
| Multi-Pattern Automata | How Aho-Corasick finds many patterns in one pass using failure links. |
| Regex Engines & Literal Extraction | Automata vs backtracking, NFA/DFA tradeoffs, and how prefilters cut work. |
| SIMD & Vectorized Scanning | How to scan 16-32 bytes at once, build candidate masks, and verify matches. |
| Parallel Traversal & I/O Strategy | Work-stealing traversal, ignore rules, binary detection, and mmap vs read tradeoffs. |
| Fuzzy Matching & Interactive UX | Subsequence scoring, boundary bonuses, and low-latency TUI architecture. |

Project-to-Concept Map

| Project | What It Builds | Primer Chapters It Uses |
| --- | --- | --- |
| Project 1: Naive Search | Baseline matcher + benchmarking | String Matching Foundations |
| Project 2: Boyer-Moore | Skip-based exact search | Skip-Based Sublinear Search |
| Project 3: Aho-Corasick | Multi-pattern automaton | Multi-Pattern Automata |
| Project 4: NFA Regex | Thompson NFA engine | Regex Engines & Literal Extraction |
| Project 5: Lazy DFA | DFA-on-demand regex | Regex Engines & Literal Extraction |
| Project 6: Literal Extraction | Regex prefilters | Regex Engines & Literal Extraction, SIMD & Vectorized Scanning |
| Project 7: SIMD memchr | Byte scanning with SIMD | SIMD & Vectorized Scanning |
| Project 8: SIMD memmem | Substring SIMD | SIMD & Vectorized Scanning |
| Project 9: Teddy Algorithm | Multi-literal SIMD | SIMD & Vectorized Scanning |
| Project 10: Parallel Traversal | Work-stealing walker | Parallel Traversal & I/O Strategy |
| Project 11: mmap vs read | Adaptive I/O | Parallel Traversal & I/O Strategy |
| Project 12: SIMD Line Counting | Line indexing + SIMD | SIMD & Vectorized Scanning, Parallel Traversal & I/O Strategy |
| Project 13: Fuzzy Matcher | Scoring algorithm | Fuzzy Matching & Interactive UX |
| Project 14: fzf Clone | Interactive TUI | Fuzzy Matching & Interactive UX |
| Project 15: ripgrep Clone | Integrated tool | All chapters (1-7) |

Deep Dive Reading by Concept

This section maps each concept to specific book chapters or primary sources. Prefer the books you already own, then use the primary sources for implementation details.

String Matching Foundations

| Concept | Book & Chapter | Why This Matters |
| --- | --- | --- |
| Naive string matching | “Algorithms, Fourth Edition” by Sedgewick & Wayne — Ch. 5: String Algorithms | Establishes the baseline and complexity model. |
| Substring search basics | “Algorithms in C, Parts 1-4” by Sedgewick — String processing sections | Practical C-level perspective and implementation patterns. |
| Memory hierarchy | “Computer Systems: A Programmer’s Perspective” — Ch. 6 | Explains why cache behavior dominates naive loops. |

Skip-Based Sublinear Search (Boyer-Moore)

| Concept | Book & Chapter | Why This Matters |
| --- | --- | --- |
| Boyer-Moore family | “Algorithms, Fourth Edition” — Ch. 5 (substring search variants) | Formal foundations and variant tradeoffs. |
| Good-suffix intuition | “Algorithms in C” — String searching | Helps you reason about borders and safe shifts. |
| Original algorithm | Boyer & Moore (1977) paper | The canonical definition of the heuristics. |

Multi-Pattern Automata (Aho-Corasick)

| Concept | Book & Chapter | Why This Matters |
| --- | --- | --- |
| Tries and automata | “Algorithms, Fourth Edition” — Tries and string search | Core data structure for multi-pattern matching. |
| Failure links | Aho & Corasick (1975) paper | Defines the algorithm and correctness. |
| Practical implementation | Aho-Corasick implementations (BurntSushi aho-corasick crate) | Real-world performance choices. |

Regex Engines & Literal Extraction

| Concept | Book & Chapter | Why This Matters |
| --- | --- | --- |
| Regex to automata | “Compilers: Principles and Practice” — Regex to automata chapters | Bridges regex syntax to NFAs/DFAs. |
| NFA simulation | “Engineering a Compiler” — NFA/DFA sections | Practical DFA vs NFA tradeoffs. |
| Linear-time regex | RE2 documentation + Russ Cox articles | Guarantees and design rationale. |
| Literal extraction | ripgrep/regex internals docs | Performance-critical optimization used in modern tools. |

SIMD & Vectorized Scanning

| Concept | Book & Chapter | Why This Matters |
| --- | --- | --- |
| SIMD basics | “Low-Level Programming” by Zhirkov — SIMD and CPU internals | Explains vector instructions and data layout. |
| Memory hierarchy | “Computer Systems: A Programmer’s Perspective” — Ch. 6 | Shows why bandwidth is the limit. |
| memchr/memmem | memchr crate documentation and source | Real-world SIMD string scanning. |
| Teddy algorithm | Hyperscan papers + burntsushi regex-internals | Multi-literal SIMD prefiltering ideas. |

Parallel Traversal & I/O Strategy

| Concept | Book & Chapter | Why This Matters |
| --- | --- | --- |
| Directory traversal APIs | “The Linux Programming Interface” — directories & links | Correctly enumerating files. |
| Threads and scheduling | “Advanced Programming in the UNIX Environment” — threads | Parallel traversal mechanics. |
| mmap vs read | “Linux System Programming” — file I/O and memory | Explains the I/O tradeoffs you will benchmark. |

Fuzzy Matching & Interactive UX

| Concept | Book & Chapter | Why This Matters |
| --- | --- | --- |
| Dynamic programming | “Algorithms, Fourth Edition” — DP chapters | Basis of scoring alignment. |
| UX-driven algorithms | fzf source code (algo.go) | Real-world scoring heuristics. |
| Terminal behavior | “The Linux Command Line” — terminal basics | Required for raw mode, rendering, and key handling. |

Quick Start Guide (For the Overwhelmed)

Feeling overwhelmed by 15 projects? Start here for your first 48 hours:

Day 1: Build Intuition (3-4 hours)

  1. Read this file’s Goal and “Why This Matters” sections (30 minutes)
  2. Implement Project 1 (Naive Search) (2 hours)
    • Don’t optimize, just make it work
    • Compare performance to grep on a 10MB file
    • Notice how slow it is on worst-case input
  3. Skim Boyer-Moore explanation (Project 2) (30 minutes)
    • Focus on understanding “why backwards scanning helps”
    • Don’t implement yet, just build mental model

Day 2: See the Magic (3-4 hours)

  1. Experiment with ripgrep (1 hour)
    • Install ripgrep: cargo install ripgrep
    • Clone a large repo: git clone https://github.com/torvalds/linux
    • Compare: time grep -r "schedule" linux/ vs time rg "schedule" linux/
    • Observe the massive speed difference
    • Try: rg --debug "pattern" to see literal extraction in action
  2. Read Russ Cox’s “Regular Expression Matching Can Be Simple And Fast” (1.5 hours)
  3. Plan your path (30 minutes)
    • Based on what excites you, pick 3-5 projects from the full list
    • Recommended starter combos:
      • Algorithm focus: Projects 1, 2, 4
      • Performance focus: Projects 1, 7, 10
      • Practical tools: Projects 1, 13, 14

Week 1: First Working Project

Complete Project 1 or 2 fully, including:

  • Implementation
  • Testing on various inputs
  • Benchmarking
  • Comparing to real tools

Don’t move on until you have a working, correct implementation. Speed comes later.


Path A: Algorithm Theorist

For those who love understanding “why” through theory

  1. Project 1 (Naive) → See the O(n×m) baseline
  2. Project 2 (Boyer-Moore) → See O(n/m) sublinear search
  3. Project 3 (Aho-Corasick) → See multi-pattern O(n+m+z)
  4. Project 4 (Thompson NFA) → See guaranteed linear regex
  5. Project 5 (Lazy DFA) → See the DFA optimization

Expected outcome: Deep understanding of algorithm tradeoffs, ability to analyze complexity, knowledge of automata theory.

Path B: Performance Engineer

For those who want to make things FAST

  1. Project 1 (Naive) → Establish baseline
  2. Project 7 (SIMD memchr) → See 32x speedup from SIMD
  3. Project 8 (SIMD memmem) → Apply SIMD to substring search
  4. Project 10 (Parallel Traversal) → Saturate I/O with parallelism
  5. Project 11 (mmap vs read) → Optimize I/O strategy
  6. Project 15 (ripgrep clone) → Integrate everything

Expected outcome: Ability to profile, optimize, and understand performance bottlenecks. SIMD programming skills. Systems-level optimization intuition.

Path C: Tool Builder

For those who want to ship useful tools

  1. Project 1 (Naive) → Understand the basics
  2. Project 2 (Boyer-Moore) → Get basic algorithm optimization
  3. Project 10 (Parallel Traversal) → Handle large directory trees
  4. Project 13 (Fuzzy Matcher) → Build interactive selection
  5. Project 14 (fzf clone) → Ship a complete TUI tool
  6. Optional: Project 15 → Build production grep

Expected outcome: Multiple working tools you can use daily. Understanding of what makes tools “feel fast” to users. TUI programming skills.

Path D: Interview Preparation

For those targeting algorithm-heavy interviews

  1. Project 1 (Naive) → Baseline
  2. Project 2 (Boyer-Moore) → Classic interview algorithm
  3. Project 3 (Aho-Corasick) → Advanced pattern matching (asked at Google, Facebook)
  4. Project 4 (Thompson NFA) → Regex compilation (asked at Amazon, Microsoft)
  5. Project 13 (Fuzzy Matcher) → Dynamic programming + heuristics

Expected outcome: Ability to implement and explain classic string algorithms, confident answers to “how does grep work?” and “how do you implement regex?”, and a solid footing in algorithm interviews.


Success Metrics

By the end of this guide, you should be able to demonstrate the following:

  • Correctness: Your outputs match grep/rg on the same inputs across normal, edge, and pathological cases.
  • Performance reasoning: You can explain why a given project is faster or slower (not just that it is).
  • Benchmark discipline: You use repeatable benchmarks (fixed inputs, warmed cache, controlled flags).
  • Algorithmic fluency: You can explain and implement naive search, Boyer-Moore, Aho-Corasick, Thompson NFA, and lazy DFA from memory.
  • Systems fluency: You can profile and attribute bottlenecks to I/O, CPU, cache, or output formatting.
  • Integration skill: Your final ripgrep-style tool reaches within 2x of rg on at least one real-world corpus.

Phase 1: String Matching Algorithms


## Project 1: Naive String Search (Baseline)

  • File: TEXT_SEARCH_TOOLS_DEEP_DIVE.md
  • Main Programming Language: C
  • Alternative Programming Languages: Rust, Zig, Go
  • Coolness Level: Level 2: Practical but Forgettable
  • Business Potential: 1. The “Resume Gold” (Educational/Personal Brand)
  • Difficulty: Level 1: Beginner
  • Knowledge Area: Algorithms / String Matching
  • Software or Tool: Custom Implementation
  • Main Book: “Introduction to Algorithms” by Cormen, Leiserson, Rivest, Stein (CLRS)

What you’ll build: A brute-force string search that compares every position in the text against the pattern. This is your baseline to understand what we’re optimizing.

Why it teaches search tools: You need to see the naive O(n×m) approach to appreciate why Boyer-Moore and SIMD matter. This is also what many standard library strstr() functions use!

Core challenges you’ll face:

  • Understanding worst case → pattern aaaaab in aaaaaaaaaaaaaaaaaab → maps to why preprocessing matters
  • Character-by-character comparison → maps to CPU cache behavior
  • Early termination → maps to optimization opportunities

Key Concepts:

  • String Matching Basics: “Introduction to Algorithms” Chapter 32.1 - CLRS
  • Big-O Analysis: Understanding O(n×m) worst case vs O(n) average case
  • Pointer Arithmetic: “C Programming Language” Chapter 5 - K&R

Difficulty: Beginner Time estimate: 2-3 hours Prerequisites: Basic C, pointers

Real world outcome:

$ ./naive_search "needle" haystack.txt
Found at position 1234
Found at position 5678
Search completed in 45ms

$ ./naive_search "aaaaaab" pathological.txt
# Watch it struggle on worst-case input
Search completed in 2300ms  # Much slower!

Benchmark:
Pattern "hello" in 1MB file:
  Naive: 12ms
  (We'll compare to Boyer-Moore later)

Implementation Hints:

// Naive search core: try every alignment, compare left to right.
// Assumes text/pattern of lengths n/m are in scope.
for (size_t i = 0; i + m <= n; i++) {        // each candidate position in text
    size_t j = 0;
    while (j < m && text[i + j] == pattern[j])
        j++;                                  // advance until mismatch or full match
    if (j == m)
        printf("Found match at position %zu\n", i);
}

The key insight is that we re-examine the same characters multiple times. If we’ve seen “abcd” and the pattern is “abce”, we know something about what the text contains—we shouldn’t forget it!

Learning milestones:

  1. Implementation works correctly → You understand the basic problem
  2. Can identify worst-case inputs → You understand algorithm analysis
  3. See why this is slow → You’re ready for optimization

The Core Question You’re Answering

“What is the absolute baseline for string matching? How slow can it be, and why?”

Before you can appreciate optimizations like Boyer-Moore or SIMD, you need to see the naive approach—both its simplicity and its inefficiency. This project answers: “What happens when we try every possible position with no cleverness whatsoever?”


Concepts You Must Understand First

Stop and research these before coding:

  1. String representation in memory
    • How is a string stored as an array of bytes?
    • What is a null terminator (in C)?
    • How do you index into a string (pointer arithmetic)?
    • Book Reference: “The C Programming Language” Ch. 5 (Pointers and Arrays) - K&R
  2. Nested loops and iteration
    • What does an outer loop over text positions + inner loop over pattern positions look like?
    • When should you break out of the inner loop?
    • How do you track the current position in both text and pattern?
  3. Big-O notation
    • What does O(n×m) mean in concrete terms?
    • When does worst case occur vs average case?
    • Book Reference: “Introduction to Algorithms” Ch. 3 (Growth of Functions) - CLRS

Questions to Guide Your Design

Before implementing, think through these:

  1. Loop Structure
    • Which loop should be outer: text or pattern? (Answer: text)
    • When do you know a match has succeeded?
    • When do you know a match has failed?
  2. Edge Cases
    • What if pattern is empty? (Everything matches? Nothing matches?)
    • What if pattern is longer than text? (No matches possible)
    • What if text is empty? (No matches unless pattern is also empty)
  3. Performance Characteristics
    • What’s the best case? (First position matches fully)
    • What’s the worst case? (Pattern almost matches everywhere)
    • What’s a pathological input that makes this maximally slow?

Thinking Exercise

Trace Execution by Hand

Before coding, trace this algorithm on paper:

Text:    "abcabcabd"
Pattern: "abcabd"

Position 0: "abcabc" vs "abcabd"
  'a' = 'a' ✓
  'b' = 'b' ✓
  'c' = 'c' ✓
  'a' = 'a' ✓
  'b' = 'b' ✓
  'c' ≠ 'd' ✗  → Mismatch after 6 comparisons

Position 1: "bcabca" vs "abcabd"
  'b' ≠ 'a' ✗  → Mismatch after 1 comparison

Position 2: "cabcab" vs "abcabd"
  'c' ≠ 'a' ✗  → Mismatch after 1 comparison

Position 3: "abcabd" vs "abcabd"
  All match! ✓ → Found at position 3

Questions while tracing:

  • How many total character comparisons did we make?
  • Did we “forget” anything we learned from position 0?
  • Could we have skipped positions 1 and 2? How?

The Interview Questions They’ll Ask

Prepare to answer these:

  1. “Implement strstr() from scratch” (Common at Google, Microsoft, Amazon)
  2. “What’s the time complexity of naive string matching in the worst case?”
  3. “Given pattern ‘AAAA’ and text ‘AAAAAAAAAA’, how many comparisons does naive search make?”
  4. “Can you create a pattern and text where naive search takes quadratic time?”
  5. “Why is naive search still used in some standard libraries?” (Hint: simplicity, cache-friendly for short patterns)

Hints in Layers

Hint 1: Structure Start with two loops. The outer loop tries each starting position in the text. The inner loop compares the pattern character-by-character from that position.

Hint 2: Comparison Logic For each text position i, check if text[i+j] == pattern[j] for all j from 0 to pattern_length-1. If any comparison fails, break and try the next i.

Hint 3: Match Detection If the inner loop completes without breaking (meaning all characters matched), you’ve found a match at position i. Record it.

Hint 4: Edge Cases Handle these explicitly:

  • If pattern_length == 0: return immediately (or return 0, depending on semantics)
  • If text_length < pattern_length: no matches possible
  • Prevent out-of-bounds access: outer loop should run i <= text_length - pattern_length

Books That Will Help

| Topic | Book | Chapter |
| --- | --- | --- |
| Naive string matching algorithm | “Introduction to Algorithms” by CLRS | Ch. 32.1 |
| String representation in C | “The C Programming Language” by K&R | Ch. 5 |
| Algorithm analysis and Big-O | “Introduction to Algorithms” by CLRS | Ch. 3 |
| Worst-case analysis | “Algorithm Design” by Kleinberg & Tardos | Ch. 2 |

Common Pitfalls & Debugging

Problem 1: “Segmentation fault when pattern is longer than text”

  • Why: You’re accessing text[i+j] when i+j >= text_length
  • Fix: Limit outer loop to i <= text_length - pattern_length
  • Quick test: ./naive_search "toolong" "hi" should not crash

Problem 2: “Finding matches that don’t exist”

  • Why: Logic error in match detection (probably not checking all characters)
  • Fix: Ensure inner loop runs for full pattern length, use a flag to track “all matched”
  • Quick test: ./naive_search "abc" "xyz" should find zero matches

Problem 3: “Performance is worse than expected even on simple inputs”

  • Why: Not breaking early on mismatch, or redundant work
  • Fix: Ensure inner loop breaks immediately when text[i+j] != pattern[j]
  • Quick test: Profile with perf stat ./naive_search ...

Problem 4: “Matches reported at wrong positions”

  • Why: Off-by-one error in position tracking
  • Fix: Double-check that you’re reporting i, not i+pattern_length or similar
  • Quick test: ./naive_search "needle" "haystackneedlehaystack" should report position 8

Definition of Done

  • Handles empty pattern, empty text, and pattern longer than text without crashing
  • Reports all matches correctly (including overlaps if configured)
  • Comparison counter produces the expected counts on known test cases
  • Benchmarked against grep on a 10MB file and results recorded
  • Unit tests cover at least 5 edge cases (short text, long pattern, repeats, binary data, Unicode)

## Project 2: Boyer-Moore String Search

  • File: TEXT_SEARCH_TOOLS_DEEP_DIVE.md
  • Main Programming Language: C
  • Alternative Programming Languages: Rust, C++, Zig
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 1. The “Resume Gold” (Educational/Personal Brand)
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Algorithms / String Matching
  • Software or Tool: Custom Implementation
  • Main Book: “Algorithms on Strings, Trees, and Sequences” by Dan Gusfield

What you’ll build: The Boyer-Moore algorithm with both the “bad character” and “good suffix” heuristics. This is what GNU grep uses internally.

Why it teaches search tools: Boyer-Moore is the foundation of fast string search. Its key insight—scanning backwards and skipping forward—is counterintuitive but brilliant. Longer patterns search FASTER because you can skip more.

Core challenges you’ll face:

  • Bad character table construction → maps to preprocessing patterns
  • Good suffix table construction → maps to suffix analysis
  • Rightmost occurrence lookup → maps to how much to skip
  • Understanding why backwards scanning works → maps to algorithm intuition

Key Concepts:

  • Bad Character Rule: “Algorithms on Strings, Trees, and Sequences” Chapter 2.2 - Gusfield
  • Good Suffix Rule: “Algorithms on Strings, Trees, and Sequences” Chapter 2.2 - Gusfield
  • Skip Tables: “The Boyer-Moore Fast String Searching Algorithm” - J. Strother Moore (original paper)
  • Horspool Simplification: Boyer-Moore-Horspool variant that’s simpler to implement

Difficulty: Intermediate-Advanced Time estimate: 1 week Prerequisites: Project 1 (Naive Search), understanding of suffix/prefix concepts

Real world outcome:

$ ./boyer_moore "substantially" novel.txt
Found at position 45231
Found at position 89012
Search completed in 3ms  # vs 45ms naive!

Skip table visualization:
  Pattern: "example"
  Bad character table:
    'e' -> 1 (rightmost position from end)
    'l' -> 2
    'p' -> 3
    'm' -> 4
    'a' -> 5
    'x' -> 6
    (all others) -> 7 (pattern length)

Benchmark comparison:
  Pattern length 5:  Naive 12ms, Boyer-Moore 8ms   (1.5x faster)
  Pattern length 15: Naive 15ms, Boyer-Moore 2ms   (7.5x faster)
  Pattern length 50: Naive 18ms, Boyer-Moore 0.4ms (45x faster!)

Key insight: Longer patterns = faster search!

Implementation Hints: The bad character rule: When a mismatch occurs at text[i] and pattern[j], shift the pattern so that the rightmost occurrence of text[i] in the pattern aligns with text[i]. If text[i] doesn’t occur in the pattern, shift the entire pattern past position i.

The magic is that you compare from RIGHT TO LEFT within the pattern. When you mismatch, you’ve already seen characters to the right—use that information to skip forward!
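
A minimal sketch of the bad-character preprocessing in its simpler Horspool form, where shifts are keyed by the text byte aligned with the pattern's last position:

#include <stddef.h>

// For each byte value: how far can the pattern shift when that byte of the
// text sits under the pattern's last position?
void build_bad_char(const unsigned char *pat, size_t m, size_t shift[256]) {
    for (int c = 0; c < 256; c++)
        shift[c] = m;                        // byte absent: skip the whole pattern
    for (size_t i = 0; i + 1 < m; i++)
        shift[pat[i]] = m - 1 - i;           // rightmost occurrence (excluding last) wins
}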

Learning milestones:

  1. Bad character heuristic works → You understand basic skip logic
  2. Good suffix heuristic works → You understand suffix analysis
  3. Longer patterns search faster → You’ve internalized the counterintuitive insight

Real World Outcome

By the end of this project, you will have a Boyer-Moore CLI that can explain why it skipped ahead. You will be able to print the bad-character and good-suffix tables, and show measurable speedups as pattern length increases.

What you will see:

  1. Skip tables printed for any pattern you provide.
  2. Match positions with timing information.
  3. Benchmark comparison vs naive search on the same text.

Command Line Outcome Example:

$ ./bm_search --pattern "NEGLIGENCE" --text novel.txt --print-tables
Bad-char shift (bytes):
  E -> 3
  G -> 4
  N -> 2
  L -> 6
  I -> 5
  C -> 1
  ...
Good-suffix shift:
  suffix length 1 -> 6
  suffix length 2 -> 6
  suffix length 3 -> 9

Matches:
  45123
  90211
Time: 2.9ms

$ ./bm_search --pattern "NEGLIGENCE" --text novel.txt --baseline
Naive: 39.4ms  (13.6x slower)

The Core Question You’re Answering

“How can a mismatch prove that many future alignments are impossible?”

Boyer-Moore is the first algorithm that turns a single mismatch into a proof that you can skip ahead. This project makes that intuition concrete by forcing you to compute and visualize the skip distances.


Concepts You Must Understand First

  1. Bad Character Rule
    • Why does the rightmost occurrence matter?
    • How does the alphabet size affect the table?
    • Book Reference: “Algorithms, Fourth Edition” — String search section
  2. Good Suffix Rule
    • What is a suffix that is also a prefix (a border)?
    • How do you compute border arrays efficiently?
    • Book Reference: “Algorithms in C” — string searching sections
  3. Right-to-left scanning
    • What information do you gain by scanning backwards?
    • Why does this not miss matches?

Questions to Guide Your Design

  1. Table Construction
    • Do you precompute shifts for the full alphabet (256)?
    • How do you handle patterns longer than 256?
  2. Mismatch Handling
    • How do you combine bad-character and good-suffix shifts?
    • What happens if the good-suffix shift is smaller than 1?
  3. Correctness vs Speed
    • How will you test that your shifts never skip a valid match?
    • Can you construct adversarial tests for each heuristic?

Thinking Exercise

The “Skip Proof” Trace

Trace Boyer-Moore on paper for:

Text:    "ABC ABCDAB ABCDABCDABDE"
Pattern: "ABCDABD"

Questions to answer:

  • Where does the first mismatch happen?
  • What is the bad-character shift?
  • What is the good-suffix shift?
  • Which shift is chosen and why?

The Interview Questions They’ll Ask

  1. “Explain the bad-character and good-suffix rules in Boyer-Moore.”
  2. “Why can Boyer-Moore be sublinear on average?”
  3. “What is a border of a string, and why does it matter?”
  4. “How does Horspool simplify Boyer-Moore?”
  5. “When is Boyer-Moore worse than naive?”

Hints in Layers

Hint 1: Start with bad-character only Build only the bad-character table and ensure it works before adding good-suffix.

Hint 2: Precompute border lengths The good-suffix rule depends on knowing which suffixes also appear as prefixes.

Hint 3: Combine shifts safely Use max(bad_char_shift, good_suffix_shift) to keep correctness.

Hint 4: Verify with a brute-force oracle Generate random texts and compare results to your naive search.
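
Putting the hints together, here is a minimal Horspool-style search loop that uses the bad-character table sketched earlier (good-suffix omitted, as in Hint 1):

#include <stddef.h>
#include <stdio.h>

// Compare right to left at each alignment; on mismatch or match, shift by
// the table entry for the text byte under the pattern's last position.
void horspool_search(const unsigned char *text, size_t n,
                     const unsigned char *pat, size_t m,
                     const size_t shift[256]) {
    if (m == 0 || m > n) return;
    size_t i = 0;
    while (i + m <= n) {
        size_t j = m;
        while (j > 0 && text[i + j - 1] == pat[j - 1])
            j--;                             // right-to-left comparison
        if (j == 0)
            printf("Found match at position %zu\n", i);
        i += shift[text[i + m - 1]];         // always >= 1, so we always progress
    }
}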


Books That Will Help

| Topic | Book | Chapter |
| --- | --- | --- |
| Boyer-Moore fundamentals | “Algorithms, Fourth Edition” by Sedgewick & Wayne | Ch. 5 (Substring search) |
| String borders | “Algorithms in C” by Sedgewick | String searching sections |
| Pattern preprocessing | “Mastering Algorithms with C” by Loudon | String matching chapters |

Common Pitfalls & Debugging

Problem 1: “Missing matches”

  • Why: Incorrect good-suffix table or shift combination
  • Fix: Verify table generation against a known implementation
  • Quick test: Compare results with naive search on random input

Problem 2: “Infinite loop or zero shift”

  • Why: Shift computed as 0 on mismatch
  • Fix: Enforce minimum shift of 1
  • Quick test: Pattern of length 1 should still progress

Problem 3: “Slow performance on short patterns”

  • Why: Preprocessing dominates or shifts are too small
  • Fix: Use naive for patterns below a threshold (e.g., < 8 bytes)

Definition of Done

  • Bad-character and good-suffix tables are correct and tested
  • Output matches naive search on random and adversarial inputs
  • Benchmarks show speedup for patterns >= 8 bytes
  • Horspool variant implemented and compared
  • Tables are printable with a CLI flag for inspection

## Project 3: Aho-Corasick Multi-Pattern Matching

  • File: TEXT_SEARCH_TOOLS_DEEP_DIVE.md
  • Main Programming Language: C
  • Alternative Programming Languages: Rust, C++, Go
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Algorithms / Automata Theory
  • Software or Tool: Custom Implementation
  • Main Book: “Introduction to Algorithms” by CLRS

What you’ll build: An Aho-Corasick automaton that can search for multiple patterns simultaneously in a single pass through the text. This is what fgrep and antivirus scanners use.

Why it teaches search tools: When you have multiple patterns (like 1000 keywords to search for), naive approaches become O(n×m×k) where k is pattern count. Aho-Corasick is O(n + m + z) where z is the number of matches—linear regardless of pattern count!

Core challenges you’ll face:

  • Trie construction → maps to prefix tree data structures
  • Failure link computation → maps to KMP failure function generalization
  • Output function (suffix links) → maps to handling overlapping patterns
  • State machine execution → maps to automata theory

Key Concepts:

  • Trie Data Structure: “Introduction to Algorithms” Chapter 12 - CLRS
  • Failure Links: The key innovation—where to go on mismatch (like KMP’s failure function)
  • Finite State Machines: “Introduction to the Theory of Computation” Chapter 1 - Sipser
  • Original Paper: “Efficient string matching: An aid to bibliographic search” - Aho & Corasick (1975)

Difficulty: Advanced Time estimate: 2 weeks Prerequisites: Project 2, understanding of tries and FSMs

Real world outcome:

$ ./aho_corasick keywords.txt document.txt
Loaded 1000 patterns from keywords.txt
Searching document.txt (5MB)...
Found "error" at position 1234
Found "warning" at position 1235
Found "error" at position 5678
...
Total: 4523 matches
Search completed in 45ms

Automaton visualization:
  States: 4523
  Transitions: 12456
  Failure links: 4522

Benchmark:
  1000 patterns, 5MB file:
    Naive (run each pattern): 12,000ms
    Aho-Corasick: 45ms
    Speedup: 266x!

Implementation Hints: Build a trie from all patterns. Then compute “failure links”—for each state, where do you go when the next character doesn’t match any outgoing edge? The failure link points to the longest proper suffix of the current path that is also a prefix of some pattern.

Key insight: You never backtrack in the input. Each character is examined exactly once. The failure links let you “teleport” to the right state.
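
A minimal sketch of the BFS construction. The node layout is an assumption (dense child arrays, and a single output flag where a real implementation would merge full output sets):

#include <stddef.h>

#define ALPHA 256
typedef struct Node {
    struct Node *child[ALPHA];
    struct Node *fail;
    int output;                  // nonzero if some pattern ends at or through here
} Node;

// queue must be large enough to hold every node in the trie.
void build_failure_links(Node *root, Node **queue) {
    size_t head = 0, tail = 0;
    root->fail = root;                         // root falls back to itself
    for (int c = 0; c < ALPHA; c++)
        if (root->child[c]) {
            root->child[c]->fail = root;       // depth-1 nodes fail to root
            queue[tail++] = root->child[c];
        }
    while (head < tail) {                      // breadth-first over the trie
        Node *u = queue[head++];
        for (int c = 0; c < ALPHA; c++) {
            Node *v = u->child[c];
            if (!v) continue;
            Node *f = u->fail;                 // follow failures until a c-edge exists
            while (f != root && !f->child[c])
                f = f->fail;
            v->fail = f->child[c] ? f->child[c] : root;
            v->output |= v->fail->output;      // inherit outputs along the failure link
            queue[tail++] = v;
        }
    }
}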

Learning milestones:

  1. Trie construction correct → You understand prefix trees
  2. Failure links computed correctly → You understand the KMP connection
  3. Finds all overlapping matches → You understand the output function

Phase 2: Regular Expression Engines

Real World Outcome

You will have a CLI that loads hundreds or thousands of patterns and scans a file in a single pass, reporting every match with its pattern ID. You will be able to print the automaton size and see how it scales with the number of patterns.

What you will see:

  1. Automaton stats (states, transitions, failure links).
  2. All matches reported in one scan.
  3. Massive speedup over running each pattern separately.

Command Line Outcome Example:

$ ./aho_scan --patterns keywords.txt --text logs.txt --stats
Loaded 1500 patterns
States: 18432
Transitions: 72480
Failure links: 18431

Matches:
  [WARN] at 2331 (pattern #12)
  [ERROR] at 8801 (pattern #9)
  [timeout] at 9202 (pattern #103)

Total matches: 482
Time: 18ms

$ ./aho_scan --patterns keywords.txt --text logs.txt --baseline
Naive multi-scan: 3.7s
Speedup: 205x

The Core Question You’re Answering

“How can I find many patterns in one pass without re-scanning the text?”

Aho-Corasick shows that you can convert a set of patterns into a single automaton that finds all matches in linear time, regardless of how many patterns you have.


Concepts You Must Understand First

  1. Trie construction
    • How do you store multiple patterns as shared prefixes?
    • Book Reference: “Algorithms, Fourth Edition” — trie section
  2. Failure links
    • What suffix should you fall back to on mismatch?
    • How does this generalize KMP?
  3. Output function
    • Why do you need to emit patterns from suffix states too?

Questions to Guide Your Design

  1. Data structure layout
    • Dense 256-array transitions or sparse maps?
    • How will you store output sets efficiently?
  2. Construction order
    • Will you build failure links with BFS or DFS?
    • How do you ensure the root’s failure link is correct?
  3. Memory vs speed
    • What is the memory impact of dense transitions?
    • Can you compress transitions for rare edges?

Thinking Exercise

The Overlap Trap

Patterns: he, she, hers, his
Text: ushers

Trace the automaton and list all matches.


The Interview Questions They’ll Ask

  1. “Explain how failure links are computed in Aho-Corasick.”
  2. “Why is the algorithm linear in the input size?”
  3. “How do you handle overlapping patterns?”
  4. “What is the memory complexity for N patterns of total length M?”
  5. “How does Aho-Corasick relate to KMP?”

Hints in Layers

Hint 1: Build the trie first Make sure pattern insertion and terminal node marking are correct.

Hint 2: Use BFS for failure links Start from root’s children and propagate failures level by level.

Hint 3: Merge output sets Each node’s output should include its failure link’s output.

Hint 4: Test with tiny patterns Two or three patterns are enough to validate correctness.


Books That Will Help

| Topic | Book | Chapter |
| --- | --- | --- |
| Tries and automata | “Algorithms, Fourth Edition” by Sedgewick & Wayne | Tries / string search |
| Multi-pattern matching | Aho & Corasick paper (1975) | Original algorithm |
| C implementation details | “Algorithms in C” by Sedgewick | String processing |

Common Pitfalls & Debugging

Problem 1: “Missed matches”

  • Why: Output sets not propagated along failure links
  • Fix: Merge outputs during failure construction
  • Quick test: Patterns he and she in text she should emit both

Problem 2: “Infinite loop on mismatch”

  • Why: Failure link chain not terminating at root
  • Fix: Ensure root’s failure link is root itself

Problem 3: “Memory explosion”

  • Why: Dense transition tables for large alphabets
  • Fix: Use sparse transitions or compress infrequently used nodes

Definition of Done

  • Trie insertion and failure links are correct and tested
  • Overlapping patterns are detected correctly
  • Output is identical to running each pattern separately
  • Automaton stats (states/transitions) are reported
  • Benchmarks show near-linear scaling with pattern count

## Project 4: NFA Regex Engine (Thompson’s Construction)

  • File: TEXT_SEARCH_TOOLS_DEEP_DIVE.md
  • Main Programming Language: C
  • Alternative Programming Languages: Rust, Go, Zig
  • Coolness Level: Level 5: Pure Magic
  • Business Potential: 1. The “Resume Gold” (Educational/Personal Brand)
  • Difficulty: Level 4: Expert
  • Knowledge Area: Compilers / Automata Theory
  • Software or Tool: Custom Implementation
  • Main Book: “Compilers: Principles, Techniques, and Tools” by Aho, Sethi, Ullman (Dragon Book)

What you’ll build: A regex engine using Thompson’s NFA construction. Parse a regex, convert to NFA, and simulate the NFA on input text.

Why it teaches search tools: This is the foundation of RE2, Rust’s regex, and one mode of GNU grep. Understanding NFA simulation is crucial to understanding why some regex engines are fast and others have “catastrophic backtracking.”

Core challenges you’ll face:

  • Regex parsing → maps to recursive descent parsing
  • NFA construction (concatenation, alternation, Kleene star) → maps to automata construction
  • NFA simulation (tracking multiple states) → maps to set operations
  • ε-closure computation → maps to graph traversal

Key Concepts:

  • Thompson’s Construction: “Compilers: Principles, Techniques, and Tools” Chapter 3.7 - Dragon Book
  • NFA Simulation: “Regular Expression Matching Can Be Simple And Fast” - Russ Cox (swtch.com/~rsc/regexp/regexp1.html)
  • ε-transitions: How to handle “empty” transitions efficiently
  • State Set Representation: Bit vectors for O(1) state operations

Difficulty: Expert Time estimate: 2-3 weeks Prerequisites: Projects 1-3, understanding of automata theory basics

Real world outcome:

$ ./nfa_regex "a(b|c)*d" input.txt
Parsing regex: a(b|c)*d
NFA states: 8
NFA transitions: 11

Found match "abcd" at position 100
Found match "abbbcd" at position 200
Found match "acbcbd" at position 300
Search completed in 15ms

NFA visualization (DOT format):
  0 -a-> 1
  1 -ε-> 2
  1 -ε-> 4
  1 -ε-> 7
  2 -b-> 3
  4 -c-> 5
  3 -ε-> 6
  5 -ε-> 6
  6 -ε-> 1
  6 -ε-> 7
  7 -d-> 8 (accepting)

Pathological test:
  Pattern: (a+)+$
  Input: "aaaaaaaaaaaaaaaaaaaaaaaaaaaa!"
  Time: 0.1ms  # Still fast! No backtracking!

Implementation Hints: Each regex operator becomes an NFA fragment:

  • Literal ‘a’: State with single outgoing edge labeled ‘a’
  • Concatenation AB: Connect A’s accepting state to B’s start state with ε
  • Alternation A|B: New start state with ε to both A and B starts
  • Kleene star A*: ε to A, ε from A’s accept back to start, ε to skip A entirely

Simulation: Maintain a SET of current states. For each input character, compute the set of states reachable by following that character, then compute the ε-closure. Accept if final set contains an accepting state.
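
A minimal simulation sketch under simplifying assumptions: states are numbered, each state has at most two ε-successors (which Thompson fragments naturally produce), and the active set is a plain boolean array where a production engine would use bit vectors:

#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAXSTATES 64

typedef struct {
    int eps[MAXSTATES][2];        // ε-edges per state (-1 = none)
    int on_char[MAXSTATES][256];  // byte transitions (-1 = none)
    int nstates, start, accept;
} Nfa;

// Add a state and everything reachable from it by ε-edges (the ε-closure).
static void add_state(const Nfa *n, bool set[], int s) {
    if (s < 0 || set[s]) return;
    set[s] = true;
    add_state(n, set, n->eps[s][0]);
    add_state(n, set, n->eps[s][1]);
}

// Simulate all NFA paths at once: O(len × nstates), no backtracking.
// This checks a whole-input match; a searcher would also seed the start
// state at every input position.
bool nfa_match(const Nfa *n, const unsigned char *input, size_t len) {
    bool cur[MAXSTATES] = {false}, next[MAXSTATES];
    add_state(n, cur, n->start);
    for (size_t i = 0; i < len; i++) {
        memset(next, 0, sizeof next);
        for (int s = 0; s < n->nstates; s++)
            if (cur[s])
                add_state(n, next, n->on_char[s][input[i]]);
        memcpy(cur, next, sizeof cur);
    }
    return cur[n->accept];
}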

Learning milestones:

  1. Parses regex correctly → You understand recursive descent
  2. Constructs NFA correctly → You understand Thompson’s algorithm
  3. No catastrophic backtracking → You understand why NFAs are O(n×m)

Real World Outcome

You will have a working regex engine that accepts a subset of regex syntax (., *, |, grouping) and matches input in guaranteed linear time. You will be able to show that your engine handles pathological regexes instantly while backtracking engines blow up.

Command Line Outcome Example:

$ ./nfa_regex "(a+)+$" "aaaaaaaaaaaaaaaaaaaa!"
Match: false
Time: 0.4ms

$ python - <<'PY'
import re, time
pat = re.compile(r"(a+)+$")
text = "a"*24 + "!"
start = time.time(); pat.search(text); print(time.time()-start)
PY
# Often seconds on some engines

The Core Question You’re Answering

“How can I implement regex matching that is fast and safe against catastrophic backtracking?”

This project proves that regex can be linear-time by compiling to an NFA and simulating all paths at once.


Concepts You Must Understand First

  1. Thompson construction
    • How do you build NFAs from literals, concatenation, alternation, and star?
    • Book Reference: “Compilers: Principles and Practice” — regex to automata
  2. Epsilon closure
    • How do you compute all states reachable without consuming input?
  3. NFA simulation
    • How do you update the active state set per input character?

Questions to Guide Your Design

  1. State representation
    • Will you represent active states as a bitset or vector?
    • How do you size your state IDs?
  2. Regex features
    • Which operators are in scope (., *, +, ?, |)?
    • How will you handle escaping and grouping?
  3. Match semantics
    • Do you return first match or all matches?
    • How do you report match boundaries?

Thinking Exercise

The NFA Walkthrough

Build the Thompson NFA for regex a(b|c)*d on paper. Trace its active states for input abcbcd.


The Interview Questions They’ll Ask

  1. “How does Thompson NFA avoid catastrophic backtracking?”
  2. “What is epsilon closure and why do we need it?”
  3. “What regex features break regularity (and why)?”
  4. “Explain the difference between NFA and DFA matching.”
  5. “Why is NFA simulation linear in the input length?”

Hints in Layers

Hint 1: Start with concatenation only Implement literals and concatenation before you add alternation and star.

Hint 2: Use explicit epsilon edges You will need them to connect fragments and compute closures.

Hint 3: Precompute epsilon closure per state This speeds up the main loop dramatically.

Hint 4: Compare against a known engine Use RE2 or Rust regex for correctness tests.


Books That Will Help

| Topic | Book | Chapter |
| --- | --- | --- |
| Regex to automata | “Compilers: Principles and Practice” | Regex to automata chapters |
| Finite automata | “Compilers: Principles and Practice” | DFA/NFA sections |
| Safe regex design | Russ Cox regex articles | Online |

Common Pitfalls & Debugging

Problem 1: “Matches are missing”

  • Why: Epsilon closure not computed correctly
  • Fix: Ensure you include all reachable epsilon transitions

Problem 2: “Regex matches too much”

  • Why: Accept state reachable without consuming input
  • Fix: Check fragment wiring and epsilon edges

Problem 3: “Slow matching”

  • Why: Active set duplicates or grows unbounded
  • Fix: Use bitsets or deduplicate state sets each step

Definition of Done

  • Supports literals, concatenation, alternation, and Kleene star
  • Correct on a suite of 50+ regex tests
  • Linear-time behavior on pathological inputs
  • Reports match boundaries correctly
  • CLI exposes a debug mode to print NFA states

## Project 5: DFA Regex Engine with Lazy Construction

  • File: TEXT_SEARCH_TOOLS_DEEP_DIVE.md
  • Main Programming Language: C
  • Alternative Programming Languages: Rust, C++, Go
  • Coolness Level: Level 5: Pure Magic
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 4: Expert
  • Knowledge Area: Compilers / Automata Theory
  • Software or Tool: Custom Implementation
  • Main Book: “Introduction to the Theory of Computation” by Michael Sipser

What you’ll build: Convert NFA to DFA using the subset construction, but build states lazily (on-demand) to avoid exponential blowup. Cache states for reuse.

Why it teaches search tools: This is how production regex engines like RE2 and Rust’s regex work. DFA simulation is a simple table lookup per character—O(n) guaranteed. But DFAs can have exponentially many states, so lazy construction is essential.

Core challenges you’ll face:

  • Subset construction algorithm → maps to powerset of NFA states
  • State caching/memoization → maps to hash tables for state sets
  • Cache eviction under memory pressure → maps to LRU caches
  • Handling exponential blowup → maps to falling back to NFA

Key Concepts:

  • Subset Construction: “Introduction to the Theory of Computation” Chapter 1.2 - Sipser
  • Lazy DFA: “Regular Expression Matching: the Virtual Machine Approach” - Russ Cox
  • State Set Hashing: Efficiently comparing and storing state sets
  • Cache Management: When to evict cached DFA states

Difficulty: Expert Time estimate: 2-3 weeks Prerequisites: Project 4 (NFA Engine)

Real world outcome:

$ ./dfa_regex "a(b|c)*d" input.txt
NFA states: 8
DFA cache: lazy construction enabled

Searching...
DFA state 0 (NFA: {0}) -> 'a' -> DFA state 1 (NFA: {1,2,4,6,7})
DFA state 1 -> 'b' -> DFA state 2 (NFA: {1,2,3,4,6,7})
[States constructed on demand]

Found match "abcd" at position 100
Search completed in 5ms (vs 15ms NFA simulation)

Cache statistics:
  DFA states constructed: 4
  DFA states possible (worst case): 256 (2^8)
  Cache hits: 45,231
  Cache misses: 4

Benchmark:
  NFA simulation: 15ms
  Lazy DFA: 5ms
  Full DFA (if we built it all): 3ms but 2MB memory

Implementation Hints: Each DFA state represents a SET of NFA states. When you see character ‘c’ in DFA state S, compute which NFA states are reachable from any state in S via ‘c’, take the ε-closure, and that’s your new DFA state.

The trick: Use a hash table to check if you’ve seen this state set before. If so, reuse the existing DFA state. If not, create a new one and cache it.

For the cache, use LRU eviction—if memory gets tight, throw away the least-recently-used DFA states. They’ll be reconstructed if needed.
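
A minimal interning sketch under two simplifying assumptions: the NFA is small enough for a state set to fit in a 64-bit mask, and lookup is a linear scan where a real engine would use the hash table described above:

#include <stdint.h>
#include <string.h>

#define MAX_DFA 1024                 // hard cap: beyond this, fall back to NFA

typedef struct {
    uint64_t nfa_set[MAX_DFA];       // which NFA states each DFA state stands for
    int next[MAX_DFA][256];          // transitions; -1 = not built yet
    int count;
} LazyDfa;

// Return the DFA state for this NFA set, creating it on first sight.
// Returns -1 when the cap is hit so the caller can fall back to the NFA.
int intern_state(LazyDfa *d, uint64_t set) {
    for (int i = 0; i < d->count; i++)
        if (d->nfa_set[i] == set)
            return i;                        // cache hit: reuse existing state
    if (d->count == MAX_DFA)
        return -1;                           // state budget exhausted
    int id = d->count++;
    d->nfa_set[id] = set;
    memset(d->next[id], -1, sizeof d->next[id]);
    return id;
}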

Learning milestones:

  1. Subset construction produces correct DFA → You understand the powerset
  2. Lazy construction avoids memory blowup → You understand caching
  3. Performance matches or beats NFA → You understand the tradeoff

Real World Outcome

You will have a regex engine that can generate DFA states on demand and still remain safe from state explosion. You’ll be able to demonstrate a performance jump for repeated searches, while keeping memory usage bounded.

Command Line Outcome Example:

$ ./lazy_dfa "foo.*bar" logs.txt --stats
NFA states: 42
DFA states generated: 68
Matches: 120
Time: 5.4ms

$ ./lazy_dfa "(ab|ac|ad|ae)*z" logs.txt --stats
NFA states: 96
DFA states generated: 512 (capped)
Fallback: NFA simulation for remaining states
Time: 12.7ms

The Core Question You’re Answering

“How can I get DFA speed without paying DFA memory costs up front?”

Lazy DFA construction is the compromise between raw performance and exponential state growth.


Concepts You Must Understand First

  1. Subset construction
    • How does a DFA state represent a set of NFA states?
    • Book Reference: “Compilers: Principles and Practice” — NFA to DFA
  2. State explosion
    • Why can the number of DFA states be exponential?
  3. Hybrid engines
    • When do you fall back to NFA simulation?

Questions to Guide Your Design

  1. State caching
    • Will you hash DFA states by their NFA set bitmask?
  2. Size limits
    • What is your maximum DFA state budget?
    • How will you handle overflow (fallback or eviction)?
  3. Unicode handling
    • Will you operate on bytes or Unicode scalar values?

Thinking Exercise

The Explosion Case

Find a regex that produces many DFA states (e.g., nested alternations). Estimate how many subsets are possible.


The Interview Questions They’ll Ask

  1. “Why does DFA construction cause state explosion?”
  2. “How does a lazy DFA avoid that explosion?”
  3. “When would you prefer NFA over DFA?”
  4. “How do you cache DFA states efficiently?”

Hints in Layers

Hint 1: Represent DFA states as bitsets A DFA state is a set of NFA states; bitsets are compact and fast.

Hint 2: Use a hashmap for state interning Map bitset -> DFA state ID to avoid duplicates.

Hint 3: Add a hard cap If state count exceeds your cap, revert to NFA for the rest.


Books That Will Help

| Topic | Book | Chapter |
| --- | --- | --- |
| DFA construction | “Compilers: Principles and Practice” | NFA to DFA sections |
| Automata theory | “Engineering a Compiler” | Automata chapters |
| Regex performance | RE2 documentation | Online |

Common Pitfalls & Debugging

Problem 1: “Incorrect matches”

  • Why: DFA state sets computed incorrectly
  • Fix: Compare against NFA simulation for the same regex

Problem 2: “Memory blowup”

  • Why: Unlimited DFA state creation
  • Fix: Add a state cap and fallback

Problem 3: “Unicode bugs”

  • Why: Mixing bytes and Unicode code points
  • Fix: Decide on byte-level or Unicode matching and test thoroughly

Definition of Done

  • Lazy DFA generates states on demand with caching
  • Correctness matches NFA simulation on test suite
  • DFA state cap prevents memory blowups
  • Benchmarks show speedup on repeated searches
  • Debug output lists DFA state count and fallback usage

## Project 6: Literal Extraction Optimizer

  • File: TEXT_SEARCH_TOOLS_DEEP_DIVE.md
  • Main Programming Language: Rust (or C)
  • Alternative Programming Languages: C++, Go
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Regex Optimization
  • Software or Tool: Custom Implementation
  • Main Book: “Regular Expression Matching” by Russ Cox (online articles)

What you’ll build: Analyze a regex to extract literal strings that MUST appear in any match. Use these literals to quickly skip non-matching regions before running the full regex.

Why it teaches search tools: This is ripgrep’s secret weapon. Most regexes have literal prefixes, suffixes, or required substrings. Find those with memchr/SIMD, THEN run the regex engine. This is why ripgrep searching for function.*async is almost as fast as searching for function.

Core challenges you’ll face:

  • Prefix extraction → maps to “what literal MUST the match start with?”
  • Suffix extraction → maps to “what literal MUST the match end with?”
  • Required substring extraction → maps to “what literal MUST appear somewhere?”
  • Handling alternation → (foo|bar).*baz has no common prefix, but “baz” is required

Key Concepts:

  • Literal Extraction: regex-syntax crate documentation (Rust)
  • Prefix/Suffix Analysis: Walk the regex AST looking for required literals
  • Prefiltering: Run memchr on literals, verify with full regex

Difficulty: Advanced Time estimate: 1-2 weeks Prerequisites: Projects 4-5, understanding of regex ASTs

Real world outcome:

$ ./literal_optimizer "function\s+\w+\s*\(.*async" input.js
Regex analysis:
  Prefix literals: ["function"]
  Suffix literals: ["async"]
  Required substrings: ["function", "async"]

Strategy:
  1. memchr for "function"
  2. If found, check if "async" exists after it
  3. If both found, run full regex on candidate region

Benchmark (10MB JavaScript file):
  Full regex scan: 150ms
  Literal-optimized: 8ms
  Speedup: 18.75x

Analysis of common patterns:
  "TODO": 100% literal, use memchr directly
  "error.*line \d+": prefix "error", can skip most text
  "[a-z]+@[a-z]+\.[a-z]+": no useful literals (email regex)
  "\berror\b": literal "error", boundary check at matches

Implementation Hints: Walk the regex syntax tree:

  • Literal nodes: Accumulate into current prefix/suffix
  • Concatenation: Prefix of A, suffix of B
  • Alternation: Common prefix/suffix of all branches (may be empty)
  • Kleene star: Can be skipped, so it “resets” the required substring

The extracted literals are passed to memchr (for single byte), memmem (for substring), or Aho-Corasick (for multiple literals).
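
A minimal sketch of the resulting two-phase search. memmem is a GNU/BSD extension (hence _GNU_SOURCE), and verify stands in for the regex engine you built in Projects 4 and 5:

#define _GNU_SOURCE                  // for memmem on glibc
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Scan for a required literal, then run the expensive verifier only on the
// enclosing line of each candidate. Returns the number of verified matches.
size_t prefilter_count(const char *hay, size_t n,
                       const char *lit, size_t m,
                       bool (*verify)(const char *line, size_t len)) {
    size_t matches = 0;
    const char *p = hay, *end = hay + n;
    while (m > 0 && p < end &&
           (p = memmem(p, (size_t)(end - p), lit, m)) != NULL) {
        const char *ls = p;                       // expand to start of line
        while (ls > hay && ls[-1] != '\n') ls--;
        const char *le = memchr(p, '\n', (size_t)(end - p));
        if (!le) le = end;                        // last line has no newline
        if (verify(ls, (size_t)(le - ls)))
            matches++;                            // candidate confirmed by the regex
        p = le + (le < end);                      // resume after this line
    }
    return matches;
}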

Learning milestones:

  1. Correctly extracts prefix from simple regexes → You understand the concept
  2. Handles alternation and quantifiers → You understand the complexity
  3. 10x+ speedup on real regexes → You’ve internalized the optimization

Phase 3: SIMD and Low-Level Optimization

Real World Outcome

You will have a tool that analyzes a regex and prints the literals it can use as prefilters, then shows the speed difference when those literals are used to prune candidates.

Command Line Outcome Example:

$ ./literal_extract "error.*timeout|panic:.*"
Required literals: [error, timeout, panic:]
Prefilter strategy: choose rarest literal by frequency

$ ./prefilter_search --regex "error.*timeout" logs.txt
Prefilter hits: 432
Regex verifications: 432
Matches: 31
Time: 6.1ms

$ ./prefilter_search --regex "error.*timeout" logs.txt --no-prefilter
Regex verifications: 2,100,314
Matches: 31
Time: 220ms

The Core Question You’re Answering

“How can I avoid running an expensive regex on most of the text?”

Literal extraction transforms regex search into a two-phase pipeline: fast filtering followed by precise verification.


Concepts You Must Understand First

  1. Regex AST
    • How do you parse and traverse regex structure?
  2. Required vs optional literals
    • Which substrings must appear in any match?
  3. Prefilter strategies
    • How do you choose a literal (rarest, longest, earliest)?

Questions to Guide Your Design

  1. Literal extraction rules
    • How do you handle alternation?
    • What about .* gaps or character classes?
  2. Prefilter choice
    • Will you choose the longest literal or the rarest one?
    • How will you estimate rarity (frequency table)?
  3. Verification
    • How do you pass candidate offsets to the regex engine?

Thinking Exercise

Regex Dissection

For regex foo(bar|baz).*qux:

  • What literals are required in all matches?
  • What literals are optional?

The Interview Questions They’ll Ask

  1. “What is literal extraction and why does it help?”
  2. “How do you choose a prefilter literal?”
  3. “How do alternations affect literal extraction?”
  4. “Can literal extraction ever change correctness?”

Hints in Layers

Hint 1: Build a regex AST first Literal extraction is a tree traversal problem.

Hint 2: Use set operations For alternation, required literals are the intersection of branch literals.

Hint 3: Start with simple literals Handle concatenation and .* before complex classes.


Books That Will Help

| Topic | Book | Chapter |
| --- | --- | --- |
| Regex parsing | “Compilers: Principles and Practice” | Regex to automata chapters |
| Performance design | “Clean Architecture” by Robert C. Martin | Design tradeoffs (optional) |
| Regex internals | burntsushi regex internals | Online |

Common Pitfalls & Debugging

Problem 1: “False negatives”

  • Why: Treated optional literals as required
  • Fix: Only require literals that appear in every match

Problem 2: “No speedup”

  • Why: Choosing a common literal (e.g., ‘e’)
  • Fix: Select the rarest literal using frequency estimation

Problem 3: “Regex verification incorrect”

  • Why: Candidate offset boundaries miscomputed
  • Fix: Always verify full regex on candidate region

Definition of Done

  • Extracts required literals from concatenation and alternation
  • Chooses a prefilter literal based on frequency or length
  • Prefilter reduces regex verifications by at least 10x on a test corpus
  • No false negatives when prefiltering
  • Debug output prints extracted literals and chosen prefilter

## Project 7: SIMD memchr Implementation

  • File: TEXT_SEARCH_TOOLS_DEEP_DIVE.md
  • Main Programming Language: C (with intrinsics)
  • Alternative Programming Languages: Rust, Assembly
  • Coolness Level: Level 5: Pure Magic
  • Business Potential: 4. The “Open Core” Infrastructure
  • Difficulty: Level 4: Expert
  • Knowledge Area: SIMD / Low-Level Optimization
  • Software or Tool: SSE2/AVX2 Intrinsics
  • Main Book: “Intel Intrinsics Guide” (online reference)

What you’ll build: A SIMD-accelerated memchr that finds a byte in a buffer at 15+ bytes per CPU cycle using SSE2/AVX2.

Why it teaches search tools: memchr is the foundation of all fast search. If you can find a byte at 15 bytes/cycle, you can prefilter gigabytes of text per second, only running the expensive regex engine on the tiny fraction that might match.

Core challenges you’ll face:

  • SIMD vector loading (_mm_loadu_si128) → maps to loading 16 bytes at once
  • Parallel comparison (_mm_cmpeq_epi8) → maps to comparing 16 bytes simultaneously
  • Mask extraction (_mm_movemask_epi8) → maps to converting results to bitmask
  • Bit scanning (__builtin_ctz) → maps to finding first match in mask
  • Alignment handling → maps to memory alignment requirements

Key Concepts:

  • SSE2 Intrinsics: Intel Intrinsics Guide (intel.com)
  • Vector Comparison: Compare 16/32 bytes in a single instruction
  • Movemask: Convert comparison result to 16/32-bit integer
  • Bit Manipulation: Find first set bit in mask

Difficulty: Expert. Time estimate: 1 week. Prerequisites: Understanding of pointers, basic assembly concepts.

Real world outcome:

$ ./simd_memchr '\n' large_file.txt
Searching for newlines in 1GB file...

Implementation comparison:
  Naive (byte-by-byte): 1200ms (0.83 GB/s)
  Standard memchr:       150ms (6.67 GB/s)
  SSE2 memchr:            80ms (12.5 GB/s)
  AVX2 memchr:            45ms (22.2 GB/s)

Throughput: 22.2 GB/s = ~9 bytes per cycle on 2.5GHz CPU

Disassembly of hot loop:
  vmovdqu  ymm0, [rdi]        ; Load 32 bytes
  vpcmpeqb ymm1, ymm0, ymm2   ; Compare with search byte
  vpmovmskb eax, ymm1         ; Extract comparison mask
  test     eax, eax           ; Any matches?
  jnz      found              ; If yes, handle match
  add      rdi, 32            ; Advance pointer
  jmp      loop               ; Continue

Implementation Hints:

// SSE2 memchr core loop, wrapped so it compiles as-is
#include <emmintrin.h>   // SSE2 intrinsics
#include <stddef.h>

const char *simd_memchr_sse2(const char *ptr, const char *end, char c) {
    __m128i needle = _mm_set1_epi8(c);  // Broadcast search byte to all 16 lanes
    while (ptr + 16 <= end) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)ptr);  // Load 16 bytes
        __m128i cmp = _mm_cmpeq_epi8(chunk, needle);            // Compare all 16
        int mask = _mm_movemask_epi8(cmp);                      // Get result as bits
        if (mask != 0)
            return ptr + __builtin_ctz(mask);  // Return first match
        ptr += 16;
    }
    for (; ptr < end; ptr++)  // Handle remaining bytes with scalar loop
        if (*ptr == c)
            return ptr;
    return NULL;  // Not found
}

Learning milestones:

  1. SSE2 version works correctly → You understand SIMD basics
  2. AVX2 version achieves 20+ GB/s → You understand wider vectors
  3. Handles alignment and tail bytes → You understand edge cases

Real World Outcome

You will have a SIMD-accelerated memchr clone and a benchmark harness that demonstrates vectorized speedups over byte-by-byte scanning.

Command Line Outcome Example:

$ ./simd_memchr x large.txt --stats
Matches: 18234
Bytes scanned: 1.0GB
Time: 12.4ms
Throughput: 80.6 GB/s (memory-bound)

$ ./simd_memchr x large.txt --baseline
Scalar memchr: 92.1ms
Speedup: 7.4x

The Core Question You’re Answering

“How do I compare many bytes at once safely and portably?”

This project is your entry point to SIMD: the foundational technique behind modern text search speed.


Concepts You Must Understand First

  1. Vector registers (SSE2/AVX2)
    • How many bytes do you compare per instruction?
  2. Bitmask extraction
    • How do you turn vector comparisons into positions?
  3. Alignment and tails
    • How do you handle the last few bytes safely?

Questions to Guide Your Design

  1. Portability
    • Will you use intrinsics, compiler builtins, or a library?
  2. Safety
    • How will you avoid reading past the end of the buffer?
  3. Benchmarking
    • How will you avoid measuring I/O instead of compute?

Thinking Exercise

Vector Mask Walkthrough

Given a 16-byte chunk and a target byte, compute the comparison mask by hand and locate the set bits.


The Interview Questions They’ll Ask

  1. “What is SIMD and why does it help string searching?”
  2. “How do you handle tail bytes when scanning with SIMD?”
  3. “Why can SIMD become memory-bandwidth bound?”

Hints in Layers

Hint 1: Start with SSE2 128-bit. SSE2 is part of the x86_64 baseline, so it is available on every x86_64 CPU; an AVX2 widening of the same loop is sketched after these hints.

Hint 2: Use unaligned loads. Unaligned loads are fast on modern CPUs and simplify code.

Hint 3: Use ctz on the mask. It extracts candidate positions quickly.

Hint 4: Build a scalar fallback. Use scalar code for the final tail bytes.
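
Building on Hint 1, here is the same loop widened to 32-byte vectors. A sketch assuming AVX2 hardware and a compiler flag such as -mavx2; a production build would select this path at runtime via CPUID detection:

// AVX2 variant of the core loop: 32 bytes per iteration.
#include <immintrin.h>
#include <stddef.h>

const char *simd_memchr_avx2(const char *ptr, const char *end, char c) {
    __m256i needle = _mm256_set1_epi8(c);             // broadcast to 32 lanes
    while (ptr + 32 <= end) {
        __m256i chunk = _mm256_loadu_si256((const __m256i *)ptr);
        unsigned mask = (unsigned)_mm256_movemask_epi8(
            _mm256_cmpeq_epi8(chunk, needle));        // one bit per lane
        if (mask)
            return ptr + __builtin_ctz(mask);         // first matching byte
        ptr += 32;
    }
    for (; ptr < end; ptr++)                          // scalar tail
        if (*ptr == c) return ptr;
    return NULL;
}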


Books That Will Help

| Topic | Book | Chapter |
| --- | --- | --- |
| SIMD fundamentals | “Low-Level Programming” by Zhirkov | SIMD chapters |
| Memory hierarchy | “Computer Systems: A Programmer’s Perspective” | Ch. 6 |
| Assembly patterns | “Write Great Code” by Randall Hyde | Instruction-level sections |

Common Pitfalls & Debugging

Problem 1: “Segfault at end of buffer”

  • Why: Over-reading past the buffer end
  • Fix: Use a tail scalar loop for the last bytes

Problem 2: “Wrong match positions”

  • Why: Incorrect bitmask interpretation (endianness/bit order)
  • Fix: Verify mask bits with unit tests

Problem 3: “No speedup”

  • Why: Benchmark includes file I/O or small data sizes
  • Fix: Use in-memory buffers and large sizes

Definition of Done

  • SIMD path used for full vector blocks
  • Scalar tail handles remaining bytes safely
  • Benchmark demonstrates speedup over scalar loop
  • Unit tests confirm match positions
  • Fallback implementation exists for non-SIMD builds

## Project 8: SIMD Substring Search (memmem)

  • File: TEXT_SEARCH_TOOLS_DEEP_DIVE.md
  • Main Programming Language: C (with intrinsics) or Rust
  • Alternative Programming Languages: C++, Assembly
  • Coolness Level: Level 5: Pure Magic
  • Business Potential: 4. The “Open Core” Infrastructure
  • Difficulty: Level 4: Expert
  • Knowledge Area: SIMD / String Algorithms
  • Software or Tool: SSE2/AVX2 Intrinsics
  • Main Book: “Intel Intrinsics Guide” + Hyperscan papers

What you’ll build: A SIMD-accelerated substring search using the “two-byte trick” or similar techniques. Find candidates with SIMD, verify with memcmp.

Why it teaches search tools: memmem is what you use for literal substrings. The two-byte trick: search for two bytes of the needle simultaneously (in practice, usually the first and last byte at their correct offsets), then verify the rest. This is often faster than full Boyer-Moore for short needles.

Core challenges you’ll face:

  • Two-byte candidate finding → maps to reducing false positives
  • Vectorized candidate verification → maps to batched memcmp
  • Handling variable needle lengths → maps to algorithm selection
  • Combining with Boyer-Moore for long needles → maps to hybrid approaches

Key Concepts:

  • Two-Way Algorithm: Used by glibc memmem
  • SIMD Candidate Finding: Search for first+last byte simultaneously
  • False Positive Rate: Why two bytes is much better than one byte
  • Hybrid Strategies: Different algorithms for different needle lengths

Difficulty: Expert. Time estimate: 1-2 weeks. Prerequisites: Project 7 (SIMD memchr).

Real world outcome:

$ ./simd_memmem "function" source_code.txt
Searching for "function" in 100MB file...

Algorithm selection:
  Needle length: 8 bytes
  Strategy: SIMD two-byte + verify

Benchmark:
  Naive strstr:      250ms
  glibc memmem:       45ms
  SIMD two-byte:      12ms
  Boyer-Moore:        18ms (better for longer needles)

False positive analysis:
  First byte 'f' matches: 2,345,678 times
  First two bytes "fu" match: 12,345 times (190x reduction!)
  Full "function" matches: 8,234 times

Disassembly insight:
  Searching for 'f' AND 'n' (first and last byte) simultaneously
  reduces candidates by 99.5%

Implementation Hints: The two-byte trick: Instead of searching for just the first byte, search for the first AND last byte of the needle at the correct offset. If needle is “hello” (length 5), search for positions where byte[i] == ‘h’ AND byte[i+4] == ‘o’. This dramatically reduces false positives.

With SIMD, you can check 16-32 positions simultaneously for both conditions, then AND the masks together.
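
A sketch of that idea with SSE2, checking the first and last needle bytes at 16 candidate positions per iteration and verifying hits with memcmp. The function name and structure are illustrative:

// First+last byte SIMD prefilter with memcmp verification (sketch).
#include <emmintrin.h>
#include <stddef.h>
#include <string.h>

const char *simd_memmem_sse2(const char *hay, size_t hay_len,
                             const char *needle, size_t n) {
    if (n == 0 || hay_len < n) return n == 0 ? hay : NULL;
    __m128i first = _mm_set1_epi8(needle[0]);      // broadcast first needle byte
    __m128i last  = _mm_set1_epi8(needle[n - 1]);  // broadcast last needle byte
    const char *end = hay + hay_len - n;           // last valid start position
    const char *p = hay;
    while (p + 15 <= end) {  // 16 candidate starts, all loads stay in bounds
        __m128i a = _mm_loadu_si128((const __m128i *)p);            // bytes at i
        __m128i b = _mm_loadu_si128((const __m128i *)(p + n - 1));  // bytes at i+n-1
        int mask = _mm_movemask_epi8(
            _mm_and_si128(_mm_cmpeq_epi8(a, first), _mm_cmpeq_epi8(b, last)));
        while (mask) {                   // each set bit = one candidate start
            int i = __builtin_ctz(mask);
            if (memcmp(p + i, needle, n) == 0) return p + i;  // verify full needle
            mask &= mask - 1;            // clear lowest set bit, try next
        }
        p += 16;
    }
    for (; p <= end; p++)                // scalar tail for the final starts
        if (memcmp(p, needle, n) == 0) return p;
    return NULL;
}

For "hello" (length 5), this checks byte[i] == 'h' and byte[i+4] == 'o' in one vector pass; only positions where both hit pay for a memcmp.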

Learning milestones:

  1. Two-byte search works correctly → You understand the optimization
  2. Faster than Boyer-Moore for short needles → You understand the tradeoff
  3. Hybrid approach selects best algorithm → You understand adaptive strategies

Real World Outcome

You will have a SIMD-accelerated memmem that can find multi-byte substrings efficiently and demonstrate how a two-byte prefilter reduces candidate checks.

Command Line Outcome Example:

$ ./simd_memmem --pattern "needle" big.txt
Matches: 421
Bytes scanned: 2.0GB
Time: 35ms

$ ./simd_memmem --pattern "needle" big.txt --baseline
Scalar memmem: 280ms
Speedup: 8x

The Core Question You’re Answering

“How can I extend SIMD from single-byte search to full substrings?”

This project shows how to turn SIMD hits into verified substring matches without re-scanning everything.


Concepts You Must Understand First

  1. Two-byte prefiltering
    • Why check the last byte (or two) first?
  2. Candidate verification
    • How do you confirm the rest of the pattern quickly?
  3. Overlap handling
    • How do you avoid missing overlapping matches?

Questions to Guide Your Design

  1. Prefilter choice
    • Which bytes of the pattern should you scan for?
  2. Alignment
    • How do you align candidate positions for verification?
  3. Tail handling
    • How do you handle the last chunk safely?

Thinking Exercise

Candidate Explosion

If your prefilter byte is too common (like ‘e’), how many candidates do you expect in English text? How does that affect performance?


The Interview Questions They’ll Ask

  1. “How does SIMD memmem differ from SIMD memchr?”
  2. “Why do we still need verification after a SIMD hit?”
  3. “What is the benefit of checking the last byte first?”

Hints in Layers

Hint 1: Start with a last-byte prefilter. This mirrors the Boyer-Moore intuition.

Hint 2: Use SIMD for candidate positions. Scan for the last byte, then check the preceding bytes.

Hint 3: Optimize verification. Use memcmp or small SIMD compares to check the rest.


Books That Will Help

| Topic | Book | Chapter |
| --- | --- | --- |
| Substring search | “Algorithms, Fourth Edition” | String search |
| SIMD techniques | “Low-Level Programming” | SIMD chapters |
| Memory performance | “Computer Systems: A Programmer’s Perspective” | Ch. 6 |

Common Pitfalls & Debugging

Problem 1: “Missing matches near buffer end”

  • Why: Tail handling incorrect
  • Fix: Add scalar checks for the final bytes

Problem 2: “False positives”

  • Why: Prefilter hit without full verification
  • Fix: Always check full pattern before reporting

Problem 3: “Slow performance”

  • Why: Prefilter byte too common
  • Fix: Choose rarest byte or use two-byte prefilter

Definition of Done

  • SIMD prefilter identifies candidate positions
  • Full pattern verification prevents false positives
  • Benchmarks show speedup vs scalar memmem
  • Handles overlapping matches correctly
  • Scalar tail handles end-of-buffer safely

## Project 9: Teddy Algorithm (Multi-Literal SIMD)

  • File: TEXT_SEARCH_TOOLS_DEEP_DIVE.md
  • Main Programming Language: Rust
  • Alternative Programming Languages: C++
  • Coolness Level: Level 5: Pure Magic
  • Business Potential: 4. The “Open Core” Infrastructure
  • Difficulty: Level 5: Master
  • Knowledge Area: SIMD / Multiple Pattern Matching
  • Software or Tool: SSSE3/AVX2 (PSHUFB instruction)
  • Main Book: Hyperscan papers + Rust memchr crate source code

What you’ll build: The Teddy algorithm from Intel’s Hyperscan, which finds multiple short literal patterns simultaneously using the PSHUFB instruction for parallel table lookups.

Why it teaches search tools: Teddy is what ripgrep uses when you have a small number of literal patterns (up to ~64). It “completely blows away the competition for short substrings” according to ripgrep’s author.

Core challenges you’ll face:

  • PSHUFB as a lookup table → maps to parallel 16-way lookup
  • Fingerprint-based candidate detection → maps to bloom filter-like behavior
  • Bucket management → maps to handling false positives
  • Verification phase → maps to confirming actual matches

Key Concepts:

  • PSHUFB Instruction: Use SIMD register as 16-entry lookup table
  • Fingerprinting: Map patterns to fingerprints, search for fingerprints
  • Bucket Disambiguation: Multiple patterns can share a fingerprint
  • Teddy Paper: “Teddy: An Efficient SIMD-based Literal Matching Engine” - Intel

Difficulty: Master. Time estimate: 3-4 weeks. Prerequisites: Projects 7-8, deep SIMD understanding.

Real world outcome:

$ ./teddy_search patterns.txt haystack.txt
Loading 32 patterns from patterns.txt...
Patterns: ["error", "warning", "fatal", "debug", ...]

Fingerprint table constructed:
  Bucket 0: ["error", "event"]
  Bucket 3: ["warning"]
  Bucket 7: ["fatal", "fail"]
  ...

Searching 1GB haystack...
Candidates found: 145,234
Verified matches: 12,456
Search time: 89ms (11.2 GB/s throughput)

Benchmark comparison (32 patterns, 1GB file):
  Aho-Corasick:  450ms
  Teddy SIMD:     89ms
  Speedup: 5x

Note: Teddy wins for small pattern sets (<~64 patterns)
      Aho-Corasick wins for large pattern sets (1000+)

Implementation Hints: The key insight is that PSHUFB lets you do 16 parallel lookups in a single instruction. Each byte of the input indexes into a 16-byte table.

For each input position, you check if the first 3 bytes match any pattern’s first 3 bytes by having separate lookup tables for each byte position and ANDing the results.

The “fingerprint” is which patterns might match at this position. You then verify the full patterns in a scalar loop.
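
A minimal sketch of the nibble-split PSHUFB lookup at the heart of this step. The two tables (bucket bitmasks indexed by nibble value) are assumed to be precomputed from your pattern set, which is the real work of the compile phase:

// One Teddy-style classification step (SSSE3). For each of 16 input bytes,
// produce a bucket bitmask: bit b set means "some pattern in bucket b could
// begin with this byte". lo_table/hi_table are built from the pattern set.
#include <tmmintrin.h>  // SSSE3: _mm_shuffle_epi8 (PSHUFB)

__m128i teddy_fingerprint(__m128i input, __m128i lo_table, __m128i hi_table) {
    __m128i nib_mask = _mm_set1_epi8(0x0F);
    __m128i lo = _mm_and_si128(input, nib_mask);                    // low nibbles
    __m128i hi = _mm_and_si128(_mm_srli_epi16(input, 4), nib_mask); // high nibbles
    // PSHUFB performs 16 parallel lookups: out[i] = table[nibble[i]].
    // A byte can belong to bucket b only if BOTH nibble lookups agree,
    // so AND the two results; nonzero output bytes mark candidate positions.
    return _mm_and_si128(_mm_shuffle_epi8(lo_table, lo),
                         _mm_shuffle_epi8(hi_table, hi));
}

Repeating this for the first 2-3 pattern bytes (shifting the input between steps) and ANDing those results as well is what cuts the candidate rate down far enough for cheap scalar verification.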

Learning milestones:

  1. PSHUFB-based lookup works → You understand the core trick
  2. Multiple patterns searched simultaneously → You understand fingerprinting
  3. Faster than Aho-Corasick for small sets → You’ve achieved the goal


Real World Outcome

You will implement a multi-literal SIMD prefilter (Teddy-style) that scans for multiple candidate literals at once and dramatically reduces regex verification work.

Command Line Outcome Example:

$ ./teddy_prefilter --literals error warn panic timeout logs.txt
Prefilter hits: 1,204
Regex verifications: 1,204
Matches: 412
Time: 9.2ms

$ ./teddy_prefilter --literals error warn panic timeout logs.txt --single
Single-literal prefilter time: 21.7ms
Speedup: 2.3x

The Core Question You’re Answering

“How do I scan for many literals at SIMD speed without running them one by one?”

Teddy is the answer: combine multiple literal checks into a few vector operations.


Concepts You Must Understand First

  1. Vector bitmask composition
    • How do you merge masks for multiple bytes?
  2. Literal selection
    • How do you choose which literals to include in a SIMD batch?
  3. Candidate verification
    • How do you verify matches after a multi-literal hit?

Questions to Guide Your Design

  1. Batch size
    • How many literals can you process per SIMD pass?
  2. False positives
    • What is the acceptable candidate rate before verification becomes too expensive?
  3. Integration
    • How will you feed prefilter hits into the regex engine?

Thinking Exercise

Mask Combination

Given masks for bytes A, B, C, D, how do you combine them to find positions where any literal could start?


The Interview Questions They’ll Ask

  1. “What is the Teddy algorithm and why is it fast?”
  2. “How does it differ from scanning for one literal at a time?”
  3. “What tradeoffs does multi-literal scanning introduce?”

Hints in Layers

Hint 1: Start with two literals. Prove correctness before scaling up to many literals.

Hint 2: Use bitwise OR for candidate sets. Combine masks to represent multiple literals.

Hint 3: Verify candidates with exact matchers. A prefilter only narrows; it never confirms.


Books That Will Help

| Topic | Book | Chapter |
| --- | --- | --- |
| SIMD and bitmasks | “Low-Level Programming” | SIMD chapters |
| Multi-pattern scanning | Hyperscan papers | Online |
| Practical optimization | “Computer Systems: A Programmer’s Perspective” | Ch. 6 |

Common Pitfalls & Debugging

Problem 1: “Too many false positives”

  • Why: Literals too short or too common
  • Fix: Choose rarer literals, or use more bytes per literal in the prefilter

Problem 2: “Missing matches”

  • Why: Incorrect mask alignment or offsets
  • Fix: Validate candidate positions with a slow checker

Problem 3: “No speedup”

  • Why: Verification cost dominates
  • Fix: Reduce candidates or optimize verifier

Definition of Done

  • Multi-literal SIMD prefilter implemented and tested
  • Candidate verification ensures no false positives
  • Benchmarks show improvement over single-literal scanning
  • Handles at least 8 literals per scan pass
  • Integrated into a regex prefilter pipeline

Phase 4: File System and I/O Optimization

## Project 10: Parallel Directory Traversal

  • File: TEXT_SEARCH_TOOLS_DEEP_DIVE.md
  • Main Programming Language: Rust
  • Alternative Programming Languages: C++, Go
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Concurrency / File Systems
  • Software or Tool: Custom Implementation (or study ignore crate)
  • Main Book: “The Art of Multiprocessor Programming” by Herlihy & Shavit

What you’ll build: A lock-free parallel directory walker that uses a work-stealing queue to distribute directory traversal across multiple threads.

Why it teaches search tools: Directory traversal is often the bottleneck, not the search itself. fd and ripgrep use parallel traversal to saturate modern NVMe SSDs. A single thread can’t keep up with drives that deliver 500K+ IOPS.

Core challenges you’ll face:

  • Work-stealing queues → maps to load balancing
  • Lock-free synchronization → maps to avoiding contention
  • Directory entry batching → maps to reducing syscall overhead
  • Respecting .gitignore → maps to ignore pattern matching

Key Concepts:

  • Work-Stealing: “The Art of Multiprocessor Programming” Chapter 16 - Herlihy & Shavit
  • Crossbeam Deque: Rust’s work-stealing deque implementation
  • getdents64: Batched directory reading syscall
  • Gitignore Semantics: Complex rules about inheritance and negation

Difficulty: Advanced. Time estimate: 2 weeks. Prerequisites: Basic threading, understanding of file systems.

Real world outcome:

$ ./parallel_walker /home/user/code
Walking 2.1 million files across 150,000 directories...

Thread 0: 523,456 files processed
Thread 1: 518,234 files processed
Thread 2: 530,123 files processed
Thread 3: 528,187 files processed

Total time: 1.2 seconds
Throughput: 1.75 million files/second

Benchmark comparison:
  Single-threaded: 4.8 seconds
  4 threads:       1.2 seconds
  Speedup: 4x (linear with thread count!)

Work-stealing statistics:
  Steals performed: 1,234
  Average queue depth: 45 directories

Implementation Hints: Use a deque per thread. When a thread discovers subdirectories, push them to its own deque. When a thread’s deque is empty, steal from another thread’s deque (from the opposite end to reduce contention).

For .gitignore, you need a stack of ignore matchers—each directory can have its own .gitignore that adds to the parent’s rules. Use a rope-like structure or thread-local copies to avoid contention.
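
A simplified sketch of the per-thread deque policy described above. A mutex guards each deque for clarity; production walkers achieve the same policy lock-free (e.g., Chase-Lev deques, as in the crossbeam ecosystem):

// Per-thread deque: owner works at the tail, thieves steal at the head.
#include <pthread.h>
#include <stddef.h>

#define CAP 4096
typedef struct {
    pthread_mutex_t lock;
    char *dirs[CAP];   // directories still to visit (sketch: no overflow check)
    int head, tail;
} Deque;

// Owner: push a newly discovered subdirectory onto its own deque.
void push_local(Deque *d, char *path) {
    pthread_mutex_lock(&d->lock);
    d->dirs[d->tail++] = path;
    pthread_mutex_unlock(&d->lock);
}

// Owner: pop the most recently pushed directory (LIFO end, cache-warm).
char *pop_local(Deque *d) {
    pthread_mutex_lock(&d->lock);
    char *p = (d->tail > d->head) ? d->dirs[--d->tail] : NULL;
    pthread_mutex_unlock(&d->lock);
    return p;
}

// Thief: steal the oldest directory from a victim (FIFO end). Old entries
// tend to be large unexplored subtrees, and stealing from the opposite end
// keeps thieves away from the owner's hot end of the deque.
char *steal(Deque *victim) {
    pthread_mutex_lock(&victim->lock);
    char *p = (victim->tail > victim->head) ? victim->dirs[victim->head++] : NULL;
    pthread_mutex_unlock(&victim->lock);
    return p;
}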

Learning milestones:

  1. Basic parallel traversal works → You understand work distribution
  2. Work-stealing balances load → You understand dynamic scheduling
  3. Respects .gitignore correctly → You understand the complexity

Real World Outcome

You will have a parallel directory walker that can traverse large trees, respect ignore rules, and saturate modern storage with multiple worker threads.

Command Line Outcome Example:

$ ./parallel_walk ~/code --threads 8 --stats
Directories: 12,304
Files: 201,442
Ignored: 1,920,117
Time: 0.72s

$ ./parallel_walk ~/code --threads 1 --stats
Time: 3.84s
Speedup: 5.3x

The Core Question You’re Answering

“How do I traverse massive directory trees fast and correctly?”

Traversal is the hidden bottleneck in many search tools. This project makes it explicit.


Concepts You Must Understand First

  1. Work-stealing scheduling
    • How do idle threads steal from busy ones?
  2. Ignore rules
    • How do .gitignore files compose across directories?
  3. Binary detection
    • How do you skip non-text files safely?

Questions to Guide Your Design

  1. Task granularity
    • Do you schedule by file or by directory?
  2. Rule propagation
    • How will you carry ignore rules into subdirectories?
  3. Thread safety
    • How do you avoid lock contention on shared queues?

Thinking Exercise

The Skewed Tree

Imagine one directory contains 1M files and another contains 10. How does your scheduler avoid one thread doing all the work?


The Interview Questions They’ll Ask

  1. “Why is work-stealing effective for directory traversal?”
  2. “How do ignore rules affect correctness and performance?”
  3. “When is a single-threaded walker acceptable?”
  4. “How would you detect binary files quickly?”

Hints in Layers

Hint 1: Start with a single-threaded walker. Get correctness first, then add parallelism.

Hint 2: Use per-thread deques. Push new subdirectories locally; let other threads steal.

Hint 3: Cache ignore rules per directory. Avoid recomputing patterns for every file.


Books That Will Help

| Topic | Book | Chapter |
| --- | --- | --- |
| Directory traversal APIs | “The Linux Programming Interface” | Directories and links |
| Threading basics | “Advanced Programming in the UNIX Environment” | Threads |
| Systems performance | “Computer Systems: A Programmer’s Perspective” | Storage and I/O |

Common Pitfalls & Debugging

Problem 1: “Threads idle too often”

  • Why: Work is not shared effectively
  • Fix: Implement stealing from the opposite end of the deque

Problem 2: “Ignore rules not applied”

  • Why: Missing rule propagation in subdirectories
  • Fix: Maintain a stack of ignore matchers

Problem 3: “Traversal order inconsistent”

  • Why: Parallel traversal changes ordering
  • Fix: If ordering matters, sort results at the end

Definition of Done

  • Parallel traversal uses work-stealing queues
  • .gitignore and hidden-file rules are respected
  • Binary files are skipped using a documented heuristic
  • Benchmark shows scaling with thread count
  • Traversal results match a single-threaded reference run

## Project 11: mmap vs read Strategy Selector

  • File: TEXT_SEARCH_TOOLS_DEEP_DIVE.md
  • Main Programming Language: C
  • Alternative Programming Languages: Rust, C++
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Operating Systems / I/O
  • Software or Tool: Custom Implementation
  • Main Book: “Linux System Programming” by Robert Love

What you’ll build: A file reading abstraction that automatically chooses between mmap and buffered read based on file size, access pattern, and system characteristics.

Why it teaches search tools: ripgrep “chooses the best searching strategy for you automatically.” For single large files, mmap wins (no copy into userspace). For many small files, buffered read wins (lower overhead per file).

Core challenges you’ll face:

  • mmap overhead measurement → maps to page table setup cost
  • read buffer sizing → maps to syscall amortization
  • Page fault handling → maps to understanding mmap tradeoffs
  • File size heuristics → maps to when to switch strategies

Key Concepts:

  • mmap Mechanics: “Linux System Programming” Chapter 9 - Robert Love
  • Page Faults: Demand paging and its costs
  • Buffered I/O: “Advanced Programming in the UNIX Environment” Chapter 5 - Stevens & Rago
  • Linus on mmap: Understanding the “virtual memory mapping is very expensive in itself” quote

Difficulty: Advanced. Time estimate: 1 week. Prerequisites: Basic systems programming, understanding of virtual memory.

Real world outcome:

$ ./io_selector --benchmark
Benchmarking I/O strategies...

Single 1GB file:
  mmap:          120ms (8.33 GB/s)
  read (64KB):   145ms (6.90 GB/s)
  Winner: mmap (17% faster)

1000 files, 1MB each:
  mmap:          850ms (1.18 GB/s)
  read (64KB):   320ms (3.12 GB/s)
  Winner: read (62% faster)

10000 files, 10KB each:
  mmap:          2100ms (0.05 GB/s)
  read (64KB):   180ms (0.56 GB/s)
  Winner: read (10x faster!)

Heuristic derived:
  Use mmap when: file_size > 1MB AND file_count < 10
  Use read otherwise
  Crossover point: ~500KB per file

Auto-selection in action:
  ./io_selector search "pattern" ./small_files/  → using read
  ./io_selector search "pattern" huge_file.log   → using mmap

Implementation Hints: mmap’s overhead is per-file (page table setup, TLB shootdowns on unmap). This fixed cost is amortized over large files but dominates for small files.

read’s overhead is per-syscall, but you can amortize it with large buffers. However, you pay a memory copy cost that mmap avoids.

The crossover depends on hardware—measure on target systems and encode heuristics.
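
A minimal POSIX sketch of the selector. The 1 MiB threshold, the function name, and the scan callback are placeholders; the real threshold should come from your benchmark:

// Choose mmap for large files, buffered read otherwise; fall back on failure.
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define MMAP_THRESHOLD (1 << 20)  /* placeholder crossover; measure your hardware */

int search_file(const char *path, void (*scan)(const char *, size_t)) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    struct stat st;
    if (fstat(fd, &st) < 0) { close(fd); return -1; }

    if (st.st_size >= MMAP_THRESHOLD) {
        void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p != MAP_FAILED) {
            scan((const char *)p, (size_t)st.st_size);  // one contiguous view, no copy
            munmap(p, (size_t)st.st_size);
            close(fd);
            return 1;  // handled via mmap
        }
        // mmap failed (special file, etc.): fall through to read()
    }
    char buf[64 * 1024];  // 64KB buffer amortizes syscall cost
    ssize_t nread;
    while ((nread = read(fd, buf, sizeof buf)) > 0)
        scan(buf, (size_t)nread);  // NOTE: matches spanning chunk boundaries
                                   // need carry-over handling in a real searcher
    close(fd);
    return 0;  // handled via read
}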

Learning milestones:

  1. Both strategies implemented correctly → You understand the mechanics
  2. Benchmark identifies crossover point → You understand the tradeoffs
  3. Auto-selection matches or beats manual choice → You’ve built a real optimization

Real World Outcome

You will implement a strategy selector that chooses mmap or read based on file size and workload characteristics, and produce benchmarks that show the crossover point.

Command Line Outcome Example:

$ ./io_selector --file logs/huge.log --pattern ERROR
Strategy: mmap (size = 2.3GB)
Matches: 5,214
Time: 0.91s

$ ./io_selector --file logs/small.txt --pattern ERROR
Strategy: read (size = 12KB)
Matches: 3
Time: 0.3ms

$ ./io_selector --benchmark --dir logs/
Crossover point: ~850KB on this machine

The Core Question You’re Answering

“When is mmap actually faster than read, and how can a tool decide?”

This project forces you to measure rather than assume, and to encode that measurement in a real strategy.


Concepts You Must Understand First

  1. Page cache and faults
    • How does the OS cache file data?
  2. System call overhead
    • What does each read call cost?
  3. Mapping overhead
    • What does mmap cost per file?

Questions to Guide Your Design

  1. Crossover detection
    • How will you measure the size at which mmap wins?
  2. Heuristics
    • Will you use a fixed threshold or adapt at runtime?
  3. Fallback behavior
    • What do you do if mmap fails?

Thinking Exercise

The Many-Small-Files Case

If you have 100,000 files of 4KB each, why is mmap likely slower than read? Estimate the cost.


The Interview Questions They’ll Ask

  1. “Why is mmap faster for large files but slower for small ones?”
  2. “What are page faults and why do they matter?”
  3. “How would you choose the crossover threshold?”

Hints in Layers

Hint 1: Benchmark on your machine. The crossover threshold is hardware-dependent.

Hint 2: Use a fixed threshold first. Start simple, then refine with adaptive measurements.

Hint 3: Add a fallback path. If mmap fails, revert to buffered read.


Books That Will Help

| Topic | Book | Chapter |
| --- | --- | --- |
| File I/O | “Linux System Programming” by Robert Love | File I/O chapters |
| Virtual memory | “Operating System Concepts” | Virtual memory chapters |
| System call cost | “Computer Systems: A Programmer’s Perspective” | Exceptional control flow |

Common Pitfalls & Debugging

Problem 1: “Segfault while using mmap”

  • Why: Access past mapped region
  • Fix: Ensure bounds checks and correct file size

Problem 2: “No performance difference”

  • Why: File size too small or cache effects masking differences
  • Fix: Use large files and drop caches if safe

Problem 3: “mmap fails on some files”

  • Why: File permissions or special files
  • Fix: Detect and fall back to read

Definition of Done

  • Strategy selector chooses mmap or read with clear rules
  • Benchmark shows measured crossover point
  • Handles errors and falls back to safe I/O
  • Works on large files and many small files
  • Documented rationale for chosen threshold

## Project 12: Line-Oriented Search with SIMD Line Counting

  • File: TEXT_SEARCH_TOOLS_DEEP_DIVE.md
  • Main Programming Language: C or Rust
  • Alternative Programming Languages: C++, Zig
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 3: Advanced
  • Knowledge Area: SIMD / Text Processing
  • Software or Tool: Custom Implementation
  • Main Book: “Intel Intrinsics Guide”

What you’ll build: A SIMD-accelerated newline counter and line boundary finder that can identify line starts/ends at gigabytes per second.

Why it teaches search tools: grep is LINE-oriented—you need to output the line containing the match. Finding line boundaries fast is critical. ripgrep uses SIMD to count newlines, which is why it can report line numbers without significant overhead.

Core challenges you’ll face:

  • SIMD newline counting → maps to POPCNT on comparison masks
  • Line boundary tracking → maps to maintaining byte offsets
  • Interleaving with search → maps to fusion of operations
  • Handling \r\n vs \n → maps to Windows compatibility

Key Concepts:

  • SIMD POPCNT: Count bits set in comparison mask
  • Prefix Sum: Computing line numbers from newline counts
  • Line Caching: Store line boundaries for random access
  • Streaming Processing: Process file in chunks without full materialization

Difficulty: Advanced. Time estimate: 1 week. Prerequisites: Project 7 (SIMD memchr).

Real world outcome:

$ ./line_counter huge_log.txt
File size: 5.2 GB
Lines: 124,567,890

Benchmark:
  Naive (byte loop):  12.5 seconds
  wc -l:               3.2 seconds
  SIMD line count:     0.45 seconds (11.6 GB/s)

With line boundary tracking:
  Line 50000000 starts at byte offset 2,345,678,901
  Lookup time: 0.001ms (from precomputed index)

Integration with search:
  Pattern found at byte 123456789
  → Line number 567890 (computed in 0.01ms)
  → Line content: "Error: connection timeout..."

Implementation Hints: Use SIMD memchr to find newlines. For each 16/32-byte chunk, get the comparison mask and use POPCNT to count the bits set (= number of newlines in that chunk).

For line boundary tracking, either store every Nth line’s offset (sparse index) or compute prefix sums on the fly.

To output context lines, you need to find the line boundaries AROUND a match, not just count lines up to it.
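
A minimal sketch of the counting kernel (SSE2 compare, movemask, then a popcount per 16-byte chunk):

// Count newlines in a buffer, 16 bytes at a time.
#include <emmintrin.h>
#include <stddef.h>

size_t count_newlines(const char *buf, size_t len) {
    __m128i nl = _mm_set1_epi8('\n');
    size_t count = 0, i = 0;
    for (; i + 16 <= len; i += 16) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)(buf + i));
        int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(chunk, nl));
        count += __builtin_popcount(mask);  // newlines in this 16-byte chunk
    }
    for (; i < len; i++)                    // scalar tail
        count += (buf[i] == '\n');
    return count;
}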

Learning milestones:

  1. SIMD newline counting correct and fast → You understand the basics
  2. Can find line containing a byte offset → You understand the application
  3. Performance matches ripgrep’s line counting → You’ve achieved production quality


Real World Outcome

You will build a line-oriented search engine that can map byte offsets to line numbers and print context lines quickly, even on huge files.

Command Line Outcome Example:

$ ./line_search "panic" server.log --context 2
1021: [INFO] starting server
1022: [DEBUG] loading config
1023: [ERROR] panic: missing key
1024: [DEBUG] shutting down
1025: [INFO] exit

Matches: 1
Line count: 2,389,120
Time: 18ms

The Core Question You’re Answering

“How do I convert raw byte offsets into line numbers and context lines fast?”

This project makes line mapping a first-class performance feature rather than an afterthought.


Concepts You Must Understand First

  1. Newline counting
    • How can SIMD speed up counting newline (`\n`) bytes?
  2. Offset-to-line mapping
    • How do you map byte positions to line numbers quickly?
  3. Context windows
    • How do you find the previous and next line boundaries efficiently?

Questions to Guide Your Design

  1. Indexing strategy
    • Will you store every Nth newline offset?
  2. Context extraction
    • How will you find start/end of a line around a match?
  3. Memory vs speed
    • How much memory are you willing to trade for fast line lookup?

Thinking Exercise

The Sparse Index

If you store every 1024th newline offset, how many bytes of index do you need for a 1GB file with ~50M lines?


The Interview Questions They’ll Ask

  1. “How would you map byte offsets to line numbers quickly?”
  2. “Why is line counting a bottleneck in search tools?”
  3. “How do you print context lines efficiently?”

Hints in Layers

Hint 1: Count newlines with SIMD. This is the same as memchr, plus a popcount on the mask.

Hint 2: Build a sparse index. Store every Nth newline offset to reduce search time.

Hint 3: Use binary search on the index. Find the nearest checkpoint and scan locally (a sketch follows these hints).
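
A sketch combining Hints 2 and 3, assuming index[k] holds the byte offset of the (k+1)*1024-th newline; the function name and checkpoint stride are illustrative:

// Map a byte offset to a 1-based line number via a sparse newline index.
#include <stddef.h>

size_t line_of_offset(const size_t *index, size_t n_checkpoints,
                      const char *buf, size_t offset) {
    size_t lo = 0, hi = n_checkpoints;
    while (lo < hi) {                        // last checkpoint at or before offset
        size_t mid = lo + (hi - lo) / 2;
        if (index[mid] <= offset) lo = mid + 1; else hi = mid;
    }
    size_t newlines = lo * 1024;             // newlines covered by checkpoints
    size_t start = lo ? index[lo - 1] + 1 : 0;
    for (size_t i = start; i < offset; i++)  // short local scan from checkpoint
        newlines += (buf[i] == '\n');
    return newlines + 1;                     // line number containing `offset`
}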


Books That Will Help

| Topic | Book | Chapter |
| --- | --- | --- |
| Data structures | “Algorithms, Fourth Edition” | Arrays and binary search |
| Performance | “Computer Systems: A Programmer’s Perspective” | Memory hierarchy |
| SIMD basics | “Low-Level Programming” | SIMD chapters |

Common Pitfalls & Debugging

Problem 1: “Wrong line numbers”

  • Why: Off-by-one errors in newline counting
  • Fix: Validate counts on small files by hand

Problem 2: “Slow context extraction”

  • Why: Scanning from file start for each match
  • Fix: Use sparse index and local scan

Problem 3: “High memory use”

  • Why: Storing every line offset
  • Fix: Store every Nth offset instead

Definition of Done

  • SIMD newline counting implemented and correct
  • Sparse index maps offsets to line numbers efficiently
  • Context lines printed accurately around matches
  • Benchmarks show improvement over naive line counting
  • Works on multi-gigabyte files

Phase 5: Fuzzy Matching

## Project 13: Fuzzy String Matcher (fzf-style)

  • File: TEXT_SEARCH_TOOLS_DEEP_DIVE.md
  • Main Programming Language: Go
  • Alternative Programming Languages: Rust, C++, Python
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 2. The “Micro-SaaS / Pro Tool” (Solo-Preneur Potential)
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Algorithms / String Matching
  • Software or Tool: Custom Implementation
  • Main Book: fzf source code (algo.go)

What you’ll build: A fuzzy matcher that finds items matching a pattern with characters in order but not necessarily contiguous. Score matches by quality (gaps, position of matches, word boundaries).

Why it teaches search tools: fzf revolutionized interactive file selection. Understanding its scoring algorithm helps you build better developer tools. The trick is balancing quality (finding the best match) with speed (interactive responsiveness).

Core challenges you’ll face:

  • Subsequence matching → maps to dynamic programming
  • Scoring function design → maps to heuristics for “good” matches
  • Word boundary bonuses → maps to camelCase and snake_case awareness
  • Performance optimization → maps to early termination, SIMD for candidate filtering

Key Concepts:

  • Longest Common Subsequence: Base algorithm for fuzzy matching
  • Scoring Heuristics: Position bonuses, gap penalties, boundary bonuses
  • Smith-Waterman: Scoring inspiration from bioinformatics
  • fzf algo.go: The actual implementation to study

Difficulty: Advanced. Time estimate: 2 weeks. Prerequisites: Dynamic programming, string algorithms.

Real world outcome:

$ ./fuzzy_matcher
Input: fzf
Candidates: [fuzzy_finder, foozbaz, fizzbuzz, FuzzyZoneFinder]

Matches (sorted by score):
  1. FuzzyZoneFinder   score=95  (f-z-f at word boundaries!)
  2. fuzzy_finder      score=78  (f-z-f with small gaps)

Filtered out (no f-z-f subsequence): fizzbuzz, foozbaz

Score breakdown for “FuzzyZoneFinder”:
  'f' matches 'F' at position 0: +20 (start bonus)
  'z' matches 'Z' at position 5: +15 (camelCase boundary)
  'f' matches 'F' at position 9: +18 (camelCase boundary)
  Gap penalty: -3
  Base match points: +45
  Total: 95

Performance:
  10,000 candidates: 2ms
  100,000 candidates: 18ms
  1,000,000 candidates: 150ms

Implementation Hints: For each candidate, find the best way to match the pattern characters as a subsequence. Use dynamic programming to track score at each (pattern_pos, candidate_pos).

Scoring heuristics (from fzf):

  • Bonus for matching at start of word (after _, space, or capital)
  • Bonus for matching after a camelCase boundary
  • Penalty for gaps between matches
  • Bonus for consecutive matches

The first character match gets extra weight because it’s likely intentional.
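
A greedy sketch of the subsequence-plus-scoring idea. All constants are invented for illustration; fzf's algo.go uses tuned weights and a DP pass (its v2 algorithm) to find the optimal alignment rather than the leftmost one:

// Score a candidate against a query; returns -1 if not a subsequence match.
#include <ctype.h>

int fuzzy_score(const char *query, const char *cand) {
    int score = 0, prev = -2;            // index of previous match; -2 = none yet
    const char *c = cand;
    for (const char *q = query; *q; q++) {
        while (*c && tolower((unsigned char)*c) != tolower((unsigned char)*q))
            c++;                          // greedy: take the leftmost occurrence
        if (!*c) return -1;               // not a subsequence: reject candidate
        int i = (int)(c - cand);
        score += 5;                       // base points per matched character
        if (i == 0 || cand[i - 1] == '_' || cand[i - 1] == ' ' ||
            (islower((unsigned char)cand[i - 1]) && isupper((unsigned char)*c)))
            score += 15;                  // start / word / camelCase boundary bonus
        if (i == prev + 1)
            score += 10;                  // consecutive-match bonus
        else if (prev >= 0)
            score -= i - prev - 1;        // gap penalty, one point per skipped char
        prev = i;
        c++;
    }
    return score;
}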

Learning milestones:

  1. Basic subsequence matching works → You understand the problem
  2. Scoring produces intuitive rankings → You understand heuristics
  3. Fast enough for interactive use (<100ms for 100K items) → You understand optimization

Real World Outcome

You will implement a fuzzy matcher with scoring that produces intuitive rankings, along with benchmarks showing interactive performance at 100K+ candidates.

Command Line Outcome Example:

$ ./fuzzy_matcher --query fzf --candidates candidates.txt
1. FuzzyZoneFinder   score=95
2. fuzzy_finder      score=78

(fizzbuzz and foozbaz rejected: "fzf" is not a subsequence of either)

Scored 100,000 candidates in 18ms

The Core Question You’re Answering

“How do I score fuzzy matches so the best results feel obvious to a human?”

The algorithm is not just about matching; it is about ranking.


Concepts You Must Understand First

  1. Subsequence matching
    • How do you check that the query appears in order?
  2. Scoring heuristics
    • Why do boundary and consecutive bonuses matter?
  3. Dynamic programming
    • How do you compute optimal alignment efficiently?

Questions to Guide Your Design

  1. Scoring function
    • What bonuses and penalties produce intuitive results?
  2. Performance constraints
    • What is your target latency per 100K candidates?
  3. Ranking stability
    • How do you handle ties consistently?

Thinking Exercise

The Ranking Test

For the query cfg, which should rank higher: ConfigParser or cfg_manager? Explain why.


The Interview Questions They’ll Ask

  1. “How does fuzzy matching differ from substring search?”
  2. “Why do word boundaries matter for scoring?”
  3. “How would you optimize for 1M candidates?”
  4. “What is the role of dynamic programming here?”

Hints in Layers

Hint 1: Implement the subsequence check first. Don’t score until you can correctly match.

Hint 2: Add bonuses gradually. Start with the consecutive bonus, then add the boundary bonus.

Hint 3: Cache candidate lengths. Avoid recomputing lengths inside the scoring loop.


Books That Will Help

| Topic | Book | Chapter |
| --- | --- | --- |
| Dynamic programming | “Algorithms, Fourth Edition” | DP chapters |
| Practical scoring | fzf source code (algo.go) | Online |
| Performance engineering | “Computer Systems: A Programmer’s Perspective” | Ch. 6 |

Common Pitfalls & Debugging

Problem 1: “Scores feel wrong”

  • Why: Bonuses/penalties poorly balanced
  • Fix: Tune with real examples and compare to fzf

Problem 2: “Too slow at 100K items”

  • Why: O(n*m) DP with large constants
  • Fix: Use pruning and early exits

Problem 3: “Incorrect ordering”

  • Why: Tie-breaking inconsistent
  • Fix: Add deterministic secondary keys

Definition of Done

  • Subsequence matching correct on test cases
  • Scoring function produces intuitive rankings
  • 100K candidates scored in < 50ms
  • Benchmarks show scaling to 1M candidates
  • Output includes per-candidate score for debugging

## Project 14: Interactive Fuzzy Finder (Full fzf Clone)

  • File: TEXT_SEARCH_TOOLS_DEEP_DIVE.md
  • Main Programming Language: Go
  • Alternative Programming Languages: Rust, C++
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 2. The “Micro-SaaS / Pro Tool” (Solo-Preneur Potential)
  • Difficulty: Level 3: Advanced
  • Knowledge Area: TUI / Concurrency
  • Software or Tool: Custom Implementation
  • Main Book: fzf source code

What you’ll build: A complete interactive fuzzy finder with: terminal UI, incremental input, parallel matching with worker goroutines, result streaming, preview pane.

Why it teaches search tools: fzf shows how to build responsive interactive tools. The architecture—input reader, matcher pool, result merger, renderer—is a pattern you’ll use in many tools.

Core challenges you’ll face:

  • Terminal UI (raw mode, escape sequences) → maps to TUI rendering
  • Incremental search (update as you type) → maps to debouncing and cancellation
  • Parallel matching → maps to worker pools and result merging
  • Streaming large inputs → maps to handling unbounded data

Key Concepts:

  • Terminal Raw Mode: Reading characters without waiting for Enter
  • ANSI Escape Codes: Cursor movement, colors, clearing
  • Worker Goroutines: Parallel matching with cancellation
  • Result Merging: Combining ranked results from multiple workers

Difficulty: Advanced. Time estimate: 3-4 weeks. Prerequisites: Project 13, terminal programming basics.

Real world outcome:

$ find . -type f | ./mini_fzf
[Interactive TUI appears]

> src/ma
  4/1234
  > src/main.rs
    src/matcher.go
    src/main_test.go
    src/map.rs

Features:
- Type to filter
- Arrow keys to navigate
- Enter to select
- Ctrl-C to cancel
- Preview pane (--preview) shows file content

Performance:
- 100K items loaded in 50ms
- Typing latency <10ms (feels instant)
- Matches update incrementally as you type

Implementation Hints: Architecture (from fzf):

  1. Reader: Reads input lines, chunks them, sends to matcher
  2. Matcher: Pool of workers that score chunks against current pattern
  3. Merger: Combines worker results, maintains top-K
  4. Terminal: Renders results, handles input

When the pattern changes, cancel in-flight matching and start new match. Use channels for communication between components.
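
This project's natural language is Go, where channels make the plumbing easy; the cancellation idea itself, though, reduces to an atomic query version, sketched here in C11 with invented function names:

// Query versioning: stale work units are detected and discarded cheaply.
#include <stdatomic.h>

atomic_int query_version;  // bumped by the UI thread on every keystroke

// UI thread: the query changed, so all in-flight work is now stale.
void on_query_change(void) { atomic_fetch_add(&query_version, 1); }

// Worker: score one chunk of candidates for the version it started with.
// Returns 1 if results may be merged, 0 if they must be discarded.
int finish_chunk(int started_version) {
    // ... score candidates here, optionally polling for staleness ...
    return atomic_load(&query_version) == started_version;
}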

Learning milestones:

  1. Basic TUI works (type and see results) → You understand terminal rendering
  2. Results update smoothly while typing → You understand incremental search
  3. Handles millions of items smoothly → You understand the architecture


Real World Outcome

You will produce a usable interactive fuzzy finder with raw terminal input, incremental matching, and smooth rendering under 10ms per keystroke on large inputs.

Command Line Outcome Example:

$ find . -type f | ./mini_fzf --preview 'sed -n "1,80p" {}'
[Interactive UI]
> src/ma
  4/1234
  > src/main.rs
    src/main_test.go
    src/matcher.go

Typing latency: 7ms
Results updated: 100K items

The Core Question You’re Answering

“How do I keep a terminal UI responsive while heavy matching work runs in the background?”

This project is about architecture: separating input, matching, and rendering.


Concepts You Must Understand First

  1. Terminal raw mode
    • How do you capture keystrokes without waiting for Enter?
  2. Cancellation
    • How do you stop in-flight matching when input changes?
  3. Concurrency patterns
    • How do you stream results while work continues?

Questions to Guide Your Design

  1. Input loop
    • How will you parse arrow keys and control sequences?
  2. Result merging
    • How do you maintain a top-K list from multiple workers?
  3. Rendering
    • How do you update the screen without flicker?

Thinking Exercise

The Janky UI

Imagine matching takes 200ms. How will you keep the UI responsive and show partial results?


The Interview Questions They’ll Ask

  1. “How would you design a responsive TUI for large input sets?”
  2. “How do you cancel in-flight work in a worker pool?”
  3. “What is the role of debouncing in interactive search?”
  4. “How do you avoid flicker in terminal rendering?”

Hints in Layers

Hint 1: Separate input, matching, and rendering. Treat them as independent components connected by channels.

Hint 2: Use a query version number. Discard stale results by comparing versions.

Hint 3: Render at a fixed FPS. Update the screen at 30-60Hz for smoothness.


Books That Will Help

| Topic | Book | Chapter |
| --- | --- | --- |
| Terminal basics | “The Linux Command Line” | Terminal sections |
| Concurrency | “Advanced Programming in the UNIX Environment” | Threads |
| UX responsiveness | “Code Complete” | UI responsiveness sections |

Common Pitfalls & Debugging

Problem 1: “Input feels laggy”

  • Why: Matching blocks the UI thread
  • Fix: Move matching to worker threads

Problem 2: “Flickering output”

  • Why: Full screen redraw each frame without double buffering
  • Fix: Use a back buffer or minimal redraws

Problem 3: “Out-of-order results”

  • Why: Old workers still emitting results
  • Fix: Drop results that do not match current query version

Definition of Done

  • Terminal raw mode input works with arrow keys and control sequences
  • Incremental matching updates within 10ms for 100K items
  • Worker pool with cancellation or versioning
  • Rendering loop is flicker-free
  • Preview pane works with a user-defined command

Phase 6: Putting It All Together

## Project 15: Build Your Own ripgrep

  • File: TEXT_SEARCH_TOOLS_DEEP_DIVE.md
  • Main Programming Language: Rust
  • Alternative Programming Languages: C++, Go
  • Coolness Level: Level 5: Pure Magic
  • Business Potential: 4. The “Open Core” Infrastructure
  • Difficulty: Level 5: Master
  • Knowledge Area: Systems Programming / Everything Combined
  • Software or Tool: Custom Implementation
  • Main Book: ripgrep source code + all previous resources

What you’ll build: A production-quality grep with: parallel directory traversal, .gitignore support, SIMD-accelerated literal search, regex support with literal extraction, colored output, context lines.

Why it teaches search tools: This is the capstone project. You’ll integrate everything: parallel I/O, SIMD algorithms, regex engines, output formatting. The result is a tool YOU built that can search gigabytes per second.

Core challenges you’ll face:

  • Integrating all previous components → maps to system architecture
  • Memory efficiency → maps to avoiding unnecessary allocations
  • Correctness edge cases → maps to handling binary files, encoding issues
  • User experience polish → maps to colors, progress, error messages

Key Concepts:

  • Component Integration: How the pieces fit together
  • Error Handling: Graceful degradation on permission errors, encoding issues
  • Output Buffering: Line buffering vs block buffering for piped output
  • Configuration: Command-line argument parsing, config files

Difficulty: Master. Time estimate: 2-3 months. Prerequisites: All previous projects.

Real world outcome:

$ ./mygrep "function.*async" ~/code --stats
Searching 150,234 files...

src/server/api.ts:145: async function handleRequest(req: Request) {
src/server/api.ts:234: export async function processData(data: Data) {
src/utils/fetch.ts:12: async function fetchWithRetry(url: string) {
...

Statistics:
  Files searched: 150,234
  Files matched: 1,234
  Lines matched: 5,678
  Time: 0.89s
  Throughput: 2.1 GB/s

$ hyperfine './mygrep TODO ~/code' 'rg TODO ~/code'
  mygrep: 0.95s ± 0.02s
  rg:     0.87s ± 0.02s

  Within 10% of ripgrep! 🎉

Implementation Hints: Start with a working but slow version, then optimize:

  1. Single-threaded, read-based file access, naive regex
  2. Add parallel directory traversal
  3. Add literal extraction and SIMD prefiltering
  4. Add mmap for large files
  5. Optimize output (buffer, avoid unnecessary work)

Profile continuously. The bottleneck shifts as you optimize—first it’s regex, then I/O, then output.

Learning milestones:

  1. Correct results on all test cases → You understand the requirements
  2. Within 2x of ripgrep performance → You understand optimization
  3. Users would actually use this → You’ve built something real

Real World Outcome

You will ship a production-grade grep-like tool that can search large repositories at multi-GB/s, respects ignore rules, supports regex, and provides colored, contextual output.

Command Line Outcome Example:

$ ./myrg "function.*async" ~/code --stats --context 1
Searching 150,234 files...

src/server/api.ts:145: async function handleRequest(req: Request) {
  ...

src/utils/fetch.ts:12: async function fetchWithRetry(url: string) {
  ...

Statistics:
  Files searched: 150,234
  Files matched: 1,234
  Lines matched: 5,678
  Time: 0.89s
  Throughput: 2.1 GB/s

$ hyperfine './myrg TODO ~/code' 'rg TODO ~/code'
  myrg: 0.95s ± 0.02s
  rg:   0.87s ± 0.02s

The Core Question You’re Answering

“How do all the pieces of a modern search tool fit together into a coherent, fast, and correct system?”

This is the capstone: integration, performance engineering, and UX polish.


Concepts You Must Understand First

  1. Traversal + ignore rules
    • How do you find the right files quickly?
  2. Regex pipeline
    • How do literal extraction and SIMD prefilters feed the regex engine?
  3. Output formatting
    • How do you print lines, context, and colors efficiently?

Questions to Guide Your Design

  1. Architecture
    • How will you structure the pipeline so each component is testable?
  2. Performance bottlenecks
    • How will you profile and identify the next bottleneck?
  3. Correctness
    • How will you validate your output against rg on real repos?

Thinking Exercise

The Bottleneck Shift

If your regex engine is 5x faster, what becomes the next bottleneck? Predict it before profiling.


The Interview Questions They’ll Ask

  1. “Walk me through the architecture of ripgrep.”
  2. “Where are the performance hotspots in a search tool?”
  3. “How do you ensure correctness across encodings and binary files?”
  4. “Why might --context slow down search?”

Hints in Layers

Hint 1: Start with correctness. Build a slow but correct tool, then optimize each stage.

Hint 2: Profile continuously. Use perf, dtrace, or Instruments to find hotspots.

Hint 3: Optimize output. Buffer output and avoid per-match allocations.


Books That Will Help

| Topic | Book | Chapter |
| --- | --- | --- |
| Systems design | “Clean Architecture” by Robert C. Martin | Architecture chapters |
| I/O performance | “The Linux Programming Interface” | File I/O chapters |
| Regex internals | ripgrep/regex internals docs | Online |

Common Pitfalls & Debugging

Problem 1: “Slower than expected”

  • Why: Output formatting or line counting dominates
  • Fix: Buffer output, precompute line indices, minimize allocations

Problem 2: “Incorrect matches”

  • Why: Unicode decoding or binary detection errors
  • Fix: Validate against rg with mixed encodings

Problem 3: “High memory usage”

  • Why: Reading entire files at once
  • Fix: Stream in chunks and reuse buffers

Definition of Done

  • Matches rg output on a real repo for at least 20 queries
  • Throughput within 2x of rg on one large corpus
  • Supports ignore rules, binary detection, and UTF-8 handling
  • --context and --stats flags implemented
  • CLI help and error messages are user-friendly

Project Comparison Table

| # | Project | Difficulty | Time | Depth of Understanding | Fun Factor |
| --- | --- | --- | --- | --- | --- |
| 1 | Naive String Search | Beginner | 2-3 hours | ⭐⭐ | ⭐⭐ |
| 2 | Boyer-Moore | Int-Adv | 1 week | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| 3 | Aho-Corasick | Advanced | 2 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| 4 | NFA Regex Engine | Expert | 2-3 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| 5 | DFA Regex (Lazy) | Expert | 2-3 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| 6 | Literal Extraction | Advanced | 1-2 weeks | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| 7 | SIMD memchr | Expert | 1 week | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| 8 | SIMD memmem | Expert | 1-2 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| 9 | Teddy Algorithm | Master | 3-4 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| 10 | Parallel Dir Traversal | Advanced | 2 weeks | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| 11 | mmap vs read Selector | Advanced | 1 week | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| 12 | SIMD Line Counting | Advanced | 1 week | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| 13 | Fuzzy Matcher | Advanced | 2 weeks | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| 14 | Interactive fzf Clone | Advanced | 3-4 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| 15 | Build Your Own ripgrep | Master | 2-3 months | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |

Path A: Algorithm Focus (Theory → Implementation)

  1. Projects 1-3 (String Matching Algorithms)
  2. Projects 4-5 (Regex Engines)
  3. Project 6 (Literal Extraction)
  4. Project 15 (Integration)

Path B: Performance Focus (Make It Fast)

  1. Project 1 (Baseline)
  2. Projects 7-8 (SIMD Fundamentals)
  3. Projects 10-11 (I/O Optimization)
  4. Project 15 (Integration)

Path C: Tools Focus (Build Useful Things)

  1. Projects 1-2 (Basic Algorithms)
  2. Projects 10-11 (Parallel I/O)
  3. Projects 13-14 (Fuzzy Finder)
  4. Project 15 (ripgrep Clone)

Why Each Tool Is Fast: Summary

GNU grep

  • Boyer-Moore + DFA/NFA hybrid
  • Hand-tuned inner loop
  • Falls back to NFA for complex patterns
  • Decades of optimization

The Silver Searcher (ag)

  • mmap + pthreads + PCRE JIT
  • First to respect .gitignore by default
  • Binary search for ignore patterns
  • Up to 8 worker threads

ripgrep

  • Finite automata regex (guaranteed linear time)
  • Teddy SIMD algorithm for literals
  • Lock-free parallel directory traversal
  • Automatic mmap vs read selection
  • UTF-8 decoding built into DFA
  • Literal extraction for regex optimization

fd

  • Parallel directory traversal
  • Uses ripgrep’s regex and ignore crates
  • Smart defaults (ignore hidden, respect .gitignore)
  • 10-23x faster than find for pattern matching

fzf

  • Custom fuzzy matching with scoring heuristics
  • Worker goroutine pool for parallel matching
  • Incremental search with result streaming
  • Two algorithm modes: v1 (fast) and v2 (optimal)

git ls-files (bonus)

  • Reads from Git’s index file directly
  • No directory traversal at all!
  • 5x+ faster than any directory walker
  • Only works inside Git repos

Key Insights

  1. Most “regex” searches are literal: The fastest path is finding literals with SIMD, then verifying with regex.

  2. Automata beat backtracking for guaranteed performance: Backtracking regex engines can take exponential time; finite automata (NFA simulation or a DFA) are O(n) always.

  3. Parallelization matters more than algorithm optimization: On modern SSDs, a single thread can’t saturate I/O. Use work-stealing queues.

  4. Skip what you don’t need: Respecting .gitignore often reduces search space by 10-100x.

  5. Measure, don’t assume: mmap isn’t always faster. Boyer-Moore isn’t always better. Profile on real workloads.

  6. SIMD is a superpower: 16-32 bytes per instruction changes everything. memchr at 15 bytes/cycle is 15x faster than byte-by-byte.


Summary

| Project # | Project Name | Main Language |
| --- | --- | --- |
| 1 | Naive String Search | C |
| 2 | Boyer-Moore String Search | C |
| 3 | Aho-Corasick Multi-Pattern | C |
| 4 | NFA Regex Engine | C |
| 5 | DFA Regex Engine (Lazy) | C |
| 6 | Literal Extraction Optimizer | Rust |
| 7 | SIMD memchr | C |
| 8 | SIMD memmem | C |
| 9 | Teddy Algorithm | Rust |
| 10 | Parallel Directory Traversal | Rust |
| 11 | mmap vs read Selector | C |
| 12 | SIMD Line Counting | C |
| 13 | Fuzzy String Matcher | Go |
| 14 | Interactive Fuzzy Finder | Go |
| 15 | Build Your Own ripgrep | Rust |



Note: This guide now includes a full theory primer, a concept-to-project map, deep dive reading, quick-start path, success metrics, and per-project deep dives (outcomes, pitfalls, and definitions of done) so you can move from theory to implementation with minimal guesswork.