Deep Dive: Text Search Tools (grep, ripgrep, ag, fzf, find, fd)
Understanding why some search tools are 10-100x faster than others requires diving deep into algorithms, system calls, parallelization, and modern CPU features. This guide takes you from fundamental string matching algorithms to building production-quality search tools.
The Landscape: Tool Comparison
| Tool | Year | Language | Regex Engine | Key Innovation |
|---|---|---|---|---|
| grep | 1974 | C | Hand-rolled DFA/NFA + Boyer-Moore | Original, Unix philosophy |
| egrep | 1976 | C | DFA | First deterministic regex |
| GNU grep | 1988 | C | Hybrid DFA/NFA + Boyer-Moore | Tuned Boyer-Moore |
| ack | 2005 | Perl | Perl regex | First “code-aware” grep |
| The Silver Searcher (ag) | 2011 | C | PCRE (JIT) | mmap + pthreads + gitignore |
| ripgrep (rg) | 2016 | Rust | regex-automata (DFA) | Teddy SIMD + parallel + Unicode |
| fd | 2017 | Rust | regex | Parallel traversal + gitignore |
| fzf | 2013 | Go | Custom fuzzy | Interactive + worker goroutines |
Why Performance Differs: The Core Factors
1. Regex Engine Architecture (Biggest Factor)
Backtracking (NFA) Engines: PCRE, Perl, Python, Java, .NET
- Can express everything (backreferences, lookahead)
- Catastrophic backtracking: Exponential time on pathological patterns
- Example: `(a+)+$` on `aaaaaaaaaaaaaaaaaaaaaaaaaaaa!` takes SECONDS
Finite Automata (DFA) Engines: RE2, Rust regex, GNU grep (partial)
- Linear time guaranteed: O(n) where n = input length
- Cannot do backreferences, but who uses those in grep?
- Key insight: “Only ever in one state at a time”
2. Literal Optimization (The Secret Weapon)
Most “regex” searches are actually literal strings or have literal prefixes:
- `TODO` → literal
- `function.*async` → literal prefix “function”
- `\berror\b` → literal “error” + boundary check
The Strategy: Find the literal fast (SIMD), THEN run the regex engine.
3. SIMD Vectorization
Modern CPUs can compare 16-32 bytes simultaneously:
- SSE2: 128-bit registers (16 bytes at once)
- AVX2: 256-bit registers (32 bytes at once)
The memchr crate achieves 15+ bytes per CPU cycle on finding a single byte.
4. Parallelization
- ag: pthreads, up to 8 workers
- ripgrep: Lock-free parallel directory traversal via crossbeam
- fzf: Worker goroutines for matching
5. I/O Strategy
- mmap: Map file directly into memory (good for single large files)
- read + buffer: Traditional buffered I/O (good for many small files)
- Adaptive: ripgrep chooses automatically
6. What NOT to Search
- Respect `.gitignore` (skip node_modules, build artifacts)
- Skip binary files
- Skip hidden files by default
Phase 1: String Matching Algorithms
Project 1: Naive String Search (Baseline)
- File: TEXT_SEARCH_TOOLS_DEEP_DIVE.md
- Main Programming Language: C
- Alternative Programming Languages: Rust, Zig, Go
- Coolness Level: Level 2: Practical but Forgettable
- Business Potential: 1. The “Resume Gold” (Educational/Personal Brand)
- Difficulty: Level 1: Beginner
- Knowledge Area: Algorithms / String Matching
- Software or Tool: Custom Implementation
- Main Book: “Introduction to Algorithms” by Cormen, Leiserson, Rivest, Stein (CLRS)
What you’ll build: A brute-force string search that compares every position in the text against the pattern. This is your baseline to understand what we’re optimizing.
Why it teaches search tools: You need to see the naive O(n×m) approach to appreciate why Boyer-Moore and SIMD matter. This is also what many standard library strstr() functions use!
Core challenges you’ll face:
- Understanding worst case → pattern `aaaaab` in `aaaaaaaaaaaaaaaaaab` → maps to why preprocessing matters
- Character-by-character comparison → maps to CPU cache behavior
- Early termination → maps to optimization opportunities
Key Concepts:
- String Matching Basics: “Introduction to Algorithms” Chapter 32.1 - CLRS
- Big-O Analysis: Understanding O(n×m) worst case vs O(n) average case
- Pointer Arithmetic: “C Programming Language” Chapter 5 - K&R
Difficulty: Beginner Time estimate: 2-3 hours Prerequisites: Basic C, pointers
Real world outcome:
$ ./naive_search "needle" haystack.txt
Found at position 1234
Found at position 5678
Search completed in 45ms
$ ./naive_search "aaaaaab" pathological.txt
# Watch it struggle on worst-case input
Search completed in 2300ms # Much slower!
Benchmark:
Pattern "hello" in 1MB file:
Naive: 12ms
(We'll compare to Boyer-Moore later)
Implementation Hints:
For each position i in text:
For each position j in pattern:
If text[i+j] != pattern[j]:
Break (mismatch)
If j == pattern_length - 1:
Found match at position i
The key insight is that we re-examine the same characters multiple times. If we’ve seen “abcd” and the pattern is “abce”, we know something about what the text contains—we shouldn’t forget it!
Learning milestones:
- Implementation works correctly → You understand the basic problem
- Can identify worst-case inputs → You understand algorithm analysis
- See why this is slow → You’re ready for optimization
Project 2: Boyer-Moore String Search
- File: TEXT_SEARCH_TOOLS_DEEP_DIVE.md
- Main Programming Language: C
- Alternative Programming Languages: Rust, C++, Zig
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 1. The “Resume Gold” (Educational/Personal Brand)
- Difficulty: Level 3: Advanced
- Knowledge Area: Algorithms / String Matching
- Software or Tool: Custom Implementation
- Main Book: “Algorithms on Strings, Trees, and Sequences” by Dan Gusfield
What you’ll build: The Boyer-Moore algorithm with both the “bad character” and “good suffix” heuristics. This is what GNU grep uses internally.
Why it teaches search tools: Boyer-Moore is the foundation of fast string search. Its key insight—scanning backwards and skipping forward—is counterintuitive but brilliant. Longer patterns search FASTER because you can skip more.
Core challenges you’ll face:
- Bad character table construction → maps to preprocessing patterns
- Good suffix table construction → maps to suffix analysis
- Rightmost occurrence lookup → maps to how much to skip
- Understanding why backwards scanning works → maps to algorithm intuition
Key Concepts:
- Bad Character Rule: “Algorithms on Strings, Trees, and Sequences” Chapter 2.2 - Gusfield
- Good Suffix Rule: “Algorithms on Strings, Trees, and Sequences” Chapter 2.2 - Gusfield
- Skip Tables: “The Boyer-Moore Fast String Searching Algorithm” - J. Strother Moore (original paper)
- Horspool Simplification: Boyer-Moore-Horspool variant that’s simpler to implement
Difficulty: Intermediate-Advanced Time estimate: 1 week Prerequisites: Project 1 (Naive Search), understanding of suffix/prefix concepts
Real world outcome:
$ ./boyer_moore "substantially" novel.txt
Found at position 45231
Found at position 89012
Search completed in 3ms # vs 45ms naive!
Skip table visualization:
Pattern: "example"
Bad character table:
'e' -> 1 (rightmost position from end)
'l' -> 2
'p' -> 3
'm' -> 4
'a' -> 5
'x' -> 6
(all others) -> 7 (pattern length)
Benchmark comparison:
Pattern length 5: Naive 12ms, Boyer-Moore 8ms (1.5x faster)
Pattern length 15: Naive 15ms, Boyer-Moore 2ms (7.5x faster)
Pattern length 50: Naive 18ms, Boyer-Moore 0.4ms (45x faster!)
Key insight: Longer patterns = faster search!
Implementation Hints: The bad character rule: When a mismatch occurs at text[i] and pattern[j], shift the pattern so that the rightmost occurrence of text[i] in the pattern aligns with text[i]. If text[i] doesn’t occur in the pattern, shift the entire pattern past position i.
The magic is that you compare from RIGHT TO LEFT within the pattern. When you mismatch, you’ve already seen characters to the right—use that information to skip forward!
Learning milestones:
- Bad character heuristic works → You understand basic skip logic
- Good suffix heuristic works → You understand suffix analysis
- Longer patterns search faster → You’ve internalized the counterintuitive insight
Project 3: Aho-Corasick Multi-Pattern Matching
- File: TEXT_SEARCH_TOOLS_DEEP_DIVE.md
- Main Programming Language: C
- Alternative Programming Languages: Rust, C++, Go
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 3: Advanced
- Knowledge Area: Algorithms / Automata Theory
- Software or Tool: Custom Implementation
- Main Book: “Introduction to Algorithms” by CLRS
What you’ll build: An Aho-Corasick automaton that can search for multiple patterns simultaneously in a single pass through the text. This is what fgrep and antivirus scanners use.
Why it teaches search tools: When you have multiple patterns (like 1000 keywords to search for), naive approaches become O(n×m×k) where k is pattern count. Aho-Corasick is O(n + m + z) where z is the number of matches—linear regardless of pattern count!
Core challenges you’ll face:
- Trie construction → maps to prefix tree data structures
- Failure link computation → maps to KMP failure function generalization
- Output function (suffix links) → maps to handling overlapping patterns
- State machine execution → maps to automata theory
Key Concepts:
- Trie Data Structure: “Introduction to Algorithms” Chapter 12 - CLRS
- Failure Links: The key innovation—where to go on mismatch (like KMP’s failure function)
- Finite State Machines: “Introduction to the Theory of Computation” Chapter 1 - Sipser
- Original Paper: “Efficient string matching: An aid to bibliographic search” - Aho & Corasick (1975)
Difficulty: Advanced Time estimate: 2 weeks Prerequisites: Project 2, understanding of tries and FSMs
Real world outcome:
$ ./aho_corasick keywords.txt document.txt
Loaded 1000 patterns from keywords.txt
Searching document.txt (5MB)...
Found "error" at position 1234
Found "warning" at position 1235
Found "error" at position 5678
...
Total: 4523 matches
Search completed in 45ms
Automaton visualization:
States: 4523
Transitions: 12456
Failure links: 4522
Benchmark:
1000 patterns, 5MB file:
Naive (run each pattern): 12,000ms
Aho-Corasick: 45ms
Speedup: 266x!
Implementation Hints: Build a trie from all patterns. Then compute “failure links”—for each state, where do you go when the next character doesn’t match any outgoing edge? The failure link points to the longest proper suffix of the current path that is also a prefix of some pattern.
Key insight: You never backtrack in the input. Each character is examined exactly once. The failure links let you “teleport” to the right state.
Learning milestones:
- Trie construction correct → You understand prefix trees
- Failure links computed correctly → You understand the KMP connection
- Finds all overlapping matches → You understand the output function
Phase 2: Regular Expression Engines
Project 4: NFA Regex Engine (Thompson’s Construction)
- File: TEXT_SEARCH_TOOLS_DEEP_DIVE.md
- Main Programming Language: C
- Alternative Programming Languages: Rust, Go, Zig
- Coolness Level: Level 5: Pure Magic
- Business Potential: 1. The “Resume Gold” (Educational/Personal Brand)
- Difficulty: Level 4: Expert
- Knowledge Area: Compilers / Automata Theory
- Software or Tool: Custom Implementation
- Main Book: “Compilers: Principles, Techniques, and Tools” by Aho, Sethi, Ullman (Dragon Book)
What you’ll build: A regex engine using Thompson’s NFA construction. Parse a regex, convert to NFA, and simulate the NFA on input text.
Why it teaches search tools: This is the foundation of RE2, Rust’s regex, and one mode of GNU grep. Understanding NFA simulation is crucial to understanding why some regex engines are fast and others have “catastrophic backtracking.”
Core challenges you’ll face:
- Regex parsing → maps to recursive descent parsing
- NFA construction (concatenation, alternation, Kleene star) → maps to automata construction
- NFA simulation (tracking multiple states) → maps to set operations
- ε-closure computation → maps to graph traversal
Key Concepts:
- Thompson’s Construction: “Compilers: Principles, Techniques, and Tools” Chapter 3.7 - Dragon Book
- NFA Simulation: “Regular Expression Matching Can Be Simple And Fast” - Russ Cox (swtch.com/~rsc/regexp/regexp1.html)
- ε-transitions: How to handle “empty” transitions efficiently
- State Set Representation: Bit vectors for O(1) state operations
Difficulty: Expert Time estimate: 2-3 weeks Prerequisites: Projects 1-3, understanding of automata theory basics
Real world outcome:
$ ./nfa_regex "a(b|c)*d" input.txt
Parsing regex: a(b|c)*d
NFA states: 9
NFA transitions: 11
Found match "abcd" at position 100
Found match "abbbcd" at position 200
Found match "acbcbd" at position 300
Search completed in 15ms
NFA visualization (DOT format):
0 -a-> 1
1 -ε-> 2
1 -ε-> 4
1 -ε-> 7
2 -b-> 3
4 -c-> 5
3 -ε-> 6
5 -ε-> 6
6 -ε-> 1
6 -ε-> 7
7 -d-> 8 (accepting)
Pathological test:
Pattern: (a+)+$
Input: "aaaaaaaaaaaaaaaaaaaaaaaaaaaa!"
Time: 0.1ms # Still fast! No backtracking!
Implementation Hints: Each regex operator becomes an NFA fragment:
- Literal ‘a’: State with single outgoing edge labeled ‘a’
- Concatenation AB: Connect A’s accepting state to B’s start state with ε
- Alternation A|B: New start state with ε to both A and B starts
- Kleene star A*: ε to A, ε from A’s accept back to start, ε to skip A entirely
Simulation: Maintain a SET of current states. For each input character, compute the set of states reachable by following that character, then compute the ε-closure. Accept if final set contains an accepting state.
Learning milestones:
- Parses regex correctly → You understand recursive descent
- Constructs NFA correctly → You understand Thompson’s algorithm
- No catastrophic backtracking → You understand why NFAs are O(n×m)
Project 5: DFA Regex Engine with Lazy Construction
- File: TEXT_SEARCH_TOOLS_DEEP_DIVE.md
- Main Programming Language: C
- Alternative Programming Languages: Rust, C++, Go
- Coolness Level: Level 5: Pure Magic
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 4: Expert
- Knowledge Area: Compilers / Automata Theory
- Software or Tool: Custom Implementation
- Main Book: “Introduction to the Theory of Computation” by Michael Sipser
What you’ll build: Convert NFA to DFA using the subset construction, but build states lazily (on-demand) to avoid exponential blowup. Cache states for reuse.
Why it teaches search tools: This is how production regex engines like RE2 and Rust’s regex work. DFA simulation is a simple table lookup per character—O(n) guaranteed. But DFAs can have exponentially many states, so lazy construction is essential.
Core challenges you’ll face:
- Subset construction algorithm → maps to powerset of NFA states
- State caching/memoization → maps to hash tables for state sets
- Cache eviction under memory pressure → maps to LRU caches
- Handling exponential blowup → maps to falling back to NFA
Key Concepts:
- Subset Construction: “Introduction to the Theory of Computation” Chapter 1.2 - Sipser
- Lazy DFA: “Regular Expression Matching: the Virtual Machine Approach” - Russ Cox
- State Set Hashing: Efficiently comparing and storing state sets
- Cache Management: When to evict cached DFA states
Difficulty: Expert Time estimate: 2-3 weeks Prerequisites: Project 4 (NFA Engine)
Real world outcome:
$ ./dfa_regex "a(b|c)*d" input.txt
NFA states: 9
DFA cache: lazy construction enabled
Searching...
DFA state 0 (NFA: {0}) -> 'a' -> DFA state 1 (NFA: {1,2,4,7})
DFA state 1 -> 'b' -> DFA state 2 (NFA: {1,2,3,4,6,7})
[States constructed on demand]
Found match "abcd" at position 100
Search completed in 5ms (vs 15ms NFA simulation)
Cache statistics:
DFA states constructed: 4
DFA states possible (worst case): 512 (2^9)
Cache hits: 45,231
Cache misses: 4
Benchmark:
NFA simulation: 15ms
Lazy DFA: 5ms
Full DFA (if we built it all): 3ms but 2MB memory
Implementation Hints: Each DFA state represents a SET of NFA states. When you see character ‘c’ in DFA state S, compute which NFA states are reachable from any state in S via ‘c’, take the ε-closure, and that’s your new DFA state.
The trick: Use a hash table to check if you’ve seen this state set before. If so, reuse the existing DFA state. If not, create a new one and cache it.
For the cache, use LRU eviction—if memory gets tight, throw away the least-recently-used DFA states. They’ll be reconstructed if needed.
Learning milestones:
- Subset construction produces correct DFA → You understand the powerset
- Lazy construction avoids memory blowup → You understand caching
- Performance matches or beats NFA → You understand the tradeoff
Project 6: Literal Extraction Optimizer
- File: TEXT_SEARCH_TOOLS_DEEP_DIVE.md
- Main Programming Language: Rust (or C)
- Alternative Programming Languages: C++, Go
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 3: Advanced
- Knowledge Area: Regex Optimization
- Software or Tool: Custom Implementation
- Main Book: “Regular Expression Matching” by Russ Cox (online articles)
What you’ll build: Analyze a regex to extract literal strings that MUST appear in any match. Use these literals to quickly skip non-matching regions before running the full regex.
Why it teaches search tools: This is ripgrep’s secret weapon. Most regexes have literal prefixes, suffixes, or required substrings. Find those with memchr/SIMD, THEN run the regex engine. This is why ripgrep searching for function.*async is almost as fast as searching for function.
Core challenges you’ll face:
- Prefix extraction → maps to “what literal MUST the match start with?”
- Suffix extraction → maps to “what literal MUST the match end with?”
- Required substring extraction → maps to “what literal MUST appear somewhere?”
- Handling alternation → `(foo|bar).*baz` has no common prefix, but “baz” is required
Key Concepts:
- Literal Extraction: regex-syntax crate documentation (Rust)
- Prefix/Suffix Analysis: Walk the regex AST looking for required literals
- Prefiltering: Run memchr on literals, verify with full regex
Difficulty: Advanced Time estimate: 1-2 weeks Prerequisites: Projects 4-5, understanding of regex ASTs
Real world outcome:
$ ./literal_optimizer "function\s+\w+\s*\(.*async" input.js
Regex analysis:
Prefix literals: ["function"]
Suffix literals: ["async"]
Required substrings: ["function", "async"]
Strategy:
1. memchr for "function"
2. If found, check if "async" exists after it
3. If both found, run full regex on candidate region
Benchmark (10MB JavaScript file):
Full regex scan: 150ms
Literal-optimized: 8ms
Speedup: 18.75x
Analysis of common patterns:
"TODO": 100% literal, use memchr directly
"error.*line \d+": prefix "error", can skip most text
"[a-z]+@[a-z]+\.[a-z]+": no useful literals (email regex)
"\berror\b": literal "error", boundary check at matches
Implementation Hints: Walk the regex syntax tree:
- Literal nodes: Accumulate into current prefix/suffix
- Concatenation: Prefix of A, suffix of B
- Alternation: Common prefix/suffix of all branches (may be empty)
- Kleene star: Can be skipped, so it “resets” the required substring
The extracted literals are passed to memchr (for single byte), memmem (for substring), or Aho-Corasick (for multiple literals).
Learning milestones:
- Correctly extracts prefix from simple regexes → You understand the concept
- Handles alternation and quantifiers → You understand the complexity
- 10x+ speedup on real regexes → You’ve internalized the optimization
Phase 3: SIMD and Low-Level Optimization
Project 7: SIMD memchr Implementation
- File: TEXT_SEARCH_TOOLS_DEEP_DIVE.md
- Main Programming Language: C (with intrinsics)
- Alternative Programming Languages: Rust, Assembly
- Coolness Level: Level 5: Pure Magic
- Business Potential: 4. The “Open Core” Infrastructure
- Difficulty: Level 4: Expert
- Knowledge Area: SIMD / Low-Level Optimization
- Software or Tool: SSE2/AVX2 Intrinsics
- Main Book: “Intel Intrinsics Guide” (online reference)
What you’ll build: A SIMD-accelerated memchr that finds a byte in a buffer at 15+ bytes per CPU cycle using SSE2/AVX2.
Why it teaches search tools: memchr is the foundation of all fast search. If you can find a byte at 15 bytes/cycle, you can prefilter gigabytes of text per second, only running the expensive regex engine on the tiny fraction that might match.
Core challenges you’ll face:
- SIMD vector loading (_mm_loadu_si128) → maps to loading 16 bytes at once
- Parallel comparison (_mm_cmpeq_epi8) → maps to comparing 16 bytes simultaneously
- Mask extraction (_mm_movemask_epi8) → maps to converting results to bitmask
- Bit scanning (__builtin_ctz) → maps to finding first match in mask
- Alignment handling → maps to memory alignment requirements
Key Concepts:
- SSE2 Intrinsics: Intel Intrinsics Guide (intel.com)
- Vector Comparison: Compare 16/32 bytes in a single instruction
- Movemask: Convert comparison result to 16/32-bit integer
- Bit Manipulation: Find first set bit in mask
Difficulty: Expert Time estimate: 1 week Prerequisites: Understanding of pointers, basic assembly concepts
Real world outcome:
$ ./simd_memchr '\n' large_file.txt
Searching for newlines in 1GB file...
Implementation comparison:
Naive (byte-by-byte): 1200ms (0.83 GB/s)
Standard memchr: 150ms (6.67 GB/s)
SSE2 memchr: 80ms (12.5 GB/s)
AVX2 memchr: 45ms (22.2 GB/s)
Throughput: 22.2 GB/s = ~9 bytes per cycle on 2.5GHz CPU
Disassembly of hot loop:
vmovdqu ymm0, [rdi] ; Load 32 bytes
vpcmpeqb ymm1, ymm0, ymm2 ; Compare with search byte
vpmovmskb eax, ymm1 ; Extract comparison mask
test eax, eax ; Any matches?
jnz found ; If yes, handle match
add rdi, 32 ; Advance pointer
jmp loop ; Continue
Implementation Hints:
// SSE2 memchr: 16 bytes per iteration, scalar tail for the remainder
#include <emmintrin.h>  // SSE2 intrinsics

const char *simd_memchr(const char *ptr, const char *end, char c) {
    __m128i needle = _mm_set1_epi8(c);  // Broadcast search byte to all 16 lanes
    while (ptr + 16 <= end) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)ptr); // Load 16 bytes (unaligned)
        __m128i cmp = _mm_cmpeq_epi8(chunk, needle);           // Compare all 16 at once
        int mask = _mm_movemask_epi8(cmp);                     // One result bit per lane
        if (mask != 0)
            return ptr + __builtin_ctz(mask);                  // First set bit = first match
        ptr += 16;
    }
    while (ptr < end) {                 // Handle remaining bytes with scalar loop
        if (*ptr == c) return ptr;
        ptr++;
    }
    return NULL;
}
Learning milestones:
- SSE2 version works correctly → You understand SIMD basics
- AVX2 version achieves 20+ GB/s → You understand wider vectors
- Handles alignment and tail bytes → You understand edge cases
Project 8: SIMD Substring Search (memmem)
- File: TEXT_SEARCH_TOOLS_DEEP_DIVE.md
- Main Programming Language: C (with intrinsics) or Rust
- Alternative Programming Languages: C++, Assembly
- Coolness Level: Level 5: Pure Magic
- Business Potential: 4. The “Open Core” Infrastructure
- Difficulty: Level 4: Expert
- Knowledge Area: SIMD / String Algorithms
- Software or Tool: SSE2/AVX2 Intrinsics
- Main Book: “Intel Intrinsics Guide” + Hyperscan papers
What you’ll build: A SIMD-accelerated substring search using the “two-byte trick” or similar techniques. Find candidates with SIMD, verify with memcmp.
Why it teaches search tools: memmem is what you use for literal substrings. The two-byte trick: search for the first TWO bytes of the needle simultaneously, then verify the rest. This is often faster than full Boyer-Moore for short needles.
Core challenges you’ll face:
- Two-byte candidate finding → maps to reducing false positives
- Vectorized candidate verification → maps to batched memcmp
- Handling variable needle lengths → maps to algorithm selection
- Combining with Boyer-Moore for long needles → maps to hybrid approaches
Key Concepts:
- Two-Way Algorithm: Used by glibc memmem
- SIMD Candidate Finding: Search for first+last byte simultaneously
- False Positive Rate: Why two bytes is much better than one byte
- Hybrid Strategies: Different algorithms for different needle lengths
Difficulty: Expert Time estimate: 1-2 weeks Prerequisites: Project 7 (SIMD memchr)
Real world outcome:
$ ./simd_memmem "function" source_code.txt
Searching for "function" in 100MB file...
Algorithm selection:
Needle length: 8 bytes
Strategy: SIMD two-byte + verify
Benchmark:
Naive strstr: 250ms
glibc memmem: 45ms
SIMD two-byte: 12ms
Boyer-Moore: 18ms (better for longer needles)
False positive analysis:
First byte 'f' matches: 2,345,678 times
First two bytes "fu" match: 12,345 times (190x reduction!)
Full "function" matches: 8,234 times
Disassembly insight:
Searching for 'f' AND 'n' (first and last byte) simultaneously
reduces candidates by 99.5%
Implementation Hints: The two-byte trick: Instead of searching for just the first byte, search for the first AND last byte of the needle at the correct offset. If needle is “hello” (length 5), search for positions where byte[i] == ‘h’ AND byte[i+4] == ‘o’. This dramatically reduces false positives.
With SIMD, you can check 16-32 positions simultaneously for both conditions, then AND the masks together.
Learning milestones:
- Two-byte search works correctly → You understand the optimization
- Faster than Boyer-Moore for short needles → You understand the tradeoff
- Hybrid approach selects best algorithm → You understand adaptive strategies
Project 9: Teddy Algorithm (Multi-Literal SIMD)
- File: TEXT_SEARCH_TOOLS_DEEP_DIVE.md
- Main Programming Language: Rust
- Alternative Programming Languages: C++
- Coolness Level: Level 5: Pure Magic
- Business Potential: 4. The “Open Core” Infrastructure
- Difficulty: Level 5: Master
- Knowledge Area: SIMD / Multiple Pattern Matching
- Software or Tool: SSSE3/AVX2 (PSHUFB instruction)
- Main Book: Hyperscan papers + Rust memchr crate source code
What you’ll build: The Teddy algorithm from Intel’s Hyperscan, which finds multiple short literal patterns simultaneously using the PSHUFB instruction for parallel table lookups.
Why it teaches search tools: Teddy is what ripgrep uses when you have a small number of literal patterns (up to ~64). It “completely blows away the competition for short substrings” according to ripgrep’s author.
Core challenges you’ll face:
- PSHUFB as a lookup table → maps to parallel 16-way lookup
- Fingerprint-based candidate detection → maps to bloom filter-like behavior
- Bucket management → maps to handling false positives
- Verification phase → maps to confirming actual matches
Key Concepts:
- PSHUFB Instruction: Use SIMD register as 16-entry lookup table
- Fingerprinting: Map patterns to fingerprints, search for fingerprints
- Bucket Disambiguation: Multiple patterns can share a fingerprint
- Teddy Paper: “Teddy: An Efficient SIMD-based Literal Matching Engine” - Intel
Difficulty: Master Time estimate: 3-4 weeks Prerequisites: Projects 7-8, deep SIMD understanding
Real world outcome:
$ ./teddy_search patterns.txt haystack.txt
Loading 32 patterns from patterns.txt...
Patterns: ["error", "warning", "fatal", "debug", ...]
Fingerprint table constructed:
Bucket 0: ["error", "event"]
Bucket 3: ["warning"]
Bucket 7: ["fatal", "fail"]
...
Searching 1GB haystack...
Candidates found: 145,234
Verified matches: 12,456
Search time: 89ms (11.2 GB/s throughput)
Benchmark comparison (32 patterns, 1GB file):
Aho-Corasick: 450ms
Teddy SIMD: 89ms
Speedup: 5x
Note: Teddy wins for small pattern sets (<~64 patterns)
Aho-Corasick wins for large pattern sets (1000+)
Implementation Hints: The key insight is that PSHUFB lets you do 16 parallel lookups in a single instruction. Each byte of the input indexes into a 16-byte table.
For each input position, you check if the first 3 bytes match any pattern’s first 3 bytes by having separate lookup tables for each byte position and ANDing the results.
The “fingerprint” is which patterns might match at this position. You then verify the full patterns in a scalar loop.
Learning milestones:
- PSHUFB-based lookup works → You understand the core trick
- Multiple patterns searched simultaneously → You understand fingerprinting
- Faster than Aho-Corasick for small sets → You’ve achieved the goal
Phase 4: File System and I/O Optimization
Project 10: Parallel Directory Traversal
- File: TEXT_SEARCH_TOOLS_DEEP_DIVE.md
- Main Programming Language: Rust
- Alternative Programming Languages: C++, Go
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 3: Advanced
- Knowledge Area: Concurrency / File Systems
- Software or Tool: Custom Implementation (or study ignore crate)
- Main Book: “The Art of Multiprocessor Programming” by Herlihy & Shavit
What you’ll build: A lock-free parallel directory walker that uses a work-stealing queue to distribute directory traversal across multiple threads.
Why it teaches search tools: Directory traversal is often the bottleneck, not the search itself. fd and ripgrep use parallel traversal to saturate modern NVMe SSDs. A single thread can’t keep up with drives that deliver 500K+ IOPS.
Core challenges you’ll face:
- Work-stealing queues → maps to load balancing
- Lock-free synchronization → maps to avoiding contention
- Directory entry batching → maps to reducing syscall overhead
- Respecting .gitignore → maps to ignore pattern matching
Key Concepts:
- Work-Stealing: “The Art of Multiprocessor Programming” Chapter 16 - Herlihy & Shavit
- Crossbeam Deque: Rust’s work-stealing deque implementation
- getdents64: Batched directory reading syscall
- Gitignore Semantics: Complex rules about inheritance and negation
Difficulty: Advanced Time estimate: 2 weeks Prerequisites: Basic threading, understanding of file systems
Real world outcome:
$ ./parallel_walker /home/user/code
Walking 2.1 million files across 150,000 directories...
Thread 0: 523,456 files processed
Thread 1: 518,234 files processed
Thread 2: 530,123 files processed
Thread 3: 528,187 files processed
Total time: 1.2 seconds
Throughput: 1.75 million files/second
Benchmark comparison:
Single-threaded: 4.8 seconds
4 threads: 1.2 seconds
Speedup: 4x (linear with thread count!)
Work-stealing statistics:
Steals performed: 1,234
Average queue depth: 45 directories
Implementation Hints: Use a deque per thread. When a thread discovers subdirectories, push them to its own deque. When a thread’s deque is empty, steal from another thread’s deque (from the opposite end to reduce contention).
For .gitignore, you need a stack of ignore matchers—each directory can have its own .gitignore that adds to the parent’s rules. Use a rope-like structure or thread-local copies to avoid contention.
Learning milestones:
- Basic parallel traversal works → You understand work distribution
- Work-stealing balances load → You understand dynamic scheduling
- Respects .gitignore correctly → You understand the complexity
Project 11: mmap vs read Strategy Selector
- File: TEXT_SEARCH_TOOLS_DEEP_DIVE.md
- Main Programming Language: C
- Alternative Programming Languages: Rust, C++
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 3: Advanced
- Knowledge Area: Operating Systems / I/O
- Software or Tool: Custom Implementation
- Main Book: “Linux System Programming” by Robert Love
What you’ll build: A file reading abstraction that automatically chooses between mmap and buffered read based on file size, access pattern, and system characteristics.
Why it teaches search tools: ripgrep “chooses the best searching strategy for you automatically.” For single large files, mmap wins (no copy into userspace). For many small files, buffered read wins (lower overhead per file).
Core challenges you’ll face:
- mmap overhead measurement → maps to page table setup cost
- read buffer sizing → maps to syscall amortization
- Page fault handling → maps to understanding mmap tradeoffs
- File size heuristics → maps to when to switch strategies
Key Concepts:
- mmap Mechanics: “Linux System Programming” Chapter 9 - Robert Love
- Page Faults: Demand paging and its costs
- Buffered I/O: “Advanced Programming in the UNIX Environment” Chapter 5 - Stevens & Rago
- Linus on mmap: Understanding the “virtual memory mapping is very expensive in itself” quote
Difficulty: Advanced Time estimate: 1 week Prerequisites: Basic systems programming, understanding of virtual memory
Real world outcome:
$ ./io_selector --benchmark
Benchmarking I/O strategies...
Single 1GB file:
mmap: 120ms (8.33 GB/s)
read (64KB): 145ms (6.90 GB/s)
Winner: mmap (17% faster)
1000 files, 1MB each:
mmap: 850ms (1.18 GB/s)
read (64KB): 320ms (3.12 GB/s)
Winner: read (62% faster)
10000 files, 10KB each:
mmap: 2100ms (0.05 GB/s)
read (64KB): 180ms (0.56 GB/s)
Winner: read (10x faster!)
Heuristic derived:
Use mmap when: file_size > 1MB AND file_count < 10
Use read otherwise
Crossover point: ~500KB per file
Auto-selection in action:
./io_selector search "pattern" ./small_files/ → using read
./io_selector search "pattern" huge_file.log → using mmap
Implementation Hints: mmap’s overhead is per-file (page table setup, TLB shootdowns on unmap). This fixed cost is amortized over large files but dominates for small files.
read’s overhead is per-syscall, but you can amortize it with large buffers. However, you pay a memory copy cost that mmap avoids.
The crossover depends on hardware—measure on target systems and encode heuristics.
Learning milestones:
- Both strategies implemented correctly → You understand the mechanics
- Benchmark identifies crossover point → You understand the tradeoffs
- Auto-selection matches or beats manual choice → You’ve built a real optimization
Project 12: Line-Oriented Search with SIMD Line Counting
- File: TEXT_SEARCH_TOOLS_DEEP_DIVE.md
- Main Programming Language: C or Rust
- Alternative Programming Languages: C++, Zig
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 3: Advanced
- Knowledge Area: SIMD / Text Processing
- Software or Tool: Custom Implementation
- Main Book: “Intel Intrinsics Guide”
What you’ll build: A SIMD-accelerated newline counter and line boundary finder that can identify line starts/ends at gigabytes per second.
Why it teaches search tools: grep is LINE-oriented—you need to output the line containing the match. Finding line boundaries fast is critical. ripgrep uses SIMD to count newlines, which is why it can report line numbers without significant overhead.
Core challenges you’ll face:
- SIMD newline counting → maps to POPCNT on comparison masks
- Line boundary tracking → maps to maintaining byte offsets
- Interleaving with search → maps to fusion of operations
- Handling \r\n vs \n → maps to Windows compatibility
Key Concepts:
- SIMD POPCNT: Count bits set in comparison mask
- Prefix Sum: Computing line numbers from newline counts
- Line Caching: Store line boundaries for random access
- Streaming Processing: Process file in chunks without full materialization
Difficulty: Advanced Time estimate: 1 week Prerequisites: Project 7 (SIMD memchr)
Real world outcome:
$ ./line_counter huge_log.txt
File size: 5.2 GB
Lines: 124,567,890
Benchmark:
Naive (byte loop): 12.5 seconds
wc -l: 3.2 seconds
SIMD line count: 0.45 seconds (11.6 GB/s)
With line boundary tracking:
Line 50000000 starts at byte offset 2,345,678,901
Lookup time: 0.001ms (from precomputed index)
Integration with search:
Pattern found at byte 123456789
→ Line number 567890 (computed in 0.01ms)
→ Line content: "Error: connection timeout..."
Implementation Hints: Use SIMD memchr to find newlines. For each 16/32-byte chunk, get the comparison mask and use POPCNT to count the bits set (= number of newlines in that chunk).
For line boundary tracking, either store every Nth line’s offset (sparse index) or compute prefix sums on the fly.
To output context lines, you need to find the line boundaries AROUND a match, not just count lines up to it.
Learning milestones:
- SIMD newline counting correct and fast → You understand the basics
- Can find line containing a byte offset → You understand the application
- Performance matches ripgrep’s line counting → You’ve achieved production quality
Phase 5: Fuzzy Matching
Project 13: Fuzzy String Matcher (fzf-style)
- File: TEXT_SEARCH_TOOLS_DEEP_DIVE.md
- Main Programming Language: Go
- Alternative Programming Languages: Rust, C++, Python
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 2. The “Micro-SaaS / Pro Tool” (Solo-Preneur Potential)
- Difficulty: Level 3: Advanced
- Knowledge Area: Algorithms / String Matching
- Software or Tool: Custom Implementation
- Main Book: fzf source code (algo.go)
What you’ll build: A fuzzy matcher that finds items matching a pattern with characters in order but not necessarily contiguous. Score matches by quality (gaps, position of matches, word boundaries).
Why it teaches search tools: fzf revolutionized interactive file selection. Understanding its scoring algorithm helps you build better developer tools. The trick is balancing quality (finding the best match) with speed (interactive responsiveness).
Core challenges you’ll face:
- Subsequence matching → maps to dynamic programming
- Scoring function design → maps to heuristics for “good” matches
- Word boundary bonuses → maps to camelCase and snake_case awareness
- Performance optimization → maps to early termination, SIMD for candidate filtering
Key Concepts:
- Longest Common Subsequence: Base algorithm for fuzzy matching
- Scoring Heuristics: Position bonuses, gap penalties, boundary bonuses
- Smith-Waterman: Scoring inspiration from bioinformatics
- fzf algo.go: The actual implementation to study
Difficulty: Advanced Time estimate: 2 weeks Prerequisites: Dynamic programming, string algorithms
Real world outcome:
$ ./fuzzy_matcher
Input: fzf
Candidates: [fuzzy_finder, frozen_leaf, fizz_factory, FuzzyZoneFinder]
Matches (sorted by score):
1. FuzzyZoneFinder score=95 (f-z-f at word boundaries!)
2. fuzzy_finder score=78 (f-z-f with small gaps)
3. fizz_factory score=45 (f-z at start, second f after a gap)
4. frozen_leaf score=32 (f-z-f with large gaps)
Score breakdown for "FuzzyZoneFinder":
'F' at position 0: +40 (start bonus)
'Z' at position 5: +30 (word boundary, capital letter)
'F' at position 9: +28 (camelCase boundary)
Gap penalty: -3 (3 chars between Z and F)
Total: 95
Performance:
10,000 candidates: 2ms
100,000 candidates: 18ms
1,000,000 candidates: 150ms
Implementation Hints: For each candidate, find the best way to match the pattern characters as a subsequence. Use dynamic programming to track score at each (pattern_pos, candidate_pos).
Scoring heuristics (from fzf):
- Bonus for matching at start of word (after _, space, or capital)
- Bonus for matching after a camelCase boundary
- Penalty for gaps between matches
- Bonus for consecutive matches
The first character match gets extra weight because it’s likely intentional.
Learning milestones:
- Basic subsequence matching works → You understand the problem
- Scoring produces intuitive rankings → You understand heuristics
- Fast enough for interactive use (<100ms for 100K items) → You understand optimization
Project 14: Interactive Fuzzy Finder (Full fzf Clone)
- File: TEXT_SEARCH_TOOLS_DEEP_DIVE.md
- Main Programming Language: Go
- Alternative Programming Languages: Rust, C++
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 2. The “Micro-SaaS / Pro Tool” (Solo-Preneur Potential)
- Difficulty: Level 3: Advanced
- Knowledge Area: TUI / Concurrency
- Software or Tool: Custom Implementation
- Main Book: fzf source code
What you’ll build: A complete interactive fuzzy finder with: terminal UI, incremental input, parallel matching with worker goroutines, result streaming, preview pane.
Why it teaches search tools: fzf shows how to build responsive interactive tools. The architecture—input reader, matcher pool, result merger, renderer—is a pattern you’ll use in many tools.
Core challenges you’ll face:
- Terminal UI (raw mode, escape sequences) → maps to TUI rendering
- Incremental search (update as you type) → maps to debouncing and cancellation
- Parallel matching → maps to worker pools and result merging
- Streaming large inputs → maps to handling unbounded data
Key Concepts:
- Terminal Raw Mode: Reading characters without waiting for Enter
- ANSI Escape Codes: Cursor movement, colors, clearing
- Worker Goroutines: Parallel matching with cancellation
- Result Merging: Combining ranked results from multiple workers
Difficulty: Advanced Time estimate: 3-4 weeks Prerequisites: Project 13, terminal programming basics
Real world outcome:
$ find . -type f | ./mini_fzf
[Interactive TUI appears]
> src/ma
4/1234
> src/main.rs
src/matcher.go
src/main_test.go
src/map.rs
Features:
- Type to filter
- Arrow keys to navigate
- Enter to select
- Ctrl-C to cancel
- Preview pane (--preview) shows file content
Performance:
- 100K items loaded in 50ms
- Typing latency <10ms (feels instant)
- Matches update incrementally as you type
Implementation Hints: Architecture (from fzf):
- Reader: Reads input lines, chunks them, sends to matcher
- Matcher: Pool of workers that score chunks against current pattern
- Merger: Combines worker results, maintains top-K
- Terminal: Renders results, handles input
When the pattern changes, cancel in-flight matching and start new match. Use channels for communication between components.
Learning milestones:
- Basic TUI works (type and see results) → You understand terminal rendering
- Results update smoothly while typing → You understand incremental search
- Handles millions of items smoothly → You understand the architecture
Phase 6: Putting It All Together
Project 15: Build Your Own ripgrep
- File: TEXT_SEARCH_TOOLS_DEEP_DIVE.md
- Main Programming Language: Rust
- Alternative Programming Languages: C++, Go
- Coolness Level: Level 5: Pure Magic
- Business Potential: 4. The “Open Core” Infrastructure
- Difficulty: Level 5: Master
- Knowledge Area: Systems Programming / Everything Combined
- Software or Tool: Custom Implementation
- Main Book: ripgrep source code + all previous resources
What you’ll build: A production-quality grep with: parallel directory traversal, .gitignore support, SIMD-accelerated literal search, regex support with literal extraction, colored output, context lines.
Why it teaches search tools: This is the capstone project. You’ll integrate everything: parallel I/O, SIMD algorithms, regex engines, output formatting. The result is a tool YOU built that can search gigabytes per second.
Core challenges you’ll face:
- Integrating all previous components → maps to system architecture
- Memory efficiency → maps to avoiding unnecessary allocations
- Correctness edge cases → maps to handling binary files, encoding issues
- User experience polish → maps to colors, progress, error messages
Key Concepts:
- Component Integration: How the pieces fit together
- Error Handling: Graceful degradation on permission errors, encoding issues
- Output Buffering: Line buffering vs block buffering for piped output
- Configuration: Command-line argument parsing, config files
Difficulty: Master Time estimate: 2-3 months Prerequisites: All previous projects
Real world outcome:
$ ./mygrep "function.*async" ~/code --stats
Searching 150,234 files...
src/server/api.ts:145: async function handleRequest(req: Request) {
src/server/api.ts:234: export async function processData(data: Data) {
src/utils/fetch.ts:12: async function fetchWithRetry(url: string) {
...
Statistics:
Files searched: 150,234
Files matched: 1,234
Lines matched: 5,678
Time: 0.89s
Throughput: 2.1 GB/s
$ hyperfine './mygrep TODO ~/code' 'rg TODO ~/code'
mygrep: 0.95s ± 0.02s
rg: 0.87s ± 0.02s
Within 10% of ripgrep! 🎉
Implementation Hints: Start with a working but slow version, then optimize:
- Single-threaded, read-based file access, naive regex
- Add parallel directory traversal
- Add literal extraction and SIMD prefiltering
- Add mmap for large files
- Optimize output (buffer, avoid unnecessary work)
Profile continuously. The bottleneck shifts as you optimize—first it’s regex, then I/O, then output.
Learning milestones:
- Correct results on all test cases → You understand the requirements
- Within 2x of ripgrep performance → You understand optimization
- Users would actually use this → You’ve built something real
Project Comparison Table
| # | Project | Difficulty | Time | Depth of Understanding | Fun Factor |
|---|---|---|---|---|---|
| 1 | Naive String Search | Beginner | 2-3 hours | ⭐⭐ | ⭐⭐ |
| 2 | Boyer-Moore | Int-Adv | 1 week | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| 3 | Aho-Corasick | Advanced | 2 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| 4 | NFA Regex Engine | Expert | 2-3 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| 5 | DFA Regex (Lazy) | Expert | 2-3 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| 6 | Literal Extraction | Advanced | 1-2 weeks | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| 7 | SIMD memchr | Expert | 1 week | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| 8 | SIMD memmem | Expert | 1-2 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| 9 | Teddy Algorithm | Master | 3-4 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| 10 | Parallel Dir Traversal | Advanced | 2 weeks | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| 11 | mmap vs read Selector | Advanced | 1 week | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| 12 | SIMD Line Counting | Advanced | 1 week | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| 13 | Fuzzy Matcher | Advanced | 2 weeks | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| 14 | Interactive fzf Clone | Advanced | 3-4 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| 15 | Build Your Own ripgrep | Master | 2-3 months | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
Recommended Learning Path
Path A: Algorithm Focus (Theory → Implementation)
- Projects 1-3 (String Matching Algorithms)
- Projects 4-5 (Regex Engines)
- Project 6 (Literal Extraction)
- Project 15 (Integration)
Path B: Performance Focus (Make It Fast)
- Project 1 (Baseline)
- Projects 7-8 (SIMD Fundamentals)
- Projects 10-11 (I/O Optimization)
- Project 15 (Integration)
Path C: Tools Focus (Build Useful Things)
- Projects 1-2 (Basic Algorithms)
- Projects 10-11 (Parallel I/O)
- Projects 13-14 (Fuzzy Finder)
- Project 15 (ripgrep Clone)
Why Each Tool Is Fast: Summary
GNU grep
- Boyer-Moore + DFA/NFA hybrid
- Hand-tuned inner loop
- Falls back to NFA for complex patterns
- Decades of optimization
The Silver Searcher (ag)
- mmap + pthreads + PCRE JIT
- First to respect .gitignore by default
- Binary search for ignore patterns
- Up to 8 worker threads
ripgrep
- Finite automata regex (guaranteed linear time)
- Teddy SIMD algorithm for literals
- Lock-free parallel directory traversal
- Automatic mmap vs read selection
- UTF-8 decoding built into DFA
- Literal extraction for regex optimization
fd
- Parallel directory traversal
- Uses ripgrep’s regex and ignore crates
- Smart defaults (ignore hidden, respect .gitignore)
- 10-23x faster than find for pattern matching
fzf
- Custom fuzzy matching with scoring heuristics
- Worker goroutine pool for parallel matching
- Incremental search with result streaming
- Two algorithm modes: v1 (fast) and v2 (optimal)
git ls-files (bonus)
- Reads from Git’s index file directly
- No directory traversal at all!
- 5x+ faster than any directory walker
- Only works inside Git repos
Key Insights
- Most “regex” searches are literal: The fastest path is finding literals with SIMD, then verifying with the regex engine.
- DFA > NFA for guaranteed performance: Backtracking regex engines can take exponential time. Finite automata are O(n) always.
- Parallelization matters more than algorithm optimization: On modern SSDs, a single thread can’t saturate I/O. Use work-stealing queues.
- Skip what you don’t need: Respecting .gitignore often reduces the search space by 10-100x.
- Measure, don’t assume: mmap isn’t always faster. Boyer-Moore isn’t always better. Profile on real workloads.
- SIMD is a superpower: Comparing 16-32 bytes per instruction changes everything. memchr at 15 bytes/cycle is 15x faster than byte-by-byte.
Summary
| Project # | Project Name | Main Language |
|---|---|---|
| 1 | Naive String Search | C |
| 2 | Boyer-Moore String Search | C |
| 3 | Aho-Corasick Multi-Pattern | C |
| 4 | NFA Regex Engine | C |
| 5 | DFA Regex Engine (Lazy) | C |
| 6 | Literal Extraction Optimizer | Rust |
| 7 | SIMD memchr | C |
| 8 | SIMD memmem | C |
| 9 | Teddy Algorithm | Rust |
| 10 | Parallel Directory Traversal | Rust |
| 11 | mmap vs read Selector | C |
| 12 | SIMD Line Counting | C |
| 13 | Fuzzy String Matcher | Go |
| 14 | Interactive Fuzzy Finder | Go |
| 15 | Build Your Own ripgrep | Rust |
Sources
- ripgrep is faster than {grep, ag, git grep, ucg, pt, sift} - Andrew Gallant’s comprehensive benchmark
- ripgrep GitHub
- Ripgrep vs Grep Performance
- The Silver Searcher GitHub
- Boyer-Moore Algorithm (Wikipedia)
- Boyer-Moore at UT Austin
- Aho-Corasick Algorithm
- Aho-Corasick Rust Implementation
- Regular Expression Matching Can Be Simple And Fast - Russ Cox
- NFA vs DFA Regex Engines
- Hyperscan: A Fast Multi-pattern Regex Matcher for Modern CPUs
- Teddy Algorithm Paper
- memchr Rust Crate
- fd: A simple, fast alternative to find
- fd is 23x Faster Than find
- bfs 3.0: the fastest find yet
- fzf GitHub
- fzf Algorithm Source
- mmap vs read Performance
- Git ls-files is Faster Than fd and find
- Regex Literals Optimization
- GNU grep Development