Deep Dive: Text Search Tools (grep, ripgrep, ag, fzf, find, fd)
Understanding why some search tools are 10-100x faster than others requires diving deep into algorithms, system calls, parallelization, and modern CPU features. This guide takes you from fundamental string matching algorithms to building production-quality search tools.
The Landscape: Tool Comparison
| Tool | Year | Language | Regex Engine | Key Innovation |
|---|---|---|---|---|
| grep | 1974 | C | Hand-rolled DFA/NFA + Boyer-Moore | Original, Unix philosophy |
| egrep | 1976 | C | DFA | First deterministic regex |
| GNU grep | 1988 | C | Hybrid DFA/NFA + Boyer-Moore | Tuned Boyer-Moore |
| ack | 2005 | Perl | Perl regex | First "code-aware" grep |
| The Silver Searcher (ag) | 2011 | C | PCRE (JIT) | mmap + pthreads + gitignore |
| ripgrep (rg) | 2016 | Rust | regex-automata (DFA) | Teddy SIMD + parallel + Unicode |
| fd | 2017 | Rust | regex | Parallel traversal + gitignore |
| fzf | 2013 | Go | Custom fuzzy | Interactive + worker goroutines |
Why Performance Differs: The Core Factors
1. Regex Engine Architecture (Biggest Factor)
Backtracking (NFA) Engines: PCRE, Perl, Python, Java, .NET
- Can express everything (backreferences, lookahead)
- Catastrophic backtracking: Exponential time on pathological patterns
- Example: `(a+)+$` on `aaaaaaaaaaaaaaaaaaaaaaaaaaaa!` takes SECONDS
Finite Automata (DFA) Engines: RE2, Rust regex, GNU grep (partial)
- Linear time guaranteed: O(n) where n = input length
- Cannot do backreferences, but who uses those in grep?
- Key insight: "Only ever in one state at a time"
2. Literal Optimization (The Secret Weapon)
Most "regex" searches are actually literal strings or have literal prefixes:
- `TODO` → fully literal
- `function.*async` → literal prefix "function"
- `\berror\b` → literal "error" + boundary check
The Strategy: Find the literal fast (SIMD), THEN run the regex engine.
3. SIMD Vectorization
Modern CPUs can compare 16-32 bytes simultaneously:
- SSE2: 128-bit registers (16 bytes at once)
- AVX2: 256-bit registers (32 bytes at once)
The memchr crate achieves 15+ bytes per CPU cycle on finding a single byte.
4. Parallelization
- ag: pthreads, up to 8 workers
- ripgrep: Lock-free parallel directory traversal via crossbeam
- fzf: Worker goroutines for matching
5. I/O Strategy
- mmap: Map file directly into memory (good for single large files)
- read + buffer: Traditional buffered I/O (good for many small files)
- Adaptive: ripgrep chooses automatically
6. What NOT to Search
- Respect `.gitignore` (skip node_modules, build artifacts)
- Skip binary files
- Skip hidden files by default
Phase 1: String Matching Algorithms
Project 1: Naive String Search (Baseline)
- File: TEXT_SEARCH_TOOLS_DEEP_DIVE.md
- Main Programming Language: C
- Alternative Programming Languages: Rust, Zig, Go
- Coolness Level: Level 2: Practical but Forgettable
- Business Potential: 1. The "Resume Gold" (Educational/Personal Brand)
- Difficulty: Level 1: Beginner
- Knowledge Area: Algorithms / String Matching
- Software or Tool: Custom Implementation
- Main Book: "Introduction to Algorithms" by Cormen, Leiserson, Rivest, Stein (CLRS)
What you'll build: A brute-force string search that compares every position in the text against the pattern. This is your baseline to understand what we're optimizing.
Why it teaches search tools: You need to see the naive O(n×m) approach to appreciate why Boyer-Moore and SIMD matter. This is also what many standard library strstr() functions use!
Core challenges you'll face:
- Understanding worst case → pattern `aaaaab` in `aaaaaaaaaaaaaaaaaab` → maps to why preprocessing matters
- Character-by-character comparison → maps to CPU cache behavior
- Early termination → maps to optimization opportunities
Key Concepts:
- String Matching Basics: "Introduction to Algorithms" Chapter 32.1 - CLRS
- Big-O Analysis: Understanding O(n×m) worst case vs O(n) average case
- Pointer Arithmetic: "C Programming Language" Chapter 5 - K&R
Difficulty: Beginner Time estimate: 2-3 hours Prerequisites: Basic C, pointers
Real world outcome:
$ ./naive_search "needle" haystack.txt
Found at position 1234
Found at position 5678
Search completed in 45ms
$ ./naive_search "aaaaaab" pathological.txt
# Watch it struggle on worst-case input
Search completed in 2300ms # Much slower!
Benchmark:
Pattern "hello" in 1MB file:
Naive: 12ms
(We'll compare to Boyer-Moore later)
Implementation Hints:
For each position i in text:
For each position j in pattern:
If text[i+j] != pattern[j]:
Break (mismatch)
If j == pattern_length - 1:
Found match at position i
The key insight is that we re-examine the same characters multiple times. If we've seen "abcd" and the pattern is "abce", we know something about what the text contains; we shouldn't forget it!
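The pseudocode above can be written as a minimal C function. This is a sketch for the exercise, not a reference implementation; `naive_search` is a name chosen here:

```c
#include <stddef.h>
#include <string.h>

/* Return the index of the first occurrence of pattern in text,
   or -1 if absent. O(n*m) worst case: characters get re-examined
   at every alignment, which is exactly what later projects fix. */
long naive_search(const char *text, const char *pattern) {
    size_t n = strlen(text), m = strlen(pattern);
    if (m == 0) return 0;
    if (m > n) return -1;
    for (size_t i = 0; i + m <= n; i++) {      /* try every alignment */
        size_t j = 0;
        while (j < m && text[i + j] == pattern[j])
            j++;                               /* compare left to right */
        if (j == m)
            return (long)i;                    /* full match at i */
    }
    return -1;
}
```

Extending this to report every match (as the sample output does) just means continuing the outer loop instead of returning.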
Learning milestones:
- Implementation works correctly → You understand the basic problem
- Can identify worst-case inputs → You understand algorithm analysis
- See why this is slow → You're ready for optimization
Project 2: Boyer-Moore String Search
- File: TEXT_SEARCH_TOOLS_DEEP_DIVE.md
- Main Programming Language: C
- Alternative Programming Languages: Rust, C++, Zig
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 1. The "Resume Gold" (Educational/Personal Brand)
- Difficulty: Level 3: Advanced
- Knowledge Area: Algorithms / String Matching
- Software or Tool: Custom Implementation
- Main Book: "Algorithms on Strings, Trees, and Sequences" by Dan Gusfield
What you'll build: The Boyer-Moore algorithm with both the "bad character" and "good suffix" heuristics. This is what GNU grep uses internally.
Why it teaches search tools: Boyer-Moore is the foundation of fast string search. Its key insight, scanning backwards and skipping forward, is counterintuitive but brilliant. Longer patterns search FASTER because you can skip more.
Core challenges you'll face:
- Bad character table construction → maps to preprocessing patterns
- Good suffix table construction → maps to suffix analysis
- Rightmost occurrence lookup → maps to how much to skip
- Understanding why backwards scanning works → maps to algorithm intuition
Key Concepts:
- Bad Character Rule: "Algorithms on Strings, Trees, and Sequences" Chapter 2.2 - Gusfield
- Good Suffix Rule: "Algorithms on Strings, Trees, and Sequences" Chapter 2.2 - Gusfield
- Skip Tables: "The Boyer-Moore Fast String Searching Algorithm" - J. Strother Moore (original paper)
- Horspool Simplification: Boyer-Moore-Horspool variant that's simpler to implement
Difficulty: Intermediate-Advanced Time estimate: 1 week Prerequisites: Project 1 (Naive Search), understanding of suffix/prefix concepts
Real world outcome:
$ ./boyer_moore "substantially" novel.txt
Found at position 45231
Found at position 89012
Search completed in 3ms # vs 45ms naive!
Skip table visualization:
Pattern: "example"
Bad character table:
'e' -> 1 (rightmost position from end)
'l' -> 2
'p' -> 3
'm' -> 4
'a' -> 5
'x' -> 6
(all others) -> 7 (pattern length)
Benchmark comparison:
Pattern length 5: Naive 12ms, Boyer-Moore 8ms (1.5x faster)
Pattern length 15: Naive 15ms, Boyer-Moore 2ms (7.5x faster)
Pattern length 50: Naive 18ms, Boyer-Moore 0.4ms (45x faster!)
Key insight: Longer patterns = faster search!
Implementation Hints: The bad character rule: When a mismatch occurs at text[i] and pattern[j], shift the pattern so that the rightmost occurrence of text[i] in the pattern aligns with text[i]. If text[i] doesn't occur in the pattern, shift the entire pattern past position i.
The magic is that you compare from RIGHT TO LEFT within the pattern. When you mismatch, you've already seen characters to the right; use that information to skip forward!
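The bad-character idea can be sketched with the simpler Horspool variant mentioned above (bad-character shift only, no good-suffix table); `horspool_search` is an illustrative name:

```c
#include <stddef.h>
#include <string.h>

/* Boyer-Moore-Horspool: precompute a shift per byte value.
   Bytes not in the pattern shift the window by the full pattern length,
   which is why longer patterns search faster. */
long horspool_search(const char *text, const char *pattern) {
    size_t n = strlen(text), m = strlen(pattern);
    if (m == 0) return 0;
    if (m > n) return -1;

    size_t shift[256];
    for (size_t c = 0; c < 256; c++)
        shift[c] = m;                          /* default: skip whole pattern */
    for (size_t j = 0; j + 1 < m; j++)         /* all bytes except the last */
        shift[(unsigned char)pattern[j]] = m - 1 - j;

    size_t i = 0;
    while (i + m <= n) {
        size_t j = m;                          /* compare right to left */
        while (j > 0 && text[i + j - 1] == pattern[j - 1])
            j--;
        if (j == 0)
            return (long)i;
        i += shift[(unsigned char)text[i + m - 1]];  /* skip by last window byte */
    }
    return -1;
}
```

The full Boyer-Moore adds the good-suffix table on top of this skeleton.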
Learning milestones:
- Bad character heuristic works → You understand basic skip logic
- Good suffix heuristic works → You understand suffix analysis
- Longer patterns search faster → You've internalized the counterintuitive insight
Project 3: Aho-Corasick Multi-Pattern Matching
- File: TEXT_SEARCH_TOOLS_DEEP_DIVE.md
- Main Programming Language: C
- Alternative Programming Languages: Rust, C++, Go
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 3. The "Service & Support" Model
- Difficulty: Level 3: Advanced
- Knowledge Area: Algorithms / Automata Theory
- Software or Tool: Custom Implementation
- Main Book: "Introduction to Algorithms" by CLRS
What you'll build: An Aho-Corasick automaton that can search for multiple patterns simultaneously in a single pass through the text. This is what fgrep and antivirus scanners use.
Why it teaches search tools: When you have multiple patterns (like 1000 keywords to search for), naive approaches become O(n×m×k) where k is pattern count. Aho-Corasick is O(n + m + z) where z is the number of matches: linear regardless of pattern count!
Core challenges you'll face:
- Trie construction → maps to prefix tree data structures
- Failure link computation → maps to KMP failure function generalization
- Output function (suffix links) → maps to handling overlapping patterns
- State machine execution → maps to automata theory
Key Concepts:
- Trie Data Structure: "Introduction to Algorithms" Chapter 12 - CLRS
- Failure Links: The key innovation: where to go on mismatch (like KMP's failure function)
- Finite State Machines: "Introduction to the Theory of Computation" Chapter 1 - Sipser
- Original Paper: "Efficient string matching: An aid to bibliographic search" - Aho & Corasick (1975)
Difficulty: Advanced Time estimate: 2 weeks Prerequisites: Project 2, understanding of tries and FSMs
Real world outcome:
$ ./aho_corasick keywords.txt document.txt
Loaded 1000 patterns from keywords.txt
Searching document.txt (5MB)...
Found "error" at position 1234
Found "warning" at position 1235
Found "error" at position 5678
...
Total: 4523 matches
Search completed in 45ms
Automaton visualization:
States: 4523
Transitions: 12456
Failure links: 4522
Benchmark:
1000 patterns, 5MB file:
Naive (run each pattern): 12,000ms
Aho-Corasick: 45ms
Speedup: 266x!
Implementation Hints: Build a trie from all patterns. Then compute "failure links": for each state, where do you go when the next character doesn't match any outgoing edge? The failure link points to the longest proper suffix of the current path that is also a prefix of some pattern.
Key insight: You never backtrack in the input. Each character is examined exactly once. The failure links let you "teleport" to the right state.
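A compact sketch of those steps, assuming a dense goto table and counting total matches rather than printing positions (node limit, function names, and globals are all illustrative choices for the exercise):

```c
#include <string.h>

#define MAXN 1024                /* illustrative bound on trie nodes */
enum { ALPHA = 256 };

static int nxt[MAXN][ALPHA];     /* goto function; 0 doubles as the root */
static int fail[MAXN];           /* failure links */
static int out[MAXN];            /* patterns ending at this state (folded) */
static int nnodes;

void ac_init(void) {
    memset(nxt, 0, sizeof nxt);
    memset(fail, 0, sizeof fail);
    memset(out, 0, sizeof out);
    nnodes = 1;                  /* node 0 is the root */
}

void ac_add(const char *pat) {   /* insert one pattern into the trie */
    int s = 0;
    for (; *pat; pat++) {
        unsigned char c = (unsigned char)*pat;
        if (!nxt[s][c]) nxt[s][c] = nnodes++;
        s = nxt[s][c];
    }
    out[s]++;
}

void ac_build(void) {            /* BFS: failure links + goto completion */
    int queue[MAXN], head = 0, tail = 0;
    for (int c = 0; c < ALPHA; c++)
        if (nxt[0][c]) queue[tail++] = nxt[0][c];   /* depth 1 fails to root */
    while (head < tail) {
        int s = queue[head++];
        for (int c = 0; c < ALPHA; c++) {
            int t = nxt[s][c];
            if (!t) { nxt[s][c] = nxt[fail[s]][c]; continue; } /* shortcut */
            fail[t] = nxt[fail[s]][c];
            out[t] += out[fail[t]];   /* inherit matches along suffix link */
            queue[tail++] = t;
        }
    }
}

int ac_count(const char *text) { /* one pass, overlaps included */
    int s = 0, total = 0;
    for (; *text; text++) {
        s = nxt[s][(unsigned char)*text];
        total += out[s];
    }
    return total;
}
```

With the goto table completed during the BFS, the search loop never consults failure links at match time, which is what makes each input byte a single table lookup.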
Learning milestones:
- Trie construction correct → You understand prefix trees
- Failure links computed correctly → You understand the KMP connection
- Finds all overlapping matches → You understand the output function
Phase 2: Regular Expression Engines
Project 4: NFA Regex Engine (Thompson's Construction)
- File: TEXT_SEARCH_TOOLS_DEEP_DIVE.md
- Main Programming Language: C
- Alternative Programming Languages: Rust, Go, Zig
- Coolness Level: Level 5: Pure Magic
- Business Potential: 1. The "Resume Gold" (Educational/Personal Brand)
- Difficulty: Level 4: Expert
- Knowledge Area: Compilers / Automata Theory
- Software or Tool: Custom Implementation
- Main Book: "Compilers: Principles, Techniques, and Tools" by Aho, Sethi, Ullman (Dragon Book)
What you'll build: A regex engine using Thompson's NFA construction. Parse a regex, convert to NFA, and simulate the NFA on input text.
Why it teaches search tools: This is the foundation of RE2, Rust's regex, and one mode of GNU grep. Understanding NFA simulation is crucial to understanding why some regex engines are fast and others have "catastrophic backtracking."
Core challenges you'll face:
- Regex parsing → maps to recursive descent parsing
- NFA construction (concatenation, alternation, Kleene star) → maps to automata construction
- NFA simulation (tracking multiple states) → maps to set operations
- ε-closure computation → maps to graph traversal
Key Concepts:
- Thompson's Construction: "Compilers: Principles, Techniques, and Tools" Chapter 3.7 - Dragon Book
- NFA Simulation: "Regular Expression Matching Can Be Simple And Fast" - Russ Cox (swtch.com/~rsc/regexp/regexp1.html)
- ε-transitions: How to handle "empty" transitions efficiently
- State Set Representation: Bit vectors for O(1) state operations
Difficulty: Expert Time estimate: 2-3 weeks Prerequisites: Projects 1-3, understanding of automata theory basics
Real world outcome:
$ ./nfa_regex "a(b|c)*d" input.txt
Parsing regex: a(b|c)*d
NFA states: 8
NFA transitions: 10
Found match "abcd" at position 100
Found match "abbbcd" at position 200
Found match "acbcbd" at position 300
Search completed in 15ms
NFA visualization (DOT format):
0 -a-> 1
1 -ε-> 2
1 -ε-> 4
2 -b-> 3
4 -c-> 5
3 -ε-> 6
5 -ε-> 6
6 -ε-> 1
6 -ε-> 7
7 -d-> 8 (accepting)
Pathological test:
Pattern: (a+)+$
Input: "aaaaaaaaaaaaaaaaaaaaaaaaaaaa!"
Time: 0.1ms # Still fast! No backtracking!
Implementation Hints: Each regex operator becomes an NFA fragment:
- Literal 'a': State with single outgoing edge labeled 'a'
- Concatenation AB: Connect A's accepting state to B's start state with ε
- Alternation A|B: New start state with ε to both A and B starts
- Kleene star A*: ε to A, ε from A's accept back to start, ε to skip A entirely
Simulation: Maintain a SET of current states. For each input character, compute the set of states reachable by following that character, then compute the ε-closure. Accept if final set contains an accepting state.
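The simulation step can be sketched against a hand-built NFA for a(b|c)*d (state numbering as in the visualization above, plus a 1 -ε-> 7 skip edge so the star can match zero times; the parser is omitted, so this is only the "simulate" half of the project):

```c
#include <string.h>

#define NSTATES 9
/* Hand-built Thompson-style NFA for a(b|c)*d; accepting state is 8.
   ch == 0 marks an epsilon edge. */
static const struct { int from, to; char ch; } edges[] = {
    {0,1,'a'}, {2,3,'b'}, {4,5,'c'}, {7,8,'d'},     /* labeled edges */
    {1,2,0}, {1,4,0}, {1,7,0}, {3,6,0}, {5,6,0},    /* epsilon edges */
    {6,1,0}, {6,7,0},
};
static const int nedges = sizeof edges / sizeof edges[0];

/* Mark state s plus everything reachable through epsilon edges. */
static void eps_closure(int set[NSTATES], int s) {
    if (set[s]) return;
    set[s] = 1;
    for (int i = 0; i < nedges; i++)
        if (edges[i].ch == 0 && edges[i].from == s)
            eps_closure(set, edges[i].to);
}

/* Does the NFA accept the whole string? O(input * states): no
   backtracking, so the pathological (a+)+$ blowup cannot happen here. */
int nfa_match(const char *input) {
    int cur[NSTATES] = {0}, nxt[NSTATES];
    eps_closure(cur, 0);                        /* start state + closure */
    for (; *input; input++) {
        memset(nxt, 0, sizeof nxt);
        for (int i = 0; i < nedges; i++)        /* follow labeled edges */
            if (edges[i].ch == *input && cur[edges[i].from])
                eps_closure(nxt, edges[i].to);
        memcpy(cur, nxt, sizeof cur);
    }
    return cur[8];                              /* accepting state reached? */
}
```

A real engine replaces the boolean array with a bit vector and builds the edge list from the parsed regex instead of by hand.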
Learning milestones:
- Parses regex correctly → You understand recursive descent
- Constructs NFA correctly → You understand Thompson's algorithm
- No catastrophic backtracking → You understand why NFAs are O(n×m)
Project 5: DFA Regex Engine with Lazy Construction
- File: TEXT_SEARCH_TOOLS_DEEP_DIVE.md
- Main Programming Language: C
- Alternative Programming Languages: Rust, C++, Go
- Coolness Level: Level 5: Pure Magic
- Business Potential: 3. The "Service & Support" Model
- Difficulty: Level 4: Expert
- Knowledge Area: Compilers / Automata Theory
- Software or Tool: Custom Implementation
- Main Book: "Introduction to the Theory of Computation" by Michael Sipser
What you'll build: Convert NFA to DFA using the subset construction, but build states lazily (on-demand) to avoid exponential blowup. Cache states for reuse.
Why it teaches search tools: This is how production regex engines like RE2 and Rust's regex work. DFA simulation is a simple table lookup per character, O(n) guaranteed. But DFAs can have exponentially many states, so lazy construction is essential.
Core challenges you'll face:
- Subset construction algorithm → maps to powerset of NFA states
- State caching/memoization → maps to hash tables for state sets
- Cache eviction under memory pressure → maps to LRU caches
- Handling exponential blowup → maps to falling back to NFA
Key Concepts:
- Subset Construction: "Introduction to the Theory of Computation" Chapter 1.2 - Sipser
- Lazy DFA: "Regular Expression Matching: the Virtual Machine Approach" - Russ Cox
- State Set Hashing: Efficiently comparing and storing state sets
- Cache Management: When to evict cached DFA states
Difficulty: Expert Time estimate: 2-3 weeks Prerequisites: Project 4 (NFA Engine)
Real world outcome:
$ ./dfa_regex "a(b|c)*d" input.txt
NFA states: 8
DFA cache: lazy construction enabled
Searching...
DFA state 0 (NFA: {0}) -> 'a' -> DFA state 1 (NFA: {1,2,4,6,7})
DFA state 1 -> 'b' -> DFA state 2 (NFA: {1,2,3,4,6,7})
[States constructed on demand]
Found match "abcd" at position 100
Search completed in 5ms (vs 15ms NFA simulation)
Cache statistics:
DFA states constructed: 4
DFA states possible (worst case): 256 (2^8)
Cache hits: 45,231
Cache misses: 4
Benchmark:
NFA simulation: 15ms
Lazy DFA: 5ms
Full DFA (if we built it all): 3ms but 2MB memory
Implementation Hints: Each DFA state represents a SET of NFA states. When you see character 'c' in DFA state S, compute which NFA states are reachable from any state in S via 'c', take the ε-closure, and that's your new DFA state.
The trick: Use a hash table to check if you've seen this state set before. If so, reuse the existing DFA state. If not, create a new one and cache it.
For the cache, use LRU eviction: if memory gets tight, throw away the least-recently-used DFA states. They'll be reconstructed if needed.
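A minimal sketch of the caching idea, under simplifying assumptions: NFA state sets fit in a `uint32_t` bitmask, the NFA is hardcoded (it accepts any string containing "ab"), the cache is a linear scan instead of a hash table, and there is no LRU eviction. All names are illustrative:

```c
#include <stdint.h>
#include <string.h>

/* Tiny NFA, as bit positions: state 0 loops on every byte,
   0 -a-> 1, 1 -b-> 2 (accepting, which then persists). */
static uint32_t nfa_step(uint32_t set, unsigned char c) {
    uint32_t next = 0;
    if (set & 1u) {                      /* state 0 */
        next |= 1u;                      /* self-loop on any byte */
        if (c == 'a') next |= 2u;        /* 0 -a-> 1 */
    }
    if ((set & 2u) && c == 'b') next |= 4u;  /* 1 -b-> 2 */
    if (set & 4u) next |= 4u;            /* stay in the accepting state */
    return next;
}

/* Lazy DFA cache: (state set, byte) -> state set, filled on demand. */
#define CACHE_SLOTS 64
static struct {
    uint32_t set;
    uint32_t next[256];
    uint8_t  known[256];
} cache[CACHE_SLOTS];
static int ncached;

static uint32_t dfa_step(uint32_t set, unsigned char c) {
    int slot = -1;
    for (int i = 0; i < ncached; i++)    /* linear scan: fine for a sketch */
        if (cache[i].set == set) { slot = i; break; }
    if (slot < 0) {                      /* first time we see this state set */
        slot = ncached++;
        cache[slot].set = set;
        memset(cache[slot].known, 0, 256);
    }
    if (cache[slot].known[c])
        return cache[slot].next[c];      /* cache hit: plain table lookup */
    uint32_t next = nfa_step(set, c);    /* miss: build transition on demand */
    cache[slot].next[c] = next;
    cache[slot].known[c] = 1;
    return next;
}

int lazy_dfa_match(const char *text) {
    uint32_t set = 1u;                   /* start set: {0} */
    for (; *text; text++)
        set = dfa_step(set, (unsigned char)*text);
    return (set & 4u) != 0;              /* accepting NFA state in the set? */
}
```

Only the handful of state sets the input actually visits ever get DFA slots, which is the whole point of lazy construction; a production engine swaps the linear scan for a hash table and adds eviction.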
Learning milestones:
- Subset construction produces correct DFA → You understand the powerset
- Lazy construction avoids memory blowup → You understand caching
- Performance matches or beats NFA → You understand the tradeoff
Project 6: Literal Extraction Optimizer
- File: TEXT_SEARCH_TOOLS_DEEP_DIVE.md
- Main Programming Language: Rust (or C)
- Alternative Programming Languages: C++, Go
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 3. The "Service & Support" Model
- Difficulty: Level 3: Advanced
- Knowledge Area: Regex Optimization
- Software or Tool: Custom Implementation
- Main Book: "Regular Expression Matching" by Russ Cox (online articles)
What you'll build: Analyze a regex to extract literal strings that MUST appear in any match. Use these literals to quickly skip non-matching regions before running the full regex.
Why it teaches search tools: This is ripgrep's secret weapon. Most regexes have literal prefixes, suffixes, or required substrings. Find those with memchr/SIMD, THEN run the regex engine. This is why ripgrep searching for `function.*async` is almost as fast as searching for `function`.
Core challenges you'll face:
- Prefix extraction → maps to "what literal MUST the match start with?"
- Suffix extraction → maps to "what literal MUST the match end with?"
- Required substring extraction → maps to "what literal MUST appear somewhere?"
- Handling alternation → `(foo|bar).*baz` has no common prefix, but "baz" is required
Key Concepts:
- Literal Extraction: regex-syntax crate documentation (Rust)
- Prefix/Suffix Analysis: Walk the regex AST looking for required literals
- Prefiltering: Run memchr on literals, verify with full regex
Difficulty: Advanced Time estimate: 1-2 weeks Prerequisites: Projects 4-5, understanding of regex ASTs
Real world outcome:
$ ./literal_optimizer "function\s+\w+\s*\(.*async" input.js
Regex analysis:
Prefix literals: ["function"]
Suffix literals: ["async"]
Required substrings: ["function", "async"]
Strategy:
1. memchr for "function"
2. If found, check if "async" exists after it
3. If both found, run full regex on candidate region
Benchmark (10MB JavaScript file):
Full regex scan: 150ms
Literal-optimized: 8ms
Speedup: 18.75x
Analysis of common patterns:
"TODO": 100% literal, use memchr directly
"error.*line \d+": prefix "error", can skip most text
"[a-z]+@[a-z]+\.[a-z]+": no useful literals (email regex)
"\berror\b": literal "error", boundary check at matches
Implementation Hints: Walk the regex syntax tree:
- Literal nodes: Accumulate into current prefix/suffix
- Concatenation: Prefix of A, suffix of B
- Alternation: Common prefix/suffix of all branches (may be empty)
- Kleene star: Can be skipped, so it "resets" the required substring
The extracted literals are passed to memchr (for single byte), memmem (for substring), or Aho-Corasick (for multiple literals).
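The prefilter side of the strategy can be sketched with plain `strstr` standing in for memchr/memmem: treat only lines that contain both required literals (in order) as candidates, and pretend the full regex engine runs only on those. The literals here match the sample command above; `count_candidate_lines` and the 1KB line buffer are assumptions of this sketch:

```c
#include <string.h>

/* Prefilter for a regex like function\s+\w+\s*\(.*async:
   both "function" and "async" must appear in any match, so scan for
   the cheap literals first. A real tool would now run the full regex
   engine on each candidate line only. */
long count_candidate_lines(const char *text) {
    long candidates = 0;
    const char *line = text;
    while (*line) {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);
        char buf[1024];                          /* sketch: skip huge lines */
        if (len < sizeof buf) {
            memcpy(buf, line, len);
            buf[len] = '\0';
            const char *f = strstr(buf, "function");  /* required prefix */
            if (f && strstr(f, "async"))              /* required, after it */
                candidates++;
        }
        line += len + (eol ? 1 : 0);
    }
    return candidates;
}
```

Most lines fail the first `strstr` immediately, which is where the 10x+ speedups come from: the expensive engine only ever sees the survivors.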
Learning milestones:
- Correctly extracts prefix from simple regexes → You understand the concept
- Handles alternation and quantifiers → You understand the complexity
- 10x+ speedup on real regexes → You've internalized the optimization
Phase 3: SIMD and Low-Level Optimization
Project 7: SIMD memchr Implementation
- File: TEXT_SEARCH_TOOLS_DEEP_DIVE.md
- Main Programming Language: C (with intrinsics)
- Alternative Programming Languages: Rust, Assembly
- Coolness Level: Level 5: Pure Magic
- Business Potential: 4. The "Open Core" Infrastructure
- Difficulty: Level 4: Expert
- Knowledge Area: SIMD / Low-Level Optimization
- Software or Tool: SSE2/AVX2 Intrinsics
- Main Book: "Intel Intrinsics Guide" (online reference)
What you'll build: A SIMD-accelerated memchr that finds a byte in a buffer at 15+ bytes per CPU cycle using SSE2/AVX2.
Why it teaches search tools: memchr is the foundation of all fast search. If you can find a byte at 15 bytes/cycle, you can prefilter gigabytes of text per second, only running the expensive regex engine on the tiny fraction that might match.
Core challenges you'll face:
- SIMD vector loading (_mm_loadu_si128) → maps to loading 16 bytes at once
- Parallel comparison (_mm_cmpeq_epi8) → maps to comparing 16 bytes simultaneously
- Mask extraction (_mm_movemask_epi8) → maps to converting results to bitmask
- Bit scanning (__builtin_ctz) → maps to finding first match in mask
- Alignment handling → maps to memory alignment requirements
Key Concepts:
- SSE2 Intrinsics: Intel Intrinsics Guide (intel.com)
- Vector Comparison: Compare 16/32 bytes in a single instruction
- Movemask: Convert comparison result to 16/32-bit integer
- Bit Manipulation: Find first set bit in mask
Difficulty: Expert Time estimate: 1 week Prerequisites: Understanding of pointers, basic assembly concepts
Real world outcome:
$ ./simd_memchr '\n' large_file.txt
Searching for newlines in 1GB file...
Implementation comparison:
Naive (byte-by-byte): 1200ms (0.83 GB/s)
Standard memchr: 150ms (6.67 GB/s)
SSE2 memchr: 80ms (12.5 GB/s)
AVX2 memchr: 45ms (22.2 GB/s)
Throughput: 22.2 GB/s = ~9 bytes per cycle on 2.5GHz CPU
Disassembly of hot loop:
vmovdqu ymm0, [rdi] ; Load 32 bytes
vpcmpeqb ymm1, ymm0, ymm2 ; Compare with search byte
vpmovmskb eax, ymm1 ; Extract comparison mask
test eax, eax ; Any matches?
jnz found ; If yes, handle match
add rdi, 32 ; Advance pointer
jmp loop ; Continue
Implementation Hints:
// SSE2 memchr core loop
__m128i needle = _mm_set1_epi8(c); // Broadcast search byte to all 16 lanes
while (ptr + 16 <= end) {
__m128i chunk = _mm_loadu_si128((__m128i*)ptr); // Load 16 bytes
__m128i cmp = _mm_cmpeq_epi8(chunk, needle); // Compare all 16
int mask = _mm_movemask_epi8(cmp); // Get result as bits
if (mask != 0) {
return ptr + __builtin_ctz(mask); // Return first match
}
ptr += 16;
}
// Handle remaining bytes with scalar loop
Learning milestones:
- SSE2 version works correctly → You understand SIMD basics
- AVX2 version achieves 20+ GB/s → You understand wider vectors
- Handles alignment and tail bytes → You understand edge cases
Project 8: SIMD Substring Search (memmem)
- File: TEXT_SEARCH_TOOLS_DEEP_DIVE.md
- Main Programming Language: C (with intrinsics) or Rust
- Alternative Programming Languages: C++, Assembly
- Coolness Level: Level 5: Pure Magic
- Business Potential: 4. The "Open Core" Infrastructure
- Difficulty: Level 4: Expert
- Knowledge Area: SIMD / String Algorithms
- Software or Tool: SSE2/AVX2 Intrinsics
- Main Book: "Intel Intrinsics Guide" + Hyperscan papers
What you'll build: A SIMD-accelerated substring search using the "two-byte trick" or similar techniques. Find candidates with SIMD, verify with memcmp.
Why it teaches search tools: memmem is what you use for literal substrings. The two-byte trick: search for the first TWO bytes of the needle simultaneously, then verify the rest. This is often faster than full Boyer-Moore for short needles.
Core challenges you'll face:
- Two-byte candidate finding → maps to reducing false positives
- Vectorized candidate verification → maps to batched memcmp
- Handling variable needle lengths → maps to algorithm selection
- Combining with Boyer-Moore for long needles → maps to hybrid approaches
Key Concepts:
- Two-Way Algorithm: Used by glibc memmem
- SIMD Candidate Finding: Search for first+last byte simultaneously
- False Positive Rate: Why two bytes is much better than one byte
- Hybrid Strategies: Different algorithms for different needle lengths
Difficulty: Expert Time estimate: 1-2 weeks Prerequisites: Project 7 (SIMD memchr)
Real world outcome:
$ ./simd_memmem "function" source_code.txt
Searching for "function" in 100MB file...
Algorithm selection:
Needle length: 8 bytes
Strategy: SIMD two-byte + verify
Benchmark:
Naive strstr: 250ms
glibc memmem: 45ms
SIMD two-byte: 12ms
Boyer-Moore: 18ms (better for longer needles)
False positive analysis:
First byte 'f' matches: 2,345,678 times
First two bytes "fu" match: 12,345 times (190x reduction!)
Full "function" matches: 8,234 times
Disassembly insight:
Searching for 'f' AND 'n' (first and last byte) simultaneously
reduces candidates by 99.5%
Implementation Hints: The two-byte trick: Instead of searching for just the first byte, search for the first AND last byte of the needle at the correct offset. If the needle is "hello" (length 5), search for positions where byte[i] == 'h' AND byte[i+4] == 'o'. This dramatically reduces false positives.
With SIMD, you can check 16-32 positions simultaneously for both conditions, then AND the masks together.
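The filter logic is easiest to see in scalar form first (a sketch; the SIMD version replaces the per-position byte tests with two vector compares and an AND of the movemasks):

```c
#include <stddef.h>
#include <string.h>

/* First+last byte filter: only run the full memcmp at positions where
   both bytes already agree, which rejects the vast majority of offsets. */
long twobyte_search(const char *hay, size_t n, const char *needle) {
    size_t m = strlen(needle);
    if (m == 0) return 0;
    if (m > n) return -1;
    char first = needle[0], last = needle[m - 1];
    for (size_t i = 0; i + m <= n; i++) {
        if (hay[i] != first || hay[i + m - 1] != last)
            continue;                          /* cheap filter */
        if (memcmp(hay + i, needle, m) == 0)   /* verify rare candidates */
            return (long)i;
    }
    return -1;
}
```

This mirrors the false-positive numbers in the sample output: one byte matches often, two bytes at the right distance almost never do by accident.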
Learning milestones:
- Two-byte search works correctly → You understand the optimization
- Faster than Boyer-Moore for short needles → You understand the tradeoff
- Hybrid approach selects best algorithm → You understand adaptive strategies
Project 9: Teddy Algorithm (Multi-Literal SIMD)
- File: TEXT_SEARCH_TOOLS_DEEP_DIVE.md
- Main Programming Language: Rust
- Alternative Programming Languages: C++
- Coolness Level: Level 5: Pure Magic
- Business Potential: 4. The "Open Core" Infrastructure
- Difficulty: Level 5: Master
- Knowledge Area: SIMD / Multiple Pattern Matching
- Software or Tool: SSSE3/AVX2 (PSHUFB instruction)
- Main Book: Hyperscan papers + Rust memchr crate source code
What you'll build: The Teddy algorithm from Intel's Hyperscan, which finds multiple short literal patterns simultaneously using the PSHUFB instruction for parallel table lookups.
Why it teaches search tools: Teddy is what ripgrep uses when you have a small number of literal patterns (up to ~64). It "completely blows away the competition for short substrings" according to ripgrep's author.
Core challenges you'll face:
- PSHUFB as a lookup table → maps to parallel 16-way lookup
- Fingerprint-based candidate detection → maps to bloom filter-like behavior
- Bucket management → maps to handling false positives
- Verification phase → maps to confirming actual matches
Key Concepts:
- PSHUFB Instruction: Use SIMD register as 16-entry lookup table
- Fingerprinting: Map patterns to fingerprints, search for fingerprints
- Bucket Disambiguation: Multiple patterns can share a fingerprint
- Teddy Paper: "Teddy: An Efficient SIMD-based Literal Matching Engine" - Intel
Difficulty: Master Time estimate: 3-4 weeks Prerequisites: Projects 7-8, deep SIMD understanding
Real world outcome:
$ ./teddy_search patterns.txt haystack.txt
Loading 32 patterns from patterns.txt...
Patterns: ["error", "warning", "fatal", "debug", ...]
Fingerprint table constructed:
Bucket 0: ["error", "event"]
Bucket 3: ["warning"]
Bucket 7: ["fatal", "fail"]
...
Searching 1GB haystack...
Candidates found: 145,234
Verified matches: 12,456
Search time: 89ms (11.2 GB/s throughput)
Benchmark comparison (32 patterns, 1GB file):
Aho-Corasick: 450ms
Teddy SIMD: 89ms
Speedup: 5x
Note: Teddy wins for small pattern sets (<~64 patterns)
Aho-Corasick wins for large pattern sets (1000+)
Implementation Hints: The key insight is that PSHUFB lets you do 16 parallel lookups in a single instruction. Each byte of the input indexes into a 16-byte table.
For each input position, you check if the first 3 bytes match any pattern's first 3 bytes by having separate lookup tables for each byte position and ANDing the results.
The "fingerprint" is which patterns might match at this position. You then verify the full patterns in a scalar loop.
Learning milestones:
- PSHUFB-based lookup works → You understand the core trick
- Multiple patterns searched simultaneously → You understand fingerprinting
- Faster than Aho-Corasick for small sets → You've achieved the goal
Phase 4: File System and I/O Optimization
Project 10: Parallel Directory Traversal
- File: TEXT_SEARCH_TOOLS_DEEP_DIVE.md
- Main Programming Language: Rust
- Alternative Programming Languages: C++, Go
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 3. The "Service & Support" Model
- Difficulty: Level 3: Advanced
- Knowledge Area: Concurrency / File Systems
- Software or Tool: Custom Implementation (or study ignore crate)
- Main Book: "The Art of Multiprocessor Programming" by Herlihy & Shavit
What you'll build: A lock-free parallel directory walker that uses a work-stealing queue to distribute directory traversal across multiple threads.
Why it teaches search tools: Directory traversal is often the bottleneck, not the search itself. fd and ripgrep use parallel traversal to saturate modern NVMe SSDs. A single thread can't keep up with drives that deliver 500K+ IOPS.
Core challenges you'll face:
- Work-stealing queues → maps to load balancing
- Lock-free synchronization → maps to avoiding contention
- Directory entry batching → maps to reducing syscall overhead
- Respecting .gitignore → maps to ignore pattern matching
Key Concepts:
- Work-Stealing: "The Art of Multiprocessor Programming" Chapter 16 - Herlihy & Shavit
- Crossbeam Deque: Rust's work-stealing deque implementation
- getdents64: Batched directory reading syscall
- Gitignore Semantics: Complex rules about inheritance and negation
Difficulty: Advanced Time estimate: 2 weeks Prerequisites: Basic threading, understanding of file systems
Real world outcome:
$ ./parallel_walker /home/user/code
Walking 2.1 million files across 150,000 directories...
Thread 0: 523,456 files processed
Thread 1: 518,234 files processed
Thread 2: 530,123 files processed
Thread 3: 528,187 files processed
Total time: 1.2 seconds
Throughput: 1.75 million files/second
Benchmark comparison:
Single-threaded: 4.8 seconds
4 threads: 1.2 seconds
Speedup: 4x (linear with thread count!)
Work-stealing statistics:
Steals performed: 1,234
Average queue depth: 45 directories
Implementation Hints: Use a deque per thread. When a thread discovers subdirectories, push them to its own deque. When a thread's deque is empty, steal from another thread's deque (from the opposite end to reduce contention).
For .gitignore, you need a stack of ignore matchers: each directory can have its own .gitignore that adds to the parent's rules. Use a rope-like structure or thread-local copies to avoid contention.
Learning milestones:
- Basic parallel traversal works → You understand work distribution
- Work-stealing balances load → You understand dynamic scheduling
- Respects .gitignore correctly → You understand the complexity
Project 11: mmap vs read Strategy Selector
- File: TEXT_SEARCH_TOOLS_DEEP_DIVE.md
- Main Programming Language: C
- Alternative Programming Languages: Rust, C++
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 3. The âService & Supportâ Model
- Difficulty: Level 3: Advanced
- Knowledge Area: Operating Systems / I/O
- Software or Tool: Custom Implementation
- Main Book: "Linux System Programming" by Robert Love
What you'll build: A file reading abstraction that automatically chooses between mmap and buffered read based on file size, access pattern, and system characteristics.
Why it teaches search tools: ripgrep "chooses the best searching strategy for you automatically." For single large files, mmap wins (no copy into userspace). For many small files, buffered read wins (lower overhead per file).
Core challenges you'll face:
- mmap overhead measurement → maps to page table setup cost
- read buffer sizing → maps to syscall amortization
- Page fault handling → maps to understanding mmap tradeoffs
- File size heuristics → maps to when to switch strategies
Key Concepts:
- mmap Mechanics: "Linux System Programming" Chapter 9 - Robert Love
- Page Faults: Demand paging and its costs
- Buffered I/O: "Advanced Programming in the UNIX Environment" Chapter 5 - Stevens & Rago
- Linus on mmap: Understanding the "virtual memory mapping is very expensive in itself" quote
Difficulty: Advanced Time estimate: 1 week Prerequisites: Basic systems programming, understanding of virtual memory
Real world outcome:
$ ./io_selector --benchmark
Benchmarking I/O strategies...
Single 1GB file:
mmap: 120ms (8.33 GB/s)
read (64KB): 145ms (6.90 GB/s)
Winner: mmap (17% faster)
1000 files, 1MB each:
mmap: 850ms (1.18 GB/s)
read (64KB): 320ms (3.12 GB/s)
Winner: read (62% faster)
10000 files, 10KB each:
mmap: 2100ms (0.05 GB/s)
read (64KB): 180ms (0.56 GB/s)
Winner: read (10x faster!)
Heuristic derived:
Use mmap when: file_size > 1MB AND file_count < 10
Use read otherwise
Crossover point: ~500KB per file
Auto-selection in action:
./io_selector search "pattern" ./small_files/ → using read
./io_selector search "pattern" huge_file.log → using mmap
Implementation Hints: mmap's overhead is per-file (page table setup, TLB shootdowns on unmap). This fixed cost is amortized over large files but dominates for small files.
read's overhead is per-syscall, but you can amortize it with large buffers. However, you pay a memory copy cost that mmap avoids.
The crossover depends on hardware: measure on target systems and encode heuristics.
Learning milestones:
- Both strategies implemented correctly → You understand the mechanics
- Benchmark identifies crossover point → You understand the tradeoffs
- Auto-selection matches or beats manual choice → You've built a real optimization
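The heuristic derived in the benchmark above fits in a few lines. The thresholds here mirror the example rule (file_size > 1MB and file_count < 10); they are assumptions to re-measure on each target machine, not universal constants, and `chooseStrategy` is an illustrative name.

```go
package main

import "fmt"

// chooseStrategy encodes a benchmark-derived heuristic: mmap's per-file
// setup cost (page tables, TLB shootdowns on unmap) only pays off for a
// few large files; buffered read wins everywhere else.
func chooseStrategy(fileSize int64, fileCount int) string {
	const mmapMinSize = 1 << 20 // 1 MiB crossover: measure per machine
	const mmapMaxFiles = 10     // many files never amortize mmap setup
	if fileSize > mmapMinSize && fileCount < mmapMaxFiles {
		return "mmap"
	}
	return "read"
}

func main() {
	fmt.Println(chooseStrategy(1<<30, 1))     // single huge log file
	fmt.Println(chooseStrategy(10<<10, 10000)) // many tiny files
}
```

The point of isolating this decision in one function is that the real I/O code stays identical either way; only the selection policy is tuned per platform.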
Project 12: Line-Oriented Search with SIMD Line Counting
- File: TEXT_SEARCH_TOOLS_DEEP_DIVE.md
- Main Programming Language: C or Rust
- Alternative Programming Languages: C++, Zig
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 3. The "Service & Support" Model
- Difficulty: Level 3: Advanced
- Knowledge Area: SIMD / Text Processing
- Software or Tool: Custom Implementation
- Main Book: "Intel Intrinsics Guide"
What you'll build: A SIMD-accelerated newline counter and line boundary finder that can identify line starts/ends at gigabytes per second.
Why it teaches search tools: grep is LINE-oriented: you need to output the line containing the match. Finding line boundaries fast is critical. ripgrep uses SIMD to count newlines, which is why it can report line numbers without significant overhead.
Core challenges you'll face:
- SIMD newline counting → maps to POPCNT on comparison masks
- Line boundary tracking → maps to maintaining byte offsets
- Interleaving with search → maps to fusion of operations
- Handling \r\n vs \n → maps to Windows compatibility
Key Concepts:
- SIMD POPCNT: Count bits set in comparison mask
- Prefix Sum: Computing line numbers from newline counts
- Line Caching: Store line boundaries for random access
- Streaming Processing: Process file in chunks without full materialization
Difficulty: Advanced Time estimate: 1 week Prerequisites: Project 7 (SIMD memchr)
Real world outcome:
$ ./line_counter huge_log.txt
File size: 5.2 GB
Lines: 124,567,890
Benchmark:
Naive (byte loop): 12.5 seconds
wc -l: 3.2 seconds
SIMD line count: 0.45 seconds (11.6 GB/s)
With line boundary tracking:
Line 50000000 starts at byte offset 2,345,678,901
Lookup time: 0.001ms (from precomputed index)
Integration with search:
Pattern found at byte 123456789
→ Line number 567890 (computed in 0.01ms)
→ Line content: "Error: connection timeout..."
Implementation Hints: Use SIMD memchr to find newlines. For each 16/32-byte chunk, get the comparison mask and use POPCNT to count the bits set (= number of newlines in that chunk).
For line boundary tracking, either store every Nth lineâs offset (sparse index) or compute prefix sums on the fly.
To output context lines, you need to find the line boundaries AROUND a match, not just count lines up to it.
Learning milestones:
- SIMD newline counting correct and fast → You understand the basics
- Can find line containing a byte offset → You understand the application
- Performance matches ripgrep's line counting → You've achieved production quality
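Go has no portable SIMD intrinsics, so the mask-and-popcount idea can be shown with SWAR (SIMD within a register) on 64-bit words: XOR turns matching bytes into zero bytes, a zero-byte identity (from Hacker's Delight) builds a mask with one high bit per match, and popcount counts them. Real SIMD versions do exactly this with 16/32-byte vectors and POPCNT on the comparison mask.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/bits"
)

// countNewlinesSWAR counts '\n' bytes eight at a time.
func countNewlinesSWAR(data []byte) int {
	const nl = 0x0a0a0a0a0a0a0a0a   // '\n' in every byte lane
	const low7 = 0x7f7f7f7f7f7f7f7f // low 7 bits of each lane
	count, i := 0, 0
	for ; i+8 <= len(data); i += 8 {
		x := binary.LittleEndian.Uint64(data[i:]) ^ nl
		// High bit of each byte of mask is set iff that byte of x is
		// zero, i.e. iff the original byte was '\n'. Masking to 7 bits
		// first keeps the addition from carrying across byte lanes.
		mask := ^(((x & low7) + low7) | x | low7)
		count += bits.OnesCount64(mask)
	}
	for ; i < len(data); i++ { // scalar tail for the last <8 bytes
		if data[i] == '\n' {
			count++
		}
	}
	return count
}

func main() {
	fmt.Println(countNewlinesSWAR([]byte("one\ntwo\nthree\n")))
}
```

For line numbers, accumulate these per-chunk counts into a prefix sum, or store every Nth newline's byte offset as a sparse index as the hints suggest.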
Phase 5: Fuzzy Matching
Project 13: Fuzzy String Matcher (fzf-style)
- File: TEXT_SEARCH_TOOLS_DEEP_DIVE.md
- Main Programming Language: Go
- Alternative Programming Languages: Rust, C++, Python
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 2. The "Micro-SaaS / Pro Tool" (Solo-Preneur Potential)
- Difficulty: Level 3: Advanced
- Knowledge Area: Algorithms / String Matching
- Software or Tool: Custom Implementation
- Main Book: fzf source code (algo.go)
What you'll build: A fuzzy matcher that finds items matching a pattern with characters in order but not necessarily contiguous. Score matches by quality (gaps, position of matches, word boundaries).
Why it teaches search tools: fzf revolutionized interactive file selection. Understanding its scoring algorithm helps you build better developer tools. The trick is balancing quality (finding the best match) with speed (interactive responsiveness).
Core challenges you'll face:
- Subsequence matching → maps to dynamic programming
- Scoring function design → maps to heuristics for "good" matches
- Word boundary bonuses → maps to camelCase and snake_case awareness
- Performance optimization → maps to early termination, SIMD for candidate filtering
Key Concepts:
- Longest Common Subsequence: Base algorithm for fuzzy matching
- Scoring Heuristics: Position bonuses, gap penalties, boundary bonuses
- Smith-Waterman: Scoring inspiration from bioinformatics
- fzf algo.go: The actual implementation to study
Difficulty: Advanced Time estimate: 2 weeks Prerequisites: Dynamic programming, string algorithms
Real world outcome:
$ ./fuzzy_matcher
Input: fzf
Candidates: [fuzzy_finder, fizzbuzz_factory, FuzzyZoneFinder, foobar]
Matches (sorted by score):
1. FuzzyZoneFinder score=95 (f-z-f at word boundaries!)
2. fuzzy_finder score=78 (f-z-f with small gaps)
3. fizzbuzz_factory score=45 (f-z at start, final f far away)
(foobar contains no 'z', so it is filtered out entirely)
Score breakdown for "FuzzyZoneFinder":
'F' at position 0: +20 (start bonus)
'z' at position 3: +15
'F' at position 9: +18 (camelCase boundary)
Gap penalty: -5 (skipped chars between z and F)
Total: 95 (with per-character base match points)
Performance:
10,000 candidates: 2ms
100,000 candidates: 18ms
1,000,000 candidates: 150ms
Implementation Hints: For each candidate, find the best way to match the pattern characters as a subsequence. Use dynamic programming to track score at each (pattern_pos, candidate_pos).
Scoring heuristics (from fzf):
- Bonus for matching at start of word (after _, space, or capital)
- Bonus for matching after a camelCase boundary
- Penalty for gaps between matches
- Bonus for consecutive matches
The first character match gets extra weight because it's likely intentional.
Learning milestones:
- Basic subsequence matching works → You understand the problem
- Scoring produces intuitive rankings → You understand heuristics
- Fast enough for interactive use (<100ms for 100K items) → You understand optimization
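A minimal sketch of the idea: greedy case-insensitive subsequence matching with boundary and consecutiveness bonuses and a gap penalty. Note that this is greedy first-match, not fzf's dynamic-programming optimum, and all the constants are illustrative weights rather than fzf's real ones.

```go
package main

import "fmt"

func isUpper(b byte) bool { return 'A' <= b && b <= 'Z' }
func toLower(b byte) byte {
	if isUpper(b) {
		return b + ('a' - 'A')
	}
	return b
}

// fuzzyScore matches pattern as a case-insensitive subsequence of
// candidate, scoring word/camelCase boundaries and consecutive matches
// up, and skipped characters (once matching has begun) down.
func fuzzyScore(pattern, candidate string) (score int, ok bool) {
	pi := 0
	prevMatched := false
	for ci := 0; ci < len(candidate) && pi < len(pattern); ci++ {
		if toLower(candidate[ci]) != toLower(pattern[pi]) {
			if pi > 0 {
				score-- // gap penalty after the first match
			}
			prevMatched = false
			continue
		}
		score += 10 // base points per matched character
		if ci == 0 || candidate[ci-1] == '_' || candidate[ci-1] == ' ' ||
			(!isUpper(candidate[ci-1]) && isUpper(candidate[ci])) {
			score += 15 // start-of-word or camelCase boundary bonus
		}
		if prevMatched {
			score += 5 // consecutive-match bonus
		}
		prevMatched = true
		pi++
	}
	if pi < len(pattern) {
		return 0, false // not all pattern characters found in order
	}
	return score, true
}

func main() {
	for _, c := range []string{"FuzzyZoneFinder", "fuzzy_finder", "plain"} {
		s, ok := fuzzyScore("fzf", c)
		fmt.Println(c, s, ok)
	}
}
```

Because greedy matching can pick a suboptimal alignment (e.g. matching a character mid-word when a later boundary match scores higher), fzf's v2 algorithm replaces this loop with a DP table over (pattern position, candidate position).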
Project 14: Interactive Fuzzy Finder (Full fzf Clone)
- File: TEXT_SEARCH_TOOLS_DEEP_DIVE.md
- Main Programming Language: Go
- Alternative Programming Languages: Rust, C++
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 2. The "Micro-SaaS / Pro Tool" (Solo-Preneur Potential)
- Difficulty: Level 3: Advanced
- Knowledge Area: TUI / Concurrency
- Software or Tool: Custom Implementation
- Main Book: fzf source code
What you'll build: A complete interactive fuzzy finder with: terminal UI, incremental input, parallel matching with worker goroutines, result streaming, preview pane.
Why it teaches search tools: fzf shows how to build responsive interactive tools. The architecture (input reader, matcher pool, result merger, renderer) is a pattern you'll use in many tools.
Core challenges you'll face:
- Terminal UI (raw mode, escape sequences) → maps to TUI rendering
- Incremental search (update as you type) → maps to debouncing and cancellation
- Parallel matching → maps to worker pools and result merging
- Streaming large inputs → maps to handling unbounded data
Key Concepts:
- Terminal Raw Mode: Reading characters without waiting for Enter
- ANSI Escape Codes: Cursor movement, colors, clearing
- Worker Goroutines: Parallel matching with cancellation
- Result Merging: Combining ranked results from multiple workers
Difficulty: Advanced Time estimate: 3-4 weeks Prerequisites: Project 13, terminal programming basics
Real world outcome:
$ find . -type f | ./mini_fzf
[Interactive TUI appears]
> src/ma
4/1234
> src/main.rs
src/matcher.go
src/main_test.go
src/map.rs
Features:
- Type to filter
- Arrow keys to navigate
- Enter to select
- Ctrl-C to cancel
- Preview pane (--preview) shows file content
Performance:
- 100K items loaded in 50ms
- Typing latency <10ms (feels instant)
- Matches update incrementally as you type
Implementation Hints: Architecture (from fzf):
- Reader: Reads input lines, chunks them, sends to matcher
- Matcher: Pool of workers that score chunks against current pattern
- Merger: Combines worker results, maintains top-K
- Terminal: Renders results, handles input
When the pattern changes, cancel in-flight matching and start new match. Use channels for communication between components.
Learning milestones:
- Basic TUI works (type and see results) → You understand terminal rendering
- Results update smoothly while typing → You understand incremental search
- Handles millions of items smoothly → You understand the architecture
Phase 6: Putting It All Together
Project 15: Build Your Own ripgrep
- File: TEXT_SEARCH_TOOLS_DEEP_DIVE.md
- Main Programming Language: Rust
- Alternative Programming Languages: C++, Go
- Coolness Level: Level 5: Pure Magic
- Business Potential: 4. The "Open Core" Infrastructure
- Difficulty: Level 5: Master
- Knowledge Area: Systems Programming / Everything Combined
- Software or Tool: Custom Implementation
- Main Book: ripgrep source code + all previous resources
What you'll build: A production-quality grep with: parallel directory traversal, .gitignore support, SIMD-accelerated literal search, regex support with literal extraction, colored output, context lines.
Why it teaches search tools: This is the capstone project. You'll integrate everything: parallel I/O, SIMD algorithms, regex engines, output formatting. The result is a tool YOU built that can search gigabytes per second.
Core challenges you'll face:
- Integrating all previous components → maps to system architecture
- Memory efficiency → maps to avoiding unnecessary allocations
- Correctness edge cases → maps to handling binary files, encoding issues
- User experience polish → maps to colors, progress, error messages
Key Concepts:
- Component Integration: How the pieces fit together
- Error Handling: Graceful degradation on permission errors, encoding issues
- Output Buffering: Line buffering vs block buffering for piped output
- Configuration: Command-line argument parsing, config files
Difficulty: Master Time estimate: 2-3 months Prerequisites: All previous projects
Real world outcome:
$ ./mygrep "function.*async" ~/code --stats
Searching 150,234 files...
src/server/api.ts:145: async function handleRequest(req: Request) {
src/server/api.ts:234: export async function processData(data: Data) {
src/utils/fetch.ts:12: async function fetchWithRetry(url: string) {
...
Statistics:
Files searched: 150,234
Files matched: 1,234
Lines matched: 5,678
Time: 0.89s
Throughput: 2.1 GB/s
$ hyperfine './mygrep TODO ~/code' 'rg TODO ~/code'
mygrep: 0.95s ± 0.02s
rg: 0.87s ± 0.02s
Within 10% of ripgrep!
Implementation Hints: Start with a working but slow version, then optimize:
- Single-threaded, read-based file access, naive regex
- Add parallel directory traversal
- Add literal extraction and SIMD prefiltering
- Add mmap for large files
- Optimize output (buffer, avoid unnecessary work)
Profile continuously. The bottleneck shifts as you optimize: first it's regex, then I/O, then output.
Learning milestones:
- Correct results on all test cases → You understand the requirements
- Within 2x of ripgrep performance → You understand optimization
- Users would actually use this → You've built something real
Project Comparison Table
| # | Project | Difficulty | Time | Depth of Understanding | Fun Factor |
|---|---|---|---|---|---|
| 1 | Naive String Search | Beginner | 2-3 hours | ★★ | ★★ |
| 2 | Boyer-Moore | Int-Adv | 1 week | ★★★★★ | ★★★★ |
| 3 | Aho-Corasick | Advanced | 2 weeks | ★★★★★ | ★★★★ |
| 4 | NFA Regex Engine | Expert | 2-3 weeks | ★★★★★ | ★★★★★ |
| 5 | DFA Regex (Lazy) | Expert | 2-3 weeks | ★★★★★ | ★★★★ |
| 6 | Literal Extraction | Advanced | 1-2 weeks | ★★★★ | ★★★★ |
| 7 | SIMD memchr | Expert | 1 week | ★★★★★ | ★★★★★ |
| 8 | SIMD memmem | Expert | 1-2 weeks | ★★★★★ | ★★★★★ |
| 9 | Teddy Algorithm | Master | 3-4 weeks | ★★★★★ | ★★★★★ |
| 10 | Parallel Dir Traversal | Advanced | 2 weeks | ★★★★ | ★★★ |
| 11 | mmap vs read Selector | Advanced | 1 week | ★★★★ | ★★★ |
| 12 | SIMD Line Counting | Advanced | 1 week | ★★★★ | ★★★★ |
| 13 | Fuzzy Matcher | Advanced | 2 weeks | ★★★★ | ★★★★ |
| 14 | Interactive fzf Clone | Advanced | 3-4 weeks | ★★★★★ | ★★★★★ |
| 15 | Build Your Own ripgrep | Master | 2-3 months | ★★★★★ | ★★★★★ |
Recommended Learning Path
Path A: Algorithm Focus (Theory â Implementation)
- Projects 1-3 (String Matching Algorithms)
- Projects 4-5 (Regex Engines)
- Project 6 (Literal Extraction)
- Project 15 (Integration)
Path B: Performance Focus (Make It Fast)
- Project 1 (Baseline)
- Projects 7-8 (SIMD Fundamentals)
- Projects 10-11 (I/O Optimization)
- Project 15 (Integration)
Path C: Tools Focus (Build Useful Things)
- Projects 1-2 (Basic Algorithms)
- Projects 10-11 (Parallel I/O)
- Projects 13-14 (Fuzzy Finder)
- Project 15 (ripgrep Clone)
Why Each Tool Is Fast: Summary
GNU grep
- Boyer-Moore + DFA/NFA hybrid
- Hand-tuned inner loop
- Falls back to NFA for complex patterns
- Decades of optimization
The Silver Searcher (ag)
- mmap + pthreads + PCRE JIT
- First to respect .gitignore by default
- Binary search for ignore patterns
- Up to 8 worker threads
ripgrep
- Finite automata regex (guaranteed linear time)
- Teddy SIMD algorithm for literals
- Lock-free parallel directory traversal
- Automatic mmap vs read selection
- UTF-8 decoding built into DFA
- Literal extraction for regex optimization
fd
- Parallel directory traversal
- Uses ripgrep's regex and ignore crates
- Smart defaults (ignore hidden, respect .gitignore)
- 10-23x faster than find for pattern matching
fzf
- Custom fuzzy matching with scoring heuristics
- Worker goroutine pool for parallel matching
- Incremental search with result streaming
- Two algorithm modes: v1 (fast) and v2 (optimal)
git ls-files (bonus)
- Reads from Git's index file directly
- No directory traversal at all!
- 5x+ faster than any directory walker
- Only works inside Git repos
Key Insights
- Most "regex" searches are literal: The fastest path is finding literals with SIMD, then verifying with regex.
- DFA > NFA for guaranteed performance: Backtracking regex engines can take exponential time. Finite automata are O(n) always.
- Parallelization matters more than algorithm optimization: On modern SSDs, a single thread can't saturate I/O. Use work-stealing queues.
- Skip what you don't need: Respecting .gitignore often reduces search space by 10-100x.
- Measure, don't assume: mmap isn't always faster. Boyer-Moore isn't always better. Profile on real workloads.
- SIMD is a superpower: 16-32 bytes per instruction changes everything. memchr at 15 bytes/cycle is 15x faster than byte-by-byte.
Summary
| Project # | Project Name | Main Language |
|---|---|---|
| 1 | Naive String Search | C |
| 2 | Boyer-Moore String Search | C |
| 3 | Aho-Corasick Multi-Pattern | C |
| 4 | NFA Regex Engine | C |
| 5 | DFA Regex Engine (Lazy) | C |
| 6 | Literal Extraction Optimizer | Rust |
| 7 | SIMD memchr | C |
| 8 | SIMD memmem | C |
| 9 | Teddy Algorithm | Rust |
| 10 | Parallel Directory Traversal | Rust |
| 11 | mmap vs read Selector | C |
| 12 | SIMD Line Counting | C |
| 13 | Fuzzy String Matcher | Go |
| 14 | Interactive Fuzzy Finder | Go |
| 15 | Build Your Own ripgrep | Rust |
Sources
- ripgrep is faster than {grep, ag, git grep, ucg, pt, sift} - Andrew Gallant's comprehensive benchmark
- ripgrep GitHub
- Ripgrep vs Grep Performance
- The Silver Searcher GitHub
- Boyer-Moore Algorithm (Wikipedia)
- Boyer-Moore at UT Austin
- Aho-Corasick Algorithm
- Aho-Corasick Rust Implementation
- Regular Expression Matching Can Be Simple And Fast - Russ Cox
- NFA vs DFA Regex Engines
- Hyperscan: Regular Expression Match
- Teddy Algorithm Paper
- memchr Rust Crate
- fd: A simple, fast alternative to find
- fd is 23x Faster Than find
- bfs 3.0: the fastest find yet
- fzf GitHub
- fzf Algorithm Source
- mmap vs read Performance
- Git ls-files is Faster Than fd and find
- Regex Literals Optimization
- GNU grep Development