Project 1: Naive String Search (Baseline)
A brute-force string search that compares every position in the text against the pattern. This is your baseline to understand what we’re optimizing.
Quick Reference
| Attribute | Value |
|---|---|
| Primary Language | C |
| Alternative Languages | Rust, Zig, Go |
| Difficulty | Level 1: Beginner |
| Time Estimate | 2-3 hours |
| Knowledge Area | Algorithms / String Matching |
| Tooling | Custom Implementation |
| Prerequisites | Basic C, pointers |
What You Will Build
A brute-force string search that compares every position in the text against the pattern. This is your baseline to understand what we’re optimizing.
Why It Matters
This project builds core skills that appear repeatedly in real-world systems and tooling.
Core Challenges
- Understanding worst case → pattern
aaaaabinaaaaaaaaaaaaaaaaaab→ maps to why preprocessing matters - Character-by-character comparison → maps to CPU cache behavior
- Early termination → maps to optimization opportunities
Key Concepts
- String Matching Basics: “Introduction to Algorithms” Chapter 32.1 - CLRS
- Big-O Analysis: Understanding O(n×m) worst case vs O(n) average case
- Pointer Arithmetic: “C Programming Language” Chapter 5 - K&R
Real-World Outcome
$ ./naive_search "needle" haystack.txt
Found at position 1234
Found at position 5678
Search completed in 45ms
$ ./naive_search "aaaaaab" pathological.txt
# Watch it struggle on worst-case input
Search completed in 2300ms # Much slower!
Benchmark:
Pattern "hello" in 1MB file:
Naive: 12ms
(We'll compare to Boyer-Moore later)
Implementation Guide
- Reproduce the simplest happy-path scenario.
- Build the smallest working version of the core feature.
- Add input validation and error handling.
- Add instrumentation/logging to confirm behavior.
- Refactor into clean modules with tests.
Milestones
- Milestone 1: Minimal working program that runs end-to-end.
- Milestone 2: Correct outputs for typical inputs.
- Milestone 3: Robust handling of edge cases.
- Milestone 4: Clean structure and documented usage.
Validation Checklist
- Output matches the real-world outcome example
- Handles invalid inputs safely
- Provides clear errors and exit codes
- Repeatable results across runs
References
- Main guide:
TEXT_SEARCH_TOOLS_DEEP_DIVE.md - “Introduction to Algorithms” by Cormen, Leiserson, Rivest, Stein (CLRS)