Project 10: SIMD-Powered String Search
An implementation of the
strstrfunction (find a substring in a larger string) that is accelerated using SIMD (Single Instruction, Multiple Data) intrinsics. You will compare its performance against the standard library’sstrstrand a naive C implementation.
Quick Reference
| Attribute | Value |
|---|---|
| Primary Language | C |
| Alternative Languages | C++, Rust |
| Difficulty | Level 4: Expert |
| Time Estimate | 2-3 weeks |
| Knowledge Area | SIMD / Vectorization / Low-Level Optimization |
| Tooling | GCC/Clang with Intrinsics (<immintrin.h>) |
| Prerequisites | Strong C and pointer skills, basic assembly knowledge is helpful. |
What You Will Build
An implementation of the strstr function (find a substring in a larger string) that is accelerated using SIMD (Single Instruction, Multiple Data) intrinsics. You will compare its performance against the standard library’s strstr and a naive C implementation.
Why It Matters
This project builds core skills that appear repeatedly in real-world systems and tooling.
Core Challenges
- Understanding SIMD concepts → maps to vector registers (XMM/YMM/ZMM), packed data, and lanes
- Using compiler intrinsics → maps to writing C code that maps directly to specific assembly instructions (e.g.,
_mm256_load_si256) - Designing a SIMD-friendly algorithm → maps to rethinking a simple search into a parallel operation
- Handling edge cases → maps to unaligned data and leftover bytes at the end of a string
Key Concepts
- SIMD (Single Instruction, Multiple Data): Intel Intrinsics Guide (web search).
- AVX/SSE Intrinsics: GCC documentation on x86 intrinsics.
- Data Alignment for Vectorization: Many blog posts cover why aligned loads are faster.
Real-World Outcome
$ ./simd_strstr
Searching for a 16-byte needle in a 1GB haystack...
Naive C strstr: 1.2 seconds
Standard library strstr: 0.3 seconds
SIMD-accelerated strstr: 0.08 seconds
-> SIMD is 15x faster than naive C and 3.75x faster than the optimized standard library.
Implementation Guide
- Reproduce the simplest happy-path scenario.
- Build the smallest working version of the core feature.
- Add input validation and error handling.
- Add instrumentation/logging to confirm behavior.
- Refactor into clean modules with tests.
Milestones
- Milestone 1: Minimal working program that runs end-to-end.
- Milestone 2: Correct outputs for typical inputs.
- Milestone 3: Robust handling of edge cases.
- Milestone 4: Clean structure and documented usage.
Validation Checklist
- Output matches the real-world outcome example
- Handles invalid inputs safely
- Provides clear errors and exit codes
- Repeatable results across runs
References
- Main guide:
LEARN_C_PERFORMANCE_DEEP_DIVE.md - “Modern X86 Assembly Language Programming” by Daniel Kusswurm (for concepts)