Project 9: Teddy Algorithm (Multi-Literal SIMD)

The Teddy algorithm from Intel’s Hyperscan, which finds multiple short literal patterns simultaneously using the PSHUFB instruction for parallel table lookups.

Quick Reference

Attribute Value
Primary Language Rust
Alternative Languages C++
Difficulty Level 5: Master
Time Estimate 3-4 weeks
Knowledge Area SIMD / Multiple Pattern Matching
Tooling SSSE3/AVX2 (PSHUFB instruction)
Prerequisites Projects 7-8, deep SIMD understanding

What You Will Build

The Teddy algorithm from Intel’s Hyperscan, which finds multiple short literal patterns simultaneously using the PSHUFB instruction for parallel table lookups.

Why It Matters

This project builds core skills that appear repeatedly in real-world systems and tooling.

Core Challenges

  • PSHUFB as a lookup table → maps to parallel 16-way lookup
  • Fingerprint-based candidate detection → maps to bloom filter-like behavior
  • Bucket management → maps to handling false positives
  • Verification phase → maps to confirming actual matches

Key Concepts

  • PSHUFB Instruction: Use SIMD register as 16-entry lookup table
  • Fingerprinting: Map patterns to fingerprints, search for fingerprints
  • Bucket Disambiguation: Multiple patterns can share a fingerprint
  • Teddy Paper: “Teddy: An Efficient SIMD-based Literal Matching Engine” - Intel

Real-World Outcome

$ ./teddy_search patterns.txt haystack.txt
Loading 32 patterns from patterns.txt...
Patterns: ["error", "warning", "fatal", "debug", ...]

Fingerprint table constructed:
  Bucket 0: ["error", "event"]
  Bucket 3: ["warning"]
  Bucket 7: ["fatal", "fail"]
  ...

Searching 1GB haystack...
Candidates found: 145,234
Verified matches: 12,456
Search time: 89ms (11.2 GB/s throughput)

Benchmark comparison (32 patterns, 1GB file):
  Aho-Corasick:  450ms
  Teddy SIMD:     89ms
  Speedup: 5x

Note: Teddy wins for small pattern sets (<~64 patterns)
      Aho-Corasick wins for large pattern sets (1000+)

Implementation Guide

  1. Reproduce the simplest happy-path scenario.
  2. Build the smallest working version of the core feature.
  3. Add input validation and error handling.
  4. Add instrumentation/logging to confirm behavior.
  5. Refactor into clean modules with tests.

Milestones

  • Milestone 1: Minimal working program that runs end-to-end.
  • Milestone 2: Correct outputs for typical inputs.
  • Milestone 3: Robust handling of edge cases.
  • Milestone 4: Clean structure and documented usage.

Validation Checklist

  • Output matches the real-world outcome example
  • Handles invalid inputs safely
  • Provides clear errors and exit codes
  • Repeatable results across runs

References

  • Main guide: TEXT_SEARCH_TOOLS_DEEP_DIVE.md
  • Hyperscan papers + Rust memchr crate source code