Project 12: Hybrid Retrieval Engine
Build a retrieval engine that combines semantic search (vectors), graph traversal, and keyword matching with intelligent fusion, optimizing for both precision and recall.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 4: Expert |
| Time Estimate | 2-3 weeks (30-40 hours) |
| Language | Python |
| Prerequisites | Projects 1-11, information retrieval, reranking |
| Key Topics | Hybrid search, BM25, vector similarity, graph traversal, Reciprocal Rank Fusion, reranking, query routing |
1. Learning Objectives
By completing this project, you will:
- Understand different retrieval paradigms (sparse, dense, graph).
- Implement BM25 for keyword matching.
- Build vector similarity search with embeddings.
- Add graph-based retrieval via relationship traversal.
- Fuse results using RRF and learned reranking.
2. Theoretical Foundation
2.1 Core Concepts
- Sparse Retrieval (BM25): Term-frequency-based matching. Great for exact keywords and rare terms.
- Dense Retrieval (Vectors): Semantic similarity via embeddings. Handles synonyms and related concepts.
- Graph Retrieval: Following relationships from matched entities. Adds structural context.
- Reciprocal Rank Fusion (RRF): Combines ranked lists by summing reciprocal ranks. Simple and effective.
- Reranking: Using a cross-encoder or LLM to refine the final ranking.
2.2 Why This Matters
No single retrieval method is best for all queries:
- “error 0x8007000d” → BM25 (exact match needed)
- “how to fix permissions” → Vector (semantic)
- “Alice’s current projects” → Graph (relationships)
Hybrid retrieval combines strengths and covers weaknesses.
2.3 Common Misconceptions
- “Vectors are always better.” They miss exact matches and rare terms.
- “BM25 is outdated.” It’s still competitive and often complementary.
- “More retrievers = better.” Fusion must be done carefully to avoid noise.
2.4 ASCII Diagram: Hybrid Retrieval Pipeline
```
HYBRID RETRIEVAL ARCHITECTURE
═════════════════════════════

                        USER QUERY
"What Python frameworks did Alice recommend for our API project?"
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│                      QUERY ANALYZER                     │
│                                                         │
│  Extract:                                               │
│  - Keywords: ["Python", "frameworks", "API"]            │
│  - Entities: ["Alice", "API project"]                   │
│  - Intent: recommendation_lookup                        │
│  - Time: none specified (current)                       │
└─────────────────────────────────────────────────────────┘
                             │
              ┌──────────────┼──────────────┐
              │              │              │
              ▼              ▼              ▼
      ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
      │   SPARSE    │ │    DENSE    │ │    GRAPH    │
      │   (BM25)    │ │  (Vector)   │ │ (Traversal) │
      └─────────────┘ └─────────────┘ └─────────────┘
              │              │              │
              ▼              ▼              ▼
      ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
      │ Results:    │ │ Results:    │ │ Results:    │
      │ 1. Doc A    │ │ 1. Doc C    │ │ 1. FastAPI  │
      │ 2. Doc B    │ │ 2. Doc A    │ │ 2. Django   │
      │ 3. Doc D    │ │ 3. Doc B    │ │ 3. Flask    │
      │ 4. Doc E    │ │ 4. Doc F    │ │ (via Alice) │
      └─────────────┘ └─────────────┘ └─────────────┘
              │              │              │
              └──────────────┼──────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│               RECIPROCAL RANK FUSION (RRF)              │
│                                                         │
│  RRF(d) = Σ 1/(k + rank_i(d))   where k = 60            │
│                                                         │
│  Fused Results:                                         │
│  1. Doc A    (BM25@1, Vector@2)   RRF: 0.033            │
│  2. Doc B    (BM25@2, Vector@3)   RRF: 0.032            │
│  3. Doc C    (Vector@1)           RRF: 0.016            │
│  4. FastAPI  (Graph@1)            RRF: 0.016            │
│  ...                                                    │
└─────────────────────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│                        RERANKER                         │
│                                                         │
│  Cross-encoder or LLM scores each (query, doc) pair     │
│                                                         │
│  Final Results:                                         │
│  1. Doc A   (0.94)  "Alice recommended FastAPI..."      │
│  2. FastAPI (0.91)  graph: Alice -[RECOMMENDED]->       │
│  3. Doc B   (0.87)  "Python API frameworks..."          │
└─────────────────────────────────────────────────────────┘


RETRIEVAL METHOD COMPARISON
═══════════════════════════

Query Type              Best Method          Why
────────────────────────────────────────────────────────────
"error 0x8007000d"      BM25                 Exact match needed
"how to handle auth"    Vector               Semantic similarity
"Alice's projects"      Graph                Relationship-based
"Python for web APIs"   Hybrid (all three)   Multiple signals


RECIPROCAL RANK FUSION EXAMPLE
══════════════════════════════

BM25 Results:     Vector Results:     Graph Results:
1. Doc A          1. Doc C            1. FastAPI
2. Doc B          2. Doc A            2. Django
3. Doc D          3. Doc B

RRF Calculation (k=60):
Doc A:    1/(60+1) + 1/(60+2) = 0.0164 + 0.0161 = 0.0325
Doc B:    1/(60+2) + 1/(60+3) = 0.0161 + 0.0159 = 0.0320
Doc C:    1/(60+1)                             = 0.0164
FastAPI:  1/(60+1)                             = 0.0164
Doc D:    1/(60+3)                             = 0.0159
Django:   1/(60+2)                             = 0.0161

Final Order: Doc A, Doc B, Doc C ≈ FastAPI, Django, Doc D
```
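RRF is simple enough to implement and verify directly. Below is a minimal sketch (the `rrf_fuse` name and dict-of-lists interface are this guide's choices, not a library API) that reproduces the worked example above:

```python
from collections import defaultdict

def rrf_fuse(ranked_lists: dict[str, list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of doc ids: RRF(d) = sum over lists of 1/(k + rank(d))."""
    scores: dict[str, float] = defaultdict(float)
    for _method, docs in ranked_lists.items():
        for rank, doc_id in enumerate(docs, start=1):  # ranks are 1-based
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

fused = rrf_fuse({
    "bm25":   ["Doc A", "Doc B", "Doc D"],
    "vector": ["Doc C", "Doc A", "Doc B"],
    "graph":  ["FastAPI", "Django"],
})
for doc_id, score in fused:
    print(f"{doc_id}: {score:.4f}")  # Doc A: 0.0325, Doc B: 0.0320, ...
```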
3. Project Specification
3.1 What You Will Build
A Python retrieval engine that:
- Performs BM25, vector, and graph retrieval
- Fuses results using RRF
- Optionally reranks with cross-encoder
- Routes queries to optimal retrievers
3.2 Functional Requirements
- BM25 search: `engine.search_sparse(query)` → `List[Result]`
- Vector search: `engine.search_dense(query)` → `List[Result]`
- Graph search: `engine.search_graph(query, entities)` → `List[Result]`
- Hybrid search: `engine.search(query)` → `List[Result]`
- With reranking: `engine.search(query, rerank=True)` → `List[Result]`
- Query routing: `engine.analyze_query(query)` → `QueryAnalysis` (recommends which retrievers to run)
3.3 Example Usage / Output
```python
from hybrid_retrieval import HybridEngine

engine = HybridEngine(
    bm25_index=bm25_index,
    vector_store=chroma_db,
    graph_store=neo4j_driver,
    reranker=cross_encoder,  # Optional
)

# Basic hybrid search
results = engine.search(
    query="What Python frameworks did Alice recommend for APIs?",
    top_k=10,
)
for r in results:
    print(f"[{r.score:.3f}] {r.source}: {r.content[:100]}...")
    print(f"  Methods: {r.retrieval_methods}")
# [0.94] episode_045: "Alice recommended FastAPI for our API project..."
#   Methods: ['bm25', 'vector']
# [0.91] graph: Alice -[RECOMMENDED]-> FastAPI
#   Methods: ['graph']
# [0.87] episode_032: "We discussed Python web frameworks..."
#   Methods: ['vector']

# Search with detailed breakdown
results = engine.search(
    query="authentication implementation",
    top_k=5,
    return_breakdown=True,
)
print("BM25 candidates:", [r.id for r in results.bm25_results])
print("Vector candidates:", [r.id for r in results.vector_results])
print("Graph candidates:", [r.id for r in results.graph_results])
print("Final (fused + reranked):", [r.id for r in results.final_results])

# Query routing
routes = engine.analyze_query("error code 0x8007000d")
print(routes)
# QueryAnalysis(
#     query_type="keyword",
#     recommended_methods=["bm25"],  # exact match needed
#     entities=[],
#     keywords=["error", "0x8007000d"]
# )

routes = engine.analyze_query("What is Alice working on?")
print(routes)
# QueryAnalysis(
#     query_type="entity",
#     recommended_methods=["graph", "vector"],
#     entities=["Alice"],
#     keywords=["working"]
# )
```
4. Solution Architecture
4.1 High-Level Design
```
┌───────────────┐
│     Query     │
└───────┬───────┘
        │
        ▼
┌───────────────┐     ┌───────────────┐
│     Query     │────▶│    Router     │
│   Analyzer    │     │               │
└───────────────┘     └───────┬───────┘
                              │
              ┌───────────────┼───────────────┐
              │               │               │
              ▼               ▼               ▼
        ┌───────────┐   ┌───────────┐   ┌───────────┐
        │   BM25    │   │  Vector   │   │   Graph   │
        │ Retriever │   │ Retriever │   │ Retriever │
        └───────────┘   └───────────┘   └───────────┘
              │               │               │
              └───────────────┼───────────────┘
                              │
                              ▼
                      ┌───────────────┐
                      │    Fusion     │
                      │     (RRF)     │
                      └───────┬───────┘
                              │
                              ▼
                      ┌───────────────┐
                      │   Reranker    │
                      │  (Optional)   │
                      └───────┬───────┘
                              │
                              ▼
                      ┌───────────────┐
                      │    Results    │
                      └───────────────┘
```
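To make the flow concrete, here is one way the stages might compose. This is a sketch under assumptions of this guide, not a fixed design: retrievers are plain callables returning ranked results, routing is elided (every retriever runs), and the reranker is optional:

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class Result:
    id: str
    content: str
    score: float
    retrieval_methods: list[str] = field(default_factory=list)

class HybridEngine:
    """Minimal orchestration sketch: retrieve -> RRF-fuse -> optionally rerank."""

    def __init__(self, retrievers: dict[str, Callable[[str, int], list[Result]]],
                 reranker: Optional[Callable[[str, list[Result]], list[Result]]] = None):
        self.retrievers = retrievers  # name -> search(query, top_k) callable
        self.reranker = reranker

    def search(self, query: str, top_k: int = 10, rerank: bool = False,
               k: int = 60) -> list[Result]:
        # Over-fetch from each retriever so fusion has candidates to work with.
        ranked = {name: fn(query, top_k * 3) for name, fn in self.retrievers.items()}
        scores: dict[str, float] = {}
        by_id: dict[str, Result] = {}
        for name, results in ranked.items():
            for rank, r in enumerate(results, start=1):
                scores[r.id] = scores.get(r.id, 0.0) + 1.0 / (k + rank)
                by_id.setdefault(r.id, r).retrieval_methods.append(name)
        fused = sorted(by_id.values(), key=lambda r: scores[r.id], reverse=True)
        if rerank and self.reranker is not None:
            fused = self.reranker(query, fused[: top_k * 2])
        return fused[:top_k]
```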
4.2 Key Components
| Component | Responsibility | Technology |
|---|---|---|
| QueryAnalyzer | Extract keywords, entities, intent | LLM or rules |
| BM25Retriever | Sparse keyword matching | rank_bm25 library |
| VectorRetriever | Dense semantic search | ChromaDB |
| GraphRetriever | Relationship traversal | Neo4j + Cypher |
| RRFFusion | Combine ranked lists | Custom algorithm |
| Reranker | Refine final ranking | Cross-encoder model |
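Of the three retrievers, the graph one is the least standardized. A minimal sketch with the official `neo4j` Python driver could look like the following; the connection details and the one-hop expansion query (e.g. `Alice -[RECOMMENDED]-> FastAPI`) are illustrative, not a fixed schema:

```python
from neo4j import GraphDatabase

# Hypothetical local instance; adjust the URI and credentials for your setup.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def search_graph(entities: list[str], limit: int = 10) -> list[dict]:
    """Expand one hop out from the extracted entities and return neighbors."""
    cypher = """
        MATCH (e)-[r]-(neighbor)
        WHERE e.name IN $names
        RETURN e.name AS entity, type(r) AS rel, neighbor.name AS name
        LIMIT $limit
    """
    with driver.session() as session:
        return [record.data() for record in session.run(cypher, names=entities, limit=limit)]

print(search_graph(["Alice"]))
# e.g. [{'entity': 'Alice', 'rel': 'RECOMMENDED', 'name': 'FastAPI'}, ...]
```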
4.3 Data Models
```python
from pydantic import BaseModel, Field
from typing import Literal

class Result(BaseModel):
    id: str
    content: str
    score: float
    source: str
    retrieval_methods: list[str]  # which retrievers surfaced this result
    metadata: dict = Field(default_factory=dict)

class QueryAnalysis(BaseModel):
    query_type: Literal["keyword", "semantic", "entity", "hybrid"]
    recommended_methods: list[str]
    entities: list[str]
    keywords: list[str]

class SearchResults(BaseModel):
    query: str
    bm25_results: list[Result] | None = None    # None when a retriever was skipped
    vector_results: list[Result] | None = None
    graph_results: list[Result] | None = None
    fused_results: list[Result]
    final_results: list[Result]
```
5. Implementation Guide
5.1 Development Environment Setup
```bash
mkdir hybrid-retrieval && cd hybrid-retrieval
python -m venv .venv && source .venv/bin/activate
pip install rank-bm25 chromadb neo4j sentence-transformers
```
5.2 Project Structure
```
hybrid-retrieval/
├── src/
│   ├── engine.py           # HybridEngine main class
│   ├── retrievers/
│   │   ├── bm25.py         # BM25 retriever
│   │   ├── vector.py       # Vector retriever
│   │   └── graph.py        # Graph retriever
│   ├── fusion.py           # RRF implementation
│   ├── reranker.py         # Cross-encoder reranking
│   ├── analyzer.py         # Query analysis
│   └── models.py           # Data models
├── tests/
│   ├── test_retrieval.py
│   └── test_fusion.py
└── README.md
```
5.3 Implementation Phases
Phase 1: Individual Retrievers (10-12h)
Goals:
- BM25 retrieval working
- Vector retrieval working
- Graph retrieval working
Tasks:
- Implement BM25 indexing and search
- Set up vector store with embeddings
- Build graph retrieval with entity extraction
- Standardize result format across retrievers
Checkpoint: Each retriever returns ranked results.
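For reference, a compact sparse retriever built on the `rank_bm25` package from the setup step. Whitespace tokenization keeps the sketch short; a real index would at least lowercase, stem, and strip punctuation:

```python
from rank_bm25 import BM25Okapi  # pip install rank-bm25

class BM25Retriever:
    """Sparse keyword retriever over a small in-memory corpus."""

    def __init__(self, docs: dict[str, str]):
        self.ids = list(docs)
        self.bm25 = BM25Okapi([docs[i].lower().split() for i in self.ids])

    def search(self, query: str, top_k: int = 10) -> list[tuple[str, float]]:
        scores = self.bm25.get_scores(query.lower().split())
        order = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
        return [(self.ids[i], float(scores[i])) for i in order[:top_k] if scores[i] > 0]

retriever = BM25Retriever({
    "doc_a": "Alice recommended FastAPI for our API project",
    "doc_b": "We compared Python web frameworks for the backend",
})
print(retriever.search("Python frameworks"))  # only doc_b matches both terms
```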
Phase 2: Fusion and Routing (10-12h)
Goals:
- RRF fusion implemented
- Query routing working
Tasks:
- Implement RRF algorithm
- Build query analyzer for routing
- Add weighted fusion options
- Handle missing retrievers gracefully
Checkpoint: Hybrid search combines all retrievers.
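Routing can start as a handful of rules long before anything is learned. A toy analyzer along these lines (the patterns and the dense-by-default choice are illustrative) already separates the query families from section 2.2:

```python
import re

def route_query(query: str) -> list[str]:
    """Hypothetical rule-based router returning retriever names to run."""
    methods = ["vector"]  # dense search is a reasonable default for most queries
    # Exact-match signals: hex codes, quoted strings, snake_case identifiers, "error".
    if re.search(r'0x[0-9a-fA-F]+|"[^"]+"|\berror\b|\w+_\w+', query):
        methods.append("bm25")
    # Entity signals: capitalized tokens past the first word suggest names.
    tokens = query.split()
    if any(t[0].isupper() for t in tokens[1:] if t[0].isalpha()):
        methods.append("graph")
    return methods

print(route_query("error code 0x8007000d"))      # ['vector', 'bm25']
print(route_query("What is Alice working on?"))  # ['vector', 'graph']
```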
Phase 3: Reranking and Optimization (8-10h)
Goals:
- Cross-encoder reranking
- Performance optimization
Tasks:
- Add cross-encoder reranker
- Implement result caching
- Add async retrieval for parallelism
- Benchmark and tune parameters
Checkpoint: Production-ready hybrid retrieval.
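The reranking step is a few lines with sentence-transformers; `cross-encoder/ms-marco-MiniLM-L-6-v2` is a commonly used public checkpoint, but any cross-encoder works. For the parallelism task, running the blocking retriever clients in a `concurrent.futures.ThreadPoolExecutor` is often the simplest win.

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, docs: list[str], top_k: int = 5) -> list[tuple[str, float]]:
    """Score each (query, doc) pair jointly and keep the top_k documents."""
    scores = model.predict([(query, doc) for doc in docs])  # one score per pair
    ranked = sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)
    return [(doc, float(score)) for doc, score in ranked[:top_k]]
```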
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit | Test individual components | RRF calculation |
| Integration | Test full pipeline | Query → results |
| Quality | Test retrieval accuracy | Known relevant docs |
6.2 Critical Test Cases
- Exact match query: BM25 should rank highest
- Semantic query: Vector should contribute
- Entity query: Graph results appear
- RRF correctness: Fusion ranks match expected
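The RRF correctness case can pin down the exact numbers from the worked example in section 2.4. A pytest sketch, assuming the `rrf_fuse` function shown earlier lives in `src/fusion.py`:

```python
import pytest
from src.fusion import rrf_fuse  # assumes the sketch from section 2.4

def test_rrf_matches_worked_example():
    fused = rrf_fuse({
        "bm25":   ["Doc A", "Doc B", "Doc D"],
        "vector": ["Doc C", "Doc A", "Doc B"],
        "graph":  ["FastAPI", "Django"],
    })
    order = [doc for doc, _ in fused]
    assert order[:2] == ["Doc A", "Doc B"]         # multi-list docs rank first
    scores = dict(fused)
    assert scores["Doc A"] == pytest.approx(1/61 + 1/62)
    assert scores["Doc C"] == pytest.approx(1/61)  # single appearance
```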
7. Common Pitfalls & Debugging
| Pitfall | Symptom | Solution |
|---|---|---|
| Score incompatibility | One retriever dominates | Normalize scores before fusion |
| Missing results | Expected doc not returned | Check all retrievers independently |
| Slow performance | High latency | Run retrievers in parallel |
| Reranker bias | Verbose docs ranked higher | Include length normalization |
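On the score-incompatibility pitfall: rank-based RRF sidesteps it, but if you add weighted score fusion in Phase 2, normalize each retriever's scores to a shared range first. A minimal min-max sketch:

```python
def min_max_normalize(scored: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Rescale one retriever's scores to [0, 1] so no method dominates fusion."""
    if not scored:
        return []
    values = [score for _, score in scored]
    lo, hi = min(values), max(values)
    if hi == lo:
        return [(doc, 1.0) for doc, _ in scored]  # degenerate case: all scores equal
    return [(doc, (score - lo) / (hi - lo)) for doc, score in scored]
```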
8. Extensions & Challenges
8.1 Beginner Extensions
- Add retrieval method explanations
- Implement result deduplication
8.2 Intermediate Extensions
- Add learned fusion weights
- Implement query expansion
8.3 Advanced Extensions
- Add adaptive routing (learn from feedback)
- Implement distributed retrieval
9. Real-World Connections
9.1 Industry Applications
- Elasticsearch: Hybrid search with vectors
- Pinecone Hybrid: Sparse + dense retrieval
- Graphiti: Graph-enhanced RAG
9.2 Interview Relevance
- Explain BM25 vs vector tradeoffs
- Describe RRF and why it works
- Discuss when to use reranking
10. Resources
10.1 Essential Reading
- “AI Engineering” by Chip Huyen — Ch. on RAG
- “Reciprocal Rank Fusion Outperforms Condorcet and Individual Rank Learning Methods” — Cormack, Clarke & Buettcher (SIGIR 2009)
- Cross-encoder vs Bi-encoder — SBERT documentation
10.2 Related Projects
- Previous: Project 11 (MemGPT Virtual Context)
- Next: Project 13 (Multi-Agent Shared Memory)
11. Self-Assessment Checklist
- I understand BM25, vector, and graph retrieval tradeoffs
- I can implement Reciprocal Rank Fusion
- I know when to use reranking
- I can route queries to optimal retrievers
12. Submission / Completion Criteria
Minimum Viable Completion:
- BM25 and vector retrieval working
- RRF fusion implemented
- Basic hybrid search
Full Completion:
- Graph retrieval integrated
- Query routing working
- Cross-encoder reranking
Excellence:
- Learned fusion weights
- Adaptive routing
- Production performance