Project 12: Hybrid Retrieval Engine
Build a retrieval engine that combines semantic search (vectors), graph traversal, and keyword matching with intelligent fusion, optimizing for both precision and recall.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 4: Expert |
| Time Estimate | 2-3 weeks (30-40 hours) |
| Language | Python |
| Prerequisites | Projects 1-11, information retrieval, reranking |
| Key Topics | Hybrid search, BM25, vector similarity, graph traversal, Reciprocal Rank Fusion, reranking, query routing |
1. Learning Objectives
By completing this project, you will:
- Understand different retrieval paradigms (sparse, dense, graph).
- Implement BM25 for keyword matching.
- Build vector similarity search with embeddings.
- Add graph-based retrieval via relationship traversal.
- Fuse results using RRF and learned reranking.
2. Theoretical Foundation
2.1 Core Concepts
- Sparse Retrieval (BM25): Term-frequency-based matching. Great for exact keywords and rare terms.
- Dense Retrieval (Vectors): Semantic similarity via embeddings. Handles synonyms and related concepts.
- Graph Retrieval: Following relationships from matched entities. Adds structural context.
- Reciprocal Rank Fusion (RRF): Combines ranked lists by summing reciprocal ranks. Simple and effective.
- Reranking: Using a cross-encoder or LLM to refine the final ranking.
2.2 Why This Matters
No single retrieval method is best for all queries:
- “error 0x8007000d” → BM25 (exact match needed)
- “how to fix permissions” → Vector (semantic)
- “Alice’s current projects” → Graph (relationships)
Hybrid retrieval combines strengths and covers weaknesses.
2.3 Common Misconceptions
- “Vectors are always better.” They miss exact matches and rare terms.
- “BM25 is outdated.” It’s still competitive and often complementary.
- “More retrievers = better.” Fusion must be done carefully to avoid noise.
2.4 ASCII Diagram: Hybrid Retrieval Pipeline
```
HYBRID RETRIEVAL ARCHITECTURE
═════════════════════════════

                        USER QUERY
"What Python frameworks did Alice recommend for our API project?"
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│                      QUERY ANALYZER                     │
│                                                         │
│  Extract:                                               │
│  - Keywords: ["Python", "frameworks", "API"]            │
│  - Entities: ["Alice", "API project"]                   │
│  - Intent: recommendation_lookup                        │
│  - Time: none specified (current)                       │
└─────────────────────────────────────────────────────────┘
                             │
              ┌──────────────┼──────────────┐
              │              │              │
              ▼              ▼              ▼
      ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
      │   SPARSE    │ │    DENSE    │ │    GRAPH    │
      │   (BM25)    │ │  (Vector)   │ │ (Traversal) │
      └─────────────┘ └─────────────┘ └─────────────┘
              │              │              │
              ▼              ▼              ▼
      ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
      │ Results:    │ │ Results:    │ │ Results:    │
      │ 1. Doc A    │ │ 1. Doc C    │ │ 1. FastAPI  │
      │ 2. Doc B    │ │ 2. Doc A    │ │ 2. Django   │
      │ 3. Doc D    │ │ 3. Doc B    │ │ 3. Flask    │
      │ 4. Doc E    │ │ 4. Doc F    │ │ (via Alice) │
      └─────────────┘ └─────────────┘ └─────────────┘
              │              │              │
              └──────────────┼──────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│               RECIPROCAL RANK FUSION (RRF)              │
│                                                         │
│  RRF(d) = Σ 1/(k + rank_i(d))   where k = 60            │
│                                                         │
│  Fused Results:                                         │
│  1. Doc A    (BM25@1, Vector@2)   RRF: 0.033            │
│  2. Doc B    (BM25@2, Vector@3)   RRF: 0.032            │
│  3. Doc C    (Vector@1)           RRF: 0.016            │
│  4. FastAPI  (Graph@1)            RRF: 0.016            │
│  ...                                                    │
└─────────────────────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│                        RERANKER                         │
│                                                         │
│  Cross-encoder or LLM scores each (query, doc) pair     │
│                                                         │
│  Final Results:                                         │
│  1. Doc A   (0.94)  "Alice recommended FastAPI..."      │
│  2. FastAPI (0.91)  graph: Alice -[RECOMMENDED]->       │
│  3. Doc B   (0.87)  "Python API frameworks..."          │
└─────────────────────────────────────────────────────────┘


RETRIEVAL METHOD COMPARISON
═══════════════════════════

Query Type              Best Method          Why
────────────────────────────────────────────────────────────
"error 0x8007000d"      BM25                 Exact match needed
"how to handle auth"    Vector               Semantic similarity
"Alice's projects"      Graph                Relationship-based
"Python for web APIs"   Hybrid (all three)   Multiple signals


RECIPROCAL RANK FUSION EXAMPLE
══════════════════════════════

BM25 Results:     Vector Results:     Graph Results:
1. Doc A          1. Doc C            1. FastAPI
2. Doc B          2. Doc A            2. Django
3. Doc D          3. Doc B

RRF Calculation (k=60):
Doc A:    1/(60+1) + 1/(60+2) = 0.0164 + 0.0161 = 0.0325
Doc B:    1/(60+2) + 1/(60+3) = 0.0161 + 0.0159 = 0.0320
Doc C:    1/(60+1)                             = 0.0164
FastAPI:  1/(60+1)                             = 0.0164
Doc D:    1/(60+3)                             = 0.0159
Django:   1/(60+2)                             = 0.0161

Final Order: Doc A, Doc B, Doc C ≈ FastAPI, Django, Doc D
```
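RRF is simple enough to implement and verify directly. Below is a minimal sketch (the `rrf_fuse` name and dict-of-lists interface are this guide's choices, not a library API) that reproduces the worked example above:

```python
from collections import defaultdict

def rrf_fuse(ranked_lists: dict[str, list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of doc ids: RRF(d) = sum over lists of 1/(k + rank(d))."""
    scores: dict[str, float] = defaultdict(float)
    for _method, docs in ranked_lists.items():
        for rank, doc_id in enumerate(docs, start=1):  # ranks are 1-based
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

fused = rrf_fuse({
    "bm25":   ["Doc A", "Doc B", "Doc D"],
    "vector": ["Doc C", "Doc A", "Doc B"],
    "graph":  ["FastAPI", "Django"],
})
for doc_id, score in fused:
    print(f"{doc_id}: {score:.4f}")  # Doc A: 0.0325, Doc B: 0.0320, ...
```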
3. Project Specification
3.1 What You Will Build
A Python retrieval engine that:
- Performs BM25, vector, and graph retrieval
- Fuses results using RRF
- Optionally reranks with cross-encoder
- Routes queries to optimal retrievers
3.2 Functional Requirements
- BM25 search: `engine.search_sparse(query)` → `List[Result]`
- Vector search: `engine.search_dense(query)` → `List[Result]`
- Graph search: `engine.search_graph(query, entities)` → `List[Result]`
- Hybrid search: `engine.search(query)` → `List[Result]`
- With reranking: `engine.search(query, rerank=True)` → `List[Result]`
- Query routing: `engine.analyze_query(query)` → `QueryAnalysis` (recommends which retrievers to run)
3.3 Example Usage / Output
```python
from hybrid_retrieval import HybridEngine

engine = HybridEngine(
    bm25_index=bm25_index,
    vector_store=chroma_db,
    graph_store=neo4j_driver,
    reranker=cross_encoder,  # Optional
)

# Basic hybrid search
results = engine.search(
    query="What Python frameworks did Alice recommend for APIs?",
    top_k=10,
)
for r in results:
    print(f"[{r.score:.3f}] {r.source}: {r.content[:100]}...")
    print(f"  Methods: {r.retrieval_methods}")
# [0.94] episode_045: "Alice recommended FastAPI for our API project..."
#   Methods: ['bm25', 'vector']
# [0.91] graph: Alice -[RECOMMENDED]-> FastAPI
#   Methods: ['graph']
# [0.87] episode_032: "We discussed Python web frameworks..."
#   Methods: ['vector']

# Search with detailed breakdown
results = engine.search(
    query="authentication implementation",
    top_k=5,
    return_breakdown=True,
)
print("BM25 candidates:", [r.id for r in results.bm25_results])
print("Vector candidates:", [r.id for r in results.vector_results])
print("Graph candidates:", [r.id for r in results.graph_results])
print("Final (fused + reranked):", [r.id for r in results.final_results])

# Query routing
routes = engine.analyze_query("error code 0x8007000d")
print(routes)
# QueryAnalysis(
#     query_type="keyword",
#     recommended_methods=["bm25"],  # exact match needed
#     entities=[],
#     keywords=["error", "0x8007000d"]
# )

routes = engine.analyze_query("What is Alice working on?")
print(routes)
# QueryAnalysis(
#     query_type="entity",
#     recommended_methods=["graph", "vector"],
#     entities=["Alice"],
#     keywords=["working"]
# )
```
4. Solution Architecture
4.1 High-Level Design
```
┌───────────────┐
│     Query     │
└───────┬───────┘
        │
        ▼
┌───────────────┐     ┌───────────────┐
│     Query     │────▶│    Router     │
│   Analyzer    │     │               │
└───────────────┘     └───────┬───────┘
                              │
              ┌───────────────┼───────────────┐
              │               │               │
              ▼               ▼               ▼
        ┌───────────┐   ┌───────────┐   ┌───────────┐
        │   BM25    │   │  Vector   │   │   Graph   │
        │ Retriever │   │ Retriever │   │ Retriever │
        └───────────┘   └───────────┘   └───────────┘
              │               │               │
              └───────────────┼───────────────┘
                              │
                              ▼
                      ┌───────────────┐
                      │    Fusion     │
                      │     (RRF)     │
                      └───────┬───────┘
                              │
                              ▼
                      ┌───────────────┐
                      │   Reranker    │
                      │  (Optional)   │
                      └───────┬───────┘
                              │
                              ▼
                      ┌───────────────┐
                      │    Results    │
                      └───────────────┘
```
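To make the flow concrete, here is one way the stages might compose. This is a sketch under assumptions of this guide, not a fixed design: retrievers are plain callables returning ranked results, routing is elided (every retriever runs), and the reranker is optional:

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class Result:
    id: str
    content: str
    score: float
    retrieval_methods: list[str] = field(default_factory=list)

class HybridEngine:
    """Minimal orchestration sketch: retrieve -> RRF-fuse -> optionally rerank."""

    def __init__(self, retrievers: dict[str, Callable[[str, int], list[Result]]],
                 reranker: Optional[Callable[[str, list[Result]], list[Result]]] = None):
        self.retrievers = retrievers  # name -> search(query, top_k) callable
        self.reranker = reranker

    def search(self, query: str, top_k: int = 10, rerank: bool = False,
               k: int = 60) -> list[Result]:
        # Over-fetch from each retriever so fusion has candidates to work with.
        ranked = {name: fn(query, top_k * 3) for name, fn in self.retrievers.items()}
        scores: dict[str, float] = {}
        by_id: dict[str, Result] = {}
        for name, results in ranked.items():
            for rank, r in enumerate(results, start=1):
                scores[r.id] = scores.get(r.id, 0.0) + 1.0 / (k + rank)
                by_id.setdefault(r.id, r).retrieval_methods.append(name)
        fused = sorted(by_id.values(), key=lambda r: scores[r.id], reverse=True)
        if rerank and self.reranker is not None:
            fused = self.reranker(query, fused[: top_k * 2])
        return fused[:top_k]
```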
4.2 Key Components
| Component | Responsibility | Technology |
|---|---|---|
| QueryAnalyzer | Extract keywords, entities, intent | LLM or rules |
| BM25Retriever | Sparse keyword matching | rank_bm25 library |
| VectorRetriever | Dense semantic search | ChromaDB |
| GraphRetriever | Relationship traversal | Neo4j + Cypher |
| RRFFusion | Combine ranked lists | Custom algorithm |
| Reranker | Refine final ranking | Cross-encoder model |
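Of the three retrievers, the graph one is the least standardized. A minimal sketch with the official `neo4j` Python driver could look like the following; the connection details and the one-hop expansion query (e.g. `Alice -[RECOMMENDED]-> FastAPI`) are illustrative, not a fixed schema:

```python
from neo4j import GraphDatabase

# Hypothetical local instance; adjust the URI and credentials for your setup.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def search_graph(entities: list[str], limit: int = 10) -> list[dict]:
    """Expand one hop out from the extracted entities and return neighbors."""
    cypher = """
        MATCH (e)-[r]-(neighbor)
        WHERE e.name IN $names
        RETURN e.name AS entity, type(r) AS rel, neighbor.name AS name
        LIMIT $limit
    """
    with driver.session() as session:
        return [record.data() for record in session.run(cypher, names=entities, limit=limit)]

print(search_graph(["Alice"]))
# e.g. [{'entity': 'Alice', 'rel': 'RECOMMENDED', 'name': 'FastAPI'}, ...]
```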
4.3 Data Models
```python
from pydantic import BaseModel, Field
from typing import Literal

class Result(BaseModel):
    id: str
    content: str
    score: float
    source: str
    retrieval_methods: list[str]  # which retrievers surfaced this result
    metadata: dict = Field(default_factory=dict)

class QueryAnalysis(BaseModel):
    query_type: Literal["keyword", "semantic", "entity", "hybrid"]
    recommended_methods: list[str]
    entities: list[str]
    keywords: list[str]

class SearchResults(BaseModel):
    query: str
    bm25_results: list[Result] | None = None    # None when a retriever was skipped
    vector_results: list[Result] | None = None
    graph_results: list[Result] | None = None
    fused_results: list[Result]
    final_results: list[Result]
```
5. Implementation Guide
5.1 Development Environment Setup
```bash
mkdir hybrid-retrieval && cd hybrid-retrieval
python -m venv .venv && source .venv/bin/activate
pip install rank-bm25 chromadb neo4j sentence-transformers
```
5.2 Project Structure
```
hybrid-retrieval/
├── src/
│   ├── engine.py           # HybridEngine main class
│   ├── retrievers/
│   │   ├── bm25.py         # BM25 retriever
│   │   ├── vector.py       # Vector retriever
│   │   └── graph.py        # Graph retriever
│   ├── fusion.py           # RRF implementation
│   ├── reranker.py         # Cross-encoder reranking
│   ├── analyzer.py         # Query analysis
│   └── models.py           # Data models
├── tests/
│   ├── test_retrieval.py
│   └── test_fusion.py
└── README.md
```
5.3 Implementation Phases
Phase 1: Individual Retrievers (10-12h)
Goals:
- BM25 retrieval working
- Vector retrieval working
- Graph retrieval working
Tasks:
- Implement BM25 indexing and search
- Set up vector store with embeddings
- Build graph retrieval with entity extraction
- Standardize result format across retrievers
Checkpoint: Each retriever returns ranked results.
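For reference, a compact sparse retriever built on the `rank_bm25` package from the setup step. Whitespace tokenization keeps the sketch short; a real index would at least lowercase, stem, and strip punctuation:

```python
from rank_bm25 import BM25Okapi  # pip install rank-bm25

class BM25Retriever:
    """Sparse keyword retriever over a small in-memory corpus."""

    def __init__(self, docs: dict[str, str]):
        self.ids = list(docs)
        self.bm25 = BM25Okapi([docs[i].lower().split() for i in self.ids])

    def search(self, query: str, top_k: int = 10) -> list[tuple[str, float]]:
        scores = self.bm25.get_scores(query.lower().split())
        order = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
        return [(self.ids[i], float(scores[i])) for i in order[:top_k] if scores[i] > 0]

retriever = BM25Retriever({
    "doc_a": "Alice recommended FastAPI for our API project",
    "doc_b": "We compared Python web frameworks for the backend",
})
print(retriever.search("Python frameworks"))  # only doc_b matches both terms
```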
Phase 2: Fusion and Routing (10-12h)
Goals:
- RRF fusion implemented
- Query routing working
Tasks:
- Implement RRF algorithm
- Build query analyzer for routing
- Add weighted fusion options
- Handle missing retrievers gracefully
Checkpoint: Hybrid search combines all retrievers.
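Routing can start as a handful of rules long before anything is learned. A toy analyzer along these lines (the patterns and the dense-by-default choice are illustrative) already separates the query families from section 2.2:

```python
import re

def route_query(query: str) -> list[str]:
    """Hypothetical rule-based router returning retriever names to run."""
    methods = ["vector"]  # dense search is a reasonable default for most queries
    # Exact-match signals: hex codes, quoted strings, snake_case identifiers, "error".
    if re.search(r'0x[0-9a-fA-F]+|"[^"]+"|\berror\b|\w+_\w+', query):
        methods.append("bm25")
    # Entity signals: capitalized tokens past the first word suggest names.
    tokens = query.split()
    if any(t[0].isupper() for t in tokens[1:] if t[0].isalpha()):
        methods.append("graph")
    return methods

print(route_query("error code 0x8007000d"))      # ['vector', 'bm25']
print(route_query("What is Alice working on?"))  # ['vector', 'graph']
```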
Phase 3: Reranking and Optimization (8-10h)
Goals:
- Cross-encoder reranking
- Performance optimization
Tasks:
- Add cross-encoder reranker
- Implement result caching
- Add async retrieval for parallelism
- Benchmark and tune parameters
Checkpoint: Production-ready hybrid retrieval.
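The reranking step is a few lines with sentence-transformers; `cross-encoder/ms-marco-MiniLM-L-6-v2` is a commonly used public checkpoint, but any cross-encoder works. For the parallelism task, running the blocking retriever clients in a `concurrent.futures.ThreadPoolExecutor` is often the simplest win.

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, docs: list[str], top_k: int = 5) -> list[tuple[str, float]]:
    """Score each (query, doc) pair jointly and keep the top_k documents."""
    scores = model.predict([(query, doc) for doc in docs])  # one score per pair
    ranked = sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)
    return [(doc, float(score)) for doc, score in ranked[:top_k]]
```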
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit | Test individual components | RRF calculation |
| Integration | Test full pipeline | Query → results |
| Quality | Test retrieval accuracy | Known relevant docs |
6.2 Critical Test Cases
- Exact match query: BM25 should rank highest
- Semantic query: Vector should contribute
- Entity query: Graph results appear
- RRF correctness: Fusion ranks match expected
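The RRF correctness case can pin down the exact numbers from the worked example in section 2.4. A pytest sketch, assuming the `rrf_fuse` function shown earlier lives in `src/fusion.py`:

```python
import pytest
from src.fusion import rrf_fuse  # assumes the sketch from section 2.4

def test_rrf_matches_worked_example():
    fused = rrf_fuse({
        "bm25":   ["Doc A", "Doc B", "Doc D"],
        "vector": ["Doc C", "Doc A", "Doc B"],
        "graph":  ["FastAPI", "Django"],
    })
    order = [doc for doc, _ in fused]
    assert order[:2] == ["Doc A", "Doc B"]         # multi-list docs rank first
    scores = dict(fused)
    assert scores["Doc A"] == pytest.approx(1/61 + 1/62)
    assert scores["Doc C"] == pytest.approx(1/61)  # single appearance
```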
7. Common Pitfalls & Debugging
| Pitfall | Symptom | Solution |
|---|---|---|
| Score incompatibility | One retriever dominates | Normalize scores before fusion |
| Missing results | Expected doc not returned | Check all retrievers independently |
| Slow performance | High latency | Run retrievers in parallel |
| Reranker bias | Verbose docs ranked higher | Include length normalization |
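On the score-incompatibility pitfall: rank-based RRF sidesteps it, but if you add weighted score fusion in Phase 2, normalize each retriever's scores to a shared range first. A minimal min-max sketch:

```python
def min_max_normalize(scored: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Rescale one retriever's scores to [0, 1] so no method dominates fusion."""
    if not scored:
        return []
    values = [score for _, score in scored]
    lo, hi = min(values), max(values)
    if hi == lo:
        return [(doc, 1.0) for doc, _ in scored]  # degenerate case: all scores equal
    return [(doc, (score - lo) / (hi - lo)) for doc, score in scored]
```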
8. Extensions & Challenges
8.1 Beginner Extensions
- Add retrieval method explanations
- Implement result deduplication
8.2 Intermediate Extensions
- Add learned fusion weights
- Implement query expansion
8.3 Advanced Extensions
- Add adaptive routing (learn from feedback)
- Implement distributed retrieval
9. Real-World Connections
9.1 Industry Applications
- Elasticsearch: Hybrid search with vectors
- Pinecone Hybrid: Sparse + dense retrieval
- Graphiti: Graph-enhanced RAG
9.2 Interview Relevance
- Explain BM25 vs vector tradeoffs
- Describe RRF and why it works
- Discuss when to use reranking
10. Resources
10.1 Essential Reading
- “AI Engineering” by Chip Huyen — Ch. on RAG
- “Reciprocal Rank Fusion Outperforms Condorcet and Individual Rank Learning Methods” — Cormack, Clarke & Buettcher (SIGIR 2009)
- Cross-encoder vs Bi-encoder — SBERT documentation
10.2 Related Projects
- Previous: Project 11 (MemGPT Virtual Context)
- Next: Project 13 (Multi-Agent Shared Memory)
11. Self-Assessment Checklist
- I understand BM25, vector, and graph retrieval tradeoffs
- I can implement Reciprocal Rank Fusion
- I know when to use reranking
- I can route queries to optimal retrievers
12. Submission / Completion Criteria
Minimum Viable Completion:
- BM25 and vector retrieval working
- RRF fusion implemented
- Basic hybrid search
Full Completion:
- Graph retrieval integrated
- Query routing working
- Cross-encoder reranking
Excellence:
- Learned fusion weights
- Adaptive routing
- Production performance