Project 12: Hybrid Retrieval Engine

Build a retrieval engine that combines semantic search (vectors), graph traversal, and keyword matching with intelligent fusion, optimizing for both precision and recall.

Quick Reference

Attribute      Value
───────────────────────────────────────────────────────────────
Difficulty     Level 4: Expert
Time Estimate  2-3 weeks (30-40 hours)
Language       Python
Prerequisites  Projects 1-11, information retrieval, reranking
Key Topics     Hybrid search, BM25, vector similarity, graph
               traversal, Reciprocal Rank Fusion, reranking,
               query routing

1. Learning Objectives

By completing this project, you will:

  1. Understand different retrieval paradigms (sparse, dense, graph).
  2. Implement BM25 for keyword matching.
  3. Build vector similarity search with embeddings.
  4. Add graph-based retrieval via relationship traversal.
  5. Fuse results using RRF and learned reranking.

2. Theoretical Foundation

2.1 Core Concepts

  • Sparse Retrieval (BM25): Term frequency-based matching. Great for exact keywords and rare terms (contrasted with dense retrieval in the sketch after this list).

  • Dense Retrieval (Vectors): Semantic similarity via embeddings. Handles synonyms and concepts.

  • Graph Retrieval: Following relationships from matched entities. Adds structural context.

  • Reciprocal Rank Fusion (RRF): Combines ranked lists by reciprocal rank. Simple and effective.

  • Reranking: Using a cross-encoder or LLM to refine final ranking.
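
The sparse/dense contrast is easiest to see in code. A minimal sketch, assuming the rank_bm25 and sentence-transformers packages from Section 5.1 (the embedding model name is an illustrative choice):

from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, util

corpus = [
    "Fix error 0x8007000d by re-registering the Windows installer service",
    "Granting file permissions on Linux with chmod and chown",
]
tokenized = [doc.lower().split() for doc in corpus]

# Sparse: BM25 rewards exact term overlap, so the rare hex token wins outright.
bm25 = BM25Okapi(tokenized)
print(bm25.get_scores("error 0x8007000d".split()))

# Dense: embeddings capture meaning, so "fix access rights" matches the
# permissions document despite sharing no keywords with it.
model = SentenceTransformer("all-MiniLM-L6-v2")
doc_embeddings = model.encode(corpus)
query_embedding = model.encode("how to fix access rights")
print(util.cos_sim(query_embedding, doc_embeddings))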

2.2 Why This Matters

No single retrieval method is best for all queries:

  • “error 0x8007000d” → BM25 (exact match needed)
  • “how to fix permissions” → Vector (semantic)
  • “Alice’s current projects” → Graph (relationships)

Hybrid retrieval combines strengths and covers weaknesses.
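
A first cut at exploiting this is a handful of rules over the query string. A toy routing sketch (the patterns are illustrative, not part of this project's spec):

import re

def route(query: str) -> list[str]:
    methods = []
    if re.search(r"0x[0-9a-fA-F]+", query):
        methods.append("bm25")    # hex error codes need exact matching
    if re.search(r"\b[A-Z][a-z]+\b", query):
        methods.append("graph")   # capitalized tokens hint at named entities
    methods.append("vector")      # semantic search as the always-on fallback
    return methods

print(route("error 0x8007000d"))          # ['bm25', 'vector']
print(route("Alice's current projects"))  # ['graph', 'vector']

The QueryAnalyzer in Section 4 generalizes these heuristics with LLM- or rule-based analysis.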

2.3 Common Misconceptions

  • “Vectors are always better.” They miss exact matches and rare terms.
  • “BM25 is outdated.” It’s still competitive and often complementary.
  • “More retrievers = better.” Fusion must be done carefully to avoid noise.

2.4 ASCII Diagram: Hybrid Retrieval Pipeline

HYBRID RETRIEVAL ARCHITECTURE
══════════════════════════════════════════════════════════════

USER QUERY
"What Python frameworks did Alice recommend for our API project?"

                    │
                    ▼
┌─────────────────────────────────────────────────────────────┐
│                    QUERY ANALYZER                            │
│                                                              │
│  Extract:                                                    │
│  - Keywords: ["Python", "frameworks", "API"]                │
│  - Entities: ["Alice", "API project"]                       │
│  - Intent: recommendation_lookup                             │
│  - Time: none specified (current)                           │
└─────────────────────────────────────────────────────────────┘
                    │
        ┌───────────┼───────────┐
        │           │           │
        ▼           ▼           ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│  SPARSE     │ │   DENSE     │ │   GRAPH     │
│  (BM25)     │ │  (Vector)   │ │ (Traversal) │
└─────────────┘ └─────────────┘ └─────────────┘
        │           │           │
        ▼           ▼           ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Results:    │ │ Results:    │ │ Results:    │
│ 1. Doc A    │ │ 1. Doc C    │ │ 1. FastAPI  │
│ 2. Doc B    │ │ 2. Doc A    │ │ 2. Django   │
│ 3. Doc D    │ │ 3. Doc B    │ │ 3. Flask    │
│ 4. Doc E    │ │ 4. Doc F    │ │ (via Alice) │
└─────────────┘ └─────────────┘ └─────────────┘
        │           │           │
        └───────────┼───────────┘
                    │
                    ▼
┌─────────────────────────────────────────────────────────────┐
│               RECIPROCAL RANK FUSION (RRF)                   │
│                                                              │
│  RRF(d) = Σ 1/(k + rank_i(d))  where k = 60                 │
│                                                              │
│  Fused Results:                                              │
│  1. Doc A (appeared in BM25@1, Vector@2)     RRF: 0.033     │
│  2. Doc B (appeared in BM25@2, Vector@3)     RRF: 0.032     │
│  3. Doc C (appeared in Vector@1)             RRF: 0.016     │
│  4. FastAPI (appeared in Graph@1)            RRF: 0.016     │
│  ...                                                         │
└─────────────────────────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────────────────────────┐
│                      RERANKER                                │
│                                                              │
│  Cross-encoder or LLM scores each (query, doc) pair         │
│                                                              │
│  Final Results:                                              │
│  1. Doc A (score: 0.94) - "Alice recommended FastAPI..."    │
│  2. FastAPI (score: 0.91) - From graph: Alice RECOMMENDED   │
│  3. Doc B (score: 0.87) - "Python API frameworks..."        │
└─────────────────────────────────────────────────────────────┘


RETRIEVAL METHOD COMPARISON
═══════════════════════════

Query Type              Best Method         Why
─────────────────────────────────────────────────────────────
"error 0x8007000d"     BM25                 Exact match needed
"how to handle auth"   Vector               Semantic similarity
"Alice's projects"     Graph                Relationship-based
"Python for web APIs"  Hybrid (all three)   Multiple signals


RECIPROCAL RANK FUSION EXAMPLE
══════════════════════════════

BM25 Results:     Vector Results:    Graph Results:
1. Doc A          1. Doc C           1. FastAPI
2. Doc B          2. Doc A           2. Django
3. Doc D          3. Doc B

RRF Calculation (k=60):

Doc A:  1/(60+1) + 1/(60+2) = 0.0164 + 0.0161 = 0.0325
Doc B:  1/(60+2) + 1/(60+3) = 0.0161 + 0.0159 = 0.0320
Doc C:  1/(60+1) = 0.0164
FastAPI: 1/(60+1) = 0.0164
Doc D:  1/(60+3) = 0.0159
Django: 1/(60+2) = 0.0161

Final Order: Doc A, Doc B, Doc C ≈ FastAPI, Django, Doc D
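
The calculation above drops straight into code. A minimal sketch that reproduces these numbers:

from collections import defaultdict

def rrf(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists; each item earns 1/(k + rank) per list it appears in."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

fused = rrf([
    ["Doc A", "Doc B", "Doc D"],   # BM25
    ["Doc C", "Doc A", "Doc B"],   # Vector
    ["FastAPI", "Django"],         # Graph
])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.4f}")
# Doc A: 0.0325, Doc B: 0.0320, Doc C: 0.0164, FastAPI: 0.0164, ...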

3. Project Specification

3.1 What You Will Build

A Python retrieval engine that:

  • Performs BM25, vector, and graph retrieval
  • Fuses results using RRF
  • Optionally reranks with cross-encoder
  • Routes queries to optimal retrievers

3.2 Functional Requirements

  1. BM25 search: engine.search_sparse(query) → list[Result]
  2. Vector search: engine.search_dense(query) → list[Result]
  3. Graph search: engine.search_graph(query, entities) → list[Result]
  4. Hybrid search: engine.search(query) → list[Result]
  5. With reranking: engine.search(query, rerank=True) → list[Result]
  6. Query analysis and routing: engine.analyze_query(query) → QueryAnalysis

3.3 Example Usage / Output

from hybrid_retrieval import HybridEngine

engine = HybridEngine(
    bm25_index=bm25_index,
    vector_store=chroma_db,
    graph_store=neo4j_driver,
    reranker=cross_encoder  # Optional
)

# Basic hybrid search
results = engine.search(
    query="What Python frameworks did Alice recommend for APIs?",
    top_k=10
)

for r in results:
    print(f"[{r.score:.3f}] {r.source}: {r.content[:100]}...")
    print(f"  Methods: {r.retrieval_methods}")
# [0.94] episode_045: "Alice recommended FastAPI for our API project..."
#   Methods: ['bm25', 'vector']
# [0.91] graph: Alice -[RECOMMENDED]-> FastAPI
#   Methods: ['graph']
# [0.87] episode_032: "We discussed Python web frameworks..."
#   Methods: ['vector']

# Search with detailed breakdown
results = engine.search(
    query="authentication implementation",
    top_k=5,
    return_breakdown=True
)

print("BM25 candidates:", [r.id for r in results.bm25_results])
print("Vector candidates:", [r.id for r in results.vector_results])
print("Graph candidates:", [r.id for r in results.graph_results])
print("Final (fused + reranked):", [r.id for r in results.final_results])

# Query routing
routes = engine.analyze_query("error code 0x8007000d")
print(routes)
# QueryAnalysis(
#   query_type="error_lookup",
#   recommended_methods=["bm25"],  # Exact match needed
#   entities=[],
#   keywords=["error", "0x8007000d"]
# )

routes = engine.analyze_query("What is Alice working on?")
print(routes)
# QueryAnalysis(
#   query_type="entity_lookup",
#   recommended_methods=["graph", "vector"],
#   entities=["Alice"],
#   keywords=["working"]
# )

4. Solution Architecture

4.1 High-Level Design

┌───────────────┐
│    Query      │
└───────┬───────┘
        │
        ▼
┌───────────────┐     ┌───────────────┐
│    Query      │────▶│    Router     │
│   Analyzer    │     │               │
└───────────────┘     └───────┬───────┘
                              │
              ┌───────────────┼───────────────┐
              │               │               │
              ▼               ▼               ▼
       ┌───────────┐   ┌───────────┐   ┌───────────┐
       │   BM25    │   │  Vector   │   │   Graph   │
       │ Retriever │   │ Retriever │   │ Retriever │
       └───────────┘   └───────────┘   └───────────┘
              │               │               │
              └───────────────┼───────────────┘
                              │
                              ▼
                    ┌───────────────┐
                    │    Fusion     │
                    │    (RRF)      │
                    └───────┬───────┘
                            │
                            ▼
                    ┌───────────────┐
                    │   Reranker    │
                    │  (Optional)   │
                    └───────┬───────┘
                            │
                            ▼
                    ┌───────────────┐
                    │   Results     │
                    └───────────────┘

4.2 Key Components

Component        Responsibility                      Technology
──────────────────────────────────────────────────────────────────────
QueryAnalyzer    Extract keywords, entities, intent  LLM or rules
BM25Retriever    Sparse keyword matching             rank_bm25 library
VectorRetriever  Dense semantic search               ChromaDB
GraphRetriever   Relationship traversal              Neo4j + Cypher
RRFFusion        Combine ranked lists                Custom algorithm
Reranker         Refine final ranking                Cross-encoder model

4.3 Data Models

from pydantic import BaseModel
from typing import Literal

class Result(BaseModel):
    id: str
    content: str
    score: float
    source: str
    retrieval_methods: list[str]
    metadata: dict = {}

class QueryAnalysis(BaseModel):
    query_type: Literal["keyword", "semantic", "entity", "hybrid"]
    recommended_methods: list[str]
    entities: list[str]
    keywords: list[str]

class SearchResults(BaseModel):
    query: str
    bm25_results: list[Result] | None = None
    vector_results: list[Result] | None = None
    graph_results: list[Result] | None = None
    fused_results: list[Result]
    final_results: list[Result]

5. Implementation Guide

5.1 Development Environment Setup

mkdir hybrid-retrieval && cd hybrid-retrieval
python -m venv .venv && source .venv/bin/activate
pip install rank-bm25 chromadb neo4j sentence-transformers pydantic

5.2 Project Structure

hybrid-retrieval/
├── src/
│   ├── engine.py          # HybridEngine main class
│   ├── retrievers/
│   │   ├── bm25.py        # BM25 retriever
│   │   ├── vector.py      # Vector retriever
│   │   └── graph.py       # Graph retriever
│   ├── fusion.py          # RRF implementation
│   ├── reranker.py        # Cross-encoder reranking
│   ├── analyzer.py        # Query analysis
│   └── models.py          # Data models
├── tests/
│   ├── test_retrieval.py
│   └── test_fusion.py
└── README.md

5.3 Implementation Phases

Phase 1: Individual Retrievers (10-12h)

Goals:

  • BM25 retrieval working
  • Vector retrieval working
  • Graph retrieval working

Tasks:

  1. Implement BM25 indexing and search (see the sketch below)
  2. Set up vector store with embeddings
  3. Build graph retrieval with entity extraction
  4. Standardize result format across retrievers

Checkpoint: Each retriever returns ranked results.
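
A minimal sketch of Tasks 1 and 4 together: BM25 search that emits the standardized Result model from Section 4.3 (the whitespace tokenizer and dict-based corpus are simplifying assumptions):

from rank_bm25 import BM25Okapi

from models import Result  # project-local module from Section 5.2

class BM25Retriever:
    def __init__(self, docs: dict[str, str]):
        self.ids = list(docs)
        self.texts = list(docs.values())
        self.bm25 = BM25Okapi([t.lower().split() for t in self.texts])

    def search(self, query: str, top_k: int = 10) -> list[Result]:
        scores = self.bm25.get_scores(query.lower().split())
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        return [
            Result(
                id=self.ids[i],
                content=self.texts[i],
                score=float(scores[i]),
                source=self.ids[i],
                retrieval_methods=["bm25"],
            )
            for i in ranked[:top_k]
        ]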

Phase 2: Fusion and Routing (10-12h)

Goals:

  • RRF fusion implemented
  • Query routing working

Tasks:

  1. Implement RRF algorithm
  2. Build query analyzer for routing
  3. Add weighted fusion options
  4. Handle missing retrievers gracefully (see the sketch below)

Checkpoint: Hybrid search combines all retrievers.
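
Tasks 3 and 4 interact: weights only make sense once absent retrievers are skipped cleanly. A minimal sketch (the per-method weights are illustrative defaults, not tuned values):

from collections import defaultdict

def weighted_rrf(
    rankings: dict[str, list[str] | None],
    weights: dict[str, float] | None = None,
    k: int = 60,
) -> list[tuple[str, float]]:
    weights = weights or {"bm25": 1.0, "vector": 1.0, "graph": 0.5}
    scores: dict[str, float] = defaultdict(float)
    for method, ranking in rankings.items():
        if not ranking:  # retriever disabled or returned nothing
            continue
        w = weights.get(method, 1.0)
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += w / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Graph store unavailable: fusion degrades gracefully to BM25 + vector.
print(weighted_rrf({"bm25": ["A", "B"], "vector": ["B", "C"], "graph": None}))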

Phase 3: Reranking and Optimization (8-10h)

Goals:

  • Cross-encoder reranking
  • Performance optimization

Tasks:

  1. Add cross-encoder reranker (see the sketch below)
  2. Implement result caching
  3. Add async retrieval for parallelism
  4. Benchmark and tune parameters

Checkpoint: Production-ready hybrid retrieval.
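
For Task 1, sentence-transformers' CrossEncoder is one workable choice (the ms-marco model below is a common default, not mandated by this project):

from sentence_transformers import CrossEncoder

from models import Result  # project-local module from Section 5.2

class Reranker:
    def __init__(self, model_name: str = "cross-encoder/ms-marco-MiniLM-L-6-v2"):
        self.model = CrossEncoder(model_name)

    def rerank(self, query: str, results: list[Result], top_k: int = 10) -> list[Result]:
        # Score each (query, document) pair jointly, then re-sort by that score.
        scores = self.model.predict([(query, r.content) for r in results])
        for result, score in zip(results, scores):
            result.score = float(score)  # replace fused score with relevance score
        return sorted(results, key=lambda r: r.score, reverse=True)[:top_k]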


6. Testing Strategy

6.1 Test Categories

Category     Purpose                     Examples
──────────────────────────────────────────────────────
Unit         Test individual components  RRF calculation
Integration  Test full pipeline          Query → results
Quality      Test retrieval accuracy     Known relevant docs

6.2 Critical Test Cases

  1. Exact match query: BM25 should rank highest
  2. Semantic query: Vector should contribute
  3. Entity query: Graph results appear
  4. RRF correctness: Fusion ranks match expected (see the test sketch below)
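
Test case 4 is easiest to pin down with invariants rather than hard-coded scores. A sketch, assuming the rrf() function from Section 2.4 lives in src/fusion.py:

from fusion import rrf  # project-local module from Section 5.2

def test_doc_in_two_lists_beats_doc_in_one():
    fused = dict(rrf([["A", "B"], ["B", "C"]]))
    assert fused["B"] > fused["A"]  # B appears in both ranked lists

def test_single_list_score_matches_formula():
    fused = dict(rrf([["A"]], k=60))
    assert abs(fused["A"] - 1 / 61) < 1e-9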

7. Common Pitfalls & Debugging

Pitfall                Symptom                     Solution
────────────────────────────────────────────────────────────────────────
Score incompatibility  One retriever dominates     Normalize scores before fusion
Missing results        Expected doc not returned   Check all retrievers independently
Slow performance       High latency                Run retrievers in parallel
Reranker bias          Verbose docs ranked higher  Include length normalization
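
For the first pitfall, a min-max normalization helper is the usual quick fix (z-score normalization is a reasonable alternative):

def normalize(scores: list[float]) -> list[float]:
    lo, hi = min(scores), max(scores)
    if hi == lo:  # all scores equal: avoid division by zero
        return [1.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

print(normalize([12.3, 4.1, 0.5]))  # [1.0, 0.305..., 0.0]

Note that RRF itself sidesteps score incompatibility by fusing ranks rather than raw scores; normalization matters when scores are combined directly, e.g. in a weighted linear fusion.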

8. Extensions & Challenges

8.1 Beginner Extensions

  • Add retrieval method explanations
  • Implement result deduplication

8.2 Intermediate Extensions

  • Add learned fusion weights
  • Implement query expansion

8.3 Advanced Extensions

  • Add adaptive routing (learn from feedback)
  • Implement distributed retrieval

9. Real-World Connections

9.1 Industry Applications

  • Elasticsearch: Hybrid search with vectors
  • Pinecone Hybrid: Sparse + dense retrieval
  • Graphiti: Graph-enhanced RAG

9.2 Interview Relevance

  • Explain BM25 vs vector tradeoffs
  • Describe RRF and why it works
  • Discuss when to use reranking

10. Resources

10.1 Essential Reading

  • “AI Engineering” by Chip Huyen — Ch. on RAG
  • “Reciprocal Rank Fusion Outperforms Condorcet and Individual Rank Learning Methods” — Cormack et al., SIGIR 2009
  • Cross-encoder vs Bi-encoder — SBERT documentation
  • Previous: Project 11 (MemGPT Virtual Context)
  • Next: Project 13 (Multi-Agent Shared Memory)

11. Self-Assessment Checklist

  • I understand BM25, vector, and graph retrieval tradeoffs
  • I can implement Reciprocal Rank Fusion
  • I know when to use reranking
  • I can route queries to optimal retrievers

12. Submission / Completion Criteria

Minimum Viable Completion:

  • BM25 and vector retrieval working
  • RRF fusion implemented
  • Basic hybrid search

Full Completion:

  • Graph retrieval integrated
  • Query routing working
  • Cross-encoder reranking

Excellence:

  • Learned fusion weights
  • Adaptive routing
  • Production performance