Project 8: Metadata Filtering with Vector Search
Add structured metadata filtering to vector search without breaking recall or performance.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 3: Advanced |
| Time Estimate | 10-16 hours |
| Language | Python |
| Prerequisites | Vector search basics, indexing |
| Key Topics | filtering, hybrid queries, index design |
1. Learning Objectives
By completing this project, you will:
- Store metadata alongside vectors.
- Implement boolean and range filters.
- Combine filter-first (pre-filter) and filter-after (post-filter) strategies.
- Measure impact on recall and latency.
- Build a query planner for filters.
2. Theoretical Foundation
2.1 Filtered Vector Search
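Filtering shrinks the candidate set, but where the filter runs matters. Applied after retrieval (post-filter), it can discard most of the top-k and return fewer than k results, which lowers recall when the filter is selective. Applied before retrieval (pre-filter), it keeps every matching item eligible, but the search must then run over the restricted subset, which can be slow when many items match.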
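To make the recall risk concrete, here is a minimal sketch that contrasts the two strategies using exact (brute-force) cosine search over a toy dataset. The function names, the item layout (`id`, `vec`, `meta`), and the data are assumptions made for illustration; a real system would put an approximate index behind `top_k`.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query, items, k):
    """Exact top-k by cosine similarity (stand-in for an ANN index)."""
    return sorted(items, key=lambda it: cosine(query, it["vec"]), reverse=True)[:k]

def pre_filter_search(query, items, predicate, k):
    # Filter first, then rank only the matching items.
    return top_k(query, [it for it in items if predicate(it["meta"])], k)

def post_filter_search(query, items, predicate, k):
    # Rank everything, keep the top k, then drop non-matching items.
    return [it for it in top_k(query, items, k) if predicate(it["meta"])]

# Toy data (hypothetical): the two nearest items are both "news".
ITEMS = [
    {"id": 1, "vec": [1.0, 0.0], "meta": {"category": "news"}},
    {"id": 2, "vec": [0.9, 0.1], "meta": {"category": "news"}},
    {"id": 3, "vec": [0.8, 0.2], "meta": {"category": "blog"}},
    {"id": 4, "vec": [0.1, 0.9], "meta": {"category": "blog"}},
]

def is_blog(meta):
    return meta["category"] == "blog"

query = [1.0, 0.0]
pre = [it["id"] for it in pre_filter_search(query, ITEMS, is_blog, k=2)]
post = [it["id"] for it in post_filter_search(query, ITEMS, is_blog, k=2)]
print(pre)   # [3, 4]
print(post)  # []
```

With `k=2`, the post-filter path ranks the two "news" items first and then discards both, so it returns nothing even though two matching items exist. The pre-filter path finds both.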
3. Project Specification
3.1 What You Will Build
A vector search system that supports metadata filters (e.g., date ranges, categories) with performance benchmarks.
3.2 Functional Requirements
- Metadata schema stored with vectors.
- Filter parser for queries.
- Filter strategies (pre-filter vs post-filter).
- Evaluation of recall and latency.
- Query planner to choose strategy.
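One possible shape for the filter parser is sketched below. It assumes filters arrive as nested dictionaries with `and` / `or` / `not` nodes and `field` / `op` / `value` leaves. This spec format and the name `compile_filter` are illustrative assumptions, not a format prescribed by the project. Range filters are expressed as an `and` of two comparisons.

```python
import operator

# Supported leaf operators (an assumed, minimal set).
_OPS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "lt": operator.lt,
    "lte": operator.le,
    "gt": operator.gt,
    "gte": operator.ge,
}

def compile_filter(spec):
    """Turn a nested filter spec into a predicate over a metadata dict."""
    if "and" in spec:
        parts = [compile_filter(s) for s in spec["and"]]
        return lambda meta: all(p(meta) for p in parts)
    if "or" in spec:
        parts = [compile_filter(s) for s in spec["or"]]
        return lambda meta: any(p(meta) for p in parts)
    if "not" in spec:
        inner = compile_filter(spec["not"])
        return lambda meta: not inner(meta)
    field, op, value = spec["field"], spec["op"], spec["value"]
    if op not in _OPS:
        raise ValueError(f"unsupported operator: {op}")
    compare = _OPS[op]
    # Design choice: a record missing the field never matches a leaf filter.
    return lambda meta: field in meta and compare(meta[field], value)

# Example: category == "news" AND 2020 <= year <= 2023
spec = {"and": [
    {"field": "category", "op": "eq", "value": "news"},
    {"field": "year", "op": "gte", "value": 2020},
    {"field": "year", "op": "lte", "value": 2023},
]}
matches = compile_filter(spec)
```

Compiling the spec once and reusing the predicate keeps parsing out of the per-item loop, and invalid operators fail at compile time rather than in the middle of a query.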
3.3 Non-Functional Requirements
- Deterministic tests for filters.
- Clear logs for filtering decisions.
- Configurable filter policies.
4. Solution Architecture
4.1 Components
| Component | Responsibility |
|---|---|
| Metadata Store | Store structured fields |
| Filter Engine | Apply filters |
| Vector Search | Retrieve candidates |
| Planner | Choose filter strategy |
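As a rough sketch of how the Planner might tie these components together, the example below estimates filter selectivity from a fixed sample of metadata records and picks a strategy against a configurable threshold, logging the decision. `FilterPolicy`, its default values, and `choose_strategy` are hypothetical names and numbers chosen for illustration; the threshold should come from your own benchmarks.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("planner")

@dataclass(frozen=True)
class FilterPolicy:
    # Hypothetical knobs: tune both values from measured recall and latency.
    pre_filter_max_selectivity: float = 0.1
    sample_size: int = 1000

def estimate_selectivity(metadata, predicate, sample_size):
    """Fraction of matching records among the first sample_size records.

    Taking the first N records (rather than a random sample) keeps the
    estimate deterministic, which makes planner tests repeatable.
    """
    sample = metadata[:sample_size]
    if not sample:
        return 0.0
    return sum(1 for m in sample if predicate(m)) / len(sample)

def choose_strategy(metadata, predicate, policy=FilterPolicy()):
    """Return 'pre_filter' for selective filters, else 'post_filter'."""
    selectivity = estimate_selectivity(metadata, predicate, policy.sample_size)
    if selectivity <= policy.pre_filter_max_selectivity:
        strategy = "pre_filter"
    else:
        strategy = "post_filter"
    logger.info(
        "selectivity=%.3f threshold=%.3f strategy=%s",
        selectivity, policy.pre_filter_max_selectivity, strategy,
    )
    return strategy

# Toy metadata: 5% of records are "rare", 95% are "common".
METADATA = [{"category": "rare"}] * 5 + [{"category": "common"}] * 95
rare_choice = choose_strategy(METADATA, lambda m: m["category"] == "rare")
common_choice = choose_strategy(METADATA, lambda m: m["category"] == "common")
```

Here the "rare" filter matches 5% of the sample and falls under the 10% threshold, so the planner picks `pre_filter`; the "common" filter matches 95% and gets `post_filter`.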
5. Implementation Guide
5.1 Project Structure
LEARN_VECTOR_DATABASES/P08-metadata-filtering/
├── src/
│ ├── metadata.py
│ ├── filters.py
│ ├── search.py
│ ├── planner.py
│ └── eval.py
5.2 Implementation Phases
Phase 1: Metadata + filters (4-6h)
- Store metadata and parse filters.
- Checkpoint: filters return correct subset.
Phase 2: Search integration (4-6h)
- Apply filters with vector search.
- Checkpoint: filtered queries return correct results.
Phase 3: Evaluation (3-5h)
- Measure recall and latency impact.
- Checkpoint: report shows tradeoffs.
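For Phase 3, a minimal sketch of the two measurements might look like the following. It assumes the ground truth for each query is the exact top-k over the items that satisfy the filter (for example, from a brute-force pre-filter search). The helper names `recall_at_k` and `timed` are illustrative.

```python
import time

def recall_at_k(retrieved_ids, relevant_ids, k):
    """Share of the ground-truth top-k that appears in the retrieved top-k.

    relevant_ids is the ordered ground truth; only its first k entries count.
    """
    truth = list(relevant_ids)[:k]
    if not truth:
        # No matching items exist, so there is nothing to miss.
        return 1.0
    found = set(retrieved_ids[:k])
    return sum(1 for item_id in truth if item_id in found) / len(truth)

def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms

# Example: the strategy under test found one of the two true neighbours.
half = recall_at_k(retrieved_ids=[3], relevant_ids=[3, 4], k=2)
total, elapsed_ms = timed(sum, [1, 2, 3])
```

Single timings are noisy, so a report would typically repeat each query several times and summarize the latencies (for instance with a median) alongside the mean recall per strategy.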
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit | filters | range and boolean logic |
| Integration | search | filtered top-k results |
| Regression | planner | consistent strategy choices |
6.2 Critical Test Cases
- Filtered results respect metadata constraints.
- Planner chooses pre-filter for selective queries.
- Recall drop measured and reported.
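The first of these cases could be written with `unittest` roughly as follows. The `filtered_search` function here is only a stand-in (exact pre-filter search by dot product) so the example runs on its own; in the project it would be replaced by the real search entry point, whose name and signature are assumptions here.

```python
import unittest

def filtered_search(query, items, predicate, k):
    """Stand-in for the project's search function (hypothetical name)."""
    matching = [it for it in items if predicate(it["meta"])]

    def score(it):
        return sum(q * v for q, v in zip(query, it["vec"]))

    # sorted() is stable, so ties keep insertion order and results repeat.
    return sorted(matching, key=score, reverse=True)[:k]

ITEMS = [
    {"id": 1, "vec": [1.0, 0.0], "meta": {"year": 2019}},
    {"id": 2, "vec": [0.9, 0.1], "meta": {"year": 2021}},
    {"id": 3, "vec": [0.2, 0.8], "meta": {"year": 2022}},
    {"id": 4, "vec": [0.1, 0.9], "meta": {"year": 2024}},
]

def in_range(meta):
    return 2020 <= meta["year"] <= 2023

class FilteredSearchTests(unittest.TestCase):
    def test_results_respect_constraints(self):
        results = filtered_search([1.0, 0.0], ITEMS, in_range, k=3)
        self.assertTrue(results)
        for item in results:
            self.assertTrue(in_range(item["meta"]))

    def test_results_are_ranked_and_repeatable(self):
        first = [it["id"] for it in filtered_search([1.0, 0.0], ITEMS, in_range, k=2)]
        second = [it["id"] for it in filtered_search([1.0, 0.0], ITEMS, in_range, k=2)]
        self.assertEqual(first, [2, 3])
        self.assertEqual(first, second)

    def test_filter_with_no_matches_returns_empty(self):
        results = filtered_search([1.0, 0.0], ITEMS, lambda m: m["year"] > 2030, k=2)
        self.assertEqual(results, [])
```

Saved as a test module, this can be run with `python -m unittest`. Fixed data and a stable sort keep the tests deterministic, in line with the non-functional requirements.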
7. Common Pitfalls & Debugging
| Pitfall | Symptom | Fix |
|---|---|---|
| Over-filtering | empty results | relax filter thresholds |
| Slow queries | too many candidates scanned | pre-filter only when the filter is selective |
| Inconsistent filtering | wrong results | add filter unit tests |
8. Extensions & Challenges
Beginner
- Add keyword + metadata hybrid search.
- Add filter presets.
Intermediate
- Add filter indexes for speed.
- Add query planner heuristics.
Advanced
- Add cost-based planner.
- Add multi-tenant filter policies.
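For the "filter indexes" extension, one common starting point is an inverted index from `(field, value)` pairs to item ids, so equality filters resolve through set lookups instead of a full metadata scan. The sketch below is a minimal in-memory version; the class name and methods are illustrative assumptions, and it only handles hashable values and equality conditions (range filters would need a sorted structure instead).

```python
from collections import defaultdict

class InvertedMetadataIndex:
    """Maps (field, value) pairs to the set of item ids that carry them."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, item_id, meta):
        # Metadata values must be hashable (str, int, tuple, ...).
        for field, value in meta.items():
            self._postings[(field, value)].add(item_id)

    def lookup(self, field, value):
        """Ids of items where field == value (a copy, safe to mutate)."""
        return set(self._postings.get((field, value), set()))

    def lookup_all(self, **equals):
        """Ids of items matching every field=value condition (AND)."""
        sets = [self.lookup(f, v) for f, v in equals.items()]
        return set.intersection(*sets) if sets else set()

index = InvertedMetadataIndex()
index.add(1, {"category": "news", "lang": "en"})
index.add(2, {"category": "news", "lang": "de"})
index.add(3, {"category": "blog", "lang": "en"})

news_en = index.lookup_all(category="news", lang="en")
print(news_en)  # {1}
```

The size of each posting set also gives the planner an exact match count for equality filters, which could replace sampling-based selectivity estimates.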
9. Real-World Connections
- Enterprise search relies on metadata filters.
- Compliance systems need filtered retrieval.
10. Resources
- Vector DB filter docs
- Search system design references
11. Self-Assessment Checklist
- I can store and query metadata filters.
- I can combine filtering and vector search.
- I can measure recall impacts.
12. Submission / Completion Criteria
Minimum Completion:
- Metadata filtering with vector search
Full Completion:
- Evaluation of filter impact
Excellence:
- Cost-based planner
- Filter indexes
This guide was generated from project_based_ideas/AI_AGENTS_LLM_RAG/LEARN_VECTOR_DATABASES.md.