Project 4: Simple RAG (Retrieval-Augmented Generation) System

Build a minimal RAG pipeline that retrieves relevant chunks and produces grounded answers.

Quick Reference

Attribute     | Value
--------------|----------------------------------
Difficulty    | Level 2: Intermediate
Time Estimate | 10-14 hours
Language      | Python
Prerequisites | Embeddings basics, vector search
Key Topics    | Retrieval, grounding, chunking

1. Learning Objectives

By completing this project, you will:

  1. Chunk and embed a document set.
  2. Retrieve top-k chunks for a query.
  3. Generate answers grounded in retrieved context.
  4. Add citations or chunk references.
  5. Evaluate correctness on sample queries.

2. Theoretical Foundation

2.1 Why RAG

Retrieval keeps the model grounded in the source documents: relevant passages are fetched at query time and injected into the prompt, so the answer draws on retrieved text rather than on the model's parametric memory alone. This reduces hallucinations and makes claims checkable against the cited chunks.
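As a rough illustration, the sketch below shows the core of the idea: retrieved chunks are pasted into the prompt along with an instruction to answer only from them. The prompt wording, the chunk dict shape ({"id", "text"}), and the `generate` placeholder are illustrative choices, not requirements of this guide.

```python
# Minimal sketch of context injection: retrieved chunks are pasted into the
# prompt so the model answers from them rather than from memory alone.
# Whatever LLM call the project uses would consume the returned prompt.

def build_grounded_prompt(question: str, chunks: list[dict]) -> str:
    # Each chunk carries an id so the answer can cite its sources.
    context = "\n\n".join(f"[{c['id']}] {c['text']}" for c in chunks)
    return (
        "Answer the question using ONLY the context below. "
        "Cite chunk ids like [doc1-003]. If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```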


3. Project Specification

3.1 What You Will Build

A small RAG system that ingests documents and answers questions using retrieved context.

3.2 Functional Requirements

  1. Chunker with configurable sizes.
  2. Embedding generator for chunks.
  3. Retriever to fetch top-k chunks.
  4. Answer generator with context injection.
  5. Evaluation on a small query set.

3.3 Non-Functional Requirements

  • Deterministic runs for testing.
  • Clear outputs with chunk IDs.
  • Fallback when retrieval is empty.

4. Solution Architecture

4.1 Components

Component | Responsibility
----------|------------------------------
Chunker   | Split documents
Embedder  | Generate vectors
Retriever | Fetch top-k chunks
Answerer  | Generate grounded response
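One way these four components could be wired together is sketched below; the class and method names are illustrative, not prescribed by the guide.

```python
# A thin pipeline object that wires the four components together.
from dataclasses import dataclass

@dataclass
class Chunk:
    id: str            # e.g. "doc3-007" -> document 3, chunk 7
    text: str
    embedding: list[float]

class RAGPipeline:
    def __init__(self, chunker, embedder, retriever, answerer):
        self.chunker, self.embedder = chunker, embedder
        self.retriever, self.answerer = retriever, answerer

    def ingest(self, documents: list[str]) -> None:
        chunks = self.chunker(documents)             # Chunker: split documents
        self.retriever.index(self.embedder(chunks))  # Embedder: attach vectors

    def ask(self, question: str, k: int = 5) -> str:
        hits = self.retriever.search(question, k)    # Retriever: fetch top-k
        return self.answerer(question, hits)         # Answerer: grounded response
```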

5. Implementation Guide

5.1 Project Structure

LEARN_LLM_MEMORY/P04-simple-rag/
├── src/
│   ├── chunk.py
│   ├── embed.py
│   ├── retrieve.py
│   ├── answer.py
│   └── eval.py

5.2 Implementation Phases

Phase 1: Chunking + embedding (4-6h)

  • Chunk documents and generate embeddings.
  • Checkpoint: chunks stored with metadata.
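A minimal sketch of this phase, assuming word-based chunks and a deterministic stand-in embedder so the pipeline runs offline; `chunk_text` and `embed` are illustrative names, and the hashing embedder is a placeholder you would swap for a real embedding model.

```python
# Phase 1 sketch: fixed-size chunking with overlap, plus a toy deterministic
# embedder. Replace embed() with a real embedding model in your implementation.
import hashlib
import numpy as np

def chunk_text(text: str, doc_id: str, size: int = 200, overlap: int = 50) -> list[dict]:
    words = text.split()
    step = max(size - overlap, 1)          # guard against overlap >= size
    chunks = []
    for i, start in enumerate(range(0, max(len(words), 1), step)):
        piece = " ".join(words[start:start + size])
        if piece:
            chunks.append({"id": f"{doc_id}-{i:03d}", "text": piece})
    return chunks

def embed(text: str, dim: int = 64) -> np.ndarray:
    # Bag-of-words hashing embedding: deterministic and dependency-free,
    # good enough to exercise the retrieval code, not for real relevance.
    vec = np.zeros(dim)
    for token in text.lower().split():
        h = int(hashlib.md5(token.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec
```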

Phase 2: Retrieval (3-4h)

  • Retrieve top-k relevant chunks.
  • Checkpoint: retrieval results look relevant.
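A brute-force retrieval sketch for this phase, assuming each chunk dict carries an "embedding" produced by the Phase 1 helper (e.g. chunk["embedding"] = embed(chunk["text"])) and that the helper lives in src/embed.py.

```python
# Phase 2 sketch: top-k retrieval by cosine similarity. Because embed()
# returns unit-normalised vectors, the dot product is the cosine similarity.
import numpy as np

from embed import embed  # the Phase 1 helper, assumed to live in src/embed.py

def retrieve(query: str, chunks: list[dict], k: int = 5) -> list[dict]:
    query_vec = embed(query)
    scored = [(float(np.dot(query_vec, c["embedding"])), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest similarity first
    return [c for _, c in scored[:k]]
```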

Phase 3: Answering + eval (3-4h)

  • Generate grounded answers.
  • Evaluate on sample queries.
  • Checkpoint: answers reference chunk IDs.
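A sketch of the answering step, covering both the chunk-ID references and the empty-retrieval fallback from the non-functional requirements; `call_llm` is a placeholder for whatever model call the project uses.

```python
# Phase 3 sketch: assemble a grounded prompt with chunk ids and fall back
# safely when nothing was retrieved.

def answer(question: str, retrieved: list[dict], call_llm) -> str:
    if not retrieved:
        # Fallback required by the non-functional requirements: never guess.
        return "No relevant context found; cannot answer from the document set."
    context = "\n\n".join(f"[{c['id']}] {c['text']}" for c in retrieved)
    prompt = (
        "Using only the context below, answer the question and cite chunk ids "
        "in square brackets.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return call_llm(prompt)
```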

6. Testing Strategy

6.1 Test Categories

Category    | Purpose   | Examples
------------|-----------|---------------------------
Unit        | Chunking  | Size/overlap constraints
Integration | Retrieval | Top-k correctness
Regression  | Answers   | Grounded output

6.2 Critical Test Cases

  1. Retrieval returns correct chunks for known query.
  2. Empty retrieval triggers safe fallback.
  3. Answer includes chunk references.
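The three cases above could be covered by pytest functions along these lines, assuming the embed, retrieve, and answer helpers sketched in Section 5 are importable from the src/ modules; the lambdas stand in for a real LLM call.

```python
# Sketches of the critical test cases; names and fixtures are illustrative.
import re

from embed import embed
from retrieve import retrieve
from answer import answer

def test_known_query_hits_expected_chunk():
    chunks = [
        {"id": "doc0-000", "text": "the capital of France is Paris",
         "embedding": embed("the capital of France is Paris")},
        {"id": "doc0-001", "text": "bread recipes need flour",
         "embedding": embed("bread recipes need flour")},
    ]
    top = retrieve("capital of France", chunks, k=1)
    assert top[0]["id"] == "doc0-000"

def test_empty_retrieval_falls_back():
    result = answer("anything", [], call_llm=lambda prompt: "should not be called")
    assert "cannot answer" in result.lower()

def test_answer_contains_chunk_reference():
    fake_llm = lambda prompt: "Paris is the capital [doc0-000]."
    retrieved = [{"id": "doc0-000", "text": "Paris is the capital."}]
    result = answer("What is the capital?", retrieved, call_llm=fake_llm)
    assert re.search(r"\[doc0-\d{3}\]", result)
```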

7. Common Pitfalls & Debugging

Pitfall              | Symptom            | Fix
---------------------|--------------------|--------------------------
Poor chunking        | Irrelevant results | Tune size/overlap
Hallucinated answers | Unsupported claims | Enforce references
Slow search          | High latency       | Reduce top-k or use ANN

8. Extensions & Challenges

Beginner

  • Add a simple CLI.
  • Add PDF support.

Intermediate

  • Add reranking.
  • Add evaluation dashboard.

Advanced

  • Add hybrid keyword + vector search.
  • Add citation enforcement.
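For the hybrid keyword + vector search extension, one possible starting point is to blend a keyword-overlap score with the vector similarity; the 0.5 weight below is an arbitrary default to tune, and the embed helper is the Phase 1 placeholder.

```python
# Hybrid scoring sketch: combine lexical overlap with embedding similarity.
import numpy as np

from embed import embed  # Phase 1 helper, assumed to live in src/embed.py

def hybrid_score(query: str, chunk: dict, alpha: float = 0.5) -> float:
    query_terms = set(query.lower().split())
    chunk_terms = set(chunk["text"].lower().split())
    keyword = len(query_terms & chunk_terms) / max(len(query_terms), 1)
    vector = float(np.dot(embed(query), chunk["embedding"]))
    return alpha * keyword + (1 - alpha) * vector
```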

9. Real-World Connections

  • Knowledge assistants depend on retrieval grounding.
  • Support bots need reliable references.

10. Resources

  • RAG tutorials
  • Vector database docs

11. Self-Assessment Checklist

  • I can build a simple RAG pipeline.
  • I can retrieve and use relevant chunks.
  • I can evaluate answer quality.

12. Submission / Completion Criteria

Minimum Completion:

  • End-to-end RAG pipeline
  • Grounded answers

Full Completion:

  • Evaluation on sample queries
  • Fallback handling

Excellence:

  • Reranking or hybrid search
  • Citation enforcement

This guide was generated from project_based_ideas/AI_AGENTS_LLM_RAG/LEARN_LLM_MEMORY.md.