GENERATIVE AI LLM RAG LEARNING PROJECTS
Learning Generative AI, LLMs, RAG, Vector Databases & Reranking
Excellent learning path! These technologies form the backbone of modern AI applications. Let me break this down and give you projects that will force you to truly understand each component - not just use APIs, but internalize how they work.
Core Concept Analysis
1. Generative AI & LLMs
- Transformer architecture (self-attention, multi-head attention)
- Tokenization (BPE, WordPiece, SentencePiece)
- Embeddings and positional encoding
- Forward pass, softmax, and generation strategies (greedy, beam search, sampling)
- Fine-tuning vs prompting vs RAG
2. RAG (Retrieval Augmented Generation)
- Document chunking strategies (fixed-size, semantic, sentence-based)
- Embedding generation (bi-encoders)
- Retrieval pipelines (sparse, dense, hybrid)
- Context injection and prompt construction
- Evaluation metrics (relevance, faithfulness, answer quality)
3. Vector Databases
- Vector similarity metrics (cosine, euclidean, dot product)
- Indexing algorithms (HNSW, IVF, PQ)
- Approximate Nearest Neighbors (ANN) trade-offs
- Metadata filtering and hybrid search
4. Reranking
- Bi-encoders vs Cross-encoders architecture
- Two-stage retrieval pipelines
- Relevance scoring mechanisms
- Performance vs accuracy trade-offs
Project 1: Build a Mini-Transformer from Scratch
- File: GENERATIVE_AI_LLM_RAG_LEARNING_PROJECTS.md
- Programming Language: Python
- Coolness Level: Level 5: Pure Magic (Super Cool)
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 4: Expert
- Knowledge Area: Deep Learning / NLP
- Software or Tool: PyTorch
- Main Book: “Build a Large Language Model (From Scratch)” by Sebastian Raschka
What you’ll build: A small GPT-like transformer (decoder-only) that can generate coherent text after training on a corpus like Shakespeare or Wikipedia excerpts.
Why it teaches LLMs: You cannot understand how ChatGPT “thinks” until you implement self-attention yourself and see how tokens attend to each other. Building this forces you to grapple with the math behind Q, K, V matrices, why positional encoding exists, and how generation actually works.
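To make the Q, K, V mechanics concrete, here is a minimal sketch of single-head causal self-attention in PyTorch; the `CausalSelfAttention` class, dimension sizes, and the smoke test are illustrative, not the book's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Single-head causal self-attention: each token attends only to earlier tokens."""
    def __init__(self, d_model: int):
        super().__init__()
        # Learned projections that turn token embeddings into queries, keys, and values
        self.q_proj = nn.Linear(d_model, d_model, bias=False)
        self.k_proj = nn.Linear(d_model, d_model, bias=False)
        self.v_proj = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        B, T, D = x.shape
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        # Scaled dot-product scores: how strongly each token "looks at" every other token
        scores = q @ k.transpose(-2, -1) / (D ** 0.5)            # (B, T, T)
        # Causal mask: a position may not attend to future tokens
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))
        weights = F.softmax(scores, dim=-1)                       # attention weights
        return weights @ v                                        # (B, T, D)

# Tiny smoke test: one sequence of 5 tokens with 16-dim embeddings
attn = CausalSelfAttention(d_model=16)
print(attn(torch.randn(1, 5, 16)).shape)  # torch.Size([1, 5, 16])
```

The multi-head version in the project repeats this computation over several smaller subspaces and concatenates the results.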
Core challenges you’ll face:
- Implementing multi-head self-attention without libraries (maps to understanding the attention mechanism)
- Making causal masking work for autoregressive generation (maps to understanding why LLMs can only see “past” tokens)
- Getting training to converge with proper learning rate scheduling (maps to understanding why training LLMs is hard)
- Implementing tokenization from scratch (maps to understanding how text becomes numbers)
Key Concepts:
- Attention Mechanism: “Attention Is All You Need” paper - Vaswani et al.
- Transformer Architecture: “Build a Large Language Model (From Scratch)” Chapter 3 - Sebastian Raschka
- Tokenization: “Let’s build the GPT Tokenizer” - Andrej Karpathy
- Training Dynamics: “AI Engineering” Chapter 4 - Chip Huyen
Difficulty: Intermediate. Time estimate: 2-3 weeks. Prerequisites: Python, PyTorch basics, linear algebra fundamentals.
Real world outcome:
- Your terminal will generate coherent Shakespeare-style text: “To be or not to be, that is the question of the…”
- You can input any prompt and watch your model complete it
- A Jupyter notebook showing attention heatmaps visualizing what tokens your model “looks at”
Learning milestones:
- After implementing attention: You’ll understand why transformers can process sequences in parallel and what “attending” actually means
- After training: You’ll viscerally understand the compute/data/quality relationship and why bigger isn’t always better
- After generation: You’ll understand temperature, top-k, nucleus sampling and why models sometimes produce garbage
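As a companion to that last milestone, here is a minimal sketch of temperature and top-k sampling over a logits vector; `sample_next_token` and the fake vocabulary are illustrative, and in practice the logits come from your trained model's final layer.

```python
import torch
import torch.nn.functional as F

def sample_next_token(logits: torch.Tensor, temperature: float = 1.0, top_k: int | None = None) -> int:
    """Pick the next token id from a (vocab_size,) logits vector."""
    # Temperature < 1 sharpens the distribution, > 1 flattens it
    logits = logits / max(temperature, 1e-8)
    if top_k is not None:
        # Keep only the k most likely tokens; everything else gets zero probability
        topk_vals, _ = torch.topk(logits, top_k)
        logits[logits < topk_vals[-1]] = float("-inf")
    probs = F.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()

# Example over a fake 10-token vocabulary
fake_logits = torch.randn(10)
print(sample_next_token(fake_logits, temperature=0.8, top_k=5))
```

Nucleus (top-p) sampling follows the same pattern, except the cutoff is a cumulative probability mass instead of a fixed count.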
Project 2: Build Your Own Vector Database Engine
- File: GENERATIVE_AI_LLM_RAG_LEARNING_PROJECTS.md
- Programming Language: Python
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 4. The “Open Core” Infrastructure
- Difficulty: Level 4: Expert
- Knowledge Area: Databases / ML Infrastructure
- Software or Tool: HNSW / Embeddings
- Main Book: “Algorithms, Fourth Edition” by Sedgewick & Wayne
What you’ll build: A vector database from scratch that stores embeddings, indexes them with HNSW (Hierarchical Navigable Small World), and retrieves nearest neighbors with sub-linear complexity.
Why it teaches Vector Databases: Using Pinecone or ChromaDB hides the magic. Building HNSW yourself forces you to understand why approximate nearest neighbor search is necessary (exact search is O(n)), how graph-based indexing works, and the precision-recall tradeoffs every production system makes.
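Before implementing HNSW, it helps to have the exact brute-force baseline it must beat. Below is a minimal sketch using vectorized cosine similarity in NumPy; the `BruteForceIndex` class and its `add`/`search` API are illustrative, not a prescribed design.

```python
import numpy as np

class BruteForceIndex:
    """Exact nearest-neighbor search: O(n) per query, the baseline HNSW must beat."""
    def __init__(self, dim: int):
        self.dim = dim
        self.vectors = np.empty((0, dim), dtype=np.float32)

    def add(self, vecs: np.ndarray) -> None:
        # Normalize once so cosine similarity reduces to a dot product
        vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
        self.vectors = np.vstack([self.vectors, vecs.astype(np.float32)])

    def search(self, query: np.ndarray, top_k: int = 5) -> list[tuple[int, float]]:
        q = query / np.linalg.norm(query)
        scores = self.vectors @ q                      # one vectorized pass over every stored vector
        top = np.argsort(-scores)[:top_k]              # highest cosine similarity first
        return [(int(i), float(scores[i])) for i in top]

# Example with random 384-dim vectors (the output size of all-MiniLM-L6-v2 embeddings)
index = BruteForceIndex(dim=384)
index.add(np.random.randn(10_000, 384).astype(np.float32))
print(index.search(np.random.randn(384).astype(np.float32), top_k=3))
```

The single matrix-vector product is fast for 10k vectors but scales linearly, which is the motivation for the graph-based index you build next.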
Core challenges you’ll face:
- Implementing HNSW graph construction (maps to understanding navigable small-world graphs)
- Managing the index persistence to disk (maps to understanding why vector DBs need special storage)
- Implementing cosine similarity efficiently with numpy vectorization (maps to understanding similarity metrics)
- Building metadata filtering alongside vector search (maps to understanding hybrid search)
Resources for key challenges:
- “Hierarchical Navigable Small World graphs” paper - Original HNSW paper by Malkov & Yashunin
- “FAISS Tutorial” - Pinecone’s explanation of indexing algorithms
Key Concepts:
- HNSW Algorithm: “Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs” - Malkov & Yashunin
- Vector Similarity: “Math for Programmers” Chapter 6 (Vectors) - Paul Orland
- Indexing Structures: “Algorithms, Fourth Edition” Chapter 3 (Searching) - Sedgewick & Wayne
Difficulty: Intermediate-Advanced. Time estimate: 2-3 weeks. Prerequisites: Python, data structures (graphs, heaps), basic understanding of embeddings.
Real world outcome:
- A CLI tool where you can run `./myvecdb add "The quick brown fox"` and `./myvecdb search "fast animal" --top-k 5`
- Benchmarks showing your HNSW index is 100x faster than brute-force on 100k vectors
- A visualization of your HNSW graph structure showing how vectors connect across layers
Learning milestones:
- After brute-force implementation: You’ll understand why O(n) doesn’t scale and appreciate the need for ANN
- After HNSW implementation: You’ll understand graph navigation, entry points, and layer traversal
- After benchmarking: You’ll internalize the precision-recall-latency tradeoff that every production system faces
Project 3: Build a Complete RAG System (No LangChain)
- File: GENERATIVE_AI_LLM_RAG_LEARNING_PROJECTS.md
- Programming Language: Python
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 2: Intermediate
- Knowledge Area: Generative AI / Information Retrieval
- Software or Tool: Vector DB / LLM API
- Main Book: “AI Engineering” by Chip Huyen
What you’ll build: A question-answering system over your own documents (PDFs, markdown files) that chunks, embeds, retrieves, and generates answers — without using LangChain or LlamaIndex.
Why it teaches RAG: Frameworks hide everything. Building RAG manually forces you to understand why chunk size matters, how retrieval quality directly impacts generation quality, and where the “lost in the middle” problem comes from.
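To make the chunk-size trade-off concrete, here is a minimal sketch of fixed-size chunking with overlap; it splits on words for simplicity, and `chunk_text` with its default sizes is illustrative rather than a recommended setting.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping word windows.

    Small chunks lose surrounding context; large chunks dilute the signal the
    retriever scores against. Overlap keeps sentences from being cut in half
    at chunk boundaries.
    """
    words = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        window = words[start:start + chunk_size]
        if window:
            chunks.append(" ".join(window))
        if start + chunk_size >= len(words):
            break
    return chunks

# Example
doc = "Refunds are available within 30 days of purchase. " * 100
print(len(chunk_text(doc, chunk_size=50, overlap=10)))
```

Semantic and recursive chunking replace the fixed window with boundaries derived from sentence or section structure, but the overlap idea carries over.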
Core challenges you’ll face:
- Implementing chunking strategies (fixed, semantic, recursive) and seeing how they affect retrieval (maps to understanding document processing)
- Building the retrieval pipeline with proper scoring (maps to understanding information retrieval)
- Constructing prompts that maximize context utilization (maps to understanding prompt engineering for RAG; see the sketch after this list)
- Evaluating RAG quality (relevance, faithfulness, hallucination detection)
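As referenced above, here is a minimal sketch of the retrieve-and-prompt step. It assumes a bi-encoder from the Sentence Transformers library (the public `all-MiniLM-L6-v2` checkpoint), and the prompt template plus the `retrieve`/`build_prompt` names are illustrative.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def retrieve(query: str, chunks: list[str], chunk_embs: np.ndarray, top_k: int = 4) -> list[str]:
    """Score every chunk against the query with cosine similarity and keep the best."""
    q = model.encode(query, normalize_embeddings=True)
    scores = chunk_embs @ q
    return [chunks[i] for i in np.argsort(-scores)[:top_k]]

def build_prompt(query: str, retrieved: list[str]) -> str:
    """Inject the retrieved chunks as numbered context the LLM is told to cite."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(retrieved))
    return (
        "Answer the question using only the context below. "
        "Cite the chunk numbers you used.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

# Example: embed chunks once, then answer queries against them
chunks = ["Refunds are issued within 30 days.", "Shipping takes 5 business days."]
chunk_embs = model.encode(chunks, normalize_embeddings=True)
hits = retrieve("What is the refund policy?", chunks, chunk_embs, top_k=1)
print(build_prompt("What is the refund policy?", hits))
```

The resulting prompt string is what you send to your LLM API; evaluation then checks whether the answer stays faithful to the cited chunks.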
Key Concepts:
- Chunking Strategies: “Enhancing RAG: A Study of Best Practices” - January 2025 arXiv paper
- Retrieval Fundamentals: “Designing Data-Intensive Applications” Chapter 3 - Martin Kleppmann
- Prompt Construction: “AI Engineering” Chapter 6 (RAG) - Chip Huyen
- Evaluation: “RAG Assessment (RAGAS)” - Standard metrics documentation
Difficulty: Intermediate. Time estimate: 1-2 weeks. Prerequisites: Python, understanding of embeddings, API access to an LLM (OpenAI, Claude, or local Ollama).
Real world outcome:
- A CLI: `./myrag query "What is the refund policy?" --docs ./company_policies/`
- The system returns an answer with citations: “According to [policy.pdf, page 3], refunds are…”
- A dashboard showing retrieval scores, chunk sources, and confidence levels
Learning milestones:
- After implementing chunking: You’ll see how chunk size directly impacts retrieval quality — too small loses context, too large dilutes relevance
- After building retrieval: You’ll understand why “semantic search” isn’t magic and sometimes fails badly
- After end-to-end evaluation: You’ll understand the full chain of quality degradation and where to invest optimization effort
Project 4: Implement a Two-Stage Retrieval System with Reranking
- File: GENERATIVE_AI_LLM_RAG_LEARNING_PROJECTS.md
- Programming Language: Python
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 3: Advanced
- Knowledge Area: Information Retrieval / ML
- Software or Tool: Cross-Encoders
- Main Book: “Introduction to Information Retrieval” by Manning, Raghavan & Schütze
What you’ll build: A search system that first retrieves 100 candidates using fast bi-encoder search, then re-ranks them with a cross-encoder to surface the most relevant results.
Why it teaches Reranking: Bi-encoders are fast but compress meaning into a single vector — they lose nuance. Cross-encoders are accurate but slow because they process query+document pairs together. Building both yourself reveals why modern search uses a two-stage architecture.
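Here is a minimal sketch of the two-stage pipeline using the Sentence Transformers library, assuming the public `all-MiniLM-L6-v2` bi-encoder and `cross-encoder/ms-marco-MiniLM-L-6-v2` checkpoints; the `search` function and its defaults are illustrative.

```python
import numpy as np
from sentence_transformers import SentenceTransformer, CrossEncoder

# Stage 1: fast bi-encoder. Query and documents are embedded independently,
# so document vectors can be precomputed and searched with a vector index.
bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")
# Stage 2: slow but accurate cross-encoder. Query and document are fed together,
# so attention flows between them, but every candidate needs a full forward pass.
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def search(query: str, docs: list[str], candidates: int = 100, top_k: int = 5) -> list[tuple[str, float]]:
    doc_embs = bi_encoder.encode(docs, normalize_embeddings=True)
    q_emb = bi_encoder.encode(query, normalize_embeddings=True)
    # Stage 1: keep the `candidates` highest cosine-similarity documents
    idx = np.argsort(-(doc_embs @ q_emb))[:candidates]
    # Stage 2: rerank only those candidates with the cross-encoder
    pairs = [(query, docs[i]) for i in idx]
    scores = cross_encoder.predict(pairs)
    reranked = sorted(zip(idx, scores), key=lambda x: -x[1])[:top_k]
    return [(docs[i], float(s)) for i, s in reranked]

docs = [
    "Cheetahs are the fastest land animals.",
    "The fox jumped over the lazy dog.",
    "Quarterly revenue grew 8%.",
]
print(search("fast animal", docs, candidates=3, top_k=2))
```

In your project, stage 1 would hit your vector index instead of re-encoding the corpus per query; the key point is that the expensive cross-encoder only ever sees a small candidate set.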
Core challenges you’ll face:
- Implementing/fine-tuning a bi-encoder for fast retrieval (maps to understanding embedding models)
- Implementing/using a cross-encoder for reranking (maps to understanding attention between query and document)
- Measuring precision@k before and after reranking (maps to understanding evaluation metrics; see the sketch after this list)
- Finding the optimal candidate count for reranking (100? 50? 200?)
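For the precision@k measurement referenced above, here is a minimal sketch assuming you have per-query relevance labels; the document ids and rankings are made up for illustration.

```python
def precision_at_k(ranked_ids: list[str], relevant_ids: set[str], k: int = 5) -> float:
    """Fraction of the top-k results that are actually relevant."""
    top = ranked_ids[:k]
    return sum(1 for doc_id in top if doc_id in relevant_ids) / k

# Compare the same query before and after reranking
before = ["d7", "d2", "d9", "d4", "d1"]   # bi-encoder order
after = ["d2", "d4", "d7", "d5", "d3"]    # cross-encoder order
relevant = {"d2", "d4", "d5"}
print(precision_at_k(before, relevant), precision_at_k(after, relevant))  # 0.4 vs 0.6
```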
Resources for key challenges:
- “Retrieve & Re-Rank” - Sentence Transformers documentation
- “Rerankers and Two-Stage Retrieval” - Pinecone’s excellent visual guide
Key Concepts:
- Bi-Encoders: “Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks” - Reimers & Gurevych
- Cross-Encoders: “Training and Finetuning Reranker Models” - Hugging Face blog
- Information Retrieval Metrics: “Introduction to Information Retrieval” Chapter 8 (Evaluation in Information Retrieval) - Manning, Raghavan & Schütze
Difficulty: Intermediate. Time estimate: 1-2 weeks. Prerequisites: Project 2 or 3 completed, familiarity with Hugging Face transformers.
Real world outcome:
- A search API that returns results with clear before/after reranking comparison
- Benchmarks showing: “Bi-encoder alone: 72% precision@5, With reranking: 91% precision@5”
- A visualization showing how document rankings change after reranking
Learning milestones:
- After bi-encoder implementation: You’ll understand why embedding search is fast but approximate
- After cross-encoder implementation: You’ll see why cross-encoders are more accurate but can’t scale
- After combining both: You’ll internalize the fundamental speed-accuracy tradeoff in information retrieval
Project 5: Fine-Tune Your Own Embedding Model
- File: GENERATIVE_AI_LLM_RAG_LEARNING_PROJECTS.md
- Programming Language: Python
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 4: Expert
- Knowledge Area: ML Training / NLP
- Software or Tool: Sentence Transformers
- Main Book: “AI Engineering” by Chip Huyen
What you’ll build: Take an existing embedding model (like all-MiniLM-L6-v2) and fine-tune it on your domain-specific data to dramatically improve retrieval quality.
Why it teaches embeddings deeply: General-purpose embeddings work “okay” everywhere but excel nowhere. Fine-tuning forces you to understand contrastive learning, hard negative mining, and how embeddings encode meaning.
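Here is a minimal sketch of contrastive fine-tuning with Multiple Negatives Ranking Loss, using the classic Sentence Transformers `model.fit` API (newer releases also offer `SentenceTransformerTrainer`); the two training pairs are placeholders for your domain data.

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("all-MiniLM-L6-v2")

# Each example pairs a query with a passage that answers it. With Multiple Negatives
# Ranking Loss, every other positive in the batch acts as an in-batch negative,
# which is why larger batches and harder negatives both help.
train_examples = [
    InputExample(texts=["what is the refund window", "Refunds are issued within 30 days of purchase."]),
    InputExample(texts=["how long does shipping take", "Standard shipping takes 5 business days."]),
    # ... thousands more domain-specific (query, positive) pairs
]

train_loader = DataLoader(train_examples, shuffle=True, batch_size=32)
loss = losses.MultipleNegativesRankingLoss(model)

model.fit(
    train_objectives=[(train_loader, loss)],
    epochs=1,
    warmup_steps=100,
    output_path="./my-domain-embedder",
)
```

Hard negative mining replaces the implicit in-batch negatives with passages that look relevant but are not, which is where most of the quality gain in Project 5 comes from.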
Core challenges you’ll face:
- Curating training data (query-positive-negative triplets) (maps to understanding what embeddings learn)
- Implementing contrastive loss (InfoNCE, Multiple Negatives Ranking Loss) (maps to understanding how embeddings are trained)
- Mining hard negatives effectively (maps to understanding why easy negatives don’t improve models)
- Evaluating embedding quality (NDCG, MRR, recall@k)
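Here is a minimal sketch of recall@k and MRR, assuming each evaluation query comes with a set of relevant document ids; the example ranking is made up for illustration.

```python
def recall_at_k(ranked_ids: list[str], relevant_ids: set[str], k: int = 10) -> float:
    """Fraction of all relevant documents that appear in the top k."""
    found = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return found / len(relevant_ids)

def mrr(ranked_ids: list[str], relevant_ids: set[str]) -> float:
    """Reciprocal rank of the first relevant document (0 if none is retrieved)."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

ranked = ["d3", "d8", "d1", "d5"]
relevant = {"d1", "d5"}
print(recall_at_k(ranked, relevant, k=3), mrr(ranked, relevant))  # 0.5, 0.333...
```

Averaging these per-query scores over a held-out query set gives the before/after numbers you report for the base and fine-tuned models.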
Key Concepts:
- Contrastive Learning: “AI Engineering” Chapter 7 (Finetuning) - Chip Huyen
- Loss Functions: “Sentence Transformers Training Overview” - SBERT docs
- Hard Negative Mining: “Training State-of-the-Art Embedding Models” - Hugging Face
Difficulty: Advanced. Time estimate: 2 weeks. Prerequisites: PyTorch, completed Projects 2-4, understanding of loss functions.
Real world outcome:
- Side-by-side comparison: “Base model recall@10: 68%, Fine-tuned recall@10: 89%”
- Your fine-tuned model uploaded to Hugging Face Hub
- A demo where domain-specific queries (legal, medical, your codebase) return dramatically better results
Learning milestones:
- After data curation: You’ll understand that embedding quality is bounded by training data quality
- After training: You’ll viscerally understand the embedding space — similar things cluster together
- After evaluation: You’ll understand when to fine-tune vs when to use better base models
Project Comparison Table
| Project | Difficulty | Time | Depth of Understanding | Fun Factor |
|---|---|---|---|---|
| Mini-Transformer | Intermediate | 2-3 weeks | ⭐⭐⭐⭐⭐ (deepest LLM understanding) | ⭐⭐⭐⭐ (magical when it generates text) |
| Vector DB Engine | Intermediate-Advanced | 2-3 weeks | ⭐⭐⭐⭐⭐ (fundamental to all retrieval) | ⭐⭐⭐ (algorithmic satisfaction) |
| RAG System | Intermediate | 1-2 weeks | ⭐⭐⭐⭐ (practical RAG mastery) | ⭐⭐⭐⭐⭐ (immediately useful) |
| Two-Stage Reranking | Intermediate | 1-2 weeks | ⭐⭐⭐⭐ (search architecture) | ⭐⭐⭐⭐ (measurable improvements) |
| Fine-Tune Embeddings | Advanced | 2 weeks | ⭐⭐⭐⭐⭐ (embedding mastery) | ⭐⭐⭐ (requires patience) |
Recommended Learning Path
Based on your learning goals, here’s the optimal path:
Start with: Project 3 (RAG System)
Why: It gives you the fastest path to a working end-to-end system while touching all four areas. You’ll immediately see how LLMs, embeddings, retrieval, and generation fit together. This creates the mental framework for diving deeper.
Then: Project 2 (Vector DB Engine)
Why: After building RAG, you’ll have questions about why retrieval sometimes fails. Building a vector DB from scratch answers those questions and gives you intuition about similarity search.
Then: Project 4 (Two-Stage Reranking)
Why: You’ll see your RAG system’s limitations — retrieval isn’t always precise. Adding reranking teaches you the two-stage architecture used in production.
Finally: Project 1 (Mini-Transformer)
Why: Now you’ll appreciate why building this matters. After working with LLMs as black boxes, implementing one from scratch is enlightening.
Final Overall Project: Production-Grade AI Research Assistant
After completing the projects above, combine everything into one comprehensive system:
What you’ll build: A fully functional research assistant that can ingest papers/documents, answer questions with citations, maintain conversation context, and continuously improve its retrieval quality — all running locally or on your infrastructure.
Why this is the capstone: This forces you to integrate every component you’ve learned: your understanding of transformers powers your prompt engineering, your vector DB knowledge informs your indexing strategy, your RAG expertise structures your retrieval pipeline, and your reranking skills ensure precision.
Core challenges you’ll face:
- Multi-modal document processing (PDFs with figures, tables, code) (maps to advanced chunking)
- Hierarchical retrieval (section → paragraph → sentence) (maps to multi-stage retrieval architecture)
- Conversation memory with proper context window management (maps to understanding LLM limitations; see the sketch after this list)
- Self-improving retrieval through user feedback signals (maps to production ML systems)
- Streaming responses with progressive retrieval (maps to real-world UX requirements)
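For the conversation-memory challenge flagged above, here is a minimal sketch of trimming chat history to a fixed token budget; the 4-characters-per-token estimate and the `trim_history` helper are rough illustrations, and a real system would count tokens with the model's own tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token. Swap in the model's tokenizer for accuracy."""
    return max(1, len(text) // 4)

def trim_history(messages: list[dict], budget: int, reserved_for_context: int) -> list[dict]:
    """Keep the most recent turns that fit in the window left after retrieved context.

    The system prompt (first message) is always kept; older turns are dropped first.
    """
    system, turns = messages[0], messages[1:]
    available = budget - reserved_for_context - estimate_tokens(system["content"])
    kept: list[dict] = []
    for msg in reversed(turns):                      # walk newest to oldest
        cost = estimate_tokens(msg["content"])
        if cost > available:
            break
        kept.append(msg)
        available -= cost
    return [system] + list(reversed(kept))

history = [
    {"role": "system", "content": "You are a research assistant. Cite sources."},
    {"role": "user", "content": "Summarize the attention paper."},
    {"role": "assistant", "content": "It introduces the Transformer..." * 50},
    {"role": "user", "content": "How does that relate to RAG?"},
]
print(len(trim_history(history, budget=800, reserved_for_context=500)))  # oldest turns dropped
```

More sophisticated variants summarize the dropped turns instead of discarding them, trading a little latency for longer effective memory.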
Key Concepts:
- Agentic RAG: “2025’s Ultimate Guide to RAG Retrieval” - Mehul Pratap Singh
- Advanced RAG Patterns: “AI Engineering” Chapters 6-8 - Chip Huyen
- System Design: “Designing Data-Intensive Applications” Chapters 1-3 - Martin Kleppmann
- Production ML: “Fundamentals of Software Architecture” - Richards & Ford (for system design patterns)
Difficulty: Advanced. Time estimate: 1-2 months. Prerequisites: All 5 projects above completed.
Real world outcome:
- A web UI where you drop PDFs and ask questions across your entire knowledge base
- Answers come with highlighted citations that link back to source documents
- A feedback mechanism: 👍/👎 that triggers fine-tuning of your embedding model
- Metrics dashboard showing retrieval quality, response latency, and user satisfaction
- Deploy it to a cloud instance and use it daily for your own research
Learning milestones:
- After MVP: You’ll have a working assistant that answers questions over your docs
- After adding reranking: You’ll see measurable quality improvements in precision
- After adding feedback loop: You’ll understand how production AI systems improve over time
- After deploying: You’ll understand the full MLOps lifecycle for AI applications
Essential Resources Summary
| Resource | What It Teaches |
|---|---|
| Sebastian Raschka’s “Build an LLM from Scratch” | Transformer implementation, training, fine-tuning |
| Chip Huyen’s “AI Engineering” | Production RAG, evaluation, system design |
| Sentence Transformers docs | Embeddings, reranking, fine-tuning |
| Pinecone Learning Center | Vector DBs, indexing, RAG patterns |
| Andrej Karpathy’s YouTube | Neural networks from scratch, GPT implementation |
Sources
- 2025’s Ultimate Guide to RAG Retrieval - Medium
- Enhancing RAG: A Study of Best Practices - arXiv
- Vector Database Comparison 2025 - Aloa
- Rerankers and Two-Stage Retrieval - Pinecone
- Training Reranker Models - Hugging Face
- LLMs from Scratch - GitHub
- Transformer Explainer - Interactive
- Retrieve & Re-Rank - Sentence Transformers