
GENERATIVE AI LLM RAG LEARNING PROJECTS

Learning Generative AI, LLMs, RAG, Vector Databases & Reranking

Excellent learning path! These technologies form the backbone of modern AI applications. Let me break this down and give you projects that will force you to truly understand each component - not just use APIs, but internalize how they work.


Core Concept Analysis

1. Generative AI & LLMs

  • Transformer architecture (self-attention, multi-head attention)
  • Tokenization (BPE, WordPiece, SentencePiece)
  • Embeddings and positional encoding
  • Forward pass, softmax, and generation strategies (greedy, beam search, sampling)
  • Fine-tuning vs prompting vs RAG

2. RAG (Retrieval Augmented Generation)

  • Document chunking strategies (fixed-size, semantic, sentence-based)
  • Embedding generation (bi-encoders)
  • Retrieval pipelines (sparse, dense, hybrid)
  • Context injection and prompt construction
  • Evaluation metrics (relevance, faithfulness, answer quality)

3. Vector Databases

  • Vector similarity metrics (cosine, Euclidean, dot product)
  • Indexing algorithms (HNSW, IVF, PQ)
  • Approximate Nearest Neighbors (ANN) trade-offs
  • Metadata filtering and hybrid search

4. Reranking

  • Bi-encoder vs. cross-encoder architectures
  • Two-stage retrieval pipelines
  • Relevance scoring mechanisms
  • Performance vs accuracy trade-offs

Project 1: Build a Mini-Transformer from Scratch

  • File: GENERATIVE_AI_LLM_RAG_LEARNING_PROJECTS.md
  • Programming Language: Python
  • Coolness Level: Level 5: Pure Magic (Super Cool)
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 4: Expert
  • Knowledge Area: Deep Learning / NLP
  • Software or Tool: PyTorch
  • Main Book: “Build a Large Language Model (From Scratch)” by Sebastian Raschka

What you’ll build: A small GPT-like transformer (decoder-only) that can generate coherent text after training on a corpus like Shakespeare or Wikipedia excerpts.

Why it teaches LLMs: You cannot understand how ChatGPT “thinks” until you implement self-attention yourself and see how tokens attend to each other. Building this forces you to grapple with the math behind Q, K, V matrices, why positional encoding exists, and how generation actually works.
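
To make that concrete, here is a minimal sketch of single-head causal self-attention in PyTorch. The weight matrices, dimensions, and smoke test are illustrative placeholders rather than the exact code you'll end up with; real multi-head attention adds head splitting, an output projection, and dropout.

```python
import torch
import torch.nn.functional as F

def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head causal self-attention over a batch of token embeddings.

    x:            (batch, seq_len, d_model) token embeddings
    w_q/w_k/w_v:  (d_model, d_head) projection matrices
    """
    q = x @ w_q                      # queries  (batch, seq, d_head)
    k = x @ w_k                      # keys     (batch, seq, d_head)
    v = x @ w_v                      # values   (batch, seq, d_head)

    d_head = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_head ** 0.5   # (batch, seq, seq)

    # Causal mask: position i may only attend to positions <= i.
    seq_len = x.size(1)
    mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    scores = scores.masked_fill(~mask, float("-inf"))

    weights = F.softmax(scores, dim=-1)   # attention weights per token
    return weights @ v                    # weighted sum of values

# Tiny smoke test with random weights (purely illustrative).
d_model, d_head = 16, 8
x = torch.randn(1, 5, d_model)
w_q, w_k, w_v = (torch.randn(d_model, d_head) for _ in range(3))
print(causal_self_attention(x, w_q, w_k, w_v).shape)  # torch.Size([1, 5, 8])
```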

Core challenges you’ll face:

  • Implementing multi-head self-attention without libraries (maps to understanding the attention mechanism)
  • Making causal masking work for autoregressive generation (maps to understanding why LLMs can only see “past” tokens)
  • Getting training to converge with proper learning rate scheduling (maps to understanding why training LLMs is hard)
  • Implementing tokenization from scratch (maps to understanding how text becomes numbers; a minimal BPE-style sketch follows this list)
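
To give a feel for the tokenization challenge, here is a toy sketch of the BPE merge loop: count adjacent symbol pairs, merge the most frequent one, repeat. The tiny vocabulary and the number of merges are made up for illustration; a real tokenizer also needs an encoder/decoder over the learned merges.

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across all words (each word is a tuple of symbols)."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0] if pairs else None

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

# Start from characters; each training word maps to its corpus frequency (toy data).
vocab = {tuple("lower"): 5, tuple("lowest"): 2, tuple("newer"): 6}
for _ in range(5):                     # learn 5 merges
    pair = most_frequent_pair(vocab)
    if pair is None:
        break
    vocab = merge_pair(vocab, pair)
    print("merged", pair)
```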

Key Concepts:

  • Difficulty: Intermediate
  • Time estimate: 2-3 weeks
  • Prerequisites: Python, PyTorch basics, linear algebra fundamentals

Real world outcome:

  • Your terminal will generate coherent Shakespeare-style text: “To be or not to be, that is the question of the…”
  • You can input any prompt and watch your model complete it
  • A Jupyter notebook showing attention heatmaps visualizing what tokens your model “looks at”

Learning milestones:

  1. After implementing attention: You’ll understand why transformers can process sequences in parallel and what “attending” actually means
  2. After training: You’ll viscerally understand the compute/data/quality relationship and why bigger isn’t always better
  3. After generation: You’ll understand temperature, top-k, and nucleus sampling, and why models sometimes produce garbage (a small sampling sketch follows this list)
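
As a concrete reference for that last milestone, here is a hedged sketch of greedy, temperature, and top-k sampling from a vector of logits. The function name and the fake logits are illustrative only.

```python
import torch
import torch.nn.functional as F

def sample_next_token(logits, temperature=1.0, top_k=None):
    """Pick the next token id from raw logits (shape: vocab_size).

    temperature < 1 sharpens the distribution, > 1 flattens it;
    top_k keeps only the k most likely tokens before sampling.
    """
    if temperature <= 0:                       # treat 0 as greedy decoding
        return int(torch.argmax(logits))

    logits = logits / temperature
    if top_k is not None:
        kth = torch.topk(logits, top_k).values[-1]
        logits = logits.masked_fill(logits < kth, float("-inf"))

    probs = F.softmax(logits, dim=-1)
    return int(torch.multinomial(probs, num_samples=1))

# Illustrative use with fake logits for a 10-token vocabulary.
logits = torch.randn(10)
print(sample_next_token(logits, temperature=0.8, top_k=5))
```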

Project 2: Build Your Own Vector Database Engine

  • File: GENERATIVE_AI_LLM_RAG_LEARNING_PROJECTS.md
  • Programming Language: Python
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 4. The “Open Core” Infrastructure
  • Difficulty: Level 4: Expert
  • Knowledge Area: Databases / ML Infrastructure
  • Software or Tool: HNSW / Embeddings
  • Main Book: “Algorithms, Fourth Edition” by Sedgewick & Wayne

What you’ll build: A vector database from scratch that stores embeddings, indexes them with HNSW (Hierarchical Navigable Small World), and retrieves nearest neighbors with sub-linear complexity.

Why it teaches Vector Databases: Using Pinecone or ChromaDB hides the magic. Building HNSW yourself forces you to understand why approximate nearest neighbor search is necessary (exact search is O(n)), how graph-based indexing works, and the precision-recall tradeoffs every production system makes.
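
A sensible first step is the exact, brute-force baseline that HNSW exists to beat. Here is a minimal sketch using NumPy with normalized vectors for cosine similarity; the class name and the random test data are illustrative, not a prescribed design.

```python
import numpy as np

class BruteForceIndex:
    """Exact nearest-neighbor search: the O(n) baseline that HNSW is built to beat."""

    def __init__(self):
        self.vectors = []   # list of unit-normalized embeddings
        self.payloads = []  # whatever you want to get back (text, ids, metadata)

    def add(self, vector, payload):
        v = np.asarray(vector, dtype=np.float32)
        self.vectors.append(v / np.linalg.norm(v))   # normalize once at insert time
        self.payloads.append(payload)

    def search(self, query, top_k=5):
        q = np.asarray(query, dtype=np.float32)
        q = q / np.linalg.norm(q)
        matrix = np.stack(self.vectors)              # (n, dim)
        scores = matrix @ q                          # cosine similarity, vectorized
        best = np.argsort(-scores)[:top_k]           # indices of the highest scores
        return [(self.payloads[i], float(scores[i])) for i in best]

# Illustrative usage with random vectors standing in for real embeddings.
rng = np.random.default_rng(0)
index = BruteForceIndex()
for i in range(1000):
    index.add(rng.normal(size=128), payload=f"doc-{i}")
print(index.search(rng.normal(size=128), top_k=3))
```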

Core challenges you’ll face:

  • Implementing HNSW graph construction (maps to understanding navigable small-world graphs; a greedy graph-search sketch follows this list)
  • Managing the index persistence to disk (maps to understanding why vector DBs need special storage)
  • Implementing cosine similarity efficiently with numpy vectorization (maps to understanding similarity metrics)
  • Building metadata filtering alongside vector search (maps to understanding hybrid search)
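
To illustrate the graph-navigation idea behind HNSW, here is a simplified greedy best-first search over a single proximity-graph layer. A real HNSW index adds multiple layers, an ef parameter, and careful neighbor selection during construction; the helper name and the toy graph below are assumptions made for illustration.

```python
import numpy as np

def greedy_graph_search(query, vectors, neighbors, entry_point, top_k=5):
    """Greedy best-first search over a proximity graph (one HNSW layer, simplified).

    vectors:     (n, dim) array of unit-normalized embeddings
    neighbors:   dict node_id -> list of connected node_ids
    entry_point: node id to start the walk from
    """
    def sim(i):
        return float(vectors[i] @ query)

    visited = {entry_point}
    candidates = [entry_point]                 # frontier to expand
    best = [(sim(entry_point), entry_point)]   # running result set

    while candidates:
        current = max(candidates, key=sim)     # always expand the closest frontier node
        candidates.remove(current)
        if sim(current) < min(best)[0] and len(best) >= top_k:
            break                              # no frontier node can improve the results
        for nb in neighbors[current]:
            if nb not in visited:
                visited.add(nb)
                candidates.append(nb)
                best.append((sim(nb), nb))
        best = sorted(best, reverse=True)[:top_k]
    return best

# Tiny illustrative graph: 200 random points, each linked to its 8 nearest neighbors.
rng = np.random.default_rng(0)
pts = rng.normal(size=(200, 32)).astype(np.float32)
pts /= np.linalg.norm(pts, axis=1, keepdims=True)
all_sims = pts @ pts.T
nbrs = {i: list(np.argsort(-all_sims[i])[1:9]) for i in range(200)}
print(greedy_graph_search(pts[0], pts, nbrs, entry_point=42, top_k=3))
```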

Key Concepts:

  • HNSW Algorithm: “Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs” - Malkov & Yashunin
  • Vector Similarity: “Math for Programmers” Chapter 6 (Vectors) - Paul Orland
  • Indexing Structures: “Algorithms, Fourth Edition” Chapter 3 (Searching) - Sedgewick & Wayne

  • Difficulty: Intermediate-Advanced
  • Time estimate: 2-3 weeks
  • Prerequisites: Python, data structures (graphs, heaps), basic understanding of embeddings

Real world outcome:

  • A CLI tool where you can: ./myvecdb add "The quick brown fox" and ./myvecdb search "fast animal" --top-k 5
  • Benchmarks showing your HNSW is 100x faster than brute-force on 100k vectors
  • A visualization of your HNSW graph structure showing how vectors connect across layers

Learning milestones:

  1. After brute-force implementation: You’ll understand why O(n) doesn’t scale and appreciate the need for ANN
  2. After HNSW implementation: You’ll understand graph navigation, entry points, and layer traversal
  3. After benchmarking: You’ll internalize the precision-recall-latency tradeoff that every production system faces

Project 3: Build a Complete RAG System (No LangChain)

  • File: GENERATIVE_AI_LLM_RAG_LEARNING_PROJECTS.md
  • Programming Language: Python
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Generative AI / Information Retrieval
  • Software or Tool: Vector DB / LLM API
  • Main Book: “AI Engineering” by Chip Huyen

What you’ll build: A question-answering system over your own documents (PDFs, markdown files) that chunks, embeds, retrieves, and generates answers — without using LangChain or LlamaIndex.

Why it teaches RAG: Frameworks hide everything. Building RAG manually forces you to understand why chunk size matters, how retrieval quality directly impacts generation quality, and where the “lost in the middle” problem comes from.
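
To show the bare bones of the pipeline, here is a hedged end-to-end sketch: fixed-size chunking, embedding with a small sentence-transformers model, cosine retrieval, and prompt construction. The model name, chunk sizes, and prompt wording are assumptions to adapt, and the final LLM call is left as a placeholder for whichever API you use.

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # assumed installed; any embedding model works

def chunk_text(text, chunk_size=500, overlap=100):
    """Fixed-size character chunking with overlap; the simplest strategy to start from."""
    step = chunk_size - overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]

def build_prompt(question, retrieved_chunks):
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(retrieved_chunks))
    return (
        "Answer the question using only the context below. "
        "Cite chunk numbers like [1].\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

model = SentenceTransformer("all-MiniLM-L6-v2")   # small general-purpose bi-encoder

documents = ["...your markdown or extracted PDF text..."]   # placeholder corpus
chunks = [c for doc in documents for c in chunk_text(doc)]
chunk_vecs = model.encode(chunks, normalize_embeddings=True)

def retrieve(question, top_k=4):
    q_vec = model.encode([question], normalize_embeddings=True)[0]
    scores = chunk_vecs @ q_vec                  # cosine similarity (vectors are normalized)
    best = np.argsort(-scores)[:top_k]
    return [chunks[i] for i in best]

prompt = build_prompt("What is the refund policy?", retrieve("What is the refund policy?"))
# Send `prompt` to whichever LLM you use (OpenAI, Claude, or a local Ollama model).
print(prompt[:500])
```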

Core challenges you’ll face:

  • Implementing chunking strategies (fixed, semantic, recursive) and seeing how they affect retrieval (maps to understanding document processing)
  • Building the retrieval pipeline with proper scoring (maps to understanding information retrieval)
  • Constructing prompts that maximize context utilization (maps to understanding prompt engineering for RAG)
  • Evaluating RAG quality (relevance, faithfulness, hallucination detection)

Key Concepts:

  • Difficulty: Intermediate
  • Time estimate: 1-2 weeks
  • Prerequisites: Python, understanding of embeddings, API access to an LLM (OpenAI, Claude, or local Ollama)

Real world outcome:

  • A CLI: ./myrag query "What is the refund policy?" --docs ./company_policies/
  • The system returns an answer with citations: “According to [policy.pdf, page 3], refunds are…”
  • A dashboard showing retrieval scores, chunk sources, and confidence levels

Learning milestones:

  1. After implementing chunking: You’ll see how chunk size directly impacts retrieval quality — too small loses context, too large dilutes relevance
  2. After building retrieval: You’ll understand why “semantic search” isn’t magic and sometimes fails badly
  3. After end-to-end evaluation: You’ll understand the full chain of quality degradation and where to invest optimization effort

Project 4: Implement a Two-Stage Retrieval System with Reranking

  • File: GENERATIVE_AI_LLM_RAG_LEARNING_PROJECTS.md
  • Programming Language: Python
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Information Retrieval / ML
  • Software or Tool: Cross-Encoders
  • Main Book: “Introduction to Information Retrieval” by Manning, Raghavan & Schütze

What you’ll build: A search system that first retrieves 100 candidates using fast bi-encoder search, then re-ranks them with a cross-encoder to surface the most relevant results.

Why it teaches Reranking: Bi-encoders are fast but compress meaning into a single vector — they lose nuance. Cross-encoders are accurate but slow because they process query+document pairs together. Building both yourself reveals why modern search uses a two-stage architecture.
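
Here is a minimal sketch of that two-stage shape using the sentence-transformers library: a bi-encoder narrows the corpus, then a cross-encoder rescores the survivors. The model names are common public checkpoints, the tiny corpus is made up, and the candidate counts are knobs you will want to tune.

```python
import numpy as np
from sentence_transformers import SentenceTransformer, CrossEncoder  # assumed installed

bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")                  # fast: one vector per text
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")       # slow: scores (query, doc) pairs

corpus = [
    "Refunds are issued within 30 days.",
    "Our office is closed on public holidays.",
    "Returned items must be unused and in original packaging.",
]
corpus_vecs = bi_encoder.encode(corpus, normalize_embeddings=True)

def search(query, candidates=2, top_k=1):
    # Stage 1: cheap vector search narrows the corpus to a handful of candidates.
    q_vec = bi_encoder.encode([query], normalize_embeddings=True)[0]
    idx = np.argsort(-(corpus_vecs @ q_vec))[:candidates]

    # Stage 2: the cross-encoder reads query and document together and rescores them.
    pairs = [(query, corpus[i]) for i in idx]
    scores = reranker.predict(pairs)
    order = np.argsort(-scores)[:top_k]
    return [(corpus[idx[j]], float(scores[j])) for j in order]

print(search("How long do refunds take?"))
```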

Core challenges you’ll face:

  • Implementing/fine-tuning a bi-encoder for fast retrieval (maps to understanding embedding models)
  • Implementing/using a cross-encoder for reranking (maps to understanding attention between query and document)
  • Measuring precision@k before and after reranking (maps to understanding evaluation metrics; a small metric sketch follows this list)
  • Finding the optimal candidate count for reranking (100? 50? 200?)
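
For the evaluation piece, a precision@k helper is only a few lines; the document IDs and relevance judgments below are hypothetical.

```python
def precision_at_k(ranked_ids, relevant_ids, k=5):
    """Fraction of the top-k results that are actually relevant."""
    top = ranked_ids[:k]
    hits = sum(1 for doc_id in top if doc_id in relevant_ids)
    return hits / k

# Hypothetical rankings for one query, before and after reranking.
relevant = {"d3", "d7", "d9"}
before = ["d1", "d3", "d5", "d7", "d2"]
after = ["d3", "d7", "d9", "d1", "d5"]
print(precision_at_k(before, relevant), precision_at_k(after, relevant))  # 0.4 vs 0.6
```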

Key Concepts:

  • Difficulty: Intermediate
  • Time estimate: 1-2 weeks
  • Prerequisites: Project 2 or 3 completed, familiarity with Hugging Face transformers

Real world outcome:

  • A search API that returns results with clear before/after reranking comparison
  • Benchmarks showing: “Bi-encoder alone: 72% precision@5, With reranking: 91% precision@5”
  • A visualization showing how document rankings change after reranking

Learning milestones:

  1. After bi-encoder implementation: You’ll understand why embedding search is fast but approximate
  2. After cross-encoder implementation: You’ll see why cross-encoders are more accurate but can’t scale
  3. After combining both: You’ll internalize the fundamental speed-accuracy tradeoff in information retrieval

Project 5: Fine-Tune Your Own Embedding Model

  • File: GENERATIVE_AI_LLM_RAG_LEARNING_PROJECTS.md
  • Programming Language: Python
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 4: Expert
  • Knowledge Area: ML Training / NLP
  • Software or Tool: Sentence Transformers
  • Main Book: “AI Engineering” by Chip Huyen

What you’ll build: Take an existing embedding model (like all-MiniLM-L6-v2) and fine-tune it on your domain-specific data to dramatically improve retrieval quality.

Why it teaches embeddings deeply: General-purpose embeddings work “okay” everywhere but excel nowhere. Fine-tuning forces you to understand contrastive learning, hard negative mining, and how embeddings encode meaning.
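
As a rough sketch of the training loop, here is the classic sentence-transformers `model.fit` API with MultipleNegativesRankingLoss, where each (query, positive) pair treats the other passages in the batch as negatives. The two example pairs are placeholders, hard-negative mining is left out, and newer sentence-transformers releases also offer a Trainer-based API.

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("all-MiniLM-L6-v2")

# Each example pairs a query with a passage that should rank highly for it;
# the rest of the batch serves as in-batch negatives.
train_examples = [
    InputExample(texts=["what is the refund window", "Refunds are issued within 30 days."]),
    InputExample(texts=["holiday opening hours", "Our office is closed on public holidays."]),
    # ... thousands more domain-specific pairs in a real run
]

train_loader = DataLoader(train_examples, shuffle=True, batch_size=16)
loss = losses.MultipleNegativesRankingLoss(model)

model.fit(
    train_objectives=[(train_loader, loss)],
    epochs=1,
    warmup_steps=100,
)
model.save("my-finetuned-embedding-model")
```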

Core challenges you’ll face:

  • Curating training data (query-positive-negative triplets) (maps to understanding what embeddings learn)
  • Implementing contrastive loss (InfoNCE, Multiple Negatives Ranking Loss) (maps to understanding how embeddings are trained)
  • Mining hard negatives effectively (maps to understanding why easy negatives don’t improve models)
  • Evaluating embedding quality (NDCG, MRR, recall@k)

Key Concepts:

  • Difficulty: Advanced
  • Time estimate: 2 weeks
  • Prerequisites: PyTorch, completed Projects 2-4, understanding of loss functions

Real world outcome:

  • Side-by-side comparison: “Base model recall@10: 68%, Fine-tuned recall@10: 89%”
  • Your fine-tuned model uploaded to Hugging Face Hub
  • A demo where domain-specific queries (legal, medical, your codebase) return dramatically better results

Learning milestones:

  1. After data curation: You’ll understand that embedding quality is bounded by training data quality
  2. After training: You’ll viscerally understand the embedding space — similar things cluster together
  3. After evaluation: You’ll understand when to fine-tune vs when to use better base models

Project Comparison Table

| Project | Difficulty | Time | Depth of Understanding | Fun Factor |
|---|---|---|---|---|
| Mini-Transformer | Intermediate | 2-3 weeks | ⭐⭐⭐⭐⭐ (deepest LLM understanding) | ⭐⭐⭐⭐ (magical when it generates text) |
| Vector DB Engine | Intermediate-Advanced | 2-3 weeks | ⭐⭐⭐⭐⭐ (fundamental to all retrieval) | ⭐⭐⭐ (algorithmic satisfaction) |
| RAG System | Intermediate | 1-2 weeks | ⭐⭐⭐⭐ (practical RAG mastery) | ⭐⭐⭐⭐⭐ (immediately useful) |
| Two-Stage Reranking | Intermediate | 1-2 weeks | ⭐⭐⭐⭐ (search architecture) | ⭐⭐⭐⭐ (measurable improvements) |
| Fine-Tune Embeddings | Advanced | 2 weeks | ⭐⭐⭐⭐⭐ (embedding mastery) | ⭐⭐⭐ (requires patience) |

Based on your learning goals, here’s the optimal path:

Start with: Project 3 (RAG System)

Why: It gives you the fastest path to a working end-to-end system while touching all four areas. You’ll immediately see how LLMs, embeddings, retrieval, and generation fit together. This creates the mental framework for diving deeper.

Then: Project 2 (Vector DB Engine)

Why: After building RAG, you’ll have questions about why retrieval sometimes fails. Building a vector DB from scratch answers those questions and gives you intuition about similarity search.

Then: Project 4 (Two-Stage Reranking)

Why: You’ll see your RAG system’s limitations — retrieval isn’t always precise. Adding reranking teaches you the two-stage architecture used in production.

Finally: Project 1 (Mini-Transformer)

Why: Now you’ll appreciate why building this matters. After working with LLMs as black boxes, implementing one from scratch is enlightening.


Final Overall Project: Production-Grade AI Research Assistant

After completing the projects above, combine everything into one comprehensive system:

What you’ll build: A fully functional research assistant that can ingest papers/documents, answer questions with citations, maintain conversation context, and continuously improve its retrieval quality — all running locally or on your infrastructure.

Why this is the capstone: This forces you to integrate every component you’ve learned: your understanding of transformers powers your prompt engineering, your vector DB knowledge informs your indexing strategy, your RAG expertise structures your retrieval pipeline, and your reranking skills ensure precision.

Core challenges you’ll face:

  • Multi-modal document processing (PDFs with figures, tables, code) (maps to advanced chunking)
  • Hierarchical retrieval (section → paragraph → sentence) (maps to multi-stage retrieval architecture)
  • Conversation memory with proper context window management (maps to understanding LLM limitations; a history-trimming sketch follows this list)
  • Self-improving retrieval through user feedback signals (maps to production ML systems)
  • Streaming responses with progressive retrieval (maps to real-world UX requirements)
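
As one small piece of that puzzle, here is a hedged sketch of trimming conversation history to a token budget. The word-count token estimate is a stand-in for a real tokenizer (swap in something like tiktoken), and the message format simply mirrors the common role/content chat structure.

```python
def trim_history(messages, max_tokens=3000, count_tokens=lambda s: len(s.split())):
    """Keep the most recent messages that fit the token budget.

    `count_tokens` is a rough stand-in; use a real tokenizer for accuracy.
    Always keeps the first message, which typically holds the system prompt.
    """
    system, rest = messages[0], messages[1:]
    budget = max_tokens - count_tokens(system["content"])
    kept = []
    for msg in reversed(rest):                  # walk backwards from the newest turn
        cost = count_tokens(msg["content"])
        if cost > budget:
            break
        kept.append(msg)
        budget -= cost
    return [system] + list(reversed(kept))

history = [
    {"role": "system", "content": "You are a research assistant."},
    {"role": "user", "content": "Summarize the attention paper."},
    {"role": "assistant", "content": "It introduces the transformer architecture..."},
    {"role": "user", "content": "How does it compare to RNNs?"},
]
print(len(trim_history(history, max_tokens=50)))
```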

Key Concepts:

  • Agentic RAG: “2025’s Ultimate Guide to RAG Retrieval” - Mehul Pratap Singh
  • Advanced RAG Patterns: “AI Engineering” Chapters 6-8 - Chip Huyen
  • System Design: “Designing Data-Intensive Applications” Chapters 1-3 - Martin Kleppmann
  • Production ML: “Fundamentals of Software Architecture” - Richards & Ford (for system design patterns)

  • Difficulty: Advanced
  • Time estimate: 1-2 months
  • Prerequisites: All 5 projects above completed

Real world outcome:

  • A web UI where you drop PDFs and ask questions across your entire knowledge base
  • Answers come with highlighted citations that link back to source documents
  • A feedback mechanism (👍/👎) that triggers fine-tuning of your embedding model
  • Metrics dashboard showing retrieval quality, response latency, and user satisfaction
  • Deploy it to a cloud instance and use it daily for your own research

Learning milestones:

  1. After MVP: You’ll have a working assistant that answers questions over your docs
  2. After adding reranking: You’ll see measurable quality improvements in precision
  3. After adding feedback loop: You’ll understand how production AI systems improve over time
  4. After deploying: You’ll understand the full MLOps lifecycle for AI applications

Essential Resources Summary

| Resource | What It Teaches |
|---|---|
| Sebastian Raschka’s “Build a Large Language Model (From Scratch)” | Transformer implementation, training, fine-tuning |
| Chip Huyen’s “AI Engineering” | Production RAG, evaluation, system design |
| Sentence Transformers docs | Embeddings, reranking, fine-tuning |
| Pinecone Learning Center | Vector DBs, indexing, RAG patterns |
| Andrej Karpathy’s YouTube | Neural networks from scratch, GPT implementation |
