LEARN LANGCHAIN PROJECTS

Learn LangChain: From Simple Chains to Autonomous Agents

Goal: Master the LangChain framework to build sophisticated, data-aware, and autonomous applications powered by Large Language Models. Move beyond simple API calls to orchestrating complex chains, grounding models in your own data, and giving them tools to interact with the world.


Why Learn LangChain?

Calling a Large Language Model (LLM) is easy, but building a robust application around it is hard. LLMs are non-deterministic, stateless, and have knowledge cutoffs. LangChain provides the essential toolkit to solve these problems. It's the de facto framework for productionizing LLM applications.

After completing these projects, you will:

  • Understand the core components: Models, Prompts, Chains, and Output Parsers.
  • Build applications that can reason over your private data using Retrieval-Augmented Generation (RAG).
  • Give your applications memory to have stateful, long-running conversations.
  • Create autonomous agents that can use tools (like web search or your own APIs) to solve complex problems.
  • Structure your code for modularity and scalability, ready for real-world deployment.

Core Concept Analysis

1. The Basic Chain: LLMChain

The fundamental unit in LangChain. It combines a Prompt Template (a recipe for a prompt) with a Model (the LLM) and an optional Output Parser (to structure the response).

  ┌─────────────────┐
  │   User Input    │
  │ {"topic": "AI"} │
  └────────┬────────┘
           │
           ▼
┌───────────────────┐   ┌──────────────────────┐   ┌──────────────────┐
│  PromptTemplate   │   │        Model         │   │   OutputParser   │
│ "Tell me a joke   │──▶│ (e.g., Google Gemini)│──▶│ (e.g., formats   │
│   about {topic}"  │   │                      │   │   into a list)   │
└─────────┬─────────┘   └──────────────────────┘   └─────────┬────────┘
          └────────────────────── LLMChain ──────────────────┘
                                     │
                                     ▼
                           ┌───────────────────┐
                           │ Structured Output │
                           │ ["Why did AI..."] │
                           └───────────────────┘

2. Retrieval-Augmented Generation (RAG): The RAG Chain

This is the most common and powerful pattern for making LLMs "smarter" with your own data. It reduces hallucination and allows the LLM to answer questions about information it wasn't trained on.

┌────────────┐   ┌──────────┐   ┌───────────┐   ┌──────────────┐
│  Document  │   │   Text   │   │ Embedding │   │ VectorStore  │
│ (e.g., PDF)│──▶│ Splitter │──▶│   Model   │──▶│ (e.g., FAISS,│
│            │   │          │   │           │   │  ChromaDB)   │
└────────────┘   └──────────┘   └───────────┘   └──────────────┘
   (Load)          (Split)         (Embed)          (Store)

                                                      ▲
                                                      │ Retrieve relevant chunks
                                                      │
┌────────────┐                                  ┌───────────┐
│ User Query │─────────────────────────────────▶│ Retriever │
└────────────┘                                  └─────┬─────┘
                                                      │
          ┌───────────────────────────────────────────┘
          │ (Context)
          ▼
┌───────────────────────────────────────────────────────────┐
│                     LLM Chain (as above)                  │
│  "Based on this context... {context}, answer the question │
│                    ... {question}"                        │
└───────────────────────────────────────────────────────────┘
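To make the flow concrete, here is a toy, dependency-free illustration of the retrieve-then-read idea. The bag-of-words `embed` function is a crude stand-in for a real embedding model, and the chunks are hard-coded; in the real pipeline these come from the splitter and vector store:

```python
# Toy RAG: embed chunks, retrieve the closest one, stuff it into a prompt.
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: a bag-of-words vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

chunks = [
    "The transformer architecture relies entirely on attention mechanisms.",
    "Paris is a popular destination for tourists.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]          # (Embed + Store)

query = "what is the transformer architecture"
best_chunk = max(index, key=lambda it: cosine(embed(query), it[1]))[0]  # (Retrieve)

prompt = f"Based on this context: {best_chunk}\nAnswer the question: {query}"
# `prompt` is what would be sent to the LLM chain shown above.
print(prompt)
```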

3. Agents: The Decision Makers

An agent uses an LLM not just to answer, but to think. It decides which Tool to use to find an answer, executes it, observes the result, and repeats the process until it has a final answer.

            (Loop)
               ▲
┌───────────┐  │
│    LLM    │──┘
│(Reasoning)│
└─────┬─────┘
      │
      │ Thought: "I need to search the web for the current weather."
      │
      ▼
┌──────────────────────────┐
│          Action          │
│ Tool: search             │
│ Input: "weather in SF"   │
└─────┬────────────────────┘
      │
      ▼
┌───────────┐   ┌──────────────┐
│   Tool    │   │ Observation  │
│ (e.g., Web│──▶│ "It is 70°F  │
│  Search)  │   │  and sunny." │
└───────────┘   └──────┬───────┘
                       │
                       │ Returns Observation to LLM
                       └─────────────────────┐
                                             ▼
                                       ┌───────────┐
                                       │    LLM    │
                                       └─────┬─────┘
                                             │
                                             │ Thought: "I have the final answer."
                                             ▼
                                        ┌────────┐
                                        │ Answer │
                                        └────────┘

Project List

These projects are designed to be built in sequence. Each one introduces a new, fundamental LangChain component.


Project 1: Structured Data Extractor

  • File: LEARN_LANGCHAIN_PROJECTS.md
  • Main Programming Language: Python
  • Coolness Level: Level 2: Practical but Forgettable
  • Business Potential: 2. The "Micro-SaaS / Pro Tool"
  • Difficulty: Level 1: Beginner
  • Knowledge Area: Models, Prompts, Output Parsers
  • Software or Tool: LangChain, an LLM provider (OpenAI, Google)
  • Main Book: LangChain Official Documentation

What you'll build: A Python script that takes a block of unstructured text (e.g., a user review) and uses an LLM to extract structured information (like a 1-5 star rating, a summary, and a list of positive/negative keywords) into a Pydantic class.

Why it teaches LangChain: This project teaches the three most fundamental components: a Model, a PromptTemplate, and an Output Parser. You'll learn that getting structured data back from an LLM is a core challenge, and you'll solve it the "LangChain way".

Core challenges youโ€™ll face:

  • Connecting to an LLM API → maps to instantiating a Chat model (e.g., ChatGoogleGenerativeAI)
  • Writing a prompt that instructs the LLM to extract specific information → maps to creating a PromptTemplate
  • Defining the desired output structure → maps to creating a Pydantic class
  • Forcing the LLM to return data that matches your structure → maps to using a PydanticOutputParser and combining it all in a chain

Key Concepts:

  • Models (LLMs vs. Chat Models): LangChain Docs - "Models"
  • Prompt Templates: LangChain Docs - "Prompts"
  • Output Parsers (especially Pydantic): LangChain Docs - "Output Parsers"
  • LangChain Expression Language (LCEL): The | (pipe) syntax for chaining components.

Difficulty: Beginner Time estimate: Weekend Prerequisites: Basic Python, familiarity with calling APIs.

Real world outcome: A function that reliably converts messy text into a clean Python object.

# The API you build is the outcome
from pydantic import BaseModel, Field
from typing import List

class ReviewAnalysis(BaseModel):
    summary: str = Field(description="A one-sentence summary of the review.")
    rating: int = Field(description="The reviewer's rating from 1 to 5.")
    keywords: List[str] = Field(description="A list of keywords.")

unstructured_text = "This app is amazing! The UI is so clean and it runs super fast. I just wish it had a dark mode. I'd give it 4 stars."

# Your function would do the magic
analysis_result: ReviewAnalysis = analyze_review(unstructured_text)

print(analysis_result.summary)
# > The user is very happy with the app's UI and performance but desires a dark mode feature.
print(analysis_result.rating)
# > 4

Implementation Hints:

  1. Define your ReviewAnalysis Pydantic model.
  2. Create a PydanticOutputParser from this model.
  3. Create a PromptTemplate. The template string should include the userโ€™s input text and also the format instructions from the parser ({format_instructions}).
  4. Instantiate your chat model (e.g., ChatGoogleGenerativeAI).
  5. Chain them all together using LCEL: chain = prompt | model | parser.
  6. Invoke the chain: chain.invoke({"review_text": unstructured_text}).

Learning milestones:

  1. Successfully get a response from an LLM using LangChain → Master model instantiation.
  2. Create a dynamic prompt with PromptTemplate → Understand prompt management.
  3. Get structured JSON or a Pydantic object back from the LLM → Master Output Parsers.
  4. Build your first chain using LCEL → You understand the core composition syntax of modern LangChain.

Project 2: Document Q&A Bot (RAG)

  • File: LEARN_LANGCHAIN_PROJECTS.md
  • Main Programming Language: Python
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 3. The "Service & Support" Model
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: RAG: Document Loaders, Splitters, Embeddings, Vector Stores
  • Software or Tool: LangChain, FAISS/ChromaDB, a PDF or text file
  • Main Book: "Vector Databases" (Free O'Reilly Ebook)

What you'll build: A command-line tool that "ingests" a document (like a PDF or a long text file) and allows you to ask questions about its content. The LLM's answers will be grounded in the information from the document.

Why it teaches LangChain: This is the quintessential LangChain use case. It teaches the entire Retrieval-Augmented Generation (RAG) pipeline, which is the most effective way to make LLMs work with custom data. You'll learn how to bridge the gap between your documents and the LLM's reasoning capabilities.

Core challenges youโ€™ll face:

  • Loading a document from a source → maps to using a DocumentLoader (e.g., PyPDFLoader)
  • Splitting the document into manageable chunks → maps to using a TextSplitter (e.g., RecursiveCharacterTextSplitter)
  • Creating vector embeddings of the chunks → maps to using an Embeddings model
  • Storing and retrieving chunks from a vector database → maps to using a VectorStore (like FAISS) and creating a Retriever

Key Concepts:

  • Document Loaders & Text Splitters: LangChain Docs - "Indexes"
  • Vector Stores & Embeddings: LangChain Docs - "Indexes"
  • Retrievers: The interface for fetching documents.
  • RetrievalQA Chain: A built-in chain that simplifies the RAG process.

Difficulty: Intermediate Time estimate: 1-2 weeks Prerequisites: Project 1, conceptual understanding of what vector embeddings are.

Real world outcome: An interactive Q&A session where the LLM correctly answers questions based only on the provided document.

(Console Output)

> Ingesting document 'deep_learning_paper.pdf'... Done.
> Ask a question about the document:
> What is a transformer architecture?

Based on the document, a transformer architecture is a novel network architecture based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. It consists of an encoder and a decoder...

> Ask a question about the document:
> What is the capital of France?

I'm sorry, but that information is not present in the provided document.

Implementation Hints:

  1. Ingestion (do this once):
    • Load the document with a DocumentLoader.
    • Split the loaded documents with a TextSplitter.
    • Instantiate an embeddings model (e.g., GoogleGenerativeAIEmbeddings).
    • Use the FAISS.from_documents() method to perform the embedding and storing in one step. Save the FAISS index to disk.
  2. Answering (do this for each query):
    • Load the FAISS index from disk.
    • Create a retriever from the vector store: vectorstore.as_retriever().
    • Instantiate your LLM.
    • Use the RetrievalQA chain, providing the LLM and the retriever.
    • Run the chain with your question.

Learning milestones:

  1. Load and split a document into chunks → Understand data preparation for RAG.
  2. Create and save a vector store → Master the embedding and storage process.
  3. Ask a question and retrieve relevant chunks → You can see the retrieval part of RAG in action.
  4. Get a final, synthesized answer from the RetrievalQA chain → You've built a complete RAG pipeline.

Project 3: A Research Assistant Agent with Tools

  • File: LEARN_LANGCHAIN_PROJECTS.md
  • Main Programming Language: Python
  • Coolness Level: Level 5: Pure Magic (Super Cool)
  • Business Potential: 4. The "Open Core" Infrastructure
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Agents, Tools, ReAct Framework
  • Software or Tool: LangChain, DuckDuckGo Search library, Tavily
  • Main Book: "Agents with LangChain" (LangChain Official Guide)

What you'll build: An AI agent that can answer complex questions by deciding which tools to use. For example, to answer "Who is the current CEO of the company that makes the iPhone, and what is his age raised to the power of 0.5?", it must first search the web, then use a calculator.

Why it teaches LangChain: This project represents the leap from pre-defined chains to autonomous reasoning. You'll learn to give an LLM a "brain" and "hands." The LLM acts as the brain, deciding what to do, and the tools are its hands, allowing it to interact with the outside world.

Core challenges youโ€™ll face:

  • Defining a set of tools for the agent → maps to giving the agent capabilities (e.g., a search tool, a calculator tool)
  • Creating an agent "executor" → maps to the runtime that manages the think-act-observe loop
  • Understanding the ReAct (Reason+Act) prompt → maps to seeing how the agent is prompted to "think out loud" about its plan
  • Debugging the agent's thought process → maps to interpreting the intermediate steps to see why it chose a certain tool

Key Concepts:

  • Tools: The functions an agent can call. LangChain provides many pre-built tools.
  • Agents: The reasoning engine. You'll use a pre-built agent type like create_react_agent.
  • AgentExecutor: The runtime environment that actually executes the agent's decisions.

Difficulty: Advanced Time estimate: 1-2 weeks Prerequisites: Project 1 & 2.

Real world outcome: An interactive session with an agent that shows its work as it finds the answer.

(Console Output)

> Question: What is the hometown of the director of the movie 'Inception'?

> Entering new AgentExecutor chain...
Thought: I need to find out who directed the movie 'Inception'. Then I need to search for that person's hometown.
Action:
{
  "action": "search",
  "action_input": "who directed the movie Inception"
}
Observation: Christopher Nolan
Thought: Now that I know the director is Christopher Nolan, I need to find his hometown.
Action:
{
  "action": "search",
  "action_input": "Christopher Nolan hometown"
}
Observation: London, England
Thought: I now have the final answer.
Final Answer: The hometown of the director of 'Inception' is London, England.

> Finished chain.

Implementation Hints:

  1. Install a search tool library like duckduckgo-search.
  2. Import the Tool class and the search wrapper from langchain_community.
  3. Instantiate your tools, e.g., search = DuckDuckGoSearchRun(). Put them in a list.
  4. Create a prompt using hub.pull("hwchase17/react"). This pulls a pre-made ReAct prompt template.
  5. Create the agent by binding the tools to the LLM: agent = create_react_agent(llm, tools, prompt).
  6. Create the executor: agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True). The verbose=True is crucial for seeing the thought process.
  7. Invoke the executor with your question.

Learning milestones:

  1. Create an agent that can use a single tool (e.g., search) → Master the basic agent setup.
  2. Add a second tool (e.g., a calculator or another search tool) → Understand how the agent chooses between tools.
  3. Analyze the verbose output to understand the ReAct loop → You can "see" the agent thinking.
  4. Successfully answer a multi-step question that requires multiple tools → You have built an autonomous agent.

Project 4: Conversational Product Recommender

  • File: LEARN_LANGCHAIN_PROJECTS.md
  • Main Programming Language: Python
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 3. The "Service & Support" Model
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Memory
  • Software or Tool: LangChain, ConversationBufferMemory
  • Main Book: LangChain Docs - "Memory"

What you'll build: A conversational chatbot that recommends products based on a user's stated preferences. Unlike a simple Q&A bot, this one will remember previous turns of the conversation to inform its recommendations.

Why it teaches LangChain: LLMs are stateless. This project forces you to solve that problem by introducing Memory. You'll learn how to store, retrieve, and incorporate conversational history into your prompts automatically, enabling true multi-turn dialogue.

Core challenges youโ€™ll face:

  • Choosing the right type of memory → maps to understanding the tradeoffs between ConversationBufferMemory, ConversationSummaryMemory, etc.
  • Integrating memory into a chain → maps to using ConversationChain or manually adding memory to a prompt
  • Managing the context window → maps to realizing that infinite memory isn't feasible and understanding how summarizing memory works
  • Prompting for conversational interaction → maps to modifying your system prompt to be more conversational

Key Concepts:

  • Memory Types: ConversationBufferMemory, ConversationSummaryBufferMemory.
  • ConversationChain: A high-level chain that has memory built-in.
  • Chat History: The object that stores the conversation and is passed to the prompt.

Difficulty: Advanced Time estimate: 1-2 weeks Prerequisites: Project 1.

Real world outcome: A chatbot that remembers what you told it earlier.

(Console Output)

> AI: Hello! I'm your friendly gadget recommender. What are you looking for today?
> User: I need a new laptop
> AI: Great! Laptops are my specialty. Do you have any specific needs? For example, are you a gamer, a student, or a professional?
> User: I'm a student, so battery life is really important.
> AI: Understood. For students prioritizing battery life, I recommend looking at the MacBook Air M2, the Dell XPS 13, or the Lenovo Yoga 7i. They all offer excellent performance and over 10 hours of real-world use.
> User: which of those is the lightest?
> AI: Of the three laptops I mentioned, the MacBook Air M2 is the lightest, weighing just 2.7 pounds.

Notice how it understood that "those" in the last question refers to the laptops it just recommended.

Implementation Hints:

  1. Instantiate your chat model as usual.
  2. Instantiate a memory object, e.g., memory = ConversationBufferMemory().
  3. Use the high-level ConversationChain, passing it the llm and memory objects.
  4. Create a loop where you take user input and pass it to chain.predict(input=user_input). The chain will automatically handle loading the history, adding the new input, and saving the new output.

Learning milestones:

  1. Build a simple echo-bot with memory → You can see the history being saved.
  2. Implement a conversational Q&A bot → You've added state to a stateless system.
  3. Experiment with ConversationSummaryBufferMemory → You understand how to manage long conversations without exceeding the context limit.
  4. Successfully have a multi-turn conversation where the bot references prior information → You have mastered the fundamentals of conversational AI.

Project 5: Automated Trip Planner

  • File: LEARN_LANGCHAIN_PROJECTS.md
  • Main Programming Language: Python
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 2. The "Micro-SaaS / Pro Tool"
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Advanced Chains (Sequential, Router)
  • Software or Tool: LangChain
  • Main Book: LangChain Docs - "Chains"

What youโ€™ll build: A tool that generates a travel itinerary. It will take a destination and duration as input, then generate a list of attractions. Based on the type of attraction, it will then suggest a specific restaurant nearby. This involves multiple, dependent steps.

Why it teaches LangChain: This project moves beyond single-purpose chains to complex workflows. You'll learn how to pipe the output of one chain into the input of another (Sequential Chains) and how to use an LLM to dynamically decide which chain to run next (Router Chains).

Core challenges youโ€™ll face:

  • Breaking a complex task into a sequence of smaller LLM calls → maps to thinking in terms of a Directed Acyclic Graph (DAG) of operations
  • Managing inputs and outputs between chains → maps to using SimpleSequentialChain or the more flexible SequentialChain
  • Creating prompts that classify input for routing → maps to the "meta-prompting" needed for RouterChain
  • Defining the different "destination" chains for a router → maps to modularizing your application's logic

Key Concepts:

  • SimpleSequentialChain: For a linear sequence where one chain's output is the next's input.
  • SequentialChain: For more complex sequences with multiple inputs/outputs.
  • RouterChain: Uses an LLM to choose one of several sub-chains to execute.
  • LLMChain: The building block for all other chains.

Difficulty: Advanced Time estimate: 1-2 weeks Prerequisites: Project 1.

Real world outcome: A script that produces a structured itinerary from a simple query.

(Console Output)
> Plan a one-day trip to Paris.

Okay, planning your trip!

**Generated Itinerary for Paris:**

1.  **Morning (Historical Sight):** The Louvre Museum
    *   *Restaurant Suggestion:* Le Fumoir, a classic French bistro perfect for a post-museum lunch.

2.  **Afternoon (Architectural Marvel):** Eiffel Tower
    *   *Restaurant Suggestion:* 58 Tour Eiffel, for a meal with a view right inside the tower itself.

3.  **Evening (Art & Culture):** Montmartre District
    *   *Restaurant Suggestion:* La Mère Catherine, a traditional restaurant in the heart of Montmartre's artist square.

Implementation Hints:

  1. Attraction Chain: An LLMChain that takes a destination and duration and outputs a numbered list of attractions, each with a "type" (e.g., "Historical Sight").
  2. Restaurant Chain: An LLMChain that takes an attraction_name and attraction_type and outputs a nearby restaurant suggestion.
  3. Combine these with SequentialChain. You will need to use custom transform functions to parse the output of the first chain to create the inputs for the second.
  4. For the Router: First create several restaurant suggestion chains (e.g., fine_dining_chain, casual_eats_chain). Then create a MultiPromptChain router that uses an LLM to classify the attraction type and route to the appropriate restaurant chain.

Learning milestones:

  1. Build a SimpleSequentialChain → Master the basic chaining concept.
  2. Build a more complex SequentialChain that requires output manipulation → Understand data flow in chains.
  3. Implement a RouterChain that classifies and routes user intent → Let an LLM control the application flow.
  4. Successfully generate a multi-step, context-aware itinerary → You can build complex workflows.

Project 6: RAG Bot with Citations

  • File: LEARN_LANGCHAIN_PROJECTS.md
  • Main Programming Language: Python
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 4. The "Open Core" Infrastructure
  • Difficulty: Level 4: Expert
  • Knowledge Area: Advanced RAG
  • Software or Tool: LangChain, ParentDocumentRetriever
  • Main Book: LangChain Docs - "Retrievers"

What youโ€™ll build: An enhanced version of the Document Q&A bot from Project 2. This version will not only answer the question but also provide citations, pointing to the exact source document and page number it used to generate the answer.

Why it teaches LangChain: This is a crucial step for production-ready RAG. Users need to trust the AI's output, and citations are the best way to build that trust. This project teaches you how to handle document metadata and modify the final "answer synthesis" step to include sources. It also introduces more advanced retrieval strategies.

Core challenges youโ€™ll face:

  • Preserving metadata during document splitting → maps to ensuring each chunk knows its original source (e.g., filename, page number)
  • Using a more advanced retriever → maps to exploring ParentDocumentRetriever to get more contextually relevant chunks
  • Modifying the final QA prompt → maps to instructing the LLM to provide sources along with its answer
  • Creating a final chain that returns a structured answer object → maps to combining the answer and sources into a single Pydantic object

Key Concepts:

  • ParentDocumentRetriever: A strategy that splits documents into small chunks for searching but retrieves larger, surrounding "parent" chunks for better context.
  • Metadata: Attaching key-value data to documents and chunks.
  • create_stuff_documents_chain: The chain responsible for stuffing the retrieved documents into the final prompt, which you can customize.

Difficulty: Expert Time estimate: 2 weeks Prerequisites: Project 2.

Real world outcome: A trustworthy Q&A bot that backs up its claims.

(Console Output)
> Question: In the SEC filing, what were the main risk factors mentioned for Q4?

**Answer:** The main risk factors mentioned for Q4 were increased competition from emerging markets, potential supply chain disruptions due to new trade regulations, and the ongoing volatility in currency exchange rates.

**Sources:**
1.  **Document:** `10-K_filing_2025.pdf`, Page 12, Section "Risk Factors"
2.  **Document:** `10-K_filing_2025.pdf`, Page 14, Section "International Operations"

Implementation Hints:

  1. When loading documents (e.g., with PyPDFLoader), the metadata for each page document will automatically include the source filename and page number.
  2. Use the ParentDocumentRetriever. You will need a docstore to hold the parent documents and a vectorstore for the child chunks.
  3. Create a custom prompt for the final QA step. Add a sentence like: "You must include a list of sources used to formulate your answer. A source is the 'source' and 'page' from the metadata of the provided documents."
  4. Use create_retrieval_chain which is designed to pass through the retrieved documents, making them available for you to format and display as sources.

Learning milestones:

  1. Ingest documents with metadata correctly preserved → You understand data provenance.
  2. Implement ParentDocumentRetriever → You're using advanced retrieval strategies.
  3. Customize the final QA prompt to ask for sources → You can manipulate the final generation step.
  4. Return a structured object containing both the answer and a list of source documents → You can build enterprise-grade RAG systems.

Project 7: Natural Language to SQL Query Generator

  • File: LEARN_LANGCHAIN_PROJECTS.md
  • Main Programming Language: Python
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 4. The "Open Core" Infrastructure
  • Difficulty: Level 4: Expert
  • Knowledge Area: SQL Agents & Chains
  • Software or Tool: LangChain, SQLAlchemy, a sample SQLite database (e.g., Chinook)
  • Main Book: LangChain Docs - "SQL"

What youโ€™ll build: A tool that connects to a SQL database and allows users to ask questions in plain English. The tool will convert the question to a SQL query, execute it, and return the answer in a human-readable format.

Why it teaches LangChain: A huge amount of the world's data lives in structured SQL databases. This project teaches you how to give an LLM the ability to interact with this data. You'll learn about the specialized SQL agents and chains that inspect the database schema to write accurate queries.

Core challenges youโ€™ll face:

  • Connecting LangChain to a database → maps to creating a SQLDatabase object using SQLAlchemy
  • Giving the LLM the database schema → maps to how the SQL toolkit automatically provides table and column info in the prompt
  • Ensuring the LLM generates valid SQL for your dialect → maps to the power and limitations of Text-to-SQL
  • Handling potential errors and interpreting results → maps to building a robust database-interacting agent

Key Concepts:

  • SQLDatabase Toolkit: The LangChain wrapper around a database connection.
  • create_sql_agent: A factory function for creating a powerful agent that can query databases.
  • Text-to-SQL Prompting: The art of providing schema information and examples to an LLM so it can write correct queries.

Difficulty: Expert Time estimate: 1-2 weeks Prerequisites: Basic SQL knowledge, Project 3.

Real world outcome: A powerful analytics tool that empowers non-technical users to query a database.

(Console Output)
> Enter your question about the Chinook database:
> Which 5 artists have the most albums?

> Entering new AgentExecutor chain...
Thought: The user wants to find the top 5 artists with the most albums. I need to join the 'artists' table with the 'albums' table, group by artist name, count the number of albums, and then order the results to get the top 5.
Action:
{
  "action": "sql_db_query",
  "action_input": "SELECT T1.Name, COUNT(T2.AlbumId) AS TotalAlbums FROM artists AS T1 JOIN albums AS T2 ON T1.ArtistId = T2.ArtistId GROUP BY T1.Name ORDER BY TotalAlbums DESC LIMIT 5"
}
Observation: [('Iron Maiden', 21), ('Led Zeppelin', 14), ('Deep Purple', 11), ('U2', 10), ('Metallica', 10)]
Thought: I have the result from the database. I can now formulate a final answer for the user.
Final Answer: The top 5 artists with the most albums are: Iron Maiden (21 albums), Led Zeppelin (14 albums), Deep Purple (11 albums), U2 (10 albums), and Metallica (10 albums).

> Finished chain.

Implementation Hints:

  1. Set up a sample SQLite database (the Chinook dataset is perfect for this).
  2. Install sqlalchemy and other necessary database drivers.
  3. Create the SQLDatabase object: db = SQLDatabase.from_uri("sqlite:///Chinook.db").
  4. Instantiate your LLM.
  5. Use the create_sql_agent function, passing it the llm and the db object. This will create an agent with a pre-configured set of database tools.
  6. Invoke the agent executor with a natural language question.
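Before wiring up the agent, it can help to confirm what a "correct" result looks like. The sketch below reproduces the JOIN query from the transcript above against a tiny stand-in for the Chinook schema, using only the standard library. The table and column names follow the Chinook convention; the sample rows are made up for illustration.

```python
import sqlite3

# In-memory stand-in for the Chinook artists/albums tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artists (ArtistId INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE albums  (AlbumId INTEGER PRIMARY KEY, Title TEXT,
                          ArtistId INTEGER REFERENCES artists(ArtistId));
    INSERT INTO artists VALUES (1, 'Iron Maiden'), (2, 'U2');
    INSERT INTO albums  VALUES (1, 'Killers', 1), (2, 'Powerslave', 1),
                               (3, 'War', 2);
""")

# The kind of query the SQL agent must generate for
# "Which artists have the most albums?"
query = """
    SELECT a.Name, COUNT(al.AlbumId) AS TotalAlbums
    FROM artists AS a JOIN albums AS al ON a.ArtistId = al.ArtistId
    GROUP BY a.Name ORDER BY TotalAlbums DESC LIMIT 5
"""
print(conn.execute(query).fetchall())
# -> [('Iron Maiden', 2), ('U2', 1)]
```

Running the same question through the agent and comparing its generated SQL against a hand-written query like this is a good way to spot dialect or schema misunderstandings.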

Learning milestones:

  1. Successfully connect LangChain to a SQL database → You can bridge the gap between LLMs and structured data.
  2. Ask a simple question that queries a single table → You've got basic Text-to-SQL working.
  3. Ask a complex question that requires a JOIN → You can see the LLM's advanced reasoning capabilities.
  4. Observe the agent correcting a malformed query → You understand the iterative, self-healing nature of SQL agents.

Project 8: Movie Recommendation Graph Agent

  • File: LEARN_LANGCHAIN_PROJECTS.md
  • Main Programming Language: Python
  • Coolness Level: Level 5: Pure Magic (Super Cool)
  • Business Potential: 4. The "Open Core" Infrastructure
  • Difficulty: Level 5: Master
  • Knowledge Area: Graph Database Chains
  • Software or Tool: LangChain, Neo4j (or other graph DB), GraphCypherQAChain
  • Main Book: LangChain Docs - "Graph"

What you'll build: An agent that provides movie recommendations by querying a graph database. You'll model movies, actors, and directors as nodes and their relationships as edges. The agent will translate natural language questions into Cypher (the graph query language) to answer questions that are difficult for traditional databases, like "Find movies starring an actor who also directed a movie."

Why it teaches LangChain: This project explores the frontier of data interaction. While RAG is great for unstructured text and SQL agents are great for tables, Graph chains are designed to leverage the relationships in your data, enabling more complex, multi-hop reasoning.

Core challenges you'll face:

  • Modeling data as a graph → maps to thinking in terms of nodes, relationships, and properties
  • Connecting LangChain to a graph database → maps to using the Neo4jGraph or similar integration
  • Understanding Text-to-Cypher → maps to how the LLM uses the graph schema to construct Cypher queries
  • Leveraging the graph structure for novel queries → maps to asking questions that would be very slow or complex in SQL

Key Concepts:

  • Graph Databases: Understanding nodes, edges, and properties (Neo4j fundamentals).
  • Cypher Query Language: The SQL-like language for querying graphs.
  • GraphCypherQAChain: The specialized chain for converting questions to Cypher and executing them against a graph.
  • Graph Schema: The information provided to the LLM so it knows what types of nodes and relationships exist.

Difficulty: Master. Time estimate: 2 weeks. Prerequisites: Project 7, willingness to learn basic Cypher.

Real world outcome: A chatbot that can answer nuanced questions about movies and their connections.

(Console Output)
> Question: Which movies did Tom Hanks act in that were also directed by Steven Spielberg?

> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (p:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(d:Person {name: 'Steven Spielberg'}) RETURN m.title

> Executing Cypher...
> Finished chain.
Answer: Saving Private Ryan, Catch Me If You Can, The Terminal, Bridge of Spies, The Post.

Implementation Hints:

  1. Set up a Neo4j instance (AuraDB offers a free tier).
  2. Populate it with some sample movie data. Create (:Movie), (:Person) nodes and [:ACTED_IN], [:DIRECTED] relationships.
  3. Instantiate the Neo4jGraph object in LangChain with your database credentials.
  4. Instantiate your LLM.
  5. Create a GraphCypherQAChain from the graph object and the LLM.
  6. Run the chain with your natural language question. The chain will automatically infer the schema, generate Cypher, execute it, and synthesize the answer.
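Before standing up Neo4j, it can help to see the traversal the Cypher pattern above performs, reduced to plain Python sets. This is a toy in-memory stand-in (the sample edges are invented for illustration, not real data), showing why a multi-hop pattern match is natural in a graph:

```python
# Edges of the toy graph, mirroring the Cypher pattern
# (actor)-[:ACTED_IN]->(movie)<-[:DIRECTED]-(director).
acted_in = {
    ("Tom Hanks", "The Terminal"),
    ("Tom Hanks", "Cast Away"),
}
directed = {
    ("Steven Spielberg", "The Terminal"),
    ("Robert Zemeckis", "Cast Away"),
}

def movies_matching(actor: str, director: str) -> list[str]:
    """Movies where `actor` acted AND `director` directed: the
    intersection of two one-hop traversals from different nodes."""
    acted = {m for a, m in acted_in if a == actor}
    helmed = {m for d, m in directed if d == director}
    return sorted(acted & helmed)

print(movies_matching("Tom Hanks", "Steven Spielberg"))
# -> ['The Terminal']
```

In SQL this question needs two joins through link tables; in Cypher it is a single pattern, which is exactly the structure the LLM is asked to emit.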

Learning milestones:

  1. Connect LangChain to a running graph database → You've integrated another type of data source.
  2. Successfully answer a single-hop question ("Who directed 'Forrest Gump'?") → You've got Text-to-Cypher working.
  3. Successfully answer a multi-hop question ("Who acted in a movie with Tom Hanks?") → You are leveraging the power of the graph.
  4. Inspect the generated Cypher to understand how the LLM "thinks" in graphs → You have mastered graph-based LLM interaction.

Final Overall Project (Project 9): A GitHub Repo Analysis Agent

  • File: LEARN_LANGCHAIN_PROJECTS.md
  • Main Programming Language: Python
  • Coolness Level: Level 5: Pure Magic (Super Cool)
  • Business Potential: 5. The "Industry Disruptor"
  • Difficulty: Level 5: Master
  • Knowledge Area: Advanced Agents, Custom Tools, RAG-as-a-Tool
  • Software or Tool: LangChain, GitPython library

What you'll build: A sophisticated agent that can analyze a GitHub repository. You'll equip it with custom tools to list_files, read_file, and a powerful RAG-based tool to answer_about_codebase. This allows it to answer both specific questions ("What's in the Dockerfile?") and high-level questions ("What is the overall architecture of this project?").

Why it's the capstone project: It combines all the concepts you've learned. It uses an Agent as the main entry point, which uses Custom Tools you define. One of those tools is itself a full RAG Chain, demonstrating the powerful, composable nature of LangChain. This is a real, non-trivial application that showcases a deep understanding of the framework.
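The composition idea can be sketched without any framework: a "chain" is just a callable, and a "tool" is that callable plus a name and a description the agent selects by. The names and dict shape below are illustrative only, not LangChain's actual API; the dummy function stands in for a real retrieve-then-generate pipeline.

```python
def dummy_rag_chain(question: str) -> str:
    # Stand-in for the real RAG pipeline: retrieve relevant code
    # chunks from a vector index, then ask the LLM to synthesize.
    return f"[answer synthesized from retrieved code chunks for: {question}]"

# A "tool" is the chain plus the metadata the agent uses to pick it.
answer_about_codebase = {
    "name": "answer_about_codebase",
    "description": "Answer high-level questions about the whole repository.",
    "func": dummy_rag_chain,
}

print(answer_about_codebase["func"]("What is the overall architecture?"))
```

Because the tool interface only requires a callable, any chain you have already built can be dropped into an agent's tool list unchanged; that is the composability this project exercises.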

Core challenges you'll face:

  • Creating custom tools from Python functions → maps to using the @tool decorator to expose your own code to the agent
  • Designing a "RAG tool" → maps to wrapping your entire RAG chain from Project 2 into a single tool that the agent can decide to call
  • Prompting the agent effectively → maps to writing a system prompt that helps the agent understand its tools and when to use them
  • Orchestrating complex reasoning → maps to the agent deciding whether to read a single file for a specific answer or use the RAG tool for a general one
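The two simple tools can be prototyped as plain functions first; in LangChain each would then be wrapped with the @tool decorator, whose docstring becomes the description the agent reasons over. This is a minimal sketch, with the exact signatures an assumption:

```python
from pathlib import Path

def list_files(repo_path: str) -> list[str]:
    """List all files in the repository, as paths relative to its root."""
    root = Path(repo_path)
    return sorted(
        str(p.relative_to(root)) for p in root.rglob("*") if p.is_file()
    )

def read_file(repo_path: str, relative_path: str) -> str:
    """Read a single file from the repository as text."""
    return (Path(repo_path) / relative_path).read_text()
```

Keeping the tools as ordinary, testable functions before exposing them to the agent makes it much easier to tell whether a bad answer comes from the tool or from the agent's reasoning.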

Real world outcome: An AI assistant that you can point at a GitHub repo and ask complex questions to, accelerating your ability to understand new code.

(Console Output)

> Analyzing repo 'https://github.com/langchain-ai/langchain'...
> Ask a question: How does LangChain handle streaming responses?

> Entering new AgentExecutor chain...
Thought: The user is asking a high-level conceptual question about the codebase. Reading a single file is unlikely to give me the full picture. I should use my tool for answering general questions about the codebase.
Action:
{
  "action": "answer_about_codebase",
  "action_input": "How does LangChain handle streaming responses?"
}
Observation: LangChain handles streaming by using iterators and generators. The `stream` method on chains and models returns an iterator that yields response chunks as they are generated by the LLM. This is often implemented using Server-Sent Events (SSE) or by parsing the raw streaming HTTP response. Key classes involved are `StreamingStdOutCallbackHandler`...
Thought: I have a comprehensive answer from my RAG tool. I can now provide the final answer.
Final Answer: LangChain handles streaming responses primarily through methods like `.stream()` available on its core components (Chains, LLMs). This method returns an iterator that yields chunks of the response as they become available from the model. This allows for real-time display of text in applications. For more detailed implementation, you can look into callback handlers like `StreamingStdOutCallbackHandler`.

> Finished chain.

Learning milestones:

  1. Build and test a custom list_files tool → You can expose simple functions as tools.
  2. Build a RAG-as-a-Tool and integrate it into the agent's tool list → You've mastered composition.
  3. Write a master prompt that guides the agent on when to use which tool → You understand how to steer agentic behavior.
  4. Successfully have the agent answer a high-level question using the RAG tool and a specific question using the read_file tool → You have built a truly intelligent and useful AI application.

Summary of Projects

| Project | Main Language | Key LangChain Concept |
| --- | --- | --- |
| 1. Structured Data Extractor | Python | Prompts, Models, Output Parsers |
| 2. Document Q&A Bot | Python | RAG (Indexes, Retrievers) |
| 3. Research Assistant Agent | Python | Agents & Tools |
| 4. Conversational Product Recommender | Python | Memory |
| 5. Automated Trip Planner | Python | Advanced Chains (Sequential, Router) |
| 6. RAG Bot with Citations | Python | Advanced RAG |
| 7. Natural Language to SQL Query | Python | SQL Agents & Chains |
| 8. GitHub Repo Analysis Agent (Final, Project 9) | Python | Advanced Agents, Custom Tools, RAG-as-a-Tool |
| 8. Movie Recommendation Graph Agent | Python | Graph Chains |
| 9. GitHub Repo Analysis Agent (Final) | Python | Advanced Agents, Custom Tools, RAG-as-a-Tool |