LEARN LANGCHAIN PROJECTS
Learn LangChain: From Simple Chains to Autonomous Agents
Goal: Master the LangChain framework to build sophisticated, data-aware, and autonomous applications powered by Large Language Models. Move beyond simple API calls to orchestrating complex chains, grounding models in your own data, and giving them tools to interact with the world.
Why Learn LangChain?
Calling a Large Language Model (LLM) is easy, but building a robust application around it is hard. LLMs are non-deterministic, stateless, and have knowledge cutoffs. LangChain provides the essential toolkit to solve these problems. It has become the de facto framework for taking LLM applications to production.
After completing these projects, you will:
- Understand the core components: Models, Prompts, Chains, and Output Parsers.
- Build applications that can reason over your private data using Retrieval-Augmented Generation (RAG).
- Give your applications memory to have stateful, long-running conversations.
- Create autonomous agents that can use tools (like web search or your own APIs) to solve complex problems.
- Structure your code for modularity and scalability, ready for real-world deployment.
Core Concept Analysis
1. The Basic Chain: LLMChain
The fundamental unit in LangChain. It combines a Prompt Template (a recipe for a prompt) with a Model (the LLM) and an optional Output Parser (to structure the response).
      ┌─────────────────┐
      │   User Input    │
      │ {"topic": "AI"} │
      └────────┬────────┘
               │
               ▼
┌───────────────────┐     ┌───────────────────────┐     ┌──────────────────┐
│  PromptTemplate   │     │         Model         │     │   OutputParser   │
│ "Tell me a joke   │────▶│ (e.g., Google Gemini) │────▶│  (e.g., formats  │
│  about {topic}"   │     │                       │     │   into a list)   │
└───────────────────┘     └───────────────────────┘     └──────────────────┘
          ▲                           │                          │
          └───────────────────────────┴──────────────────────────┘
                                  LLMChain
                                      │
                                      ▼
                            ┌───────────────────┐
                            │ Structured Output │
                            │ ["Why did AI..."] │
                            └───────────────────┘
2. Retrieval-Augmented Generation (RAG): The RAG Chain
This is the most common and powerful pattern for making LLMs “smarter” with your own data. It reduces hallucinations and lets the LLM answer questions about information it wasn’t trained on.
┌────────────┐     ┌──────────┐     ┌───────────┐     ┌──────────────┐
│  Document  │     │   Text   │     │ Embedding │     │  VectorStore │
│ (e.g., PDF)├────▶│ Splitter ├────▶│   Model   ├────▶│ (e.g., FAISS,│
│            │     │          │     │           │     │   ChromaDB)  │
└────────────┘     └──────────┘     └───────────┘     └──────────────┘
    (Load)           (Split)           (Embed)           (Store)
                                                             ▲
                                                             │ Retrieve relevant chunks
                                                             │
┌────────────┐                                         ┌───────────┐
│ User Query │────────────────────────────────────────▶│ Retriever │
└────────────┘                                         └─────┬─────┘
                                                             │
                         ┌───────────────────────────────────┘
                         │ (Context)
                         ▼
┌───────────────────────────────────────────────────────────┐
│                    LLM Chain (as above)                   │
│ "Based on this context... {context}, answer the question  │
│  ... {question}"                                          │
└───────────────────────────────────────────────────────────┘
3. Agents: The Decision Makers
An agent uses an LLM not just to answer, but to think. It decides which Tool to use to find an answer, executes it, observes the result, and repeats the process until it has a final answer.
                        (Loop)
                           ▲
┌───────────┐              │
│    LLM    │──────────────┘
│(Reasoning)│
└─────┬─────┘
      │
      │ Thought: "I need to search the web for the current weather."
      │
      ▼
┌──────────────────────┐
│        Action        │
│     Tool: search     │
│Input: "weather in SF"│
└─────┬────────────────┘
      │
      ▼
┌───────────┐     ┌─────────────┐
│   Tool    │     │ Observation │
│ (e.g., Web│────▶│ "It is 70°F │
│  Search)  │     │ and sunny." │
└───────────┘     └──────┬──────┘
                         │
                         │ Returns Observation to LLM
                         ▼
                   ┌───────────┐
                   │    LLM    │
                   └─────┬─────┘
                         │
                         │ Thought: "I have the final answer."
                         │
                         ▼
                     ┌────────┐
                     │ Answer │
                     └────────┘
Project List
These projects are designed to be built in sequence. Each one introduces a new, fundamental LangChain component.
Project 1: Structured Data Extractor
- File: LEARN_LANGCHAIN_PROJECTS.md
- Main Programming Language: Python
- Coolness Level: Level 2: Practical but Forgettable
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 1: Beginner
- Knowledge Area: Models, Prompts, Output Parsers
- Software or Tool: LangChain, an LLM provider (OpenAI, Google)
- Main Book: LangChain Official Documentation
What you’ll build: A Python script that takes a block of unstructured text (e.g., a user review) and uses an LLM to extract structured information (like a 1-5 star rating, a summary, and a list of positive/negative keywords) into a Pydantic class.
Why it teaches LangChain: This project teaches the three most fundamental components: a Model, a PromptTemplate, and an Output Parser. You’ll learn that getting structured data back from an LLM is a core challenge, and you’ll solve it the “LangChain way”.
Core challenges you’ll face:
- Connecting to an LLM API → maps to instantiating a chat model (e.g., `ChatGoogleGenerativeAI`)
- Writing a prompt that instructs the LLM to extract specific information → maps to creating a `PromptTemplate`
- Defining the desired output structure → maps to creating a Pydantic class
- Forcing the LLM to return data that matches your structure → maps to using a `PydanticOutputParser` and combining it all in a chain
Key Concepts:
- Models (LLMs vs. Chat Models): LangChain Docs - “Models”
- Prompt Templates: LangChain Docs - “Prompts”
- Output Parsers (especially Pydantic): LangChain Docs - “Output Parsers”
- LangChain Expression Language (LCEL): the `|` (pipe) syntax for chaining components.
- Difficulty: Beginner
- Time estimate: Weekend
- Prerequisites: Basic Python, familiarity with calling APIs.
Real world outcome: A function that reliably converts messy text into a clean Python object.
```python
# The API you build is the outcome
from typing import List

from pydantic import BaseModel, Field

class ReviewAnalysis(BaseModel):
    summary: str = Field(description="A one-sentence summary of the review.")
    rating: int = Field(description="The reviewer's rating from 1 to 5.")
    keywords: List[str] = Field(description="A list of keywords.")

unstructured_text = (
    "This app is amazing! The UI is so clean and it runs super fast. "
    "I just wish it had a dark mode. I'd give it 4 stars."
)

# Your function would do the magic
analysis_result: ReviewAnalysis = analyze_review(unstructured_text)

print(analysis_result.summary)
# > The user is very happy with the app's UI and performance but desires a dark mode feature.
print(analysis_result.rating)
# > 4
```
Implementation Hints:
- Define your
ReviewAnalysisPydantic model. - Create a
PydanticOutputParserfrom this model. - Create a
PromptTemplate. The template string should include the user’s input text and also the format instructions from the parser ({format_instructions}). - Instantiate your chat model (e.g.,
ChatGoogleGenerativeAI). - Chain them all together using LCEL:
chain = prompt | model | parser. - Invoke the chain:
chain.invoke({"review_text": unstructured_text}).
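Putting the hints together, here is a minimal sketch of the whole chain. It assumes the `langchain-core` and `langchain-google-genai` packages are installed and a `GOOGLE_API_KEY` is set in the environment; the model name is just an example.

```python
from typing import List

from langchain_core.output_parsers import PydanticOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_google_genai import ChatGoogleGenerativeAI
from pydantic import BaseModel, Field

class ReviewAnalysis(BaseModel):
    summary: str = Field(description="A one-sentence summary of the review.")
    rating: int = Field(description="The reviewer's rating from 1 to 5.")
    keywords: List[str] = Field(description="A list of keywords.")

# The parser derives format instructions (a JSON schema) from the Pydantic model.
parser = PydanticOutputParser(pydantic_object=ReviewAnalysis)

prompt = PromptTemplate(
    template="Analyze the following review.\n{format_instructions}\nReview: {review_text}",
    input_variables=["review_text"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

model = ChatGoogleGenerativeAI(model="gemini-1.5-flash")  # any chat model works here

# LCEL: prompt -> model -> parser, composed with the | operator.
chain = prompt | model | parser

def analyze_review(text: str) -> ReviewAnalysis:
    return chain.invoke({"review_text": text})
```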
Learning milestones:
- Successfully get a response from an LLM using LangChain → Master model instantiation.
- Create a dynamic prompt with `PromptTemplate` → Understand prompt management.
- Get structured JSON or a Pydantic object back from the LLM → Master Output Parsers.
- Build your first chain using LCEL → You understand the core composition syntax of modern LangChain.
Project 2: Document Q&A Bot (RAG)
- File: LEARN_LANGCHAIN_PROJECTS.md
- Main Programming Language: Python
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 2: Intermediate
- Knowledge Area: RAG: Document Loaders, Splitters, Embeddings, Vector Stores
- Software or Tool: LangChain, FAISS/ChromaDB, a PDF or text file
- Main Book: “Vector Databases” (Free O’Reilly Ebook)
What you’ll build: A command-line tool that “ingests” a document (like a PDF or a long text file) and allows you to ask questions about its content. The LLM’s answers will be grounded in the information from the document.
Why it teaches LangChain: This is the quintessential LangChain use case. It teaches the entire Retrieval-Augmented Generation (RAG) pipeline, which is the most effective way to make LLMs work with custom data. You’ll learn how to bridge the gap between your documents and the LLM’s reasoning capabilities.
Core challenges you’ll face:
- Loading a document from a source → maps to using a `DocumentLoader` (e.g., `PyPDFLoader`)
- Splitting the document into manageable chunks → maps to using a `TextSplitter` (e.g., `RecursiveCharacterTextSplitter`)
- Creating vector embeddings of the chunks → maps to using an `Embeddings` model
- Storing and retrieving chunks from a vector database → maps to using a `VectorStore` (like `FAISS`) and creating a `Retriever`
Key Concepts:
- Document Loaders & Text Splitters: LangChain Docs - “Indexes”
- Vector Stores & Embeddings: LangChain Docs - “Indexes”
- Retrievers: The interface for fetching documents.
- `RetrievalQA`: A built-in chain that simplifies the RAG process.
- Difficulty: Intermediate
- Time estimate: 1-2 weeks
- Prerequisites: Project 1, conceptual understanding of what vector embeddings are.
Real world outcome: An interactive Q&A session where the LLM correctly answers questions based only on the provided document.
(Console Output)
> Ingesting document 'deep_learning_paper.pdf'... Done.
> Ask a question about the document:
> What is a transformer architecture?
Based on the document, a transformer architecture is a novel network architecture based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. It consists of an encoder and a decoder...
> Ask a question about the document:
> What is the capital of France?
I'm sorry, but that information is not present in the provided document.
Implementation Hints:
- Ingestion (do this once):
  - Load the document with a `DocumentLoader`.
  - Split the loaded documents with a `TextSplitter`.
  - Instantiate an embeddings model (e.g., `GoogleGenerativeAIEmbeddings`).
  - Use the `FAISS.from_documents()` method to perform the embedding and storing in one step. Save the FAISS index to disk.
- Answering (do this for each query):
  - Load the FAISS index from disk.
  - Create a retriever from the vector store: `vectorstore.as_retriever()`.
  - Instantiate your LLM.
  - Use the `RetrievalQA` chain, providing the LLM and the retriever.
  - Run the chain with your question (both phases are sketched below).
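A minimal sketch of both phases, assuming `langchain`, `langchain-community`, `faiss-cpu`, `pypdf`, and `langchain-google-genai` are installed; file names and model IDs are placeholders.

```python
from langchain.chains import RetrievalQA
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import FAISS
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

# --- Ingestion (run once) ---
pages = PyPDFLoader("deep_learning_paper.pdf").load()
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(pages)
FAISS.from_documents(chunks, embeddings).save_local("faiss_index")

# --- Answering (run per query) ---
# Recent versions require an explicit opt-in to load a pickled local index.
vectorstore = FAISS.load_local(
    "faiss_index", embeddings, allow_dangerous_deserialization=True
)
qa = RetrievalQA.from_chain_type(
    llm=ChatGoogleGenerativeAI(model="gemini-1.5-flash"),
    retriever=vectorstore.as_retriever(),
)
print(qa.invoke({"query": "What is a transformer architecture?"})["result"])
```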
Learning milestones:
- Load and split a document into chunks → Understand data preparation for RAG.
- Create and save a vector store → Master the embedding and storage process.
- Ask a question and retrieve relevant chunks → You can see the retrieval part of RAG in action.
- Get a final, synthesized answer from the `RetrievalQA` chain → You’ve built a complete RAG pipeline.
Project 3: A Research Assistant Agent with Tools
- File: LEARN_LANGCHAIN_PROJECTS.md
- Main Programming Language: Python
- Coolness Level: Level 5: Pure Magic (Super Cool)
- Business Potential: 4. The “Open Core” Infrastructure
- Difficulty: Level 3: Advanced
- Knowledge Area: Agents, Tools, ReAct Framework
- Software or Tool: LangChain, DuckDuckGo Search library, Tavily
- Main Book: “Agents with LangChain” (LangChain Official Guide)
What you’ll build: An AI agent that can answer complex questions by deciding which tools to use. For example, to answer “Who is the current CEO of the company that makes the iPhone, and what is his age raised to the power of 0.5?”, it must first search the web, then use a calculator.
Why it teaches LangChain: This project represents the leap from pre-defined chains to autonomous reasoning. You’ll learn to give an LLM a “brain” and “hands.” The LLM acts as the brain, deciding what to do, and the tools are its hands, allowing it to interact with the outside world.
Core challenges you’ll face:
- Defining a set of tools for the agent → maps to giving the agent capabilities (e.g., a search tool, a calculator tool)
- Creating an agent “executor” → maps to the runtime that manages the think-act-observe loop
- Understanding the ReAct (Reason+Act) prompt → maps to seeing how the agent is prompted to “think out loud” about its plan
- Debugging the agent’s thought process → maps to interpreting the intermediate steps to see why it chose a certain tool
Key Concepts:
- Tools: The functions an agent can call. LangChain provides many pre-built tools.
- Agents: The reasoning engine. You’ll use a pre-built agent type like `create_react_agent`.
- AgentExecutor: The runtime environment that actually executes the agent’s decisions.
- Difficulty: Advanced
- Time estimate: 1-2 weeks
- Prerequisites: Projects 1 & 2.
Real world outcome: An interactive session with an agent that shows its work as it finds the answer.
(Console Output)
> Question: What is the hometown of the director of the movie 'Inception'?
> Entering new AgentExecutor chain...
Thought: I need to find out who directed the movie 'Inception'. Then I need to search for that person's hometown.
Action:
{
"action": "search",
"action_input": "who directed the movie Inception"
}
Observation: Christopher Nolan
Thought: Now that I know the director is Christopher Nolan, I need to find his hometown.
Action:
{
"action": "search",
"action_input": "Christopher Nolan hometown"
}
Observation: London, England
Thought: I now have the final answer.
Final Answer: The hometown of the director of 'Inception' is London, England.
> Finished chain.
Implementation Hints:
- Install a search tool library like `duckduckgo-search`.
- Import the `Tool` class and the search wrapper from `langchain_community`.
- Instantiate your tools, e.g., `search = DuckDuckGoSearchRun()`. Put them in a list.
- Create a prompt using `hub.pull("hwchase17/react")`. This pulls a pre-made ReAct prompt template.
- Create the agent by binding the tools to the LLM: `agent = create_react_agent(llm, tools, prompt)`.
- Create the executor: `agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)`. The `verbose=True` is crucial for seeing the thought process.
- Invoke the executor with your question (a full sketch follows below).
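A minimal sketch of that setup, assuming the `duckduckgo-search` and `langchainhub` packages are installed; the LLM is a placeholder you can swap.

```python
from langchain import hub
from langchain.agents import AgentExecutor, create_react_agent
from langchain_community.tools import DuckDuckGoSearchRun
from langchain_google_genai import ChatGoogleGenerativeAI

tools = [DuckDuckGoSearchRun()]        # the agent's only capability for now
prompt = hub.pull("hwchase17/react")   # community-maintained ReAct prompt
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")

agent = create_react_agent(llm, tools, prompt)
# verbose=True prints every Thought/Action/Observation step of the loop.
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

result = executor.invoke(
    {"input": "What is the hometown of the director of the movie 'Inception'?"}
)
print(result["output"])
```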
Learning milestones:
- Create an agent that can use a single tool (e.g., search) → Master the basic agent setup.
- Add a second tool (e.g., a calculator or another search tool) → Understand how the agent chooses between tools.
- Analyze the `verbose` output to understand the ReAct loop → You can “see” the agent thinking.
- Successfully answer a multi-step question that requires multiple tools → You have built an autonomous agent.
Project 4: Conversational Product Recommender
- File: LEARN_LANGCHAIN_PROJECTS.md
- Main Programming Language: Python
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 3: Advanced
- Knowledge Area: Memory
- Software or Tool: LangChain, `ConversationBufferMemory`
- Main Book: LangChain Docs - “Memory”
What you’ll build: A conversational chatbot that recommends products based on a user’s stated preferences. Unlike a simple Q&A bot, this one will remember previous turns of the conversation to inform its recommendations.
Why it teaches LangChain: LLMs are stateless. This project forces you to solve that problem by introducing Memory. You’ll learn how to store, retrieve, and incorporate conversational history into your prompts automatically, enabling true multi-turn dialogue.
Core challenges you’ll face:
- Choosing the right type of memory → maps to understanding the tradeoffs between `ConversationBufferMemory`, `ConversationSummaryMemory`, etc.
- Integrating memory into a chain → maps to using `ConversationChain` or manually adding memory to a prompt
- Managing the context window → maps to realizing that infinite memory isn’t feasible and understanding how summarizing memory works
- Prompting for conversational interaction → maps to modifying your system prompt to be more conversational
Key Concepts:
- Memory Types: `ConversationBufferMemory`, `ConversationSummaryBufferMemory`.
- `ConversationChain`: A high-level chain that has memory built in.
- Chat History: The object that stores the conversation and is passed to the prompt.
- Difficulty: Advanced
- Time estimate: 1-2 weeks
- Prerequisites: Project 1.
Real world outcome: A chatbot that remembers what you told it earlier.
(Console Output)
> AI: Hello! I'm your friendly gadget recommender. What are you looking for today?
> User: I need a new laptop
> AI: Great! Laptops are my specialty. Do you have any specific needs? For example, are you a gamer, a student, or a professional?
> User: I'm a student, so battery life is really important.
> AI: Understood. For students prioritizing battery life, I recommend looking at the MacBook Air M2, the Dell XPS 13, or the Lenovo Yoga 7i. They all offer excellent performance and over 10 hours of real-world use.
> User: which of those is the lightest?
> AI: Of the three laptops I mentioned, the MacBook Air M2 is the lightest, weighing just 2.7 pounds.
Notice how it understood that “those” in the last question refers to the laptops it had just recommended.
Implementation Hints:
- Instantiate your chat model as usual.
- Instantiate a memory object, e.g., `memory = ConversationBufferMemory()`.
- Use the high-level `ConversationChain`, passing it the `llm` and `memory` objects.
- Create a loop where you take user input and pass it to `chain.predict(input=user_input)`. The chain will automatically handle loading the history, adding the new input, and saving the new output (sketched below).
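The loop from the hints, as a minimal sketch using the classic `ConversationChain` + `ConversationBufferMemory` API; the gadget-recommender behavior would come from a customized prompt, which is omitted here.

```python
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")
chain = ConversationChain(llm=llm, memory=ConversationBufferMemory())

while True:
    user_input = input("User: ")
    if user_input.lower() in {"quit", "exit"}:
        break
    # predict() loads the saved history into the prompt, calls the LLM,
    # and appends the new turn to memory automatically.
    print("AI:", chain.predict(input=user_input))
```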
Learning milestones:
- Build a simple echo-bot with memory → You can see the history being saved.
- Implement a conversational Q&A bot → You’ve added state to a stateless system.
- Experiment with `ConversationSummaryBufferMemory` → You understand how to manage long conversations without exceeding the context limit.
- Successfully have a multi-turn conversation where the bot references prior information → You have mastered the fundamentals of conversational AI.
Project 5: Automated Trip Planner
- File: LEARN_LANGCHAIN_PROJECTS.md
- Main Programming Language: Python
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 3: Advanced
- Knowledge Area: Advanced Chains (Sequential, Router)
- Software or Tool: LangChain
- Main Book: LangChain Docs - “Chains”
What you’ll build: A tool that generates a travel itinerary. It will take a destination and duration as input, then generate a list of attractions. Based on the type of attraction, it will then suggest a specific restaurant nearby. This involves multiple, dependent steps.
Why it teaches LangChain: This project moves beyond single-purpose chains to complex workflows. You’ll learn how to pipe the output of one chain into the input of another (Sequential Chains) and how to use an LLM to dynamically decide which chain to run next (Router Chains).
Core challenges you’ll face:
- Breaking a complex task into a sequence of smaller LLM calls → maps to thinking in terms of a Directed Acyclic Graph (DAG) of operations
- Managing inputs and outputs between chains → maps to using `SimpleSequentialChain` or the more flexible `SequentialChain`
- Creating prompts that classify input for routing → maps to the “meta-prompting” needed for `RouterChain`
- Defining the different “destination” chains for a router → maps to modularizing your application’s logic
Key Concepts:
- `SimpleSequentialChain`: For a linear sequence where one chain’s output is the next’s input.
- `SequentialChain`: For more complex sequences with multiple inputs/outputs.
- `RouterChain`: Uses an LLM to choose one of several sub-chains to execute.
- `LLMChain`: The building block for all other chains.
- Difficulty: Advanced
- Time estimate: 1-2 weeks
- Prerequisites: Project 1.
Real world outcome: A script that produces a structured itinerary from a simple query.
(Console Output)
> Plan a one-day trip to Paris.
Okay, planning your trip!
**Generated Itinerary for Paris:**
1. **Morning (Historical Sight):** The Louvre Museum
* *Restaurant Suggestion:* Le Fumoir, a classic French bistro perfect for a post-museum lunch.
2. **Afternoon (Architectural Marvel):** Eiffel Tower
* *Restaurant Suggestion:* 58 Tour Eiffel, for a meal with a view right inside the tower itself.
3. **Evening (Art & Culture):** Montmartre District
* *Restaurant Suggestion:* La Mère Catherine, a traditional restaurant in the heart of Montmartre's artist square.
Implementation Hints:
- Attraction Chain: An `LLMChain` that takes a `destination` and `duration` and outputs a numbered list of attractions, each with a “type” (e.g., “Historical Sight”).
- Restaurant Chain: An `LLMChain` that takes an `attraction_name` and `attraction_type` and outputs a nearby restaurant suggestion.
- Combine these with `SequentialChain`. You will need to use custom transform functions to parse the output of the first chain to create the inputs for the second (see the sketch below).
- For the Router: First create several restaurant suggestion chains (e.g., `fine_dining_chain`, `casual_eats_chain`). Then create a `MultiPromptChain` router that uses an LLM to classify the attraction type and route to the appropriate restaurant chain.
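A minimal sketch of the sequential half, using the classic `LLMChain`/`SimpleSequentialChain` API; the prompts are illustrative, and the router variant is left as the exercise.

```python
from langchain.chains import LLMChain, SimpleSequentialChain
from langchain_core.prompts import PromptTemplate
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")

attraction_chain = LLMChain(
    llm=llm,
    prompt=PromptTemplate.from_template(
        "List three attractions for a one-day trip to {destination}, "
        "each labeled with a type such as 'Historical Sight'."
    ),
)

restaurant_chain = LLMChain(
    llm=llm,
    prompt=PromptTemplate.from_template(
        "For each attraction below, suggest one nearby restaurant:\n{attractions}"
    ),
)

# SimpleSequentialChain pipes the first chain's single output string
# directly into the second chain's single input.
itinerary_chain = SimpleSequentialChain(
    chains=[attraction_chain, restaurant_chain], verbose=True
)
print(itinerary_chain.run("Paris"))
```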
Learning milestones:
- Build a `SimpleSequentialChain` → Master the basic chaining concept.
- Build a more complex `SequentialChain` that requires output manipulation → Understand data flow in chains.
- Implement a `RouterChain` that classifies and routes user intent → Let an LLM control the application flow.
- Successfully generate a multi-step, context-aware itinerary → You can build complex workflows.
Project 6: RAG Bot with Citations
- File: LEARN_LANGCHAIN_PROJECTS.md
- Main Programming Language: Python
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 4. The “Open Core” Infrastructure
- Difficulty: Level 4: Expert
- Knowledge Area: Advanced RAG
- Software or Tool: LangChain, `ParentDocumentRetriever`
- Main Book: LangChain Docs - “Retrievers”
What you’ll build: An enhanced version of the Document Q&A bot from Project 2. This version will not only answer the question but also provide citations, pointing to the exact source document and page number it used to generate the answer.
Why it teaches LangChain: This is a crucial step for production-ready RAG. Users need to trust the AI’s output, and citations are the best way to do that. This project teaches you how to handle document metadata and modify the final “answer synthesis” step to include sources. It also introduces more advanced retrieval strategies.
Core challenges you’ll face:
- Preserving metadata during document splitting → maps to ensuring each chunk knows its original source (e.g., filename, page number)
- Using a more advanced retriever → maps to exploring `ParentDocumentRetriever` to get more contextually relevant chunks
- Modifying the final QA prompt → maps to instructing the LLM to provide sources along with its answer
- Creating a final chain that returns a structured answer object → maps to combining the answer and sources into a single Pydantic object
Key Concepts:
- `ParentDocumentRetriever`: A strategy that splits documents into small chunks for searching but retrieves larger, surrounding “parent” chunks for better context.
- Metadata: Attaching key-value data to documents and chunks.
- `create_stuff_documents_chain`: The chain responsible for stuffing the retrieved documents into the final prompt, which you can customize.
- Difficulty: Expert
- Time estimate: 2 weeks
- Prerequisites: Project 2.
Real world outcome: A trustworthy Q&A bot that backs up its claims.
(Console Output)
> Question: In the SEC filing, what were the main risk factors mentioned for Q4?
**Answer:** The main risk factors mentioned for Q4 were increased competition from emerging markets, potential supply chain disruptions due to new trade regulations, and the ongoing volatility in currency exchange rates.
**Sources:**
1. **Document:** `10-K_filing_2025.pdf`, Page 12, Section "Risk Factors"
2. **Document:** `10-K_filing_2025.pdf`, Page 14, Section "International Operations"
Implementation Hints:
- When loading documents (e.g., with `PyPDFLoader`), the `metadata` for each page document will automatically include the `source` filename and `page` number.
- Use the `ParentDocumentRetriever`. You will need a `docstore` to hold the parent documents and a `vectorstore` for the child chunks.
- Create a custom prompt for the final QA step. Add a sentence like: “You must include a list of sources used to formulate your answer. A source is the ‘source’ and ‘page’ from the metadata of the provided documents.”
- Use `create_retrieval_chain`, which is designed to pass through the retrieved documents, making them available for you to format and display as sources (sketched below).
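A minimal sketch of the citation flow, assuming the FAISS index from Project 2 (whose chunks carry `source` and `page` metadata from `PyPDFLoader`); prompt wording and file names are illustrative.

```python
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import ChatPromptTemplate
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings

embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
vectorstore = FAISS.load_local(
    "faiss_index", embeddings, allow_dangerous_deserialization=True
)

llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")
prompt = ChatPromptTemplate.from_template(
    "Answer using only the context below.\n\n{context}\n\nQuestion: {input}"
)

qa_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(vectorstore.as_retriever(), qa_chain)

result = rag_chain.invoke({"input": "What were the main risk factors for Q4?"})
print(result["answer"])
# create_retrieval_chain passes the retrieved documents through in "context",
# so citations can be rendered from metadata instead of trusted to the LLM.
for doc in result["context"]:
    print(f"- {doc.metadata.get('source')}, page {doc.metadata.get('page')}")
```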
Learning milestones:
- Ingest documents with metadata correctly preserved → You understand data provenance.
- Implement `ParentDocumentRetriever` → You’re using advanced retrieval strategies.
- Customize the final QA prompt to ask for sources → You can manipulate the final generation step.
- Return a structured object containing both the answer and a list of source documents → You can build enterprise-grade RAG systems.
Project 7: Natural Language to SQL Query Generator
- File: LEARN_LANGCHAIN_PROJECTS.md
- Main Programming Language: Python
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 4. The “Open Core” Infrastructure
- Difficulty: Level 4: Expert
- Knowledge Area: SQL Agents & Chains
- Software or Tool: LangChain, SQLAlchemy, a sample SQLite database (e.g., Chinook)
- Main Book: LangChain Docs - “SQL”
What you’ll build: A tool that connects to a SQL database and allows users to ask questions in plain English. The tool will convert the question to a SQL query, execute it, and return the answer in a human-readable format.
Why it teaches LangChain: A huge amount of the world’s data is in structured SQL databases. This project teaches you how to give an LLM the ability to interact with this data. You’ll learn about the specialized SQL agents and chains that inspect the database schema to write accurate queries.
Core challenges you’ll face:
- Connecting LangChain to a database → maps to creating a `SQLDatabase` object using SQLAlchemy
- Giving the LLM the database schema → maps to how the SQL toolkit automatically provides table and column info in the prompt
- Ensuring the LLM generates valid SQL for your dialect → maps to the power and limitations of Text-to-SQL
- Handling potential errors and interpreting results → maps to building a robust database-interacting agent
Key Concepts:
- `SQLDatabaseToolkit`: The LangChain wrapper around a database connection.
- `create_sql_agent`: A factory function for creating a powerful agent that can query databases.
- Text-to-SQL Prompting: The art of providing schema information and examples to an LLM so it can write correct queries.
- Difficulty: Expert
- Time estimate: 1-2 weeks
- Prerequisites: Basic SQL knowledge, Project 3.
Real world outcome: A powerful analytics tool that empowers non-technical users to query a database.
(Console Output)
> Enter your question about the Chinook database:
> Which 5 artists have the most albums?
> Entering new AgentExecutor chain...
Thought: The user wants to find the top 5 artists with the most albums. I need to join the 'artists' table with the 'albums' table, group by artist name, count the number of albums, and then order the results to get the top 5.
Action:
{
"action": "sql_db_query",
"action_input": "SELECT T1.Name, COUNT(T2.AlbumId) AS TotalAlbums FROM artists AS T1 JOIN albums AS T2 ON T1.ArtistId = T2.ArtistId GROUP BY T1.Name ORDER BY TotalAlbums DESC LIMIT 5"
}
Observation: [('Iron Maiden', 21), ('Led Zeppelin', 14), ('Deep Purple', 11), ('U2', 10), ('Metallica', 10)]
Thought: I have the result from the database. I can now formulate a final answer for the user.
Final Answer: The top 5 artists with the most albums are: Iron Maiden (21 albums), Led Zeppelin (14 albums), Deep Purple (11 albums), U2 (10 albums), and Metallica (10 albums).
> Finished chain.
Implementation Hints:
- Set up a sample SQLite database (the Chinook dataset is perfect for this).
- Install `sqlalchemy` and any other necessary database drivers.
- Create the `SQLDatabase` object: `db = SQLDatabase.from_uri("sqlite:///Chinook.db")`.
- Instantiate your LLM.
- Use the `create_sql_agent` function, passing it the `llm` and the `db` object. This will create an agent with a pre-configured set of database tools.
- Invoke the agent executor with a natural language question (sketched below).
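A minimal sketch of those hints, assuming the Chinook SQLite file sits next to the script; the LLM is again a placeholder.

```python
from langchain_community.agent_toolkits import create_sql_agent
from langchain_community.utilities import SQLDatabase
from langchain_google_genai import ChatGoogleGenerativeAI

db = SQLDatabase.from_uri("sqlite:///Chinook.db")
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")

# create_sql_agent wires up schema-inspection and query-execution tools
# automatically; verbose=True shows the generated SQL as it runs.
agent_executor = create_sql_agent(llm, db=db, verbose=True)

result = agent_executor.invoke({"input": "Which 5 artists have the most albums?"})
print(result["output"])
```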
Learning milestones:
- Successfully connect LangChain to a SQL database → You can bridge the gap between LLMs and structured data.
- Ask a simple question that queries a single table → You’ve got the basic Text-to-SQL working.
- Ask a complex question that requires a JOIN → You can see the LLM’s advanced reasoning capabilities.
- Observe the agent correcting a malformed query → You understand the iterative, self-healing nature of SQL agents.
Project 8: Movie Recommendation Graph Agent
- File: LEARN_LANGCHAIN_PROJECTS.md
- Main Programming Language: Python
- Coolness Level: Level 5: Pure Magic (Super Cool)
- Business Potential: 4. The “Open Core” Infrastructure
- Difficulty: Level 5: Master
- Knowledge Area: Graph Database Chains
- Software or Tool: LangChain, Neo4j (or other graph DB), `GraphCypherQAChain`
- Main Book: LangChain Docs - “Graph”
What you’ll build: An agent that provides movie recommendations by querying a graph database. You’ll model movies, actors, and directors as nodes and their relationships as edges. The agent will translate natural language questions into Cypher (the graph query language) to answer questions that are difficult for traditional databases, like “Find movies starring an actor who also directed a movie.”
Why it teaches LangChain: This project explores the frontier of data interaction. While RAG is great for unstructured text and SQL agents are great for tables, Graph chains are designed to leverage the relationships in your data, enabling more complex, multi-hop reasoning.
Core challenges you’ll face:
- Modeling data as a graph → maps to thinking in terms of nodes, relationships, and properties
- Connecting LangChain to a graph database → maps to using the `Neo4jGraph` or similar integration
- Understanding Text-to-Cypher → maps to how the LLM uses the graph schema to construct Cypher queries
- Leveraging the graph structure for novel queries → maps to asking questions that would be very slow or complex in SQL
Key Concepts:
- Graph Databases: Understanding nodes, edges, and properties (Neo4j fundamentals).
- Cypher Query Language: The SQL-like language for querying graphs.
- `GraphCypherQAChain`: The specialized chain for converting questions to Cypher and executing them against a graph.
- Graph Schema: The information provided to the LLM so it knows what types of nodes and relationships exist.
- Difficulty: Master
- Time estimate: 2 weeks
- Prerequisites: Project 7, willingness to learn basic Cypher.
Real world outcome: A chatbot that can answer nuanced questions about movies and their connections.
(Console Output)
> Question: Which movies did Tom Hanks act in that were also directed by Steven Spielberg?
> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (p:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(d:Person {name: 'Steven Spielberg'}) RETURN m.title
> Executing Cypher...
> Finished chain.
Answer: Saving Private Ryan, Catch Me If You Can, The Terminal, Bridge of Spies, The Post.
Implementation Hints:
- Set up a Neo4j instance (AuraDB offers a free tier).
- Populate it with some sample movie data. Create `(:Movie)` and `(:Person)` nodes and `[:ACTED_IN]` and `[:DIRECTED]` relationships.
- Instantiate the `Neo4jGraph` object in LangChain with your database credentials.
- Instantiate your LLM.
- Create a `GraphCypherQAChain` from the graph object and the LLM.
- Run the chain with your natural language question (sketched below). The chain will automatically infer the schema, generate Cypher, execute it, and synthesize the answer.
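A minimal sketch, assuming a reachable Neo4j instance; the URI and credentials are placeholders. Note that recent versions require an explicit opt-in flag because the chain executes model-generated Cypher.

```python
from langchain.chains import GraphCypherQAChain
from langchain_community.graphs import Neo4jGraph
from langchain_google_genai import ChatGoogleGenerativeAI

graph = Neo4jGraph(
    url="bolt://localhost:7687", username="neo4j", password="password"
)
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash")

# The chain reads the graph schema, writes Cypher, runs it against Neo4j,
# and synthesizes a natural-language answer from the returned rows.
chain = GraphCypherQAChain.from_llm(
    llm, graph=graph, verbose=True, allow_dangerous_requests=True
)

result = chain.invoke(
    {"query": "Which movies did Tom Hanks act in that Steven Spielberg directed?"}
)
print(result["result"])
```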
Learning milestones:
- Connect LangChain to a running graph database → You’ve integrated another type of data source.
- Successfully answer a single-hop question (“Who directed ‘Forrest Gump’?”) → You’ve got Text-to-Cypher working.
- Successfully answer a multi-hop question (“Who acted in a movie with Tom Hanks?”) → You are leveraging the power of the graph.
- Inspect the generated Cypher to understand how the LLM “thinks” in graphs → You have mastered graph-based LLM interaction.
Final Overall Project (Project 9): A GitHub Repo Analysis Agent
- File: LEARN_LANGCHAIN_PROJECTS.md
- Main Programming Language: Python
- Coolness Level: Level 5: Pure Magic (Super Cool)
- Business Potential: 5. The “Industry Disruptor”
- Difficulty: Level 5: Master
- Knowledge Area: Advanced Agents, Custom Tools, RAG-as-a-Tool
- Software or Tool: LangChain, GitPython library
What you’ll build: A sophisticated agent that can analyze a GitHub repository. You’ll equip it with custom tools to `list_files` and `read_file`, plus a powerful RAG-based `answer_about_codebase` tool. This allows it to answer both specific questions (“What’s in the Dockerfile?”) and high-level questions (“What is the overall architecture of this project?”).
Why it’s the capstone project: It combines all the concepts you’ve learned. It uses an Agent as the main entry point, which uses Custom Tools you define. One of those tools is itself a full RAG Chain, demonstrating the powerful, composable nature of LangChain. This is a real, non-trivial application that showcases a deep understanding of the framework.
Core challenges you’ll face:
- Creating custom tools from Python functions → maps to using the `@tool` decorator to expose your own code to the agent (see the sketch after this list)
- Designing a “RAG tool” → maps to wrapping your entire RAG chain from Project 2 into a single tool that the agent can decide to call
- Prompting the agent effectively → maps to writing a system prompt that helps the agent understand its tools and when to use them
- Orchestrating complex reasoning → maps to the agent deciding whether to read a single file for a specific answer or use the RAG tool for a general one
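A minimal sketch of the custom tools, assuming the repository has already been cloned locally (e.g., with GitPython) and that `rag_chain` is the retrieval chain from Project 2, rebuilt over the repo’s source files; all names here are illustrative.

```python
from pathlib import Path

from langchain_core.tools import tool

REPO_ROOT = Path("./cloned_repo")  # hypothetical local clone of the repo

@tool
def list_files(subdir: str = ".") -> str:
    """List all files under a subdirectory of the repository."""
    return "\n".join(
        str(p.relative_to(REPO_ROOT))
        for p in (REPO_ROOT / subdir).rglob("*")
        if p.is_file()
    )

@tool
def read_file(path: str) -> str:
    """Return the contents of one file in the repository."""
    return (REPO_ROOT / path).read_text()

@tool
def answer_about_codebase(question: str) -> str:
    """Answer a high-level question about the codebase using RAG."""
    # rag_chain is assumed to be a create_retrieval_chain built over the
    # repo's source files, exactly as in Project 2.
    return rag_chain.invoke({"input": question})["answer"]

# These tools plug into an AgentExecutor exactly as in Project 3.
tools = [list_files, read_file, answer_about_codebase]
```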
Real world outcome: An AI assistant that you can point at a GitHub repo and ask complex questions to, accelerating your ability to understand new code.
(Console Output)
> Analyzing repo 'https://github.com/langchain-ai/langchain'...
> Ask a question: How does LangChain handle streaming responses?
> Entering new AgentExecutor chain...
Thought: The user is asking a high-level conceptual question about the codebase. Reading a single file is unlikely to give me the full picture. I should use my tool for answering general questions about the codebase.
Action:
{
"action": "answer_about_codebase",
"action_input": "How does LangChain handle streaming responses?"
}
Observation: LangChain handles streaming by using iterators and generators. The `stream` method on chains and models returns an iterator that yields response chunks as they are generated by the LLM. This is often implemented using Server-Sent Events (SSE) or by parsing the raw streaming HTTP response. Key classes involved are `StreamingStdOutCallbackHandler`...
Thought: I have a comprehensive answer from my RAG tool. I can now provide the final answer.
Final Answer: LangChain handles streaming responses primarily through methods like `.stream()` available on its core components (Chains, LLMs). This method returns an iterator that yields chunks of the response as they become available from the model. This allows for real-time display of text in applications. For more detailed implementation, you can look into callback handlers like `StreamingStdOutCallbackHandler`.
> Finished chain.
Learning milestones:
- Build and test a custom `list_files` tool → You can expose simple functions as tools.
- Build a RAG-as-a-Tool and integrate it into the agent’s tool list → You’ve mastered composition.
- Write a master prompt that guides the agent on when to use which tool → You understand how to steer agentic behavior.
- Successfully have the agent answer a high-level question using the RAG tool and a specific question using the `read_file` tool → You have built a truly intelligent and useful AI application.
Summary of Projects
| Project | Main Language | Key LangChain Concept |
|---|---|---|
| 1. Structured Data Extractor | Python | Prompts, Models, Output Parsers |
| 2. Document Q&A Bot | Python | RAG (Indexes, Retrievers) |
| 3. Research Assistant Agent | Python | Agents & Tools |
| 4. Conversational Product Recommender | Python | Memory |
| 5. Automated Trip Planner | Python | Advanced Chains (Sequential, Router) |
| 6. RAG Bot with Citations | Python | Advanced RAG |
| 7. Natural Language to SQL Query | Python | SQL Agents & Chains |
| 8. Movie Recommendation Graph Agent | Python | Graph Chains |
| Final Project (9): GitHub Repo Analysis Agent | Python | Advanced Agents, Custom Tools, RAG-as-a-Tool |