LangChain is the most widely used Python framework for building LLM-powered applications. It provides the abstractions, integrations, and tools needed to build everything from a simple Q&A chatbot to a complex multi-agent system that can research, analyze, and generate reports autonomously. As of 2025, LangChain has over 95,000 GitHub stars and is used by thousands of production applications.
This guide covers the core concepts you need to be productive with LangChain, with real code you can copy and run. We'll build up to a complete RAG (Retrieval Augmented Generation) system — the most practically valuable architecture for most LLM applications.
What Is LangChain?
LangChain is a framework that simplifies the development of LLM applications by providing:
Composable Chains
String together LLM calls, data transformations, and tool calls using LCEL (LangChain Expression Language) — a clean, readable pipe syntax.
Agent Framework
Build autonomous agents that can use tools (search, code execution, APIs) and reason step-by-step using the ReAct pattern.
300+ Integrations
Native connectors for every major LLM provider, vector store, document loader, and tool — so you're not writing boilerplate HTTP calls.
Installation & Setup
# Install core packages
pip install langchain langchain-openai langchain-community chromadb
# Extras used later in this guide: agents (LangGraph), PDF loading, .env support, web search tool
pip install langgraph pypdf python-dotenv duckduckgo-search
# Set your API key
import os
os.environ["OPENAI_API_KEY"] = "your-key-here"
# Or set it in a .env file and use: from dotenv import load_dotenv; load_dotenv()
The LangChain package ecosystem is split into modular sub-packages as of v0.2+: langchain-core (base abstractions such as prompts, messages, and runnables), langchain (chains and agents), langchain-openai (OpenAI-specific), langchain-anthropic (Claude), and langchain-community (community-maintained integrations). Install only what you need to keep your dependency footprint small.
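For example, a project that only talks to Claude can get by with just the provider package and the core abstractions it depends on. A minimal sketch; adjust to the providers you actually use:

# Minimal footprint for a Claude-only project
pip install langchain-core langchain-anthropic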
LCEL: LangChain Expression Language
LCEL is LangChain's modern chain-composition syntax built on Python's pipe (|) operator. It's the recommended way to build chains and produces cleaner code than the legacy chain classes (such as LLMChain).
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
# Initialize the model
llm = ChatOpenAI(model="gpt-4o", temperature=0.7)
# Create a prompt template
prompt = ChatPromptTemplate.from_messages([
("system", "You are a helpful assistant that explains technical concepts simply."),
("human", "Explain {concept} as if I'm 12 years old. Keep it under 100 words.")
])
# Build the chain using LCEL pipe syntax
chain = prompt | llm | StrOutputParser()
# Invoke the chain
result = chain.invoke({"concept": "neural networks"})
print(result)
The pipe operator creates a Runnable — a composable unit that can be invoked, batched, or streamed. Every component (prompt, model, parser) implements the same interface, making them trivially composable. You can also stream the output character by character:
# Streaming output (great for chat interfaces)
for chunk in chain.stream({"concept": "vector databases"}):
print(chunk, end="", flush=True)
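Because every Runnable exposes the same interface, the same chain can also process several inputs at once with .batch(), which runs them concurrently under the hood. A quick sketch reusing the chain defined above:

# Process multiple inputs in one call
results = chain.batch([
    {"concept": "neural networks"},
    {"concept": "vector databases"},
    {"concept": "embeddings"},
])
print(len(results))  # three independent answers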
Memory: Maintaining Conversation State
By default, LLMs are stateless — each call is independent. LangChain's memory modules add conversation history. The most common pattern in 2025 uses RunnableWithMessageHistory:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.output_parsers import StrOutputParser  # used by the chain below
llm = ChatOpenAI(model="gpt-4o")
prompt = ChatPromptTemplate.from_messages([
("system", "You are a helpful coding assistant."),
MessagesPlaceholder(variable_name="history"), # injects conversation history
("human", "{input}")
])
chain = prompt | llm | StrOutputParser()
# Store histories per session
session_store = {}
def get_session_history(session_id: str):
if session_id not in session_store:
session_store[session_id] = ChatMessageHistory()
return session_store[session_id]
# Wrap chain with memory
chain_with_history = RunnableWithMessageHistory(
chain,
get_session_history,
input_messages_key="input",
history_messages_key="history"
)
# Use the same session_id to maintain context
config = {"configurable": {"session_id": "user-123"}}
response1 = chain_with_history.invoke({"input": "What is a decorator in Python?"}, config=config)
response2 = chain_with_history.invoke({"input": "Can you show me an example?"}, config=config)
# response2 understands "example" refers to the decorator from response1
For production, replace ChatMessageHistory (in-memory, lost on restart) with a persistent backend: RedisChatMessageHistory, DynamoDBChatMessageHistory, or PostgresChatMessageHistory.
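As a sketch of what that swap looks like (assuming a Redis instance reachable at localhost:6379), only the history factory changes; the rest of the chain stays the same:

# Persistent, multi-worker-safe history backed by Redis (requires: pip install redis)
from langchain_community.chat_message_histories import RedisChatMessageHistory

def get_session_history(session_id: str):
    return RedisChatMessageHistory(session_id=session_id, url="redis://localhost:6379/0")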
Building a RAG System
Retrieval Augmented Generation (RAG) is the most valuable LangChain pattern for real applications. Instead of relying solely on the LLM's training data, RAG fetches relevant documents from your own data store and includes them in the context before generation. This enables accurate, cited answers over private documents, up-to-date information, and knowledge bases too large for a single context window.
📥 Document Loading & Chunking
Load your documents and split them into chunks small enough to fit in context. The chunk size / overlap balance matters: larger chunks preserve more context, smaller chunks enable more precise retrieval.
from langchain_community.document_loaders import PyPDFLoader, TextLoader, WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
# Load a PDF
loader = PyPDFLoader("company_handbook.pdf")
documents = loader.load()
# Load from URL
web_loader = WebBaseLoader("https://docs.example.com/api-reference")
web_docs = web_loader.load()
# Split into chunks
splitter = RecursiveCharacterTextSplitter(
chunk_size=1000, # characters per chunk
chunk_overlap=200, # overlap to preserve context across chunks
length_function=len,
separators=["\n\n", "\n", " ", ""] # split on paragraphs first, then lines, then words
)
chunks = splitter.split_documents(documents)
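Before indexing, it's worth a quick sanity check on the split: chunk counts and average size tell you whether chunk_size needs tuning. An illustrative snippet using the chunks produced above:

# Inspect the split before spending money on embeddings
print(f"{len(documents)} pages -> {len(chunks)} chunks")
avg_len = sum(len(c.page_content) for c in chunks) // len(chunks)
print(f"Average chunk length: {avg_len} characters")
print(chunks[0].metadata)             # source path and page number carried over from the loader
print(chunks[0].page_content[:200])   # eyeball the first chunk for sensible boundaries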
🗄️ Vector Store Indexing
Convert document chunks to embeddings (dense numerical vectors) and store them in a vector database. Embeddings capture semantic meaning — chunks about similar topics end up near each other in vector space, enabling semantic search.
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
# Create embeddings using OpenAI's text-embedding-3-small (cost-effective)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
# Create and persist a Chroma vector store
vectorstore = Chroma.from_documents(
documents=chunks,
embedding=embeddings,
persist_directory="./chroma_db" # persist to disk
)
# For subsequent runs, load the existing store:
# vectorstore = Chroma(persist_directory="./chroma_db", embedding_function=embeddings)
# Create a retriever — finds the top-k most semantically similar chunks
retriever = vectorstore.as_retriever(
search_type="mmr", # Maximum Marginal Relevance: balances similarity + diversity
search_kwargs={"k": 5} # return 5 most relevant chunks
)
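To make "semantic similarity" concrete, you can embed a few strings yourself and compare them. This is purely an illustration (cosine similarity computed by hand, not something a real pipeline needs):

# Illustration: semantically related texts get nearby vectors
import math

v1 = embeddings.embed_query("How many vacation days do employees get?")
v2 = embeddings.embed_query("What is the annual leave allowance?")
v3 = embeddings.embed_query("How do I restart the staging server?")

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

print(cosine(v1, v2))  # high — both are about time off
print(cosine(v1, v3))  # noticeably lower — unrelated topic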
🔗 RAG Chain Assembly
Connect the retriever to the LLM using LCEL. The chain retrieves relevant chunks, formats them into a prompt with context, and generates an answer grounded in your documents.
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
llm = ChatOpenAI(model="gpt-4o", temperature=0)
rag_prompt = ChatPromptTemplate.from_messages([
("system", """You are a helpful assistant. Answer questions based ONLY on the provided context.
If the answer isn't in the context, say 'I don't have information about that in the provided documents.'
Always cite which part of the context your answer comes from.
Context:
{context}"""),
("human", "{question}")
])
def format_docs(docs):
return "\n\n---\n\n".join(
f"Source: {doc.metadata.get('source', 'Unknown')}, Page: {doc.metadata.get('page', 'N/A')}\n{doc.page_content}"
for doc in docs
)
# Full RAG chain
rag_chain = (
{"context": retriever | format_docs, "question": RunnablePassthrough()}
| rag_prompt
| llm
| StrOutputParser()
)
# Query your documents
answer = rag_chain.invoke("What is the company's remote work policy?")
print(answer)
📊 Evaluation & Improvement
A RAG system is only as good as its retrieval quality. Use LangSmith or a simple manual evaluation loop to measure retrieval precision and answer faithfulness — this catches issues like chunks being too small/large or the wrong embedding model being used.
# Simple retrieval quality test
test_questions = [
"What is the vacation policy?",
"How do I expense travel?",
"Who approves performance reviews?"
]
for question in test_questions:
retrieved_docs = retriever.invoke(question)
print(f"\nQuestion: {question}")
print(f"Top retrieved chunk: {retrieved_docs[0].page_content[:200]}...")
print(f"Total chunks retrieved: {len(retrieved_docs)}")
# Manually verify: does the top chunk actually contain the answer?
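Retrieval precision is only half the picture; you can also spot-check answer faithfulness with a simple LLM-as-judge pass. This is a rough sketch, not a standard LangChain API, and the grading prompt is an assumption you should tune:

# Rough faithfulness check: is the generated answer actually supported by the retrieved context?
grader_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a strict grader. Reply with exactly 'SUPPORTED' or 'UNSUPPORTED'."),
    ("human", "Context:\n{context}\n\nAnswer:\n{answer}\n\nIs every claim in the answer supported by the context?")
])
grader_chain = grader_prompt | llm | StrOutputParser()

question = "What is the vacation policy?"
docs = retriever.invoke(question)
answer = rag_chain.invoke(question)
verdict = grader_chain.invoke({"context": format_docs(docs), "answer": answer})
print(verdict)  # manually review anything marked UNSUPPORTED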
LangChain Agents
LangChain agents use an LLM to decide which tools to use and in what order, dynamically adapting to the task. The new recommended approach uses create_react_agent from LangGraph:
from langchain_openai import ChatOpenAI
from langchain_community.tools import DuckDuckGoSearchRun
from langchain_core.tools import tool
from langgraph.prebuilt import create_react_agent
# Define tools
search = DuckDuckGoSearchRun()
@tool
def calculate(expression: str) -> str:
"""Evaluate a mathematical expression. Input should be a valid Python math expression."""
try:
        return str(eval(expression))  # fine for a demo; never eval untrusted input in production
except Exception as e:
return f"Error: {e}"
tools = [search, calculate]
# Create the agent
llm = ChatOpenAI(model="gpt-4o", temperature=0)
agent = create_react_agent(llm, tools)
# Run the agent
response = agent.invoke({
"messages": [("human", "What is the current price of NVIDIA stock, and what percentage change is that from $100?")]
})
print(response["messages"][-1].content)
The agent will autonomously call the search tool to find the current price, then call the calculate tool to compute the percentage — all without you specifying the order of operations.
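If you want to watch that reasoning unfold, the compiled agent can be streamed step by step. A sketch using LangGraph's stream_mode="values", which yields the full message list after every step:

# Print each intermediate message (tool calls and tool results) as the agent works
for state in agent.stream(
    {"messages": [("human", "What is 15% of 2048?")]},
    stream_mode="values",
):
    state["messages"][-1].pretty_print()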
LangChain vs LlamaIndex vs Haystack
| Feature | LangChain | LlamaIndex | Haystack |
|---|---|---|---|
| Primary strength | Agent workflows, chains, general-purpose | Data indexing, RAG | Enterprise search pipelines |
| RAG support | ✅ Excellent | ✅ Best-in-class | ✅ Good |
| Agent framework | ✅ Excellent (LangGraph) | ✅ Good | ⚠️ Limited |
| Observability | ✅ LangSmith | ✅ LlamaTrace | ⚠️ Tracing integrations (OpenTelemetry) |
| Learning curve | Medium (rapid API changes historically) | Low-Medium | High |
| Community & docs | ⭐⭐⭐⭐⭐ Largest | ⭐⭐⭐⭐ Large | ⭐⭐⭐ Good |
| Best for | Most LLM applications | Document Q&A, knowledge bases | Enterprise search, pipelines |
When to use LlamaIndex instead: If your primary use case is building a Q&A system over documents or a knowledge base, LlamaIndex's data indexing abstractions and query engines are more purpose-built and often easier to get right than LangChain's more general-purpose approach. The two frameworks are not mutually exclusive — you can use LlamaIndex for retrieval and LangChain for chain/agent orchestration.
Production Deployment Tips
- Use LangSmith for observability: Sign up free at smith.langchain.com. Set `LANGCHAIN_TRACING_V2=true` and `LANGCHAIN_API_KEY` in your environment — every chain invocation is automatically logged with full input/output traces, latency, and token usage. Essential for debugging production issues.
- Cache LLM responses: Use `InMemoryCache` (development) or `RedisCache` (production) to cache identical LLM calls; semantic caching (via `GPTCache`) can cache semantically similar questions. Reduces latency and API costs dramatically for high-traffic Q&A applications (see the sketch after this list).
- Stream responses to users: Always use `chain.astream()` in async web frameworks (FastAPI, Django async) for chat interfaces. Users experience first-token latency instead of waiting for the full response — dramatically better UX.
- Separate indexing from retrieval: Run document ingestion (loading, chunking, embedding, storing) as an offline batch job, not inline with user requests. Indexing is slow and expensive; retrieval should be fast (<200ms). Use a dedicated ingestion pipeline that updates the vector store independently.
- Pin your LangChain version: LangChain has historically made breaking changes between minor versions. Always pin exact versions in your requirements.txt (`langchain==0.2.16`) and test before upgrading in production.
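A minimal sketch of response caching with the in-memory variant (swap in RedisCache with a Redis URL for production):

# Cache identical LLM calls process-wide
from langchain.globals import set_llm_cache
from langchain_community.cache import InMemoryCache

set_llm_cache(InMemoryCache())

chain.invoke({"concept": "neural networks"})  # first call hits the API
chain.invoke({"concept": "neural networks"})  # identical call is served from the cache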
Frequently Asked Questions
EnsembleRetriever; (3) Re-ranking — use Cohere's Rerank API to re-score retrieved chunks before passing to the LLM; (4) Query expansion — use an LLM to generate multiple versions of the user's question and retrieve for each; (5) Metadata filtering — add filters to restrict retrieval to relevant document sections.