
LangChain RAG Chains: Document Loaders, Text Splitters, and Retrieval QA

Build end-to-end Retrieval Augmented Generation pipelines with LangChain — covering document loaders, text splitting strategies, vector stores, retrievers, and RAG chain composition.

What Is RAG and Why LangChain for It

Retrieval Augmented Generation (RAG) combines a retrieval step with LLM generation. Instead of relying solely on the model's training data, RAG fetches relevant documents from your own data source and includes them as context in the prompt. This lets the model answer questions about your specific documents, databases, or knowledge bases.

LangChain provides the full RAG pipeline as composable components: document loaders to ingest data, text splitters to chunk it, embedding models and vector stores to index it, retrievers to search it, and chain composition to wire it all together.

Step 1: Loading Documents

LangChain ships with loaders for dozens of formats — PDF, HTML, CSV, Markdown, databases, APIs, and more.

from langchain_community.document_loaders import (
    TextLoader,
    PyPDFLoader,
    CSVLoader,
    WebBaseLoader,
)

# Load a text file
text_docs = TextLoader("knowledge_base.txt").load()

# Load a PDF (one document per page)
pdf_docs = PyPDFLoader("annual_report.pdf").load()

# Load from a web page
web_docs = WebBaseLoader("https://docs.example.com/guide").load()

# Each returns a list of Document objects
print(text_docs[0].page_content[:200])
print(text_docs[0].metadata)  # {"source": "knowledge_base.txt"}

Every loader returns Document objects with page_content (the text) and metadata (source, page number, etc.). Metadata flows through the entire pipeline, so your final answers can cite sources.
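
You can also construct Document objects by hand, which is handy when your data comes from a source without a loader or when you want to attach custom metadata for later filtering and citation. A minimal sketch; the metadata keys beyond source are illustrative:

from langchain_core.documents import Document

manual_doc = Document(
    page_content="Q3 revenue grew 12% year over year.",
    metadata={"source": "finance_wiki", "section": "earnings"},  # "section" is an illustrative key
)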

Step 2: Splitting Text into Chunks

Documents are often too large to fit in a single prompt. Text splitters break them into manageable chunks while preserving semantic coherence.

from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,       # Max characters per chunk
    chunk_overlap=200,     # Overlap between consecutive chunks
    separators=["\n\n", "\n", ". ", " ", ""],
)

chunks = splitter.split_documents(pdf_docs)
print(f"Split {len(pdf_docs)} pages into {len(chunks)} chunks")

RecursiveCharacterTextSplitter is the recommended default. It tries to split on paragraph boundaries first, then sentences, then words, ensuring chunks are as semantically coherent as possible. The overlap ensures that information spanning a boundary appears in at least one chunk.
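
To see the overlap directly, split a raw string with split_text() and compare the tail of one chunk with the head of the next. A quick sketch with toy sizes:

from langchain_text_splitters import RecursiveCharacterTextSplitter

demo_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=20)
pieces = demo_splitter.split_text("LangChain composes RAG pipelines. " * 10)  # returns list[str], not Documents

# Consecutive chunks can share up to 20 characters
print(pieces[0][-20:])
print(pieces[1][:20])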

For source code, use RecursiveCharacterTextSplitter.from_language(), which splits on language-aware boundaries such as class and function definitions:

from langchain_text_splitters import RecursiveCharacterTextSplitter, Language

python_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON,
    chunk_size=1000,
    chunk_overlap=100,
)

Step 3: Embedding and Indexing

Chunks are converted to vectors using an embedding model and stored in a vector store for similarity search.

from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Create vector store from chunks
vectorstore = FAISS.from_documents(chunks, embeddings)

# Or use Chroma for persistence
from langchain_chroma import Chroma

vectorstore = Chroma.from_documents(
    chunks,
    embeddings,
    persist_directory="./chroma_db",
)

FAISS is fast and in-memory. Chroma persists to disk. For production, consider Pinecone, Weaviate, or pgvector for PostgreSQL.
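
A FAISS index can still be snapshotted to disk if you save it explicitly. A sketch, assuming the chunks and embeddings objects from above (load_local requires an opt-in flag because the docstore is pickled):

faiss_store = FAISS.from_documents(chunks, embeddings)
faiss_store.save_local("faiss_index")

# Reload later with the same embedding model
restored = FAISS.load_local(
    "faiss_index",
    embeddings,
    allow_dangerous_deserialization=True,  # opt-in: the docstore is a pickle file
)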

Step 4: Building the Retriever

A retriever wraps the vector store and returns the most relevant documents for a query.

retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 4},  # Return top 4 chunks
)

# Test the retriever
docs = retriever.invoke("What were Q3 revenue numbers?")
for doc in docs:
    print(doc.page_content[:100])
    print(doc.metadata)
    print("---")

You can also use search_type="mmr" (Maximal Marginal Relevance) to get diverse results rather than just the closest matches.
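
A sketch of an MMR retriever: fetch_k controls how many candidates are pulled before diversification, and lambda_mult trades relevance against diversity:

mmr_retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={
        "k": 4,              # final number of chunks returned
        "fetch_k": 20,       # candidates considered before diversification
        "lambda_mult": 0.5,  # 1.0 = pure relevance, 0.0 = maximum diversity
    },
)
docs = mmr_retriever.invoke("What were Q3 revenue numbers?")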

Step 5: Composing the RAG Chain

Now connect everything into an LCEL chain that retrieves context and generates answers.

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# Format retrieved documents into a single string
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

prompt = ChatPromptTemplate.from_template(
    """Answer the question based on the following context.
If the context does not contain enough information, say so.

Context:
{context}

Question: {question}

Answer:"""
)

rag_chain = (
    {
        "context": retriever | format_docs,
        "question": RunnablePassthrough(),
    }
    | prompt
    | ChatOpenAI(model="gpt-4o", temperature=0)
    | StrOutputParser()
)

answer = rag_chain.invoke("What were the key highlights from Q3?")
print(answer)

The dictionary step runs the retriever and passthrough in parallel. Retrieved documents are formatted into a string, while the original question is forwarded. Both feed into the prompt template.
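
Because every component is an LCEL runnable, the chain also streams out of the box; tokens print as the model generates them:

# Stream the answer token by token instead of waiting for the full string
for token in rag_chain.stream("What were the key highlights from Q3?"):
    print(token, end="", flush=True)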

Adding Source Citations

To return sources alongside the answer, wrap the chain in a RunnableParallel that produces both. Note that this simple pattern runs the retriever twice (once inside rag_chain and once for the sources branch), which is fine for a demo but worth restructuring if retrieval is slow or billed per query.

from langchain_core.runnables import RunnableParallel

rag_with_sources = RunnableParallel(
    answer=rag_chain,
    sources=retriever | (lambda docs: [d.metadata["source"] for d in docs]),
)

result = rag_with_sources.invoke("What were Q3 revenue numbers?")
print(result["answer"])
print("Sources:", result["sources"])

FAQ

How do I choose the right chunk size?

Start with 1000 characters and 200 overlap. Smaller chunks (500 characters) improve retrieval precision but may lose context. Larger chunks (2000 characters) retain more context but may dilute relevance. Test with your actual queries and documents, measuring retrieval quality.
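
One lightweight way to compare settings is to rebuild the index at each candidate size and spot-check what the retriever returns for a few representative queries. A rough sketch, assuming the pdf_docs and embeddings objects from earlier:

for size in (500, 1000, 2000):
    candidate_splitter = RecursiveCharacterTextSplitter(chunk_size=size, chunk_overlap=size // 5)
    candidate_chunks = candidate_splitter.split_documents(pdf_docs)
    store = FAISS.from_documents(candidate_chunks, embeddings)
    hits = store.as_retriever(search_kwargs={"k": 4}).invoke("What were Q3 revenue numbers?")
    print(size, [h.page_content[:60] for h in hits])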

Can I use RAG with local models instead of OpenAI?

Yes. Replace ChatOpenAI with any LangChain model wrapper — ChatOllama for local Ollama models, for example. For embeddings, use HuggingFaceEmbeddings or OllamaEmbeddings to keep everything local.
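
A sketch of the fully local variant, assuming an Ollama server is running and the llama3 and nomic-embed-text models have been pulled (both model names are examples, not requirements):

from langchain_ollama import ChatOllama, OllamaEmbeddings

# Both pieces talk to a local Ollama server; no data leaves your machine
local_llm = ChatOllama(model="llama3", temperature=0)
local_embeddings = OllamaEmbeddings(model="nomic-embed-text")

vectorstore = FAISS.from_documents(chunks, local_embeddings)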

How do I update the vector store when my documents change?

Most vector stores support add_documents() to add new content. For updates, delete the old documents by ID and add the new versions. Chroma and Pinecone support upsert operations. For bulk reindexing, rebuild the vector store from scratch.
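
A sketch of the delete-then-add pattern with Chroma, assuming you assigned stable IDs at indexing time (the IDs and updated_chunk below are illustrative):

# Index with stable IDs so individual documents can be targeted later
vectorstore = Chroma.from_documents(
    chunks,
    embeddings,
    ids=[f"doc-{i}" for i in range(len(chunks))],
)

# Replace an outdated chunk: delete by ID, then add the new version under the same ID
vectorstore.delete(ids=["doc-3"])
vectorstore.add_documents([updated_chunk], ids=["doc-3"])  # updated_chunk is a revised Document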


#LangChain #RAG #VectorStore #DocumentLoading #Python #AgenticAI #LearnAI #AIEngineering
