---
title: "LangChain RAG Chains: Document Loaders, Text Splitters, and Retrieval QA"
description: "Build end-to-end Retrieval Augmented Generation pipelines with LangChain — covering document loaders, text splitting strategies, vector stores, retrievers, and RAG chain composition."
canonical: https://callsphere.ai/blog/langchain-rag-chains-document-loaders-text-splitters-retrieval-qa
category: "Learn Agentic AI"
tags: ["LangChain", "RAG", "Vector Store", "Document Loading", "Python"]
author: "CallSphere Team"
published: 2026-03-17T00:00:00.000Z
updated: 2026-05-08T12:24:13.301Z
---

# LangChain RAG Chains: Document Loaders, Text Splitters, and Retrieval QA

> Build end-to-end Retrieval Augmented Generation pipelines with LangChain — covering document loaders, text splitting strategies, vector stores, retrievers, and RAG chain composition.

## What Is RAG and Why LangChain for It

Retrieval Augmented Generation (RAG) combines a retrieval step with LLM generation. Instead of relying solely on the model's training data, RAG fetches relevant documents from your own data source and includes them as context in the prompt. This lets the model answer questions about your specific documents, databases, or knowledge bases.

LangChain provides the full RAG pipeline as composable components: document loaders to ingest data, text splitters to chunk it, embedding models and vector stores to index it, retrievers to search it, and chain composition to wire it all together.

## Step 1: Loading Documents

LangChain ships with loaders for dozens of formats — PDF, HTML, CSV, Markdown, databases, APIs, and more.

```mermaid
flowchart LR
    Q(["User query"])
    EMB["Embed query
text-embedding-3"]
    VEC[("Vector DB
pgvector or Pinecone")]
    RET["Top-k retrieval
k = 8"]
    PROMPT["Augmented prompt
system plus context"]
    LLM["LLM generation
Claude or GPT"]
    CITE["Inline citations
and page anchors"]
    OUT(["Grounded answer"])
    Q --> EMB --> VEC --> RET --> PROMPT --> LLM --> CITE --> OUT
    style EMB fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style VEC fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style LLM fill:#4f46e5,stroke:#4338ca,color:#fff
    style OUT fill:#059669,stroke:#047857,color:#fff
```

```python
from langchain_community.document_loaders import (
    TextLoader,
    PyPDFLoader,
    CSVLoader,
    WebBaseLoader,
)

# Load a text file
text_docs = TextLoader("knowledge_base.txt").load()

# Load a PDF (one document per page)
pdf_docs = PyPDFLoader("annual_report.pdf").load()

# Load from a web page
web_docs = WebBaseLoader("https://docs.example.com/guide").load()

# Each returns a list of Document objects
print(text_docs[0].page_content[:200])
print(text_docs[0].metadata)  # {"source": "knowledge_base.txt"}
```

Every loader returns `Document` objects with `page_content` (the text) and `metadata` (source, page number, etc.). Metadata flows through the entire pipeline, so your final answers can cite sources.

## Step 2: Splitting Text into Chunks

Documents are often too large to fit in a single prompt. Text splitters break them into manageable chunks while preserving semantic coherence.

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,       # Max characters per chunk
    chunk_overlap=200,     # Overlap between consecutive chunks
    separators=["\n\n", "\n", ". ", " ", ""],
)

chunks = splitter.split_documents(pdf_docs)
print(f"Split {len(pdf_docs)} pages into {len(chunks)} chunks")
```

`RecursiveCharacterTextSplitter` is the recommended default. It tries to split on paragraph boundaries first, then sentences, then words, keeping chunks as semantically coherent as possible. The overlap ensures that information near a chunk boundary appears intact in at least one chunk.

For code, use `RecursiveCharacterTextSplitter.from_language()`:

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter, Language

python_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON,
    chunk_size=1000,
    chunk_overlap=100,
)
```

## Step 3: Embedding and Indexing

Chunks are converted to vectors using an embedding model and stored in a vector store for similarity search.

```python
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Create vector store from chunks
vectorstore = FAISS.from_documents(chunks, embeddings)

# Or use Chroma for persistence
from langchain_chroma import Chroma

vectorstore = Chroma.from_documents(
    chunks,
    embeddings,
    persist_directory="./chroma_db",
)
```

FAISS is fast but fully in-memory, so the index disappears when the process exits. Chroma persists to disk. For production workloads, consider a server-based or managed store such as Pinecone, Weaviate, or pgvector (a PostgreSQL extension).

## Step 4: Building the Retriever

A retriever wraps the vector store and returns the most relevant documents for a query.

```python
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 4},  # Return top 4 chunks
)

# Test the retriever
docs = retriever.invoke("What were Q3 revenue numbers?")
for doc in docs:
    print(doc.page_content[:100])
    print(doc.metadata)
    print("---")
```

You can also use `search_type="mmr"` (Maximal Marginal Relevance) to get diverse results rather than just the closest matches.

## Step 5: Composing the RAG Chain

Now connect everything into an LCEL chain that retrieves context and generates answers.

```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# Format retrieved documents into a single string
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

prompt = ChatPromptTemplate.from_template(
    """Answer the question based on the following context.
If the context does not contain enough information, say so.

Context:
{context}

Question: {question}

Answer:"""
)

rag_chain = (
    {
        "context": retriever | format_docs,
        "question": RunnablePassthrough(),
    }
    | prompt
    | ChatOpenAI(model="gpt-4o", temperature=0)
    | StrOutputParser()
)

answer = rag_chain.invoke("What were the key highlights from Q3?")
print(answer)
```

The dictionary step runs the retriever and passthrough in parallel. Retrieved documents are formatted into a string, while the original question is forwarded. Both feed into the prompt template.

## Adding Source Citations

To return sources alongside the answer, wrap the chain in a `RunnableParallel` that produces both. Note that this simple version runs the retriever twice — once inside `rag_chain` and once for the sources list — which is fine for a demo; to retrieve only once, fetch the documents first and feed them to both branches.

```python
from langchain_core.runnables import RunnableParallel

rag_with_sources = RunnableParallel(
    answer=rag_chain,
    sources=retriever | (lambda docs: [d.metadata["source"] for d in docs]),
)

result = rag_with_sources.invoke("What were Q3 revenue numbers?")
print(result["answer"])
print("Sources:", result["sources"])
```

## FAQ

### How do I choose the right chunk size?

Start with 1000 characters and 200 overlap. Smaller chunks (500 characters) improve retrieval precision but may lose context. Larger chunks (2000 characters) retain more context but may dilute relevance. Test with your actual queries and documents, measuring retrieval quality.

### Can I use RAG with local models instead of OpenAI?

Yes. Replace `ChatOpenAI` with any LangChain model wrapper — `ChatOllama` for local Ollama models, for example. For embeddings, use `HuggingFaceEmbeddings` or `OllamaEmbeddings` to keep everything local.

### How do I update the vector store when my documents change?

Most vector stores support `add_documents()` to add new content. For updates, delete the old documents by ID and add the new versions. Chroma and Pinecone support `upsert` operations. For bulk reindexing, rebuild the vector store from scratch.

---

#LangChain #RAG #VectorStore #DocumentLoading #Python #AgenticAI #LearnAI #AIEngineering

---

Source: https://callsphere.ai/blog/langchain-rag-chains-document-loaders-text-splitters-retrieval-qa
