---
title: "Document Chunking Strategies for RAG: Fixed-Size, Semantic, and Recursive"
description: "Learn the most effective document chunking methods for RAG pipelines including fixed-size, semantic, and recursive splitting, with guidance on overlap, chunk sizes, and markdown-aware strategies."
canonical: https://callsphere.ai/blog/document-chunking-strategies-rag-fixed-semantic-recursive
category: "Learn Agentic AI"
tags: ["RAG", "Document Chunking", "Text Splitting", "NLP", "Vector Search"]
author: "CallSphere Team"
published: 2026-03-17T00:00:00.000Z
updated: 2026-05-07T00:30:44.204Z
---

# Document Chunking Strategies for RAG: Fixed-Size, Semantic, and Recursive

> Learn the most effective document chunking methods for RAG pipelines including fixed-size, semantic, and recursive splitting, with guidance on overlap, chunk sizes, and markdown-aware strategies.

## Why Chunking Matters More Than You Think

Chunking is one of the highest-leverage decisions in a RAG pipeline. If your chunks are too large, they contain too much noise and the embedding becomes a blurry average of unrelated ideas. If they are too small, they lose context and the retrieved snippet is meaningless on its own. Both the embedding model and the LLM perform best when each chunk represents one coherent idea.

This post covers the three primary chunking strategies, their tradeoffs, and production-ready implementations.

## Strategy 1: Fixed-Size Chunking

The simplest approach splits text into chunks of a fixed token or character count with optional overlap.

Before comparing splitters, it helps to see where chunks end up: at query time the retriever matches the query embedding against the chunk embeddings, so every downstream step inherits the quality of your chunking.

```mermaid
flowchart LR
    Q(["User query"])
    EMB["Embed query<br/>text-embedding-3"]
    VEC[("Vector DB<br/>pgvector or Pinecone")]
    RET["Top-k retrieval<br/>k = 8"]
    PROMPT["Augmented prompt<br/>system plus context"]
    LLM["LLM generation<br/>Claude or GPT"]
    CITE["Inline citations<br/>and page anchors"]
    OUT(["Grounded answer"])
    Q --> EMB --> VEC --> RET --> PROMPT --> LLM --> CITE --> OUT
    style EMB fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style VEC fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style LLM fill:#4f46e5,stroke:#4338ca,color:#fff
    style OUT fill:#059669,stroke:#047857,color:#fff
```

```python
from langchain.text_splitter import CharacterTextSplitter

splitter = CharacterTextSplitter(
    separator="\n",
    chunk_size=500,       # characters
    chunk_overlap=50,     # overlap between consecutive chunks
    length_function=len,
)

chunks = splitter.split_text(document_text)
print(f"Created {len(chunks)} chunks")
print(f"Average chunk length: {sum(len(c) for c in chunks) / len(chunks):.0f} chars")
```

**Pros:** Simple, predictable chunk sizes, easy to reason about token costs.

**Cons:** Splits mid-sentence and mid-paragraph, breaking semantic coherence. A chunk might start with "...the patient should take 200mg" without any indication of which medication is being discussed.

**Best for:** Unstructured plain text where no natural boundaries exist, or as a baseline to compare against smarter methods.
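The sliding window behind fixed-size chunking is just index arithmetic. Here is a dependency-free sketch (`fixed_chunks` is a name invented here, not a library function):

```python
def fixed_chunks(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Slide a window of `size` characters, stepping by size - overlap."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

doc = "".join(f"sentence {n}. " for n in range(200))
chunks = fixed_chunks(doc)
# The last 50 characters of each chunk reappear at the start of the next one.
assert chunks[0][-50:] == chunks[1][:50]
```

Note the step is `size - overlap`, not `size` — that is what makes each boundary region appear in two consecutive chunks.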

## Strategy 2: Recursive Character Splitting

This is the most popular strategy in production RAG systems. It tries to split on natural boundaries — paragraphs first, then sentences, then words — and only falls back to character-level splits when necessary.

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,
    chunk_overlap=64,
    separators=[
        "\n\n",   # Try paragraph breaks first
        "\n",     # Then line breaks
        ". ",      # Then sentence endings
        ", ",      # Then clause boundaries
        " ",       # Then word boundaries
        ""         # Last resort: character-level
    ]
)

chunks = splitter.split_text(document_text)
```

The algorithm walks through the separator list in order. It first tries to split on double newlines (paragraphs). If a resulting chunk exceeds `chunk_size`, it recursively splits that chunk using the next separator.
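The recursion itself is compact enough to sketch from scratch. This is a simplified toy version: the real LangChain splitter also merges adjacent small pieces back together up to `chunk_size`, which this sketch skips.

```python
def recursive_split(text: str, separators: list[str], chunk_size: int) -> list[str]:
    """Split on the first separator; recurse into any piece still too large."""
    if len(text) <= chunk_size or not separators:
        return [text]
    sep, rest = separators[0], separators[1:]
    parts = text.split(sep) if sep else list(text)  # "" means character-level
    chunks = []
    for part in parts:
        if len(part) > chunk_size:
            chunks.extend(recursive_split(part, rest, chunk_size))
        elif part:
            chunks.append(part)
    return chunks

pieces = recursive_split("aaaa bbbb. cccc\n\ndddd", ["\n\n", ". ", " ", ""], 6)
print(pieces)  # ['aaaa', 'bbbb', 'cccc', 'dddd'] — every piece fits in 6 chars
```

The empty-string separator at the end guarantees termination: once the list is exhausted, nothing can exceed `chunk_size`.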

**Pros:** Preserves semantic boundaries in most cases. Paragraphs stay intact when possible.

**Cons:** Chunk sizes still vary. Does not understand the actual meaning of the text.

## Strategy 3: Semantic Chunking

Semantic chunking uses embedding similarity to detect topic boundaries. It embeds each sentence, then groups consecutive sentences that are semantically similar into the same chunk. When the similarity drops below a threshold, a new chunk begins.

```python
from langchain_experimental.text_splitter import SemanticChunker
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

chunker = SemanticChunker(
    embeddings,
    breakpoint_threshold_type="percentile",
    breakpoint_threshold_amount=75,  # split at 75th percentile dissimilarity
)

chunks = chunker.split_text(document_text)

for i, chunk in enumerate(chunks[:3]):
    print(f"Chunk {i}: {len(chunk)} chars | Preview: {chunk[:80]}...")
```

**Pros:** Each chunk genuinely covers one coherent topic. Embedding quality improves significantly because the vector represents a single concept.

**Cons:** Requires an embedding API call for every sentence during indexing (higher cost). Chunk sizes are unpredictable. Slower ingestion.
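The boundary-detection mechanism is easy to demonstrate without an embedding API. The sketch below substitutes a toy bag-of-words embedding — `semantic_split`, `embed`, `VOCAB`, and the 0.2 threshold are illustrative stand-ins, not the SemanticChunker internals:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def semantic_split(sentences, embed, threshold=0.2):
    """Start a new chunk wherever consecutive-sentence similarity drops below threshold."""
    vecs = [embed(s) for s in sentences]
    chunks, current = [], [sentences[0]]
    for prev, cur, sent in zip(vecs, vecs[1:], sentences[1:]):
        if cosine(prev, cur) < threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sent)
    chunks.append(" ".join(current))
    return chunks

VOCAB = ["cat", "dog", "pet", "stock", "market", "price"]

def embed(sentence: str) -> list[float]:
    # Toy bag-of-words vector standing in for a real embedding model
    words = sentence.lower().split()
    return [float(words.count(w)) for w in VOCAB]

sents = [
    "the cat is a pet",
    "the dog is a pet",
    "the stock market price rose",
    "the market price fell",
]
print(semantic_split(sents, embed))  # two chunks: the pet topic, then the market topic
```

Swapping `embed` for a real model and tuning the threshold (or using a percentile, as SemanticChunker does) turns this mechanism into the production version.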

## Markdown-Aware Splitting

Technical documentation, wikis, and README files use markdown headers as natural section boundaries. A markdown-aware splitter respects these headings:

```python
from langchain.text_splitter import MarkdownHeaderTextSplitter

headers_to_split_on = [
    ("#", "h1"),
    ("##", "h2"),
    ("###", "h3"),
]

md_splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=headers_to_split_on
)

chunks = md_splitter.split_text(markdown_text)

# Each chunk carries its header hierarchy as metadata
for chunk in chunks[:2]:
    print(f"Headers: {chunk.metadata}")
    print(f"Content: {chunk.page_content[:100]}...")
    print()
```

The metadata (which headers this chunk falls under) is extremely valuable for retrieval. You can prepend headers to the chunk text before embedding so the vector captures the full context.
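Prepending the header path before embedding can be as simple as joining the metadata values. A minimal sketch — `contextualize` and the `" > "` separator are choices invented here, not a LangChain API:

```python
def contextualize(chunk_text: str, metadata: dict) -> str:
    """Prepend the header hierarchy so the embedding sees the section context."""
    path = " > ".join(metadata[k] for k in ("h1", "h2", "h3") if k in metadata)
    return f"{path}\n\n{chunk_text}" if path else chunk_text

meta = {"h1": "Deployment", "h2": "Kubernetes"}
print(contextualize("Set replicas to 3 for high availability.", meta))
# Deployment > Kubernetes
#
# Set replicas to 3 for high availability.
```

The `h1`/`h2`/`h3` keys match the `headers_to_split_on` mapping configured above; embed the returned string, but you can still display the bare `chunk_text` to users.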

## Choosing Chunk Size: A Practical Guide

There is no universal optimal chunk size. Here are guidelines based on production experience:

| Use Case | Chunk Size | Overlap | Reasoning |
| --- | --- | --- | --- |
| Q&A over docs | 256-512 tokens | 10-15% | Small focused chunks match specific questions |
| Summarization | 1024-2048 tokens | 5% | Larger chunks preserve narrative flow |
| Code search | 64-256 tokens | 0 | Functions/classes are natural boundaries |
| Legal/medical | 512-1024 tokens | 15-20% | Higher overlap prevents splitting critical clauses |

## Overlap: Why It Matters

Overlap ensures that information spanning a chunk boundary is not lost. Consider a document where paragraph A ends with a key fact and paragraph B provides the explanation. Without overlap, the fact and its explanation land in separate chunks. With a 64-token overlap, the end of chunk N is repeated at the start of chunk N+1.

```python
# Crude sanity check: shared vocabulary between the tail of one chunk
# and the head of the next (an approximation, not the exact overlap span)
for i in range(min(3, len(chunks) - 1)):
    end_of_current = chunks[i][-80:]
    start_of_next = chunks[i + 1][:80]
    shared = set(end_of_current.split()) & set(start_of_next.split())
    print(f"Chunks {i}-{i+1} share {len(shared)} distinct words in the boundary region")
```

## FAQ

### What chunk size should I start with for a new RAG project?

Start with 512 tokens using recursive character splitting with 64-token overlap. This works well for most question-answering use cases. Then measure retrieval quality and adjust — decrease chunk size if retrieved chunks contain too much irrelevant text, increase if chunks lack sufficient context.
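Note that the `CharacterTextSplitter` example earlier measured characters while this recommendation is in tokens. A rough bridge, assuming the common ~4-characters-per-token heuristic for English (`chars_for_tokens` is a name invented here):

```python
def chars_for_tokens(tokens: int, chars_per_token: int = 4) -> int:
    """Rough character budget for a token target (English averages ~4 chars/token)."""
    return tokens * chars_per_token

chunk_size = chars_for_tokens(512)    # 2048 characters ~ 512 tokens
chunk_overlap = chars_for_tokens(64)  # 256 characters ~ 64 tokens
print(chunk_size, chunk_overlap)      # 2048 256
```

For exact counts, tokenize with your embedding model's tokenizer instead of estimating.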

### Should I use semantic chunking in production?

Semantic chunking produces higher-quality chunks but is slower and more expensive during ingestion because every sentence requires an embedding call. Use it when ingestion is infrequent (you index documents once or nightly) and retrieval quality is critical. For real-time or high-volume ingestion, recursive splitting is more practical.

### How do I handle tables and images in documents?

Tables should be extracted as structured text (CSV or markdown table format) and chunked as complete units — never split a table row across chunks. For images, use a multimodal embedding model or generate a text description of the image and embed that description alongside the surrounding text.
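Keeping tables whole can be handled with a pre-pass that treats any markdown table block as an indivisible unit. A simplified sketch — `split_keep_tables` is a hypothetical helper that detects tables only by a leading `|`:

```python
def split_keep_tables(text: str, chunk_size: int = 500) -> list[str]:
    """Split on blank lines, but emit any markdown table block as one whole chunk."""
    chunks, current = [], ""
    for block in text.split("\n\n"):
        if block.lstrip().startswith("|"):   # markdown table: never split it
            if current:
                chunks.append(current)
                current = ""
            chunks.append(block)
        elif current and len(current) + len(block) > chunk_size:
            chunks.append(current)
            current = block
        else:
            current = f"{current}\n\n{block}" if current else block
    if current:
        chunks.append(current)
    return chunks

doc = "Intro paragraph.\n\n| a | b |\n|---|---|\n| 1 | 2 |\n\nClosing note."
print(split_keep_tables(doc))  # the three-line table survives as a single chunk
```

A production version would also attach the table's caption or preceding sentence as metadata so the retriever can match questions about the table's contents.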

---

#RAG #DocumentChunking #TextSplitting #NLP #VectorSearch #AgenticAI #LearnAI #AIEngineering

---

Source: https://callsphere.ai/blog/document-chunking-strategies-rag-fixed-semantic-recursive
