
Parent-Child Chunking for RAG: Small Chunks for Search, Large Chunks for Context

Learn the parent-child chunking strategy where small chunks provide precise search matches while their larger parent chunks provide the full context needed for accurate generation.

The Chunking Dilemma

Every RAG system faces a fundamental tension in chunk sizing. Small chunks (100-200 tokens) produce precise embeddings that match specific queries accurately, but they lack the surrounding context needed for the LLM to generate comprehensive answers. Large chunks (1000-2000 tokens) provide rich context for generation, but their embeddings average over too many concepts, reducing retrieval precision.

This is not a theoretical problem. In practice, a 100-token chunk containing "The annual renewal rate increased to 94% in Q3" will match a revenue retention query perfectly. But the LLM needs the surrounding paragraphs to understand what drove that increase, which segments improved, and what caveats apply. Conversely, a 2000-token chunk about Q3 performance might not rank highly for a specific retention query because the embedding averages over dozens of different topics.

Parent-child chunking resolves this by decoupling search from context.

How Parent-Child Chunking Works

The strategy maintains two levels of chunks:

  • Child chunks (small, 100-300 tokens) — Used for embedding and similarity search. These are precise and topically focused.
  • Parent chunks (large, 1000-2000 tokens) — Used for context in generation. Each parent contains multiple children.

When a query comes in, the system searches against child chunk embeddings. When a child matches, the system retrieves its parent chunk and sends that larger context to the LLM.

Implementation

from dataclasses import dataclass, field
import uuid

@dataclass
class Chunk:
    id: str
    content: str
    parent_id: str | None = None
    children: list[str] = field(default_factory=list)
    embedding: list[float] | None = None

class ParentChildChunker:
    def __init__(
        self,
        parent_size: int = 1500,
        child_size: int = 300,
        child_overlap: int = 50,
    ):
        self.parent_size = parent_size
        self.child_size = child_size
        self.child_overlap = child_overlap
        self.parents: dict[str, Chunk] = {}
        self.children: dict[str, Chunk] = {}

    def chunk_document(self, text: str) -> list[Chunk]:
        """Split document into parent and child chunks.

        Sizes are counted in words here as a rough proxy
        for tokens; swap in a real tokenizer for exact
        token budgets.
        """
        words = text.split()
        all_children = []

        # Create parent chunks
        for i in range(0, len(words), self.parent_size):
            parent_text = " ".join(
                words[i:i + self.parent_size]
            )
            parent_id = str(uuid.uuid4())
            parent = Chunk(
                id=parent_id, content=parent_text
            )
            self.parents[parent_id] = parent

            # Create child chunks within this parent
            parent_words = parent_text.split()
            step = self.child_size - self.child_overlap

            for j in range(0, len(parent_words), step):
                child_text = " ".join(
                    parent_words[j:j + self.child_size]
                )
                if len(child_text.split()) < 20:
                    continue  # Skip tiny fragments

                child_id = str(uuid.uuid4())
                child = Chunk(
                    id=child_id,
                    content=child_text,
                    parent_id=parent_id,
                )
                self.children[child_id] = child
                parent.children.append(child_id)
                all_children.append(child)

        return all_children
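As a sanity check on the sliding-window arithmetic above: the step is `child_size - child_overlap`, so consecutive children share exactly `child_overlap` words. With toy sizes (5-word children, 2-word overlap, chosen only for illustration):

```python
# Toy numbers to make the overlap visible; real runs use
# the 100-300 word sizes discussed above.
words = [f"w{i}" for i in range(12)]
child_size, child_overlap = 5, 2
step = child_size - child_overlap  # 3

children = [
    words[j:j + child_size]
    for j in range(0, len(words), step)
]
# children[0] ends with the same two words children[1] starts with
```

The last window may be shorter than `child_size`, which is why the chunker above skips fragments under 20 words.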

Embedding and Retrieval

Only the child chunks get embedded and stored in the vector index:


from openai import OpenAI

client = OpenAI()

def embed_children(
    chunker: ParentChildChunker,
) -> list[Chunk]:
    """Embed only child chunks for search indexing."""
    children = list(chunker.children.values())
    batch_size = 100

    for i in range(0, len(children), batch_size):
        batch = children[i:i + batch_size]
        response = client.embeddings.create(
            model="text-embedding-3-small",
            input=[c.content for c in batch],
        )
        for chunk, emb in zip(batch, response.data):
            chunk.embedding = emb.embedding

    return children
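Whatever store backs `similarity_search`, the search side only needs cosine similarity over child embeddings plus a `chunk_id` to hop back to the parent. A minimal sketch with toy 3-dimensional vectors standing in for real model embeddings (ids and values are invented for illustration):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Toy index: child_id -> embedding. In practice these come
# from embed_children() and live in a vector database.
index = {
    "child-a": [1.0, 0.0, 0.0],
    "child-b": [0.0, 1.0, 0.0],
    "child-c": [0.9, 0.1, 0.0],
}

def top_k_children(query_emb: list[float], k: int = 2) -> list[str]:
    """Rank child ids by similarity to the query embedding."""
    ranked = sorted(
        index,
        key=lambda cid: cosine(query_emb, index[cid]),
        reverse=True,
    )
    return ranked[:k]
```

A query vector close to "child-a" ranks "child-c" second because their embeddings point in nearly the same direction, which is exactly the behavior a dedicated vector store optimizes at scale.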

def parent_child_search(
    query: str,
    chunker: ParentChildChunker,
    vectorstore,
    k: int = 5,
) -> list[str]:
    """Search children, return parents for context."""
    # Search against child embeddings
    child_results = vectorstore.similarity_search(query, k=k)

    # Retrieve unique parent chunks
    seen_parents = set()
    parent_contexts = []

    for child_doc in child_results:
        child_id = child_doc.metadata["chunk_id"]
        child = chunker.children.get(child_id)
        if child and child.parent_id not in seen_parents:
            seen_parents.add(child.parent_id)
            parent = chunker.parents[child.parent_id]
            parent_contexts.append(parent.content)

    return parent_contexts
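The `seen_parents` deduplication matters because several top-ranked children often live in the same parent. A toy illustration with hypothetical ids, where three of four retrieved children collapse into one parent context:

```python
# Hypothetical mapping: four top-ranked children, three
# sharing parent "p1", one belonging to "p2".
child_to_parent = {"c1": "p1", "c2": "p1", "c3": "p2", "c4": "p1"}
ranked_children = ["c2", "c1", "c3", "c4"]

seen, parent_order = set(), []
for cid in ranked_children:
    pid = child_to_parent[cid]
    if pid not in seen:  # keep first occurrence, preserve rank order
        seen.add(pid)
        parent_order.append(pid)
```

Without the dedup step, the LLM would receive the same parent text three times, wasting context window on duplicates.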

Handling Section-Aware Parent Chunks

For structured documents, align parent chunks with document sections rather than using fixed token counts:

import re

def section_aware_chunking(
    markdown_text: str,
) -> list[tuple[str, str]]:
    """Create parent chunks aligned with document sections."""
    # Split on headings
    sections = re.split(
        r'(?=^##?\s)', markdown_text, flags=re.MULTILINE
    )

    parents = []
    for section in sections:
        section = section.strip()
        if not section:
            continue

        # Extract heading as metadata
        lines = section.split("\n")
        heading = lines[0].strip("# ").strip()
        body = "\n".join(lines[1:]).strip()

        if len(body.split()) > 50:  # Skip near-empty sections
            parents.append((heading, body))

    return parents
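On a small markdown snippet (invented for illustration), the heading-anchored split behaves like this — the lookahead `(?=^##?\s)` splits before each H1/H2 heading without consuming it:

```python
import re

doc = """# Quarterly Report
Overview text for the whole report.

## Revenue
Revenue grew across all segments this quarter.

## Retention
The annual renewal rate increased to 94% in Q3.
"""

# Zero-width lookahead keeps each heading with its own section
sections = [
    s for s in re.split(r"(?=^##?\s)", doc, flags=re.MULTILINE)
    if s.strip()
]
headings = [s.splitlines()[0].lstrip("#").strip() for s in sections]
```

Each resulting section keeps its heading as the first line, so the heading can travel with the parent chunk as metadata for citation and filtering.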

Choosing Chunk Sizes

The optimal sizes depend on your documents and queries. The following starting points work well in practice:

  • Technical documentation: Parent 1500 tokens, Child 200 tokens. Technical queries are precise and benefit from small child chunks.
  • Legal contracts: Parent 2000 tokens, Child 300 tokens. Legal context requires broad surrounding text for accurate interpretation.
  • Support conversations: Parent 1000 tokens, Child 150 tokens. Individual messages are short but need thread context.

Always evaluate on your specific query patterns. Measure retrieval precision at the child level and answer quality at the parent level.
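Child-level retrieval precision can be tracked with a simple precision@k over a labeled query set. A minimal sketch — the ids and relevance labels below are hypothetical, and a full harness would average over many queries:

```python
def precision_at_k(
    retrieved_ids: list[str],
    relevant_ids: set[str],
    k: int = 5,
) -> float:
    """Fraction of the top-k retrieved children labeled relevant."""
    top = retrieved_ids[:k]
    return sum(1 for cid in top if cid in relevant_ids) / k

# Hypothetical labels: 3 of the top 5 retrieved children
# were judged relevant for this query.
score = precision_at_k(
    ["c1", "c2", "c3", "c4", "c5"],
    {"c1", "c3", "c5"},
    k=5,
)
```

Run the same labeled queries after each chunk-size change; if child precision rises but answer quality drops, the parents are likely too small to carry the needed context.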

FAQ

Does parent-child chunking increase storage requirements?

It increases storage by roughly 5-15% compared to single-level chunking because child chunks overlap within parents. However, you only embed and index the children, so vector storage scales with the number of children, not parents. The parent documents can be stored in a simple key-value store.

Can I use more than two levels in the hierarchy?

Yes, three-level hierarchies (grandparent-parent-child) work well for very long documents. Grandparent chunks represent entire sections, parents represent subsections, and children represent individual paragraphs. However, more levels add complexity to the retrieval logic, so only add a level if two levels provably underperform on your evaluation dataset.

How does this compare to overlapping windows in standard chunking?

Overlapping windows add context at the edges of each chunk but do not solve the core precision-context tradeoff. A 500-token chunk with 100-token overlap is still a compromise. Parent-child chunking fully decouples search precision from generation context, giving you the best of both worlds.


#ChunkingStrategy #RAG #ParentChildChunks #VectorSearch #DocumentProcessing #AgenticAI #LearnAI #AIEngineering
