Multi-Hop RAG: Designing Retrieval Pipelines for Complex Questions
Multi-hop questions break naive RAG. These are the 2026 retrieval patterns for questions like "who is the manager of the engineer who shipped X?"
What Multi-Hop Means
Single-hop questions can be answered from one retrieved chunk. Multi-hop questions need multiple chunks chained: "Who is the manager of the engineer who shipped feature X?" requires finding the engineer first, then their manager.
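Naive RAG embeds the question once, retrieves the top-k chunks for that single query, and feeds them to the model. Multi-hop questions break this pattern because the chunks needed for later hops may not match the original query at all: the chunk that names the engineer's manager need not mention feature X.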
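To make the failure concrete, here is a toy sketch of naive single-query retrieval. It makes two assumptions. A bag-of-words cosine similarity stands in for a real embedding model. The corpus and the names in it (Priya Shah, Marcus Lee) are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical corpus: one fact per chunk.
CORPUS = [
    "Priya Shah shipped feature X in the March release.",
    "Priya Shah reports to Marcus Lee, engineering manager.",
    "Feature X adds offline sync to the mobile app.",
    "The cafeteria menu changes every Monday.",
]


def embed(text: str) -> Counter:
    """Stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def naive_rag(question: str, k: int = 2) -> list[str]:
    """One embedding query, top-k chunks, no follow-up retrieval."""
    q = embed(question)
    return sorted(CORPUS, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


question = "Who is the manager of the engineer who shipped feature X?"
for chunk in naive_rag(question):
    print(chunk)
```

In this toy corpus the manager chunk ranks last, below even the cafeteria chunk. It shares only the word "manager" with the question. The first hop (who shipped X) is retrieved, but the second hop never is.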
By 2026 multi-hop RAG is its own design discipline. This piece walks through it.
The Three Patterns
```mermaid
flowchart TB
    M[Multi-hop strategies] --> M1[Decompose-then-retrieve]
    M --> M2[Iterative-retrieve]
    M --> M3[Graph-traverse]
```
Decompose-then-Retrieve
Use an LLM to decompose the question into atomic sub-questions, retrieve for each, then compose the sub-answers into a final answer.
Q: Who is the manager of the engineer who shipped feature X?
Decompose:
Q1: Who shipped feature X?
Q2: Who is the manager of [answer to Q1]?
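Each sub-question is single-hop and retrieves cleanly. Q2 depends on the answer to Q1, so dependent sub-questions run in order, with each answer substituted into the next sub-question.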
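Here is a minimal sketch of decompose-then-retrieve on the running example. Several pieces are assumptions:

- The LLM decomposition is replaced by a hard-coded plan.
- A "{Q1}" slot marks where an earlier answer is substituted.
- The LLM reader is replaced by hypothetical regex rules.
- Retrieval is a toy keyword-overlap ranker.

```python
import re

# Hypothetical corpus for the running example.
CORPUS = [
    "Priya Shah shipped feature X in the March release.",
    "Priya Shah reports to Marcus Lee, engineering manager.",
    "Feature X adds offline sync to the mobile app.",
]
STOPWORDS = {"who", "is", "the", "of", "to", "in", "a"}


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def retrieve(query: str, k: int = 2) -> list[str]:
    """Toy lexical retriever standing in for a vector search."""
    q = tokens(query)
    return sorted(CORPUS, key=lambda c: len(q & tokens(c)), reverse=True)[:k]


def decompose(question: str) -> list[str]:
    """Stand-in for the LLM decomposition call. "{Q1}" marks a slot that
    is filled with the answer to sub-question Q1."""
    return ["Who shipped feature X?", "Who is the manager of {Q1}?"]


# Hypothetical extraction rules standing in for an LLM reader.
READER_PATTERNS = {
    "shipped": r"(\w+ \w+) shipped",
    "manager": r"reports to (\w+ \w+)",
}


def read_answer(sub_question: str, chunks: list[str]) -> str:
    for cue, pattern in READER_PATTERNS.items():
        if cue in sub_question.lower():
            for chunk in chunks:
                match = re.search(pattern, chunk)
                if match:
                    return match.group(1)
    return "unknown"


def decompose_then_retrieve(question: str) -> dict[str, str]:
    answers: dict[str, str] = {}
    for i, template in enumerate(decompose(question), start=1):
        sub_question = template.format(**answers)  # substitute earlier answers
        answers[f"Q{i}"] = read_answer(sub_question, retrieve(sub_question))
    return answers


print(decompose_then_retrieve(
    "Who is the manager of the engineer who shipped feature X?"))
```

Because Q2 is only formatted after Q1 is answered, its retrieval query contains "Priya Shah". The manager chunk then ranks first even though the original question never mentions her.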
Iterative-Retrieve
Retrieve, look at the result, decide what's missing, retrieve again. Continue until the question is answered or budget is exhausted.
```mermaid
flowchart LR
    Q[Question] --> R1[Retrieve 1]
    R1 --> Eval1[LLM evaluates: enough?]
    Eval1 -->|Yes| Answer[Answer]
    Eval1 -->|No| Refine[Refine query]
    Refine --> R2[Retrieve 2]
    R2 --> Eval2[LLM evaluates: enough?]
    Eval2 -->|Yes| Answer
    Eval2 -->|No, budget left| Refine
```
This pattern is more flexible: it can handle questions whose decomposition is not obvious up front.
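The loop can be sketched as follows, with an explicit iteration budget. The "LLM evaluates" and "Refine query" steps are replaced by hypothetical regex stand-ins, and retrieval is a toy keyword-overlap ranker over the same hypothetical corpus. A real system would prompt a model for both steps.

```python
import re
from typing import Optional

# Hypothetical corpus for the running example.
CORPUS = [
    "Priya Shah shipped feature X in the March release.",
    "Priya Shah reports to Marcus Lee, engineering manager.",
    "Feature X adds offline sync to the mobile app.",
]
STOPWORDS = {"who", "is", "the", "of", "to", "in", "a"}


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def retrieve(query: str, k: int = 1) -> list[str]:
    q = tokens(query)
    return sorted(CORPUS, key=lambda c: len(q & tokens(c)), reverse=True)[:k]


def evaluate(chunks: list[str]) -> Optional[str]:
    """Stand-in for "LLM evaluates: enough?". Returns an answer once the
    evidence covers the manager hop, otherwise None."""
    match = re.search(r"reports to (\w+ \w+)", " ".join(chunks))
    return match.group(1) if match else None


def refine(question: str, chunks: list[str]) -> str:
    """Stand-in for "Refine query": ask about the entity found so far."""
    match = re.search(r"(\w+ \w+) shipped", " ".join(chunks))
    return f"manager of {match.group(1)}" if match else question


def iterative_retrieve(question: str, max_iters: int = 4) -> tuple[Optional[str], int]:
    evidence: list[str] = []
    query = question
    for step in range(1, max_iters + 1):
        for chunk in retrieve(query):
            if chunk not in evidence:
                evidence.append(chunk)
        answer = evaluate(evidence)
        if answer is not None:
            return answer, step        # answered within budget
        query = refine(question, evidence)
    return None, max_iters             # budget exhausted without an answer


q = "Who is the manager of the engineer who shipped feature X?"
print(iterative_retrieve(q))
print(iterative_retrieve(q, max_iters=1))
```

Nothing in the loop knows the question has two hops. The second query ("manager of Priya Shah") only exists because the first retrieval surfaced a name. The `max_iters` cap guarantees termination when refinement stops making progress.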
Graph-Traverse
Use a knowledge graph for the structured part of the question and vector RAG for the unstructured part, then blend the results.
For "manager of the engineer who shipped X", the vector store holds feature-shipping records and the graph holds engineer-manager relationships. Retrieve the shipping record to identify the engineer, then traverse the graph to find their manager.
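Here is a minimal sketch of that blend, under stated assumptions. The graph is a plain dict of `reports_to` edges, and the "vector store" is a keyword-overlap search over hypothetical shipping records. A real deployment would use a graph database and an embedding index instead.

```python
import re
from typing import Optional

# Hypothetical knowledge graph: first-class reports_to relationships.
GRAPH = {
    "Priya Shah": {"reports_to": "Marcus Lee"},
    "Tom Reyes": {"reports_to": "Dana Wu"},
}

# Hypothetical vector-store contents: unstructured shipping records.
DOCS = [
    "Priya Shah shipped feature X in the March release.",
    "Tom Reyes shipped feature Y after a two-week beta.",
]


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def vector_search(query: str, k: int = 1) -> list[str]:
    """Lexical stand-in for a vector-store similarity search."""
    q = words(query)
    return sorted(DOCS, key=lambda d: len(q & words(d)), reverse=True)[:k]


def manager_of_shipper(feature: str) -> Optional[str]:
    # Hop 1 (unstructured): find the record of who shipped the feature.
    record = vector_search(f"who shipped feature {feature}")[0]
    match = re.match(r"(\w+ \w+) shipped", record)
    if match is None:
        return None
    # Hop 2 (structured): follow the reports_to edge in the graph.
    return GRAPH.get(match.group(1), {}).get("reports_to")


print(manager_of_shipper("X"), manager_of_shipper("Y"))
```

The relationship hop never goes through similarity search. Once the engineer is known, finding the manager is an exact edge lookup, which is why this pattern suits domains where relationships are first-class.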
When Each One Wins
- Decompose-then-retrieve: when the decomposition is obvious and bounded
- Iterative-retrieve: when the decomposition emerges from intermediate results
- Graph-traverse: when relationships are first-class and known
Most production multi-hop RAG in 2026 uses iterative-retrieve as the default with graph-traverse mixed in for relationship-heavy domains.
A Production Pattern
```mermaid
flowchart TB
    Q[Question] --> Class[Classify hop count]
    Class -->|1-hop| Simple[Standard RAG]
    Class -->|multi-hop| Iter[Iterative-retrieve]
    Iter --> R1[Retrieve]
    R1 --> Sub[LLM extracts sub-question]
    Sub --> R2[Retrieve sub-question]
    R2 --> Combine[LLM combines]
    Combine --> Done[Answer]
```
The classifier saves cost on simple questions. The iterative loop handles the hard ones with budget caps.
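The diagram does not say how hop count is classified. This sketch assumes a cheap keyword heuristic that counts relative-clause cues; `estimate_hops`, `route` and `RELATIVE_CUES` are hypothetical names. A production system might use a small LLM or a trained classifier instead. The point is only the routing shape: a cheap path for one hop and a capped loop otherwise.

```python
import re
from typing import Callable

# Hypothetical cue words suggesting a question embeds another question.
RELATIVE_CUES = re.compile(r"\b(who|which|whose|that)\b", re.IGNORECASE)


def estimate_hops(question: str) -> int:
    """Heuristic stand-in for the 'Classify hop count' step: each
    relative-clause cue after the opening wh-word adds one hop."""
    cues = len(RELATIVE_CUES.findall(question))
    if RELATIVE_CUES.match(question.strip()):
        cues -= 1  # the opening wh-word is the question itself, not an extra hop
    return 1 + max(0, cues)


def route(
    question: str,
    standard_rag: Callable[[str], str],
    iterative_rag: Callable[[str, int], str],
    max_iters: int = 3,
) -> str:
    if estimate_hops(question) <= 1:
        return standard_rag(question)            # cheap path for 1-hop questions
    return iterative_rag(question, max_iters)    # capped loop for multi-hop ones


print(estimate_hops("Who shipped feature X?"))
print(estimate_hops("Who is the manager of the engineer who shipped feature X?"))
```

A heuristic like this will misfire, for example on a demonstrative "that". Misrouting a multi-hop question to standard RAG costs accuracy, while misrouting the other way only costs money, so a real classifier is usually tuned to lean toward the multi-hop path.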
Cost Control
Multi-hop RAG is expensive because it makes multiple LLM and retriever calls. Patterns:
- Cap iterations (typically 3-5)
- Use cheap LLMs for sub-question extraction
- Cache intermediate results (the same sub-question may appear across user queries)
- Pre-decompose common question shapes
Evaluation
```mermaid
flowchart LR
    Test[Test questions] --> Multi[Multi-hop benchmark]
    Multi --> Recall[Recall: did we retrieve all needed chunks?]
    Multi --> Compose[Composition: did we combine correctly?]
    Multi --> Hop[Hop count: minimum needed]
```
Multi-hop benchmarks (HotpotQA, 2WikiMultihopQA) test retrieval over hops. Use them, but augment with your own multi-hop questions over your corpus.
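The three checks in the diagram can be scored with a small harness like the one below. It assumes each test case lists the gold chunk ids needed across all hops (HotpotQA annotates supporting facts, which could serve this role) plus the minimum hop count. `MultiHopCase`, `Pipeline` and `evaluate` are hypothetical names. Composition is approximated by a simple rule: a case counts as a composition error when every needed chunk was retrieved but the final answer is still wrong.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class MultiHopCase:
    question: str
    gold_chunk_ids: set[str]  # every chunk needed across all hops
    gold_answer: str
    min_hops: int


# A pipeline returns (retrieved chunk ids, final answer, hops used).
Pipeline = Callable[[str], tuple[list[str], str, int]]


def evaluate(cases: list[MultiHopCase], run: Pipeline) -> dict[str, float]:
    recall_total = correct = extra_hops = full_recall = composition_errors = 0
    for case in cases:
        retrieved, answer, hops = run(case.question)
        gold = case.gold_chunk_ids
        recall = len(set(retrieved) & gold) / len(gold) if gold else 1.0
        is_correct = answer.strip().lower() == case.gold_answer.strip().lower()
        recall_total += recall
        correct += is_correct
        extra_hops += hops > case.min_hops
        if recall == 1.0:
            full_recall += 1
            composition_errors += not is_correct  # had every fact, still wrong
    n = len(cases)
    return {
        "recall": recall_total / n,
        "answer_exact_match": correct / n,
        "composition_error_rate": composition_errors / full_recall if full_recall else 0.0,
        "extra_hop_rate": extra_hops / n,
    }
```

Separating recall from composition errors tells you which half of the pipeline to fix. Low recall points at retrieval or sub-question extraction. Composition errors despite full recall point at the combining step.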
Common Failure Modes
- Decomposition gets the wrong sub-questions
- Iterative loop never converges (cap iterations)
- One bad sub-result poisons the chain (validate intermediate results)
- Composition step misses a key fact (use stronger model for composition)
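Validating intermediate results can start as a guard between hops. This sketch assumes a crude grounding test: the sub-answer must be a real answer and appear verbatim (case-insensitively) in a retrieved chunk. `HopValidationError`, `validate_hop` and `NON_ANSWERS` are hypothetical names. A production system might use an entailment model or an LLM judge instead.

```python
class HopValidationError(ValueError):
    """Raised when an intermediate answer should not feed the next hop."""


NON_ANSWERS = {"", "unknown", "n/a", "i don't know"}


def validate_hop(sub_answer: str, chunks: list[str]) -> str:
    """Hypothetical guard between hops. Accepts a sub-answer only if it is a
    real answer and appears (case-insensitively) in a retrieved chunk."""
    answer = sub_answer.strip()
    if answer.lower() in NON_ANSWERS:
        raise HopValidationError("sub-question was not answered")
    if not any(answer.lower() in chunk.lower() for chunk in chunks):
        raise HopValidationError(f"{answer!r} is not supported by the retrieved chunks")
    return answer


chunks = ["Priya Shah shipped feature X in the March release."]
print(validate_hop(" Priya Shah ", chunks))  # accepted: grounded in the chunk
try:
    validate_hop("Tom Reyes", chunks)  # e.g. a hallucinated sub-answer
except HopValidationError as err:
    print("rejected:", err)
```

On rejection, the caller can retry the hop with a refined query or stop and report that the chain could not be completed. Either is better than substituting an ungrounded name into the next sub-question.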
Sources
- "Self-Ask" Press et al. — https://arxiv.org/abs/2210.03350
- "IRCoT" Trivedi et al. — https://arxiv.org/abs/2212.10509
- HotpotQA dataset — https://hotpotqa.github.io
- "Multi-hop QA" survey 2025 — https://arxiv.org
- LangGraph multi-hop recipes — https://langchain-ai.github.io/langgraph