Self-Correcting RAG: CRAG, Self-RAG, and the Loop That Fixes Wrong Retrievals
Naive RAG retrieves the wrong documents and answers from them confidently. These are the 2026 self-correcting RAG patterns that detect and fix bad retrievals.
The Failure Mode Self-Correcting RAG Targets
Classic RAG retrieves the top-k documents and feeds them to the LLM. If the retrieval was bad, the LLM still produces an answer — and often a confident, wrong one. The model has no way to know the retrieved context is irrelevant.
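In code, the naive pipeline is a straight line. A minimal sketch, assuming hypothetical `retrieve` and `generate` helpers that stand in for your vector-store search and LLM call:

```python
def naive_rag(query: str, k: int = 5) -> str:
    """Naive RAG: whatever comes back from retrieval goes to the LLM."""
    docs = retrieve(query, k=k)   # hypothetical vector-store search
    return generate(query, docs)  # answers confidently even if docs are junk
```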
Self-correcting RAG adds a feedback loop: evaluate the retrieved context, decide whether to use it as-is, refine the search, or fall back to a different source. By 2026 this is standard for any production RAG that handles non-trivial questions.
The Two Reference Patterns
```mermaid
flowchart LR
    subgraph CRAG[CRAG]
        Q1[Query] --> R1[Retrieve]
        R1 --> Eval1[Retrieval Evaluator]
        Eval1 -->|correct| Use1[Use as is]
        Eval1 -->|ambiguous| Refine[Refine + retrieve again]
        Eval1 -->|incorrect| Fallback[Web search fallback]
    end
    subgraph Self[Self-RAG]
        Q2[Query] --> Decide[Decide: retrieve or not]
        Decide -->|yes| R2[Retrieve]
        R2 --> Generate[Generate with retrieved]
        Decide -->|no| Direct[Generate directly]
        Generate --> Critique[Critique own output]
        Critique -->|good| Out[Output]
        Critique -->|bad| Q2
    end
```
CRAG (Corrective Retrieval-Augmented Generation)
CRAG adds a retrieval evaluator before the generation step. The evaluator scores each retrieved document for relevance. Three branches:
- Correct: documents are relevant; generate normally
- Ambiguous: documents are partially relevant; refine the query and retrieve again, then generate
- Incorrect: documents are irrelevant; bypass them and use a fallback source (web search, a different vector index, etc.)
Simple, cheap (the evaluator is a small, fast model), and production-friendly: CRAG is the most-deployed self-correcting pattern in 2026.
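A minimal sketch of the three-branch control flow, reusing the hypothetical `retrieve` and `generate` helpers from above; `evaluate_docs`, `rewrite_query`, and `web_search` are also assumed names, not from the CRAG paper:

```python
from enum import Enum

class Verdict(Enum):
    CORRECT = "correct"
    AMBIGUOUS = "ambiguous"
    INCORRECT = "incorrect"

def crag_answer(query: str) -> str:
    """One CRAG pass: retrieve, evaluate, branch, then generate."""
    docs = retrieve(query, k=5)
    verdict = evaluate_docs(query, docs)      # small-model relevance verdict

    if verdict is Verdict.CORRECT:
        context = docs                        # use retrieval as-is
    elif verdict is Verdict.AMBIGUOUS:
        refined = rewrite_query(query, docs)  # refine and retrieve again
        context = retrieve(refined, k=5)
    else:                                     # INCORRECT: bypass the index
        context = web_search(query)           # fall back to another source

    return generate(query, context)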
Self-RAG
Self-RAG is more ambitious. The model is fine-tuned to emit special "reflection tokens" that decide whether to retrieve, score retrieved documents, and critique the generated output. The whole RAG loop runs inside one model.
- Pro: tight integration; can decide adaptively whether to retrieve at all
- Con: requires fine-tuning the underlying model; less plug-and-play
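There is no plug-and-play Self-RAG library call, because the reflection tokens come out of the fine-tuned model itself. As a rough illustration of the host loop, assuming a model that emits the paper's [Retrieve] and [ISUSE] tokens, and hypothetical `parse_isuse` / `strip_reflection_tokens` helpers:

```python
def self_rag_answer(model, query: str) -> str:
    """Sketch of a Self-RAG host loop driven by reflection tokens."""
    draft = model.generate(query)              # may emit [Retrieve=Yes]

    if "[Retrieve=Yes]" not in draft:
        return strip_reflection_tokens(draft)  # model answered directly

    candidates = []
    for passage in retrieve(query, k=5):       # model asked for evidence
        out = model.generate(query, context=passage)
        # The model critiques its own output via [ISREL]/[ISSUP]/[ISUSE];
        # parse_isuse pulls the 1-5 usefulness score out of the raw text.
        candidates.append((parse_isuse(out), out))

    best = max(candidates, key=lambda c: c[0])[1]  # keep highest self-rated answer
    return strip_reflection_tokens(best)
```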
A Production CRAG Implementation
```mermaid
sequenceDiagram
    participant U as User
    participant Q as Query Rewriter
    participant R as Retriever
    participant E as Evaluator
    participant G as Generator
    participant W as Web Search
    U->>Q: question
    Q->>R: rewritten query
    R->>E: top-k docs
    E->>E: score each doc
    alt all relevant
        E->>G: pass docs
    else some relevant
        E->>R: refined query
        R->>E: new docs
        E->>G: pass curated set
    else none relevant
        E->>W: web search
        W->>G: results
    end
    G->>U: answer with citations
```
The retrieval evaluator is typically a small, fast LLM (Haiku 4.5, GPT-5-mini, Llama-3-8B) prompted to score each doc as relevant, partially relevant, or irrelevant. Its cost is small relative to the generator's.
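A sketch of that evaluator using the OpenAI Python client; the model name and prompt wording are my choices, not part of CRAG:

```python
from openai import OpenAI

client = OpenAI()

EVAL_PROMPT = """Score this document's relevance to the query.
Query: {query}
Document: {doc}
Answer with exactly one word: relevant, partial, or irrelevant."""

def score_doc(query: str, doc: str) -> str:
    """One-word relevance verdict from a small, fast model."""
    resp = client.chat.completions.create(
        model="gpt-5-mini",  # any cheap evaluator model works here
        messages=[{"role": "user",
                   "content": EVAL_PROMPT.format(query=query, doc=doc)}],
    )
    verdict = resp.choices[0].message.content.strip().lower()
    # Treat anything off-script as irrelevant rather than guessing
    return verdict if verdict in {"relevant", "partial", "irrelevant"} else "irrelevant"
```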
Cost vs Quality
Representative numbers from production deployments:
- Naive RAG: $0.012/query, 73% accuracy
- CRAG: $0.018/query (+50% cost), 86% accuracy (+13 points)
- Self-RAG: $0.024/query (+100% cost), 88% accuracy (+15 points)
The cost-quality math favors CRAG for almost all production deployments. Self-RAG is for cases where the extra two points matter and you have the fine-tuning budget.
What the Evaluator Should Check
The 2026 best practice: evaluate three things, not just relevance:
- Relevance: does the document address the query topic?
- Specificity: does it contain the specific facts the question asks about?
- Currency: is it from a time window that matches the question?
A document can be relevant and specific but stale; a CRAG pipeline that does not check currency will answer questions with last year's facts.
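One way to enforce all three checks is to have the evaluator return a structured verdict instead of a single label. A sketch, with field names of my own choosing:

```python
import json
from dataclasses import dataclass

@dataclass
class DocVerdict:
    relevance: bool    # addresses the query topic
    specificity: bool  # contains the specific facts asked about
    currency: bool     # time window matches the question

def parse_verdict(raw_json: str) -> DocVerdict:
    """Parse the evaluator's JSON reply into the three checks."""
    data = json.loads(raw_json)
    return DocVerdict(bool(data.get("relevance")),
                      bool(data.get("specificity")),
                      bool(data.get("currency")))

def usable(v: DocVerdict) -> bool:
    # Relevant-but-stale docs fail here instead of poisoning the answer
    return v.relevance and v.specificity and v.currency
```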
When Self-Correcting RAG Underperforms
- Trivial questions where any retrieval is fine; the evaluator is overhead
- Single-document corpora where the right document is always retrieved if anything is
- Latency-sensitive workloads where the extra evaluator round-trip is unacceptable
Combining With Agentic RAG
CRAG and Self-RAG sit nicely under an agentic RAG layer. The agent decides whether to retrieve at all; CRAG handles the corrective loop when retrieval is invoked; the agent can also decide to retrieve from a different source if CRAG flags incorrect retrievals.
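A sketch of that layering, reusing the hypothetical `crag_answer` and `generate` from earlier; `needs_retrieval` is an assumed router, typically one cheap LLM call:

```python
def agentic_answer(query: str) -> str:
    """Agent layer decides IF to retrieve; CRAG owns the corrective loop."""
    if not needs_retrieval(query):   # e.g. greetings, pure reasoning
        return generate(query, [])   # answer from parametric knowledge
    return crag_answer(query)        # includes the web-search fallback
```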
Sources
- CRAG paper — https://arxiv.org/abs/2401.15884
- Self-RAG paper — https://arxiv.org/abs/2310.11511
- LangGraph CRAG implementation — https://langchain-ai.github.io/langgraph/tutorials/rag/langgraph_crag
- "Active RAG" survey — https://arxiv.org/abs/2403.10131
- "RAG techniques in 2025" — https://blog.langchain.dev