Build a Cited Claude RAG App: Step-by-Step Guide

A copy-pasteable guide to wiring Claude's Citations API into a retrieval app: request shape, response parsing, span mapping, and verification.

You have a corpus, a search index, and a Claude API key. What you do not have yet is a system where every answer comes back stitched to the exact sentences that justify it. This is a build log, not an overview. By the end you will have a working request shape, a parser for the response, a verification pass, and a rendering strategy — the four things that turn raw retrieval into a cited answer an engineer can actually trust in production.

We will assume you already retrieve relevant chunks somehow; the focus is everything after retrieval, where most teams get vague. Each step includes the actual data shapes so you are not guessing.

Key takeaways

  • Pass each source as a Claude document content block with citations.enabled — that single flag is what unlocks span-level provenance.
  • The response is interleaved text and citation objects; you must parse the structure rather than read a flat string.
  • Preserve original character offsets through retrieval so you can deep-link back to the exact source span.
  • A one-claim-at-a-time verifier on Haiku is cheap insurance against confidently mis-cited facts.
  • Render unverified or uncited sentences differently so trust signals reach the user, not just your logs.

Step 1: shape the request with document blocks

Retrieval hands you a list of chunks. Each chunk becomes a document content block. The order of blocks defines the document_index Claude will reference in citations, so keep that ordering stable and remember it on your side. Here is the request body for a two-source question:

{
  "model": "claude-sonnet-4-6",
  "max_tokens": 1024,
  "messages": [{
    "role": "user",
    "content": [
      { "type": "document",
        "source": { "type": "text", "media_type": "text/plain",
                     "data": "Onboarding requires a signed W-9 before the first payout." },
        "title": "contractor-handbook", "citations": { "enabled": true } },
      { "type": "document",
        "source": { "type": "text", "media_type": "text/plain",
                     "data": "Payouts run on the 1st and 15th of each month." },
        "title": "payments-faq", "citations": { "enabled": true } },
      { "type": "text",
        "text": "When will a new contractor get paid? Cite each fact." }
    ]
  }]
}

Notice the instruction "Cite each fact" in the user text. The Citations feature does the structural work, but a short directive nudges the model to attribute every claim rather than only the obvious ones. Keep the directive short; long meta-instructions about citing tend to bloat the answer.
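
The request body above can also be assembled from whatever your retriever returns. This is a minimal sketch, assuming each retrieved chunk is a dict with "title" and "text" keys; that chunk shape and the helper name build_content are illustrative, not part of the API. List order is preserved, so position i in the chunk list is the document_index you will see in citations.

```python
from typing import Any


def build_content(chunks: list[dict[str, str]], question: str) -> list[dict[str, Any]]:
    """Turn retrieved chunks into document blocks followed by the question.

    `chunks` is a hypothetical retriever output: dicts with "title" and "text".
    Order is preserved, so chunks[i] becomes document_index i.
    """
    content: list[dict[str, Any]] = []
    for chunk in chunks:
        content.append({
            "type": "document",
            "source": {"type": "text", "media_type": "text/plain", "data": chunk["text"]},
            "title": chunk["title"],
            "citations": {"enabled": True},
        })
    # Short citing directive, appended after all documents.
    content.append({"type": "text", "text": f"{question} Cite each fact."})
    return content


chunks = [
    {"title": "contractor-handbook",
     "text": "Onboarding requires a signed W-9 before the first payout."},
    {"title": "payments-faq",
     "text": "Payouts run on the 1st and 15th of each month."},
]
request_body = {
    "model": "claude-sonnet-4-6",
    "max_tokens": 1024,
    "messages": [{
        "role": "user",
        "content": build_content(chunks, "When will a new contractor get paid?"),
    }],
}
```

Keep the same chunks list around after the call; it is your lookup table from document_index back to the source.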

Step 2: parse the interleaved response

The response content is an array of text blocks. Some blocks carry a citations array; for plain-text sources each citation has a type of char_location with document_index, document_title, cited_text, start_char_index, and end_char_index. Walk the array and pair each sentence with its supporting spans:

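claims = []  # one entry per text block, in response order
for block in response.content:
    if block.type != "text":
        continue
    sentence = block.text
    spans = []
    # citations is absent or None on blocks the model did not ground
    for c in (block.citations or []):
        spans.append({
            "doc_index": c.document_index,  # position of the block in the request
            "doc": c.document_title,
            "text": c.cited_text,
            "start": c.start_char_index,
            "end": c.end_char_index,
        })
    claims.append({"sentence": sentence, "spans": spans})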

You now hold a list of claim objects, each with zero or more spans. Sentences with zero spans are your watch list — they are assertions the model made without grounding, and how you treat them (drop, flag, or re-ask) is a product decision you should make explicitly.
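
One way to make that product decision explicit is a small triage function. This is a sketch under assumptions: the name triage, the "status" key, and the two policies shown are hypothetical, and it operates on the claim dicts built above. Re-asking the model is left out because it depends on your generation loop. Note that a text block without citations may be connective phrasing rather than a factual claim, so you may want a gentler treatment for very short blocks.

```python
def triage(claims: list[dict], policy: str = "flag") -> list[dict]:
    """Apply one explicit policy to text blocks that came back without citations.

    "flag" keeps them and marks them "uncited"; "drop" removes them.
    Claims with at least one span are always kept and marked "cited".
    """
    if policy not in ("flag", "drop"):
        raise ValueError(f"unknown policy: {policy}")
    out = []
    for claim in claims:
        grounded = bool(claim["spans"])
        if not grounded and policy == "drop":
            continue
        out.append({**claim, "status": "cited" if grounded else "uncited"})
    return out


sample = [
    {"sentence": "Payouts run twice a month.",
     "spans": [{"doc": "payments-faq", "text": "Payouts run on the 1st and 15th of each month.",
                "start": 0, "end": 46}]},
    {"sentence": "Most contractors are paid quickly.", "spans": []},
]
flagged = triage(sample, policy="flag")
kept = triage(sample, policy="drop")
```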

The build pipeline at a glance

flowchart TD
  A["Retrieved chunks + offsets"] --> B["Build document blocks"]
  B --> C["Call Claude (citations enabled)"]
  C --> D["Parse text + citation spans"]
  D --> E{"Sentence has >=1 span?"}
  E -->|No| F["Mark uncited, flag in UI"]
  E -->|Yes| G["Verify span supports sentence"]
  G --> H["Render footnote deep-link"]
  F --> H

This is the skeleton you are coding. Each box is a function; the diamond is the single most valuable branch in the whole app because it decides what happens to ungrounded claims.

Step 3: map spans back to your real documents

The character indices Claude returns are offsets into the data string you sent, not into your original file. If you sent a cleaned chunk, the offsets are relative to that cleaned text. So when you build document blocks, record the mapping from each block back to its source document and the chunk's starting offset in the original. If the chunk is a verbatim slice of the source, a returned span of, say, characters 12–58 of block 0 translates to characters (chunk_start + 12) to (chunk_start + 58) in the real file; if cleaning added or removed characters, you also need a map from cleaned positions to original positions. That translation is what powers a "jump to highlighted sentence in the PDF" experience.

Skip this and your citations still work as labels but cannot deep-link precisely. Teams that invest one afternoon in the offset bookkeeping get the most convincing trust UX almost for free.
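
The bookkeeping can be as small as one record per document block. A minimal sketch, assuming each chunk is sent verbatim (no cleaning) so a simple addition is enough; BlockOrigin and to_source_range are hypothetical names, and the end index is treated as exclusive, Python-slice style.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockOrigin:
    """Where a sent document block came from (hypothetical bookkeeping record)."""
    source_id: str    # e.g. a file path or database key
    chunk_start: int  # offset of the chunk's first character in the original text


def to_source_range(origins: list[BlockOrigin], document_index: int,
                    start: int, end: int) -> tuple[str, int, int]:
    """Translate chunk-relative offsets into offsets in the original document.

    Assumes the chunk is an unmodified slice of the original text.
    """
    origin = origins[document_index]
    return origin.source_id, origin.chunk_start + start, origin.chunk_start + end


original = "Intro paragraph. Payouts run on the 1st and 15th of each month. Outro."
chunk = "Payouts run on the 1st and 15th of each month."

# origins[i] describes document block i in the request, in the same order.
origins = [BlockOrigin("payments-faq.txt", original.index(chunk))]

# Suppose a citation on block 0 covers chunk characters 19 to 31 ("1st and 15th").
source_id, src_start, src_end = to_source_range(origins, 0, 19, 31)
highlight = original[src_start:src_end]
```

Storing the origins list in request order means the lookup is by document_index, which sidesteps title collisions when two chunks come from the same file.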

Step 4: verify each claim against its span

For each claim with spans, run a focused check. The prompt is deliberately narrow — one claim, its cited text, a three-way verdict:

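import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
# sentence and span_text come from one claim object built in Step 2
response = client.messages.create(
    model="claude-haiku-4-5", max_tokens=10,
    messages=[{"role": "user", "content":
        f"Claim: {sentence}\nEvidence: {span_text}\n"
        "Reply with one word: SUPPORTED, PARTIAL, or UNSUPPORTED."}],
)
verdict = response.content[0].text.strip().upper()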

Fan these out in parallel — claims are independent, so latency stays close to a single call. Aggregate the verdicts. UNSUPPORTED claims should not reach the user unannotated; either suppress them, re-run generation with a stricter instruction, or surface them with a visible warning. This step is cheap on Haiku and turns "we have citations" into "our citations passed a check."
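
One way to fan the checks out, sketched with the standard library's ThreadPoolExecutor. The check callable is an assumption standing in for the Haiku call above, so the sketch runs without network access; verify_all, normalize, and stub_check are hypothetical names, and the stub is a toy substring test, not real entailment. Treating unrecognized replies as UNSUPPORTED is a design choice of this sketch: it fails closed rather than letting a malformed verdict pass.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

VERDICTS = ("SUPPORTED", "PARTIAL", "UNSUPPORTED")


def normalize(raw: str) -> str:
    """Map free-form model output onto the three verdicts; anything else fails closed."""
    word = raw.strip().upper().rstrip(".")
    return word if word in VERDICTS else "UNSUPPORTED"


def verify_all(claims: list[dict], check: Callable[[str, str], str],
               max_workers: int = 8) -> list[dict]:
    """Run check(sentence, evidence) for every cited claim in parallel threads.

    Claims without spans are skipped; they belong to the uncited branch.
    Results come back in the same order as the cited claims.
    """
    def run(claim: dict) -> dict:
        evidence = "\n".join(span["text"] for span in claim["spans"])
        return {**claim, "verdict": normalize(check(claim["sentence"], evidence))}

    cited = [c for c in claims if c["spans"]]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run, cited))


def stub_check(sentence: str, evidence: str) -> str:
    """Toy stand-in for the model call: exact containment only."""
    return "supported." if evidence in sentence else "not sure"


faq = "Payouts run on the 1st and 15th of each month."
claims = [
    {"sentence": faq, "spans": [{"text": faq}]},
    {"sentence": "Payouts run weekly.", "spans": [{"text": faq}]},
    {"sentence": "Contractors love this.", "spans": []},
]
results = verify_all(claims, stub_check)
summary = Counter(r["verdict"] for r in results)
```

Threads fit here because each check spends its time waiting on the network; in production you would also want retries and a rate limit around the real call.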

Step 5: render trust into the UI

Provenance only helps if users see it. Convert each claim's spans into numbered footnotes that, on click, scroll the source pane to the highlighted character range. Show the document title near the citation. For PARTIAL or UNSUPPORTED claims, use a distinct visual treatment — a muted color, a small caution icon — so the interface is honest about confidence. The goal is that a skeptical reader can verify any sentence in one interaction.
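
A plain-text sketch of that rendering, useful as a starting point before you wire up a real front end. The function name render and the "(uncited)" and "(check source)" labels are hypothetical; a web UI would emit links and icons where this emits bracketed markers.

```python
def render(claims: list[dict]) -> tuple[str, list[str]]:
    """Return (body, footnotes): body text with [n] markers plus a footnote list.

    Uncited claims and claims with a weak verdict get a visible label
    so the reader can see which sentences deserve a second look.
    """
    parts: list[str] = []
    notes: list[str] = []
    for claim in claims:
        markers = ""
        for span in claim["spans"]:
            # Footnotes are numbered globally across the whole answer.
            notes.append(
                f'[{len(notes) + 1}] {span["doc"]}, '
                f'chars {span["start"]}-{span["end"]}: "{span["text"]}"'
            )
            markers += f"[{len(notes)}]"
        if not claim["spans"]:
            markers = " (uncited)"
        elif claim.get("verdict") in ("PARTIAL", "UNSUPPORTED"):
            markers += " (check source)"
        parts.append(claim["sentence"].strip() + markers)
    return " ".join(parts), notes


answer = [
    {"sentence": "New contractors need a signed W-9.",
     "spans": [{"doc": "contractor-handbook",
                "text": "Onboarding requires a signed W-9 before the first payout.",
                "start": 0, "end": 57}],
     "verdict": "SUPPORTED"},
    {"sentence": "Payouts are usually fast.", "spans": []},
]
body, footnotes = render(answer)
```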

Common pitfalls

  • Reading the response as a string. If you concatenate content into plain text you discard every citation. Always iterate the block array.
  • Forgetting block order equals document index. Reordering chunks between request and parsing silently mis-attributes every citation.
  • Offsets relative to the wrong text. Returned indices are into the data you sent; track chunk-to-source offset maps or your deep links land in the wrong place.
  • No path for uncited sentences. Decide up front whether ungrounded claims are dropped, flagged, or regenerated — silence here ships unverified prose.
  • Verifying with an over-powered, expensive model. The check is a narrow entailment task; a small fast model handles it for a fraction of the cost.

Ship it in five steps

  1. Convert retrieved chunks into document blocks with citations enabled and stable ordering.
  2. Call Claude and parse the interleaved text-and-citation structure into claim objects.
  3. Translate returned offsets back to original-document character ranges.
  4. Run a parallel Haiku verifier returning a three-way verdict per claim.
  5. Render numbered footnotes that deep-link to highlighted spans and flag weak claims.

Frequently asked questions

What is the minimum to get citations from Claude?

Pass your sources as document content blocks and set citations: { enabled: true } on each. The response then includes per-sentence citation objects with document indices and character offsets.

Why parse the response instead of reading the text?

Citations live as structured metadata attached to individual text blocks. Flattening the content array to a string throws that metadata away, leaving you with prose and no provenance.

Do I need PDFs, or does plain text work?

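Plain text works and is simplest for the walkthrough. Claude also accepts PDF document sources; you enable citations the same way, but the returned citations are page_location objects that carry start and end page numbers instead of character indices, so you map them back to pages rather than character ranges.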

How many documents can I include per request?

Several, bounded by the context window. Keep ordering stable and titles descriptive, and retrieve tightly so you send the most relevant chunks rather than everything.


Source & attribution: This is an independent, original explainer inspired by Anthropic's coverage on the Claude blog. Claude, Claude Code, Claude Cowork, Claude Opus, and the Model Context Protocol are products and trademarks of Anthropic. CallSphere is not affiliated with or endorsed by Anthropic.
