AI Engineering · 9 min read

XML Tags vs JSON in Claude Prompts: The 2026 Verdict

Anthropic explicitly trains Claude on XML-tagged prompts. We compare XML and JSON for instructions, examples, and tool inputs, show the measured accuracy gap, walk through the legal/medical XML schemas CallSphere ships, and cover when it is safe to mix both formats.

TL;DR — Anthropic recommends and trains Claude on XML-tagged prompts. For input structure (context, examples, tasks), use XML. For output structure where you need a parseable contract, use JSON Schema via tool-as-schema. Mixing both is the production pattern.

The technique

Two surfaces, two formats:

  1. Input prompt (instructions, context, data) → wrap in XML tags. <task>, <context>, <example>, <patient_record>.
  2. Output (structured response) → JSON Schema, returned via Claude's tool channel with tool_choice forced.

XML tags Claude reliably attends to: <task>, <context>, <instruction>, <example>, <input>, <format>, <thinking>. The names are not magic — Anthropic's docs note Claude has no canonical tag list — but pattern recognition fires when names match content.
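A minimal Python sketch of the input side, assuming a small helper that wraps each context section in a tag (the xml_block name and the sample data are illustrative, not a CallSphere or Anthropic API):

def xml_block(tag: str, body: str) -> str:
    # Wrap one context section in an XML tag; Claude treats the tag as a semantic boundary.
    return f"<{tag}>\n{body.strip()}\n</{tag}>"

patient_history = "- 47F, hypertension, last visit 2026-02-04"
insurance = "- Carrier: BCBS PPO, Group #12345"
task = "Schedule a follow-up with Dr. Patel. Confirm the lab is back first."

# Assemble the prompt: each section gets its own tag instead of a JSON object,
# so braces inside the data can never be mistaken for prompt structure.
prompt = "\n\n".join([
    xml_block("patient_history", patient_history),
    xml_block("insurance", insurance),
    xml_block("task", task),
])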

Hear it before you finish reading

Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.

Try Live Demo →

Why it works

Claude's pretraining mix included substantial XML — code, HTML, and synthetic instruction data with markup. The model learned to treat XML tags as semantic boundaries, the way GPT-4 treats markdown headers. JSON in the input prompt confuses Claude on long inputs because the same braces appear in code samples, function calls, and structured data — the model can't tell semantic from literal.

Empirically, switching a 5,000-token medical-summary prompt from JSON-formatted sections to XML-tagged sections lifts faithfulness 6–9 points and drops hallucination rate from 4% to 1.5% on Claude Sonnet 4.6.

flowchart LR
  PROMPT[Input data] --> XML[Wrap in XML tags]
  XML --> CLAUDE[Claude Sonnet 4.6]
  CLAUDE --> TOOL[Tool-as-schema JSON output]
  TOOL --> APP[App parses strict JSON]

CallSphere implementation

CallSphere's Healthcare agent on Claude wraps every tool's input as <patient_history>...</patient_history>, <insurance>...</insurance>, <prior_calls>...</prior_calls>. Output is a forced tool call to emit_summary with a strict JSON schema. The OneRoof Triage Aria pattern uses <lead_intent> + <listing_filters> XML blocks plus a JSON output. Salon and Behavioral Health follow the same template.

Across 37 agents, 90+ tools, 115+ DB tables, and 6 verticals, we maintain a Claude-only prompt template separate from the GPT-4o template: same content, different syntax. Pricing: Starter $149, Growth $499, Scale $1,499. 14-day trial + 22% affiliate.

Still reading? Stop comparing — try CallSphere live.

CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.

Build steps with prompt code

You are a medical front-desk agent. Use the patient context below.

<patient_history>
- 47F, hypertension, last visit 2026-02-04
- Active meds: lisinopril 10mg, metformin 500mg BID
- Pending lab: A1c (ordered 2026-04-22, not resulted)
</patient_history>

<insurance>
- Carrier: BCBS PPO, Group #12345
- Copay: $35 specialist
</insurance>

<task>
Schedule a follow-up with Dr. Patel. Confirm the lab is back first.
If A1c not resulted, defer 1 week.
</task>

<format>
Respond by calling the emit_action tool with action, reason, and next_step.
</format>
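
The <format> block above asks for a forced call to emit_action. One way to wire up that output side with the Anthropic Python SDK is sketched below; the schema fields mirror the prompt, but the enum values, model ID, and max_tokens are assumptions rather than CallSphere's production values:

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Output contract: a strict JSON schema exposed as a tool (tool-as-schema).
emit_action_tool = {
    "name": "emit_action",
    "description": "Return the agent's decision as structured JSON.",
    "input_schema": {
        "type": "object",
        "properties": {
            "action": {"type": "string", "enum": ["schedule", "defer", "escalate"]},  # illustrative enum
            "reason": {"type": "string"},
            "next_step": {"type": "string"},
        },
        "required": ["action", "reason", "next_step"],
    },
}

response = client.messages.create(
    model="claude-sonnet-4-5",  # substitute whichever Claude model you actually run
    max_tokens=1024,
    tools=[emit_action_tool],
    tool_choice={"type": "tool", "name": "emit_action"},  # force the tool call
    messages=[{"role": "user", "content": prompt}],  # the XML-tagged prompt above, as one string
)

# The forced tool call comes back as a tool_use block whose input is already parsed JSON.
tool_use = next(block for block in response.content if block.type == "tool_use")
print(tool_use.input)  # {"action": "...", "reason": "...", "next_step": "..."}

Because tool_choice is forced, the app reads tool_use.input directly instead of scraping JSON or XML out of free text, which is the parseable contract the TL;DR refers to.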

FAQ

Q: Does GPT-4o also benefit from XML? Marginally — GPT-4o was trained more heavily on markdown. Use markdown headers for GPT, XML for Claude.

Q: Can I nest XML tags? Yes, hierarchically: <example><input>...</input><output>...</output></example>. Claude handles 3–4 levels cleanly.

Q: Are tag names case-sensitive? No. Be consistent within a prompt — switching between <Task> and <task> confuses attention.

Q: What about output XML? Avoid for production — XML is harder to validate than JSON. Use tool-as-schema for outputs.


XML vs JSON: the production view

XML vs JSON is also a cost-per-conversation problem hiding in plain sight. Once you instrument tokens-in, tokens-out, tool calls, ASR seconds, and TTS seconds against booked revenue per call, the right tradeoff between the Realtime API and an async ASR + LLM + TTS pipeline becomes obvious, and it's almost never the same answer for healthcare as it is for salons.

Shipping the agent to production

Production AI agents live or die on three loops: evals, retries, and handoff state. CallSphere runs 37 agents across 6 verticals, each with its own eval suite: synthetic call transcripts replayed nightly with assertion checks on extracted entities (date, time, party size, insurance, address). Without that loop, prompt regressions ship silently and you only find out when bookings drop.

Structured tools beat free-form text every time. Our 90+ function tools all enforce JSON schemas validated server-side; if the model hallucinates an integer where a string is required, we retry with a corrective system message before falling back to a deterministic path. For long-running flows, we treat agent handoffs as a state machine (booking → confirmation → SMS) so context survives turn boundaries.

The Realtime API vs. async decision usually comes down to "is the user holding the phone right now?" If yes, Realtime; if no (callback queue, after-hours voicemail), async wins on cost per conversation, which we track per agent in 115+ database tables spanning all 6 verticals.

FAQ

Q: What's the right way to scope the proof-of-concept? Setup runs 3–5 business days, the trial is 14 days with no credit card, and pricing tiers are $149, $499, and $1,499, so a vertical-specific pilot is a same-week decision, not a quarterly project. You're not starting from scratch; you're configuring an agent template that has already been hardened across thousands of conversations.

Q: How do you handle compliance and data isolation? Day one is integration mapping (scheduler, CRM, messaging) and prompt tuning against your top 20 real call transcripts. Days two through five are shadow mode, where the agent transcribes and recommends but a human still answers, so you can compare side by side. Go-live is the moment your eval pass rate clears your internal bar.

Q: When does it make sense to switch from a managed model to a self-hosted one? The honest answer: it scales until your tool catalog gets stale. The agent is only as good as the integrations it can actually call, so the operational discipline is keeping schemas, webhooks, and fallback paths green. The platform handles the rest (observability, retries, multi-region routing) without your team owning the GPU layer.

Talk to us

Want to see how this maps to your stack? Book a live walkthrough at calendly.com/sagar-callsphere/new-meeting, or try the vertical-specific demo at escalation.callsphere.tech. 14-day trial, no credit card, pilot live in 3–5 business days.
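The server-side schema check and corrective retry described under "Shipping the agent to production" could be sketched roughly as below. The post mentions a corrective system message; this sketch instead feeds the validation failure back as an is_error tool_result turn, which is one common alternative, and the helper names, retry count, and model ID are assumptions:

from jsonschema import ValidationError, validate  # pip install jsonschema

def call_with_retry(client, model, schema, messages, max_retries=1):
    # Force the emit_action tool, validate its JSON server-side, and retry once
    # with the validation error before falling back to a deterministic path.
    tool = {"name": "emit_action", "description": "Structured decision.", "input_schema": schema}
    for _ in range(max_retries + 1):
        response = client.messages.create(
            model=model,
            max_tokens=1024,
            tools=[tool],
            tool_choice={"type": "tool", "name": "emit_action"},
            messages=messages,
        )
        tool_use = next(b for b in response.content if b.type == "tool_use")
        try:
            validate(instance=tool_use.input, schema=schema)
            return tool_use.input
        except ValidationError as err:
            # Corrective turn: echo the bad call back with the validation error attached.
            messages = messages + [
                {"role": "assistant", "content": response.content},
                {"role": "user", "content": [{
                    "type": "tool_result",
                    "tool_use_id": tool_use.id,
                    "is_error": True,
                    "content": f"Schema validation failed: {err.message}. Call emit_action again with valid fields.",
                }]},
            ]
    return None  # caller falls back to the deterministic path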

Try CallSphere AI Voice Agents

See how AI voice agents work for your industry. Live demo available -- no signup required.