Structured Output Prompts: JSON Schema, XML, and Function-Call Modes
Three structured-output approaches, three different reliability profiles. The 2026 best practices for getting clean structured output from LLMs.
Why Structured Output Matters
Production AI systems consume LLM outputs programmatically. Structured outputs (JSON, XML) parse cleanly; free-form prose does not. How consistently the model produces that structure determines whether downstream code can consume it directly or needs defensive parsing and retries.
By 2026 three approaches dominate: JSON Schema validation, XML tagging, and native function-calling. This piece compares them.
The Three Approaches
flowchart TB
JSON[JSON Schema validation] --> Strong1[Strong: machine-readable, validated]
XML[XML tagging] --> Strong2[Strong: human-readable, flexible]
Func[Function-call mode] --> Strong3[Strong: native LLM support, most reliable]
JSON Schema
The structured-output API (for example, "response_format" with type "json_schema") constrains the model to produce valid JSON matching your schema:
{
  "type": "object",
  "properties": {
    "intent": { "type": "string", "enum": ["book", "cancel", "reschedule"] },
    "patient_id": { "type": "string" }
  },
  "required": ["intent"]
}
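Modern providers (OpenAI, Anthropic, Google) support schema-constrained generation. With the constrained mode enabled, output is designed to parse and match the schema. Refusals and replies cut off by the token limit can still produce something that does not conform, so keep a parse check in place.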
- Strengths: validated, machine-readable
- Weaknesses: schema must be carefully designed; nested complex schemas can confuse the model
- Best for: most structured-output use cases in 2026
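As a safety net on the consuming side, the example schema above can be checked by hand with the standard library. This is a minimal sketch, not a general JSON Schema validator: the function name parse_intent_payload and the constant ALLOWED_INTENTS are made up for illustration, and a full validator library would cover far more keywords.

```python
import json

# Hypothetical constant mirroring the "enum" in the example schema.
ALLOWED_INTENTS = {"book", "cancel", "reschedule"}


def parse_intent_payload(raw: str) -> dict:
    """Parse a model reply and check it against the example schema by hand."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if "intent" not in data:
        raise ValueError("missing required field: intent")
    intent = data["intent"]
    if not isinstance(intent, str) or intent not in ALLOWED_INTENTS:
        raise ValueError(f"unexpected intent: {intent!r}")
    # patient_id is optional in the schema, but must be a string when present.
    if "patient_id" in data and not isinstance(data["patient_id"], str):
        raise ValueError("patient_id must be a string")
    return data


payload = parse_intent_payload('{"intent": "book", "patient_id": "a1b2c3"}')
print(payload["intent"])
```

Raising ValueError for every failure keeps the caller's retry logic to a single except clause.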
XML Tagging
The model produces XML-tagged output:
<intent>book</intent>
<patient_id>a1b2c3</patient_id>
<rationale>The user explicitly asked to schedule.</rationale>
Anthropic's Claude is particularly tuned to XML tags.
- Strengths: human-readable; flexible; allows mixed structured + free-form
- Weaknesses: requires post-parsing; not validated by default
- Best for: outputs combining structured data with prose; debugging
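The post-parsing step for tagged output can be quite small. The sketch below assumes flat, non-nested tags like the ones in the example and uses a regular expression rather than an XML parser, because model replies often lack a single root element and may include surrounding prose. The names TAG_PATTERN and extract_tags are illustrative.

```python
import re

# Matches <tag>body</tag> pairs; the closing tag must repeat the opening name.
TAG_PATTERN = re.compile(
    r"<(?P<tag>[A-Za-z_][\w-]*)>(?P<body>.*?)</(?P=tag)>", re.DOTALL
)


def extract_tags(text: str) -> dict[str, str]:
    """Return {tag: content} for each flat tag pair; a repeated tag keeps its last value."""
    return {
        match.group("tag"): match.group("body").strip()
        for match in TAG_PATTERN.finditer(text)
    }


reply = (
    "Sure, here is the result.\n"
    "<intent>book</intent>\n"
    "<patient_id>a1b2c3</patient_id>\n"
    "<rationale>The user explicitly asked to schedule.</rationale>"
)
fields = extract_tags(reply)
print(fields["intent"], fields["patient_id"])
```

Text outside the tags is ignored, and nothing here validates the values, so the same semantic checks you would run on JSON still apply.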
Function Calling
The model emits a tool call with structured arguments:
function_call: book_appointment(patient_id="a1b2c3", start_time="2026-04-25T10:00:00")
Native function-calling APIs handle the structuring.
- Strengths: most reliable; deeply trained behavior; integrates with agentic flows
- Weaknesses: awkward when you want output but no actual side effect
- Best for: agentic workflows; also pure-extraction tasks, by defining a "fake" tool whose arguments are the fields you want and never executing it
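On the application side, a tool call usually arrives as a tool name plus JSON-encoded arguments, which your code routes to a handler. The sketch below is provider-neutral and makes several assumptions: the shape of BOOK_APPOINTMENT_TOOL (name, description, parameters) is a common pattern but field names differ between APIs, and book_appointment is a stand-in that does not touch a real scheduling system.

```python
import json
from datetime import datetime

# Hypothetical tool definition: a name, a description, and a JSON Schema for arguments.
# Field names differ between providers; adapt this to the API you use.
BOOK_APPOINTMENT_TOOL = {
    "name": "book_appointment",
    "description": "Book an appointment for a patient.",
    "parameters": {
        "type": "object",
        "properties": {
            "patient_id": {"type": "string"},
            "start_time": {"type": "string", "description": "ISO 8601 timestamp"},
        },
        "required": ["patient_id", "start_time"],
    },
}


def book_appointment(patient_id: str, start_time: str) -> dict:
    """Stand-in handler; a real one would write to a scheduling system."""
    parsed = datetime.fromisoformat(start_time)  # raises ValueError if malformed
    return {"status": "booked", "patient_id": patient_id, "start_time": parsed.isoformat()}


HANDLERS = {"book_appointment": book_appointment}


def dispatch(tool_name: str, arguments_json: str) -> dict:
    """Route a tool call (name plus JSON-encoded arguments) to its handler."""
    if tool_name not in HANDLERS:
        raise KeyError(f"unknown tool: {tool_name}")
    arguments = json.loads(arguments_json)
    return HANDLERS[tool_name](**arguments)


result = dispatch(
    "book_appointment",
    '{"patient_id": "a1b2c3", "start_time": "2026-04-25T10:00:00"}',
)
print(result)
```

For the "fake" tool pattern, the dispatch step is simply skipped: the parsed arguments are the extraction result.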
Decision Matrix
flowchart TD
Q1{Output triggers an action?} -->|Yes| Func2[Function call]
Q1 -->|No| Q2{Pure structured data?}
Q2 -->|Yes| JSON2[JSON Schema]
Q2 -->|No, mixed prose + struct| XML2[XML tags]
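The same decision can be written as a few lines of code, which is handy if the choice is made per request. This is only a restatement of the flowchart; the function name and the returned labels are illustrative.

```python
def choose_output_mode(triggers_action: bool, pure_structured: bool) -> str:
    """Mirror the decision matrix: action first, then pure data versus mixed output."""
    if triggers_action:
        return "function_call"
    if pure_structured:
        return "json_schema"
    return "xml_tags"


print(choose_output_mode(triggers_action=False, pure_structured=True))
```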
Common Failures
- Over-nested schemas: deeply nested structured output is harder for models. Flatten.
- Optional vs required confusion: be explicit about which fields are required.
- Hallucinated values: even structured outputs can have wrong values. Validate semantics, not just structure.
- Mixed natural language and JSON: many models will add a preamble before the JSON if not constrained. Use the API's structured-output mode to suppress it.
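When a constrained mode is not available, the preamble problem can be handled defensively on the parsing side. The sketch below scans for the first position where a JSON object decodes successfully, using the standard library's JSONDecoder.raw_decode. The helper name extract_first_json_object is made up, and this is a fallback rather than a substitute for the provider's structured-output mode.

```python
import json


def extract_first_json_object(text: str) -> dict:
    """Return the first JSON object embedded in text, skipping surrounding prose."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one value starting at `start` and ignores what follows.
            value, _end = decoder.raw_decode(text, start)
            return value
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next candidate.
            start = text.find("{", start + 1)
    raise ValueError("no JSON object found in text")


reply = 'Sure! Here is the JSON you asked for:\n{"intent": "book"}\nLet me know if you need more.'
print(extract_first_json_object(reply))
```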
Schema Design Patterns
Good: flat structure with explicit types
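{
  "type": "object",
  "properties": {
    "intent": { "type": "string", "enum": ["book", "cancel", "reschedule"] },
    "confidence": { "type": "number", "minimum": 0, "maximum": 1 },
    "extracted_entities": { "type": "array", "items": { "type": "string" } }
  },
  "required": ["intent", "confidence", "extracted_entities"]
}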
Avoid: deeply nested with anyOf branches
{
"result": {
"type": "anyOf...",
"subtype": { "anyOf": ... }
}
}
Frontier models in 2026 handle reasonable nesting well; pathologically nested schemas remain risky.
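One way to catch over-nesting before it reaches a model is to measure schema depth in a lint step. The sketch below counts levels through "properties", "items", and "anyOf"/"oneOf"/"allOf"; it ignores "$ref" and other keywords. The function name and the MAX_DEPTH threshold are arbitrary assumptions, not a published limit.

```python
MAX_DEPTH = 3  # arbitrary illustrative threshold; tune it against your own results


def schema_depth(schema: dict) -> int:
    """Count nesting levels through properties, items, and anyOf/oneOf/allOf."""
    children = list(schema.get("properties", {}).values())
    if isinstance(schema.get("items"), dict):
        children.append(schema["items"])
    for key in ("anyOf", "oneOf", "allOf"):
        children.extend(schema.get(key, []))
    children = [child for child in children if isinstance(child, dict)]
    if not children:
        return 0
    return 1 + max(schema_depth(child) for child in children)


flat = {
    "type": "object",
    "properties": {"intent": {"type": "string"}, "confidence": {"type": "number"}},
}
nested = {
    "type": "object",
    "properties": {
        "result": {
            "anyOf": [
                {
                    "type": "object",
                    "properties": {"subtype": {"anyOf": [{"type": "string"}]}},
                }
            ]
        }
    },
}
for name, schema in (("flat", flat), ("nested", nested)):
    depth = schema_depth(schema)
    print(name, depth, "too deep" if depth > MAX_DEPTH else "ok")
```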
Validation Layers
Beyond schema validation:
- Check enums match expected values
- Range-check numbers
- Pattern-match strings (UUID, email, dates)
- Cross-field validation (start_time < end_time)
- Domain validation (patient_id exists in DB)
The schema validates structure; your code validates semantics.
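The layers listed above might look like this in code. It is a sketch under several assumptions: the payload fields (confidence, start_time, end_time) are borrowed from examples elsewhere in this piece, patient IDs are assumed to be UUIDs, and KNOWN_PATIENT_IDS is an in-memory stand-in for a real database lookup.

```python
import re
from datetime import datetime

UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", re.IGNORECASE
)
ALLOWED_INTENTS = {"book", "cancel", "reschedule"}
# Stand-in for a database lookup; a real system would query its patient store.
KNOWN_PATIENT_IDS = {"3f2b8c1e-9a4d-4e6f-8b7a-1c2d3e4f5a6b"}


def semantic_errors(payload: dict) -> list[str]:
    """Return human-readable problems; an empty list means the payload passed."""
    errors = []
    intent = payload.get("intent")
    if not isinstance(intent, str) or intent not in ALLOWED_INTENTS:  # enum check
        errors.append("intent: not an allowed value")
    confidence = payload.get("confidence")
    is_number = isinstance(confidence, (int, float)) and not isinstance(confidence, bool)
    if not is_number or not 0 <= confidence <= 1:  # range check
        errors.append("confidence: must be a number in [0, 1]")
    patient_id = payload.get("patient_id")
    if not isinstance(patient_id, str) or not UUID_RE.match(patient_id):  # pattern check
        errors.append("patient_id: not a UUID")
    elif patient_id not in KNOWN_PATIENT_IDS:  # domain check
        errors.append("patient_id: unknown patient")
    try:
        start = datetime.fromisoformat(payload["start_time"])
        end = datetime.fromisoformat(payload["end_time"])
        if not start < end:  # cross-field check
            errors.append("start_time: must be before end_time")
    except (KeyError, TypeError, ValueError):
        errors.append("start_time/end_time: missing, malformed, or not comparable")
    return errors


good = {
    "intent": "book",
    "confidence": 0.9,
    "patient_id": "3f2b8c1e-9a4d-4e6f-8b7a-1c2d3e4f5a6b",
    "start_time": "2026-04-25T10:00:00",
    "end_time": "2026-04-25T10:30:00",
}
print(semantic_errors(good))
```

Collecting every error instead of failing on the first one gives you a complete message to feed back to the model on a retry.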
Mixed Mode
For complex outputs, combine:
- Function call for the action
- XML tags inside the function's arguments for nuanced fields
The 2026 pattern that works: structured shell, free-form internals where flexibility helps.
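A rough sketch of the combination: the function call carries the typed fields, and one string argument holds XML-tagged free-form content that is unpacked afterwards. The "notes" argument, its tag names, and parse_tool_call are all hypothetical; nothing here is a provider API.

```python
import json
import re


def parse_tool_call(arguments_json: str) -> dict:
    """Decode tool arguments, then unpack flat XML tags found in the 'notes' field."""
    args = json.loads(arguments_json)
    notes = args.get("notes", "")
    # Each match is a (tag, body) pair; \1 forces the closing tag to match the opener.
    tags = dict(re.findall(r"<(\w+)>(.*?)</\1>", notes, re.DOTALL))
    return {**args, "notes": tags}


arguments_json = json.dumps(
    {
        "patient_id": "a1b2c3",
        "start_time": "2026-04-25T10:00:00",
        "notes": "<rationale>User asked for the earliest slot.</rationale>"
        "<caveat>Prefers mornings.</caveat>",
    }
)
parsed = parse_tool_call(arguments_json)
print(parsed["notes"]["rationale"])
```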
What Frontier Models Do Best
- OpenAI: JSON Schema mode is the most-tuned structured output path
- Anthropic Claude: XML tags, then function calling, then JSON
- Google Gemini: JSON Schema; function calling
- Open-weights: Llama, Qwen3, and DeepSeek all support function calling, but quality varies by model and serving stack
For maximum reliability, use the provider's native structured-output mode and benchmark it on your own prompts and schemas.
Sources
- OpenAI structured outputs — https://platform.openai.com/docs/guides/structured-outputs
- Anthropic XML tags — https://docs.anthropic.com/claude/docs/use-xml-tags
- Anthropic tool use — https://docs.anthropic.com/claude/docs/tool-use
- Outlines library — https://github.com/dottxt-ai/outlines
- "Structured generation" research — https://arxiv.org