Why Structured Output Matters

Production AI systems consume LLM outputs programmatically. Structured outputs (JSON, XML) parse cleanly; free-form prose does not. How reliably the model sticks to the requested format determines whether downstream code can consume its output without defensive fallbacks.

By 2026 three approaches dominate: JSON Schema validation, XML tagging, and native function-calling. This piece compares them.

The Three Approaches

flowchart TB
    JSON[JSON Schema validation] --> Strong1[Strong: machine-readable, validated]
    XML[XML tagging] --> Strong2[Strong: human-readable, flexible]
    Func[Function-call mode] --> Strong3[Strong: native LLM support, most reliable]

JSON Schema

A structured-output API (in OpenAI's API, "response_format" with "type": "json_schema") constrains the model to produce valid JSON matching a schema you supply, for example:

{
  "type": "object",
  "properties": {
    "intent": { "type": "string", "enum": ["book", "cancel", "reschedule"] },
    "patient_id": { "type": "string" }
  },
  "required": ["intent"]
}

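Modern providers (OpenAI, Anthropic, Google) support schema-constrained generation. With it enabled, a completed response parses and matches the schema. A response cut off by the token limit or replaced by a refusal does not, so check for both before parsing.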
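As a rough sketch of how the schema above might be wired into a request, the snippet below builds an OpenAI-style "response_format" value and re-checks the parsed result. The schema name intent_extraction and the helper parse_intent are hypothetical. The strict-mode adjustments (every property listed as required, additionalProperties set to false, the optional field made nullable) reflect OpenAI's documented strict-mode requirements as understood here; other providers use different request shapes.

```python
import json

# Strict-mode variant of the article's schema. Assumption: strict mode wants
# every property in "required" and additionalProperties=false, so the
# optional patient_id is modeled as a nullable string instead.
INTENT_SCHEMA = {
    "type": "object",
    "properties": {
        "intent": {"type": "string", "enum": ["book", "cancel", "reschedule"]},
        "patient_id": {"type": ["string", "null"]},
    },
    "required": ["intent", "patient_id"],
    "additionalProperties": False,
}

# Value for the request's "response_format" field (OpenAI-style shape).
# "intent_extraction" is a hypothetical schema name.
RESPONSE_FORMAT = {
    "type": "json_schema",
    "json_schema": {
        "name": "intent_extraction",
        "strict": True,
        "schema": INTENT_SCHEMA,
    },
}


def parse_intent(raw: str) -> dict:
    """Parse the model's text output and re-check the field we depend on."""
    data = json.loads(raw)  # raises json.JSONDecodeError on truncated output
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    allowed = INTENT_SCHEMA["properties"]["intent"]["enum"]
    if data.get("intent") not in allowed:
        raise ValueError(f"unexpected intent: {data.get('intent')!r}")
    return data
```

Re-checking the enum after parsing is cheap and still catches bad values when the same code path is reused with a provider or model that does not enforce the schema.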

Strengths: validated, machine-readable
Weaknesses: schema must be carefully designed; nested complex schemas can confuse the model
Best for: most structured-output use cases in 2026

XML Tagging

The model produces XML-tagged output:

<intent>book</intent>
<patient_id>a1b2c3</patient_id>
<rationale>The user explicitly asked to schedule.</rationale>

Anthropic's Claude is particularly tuned to XML tags.


Strengths: human-readable; flexible; allows mixed structured + free-form
Weaknesses: requires post-parsing; not validated by default
Best for: outputs combining structured data with prose; debugging
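The post-parsing step for tagged output can be sketched with a regular expression. A full XML parser is a poor fit here because the output has no single root element and may be surrounded by prose. The helper name extract_tags is hypothetical, and the sketch assumes tags are not nested inside one another.

```python
import re
from typing import Dict, List, Optional


def extract_tags(text: str, tags: List[str]) -> Dict[str, Optional[str]]:
    """Return the first <tag>...</tag> body for each requested tag, or None."""
    result: Dict[str, Optional[str]] = {}
    for tag in tags:
        name = re.escape(tag)
        # Non-greedy match; DOTALL lets a value span multiple lines.
        match = re.search(rf"<{name}>(.*?)</{name}>", text, re.DOTALL)
        result[tag] = match.group(1).strip() if match else None
    return result


sample = """<intent>book</intent>
<patient_id>a1b2c3</patient_id>
<rationale>The user explicitly asked to schedule.</rationale>"""

fields = extract_tags(sample, ["intent", "patient_id", "rationale", "notes"])
```

A missing tag comes back as None rather than raising, so the caller decides which fields are mandatory.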

Function Calling

The model emits a tool call with structured arguments:

function_call: book_appointment(patient_id="a1b2c3", start_time="2026-04-25T10:00:00")

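Native function-calling APIs take a JSON Schema for each tool's parameters and return the tool name and its arguments as structured data, so no text parsing is needed.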

Strengths: most reliable; deeply trained behavior; integrates with agentic flows
Weaknesses: awkward when you want output but no actual side effect
Best for: agentic workflows; pure-extraction tasks via "fake" tool calls
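To make the flow concrete, here is a minimal sketch of a tool definition and a dispatcher that routes a model-emitted call to a Python function. The wrapper keys around the definition differ by provider, and the handler body, the HANDLERS registry, and dispatch are all hypothetical. The sketch assumes arguments arrive as a JSON string, as they do in OpenAI's API; some providers hand back an already-parsed object, in which case the json.loads step is unnecessary.

```python
import json
from typing import Any, Callable, Dict

# Tool definition using JSON Schema for the parameters. The exact wrapper
# keys differ per provider; this layout is illustrative.
BOOK_APPOINTMENT_TOOL = {
    "name": "book_appointment",
    "description": "Book an appointment for a patient.",
    "parameters": {
        "type": "object",
        "properties": {
            "patient_id": {"type": "string"},
            "start_time": {"type": "string", "description": "ISO 8601 datetime"},
        },
        "required": ["patient_id", "start_time"],
    },
}


def book_appointment(patient_id: str, start_time: str) -> dict:
    """Hypothetical handler; a real one would write to a scheduling system."""
    return {"status": "booked", "patient_id": patient_id, "start_time": start_time}


HANDLERS: Dict[str, Callable[..., Any]] = {"book_appointment": book_appointment}


def dispatch(name: str, arguments_json: str) -> Any:
    """Route a model-emitted tool call to the matching Python function."""
    if name not in HANDLERS:
        raise KeyError(f"model requested unknown tool: {name}")
    args = json.loads(arguments_json)  # assumed to be a JSON string
    return HANDLERS[name](**args)


result = dispatch(
    "book_appointment",
    '{"patient_id": "a1b2c3", "start_time": "2026-04-25T10:00:00"}',
)
```

For a pure-extraction task, the same shape works with a handler that simply returns its arguments instead of performing a side effect.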

Decision Matrix

flowchart TD
    Q1{Output triggers an action?} -->|Yes| Func2[Function call]
    Q1 -->|No| Q2{Pure structured data?}
    Q2 -->|Yes| JSON2[JSON Schema]
    Q2 -->|No, mixed prose + struct| XML2[XML tags]
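The same decision can be written as a small function, which is handy when the choice is made per request rather than once per project. The function name and return labels are hypothetical.

```python
def choose_mode(triggers_action: bool, pure_structured: bool) -> str:
    """Mirror the decision matrix: action first, then data shape."""
    if triggers_action:
        return "function_call"
    if pure_structured:
        return "json_schema"
    return "xml_tags"
```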

Common Failures

Over-nested schemas: deeply nested structured output is harder for models. Flatten.
Optional vs required confusion: be explicit about which fields are required.
Hallucinated values: even structured outputs can have wrong values. Validate semantics, not just structure.
Mixed natural language and JSON: many models add a preamble before the JSON if not constrained. Use the provider's JSON or structured-output mode to suppress it.
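When a constrained mode is not available, a defensive fallback for the preamble problem is to scan for the first decodable JSON object in the reply. This is a sketch using only the standard library; first_json_object is a hypothetical helper, and it is a stopgap rather than a substitute for the provider's native mode.

```python
import json
from typing import Any, Optional


def first_json_object(text: str) -> Optional[Any]:
    """Return the first decodable JSON object embedded in text, else None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at the given index
            # and ignores whatever follows it.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            start = text.find("{", start + 1)
    return None


reply = 'Sure! Here is the result:\n{"intent": "cancel", "patient_id": "a1b2c3"}\nLet me know.'
parsed = first_json_object(reply)
```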

Schema Design Patterns

Good: flat structure with explicit types
{
  "intent": "string with enum",
  "confidence": "number 0-1",
  "extracted_entities": ["array of strings"]
}

Avoid: deeply nested with anyOf branches
{
  "result": {
    "type": "anyOf...",
    "subtype": { "anyOf": ... }
  }
}

Frontier models in 2026 handle reasonable nesting well; pathologically nested schemas remain risky.
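One way to keep nesting in check is to measure it before a schema ships. The sketch below counts object and array levels in a JSON Schema, following properties, items, and anyOf/oneOf/allOf branches. The function name is hypothetical, it assumes a well-formed schema without $ref indirection, and any threshold you enforce is a judgment call.

```python
from typing import Any


def schema_depth(schema: Any) -> int:
    """Count how many levels of object/array nesting a JSON Schema describes."""
    if not isinstance(schema, dict):
        return 0
    children = list(schema.get("properties", {}).values())
    if isinstance(schema.get("items"), dict):
        children.append(schema["items"])
    for key in ("anyOf", "oneOf", "allOf"):
        children.extend(schema.get(key, []))
    here = 1 if schema.get("type") in ("object", "array") else 0
    return here + max((schema_depth(child) for child in children), default=0)


flat = {
    "type": "object",
    "properties": {
        "intent": {"type": "string"},
        "confidence": {"type": "number"},
        "extracted_entities": {"type": "array", "items": {"type": "string"}},
    },
}
```

A check such as schema_depth(flat) <= 2 could run in CI so that a schema change that adds a level of nesting gets a second look.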

Validation Layers

Beyond schema validation:

Check enums match expected values
Range-check numbers
Pattern-match strings (UUID, email, dates)
Cross-field validation (start_time < end_time)
Domain validation (patient_id exists in DB)

The schema validates structure; your code validates semantics.
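The checks listed above might look like the following in application code. This is a sketch under stated assumptions: the six-character patient ID pattern is inferred from the sample value a1b2c3, the known_patients set stands in for a database lookup, and the confidence and end_time fields are illustrative.

```python
import re
from datetime import datetime
from typing import List, Set

ALLOWED_INTENTS = ("book", "cancel", "reschedule")
# Hypothetical ID format inferred from the article's sample value "a1b2c3".
PATIENT_ID_RE = re.compile(r"^[a-z0-9]{6}$")


def semantic_errors(data: dict, known_patients: Set[str]) -> List[str]:
    """Return human-readable problems; an empty list means the payload passed."""
    errors: List[str] = []
    # Enum check.
    if data.get("intent") not in ALLOWED_INTENTS:
        errors.append("intent is not an allowed value")
    # Range check (bool is excluded because it is a subclass of int).
    confidence = data.get("confidence")
    if (isinstance(confidence, bool)
            or not isinstance(confidence, (int, float))
            or not 0 <= confidence <= 1):
        errors.append("confidence must be a number in [0, 1]")
    # Pattern check, then domain check.
    patient_id = data.get("patient_id")
    if not isinstance(patient_id, str) or not PATIENT_ID_RE.match(patient_id):
        errors.append("patient_id has the wrong format")
    elif patient_id not in known_patients:  # stands in for a database lookup
        errors.append("patient_id does not exist")
    # Cross-field check.
    try:
        start = datetime.fromisoformat(data["start_time"])
        end = datetime.fromisoformat(data["end_time"])
        if start >= end:
            errors.append("start_time must be before end_time")
    except (KeyError, TypeError, ValueError):
        errors.append("start_time/end_time missing or not comparable ISO 8601")
    return errors
```

Collecting every error instead of raising on the first one makes it easier to feed the full list back to the model in a retry prompt.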

Mixed Mode

For complex outputs, combine:

Function call for the action
XML tags inside the function's arguments for nuanced fields

The 2026 pattern that works: structured shell, free-form internals where flexibility helps.
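A minimal sketch of the combination: the tool call's arguments carry typed fields for the action plus one free-form string whose internals are organized with XML tags. The notes field, the caveat tag, and the sample values are hypothetical.

```python
import json
import re

# Hypothetical tool-call arguments: typed fields for the action, plus one
# free-form string whose internals are organized with XML tags.
arguments_json = json.dumps({
    "patient_id": "a1b2c3",
    "start_time": "2026-04-25T10:00:00",
    "notes": "<rationale>The user explicitly asked to schedule.</rationale>"
             "<caveat>Caller was unsure about the exact time.</caveat>",
})

args = json.loads(arguments_json)
# Each match is a (tag, body) pair; the backreference \1 requires the closing
# tag to match the opening one.
notes = dict(re.findall(r"<(\w+)>(.*?)</\1>", args["notes"], re.DOTALL))
```

The action itself depends only on the typed fields, so a malformed tag inside notes degrades the explanation without blocking the booking.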

What Frontier Models Do Best

OpenAI: JSON Schema mode is the most-tuned structured output path
Anthropic Claude: XML tags, then function calling, then JSON
Google Gemini: JSON Schema; function calling
Open-weights: varies; Llama, Qwen3, DeepSeek all have function-calling but quality varies

For maximum reliability, use the provider's native structured-output mode and benchmark.

Sources

OpenAI structured outputs — https://platform.openai.com/docs/guides/structured-outputs
Anthropic XML tags — https://docs.anthropic.com/claude/docs/use-xml-tags
Anthropic tool use — https://docs.anthropic.com/claude/docs/tool-use
Outlines library — https://github.com/dottxt-ai/outlines
"Structured generation" research — https://arxiv.org
