
Handling Structured Output Failures: Retries, Fallbacks, and Partial Parsing

Build resilient structured output systems that handle LLM failures gracefully. Learn retry strategies with exponential backoff, fallback schemas, partial result recovery, and graceful degradation patterns.

Why Structured Outputs Fail

Even with OpenAI's constrained decoding and Pydantic validation, structured output extraction fails in production. Common failure modes include:

  • API errors: Rate limits (429), server errors (500), timeouts
  • Validation errors: The model returns valid JSON that fails your business logic validators
  • Content refusals: The model refuses to process the input due to safety filters
  • Malformed output: Rare with strict mode, but possible with JSON mode or local models
  • Hallucination: The JSON is valid and schema-conforming, but the extracted values are wrong

A production system must handle every one of these gracefully. Crashing on the first validation error is not acceptable.
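Each failure mode calls for a different response: transient API errors are worth retrying, validation failures need the model reprompted with feedback, and refusals or persistent failures should route to a fallback. A minimal classification sketch of that routing (the FailureKind enum and classify_failure helper are illustrative, not part of any library; matching on class names keeps the sketch decoupled from SDK imports):

```python
from enum import Enum, auto

class FailureKind(Enum):
    RETRYABLE = auto()   # transient: retry with backoff
    REPROMPT = auto()    # validation: retry with the error in the prompt
    FALLBACK = auto()    # refusal or persistent failure: simpler schema
    FATAL = auto()       # give up, log, return a degraded result

def classify_failure(error: Exception) -> FailureKind:
    """Map an exception to a handling strategy by class name,
    so this sketch needs no SDK imports."""
    name = type(error).__name__
    if name in {"RateLimitError", "APITimeoutError", "APIConnectionError"}:
        return FailureKind.RETRYABLE
    if name == "ValidationError":
        return FailureKind.REPROMPT
    if name in {"ContentFilterFinishReasonError", "LengthFinishReasonError"}:
        return FailureKind.FALLBACK
    return FailureKind.FATAL
```

The sections below implement each branch of this decision in turn.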

Retry with Exponential Backoff

The simplest resilience pattern is to retry failed API calls with increasing delays:

import time
import random
from functools import wraps
from typing import TypeVar, Callable
from openai import RateLimitError, APITimeoutError, APIConnectionError

T = TypeVar("T")

def retry_with_backoff(
    func: Callable[..., T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    retryable_errors: tuple = (RateLimitError, APITimeoutError, APIConnectionError),
) -> Callable[..., T]:
    """Decorator that retries a function with exponential backoff and jitter."""

    @wraps(func)  # preserve the wrapped function's name and docstring
    def wrapper(*args, **kwargs) -> T:
        last_error = None
        for attempt in range(max_retries):
            try:
                return func(*args, **kwargs)
            except retryable_errors as e:
                last_error = e
                if attempt == max_retries - 1:
                    break  # out of retries; don't sleep before raising
                delay = min(base_delay * (2 ** attempt) + random.uniform(0, 1), max_delay)
                print(f"Attempt {attempt + 1} failed: {e}. Retrying in {delay:.1f}s")
                time.sleep(delay)
            except Exception:
                raise  # Non-retryable errors propagate immediately
        raise last_error

    return wrapper

The jitter (random.uniform(0, 1)) prevents thundering herd problems when multiple processes hit rate limits simultaneously.
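Used on a real call, the wrapper is transparent to the caller. A self-contained demo, with the decorator condensed from the version above and delays shrunk so it runs instantly (FlakyError stands in for a transient API error; no real API is contacted):

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

class FlakyError(Exception):
    """Stand-in for a transient API error such as a rate limit."""

def retry_with_backoff(func: Callable[..., T], max_retries: int = 5,
                       base_delay: float = 0.01, max_delay: float = 0.1,
                       retryable_errors: tuple = (FlakyError,)) -> Callable[..., T]:
    """Condensed version of the decorator above, with tiny delays for the demo."""
    def wrapper(*args, **kwargs) -> T:
        last_error = None
        for attempt in range(max_retries):
            try:
                return func(*args, **kwargs)
            except retryable_errors as e:
                last_error = e
                time.sleep(min(base_delay * 2 ** attempt + random.uniform(0, 0.01), max_delay))
        raise last_error
    return wrapper

calls = {"n": 0}

def flaky_extract() -> str:
    """Fails twice, then succeeds: simulates transient API errors."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise FlakyError("simulated 429")
    return "ok"

resilient = retry_with_backoff(flaky_extract, max_retries=5)
print(resilient())  # "ok", after two retried failures
```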

Validation-Aware Retries with Instructor

Instructor's built-in retry mechanism feeds validation errors back to the model. Pair it with custom validators to enforce business rules beyond the schema:

import instructor
from openai import OpenAI
from pydantic import BaseModel, Field, field_validator
from typing import List

client = instructor.from_openai(OpenAI())

class StrictProduct(BaseModel):
    name: str
    price: float = Field(gt=0)
    currency: str = Field(pattern=r"^[A-Z]{3}$")
    categories: List[str] = Field(min_length=1, max_length=5)

    @field_validator("name")
    @classmethod
    def name_not_generic(cls, v: str) -> str:
        generic_names = {"product", "item", "thing", "unknown"}
        if v.lower().strip() in generic_names:
            raise ValueError(f"Name '{v}' is too generic. Extract the actual product name.")
        return v

# Instructor automatically retries with validation errors in the prompt
product = client.chat.completions.create(
    model="gpt-4o",
    response_model=StrictProduct,
    max_retries=3,  # Will retry up to 3 times on validation failure
    messages=[
        {"role": "user", "content": "The new widget costs fifteen dollars."}
    ],
)

On each retry, the model sees its previous output and the exact validation error, allowing it to self-correct.
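Under the hood this is a validate-and-reprompt loop. A simplified sketch of the idea with a stubbed model call in place of a real API (reprompt_loop and fake_model are illustrative names; Instructor's actual implementation appends the failed output and error as extra chat messages):

```python
import json
from pydantic import BaseModel, Field, ValidationError

class Price(BaseModel):
    amount: float = Field(gt=0)
    currency: str = Field(pattern=r"^[A-Z]{3}$")

def reprompt_loop(call_model, max_retries: int = 3) -> Price:
    """Retry, feeding each validation error back so the model can self-correct."""
    feedback = ""
    for _ in range(max_retries):
        raw = call_model(feedback)  # returns a JSON string
        try:
            return Price.model_validate_json(raw)
        except ValidationError as e:
            feedback = f"Previous output was invalid: {e.errors()}. Fix it and retry."
    raise RuntimeError("Validation failed after all retries")

def fake_model(feedback: str) -> str:
    """Stub: wrong on the first call, corrected once it sees feedback."""
    if not feedback:
        return json.dumps({"amount": -5, "currency": "usd"})
    return json.dumps({"amount": 15.0, "currency": "USD"})

price = reprompt_loop(fake_model)
print(price.amount, price.currency)  # 15.0 USD
```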

Fallback Schemas

When a detailed extraction fails repeatedly, fall back to a simpler schema that captures partial data:

class DetailedExtraction(BaseModel):
    company_name: str
    founding_year: int
    revenue: float
    employee_count: int
    headquarters_city: str
    headquarters_country: str
    industry: str
    ceo_name: str

class FallbackExtraction(BaseModel):
    company_name: str
    raw_details: str = Field(description="Any other details as free text")
    extraction_complete: bool = False

def extract_with_fallback(text: str) -> DetailedExtraction | FallbackExtraction:
    """Try detailed extraction first, fall back to simple on failure."""
    try:
        return client.chat.completions.create(
            model="gpt-4o",
            response_model=DetailedExtraction,
            max_retries=2,
            messages=[
                {"role": "system", "content": "Extract company details precisely."},
                {"role": "user", "content": text}
            ],
        )
    except Exception as e:
        print(f"Detailed extraction failed: {e}. Trying fallback.")
        return client.chat.completions.create(
            model="gpt-4o",
            response_model=FallbackExtraction,
            max_retries=1,
            messages=[
                {"role": "system", "content": "Extract whatever company information you can."},
                {"role": "user", "content": text}
            ],
        )

The fallback captures the company name (almost always extractable) and dumps everything else into free text. This is better than returning nothing — downstream systems can still use the partial data.

Partial Result Recovery

When extracting a list of items, some may validate while others fail. Recover the valid ones:

from pydantic import ValidationError

class Transaction(BaseModel):
    date: str = Field(pattern=r"^\d{4}-\d{2}-\d{2}$")
    amount: float = Field(gt=0)
    merchant: str
    category: str

def extract_transactions_with_recovery(raw_items: list[dict]) -> tuple[list[Transaction], list[dict]]:
    """Parse a list of raw dicts, separating valid from invalid."""
    valid = []
    invalid = []

    for item in raw_items:
        try:
            valid.append(Transaction.model_validate(item))
        except ValidationError as e:
            invalid.append({"data": item, "errors": e.errors()})

    return valid, invalid

# Example usage after getting raw JSON from LLM
import json

raw_response = '''[
    {"date": "2025-03-15", "amount": 42.50, "merchant": "Coffee Shop", "category": "food"},
    {"date": "March 15", "amount": -10, "merchant": "", "category": "other"},
    {"date": "2025-03-16", "amount": 120.00, "merchant": "Gas Station", "category": "transport"}
]'''

raw_items = json.loads(raw_response)
valid, invalid = extract_transactions_with_recovery(raw_items)
print(f"Recovered {len(valid)} of {len(raw_items)} transactions")
print(f"Failed items: {len(invalid)}")
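Invalid items need not be discarded outright. A cheap first step before re-asking the model is deterministic repair: normalize common formatting problems, then re-validate. A sketch targeting the failures in the sample data above (the normalization rules and the assumed statement year of 2025 are illustrative, not exhaustive):

```python
from datetime import datetime

def repair_item(item: dict) -> dict:
    """Best-effort normalization before re-validation. Rules are illustrative."""
    fixed = dict(item)
    # Coerce loose date formats into YYYY-MM-DD
    for fmt in ("%B %d", "%B %d, %Y", "%m/%d/%Y"):
        try:
            parsed = datetime.strptime(fixed["date"], fmt)
            if parsed.year == 1900:  # year was missing: assume the statement year
                parsed = parsed.replace(year=2025)
            fixed["date"] = parsed.strftime("%Y-%m-%d")
            break
        except ValueError:
            continue
    # Refunds often arrive negative, but this schema requires positive amounts
    if isinstance(fixed.get("amount"), (int, float)) and fixed["amount"] < 0:
        fixed["amount"] = abs(fixed["amount"])
    return fixed

item = {"date": "March 15", "amount": -10, "merchant": "X", "category": "other"}
print(repair_item(item))  # date becomes "2025-03-15", amount becomes 10
```

Items that still fail after repair are better candidates for a second LLM pass or human review.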

Graceful Degradation Pipeline

Combine all patterns into a complete resilience pipeline:

from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class ExtractionResult:
    data: Any
    quality: str  # "full", "partial", "fallback", "failed"
    errors: list[str]
    attempts: int

def resilient_extract(text: str) -> ExtractionResult:
    errors = []

    # Attempt 1: Full extraction with strict model
    try:
        result = client.chat.completions.create(
            model="gpt-4o",
            response_model=DetailedExtraction,
            max_retries=2,
            messages=[
                {"role": "system", "content": "Extract all company details."},
                {"role": "user", "content": text}
            ],
        )
        return ExtractionResult(data=result, quality="full", errors=[], attempts=1)
    except Exception as e:
        errors.append(f"Full extraction failed: {e}")

    # Attempt 2: Fallback schema with cheaper model
    try:
        result = client.chat.completions.create(
            model="gpt-4o-mini",
            response_model=FallbackExtraction,
            max_retries=1,
            messages=[
                {"role": "system", "content": "Extract basic company info."},
                {"role": "user", "content": text}
            ],
        )
        return ExtractionResult(data=result, quality="fallback", errors=errors, attempts=2)
    except Exception as e:
        errors.append(f"Fallback extraction failed: {e}")

    return ExtractionResult(data=None, quality="failed", errors=errors, attempts=2)

FAQ

How many retries should I configure for production systems?

For API errors (rate limits, timeouts), use 3-5 retries with exponential backoff. For validation errors via Instructor, use 2-3 retries — if the model cannot produce valid output in 3 attempts, more retries rarely help and you should fall back to a simpler schema. Total retry budget should stay under 30 seconds for user-facing applications.
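Whether a backoff policy fits that budget can be checked with simple arithmetic before deploying it (worst_case_delay is an illustrative helper; it sums an upper bound assuming a sleep after every failed attempt, using the base/cap/jitter parameters from the earlier decorator):

```python
def worst_case_delay(max_retries: int, base: float, cap: float, jitter: float = 1.0) -> float:
    """Upper bound on total sleep time for exponential backoff with jitter."""
    return sum(min(base * 2 ** i + jitter, cap) for i in range(max_retries))

# 5 retries at 1s base, 60s cap, up to 1s jitter per attempt:
print(worst_case_delay(5, 1.0, 60.0))  # 36.0 -- already over a 30s user-facing budget
```

If the bound exceeds the budget, lower max_retries or the base delay rather than the cap, since early attempts dominate the count.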

How do I log structured output failures for debugging?

Log the full context: input text, raw LLM response, validation errors, retry count, and which fallback stage succeeded. Use structured logging (JSON format) so you can query failures by error type, schema, and model. This data is invaluable for identifying which validators are too strict and which input patterns cause consistent failures.
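A minimal version of that logging needs only the standard library (field names here are illustrative; production systems would typically use structlog or a logging framework with JSON formatters):

```python
import json
import logging

logger = logging.getLogger("extraction")

def log_extraction_failure(input_text: str, raw_response: str,
                           errors: list, retry_count: int, stage: str) -> str:
    """Emit one JSON line per failure so logs stay queryable by field."""
    record = {
        "event": "extraction_failure",
        "input_preview": input_text[:200],   # truncate to keep log lines bounded
        "raw_response": raw_response[:500],
        "errors": errors,
        "retry_count": retry_count,
        "fallback_stage": stage,
    }
    line = json.dumps(record)
    logger.error(line)
    return line

line = log_extraction_failure("The widget costs fifteen dollars.",
                              '{"price": -15}', ["price must be > 0"], 2, "fallback")
print(line)
```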

Should I use circuit breakers for LLM extraction?

Yes, especially in high-throughput systems. If the LLM API returns errors on 50%+ of recent requests, stop sending new requests for a cooldown period (30-60 seconds). This prevents cascading failures and wasted API spend. Libraries like tenacity and pybreaker make this easy to implement.


#ErrorHandling #Resilience #StructuredOutputs #Production #Python #AgenticAI #LearnAI #AIEngineering


Written by

CallSphere Team

Expert insights on AI voice agents and customer communication automation.
