---
title: "Handling Structured Output Failures: Retries, Fallbacks, and Partial Parsing"
description: "Build resilient structured output systems that handle LLM failures gracefully. Learn retry strategies with exponential backoff, fallback schemas, partial result recovery, and graceful degradation patterns."
canonical: https://callsphere.ai/blog/handling-structured-output-failures-retries-fallbacks-partial-parsing
category: "Learn Agentic AI"
tags: ["Error Handling", "Resilience", "Structured Outputs", "Production", "Python"]
author: "CallSphere Team"
published: 2026-03-17T00:00:00.000Z
updated: 2026-05-06T01:02:42.282Z
---

# Handling Structured Output Failures: Retries, Fallbacks, and Partial Parsing

> Build resilient structured output systems that handle LLM failures gracefully. Learn retry strategies with exponential backoff, fallback schemas, partial result recovery, and graceful degradation patterns.

## Why Structured Outputs Fail

Even with OpenAI's constrained decoding and Pydantic validation, structured output extraction fails in production. Common failure modes include:

- **API errors**: Rate limits (429), server errors (500), timeouts
- **Validation errors**: The model returns valid JSON that fails your business logic validators
- **Content refusals**: The model refuses to process the input due to safety filters
- **Malformed output**: Rare with strict mode, but possible with JSON mode or local models
- **Hallucination**: The JSON is valid and schema-conforming, but the extracted values are wrong

A production system must handle every one of these gracefully. Crashing on the first validation error is not acceptable.

## Retry with Exponential Backoff

The simplest resilience pattern is to retry transient API failures with increasing delays:

```mermaid
flowchart LR
    CALL["Call LLM API"]
    OK{"Success?"}
    RETRY{"Retryable
error?"}
    BUDGET{"Retries
left?"}
    WAIT["Sleep: base * 2^n
plus jitter, capped"]
    DONE(["Return result"])
    FAIL(["Raise error"])
    CALL --> OK
    OK -->|Yes| DONE
    OK -->|No| RETRY
    RETRY -->|No| FAIL
    RETRY -->|Yes| BUDGET
    BUDGET -->|No| FAIL
    BUDGET -->|Yes| WAIT --> CALL
    style DONE fill:#059669,stroke:#047857,color:#fff
    style FAIL fill:#dc2626,stroke:#b91c1c,color:#1f2937
    style WAIT fill:#4f46e5,stroke:#4338ca,color:#fff
```

```python
import functools
import random
import time
from typing import Callable, TypeVar

from openai import APIConnectionError, APITimeoutError, RateLimitError

T = TypeVar("T")

def retry_with_backoff(
    func: Callable[..., T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    retryable_errors: tuple[type[Exception], ...] = (RateLimitError, APITimeoutError, APIConnectionError),
) -> Callable[..., T]:
    """Decorator that retries a function with exponential backoff and jitter."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs) -> T:
        for attempt in range(max_retries):
            try:
                return func(*args, **kwargs)
            except retryable_errors as e:
                if attempt == max_retries - 1:
                    raise  # Out of retries: propagate the last error
                delay = min(base_delay * (2 ** attempt) + random.uniform(0, 1), max_delay)
                print(f"Attempt {attempt + 1} failed: {e}. Retrying in {delay:.1f}s")
                time.sleep(delay)
            # Non-retryable errors propagate immediately

    return wrapper
```

The jitter (`random.uniform(0, 1)`) prevents thundering herd problems when multiple processes hit rate limits simultaneously.
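To see how the delay schedule grows, the same formula can be run in isolation. This is a standalone sketch; `backoff_delay` is our own name, not part of any SDK:

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff with full jitter, capped at `cap` seconds."""
    return min(base * (2 ** attempt) + random.uniform(0, 1), cap)

random.seed(7)  # deterministic only for this illustration
schedule = [backoff_delay(n) for n in range(7)]
print([round(d, 1) for d in schedule])
```

The delays roughly double each attempt (1s, 2s, 4s, ...) until the cap flattens them at 60s, so seven attempts never wait more than a few minutes in total.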

## Validation-Aware Retries with Instructor

Instructor's built-in retry mechanism feeds each validation error back to the model on the next attempt, so custom validators double as corrective prompts:

```python
import instructor
from openai import OpenAI
from pydantic import BaseModel, Field, field_validator
from typing import List

client = instructor.from_openai(OpenAI())

class StrictProduct(BaseModel):
    name: str
    price: float = Field(gt=0)
    currency: str = Field(pattern=r"^[A-Z]{3}$")
    categories: List[str] = Field(min_length=1, max_length=5)

    @field_validator("name")
    @classmethod
    def name_not_generic(cls, v: str) -> str:
        generic_names = {"product", "item", "thing", "unknown"}
        if v.lower().strip() in generic_names:
            raise ValueError(f"Name '{v}' is too generic. Extract the actual product name.")
        return v

# Instructor automatically retries with validation errors in the prompt
product = client.chat.completions.create(
    model="gpt-4o",
    response_model=StrictProduct,
    max_retries=3,  # Will retry up to 3 times on validation failure
    messages=[
        {"role": "user", "content": "The new widget costs fifteen dollars."}
    ],
)
```

On each retry, the model sees its previous output and the exact validation error, allowing it to self-correct.

## Fallback Schemas

When a detailed extraction fails repeatedly, fall back to a simpler schema that captures partial data:

```python
class DetailedExtraction(BaseModel):
    company_name: str
    founding_year: int
    revenue: float
    employee_count: int
    headquarters_city: str
    headquarters_country: str
    industry: str
    ceo_name: str

class FallbackExtraction(BaseModel):
    company_name: str
    raw_details: str = Field(description="Any other details as free text")
    extraction_complete: bool = False

def extract_with_fallback(text: str) -> DetailedExtraction | FallbackExtraction:
    """Try detailed extraction first, fall back to simple on failure."""
    try:
        return client.chat.completions.create(
            model="gpt-4o",
            response_model=DetailedExtraction,
            max_retries=2,
            messages=[
                {"role": "system", "content": "Extract company details precisely."},
                {"role": "user", "content": text}
            ],
        )
    except Exception as e:
        print(f"Detailed extraction failed: {e}. Trying fallback.")
        return client.chat.completions.create(
            model="gpt-4o",
            response_model=FallbackExtraction,
            max_retries=1,
            messages=[
                {"role": "system", "content": "Extract whatever company information you can."},
                {"role": "user", "content": text}
            ],
        )
```

The fallback captures the company name (almost always extractable) and dumps everything else into free text. This is better than returning nothing — downstream systems can still use the partial data.

## Partial Result Recovery

When extracting a list of items, some may validate while others fail. Recover the valid ones:

```python
from pydantic import ValidationError

class Transaction(BaseModel):
    date: str = Field(pattern=r"^\d{4}-\d{2}-\d{2}$")
    amount: float = Field(gt=0)
    merchant: str
    category: str

def extract_transactions_with_recovery(raw_items: list[dict]) -> tuple[list[Transaction], list[dict]]:
    """Parse a list of raw dicts, separating valid from invalid."""
    valid = []
    invalid = []

    for item in raw_items:
        try:
            valid.append(Transaction.model_validate(item))
        except ValidationError as e:
            invalid.append({"data": item, "errors": e.errors()})

    return valid, invalid

# Example usage after getting raw JSON from LLM
import json

raw_response = '''[
    {"date": "2025-03-15", "amount": 42.50, "merchant": "Coffee Shop", "category": "food"},
    {"date": "March 15", "amount": -10, "merchant": "", "category": "other"},
    {"date": "2025-03-16", "amount": 120.00, "merchant": "Gas Station", "category": "transport"}
]'''

raw_items = json.loads(raw_response)
valid, invalid = extract_transactions_with_recovery(raw_items)
print(f"Recovered {len(valid)} of {len(raw_items)} transactions")
print(f"Failed items: {len(invalid)}")
```
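The `errors()` payload stored alongside each failed item is machine-readable, which makes it straightforward to build repair prompts or failure dashboards. For example, pulling out which fields failed (using a trimmed copy of the `Transaction` model above):

```python
from pydantic import BaseModel, Field, ValidationError

class Transaction(BaseModel):
    date: str = Field(pattern=r"^\d{4}-\d{2}-\d{2}$")
    amount: float = Field(gt=0)

try:
    Transaction.model_validate({"date": "March 15", "amount": -10})
except ValidationError as e:
    # Each error carries a "loc" tuple naming the offending field
    fields = sorted(err["loc"][0] for err in e.errors())
    print(fields)  # ['amount', 'date']
```

Feeding just these field names and messages back to the model is usually enough for it to repair the failed items on a second pass.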

## Graceful Degradation Pipeline

Combine the patterns above into a pipeline that records how each result was produced. The `quality` field tells downstream consumers how much to trust the data:

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class ExtractionResult:
    data: Any
    quality: str  # "full", "partial", "fallback", "failed"
    errors: list[str]
    attempts: int

def resilient_extract(text: str) -> ExtractionResult:
    errors = []

    # Attempt 1: Full extraction with strict model
    try:
        result = client.chat.completions.create(
            model="gpt-4o",
            response_model=DetailedExtraction,
            max_retries=2,
            messages=[
                {"role": "system", "content": "Extract all company details."},
                {"role": "user", "content": text}
            ],
        )
        return ExtractionResult(data=result, quality="full", errors=[], attempts=1)
    except Exception as e:
        errors.append(f"Full extraction failed: {e}")

    # Attempt 2: Fallback schema with cheaper model
    try:
        result = client.chat.completions.create(
            model="gpt-4o-mini",
            response_model=FallbackExtraction,
            max_retries=1,
            messages=[
                {"role": "system", "content": "Extract basic company info."},
                {"role": "user", "content": text}
            ],
        )
        return ExtractionResult(data=result, quality="fallback", errors=errors, attempts=2)
    except Exception as e:
        errors.append(f"Fallback extraction failed: {e}")

    return ExtractionResult(data=None, quality="failed", errors=errors, attempts=2)
```
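Callers can then branch on the quality tier. A sketch of the routing logic (the tier-to-action mapping is our suggestion, not a standard):

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class ExtractionResult:
    data: Any
    quality: str  # "full", "partial", "fallback", "failed"
    errors: list[str]
    attempts: int

def route(result: ExtractionResult) -> str:
    """Decide what the caller should do with each quality tier."""
    if result.quality == "full":
        return "write_to_db"
    if result.quality in ("partial", "fallback"):
        return "queue_for_review"  # usable, but a human should confirm
    return "dead_letter"  # nothing recovered; keep the input for debugging

print(route(ExtractionResult(data=None, quality="failed", errors=["..."], attempts=2)))  # dead_letter
```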

## FAQ

### How many retries should I configure for production systems?

For API errors (rate limits, timeouts), use 3-5 retries with exponential backoff. For validation errors via Instructor, use 2-3 retries — if the model cannot produce valid output in 3 attempts, more retries rarely help and you should fall back to a simpler schema. Total retry budget should stay under 30 seconds for user-facing applications.

### How do I log structured output failures for debugging?

Log the full context: input text, raw LLM response, validation errors, retry count, and which fallback stage succeeded. Use structured logging (JSON format) so you can query failures by error type, schema, and model. This data is invaluable for identifying which validators are too strict and which input patterns cause consistent failures.
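A minimal version using only the standard library might look like this (the field names are our suggestion, not a standard schema):

```python
import json
import logging

logger = logging.getLogger("extraction")

def log_failure(input_text: str, raw_response: str, errors: list[str],
                retry_count: int, stage: str) -> str:
    """Emit one JSON line per failure so logs can be queried by field."""
    record = {
        "event": "extraction_failure",
        "input_preview": input_text[:200],  # truncate to keep log lines bounded
        "raw_response": raw_response,
        "errors": errors,
        "retry_count": retry_count,
        "stage": stage,
    }
    line = json.dumps(record)
    logger.warning(line)
    return line

line = log_failure("Acme Corp was founded...", "{bad json", ["invalid JSON"], 3, "detailed")
```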

### Should I use circuit breakers for LLM extraction?

Yes, especially in high-throughput systems. If the LLM API returns errors on 50%+ of recent requests, stop sending new requests for a cooldown period (30-60 seconds). This prevents cascading failures and wasted API spend. Libraries like `tenacity` and `pybreaker` make this easy to implement.
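If you would rather not pull in a library, a minimal in-process breaker is only a few lines. This is a sketch; the thresholds and half-open probe logic are simplified:

```python
import time

class CircuitBreaker:
    """Open after `fail_max` consecutive failures; allow a probe after `reset_timeout`."""

    def __init__(self, fail_max: int = 5, reset_timeout: float = 30.0):
        self.fail_max = fail_max
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at: float | None = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.reset_timeout:
            # Half-open: let one probe request through; reopen on its failure
            self.opened_at = None
            self.failures = self.fail_max - 1
            return True
        return False

    def record(self, success: bool) -> None:
        if success:
            self.failures = 0
            self.opened_at = None
        else:
            self.failures += 1
            if self.failures >= self.fail_max:
                self.opened_at = time.monotonic()

breaker = CircuitBreaker(fail_max=3, reset_timeout=30.0)
for _ in range(3):
    breaker.record(success=False)
print(breaker.allow())  # False — circuit is open
```

Wrap each LLM call in `breaker.allow()` / `breaker.record(...)`, and skip the call entirely (returning a fallback or queuing the input) while the circuit is open.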

---

#ErrorHandling #Resilience #StructuredOutputs #Production #Python #AgenticAI #LearnAI #AIEngineering

