---
title: "OpenAI JSON Mode and Structured Outputs: Reliable Data Extraction"
description: "Master OpenAI's JSON mode and structured outputs to extract reliable, schema-validated data from LLMs with guaranteed format compliance and Pydantic integration."
canonical: https://callsphere.ai/blog/openai-json-mode-structured-outputs-reliable-data-extraction
category: "Learn Agentic AI"
tags: ["OpenAI", "JSON Mode", "Structured Outputs", "Pydantic", "Data Extraction"]
author: "CallSphere Team"
published: 2026-03-17T00:00:00.000Z
updated: 2026-05-07T00:24:23.737Z
---

# OpenAI JSON Mode and Structured Outputs: Reliable Data Extraction

> Master OpenAI's JSON mode and structured outputs to extract reliable, schema-validated data from LLMs with guaranteed format compliance and Pydantic integration.

## The Problem with Unstructured LLM Output

By default, LLMs return free-form text. When you need structured data — a JSON object with specific fields, types, and constraints — you are relying on the model to follow your prompt instructions perfectly. It usually works, but sometimes the model wraps JSON in markdown code fences, adds extra commentary, omits fields, or returns invalid JSON.

OpenAI provides two mechanisms to solve this: JSON mode and structured outputs. Both guarantee valid JSON, but structured outputs go further by enforcing a specific schema.
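
Without either mechanism, client code tends to accumulate defensive parsing like the sketch below. `extract_json` is a hypothetical helper, not part of the OpenAI SDK. It handles the two failure modes described above (markdown fences and surrounding commentary) on a best-effort basis, which is exactly why it is brittle:

```python
import json
import re

# Matches a markdown code fence, optionally tagged "json", and captures its body.
FENCE_RE = re.compile(r"```(?:json)?\s*(.*?)```", re.DOTALL)


def extract_json(text: str) -> dict:
    """Hypothetical fallback: recover a JSON object from free-form model text."""
    # Prefer the contents of a fenced block if the model emitted one.
    match = FENCE_RE.search(text)
    candidate = match.group(1) if match else text
    # Otherwise take the span from the first "{" to the last "}".
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in model output")
    return json.loads(candidate[start : end + 1])


raw = 'Sure! Here is the data:\n```json\n{"name": "John Smith", "age": 34}\n```\nLet me know!'
print(extract_json(raw))  # → {'name': 'John Smith', 'age': 34}
```

Heuristics like this break as soon as the commentary itself contains braces, which is the gap JSON mode and structured outputs close.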

## JSON Mode: Guaranteed Valid JSON

JSON mode ensures the model outputs valid JSON, but does not enforce a specific structure:

```python
import json

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        # JSON mode enforces syntax only, so the prompt must name the fields you want.
        {"role": "system", "content": "Extract the person's details as JSON with name, age, and city fields."},
        {"role": "user", "content": "John Smith is 34 years old and lives in Chicago."},
    ],
    response_format={"type": "json_object"},
)

# message.content is a JSON string; parse it into a Python dict.
data = json.loads(response.choices[0].message.content)
print(data)
# Example output (keys depend on the model following the prompt):
# {'name': 'John Smith', 'age': 34, 'city': 'Chicago'}
```

**Important:** You must mention JSON in your system or user message when using JSON mode. The API requires this and will error if you do not.
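
Because JSON mode checks syntax only, the keys and types still have to be verified in application code. A minimal stdlib sketch, assuming the `name`/`age`/`city` shape requested in the prompt above (`validate_person` and `EXPECTED_FIELDS` are illustrative names, not SDK features):

```python
import json

# Hypothetical expected shape for the person-extraction prompt above.
EXPECTED_FIELDS = {"name": str, "age": int, "city": str}


def validate_person(raw: str) -> dict:
    # json.JSONDecodeError is a subclass of ValueError, so callers can catch one type.
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in EXPECTED_FIELDS if key not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    for key, expected_type in EXPECTED_FIELDS.items():
        value = data[key]
        # bool is a subclass of int in Python, so reject booleans explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise ValueError(f"field {key!r} should be {expected_type.__name__}")
    return data


print(validate_person('{"name": "John Smith", "age": 34, "city": "Chicago"}'))
```

A response such as `{"name": "John", "age": "34"}` is valid JSON, so JSON mode accepts it, but this check rejects it.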

## Structured Outputs: Schema-Enforced JSON

Structured outputs go beyond JSON mode by enforcing a specific JSON schema:

```python
import json

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Extract product information from the text."},
        {"role": "user", "content": "The MacBook Pro 16-inch costs $2499, weighs 4.8 lbs, and has an M3 Max chip."},
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "product_info",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "product_name": {"type": "string"},
                    "price_usd": {"type": "number"},
                    "weight_lbs": {"type": "number"},
                    "processor": {"type": "string"},
                },
                "required": ["product_name", "price_usd", "weight_lbs", "processor"],
                "additionalProperties": False,
            },
        },
    },
)

data = json.loads(response.choices[0].message.content)
print(data)
```

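With `strict: True`, the model is constrained to output JSON that conforms exactly to your schema: every required field is present, types match, and no extra fields appear. Strict mode supports only a subset of JSON Schema, and it cannot prevent a refusal or an output cut off by the token limit, so those cases still need handling.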
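
Strict mode rejects schemas that break its rules, so a local pre-flight check can catch mistakes before a request is sent. The sketch below is a hypothetical helper covering only two documented rules (every property listed in `required`, and `additionalProperties: false` on every object); it ignores `$defs`, `anyOf`, and the API's other limits:

```python
def strict_schema_problems(schema: dict, path: str = "$") -> list[str]:
    """Hypothetical check: list violations of two strict-mode rules."""
    problems = []
    if schema.get("type") == "object":
        props = schema.get("properties", {})
        # Rule 1: every property must appear in "required".
        missing = sorted(set(props) - set(schema.get("required", [])))
        if missing:
            problems.append(f"{path}: not in required: {missing}")
        # Rule 2: objects must forbid extra keys.
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be false")
        for name, sub in props.items():
            problems.extend(strict_schema_problems(sub, f"{path}.{name}"))
    elif schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        problems.extend(strict_schema_problems(schema["items"], f"{path}[]"))
    return problems


# The product schema from the example above passes both rules.
product_schema = {
    "type": "object",
    "properties": {
        "product_name": {"type": "string"},
        "price_usd": {"type": "number"},
        "weight_lbs": {"type": "number"},
        "processor": {"type": "string"},
    },
    "required": ["product_name", "price_usd", "weight_lbs", "processor"],
    "additionalProperties": False,
}
loose = {"type": "object", "properties": {"a": {"type": "string"}}}

print(strict_schema_problems(product_schema))  # → []
print(strict_schema_problems(loose))  # reports the missing "required" entry and additionalProperties
```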

## Pydantic Integration

The SDK integrates with Pydantic models for a cleaner developer experience:

```python
from pydantic import BaseModel
from openai import OpenAI

client = OpenAI()

class ContactInfo(BaseModel):
    name: str
    email: str
    phone: str
    company: str

response = client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Extract contact information from the text."},
        {"role": "user", "content": "Reach out to Sarah Connor at sarah@skynet.com or 555-0199. She works at Cyberdyne Systems."},
    ],
    response_format=ContactInfo,
)

contact = response.choices[0].message.parsed
print(f"Name: {contact.name}")
print(f"Email: {contact.email}")
print(f"Phone: {contact.phone}")
print(f"Company: {contact.company}")
```

The `.parse()` method automatically converts the Pydantic model into a JSON schema, sends it to the API, and parses the response back into a typed Pydantic instance.
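
Both halves of that round trip can be reproduced locally with Pydantic v2, which helps when debugging what the model is being asked for. This sketch does not call the API; the SDK post-processes the generated schema into strict form (for example, adding `additionalProperties: false`), so the exact schema it sends may differ from what is printed here:

```python
from pydantic import BaseModel


class ContactInfo(BaseModel):
    name: str
    email: str
    phone: str
    company: str


# Schema-generation step: Pydantic emits a JSON schema for the model.
schema = ContactInfo.model_json_schema()
print(schema["required"])  # → ['name', 'email', 'phone', 'company']

# Parsing step: validate a raw JSON string into a typed instance.
raw = (
    '{"name": "Sarah Connor", "email": "sarah@skynet.com", '
    '"phone": "555-0199", "company": "Cyberdyne Systems"}'
)
contact = ContactInfo.model_validate_json(raw)
print(contact.company)  # → Cyberdyne Systems
```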

## Nested and Complex Schemas

Structured outputs support nested objects, arrays, and enums:

```python
from enum import Enum

from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

class Severity(str, Enum):
    low = "low"
    medium = "medium"
    high = "high"
    critical = "critical"

class Step(BaseModel):
    description: str
    estimated_hours: float

class BugReport(BaseModel):
    title: str
    severity: Severity
    affected_component: str
    steps_to_reproduce: list[Step]
    expected_behavior: str
    actual_behavior: str

response = client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Parse the bug report into structured format."},
        {"role": "user", "content": "Critical bug in the payment module. When a user clicks 'Pay Now' with an expired card (takes 2 seconds), the system shows a success message instead of an error. Expected: error message. Actual: success confirmation."},
    ],
    response_format=BugReport,
)

bug = response.choices[0].message.parsed
print(f"Title: {bug.title}")
print(f"Severity: {bug.severity.value}")  # .value prints "critical", not "Severity.critical"
print(f"Steps: {len(bug.steps_to_reproduce)}")
```
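
For readers calling `create()` with a raw `json_schema` instead of Pydantic, a hand-written strict schema approximating the `BugReport` model above might look like the sketch below. The nested `Step` object is inlined here, whereas the SDK's generated schema may use `$defs`/`$ref`; the name `bug_report` is an arbitrary choice:

```python
# Hand-written strict schema approximating BugReport (nested objects inlined).
BUG_REPORT_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        # The Severity enum becomes a string restricted to fixed values.
        "severity": {"type": "string", "enum": ["low", "medium", "high", "critical"]},
        "affected_component": {"type": "string"},
        # list[Step] becomes an array whose items are strict objects.
        "steps_to_reproduce": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "estimated_hours": {"type": "number"},
                },
                "required": ["description", "estimated_hours"],
                "additionalProperties": False,
            },
        },
        "expected_behavior": {"type": "string"},
        "actual_behavior": {"type": "string"},
    },
    "required": [
        "title", "severity", "affected_component",
        "steps_to_reproduce", "expected_behavior", "actual_behavior",
    ],
    "additionalProperties": False,
}

# Pass as response_format=... to client.chat.completions.create().
response_format = {
    "type": "json_schema",
    "json_schema": {"name": "bug_report", "strict": True, "schema": BUG_REPORT_SCHEMA},
}
```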

## Handling Refusals

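Sometimes the model declines to fill the schema, for example for safety reasons. In that case `message.refusal` holds the refusal text and `message.parsed` is `None`, so check for it before using the result: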

```python
response = client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Extract the information."},
        {"role": "user", "content": "Some input text here."},
    ],
    response_format=ContactInfo,
)

message = response.choices[0].message
if message.refusal:
    print(f"Model refused: {message.refusal}")
else:
    contact = message.parsed
    print(contact)
```
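
In a larger codebase it helps to centralize that check. Below is a minimal sketch of a hypothetical `unwrap_parsed` helper that reads `finish_reason`, `message.refusal`, and `message.parsed` from a choice and raises a descriptive error; the `SimpleNamespace` objects are offline stand-ins so the example runs without an API key:

```python
from types import SimpleNamespace


def unwrap_parsed(choice):
    """Hypothetical helper: return choice.message.parsed or raise a clear error."""
    if choice.message.refusal:
        raise RuntimeError(f"model refused: {choice.message.refusal}")
    # Defensive: output cut off by the token limit may be incomplete.
    # Depending on the SDK version, parse() may already raise in this case.
    if choice.finish_reason == "length":
        raise RuntimeError("response truncated by the token limit")
    if choice.message.parsed is None:
        raise RuntimeError("response contained no parsed content")
    return choice.message.parsed


# Stand-ins shaped like response.choices[0]; in real code pass that object instead.
ok = SimpleNamespace(
    finish_reason="stop",
    message=SimpleNamespace(refusal=None, parsed={"name": "Sarah Connor"}),
)
refused = SimpleNamespace(
    finish_reason="stop",
    message=SimpleNamespace(refusal="I can't help with that.", parsed=None),
)

print(unwrap_parsed(ok))  # → {'name': 'Sarah Connor'}
try:
    unwrap_parsed(refused)
except RuntimeError as err:
    print(err)  # → model refused: I can't help with that.
```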

## Practical Example: Invoice Parsing

Here is a more realistic extraction function, parsing invoices:

```python
from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

class LineItem(BaseModel):
    description: str
    quantity: int
    unit_price: float
    total: float

class Invoice(BaseModel):
    invoice_number: str
    date: str
    vendor_name: str
    line_items: list[LineItem]
    subtotal: float
    tax: float
    total: float

def parse_invoice(raw_text: str) -> Invoice:
    response = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Parse the invoice text into structured data. Calculate totals if not explicitly stated."},
            {"role": "user", "content": raw_text},
        ],
        response_format=Invoice,
    )
    message = response.choices[0].message
    # parsed is None when the model refuses, so fail loudly instead of returning None.
    if message.refusal:
        raise ValueError(f"Invoice parsing refused: {message.refusal}")
    return message.parsed
```
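
A schema guarantees types, not arithmetic: the model can return a well-formed `Invoice` whose totals do not add up. A minimal sketch of a post-parse check follows; the dataclasses are stand-ins mirroring the Pydantic fields above so the example runs offline, and because `arithmetic_errors` only reads attributes, it should also accept a parsed `Invoice`. The sample invoice values are illustrative:

```python
from dataclasses import dataclass


# Stand-ins mirroring the LineItem/Invoice fields above.
@dataclass
class LineItem:
    description: str
    quantity: int
    unit_price: float
    total: float


@dataclass
class Invoice:
    invoice_number: str
    date: str
    vendor_name: str
    line_items: list[LineItem]
    subtotal: float
    tax: float
    total: float


def arithmetic_errors(inv: Invoice, tol: float = 0.01) -> list[str]:
    """Re-check the model's totals; tol absorbs float rounding."""
    errors = []
    for item in inv.line_items:
        if abs(item.quantity * item.unit_price - item.total) > tol:
            errors.append(f"line {item.description!r}: quantity * unit_price != total")
    if abs(sum(i.total for i in inv.line_items) - inv.subtotal) > tol:
        errors.append("line totals do not sum to subtotal")
    if abs(inv.subtotal + inv.tax - inv.total) > tol:
        errors.append("subtotal + tax != total")
    return errors


inv = Invoice("INV-001", "2026-03-01", "Acme", [LineItem("Widget", 3, 2.50, 7.50)], 7.50, 0.60, 8.10)
print(arithmetic_errors(inv))  # → []
```

An invoice that fails this check can be retried, flagged for human review, or corrected by recomputing totals from the line items.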

## FAQ

### What is the difference between JSON mode and structured outputs?

JSON mode guarantees the output is valid JSON but does not enforce a specific structure. Structured outputs enforce a specific JSON schema with exact field names, types, and constraints. Use JSON mode for flexibility, structured outputs for reliability.

### Do structured outputs work with all OpenAI models?

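Structured outputs with `json_schema` are supported on `gpt-4o-mini`, `gpt-4o-2024-08-06`, and later models. JSON mode (`json_object`) works on a wider range of models, including `gpt-4o`, `gpt-4o-mini`, `gpt-4-turbo`, and `gpt-3.5-turbo` (1106 and later snapshots). Check the API documentation for current model compatibility.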

### Can I use optional fields in structured output schemas?

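With `strict: True`, every property must be listed in `required`. To make a field optional, keep it in `required` and allow null with a union type: `{"type": ["string", "null"]}`. In Pydantic, annotate the field as `Optional[str]` (optionally with a default of `None`); the model then returns `null` when the value is absent.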
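
A minimal raw-schema sketch of that pattern, with an illustrative `phone` field: the key is always present in the output, but its value may be `null`, which Python reads as `None`. The sample response is hand-written, not real model output:

```python
import json

# "phone" stays in "required" but allows null, so the key is always emitted.
CONTACT_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "phone": {"type": ["string", "null"]},
    },
    "required": ["name", "phone"],
    "additionalProperties": False,
}

# A conforming response for text that mentions no phone number.
sample = json.loads('{"name": "Sarah Connor", "phone": null}')
phone = sample["phone"]
print(phone if phone is not None else "no phone provided")  # → no phone provided
```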

