Gemini Structured Output: Getting JSON and Typed Responses from Google AI

Why Structured Output Matters for Agents

Agents that produce free-form text are limited to human consumption. Agents that produce structured data can feed into databases, trigger workflows, update dashboards, and chain into other agents. When your classification agent returns {"sentiment": "negative", "urgency": "high", "category": "billing"} instead of a paragraph, downstream systems can act on it immediately.

Gemini supports native structured output through JSON mode and schema constraints. Unlike prompt-based approaches that ask the model to "return JSON," Gemini's structured output is enforced at the model level — the output is guaranteed to be valid JSON matching your schema.

Basic JSON Mode

The simplest approach sets the response MIME type to JSON:

flowchart LR
    INPUT(["User intent"])
    PARSE["Parse plus<br/>classify"]
    PLAN["Plan and tool<br/>selection"]
    AGENT["Agent loop<br/>LLM plus tools"]
    GUARD{"Guardrails<br/>and policy"}
    EXEC["Execute and<br/>verify result"]
    OBS[("Trace and metrics")]
    OUT(["Outcome plus<br/>next action"])
    INPUT --> PARSE --> PLAN --> AGENT --> GUARD
    GUARD -->|Pass| EXEC --> OUT
    GUARD -->|Fail| AGENT
    AGENT --> OBS
    style AGENT fill:#4f46e5,stroke:#4338ca,color:#fff
    style GUARD fill:#f59e0b,stroke:#d97706,color:#1f2937
    style OBS fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style OUT fill:#059669,stroke:#047857,color:#fff

import google.generativeai as genai
import json
import os

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

model = genai.GenerativeModel(
    "gemini-2.0-flash",
    generation_config=genai.GenerationConfig(
        response_mime_type="application/json",
    ),
)

response = model.generate_content(
    "Analyze the sentiment of this review: "
    "'The product arrived late but the quality exceeded my expectations. "
    "Customer support was unhelpful when I asked about the delay.'"
)

data = json.loads(response.text)
print(json.dumps(data, indent=2))

With JSON mode enabled, the response is guaranteed to be valid JSON. However, the schema is inferred from the prompt — the model decides what keys and types to use.

Hear it before you finish reading

Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.

Try Live →

Try Live Demo →

Schema-Constrained Output

For production agents, define an explicit schema to guarantee the response structure:

import google.generativeai as genai
from google.generativeai.types import GenerationConfig
import json
import os

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

# Define the expected output schema
review_schema = {
    "type": "object",
    "properties": {
        "sentiment": {
            "type": "string",
            "enum": ["positive", "negative", "mixed", "neutral"],
        },
        "urgency": {
            "type": "string",
            "enum": ["low", "medium", "high"],
        },
        "topics": {
            "type": "array",
            "items": {"type": "string"},
        },
        "summary": {
            "type": "string",
        },
        "confidence_score": {
            "type": "number",
        },
    },
    "required": ["sentiment", "urgency", "topics", "summary", "confidence_score"],
}

model = genai.GenerativeModel(
    "gemini-2.0-flash",
    generation_config=GenerationConfig(
        response_mime_type="application/json",
        response_schema=review_schema,
    ),
)

response = model.generate_content(
    "Analyze this customer review: 'I have been waiting 3 weeks for my refund. "
    "Every time I call, I get transferred to a different department. This is unacceptable.'"
)

result = json.loads(response.text)
print(f"Sentiment: {result['sentiment']}")
print(f"Urgency: {result['urgency']}")
print(f"Topics: {result['topics']}")

The enum constraint is powerful — it forces the model to choose from your predefined categories, eliminating inconsistent labels like "somewhat positive" or "POSITIVE" that break downstream logic.

Extracting Structured Data from Documents

A common agent pattern is extracting structured records from unstructured text:

invoice_schema = {
    "type": "object",
    "properties": {
        "vendor_name": {"type": "string"},
        "invoice_number": {"type": "string"},
        "date": {"type": "string", "description": "ISO 8601 format"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "quantity": {"type": "integer"},
                    "unit_price": {"type": "number"},
                    "total": {"type": "number"},
                },
                "required": ["description", "quantity", "unit_price", "total"],
            },
        },
        "subtotal": {"type": "number"},
        "tax": {"type": "number"},
        "total": {"type": "number"},
    },
    "required": ["vendor_name", "invoice_number", "date", "line_items", "total"],
}

model = genai.GenerativeModel(
    "gemini-2.0-flash",
    generation_config=GenerationConfig(
        response_mime_type="application/json",
        response_schema=invoice_schema,
    ),
)

# Works with both text and image inputs
invoice_image = genai.upload_file("invoice_scan.pdf")
response = model.generate_content([
    "Extract all invoice details from this document.",
    invoice_image,
])

invoice_data = json.loads(response.text)

Array Responses for Batch Processing

When you need multiple structured items from a single prompt, use an array schema:

Still reading? Stop comparing — try CallSphere live.

CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.

Try Live Demo → Book 30-min Walkthrough See Pricing

batch_schema = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "email": {"type": "string"},
            "intent": {
                "type": "string",
                "enum": ["support", "sales", "billing", "feedback", "spam"],
            },
            "priority": {
                "type": "string",
                "enum": ["low", "medium", "high", "critical"],
            },
            "suggested_response": {"type": "string"},
        },
        "required": ["email", "intent", "priority", "suggested_response"],
    },
}

model = genai.GenerativeModel(
    "gemini-2.0-flash",
    generation_config=GenerationConfig(
        response_mime_type="application/json",
        response_schema=batch_schema,
    ),
)

emails_text = """
Email 1: "Our production server is down, we need immediate help!"
Email 2: "Can you send me pricing for the enterprise plan?"
Email 3: "Just wanted to say your product saved us 20 hours this week."
"""

response = model.generate_content(
    f"Classify each email and suggest a response:\n{emails_text}"
)

classified = json.loads(response.text)
for item in classified:
    print(f"Intent: {item['intent']} | Priority: {item['priority']}")

Validation Pattern for Production

Always validate structured output even with schema enforcement:

from pydantic import BaseModel, field_validator
from typing import Literal

class ReviewAnalysis(BaseModel):
    sentiment: Literal["positive", "negative", "mixed", "neutral"]
    urgency: Literal["low", "medium", "high"]
    topics: list[str]
    summary: str
    confidence_score: float

    @field_validator("confidence_score")
    @classmethod
    def validate_confidence(cls, v):
        if not 0 <= v <= 1:
            raise ValueError("confidence_score must be between 0 and 1")
        return v

# Parse and validate
raw = json.loads(response.text)
validated = ReviewAnalysis(**raw)

FAQ

Does structured output work with streaming?

Yes, but the JSON is only valid once the full response is received. During streaming, you receive partial JSON that cannot be parsed until complete. If you need progressive results, use a streaming JSON parser or wait for the complete response.

What happens if the model cannot match the schema?

If the model cannot generate valid output matching your schema, the response may be empty or contain a minimal valid structure. This is rare with well-designed schemas but can occur with overly restrictive constraints or contradictory requirements.

Can I use Pydantic models directly as the schema?

Not directly in the google-generativeai SDK. You need to pass a JSON Schema dictionary. However, you can generate the schema from a Pydantic model using ReviewAnalysis.model_json_schema() and pass that to response_schema.

#GoogleGemini #StructuredOutput #JSON #DataExtraction #Python #AgenticAI #LearnAI #AIEngineering

Gemini Structured Output: Getting JSON and Typed Responses from Google AI

Why Structured Output Matters for Agents

Basic JSON Mode

Schema-Constrained Output

Extracting Structured Data from Documents

Array Responses for Batch Processing

Validation Pattern for Production

FAQ

Does structured output work with streaming?

What happens if the model cannot match the schema?

Can I use Pydantic models directly as the schema?

Try CallSphere AI Voice Agents

Related Articles You May Like

Building Your First Agent with the OpenAI Agents SDK in 2026: A Hands-On Walkthrough

Smolagents: Hugging Face's Code-First Agent Framework Reviewed

Structured Output Prompts: JSON Schema, XML, and Function-Call Modes

Gemini 3.1 Pro and Workspace Intelligence: April 2026 Chat Agent Updates

Deploy a Voice Agent on Modal with Python and Serverless GPU

Pydantic AI April 2026 Update: Typed Agents and Structured Tools