Beyond Plain Text Tool Outputs

Most tool examples return simple strings. But real-world agents need to return charts, generated files, images, and structured data. The OpenAI Agents SDK provides dedicated output types that let your tools return rich content alongside text.

The three output types are:

ToolOutputText — explicit text output (useful when combining with other types)
ToolOutputImage — base64-encoded images that the model can see and reason about
ToolOutputFileContent — file content (CSV, PDF, etc.) as base64 data

Returning Images with ToolOutputImage

When your tool generates a chart, screenshot, or any visual content, wrap it in a ToolOutputImage so the agent can interpret the image:

flowchart LR
    INPUT(["User intent"])
    PARSE["Parse plus<br/>classify"]
    PLAN["Plan and tool<br/>selection"]
    AGENT["Agent loop<br/>LLM plus tools"]
    GUARD{"Guardrails<br/>and policy"}
    EXEC["Execute and<br/>verify result"]
    OBS[("Trace and metrics")]
    OUT(["Outcome plus<br/>next action"])
    INPUT --> PARSE --> PLAN --> AGENT --> GUARD
    GUARD -->|Pass| EXEC --> OUT
    GUARD -->|Fail| AGENT
    AGENT --> OBS
    style AGENT fill:#4f46e5,stroke:#4338ca,color:#fff
    style GUARD fill:#f59e0b,stroke:#d97706,color:#1f2937
    style OBS fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style OUT fill:#059669,stroke:#047857,color:#fff

import base64
from agents import function_tool
from agents.tool import ToolOutputImage

@function_tool
def generate_chart(data_points: str) -> ToolOutputImage:
    """Generate a bar chart from comma-separated values."""
    import matplotlib
    matplotlib.use("Agg")
    import matplotlib.pyplot as plt
    import io

    values = [float(x.strip()) for x in data_points.split(",")]
    labels = [f"Item {i+1}" for i in range(len(values))]

    fig, ax = plt.subplots()
    ax.bar(labels, values)
    ax.set_title("Generated Chart")

    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    buf.seek(0)
    plt.close(fig)

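    # ToolOutputImage takes the image as image_url (a regular URL or a data URL)
    # or as file_id, so embed the base64 payload in a data URL.
    image_base64 = base64.b64encode(buf.read()).decode("utf-8")
    return ToolOutputImage(image_url=f"data:image/png;base64,{image_base64}")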

The agent receives the image and can describe it, answer questions about it, or reference it in its response. This is particularly useful for data visualization, diagram generation, and screenshot analysis tools.
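
Images are commonly handed to vision models as data URLs, which carry the media type and the base64 payload in a single string. The sketch below shows that encoding with the standard library only. The names `to_data_url` and `from_data_url` are hypothetical helpers, not SDK functions, and whether your SDK version expects a data URL or raw base64 is an assumption to confirm against its documentation.

```python
import base64


def to_data_url(raw: bytes, media_type: str = "image/png") -> str:
    """Encode raw bytes as a base64 data URL (hypothetical helper, not part of the SDK)."""
    encoded = base64.b64encode(raw).decode("ascii")
    return f"data:{media_type};base64,{encoded}"


def from_data_url(data_url: str) -> tuple[str, bytes]:
    """Split a base64 data URL back into (media_type, raw bytes)."""
    header, encoded = data_url.split(",", 1)
    media_type = header.removeprefix("data:").removesuffix(";base64")
    return media_type, base64.b64decode(encoded)


# The 8-byte PNG signature stands in for real image bytes here.
png_signature = b"\x89PNG\r\n\x1a\n"
url = to_data_url(png_signature)
```

Decoding the URL again with `from_data_url` is a cheap way to check that nothing was corrupted before the tool returns it.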

Returning Files with ToolOutputFileContent

For tools that produce downloadable content — CSV exports, generated PDFs, configuration files — use ToolOutputFileContent:

import base64
import csv
import io
from agents import function_tool
from agents.tool import ToolOutputFileContent

@function_tool
def export_report_csv(report_type: str) -> ToolOutputFileContent:
    """Export a report as a CSV file."""
    output = io.StringIO()
    writer = csv.writer(output)
    writer.writerow(["Month", "Revenue", "Expenses", "Profit"])
    writer.writerow(["January", "50000", "30000", "20000"])
    writer.writerow(["February", "55000", "32000", "23000"])
    writer.writerow(["March", "60000", "31000", "29000"])

    csv_bytes = output.getvalue().encode("utf-8")
    file_base64 = base64.b64encode(csv_bytes).decode("utf-8")

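    # ToolOutputFileContent accepts file_data (base64), file_url, or file_id;
    # filename is an optional hint used together with file_data.
    return ToolOutputFileContent(
        file_data=file_base64,
        filename="report.csv",
    )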

The model receives the file content and can summarize it, extract specific values, or explain what the file contains.
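
Because the file travels as base64 text, it can help to confirm that the payload decodes back to the rows you wrote before wiring the tool into an agent. The following is a minimal standard-library sketch of that round trip. `encode_csv` and `decode_csv` are hypothetical helper names used only for illustration.

```python
import base64
import csv
import io


def encode_csv(rows: list[list[str]]) -> str:
    """Serialize rows to CSV and return the content as base64 text."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerows(rows)
    return base64.b64encode(buffer.getvalue().encode("utf-8")).decode("ascii")


def decode_csv(file_base64: str) -> list[list[str]]:
    """Reverse encode_csv: turn base64 text back into a list of rows."""
    text = base64.b64decode(file_base64).decode("utf-8")
    return list(csv.reader(io.StringIO(text, newline="")))


rows = [
    ["Month", "Revenue", "Expenses", "Profit"],
    ["January", "50000", "30000", "20000"],
]
payload = encode_csv(rows)
```

Note that base64 grows the payload by roughly a third, so large exports may be better summarized than returned whole.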

Combining Multiple Output Types

A single tool can return multiple outputs by returning a list. For example, a reporting tool might return both a chart image and the underlying data as text:

import base64
import io
from agents import function_tool
from agents.tool import ToolOutputImage, ToolOutputText

@function_tool
def sales_dashboard(quarter: str) -> list:
    """Generate a sales dashboard with chart and summary for the given quarter."""
    import matplotlib
    matplotlib.use("Agg")
    import matplotlib.pyplot as plt

    months = ["Month 1", "Month 2", "Month 3"]
    revenue = [45000, 52000, 61000]

    fig, ax = plt.subplots()
    ax.plot(months, revenue, marker="o")
    ax.set_title(f"Revenue Trend - {quarter}")
    ax.set_ylabel("Revenue ($)")

    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    buf.seek(0)
    plt.close(fig)

    image_data = base64.b64encode(buf.read()).decode("utf-8")
    total = sum(revenue)

    return [
        ToolOutputImage(image_url=f"data:image/png;base64,{image_data}"),
        ToolOutputText(text=f"Total revenue for {quarter}: ${total:,}. Growth rate: 35.6% from Month 1 to Month 3."),
    ]

When you return a list, the SDK sends each item as a separate content block in the tool response. The agent sees all of them and can reference both the chart and the text in its reply.
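
The summary text in the dashboard example hard-codes its growth figure, which will drift as soon as the revenue numbers change. A small sketch of deriving the same sentence from the data is shown here. `summarize_revenue` is a hypothetical helper, and the growth rate is assumed to mean the change from the first month to the last.

```python
def summarize_revenue(quarter: str, revenue: list[float]) -> str:
    """Build the summary line from the data instead of hard-coding the growth rate."""
    total = sum(revenue)
    # Growth from the first month to the last, as a percentage of the first month.
    growth = (revenue[-1] - revenue[0]) / revenue[0] * 100
    return (
        f"Total revenue for {quarter}: ${total:,}. "
        f"Growth rate: {growth:.1f}% from Month 1 to Month {len(revenue)}."
    )


summary = summarize_revenue("Q1", [45000, 52000, 61000])
```

With the sample figures, (61000 - 45000) / 45000 is about 35.6%, matching the value in the original string.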

Using ToolOutputText for Explicit Text

You might wonder why ToolOutputText exists when you can just return a string. The answer is composition — when you return a list of outputs, you need explicit types for each element:

from agents import function_tool
from agents.tool import ToolOutputText, ToolOutputImage

@function_tool
def analyze_image(image_url: str) -> list:
    """Download an image, analyze it, and return both the image and analysis."""
    # Download and process the image...
    image_base64 = "..."  # base64 encoded image

    return [
        ToolOutputImage(image_url=f"data:image/jpeg;base64,{image_base64}"),
        ToolOutputText(text="Analysis: The image contains a landscape with mountains and a lake. Dominant colors are blue and green."),
    ]

Structured Data as Tool Output

For tools that return structured data (JSON, tables, records), the simplest option is to serialize the data as a readable JSON string:

import json
from agents import function_tool

@function_tool
def get_customer_profile(customer_id: str) -> str:
    """Retrieve a customer's full profile."""
    profile = {
        "id": customer_id,
        "name": "Jane Smith",
        "plan": "Enterprise",
        "usage": {"api_calls": 15420, "storage_gb": 42.3},
        "status": "active",
    }
    return json.dumps(profile, indent=2)

The agent parses JSON naturally and can extract specific fields when answering user questions. For very large or complex data, consider summarizing it in the tool before returning.
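
One way to summarize large data before returning it is to keep only the fields the model needs and cap the number of records, while still reporting the full count. This is a sketch under those assumptions. `compact_records` and the sample `orders` data are hypothetical and not part of the SDK.

```python
import json


def compact_records(records: list[dict], fields: list[str], limit: int = 5) -> str:
    """Return a JSON string with only the selected fields and at most `limit` records."""
    trimmed = [{key: rec.get(key) for key in fields} for rec in records[:limit]]
    payload = {
        "total_records": len(records),  # lets the model know data was truncated
        "returned": len(trimmed),
        "records": trimmed,
    }
    return json.dumps(payload, indent=2)


# 100 synthetic orders, each with a bulky field the model does not need.
orders = [{"id": i, "amount": i * 10, "notes": "x" * 500} for i in range(100)]
result = json.loads(compact_records(orders, fields=["id", "amount"], limit=3))
```

Including `total_records` alongside the truncated list lets the model tell the user that more data exists instead of treating the sample as complete.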

Key Takeaways

Use ToolOutputImage to return charts, screenshots, and generated visuals
Use ToolOutputFileContent for downloadable files like CSVs and PDFs
Return a list to combine multiple output types from a single tool call
Use ToolOutputText when mixing text with other output types in a list
For structured data, JSON strings work well — the model parses them naturally
Always base64-encode binary content before wrapping in output types

Returning Rich Output from Agent Tools: Images, Files, and Structured Data
