Returning Rich Output from Agent Tools: Images, Files, and Structured Data

Go beyond plain text responses. Learn how to return images, files, and structured data from OpenAI Agents SDK tools using ToolOutputImage, ToolOutputFileContent, and ToolOutputText.

Beyond Plain Text Tool Outputs

Most tool examples return simple strings, but real-world agents often need to return charts, generated files, images, and structured data as well. The OpenAI Agents SDK provides dedicated output types that let your tools return rich content alongside text.

The three output types are:

  • ToolOutputText — explicit text output (useful when combining with other types)
  • ToolOutputImage — base64-encoded images that the model can see and reason about
  • ToolOutputFileContent — file content (CSV, PDF, etc.) as base64 data

Returning Images with ToolOutputImage

When your tool generates a chart, screenshot, or any visual content, wrap it in a ToolOutputImage so the agent can interpret the image:

import base64
from agents import function_tool
from agents.tool import ToolOutputImage

@function_tool
def generate_chart(data_points: str) -> ToolOutputImage:
    """Generate a bar chart from comma-separated values."""
    import matplotlib
    matplotlib.use("Agg")
    import matplotlib.pyplot as plt
    import io

    values = [float(x.strip()) for x in data_points.split(",")]
    labels = [f"Item {i+1}" for i in range(len(values))]

    fig, ax = plt.subplots()
    ax.bar(labels, values)
    ax.set_title("Generated Chart")

    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    buf.seek(0)
    plt.close(fig)

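    image_base64 = base64.b64encode(buf.read()).decode("utf-8")
    # Recent SDK releases expect an image_url (a URL or a base64 data URL);
    # check the field names against your installed version.
    return ToolOutputImage(image_url=f"data:image/png;base64,{image_base64}")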

The agent receives the image and can describe it, answer questions about it, or reference it in its response. This is particularly useful for data visualization, diagram generation, and screenshot analysis tools.
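The encoding step in a tool like this can be pulled into a small helper. The sketch below uses only the standard library and assumes the image is passed along as a base64 data URL; `to_data_url` is a hypothetical name for illustration, not an SDK function.

```python
import base64


def to_data_url(raw: bytes, media_type: str = "image/png") -> str:
    """Encode raw bytes as a base64 data URL (hypothetical helper, not part of the SDK)."""
    encoded = base64.b64encode(raw).decode("ascii")
    return f"data:{media_type};base64,{encoded}"


if __name__ == "__main__":
    # The 8-byte PNG signature stands in for real image bytes here.
    png_signature = b"\x89PNG\r\n\x1a\n"
    print(to_data_url(png_signature))
```

Keeping the encoding in one place makes it easier to swap the media type (for example `image/jpeg`) without touching the plotting code.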

Returning Files with ToolOutputFileContent

For tools that produce downloadable content — CSV exports, generated PDFs, configuration files — use ToolOutputFileContent:

import base64
import csv
import io
from agents import function_tool
from agents.tool import ToolOutputFileContent

@function_tool
def export_report_csv(report_type: str) -> ToolOutputFileContent:
    """Export a report as a CSV file."""
    output = io.StringIO()
    writer = csv.writer(output)
    writer.writerow(["Month", "Revenue", "Expenses", "Profit"])
    writer.writerow(["January", "50000", "30000", "20000"])
    writer.writerow(["February", "55000", "32000", "23000"])
    writer.writerow(["March", "60000", "31000", "29000"])

    csv_bytes = output.getvalue().encode("utf-8")
    file_base64 = base64.b64encode(csv_bytes).decode("utf-8")

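    # Recent SDK releases accept file_data (base64) plus an optional filename hint;
    # check the field names against your installed version.
    return ToolOutputFileContent(
        file_data=file_base64,
        filename=f"{report_type}_report.csv",
    )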

The model receives the file content and can summarize it, extract specific values, or explain what the file contains.
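Before wiring an export into a tool, it can help to confirm that the base64 payload decodes back to the rows you wrote. The following is a minimal standard-library sketch; `rows_to_base64_csv` and `base64_csv_to_rows` are hypothetical helper names, not SDK functions.

```python
import base64
import csv
import io


def rows_to_base64_csv(rows: list[list[str]]) -> str:
    """Serialize rows to CSV text, then base64-encode the UTF-8 bytes."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return base64.b64encode(buffer.getvalue().encode("utf-8")).decode("ascii")


def base64_csv_to_rows(payload: str) -> list[list[str]]:
    """Reverse the encoding: base64 -> UTF-8 text -> parsed CSV rows."""
    text = base64.b64decode(payload).decode("utf-8")
    return list(csv.reader(io.StringIO(text)))


if __name__ == "__main__":
    report = [
        ["Month", "Revenue", "Expenses", "Profit"],
        ["January", "50000", "30000", "20000"],
    ]
    payload = rows_to_base64_csv(report)
    # The round trip should give back exactly the rows that went in.
    print(base64_csv_to_rows(payload) == report)
```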

Combining Multiple Output Types

A single tool can return multiple outputs by returning a list. For example, a reporting tool might return both a chart image and the underlying data as text:

import base64
import io
from agents import function_tool
from agents.tool import ToolOutputImage, ToolOutputText

@function_tool
def sales_dashboard(quarter: str) -> list:
    """Generate a sales dashboard with chart and summary for the given quarter."""
    import matplotlib
    matplotlib.use("Agg")
    import matplotlib.pyplot as plt

    months = ["Month 1", "Month 2", "Month 3"]
    revenue = [45000, 52000, 61000]

    fig, ax = plt.subplots()
    ax.plot(months, revenue, marker="o")
    ax.set_title(f"Revenue Trend - {quarter}")
    ax.set_ylabel("Revenue ($)")

    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    buf.seek(0)
    plt.close(fig)

    image_data = base64.b64encode(buf.read()).decode("utf-8")
    total = sum(revenue)

    return [
        # Recent SDK releases expect an image_url (a URL or a base64 data URL).
        ToolOutputImage(image_url=f"data:image/png;base64,{image_data}"),
        ToolOutputText(text=f"Total revenue for {quarter}: ${total:,}. Growth rate: 35.6% from Month 1 to Month 3."),
    ]

When you return a list, the SDK sends each item as a separate content block in the tool response. The agent sees all of them and can reference both the chart and the text in its reply.
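The dashboard example hardcodes the growth rate in its summary string. One way to keep the text in step with the data is to derive it, as in the sketch below; `summarize_revenue` is a hypothetical helper, and it assumes growth is measured from the first month to the last.

```python
def summarize_revenue(quarter: str, revenue: list[int]) -> str:
    """Build the text summary from the data instead of hardcoding the numbers."""
    if not revenue or revenue[0] == 0:
        raise ValueError("revenue must be non-empty and start with a non-zero value")
    total = sum(revenue)
    # Percentage change from the first month to the last month.
    growth = (revenue[-1] - revenue[0]) / revenue[0] * 100
    return (
        f"Total revenue for {quarter}: ${total:,}. "
        f"Growth rate: {growth:.1f}% from Month 1 to Month {len(revenue)}."
    )


if __name__ == "__main__":
    print(summarize_revenue("Q1", [45000, 52000, 61000]))
    # Total revenue for Q1: $158,000. Growth rate: 35.6% from Month 1 to Month 3.
```

The returned string could then be passed as the `text` of the `ToolOutputText` item in the list.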

Using ToolOutputText for Explicit Text

You might wonder why ToolOutputText exists when you can just return a string. The answer is composition — when you return a list of outputs, you need explicit types for each element:

from agents import function_tool
from agents.tool import ToolOutputImage, ToolOutputText

@function_tool
def analyze_image(image_url: str) -> list:
    """Download an image, analyze it, and return both the image and analysis."""
    # Download and process the image...
    image_base64 = "..."  # base64 encoded image

    return [
        # Recent SDK releases expect an image_url (a URL or a base64 data URL).
        ToolOutputImage(image_url=f"data:image/jpeg;base64,{image_base64}"),
        ToolOutputText(text="Analysis: The image contains a landscape with mountains and a lake. Dominant colors are blue and green."),
    ]

Structured Data as Tool Output

For tools that return structured data (JSON, tables, records), the simplest approach is to serialize the data as a readable string:

import json
from agents import function_tool

@function_tool
def get_customer_profile(customer_id: str) -> str:
    """Retrieve a customer's full profile."""
    profile = {
        "id": customer_id,
        "name": "Jane Smith",
        "plan": "Enterprise",
        "usage": {"api_calls": 15420, "storage_gb": 42.3},
        "status": "active",
    }
    return json.dumps(profile, indent=2)

The agent parses JSON naturally and can extract specific fields when answering user questions. For very large or complex data, consider summarizing it in the tool before returning.
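Summarizing inside the tool can be as simple as returning a count plus the first few records. The sketch below is one possible shape, using only the standard library; `summarize_records` and its `limit` parameter are hypothetical names chosen for illustration.

```python
import json


def summarize_records(records: list[dict], limit: int = 3) -> str:
    """Return compact JSON: the total count plus only the first `limit` records."""
    summary = {
        "total_records": len(records),
        "shown": min(limit, len(records)),
        "records": records[:limit],
    }
    return json.dumps(summary, indent=2)


if __name__ == "__main__":
    # Fabricated placeholder rows, used only to demonstrate the truncation.
    sample = [{"id": f"cust-{i}", "status": "active"} for i in range(10)]
    print(summarize_records(sample, limit=2))
```

Including the total count lets the model tell the user that more records exist even though only a few were returned.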

Key Takeaways

  • Use ToolOutputImage to return charts, screenshots, and generated visuals
  • Use ToolOutputFileContent for downloadable files like CSVs and PDFs
  • Return a list to combine multiple output types from a single tool call
  • Use ToolOutputText when mixing text with other output types in a list
  • For structured data, JSON strings work well — the model parses them naturally
  • Always base64-encode binary content before wrapping in output types

Written by

CallSphere Team
