Returning Rich Output from Agent Tools: Images, Files, and Structured Data
Go beyond plain text responses. Learn how to return images, files, and structured data from OpenAI Agents SDK tools using ToolOutputImage, ToolOutputFileContent, and ToolOutputText.
Beyond Plain Text Tool Outputs
Most tool examples return simple strings. But real-world agents need to return charts, generated files, images, and structured data. The OpenAI Agents SDK provides dedicated output types that let your tools return rich content alongside text.
The three output types are:
- ToolOutputText — explicit text output (useful when combining with other types)
- ToolOutputImage — base64-encoded images that the model can see and reason about
- ToolOutputFileContent — file content (CSV, PDF, etc.) as base64 data
Returning Images with ToolOutputImage
When your tool generates a chart, screenshot, or any visual content, wrap it in a ToolOutputImage so the agent can interpret the image:
import base64
import io

from agents import function_tool
from agents.tool import ToolOutputImage


@function_tool
def generate_chart(data_points: str) -> ToolOutputImage:
    """Generate a bar chart from comma-separated values."""
    import matplotlib
    matplotlib.use("Agg")  # headless backend, so no display is required
    import matplotlib.pyplot as plt

    values = [float(x.strip()) for x in data_points.split(",")]
    labels = [f"Item {i+1}" for i in range(len(values))]

    fig, ax = plt.subplots()
    ax.bar(labels, values)
    ax.set_title("Generated Chart")

    # Render the figure to an in-memory PNG instead of a file on disk
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    buf.seek(0)
    plt.close(fig)

    image_base64 = base64.b64encode(buf.read()).decode("utf-8")
    # ToolOutputImage takes an image_url (a regular URL or a base64 data URL) or a file_id
    return ToolOutputImage(image_url=f"data:image/png;base64,{image_base64}")
The agent receives the image and can describe it, answer questions about it, or reference it in its response. This is particularly useful for data visualization, diagram generation, and screenshot analysis tools.
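Before handing an image to the model, it can help to confirm that the encoded payload really is what the tool claims to return. The following is a minimal sketch using only the standard library. `looks_like_png` is a hypothetical helper name, not part of the Agents SDK, and it only checks the eight-byte PNG signature rather than validating the whole file.

```python
import base64
import binascii

# The fixed eight-byte signature that starts every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def looks_like_png(image_base64: str) -> bool:
    """Return True if the base64 string decodes to bytes starting with the PNG signature.

    Hypothetical helper for illustration; not part of the Agents SDK.
    """
    try:
        raw = base64.b64decode(image_base64, validate=True)
    except (binascii.Error, ValueError):
        # The input is not valid base64 at all.
        return False
    return raw.startswith(PNG_SIGNATURE)


# A payload that starts with the PNG signature passes; arbitrary bytes do not.
fake_png = base64.b64encode(PNG_SIGNATURE + b"rest-of-file").decode("utf-8")
not_png = base64.b64encode(b"hello").decode("utf-8")
```

A tool could run a check like this just before building its image output and raise a clear error if the chart library produced something unexpected.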
Returning Files with ToolOutputFileContent
For tools that produce downloadable content — CSV exports, generated PDFs, configuration files — use ToolOutputFileContent:
import base64
import csv
import io

from agents import function_tool
from agents.tool import ToolOutputFileContent


@function_tool
def export_report_csv(report_type: str) -> ToolOutputFileContent:
    """Export a report as a CSV file."""
    # Build the CSV in memory (sample rows stand in for real report data)
    output = io.StringIO()
    writer = csv.writer(output)
    writer.writerow(["Month", "Revenue", "Expenses", "Profit"])
    writer.writerow(["January", "50000", "30000", "20000"])
    writer.writerow(["February", "55000", "32000", "23000"])
    writer.writerow(["March", "60000", "31000", "29000"])

    csv_bytes = output.getvalue().encode("utf-8")
    file_base64 = base64.b64encode(csv_bytes).decode("utf-8")
    # ToolOutputFileContent accepts file_data, file_url, or file_id, plus an optional filename
    return ToolOutputFileContent(
        file_data=file_base64,
        filename="report.csv",
    )
The model receives the file content and can summarize it, extract specific values, or explain what the file contains.
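Because the file travels as an opaque base64 string, a round-trip check is a cheap way to confirm that the encoding step preserved the data. The sketch below uses only the standard library. The helper names `rows_to_base64_csv` and `base64_csv_to_rows` are hypothetical and chosen for illustration.

```python
import base64
import csv
import io


def rows_to_base64_csv(rows: list[list[str]]) -> str:
    """Serialize rows to CSV text and return it base64-encoded (hypothetical helper)."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerows(rows)
    return base64.b64encode(buffer.getvalue().encode("utf-8")).decode("ascii")


def base64_csv_to_rows(payload: str) -> list[list[str]]:
    """Decode a base64 CSV payload back into rows, e.g. to sanity-check a tool's output."""
    text = base64.b64decode(payload).decode("utf-8")
    return list(csv.reader(io.StringIO(text, newline="")))


rows = [
    ["Month", "Revenue", "Expenses", "Profit"],
    ["January", "50000", "30000", "20000"],
]
payload = rows_to_base64_csv(rows)
restored = base64_csv_to_rows(payload)  # should equal the original rows
```

The same pattern works as a unit test for any export tool: encode, decode, and compare against the source rows.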
Combining Multiple Output Types
A single tool can return multiple outputs by returning a list. For example, a reporting tool might return both a chart image and the underlying data as text:
import base64
import io

from agents import function_tool
from agents.tool import ToolOutputImage, ToolOutputText


@function_tool
def sales_dashboard(quarter: str) -> list:
    """Generate a sales dashboard with chart and summary for the given quarter."""
    import matplotlib
    matplotlib.use("Agg")  # headless backend, so no display is required
    import matplotlib.pyplot as plt

    # Sample data standing in for a real query
    months = ["Month 1", "Month 2", "Month 3"]
    revenue = [45000, 52000, 61000]

    fig, ax = plt.subplots()
    ax.plot(months, revenue, marker="o")
    ax.set_title(f"Revenue Trend - {quarter}")
    ax.set_ylabel("Revenue ($)")

    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    buf.seek(0)
    plt.close(fig)

    image_data = base64.b64encode(buf.read()).decode("utf-8")
    total = sum(revenue)

    # ToolOutputImage takes an image_url (a regular URL or a base64 data URL) or a file_id
    return [
        ToolOutputImage(image_url=f"data:image/png;base64,{image_data}"),
        ToolOutputText(
            text=f"Total revenue for {quarter}: ${total:,}. "
            "Growth rate: 35.6% from Month 1 to Month 3."
        ),
    ]
When you return a list, the SDK sends each item as a separate content block in the tool response. The agent sees all of them and can reference both the chart and the text in its reply.
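The text block in the example above hardcodes the growth figure. As a minimal sketch, the summary could instead be derived from the same data that feeds the chart, so the two outputs cannot drift apart. `summarize_revenue` is a hypothetical helper, and the quarter label "Q1" is an assumed input.

```python
def summarize_revenue(quarter: str, revenue: list[float]) -> str:
    """Build the text summary from the data instead of hardcoding the growth figure."""
    if len(revenue) < 2 or revenue[0] == 0:
        raise ValueError("need at least two data points and a non-zero first value")
    total = sum(revenue)
    # Percentage change from the first period to the last one.
    growth_pct = (revenue[-1] - revenue[0]) / revenue[0] * 100
    return (
        f"Total revenue for {quarter}: ${total:,}. "
        f"Growth rate: {growth_pct:.1f}% from Month 1 to Month {len(revenue)}."
    )


summary = summarize_revenue("Q1", [45000, 52000, 61000])
print(summary)
# Total revenue for Q1: $158,000. Growth rate: 35.6% from Month 1 to Month 3.
```

The resulting string can be passed straight into the text element of the returned list.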
Using ToolOutputText for Explicit Text
You might wonder why ToolOutputText exists when you can just return a string. The answer is composition — when you return a list of outputs, you need explicit types for each element:
from agents import function_tool
from agents.tool import ToolOutputImage, ToolOutputText


@function_tool
def analyze_image(image_url: str) -> list:
    """Download an image, analyze it, and return both the image and analysis."""
    # Download and process the image...
    image_base64 = "..."  # base64 encoded image

    # ToolOutputImage takes an image_url (a regular URL or a base64 data URL) or a file_id
    return [
        ToolOutputImage(image_url=f"data:image/jpeg;base64,{image_base64}"),
        ToolOutputText(
            text="Analysis: The image contains a landscape with mountains and a lake. "
            "Dominant colors are blue and green."
        ),
    ]
Structured Data as Tool Output
For tools that return structured data (JSON, tables, records), the simplest approach is to serialize the data as a readable JSON string:
import json

from agents import function_tool


@function_tool
def get_customer_profile(customer_id: str) -> str:
    """Retrieve a customer's full profile."""
    profile = {
        "id": customer_id,
        "name": "Jane Smith",
        "plan": "Enterprise",
        "usage": {"api_calls": 15420, "storage_gb": 42.3},
        "status": "active",
    }
    return json.dumps(profile, indent=2)
The agent parses JSON naturally and can extract specific fields when answering user questions. For very large or complex data, consider summarizing it in the tool before returning.
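One way to summarize large data before returning it is to send the total count plus a small preview instead of every record. The sketch below is a standard-library illustration under that assumption. `summarize_records`, the `max_items` parameter, and the sample `orders` data are all hypothetical.

```python
import json


def summarize_records(records: list[dict], max_items: int = 3) -> str:
    """Return a compact JSON string: total count plus only the first few records."""
    preview = records[:max_items]
    summary = {
        "total_records": len(records),
        "shown": len(preview),
        "truncated": len(records) > len(preview),
        "records": preview,
    }
    return json.dumps(summary, indent=2)


# Hypothetical data set standing in for a large query result.
orders = [{"order_id": i, "amount": 10 * i} for i in range(1, 101)]
result = json.loads(summarize_records(orders))
```

Including the `truncated` flag tells the model that more data exists, so it can say so rather than treat the preview as the full result.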
Key Takeaways
- Use ToolOutputImage to return charts, screenshots, and generated visuals
- Use ToolOutputFileContent for downloadable files like CSVs and PDFs
- Return a list to combine multiple output types from a single tool call
- Use ToolOutputText when mixing text with other output types in a list
- For structured data, JSON strings work well because the model parses them naturally
- Always base64-encode binary content before wrapping in output types
Written by
CallSphere Team
Expert insights on AI voice agents and customer communication automation.