Chat Agent with Code Interpreter for Data Analysis
Build a data analysis chat agent using the OpenAI Agents SDK CodeInterpreterTool that executes Python code, generates visualizations, processes uploaded files, and returns structured results.
Why Chat Agents Need Code Execution
Text-only chat agents hit a wall when users ask quantitative questions. "What was the average revenue per customer last quarter?" requires actual computation on actual data — not a language model's best guess at math. Code Interpreter bridges this gap by giving the agent a sandboxed Python environment where it can write and execute code, process uploaded files, and generate visualizations.
The OpenAI Agents SDK wraps Code Interpreter as CodeInterpreterTool, making it a first-class tool that the agent can invoke alongside custom functions. In this guide, we build a data analysis chat agent that accepts CSV uploads, runs computations, generates charts, and returns structured insights.
Agent Configuration
The data analysis agent needs CodeInterpreterTool plus clear instructions about how to approach data analysis tasks.
# analysis_agents/analyst_agent.py
# The local package uses an illustrative name other than "agents" so that it
# does not shadow the OpenAI Agents SDK package imported below.
from agents import Agent
from agents.tool import CodeInterpreterTool

analyst_agent = Agent(
    name="data_analyst",
    model="gpt-4o",
    instructions="""You are a data analysis assistant. Users will upload data files
or ask analytical questions. Your job is to:
1. Understand the user's analytical question
2. Write Python code to load, clean, and analyze the data
3. Generate visualizations when they would help explain results
4. Summarize findings in plain language with specific numbers

Guidelines:
- Always start by examining the data: check shape, columns, dtypes, and missing values
- Use pandas for data manipulation and matplotlib or seaborn for charts
- Show your work: explain what the code does before running it
- Round numbers appropriately (2 decimal places for money, 1 for percentages)
- When generating charts, use clear titles, labels, and legends
- If the data has issues (missing values, inconsistent types), note them
- Always provide a plain-language summary of the results after the analysis""",
    tools=[
        # CodeInterpreterTool requires a tool_config; the "auto" container type
        # lets OpenAI create and manage the sandbox.
        CodeInterpreterTool(
            tool_config={"type": "code_interpreter", "container": {"type": "auto"}},
        )
    ],
)
File Upload and Agent Execution
Users upload data files through the API. The files are passed to the agent run so Code Interpreter can access them.
# main.py
import openai
from fastapi import FastAPI, UploadFile, File, Form
from agents import Runner
from agents.tool import CodeInterpreterTool

# Local package; named analysis_agents (an illustrative name) so that it does
# not shadow the SDK's own "agents" package.
from analysis_agents.analyst_agent import analyst_agent
from session_manager import SessionManager

app = FastAPI()
sessions = SessionManager()
# Async client so the upload does not block the event loop
oa_client = openai.AsyncOpenAI()


@app.post("/analyze")
async def analyze_data(
    session_id: str = Form(...),
    message: str = Form(...),
    file: UploadFile | None = File(None),
):
    session = sessions.get_or_create(session_id)

    # Upload file to OpenAI if provided
    file_ids = []
    if file:
        content = await file.read()
        uploaded = await oa_client.files.create(
            file=(file.filename, content),
            purpose="assistants",
        )
        file_ids.append(uploaded.id)
        session.add_message(
            "user",
            f"{message}\n\n[Uploaded file: {file.filename}]",
        )
    else:
        session.add_message("user", message)

    input_list = session.to_input_list()

    # Make uploaded files available inside the sandbox. Input messages have no
    # "attachments" field, so the file IDs go into the Code Interpreter
    # container configuration of a per-request copy of the agent.
    agent = analyst_agent
    if file_ids:
        agent = analyst_agent.clone(
            tools=[
                CodeInterpreterTool(
                    tool_config={
                        "type": "code_interpreter",
                        "container": {"type": "auto", "file_ids": file_ids},
                    },
                )
            ],
        )

    result = await Runner.run(agent, input=input_list)
    session.result = result
    response_text = result.final_output
    session.add_message("assistant", response_text)

    # Extract generated files (charts, processed data)
    output_files = extract_output_files(result)
    return {
        "response": response_text,
        "output_files": output_files,
    }


def extract_output_files(result) -> list[dict]:
    """Extract file outputs (charts, CSVs) from the agent result.

    Files written by Code Interpreter are cited as "container_file_citation"
    annotations on the assistant's output text.
    """
    files = []
    for item in result.new_items:
        raw = getattr(item, "raw_item", None)
        for part in getattr(raw, "content", None) or []:
            for ann in getattr(part, "annotations", None) or []:
                if getattr(ann, "type", None) == "container_file_citation":
                    files.append({
                        "file_id": ann.file_id,
                        "container_id": ann.container_id,
                        "filename": ann.filename,
                    })
    return files
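The endpoint above imports SessionManager from session_manager, a module this article does not show. A minimal in-memory sketch, assuming only the three methods the endpoint calls (get_or_create, add_message, to_input_list), might look like the following. The class layout and the max_messages cap are assumptions for illustration, and a real deployment would likely persist sessions outside the process.

```python
# session_manager.py (hypothetical sketch; not part of the original article)
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Session:
    """Conversation state for one chat session."""

    session_id: str
    messages: list[dict[str, str]] = field(default_factory=list)
    result: Any = None  # last agent run result, set by the endpoint
    max_messages: int = 50  # assumed cap to keep the prompt bounded

    def add_message(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})
        # Drop the oldest messages once the cap is exceeded
        if len(self.messages) > self.max_messages:
            del self.messages[: len(self.messages) - self.max_messages]

    def to_input_list(self) -> list[dict[str, str]]:
        """Return copies so callers can modify items without touching history."""
        return [dict(m) for m in self.messages]


class SessionManager:
    """Keeps sessions in a process-local dict."""

    def __init__(self) -> None:
        self._sessions: dict[str, Session] = {}

    def get_or_create(self, session_id: str) -> Session:
        if session_id not in self._sessions:
            self._sessions[session_id] = Session(session_id)
        return self._sessions[session_id]
```

Because the store is a plain dict, sessions disappear on restart and are not shared between worker processes.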
Handling Generated Visualizations
When Code Interpreter generates a chart, it produces an image file. The API needs to serve these files so the frontend can display them inline in the chat.
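# file_routes.py
import os

import openai
from fastapi import APIRouter, HTTPException
from fastapi.responses import Response

router = APIRouter()
oa_client = openai.OpenAI()

MEDIA_TYPES = {
    ".png": "image/png",
    ".csv": "text/csv",
    ".json": "application/json",
}


@router.get("/files/{file_id}")
def get_file(file_id: str, container_id: str | None = None):
    """Download a file generated by Code Interpreter.

    Declared as a plain def so FastAPI runs the blocking OpenAI client calls
    in its thread pool instead of on the event loop.
    """
    try:
        if container_id:
            # Files written inside a Code Interpreter container are fetched
            # through the container files endpoints.
            data = oa_client.containers.files.content.retrieve(
                file_id, container_id=container_id
            ).read()
            file_info = oa_client.containers.files.retrieve(
                file_id, container_id=container_id
            )
            filename = os.path.basename(file_info.path) or "output"
        else:
            data = oa_client.files.content(file_id).read()
            filename = oa_client.files.retrieve(file_id).filename or "output"
    except openai.OpenAIError:
        raise HTTPException(status_code=404, detail="File not found")

    # Determine content type from the file extension
    extension = os.path.splitext(filename)[1].lower()
    media_type = MEDIA_TYPES.get(extension, "application/octet-stream")
    return Response(
        content=data,
        media_type=media_type,
        headers={"Content-Disposition": f'inline; filename="{filename}"'},
    )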
Building Analysis Workflows
For complex analysis tasks, the agent naturally breaks the work into steps — loading data, cleaning it, computing metrics, and generating visualizations. Here is how a typical conversation flows:
User: "Upload sales_q4.csv — show me monthly revenue trends and identify the top 5 customers by total spend."
Agent execution sequence:
- Code Interpreter loads the CSV with pandas
- Inspects columns and data types
- Parses date columns and groups by month
- Computes monthly revenue totals
- Generates a line chart for the trend
- Sorts customers by total spend and extracts top 5
- Generates a bar chart for top customers
- Returns a summary with specific numbers and both charts
The agent handles all of this in a single turn because Code Interpreter can execute multiple code cells sequentially.
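To make the sequence above concrete, here is a rough sketch of the kind of analysis the agent performs inside the sandbox. In practice the agent would typically write pandas code; this version uses only the standard library so it runs anywhere. The column names (order_date, customer, revenue) and the sample rows are invented for illustration.

```python
import csv
import io
from collections import defaultdict

# Invented sample standing in for sales_q4.csv
SAMPLE_CSV = """order_date,customer,revenue
2024-10-03,Acme,1200.00
2024-10-19,Globex,800.50
2024-11-07,Acme,400.00
2024-11-21,Initech,950.25
2024-12-02,Globex,1500.00
2024-12-15,Acme,300.00
"""


def analyze_sales(csv_text: str, top_n: int = 5):
    """Return (monthly revenue totals, top customers by total spend)."""
    monthly = defaultdict(float)
    by_customer = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        revenue = float(row["revenue"])
        month = row["order_date"][:7]  # "YYYY-MM" prefix of an ISO date
        monthly[month] += revenue
        by_customer[row["customer"]] += revenue

    monthly_totals = {m: round(v, 2) for m, v in sorted(monthly.items())}
    top_customers = sorted(
        ((c, round(v, 2)) for c, v in by_customer.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )[:top_n]
    return monthly_totals, top_customers


monthly_totals, top_customers = analyze_sales(SAMPLE_CSV)
print(monthly_totals)
print(top_customers)
```

Chart generation is left out here; in the sandbox the agent would plot these two results with matplotlib and return the images alongside its summary.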
Combining Code Interpreter with Custom Tools
The real power emerges when you combine Code Interpreter with domain-specific tools. The agent can fetch live data from your APIs, then analyze it with Code Interpreter.
# analysis_agents/enhanced_analyst.py
# The local package uses an illustrative name other than "agents" so that it
# does not shadow the OpenAI Agents SDK package imported below.
import csv
import io

import httpx
from agents import Agent, function_tool
from agents.tool import CodeInterpreterTool


def records_to_csv(records: list[dict], headers: list[str]) -> str:
    """Serialize records as CSV, quoting values that contain commas or quotes."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=headers, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


@function_tool
async def fetch_sales_data(
    start_date: str,
    end_date: str,
    region: str = "all",
) -> str:
    """Fetch sales data from the internal API for analysis."""
    async with httpx.AsyncClient() as client:
        resp = await client.get(
            "http://localhost:8000/internal/sales",
            params={
                "start": start_date,
                "end": end_date,
                "region": region,
            },
        )
        resp.raise_for_status()  # surface HTTP errors instead of parsing them
        data = resp.json()

    # Return as CSV-formatted string for Code Interpreter
    if not data["records"]:
        return "No records found for the specified period."
    headers = list(data["records"][0].keys())
    return records_to_csv(data["records"], headers)


@function_tool
async def fetch_customer_segments() -> str:
    """Fetch customer segmentation data for enriching analysis."""
    async with httpx.AsyncClient() as client:
        resp = await client.get(
            "http://localhost:8000/internal/customers/segments"
        )
        resp.raise_for_status()
        data = resp.json()

    headers = ["customer_id", "segment", "lifetime_value", "join_date"]
    return records_to_csv(data["customers"], headers)


enhanced_analyst = Agent(
    name="enhanced_analyst",
    model="gpt-4o",
    instructions="""You are a business intelligence analyst with access to live
company data and a Python code execution environment.

Workflow:
1. Use fetch tools to retrieve relevant data from company systems
2. Use Code Interpreter to analyze the data with pandas
3. Generate charts for visual insights
4. Provide a clear executive summary with specific numbers

Always fetch fresh data rather than asking the user to upload files.
Combine multiple data sources when it would improve the analysis.""",
    tools=[
        # CodeInterpreterTool requires a tool_config; "auto" lets OpenAI
        # create and manage the sandbox container.
        CodeInterpreterTool(
            tool_config={"type": "code_interpreter", "container": {"type": "auto"}},
        ),
        fetch_sales_data,
        fetch_customer_segments,
    ],
)
The agent decides which tools to call and in what order. A typical flow: the user asks "How did enterprise customers perform in Q4?" The agent calls fetch_sales_data and fetch_customer_segments, then uses Code Interpreter to join the datasets, filter for enterprise segments, compute metrics, and generate charts.
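The join-and-filter step described above could look roughly like the sketch below, which takes the two CSV strings the fetch tools return and totals revenue for one segment. It is a standard-library stand-in for the pandas merge the agent would more likely write, and the amount column on the sales records is an assumption, since the sales schema is not specified here.

```python
import csv
import io


def revenue_for_segment(sales_csv: str, segments_csv: str, segment: str) -> dict[str, float]:
    """Join sales to segments on customer_id and total revenue per customer."""
    # Build a lookup of customer_id -> segment from the segments data
    segment_of = {
        row["customer_id"]: row["segment"]
        for row in csv.DictReader(io.StringIO(segments_csv))
    }
    totals: dict[str, float] = {}
    for row in csv.DictReader(io.StringIO(sales_csv)):
        customer_id = row["customer_id"]
        # Inner-join semantics: skip sales with no matching segment record
        if segment_of.get(customer_id) != segment:
            continue
        totals[customer_id] = totals.get(customer_id, 0.0) + float(row["amount"])
    return totals


# Invented sample data in the shape the fetch tools return
sales = "customer_id,amount\nc1,100.0\nc2,50.0\nc1,25.5\nc3,10.0\n"
segments = (
    "customer_id,segment,lifetime_value,join_date\n"
    "c1,enterprise,9000,2022-01-10\n"
    "c2,smb,1200,2023-05-02\n"
)
enterprise_totals = revenue_for_segment(sales, segments, "enterprise")
print(enterprise_totals)
```

Customer c3 has sales but no segment record, so it is dropped; whether to drop or flag such rows is a design choice worth stating in the agent instructions.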
Production Considerations
File size limits. Code Interpreter has limits on uploaded file size (currently around 512 MB). For larger datasets, pre-process files to include only the relevant subset, or have a custom tool that queries the database directly and returns aggregated results.
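One way to enforce a size ceiling before a file ever reaches the OpenAI API is a small validation helper like the sketch below. The 50 MB application limit, the allowed extensions, and the function and exception names are assumptions chosen for illustration; pick values that fit your own workload.

```python
from pathlib import PurePosixPath

# Assumed application-level limits, deliberately far below the platform limit
MAX_UPLOAD_BYTES = 50 * 1024 * 1024
ALLOWED_EXTENSIONS = {".csv", ".json", ".xlsx"}


class UploadRejected(ValueError):
    """Raised when an upload fails validation."""


def validate_upload(
    filename: str, content: bytes, max_bytes: int = MAX_UPLOAD_BYTES
) -> str:
    """Return a sanitized filename, or raise UploadRejected."""
    # Strip any directory components supplied by the client
    safe_name = PurePosixPath(filename.replace("\\", "/")).name
    suffix = PurePosixPath(safe_name).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        raise UploadRejected(f"Unsupported file type: {suffix or 'none'}")
    if not content:
        raise UploadRejected("File is empty")
    if len(content) > max_bytes:
        raise UploadRejected("File exceeds the upload size limit")
    return safe_name
```

Calling this right after reading the upload lets the endpoint reject bad files with a clear error before spending an API call on them.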
Execution timeouts. Code Interpreter code cells have a timeout. Long-running computations (training ML models, processing millions of rows) may be interrupted. Design your agent instructions to work with reasonable data sizes and suggest sampling for very large datasets.
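If a dataset is too large to analyze within the timeout, one option is to sample rows before upload. The sketch below uses reservoir sampling, so it can stream a CSV of any length while holding only k rows in memory. The function name and the fixed seed are illustrative assumptions.

```python
import csv
import io
import random
from typing import Iterable


def sample_csv_rows(lines: Iterable[str], k: int, seed: int = 0) -> str:
    """Return a CSV string with the header plus a uniform random sample of k rows."""
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:
        return ""  # empty input
    rng = random.Random(seed)  # seeded so repeated runs give the same sample
    reservoir: list[list[str]] = []
    for index, row in enumerate(reader):
        if index < k:
            reservoir.append(row)
        else:
            # Keep the new row with probability k / (index + 1)
            slot = rng.randint(0, index)
            if slot < k:
                reservoir[slot] = row
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(reservoir)
    return out.getvalue()
```

Passing an open file object as lines streams the data row by row. Sampled rows are not returned in their original order, so sort afterwards if the analysis depends on ordering.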
Security. Code Interpreter runs in a sandboxed environment — it cannot access your server or network. However, uploaded files may contain sensitive data. Validate and sanitize uploads before passing them to the API. Never upload files containing credentials or PII unless your compliance requirements allow it.
Cost management. Each Code Interpreter invocation consumes compute time that is billed separately from token usage. Monitor usage through the OpenAI dashboard and set spending alerts. For high-volume applications, cache analysis results for common queries.
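Caching can be as simple as keying results on a hash of the question and the file contents, so that an identical request skips the agent run entirely. The following is a minimal in-memory sketch; the class name, the TTL default, and the normalization rule are assumptions, and a shared store would be the more likely choice across multiple workers.

```python
import hashlib
import time
from typing import Optional


class AnalysisCache:
    """In-memory cache keyed by (normalized question, file bytes)."""

    def __init__(self, ttl_seconds: float = 3600.0) -> None:
        self._ttl = ttl_seconds
        self._entries: dict[str, tuple[float, dict]] = {}

    @staticmethod
    def make_key(question: str, file_bytes: bytes = b"") -> str:
        # Normalize case and whitespace so trivially different phrasings match
        normalized = " ".join(question.lower().split())
        digest = hashlib.sha256()
        digest.update(normalized.encode("utf-8"))
        digest.update(b"\x00")  # separator between question and file content
        digest.update(file_bytes)
        return digest.hexdigest()

    def get(self, key: str) -> Optional[dict]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if time.monotonic() - stored_at > self._ttl:
            del self._entries[key]  # expired
            return None
        return value

    def put(self, key: str, value: dict) -> None:
        self._entries[key] = (time.monotonic(), value)
```

An endpoint would compute the key from the incoming message and upload, return the cached response on a hit, and call put after a successful agent run. Exact-match caching only helps with repeated questions over unchanged data, so a short TTL is the safer default for live sources.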
Code Interpreter transforms a chat agent from a text generator into a computational partner. Combined with custom tools for data access, it creates a self-service analytics experience where users get specific, data-backed answers to business questions through natural conversation.
Written by
CallSphere Team