Building Production Chat Agents with OpenAI Agents SDK
Learn how to build production-grade chat agents using the OpenAI Agents SDK with tool integration, session management, resilient error handling, and a FastAPI backend architecture.
From Chatbot to Chat Agent
Most chat interfaces built on top of LLMs are simple request-response wrappers — the user sends a message, the API returns a completion, and the frontend displays it. These are chatbots, not agents. A chat agent is fundamentally different: it can reason about which tools to use, execute multi-step plans, hand off to specialized sub-agents, and maintain state across a conversation session.
The OpenAI Agents SDK provides the primitives to build chat agents that go beyond text generation. In this guide, we build a production chat agent from scratch using the Agents SDK, FastAPI for the backend, and proper session management for multi-user deployments.
Chat Agent Architecture
A production chat agent has four components:
- Agent Definition — the instructions, model, and tools that define the agent's behavior
- Session Layer — tracks conversation history and state per user
- API Layer — FastAPI endpoints that accept messages and return responses
- Tool Layer — functions the agent can call to interact with external systems
┌─────────────┐ HTTP ┌──────────────────┐
│ React Chat │◄────────────────────►│ FastAPI Server │
│ Frontend │ │ │
└─────────────┘ │ ┌─────────────┐ │
│ │ Session Mgr │ │
│ └──────┬──────┘ │
│ │ │
│ ┌──────▼──────┐ │
│ │ Chat Agent │ │
│ │ + Tools │ │
│ └──────┬──────┘ │
│ │ │
│ ┌──────▼──────┐ │
│ │ OpenAI API │ │
│ └─────────────┘ │
└──────────────────┘
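Before wiring in any framework, the request flow implied by the diagram can be sketched as plain Python. All names here are illustrative only; the real implementations follow in the next steps.

```python
# Per-request flow from the architecture diagram, with the agent call
# stubbed out. Names are illustrative, not the actual API.
def handle_chat(session_id, message, sessions, run_agent):
    history = sessions.setdefault(session_id, [])       # Session Layer
    history.append({"role": "user", "content": message})
    reply = run_agent(history)                          # Agent + Tool Layer
    history.append({"role": "assistant", "content": reply})
    return reply                                        # API Layer returns this

sessions = {}
print(handle_chat("s1", "Hi!", sessions, lambda h: "Hello!"))  # Hello!
print(len(sessions["s1"]))  # 2
```

Everything that follows is a production-grade version of this loop: the session dict becomes a managed store with expiry, and the stub becomes a real agent run.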
Step 1: Define the Chat Agent
The agent definition is the core of the system. It specifies the model, system instructions, and available tools.
# support_agent.py
# Keep this module at the top level: a local package named "agents"
# would shadow the OpenAI Agents SDK import below.
from agents import Agent, function_tool
import httpx


@function_tool
async def search_knowledge_base(query: str) -> str:
    """Search the company knowledge base for relevant articles."""
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            "http://localhost:8000/internal/search",
            json={"query": query, "limit": 3},
        )
        results = resp.json()
    if not results["articles"]:
        return "No relevant articles found."
    formatted = []
    for article in results["articles"]:
        formatted.append(
            f"**{article['title']}**\n{article['snippet']}"
        )
    return "\n\n".join(formatted)


@function_tool
async def create_support_ticket(
    subject: str,
    description: str,
    priority: str = "medium",
) -> str:
    """Create a support ticket when the issue cannot be resolved in chat."""
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            "http://localhost:8000/internal/tickets",
            json={
                "subject": subject,
                "description": description,
                "priority": priority,
            },
        )
        ticket = resp.json()
    return f"Ticket {ticket['id']} created with priority {priority}."


@function_tool
def get_business_hours() -> str:
    """Return current business hours and support availability."""
    return (
        "Business hours: Monday-Friday 9 AM - 6 PM EST. "
        "Live agent support is available during business hours. "
        "Chat agent support is available 24/7."
    )


support_agent = Agent(
    name="support_agent",
    model="gpt-4o",
    instructions="""You are a helpful customer support agent for Acme Corp.

Your responsibilities:
- Answer questions about products and services using the knowledge base
- Help troubleshoot common issues
- Create support tickets for issues you cannot resolve
- Provide business hours and availability information

Guidelines:
- Always search the knowledge base before answering product questions
- Be concise but thorough in your responses
- If you cannot resolve an issue, create a ticket and let the user know
- Never make up information — if you do not know, say so
- Use a friendly, professional tone""",
    tools=[search_knowledge_base, create_support_ticket, get_business_hours],
)
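Under the hood, @function_tool turns each function's signature and docstring into a JSON schema the model can see when deciding which tool to call. The following is a rough stdlib approximation of that idea; tool_schema is a hypothetical helper for illustration, not the SDK's actual code.

```python
import inspect

def tool_schema(fn):
    """Approximate what @function_tool derives from a function:
    name, docstring description, and required parameters."""
    sig = inspect.signature(fn)
    properties, required = {}, []
    for name, param in sig.parameters.items():
        properties[name] = {"type": "string"}  # simplified: every param as a string
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default means the model must supply it
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }

def create_support_ticket(subject: str, description: str, priority: str = "medium") -> str:
    """Create a support ticket when the issue cannot be resolved in chat."""

schema = tool_schema(create_support_ticket)
print(schema["name"])                    # create_support_ticket
print(schema["parameters"]["required"])  # ['subject', 'description']
```

This is why clear docstrings and sensible defaults matter: they are not just documentation, they are the interface the model reasons over.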
Step 2: Build the Session Manager
In production, multiple users chat simultaneously. Each conversation needs its own history. The session manager stores conversation state and converts it to the format the Agents SDK expects.
# session_manager.py
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ChatMessage:
    role: str  # "user" or "assistant"
    content: str
    timestamp: float = field(default_factory=time.time)


@dataclass
class ChatSession:
    session_id: str
    messages: list[ChatMessage] = field(default_factory=list)
    created_at: float = field(default_factory=time.time)
    last_active: float = field(default_factory=time.time)
    result: Optional[object] = None  # stores last Runner result

    def add_message(self, role: str, content: str):
        self.messages.append(ChatMessage(role=role, content=content))
        self.last_active = time.time()

    def to_input_list(self) -> list[dict]:
        """Convert session history to Agents SDK input format."""
        if self.result is not None:
            return self.result.to_input_list()
        return [
            {"role": msg.role, "content": msg.content}
            for msg in self.messages
        ]


class SessionManager:
    def __init__(self, max_sessions: int = 10000, ttl_seconds: int = 3600):
        self._sessions: dict[str, ChatSession] = {}
        self._max_sessions = max_sessions  # capacity bound; enforce with LRU eviction if needed
        self._ttl = ttl_seconds

    def get_or_create(self, session_id: str) -> ChatSession:
        self._cleanup_expired()
        if session_id not in self._sessions:
            self._sessions[session_id] = ChatSession(session_id=session_id)
        return self._sessions[session_id]

    def _cleanup_expired(self):
        now = time.time()
        expired = [
            sid for sid, s in self._sessions.items()
            if now - s.last_active > self._ttl
        ]
        for sid in expired:
            del self._sessions[sid]
The key method is to_input_list(). When a previous Runner.run() result exists, we call result.to_input_list() to get the full conversation history including tool calls and their results. This preserves the agent's complete context across turns.
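The lazy TTL eviction in _cleanup_expired can be exercised in isolation with a minimal stand-in. This is a simplified mirror of the SessionManager logic above, with the clock passed in explicitly so the behavior is deterministic.

```python
class TTLStore:
    """Simplified mirror of SessionManager's TTL eviction: expired
    entries are dropped lazily on the next access."""
    def __init__(self, ttl_seconds):
        self._last_active = {}
        self._ttl = ttl_seconds

    def touch(self, session_id, now):
        # Evict any session idle longer than the TTL, then record activity.
        expired = [
            sid for sid, t in self._last_active.items()
            if now - t > self._ttl
        ]
        for sid in expired:
            del self._last_active[sid]
        self._last_active[session_id] = now

store = TTLStore(ttl_seconds=10)
store.touch("a", now=0)
store.touch("b", now=5)
store.touch("b", now=12)           # "a" has been idle 12s > 10s, so it is evicted
print(sorted(store._last_active))  # ['b']
```

Because cleanup only runs on access, memory for stale sessions is reclaimed on the next request rather than by a background task, which keeps the implementation dependency-free at the cost of slightly delayed reclamation.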
Step 3: FastAPI Integration
The API layer connects the frontend to the agent. It handles session routing, input validation, and response formatting.
# main.py
from fastapi import FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
from agents import Runner

from support_agent import support_agent
from session_manager import SessionManager

app = FastAPI(title="Chat Agent API")
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],  # lock this down to your frontend origin in production
    allow_methods=["*"],
    allow_headers=["*"],
)
sessions = SessionManager()


class ChatRequest(BaseModel):
    session_id: str
    message: str


class ChatResponse(BaseModel):
    session_id: str
    response: str
    tools_used: list[str]


@app.post("/chat", response_model=ChatResponse)
async def chat(request: ChatRequest):
    if not request.message.strip():
        raise HTTPException(status_code=422, detail="Message cannot be empty")
    session = sessions.get_or_create(request.session_id)
    # Build the model input first: the stored result already carries the
    # prior turns (including tool calls), so only the new user message is
    # appended on top of it.
    input_list = session.to_input_list() + [
        {"role": "user", "content": request.message}
    ]
    session.add_message("user", request.message)
    result = await Runner.run(
        support_agent,
        input=input_list,
    )
    # Store the result so the next turn can recover full context
    session.result = result
    session.add_message("assistant", result.final_output)
    # Extract the names of any tools the agent called during this run
    tools_used = []
    for item in result.new_items:
        if getattr(item, "type", None) == "tool_call_item":
            tools_used.append(item.raw_item.name)
    return ChatResponse(
        session_id=request.session_id,
        response=result.final_output,
        tools_used=tools_used,
    )


@app.delete("/chat/{session_id}")
async def end_session(session_id: str):
    sessions._sessions.pop(session_id, None)
    return {"status": "session_ended"}
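The run items the SDK returns are typed objects: tool calls are distinguished by a type field and carry the called function's name on raw_item.name (shapes assumed from the Agents SDK). The filtering logic can be checked with simple stand-ins:

```python
from types import SimpleNamespace

# Stand-ins mimicking the shape of result.new_items: one tool call,
# one tool output, one final message.
new_items = [
    SimpleNamespace(type="tool_call_item",
                    raw_item=SimpleNamespace(name="search_knowledge_base")),
    SimpleNamespace(type="tool_call_output_item", raw_item=None),
    SimpleNamespace(type="message_output_item", raw_item=None),
]

# Keep only tool-call items; everything else is skipped.
tools_used = [
    item.raw_item.name
    for item in new_items
    if getattr(item, "type", None) == "tool_call_item"
]
print(tools_used)  # ['search_knowledge_base']
```

Surfacing tools_used in the API response is cheap and makes frontend debugging much easier: you can show a "searched the knowledge base" indicator, or log which tools correlate with escalations.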
Step 4: Error Handling and Resilience
Production chat agents must handle failures gracefully. Network errors, API rate limits, and tool failures should not crash the server or leave the user hanging.
# error_handling.py
import logging

from agents import Runner
from agents.exceptions import MaxTurnsExceeded, ModelBehaviorError

logger = logging.getLogger(__name__)

FALLBACK_MESSAGE = (
    "I apologize, but I am experiencing a temporary issue. "
    "Please try again in a moment, or I can create a support "
    "ticket for you."
)


async def safe_agent_run(agent, input_list, max_retries=2):
    """Run the agent with error handling and retry logic.

    Returns the run result, or None when the caller should fall
    back to FALLBACK_MESSAGE.
    """
    for attempt in range(max_retries + 1):
        try:
            return await Runner.run(
                agent,
                input=input_list,
                max_turns=15,
            )
        except MaxTurnsExceeded:
            # The agent looped too long; retrying will not help
            logger.warning("Agent exceeded max turns")
            return None
        except ModelBehaviorError as e:
            logger.error(f"Model behavior error: {e}")
            if attempt < max_retries:
                continue
            return None
        except Exception:
            logger.exception(f"Unexpected error on attempt {attempt + 1}")
            if attempt < max_retries:
                continue
            return None
    return None
Integrate the safe runner into the chat endpoint by replacing the direct Runner.run() call with safe_agent_run(). When it returns None, respond with the fallback message and offer to create a support ticket.
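The wiring itself is a small mapping from a possibly-None result to the reply text. Here is a self-contained sketch of that fallback path, with respond as an illustrative name and a stand-in object in place of a real run result:

```python
import asyncio
from types import SimpleNamespace

FALLBACK_MESSAGE = (
    "I apologize, but I am experiencing a temporary issue. "
    "Please try again in a moment, or I can create a support ticket for you."
)

async def respond(run_result):
    """Map a safe_agent_run-style result to the reply text:
    None means the run failed, so return the fallback."""
    if run_result is None:
        return FALLBACK_MESSAGE
    return run_result.final_output

print(asyncio.run(respond(SimpleNamespace(final_output="All set!"))))   # All set!
print(asyncio.run(respond(None)) == FALLBACK_MESSAGE)                   # True
```

The important property is that the endpoint always returns a well-formed ChatResponse; failure is expressed as content, never as a 500 that strands the chat UI.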
Step 5: Observability and Logging
Every production chat agent needs structured logging that captures the full lifecycle of each request — from user message to tool calls to final response. This is essential for debugging issues, measuring quality, and understanding user behavior.
# middleware.py
import logging
import time
import uuid

from fastapi import Request

logger = logging.getLogger("chat_agent")


async def log_chat_request(request: Request, call_next):
    """Log one structured line per request.

    Register in main.py with: app.middleware("http")(log_chat_request)
    """
    request_id = str(uuid.uuid4())[:8]
    start_time = time.time()
    response = await call_next(request)
    duration_ms = (time.time() - start_time) * 1000
    logger.info(
        "chat_request",
        extra={
            "request_id": request_id,
            "method": request.method,
            "path": request.url.path,
            "status": response.status_code,
            "duration_ms": round(duration_ms, 2),
        },
    )
    return response
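The fields passed via extra= only reach your logs if the formatter emits them; the default formatter silently drops them. A minimal stdlib JSON formatter handles this (one sketch among many ways to do structured logging):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each record as one JSON line, including the request
    fields attached via logger.info(..., extra={...})."""
    FIELDS = ("request_id", "method", "path", "status", "duration_ms")

    def format(self, record):
        payload = {"level": record.levelname, "message": record.getMessage()}
        for name in self.FIELDS:
            # extra= values are set as attributes on the record
            if hasattr(record, name):
                payload[name] = getattr(record, name)
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("chat_agent")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("chat_request", extra={"request_id": "ab12cd34", "status": 200})
```

One JSON object per line is trivially ingestible by log aggregators, and the request_id field lets you correlate a user complaint with the exact tool calls and latencies of that request.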
The combination of structured agent definitions, proper session management, resilient error handling, and observability gives you a chat agent that can serve real users at scale. The Agents SDK handles the complex orchestration of tool calling and multi-turn reasoning, while the FastAPI layer provides the production infrastructure around it.
Written by
CallSphere Team