WebSocket Transport for Low-Latency Agent Communication
Enable WebSocket transport in the OpenAI Agents SDK for persistent connections, reduced latency, and faster multi-turn agent interactions using set_default_openai_responses_transport.
Why WebSocket Transport Matters for Agents
By default, the OpenAI Agents SDK uses HTTP for every API call. Each generation, each tool-call round, and each handoff issues a new HTTP request: a fresh TCP connection and TLS handshake (or, at best, new request framing on a keep-alive connection), plus HTTP header parsing. For a single agent call this overhead is negligible. For a multi-agent workflow with ten tool calls and three handoffs, it adds up.
WebSocket transport replaces these individual HTTP requests with a single persistent connection. The agent opens a WebSocket to the OpenAI API once, and all subsequent messages flow over that connection with minimal overhead. The result is measurably lower latency for multi-turn and tool-heavy agent interactions.
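The effect of connection reuse is easy to see with plain asyncio streams. The sketch below is a stand-in, not real WebSocket or SDK code: a toy local echo server and a client that opens one connection and reuses it for every message, which is the same idea behind WebSocket transport.

```python
import asyncio

async def echo_handler(reader, writer):
    # Toy server: echo each newline-delimited message back.
    while data := await reader.readline():
        writer.write(data)
        await writer.drain()
    writer.close()

async def main():
    # Port 0 lets the OS pick a free port (local demo only).
    server = await asyncio.start_server(echo_handler, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    # One persistent connection, reused for every message --
    # no per-request connection setup.
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    replies = []
    for i in range(3):
        writer.write(f"message {i}\n".encode())
        await writer.drain()
        replies.append((await reader.readline()).decode().strip())

    writer.close()
    server.close()
    await server.wait_closed()
    return replies

print(asyncio.run(main()))
```

All three messages flow over the single connection opened at the start, which is exactly what the SDK's WebSocket transport does with API requests.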
Enabling WebSocket Transport
The SDK provides a one-line configuration to switch to WebSocket transport:
from agents import set_default_openai_responses_transport
# Enable WebSocket transport globally
set_default_openai_responses_transport("websocket")
That is it. Every subsequent Runner.run() call will use WebSocket instead of HTTP. No changes to your agent definitions, tools, or handoffs are needed.
How It Works Under the Hood
When you set the transport to "websocket", the SDK:
- Opens a persistent WebSocket connection to the OpenAI Responses API
- Sends agent generation requests as WebSocket messages
- Receives streaming responses over the same connection
- Keeps the connection alive across multiple tool call rounds within a single Runner.run()
The key performance benefit is in multi-turn interactions. Consider an agent that calls three tools sequentially:
HTTP transport (default):
- Request 1: Initial generation -> Response (tool call) — ~200ms overhead
- Request 2: Tool result -> Response (tool call) — ~200ms overhead
- Request 3: Tool result -> Response (tool call) — ~200ms overhead
- Request 4: Tool result -> Final response — ~200ms overhead
- Total overhead: ~800ms just from HTTP round trips
WebSocket transport:
- Connection established once: ~300ms
- Message 1: Initial generation -> Response (tool call) — ~20ms overhead
- Message 2: Tool result -> Response (tool call) — ~20ms overhead
- Message 3: Tool result -> Response (tool call) — ~20ms overhead
- Message 4: Tool result -> Final response — ~20ms overhead
- Total overhead: ~380ms
For tool-heavy workflows, WebSocket transport can reduce total latency by 40-60%.
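The arithmetic above generalizes to any number of request rounds. A quick back-of-envelope helper, using the illustrative ~200ms, ~300ms, and ~20ms figures from the breakdown (these are rough example numbers, not measured constants):

```python
def http_overhead_ms(rounds: int, per_request_ms: float = 200.0) -> float:
    """Total protocol overhead when every round is a fresh HTTP request."""
    return rounds * per_request_ms

def websocket_overhead_ms(rounds: int, handshake_ms: float = 300.0,
                          per_message_ms: float = 20.0) -> float:
    """One-time connection handshake plus a small per-message cost."""
    return handshake_ms + rounds * per_message_ms

# Four rounds: initial generation plus three tool-result turns.
print(http_overhead_ms(4))       # 800.0
print(websocket_overhead_ms(4))  # 380.0
```

Note the crossover: for a single round, HTTP's ~200ms beats WebSocket's ~320ms, which is why simple single-turn agents gain little from switching.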
Benchmarking the Difference
Let us build a benchmark to measure the actual impact:
from agents import Agent, Runner, function_tool, set_default_openai_responses_transport
import asyncio
import time

@function_tool
def get_weather(city: str) -> str:
    """Get current weather for a city."""
    return f"Weather in {city}: 22°C, partly cloudy"

@function_tool
def get_population(city: str) -> str:
    """Get population of a city."""
    return f"Population of {city}: 8.3 million"

@function_tool
def get_timezone(city: str) -> str:
    """Get timezone of a city."""
    return f"Timezone of {city}: UTC+5:30"

agent = Agent(
    name="CityInfoAgent",
    model="gpt-4.1",
    instructions=(
        "When asked about a city, always call all three tools "
        "(weather, population, timezone) before responding."
    ),
    tools=[get_weather, get_population, get_timezone],
)

async def benchmark_transport(transport: str, iterations: int = 5):
    """Benchmark agent runs with the specified transport."""
    set_default_openai_responses_transport(transport)
    durations = []
    for i in range(iterations):
        start = time.monotonic()
        result = await Runner.run(
            agent,
            input="Tell me about Mumbai.",
        )
        elapsed = time.monotonic() - start
        durations.append(elapsed)
    avg = sum(durations) / len(durations)
    # Crude p95: indexes into the sorted samples, so with only ten runs
    # this is simply the slowest run.
    p95 = sorted(durations)[int(len(durations) * 0.95)]
    return {"transport": transport, "avg_ms": avg * 1000, "p95_ms": p95 * 1000}

async def main():
    print("Benchmarking HTTP transport...")
    http_results = await benchmark_transport("http", iterations=10)
    print(f"  HTTP - avg: {http_results['avg_ms']:.0f}ms, p95: {http_results['p95_ms']:.0f}ms")

    print("Benchmarking WebSocket transport...")
    ws_results = await benchmark_transport("websocket", iterations=10)
    print(f"  WS   - avg: {ws_results['avg_ms']:.0f}ms, p95: {ws_results['p95_ms']:.0f}ms")

    improvement = (1 - ws_results["avg_ms"] / http_results["avg_ms"]) * 100
    print(f"  Improvement: {improvement:.1f}%")

asyncio.run(main())
In typical benchmarks, you will see 30-50% latency reduction for agents with three or more tool calls per run.
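The benchmark's p95 is a crude index into the sorted samples, which for ten runs just reports the slowest run. If you want a smoother estimate, a linear-interpolation percentile (a standard definition; assumed here to match numpy's default method, but implemented with no dependencies) looks like this:

```python
def percentile(samples: list[float], pct: float) -> float:
    """Linear-interpolation percentile over a list of samples."""
    s = sorted(samples)
    if len(s) == 1:
        return s[0]
    # Fractional rank between index 0 and len(s) - 1.
    rank = (len(s) - 1) * pct / 100
    lo = int(rank)
    frac = rank - lo
    if frac == 0:
        return s[lo]
    # Interpolate between the two surrounding samples.
    return s[lo] + (s[lo + 1] - s[lo]) * frac

durations_ms = [100, 110, 120, 130, 140, 150, 160, 170, 180, 190]
print(percentile(durations_ms, 95))  # ~185.5, between the two slowest runs
```

With ten samples this interpolates between the ninth and tenth values instead of simply returning the maximum.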
Per-Agent Transport Configuration
You can also set the transport per-agent rather than globally:
from agents import Agent

# This agent uses WebSocket for its low-latency requirement
fast_agent = Agent(
    name="FastAgent",
    model="gpt-4.1",
    instructions="Respond quickly using available tools.",
    tools=[get_weather, get_population],
    model_settings={"transport": "websocket"},
)

# This agent uses the default HTTP transport (simpler debugging)
debug_agent = Agent(
    name="DebugAgent",
    model="gpt-4.1",
    instructions="Process requests for debugging and analysis.",
)
Connection Management
WebSocket connections need lifecycle management in production. The SDK handles most of this automatically, but you should be aware of the behavior:
from agents import set_default_openai_responses_transport, Runner
import asyncio

async def handle_request(user_input: str):
    """Each Runner.run() manages its own WebSocket lifecycle."""
    # The SDK opens a WebSocket for this run
    result = await Runner.run(agent, input=user_input)
    # The WebSocket is closed when the run completes
    return result.final_output

async def handle_concurrent_requests(inputs: list[str]):
    """Concurrent runs each get their own WebSocket connection."""
    tasks = [handle_request(inp) for inp in inputs]
    results = await asyncio.gather(*tasks)
    return results

# Enable WebSocket globally
set_default_openai_responses_transport("websocket")

# Handle 10 concurrent requests — each gets its own connection
asyncio.run(handle_concurrent_requests(["Query " + str(i) for i in range(10)]))
Each Runner.run() call manages its own WebSocket connection. Concurrent runs create concurrent connections. This is safe and correct — WebSocket connections are lightweight, and the OpenAI API supports many simultaneous connections per API key.
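If you do want to cap how many connections open at once, say, to stay under a provider-side connection limit (an operational precaution, not a documented SDK requirement), a plain asyncio.Semaphore around each run is enough. In this sketch, handle_request is a stand-in for the real function above, with the API call replaced by a short sleep so the example is self-contained:

```python
import asyncio

async def handle_request(user_input: str) -> str:
    # Stand-in for a real Runner.run() call over WebSocket.
    await asyncio.sleep(0.01)
    return f"handled: {user_input}"

async def handle_with_limit(inputs: list[str], max_concurrent: int = 3) -> list[str]:
    """Run all requests concurrently, but at most `max_concurrent` at a time."""
    sem = asyncio.Semaphore(max_concurrent)

    async def limited(inp: str) -> str:
        async with sem:  # at most `max_concurrent` tasks hold a connection
            return await handle_request(inp)

    return await asyncio.gather(*(limited(inp) for inp in inputs))

print(asyncio.run(handle_with_limit([f"Query {i}" for i in range(10)])))
```

Results come back in input order because asyncio.gather preserves ordering, even though completion order varies.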
Streaming with WebSocket Transport
WebSocket transport pairs naturally with streaming, since the connection is already persistent:
from agents import Agent, Runner, set_default_openai_responses_transport

set_default_openai_responses_transport("websocket")

agent = Agent(
    name="StreamingAgent",
    model="gpt-4.1",
    instructions="Provide detailed answers.",
)

async def stream_response(user_input: str):
    """Stream agent output over WebSocket transport."""
    result = Runner.run_streamed(agent, input=user_input)
    async for event in result.stream_events():
        if event.type == "raw_response_event":
            if hasattr(event.data, "delta") and event.data.delta:
                print(event.data.delta, end="", flush=True)
    print()  # Newline after streaming completes
    return result.final_output
The combination of WebSocket transport and streaming gives you the lowest possible time-to-first-token for agent responses.
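Time-to-first-token is straightforward to instrument around any event stream. The sketch below uses a simulated async event source, fake_events, as a stand-in for result.stream_events(), so it runs without an API connection; swap in the real stream to measure your agents:

```python
import asyncio
import time

async def fake_events():
    # Stand-in for result.stream_events(): a few delayed token deltas.
    for token in ["Hel", "lo", "!"]:
        await asyncio.sleep(0.02)
        yield token

async def measure_ttft():
    """Return the full streamed text and the time-to-first-token in seconds."""
    start = time.monotonic()
    ttft = None
    parts = []
    async for delta in fake_events():
        if ttft is None:
            ttft = time.monotonic() - start  # latency until the first delta
        parts.append(delta)
    return "".join(parts), ttft

text, ttft = asyncio.run(measure_ttft())
print(text)  # prints "Hello!"
```

Tracking this metric before and after enabling WebSocket transport tells you whether the switch paid off for your workload.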
When to Use WebSocket Transport
Use WebSocket when:
- Your agents make three or more tool calls per run
- You have multi-agent workflows with handoffs
- Latency is a key metric (real-time chat, voice agents)
- You are running streaming responses
Stick with HTTP when:
- Your agents are simple single-turn, no-tool interactions
- You are debugging and want clear request/response pairs in your network inspector
- Your infrastructure (proxies, load balancers) does not support WebSocket passthrough
- You are behind a corporate firewall that blocks WebSocket upgrades
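The two checklists above can be collapsed into a small decision helper. This function is purely illustrative (its name and parameters are invented for this sketch, not part of the SDK):

```python
def choose_transport(tool_calls_per_run: int, uses_handoffs: bool,
                     latency_sensitive: bool, infra_supports_ws: bool) -> str:
    """Illustrative decision rule based on the checklists above."""
    if not infra_supports_ws:
        # Proxies, load balancers, and firewalls must allow the
        # WebSocket Upgrade handshake; otherwise fall back to HTTP.
        return "http"
    if tool_calls_per_run >= 3 or uses_handoffs or latency_sensitive:
        return "websocket"
    # Simple single-turn, no-tool agents gain little from a persistent
    # connection, and HTTP is easier to debug.
    return "http"

print(choose_transport(4, False, False, True))  # websocket
print(choose_transport(1, False, False, True))  # http
```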
Infrastructure Considerations
If you deploy behind a reverse proxy or load balancer, ensure WebSocket support is enabled:
# nginx.conf
location /api/agent/ {
    proxy_pass http://agent-service:8000;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_set_header Host $host;
    proxy_read_timeout 300s;
    proxy_send_timeout 300s;
}
For Kubernetes ingress:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: agent-ingress
  annotations:
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "300"
    nginx.ingress.kubernetes.io/websocket-services: "agent-service"
spec:
  rules:
    - host: agents.example.com
      http:
        paths:
          - path: /api/
            pathType: Prefix
            backend:
              service:
                name: agent-service
                port:
                  number: 8000
WebSocket transport is a straightforward optimization that yields meaningful latency improvements for tool-heavy and multi-agent workflows. Enable it globally, benchmark it against HTTP for your specific use case, and ensure your infrastructure supports WebSocket passthrough. The single-line configuration change makes it one of the easiest performance wins available.
Written by
CallSphere Team