---
title: "WebSocket Transport for Low-Latency Agent Communication"
description: "Enable WebSocket transport in the OpenAI Agents SDK for persistent connections, reduced latency, and faster multi-turn agent interactions using set_default_openai_responses_transport."
canonical: https://callsphere.ai/blog/websocket-transport-low-latency-agent-communication
category: "Learn Agentic AI"
tags: ["OpenAI", "WebSocket", "Low-Latency", "Transport"]
author: "CallSphere Team"
published: 2026-03-14T00:00:00.000Z
updated: 2026-05-06T01:15:06.238Z
---

# WebSocket Transport for Low-Latency Agent Communication

> Enable WebSocket transport in the OpenAI Agents SDK for persistent connections, reduced latency, and faster multi-turn agent interactions using set_default_openai_responses_transport.

## Why WebSocket Transport Matters for Agents

By default, the OpenAI Agents SDK uses HTTP for every API call. Each tool call, each generation, each handoff results in a new HTTP request — a new TCP connection (or at least a new request on a keep-alive connection), TLS handshake overhead, and HTTP header parsing. For a single agent call, this overhead is negligible. For a multi-agent workflow with ten tool calls and three handoffs, it adds up.

WebSocket transport replaces these individual HTTP requests with a single persistent connection. The agent opens a WebSocket to the OpenAI API once, and all subsequent messages flow over that connection with minimal overhead. The result is measurably lower latency for multi-turn and tool-heavy agent interactions.

## Enabling WebSocket Transport

The SDK provides a one-line configuration to switch to WebSocket transport:

```python
from agents import set_default_openai_responses_transport

# Enable WebSocket transport globally
set_default_openai_responses_transport("websocket")
```

That is it. Every subsequent `Runner.run()` call will use WebSocket instead of HTTP. No changes to your agent definitions, tools, or handoffs are needed.

## How It Works Under the Hood

When you set the transport to `"websocket"`, the SDK:

1. Opens a persistent WebSocket connection to the OpenAI Responses API
2. Sends agent generation requests as WebSocket messages
3. Receives streaming responses over the same connection
4. Keeps the connection alive across multiple tool call rounds within a single `Runner.run()`
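The SDK manages this lifecycle internally. As a rough, stdlib-only illustration of the persistent-connection pattern (a toy client/server over plain TCP streams, not the SDK's actual wire protocol), note how the client opens one connection and then reuses it for every turn:

```python
import asyncio

async def toy_server(reader, writer):
    # Echo each newline-delimited message back with a prefix,
    # keeping the connection open across turns.
    while True:
        line = await reader.readline()
        if not line:
            break
        writer.write(b"reply:" + line)
        await writer.drain()
    writer.close()

async def run_turns(n_turns: int) -> list[str]:
    server = await asyncio.start_server(toy_server, "127.0.0.1", 0)
    host, port = server.sockets[0].getsockname()[:2]
    # Connection is established once...
    reader, writer = await asyncio.open_connection(host, port)
    replies = []
    for i in range(n_turns):
        # ...then every tool-call round is just one message on the
        # existing connection, with no new handshake.
        writer.write(f"turn-{i}\n".encode())
        await writer.drain()
        replies.append((await reader.readline()).decode().strip())
    writer.close()
    await writer.wait_closed()
    server.close()
    await server.wait_closed()
    return replies

print(asyncio.run(run_turns(4)))
# → ['reply:turn-0', 'reply:turn-1', 'reply:turn-2', 'reply:turn-3']
```

The same principle applies to the real transport: the handshake cost is paid once, and each subsequent tool-call round pays only per-message framing overhead.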

The key performance benefit is in multi-turn interactions. Consider an agent that calls three tools sequentially:

**HTTP transport (default):**

- Request 1: Initial generation -> Response (tool call) — ~200ms overhead
- Request 2: Tool result -> Response (tool call) — ~200ms overhead
- Request 3: Tool result -> Response (tool call) — ~200ms overhead
- Request 4: Tool result -> Final response — ~200ms overhead
- Total overhead: ~800ms just from HTTP round trips

**WebSocket transport:**

- Connection established once: ~300ms
- Message 1: Initial generation -> Response (tool call) — ~20ms overhead
- Message 2: Tool result -> Response (tool call) — ~20ms overhead
- Message 3: Tool result -> Response (tool call) — ~20ms overhead
- Message 4: Tool result -> Final response — ~20ms overhead
- Total overhead: ~380ms

For tool-heavy workflows, WebSocket transport can reduce round-trip overhead by 40-60%, and the savings grow with every additional tool-call round.
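The arithmetic above is easy to check. A few lines make the model explicit (the 200ms, 300ms, and 20ms figures are the illustrative estimates from the lists, not measured values):

```python
# Back-of-the-envelope overhead model using the illustrative figures above.
HTTP_PER_REQUEST_MS = 200   # per-request connection + header overhead
WS_HANDSHAKE_MS = 300       # one-time WebSocket connection setup
WS_PER_MESSAGE_MS = 20      # per-message framing overhead

def http_overhead(rounds: int) -> int:
    """Total overhead when every round is a fresh HTTP request."""
    return rounds * HTTP_PER_REQUEST_MS

def ws_overhead(rounds: int) -> int:
    """One handshake, then cheap messages on the same connection."""
    return WS_HANDSHAKE_MS + rounds * WS_PER_MESSAGE_MS

rounds = 4  # initial generation + three tool-result rounds
print(http_overhead(rounds))  # 800
print(ws_overhead(rounds))    # 380
saving = 1 - ws_overhead(rounds) / http_overhead(rounds)
print(f"{saving:.1%}")        # 52.5%
```

Note the break-even point implied by the model: with a single round, HTTP (200ms) beats WebSocket (320ms); the persistent connection only pays off once a run involves two or more rounds.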

## Benchmarking the Difference

Let us build a benchmark to measure the actual impact:

```python
from agents import Agent, Runner, function_tool, set_default_openai_responses_transport
import asyncio
import time

@function_tool
def get_weather(city: str) -> str:
    """Get current weather for a city."""
    return f"Weather in {city}: 22C, partly cloudy"

@function_tool
def get_population(city: str) -> str:
    """Get population of a city."""
    return f"Population of {city}: 8.3 million"

@function_tool
def get_timezone(city: str) -> str:
    """Get timezone of a city."""
    return f"Timezone of {city}: UTC+5:30"

agent = Agent(
    name="CityInfoAgent",
    model="gpt-4.1",
    instructions=(
        "When asked about a city, always call all three tools "
        "(weather, population, timezone) before responding."
    ),
    tools=[get_weather, get_population, get_timezone],
)

async def benchmark_transport(transport: str, iterations: int = 5):
    """Benchmark agent runs with the specified transport."""
    set_default_openai_responses_transport(transport)

    durations = []
    for i in range(iterations):
        start = time.monotonic()
        result = await Runner.run(
            agent,
            input="Tell me about Mumbai.",
        )
        elapsed = time.monotonic() - start
        durations.append(elapsed)

    avg = sum(durations) / len(durations)
    # Crude index-based p95; for small sample counts this is simply the max
    p95 = sorted(durations)[min(int(len(durations) * 0.95), len(durations) - 1)]
    return {"transport": transport, "avg_ms": avg * 1000, "p95_ms": p95 * 1000}

async def main():
    print("Benchmarking HTTP transport...")
    http_results = await benchmark_transport("http", iterations=10)
    print(f"  HTTP  - avg: {http_results['avg_ms']:.0f}ms, p95: {http_results['p95_ms']:.0f}ms")

    print("Benchmarking WebSocket transport...")
    ws_results = await benchmark_transport("websocket", iterations=10)
    print(f"  WS    - avg: {ws_results['avg_ms']:.0f}ms, p95: {ws_results['p95_ms']:.0f}ms")

    improvement = (1 - ws_results["avg_ms"] / http_results["avg_ms"]) * 100
    print(f"  Improvement: {improvement:.1f}%")

asyncio.run(main())
```

In typical benchmarks, you will see 30-50% latency reduction for agents with three or more tool calls per run.

## Per-Agent Transport Configuration

You can also set the transport per-agent rather than globally:

```python
from agents import Agent

# This agent uses WebSocket for its low-latency requirement
fast_agent = Agent(
    name="FastAgent",
    model="gpt-4.1",
    instructions="Respond quickly using available tools.",
    tools=[get_weather, get_population],
    model_settings={"transport": "websocket"},
)

# This agent uses default HTTP (simpler debugging)
debug_agent = Agent(
    name="DebugAgent",
    model="gpt-4.1",
    instructions="Process requests for debugging and analysis.",
)
```

## Connection Management

WebSocket connections need lifecycle management in production. The SDK handles most of this automatically, but you should be aware of the behavior:

```python
from agents import set_default_openai_responses_transport, Runner
import asyncio

async def handle_request(user_input: str):
    """Each Runner.run() manages its own WebSocket lifecycle."""
    # The SDK opens a WebSocket for this run
    result = await Runner.run(agent, input=user_input)
    # The WebSocket is closed when the run completes
    return result.final_output

async def handle_concurrent_requests(inputs: list[str]):
    """Concurrent runs each get their own WebSocket connection."""
    tasks = [handle_request(inp) for inp in inputs]
    results = await asyncio.gather(*tasks)
    return results

# Enable WebSocket globally
set_default_openai_responses_transport("websocket")

# Handle 10 concurrent requests — each gets its own connection
asyncio.run(handle_concurrent_requests(["Query " + str(i) for i in range(10)]))
```

Each `Runner.run()` call manages its own WebSocket connection. Concurrent runs create concurrent connections. This is safe and correct — WebSocket connections are lightweight, and the OpenAI API supports many simultaneous connections per API key.

## Streaming with WebSocket Transport

WebSocket transport pairs naturally with streaming, since the connection is already persistent:

```python
from agents import Agent, Runner, set_default_openai_responses_transport

set_default_openai_responses_transport("websocket")

agent = Agent(
    name="StreamingAgent",
    model="gpt-4.1",
    instructions="Provide detailed answers.",
)

async def stream_response(user_input: str):
    """Stream agent output over WebSocket transport."""
    result = Runner.run_streamed(agent, input=user_input)

    async for event in result.stream_events():
        if event.type == "raw_response_event":
            if hasattr(event.data, "delta") and event.data.delta:
                print(event.data.delta, end="", flush=True)

    print()  # Newline after streaming completes
    return result.final_output
```

The combination of WebSocket transport and streaming gives you the lowest possible time-to-first-token for agent responses.

## When to Use WebSocket Transport

**Use WebSocket when:**

- Your agents make three or more tool calls per run
- You have multi-agent workflows with handoffs
- Latency is a key metric (real-time chat, voice agents)
- You are running streaming responses

**Stick with HTTP when:**

- Your agents are simple single-turn, no-tool interactions
- You are debugging and want clear request/response pairs in your network inspector
- Your infrastructure (proxies, load balancers) does not support WebSocket passthrough
- You are behind a corporate firewall that blocks WebSocket upgrades
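These guidelines can be encoded in a small helper for services that configure transport at startup. This is an illustrative sketch (the function name and thresholds are ours, not part of the SDK):

```python
def choose_transport(
    expected_tool_calls: int,
    latency_sensitive: bool,
    proxy_supports_websocket: bool = True,
) -> str:
    """Illustrative heuristic encoding the guidelines above.

    Returns "websocket" for tool-heavy or latency-sensitive workloads,
    and "http" otherwise or when the infrastructure cannot pass
    WebSocket upgrades through.
    """
    if not proxy_supports_websocket:
        return "http"  # firewalls/proxies that block upgrades rule out WS
    if expected_tool_calls >= 3 or latency_sensitive:
        return "websocket"
    return "http"  # simple single-turn agents: clearer debugging, no benefit

# Examples:
print(choose_transport(4, latency_sensitive=False))   # websocket
print(choose_transport(1, latency_sensitive=False))   # http
print(choose_transport(5, latency_sensitive=True,
                       proxy_supports_websocket=False))  # http
```

The result would then be passed to `set_default_openai_responses_transport` once at process startup, before any `Runner.run()` calls.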

## Infrastructure Considerations

If you deploy behind a reverse proxy or load balancer, ensure WebSocket support is enabled:

```nginx
# nginx.conf
location /api/agent/ {
    proxy_pass http://agent-service:8000;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_set_header Host $host;
    proxy_read_timeout 300s;
    proxy_send_timeout 300s;
}
```

For Kubernetes ingress:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: agent-ingress
  annotations:
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "300"
    nginx.ingress.kubernetes.io/websocket-services: "agent-service"
spec:
  rules:
    - host: agents.example.com
      http:
        paths:
          - path: /api/
            pathType: Prefix
            backend:
              service:
                name: agent-service
                port:
                  number: 8000
```

WebSocket transport is a straightforward optimization that yields meaningful latency improvements for tool-heavy and multi-agent workflows. Enable it globally, benchmark it against HTTP for your specific use case, and ensure your infrastructure supports WebSocket passthrough. The single-line configuration change makes it one of the easiest performance wins available.

---

Source: https://callsphere.ai/blog/websocket-transport-low-latency-agent-communication
