
MCPServerStreamableHTTP: Connecting to Remote Tool Servers

Connect agents to remote MCP tool servers using MCPServerStreamableHTTP with authentication headers, timeout configuration, retry policies, tool caching, and production deployment patterns.

When to Use Streamable HTTP

MCPServerStdio works great when the tool server runs on the same machine as the agent. But in production, your tools often live on remote servers — a company API, a cloud service, a shared tool server accessible by multiple agents. MCPServerStreamableHTTP connects your agent to remote MCP servers over HTTP, with support for streaming responses, authentication, retries, and tool caching.

Use Streamable HTTP when:

  • The MCP server runs on a different machine or in the cloud
  • Multiple agents need to share the same tool server
  • The tool server needs to scale independently from agents
  • You need authentication, rate limiting, or other HTTP-layer features

Basic Configuration

from agents.mcp import MCPServerStreamableHTTP

server = MCPServerStreamableHTTP(
    name="Remote Tools",
    params={
        "url": "https://tools.example.com/mcp",
    },
)

The url points to the MCP endpoint on the remote server. The Streamable HTTP transport communicates using HTTP POST requests with JSON-RPC payloads and receives streaming responses via Server-Sent Events.
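For intuition about what travels over that POST channel, here is the shape of a single request in the exchange — a minimal sketch of the JSON-RPC 2.0 envelope for a tools/list call (the SDK builds and sends this for you; the exact wire details, session headers, and response streaming are handled internally):

```python
import json

# A tools/list request as the client would POST it to the /mcp endpoint.
# JSON-RPC 2.0 envelope: protocol version, request id, method, params.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list",
    "params": {},
}
body = json.dumps(request)
print(body)
```

The server replies either with a plain JSON response or with an SSE stream carrying the JSON-RPC result, which is what makes long-running tool calls practical over HTTP.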


Authentication with Headers

Most remote MCP servers require authentication. Pass headers in the configuration:

import os

server = MCPServerStreamableHTTP(
    name="Authenticated Tools",
    params={
        "url": "https://tools.example.com/mcp",
        "headers": {
            "Authorization": f"Bearer {os.environ['MCP_API_KEY']}",
            "X-Org-Id": "org_12345",
        },
    },
)

For OAuth-based authentication where tokens expire:

class TokenRefreshingMCPServer:
    """Wrapper that refreshes auth tokens before connecting."""

    def __init__(self, url: str, token_provider):
        self.url = url
        self.token_provider = token_provider

    async def get_server(self) -> MCPServerStreamableHTTP:
        token = await self.token_provider.get_valid_token()
        return MCPServerStreamableHTTP(
            name="OAuth Tools",
            params={
                "url": self.url,
                "headers": {
                    "Authorization": f"Bearer {token}",
                },
            },
        )

# Usage — OAuthTokenProvider is a hypothetical token client you supply
token_provider = OAuthTokenProvider(
    client_id="your_client_id",
    client_secret="your_client_secret",
    token_url="https://auth.example.com/token",
)

refreshing_server = TokenRefreshingMCPServer(
    url="https://tools.example.com/mcp",
    token_provider=token_provider,
)

server = await refreshing_server.get_server()

Timeout and Retry Configuration

Remote servers can be slow or temporarily unavailable. Configure timeouts and retries to handle this gracefully:


server = MCPServerStreamableHTTP(
    name="Resilient Remote Tools",
    params={
        "url": "https://tools.example.com/mcp",
        "headers": {
            "Authorization": f"Bearer {os.environ['MCP_API_KEY']}",
        },
        "timeout": 30,           # Connection timeout in seconds
        "sse_read_timeout": 300,  # SSE stream read timeout for long operations
    },
)

The distinction between timeout and sse_read_timeout matters: timeout is the initial connection timeout, while sse_read_timeout controls how long to wait for streaming data. Long-running tools (like database migrations or file processing) need a generous sse_read_timeout.
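The per-chunk nature of a read timeout is easy to see without the SDK. In this stdlib sketch, each streamed chunk must arrive within the read timeout, but the total stream may take much longer — the same way sse_read_timeout bounds the gap between SSE events rather than the whole operation (the helper and generator names here are illustrative):

```python
import asyncio

async def read_with_timeout(stream, read_timeout: float) -> list[str]:
    """Collect chunks from an async stream; each chunk must arrive in time."""
    chunks = []
    it = stream.__aiter__()
    while True:
        try:
            # Bound the wait for *this* chunk, not the whole stream.
            chunk = await asyncio.wait_for(it.__anext__(), timeout=read_timeout)
        except StopAsyncIteration:
            break
        chunks.append(chunk)
    return chunks

async def slow_events():
    """Stand-in for an SSE stream that emits events with small gaps."""
    for i in range(3):
        await asyncio.sleep(0.01)
        yield f"event-{i}"

chunks = asyncio.run(read_with_timeout(slow_events(), read_timeout=1.0))
print(chunks)  # ['event-0', 'event-1', 'event-2']
```

If any single gap exceeded read_timeout, asyncio.wait_for would raise TimeoutError — which is why a long-running tool that streams progress events periodically can survive a modest sse_read_timeout, while one that goes silent for minutes cannot.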

Retry with Backoff

For production reliability, start by bounding how long any single request can hang:

from agents.mcp import MCPServerStreamableHTTP

server = MCPServerStreamableHTTP(
    name="Production Tools",
    params={
        "url": "https://tools.example.com/mcp",
        "headers": {"Authorization": f"Bearer {os.environ['MCP_API_KEY']}"},
    },
    # Maximum time the client session waits on a single MCP request
    client_session_timeout_seconds=300,
)

For more control over retries, wrap the server connection with custom logic:

import asyncio

async def connect_with_retry(
    server: MCPServerStreamableHTTP,
    max_attempts: int = 3,
    base_delay: float = 1.0,
) -> bool:
    """Connect to an MCP server with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            await server.connect()
            return True
        except Exception as e:
            if attempt == max_attempts - 1:
                raise
            delay = base_delay * (2 ** attempt)
            print(f"Connection attempt {attempt + 1} failed: {e}. Retrying in {delay}s...")
            await asyncio.sleep(delay)
    return False
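To sanity-check the schedule above: the delay after failed attempt n is base_delay * 2**n, and the final attempt raises instead of sleeping. A quick helper using the same formula (illustrative, not part of the SDK):

```python
def backoff_delays(max_attempts: int = 3, base_delay: float = 1.0) -> list[float]:
    """Delays slept between attempts; the last failure raises, so one fewer
    delay than attempts."""
    return [base_delay * (2 ** attempt) for attempt in range(max_attempts - 1)]

print(backoff_delays(3, 1.0))  # [1.0, 2.0]
print(backoff_delays(5, 0.5))  # [0.5, 1.0, 2.0, 4.0]
```

Adding random jitter to each delay is a common refinement — it prevents a fleet of agents from retrying in lockstep against a recovering server.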

Caching Tool Lists for Performance

Every time you enter the async with block, the client fetches the server's tool list. For servers with stable tool sets, this is redundant overhead. Enable caching:

server = MCPServerStreamableHTTP(
    name="Cached Tools",
    params={
        "url": "https://tools.example.com/mcp",
        "headers": {"Authorization": f"Bearer {os.environ['MCP_API_KEY']}"},
    },
    cache_tools_list=True,  # Cache the tool list across connections
)

With cache_tools_list=True, the tool list is fetched once and reused on subsequent connections. This saves a round trip on every agent run. Disable caching only if the server's tools change frequently.
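The mechanics are simple enough to sketch without the SDK — fetch once, serve from the cache afterward, with a manual invalidation hook in the spirit of the SDK's invalidate_tools_cache(). FakeToolServer and its counters are purely illustrative:

```python
class FakeToolServer:
    """Illustrative model of cache_tools_list behavior."""

    def __init__(self, cache_tools_list: bool = True):
        self.cache_tools_list = cache_tools_list
        self._cache = None
        self.fetch_count = 0  # how many round trips hit the "remote" server

    def _fetch_remote(self) -> list[str]:
        self.fetch_count += 1
        return ["search_contacts", "get_pipeline_summary"]

    def list_tools(self) -> list[str]:
        # Serve from cache when enabled and populated; otherwise round-trip.
        if self.cache_tools_list and self._cache is not None:
            return self._cache
        tools = self._fetch_remote()
        if self.cache_tools_list:
            self._cache = tools
        return tools

    def invalidate_tools_cache(self) -> None:
        """Force a fresh fetch on the next list_tools() call."""
        self._cache = None

server = FakeToolServer(cache_tools_list=True)
server.list_tools()
server.list_tools()
print(server.fetch_count)  # 1 — the second call was served from cache
```

Without caching, every agent run pays the extra round trip; with it, only the first connection does — and an explicit invalidation restores fresh behavior when the remote tool set actually changes.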

Building an Agent with Remote API Tools

Here is a complete example connecting to a remote CRM tools server:

import asyncio
import os

from agents import Agent, Runner
from agents.mcp import MCPServerStreamableHTTP

async def main():
    # Connect to a remote CRM tool server
    crm_server = MCPServerStreamableHTTP(
        name="CRM Tools",
        params={
            "url": "https://crm-tools.internal.company.com/mcp",
            "headers": {
                "Authorization": f"Bearer {os.environ['CRM_MCP_TOKEN']}",
                "X-Team": "sales",
            },
            "timeout": 15,
            "sse_read_timeout": 120,
        },
        cache_tools_list=True,
    )

    # Connect to a remote analytics server
    analytics_server = MCPServerStreamableHTTP(
        name="Analytics Tools",
        params={
            "url": "https://analytics-tools.internal.company.com/mcp",
            "headers": {
                "Authorization": f"Bearer {os.environ['ANALYTICS_MCP_TOKEN']}",
            },
        },
        cache_tools_list=True,
    )

    async with crm_server, analytics_server:
        agent = Agent(
            name="Sales Intelligence Agent",
            instructions="""You are a sales intelligence assistant with access
            to CRM data and analytics tools.

            Use CRM tools to look up contacts, deals, and account history.
            Use analytics tools to pull pipeline metrics and forecasts.

            Always cite specific data points when making recommendations.
            Never guess — if you cannot find the data, say so.""",
            mcp_servers=[crm_server, analytics_server],
        )

        result = await Runner.run(
            agent,
            input="What is the current pipeline value for Q2 and which deals are most at risk?",
        )
        print(result.final_output)

asyncio.run(main())

Building a Remote MCP Server

Here is how to build the server side using FastMCP with HTTP transport:

# crm_tools_server.py
from mcp.server.fastmcp import FastMCP
import asyncpg

mcp = FastMCP("CRM Tools", host="0.0.0.0", port=8080)
db_pool: asyncpg.Pool | None = None

async def get_pool() -> asyncpg.Pool:
    """Create the connection pool lazily, inside the server's event loop."""
    global db_pool
    if db_pool is None:
        db_pool = await asyncpg.create_pool(dsn="postgresql://user:pass@db:5432/crm")
    return db_pool

@mcp.tool()
async def search_contacts(query: str, limit: int = 10) -> str:
    """Search CRM contacts by name, email, or company."""
    pool = await get_pool()
    rows = await pool.fetch(
        """
        SELECT name, email, company, deal_count, total_revenue
        FROM contacts
        WHERE name ILIKE $1 OR email ILIKE $1 OR company ILIKE $1
        ORDER BY total_revenue DESC
        LIMIT $2
        """,
        f"%{query}%",
        limit,
    )
    if not rows:
        return "No contacts found matching the query."
    results = []
    for r in rows:
        results.append(
            f"- {r['name']} ({r['email']}) at {r['company']}: "
            f"{r['deal_count']} deals, ${r['total_revenue']:,.0f} revenue"
        )
    return "\n".join(results)

@mcp.tool()
async def get_pipeline_summary(quarter: str) -> str:
    """Get deal pipeline summary for a given quarter (e.g., 'Q2 2026')."""
    pool = await get_pool()
    rows = await pool.fetch(
        """
        SELECT stage, COUNT(*) AS deal_count, SUM(value) AS total_value
        FROM deals
        WHERE quarter = $1
        GROUP BY stage
        ORDER BY total_value DESC
        """,
        quarter,
    )
    if not rows:
        return f"No pipeline data found for {quarter}."
    lines = [f"Pipeline for {quarter}:"]
    for r in rows:
        lines.append(
            f"  {r['stage']}: {r['deal_count']} deals, ${r['total_value']:,.0f}"
        )
    return "\n".join(lines)

if __name__ == "__main__":
    # mcp.run() blocks and manages its own event loop, so the database pool
    # is created lazily on the first tool call rather than before run() —
    # a pool created on a different loop cannot be reused inside the server.
    mcp.run(transport="streamable-http")

Production Deployment Patterns

  1. Health checks — Add a /health endpoint to your MCP server for load balancer probes
  2. Rate limiting — Implement per-client rate limits to prevent one agent from monopolizing resources
  3. Request logging — Log every tool invocation with trace IDs for debugging
  4. Circuit breaker — If the remote server fails repeatedly, stop trying and fall back gracefully
  5. mTLS — Use mutual TLS for service-to-service authentication in internal networks
  6. Connection pooling — Reuse HTTP connections across multiple agent runs
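Of these, the circuit breaker is the one most often hand-rolled around an MCP client. A minimal sketch — open the circuit after N consecutive failures, reject calls during a cooldown, then let a single probe through (thresholds and class shape are illustrative, not an SDK feature):

```python
import time

class CircuitBreaker:
    """Open after N consecutive failures; allow a probe after the cooldown."""

    def __init__(self, failure_threshold: int = 5, cooldown: float = 30.0):
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at: float | None = None

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True  # circuit closed: normal operation
        if time.monotonic() - self.opened_at >= self.cooldown:
            # Half-open: permit one probe request to test recovery.
            self.opened_at = None
            self.failures = 0
            return True
        return False  # circuit open: fail fast, fall back gracefully

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()
```

Wrap each remote tool-server call in allow_request()/record_failure(), and the agent can return a degraded answer immediately instead of stacking timeouts against a server that is already down.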

MCPServerStreamableHTTP is the production transport for multi-service architectures where tools live on dedicated servers.

Written by the CallSphere Team.
