Caching MCP Tool Definitions for Performance
Dramatically reduce agent startup latency by caching MCP tool definitions with cache_tools_list, implementing cache invalidation strategies, and benchmarking the performance gains in production agents.
The Hidden Cost of Tool Discovery
Every time an MCP agent starts a run, it calls list_tools() on each connected MCP server. This discovery step fetches the name, description, and JSON schema for every tool the server exposes. For a stdio server, that means spawning a subprocess, waiting for initialization, and exchanging JSON-RPC messages. For an HTTP server, it means a network round-trip.
When you have a single server with five tools, the cost is negligible. But production agents often connect to three, four, or more servers — a filesystem server, a database server, a search server, and a custom business logic server. Each server might expose ten to twenty tools. Suddenly, tool discovery adds 500 milliseconds to two seconds of latency before the agent can process its first message.
The fix is straightforward: cache the tool definitions so that discovery only happens once.
Enabling cache_tools_list
The OpenAI Agents SDK supports tool caching directly on MCP server instances. When you set cache_tools_list=True, the SDK stores the tool definitions after the first list_tools() call and reuses them on subsequent agent runs without re-fetching:
```python
from agents.mcp import MCPServerStdio, MCPServerStreamableHTTP

# Stdio server with caching enabled
filesystem_server = MCPServerStdio(
    name="Filesystem",
    params={
        "command": "npx",
        "args": ["-y", "@modelcontextprotocol/server-filesystem", "/data"],
    },
    cache_tools_list=True,
)

# HTTP server with caching enabled
db_server = MCPServerStreamableHTTP(
    name="Database",
    params={
        "url": "http://localhost:8001/mcp",
    },
    cache_tools_list=True,
)
```
With caching enabled, the first agent run performs normal tool discovery. Every subsequent run skips the discovery step entirely and uses the cached schemas. For stdio servers, this is especially impactful because it avoids re-spawning the subprocess just to enumerate tools.
How the Cache Works Internally
The caching mechanism is simple but effective. When cache_tools_list is True, the SDK stores the result of list_tools() in memory on the server object. On subsequent calls, it returns the stored list immediately instead of making a JSON-RPC request.
This means the cache lives for the lifetime of the server object. If you create a new MCPServerStdio instance, it starts with an empty cache. If you reuse the same instance across multiple Runner.run() calls — which is the recommended pattern — the cache persists.
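The behavior is easy to model with a small standalone sketch. The class below is illustrative only — the names and internals are assumptions, not the SDK's actual implementation — but it captures the contract: cache on first fetch, serve from memory afterward, reset on invalidation:

```python
import asyncio

class CachedToolServer:
    """Simplified model of cache_tools_list (illustrative, not SDK internals)."""

    def __init__(self, cache_tools_list=False):
        self.cache_tools_list = cache_tools_list
        self._tools_cache = None
        self.fetch_count = 0  # how many real discoveries happened

    async def _fetch_tools(self):
        # Stand-in for the JSON-RPC tools/list round-trip
        self.fetch_count += 1
        return [{"name": "read_file"}, {"name": "write_file"}]

    async def list_tools(self):
        if self.cache_tools_list and self._tools_cache is not None:
            return self._tools_cache  # cache hit: no round-trip
        tools = await self._fetch_tools()
        if self.cache_tools_list:
            self._tools_cache = tools
        return tools

    def invalidate_tools_cache(self):
        self._tools_cache = None

async def demo():
    server = CachedToolServer(cache_tools_list=True)
    await server.list_tools()   # real discovery
    await server.list_tools()   # served from cache
    server.invalidate_tools_cache()
    await server.list_tools()   # re-discovers after invalidation
    return server.fetch_count

# asyncio.run(demo()) -> 2
```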
```python
from agents import Agent, Runner
from agents.mcp import MCPServerStdio

# Create the server once, reuse it across runs
server = MCPServerStdio(
    name="Tools",
    params={"command": "npx", "args": ["-y", "@modelcontextprotocol/server-tools"]},
    cache_tools_list=True,
)

agent = Agent(
    name="Assistant",
    instructions="You are a helpful assistant.",
    mcp_servers=[server],
)

async def handle_request(user_message: str):
    # First call: discovers tools and caches them
    # All subsequent calls: use the cached tool list
    result = await Runner.run(agent, user_message)
    return result.final_output
```
Invalidating the Cache
Caching introduces a consistency problem. If the MCP server adds, removes, or modifies tools after the initial discovery, the cached list becomes stale. The agent might try to call a tool that no longer exists, or miss a newly added tool.
The SDK provides invalidate_tools_cache() to handle this:
```python
# After deploying a new version of the MCP server
filesystem_server.invalidate_tools_cache()

# The next Runner.run() call will re-discover tools
result = await Runner.run(agent, "List all files in /data")
```
You can also build automatic invalidation into your workflow. A common pattern is to invalidate on a schedule or in response to deployment events:
```python
from datetime import datetime

from agents import Runner

class ManagedMCPServer:
    def __init__(self, server, refresh_interval_seconds=300):
        self.server = server
        self.refresh_interval = refresh_interval_seconds
        self.last_refresh = datetime.now()

    async def maybe_refresh(self):
        # Invalidate the cache once the refresh interval has elapsed
        elapsed = (datetime.now() - self.last_refresh).total_seconds()
        if elapsed > self.refresh_interval:
            self.server.invalidate_tools_cache()
            self.last_refresh = datetime.now()

    async def run_agent(self, agent, message):
        await self.maybe_refresh()
        return await Runner.run(agent, message)
```
Another approach is event-driven invalidation. If your MCP servers are deployed via CI/CD, you can send a webhook or message to your agent service whenever a server is redeployed:
```python
from fastapi import FastAPI

app = FastAPI()
servers = {}  # server name -> MCP server instance, registered at startup

@app.post("/webhook/server-deployed")
async def on_server_deployed(server_name: str):
    if server_name in servers:
        servers[server_name].invalidate_tools_cache()
        return {"status": "cache_invalidated", "server": server_name}
    return {"status": "server_not_found"}
```
Latency Benchmarks
To quantify the impact of caching, here are measurements from a real agent setup with three MCP servers. The environment uses MCPServerStdio for a filesystem server, MCPServerStreamableHTTP for a database server, and another stdio server for a custom tools package.
Without caching (tool discovery on every run):
| Server | Discovery Time |
|---|---|
| Filesystem (stdio) | 420ms |
| Database (HTTP) | 85ms |
| Custom tools (stdio) | 380ms |
| Total | 885ms |
With cache_tools_list=True (after first run):
| Server | Discovery Time |
|---|---|
| Filesystem (stdio) | <1ms |
| Database (HTTP) | <1ms |
| Custom tools (stdio) | <1ms |
| Total | <3ms |
That is a 99.7% reduction in tool discovery latency. For an agent handling real-time chat, cutting 880 milliseconds from every response cycle is transformative.
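The headline number follows directly from the table values:

```python
# Uncached per-run discovery cost, summed from the first table
before_ms = 420 + 85 + 380   # = 885ms
after_ms = 3                 # cached upper bound from the second table
reduction = (before_ms - after_ms) / before_ms
print(f"{before_ms}ms -> ~{after_ms}ms, a {reduction:.1%} reduction")
```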
Benchmarking Your Own Setup
You can measure tool discovery latency in your own environment with a simple timing wrapper:
```python
import time

async def benchmark_tool_discovery(server, iterations=10):
    """Measure cold discovery: invalidate the cache before each call."""
    times = []
    for i in range(iterations):
        server.invalidate_tools_cache()
        start = time.perf_counter()
        tools = await server.list_tools()
        elapsed = (time.perf_counter() - start) * 1000
        times.append(elapsed)
        print(f"  Run {i + 1}: {elapsed:.1f}ms ({len(tools)} tools)")
    avg = sum(times) / len(times)
    print(f"  Average: {avg:.1f}ms")
    return avg

async def benchmark_cached(server, iterations=10):
    """Measure warm discovery: prime the cache, then time repeated calls."""
    await server.list_tools()
    times = []
    for _ in range(iterations):
        start = time.perf_counter()
        await server.list_tools()
        elapsed = (time.perf_counter() - start) * 1000
        times.append(elapsed)
    avg = sum(times) / len(times)
    print(f"  Cached average: {avg:.2f}ms")
    return avg
```
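Averages can hide tail latency from occasional slow subprocess spawns. A small stdlib-only helper (a sketch, not part of the SDK) summarizes the per-run samples these functions collect:

```python
import statistics

def summarize(times_ms):
    """Summarize a list of per-run discovery times in milliseconds."""
    times = sorted(times_ms)
    return {
        "avg": statistics.mean(times),
        "p50": statistics.median(times),
        "p95": times[min(len(times) - 1, int(0.95 * len(times)))],
        "max": times[-1],
    }

# Hypothetical cold-discovery samples for illustration
samples = [410.2, 418.7, 425.0, 433.9, 512.4]
stats = summarize(samples)
print(f"avg={stats['avg']:.1f}ms p95={stats['p95']:.1f}ms max={stats['max']:.1f}ms")
```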
When Not to Cache
Caching is not always the right choice. Avoid it when:
- Tools change frequently during development. If you are actively iterating on an MCP server and adding or renaming tools, stale caches will cause confusing errors.
- The server is short-lived. If each agent run creates and destroys a new server instance, caching provides no benefit because the cache is lost with the instance.
- Tool availability is dynamic. Some servers expose different tools based on the authenticated user or context. Caching a tool list from one user would be incorrect for another.
For all other cases — and especially in production where server definitions are stable — enabling cache_tools_list=True is one of the simplest and highest-impact performance optimizations available.
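The short-lived-instance pitfall is easy to demonstrate with a minimal stand-in (a standalone sketch, not SDK code): a fresh object starts with an empty cache, so per-request construction pays full discovery every time, while one long-lived object discovers once:

```python
import asyncio

class StubServer:
    """Minimal stand-in: counts real discoveries, caches when enabled."""
    def __init__(self, cache_tools_list=True):
        self.cache_tools_list = cache_tools_list
        self._cache = None
        self.discoveries = 0

    async def list_tools(self):
        if self.cache_tools_list and self._cache is not None:
            return self._cache
        self.discoveries += 1  # a real round-trip happened
        self._cache = [{"name": "demo_tool"}]
        return self._cache

async def per_request_pattern(n_requests):
    # Anti-pattern: a new server per request means the cache never helps
    total = 0
    for _ in range(n_requests):
        server = StubServer(cache_tools_list=True)
        await server.list_tools()
        total += server.discoveries
    return total

async def long_lived_pattern(n_requests):
    # Recommended: one reused server means one discovery total
    server = StubServer(cache_tools_list=True)
    for _ in range(n_requests):
        await server.list_tools()
    return server.discoveries
```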
Production Recommendations
- Always enable caching in production. Set `cache_tools_list=True` on every MCP server instance that has a stable tool set.
- Use long-lived server objects. Create MCP server instances at application startup and reuse them across requests. Do not recreate them per request.
- Invalidate on deploy. Wire your CI/CD pipeline to call `invalidate_tools_cache()` whenever an MCP server is redeployed.
- Monitor discovery latency. Log the time spent in tool discovery so you can detect regressions when servers add new tools or infrastructure changes affect subprocess startup.
- Set refresh intervals for safety. Even with deploy-triggered invalidation, add a periodic refresh (every five to ten minutes) as a safety net against missed events.
Tool caching is a small configuration change with outsized impact. It eliminates the most common source of unnecessary latency in multi-server MCP agents and is the first optimization you should apply when moving from prototype to production.
Written by
CallSphere Team
Expert insights on AI voice agents and customer communication automation.