Caching MCP Tool Definitions for Performance
Dramatically reduce agent startup latency by caching MCP tool definitions with cache_tools_list, implementing cache invalidation strategies, and benchmarking the performance gains in production agents.
The Hidden Cost of Tool Discovery
Every time an MCP agent starts a run, it calls list_tools() on each connected MCP server. This discovery step fetches the name, description, and JSON schema for every tool the server exposes. For a stdio server, that means spawning a subprocess, waiting for initialization, and exchanging JSON-RPC messages. For an HTTP server, it means a network round-trip.
When you have a single server with five tools, the cost is negligible. But production agents often connect to three, four, or more servers — a filesystem server, a database server, a search server, and a custom business logic server. Each server might expose ten to twenty tools. Suddenly, tool discovery adds 500 milliseconds to two seconds of latency before the agent can process its first message.
The fix is straightforward: cache the tool definitions so that discovery only happens once.
Enabling cache_tools_list
The OpenAI Agents SDK supports tool caching directly on MCP server instances. When you set cache_tools_list=True, the SDK stores the tool definitions after the first list_tools() call and reuses them on subsequent agent runs without re-fetching:
```python
from agents.mcp import MCPServerStdio, MCPServerStreamableHTTP

# Stdio server with caching enabled
filesystem_server = MCPServerStdio(
    name="Filesystem",
    params={
        "command": "npx",
        "args": ["-y", "@modelcontextprotocol/server-filesystem", "/data"],
    },
    cache_tools_list=True,
)

# HTTP server with caching enabled
db_server = MCPServerStreamableHTTP(
    name="Database",
    params={
        "url": "http://localhost:8001/mcp",
    },
    cache_tools_list=True,
)
```
With caching enabled, the first agent run performs normal tool discovery. Every subsequent run skips the discovery step entirely and uses the cached schemas. For stdio servers, this is especially impactful because it avoids re-spawning the subprocess just to enumerate tools.
How the Cache Works Internally
The caching mechanism is simple but effective. When cache_tools_list is True, the SDK stores the result of list_tools() in memory on the server object. On subsequent calls, it returns the stored list immediately instead of making a JSON-RPC request.
This means the cache lives for the lifetime of the server object. If you create a new MCPServerStdio instance, it starts with an empty cache. If you reuse the same instance across multiple Runner.run() calls — which is the recommended pattern — the cache persists.
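The behavior is easy to model with a small standalone sketch. The class below is illustrative only — the names and internals are assumptions, not the SDK's actual implementation — but it captures the contract: cache on first fetch, serve from memory afterward, reset on invalidation:

```python
import asyncio

class CachedToolServer:
    """Simplified model of cache_tools_list (illustrative, not SDK internals)."""

    def __init__(self, cache_tools_list=False):
        self.cache_tools_list = cache_tools_list
        self._tools_cache = None
        self.fetch_count = 0  # how many real discoveries happened

    async def _fetch_tools(self):
        # Stand-in for the JSON-RPC tools/list round-trip
        self.fetch_count += 1
        return [{"name": "read_file"}, {"name": "write_file"}]

    async def list_tools(self):
        if self.cache_tools_list and self._tools_cache is not None:
            return self._tools_cache  # cache hit: no round-trip
        tools = await self._fetch_tools()
        if self.cache_tools_list:
            self._tools_cache = tools
        return tools

    def invalidate_tools_cache(self):
        self._tools_cache = None

async def demo():
    server = CachedToolServer(cache_tools_list=True)
    await server.list_tools()   # real discovery
    await server.list_tools()   # served from cache
    server.invalidate_tools_cache()
    await server.list_tools()   # re-discovers after invalidation
    return server.fetch_count

# asyncio.run(demo()) -> 2
```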
```python
from agents import Agent, Runner
from agents.mcp import MCPServerStdio

# Create the server once, reuse it across runs
server = MCPServerStdio(
    name="Tools",
    params={"command": "npx", "args": ["-y", "@modelcontextprotocol/server-tools"]},
    cache_tools_list=True,
)

agent = Agent(
    name="Assistant",
    instructions="You are a helpful assistant.",
    mcp_servers=[server],
)

async def handle_request(user_message: str):
    # First call: discovers tools and caches them
    # All subsequent calls: use the cached tool list
    result = await Runner.run(agent, user_message)
    return result.final_output
```
Invalidating the Cache
Caching introduces a consistency problem. If the MCP server adds, removes, or modifies tools after the initial discovery, the cached list becomes stale. The agent might try to call a tool that no longer exists, or miss a newly added tool.
The SDK provides invalidate_tools_cache() to handle this:
```python
# After deploying a new version of the MCP server
filesystem_server.invalidate_tools_cache()

# The next Runner.run() call will re-discover tools
result = await Runner.run(agent, "List all files in /data")
```
You can also build automatic invalidation into your workflow. A common pattern is to invalidate on a schedule or in response to deployment events:
```python
from datetime import datetime

from agents import Runner

class ManagedMCPServer:
    def __init__(self, server, refresh_interval_seconds=300):
        self.server = server
        self.refresh_interval = refresh_interval_seconds
        self.last_refresh = datetime.now()

    async def maybe_refresh(self):
        # Invalidate the cache once the refresh interval has elapsed
        elapsed = (datetime.now() - self.last_refresh).total_seconds()
        if elapsed > self.refresh_interval:
            self.server.invalidate_tools_cache()
            self.last_refresh = datetime.now()

    async def run_agent(self, agent, message):
        await self.maybe_refresh()
        return await Runner.run(agent, message)
```
Another approach is event-driven invalidation. If your MCP servers are deployed via CI/CD, you can send a webhook or message to your agent service whenever a server is redeployed:
```python
from fastapi import FastAPI

app = FastAPI()
servers = {}  # server name -> MCP server instance, registered at startup

@app.post("/webhook/server-deployed")
async def on_server_deployed(server_name: str):
    if server_name in servers:
        servers[server_name].invalidate_tools_cache()
        return {"status": "cache_invalidated", "server": server_name}
    return {"status": "server_not_found"}
```
Latency Benchmarks
To quantify the impact of caching, here are measurements from a real agent setup with three MCP servers. The environment uses MCPServerStdio for a filesystem server, MCPServerStreamableHTTP for a database server, and another stdio server for a custom tools package.
Without caching (tool discovery on every run):
| Server | Discovery Time |
|---|---|
| Filesystem (stdio) | 420ms |
| Database (HTTP) | 85ms |
| Custom tools (stdio) | 380ms |
| Total | 885ms |
With cache_tools_list=True (after first run):
| Server | Discovery Time |
|---|---|
| Filesystem (stdio) | <1ms |
| Database (HTTP) | <1ms |
| Custom tools (stdio) | <1ms |
| Total | <3ms |
That is a 99.7% reduction in tool discovery latency. For an agent handling real-time chat, cutting 880 milliseconds from every response cycle is transformative.
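The headline number follows directly from the table values:

```python
# Uncached per-run discovery cost, summed from the first table
before_ms = 420 + 85 + 380   # = 885ms
after_ms = 3                 # cached upper bound from the second table
reduction = (before_ms - after_ms) / before_ms
print(f"{before_ms}ms -> ~{after_ms}ms, a {reduction:.1%} reduction")
```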
Benchmarking Your Own Setup
You can measure tool discovery latency in your own environment with a simple timing wrapper:
```python
import time

async def benchmark_tool_discovery(server, iterations=10):
    """Measure cold discovery: invalidate the cache before each call."""
    times = []
    for i in range(iterations):
        server.invalidate_tools_cache()
        start = time.perf_counter()
        tools = await server.list_tools()
        elapsed = (time.perf_counter() - start) * 1000
        times.append(elapsed)
        print(f"  Run {i + 1}: {elapsed:.1f}ms ({len(tools)} tools)")
    avg = sum(times) / len(times)
    print(f"  Average: {avg:.1f}ms")
    return avg

async def benchmark_cached(server, iterations=10):
    """Measure warm discovery: prime the cache, then time repeated calls."""
    await server.list_tools()
    times = []
    for _ in range(iterations):
        start = time.perf_counter()
        await server.list_tools()
        elapsed = (time.perf_counter() - start) * 1000
        times.append(elapsed)
    avg = sum(times) / len(times)
    print(f"  Cached average: {avg:.2f}ms")
    return avg
```
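Averages can hide tail latency from occasional slow subprocess spawns. A small stdlib-only helper (a sketch, not part of the SDK) summarizes the per-run samples these functions collect:

```python
import statistics

def summarize(times_ms):
    """Summarize a list of per-run discovery times in milliseconds."""
    times = sorted(times_ms)
    return {
        "avg": statistics.mean(times),
        "p50": statistics.median(times),
        "p95": times[min(len(times) - 1, int(0.95 * len(times)))],
        "max": times[-1],
    }

# Hypothetical cold-discovery samples for illustration
samples = [410.2, 418.7, 425.0, 433.9, 512.4]
stats = summarize(samples)
print(f"avg={stats['avg']:.1f}ms p95={stats['p95']:.1f}ms max={stats['max']:.1f}ms")
```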
When Not to Cache
Caching is not always the right choice. Avoid it when:
- Tools change frequently during development. If you are actively iterating on an MCP server and adding or renaming tools, stale caches will cause confusing errors.
- The server is short-lived. If each agent run creates and destroys a new server instance, caching provides no benefit because the cache is lost with the instance.
- Tool availability is dynamic. Some servers expose different tools based on the authenticated user or context. Caching a tool list from one user would be incorrect for another.
For all other cases — and especially in production where server definitions are stable — enabling cache_tools_list=True is one of the simplest and highest-impact performance optimizations available.
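The short-lived-instance pitfall is easy to demonstrate with a minimal stand-in (a standalone sketch, not SDK code): a fresh object starts with an empty cache, so per-request construction pays full discovery every time, while one long-lived object discovers once:

```python
import asyncio

class StubServer:
    """Minimal stand-in: counts real discoveries, caches when enabled."""
    def __init__(self, cache_tools_list=True):
        self.cache_tools_list = cache_tools_list
        self._cache = None
        self.discoveries = 0

    async def list_tools(self):
        if self.cache_tools_list and self._cache is not None:
            return self._cache
        self.discoveries += 1  # a real round-trip happened
        self._cache = [{"name": "demo_tool"}]
        return self._cache

async def per_request_pattern(n_requests):
    # Anti-pattern: a new server per request means the cache never helps
    total = 0
    for _ in range(n_requests):
        server = StubServer(cache_tools_list=True)
        await server.list_tools()
        total += server.discoveries
    return total

async def long_lived_pattern(n_requests):
    # Recommended: one reused server means one discovery total
    server = StubServer(cache_tools_list=True)
    for _ in range(n_requests):
        await server.list_tools()
    return server.discoveries
```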
Production Recommendations
- Always enable caching in production. Set `cache_tools_list=True` on every MCP server instance that has a stable tool set.
- Use long-lived server objects. Create MCP server instances at application startup and reuse them across requests. Do not recreate them per request.
- Invalidate on deploy. Wire your CI/CD pipeline to call `invalidate_tools_cache()` whenever an MCP server is redeployed.
- Monitor discovery latency. Log the time spent in tool discovery so you can detect regressions when servers add new tools or infrastructure changes affect subprocess startup.
- Set refresh intervals for safety. Even with deploy-triggered invalidation, add a periodic refresh (every five to ten minutes) as a safety net against missed events.
Tool caching is a small configuration change with outsized impact. It eliminates the most common source of unnecessary latency in multi-server MCP agents and is the first optimization you should apply when moving from prototype to production.
Written by
CallSphere Team
Expert insights on AI voice agents and customer communication automation.