
Caching MCP Tool Definitions for Performance

Dramatically reduce agent startup latency by caching MCP tool definitions with cache_tools_list, implementing cache invalidation strategies, and benchmarking the performance gains in production agents.

The Hidden Cost of Tool Discovery

Every time an MCP agent starts a run, it calls list_tools() on each connected MCP server. This discovery step fetches the name, description, and JSON schema for every tool the server exposes. For a stdio server, that means spawning a subprocess, waiting for initialization, and exchanging JSON-RPC messages. For an HTTP server, it means a network round-trip.

When you have a single server with five tools, the cost is negligible. But production agents often connect to three, four, or more servers — a filesystem server, a database server, a search server, and a custom business logic server. Each server might expose ten to twenty tools. Suddenly, tool discovery adds 500 milliseconds to two seconds of latency before the agent can process its first message.

The fix is straightforward: cache the tool definitions so that discovery only happens once.

Enabling cache_tools_list

The OpenAI Agents SDK supports tool caching directly on MCP server instances. When you set cache_tools_list=True, the SDK stores the tool definitions after the first list_tools() call and reuses them on subsequent agent runs without re-fetching:

from agents.mcp import MCPServerStdio, MCPServerStreamableHttp

# Stdio server with caching enabled
filesystem_server = MCPServerStdio(
    name="Filesystem",
    params={
        "command": "npx",
        "args": ["-y", "@modelcontextprotocol/server-filesystem", "/data"],
    },
    cache_tools_list=True,
)

# HTTP server with caching enabled
db_server = MCPServerStreamableHttp(
    name="Database",
    params={
        "url": "http://localhost:8001/mcp",
    },
    cache_tools_list=True,
)

With caching enabled, the first agent run performs normal tool discovery. Every subsequent run skips the discovery step entirely and uses the cached schemas. The savings are largest for stdio servers, where each uncached discovery costs a full JSON-RPC round-trip to the subprocess (or, if the server instance is recreated per run, a fresh subprocess spawn).

How the Cache Works Internally

The caching mechanism is simple but effective. When cache_tools_list is True, the SDK stores the result of list_tools() in memory on the server object. On subsequent calls, it returns the stored list immediately instead of making a JSON-RPC request.

This means the cache lives for the lifetime of the server object. If you create a new MCPServerStdio instance, it starts with an empty cache. If you reuse the same instance across multiple Runner.run() calls — which is the recommended pattern — the cache persists.
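The behavior is equivalent to simple per-instance memoization. As a toy model of the mechanism (this is an illustration, not the SDK's actual code):

```python
class ToolListCache:
    """Toy model of per-instance tool-list caching."""

    def __init__(self, fetch, enabled=True):
        self._fetch = fetch      # the real list_tools() round-trip
        self._enabled = enabled  # mirrors cache_tools_list
        self._cached = None

    async def list_tools(self):
        if self._enabled and self._cached is not None:
            return self._cached  # cache hit: no JSON-RPC request
        tools = await self._fetch()
        if self._enabled:
            self._cached = tools
        return tools

    def invalidate_tools_cache(self):
        self._cached = None      # next call re-fetches from the server
```

Because the cached list hangs off the instance, destroying the instance destroys the cache, which is exactly why long-lived server objects matter.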

from agents import Agent, Runner

# Create server once, reuse across runs
server = MCPServerStdio(
    name="Tools",
    params={"command": "npx", "args": ["-y", "@modelcontextprotocol/server-tools"]},
    cache_tools_list=True,
)

agent = Agent(
    name="Assistant",
    instructions="You are a helpful assistant.",
    mcp_servers=[server],
)

async def handle_request(user_message: str):
    # Note: the server must be connected before the first run
    # (e.g. `async with server:` or `await server.connect()` at startup).
    # First call: discovers tools, caches them
    # All subsequent calls: uses cached tool list
    result = await Runner.run(agent, user_message)
    return result.final_output

Invalidating the Cache

Caching introduces a consistency problem. If the MCP server adds, removes, or modifies tools after the initial discovery, the cached list becomes stale. The agent might try to call a tool that no longer exists, or miss a newly added tool.


The SDK provides invalidate_tools_cache() to handle this:

# After deploying a new version of the MCP server
filesystem_server.invalidate_tools_cache()

# The next Runner.run() call will re-discover tools
result = await Runner.run(agent, "List all files in /data")

You can also build automatic invalidation into your workflow. A common pattern is to invalidate on a schedule or in response to deployment events:

import asyncio
from datetime import datetime

class ManagedMCPServer:
    def __init__(self, server, refresh_interval_seconds=300):
        self.server = server
        self.refresh_interval = refresh_interval_seconds
        self.last_refresh = datetime.now()

    async def maybe_refresh(self):
        elapsed = (datetime.now() - self.last_refresh).total_seconds()
        if elapsed > self.refresh_interval:
            self.server.invalidate_tools_cache()
            self.last_refresh = datetime.now()

    async def run_agent(self, agent, message):
        await self.maybe_refresh()
        return await Runner.run(agent, message)

Another approach is event-driven invalidation. If your MCP servers are deployed via CI/CD, you can send a webhook or message to your agent service whenever a server is redeployed:

from fastapi import FastAPI

app = FastAPI()
servers = {}

@app.post("/webhook/server-deployed")
async def on_server_deployed(server_name: str):
    if server_name in servers:
        servers[server_name].invalidate_tools_cache()
        return {"status": "cache_invalidated", "server": server_name}
    return {"status": "server_not_found"}

Latency Benchmarks

To quantify the impact of caching, here are measurements from a real agent setup with three MCP servers. The environment uses MCPServerStdio for a filesystem server, MCPServerStreamableHttp for a database server, and another stdio server for a custom tools package.

Without caching (tool discovery on every run):

Server                  Discovery Time
Filesystem (stdio)      420ms
Database (HTTP)         85ms
Custom tools (stdio)    380ms
Total                   885ms

With cache_tools_list=True (after first run):

Server                  Discovery Time
Filesystem (stdio)      <1ms
Database (HTTP)         <1ms
Custom tools (stdio)    <1ms
Total                   <3ms

That is a 99.7% reduction in tool discovery latency. For an agent handling real-time chat, cutting 880 milliseconds from every response cycle is transformative.

Benchmarking Your Own Setup

You can measure tool discovery latency in your own environment with a simple timing wrapper:

import time
from agents.mcp import MCPServerStdio

async def benchmark_tool_discovery(server, iterations=10):
    times = []
    for i in range(iterations):
        server.invalidate_tools_cache()
        start = time.perf_counter()
        tools = await server.list_tools()
        elapsed = (time.perf_counter() - start) * 1000
        times.append(elapsed)
        print(f"  Run {i+1}: {elapsed:.1f}ms ({len(tools)} tools)")
    avg = sum(times) / len(times)
    print(f"  Average: {avg:.1f}ms")
    return avg

async def benchmark_cached(server, iterations=10):
    # Prime the cache
    await server.list_tools()
    times = []
    for i in range(iterations):
        start = time.perf_counter()
        tools = await server.list_tools()
        elapsed = (time.perf_counter() - start) * 1000
        times.append(elapsed)
    avg = sum(times) / len(times)
    print(f"  Cached average: {avg:.2f}ms")
    return avg

When Not to Cache

Caching is not always the right choice. Avoid it when:

  • Tools change frequently during development. If you are actively iterating on an MCP server and adding or renaming tools, stale caches will cause confusing errors.
  • The server is short-lived. If each agent run creates and destroys a new server instance, caching provides no benefit because the cache is lost with the instance.
  • Tool availability is dynamic. Some servers expose different tools based on the authenticated user or context. Caching a tool list from one user would be incorrect for another.
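One way to handle the development case is to make caching conditional on the environment. A minimal sketch, assuming a hypothetical APP_ENV variable (the variable name is a convention of this example, not part of the SDK):

```python
import os

def should_cache_tools() -> bool:
    # Only cache tool lists outside development, where tool
    # definitions change frequently and stale caches cause
    # confusing "unknown tool" errors.
    return os.getenv("APP_ENV", "development") == "production"
```

The result can then be passed straight to the server constructor, e.g. cache_tools_list=should_cache_tools().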

For all other cases — and especially in production where server definitions are stable — enabling cache_tools_list=True is one of the simplest and highest-impact performance optimizations available.

Production Recommendations

  1. Always enable caching in production. Set cache_tools_list=True on every MCP server instance that has a stable tool set.
  2. Use long-lived server objects. Create MCP server instances at application startup and reuse them across requests. Do not recreate them per request.
  3. Invalidate on deploy. Wire your CI/CD pipeline to call invalidate_tools_cache() whenever an MCP server is redeployed.
  4. Monitor discovery latency. Log the time spent in tool discovery so you can detect regressions when servers add new tools or infrastructure changes affect subprocess startup.
  5. Set refresh intervals for safety. Even with deploy-triggered invalidation, add a periodic refresh (every five to ten minutes) as a safety net against missed events.
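For recommendation 4, a thin wrapper around list_tools() is enough to surface discovery latency in logs. A sketch (the logger name and wrapper are illustrative, not part of the SDK):

```python
import time
import logging

logger = logging.getLogger("mcp.discovery")

async def timed_list_tools(server):
    # Time the list_tools() call; with a warm cache this should be
    # well under a millisecond, so a spike in this metric signals
    # either a cache invalidation or an infrastructure regression.
    start = time.perf_counter()
    tools = await server.list_tools()
    elapsed_ms = (time.perf_counter() - start) * 1000
    logger.info("tool discovery: %.1fms (%d tools)", elapsed_ms, len(tools))
    return tools
```

Emitting this on every run gives you a baseline, so a regression to hundreds of milliseconds is immediately visible.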

Tool caching is a small configuration change with outsized impact. It eliminates the most common source of unnecessary latency in multi-server MCP agents and is the first optimization you should apply when moving from prototype to production.

Written by CallSphere Team