---
title: "Caching MCP Tool Definitions for Performance"
description: "Dramatically reduce agent startup latency by caching MCP tool definitions with cache_tools_list, implementing cache invalidation strategies, and benchmarking the performance gains in production agents."
canonical: https://callsphere.ai/blog/caching-mcp-tool-definitions-performance
category: "Learn Agentic AI"
tags: ["OpenAI", "MCP", "Caching", "Performance", "Optimization"]
author: "CallSphere Team"
published: 2026-03-14T00:00:00.000Z
updated: 2026-05-06T01:02:41.611Z
---

# Caching MCP Tool Definitions for Performance

> Dramatically reduce agent startup latency by caching MCP tool definitions with cache_tools_list, implementing cache invalidation strategies, and benchmarking the performance gains in production agents.

## The Hidden Cost of Tool Discovery

Every time an MCP agent starts a run, it calls `list_tools()` on each connected MCP server. This discovery step fetches the name, description, and JSON schema for every tool the server exposes. For a stdio server, that means spawning a subprocess, waiting for initialization, and exchanging JSON-RPC messages. For an HTTP server, it means a network round-trip.

When you have a single server with five tools, the cost is negligible. But production agents often connect to three, four, or more servers — a filesystem server, a database server, a search server, and a custom business logic server. Each server might expose ten to twenty tools. Suddenly, tool discovery adds 500 milliseconds to two seconds of latency before the agent can process its first message.

The fix is straightforward: cache the tool definitions so that discovery only happens once.

## Enabling cache_tools_list

The OpenAI Agents SDK supports tool caching directly on MCP server instances. When you set `cache_tools_list=True`, the SDK stores the tool definitions after the first `list_tools()` call and reuses them on subsequent agent runs without re-fetching:

```mermaid
flowchart LR
    HOST(["MCP host<br/>Claude Desktop or IDE"])
    CLIENT["MCP client"]
    subgraph SERVERS["MCP Servers"]
        S1["Filesystem server"]
        S2["GitHub server"]
        S3["Postgres server"]
        SX["Custom tool server"]
    end
    LLM["LLM session"]
    OUT(["Grounded action"])
    HOST --> CLIENT
    CLIENT -->|stdio or HTTP+SSE| S1
    CLIENT --> S2
    CLIENT --> S3
    CLIENT --> SX
    CLIENT --> LLM --> OUT
    style HOST fill:#f1f5f9,stroke:#64748b,color:#0f172a
    style CLIENT fill:#4f46e5,stroke:#4338ca,color:#fff
    style OUT fill:#059669,stroke:#047857,color:#fff
```

```python
from agents.mcp import MCPServerStdio, MCPServerStreamableHTTP

# Stdio server with caching enabled
filesystem_server = MCPServerStdio(
    name="Filesystem",
    params={
        "command": "npx",
        "args": ["-y", "@modelcontextprotocol/server-filesystem", "/data"],
    },
    cache_tools_list=True,
)

# HTTP server with caching enabled
db_server = MCPServerStreamableHTTP(
    name="Database",
    params={
        "url": "http://localhost:8001/mcp",
    },
    cache_tools_list=True,
)
```

With caching enabled, the first agent run performs normal tool discovery. Every subsequent run skips the discovery step entirely and uses the cached schemas. For stdio servers, this is especially impactful because it avoids re-spawning the subprocess just to enumerate tools.

## How the Cache Works Internally

The caching mechanism is simple but effective. When `cache_tools_list` is `True`, the SDK stores the result of `list_tools()` in memory on the server object. On subsequent calls, it returns the stored list immediately instead of making a JSON-RPC request.

This means the cache lives for the lifetime of the server object. If you create a new `MCPServerStdio` instance, it starts with an empty cache. If you reuse the same instance across multiple `Runner.run()` calls — which is the recommended pattern — the cache persists.

```python
from agents import Agent, Runner
from agents.mcp import MCPServerStdio

# Create server once, reuse across runs
server = MCPServerStdio(
    name="Tools",
    params={"command": "npx", "args": ["-y", "@modelcontextprotocol/server-tools"]},
    cache_tools_list=True,
)

agent = Agent(
    name="Assistant",
    instructions="You are a helpful assistant.",
    mcp_servers=[server],
)

async def handle_request(user_message: str):
    # First call: discovers tools, caches them
    # All subsequent calls: uses cached tool list
    result = await Runner.run(agent, user_message)
    return result.final_output
```
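Internally this is ordinary memoization. The sketch below captures the behavior in plain Python; the class and attribute names are illustrative stand-ins, not the SDK's actual implementation:

```python
import asyncio

class CachedToolServer:
    """Illustrative stand-in for an MCP server with cache_tools_list semantics."""

    def __init__(self, fetch_tools, cache_tools_list=True):
        self._fetch_tools = fetch_tools      # coroutine that performs the real JSON-RPC call
        self._cache_enabled = cache_tools_list
        self._cached_tools = None

    async def list_tools(self):
        if self._cache_enabled and self._cached_tools is not None:
            return self._cached_tools        # cache hit: no round-trip
        tools = await self._fetch_tools()
        if self._cache_enabled:
            self._cached_tools = tools       # store for subsequent calls
        return tools

    def invalidate_tools_cache(self):
        self._cached_tools = None            # next list_tools() re-fetches
```

Because the cached list is an instance attribute, two server objects never share a cache, which is exactly why the long-lived-instance pattern matters.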

## Invalidating the Cache

Caching introduces a consistency problem. If the MCP server adds, removes, or modifies tools after the initial discovery, the cached list becomes stale. The agent might try to call a tool that no longer exists, or miss a newly added tool.

The SDK provides `invalidate_tools_cache()` to handle this:

```python
# After deploying a new version of the MCP server
filesystem_server.invalidate_tools_cache()

# The next Runner.run() call will re-discover tools
result = await Runner.run(agent, "List all files in /data")
```

You can also build automatic invalidation into your workflow. A common pattern is to invalidate on a schedule or in response to deployment events:

```python
from datetime import datetime

from agents import Runner

class ManagedMCPServer:
    def __init__(self, server, refresh_interval_seconds=300):
        self.server = server
        self.refresh_interval = refresh_interval_seconds
        self.last_refresh = datetime.now()

    async def maybe_refresh(self):
        elapsed = (datetime.now() - self.last_refresh).total_seconds()
        if elapsed > self.refresh_interval:
            self.server.invalidate_tools_cache()
            self.last_refresh = datetime.now()

    async def run_agent(self, agent, message):
        await self.maybe_refresh()
        return await Runner.run(agent, message)
```

Another approach is event-driven invalidation. If your MCP servers are deployed via CI/CD, you can send a webhook or message to your agent service whenever a server is redeployed:

```python
from fastapi import FastAPI

app = FastAPI()
servers = {}

@app.post("/webhook/server-deployed")
async def on_server_deployed(server_name: str):
    if server_name in servers:
        servers[server_name].invalidate_tools_cache()
        return {"status": "cache_invalidated", "server": server_name}
    return {"status": "server_not_found"}
```

## Latency Benchmarks

To quantify the impact of caching, here are measurements from a real agent setup with three MCP servers. The environment uses `MCPServerStdio` for a filesystem server, `MCPServerStreamableHTTP` for a database server, and another stdio server for a custom tools package.

**Without caching (tool discovery on every run):**

| Server | Discovery Time |
| --- | --- |
| Filesystem (stdio) | 420ms |
| Database (HTTP) | 85ms |
| Custom tools (stdio) | 380ms |
| **Total** | **885ms** |

**With cache_tools_list=True (after first run):**

| Server | Discovery Time |
| --- | --- |
| Filesystem (stdio) | <1ms |
| Database (HTTP) | <1ms |
| Custom tools (stdio) | <1ms |
| **Total** | **<3ms** |

That is a 99.7% reduction in tool discovery latency. For an agent handling real-time chat, cutting 880 milliseconds from every response cycle is transformative.
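The headline number follows directly from the tables above; a quick check of the arithmetic:

```python
# Cold-start vs. cached totals from the benchmark tables above.
cold_ms = 420 + 85 + 380      # 885 ms of discovery without caching
warm_ms = 3                   # upper bound with cache_tools_list=True
reduction = (cold_ms - warm_ms) / cold_ms * 100
print(f"{reduction:.1f}% reduction, {cold_ms - warm_ms} ms saved per run")
# → 99.7% reduction, 882 ms saved per run
```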

## Benchmarking Your Own Setup

You can measure tool discovery latency in your own environment with a simple timing wrapper:

```python
import time

async def benchmark_tool_discovery(server, iterations=10):
    times = []
    for i in range(iterations):
        server.invalidate_tools_cache()
        start = time.perf_counter()
        tools = await server.list_tools()
        elapsed = (time.perf_counter() - start) * 1000
        times.append(elapsed)
        print(f"  Run {i+1}: {elapsed:.1f}ms ({len(tools)} tools)")
    avg = sum(times) / len(times)
    print(f"  Average: {avg:.1f}ms")
    return avg

async def benchmark_cached(server, iterations=10):
    # Prime the cache
    await server.list_tools()
    times = []
    for i in range(iterations):
        start = time.perf_counter()
        tools = await server.list_tools()
        elapsed = (time.perf_counter() - start) * 1000
        times.append(elapsed)
    avg = sum(times) / len(times)
    print(f"  Cached average: {avg:.2f}ms")
    return avg
```
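To sanity-check a timing harness without a live MCP server, you can run the same measurement against a stand-in object that exposes the `list_tools` / `invalidate_tools_cache` surface and simulates the discovery round-trip with a sleep. Everything below is an illustrative assumption, not SDK code:

```python
import asyncio
import time

class SimulatedServer:
    """Stand-in server: sleeps to mimic subprocess/network discovery latency."""

    def __init__(self, fetch_delay_s=0.05):
        self.fetch_delay_s = fetch_delay_s
        self._cache = None

    async def list_tools(self):
        if self._cache is not None:
            return self._cache                    # cached: effectively instant
        await asyncio.sleep(self.fetch_delay_s)   # simulated JSON-RPC round-trip
        self._cache = [f"tool_{i}" for i in range(12)]
        return self._cache

    def invalidate_tools_cache(self):
        self._cache = None

async def timed_list_tools(server):
    start = time.perf_counter()
    await server.list_tools()
    return (time.perf_counter() - start) * 1000

async def main():
    server = SimulatedServer()
    cold_ms = await timed_list_tools(server)   # first call pays the round-trip
    warm_ms = await timed_list_tools(server)   # second call hits the cache
    print(f"cold: {cold_ms:.1f}ms, warm: {warm_ms:.3f}ms")
    return cold_ms, warm_ms

cold_ms, warm_ms = asyncio.run(main())
```

Swapping the stand-in for a real server instance gives you the production numbers; the shape of the result (cold call dominated by the round-trip, warm call near zero) stays the same.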

## When Not to Cache

Caching is not always the right choice. Avoid it when:

- **Tools change frequently during development.** If you are actively iterating on an MCP server and adding or renaming tools, stale caches will cause confusing errors.
- **The server is short-lived.** If each agent run creates and destroys a new server instance, caching provides no benefit because the cache is lost with the instance.
- **Tool availability is dynamic.** Some servers expose different tools based on the authenticated user or context. Caching a tool list from one user would be incorrect for another.

For all other cases — and especially in production where server definitions are stable — enabling `cache_tools_list=True` is one of the simplest and highest-impact performance optimizations available.

## Production Recommendations

1. **Always enable caching in production.** Set `cache_tools_list=True` on every MCP server instance that has a stable tool set.
2. **Use long-lived server objects.** Create MCP server instances at application startup and reuse them across requests. Do not recreate them per request.
3. **Invalidate on deploy.** Wire your CI/CD pipeline to call `invalidate_tools_cache()` whenever an MCP server is redeployed.
4. **Monitor discovery latency.** Log the time spent in tool discovery so you can detect regressions when servers add new tools or infrastructure changes affect subprocess startup.
5. **Set refresh intervals for safety.** Even with deploy-triggered invalidation, add a periodic refresh (every five to ten minutes) as a safety net against missed events.

Tool caching is a small configuration change with outsized impact. It eliminates the most common source of unnecessary latency in multi-server MCP agents and is the first optimization you should apply when moving from prototype to production.

---

Source: https://callsphere.ai/blog/caching-mcp-tool-definitions-performance
