---
title: "Tool Search and Deferred Loading for Large Tool Sets"
description: "Learn how to use ToolSearchTool and defer_loading in the OpenAI Agents SDK to manage large tool inventories, reduce token usage, and dynamically load tools only when needed."
canonical: https://callsphere.ai/blog/tool-search-deferred-loading-large-tool-sets-openai-agents-sdk
category: "Learn Agentic AI"
tags: ["OpenAI", "Tool Search", "Deferred Loading", "Optimization"]
author: "CallSphere Team"
published: 2026-03-14T00:00:00.000Z
updated: 2026-06-05T14:40:29.756Z
---

# Tool Search and Deferred Loading for Large Tool Sets

> Learn how to use ToolSearchTool and defer_loading in the OpenAI Agents SDK to manage large tool inventories, reduce token usage, and dynamically load tools only when needed.

## The Problem with Large Tool Sets

Every tool you give an agent costs tokens. The tool's name, description, and full parameter schema are sent to the model as part of its context on every model call. With 5 tools, this overhead is negligible. With 50, it can consume thousands of tokens per turn. With 200, a realistic number for enterprise agents that connect to multiple APIs, you can burn a significant portion of your context window before the conversation even starts.
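To see where those tokens come from, here is a minimal sketch that builds a JSON-Schema-style tool definition from a Python function's signature and docstring, roughly the shape function-calling APIs accept, and estimates its size with a crude four-characters-per-token heuristic. The `tool_definition` and `estimate_tokens` helpers and the heuristic are assumptions for illustration; real counts depend on the model's tokenizer and on how the SDK serializes schemas.

```python
import inspect
import json

# Simplified mapping from Python annotations to JSON Schema type names.
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean", list: "array", dict: "object"}


def tool_definition(fn) -> dict:
    """Build a function-calling style tool definition from a function (hypothetical helper)."""
    properties, required = {}, []
    for name, param in inspect.signature(fn).parameters.items():
        properties[name] = {"type": JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # parameters without defaults are required
    return {
        "type": "function",
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def estimate_tokens(definition: dict) -> int:
    """Crude estimate assuming roughly 4 characters per token; real counts depend on the tokenizer."""
    return len(json.dumps(definition)) // 4


def list_invoices(user_id: str, status: str = "all") -> list:
    """List all invoices for a user, optionally filtered by status."""
    return [{"invoice_id": "INV-001", "amount": 99.99, "status": "paid"}]


definition = tool_definition(list_invoices)
per_tool = estimate_tokens(definition)
print(json.dumps(definition, indent=2))
print(f"~{per_tool} tokens for this tool; ~{per_tool * 200:,} tokens if 200 similar tools are sent every call")
```

This example tool has only two parameters; schemas with more parameters, nested objects, or longer descriptions grow accordingly, and all of it is paid again on every call.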

Beyond token cost, large tool sets tend to degrade model performance. In practice, models make better tool-calling decisions when presented with fewer, more relevant options: an agent with 200 tools in its prompt is more likely to pick the wrong tool than the same agent with 10 well-matched tools.

The OpenAI Agents SDK addresses this with two complementary features: `ToolSearchTool` for dynamic tool discovery and `defer_loading` for lazy schema loading.

## How ToolSearchTool Works

`ToolSearchTool` is a meta-tool — a tool whose job is to find other tools. Instead of loading all tools into the agent's context, you load only the ToolSearchTool. When the agent needs a capability, it searches for relevant tools by description, loads them dynamically, and then calls them.

```mermaid
flowchart LR
    INPUT(["User input"])
    AGENT["Agent
context: tool_search only"]
    SEARCH["tool_search
match query against
names and descriptions"]
    REG[("Tool registry
all_tools")]
    LOAD["Matching schemas
loaded into context"]
    TOOL["Discovered tool call"]
    OUT(["Final output"])
    INPUT --> AGENT
    AGENT -->|needs a capability| SEARCH
    SEARCH --> REG
    REG --> LOAD --> AGENT
    AGENT -->|next turn| TOOL --> AGENT
    AGENT --> OUT
    style AGENT fill:#4f46e5,stroke:#4338ca,color:#fff
    style SEARCH fill:#f59e0b,stroke:#d97706,color:#1f2937
    style REG fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style OUT fill:#059669,stroke:#047857,color:#fff
```

Here is the basic setup:

```python
from agents import Agent, Runner, function_tool
from agents.tools import ToolSearchTool

# Define a large set of tools
@function_tool
def get_user_profile(user_id: str) -> dict:
    """Retrieve a user's profile including name, email, and preferences."""
    return {"user_id": user_id, "name": "Alice", "email": "alice@example.com"}

@function_tool
def update_user_email(user_id: str, new_email: str) -> dict:
    """Update the email address for a given user."""
    return {"status": "updated", "user_id": user_id, "email": new_email}

@function_tool
def list_invoices(user_id: str, status: str = "all") -> list:
    """List all invoices for a user, optionally filtered by status."""
    return [{"invoice_id": "INV-001", "amount": 99.99, "status": "paid"}]

@function_tool
def create_support_ticket(user_id: str, subject: str, description: str) -> dict:
    """Create a new support ticket for a user."""
    return {"ticket_id": "TKT-42", "status": "open"}

@function_tool
def get_system_health() -> dict:
    """Check the health status of all backend services."""
    return {"api": "healthy", "database": "healthy", "cache": "healthy"}

# Create a searchable tool registry
all_tools = [
    get_user_profile, update_user_email,
    list_invoices, create_support_ticket,
    get_system_health,
]

agent = Agent(
    name="SupportAgent",
    instructions="""You are a customer support agent. Use tool_search
    to find the right tool for each task. Search by describing what
    you need to do.""",
    tools=[
        ToolSearchTool(tools=all_tools),
    ],
    model="gpt-4o",
)
```

When the agent runs, its initial context only contains the schema for `ToolSearchTool` — a single lightweight tool. When it needs to, say, look up a user profile, it calls `tool_search` with a query like "retrieve user profile information." The ToolSearchTool searches the registry by matching the query against tool names and descriptions, then returns the matching tool schemas. On the next turn, the agent can call the discovered tool directly.
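The SDK's actual matching logic is not shown here, so the following is only a toy model of the idea: a registry that scores each tool by keyword overlap between the query and the tool's name plus description, then returns the best matches up to a limit. Keyword overlap is a stand-in chosen for simplicity (a production search might use BM25 or embeddings), and `ToolRegistry`, `ToolEntry`, and `search` are hypothetical names, not part of the Agents SDK.

```python
import re
from dataclasses import dataclass


@dataclass
class ToolEntry:
    name: str
    description: str


def _words(text: str) -> set[str]:
    # Lowercase and split on non-letters, so underscores in tool names become word breaks.
    return set(re.findall(r"[a-z]+", text.lower()))


class ToolRegistry:
    """Hypothetical in-memory registry that searches tools by name and description."""

    def __init__(self, tools: list[ToolEntry]):
        self.tools = tools

    def search(self, query: str, max_results: int = 3) -> list[ToolEntry]:
        query_words = _words(query)
        scored = []
        for tool in self.tools:
            overlap = len(query_words & _words(f"{tool.name} {tool.description}"))
            if overlap > 0:
                scored.append((overlap, tool))
        # Highest overlap first; ties keep registry order because Python's sort is stable.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [tool for _, tool in scored[:max_results]]


registry = ToolRegistry([
    ToolEntry("get_user_profile", "Retrieve a user's profile including name, email, and preferences."),
    ToolEntry("update_user_email", "Update the email address for a given user."),
    ToolEntry("list_invoices", "List all invoices for a user, optionally filtered by status."),
    ToolEntry("get_system_health", "Check the health status of all backend services."),
])

matches = registry.search("retrieve user profile information")
print([tool.name for tool in matches])
```

Only the returned entries would then have their full schemas exposed to the model; everything else in the registry stays out of the context.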

## Deferred Loading with defer_loading=True

While `ToolSearchTool` handles the discovery pattern, `defer_loading` controls how individual tools are represented in the agent's context. When you set `defer_loading=True` on a tool, the SDK includes only the tool's name in the prompt — no description, no parameter schema. The full schema is loaded only when the agent decides to search for or call that tool.

```python
from agents import Agent, function_tool
from agents.tools import ToolSearchTool

@function_tool(defer_loading=True)
def complex_data_migration(
    source_table: str,
    target_table: str,
    filter_column: str,
    filter_value: str,
    batch_size: int = 1000,
    dry_run: bool = True,
) -> dict:
    """Execute a filtered data migration between database tables
    with batching and optional dry-run mode."""
    return {"status": "migrated", "rows": 5000}

@function_tool(defer_loading=True)
def generate_analytics_report(
    report_type: str,
    date_from: str,
    date_to: str,
    dimensions: list[str] | None = None,
    metrics: list[str] | None = None,
    format: str = "json",
) -> dict:
    """Generate an analytics report with customizable dimensions,
    metrics, and output format."""
    return {"report_id": "RPT-001", "status": "generated"}
```

These tools have complex schemas with multiple parameters. Without deferred loading, each would consume hundreds of tokens in every prompt. With `defer_loading=True`, they appear as just their name until the agent needs them.
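One way to picture the effect of `defer_loading=True` is a renderer that emits a full definition for eagerly loaded tools but only a name stub for deferred tools that have not been loaded yet. This is a sketch of the behavior as described above, not the SDK's serialization; `DeclaredTool`, `render_tools`, and the stub format are hypothetical.

```python
import json
from dataclasses import dataclass, field


@dataclass
class DeclaredTool:
    name: str
    description: str
    parameters: dict = field(default_factory=dict)
    defer_loading: bool = False


def render_tools(tools: list[DeclaredTool], loaded: set[str]) -> list[dict]:
    """Return what the model would see: full schemas, or name-only stubs for unloaded deferred tools."""
    rendered = []
    for tool in tools:
        if tool.defer_loading and tool.name not in loaded:
            rendered.append({"name": tool.name})  # name only: no description, no parameter schema
        else:
            rendered.append({"name": tool.name, "description": tool.description, "parameters": tool.parameters})
    return rendered


migration = DeclaredTool(
    name="complex_data_migration",
    description="Execute a filtered data migration between database tables with batching and optional dry-run mode.",
    parameters={
        "type": "object",
        "properties": {
            "source_table": {"type": "string"},
            "target_table": {"type": "string"},
            "filter_column": {"type": "string"},
            "filter_value": {"type": "string"},
            "batch_size": {"type": "integer", "default": 1000},
            "dry_run": {"type": "boolean", "default": True},
        },
        "required": ["source_table", "target_table", "filter_column", "filter_value"],
    },
    defer_loading=True,
)

before = render_tools([migration], loaded=set())
after = render_tools([migration], loaded={"complex_data_migration"})
print(len(json.dumps(before)), "chars before loading;", len(json.dumps(after)), "chars after loading")
```

The stub stays the same size no matter how many parameters the tool has, which is why deferring pays off most for tools with large schemas.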

## Combining ToolSearchTool with Deferred Loading

The most powerful pattern combines both features. ToolSearchTool discovers the right tools by semantic search, and deferred loading ensures those tools do not inflate every prompt:

```python
import asyncio

from agents import Agent, Runner, function_tool
from agents.tools import ToolSearchTool

# Reuses the tool functions defined in the first example.
# Imagine 50+ tools across multiple domains.
user_tools = [get_user_profile, update_user_email]  # ... many more
billing_tools = [list_invoices]  # ... many more
support_tools = [create_support_ticket]  # ... many more
admin_tools = [get_system_health]  # ... many more

all_tools = user_tools + billing_tools + support_tools + admin_tools

# All tools are deferred — minimal token cost
for tool in all_tools:
    tool.defer_loading = True

agent = Agent(
    name="UnifiedAgent",
    instructions="""You have access to a large set of tools through
    tool_search. Always search for the right tool before attempting
    an action. If no tool matches, tell the user what you cannot do.""",
    tools=[
        ToolSearchTool(tools=all_tools),
    ],
    model="gpt-4o",
)

async def main():
    result = await Runner.run(
        agent,
        input="Show me the latest invoices for user U-123",
    )
    print(result.final_output)

asyncio.run(main())
```

The agent's first model call sees only the `tool_search` schema. It searches for "invoices for user," discovers `list_invoices`, and on the next turn has the full schema available to call it with the correct parameters.
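The two-call sequence can be traced with a small simulation. The model is replaced by a scripted stand-in, and the search simply checks that every query word appears in a tool's description; `run_tool_search`, `context_for_call`, and the dictionary shapes are assumptions for illustration, not SDK internals. The point is only to show which tool definitions are visible on each model call.

```python
TOOL_SEARCH = {"name": "tool_search", "description": "Find tools by describing the task."}

CATALOG = {
    "get_user_profile": {"name": "get_user_profile", "description": "Retrieve a user's profile."},
    "list_invoices": {"name": "list_invoices", "description": "List all invoices for a user."},
    "get_system_health": {"name": "get_system_health", "description": "Check backend service health."},
}


def run_tool_search(query: str) -> list[str]:
    """Hypothetical search: names whose description contains every query word."""
    words = query.lower().split()
    return [name for name, schema in CATALOG.items() if all(w in schema["description"].lower() for w in words)]


def context_for_call(discovered: list[str]) -> list[dict]:
    """Tools visible to the model: always tool_search, plus any schemas loaded so far."""
    return [TOOL_SEARCH] + [CATALOG[name] for name in discovered]


discovered: list[str] = []

# Call 1: only tool_search is visible, so the (scripted) model searches.
call_1_tools = context_for_call(discovered)
discovered.extend(run_tool_search("invoices for user"))

# Call 2: the discovered schema is now visible and can be called directly.
call_2_tools = context_for_call(discovered)
print([tool["name"] for tool in call_1_tools])
print([tool["name"] for tool in call_2_tools])
```

`get_user_profile` and `get_system_health` never enter the context, because the search did not return them.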

## Token Impact Analysis

To understand the savings, consider a concrete example. A typical function tool with 5 parameters, a docstring, and type annotations consumes roughly 150-250 tokens in the prompt. With 100 such tools:

- **Without optimization**: 15,000-25,000 tokens per model call just for tool schemas
- **With defer_loading only**: ~2,000 tokens (just names)
- **With ToolSearchTool**: ~200 tokens (just the search tool schema)
- **With both combined**: ~200 tokens base, plus ~200-400 tokens for each discovered tool on subsequent turns

For agents that run multi-turn conversations, the cumulative savings are substantial. Over 10 model calls with 100 such tools, the unoptimized schemas alone cost 150,000-250,000 tokens, and the combined approach avoids most of that.
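The arithmetic behind these estimates can be reproduced in a few lines. This sketch plugs in the figures from the list above (150-250 tokens per full schema, about 20 tokens per deferred name, about 200 tokens for the search tool, 200-400 tokens per discovered tool) and assumes, for illustration only, that two tools are discovered on the first call and stay loaded.

```python
TOOLS = 100          # tools in the registry
CALLS = 10           # model calls in the conversation
SEARCH_SCHEMA = 200  # tokens for the tool_search schema (article's estimate)
NAME_ONLY = 20       # tokens per deferred tool name (about 2,000 / 100, from the article)
DISCOVERED = 2       # assumption: two tools are discovered on call 1 and stay loaded


def unoptimized(per_schema: int) -> int:
    """Every full schema is sent on every call."""
    return TOOLS * per_schema * CALLS


def deferred_only() -> int:
    """Only names are sent (ignores the cost of schemas once they are loaded)."""
    return TOOLS * NAME_ONLY * CALLS


def combined(per_discovered: int) -> int:
    """tool_search on every call, plus discovered schemas on every call after the first."""
    return SEARCH_SCHEMA * CALLS + per_discovered * DISCOVERED * (CALLS - 1)


for per_schema, per_discovered in [(150, 200), (250, 400)]:
    full, lean = unoptimized(per_schema), combined(per_discovered)
    print(f"unoptimized={full:,}  deferred_only={deferred_only():,}  combined={lean:,}  saved={full - lean:,}")
```

Under these assumptions the combined approach saves 144,400 tokens at the low end and 240,800 at the high end. The unoptimized cost scales with the size of the whole registry, while the combined cost scales only with the tools actually discovered.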

## Best Practices for Tool Search

**Write descriptive tool docstrings.** The search matches against names and descriptions. A tool called `process_data` with no docstring is nearly impossible to find by semantic search. Write explicit, keyword-rich descriptions.
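A quick way to see why this matters is to score a vague tool and a well-documented one against the same query. The keyword-overlap score below is a stand-in for whatever matching the real search uses, and both functions are hypothetical examples.

```python
import inspect
import re


def process_data(payload: dict) -> dict:
    return payload  # vague name, no docstring


def export_invoices_to_csv(user_id: str) -> str:
    """Export a user's billing invoices as a CSV file for download."""
    return "invoice_id,amount\n"


def searchable_text(fn) -> str:
    """What a name-and-description search can see: the function name plus its docstring."""
    return f"{fn.__name__} {inspect.getdoc(fn) or ''}"


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def overlap(query: str, text: str) -> int:
    """Number of distinct query words that also appear in the tool's searchable text."""
    return len(words(query) & words(text))


query = "export billing invoices as csv"
for fn in (process_data, export_invoices_to_csv):
    print(fn.__name__, overlap(query, searchable_text(fn)))
```

`process_data` scores zero because neither its name nor its (missing) docstring shares a single word with the query.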

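**Group related tools logically.** If you have 20 billing tools and 20 user management tools, the search performs better when tools in the same domain use consistent terminology, for example always "invoice" rather than a mix of "invoice", "bill", and "statement", so that one query surfaces the related tools together more reliably.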

**Set search result limits.** Configure how many tools the search returns to avoid flooding the agent's context with irrelevant matches:

```python
search_tool = ToolSearchTool(
    tools=all_tools,
    max_results=5,
)
```

**Cache discovered tools across turns.** The SDK handles this internally — once a tool is discovered and loaded in a run, it stays available for subsequent turns without another search.
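To make that behavior concrete outside the SDK, here is a minimal sketch of a per-run cache that keeps every discovered tool loaded for the rest of the run and skips repeat searches for the same normalized query. `RunToolCache` and `fake_search` are hypothetical and exist only to illustrate the pattern.

```python
from typing import Callable


class RunToolCache:
    """Hypothetical per-run cache: remembers search results and which schemas are loaded."""

    def __init__(self, search: Callable[[str], list[str]]):
        self._search = search
        self._results: dict[str, list[str]] = {}
        self.loaded: set[str] = set()
        self.search_calls = 0

    def find(self, query: str) -> list[str]:
        key = " ".join(query.lower().split())  # normalize case and whitespace
        if key not in self._results:
            self.search_calls += 1
            self._results[key] = self._search(key)
        self.loaded.update(self._results[key])  # discovered tools stay loaded for the run
        return self._results[key]


def fake_search(query: str) -> list[str]:
    return ["list_invoices"] if "invoice" in query else []


cache = RunToolCache(fake_search)
cache.find("List invoices for user")
cache.find("list  invoices for USER")  # same query after normalization, served from the cache
print(cache.loaded, cache.search_calls)
```

Scoping the cache to a single run keeps one conversation's discoveries from leaking into another's context.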

**Fall back gracefully.** Instruct the agent to report when it cannot find a matching tool rather than hallucinating a tool call. This prevents confusing errors and gives the user actionable feedback.
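Besides instructing the agent, you can support this at the tool level by having the search return an explicit "no match" result instead of an empty list, so the model has something concrete to relay to the user. The result shape, the stopword list, and `search_or_explain` are assumptions for illustration.

```python
STOPWORDS = {"a", "an", "the", "for", "of", "to"}


def search_or_explain(query: str, catalog: dict[str, str], max_results: int = 5) -> dict:
    """Hypothetical search wrapper that returns an explicit 'not found' payload instead of []."""
    query_words = set(query.lower().split()) - STOPWORDS
    hits = [name for name, desc in catalog.items() if query_words & set(desc.lower().split())]
    if not hits:
        return {
            "status": "no_matching_tool",
            "message": f"No available tool matches {query!r}. Tell the user this action is not supported.",
        }
    return {"status": "ok", "tools": hits[:max_results]}


catalog = {
    "list_invoices": "list all invoices for a user",
    "create_support_ticket": "create a new support ticket for a user",
}

print(search_or_explain("refund a payment", catalog))
print(search_or_explain("open a support ticket", catalog))
```

Dropping stopwords matters here: without it, a word like "a" would match every description and the fallback would never trigger.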

Tool search and deferred loading are essential patterns for any agent that operates across multiple domains or integrates with large API surfaces. They keep your token costs predictable, your model performance high, and your agent architecture scalable.

---

Source: https://callsphere.ai/blog/tool-search-deferred-loading-large-tool-sets-openai-agents-sdk