
Tool Search and Deferred Loading for Large Tool Sets

Learn how to use ToolSearchTool and defer_loading in the OpenAI Agents SDK to manage large tool inventories, reduce token usage, and dynamically load tools only when needed.

The Problem with Large Tool Sets

Every tool you give an agent costs tokens. The tool's name, description, and full parameter schema are serialized into the system prompt on every model call. With 5 tools, this overhead is negligible. With 50, it consumes thousands of tokens per turn. With 200 — a realistic number for enterprise agents connecting to multiple APIs — you are burning a significant portion of your context window before the conversation even starts.

Beyond token cost, large tool sets degrade model performance. Research consistently shows that models make better tool-calling decisions when presented with fewer, more relevant options. An agent with 200 tools in its prompt will make more selection errors than the same agent with 10 well-matched tools.

The OpenAI Agents SDK addresses this with two complementary features: ToolSearchTool for dynamic tool discovery and defer_loading for lazy schema loading.

How ToolSearchTool Works

ToolSearchTool is a meta-tool — a tool whose job is to find other tools. Instead of loading all tools into the agent's context, you load only the ToolSearchTool. When the agent needs a capability, it searches for relevant tools by description, loads them dynamically, and then calls them.


Here is the basic setup:

from agents import Agent, Runner, function_tool
from agents.tools import ToolSearchTool

# Define a large set of tools
@function_tool
def get_user_profile(user_id: str) -> dict:
    """Retrieve a user's profile including name, email, and preferences."""
    return {"user_id": user_id, "name": "Alice", "email": "[email protected]"}

@function_tool
def update_user_email(user_id: str, new_email: str) -> dict:
    """Update the email address for a given user."""
    return {"status": "updated", "user_id": user_id, "email": new_email}

@function_tool
def list_invoices(user_id: str, status: str = "all") -> list:
    """List all invoices for a user, optionally filtered by status."""
    return [{"invoice_id": "INV-001", "amount": 99.99, "status": "paid"}]

@function_tool
def create_support_ticket(user_id: str, subject: str, description: str) -> dict:
    """Create a new support ticket for a user."""
    return {"ticket_id": "TKT-42", "status": "open"}

@function_tool
def get_system_health() -> dict:
    """Check the health status of all backend services."""
    return {"api": "healthy", "database": "healthy", "cache": "healthy"}

# Create a searchable tool registry
all_tools = [
    get_user_profile, update_user_email,
    list_invoices, create_support_ticket,
    get_system_health,
]

agent = Agent(
    name="SupportAgent",
    instructions="""You are a customer support agent. Use tool_search
    to find the right tool for each task. Search by describing what
    you need to do.""",
    tools=[
        ToolSearchTool(tools=all_tools),
    ],
    model="gpt-4o",
)

When the agent runs, its initial context only contains the schema for ToolSearchTool — a single lightweight tool. When it needs to, say, look up a user profile, it calls tool_search with a query like "retrieve user profile information." The ToolSearchTool searches the registry by matching the query against tool names and descriptions, then returns the matching tool schemas. On the next turn, the agent can call the discovered tool directly.
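The search step can be pictured as keyword matching over the registry. The sketch below is not the SDK's internal algorithm; it is a minimal illustration of ranking tools by term overlap between the query and each tool's name and description (ToolInfo, REGISTRY, and search_tools are hypothetical names invented here):

```python
from dataclasses import dataclass

# Hypothetical sketch -- NOT the SDK's internals. Ranks registered tools
# by keyword overlap between the query and each tool's name + description.

@dataclass
class ToolInfo:
    name: str
    description: str

REGISTRY = [
    ToolInfo("get_user_profile",
             "Retrieve a user's profile including name, email, and preferences."),
    ToolInfo("list_invoices",
             "List all invoices for a user, optionally filtered by status."),
    ToolInfo("get_system_health",
             "Check the health status of all backend services."),
]

def search_tools(query: str, max_results: int = 5) -> list[str]:
    """Return tool names ranked by how many query words they share."""
    query_words = set(query.lower().split())
    scored = []
    for tool in REGISTRY:
        text = f"{tool.name.replace('_', ' ')} {tool.description}".lower()
        overlap = len(query_words & set(text.split()))
        if overlap:
            scored.append((overlap, tool.name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:max_results]]

print(search_tools("retrieve user profile information"))  # ['get_user_profile']
```

The real search is presumably more sophisticated, but the contract is the same: a natural-language query in, a short ranked list of matching tool schemas out.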

Deferred Loading with defer_loading=True

While ToolSearchTool handles the discovery pattern, defer_loading controls how individual tools are represented in the agent's context. When you set defer_loading=True on a tool, the SDK includes only the tool's name in the prompt — no description, no parameter schema. The full schema is loaded only when the agent decides to search for or call that tool.

from agents import Agent, function_tool
from agents.tools import ToolSearchTool

@function_tool(defer_loading=True)
def complex_data_migration(
    source_table: str,
    target_table: str,
    filter_column: str,
    filter_value: str,
    batch_size: int = 1000,
    dry_run: bool = True,
) -> dict:
    """Execute a filtered data migration between database tables
    with batching and optional dry-run mode."""
    return {"status": "migrated", "rows": 5000}

@function_tool(defer_loading=True)
def generate_analytics_report(
    report_type: str,
    date_from: str,
    date_to: str,
    dimensions: list[str] | None = None,
    metrics: list[str] | None = None,
    format: str = "json",
) -> dict:
    """Generate an analytics report with customizable dimensions,
    metrics, and output format."""
    return {"report_id": "RPT-001", "status": "generated"}

These tools have complex schemas with multiple parameters. Without deferred loading, each would consume hundreds of tokens in every prompt. With defer_loading=True, they appear as just their name until the agent needs them.
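To see why the name-only form is so much cheaper, compare the serialized size of a full tool schema against the deferred representation. The JSON below is a hand-written approximation of what generate_analytics_report might serialize to (not actual SDK output), and the 4-characters-per-token heuristic is a deliberately crude assumption:

```python
import json

# Rough illustration (not SDK output): a hand-written JSON schema for
# generate_analytics_report vs. the name-only deferred form, compared
# with a crude ~4-characters-per-token heuristic.

full_schema = {
    "name": "generate_analytics_report",
    "description": ("Generate an analytics report with customizable "
                    "dimensions, metrics, and output format."),
    "parameters": {
        "type": "object",
        "properties": {
            "report_type": {"type": "string"},
            "date_from": {"type": "string"},
            "date_to": {"type": "string"},
            "dimensions": {"type": "array", "items": {"type": "string"}},
            "metrics": {"type": "array", "items": {"type": "string"}},
            "format": {"type": "string", "default": "json"},
        },
        "required": ["report_type", "date_from", "date_to"],
    },
}

deferred_form = {"name": "generate_analytics_report"}

def approx_tokens(obj) -> int:
    """Very rough token estimate: ~4 characters per token."""
    return len(json.dumps(obj)) // 4

print(f"full schema: ~{approx_tokens(full_schema)} tokens, "
      f"deferred: ~{approx_tokens(deferred_form)} tokens")
```

Even with this crude estimate, the full schema costs an order of magnitude more than the bare name, and that cost is paid on every model call.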


Combining ToolSearchTool with Deferred Loading

The most powerful pattern combines both features. ToolSearchTool discovers the right tools by semantic search, and deferred loading ensures those tools do not inflate every prompt:

from agents import Agent, Runner, function_tool
from agents.tools import ToolSearchTool
import asyncio

# Imagine 50+ tools across multiple domains
user_tools = [get_user_profile, update_user_email]  # ... many more
billing_tools = [list_invoices]  # ... many more
support_tools = [create_support_ticket]  # ... many more
admin_tools = [get_system_health]  # ... many more

all_tools = user_tools + billing_tools + support_tools + admin_tools

# All tools are deferred — minimal token cost
for tool in all_tools:
    tool.defer_loading = True

agent = Agent(
    name="UnifiedAgent",
    instructions="""You have access to a large set of tools through
    tool_search. Always search for the right tool before attempting
    an action. If no tool matches, tell the user what you cannot do.""",
    tools=[
        ToolSearchTool(tools=all_tools),
    ],
    model="gpt-4o",
)

async def main():
    result = await Runner.run(
        agent,
        input="Show me the latest invoices for user U-123",
    )
    print(result.final_output)

asyncio.run(main())

The agent's first model call sees only the tool_search schema. It searches for "invoices for user," discovers list_invoices, and on the next turn has the full schema available to call it with the correct parameters.

Token Impact Analysis

To understand the savings, consider a concrete example. A typical function tool with 5 parameters, a docstring, and type annotations consumes roughly 150-250 tokens in the prompt. With 100 such tools:

  • Without optimization: 15,000-25,000 tokens per model call just for tool schemas
  • With defer_loading only: ~2,000 tokens (just names)
  • With ToolSearchTool: ~200 tokens (just the search tool schema)
  • With both combined: ~200 tokens base, plus ~200-400 tokens for each discovered tool on subsequent turns

For agents that run multi-turn conversations, the cumulative savings are substantial. A 10-turn conversation with 100 tools saves roughly 150,000-250,000 tokens in total, depending on where each schema falls in the 150-250 token range.
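The arithmetic behind this estimate is easy to reproduce. All constants below are the assumed averages from the text, not measured values:

```python
# Back-of-the-envelope check of the savings above. All constants are the
# assumed averages from the text, not measured values.
TOOLS = 100
TURNS = 10
SEARCH_TOOL_TOKENS = 200   # approximate cost of the tool_search schema alone

def savings_over_run(tokens_per_schema: int) -> int:
    baseline = TOOLS * tokens_per_schema   # all schemas serialized every call
    optimized = SEARCH_TOOL_TOKENS         # only the search tool's schema
    return (baseline - optimized) * TURNS

low, high = savings_over_run(150), savings_over_run(250)
print(f"{low:,} to {high:,} tokens saved over {TURNS} turns")  # 148,000 to 248,000
```

This ignores the incremental cost of loading discovered tool schemas on later turns, which shaves a few hundred tokens per discovered tool off the savings but does not change the order of magnitude.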

Best Practices for Tool Search

Write descriptive tool docstrings. The search matches against names and descriptions. A tool called process_data with no docstring is nearly impossible to find by semantic search. Write explicit, keyword-rich descriptions.
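As a hypothetical contrast, compare a tool the search can barely match against one with a specific name and a keyword-rich docstring (both functions are invented for illustration):

```python
# Hypothetical contrast (both functions invented for illustration).

# Hard to discover: generic name, no docstring for the search to match.
def process_data(data: dict) -> dict:
    return data

# Easy to discover: specific name plus a keyword-rich description.
def reconcile_invoice_payments(invoice_id: str) -> dict:
    """Reconcile recorded payments against an invoice, flagging
    underpayments, overpayments, and duplicate transactions."""
    return {"invoice_id": invoice_id, "discrepancies": []}
```

A query like "find duplicate payments on an invoice" has several words to match in the second tool and none in the first.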

Group related tools logically. If you have 20 billing tools and 20 user management tools, the search performs better when tools in the same domain use consistent terminology.

Set search result limits. Configure how many tools the search returns to avoid flooding the agent's context with irrelevant matches:

search_tool = ToolSearchTool(
    tools=all_tools,
    max_results=5,
)

Cache discovered tools across turns. The SDK handles this internally — once a tool is discovered and loaded in a run, it stays available for subsequent turns without another search.

Fall back gracefully. Instruct the agent to report when it cannot find a matching tool rather than hallucinating a tool call. This prevents confusing errors and gives the user actionable feedback.

Key Takeaways

Tool search and deferred loading are essential patterns for any agent that operates across multiple domains or integrates with large API surfaces. They keep your token costs predictable, your model performance high, and your agent architecture scalable.


Written by

CallSphere Team
