Tool Guardrails: Protecting Function Execution

Learn how to implement tool input and output guardrails in the OpenAI Agents SDK to validate function arguments, skip dangerous calls, and replace tool outputs before they reach the agent.

Why Tool Execution Needs Its Own Guardrails

Input guardrails catch bad user messages. Output guardrails catch bad agent responses. But between those two checkpoints, the agent calls tools — and tool calls are where the real damage happens. A miscrafted tool call can delete database records, send emails to the wrong recipient, charge a credit card for the wrong amount, or leak internal data through an API.

Tool guardrails in the OpenAI Agents SDK intercept tool execution at two points: before the function runs (tool input guardrails) and after it returns (tool output guardrails). They give you the ability to validate arguments, skip dangerous calls entirely, or replace tool outputs with sanitized versions.

Tool Input Guardrails: Validating Before Execution

A tool input guardrail inspects the arguments that the agent has decided to pass to a function. It runs after the LLM has generated the tool call but before the actual function executes.

from agents import Agent, Runner, function_tool
from pydantic import BaseModel
import asyncio

@function_tool
def transfer_funds(from_account: str, to_account: str, amount: float) -> str:
    """Transfer funds between customer accounts."""
    # In production, this calls your banking API
    return f"Transferred ${amount:.2f} from {from_account} to {to_account}"

@function_tool
def get_account_balance(account_id: str) -> str:
    """Get the current balance for an account."""
    # Simulated lookup
    balances = {"ACC001": 5420.50, "ACC002": 12300.00}
    balance = balances.get(account_id, 0.0)
    return f"Account {account_id} balance: ${balance:.2f}"

Now define a tool input guardrail that validates transfer amounts:

async def transfer_amount_guardrail(ctx, agent, tool_call):
    """Block transfers above the auto-approval limit."""
    if tool_call.function.name != "transfer_funds":
        return None  # Only check transfer_funds calls

    import json
    args = json.loads(tool_call.function.arguments)
    amount = args.get("amount", 0)

    if amount > 10000:
        return {
            "skip": True,
            "replacement_output": (
                "Transfer blocked: amounts over $10,000 require "
                "manager approval. Please escalate this request."
            ),
        }

    if amount <= 0:
        return {
            "skip": True,
            "replacement_output": (
                "Transfer blocked: amount must be a positive number."
            ),
        }

    return None  # Allow the call to proceed

When the guardrail returns None, the tool call proceeds normally. When it returns a dictionary with skip: True, the actual function is never called, and the replacement_output is fed back to the agent as if the tool had returned that value.
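The skip-or-allow contract described above can be exercised locally without the SDK. The following is a minimal sketch, assuming the dictionary return shape used in this article; `run_tool_with_guardrails`, `fake_call`, and the condensed `max_amount_guardrail` are hypothetical stand-ins for illustration, not SDK APIs.

```python
import asyncio
import json
from types import SimpleNamespace


async def max_amount_guardrail(ctx, agent, tool_call):
    """Condensed transfer check: skip amounts over 10,000."""
    args = json.loads(tool_call.function.arguments)
    if args.get("amount", 0) > 10000:
        return {"skip": True, "replacement_output": "Transfer blocked."}
    return None


async def run_tool_with_guardrails(tool_fn, tool_call, guardrails, ctx=None, agent=None):
    """Hypothetical runner: the first guardrail that asks to skip wins."""
    for guardrail in guardrails:
        result = await guardrail(ctx, agent, tool_call)
        if result and result.get("skip"):
            # The real function is never invoked; the agent sees this text.
            return result["replacement_output"]
    args = json.loads(tool_call.function.arguments)
    return tool_fn(**args)


def fake_call(name, **arguments):
    """Build a stand-in for the tool call object a guardrail receives."""
    return SimpleNamespace(
        function=SimpleNamespace(name=name, arguments=json.dumps(arguments))
    )


calls = []  # Records real executions so we can see which calls went through


def transfer(from_account, to_account, amount):
    calls.append(amount)
    return f"Transferred ${amount:.2f} from {from_account} to {to_account}"


small = fake_call("transfer_funds", from_account="ACC001", to_account="ACC002", amount=250)
large = fake_call("transfer_funds", from_account="ACC001", to_account="ACC002", amount=50000)

allowed = asyncio.run(run_tool_with_guardrails(transfer, small, [max_amount_guardrail]))
blocked = asyncio.run(run_tool_with_guardrails(transfer, large, [max_amount_guardrail]))
```

After both runs, `calls` contains only the small amount: the blocked transfer never reached the function body, and the replacement text is all the agent would receive.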

Attaching Tool Input Guardrails to an Agent

banking_agent = Agent(
    name="BankingAgent",
    instructions="""You are a banking support agent. You can check
    account balances and transfer funds between accounts. Always
    confirm the details with the customer before executing a transfer.""",
    model="gpt-4o",
    tools=[transfer_funds, get_account_balance],
    tool_use_guardrails=[
        {
            "type": "input",
            "guardrail_function": transfer_amount_guardrail,
        },
    ],
)

This is a fundamentally different safety model than relying on prompt instructions. The prompt says "confirm before transferring," but the guardrail enforces a hard limit regardless of what the model decides to do.

Tool Output Guardrails: Sanitizing After Execution

Tool output guardrails run after the function returns but before the result is passed back to the agent. They are useful for redacting sensitive data, normalizing formats, or adding warnings to tool results.

async def redact_tool_output_guardrail(ctx, agent, tool_call, tool_output):
    """Redact sensitive fields from tool outputs before the agent sees them."""
    import re

    output_str = str(tool_output)

    # Redact SSNs (keep last 4)
    output_str = re.sub(
        r"(\d{3})-(\d{2})-(\d{4})",
        r"***-**-\3",
        output_str,
    )

    # Redact credit card numbers (keep last 4)
    output_str = re.sub(
        r"\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?(\d{4})",
        r"****-****-****-\1",
        output_str,
    )

    if output_str != str(tool_output):
        return {"replacement_output": output_str}

    return None  # No modification needed

The agent sees the redacted version. It can still reference "the card ending in 4242" or "the last four of your SSN" without the full sensitive data ever appearing in the conversation context. This is critical because the full conversation context is often logged, cached, or sent to other services.
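To check the redaction behavior in isolation, the patterns can be pulled into a plain function and run against sample text. This is a small sketch, not the article's guardrail itself: the `redact` helper and the sample string are illustrative, and the `\b` word boundaries are an added assumption so that longer digit runs are not partially matched.

```python
import re

# Precompiled patterns; each captures only the last four digits.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-(\d{4})\b")
CARD_PATTERN = re.compile(r"\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?(\d{4})\b")


def redact(text: str) -> str:
    """Mask SSNs and 16-digit card numbers, keeping only the last four digits."""
    text = SSN_PATTERN.sub(r"***-**-\1", text)
    return CARD_PATTERN.sub(r"****-****-****-\1", text)


sample = "SSN 123-45-6789, card 4242 4242 4242 4242"
redacted = redact(sample)
print(redacted)
```

Keeping the regex logic in a pure function like this makes it easy to unit test separately from the guardrail that calls it.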

Attaching Output Guardrails

customer_agent = Agent(
    name="CustomerAgent",
    instructions="Help customers with account inquiries.",
    model="gpt-4o",
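    # lookup_customer and get_transactions are assumed to be @function_tool
    # functions defined elsewhere in your codebase.
    tools=[lookup_customer, get_transactions],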
    tool_use_guardrails=[
        {
            "type": "output",
            "guardrail_function": redact_tool_output_guardrail,
        },
    ],
)

Combining Input and Output Tool Guardrails

For maximum protection, apply both input and output guardrails to the same agent. Input guardrails prevent dangerous calls. Output guardrails sanitize the results of allowed calls.

secure_agent = Agent(
    name="SecureAgent",
    instructions="You are a secure financial assistant.",
    model="gpt-4o",
    tools=[transfer_funds, get_account_balance, lookup_customer],
    tool_use_guardrails=[
        {
            "type": "input",
            "guardrail_function": transfer_amount_guardrail,
        },
        {
            "type": "input",
            "guardrail_function": block_after_hours_guardrail,
        },
        {
            "type": "output",
            "guardrail_function": redact_tool_output_guardrail,
        },
    ],
)
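The `block_after_hours_guardrail` referenced in the configuration above is not defined in this article. One possible shape for it is sketched below, assuming the same dictionary contract as the other guardrails. The 09:00 to 17:00 weekday window and the optional `now` parameter (added only so the check can be tested with a fixed clock) are illustrative assumptions.

```python
import asyncio
from datetime import datetime, time
from types import SimpleNamespace

# Assumed policy for illustration: transfers allowed 09:00-17:00, Monday-Friday.
OPEN, CLOSE = time(9, 0), time(17, 0)


def is_business_hours(now: datetime) -> bool:
    """True on weekdays between the assumed opening and closing times."""
    return now.weekday() < 5 and OPEN <= now.time() < CLOSE


async def block_after_hours_guardrail(ctx, agent, tool_call, now=None):
    """Skip transfer_funds calls made outside the assumed business hours."""
    if tool_call.function.name != "transfer_funds":
        return None  # Other tools are not time-restricted
    current = now or datetime.now()
    if is_business_hours(current):
        return None
    return {
        "skip": True,
        "replacement_output": (
            "Transfer blocked: transfers are only processed during business hours."
        ),
    }


# Stand-in tool call object and fixed timestamps for a local check.
call = SimpleNamespace(function=SimpleNamespace(name="transfer_funds", arguments="{}"))
weekday_morning = datetime(2024, 1, 2, 10, 0)  # a Tuesday
saturday = datetime(2024, 1, 6, 10, 0)

allowed = asyncio.run(block_after_hours_guardrail(None, None, call, now=weekday_morning))
blocked = asyncio.run(block_after_hours_guardrail(None, None, call, now=saturday))
```

In a real deployment the window would likely depend on the bank's time zone and holiday calendar, which this sketch ignores.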

Skipping Calls vs Replacing Output

Tool guardrails give you two distinct intervention strategies, and choosing the right one depends on the scenario.

Skipping the call means the function never executes. Use this when the tool call itself is dangerous — transferring too much money, deleting data, or calling an external API with invalid parameters.

Replacing the output means the function executes normally, but its return value is modified before the agent sees it. Use this when the function is safe to call but its output contains sensitive data that should not enter the conversation context.

async def selective_guardrail(ctx, agent, tool_call):
    """Example showing skip decisions for two different tools."""
    import json

    if tool_call.function.name == "delete_record":
        # SKIP: Never allow deletion through the agent
        return {
            "skip": True,
            "replacement_output": (
                "Record deletion is not available through this interface. "
                "Please submit a deletion request through the admin portal."
            ),
        }

    if tool_call.function.name == "search_users":
        args = json.loads(tool_call.function.arguments)
        query = args.get("query", "")
        if len(query) < 3:
            # SKIP: Prevent overly broad searches
            return {
                "skip": True,
                "replacement_output": (
                    "Search query must be at least 3 characters. "
                    "Please ask the customer for more specific information."
                ),
            }

    return None  # Allow all other calls

Real-World Pattern: Audit Logging Through Tool Guardrails

Tool guardrails are a natural place to implement audit logging because every tool call the agent makes passes through them: input guardrails see the arguments, and output guardrails see the results.

import json
from datetime import datetime, timezone

async def audit_log_guardrail(ctx, agent, tool_call):
    """Log every tool call for audit purposes. Never skip or modify."""
    args = json.loads(tool_call.function.arguments)

    audit_entry = {
        # Timezone-aware UTC timestamp (datetime.utcnow() is deprecated)
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent_name": agent.name,
        "tool_name": tool_call.function.name,
        "arguments": args,
        "session_id": getattr(ctx, "session_id", "unknown"),
    }

    # Write to your audit log (database, file, or external service)
    await write_audit_log(audit_entry)

    # Always return None — this guardrail never blocks
    return None

This guardrail observes without interfering. Every tool call is logged with full context, giving you a complete audit trail of what the agent did, when, and with what parameters. This is invaluable for compliance, debugging, and understanding agent behavior in production.
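The `write_audit_log` helper used above is left to the reader. One simple option is an append-only JSON Lines file, sketched here under stated assumptions: the `audit.jsonl` path and the `_append_line` helper are hypothetical, and a production system would more likely write to a database or a logging service.

```python
import asyncio
import json
from pathlib import Path

AUDIT_LOG_PATH = Path("audit.jsonl")  # Hypothetical location


def _append_line(path: Path, line: str) -> None:
    """Blocking file append, run in a worker thread by write_audit_log."""
    with path.open("a", encoding="utf-8") as handle:
        handle.write(line + "\n")


async def write_audit_log(entry: dict, path: Path = AUDIT_LOG_PATH) -> None:
    """Append one audit entry as a JSON line without blocking the event loop."""
    # default=str keeps the log robust if an argument is not JSON-serializable.
    line = json.dumps(entry, sort_keys=True, default=str)
    await asyncio.to_thread(_append_line, path, line)
```

Running the file write through `asyncio.to_thread` keeps slow disk I/O from stalling other coroutines while the agent is mid-run.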

Best Practices

Fail closed, not open. If your guardrail encounters an error during evaluation (network timeout, parsing failure), skip the tool call rather than allowing it. An errored guardrail should block, not pass.
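One way to apply the fail-closed rule uniformly is a decorator that converts any exception inside a guardrail into a skip decision. This is a sketch assuming the dictionary contract used in this article; `fail_closed` and `flaky_guardrail` are hypothetical names.

```python
import asyncio
import functools
import logging

logger = logging.getLogger("guardrails")


def fail_closed(message="Tool call blocked: the safety check could not be completed."):
    """Wrap an input guardrail so that any exception becomes a skip decision."""
    def decorator(guardrail):
        @functools.wraps(guardrail)
        async def wrapper(ctx, agent, tool_call):
            try:
                return await guardrail(ctx, agent, tool_call)
            except Exception:
                # Record the failure for operators, then block the call.
                logger.exception("Guardrail %s failed; blocking call", guardrail.__name__)
                return {"skip": True, "replacement_output": message}
        return wrapper
    return decorator


@fail_closed()
async def flaky_guardrail(ctx, agent, tool_call):
    """Stand-in guardrail that always errors, e.g. a timed-out policy lookup."""
    raise TimeoutError("policy service unavailable")


result = asyncio.run(flaky_guardrail(None, None, None))
```

Here `result` is a skip dictionary rather than a propagated exception, so the tool call is blocked and the failure is still visible in the logs.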

Keep guardrail logic simple. Tool guardrails add latency to every tool call. Use fast checks — argument validation, threshold comparisons, regex matching. Reserve LLM-based evaluation for input and output guardrails where the overhead is amortized across the full request.

Test with adversarial tool calls. Craft test cases where the model generates edge-case arguments: negative amounts, empty strings, SQL injection in search queries, extremely long inputs. Your guardrails should handle all of these gracefully.
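A table of adversarial inputs makes such tests cheap to extend. The sketch below uses a hypothetical `validate_transfer_arguments` helper with stricter checks than a plain threshold comparison. One case worth noting is `NaN`: Python's `json.loads` accepts it by default, and both `amount > 10000` and `amount <= 0` evaluate to `False` for it, so comparison-only checks let it through.

```python
import json
import math

MAX_AUTO_APPROVED = 10_000


def validate_transfer_arguments(raw_arguments: str):
    """Return an error message for unsafe arguments, or None if they pass."""
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError:
        return "arguments are not valid JSON"
    if not isinstance(args, dict):
        return "arguments must be a JSON object"
    amount = args.get("amount")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(amount, bool) or not isinstance(amount, (int, float)):
        return "amount must be a number"
    if not math.isfinite(amount):
        return "amount must be finite"
    if amount <= 0:
        return "amount must be positive"
    if amount > MAX_AUTO_APPROVED:
        return "amount exceeds the auto-approval limit"
    return None


# Raw argument strings the model might emit, mapped to the expected rejection.
adversarial_cases = {
    '{"amount": -50}': "amount must be positive",
    '{"amount": 0}': "amount must be positive",
    '{"amount": "100"}': "amount must be a number",
    '{"amount": true}': "amount must be a number",
    '{"amount": NaN}': "amount must be finite",
    '{"amount": 1e12}': "amount exceeds the auto-approval limit",
    '{}': "amount must be a number",
    'not json': "arguments are not valid JSON",
    '[1, 2]': "arguments must be a JSON object",
}

# Collect any case whose actual result differs from the expected message.
failures = {
    raw: validate_transfer_arguments(raw)
    for raw, expected in adversarial_cases.items()
    if validate_transfer_arguments(raw) != expected
}
```

An empty `failures` dictionary means every adversarial input was rejected for the expected reason; a guardrail can call the helper and return a skip decision whenever it yields a message.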

Separate concerns. Use one guardrail per concern — one for amount limits, one for audit logging, one for PII redaction. This makes them independently testable and easy to enable or disable per environment.

Written by

CallSphere Team