Tool Guardrails: Protecting Function Execution
Learn how to implement tool input and output guardrails in the OpenAI Agents SDK to validate function arguments, skip dangerous calls, and replace tool outputs before they reach the agent.
Why Tool Execution Needs Its Own Guardrails
Input guardrails catch bad user messages. Output guardrails catch bad agent responses. But between those two checkpoints, the agent calls tools — and tool calls are where the real damage happens. A badly crafted tool call can delete database records, send emails to the wrong recipient, charge a credit card for the wrong amount, or leak internal data through an API.
Tool guardrails in the OpenAI Agents SDK intercept tool execution at two points: before the function runs (tool input guardrails) and after it returns (tool output guardrails). They give you the ability to validate arguments, skip dangerous calls entirely, or replace tool outputs with sanitized versions.
Tool Input Guardrails: Validating Before Execution
A tool input guardrail inspects the arguments that the agent has decided to pass to a function. It runs after the LLM has generated the tool call but before the actual function executes.
from agents import Agent, Runner, function_tool
from pydantic import BaseModel
import asyncio

@function_tool
def transfer_funds(from_account: str, to_account: str, amount: float) -> str:
    """Transfer funds between customer accounts."""
    # In production, this calls your banking API
    return f"Transferred ${amount:.2f} from {from_account} to {to_account}"

@function_tool
def get_account_balance(account_id: str) -> str:
    """Get the current balance for an account."""
    # Simulated lookup
    balances = {"ACC001": 5420.50, "ACC002": 12300.00}
    balance = balances.get(account_id, 0.0)
    return f"Account {account_id} balance: ${balance:.2f}"
Now define a tool input guardrail that validates transfer amounts:
import json

async def transfer_amount_guardrail(ctx, agent, tool_call):
    """Block transfers above the auto-approval limit."""
    if tool_call.function.name != "transfer_funds":
        return None  # Only check transfer_funds calls
    args = json.loads(tool_call.function.arguments)
    amount = args.get("amount", 0)
    if amount > 10000:
        return {
            "skip": True,
            "replacement_output": (
                "Transfer blocked: amounts over $10,000 require "
                "manager approval. Please escalate this request."
            ),
        }
    if amount <= 0:
        return {
            "skip": True,
            "replacement_output": (
                "Transfer blocked: amount must be a positive number."
            ),
        }
    return None  # Allow the call to proceed
When the guardrail returns None, the tool call proceeds normally. When it returns a dictionary with skip: True, the actual function is never called, and the replacement_output is fed back to the agent as if the tool had returned that value.
Attaching Tool Input Guardrails to an Agent
banking_agent = Agent(
    name="BankingAgent",
    instructions="""You are a banking support agent. You can check
    account balances and transfer funds between accounts. Always
    confirm the details with the customer before executing a transfer.""",
    model="gpt-4o",
    tools=[transfer_funds, get_account_balance],
    tool_use_guardrails=[
        {
            "type": "input",
            "guardrail_function": transfer_amount_guardrail,
        },
    ],
)
This is a fundamentally different safety model from relying on prompt instructions. The prompt says "confirm before transferring," but the guardrail enforces a hard limit regardless of what the model decides to do.
Tool Output Guardrails: Sanitizing After Execution
Tool output guardrails run after the function returns but before the result is passed back to the agent. They are useful for redacting sensitive data, normalizing formats, or adding warnings to tool results.
import re

async def redact_tool_output_guardrail(ctx, agent, tool_call, tool_output):
    """Redact sensitive fields from tool outputs before the agent sees them."""
    output_str = str(tool_output)
    # Redact SSNs (keep last 4)
    output_str = re.sub(
        r"(\d{3})-(\d{2})-(\d{4})",
        r"***-**-\3",
        output_str,
    )
    # Redact credit card numbers (keep last 4)
    output_str = re.sub(
        r"\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?(\d{4})",
        r"****-****-****-\1",
        output_str,
    )
    if output_str != str(tool_output):
        return {"replacement_output": output_str}
    return None  # No modification needed
The agent sees the redacted version. It can still reference "the card ending in 4242" or "the last four of your SSN" without the full sensitive data ever appearing in the conversation context. This is critical because the full conversation context is often logged, cached, or sent to other services.
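The substitution patterns themselves (with properly escaped \d and \s character classes) can be sanity-checked outside the guardrail. This standalone sketch applies the same two regexes to a sample string:

```python
import re

def redact(text: str) -> str:
    # Same two substitutions as the guardrail, applied in isolation
    text = re.sub(r"(\d{3})-(\d{2})-(\d{4})", r"***-**-\3", text)
    text = re.sub(
        r"\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?(\d{4})",
        r"****-****-****-\1",
        text,
    )
    return text

print(redact("SSN 123-45-6789, card 4242 4242 4242 4242"))
# -> SSN ***-**-6789, card ****-****-****-4242
```

Keeping the last four digits preserves enough context for the agent to reference the card or SSN without the full value ever entering the conversation.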
Attaching Output Guardrails
customer_agent = Agent(
    name="CustomerAgent",
    instructions="Help customers with account inquiries.",
    model="gpt-4o",
    tools=[lookup_customer, get_transactions],
    tool_use_guardrails=[
        {
            "type": "output",
            "guardrail_function": redact_tool_output_guardrail,
        },
    ],
)
Combining Input and Output Tool Guardrails
For maximum protection, apply both input and output guardrails to the same agent. Input guardrails prevent dangerous calls. Output guardrails sanitize the results of allowed calls.
secure_agent = Agent(
    name="SecureAgent",
    instructions="You are a secure financial assistant.",
    model="gpt-4o",
    tools=[transfer_funds, get_account_balance, lookup_customer],
    tool_use_guardrails=[
        {
            "type": "input",
            "guardrail_function": transfer_amount_guardrail,
        },
        {
            "type": "input",
            "guardrail_function": block_after_hours_guardrail,
        },
        {
            "type": "output",
            "guardrail_function": redact_tool_output_guardrail,
        },
    ],
)
Skipping Calls vs Replacing Output
Tool guardrails give you two distinct intervention strategies, and choosing the right one depends on the scenario.
Skipping the call means the function never executes. Use this when the tool call itself is dangerous — transferring too much money, deleting data, or calling an external API with invalid parameters.
Replacing the output means the function executes normally, but its return value is modified before the agent sees it. Use this when the function is safe to call but its output contains sensitive data that should not enter the conversation context.
import json

async def selective_guardrail(ctx, agent, tool_call):
    """Example showing both skip and allow-with-modification patterns."""
    if tool_call.function.name == "delete_record":
        # SKIP: Never allow deletion through the agent
        return {
            "skip": True,
            "replacement_output": (
                "Record deletion is not available through this interface. "
                "Please submit a deletion request through the admin portal."
            ),
        }
    if tool_call.function.name == "search_users":
        args = json.loads(tool_call.function.arguments)
        query = args.get("query", "")
        if len(query) < 3:
            # SKIP: Prevent overly broad searches
            return {
                "skip": True,
                "replacement_output": (
                    "Search query must be at least 3 characters. "
                    "Please ask the customer for more specific information."
                ),
            }
    return None  # Allow all other calls
Real-World Pattern: Audit Logging Through Tool Guardrails
Tool guardrails are an excellent place to implement audit logging because they see every tool call the agent makes, including the arguments and outputs.
import json
from datetime import datetime, timezone

async def audit_log_guardrail(ctx, agent, tool_call):
    """Log every tool call for audit purposes. Never skip or modify."""
    args = json.loads(tool_call.function.arguments)
    audit_entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent_name": agent.name,
        "tool_name": tool_call.function.name,
        "arguments": args,
        "session_id": getattr(ctx, "session_id", "unknown"),
    }
    # Write to your audit log (database, file, or external service)
    await write_audit_log(audit_entry)
    # Always return None - this guardrail never blocks
    return None
This guardrail observes without interfering. Every tool call is logged with full context, giving you a complete audit trail of what the agent did, when, and with what parameters. This is invaluable for compliance, debugging, and understanding agent behavior in production.
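The write_audit_log call above is left undefined. A minimal sketch, assuming a local JSON-lines file as the sink (swap in your database or logging service); the path and helper names are illustrative:

```python
import asyncio
import json
from pathlib import Path

AUDIT_LOG_PATH = Path("audit.log")  # assumption: JSON-lines file sink

async def write_audit_log(entry: dict) -> None:
    """Append one audit entry as a JSON line, off the event loop thread."""
    line = json.dumps(entry, default=str) + "\n"
    await asyncio.to_thread(_append_line, line)

def _append_line(line: str) -> None:
    # Blocking file I/O, run in a worker thread via asyncio.to_thread
    with AUDIT_LOG_PATH.open("a", encoding="utf-8") as f:
        f.write(line)
```

Appending one JSON object per line keeps entries machine-parseable while avoiding the need to rewrite the file on every call.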
Best Practices
Fail closed, not open. If your guardrail encounters an error during evaluation (network timeout, parsing failure), skip the tool call rather than allowing it. An errored guardrail should block, not pass.
Keep guardrail logic simple. Tool guardrails add latency to every tool call. Use fast checks — argument validation, threshold comparisons, regex matching. Reserve LLM-based evaluation for input and output guardrails where the overhead is amortized across the full request.
Test with adversarial tool calls. Craft test cases where the model generates edge-case arguments: negative amounts, empty strings, SQL injection in search queries, extremely long inputs. Your guardrails should handle all of these gracefully.
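Guardrail logic can be exercised without running a model by constructing synthetic tool calls. The sketch below reproduces the amount checks in condensed form and feeds them edge-case arguments; SimpleNamespace stands in for the SDK's tool-call object, which is an assumption about its shape:

```python
import asyncio
import json
from types import SimpleNamespace

# Condensed copy of the amount checks, reproduced so the test is self-contained
async def transfer_amount_guardrail(ctx, agent, tool_call):
    if tool_call.function.name != "transfer_funds":
        return None
    amount = json.loads(tool_call.function.arguments).get("amount", 0)
    if amount > 10000 or amount <= 0:
        return {"skip": True, "replacement_output": "Transfer blocked."}
    return None

def fake_call(name, **args):
    # Stand-in for the SDK's tool-call object (assumed attribute layout)
    return SimpleNamespace(
        function=SimpleNamespace(name=name, arguments=json.dumps(args))
    )

cases = {
    "over_limit": fake_call("transfer_funds", amount=50000),
    "negative": fake_call("transfer_funds", amount=-5),
    "missing_amount": fake_call("transfer_funds"),
    "benign": fake_call("transfer_funds", amount=100),
}
results = {k: asyncio.run(transfer_amount_guardrail(None, None, c)) for k, c in cases.items()}
assert results["over_limit"]["skip"] is True
assert results["negative"]["skip"] is True
assert results["missing_amount"]["skip"] is True  # amount defaults to 0, which is blocked
assert results["benign"] is None
```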
Separate concerns. Use one guardrail per concern — one for amount limits, one for audit logging, one for PII redaction. This makes them independently testable and easy to enable or disable per environment.
Written by
CallSphere Team