
Output Guardrails: Ensuring Safe Agent Responses

Learn how to implement output guardrails in the OpenAI Agents SDK to inspect, validate, and block unsafe agent responses before they reach end users — including PII detection and compliance filtering.

Why Output Guardrails Exist

Input guardrails protect against bad requests. Output guardrails protect against bad responses. Even with a well-designed prompt and input validation, an LLM can produce outputs that violate your policies — leaking internal data, generating PII, returning hallucinated legal advice, or producing content that does not meet compliance standards.

Output guardrails in the OpenAI Agents SDK run after the agent completes its response but before that response is returned to the caller. They give you a final checkpoint to inspect, validate, and potentially block the agent's output.

The pattern mirrors input guardrails: you define a guardrail function, it returns a GuardrailFunctionOutput, and if the tripwire is triggered, the SDK raises an exception. The key difference is that output guardrails receive the agent's generated output rather than the user's input.

Basic Output Guardrail Structure

An output guardrail function receives the agent's output and evaluates it against your policies.

from agents import Agent, Runner, OutputGuardrail, GuardrailFunctionOutput
from pydantic import BaseModel
import asyncio
import re

async def pii_guardrail(ctx, agent, output) -> GuardrailFunctionOutput:
    """Check agent output for personally identifiable information."""
    output_text = str(output)

    # Check for common PII patterns
    ssn_pattern = r"\d{3}-\d{2}-\d{4}"
    email_pattern = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"
    phone_pattern = r"\d{3}[-.]?\d{3}[-.]?\d{4}"

    findings = {
        "ssn_found": bool(re.search(ssn_pattern, output_text)),
        "email_found": bool(re.search(email_pattern, output_text)),
        "phone_found": bool(re.search(phone_pattern, output_text)),
    }

    has_pii = any(findings.values())

    return GuardrailFunctionOutput(
        output_info=findings,
        tripwire_triggered=has_pii,
    )

support_agent = Agent(
    name="SupportAgent",
    instructions="""You are a customer support agent. Help users with
    their account issues. NEVER include SSNs, full email addresses,
    or phone numbers in your responses — use masked versions instead.""",
    model="gpt-4o",
    output_guardrails=[
        OutputGuardrail(guardrail_function=pii_guardrail),
    ],
)

Even though the instructions tell the agent to avoid PII, you cannot rely on prompt instructions alone. The output guardrail acts as an enforced policy — it catches what the model misses.

Catching OutputGuardrailTripwireTriggered

When an output guardrail trips, the SDK raises OutputGuardrailTripwireTriggered. This is your signal to suppress the response and return a safe alternative.

from datetime import datetime, timezone

from agents.exceptions import OutputGuardrailTripwireTriggered

async def handle_user_message(user_input: str) -> str:
    try:
        result = await Runner.run(support_agent, user_input)
        return result.final_output
    except OutputGuardrailTripwireTriggered as e:
        guardrail_info = e.guardrail_result.output_info
        # Log for compliance audit; log_pii_violation is your own
        # audit-logging helper, not part of the SDK.
        log_pii_violation(
            user_input=user_input,
            guardrail_findings=guardrail_info,
            timestamp=datetime.now(timezone.utc),
        )
        )

        return (
            "I apologize, but I am unable to share that information "
            "in this format. Please contact our support team directly "
            "for assistance with sensitive account details."
        )

The critical point: the unsafe output is never returned to the user. The result.final_output that contained PII is discarded when the exception fires. Your application returns a safe, generic message instead.
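The suppress-and-replace flow can be exercised without the SDK by stubbing the tripwire exception. Everything here (FakeTripwire, fake_run) is a hypothetical stand-in, not SDK API — it only illustrates the control flow:

```python
import asyncio

class FakeTripwire(Exception):
    """Stand-in for OutputGuardrailTripwireTriggered."""
    def __init__(self, findings: dict):
        self.findings = findings

async def fake_run(user_input: str) -> str:
    # Pretend the agent produced output that tripped a PII guardrail.
    raise FakeTripwire({"ssn_found": True})

SAFE_MESSAGE = "I am unable to share that information in this format."

async def handle(user_input: str) -> str:
    try:
        return await fake_run(user_input)
    except FakeTripwire as e:
        # The unsafe output never reaches the caller; e.findings would
        # go to your audit log here.
        return SAFE_MESSAGE

print(asyncio.run(handle("show my account")))  # prints the safe fallback
```

The caller only ever sees the safe fallback string; the exception is the mechanism that discards the unsafe output.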


LLM-Based Output Guardrails

Regex patterns catch structured PII, but many compliance requirements are semantic. For example, detecting whether a response contains medical advice, financial recommendations, or legal opinions requires an LLM to understand context.

class ComplianceCheckOutput(BaseModel):
    is_compliant: bool
    violation_type: str | None = None
    explanation: str

compliance_checker = Agent(
    name="ComplianceChecker",
    instructions="""Evaluate the given text for compliance violations.
    Flag if the text contains:
    - Medical diagnoses or treatment recommendations
    - Specific financial or investment advice
    - Legal opinions presented as fact
    - Promises or guarantees about outcomes
    Return is_compliant=True if none of these are present.""",
    model="gpt-4o-mini",
    output_type=ComplianceCheckOutput,
)

async def compliance_guardrail(ctx, agent, output) -> GuardrailFunctionOutput:
    result = await Runner.run(compliance_checker, str(output), context=ctx.context)
    return GuardrailFunctionOutput(
        output_info=result.final_output.model_dump(),
        tripwire_triggered=not result.final_output.is_compliant,
    )

This pattern uses a small, fast model (gpt-4o-mini) as the compliance checker. It evaluates the main agent's full response and flags violations that no regex could catch — like "You should definitely invest in index funds right now" being flagged as financial advice.

PII Detection: A Complete Example

Here is a production-grade PII detection guardrail that combines regex patterns with an LLM-based check for contextual PII (names, addresses, and other information that is PII only in context).

import re
from agents import Agent, Runner, OutputGuardrail, GuardrailFunctionOutput
from pydantic import BaseModel

class PIIAnalysis(BaseModel):
    contains_pii: bool
    pii_types: list[str]
    confidence: float

pii_detector = Agent(
    name="PIIDetector",
    instructions="""Analyze the text for personally identifiable
    information. Check for: full names paired with account details,
    physical addresses, dates of birth in context, medical record
    numbers, any data that could identify a specific individual.
    Report confidence as a float between 0 and 1.""",
    model="gpt-4o-mini",
    output_type=PIIAnalysis,
)

REGEX_PATTERNS = {
    "ssn": r"\d{3}-\d{2}-\d{4}",
    "credit_card": r"\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}",
    "email": r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}",
    "phone_us": r"(?:\+1[-\s]?)?\(?\d{3}\)?[-\s.]?\d{3}[-\s.]?\d{4}",
    "ip_address": r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}",
}

async def comprehensive_pii_guardrail(ctx, agent, output) -> GuardrailFunctionOutput:
    output_text = str(output)

    # Layer 1: Fast regex scan
    regex_findings = {}
    for pii_type, pattern in REGEX_PATTERNS.items():
        matches = re.findall(pattern, output_text)
        if matches:
            regex_findings[pii_type] = len(matches)

    # If regex finds PII, trip immediately — no need for LLM check
    if regex_findings:
        return GuardrailFunctionOutput(
            output_info={"method": "regex", "findings": regex_findings},
            tripwire_triggered=True,
        )

    # Layer 2: LLM-based contextual PII check
    result = await Runner.run(pii_detector, output_text, context=ctx.context)
    analysis = result.final_output

    return GuardrailFunctionOutput(
        output_info={
            "method": "llm",
            "contains_pii": analysis.contains_pii,
            "pii_types": analysis.pii_types,
            "confidence": analysis.confidence,
        },
        tripwire_triggered=analysis.contains_pii and analysis.confidence > 0.7,
    )

This two-layer approach is both fast and thorough. Regex catches structured PII instantly without any LLM cost. The LLM layer only runs when regex finds nothing, catching contextual PII that patterns miss.
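The Layer-1 regex scan is plain Python and can be sanity-checked on its own, outside the SDK. A minimal sketch with a subset of the patterns above (scan_for_pii is an illustrative helper name, not SDK API):

```python
import re

# Subset of the REGEX_PATTERNS table above.
REGEX_PATTERNS = {
    "ssn": r"\d{3}-\d{2}-\d{4}",
    "email": r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}",
}

def scan_for_pii(text: str) -> dict[str, int]:
    """Return {pii_type: match_count} for every pattern that fires."""
    findings = {}
    for pii_type, pattern in REGEX_PATTERNS.items():
        matches = re.findall(pattern, text)
        if matches:
            findings[pii_type] = len(matches)
    return findings

print(scan_for_pii("SSN 123-45-6789, contact bob@example.com"))
# {'ssn': 1, 'email': 1}
```

An empty dict means the cheap layer found nothing and the LLM layer should run.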

Output Guardrails with Structured Output

When your agent uses output_type to return structured data (a Pydantic model), the output guardrail receives the parsed object, not raw text. This makes validation even more precise.

class CustomerResponse(BaseModel):
    message: str
    suggested_actions: list[str]
    internal_notes: str | None = None

async def no_internal_notes_guardrail(ctx, agent, output) -> GuardrailFunctionOutput:
    """Ensure internal notes are never populated in the response."""
    if isinstance(output, CustomerResponse) and output.internal_notes:
        return GuardrailFunctionOutput(
            output_info={"violation": "internal_notes_populated"},
            tripwire_triggered=True,
        )
    return GuardrailFunctionOutput(
        output_info={"status": "clean"},
        tripwire_triggered=False,
    )

Combining Input and Output Guardrails

A defense-in-depth strategy uses both guardrail types. Input guardrails block bad requests early. Output guardrails catch any issues that slip through the agent's processing.

production_agent = Agent(
    name="ProductionAgent",
    instructions="You are a helpful assistant for Acme Corp customers.",
    model="gpt-4o",
    input_guardrails=[
        InputGuardrail(guardrail_function=topic_guardrail),
        InputGuardrail(guardrail_function=injection_guardrail),
    ],
    output_guardrails=[
        OutputGuardrail(guardrail_function=comprehensive_pii_guardrail),
        OutputGuardrail(guardrail_function=compliance_guardrail),
        OutputGuardrail(guardrail_function=no_internal_notes_guardrail),
    ],
)

Input guardrails save money by rejecting bad input before the main agent runs. Output guardrails save reputation by catching bad output before the user sees it. Both are necessary. Neither alone is sufficient.

Performance Considerations

Output guardrails add latency to every successful response. The user waits for both the agent and the guardrail to finish. To minimize impact:

- Use regex and heuristic checks first; only call an LLM-based guardrail when the cheap checks pass.
- Keep guardrail agents on fast models like gpt-4o-mini.
- Run multiple output guardrails in parallel when they are independent.
- Measure: track the p50, p95, and p99 latency that guardrails add so you can make informed trade-offs between safety and speed.
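Independent guardrails can be fanned out concurrently with asyncio.gather; a minimal sketch with stand-in checks (regex_check and llm_check are hypothetical placeholders, not SDK API):

```python
import asyncio

async def regex_check(text: str) -> bool:
    # Cheap synchronous-style check; trips on a marker token.
    return "SSN" in text

async def llm_check(text: str) -> bool:
    # Simulate a slower LLM-backed compliance check.
    await asyncio.sleep(0.01)
    return "diagnosis" in text

async def any_tripwire(text: str) -> bool:
    # Running independent checks concurrently means total guardrail
    # latency is the slowest single check, not the sum of all of them.
    results = await asyncio.gather(regex_check(text), llm_check(text))
    return any(results)

print(asyncio.run(any_tripwire("Here is your diagnosis")))  # True
```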

Written by

CallSphere Team

Expert insights on AI voice agents and customer communication automation.
