Learn Agentic AI

Sensitive Data Handling in Agent Traces

Learn how to control sensitive data in OpenAI Agents SDK traces using trace_include_sensitive_data, environment variables, and GDPR-compliant tracing strategies for production AI systems.

The Privacy Problem with Agent Traces

Agent traces are invaluable for debugging and monitoring, but they create a serious privacy challenge. A typical trace captures the full text of every LLM input and output, every tool argument and return value, and every piece of context passed between agents. In a healthcare chatbot, that means patient symptoms and medical history flowing into your trace storage. In a financial advisor agent, that means account numbers and transaction details. In a customer support agent, that means email addresses, phone numbers, and complaint details.

Storing this data in trace backends — especially third-party platforms — can violate GDPR, HIPAA, CCPA, and SOC 2 requirements. The OpenAI Agents SDK provides built-in controls to manage what sensitive data appears in traces, but using these controls effectively requires understanding the full picture.

The trace_include_sensitive_data Flag

The primary control mechanism is the trace_include_sensitive_data setting on RunConfig. When set to False, the SDK strips LLM inputs, LLM outputs, and tool arguments from trace spans. The structural data — span names, types, durations, and hierarchy — remains intact.

from agents import Agent, Runner, RunConfig

agent = Agent(
    name="Medical Triage Agent",
    instructions="You help patients describe symptoms and recommend next steps.",
)

# Traces will NOT include message content or tool arguments
result = await Runner.run(
    agent,
    "I have been experiencing chest pain and shortness of breath for 3 days",
    run_config=RunConfig(trace_include_sensitive_data=False),
)

With this flag disabled, the trace still shows:

  • An agent_span for "Medical Triage Agent" with its duration
  • Generation spans showing the model name, token counts, and latency
  • Function spans showing tool names and durations

But the actual patient message, the model's response, and any tool arguments containing patient data are omitted.
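To make the before/after concrete, here is a standalone sketch of that scrubbing behavior. It is an illustration only: the key names below are hypothetical, and the real SDK decides internally which span fields to drop.

```python
# Illustrative only: mimics what trace_include_sensitive_data=False keeps.
STRUCTURAL_KEYS = {
    "span_id", "trace_id", "type", "name",
    "started_at", "ended_at", "model", "usage",
}

def scrub_span(span: dict) -> dict:
    """Keep structural/performance fields; drop content-bearing ones."""
    return {k: v for k, v in span.items() if k in STRUCTURAL_KEYS}

full_span = {
    "span_id": "span_123",
    "type": "generation",
    "model": "gpt-4o",
    "usage": {"total_tokens": 184},
    "input": "I have been experiencing chest pain and shortness of breath",
    "output": "Please seek urgent medical care...",
}
scrubbed = scrub_span(full_span)
print(scrubbed)  # structural data only; no patient text
```

The scrubbed span still supports latency and cost analysis, which is usually all that production monitoring needs.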

Environment Variable Control

For production deployments, you typically want a global default rather than per-request configuration. Recent SDK versions read the default for this flag from an environment variable (confirm the exact name against your installed version; older releases only support the per-run setting):

# In your deployment configuration (Dockerfile, k8s manifest, .env)
export OPENAI_AGENTS_TRACE_INCLUDE_SENSITIVE_DATA=false

When this variable is set to false (or 0), all traces default to excluding sensitive data. Individual runs can still override:

# This specific run WILL include sensitive data despite the env var
result = await Runner.run(
    agent,
    query,
    run_config=RunConfig(trace_include_sensitive_data=True),
)

This pattern is useful for debugging: keep sensitive data disabled globally, but enable it temporarily for specific traces you need to inspect during an incident.

Selective Data Redaction

The binary flag is a blunt instrument. Sometimes you need traces that include some data but redact specific fields. The SDK does not provide field-level redaction natively, but you can implement it with a trace processor:


import re
from agents.tracing import TracingProcessor, Span

REDACTION_PATTERNS = [
    (re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"), "[EMAIL_REDACTED]"),
    (re.compile(r"\b\d{3}[-.]?\d{3}[-.]?\d{4}\b"), "[PHONE_REDACTED]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN_REDACTED]"),
    (re.compile(r"\b\d{13,19}\b"), "[CARD_REDACTED]"),
    (re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"), "[IBAN_REDACTED]"),
]

class RedactingTraceProcessor(TracingProcessor):
    def _redact(self, text: str) -> str:
        if not isinstance(text, str):
            return text
        for pattern, replacement in REDACTION_PATTERNS:
            text = pattern.sub(replacement, text)
        return text

    def _redact_dict(self, data: dict) -> dict:
        redacted = {}
        for key, value in data.items():
            if isinstance(value, str):
                redacted[key] = self._redact(value)
            elif isinstance(value, dict):
                redacted[key] = self._redact_dict(value)
            elif isinstance(value, list):
                redacted[key] = [
                    self._redact(item) if isinstance(item, str) else item
                    for item in value
                ]
            else:
                redacted[key] = value
        return redacted

    def on_trace_start(self, trace) -> None:
        pass

    def on_trace_end(self, trace) -> None:
        pass

    def on_span_start(self, span: Span) -> None:
        pass

    def on_span_end(self, span: Span) -> None:
        # Adjust the attribute access to match your SDK version's span
        # payload shape.
        if span.data:
            span.data = self._redact_dict(span.data)

    def shutdown(self) -> None:
        pass

    def force_flush(self) -> None:
        pass

Register this processor, and every span's data payload is scrubbed of emails, phone numbers, SSNs, and card numbers before being sent to any downstream exporter. The redaction happens in-process before data leaves your infrastructure.

GDPR Compliance Strategy

GDPR imposes specific requirements that affect how you design agent tracing:

Right to Erasure (Article 17) — Users can request deletion of their personal data. If your traces contain personal data, you must be able to find and delete traces associated with a specific user.

Data Minimization (Article 5) — You should only collect data that is necessary for the stated purpose. Storing full conversation transcripts in traces when you only need latency metrics violates this principle.

Purpose Limitation — Data collected for debugging cannot be repurposed for training or analytics without explicit consent.

Here is a GDPR-compliant tracing architecture:

import hashlib
from agents.tracing import TracingProcessor, Trace, Span

class GDPRCompliantProcessor(TracingProcessor):
    def __init__(self, metrics_backend, audit_backend):
        self.metrics = metrics_backend
        self.audit = audit_backend

    def _pseudonymize_user(self, user_id: str) -> str:
        """One-way hash for trace correlation without storing real IDs."""
        return hashlib.sha256(
            f"{user_id}:trace-salt-rotate-monthly".encode()
        ).hexdigest()[:16]

    def on_trace_start(self, trace: Trace) -> None:
        user_id = (trace.metadata or {}).get("user_id")
        if user_id:
            trace.metadata["user_id"] = self._pseudonymize_user(user_id)

    def on_span_end(self, span: Span) -> None:
        # Only export structural and performance data
        self.metrics.record({
            "trace_id": span.trace_id,
            "span_type": span.span_type,
            "span_name": span.name,
            "duration_ms": (span.end_time - span.start_time).total_seconds() * 1000,
            "tokens": span.data.get("total_tokens", 0) if span.data else 0,
        })

    def on_trace_end(self, trace: Trace) -> None:
        # Audit log records that a trace happened, not what it contained
        self.audit.log({
            "trace_id": trace.trace_id,
            "workflow": trace.name,
            "timestamp": trace.end_time.isoformat(),
            "pseudonymized_user": (trace.metadata or {}).get("user_id"),
        })

    def on_span_start(self, span: Span) -> None:
        pass

    def shutdown(self) -> None:
        pass

    def force_flush(self) -> None:
        pass

This processor exports performance metrics and audit records without any personal data. The pseudonymized user ID allows you to correlate traces for a single user (for debugging patterns) without being able to identify who the user is.
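The properties that make this work can be demonstrated in isolation: the same user ID with the same salt always yields the same token, while rotating or destroying the salt severs the linkage, which is one practical lever for honoring erasure requests against correlation data. A minimal sketch (function name hypothetical, mirroring _pseudonymize_user above):

```python
import hashlib

def pseudonymize(user_id: str, salt: str) -> str:
    """One-way, salted hash: stable per user for a given salt."""
    return hashlib.sha256(f"{user_id}:{salt}".encode()).hexdigest()[:16]

# Same user + same salt -> same token, so traces for one user correlate.
a = pseudonymize("user-42", "salt-2024-06")
b = pseudonymize("user-42", "salt-2024-06")
# A rotated (or destroyed) salt produces an unlinkable token.
c = pseudonymize("user-42", "salt-2024-07")
print(a == b, a == c)
```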

Implementing Data Retention Policies

Traces should not live forever. Implement retention policies that automatically purge old trace data:

from datetime import datetime, timedelta, timezone

from agents.tracing import TracingProcessor, Trace

class RetentionAwareProcessor(TracingProcessor):
    def __init__(self, storage_backend, retention_days: int = 30):
        self.storage = storage_backend
        self.retention_days = retention_days

    def on_trace_end(self, trace: Trace) -> None:
        self.storage.store(
            trace_id=trace.trace_id,
            data=self._extract_safe_data(trace),
            ttl=timedelta(days=self.retention_days),
        )

    def _extract_safe_data(self, trace: Trace) -> dict:
        # duration_ms and span_count assume your Trace wrapper exposes them.
        now = datetime.now(timezone.utc)
        return {
            "trace_id": trace.trace_id,
            "workflow": trace.name,
            "duration_ms": trace.duration_ms,
            "span_count": trace.span_count,
            "created_at": now.isoformat(),
            "expires_at": (now + timedelta(days=self.retention_days)).isoformat(),
        }

    def on_trace_start(self, trace: Trace) -> None:
        pass

    def on_span_start(self, span) -> None:
        pass

    def on_span_end(self, span) -> None:
        pass

    def shutdown(self) -> None:
        pass

    def force_flush(self) -> None:
        pass

Layered Privacy Architecture

The most robust approach combines multiple strategies:

from agents import add_trace_processor

# Layer 1: Global sensitive data exclusion
# Set OPENAI_AGENTS_TRACE_INCLUDE_SENSITIVE_DATA=false in the environment

# Layer 2: Pattern-based redaction for any data that slips through
add_trace_processor(RedactingTraceProcessor())

# Layer 3: GDPR-compliant export with pseudonymization
add_trace_processor(GDPRCompliantProcessor(metrics_db, audit_log))

# Layer 4: Time-bounded retention
add_trace_processor(RetentionAwareProcessor(trace_store, retention_days=30))

Each layer catches what the previous one might miss. The environment variable prevents sensitive data from entering traces at the SDK level. The redacting processor catches any PII that enters through custom spans. The GDPR processor pseudonymizes identifiers and strips content. The retention processor ensures nothing persists beyond its useful life.

Testing Your Privacy Controls

Privacy controls are only effective if they are verified. Write tests that confirm redaction works:

import pytest
from unittest.mock import MagicMock

from agents import Runner, RunConfig

def test_email_redaction():
    processor = RedactingTraceProcessor()
    span = MagicMock()
    span.data = {"input": "Contact me at [email protected] for details"}
    processor.on_span_end(span)
    assert "[email protected]" not in str(span.data)
    assert "[EMAIL_REDACTED]" in span.data["input"]

def test_sensitive_data_flag_excludes_content(agent, collected_spans):
    # `agent` and `collected_spans` are pytest fixtures; the latter is filled
    # by a span-collecting trace processor registered in conftest.py.
    Runner.run_sync(
        agent,
        "My SSN is 123-45-6789",
        run_config=RunConfig(trace_include_sensitive_data=False),
    )
    # Verify trace spans do not contain the input
    for span in collected_spans:
        assert "123-45-6789" not in str(span.data)

Sensitive data handling is not an afterthought — it is a prerequisite for production tracing. Build your privacy controls before you deploy, not after a compliance audit discovers patient data in your Langfuse dashboard.

Written by

CallSphere Team

Expert insights on AI voice agents and customer communication automation.
