PII Detection and Redaction in Agent Pipelines
Learn how to build a hybrid regex and LLM-based PII detection and redaction system for AI agent pipelines, with output sanitization, reversible tokenization, and GDPR compliance patterns.
Why PII Leaks Are the Biggest Risk in Agent Systems
AI agents process natural language at scale. Users type credit card numbers into chat fields, paste medical records into support conversations, and share social security numbers without thinking twice. If your agent pipeline passes that data unfiltered to an LLM, you have a compliance violation on your hands — potentially before the model even responds.
The stakes are not hypothetical. GDPR fines can reach 4% of annual global revenue. HIPAA violations carry penalties up to $1.5 million per incident category per year. California's CCPA allows statutory damages of $100-$750 per consumer per incident. A single leaked SSN in a logged LLM request can trigger all of these.
This post builds a production PII detection and redaction system that sits inside your agent pipeline, combining fast regex matching with LLM-based semantic detection for the cases regex misses.
The Two-Layer Detection Architecture
No single detection method catches everything. Regex is fast and deterministic but misses context-dependent PII. LLM-based detection understands context but is slow and expensive. The solution is a two-layer approach: regex first for known patterns, then LLM verification for ambiguous cases.
Both layers emit a shared match type, so downstream redaction does not care which layer found a given span:

from dataclasses import dataclass
from enum import Enum
from typing import Optional
import re

class PIIType(str, Enum):
    SSN = "ssn"
    CREDIT_CARD = "credit_card"
    EMAIL = "email"
    PHONE = "phone"
    DATE_OF_BIRTH = "date_of_birth"
    ADDRESS = "address"
    NAME = "name"
    MEDICAL_ID = "medical_id"

@dataclass
class PIIMatch:
    pii_type: PIIType
    start: int
    end: int
    original_value: str
    confidence: float
    detection_method: str
    redacted_token: Optional[str] = None
Layer 1: Regex-Based Pattern Detection
Regex catches structured PII with high confidence. Social security numbers, credit cards, emails, and phone numbers all follow predictable formats. The key is building patterns that minimize false positives while catching format variations.
class RegexPIIDetector:
    PATTERNS: dict[PIIType, list[re.Pattern]] = {
        PIIType.SSN: [
            re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
            re.compile(r"\b\d{9}\b(?=.*(?:ssn|social))", re.IGNORECASE),
        ],
        PIIType.CREDIT_CARD: [
            re.compile(r"\b4\d{3}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b"),      # Visa
            re.compile(r"\b5[1-5]\d{2}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b"),  # Mastercard
            re.compile(r"\b3[47]\d{2}[\s-]?\d{6}[\s-]?\d{5}\b"),              # Amex
        ],
        PIIType.EMAIL: [
            re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
        ],
        PIIType.PHONE: [
            re.compile(r"\+?1?[\s.-]?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}"),
            re.compile(r"\+\d{1,3}[\s.-]?\d{4,14}"),
        ],
    }

    def detect(self, text: str) -> list[PIIMatch]:
        matches = []
        for pii_type, patterns in self.PATTERNS.items():
            for pattern in patterns:
                for match in pattern.finditer(text):
                    pii_match = PIIMatch(
                        pii_type=pii_type,
                        start=match.start(),
                        end=match.end(),
                        original_value=match.group(),
                        confidence=0.95,
                        detection_method="regex",
                    )
                    if self._validate_match(pii_match):
                        matches.append(pii_match)
        return matches

    def _validate_match(self, match: PIIMatch) -> bool:
        if match.pii_type == PIIType.CREDIT_CARD:
            return self._luhn_check(
                re.sub(r"[\s-]", "", match.original_value)
            )
        return True

    @staticmethod
    def _luhn_check(number: str) -> bool:
        digits = [int(d) for d in number]
        odd_digits = digits[-1::-2]   # rightmost digit, then every second
        even_digits = digits[-2::-2]  # digits that get doubled
        total = sum(odd_digits)
        for d in even_digits:
            total += sum(divmod(d * 2, 10))
        return total % 10 == 0
The Luhn check on credit card numbers is critical. Without it, any 16-digit number triggers a false positive — order IDs, tracking numbers, and random numeric strings all get flagged incorrectly.
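To see the check in action, here is a standalone sketch (independent of the detector class) that accepts a number satisfying the Luhn checksum and rejects an arbitrary digit string:

```python
def luhn_check(number: str) -> bool:
    # Sum digits from the right; double every second digit,
    # folding doubles over 9 back into a single digit via digit sum
    digits = [int(d) for d in number]
    total = sum(digits[-1::-2])
    for d in digits[-2::-2]:
        total += sum(divmod(d * 2, 10))
    return total % 10 == 0

print(luhn_check("4111111111111111"))  # True: a well-known Visa test number
print(luhn_check("1234567890123456"))  # False: random 16-digit string
```

Note that passing Luhn only means the digits are checksum-consistent; it filters out random numeric strings but does not prove a number belongs to a real account.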
Layer 2: LLM-Based Semantic Detection
Regex cannot catch unstructured PII. When a user writes "my name is John Smith and I live at 42 Maple Street," there is no fixed pattern — the PII is embedded in natural language. An LLM guardrail handles this layer.
from agents import Agent, Runner
from pydantic import BaseModel, Field

class SemanticPIIResult(BaseModel):
    contains_pii: bool = Field(description="Whether the text contains PII")
    findings: list[dict] = Field(
        description="List of PII findings with type, value, and location"
    )
    confidence: float = Field(ge=0.0, le=1.0)

pii_detection_agent = Agent(
    name="PIIDetector",
    instructions="""You are a PII detection specialist. Analyze text for
personally identifiable information that regex patterns would miss.

Look for:
- Full names (first + last)
- Street addresses and locations
- Dates of birth in conversational context
- Medical record numbers or patient IDs
- Financial account references
- Any combination of data that could identify a person

Do NOT flag: generic titles, company names, public figures
mentioned in news context, or obviously fictional examples.

Return structured findings with exact text spans.""",
    model="gpt-4o-mini",
    output_type=SemanticPIIResult,
)

async def detect_semantic_pii(text: str) -> SemanticPIIResult:
    result = await Runner.run(pii_detection_agent, text)
    return result.final_output
Using gpt-4o-mini keeps costs low while maintaining strong detection accuracy. For high-sensitivity environments like healthcare or finance, upgrade to gpt-4o.
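Even with a cheap model, the LLM layer dominates cost at scale. One optimization worth considering (not wired into the pipeline in this post) is a cheap pre-filter that decides whether a given input is worth a semantic scan at all. A minimal sketch, where the thresholds are illustrative assumptions rather than tuned values:

```python
def needs_semantic_scan(text: str, min_length: int = 20) -> bool:
    """Cheap gate deciding whether to spend an LLM call on this input.

    Illustrative heuristic: very short inputs, and inputs with no
    alphabetic characters, are unlikely to hide unstructured PII.
    """
    if len(text) < min_length:
        return False
    if not any(c.isalpha() for c in text):
        return False  # pure digits/symbols are already covered by regex
    return True

print(needs_semantic_scan("ok"))                                  # False
print(needs_semantic_scan("my name is John Smith, 42 Maple St"))  # True
```

In high-sensitivity environments you would skip this gate entirely and scan everything; it trades a small amount of recall for a large reduction in LLM calls.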
Reversible Tokenization for Redaction
Production systems often need to reverse the redaction — a compliance officer reviewing an audit log needs to see the original data. Reversible tokenization replaces PII with deterministic tokens that map back to originals through a secure vault.
import hashlib
import json
from cryptography.fernet import Fernet

class PIIVault:
    def __init__(self, encryption_key: bytes):
        self.cipher = Fernet(encryption_key)
        self._store: dict[str, bytes] = {}

    def tokenize(self, match: PIIMatch) -> str:
        # Deterministic id: the same value always maps to the same token
        token_id = hashlib.sha256(
            f"{match.pii_type.value}:{match.original_value}".encode()
        ).hexdigest()[:12]
        token = f"[{match.pii_type.value.upper()}_{token_id}]"
        encrypted = self.cipher.encrypt(
            json.dumps({
                "type": match.pii_type.value,
                "value": match.original_value,
                "detection_method": match.detection_method,
            }).encode()
        )
        self._store[token] = encrypted
        return token

    def detokenize(self, token: str) -> Optional[dict]:
        encrypted = self._store.get(token)
        if not encrypted:
            return None
        decrypted = self.cipher.decrypt(encrypted)
        return json.loads(decrypted.decode())
Building the Full Redaction Pipeline
Now we combine both detection layers with the vault into a single pipeline that processes text before it reaches the LLM.
class PIIRedactionPipeline:
    def __init__(self, vault: PIIVault):
        self.regex_detector = RegexPIIDetector()
        self.vault = vault

    async def redact(self, text: str) -> tuple[str, list[PIIMatch]]:
        all_matches: list[PIIMatch] = []

        # Layer 1: regex detection
        regex_matches = self.regex_detector.detect(text)
        all_matches.extend(regex_matches)

        # Layer 2: LLM semantic detection
        semantic_result = await detect_semantic_pii(text)
        if semantic_result.contains_pii:
            for finding in semantic_result.findings:
                # Locates the first occurrence; repeats map to the same token anyway
                start = text.find(finding["value"])
                if start >= 0:
                    all_matches.append(PIIMatch(
                        pii_type=PIIType(finding["type"]),
                        start=start,
                        end=start + len(finding["value"]),
                        original_value=finding["value"],
                        confidence=semantic_result.confidence,
                        detection_method="llm_semantic",
                    ))

        # Deduplicate overlapping matches
        all_matches = self._deduplicate(all_matches)

        # Tokenize and redact (process in reverse order to preserve offsets)
        redacted_text = text
        for match in sorted(all_matches, key=lambda m: m.start, reverse=True):
            token = self.vault.tokenize(match)
            match.redacted_token = token
            redacted_text = (
                redacted_text[:match.start] + token + redacted_text[match.end:]
            )
        return redacted_text, all_matches

    def _deduplicate(self, matches: list[PIIMatch]) -> list[PIIMatch]:
        if not matches:
            return []
        sorted_matches = sorted(matches, key=lambda m: (m.start, -m.confidence))
        result = [sorted_matches[0]]
        for match in sorted_matches[1:]:
            prev = result[-1]
            if match.start >= prev.end:
                result.append(match)
            elif match.confidence > prev.confidence:
                result[-1] = match
        return result
GDPR Compliance Patterns
GDPR requires more than just redaction. You need data minimization, right to erasure, and audit trails. Here is how to integrate these requirements into your pipeline.
import datetime

class GDPRCompliantPipeline(PIIRedactionPipeline):
    def __init__(self, vault: PIIVault, audit_log_path: str):
        super().__init__(vault)
        self.audit_log_path = audit_log_path

    async def process_with_audit(
        self, text: str, user_id: str, purpose: str
    ) -> tuple[str, str]:
        redacted_text, matches = await self.redact(text)
        # Log only types and counts -- the audit trail must not itself contain PII
        audit_entry = {
            "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "user_id": user_id,
            "purpose": purpose,
            "pii_types_found": [m.pii_type.value for m in matches],
            "detection_methods": [m.detection_method for m in matches],
            "redaction_count": len(matches),
        }
        self._write_audit_log(audit_entry)
        return redacted_text, audit_entry["timestamp"]

    def handle_erasure_request(self, user_id: str):
        """GDPR Article 17 - Right to Erasure"""
        # Assumes the vault also tracks which tokens belong to which user
        self.vault.purge_by_user(user_id)
        self._write_audit_log({
            "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "user_id": user_id,
            "action": "erasure_completed",
        })

    def _write_audit_log(self, entry: dict) -> None:
        # Append-only JSON Lines audit trail
        with open(self.audit_log_path, "a") as f:
            f.write(json.dumps(entry) + "\n")
Output Sanitization
Input redaction is half the battle. The LLM might generate PII in its output — hallucinating realistic SSNs, generating plausible addresses, or echoing back redacted tokens in a way that leaks information. Run the same detection pipeline on agent outputs.
async def sanitized_agent_run(
    agent: Agent,
    user_input: str,
    pipeline: GDPRCompliantPipeline,
    user_id: str,
) -> str:
    # Redact input before sending to LLM
    redacted_input, _ = await pipeline.process_with_audit(
        user_input, user_id, purpose="agent_input"
    )

    # Run the agent with redacted input
    result = await Runner.run(agent, redacted_input)

    # Scan and redact the output too
    redacted_output, _ = await pipeline.process_with_audit(
        result.final_output, user_id, purpose="agent_output"
    )
    return redacted_output
Key Takeaways
PII detection in agent pipelines requires a layered approach. Regex handles structured patterns with high speed and precision. LLM-based detection catches the unstructured PII that regex misses. Reversible tokenization lets you redact for the model while preserving recoverability for authorized reviewers. GDPR compliance is not an afterthought — it is an architectural requirement that shapes how you store, process, and purge personal data throughout the entire agent lifecycle.
Never trust a single detection method. Never skip output sanitization. And always build the audit trail from day one.
Written by
CallSphere Team
Expert insights on AI voice agents and customer communication automation.