---
title: "Validating LLM Outputs: Custom Validators, Business Rules, and Data Quality Checks"
description: "Build comprehensive validation layers for LLM outputs using Pydantic validators, cross-field validation, domain-specific constraints, and data quality scoring. Catch hallucinations before they reach your database."
canonical: https://callsphere.ai/blog/validating-llm-outputs-custom-validators-business-rules-data-quality
category: "Learn Agentic AI"
tags: ["Validation", "Data Quality", "Pydantic", "Business Rules", "Python"]
author: "CallSphere Team"
published: 2026-03-17T00:00:00.000Z
updated: 2026-05-06T01:02:42.651Z
---

# Validating LLM Outputs: Custom Validators, Business Rules, and Data Quality Checks

> Build comprehensive validation layers for LLM outputs using Pydantic validators, cross-field validation, domain-specific constraints, and data quality scoring. Catch hallucinations before they reach your database.

## The Validation Gap

Structured outputs guarantee valid JSON that conforms to a schema. But schema conformance is the lowest bar. A JSON object where every field has the right type can still contain:

- A person's name in the email field
- A date in the future for a historical event
- A price that violates your business pricing rules
- An address that is syntactically valid but does not exist
- A summary that contradicts the source document

Validation is where you bridge the gap between "structurally correct" and "actually correct." Pydantic gives you the tools to build validation layers that catch these issues before bad data reaches your database or your users.
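To make the gap concrete, here is a minimal (hypothetical) model where a payload passes type checking even though the values clearly landed in the wrong fields:

```python
from pydantic import BaseModel

class Contact(BaseModel):
    name: str
    email: str  # typed, but no semantic check

# Type-valid but semantically wrong: a name ended up in the email field.
bad = Contact(name="jane@example.com", email="Jane Doe")
print(bad.email)  # "Jane Doe" sails through schema validation
```

Both fields are strings, so the schema is satisfied; only a semantic validator can catch the swap.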

## Field-Level Validators

Start with individual field constraints. Pydantic offers two approaches: `Field` constraints for simple rules and `field_validator` for complex logic:

```mermaid
flowchart LR
    RAW(["LLM output"])
    SCHEMA["Schema parse
(types)"]
    FIELD["Field-level
validators"]
    CROSS{"Cross-field
business rules"}
    SCORE["Quality
scoring"]
    DB[("Database")]
    RETRY(["Retry or
human review"])
    RAW --> SCHEMA --> FIELD --> CROSS
    CROSS -->|Pass| SCORE --> DB
    CROSS -->|Fail| RETRY
    style FIELD fill:#4f46e5,stroke:#4338ca,color:#fff
    style CROSS fill:#f59e0b,stroke:#d97706,color:#1f2937
    style SCORE fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style DB fill:#059669,stroke:#047857,color:#fff
```

```python
from pydantic import BaseModel, Field, field_validator
from typing import List, Optional
import re

class ExtractedCompany(BaseModel):
    name: str = Field(min_length=2, max_length=200)
    ticker: Optional[str] = Field(default=None, pattern=r"^[A-Z]{1,5}$")
    employee_count: Optional[int] = Field(default=None, ge=1, le=10_000_000)
    founded_year: Optional[int] = Field(default=None, ge=1600, le=2026)
    website: Optional[str] = None
    revenue_usd: Optional[float] = Field(default=None, ge=0)

    @field_validator("name")
    @classmethod
    def clean_company_name(cls, v: str) -> str:
        # Remove common LLM artifacts
        v = v.strip().strip('"').strip("'")
        # Reject obviously wrong names
        if v.lower() in {"n/a", "unknown", "none", "null", "company"}:
            raise ValueError(f"'{v}' is not a valid company name")
        return v

    @field_validator("website")
    @classmethod
    def validate_url(cls, v: Optional[str]) -> Optional[str]:
        if v is None:
            return v
        url_pattern = re.compile(
            r"^https?://[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?"
            r"(.[a-zA-Z]{2,})+(/.*)?$"
        )
        if not url_pattern.match(v):
            raise ValueError(f"Invalid URL format: '{v}'")
        return v
```

The `founded_year` field rejects years before 1600 and after 2026. This catches the common hallucination where the model invents a founding year that is clearly wrong.
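A practical payoff of field validators: `ValidationError` pinpoints exactly which field failed, which you can feed back into a retry prompt. A minimal sketch with a pared-down model (names are illustrative, not the full extraction model above):

```python
from pydantic import BaseModel, Field, ValidationError

class CompanyDraft(BaseModel):
    name: str = Field(min_length=2)
    founded_year: int = Field(ge=1600, le=2026)

try:
    CompanyDraft(name="Acme", founded_year=16)  # hallucinated year
except ValidationError as e:
    for err in e.errors():
        print(err["loc"], err["msg"])  # ('founded_year',) plus the constraint message
```

The structured `errors()` list is what makes automated retry loops possible: you can tell the model precisely which field to fix.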

## Cross-Field Validation

Many business rules involve relationships between fields. Use `model_validator` to enforce them:

```python
from pydantic import model_validator

class JobPosting(BaseModel):
    title: str
    company: str
    salary_min: Optional[float] = None
    salary_max: Optional[float] = None
    salary_currency: str = "USD"
    experience_min_years: Optional[int] = Field(default=None, ge=0)
    experience_max_years: Optional[int] = Field(default=None, ge=0)
    remote: bool = False
    location: Optional[str] = None

    @model_validator(mode="after")
    def validate_salary_range(self) -> "JobPosting":
        if self.salary_min is not None and self.salary_max is not None:
            if self.salary_min > self.salary_max:
                raise ValueError(
                    f"salary_min ({self.salary_min}) cannot exceed "
                    f"salary_max ({self.salary_max})"
                )
            if self.salary_max > 10 * self.salary_min:
                raise ValueError(
                    f"Salary range too wide: {self.salary_min}-{self.salary_max}. "
                    "This likely indicates an extraction error."
                )
        return self

    @model_validator(mode="after")
    def validate_experience_range(self) -> "JobPosting":
        if self.experience_min_years is not None and self.experience_max_years is not None:
            if self.experience_min_years > self.experience_max_years:
                raise ValueError(
                    f"experience_min ({self.experience_min_years}) exceeds "
                    f"experience_max ({self.experience_max_years})"
                )
        return self

    @model_validator(mode="after")
    def validate_location_for_non_remote(self) -> "JobPosting":
        if not self.remote and not self.location:
            raise ValueError("Non-remote jobs must have a location specified")
        return self
```

The salary range validator catches a subtle issue: if the model extracts a minimum of 5,000 and a maximum of 500,000 (say, by mixing a monthly figure with an annual one), the 10x ratio check fires, indicating the model probably misread the salary.
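Because cross-field rules encode real business logic, they deserve unit tests of their own. A sketch using a pared-down model with just the salary rule from above:

```python
from typing import Optional
from pydantic import BaseModel, ValidationError, model_validator

class SalaryRange(BaseModel):
    salary_min: Optional[float] = None
    salary_max: Optional[float] = None

    @model_validator(mode="after")
    def check_range(self) -> "SalaryRange":
        lo, hi = self.salary_min, self.salary_max
        if lo is not None and hi is not None:
            if lo > hi:
                raise ValueError("salary_min exceeds salary_max")
            if hi > 10 * lo:
                raise ValueError("range too wide; likely an extraction error")
        return self

SalaryRange(salary_min=90_000, salary_max=120_000)     # plausible: passes
try:
    SalaryRange(salary_min=5_000, salary_max=500_000)  # 100x spread: rejected
except ValidationError as e:
    print(e.errors()[0]["msg"])
```

Testing the rule in isolation keeps the feedback loop fast when you tune thresholds like the 10x ratio.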

## Domain-Specific Constraint Libraries

For complex domains, build a reusable validation library:

```python
class MedicalValidators:
    """Validation functions for medical data extraction."""

    VALID_BLOOD_TYPES = {"A+", "A-", "B+", "B-", "AB+", "AB-", "O+", "O-"}

    @staticmethod
    def validate_icd10_code(code: str) -> str:
        """Validate ICD-10 diagnosis code format."""
        pattern = re.compile(r"^[A-Z]\d{2}(\.\d{1,4})?$")
        if not pattern.match(code):
            raise ValueError(f"Invalid ICD-10 code format: '{code}'")
        return code

    @staticmethod
    def validate_npi(npi: str) -> str:
        """Validate National Provider Identifier (10-digit)."""
        if not re.match(r"^\d{10}$", npi):
            raise ValueError(f"NPI must be exactly 10 digits, got: '{npi}'")
        return npi

class PatientRecord(BaseModel):
    name: str
    date_of_birth: str
    blood_type: Optional[str] = None
    diagnoses: List[str] = Field(default_factory=list)
    provider_npi: Optional[str] = None

    @field_validator("blood_type")
    @classmethod
    def check_blood_type(cls, v: Optional[str]) -> Optional[str]:
        if v and v not in MedicalValidators.VALID_BLOOD_TYPES:
            raise ValueError(f"Invalid blood type: '{v}'")
        return v

    @field_validator("diagnoses")
    @classmethod
    def check_icd_codes(cls, v: List[str]) -> List[str]:
        return [MedicalValidators.validate_icd10_code(code) for code in v]
```
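On Pydantic v2, the same domain checks can also be packaged as reusable annotated types with `AfterValidator`, so any model field typed with them gets the check for free. A sketch reusing the ICD-10 rule:

```python
import re
from typing import Annotated, List
from pydantic import AfterValidator, BaseModel, Field

def _check_icd10(code: str) -> str:
    if not re.match(r"^[A-Z]\d{2}(\.\d{1,4})?$", code):
        raise ValueError(f"Invalid ICD-10 code format: '{code}'")
    return code

# Reusable annotated type: the check travels with the type, not the model.
ICD10Code = Annotated[str, AfterValidator(_check_icd10)]

class Encounter(BaseModel):
    patient: str
    diagnoses: List[ICD10Code] = Field(default_factory=list)

enc = Encounter(patient="Jane Doe", diagnoses=["E11.9", "I10"])
```

This removes the per-model `field_validator` boilerplate when the same constraint appears across many models.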

## Data Quality Scoring

Instead of binary pass/fail, assign a quality score to each extraction:

```python
from dataclasses import dataclass

@dataclass
class QualityReport:
    score: float  # 0.0 to 1.0
    issues: List[str]
    field_scores: dict[str, float]

def assess_extraction_quality(extracted: BaseModel, source_text: str) -> QualityReport:
    """Score the quality of an extraction result."""
    issues = []
    field_scores = {}
    total_fields = 0
    filled_fields = 0

    for field_name, field_info in type(extracted).model_fields.items():
        total_fields += 1
        value = getattr(extracted, field_name)

        if value is None or value == [] or value == "":
            field_scores[field_name] = 0.0
        else:
            filled_fields += 1
            # Check if extracted value appears in source text
            str_value = str(value).lower()
            if len(str_value) > 3 and str_value not in source_text.lower():
                issues.append(f"'{field_name}' value '{value}' not found in source text")
                field_scores[field_name] = 0.5  # Suspicious but not necessarily wrong
            else:
                field_scores[field_name] = 1.0

    completeness = filled_fields / total_fields if total_fields > 0 else 0
    accuracy = sum(field_scores.values()) / total_fields if total_fields > 0 else 0
    overall = (completeness * 0.4) + (accuracy * 0.6)

    if completeness < 0.5:
        issues.append(f"Only {filled_fields} of {total_fields} fields extracted")

    return QualityReport(score=overall, issues=issues, field_scores=field_scores)
```

## Putting It All Together

Wire extraction, validation, and quality scoring into one entry point. The `response_model` and `max_retries` arguments come from the `instructor` library, so the sketch below assumes an `instructor`-patched OpenAI client:

```python
import instructor
from openai import OpenAI

client = instructor.from_openai(OpenAI())

def validated_extract(text: str) -> tuple[ExtractedCompany | None, QualityReport]:
    """Extract, validate, and score a company extraction."""
    try:
        company = client.chat.completions.create(
            model="gpt-4o",
            response_model=ExtractedCompany,
            max_retries=3,
            messages=[
                {
                    "role": "system",
                    "content": (
                        "Extract company information. Only include data "
                        "explicitly stated in the text. Use null for missing fields."
                    )
                },
                {"role": "user", "content": text}
            ],
        )

        quality = assess_extraction_quality(company, text)

        if quality.score < 0.3:
            return None, quality  # Reject low-quality extractions

        return company, quality

    except Exception as e:
        return None, QualityReport(
            score=0.0,
            issues=[f"Extraction failed: {str(e)}"],
            field_scores={},
        )

# Usage
text = "Acme Corp (ACME) was founded in 2015. They have about 500 employees."
company, quality = validated_extract(text)
if company:
    print(f"Extracted: {company.name} (quality: {quality.score:.2f})")
    for issue in quality.issues:
        print(f"  Warning: {issue}")
```
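A hard reject below 0.3 is one policy; many teams route mid-scoring extractions to human review instead of discarding them. A sketch of a three-way routing policy (thresholds are illustrative):

```python
def route_by_quality(score: float) -> str:
    """Map a quality score to a handling decision (illustrative thresholds)."""
    if score >= 0.8:
        return "auto_accept"    # high confidence: write straight to the database
    if score >= 0.3:
        return "human_review"   # plausible but suspicious: queue for a person
    return "reject"             # too little signal to be worth reviewing

print(route_by_quality(0.95))  # auto_accept
```

The review queue is usually where you discover which validators need tuning, since it concentrates the borderline cases.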

## FAQ

### How strict should my validators be?

Start permissive and tighten based on data. Track which validators trigger most often and examine the rejected data manually. If a validator rejects more than 20% of extractions, it is probably too strict — either loosen it or improve your extraction prompt. Production systems typically stabilize at 3-5% rejection rate.
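Tracking trigger rates does not require heavy infrastructure; a `Counter` keyed by field and error type is enough to start. A sketch (names are illustrative):

```python
from collections import Counter
from pydantic import BaseModel, Field, ValidationError

stats = {"attempts": 0, "rejected": 0}
failure_counts: Counter = Counter()

def tracked_validate(model_cls, payload: dict):
    """Validate a payload, tallying which validators fire so thresholds can be tuned."""
    stats["attempts"] += 1
    try:
        return model_cls.model_validate(payload)
    except ValidationError as exc:
        stats["rejected"] += 1
        for err in exc.errors():
            failure_counts[(str(err["loc"]), err["type"])] += 1
        return None

class CompanyFacts(BaseModel):
    founded_year: int = Field(ge=1600, le=2026)

tracked_validate(CompanyFacts, {"founded_year": 2015})  # passes
tracked_validate(CompanyFacts, {"founded_year": 16})    # ge check fires
rejection_rate = stats["rejected"] / stats["attempts"]
```

Reviewing the top entries of `failure_counts` weekly tells you whether a spike comes from one overly strict rule or a genuine drop in extraction quality.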

### Should I validate LLM outputs differently than user inputs?

Yes. User input validation focuses on security (SQL injection, XSS). LLM output validation focuses on correctness (hallucination detection, domain constraint enforcement). You still need basic security checks on LLM output if it is ever rendered as HTML or used in database queries, but the primary concern is data accuracy.

### How do I handle validation failures in a user-facing application?

Never show raw validation errors to end users. Map internal validation failures to user-friendly messages: "We could not extract all the details from this document. Please review the highlighted fields." Log the full validation context for debugging, and provide a manual override for users to correct extracted values.
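A sketch of that mapping, with a hypothetical lookup table from internal field names to user-facing labels:

```python
from pydantic import BaseModel, Field, ValidationError

# Hypothetical mapping from internal field names to user-facing labels.
FRIENDLY_FIELD_NAMES = {"founded_year": "founding year", "salary_min": "minimum salary"}

def user_facing_errors(exc: ValidationError) -> list[str]:
    """Translate raw validation errors into review prompts; log the raw exc separately."""
    messages = []
    for err in exc.errors():
        field = str(err["loc"][0]) if err["loc"] else "a field"
        label = FRIENDLY_FIELD_NAMES.get(field, field.replace("_", " "))
        messages.append(f"Please review the {label} we extracted from this document.")
    return messages

class CompanyRecord(BaseModel):
    founded_year: int = Field(ge=1600, le=2026)

try:
    CompanyRecord(founded_year=9999)
except ValidationError as exc:
    for msg in user_facing_errors(exc):
        print(msg)  # Please review the founding year we extracted from this document.
```

Keeping the mapping in one table also gives product and support teams a single place to edit the wording.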

---

#Validation #DataQuality #Pydantic #BusinessRules #Python #AgenticAI #LearnAI #AIEngineering

