---
title: "AI Agent for Expense Reporting: Receipt Scanning, Categorization, and Policy Compliance"
description: "Build an AI agent that scans receipts with OCR, categorizes expenses, checks them against company policy, routes approvals, and generates expense reports automatically."
canonical: https://callsphere.ai/blog/ai-agent-expense-reporting-receipt-scanning-categorization-policy-compliance
category: "Learn Agentic AI"
tags: ["Expense Reporting", "OCR", "Receipt Scanning", "Policy Compliance", "Workflow Automation"]
author: "CallSphere Team"
published: 2026-03-17T00:00:00.000Z
updated: 2026-06-01T21:26:16.995Z
---

# AI Agent for Expense Reporting: Receipt Scanning, Categorization, and Policy Compliance

> Build an AI agent that scans receipts with OCR, categorizes expenses, checks them against company policy, routes approvals, and generates expense reports automatically.

## Why Expense Reporting Is a Universal Pain Point

Every organization with employees who travel or make purchases needs expense reporting. Yet the process remains universally disliked — employees hate filling out forms, managers hate reviewing them, and finance teams hate chasing down missing receipts and policy violations. An AI agent can eliminate most of this friction by scanning receipts, auto-categorizing expenses, checking policy compliance in real time, and routing everything through the approval workflow.

## Agent Components

1. **Receipt Scanner** — OCR extraction from photos and PDFs
2. **Expense Categorizer** — classify expenses by type and project
3. **Policy Checker** — validate against company expense policies
4. **Report Generator** — compile approved expenses into reports

## Step 1: Receipt Scanning with OCR

Extract structured data from receipt images.

```mermaid
flowchart LR
    IMG(["Receipt image"])
    SCAN["Receipt scanner
vision model extraction"]
    CAT["Expense categorizer
category plus GL code"]
    POL{"Policy checker
company rules"}
    MGR["Manager approval"]
    ESC["Director or VP
approval"]
    RPT[("Expense report")]
    IMG --> SCAN --> CAT --> POL
    POL -->|Compliant| MGR --> RPT
    POL -->|Violation or high amount| ESC --> RPT
    style POL fill:#4f46e5,stroke:#4338ca,color:#fff
    style RPT fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style ESC fill:#dc2626,stroke:#b91c1c,color:#fff
    style MGR fill:#059669,stroke:#047857,color:#fff
```

```python
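import base64
import mimetypes
from datetime import date

from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()  # Reads OPENAI_API_KEY from the environment

class LineItem(BaseModel):
    item: str
    qty: int
    price: float

class ReceiptData(BaseModel):
    merchant_name: str
    merchant_category: str
    date: date
    subtotal: float
    tax: float
    tip: float | None = None
    total: float
    currency: str
    payment_method: str
    # Typed items give the structured-output schema explicit fields
    line_items: list[LineItem]

def scan_receipt(image_path: str) -> ReceiptData:
    """Extract structured data from a receipt image."""
    # Build the data URL from the real file type instead of assuming JPEG
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        # PDFs must be rendered to an image before they reach this function
        raise ValueError(f"Unsupported receipt format: {image_path}")

    with open(image_path, "rb") as f:
        image_data = base64.b64encode(f.read()).decode()

    response = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "Extract all data from this receipt image. "
                    "Include the merchant name, date, line items, "
                    "subtotal, tax, tip if present, total, currency, "
                    "and payment method. Use ISO date format."
                ),
            },
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": (
                                f"data:{mime_type};base64,"
                                f"{image_data}"
                            )
                        },
                    },
                    {
                        "type": "text",
                        "text": "Extract all receipt data.",
                    },
                ],
            },
        ],
        response_format=ReceiptData,
    )
    parsed = response.choices[0].message.parsed
    if parsed is None:
        # The model refused or returned output that did not match the schema
        raise ValueError("Model returned no parsable receipt data")
    return parsed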
```

## Step 2: Expense Categorization

Map each expense to the correct category based on the company's chart of accounts.

```python
class ExpenseCategory(BaseModel):
    category: str
    subcategory: str
    gl_code: str  # General ledger code
    project_code: str | None
    is_billable: bool
    confidence: float

EXPENSE_CATEGORIES = {
    "Travel - Airfare": "6100",
    "Travel - Lodging": "6110",
    "Travel - Ground Transport": "6120",
    "Travel - Car Rental": "6130",
    "Meals - Client Entertainment": "6200",
    "Meals - Team / Working": "6210",
    "Meals - Individual Travel": "6220",
    "Office Supplies": "6300",
    "Software & Subscriptions": "6400",
    "Professional Development": "6500",
    "Equipment": "6600",
    "Communications": "6700",
    "Other": "6900",
}

def categorize_expense(
    receipt: ReceiptData, trip_context: str = ""
) -> ExpenseCategory:
    """Categorize an expense based on receipt data and context."""
    categories_list = "\n".join(
        f"- {cat} (GL: {gl})"
        for cat, gl in EXPENSE_CATEGORIES.items()
    )

    response = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": (
                    "Categorize this expense into the correct "
                    "category. Determine if it is billable to a "
                    f"client.\n\nCategories:\n{categories_list}"
                ),
            },
            {
                "role": "user",
                "content": (
                    f"Merchant: {receipt.merchant_name}\n"
                    f"Category: {receipt.merchant_category}\n"
                    f"Amount: ${receipt.total:.2f}\n"
                    f"Date: {receipt.date}\n"
                    f"Items: {receipt.line_items}\n"
                    f"Trip Context: {trip_context or 'None provided'}"
                ),
            },
        ],
        response_format=ExpenseCategory,
    )
    return response.choices[0].message.parsed
```
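Because the model returns both a category name and a GL code as free text, the two can disagree or fall outside the chart of accounts. One way to guard against that, sketched here under assumptions, is to treat the chart as the source of truth. `normalize_category` and the trimmed `CHART_OF_ACCOUNTS` copy are hypothetical names introduced for illustration.

```python
# Trimmed copy of the EXPENSE_CATEGORIES mapping so the sketch runs alone.
CHART_OF_ACCOUNTS = {
    "Travel - Airfare": "6100",
    "Travel - Lodging": "6110",
    "Meals - Individual Travel": "6220",
    "Other": "6900",
}


def normalize_category(category: str, gl_code: str) -> tuple[str, str, bool]:
    """Return (category, gl_code, was_corrected) using the chart as truth."""
    cleaned = category.strip()
    if cleaned not in CHART_OF_ACCOUNTS:
        # Unknown label: fall back to "Other" rather than trust a
        # category that is not in the chart.
        return "Other", CHART_OF_ACCOUNTS["Other"], True
    expected_gl = CHART_OF_ACCOUNTS[cleaned]
    # Overwrite the model's GL code whenever it disagrees with the chart.
    return cleaned, expected_gl, expected_gl != gl_code


print(normalize_category("Travel - Lodging", "6100"))
```

Logging every case where `was_corrected` is true gives a cheap signal of how often the model drifts from the chart.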

## Step 3: Policy Compliance Checking

Validate each expense against company policies before submission.

```python
class PolicyViolation(BaseModel):
    rule_id: str
    rule_description: str
    severity: str  # "warning", "violation", "block"
    details: str
    suggested_action: str

class ComplianceResult(BaseModel):
    is_compliant: bool
    violations: list[PolicyViolation]
    requires_additional_approval: bool
    approval_level: str  # "manager", "director", "vp", "cfo"

class ExpensePolicy:
    """Company expense policy rules engine."""

    def __init__(self):
        self.rules = [
            {
                "id": "MAX_MEAL_INDIVIDUAL",
                "description": "Individual meal limit: $75",
                "check": self._check_meal_limit,
            },
            {
                "id": "MAX_MEAL_CLIENT",
                "description": "Client entertainment limit: $150/person",
                "check": self._check_client_meal_limit,
            },
            {
                "id": "RECEIPT_REQUIRED",
                "description": "Receipt required for expenses over $25",
                "check": self._check_receipt_required,
            },
            {
                "id": "ADVANCE_BOOKING",
                "description": "Flights must be booked 14+ days ahead",
                "check": self._check_advance_booking,
            },
            {
                "id": "HOTEL_RATE",
                "description": "Hotel max rate: $250/night",
                "check": self._check_hotel_rate,
            },
        ]

    def check_compliance(
        self,
        receipt: ReceiptData,
        category: ExpenseCategory,
        booking_date: date | None = None,
        attendee_count: int = 1,
    ) -> ComplianceResult:
        """Check expense against all policy rules."""
        violations = []

        for rule in self.rules:
            violation = rule["check"](
                receipt, category, booking_date, attendee_count
            )
            if violation:
                violations.append(violation)

        # Determine approval level: "violation" severity escalates to a
        # director, while "warning" entries are reported without escalating
        has_violations = [
            v for v in violations if v.severity == "violation"
        ]

        if receipt.total > 5000:
            approval = "vp"
        elif receipt.total > 1000 or has_violations:
            approval = "director"
        else:
            approval = "manager"

        return ComplianceResult(
            is_compliant=len(violations) == 0,
            violations=violations,
            requires_additional_approval=len(has_violations) > 0,
            approval_level=approval,
        )

    def _check_meal_limit(self, receipt, category, *args):
        if "Meals - Individual" in category.category:
            if receipt.total > 75:
                return PolicyViolation(
                    rule_id="MAX_MEAL_INDIVIDUAL",
                    rule_description="Individual meal limit: $75",
                    severity="violation",
                    details=(
                        f"Meal total ${receipt.total:.2f} "
                        f"exceeds $75 limit"
                    ),
                    suggested_action=(
                        "Provide business justification or "
                        "split the expense"
                    ),
                )
        return None

    def _check_client_meal_limit(self, receipt, category, bd, count):
        if "Client Entertainment" in category.category and count > 0:
            per_person = receipt.total / count
            if per_person > 150:
                return PolicyViolation(
                    rule_id="MAX_MEAL_CLIENT",
                    rule_description="Client meal: $150/person max",
                    severity="warning",
                    details=(
                        f"${per_person:.2f}/person exceeds limit"
                    ),
                    suggested_action="Get director pre-approval",
                )
        return None

    def _check_receipt_required(self, receipt, category, *args):
        # This would check if a receipt image was provided
        return None  # Assume receipt present since we scanned it

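    def _check_advance_booking(self, receipt, category, bd, *args):
        # Treats receipt.date as the travel date and bd as the booking date
        if "Airfare" in category.category and bd:
            days_advance = (receipt.date - bd).days
            if days_advance < 14:
                # Severity and wording here are assumed; adjust to your policy
                return PolicyViolation(
                    rule_id="ADVANCE_BOOKING",
                    rule_description="Flights must be booked 14+ days ahead",
                    severity="warning",
                    details=f"Flight booked {days_advance} days in advance",
                    suggested_action="Provide justification for late booking",
                )
        return None

    def _check_hotel_rate(self, receipt, category, *args):
        if "Lodging" in category.category:
            # receipt.total is treated as the nightly rate in this check
            if receipt.total > 250:
                return PolicyViolation(
                    rule_id="HOTEL_RATE",
                    rule_description="Hotel max: $250/night",
                    severity="violation",
                    details=f"Rate ${receipt.total:.2f} exceeds $250",
                    suggested_action="Book within policy rate or get pre-approval",
                )
        return None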
```
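The approval thresholds inside `check_compliance` are easier to test and reuse once they are pulled out into a pure function. The following is a minimal sketch that mirrors those thresholds. The `APPROVER_QUEUE` names and the `route` helper are hypothetical and stand in for whatever approval system is actually in place.

```python
def approval_level(total: float, has_violation: bool = False) -> str:
    """Mirror the amount and violation thresholds used in check_compliance."""
    if total > 5000:
        return "vp"
    if total > 1000 or has_violation:
        return "director"
    return "manager"


# Hypothetical routing table: the queue each approval level lands in.
APPROVER_QUEUE = {
    "manager": "direct-manager",
    "director": "department-director",
    "vp": "finance-vp",
}


def route(total: float, has_violation: bool = False) -> str:
    """Return the approval queue name for an expense."""
    return APPROVER_QUEUE[approval_level(total, has_violation)]


print(route(60.00))                      # Small, compliant expense
print(route(60.00, has_violation=True))  # Same amount with a violation
print(route(7200.00))                    # Large expense
```

Note that the comparisons are strict, so an expense of exactly $1,000 still goes to the manager.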

## Step 4: Expense Report Generation

Compile processed expenses into a formatted report.

```python
from dataclasses import dataclass, field

@dataclass
class ExpenseReport:
    report_id: str
    employee_name: str
    department: str
    period_start: date
    period_end: date
    expenses: list[dict] = field(default_factory=list)

    @property
    def total_amount(self) -> float:
        return sum(e["amount"] for e in self.expenses)

    @property
    def by_category(self) -> dict[str, float]:
        totals = {}
        for e in self.expenses:
            cat = e["category"]
            totals[cat] = totals.get(cat, 0) + e["amount"]
        return totals

    def add_expense(
        self,
        receipt: ReceiptData,
        category: ExpenseCategory,
        compliance: ComplianceResult,
    ):
        self.expenses.append({
            "date": str(receipt.date),
            "merchant": receipt.merchant_name,
            "amount": receipt.total,
            "category": category.category,
            "gl_code": category.gl_code,
            "billable": category.is_billable,
            "compliant": compliance.is_compliant,
            "violations": [
                v.rule_id for v in compliance.violations
            ],
            "approval_level": compliance.approval_level,
        })

    def generate_summary(self) -> str:
        """Generate a formatted expense report summary."""
        lines = [
            f"Expense Report: {self.report_id}",
            f"Employee: {self.employee_name}",
            f"Period: {self.period_start} to {self.period_end}",
            f"Total: ${self.total_amount:,.2f}",
            "",
            "By Category:",
        ]
        for cat, total in sorted(
            self.by_category.items(),
            key=lambda x: x[1],
            reverse=True,
        ):
            lines.append(f"  {cat}: ${total:,.2f}")

        non_compliant = [
            e for e in self.expenses if not e["compliant"]
        ]
        if non_compliant:
            lines.append(f"\nPolicy Violations: {len(non_compliant)}")

        return "\n".join(lines)
```
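Finance systems usually ingest expenses as tabular data rather than a text summary. As a sketch of one possible export path, the function below serializes entries shaped like those built by `add_expense` into CSV. `expenses_to_csv` and its column selection are assumptions, not part of the report class above.

```python
import csv
import io


def expenses_to_csv(expenses: list[dict]) -> str:
    """Serialize report line items to CSV text."""
    columns = ["date", "merchant", "amount", "category", "gl_code", "billable"]
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        extrasaction="ignore",  # Skip keys such as "violations"
        lineterminator="\n",
    )
    writer.writeheader()
    for expense in expenses:
        writer.writerow(expense)
    return buffer.getvalue()


sample = [
    {
        "date": "2026-03-02",
        "merchant": "Hotel A",
        "amount": 210.0,
        "category": "Travel - Lodging",
        "gl_code": "6110",
        "billable": True,
        "violations": [],
    }
]
print(expenses_to_csv(sample))
```

With the `ExpenseReport` class, the call would be `expenses_to_csv(report.expenses)`.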

## Full Pipeline

```python
def process_expense(
    image_path: str, trip_context: str = ""
) -> dict:
    """Process a single expense from receipt to report entry."""
    receipt = scan_receipt(image_path)
    category = categorize_expense(receipt, trip_context)
    policy = ExpensePolicy()
    compliance = policy.check_compliance(receipt, category)

    return {
        "receipt": receipt,
        "category": category,
        "compliance": compliance,
    }

# Process multiple receipts
report = ExpenseReport(
    report_id="EXP-2026-0342",
    employee_name="Jane Smith",
    department="Sales",
    period_start=date(2026, 3, 1),
    period_end=date(2026, 3, 15),
)

# scan_receipt sends image data, so render PDF receipts to an image first
receipt_files = ["dinner_receipt.jpg", "hotel_bill.png", "uber.png"]
for path in receipt_files:
    result = process_expense(path, "Client meeting in NYC")
    report.add_expense(
        result["receipt"], result["category"], result["compliance"]
    )

print(report.generate_summary())
```

## FAQ

### How accurate is OCR-based receipt scanning?

Modern vision-language models like GPT-4o can achieve over 95% accuracy on clearly printed receipts, though the figure varies with image quality and should be validated on your own data. Accuracy drops with faded thermal paper, handwritten receipts, or photos taken in poor lighting. For business-critical accuracy, implement a confidence threshold and route low-confidence extractions for manual verification.
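A confidence threshold can be as simple as splitting results into two queues. The sketch below assumes each extraction carries a self-reported `confidence` score, similar to the field on `ExpenseCategory`. The `Extraction` stand-in, the `triage` function, and the 0.85 cutoff are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Extraction:
    """Stand-in for a model result that carries a confidence score."""
    merchant_name: str
    total: float
    confidence: float  # 0.0 to 1.0


REVIEW_THRESHOLD = 0.85  # Hypothetical cutoff; tune against labeled receipts


def triage(
    items: list[Extraction], threshold: float = REVIEW_THRESHOLD
) -> tuple[list[Extraction], list[Extraction]]:
    """Split extractions into (auto_accepted, needs_review) lists."""
    auto, review = [], []
    for item in items:
        (auto if item.confidence >= threshold else review).append(item)
    return auto, review


auto, review = triage([
    Extraction("Cafe", 12.50, 0.97),
    Extraction("Taxi", 30.00, 0.60),
])
print([e.merchant_name for e in auto], [e.merchant_name for e in review])
```

Model-reported confidence is not guaranteed to be calibrated, so it is worth checking the chosen cutoff against a sample of manually verified receipts.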

### How do you handle expenses in foreign currencies?

Store the original currency and amount alongside the converted amount. Use a reliable exchange rate API (such as Open Exchange Rates or the European Central Bank) to convert at the transaction date rate. Company policy should specify whether to use the transaction date rate or the report submission date rate.
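To make the "store both amounts" advice concrete, here is a minimal sketch that converts at the transaction-date rate and keeps the original values alongside the result. The rates in `RATES_TO_USD` are made-up illustrative numbers, not real market data, and `ConvertedAmount` and `convert_to_usd` are hypothetical names. A real system would fetch the rate from its exchange rate provider.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal, ROUND_HALF_UP

# Illustrative rates keyed by (currency, transaction date); not real data.
RATES_TO_USD = {
    ("EUR", date(2026, 3, 2)): Decimal("1.10"),
    ("GBP", date(2026, 3, 2)): Decimal("1.25"),
}


@dataclass(frozen=True)
class ConvertedAmount:
    original_amount: Decimal
    original_currency: str
    rate: Decimal
    usd_amount: Decimal


def convert_to_usd(amount: str, currency: str, on: date) -> ConvertedAmount:
    """Convert at the transaction-date rate, keeping the original values."""
    original = Decimal(amount)
    # A missing rate raises KeyError, which is safer than guessing one.
    rate = Decimal("1") if currency == "USD" else RATES_TO_USD[(currency, on)]
    usd = (original * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return ConvertedAmount(original, currency, rate, usd)


print(convert_to_usd("100.00", "EUR", date(2026, 3, 2)))
```

Using `Decimal` with explicit rounding avoids the cent-level drift that float arithmetic can introduce, and storing the applied rate makes each conversion auditable later.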

### Can the agent learn from past categorization decisions?

Yes. Log every categorization decision along with any corrections made by employees or approvers. Use this feedback to fine-tune the categorization model over time. You can also build a merchant-to-category lookup table from historical data so that repeat merchants are categorized instantly without an LLM call.

---

Source: https://callsphere.ai/blog/ai-agent-expense-reporting-receipt-scanning-categorization-policy-compliance
