
Building AI Report Generation for SaaS: Natural Language to Analytics

Implement a natural language report builder that lets SaaS users ask questions in plain English and get back charts, tables, and exportable reports from their product data.

Why Natural Language Reports Matter

Most SaaS products have a reporting section that requires users to select filters, choose chart types, and configure date ranges manually. Power users love it. Everyone else avoids it. When a VP asks "How did our conversion rate change after we launched the new pricing page?", they want to type that question and get an answer — not spend 15 minutes configuring a funnel report.

AI report generation bridges this gap by translating natural language into database queries, visualizations, and exportable documents.

Safe Data Access Layer

The AI must query your database without risking SQL injection or unauthorized data access. Build a restricted query layer that only allows SELECT statements on approved tables.

import re
from dataclasses import dataclass
from sqlalchemy import text

@dataclass
class TableSchema:
    name: str
    columns: list[dict]  # {"name": str, "type": str, "description": str}
    description: str

ALLOWED_TABLES: dict[str, TableSchema] = {
    "deals": TableSchema(
        name="deals",
        columns=[
            {"name": "id", "type": "UUID", "description": "Deal ID"},
            {"name": "name", "type": "VARCHAR", "description": "Deal name"},
            {"name": "value", "type": "DECIMAL", "description": "Deal value in USD"},
            {"name": "stage", "type": "VARCHAR", "description": "Pipeline stage"},
            {"name": "created_at", "type": "TIMESTAMP", "description": "Creation date"},
            {"name": "closed_at", "type": "TIMESTAMP", "description": "Close date"},
            {"name": "tenant_id", "type": "UUID", "description": "Tenant ID"},
        ],
        description="Sales deals and opportunities",
    ),
    "contacts": TableSchema(
        name="contacts",
        columns=[
            {"name": "id", "type": "UUID", "description": "Contact ID"},
            {"name": "name", "type": "VARCHAR", "description": "Full name"},
            {"name": "email", "type": "VARCHAR", "description": "Email address"},
            {"name": "company", "type": "VARCHAR", "description": "Company name"},
            {"name": "created_at", "type": "TIMESTAMP", "description": "Creation date"},
            {"name": "tenant_id", "type": "UUID", "description": "Tenant ID"},
        ],
        description="Contact records",
    ),
}


def validate_query(sql: str, tenant_id: str) -> str:
    """Validate and sandbox the generated SQL."""
    sql_upper = sql.strip().upper()

    # Only allow a single SELECT statement
    if not sql_upper.startswith("SELECT"):
        raise ValueError("Only SELECT queries are allowed.")

    # Block stacked queries and SQL comments
    if ";" in sql.rstrip().rstrip(";") or "--" in sql or "/*" in sql:
        raise ValueError("Multiple statements and comments are not allowed.")

    # Block dangerous keywords (INTO catches SELECT ... INTO)
    forbidden = ["INSERT", "UPDATE", "DELETE", "DROP", "ALTER", "TRUNCATE",
                 "CREATE", "GRANT", "REVOKE", "EXEC", "INTO"]
    for keyword in forbidden:
        if re.search(rf"\b{keyword}\b", sql_upper):
            raise ValueError(f"Forbidden keyword: {keyword}")

    # Ensure a tenant_id filter is present. This is a coarse substring check;
    # in production, also enforce tenancy at execution time (e.g. row-level
    # security or a session variable) rather than trusting generated SQL.
    if "tenant_id" not in sql.lower():
        raise ValueError("Query must include tenant_id filter.")

    return sql

Text-to-SQL with Schema Context

Feed the LLM your table schemas so it generates accurate queries. Always include column descriptions: the model resolves user phrasing such as "deal size" to the right column far more reliably from descriptions than from terse column names.

async def generate_report_query(question: str, tenant_id: str,
                                 llm_client) -> dict:
    schema_description = ""
    for table in ALLOWED_TABLES.values():
        cols = ", ".join(
            [f"{c['name']} ({c['type']}: {c['description']})"
             for c in table.columns]
        )
        schema_description += f"\nTable: {table.name} - {table.description}\n"
        schema_description += f"  Columns: {cols}\n"

    prompt = f"""You are a SQL query generator for a SaaS analytics system.
Generate a PostgreSQL query to answer the user's question.

RULES:
- Only use tables and columns from the schema below
- ALWAYS filter by tenant_id = '{tenant_id}'
- Use aggregate functions (COUNT, SUM, AVG) for summary questions
- Include ORDER BY and LIMIT where appropriate
- Return JSON with: "sql", "chart_type" (bar, line, pie, table, number),
  "title", "x_axis", "y_axis"

SCHEMA:
{schema_description}

User question: {question}"""

    response = await llm_client.chat(
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return response
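The shape of `response` depends on your LLM client. If it returns the raw JSON string, a defensive parsing step (a hypothetical helper, not part of the snippet above) keeps malformed output from ever reaching the query executor:

```python
import json

REQUIRED_KEYS = {"sql", "chart_type", "title"}

def parse_query_plan(raw: str) -> dict:
    """Parse the LLM's JSON response and check the required fields."""
    try:
        plan = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"LLM returned invalid JSON: {exc}") from exc
    missing = REQUIRED_KEYS - plan.keys()
    if missing:
        raise ValueError(f"Query plan missing keys: {sorted(missing)}")
    return plan

# A well-formed response parses cleanly; anything else raises ValueError
plan = parse_query_plan(
    '{"sql": "SELECT stage, COUNT(*) FROM deals", '
    '"chart_type": "bar", "title": "Deals by stage"}'
)
```

Failing fast here means a garbled LLM response produces a clear error instead of an opaque SQL failure downstream.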

Executing Queries and Building Charts

Run the validated query and transform results into chart-ready data structures.


from enum import Enum
from pydantic import BaseModel

class ChartType(str, Enum):
    BAR = "bar"
    LINE = "line"
    PIE = "pie"
    TABLE = "table"
    NUMBER = "number"

class ReportResult(BaseModel):
    title: str
    chart_type: ChartType
    data: list[dict]
    x_axis: str | None = None
    y_axis: str | None = None
    summary: str

async def execute_report(query_plan: dict, tenant_id: str,
                          db_engine) -> ReportResult:
    sql = validate_query(query_plan["sql"], tenant_id)

    async with db_engine.connect() as conn:
        result = await conn.execute(text(sql))
        rows = [dict(row._mapping) for row in result.fetchall()]

    # For single-number results
    if query_plan.get("chart_type") == "number" and len(rows) == 1:
        value = list(rows[0].values())[0]
        return ReportResult(
            title=query_plan["title"],
            chart_type=ChartType.NUMBER,
            data=[{"value": value}],
            summary=f"{query_plan['title']}: {value}",
        )

    # Serialize datetime, date, Decimal, and UUID values (the schema
    # uses UUID and TIMESTAMP columns, which are not JSON-native)
    from datetime import date, datetime
    from decimal import Decimal
    from uuid import UUID

    def serialize(obj):
        if isinstance(obj, (datetime, date)):
            return obj.isoformat()
        if isinstance(obj, Decimal):
            return float(obj)
        if isinstance(obj, UUID):
            return str(obj)
        return obj

    serialized_rows = [
        {k: serialize(v) for k, v in row.items()} for row in rows
    ]

    return ReportResult(
        title=query_plan["title"],
        chart_type=ChartType(query_plan.get("chart_type", "table")),
        data=serialized_rows,
        x_axis=query_plan.get("x_axis"),
        y_axis=query_plan.get("y_axis"),
        summary=f"Found {len(rows)} records for: {query_plan['title']}",
    )

Export to Multiple Formats

Users need reports in PDF, CSV, and email-ready formats.

import csv
import io

def export_csv(report: ReportResult) -> str:
    if not report.data:
        return ""
    output = io.StringIO()
    writer = csv.DictWriter(output, fieldnames=report.data[0].keys())
    writer.writeheader()
    writer.writerows(report.data)
    return output.getvalue()


import html

def export_html_table(report: ReportResult) -> str:
    if not report.data:
        return "<p>No data available.</p>"
    headers = list(report.data[0].keys())
    # Escape all values: query results can contain user-entered text
    out = f"<h2>{html.escape(report.title)}</h2><table border='1'><tr>"
    out += "".join(f"<th>{html.escape(h)}</th>" for h in headers)
    out += "</tr>"
    for row in report.data:
        out += "<tr>"
        out += "".join(
            f"<td>{html.escape(str(row.get(h, '')))}</td>" for h in headers
        )
        out += "</tr>"
    out += "</table>"
    return out
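For the email-ready format, the HTML table and CSV string from the exporters above can be assembled into a multipart message with the standard library alone. This is a minimal sketch; the addresses are placeholders and SMTP delivery is left to your mail setup:

```python
from email.message import EmailMessage

def build_report_email(title: str, summary: str, html_body: str,
                       csv_data: str, to_addr: str,
                       from_addr: str) -> EmailMessage:
    """Assemble an email with a plain-text fallback, an HTML body,
    and the CSV attached for spreadsheet users."""
    msg = EmailMessage()
    msg["Subject"] = title
    msg["To"] = to_addr
    msg["From"] = from_addr
    msg.set_content(summary)                       # plain-text fallback part
    msg.add_alternative(html_body, subtype="html")  # rendered report
    msg.add_attachment(
        csv_data.encode("utf-8"),
        maintype="text", subtype="csv",
        filename="report.csv",
    )
    return msg
```

`EmailMessage` upgrades itself from text/plain to multipart/alternative and then multipart/mixed as parts are added, so the same function works whether or not the report has attachable data.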

The Complete Report API

from fastapi import FastAPI, Depends
from pydantic import BaseModel

app = FastAPI()

# get_current_tenant, get_llm_client, and get_db_engine are your app's own
# dependency providers: tenant resolution from the auth context, the LLM
# client, and the async SQLAlchemy engine.

class ReportRequest(BaseModel):
    question: str

@app.post("/api/reports/generate", response_model=ReportResult)
async def generate_report(
    req: ReportRequest,
    tenant_id: str = Depends(get_current_tenant),
    llm_client = Depends(get_llm_client),
    db_engine = Depends(get_db_engine),
):
    query_plan = await generate_report_query(
        question=req.question,
        tenant_id=tenant_id,
        llm_client=llm_client,
    )
    report = await execute_report(query_plan, tenant_id, db_engine)
    return report

FAQ

How do I prevent the AI from generating expensive queries?

Add a query cost estimator using EXPLAIN before execution. Set a maximum estimated cost threshold (e.g., 10,000 cost units) and reject queries that exceed it. Also enforce a hard row limit with LIMIT 10000 appended to every query, and set a statement timeout at the database level (e.g., 30 seconds).
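The EXPLAIN cost check needs a live connection, but the hard row limit can be a pure function applied after validation. A minimal sketch of that clamp (`MAX_ROWS` is an illustrative constant, not from the article's earlier code):

```python
import re

MAX_ROWS = 10_000

def enforce_row_limit(sql: str, max_rows: int = MAX_ROWS) -> str:
    """Append a LIMIT if the query has none, or clamp an existing
    LIMIT down to max_rows."""
    match = re.search(r"\bLIMIT\s+(\d+)\b", sql, re.IGNORECASE)
    if match is None:
        return f"{sql.rstrip().rstrip(';')} LIMIT {max_rows}"
    if int(match.group(1)) > max_rows:
        return sql[:match.start(1)] + str(max_rows) + sql[match.end(1):]
    return sql
```

Run it on the validated SQL just before execution; combined with a database-level statement timeout, it bounds both the result size and the runtime of anything the LLM produces.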

What if the AI generates an incorrect query?

Show the generated SQL to the user alongside the results, with an "Edit Query" option. Log all generated queries with the original question for quality monitoring. Build a feedback loop where users can flag incorrect results, and use those examples to improve the system prompt with few-shot examples of correct query patterns.

How do I handle questions that span multiple tables?

Include JOIN relationships in your schema description. Specify which columns are foreign keys and how tables relate. The LLM handles multi-table queries well when the schema description includes lines like "deals.contact_id references contacts.id" — this gives it the explicit relationship it needs to write correct JOINs.
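One way to encode those relationships is a list of foreign-key pairs rendered into the schema prompt. This is an illustrative extension of the `ALLOWED_TABLES` idea; the `contact_id` column is hypothetical and not in the schema shown earlier:

```python
# (table.column, referenced_table.column) pairs to append to the prompt
RELATIONSHIPS = [
    ("deals.contact_id", "contacts.id"),
]

def describe_relationships(relationships: list[tuple[str, str]]) -> str:
    """Render FK pairs as the plain-English lines the LLM needs."""
    return "\n".join(f"{src} references {dst}" for src, dst in relationships)

print(describe_relationships(RELATIONSHIPS))
# deals.contact_id references contacts.id
```

Keeping relationships in data rather than prose means the prompt stays in sync as tables are added to the allow-list.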


Written by

CallSphere Team

Expert insights on AI voice agents and customer communication automation.
