
Building AI Report Generation for SaaS: Natural Language to Analytics

Implement a natural language report builder that lets SaaS users ask questions in plain English and get back charts, tables, and exportable reports from their product data.

Why Natural Language Reports Matter

Most SaaS products have a reporting section that requires users to select filters, choose chart types, and configure date ranges manually. Power users love it. Everyone else avoids it. When a VP asks "How did our conversion rate change after we launched the new pricing page?", they want to type that question and get an answer — not spend 15 minutes configuring a funnel report.

AI report generation bridges this gap by translating natural language into database queries, visualizations, and exportable documents.

Safe Data Access Layer

The AI must query your database without risking SQL injection or unauthorized data access. Build a restricted query layer that only allows SELECT statements on approved tables.

import re
from dataclasses import dataclass
from sqlalchemy import text

@dataclass
class TableSchema:
    name: str
    columns: list[dict]  # {"name": str, "type": str, "description": str}
    description: str

ALLOWED_TABLES: dict[str, TableSchema] = {
    "deals": TableSchema(
        name="deals",
        columns=[
            {"name": "id", "type": "UUID", "description": "Deal ID"},
            {"name": "name", "type": "VARCHAR", "description": "Deal name"},
            {"name": "value", "type": "DECIMAL", "description": "Deal value in USD"},
            {"name": "stage", "type": "VARCHAR", "description": "Pipeline stage"},
            {"name": "created_at", "type": "TIMESTAMP", "description": "Creation date"},
            {"name": "closed_at", "type": "TIMESTAMP", "description": "Close date"},
            {"name": "tenant_id", "type": "UUID", "description": "Tenant ID"},
        ],
        description="Sales deals and opportunities",
    ),
    "contacts": TableSchema(
        name="contacts",
        columns=[
            {"name": "id", "type": "UUID", "description": "Contact ID"},
            {"name": "name", "type": "VARCHAR", "description": "Full name"},
            {"name": "email", "type": "VARCHAR", "description": "Email address"},
            {"name": "company", "type": "VARCHAR", "description": "Company name"},
            {"name": "created_at", "type": "TIMESTAMP", "description": "Creation date"},
            {"name": "tenant_id", "type": "UUID", "description": "Tenant ID"},
        ],
        description="Contact records",
    ),
}


def validate_query(sql: str, tenant_id: str) -> str:
    """Validate and sandbox the generated SQL."""
    sql_upper = sql.strip().upper()

    # Only allow a single SELECT statement
    if not sql_upper.startswith("SELECT"):
        raise ValueError("Only SELECT queries are allowed.")

    # Block stacked queries and SQL comments
    if ";" in sql.rstrip().rstrip(";") or "--" in sql or "/*" in sql:
        raise ValueError("Multiple statements and comments are not allowed.")

    # Block dangerous keywords (INTO catches SELECT ... INTO)
    forbidden = ["INSERT", "UPDATE", "DELETE", "DROP", "ALTER", "TRUNCATE",
                 "CREATE", "GRANT", "REVOKE", "EXEC", "INTO"]
    for keyword in forbidden:
        if re.search(rf"\b{keyword}\b", sql_upper):
            raise ValueError(f"Forbidden keyword: {keyword}")

    # Ensure a tenant_id filter is present. This is a coarse substring check;
    # in production, also enforce tenancy at execution time (e.g. row-level
    # security or a session variable) rather than trusting generated SQL.
    if "tenant_id" not in sql.lower():
        raise ValueError("Query must include tenant_id filter.")

    return sql

Text-to-SQL with Schema Context

Feed the LLM your table schemas so it generates accurate queries. Always include column descriptions: the model resolves user phrasing such as "deal size" to the right column far more reliably from descriptions than from terse column names.

async def generate_report_query(question: str, tenant_id: str,
                                 llm_client) -> dict:
    schema_description = ""
    for table in ALLOWED_TABLES.values():
        cols = ", ".join(
            [f"{c['name']} ({c['type']}: {c['description']})"
             for c in table.columns]
        )
        schema_description += f"\nTable: {table.name} - {table.description}\n"
        schema_description += f"  Columns: {cols}\n"

    prompt = f"""You are a SQL query generator for a SaaS analytics system.
Generate a PostgreSQL query to answer the user's question.

RULES:
- Only use tables and columns from the schema below
- ALWAYS filter by tenant_id = '{tenant_id}'
- Use aggregate functions (COUNT, SUM, AVG) for summary questions
- Include ORDER BY and LIMIT where appropriate
- Return JSON with: "sql", "chart_type" (bar, line, pie, table, number),
  "title", "x_axis", "y_axis"

SCHEMA:
{schema_description}

User question: {question}"""

    response = await llm_client.chat(
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return response
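The shape of `response` depends on your LLM client. If it returns the raw JSON string, a defensive parsing step (a hypothetical helper, not part of the snippet above) keeps malformed output from ever reaching the query executor:

```python
import json

REQUIRED_KEYS = {"sql", "chart_type", "title"}

def parse_query_plan(raw: str) -> dict:
    """Parse the LLM's JSON response and check the required fields."""
    try:
        plan = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"LLM returned invalid JSON: {exc}") from exc
    missing = REQUIRED_KEYS - plan.keys()
    if missing:
        raise ValueError(f"Query plan missing keys: {sorted(missing)}")
    return plan

# A well-formed response parses cleanly; anything else raises ValueError
plan = parse_query_plan(
    '{"sql": "SELECT stage, COUNT(*) FROM deals", '
    '"chart_type": "bar", "title": "Deals by stage"}'
)
```

Failing fast here means a garbled LLM response produces a clear error instead of an opaque SQL failure downstream.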

Executing Queries and Building Charts

Run the validated query and transform results into chart-ready data structures.


from enum import Enum
from pydantic import BaseModel

class ChartType(str, Enum):
    BAR = "bar"
    LINE = "line"
    PIE = "pie"
    TABLE = "table"
    NUMBER = "number"

class ReportResult(BaseModel):
    title: str
    chart_type: ChartType
    data: list[dict]
    x_axis: str | None = None
    y_axis: str | None = None
    summary: str

async def execute_report(query_plan: dict, tenant_id: str,
                          db_engine) -> ReportResult:
    sql = validate_query(query_plan["sql"], tenant_id)

    async with db_engine.connect() as conn:
        result = await conn.execute(text(sql))
        rows = [dict(row._mapping) for row in result.fetchall()]

    # For single-number results
    if query_plan.get("chart_type") == "number" and len(rows) == 1:
        value = list(rows[0].values())[0]
        return ReportResult(
            title=query_plan["title"],
            chart_type=ChartType.NUMBER,
            data=[{"value": value}],
            summary=f"{query_plan['title']}: {value}",
        )

    # Serialize datetime, date, Decimal, and UUID values (the schema
    # uses UUID and TIMESTAMP columns, which are not JSON-native)
    from datetime import date, datetime
    from decimal import Decimal
    from uuid import UUID

    def serialize(obj):
        if isinstance(obj, (datetime, date)):
            return obj.isoformat()
        if isinstance(obj, Decimal):
            return float(obj)
        if isinstance(obj, UUID):
            return str(obj)
        return obj

    serialized_rows = [
        {k: serialize(v) for k, v in row.items()} for row in rows
    ]

    return ReportResult(
        title=query_plan["title"],
        chart_type=ChartType(query_plan.get("chart_type", "table")),
        data=serialized_rows,
        x_axis=query_plan.get("x_axis"),
        y_axis=query_plan.get("y_axis"),
        summary=f"Found {len(rows)} records for: {query_plan['title']}",
    )

Export to Multiple Formats

Users need reports in PDF, CSV, and email-ready formats.

import csv
import io

def export_csv(report: ReportResult) -> str:
    if not report.data:
        return ""
    output = io.StringIO()
    writer = csv.DictWriter(output, fieldnames=report.data[0].keys())
    writer.writeheader()
    writer.writerows(report.data)
    return output.getvalue()


import html

def export_html_table(report: ReportResult) -> str:
    if not report.data:
        return "<p>No data available.</p>"
    headers = list(report.data[0].keys())
    # Escape all values: query results can contain user-entered text
    out = f"<h2>{html.escape(report.title)}</h2><table border='1'><tr>"
    out += "".join(f"<th>{html.escape(h)}</th>" for h in headers)
    out += "</tr>"
    for row in report.data:
        out += "<tr>"
        out += "".join(
            f"<td>{html.escape(str(row.get(h, '')))}</td>" for h in headers
        )
        out += "</tr>"
    out += "</table>"
    return out
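For the email-ready format, the HTML table and CSV string from the exporters above can be assembled into a multipart message with the standard library alone. This is a minimal sketch; the addresses are placeholders and SMTP delivery is left to your mail setup:

```python
from email.message import EmailMessage

def build_report_email(title: str, summary: str, html_body: str,
                       csv_data: str, to_addr: str,
                       from_addr: str) -> EmailMessage:
    """Assemble an email with a plain-text fallback, an HTML body,
    and the CSV attached for spreadsheet users."""
    msg = EmailMessage()
    msg["Subject"] = title
    msg["To"] = to_addr
    msg["From"] = from_addr
    msg.set_content(summary)                       # plain-text fallback part
    msg.add_alternative(html_body, subtype="html")  # rendered report
    msg.add_attachment(
        csv_data.encode("utf-8"),
        maintype="text", subtype="csv",
        filename="report.csv",
    )
    return msg
```

`EmailMessage` upgrades itself from text/plain to multipart/alternative and then multipart/mixed as parts are added, so the same function works whether or not the report has attachable data.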

The Complete Report API

from fastapi import FastAPI, Depends
from pydantic import BaseModel

app = FastAPI()

# get_current_tenant, get_llm_client, and get_db_engine are your app's own
# dependency providers: tenant resolution from the auth context, the LLM
# client, and the async SQLAlchemy engine.

class ReportRequest(BaseModel):
    question: str

@app.post("/api/reports/generate", response_model=ReportResult)
async def generate_report(
    req: ReportRequest,
    tenant_id: str = Depends(get_current_tenant),
    llm_client = Depends(get_llm_client),
    db_engine = Depends(get_db_engine),
):
    query_plan = await generate_report_query(
        question=req.question,
        tenant_id=tenant_id,
        llm_client=llm_client,
    )
    report = await execute_report(query_plan, tenant_id, db_engine)
    return report

FAQ

How do I prevent the AI from generating expensive queries?

Add a query cost estimator using EXPLAIN before execution. Set a maximum estimated cost threshold (e.g., 10,000 cost units) and reject queries that exceed it. Also enforce a hard row limit with LIMIT 10000 appended to every query, and set a statement timeout at the database level (e.g., 30 seconds).
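The EXPLAIN cost check needs a live connection, but the hard row limit can be a pure function applied after validation. A minimal sketch of that clamp (`MAX_ROWS` is an illustrative constant, not from the article's earlier code):

```python
import re

MAX_ROWS = 10_000

def enforce_row_limit(sql: str, max_rows: int = MAX_ROWS) -> str:
    """Append a LIMIT if the query has none, or clamp an existing
    LIMIT down to max_rows."""
    match = re.search(r"\bLIMIT\s+(\d+)\b", sql, re.IGNORECASE)
    if match is None:
        return f"{sql.rstrip().rstrip(';')} LIMIT {max_rows}"
    if int(match.group(1)) > max_rows:
        return sql[:match.start(1)] + str(max_rows) + sql[match.end(1):]
    return sql
```

Run it on the validated SQL just before execution; combined with a database-level statement timeout, it bounds both the result size and the runtime of anything the LLM produces.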

What if the AI generates an incorrect query?

Show the generated SQL to the user alongside the results, with an "Edit Query" option. Log all generated queries with the original question for quality monitoring. Build a feedback loop where users can flag incorrect results, and use those examples to improve the system prompt with few-shot examples of correct query patterns.

How do I handle questions that span multiple tables?

Include JOIN relationships in your schema description. Specify which columns are foreign keys and how tables relate. The LLM handles multi-table queries well when the schema description includes lines like "deals.contact_id references contacts.id" — this gives it the explicit relationship it needs to write correct JOINs.
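One way to encode those relationships is a list of foreign-key pairs rendered into the schema prompt. This is an illustrative extension of the `ALLOWED_TABLES` idea; the `contact_id` column is hypothetical and not in the schema shown earlier:

```python
# (table.column, referenced_table.column) pairs to append to the prompt
RELATIONSHIPS = [
    ("deals.contact_id", "contacts.id"),
]

def describe_relationships(relationships: list[tuple[str, str]]) -> str:
    """Render FK pairs as the plain-English lines the LLM needs."""
    return "\n".join(f"{src} references {dst}" for src, dst in relationships)

print(describe_relationships(RELATIONSHIPS))
# deals.contact_id references contacts.id
```

Keeping relationships in data rather than prose means the prompt stays in sync as tables are added to the allow-list.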


Written by

CallSphere Team

Expert insights on AI voice agents and customer communication automation.
