
Adding AI Chat to Your SaaS Product: Architecture and Implementation Guide

Learn how to embed an AI chat widget into your SaaS application with proper backend integration, context injection, permission scoping, and conversation management.

Why AI Chat Belongs Inside Your Product

Adding AI chat to a SaaS product is not the same as dropping a third-party chatbot on your marketing site. Product-embedded AI chat needs access to the user's data, must respect their permissions, and should understand the current application context. A customer viewing an invoice should be able to ask "Why is this total different from last month?" and get a real, data-backed answer — not a generic FAQ response.

This guide covers the architecture for building an AI chat system that lives inside your SaaS application as a first-class feature.

Architecture Overview

The system has four layers: the frontend widget, a WebSocket gateway, an AI orchestration service, and your existing product APIs.

The gateway authenticates the client on connect, then attaches the current page context to every message before handing it to the orchestration layer.

# Backend: FastAPI WebSocket endpoint for AI chat
from typing import Optional

from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

class ChatContext:
    """Captures the user's current product context."""
    def __init__(self, user_id: str, tenant_id: str, current_page: str,
                 entity_type: Optional[str] = None,
                 entity_id: Optional[str] = None):
        self.user_id = user_id
        self.tenant_id = tenant_id
        self.current_page = current_page
        self.entity_type = entity_type
        self.entity_id = entity_id

    def to_system_prompt(self) -> str:
        context = f"User is on page: {self.current_page}."
        if self.entity_type and self.entity_id:
            context += f" They are viewing {self.entity_type} with ID {self.entity_id}."
        return context


@app.websocket("/ws/chat")
async def chat_endpoint(websocket: WebSocket):
    await websocket.accept()
    # Authenticate from the token carried in the first message
    auth_msg = await websocket.receive_json()
    user = await authenticate_ws_token(auth_msg["token"])
    if not user:
        await websocket.close(code=4001)  # application-defined: bad credentials
        return

    try:
        while True:
            data = await websocket.receive_json()
            context = ChatContext(
                user_id=user.id,
                tenant_id=user.tenant_id,
                current_page=data.get("page", "/"),
                entity_type=data.get("entity_type"),
                entity_id=data.get("entity_id"),
            )
            response = await generate_ai_response(
                message=data["message"],
                context=context,
                permissions=user.permissions,
            )
            await websocket.send_json({"reply": response})
    except WebSocketDisconnect:
        pass  # client closed the socket; nothing to clean up here
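The authenticate_ws_token helper is assumed above. One minimal way to implement the verification half is an HMAC-signed payload; this sketch is stdlib-only and synchronous, whereas the real function would be async and would also load the user record:

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

SECRET = b"replace-with-server-secret"  # illustrative only, load from config

def sign_token(payload: dict) -> str:
    """Serialize and HMAC-sign a payload into a compact token."""
    body = base64.urlsafe_b64encode(json.dumps(payload).encode())
    sig = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return f"{body.decode()}.{sig}"

def verify_ws_token(token: str) -> Optional[dict]:
    """Return the payload if the signature matches, else None."""
    try:
        body, sig = token.rsplit(".", 1)
    except ValueError:
        return None
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None
    return json.loads(base64.urlsafe_b64decode(body))
```

In production you would more likely use a short-lived JWT minted by your existing auth flow; the point is that the socket is authenticated before any chat message is processed.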

Frontend Widget Design

The chat widget mounts as a floating component that tracks the user's current route and sends page context with every message.

// React chat widget that sends page context
import { useEffect, useRef, useState } from "react";
import { usePathname } from "next/navigation";

interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

export function AIChatWidget({ authToken }: { authToken: string }) {
  const [messages, setMessages] = useState<ChatMessage[]>([]);
  const [input, setInput] = useState("");
  const wsRef = useRef<WebSocket | null>(null);
  const pathname = usePathname();

  useEffect(() => {
    const ws = new WebSocket(`wss://api.example.com/ws/chat`);
    ws.onopen = () => ws.send(JSON.stringify({ token: authToken }));
    ws.onmessage = (event) => {
      const data = JSON.parse(event.data);
      setMessages((prev) => [...prev, { role: "assistant", content: data.reply }]);
    };
    wsRef.current = ws;
    return () => ws.close();
  }, [authToken]);

  // Derive entity context from the route, assuming paths like /invoices/inv_123
  const extractEntityType = (path: string): string | null =>
    path.split("/").filter(Boolean)[0] ?? null;
  const extractEntityId = (path: string): string | null =>
    path.split("/").filter(Boolean)[1] ?? null;

  const sendMessage = () => {
    if (!input.trim() || wsRef.current?.readyState !== WebSocket.OPEN) return;
    const payload = {
      message: input,
      page: pathname,
      entity_type: extractEntityType(pathname),
      entity_id: extractEntityId(pathname),
    };
    wsRef.current.send(JSON.stringify(payload));
    setMessages((prev) => [...prev, { role: "user", content: input }]);
    setInput("");
  };

  return (
    <div className="fixed bottom-4 right-4 w-96 bg-white shadow-xl rounded-lg">
      <div className="h-80 overflow-y-auto p-4">
        {messages.map((msg, i) => (
          <div key={i} className={msg.role === "user" ? "text-right" : "text-left"}>
            <p className="inline-block p-2 rounded-lg bg-gray-100">{msg.content}</p>
          </div>
        ))}
      </div>
      <div className="flex p-2 border-t">
        <input value={input} onChange={(e) => setInput(e.target.value)}
          className="flex-1 border rounded-l px-3" placeholder="Ask anything..." />
        <button onClick={sendMessage} className="bg-blue-600 text-white px-4 rounded-r">
          Send
        </button>
      </div>
    </div>
  );
}

Permission-Scoped Data Access

The AI must never return data the user is not authorized to see. Inject the user's permission set into the tool layer so every data fetch is scoped.


async def generate_ai_response(message: str, context: ChatContext,
                                permissions: list[str]) -> str:
    tools = build_scoped_tools(context.tenant_id, context.user_id, permissions)

    system_prompt = f"""You are a helpful assistant inside our SaaS product.
{context.to_system_prompt()}
Only use the provided tools to fetch data. Never fabricate data.
The user has these permissions: {', '.join(permissions)}.
Do not attempt to access data outside their permission scope."""

    response = await call_llm(
        system=system_prompt,
        messages=[{"role": "user", "content": message}],
        tools=tools,
    )
    return response


def build_scoped_tools(tenant_id: str, user_id: str,
                       permissions: list[str]) -> list:
    tools = []
    if "invoices:read" in permissions:
        tools.append(InvoiceLookupTool(tenant_id=tenant_id))
    if "analytics:read" in permissions:
        tools.append(AnalyticsQueryTool(tenant_id=tenant_id))
    if "users:read" in permissions:
        tools.append(UserDirectoryTool(tenant_id=tenant_id))
    return tools
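InvoiceLookupTool and its siblings are referenced above but not defined here. As a sketch of the shape such a tool might take, the key property is that tenant_id is fixed at construction time, so no model-generated argument can widen the query (the in-memory store is illustrative; in production this is a scoped database query):

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative in-memory store standing in for a real database.
INVOICES = [
    {"id": "inv_1", "tenant_id": "t1", "total": 120.0},
    {"id": "inv_2", "tenant_id": "t2", "total": 300.0},
]

@dataclass
class InvoiceLookupTool:
    tenant_id: str  # fixed at construction from the authenticated session

    def run(self, invoice_id: str) -> Optional[dict]:
        # The tenant filter is applied here, never by the model.
        for inv in INVOICES:
            if inv["tenant_id"] == self.tenant_id and inv["id"] == invoice_id:
                return inv
        return None
```

Because the LLM only ever supplies invoice_id, a prompt-injected request for another tenant's invoice simply returns nothing.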

Conversation Management

Store conversations so users can return to previous threads. Use a simple schema with tenant isolation built in.

# SQLAlchemy models for chat history
from datetime import datetime
import uuid

from sqlalchemy import Column, DateTime, ForeignKey, String, Text
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class ChatConversation(Base):
    __tablename__ = "chat_conversations"
    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    tenant_id = Column(UUID(as_uuid=True), nullable=False, index=True)
    user_id = Column(UUID(as_uuid=True), ForeignKey("users.id"), nullable=False)
    title = Column(String(255))
    created_at = Column(DateTime, default=datetime.utcnow)

class ChatMessage(Base):
    __tablename__ = "chat_messages"
    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    conversation_id = Column(UUID(as_uuid=True),
                             ForeignKey("chat_conversations.id"), nullable=False, index=True)
    role = Column(String(20), nullable=False)
    content = Column(Text, nullable=False)
    created_at = Column(DateTime, default=datetime.utcnow)
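For the conversation list to be scannable, the title column can be filled from the first user message at write time. A small helper, where the 60-character cap is an arbitrary choice:

```python
def derive_title(first_message: str, max_len: int = 60) -> str:
    """Collapse whitespace and truncate the first message into a list title."""
    title = " ".join(first_message.split())
    if len(title) <= max_len:
        return title
    # Cut at a word boundary and mark the truncation.
    return title[:max_len].rsplit(" ", 1)[0] + "…"
```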

FAQ

How do I prevent the AI from leaking data between tenants?

Every database query and tool invocation must be scoped by tenant_id. Pass the tenant ID from the authenticated session into every tool constructor, and add it as a mandatory WHERE clause. Never rely on the LLM to filter data — enforce it at the data access layer.
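Concretely, that means the tenant filter lives inside a shared data-access helper rather than at call sites, so a missing filter is impossible rather than merely discouraged. A stdlib sqlite3 sketch of the pattern (table and columns are illustrative):

```python
import sqlite3

def scoped_query(conn, tenant_id: str, invoice_id: str):
    """Fetch an invoice; the tenant clause is appended here, not by callers."""
    return conn.execute(
        "SELECT id, total FROM invoices WHERE tenant_id = ? AND id = ?",
        (tenant_id, invoice_id),
    ).fetchone()

# Seed an in-memory database for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (id TEXT, tenant_id TEXT, total REAL)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?)",
    [("inv_1", "t1", 120.0), ("inv_2", "t2", 300.0)],
)
```

A lookup for another tenant's invoice ID returns nothing, no matter what the model asked for.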

Should I use WebSockets or HTTP streaming for chat?

WebSockets are better for bidirectional, long-lived conversations where the server might push updates (typing indicators, tool progress). HTTP streaming with Server-Sent Events works well if your infrastructure does not support WebSocket scaling. For most SaaS products, WebSockets provide the best user experience.

How do I handle rate limiting for the AI chat?

Implement rate limiting at two levels: per-user message rate (e.g., 20 messages per minute) and per-tenant token budget (e.g., 100,000 tokens per day). Track usage in Redis with sliding window counters and return clear error messages when limits are hit.
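In Redis this is typically a sorted set per user: ZADD a timestamp for each message, ZREMRANGEBYSCORE to expire entries older than the window, ZCARD to count what remains. The same sliding-window logic, shown in-memory for illustration:

```python
import time
from collections import defaultdict, deque
from typing import Optional

class SlidingWindowLimiter:
    """Allow at most `limit` events per `window` seconds, per key."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.events: dict[str, deque] = defaultdict(deque)

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        q = self.events[key]
        # Drop timestamps that have slid out of the window.
        while q and q[0] <= now - self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True
```

The same class works for the per-tenant budget by keying on tenant_id instead of user_id; for token budgets, store token counts alongside timestamps and sum them rather than counting entries.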


#AIChat #SaaS #WidgetArchitecture #ContextInjection #Python #TypeScript #AgenticAI #LearnAI #AIEngineering


Written by

CallSphere Team

Expert insights on AI voice agents and customer communication automation.
