
Adding AI Chat to Your SaaS Product: Architecture and Implementation Guide

Learn how to embed an AI chat widget into your SaaS application with proper backend integration, context injection, permission scoping, and conversation management.

Why AI Chat Belongs Inside Your Product

Adding AI chat to a SaaS product is not the same as dropping a third-party chatbot on your marketing site. Product-embedded AI chat needs access to the user's data, must respect their permissions, and should understand the current application context. A customer viewing an invoice should be able to ask "Why is this total different from last month?" and get a real, data-backed answer — not a generic FAQ response.

This guide covers the architecture for building an AI chat system that lives inside your SaaS application as a first-class feature.

Architecture Overview

The system has four layers: the frontend widget, a WebSocket gateway, an AI orchestration service, and your existing product APIs.

The gateway authenticates the client on connect, then attaches the current page context to every message before handing it to the orchestration layer.

# Backend: FastAPI WebSocket endpoint for AI chat
from typing import Optional

from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

class ChatContext:
    """Captures the user's current product context."""
    def __init__(self, user_id: str, tenant_id: str, current_page: str,
                 entity_type: Optional[str] = None,
                 entity_id: Optional[str] = None):
        self.user_id = user_id
        self.tenant_id = tenant_id
        self.current_page = current_page
        self.entity_type = entity_type
        self.entity_id = entity_id

    def to_system_prompt(self) -> str:
        context = f"User is on page: {self.current_page}."
        if self.entity_type and self.entity_id:
            context += f" They are viewing {self.entity_type} with ID {self.entity_id}."
        return context


@app.websocket("/ws/chat")
async def chat_endpoint(websocket: WebSocket):
    await websocket.accept()
    # Authenticate from the token carried in the first message
    auth_msg = await websocket.receive_json()
    user = await authenticate_ws_token(auth_msg["token"])
    if not user:
        await websocket.close(code=4001)  # application-defined: bad credentials
        return

    try:
        while True:
            data = await websocket.receive_json()
            context = ChatContext(
                user_id=user.id,
                tenant_id=user.tenant_id,
                current_page=data.get("page", "/"),
                entity_type=data.get("entity_type"),
                entity_id=data.get("entity_id"),
            )
            response = await generate_ai_response(
                message=data["message"],
                context=context,
                permissions=user.permissions,
            )
            await websocket.send_json({"reply": response})
    except WebSocketDisconnect:
        pass  # client closed the socket; nothing to clean up here
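The authenticate_ws_token helper is assumed above. One minimal way to implement the verification half is an HMAC-signed payload; this sketch is stdlib-only and synchronous, whereas the real function would be async and would also load the user record:

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

SECRET = b"replace-with-server-secret"  # illustrative only, load from config

def sign_token(payload: dict) -> str:
    """Serialize and HMAC-sign a payload into a compact token."""
    body = base64.urlsafe_b64encode(json.dumps(payload).encode())
    sig = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return f"{body.decode()}.{sig}"

def verify_ws_token(token: str) -> Optional[dict]:
    """Return the payload if the signature matches, else None."""
    try:
        body, sig = token.rsplit(".", 1)
    except ValueError:
        return None
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None
    return json.loads(base64.urlsafe_b64decode(body))
```

In production you would more likely use a short-lived JWT minted by your existing auth flow; the point is that the socket is authenticated before any chat message is processed.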

Frontend Widget Design

The chat widget mounts as a floating component that tracks the user's current route and sends page context with every message.

// React chat widget that sends page context
import { useEffect, useRef, useState } from "react";
import { usePathname } from "next/navigation";

interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

export function AIChatWidget({ authToken }: { authToken: string }) {
  const [messages, setMessages] = useState<ChatMessage[]>([]);
  const [input, setInput] = useState("");
  const wsRef = useRef<WebSocket | null>(null);
  const pathname = usePathname();

  useEffect(() => {
    const ws = new WebSocket(`wss://api.example.com/ws/chat`);
    ws.onopen = () => ws.send(JSON.stringify({ token: authToken }));
    ws.onmessage = (event) => {
      const data = JSON.parse(event.data);
      setMessages((prev) => [...prev, { role: "assistant", content: data.reply }]);
    };
    wsRef.current = ws;
    return () => ws.close();
  }, [authToken]);

  // Derive entity context from the route, assuming paths like /invoices/inv_123
  const extractEntityType = (path: string): string | null =>
    path.split("/").filter(Boolean)[0] ?? null;
  const extractEntityId = (path: string): string | null =>
    path.split("/").filter(Boolean)[1] ?? null;

  const sendMessage = () => {
    if (!input.trim() || wsRef.current?.readyState !== WebSocket.OPEN) return;
    const payload = {
      message: input,
      page: pathname,
      entity_type: extractEntityType(pathname),
      entity_id: extractEntityId(pathname),
    };
    wsRef.current.send(JSON.stringify(payload));
    setMessages((prev) => [...prev, { role: "user", content: input }]);
    setInput("");
  };

  return (
    <div className="fixed bottom-4 right-4 w-96 bg-white shadow-xl rounded-lg">
      <div className="h-80 overflow-y-auto p-4">
        {messages.map((msg, i) => (
          <div key={i} className={msg.role === "user" ? "text-right" : "text-left"}>
            <p className="inline-block p-2 rounded-lg bg-gray-100">{msg.content}</p>
          </div>
        ))}
      </div>
      <div className="flex p-2 border-t">
        <input value={input} onChange={(e) => setInput(e.target.value)}
          className="flex-1 border rounded-l px-3" placeholder="Ask anything..." />
        <button onClick={sendMessage} className="bg-blue-600 text-white px-4 rounded-r">
          Send
        </button>
      </div>
    </div>
  );
}

Permission-Scoped Data Access

The AI must never return data the user is not authorized to see. Inject the user's permission set into the tool layer so every data fetch is scoped.


async def generate_ai_response(message: str, context: ChatContext,
                                permissions: list[str]) -> str:
    tools = build_scoped_tools(context.tenant_id, context.user_id, permissions)

    system_prompt = f"""You are a helpful assistant inside our SaaS product.
{context.to_system_prompt()}
Only use the provided tools to fetch data. Never fabricate data.
The user has these permissions: {', '.join(permissions)}.
Do not attempt to access data outside their permission scope."""

    response = await call_llm(
        system=system_prompt,
        messages=[{"role": "user", "content": message}],
        tools=tools,
    )
    return response


def build_scoped_tools(tenant_id: str, user_id: str,
                       permissions: list[str]) -> list:
    tools = []
    if "invoices:read" in permissions:
        tools.append(InvoiceLookupTool(tenant_id=tenant_id))
    if "analytics:read" in permissions:
        tools.append(AnalyticsQueryTool(tenant_id=tenant_id))
    if "users:read" in permissions:
        tools.append(UserDirectoryTool(tenant_id=tenant_id))
    return tools
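InvoiceLookupTool and its siblings are referenced above but not defined here. As a sketch of the shape such a tool might take, the key property is that tenant_id is fixed at construction time, so no model-generated argument can widen the query (the in-memory store is illustrative; in production this is a scoped database query):

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative in-memory store standing in for a real database.
INVOICES = [
    {"id": "inv_1", "tenant_id": "t1", "total": 120.0},
    {"id": "inv_2", "tenant_id": "t2", "total": 300.0},
]

@dataclass
class InvoiceLookupTool:
    tenant_id: str  # fixed at construction from the authenticated session

    def run(self, invoice_id: str) -> Optional[dict]:
        # The tenant filter is applied here, never by the model.
        for inv in INVOICES:
            if inv["tenant_id"] == self.tenant_id and inv["id"] == invoice_id:
                return inv
        return None
```

Because the LLM only ever supplies invoice_id, a prompt-injected request for another tenant's invoice simply returns nothing.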

Conversation Management

Store conversations so users can return to previous threads. Use a simple schema with tenant isolation built in.

# SQLAlchemy models for chat history
from datetime import datetime
import uuid

from sqlalchemy import Column, DateTime, ForeignKey, String, Text
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class ChatConversation(Base):
    __tablename__ = "chat_conversations"
    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    tenant_id = Column(UUID(as_uuid=True), nullable=False, index=True)
    user_id = Column(UUID(as_uuid=True), ForeignKey("users.id"), nullable=False)
    title = Column(String(255))
    created_at = Column(DateTime, default=datetime.utcnow)

class ChatMessage(Base):
    __tablename__ = "chat_messages"
    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    conversation_id = Column(UUID(as_uuid=True),
                             ForeignKey("chat_conversations.id"), nullable=False, index=True)
    role = Column(String(20), nullable=False)
    content = Column(Text, nullable=False)
    created_at = Column(DateTime, default=datetime.utcnow)
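For the conversation list to be scannable, the title column can be filled from the first user message at write time. A small helper, where the 60-character cap is an arbitrary choice:

```python
def derive_title(first_message: str, max_len: int = 60) -> str:
    """Collapse whitespace and truncate the first message into a list title."""
    title = " ".join(first_message.split())
    if len(title) <= max_len:
        return title
    # Cut at a word boundary and mark the truncation.
    return title[:max_len].rsplit(" ", 1)[0] + "…"
```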

FAQ

How do I prevent the AI from leaking data between tenants?

Every database query and tool invocation must be scoped by tenant_id. Pass the tenant ID from the authenticated session into every tool constructor, and add it as a mandatory WHERE clause. Never rely on the LLM to filter data — enforce it at the data access layer.
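Concretely, that means the tenant filter lives inside a shared data-access helper rather than at call sites, so a missing filter is impossible rather than merely discouraged. A stdlib sqlite3 sketch of the pattern (table and columns are illustrative):

```python
import sqlite3

def scoped_query(conn, tenant_id: str, invoice_id: str):
    """Fetch an invoice; the tenant clause is appended here, not by callers."""
    return conn.execute(
        "SELECT id, total FROM invoices WHERE tenant_id = ? AND id = ?",
        (tenant_id, invoice_id),
    ).fetchone()

# Seed an in-memory database for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (id TEXT, tenant_id TEXT, total REAL)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?)",
    [("inv_1", "t1", 120.0), ("inv_2", "t2", 300.0)],
)
```

A lookup for another tenant's invoice ID returns nothing, no matter what the model asked for.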

Should I use WebSockets or HTTP streaming for chat?

WebSockets are better for bidirectional, long-lived conversations where the server might push updates (typing indicators, tool progress). HTTP streaming with Server-Sent Events works well if your infrastructure does not support WebSocket scaling. For most SaaS products, WebSockets provide the best user experience.

How do I handle rate limiting for the AI chat?

Implement rate limiting at two levels: per-user message rate (e.g., 20 messages per minute) and per-tenant token budget (e.g., 100,000 tokens per day). Track usage in Redis with sliding window counters and return clear error messages when limits are hit.
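In Redis this is typically a sorted set per user: ZADD a timestamp for each message, ZREMRANGEBYSCORE to expire entries older than the window, ZCARD to count what remains. The same sliding-window logic, shown in-memory for illustration:

```python
import time
from collections import defaultdict, deque
from typing import Optional

class SlidingWindowLimiter:
    """Allow at most `limit` events per `window` seconds, per key."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.events: dict[str, deque] = defaultdict(deque)

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        q = self.events[key]
        # Drop timestamps that have slid out of the window.
        while q and q[0] <= now - self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True
```

The same class works for the per-tenant budget by keying on tenant_id instead of user_id; for token budgets, store token counts alongside timestamps and sum them rather than counting entries.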


#AIChat #SaaS #WidgetArchitecture #ContextInjection #Python #TypeScript #AgenticAI #LearnAI #AIEngineering


Written by

CallSphere Team

Expert insights on AI voice agents and customer communication automation.
