Building an AI Agent SaaS Platform: Architecture Patterns

Design and build a multi-tenant AI agent SaaS platform with user isolation, API key management, token metering, billing integration, and scalable infrastructure using the OpenAI Agents SDK.

The AI Agent Platform Opportunity

As agentic AI matures, many teams are building platforms that let end users create and run AI agents without writing code. These platforms face unique architectural challenges: multi-tenancy, usage-based billing, token metering, agent isolation, and the need to support hundreds of concurrent agent runs without one tenant's workload degrading another's.

This post covers the architecture patterns for building a production AI agent SaaS platform on the OpenAI Agents SDK.

System Architecture Overview

A multi-tenant agent platform has five core layers:

  1. API Gateway — Authentication, rate limiting, request routing
  2. Tenant Management — User accounts, API keys, configuration
  3. Agent Runtime — Executes agent workflows with tenant isolation
  4. Metering and Billing — Tracks token usage, enforces limits, bills customers
  5. Storage — Agent definitions, conversation history, tool configurations
Client -> API Gateway -> Agent Runtime -> OpenAI API
                |              |
          Tenant DB      Metering Service
                |              |
          Agent Store    Billing System

Database Schema

The core schema captures tenants, agents, API keys, and usage:

# models.py
from sqlalchemy import Column, String, Integer, Float, Boolean, DateTime, ForeignKey, Text, JSON
from sqlalchemy.orm import relationship, DeclarativeBase
from sqlalchemy.dialects.postgresql import UUID
import uuid
from datetime import datetime


class Base(DeclarativeBase):
    pass


class Tenant(Base):
    __tablename__ = "tenants"

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    name = Column(String(255), nullable=False)
    email = Column(String(255), unique=True, nullable=False)
    plan = Column(String(50), default="free")  # free, pro, enterprise
    monthly_token_limit = Column(Integer, default=1_000_000)
    is_active = Column(Boolean, default=True)
    created_at = Column(DateTime, default=datetime.utcnow)

    api_keys = relationship("APIKey", back_populates="tenant")
    agents = relationship("AgentConfig", back_populates="tenant")
    usage_records = relationship("UsageRecord", back_populates="tenant")


class APIKey(Base):
    __tablename__ = "api_keys"

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    tenant_id = Column(UUID(as_uuid=True), ForeignKey("tenants.id"), nullable=False)
    key_hash = Column(String(64), unique=True, nullable=False)
    key_prefix = Column(String(8), nullable=False)  # First 8 chars for identification
    name = Column(String(255), nullable=False)
    is_active = Column(Boolean, default=True)
    last_used_at = Column(DateTime, nullable=True)
    created_at = Column(DateTime, default=datetime.utcnow)

    tenant = relationship("Tenant", back_populates="api_keys")


class AgentConfig(Base):
    __tablename__ = "agent_configs"

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    tenant_id = Column(UUID(as_uuid=True), ForeignKey("tenants.id"), nullable=False)
    name = Column(String(255), nullable=False)
    model = Column(String(50), default="gpt-4.1")
    instructions = Column(Text, nullable=False)
    # Use a callable default so each row gets its own fresh list.
    tools = Column(JSON, default=list)
    handoffs = Column(JSON, default=list)
    is_published = Column(Boolean, default=False)
    created_at = Column(DateTime, default=datetime.utcnow)
    updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow)

    tenant = relationship("Tenant", back_populates="agents")


class UsageRecord(Base):
    __tablename__ = "usage_records"

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    tenant_id = Column(UUID(as_uuid=True), ForeignKey("tenants.id"), nullable=False)
    agent_id = Column(UUID(as_uuid=True), ForeignKey("agent_configs.id"), nullable=True)
    model = Column(String(50), nullable=False)
    input_tokens = Column(Integer, default=0)
    output_tokens = Column(Integer, default=0)
    total_tokens = Column(Integer, default=0)
    cost_usd = Column(Float, default=0.0)
    created_at = Column(DateTime, default=datetime.utcnow)

    tenant = relationship("Tenant", back_populates="usage_records")

API Key Authentication

Authenticate tenants using hashed API keys:

# auth.py
from datetime import datetime
import hashlib

from fastapi import Security, HTTPException
from fastapi.security import APIKeyHeader
from sqlalchemy import select
from sqlalchemy.orm import selectinload
from models import APIKey, Tenant
from database import get_session

api_key_header = APIKeyHeader(name="X-API-Key")


def hash_api_key(key: str) -> str:
    return hashlib.sha256(key.encode()).hexdigest()


async def get_current_tenant(api_key: str = Security(api_key_header)) -> Tenant:
    """Authenticate and return the tenant for the given API key."""
    key_hash = hash_api_key(api_key)

    async with get_session() as session:
        result = await session.execute(
            select(APIKey)
            .where(APIKey.key_hash == key_hash, APIKey.is_active.is_(True))
            .options(selectinload(APIKey.tenant))
        )
        api_key_record = result.scalar_one_or_none()

        if not api_key_record:
            raise HTTPException(status_code=401, detail="Invalid API key")

        tenant = api_key_record.tenant
        if not tenant.is_active:
            raise HTTPException(status_code=403, detail="Account suspended")

        # Update last used timestamp
        api_key_record.last_used_at = datetime.utcnow()
        await session.commit()

        return tenant
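
The lookup above assumes a key was hashed and stored when it was issued. A minimal sketch of that issuing step follows, producing the `key_hash` and `key_prefix` values the schema expects. The function name `generate_api_key` and the `sk_` prefix are illustrative assumptions, not part of the original code.

```python
import hashlib
import secrets


def generate_api_key() -> tuple[str, str, str]:
    """Return (plaintext_key, key_prefix, key_hash) for a newly issued key.

    Only key_prefix and key_hash should be persisted; the plaintext is shown
    to the tenant once and then discarded.
    """
    # "sk_" is a hypothetical marker that makes keys easy to spot in logs.
    plaintext = "sk_" + secrets.token_urlsafe(32)
    key_prefix = plaintext[:8]  # matches the String(8) key_prefix column
    key_hash = hashlib.sha256(plaintext.encode()).hexdigest()  # 64 hex chars
    return plaintext, key_prefix, key_hash


if __name__ == "__main__":
    plaintext, prefix, digest = generate_api_key()
    print(prefix, len(digest))
```

A plain SHA-256 digest is generally considered adequate here because the key is long and random; it would not be a suitable choice for hashing user-chosen passwords.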

Token Metering Service

Track every token consumed by every tenant in real time:

# metering.py
from datetime import datetime
from sqlalchemy import select, func
from models import UsageRecord, Tenant
from database import get_session

# Illustrative prices in USD per 1M tokens; check the provider's current price list.
MODEL_PRICING = {
    "gpt-5": {"input": 10.00, "output": 30.00},
    "gpt-4.1": {"input": 2.00, "output": 8.00},
    "gpt-4.1-mini": {"input": 0.40, "output": 1.60},
}


class MeteringService:
    async def record_usage(
        self,
        tenant_id: str,
        agent_id: str | None,
        model: str,
        input_tokens: int,
        output_tokens: int,
    ) -> UsageRecord:
        """Record token usage for a tenant."""
        pricing = MODEL_PRICING.get(model, MODEL_PRICING["gpt-4.1"])
        cost = (
            (input_tokens / 1_000_000) * pricing["input"] +
            (output_tokens / 1_000_000) * pricing["output"]
        )

        async with get_session() as session:
            record = UsageRecord(
                tenant_id=tenant_id,
                agent_id=agent_id,
                model=model,
                input_tokens=input_tokens,
                output_tokens=output_tokens,
                total_tokens=input_tokens + output_tokens,
                cost_usd=cost,
            )
            session.add(record)
            await session.commit()
            return record

    async def get_monthly_usage(self, tenant_id: str) -> dict:
        """Get the tenant's usage for the current billing period."""
        # Zero out microseconds too, so the period starts exactly at midnight UTC.
        month_start = datetime.utcnow().replace(
            day=1, hour=0, minute=0, second=0, microsecond=0
        )

        async with get_session() as session:
            result = await session.execute(
                select(
                    func.sum(UsageRecord.total_tokens).label("total_tokens"),
                    func.sum(UsageRecord.cost_usd).label("total_cost"),
                    func.count(UsageRecord.id).label("request_count"),
                )
                .where(
                    UsageRecord.tenant_id == tenant_id,
                    UsageRecord.created_at >= month_start,
                )
            )
            row = result.one()
            return {
                "total_tokens": row.total_tokens or 0,
                "total_cost": round(row.total_cost or 0.0, 4),
                "request_count": row.request_count or 0,
            }

    async def check_quota(self, tenant_id: str) -> bool:
        """Check if the tenant has remaining quota."""
        async with get_session() as session:
            tenant = await session.get(Tenant, tenant_id)
            if tenant is None:
                # Unknown tenant: deny instead of failing with AttributeError.
                return False
            usage = await self.get_monthly_usage(tenant_id)
            return usage["total_tokens"] < tenant.monthly_token_limit
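
To make the cost formula in `record_usage` concrete, here is a worked example using the gpt-4.1 prices from the table above. It is a sketch that swaps floats for `Decimal`, a common choice for money because it avoids binary rounding drift when many small records are summed. The function name `estimate_cost` is hypothetical, and the original schema stores `cost_usd` as a `Float`.

```python
from decimal import Decimal

# Prices in USD per 1M tokens, copied from the gpt-4.1 entry in MODEL_PRICING.
PRICING_PER_1M = {
    "gpt-4.1": {"input": Decimal("2.00"), "output": Decimal("8.00")},
}

ONE_MILLION = Decimal(1_000_000)


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Cost of one run: tokens / 1M * price, summed over input and output."""
    pricing = PRICING_PER_1M[model]  # raises KeyError for unknown models
    input_cost = Decimal(input_tokens) * pricing["input"] / ONE_MILLION
    output_cost = Decimal(output_tokens) * pricing["output"] / ONE_MILLION
    return input_cost + output_cost


# 120,000 input tokens  -> 0.12 * 2.00 = 0.24
#  30,000 output tokens -> 0.03 * 8.00 = 0.24
cost = estimate_cost("gpt-4.1", 120_000, 30_000)
print(cost)  # 0.48
```

Unlike `record_usage`, this sketch raises on an unknown model instead of silently falling back to gpt-4.1 pricing, which makes a missing price entry visible early.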

Tenant-Isolated Agent Runtime

Each agent run must be isolated to its tenant. The runtime builds agents dynamically from the tenant's configuration:

# runtime.py
from agents import Agent, Runner, function_tool
from models import AgentConfig, Tenant
from metering import MeteringService

metering = MeteringService()


class TenantAgentRuntime:
    """Runs agents in a tenant-isolated context."""

    def __init__(self, tenant: Tenant):
        self.tenant = tenant

    def build_agent(self, config: AgentConfig) -> Agent:
        """Build an SDK Agent from a tenant's agent configuration."""
        tools = self._build_tools(config.tools)

        return Agent(
            name=config.name,
            model=config.model,
            instructions=config.instructions,
            tools=tools,
        )

    async def run(self, config: AgentConfig, user_input: str) -> dict:
        """Execute an agent run with metering and quota enforcement."""
        # Check quota before running
        has_quota = await metering.check_quota(str(self.tenant.id))
        if not has_quota:
            return {"error": "Monthly token quota exceeded", "status": 429}

        agent = self.build_agent(config)

        result = await Runner.run(agent, input=user_input)

        # Record token usage
        total_input = 0
        total_output = 0
        for response in result.raw_responses:
            if response.usage:
                total_input += response.usage.input_tokens
                total_output += response.usage.output_tokens

        await metering.record_usage(
            tenant_id=str(self.tenant.id),
            agent_id=str(config.id),
            model=config.model,
            input_tokens=total_input,
            output_tokens=total_output,
        )

        return {
            "response": result.final_output,
            "tokens_used": total_input + total_output,
            "model": config.model,
        }

    def _build_tools(self, tool_configs: list[dict]) -> list:
        """Build tool functions from configuration."""
        tools = []
        for tool_config in tool_configs:
            if tool_config["type"] == "webhook":
                tools.append(self._create_webhook_tool(tool_config))
        return tools

    def _create_webhook_tool(self, config: dict):
        """Create a function tool that calls a tenant's webhook."""
        import httpx

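        # NOTE: function_tool derives the tool's parameter schema from the function
        # signature, so a bare **kwargs may not give the model a useful schema.
        # A production version would likely define parameters from a JSON schema
        # stored in the tenant's tool config; verify against the SDK docs.
        @function_tool(name_override=config["name"], description_override=config["description"])
        async def webhook_tool(**kwargs) -> str: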
            async with httpx.AsyncClient(timeout=30.0) as client:
                resp = await client.post(
                    config["url"],
                    json=kwargs,
                    headers={"Authorization": f"Bearer {config.get('token', '')}"},
                )
                return resp.text

        return webhook_tool
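
The runtime above enforces a monthly quota but does not cap how many runs a tenant can have in flight at once, which is the noisy-neighbor problem raised in the introduction. One possible approach is a per-tenant semaphore wrapped around each run. This is a minimal single-process sketch; `TenantConcurrencyLimiter`, `fake_agent_run`, and `demo` are hypothetical names, and the fake run stands in for `Runner.run`.

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class TenantConcurrencyLimiter:
    """Caps the number of simultaneous agent runs per tenant."""

    def __init__(self, max_concurrent_runs: int) -> None:
        # One semaphore per tenant, created lazily on first use.
        self._semaphores: defaultdict[str, asyncio.Semaphore] = defaultdict(
            lambda: asyncio.Semaphore(max_concurrent_runs)
        )

    async def run(self, tenant_id: str, job: Callable[[], Awaitable[T]]) -> T:
        # Extra runs for the same tenant wait here; other tenants are unaffected.
        async with self._semaphores[tenant_id]:
            return await job()


async def demo() -> dict[str, int]:
    """Submit 5 runs for tenant 'a' and 3 for 'b'; return peak concurrency seen."""
    limiter = TenantConcurrencyLimiter(max_concurrent_runs=2)
    active: defaultdict[str, int] = defaultdict(int)
    peak: defaultdict[str, int] = defaultdict(int)

    async def fake_agent_run(tenant_id: str) -> str:
        active[tenant_id] += 1
        peak[tenant_id] = max(peak[tenant_id], active[tenant_id])
        await asyncio.sleep(0.01)  # stands in for the model call
        active[tenant_id] -= 1
        return tenant_id

    jobs = [
        limiter.run(t, lambda t=t: fake_agent_run(t))
        for t in ["a"] * 5 + ["b"] * 3
    ]
    await asyncio.gather(*jobs)
    return dict(peak)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With a limit of 2, neither tenant ever has more than two runs active, regardless of how many were submitted. Across multiple worker processes the same idea would need a shared counter or a queue per tenant.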

API Endpoints

The platform exposes a clean REST API for tenants to manage and run their agents:

# routes.py
from uuid import UUID

from fastapi import APIRouter, Depends, HTTPException
from pydantic import BaseModel
from auth import get_current_tenant
from runtime import TenantAgentRuntime
from models import Tenant, AgentConfig
from database import get_session

router = APIRouter(prefix="/v1")


class RunRequest(BaseModel):
    # Typed as UUID so malformed IDs are rejected with a 422 before any DB lookup.
    agent_id: UUID
    message: str


class RunResponse(BaseModel):
    response: str
    tokens_used: int
    model: str


@router.post("/run", response_model=RunResponse)
async def run_agent(request: RunRequest, tenant: Tenant = Depends(get_current_tenant)):
    async with get_session() as session:
        config = await session.get(AgentConfig, request.agent_id)
        if not config or str(config.tenant_id) != str(tenant.id):
            raise HTTPException(status_code=404, detail="Agent not found")

    runtime = TenantAgentRuntime(tenant)
    result = await runtime.run(config, request.message)

    if "error" in result:
        raise HTTPException(status_code=result["status"], detail=result["error"])

    return RunResponse(**result)


@router.get("/usage")
async def get_usage(tenant: Tenant = Depends(get_current_tenant)):
    from metering import MeteringService
    metering = MeteringService()
    usage = await metering.get_monthly_usage(str(tenant.id))
    return {
        "monthly_usage": usage,
        "monthly_limit": tenant.monthly_token_limit,
        "plan": tenant.plan,
    }

Billing Integration

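Connect metering data to a billing system such as Stripe for usage-based pricing. The code below computes the invoice amounts from metered usage; submitting them to Stripe is not shown: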

# billing.py
import stripe
from metering import MeteringService

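# Placeholder only: load the real secret from the environment or a secrets
# manager, never from source control.
stripe.api_key = "sk_..."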

PLAN_PRICING = {
    "free": {"base_price": 0, "included_tokens": 1_000_000, "overage_per_1m": 0},
    "pro": {"base_price": 49, "included_tokens": 10_000_000, "overage_per_1m": 5.00},
    "enterprise": {"base_price": 299, "included_tokens": 100_000_000, "overage_per_1m": 3.00},
}


async def calculate_invoice(tenant_id: str, plan: str) -> dict:
    metering = MeteringService()
    usage = await metering.get_monthly_usage(tenant_id)
    pricing = PLAN_PRICING[plan]

    base = pricing["base_price"]
    overage_tokens = max(0, usage["total_tokens"] - pricing["included_tokens"])
    overage_cost = (overage_tokens / 1_000_000) * pricing["overage_per_1m"]

    return {
        "base_price": base,
        "included_tokens": pricing["included_tokens"],
        "tokens_used": usage["total_tokens"],
        "overage_tokens": overage_tokens,
        "overage_cost": round(overage_cost, 2),
        "total": round(base + overage_cost, 2),
    }
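
A worked example makes the overage arithmetic easier to check. The sketch below repeats the calculation from `calculate_invoice` as a pure function with no database access; `invoice_total` is a hypothetical helper, and the numbers come from the pro plan in `PLAN_PRICING`.

```python
def invoice_total(
    base_price: float,
    included_tokens: int,
    overage_per_1m: float,
    tokens_used: int,
) -> float:
    """Base price plus overage, charged per 1M tokens beyond the included amount."""
    overage_tokens = max(0, tokens_used - included_tokens)
    overage_cost = (overage_tokens / 1_000_000) * overage_per_1m
    return round(base_price + overage_cost, 2)


# Pro plan: $49 base, 10M tokens included, $5.00 per extra 1M tokens.
# 12.5M used -> 2.5M overage -> 2.5 * 5.00 = 12.50 -> total 61.50
print(invoice_total(49, 10_000_000, 5.00, 12_500_000))  # 61.5

# Under the included amount, only the base price applies.
print(invoice_total(49, 10_000_000, 5.00, 8_000_000))  # 49.0
```

Note that the free plan has an overage rate of 0, so this formula never bills free tenants for extra usage; keeping them within their limit depends entirely on the quota check in the runtime.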

Key Takeaways

Building an AI agent SaaS platform requires careful attention to isolation, metering, and scalability. The patterns above (hashed API keys, per-tenant agent runtimes, real-time token metering, and usage-based billing) provide a solid foundation. Start with a single-tenant deployment to validate your agent framework, then add multi-tenancy once the core agent logic is proven.

Written by

CallSphere Team