Building an AI Agent SaaS Platform: Architecture Patterns
Design and build a multi-tenant AI agent SaaS platform with user isolation, API key management, token metering, billing integration, and scalable infrastructure using the OpenAI Agents SDK.
The AI Agent Platform Opportunity
As agentic AI matures, many teams are building platforms that let end users create and run AI agents without writing code. These platforms face unique architectural challenges: multi-tenancy, usage-based billing, token metering, agent isolation, and the need to support hundreds of concurrent agent runs without one tenant's workload degrading another's.
This post covers the architecture patterns for building a production AI agent SaaS platform on the OpenAI Agents SDK.
System Architecture Overview
A multi-tenant agent platform has five core layers:
- API Gateway — Authentication, rate limiting, request routing
- Tenant Management — User accounts, API keys, configuration
- Agent Runtime — Executes agent workflows with tenant isolation
- Metering and Billing — Tracks token usage, enforces limits, bills customers
- Storage — Agent definitions, conversation history, tool configurations
Client -> API Gateway -> Agent Runtime -> OpenAI API
               |               |
           Tenant DB   Metering Service
               |               |
          Agent Store   Billing System
Database Schema
The core schema captures tenants, agents, API keys, and usage:
# models.py
import uuid
from datetime import datetime

from sqlalchemy import Column, String, Integer, Float, Boolean, DateTime, ForeignKey, Text, JSON
from sqlalchemy.orm import relationship, DeclarativeBase
from sqlalchemy.dialects.postgresql import UUID


class Base(DeclarativeBase):
    pass


class Tenant(Base):
    __tablename__ = "tenants"

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    name = Column(String(255), nullable=False)
    email = Column(String(255), unique=True, nullable=False)
    plan = Column(String(50), default="free")  # free, pro, enterprise
    monthly_token_limit = Column(Integer, default=1_000_000)
    is_active = Column(Boolean, default=True)
    created_at = Column(DateTime, default=datetime.utcnow)

    api_keys = relationship("APIKey", back_populates="tenant")
    agents = relationship("AgentConfig", back_populates="tenant")
    usage_records = relationship("UsageRecord", back_populates="tenant")


class APIKey(Base):
    __tablename__ = "api_keys"

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    tenant_id = Column(UUID(as_uuid=True), ForeignKey("tenants.id"), nullable=False)
    key_hash = Column(String(64), unique=True, nullable=False)  # SHA-256 hex digest
    key_prefix = Column(String(8), nullable=False)  # First 8 chars for identification
    name = Column(String(255), nullable=False)
    is_active = Column(Boolean, default=True)
    last_used_at = Column(DateTime, nullable=True)
    created_at = Column(DateTime, default=datetime.utcnow)

    tenant = relationship("Tenant", back_populates="api_keys")


class AgentConfig(Base):
    __tablename__ = "agent_configs"

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    tenant_id = Column(UUID(as_uuid=True), ForeignKey("tenants.id"), nullable=False)
    name = Column(String(255), nullable=False)
    model = Column(String(50), default="gpt-4.1")
    instructions = Column(Text, nullable=False)
    # Use a callable default so rows never share one mutable list object.
    tools = Column(JSON, default=list)
    handoffs = Column(JSON, default=list)
    is_published = Column(Boolean, default=False)
    created_at = Column(DateTime, default=datetime.utcnow)
    updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow)

    tenant = relationship("Tenant", back_populates="agents")


class UsageRecord(Base):
    __tablename__ = "usage_records"

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    tenant_id = Column(UUID(as_uuid=True), ForeignKey("tenants.id"), nullable=False)
    agent_id = Column(UUID(as_uuid=True), ForeignKey("agent_configs.id"), nullable=True)
    model = Column(String(50), nullable=False)
    input_tokens = Column(Integer, default=0)
    output_tokens = Column(Integer, default=0)
    total_tokens = Column(Integer, default=0)
    cost_usd = Column(Float, default=0.0)
    created_at = Column(DateTime, default=datetime.utcnow)

    tenant = relationship("Tenant", back_populates="usage_records")
API Key Authentication
Authenticate tenants using hashed API keys:
# auth.py
import hashlib
from datetime import datetime

from fastapi import Security, HTTPException
from fastapi.security import APIKeyHeader
from sqlalchemy import select
from sqlalchemy.orm import selectinload

from models import APIKey, Tenant
from database import get_session

api_key_header = APIKeyHeader(name="X-API-Key")


def hash_api_key(key: str) -> str:
    return hashlib.sha256(key.encode()).hexdigest()


async def get_current_tenant(api_key: str = Security(api_key_header)) -> Tenant:
    """Authenticate and return the tenant for the given API key."""
    key_hash = hash_api_key(api_key)

    async with get_session() as session:
        result = await session.execute(
            select(APIKey)
            .where(APIKey.key_hash == key_hash, APIKey.is_active.is_(True))
            # Eager-load the tenant; lazy loading is not available in async sessions.
            .options(selectinload(APIKey.tenant))
        )
        api_key_record = result.scalar_one_or_none()

        if not api_key_record:
            raise HTTPException(status_code=401, detail="Invalid API key")

        tenant = api_key_record.tenant
        if not tenant.is_active:
            raise HTTPException(status_code=403, detail="Account suspended")

        # Update last used timestamp
        api_key_record.last_used_at = datetime.utcnow()
        await session.commit()

    # Assumes the session factory sets expire_on_commit=False, so the
    # tenant's attributes remain readable after the commit above.
    return tenant
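The authentication code looks keys up by hash, which implies the plaintext key is shown to the tenant once at creation and never stored. A minimal sketch of that issuing step follows, assuming a hypothetical `generate_api_key` helper and an illustrative `sk_` prefix, neither of which appears in the platform code above. `hash_api_key` is repeated so the block stands alone.

```python
import hashlib
import secrets


def hash_api_key(key: str) -> str:
    """Same SHA-256 hex digest used at authentication time."""
    return hashlib.sha256(key.encode()).hexdigest()


def generate_api_key(prefix_len: int = 8) -> tuple[str, str, str]:
    """Return (plaintext_key, key_hash, key_prefix).

    Only key_hash and key_prefix would be persisted; the plaintext is
    returned to the tenant once and then discarded.
    """
    # "sk_" is an illustrative prefix, not a convention from the source.
    plaintext = "sk_" + secrets.token_urlsafe(32)
    return plaintext, hash_api_key(plaintext), plaintext[:prefix_len]


plaintext, key_hash, key_prefix = generate_api_key()
print(key_prefix, len(key_hash))
```

The 64-character digest and 8-character prefix line up with the `key_hash` and `key_prefix` column sizes in the schema. A fast hash such as SHA-256 is a reasonable fit here because the key itself carries high entropy from `secrets`, unlike a user-chosen password.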
Token Metering Service
Track every token consumed by every tenant in real time:
# metering.py
from datetime import datetime

from sqlalchemy import select, func

from models import UsageRecord, Tenant
from database import get_session

# Prices in USD per 1M tokens
MODEL_PRICING = {
    "gpt-5": {"input": 10.00, "output": 30.00},
    "gpt-4.1": {"input": 2.00, "output": 8.00},
    "gpt-4.1-mini": {"input": 0.40, "output": 1.60},
}


class MeteringService:
    async def record_usage(
        self,
        tenant_id: str,
        agent_id: str | None,
        model: str,
        input_tokens: int,
        output_tokens: int,
    ) -> UsageRecord:
        """Record token usage for a tenant."""
        # Unknown models fall back to gpt-4.1 pricing.
        pricing = MODEL_PRICING.get(model, MODEL_PRICING["gpt-4.1"])
        cost = (
            (input_tokens / 1_000_000) * pricing["input"]
            + (output_tokens / 1_000_000) * pricing["output"]
        )

        async with get_session() as session:
            record = UsageRecord(
                tenant_id=tenant_id,
                agent_id=agent_id,
                model=model,
                input_tokens=input_tokens,
                output_tokens=output_tokens,
                total_tokens=input_tokens + output_tokens,
                cost_usd=cost,
            )
            session.add(record)
            await session.commit()
            return record

    async def get_monthly_usage(self, tenant_id: str) -> dict:
        """Get the tenant's usage for the current billing period."""
        # Zero out microseconds too, so the boundary is exactly midnight on the 1st.
        month_start = datetime.utcnow().replace(
            day=1, hour=0, minute=0, second=0, microsecond=0
        )

        async with get_session() as session:
            result = await session.execute(
                select(
                    func.sum(UsageRecord.total_tokens).label("total_tokens"),
                    func.sum(UsageRecord.cost_usd).label("total_cost"),
                    func.count(UsageRecord.id).label("request_count"),
                )
                .where(
                    UsageRecord.tenant_id == tenant_id,
                    UsageRecord.created_at >= month_start,
                )
            )
            row = result.one()

        return {
            "total_tokens": row.total_tokens or 0,
            "total_cost": round(row.total_cost or 0.0, 4),
            "request_count": row.request_count or 0,
        }

    async def check_quota(self, tenant_id: str) -> bool:
        """Check if the tenant has remaining quota."""
        async with get_session() as session:
            tenant = await session.get(Tenant, tenant_id)

        if tenant is None:
            return False

        usage = await self.get_monthly_usage(tenant_id)
        return usage["total_tokens"] < tenant.monthly_token_limit
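As a worked example of the cost formula in `record_usage`, the sketch below prices a single run at the gpt-4.1 rates from the pricing table. The `estimate_cost` function and the token counts are illustrative assumptions, not part of the metering service.

```python
# gpt-4.1 rates from the pricing table, in USD per 1M tokens.
PRICING_PER_1M = {"input": 2.00, "output": 8.00}


def estimate_cost(input_tokens: int, output_tokens: int, pricing: dict = PRICING_PER_1M) -> float:
    """Apply the same per-million-token formula used by record_usage."""
    return (
        (input_tokens / 1_000_000) * pricing["input"]
        + (output_tokens / 1_000_000) * pricing["output"]
    )


# A hypothetical run: 12,000 input tokens and 3,000 output tokens.
#   input:  0.012 * 2.00 = 0.024
#   output: 0.003 * 8.00 = 0.024
cost = estimate_cost(12_000, 3_000)
print(round(cost, 4))  # 0.048
```

Output tokens cost four times as much as input tokens at these rates, so a run with a quarter as many output tokens contributes the same amount. Because binary floats cannot represent most decimal fractions exactly, a fixed-precision numeric column may be a safer choice than `Float` once many small costs are summed for invoicing.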
Tenant-Isolated Agent Runtime
Each agent run must be isolated to its tenant. The runtime builds agents dynamically from the tenant's configuration:
# runtime.py
import httpx
from agents import Agent, Runner, function_tool

from models import AgentConfig, Tenant
from metering import MeteringService

metering = MeteringService()


class TenantAgentRuntime:
    """Runs agents in a tenant-isolated context."""

    def __init__(self, tenant: Tenant):
        self.tenant = tenant

    def build_agent(self, config: AgentConfig) -> Agent:
        """Build an SDK Agent from a tenant's agent configuration."""
        tools = self._build_tools(config.tools or [])
        return Agent(
            name=config.name,
            model=config.model,
            instructions=config.instructions,
            tools=tools,
        )

    async def run(self, config: AgentConfig, user_input: str) -> dict:
        """Execute an agent run with metering and quota enforcement."""
        # Check quota before running
        has_quota = await metering.check_quota(str(self.tenant.id))
        if not has_quota:
            return {"error": "Monthly token quota exceeded", "status": 429}

        agent = self.build_agent(config)
        result = await Runner.run(agent, input=user_input)

        # Record token usage, summed over every model call in the run
        total_input = 0
        total_output = 0
        for response in result.raw_responses:
            if response.usage:
                total_input += response.usage.input_tokens
                total_output += response.usage.output_tokens

        await metering.record_usage(
            tenant_id=str(self.tenant.id),
            agent_id=str(config.id),
            model=config.model,
            input_tokens=total_input,
            output_tokens=total_output,
        )

        return {
            "response": result.final_output,
            "tokens_used": total_input + total_output,
            "model": config.model,
        }

    def _build_tools(self, tool_configs: list[dict]) -> list:
        """Build tool functions from configuration."""
        tools = []
        for tool_config in tool_configs:
            if tool_config["type"] == "webhook":
                tools.append(self._create_webhook_tool(tool_config))
        return tools

    def _create_webhook_tool(self, config: dict):
        """Create a function tool that calls a tenant's webhook."""

        # Assumption: the SDK derives the tool's JSON schema from this
        # signature, so a bare **kwargs may need to be replaced with explicit,
        # typed parameters (or an explicit schema) for each tool.
        @function_tool(name_override=config["name"], description_override=config["description"])
        async def webhook_tool(**kwargs) -> str:
            async with httpx.AsyncClient(timeout=30.0) as client:
                resp = await client.post(
                    config["url"],
                    json=kwargs,
                    headers={"Authorization": f"Bearer {config.get('token', '')}"},
                )
                return resp.text

        return webhook_tool
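The runtime above scopes data and configuration to a tenant, but it does not by itself stop one tenant from occupying every worker with concurrent runs. One possible approach, sketched here with hypothetical names (`TenantConcurrencyLimiter`, `run_with_limit`) and assuming a single-process asyncio deployment, is a per-tenant semaphore around each run.

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class TenantConcurrencyLimiter:
    """Caps the number of in-flight runs per tenant within one process."""

    def __init__(self, max_concurrent_per_tenant: int) -> None:
        # One semaphore per tenant, created lazily on first use.
        self._semaphores: dict[str, asyncio.Semaphore] = defaultdict(
            lambda: asyncio.Semaphore(max_concurrent_per_tenant)
        )

    def slot(self, tenant_id: str) -> asyncio.Semaphore:
        return self._semaphores[tenant_id]


async def run_with_limit(
    limiter: TenantConcurrencyLimiter,
    tenant_id: str,
    run: Callable[[], Awaitable[T]],
) -> T:
    """Wait for a free slot for this tenant, then execute the run."""
    async with limiter.slot(tenant_id):
        return await run()


async def demo() -> tuple[list[str], int]:
    limiter = TenantConcurrencyLimiter(max_concurrent_per_tenant=2)
    active = 0
    peak = 0

    async def fake_agent_run() -> str:
        # Stand-in for an agent run; tracks how many run at once.
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)
        active -= 1
        return "ok"

    results = await asyncio.gather(
        *(run_with_limit(limiter, "tenant-a", fake_agent_run) for _ in range(5))
    )
    return list(results), peak


results, peak = asyncio.run(demo())
print(results, peak)
```

In the platform code, the callable passed to `run_with_limit` could wrap `runtime.run(config, message)`. With several worker processes, the cap would apply per process, so a shared counter in an external store would be needed for a global limit.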
API Endpoints
The platform exposes a clean REST API for tenants to manage and run their agents:
# routes.py
from fastapi import APIRouter, Depends, HTTPException
from pydantic import BaseModel

from auth import get_current_tenant
from runtime import TenantAgentRuntime
from metering import MeteringService
from models import Tenant, AgentConfig
from database import get_session

router = APIRouter(prefix="/v1")


class RunRequest(BaseModel):
    agent_id: str
    message: str


class RunResponse(BaseModel):
    response: str
    tokens_used: int
    model: str


@router.post("/run", response_model=RunResponse)
async def run_agent(request: RunRequest, tenant: Tenant = Depends(get_current_tenant)):
    async with get_session() as session:
        config = await session.get(AgentConfig, request.agent_id)

    # Return 404 for other tenants' agents so their existence is not revealed.
    if not config or str(config.tenant_id) != str(tenant.id):
        raise HTTPException(status_code=404, detail="Agent not found")

    runtime = TenantAgentRuntime(tenant)
    result = await runtime.run(config, request.message)

    if "error" in result:
        raise HTTPException(status_code=result["status"], detail=result["error"])

    return RunResponse(**result)


@router.get("/usage")
async def get_usage(tenant: Tenant = Depends(get_current_tenant)):
    metering = MeteringService()
    usage = await metering.get_monthly_usage(str(tenant.id))
    return {
        "monthly_usage": usage,
        "monthly_limit": tenant.monthly_token_limit,
        "plan": tenant.plan,
    }
Billing Integration
Connect metering data to a billing system like Stripe for usage-based pricing:
# billing.py
import stripe

from metering import MeteringService

stripe.api_key = "sk_..."  # Placeholder; load the real key from a secret store

PLAN_PRICING = {
    "free": {"base_price": 0, "included_tokens": 1_000_000, "overage_per_1m": 0},
    "pro": {"base_price": 49, "included_tokens": 10_000_000, "overage_per_1m": 5.00},
    "enterprise": {"base_price": 299, "included_tokens": 100_000_000, "overage_per_1m": 3.00},
}


async def calculate_invoice(tenant_id: str, plan: str) -> dict:
    metering = MeteringService()
    usage = await metering.get_monthly_usage(tenant_id)
    pricing = PLAN_PRICING[plan]

    base = pricing["base_price"]
    overage_tokens = max(0, usage["total_tokens"] - pricing["included_tokens"])
    overage_cost = (overage_tokens / 1_000_000) * pricing["overage_per_1m"]

    return {
        "base_price": base,
        "included_tokens": pricing["included_tokens"],
        "tokens_used": usage["total_tokens"],
        "overage_tokens": overage_tokens,
        "overage_cost": round(overage_cost, 2),
        "total": round(base + overage_cost, 2),
    }
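To see how the overage arithmetic in `calculate_invoice` plays out, the sketch below applies the same formula to the pro plan's numbers without touching the database. The `invoice_total` helper and the usage figures are illustrative assumptions.

```python
# The "pro" row from PLAN_PRICING.
PRO_PLAN = {"base_price": 49, "included_tokens": 10_000_000, "overage_per_1m": 5.00}


def invoice_total(tokens_used: int, plan: dict) -> dict:
    """Base price plus per-million overage, mirroring calculate_invoice."""
    overage_tokens = max(0, tokens_used - plan["included_tokens"])
    overage_cost = (overage_tokens / 1_000_000) * plan["overage_per_1m"]
    return {
        "overage_tokens": overage_tokens,
        "overage_cost": round(overage_cost, 2),
        "total": round(plan["base_price"] + overage_cost, 2),
    }


# 12.5M tokens used: 2.5M over the allowance, 2.5 * 5.00 = 12.50 overage.
over = invoice_total(12_500_000, PRO_PLAN)
print(over["total"])  # 61.5

# 8M tokens used: under the allowance, so only the base price applies.
under = invoice_total(8_000_000, PRO_PLAN)
print(under["total"])  # 49.0
```

Note that the free plan's overage rate is 0, so its invoice never grows with usage. On that plan the quota check in the runtime, not the invoice, is what limits consumption.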
Building an AI agent SaaS platform requires careful attention to isolation, metering, and scalability. The patterns above — hashed API keys, per-tenant agent runtimes, real-time token metering, and usage-based billing — provide a solid foundation. Start with a single-tenant deployment to validate your agent framework, then add multi-tenancy once the core agent logic is proven.
Written by
CallSphere Team