Capstone: Building a Multi-Tenant AI Agent SaaS with Usage-Based Billing
Build a production SaaS platform where multiple tenants can create and deploy AI agents with tenant isolation, a visual agent builder, usage tracking, and Stripe-based usage billing.
SaaS Architecture for AI Agents
Building a multi-tenant AI agent platform requires solving four hard problems simultaneously: tenant isolation (one customer's data and agents must never leak to another), dynamic agent configuration (tenants create agents without writing code), usage metering (track every LLM call, tool invocation, and conversation), and billing (charge based on actual consumption).
This capstone builds a platform where each tenant signs up, creates agents through a web-based builder, deploys them to their own endpoints, and pays based on usage. The architecture uses a shared PostgreSQL database with row-level tenant isolation, a FastAPI backend, and Stripe for billing.
Data Model with Tenant Isolation
Every table includes a tenant_id column. All queries are scoped to the authenticated tenant.
flowchart LR
INPUT(["User intent"])
PARSE["Parse plus<br/>classify"]
PLAN["Plan and tool<br/>selection"]
AGENT["Agent loop<br/>LLM plus tools"]
GUARD{"Guardrails<br/>and policy"}
EXEC["Execute and<br/>verify result"]
OBS[("Trace and metrics")]
OUT(["Outcome plus<br/>next action"])
INPUT --> PARSE --> PLAN --> AGENT --> GUARD
GUARD -->|Pass| EXEC --> OUT
GUARD -->|Fail| AGENT
AGENT --> OBS
style AGENT fill:#4f46e5,stroke:#4338ca,color:#fff
style GUARD fill:#f59e0b,stroke:#d97706,color:#1f2937
style OBS fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
style OUT fill:#059669,stroke:#047857,color:#fff
# models.py
import uuid

from sqlalchemy import (
    Boolean, Column, DateTime, Float, ForeignKey, Integer, String, Text, func,
)
from sqlalchemy.dialects.postgresql import JSONB, UUID
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class Tenant(Base):
    __tablename__ = "tenants"

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    name = Column(String(200), nullable=False)
    slug = Column(String(100), unique=True, nullable=False)
    stripe_customer_id = Column(String(100), nullable=True)
    plan = Column(String(50), default="free")  # free, starter, pro, enterprise
    api_key = Column(String(100), unique=True)
    created_at = Column(DateTime, server_default=func.now())


class AgentConfig(Base):
    __tablename__ = "agent_configs"

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    tenant_id = Column(UUID(as_uuid=True), ForeignKey("tenants.id"), index=True, nullable=False)
    name = Column(String(200))
    instructions = Column(Text)
    model = Column(String(50), default="gpt-4o")
    tools = Column(JSONB, default=list)  # list of enabled tool configs
    is_active = Column(Boolean, default=True)
    created_at = Column(DateTime, server_default=func.now())


class UsageRecord(Base):
    __tablename__ = "usage_records"

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    tenant_id = Column(UUID(as_uuid=True), ForeignKey("tenants.id"), index=True, nullable=False)
    agent_id = Column(UUID(as_uuid=True), ForeignKey("agent_configs.id"))
    event_type = Column(String(50))  # "llm_call", "tool_call", "conversation"
    tokens_input = Column(Integer, default=0)
    tokens_output = Column(Integer, default=0)
    cost_cents = Column(Float, default=0)
    # "metadata" is reserved on declarative models, hence the trailing underscore
    metadata_ = Column(JSONB, default=dict)
    created_at = Column(DateTime, server_default=func.now())
Tenant-Scoped Dependency Injection
Use a FastAPI dependency that extracts the tenant from the API key and scopes all database queries.
# core/auth.py
from fastapi import Depends, HTTPException, Security
from fastapi.security import APIKeyHeader

from core.database import get_db  # assumed location of the session dependency
from models import Tenant

api_key_header = APIKeyHeader(name="X-API-Key")


async def get_current_tenant(
    api_key: str = Security(api_key_header),
    db=Depends(get_db),
) -> Tenant:
    """Resolve the tenant that owns the supplied API key, or reject with 401."""
    tenant = db.query(Tenant).filter(Tenant.api_key == api_key).first()
    if not tenant:
        raise HTTPException(status_code=401, detail="Invalid API key")
    return tenant


class TenantScoped:
    """Utility to scope queries to the current tenant."""

    def __init__(self, db, tenant: Tenant):
        self.db = db
        self.tenant_id = tenant.id

    def query(self, model):
        # Every model passed here must define a tenant_id column.
        return self.db.query(model).filter(model.tenant_id == self.tenant_id)
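To make the effect of tenant scoping concrete, here is a minimal standard-library sketch of the same idea. It uses an in-memory SQLite database in place of PostgreSQL and SQLAlchemy, and the `scoped_get_agent` helper and the sample rows are hypothetical, not part of the platform code.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; the table and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE agent_configs (id TEXT PRIMARY KEY, tenant_id TEXT NOT NULL, name TEXT)"
)
conn.executemany(
    "INSERT INTO agent_configs VALUES (?, ?, ?)",
    [("a1", "tenant-a", "Support bot"), ("b1", "tenant-b", "Sales bot")],
)


def scoped_get_agent(conn, tenant_id, agent_id):
    """Fetch an agent only if it belongs to the given tenant (hypothetical helper)."""
    return conn.execute(
        "SELECT id, name FROM agent_configs WHERE tenant_id = ? AND id = ?",
        (tenant_id, agent_id),
    ).fetchone()


own = scoped_get_agent(conn, "tenant-a", "a1")      # tenant A reads its own agent
foreign = scoped_get_agent(conn, "tenant-a", "b1")  # tenant A guesses tenant B's agent ID
```

Because the tenant filter is part of every lookup, a guessed ID from another tenant behaves exactly like an ID that does not exist.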
Dynamic Agent Builder
Tenants configure agents through the admin dashboard. The backend loads agent configurations from the database and instantiates them on demand.
# services/agent_factory.py
from agents import Agent

from models import AgentConfig

# Registry of available tools that tenants can enable.
# The tool functions are assumed to be defined elsewhere in the project
# and decorated with function_tool from the agents package.
TOOL_REGISTRY = {
    "search_kb": search_knowledge_base,
    "send_email": send_email_tool,
    "create_ticket": create_ticket_tool,
    "lookup_order": lookup_order_tool,
    "check_calendar": check_calendar_tool,
}


def build_agent_from_config(config: AgentConfig) -> Agent:
    """Dynamically build an Agent from a database configuration."""
    enabled_tools = []
    for tool_config in config.tools or []:
        tool_name = tool_config["name"]
        # Names missing from the registry are skipped, so a tenant can only
        # enable tools the platform actually provides.
        if tool_name in TOOL_REGISTRY:
            enabled_tools.append(TOOL_REGISTRY[tool_name])
    return Agent(
        name=config.name,
        instructions=config.instructions,
        model=config.model,
        tools=enabled_tools,
    )
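The factory silently skips tool names it does not recognize, so one option is to reject them earlier, when the tenant saves the configuration in the builder. The sketch below shows one way that check might look; `find_unknown_tools` and `KNOWN_TOOLS` are hypothetical names, and the set simply mirrors the registry keys.

```python
from typing import Iterable

# Stand-in for the keys of TOOL_REGISTRY; the real registry maps names to tool objects.
KNOWN_TOOLS = {"search_kb", "send_email", "create_ticket", "lookup_order", "check_calendar"}


def find_unknown_tools(tool_configs: Iterable[dict], known: set = KNOWN_TOOLS) -> list:
    """Return tool names in a config that the platform does not provide (hypothetical helper)."""
    unknown = []
    for tool_config in tool_configs:
        name = tool_config.get("name")  # a missing "name" key is reported as "None"
        if name not in known:
            unknown.append(str(name))
    return unknown


submitted = [{"name": "search_kb"}, {"name": "delete_database"}, {}]
problems = find_unknown_tools(submitted)
```

A save endpoint could return a 422 response listing `problems` instead of storing a configuration that will quietly lose tools at runtime.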
Usage Metering
Every LLM call and tool invocation is recorded for billing.
# services/metering.py
from models import UsageRecord

# Prices in dollars per 100k tokens
TOKEN_COSTS = {
    "gpt-4o": {"input": 0.25, "output": 1.00},
    "gpt-4o-mini": {"input": 0.015, "output": 0.06},
}


async def record_usage(
    db, tenant_id: str, agent_id: str,
    event_type: str, tokens_in: int, tokens_out: int, model: str
):
    # Unknown models fall back to gpt-4o pricing.
    costs = TOKEN_COSTS.get(model, TOKEN_COSTS["gpt-4o"])
    cost = (tokens_in * costs["input"] + tokens_out * costs["output"]) / 100_000
    record = UsageRecord(
        tenant_id=tenant_id,
        agent_id=agent_id,
        event_type=event_type,
        tokens_input=tokens_in,
        tokens_output=tokens_out,
        cost_cents=cost * 100,  # store in cents
    )
    db.add(record)
    db.commit()
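As a worked example of the cost formula: a gpt-4o call with 12,000 input tokens and 3,000 output tokens costs (12,000 × 0.25 + 3,000 × 1.00) / 100,000 = 0.06 dollars, or 6 cents. The sketch below repeats that arithmetic with `Decimal` rather than floats, which is one way to keep many small charges from accumulating rounding error. `llm_cost_cents` and `RATES` are hypothetical names that mirror `record_usage` and `TOKEN_COSTS`.

```python
from decimal import Decimal

# Same rates as TOKEN_COSTS (dollars per 100k tokens), held as Decimal so sums stay exact.
RATES = {
    "gpt-4o": {"input": Decimal("0.25"), "output": Decimal("1.00")},
    "gpt-4o-mini": {"input": Decimal("0.015"), "output": Decimal("0.06")},
}


def llm_cost_cents(model: str, tokens_in: int, tokens_out: int) -> Decimal:
    """Cost of one call in cents (hypothetical helper mirroring record_usage)."""
    rate = RATES[model]
    dollars = (tokens_in * rate["input"] + tokens_out * rate["output"]) / Decimal(100_000)
    return dollars * 100


example = llm_cost_cents("gpt-4o", 12_000, 3_000)
```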
Stripe Billing Integration
Report usage to Stripe on a regular schedule using Stripe's metered billing. The sync below sums the previous 24 hours of usage, so it assumes the job runs once a day.
# services/billing.py
import os
from datetime import datetime, timedelta

import stripe
from sqlalchemy import func

from models import Tenant, UsageRecord

stripe.api_key = os.environ["STRIPE_SECRET_KEY"]


async def sync_usage_to_stripe(tenant_id: str, db):
    """Report usage to Stripe for metered billing."""
    tenant = db.get(Tenant, tenant_id)  # Session.get requires SQLAlchemy 1.4+
    if tenant is None or not tenant.stripe_customer_id:
        return
    # Sum usage for the last 24 hours; running the job more or less often
    # than once a day would double-count or miss usage.
    period_start = datetime.utcnow() - timedelta(days=1)
    total_cost = db.query(func.sum(UsageRecord.cost_cents)).filter(
        UsageRecord.tenant_id == tenant_id,
        UsageRecord.created_at >= period_start,
    ).scalar() or 0
    # Report to Stripe as whole cents, rounded rather than truncated
    stripe.billing.MeterEvent.create(
        event_name="ai_agent_usage",
        payload={
            "value": str(round(total_cost)),
            "stripe_customer_id": tenant.stripe_customer_id,
        },
    )


async def get_tenant_usage_summary(tenant_id: str, days: int, db) -> dict:
    """Aggregate a tenant's usage over the last `days` days."""
    since = datetime.utcnow() - timedelta(days=days)
    records = db.query(UsageRecord).filter(
        UsageRecord.tenant_id == tenant_id,
        UsageRecord.created_at >= since,
    ).all()
    return {
        "total_cost_cents": sum(r.cost_cents for r in records),
        "total_llm_calls": sum(1 for r in records if r.event_type == "llm_call"),
        "total_tokens_input": sum(r.tokens_input for r in records),
        "total_tokens_output": sum(r.tokens_output for r in records),
        "total_conversations": sum(1 for r in records if r.event_type == "conversation"),
    }
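If the sync job is retried after a partial failure, the same day's usage could be reported twice. Stripe's meter event API accepts an optional `identifier` field intended for deduplication, so one possible safeguard is to derive it deterministically from the tenant and the reporting day. The sketch below only builds the arguments and does not call Stripe. `build_meter_event` and the identifier format are assumptions, and the exact deduplication window should be checked against Stripe's documentation.

```python
from datetime import date


def build_meter_event(tenant_id: str, stripe_customer_id: str, total_cents: float, day: date) -> dict:
    """Build keyword arguments for stripe.billing.MeterEvent.create (hypothetical helper)."""
    return {
        "event_name": "ai_agent_usage",
        # Deterministic per tenant and day, so a retried sync reuses the same identifier.
        "identifier": f"usage-{tenant_id}-{day.isoformat()}",
        "payload": {
            "value": str(round(total_cents)),
            "stripe_customer_id": stripe_customer_id,
        },
    }


# Sample values are made up for illustration.
event = build_meter_event("t-123", "cus_example", 41.6, date(2025, 1, 15))
# stripe.billing.MeterEvent.create(**event) would send it; it is not called here.
```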
Tenant API Endpoint
Tenants call a shared /v1/chat endpoint. The API key identifies the tenant, and the agent ID in the request body selects one of that tenant's agents.
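# routes/agent_api.py
from uuid import UUID

from agents import Runner
from fastapi import APIRouter, Depends, HTTPException
from pydantic import BaseModel

from core.auth import TenantScoped, get_current_tenant
from core.database import get_db  # assumed location of the session dependency
from models import AgentConfig, Tenant
from services.agent_factory import build_agent_from_config
from services.metering import record_usage

router = APIRouter()


class ChatRequest(BaseModel):
    # Fields inferred from how the handler uses the request body
    agent_id: UUID
    message: str


@router.post("/v1/chat")
async def chat(
    body: ChatRequest,
    tenant: Tenant = Depends(get_current_tenant),
    db=Depends(get_db),
):
    # The scoped query returns nothing for an agent owned by another tenant.
    scoped = TenantScoped(db, tenant)
    config = scoped.query(AgentConfig).filter(
        AgentConfig.id == body.agent_id
    ).first()
    if not config:
        raise HTTPException(404, "Agent not found")
    agent = build_agent_from_config(config)
    result = await Runner.run(agent, body.message)
    # Record usage across every model response in the run, not only the last one
    tokens_in = sum(r.usage.input_tokens for r in result.raw_responses)
    tokens_out = sum(r.usage.output_tokens for r in result.raw_responses)
    await record_usage(
        db, str(tenant.id), str(config.id),
        "llm_call", tokens_in, tokens_out, config.model
    )
    return {"reply": result.final_output, "agent": config.name}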
FAQ
How do I prevent one tenant's heavy usage from affecting others?
Implement per-tenant rate limiting using a Redis-backed token bucket. Each tenant gets a request-per-minute and tokens-per-day limit based on their plan tier. When a tenant exceeds their limit, return a 429 status code with a Retry-After header.
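The token bucket logic can be sketched in a few lines. This version keeps state in memory and takes an injectable clock so it is easy to test. A production version would keep the token count and timestamp in Redis, as described above, and update them atomically. The class and parameter names are illustrative.

```python
import time


class TokenBucket:
    """In-memory token bucket; a stand-in for the Redis-backed version."""

    def __init__(self, capacity: int, refill_per_second: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        self.tokens = float(capacity)
        self.updated_at = clock()

    def try_consume(self, amount: float = 1.0):
        """Return (allowed, retry_after_seconds)."""
        now = self.clock()
        elapsed = now - self.updated_at
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated_at = now
        if self.tokens >= amount:
            self.tokens -= amount
            return True, 0.0
        return False, (amount - self.tokens) / self.refill_per_second


# A fake clock keeps the example deterministic.
fake_now = [0.0]
bucket = TokenBucket(capacity=2, refill_per_second=1.0, clock=lambda: fake_now[0])
first = bucket.try_consume()
second = bucket.try_consume()
third = bucket.try_consume()   # bucket is empty, so this is rejected
fake_now[0] = 1.0
fourth = bucket.try_consume()  # one token has been refilled
```

The second element of a rejected result is the value to send in the Retry-After header, rounded up to whole seconds.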
How do I handle tenant data deletion for compliance?
Implement a cascade delete that removes all tenant data: agent configs, usage records, conversations, and any uploaded knowledge base documents. Use a soft-delete first (mark as deleted with a timestamp) and run a hard-delete job after a 30-day grace period. Log the deletion for audit compliance.
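The grace-period check that the hard-delete job relies on might look like the sketch below. It assumes the tenant row carries a `deleted_at` timestamp that is empty while the tenant is active. That column and the helper name are hypothetical and do not appear in the models shown earlier.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

GRACE_PERIOD = timedelta(days=30)


def is_due_for_hard_delete(deleted_at: Optional[datetime], now: datetime) -> bool:
    """True once a soft-deleted tenant has passed the grace period (hypothetical helper)."""
    if deleted_at is None:  # tenant is still active
        return False
    return now - deleted_at >= GRACE_PERIOD


now = datetime(2025, 3, 1, tzinfo=timezone.utc)
recent = is_due_for_hard_delete(datetime(2025, 2, 15, tzinfo=timezone.utc), now)   # 14 days ago
expired = is_due_for_hard_delete(datetime(2025, 1, 15, tzinfo=timezone.utc), now)  # 45 days ago
active = is_due_for_hard_delete(None, now)
```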
How do I let tenants bring their own API keys?
Store tenant-provided API keys encrypted in the database. When building an agent for that tenant, configure the OpenAI client with their key instead of yours. This shifts LLM costs to the tenant while you charge only for platform usage. Validate the key on save by making a minimal API call.