Capstone: Building a Multi-Tenant AI Agent SaaS with Usage-Based Billing

SaaS Architecture for AI Agents

Building a multi-tenant AI agent platform requires solving four hard problems simultaneously: tenant isolation (one customer's data and agents must never leak to another), dynamic agent configuration (tenants create agents without writing code), usage metering (track every LLM call, tool invocation, and conversation), and billing (charge based on actual consumption).

This capstone builds a platform where each tenant signs up, creates agents through a web-based builder, deploys them to their own endpoints, and pays based on usage. The architecture uses a shared PostgreSQL database with row-level tenant isolation, a FastAPI backend, and Stripe for billing.

Data Model with Tenant Isolation

Every table includes a tenant_id column. All queries are scoped to the authenticated tenant.

flowchart LR
    INPUT(["User intent"])
    PARSE["Parse plus<br/>classify"]
    PLAN["Plan and tool<br/>selection"]
    AGENT["Agent loop<br/>LLM plus tools"]
    GUARD{"Guardrails<br/>and policy"}
    EXEC["Execute and<br/>verify result"]
    OBS[("Trace and metrics")]
    OUT(["Outcome plus<br/>next action"])
    INPUT --> PARSE --> PLAN --> AGENT --> GUARD
    GUARD -->|Pass| EXEC --> OUT
    GUARD -->|Fail| AGENT
    AGENT --> OBS
    style AGENT fill:#4f46e5,stroke:#4338ca,color:#fff
    style GUARD fill:#f59e0b,stroke:#d97706,color:#1f2937
    style OBS fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style OUT fill:#059669,stroke:#047857,color:#fff

# models.py
import uuid

from sqlalchemy import Boolean, Column, DateTime, Float, ForeignKey, Integer, String, Text, func
from sqlalchemy.dialects.postgresql import JSONB, UUID
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Tenant(Base):
    __tablename__ = "tenants"
    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    name = Column(String(200), nullable=False)
    slug = Column(String(100), unique=True, nullable=False)
    stripe_customer_id = Column(String(100), nullable=True)
    plan = Column(String(50), default="free")  # free, starter, pro, enterprise
    api_key = Column(String(100), unique=True)
    # func.now() renders as a SQL function call; a plain "now()" string would be quoted
    created_at = Column(DateTime, server_default=func.now())

class AgentConfig(Base):
    __tablename__ = "agent_configs"
    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    tenant_id = Column(UUID(as_uuid=True), ForeignKey("tenants.id"), index=True)
    name = Column(String(200))
    instructions = Column(Text)
    model = Column(String(50), default="gpt-4o")
    # default=list builds a fresh list per row instead of sharing one mutable object
    tools = Column(JSONB, default=list)  # list of enabled tool configs
    # A real boolean; the string "false" would still be truthy in Python
    is_active = Column(Boolean, default=True)
    created_at = Column(DateTime, server_default=func.now())

class UsageRecord(Base):
    __tablename__ = "usage_records"
    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    tenant_id = Column(UUID(as_uuid=True), ForeignKey("tenants.id"), index=True)
    agent_id = Column(UUID(as_uuid=True), ForeignKey("agent_configs.id"))
    event_type = Column(String(50))  # "llm_call", "tool_call", "conversation"
    tokens_input = Column(Integer, default=0)
    tokens_output = Column(Integer, default=0)
    cost_cents = Column(Float, default=0)  # fractional cents are expected
    # Trailing underscore because "metadata" is reserved on declarative models
    metadata_ = Column(JSONB, default=dict)
    created_at = Column(DateTime, server_default=func.now())

Tenant-Scoped Dependency Injection

Use a FastAPI dependency that extracts the tenant from the API key and scopes all database queries.

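# core/auth.py
from fastapi import Depends, HTTPException, Security
from fastapi.security import APIKeyHeader

from core.database import get_db  # assumed location of the session dependency
from models import Tenant

# Clients send their tenant API key in the X-API-Key request header
api_key_header = APIKeyHeader(name="X-API-Key")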

async def get_current_tenant(
    api_key: str = Security(api_key_header),
    db=Depends(get_db),
) -> Tenant:
    tenant = db.query(Tenant).filter(Tenant.api_key == api_key).first()
    if not tenant:
        raise HTTPException(status_code=401, detail="Invalid API key")
    return tenant

class TenantScoped:
    """Utility to scope queries to the current tenant."""
    def __init__(self, db, tenant: Tenant):
        self.db = db
        self.tenant_id = tenant.id

    def query(self, model):
        return self.db.query(model).filter(model.tenant_id == self.tenant_id)
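
The effect of scoping is easiest to see with two tenants side by side. The following is a minimal sketch of the same idea using only the standard library's sqlite3 module in place of SQLAlchemy; the class name TenantScopedConnection, the helper make_demo_db, and the sample rows are hypothetical and exist only for illustration.

```python
import sqlite3


class TenantScopedConnection:
    """Wraps a connection so every read is filtered by one tenant_id."""

    def __init__(self, conn: sqlite3.Connection, tenant_id: str):
        self.conn = conn
        self.tenant_id = tenant_id

    def get_agent(self, agent_id: str):
        # The tenant filter is applied here rather than by the caller,
        # so a route handler cannot forget it.
        return self.conn.execute(
            "SELECT id, name FROM agent_configs WHERE id = ? AND tenant_id = ?",
            (agent_id, self.tenant_id),
        ).fetchone()


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory table holding one agent for each of two tenants."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE agent_configs (id TEXT PRIMARY KEY, tenant_id TEXT, name TEXT)"
    )
    conn.executemany(
        "INSERT INTO agent_configs VALUES (?, ?, ?)",
        [
            ("agent-1", "tenant-a", "Support bot"),
            ("agent-2", "tenant-b", "Sales bot"),
        ],
    )
    return conn


conn = make_demo_db()
tenant_a = TenantScopedConnection(conn, "tenant-a")
print(tenant_a.get_agent("agent-1"))  # tenant A's own agent is returned
print(tenant_a.get_agent("agent-2"))  # tenant B's agent is not visible
```

A lookup for another tenant's agent behaves exactly like a lookup for an ID that does not exist, which is why the chat endpoint can answer 404 without revealing that the agent belongs to someone else.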

Dynamic Agent Builder

Tenants configure agents through the admin dashboard. The backend loads agent configurations from the database and instantiates them on demand.

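# services/agent_factory.py
from agents import Agent

from models import AgentConfig

# The tool callables referenced below are assumed to be defined elsewhere in the
# project and decorated with @function_tool from the agents SDK.
# Registry of available tools that tenants can enable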
TOOL_REGISTRY = {
    "search_kb": search_knowledge_base,
    "send_email": send_email_tool,
    "create_ticket": create_ticket_tool,
    "lookup_order": lookup_order_tool,
    "check_calendar": check_calendar_tool,
}

def build_agent_from_config(config: AgentConfig) -> Agent:
    """Dynamically build an Agent from a database configuration."""
    enabled_tools = []
    # config.tools may be NULL for agents saved without any tools
    for tool_config in config.tools or []:
        tool_name = tool_config["name"]
        # Names missing from the registry are skipped silently
        if tool_name in TOOL_REGISTRY:
            enabled_tools.append(TOOL_REGISTRY[tool_name])

    return Agent(
        name=config.name,
        instructions=config.instructions,
        model=config.model,
        tools=enabled_tools,
    )
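
Because the factory skips unknown tool names without complaint, a typo in a saved configuration only shows up later as an agent that lacks a tool. One option is to validate the tool list when the tenant saves the agent. This is a sketch under assumptions: KNOWN_TOOLS stands in for the keys of TOOL_REGISTRY, and find_unknown_tools is a hypothetical helper, not part of the agents SDK.

```python
from typing import Any

# Stand-in for the keys of TOOL_REGISTRY; only the names matter for validation.
KNOWN_TOOLS = {"search_kb", "send_email", "create_ticket", "lookup_order", "check_calendar"}


def find_unknown_tools(
    tool_configs: list[dict[str, Any]],
    known: set[str] = KNOWN_TOOLS,
) -> list[str]:
    """Return the entries that cannot be resolved against the registry, in order."""
    problems = []
    for cfg in tool_configs:
        name = cfg.get("name")
        if not isinstance(name, str):
            # The entry has no usable "name" key at all
            problems.append("<missing name>")
        elif name not in known:
            problems.append(name)
    return problems


submitted = [{"name": "search_kb"}, {"name": "delete_db"}, {}]
print(find_unknown_tools(submitted))
```

A save endpoint could reject the request with a 422 response when the returned list is non-empty, so the tenant sees the problem in the builder instead of at chat time.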

Usage Metering

Every LLM call and tool invocation is recorded for billing.

# services/metering.py
from models import UsageRecord

# Prices in USD per 100k tokens
TOKEN_COSTS = {
    "gpt-4o": {"input": 0.25, "output": 1.00},
    "gpt-4o-mini": {"input": 0.015, "output": 0.06},
}

async def record_usage(
    db, tenant_id: str, agent_id: str,
    event_type: str, tokens_in: int, tokens_out: int, model: str
):
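    # Unknown models fall back to gpt-4o pricing rather than failing the request
    costs = TOKEN_COSTS.get(model, TOKEN_COSTS["gpt-4o"])
    # Cost in USD: rates are quoted per 100k tokens
    cost = (tokens_in * costs["input"] + tokens_out * costs["output"]) / 100_000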

    record = UsageRecord(
        tenant_id=tenant_id,
        agent_id=agent_id,
        event_type=event_type,
        tokens_input=tokens_in,
        tokens_output=tokens_out,
        cost_cents=cost * 100,  # store in cents
    )
    db.add(record)
    db.commit()
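
A worked example makes the units concrete. The sketch below repeats the same formula with decimal.Decimal so the arithmetic is exact; the prices are copied from TOKEN_COSTS, while the function name cost_in_cents and the sample token counts are assumptions made for illustration. Unlike record_usage, it raises KeyError for an unknown model instead of falling back to gpt-4o pricing.

```python
from decimal import Decimal

# USD per 100k tokens, copied from TOKEN_COSTS in the article.
TOKEN_COSTS_USD_PER_100K = {
    "gpt-4o": {"input": Decimal("0.25"), "output": Decimal("1.00")},
    "gpt-4o-mini": {"input": Decimal("0.015"), "output": Decimal("0.06")},
}


def cost_in_cents(model: str, tokens_in: int, tokens_out: int) -> Decimal:
    """Price one call: tokens times the per-100k rate, converted from USD to cents."""
    rates = TOKEN_COSTS_USD_PER_100K[model]
    usd = (tokens_in * rates["input"] + tokens_out * rates["output"]) / Decimal(100_000)
    return usd * 100


# 10,000 input tokens and 2,000 output tokens on gpt-4o:
# (10,000 * 0.25 + 2,000 * 1.00) / 100,000 = 0.045 USD, which is 4.5 cents
print(cost_in_cents("gpt-4o", 10_000, 2_000))
```

Individual calls routinely cost a fraction of a cent, which is why cost_cents cannot be an integer column and why rounding is best deferred until usage is aggregated for billing.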

Stripe Billing Integration

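Report usage to Stripe using Stripe metered billing. The sync function below sums the trailing 24 hours of usage, so it is meant to run once a day from a scheduler; running it more or less often would double-count or drop usage.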

# services/billing.py
import os
from datetime import datetime, timedelta

import stripe
from sqlalchemy import func

from models import Tenant, UsageRecord

stripe.api_key = os.environ["STRIPE_SECRET_KEY"]

async def sync_usage_to_stripe(tenant_id: str, db):
    """Report usage to Stripe for metered billing."""
    tenant = db.query(Tenant).filter(Tenant.id == tenant_id).first()
    # Skip unknown tenants and tenants without a Stripe customer
    if tenant is None or not tenant.stripe_customer_id:
        return

    # Sum usage over the trailing 24 hours (assumes this job runs once a day)
    period_start = datetime.utcnow() - timedelta(days=1)
    total_cost = db.query(func.sum(UsageRecord.cost_cents)).filter(
        UsageRecord.tenant_id == tenant_id,
        UsageRecord.created_at >= period_start,
    ).scalar() or 0

    # Report to Stripe; the meter value is sent as a whole number of cents
    stripe.billing.MeterEvent.create(
        event_name="ai_agent_usage",
        payload={
            "value": str(round(total_cost)),
            "stripe_customer_id": tenant.stripe_customer_id,
        },
    )

async def get_tenant_usage_summary(tenant_id: str, days: int, db) -> dict:
    since = datetime.utcnow() - timedelta(days=days)
    records = db.query(UsageRecord).filter(
        UsageRecord.tenant_id == tenant_id,
        UsageRecord.created_at >= since,
    ).all()
    return {
        "total_cost_cents": sum(r.cost_cents for r in records),
        "total_llm_calls": sum(1 for r in records if r.event_type == "llm_call"),
        "total_tokens_input": sum(r.tokens_input for r in records),
        "total_tokens_output": sum(r.tokens_output for r in records),
        "total_conversations": sum(1 for r in records if r.event_type == "conversation"),
    }
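
A daily sync job that is retried after a crash can report the same window twice. One way to guard against that is to derive a deterministic identifier from the tenant and the reporting window and send it with the meter event. The sketch below only builds the event as a dictionary and does not call Stripe. It assumes that meter events accept an identifier field that Stripe uses for deduplication, which should be verified against the current Stripe API reference; build_meter_event and its parameters are hypothetical names.

```python
import hashlib
from datetime import datetime, timezone


def build_meter_event(
    tenant_id: str,
    stripe_customer_id: str,
    total_cents: int,
    window_start: datetime,
    window_end: datetime,
) -> dict:
    """Build meter event arguments with an identifier that is stable per window."""
    # The same tenant and window always hash to the same identifier, so a
    # retried job resends an event the receiver can recognise as a duplicate.
    raw = f"{tenant_id}:{window_start.isoformat()}:{window_end.isoformat()}"
    identifier = hashlib.sha256(raw.encode("utf-8")).hexdigest()[:32]
    return {
        "event_name": "ai_agent_usage",
        "identifier": identifier,  # assumed dedup field; check the Stripe docs
        "payload": {
            "value": str(total_cents),
            "stripe_customer_id": stripe_customer_id,
        },
    }


start = datetime(2025, 1, 1, tzinfo=timezone.utc)
end = datetime(2025, 1, 2, tzinfo=timezone.utc)
event = build_meter_event("tenant-a", "cus_example", 450, start, end)
print(event["identifier"], event["payload"]["value"])
```

Under that assumption the dictionary could be passed to the existing call as keyword arguments. The window boundaries should come from a stored "last synced" timestamp instead of the wall clock, so that a late run still covers exactly one window.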

Tenant API Endpoint

Each tenant gets their own agent endpoint, authenticated by their API key.

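# routes/agent_api.py
from uuid import UUID

from agents import Runner
from fastapi import APIRouter, Depends, HTTPException
from pydantic import BaseModel

from core.auth import TenantScoped, get_current_tenant
from core.database import get_db  # assumed location of the session dependency
from models import AgentConfig, Tenant
from services.agent_factory import build_agent_from_config
from services.metering import record_usage

router = APIRouter()

class ChatRequest(BaseModel):
    # Fields inferred from how the handler reads the request body
    agent_id: UUID
    message: str

@router.post("/v1/chat")
async def chat(
    body: ChatRequest,
    tenant: Tenant = Depends(get_current_tenant),
    db=Depends(get_db),
):
    # The scoped query only sees agents that belong to the calling tenant
    scoped = TenantScoped(db, tenant)
    config = scoped.query(AgentConfig).filter(
        AgentConfig.id == body.agent_id
    ).first()
    if not config:
        raise HTTPException(404, "Agent not found")

    agent = build_agent_from_config(config)
    result = await Runner.run(agent, body.message)

    # Record usage. A run can make several model calls (for example around
    # tool invocations), so sum tokens across every response, not just the last.
    tokens_in = sum(r.usage.input_tokens for r in result.raw_responses)
    tokens_out = sum(r.usage.output_tokens for r in result.raw_responses)
    await record_usage(
        db, str(tenant.id), str(config.id),
        "llm_call", tokens_in, tokens_out, config.model
    )

    return {"reply": result.final_output, "agent": config.name}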

FAQ

How do I prevent one tenant's heavy usage from affecting others?

Implement per-tenant rate limiting using a Redis-backed token bucket. Each tenant gets requests-per-minute and tokens-per-day limits based on their plan tier. When a tenant exceeds a limit, return a 429 status code with a Retry-After header.
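
The token bucket itself is small enough to sketch. The version below keeps state in process memory instead of Redis so that it runs standalone; the class and field names are assumptions, and a production version would store tokens and updated_at per tenant in Redis and update them atomically.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    capacity: float           # maximum burst size
    refill_per_second: float  # sustained rate allowed by the plan
    tokens: float             # tokens currently available
    updated_at: float         # timestamp of the last refill, in seconds

    def try_consume(self, now: float, cost: float = 1.0) -> tuple[bool, float]:
        """Return (allowed, retry_after_seconds) for a request arriving at `now`."""
        # Refill in proportion to the time elapsed, capped at capacity
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated_at = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True, 0.0
        # Time until enough tokens accumulate; suitable for a Retry-After header
        return False, (cost - self.tokens) / self.refill_per_second


bucket = TokenBucket(capacity=2, refill_per_second=1, tokens=2, updated_at=0.0)
print(bucket.try_consume(now=0.0))  # first request in the burst is allowed
print(bucket.try_consume(now=0.0))  # second request uses the last token
print(bucket.try_consume(now=0.0))  # bucket is empty, so the request is refused
```

Passing the current time in as an argument keeps the logic deterministic and easy to test. A request handler would call it with time.monotonic() and round the retry delay up to whole seconds for the header.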

How do I handle tenant data deletion for compliance?

Implement a cascade delete that removes all tenant data: agent configs, usage records, conversations, and any uploaded knowledge base documents. Use a soft-delete first (mark as deleted with a timestamp) and run a hard-delete job after a 30-day grace period. Log the deletion for audit compliance.
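
The grace-period check can be sketched as a pure function that a nightly job calls before hard-deleting anything. The names below are hypothetical, and the deleted_at timestamp is assumed to be a soft-delete column added to the tenants table, which the data model above does not yet include.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

GRACE_PERIOD = timedelta(days=30)


def tenants_due_for_purge(
    deleted_at_by_tenant: dict[str, Optional[datetime]],
    now: datetime,
    grace: timedelta = GRACE_PERIOD,
) -> list[str]:
    """Return tenants soft-deleted at least `grace` ago, sorted by ID."""
    due = []
    for tenant_id, deleted_at in deleted_at_by_tenant.items():
        # None means the tenant is still active and must never be purged
        if deleted_at is not None and now - deleted_at >= grace:
            due.append(tenant_id)
    return sorted(due)


now = datetime(2025, 3, 1, tzinfo=timezone.utc)
state = {
    "active-tenant": None,
    "recently-deleted": datetime(2025, 2, 20, tzinfo=timezone.utc),
    "expired": datetime(2025, 1, 15, tzinfo=timezone.utc),
}
print(tenants_due_for_purge(state, now))
```

Keeping the decision separate from the delete statements lets the job log exactly which tenants it is about to purge, which also serves as the audit record.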

How do I let tenants bring their own API keys?

Store tenant-provided API keys encrypted in the database. When building an agent for that tenant, configure the OpenAI client with their key instead of yours. This shifts LLM costs to the tenant while you charge only for platform usage. Validate the key on save by making a minimal API call.
