---
title: "Building a Multi-Tenant AI Agent Platform: Isolating Customers in Shared Infrastructure"
description: "Design and build a multi-tenant AI agent platform with proper tenant isolation, resource quotas, data segregation, per-tenant billing, and shared infrastructure that scales efficiently without cross-tenant data leakage."
canonical: https://callsphere.ai/blog/multi-tenant-ai-agent-platform-isolation-billing
category: "Learn Agentic AI"
tags: ["Multi-Tenant", "AI Agents", "Platform Engineering", "Tenant Isolation", "SaaS", "Data Segregation"]
author: "CallSphere Team"
published: 2026-03-17T00:00:00.000Z
updated: 2026-05-06T01:02:43.473Z
---

# Building a Multi-Tenant AI Agent Platform: Isolating Customers in Shared Infrastructure

> Design and build a multi-tenant AI agent platform with proper tenant isolation, resource quotas, data segregation, per-tenant billing, and shared infrastructure that scales efficiently without cross-tenant data leakage.

## Why Multi-Tenancy Is Hard for AI Agents

Multi-tenant AI agent platforms share infrastructure across customers to reduce costs, but AI agents introduce unique isolation challenges. An agent's system prompt contains business-specific knowledge. Conversation histories contain customer PII. Tool configurations expose internal APIs. A cross-tenant data leak in an AI agent is not just a privacy violation — it could expose one customer's business logic and customer data to another.

The three pillars of AI agent multi-tenancy are data isolation (no tenant can read another tenant's data), resource isolation (one tenant's usage spike does not degrade another's experience), and configuration isolation (each tenant's agent behaves according to their specific settings).

## Data Isolation with Row-Level Security

For the majority of platforms, the most practical approach is a shared database with row-level security (RLS). Every tenant-owned table includes a `tenant_id` column, and PostgreSQL ensures that queries return only the rows belonging to the current tenant:

```python
# Database schema with tenant isolation
SCHEMA = """
CREATE TABLE tenants (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name TEXT NOT NULL,
    plan TEXT NOT NULL DEFAULT 'free',
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE conversations (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    user_id TEXT NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE messages (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    conversation_id UUID NOT NULL REFERENCES conversations(id),
    tenant_id UUID NOT NULL REFERENCES tenants(id),
    role TEXT NOT NULL,
    content TEXT NOT NULL,
    tokens_used INTEGER DEFAULT 0,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

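-- Row-Level Security. FORCE makes the policies apply to the table owner
-- as well; superusers and roles with BYPASSRLS still skip RLS, so the
-- application should connect as a dedicated, unprivileged role.
ALTER TABLE conversations ENABLE ROW LEVEL SECURITY;
ALTER TABLE conversations FORCE ROW LEVEL SECURITY;
ALTER TABLE messages ENABLE ROW LEVEL SECURITY;
ALTER TABLE messages FORCE ROW LEVEL SECURITY;

-- current_setting(name, true) returns NULL instead of raising when the
-- setting was never defined, and NULLIF maps an empty (reset) value to
-- NULL. A NULL tenant matches no rows, so a missing context fails closed.
CREATE POLICY tenant_isolation_conversations ON conversations
    USING (tenant_id = NULLIF(current_setting('app.current_tenant', true), '')::UUID);

CREATE POLICY tenant_isolation_messages ON messages
    USING (tenant_id = NULLIF(current_setting('app.current_tenant', true), '')::UUID);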

-- Index for tenant-scoped queries
CREATE INDEX idx_messages_tenant_conv
    ON messages (tenant_id, conversation_id, created_at);
"""
```

Set the tenant context on every database connection before executing queries:

```python
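from contextlib import asynccontextmanager

# db_pool is assumed to be an asyncpg.Pool created at startup, connecting as
# an unprivileged role (not a superuser or a role with BYPASSRLS).

@asynccontextmanager
async def tenant_connection(tenant_id: str):
    async with db_pool.acquire() as conn:
        # is_local=true scopes the setting to this transaction, so it is
        # discarded at COMMIT/ROLLBACK and cannot leak to the next request
        # that borrows this pooled connection.
        async with conn.transaction():
            # SET does not accept bind parameters, so use set_config() and
            # pass tenant_id as $1. Interpolating it into the SQL text
            # (f"SET ... = '{tenant_id}'") is an injection hole.
            await conn.execute(
                "SELECT set_config('app.current_tenant', $1, true)",
                tenant_id,
            )
            yield conn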

# Usage
async def get_conversation_history(
    tenant_id: str, conversation_id: str
) -> list:
    async with tenant_connection(tenant_id) as conn:
        # RLS automatically filters to this tenant
        rows = await conn.fetch(
            "SELECT role, content FROM messages "
            "WHERE conversation_id = $1 ORDER BY created_at",
            conversation_id,
        )
        return [dict(r) for r in rows]
```

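Even if a bug in your application code passes the wrong conversation ID, RLS makes the query return zero rows rather than another tenant's data. That guarantee holds only when the application connects as a role that is neither a superuser nor granted `BYPASSRLS`, and when the tables use `FORCE ROW LEVEL SECURITY` (or are owned by a different role), because table owners bypass RLS by default.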
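RLS is only as reliable as the tenant ID the application sets. A minimal sketch of one way to carry that ID through a request follows, using Python's standard `contextvars` module; `tenant_scope`, `require_tenant`, and `MissingTenantError` are hypothetical helpers, not part of any library. Parsing the ID as a UUID at the edge rejects malformed input early, and code deeper in the stack asks for the tenant instead of trusting a parameter passed down by every caller.

```python
import contextvars
import uuid
from contextlib import contextmanager
from typing import Iterator, Optional

# Hypothetical request-scoped holder for the current tenant. Each asyncio
# task runs in a copy of the context, so concurrent requests do not see
# each other's value.
_current_tenant: contextvars.ContextVar[Optional[uuid.UUID]] = (
    contextvars.ContextVar("current_tenant", default=None)
)


class MissingTenantError(RuntimeError):
    """Raised when tenant-scoped code runs with no tenant in context."""


@contextmanager
def tenant_scope(tenant_id: str) -> Iterator[uuid.UUID]:
    # uuid.UUID raises ValueError for anything that is not a UUID, so
    # malformed or hostile input is rejected before it reaches SQL.
    tid = uuid.UUID(tenant_id)
    token = _current_tenant.set(tid)
    try:
        yield tid
    finally:
        # Restore whatever was set before, so the tenant cannot leak
        # past the end of this scope.
        _current_tenant.reset(token)


def require_tenant() -> uuid.UUID:
    """Return the current tenant, or fail closed if none is set."""
    tid = _current_tenant.get()
    if tid is None:
        raise MissingTenantError("no tenant in context; refusing to query")
    return tid
```

A handler would wrap its work in `with tenant_scope(...)`, taking the ID from the authenticated session or API key rather than from the request body, and data-access code would call `tenant_connection(str(require_tenant()))`.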

## Resource Quotas and Rate Limiting

Each tenant needs resource limits to prevent one customer from consuming all capacity. Implement tiered quotas based on the customer's plan:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class TenantQuota:
    messages_per_minute: int
    messages_per_day: int
    max_tokens_per_message: int
    max_concurrent_sessions: int
    monthly_token_budget: int

PLAN_QUOTAS = {
    "free": TenantQuota(
        messages_per_minute=10,
        messages_per_day=100,
        max_tokens_per_message=2000,
        max_concurrent_sessions=5,
        monthly_token_budget=500_000,
    ),
    "pro": TenantQuota(
        messages_per_minute=60,
        messages_per_day=5000,
        max_tokens_per_message=8000,
        max_concurrent_sessions=50,
        monthly_token_budget=10_000_000,
    ),
    "enterprise": TenantQuota(
        messages_per_minute=300,
        messages_per_day=50000,
        max_tokens_per_message=16000,
        max_concurrent_sessions=500,
        monthly_token_budget=100_000_000,
    ),
}

class QuotaEnforcer:
    def __init__(self, redis_client):
        # An async client, e.g. redis.asyncio.Redis
        self.redis = redis_client

    async def check_quota(self, tenant_id: str, plan: str) -> bool:
        quota = PLAN_QUOTAS[plan]

        # Per-minute limit: a fixed-window counter whose key expires 60 s
        # after the first request in the window. INCR and EXPIRE are two
        # round trips; wrap them in a MULTI/EXEC pipeline if a crash
        # between them must never leave a key without a TTL.
        minute_key = f"rate:{tenant_id}:minute"
        current = await self.redis.incr(minute_key)
        if current == 1:
            await self.redis.expire(minute_key, 60)
        if current > quota.messages_per_minute:
            return False

        # Daily limit: one counter per UTC calendar day
        today = datetime.now(timezone.utc).strftime("%Y-%m-%d")
        day_key = f"rate:{tenant_id}:day:{today}"
        daily = await self.redis.incr(day_key)
        if daily == 1:
            await self.redis.expire(day_key, 86400)
        if daily > quota.messages_per_day:
            return False

        return True
```
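The per-minute Redis counter above is a fixed window: a burst at the end of one window followed by another at the start of the next can admit close to twice `messages_per_minute` in a short span. If that matters, a sliding-window log counts only requests from the last 60 seconds. The sketch below is single-process and in-memory, and `SlidingWindowLimiter` is a hypothetical class; across several workers the same idea is commonly built on a Redis sorted set (timestamps as scores, trimmed with `ZREMRANGEBYSCORE` and counted with `ZCARD`).

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Sliding-window log limiter for a single process (hypothetical helper)."""

    def __init__(
        self,
        limit: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        # tenant_id -> timestamps of accepted requests, oldest first
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, tenant_id: str) -> bool:
        now = self.clock()
        events = self._events[tenant_id]
        # Evict timestamps that have slid out of the window.
        while events and events[0] <= now - self.window:
            events.popleft()
        if len(events) >= self.limit:
            return False  # rejected requests are not recorded
        events.append(now)
        return True
```

Memory grows with `limit` per active tenant, which is one reason many deployments either accept the fixed window or approximate a sliding window by blending the current and previous window counts.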

## Tenant-Specific Agent Configuration

Each tenant configures their agent differently — custom system prompts, enabled tools, model preferences, branding. Store this configuration separately and load it per request:

```python
import json


class TenantAgentConfig:
    def __init__(self, redis_client, db_pool):
        self.redis = redis_client
        self.db = db_pool

    async def get_config(self, tenant_id: str) -> dict:
        cache_key = f"tenant:config:{tenant_id}"
        cached = await self.redis.get(cache_key)
        if cached:
            return json.loads(cached)

        async with tenant_connection(tenant_id) as conn:
            config = await conn.fetchrow(
                "SELECT system_prompt, model, enabled_tools, "
                "temperature, max_turns FROM agent_configs "
                "WHERE tenant_id = $1 AND active = true",
                tenant_id,
            )

        if config is None:
            # Fail loudly instead of calling dict(None) on a missing row.
            raise LookupError(f"no active agent config for tenant {tenant_id}")

        config_dict = dict(config)
        # Cached for 300 s: delete this key whenever the tenant edits their
        # config, or changes can take up to five minutes to apply.
        await self.redis.setex(cache_key, 300, json.dumps(config_dict))
        return config_dict
```
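Because the config drives what the agent may do, it is worth validating before use, whether it came from the cache or the database. A hedged sketch, assuming a hypothetical platform-wide tool allowlist (`PLATFORM_TOOLS`), a hypothetical `parse_agent_config` helper, and a temperature range of 0 to 2 (the range some LLM APIs accept; adjust to your provider):

```python
from dataclasses import dataclass
from typing import Any, Mapping, Tuple

# Hypothetical platform-wide allowlist: tenants choose from these, nothing else.
PLATFORM_TOOLS = frozenset({"search_kb", "create_ticket", "lookup_order"})


@dataclass(frozen=True)
class AgentSettings:
    system_prompt: str
    model: str
    enabled_tools: Tuple[str, ...]
    temperature: float
    max_turns: int


def parse_agent_config(row: Mapping[str, Any]) -> AgentSettings:
    """Validate a cached or freshly loaded config dict before using it."""
    # float() also accepts Decimal, which asyncpg returns for NUMERIC columns.
    temperature = float(row["temperature"])
    if not 0.0 <= temperature <= 2.0:  # assumed provider range
        raise ValueError(f"temperature out of range: {temperature}")
    max_turns = int(row["max_turns"])
    if max_turns < 1:
        raise ValueError(f"max_turns must be at least 1, got {max_turns}")
    requested = set(row.get("enabled_tools") or [])
    unknown = requested - PLATFORM_TOOLS
    if unknown:
        raise ValueError(f"tools not offered by the platform: {sorted(unknown)}")
    return AgentSettings(
        system_prompt=row["system_prompt"],
        model=row["model"],
        enabled_tools=tuple(sorted(requested)),
        temperature=temperature,
        max_turns=max_turns,
    )
```

Returning a frozen dataclass means downstream code cannot mutate one tenant's settings in place, and a bad row fails at load time rather than midway through a conversation.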

## Per-Tenant Billing with Token Tracking

Track every LLM API call with the tenant ID to enable accurate billing:

```python
class UsageMeter:
    def __init__(self, db_pool):
        self.db = db_pool

    async def record_usage(
        self,
        tenant_id: str,
        model: str,
        input_tokens: int,
        output_tokens: int,
        conversation_id: str,
    ):
        async with self.db.acquire() as conn:
            await conn.execute(
                "INSERT INTO usage_records "
                "(tenant_id, model, input_tokens, output_tokens, "
                "conversation_id, cost_cents, recorded_at) "
                "VALUES ($1, $2, $3, $4, $5, $6, NOW())",
                tenant_id,
                model,
                input_tokens,
                output_tokens,
                conversation_id,
                self._calculate_cost(model, input_tokens, output_tokens),
            )

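    def _calculate_cost(
        self, model: str, input_tokens: int, output_tokens: int
    ) -> float:
        """Return the cost in cents, matching the cost_cents column."""
        # USD per 1M tokens (input, output). These are OpenAI list prices
        # for these models; verify current pricing before billing anyone.
        rates = {
            "gpt-4o-mini": (0.15, 0.60),
            "gpt-4o": (2.50, 10.00),
        }
        # Unknown models are billed at gpt-4o rates rather than for free.
        input_rate, output_rate = rates.get(model, rates["gpt-4o"])
        usd = (
            input_tokens * input_rate + output_tokens * output_rate
        ) / 1_000_000
        return usd * 100  # fractional cents; store in a NUMERIC column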
```
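Per-record costs are often fractions of a cent: a call with 1,000 input and 1,000 output tokens on gpt-4o-mini costs 0.075 cents at $0.15/$0.60 per 1M tokens, so rounding each row would bill many small calls as zero. In production the monthly total is typically a `SUM(...) GROUP BY tenant_id` over `usage_records`; the in-memory sketch below shows the same arithmetic with exact `Decimal` amounts, rounding once per tenant. `cost_cents`, `invoice_cents`, and the choice of `ROUND_HALF_UP` are illustrative assumptions, not a billing standard.

```python
from collections import defaultdict
from decimal import ROUND_HALF_UP, Decimal
from typing import Dict, Iterable, Tuple

# USD per 1M tokens (input, output); list prices, verify before use.
RATES_USD_PER_1M = {
    "gpt-4o-mini": (Decimal("0.15"), Decimal("0.60")),
    "gpt-4o": (Decimal("2.50"), Decimal("10.00")),
}

# One usage row: (tenant_id, model, input_tokens, output_tokens)
UsageRow = Tuple[str, str, int, int]


def cost_cents(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Exact cost in (fractional) cents; unknown models raise KeyError."""
    input_rate, output_rate = RATES_USD_PER_1M[model]
    usd = (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000
    return usd * 100


def invoice_cents(rows: Iterable[UsageRow]) -> Dict[str, int]:
    """Sum exact costs per tenant, then round once to whole cents."""
    totals: Dict[str, Decimal] = defaultdict(Decimal)
    for tenant_id, model, input_tokens, output_tokens in rows:
        totals[tenant_id] += cost_cents(model, input_tokens, output_tokens)
    return {
        tenant_id: int(total.quantize(Decimal("1"), rounding=ROUND_HALF_UP))
        for tenant_id, total in totals.items()
    }
```

For example, ten such 0.075-cent calls total 0.75 cents and invoice as 1 cent, whereas rounding each call first would invoice 0.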

## FAQ

### Should I use a shared database or separate databases per tenant?

Use a shared database with row-level security for most cases. It is simpler to manage, migrate, and back up. Use separate databases only for enterprise customers with strict compliance requirements (healthcare, finance) or when a single tenant's data volume justifies dedicated infrastructure.
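One hedged way to mix both models is to resolve a database per tenant at request time. In this sketch, `TenantPlacement`, `resolve_dsn`, and the DSN strings are hypothetical; the key design choice is failing closed, so a tenant flagged for isolation never silently lands in the shared database.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical DSN for the shared, RLS-protected cluster.
SHARED_DSN = "postgresql://agent_app@shared-db:5432/agents"


@dataclass(frozen=True)
class TenantPlacement:
    tenant_id: str
    requires_isolation: bool = False  # e.g. set after a compliance review
    dedicated_dsn: Optional[str] = None  # provisioned separately per tenant


def resolve_dsn(tenant: TenantPlacement) -> str:
    """Pick the database a tenant's queries should go to."""
    if tenant.dedicated_dsn:
        return tenant.dedicated_dsn
    if tenant.requires_isolation:
        # Fail closed: never fall back to the shared database for a tenant
        # whose data must stay on dedicated infrastructure.
        raise RuntimeError(f"tenant {tenant.tenant_id} has no dedicated database yet")
    return SHARED_DSN
```

If dedicated databases keep the same schema and RLS policies, query code stays identical; only the pool a request draws from changes.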

### How do I prevent one tenant's agent from accidentally accessing another tenant's tools?

Load the tool configuration per-tenant at request time and only register the tools that tenant has enabled. Never use a global tool registry shared across tenants. If tools access external APIs, use tenant-specific API keys stored encrypted in the database.
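That answer can be sketched as a registry built fresh for every request from the tenant's enabled tools. The tool names, `PLATFORM_TOOL_FACTORIES`, and `build_tool_registry` are hypothetical; the point is that each tool is a closure bound to that tenant's decrypted credential, so no object shared across tenants ever holds a key.

```python
from typing import Callable, Dict, Iterable, Mapping

Tool = Callable[[str], str]
# A factory takes one tenant's decrypted credential and returns a bound tool.
ToolFactory = Callable[[str], Tool]


def _make_lookup_order(api_key: str) -> Tool:
    def lookup_order(order_id: str) -> str:
        # Placeholder: a real tool would call the tenant's order API with api_key.
        return f"order {order_id} via key ...{api_key[-4:]}"
    return lookup_order


def _make_search_kb(api_key: str) -> Tool:
    def search_kb(query: str) -> str:
        # Placeholder: a real tool would query the tenant's own knowledge base.
        return f"kb results for {query!r} via key ...{api_key[-4:]}"
    return search_kb


# Hypothetical catalogue of tools the platform offers.
PLATFORM_TOOL_FACTORIES: Dict[str, ToolFactory] = {
    "lookup_order": _make_lookup_order,
    "search_kb": _make_search_kb,
}


def build_tool_registry(
    enabled: Iterable[str], secrets: Mapping[str, str]
) -> Dict[str, Tool]:
    """Build a fresh registry for one request with only this tenant's tools."""
    registry: Dict[str, Tool] = {}
    for name in enabled:
        factory = PLATFORM_TOOL_FACTORIES.get(name)
        if factory is None:
            raise ValueError(f"tool not offered by the platform: {name}")
        if name not in secrets:
            raise KeyError(f"tenant has no credential for tool: {name}")
        registry[name] = factory(secrets[name])
    return registry
```

The registry is discarded with the request, so a tool enabled for one tenant cannot be reached from another tenant's agent even if both run in the same process.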

### What happens when a tenant exceeds their quota?

Return a 429 status code with a `Retry-After` header indicating when they can resume. For soft limits (approaching the monthly budget), send a notification to the tenant admin and optionally downgrade to a cheaper model rather than hard-blocking. For hard limits (daily rate limits), block immediately to protect infrastructure.
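A hedged sketch of that policy as a pure function follows. `decide`, `QuotaDecision`, the 90% soft threshold, and the downgrade from `gpt-4o` to `gpt-4o-mini` are assumptions for illustration, as is treating a fully spent monthly budget as a hard block. `Retry-After` is computed as the seconds until the UTC day (or month) rolls over, which matches a daily counter keyed by UTC date.

```python
import math
from dataclasses import dataclass, field
from datetime import datetime, time, timedelta, timezone
from typing import Dict, Optional

SOFT_LIMIT_RATIO = 0.9  # hypothetical: alert and downgrade at 90% of budget
CHEAPER_MODEL = {"gpt-4o": "gpt-4o-mini"}  # hypothetical downgrade map


@dataclass(frozen=True)
class QuotaDecision:
    status: int  # HTTP status code to return
    headers: Dict[str, str] = field(default_factory=dict)
    model: Optional[str] = None  # model to run if the request proceeds
    notify_admin: bool = False


def next_utc_day(now: datetime) -> datetime:
    return datetime.combine(
        now.date() + timedelta(days=1), time.min, tzinfo=timezone.utc
    )


def next_utc_month(now: datetime) -> datetime:
    if now.month == 12:
        return datetime(now.year + 1, 1, 1, tzinfo=timezone.utc)
    return datetime(now.year, now.month + 1, 1, tzinfo=timezone.utc)


def retry_after(now: datetime, boundary: datetime) -> Dict[str, str]:
    # Retry-After takes whole seconds; round up so clients never retry early.
    return {"Retry-After": str(max(1, math.ceil((boundary - now).total_seconds())))}


def decide(
    model: str,
    daily_limited: bool,
    tokens_used: int,
    monthly_budget: int,
    now: datetime,
) -> QuotaDecision:
    now = now.astimezone(timezone.utc)  # expects a timezone-aware datetime
    # Hard limit: block immediately until the daily counter rolls over.
    if daily_limited:
        return QuotaDecision(429, retry_after(now, next_utc_day(now)))
    # Assumption: a fully spent monthly budget is also a hard block.
    if tokens_used >= monthly_budget:
        return QuotaDecision(
            429, retry_after(now, next_utc_month(now)), notify_admin=True
        )
    # Soft limit: keep serving on a cheaper model and alert the tenant admin.
    if tokens_used >= SOFT_LIMIT_RATIO * monthly_budget:
        return QuotaDecision(
            200, model=CHEAPER_MODEL.get(model, model), notify_admin=True
        )
    return QuotaDecision(200, model=model)
```

Keeping the decision free of I/O makes the policy easy to unit-test; the caller is responsible for reading counters, sending the notification, and writing the HTTP response.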


---

Source: https://callsphere.ai/blog/multi-tenant-ai-agent-platform-isolation-billing
