---
title: "Cost Tracking for AI Agents: Per-User, Per-Feature Token Usage Analytics"
description: "Build a complete cost tracking system for AI agents that attributes token usage to individual users and features, sets budget alerts, and provides dashboards for controlling LLM spend in production."
canonical: https://callsphere.ai/blog/cost-tracking-ai-agents-per-user-token-usage-analytics
category: "Learn Agentic AI"
tags: ["Cost Tracking", "Token Usage", "Analytics", "AI Agents", "Budget Management"]
author: "CallSphere Team"
published: 2026-03-17T00:00:00.000Z
updated: 2026-05-07T15:03:40.367Z
---

# Cost Tracking for AI Agents: Per-User, Per-Feature Token Usage Analytics

> Build a complete cost tracking system for AI agents that attributes token usage to individual users and features, sets budget alerts, and provides dashboards for controlling LLM spend in production.

## Why Cost Tracking Is Critical for Production Agents

LLM costs scale with usage in ways that are easy to underestimate. A single GPT-4o call might cost fractions of a cent, but an agent that makes three LLM calls per user message — one for routing, one for the specialist, one for summarization — multiplied by thousands of daily users creates a bill that grows faster than most teams expect. Without per-user, per-feature cost attribution, you cannot answer basic questions: Which users drive the most cost? Which agent features are expensive relative to their value? Are costs growing faster than revenue?
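To make that growth concrete, here is a back-of-envelope estimate using hypothetical traffic numbers (three calls per message, 5,000 daily users, and an assumed blended cost per call — adjust all of these to your own workload):

```python
# Back-of-envelope agent spend estimate (all inputs are assumptions)
calls_per_message = 3            # routing + specialist + summarization
messages_per_user_per_day = 10
daily_users = 5_000
avg_cost_per_call_usd = 0.01     # assumed blended average across models

daily_cost = calls_per_message * messages_per_user_per_day * daily_users * avg_cost_per_call_usd
monthly_cost = daily_cost * 30

print(f"${daily_cost:,.0f}/day, ${monthly_cost:,.0f}/month")  # $1,500/day, $45,000/month
```

Even with sub-cent calls, the multiplication across calls, messages, and users is what surprises teams — which is exactly what per-user, per-feature attribution lets you decompose.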

A cost tracking system captures token usage at the call level, attributes it to users and features, stores it for analysis, and alerts when budgets are at risk.

## The Token Usage Data Model

Start with a database table that records every LLM call with enough context for flexible analysis.

```mermaid
flowchart LR
    subgraph CAPTURE["Capture"]
        A["LLM call"] --> B["Token counts
+ computed cost"]
    end
    subgraph STORE["Store"]
        C[("token_usage
table")]
    end
    subgraph USE["Analyze & Act"]
        D["Per-feature
analytics"]
        E["Budget
alerts"]
        F["Dashboard
API"]
    end
    B --> C
    C --> D
    C --> E
    C --> F
    style C fill:#4f46e5,stroke:#4338ca,color:#fff
    style E fill:#059669,stroke:#047857,color:#fff
```

```python
# SQLAlchemy model for token usage tracking
from sqlalchemy import Column, String, Integer, Float, DateTime, Index
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.orm import declarative_base
from datetime import datetime
import uuid

Base = declarative_base()

class TokenUsage(Base):
    __tablename__ = "token_usage"

    id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    timestamp = Column(DateTime, nullable=False, default=datetime.utcnow)
    user_id = Column(String, nullable=False, index=True)
    conversation_id = Column(String, nullable=False, index=True)
    agent_name = Column(String, nullable=False)
    feature = Column(String, nullable=False)  # e.g., "routing", "support", "summarization"
    model = Column(String, nullable=False)
    prompt_tokens = Column(Integer, nullable=False)
    completion_tokens = Column(Integer, nullable=False)
    total_tokens = Column(Integer, nullable=False)
    cost_usd = Column(Float, nullable=False)

    __table_args__ = (
        Index("idx_usage_user_timestamp", "user_id", "timestamp"),
        Index("idx_usage_feature_timestamp", "feature", "timestamp"),
    )
```

## Recording Token Usage from LLM Calls

Wrap your LLM client to automatically record usage after every call. Maintain a pricing table that maps models to per-token costs.

```python
MODEL_PRICING = {
    # model: (cost_per_prompt_token, cost_per_completion_token)
    "gpt-4o": (0.0000025, 0.00001),
    "gpt-4o-mini": (0.00000015, 0.0000006),
    "claude-sonnet-4-20250514": (0.000003, 0.000015),
    "claude-3-5-haiku-20241022": (0.0000008, 0.000004),
}

def calculate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    # Fall back to conservative Sonnet-level pricing for unrecognized models
    pricing = MODEL_PRICING.get(model, (0.000003, 0.000015))
    return (prompt_tokens * pricing[0]) + (completion_tokens * pricing[1])

async def tracked_llm_call(
    model: str,
    messages: list,
    user_id: str,
    conversation_id: str,
    feature: str,
    agent_name: str,
    db_session,
):
    # `llm_client` is your configured async client (e.g., AsyncOpenAI)
    response = await llm_client.chat.completions.create(
        model=model, messages=messages
    )

    usage = response.usage
    cost = calculate_cost(model, usage.prompt_tokens, usage.completion_tokens)

    record = TokenUsage(
        user_id=user_id,
        conversation_id=conversation_id,
        agent_name=agent_name,
        feature=feature,
        model=model,
        prompt_tokens=usage.prompt_tokens,
        completion_tokens=usage.completion_tokens,
        total_tokens=usage.total_tokens,
        cost_usd=cost,
    )
    db_session.add(record)
    await db_session.commit()

    return response
```

## Building Usage Analytics Queries

With usage data in PostgreSQL, you can answer the key cost questions with straightforward SQL.

```python
from sqlalchemy import func, text
from datetime import datetime, timedelta

async def get_daily_cost_by_feature(db_session, days: int = 30):
    """Cost per feature per day for the last N days."""
    cutoff = datetime.utcnow() - timedelta(days=days)
    result = await db_session.execute(
        text("""
            SELECT
                date_trunc('day', timestamp) AS day,
                feature,
                SUM(cost_usd) AS total_cost,
                SUM(total_tokens) AS total_tokens,
                COUNT(*) AS call_count
            FROM token_usage
            WHERE timestamp >= :cutoff
            GROUP BY day, feature
            ORDER BY day DESC, total_cost DESC
        """),
        {"cutoff": cutoff},
    )
    return result.fetchall()

async def get_top_users_by_cost(db_session, limit: int = 20):
    """Top N users by total LLM cost in the current month."""
    month_start = datetime.utcnow().replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    result = await db_session.execute(
        text("""
            SELECT
                user_id,
                SUM(cost_usd) AS total_cost,
                SUM(total_tokens) AS total_tokens,
                COUNT(DISTINCT conversation_id) AS conversations
            FROM token_usage
            WHERE timestamp >= :month_start
            GROUP BY user_id
            ORDER BY total_cost DESC
            LIMIT :limit
        """),
        {"month_start": month_start, "limit": limit},
    )
    return result.fetchall()
```
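On the application side, a small helper can reshape those rows for charting. This is a sketch that assumes rows shaped like the `get_daily_cost_by_feature` result, i.e. `(day, feature, total_cost, ...)` tuples:

```python
from collections import defaultdict

def pivot_by_feature(rows):
    """Group (day, feature, total_cost, ...) rows into per-feature time series."""
    series = defaultdict(dict)
    for day, feature, total_cost, *_rest in rows:
        series[feature][day] = total_cost
    return dict(series)

# Example with illustrative rows:
rows = [
    ("2026-03-01", "routing", 1.2, 12_000, 50),
    ("2026-03-01", "support", 3.4, 30_000, 50),
    ("2026-03-02", "routing", 1.5, 14_000, 55),
]
print(pivot_by_feature(rows)["routing"])  # {'2026-03-01': 1.2, '2026-03-02': 1.5}
```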

## Budget Alerts

Check user and global budgets after every LLM call. When a threshold is exceeded, send alerts and optionally throttle the user.

```python
from datetime import datetime
from sqlalchemy import text

MONTHLY_BUDGET_USD = 5000.0
PER_USER_DAILY_LIMIT_USD = 2.0

class BudgetExceededError(Exception):
    """Raised when a user exceeds their spend limit."""

async def check_budgets(user_id: str, db_session):
    """Check both global and per-user budgets after each call."""
    # Check per-user daily spend
    today_start = datetime.utcnow().replace(hour=0, minute=0, second=0, microsecond=0)
    user_result = await db_session.execute(
        text("""
            SELECT COALESCE(SUM(cost_usd), 0)
            FROM token_usage
            WHERE user_id = :user_id AND timestamp >= :today_start
        """),
        {"user_id": user_id, "today_start": today_start},
    )
    user_daily_cost = user_result.scalar()

    if user_daily_cost >= PER_USER_DAILY_LIMIT_USD:
        # send_alert posts to your alerting channel (Slack, PagerDuty, etc.)
        await send_alert(
            severity="warning",
            message=f"User {user_id} exceeded daily limit: ${user_daily_cost:.2f}",
        )
        raise BudgetExceededError(f"Daily usage limit reached for user {user_id}")

    # Check global monthly spend
    month_start = datetime.utcnow().replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    global_result = await db_session.execute(
        text("SELECT COALESCE(SUM(cost_usd), 0) FROM token_usage WHERE timestamp >= :month_start"),
        {"month_start": month_start},
    )
    monthly_cost = global_result.scalar()

    if monthly_cost >= MONTHLY_BUDGET_USD * 0.8:
        await send_alert(
            severity="critical",
            message=f"Monthly budget 80% consumed: ${monthly_cost:.2f} / ${MONTHLY_BUDGET_USD}",
        )
```
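Throttling does not have to be all-or-nothing. One soft-throttling option, sketched below with hypothetical thresholds, is to downgrade to a cheaper model once a user passes half their daily budget and only hard-fail at the limit itself:

```python
def select_model(user_daily_cost_usd: float, daily_limit_usd: float = 2.0) -> str:
    """Pick a model tier based on how much of the user's daily budget is spent."""
    if user_daily_cost_usd >= daily_limit_usd:
        # At the hard limit: refuse the call entirely
        raise RuntimeError("daily usage limit reached")
    if user_daily_cost_usd >= daily_limit_usd * 0.5:
        # Past half the budget: degrade gracefully to a cheaper tier
        return "gpt-4o-mini"
    return "gpt-4o"

print(select_model(0.10))  # gpt-4o
print(select_model(1.50))  # gpt-4o-mini
```

This keeps heavy users functional while capping their worst-case spend, at the cost of somewhat lower response quality near the limit.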

## Exposing a Cost Dashboard API

Serve the analytics data through a FastAPI endpoint so your dashboard frontend can display it.

```python
from fastapi import APIRouter, Depends

# `get_db` is your app's dependency that yields an async database session
router = APIRouter(prefix="/api/costs")

@router.get("/daily-by-feature")
async def daily_costs(days: int = 30, db=Depends(get_db)):
    rows = await get_daily_cost_by_feature(db, days)
    return [
        {"day": str(r.day.date()), "feature": r.feature,
         "cost": round(r.total_cost, 4), "tokens": r.total_tokens}
        for r in rows
    ]

@router.get("/top-users")
async def top_users(limit: int = 20, db=Depends(get_db)):
    rows = await get_top_users_by_cost(db, limit)
    return [
        {"user_id": r.user_id, "cost": round(r.total_cost, 4),
         "tokens": r.total_tokens, "conversations": r.conversations}
        for r in rows
    ]
```

## FAQ

### How accurate is token-based cost tracking compared to the actual invoice?

Token-based tracking is typically within 2-5% of the actual invoice. Discrepancies come from retries that consume tokens before failing, cached completions that some providers discount, and rounding differences. Reconcile your tracked costs against the provider invoice monthly and adjust your pricing table if needed.
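A tiny helper makes that monthly reconciliation check explicit (the function name and threshold here are illustrative):

```python
def invoice_drift_pct(tracked_usd: float, invoice_usd: float) -> float:
    """Percent gap between tracked spend and the provider invoice."""
    return abs(tracked_usd - invoice_usd) / invoice_usd * 100

# e.g. tracked $970 against a $1,000 invoice:
drift = invoice_drift_pct(970.0, 1000.0)
print(f"{drift:.1f}%")  # 3.0% — within the typical 2-5% band
```

If drift consistently exceeds roughly 5%, suspect a stale pricing table or untracked calls (retries, background jobs) rather than rounding.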

### Should I track costs synchronously or asynchronously?

Use asynchronous recording. Write the usage record to a queue or background task so it does not add latency to the user response. A simple approach is to use `asyncio.create_task()` to fire the database write without awaiting it in the request path. For high-throughput systems, batch writes via a message queue like Redis Streams or Kafka.
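A minimal fire-and-forget sketch with `asyncio.create_task()`, with an in-memory list standing in for the real database write:

```python
import asyncio

_pending: set = set()   # strong references so in-flight tasks are not GC'd
written: list = []      # stand-in for the database; illustrative only

async def write_usage(record: dict) -> None:
    written.append(record)  # replace with the real async DB insert

def record_usage_nowait(record: dict) -> None:
    # Schedule the write without awaiting it in the request path
    task = asyncio.create_task(write_usage(record))
    _pending.add(task)
    task.add_done_callback(_pending.discard)
```

Keeping a reference in `_pending` matters: `create_task` results that nobody holds can be garbage-collected before the write completes, silently dropping usage records.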

### How do I handle cost tracking when the agent retries a failed LLM call?

Track every attempt, including retries. Each attempt consumes tokens and incurs cost, even if the response is discarded. Add a `retry_attempt` field to your usage table so you can analyze retry rates and their cost impact separately from successful first-attempt calls.
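A sketch of a retry wrapper that records one usage entry per attempt — the `call` argument and the record shape are illustrative; in practice each entry would be persisted with a `retry_attempt` value:

```python
import asyncio

async def call_with_attempt_tracking(call, max_attempts: int = 3):
    """Run `call`, recording a usage entry for every attempt, retries included."""
    attempts = []
    for attempt in range(1, max_attempts + 1):
        try:
            result = await call()
            attempts.append({"retry_attempt": attempt, "succeeded": True})
            return result, attempts
        except Exception:
            # Failed attempts still consumed tokens: record them too
            attempts.append({"retry_attempt": attempt, "succeeded": False})
            if attempt == max_attempts:
                raise
```

Summing cost where `retry_attempt > 1` then tells you exactly how much of your bill is retry overhead.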

---

#CostTracking #TokenUsage #Analytics #AIAgents #BudgetManagement #AgenticAI #LearnAI #AIEngineering

---

Source: https://callsphere.ai/blog/cost-tracking-ai-agents-per-user-token-usage-analytics
