---
title: "Token Usage Analytics: Understanding and Optimizing LLM Consumption Patterns"
description: "Learn how to track token consumption across AI agents, attribute costs to specific features and users, identify usage trends, and implement optimization strategies that reduce LLM spend without sacrificing quality."
canonical: https://callsphere.ai/blog/token-usage-analytics-understanding-optimizing-llm-consumption-patterns
category: "Learn Agentic AI"
tags: ["Token Usage", "Cost Optimization", "LLM", "Analytics", "AI Agents"]
author: "CallSphere Team"
published: 2026-03-17T00:00:00.000Z
updated: 2026-05-08T14:03:45.751Z
---

# Token Usage Analytics: Understanding and Optimizing LLM Consumption Patterns

> Learn how to track token consumption across AI agents, attribute costs to specific features and users, identify usage trends, and implement optimization strategies that reduce LLM spend without sacrificing quality.

## Why Token Usage Analytics Matter

LLM costs are directly tied to token consumption. A single agent conversation might use anywhere from 500 to 50,000 tokens depending on context length, tool calls, and conversation depth. Without granular tracking, you cannot answer basic questions: Which agent costs the most? Which conversations are outliers? Is your cost per resolution trending up or down?

Token analytics transform LLM spending from an opaque monthly bill into a controllable, optimizable metric.

## Capturing Token Data

Every LLM API response includes token usage information. The key is capturing this data consistently and attaching it to the right context: the conversation, the agent, and the specific step within the agent loop.

```mermaid
flowchart LR
    subgraph IN["LLM API Response"]
        I1["Prompt tokens"]
        I2["Completion tokens"]
        I3["Model name"]
    end
    subgraph CAP["Token Tracker Captures"]
        C1["Usage attached to
conversation, agent, step"]
        C2["Cost computed from
model pricing"]
    end
    subgraph OUT["Outputs"]
        O1["Cost per agent"]
        O2["Cost per
conversation"]
        O3((Daily usage
trends))
    end
    I1 --> C1
    I2 --> C1
    I3 --> C2
    C1 --> O1
    C1 --> O2
    C2 --> O1 --> O3
    C2 --> O2 --> O3
    style C1 fill:#4f46e5,stroke:#4338ca,color:#fff
    style C2 fill:#4f46e5,stroke:#4338ca,color:#fff
    style O3 fill:#059669,stroke:#047857,color:#fff
```

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from openai import OpenAI

client = OpenAI()

@dataclass
class TokenRecord:
    conversation_id: str
    agent_name: str
    model: str
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    step_type: str = ""  # "main_response", "tool_call", "classification"
    cost_usd: float = 0.0

MODEL_PRICING = {
    "gpt-4o": {"input": 2.50 / 1_000_000, "output": 10.00 / 1_000_000},
    "gpt-4o-mini": {"input": 0.15 / 1_000_000, "output": 0.60 / 1_000_000},
    "gpt-4.1": {"input": 2.00 / 1_000_000, "output": 8.00 / 1_000_000},
    "gpt-4.1-mini": {"input": 0.40 / 1_000_000, "output": 1.60 / 1_000_000},
}

def calculate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    pricing = MODEL_PRICING.get(model, {"input": 0, "output": 0})
    return (
        prompt_tokens * pricing["input"]
        + completion_tokens * pricing["output"]
    )
```
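As a sanity check on the formula, here is the arithmetic for a typical gpt-4o call, using the rates from the pricing table above:

```python
# Per-token rates for gpt-4o (USD), matching MODEL_PRICING above
INPUT_RATE = 2.50 / 1_000_000
OUTPUT_RATE = 10.00 / 1_000_000

# A call with 10,000 prompt tokens and 1,000 completion tokens
cost = 10_000 * INPUT_RATE + 1_000 * OUTPUT_RATE
print(f"${cost:.4f}")  # $0.0350
```

Note that the input side ($0.025) outweighs the output side ($0.010) here, which is typical for agents carrying long context windows.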

## Building a Token Tracker

A centralized tracker wraps every LLM call, records token usage, and provides aggregation methods.

```python
from collections import defaultdict
import json

class TokenTracker:
    def __init__(self):
        self.records: list[TokenRecord] = []
        self._by_conversation: dict[str, list[TokenRecord]] = defaultdict(list)
        self._by_agent: dict[str, list[TokenRecord]] = defaultdict(list)

    def record(self, rec: TokenRecord) -> None:
        rec.cost_usd = calculate_cost(
            rec.model, rec.prompt_tokens, rec.completion_tokens
        )
        self.records.append(rec)
        self._by_conversation[rec.conversation_id].append(rec)
        self._by_agent[rec.agent_name].append(rec)

    def tracked_completion(
        self, conversation_id: str, agent_name: str,
        step_type: str, **kwargs
    ) -> dict:
        response = client.chat.completions.create(**kwargs)
        usage = response.usage
        rec = TokenRecord(
            conversation_id=conversation_id,
            agent_name=agent_name,
            model=kwargs.get("model", "unknown"),
            prompt_tokens=usage.prompt_tokens,
            completion_tokens=usage.completion_tokens,
            total_tokens=usage.total_tokens,
            step_type=step_type,
        )
        self.record(rec)
        return {
            "response": response,
            "tokens": rec,
        }

    def cost_by_agent(self) -> dict[str, float]:
        return {
            agent: sum(r.cost_usd for r in records)
            for agent, records in self._by_agent.items()
        }

    def cost_by_conversation(self) -> dict[str, float]:
        return {
            conv: sum(r.cost_usd for r in records)
            for conv, records in self._by_conversation.items()
        }
```

## Usage Trend Analysis

Tracking token usage over time reveals whether your agents are becoming more or less efficient. A rising cost-per-conversation trend signals prompt bloat or unnecessary tool calls.

```python
from datetime import timedelta

def daily_usage_summary(
    records: list[TokenRecord], days: int = 30
) -> list[dict]:
    from collections import defaultdict
    daily: dict[str, dict] = defaultdict(
        lambda: {"total_tokens": 0, "cost_usd": 0.0, "conversations": set()}
    )
    for rec in records:
        day = rec.timestamp[:10]  # extract YYYY-MM-DD
        daily[day]["total_tokens"] += rec.total_tokens
        daily[day]["cost_usd"] += rec.cost_usd
        daily[day]["conversations"].add(rec.conversation_id)

    summary = []
    for day in sorted(daily.keys())[-days:]:
        data = daily[day]
        conv_count = len(data["conversations"])
        summary.append({
            "date": day,
            "total_tokens": data["total_tokens"],
            "total_cost": round(data["cost_usd"], 4),
            "conversations": conv_count,
            "cost_per_conversation": round(
                data["cost_usd"] / conv_count, 4
            ) if conv_count else 0,
            "tokens_per_conversation": (
                data["total_tokens"] // conv_count
            ) if conv_count else 0,
        })
    return summary
```
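To turn that daily summary into an actionable signal, a simple drift check can compare the recent average cost per conversation against the earlier baseline. This helper is a sketch (the window size and 20% threshold are illustrative assumptions), operating on the dicts `daily_usage_summary` produces:

```python
def detect_cost_drift(
    daily_summary: list[dict], window: int = 7, threshold: float = 1.2
) -> dict:
    """Flag when the recent average cost per conversation exceeds
    the baseline average by `threshold` (1.2 means 20% higher)."""
    if len(daily_summary) < window * 2:
        return {"drift": False, "reason": "not enough data"}
    baseline = daily_summary[:-window]
    recent = daily_summary[-window:]
    base_avg = sum(d["cost_per_conversation"] for d in baseline) / len(baseline)
    recent_avg = sum(d["cost_per_conversation"] for d in recent) / len(recent)
    ratio = recent_avg / base_avg if base_avg else 0.0
    return {
        "drift": ratio > threshold,
        "baseline_avg": round(base_avg, 4),
        "recent_avg": round(recent_avg, 4),
        "ratio": round(ratio, 2),
    }
```

Wiring this into a daily alert gives you early warning of prompt bloat before it shows up on the monthly bill.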

## Optimization Opportunities

Once you have visibility into token consumption, several optimization strategies become obvious. Prompt compression reduces input tokens. Model tiering routes simple requests to cheaper models. Caching avoids redundant calls entirely.

```python
class TokenOptimizer:
    def __init__(self, tracker: TokenTracker):
        self.tracker = tracker

    def find_expensive_conversations(
        self, threshold_usd: float = 0.10
    ) -> list[dict]:
        costs = self.tracker.cost_by_conversation()
        return [
            {"conversation_id": cid, "cost": cost}
            for cid, cost in sorted(costs.items(), key=lambda x: -x[1])
            if cost > threshold_usd
        ]

    def find_prompt_bloat(self, threshold_ratio: float = 5.0) -> list[dict]:
        bloated = []
        for rec in self.tracker.records:
            ratio = rec.prompt_tokens / max(rec.completion_tokens, 1)
            if ratio > threshold_ratio:
                bloated.append({
                    "conversation_id": rec.conversation_id,
                    "agent": rec.agent_name,
                    "prompt_tokens": rec.prompt_tokens,
                    "completion_tokens": rec.completion_tokens,
                    "ratio": round(ratio, 1),
                })
        return bloated

    def model_tier_recommendation(self) -> list[dict]:
        recommendations = []
        for agent, records in self.tracker._by_agent.items():
            avg_tokens = sum(r.total_tokens for r in records) / len(records)
            current_cost = sum(r.cost_usd for r in records)
            # Short, low-token interactions rarely need a frontier model
            if avg_tokens < 500 and records[0].model != "gpt-4o-mini":
                recommendations.append({
                    "agent": agent,
                    "current_model": records[0].model,
                    "suggested_model": "gpt-4o-mini",
                    "current_cost_usd": round(current_cost, 4),
                    "potential_savings_pct": 85,
                })
        return recommendations
```

## FAQ

### How do I track token usage for streaming responses?

Most APIs provide token counts in the final chunk of a streaming response. For OpenAI, the last chunk includes a `usage` field when you set `stream_options={"include_usage": True}` in your request. Capture this final chunk and feed it into your tracker just like a non-streaming response.
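In sketch form, the capture loop looks like this. The simulated chunks stand in for a real OpenAI streaming response, mimicking its shape: content arrives in `delta` fields, and only the final chunk carries `usage` when `include_usage` is set.

```python
from types import SimpleNamespace

def consume_stream(chunks):
    """Accumulate streamed text and capture the usage payload
    from the final chunk."""
    text_parts = []
    usage = None
    for chunk in chunks:
        if chunk.choices and chunk.choices[0].delta.content:
            text_parts.append(chunk.choices[0].delta.content)
        if chunk.usage is not None:  # only the last chunk carries usage
            usage = chunk.usage
    return "".join(text_parts), usage

# Simulated chunks standing in for a real streaming response
chunks = [
    SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content="Hel"))],
        usage=None,
    ),
    SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content="lo"))],
        usage=None,
    ),
    SimpleNamespace(
        choices=[],
        usage=SimpleNamespace(prompt_tokens=12, completion_tokens=2, total_tokens=14),
    ),
]
text, usage = consume_stream(chunks)
print(text, usage.total_tokens)  # Hello 14
```

Once captured, the `usage` object feeds into the tracker exactly like a non-streaming response.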

### What is a good cost-per-conversation benchmark?

It varies dramatically by use case. Simple FAQ agents using gpt-4o-mini might cost $0.001 per conversation. Complex multi-step agents with tool calls on gpt-4o can reach $0.05 to $0.20. The more useful benchmark is cost-per-resolution, which factors in whether the agent actually solved the problem.
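The $0.001 figure is easy to verify with back-of-envelope arithmetic, assuming a FAQ-sized turn of roughly 2,000 prompt tokens and 500 completion tokens (illustrative counts, not measured data):

```python
prompt_tokens, completion_tokens = 2_000, 500  # assumed FAQ-sized turn
# gpt-4o-mini rates: $0.15 / 1M input, $0.60 / 1M output
cost = prompt_tokens * 0.15 / 1_000_000 + completion_tokens * 0.60 / 1_000_000
print(f"${cost:.4f}")  # $0.0006
```

That lands comfortably under a tenth of a cent per conversation.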

### Should I set hard token limits on conversations?

Yes, but with a graceful fallback. Set a warning threshold at 80% of your budget and a hard limit at 100%. When the warning threshold is hit, instruct the agent to summarize and resolve quickly. When the hard limit is hit, escalate to a human rather than abruptly cutting the conversation.
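The two-threshold policy above can be expressed as a small guard. The function names and the 80% ratio mirror the advice in this answer; the budget itself is whatever you set per conversation:

```python
def budget_status(tokens_used: int, token_budget: int,
                  warn_ratio: float = 0.8) -> str:
    """Return 'ok', 'warn' (summarize and wrap up), or 'limit' (escalate)."""
    if tokens_used >= token_budget:
        return "limit"
    if tokens_used >= token_budget * warn_ratio:
        return "warn"
    return "ok"
```

Check the status after every agent step, and inject the "summarize and resolve" instruction into the system prompt as soon as it returns `"warn"`.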

---

#TokenUsage #CostOptimization #LLM #Analytics #AIAgents #AgenticAI #LearnAI #AIEngineering

---

Source: https://callsphere.ai/blog/token-usage-analytics-understanding-optimizing-llm-consumption-patterns
