---
title: "Building Custom Agent Dashboards: Visualizing Conversations, Costs, and Latency"
description: "Build production-grade Grafana dashboards for AI agent systems that visualize conversation throughput, per-model costs, LLM latency percentiles, and tool usage patterns using Prometheus metrics."
canonical: https://callsphere.ai/blog/building-custom-agent-dashboards-conversations-costs-latency
category: "Learn Agentic AI"
tags: ["Dashboards", "Grafana", "Prometheus", "Monitoring", "AI Agents"]
author: "CallSphere Team"
published: 2026-03-17T00:00:00.000Z
updated: 2026-05-06T01:02:45.421Z
---

# Building Custom Agent Dashboards: Visualizing Conversations, Costs, and Latency

> Build production-grade Grafana dashboards for AI agent systems that visualize conversation throughput, per-model costs, LLM latency percentiles, and tool usage patterns using Prometheus metrics.

## The Key Metrics Every Agent Dashboard Needs

Generic application dashboards track request rate, error rate, and latency. Agent dashboards need those plus metrics unique to LLM workloads: token consumption, cost per conversation, tool call success rates, and conversation completion rates. Without these, you are flying blind on the dimensions that matter most for agent reliability and cost control.

The foundation is a metrics collection layer that captures these signals at the right granularity, and a visualization layer that makes patterns visible at a glance.

## Exposing Prometheus Metrics from Your Agent

Use the `prometheus_client` library to define counters, histograms, and gauges that capture agent-specific signals.

At a glance, the pipeline looks like this: the agent exposes a `/metrics` endpoint, Prometheus scrapes and stores the series, and Grafana and Alertmanager consume them for dashboards and alerts.

```mermaid
flowchart LR
    AGENT["Agent process
prometheus_client"]
    EXPORT["/metrics endpoint
counters, histograms, gauges"]
    PROM[("Prometheus
scrape and store")]
    GRAF["Grafana
dashboards"]
    ALERT["Alertmanager
notifications"]
    AGENT --> EXPORT
    EXPORT -->|scraped on an interval| PROM
    PROM -->|PromQL queries| GRAF
    PROM -->|alerting rules| ALERT
    style AGENT fill:#4f46e5,stroke:#4338ca,color:#fff
    style PROM fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style GRAF fill:#0ea5e9,stroke:#0369a1,color:#fff
    style ALERT fill:#059669,stroke:#047857,color:#fff
```

```python
from prometheus_client import Counter, Histogram, Gauge, start_http_server

# Conversation metrics
conversations_total = Counter(
    "agent_conversations_total",
    "Total conversations started",
    ["agent_name", "status"],
)

# LLM call metrics
llm_call_duration = Histogram(
    "agent_llm_call_duration_seconds",
    "LLM call latency in seconds",
    ["model", "agent_name"],
    buckets=[0.1, 0.25, 0.5, 1.0, 2.0, 5.0, 10.0, 30.0],
)

tokens_used = Counter(
    "agent_tokens_total",
    "Total tokens consumed",
    ["model", "token_type"],  # token_type: prompt or completion
)

# Tool metrics
tool_calls_total = Counter(
    "agent_tool_calls_total",
    "Total tool invocations",
    ["tool_name", "status"],
)

# Active conversations gauge
active_conversations = Gauge(
    "agent_active_conversations",
    "Currently active conversations",
    ["agent_name"],
)

# Start the metrics HTTP server on port 9090 (pick another port if Prometheus itself runs on this host)
start_http_server(9090)
```
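
Prometheus needs a scrape job pointing at this endpoint before anything appears in Grafana. A minimal `prometheus.yml` sketch; the hostname `agent` and the 15-second interval are placeholders to match your deployment:

```yaml
# prometheus.yml (hostname and interval are illustrative)
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: "agent"
    static_configs:
      - targets: ["agent:9090"]  # the agent process exposing /metrics
```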

## Instrumenting the Agent Loop

Wrap the core agent operations to emit metrics on every call.

```python
import time

# Assumes llm_client, execute_tool, and agent are defined elsewhere in the application

async def instrumented_llm_call(model: str, messages: list, agent_name: str):
    start = time.perf_counter()
    try:
        response = await llm_client.chat.completions.create(
            model=model, messages=messages
        )
        duration = time.perf_counter() - start
        llm_call_duration.labels(model=model, agent_name=agent_name).observe(duration)
        tokens_used.labels(model=model, token_type="prompt").inc(
            response.usage.prompt_tokens
        )
        tokens_used.labels(model=model, token_type="completion").inc(
            response.usage.completion_tokens
        )
        return response
    except Exception:
        # Record latency for failed calls too, so slow failures stay visible
        duration = time.perf_counter() - start
        llm_call_duration.labels(model=model, agent_name=agent_name).observe(duration)
        raise

async def instrumented_tool_call(tool_name: str, arguments: dict):
    try:
        result = await execute_tool(tool_name, arguments)
        tool_calls_total.labels(tool_name=tool_name, status="success").inc()
        return result
    except Exception:
        tool_calls_total.labels(tool_name=tool_name, status="error").inc()
        raise

async def run_conversation(user_id: str, message: str, agent_name: str):
    active_conversations.labels(agent_name=agent_name).inc()
    try:
        result = await agent.run(message)
        conversations_total.labels(agent_name=agent_name, status="completed").inc()
        return result
    except Exception:
        conversations_total.labels(agent_name=agent_name, status="failed").inc()
        raise
    finally:
        active_conversations.labels(agent_name=agent_name).dec()
```
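
With the instrumentation in place, you can sanity-check the exporter before wiring up Prometheus. The output below is illustrative; actual series and values depend on your agent and tool names:

```text
$ curl -s localhost:9090/metrics | grep "^agent_"
agent_conversations_total{agent_name="support",status="completed"} 42.0
agent_tool_calls_total{tool_name="lookup_order",status="success"} 131.0
agent_active_conversations{agent_name="support"} 3.0
```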

## Building the Grafana Dashboard

Configure Prometheus as a Grafana data source, then create panels using PromQL queries for each KPI.
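
If you provision Grafana from files, the Prometheus data source can be declared in YAML rather than clicked through the UI. A minimal provisioning sketch, assuming Prometheus is reachable at `http://prometheus:9090`; the file path and URL are placeholders:

```yaml
# grafana/provisioning/datasources/prometheus.yaml (path and URL are illustrative)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
```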

**Conversation throughput** — conversations started per second, averaged over a 5-minute window (multiply by 60 for a per-minute view):

```text
rate(agent_conversations_total[5m])
```

**LLM latency P95** — the 95th percentile response time by model:

```text
histogram_quantile(0.95, sum by (le, model) (rate(agent_llm_call_duration_seconds_bucket[5m])))
```

**Token burn rate** — tokens per minute, split by prompt vs completion:

```text
rate(agent_tokens_total[5m]) * 60
```

**Cost estimation panel** — multiply token rates by per-token pricing using a recording rule or Grafana transformation. The expression below yields dollars per second for gpt-4o; multiply by 3600 for an hourly burn rate:

```text
rate(agent_tokens_total{token_type="prompt", model="gpt-4o"}[5m]) * 0.0000025
+
rate(agent_tokens_total{token_type="completion", model="gpt-4o"}[5m]) * 0.00001
```
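
To keep the pricing math out of individual panels, the same expression can be captured as a Prometheus recording rule. A sketch using the gpt-4o prices above; the rule name and prices are assumptions to adjust for your models and provider rates:

```yaml
# prometheus-recording-rules.yaml (rule name and prices are illustrative)
groups:
  - name: agent_cost_rules
    rules:
      - record: agent:cost_usd_per_second:gpt4o
        expr: >
          rate(agent_tokens_total{token_type="prompt", model="gpt-4o"}[5m]) * 0.0000025
          + rate(agent_tokens_total{token_type="completion", model="gpt-4o"}[5m]) * 0.00001
```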

**Tool error rate** — the fraction of tool calls that fail, broken out per tool:

```text
sum by (tool_name) (rate(agent_tool_calls_total{status="error"}[5m]))
/ sum by (tool_name) (rate(agent_tool_calls_total[5m]))
```

## Setting Up Alerts

Define Prometheus alerting rules that fire when agent KPIs breach thresholds.

```yaml
# prometheus-alerts.yaml
groups:
  - name: agent_alerts
    rules:
      - alert: HighLLMLatency
        expr: histogram_quantile(0.95, sum by (le, model) (rate(agent_llm_call_duration_seconds_bucket[5m]))) > 5
        for: 3m
        labels:
          severity: warning
        annotations:
          summary: "LLM P95 latency exceeds 5 seconds"

      - alert: HighToolErrorRate
        expr: >
          sum by (tool_name) (rate(agent_tool_calls_total{status="error"}[10m]))
          / sum by (tool_name) (rate(agent_tool_calls_total[10m])) > 0.1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Tool error rate above 10%"
```
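
Prometheus only evaluates these rules once the file is listed in its main configuration. An excerpt to add to `prometheus.yml`; the path is an assumption and should match wherever the rules file is mounted:

```yaml
# prometheus.yml (excerpt; path is illustrative)
rule_files:
  - "prometheus-alerts.yaml"
```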

## FAQ

### How many Prometheus labels should I use per metric?

Keep label cardinality low. Labels like `model`, `agent_name`, and `status` are fine because they have a small, bounded set of values. Never use labels with high cardinality like `user_id` or `conversation_id` — these will cause Prometheus memory and performance issues. Track per-user data in a separate analytics database instead.
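
As a concrete illustration, the second counter below is the anti-pattern, not something to copy; the metric names are hypothetical:

```python
from prometheus_client import Counter

# Bounded labels: a handful of agent names and statuses stays at a handful of series
conversations = Counter("demo_conversations_total", "Conversations", ["agent_name", "status"])

# Unbounded label: one time series per user, which grows without limit -- avoid
per_user = Counter("demo_user_conversations_total", "Conversations per user", ["user_id"])
```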

### Should I track metrics in the agent code or use a sidecar?

Instrument directly in the agent code for LLM-specific metrics like token counts and tool call results, because only the application has that context. Use a sidecar or service mesh for infrastructure metrics like HTTP request rate and network latency. The two approaches complement each other.

### How do I estimate costs when using multiple models?

Create a pricing lookup that maps model names to per-token costs, then apply it as a Grafana transformation or Prometheus recording rule. Update the pricing table whenever your provider changes rates. Some teams store costs in a database and join with token metrics in Grafana for more flexibility.

---

#Dashboards #Grafana #Prometheus #Monitoring #AIAgents #AgenticAI #LearnAI #AIEngineering

---

Source: https://callsphere.ai/blog/building-custom-agent-dashboards-conversations-costs-latency
