Agent Analytics for Marketplace Providers: Understanding Usage and Revenue

Why Marketplace Analytics Are Different

Agent marketplace analytics serve two audiences. The marketplace operator needs platform-level metrics: total gross merchandise value (GMV), active publishers, and consumer retention. Individual publishers need agent-level metrics: install count, usage patterns, revenue, and satisfaction scores. The analytics system must aggregate raw telemetry into actionable insights for both audiences.

Traditional SaaS analytics track page views and clicks. Agent analytics track conversations, tool usage patterns, error rates, cost efficiency, and outcome quality. These agent-specific metrics require purpose-built collection and aggregation pipelines.

Event Collection Pipeline

Every agent interaction generates a stream of events. A structured event schema ensures consistent collection across all agents in the marketplace:

flowchart LR
    INPUT(["User intent"])
    PARSE["Parse plus<br/>classify"]
    PLAN["Plan and tool<br/>selection"]
    AGENT["Agent loop<br/>LLM plus tools"]
    GUARD{"Guardrails<br/>and policy"}
    EXEC["Execute and<br/>verify result"]
    OBS[("Trace and metrics")]
    OUT(["Outcome plus<br/>next action"])
    INPUT --> PARSE --> PLAN --> AGENT --> GUARD
    GUARD -->|Pass| EXEC --> OUT
    GUARD -->|Fail| AGENT
    AGENT --> OBS
    style AGENT fill:#4f46e5,stroke:#4338ca,color:#fff
    style GUARD fill:#f59e0b,stroke:#d97706,color:#1f2937
    style OBS fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style OUT fill:#059669,stroke:#047857,color:#fff

from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional
import uuid

class EventType(Enum):
    AGENT_INVOKED = "agent_invoked"
    AGENT_COMPLETED = "agent_completed"
    AGENT_ERRORED = "agent_errored"
    TOOL_CALLED = "tool_called"
    TOOL_FAILED = "tool_failed"
    USER_FEEDBACK = "user_feedback"
    INSTALL = "install"
    UNINSTALL = "uninstall"

@dataclass
class AnalyticsEvent:
    id: str = field(
        default_factory=lambda: str(uuid.uuid4())
    )
    event_type: EventType = EventType.AGENT_INVOKED
    agent_id: str = ""
    publisher_id: str = ""
    tenant_id: str = ""
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    properties: dict = field(default_factory=dict)

class EventCollector:
    def __init__(self, event_queue):
        self.queue = event_queue

    async def track_invocation(
        self,
        agent_id: str,
        publisher_id: str,
        tenant_id: str,
        input_tokens: int,
        output_tokens: int,
        tool_calls: list[str],
        duration_ms: int,
        success: bool,
        cost_usd: float,
    ):
        event = AnalyticsEvent(
            event_type=(
                EventType.AGENT_COMPLETED
                if success
                else EventType.AGENT_ERRORED
            ),
            agent_id=agent_id,
            publisher_id=publisher_id,
            tenant_id=tenant_id,
            properties={
                "input_tokens": input_tokens,
                "output_tokens": output_tokens,
                "tool_calls": tool_calls,
                "duration_ms": duration_ms,
                "cost_usd": cost_usd,
            },
        )
        await self.queue.enqueue(event)

    async def track_feedback(
        self,
        agent_id: str,
        publisher_id: str,
        tenant_id: str,
        rating: int,
        comment: Optional[str] = None,
    ):
        event = AnalyticsEvent(
            event_type=EventType.USER_FEEDBACK,
            agent_id=agent_id,
            publisher_id=publisher_id,
            tenant_id=tenant_id,
            properties={
                "rating": rating,
                "comment": comment,
            },
        )
        await self.queue.enqueue(event)
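
The collector above only assumes that its queue exposes an awaitable `enqueue` method. A minimal sketch of such a queue follows, assuming an in-process `asyncio.Queue` as the transport. The names `InMemoryEventQueue` and `drain` are hypothetical, and a production marketplace would more likely use a durable message broker:

```python
import asyncio
from typing import Any


class InMemoryEventQueue:
    """Hypothetical in-process queue exposing the enqueue() method
    that EventCollector calls. Not durable: events are lost on restart."""

    def __init__(self, maxsize: int = 10_000):
        self._queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)

    async def enqueue(self, event: Any) -> None:
        # Waits when the queue is full, applying backpressure to callers
        await self._queue.put(event)

    async def drain(self, batch_size: int = 100) -> list:
        """Return up to batch_size queued events without waiting."""
        batch = []
        while len(batch) < batch_size and not self._queue.empty():
            batch.append(self._queue.get_nowait())
        return batch


async def demo() -> list:
    queue = InMemoryEventQueue()
    # Plain dicts stand in for AnalyticsEvent instances here
    await queue.enqueue({"event_type": "agent_completed", "agent_id": "a1"})
    await queue.enqueue({"event_type": "user_feedback", "agent_id": "a1"})
    return await queue.drain()
```

Running `asyncio.run(demo())` returns the two queued events in insertion order. A background worker could call `drain` on a timer and write each batch to the event store.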

Publisher Dashboard Metrics

Publishers need metrics that help them understand how their agent performs and where to invest improvement effort:

from dataclasses import dataclass, field

@dataclass
class PublisherDashboardMetrics:
    # Usage
    total_invocations: int = 0
    unique_tenants: int = 0
    active_installs: int = 0
    invocations_trend: list[dict] = field(
        default_factory=list
    )

    # Quality
    avg_satisfaction: float = 0.0
    error_rate: float = 0.0
    avg_response_time_ms: int = 0
    p95_response_time_ms: int = 0

    # Revenue
    total_revenue: float = 0.0
    revenue_trend: list[dict] = field(
        default_factory=list
    )
    avg_revenue_per_tenant: float = 0.0

    # Tool usage
    tool_usage_breakdown: dict[str, int] = field(
        default_factory=dict
    )
    tool_failure_rates: dict[str, float] = field(
        default_factory=dict
    )

class PublisherAnalyticsService:
    def __init__(self, event_store):
        self.events = event_store

    async def get_dashboard(
        self, publisher_id: str, period_days: int = 30
    ) -> PublisherDashboardMetrics:
        raw_events = await self.events.query(
            publisher_id=publisher_id,
            days=period_days,
        )

        completions = [
            e for e in raw_events
            if e.event_type == EventType.AGENT_COMPLETED
        ]
        errors = [
            e for e in raw_events
            if e.event_type == EventType.AGENT_ERRORED
        ]
        feedback = [
            e for e in raw_events
            if e.event_type == EventType.USER_FEEDBACK
        ]

        total = len(completions) + len(errors)
        unique_tenants = len(set(
            e.tenant_id for e in completions + errors
        ))

        # Tool usage breakdown
        tool_counts: dict[str, int] = {}
        for event in completions:
            for tool in event.properties.get(
                "tool_calls", []
            ):
                tool_counts[tool] = (
                    tool_counts.get(tool, 0) + 1
                )

        # Revenue: this sketch treats each completion's cost_usd
        # as the amount billed to the tenant
        total_revenue = sum(
            e.properties.get("cost_usd", 0)
            for e in completions
        )

        # Satisfaction
        ratings = [
            e.properties["rating"]
            for e in feedback
            if "rating" in e.properties
        ]
        avg_sat = (
            sum(ratings) / len(ratings) if ratings else 0.0
        )

        # Response times
        durations = [
            e.properties["duration_ms"]
            for e in completions
            if "duration_ms" in e.properties
        ]
        durations.sort()
        avg_duration = (
            sum(durations) // len(durations)
            if durations
            else 0
        )
        p95_duration = (
            durations[int(len(durations) * 0.95)]
            if durations
            else 0
        )

        return PublisherDashboardMetrics(
            total_invocations=total,
            unique_tenants=unique_tenants,
            avg_satisfaction=round(avg_sat, 2),
            error_rate=(
                round(len(errors) / total, 4)
                if total > 0
                else 0.0
            ),
            avg_response_time_ms=avg_duration,
            p95_response_time_ms=p95_duration,
            total_revenue=round(total_revenue, 2),
            avg_revenue_per_tenant=(
                round(total_revenue / unique_tenants, 2)
                if unique_tenants > 0
                else 0.0
            ),
            tool_usage_breakdown=tool_counts,
        )
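
`get_dashboard` above leaves `tool_failure_rates` empty. One way to fill it is sketched here, under two assumptions that the schema above does not state: every `tool_called` and `tool_failed` event names its tool in `properties["tool"]`, and a failed call emits both a `tool_called` and a `tool_failed` event. Events are shown as plain dicts to keep the example self-contained:

```python
from collections import Counter


def compute_tool_failure_rates(events: list[dict]) -> dict[str, float]:
    """Return failures / calls per tool.

    Assumes each event is a dict with "event_type" and a
    properties["tool"] name (hypothetical field), and that a failed
    call produces both a tool_called and a tool_failed event.
    """
    called: Counter = Counter()
    failed: Counter = Counter()
    for event in events:
        tool = event.get("properties", {}).get("tool")
        if tool is None:
            continue  # skip events that do not identify a tool
        if event["event_type"] == "tool_called":
            called[tool] += 1
        elif event["event_type"] == "tool_failed":
            failed[tool] += 1
    # Only tools with at least one call appear, so no division by zero
    return {
        tool: round(failed[tool] / count, 4)
        for tool, count in called.items()
    }


# Sample data: "search" is called 4 times and fails once,
# "calendar" is called twice and never fails
sample = (
    [{"event_type": "tool_called", "properties": {"tool": "search"}}] * 4
    + [{"event_type": "tool_failed", "properties": {"tool": "search"}}]
    + [{"event_type": "tool_called", "properties": {"tool": "calendar"}}] * 2
)
rates = compute_tool_failure_rates(sample)
```

The resulting dict can be passed as `tool_failure_rates` when constructing `PublisherDashboardMetrics`.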

Insight Generation

Raw metrics are useful, but actionable insights drive improvement. An insight engine analyzes patterns and generates recommendations:

@dataclass
class Insight:
    severity: str  # "critical", "warning", "info"
    category: str
    title: str
    description: str
    recommendation: str

class InsightEngine:
    async def generate_insights(
        self, metrics: PublisherDashboardMetrics
    ) -> list[Insight]:
        insights = []

        if metrics.error_rate > 0.05:
            insights.append(Insight(
                severity="critical",
                category="reliability",
                title="High Error Rate",
                description=(
                    f"Error rate is {metrics.error_rate:.1%}, "
                    f"above the 5% threshold."
                ),
                recommendation=(
                    "Review error logs for the most common "
                    "failure patterns. Check tool integrations "
                    "and add retry logic for transient failures."
                ),
            ))

        if metrics.p95_response_time_ms > 10000:
            insights.append(Insight(
                severity="warning",
                category="performance",
                title="Slow p95 Response Time",
                description=(
                    f"p95 latency is "
                    f"{metrics.p95_response_time_ms}ms."
                ),
                recommendation=(
                    "Consider using a faster model for simple "
                    "queries or adding response streaming."
                ),
            ))

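        # avg_satisfaction is 0.0 when there are no ratings yet;
        # skip that case so it is not reported as low satisfaction
        if 0 < metrics.avg_satisfaction < 3.5: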
            insights.append(Insight(
                severity="warning",
                category="quality",
                title="Low User Satisfaction",
                description=(
                    f"Average rating is "
                    f"{metrics.avg_satisfaction}/5.0."
                ),
                recommendation=(
                    "Review low-rated conversations to identify "
                    "common frustration patterns. Improve system "
                    "prompt or add missing tool capabilities."
                ),
            ))

        # Tool failure analysis
        for tool, rate in metrics.tool_failure_rates.items():
            if rate > 0.1:
                insights.append(Insight(
                    severity="warning",
                    category="reliability",
                    title=f"Tool '{tool}' Failing Often",
                    description=(
                        f"Failure rate: {rate:.1%}"
                    ),
                    recommendation=(
                        f"Check the '{tool}' integration "
                        f"configuration and API health."
                    ),
                ))

        return insights
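
Because `generate_insights` appends findings in the order the checks run, a dashboard will probably want to surface critical items first. A small sketch of that ordering follows, using a trimmed-down stand-in for the `Insight` dataclass and a hypothetical `SEVERITY_ORDER` ranking. The sample titles are illustrative only:

```python
from dataclasses import dataclass

# Hypothetical ranking: lower number is shown first
SEVERITY_ORDER = {"critical": 0, "warning": 1, "info": 2}


@dataclass
class Insight:
    # Trimmed stand-in for the Insight dataclass above
    severity: str
    title: str


def prioritize(insights: list[Insight]) -> list[Insight]:
    """Sort by severity; sorted() is stable, so ties keep their order.

    Unknown severity values sort after all known ones.
    """
    return sorted(
        insights,
        key=lambda i: SEVERITY_ORDER.get(i.severity, len(SEVERITY_ORDER)),
    )


ordered = prioritize([
    Insight("warning", "Slow p95 Response Time"),
    Insight("info", "Sample informational note"),
    Insight("critical", "High Error Rate"),
    Insight("warning", "Low User Satisfaction"),
])
titles = [i.title for i in ordered]
```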

FAQ

What are the most important metrics for a marketplace publisher to track?

Focus on three pillars: adoption (install count, active tenants, retention), quality (satisfaction rating, error rate, response latency), and revenue (total revenue, revenue per tenant, churn rate). Adoption without quality leads to uninstalls. Quality without revenue tracking leads to unsustainable pricing.
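
Churn can be derived from the `install` and `uninstall` events already in the schema. A rough sketch follows, assuming churn for a period means tenants that were installed at the start of the period and uninstalled before its end, divided by all tenants installed at the start. Other definitions are common. Events are given as hypothetical `(tenant_id, event_type, timestamp)` tuples:

```python
from datetime import datetime, timezone


def churn_rate(
    events: list[tuple[str, str, datetime]],
    period_start: datetime,
    period_end: datetime,
) -> float:
    """Fraction of tenants installed at period_start that uninstalled
    before period_end. A later reinstall does not undo the churn."""
    active_at_start: set[str] = set()
    churned: set[str] = set()
    # Process in time order so later events override earlier ones
    for tenant_id, event_type, ts in sorted(events, key=lambda e: e[2]):
        if ts < period_start:
            if event_type == "install":
                active_at_start.add(tenant_id)
            elif event_type == "uninstall":
                active_at_start.discard(tenant_id)
        elif ts < period_end:
            if event_type == "uninstall" and tenant_id in active_at_start:
                churned.add(tenant_id)
    if not active_at_start:
        return 0.0
    return len(churned) / len(active_at_start)


def day(month: int, d: int) -> datetime:
    return datetime(2025, month, d, tzinfo=timezone.utc)


# Four tenants are installed before February; one of them uninstalls
# during February. t5 installs and uninstalls inside the period, so it
# was never part of the starting cohort.
events = [
    ("t1", "install", day(1, 1)),
    ("t2", "install", day(1, 5)),
    ("t3", "install", day(1, 10)),
    ("t4", "install", day(1, 12)),
    ("t5", "install", day(2, 3)),
    ("t5", "uninstall", day(2, 5)),
    ("t2", "uninstall", day(2, 10)),
]
rate = churn_rate(events, day(2, 1), day(3, 1))
```

Retention under the same definition is `1 - rate`.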

How do you handle analytics data privacy across tenants?

Never expose one tenant's conversation content to another tenant or to the publisher. Aggregate metrics — counts, averages, distributions — are safe to share. Individual conversation logs should only be visible to the tenant who owns them. Publishers see aggregate statistics about how their agent performs across all tenants without seeing any specific tenant's data.
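
One way to enforce that boundary is to build the publisher's view through a function that only ever returns aggregates. A sketch follows, assuming per-invocation records are dicts with hypothetical `tenant_id`, `duration_ms`, `success`, and `transcript` keys:

```python
from statistics import mean


def publisher_view(records: list[dict]) -> dict:
    """Aggregate invocation records into publisher-safe statistics.

    The result contains counts and averages only: no tenant_id values
    and no conversation content leave this function.
    """
    if not records:
        return {
            "invocations": 0,
            "tenants": 0,
            "error_rate": 0.0,
            "avg_duration_ms": 0.0,
        }
    failures = sum(1 for r in records if not r["success"])
    return {
        "invocations": len(records),
        # Only the number of distinct tenants is exposed, not their ids
        "tenants": len({r["tenant_id"] for r in records}),
        "error_rate": round(failures / len(records), 4),
        "avg_duration_ms": round(mean(r["duration_ms"] for r in records), 1),
    }


records = [
    {"tenant_id": "t1", "duration_ms": 100, "success": True,
     "transcript": "private text"},
    {"tenant_id": "t1", "duration_ms": 200, "success": True,
     "transcript": "private text"},
    {"tenant_id": "t2", "duration_ms": 300, "success": False,
     "transcript": "private text"},
    {"tenant_id": "t3", "duration_ms": 400, "success": True,
     "transcript": "private text"},
]
view = publisher_view(records)
```

The `transcript` field is read by nothing in the function, so it cannot reach the publisher through this code path.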

How frequently should analytics be updated?

Update operational metrics such as error rate and latency in real time, because publishers need to catch issues immediately. Update usage and revenue metrics hourly, which balances freshness against compute cost. Run trend analysis and insight generation daily, since both need enough data to be statistically meaningful.
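
These cadences can be captured in a small lookup table that a scheduler consults before recomputing a metric. A sketch follows, in which the metric names reuse fields from `PublisherDashboardMetrics`, the real-time tier is approximated as a 10-second budget (an assumption), and `is_stale` is a hypothetical helper:

```python
from datetime import datetime, timedelta, timezone

# Refresh cadence per metric; the 10-second value for the real-time
# tier is an assumed budget, not a figure from the article
REFRESH_INTERVALS = {
    "error_rate": timedelta(seconds=10),
    "p95_response_time_ms": timedelta(seconds=10),
    "total_invocations": timedelta(hours=1),
    "total_revenue": timedelta(hours=1),
    "invocations_trend": timedelta(days=1),
    "insights": timedelta(days=1),
}


def is_stale(metric: str, last_computed: datetime, now: datetime) -> bool:
    """True when the metric is at least as old as its refresh interval."""
    return now - last_computed >= REFRESH_INTERVALS[metric]


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
error_rate_stale = is_stale("error_rate", now - timedelta(seconds=30), now)
revenue_stale = is_stale("total_revenue", now - timedelta(minutes=30), now)
insights_stale = is_stale("insights", now - timedelta(days=2), now)
```

Here the error rate (30 seconds old) and the insights (2 days old) are due for recomputation, while revenue computed 30 minutes ago is still fresh.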
