Post-Call Sentiment + Topic Extraction: CallSphere vs Vapi
CallSphere runs GPT-4o-mini on every call: sentiment, lead score, intent, topics, satisfaction, escalation flag. Vapi has no native analytics layer.
TL;DR
Voice AI without analytics is just an answering machine that talks back. CallSphere ships post-call analytics by default: GPT-4o-mini runs on every healthcare call to extract sentiment (-1.0 to 1.0), lead score (0-100), intent, topics, satisfaction (1-5), escalation flag (boolean), and an AI-generated summary. Vapi.ai is voice infrastructure with no native analytics layer — customers must wire their own LLM analytics pipeline. This post shows the analytics row schema, a sample row, the pipeline architecture, and what to ask in procurement.
Why Analytics Matter More Than the Voice Itself
Operations teams care less about the voice quality of an AI agent and more about three questions:
- Are calls solving the customer's problem? (intent + satisfaction)
- Are leads slipping through the cracks? (lead score + escalation flag)
- Are agents, human or AI, handling caller sentiment well? (sentiment trend by hour / day / queue)
These questions are answered with structured analytics on every call, not with manual sampling. A platform without native analytics forces the customer to build the analytics layer themselves — which most never finish.
The CallSphere call_log_analytics Schema
Every healthcare call (and calls in most other verticals) produces a row like:
| Column | Type | Range / Example |
|---|---|---|
| call_id | uuid | |
| sentiment_score | float | -1.0 to 1.0 |
| lead_score | int | 0 to 100 |
| intent | text | e.g., "appointment_booking", "billing_question" |
| topics | text[] | e.g., ["insurance", "rescheduling"] |
| satisfaction | int | 1 to 5 |
| escalation_flag | bool | |
| ai_summary | text | 2-3 sentences |
| analyzed_at | timestamp | |
| model_version | text | "gpt-4o-mini-2024-07-18" |
A sample row (synthetic):
sentiment_score: 0.42
lead_score: 78
intent: "new_patient_intake"
topics: ["insurance", "first_visit", "scheduling"]
satisfaction: 4
escalation_flag: false
ai_summary: "Caller is a new patient checking insurance acceptance and seeking first appointment. Provided Aetna PPO. Booked Tuesday 10am with Dr. Patel."
This row is queryable in any dashboard tool. Operations can filter:
- "Show me low-satisfaction calls in the last 7 days"
- "Top 10 topics for hot leads (lead_score > 80)"
- "Hourly sentiment trend for the front-desk queue"
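As an illustration, the first two filters might look like this in SQL, run here through Python's sqlite3 so the example is self-contained. The table and column names follow the call_log_analytics schema above; the rows and the date cutoff are synthetic.

```python
import sqlite3

# Self-contained demo: a local copy of the analytics table with two
# synthetic rows, queried the way an ops dashboard would query it.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE call_log_analytics (
        call_id TEXT, sentiment_score REAL, lead_score INTEGER,
        intent TEXT, satisfaction INTEGER, escalation_flag INTEGER,
        analyzed_at TEXT
    )
""")
conn.executemany(
    "INSERT INTO call_log_analytics VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("c1", 0.42, 85, "appointment_booking", 4, 0, "2025-01-10"),
        ("c2", -0.60, 40, "billing_question", 1, 1, "2025-01-12"),
    ],
)

# "Low-satisfaction calls in the last 7 days" (fixed cutoff for the demo)
low_sat = conn.execute(
    "SELECT call_id FROM call_log_analytics "
    "WHERE satisfaction <= 2 AND analyzed_at >= '2025-01-08'"
).fetchall()

# "Hot leads" (lead_score > 80) with their primary intent
hot = conn.execute(
    "SELECT call_id, intent FROM call_log_analytics WHERE lead_score > 80"
).fetchall()
```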
Vapi's Analytics Gap
Vapi customers who want this functionality must:
- Capture the post-call transcript via a webhook
- Send it to OpenAI / Anthropic with a structured-output prompt
- Parse and store the JSON
- Build dashboards on the analytics table
- Manage prompt drift and schema drift over time
- Add error handling, retries, idempotency
This is several engineer-weeks plus ongoing prompt engineering. And every change to the analytics schema requires re-running historical data — which most teams never do.
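A minimal sketch of the core DIY step (send transcript to an LLM, parse, validate). The prompt wording, JSON keys, and the fake_llm stand-in are illustrative; in production the call_llm argument would wrap a real OpenAI or Anthropic client with retries and idempotency handling.

```python
import json

# Hypothetical structured-output prompt; real prompts need examples,
# topic vocabularies, and version control.
PROMPT = (
    "Return JSON with keys: sentiment_score (-1..1), lead_score (0..100), "
    "intent, topics, satisfaction (1..5), escalation_flag. Transcript:\n"
)

def analyze_transcript(transcript, call_llm):
    raw = call_llm(PROMPT + transcript)   # the LLM round trip
    row = json.loads(raw)                 # parse the structured output
    # Range checks: the part of the DIY build teams skip until it breaks.
    if not -1.0 <= row["sentiment_score"] <= 1.0:
        raise ValueError("sentiment_score out of range")
    if not 0 <= row["lead_score"] <= 100:
        raise ValueError("lead_score out of range")
    if row["satisfaction"] not in (1, 2, 3, 4, 5):
        raise ValueError("satisfaction out of range")
    return row

# Offline stand-in for a real model response:
def fake_llm(prompt):
    return json.dumps({
        "sentiment_score": 0.42, "lead_score": 78,
        "intent": "new_patient_intake", "topics": ["insurance"],
        "satisfaction": 4, "escalation_flag": False,
    })

row = analyze_transcript("Caller asked whether the practice takes Aetna PPO.", fake_llm)
```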
CallSphere's Pipeline
The analytics pipeline runs as a background job after every call:
- Transcript and metadata captured to call_logs
- Job triggers GPT-4o-mini with structured prompt
- Output validated against schema (type checks + range checks)
- Row written to call_log_analytics
- Webhooks fire for downstream systems (CRM, BI)
- Dashboards refresh in near-real-time
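To make the idempotency concrete, here is a sketch of the job's write path, keyed by call_id so a re-run (or a forced backfill) never duplicates work. The dict-backed store and function names are illustrative, not CallSphere's actual code.

```python
# Illustrative in-memory store; in production this is the
# call_log_analytics table with call_id as a unique key.
analytics_store = {}

def run_analytics_job(call_id, transcript, analyze, force=False):
    if call_id in analytics_store and not force:
        return analytics_store[call_id]  # already analyzed: no-op
    row = analyze(transcript)
    row["model_version"] = "gpt-4o-mini-2024-07-18"  # logged with every row
    analytics_store[call_id] = row
    return row

def analyze(transcript):
    # Stand-in for the LLM analysis step
    return {"sentiment_score": 0.42, "lead_score": 78}

first = run_analytics_job("abc", "transcript text", analyze)
second = run_analytics_job("abc", "transcript text", analyze)  # no-op re-run
```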
Costs are predictable: GPT-4o-mini at ~$0.15 per 1M input tokens, with average call analysis ~2K tokens, makes per-call analytics cost ~$0.0003. Effectively free at any reasonable volume.
Mermaid: Analytics Pipeline
```mermaid
graph LR
    CALL[Call Ends] --> CAP[Capture Transcript]
    CAP --> CLOG[(call_logs)]
    CLOG --> JOB[Background Job]
    JOB --> LLM[GPT-4o-mini]
    LLM --> VAL[Schema Validation]
    VAL --> CLA[(call_log_analytics)]
    CLA --> DASH[Dashboards]
    CLA --> WHK[Webhooks]
    WHK --> CRM[CRM]
    WHK --> BI[BI]
    CLA --> BQ[BigQuery Export]
```
The pipeline is idempotent and re-runnable, so prompt or schema changes can backfill historical calls cleanly.
Comparison Table
| Analytics Capability | Vapi DIY | CallSphere |
|---|---|---|
| Sentiment scoring | Build yourself | Built-in |
| Lead scoring | Build yourself | Built-in |
| Intent detection | Build yourself | Built-in |
| Topic extraction | Build yourself | Built-in |
| Satisfaction estimate | Build yourself | Built-in |
| Escalation flag | Build yourself | Built-in |
| AI summary | Build yourself | Built-in |
| Schema-validated outputs | Build yourself | Default |
| Backfill on prompt changes | Build yourself | Built-in |
| Dashboard integration | Build yourself | Default |
| Webhook fanout | Build yourself | Default |
| Time-to-analytics | Weeks-months | Day 1 |
What "Lead Score" Actually Means in Healthcare
For healthcare voice, lead_score factors include:
- Caller is a new vs returning patient
- Insurance accepted by the practice
- Appointment booked vs not
- Caller intent strength ("just looking" vs "I need an MRI tomorrow")
- Sentiment trajectory through the call
- Likelihood-to-convert estimated from historical patterns
Scores in the 0-30 range typically represent informational inquiries; 30-60 are warm prospects; 60-80 are hot leads; 80+ are converted-on-call (e.g., booked an appointment, provided insurance).
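These bands can be captured in a small helper. Treating 30, 60, and 80 as belonging to the upper band is an assumption, since the ranges above share their edges:

```python
def lead_band(score: int) -> str:
    """Map a lead_score (0-100) to the bands described above."""
    if not 0 <= score <= 100:
        raise ValueError("lead_score must be 0-100")
    if score >= 80:
        return "converted-on-call"
    if score >= 60:
        return "hot"
    if score >= 30:
        return "warm"
    return "informational"
```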
Procurement-Friendly Analytics Checklist
- Is post-call analytics included by default?
- What model powers the analytics? (GPT-4o-mini, custom, etc.)
- What signals are extracted (sentiment, lead, intent, topics, satisfaction, escalation)?
- Are outputs schema-validated?
- Are the analytics queryable via SQL / BI tools?
- Are webhooks supported for CRM fanout?
- Are historical backfills supported on prompt updates?
- Are model versions logged with each row?
- What is the per-call analytics cost?
- Is analytics in scope for SOC 2 / HIPAA?
Real-World ROI Pattern
A 20-provider primary care group that switched from a Vapi-based intake bot (no analytics) to CallSphere reported the following changes after 90 days:
- Identified that "insurance acceptance" was the #1 unanswered topic, leading to FAQ updates that raised conversion 12%
- Found that low-satisfaction calls clustered around lunch hour due to staff hand-off issues; rescheduled coverage
- Caught 14 escalation_flag=true calls per week that had been silently dropping; recovered ~$60K in deferred services
Cost of CallSphere analytics: included. Cost of building the same in Vapi: estimated at 8 engineer-weeks plus ongoing maintenance.
CTA
Voice AI without analytics is missing the point. Book a CallSphere demo and see the analytics dashboard live, or check pricing.
FAQ
Why GPT-4o-mini specifically?
GPT-4o-mini provides excellent JSON schema conformance and topic extraction quality at near-trivial cost. It's purpose-fit for structured analytics on relatively short transcripts.
Can I bring my own analytics model?
Yes — enterprise plans support pluggable analytics. Customers have used Anthropic Claude Haiku, custom fine-tunes, or local models for on-prem deployments.
Are analytics rows tied back to specific moments in the call?
The transcript and timestamps are joinable via call_id, so a low-satisfaction flag can be traced to a specific transcript segment.
Is the lead-score model explainable?
Yes. The prompt asks the model to also return a brief reasoning string, captured in the row for ops review.
How is prompt drift managed?
Prompt versions are stored in git. The model_version column captures which prompt+model produced each row. Backfills can be scheduled when prompts change materially.
Deep Dive: Sentiment Score Interpretation
Sentiment is reported as a float from -1.0 to 1.0:
| Range | Interpretation | Typical Action |
|---|---|---|
| -1.0 to -0.5 | Strongly negative | Immediate manager review, possible outreach |
| -0.5 to -0.2 | Mildly negative | Review in daily summary |
| -0.2 to 0.2 | Neutral | No action |
| 0.2 to 0.5 | Mildly positive | Coaching recognition |
| 0.5 to 1.0 | Strongly positive | Best-practices identification |
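The table translates directly into routing logic. A sketch follows; boundary assignment at -0.5, -0.2, 0.2, and 0.5 is an assumption, since the ranges above share edges:

```python
def sentiment_action(score: float) -> str:
    """Map a sentiment_score (-1.0 to 1.0) to the action tiers above."""
    if not -1.0 <= score <= 1.0:
        raise ValueError("sentiment_score must be in [-1.0, 1.0]")
    if score < -0.5:
        return "immediate manager review"
    if score < -0.2:
        return "review in daily summary"
    if score <= 0.2:
        return "no action"
    if score <= 0.5:
        return "coaching recognition"
    return "best-practices identification"
```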
Calibration: CallSphere's sentiment is calibrated against a held-out set of human-labeled calls. Model upgrades trigger a re-calibration to ensure score interpretations remain stable over time.
Topic Extraction in Practice
Topics are extracted as a multi-label set (a call can have multiple topics). The default vocabulary is vertical-specific:
Healthcare topics:
- appointment_booking, appointment_reschedule, appointment_cancel
- billing_question, insurance_verification, copay
- prescription_refill, medication_question
- new_patient, returning_patient
- referral, second_opinion
- complaint, compliment
Sales topics:
- pricing, demo_request, technical_question
- competitor_mention, integration_question
- decision_maker, budget_question, timeline
- objection_pricing, objection_features, objection_timing
Custom topic vocabularies can be defined per tenant.
Intent Classification
Intent is a single-label classification: the highest-confidence intent for the call as a whole. This is distinct from topics, which are multi-label.
Intent + topic together give a richer view than either alone:
- intent: "appointment_booking"
- topics: ["insurance", "first_visit", "scheduling"]
The pair lets ops dashboards filter by primary intent and drill into related topics.
Satisfaction Estimate
Satisfaction is a 1-5 estimate based on:
- Final sentiment of the call
- Whether the caller's primary goal was accomplished
- Sentiment trajectory (rising vs falling)
- Explicit satisfaction signals ("thank you", "that was helpful", "this is frustrating")
- Caller's tone in closing
Satisfaction differs from sentiment — a caller can be moderately negative throughout but ultimately satisfied if the issue was resolved.
Escalation Flag
The boolean escalation flag fires when the call meets any of:
- Caller explicitly requests a human ("can I talk to a person")
- Sentiment stays below -0.5 for a sustained stretch
- Healthcare emergency keywords detected
- Compliance-sensitive topics (e.g., complaint, threat)
- Agent could not resolve the caller's primary goal
When the flag fires during the call, a real-time alert can be sent to the operations team. After the call, the flag drives a "missed escalation" report.
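The trigger list can be sketched as a single predicate. The keyword sets and the "sustained" window semantics are assumptions for illustration; real detection runs on richer signals than substring matching.

```python
# Assumed keyword lists, for illustration only
EMERGENCY_KEYWORDS = {"chest pain", "overdose", "can't breathe"}
HUMAN_REQUESTS = {"talk to a person", "speak to a human"}

def should_escalate(transcript: str, recent_sentiment: list,
                    goal_resolved: bool) -> bool:
    text = transcript.lower()
    if any(kw in text for kw in HUMAN_REQUESTS | EMERGENCY_KEYWORDS):
        return True
    # "Sustained" dip: every score in the recent window below -0.5
    if recent_sentiment and all(s < -0.5 for s in recent_sentiment):
        return True
    # An unresolved primary goal also escalates
    return not goal_resolved
```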
AI Summary Quality
The AI summary is a 2-3 sentence narrative answering:
- Who called and what did they want?
- What did the agent do?
- What was the outcome?
Example summary: "New patient seeking dermatology consultation. Agent verified Aetna PPO acceptance and scheduled with Dr. Patel for Tuesday April 22 at 10am. Caller satisfied; no follow-up required."
The summary is the single most-used field by busy managers reviewing daily call digests.
Backfill Patterns
When prompts change materially, customers can backfill historical calls:
- Choose date range
- Choose call types
- Backfill writes new rows alongside originals (both retained for comparison)
- Difference report highlights which calls changed scores significantly
This is critical for honest reporting — without backfill, "improvements" in sentiment over time may just be prompt changes.
Cost Predictability
GPT-4o-mini at current pricing means:
- Average call: ~3-5 minutes, ~600-1,000 transcribed words, ~1,500-2,500 input tokens
- Analytics output: ~200-400 tokens
- Per-call analytics cost: ~$0.0003-$0.0005
For a 100K-call-per-month customer, total analytics LLM cost ~$30-50/month — effectively free relative to the value.
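The arithmetic checks out with a back-of-the-envelope helper. Note the output-token price here is an assumption (the post quotes only the $0.15 input price):

```python
INPUT_PRICE_PER_M = 0.15   # USD per 1M input tokens (from the post)
OUTPUT_PRICE_PER_M = 0.60  # USD per 1M output tokens (assumed)

def per_call_cost(input_tokens: int, output_tokens: int) -> float:
    """Per-call analytics LLM cost in USD."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# ~2,000 input + ~300 output tokens lands inside the $0.0003-$0.0005 range
cost = per_call_cost(2000, 300)
monthly = 100_000 * cost  # a 100K-call-per-month customer
```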
Privacy Considerations
Analytics is run on redacted transcripts (see the redaction post). PII spans are masked before the analytics LLM sees them. This means analytics outputs themselves carry minimal PII and can be safely stored in BI tools accessible to a broader audience.
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available, no signup required.