Platform

How CallSphere Works

Technical reference for the CallSphere AI voice and chat agent platform. Architecture, pipeline, execution model, and safety controls. For the underlying stack and AI models, see Technology.

System Definition

What It Is

An agentic AI platform that conducts voice and chat conversations with customers. It understands intent, executes actions via tools, and responds in natural language.

What It Replaces

IVR phone trees, rule-based chatbots, and after-hours voicemail. Handles tasks that previously required a human agent for each interaction.

What It Does Not Replace

Human agents for complex escalations, licensed professionals (legal, medical), and empathy-primary interactions. CallSphere augments your team rather than replacing it.

Mechanistic Workflow

Every voice interaction follows these 9 steps from inbound signal to response delivery.

1

Inbound Signal

Call arrives via SIP trunk, WebRTC, or WebSocket. The transport layer establishes a bidirectional audio stream.

2

ASR Transcription

Automatic speech recognition converts audio to text in real time. Supports 57+ languages with speaker diarization.

3

Turn Detection

Voice activity detection (VAD) and endpointing determine when the caller has finished speaking. Silence threshold: 600ms (configurable).

4

Intent Recognition

The LLM analyzes the transcript against the system prompt and conversation history to identify caller intent.

5

Tool Selection

Based on intent, the LLM selects zero or more tools from the agent's allowlist. Tool definitions include name, description, and parameter schema.

6

Tool Execution

Selected tools execute against external APIs (CRM, calendar, payment processor). Results return as structured JSON.

7

Response Generation

The LLM composes a natural-language response incorporating tool results, conversation context, and guardrail constraints.

8

TTS Synthesis

Text-to-speech converts the response to audio. Voice, speed, and tone are configurable per agent.

9

Delivery

Audio streams back to the caller. Barge-in detection allows the caller to interrupt at any point, restarting from step 3.
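Step 3 (turn detection) can be illustrated with a minimal endpointing sketch. It assumes a VAD has already labelled fixed-size audio frames as speech or silence. The 20ms frame size, the function name find_end_of_turn, and the boolean-frame input are illustrative assumptions, not CallSphere's implementation; only the 600ms silence threshold comes from the steps above.

```python
from typing import Iterable, Optional

FRAME_MS = 20               # assumed frame duration, not stated by the platform
SILENCE_THRESHOLD_MS = 600  # silence threshold quoted in step 3 (configurable)


def find_end_of_turn(
    frames: Iterable[bool],
    silence_threshold_ms: int = SILENCE_THRESHOLD_MS,
    frame_ms: int = FRAME_MS,
) -> Optional[int]:
    """Return the index of the frame at which the turn is considered finished.

    `frames` holds per-frame VAD decisions (True = speech). The turn ends once
    speech has been heard and is followed by enough consecutive silent frames
    to cover the threshold. Returns None if no endpoint is found.
    """
    heard_speech = False
    silent_ms = 0
    for index, is_speech in enumerate(frames):
        if is_speech:
            heard_speech = True
            silent_ms = 0  # any speech resets the silence timer
        elif heard_speech:
            silent_ms += frame_ms
            if silent_ms >= silence_threshold_ms:
                return index
    return None


# 10 speech frames (200ms) followed by 40 silent frames (800ms)
vad_frames = [True] * 10 + [False] * 40
end_index = find_end_of_turn(vad_frames)
```

A short pause inside an utterance resets the timer, so only a full 600ms of continuous silence ends the turn. In a barge-in scenario, the same function would simply be run again on the new incoming frames.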

Agent Architecture

The platform is organized into 6 layers. Each layer is independently replaceable.

1

Transport

WebRTC, SIP, WebSocket, PSTN. Manages bidirectional audio/text streams and session lifecycle.

2

Speech

ASR (speech-to-text), TTS (text-to-speech), VAD (voice activity detection), endpointing, barge-in handling.

3

Reasoning

Frontier LLM with system prompt, conversation history, and structured output. Per-agent model selection across leading providers.

4

Actions

Tool calling engine. Executes API calls, database queries, and workflow triggers based on LLM decisions.

5

Safety

Guardrails, PII redaction, topic deny-lists, confidence thresholds, and escalation triggers.

6

Integrations

CRM, calendar, payments, ticketing, knowledge base, and custom webhook connectors.
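As a rough illustration of how the Reasoning and Actions layers might meet, the sketch below models a tool definition with the three fields named in the workflow (name, description, parameter schema) and filters a catalogue by a per-agent allowlist. The ToolDefinition class, the helper functions, and the two example tools are hypothetical; the platform's actual data model is not documented here.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolDefinition:
    """Hypothetical shape of a tool definition exposed to the LLM."""
    name: str
    description: str
    parameters: dict[str, Any] = field(default_factory=dict)  # JSON Schema object


# Example catalogue; tool names and schemas are made up for illustration.
CATALOGUE = [
    ToolDefinition(
        name="book_appointment",
        description="Book an appointment slot for the caller.",
        parameters={
            "type": "object",
            "properties": {
                "date": {"type": "string"},
                "time": {"type": "string"},
            },
            "required": ["date", "time"],
        },
    ),
    ToolDefinition(
        name="capture_payment",
        description="Charge the caller's card on file.",
        parameters={
            "type": "object",
            "properties": {"amount_cents": {"type": "integer"}},
            "required": ["amount_cents"],
        },
    ),
]


def tools_for_agent(catalogue: list[ToolDefinition], allowlist: set[str]) -> list[ToolDefinition]:
    """Keep only the tools this agent is allowed to call."""
    return [tool for tool in catalogue if tool.name in allowlist]


def missing_required(tool: ToolDefinition, arguments: dict[str, Any]) -> list[str]:
    """List required parameters the LLM did not supply."""
    required = tool.parameters.get("required", [])
    return [key for key in required if key not in arguments]


scheduling_tools = tools_for_agent(CATALOGUE, allowlist={"book_appointment"})
gaps = missing_required(scheduling_tools[0], {"date": "2025-01-15"})
```

Filtering before the model sees the catalogue means a disallowed tool can never be selected, which is one way to read the "allowlist" wording in step 5.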

Voice Pipeline

ASR: Best-in-class speech-to-text engines
ASR Latency: ~300ms per utterance
Turn Detection: VAD + endpointing, 600ms silence threshold (configurable)
Total Latency Budget: <1.5 seconds end-to-end
TTS: Neural text-to-speech with accent-aware voices
Interruption Handling: Barge-in restarts pipeline from turn detection
Languages: 57+ languages with accent-aware models
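The latency budget can be sanity-checked with simple arithmetic. The per-stage figures below are the approximate values quoted elsewhere on this page (~300ms ASR, ~500ms LLM, ~200ms TTS, ~200-400ms network and telephony). Treating the stages as strictly sequential with no overlap is a simplifying assumption, and the dictionary layout is purely illustrative.

```python
# (low, high) latency estimates per stage, in milliseconds
STAGE_LATENCY_MS = {
    "asr": (300, 300),
    "llm": (500, 500),
    "tts": (200, 200),
    "network_and_telephony": (200, 400),
}
BUDGET_MS = 1500  # the "<1.5 seconds end-to-end" budget

best_case_ms = sum(low for low, _ in STAGE_LATENCY_MS.values())
worst_case_ms = sum(high for _, high in STAGE_LATENCY_MS.values())
headroom_ms = BUDGET_MS - worst_case_ms

print(f"best case:  {best_case_ms} ms")
print(f"worst case: {worst_case_ms} ms")
print(f"headroom:   {headroom_ms} ms")
```

Under these assumptions the stages sum to 1,200-1,400ms. This sum does not include the endpointing silence threshold; the page does not say whether the published budget does.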

Action Execution Model

Agent actions fall into 4 modes depending on risk and reversibility.

Deterministic

Fixed-logic actions like looking up business hours or reading a menu. No LLM reasoning required.

API Call

Agent invokes an external API (e.g., book appointment, check inventory). Parameters are extracted from conversation context.

Approval-Required

Agent proposes an action and waits for caller confirmation before executing. Used for payments and irreversible operations.

Human Handoff

Agent transfers the call to a human operator with full conversation context. Triggered by policy rules or caller request.
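The four modes can be sketched as a small dispatcher. This is a minimal illustration under stated assumptions: the ActionMode enum, the run_action function, and the returned status strings are hypothetical names, and a real system would also pass conversation context along with a handoff.

```python
from enum import Enum
from typing import Callable


class ActionMode(Enum):
    DETERMINISTIC = "deterministic"
    API_CALL = "api_call"
    APPROVAL_REQUIRED = "approval_required"
    HUMAN_HANDOFF = "human_handoff"


def run_action(
    mode: ActionMode,
    action: Callable[[], str],
    caller_confirmed: bool = False,
) -> str:
    """Execute `action` according to its mode and return a status string."""
    if mode is ActionMode.HUMAN_HANDOFF:
        # The agent never executes the action itself; a human takes over.
        return "transferred_to_human"
    if mode is ActionMode.APPROVAL_REQUIRED and not caller_confirmed:
        # Propose the action and wait; nothing is executed yet.
        return "awaiting_confirmation"
    # Deterministic actions, API calls, and confirmed approvals all execute.
    return action()


def charge() -> str:
    return "charged"


first_attempt = run_action(ActionMode.APPROVAL_REQUIRED, charge)
after_confirmation = run_action(ActionMode.APPROVAL_REQUIRED, charge, caller_confirmed=True)
```

Gating on the mode before calling the action keeps irreversible operations, such as payments, from running until the caller has explicitly confirmed.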

Enterprise Safety & Control

  • Tool allowlists per agent prevent unauthorized actions
  • Topic deny-lists block discussion of excluded subjects
  • PII redaction masks sensitive data before storage
  • Confidence thresholds trigger escalation when the agent is uncertain
  • Turn limits prevent infinite conversation loops
  • Rate limiting protects against abuse
  • Immutable audit logs record every action and tool invocation
  • HIPAA, PCI-DSS, and GDPR compliance controls available
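To make one of these controls concrete, here is a minimal sketch of PII redaction applied to a transcript before storage. The regular expressions, the placeholder tokens, and the redact_pii function are assumptions chosen for illustration; production redaction typically covers many more data types and is not described in detail on this page.

```python
import re

# 13-16 digits, optionally separated by single spaces or hyphens (card-like numbers)
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")
# Simplified email matcher, good enough for a demonstration
EMAIL_PATTERN = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")


def redact_pii(text: str) -> str:
    """Replace card-like numbers and email addresses with placeholder tokens."""
    text = CARD_PATTERN.sub("[CARD]", text)
    text = EMAIL_PATTERN.sub("[EMAIL]", text)
    return text


transcript = "My card is 4111 1111 1111 1111 and my email is jane@example.com"
stored = redact_pii(transcript)
print(stored)
```

Redacting before the write, rather than on read, means the raw values never reach storage or logs.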

When Not to Use CallSphere

CallSphere is not suitable for every use case. Do not use it for:

  • Legal disputes requiring licensed legal counsel
  • Situations where callers expect a named individual
  • Clinical decisions requiring licensed medical sign-off
  • Empathy-primary interactions (grief counseling, crisis lines)
  • Environments without internet connectivity

What you get with CallSphere

  • 6-layer architecture: Transport (WebRTC, SIP, WebSocket), Speech (ASR/TTS), Reasoning (LLM with system prompt), Actions (tool calling), Safety (guardrails, PII redaction), Integrations (CRM, calendar, payments).
  • Sub-1.5-second end-to-end voice latency: ~300ms ASR, ~500ms LLM, ~200ms TTS, ~200-400ms network and telephony.
  • First-party function tools plus custom REST and webhook tools defined by JSON schema. HMAC-SHA256 signed webhooks for verification.
  • Multi-LLM: per-agent selection across frontier language models, chosen by task complexity and latency budget — no single-vendor lock-in.
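A receiver can verify an HMAC-SHA256 signed webhook along the following lines. This sketch uses only Python's standard hmac and hashlib modules. The hex encoding of the signature, the shared-secret value, and the function names are assumptions; the exact header name and signature format CallSphere uses are not specified on this page.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, payload: bytes) -> str:
    """Compute a hex-encoded HMAC-SHA256 signature over the raw request body."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, payload: bytes, received_signature: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = sign_payload(secret, payload)
    return hmac.compare_digest(expected, received_signature)


secret = b"example-shared-secret"           # hypothetical secret
body = b'{"event": "appointment.booked"}'   # hypothetical payload
signature = sign_payload(secret, body)

is_valid = verify_signature(secret, body, signature)
is_tampered_valid = verify_signature(secret, body + b" ", signature)
```

The comparison uses hmac.compare_digest rather than == to avoid leaking timing information, and the signature is computed over the raw bytes of the body, before any JSON parsing.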

Why CallSphere for the platform

CallSphere runs 6 production AI voice and chat agent platforms today, serving businesses in all 50 US states. Each agent has access to 14 function tools (appointment booking, payment capture, CRM upsert, calendar sync, knowledge-base retrieval, SMS handoff, and more), speaks 57+ languages, and answers in under 1.5 seconds end-to-end. Pricing starts at $149/mo and scales to $1,499/mo for unlimited agents with a 99.9% uptime SLA. Onboarding takes 3-5 business days for most teams, and every plan includes a free 30-day pilot with no credit card.

FAQ

The platform questions, answered

The questions buyers ask most often before they sign.

How fast can a CallSphere agent go live?
Simple use cases like appointment scheduling or FAQ deflection ship in 24 hours. Most customers are in production within 3-5 business days. Complex multi-agent rollouts with custom CRM and EHR integrations take 1-2 weeks with a dedicated onboarding specialist.
What does CallSphere cost?
Starter is $149/mo (1 voice agent, 1 chat agent, 2,000 interactions). Growth is $499/mo (3 agents, 10,000 interactions, 99.9% SLA). Scale is $1,499/mo (unlimited agents and 50,000 interactions, SSO/SAML, dedicated success). Annual billing saves 20% across all tiers.
Does CallSphere support voice and chat from one agent?
Yes. The same agent config, tools, and knowledge base power phone calls, web chat, and SMS. Voice end-to-end latency stays under 1.5 seconds; chat replies stream in under 800ms.
Is CallSphere HIPAA compliant?
Yes. We sign a BAA, encrypt PHI in transit (TLS 1.2+) and at rest, redact PII from logs by default, and run on AWS US-East with optional EU (Frankfurt) and APAC (Singapore) residency on Scale plans.
What integrations are included?
Out-of-the-box connectors for HubSpot, Salesforce, Zendesk, Twilio, Stripe, Shopify, ServiceTitan, Calendly, and Google Calendar. Custom REST and webhook tools take ~1 day to wire up on Growth and Scale plans.
What happens if the AI can't handle a call?
Agents escalate to a human on five configurable triggers: explicit customer request, confidence below threshold, turn-limit exceeded, sensitive topic detected, or repeated tool failure. Full transcript and extracted entities are handed off with the call.
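The five escalation triggers can be sketched as a single policy check. The CallState and EscalationPolicy classes, the default threshold values, and the order in which triggers are evaluated are all hypothetical; only the five trigger names come from the answer above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CallState:
    customer_requested_human: bool = False
    confidence: float = 1.0
    turn_count: int = 0
    sensitive_topic_detected: bool = False
    consecutive_tool_failures: int = 0


@dataclass
class EscalationPolicy:
    # Illustrative defaults; real thresholds are configurable per deployment.
    min_confidence: float = 0.6
    max_turns: int = 25
    max_tool_failures: int = 2


def escalation_reason(state: CallState, policy: EscalationPolicy) -> Optional[str]:
    """Return the first trigger that fires, or None to keep the agent on the call."""
    if state.customer_requested_human:
        return "explicit_customer_request"
    if state.confidence < policy.min_confidence:
        return "confidence_below_threshold"
    if state.turn_count > policy.max_turns:
        return "turn_limit_exceeded"
    if state.sensitive_topic_detected:
        return "sensitive_topic_detected"
    if state.consecutive_tool_failures >= policy.max_tool_failures:
        return "repeated_tool_failure"
    return None


policy = EscalationPolicy()
reason = escalation_reason(CallState(confidence=0.4, turn_count=3), policy)
```

Returning the reason, rather than a bare boolean, lets the handoff carry an explanation to the human operator alongside the transcript.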