Platform
How CallSphere Works
Technical reference for the CallSphere AI voice and chat agent platform, covering its architecture, voice pipeline, execution model, and safety controls. For the underlying stack and AI models, see Technology.
System Definition
What It Is
An agentic AI platform that conducts voice and chat conversations with customers. It understands intent, executes actions via tools, and responds in natural language.
What It Replaces
IVR phone trees, rule-based chatbots, and after-hours voicemail. Handles tasks that previously required a human agent for each interaction.
What It Does Not Replace
Human agents for complex escalations, licensed professionals (legal, medical), and empathy-primary interactions. CallSphere augments your team; it does not replace it.
Mechanistic Workflow
Every voice interaction follows these 9 steps from inbound signal to response delivery.
Inbound Signal
Call arrives via SIP trunk, WebRTC, or WebSocket. The transport layer establishes a bidirectional audio stream.
ASR Transcription
Automatic speech recognition converts audio to text in real time. Supports 57+ languages with speaker diarization.
Turn Detection
Voice activity detection (VAD) and endpointing determine when the caller has finished speaking. Silence threshold: 600 ms (configurable).
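The endpointing rule in this step can be sketched as a small state machine. This is a minimal illustration, not CallSphere's implementation: the `Endpointer` class, the 20 ms frame size, and the assumption that an upstream VAD emits one boolean per frame are all hypothetical. Only the 600 ms threshold comes from the text.

```python
from dataclasses import dataclass


@dataclass
class Endpointer:
    """Declares end-of-turn once enough continuous silence follows speech."""

    silence_threshold_ms: int = 600  # figure from the text; configurable
    frame_ms: int = 20               # assumed audio frame duration
    _heard_speech: bool = False
    _silence_ms: int = 0

    def push(self, is_speech: bool) -> bool:
        """Feed one VAD decision; return True when the caller's turn has ended."""
        if is_speech:
            self._heard_speech = True
            self._silence_ms = 0
            return False
        if not self._heard_speech:
            return False  # leading silence before any speech is ignored
        self._silence_ms += self.frame_ms
        if self._silence_ms >= self.silence_threshold_ms:
            # Reset so the same instance can track the next turn.
            self._heard_speech = False
            self._silence_ms = 0
            return True
        return False
```

With the defaults, ten speech frames followed by silence end the turn on the thirtieth silent frame (30 × 20 ms = 600 ms). Lowering `silence_threshold_ms` makes the agent respond sooner at the cost of cutting off callers who pause mid-sentence.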
Intent Recognition
The LLM analyzes the transcript against the system prompt and conversation history to identify caller intent.
Tool Selection
Based on intent, the LLM selects zero or more tools from the agent's allowlist. Tool definitions include name, description, and parameter schema.
Tool Execution
Selected tools execute against external APIs (CRM, calendar, payment processor). Results return as structured JSON.
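Steps 5 and 6 can be illustrated together: a tool definition carries a name, description, and parameter schema, and a call runs only if the tool is on the agent's allowlist. The sketch below is an assumption-laden example. The `TOOLS` registry, the `check_availability` tool, the simplified type-based schema, and the `{"ok": ...}` result envelope are hypothetical and do not describe CallSphere's actual API.

```python
import json
from typing import Any


def check_availability(date: str) -> dict[str, Any]:
    """Stand-in for a real calendar API call (hypothetical)."""
    return {"date": date, "slots": ["09:00", "14:30"]}


# Hypothetical registry: name -> description, parameter schema, handler.
TOOLS: dict[str, dict[str, Any]] = {
    "check_availability": {
        "description": "List open appointment slots for a date.",
        "parameters": {"date": str},
        "handler": check_availability,
    },
}


def execute_tool(call: dict[str, Any], allowlist: set[str]) -> str:
    """Run one LLM-selected tool call and return structured JSON."""
    name = call["name"]
    args = call.get("arguments", {})
    if name not in allowlist or name not in TOOLS:
        return json.dumps({"ok": False, "error": f"tool not allowed: {name}"})
    spec = TOOLS[name]
    # Validate arguments against the declared parameter types.
    for param, expected_type in spec["parameters"].items():
        if not isinstance(args.get(param), expected_type):
            return json.dumps({"ok": False, "error": f"invalid or missing parameter: {param}"})
    # Pass only declared parameters through to the handler.
    result = spec["handler"](**{p: args[p] for p in spec["parameters"]})
    return json.dumps({"ok": True, "result": result})
```

Returning errors as JSON rather than raising lets the next step, response generation, explain the failure to the caller in natural language.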
Response Generation
The LLM composes a natural-language response incorporating tool results, conversation context, and guardrail constraints.
TTS Synthesis
Text-to-speech converts the response to audio. Voice, speed, and tone are configurable per agent.
Delivery
Audio streams back to the caller. Barge-in detection allows the caller to interrupt at any point, restarting from step 3.
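Barge-in during delivery can be sketched as a playback loop that checks for caller speech before each audio chunk. The function name, the chunked playback model, and the `caller_speaking` callback are illustrative assumptions; the text only states that an interruption restarts the pipeline at turn detection.

```python
from typing import Callable, Sequence


def deliver_response(
    chunks: Sequence[str],
    caller_speaking: Callable[[int], bool],
) -> tuple[list[str], str]:
    """Play TTS chunks in order; stop as soon as the caller starts speaking.

    Returns the chunks actually played and the next pipeline stage.
    """
    played: list[str] = []
    for index, chunk in enumerate(chunks):
        if caller_speaking(index):
            # Barge-in: abandon the rest of the response and listen again.
            return played, "turn_detection"
        played.append(chunk)
    return played, "await_next_turn"
```

Tracking which chunks were played matters in practice: the conversation history should reflect what the caller actually heard, not the full response the LLM generated.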
Agent Architecture
The platform is organized into 6 layers. Each layer is independently replaceable.
Transport
WebRTC, SIP, WebSocket, PSTN. Manages bidirectional audio/text streams and session lifecycle.
Speech
ASR (speech-to-text), TTS (text-to-speech), VAD (voice activity detection), endpointing, barge-in handling.
Reasoning
Frontier LLM with system prompt, conversation history, and structured output. Per-agent model selection across leading providers.
Actions
Tool calling engine. Executes API calls, database queries, and workflow triggers based on LLM decisions.
Safety
Guardrails, PII redaction, topic deny-lists, confidence thresholds, and escalation triggers.
Integrations
CRM, calendar, payments, ticketing, knowledge base, and custom webhook connectors.
Voice Pipeline
| Component | Specification |
| --- | --- |
| ASR | Best-in-class speech-to-text engines |
| ASR Latency | ~300 ms per utterance |
| Turn Detection | VAD + endpointing, 600 ms configurable silence threshold |
| Total Latency Budget | <1.5 seconds end-to-end |
| TTS | Neural text-to-speech with accent-aware voices |
| Interruption Handling | Barge-in restarts pipeline from turn detection |
| Languages | 57+ languages with accent-aware models |
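The latency budget can be checked with simple arithmetic using the per-stage estimates this page quotes (~300 ms ASR, ~500 ms LLM, ~200 ms TTS, ~200-400 ms network and telephony). The sketch below only sums those figures; the variable names are illustrative. The page does not say whether the 600 ms silence threshold counts toward the budget, so it is left out here.

```python
# Per-stage estimates in milliseconds as (low, high), taken from the page.
STAGES_MS = {
    "asr": (300, 300),
    "llm": (500, 500),
    "tts": (200, 200),
    "network_and_telephony": (200, 400),
}
BUDGET_MS = 1500  # "<1.5 seconds end-to-end"


def total_range(stages: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum the best-case and worst-case stage latencies."""
    low = sum(lo for lo, _ in stages.values())
    high = sum(hi for _, hi in stages.values())
    return low, high


low, high = total_range(STAGES_MS)
headroom = BUDGET_MS - high
print(f"estimated {low}-{high} ms, headroom {headroom} ms")
```

The stages sum to 1,200-1,400 ms, which leaves about 100 ms of headroom under the 1.5-second budget in the worst case.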
Action Execution Model
Agent actions fall into 4 modes depending on risk and reversibility.
Deterministic
Fixed-logic actions like looking up business hours or reading a menu. No LLM reasoning required.
API Call
Agent invokes an external API (e.g., book appointment, check inventory). Parameters are extracted from conversation context.
Approval-Required
Agent proposes an action and waits for caller confirmation before executing. Used for payments and irreversible operations.
Human Handoff
Agent transfers the call to a human operator with full conversation context. Triggered by policy rules or caller request.
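The four modes can be sketched as a dispatch on an action's risk class. Everything named here is hypothetical: the `Mode` enum, the example action names, the returned step labels, and the choice to escalate unknown actions to a human are assumptions made for illustration.

```python
from enum import Enum


class Mode(Enum):
    DETERMINISTIC = "deterministic"
    API_CALL = "api_call"
    APPROVAL_REQUIRED = "approval_required"
    HUMAN_HANDOFF = "human_handoff"


# Hypothetical mapping of action names to execution modes.
ACTION_MODES = {
    "get_business_hours": Mode.DETERMINISTIC,
    "book_appointment": Mode.API_CALL,
    "capture_payment": Mode.APPROVAL_REQUIRED,
    "transfer_to_agent": Mode.HUMAN_HANDOFF,
}


def next_step(action: str, caller_confirmed: bool = False) -> str:
    """Decide what the agent does next for a requested action."""
    # Assumption: actions without a configured mode are handed to a human.
    mode = ACTION_MODES.get(action, Mode.HUMAN_HANDOFF)
    if mode is Mode.DETERMINISTIC:
        return "answer_from_fixed_logic"
    if mode is Mode.API_CALL:
        return "call_api"
    if mode is Mode.APPROVAL_REQUIRED:
        # Irreversible operations wait for explicit caller confirmation.
        return "call_api" if caller_confirmed else "ask_caller_to_confirm"
    return "transfer_with_context"
```

For example, `next_step("capture_payment")` asks for confirmation first, and only `next_step("capture_payment", caller_confirmed=True)` proceeds to the API call.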
Enterprise Safety & Control
- Tool allowlists per agent prevent unauthorized actions
- Topic deny-lists block discussion of excluded subjects
- PII redaction masks sensitive data before storage
- Confidence thresholds trigger escalation when the agent is uncertain
- Turn limits prevent infinite conversation loops
- Rate limiting protects against abuse
- Immutable audit logs record every action and tool invocation
- HIPAA, PCI-DSS, and GDPR compliance controls available
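Three of the controls above (PII redaction, confidence thresholds, and turn limits) can be sketched in a few lines. The regular expressions are deliberately simple and would miss many real-world formats, and the 0.6 confidence floor and 20-turn cap are placeholder values, not CallSphere defaults.

```python
import re

# Simplified patterns for illustration only; production redaction needs more.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")   # 13-16 digits, optional separators
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact(text: str) -> str:
    """Mask card-like numbers and email addresses before storage."""
    text = CARD_RE.sub("[CARD]", text)
    return EMAIL_RE.sub("[EMAIL]", text)


def should_escalate(
    confidence: float,
    turn_count: int,
    *,
    min_confidence: float = 0.6,  # assumed threshold
    max_turns: int = 20,          # assumed turn limit
) -> bool:
    """Escalate when the agent is uncertain or the conversation runs too long."""
    return confidence < min_confidence or turn_count >= max_turns
```

Redacting before the transcript is written, rather than afterwards, means sensitive values never reach storage or the audit log in the first place.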
When Not to Use CallSphere
CallSphere is not suitable for every use case. Do not use it for:
- Legal disputes requiring licensed legal counsel
- Situations where callers expect a named individual
- Clinical decisions requiring licensed medical sign-off
- Empathy-primary interactions (grief counseling, crisis lines)
- Environments without internet connectivity
What you get with CallSphere
- 6-layer architecture: Transport (WebRTC, SIP, WebSocket), Speech (ASR/TTS), Reasoning (LLM with system prompt), Actions (tool calling), Safety (guardrails, PII redaction), Integrations (CRM, calendar, payments).
- Sub-1.5-second end-to-end voice latency: ~300ms ASR, ~500ms LLM, ~200ms TTS, ~200-400ms network and telephony.
- First-party function tools plus custom REST and webhook tools defined by JSON schema. HMAC-SHA256 signed webhooks for verification.
- Multi-LLM: per-agent selection across frontier language models, chosen by task complexity and latency budget, with no single-vendor lock-in.
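HMAC-SHA256 webhook verification, mentioned in the list above, generally works as sketched below using Python's standard `hmac` and `hashlib` modules. The details are assumptions: this page does not specify the header name, the signature encoding (hex is assumed here), or exactly which bytes are signed, so check those against the actual webhook documentation.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Compute the hex-encoded HMAC-SHA256 of a raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Check a received signature against the raw body in constant time."""
    expected = sign(secret, body)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected.encode(), signature_hex.encode())
```

Verification should run on the raw request bytes before any JSON parsing, since re-serializing the payload can change whitespace or key order and invalidate the signature.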
Why CallSphere for the platform
CallSphere runs 6 production AI voice and chat agent platforms today, serving businesses in all 50 US states. Each agent has access to 14 function tools (appointment booking, payment capture, CRM upsert, calendar sync, knowledge-base retrieval, SMS handoff, and more), speaks 57+ languages, and answers in under 1.5 seconds end-to-end. Pricing starts at $149/mo and scales to $1,499/mo for unlimited agents with a 99.9% uptime SLA. Onboarding takes 3-5 business days for most teams, and every plan includes a free 30-day pilot with no credit card.
The platform questions, answered
The questions buyers ask most often before they sign.