By Sagar Shankaran, Founder of CallSphere
ElevenLabs shipped v3 expressive TTS and consolidated its agents stack into ElevenAgents in March 2026. The MCP-native, action-taking voice agent has arrived.
flowchart TD
In["Inbound voice call"] --> VAD["Server VAD"]
VAD --> Triage["Triage Agent"]
Triage -->|booking| Book["Booking Agent"]
Triage -->|inquiry| Info["Inquiry Agent"]
Triage -->|reschedule| Resched["Reschedule Agent"]
Book --> DB[("Postgres + Prisma")]
Info --> DB
Resched --> DB
DB --> Out["Spoken response · ElevenLabs"]

ElevenLabs entered 2026 with three coordinated releases. Eleven v3 (the expressive flagship) shifted TTS from synthesized to genuinely emotive: it can deliver sarcasm, hesitation, and laughter in-line with conversational tags. Scribe became their most accurate transcription model, released January 2026. And in March 2026, the company consolidated its conversational stack into ElevenAgents, a single product that replaces the older "Conversational AI" and "11.ai" alpha experiments.
The March 25, 2026 IBM partnership announcement was the production unlock: ElevenAgents now ships a watsonx integration, putting premium voice capabilities into IBM's agentic enterprise stack.
The biggest architectural shift inside ElevenAgents is native Model Context Protocol (MCP) support. Agents are no longer limited to prompt-and-respond; they take real actions mid-conversation: check a CRM, book an appointment, process a payment. The platform also added Git-style branching for agent definitions, conversation search across uploaded files, and stronger safety guardrails for production deployment.
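To make "real actions mid-conversation" concrete, here is a minimal sketch of the JSON-RPC 2.0 message shape that MCP uses for tool calls (the `tools/call` method with `name` and `arguments`). It uses only the Python standard library and is not the ElevenAgents API or an MCP SDK. The tool names `check_crm` and `book_appointment` and their in-memory handlers are hypothetical stand-ins.

```python
import json
from typing import Callable, Dict

# Hypothetical tools an agent might call mid-conversation (illustrative only).
def check_crm(phone: str) -> str:
    return f"Customer record found for {phone}"

def book_appointment(service: str, slot: str) -> str:
    return f"Booked {service} at {slot}"

TOOLS: Dict[str, Callable[..., str]] = {
    "check_crm": check_crm,
    "book_appointment": book_appointment,
}

def handle_request(raw: str) -> str:
    """Handle one MCP-style `tools/call` request and return the response as JSON."""
    req = json.loads(raw)
    req_id = req.get("id")

    def error(code: int, message: str) -> str:
        return json.dumps({"jsonrpc": "2.0", "id": req_id,
                           "error": {"code": code, "message": message}})

    if req.get("method") != "tools/call":
        return error(-32601, "Method not found")
    params = req.get("params", {})
    tool = TOOLS.get(params.get("name"))
    if tool is None:
        return error(-32602, "Unknown tool")
    try:
        text, is_error = tool(**params.get("arguments", {})), False
    except Exception as exc:
        # A failing tool is reported inside the result, not as a protocol error.
        text, is_error = str(exc), True
    return json.dumps({"jsonrpc": "2.0", "id": req_id,
                       "result": {"content": [{"type": "text", "text": text}],
                                  "isError": is_error}})
```

In a real deployment the transport, authentication, and tool discovery (`tools/list`) would come from an MCP server implementation; the sketch only shows why a tool defined once can be reused by any MCP-aware agent.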
Three concrete consequences:
CallSphere's Salon GlamBook product (4 agents, GB-YYYYMMDD-### booking refs) runs on ElevenLabs TTS/STT. It adopted the v3 voices in March, and we measured a 19% drop in "asked to repeat" events from real callers. The salon-receptionist persona benefits the most from v3's expressive range, because booking a balayage at 6pm is an emotionally charged conversation when the only Saturday slot just disappeared.
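As an illustration of the GB-YYYYMMDD-### reference format mentioned above, the following sketch builds and parses such references. It assumes that `###` is a zero-padded, three-digit sequence number in the range 1-999; the article does not say how GlamBook actually assigns it, and the function names are hypothetical.

```python
import re
from datetime import date
from typing import Tuple

# Assumed layout: "GB-" + date as YYYYMMDD + "-" + three-digit sequence.
REF_PATTERN = re.compile(r"GB-(\d{4})(\d{2})(\d{2})-(\d{3})")

def make_booking_ref(day: date, seq: int) -> str:
    """Format a booking reference for the given day and sequence number."""
    if not 1 <= seq <= 999:
        raise ValueError("sequence must fit in three digits (1-999)")
    return f"GB-{day:%Y%m%d}-{seq:03d}"

def parse_booking_ref(ref: str) -> Tuple[date, int]:
    """Split a reference back into its date and sequence number."""
    match = REF_PATTERN.fullmatch(ref)
    if match is None:
        raise ValueError(f"not a GB-YYYYMMDD-### reference: {ref!r}")
    year, month, day, seq = (int(group) for group in match.groups())
    return date(year, month, day), seq
```

A reference that is easy to read aloud and parse back matters for a voice agent, since callers repeat it over the phone when rescheduling.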
Across the broader CallSphere fleet (37 agents, 90+ tools, 115+ DB tables, 6 verticals, 57+ languages, HIPAA + SOC 2 aligned), we use ElevenLabs where empathy and brand voice matter most: Salon, the chat-side product cards on /demo, and pilot deployments in hospitality. The Healthcare Voice Agent stays on OpenAI Realtime because of its tighter PHI governance story, but every CallSphere deployment can pick the TTS provider per-agent.
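A per-agent provider choice like the one described above could be modeled roughly as follows. This is a sketch, not CallSphere's actual configuration: the class names, the vertical-to-provider defaults, and the fallback provider are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class TTSProvider(Enum):
    ELEVENLABS_V3 = "elevenlabs-v3"
    OPENAI_REALTIME = "openai-realtime"

@dataclass(frozen=True)
class AgentConfig:
    name: str
    vertical: str
    tts: TTSProvider

# Hypothetical defaults mirroring the split described in the text.
DEFAULT_BY_VERTICAL = {
    "salon": TTSProvider.ELEVENLABS_V3,
    "hospitality": TTSProvider.ELEVENLABS_V3,
    "healthcare": TTSProvider.OPENAI_REALTIME,
}

def build_agent(name: str, vertical: str,
                tts: Optional[TTSProvider] = None) -> AgentConfig:
    """Use the explicit per-agent override if given, else the vertical default."""
    if tts is None:
        # Fallback for unlisted verticals is an assumption, not a documented rule.
        tts = DEFAULT_BY_VERTICAL.get(vertical, TTSProvider.OPENAI_REALTIME)
    return AgentConfig(name=name, vertical=vertical, tts=tts)
```

Keeping the provider on the agent rather than on the deployment is what lets one vertical prioritize expressive range while another prioritizes governance.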
If you want to hear the difference, the demo cards on our site let you A/B Eleven v3 against gpt-realtime live.
<prosody> tags, replace with v3's natural-language tags.

What is ElevenLabs v3?
ElevenLabs v3 is the company's flagship expressive text-to-speech model. It supports inline emotional tags such as [laughs] and [sighs] and produces a conversational cadence. It launched in 2025 and remains the recommended model in early 2026.
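For scripts that still carry SSML `<prosody>` markup, a migration toward bracketed inline tags might start with a helper like the one below. It is a sketch: the function names are hypothetical, and the choice of tag is left to the caller because only [laughs] and [sighs] are named here. Check any other tag against the ElevenLabs documentation before relying on it.

```python
import re

# Matches opening tags such as <prosody rate="slow"> and closing </prosody>.
PROSODY_TAG = re.compile(r"</?prosody\b[^>]*>", re.IGNORECASE)

def strip_prosody(ssml: str) -> str:
    """Remove <prosody> wrappers while keeping the words they enclosed."""
    text = PROSODY_TAG.sub("", ssml)
    # Collapse any double spaces left behind by the removed tags.
    return re.sub(r"\s{2,}", " ", text).strip()

def with_inline_tag(text: str, tag: str) -> str:
    """Prefix a line with a bracketed tag such as [sighs]."""
    return f"[{tag}] {text}"
```

For example, `with_inline_tag(strip_prosody('<prosody rate="slow">I am so sorry,</prosody> that slot is gone.'), "sighs")` yields `[sighs] I am so sorry, that slot is gone.`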
Is ElevenAgents a replacement for Conversational AI?
Yes. ElevenAgents consolidates Conversational AI and 11.ai into one product, with MCP support, Git-style branching, and IBM watsonx integration as of March 25, 2026.
Does ElevenLabs support MCP?
Yes. ElevenAgents added native Model Context Protocol support in 2026, so tools defined as MCP servers can be plugged into agents without bespoke integration code.
How does v3 compare to gpt-realtime for natural-sounding speech?
For brand voice and emotional range, blind tests still favor v3. For end-to-end latency and integrated tool calling, gpt-realtime is the simpler one-stack option. CallSphere uses both depending on the vertical.
Can I clone my own voice for an agent?
Yes. ElevenLabs instant cloning needs about 1-2 minutes of clean audio. Professional voice cloning needs more (~30 minutes) but produces studio-grade output.
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.