ElevenLabs shipped v3 expressive TTS and consolidated its agents stack into ElevenAgents in March 2026. The MCP-native, action-taking voice agent has arrived.

What changed

flowchart TD
  In["Inbound voice call"] --> VAD["Server VAD"]
  VAD --> Triage["Triage Agent"]
  Triage -->|booking| Book["Booking Agent"]
  Triage -->|inquiry| Info["Inquiry Agent"]
  Triage -->|reschedule| Resched["Reschedule Agent"]
  Book --> DB[("Postgres + Prisma")]
  Info --> DB
  Resched --> DB
  DB --> Out["Spoken response · ElevenLabs"]

CallSphere reference architecture
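
The routing step in the diagram can be sketched as a small dispatcher. This is a minimal illustration, not CallSphere's implementation: the keyword lists, the `classify_intent` and `route` names, and the fallback to the Inquiry Agent are all assumptions made for the example. A production triage agent would more likely use an LLM or a trained classifier.

```python
from typing import Dict, Tuple

# Hypothetical keyword lists, one per branch in the diagram.
# Order matters: "reschedule" is checked first because it contains "schedule".
INTENT_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "reschedule": ("reschedule", "move my", "change my appointment"),
    "booking": ("book", "schedule", "appointment"),
    "inquiry": ("price", "hours", "how much", "open"),
}

# Agent names taken from the diagram.
AGENTS: Dict[str, str] = {
    "booking": "Booking Agent",
    "inquiry": "Inquiry Agent",
    "reschedule": "Reschedule Agent",
}


def classify_intent(utterance: str) -> str:
    """Return the first intent whose keywords appear in the utterance."""
    text = utterance.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return intent
    return "inquiry"  # assumed fallback when nothing matches


def route(utterance: str) -> str:
    """Map a transcribed utterance to the agent that should handle it."""
    return AGENTS[classify_intent(utterance)]
```

Under these assumptions, `route("I need to reschedule my haircut")` returns the Reschedule Agent, and an utterance with no matching keyword falls through to the Inquiry Agent.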

ElevenLabs entered 2026 with three coordinated releases. Eleven v3, the expressive flagship, moved TTS from flat synthesized speech to emotive delivery: it can render sarcasm, hesitation, and laughter through inline conversational tags. Scribe, released in January 2026, became the company's most accurate transcription model. And in March 2026, the company consolidated its conversational stack into ElevenAgents, a single product that replaces the older "Conversational AI" and "11.ai" alpha experiments.

The IBM partnership announced on March 25, 2026 was what opened the door to enterprise production use: ElevenAgents now ships a watsonx integration, putting premium voice capabilities into IBM's agentic enterprise stack.

The biggest architectural shift inside ElevenAgents is Model Context Protocol (MCP) native support. Agents are no longer prompt-and-respond — they take real actions mid-conversation: check a CRM, book an appointment, process a payment. The platform also added Git-style branching for agent definitions, conversation search across uploaded files, and stronger safety guardrails for production deployment.
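
To make "take real actions mid-conversation" concrete, here is a minimal in-process sketch of the two MCP messages an agent runtime sends to a tool server: `tools/list` to discover actions and `tools/call` to invoke one. It only illustrates the JSON-RPC message shapes. It is not ElevenLabs' integration and not an MCP SDK, and it skips the initialization handshake and the transport layer. The `book_appointment` tool, its schema, and the `handle_request` helper are hypothetical.

```python
from typing import Any, Callable, Dict

# Hypothetical tool definition: name, description, and schema are illustrative.
TOOLS: Dict[str, Dict[str, Any]] = {
    "book_appointment": {
        "description": "Book a salon appointment for a caller.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "service": {"type": "string"},
                "start_time": {"type": "string"},
            },
            "required": ["service", "start_time"],
        },
    }
}


def book_appointment(service: str, start_time: str) -> str:
    # Stand-in for a real CRM or database write.
    return f"Booked {service} at {start_time}"


HANDLERS: Dict[str, Callable[..., str]] = {"book_appointment": book_appointment}


def handle_request(request: Dict[str, Any]) -> Dict[str, Any]:
    """Answer tools/list and tools/call JSON-RPC 2.0 requests."""
    base = {"jsonrpc": "2.0", "id": request.get("id")}
    method = request.get("method")

    if method == "tools/list":
        tools = [{"name": name, **spec} for name, spec in TOOLS.items()]
        return {**base, "result": {"tools": tools}}

    if method == "tools/call":
        params = request["params"]
        handler = HANDLERS.get(params["name"])
        if handler is None:
            message = f"Unknown tool: {params['name']}"
            return {**base, "error": {"code": -32602, "message": message}}
        text = handler(**params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": text}], "isError": False}
        return {**base, "result": result}

    return {**base, "error": {"code": -32601, "message": "Method not found"}}
```

Because the tool is described once through its name and input schema, any runtime that speaks the protocol can discover and call it without bespoke glue code.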

Why it matters for voice agent builders

Three concrete consequences:

TTS is no longer the limiting factor for naturalness. With v3's emotional control via inline tags ("[laughs] yeah, sure"), the agent's voice can match the affect of the conversation, which is the main cue callers rely on to judge whether they are talking to "a good bot" or "a bad bot."
Tool calling is becoming a first-class part of the voice loop, not a side channel. MCP-native means you describe an action once and any MCP-compatible agent runtime can use it.
Branching agent definitions cuts iteration time. A real-world voice flow has dozens of edge cases (after-hours, language switch, escalation, billing dispute), and Git-style branching lets you ship changes to those flows more safely.
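
As an illustration of the inline-tag point, the sketch below builds (but does not send) a text-to-speech request whose text carries a v3 tag. It assumes the public ElevenLabs REST endpoint `POST /v1/text-to-speech/{voice_id}` with an `xi-api-key` header and a JSON body containing `text` and `model_id`; check the current API reference before relying on it. The `tagged` and `build_tts_request` helpers and the placeholder voice ID and key are hypothetical.

```python
import json
import urllib.request

# Assumed base URL of the ElevenLabs text-to-speech endpoint.
API_BASE = "https://api.elevenlabs.io/v1/text-to-speech"


def tagged(tag: str, text: str) -> str:
    """Prefix a line with an inline tag, e.g. '[laughs] yeah, sure'."""
    return f"[{tag}] {text}"


def build_tts_request(
    voice_id: str, text: str, api_key: str, model_id: str = "eleven_v3"
) -> urllib.request.Request:
    """Build the HTTP request without sending it."""
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/{voice_id}",
        data=body,
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder credentials: replace before sending with urllib.request.urlopen.
request = build_tts_request(
    "YOUR_VOICE_ID", tagged("laughs", "yeah, sure"), "YOUR_API_KEY"
)
```

Keeping the tag in the text itself means the same string can be logged, reviewed, and A/B tested like any other prompt copy.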

How CallSphere applies this

CallSphere's Salon GlamBook product (4 agents, GB-YYYYMMDD-### booking refs) runs on ElevenLabs TTS/STT. After it adopted the v3 voices in March, we measured a 19% drop in "asked to repeat" events from real callers. The salon-receptionist persona benefits most from v3's expressive range: booking a balayage at 6pm becomes an emotionally charged conversation when the only Saturday slot has just disappeared.
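
For readers wiring up a similar reference scheme, here is a small sketch of generating and parsing refs in the GB-YYYYMMDD-### shape. The format string comes from the text above; reading ### as a zero-padded sequence number from 000 to 999 is an assumption, and the function names are hypothetical.

```python
import re
from datetime import date
from typing import Tuple

# GB-YYYYMMDD-### where ### is assumed to be a zero-padded sequence number.
BOOKING_REF_PATTERN = re.compile(r"GB-(\d{4})(\d{2})(\d{2})-(\d{3})")


def make_booking_ref(day: date, sequence: int) -> str:
    """Format a booking reference for the given day and sequence number."""
    if not 0 <= sequence <= 999:
        raise ValueError("sequence must fit in three digits")
    return f"GB-{day:%Y%m%d}-{sequence:03d}"


def parse_booking_ref(ref: str) -> Tuple[date, int]:
    """Split a booking reference into its date and sequence number."""
    match = BOOKING_REF_PATTERN.fullmatch(ref)
    if match is None:
        raise ValueError(f"not a booking ref: {ref!r}")
    year, month, day, sequence = (int(group) for group in match.groups())
    return date(year, month, day), sequence  # date() rejects impossible dates
```

A fixed-width, digit-only reference is easy for a voice agent to read back and for a caller to repeat, which matters more on a phone line than in a web form.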

Across the broader CallSphere fleet (37 agents, 90+ tools, 115+ DB tables, 6 verticals, 57+ languages, HIPAA + SOC 2 aligned), we use ElevenLabs where empathy and brand voice matter most: Salon, the chat-side product cards on /demo, and pilot deployments in hospitality. The Healthcare Voice Agent stays on OpenAI Realtime because of its tighter PHI governance story, but every CallSphere deployment can pick the TTS provider per agent.
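
Per-agent provider selection can be as simple as a lookup table. The sketch below is a hypothetical configuration shape, not CallSphere's actual schema: the agent keys, the `AgentVoiceConfig` type, and the `voice_config_for` helper are assumptions, while the model names `eleven_v3` and `gpt-realtime` are the ones mentioned in this article.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class TtsProvider(Enum):
    ELEVENLABS = "elevenlabs"
    OPENAI_REALTIME = "openai_realtime"


@dataclass(frozen=True)
class AgentVoiceConfig:
    agent: str
    provider: TtsProvider
    model: str


# Hypothetical agent keys; the provider split follows the paragraph above.
VOICE_CONFIGS: Dict[str, AgentVoiceConfig] = {
    "salon-receptionist": AgentVoiceConfig(
        "salon-receptionist", TtsProvider.ELEVENLABS, "eleven_v3"
    ),
    "healthcare-voice": AgentVoiceConfig(
        "healthcare-voice", TtsProvider.OPENAI_REALTIME, "gpt-realtime"
    ),
}


def voice_config_for(agent: str) -> AgentVoiceConfig:
    """Look up the voice stack for an agent, failing loudly if unconfigured."""
    try:
        return VOICE_CONFIGS[agent]
    except KeyError:
        raise KeyError(f"no voice config for agent {agent!r}") from None
```

Failing on an unknown agent, rather than silently defaulting, keeps a regulated vertical from ending up on a provider nobody chose for it.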

If you want to hear the difference, the demo cards on our site let you A/B Eleven v3 against gpt-realtime live.

Build and migration steps

Sign up for ElevenLabs Creator or Pro and request access to the v3 voices (model ID eleven_v3 in the API).
Audit your existing TTS calls: anywhere you used SSML <prosody> tags, replace them with v3's natural-language tags.
Migrate older Conversational AI agents into ElevenAgents — the dashboard offers a one-click import.
Add an MCP server for your CRM (Salesforce, HubSpot, or your own API) and bind it to the agent.
Use branching to fork your agent for an A/B test before deploying broadly.
Add Scribe as your STT layer if you were on Whisper — accuracy gains on accented English are material.
Set up ElevenLabs guardrails: PII redaction, profanity filters, and out-of-scope detection.
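
The audit step above can be partly automated. The following is a minimal sketch, assuming your prompts or response templates live in a dictionary of strings; the function names are hypothetical. It only finds and strips `<prosody>` tags. Choosing which v3 tag, if any, should replace each one remains a manual editorial decision.

```python
import re
from typing import Dict, List

# Matches opening and closing SSML prosody tags, with or without attributes.
PROSODY_TAG = re.compile(r"</?prosody\b[^>]*>", re.IGNORECASE)


def strip_prosody(text: str) -> str:
    """Remove prosody tags while keeping the text they wrapped."""
    return PROSODY_TAG.sub("", text)


def find_prompts_to_migrate(prompts: Dict[str, str]) -> List[str]:
    """Return the names of prompts that still contain prosody tags."""
    return sorted(
        name for name, text in prompts.items() if PROSODY_TAG.search(text)
    )


# Illustrative templates only.
templates = {
    "greeting": "Thanks for calling!",
    "hold": '<prosody rate="slow">One moment</prosody> please',
}
```

Here `find_prompts_to_migrate(templates)` flags only the `hold` template, and `strip_prosody` leaves its wording intact so a v3 tag can be added by hand.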

FAQ

What is ElevenLabs v3? ElevenLabs v3 is the company's flagship expressive text-to-speech model. It supports inline emotional tags such as [laughs] and [sighs], along with conversational cadence. It launched in 2025 and remains the recommended model in early 2026.

Is ElevenAgents a replacement for Conversational AI? Yes. ElevenAgents consolidates Conversational AI and 11.ai into one product, with MCP support, Git-style branching, and IBM watsonx integration as of March 25, 2026.

Does ElevenLabs support MCP? Yes — ElevenAgents added native Model Context Protocol support in 2026, which means tools defined as MCP servers can be plugged into agents without bespoke integration code.

How does v3 compare to gpt-realtime for natural-sounding speech? For brand voice and emotional range, blind tests still favor v3. For end-to-end latency and integrated tool calling, gpt-realtime is the simpler one-stack option. CallSphere uses both depending on the vertical.

Can I clone my own voice for an agent? Yes. ElevenLabs instant cloning needs about 1-2 minutes of clean audio. Professional voice cloning needs more (~30 minutes) but produces studio-grade output.

Sources

ElevenLabs official site — https://elevenlabs.io/
IBM newsroom — "Enterprise AI Finds its Voice: ElevenLabs and IBM" (March 25 2026) — https://newsroom.ibm.com/2026-03-25-enterprise-ai-finds-its-voice-elevenlabs-and-ibm-bring-premium-voice-capabilities-to-agentic-ai
Medium — "ElevenLabs in 2026: The Complete Guide" — https://medium.com/the-ai-entrepreneurs/elevenlabs-in-2026-the-complete-guide-to-v3-agents-music-and-scribe-7f3c3bdfd201
Webfuse — "ElevenLabs Cheat Sheet 2026" — https://www.webfuse.com/elevenlabs-cheat-sheet

ElevenLabs v3 + ElevenAgents March 2026: Voice with Real Emotional Range
