By Sagar Shankaran, Founder of CallSphere
Soniox v4 Async (Jan 29) and v4 Real-Time (Feb 5) deliver native-speaker accuracy across 60+ languages with code-switching. Inside the model and where it wins.
flowchart TD
In["Inbound voice call"] --> VAD["Server VAD"]
VAD --> Triage["Triage Agent"]
Triage -->|booking| Book["Booking Agent"]
Triage -->|inquiry| Info["Inquiry Agent"]
Triage -->|reschedule| Resched["Reschedule Agent"]
Book --> DB[("Postgres + Prisma")]
Info --> DB
Resched --> DB
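DB --> Out["Spoken response · ElevenLabs"]

Soniox shipped two flagship releases in early 2026: v4 Async on January 29 for batch transcription, and v4 Real-Time on February 5 for streaming voice agents.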
Both releases share one underlying universal model that natively understands all 60+ languages and handles code-switching within a single sentence. That differs from the older approach of detecting the language first and then routing to a monolingual model, which adds latency and breaks when a speaker flips languages mid-sentence.
On April 23, 2026, Soniox added Soniox Text-to-Speech — a new API for high-fidelity speech generation in 60+ languages with accurate alphanumeric rendering and natural language switching, completing the company's offering as a full voice stack.
Soniox also offers real-time, context-aware translation across 60+ languages and 3,600+ language pairs, engineered specifically for code-switching environments.
The combination of universal multilingual + code-switching matters for three concrete reasons:
CallSphere supports 57+ languages across 6 verticals. Until Q1 2026, our multilingual stack was a mix: OpenAI Whisper for STT in some languages, Deepgram Nova for others, ElevenLabs Multilingual v2 for TTS, and a separate translation layer for less common pairs. This was operationally heavy and inconsistent in quality.
In April 2026, we migrated our LATAM, India, and East Asia pilots to Soniox v4 Real-Time + v4 Async (for post-call transcript reconstruction) + Soniox TTS where ElevenLabs did not have a strong voice. Net change: single vendor for the multilingual tier, ~22% lower per-call cost on those routes, and fewer code-switching mistakes in QA.
The Healthcare Voice Agent (FastAPI :8084, 14 tools, OpenAI Realtime, post-call sentiment –1.0 to 1.0 + lead score 0-100) keeps OpenAI Realtime as the default for English; Soniox is the path for non-English calls. OneRoof Real Estate (10 specialist agents, vision on photos, OpenAI Agents SDK) and Salon GlamBook (4 agents) similarly route by language.
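The language-based routing described above can be sketched in a few lines. This is an illustrative sketch, not CallSphere's actual code: the `Route` class, the `pick_route` function, and the engine label `openai-realtime` are hypothetical names, and sending calls with an unknown language to the multilingual path is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical engine labels used only for illustration; they are not SDK identifiers.
OPENAI_REALTIME = "openai-realtime"
SONIOX_REALTIME = "soniox-v4-realtime"


@dataclass(frozen=True)
class Route:
    stt_engine: str
    reason: str


def pick_route(language_tag: Optional[str]) -> Route:
    """Pick a speech engine from a language tag such as 'en-US' or 'es_MX'."""
    if not language_tag:
        # Assumption: an unknown language goes to the multilingual model.
        return Route(SONIOX_REALTIME, "language unknown")
    # Keep only the primary subtag ('en' from 'en-US'), case-insensitively.
    primary = language_tag.replace("_", "-").split("-")[0].lower()
    if primary == "en":
        return Route(OPENAI_REALTIME, "English default")
    return Route(SONIOX_REALTIME, f"non-English ({primary})")


if __name__ == "__main__":
    for tag in ("en-US", "es-MX", "hi-IN", None):
        print(tag, "->", pick_route(tag))
```

Routing on the primary subtag only keeps regional variants such as `en-GB` and `en-IN` on the same path. A production router would likely also need a fallback for calls that start in English and then switch languages.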
The same $149 / $499 / $1499 pricing covers any language; the 14-day no-card trial lets resellers prove out a non-English market before committing.
Benchmark soniox-v4-realtime against your existing STT on 200 real calls per language, and measure WER and code-switch behavior.
Leave language selection on auto to let the universal model pick; do not lock to a single locale.

What is Soniox v4? Soniox's fourth-generation universal multilingual speech model. Released in two variants: v4 Async (January 29, 2026) for batch and v4 Real-Time (February 5, 2026) for streaming voice agents.
How many languages does Soniox v4 support? 60+ languages with native-speaker-quality recognition. Code-switching is supported within a single audio stream without explicit language detection.
Can Soniox handle code-switching? Yes — that is its core differentiator. The universal model handles speakers flipping languages mid-sentence (Spanish-English, Hindi-English, etc.) without breaking the transcript.
Is Soniox cheaper than Deepgram or Whisper? Pricing varies by volume. Soniox is competitive with Deepgram for streaming and beats Whisper API for non-English. Run a per-language cost comparison before committing.
Does Soniox have its own TTS? Yes — Soniox TTS launched April 23, 2026 with 60+ language support, alphanumeric accuracy, and language-switching mid-utterance.
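For the benchmarking step recommended earlier (comparing engines by WER on real calls per language), here is a minimal sketch of word error rate computed with a word-level edit distance. The function names `edit_distance` and `corpus_wer` are illustrative, and the lowercase-and-split normalization is a simplifying assumption. Real evaluations usually also strip punctuation and need language-specific tokenization, for example for languages written without spaces.

```python
from typing import Iterable, List, Tuple


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Minimum substitutions, insertions, and deletions to turn ref into hyp."""
    # prev[j] holds the distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """WER over (reference, hypothesis) transcript pairs: errors / reference words."""
    errors = 0
    ref_words = 0
    for reference, hypothesis in pairs:
        ref = reference.lower().split()
        hyp = hypothesis.lower().split()
        errors += edit_distance(ref, hyp)
        ref_words += len(ref)
    if ref_words == 0:
        raise ValueError("reference transcripts contain no words")
    return errors / ref_words


if __name__ == "__main__":
    sample = [
        ("quiero book una cita", "quiero buscar una cita"),
        ("hello world", "hello world"),
    ]
    print(f"WER: {corpus_wer(sample):.3f}")
```

Summing errors and reference words across the corpus, rather than averaging per-call WER, keeps short calls from dominating the score. Running the same reference set through each engine, one language at a time, gives directly comparable numbers.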
Written by
Sagar Shankaran · Founder, CallSphere
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.