By Sagar Shankaran, Founder of CallSphere
mediasoup is the Node + C++ SFU that powers half the open-source voice stack. Here is the cost math against managed clouds and the production tuning that matters.
Key takeaways
mediasoup is a C++ SFU with a Node.js control plane. For voice-only AI agents, a single 16-core box can hold thousands of concurrent peers. Self-hosting cuts the per-minute cost 4–6x versus managed clouds — if you can run it.
flowchart LR
Browser["Browser · WebRTC"] --> ICE["ICE / STUN / TURN"]
ICE --> SFU["SFU · Pion Go gateway 1.23"]
SFU --> NATS["NATS bus"]
NATS --> AI["AI Worker · OpenAI Realtime"]
AI --> NATS
NATS --> SFU
SFU --> Browser
mediasoup is purpose-built to route media packets: not to encode them, not to mix them, just to forward them. It is faster and lighter than most general-purpose SFUs because the worker is C++; the orchestration runs in Node. In 2026 it remains the SFU of choice for teams who already operate Linux at scale and want to push managed-cloud bandwidth bills down.
The cost math is straightforward. Managed SFUs (LiveKit Cloud, Daily, Agora) run $0.002–0.004 per participant-minute. Self-hosted mediasoup on a $400/month bare-metal box can hold ~3,000 concurrent voice peers, which works out to a hair under $0.0005 per participant-minute including TURN egress. Above 1,000 concurrent users the savings are real.
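The arithmetic behind these figures can be sketched as follows. The $400/month box and the ~3,000 peak peers come from the text; the average utilization and the TURN egress rate are placeholder assumptions, since the article does not state them, and `costPerParticipantMinute` is a hypothetical helper name.

```ts
// Rough per-participant-minute cost for one self-hosted SFU box.
// Box price and peak peers are taken from the article; utilization and
// TURN egress cost are illustrative assumptions, not measured values.
interface CostInputs {
  boxUsdPerMonth: number;       // monthly price of the bare-metal box
  peakConcurrentPeers: number;  // peers the box can hold at peak
  avgUtilization: number;       // fraction of peak in use on average, 0..1 (assumed)
  turnUsdPerPeerMinute: number; // TURN egress per participant-minute (assumed)
}

const MINUTES_PER_MONTH = 30 * 24 * 60; // 43,200 minutes in a 30-day month

function costPerParticipantMinute(c: CostInputs): number {
  // Participant-minutes actually served in a month.
  const servedMinutes =
    c.peakConcurrentPeers * c.avgUtilization * MINUTES_PER_MONTH;
  return c.boxUsdPerMonth / servedMinutes + c.turnUsdPerPeerMinute;
}

// Best case: the box is full all month and TURN is free.
const fullyLoaded = costPerParticipantMinute({
  boxUsdPerMonth: 400,
  peakConcurrentPeers: 3000,
  avgUtilization: 1,
  turnUsdPerPeerMinute: 0,
});

// Same box at an assumed 5% average utilization.
const lightlyLoaded = costPerParticipantMinute({
  boxUsdPerMonth: 400,
  peakConcurrentPeers: 3000,
  avgUtilization: 0.05,
  turnUsdPerPeerMinute: 0,
});

console.log(fullyLoaded.toExponential(2), lightlyLoaded.toExponential(2));
```

At full load the hardware alone contributes only about $0.000003 per participant-minute, so a realistic per-minute figure depends mostly on average utilization and TURN egress rather than on the box price.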
A mediasoup voice-agent setup:
mediasoup deliberately leaves "rooms," "permissions," and "presence" to your app — you build that on top. For voice agents this is a feature: you control session lifecycle.
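Because mediasoup has no room concept, the app keeps its own mapping from sessions to the peers it admitted. A minimal in-memory sketch follows; `SessionRegistry`, its method names, and the one-caller-one-agent rule are hypothetical, and the IDs are treated as opaque strings rather than real mediasoup objects.

```ts
type PeerRole = "caller" | "agent";

interface Peer {
  peerId: string;
  role: PeerRole;
  producerIds: string[]; // opaque IDs handed back by the SFU layer
}

class SessionRegistry {
  private sessions = new Map<string, Map<string, Peer>>();

  // Admit a peer into a session, creating the session on first join.
  join(sessionId: string, peerId: string, role: PeerRole): Peer {
    let peers = this.sessions.get(sessionId);
    if (!peers) {
      peers = new Map<string, Peer>();
      this.sessions.set(sessionId, peers);
    }
    // Permission rule (assumed): one caller and one agent per voice session.
    peers.forEach((p) => {
      if (p.role === role) {
        throw new Error(`session ${sessionId} already has a ${role}`);
      }
    });
    const peer: Peer = { peerId, role, producerIds: [] };
    peers.set(peerId, peer);
    return peer;
  }

  // Remove a peer; the caller is expected to close that peer's transports.
  leave(sessionId: string, peerId: string): boolean {
    const peers = this.sessions.get(sessionId);
    if (!peers || !peers.delete(peerId)) return false;
    if (peers.size === 0) this.sessions.delete(sessionId); // end of session
    return true;
  }

  // Presence is simply the list of peer IDs currently in the session.
  presence(sessionId: string): string[] {
    const peers = this.sessions.get(sessionId);
    return peers ? Array.from(peers.keys()) : [];
  }
}
```

Keeping this layer separate from the media layer means the session can outlive a reconnecting transport, which is the lifecycle control the paragraph describes.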
CallSphere does not self-host mediasoup in production yet. We benchmarked it for a six-month window before settling on Pion + Go gateway 1.23 — the C++/Node split inside mediasoup did not pair as cleanly with our Go-everywhere services and NATS topology. That said, we recommend mediasoup to customers who want to bring their own SFU on-premises (especially behavioral-health customers under stricter data-residency rules). Across our 6 verticals, mediasoup, LiveKit OSS, and Pion all run side-by-side at customer sites without us.
For reference, our 6-container pod (CRM writer, calendar, MLS lookup, SMS, audit, transcript) sits behind whichever SFU the deployment chooses; the pod just exposes WebRTC and gRPC.
```ts
import { Device } from "mediasoup-client";

// `signal` is the app's own signaling helper; mediasoup does not provide one.
async function startCall(routerRtpCapabilities: any, signal: any) {
  const device = new Device();
  await device.load({ routerRtpCapabilities });

  // Send transport: carries the caller's microphone to the SFU.
  const sendT = await signal.create("createWebRtcTransport");
  const send = device.createSendTransport(sendT);
  // Report signaling failures through errback so the transport does not hang.
  send.on("connect", ({ dtlsParameters }, cb, errback) =>
    signal.send("connect", { dtlsParameters }).then(() => cb()).catch(errback));
  send.on("produce", (params, cb, errback) =>
    signal.send("produce", params).then(({ id }: any) => cb({ id })).catch(errback));

  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  await send.produce({ track: stream.getAudioTracks()[0] });

  // Receive transport: carries the agent's audio back to the browser.
  const recvT = await signal.create("createWebRtcTransport");
  const recv = device.createRecvTransport(recvT);
  // recv needs its own "connect" handler; then subscribe to the agent
  // producer via signaling and call recv.consume(...)
}
```
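The `signal` helper in the snippet above is left to the application. One way to get promise-returning calls such as `signal.send("produce", params)` is to correlate requests and responses by ID over any message channel. The sketch below is an assumption about how such a helper could look: the `Channel` interface, the `{ id, ok, data }` message shape, and `RequestClient` are all hypothetical, and a real deployment would add timeouts.

```ts
// Any bidirectional text channel (for example a WebSocket wrapper).
interface Channel {
  post(msg: string): void;
  onMessage(handler: (msg: string) => void): void;
}

type Pending = {
  resolve: (value: unknown) => void;
  reject: (err: Error) => void;
};

class RequestClient {
  private nextId = 1;
  private pending = new Map<number, Pending>();
  private channel: Channel;

  constructor(channel: Channel) {
    this.channel = channel;
    // Settle the matching promise when a response frame arrives.
    channel.onMessage((raw) => {
      const { id, ok, data } = JSON.parse(raw);
      const p = this.pending.get(id);
      if (!p) return; // unknown or already-settled request
      this.pending.delete(id);
      if (ok) p.resolve(data);
      else p.reject(new Error(String(data)));
    });
  }

  // Send a request frame and return a promise for its response.
  send(method: string, params: unknown = {}): Promise<any> {
    const id = this.nextId++;
    return new Promise((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
      this.channel.post(JSON.stringify({ id, method, params }));
    });
  }

  // Number of requests still waiting for a response.
  pendingCount(): number {
    return this.pending.size;
  }
}
```

Rejecting on `ok: false` is what lets the transport's error callback fire when the server refuses a `connect` or `produce` request.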
Is mediasoup OSS friendly to commercial use? Yes. It ships under the ISC license.
How many cores per 1k voice peers? Roughly 4 cores at 16 kbps Opus once tuned.
Does it support SVC/simulcast? Yes for video; for voice, you stay in single-stream Opus.
Can it work with OpenAI Realtime? Yes. Bridge in your AI worker process; the worker maintains a Realtime session per call.
What about end-to-end encryption? SRTP terminates at the SFU; for E2EE add Insertable Streams.
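The "4 cores per 1k voice peers" rule of thumb converts directly into a sizing estimate. The sketch below assumes linear scaling and one mediasoup worker per core (each worker is a single-threaded process); the headroom factor and the `planCapacity` name are illustrative assumptions.

```ts
const CORES_PER_1K_PEERS = 4; // rule of thumb quoted in the FAQ, at 16 kbps Opus
const OPUS_KBPS = 16;

function planCapacity(peers: number, headroom = 0.25) {
  // Cores needed before headroom, assuming load scales linearly with peers.
  const rawCores = (peers / 1000) * CORES_PER_1K_PEERS;
  // One worker per core, rounded up after adding spare capacity.
  const workers = Math.ceil(rawCores * (1 + headroom));
  // Inbound audio payload only; RTP/UDP/IP overhead is not included.
  const ingressMbps = (peers * OPUS_KBPS) / 1000;
  return { workers, ingressMbps };
}

const plan = planCapacity(3000);
console.log(plan.workers, plan.ingressMbps);
```

Under these assumptions 3,000 peers with 25% headroom need 15 workers, which fits on the 16-core box discussed earlier.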
A self-hosted mediasoup SFU for voice AI sits on top of a regional VPC and a cold-start problem you only see at 3am. If your voice stack lives in us-east-1 but your customer is calling from a Sydney mobile network, the round-trip time alone wrecks turn-taking. Multi-region routing, GPU residency, and warm pools become the difference between "natural" and "robotic", and all of it is infrastructure, not the model.
The big fork is managed (OpenAI Realtime, ElevenLabs Conversational AI) versus self-hosted on GPUs you operate. Managed wins on cold-start, model freshness, and zero-ops; self-hosted wins on unit economics past a certain conversation volume and on data residency for regulated verticals. CallSphere runs hybrid: Realtime for live calls, self-hosted Whisper + a hosted LLM for async, both routed through a Go gateway that enforces per-tenant rate limits.
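Per-tenant rate limiting of the kind mentioned here is commonly done with a token bucket per tenant. The sketch below is not CallSphere's implementation: the gateway in the text is written in Go, TypeScript is used only to match the other examples, and `TenantRateLimiter` with its parameters is hypothetical. The clock is passed in explicitly so the behavior is deterministic.

```ts
class TenantRateLimiter {
  private buckets = new Map<string, { tokens: number; lastMs: number }>();
  private capacity: number;     // maximum burst per tenant
  private refillPerSec: number; // sustained requests per second per tenant

  constructor(capacity: number, refillPerSec: number) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
  }

  // Returns true if the tenant may proceed at time nowMs.
  allow(tenant: string, nowMs: number): boolean {
    // New tenants start with a full bucket.
    const b = this.buckets.get(tenant) ?? { tokens: this.capacity, lastMs: nowMs };
    // Refill in proportion to elapsed time, capped at capacity.
    const elapsedSec = Math.max(0, nowMs - b.lastMs) / 1000;
    b.tokens = Math.min(this.capacity, b.tokens + elapsedSec * this.refillPerSec);
    b.lastMs = nowMs;
    this.buckets.set(tenant, b);
    if (b.tokens < 1) return false; // bucket empty: reject
    b.tokens -= 1;
    return true;
  }
}
```

Because each tenant has its own bucket, one tenant exhausting its burst does not affect another tenant's calls.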
Latency budgets are non-negotiable on voice. End-to-end target is sub-800ms ASR-to-first-token and sub-1.4s first-audio-out; anything beyond that and turn-taking feels stilted. GPU residency in the same region as your TURN servers matters more than choosing a slightly bigger model.
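These two targets can be checked per turn from timestamps the pipeline already has. The sketch below is one hypothetical way to express the budget: the `TurnTimings` field names are assumptions, both intervals are measured from the end of caller speech (also an assumption), and the thresholds are the 800 ms and 1.4 s figures from the text.

```ts
interface TurnTimings {
  speechEndMs: number;     // when the caller stopped talking (assumed reference point)
  firstTokenMs: number;    // first LLM token available
  firstAudioOutMs: number; // first synthesized audio sent back to the caller
}

const BUDGET = { firstTokenMs: 800, firstAudioMs: 1400 };

function checkTurn(t: TurnTimings) {
  const toToken = t.firstTokenMs - t.speechEndMs;
  const toAudio = t.firstAudioOutMs - t.speechEndMs;
  return {
    toToken,
    toAudio,
    // Both targets are strict "sub-" limits, so equality counts as a miss.
    withinBudget: toToken < BUDGET.firstTokenMs && toAudio < BUDGET.firstAudioMs,
  };
}

const good = checkTurn({ speechEndMs: 10_000, firstTokenMs: 10_650, firstAudioOutMs: 11_200 });
const slow = checkTurn({ speechEndMs: 10_000, firstTokenMs: 10_900, firstAudioOutMs: 11_300 });
console.log(good.withinBudget, slow.withinBudget);
```

Recording both intervals rather than a single pass/fail flag makes it easier to see whether a miss came from the model or from speech synthesis and transport.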
Observability is the unglamorous backbone — every conversation produces logs, traces, sentiment scoring, and cost attribution piped to a per-tenant dashboard. HIPAA + SOC 2 aligned isolation keeps healthcare traffic separated from salon traffic at the storage layer, not just the API.
Is this realistic for a small business, or is it enterprise-only? The IT Helpdesk product is built on ChromaDB for RAG over runbooks, Supabase for auth and storage, and 40+ data models covering tickets, assets, MSP clients, and escalation chains. For a self-hosted SFU project like this one, that means you're not starting from scratch: you're configuring an agent template that has already been hardened across thousands of conversations.
Which integrations have to be in place before launch? Day one is integration mapping (scheduler, CRM, messaging) and prompt tuning against your top 20 real call transcripts. Days two through five are shadow mode, where the agent transcribes and recommends but a human still answers, so you can compare the two side by side. Go-live is the moment your eval pass-rate clears your internal bar.
How do we measure whether it's actually working? The honest answer: it scales until your tool catalog gets stale. The agent is only as good as the integrations it can actually call, so the operational discipline is keeping schemas, webhooks, and fallback paths green. The platform handles the rest — observability, retries, multi-region routing — without your team owning the GPU layer.
Want to see how this maps to your stack? Book a live walkthrough at calendly.com/sagar-callsphere/new-meeting, or try the vertical-specific demo at sales.callsphere.tech. 14-day trial, no credit card, pilot live in 3–5 business days.
Written by
Sagar Shankaran · Founder, CallSphere
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.