By Sagar Shankaran, Founder of CallSphere
WHIP is the simplest WebRTC ingress protocol you can ship. For AI voice agents that need to take in live audio from a third-party source, it is the right answer in 2026.
Key takeaways
WHIP (WebRTC-HTTP Ingestion Protocol, RFC 9725) takes the entire WebRTC ingest signalling problem and turns it into a single HTTP POST. For AI voice ingress in 2026 it is the cleanest way to wire any source — phone bridge, IoT mic, broadcaster — into your agent.
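A traditional WebRTC ingest needs a signalling server: WebSocket, JSON messages, ICE trickle, renegotiation glue. WHIP collapses all of that to one HTTP exchange: the client POSTs its SDP offer with `Content-Type: application/sdp`, the server replies `201 Created` with the SDP answer in the body and a `Location` header naming the session resource, and the client later sends `DELETE` to that URL to end the session.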
That is the entire spec. RFC 9725 (published March 2025) standardized it. By 2026 every major SFU (LiveKit, mediasoup, Janus, Cloudflare Realtime), most CDNs, and OBS itself ship WHIP support.
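To make the shape of the exchange concrete, here is a minimal server-side sketch of the routing a WHIP endpoint has to do. It is not a real SFU integration: `WhipRequest`, `WhipResponse`, `handleWhip`, the `/whip` path, and the `createAnswer` callback are all hypothetical names chosen for illustration, and the SDP answer is assumed to come from whatever media server sits behind the endpoint.

```ts
// Hypothetical request/response shapes, independent of any HTTP framework.
interface WhipRequest {
  method: string;
  path: string;
  contentType?: string;
  body?: string;
}

interface WhipResponse {
  status: number;
  headers: Record<string, string>;
  body?: string;
}

const sessions = new Map<string, true>(); // ids of live ingest sessions
let nextId = 1;

// createAnswer is an assumed hook into the SFU that turns an SDP offer into an SDP answer.
function handleWhip(req: WhipRequest, createAnswer: (offerSdp: string) => string): WhipResponse {
  if (req.method === "POST" && req.path === "/whip") {
    if (req.contentType !== "application/sdp") return { status: 415, headers: {} };
    if (!req.body) return { status: 400, headers: {} };
    const id = String(nextId++);
    sessions.set(id, true);
    return {
      status: 201, // Created: the answer goes in the body, the session URL in Location
      headers: { "Content-Type": "application/sdp", Location: `/whip/${id}` },
      body: createAnswer(req.body),
    };
  }

  const match = /^\/whip\/(\w+)$/.exec(req.path);
  if (req.method === "DELETE" && match) {
    // Deleting the session resource ends the ingest.
    return sessions.delete(match[1]) ? { status: 200, headers: {} } : { status: 404, headers: {} };
  }

  return { status: 405, headers: {} };
}
```

Keeping the handler a pure function of the request makes the protocol easy to unit-test before any real media path exists.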
```mermaid
flowchart LR
  Source[Audio source - SIP bridge / IoT / OBS] -- POST SDP --> WHIP[WHIP endpoint]
  WHIP --> SFU[SFU or AI gateway]
  SFU --> Agent[AI voice agent]
  Agent -- response --> SFU
  SFU --> Egress[WHEP or browser]
```
Note the symmetry: WHIP for ingest, WHEP (WebRTC-HTTP Egress Protocol, specified in the IETF WISH working group's draft-ietf-wish-whep) for egress. Both boil down to a single small HTTP request carrying SDP. The signalling is simple enough that an embedded device with 64 KB of RAM can implement it, yet capable enough that LiveKit and Pion treat both as first-class transports.
CallSphere uses WHIP in three places:
Across 37 agents, 90+ tools, and 115+ database tables, WHIP centralizes the "outside-in audio" problem to a single auditable endpoint. For SOC 2 and HIPAA compliance, every POST and DELETE is logged with subject identity. Pricing is $149/$499/$1499 with a 14-day trial; affiliates get 22% (see /affiliate).
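```ts
async function whipPublish(endpoint: string, token: string, mediaStream: MediaStream) {
  const pc = new RTCPeerConnection();
  for (const t of mediaStream.getTracks()) pc.addTrack(t, mediaStream);

  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);

  // Wait for ICE gathering to finish so the offer carries every candidate
  // (the alternative is to send trickle candidates afterwards).
  await new Promise<void>((resolve) => {
    if (pc.iceGatheringState === "complete") return resolve();
    const onChange = () => {
      if (pc.iceGatheringState === "complete") {
        pc.removeEventListener("icegatheringstatechange", onChange);
        resolve();
      }
    };
    pc.addEventListener("icegatheringstatechange", onChange);
  });

  const res = await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/sdp", Authorization: `Bearer ${token}` },
    body: pc.localDescription!.sdp,
  });
  if (!res.ok) throw new Error(`WHIP POST failed: ${res.status}`);

  const answer = await res.text();
  // URL of the session resource; it may be relative to the endpoint.
  const location = res.headers.get("Location")!;
  await pc.setRemoteDescription({ type: "answer", sdp: answer });
  return { pc, location };
}
```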
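Publishing is only half of the session lifecycle: the resource URL returned in `Location` is what the client deletes to end the ingest. A minimal teardown sketch follows, assuming the same bearer token is still valid. `whipUnpublish` and the `FetchLike` type are hypothetical names; the `fetchImpl` parameter exists only so the function can be exercised without a network, and `pc` is typed loosely so anything with a `close()` method works.

```ts
// Minimal structural type for the one fetch call this sketch makes.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number }>;

async function whipUnpublish(
  location: string,
  token: string,
  pc: { close(): void },
  fetchImpl: FetchLike = (globalThis as any).fetch,
): Promise<void> {
  try {
    // DELETE on the session resource tells the server to release the ingest.
    const res = await fetchImpl(location, {
      method: "DELETE",
      headers: { Authorization: `Bearer ${token}` },
    });
    if (!res.ok) throw new Error(`WHIP DELETE failed: ${res.status}`);
  } finally {
    // Always release local ICE and media resources, even if the DELETE failed.
    pc.close();
  }
}
```

If `location` came back as a relative URL, resolve it against the endpoint before calling this. Closing the peer connection in `finally` means a failed DELETE never leaves local resources dangling.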
Is WHIP only for video? No. Audio-only WHIP is identical; the SDP simply has no `m=video` section.
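In SDP, each media section starts with an `m=` line, so an ingest endpoint that only wants audio can check the offer before accepting it. The helpers below are an illustrative sketch, not a full SDP parser; `sdpMediaKinds` and `isAudioOnly` are hypothetical names.

```ts
// Returns the media kind of every m= section, in order (e.g. ["audio", "video"]).
function sdpMediaKinds(sdp: string): string[] {
  return sdp
    .split(/\r?\n/)
    .filter((line) => line.startsWith("m="))
    .map((line) => line.slice(2).split(" ")[0]);
}

// True when the SDP has at least one media section and all of them are audio.
function isAudioOnly(sdp: string): boolean {
  const kinds = sdpMediaKinds(sdp);
  return kinds.length > 0 && kinds.every((kind) => kind === "audio");
}
```

A voice-agent gateway could reject any offer for which `isAudioOnly` is false rather than silently ignoring the extra tracks.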
Is it the same as RTMP? No. WHIP rides on WebRTC and typically delivers sub-second latency; RTMP pipelines typically run multi-second.
Does OpenAI Realtime speak WHIP? Not directly — it has its own SDP exchange endpoint that is WHIP-shaped but not RFC-9725 compliant.
What about WHEP for output? WHEP (draft-ietf-wish-whep) is the egress twin, and we use it for the agent's output side in some flows.
Can I use WHIP from a phone? Yes — many SIP-to-WebRTC bridges ship WHIP as the WebRTC side.
Does Pion implement WHIP? Yes — see the Pion examples in their repo; LiveKit, Cloudflare Realtime, Janus, and mediasoup all ship WHIP plugins.
Can WHIP carry data channels? The base RFC focuses on media; data-channel ingest is an extension some implementations support and some do not.
What about authentication beyond Bearer tokens? mTLS or signed query strings are common; Bearer is the spec default.
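As one illustration of the signed-query-string option, here is a sketch using HMAC-SHA256 from Node's `node:crypto` module. The scheme itself (the `sub`, `exp`, and `sig` parameter names and the `sub\nexp` message format) is an assumption made up for this example, not something WHIP defines; `signWhipUrl` and `verifyWhipUrl` are hypothetical helpers.

```ts
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumed scheme: ?sub=<subject>&exp=<unix seconds>&sig=<hex HMAC-SHA256 of "sub\nexp">
function signWhipUrl(base: string, subject: string, expiresAt: number, secret: string): string {
  const sig = createHmac("sha256", secret).update(`${subject}\n${expiresAt}`).digest("hex");
  const url = new URL(base);
  url.searchParams.set("sub", subject);
  url.searchParams.set("exp", String(expiresAt));
  url.searchParams.set("sig", sig);
  return url.toString();
}

// Returns the authenticated subject, or null if the URL is expired or tampered with.
function verifyWhipUrl(signed: string, secret: string, now: number): string | null {
  const url = new URL(signed);
  const sub = url.searchParams.get("sub");
  const exp = Number(url.searchParams.get("exp"));
  const sig = url.searchParams.get("sig");
  if (!sub || !sig || !Number.isFinite(exp) || exp < now) return null;

  const expected = createHmac("sha256", secret).update(`${sub}\n${exp}`).digest();
  const given = Buffer.from(sig, "hex");
  // Constant-time comparison; the length check avoids timingSafeEqual throwing.
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return null;
  return sub;
}
```

The subject returned by `verifyWhipUrl` is what would go into the audit log alongside each POST and DELETE.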
Three rules from running WHIP across IoT, telephony, and broadcast:
The rule that gets the most pushback is the third one. The right answer for AI voice is "no idle is healthy" — if the source is not actively publishing audio, drop and reconnect. Any other policy lets ghost sessions accumulate.
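The "no idle is healthy" policy can be sketched as a small reaper that tracks the last time audio arrived on each session and drops anything that has gone quiet. This is an illustration under assumptions: `IdleReaper`, its method names, and the idle threshold are hypothetical, and the `drop` callback stands in for whatever actually tears down the WHIP session.

```ts
class IdleReaper {
  private readonly lastAudioAt = new Map<string, number>();
  private readonly maxIdleMs: number;
  private readonly drop: (sessionId: string) => void;

  constructor(maxIdleMs: number, drop: (sessionId: string) => void) {
    this.maxIdleMs = maxIdleMs;
    this.drop = drop;
  }

  // Call whenever an audio packet or frame arrives for a session.
  onAudio(sessionId: string, now: number): void {
    this.lastAudioAt.set(sessionId, now);
  }

  // Call periodically; drops every session idle for longer than maxIdleMs
  // and returns the ids that were dropped.
  sweep(now: number): string[] {
    const dropped: string[] = [];
    for (const [id, last] of this.lastAudioAt) {
      if (now - last > this.maxIdleMs) {
        this.lastAudioAt.delete(id);
        this.drop(id);
        dropped.push(id);
      }
    }
    return dropped;
  }
}
```

Passing `now` in explicitly keeps the policy deterministic under test; in production it would be `Date.now()` on a timer, with the source expected to reconnect by POSTing a fresh offer.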
The biggest reason we standardized on WHIP across all our ingress paths is operational: a single endpoint with bearer-token auth gives you one thing to monitor, one rate-limit policy, one audit trail, and one incident-response checklist. Compared to a per-source bespoke signaling protocol, the operational simplification is enormous — and it makes onboarding new partners (a hardware vendor, a hospital, a brokerage) a one-page integration spec rather than a quarter-long project.
Try WHIP-driven calls live at /demo, see pricing at /pricing, or start a trial at /trial.
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.