WebRTC + WHIP for AI Ingress: The 2026 Production Pattern
WHIP is the simplest WebRTC ingress protocol you can ship. For AI voice agents that need to take in live audio from a third-party source, it is the right answer in 2026.
WHIP (WebRTC-HTTP Ingestion Protocol, RFC 9725) takes the entire WebRTC ingest signalling problem and turns it into a single HTTP POST. For AI voice ingress in 2026 it is the cleanest way to wire any source — phone bridge, IoT mic, broadcaster — into your agent.
What WHIP solves
A traditional WebRTC ingest needs a signalling server: WebSocket, JSON messages, ICE trickle, renegotiation glue. WHIP collapses all of that to one HTTP exchange:
- Client POSTs an SDP offer with `Content-Type: application/sdp`.
- Server responds with the SDP answer in the body and a `Location:` header.
- Client DELETEs the location to tear down.
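That is the core of the spec; trickle ICE and ICE restarts over PATCH build on top of it. RFC 9725, published in March 2025, standardized it. By 2026 WHIP support is available for the major SFUs and media servers (LiveKit, mediasoup, Janus, Cloudflare Realtime), from many CDNs, and in OBS itself.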
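Seen from the server side, the exchange above reduces to a small routing table. The following is a minimal, framework-free sketch rather than a real gateway: `handleWhip`, `WhipRequest`, `WhipResponse`, the `/whip` and `/whip/sessions/` paths, and the `negotiate` callback are all hypothetical names, and actual SDP negotiation is assumed to be delegated to whatever WebRTC stack you run.

```typescript
// Hypothetical request/response shapes; a real server would adapt its framework's types.
interface WhipRequest {
  method: string;
  path: string;
  contentType?: string;
  body?: string;
}

interface WhipResponse {
  status: number;
  headers: Record<string, string>;
  body: string;
}

// Assumed URL layout: POST to /whip, one resource per session under /whip/sessions/<id>.
const ENDPOINT = "/whip";
const SESSION_PREFIX = "/whip/sessions/";

const sessions = new Set<string>();
let nextId = 1;

// `negotiate` stands in for the WebRTC stack: it takes an SDP offer and returns an SDP answer.
function handleWhip(req: WhipRequest, negotiate: (offer: string) => string): WhipResponse {
  if (req.path === ENDPOINT) {
    if (req.method === "OPTIONS") {
      return { status: 204, headers: { Allow: "OPTIONS, POST" }, body: "" };
    }
    if (req.method !== "POST") {
      return { status: 405, headers: { Allow: "OPTIONS, POST" }, body: "" };
    }
    if (req.contentType !== "application/sdp") {
      return { status: 415, headers: {}, body: "expected application/sdp" };
    }
    if (!req.body) {
      return { status: 400, headers: {}, body: "empty SDP offer" };
    }
    const id = String(nextId++);
    sessions.add(id);
    return {
      status: 201,
      headers: { "Content-Type": "application/sdp", Location: SESSION_PREFIX + id },
      body: negotiate(req.body),
    };
  }
  if (req.path.startsWith(SESSION_PREFIX)) {
    if (req.method !== "DELETE") {
      return { status: 405, headers: { Allow: "DELETE" }, body: "" };
    }
    const id = req.path.slice(SESSION_PREFIX.length);
    return sessions.delete(id)
      ? { status: 200, headers: {}, body: "" }
      : { status: 404, headers: {}, body: "unknown session" };
  }
  return { status: 404, headers: {}, body: "" };
}
```

Authentication, CORS, and PATCH handling are deliberately left out to keep the three core steps visible.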
Architecture for AI ingress
```mermaid
flowchart LR
    Source["Audio source - SIP bridge / IoT / OBS"] -- POST SDP --> WHIP[WHIP endpoint]
    WHIP --> SFU[SFU or AI gateway]
    SFU --> Agent[AI voice agent]
    Agent -- response --> SFU
    SFU --> Egress[WHEP or browser]
```
Note the symmetry: WHIP for ingest, WHEP for egress. WHEP (WebRTC-HTTP Egress Protocol) is still an IETF Internet-Draft (draft-ietf-wish-whep) rather than a published RFC. Both boil down to a single small HTTP request carrying an SDP body. The signalling is simple enough for a constrained embedded device to implement, yet capable enough that LiveKit and Pion treat it as a first-class transport.
CallSphere implementation
CallSphere uses WHIP in three places:
- Phone bridge: our SIP gateway converts inbound PSTN calls into a WHIP POST to our Pion-based gateway (Go 1.23). The gateway answers, NATS broadcasts the new session, and the 6-container pod (CRM, MLS, calendar, SMS, audit, transcript) wakes up. This is how Real Estate (OneRoof, /industries/real-estate) handles inbound calls.
- IoT field devices: behavioral health partners send live audio from a bedside microphone via a Pion-based client to a WHIP endpoint. The agent can join, transcribe, and trigger a clinician page. Across 6 verticals (real estate, healthcare, behavioral health, legal, salon, insurance) this is the single reusable ingress protocol.
- OBS/broadcast: for training and demo recording at events, OBS pushes WHIP into our infrastructure; the AI agent in /demo is set up to consume it.
Across 37 agents, 90+ tools, and 115+ database tables, WHIP reduces the "outside-in audio" problem to a single auditable endpoint. For SOC 2 and HIPAA, every POST and DELETE is logged with the subject identity. Pricing is $149/$499/$1499 with a 14-day trial; affiliates earn 22% via /affiliate.
Code snippet (WHIP client)
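```ts
async function whipPublish(endpoint: string, token: string, mediaStream: MediaStream) {
  const pc = new RTCPeerConnection();
  for (const t of mediaStream.getTracks()) pc.addTrack(t, mediaStream);

  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);

  // Non-trickle path: wait for ICE gathering to finish so the offer carries every candidate.
  if (pc.iceGatheringState !== "complete") {
    await new Promise<void>((resolve) => {
      const onChange = () => {
        if (pc.iceGatheringState === "complete") {
          pc.removeEventListener("icegatheringstatechange", onChange);
          resolve();
        }
      };
      pc.addEventListener("icegatheringstatechange", onChange);
    });
  }

  const res = await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/sdp", Authorization: `Bearer ${token}` },
    body: pc.localDescription!.sdp,
  });
  // A successful WHIP POST returns 201 Created plus a Location header for the session.
  const location = res.headers.get("Location");
  if (res.status !== 201 || !location) {
    pc.close();
    throw new Error(`WHIP publish failed with HTTP ${res.status}`);
  }
  const answer = await res.text();
  await pc.setRemoteDescription({ type: "answer", sdp: answer });
  // Location may be relative; resolve it against the endpoint for the later DELETE.
  return { pc, location: new URL(location, endpoint).toString() };
}
```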
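The publish function has a natural counterpart for teardown. The sketch below is one way to write it, not a library API: `buildWhipDelete`, `whipUnpublish`, and the injected `send` parameter are hypothetical names. `send` is typed structurally so the helper can be exercised without browser typings; in a browser you would pass `fetch`.

```typescript
// Hypothetical helper types; not part of any library.
interface WhipDeleteRequest {
  url: string;
  init: { method: "DELETE"; headers: Record<string, string> };
}

type SendFn = (
  url: string,
  init: WhipDeleteRequest["init"],
) => Promise<{ ok: boolean; status: number }>;

// Build the DELETE request for a WHIP session. The Location value may be
// relative, so it is resolved against the endpoint URL used for the POST.
function buildWhipDelete(endpoint: string, location: string, token: string): WhipDeleteRequest {
  return {
    url: new URL(location, endpoint).toString(),
    init: { method: "DELETE", headers: { Authorization: `Bearer ${token}` } },
  };
}

// Release the server-side session first, then close the local peer connection.
async function whipUnpublish(
  pc: { close(): void },
  endpoint: string,
  location: string,
  token: string,
  send: SendFn,
): Promise<void> {
  const { url, init } = buildWhipDelete(endpoint, location, token);
  try {
    const res = await send(url, init);
    if (!res.ok) console.warn(`WHIP DELETE returned ${res.status}`);
  } finally {
    pc.close(); // always free local resources, even if the DELETE failed
  }
}
```

A typical call would be `await whipUnpublish(pc, endpoint, location, token, fetch)`, wired to a hang-up button or a page-unload handler.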
Build steps
- Pick a server: LiveKit, mediasoup, Janus, or your own Pion-based gateway all support WHIP.
- Authenticate with Bearer tokens; never run unauthenticated WHIP in production.
- Use HTTPS only; `http://` WHIP is fine for lab work but never production.
- Implement DELETE on `Location` for clean shutdowns.
- Add a health probe on a separate path. RFC 9725 has WHIP endpoints answer GET, HEAD, and PUT with 405, so do not overload the endpoint itself for health checks.
- Plan for trickle ICE via PATCH (the ICE support section of RFC 9725) once your client and server both support it.
- Pre-flight Opus codec preferences in the SDP offer; some servers will fall back to PCMU silently if Opus is missing.
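The Opus pre-flight in the last step can be a few lines of SDP inspection run before the POST. This is a minimal sketch that assumes codecs are declared with standard `a=rtpmap` lines; `audioCodecs` and `offerHasOpus` are hypothetical helper names, not part of any WebRTC API.

```typescript
// Collect the codec names declared via a=rtpmap inside audio m-sections.
function audioCodecs(sdp: string): string[] {
  const codecs: string[] = [];
  let inAudio = false;
  for (const line of sdp.split(/\r?\n/)) {
    if (line.startsWith("m=")) {
      // Every m= line starts a new media section; track whether it is audio.
      inAudio = line.startsWith("m=audio");
      continue;
    }
    if (!inAudio) continue;
    const match = /^a=rtpmap:\d+ ([^/]+)\//.exec(line);
    if (match && match[1]) codecs.push(match[1]);
  }
  return codecs;
}

// True when at least one audio section offers Opus (case-insensitive match).
function offerHasOpus(sdp: string): boolean {
  return audioCodecs(sdp).some((codec) => codec.toLowerCase() === "opus");
}
```

Calling `offerHasOpus(pc.localDescription.sdp)` before publishing lets a client fail early instead of discovering a PCMU fallback in production audio quality.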
Common pitfalls
- Forgetting DELETE — sessions accumulate on the server until idle timeout. Always teardown explicitly.
- No Bearer auth — open WHIP endpoints get crawled and abused.
- Skipping ICE trickle support — adds 1–2 s of waiting for ICE gathering before the POST. Use the PATCH path.
- Mismatched codec sets — your source ships Opus, your server expects PCMU. Match deliberately.
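- Forgetting CORS for browser publishers: the preflight needs `Access-Control-Allow-Headers: Content-Type, Authorization`, and the POST response needs `Access-Control-Expose-Headers: Location`, otherwise browser code cannot read the session URL.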
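For the CORS pitfall, the server-side fix is a handful of response headers. The helper below is a sketch under assumptions: `whipCorsHeaders` is a hypothetical name, the origin allow-list is whatever your deployment defines, and `If-Match` is included on the assumption that the server also accepts conditional PATCH requests.

```typescript
// Headers a WHIP server might attach for browser publishers (hypothetical helper).
function whipCorsHeaders(
  origin: string,
  allowedOrigins: readonly string[],
): Record<string, string> {
  // Unknown origin: return no CORS headers so the browser blocks the request.
  if (!allowedOrigins.includes(origin)) return {};
  return {
    "Access-Control-Allow-Origin": origin,
    "Access-Control-Allow-Methods": "POST, PATCH, DELETE, OPTIONS",
    "Access-Control-Allow-Headers": "Content-Type, Authorization, If-Match",
    // Without this, browser code cannot read Location (or ETag/Link) from the response.
    "Access-Control-Expose-Headers": "Location, ETag, Link",
    // The response depends on the Origin header, so caches must key on it.
    Vary: "Origin",
  };
}
```

The same header set would go on both the OPTIONS preflight response and the actual POST, PATCH, and DELETE responses.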
FAQ
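Is WHIP only for video? No: audio-only WHIP is identical; the offer simply has no `m=video` section.
Is it the same as RTMP? No: WHIP is WebRTC underneath and typically delivers sub-second latency, whereas RTMP pipelines usually run at multi-second latency.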
Does OpenAI Realtime speak WHIP? Not directly — it has its own SDP exchange endpoint that is WHIP-shaped but not RFC-9725 compliant.
What about WHEP for output? Yes: WHEP (still an IETF Internet-Draft, draft-ietf-wish-whep) is the egress twin, and we use it for the agent's output side in some flows.
Can I use WHIP from a phone? Yes — many SIP-to-WebRTC bridges ship WHIP as the WebRTC side.
Does Pion implement WHIP? Yes — see the Pion examples in their repo; LiveKit, Cloudflare Realtime, Janus, and mediasoup all ship WHIP plugins.
Can WHIP carry data channels? The base RFC focuses on media; data-channel ingest is an extension some implementations support and some do not.
What about authentication beyond Bearer tokens? mTLS or signed query strings are common; Bearer is the spec default.
Production playbook for AI voice teams in 2026
Three rules from running WHIP across IoT, telephony, and broadcast:
- One endpoint, many sources. The same WHIP URL handles a SIP bridge, an IoT device, and an OBS publisher; the Bearer token tells the server which is which.
- Reject unsupported codecs in the SDP answer. Better than accepting and silently transcoding. Misconfigured sources should fail loudly.
- Idle-timeout aggressively. A 10-minute idle on a WHIP session is wasted resources. We default to 60 seconds and rely on PATCH keep-alives.
The rule that gets the most pushback is the third one. The right answer for AI voice is "no idle is healthy" — if the source is not actively publishing audio, drop and reconnect. Any other policy lets ghost sessions accumulate.
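An aggressive idle policy is easy to keep honest if liveness tracking is isolated from the media path. The class below is a minimal sketch, not CallSphere's implementation: `IdleTracker` and its methods are hypothetical names, and `touch` should be called on whatever signal you treat as liveness (incoming media packets or keep-alive requests). Time is passed in explicitly so the logic is easy to test.

```typescript
// Tracks the last liveness timestamp per session and reports which sessions are idle.
class IdleTracker {
  private readonly lastSeen = new Map<string, number>();
  private readonly idleMs: number;

  // Defaults to the 60-second policy described above.
  constructor(idleMs: number = 60000) {
    this.idleMs = idleMs;
  }

  // Record activity for a session at the given time (milliseconds).
  touch(sessionId: string, nowMs: number): void {
    this.lastSeen.set(sessionId, nowMs);
  }

  // Remove and return every session that has been idle for at least idleMs.
  reap(nowMs: number): string[] {
    const expired: string[] = [];
    for (const [id, seen] of this.lastSeen) {
      if (nowMs - seen >= this.idleMs) expired.push(id);
    }
    for (const id of expired) this.lastSeen.delete(id);
    return expired;
  }
}
```

A server would call `reap(Date.now())` on a timer and close the peer connection for each returned session ID, which is the same cleanup a client-initiated DELETE would trigger.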
Watch list 2026
- WHIP trickle ICE via PATCH is in the spec but uneven across implementations; track LiveKit and Janus releases.
- WHEP for AI agent egress is the symmetric protocol (still an IETF Internet-Draft, draft-ietf-wish-whep) and gets you into HLS-style player ecosystems.
- Bearer token TTLs and rotation patterns are still left to each deployment; expect shared best practices to emerge.
- WHIP for data-only (the proposed extension) would let you publish DataChannel events as easily as audio.
The biggest reason we standardized on WHIP across all our ingress paths is operational: a single endpoint with bearer-token auth gives you one thing to monitor, one rate-limit policy, one audit trail, and one incident-response checklist. Compared to a per-source bespoke signaling protocol, the operational simplification is enormous — and it makes onboarding new partners (a hardware vendor, a hospital, a brokerage) a one-page integration spec rather than a quarter-long project.
Sources
- https://datatracker.ietf.org/doc/rfc9725/
- https://datatracker.ietf.org/doc/draft-ietf-wish-whep/
- https://docs.livekit.io/home/server/whip/
- https://github.com/livekit/agents
- https://reference-server.pipecat.ai/en/latest/api/pipecat.transports.smallwebrtc.request_handler.html
- https://celloip.com/blog/livekit-voice-agents-guide/
Try WHIP-driven calls live on /demo, pricing /pricing, or start a /trial.