By Sagar Shankaran, Founder of CallSphere
Mozilla shipped AV1 by default, H.264 simulcast with dependency descriptors, and OS-integrated screen capture in Firefox during 2025. Here is what is locked in for 2026 and how it affects voice AI agents.
Key takeaways
Mozilla published its "Firefox WebRTC 2025" wrap-up in January 2026, and four items that shipped last year now form the Firefox 2026 baseline. (1) AV1 is enabled by default in every Firefox channel, with no flag and no Firefox-specific fallback path needed. (2) H.264 gained simulcast plus the dependency descriptor RTP header extension, so Firefox can finally take part in SFU-based selective-forwarding workflows on H.264, the codec iOS WebKit clients still rely on. (3) Camera resolution and frame-rate adaptation was rebuilt across all platforms, so the same getUserMedia constraints produce smoother streams with a consistent aspect ratio. (4) macOS screen capture moved to the OS-integrated picker. Mozilla's stated 2026 priorities are continued web-compat work, broader codec interop, and closing the remaining simulcast/SVC parity gaps with Chromium.
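Item (3) is about how the browser satisfies getUserMedia constraints, so it helps to see what a tile-sized request looks like. A minimal sketch, assuming a 720p, 16:9, 30 fps call tile (illustrative values, not from Mozilla's post); `buildTileConstraints` is a hypothetical helper, while the constraint keys (`width`, `height`, `aspectRatio`, `frameRate` with `ideal`) are standard Media Capture fields. Using `ideal` rather than `exact` avoids an `OverconstrainedError` on webcams that cannot match the exact values.

```typescript
// Local stand-in for the DOM constraint types (only the fields used here),
// so the sketch type-checks and runs outside a browser.
interface Ideal {
  ideal: number;
}

interface TileConstraints {
  audio: boolean;
  video: { width: Ideal; height: Ideal; aspectRatio: Ideal; frameRate: Ideal };
}

// Hypothetical helper: ask for a stream sized for a fixed call tile.
// Only `ideal` hints are used, so getUserMedia never rejects the request
// because of them; the browser settles on the closest settings it can deliver.
function buildTileConstraints(tileHeight = 720, aspect = 16 / 9, fps = 30): TileConstraints {
  return {
    audio: true,
    video: {
      width: { ideal: Math.round(tileHeight * aspect) },
      height: { ideal: tileHeight },
      aspectRatio: { ideal: aspect },
      frameRate: { ideal: fps },
    },
  };
}

const tile = buildTileConstraints();
console.log(tile.video.width.ideal, tile.video.height.ideal); // → 1280 720

// In a browser (not executed here):
// const stream = await navigator.mediaDevices.getUserMedia(buildTileConstraints());
```

Keeping `aspectRatio` explicit alongside width and height is the knob that addresses the distorted-tile reports described next.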
For voice AI specifically, AV1-by-default in Firefox means agent-side video clips (e.g. screen-share-while-on-call) can target one codec across Chrome and Firefox without negotiation thrash. H.264 simulcast finally makes Firefox a viable second-screen client for any SFU-based supervisor or whisper-coach experience that previously required Chrome. The camera adaptation rebuild kills a recurring support ticket pattern — Firefox users on 4K webcams no longer report distorted aspect ratios when joining a call sized for 720p tiles. Combined, these reduce the "Chrome-only" caveats your sales team has to explain.
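With AV1 on by default in both Chrome and Firefox, one preference order can apply to both. Rather than munging SDP, you can reorder the codecs a transceiver offers. A minimal sketch, assuming an AV1, VP9, H.264, VP8 preference (illustrative; tune it per product); `orderCodecs` is a hypothetical helper, while `RTCRtpReceiver.getCapabilities()` and `RTCRtpTransceiver.setCodecPreferences()` in the trailing comment are standard WebRTC APIs.

```typescript
// Minimal shape of an entry returned by RTCRtpReceiver.getCapabilities("video"),
// declared locally so the sketch runs outside a browser.
interface CodecCapability {
  mimeType: string; // e.g. "video/AV1", "video/H264", "video/rtx"
  clockRate: number;
  sdpFmtpLine?: string;
}

// Hypothetical helper: stable-sort codecs so earlier entries in `preferred`
// come first. Unlisted entries such as rtx, red, or ulpfec keep their relative
// order after the preferred codecs; nothing is dropped, so peers without AV1
// can still negotiate a fallback.
function orderCodecs(
  codecs: CodecCapability[],
  preferred: string[] = ["video/AV1", "video/VP9", "video/H264", "video/VP8"],
): CodecCapability[] {
  const wanted = preferred.map((m) => m.toLowerCase());
  const rank = (c: CodecCapability): number => {
    const i = wanted.indexOf(c.mimeType.toLowerCase());
    return i === -1 ? wanted.length : i;
  };
  // Array.prototype.sort is stable in ES2019+ engines, so ties keep input order.
  return codecs.slice().sort((a, b) => rank(a) - rank(b));
}

const sample: CodecCapability[] = [
  { mimeType: "video/VP8", clockRate: 90000 },
  { mimeType: "video/rtx", clockRate: 90000 },
  { mimeType: "video/H264", clockRate: 90000, sdpFmtpLine: "profile-level-id=42e01f" },
  { mimeType: "video/AV1", clockRate: 90000 },
  { mimeType: "video/VP9", clockRate: 90000 },
];
console.log(orderCodecs(sample).map((c) => c.mimeType).join(", "));
// → video/AV1, video/VP9, video/H264, video/VP8, video/rtx

// In a browser (not executed here):
// const caps = RTCRtpReceiver.getCapabilities("video");
// if (caps) transceiver.setCodecPreferences(orderCodecs(caps.codecs));
```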
```mermaid
flowchart TD
    A[Firefox 2025 ship list] --> B[AV1 default ON]
    A --> C[H.264 simulcast + dep descriptor]
    A --> D[Camera adaptation rebuild]
    A --> E[macOS OS screen capture]
    B --> F[Cross-browser AV1 negotiation]
    C --> G[SFU compatibility for iOS-bound flows]
    D --> H[Aspect-ratio consistency]
    E --> I[Native picker UX]
```
CallSphere runs 37 agents · 90+ tools · 115+ tables · 6 verticals · HIPAA + SOC 2 aligned. Our supervisor whisper feature uses an SFU; we lit up Firefox H.264 simulcast for the Behavioral Health vertical the day Firefox 138 shipped, and Firefox sessions stopped falling back to single-layer H.264. The Real Estate OneRoof Pion Go gateway 1.23 negotiates AV1 first now that both major desktop browsers support it natively. Plans $149 / $499 / $1,499, 14-day trial, 22% affiliate Year 1.
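On the sending side, simulcast is a per-transceiver setting, which is what keeps a session from collapsing to a single H.264 layer. A minimal sketch, assuming three layers with rids "q", "h", and "f" and a 1.5 Mbps top-layer cap (illustrative values, not CallSphere's configuration); `buildSimulcastEncodings` is hypothetical, while `RTCPeerConnection.addTransceiver()` with `sendEncodings` (`rid`, `scaleResolutionDownBy`, `maxBitrate`, `active`) is the standard WebRTC API it targets. Whether the layers actually go out as H.264 still depends on codec negotiation with the SFU.

```typescript
// Minimal stand-in for RTCRtpEncodingParameters (only the fields used here).
interface SimulcastEncoding {
  rid: string;
  scaleResolutionDownBy: number;
  maxBitrate: number; // bits per second
  active: boolean;
}

// Hypothetical helper: one encoding per simulcast layer, smallest first.
// Each layer halves the resolution of the next one up.
function buildSimulcastEncodings(
  fullBitrate = 1500000,
  rids: string[] = ["q", "h", "f"],
): SimulcastEncoding[] {
  const top = rids.length - 1;
  return rids.map((rid, i) => {
    const scale = 2 ** (top - i); // q -> 4, h -> 2, f -> 1
    return {
      rid,
      scaleResolutionDownBy: scale,
      // Pixel count drops with the square of the scale factor, so the cap is
      // divided by scale * scale as a rough heuristic.
      maxBitrate: Math.round(fullBitrate / (scale * scale)),
      active: true,
    };
  });
}

for (const l of buildSimulcastEncodings()) {
  console.log(l.rid, l.scaleResolutionDownBy, l.maxBitrate);
}
// → q 4 93750
// → h 2 375000
// → f 1 1500000

// In a browser (not executed here):
// pc.addTransceiver(videoTrack, { direction: "sendonly", sendEncodings: buildSimulcastEncodings() });
```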
Cross-browser codecs: AV1, VP9, H264, VP8.
Does Firefox iOS get these features? No. Firefox iOS uses WebKit per Apple App Store rules, so use the Safari capability matrix on iOS.
Is AV1 hardware decode required? No, but software decode is CPU-heavy. Prefer hardware decode on Apple Silicon and recent Intel/AMD chips.
Does the dependency descriptor break older SFUs? No. The RTP header extension is opt-in, and older SFUs ignore it without errors.
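Because the extension is opt-in, you can check whether a given SFU actually negotiated it by looking for its `a=extmap` line in the remote SDP. A minimal sketch; the URI below is the one the AV1 RTP payload specification defines for the dependency descriptor, and `negotiatedExtensions` is a hypothetical helper that reads only `a=extmap` lines and merges them across m-sections (a real app would scope the lookup to the video m-section).

```typescript
// Dependency descriptor header-extension URI from the AV1 RTP specification.
const DEPENDENCY_DESCRIPTOR_URI =
  "https://aomediacodec.github.io/av1-rtp-spec/#dependency-descriptor-rtp-header-extension";

// Hypothetical helper: map extension URI -> negotiated ID from a=extmap lines.
// Line format (RFC 8285): a=extmap:<id>[/<direction>] <uri> [<attributes>]
function negotiatedExtensions(sdp: string): Record<string, number> {
  const result: Record<string, number> = {};
  for (const line of sdp.split(/\r?\n/)) {
    const m = /^a=extmap:(\d+)(?:\/\w+)?\s+(\S+)/.exec(line);
    if (m) result[m[2]] = Number(m[1]);
  }
  return result;
}

const answer = [
  "v=0",
  "m=video 9 UDP/TLS/RTP/SAVPF 96",
  "a=extmap:4 urn:ietf:params:rtp-hdrext:sdes:mid",
  `a=extmap:11 ${DEPENDENCY_DESCRIPTOR_URI}`,
].join("\r\n");

const ext = negotiatedExtensions(answer);
console.log(ext[DEPENDENCY_DESCRIPTOR_URI]); // → 11
console.log("urn:ietf:params:rtp-hdrext:toffset" in ext); // → false

// In a browser (not executed here):
// const dd = negotiatedExtensions(pc.remoteDescription?.sdp ?? "")[DEPENDENCY_DESCRIPTOR_URI];
```

If the URI is absent from the answer, the extension was simply not agreed, which is why an older SFU keeps working.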
Should I default to AV1 in 2026? For one-way streams (agent to listener), yes. For two-way calls with iOS users, still prefer H.264.
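The FAQ answers above on AV1 decode cost and iOS peers reduce to a small decision rule. A minimal sketch, assuming you know the call direction, whether any participant is on iOS, and whether AV1 decode is power-efficient on the device (in browsers that support the Media Capabilities API with `type: "webrtc"`, `navigator.mediaCapabilities.decodingInfo()` reports a `powerEfficient` flag). `chooseCodecOrder`, `CallContext`, and the baseline order are hypothetical, and the rule for two-way calls without iOS users is an assumption the FAQ does not cover.

```typescript
type Direction = "one-way" | "two-way";

interface CallContext {
  direction: Direction; // "one-way" = agent-to-listener stream
  hasIosPeer: boolean; // any participant on iOS (WebKit)
  av1PowerEfficient: boolean; // e.g. from mediaCapabilities.decodingInfo()
}

// Assumed baseline preference; only the first entry changes per call type.
const BASE_ORDER = ["video/AV1", "video/VP9", "video/H264", "video/VP8"];

function withFirst(first: string): string[] {
  return [first, ...BASE_ORDER.filter((m) => m !== first)];
}

function chooseCodecOrder(ctx: CallContext): string[] {
  // FAQ: two-way calls with iOS users should still prefer H.264.
  if (ctx.direction === "two-way" && ctx.hasIosPeer) return withFirst("video/H264");
  // FAQ: one-way (agent-to-listener) streams can default to AV1.
  if (ctx.direction === "one-way") return withFirst("video/AV1");
  // Assumption, not from the FAQ: on other two-way calls, lead with AV1 only
  // when decode is power-efficient, since software AV1 decode costs CPU.
  return withFirst(ctx.av1PowerEfficient ? "video/AV1" : "video/VP9");
}

console.log(chooseCodecOrder({ direction: "two-way", hasIosPeer: true, av1PowerEfficient: true })[0]);
// → video/H264
```

The returned list is meant to feed a codec-ordering step such as `setCodecPreferences`; every codec stays in the list, so negotiation can still fall back.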
Firefox WebRTC Roadmap 2026: AV1 by Default, H.264 Simulcast, Camera Adaptation sounds like a single decision, but in production it splits into eval design, prompt cost, and observability. The deeper you push toward live traffic, the more those three pull against each other — better evals catch silent failures, prompt cost limits how often you can re-run them, and weak observability hides which retries are actually saving conversations versus burning latency budget.
Production AI agents live or die on three loops: evals, retries, and handoff state. CallSphere runs 37 agents across 6 verticals, each with its own eval suite — synthetic call transcripts replayed nightly with assertion checks on extracted entities (date, time, party size, insurance, address). Without that loop, prompt regressions ship silently and you only find out when bookings drop.
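The nightly replay loop boils down to comparing extracted entities against expected values per transcript and gating on a pass rate. A minimal sketch, assuming each replayed transcript yields a flat map of entity name to string and that values are compared after trimming and lowercasing; `scoreReplay`, the field names, and the 0.95 threshold are hypothetical, not CallSphere's actual suite.

```typescript
// Entity names mirror the examples in the text; the flat string shape is an assumption.
type EntityName = "date" | "time" | "partySize" | "insurance" | "address";
type Entities = Partial<Record<EntityName, string>>;

interface ReplayCase {
  id: string;
  expected: Entities; // labelled ground truth for the synthetic transcript
  extracted: Entities; // what the agent extracted when the transcript was replayed
}

interface Failure {
  id: string;
  field: EntityName;
  expected?: string;
  got?: string;
}

// Light normalisation so "Acme Dental" and "acme dental " compare equal.
const norm = (v?: string): string | undefined => (v === undefined ? undefined : v.trim().toLowerCase());

// A case passes only if every expected entity was extracted with a matching value.
function scoreReplay(cases: ReplayCase[]): { passRate: number; failures: Failure[] } {
  const failures: Failure[] = [];
  let passed = 0;
  for (const c of cases) {
    let ok = true;
    for (const field of Object.keys(c.expected) as EntityName[]) {
      const want = c.expected[field];
      const got = c.extracted[field];
      if (norm(got) !== norm(want)) {
        ok = false;
        failures.push({ id: c.id, field, expected: want, got });
      }
    }
    if (ok) passed++;
  }
  // An empty run scores 0 so a missing suite fails the gate instead of passing it.
  return { passRate: cases.length ? passed / cases.length : 0, failures };
}

const report = scoreReplay([
  { id: "case-001", expected: { date: "2026-03-02", partySize: "2" }, extracted: { date: "2026-03-02", partySize: "2" } },
  { id: "case-002", expected: { insurance: "Acme Dental" }, extracted: { insurance: "acme dental " } },
  { id: "case-003", expected: { time: "14:30" }, extracted: { time: "2:30 PM" } },
]);

const PASS_BAR = 0.95; // hypothetical regression / go-live threshold
console.log(report.passRate.toFixed(2), report.failures.length); // → 0.67 1
if (report.passRate < PASS_BAR) console.log("below bar: block the prompt change");
```

The third case shows why format normalisation for times and dates matters: a correct answer in a different format still counts as a regression here.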
Structured tools beat free-form text every time. Our 90+ function tools all enforce JSON schemas validated server-side; if the model hallucinates an integer where a string is required, we retry with a corrective system message before falling back to a deterministic path. For long-running flows, we treat agent handoffs as a state machine — booking → confirmation → SMS — so context survives turn boundaries.
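The validate, retry, then fall back pattern and the booking → confirmation → SMS handoff can both be sketched without any particular LLM SDK. A minimal sketch, assuming a hand-written validator stands in for a JSON Schema library and that you supply the model call; `validateBooking`, `callWithValidation`, `BookingArgs`, `ModelCall`, the transition table, and the single corrective retry are all hypothetical choices for illustration.

```typescript
interface BookingArgs {
  date: string; // ISO date string, e.g. "2026-03-02"
  partySize: number; // integer
}

// Stand-in for server-side JSON Schema validation; returns a list of errors.
function validateBooking(raw: unknown): string[] {
  if (typeof raw !== "object" || raw === null) return ["arguments must be an object"];
  const o = raw as Record<string, unknown>;
  const errors: string[] = [];
  if (typeof o.date !== "string") errors.push("date must be a string");
  const size = o.partySize;
  if (typeof size !== "number" || size % 1 !== 0) errors.push("partySize must be an integer");
  return errors;
}

type ModelCall = (messages: string[]) => unknown;

// Validate model-produced tool arguments; on failure, retry with a corrective
// system message, then fall back to a deterministic path.
function callWithValidation(
  callModel: ModelCall,
  messages: string[],
  fallback: () => BookingArgs,
  maxRetries = 1,
): { args: BookingArgs; source: "model" | "fallback" } {
  let history = messages;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const raw = callModel(history);
    const errors = validateBooking(raw);
    if (errors.length === 0) return { args: raw as BookingArgs, source: "model" };
    // Name the exact schema violations so the retry can correct them.
    history = [...history, `system: invalid tool arguments: ${errors.join("; ")}`];
  }
  return { args: fallback(), source: "fallback" };
}

// Handoff flow as an explicit state machine: booking -> confirmation -> sms.
type Stage = "booking" | "confirmation" | "sms" | "done";
const NEXT: Record<Stage, Stage | null> = { booking: "confirmation", confirmation: "sms", sms: "done", done: null };

interface HandoffState {
  stage: Stage;
  context: Partial<BookingArgs>;
}

function advance(s: HandoffState, update: Partial<BookingArgs>): HandoffState {
  const next = NEXT[s.stage];
  if (next === null) throw new Error("flow already finished");
  // Merge rather than replace, so context survives each turn boundary.
  return { stage: next, context: { ...s.context, ...update } };
}

// Fake model: first returns an integer where a string is required, then corrects itself.
let calls = 0;
const fakeModel: ModelCall = () =>
  ++calls === 1 ? { date: 20260302, partySize: 4 } : { date: "2026-03-02", partySize: 4 };

// Hypothetical deterministic fallback (e.g. a scripted re-ask); values are placeholders.
const result = callWithValidation(fakeModel, ["user: table for four on March 2"], () => ({ date: "", partySize: 1 }));
console.log(result.source, calls); // → model 2

let state: HandoffState = { stage: "booking", context: {} };
state = advance(state, result.args);
console.log(state.stage, state.context.date); // → confirmation 2026-03-02
```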
The Realtime API vs. async decision usually comes down to "is the user holding the phone right now?" If yes, Realtime; if no (callback queue, after-hours voicemail), async wins on cost-per-conversation, which we track per agent in 115+ database tables spanning all 6 verticals.
What's the right way to scope the proof-of-concept? CallSphere runs 37 production agents and 90+ function tools across 115+ database tables in 6 verticals, so most workflows you'd want already have a template. For a topic like "Firefox WebRTC Roadmap 2026: AV1 by Default, H.264 Simulcast, Camera Adaptation", that means you're not starting from scratch — you're configuring an agent template that's already been hardened across thousands of conversations.
What does the rollout look like? Day one is integration mapping (scheduler, CRM, messaging) and prompt tuning against your top 20 real call transcripts. Days two through five are a shadow-mode run, in which the agent transcribes and recommends while a human still answers, so you can compare the two side by side. Go-live is the moment your eval pass rate clears your internal bar.
When does it make sense to switch from a managed model to a self-hosted one? The honest answer: it scales until your tool catalog gets stale. The agent is only as good as the integrations it can actually call, so the operational discipline is keeping schemas, webhooks, and fallback paths green. The platform handles the rest — observability, retries, multi-region routing — without your team owning the GPU layer.
Want to see how this maps to your stack? Book a live walkthrough at calendly.com/sagar-callsphere/new-meeting, or try the vertical-specific demo at healthcare.callsphere.tech. 14-day trial, no credit card, pilot live in 3–5 business days.
Written by
Sagar Shankaran · Founder, CallSphere
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.