Bandwidth Voice API and AI Streaming in 2026: Carrier-Owned Pipes for OpenAI Realtime
Bandwidth runs its own US carrier network and added native OpenAI Realtime support in September 2025. Its BXML StartStream verb plus 2-way Media Streaming gives you carrier-grade pipes for AI voice without leaving the network you already paid for.
Bandwidth, founded in 1999, is a US CPaaS that is also a carrier. It owns the network, the LERG entries, the 911 service, and now the AI integration: September 2025 brought native OpenAI Realtime support under its "Bring Your Own AI" approach. For US-only enterprises that already buy 911 and DIDs from Bandwidth, plugging into OpenAI without a third-party hop removes one network leg and the latency that comes with it.
Background
Bandwidth's voice product is BXML (Bandwidth eXtensible Markup Language), a Twilio-TwiML-style XML response language. The StartStream verb attaches a media stream to the call; the streamEventUrl receives Media Stream Started, Media Stream Rejected, and Media Stream Stopped callbacks. StartTranscription is a separate verb for live transcription with up to 4 concurrent track transcriptions per call.
Bandwidth's "Bring Your Own AI" expansion in September 2025 added direct support for OpenAI's Realtime API: a turn-key configuration that routes call audio through Bandwidth's network into OpenAI Realtime and back, without an intermediate WebSocket bridge. Bandwidth claims sub-200 ms latency for this path and attributes it to controlling the IP route from carrier ingress to AI egress.
Architecture
graph LR
A[PSTN Caller] --> B[Bandwidth Carrier Network]
B -->|BXML response| C[Your App Server]
C -->|StartStream verb| B
B -->|wss bidirectional| D[Your WebSocket Server]
D -->|or direct hop| E[OpenAI Realtime]
E -->|audio back| D
D --> B
B --> A
F[StartTranscription verb] -.->|live tracks| G[Your Webhook]
BXML response that starts the stream and holds the call open:
<Response>
  <StartStream
      destination="wss://bridge.callsphere.ai/bandwidth-realtime"
      name="ai-stream"
      tracks="both"
      streamEventUrl="https://callsphere.ai/api/bandwidth/stream-events"
      streamEventMethod="POST">
    <StreamParam name="tenant" value="abc123"/>
    <StreamParam name="agent" value="intake"/>
  </StartStream>
  <Pause duration="3600"/>
</Response>
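If your endpoint builds this response per call rather than serving a static file, generating the XML with a library avoids escaping mistakes in attribute values. The following is a minimal sketch using Python's standard library; the function name build_start_stream_bxml and its parameters are hypothetical, and the element and attribute names are copied from the BXML above rather than from Bandwidth's SDK.

```python
import xml.etree.ElementTree as ET


def build_start_stream_bxml(destination, event_url, params, pause_seconds=3600):
    """Return a BXML string mirroring the StartStream example above.

    Hypothetical helper: the name and signature are illustrative only.
    """
    response = ET.Element("Response")
    stream = ET.SubElement(
        response,
        "StartStream",
        {
            "destination": destination,
            "name": "ai-stream",
            "tracks": "both",
            "streamEventUrl": event_url,
            "streamEventMethod": "POST",
        },
    )
    # One StreamParam child per custom key/value pair.
    for name, value in params.items():
        ET.SubElement(stream, "StreamParam", {"name": name, "value": value})
    # Pause keeps the call open after StartStream has been issued.
    ET.SubElement(response, "Pause", {"duration": str(pause_seconds)})
    return ET.tostring(response, encoding="unicode")


bxml = build_start_stream_bxml(
    "wss://bridge.callsphere.ai/bandwidth-realtime",
    "https://callsphere.ai/api/bandwidth/stream-events",
    {"tenant": "abc123", "agent": "intake"},
)
```

ElementTree escapes characters such as & and " in attribute values, which matters when tenant or agent identifiers come from user-controlled data.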
CallSphere implementation
CallSphere terminates on Twilio across every product: Healthcare AI (FastAPI on port 8084 to OpenAI Realtime), Real Estate AI, Sales Calling AI (5 concurrent outbound calls), Salon AI, IT Helpdesk AI, and After-Hours AI (simultaneous Twilio call and SMS with a 120-second timeout). The platform spans 37 agents, 90+ tools, and 115+ DB tables, with HIPAA and SOC 2, $149/$499/$1499 plans, a 14-day trial, and a 22% affiliate program.
Bandwidth is on our evaluation list because its carrier-owned path can shave 50-100 ms off the round trip for US-domestic calls. For prospects in the IT Helpdesk vertical with strict 911-on-prem requirements, Bandwidth is the natural carrier; we maintain a reference BXML configuration that routes their calls through our standard agent stack with a thin StartStream-to-WebSocket adapter.
Build steps
- Sign up for Bandwidth, configure a Voice application, get a Bandwidth phone number.
- Set the application's voice URL to your BXML endpoint.
- Inbound call hits your endpoint; respond with BXML containing StartStream and a Pause to keep the call open.
- Implement the WebSocket: parse the start metadata frame for streamId and customParam fields, then loop on media frames.
- Decode incoming audio (mulaw 8 kHz default, configurable to L16 16 kHz) and forward to OpenAI Realtime.
- Send audio back as binary WebSocket messages in the same content-type the stream negotiated.
- Handle streamEventUrl callbacks for Started/Rejected/Stopped lifecycle events.
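The frame-handling and decoding steps above can be sketched as follows. This is an illustration under stated assumptions, not Bandwidth's documented wire format: it assumes each WebSocket message is a JSON text frame with an "eventType" field and that media frames carry base64 audio in a "payload" field. Check both names against the Media Streaming documentation before relying on them. The mu-law expansion itself is the standard G.711 algorithm.

```python
import base64
import json
import struct
from typing import Optional


def mulaw_to_pcm16(data: bytes) -> bytes:
    """Expand G.711 mu-law bytes to 16-bit little-endian linear PCM."""
    samples = []
    for byte in data:
        u = ~byte & 0xFF  # mu-law bytes are stored bit-inverted
        magnitude = (((u & 0x0F) << 3) + 0x84) << ((u & 0x70) >> 4)
        samples.append(0x84 - magnitude if u & 0x80 else magnitude - 0x84)
    return struct.pack(f"<{len(samples)}h", *samples)


def handle_frame(raw: str, session: dict) -> Optional[bytes]:
    """Process one text frame; return decoded PCM for media frames.

    ASSUMPTION: the "eventType" and "payload" field names are
    illustrative and must be verified against Bandwidth's docs.
    """
    frame = json.loads(raw)
    kind = frame.get("eventType")
    if kind == "start":
        # Keep the whole start frame so streamId and custom
        # parameters can be read later by the caller.
        session["start"] = frame
        return None
    if kind == "media":
        return mulaw_to_pcm16(base64.b64decode(frame["payload"]))
    return None  # ignore frame types this sketch does not model
```

In a real bridge, the bytes returned from handle_frame would be forwarded to the OpenAI Realtime session. OpenAI Realtime has also documented a G.711 mu-law input format, so the decode step may be unnecessary if you pass the audio through unchanged; confirm this against the current API reference.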
Pitfalls
- StartStream does not hold the call open: if no Pause or Bridge follows it, the BXML runs out of verbs and the call hangs up. Always add a Pause that covers the call duration.
- streamEventUrl is separate from the audio WebSocket; do not conflate them.
- 4 concurrent track-transcriptions limit applies to StartTranscription, not StartStream; do not assume you can fork the same call to four AI services.
- Bandwidth audio formats default to mulaw 8 kHz; explicit configuration to L16 requires the application setting plus the verb attribute.
- The BYOAI direct OpenAI integration is convenient but obscures the wire format; for debugging, fall back to a manual StartStream and your own bridge.
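Because streamEventUrl callbacks arrive over HTTP, separately from the audio WebSocket, it helps to track stream lifecycle in its own small state machine. The sketch below is one way to do that; the class name StreamLifecycle and the event labels "started", "rejected", and "stopped" are hypothetical normalized names for the Started/Rejected/Stopped callbacks, not the literal values Bandwidth sends.

```python
from enum import Enum


class StreamState(Enum):
    PENDING = "pending"
    ACTIVE = "active"
    REJECTED = "rejected"
    STOPPED = "stopped"


# Valid transitions keyed by (current state, normalized event label).
# The labels are assumptions; map Bandwidth's real callback values to them.
_TRANSITIONS = {
    (StreamState.PENDING, "started"): StreamState.ACTIVE,
    (StreamState.PENDING, "rejected"): StreamState.REJECTED,
    (StreamState.ACTIVE, "stopped"): StreamState.STOPPED,
}


class StreamLifecycle:
    """Track per-stream state from lifecycle callbacks."""

    def __init__(self):
        self._states = {}

    def apply(self, stream_name: str, event: str) -> StreamState:
        current = self._states.get(stream_name, StreamState.PENDING)
        nxt = _TRANSITIONS.get((current, event))
        if nxt is None:
            # Duplicate or out-of-order callback: keep the current state.
            return current
        self._states[stream_name] = nxt
        return nxt
```

Ignoring invalid transitions instead of raising keeps the webhook idempotent if a callback is retried.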
FAQ
Bandwidth vs Telnyx for US AI voice? Both own carrier networks. Bandwidth is older with deeper E911 and number-porting muscle; Telnyx has shipped LiveKit-on-Telnyx and is faster on AI features. If neither difference is decisive for you, choose the carrier you already have a relationship with.
Native OpenAI Realtime support means what exactly? A configuration option that routes the call audio through Bandwidth's IP network directly into OpenAI Realtime, without a customer-managed bridge.
Can I still use my own bridge? Yes. The native option is opt-in; default is BYOWebSocket.
HIPAA? Yes, Bandwidth signs BAAs on enterprise plans.
Pricing? Voice minutes plus a per-minute streaming charge; quote-based above the standard published rates.
Sources
- Bandwidth Voice API for Voice AI Platforms
- Bandwidth Bi-directional Media Streaming
- Bandwidth Start Stream BXML documentation
- Bandwidth OpenAI Realtime support announcement
Start a 14-day trial of our Twilio-based stack, see pricing, or contact us about Bandwidth integration for US-domestic high-volume tenants.