
Bandwidth Voice API and AI Streaming in 2026: Carrier-Owned Pipes for OpenAI Realtime

Bandwidth runs its own US carrier network and added native OpenAI Realtime support in September 2025. Their BXML StartStream verb plus 2-way Media Streaming gives you carrier-grade pipes for AI voice without leaving the network you already paid for.

Bandwidth has operated in the US since 1999 as a CPaaS that is also a carrier. They own the network, the LERG entries, and the 911 service, and now the AI integration too: September 2025 brought native OpenAI Realtime support under their "Bring Your Own AI" approach. For US-only enterprises that already buy 911 and DIDs from Bandwidth, plugging into OpenAI without a third-party hop removes a network leg and the latency that comes with it.

Background

Bandwidth's voice product is driven by BXML (Bandwidth eXtensible Markup Language), an XML response language in the style of Twilio's TwiML. The StartStream verb attaches a media stream to the call; the streamEventUrl receives Media Stream Started, Media Stream Rejected, and Media Stream Stopped callbacks. StartTranscription is a separate verb for live transcription, with up to 4 concurrent track transcriptions per call.
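As a rough illustration of handling those three lifecycle callbacks, here is a minimal Python sketch. The `eventType` strings and the `streamName` field are assumptions made for this example, not confirmed payload fields; check Bandwidth's webhook reference for the real names before relying on them.

```python
# Hypothetical eventType strings; the real wire values may differ.
STARTED = "mediaStreamStarted"
REJECTED = "mediaStreamRejected"
STOPPED = "mediaStreamStopped"


def handle_stream_event(event: dict, active: set) -> str:
    """Update the set of active stream names from one lifecycle callback."""
    kind = event.get("eventType")
    name = event.get("streamName", "")  # assumed field name
    if kind == STARTED:
        active.add(name)
        return "started"
    if kind == REJECTED:
        # A rejected stream never carried audio; make sure it is not tracked.
        active.discard(name)
        return "rejected"
    if kind == STOPPED:
        active.discard(name)
        return "stopped"
    return "ignored"


active = set()
print(handle_stream_event({"eventType": STARTED, "streamName": "ai-stream"}, active))
print(handle_stream_event({"eventType": STOPPED, "streamName": "ai-stream"}, active))
```

In a real service this function would sit behind the HTTP endpoint named in streamEventUrl, which is a different endpoint from the audio WebSocket.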

Bandwidth's "Bring Your Own AI" expansion in September 2025 added direct support for OpenAI's Realtime API: a turn-key configuration that routes call audio through Bandwidth's network into OpenAI Realtime and back, without an intermediate WebSocket bridge. Bandwidth claims sub-200 ms latency, which it attributes to controlling the IP path from carrier ingress to AI egress.

Architecture

graph LR
    A[PSTN Caller] --> B[Bandwidth Carrier Network]
    B -->|BXML response| C[Your App Server]
    C -->|StartStream verb| B
    B -->|wss bidirectional| D[Your WebSocket Server]
    D -->|or direct hop| E[OpenAI Realtime]
    E -->|audio back| D
    D --> B
    B --> A
    F[StartTranscription verb] -.->|live tracks| G[Your Webhook]

<Response>
  <StartStream
    destination="wss://bridge.callsphere.ai/bandwidth-realtime"
    name="ai-stream"
    tracks="both"
    streamEventUrl="https://callsphere.ai/api/bandwidth/stream-events"
    streamEventMethod="POST">
    <StreamParam name="tenant" value="abc123"/>
    <StreamParam name="agent" value="intake"/>
  </StartStream>
  <Pause duration="3600"/>
</Response>
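If your app server builds this response dynamically, the same document can be generated with Python's standard `xml.etree.ElementTree`. This is a minimal sketch: the function name `build_start_stream_bxml` and its parameters are hypothetical, and the attributes simply mirror the static example above.

```python
import xml.etree.ElementTree as ET


def build_start_stream_bxml(destination, event_url, params, pause_seconds=3600):
    """Return a BXML document that starts a media stream, then pauses."""
    response = ET.Element("Response")
    stream = ET.SubElement(
        response,
        "StartStream",
        {
            "destination": destination,
            "name": "ai-stream",
            "tracks": "both",
            "streamEventUrl": event_url,
            "streamEventMethod": "POST",
        },
    )
    # One StreamParam child per custom key/value pair.
    for name, value in params.items():
        ET.SubElement(stream, "StreamParam", {"name": name, "value": value})
    # Keep the call open after the stream starts.
    ET.SubElement(response, "Pause", {"duration": str(pause_seconds)})
    return ET.tostring(response, encoding="unicode")


bxml = build_start_stream_bxml(
    "wss://bridge.callsphere.ai/bandwidth-realtime",
    "https://callsphere.ai/api/bandwidth/stream-events",
    {"tenant": "abc123", "agent": "intake"},
)
print(bxml)
```

Building the XML through an element tree rather than string formatting means attribute values such as tenant IDs are escaped automatically.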

CallSphere implementation

CallSphere terminates on Twilio across every product: Healthcare AI (FastAPI on :8084 to OpenAI Realtime), Real Estate AI, Sales Calling AI (5 concurrent outbound calls), Salon AI, IT Helpdesk AI, and After-Hours AI (simultaneous Twilio call and SMS with a 120-second timeout). The platform spans 37 agents, 90+ tools, and 115+ DB tables, with HIPAA + SOC 2, $149/$499/$1499 plans, a 14-day trial, and a 22% affiliate program. Bandwidth is on our evaluation list because its carrier-owned path can shave 50-100 ms off the round trip for US-domestic calls. For prospects in the IT Helpdesk vertical with strict 911-on-prem requirements, Bandwidth is the natural carrier; we maintain a reference BXML configuration that routes their calls through our standard agent stack with a thin StartStream-to-WebSocket adapter.

Build steps

  1. Sign up for Bandwidth, configure a Voice application, get a Bandwidth phone number.
  2. Set the application's voice URL to your BXML endpoint.
  3. Inbound call hits your endpoint; respond with BXML containing StartStream and a Pause to keep the call open.
  4. Implement the WebSocket: parse the start metadata frame for streamId and customParam fields, then loop on media frames.
  5. Decode incoming audio (mulaw 8 kHz default, configurable to L16 16 kHz) and forward to OpenAI Realtime.
  6. Send audio back as binary WebSocket messages in the same content-type the stream negotiated.
  7. Handle streamEventUrl callbacks for Started/Rejected/Stopped lifecycle events.
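Steps 4 and 5 can be sketched in Python as follows. The mu-law decoder follows the standard G.711 expansion. The JSON frame layout (`eventType`, `metadata`, `streamId`, `customParams`, base64 `payload`) is an assumption made for illustration, so verify it against Bandwidth's media streaming reference. The WebSocket server and the OpenAI forwarding are left out.

```python
import base64
import json


def mulaw_to_pcm16(data: bytes) -> list[int]:
    """Decode G.711 mu-law bytes into signed 16-bit PCM sample values."""
    samples = []
    for byte in data:
        u = ~byte & 0xFF  # mu-law bytes are stored bit-inverted
        sign = u & 0x80
        exponent = (u >> 4) & 0x07
        mantissa = u & 0x0F
        magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
        samples.append(-magnitude if sign else magnitude)
    return samples


def handle_frame(raw: str, state: dict) -> list[int]:
    """Process one JSON text frame; return decoded PCM samples (empty if none)."""
    frame = json.loads(raw)
    event = frame.get("eventType")  # assumed field name
    if event == "start":
        meta = frame.get("metadata", {})
        state["stream_id"] = meta.get("streamId")
        state["params"] = meta.get("customParams", {})
        return []
    if event == "media":
        # Assumes the default mu-law 8 kHz encoding, base64 in "payload".
        return mulaw_to_pcm16(base64.b64decode(frame["payload"]))
    if event == "stop":
        state["stopped"] = True
    return []


state = {}
start = {"eventType": "start",
         "metadata": {"streamId": "s-1", "customParams": {"tenant": "abc123"}}}
media = {"eventType": "media",
         "payload": base64.b64encode(bytes([0xFF, 0x00, 0x80])).decode()}
handle_frame(json.dumps(start), state)
pcm = handle_frame(json.dumps(media), state)
print(state["stream_id"], pcm)
```

Keeping the frame handler free of network code makes it easy to unit test with captured frames before wiring it into a live WebSocket loop.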

Pitfalls

  • StartStream does not hold the call open by itself: with no Pause or Bridge after it, the BXML runs to the end and the call hangs up. Always follow it with a Pause long enough to cover the call.
  • streamEventUrl is separate from the audio WebSocket; do not conflate them.
  • 4 concurrent track-transcriptions limit applies to StartTranscription, not StartStream; do not assume you can fork the same call to four AI services.
  • Bandwidth audio formats default to mulaw 8 kHz; explicit configuration to L16 requires the application setting plus the verb attribute.
  • The BYOAI direct OpenAI integration is convenient but obscures the wire format; for debugging, fall back to a manual StartStream and your own bridge.
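The audio format choice changes how many bytes each frame carries, which matters when sizing buffers in a bridge. A quick sketch of the arithmetic, assuming 20 ms frames (a common telephony packetization interval, not a figure taken from Bandwidth's documentation):

```python
def bytes_per_frame(sample_rate_hz: int, bytes_per_sample: int, frame_ms: int = 20) -> int:
    """Bytes of audio in one frame of the given duration."""
    return sample_rate_hz * bytes_per_sample * frame_ms // 1000


# mu-law: 8 kHz, 1 byte per sample
print(bytes_per_frame(8000, 1))
# L16: 16 kHz, 2 bytes per sample
print(bytes_per_frame(16000, 2))
```

Under that assumption, switching from mu-law 8 kHz to L16 16 kHz quadruples the bytes per frame (160 to 640).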

FAQ

Bandwidth vs Telnyx for US AI voice? Both own carrier networks. Bandwidth is older with deeper E911 and number-porting muscle; Telnyx has shipped LiveKit-on-Telnyx and is faster on AI features. Choose based on your existing carrier relationship.

Native OpenAI Realtime support means what exactly? A configuration option that routes the call audio through Bandwidth's IP network directly into OpenAI Realtime, without a customer-managed bridge.

Can I still use my own bridge? Yes. The native option is opt-in; default is BYOWebSocket.

HIPAA? Yes, Bandwidth signs BAAs on enterprise plans.

Pricing? Voice minutes plus a per-minute streaming charge; quote-based above the standard published rates.

Start a 14-day trial of our Twilio-based stack, see pricing, or contact us about Bandwidth integration for US-domestic high-volume tenants.
