FreeSWITCH was streaming PCM over WebSocket to remote ASR engines years before AI voice was a category. mod_audio_fork is the module that does it, and in 2026 it is still one of the cleanest ways to bridge a SIP-attached call to OpenAI Realtime, ElevenLabs Agents, or your own model server.

Background

flowchart TD
  Out[Outbound campaign] --> Twilio[Twilio Voice API]
  Twilio --> STIR[STIR/SHAKEN attestation]
  STIR --> Carrier[Originating carrier]
  Carrier --> Term[Terminating carrier]
  Term --> Recipient[Recipient phone]
  Recipient --> Webhook[/voice webhook/]
  Webhook --> Agent[AI sales agent]

CallSphere reference architecture

FreeSWITCH 1.10 ships with rich call control, robust SIP, and a mature module system. mod_audio_fork, from the drachtio-freeswitch-modules project, and the community module mod_audio_stream, which follows the same design, attach a media bug to a channel, fork the audio in either direction, and stream L16 PCM frames as binary WebSocket messages to a configurable URL. Depending on the module and version, the WebSocket server can also send back audio or JSON commands, and FreeSWITCH plays the audio on the channel.

For AI voice in 2026 the typical flow is:

A SIP call hits FreeSWITCH.
The dialplan runs uuid_audio_fork on the call's channel, with your AI bridge's WSS URL as the fork target.
The bridge forwards the caller's PCM to OpenAI Realtime.
Realtime returns synthesized audio, and the bridge sends it back over WSS.
FreeSWITCH plays that audio on the channel.

The model mirrors the Twilio Media Streams + bridge pattern, but on infrastructure you control.

Technical deep-dive

<!-- dialplan/default.xml -->
<extension name="ai_agent_inbound">
  <condition field="destination_number" expression="^(\d+)$">
    <!-- api_on_answer runs an API command when the channel is answered, so set it before answer -->
    <action application="set" data="api_on_answer=uuid_audio_fork ${uuid} start wss://bridge.callsphere.ai/freeswitch mono 16000"/>
    <action application="answer"/>
    <action application="park"/>
  </condition>
</extension>

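uuid_audio_fork is an API command that starts the fork on the given channel UUID. mono selects a single channel carrying the caller's audio (the module also offers mixed and stereo), and 16000 sets the sample rate, so the WSS endpoint receives binary frames of 16 kHz L16 PCM in real time.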
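To make the mono 16000 setting concrete, the arithmetic below sketches how large each binary message and the resulting stream would be. It assumes 20 ms frames, a common telephony packetization interval; the chunk size your module build actually emits may differ, and the function names are illustrative only.

```python
def pcm_frame_bytes(sample_rate_hz: int, channels: int = 1,
                    frame_ms: int = 20, sample_width: int = 2) -> int:
    """Bytes in one frame of L16 PCM (sample_width=2 means 16-bit samples)."""
    samples_per_channel = sample_rate_hz * frame_ms // 1000
    return samples_per_channel * channels * sample_width


def pcm_bitrate_kbps(sample_rate_hz: int, channels: int = 1,
                     sample_width: int = 2) -> float:
    """Raw payload bitrate in kbit/s, before WebSocket and TLS overhead."""
    return sample_rate_hz * channels * sample_width * 8 / 1000


print(pcm_frame_bytes(16000))   # 640 bytes per 20 ms mono frame
print(pcm_bitrate_kbps(16000))  # 256.0 kbit/s per direction
```

Sizing read buffers and jitter buffers in the bridge from numbers like these is usually easier than tuning them by ear.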

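# FastAPI WSS endpoint receiving FreeSWITCH audio
import asyncio
import base64
import json

from fastapi import FastAPI, WebSocket

app = FastAPI()

# connect_openai_realtime(), resample_16k_to_24k() and resample_24k_to_16k()
# are helpers you supply. The Realtime socket is assumed to be a `websockets`
# client connection (send() plus async iteration over incoming messages).


@app.websocket("/freeswitch")
async def freeswitch_bridge(ws: WebSocket):
    await ws.accept()
    openai_ws = await connect_openai_realtime()

    async def fs_to_openai():
        while True:
            data = await ws.receive_bytes()
            # data is raw L16 PCM 16 kHz mono; Realtime pcm16 input is 24 kHz
            await openai_ws.send(json.dumps({
                "type": "input_audio_buffer.append",
                "audio": base64.b64encode(resample_16k_to_24k(data)).decode()
            }))

    async def openai_to_fs():
        async for msg in openai_ws:
            evt = json.loads(msg)
            if evt["type"] == "response.audio.delta":
                pcm = base64.b64decode(evt["delta"])
                # Resample 24 kHz Realtime to 16 kHz for FreeSWITCH
                await ws.send_bytes(resample_24k_to_16k(pcm))

    await asyncio.gather(fs_to_openai(), openai_to_fs())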
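The bridge relies on a resampling helper that the article does not show. One possible shape, offered as a minimal sketch rather than the production implementation, is plain linear interpolation over mono 16-bit little-endian PCM using only the standard library. It applies no low-pass filter, so downsampling can alias; a dedicated resampling library is the safer choice for production audio. The name resample_pcm16 is hypothetical.

```python
import sys
from array import array


def resample_pcm16(pcm: bytes, src_rate: int, dst_rate: int) -> bytes:
    """Linear-interpolation resample of mono 16-bit little-endian PCM."""
    samples = array("h")
    samples.frombytes(pcm[: len(pcm) - (len(pcm) % 2)])  # drop a stray odd byte
    if sys.byteorder == "big":
        samples.byteswap()  # wire format is little-endian
    if not samples:
        return b""
    n_out = len(samples) * dst_rate // src_rate
    out = array("h", bytes(2 * n_out))
    last = len(samples) - 1
    for i in range(n_out):
        pos = i * src_rate / dst_rate      # position in the source signal
        j = int(pos)
        frac = pos - j
        a = samples[j]
        b = samples[min(j + 1, last)]      # clamp at the final sample
        out[i] = int(round(a + (b - a) * frac))
    if sys.byteorder == "big":
        out.byteswap()
    return out.tobytes()


def resample_24k_to_16k(pcm: bytes) -> bytes:
    return resample_pcm16(pcm, 24000, 16000)


def resample_16k_to_24k(pcm: bytes) -> bytes:
    return resample_pcm16(pcm, 16000, 24000)
```

Each call treats its chunk independently, so there is a small discontinuity at chunk boundaries. Carrying the last sample of one chunk into the next would smooth that out at the cost of keeping state per call.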

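The fork modules can also accept JSON control messages sent back over the WebSocket for things like "interrupt current audio" and "play this URL", which is useful for AI voice barge-in. The exact message types differ between mod_audio_fork and mod_audio_stream and between versions, so check the README of the build you deploy.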
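As a rough sketch of how barge-in could be wired with such control messages: the tracker below watches Realtime events and, when the caller starts speaking over agent audio, produces one message for FreeSWITCH and one for OpenAI. The {"type": "killAudio"} shape is an assumption to verify against your module's README, and BargeInTracker is a hypothetical name; response.cancel is the Realtime client event for aborting an in-progress response.

```python
import json

# Assumed control-message shape for the fork module; verify before relying on it.
KILL_AUDIO = {"type": "killAudio"}
# Realtime client event that cancels the response currently being generated.
CANCEL_RESPONSE = {"type": "response.cancel"}


class BargeInTracker:
    """Decides which control messages to emit when the caller interrupts."""

    def __init__(self) -> None:
        self.agent_audio_sent = False

    def on_realtime_event(self, evt: dict) -> tuple:
        """Return (messages_for_freeswitch, messages_for_openai) as JSON strings."""
        kind = evt.get("type")
        if kind == "response.audio.delta":
            self.agent_audio_sent = True
            return [], []
        if kind == "input_audio_buffer.speech_started" and self.agent_audio_sent:
            self.agent_audio_sent = False
            return [json.dumps(KILL_AUDIO)], [json.dumps(CANCEL_RESPONSE)]
        return [], []
```

The tracker does not know when playback finished on its own, so it may occasionally send a stop message when nothing is playing. That is harmless on the FreeSWITCH side, but the Realtime API may answer a cancel with an error event if no response is active, which the bridge should tolerate.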

CallSphere implementation

CallSphere uses Twilio Programmable Voice across all six verticals as the production stack. For customers requiring self-hosted (regulated Healthcare AI deployments occasionally request this), FreeSWITCH + mod_audio_fork is the documented on-prem alternative. The bridge code is structurally identical to our Twilio Media Streams bridge: WebSocket receive PCM, send to OpenAI Realtime, receive TTS, send back. Healthcare AI on FastAPI :8084, Real Estate AI, Sales Calling AI (5 concurrent outbound per tenant), Salon AI, IT Helpdesk AI, and After-Hours AI (Twilio simul call+SMS with 120-second timeout) all map to this pattern. Across 37 agents, 90+ tools, 115+ DB tables, HIPAA + SOC 2 alignment, $149/$499/$1499 pricing, 14-day trial, and 22% affiliate, the FreeSWITCH path is reserved for on-prem and the Twilio path is the default cloud SKU.

Implementation steps

Install FreeSWITCH 1.10 with mod_audio_fork or mod_audio_stream compiled in (drachtio docker image drachtio/drachtio-freeswitch-mrf:v1.10.1-full is convenient).
Configure a SIP profile (sofia) and a dialplan context for inbound calls.
In dialplan, call uuid_audio_fork with start, your WSS URL, mono, and sample rate.
Stand up the WSS bridge service that translates between FreeSWITCH PCM and your AI engine.
Implement bidirectional audio: PCM in to model, TTS out to FreeSWITCH.
Handle barge-in by stopping the currently playing TTS when the model emits input_audio_buffer.speech_started and immediately playing the new response.
Wire DTMF: FreeSWITCH detects keypad and emits ESL events; subscribe via mod_event_socket and forward digits to the LLM.
Test end-to-end with sipsak or SIPp; measure first-audio-out latency and aim for under 1 second.
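For the DTMF step, a bridge that subscribes over the event socket (for example with the ESL command event plain DTMF) receives events as URL-encoded "Header: value" lines. The sketch below parses one such event and extracts the digit; the function names are hypothetical, and the header names (Event-Name, Unique-ID, DTMF-Digit) should be confirmed against events captured from your own FreeSWITCH instance.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import unquote


def parse_esl_plain_event(text: str) -> Dict[str, str]:
    """Parse the header block of a plain-format ESL event into a dict."""
    headers: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            break  # a blank line ends the header block
        key, sep, value = line.partition(": ")
        if sep:
            headers[key] = unquote(value)  # plain events URL-encode values
    return headers


def dtmf_from_event(headers: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return (channel_uuid, digit) for DTMF events, otherwise None."""
    if headers.get("Event-Name") != "DTMF":
        return None
    uuid = headers.get("Unique-ID")
    digit = headers.get("DTMF-Digit")
    if not uuid or not digit:
        return None
    return uuid, digit


sample = "Event-Name: DTMF\nUnique-ID: abc-123\nDTMF-Digit: %23\nDTMF-Duration: 2000\n"
print(dtmf_from_event(parse_esl_plain_event(sample)))  # ('abc-123', '#')
```

Keying on the channel UUID lets the bridge route each digit to the WebSocket session that owns the call before forwarding it to the model as text.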

FAQ

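mod_audio_fork vs mod_audio_stream? mod_audio_fork comes from the drachtio-freeswitch-modules project rather than the upstream FreeSWITCH tree, which is why the drachtio Docker image bundles it. mod_audio_stream is a separate community module built around the same idea. Both stream L16 PCM over WebSocket; they differ in build dependencies, in the JSON messages they accept from the server, and in how they handle return audio, so compare the READMEs against your barge-in and playback requirements.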

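Can I use Opus over the fork? The modules typically send L16 PCM, which at 16 kHz mono is 256 kbit/s per direction before framing overhead. If you need to save bandwidth on a long-haul link, encode to Opus at the bridge; when the bridge sits next to FreeSWITCH, bandwidth is rarely the bottleneck.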

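What is the CPU cost? As a rough estimate, about 1-3% of a vCPU per concurrent call for the fork and WebSocket path. That works out to roughly one to three vCPUs per hundred concurrent calls, so it is a minor line item at small scale but needs explicit capacity planning as a node approaches thousands of concurrent calls. Measure on your own hardware before sizing.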

Does this support recording? Yes, FreeSWITCH's record_session is independent of audio_fork; you can record and fork simultaneously.

HIPAA on FreeSWITCH? Achievable. Use SIP/TLS, SRTP, WSS for the fork, encrypt recordings at rest, sign BAAs.

FreeSWITCH mod_audio_fork to AI WebSocket in 2026: The Two-Way Audio Bridge
