FreeSWITCH mod_audio_fork to AI WebSocket in 2026: The Two-Way Audio Bridge
FreeSWITCH's mod_audio_fork (and its sibling mod_audio_stream) sends L16 PCM over WebSocket to your AI engine and receives audio back. Here is the production pattern in 2026.
FreeSWITCH was streaming PCM over WebSocket to remote ASR engines years before AI voice was a category. mod_audio_fork is the module that does it, and in 2026 it is still one of the cleanest ways to bridge a SIP-attached call to OpenAI Realtime, ElevenLabs Agents, or your own model server.
Background
flowchart TD
Out[Outbound campaign] --> Twilio[Twilio Voice API]
Twilio --> STIR[STIR/SHAKEN attestation]
STIR --> Carrier[Originating carrier]
Carrier --> Term[Terminating carrier]
Term --> Recipient[Recipient phone]
Recipient --> Webhook[/voice webhook/]
Webhook --> Agent[AI sales agent]FreeSWITCH 1.10 ships with rich call control, robust SIP, and a mature module system. mod_audio_fork (and its drachtio-derived sibling mod_audio_stream) attach a media bug to a channel, fork the audio in either direction, and stream L16 PCM frames as binary WebSocket messages to a configurable URL. Optionally the WebSocket can send back audio (or JSON commands) and FreeSWITCH plays it on the channel.
For AI voice in 2026 the typical flow is: a SIP call hits FreeSWITCH, dialplan executes uuid_audio_fork on the bleg, the fork target is your AI bridge over WSS, the bridge forwards PCM to OpenAI Realtime, Realtime returns TTS audio, the bridge sends it back over WSS, FreeSWITCH plays it on the channel. The model is symmetric to the Twilio Media Streams + bridge pattern, but on infrastructure you control.
Technical deep-dive
<!-- dialplan/default.xml -->
<extension name="ai_agent_inbound">
<condition field="destination_number" expression="^(\d+)$">
<action application="answer"/>
<action application="set" data="execute_on_answer=uuid_audio_fork ${uuid} start wss://bridge.callsphere.ai/freeswitch mono 16000"/>
<action application="park"/>
</condition>
</extension>
The uuid_audio_fork API call starts forking; mono 16000 says one channel of L16 PCM at 16 kHz. The WSS endpoint receives binary frames of PCM in real time.
Hear it before you finish reading
Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.
# FastAPI WSS endpoint receiving FreeSWITCH audio
@app.websocket("/freeswitch")
async def freeswitch_bridge(ws: WebSocket):
await ws.accept()
openai_ws = await connect_openai_realtime()
async def fs_to_openai():
while True:
data = await ws.receive_bytes()
# data is raw L16 PCM 16 kHz mono
await openai_ws.send_json({
"type": "input_audio_buffer.append",
"audio": base64.b64encode(data).decode()
})
async def openai_to_fs():
async for msg in openai_ws.iter_text():
evt = json.loads(msg)
if evt["type"] == "response.audio.delta":
pcm = base64.b64decode(evt["delta"])
# Resample 24 kHz Realtime to 16 kHz for FreeSWITCH
await ws.send_bytes(resample_24k_to_16k(pcm))
await asyncio.gather(fs_to_openai(), openai_to_fs())
mod_audio_stream (drachtio variant) supports JSON control messages back to FreeSWITCH for things like "interrupt current audio" and "play this URL", which is useful for AI voice barge-in.
CallSphere implementation
CallSphere uses Twilio Programmable Voice across all six verticals as the production stack. For customers requiring self-hosted (regulated Healthcare AI deployments occasionally request this), FreeSWITCH + mod_audio_fork is the documented on-prem alternative. The bridge code is structurally identical to our Twilio Media Streams bridge: WebSocket receive PCM, send to OpenAI Realtime, receive TTS, send back. Healthcare AI on FastAPI :8084, Real Estate AI, Sales Calling AI (5 concurrent outbound per tenant), Salon AI, IT Helpdesk AI, and After-Hours AI (Twilio simul call+SMS with 120-second timeout) all map to this pattern. Across 37 agents, 90+ tools, 115+ DB tables, HIPAA + SOC 2 alignment, $149/$499/$1499 pricing, 14-day trial, and 22% affiliate, the FreeSWITCH path is reserved for on-prem and the Twilio path is the default cloud SKU.
Implementation steps
- Install FreeSWITCH 1.10 with mod_audio_fork or mod_audio_stream compiled in (drachtio docker image
drachtio/drachtio-freeswitch-mrf:v1.10.1-fullis convenient). - Configure a SIP profile (sofia) and a dialplan context for inbound calls.
- In dialplan, call
uuid_audio_forkwith start, your WSS URL, mono, and sample rate. - Stand up the WSS bridge service that translates between FreeSWITCH PCM and your AI engine.
- Implement bidirectional audio: PCM in to model, TTS out to FreeSWITCH.
- Handle barge-in by stopping the currently playing TTS when the model emits
input_audio_buffer.speech_startedand immediately playing the new response. - Wire DTMF: FreeSWITCH detects keypad and emits ESL events; subscribe via mod_event_socket and forward digits to the LLM.
- Test end-to-end with sipsak or SIPp; measure first-audio-out latency and aim for under 1 second.
FAQ
mod_audio_fork vs mod_audio_stream? mod_audio_fork is the original FreeSWITCH-tree module. mod_audio_stream (drachtio) adds JSON command support and is more actively maintained for AI use cases.
Can I use Opus over the fork? The modules typically send L16 PCM. Opus encode at the bridge if you need bandwidth; for local WSS bandwidth is rarely the issue.
Still reading? Stop comparing — try CallSphere live.
CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.
What is the CPU cost? About 1-3% of a vCPU per concurrent call for the fork and WebSocket path; negligible until you hit thousands of concurrent calls per node.
Does this support recording? Yes, FreeSWITCH's record_session is independent of audio_fork; you can record and fork simultaneously.
HIPAA on FreeSWITCH? Achievable. Use SIP/TLS, SRTP, WSS for the fork, encrypt recordings at rest, sign BAAs.
Sources
- GitHub: amigniter/mod_audio_stream
- Cyberpunk.tools: Add AI Voice Agent to FreeSWITCH in 30 Minutes
- drachtio-freeswitch-modules mod_audio_fork README
Start a 14-day trial on the cloud stack, see pricing, or contact us about FreeSWITCH on-prem AI voice.
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available -- no signup required.