By Sagar Shankaran, Founder of CallSphere
FreeSWITCH's mod_audio_fork (and its sibling mod_audio_stream) sends L16 PCM over WebSocket to your AI engine and receives audio back. Here is the production pattern in 2026.
Key takeaways
FreeSWITCH was streaming PCM over WebSocket to remote ASR engines years before AI voice was a category. mod_audio_fork is the module that does it, and in 2026 it is still one of the cleanest ways to bridge a SIP-attached call to OpenAI Realtime, ElevenLabs Agents, or your own model server.
```mermaid
flowchart TD
    Out[Outbound campaign] --> Twilio[Twilio Voice API]
    Twilio --> STIR[STIR/SHAKEN attestation]
    STIR --> Carrier[Originating carrier]
    Carrier --> Term[Terminating carrier]
    Term --> Recipient[Recipient phone]
    Recipient --> Webhook[/voice webhook/]
    Webhook --> Agent[AI sales agent]
```

FreeSWITCH 1.10 ships with rich call control, robust SIP, and a mature module system. mod_audio_fork (and its drachtio-derived sibling mod_audio_stream) attach a media bug to a channel, fork the audio in either direction, and stream L16 PCM frames as binary WebSocket messages to a configurable URL. Optionally the WebSocket can send back audio (or JSON commands) and FreeSWITCH plays it on the channel.
For AI voice in 2026 the typical flow is: a SIP call hits FreeSWITCH, the dialplan runs uuid_audio_fork on the answered channel, the fork target is your AI bridge over WSS, the bridge forwards PCM to OpenAI Realtime, Realtime returns TTS audio, the bridge sends it back over WSS, and FreeSWITCH plays it on the channel. The architecture mirrors the Twilio Media Streams + bridge pattern, but runs on infrastructure you control.
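```xml
<!-- dialplan/default.xml -->
<extension name="ai_agent_inbound">
  <condition field="destination_number" expression="^(\d+)$">
    <!-- uuid_audio_fork is an API command, not a dialplan application, so it is
         run via api_on_answer, which must be set before the call is answered -->
    <action application="set" data="api_on_answer=uuid_audio_fork ${uuid} start wss://bridge.callsphere.ai/freeswitch mono 16000"/>
    <action application="answer"/>
    <!-- park keeps the channel up while the WSS bridge drives the conversation -->
    <action application="park"/>
  </condition>
</extension>
```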
The uuid_audio_fork API command starts the fork: mono streams only the caller's audio as a single channel, and 16000 requests L16 PCM at 16 kHz. The WSS endpoint receives that audio as binary WebSocket frames in real time.
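When sizing buffers on the bridge side, it helps to know how many bytes a slice of this audio occupies. A small sketch, assuming 2-byte L16 samples and a 20 ms slice as the unit of reasoning (the module may batch audio into larger WebSocket frames, so the frame length is a parameter here); l16_frame_bytes and l16_duration_ms are hypothetical helpers, not part of any FreeSWITCH API.

```python
def l16_frame_bytes(sample_rate_hz: int, frame_ms: int, channels: int = 1) -> int:
    """Bytes in one slice of L16 PCM: 2 bytes per sample per channel."""
    samples_per_channel = sample_rate_hz * frame_ms // 1000
    return samples_per_channel * channels * 2


def l16_duration_ms(num_bytes: int, sample_rate_hz: int, channels: int = 1) -> float:
    """Inverse: milliseconds of audio contained in a received binary frame."""
    return num_bytes / (2 * channels) * 1000 / sample_rate_hz


print(l16_frame_bytes(16000, 20))              # 640 (mono, 16 kHz, 20 ms)
print(l16_frame_bytes(16000, 20, channels=2))  # 1280 (stereo mix type)
print(l16_duration_ms(640, 16000))             # 20.0
```

Measuring incoming frames with l16_duration_ms is a cheap way to confirm the fork is actually running at the sample rate you requested.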
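```python
# FastAPI WSS endpoint receiving FreeSWITCH audio
import asyncio
import base64
import json
from fastapi import FastAPI, WebSocket

app = FastAPI()

@app.websocket("/freeswitch")
async def freeswitch_bridge(ws: WebSocket):
    await ws.accept()
    # connect_openai_realtime() is an app-specific helper (not shown) that returns
    # an open `websockets` client connection to OpenAI Realtime.
    openai_ws = await connect_openai_realtime()

    async def fs_to_openai():
        while True:
            msg = await ws.receive()
            if msg["type"] == "websocket.disconnect":
                break
            data = msg.get("bytes")
            if not data:
                continue  # skip text frames (e.g. JSON metadata)
            # data is raw L16 PCM, 16 kHz mono; Realtime's pcm16 input is 24 kHz,
            # so upsample first (helper not shown).
            await openai_ws.send(json.dumps({
                "type": "input_audio_buffer.append",
                "audio": base64.b64encode(resample_16k_to_24k(data)).decode(),
            }))
        await openai_ws.close()  # also ends openai_to_fs()

    async def openai_to_fs():
        async for raw in openai_ws:
            evt = json.loads(raw)
            # Beta event name; the GA Realtime API calls it response.output_audio.delta
            if evt["type"] == "response.audio.delta":
                pcm = base64.b64decode(evt["delta"])
                # Resample 24 kHz Realtime audio to 16 kHz for FreeSWITCH (assumes the
                # fork module is configured to play binary audio sent back to it)
                await ws.send_bytes(resample_24k_to_16k(pcm))

    await asyncio.gather(fs_to_openai(), openai_to_fs())
```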
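The bridge calls resampling helpers that it never defines. Here is a minimal sketch of them, assuming mono 16-bit little-endian PCM on both sides (FreeSWITCH uses host byte order, which is little-endian on x86, and Realtime's pcm16 is little-endian). resample_pcm16, resample_24k_to_16k and resample_16k_to_24k are hypothetical names, and linear interpolation is a deliberately naive technique: it applies no anti-aliasing filter and treats each chunk independently, so a production bridge would swap in a proper polyphase resampler.

```python
import array
import sys


def resample_pcm16(pcm: bytes, src_rate: int, dst_rate: int) -> bytes:
    """Naive linear-interpolation resampler for mono 16-bit little-endian PCM.

    No low-pass filter is applied, so downsampling can alias, and chunk
    boundaries are not smoothed across calls.
    """
    src = array.array("h")
    src.frombytes(pcm)
    if sys.byteorder == "big":
        src.byteswap()  # wire format is little-endian
    n_src = len(src)
    if n_src == 0 or src_rate == dst_rate:
        return pcm
    n_dst = n_src * dst_rate // src_rate
    out = array.array("h", bytes(2 * n_dst))  # n_dst zero samples
    step = src_rate / dst_rate
    for i in range(n_dst):
        pos = i * step
        j = int(pos)
        frac = pos - j
        a = src[j]
        b = src[j + 1] if j + 1 < n_src else a
        out[i] = int(round(a + (b - a) * frac))  # stays within int16 range
    if sys.byteorder == "big":
        out.byteswap()
    return out.tobytes()


def resample_24k_to_16k(pcm: bytes) -> bytes:
    return resample_pcm16(pcm, 24000, 16000)


def resample_16k_to_24k(pcm: bytes) -> bytes:
    return resample_pcm16(pcm, 16000, 24000)
```

A 20 ms Realtime chunk of 480 samples (960 bytes) comes out as 320 samples (640 bytes) at 16 kHz.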
mod_audio_stream (drachtio variant) supports JSON control messages back to FreeSWITCH for things like "interrupt current audio" and "play this URL", which is useful for AI voice barge-in.
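A minimal sketch of the barge-in handling, with the assumptions flagged: handle_realtime_event and STOP_PLAYBACK_CMD are hypothetical names, and the {"type": "killAudio"} payload is only a placeholder for whatever stop-playback command your fork module documents. input_audio_buffer.speech_started is the Realtime event fired when the caller starts talking, and response.cancel is the Realtime client event that stops an in-flight response. If your session's turn detection already interrupts responses, the second send is redundant.

```python
import asyncio
import json

# HYPOTHETICAL control payload: check your fork module's README for the
# exact command it accepts to stop playback.
STOP_PLAYBACK_CMD = {"type": "killAudio"}


async def handle_realtime_event(evt: dict, fs_ws, openai_ws) -> bool:
    """Barge-in: when the caller starts speaking, stop TTS on both sides.

    fs_ws is assumed to expose send_text() (as Starlette's WebSocket does);
    openai_ws is assumed to expose send() (as a `websockets` connection does).
    Returns True if the event was handled as a barge-in.
    """
    if evt.get("type") != "input_audio_buffer.speech_started":
        return False
    # Tell FreeSWITCH to drop audio still queued for playback on the channel.
    await fs_ws.send_text(json.dumps(STOP_PLAYBACK_CMD))
    # Ask Realtime to stop generating the current response.
    await openai_ws.send(json.dumps({"type": "response.cancel"}))
    return True
```

In the bridge, this would be called for every parsed event inside openai_to_fs(), before the audio-delta branch.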
CallSphere uses Twilio Programmable Voice as the production stack across all six verticals. For customers who require a self-hosted deployment (regulated Healthcare AI deployments occasionally request this), FreeSWITCH + mod_audio_fork is the documented on-prem alternative. The bridge code is structurally identical to our Twilio Media Streams bridge: receive PCM over the WebSocket, send it to OpenAI Realtime, receive TTS audio, and send it back. Healthcare AI on FastAPI :8084, Real Estate AI, Sales Calling AI (5 concurrent outbound calls per tenant), Salon AI, IT Helpdesk AI, and After-Hours AI (Twilio simultaneous call + SMS with a 120-second timeout) all map to this pattern. Across the platform (37 agents, 90+ tools, 115+ DB tables, HIPAA + SOC 2 alignment, $149/$499/$1499 pricing, a 14-day trial, and a 22% affiliate program), the Twilio path is the default cloud SKU and the FreeSWITCH path is reserved for on-prem.
Run FreeSWITCH 1.10 with the fork module built in (the Docker image drachtio/drachtio-freeswitch-mrf:v1.10.1-full is convenient).
Call uuid_audio_fork with start, your WSS URL, mono, and the sample rate.
Handle barge-in by stopping playback on input_audio_buffer.speech_started and immediately playing the new response.
mod_audio_fork vs mod_audio_stream? mod_audio_fork is the original module, distributed in the drachtio-freeswitch-modules collection rather than the core FreeSWITCH tree. mod_audio_stream adds JSON command support and is more actively maintained for AI use cases.
Can I use Opus over the fork? The modules typically send L16 PCM. If you need to save bandwidth, encode to Opus at the bridge; on a local WSS link, bandwidth is rarely the bottleneck.
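The bandwidth point is easy to quantify. A quick sketch of the raw payload rate, assuming 16-bit samples and excluding WebSocket, TCP and TLS framing; l16_bitrate_kbps is a hypothetical helper.

```python
def l16_bitrate_kbps(sample_rate_hz: int, channels: int = 1) -> float:
    """Raw L16 payload bitrate: 16 bits per sample per channel.

    Excludes WebSocket, TCP and TLS framing overhead.
    """
    return sample_rate_hz * 16 * channels / 1000


inbound = l16_bitrate_kbps(16000)   # caller audio forked to the bridge
outbound = l16_bitrate_kbps(16000)  # generated audio sent back for playback
print(inbound + outbound)           # 512.0 kbps per call, both directions
```

About half a megabit per call is trivial on a LAN but adds up on a WAN hop to a hosted model, where the base64 encoding inside Realtime's JSON events also inflates audio by roughly a third.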
What is the CPU cost? About 1-3% of a vCPU per concurrent call for the fork and WebSocket path; negligible until you hit thousands of concurrent calls per node.
Does this support recording? Yes. FreeSWITCH's record_session runs independently of the audio fork, so you can record and fork simultaneously.
HIPAA on FreeSWITCH? Achievable. Use SIP/TLS, SRTP, WSS for the fork, encrypt recordings at rest, sign BAAs.
Start a 14-day trial on the cloud stack, see pricing, or contact us about FreeSWITCH on-prem AI voice.
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.