FreeSWITCH Event Socket Library (ESL) for AI Voice Control in 2026
ESL is still the cleanest way to drive FreeSWITCH from an AI brain in 2026: outbound mode keeps FreeSWITCH's media threads from ever blocking on GPU inference, while inbound mode gives you a firehose of channel events. Here is what changed and how to wire it.
Every production FreeSWITCH-plus-AI deployment in 2026 hits the same fork: do you let your Python brain steer the dialplan over ESL inbound, or do you flip to ESL outbound and let FreeSWITCH spawn a TCP connection per call? Outbound wins for high-concurrency AI because media threads never block on inference, but it saddles you with a connection-per-call accounting model that ESL inbound does not have.
Background
Event Socket Library is the C client that ships with FreeSWITCH for talking to the Event System over a TCP socket. It dates back to FreeSWITCH 1.0 in 2008 and is the same wire protocol used by node-esl, eslgo, switch-esl in Rust, the Python ESL bindings, and the Ruby ESL gem. The protocol is line-oriented, plain text, and trivial to parse. Authentication is a simple ClueCon password by default; production rigs put it on localhost or behind WireGuard.
ESL has two modes. Inbound: your client connects to mod_event_socket on FreeSWITCH (default port 8021), subscribes to events, and issues `api` or `bgapi` commands. Outbound: you put a `socket` action in your dialplan, and FreeSWITCH opens a TCP connection out to your listener for each matching call, handing it control of that channel.
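The line-oriented framing is simple enough to parse by hand: every message is a block of `Key: Value` headers terminated by a blank line, plus an optional body whose size comes from `Content-Length`. A minimal sketch (the `parse_esl_frame` function is our own; the header names are the protocol's):

```python
def parse_esl_frame(raw: bytes) -> tuple[dict, bytes]:
    """Split one ESL frame into a header dict and a (possibly empty) body."""
    head, _, rest = raw.partition(b"\n\n")
    headers = {}
    for line in head.decode().splitlines():
        key, _, value = line.partition(": ")
        headers[key] = value
    length = int(headers.get("Content-Length", 0))
    return headers, rest[:length]

# The auth challenge FreeSWITCH sends on an inbound connect:
hdrs, body = parse_esl_frame(b"Content-Type: auth/request\n\n")
```

After this challenge the client replies `auth ClueCon\n\n` and starts subscribing; note that real event header values are URL-encoded, which this sketch does not decode.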
Hear it before you finish reading
Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.
Architecture
```mermaid
graph LR
  A[PSTN / SIP Trunk] --> B[FreeSWITCH 1.10.x]
  B -->|dialplan socket action| C[ESL Outbound Listener]
  C --> D[Python AI Brain]
  D -->|playback / hangup| C
  C -->|commands| B
  B -->|mod_audio_stream WebSocket| E[OpenAI Realtime]
```
The ESL outbound listener handles signaling-level events (DTMF, CHANNEL_ANSWER, CHANNEL_HANGUP, custom variables) and issues commands like uuid_broadcast, playback, hangup, transfer. The actual audio bypasses ESL entirely and goes through mod_audio_stream or mod_audio_fork to a WebSocket bridge that talks to OpenAI Realtime, Deepgram, or your own STS endpoint.
```xml
<extension name="ai-handler">
  <condition field="destination_number" expression="^(\+1\d{10})$">
    <action application="set" data="hangup_after_bridge=true"/>
    <action application="socket" data="127.0.0.1:9000 async full"/>
  </condition>
</extension>
```
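When that `socket` action fires, FreeSWITCH dials out to 127.0.0.1:9000 and waits. The listener sends `connect` and receives the full channel-variable dump as one header block, including the `Unique-ID` that every later `uuid_*` command needs. A minimal asyncio sketch of the handshake, under the standard outbound flow (`connect`, `myevents`, `linger` are real ESL verbs; function names and error handling are ours):

```python
import asyncio

async def read_headers(reader: asyncio.StreamReader) -> dict:
    """Read one ESL header block, terminated by an empty line."""
    headers = {}
    while True:
        line = (await reader.readline()).decode()
        if line in ("\n", ""):          # blank line or EOF ends the block
            return headers
        key, _, value = line.rstrip("\n").partition(": ")
        headers[key] = value

async def handle_call(reader, writer):
    writer.write(b"connect\n\n")        # ask for the channel-variable dump
    await writer.drain()
    channel = await read_headers(reader)
    uuid = channel["Unique-ID"]         # needed by every uuid_* command later
    writer.write(b"myevents\n\n")       # subscribe to this channel's events only
    writer.write(b"linger\n\n")         # keep the socket open through hangup
    await writer.drain()
    # ... drive the call: issue api/bgapi frames, react to DTMF, clean up.

async def main():
    server = await asyncio.start_server(handle_call, "127.0.0.1", 9000)
    async with server:                  # start with: asyncio.run(main())
        await server.serve_forever()
```

One coroutine per call keeps the accounting model simple: the connection, the UUID, and the AI session share one lifetime.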
CallSphere implementation
CallSphere does not run FreeSWITCH in production. Every inbound and outbound call across our six verticals (Healthcare AI, Real Estate AI, Sales Calling AI, Salon AI, IT Helpdesk AI, After-Hours AI) terminates on Twilio Programmable Voice. Healthcare AI runs on a FastAPI service at port :8084 that bridges Twilio Media Streams to OpenAI Realtime over WebSocket; Sales Calling AI fires up to 5 concurrent outbound calls per tenant; After-Hours AI uses a Twilio simultaneous call-plus-SMS pattern with a 120-second timeout. Across 37 agents, 90+ tools, 115+ DB tables, and HIPAA + SOC 2 compliance requirements, every plan ($149/$499/$1499, 14-day trial, 22% affiliate program) runs on Twilio, not FreeSWITCH. We track FreeSWITCH ESL because some self-hosted prospects ask for it, and our engineering team keeps a reference build to validate carrier fallback patterns.
Build steps
- Install FreeSWITCH 1.10.12 with mod_event_socket and mod_audio_stream enabled.
- In the dialplan, route AI calls through a `socket` action pointing at your listener's IP and port.
- Spin up an ESL outbound listener (Python: greenswitch, Node: node-esl, Go: eslgo) that accepts the TCP connection and reads the channel UUID from the event headers.
- Subscribe to CHANNEL_ANSWER, DTMF, CHANNEL_EXECUTE_COMPLETE, CHANNEL_HANGUP_COMPLETE for your UUID.
- Issue `uuid_audio_stream <uuid> start wss://bridge/realtime mono 16k` to push call audio to your AI service.
- Receive AI-generated audio back over a separate channel (`uuid_play_say`, `uuid_broadcast file://`, or write directly to a named pipe with mod_audio_stream).
- Watch for CHANNEL_HANGUP and clean up your stream resources; FreeSWITCH will not do it for you.
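The command half of the steps above is just framed text written to the same socket; `api`, `bgapi`, and `uuid_kill` are standard event-socket verbs, while the `uuid_audio_stream` argument order mirrors the step above and varies by mod_audio_stream fork, so verify against yours. A sketch with our own helper names:

```python
def api_cmd(command: str) -> bytes:
    """Frame a blocking api command for the event socket."""
    return f"api {command}\n\n".encode()

def start_stream(uuid: str, ws_url: str) -> bytes:
    # Argument syntax as in the build step; check your mod_audio_stream fork.
    return api_cmd(f"uuid_audio_stream {uuid} start {ws_url} mono 16k")

def play_file(uuid: str, path: str) -> bytes:
    # uuid_broadcast plays media into a live channel; 'aleg' targets the caller.
    return api_cmd(f"uuid_broadcast {uuid} {path} aleg")

def hangup(uuid: str) -> bytes:
    return api_cmd(f"uuid_kill {uuid}")
```

Swap `api` for `bgapi` on anything slow: `bgapi` returns a Job-UUID immediately and delivers the result as a BACKGROUND_JOB event instead of blocking the socket.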
Pitfalls
- ESL inbound on a single TCP connection becomes a bottleneck above ~200 concurrent calls; switch to outbound.
- Forgetting `async full` in the socket action means FreeSWITCH blocks the dialplan thread waiting for your TCP response.
- mod_audio_stream defaults to L16, but some forks emit L16 PCM with different endianness; verify the byte order on the wire.
- ClueCon as default password on a public IP is a free RCE; bind to 127.0.0.1 or use mTLS.
- ESL events fire in their own thread; do not mutate FreeSWITCH state from the event callback unless you are sure the API is thread-safe.
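For the endianness pitfall, a two-line byte swap converts L16 (big-endian, network byte order) to the little-endian PCM most inference stacks expect; `swap16` is our own helper name:

```python
import struct

def swap16(pcm: bytes) -> bytes:
    """Byte-swap 16-bit PCM samples (L16 big-endian <-> little-endian)."""
    n = len(pcm) // 2
    return struct.pack(f"<{n}h", *struct.unpack(f">{n}h", pcm))
```

A cheap field check: byte-swapped speech decodes as loud, harsh noise, so if your STT returns garbage while levels look maxed out, try swapping before blaming the model.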
FAQ
Is ESL deprecated in favor of mod_xml_curl or mod_lua? No. ESL is the canonical external-control interface. Lua and XML curl handle dialplan-time decisions; ESL handles call-time decisions and runtime events.
Does ESL outbound scale to 10k calls? Yes, if your listener is async and connection-per-call. Each connection idles between events. Expect 50-100 MB RAM per 1k connections in Python asyncio.
Still reading? Stop comparing — try CallSphere live.
CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.
Can I send audio over ESL? You can move audio at the channel level via uuid_record or playback, but for real-time AI use mod_audio_stream or mod_audio_fork to a WebSocket. ESL is for signaling.
FreeSWITCH or Asterisk for AI in 2026? FreeSWITCH wins for raw concurrency and codec flexibility; Asterisk wins for AGI and ARI maturity. Both are valid; carrier-side constraints often dictate the choice.
Why does CallSphere not run FreeSWITCH? Twilio gives us PCI- and HIPAA-aligned compliance attestations, global PSTN reach, and number provisioning APIs out of the box. The cost premium pays for itself in audit and uptime.
Sources
- FreeSWITCH Event Socket Library - SignalWire docs
- Event Socket Outbound documentation
- mod_audio_stream on GitHub
- AI Voicebots for Telecom: Asterisk and FreeSWITCH Integration
Start a 14-day trial to see Twilio-based voice agents in production, browse pricing for $149/$499/$1499 tiers, or book a demo to compare ESL versus our managed stack.
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available -- no signup required.