AI Infrastructure

FreeSWITCH Event Socket Library (ESL) for AI Voice Control in 2026

ESL is still the cleanest way to drive FreeSWITCH from an AI brain in 2026: outbound mode keeps media threads off the GPU, inbound mode gives you a firehose of channel events. Here is what changed and how to wire it.

Every production FreeSWITCH-plus-AI deployment in 2026 hits the same fork: do you let your Python brain steer the dialplan over ESL inbound, or do you flip to ESL outbound and let FreeSWITCH spawn a TCP connection per call? Outbound wins for high-concurrency AI because media threads never block on inference, but it costs you a connection-per-call accounting model that ESL inbound does not have.

Background

The Event Socket Library is the C client library that ships with FreeSWITCH for talking to its event system over a TCP socket. It dates back to FreeSWITCH 1.0 in 2008, and the same wire protocol is spoken by node-esl, eslgo, switch-esl in Rust, the Python ESL bindings, and the Ruby ESL gem. The protocol is line-oriented, plain text, and trivial to parse. Authentication is a simple ClueCon password by default; production rigs bind the socket to localhost or put it behind WireGuard.
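Because the protocol is plain text, a frame parser fits in a few lines. Here is a minimal sketch (not the official Python ESL bindings; it assumes UTF-8 headers and the URL-encoded values that "plain" event mode emits):

```python
# Minimal parser for the ESL wire format: a frame is a block of
# "Header: value" lines ended by a blank line, optionally followed by
# Content-Length bytes of body. Sketch only, for illustration.
from urllib.parse import unquote

def parse_esl_frame(data: bytes):
    """Split one ESL frame into (headers, body, leftover_bytes).

    Returns None if the frame is not yet complete (need more bytes).
    """
    head_end = data.find(b"\n\n")
    if head_end == -1:
        return None  # header block not fully received yet
    headers = {}
    for line in data[:head_end].decode("utf-8").splitlines():
        key, _, value = line.partition(":")
        if key:
            headers[key.strip()] = unquote(value.strip())
    body_start = head_end + 2
    body_len = int(headers.get("Content-Length", 0))
    if len(data) - body_start < body_len:
        return None  # body not fully received yet
    body = data[body_start:body_start + body_len]
    return headers, body, data[body_start + body_len:]
```

Feed it the bytes off the socket and it hands back the auth challenge, command replies, and events one frame at a time.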

ESL has two modes. Inbound: your client connects to mod_event_socket on FreeSWITCH (default port 8021), subscribes to events, and issues api or bgapi commands. Outbound: you put a socket action in the dialplan and FreeSWITCH dials your TCP listener, handing you full control of that one channel. AI voice in 2026 leans hard on outbound because each call gets its own goroutine or async task, isolated from the FreeSWITCH media loop.

Hear it before you finish reading

Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.

Try Live Demo →

Architecture

graph LR
    A[PSTN / SIP Trunk] --> B[FreeSWITCH 1.10.x]
    B -->|dialplan socket action| C[ESL Outbound Listener]
    C --> D[Python AI Brain]
    D -->|playback / hangup| C
    C -->|commands| B
    B -->|mod_audio_stream WebSocket| E[OpenAI Realtime]

The ESL outbound listener handles signaling-level events (DTMF, CHANNEL_ANSWER, CHANNEL_HANGUP, custom variables) and issues commands like uuid_broadcast, playback, hangup, transfer. The actual audio bypasses ESL entirely and goes through mod_audio_stream or mod_audio_fork to a WebSocket bridge that talks to OpenAI Realtime, Deepgram, or your own STS endpoint.

A minimal dialplan extension that hands matching calls to the outbound listener:

<extension name="ai-handler">
  <condition field="destination_number" expression="^(\+1\d{10})$">
    <action application="set" data="hangup_after_bridge=true"/>
    <action application="socket" data="127.0.0.1:9000 async full"/>
  </condition>
</extension>
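Sending those commands back the other way is equally simple: every ESL command is newline-framed text terminated by a blank line. A hedged sketch of helpers for two of the commands named above (the UUIDs and file paths are placeholders):

```python
# ESL commands are plain text ended by a blank line; bgapi runs them
# asynchronously so the socket is not blocked while audio plays.
def esl_command(cmd: str) -> bytes:
    """Frame a single ESL command (terminated by a blank line)."""
    return f"{cmd}\n\n".encode("utf-8")

def uuid_broadcast(uuid: str, path: str, leg: str = "aleg") -> bytes:
    """Play a file to one leg of an active channel."""
    return esl_command(f"bgapi uuid_broadcast {uuid} {path} {leg}")

def uuid_kill(uuid: str) -> bytes:
    """Hang up a channel by UUID."""
    return esl_command(f"bgapi uuid_kill {uuid}")
```

Your listener just writes these bytes to the socket, e.g. `sock.sendall(uuid_broadcast(uuid, "/sounds/greet.wav"))`, and reads the command/reply frame that comes back.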

CallSphere implementation

CallSphere does not run FreeSWITCH in production. Every inbound and outbound call across our six verticals (Healthcare AI, Real Estate AI, Sales Calling AI, Salon AI, IT Helpdesk AI, After-Hours AI) terminates on Twilio Programmable Voice. Healthcare AI runs on a FastAPI service at port :8084 that bridges Twilio Media Streams to OpenAI Realtime over WebSocket; Sales Calling AI fires up to 5 concurrent outbound calls per tenant; After-Hours AI uses a Twilio simul call+SMS pattern with a 120-second timeout. Across 37 agents, 90+ tools, 115+ DB tables, and HIPAA + SOC 2 compliance, our $149/$499/$1499 plans with a 14-day trial and 22% affiliate program are powered by Twilio, not FreeSWITCH. We track FreeSWITCH ESL because some self-hosted prospects ask for it, and our engineering team keeps a reference build to validate carrier fallback patterns.

Build steps

  1. Install FreeSWITCH 1.10.12 with mod_event_socket and mod_audio_stream enabled.
  2. In the dialplan, route AI calls through a socket action pointing at your listener's IP and port.
  3. Spin up an ESL outbound listener (Python: greenswitch, Node: drachtio-esl, Go: eslgo) that accepts TCP and reads the channel UUID from the event headers.
  4. Subscribe to CHANNEL_ANSWER, DTMF, CHANNEL_EXECUTE_COMPLETE, CHANNEL_HANGUP_COMPLETE for your UUID.
  5. Issue uuid_audio_stream <uuid> start wss://bridge/realtime mono 16k to push audio to your AI service.
  6. Receive AI-generated audio back over a separate channel (uuid_play_say, uuid_broadcast file://, or write directly to a named pipe with mod_audio_stream).
  7. Watch for CHANNEL_HANGUP and clean up your stream resources; FreeSWITCH will not do it for you.
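The steps above can be sketched as an asyncio outbound listener. This is a simplified illustration, not production code: BRIDGE_URL and port 9000 are placeholders, and real "plain" events carry their headers in a Content-Length body that this parser skips.

```python
# Sketch of steps 3-7: FreeSWITCH dials in via the "socket" dialplan
# action; each call gets its own handler task.
import asyncio

BRIDGE_URL = "wss://bridge/realtime"  # placeholder AI bridge endpoint

def frame(cmd: str) -> bytes:
    """ESL commands end with a blank line."""
    return f"{cmd}\n\n".encode()

async def read_frame(reader: asyncio.StreamReader) -> dict:
    """Read one header block (up to the blank line) into a dict.

    Simplified: ignores Content-Length bodies that follow some frames.
    """
    raw = await reader.readuntil(b"\n\n")
    headers = {}
    for line in raw.decode().splitlines():
        key, _, val = line.partition(":")
        if key:
            headers[key.strip()] = val.strip()
    return headers

async def handle_call(reader, writer):
    # Step 3: ask FreeSWITCH for the channel data and grab the UUID.
    writer.write(frame("connect"))
    chan = await read_frame(reader)
    uuid = chan.get("Unique-ID", "")
    # Step 4: subscribe to events for this channel only.
    writer.write(frame("myevents"))
    writer.write(frame("events plain CHANNEL_ANSWER DTMF "
                       "CHANNEL_EXECUTE_COMPLETE CHANNEL_HANGUP_COMPLETE"))
    # Step 5: push caller audio to the AI bridge.
    writer.write(frame(f"bgapi uuid_audio_stream {uuid} start {BRIDGE_URL} mono 16k"))
    await writer.drain()
    try:
        while True:
            event = await read_frame(reader)
            if event.get("Event-Name") == "CHANNEL_HANGUP_COMPLETE":
                break  # Step 7: clean up; FreeSWITCH won't do it for us
    finally:
        writer.close()

async def main():
    server = await asyncio.start_server(handle_call, "127.0.0.1", 9000)
    async with server:
        await server.serve_forever()

# asyncio.run(main())  # start the listener on 127.0.0.1:9000
```

Connection-per-call means a slow inference pass stalls only its own task; the other thousand calls keep reading events.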

Pitfalls

  • ESL inbound on a single TCP connection becomes a bottleneck above ~200 concurrent calls; switch to outbound.
  • Forgetting async full in the socket action means FreeSWITCH blocks the dialplan thread waiting for your TCP response.
  • mod_audio_stream defaults to L16 but some forks emit L16-PCM with different endianness; verify byte order on the wire.
  • ClueCon as default password on a public IP is a free RCE; bind to 127.0.0.1 or use mTLS.
  • ESL events fire in their own thread; do not mutate FreeSWITCH state from the event callback unless you are sure the API is thread-safe.
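For the endianness pitfall, byte-swapping L16 on the wire is cheap. A minimal sketch using Python's stdlib:

```python
# L16 is 16-bit PCM; if a mod_audio_stream fork emits the opposite
# byte order, every sample must be swapped before it reaches the
# STT/TTS bridge, or the audio plays as loud static.
import array

def swap_l16_endianness(pcm: bytes) -> bytes:
    """Swap the byte order of 16-bit PCM samples."""
    samples = array.array("h")  # signed 16-bit samples
    samples.frombytes(pcm)
    samples.byteswap()
    return samples.tobytes()
```

Run a known sine tone through the pipe once; if it comes out as noise, insert this swap and test again.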

FAQ

Is ESL deprecated in favor of mod_xml_curl or mod_lua? No. ESL is the canonical external-control interface. Lua and XML curl handle dialplan-time decisions; ESL handles call-time decisions and runtime events.

Does ESL outbound scale to 10k calls? Yes if your listener is async and connection-per-call. Each connection idles between events. Expect 50-100 MB RAM per 1k connections in Python asyncio.

Still reading? Stop comparing — try CallSphere live.

CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.

Can I send audio over ESL? You can over the channel via uuid_record_buffer or playback, but for real-time AI use mod_audio_stream or mod_audio_fork to a WebSocket. ESL is for signaling.

FreeSWITCH or Asterisk for AI in 2026? FreeSWITCH wins for raw concurrency and codec flexibility; Asterisk wins for AGI and ARI maturity. Both are valid; carrier-side requirements often dictate the choice.

Why does CallSphere not run FreeSWITCH? Twilio gives us PCI- and HIPAA-aligned compliance attestations, global PSTN reach, and number provisioning APIs out of the box. The cost premium pays for itself in audit and uptime.


Start a 14-day trial to see Twilio-based voice agents in production, browse pricing for $149/$499/$1499 tiers, or book a demo to compare ESL versus our managed stack.


