AI Infrastructure

FreeSWITCH Event Socket Library (ESL) for AI Voice Control in 2026

ESL is still the cleanest way to drive FreeSWITCH from an AI brain in 2026: outbound mode keeps media threads off the GPU, inbound mode gives you a firehose of channel events. Here is what changed and how to wire it.

Every production FreeSWITCH-plus-AI deployment in 2026 hits the same fork: do you let your Python brain steer the dialplan over ESL inbound, or do you flip to ESL outbound and let FreeSWITCH spawn a TCP connection per call? Outbound wins for high-concurrency AI because media threads never block on inference, but it costs you a connection-per-call accounting model that ESL inbound does not have.

Background

The Event Socket Library is the C client library that ships with FreeSWITCH for talking to its event system over a TCP socket. It dates back to FreeSWITCH 1.0 in 2008, and the same wire protocol is spoken by node-esl, eslgo, switch-esl in Rust, the Python ESL bindings, and the Ruby ESL gem. The protocol is line-oriented, plain text, and trivial to parse. Authentication is a simple ClueCon password by default; production rigs bind the socket to localhost or put it behind WireGuard.
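Because the protocol is plain text, a frame parser fits in a few lines. Here is a minimal sketch (not the official Python ESL bindings; it assumes UTF-8 headers and the URL-encoded values that "plain" event mode emits):

```python
# Minimal parser for the ESL wire format: a frame is a block of
# "Header: value" lines ended by a blank line, optionally followed by
# Content-Length bytes of body. Sketch only, for illustration.
from urllib.parse import unquote

def parse_esl_frame(data: bytes):
    """Split one ESL frame into (headers, body, leftover_bytes).

    Returns None if the frame is not yet complete (need more bytes).
    """
    head_end = data.find(b"\n\n")
    if head_end == -1:
        return None  # header block not fully received yet
    headers = {}
    for line in data[:head_end].decode("utf-8").splitlines():
        key, _, value = line.partition(":")
        if key:
            headers[key.strip()] = unquote(value.strip())
    body_start = head_end + 2
    body_len = int(headers.get("Content-Length", 0))
    if len(data) - body_start < body_len:
        return None  # body not fully received yet
    body = data[body_start:body_start + body_len]
    return headers, body, data[body_start + body_len:]
```

Feed it the bytes off the socket and it hands back the auth challenge, command replies, and events one frame at a time.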

ESL has two modes. Inbound: your client connects to mod_event_socket on FreeSWITCH (default port 8021), subscribes to events, and issues api or bgapi commands. Outbound: you put a socket action in the dialplan and FreeSWITCH dials your TCP listener, handing you full control of that one channel. AI voice in 2026 leans hard on outbound because each call gets its own goroutine or async task, isolated from the FreeSWITCH media loop.

Hear it before you finish reading

Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.

Try Live Demo →

Architecture

graph LR
    A[PSTN / SIP Trunk] --> B[FreeSWITCH 1.10.x]
    B -->|dialplan socket action| C[ESL Outbound Listener]
    C --> D[Python AI Brain]
    D -->|playback / hangup| C
    C -->|commands| B
    B -->|mod_audio_stream WebSocket| E[OpenAI Realtime]

The ESL outbound listener handles signaling-level events (DTMF, CHANNEL_ANSWER, CHANNEL_HANGUP, custom variables) and issues commands like uuid_broadcast, playback, hangup, transfer. The actual audio bypasses ESL entirely and goes through mod_audio_stream or mod_audio_fork to a WebSocket bridge that talks to OpenAI Realtime, Deepgram, or your own STS endpoint.

A minimal dialplan extension that hands matching calls to the outbound listener:

<extension name="ai-handler">
  <condition field="destination_number" expression="^(\+1\d{10})$">
    <action application="set" data="hangup_after_bridge=true"/>
    <action application="socket" data="127.0.0.1:9000 async full"/>
  </condition>
</extension>
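Sending those commands back the other way is equally simple: every ESL command is newline-framed text terminated by a blank line. A hedged sketch of helpers for two of the commands named above (the UUIDs and file paths are placeholders):

```python
# ESL commands are plain text ended by a blank line; bgapi runs them
# asynchronously so the socket is not blocked while audio plays.
def esl_command(cmd: str) -> bytes:
    """Frame a single ESL command (terminated by a blank line)."""
    return f"{cmd}\n\n".encode("utf-8")

def uuid_broadcast(uuid: str, path: str, leg: str = "aleg") -> bytes:
    """Play a file to one leg of an active channel."""
    return esl_command(f"bgapi uuid_broadcast {uuid} {path} {leg}")

def uuid_kill(uuid: str) -> bytes:
    """Hang up a channel by UUID."""
    return esl_command(f"bgapi uuid_kill {uuid}")
```

Your listener just writes these bytes to the socket, e.g. `sock.sendall(uuid_broadcast(uuid, "/sounds/greet.wav"))`, and reads the command/reply frame that comes back.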

CallSphere implementation

CallSphere does not run FreeSWITCH in production. Every inbound and outbound call across our six verticals (Healthcare AI, Real Estate AI, Sales Calling AI, Salon AI, IT Helpdesk AI, After-Hours AI) terminates on Twilio Programmable Voice. Healthcare AI runs on a FastAPI service at port :8084 that bridges Twilio Media Streams to OpenAI Realtime over WebSocket; Sales Calling AI fires up to 5 concurrent outbound calls per tenant; After-Hours AI uses a Twilio simul call+SMS pattern with a 120-second timeout. Across 37 agents, 90+ tools, 115+ DB tables, and HIPAA + SOC 2 compliance, our $149/$499/$1499 plans with a 14-day trial and 22% affiliate program are powered by Twilio, not FreeSWITCH. We track FreeSWITCH ESL because some self-hosted prospects ask for it, and our engineering team keeps a reference build to validate carrier fallback patterns.

Build steps

  1. Install FreeSWITCH 1.10.12 with mod_event_socket and mod_audio_stream enabled.
  2. In the dialplan, route AI calls through a socket action pointing at your listener's IP and port.
  3. Spin up an ESL outbound listener (Python: greenswitch, Node: drachtio-esl, Go: eslgo) that accepts TCP and reads the channel UUID from the event headers.
  4. Subscribe to CHANNEL_ANSWER, DTMF, CHANNEL_EXECUTE_COMPLETE, CHANNEL_HANGUP_COMPLETE for your UUID.
  5. Issue uuid_audio_stream <uuid> start wss://bridge/realtime mono 16k to push audio to your AI service.
  6. Receive AI-generated audio back over a separate channel (uuid_play_say, uuid_broadcast file://, or write directly to a named pipe with mod_audio_stream).
  7. Watch for CHANNEL_HANGUP and clean up your stream resources; FreeSWITCH will not do it for you.
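The steps above can be sketched as an asyncio outbound listener. This is a simplified illustration, not production code: BRIDGE_URL and port 9000 are placeholders, and real "plain" events carry their headers in a Content-Length body that this parser skips.

```python
# Sketch of steps 3-7: FreeSWITCH dials in via the "socket" dialplan
# action; each call gets its own handler task.
import asyncio

BRIDGE_URL = "wss://bridge/realtime"  # placeholder AI bridge endpoint

def frame(cmd: str) -> bytes:
    """ESL commands end with a blank line."""
    return f"{cmd}\n\n".encode()

async def read_frame(reader: asyncio.StreamReader) -> dict:
    """Read one header block (up to the blank line) into a dict.

    Simplified: ignores Content-Length bodies that follow some frames.
    """
    raw = await reader.readuntil(b"\n\n")
    headers = {}
    for line in raw.decode().splitlines():
        key, _, val = line.partition(":")
        if key:
            headers[key.strip()] = val.strip()
    return headers

async def handle_call(reader, writer):
    # Step 3: ask FreeSWITCH for the channel data and grab the UUID.
    writer.write(frame("connect"))
    chan = await read_frame(reader)
    uuid = chan.get("Unique-ID", "")
    # Step 4: subscribe to events for this channel only.
    writer.write(frame("myevents"))
    writer.write(frame("events plain CHANNEL_ANSWER DTMF "
                       "CHANNEL_EXECUTE_COMPLETE CHANNEL_HANGUP_COMPLETE"))
    # Step 5: push caller audio to the AI bridge.
    writer.write(frame(f"bgapi uuid_audio_stream {uuid} start {BRIDGE_URL} mono 16k"))
    await writer.drain()
    try:
        while True:
            event = await read_frame(reader)
            if event.get("Event-Name") == "CHANNEL_HANGUP_COMPLETE":
                break  # Step 7: clean up; FreeSWITCH won't do it for us
    finally:
        writer.close()

async def main():
    server = await asyncio.start_server(handle_call, "127.0.0.1", 9000)
    async with server:
        await server.serve_forever()

# asyncio.run(main())  # start the listener on 127.0.0.1:9000
```

Connection-per-call means a slow inference pass stalls only its own task; the other thousand calls keep reading events.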

Pitfalls

  • ESL inbound on a single TCP connection becomes a bottleneck above ~200 concurrent calls; switch to outbound.
  • Forgetting async full in the socket action means FreeSWITCH blocks the dialplan thread waiting for your TCP response.
  • mod_audio_stream defaults to L16 but some forks emit L16-PCM with different endianness; verify byte order on the wire.
  • ClueCon as default password on a public IP is a free RCE; bind to 127.0.0.1 or use mTLS.
  • ESL events fire in their own thread; do not mutate FreeSWITCH state from the event callback unless you are sure the API is thread-safe.
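For the endianness pitfall, byte-swapping L16 on the wire is cheap. A minimal sketch using Python's stdlib:

```python
# L16 is 16-bit PCM; if a mod_audio_stream fork emits the opposite
# byte order, every sample must be swapped before it reaches the
# STT/TTS bridge, or the audio plays as loud static.
import array

def swap_l16_endianness(pcm: bytes) -> bytes:
    """Swap the byte order of 16-bit PCM samples."""
    samples = array.array("h")  # signed 16-bit samples
    samples.frombytes(pcm)
    samples.byteswap()
    return samples.tobytes()
```

Run a known sine tone through the pipe once; if it comes out as noise, insert this swap and test again.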

FAQ

Is ESL deprecated in favor of mod_xml_curl or mod_lua? No. ESL is the canonical external-control interface. Lua and XML curl handle dialplan-time decisions; ESL handles call-time decisions and runtime events.

Does ESL outbound scale to 10k calls? Yes if your listener is async and connection-per-call. Each connection idles between events. Expect 50-100 MB RAM per 1k connections in Python asyncio.

Still reading? Stop comparing — try CallSphere live.

CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.

Can I send audio over ESL? You can over the channel via uuid_record_buffer or playback, but for real-time AI use mod_audio_stream or mod_audio_fork to a WebSocket. ESL is for signaling.

FreeSWITCH or Asterisk for AI in 2026? FreeSWITCH wins for raw concurrency and codec flexibility; Asterisk wins for AGI and ARI maturity. Both are valid; carrier-side requirements often dictate the choice.

Why does CallSphere not run FreeSWITCH? Twilio gives us PCI- and HIPAA-aligned compliance attestations, global PSTN reach, and number provisioning APIs out of the box. The cost premium pays for itself in audit and uptime.


Start a 14-day trial to see Twilio-based voice agents in production, browse pricing for $149/$499/$1499 tiers, or book a demo to compare ESL versus our managed stack.


