By Sagar Shankaran, Founder of CallSphere
Asterisk's Stasis application + ARI gives you fine-grained channel control from any HTTP-speaking service. For AI voice agents in 2026 that means a pure-Python control plane sitting on top of a battle-tested PBX.
Key takeaways
Asterisk has been the on-prem PBX of last resort for two decades. In 2026 it is also one of the cleanest paths to building an AI voice agent on infrastructure you control end-to-end. The Stasis() dialplan app + ARI (Asterisk REST Interface) is the bridge.
flowchart LR
Phone["PSTN caller"] --> Carrier["Carrier"]
Carrier -- "SIP INVITE" --> SBC["Session Border Controller"]
SBC -- "SIP" --> PBX["Twilio / Asterisk"]
PBX -- "RTP · Opus" --> Bridge["AI Voice Gateway"]
Bridge --> AI["OpenAI Realtime"]
AI --> Bridge
Bridge --> PBX
Asterisk's traditional dialplan language (extensions.conf) is fine for IVR but awkward for the kind of dynamic, LLM-driven flow control AI voice needs. ARI (introduced in Asterisk 12, hardened across 13-22) is a REST + WebSocket interface to channel control. The Stasis() dialplan application parks a channel under the control of an ARI-connected external app and fires a StasisStart event over the WebSocket. From then on, every channel state change (dialing, answered, DTMF received, talking, silence, hangup) is a JSON event your app receives.
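To make the event model concrete, here is a minimal sketch of routing ARI WebSocket messages by their `type` field, using only the standard library. `EventRouter` is a hypothetical name, and the sample payload is a hand-written, trimmed approximation of a StasisStart event rather than a capture from a live system.

```python
import json
from collections import defaultdict
from typing import Callable, Dict, List


class EventRouter:
    """Routes decoded ARI events to handlers keyed by the event's "type" field."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def on(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._handlers[event_type].append(handler)

    def dispatch(self, raw_message: str) -> int:
        """Decode one WebSocket text frame and call the matching handlers.

        Returns the number of handlers that ran.
        """
        event = json.loads(raw_message)
        handlers = self._handlers.get(event.get("type", ""), [])
        for handler in handlers:
            handler(event)
        return len(handlers)


# Hand-written sample, trimmed to the fields used here (illustrative only).
SAMPLE_STASIS_START = json.dumps({
    "type": "StasisStart",
    "application": "ai_agent",
    "args": ["inbound", "+19175551212"],
    "channel": {"id": "1700000000.1", "state": "Up"},
})

seen = []
router = EventRouter()
router.on("StasisStart", lambda ev: seen.append((ev["channel"]["id"], ev["args"])))
handled = router.dispatch(SAMPLE_STASIS_START)                    # one handler runs
ignored = router.dispatch(json.dumps({"type": "ChannelVarset"}))  # no handler registered
```

Keeping dispatch separate from the WebSocket client makes the call-flow logic testable without a running Asterisk instance.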
For AI voice in 2026 the model is: caller hits Asterisk, dialplan does Stasis(ai_agent), your Python app on ARI receives StasisStart, your app uses AudioSocket or external media to bridge raw PCM to your model server, the model talks back, your app sends TTS back through Asterisk, and on hangup or transfer the dialplan resumes. Asterisk handles the SIP, the codecs, the recording, the call-detail records; your app handles the AI brain.
; extensions.conf
[from-trunk]
exten => _X.,1,NoOp(Inbound to AI agent)
 same => n,Answer()
 same => n,Stasis(ai_agent,inbound,${EXTEN})
 same => n,Hangup()
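# Python ARI client (using aioari)
import asyncio
import uuid
import aioari

client = None  # set in main() so the event handlers below can reach it

async def main():
    global client
    client = await aioari.connect("http://asterisk:8088", "ari_user", "ari_pass")
    client.on_channel_event("StasisStart", on_start)
    client.on_channel_event("ChannelDtmfReceived", on_dtmf)
    client.on_channel_event("StasisEnd", on_end)
    await client.run(apps="ai_agent")

async def on_start(channel, event):
    args = event["args"]  # ['inbound', '+19175551212']
    if not args:
        return  # the external media channel enters Stasis too (no args); skip it
    # Bridge the caller to an AudioSocket channel for raw PCM
    bridge = await client.bridges.create(type="mixing")
    audiosocket = await client.channels.externalMedia(
        app="ai_agent",
        external_host="audiosocket-bridge:8090",
        encapsulation="audiosocket",  # externalMedia defaults to RTP otherwise
        transport="tcp",
        data=str(uuid.uuid4()),  # UUID that AudioSocket sends to identify the call
        format="slin16",  # confirm your Asterisk version's AudioSocket supports this rate
    )
    await bridge.addChannel(channel=channel.id)
    await bridge.addChannel(channel=audiosocket.id)

async def on_dtmf(channel, event):
    pass  # placeholder: react to event["digit"]

async def on_end(channel, event):
    pass  # placeholder: tear down the bridge and the model session

asyncio.run(main())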
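# AudioSocket server feeding OpenAI Realtime
# (openai_send_audio is assumed to be defined elsewhere; needs `import asyncio`)
async def handle_audiosocket(reader, writer):
    try:
        while True:
            # Each frame: 1-byte kind, 2-byte big-endian length, then the payload.
            # readexactly() waits for the full count; read() may return fewer bytes.
            header = await reader.readexactly(3)
            kind, length = header[0], int.from_bytes(header[1:], "big")
            payload = await reader.readexactly(length)
            if kind == 0x10:  # audio frame
                # 16-bit PCM; confirm the sample rate your Asterisk version sends
                # for 0x10 (8 kHz mono in the base AudioSocket spec)
                await openai_send_audio(payload)
            elif kind == 0x00:  # hangup
                return
    except asyncio.IncompleteReadError:
        return  # peer closed the socket mid-frame
    finally:
        writer.close()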
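The AudioSocket protocol (available in Asterisk 18 and stable through 22) is a tiny TCP framing for raw PCM: each message is a 1-byte kind, a 2-byte big-endian length, and a payload. It carries media on its own socket, so audio never has to squeeze through the ARI event WebSocket.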
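The framing is small enough to sketch in full. The helpers below pack and unpack frames and split outbound TTS audio into 20 ms chunks before framing. The helper names are hypothetical, and the 8 kHz default sample rate is an assumption; pass whatever rate your channel actually negotiates.

```python
import struct
from typing import Iterator, List, Tuple

KIND_HANGUP = 0x00  # connection is being terminated
KIND_UUID = 0x01    # payload identifies the call
KIND_AUDIO = 0x10   # payload is signed-linear PCM audio
HEADER = struct.Struct(">BH")  # kind (1 byte) + big-endian payload length (2 bytes)


def pack_frame(kind: int, payload: bytes = b"") -> bytes:
    """Build one frame: header followed by the payload."""
    if len(payload) > 0xFFFF:
        raise ValueError("payload too large for a 16-bit length field")
    return HEADER.pack(kind, len(payload)) + payload


def unpack_frames(buffer: bytes) -> Tuple[List[Tuple[int, bytes]], bytes]:
    """Return (complete frames, leftover bytes that do not yet form a frame)."""
    frames = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        kind, length = HEADER.unpack_from(buffer, offset)
        end = offset + HEADER.size + length
        if end > len(buffer):
            break  # payload not fully received yet
        frames.append((kind, buffer[offset + HEADER.size:end]))
        offset = end
    return frames, buffer[offset:]


def chunk_pcm(pcm: bytes, sample_rate: int = 8000, frame_ms: int = 20) -> Iterator[bytes]:
    """Split 16-bit mono PCM into frame_ms-sized chunks (last one may be short)."""
    step = sample_rate * frame_ms // 1000 * 2  # 2 bytes per sample
    for start in range(0, len(pcm), step):
        yield pcm[start:start + step]


tts_pcm = bytes(800)  # 50 ms of silence at 8 kHz, 16-bit mono
wire = b"".join(pack_frame(KIND_AUDIO, chunk) for chunk in chunk_pcm(tts_pcm))
frames, rest = unpack_frames(wire + b"\x10\x00")  # trailing partial header stays buffered
```

Returning the leftover bytes lets a caller that reads arbitrary-sized chunks from the socket prepend them to the next read instead of losing a partial frame.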
CallSphere does not run Asterisk in production. We use Twilio Programmable Voice across all six verticals because it offloads carrier relationships, STIR/SHAKEN, A2P 10DLC, and SBC duty. But our FastAPI service on :8084 is morally equivalent to the Stasis pattern: a Python control plane that receives a media stream (Twilio Media Streams in our case, AudioSocket if we ran Asterisk) and manages the OpenAI Realtime conversation. For customers who require on-prem (we have several Healthcare AI prospects on this path), an Asterisk + Stasis + AudioSocket deployment with a sidecar bridge to OpenAI Realtime is the documented alternative. Sales Calling AI's limit of 5 concurrent outbound calls per tenant and After-Hours AI's simultaneous call + SMS with a 120-second timeout map cleanly to ARI's originate API. Across 37 agents, 90+ tools, 115+ DB tables, HIPAA + SOC 2 alignment, $149/$499/$1499 pricing, and a 14-day trial, the cloud path remains the default and the Asterisk path is documented for self-hosted deployments.
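As a rough illustration of how those two limits could be enforced in a Python control plane, the sketch below caps concurrent outbound calls per tenant with a semaphore and bounds each attempt with a timeout. `place_call` and the injected `originate` coroutine are hypothetical; in a real deployment `originate` would wrap your ARI originate request. This is not CallSphere's implementation.

```python
import asyncio
from collections import defaultdict

MAX_CONCURRENT_PER_TENANT = 5  # figure quoted in the text
CALL_TIMEOUT_SECONDS = 120     # figure quoted in the text

# One semaphore per tenant, created lazily on first use inside the event loop.
_tenant_slots = defaultdict(lambda: asyncio.Semaphore(MAX_CONCURRENT_PER_TENANT))


async def place_call(tenant, number, originate, timeout=CALL_TIMEOUT_SECONDS):
    """Run originate(number) under the tenant's concurrency cap and a timeout.

    `originate` is a hypothetical coroutine function, injected so this
    sketch runs without Asterisk.
    """
    async with _tenant_slots[tenant]:
        try:
            return await asyncio.wait_for(originate(number), timeout)
        except asyncio.TimeoutError:
            return "timeout"


async def _demo():
    active = peak = 0

    async def fake_originate(number):
        # Stand-in for a real call; tracks how many run at once.
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)
        active -= 1
        return f"answered:{number}"

    results = await asyncio.gather(
        *(place_call("tenant-a", f"+1555000{i:04d}", fake_originate) for i in range(12))
    )
    return peak, results


peak, results = asyncio.run(_demo())  # 12 requests, never more than 5 in flight
```

Queued calls simply wait for a free slot here; a production system would more likely reject or reschedule them once a tenant is at its cap.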
ari.conf) and create an ARI user with channel control privileges.
Asterisk vs FreeSWITCH for AI voice control? Asterisk has cleaner ARI semantics and a better community for IVR-style flows. FreeSWITCH has mod_audio_fork and finer event-socket control. Pick by team familiarity.
Can Stasis handle thousands of concurrent calls? Yes, with care. A single Asterisk node can handle 1000+ concurrent calls, but the ARI WebSocket is a single bottleneck, so consider sharding if you need more.
Is AudioSocket production-grade? Yes, in Asterisk 18+. The protocol is simple and field-tested.
HIPAA on Asterisk? Possible. Use SIP over TLS and SRTP, encrypt recordings at rest, sign a BAA with your hosting provider, and keep audit logs.
What about Asterisk 22 changes? Asterisk 22 (LTS released 2024) adds improved ARI events and stronger PJSIP defaults; AudioSocket continues to evolve.
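On the sharding point raised above: one simple approach is to spread calls across several ARI consumers (separate Stasis app names or separate Asterisk nodes) using a stable hash of a call key. The sketch below assumes that approach; the shard names are hypothetical, and routing inbound calls to the chosen shard would still have to happen in your dialplan or at the SBC.

```python
import hashlib
from typing import Sequence


def pick_shard(call_key: str, shards: Sequence[str]) -> str:
    """Map a call key (for example the dialed number) to one shard.

    Uses SHA-256 rather than hash() so the mapping is stable across
    processes and restarts.
    """
    if not shards:
        raise ValueError("at least one shard is required")
    digest = hashlib.sha256(call_key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


# Hypothetical Stasis app names, each served by its own ARI WebSocket consumer.
SHARDS = ["ai_agent_0", "ai_agent_1", "ai_agent_2", "ai_agent_3"]

assignments = {n: pick_shard(n, SHARDS) for n in ("+19175551212", "+14155550100")}
```

Plain modulo hashing reshuffles most keys when the shard count changes. That is usually acceptable for short-lived calls, but worth knowing before resizing under load.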
Start a 14-day trial on the cloud stack, see pricing, or contact us about Asterisk-based on-prem AI voice deployments.
Written by
Sagar Shankaran· Founder, CallSphere
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.