AI Engineering

Asterisk Stasis App for AI Voice Control in 2026: ARI as the Glue

Asterisk's Stasis application + ARI gives you fine-grained channel control from any HTTP-speaking service. For AI voice agents in 2026 that means a pure-Python control plane sitting on top of a battle-tested PBX.

Asterisk has been the on-prem PBX of last resort for two decades. In 2026 it is also one of the cleanest paths to building an AI voice agent on infrastructure you control end-to-end. The Stasis() dialplan app + ARI (Asterisk REST Interface) is the bridge.

Background

flowchart LR
  Phone["PSTN caller"] --> Carrier["Carrier"]
  Carrier -- "SIP INVITE" --> SBC["Session Border Controller"]
  SBC -- "SIP" --> PBX["Twilio / Asterisk"]
  PBX -- "RTP · Opus" --> Bridge["AI Voice Gateway"]
  Bridge --> AI["OpenAI Realtime"]
  AI --> Bridge
  Bridge --> PBX
CallSphere reference architecture

Asterisk's traditional dialplan language (extensions.conf) is fine for IVR but awkward for the dynamic, LLM-driven flow control AI voice needs. ARI (introduced in Asterisk 12, hardened across 13-22) is a REST + WebSocket interface for channel control. The Stasis() dialplan application parks a channel under the control of an ARI-connected external app, fires a StasisStart event over the WebSocket, and from then on every channel state change (dialing, answered, DTMF received, talking, silence, hangup) arrives as a JSON event in your app.
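Concretely, what arrives at StasisStart is plain JSON. A hedged sketch of the payload and how an app would pull out what it needs (field values here are illustrative, and exact fields vary slightly across Asterisk versions):

```python
import json

# A StasisStart event roughly as delivered over the ARI WebSocket
# (abridged; values are illustrative)
raw = json.dumps({
    "type": "StasisStart",
    "application": "ai_agent",
    "args": ["inbound", "+19175551212"],
    "channel": {
        "id": "1700000000.42",
        "name": "PJSIP/trunk-00000001",
        "state": "Up",
        "caller": {"name": "", "number": "+19175551212"},
    },
})

event = json.loads(raw)
if event["type"] == "StasisStart":
    direction, dialed = event["args"]       # the arguments passed to Stasis()
    channel_id = event["channel"]["id"]     # handle for subsequent ARI calls
```

Everything else in the lifecycle (DTMF, talking detection, hangup) follows the same shape: a `type` field plus a `channel` object your app keys off.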

For AI voice in 2026 the model is: caller hits Asterisk, dialplan does Stasis(ai_agent), your Python app on ARI receives StasisStart, your app uses AudioSocket or external media to bridge raw PCM to your model server, the model talks back, your app sends TTS back through Asterisk, and on hangup or transfer the dialplan resumes. Asterisk handles the SIP, the codecs, the recording, the call-detail records; your app handles the AI brain.

Hear it before you finish reading

Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.

Try Live Demo →

Technical deep-dive

; extensions.conf
[from-trunk]
exten => _X.,1,NoOp(Inbound to AI agent)
 same => n,Answer()
 same => n,Stasis(ai_agent,inbound,${EXTEN})
 same => n,Hangup()
# Python ARI client (using aioari)
import asyncio
import uuid

import aioari

client = None  # set in main(); shared by the event handlers below

async def main():
    global client
    client = await aioari.connect("http://asterisk:8088", "ari_user", "ari_pass")
    client.on_channel_event("StasisStart", on_start)
    client.on_channel_event("ChannelDtmfReceived", on_dtmf)
    client.on_channel_event("StasisEnd", on_end)
    await client.run(apps="ai_agent")

async def on_start(channel, event):
    args = event["args"]  # ['inbound', '+19175551212']
    # External media channel speaking AudioSocket (Asterisk 18+);
    # the default encapsulation is RTP, so request audiosocket/tcp explicitly
    audiosocket = await client.channels.externalMedia(
        app="ai_agent",
        external_host="audiosocket-bridge:8090",
        format="slin16",
        encapsulation="audiosocket",
        transport="tcp",
        data=str(uuid.uuid4()),  # AudioSocket call UUID
    )
    # Mix caller audio and model audio in one bridge
    bridge = await client.bridges.create(type="mixing")
    await bridge.addChannel(channel=channel.id)
    await bridge.addChannel(channel=audiosocket.id)

async def on_dtmf(channel, event):
    digit = event["digit"]  # feed into the LLM as user input

async def on_end(channel, event):
    pass  # channel left Stasis: tear down bridges and per-call state

if __name__ == "__main__":
    asyncio.run(main())
# AudioSocket server feeding OpenAI Realtime
async def handle_audiosocket(reader, writer):
    try:
        while True:
            header = await reader.readexactly(3)  # type byte + 2-byte length
            kind = header[0:1]
            length = int.from_bytes(header[1:3], "big")
            payload = await reader.readexactly(length) if length else b""
            if kind == b"\x10":    # audio frame: PCM 16-bit 16 kHz (slin16)
                await openai_send_audio(payload)
            elif kind == b"\x01":  # UUID frame, identifies the call
                call_uuid = payload
            elif kind == b"\x00":  # hangup
                return
    except asyncio.IncompleteReadError:
        return  # peer closed mid-frame

The AudioSocket protocol (introduced in Asterisk 18 and stable through 22) is a tiny TCP framing for raw PCM. The ARI WebSocket carries only signaling events, never media, so AudioSocket (or external RTP) is what actually moves audio between Asterisk and your model server.
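Writing TTS back to the caller is the mirror image of the read loop above. A minimal sketch of the frame encoding, assuming the standard three-byte header (one type byte, big-endian 16-bit length); the `AS_*` constant names are ours:

```python
import struct

AS_HANGUP = 0x00  # terminate the call
AS_UUID   = 0x01  # 16-byte call identifier, first frame after connect
AS_AUDIO  = 0x10  # signed linear PCM payload

def pack_frame(kind: int, payload: bytes = b"") -> bytes:
    """Encode one AudioSocket frame: type byte + 2-byte big-endian length + payload."""
    return struct.pack("!BH", kind, len(payload)) + payload

def unpack_frame(buf: bytes) -> tuple[int, bytes, bytes]:
    """Decode the first frame in buf; returns (kind, payload, remaining bytes)."""
    kind, length = struct.unpack("!BH", buf[:3])
    return kind, buf[3:3 + length], buf[3 + length:]

# Round-trip one 20 ms slin16 frame (16000 Hz * 0.02 s * 2 bytes = 640 bytes)
pcm = b"\x00\x00" * 320
frame = pack_frame(AS_AUDIO, pcm)
kind, payload, rest = unpack_frame(frame)
```

In the server, `writer.write(pack_frame(AS_AUDIO, tts_pcm))` per 20 ms chunk keeps Asterisk's jitter buffer happy.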

CallSphere implementation

CallSphere does not run Asterisk in production; we use Twilio Programmable Voice across all six verticals because it offloads carrier relationships, STIR/SHAKEN, A2P 10DLC, and SBC duty. But the Stasis pattern is what our FastAPI service on :8084 is morally equivalent to: a Python control plane that receives a media stream (Twilio Media Streams in our case, AudioSocket if we ran Asterisk) and manages the OpenAI Realtime conversation. For customers who require on-prem (we have several Healthcare AI prospects on this path), an Asterisk + Stasis + AudioSocket deployment with a sidecar bridge to OpenAI Realtime is the documented alternative. Sales Calling AI's 5 concurrent outbound calls per tenant and After-Hours AI's simultaneous call + SMS with a 120-second timeout map cleanly to ARI's Originate API. Across 37 agents, 90+ tools, 115+ DB tables, HIPAA + SOC 2 alignment, $149/$499/$1499 pricing, and a 14-day trial, the cloud path remains the default and the Asterisk path is documented for self-hosted deployments.

Implementation steps

  1. Install Asterisk 22 LTS or current; enable ARI (ari.conf) and create an ARI user with channel control privileges.
  2. Configure a SIP trunk (PJSIP preferred) and a context that hands incoming calls to Stasis(ai_agent).
  3. Stand up your ARI client in Python (aioari) or Go (ari-go); subscribe to channel events.
  4. Choose a media transport: AudioSocket (TCP, low-latency, simple) or external_media + RTP (more flexible, more code).
  5. Bridge the caller channel to the external_media channel via a mixing bridge.
  6. In your AudioSocket server, forward PCM to OpenAI Realtime; receive TTS back and write it as audio frames.
  7. Handle DTMF as ChannelDtmfReceived events; feed digits into the LLM as user input.
  8. On transfer, originate a new channel via ARI and bridge it; on hangup, leave Stasis cleanly.
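Step 8's transfer leg maps to a single HTTP call against ARI's channel-origination endpoint (POST /ari/channels). A sketch that only builds the request; the host, trunk, and numbers are illustrative, while the parameter names (endpoint, app, appArgs, callerId, timeout) come from the stock ARI API:

```python
from urllib.parse import urlencode

ARI_BASE = "http://asterisk:8088/ari"  # illustrative host, same as the client above

def originate_query(endpoint: str, app: str, app_args: str, caller_id: str) -> str:
    """Query string for POST /ari/channels: originate a leg straight into Stasis."""
    return urlencode({
        "endpoint": endpoint,   # e.g. PJSIP/+15551234567@trunk
        "app": app,             # the new leg fires StasisStart in this app
        "appArgs": app_args,    # surfaces in event["args"]
        "callerId": caller_id,
        "timeout": 30,          # seconds to wait for answer
    })

qs = originate_query("PJSIP/+15551234567@trunk", "ai_agent", "transfer", "+19175551212")
url = f"{ARI_BASE}/channels?{qs}"
# In production: requests.post(url, auth=("ari_user", "ari_pass")), then add the
# answered channel to the caller's mixing bridge exactly as in on_start.
```

Because the originated leg lands back in the same Stasis app, the transfer is just another StasisStart event to your existing handler.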

FAQ

Asterisk vs FreeSWITCH for AI voice control? Asterisk has cleaner ARI semantics and better community for IVR-style flows. FreeSWITCH has mod_audio_fork and finer event-socket control. Pick by team familiarity.

Can Stasis handle thousands of concurrent calls? Yes, with care. A single Asterisk node handles 1000+ concurrent calls; the ARI WebSocket is a single serialized event stream per app, so shard across multiple Stasis apps (and ARI client processes) if you need more.
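One way to shard, extending the dialplan from the deep-dive: split traffic across several Stasis app names, each served by its own ARI client process. A sketch using the stock RAND() dialplan function; the four-way split and the ai_agent_N app names are illustrative:

```
; extensions.conf: spread calls across four Stasis apps / ARI clients
[from-trunk]
exten => _X.,1,NoOp(Inbound, sharded)
 same => n,Answer()
 same => n,Set(SHARD=${RAND(0,3)})
 same => n,Stasis(ai_agent_${SHARD},inbound,${EXTEN})
 same => n,Hangup()
```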


Is AudioSocket production-grade? Yes in Asterisk 18+. The protocol is simple and field-tested.

HIPAA on Asterisk? Possible. Run SIP/TLS, SRTP, encrypt recordings at rest, sign a BAA with your hosting provider, and audit logs.
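A starting point for the transport side of that checklist, assuming stock PJSIP options (section names and certificate paths are illustrative):

```
; pjsip.conf: TLS for signaling, SRTP for media (paths illustrative)
[transport-tls]
type=transport
protocol=tls
bind=0.0.0.0:5061
cert_file=/etc/asterisk/keys/asterisk.crt
priv_key_file=/etc/asterisk/keys/asterisk.key
method=tlsv1_2

[trunk-endpoint]
type=endpoint
transport=transport-tls
media_encryption=sdes   ; SRTP; use dtls for WebRTC legs
```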

What about Asterisk 22 changes? Asterisk 22 (LTS released 2024) adds improved ARI events and stronger PJSIP defaults; AudioSocket continues to evolve.


Start a 14-day trial on the cloud stack, see pricing, or contact us about Asterisk-based on-prem AI voice deployments.


