Build a Voice Agent with Asterisk + ARI + Open LLM (2026)
Asterisk + ARI + AudioSocket + an open LLM = a voice agent that drops into your existing PBX. No SIP-trunking provider lock-in — full Python orchestration.
TL;DR — If you already run Asterisk or FreePBX, you don't need Twilio. Use ARI for call control + AudioSocket for raw PCM audio over TCP, and bolt on faster-whisper + Llama via Ollama + Piper. Self-hosted and SIP-trunk-agnostic.
What you'll build
An Asterisk dialplan that, on incoming calls, hands the channel to a Python ARI app via AudioSocket. The app does STT, calls the LLM, synthesizes the reply, and pipes it back into the channel.
Prerequisites
- Asterisk 20 LTS (built with `chan_audiosocket`).
- A Linux box on the same network (or co-located).
- Python 3.11, `pip install ari panoramisk faster-whisper piper-tts ollama numpy`.
- Ollama with `llama3.1:8b` pulled.
- ARI configured in `/etc/asterisk/ari.conf` with a user (username = `aiuser`).
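Before wiring anything up, a quick sketch to confirm the Python dependencies resolve (module names below are the import names, which differ from some of the pip package names):

```python
import importlib.util

# Import names for the packages installed above (piper-tts ships a CLI, not checked here)
required = ["ari", "faster_whisper", "ollama", "numpy"]
missing = [m for m in required if importlib.util.find_spec(m) is None]
print("missing:", ", ".join(missing) if missing else "none")
```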
Architecture
```mermaid
flowchart LR
  PSTN[Caller] -->|SIP| AST[Asterisk 20]
  AST -->|ARI events| APP[Python ARI app]
  AST <-->|AudioSocket TCP| APP
  APP --> FW[faster-whisper]
  APP --> OLL[Ollama llama3.1:8b]
  APP --> PIP[Piper TTS]
```
Step 1 — Asterisk dialplan
```ini
; /etc/asterisk/extensions.conf
[default]
exten => 100,1,NoOp(Inbound to AI)
 same => n,Answer()
 same => n,Stasis(ai-voice)   ; hand to ARI app named ai-voice
 same => n,Hangup()
```
```ini
; /etc/asterisk/ari.conf
[general]
enabled = yes

[aiuser]
type = user
read_only = no
password = supersecret
```
```ini
; /etc/asterisk/http.conf
[general]
enabled = yes
bindaddr = 0.0.0.0
```
Reload: `asterisk -rx "core reload"`.
Step 2 — ARI app skeleton
```python
# app.py
import ari
import threading

client = ari.connect("http://127.0.0.1:8088", "aiuser", "supersecret")

def on_start(channel_obj, ev):
    chan = channel_obj["channel"]
    print("New call:", chan.id)
    # We'll execute AudioSocket from the dialplan instead of bridging here
    chan.continueInDialplan()
client.on_channel_event("StasisStart", on_start)
threading.Thread(target=client.run, args=(["ai-voice"],), daemon=True).start()
```
Step 3 — Push the channel into AudioSocket
Update the dialplan to hand the channel to AudioSocket once Stasis logs the call:
```ini
[default]
exten => 100,1,Answer()
 same => n,AudioSocket(${CHANNEL(uniqueid)},127.0.0.1:9090)
 same => n,Hangup()
```
AudioSocket streams 16-bit signed linear PCM (8 kHz mono by default) over a raw TCP socket. Whisper expects 16 kHz input, so resample before transcribing.
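Since Whisper models expect 16 kHz float audio, the 8 kHz frames need upsampling first. A minimal sketch using linear interpolation with NumPy (a dedicated resampler such as `scipy.signal.resample_poly` would do a cleaner job):

```python
import numpy as np

def slin8k_to_whisper(buf: bytes) -> np.ndarray:
    """Convert 8 kHz 16-bit mono PCM to 16 kHz float32 in [-1, 1]."""
    pcm = np.frombuffer(buf, dtype=np.int16).astype(np.float32) / 32768.0
    # Linear interpolation from 8 kHz to 16 kHz (2x upsample)
    src_idx = np.arange(len(pcm))
    dst_idx = np.arange(len(pcm) * 2) / 2.0
    return np.interp(dst_idx, src_idx, pcm).astype(np.float32)
```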
Step 4 — TCP server that drives the conversation
```python
# audiosocket_server.py
import socket
import struct
import subprocess

import numpy as np
import ollama
from faster_whisper import WhisperModel

stt = WhisperModel("small.en", device="cpu", compute_type="int8")

KIND_HANGUP, KIND_ID, KIND_SLIN, KIND_ERROR = 0x00, 0x01, 0x10, 0xFF

def recv_exact(s, n):
    """recv() may return partial data; loop until exactly n bytes arrive."""
    data = bytearray()
    while len(data) < n:
        part = s.recv(n - len(data))
        if not part:
            return None
        data.extend(part)
    return bytes(data)

def read_frame(s):
    h = recv_exact(s, 3)
    if h is None:
        return None, None
    kind = h[0]
    length = struct.unpack(">H", h[1:3])[0]   # big-endian payload length
    return kind, recv_exact(s, length) if length else b""

def send_slin(s, pcm_int16):
    # 160 samples = 320 bytes = one 20 ms frame at 8 kHz slin
    for i in range(0, len(pcm_int16), 160):
        chunk = pcm_int16[i:i+160].tobytes()
        s.sendall(struct.pack(">BH", KIND_SLIN, len(chunk)) + chunk)

def transcribe(buf):
    # faster-whisper expects 16 kHz input; upsample 8 kHz frames first
    pcm = np.frombuffer(buf, dtype=np.int16).astype(np.float32) / 32768
    segs, _ = stt.transcribe(pcm, language="en", vad_filter=True)
    return " ".join(s.text for s in segs).strip()

def llm(history, text):
    history.append({"role": "user", "content": text})
    r = ollama.chat(model="llama3.1:8b", messages=history,
                    options={"num_predict": 140})
    history.append(r["message"])
    return r["message"]["content"]
def piper(text):
    # Piper's default voices output 22.05 kHz PCM; resample to the
    # channel rate (8 kHz here) before sending it back to Asterisk.
    p = subprocess.run(
        ["piper", "--model", "en_US-amy-medium", "--output-raw"],
        input=text.encode(), capture_output=True)
    return np.frombuffer(p.stdout, dtype=np.int16)

srv = socket.socket()
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("0.0.0.0", 9090))
srv.listen()
print("AudioSocket listening :9090")

while True:
    conn, _ = srv.accept()
    history, buf = [], bytearray()
    while True:
        kind, payload = read_frame(conn)
        if kind is None or kind == KIND_HANGUP:
            break
        if kind == KIND_SLIN:
            buf.extend(payload)
            if len(buf) > 8000 * 2 * 2:   # ~2 s of 8 kHz 16-bit audio
                text = transcribe(bytes(buf))
                buf.clear()
                if text:
                    reply = llm(history, text)
                    pcm = piper(reply)
                    send_slin(conn, pcm)
```
Step 5 — VAD for natural turn-taking
Wrap the buffer logic with webrtcvad instead of fixed 2-second windows:
```python
import webrtcvad

vad = webrtcvad.Vad(2)   # aggressiveness 0-3, higher = more aggressive
# Feed 20 ms (320-byte) frames; trigger transcribe on trailing silence
```
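A sketch of the endpointing loop: `is_speech` below is a simple energy-threshold stand-in so the logic is self-contained, and `vad.is_speech(frame, 8000)` from webrtcvad can be dropped in its place; the frame size and silence budget are illustrative.

```python
import struct

FRAME_BYTES = 320          # 20 ms of 8 kHz 16-bit mono
SILENCE_FRAMES = 25        # ~500 ms of trailing silence ends the turn

def is_speech(frame: bytes, threshold: int = 500) -> bool:
    """Energy-based stand-in for webrtcvad.Vad.is_speech(frame, 8000)."""
    samples = struct.unpack(f"<{len(frame)//2}h", frame)
    return sum(abs(s) for s in samples) / len(samples) > threshold

def feed(state, frame):
    """Accumulate one frame; return a finished utterance or None."""
    if is_speech(frame):
        state["buf"].extend(frame)
        state["silence"] = 0
    elif state["buf"]:
        state["buf"].extend(frame)
        state["silence"] += 1
        if state["silence"] >= SILENCE_FRAMES:
            utterance = bytes(state["buf"])
            state["buf"].clear()
            state["silence"] = 0
            return utterance
    return None
```

Each finished utterance then goes to `transcribe()` in place of the fixed 2-second window.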
Step 6 — Run + test
```bash
ollama serve &
python audiosocket_server.py &
asterisk -rx "module load chan_audiosocket"
# dial extension 100 from a softphone
```
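If no softphone is handy, a hypothetical stand-in client can impersonate Asterisk for smoke-testing: it speaks the same three-byte big-endian framing (kind byte + length) the server expects, sending a UUID frame followed by silent audio. The host/port mirror the server above; `fake_call` is an illustrative name.

```python
import socket
import struct
import uuid

KIND_HANGUP, KIND_ID, KIND_SLIN = 0x00, 0x01, 0x10

def frame(kind: int, payload: bytes) -> bytes:
    """One AudioSocket frame: 1-byte kind + big-endian 2-byte length + payload."""
    return struct.pack(">BH", kind, len(payload)) + payload

def fake_call(host="127.0.0.1", port=9090, seconds=3):
    s = socket.create_connection((host, port))
    s.sendall(frame(KIND_ID, uuid.uuid4().bytes))   # identify the "call"
    silence = bytes(320)                            # 20 ms of 8 kHz silence
    for _ in range(seconds * 50):                   # 50 frames per second
        s.sendall(frame(KIND_SLIN, silence))
    s.sendall(frame(KIND_HANGUP, b""))
    s.close()
```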
Common pitfalls
- AudioSocket module not loaded. `asterisk -rx "module show like audiosocket"` should list it.
- Sample rate. AudioSocket delivers 8 kHz slin by default, while faster-whisper expects 16 kHz input; resample in the app (or negotiate slin16 where your Asterisk build supports it).
- Endianness. The frame length is big-endian; getting it wrong silently drops frames.
How CallSphere does this in production
CallSphere bridges to Asterisk-style PBXs via SIP for enterprise deployments while running its own 37-agent stack on cloud Realtime + ElevenLabs. Healthcare uses 14 HIPAA tools on FastAPI :8084; OneRoof Property runs 10 specialists on Pion WebRTC; Salon, Dental, F&B and Behavioral fill out 6 verticals, for 90+ tools and 115+ DB tables overall. There's a 14-day trial, a 22% affiliate program, and details at /pricing.
FAQ
FreePBX support? Yes — same Asterisk under the hood.
Why not chan_pjsip + ExternalMedia? AudioSocket is simpler and more portable than ExternalMedia/RTP.
Latency? ~600–800 ms with VAD + small.en + 8B Q4.
Concurrent calls? Limited by your STT/LLM throughput; Asterisk handles thousands.
HIPAA? Lock down logs, set recording=no unless consented, encrypt SIP with TLS + SRTP.
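For the TLS + SRTP piece, a hedged `pjsip.conf` sketch of the options involved (section names are illustrative and the certificate paths are placeholders):

```ini
; /etc/asterisk/pjsip.conf (fragment)
[transport-tls]
type = transport
protocol = tls
bind = 0.0.0.0:5061
cert_file = /etc/asterisk/keys/asterisk.crt
priv_key_file = /etc/asterisk/keys/asterisk.key
method = tlsv1_2

[secure-endpoint](!)
type = endpoint
media_encryption = sdes   ; or dtls for WebRTC-style SRTP
```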
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available -- no signup required.