TL;DR: If you already run Asterisk or FreePBX, you don't need Twilio. Use ARI for call control and AudioSocket for raw PCM audio over TCP (it is not RTP), then bolt on faster-whisper, Llama via Ollama, and Piper. The result is SIP-trunk-agnostic.

What you'll build

An Asterisk dialplan that, on incoming calls, notifies a Python ARI app and then streams the channel's audio to a Python TCP server via AudioSocket. The server does STT, calls the LLM, synthesizes the reply, and pipes it back into the channel.

Prerequisites

Asterisk 20 LTS (built with app_audiosocket and res_audiosocket; chan_audiosocket is only needed if you Dial() an AudioSocket channel).
Linux box on the same network (or co-located).
Python 3.11, pip install ari panoramisk faster-whisper piper-tts ollama numpy webrtcvad (webrtcvad is used in Step 5).
Ollama with llama3.1:8b.
ARI configured in /etc/asterisk/ari.conf and a user (username = aiuser).

Architecture

flowchart LR
  PSTN[Caller] -->|SIP| AST[Asterisk 20]
  AST -->|ARI events| APP[Python ARI app]
  AST <-->|AudioSocket TCP| APP
  APP --> FW[faster-whisper]
  APP --> OLL[Ollama llama3.1:8b]
  APP --> PIP[Piper TTS]

Step 1 — Asterisk dialplan

```ini
; /etc/asterisk/extensions.conf
[default]
exten => 100,1,NoOp(Inbound to AI)
 same => n,Answer()
 same => n,Stasis(ai-voice)   ; hand to ARI app named ai-voice
 same => n,Hangup()
```

```ini
; /etc/asterisk/ari.conf
[general]
enabled = yes

[aiuser]
type = user
read_only = no
password = supersecret
```

```ini
; /etc/asterisk/http.conf
[general]
enabled = yes
bindaddr = 0.0.0.0
```

Reload: asterisk -rx "core reload".
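Before writing the app, it can help to confirm that ARI answers with the credentials from ari.conf. The following is a minimal sketch, assuming Asterisk's default HTTP port 8088 and the `/ari/asterisk/info` resource; the helper name `ari_info_request` is made up for illustration and only builds the request.

```python
import base64
import urllib.request

def ari_info_request(host="127.0.0.1", port=8088,
                     user="aiuser", password="supersecret"):
    """Build (but do not send) a GET for ARI's system-info resource."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(f"http://{host}:{port}/ari/asterisk/info")
    req.add_header("Authorization", f"Basic {token}")  # HTTP Basic auth
    return req

# Usage (needs a running Asterisk with ARI enabled):
#   with urllib.request.urlopen(ari_info_request(), timeout=5) as resp:
#       print(resp.status)
```

A 401 response usually points at the user section in ari.conf, while a refused connection usually points at http.conf.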

Step 2 — ARI app skeleton

```python
# app.py
import ari

client = ari.connect("http://127.0.0.1:8088", "aiuser", "supersecret")

def on_start(channel_obj, ev):
    chan = channel_obj["channel"]
    print("New call:", chan.id)
    # Return the channel to the dialplan; AudioSocket is executed from there
    chan.continueInDialplan()

client.on_channel_event("StasisStart", on_start)
# run() blocks; starting it only in a daemon thread would let the script exit
client.run(apps="ai-voice")
```

Step 3 — Push the channel into AudioSocket

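Update the dialplan so the channel moves on to AudioSocket once the Stasis app has logged the call and returned it with continueInDialplan. AudioSocket takes two positional arguments, a UUID and a host:port. The first must be a well-formed UUID, which the channel's uniqueid is not. The example uses the UUID() dialplan function; if your build lacks func_uuid, substitute a UUID generated elsewhere:

```ini
[default]
exten => 100,1,Answer()
 same => n,Stasis(ai-voice)
 same => n,AudioSocket(${UUID()},127.0.0.1:9090)
 same => n,Hangup()
```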

AudioSocket sends 16-bit signed linear (slin) mono frames over a raw TCP socket, at 8 kHz by default. Whisper expects 16 kHz input, so the audio needs resampling first.
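The framing around those audio frames is small enough to sketch. Each AudioSocket message is a 1-byte kind, a 2-byte big-endian payload length, then the payload. The kind values match the constants used by the server in Step 4; `encode_frame` and `decode_frames` are illustrative names, not part of any library.

```python
import struct

KIND_HANGUP, KIND_ID, KIND_SLIN, KIND_ERROR = 0x00, 0x01, 0x10, 0xFF

def encode_frame(kind, payload=b""):
    """One frame: 1-byte kind, 2-byte big-endian length, then the payload."""
    return struct.pack(">BH", kind, len(payload)) + payload

def decode_frames(data):
    """Split bytes into complete (kind, payload) frames plus leftover bytes."""
    frames, pos = [], 0
    while len(data) - pos >= 3:
        kind, length = struct.unpack_from(">BH", data, pos)
        if len(data) - pos - 3 < length:
            break  # payload not fully received yet; keep it for the next read
        frames.append((kind, data[pos + 3:pos + 3 + length]))
        pos += 3 + length
    return frames, data[pos:]
```

Returning the leftover bytes matters because TCP delivers a byte stream, so a single recv() can end in the middle of a frame.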

Step 4 — TCP server that drives the conversation

```python
# audiosocket_server.py
import socket, struct, subprocess, time
import numpy as np
import ollama
from faster_whisper import WhisperModel

stt = WhisperModel("small.en", device="cpu", compute_type="int8")

KIND_HANGUP, KIND_ID, KIND_SLIN, KIND_ERROR = 0x00, 0x01, 0x10, 0xFF
LINE_RATE = 8000      # AudioSocket default: 8 kHz, 16-bit mono
WHISPER_RATE = 16000  # faster-whisper expects 16 kHz float32
PIPER_RATE = 22050    # assumption: check sample_rate in the voice's .onnx.json

def recv_exact(s, n):
    """Read exactly n bytes; a single recv() may return fewer."""
    data = bytearray()
    while len(data) < n:
        part = s.recv(n - len(data))
        if not part:
            return None
        data.extend(part)
    return bytes(data)

def read_frame(s):
    h = recv_exact(s, 3)
    if h is None:
        return None, None
    kind, length = h[0], struct.unpack(">H", h[1:3])[0]
    payload = recv_exact(s, length) if length else b""
    return (None, None) if payload is None else (kind, payload)

def resample(x, src, dst):
    """Linear interpolation, no anti-alias filter; a rough fit for speech."""
    if src == dst or len(x) == 0:
        return x
    n = int(len(x) * dst / src)
    return np.interp(np.linspace(0, len(x) - 1, n), np.arange(len(x)), x)

def send_slin(s, pcm_int16):
    for i in range(0, len(pcm_int16), 160):  # 160 samples = 20 ms at 8 kHz
        chunk = pcm_int16[i:i+160].tobytes()
        s.sendall(struct.pack(">BH", KIND_SLIN, len(chunk)) + chunk)
        time.sleep(0.02)  # pace playback roughly in real time

def transcribe(buf):
    pcm = np.frombuffer(buf, dtype=np.int16).astype(np.float32) / 32768
    pcm = resample(pcm, LINE_RATE, WHISPER_RATE).astype(np.float32)
    segs, _ = stt.transcribe(pcm, language="en", vad_filter=True)
    return " ".join(s.text for s in segs).strip()

def llm(history, text):
    history.append({"role": "user", "content": text})
    r = ollama.chat(model="llama3.1:8b", messages=history,
                    options={"num_predict": 140})
    history.append(r["message"])
    return r["message"]["content"]

def piper(text):
    p = subprocess.run(
        ["piper", "--model", "en_US-amy-medium", "--output-raw"],
        input=text.encode(), capture_output=True)
    pcm = np.frombuffer(p.stdout, dtype=np.int16)
    return resample(pcm, PIPER_RATE, LINE_RATE).astype(np.int16)

srv = socket.socket()
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("0.0.0.0", 9090)); srv.listen()
print("AudioSocket listening :9090")
while True:  # serves one call at a time
    conn, _ = srv.accept()
    history, buf = [], bytearray()
    with conn:
        while True:
            kind, payload = read_frame(conn)
            if kind is None or kind in (KIND_HANGUP, KIND_ERROR):
                break
            if kind == KIND_SLIN:
                buf.extend(payload)
                if len(buf) > LINE_RATE * 2 * 2:  # ~2 s of 16-bit audio
                    text = transcribe(bytes(buf)); buf.clear()
                    if text:
                        send_slin(conn, piper(llm(history, text)))
```
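The accept loop above serves one call at a time, so a second caller waits until the first hangs up. One possible way around that, sketched here under assumptions, is a thread per connection. The names `listen`, `serve` and `handle_call` are hypothetical; `handle_call(conn)` stands in for the per-call loop and would own its own `history` and `buf`.

```python
import socket
import threading

def listen(host="0.0.0.0", port=9090):
    """Create the listening socket for AudioSocket connections."""
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen()
    return srv

def serve(srv, handle_call):
    """Run handle_call(conn) in its own thread for every accepted connection."""
    def run(conn):
        with conn:  # close the socket when the call ends
            handle_call(conn)
    while True:
        conn, _ = srv.accept()
        threading.Thread(target=run, args=(conn,), daemon=True).start()

# Usage: serve(listen(), handle_call)
```

Threads only remove the queueing at the socket level. The STT and LLM calls still compete for the same CPU or GPU, so throughput stays bounded by those models.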

Step 5 — VAD for natural turn-taking

Wrap the buffer logic with webrtcvad instead of fixed 2-second windows:

```python
import webrtcvad
vad = webrtcvad.Vad(2)  # 0–3, higher = more aggressive

# Feed 20ms (320 byte) frames; trigger transcribe on trailing silence
```
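The trailing-silence idea can be sketched as a small state machine. This is an illustration under assumptions rather than the article's implementation: `TurnDetector`, `silence_frames` and `min_speech_frames` are hypothetical names, and the speech test is passed in as a callable so the logic does not depend on any one VAD library.

```python
class TurnDetector:
    """Collect 20 ms frames and emit an utterance after trailing silence."""

    def __init__(self, is_speech, silence_frames=25, min_speech_frames=5):
        self.is_speech = is_speech                  # callable(frame_bytes) -> bool
        self.silence_frames = silence_frames        # 25 x 20 ms = 500 ms
        self.min_speech_frames = min_speech_frames  # ignore very short blips
        self.buf = bytearray()
        self.speech = 0
        self.silence = 0

    def feed(self, frame):
        """Return the utterance bytes when a turn ends, else None."""
        if self.is_speech(frame):
            self.speech += 1
            self.silence = 0
            self.buf.extend(frame)
            return None
        if self.speech == 0:
            return None  # caller has not started talking yet
        self.silence += 1
        self.buf.extend(frame)
        if self.silence < self.silence_frames:
            return None
        long_enough = self.speech >= self.min_speech_frames
        utterance = bytes(self.buf) if long_enough else None
        self.buf.clear()
        self.speech = self.silence = 0
        return utterance

# With webrtcvad on 8 kHz audio (320-byte frames):
#   detector = TurnDetector(lambda f: vad.is_speech(f, 8000))
```

In the server loop, each KIND_SLIN payload would go through `feed()`, and a non-None return value would be passed to the transcription step in place of the fixed-size buffer check.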

Step 6 — Run + test

```bash
ollama serve &
python app.py &
python audiosocket_server.py &
asterisk -rx "module load app_audiosocket.so"

# dial extension 100 from a softphone
```

Common pitfalls

AudioSocket module not loaded. asterisk -rx "module show like audiosocket" should list it.
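Sample rate. AudioSocket carries 8 kHz slin by default, while Whisper expects 16 kHz and each Piper voice emits its own rate (check the voice's .onnx.json; medium voices are typically 22.05 kHz). Resample in both directions; JITTERBUFFER() does not change the sample rate.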
Endianness. Length is big-endian; getting it wrong silently drops frames.
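The endianness pitfall is easy to reproduce in isolation. The sketch below packs the header of one 320-byte audio frame and then misreads the length as little-endian. The variable names are illustrative.

```python
import struct

payload_len = 320                               # one 20 ms frame of 8 kHz slin
header = struct.pack(">BH", 0x10, payload_len)  # kind + big-endian length

right = struct.unpack(">H", header[1:3])[0]     # 320
wrong = struct.unpack("<H", header[1:3])[0]     # 16385 (0x4001 instead of 0x0140)
```

A reader that trusts the wrong value waits for 16385 bytes, swallows the following frames as payload, and never realigns with the stream.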

How CallSphere does this in production

CallSphere bridges to Asterisk-style PBXs via SIP for enterprise deployments while running its own 37-agent stack on cloud Realtime + ElevenLabs. Healthcare uses 14 HIPAA tools on FastAPI :8084; OneRoof Property runs 10 specialists on Pion WebRTC; Salon, Dental, F&B and Behavioral fill out 6 verticals. 90+ tools · 115+ DB tables. 14-day trial · 22% affiliate · /pricing.

FAQ

FreePBX support? Yes — same Asterisk under the hood.

Why not chan_pjsip + ExternalMedia? AudioSocket is simpler and more portable than ExternalMedia/RTP.

Latency? ~600–800 ms with VAD + small.en + 8B Q4.

Concurrent calls? Limited by your STT/LLM throughput; Asterisk handles thousands.

HIPAA? Lock down logs, set recording=no unless consented, encrypt SIP with TLS + SRTP.

Build a Voice Agent with Asterisk + ARI + Open LLM (2026)
