By Sagar Shankaran, Founder of CallSphere
Asterisk + ARI + AudioSocket + an open LLM = a voice agent that drops into your existing PBX. No SIP-trunking provider lock-in — full Python orchestration.
Key takeaways
TL;DR: If you already run Asterisk or FreePBX, you don't need Twilio. Use ARI for call control and AudioSocket for raw PCM audio over TCP, then bolt on faster-whisper, Llama via Ollama, and Piper. Production-ready and SIP-trunk-agnostic.
You will build an Asterisk dialplan that, on incoming calls, notifies a Python ARI app and streams the call audio to it over AudioSocket. The app does STT, calls the LLM, synthesizes the reply, and pipes it back into the channel.
Prerequisites:
- Asterisk with AudioSocket support (chan_audiosocket).
- pip install ari panoramisk faster-whisper piper-tts ollama numpy.
- Ollama with llama3.1:8b.
- ARI enabled in /etc/asterisk/ari.conf and a user (username = aiuser).

flowchart LR
PSTN[Caller] -->|SIP| AST[Asterisk 20]
AST -->|ARI events| APP[Python ARI app]
AST <-->|AudioSocket TCP| APP
APP --> FW[faster-whisper]
APP --> OLL[Ollama llama3.1:8b]
APP --> PIP[Piper TTS]
```ini
; /etc/asterisk/extensions.conf
[default]
exten => 100,1,NoOp(Inbound to AI)
 same => n,Answer()
 same => n,Stasis(ai-voice)   ; hand to ARI app named ai-voice
 same => n,Hangup()
```
```ini
; /etc/asterisk/ari.conf
[general]
enabled = yes

[aiuser]
type = user
read_only = no
password = supersecret
```
```ini
; /etc/asterisk/http.conf
[general]
enabled = yes
bindaddr = 0.0.0.0   ; 127.0.0.1 is enough when the ARI app runs on the same host
```
Reload: asterisk -rx "core reload".
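Before wiring up the app, it can help to confirm that ARI answers on the HTTP port. The following is a minimal sketch using only the standard library. It assumes the aiuser credentials from ari.conf and Asterisk's default HTTP port 8088; the helper names build_ari_request and ari_is_up are illustrative and not part of any library.

```python
import base64
import json
import urllib.request


def build_ari_request(base_url, user, password, path="/ari/asterisk/info"):
    """Build a GET request for an ARI resource using HTTP basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(base_url.rstrip("/") + path)
    req.add_header("Authorization", f"Basic {token}")
    return req


def ari_is_up(base_url, user, password, timeout=3.0):
    """Return True if ARI answers the info endpoint with HTTP 200 and JSON."""
    try:
        req = build_ari_request(base_url, user, password)
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            json.load(resp)  # raises ValueError if the body is not JSON
            return resp.status == 200
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP 401/404, or a non-JSON body
        return False
```

Calling ari_is_up("http://127.0.0.1:8088", "aiuser", "supersecret") should return True once the reload has taken effect; False usually points at http.conf, the credentials, or a firewall.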
```python
import ari

client = ari.connect("http://127.0.0.1:8088", "aiuser", "supersecret")


def on_start(channel_obj, ev):
    chan = channel_obj["channel"]
    print("New call:", chan.id)
    # Return the channel to the dialplan; AudioSocket() runs from there
    chan.continueInDialplan()


client.on_channel_event("StasisStart", on_start)
# Blocks and dispatches ARI events. A daemon thread alone would let the
# script exit as soon as the main thread finished.
client.run(apps="ai-voice")
```
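Update the dialplan so the channel first passes through Stasis, where the ARI app logs the call and calls continueInDialplan, and then moves on to AudioSocket. AudioSocket() takes positional arguments (a UUID, then host:port), and the first argument must parse as a UUID, which ${CHANNEL(uniqueid)} does not. The UUID below is a placeholder; generate a fresh one per call in production.
```ini
[default]
exten => 100,1,Answer()
 same => n,Stasis(ai-voice)   ; ARI app logs the call, then continues in the dialplan
 same => n,AudioSocket(40325ec2-5efd-4bd3-805f-53576e581d13,127.0.0.1:9090)
 same => n,Hangup()
```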
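AudioSocket sends 16-bit signed-linear mono PCM frames over a raw TCP socket. The standard audio frame (kind 0x10) is sampled at 8 kHz, while Whisper expects 16 kHz input, so upsample before transcribing.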
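```python
import socket, struct, subprocess, time
import numpy as np
import ollama
from faster_whisper import WhisperModel

stt = WhisperModel("small.en", device="cpu", compute_type="int8")

KIND_HANGUP, KIND_ID, KIND_SLIN, KIND_ERROR = 0x00, 0x01, 0x10, 0xFF


def recv_exact(s, n):
    """Read exactly n bytes (recv may return fewer); None on EOF."""
    data = bytearray()
    while len(data) < n:
        part = s.recv(n - len(data))
        if not part:
            return None
        data.extend(part)
    return bytes(data)


def read_frame(s):
    h = recv_exact(s, 3)  # 1-byte kind + 2-byte big-endian payload length
    if h is None:
        return None, None
    kind, length = struct.unpack(">BH", h)
    payload = recv_exact(s, length)
    return (kind, payload) if payload is not None else (None, None)


def send_slin(s, pcm_int16):
    # 160 samples = 20 ms at 8 kHz; pace sends at roughly real time
    for i in range(0, len(pcm_int16), 160):
        chunk = pcm_int16[i:i+160].tobytes()
        s.sendall(struct.pack(">BH", KIND_SLIN, len(chunk)) + chunk)
        time.sleep(0.02)


def transcribe(buf):
    pcm = np.frombuffer(buf, dtype=np.int16).astype(np.float32) / 32768
    # Whisper expects 16 kHz: upsample 8 kHz input 2x by linear interpolation
    pcm16k = np.interp(np.arange(len(pcm) * 2) / 2, np.arange(len(pcm)), pcm).astype(np.float32)
    segs, _ = stt.transcribe(pcm16k, language="en", vad_filter=True)
    return " ".join(s.text for s in segs).strip()


def llm(history, text):
    history.append({"role": "user", "content": text})
    r = ollama.chat(model="llama3.1:8b", messages=history, options={"num_predict": 140})
    reply = r["message"]["content"]
    history.append({"role": "assistant", "content": reply})
    return reply

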
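def piper(text):
    p = subprocess.run(
        ["piper", "--model", "en_US-amy-medium", "--output-raw"],
        input=text.encode(), capture_output=True)
    pcm = np.frombuffer(p.stdout, dtype=np.int16)
    if len(pcm) == 0:
        return pcm
    # Piper's medium voices emit 22.05 kHz; resample to 8 kHz for AudioSocket
    # (linear interpolation with no low-pass filter, adequate for a demo)
    idx = np.linspace(0, len(pcm) - 1, int(len(pcm) * 8000 / 22050))
    return np.interp(idx, np.arange(len(pcm)), pcm).astype(np.int16)


srv = socket.socket()
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("0.0.0.0", 9090))
srv.listen()
print("AudioSocket listening :9090")
while True:  # handles one call at a time
    conn, _ = srv.accept()
    history, buf = [], bytearray()
    while True:
        kind, payload = read_frame(conn)
        if kind is None or kind == KIND_HANGUP:
            break
        if kind == KIND_SLIN:
            buf.extend(payload)
            if len(buf) > 8000 * 2 * 2:  # ~2 s of 8 kHz, 16-bit audio
                text = transcribe(bytes(buf))
                buf.clear()
                if text:
                    reply = llm(history, text)
                    pcm = piper(reply)
                    send_slin(conn, pcm)
    conn.close()
```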
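TCP does not preserve frame boundaries, so a single recv() may deliver half a frame or several frames at once. The sketch below shows an incremental parser for the header layout used above (1-byte kind, 2-byte big-endian length, then the payload). The names FrameParser, encode_frame and call_uuid are hypothetical, and the assumption that the ID frame carries a 16-byte binary UUID reflects the AudioSocket protocol description rather than anything shown in this article.

```python
import struct
import uuid

KIND_HANGUP, KIND_ID, KIND_SLIN, KIND_ERROR = 0x00, 0x01, 0x10, 0xFF


class FrameParser:
    """Accumulates raw TCP bytes and returns complete (kind, payload) frames."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, data):
        self._buf.extend(data)
        frames = []
        while len(self._buf) >= 3:
            kind, length = struct.unpack(">BH", self._buf[:3])
            if len(self._buf) < 3 + length:
                break  # payload not fully received yet; wait for more bytes
            payload = bytes(self._buf[3:3 + length])
            del self._buf[:3 + length]
            frames.append((kind, payload))
        return frames


def encode_frame(kind, payload=b""):
    """Serialize one frame: kind byte, big-endian length, payload."""
    return struct.pack(">BH", kind, len(payload)) + payload


def call_uuid(payload):
    """Decode the 16-byte ID payload (assumed binary UUID) to its string form."""
    return str(uuid.UUID(bytes=payload))
```

In the server loop this would replace read_frame: pass whatever conn.recv(4096) returns to feed() and iterate over the frames it hands back.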
Wrap the buffer logic with webrtcvad instead of fixed 2-second windows:
```python
import webrtcvad

vad = webrtcvad.Vad(2)  # 0–3, higher = more aggressive
```
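One way to do that wrapping is to classify each 20 ms frame, start an utterance on the first speech frame, and close it after a run of silent frames. This is a sketch under stated assumptions: UtteranceSegmenter and its parameters are hypothetical names, the audio is assumed to be 8 kHz 16-bit mono (320 bytes per 20 ms frame), and the speech test is injected so that webrtcvad's vad.is_speech(frame, 8000) can be plugged in without the class depending on it.

```python
FRAME_BYTES = 320  # 20 ms of 8 kHz, 16-bit mono audio (assumed format)


class UtteranceSegmenter:
    """Groups fixed-size PCM frames into utterances using a speech test."""

    def __init__(self, is_speech, end_silence_frames=25, min_speech_frames=5):
        self.is_speech = is_speech                    # callable: bytes -> bool
        self.end_silence_frames = end_silence_frames  # 25 * 20 ms = 500 ms
        self.min_speech_frames = min_speech_frames    # drop very short blips
        self._pending = bytearray()                   # bytes not yet framed
        self._utt = bytearray()                       # current utterance
        self._speech = 0
        self._silence = 0

    def feed(self, pcm_bytes):
        """Add raw PCM; return a list of completed utterances (bytes)."""
        self._pending.extend(pcm_bytes)
        done = []
        while len(self._pending) >= FRAME_BYTES:
            frame = bytes(self._pending[:FRAME_BYTES])
            del self._pending[:FRAME_BYTES]
            if self.is_speech(frame):
                self._utt.extend(frame)
                self._speech += 1
                self._silence = 0
            elif self._speech:
                # Trailing silence inside an utterance that already started
                self._utt.extend(frame)
                self._silence += 1
                if self._silence >= self.end_silence_frames:
                    if self._speech >= self.min_speech_frames:
                        done.append(bytes(self._utt))
                    self._utt.clear()
                    self._speech = self._silence = 0
        return done
```

With webrtcvad this would be constructed as UtteranceSegmenter(lambda f: vad.is_speech(f, 8000)), and each utterance returned by feed() goes to the transcription step in place of the fixed 2-second buffer.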
```bash
ollama serve &
python audiosocket_server.py &
asterisk -rx "module load app_audiosocket.so"    # provides the AudioSocket() dialplan application
asterisk -rx "module load chan_audiosocket.so"
```
asterisk -rx "module show like audiosocket" should list the AudioSocket modules.
SLIN16 with Set(JITTERBUFFER(adaptive)=default) and codec negotiation.
CallSphere bridges to Asterisk-style PBXs via SIP for enterprise deployments while running its own 37-agent stack on cloud Realtime + ElevenLabs. Healthcare uses 14 HIPAA tools on FastAPI :8084; OneRoof Property runs 10 specialists on Pion WebRTC; Salon, Dental, F&B and Behavioral fill out 6 verticals. 90+ tools · 115+ DB tables.
FreePBX support? Yes — same Asterisk under the hood.
Why not chan_pjsip + ExternalMedia? AudioSocket is simpler and more portable than ExternalMedia/RTP.
Latency? ~600–800 ms with VAD + small.en + 8B Q4.
Concurrent calls? Limited by your STT/LLM throughput; Asterisk handles thousands.
HIPAA? Lock down logs, set recording=no unless consented, encrypt SIP with TLS + SRTP.
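To check the latency figure above against your own hardware, it helps to time each stage separately. The following is a minimal sketch using only the standard library; StageTimer is a hypothetical helper, and the stage names in the usage note simply mirror the transcribe, llm and piper functions from the server script.

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Collects wall-clock durations in milliseconds for named stages."""

    def __init__(self):
        self.ms = {}

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = (time.perf_counter() - start) * 1000
            # Accumulate so a stage entered several times per turn adds up
            self.ms[name] = self.ms.get(name, 0.0) + elapsed

    def total(self):
        return sum(self.ms.values())

    def report(self):
        parts = [f"{name}={value:.0f}ms" for name, value in self.ms.items()]
        return " ".join(parts + [f"total={self.total():.0f}ms"])
```

In the call loop, wrap each step, for example with t.stage("stt"): text = transcribe(audio), then log t.report() once per turn. The total excludes network and jitter-buffer delay, so the caller-perceived latency will be somewhat higher.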
Written by
Sagar Shankaran · Founder, CallSphere
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.