By Sagar Shankaran, Founder of CallSphere
Piper 1.4.2 ships ONNX voices that synthesize on a Raspberry Pi 5 in real time. Here's a full Python voice agent with Piper, faster-whisper, and Ollama — no GPU required.
Key takeaways
TL;DR: Piper is the fastest open neural TTS that still sounds human. Version 1.4.2 (April 2, 2026) added CUDA inference via ONNX Runtime and a unified piper CLI that streams raw PCM. Pair it with faster-whisper for STT and Ollama for the LLM and you have a CPU-friendly voice agent.
A Python service that listens for an utterance, transcribes it with faster-whisper (small.en), gets a reply from a local llama3.1:8b via Ollama, and speaks it through Piper using en_US-amy-medium. End-to-end latency under 2 s on a Raspberry Pi 5 8GB.
- pip install piper-tts faster-whisper sounddevice numpy ollama
- Ollama running locally (ollama serve).
- ollama pull llama3.1:8b
- python -m piper.download_voices en_US-amy-medium

flowchart LR
MIC[Microphone] --> FW[faster-whisper small.en]
FW -->|text| OLL[Ollama HTTP :11434]
OLL -->|text| PIPER[piper en_US-amy-medium]
PIPER -->|PCM 22050| SPK[Speaker]
```bash
echo "Hello from a fully local voice agent." | \
  piper --model en_US-amy-medium --output-raw | \
  aplay -r 22050 -f S16_LE -t raw -
```
If you hear audio, you're done with the hard part.
```python
import subprocess

import numpy as np
import sounddevice as sd


class Piper:
    def __init__(self, model="en_US-amy-medium.onnx"):
        self.model = model
        self.sr = 22050  # matches the PCM 22050 output shown in the diagram

    def speak(self, text: str):
        # Piper reads text on stdin and writes raw 16-bit mono PCM to stdout.
        proc = subprocess.Popen(
            ["piper", "--model", self.model, "--output-raw"],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE)
        proc.stdin.write(text.encode())
        proc.stdin.close()
        with sd.OutputStream(samplerate=self.sr, channels=1, dtype="int16") as out:
            while True:
                chunk = proc.stdout.read(4096)
                if not chunk:
                    break
                # Play each chunk as soon as it arrives.
                out.write(np.frombuffer(chunk, dtype=np.int16))
        proc.wait()  # reap the subprocess once playback is done
```
The trick is --output-raw plus a streaming OutputStream: you start hearing audio while later phonemes are still being synthesized.
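If you later switch to smaller or unbuffered reads to shave latency (for example bufsize=0 on the subprocess), a single read can return an odd number of bytes, which splits a 16-bit sample and makes np.frombuffer raise. The sketch below shows one way to guard against that. Note that aligned_chunks and the _OddReader test double are hypothetical names for illustration, not part of Piper or sounddevice.

```python
import io


def aligned_chunks(stream, size=4096, sample_width=2):
    """Yield byte chunks whose length is a multiple of sample_width.

    Hypothetical helper: `stream` is any object with a read(n) method,
    such as the stdout pipe of a Piper subprocess.
    """
    leftover = b""
    while True:
        data = stream.read(size)
        if not data:
            break  # EOF; a trailing partial sample, if any, is dropped
        data = leftover + data
        usable = len(data) - (len(data) % sample_width)
        leftover = data[usable:]  # carry a dangling byte into the next read
        if usable:
            yield data[:usable]


class _OddReader:
    """Test double that returns at most 3 bytes per read."""

    def __init__(self, payload):
        self._buf = io.BytesIO(payload)

    def read(self, n):
        return self._buf.read(min(n, 3))


chunks = list(aligned_chunks(_OddReader(bytes(range(10)))))
```

In speak(), the read loop could then iterate over aligned_chunks(proc.stdout) and pass each chunk to np.frombuffer unchanged.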
```python
import numpy as np
from faster_whisper import WhisperModel

# int8 quantization keeps small.en usable on a CPU.
stt = WhisperModel("small.en", device="cpu", compute_type="int8")


def transcribe(pcm_int16, sr=16000):
    # Convert int16 samples to float32 in [-1, 1] before transcribing.
    audio = pcm_int16.astype(np.float32) / 32768.0
    segments, _ = stt.transcribe(audio, language="en", vad_filter=True)
    return " ".join(s.text.strip() for s in segments)
```
```python
import ollama

SYSTEM = "You are Amy, a helpful voice assistant. Keep replies under 2 sentences."


def reply(history, user_text):
    history.append({"role": "user", "content": user_text})
    r = ollama.chat(
        model="llama3.1:8b",
        messages=[{"role": "system", "content": SYSTEM}, *history],
        options={"temperature": 0.4, "num_predict": 160})
    msg = r["message"]["content"]
    history.append({"role": "assistant", "content": msg})
    return msg
```
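The reply() function above waits for the full completion before Piper says anything. A common variation is to stream the model output and speak it one sentence at a time, so the first sentence plays while the rest is still being generated. The sketch below covers only the sentence-grouping part. The sentences helper is a hypothetical name, and its punctuation rule is a deliberate simplification that will mis-split abbreviations.

```python
import re

# Sentence end: ., ! or ? followed by whitespace (a deliberate simplification).
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def sentences(pieces):
    """Group an iterable of text fragments into complete sentences."""
    buf = ""
    for piece in pieces:
        buf += piece
        parts = _BOUNDARY.split(buf)
        # Everything except the last part is a finished sentence.
        for done in parts[:-1]:
            if done.strip():
                yield done.strip()
        buf = parts[-1]
    if buf.strip():
        yield buf.strip()  # flush whatever remains at end of stream


demo = list(sentences(["Hel", "lo there. How ", "are you? I am", " fine."]))
```

With the ollama client, the fragments could come from ollama.chat(..., stream=True), reading chunk["message"]["content"] from each streamed chunk and handing every yielded sentence to piper.speak.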
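```python
import numpy as np
import sounddevice as sd


def record(threshold=0.012, max_s=8):
    """Capture 16 kHz mono audio until about 0.6 s of silence or max_s seconds."""
    frames, silent = [], 0
    with sd.InputStream(samplerate=16000, channels=1, dtype="int16") as s:
        while silent < 9000 and len(frames) * 1600 < 16000 * max_s:
            chunk, _ = s.read(1600)  # 100 ms per chunk
            frames.append(chunk)
            rms = np.sqrt(np.mean((chunk.astype(np.float32) / 32768) ** 2))
            # Count consecutive quiet samples; any loud chunk resets the counter.
            silent = silent + 1600 if rms < threshold else 0
    return np.concatenate(frames).flatten()
```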
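One behavior of record() worth knowing: the silence counter starts immediately, so a pause before the user speaks ends the recording after roughly 0.6 s, and the main loop simply records again. If that proves choppy, a possible variation is to ignore leading silence until speech has been detected. The sketch below isolates that decision logic. Endpointer is a hypothetical class, with defaults copied from record().

```python
class Endpointer:
    """Decide when an utterance has ended from per-chunk RMS values.

    Hypothetical helper; defaults mirror record(): 0.012 RMS threshold,
    1600-sample chunks, 9000 samples of trailing silence.
    """

    def __init__(self, threshold=0.012, chunk=1600, hang=9000):
        self.threshold = threshold
        self.chunk = chunk
        self.hang = hang
        self.started = False
        self.silent = 0

    def update(self, rms):
        """Feed one chunk's RMS; return True once the utterance is over."""
        if rms >= self.threshold:
            self.started = True
            self.silent = 0
        elif self.started:
            self.silent += self.chunk
        # Leading silence is ignored until speech has been seen.
        return self.started and self.silent >= self.hang


ep = Endpointer()
levels = [0.001] * 10 + [0.2] * 5 + [0.001] * 6
results = [ep.update(r) for r in levels]
```

The caller would still need the max_s cap from record(), since this state machine never fires if nobody speaks.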
```python
piper, history = Piper(), []
piper.speak("Hi, I'm Amy. How can I help?")
while True:
    pcm = record()
    text = transcribe(pcm)
    if not text.strip():
        continue
    print("USER:", text)
    out = reply(history, text)
    print("BOT :", out)
    piper.speak(out)
```
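To check the sub-2 s latency figure on your own hardware, it helps to time each stage of the loop separately. The sketch below is one way to do that. StageTimer is a hypothetical helper, and the sleep calls are stand-ins for the real transcribe() and reply() calls.

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Collect wall-clock durations per named pipeline stage (hypothetical helper)."""

    def __init__(self):
        self.durations = {}

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Accumulate so a stage can be timed across several calls.
            elapsed = time.perf_counter() - start
            self.durations[name] = self.durations.get(name, 0.0) + elapsed

    def total(self):
        return sum(self.durations.values())


timer = StageTimer()
with timer.stage("stt"):
    time.sleep(0.01)  # stand-in for transcribe(pcm)
with timer.stage("llm"):
    time.sleep(0.02)  # stand-in for reply(history, text)
```

Keep in mind that wrapping piper.speak() this way measures synthesis plus full playback, not the time to first audio, so that stage needs a timestamp taken at the first chunk written to the output stream.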
- config.json next to the .onnx.
- tiny.en Whisper.
- piper-tts vs piper packages. Use pip install piper-tts (the actively maintained 2026 fork).

CallSphere serves 37 specialist agents across 6 verticals (Healthcare 14 tools / OpenAI Realtime / FastAPI :8084, OneRoof Property 10 specialists, Salon, Dental, F&B, Behavioral) with cloud TTS for emotional warmth and a Pion-based WebRTC mesh. We use Piper internally for cost-sensitive workflows and offline kiosk demos. Pricing is flat $149 / $499 / $1499, 14-day trial, 22% affiliate. 115+ DB tables back 90+ tools. See it on /demo.
How does Piper compare to ElevenLabs? ElevenLabs wins on emotion; Piper wins on cost and privacy.
Can Piper clone voices? Not directly — train a new voice with Piper's training pipeline (~3 hours of clean audio).
Does Piper run on Android? Yes, via the piper-android ONNX Runtime build.
Latency target? Sub-300 ms first audio on a desktop CPU.
Real-time on a Pi Zero 2? Use x_low quality voices only.
Written by
Sagar Shankaran · Founder, CallSphere
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.