---
title: "Build a Voice Agent with Piper TTS — Local, Free, and Fast (2026)"
description: "Piper 1.4.2 ships ONNX voices that synthesize on a Raspberry Pi 5 in real time. Here's a full Python voice agent with Piper, faster-whisper, and Ollama — no GPU required."
canonical: https://callsphere.ai/blog/vw4h-build-voice-agent-piper-tts-local-free
category: "AI Voice Agents"
tags: ["Piper TTS", "Local TTS", "ONNX", "Voice Agent", "Tutorial"]
author: "CallSphere Team"
published: 2026-03-18T00:00:00.000Z
updated: 2026-05-07T16:13:44.576Z
---

# Build a Voice Agent with Piper TTS — Local, Free, and Fast (2026)

> Piper 1.4.2 ships ONNX voices that synthesize on a Raspberry Pi 5 in real time. Here's a full Python voice agent with Piper, faster-whisper, and Ollama — no GPU required.

> **TL;DR** — Piper is the fastest open neural TTS that still sounds human. Version 1.4.2 (April 2, 2026) added CUDA inference via ONNX Runtime and a unified `piper` CLI that streams raw PCM. Pair it with faster-whisper for STT and Ollama for the LLM and you have a CPU-friendly voice agent.

## What you'll build

A Python service that listens for an utterance, transcribes it with `faster-whisper` (small.en), gets a reply from a local `llama3.1:8b` via Ollama, and speaks it through Piper using `en_US-amy-medium`. End-to-end latency stays under 2 s on a Raspberry Pi 5 (8 GB).

## Prerequisites

1. Python 3.11+ with `pip install piper-tts faster-whisper sounddevice numpy ollama`.
2. Ollama installed and running (`ollama serve`).
3. Pull a model: `ollama pull llama3.1:8b`.
4. Download a Piper voice: `python -m piper.download_voices en_US-amy-medium`.

## Architecture

```mermaid
flowchart LR
  MIC[Microphone] --> FW[faster-whisper small.en]
  FW -->|text| OLL[Ollama HTTP :11434]
  OLL -->|text| PIPER[piper en_US-amy-medium]
  PIPER -->|PCM 22050| SPK[Speaker]
```

## Step 1 — Verify Piper from the CLI

```bash
echo "Hello from a fully local voice agent." | \
  piper --model en_US-amy-medium --output-raw | \
  aplay -r 22050 -f S16_LE -t raw -
```

If you hear audio, you're done with the hard part.
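No `aplay` on your system (macOS, for example)? Redirect Piper's raw output to a file and wrap it in a WAV header with Python's stdlib instead. A minimal sketch — the samples below are a synthetic tone standing in for Piper's actual output:

```python
import wave, struct, math

def raw_to_wav(raw_pcm: bytes, path: str, sr: int = 22050):
    """Wrap headerless 16-bit mono PCM in a WAV container."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 16-bit samples
        w.setframerate(sr)
        w.writeframes(raw_pcm)

# Stand-in for `piper --output-raw` output: 0.1 s of a 440 Hz tone.
tone = b"".join(
    struct.pack("<h", int(8000 * math.sin(2 * math.pi * 440 * n / 22050)))
    for n in range(2205))
raw_to_wav(tone, "check.wav")
```

Any media player that understands WAV can then play `check.wav` for a quick sanity check.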

## Step 2 — Wrap Piper as a streaming Python class

```python
import subprocess
import numpy as np
import sounddevice as sd

class Piper:
    def __init__(self, model="en_US-amy-medium.onnx"):
        self.model = model
        self.sr = 22050  # en_US-amy-medium synthesizes at 22.05 kHz

    def speak(self, text: str):
        # --output-raw streams headerless 16-bit mono PCM to stdout
        proc = subprocess.Popen(
            ["piper", "--model", self.model, "--output-raw"],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE)
        proc.stdin.write(text.encode())
        proc.stdin.close()  # signal EOF so Piper starts synthesizing
        with sd.OutputStream(samplerate=self.sr, channels=1, dtype="int16") as out:
            while True:
                chunk = proc.stdout.read(4096)
                if not chunk:
                    break
                out.write(np.frombuffer(chunk, dtype=np.int16))
        proc.wait()
```

The trick is `--output-raw` plus a streaming `OutputStream`: you start hearing audio while later phonemes are still being synthesized.
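One subtlety: a pipe read can, in principle, return an odd number of bytes, which would make `np.frombuffer(..., dtype=np.int16)` raise. A tiny buffer that carries a trailing byte into the next chunk guards against it — this `SampleAligner` is a hypothetical helper of ours, not part of Piper:

```python
import numpy as np

class SampleAligner:
    """Buffers a trailing odd byte so every emitted chunk is whole int16 samples."""
    def __init__(self):
        self._leftover = b""

    def feed(self, chunk: bytes) -> np.ndarray:
        data = self._leftover + chunk
        cut = len(data) - (len(data) % 2)    # largest even-length prefix
        self._leftover, data = data[cut:], data[:cut]
        return np.frombuffer(data, dtype=np.int16)

aligner = SampleAligner()
a = aligner.feed(b"\x01\x00\x02")  # 3 bytes: one sample out, one byte held back
b = aligner.feed(b"\x00")          # completes the second sample
```

Drop it between `proc.stdout.read(4096)` and `out.write(...)` if you ever hit a buffer-size error.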

## Step 3 — STT with faster-whisper

```python
from faster_whisper import WhisperModel
import numpy as np

stt = WhisperModel("small.en", device="cpu", compute_type="int8")

def transcribe(pcm_int16, sr=16000):
    audio = pcm_int16.astype(np.float32) / 32768.0
    segments, _ = stt.transcribe(audio, language="en", vad_filter=True)
    return " ".join(s.text.strip() for s in segments)
```

## Step 4 — LLM via Ollama

```python
import ollama
SYSTEM = "You are Amy, a helpful voice assistant. Keep replies under 2 sentences."

def reply(history, user_text):
    history.append({"role": "user", "content": user_text})
    r = ollama.chat(model="llama3.1:8b",
        messages=[{"role":"system","content":SYSTEM}, *history],
        options={"temperature": 0.4, "num_predict": 160})
    msg = r["message"]["content"]
    history.append({"role":"assistant","content":msg})
    return msg
```
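The Ollama chat API is stateless — you resend the whole `messages` list each turn, so an unbounded `history` will eventually blow past the model's context window. A simple cap keeps it in check (the 12-message limit is an arbitrary choice of ours, not an Ollama requirement):

```python
MAX_TURNS = 6  # keep the last 6 user/assistant exchanges

def trim(history, max_msgs=MAX_TURNS * 2):
    """Drop the oldest messages once the list exceeds max_msgs."""
    return history[-max_msgs:] if len(history) > max_msgs else history

# 20 fake turns; only the newest 12 survive.
history = [{"role": "user", "content": str(i)} for i in range(20)]
history = trim(history)
```

Call `history = trim(history)` at the end of `reply` and the prompt stays a predictable size.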

## Step 5 — Mic capture loop with simple VAD

```python
def record(threshold=0.012, sr=16000, block=1600, max_s=8, silence_s=0.6):
    """Capture mic audio until ~0.6 s of silence after speech, or max_s total."""
    frames, silent, spoke = [], 0, False
    with sd.InputStream(samplerate=sr, channels=1, dtype="int16") as s:
        while len(frames) * block < sr * max_s:
            chunk, _ = s.read(block)  # 100 ms of int16 samples
            frames.append(chunk)
            rms = np.sqrt(np.mean((chunk.astype(np.float32) / 32768) ** 2))
            if rms < threshold:
                silent += block
                if spoke and silent >= sr * silence_s:
                    break  # trailing silence after speech: utterance is over
            else:
                silent, spoke = 0, True
    return np.concatenate(frames).flatten()
```
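To see why 0.012 works as a threshold, compare the RMS of near-silence against a quiet tone, computed exactly as in `record`. The noise floor and tone amplitude here are synthetic stand-ins for a real microphone:

```python
import numpy as np

rng = np.random.default_rng(0)
sr = 16000

# Low-level mic noise (std ≈ 50 counts on the int16 scale).
silence = rng.normal(0, 50, sr).astype(np.int16)

# A 220 Hz tone at ~9% of full scale, standing in for speech.
t = np.arange(sr) / sr
speechish = (3000 * np.sin(2 * np.pi * 220 * t)).astype(np.int16)

def rms(chunk):
    return np.sqrt(np.mean((chunk.astype(np.float32) / 32768) ** 2))

quiet, loud = rms(silence), rms(speechish)
```

The noise lands around 0.0015 and the tone around 0.065, so 0.012 sits comfortably between them; on a noisier mic, raise the threshold.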

## Step 6 — Tie it together

```python
piper, history = Piper(), []
piper.speak("Hi, I'm Amy. How can I help?")
while True:
    pcm = record()
    text = transcribe(pcm)
    if not text.strip(): continue
    print("USER:", text)
    out = reply(history, text)
    print("BOT :", out)
    piper.speak(out)
```
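To verify the sub-2 s budget, time each stage of the loop. A small context-manager sketch — the names are ours, not part of any library:

```python
import time
from contextlib import contextmanager

timings = {}

@contextmanager
def stage(name):
    """Record wall-clock seconds for a named pipeline stage."""
    t0 = time.perf_counter()
    yield
    timings[name] = time.perf_counter() - t0

# Usage inside the loop:
with stage("stt"):
    time.sleep(0.01)  # stands in for transcribe(pcm)
```

In the real loop you'd wrap `transcribe`, `reply`, and `piper.speak` the same way, then print `timings` per turn to see which stage eats the budget.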

## Common pitfalls

- **Wrong sample rate.** Piper voices ship at 16, 22.05, or 24 kHz; read the `.onnx.json` config that sits next to the voice model instead of assuming 22050.
- **Pi 5 thermal throttling.** Add a fan or reduce to `tiny.en` Whisper.
- **`piper-tts` vs `piper` packages.** Use `pip install piper-tts` (the actively maintained 2026 fork).
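For the sample-rate pitfall above, you can read the rate programmatically. Piper voice configs expose it under `audio.sample_rate` — verify against your voice's actual config file; a stand-in config is built here so the snippet runs on its own:

```python
import json, pathlib

def voice_sample_rate(config_path: str) -> int:
    """Read the synthesis sample rate from a Piper voice config."""
    cfg = json.loads(pathlib.Path(config_path).read_text())
    return cfg["audio"]["sample_rate"]

# Stand-in for en_US-amy-medium.onnx.json:
pathlib.Path("voice.json").write_text(
    json.dumps({"audio": {"sample_rate": 22050}}))
rate = voice_sample_rate("voice.json")
```

Feed the result into the `Piper` class's `self.sr` and the playback rate always matches the voice.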

## How CallSphere does this in production

CallSphere serves 37 specialist agents across 6 verticals (Healthcare with 14 tools on OpenAI Realtime / FastAPI :8084, OneRoof Property with 10 specialists, plus Salon, Dental, F&B, and Behavioral), using cloud TTS for emotional warmth and a Pion-based WebRTC mesh. We use Piper internally for cost-sensitive workflows and offline kiosk demos. Pricing is a flat $149 / $499 / $1499 with a [14-day trial](/trial) and a [22% affiliate program](/affiliate); 115+ database tables back 90+ tools. See it live on [/demo](/demo).

## FAQ

**How does Piper compare to ElevenLabs?** ElevenLabs wins on emotion; Piper wins on cost and privacy.

**Can Piper clone voices?** Not directly — train a new voice with Piper's training pipeline (~3 hours of clean audio).

**Does Piper run on Android?** Yes, via `piper-android` ONNX Runtime build.

**Latency target?** Sub-300 ms first audio on a desktop CPU.

**Real-time on a Pi Zero 2?** Use `x_low` quality voices only.
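Chasing the sub-300 ms first-audio figure above usually means sentence-level streaming: split the LLM reply at sentence boundaries and hand each sentence to Piper as soon as it is complete, instead of waiting for the full reply. A naive splitter sketch — the regex is a simplification, and abbreviations like "Dr." will trip it:

```python
import re

def sentences(text: str):
    """Naive split on ., !, or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

parts = sentences("Sure. The store opens at nine! Anything else?")
```

In the Step 6 loop, call `piper.speak(part)` per sentence so the first words play while later ones are still being generated.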

## Sources

- [Piper on GitHub](https://github.com/rhasspy/piper)
- [piper-tts on PyPI](https://pypi.org/project/piper-tts/)
- [Piper voice samples](https://rhasspy.github.io/piper-samples/)
- [Pipecat Piper integration](https://docs.pipecat.ai/api-reference/server/services/tts/piper)

---

Source: https://callsphere.ai/blog/vw4h-build-voice-agent-piper-tts-local-free
