Build a Voice Agent with Krisp Audio Filter and VIVA SDK (2026)
Krisp's VIVA SDK isolates the primary speaker before STT. Wire it in as a pre-processor in a LiveKit or Pipecat pipeline for a 30%+ relative WER drop on noisy calls.
TL;DR: Krisp shipped VIVA (Voice Isolation for Voice Agents) in 2026, a CPU-only model 3.5x smaller than its predecessor that strips both background noise and secondary voices before audio reaches your STT. Insert it as a pipeline pre-processor and expect a 20-40% relative WER improvement on noisy real-world calls.
What you'll build
A LiveKit Agents pipeline with Krisp VIVA inserted between the room input track and Deepgram STT, so the LLM only hears the primary caller — even at a coffee shop or with a TV in the background.
Architecture
flowchart LR
MIC[Caller mic] --> RM[LiveKit room]
RM --> KR[Krisp VIVA filter]
KR -- clean PCM --> STT[Deepgram Nova-3]
STT --> LLM[GPT-4o]
LLM --> TTS[ElevenLabs]
TTS --> RM --> MIC
Step 1 — Get Krisp SDK
Sign up at developers.krisp.ai. You'll get an SDK token plus a per-platform binary (libkrisp-audio-sdk.so for Linux, .dylib for macOS, .dll for Windows, .wasm for the browser).
Step 2 — Python bindings
```bash
pip install krisp-audio-sdk  # internal pip from Krisp
export KRISP_TOKEN="your-token"
```
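Step 2 exports `KRISP_TOKEN`, but the later snippets never show it being read. A minimal sketch of a fail-fast loader, assuming your worker reads the token from the environment itself; `load_krisp_token` is a hypothetical helper, not part of the Krisp SDK:

```python
import os


def load_krisp_token(env_var: str = "KRISP_TOKEN") -> str:
    """Return the SDK token from the environment, or fail with a clear error.

    Hypothetical helper for illustration; not part of the Krisp SDK.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"{env_var} is not set; export it before starting the worker"
        )
    return token
```

Calling this once at worker start-up surfaces a missing token immediately, rather than on the first call.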
Step 3 — Wrap as an audio processor
```python
import numpy as np
from krisp_audio_sdk import AudioCleaner, ModelType
from livekit.agents import audio
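class KrispVAF(audio.AudioProcessor):
    """Run each incoming frame through Krisp VIVA before STT sees it."""

    def __init__(self):
        self.cleaner = AudioCleaner(
            model=ModelType.VIVA_VC_32K,  # voice-call optimised
            sample_rate=16000,  # confirm this matches the model's pinned rate (see Pitfalls)
            frame_size_ms=10,
        )

    async def process(self, frame: audio.AudioFrame) -> audio.AudioFrame:
        # Denoise the raw samples, then rebuild a frame in the original format.
        clean = self.cleaner.clean_frame(frame.data)
        return audio.AudioFrame(
            data=clean,
            sample_rate=frame.sample_rate,
            num_channels=frame.num_channels,
        )
```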
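The wrapper configures `frame_size_ms=10`, but incoming frames are not guaranteed to arrive in 10 ms units. A minimal sketch of a re-chunker that buffers audio and emits fixed-size frames, assuming 16-bit mono PCM bytes; `FrameChunker` is a hypothetical helper and not part of the Krisp or LiveKit APIs:

```python
class FrameChunker:
    """Buffer arbitrary-length PCM and emit fixed-duration frames.

    Hypothetical helper; assumes 16-bit (2-byte) mono samples.
    """

    BYTES_PER_SAMPLE = 2

    def __init__(self, sample_rate: int = 16000, frame_ms: int = 10):
        samples_per_frame = sample_rate * frame_ms // 1000
        self.frame_bytes = samples_per_frame * self.BYTES_PER_SAMPLE
        self._buffer = bytearray()

    def push(self, pcm: bytes) -> list[bytes]:
        """Append PCM and return every complete frame now available."""
        self._buffer.extend(pcm)
        frames = []
        while len(self._buffer) >= self.frame_bytes:
            frames.append(bytes(self._buffer[: self.frame_bytes]))
            del self._buffer[: self.frame_bytes]
        return frames
```

At 16 kHz a 10 ms frame is 160 samples (320 bytes), so pushing 800 bytes yields two frames and leaves 160 bytes buffered for the next call.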
Step 4 — Insert into LiveKit pipeline
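```python
from livekit.agents import AgentSession, RoomInputOptions
from livekit.plugins import deepgram, elevenlabs, openai

# Runs inside your agent entrypoint, where `ctx` is the job context
# and `Concierge` is your own Agent subclass.
session = AgentSession(
    stt=deepgram.STT(model="nova-3"),
    llm=openai.LLM(model="gpt-4o"),
    tts=elevenlabs.TTS(),
)
await session.start(
    room=ctx.room,
    agent=Concierge(),
    room_input_options=RoomInputOptions(
        audio_processors=[KrispVAF()],
    ),
)
```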
Step 5 — Browser fallback (WASM)
```ts
import { KrispSDK } from "@krisp.ai/krisp-audio-sdk-wasm";

const krisp = await KrispSDK.create({
  authToken: process.env.NEXT_PUBLIC_KRISP_TOKEN!,
  model: "viva_vc_16k",
});
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const cleaned = await krisp.process(stream); // returns a MediaStream
// pipe cleaned into your WebRTC peer connection or AudioWorklet
```
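The Pitfalls section notes that the WASM build needs a `Cross-Origin-Embedder-Policy: require-corp` response header. A minimal local dev-server sketch using only the Python standard library. The `Cross-Origin-Opener-Policy` line is an assumption: browsers require both headers for cross-origin isolation, which matters only if the WASM build uses `SharedArrayBuffer`. `isolation_headers`, `IsolatedHandler`, and `serve` are hypothetical names:

```python
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def isolation_headers() -> dict[str, str]:
    """Headers that opt a page into cross-origin isolation."""
    return {
        # Stated in the Pitfalls section of this article.
        "Cross-Origin-Embedder-Policy": "require-corp",
        # Assumption: only needed if the WASM build relies on SharedArrayBuffer.
        "Cross-Origin-Opener-Policy": "same-origin",
    }


class IsolatedHandler(SimpleHTTPRequestHandler):
    """Serve static files from the current directory with the headers above."""

    def end_headers(self) -> None:
        for name, value in isolation_headers().items():
            self.send_header(name, value)
        super().end_headers()


def serve(port: int = 8000) -> None:
    """Blocking local dev server; not intended for production use."""
    ThreadingHTTPServer(("127.0.0.1", port), IsolatedHandler).serve_forever()
```

In production, set the same headers in whatever serves your front end (framework config, CDN, or reverse proxy).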
Step 6 — Pipecat variant
```python
from pipecat.audio.filters.krisp_filter import KrispFilter
transport = DailyTransport(
    ...,
    DailyParams(
        audio_in_filter=KrispFilter(model="viva_vc_32k"),
    ),
)
```
Step 7 — Measure the win
Run a controlled WER test (e.g. LibriSpeech + cafe-noise SNR 10 dB). Typical numbers in 2026: Deepgram Nova-3 alone hits ~14% WER on noisy mixed clips; Nova-3 + VIVA drops to ~9% — a >30% relative reduction.
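To make the Step 7 comparison concrete, here is a minimal sketch of word-level WER (edit distance divided by reference length) and the relative reduction, in plain Python. `wer` and `relative_reduction` are illustrative helpers; a real evaluation would usually also normalise casing and punctuation before scoring.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (ref word missing from hyp)
                curr[j - 1] + 1,     # insertion (extra word in hyp)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, filtered: float) -> float:
    """Fractional WER improvement relative to the baseline."""
    return (baseline - filtered) / baseline
```

With the figures quoted above, `relative_reduction(0.14, 0.09)` is about 0.357, a relative reduction of roughly 36%, which is consistent with the >30% claim.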
Pitfalls
- Sample rate: VIVA models are SR-pinned (16k or 32k); resample BEFORE `clean_frame`.
- CPU budget: VIVA-VC adds ~6-10% single-core CPU per stream; size workers accordingly.
- Frame size: Stick with 10ms; 20ms increases buffer latency 2x for marginal quality gain.
- Browser CORS: WASM build requires `Cross-Origin-Embedder-Policy: require-corp` in your headers.
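The CPU budget figure above translates into a rough sizing calculation. A back-of-envelope sketch, assuming per-stream cost scales linearly and that you leave headroom on each core; `streams_per_worker` and the 70% utilisation target are illustrative assumptions, not Krisp guidance:

```python
def streams_per_worker(cores: int,
                       cpu_per_stream_pct: int = 10,
                       target_utilisation_pct: int = 70) -> int:
    """Concurrent streams a worker can filter within the utilisation target.

    cpu_per_stream_pct is the share of one core a stream uses; the article
    quotes roughly 6-10%, so the default takes the pessimistic end.
    Integer percentages avoid floating-point rounding surprises.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if not 0 < cpu_per_stream_pct <= 100:
        raise ValueError("cpu_per_stream_pct must be in (0, 100]")
    return cores * target_utilisation_pct // cpu_per_stream_pct
```

For a 4-core worker, `streams_per_worker(4)` gives 28 streams at the pessimistic 10% figure, and `streams_per_worker(4, cpu_per_stream_pct=6)` gives 46. Treat these as starting points and measure under real load.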
How CallSphere does this
CallSphere wraps every inbound call across 6 verticals with VIVA, then feeds 37 agents through 90+ tools and 115+ DB tables. The salon vertical (loud chair-side noise) saw a 33% WER reduction. $149/$499/$1,499 · 14-day trial · 22% affiliate.
FAQ
Cloud or local? Krisp processes locally — no audio leaves the worker, so HIPAA/PII stays sealed.
License model? Per-minute via SDK token; volume tiers down to ~$0.001/min at scale.
Mobile? iOS + Android binaries ship with the same API surface.
Compatible with Deepgram/AssemblyAI/Soniox? Yes — VIVA is a pre-processor, totally vendor-neutral.
Sources
- Krisp Developers - Real-Time AI Voice SDK - https://krisp.ai/developers/
- Krisp SDK Docs - https://sdk-docs.krisp.ai/
- Krisp Blog - 3.5x Smaller Voice Isolation Model - https://krisp.ai/blog/small-voice-isolation-model/
- Krisp SDK Docs - Twilio Voice Integration - https://sdk-docs.krisp.ai/docs/twilio-voice