By Sagar Shankaran, Founder of CallSphere
Mispronouncing 'metformin' destroys caller trust in 30 seconds. Domain adaptation cuts Word Error Rate by 2–30 points in healthcare and legal calls. We cover ASR vocabulary biasing, TTS pronunciation lexicons, and acoustic LoRA for voice agents.
Key takeaways
TL;DR: A general-purpose ASR mistranscribes "metformin" 12% of the time and "amortization" 18% of the time. Domain adaptation cuts Word Error Rate by 2–30 points in regulated verticals. That means vocabulary biasing on the ASR, a pronunciation lexicon on the TTS, and optionally acoustic LoRA. It's the difference between a real voice agent and a science demo.
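Three layers need adaptation:
- ASR: vocabulary biasing, so domain terms like "metformin" and "prior auth" are transcribed correctly.
- TTS: a pronunciation lexicon with SSML <phoneme> overrides for brand names and rare words.
- Acoustic model (optional): LoRA fine-tuning for the telephony channel, accents, and dialects.

```mermaid
flowchart TD
AUDIO[Caller audio] --> ASR[ASR with biasing]
VOCAB[Domain phrase list] --> ASR
ASR --> TEXT[Transcript]
TEXT --> LLM[Agent LLM]
LLM --> TTS_TEXT[Agent reply]
TTS_TEXT --> TTS[TTS with lexicon]
LEX[Pronunciation lexicon] --> TTS
TTS --> SPEECH[Audio out]
```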
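The "TTS with lexicon" step can be illustrated for TTS engines that accept SSML. The sketch below XML-escapes the agent reply and wraps every lexicon term in a standard SSML `<phoneme alphabet="ipa">` tag. It is a minimal example under stated assumptions: the function name and the IPA strings are hypothetical, and you should check the transcriptions against your TTS vendor before relying on them.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical lexicon entries (term -> IPA); verify with your TTS vendor.
LEXICON = {
    "Olaplex": "oʊˈlæplɛks",
    "metformin": "mɛtˈfɔːrmɪn",
}

def to_ssml(text: str, lexicon: dict[str, str]) -> str:
    """Wrap lexicon terms in SSML <phoneme> tags and XML-escape everything else."""
    if not lexicon:
        return "<speak>" + escape(text) + "</speak>"
    # Longest terms first so multi-word entries win over shorter overlapping ones.
    terms = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE)
    ipa_by_term = {t.lower(): ipa for t, ipa in lexicon.items()}
    parts, last = [], 0
    for m in pattern.finditer(text):
        parts.append(escape(text[last:m.start()]))   # plain text before the match
        ph = quoteattr(ipa_by_term[m.group(0).lower()])  # quoted, escaped attribute value
        parts.append(f'<phoneme alphabet="ipa" ph={ph}>{escape(m.group(0))}</phoneme>')
        last = m.end()
    parts.append(escape(text[last:]))                # trailing plain text
    return "<speak>" + "".join(parts) + "</speak>"
```

Matching is case-insensitive, but the caller's original casing is kept inside the tag. Running the lexicon as a post-processing step on the LLM's reply means the model never has to emit SSML itself.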
CallSphere runs 6 verticals and 37 agents with hard latency budgets (<800 ms TTFT). Here is the domain adaptation we run today:
Across 90+ tools and 115+ DB tables, the vocabulary list is generated from the database at deploy time. 90+ tools mean 90+ schema names that have to be recognized and pronounced correctly. Plans: $149 / $499 / $1,499, 14-day trial, 22% affiliate.
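The post does not show how the list is generated. The following is a minimal sketch of one approach: split tool and table identifiers into spoken phrases, drop generic CRUD words, deduplicate, and cap the size. Several parts are assumptions. SQLite stands in for the production database. The helper names, the STOPWORDS set, and the default limit are all hypothetical.

```python
import re
import sqlite3

# Assumed list of generic identifier words with no recognition value; tune per codebase.
STOPWORDS = {"get", "set", "list", "create", "update", "delete", "id", "by", "for", "and", "of", "to"}

def split_identifier(name: str) -> list[str]:
    """'getPriorAuthStatus' or 'prior_auth_status' -> ['get', 'prior', 'auth', 'status']."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)  # break camelCase boundaries
    return [w.lower() for w in re.split(r"[\s_\-]+", spaced) if w]

def build_vocabulary(identifiers: list[str], limit: int = 500) -> list[str]:
    """Turn tool and table names into deduplicated spoken phrases, capped at `limit`."""
    phrases: list[str] = []
    seen: set[str] = set()
    for ident in identifiers:
        words = [w for w in split_identifier(ident) if w not in STOPWORDS and not w.isdigit()]
        phrase = " ".join(words)
        if phrase and phrase not in seen:
            seen.add(phrase)
            phrases.append(phrase)
    return phrases[:limit]

def table_names(conn: sqlite3.Connection) -> list[str]:
    """Table names from a SQLite schema (stand-in for the production database)."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    return [row[0] for row in rows]
```

At deploy time you would concatenate the table names with the tool names from the agent's tool registry. You would then feed the result both to ASR biasing and to a review queue for TTS pronunciations.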
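```python
# Deepgram domain biasing (pre-recorded REST endpoint)
import requests

def transcribe(audio_bytes: bytes, api_key: str) -> dict:
    # Nova-3 uses keyterm prompting (no per-term weights); keywords=term:weight is for Nova-2 and earlier
    params = [("model", "nova-3"), ("language", "en-US"),
              ("keyterm", "metformin"), ("keyterm", "Keytruda"), ("keyterm", "prior auth")]
    headers = {"Authorization": f"Token {api_key}", "Content-Type": "audio/wav"}
    resp = requests.post("https://api.deepgram.com/v1/listen", params=params, headers=headers, data=audio_bytes)
    resp.raise_for_status()
    return resp.json()
```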
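In production, the term list usually comes from data rather than a literal. Here is a sketch that turns a weighted {term: weight} map into query parameters for Deepgram's /v1/listen endpoint. It assumes Deepgram's documented split: Nova-3 takes unweighted keyterm prompts, while Nova-2 and older models take keywords with a term:weight intensifier. The function name and the max_terms cap are hypothetical, and the cap is not a documented Deepgram limit.

```python
from urllib.parse import urlencode

def deepgram_biasing_params(model: str, terms: dict[str, int], max_terms: int = 100) -> list[tuple[str, str]]:
    """Build /v1/listen query params from a {term: weight} map.

    Nova-3 models get unweighted keyterm prompts; older models get keywords=term:weight.
    max_terms is an assumed cap, not a documented Deepgram limit.
    """
    # Highest-weight terms first; sorted() is stable, so ties keep insertion order.
    ranked = sorted(terms.items(), key=lambda kv: kv[1], reverse=True)[:max_terms]
    params = [("model", model)]
    if model.startswith("nova-3"):
        params += [("keyterm", term) for term, _ in ranked]
    else:
        params += [("keywords", f"{term}:{weight}") for term, weight in ranked]
    return params
```

For example, `urlencode(deepgram_biasing_params("nova-3", {"metformin": 5, "Keytruda": 5, "prior auth": 3}))` produces `model=nova-3&keyterm=metformin&keyterm=Keytruda&keyterm=prior+auth`. The weights still decide which terms survive the cap, even when the model ignores them.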
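```python
# OpenAI Realtime (beta session schema): there is no lexicon API, so
# pronunciation guidance goes into the session instructions.
import json

session_update = {
    "type": "session.update",
    "session": {
        "input_audio_transcription": {"model": "whisper-1"},  # transcripts of caller audio
        "instructions": "Pronounce 'Olaplex' as oh-LAP-lex.",
    },
}
ws.send(json.dumps(session_update))  # ws: an already-open Realtime WebSocket connection
```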
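The Realtime snippet hardcodes a single pronunciation into the instructions. With dozens of terms, it can help to keep the lexicon as data and render it into the instructions string at session start. This is a minimal sketch. The function name, the "Pronunciation guide" wording, and the metformin respelling are hypothetical; only the Olaplex entry comes from the snippet.

```python
def pronunciation_instructions(lexicon: dict[str, str]) -> str:
    """Render term -> respelling pairs as a bullet list the model can follow."""
    # Sort case-insensitively so the prompt is stable across deploys.
    lines = [
        f"- Say '{term}' as {respelling}."
        for term, respelling in sorted(lexicon.items(), key=lambda kv: kv[0].lower())
    ]
    return "Pronunciation guide (follow exactly):\n" + "\n".join(lines)

# Hypothetical entries; only the Olaplex respelling comes from the post.
LEXICON = {"Olaplex": "oh-LAP-lex", "metformin": "met-FOR-min"}
```

You would append the result to the agent's base prompt, e.g. `instructions = base_prompt + "\n\n" + pronunciation_instructions(LEXICON)`. This is guidance to the model rather than a hard override, so it is worth spot-checking the audio.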
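```python
# Acoustic LoRA fine-tune of Whisper for telephony audio
from peft import LoraConfig, get_peft_model
from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v3")
lora_config = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"], lora_dropout=0.05)
model = get_peft_model(model, lora_config)  # freezes the base model, adds LoRA adapters
model.print_trainable_parameters()          # only the adapter weights are trainable
# Train on (audio, transcript) pairs where the audio was round-tripped through
# 8 kHz mu-law and resampled to Whisper's 16 kHz input rate.
```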
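The training comment calls for mu-law-resampled audio. One way to produce it from clean 16 kHz recordings is to simulate the phone channel:

1. Low-pass filter and downsample to 8 kHz.
2. Apply 8-bit mu-law companding.
3. Upsample back to 16 kHz for Whisper.

This is a numpy-only sketch under stated assumptions. It uses the continuous mu-law curve (μ = 255) rather than G.711's exact segmented code table, and the 63-tap Hamming windowed-sinc filter is an arbitrary choice.

```python
import numpy as np

MU = 255.0  # mu-law constant used by G.711 in North America and Japan

def mulaw_encode(x: np.ndarray) -> np.ndarray:
    """Compand float audio in [-1, 1] to 8-bit codes (continuous mu-law approximation)."""
    x = np.clip(x, -1.0, 1.0)
    y = np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)
    return np.round((y + 1.0) / 2.0 * MU).astype(np.uint8)

def mulaw_decode(codes: np.ndarray) -> np.ndarray:
    """Invert mulaw_encode back to float audio in [-1, 1]."""
    y = codes.astype(np.float64) / MU * 2.0 - 1.0
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(MU)) / MU

def lowpass_taps(cutoff: float, num_taps: int = 63) -> np.ndarray:
    """Hamming-windowed sinc low-pass; cutoff is a fraction of Nyquist (0..1)."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = cutoff * np.sinc(cutoff * n) * np.hamming(num_taps)
    return h / h.sum()  # unity gain at DC

def telephony_roundtrip(audio_16k: np.ndarray) -> np.ndarray:
    """16 kHz float audio -> simulated 8 kHz mu-law phone channel -> 16 kHz."""
    h = lowpass_taps(0.5)  # 4 kHz cutoff at 16 kHz, the Nyquist limit of 8 kHz audio
    narrow = np.convolve(audio_16k, h, mode="same")[::2]  # anti-alias, then decimate to 8 kHz
    narrow = mulaw_decode(mulaw_encode(narrow))            # 8-bit companding loss
    up = np.zeros(2 * len(narrow))
    up[::2] = narrow                                       # zero-stuff back to 16 kHz
    wide = 2.0 * np.convolve(up, h, mode="same")           # interpolate; x2 restores the level
    return wide[: len(audio_16k)]
```

Because the filter is symmetric and applied with `mode="same"`, the output stays time-aligned with the input. That alignment matters if you ever train with word-level timestamps.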
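Q: How big a vocabulary list is too big? Most ASR keyword-boosting features handle 500–5,000 keywords cleanly; past 5,000 you start eroding general performance. Prompt-style biasing is far tighter: Whisper only considers the last 224 tokens of its prompt.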
Q: Vocabulary biasing vs acoustic LoRA? Use biasing for terms and LoRA for accent, channel, and dialect. They're orthogonal, so combine them.
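Q: Does this work with Whisper? Whisper supports prompt-based biasing: the initial_prompt argument in the open-source openai-whisper package, or the prompt parameter in OpenAI's hosted transcription API. Acoustic LoRA needs the open weights, typically fine-tuned with Hugging Face Transformers + PEFT.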
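Prompt-based biasing with the open-source openai-whisper package can be sketched as follows. It builds a short glossary prompt from priority-ordered terms under a word budget, because Whisper only attends to roughly the last 224 prompt tokens. The word count is a crude stand-in for a token count. The helper names, the "Glossary:" wording, and max_words=150 are assumptions.

```python
def build_initial_prompt(terms: list[str], max_words: int = 150) -> str:
    """Join priority-ordered domain terms into a glossary prompt under a word budget."""
    kept, used = [], 0
    for term in terms:
        n = len(term.split())
        if used + n > max_words:
            break  # stop at the first term that would exceed the budget
        kept.append(term)
        used += n
    return "Glossary: " + ", ".join(kept) + "." if kept else ""

def transcribe_with_bias(path: str, terms: list[str]) -> str:
    """Transcribe a file with openai-whisper, biased toward `terms` via initial_prompt."""
    import whisper  # openai-whisper, imported lazily so the prompt helper runs without it
    model = whisper.load_model("large-v3")  # in a service, load once and reuse
    result = model.transcribe(path, initial_prompt=build_initial_prompt(terms) or None)
    return result["text"]
```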
Q: How do I measure WER per vertical? Hold out 200 transcribed-and-corrected calls per vertical and compute WER weekly. Track per-term error rate too.
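WER is (S + D + I) / N: substitutions, deletions, and insertions in a word-level alignment, divided by the number of reference words. The sketch below computes it at corpus level over held-out (reference, hypothesis) pairs by summing edits and dividing by total reference words, rather than averaging per-call WER. It also reports a per-term error rate. That metric is a simple bag-of-words approximation (reference occurrences of a term missing from the hypothesis), not an alignment-based measure. The function names and the normalization rule are hypothetical.

```python
import re

def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance: substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,               # deletion
                         cur[j - 1] + 1,            # insertion
                         prev[j - 1] + (r != h))    # substitution or match
        prev = cur
    return prev[-1]

def normalize(text: str) -> list[str]:
    """Lowercase and keep word characters; punctuation is dropped before scoring."""
    return re.findall(r"[a-z0-9']+", text.lower())

def corpus_wer(pairs: list[tuple[str, str]]) -> float:
    """(reference, hypothesis) pairs -> total edits / total reference words."""
    edits = sum(edit_distance(normalize(r), normalize(h)) for r, h in pairs)
    words = sum(len(normalize(r)) for r, _ in pairs)
    return edits / max(words, 1)

def term_error_rates(pairs: list[tuple[str, str]], terms: list[str]) -> dict[str, float]:
    """Share of reference occurrences of each term missing from the hypothesis."""
    rates: dict[str, float] = {}
    for term in terms:
        pat = re.compile(r"\b" + re.escape(term.lower()) + r"\b")
        ref_n = miss = 0
        for r, h in pairs:
            n_ref = len(pat.findall(r.lower()))
            ref_n += n_ref
            miss += max(0, n_ref - len(pat.findall(h.lower())))
        if ref_n:  # skip terms that never occur in the references
            rates[term] = miss / ref_n
    return rates
```

Running both functions weekly on each vertical's held-out set gives one headline number plus a short list of terms that need biasing or lexicon work.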
Q: When to skip acoustic LoRA? If your audio channel is clean and your accent distribution matches the base model's training data, biasing alone is enough.
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.