Multi-Language Voice Agents with Handoffs
Build multi-language voice agents that detect the caller's language, perform agent handoffs between language-specific specialists, and maintain context across language transitions.
The Multilingual Challenge
Businesses that serve diverse populations need voice agents that work in multiple languages. A property management company in a multicultural city might receive calls in English, Spanish, Mandarin, and Hindi within the same hour. A healthcare hotline serving immigrant communities must understand patients regardless of which language they speak.
Building a single voice agent that handles all languages equally well is harder than it sounds. Each language has different speech patterns, politeness conventions, sentence structures, and cultural expectations. The most effective architecture uses language-specific specialist agents with intelligent handoffs between them.
Architecture: Language Router with Specialist Agents
The pattern is straightforward: a front-door agent detects the caller's language and hands off to the appropriate specialist. Each specialist is tuned for its language — with culturally appropriate greetings, instructions in the target language, and language-specific tools.
flowchart TD
START["Multi-Language Voice Agents with Handoffs"] --> A
A["The Multilingual Challenge"]
A --> B
B["Architecture: Language Router with Spec…"]
B --> C
C["Language Detection Strategies"]
C --> D
D["Building Language-Specific Agents"]
D --> E
E["Implementing the Handoff"]
E --> F
F["Maintaining Context Across Language Swi…"]
F --> G
G["Voice and TTS Considerations"]
G --> H
H["Testing Multilingual Voice Agents"]
H --> DONE["Key Takeaways"]
style START fill:#4f46e5,stroke:#4338ca,color:#fff
style DONE fill:#059669,stroke:#047857,color:#fff
┌──────────────┐
│ Incoming │
│ Call │
└──────┬───────┘
│
┌──────▼───────┐
│ Language │
│ Router │
│ Agent │
└──────┬───────┘
│ handoff based on detected language
┌────┼────┬────────┐
│ │ │ │
┌─▼──┐┌▼──┐┌▼──────┐┌▼──────┐
│ EN ││ ES ││ ZH ││ HI │
│Agent││Agent││ Agent ││ Agent │
└────┘└────┘└───────┘└───────┘
Language Detection Strategies
Before you can route to the right specialist, you need to identify the language. There are three approaches, each with tradeoffs.
Strategy 1: Ask the Caller
The simplest approach is a multilingual greeting that asks the caller to state their preferred language:
from agents import Agent
language_router = Agent(
name="LanguageRouter",
instructions="""You are a multilingual receptionist. Greet the caller with:
"Hello! Welcome to Acme Services.
Para espanol, diga 'espanol'.
For English, say 'English'.
Mandarin, qing shuo 'zhongwen'.
Hindi ke liye, 'Hindi' kahein."
Listen to the caller's response and determine which language they prefer.
If they respond in a specific language without explicitly choosing,
use that language. Hand off to the appropriate language specialist.""",
)
Strategy 2: Automatic Language Identification
Use the speech-to-text transcription to detect the language automatically. The OpenAI Realtime API transcribes audio and can indicate the detected language:
See AI Voice Agents Handle Real Calls
Book a free demo or calculate how much you can save with AI voice automation.
import json
from typing import Optional
class LanguageDetector:
"""Detect language from the first few seconds of speech."""
CONFIDENCE_THRESHOLD = 0.7
SUPPORTED_LANGUAGES = {"en", "es", "zh", "hi", "fr", "ar", "pt"}
def __init__(self):
self._samples: list[str] = []
self._detected: Optional[str] = None
def on_transcript(self, transcript: str, language_code: str, confidence: float):
"""Process a transcript chunk with language detection metadata."""
self._samples.append(language_code)
if confidence >= self.CONFIDENCE_THRESHOLD:
self._detected = language_code
return self._detected
# After 3 samples, use majority vote
if len(self._samples) >= 3:
from collections import Counter
most_common = Counter(self._samples).most_common(1)[0][0]
if most_common in self.SUPPORTED_LANGUAGES:
self._detected = most_common
return self._detected
return None
@property
def language(self) -> Optional[str]:
return self._detected
Strategy 3: Hybrid Approach
The most robust approach combines automatic detection with explicit confirmation. Detect the language automatically, greet the caller in that language, and confirm:
async def hybrid_language_detection(ws, detector: LanguageDetector):
"""Detect language and confirm with the caller."""
GREETINGS = {
"en": "Hello! I detected you are speaking English. Is that correct?",
"es": "Hola! He detectado que habla espanol. Es correcto?",
"zh": "Ni hao! Wo jiance dao nin shuo zhongwen. Dui ma?",
"hi": "Namaste! Mujhe lagta hai aap Hindi mein bol rahe hain. Kya yah sahi hai?",
}
detected = detector.language
if detected and detected in GREETINGS:
greeting = GREETINGS[detected]
else:
greeting = (
"Hello! Which language would you prefer? "
"English, Espanol, Zhongwen, or Hindi?"
)
await ws.send(json.dumps({
"type": "conversation.item.create",
"item": {
"type": "message",
"role": "assistant",
"content": [{"type": "input_text", "text": greeting}],
},
}))
await ws.send(json.dumps({"type": "response.create"}))
Building Language-Specific Agents
Each specialist agent is configured with language-appropriate instructions, voice, and tools:
flowchart TD
CENTER(("Core Concepts"))
CENTER --> N0["Language detection accuracy — Test with…"]
CENTER --> N1["Handoff correctness — Verify the right …"]
CENTER --> N2["Context preservation — Ensure account d…"]
CENTER --> N3["Fallback behavior — Test with unsupport…"]
CENTER --> N4["Mixed-language input — Some callers mix…"]
style CENTER fill:#4f46e5,stroke:#4338ca,color:#fff
from agents import Agent, function_tool
@function_tool
async def lookup_account(account_id: str) -> str:
"""Look up customer account details by ID."""
# Database lookup logic
return f"Account {account_id}: Active, balance $150.00"
@function_tool
async def schedule_appointment(
date: str, time: str, service_type: str, language: str
) -> str:
"""Schedule a service appointment with language preference noted."""
return (
f"Appointment scheduled for {date} at {time} "
f"for {service_type}. Language preference: {language}"
)
english_agent = Agent(
name="EnglishSpecialist",
instructions="""You are a customer service agent. Speak only in English.
Be professional and helpful. Use clear, simple language.
Always confirm important details before taking actions.""",
tools=[lookup_account, schedule_appointment],
)
spanish_agent = Agent(
name="SpanishSpecialist",
instructions="""Eres un agente de servicio al cliente. Habla solo en espanol.
Se profesional y amable. Usa un lenguaje claro y sencillo.
Siempre confirma los detalles importantes antes de tomar acciones.
Use 'usted' for formal address unless the caller uses 'tu' first.""",
tools=[lookup_account, schedule_appointment],
)
mandarin_agent = Agent(
name="MandarinSpecialist",
instructions="""你是一位客户服务代理。请只使用中文交流。
保持专业和友好。使用清晰简洁的语言。
在执行任何操作之前,请务必确认重要细节。
Use formal register (您) unless the caller uses informal (你).""",
tools=[lookup_account, schedule_appointment],
)
hindi_agent = Agent(
name="HindiSpecialist",
instructions="""Aap ek customer service agent hain. Sirf Hindi mein baat karein.
Professional aur madad karne wale banein. Saaf aur seedhi bhasha ka istemal karein.
Koi bhi action lene se pehle zaroori details confirm karein.
Use respectful 'aap' form throughout the conversation.""",
tools=[lookup_account, schedule_appointment],
)
Implementing the Handoff
The OpenAI Agents SDK supports handoffs natively. The language router agent uses handoffs to transfer control to the appropriate specialist:
from agents import Agent
language_router = Agent(
name="LanguageRouter",
instructions="""You are a language routing agent. Your only job is to:
1. Detect the caller's preferred language
2. Hand off to the correct language specialist
Do NOT attempt to answer questions yourself.
Greet the caller briefly in a multilingual way, detect their language,
and perform the handoff immediately.
Supported languages: English, Spanish, Mandarin, Hindi.
If the language is not supported, hand off to the English specialist
and let them know the caller may need an interpreter.""",
handoffs=[english_agent, spanish_agent, mandarin_agent, hindi_agent],
)
When the router detects that a caller is speaking Spanish, it hands off to the SpanishSpecialist. The handoff transfers the full conversation context, so the specialist knows what has been said so far.
Maintaining Context Across Language Switches
Sometimes a caller switches languages mid-conversation, or asks to be transferred to a different language specialist. You need to preserve context across these transitions:
from dataclasses import dataclass, field
from typing import Optional
@dataclass
class MultilingualContext:
caller_id: str
detected_language: str
confirmed_language: Optional[str] = None
conversation_summary: str = ""
account_id: Optional[str] = None
pending_actions: list = field(default_factory=list)
language_switches: list = field(default_factory=list)
def switch_language(self, new_language: str, reason: str):
"""Record a language switch with context."""
self.language_switches.append({
"from": self.confirmed_language or self.detected_language,
"to": new_language,
"reason": reason,
"summary_at_switch": self.conversation_summary,
})
self.confirmed_language = new_language
def handoff_context(self) -> str:
"""Generate context string for the receiving agent."""
parts = [f"Caller ID: {self.caller_id}"]
if self.account_id:
parts.append(f"Account: {self.account_id}")
if self.conversation_summary:
parts.append(f"Conversation so far: {self.conversation_summary}")
if self.pending_actions:
parts.append(f"Pending actions: {', '.join(self.pending_actions)}")
if self.language_switches:
last_switch = self.language_switches[-1]
parts.append(
f"Switched from {last_switch['from']} because: "
f"{last_switch['reason']}"
)
return "\n".join(parts)
Voice and TTS Considerations
Each language may need a different TTS voice for natural-sounding output. Configure this per specialist:
LANGUAGE_VOICE_MAP = {
"en": {"voice": "alloy", "speed": 1.0},
"es": {"voice": "nova", "speed": 0.95},
"zh": {"voice": "shimmer", "speed": 0.9},
"hi": {"voice": "echo", "speed": 0.95},
}
async def configure_voice_for_language(ws, language: str):
"""Update the Realtime API session voice for the target language."""
config = LANGUAGE_VOICE_MAP.get(language, LANGUAGE_VOICE_MAP["en"])
await ws.send(json.dumps({
"type": "session.update",
"session": {
"voice": config["voice"],
"modalities": ["text", "audio"],
},
}))
Testing Multilingual Voice Agents
Testing multilingual agents requires care. Automated tests should cover:
- Language detection accuracy — Test with audio samples in each supported language
- Handoff correctness — Verify the right specialist receives the call
- Context preservation — Ensure account details survive language switches
- Fallback behavior — Test with unsupported languages to verify graceful degradation
- Mixed-language input — Some callers mix languages (code-switching); verify the agent does not break
Multilingual voice agents unlock global reach for businesses. The language router pattern with specialist handoffs keeps each agent focused and high-quality rather than trying to make a single agent do everything in every language.
Written by
CallSphere Team
Expert insights on AI voice agents and customer communication automation.
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available -- no signup required.