
Multi-Language Voice Agents with Handoffs

Build multi-language voice agents that detect the caller's language, perform agent handoffs between language-specific specialists, and maintain context across language transitions.

The Multilingual Challenge

Businesses that serve diverse populations need voice agents that work in multiple languages. A property management company in a multicultural city might receive calls in English, Spanish, Mandarin, and Hindi within the same hour. A healthcare hotline serving immigrant communities must understand patients regardless of which language they speak.

Building a single voice agent that handles all languages equally well is harder than it sounds. Each language has different speech patterns, politeness conventions, sentence structures, and cultural expectations. The most effective architecture uses language-specific specialist agents with intelligent handoffs between them.

Architecture: Language Router with Specialist Agents

The pattern is straightforward: a front-door agent detects the caller's language and hands off to the appropriate specialist. Each specialist is tuned for its language — with culturally appropriate greetings, instructions in the target language, and language-specific tools.

            ┌──────────────┐
            │   Incoming   │
            │     Call     │
            └──────┬───────┘
                   │
            ┌──────▼───────┐
            │   Language   │
            │    Router    │
            │    Agent     │
            └──────┬───────┘
                   │ handoff based on detected language
    ┌─────────┬────┴────┬─────────┐
    │         │         │         │
┌───▼───┐ ┌───▼───┐ ┌───▼───┐ ┌───▼───┐
│  EN   │ │  ES   │ │  ZH   │ │  HI   │
│ Agent │ │ Agent │ │ Agent │ │ Agent │
└───────┘ └───────┘ └───────┘ └───────┘
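
The routing decision in the diagram can be sketched as a plain lookup with a fallback. This is a minimal illustration, not part of any SDK: the names `SPECIALISTS`, `normalize_language_code`, and `route` are hypothetical, and it assumes the detector reports ISO 639-1 codes, possibly with a region suffix such as `es-MX`.

```python
from typing import Optional

# Hypothetical routing table: ISO 639-1 code -> specialist agent name.
SPECIALISTS = {
    "en": "EnglishSpecialist",
    "es": "SpanishSpecialist",
    "zh": "MandarinSpecialist",
    "hi": "HindiSpecialist",
}
FALLBACK_LANGUAGE = "en"


def normalize_language_code(code: Optional[str]) -> Optional[str]:
    """Reduce a tag such as 'es-MX' or 'zh_CN' to its primary subtag ('es', 'zh')."""
    if not code:
        return None
    primary = code.strip().replace("_", "-").split("-")[0].lower()
    return primary or None


def route(code: Optional[str]) -> tuple[str, bool]:
    """Return (specialist name, used_fallback) for a detected language code."""
    primary = normalize_language_code(code)
    if primary in SPECIALISTS:
        return SPECIALISTS[primary], False
    # Unsupported or missing language: fall back and report it,
    # so the receiving agent can be told an interpreter may be needed.
    return SPECIALISTS[FALLBACK_LANGUAGE], True


print(route("es-MX"))
print(route("fr"))
```

Returning the fallback flag alongside the name keeps the "unsupported language" case visible to the caller of `route` instead of silently treating it as English.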

Language Detection Strategies

Before you can route to the right specialist, you need to identify the language. There are three approaches, each with tradeoffs.

Strategy 1: Ask the Caller

The simplest approach is a multilingual greeting that asks the caller to state their preferred language:

from agents import Agent

language_router = Agent(
    name="LanguageRouter",
    instructions="""You are a multilingual receptionist. Greet the caller with:

"Hello! Welcome to Acme Services.
Para español, diga 'español'.
For English, say 'English'.
Mandarin, qing shuo 'zhongwen'.
Hindi ke liye, 'Hindi' kahein."

Listen to the caller's response and determine which language they prefer.
If they respond in a specific language without explicitly choosing,
use that language. Hand off to the appropriate language specialist.""",
)
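
If you would rather not leave the interpretation of the caller's answer entirely to the model, a small keyword matcher can act as a cross-check. The sketch below is an assumption-laden illustration: the keyword table and the function names `parse_language_choice` and `_fold` are hypothetical, and real transcripts will contain many more variants.

```python
import unicodedata
from typing import Optional

# Hypothetical keyword table; extend it with what your callers actually say.
LANGUAGE_KEYWORDS = {
    "en": ("english", "ingles"),
    "es": ("espanol", "spanish"),
    "zh": ("zhongwen", "mandarin", "chinese", "中文", "普通话"),
    "hi": ("hindi", "हिंदी", "हिन्दी"),
}


def _fold(text: str) -> str:
    """Casefold and drop combining marks so 'Español' matches 'espanol'."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def parse_language_choice(transcript: str) -> Optional[str]:
    """Return a language code if exactly one language is named, else None."""
    folded = _fold(transcript)
    matches = {
        code
        for code, keywords in LANGUAGE_KEYWORDS.items()
        # Keywords are folded the same way as the transcript for a fair comparison.
        if any(_fold(keyword) in folded for keyword in keywords)
    }
    # Zero or several languages named: treat as ambiguous and ask again.
    return matches.pop() if len(matches) == 1 else None


print(parse_language_choice("Español, por favor"))
print(parse_language_choice("English or Spanish?"))
```

Treating a reply that names two languages as ambiguous is a deliberate choice here: re-asking costs a few seconds, while routing to the wrong specialist costs the whole call.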

Strategy 2: Automatic Language Identification

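Use the speech-to-text transcription to detect the language automatically. Some transcription services report a language code, and sometimes a confidence score, with each transcript; check what your pipeline (for example, the OpenAI Realtime API's input transcription) actually returns before relying on it. The detector below assumes both values are available: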


import json
from collections import Counter
from typing import Optional

class LanguageDetector:
    """Detect language from the first few seconds of speech."""

    CONFIDENCE_THRESHOLD = 0.7
    MIN_SAMPLES_FOR_VOTE = 3
    SUPPORTED_LANGUAGES = {"en", "es", "zh", "hi", "fr", "ar", "pt"}

    def __init__(self):
        self._samples: list[str] = []
        self._detected: Optional[str] = None

    def on_transcript(
        self, transcript: str, language_code: str, confidence: float
    ) -> Optional[str]:
        """Process a transcript chunk with language detection metadata.

        Returns the detected language code once one is settled, else None.
        """
        self._samples.append(language_code)

        # A single confident sample settles it, but only for a supported language
        if (
            confidence >= self.CONFIDENCE_THRESHOLD
            and language_code in self.SUPPORTED_LANGUAGES
        ):
            self._detected = language_code
            return self._detected

        # Otherwise, after 3 samples, use majority vote
        if len(self._samples) >= self.MIN_SAMPLES_FOR_VOTE:
            most_common = Counter(self._samples).most_common(1)[0][0]
            if most_common in self.SUPPORTED_LANGUAGES:
                self._detected = most_common
                return self._detected

        return None

    @property
    def language(self) -> Optional[str]:
        return self._detected
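
A plain majority vote counts a low-confidence sample as heavily as a high-confidence one. One possible variation, sketched here as a standalone function rather than a change to the class above, weights each sample by its confidence. The name `weighted_language_vote` and the `min_share` parameter are hypothetical.

```python
from collections import defaultdict
from typing import Optional


def weighted_language_vote(
    samples: list[tuple[str, float]],
    supported: set[str],
    min_share: float = 0.6,
) -> Optional[str]:
    """Pick the language whose summed confidence dominates the samples.

    samples: (language_code, confidence) pairs, one per transcript chunk.
    Returns None when no supported language holds at least `min_share`
    of the total confidence, so the caller can fall back to asking.
    """
    totals: dict[str, float] = defaultdict(float)
    for code, confidence in samples:
        totals[code] += max(confidence, 0.0)  # ignore negative scores

    total = sum(totals.values())
    if total == 0:
        return None

    best = max(totals, key=totals.get)
    if best in supported and totals[best] / total >= min_share:
        return best
    return None


chunks = [("es", 0.6), ("en", 0.3), ("es", 0.5)]
print(weighted_language_vote(chunks, {"en", "es", "zh", "hi"}))
```

Returning `None` for a weak or split result pairs naturally with the explicit question from Strategy 1.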

Strategy 3: Hybrid Approach

The most robust approach combines automatic detection with explicit confirmation. Detect the language automatically, greet the caller in that language, and confirm:

async def hybrid_language_detection(ws, detector: LanguageDetector):
    """Detect language and confirm with the caller."""
    GREETINGS = {
        "en": "Hello! I detected you are speaking English. Is that correct?",
        "es": "¡Hola! He detectado que habla español. ¿Es correcto?",
        "zh": "Ni hao! Wo jiance dao nin shuo zhongwen. Dui ma?",
        "hi": "Namaste! Mujhe lagta hai aap Hindi mein bol rahe hain. Kya yah sahi hai?",
    }

    detected = detector.language
    if detected and detected in GREETINGS:
        greeting = GREETINGS[detected]
    else:
        greeting = (
            "Hello! Which language would you prefer? "
            "English, Espanol, Zhongwen, or Hindi?"
        )

    # Ask the model to speak the greeting. Adding it as an assistant message
    # would only place it in the conversation history; it would not be spoken.
    await ws.send(json.dumps({
        "type": "response.create",
        "response": {
            "instructions": f"Say exactly this to the caller: {greeting}",
        },
    }))
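
The confirmation step also needs a way to read the caller's answer. The following is a rough sketch under stated assumptions: the word lists and the function `interpret_confirmation` are hypothetical, matching is on whole tokens only, and anything unclear returns `None` so the agent can ask again.

```python
import re
import unicodedata
from typing import Optional

# Hypothetical word lists; real callers will say far more than this.
AFFIRMATIVE = {
    "en": {"yes", "yeah", "correct", "right"},
    "es": {"si", "sí", "correcto", "claro"},
    "zh": {"对", "是", "是的", "dui", "shi"},
    "hi": {"haan", "ji", "sahi"},
}
NEGATIVE = {
    "en": {"no", "nope", "wrong"},
    "es": {"no", "incorrecto"},
    "zh": {"不", "不是", "不对", "bu"},
    "hi": {"nahi", "nahin", "galat"},
}

# Split on whitespace and common Latin, CJK, and Devanagari punctuation.
_SEPARATORS = re.compile(r"[\s.,!?¿¡。,!?、।]+")


def interpret_confirmation(reply: str, language: str) -> Optional[bool]:
    """Return True for yes, False for no, None if unclear or contradictory."""
    text = unicodedata.normalize("NFC", reply).casefold()
    tokens = {token for token in _SEPARATORS.split(text) if token}
    said_yes = bool(tokens & AFFIRMATIVE.get(language, set()))
    said_no = bool(tokens & NEGATIVE.get(language, set()))
    if said_yes == said_no:
        return None  # neither, or both: ask again
    return said_yes


print(interpret_confirmation("Sí, correcto", "es"))
print(interpret_confirmation("No, I speak Spanish", "en"))
```

Token matching misses replies without separators, such as a Chinese sentence with no punctuation, so treat this as a first filter rather than a full solution.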

Building Language-Specific Agents

Each specialist agent is configured with language-appropriate instructions, voice, and tools:

from agents import Agent, function_tool

@function_tool
async def lookup_account(account_id: str) -> str:
    """Look up customer account details by ID."""
    # Database lookup logic
    return f"Account {account_id}: Active, balance $150.00"

@function_tool
async def schedule_appointment(
    date: str, time: str, service_type: str, language: str
) -> str:
    """Schedule a service appointment with language preference noted."""
    return (
        f"Appointment scheduled for {date} at {time} "
        f"for {service_type}. Language preference: {language}"
    )

english_agent = Agent(
    name="EnglishSpecialist",
    instructions="""You are a customer service agent. Speak only in English.
Be professional and helpful. Use clear, simple language.
Always confirm important details before taking actions.""",
    tools=[lookup_account, schedule_appointment],
)

spanish_agent = Agent(
    name="SpanishSpecialist",
    instructions="""Eres un agente de servicio al cliente. Habla solo en español.
Sé profesional y amable. Usa un lenguaje claro y sencillo.
Siempre confirma los detalles importantes antes de tomar acciones.
Use 'usted' for formal address unless the caller uses 'tu' first.""",
    tools=[lookup_account, schedule_appointment],
)

mandarin_agent = Agent(
    name="MandarinSpecialist",
    instructions="""你是一位客户服务代理。请只使用中文交流。
保持专业和友好。使用清晰简洁的语言。
在执行任何操作之前,请务必确认重要细节。
Use formal register (您) unless the caller uses informal (你).""",
    tools=[lookup_account, schedule_appointment],
)

hindi_agent = Agent(
    name="HindiSpecialist",
    instructions="""Aap ek customer service agent hain. Sirf Hindi mein baat karein.
Professional aur madad karne wale banein. Saaf aur seedhi bhasha ka istemal karein.
Koi bhi action lene se pehle zaroori details confirm karein.
Use respectful 'aap' form throughout the conversation.""",
    tools=[lookup_account, schedule_appointment],
)
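
Four near-identical agent definitions tend to drift apart as languages are added. One way to keep them aligned, sketched here with plain data structures, is to describe each specialist in a table and generate the constructor arguments from it. `SpecialistSpec`, `agent_kwargs`, and `specialists_by_code` are hypothetical names, and the resulting dictionaries are only assumed to be passed on as `Agent(**kwargs)`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecialistSpec:
    code: str          # ISO 639-1 language code
    name: str          # agent name
    instructions: str  # system prompt written in the target language


SPECS = [
    SpecialistSpec(
        "en", "EnglishSpecialist",
        "You are a customer service agent. Speak only in English.",
    ),
    SpecialistSpec(
        "es", "SpanishSpecialist",
        "Eres un agente de servicio al cliente. Habla solo en español.",
    ),
]


def agent_kwargs(spec: SpecialistSpec, tools: list) -> dict:
    """Build keyword arguments that could be passed to Agent(**kwargs)."""
    return {
        "name": spec.name,
        "instructions": spec.instructions,
        "tools": list(tools),  # copy so agents do not share one list object
    }


def specialists_by_code(specs: list[SpecialistSpec], tools: list) -> dict[str, dict]:
    """Index the generated kwargs by language code, rejecting duplicates."""
    result: dict[str, dict] = {}
    for spec in specs:
        if spec.code in result:
            raise ValueError(f"duplicate specialist for language {spec.code!r}")
        result[spec.code] = agent_kwargs(spec, tools)
    return result


configs = specialists_by_code(SPECS, tools=[])
print(sorted(configs))
```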

Implementing the Handoff

The OpenAI Agents SDK supports handoffs natively. The language router agent uses handoffs to transfer control to the appropriate specialist:

from agents import Agent

language_router = Agent(
    name="LanguageRouter",
    instructions="""You are a language routing agent. Your only job is to:
1. Detect the caller's preferred language
2. Hand off to the correct language specialist

Do NOT attempt to answer questions yourself.
Greet the caller briefly in a multilingual way, detect their language,
and perform the handoff immediately.

Supported languages: English, Spanish, Mandarin, Hindi.
If the language is not supported, hand off to the English specialist
and let them know the caller may need an interpreter.""",
    handoffs=[english_agent, spanish_agent, mandarin_agent, hindi_agent],
)

When the router detects that a caller is speaking Spanish, it hands off to the SpanishSpecialist. By default, the handoff carries the full conversation history with it, so the specialist knows what has been said so far.

Maintaining Context Across Language Switches

Sometimes a caller switches languages mid-conversation, or asks to be transferred to a different language specialist. You need to preserve context across these transitions:

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class MultilingualContext:
    caller_id: str
    detected_language: str
    confirmed_language: Optional[str] = None
    conversation_summary: str = ""
    account_id: Optional[str] = None
    pending_actions: list[str] = field(default_factory=list)
    language_switches: list[dict[str, str]] = field(default_factory=list)

    def switch_language(self, new_language: str, reason: str):
        """Record a language switch with context."""
        self.language_switches.append({
            "from": self.confirmed_language or self.detected_language,
            "to": new_language,
            "reason": reason,
            "summary_at_switch": self.conversation_summary,
        })
        self.confirmed_language = new_language

    def handoff_context(self) -> str:
        """Generate context string for the receiving agent."""
        parts = [f"Caller ID: {self.caller_id}"]
        if self.account_id:
            parts.append(f"Account: {self.account_id}")
        if self.conversation_summary:
            parts.append(f"Conversation so far: {self.conversation_summary}")
        if self.pending_actions:
            parts.append(f"Pending actions: {', '.join(self.pending_actions)}")
        if self.language_switches:
            last_switch = self.language_switches[-1]
            parts.append(
                f"Switched from {last_switch['from']} because: "
                f"{last_switch['reason']}"
            )
        return "\n".join(parts)
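
Callers who mix languages can trigger a switch on a single borrowed word. A simple guard, sketched below, is to require several consecutive utterances in the new language before switching. `LanguageSwitchDebouncer` is a hypothetical helper, and it assumes a per-utterance language code is available from the transcription step.

```python
from collections import deque
from typing import Optional


class LanguageSwitchDebouncer:
    """Suggest a switch only after `required` consecutive utterances in a new language."""

    def __init__(self, current: str, required: int = 2):
        self.current = current
        self.required = required
        self._recent: deque[str] = deque(maxlen=required)

    def observe(self, utterance_language: str) -> Optional[str]:
        """Record one utterance; return the new language if a switch is warranted."""
        self._recent.append(utterance_language)
        if len(self._recent) < self.required:
            return None
        candidate = self._recent[0]
        if candidate != self.current and all(
            lang == candidate for lang in self._recent
        ):
            self.current = candidate
            self._recent.clear()
            return candidate
        return None


debouncer = LanguageSwitchDebouncer(current="en", required=2)
for lang in ["es", "en", "es", "es"]:
    print(lang, "->", debouncer.observe(lang))
```

When `observe` returns a language, that is the point at which a context object such as `MultilingualContext` would record the switch and the handoff would be triggered.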

Voice and TTS Considerations

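Each language may need a different TTS voice for natural-sounding output. Configure this per specialist, and set the voice before the first audio response: the Realtime API does not allow the voice to change once the model has spoken in a session.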

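import json

# Voice names and speeds are illustrative; check the names against the
# voices your Realtime model actually supports.
LANGUAGE_VOICE_MAP = {
    "en": {"voice": "alloy", "speed": 1.0},
    "es": {"voice": "nova", "speed": 0.95},
    "zh": {"voice": "shimmer", "speed": 0.9},
    "hi": {"voice": "echo", "speed": 0.95},
}

async def configure_voice_for_language(ws, language: str):
    """Update the Realtime API session voice for the target language."""
    # Unknown languages fall back to the English voice settings
    config = LANGUAGE_VOICE_MAP.get(language, LANGUAGE_VOICE_MAP["en"])
    # Only the voice is sent here; "speed" is kept in the map but not applied
    await ws.send(json.dumps({
        "type": "session.update",
        "session": {
            "voice": config["voice"],
            "modalities": ["text", "audio"],
        },
    }))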

Testing Multilingual Voice Agents

Multilingual agents fail in ways that single-language tests miss, such as a correct answer delivered in the wrong language. Automated tests should cover:

  1. Language detection accuracy — Test with audio samples in each supported language
  2. Handoff correctness — Verify the right specialist receives the call
  3. Context preservation — Ensure account details survive language switches
  4. Fallback behavior — Test with unsupported languages to verify graceful degradation
  5. Mixed-language input — Some callers mix languages (code-switching); verify the agent does not break
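
For the first item in the list, detection accuracy is easiest to track per language rather than as one overall number. The sketch below assumes you already have labeled results as `(expected, detected)` pairs; the function names and the 0.95 threshold are hypothetical.

```python
from collections import defaultdict


def detection_accuracy(results: list[tuple[str, str]]) -> dict[str, float]:
    """Compute per-language accuracy from (expected, detected) pairs."""
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for expected, detected in results:
        total[expected] += 1
        if detected == expected:
            correct[expected] += 1
    return {lang: correct[lang] / total[lang] for lang in total}


def failing_languages(accuracy: dict[str, float], minimum: float = 0.95) -> list[str]:
    """List languages whose accuracy falls below the required minimum."""
    return sorted(lang for lang, score in accuracy.items() if score < minimum)


# Illustrative data only; real results would come from running audio samples.
results = [("en", "en"), ("en", "en"), ("es", "es"), ("es", "en"), ("hi", "hi")]
accuracy = detection_accuracy(results)
print(accuracy)
print(failing_languages(accuracy))
```

A per-language breakdown matters because a strong overall score can hide one language that is routed incorrectly most of the time.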

Multilingual voice agents unlock global reach for businesses. The language router pattern with specialist handoffs keeps each agent focused and high-quality rather than trying to make a single agent do everything in every language.

Written by

CallSphere Team
