Multi-Agent Voice Handoffs in 2026: The OpenAI Agents SDK Pattern
OpenAI Agents SDK introduced first-class voice handoffs in 2026. Manager vs decentralized patterns, session.update events, and how they work in production.
What changed
```mermaid
flowchart LR
Caller["Caller dials practice number"] --> Twilio["Twilio Programmable Voice"]
Twilio -- "Media Streams WS" --> Bridge["AI Bridge · FastAPI :8084"]
Bridge -- "PCM16 24kHz" --> Realtime["OpenAI Realtime API"]
Realtime -- "tool_call" --> Tools[("14 tools<br/>lookup · schedule · verify")]
Tools --> DB[("PostgreSQL<br/>healthcare_voice")]
Realtime --> Caller
Bridge --> Analytics[("Post-call analytics<br/>sentiment · lead score")]
```

The OpenAI Agents SDK — released as an open-source framework in early 2026 — became the opinionated answer to "how do I build a multi-agent system?" The SDK ships four core primitives: Agents, Tools, Handoffs, and Guardrails. The voice-specific track lives in the SDK because Agent Builder (the no-code product) does not yet support voice workflows.
The handoff primitive is the headline feature for voice. A handoff is a structured mechanism where one agent transfers control to another, passing along context and conversation state. Under the hood, a handoff triggers a session.update event with new instructions and tools — the WebRTC session itself does not break, only the agent persona swaps.
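For concreteness, here is a minimal sketch, in Python, of the kind of session.update payload a bridge process could push over a Realtime WebSocket to swap the active persona mid-call. The event type and session fields follow the published Realtime API event shape; the billing-specialist instructions, the lookup_invoice tool, and the WebSocket plumbing are invented for illustration.

```python
import json
import websockets  # assumed transport; the same payload works over a WebRTC data channel

# Hypothetical specialist definition, not from the article.
BILLING_AGENT = {
    "instructions": "You are the billing specialist. Resolve invoice and payment questions only.",
    "tools": [
        {
            "type": "function",
            "name": "lookup_invoice",
            "description": "Fetch an invoice by account number.",
            "parameters": {
                "type": "object",
                "properties": {"account_number": {"type": "string"}},
                "required": ["account_number"],
            },
        }
    ],
}

async def hand_off_to_billing(ws: websockets.WebSocketClientProtocol) -> None:
    """Swap the live session's persona without tearing down the audio connection."""
    await ws.send(json.dumps({
        "type": "session.update",  # Realtime API event that mutates the running session
        "session": {
            "instructions": BILLING_AGENT["instructions"],
            "tools": BILLING_AGENT["tools"],
        },
    }))
```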
OpenAI publishes two handoff patterns, both sketched in code below:
- Manager pattern — a central LLM orchestrates a network of specialized agents through tool calls, routing each turn to the right specialist.
- Decentralized pattern — agents hand off workflow execution directly to one another. Useful when one specialist agent finishes its work and explicitly passes control.
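A rough sketch of how the two patterns wire together with the SDK's Agent and handoff primitives (the agent names, instructions, and routing are illustrative, not taken from OpenAI's examples):

```python
import asyncio

from agents import Agent, Runner, handoff

# Decentralized pattern: the billing specialist declares its own exit condition
# and explicitly passes control to clinical intake when its work is done.
clinical_agent = Agent(
    name="Clinical intake",
    instructions="Collect symptoms and medication history for the care team.",
)
billing_agent = Agent(
    name="Billing",
    instructions="Handle invoice and payment questions. When finished, hand off to clinical intake.",
    handoffs=[handoff(clinical_agent)],
)

# Manager pattern: one triage agent owns routing and can hand off to any specialist.
triage_agent = Agent(
    name="Receptionist",
    instructions="Greet the caller, identify intent, then hand off to the right specialist.",
    handoffs=[billing_agent, clinical_agent],
)

async def main() -> None:
    result = await Runner.run(triage_agent, "Hi, I have a question about my last bill.")
    print(result.final_output)

asyncio.run(main())
```

In the voice track the same agent graph applies; each handoff still resolves to the session.update swap described above.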
The SDK also adds Tracing for end-to-end observability of agent chains, and Guardrails for input/output validation — a critical pairing because handoffs amplify the attack surface.
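Guardrails attach directly to agents, which makes the handoff edge a natural enforcement point: the receiving specialist validates whatever arrives before acting on it. The sketch below assumes the SDK's input_guardrail decorator and GuardrailFunctionOutput interface; the regex check on raw card numbers is a placeholder, not a production PCI control.

```python
from __future__ import annotations

import re

from agents import Agent, GuardrailFunctionOutput, RunContextWrapper, input_guardrail

# Hypothetical check: refuse to let raw card numbers cross a handoff edge.
CARD_PATTERN = re.compile(r"\b\d{13,19}\b")

@input_guardrail
async def no_raw_card_numbers(
    ctx: RunContextWrapper[None], agent: Agent, guardrail_input: str | list
) -> GuardrailFunctionOutput:
    text = guardrail_input if isinstance(guardrail_input, str) else str(guardrail_input)
    return GuardrailFunctionOutput(
        output_info="card-number scan",
        tripwire_triggered=bool(CARD_PATTERN.search(text)),  # trips the guardrail and halts the run
    )

clinical_agent = Agent(
    name="Clinical intake",
    instructions="Collect symptoms and medication history for the care team.",
    input_guardrails=[no_raw_card_numbers],
)
```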
Hear it before you finish reading
Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.
Why it matters for voice agent builders
Real voice flows almost always span multiple specialist agents:
- A receptionist agent triages, then hands off to a billing agent or a clinical-intake agent.
- A real estate qualifier agent hands off to a property-tour-booking agent once the buyer is qualified.
- A salon front-desk agent hands off to a colorist-consultation agent for technical service questions.
Three concrete benefits of the handoff primitive:
- Specialist agents can have focused, in-depth instructions. Instead of one mega-prompt covering every scenario, each specialist gets a tight 200-line system prompt, which is a measurable accuracy win.
- Tools are scoped per agent. The receptionist does not have access to billing write tools, and a smaller tool count per agent means less tool-call confusion in the LLM (see the sketch after this list).
- The WebRTC session survives handoffs. Users do not hear a "please hold while I transfer" — the voice is continuous, only the agent persona changes.
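A short sketch of that per-agent tool scoping with the SDK's function_tool decorator; the tool names and stubbed bodies are invented for illustration.

```python
from agents import Agent, function_tool

@function_tool
def lookup_invoice(account_number: str) -> str:
    """Fetch the latest invoice summary for an account (stubbed here)."""
    return f"Invoice for {account_number}: $120.00 due 2026-03-01"

@function_tool
def book_appointment(patient_name: str, slot: str) -> str:
    """Reserve an appointment slot (stubbed here)."""
    return f"Booked {slot} for {patient_name}"

# The receptionist can book but cannot touch billing; the billing agent is the reverse.
receptionist = Agent(name="Receptionist", instructions="Triage and schedule.", tools=[book_appointment])
billing = Agent(name="Billing", instructions="Resolve invoice questions.", tools=[lookup_invoice])
```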
How CallSphere applies this
This handoff pattern is the architecture of the entire CallSphere fleet. We were doing it pre-SDK; the SDK formalized what we had built bespoke.
OneRoof Real Estate runs 10 specialist agents explicitly in this pattern: a triage agent, a buyer-qualifier, a seller-intake, a tour-booker, a financing-quoter, a comparable-puller, a neighborhood-explainer, a vision-on-photos analyst, a CRM-writer, and an escalation handler. The OpenAI Agents SDK + WebRTC stack underpins them. Vision on property photos is a per-agent capability invoked from the comparable-puller and neighborhood agents.
Healthcare Voice Agent runs a manager-pattern agent with 14 scoped tools — receptionist scope. When clinical detail is needed (medication history, symptom triage), it hands off to a clinical specialist with a separate prompt and a different tool subset. Post-call sentiment scoring and lead-score calculation happen on the manager-tier transcript view (FastAPI :8084).
Salon GlamBook runs 4 agents (front-desk, booking, color-consultation, customer-service), with GB-YYYYMMDD-### booking refs persisted across handoffs.
Still reading? Stop comparing — try CallSphere live.
CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.
Across 37 agents, 90+ tools, 115+ database tables, 6 verticals, and 57+ languages (HIPAA and SOC 2 aligned), the handoff is the only realistic architecture for delivering depth without prompt bloat.
The /demo page lets you trigger handoffs live across our products at every pricing tier ($149 / $499 / $1499) during the 14-day no-card trial.
Build and migration steps
- Map your conversation into discrete agent personas. Aim for 3-10 specialists, not one mega-agent.
- Define a handoff trigger for each specialist — explicit ("when caller wants billing"), or LLM-decided via the manager.
- Implement the handoff via the SDK's handoff() primitive — it triggers a session.update with new tools and instructions.
- Persist conversation state at handoff time — the new agent should not lose context (caller name, intent so far, prior tool results); see the sketch after this list.
- Add tracing — the SDK's built-in tracing captures the handoff chain for debugging and audit.
- Add guardrails on every handoff edge — never trust unvetted state from another agent.
- Run a 500-call eval before going live; handoff failures are subtle and only surface in real conversational data.
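For the state-persistence step, the SDK's handoff() accepts an on_handoff callback, a typed input payload, and an input filter, which together give you a hook to snapshot caller context at the moment control transfers. The field names, the print-based persistence, and the choice of filter below are assumptions, not CallSphere's implementation.

```python
from pydantic import BaseModel

from agents import Agent, RunContextWrapper, handoff
from agents.extensions import handoff_filters

class CallerContext(BaseModel):
    """Structured payload the model fills in when it decides to hand off (hypothetical fields)."""
    caller_name: str
    intent_so_far: str

async def on_clinical_handoff(ctx: RunContextWrapper[None], payload: CallerContext) -> None:
    # Persist the snapshot so the receiving agent (and post-call analytics) never lose it.
    print(f"handoff snapshot: {payload.caller_name} / {payload.intent_so_far}")

clinical_agent = Agent(name="Clinical intake", instructions="Collect symptoms and history.")

receptionist = Agent(
    name="Receptionist",
    instructions="Triage, then hand off to clinical intake with the caller's name and intent.",
    handoffs=[
        handoff(
            clinical_agent,
            on_handoff=on_clinical_handoff,
            input_type=CallerContext,                       # forces the model to pass structured context
            input_filter=handoff_filters.remove_all_tools,  # strip prior tool chatter from the new agent's view
        )
    ],
)
```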
FAQ
What is a handoff in the OpenAI Agents SDK?
A structured transfer of control from one agent to another, passing context and conversation state. Implemented at the WebRTC layer via a session.update event with new instructions and tools.
Manager pattern vs decentralized pattern — which is right?
Manager pattern is the safer default — easier to debug, easier to audit. Decentralized works when specialist agents have clear "I am done, pass to X" exit conditions.
Does the user hear the handoff?
No — the WebRTC session is continuous. The agent's persona changes (and possibly its voice), but there is no "please hold." Latency from handoff is usually under 200ms.
Can I do handoffs with tools that take a long time?
Yes — the receiving agent can fire long-running tool calls. The SDK's tracing captures the latency and you can fill the silence with verbal back-channels from the receiving agent.
How does CallSphere's 10-agent OneRoof flow handle vision on property photos?
The vision-capable agents (comparable-puller and neighborhood-explainer) get the vision tool injected into their scope at handoff time. Other agents in the chain do not have vision access — keeping tool counts focused per persona.
Sources
- OpenAI Agents SDK — Handoffs docs — https://openai.github.io/openai-agents-python/handoffs/
- OpenAI Agents SDK — orchestration guide — https://developers.openai.com/api/docs/guides/agents/orchestration
- OpenAI — A practical guide to building agents — https://openai.com/business/guides-and-resources/a-practical-guide-to-building-ai-agents/
- GitHub — openai/openai-realtime-agents — https://github.com/openai/openai-realtime-agents
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available — no signup required.