
Multi-Agent Voice Handoffs in 2026: The OpenAI Agents SDK Pattern

OpenAI Agents SDK introduced first-class voice handoffs in 2026. Manager vs decentralized patterns, session.update events, and how they work in production.


What changed

```mermaid
flowchart LR
  Caller["Caller dials practice number"] --> Twilio["Twilio Programmable Voice"]
  Twilio -- "Media Streams WS" --> Bridge["AI Bridge · FastAPI :8084"]
  Bridge -- "PCM16 24kHz" --> Realtime["OpenAI Realtime API"]
  Realtime -- "tool_call" --> Tools[("14 tools<br/>lookup · schedule · verify")]
  Tools --> DB[("PostgreSQL<br/>healthcare_voice")]
  Realtime --> Caller
  Bridge --> Analytics[("Post-call analytics<br/>sentiment · lead score")]
```

CallSphere reference architecture

The OpenAI Agents SDK — released as an open-source framework in early 2026 — became the opinionated answer to "how do I build a multi-agent system?" The SDK ships four core primitives: Agents, Tools, Handoffs, and Guardrails. The voice-specific track lives in the SDK because Agent Builder (the no-code product) does not yet support voice workflows.

The handoff primitive is the headline feature for voice. A handoff is a structured mechanism where one agent transfers control to another, passing along context and conversation state. Under the hood, a handoff triggers a session.update event with new instructions and tools — the WebRTC session itself does not break, only the agent persona swaps.
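To make the mechanism concrete, here is a sketch of the `session.update` event an agent swap might send over the Realtime connection. The event shape (`type: "session.update"` with a `session` object carrying `instructions` and `tools`) follows the public Realtime API; the instructions and the `lookup_invoice` tool are hypothetical examples, not part of any shipped product.

```python
# Illustrative session.update payload for a voice handoff: the session stays
# open; only the instructions and the tool set are swapped.

def build_handoff_event(instructions: str, tools: list[dict]) -> dict:
    """Build the session.update event that swaps the active agent persona."""
    return {
        "type": "session.update",
        "session": {
            "instructions": instructions,
            "tools": tools,
        },
    }

# Hypothetical billing-specialist persona with one scoped tool.
billing_agent_event = build_handoff_event(
    instructions="You are the billing specialist. Resolve invoice questions.",
    tools=[{
        "type": "function",
        "name": "lookup_invoice",
        "description": "Fetch an invoice by ID",
        "parameters": {
            "type": "object",
            "properties": {"invoice_id": {"type": "string"}},
            "required": ["invoice_id"],
        },
    }],
)
```

Because the payload replaces instructions and tools wholesale, the sending side must include every tool the new persona needs, not just the delta.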

OpenAI publishes two handoff patterns:

  1. Manager pattern — a central LLM orchestrates a network of specialized agents through tool calls, routing each turn to the right specialist.
  2. Decentralized pattern — agents hand off workflow execution directly to one another. Useful when one specialist agent finishes its work and explicitly passes control.
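The manager pattern's control flow can be sketched in a few lines. In the real SDK the routing decision is an LLM tool call made by the manager agent; here a keyword stub stands in so the shape of the loop is visible. Agent names and keywords are hypothetical.

```python
# Minimal sketch of the manager pattern: a central router picks the
# specialist for each turn, falling back to the default persona.

SPECIALISTS = {
    "billing": "billing_agent",
    "schedule": "scheduling_agent",
    "symptom": "clinical_agent",
}

def route_turn(utterance: str, default: str = "receptionist_agent") -> str:
    """Return the agent that should handle this turn (manager pattern)."""
    text = utterance.lower()
    for keyword, agent in SPECIALISTS.items():
        if keyword in text:
            return agent
    return default
```

The decentralized pattern inverts this: there is no central router, and each specialist decides for itself when to emit a handoff to a named successor.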

The SDK also adds Tracing for end-to-end observability of agent chains, and Guardrails for input/output validation — a critical pairing because handoffs amplify the attack surface.

Hear it before you finish reading

Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.

Try Live Demo →

Why it matters for voice agent builders

Real voice flows almost always span multiple specialist agents:

  • A receptionist agent triages, then hands off to a billing agent or a clinical-intake agent.
  • A real estate qualifier agent hands off to a property-tour-booking agent once the buyer is qualified.
  • A salon front-desk agent hands off to a colorist-consultation agent for technical service questions.

Three concrete benefits of the handoff primitive:

  1. Specialist agents can have long, focused instructions. Instead of one mega-prompt covering every scenario, each specialist has a tight 200-line system prompt. This is a measurable accuracy win.
  2. Tools are scoped per agent. The receptionist does not have access to billing write tools. Reduced tool count per agent reduces tool-call confusion in the LLM.
  3. The WebRTC session survives handoffs. Users do not hear a "please hold while I transfer" — the voice is continuous, only the agent persona changes.
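Benefit 2 — per-agent tool scoping — reduces to a small lookup at handoff time. This is an illustrative sketch; the agent and tool names are hypothetical stand-ins, not CallSphere's actual tool registry.

```python
# Sketch of per-agent tool scoping: each persona sees only its own subset,
# so the receptionist can never call billing write tools.

TOOL_SCOPES = {
    "receptionist": {"lookup_patient", "check_schedule", "transfer_call"},
    "billing": {"lookup_invoice", "update_invoice", "refund_payment"},
}

def tools_for(agent: str) -> set[str]:
    """All tools visible to an agent persona."""
    return TOOL_SCOPES.get(agent, set())

def can_call(agent: str, tool: str) -> bool:
    """Enforcement check run before dispatching any tool call."""
    return tool in tools_for(agent)
```

Keeping the scope map outside the prompt means the restriction is enforced in code, not merely suggested to the model.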

How CallSphere applies this

This handoff pattern is the architecture of the entire CallSphere fleet. We were doing it pre-SDK; the SDK formalized what we had built bespoke.

OneRoof Real Estate runs 10 specialist agents explicitly in this pattern: a triage agent, a buyer-qualifier, a seller-intake, a tour-booker, a financing-quoter, a comparable-puller, a neighborhood-explainer, a vision-on-photos analyst, a CRM-writer, and an escalation handler. The OpenAI Agents SDK + WebRTC stack underpins them. Vision on property photos is a per-agent capability invoked from the comparable-puller and neighborhood agents.

Healthcare Voice Agent runs a manager-pattern agent with 14 scoped tools — receptionist scope. When clinical detail is needed (medication history, symptom triage), it hands off to a clinical specialist with a separate prompt and a different tool subset. Post-call sentiment scoring and lead-score calculation happen on the manager-tier transcript view (FastAPI :8084).

Salon GlamBook runs 4 agents (front-desk, booking, color-consultation, customer-service), with GB-YYYYMMDD-### booking refs persisted across handoffs.
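A reference that must survive handoffs needs a stable, validatable format. The GB-YYYYMMDD-### shape above can be pinned down with a generator and a regex check; the generator logic here is a hypothetical sketch, only the format comes from the article.

```python
import re
from datetime import date

# GB-YYYYMMDD-### booking references, persisted across agent handoffs.
BOOKING_REF = re.compile(r"^GB-\d{8}-\d{3}$")

def make_booking_ref(seq: int, on: date) -> str:
    """Format a booking reference for a given day and daily sequence number."""
    return f"GB-{on.strftime('%Y%m%d')}-{seq:03d}"

def is_valid_ref(ref: str) -> bool:
    """Validate a reference before trusting it in a receiving agent."""
    return BOOKING_REF.fullmatch(ref) is not None
```

Validating the reference on every handoff edge catches a truncated or mangled value before it is written to the database.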

Still reading? Stop comparing — try CallSphere live.

CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.

Across 37 agents, 90+ tools, 115+ DB tables, 6 verticals, and 57+ languages (all HIPAA and SOC 2 aligned), the handoff is the only realistic architecture for delivering depth without prompt bloat.

The /demo page lets you trigger handoffs live across our products; every pricing tier ($149 / $499 / $1499) includes the 14-day no-card trial.

Build and migration steps

  1. Map your conversation into discrete agent personas. Aim for 3-10 specialists, not one mega-agent.
  2. Define a handoff trigger for each specialist — explicit ("when caller wants billing"), or LLM-decided via the manager.
  3. Implement the handoff via the SDK's handoff() primitive — triggers session.update with new tools and instructions.
  4. Persist conversation state at handoff time — the new agent should not lose context (caller name, intent so far, prior tool results).
  5. Add tracing — the SDK's built-in tracing captures the handoff chain for debugging and audit.
  6. Add guardrails on every handoff edge — never trust unvetted state from another agent.
  7. Run a 500-call eval before going live; handoff failures are subtle and only surface in real conversational data.
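Steps 4 and 6 pair naturally: carry state across the handoff, then validate it on the receiving edge before trusting it. A minimal sketch, assuming a simple dataclass for the carried state; field names and the intent-based guardrail are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class HandoffState:
    """Context carried across a handoff so the new agent loses nothing."""
    caller_name: str
    intent: str
    prior_tool_results: list[dict] = field(default_factory=list)

def guarded_handoff(state: HandoffState, allowed_intents: set[str]) -> HandoffState:
    """Guardrail on the handoff edge: reject state the target agent can't serve."""
    if state.intent not in allowed_intents:
        raise ValueError(f"intent {state.intent!r} not handled by this agent")
    return state
```

Raising instead of silently re-routing keeps the failure visible in tracing, which is exactly where step 5's instrumentation pays off.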

FAQ

What is a handoff in the OpenAI Agents SDK? A structured transfer of control from one agent to another, passing context and conversation state. Implemented at the WebRTC layer via a session.update event with new instructions and tools.

Manager pattern vs decentralized pattern — which is right? Manager pattern is the safer default — easier to debug, easier to audit. Decentralized works when specialist agents have clear "I am done, pass to X" exit conditions.

Does the user hear the handoff? No — the WebRTC session is continuous. The agent's persona changes (and possibly its voice), but there is no "please hold." Latency from handoff is usually under 200ms.

Can I do handoffs with tools that take a long time? Yes — the receiving agent can fire long-running tool calls. The SDK's tracing captures the latency and you can fill the silence with verbal back-channels from the receiving agent.

How does CallSphere's 10-agent OneRoof flow handle vision on property photos? The vision-capable agents (comparable-puller and neighborhood-explainer) get the vision tool injected into their scope at handoff time. Other agents in the chain do not have vision access — keeping tool counts focused per persona.
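Per-handoff capability injection like this reduces to merging a base scope with a conditional extra at swap time. A hedged sketch — agent names, base tools, and the `analyze_property_photo` tool name are illustrative placeholders.

```python
# Vision-capable personas get the vision tool merged into their base scope
# at handoff time; every other agent keeps its base scope untouched.

BASE_TOOLS = {
    "comparable-puller": ["pull_comps", "fetch_listing"],
    "crm-writer": ["write_crm_note"],
}
VISION_AGENTS = {"comparable-puller", "neighborhood-explainer"}

def tools_at_handoff(agent: str) -> list[str]:
    """Compute the tool list for the session.update issued at handoff."""
    tools = list(BASE_TOOLS.get(agent, []))
    if agent in VISION_AGENTS:
        tools.append("analyze_property_photo")  # injected only for these agents
    return tools
```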



Try CallSphere AI Voice Agents

See how AI voice agents work for your industry. Live demo available -- no signup required.

Related Articles You May Like

Agentic AI

Building OpenAI Realtime Voice Agents with an Eval Pipeline (2026)

Build a working voice agent with the OpenAI Realtime API + Agents SDK, then bolt on an eval pipeline that catches barge-in failures, hallucinated grounding, and latency regressions.

Agentic AI

LangGraph Supervisor Pattern: Orchestrating Multi-Agent Teams in 2026

The supervisor pattern in LangGraph for coordinating specialist agents, with full code, an eval pipeline that scores routing accuracy, and the failure modes to watch for.

Agentic AI

OpenAI Agents SDK vs Assistants API in 2026: Migration Guide with Eval Parity

Honest principal-engineer comparison of the OpenAI Agents SDK and the legacy Assistants API, with a migration checklist and eval-parity strategy so you don't ship regressions.

Agentic AI

Building Your First Agent with the OpenAI Agents SDK in 2026: A Hands-On Walkthrough

Step-by-step build of a working agent with the OpenAI Agents SDK — Agent class, tools, handoffs, tracing — plus an eval pipeline that catches regressions before merge.

Agentic AI

Online vs Offline Agent Evaluation: The Pre-Deploy / Post-Deploy Split

Offline evals catch regressions before deploy on a fixed dataset. Online evals catch real-world drift on live traffic. You need both — here is how we run them.

Agentic AI

Voice Agent Quality Metrics in 2026: WER, Latency, Grounding, and the Ones Most Teams Miss

The full metric set for evaluating production voice agents — STT word error rate, end-to-end latency budgets, RAG grounding, prosody, and the metrics that actually correlate with retention.