By Sagar Shankaran, Founder of CallSphere
OpenAI Agents SDK introduced first-class voice handoffs in 2026. Manager vs decentralized patterns, session.update events, and how they work in production.
Key takeaways
OpenAI Agents SDK introduced first-class voice handoffs in 2026. Manager vs decentralized patterns, session.update events, and how they work in production.
flowchart LR
Caller["Caller dials practice number"] --> Twilio["Twilio Programmable Voice"]
Twilio -- "Media Streams WS" --> Bridge["AI Bridge · FastAPI :8084"]
Bridge -- "PCM16 24kHz" --> Realtime["OpenAI Realtime API"]
Realtime -- "tool_call" --> Tools[("14 tools<br/>lookup · schedule · verify")]
Tools --> DB[("PostgreSQL<br/>healthcare_voice")]
Realtime --> Caller
Bridge --> Analytics[("Post-call analytics<br/>sentiment · lead score")]The OpenAI Agents SDK — released as an open-source framework in early 2026 — became the opinionated answer to "how do I build a multi-agent system?" The SDK ships four core primitives: Agents, Tools, Handoffs, and Guardrails. The voice-specific track lives in the SDK because Agent Builder (the no-code product) does not yet support voice workflows.
The handoff primitive is the headline feature for voice. A handoff is a structured mechanism where one agent transfers control to another, passing along context and conversation state. Under the hood, a handoff triggers a session.update event with new instructions and tools — the WebRTC session itself does not break, only the agent persona swaps.
OpenAI publishes two handoff patterns:
The SDK also adds Tracing for end-to-end observability of agent chains, and Guardrails for input/output validation — a critical pairing because handoffs amplify the attack surface.
Hear it before you finish reading
Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.
Real voice flows almost always span multiple specialist agents:
Three concrete benefits of the handoff primitive:
This handoff pattern is the architecture of the entire CallSphere fleet. We were doing it pre-SDK; the SDK formalized what we had built bespoke.
OneRoof Real Estate runs 10 specialist agents explicitly in this pattern: a triage agent, a buyer-qualifier, a seller-intake, a tour-booker, a financing-quoter, a comparable-puller, a neighborhood-explainer, a vision-on-photos analyst, a CRM-writer, and an escalation handler. The OpenAI Agents SDK + WebRTC stack underpins them. Vision on property photos is a per-agent capability invoked from the comparable-puller and neighborhood agents.
Healthcare Voice Agent runs a manager-pattern agent with 14 scoped tools — receptionist scope. When clinical detail is needed (medication history, symptom triage), it hands off to a clinical specialist with a separate prompt and a different tool subset. Post-call sentiment scoring and lead-score calculation happen on the manager-tier transcript view (FastAPI :8084).
Salon GlamBook runs 4 agents (front-desk, booking, color-consultation, customer-service), with GB-YYYYMMDD-### booking refs persisted across handoffs.
Still reading? Stop comparing — try CallSphere live.
CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.
Across 37 agents, 90+ tools, 115+ DB tables, 6 verticals, 57+ languages, HIPAA + SOC 2 aligned, the handoff is the only realistic architecture for delivering depth without prompt bloat.
The /demo page lets you trigger handoffs live across our products at the pricing tiers ($149 / $499 / $1499) on the 14-day no-card trial.
handoff() primitive — triggers session.update with new tools and instructions.What is a handoff in the OpenAI Agents SDK?
A structured transfer of control from one agent to another, passing context and conversation state. Implemented at the WebRTC layer via a session.update event with new instructions and tools.
Manager pattern vs decentralized pattern — which is right? Manager pattern is the safer default — easier to debug, easier to audit. Decentralized works when specialist agents have clear "I am done, pass to X" exit conditions.
Does the user hear the handoff? No — the WebRTC session is continuous. The agent's persona changes (and possibly its voice), but there is no "please hold." Latency from handoff is usually under 200ms.
Can I do handoffs with tools that take a long time? Yes — the receiving agent can fire long-running tool calls. The SDK's tracing captures the latency and you can fill the silence with verbal back-channels from the receiving agent.
How does CallSphere's 10-agent OneRoof flow handle vision on property photos? The vision-capable agents (comparable-puller and neighborhood-explainer) get the vision tool injected into their scope at handoff time. Other agents in the chain do not have vision access — keeping tool counts focused per persona.
Written by
Sagar Shankaran· Founder, CallSphere
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
See how AI voice agents work for your industry. Live demo available -- no signup required.
Graphiti is the open-source temporal knowledge graph for AI agents in 2026. Learn how bi-temporal memory beats vector RAG for voice agents and long-running LLMs.
How we built a fault-tolerant HVAC emergency triage and tech-dispatch platform on Kubernetes — three-tier CQRS, 11 micro-agents on the OpenAI Agents SDK + LangGraph, NATS JetStream, DTMF/SMS/WebSocket acceptance, circuit breakers, and an evaluation pipeline that catches regressions before they wake a tech at 3 AM.
The 2026 desktop AI agent landscape — ServiceNow Project Arc, Anthropic Claude offerings, OpenAI agents, and Google Mariner. A buyer's map.
Self-correction is now a property of the model, not the framework. What that means for production agent reliability, voice/chat fallbacks, and CallSphere.
How to design a multi-agent system using MCP for tools and A2A for cross-vendor coordination, with a CallSphere voice agent as a participating node.
A2A is the open standard for agent-to-agent coordination. Here is how the Agent Card JSON works, how discovery happens, and what to publish.
© 2026 CallSphere LLC. All rights reserved.