By Sagar Shankaran, Founder of CallSphere
Handoffs let one specialist transfer control end-to-end, conversation context included. We compare OpenAI Agents SDK handoffs to LangGraph subgraphs, with CallSphere's after-hours 7-agent ladder as the reference design.
Key takeaways
TL;DR: Use handoffs (not "agents-as-tools") when the chosen specialist should own the next reply, not just contribute a paragraph. The OpenAI Agents SDK exposes handoffs to the model as tool calls; LangGraph expresses them as Command(goto=...) jumps between nodes or subgraphs. CallSphere's after-hours stack uses a 7-rung hierarchical ladder: Primary → Secondary → 6 fallbacks.
A handoff is an explicit, one-way transfer of control from one agent to another. Control does not return automatically: the receiving agent inherits the conversation and decides what happens next, including whether to hand back. Compare this with "agent-as-tool", where the parent agent stays in control and just consumes the child's response.
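To make the distinction concrete, here is a minimal plain-Python sketch of who ends up speaking to the user in each pattern. It uses no agent framework; `Reply`, `billing_specialist`, `run_as_tool`, and `run_as_handoff` are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Reply:
    speaker: str  # which agent's words reach the user
    text: str


def billing_specialist(transcript: list[str]) -> str:
    # Hypothetical specialist: works from the conversation it is given.
    return f"Billing reviewed {len(transcript)} message(s) and will correct the invoice."


def run_as_tool(transcript: list[str]) -> Reply:
    # Agent-as-tool: the parent calls the child, then writes the reply itself.
    finding = billing_specialist(transcript)
    return Reply(speaker="Triage", text=f"Summary from billing: {finding}")


def run_as_handoff(transcript: list[str]) -> Reply:
    # Handoff: the parent steps aside and the child's output is the reply.
    return Reply(speaker="Billing", text=billing_specialist(transcript))


transcript = ["My invoice is wrong"]
tool_reply = run_as_tool(transcript)
handoff_reply = run_as_handoff(transcript)
print(tool_reply.speaker, "|", handoff_reply.speaker)
```

In the tool variant the specialist's text is wrapped by the parent; in the handoff variant it reaches the user unchanged.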
In a hierarchy, handoffs flow down the tree (delegation) and sometimes back up (escalation). Children rarely talk laterally — when they do, route through the parent.
flowchart TD
    ROOT[Root triage] -->|handoff: clinical| L1[Clinical lead]
    ROOT -->|handoff: ops| L2[Ops lead]
    L1 -->|handoff: scheduling| C1[Scheduler]
    L1 -->|handoff: nurse line| C2[Nurse triage]
    L2 -->|handoff: billing| C3[Billing]
    L2 -->|handoff: insurance| C4[Insurance]
    C2 -->|escalate| L1
    L1 -->|escalate to human| HUMAN[Human RN]
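The "route through the parent" rule can be sketched as a path search over the tree in the diagram. This is an illustrative sketch only: the `PARENT` table mirrors the agent nodes above (the human escalation target is left out), and `ancestors` and `handoff_path` are hypothetical helpers, not part of any SDK.

```python
# Parent of each agent node in the diagram; the root has no entry.
PARENT = {
    "Clinical lead": "Root triage",
    "Ops lead": "Root triage",
    "Scheduler": "Clinical lead",
    "Nurse triage": "Clinical lead",
    "Billing": "Ops lead",
    "Insurance": "Ops lead",
}


def ancestors(node: str) -> list[str]:
    """Return the node followed by each ancestor up to the root."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def handoff_path(src: str, dst: str) -> list[str]:
    """Route src -> dst using only parent/child edges, never a lateral hop."""
    up = ancestors(src)
    down = ancestors(dst)
    common = next(n for n in up if n in down)  # lowest shared ancestor
    climb = up[: up.index(common) + 1]
    descend = list(reversed(down[: down.index(common)]))
    return climb + descend


print(handoff_path("Scheduler", "Nurse triage"))
print(handoff_path("Nurse triage", "Billing"))
```

A sibling-to-sibling transfer climbs one level and comes back down; a transfer across branches climbs all the way to the root first.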
CallSphere's after-hours product is a 7-agent hierarchical handoff ladder.
Each rung has its own prompt, its own toolset, and its own SLA. A handoff carries the full conversation transcript plus structured intent ("scheduling", "emergency", "billing"). Across the platform: 37 agents · 90+ tools · 115+ DB tables · 6 verticals, with handoffs implemented as edges in a Postgres-backed state machine.
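One way to picture "handoffs as edges in a state machine" is a payload that is validated against an allowed-edge list before the transfer happens. The sketch below is an assumption-heavy illustration: the edge list is held in memory rather than Postgres, and the rung names, the `Handoff` fields, and `validate` are all hypothetical. Only the three intent labels come from the text above.

```python
from dataclasses import dataclass

# Hypothetical edges; the article says production edges live in Postgres.
ALLOWED_EDGES = {
    ("primary", "secondary"),
    ("secondary", "fallback_1"),
}
KNOWN_INTENTS = {"scheduling", "emergency", "billing"}


@dataclass(frozen=True)
class Handoff:
    source: str
    target: str
    intent: str
    transcript: tuple[str, ...] = ()


def validate(h: Handoff) -> Handoff:
    """Reject transfers that are not edges of the state machine."""
    if (h.source, h.target) not in ALLOWED_EDGES:
        raise ValueError(f"no edge {h.source} -> {h.target}")
    if h.intent not in KNOWN_INTENTS:
        raise ValueError(f"unknown intent {h.intent!r}")
    if not h.transcript:
        raise ValueError("handoff must carry the conversation transcript")
    return h


ok = validate(Handoff("primary", "secondary", "billing", ("My invoice is wrong",)))
print(ok.target)
```

Rejecting a transfer before it runs keeps an agent from inventing a route that the ladder does not define.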
Pricing: Starter $149 · Growth $499 · Scale $1,499, 14-day trial, 22% affiliate.
import asyncio

from agents import Agent, Runner, handoff

billing = Agent(name="Billing", instructions="Resolve invoice questions...")
support = Agent(name="Support", instructions="Tech issues only...")
triage = Agent(
    name="Triage",
    instructions="Route to billing or support. Hand off, don't summarize.",
    handoffs=[handoff(billing), handoff(support)],
)


async def main() -> None:
    # Runner.run is a coroutine, so it has to be awaited inside an event loop.
    result = await Runner.run(triage, "My invoice is wrong")
    print(result.final_output)  # expected to come from Billing after the handoff


asyncio.run(main())
In LangGraph the same idea uses Command(goto="billing", update={...}) to jump nodes while updating shared state. Both express the same one-way delegation, so the pattern carries over unchanged; what you are choosing is the framework that fits your stack.
Q: Handoff or agent-as-tool? Handoff if the specialist should reply directly to the user. Agent-as-tool if the parent should integrate and rephrase.
Q: Can a child hand off back up? Yes — that's escalation. Just track depth so you don't loop.
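Depth tracking can be as simple as recording the path of agents a conversation has visited and refusing a transfer once it gets too long. The sketch below assumes a ceiling of 6 hops; `MAX_HOPS`, `HandoffContext`, and `HandoffLimitExceeded` are hypothetical names for illustration.

```python
from dataclasses import dataclass

MAX_HOPS = 6  # assumed ceiling; tune per deployment


class HandoffLimitExceeded(RuntimeError):
    pass


@dataclass
class HandoffContext:
    path: list[str]  # every agent that has owned the conversation, in order

    def transfer(self, target: str) -> None:
        # Hops taken so far is one less than the number of owners.
        if len(self.path) - 1 >= MAX_HOPS:
            raise HandoffLimitExceeded(" -> ".join(self.path))
        self.path.append(target)


ctx = HandoffContext(path=["Clinical lead"])
try:
    while True:  # simulate a child and parent bouncing the call back and forth
        ctx.transfer("Nurse triage")
        ctx.transfer("Clinical lead")
except HandoffLimitExceeded:
    pass
print(len(ctx.path) - 1)
```

When the limit trips, a reasonable next step is to route to a human or a scripted path rather than retry the same transfer.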
Q: How does this differ from supervisor + specialists? Supervisor stays in control between every turn. Hierarchical handoffs let a specialist own multiple consecutive turns until it hands back.
Q: Does OpenAI's Agents SDK support nested handoffs? Yes. Each agent can declare its own handoffs, and runs traverse the tree until a terminal agent produces final output.
Q: What about streaming? Handoffs can preserve the stream, with the receiving agent picking up on the same connection, but frameworks differ; test before production.
Once you've shipped hierarchical handoffs to a real workload, the design questions change. You stop asking "can the agent do this?" and start asking "can the agent do this within a 1.2s p95 and under $0.04 per session?" That contract is what separates a demo from a production system. CallSphere learned this the expensive way while wiring 37 specialized agents to 90+ tools across 115+ database tables: every integration that didn't enforce schemas at the tool boundary eventually paged someone.
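Enforcing a schema at the tool boundary means checking the model's arguments before the tool touches anything. Here is a minimal stdlib-only sketch; the `BOOK_APPOINTMENT_SCHEMA` fields and the `validate_args` helper are hypothetical, and a real deployment would more likely use a schema library than hand-rolled checks.

```python
from typing import Any

# Hypothetical tool schema: argument name -> required Python type.
BOOK_APPOINTMENT_SCHEMA: dict[str, type] = {
    "patient_id": str,
    "slot_iso": str,
    "duration_min": int,
}


def validate_args(schema: dict[str, type], args: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the call may proceed."""
    problems = [f"missing: {k}" for k in schema if k not in args]
    problems += [f"unexpected: {k}" for k in args if k not in schema]
    problems += [
        f"wrong type for {k}: expected {t.__name__}"
        for k, t in schema.items()
        if k in args and not isinstance(args[k], t)
    ]
    return problems


good = {"patient_id": "p-17", "slot_iso": "2026-01-05T09:00", "duration_min": 30}
bad = {"patient_id": "p-17", "duration_min": "30", "notes": "asap"}
print(validate_args(BOOK_APPOINTMENT_SCHEMA, good))
print(validate_args(BOOK_APPOINTMENT_SCHEMA, bad))
```

Returning the problems as text, instead of raising, lets the orchestrator feed them back to the agent so it can retry with corrected arguments.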
Agentic AI in a real call center is a different beast from a single-LLM chatbot. Instead of one model answering one prompt, you orchestrate a small team: a router that decides intent, specialists that own a vertical (booking, intake, billing, escalation), and tools that read and write to the same Postgres your CRM trusts. Handoffs are where most production bugs hide: when Agent A passes context to Agent B, anything that isn't explicit in the message gets lost, and the user feels it as the agent "forgetting." That's why the systems that hold up under load are the ones with typed tool schemas, deterministic state stored outside the conversation, and a hard ceiling on tool calls per session.
The cost story is just as important: a multi-agent loop can quietly burn 10x the tokens of a single-LLM design if you let it think out loud at every step. The fix isn't a smarter model; it's smaller agents, shorter prompts, cached system messages, and evals that fail the build when p95 latency or per-session cost regresses. CallSphere runs this pattern across 6 verticals in production, and the rule has held every time: the agent you can debug in five minutes will out-survive the agent that's "smarter" on a benchmark.
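An eval that fails the build on a latency or cost regression could look roughly like the gate below. It reuses the 1.2s p95 and $0.04 per-session figures quoted earlier as budgets. The sample measurements are made up, `p95` and `gate` are hypothetical helpers, and checking the worst session against the cost budget is one possible reading of "per session".

```python
P95_BUDGET_S = 1.2      # latency figure quoted earlier in the article
COST_BUDGET_USD = 0.04  # per-session cost figure quoted earlier


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile, using integer math for the rank."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]


def gate(latencies_s: list[float], session_costs_usd: list[float]) -> list[str]:
    """Return budget violations; a CI job would fail when the list is non-empty."""
    failures = []
    observed = p95(latencies_s)
    if observed > P95_BUDGET_S:
        failures.append(f"p95 latency {observed:.2f}s > {P95_BUDGET_S}s")
    worst = max(session_costs_usd)
    if worst > COST_BUDGET_USD:
        failures.append(f"session cost ${worst:.3f} > ${COST_BUDGET_USD}")
    return failures


latencies = [0.8] * 18 + [1.1, 2.4]  # made-up sample: 20 turns
costs = [0.021, 0.035, 0.038]        # made-up sample: 3 sessions
print(gate(latencies, costs))
```

Note that the single 2.4s outlier does not trip the gate, because p95 tolerates the slowest 5% of turns; a stricter contract would add a hard maximum as well.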
Q: How do you scale hierarchical handoffs without blowing up token cost?
A: Scaling comes from constraint, not capability. The deployments that hold up keep each agent narrow, cap tool calls per turn, cache the system prompt, and pin a smaller model for routing while reserving the larger model for synthesis. CallSphere's stack — 37 agents · 90+ tools · 115+ DB tables · 6 verticals live — is sized that way on purpose.
Q: What stops hierarchical handoffs from looping forever on edge cases?
A: Hard ceilings beat heuristics. A maximum step count, an idempotency key on every tool call, and a fallback to a deterministic script when confidence drops below a threshold are what keep the loop bounded. Evals that simulate noisy inputs catch the rest before they reach a real caller.
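Two of those ceilings, the idempotency key and the confidence fallback, fit in a few lines. This is a sketch under stated assumptions: the thresholds are invented, the tool "execution" is a stub, and `idempotency_key`, `ToolGateway`, and `next_action` are hypothetical names.

```python
import hashlib
import json

MAX_STEPS = 8         # assumed step ceiling
MIN_CONFIDENCE = 0.6  # assumed confidence threshold


def idempotency_key(session_id: str, tool: str, args: dict) -> str:
    """Same session, tool, and arguments always yield the same key."""
    payload = json.dumps([session_id, tool, args], sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


class ToolGateway:
    """Runs each distinct call once and replays the stored result on retries."""

    def __init__(self) -> None:
        self.results: dict[str, str] = {}
        self.executions = 0

    def call(self, session_id: str, tool: str, args: dict) -> str:
        key = idempotency_key(session_id, tool, args)
        if key not in self.results:
            self.executions += 1
            self.results[key] = f"{tool} done"  # stub for the real side effect
        return self.results[key]


def next_action(step: int, confidence: float) -> str:
    # Either ceiling sends the session to a deterministic script.
    if step >= MAX_STEPS or confidence < MIN_CONFIDENCE:
        return "scripted_fallback"
    return "continue"


gw = ToolGateway()
gw.call("s1", "create_ticket", {"priority": "high"})
gw.call("s1", "create_ticket", {"priority": "high"})  # retry: not re-executed
print(gw.executions, next_action(3, 0.9), next_action(3, 0.4), next_action(8, 0.9))
```

With the key in place, a looping agent that repeats the same tool call produces one side effect instead of many.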
Q: Where does CallSphere use hierarchical handoffs in production today?
A: Today CallSphere runs this pattern in Sales and After-Hours Escalation, alongside the other live verticals (Healthcare, Real Estate, Salon, IT Helpdesk). The same orchestrator code path serves voice and chat; the difference is the tool set the router exposes.
Want to see real estate agents handle real traffic? Spin up a walkthrough at https://realestate.callsphere.tech or grab 20 minutes on the calendar: https://calendly.com/sagar-callsphere/new-meeting.
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.