Hierarchical Agent Handoffs (OpenAI Agents SDK) vs Vapi Squads

Triage to specialist to return-to-orchestrator pattern explained with code. CallSphere's OpenAI Agents SDK handoffs vs Vapi Squads' linear chain.

TL;DR

CallSphere's voice agents use a triage → specialist → return-to-orchestrator pattern via the OpenAI Agents SDK. Vapi's answer to multi-agent is Squads, which chain agents linearly without a return path. The hierarchical pattern lets a caller bounce between three or four specialists in one call without losing context. The linear pattern works when the conversation has a fixed sequence and the caller doesn't switch topics. Most production voice AI workloads — healthcare intake, real estate buyer calls, IT helpdesk — benefit from hierarchical handoffs because the caller controls the topic and changes it mid-call.

The Multi-Agent Pattern That Actually Scales

A single mega-prompt that handles every skill collapses to around 70% reliability under production load. Teams that scale past one workflow typically rebuild around multiple specialist agents. The architectural question is how those agents are wired together.

Two dominant patterns:

  1. Linear chain: Agent A talks until handoff, then Agent B talks until handoff, then Agent C closes. No return path. (Vapi Squads.)
  2. Hierarchical with return: A triage agent at the root dispatches to specialists. Specialists can hand back to triage or to peers. State persists across handoffs. (CallSphere via OpenAI Agents SDK.)

Linear is simpler to design. Hierarchical handles caller-driven topic changes natively.
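The difference between the two patterns comes down to which agents can reach which. The sketch below models each topology as a plain adjacency map and checks reachability with a breadth-first search. The agent names are hypothetical and nothing here calls Vapi or the Agents SDK; it only illustrates the shape of the two graphs.

```python
from collections import deque

# Hypothetical agent names; each key maps to the agents it may hand off to.
LINEAR = {
    "opener": ["qualifier"],
    "qualifier": ["closer"],
    "closer": [],  # end of the chain, no return path
}
HIERARCHICAL = {
    "triage": ["billing", "booking", "support"],
    "billing": ["triage"],
    "booking": ["triage"],
    "support": ["triage"],
}


def can_reach(graph, start, goal):
    """Breadth-first search: is there any handoff path from start to goal?"""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


if __name__ == "__main__":
    print(can_reach(LINEAR, "closer", "opener"))          # chain cannot go back
    print(can_reach(HIERARCHICAL, "billing", "booking"))  # routed via triage
```

In the linear map the last agent has no outgoing edge, so a topic change late in the call has nowhere to go. In the hierarchical map every specialist reaches every other specialist through triage.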

Vapi Squads in Detail

Vapi Squads define a sequence of agent definitions that participate in a call. Each agent has its own system prompt, tools, and voice. Transitions are triggered by either the agent calling a handoff function or by Vapi's matching logic.

What works:

  • Cleanly separates concerns. Sales script agents — opener, qualifier, closer — fit Squads perfectly.
  • Voice changes per agent give an audible cue when the call shifts.
  • Each agent has a focused prompt and toolset.

What doesn't:

  • No return path. Once you hand off to a specialist, you can't return to the original agent without redesigning the chain.
  • No central listener. There's no triage agent always present that can intercept "wait, actually, can you check my balance again?" mid-call.
  • State sharing is metadata-shaped. Persistent state across agents is bolt-on rather than first-class.

CallSphere Hierarchical Handoffs

The OpenAI Agents SDK gives every agent a list of handoffs — other agents it can transfer the conversation to. The runtime tracks the active agent and forwards new turns to it. Importantly, specialists can hand back to triage or to peers, which gives you arbitrary topology, not just a linear chain.

Production topologies on CallSphere:

  • Real Estate: Triage + 10 specialists (including Property Search with vision, Buyer Lead, Seller Lead, Mortgage Pre-Qual, Tour Scheduling, Listing Inquiry, Open House, Market Analytics, and Closing Coordinator). Up to 6 specialist visits per call.
  • Salon: Triage + 4 specialists (including Booking, Service Recommendation, and Reminder/Reschedule).
  • IT Helpdesk: Triage + 10 specialists with ChromaDB RAG behind the answer specialist.
  • After-Hours: Triage + 7 specialists with escalation policy.
  • Sales: 5 GPT-4 specialists + ElevenLabs "Sarah" voice.
  • Healthcare: Single Head Agent + 14 function-calling tools (single-domain depth, not multi-agent breadth).

Code-Level Pattern

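from agents import Agent

# The tool callables below (search_listings, analyze_photo, qualify_buyer,
# capture_contact, run_prequal_estimate, book_tour, send_confirmation_sms)
# are assumed to be defined elsewhere as SDK function tools.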

property_search = Agent(
    name="Property Search",
    instructions="Search listings, including buyer-uploaded photos via vision.",
    tools=[search_listings, analyze_photo],
)

buyer_lead = Agent(
    name="Buyer Lead",
    instructions="Qualify budget, timeline, and preapproval. Hand off to property_search if listing-specific. Hand back to triage if scope changes.",
    tools=[qualify_buyer, capture_contact],
    handoffs=[property_search],
)

mortgage_prequal = Agent(
    name="Mortgage Pre-Qual",
    instructions="Walk caller through pre-qual estimate.",
    tools=[run_prequal_estimate],
)

tour_scheduling = Agent(
    name="Tour Scheduling",
    instructions="Book a property tour at the listing.",
    tools=[book_tour, send_confirmation_sms],
)

triage = Agent(
    name="Triage",
    instructions=(
        "Listen for caller intent. Route to specialist. "
        "Return here when specialist completes."
    ),
    handoffs=[buyer_lead, mortgage_prequal, tour_scheduling, property_search],
)

# Wire return paths
buyer_lead.handoffs.append(triage)
mortgage_prequal.handoffs.append(triage)
tour_scheduling.handoffs.append(triage)
property_search.handoffs.append(triage)

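The graph is now a hub with one shortcut: Triage ↔ each of the four specialists, plus a direct Buyer Lead → Property Search edge. Any specialist can return to Triage, and Buyer Lead can hand directly to Property Search without bouncing through Triage. Property Search has no edge back to Buyer Lead, so it returns through Triage unless buyer_lead is appended to its handoffs as well.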

Sequence Diagram of a Real Call

sequenceDiagram
    participant Caller
    participant T as Triage
    participant BL as Buyer Lead
    participant PS as Property Search
    participant MQ as Mortgage Pre-Qual
    participant TS as Tour Scheduling
    Caller->>T: "Hi, I'm interested in buying"
    T->>BL: handoff(intent="buyer")
    Caller->>BL: budget + timeline
    Caller->>BL: "Here's a photo I texted you"
    BL->>PS: handoff(reason="listing photo")
    PS-->>Caller: "That's listing 123, $689k, 3BR"
    PS->>BL: handoff(complete)
    Caller->>BL: "Am I pre-qualified?"
    BL->>MQ: handoff(reason="prequal check")
    MQ-->>Caller: pre-qual estimate
    MQ->>BL: handoff(complete)
    BL->>TS: handoff(reason="schedule tour")
    TS-->>Caller: tour confirmed Saturday 2pm
    TS->>T: handoff(complete)
    T-->>Caller: closing summary

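Six specialist visits, two returns to Buyer Lead, and one return to the orchestrator, all in one call. The peer transitions shown here (Property Search → Buyer Lead, Buyer Lead ↔ Mortgage Pre-Qual, Buyer Lead → Tour Scheduling) go beyond the wiring in the code sample above, and each would need an entry in the corresponding handoffs list. The same flow on a linear Squad would have to be designed up front and would not handle the caller's "actually, am I prequalified?" detour gracefully.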
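One way to sanity-check a diagram like this against the code is to replay its transitions over the handoffs lists. The sketch below uses plain sets rather than Agent objects. WIRED mirrors the handoffs from the code sample after the return paths are appended, and TRANSITIONS is read off the sequence diagram. The replay helper is hypothetical, not part of the Agents SDK.

```python
# Handoff targets per agent, mirroring the code sample after return paths are wired.
WIRED = {
    "Triage": {"Buyer Lead", "Mortgage Pre-Qual", "Tour Scheduling", "Property Search"},
    "Buyer Lead": {"Property Search", "Triage"},
    "Mortgage Pre-Qual": {"Triage"},
    "Tour Scheduling": {"Triage"},
    "Property Search": {"Triage"},
}

# Handoffs in the order they appear in the sequence diagram.
TRANSITIONS = [
    ("Triage", "Buyer Lead"),
    ("Buyer Lead", "Property Search"),
    ("Property Search", "Buyer Lead"),
    ("Buyer Lead", "Mortgage Pre-Qual"),
    ("Mortgage Pre-Qual", "Buyer Lead"),
    ("Buyer Lead", "Tour Scheduling"),
    ("Tour Scheduling", "Triage"),
]


def replay(transitions, wired, root="Triage"):
    """Count specialist visits and returns to root; list transitions with no wired edge."""
    visits = sum(1 for _, dst in transitions if dst != root)
    returns = sum(1 for _, dst in transitions if dst == root)
    missing = [(src, dst) for src, dst in transitions if dst not in wired[src]]
    return visits, returns, missing


if __name__ == "__main__":
    visits, returns, missing = replay(TRANSITIONS, WIRED)
    print(f"specialist visits: {visits}, returns to triage: {returns}")
    for src, dst in missing:
        print(f"not wired: {src} -> {dst}")
```

Any pair reported as not wired is a peer edge that would need to be added to the source agent's handoffs before the call in the diagram could run as drawn.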

Capability Comparison

| Capability | CallSphere (Agents SDK) | Vapi Squads |
| --- | --- | --- |
| Topology | Hierarchical with return | Linear chain |
| Triage agent | Native | Workaround |
| Return-to-orchestrator | Native | No |
| Vision-capable specialist | Yes (Real Estate) | DIY |
| RAG-backed specialist | Yes (IT + ChromaDB) | DIY |
| Specialists per vertical | Up to 10 | Limited by chain length |
| Shared session state | SDK-managed | Manual via metadata |
| Mid-call topic switch | Native | Not natively supported |
| Tool subset per agent | Yes | Yes |
| Voice change per agent | Optional | Native |

State Sharing Across Handoffs

On a handoff, the Agents SDK passes the conversation history forward to the next agent, and a run-level context object stays available to every agent and tool in the run. CallSphere extends this with a vertical-specific context object (caller name, intent, captured fields, prior tool outputs). Specialists read what they need and add what they discover. Triage reads the final state to compose the closing summary.

In Squads, state sharing is metadata fields you populate on transitions. Possible, but more code.
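A context object of the kind described above could look roughly like the following. This is a minimal sketch: the class name, field names, and methods are assumptions based on the fields listed in the text, not CallSphere's actual schema. With the Agents SDK, an object like this would typically be supplied as the run context so that tools can read and update it.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class CallContext:
    """Hypothetical per-call state shared by triage and all specialists."""

    caller_name: Optional[str] = None
    intent: Optional[str] = None
    captured_fields: Dict[str, Any] = field(default_factory=dict)
    tool_outputs: List[Tuple[str, Any]] = field(default_factory=list)

    def record(self, agent: str, output: Any) -> None:
        """Append one specialist result, tagged with the agent that produced it."""
        self.tool_outputs.append((agent, output))

    def closing_summary(self) -> str:
        """Compose the end-of-call summary from everything gathered so far."""
        fields = ", ".join(f"{k}={v}" for k, v in self.captured_fields.items())
        results = "; ".join(f"{agent}: {out}" for agent, out in self.tool_outputs)
        return (
            f"Caller {self.caller_name} ({self.intent}). "
            f"Captured: {fields}. Results: {results}."
        )


if __name__ == "__main__":
    ctx = CallContext(caller_name="Dana", intent="buyer")
    ctx.captured_fields["budget"] = 689000
    ctx.record("Property Search", "listing 123, 3BR")
    ctx.record("Tour Scheduling", "Saturday 2pm")
    print(ctx.closing_summary())
```

Because every specialist writes to the same object, triage can build the summary without re-reading the transcript.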

When Linear Squads Are the Right Tool

Linear works when:

  • The call has a fixed script (sales call: opener → qualifier → closer).
  • The caller does not control the topic.
  • You want a different voice per agent for an audible script cue.

CallSphere uses a roughly linear pattern in Sales (5 GPT-4 specialists + Sarah voice) for exactly that reason.

When Hierarchical Wins

Hierarchical wins when:

  • The caller controls the topic (healthcare intake, IT helpdesk, real estate buyer).
  • More than two skill domains share one call.
  • You want to compose a closing summary from all specialist outputs at the end.

This covers most enterprise voice AI workloads in 2026.

FAQ

Can Vapi do hierarchical handoffs at all?

You can simulate them by letting every Squad agent "return" via a transfer to a known triage agent, but the Squads pattern is linear and the experience reflects that. State sharing is more manual.

How does CallSphere prevent infinite handoff loops?

Each agent has a max-handoff-depth budget per call. Triage tracks the count and short-circuits with a polite escalation if the depth exceeds the budget.

What about cost? More agents means more tokens?

The system prompt for each specialist is much shorter than a mega-prompt would be. Total tokens per call are typically lower than a single-agent design because each specialist sees only the relevant context.

How do specialists know when to hand back?

Each specialist's instructions tell it the completion criteria. The Agents SDK exposes every entry in an agent's handoffs list to the model as a callable tool, named transfer_to_<agent_name> by default. The triage agent's instructions reinforce "specialists will return here when done."
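In the Agents SDK, a handoff reaches the model as a tool whose default name is derived from the target agent's name. The helper below is a rough approximation of that naming, written only to show what the model would see for the agents in this article. It is a hypothetical stand-in, not the SDK's own function, and the SDK's exact normalization rules may differ.

```python
import re


def approx_handoff_tool_name(agent_name: str) -> str:
    """Approximate the default handoff tool name for an agent (assumption, not SDK code)."""
    # Lowercase the name and replace anything that is not a letter or digit with "_".
    snake = re.sub(r"[^a-zA-Z0-9]", "_", agent_name).lower()
    return f"transfer_to_{snake}"


if __name__ == "__main__":
    for name in ["Triage", "Buyer Lead", "Mortgage Pre-Qual"]:
        print(name, "->", approx_handoff_tool_name(name))
```

Naming the exact tool in a specialist's instructions ("call transfer_to_triage when the caller changes topic") can make the completion criteria more concrete for the model.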

Does the caller hear the handoff?

By default, no. The handoff is silent. CallSphere optionally adds a "let me get a specialist for you" line for some verticals (after-hours escalation), tunable per workflow.

Designing Your Own Agent Topology

If you're considering building a multi-agent voice stack, here's the design heuristic CallSphere uses across verticals.

Step one: list every distinct skill the agent needs. Not categories — specific verbs. "Qualify a buyer," "search listings," "estimate mortgage pre-qual," "book a tour."

Step two: cluster the skills by who controls the topic. If a single skill always follows another in fixed order, they belong together. If a caller can request a skill out of order, it deserves its own specialist.

Step three: identify the triage role. Triage doesn't do work; it routes. It listens for intent at the start and again whenever a specialist returns control.

Step four: define handoff criteria as text in each specialist's prompt. "Hand back to triage when the caller's intent shifts away from buying" is an explicit, model-readable instruction. The SDK sends it to the model as part of the specialist's system prompt, and the model decides when the criterion is met.

Step five: budget the max handoff depth. We default to 8 transitions per call. Beyond that, the call escalates to a human via the After-Hours triage path. This prevents pathological handoff loops.
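A handoff budget can be as small as a counter that is consulted before each transition. The sketch below assumes the default of 8 from the text. The HandoffBudget class, the next_agent helper, and the escalation target name are hypothetical, and how the check is hooked into the runtime is left open.

```python
from dataclasses import dataclass


@dataclass
class HandoffBudget:
    """Per-call counter that caps the number of agent-to-agent transitions."""

    max_transitions: int = 8  # default stated in the text
    used: int = 0

    def allow(self) -> bool:
        """Consume one transition; return False once the budget is exhausted."""
        if self.used >= self.max_transitions:
            return False
        self.used += 1
        return True


def next_agent(budget: HandoffBudget, requested: str,
               escalation: str = "After-Hours Triage") -> str:
    """Return the requested agent, or the escalation path when over budget."""
    return requested if budget.allow() else escalation


if __name__ == "__main__":
    budget = HandoffBudget()
    for i in range(1, 10):
        print(i, next_agent(budget, "Buyer Lead"))
```

The ninth requested transition is redirected to the escalation path instead of the specialist, which is what breaks a loop between two agents that keep handing the call to each other.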

Step six: instrument every transition. Each handoff is a row in PostgreSQL with the trigger, the from-agent, the to-agent, the call ID, and the timestamp. Post-call analytics by gpt-4o-mini reads these rows and attributes outcomes to topology decisions.
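The transition log described in step six might be shaped like the table below. To keep the sketch self-contained it uses Python's built-in sqlite3 as a stand-in for PostgreSQL. The table and column names are assumptions derived from the fields listed in the text, and a PostgreSQL driver would use its own placeholder style and column types.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one row per handoff.
SCHEMA = """
CREATE TABLE IF NOT EXISTS handoff_events (
    id INTEGER PRIMARY KEY,
    call_id TEXT NOT NULL,
    from_agent TEXT NOT NULL,
    to_agent TEXT NOT NULL,
    trigger_reason TEXT NOT NULL,
    occurred_at TEXT NOT NULL
)
"""


def log_handoff(conn, call_id, from_agent, to_agent, trigger_reason):
    """Insert one transition row with a UTC timestamp."""
    conn.execute(
        "INSERT INTO handoff_events "
        "(call_id, from_agent, to_agent, trigger_reason, occurred_at) "
        "VALUES (?, ?, ?, ?, ?)",
        (call_id, from_agent, to_agent, trigger_reason,
         datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    log_handoff(conn, "call-001", "Triage", "Buyer Lead", "intent=buyer")
    log_handoff(conn, "call-001", "Buyer Lead", "Property Search", "listing photo")
    for row in conn.execute(
        "SELECT from_agent, to_agent, trigger_reason FROM handoff_events ORDER BY id"
    ):
        print(row)
```

With one row per transition, questions such as "which edges precede escalations" become simple GROUP BY queries over from_agent and to_agent.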

The output of these six steps is a per-vertical agent graph. CallSphere's graphs are version-controlled in git alongside the prompts. We treat the topology as code.

Try CallSphere

See hierarchical handoffs in production. Book a demo or browse Healthcare and Real Estate.
