Vapi Squads vs CallSphere Hierarchical Agents (Deep Dive)

Side-by-side architecture deep dive: Vapi Squads chained handoffs vs CallSphere OpenAI Agents SDK hierarchical orchestrator with return-to-parent flow.

TL;DR

Vapi Squads wire multiple assistants into a chain inside a single call — one assistant transfers to the next, and the new assistant runs to completion. CallSphere uses the OpenAI Agents SDK with a hierarchical orchestrator pattern: a parent agent dispatches to specialists and the specialist returns control when done. The first model is great for predictable linear flows. The second is required when you need branching, fallbacks, and "go back to triage if scheduling fails" semantics.

If your call flow is "always greet, then qualify, then book," Vapi Squads will work. If your flow has loops, retries, and policy-driven routing, you want hierarchy.

Why This Difference Matters

Voice calls are messy. People interrupt themselves, change topics, hang up mid-sentence, restart with a new question. A clean chain (A → B → C) breaks the moment a caller in C says "actually, can you check the records again?" — you need to hop back to A or sideways to a Records specialist.

Vapi Squads model agents as a directed graph of transitions. Once you transfer, you live in the new agent's frame. You can transfer again, but you cannot "return to parent." That makes the call topology effectively one-directional: you walk down the graph, you do not walk back up.

CallSphere's hierarchy explicitly supports return-to-parent. Specialists complete, the orchestrator regains control with the specialist's structured result merged into context, and the orchestrator decides what is next.
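
The contrast can be sketched in plain Python as a jump versus a call that returns. This is a toy model rather than Vapi or CallSphere code: `run_chain`, `run_hierarchy`, and the agent labels are hypothetical names chosen for illustration.

```python
def run_chain(transfers: dict, start: str) -> list:
    """One-way transfers: each agent hands the call on and never resumes."""
    visited = [start]
    current = start
    # Stop at a dead end, or if a transfer would loop back to a visited agent.
    while current in transfers and transfers[current] not in visited:
        current = transfers[current]
        visited.append(current)
    return visited


def run_hierarchy(parent: str, plan: list) -> list:
    """Return-to-parent: the parent regains control after every specialist."""
    visited = [parent]
    for specialist in plan:
        visited.append(specialist)  # parent dispatches to the specialist
        visited.append(parent)      # specialist completes and returns
    return visited


chain_path = run_chain({"A": "B", "B": "C"}, "A")
tree_path = run_hierarchy("triage", ["records", "scheduling", "records"])
print(chain_path)
print(tree_path)
```

In the chain, the only way to revisit an earlier agent is an explicit extra transfer; in the hierarchy, revisiting `records` is just another dispatch from the parent.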

Vapi Squads Approach

A simplified Vapi Squad config looks like this (field names are illustrative; consult the Vapi API reference for the exact schema):

{
  "name": "Real Estate Squad",
  "members": [
    { "assistantId": "asst_greeter", "first": true },
    { "assistantId": "asst_qualifier" },
    { "assistantId": "asst_scheduler" }
  ],
  "membersOverrides": {
    "asst_greeter": {
      "transferList": [
        { "destination": "asst_qualifier", "message": "Transferring you to qualification" }
      ]
    },
    "asst_qualifier": {
      "transferList": [
        { "destination": "asst_scheduler", "message": "Booking specialist now" }
      ]
    }
  }
}

The flow is: greeter → qualifier → scheduler. Each assistant has its own system prompt, tools, and voice. Transitions are explicit and unidirectional unless you wire a back edge manually (which the model has no way to know about during reasoning).
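
To make the one-way nature concrete, the transfer rules in the config above can be read as graph edges and checked for reachability. This is a sketch that assumes the config shape exactly as shown in this article (not a verified Vapi schema); `transfer_edges` and `reachable` are hypothetical helpers.

```python
import json

# The squad config from this article, trimmed to the fields used here.
SQUAD = json.loads("""
{
  "members": [
    {"assistantId": "asst_greeter", "first": true},
    {"assistantId": "asst_qualifier"},
    {"assistantId": "asst_scheduler"}
  ],
  "membersOverrides": {
    "asst_greeter": {"transferList": [{"destination": "asst_qualifier"}]},
    "asst_qualifier": {"transferList": [{"destination": "asst_scheduler"}]}
  }
}
""")


def transfer_edges(squad: dict) -> dict:
    """Map each member to the assistants it is allowed to transfer to."""
    overrides = squad.get("membersOverrides", {})
    edges = {}
    for member in squad["members"]:
        name = member["assistantId"]
        rules = overrides.get(name, {}).get("transferList", [])
        edges[name] = [rule["destination"] for rule in rules]
    return edges


def reachable(edges: dict, start: str) -> set:
    """Every assistant that can be reached from `start` by following transfers."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


edges = transfer_edges(SQUAD)
dead_ends = [name for name, targets in edges.items() if not targets]
print(reachable(edges, "asst_scheduler"))  # nothing is reachable from the scheduler
```

With these rules the scheduler is a dead end: once the call arrives there, no other assistant can be reached unless a back edge is added by hand.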

Strengths:

  • Conceptually simple
  • Easy to debug per-assistant
  • Per-assistant voice changes are crisp

Weaknesses:

  • No true return-to-parent
  • State sharing across squad members is shallow
  • The current assistant has no awareness of the wider call plan
  • Branching ("if user asks about pricing, jump to pricing assistant") requires manual transfer rules in every member

CallSphere Hierarchical Approach

CallSphere builds on the OpenAI Agents SDK. The basic shape is a parent orchestrator plus N specialist agents that the orchestrator can hand off to and that return when done.

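import asyncio

from agents import Agent, Runner, handoff

# Tool functions (search_providers, lookup_patient, ...) are defined elsewhere.
scheduling_agent = Agent(
    name="SchedulingAgent",
    instructions="...",
    tools=[search_providers, schedule_appointment, send_sms],
)

records_agent = Agent(
    name="RecordsAgent",
    instructions="...",
    tools=[lookup_patient, fetch_visits],
)

# Parent agent: triages the call and dispatches to the specialists above.
orchestrator = Agent(
    name="Orchestrator",
    instructions="""You triage healthcare calls. Hand off to scheduling for
    appointment work, records for chart questions, and escalate to human for
    clinical questions. Always summarize what the specialist returned before
    closing the call.""",
    handoffs=[
        # on_complete is the return-to-parent setting discussed below.
        handoff(scheduling_agent, on_complete="return_to_parent"),
        handoff(records_agent, on_complete="return_to_parent"),
    ],
)


async def main(transcript: str) -> None:
    # Runner.run is a coroutine, so it has to be awaited inside an async function.
    result = await Runner.run(orchestrator, transcript)
    print(result.final_output)

# Entry point, e.g.: asyncio.run(main(transcript))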

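The key setting is on_complete="return_to_parent". When scheduling_agent finishes its job (or hits a stop condition), control returns to the orchestrator with a structured handoff result attached to context. Note that on_complete is not an argument of the SDK's public handoff() helper, whose handoffs are one-way transfers, so read it as a CallSphere-side convention; the closest stock equivalent is exposing a specialist through Agent.as_tool(), which hands the specialist's output back to the calling agent. The orchestrator can then: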

  • Close the call
  • Send to another specialist
  • Recover from a specialist failure
  • Re-prompt the user with a wider question
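
The options above amount to a decision loop that runs each time a specialist returns. A minimal stdlib sketch of that loop follows; it does not use the Agents SDK, and `SpecialistResult`, `decide`, `orchestrate`, and the two toy specialists are hypothetical stand-ins for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SpecialistResult:
    agent: str
    status: str                      # "ok" or "failed"
    data: dict = field(default_factory=dict)


def scheduling(ctx: dict) -> SpecialistResult:
    # Toy rule: scheduling fails until a provider is known.
    if "provider" not in ctx:
        return SpecialistResult("scheduling", "failed", {"error": "no provider"})
    return SpecialistResult("scheduling", "ok", {"slot": "Tue 10:00"})


def records(ctx: dict) -> SpecialistResult:
    return SpecialistResult("records", "ok", {"provider": "Dr. Lee"})


SPECIALISTS = {"scheduling": scheduling, "records": records}


def decide(result: SpecialistResult, ctx: dict) -> Optional[str]:
    """Parent-side routing after a specialist returns."""
    if result.status == "failed":
        return "records"             # recover from a specialist failure
    if "slot" not in ctx:
        return "scheduling"          # send to another specialist
    return None                      # nothing left to do: close the call


def orchestrate(first: str, max_steps: int = 5):
    ctx, trace, nxt = {}, [], first
    for _ in range(max_steps):
        if nxt is None:
            return "closed", trace
        result = SPECIALISTS[nxt](ctx)
        trace.append(f"{result.agent}:{result.status}")
        ctx.update(result.data)      # merge the structured result into context
        nxt = decide(result, ctx)
    return "escalated", trace        # step budget exhausted


outcome, trace = orchestrate("scheduling")
print(outcome, trace)
```

The step budget is a design choice worth keeping in any real version: a parent that can always re-dispatch also needs a bound so that two specialists cannot bounce a call between them indefinitely.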

This is structurally equivalent to a function call returning, whereas a Vapi Squads transfer behaves more like a goto: "jump and never come back."

Real Production Example: Real Estate

CallSphere's Real Estate voice agent ships with a hierarchy of:

  • Lead Triage agent (orchestrator)
  • Property Search specialist (with vision tool for buyer-uploaded photos)
  • Mortgage Pre-Qual specialist
  • Tour Scheduling specialist
  • Escalation-to-Human specialist

A typical conversation does not visit each specialist once. A buyer asks about a 3-bedroom in Sunnyvale (Property Search), then asks "can I afford this on $180K?" (Mortgage), then returns to Property Search to refine. Hierarchy makes this trivial; chained Squads would require manual back-edges between every pair.
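
The wiring cost behind that claim is simple counting. Assuming every specialist must be able to reach every other one, a fully meshed chain needs one directed transfer rule per ordered pair, while a hub needs one handoff per specialist. The function names below are hypothetical.

```python
def mesh_transfer_rules(n_specialists: int) -> int:
    """Directed transfer rules needed so any specialist can reach any other."""
    return n_specialists * (n_specialists - 1)


def hub_handoffs(n_specialists: int) -> int:
    """Handoffs the orchestrator declares; returns come back implicitly."""
    return n_specialists


# The Real Estate hierarchy above has four specialists under one orchestrator.
print(mesh_transfer_rules(4), hub_handoffs(4))
```

The mesh grows quadratically with the number of specialists, while the hub grows linearly.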

Handoff Trace

A real handoff trace from a Real Estate call (sanitized):

[00:00.512] orchestrator: hand_off → property_search
            payload: { intent: "search", filters: { city: "Sunnyvale", beds: 3 } }
[00:08.341] property_search: vision_analyze(photo_id=ph_4k2)
[00:11.119] property_search: search_listings(filters)
[00:13.882] property_search: complete → return_to_parent
            result: { listings: [...3 items...] }
[00:14.110] orchestrator: hand_off → mortgage_prequal
            payload: { income: 180000, target_price: 1.4M }
[00:18.502] mortgage_prequal: prequal_calc(...)
[00:19.881] mortgage_prequal: complete → return_to_parent
            result: { max_loan: 1.1M, monthly: 7200 }
[00:20.044] orchestrator: hand_off → property_search
            payload: { intent: "search", filters: { city: "Sunnyvale", beds: 3, max_price: 1.1M } }

That third hand-off (back into property_search with a tighter budget) is the exact case Vapi Squads cannot model cleanly.
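
What makes the third hand-off possible is that the mortgage result is merged into shared context before the orchestrator builds the next payload. A minimal sketch of that merge, using the values from the trace; `merge_result`, `refined_search_payload`, and the `search_filters` key are assumptions made for illustration, not CallSphere's actual data model.

```python
def merge_result(context: dict, agent: str, result: dict) -> dict:
    """Return a new context with the specialist's result stored under its name."""
    return {**context, agent: result}


def refined_search_payload(context: dict) -> dict:
    """Build the next property_search payload, capped by the prequal result."""
    filters = dict(context["search_filters"])
    max_loan = context.get("mortgage_prequal", {}).get("max_loan")
    if max_loan is not None:
        filters["max_price"] = max_loan
    return {"intent": "search", "filters": filters}


context = {"search_filters": {"city": "Sunnyvale", "beds": 3}}
context = merge_result(
    context, "mortgage_prequal", {"max_loan": 1_100_000, "monthly": 7200}
)
payload = refined_search_payload(context)
print(payload)
```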

Vapi vs CallSphere Multi-Agent Comparison

| Dimension | Vapi Squads | CallSphere Hierarchy |
| --- | --- | --- |
| Topology | DAG of assistants | Tree with return-to-parent |
| Return semantics | Explicit transfer back, manual | First-class return_to_parent |
| State sharing | Shallow (transfer message) | Full conversation + structured payload |
| Voice per agent | Easy (per-assistant) | Easy (per-specialist config) |
| Tool isolation | Per-assistant | Per-specialist |
| Reentry into prior agent | Manual transfer rule | Native handoff |
| Failure recovery | Hangup or transfer to fallback | Orchestrator catches and reroutes |
| Debugging | Vapi dashboard per call | Full handoff trace + Postgres logs |
| Practical agent count | 3-5 | 5-10 specialists comfortably |

Hierarchical Handoff State Machine

stateDiagram-v2
    [*] --> Orchestrator
    Orchestrator --> PropertySearch: intent=search
    Orchestrator --> Mortgage: intent=affordability
    Orchestrator --> TourSchedule: intent=tour
    Orchestrator --> Human: complex/legal
    PropertySearch --> Orchestrator: complete (return)
    Mortgage --> Orchestrator: complete (return)
    TourSchedule --> Orchestrator: complete (return)
    PropertySearch --> Mortgage: cross-handoff via parent
    Mortgage --> PropertySearch: refine search
    Human --> [*]: warm transfer
    Orchestrator --> [*]: call ends
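
The same diagram can be written as a transition table and used to check whether a sequence of hand-offs is legal. This is a sketch: the table copies the edges from the diagram (the start and end markers are omitted), and `is_valid_path` is a hypothetical helper.

```python
# Allowed next states, copied from the state diagram above.
TRANSITIONS = {
    "Orchestrator": {"PropertySearch", "Mortgage", "TourSchedule", "Human"},
    "PropertySearch": {"Orchestrator", "Mortgage"},
    "Mortgage": {"Orchestrator", "PropertySearch"},
    "TourSchedule": {"Orchestrator"},
    "Human": set(),  # warm transfer ends the automated part of the call
}


def is_valid_path(path: list) -> bool:
    """True if every consecutive pair of states is an allowed transition."""
    return all(b in TRANSITIONS.get(a, set()) for a, b in zip(path, path[1:]))


# The sequence from the handoff trace earlier in the article.
trace_path = [
    "Orchestrator", "PropertySearch", "Orchestrator",
    "Mortgage", "Orchestrator", "PropertySearch",
]
print(is_valid_path(trace_path))
print(is_valid_path(["TourSchedule", "Mortgage"]))
```

A check like this is cheap to run over logged traces to catch hand-offs that bypass the orchestrator unexpectedly.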

Code: Failure Recovery Pattern

Hierarchy makes "specialist failed, fall back to human" trivial:

@handoff_returns
async def scheduling_complete(result: SchedulingResult, ctx: Ctx):
    if result.status == "failed":
        # Orchestrator picks this up and routes to human
        return ctx.handoff_to(human_specialist, reason=result.error)
    return ctx.close_call(summary=result.summary)

The Squads equivalent requires every member to know about the human-fallback assistant and emit a transfer rule for it.
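
The reason this stays cheap in a hierarchy is that the fallback is declared once, at the parent, instead of once per member. A stdlib sketch of that idea, assuming specialists are plain callables that raise on failure; `with_fallback`, `flaky_scheduler`, and this `human_specialist` are hypothetical and unrelated to the decorator shown above.

```python
from typing import Callable

Handler = Callable[[dict], dict]


def with_fallback(specialist: Handler, fallback: Handler) -> Handler:
    """Wrap a specialist so that any failure is routed to one shared fallback."""
    def guarded(ctx: dict) -> dict:
        try:
            return specialist(ctx)
        except Exception as exc:  # broad on purpose: any failure escalates
            return fallback({**ctx, "reason": str(exc)})
    return guarded


def flaky_scheduler(ctx: dict) -> dict:
    raise RuntimeError("calendar API timeout")


def human_specialist(ctx: dict) -> dict:
    return {"routed_to": "human", "reason": ctx["reason"]}


guarded_scheduler = with_fallback(flaky_scheduler, human_specialist)
outcome = guarded_scheduler({"caller": "c_123"})
print(outcome)
```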

FAQ

Can Vapi Squads do return-to-parent at all?

Not as a first-class primitive. You can simulate it with a transfer rule that points back to the prior assistant, but the inbound assistant has no awareness it is "resuming" — it starts fresh.

Does CallSphere ever use a flat (non-hierarchical) agent?

Yes — the After-Hours voicemail agent is a single specialist with no orchestrator, because the flow is genuinely linear (capture, classify, route).

How deep can the hierarchy go?

Three levels in production today: orchestrator → specialist → sub-specialist (e.g., Real Estate Tour Schedule has a sub-specialist for virtual vs in-person). Anything deeper tends to be a smell.

Is this OpenAI-specific?

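Not strictly. The OpenAI Agents SDK defaults to OpenAI models but can also be pointed at other providers (for example through its LiteLLM integration), and the hierarchy pattern itself is portable. The CallSphere implementation can be ported to Anthropic or open-weight models with minor adapter work.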

What is the latency cost of hierarchy?

A hand-off costs roughly 80-150ms of additional roundtrip vs a flat agent because of the structured result merge. That is below the human-perceivable threshold for voice.
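
As a rough worked example, the per-hand-off range can be multiplied out over a whole call. The sketch below uses only the 80-150ms figure quoted above; `added_latency_ms` is a hypothetical helper.

```python
HANDOFF_MS = (80, 150)  # per-hand-off overhead range quoted above


def added_latency_ms(handoffs: int) -> tuple:
    """Total added roundtrip (low, high) in ms across a call with N hand-offs."""
    low, high = HANDOFF_MS
    return (handoffs * low, handoffs * high)


# A call with three hand-offs, like the Real Estate trace earlier.
print(added_latency_ms(3))
```

The total is spread across separate turns of the conversation, so the caller experiences each hand-off's overhead individually rather than as one long pause.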

See It In Action

The demo at /demo walks through a hierarchical Real Estate call with live trace, and /features documents the orchestrator + specialist patterns shipped per vertical.
