SIP REFER for AI Agent-to-Human Transfer in 2026: Cold, Warm, and Attended

SIP REFER is how your AI voice agent hands a call to a human without losing context, caller ID, or attestation. Here are the wire-level mechanics of cold, warm, and attended transfers in 2026.

The hardest moment of any AI voice deployment is the handoff. A clean SIP REFER is the difference between "I'll connect you to Sarah, who already knows you called about your daughter's prescription" and "Please hold while we transfer you" followed by a customer repeating themselves to a confused human.

Background

flowchart LR
  UA[SIP UA] -- REGISTER --> Reg[Registrar]
  UA -- INVITE --> Proxy[SIP Proxy]
  Proxy --> Dispatcher[Kamailio dispatcher]
  Dispatcher --> Worker1[FreeSWITCH worker]
  Dispatcher --> Worker2[FreeSWITCH worker]
  Worker1 --> AI[(AI agent)]
  Worker2 --> AI
CallSphere reference architecture

SIP REFER is defined by RFC 3515, and RFC 5589 documents the recommended call flows for transfer. REFER lets one party in a SIP dialog ask another party to initiate a new SIP request, typically an INVITE to a third party. It is the protocol primitive behind every flavor of call transfer covered here: cold (blind), warm (consultative), and attended. The receiver of a REFER returns a 202 Accepted, then sends NOTIFY messages whose message/sipfrag body reports the progress of the new call leg.
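To make the NOTIFY step concrete, here is a minimal Python sketch that reads the status line out of a sipfrag body and classifies the transfer's progress. The names `parse_sipfrag` and `ReferProgress` and the three outcome labels are illustrative assumptions, not part of any SIP library, and the sketch treats every final response outside 2xx as a failed transfer.

```python
import re
from typing import NamedTuple, Optional

# Matches a SIP response status line such as "SIP/2.0 200 OK".
_STATUS_LINE = re.compile(r"^SIP/2\.0\s+(\d{3})(?:\s+(.*))?$")


class ReferProgress(NamedTuple):
    code: int
    reason: str
    outcome: str  # "pending", "connected", or "failed" (hypothetical labels)


def parse_sipfrag(body: str) -> Optional[ReferProgress]:
    """Parse the first line of a message/sipfrag NOTIFY body."""
    lines = body.strip().splitlines()
    if not lines:
        return None
    match = _STATUS_LINE.match(lines[0].strip())
    if match is None:
        return None
    code = int(match.group(1))
    reason = (match.group(2) or "").strip()
    if code < 200:
        outcome = "pending"      # provisional: the new leg is still in progress
    elif code < 300:
        outcome = "connected"    # 2xx: the transfer target answered
    else:
        outcome = "failed"       # 3xx-6xx: treated here as not connected
    return ReferProgress(code, reason, outcome)
```

A monitoring service could call `parse_sipfrag` on each NOTIFY and only release the AI's leg once the outcome is "connected".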

For AI voice agents in 2026, REFER is the bridge between automated handling and human escalation. Done right, the human picks up with caller ID intact, optional context headers carrying the AI's transcript and intent, and (on supported carriers) a preserved STIR/SHAKEN attestation. Done wrong, you get dropped attestation, lost context, and the dreaded "let me get your information again" failure mode.
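One way to carry that context is to pack it into a single header-safe token. The sketch below is an assumption-laden illustration: the `intent` and `summary` field names, the 1024-character budget, and the halving strategy are all hypothetical, and whether a given carrier or SBC forwards custom headers at all has to be checked per deployment.

```python
import base64
import json

# Assumed budget; real limits depend on the carrier and SBC in the path.
MAX_HEADER_CHARS = 1024


def encode_context(intent: str, summary: str, max_chars: int = MAX_HEADER_CHARS) -> str:
    """Pack call context into a base64url token, trimming the summary to fit."""
    while True:
        payload = json.dumps({"intent": intent, "summary": summary}, separators=(",", ":"))
        token = base64.urlsafe_b64encode(payload.encode("utf-8")).decode("ascii")
        # Stop when the token fits, or when there is nothing left to trim.
        if len(token) <= max_chars or not summary:
            return token
        summary = summary[: len(summary) // 2]  # halve the summary and retry


def decode_context(token: str) -> dict:
    """Reverse encode_context on the human agent's side."""
    return json.loads(base64.urlsafe_b64decode(token.encode("ascii")))
```

The human-side desktop would call `decode_context` on the header value to show the intent and summary before the agent picks up.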

Technical deep-dive

Three transfer flavors with their wire patterns:

Cold (blind) transfer. The AI sends a REFER to the carrier with the human's number; the carrier originates a new leg and bridges it to the caller; the AI drops out.

REFER sip:[email protected] SIP/2.0
Refer-To: <sip:[email protected]>
Referred-By: <sip:[email protected]>

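Warm (consultative) transfer. The AI dials the human first, talks to them ("Sarah, this is the AI - the caller is Maria from Apex Realty asking about a refinance. Putting her through now."), then issues a REFER with Replaces. In the example, consult-dialog-id is a placeholder: a real Replaces value is the consult dialog's Call-ID plus its to-tag and from-tag, URI-escaped (RFC 3891).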

REFER sip:[email protected] SIP/2.0
Refer-To: <sip:[email protected]?Replaces=consult-dialog-id>
Referred-By: <sip:[email protected]>
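As a sketch of how the Replaces part of that Refer-To value might be assembled, the Python helper below escapes the consult dialog's identifiers into the URI. The function name and the example.com addresses are hypothetical; the tag semantics follow RFC 3891, where to-tag and from-tag are evaluated from the point of view of the endpoint that receives the resulting INVITE.

```python
from urllib.parse import quote


def refer_to_with_replaces(target_uri: str, call_id: str, to_tag: str, from_tag: str) -> str:
    """Build a Refer-To header value that embeds an escaped Replaces header."""
    # Replaces header value: call-id;to-tag=...;from-tag=...
    replaces = f"{call_id};to-tag={to_tag};from-tag={from_tag}"
    # safe="" forces '@', ';' and '=' to be percent-encoded inside the URI.
    return f"<{target_uri}?Replaces={quote(replaces, safe='')}>"


# Illustrative values only; real ones come from the consult dialog's signaling.
header_value = refer_to_with_replaces(
    "sip:sarah@example.com", "abc123@pbx.example.com", "7553452", "31431"
)
```

Escaping matters because an unescaped `;` or `@` in the Call-ID would otherwise be read as part of the target URI rather than as part of the Replaces value.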

Attended. Similar to warm but the AI keeps a brief 3-way bridge so it can vouch in real time before disconnecting.

For AI voice on Twilio Programmable Voice, the API equivalent of cold REFER is <Dial> with the new leg (Twilio originates), and warm is implemented as two <Dial> calls bridged via a conference. Twilio's Inbound SIP REFER (released 2024) lets your AI agent's TwiML respond to a REFER from a caller-side SIP device.

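<!-- Twilio TwiML for warm transfer through a conference.
     {{ call_sid }} is a template placeholder filled in server-side. -->
<Response>
  <Say>Connecting you to Sarah, hold on one moment.</Say>
  <Dial>
    <!-- statusCallbackEvent lists which conference events are posted to statusCallback.
         The room name stays on one line so it carries no stray whitespace. -->
    <Conference statusCallback="/transfer-events" statusCallbackEvent="start end join leave">transfer-{{ call_sid }}</Conference>
  </Dial>
</Response>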

The conference name carries the call SID so the AI's monitoring service can log the handoff and enrich the human agent's screen with the transcript.
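For readers who render this TwiML from code rather than a template, here is a minimal Python sketch using only the standard library. The function name `build_transfer_twiml` is hypothetical, and the `/transfer-events` path is taken from the example above; in practice Twilio's own helper library could produce the same document.

```python
import xml.etree.ElementTree as ET


def build_transfer_twiml(call_sid: str, callback_url: str = "/transfer-events") -> str:
    """Render the warm-transfer TwiML with the call SID embedded in the room name."""
    response = ET.Element("Response")
    ET.SubElement(response, "Say").text = "Connecting you to Sarah, hold on one moment."
    dial = ET.SubElement(response, "Dial")
    conference = ET.SubElement(dial, "Conference", {"statusCallback": callback_url})
    # Set the name as element text so it has no surrounding whitespace;
    # ElementTree also escapes any XML-special characters in it.
    conference.text = f"transfer-{call_sid}"
    return ET.tostring(response, encoding="unicode")
```

Because the room name is derived from the call SID, the monitoring service can recover the SID by stripping the `transfer-` prefix from any conference event it receives.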

CallSphere implementation

CallSphere runs Twilio Programmable Voice across all six verticals (Healthcare AI, Real Estate AI, Sales Calling AI, Salon AI, IT Helpdesk AI, After-Hours AI). Healthcare AI, a FastAPI service on port 8084, hands escalations to a human via TwiML <Dial> into a conference, with transcript injection so the human sees the AI conversation context before unmuting. Sales Calling AI runs 5 concurrent outbound calls per tenant; warm transfer to a human closer uses the same conference pattern. After-Hours AI places a simultaneous Twilio call and SMS to on-call staff with a 120-second timeout: if the on-call person answers within the window we bridge them in; if not, the caller goes to voicemail. All transfers preserve STIR/SHAKEN Level A attestation because they originate from the same Twilio Trust Hub profile. The platform spans 37 agents, 90+ tools, and 115+ DB tables, with HIPAA and SOC 2 compliance, $149/$499/$1499 pricing, and a 14-day trial; within each vertical the transfer pattern is uniform and pickup rates are measured.

Implementation steps

  1. Decide your default policy per vertical: Healthcare often needs warm with consent; Sales prefers attended; After-Hours uses cold-with-context.
  2. Implement the cold path first; it is the simplest wire pattern and many calls only need it.
  3. Layer the warm pattern using a conference room or B2BUA bridge so the AI can talk to the human privately first.
  4. Pass context: a custom SIP header X-AI-Context or a preconnect TwiML <Say> that summarizes the call to the human.
  5. Verify STIR/SHAKEN attestation survives; ported numbers can downgrade to Level B on transfer.
  6. Log the REFER/202/NOTIFY chain for every transfer; the NOTIFY tells you whether the new leg connected.
  7. Set a transfer timeout (we use 30 seconds for warm, 120 for After-Hours simul) and define the fallback.
  8. Train the AI on what to say in each path; "Connecting you to..." vs "Sarah is on the line, putting her through".
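Steps 1 and 7 above can be sketched as a small policy table plus a decision function. This is an illustration under stated assumptions, not CallSphere's implementation: the vertical keys, the action names ("bridge", "keep_ringing", "voicemail", "return_to_ai"), the 30-second timeout for the attended path, and the Healthcare and Sales fallbacks are all hypothetical; only the warm and After-Hours timeouts and the After-Hours voicemail fallback come from the text.

```python
from dataclasses import dataclass
from enum import Enum


class TransferMode(Enum):
    COLD = "cold"
    WARM = "warm"
    ATTENDED = "attended"


@dataclass(frozen=True)
class TransferPolicy:
    mode: TransferMode
    timeout_s: int
    fallback: str  # hypothetical action name


# Default policy per vertical (step 1) with its timeout and fallback (step 7).
POLICIES = {
    "healthcare": TransferPolicy(TransferMode.WARM, 30, "voicemail"),
    "sales": TransferPolicy(TransferMode.ATTENDED, 30, "return_to_ai"),
    "after_hours": TransferPolicy(TransferMode.COLD, 120, "voicemail"),
}


def next_action(vertical: str, answered: bool, elapsed_s: float) -> str:
    """Decide what to do with a pending transfer at a given moment."""
    policy = POLICIES[vertical]
    if answered and elapsed_s <= policy.timeout_s:
        return "bridge"          # human picked up inside the window
    if elapsed_s < policy.timeout_s:
        return "keep_ringing"    # still waiting, window not yet expired
    return policy.fallback       # window expired: take the fallback path
```

Keeping the policy in data rather than in branching code makes it straightforward to review each vertical's timeout and fallback in one place.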

FAQ

Will the human see the original caller ID? On REFER + Replaces with a carrier that supports identity preservation, yes. On a re-originated leg through Twilio Dial, you choose the From; we set it to the original caller for warm transfers.

Does warm transfer break HIPAA? No, as long as both endpoints are covered entities or business associates under a BAA and the transcript handoff travels over encrypted channels.

What if the human does not answer? Define a timeout and a fallback action: voicemail, secondary on-call, or back to the AI. After-Hours AI's 120 s window is our standard.

Can the AI listen during the human conversation? Technically yes via call recording, but it should be disclosed and limited; record-only-when-asked is the safer default for regulated verticals.

What is the latency cost of warm vs cold? Cold is essentially zero added latency. Warm adds 5-15 seconds (the consult call). Attended adds 15-30 seconds.
