By Sagar Shankaran, Founder of CallSphere
SIP REFER is how your AI voice agent hands a call to a human without losing context, caller ID, or attestation. Here are the wire-level mechanics of cold, warm, and attended transfers in 2026.
Key takeaways
The hardest moment of any AI voice deployment is the handoff. A clean SIP REFER is the difference between "I'll connect you to Sarah, who already knows you called about your daughter's prescription" and "Please hold while we transfer you" followed by a customer repeating themselves to a confused human.
flowchart LR
UA[SIP UA] -- REGISTER --> Reg[Registrar]
UA -- INVITE --> Proxy[SIP Proxy]
Proxy --> Dispatcher[Kamailio dispatcher]
Dispatcher --> Worker1[FreeSWITCH worker]
Dispatcher --> Worker2[FreeSWITCH worker]
Worker1 --> AI[(AI agent)]
Worker2 --> AI
SIP REFER is defined by RFC 3515, with RFC 5589 documenting best current practice for call transfer. It lets one party in a SIP dialog ask another party to initiate a new SIP request, typically an INVITE to a third party. It is the protocol primitive behind every flavor of call transfer: cold (blind), warm (consultative), and attended. The receiver of a REFER returns a 202 Accepted, then sends NOTIFY messages with a message/sipfrag body reporting the progress of the new call leg.
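The NOTIFY progress reports described above can be tracked with very little code. The following is a minimal sketch, assuming the sipfrag body begins with an ordinary SIP status line; the function names and the state labels (`in_progress`, `succeeded`, `failed`) are hypothetical and chosen only for illustration.

```python
import re
from typing import Optional

# Matches a SIP status line such as "SIP/2.0 180 Ringing".
_STATUS_LINE = re.compile(r"^SIP/2\.0\s+(\d{3})(?:\s+(.*))?$")


def parse_sipfrag_status(body: str) -> Optional[int]:
    """Return the status code from the first line of a message/sipfrag body."""
    lines = body.strip().splitlines()
    if not lines:
        return None
    match = _STATUS_LINE.match(lines[0].strip())
    return int(match.group(1)) if match else None


def transfer_state(body: str) -> str:
    """Map a sipfrag status code to a coarse transfer state (labels are hypothetical)."""
    code = parse_sipfrag_status(body)
    if code is None:
        return "unknown"
    if code < 200:
        return "in_progress"  # provisional response: the new leg is still being set up
    if code < 300:
        return "succeeded"    # 2xx: the transfer target answered
    return "failed"           # 3xx-6xx: the new leg did not complete


if __name__ == "__main__":
    for frag in ("SIP/2.0 100 Trying", "SIP/2.0 200 OK", "SIP/2.0 486 Busy Here"):
        print(frag, "->", transfer_state(frag))
```

A state such as `failed` is the natural trigger for returning the caller to the AI or to another fallback rather than leaving them in silence.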
For AI voice agents in 2026, REFER is the bridge between automated handling and human escalation. Done right, the human picks up with caller ID intact, optional context headers carrying the AI's transcript and intent, and (on supported carriers) a preserved STIR/SHAKEN attestation. Done wrong, you get dropped attestation, lost context, and the dreaded "let me get your information again" failure mode.
Three transfer flavors with their wire patterns:
Cold (blind) transfer. AI sends a REFER to the carrier with the human's number; carrier originates a new leg and bridges; AI drops out.
REFER sip:+19175551212@twilio.com SIP/2.0
Refer-To: <sip:+18001234567@twilio.com>
Referred-By: <sip:ai-agent@callsphere.ai>
Warm (consultative) transfer. AI dials the human first, talks to them ("Sarah, this is the AI - the caller is Maria from Apex Realty asking about a refinance. Putting her through now."), then issues a REFER with Replaces.
REFER sip:+19175551212@twilio.com SIP/2.0
Refer-To: <sip:+18001234567@twilio.com?Replaces=consult-dialog-id>
Referred-By: <sip:ai-agent@callsphere.ai>
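In the example above, `consult-dialog-id` is a placeholder. On the wire, the Replaces value defined in RFC 3891 has the form `call-id;to-tag=...;from-tag=...`, and it has to be percent-escaped when embedded in the Refer-To URI. The sketch below shows one way to build that header value in Python; the function name and the sample Call-ID and tags are hypothetical, and which tag counts as `to-tag` versus `from-tag` depends on the perspective of the party that receives the INVITE with Replaces, so check RFC 3891 against your own stack.

```python
from urllib.parse import quote


def refer_to_with_replaces(target_uri: str, call_id: str, to_tag: str, from_tag: str) -> str:
    """Build a Refer-To header value whose URI embeds an escaped Replaces header."""
    # Replaces header value: call-id;to-tag=...;from-tag=...
    replaces = f"{call_id};to-tag={to_tag};from-tag={from_tag}"
    # Escape every reserved character so ';', '=' and '@' survive inside the URI.
    escaped = quote(replaces, safe="")
    return f"<{target_uri}?Replaces={escaped}>"


if __name__ == "__main__":
    # Sample values are illustrative only.
    print("Refer-To:", refer_to_with_replaces(
        "sip:+18001234567@twilio.com", "abc123@host", "aaa", "bbb"))
```

Escaping matters because an unescaped `;` or `=` would be parsed as a URI parameter or header separator, and the transfer target would not be able to match the consult dialog.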
Attended. Similar to warm but the AI keeps a brief 3-way bridge so it can vouch in real time before disconnecting.
For AI voice on Twilio Programmable Voice, the API equivalent of cold REFER is <Dial> with the new leg (Twilio originates), and warm is implemented as two <Dial> calls bridged via a conference. Twilio's Inbound SIP REFER (released 2024) lets your AI agent's TwiML respond to a REFER from a caller-side SIP device.
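<!-- Twilio TwiML for warm transfer through a conference -->
<Response>
  <Say>Connecting you to Sarah, hold on one moment.</Say>
  <Dial>
    <!-- Twilio only posts the conference events listed in statusCallbackEvent -->
    <Conference statusCallback="/transfer-events" statusCallbackEvent="start end join leave">transfer-{{ call_sid }}</Conference>
  </Dial>
</Response>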
The conference name carries the call SID so the AI's monitoring service can log the handoff and enrich the human agent's screen with the transcript.
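As a rough illustration of that logging step, the sketch below recovers the original call SID from the conference name inside a status-callback handler. It assumes the conference is configured to emit participant events and that the callback carries `FriendlyName`, `StatusCallbackEvent`, and `CallSid` form fields; the function names and the returned dictionary shape are hypothetical, and the parameter names should be confirmed against Twilio's conference callback documentation.

```python
from typing import Mapping, Optional

PREFIX = "transfer-"  # matches the conference naming scheme used in the TwiML


def call_sid_from_conference(friendly_name: str) -> Optional[str]:
    """Extract the original call SID from a name like 'transfer-CA123'."""
    if not friendly_name.startswith(PREFIX):
        return None
    sid = friendly_name[len(PREFIX):]
    return sid or None


def handoff_event(params: Mapping[str, str]) -> Optional[dict]:
    """Turn callback form fields into a small handoff log record, or None if irrelevant."""
    sid = call_sid_from_conference(params.get("FriendlyName", ""))
    event = params.get("StatusCallbackEvent")
    if sid is None or event not in {"participant-join", "participant-leave"}:
        return None
    return {
        "original_call_sid": sid,          # key for looking up the AI transcript
        "event": event,
        "participant_call_sid": params.get("CallSid"),
    }


if __name__ == "__main__":
    print(handoff_event({
        "FriendlyName": "transfer-CA123",
        "StatusCallbackEvent": "participant-join",
        "CallSid": "CA456",
    }))
```

Keeping the parsing in a pure function makes it easy to call from whatever web framework serves the callback URL.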
CallSphere runs Twilio Programmable Voice across all six verticals (Healthcare AI, Real Estate AI, Sales Calling AI, Salon AI, IT Helpdesk AI, After-Hours AI). Healthcare AI, a FastAPI service on port 8084, hands escalations to a human via TwiML <Dial> with a conference plus transcript injection, so the human sees the AI conversation context before unmuting. Sales Calling AI runs 5 concurrent outbound calls per tenant; warm transfer to a human closer uses the same conference pattern. After-Hours AI places a simultaneous Twilio call and SMS to on-call staff with a 120-second timeout: if the on-call person answers within the window we bridge them in; if not, we fall back to voicemail. All transfers preserve STIR/SHAKEN A-level attestation because they originate from the same Twilio Trust Hub profile. Across the platform (37 agents, 90+ tools, 115+ database tables, HIPAA and SOC 2, $149/$499/$1,499 pricing, and a 14-day trial), the transfer pattern is uniform within each vertical, and pickup rates are measured.
X-AI-Context or a preconnect TwiML <Say> that summarizes the call to the human.
Will the human see the original caller ID? On REFER + Replaces with a carrier that supports identity preservation, yes. On a re-originated leg through Twilio Dial, you choose the From; we set it to the original caller for warm transfers.
Does warm transfer break HIPAA? No, as long as both endpoints are covered entities or business associates operating under a BAA, and the transcript handoff travels over encrypted channels.
What if the human does not answer? Define a timeout and a fallback action: voicemail, secondary on-call, or back to the AI. After-Hours AI's 120 s window is our standard.
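The timeout-plus-fallback rule can be sketched with `asyncio`. This is an illustration under stated assumptions, not CallSphere's implementation: `ring_human` and `fallback` are hypothetical callables standing in for whatever dials the human and whatever runs the fallback action (voicemail, secondary on-call, or back to the AI).

```python
import asyncio
from typing import Awaitable, Callable


async def transfer_with_fallback(
    ring_human: Callable[[], Awaitable[bool]],
    fallback: Callable[[], Awaitable[str]],
    timeout_s: float = 120.0,
) -> str:
    """Ring the human; if nobody answers within timeout_s, run the fallback action."""
    try:
        answered = await asyncio.wait_for(ring_human(), timeout=timeout_s)
    except asyncio.TimeoutError:
        answered = False  # the window expired before anyone picked up
    if answered:
        return "bridged"
    return await fallback()


async def _demo() -> None:
    async def nobody_home() -> bool:
        await asyncio.sleep(1.0)  # simulates a phone that keeps ringing
        return True

    async def voicemail() -> str:
        return "voicemail"

    # A short timeout keeps the demo fast; the article's window is 120 seconds.
    print(await transfer_with_fallback(nobody_home, voicemail, timeout_s=0.05))


if __name__ == "__main__":
    asyncio.run(_demo())
```

Passing the fallback in as a parameter keeps the timeout policy the same across verticals while letting each one choose its own fallback action.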
Can the AI listen during the human conversation? Technically yes via call recording, but it should be disclosed and limited; record-only-when-asked is the safer default for regulated verticals.
What is the latency cost of warm vs cold? Cold is essentially zero added latency. Warm adds 5-15 seconds (the consult call). Attended adds 15-30 seconds.
Start a 14-day trial and watch a warm transfer in action, see pricing, or contact us about transfer flows for your AI voice agent.
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.