SIP INFO for DTMF in AI Agent Flows in 2026: When Out-of-Band Beats RTP Events
RFC 4733 RTP events handle most DTMF, but SIP INFO is the workaround when carriers strip telephone-events or when your AI agent needs out-of-band signaling. Here is when to use which in 2026.
The user pressed 4 to confirm. Your AI agent never heard it. Welcome to the DTMF transport problem - a 30-year-old wart that still bites AI voice deployments in 2026.
Background
flowchart LR
UA[SIP UA] -- REGISTER --> Reg[Registrar]
UA -- INVITE --> Proxy[SIP Proxy]
Proxy --> Dispatcher[Kamailio dispatcher]
Dispatcher --> Worker1[FreeSWITCH worker]
Dispatcher --> Worker2[FreeSWITCH worker]
Worker1 --> AI[(AI agent)]
Worker2 --> AIDTMF (touch-tones) over IP has three transport methods. RFC 4733 (which obsoletes the older RFC 2833) defines telephone-event payloads carried inside the RTP stream as a special payload type. SIP INFO, defined by RFC 2976 and refined for keypad use by RFC 6086, carries the digit as a SIP signaling message outside the media path. In-band DTMF actually plays the audible tone in the audio.
For AI voice agents, the picture is messy. Most US carriers prefer RFC 4733 telephone-events on egress because they are precise and tone-faithful. But carrier-level transcoding can strip the events on transit (PCMU + tel-event mismatch), wholesale resellers sometimes drop the negotiated payload type, and AI bridges that decode RTP straight to PCM may not detect tel-events at all. SIP INFO is the fallback when RTP events do not arrive.
Technical deep-dive
A SIP INFO DTMF message looks like:
Hear it before you finish reading
Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.
INFO sip:[email protected] SIP/2.0
Via: SIP/2.0/TLS sbc.twilio.com;branch=z9hG4bK-info-1
Content-Type: application/dtmf-relay
Content-Length: 24
Signal=4
Duration=200
Signal is the digit (0-9, *, #, A-D), Duration is the tone duration in milliseconds. RFC 6086 also defines application/dtmf as a simpler one-line body but application/dtmf-relay is the de-facto standard, originating from Cisco and adopted broadly.
For AI agents the typical event flow is:
- Caller presses 4 on their phone
- Their device generates an in-band tone or RFC 4733 telephone-event
- The carrier hops transcode somewhere along the way
- By the time the call hits your AI bridge, the digit may have arrived as: in-band audio (audible tone), RFC 4733 events, or SIP INFO - or all three
A robust AI bridge listens for all three. The OpenAI Realtime API can detect DTMF tones in the audio stream as a server-side feature, but the timing is less precise than RFC 4733 events; for menu-driven flows, a parser on SIP INFO is faster and more reliable.
# FastAPI handler that merges DTMF sources
@app.post("/twilio/webhook/dtmf")
async def handle_dtmf(call_sid: str, digits: str):
"""Twilio sends DTMF as a webhook (its preferred method)."""
await dtmf_queue.put({"sid": call_sid, "digit": digits, "src": "webhook"})
@app.websocket("/realtime/{call_sid}")
async def media_stream(ws: WebSocket, call_sid: str):
async for msg in ws.iter_text():
evt = json.loads(msg)
if evt.get("event") == "dtmf":
await dtmf_queue.put({"sid": call_sid, "digit": evt["dtmf"]["digit"], "src": "media"})
The Twilio webhook path is roughly equivalent to SIP INFO out-of-band; the WebSocket media-stream DTMF event is roughly equivalent to RFC 4733. We dedupe in dtmf_queue since both can fire.
CallSphere implementation
CallSphere uses Twilio Programmable Voice across all six verticals. Twilio handles DTMF detection across in-band tone and RFC 4733 events automatically and forwards us either a webhook (TwiML <Gather>) or a Media Streams DTMF event. For Healthcare AI on FastAPI :8084 we accept both paths into a unified queue and feed digits into the OpenAI Realtime conversation as user-content events. Sales Calling AI uses DTMF for opt-out (press 9 to stop) on outbound legs - 5 concurrent per tenant - and we log every digit for TCPA records. After-Hours AI listens for confirmation digits during simul call+SMS to on-call staff (120-second timeout) so the on-call can press 1 to accept the page. Across 37 agents, 90+ tools, 115+ DB tables, HIPAA + SOC 2 alignment, $149/$499/$1499 pricing, and 14-day trial, the DTMF parsing layer is shared infrastructure across products.
Implementation steps
- Negotiate RFC 4733 telephone-events in your SDP answer (payload type 101 is conventional).
- Have your AI bridge subscribe to whichever DTMF event your provider exposes (Twilio webhook + Media Streams).
- Add a SIP INFO listener if your provider can pass it through; useful for upstream legs that strip RTP events.
- Dedupe digits across sources within a 200 ms window; phones often produce both an in-band tone and an event.
- Feed the digit into the AI as a synthetic user message ("[user pressed 4]") so the LLM can react.
- Log every digit to your CDR with source and timestamp for TCPA opt-out evidence.
- Test on a real cell phone, a real landline, and a softphone; behavior varies wildly.
- Set a debounce to avoid double-firing on long key presses.
FAQ
Should I support all three DTMF transports or just one? For inbound AI in 2026, accept all three. The cost is one extra parser; the cost of missing a digit is a frustrated user.
Still reading? Stop comparing — try CallSphere live.
CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.
Why is RFC 4733 not enough? Some carrier interconnects strip non-default payload types during transcoding; some softphones default to in-band only; some PBXs default to SIP INFO.
Does OpenAI Realtime detect DTMF natively? The model can hear in-band tones and react to them, but timing is less precise than parsed events. For menu logic always parse the event channel.
Is SIP INFO going away?
No. RFC 6086 reaffirms it, and Cisco/Avaya/Microsoft Teams continue to use application/dtmf-relay widely.
What about pulse dialing or rotary? Anachronism. Modern PSTN converts pulse to DTMF at the CO; you will never see pulse on IP signaling.
Sources
- DEV: How Voice AI handles DTMF (RFC 2833 vs SIP INFO vs in-band)
- VoIP-Info: SIP DTMF Signalling
- RFC 2976: The SIP INFO Method
Start a 14-day trial and test DTMF flows live, see pricing, or contact us about menu-driven AI voice flows.
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available -- no signup required.