---
title: "SIP INFO for DTMF in AI Agent Flows in 2026: When Out-of-Band Beats RTP Events"
description: "RFC 4733 RTP events handle most DTMF, but SIP INFO is the workaround when carriers strip telephone-events or when your AI agent needs out-of-band signaling. Here is when to use which in 2026."
canonical: https://callsphere.ai/blog/vw3d-sip-info-dtmf-ai-flows-2026
category: "AI Engineering"
tags: ["SIP INFO", "DTMF", "RFC 4733", "AI Voice", "Signaling"]
author: "CallSphere Team"
published: 2026-03-25T00:00:00.000Z
updated: 2026-05-07T09:59:38.198Z
---

# SIP INFO for DTMF in AI Agent Flows in 2026: When Out-of-Band Beats RTP Events

> RFC 4733 RTP events handle most DTMF, but SIP INFO is the workaround when carriers strip telephone-events or when your AI agent needs out-of-band signaling. Here is when to use which in 2026.

> The user pressed 4 to confirm. Your AI agent never heard it. Welcome to the DTMF transport problem - a 30-year-old wart that still bites AI voice deployments in 2026.

## Background

```mermaid
flowchart LR
  UA[SIP UA] -- REGISTER --> Reg[Registrar]
  UA -- INVITE --> Proxy[SIP Proxy]
  Proxy --> Dispatcher[Kamailio dispatcher]
  Dispatcher --> Worker1[FreeSWITCH worker]
  Dispatcher --> Worker2[FreeSWITCH worker]
  Worker1 --> AI[(AI agent)]
  Worker2 --> AI
```

CallSphere reference architecture

DTMF (touch-tones) over IP has three transport methods. RFC 4733 (which obsoletes the older RFC 2833) defines telephone-event payloads carried inside the RTP stream as a special payload type. SIP INFO, defined by RFC 2976 and refined for keypad use by RFC 6086, carries the digit as a SIP signaling message outside the media path. In-band DTMF actually plays the audible tone in the audio.

For AI voice agents, the picture is messy. Most US carriers prefer RFC 4733 telephone-events on egress because they are precise and tone-faithful. But carrier-level transcoding can strip the events on transit (PCMU + tel-event mismatch), wholesale resellers sometimes drop the negotiated payload type, and AI bridges that decode RTP straight to PCM may not detect tel-events at all. SIP INFO is the fallback when RTP events do not arrive.

## Technical deep-dive

A SIP INFO DTMF message looks like:

```
INFO sip:bridge@callsphere.ai SIP/2.0
Via: SIP/2.0/TLS sbc.twilio.com;branch=z9hG4bK-info-1
Content-Type: application/dtmf-relay
Content-Length: 24

Signal=4
Duration=200
```

`Signal` is the digit (0-9, *, #, A-D), `Duration` is the tone duration in milliseconds. RFC 6086 also defines `application/dtmf` as a simpler one-line body but `application/dtmf-relay` is the de-facto standard, originating from Cisco and adopted broadly.

For AI agents the typical event flow is:

1. Caller presses 4 on their phone
2. Their device generates an in-band tone or RFC 4733 telephone-event
3. The carrier hops transcode somewhere along the way
4. By the time the call hits your AI bridge, the digit may have arrived as: in-band audio (audible tone), RFC 4733 events, or SIP INFO - or all three

A robust AI bridge listens for all three. The OpenAI Realtime API can detect DTMF tones in the audio stream as a server-side feature, but the timing is less precise than RFC 4733 events; for menu-driven flows, a parser on SIP INFO is faster and more reliable.

```python
# FastAPI handler that merges DTMF sources
@app.post("/twilio/webhook/dtmf")
async def handle_dtmf(call_sid: str, digits: str):
    """Twilio sends DTMF as a webhook (its preferred method)."""
    await dtmf_queue.put({"sid": call_sid, "digit": digits, "src": "webhook"})

@app.websocket("/realtime/{call_sid}")
async def media_stream(ws: WebSocket, call_sid: str):
    async for msg in ws.iter_text():
        evt = json.loads(msg)
        if evt.get("event") == "dtmf":
            await dtmf_queue.put({"sid": call_sid, "digit": evt["dtmf"]["digit"], "src": "media"})
```

The Twilio webhook path is roughly equivalent to SIP INFO out-of-band; the WebSocket media-stream DTMF event is roughly equivalent to RFC 4733. We dedupe in `dtmf_queue` since both can fire.

## CallSphere implementation

CallSphere uses Twilio Programmable Voice across all six verticals. Twilio handles DTMF detection across in-band tone and RFC 4733 events automatically and forwards us either a webhook (TwiML ``) or a Media Streams DTMF event. For Healthcare AI on FastAPI :8084 we accept both paths into a unified queue and feed digits into the OpenAI Realtime conversation as user-content events. Sales Calling AI uses DTMF for opt-out (press 9 to stop) on outbound legs - 5 concurrent per tenant - and we log every digit for TCPA records. After-Hours AI listens for confirmation digits during simul call+SMS to on-call staff (120-second timeout) so the on-call can press 1 to accept the page. Across 37 agents, 90+ tools, 115+ DB tables, HIPAA + SOC 2 alignment, $149/$499/$1499 pricing, and 14-day trial, the DTMF parsing layer is shared infrastructure across products.

## Implementation steps

1. Negotiate RFC 4733 telephone-events in your SDP answer (payload type 101 is conventional).
2. Have your AI bridge subscribe to whichever DTMF event your provider exposes (Twilio webhook + Media Streams).
3. Add a SIP INFO listener if your provider can pass it through; useful for upstream legs that strip RTP events.
4. Dedupe digits across sources within a 200 ms window; phones often produce both an in-band tone and an event.
5. Feed the digit into the AI as a synthetic user message ("[user pressed 4]") so the LLM can react.
6. Log every digit to your CDR with source and timestamp for TCPA opt-out evidence.
7. Test on a real cell phone, a real landline, and a softphone; behavior varies wildly.
8. Set a debounce to avoid double-firing on long key presses.

## FAQ

**Should I support all three DTMF transports or just one?**
For inbound AI in 2026, accept all three. The cost is one extra parser; the cost of missing a digit is a frustrated user.

**Why is RFC 4733 not enough?**
Some carrier interconnects strip non-default payload types during transcoding; some softphones default to in-band only; some PBXs default to SIP INFO.

**Does OpenAI Realtime detect DTMF natively?**
The model can hear in-band tones and react to them, but timing is less precise than parsed events. For menu logic always parse the event channel.

**Is SIP INFO going away?**
No. RFC 6086 reaffirms it, and Cisco/Avaya/Microsoft Teams continue to use `application/dtmf-relay` widely.

**What about pulse dialing or rotary?**
Anachronism. Modern PSTN converts pulse to DTMF at the CO; you will never see pulse on IP signaling.

## Sources

- [DEV: How Voice AI handles DTMF (RFC 2833 vs SIP INFO vs in-band)](https://dev.to/priyanka_309d6c6a6006387e/how-voice-ai-handles-dtmf-the-complete-guide-rfc-2833-vs-sip-info-vs-in-band-54do)
- [VoIP-Info: SIP DTMF Signalling](https://www.voip-info.org/sip-dtmf-signalling/)
- [RFC 2976: The SIP INFO Method](https://www.rfc-editor.org/rfc/rfc2976.html)

Start a [14-day trial](/trial) and test DTMF flows live, see [pricing](/pricing), or [contact us](/contact) about menu-driven AI voice flows.

---

Source: https://callsphere.ai/blog/vw3d-sip-info-dtmf-ai-flows-2026
