By Sagar Shankaran, Founder of CallSphere
CallKit + WebRTC + PushKit is the only Apple-blessed path for AI voice agents on iOS. Here is the 2026 production playbook with audio-session, ICE, and CXProvider details.
Key takeaways
CallKit is not optional. Every iOS AI voice agent that wants to ring like a phone, survive backgrounding, and play audio over the right route in 2026 has to live inside CXProvider — and inside a WebRTC pipeline that knows how to share an AVAudioSession with it.
Apple introduced CallKit in iOS 10 to give VoIP apps the same affordances as the native Phone app: lock-screen ringing, native call UI, integration with the Recents and Contacts databases, and system-managed audio session activation at the same priority as a native phone call. In 2026, CallKit remains the only Apple-blessed path for "the app must ring even when killed". For AI voice agents, that means an inbound call from your scheduling assistant, your real-estate agent, or your healthcare intake bot must come up the same way as a FaceTime audio call.
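WebRTC handles media; CallKit handles the call UI and the audio session; PushKit wakes the app. Skipping CallKit means you effectively lose PushKit too (since iOS 13, an app that receives a VoIP push without reporting a call to CallKit is terminated, and repeated failures can stop VoIP push delivery altogether), the audio session will not switch routes correctly when AirPods connect, and the user cannot answer from the lock screen. Skipping WebRTC usually means running your own SIP/RTP stack, which works on Wi-Fi but struggles behind carrier-grade NAT on cellular because it lacks the ICE, STUN, and TURN machinery WebRTC provides.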
```mermaid
flowchart LR
    Server[Voice Agent Backend] -- VoIP push --> APNs[(Apple APNs)]
    APNs -- PushKit payload --> App[iOS App]
    App -- reportNewIncomingCall --> CXProvider[CallKit CXProvider]
    CXProvider -- user answers --> WebRTC[WebRTC PeerConnection]
    WebRTC -- DTLS-SRTP --> Gateway[Pion Go gateway 1.23]
    Gateway -- NATS --> Pod[6-container agent pod]
```
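The first hop in the diagram is a VoIP push from the backend through APNs. The sketch below shows what that HTTP/2 request might look like. It is written in Swift for consistency with the rest of the post, though the backend can be any language. It assumes token-based APNs auth with a JWT you sign elsewhere. `VoIPPush`, the `callID` key, and the bundle ID are hypothetical; the `from` key matches what the client code in this post reads from `dictionaryPayload`.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

/// Hypothetical helper: builds the request a backend sends to APNs to deliver
/// a PushKit VoIP push. JWT signing is out of scope; the token is passed in.
struct VoIPPush {
    let deviceToken: String   // hex token the app received in PKPushCredentials
    let bundleID: String      // assumed, e.g. "com.example.agent"
    let caller: String        // the app reads this as payload["from"]
    let callID: UUID          // assumed key so the app can correlate the call
    var sandbox = false       // development builds use the sandbox host

    func makeRequest(jwt: String) throws -> URLRequest {
        let host = sandbox ? "api.sandbox.push.apple.com" : "api.push.apple.com"
        var request = URLRequest(url: URL(string: "https://\(host)/3/device/\(deviceToken)")!)
        request.httpMethod = "POST"
        request.setValue("bearer \(jwt)", forHTTPHeaderField: "authorization")
        request.setValue("voip", forHTTPHeaderField: "apns-push-type")
        // VoIP pushes are addressed to the bundle ID with a ".voip" suffix.
        request.setValue("\(bundleID).voip", forHTTPHeaderField: "apns-topic")
        request.setValue("10", forHTTPHeaderField: "apns-priority")
        // 0 = attempt delivery once and do not store; a call that rings late is worse than none.
        request.setValue("0", forHTTPHeaderField: "apns-expiration")
        let body: [String: String] = ["from": caller, "callID": callID.uuidString]
        request.httpBody = try JSONSerialization.data(withJSONObject: body)
        return request
    }
}
```

Setting `apns-expiration` to `0` is a design choice for calls specifically: a ring that arrives after the caller has given up only confuses the user.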
CallSphere uses this exact stack for the iOS clients of two of its six verticals (real estate, healthcare, behavioral health, legal, salon, insurance).
Across 37 agents, 90+ tools, 115+ database tables, and HIPAA + SOC 2 controls, the iOS layer stays thin: it terminates SRTP, hands audio frames to AVAudioEngine, and lets CallKit own the UI. Pricing remains $149/$499/$1499, with a 14-day trial at /trial; affiliates earn 22% (see /affiliate).
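```swift
import AVFoundation
import CallKit
import PushKit
import WebRTC

final class CallManager: NSObject, CXProviderDelegate, PKPushRegistryDelegate {
    private let provider: CXProvider

    override init() {
        // 1. Configure CXProvider once at app launch
        let config = CXProviderConfiguration()
        config.supportsVideo = false
        config.maximumCallsPerCallGroup = 1
        config.supportedHandleTypes = [.generic]
        provider = CXProvider(configuration: config)
        super.init()
        provider.setDelegate(self, queue: nil) // nil = main queue
    }

    // 2. On a PushKit VoIP push, report the call immediately, before calling completion.
    //    Since iOS 13, apps that receive a VoIP push without reporting a call are terminated.
    func pushRegistry(_ registry: PKPushRegistry, didReceiveIncomingPushWith payload: PKPushPayload,
                      for type: PKPushType, completion: @escaping () -> Void) {
        let update = CXCallUpdate()
        let caller = payload.dictionaryPayload["from"] as? String ?? "AI Agent"
        update.remoteHandle = CXHandle(type: .generic, value: caller)
        update.hasVideo = false
        provider.reportNewIncomingCall(with: UUID(), update: update) { _ in completion() }
    }

    // 3. On answer, configure RTCAudioSession before peer connection setup
    func provider(_ provider: CXProvider, perform action: CXAnswerCallAction) {
        let session = RTCAudioSession.sharedInstance()
        session.lockForConfiguration()
        do {
            // Recent WebRTC builds take AVAudioSession.Category/Mode; older ones took String (.rawValue).
            try session.setCategory(.playAndRecord, with: [.allowBluetooth, .duckOthers])
            try session.setMode(.voiceChat)
        } catch {
            session.unlockForConfiguration()
            action.fail() // tell CallKit the answer failed instead of hanging
            return
        }
        session.unlockForConfiguration()
        startWebRTC()
        action.fulfill()
    }

    // Required delegate stubs. PushKit registration (PKPushRegistry, desiredPushTypes = [.voIP]) is omitted.
    func pushRegistry(_ registry: PKPushRegistry, didUpdate pushCredentials: PKPushCredentials, for type: PKPushType) {}
    func providerDidReset(_ provider: CXProvider) {}
    private func startWebRTC() { /* create the RTCPeerConnection here */ }
}
```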
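The snippet calls `startWebRTC()` but does not show its body, which is where the ICE settings live. A minimal sketch of one possible shape follows, assuming an audio-only call, the `WebRTC` iOS module, and placeholder STUN/TURN hosts (`stun.example.com`, `turn.example.com`) and credentials. The helper names `makeRTCConfiguration` and `makeAudioPeerConnection` are hypothetical.

```swift
import WebRTC

/// Hypothetical ICE settings; hostnames and credentials are placeholders.
func makeRTCConfiguration(turnUser: String, turnPassword: String) -> RTCConfiguration {
    let config = RTCConfiguration()
    config.sdpSemantics = .unifiedPlan
    config.iceServers = [
        RTCIceServer(urlStrings: ["stun:stun.example.com:3478"]),
        // TURN over TLS on 443 is the candidate most likely to get through
        // carrier-grade NAT and restrictive Wi-Fi firewalls.
        RTCIceServer(urlStrings: ["turn:turn.example.com:3478?transport=udp",
                                  "turns:turn.example.com:443?transport=tcp"],
                     username: turnUser, credential: turnPassword)
    ]
    // Keep gathering candidates so ICE can recover when the phone moves from Wi-Fi to cellular.
    config.continualGatheringPolicy = .gatherContinually
    config.bundlePolicy = .maxBundle
    config.rtcpMuxPolicy = .require
    return config
}

/// Hypothetical body for startWebRTC(): one audio-only peer connection.
func makeAudioPeerConnection(factory: RTCPeerConnectionFactory,
                             config: RTCConfiguration,
                             delegate: RTCPeerConnectionDelegate) -> RTCPeerConnection? {
    let constraints = RTCMediaConstraints(mandatoryConstraints: nil, optionalConstraints: nil)
    // Newer WebRTC builds return an optional here; binding through an optional works with both.
    let created: RTCPeerConnection? = factory.peerConnection(with: config, constraints: constraints,
                                                             delegate: delegate)
    guard let pc = created else { return nil }
    let source = factory.audioSource(with: constraints)
    let track = factory.audioTrack(with: source, trackId: "agent-mic")
    _ = pc.add(track, streamIds: ["agent-call"])
    return pc
}
```

Offer/answer exchange with the gateway is left out; it depends on your signaling channel.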
Is CallKit required for AI voice agents? Yes, for inbound calls that should ring; outbound-only apps can skip it but lose CallKit's audio session handling and proper audio routing.
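If an outbound-only app wants that routing back, it can still start its calls through CallKit. A minimal sketch, assuming the agent is identified by a generic handle; `OutboundCallStarter` and `makeTransaction` are hypothetical names, while `CXCallController`, `CXStartCallAction`, and `CXTransaction` are the CallKit types involved.

```swift
import CallKit

/// Hypothetical helper: routes an outbound call to the AI agent through CallKit
/// so it gets the same audio session handling as an inbound one.
final class OutboundCallStarter {
    private let controller = CXCallController()

    func makeTransaction(agentName: String) -> (UUID, CXTransaction) {
        let uuid = UUID()
        let start = CXStartCallAction(call: uuid, handle: CXHandle(type: .generic, value: agentName))
        start.isVideo = false
        return (uuid, CXTransaction(action: start))
    }

    func startCall(to agentName: String, completion: @escaping (Error?) -> Void) {
        let (_, transaction) = makeTransaction(agentName: agentName)
        // CallKit validates the request, then calls provider(_:perform:) with the CXStartCallAction.
        controller.request(transaction, completion: completion)
    }
}
```

In the provider's `CXStartCallAction` handler you would fulfill the action, call `reportOutgoingCall(with:startedConnectingAt:)`, and later `reportOutgoingCall(with:connectedAt:)` once the peer connection is up.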
Does CallKit work in China? Apple requires apps distributed on the China App Store to disable CallKit, so ship a non-CallKit variant for that storefront.
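One way to keep that variant in the same codebase is a compile-time switch. The sketch assumes a hypothetical `NO_CALLKIT` flag set under the China target's "Active Compilation Conditions"; the enum names are illustrative. Because iOS 13+ expects every VoIP push to be reported to CallKit, the CallKit-free path here also skips PushKit and falls back to standard notifications.

```swift
/// Hypothetical compile-time switch: define NO_CALLKIT only in the China target.
enum IncomingCallPath: Equatable {
    /// PushKit VoIP push plus the CXProvider system ringing UI.
    case callKit
    /// No CallKit, therefore no VoIP pushes; use standard notifications
    /// and an in-app call screen instead.
    case standardNotification
}

enum BuildConfig {
    static var incomingCallPath: IncomingCallPath {
        #if NO_CALLKIT
        return .standardNotification
        #else
        return .callKit
        #endif
    }

    /// Register PushKit only when CallKit is available to report the calls.
    static var shouldRegisterVoIPPushes: Bool { incomingCallPath == .callKit }
}
```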
Can I show my own custom UI? Not for ringing: lock-screen ringing must go through CallKit's system call screen, which you can brand with an icon and ringtone. Present your own screens after the user answers and opens the app.
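The branding hooks live on `CXProviderConfiguration`. A minimal sketch, assuming the asset names `AgentCallIcon` and `agent_ring.caf` (both hypothetical) exist in the app bundle:

```swift
import CallKit
import UIKit

/// Hypothetical factory for a branded provider configuration.
func makeBrandedProviderConfiguration() -> CXProviderConfiguration {
    let config = CXProviderConfiguration()
    config.supportsVideo = false
    config.maximumCallsPerCallGroup = 1
    config.supportedHandleTypes = [.generic]
    // Template image CallKit shows for your app in the system call UI.
    config.iconTemplateImageData = UIImage(named: "AgentCallIcon")?.pngData()
    // Ringtone: name of a sound file bundled with the app.
    config.ringtoneSound = "agent_ring.caf"
    // Keep agent calls out of the Phone app's Recents list, if you prefer.
    config.includesCallsInRecents = false
    return config
}
```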
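Does it work with WebRTC's default RTCAudioSession? Yes. `RTCAudioSession.sharedInstance()` wraps `AVAudioSession` and has explicit hooks for CallKit. Make audio-session changes through it rather than configuring `AVAudioSession.sharedInstance()` directly, so WebRTC's view of the session stays consistent.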
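Those CallKit hooks matter because CallKit, not your app, decides when the audio session becomes active. A minimal sketch of the hand-off, assuming the `WebRTC` iOS module; `CallKitAudioBridge` is a hypothetical name, and its methods are meant to be called from your `CXProviderDelegate` callbacks.

```swift
import AVFoundation
import WebRTC

/// Hypothetical helper: lets CallKit own audio activation and tells WebRTC when it happens.
enum CallKitAudioBridge {
    /// Call once at launch, before any RTCPeerConnection exists.
    static func configureForCallKit() {
        let rtc = RTCAudioSession.sharedInstance()
        // Stop WebRTC from starting its audio unit on its own.
        rtc.useManualAudio = true
        rtc.isAudioEnabled = false
    }

    /// Call from provider(_:didActivate:).
    static func didActivate(_ session: AVAudioSession) {
        let rtc = RTCAudioSession.sharedInstance()
        rtc.audioSessionDidActivate(session)
        rtc.isAudioEnabled = true
    }

    /// Call from provider(_:didDeactivate:).
    static func didDeactivate(_ session: AVAudioSession) {
        let rtc = RTCAudioSession.sharedInstance()
        rtc.audioSessionDidDeactivate(session)
        rtc.isAudioEnabled = false
    }
}
```

Without manual audio, WebRTC may try to start audio before CallKit has activated the session, which is a common cause of silent calls answered from the lock screen.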
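What happens during an interrupting phone call? CallKit lets the user hold or end your call. On hold, your provider receives a `CXSetHeldCallAction` and CallKit deactivates your audio session; stop sending audio while held and resume when the hold is released, without tearing down the peer connection.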
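A minimal sketch of that hold handling, assuming you keep a reference to the local `RTCAudioTrack`; `CallMediaState` is a hypothetical name, and its `handle` methods are meant to be called from `provider(_:perform:)`. It also folds in mute, since CallKit's mute button arrives as a `CXSetMutedCallAction`.

```swift
import CallKit
import WebRTC

/// Hypothetical helper: mirrors CallKit hold/mute state onto the local WebRTC audio track.
final class CallMediaState {
    private let localAudio: RTCAudioTrack
    private(set) var isOnHold = false
    private(set) var isMuted = false

    init(localAudio: RTCAudioTrack) { self.localAudio = localAudio }

    func handle(_ action: CXSetHeldCallAction) {
        isOnHold = action.isOnHold
        applyToTrack()
        action.fulfill()
    }

    func handle(_ action: CXSetMutedCallAction) {
        isMuted = action.isMuted
        applyToTrack()
        action.fulfill()
    }

    private func applyToTrack() {
        // Stop sending microphone audio while held or muted. The peer connection
        // and its ICE/DTLS state stay up, so resuming does not need renegotiation.
        localAudio.isEnabled = !(isOnHold || isMuted)
    }
}
```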
Try the WebRTC + CallKit path live at /demo, browse plans at /pricing, or start a /trial.
Written by
Sagar Shankaran · Founder, CallSphere
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.