AI Engineering

PJSIP / pjproject 2.17 for Embedded AI Voice Devices in 2026

PJSIP 2.17 (April 2026) added explicit AI real-time speech connectivity support, async SIP authentication, and improved deadlock detection. For embedded AI voice on Linux, Android, RTOS, or microcontroller-class hardware, it is the obvious choice.

When you need a SIP and media stack on a Raspberry Pi, an Android handset, or a sub-128 MB embedded device that has to talk to an AI cloud, PJSIP is what you reach for. The 2.17 release in April 2026 explicitly added AI real-time speech connectivity hooks, async authentication so client apps no longer freeze on a blocking auth challenge, and a CMake reorganization that makes cross-compiling cleaner.

Background

PJSIP is a free, open-source multimedia communication library written in C, with high-level APIs covering SIP, SDP, RTP, STUN, TURN, and ICE. It started as Benny Prijono's Master's-thesis SIP stack in 2003 and has since been adopted by Asterisk (chan_pjsip has been the default SIP channel driver since Asterisk 13), CSipSimple, and a large share of the embedded VoIP devices on the market.

PJSIP supports narrowband, wideband, and superwideband codecs (G.711, GSM, iLBC, G.722, Opus), DTLS-SRTP, ICE for NAT traversal, and a high-level abstraction (PJSUA / PJSUA2) that hides most of the SIP state-machine pain. It targets Linux, Windows, macOS, iOS, Android, and several RTOSes; the smallest reasonable footprint is around 700 KB code and 200 KB RAM.

Hear it before you finish reading

Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.

Try Live Demo →

Architecture

graph TD
    A[Embedded device microphone] --> B[PJMEDIA capture port]
    B --> C[Codec G.722 / Opus]
    C --> D[PJSIP RTP / SRTP]
    D --> E[Carrier SIP trunk]
    E --> F[AI bridge in cloud]
    F --> G[OpenAI Realtime / STS]
    G --> F
    F -->|RTP back| D
    D --> H[PJMEDIA playback port]
    H --> I[Speaker]

For AI voice, the typical embedded pattern is: PJSUA2 registers a SIP account against your carrier or your own SIP-aware AI bridge; the bridge handles STT, LLM, and TTS; the device just speaks and listens. PJSIP 2.17 adds optional hooks for direct WebSocket transport to AI services, but the cleanest production path is still: device runs SIP, your edge runs an AI gateway, and the gateway talks to the model.

// Minimal PJSUA2 outbound call to an AI endpoint. Links against
// libpjproject; MyAccount/MyCall are thin subclasses that receive callbacks.
#include <pjsua2.hpp>
using namespace pj;

class MyAccount : public Account { /* override onRegState() etc. */ };

class MyCall : public Call {
public:
    explicit MyCall(Account &acc) : Call(acc) {}
    /* override onCallState() / onCallMediaState() here */
};

int main() {
    Endpoint ep;
    ep.libCreate();

    EpConfig epCfg;
    ep.libInit(epCfg);

    TransportConfig tcfg;
    tcfg.port = 5060;
    ep.transportCreate(PJSIP_TRANSPORT_UDP, tcfg);
    ep.libStart();

    AccountConfig acfg;
    acfg.idUri = "sip:[email protected]";
    acfg.regConfig.registrarUri = "sip:registrar.callsphere.local";
    MyAccount acc;
    acc.create(acfg);

    CallOpParam prm(true);              // true = default call settings
    MyCall call(acc);
    call.makeCall("sip:[email protected]", prm);

    // ... run until hangup; destroy calls/accounts before libDestroy()
    ep.libDestroy();
    return 0;
}

CallSphere implementation

CallSphere does not ship embedded SIP devices today. Every product (Healthcare AI on FastAPI port 8084 bridging to OpenAI Realtime, Real Estate AI, Sales Calling AI with 5 concurrent outbound lines, Salon AI, IT Helpdesk AI, and After-Hours AI with Twilio simultaneous call + SMS and a 120-second timeout) terminates on Twilio Programmable Voice from server-side code. Our 37 agents, 90+ tools, 115+ database tables, HIPAA and SOC 2 compliance, $149/$499/$1,499 plans, 14-day trial, and 22% affiliate program are all server-side. We have evaluated PJSIP for an embedded healthcare kiosk SKU; for prospects asking about kiosks or in-vehicle telematics, we point them to a reference PJSUA2 client that registers to a Twilio SIP domain and lands the call in our existing FastAPI bridge.

Build steps

  1. Clone pjproject 2.17, run ./configure with your cross-compilation target triplet (--host=...), then build with make dep && make.
  2. Cross-compile Opus and Speex EC libraries first if you need wideband codec quality.
  3. Use PJSUA2 (the C++ wrapper) for application code; raw PJSIP is needed only for protocol extensions.
  4. Configure ICE if the device sits behind NAT; list STUN servers in epCfg.uaConfig.stunServer and put the TURN server and credentials in the account's natConfig.
  5. Implement onCallMediaState to attach the audio device to the call media slot.
  6. Test against a public SIP test endpoint first (sipgate, sip.callcentric.com) before pointing at production.
  7. Profile RAM with valgrind massif on the target architecture; PJSIP idle is around 4 MB heap.
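Step 5 is where most first PJSUA2 ports stall, so here is a sketch of the onCallMediaState override. It assumes the MyCall subclass shown in the example earlier and builds only against libpjproject; treat it as a starting point, not a drop-in implementation.

```cpp
#include <pjsua2.hpp>
using namespace pj;

// Sketch: wire call audio to the sound device once media goes active.
// MyCall is assumed to derive from pj::Call, as in the earlier example.
void MyCall::onCallMediaState(OnCallMediaStateParam &prm)
{
    CallInfo ci = getInfo();
    for (unsigned i = 0; i < ci.media.size(); ++i) {
        if (ci.media[i].type == PJMEDIA_TYPE_AUDIO &&
            ci.media[i].status == PJSUA_CALL_MEDIA_ACTIVE) {
            AudioMedia aud = getAudioMedia((int)i);
            AudDevManager &adm = Endpoint::instance().audDevManager();
            aud.startTransmit(adm.getPlaybackDevMedia()); // remote -> speaker
            adm.getCaptureDevMedia().startTransmit(aud);  // mic -> remote
        }
    }
}
```

On a headless embedded board you would typically replace the default sound device with a null device or a custom PJMEDIA port before this callback fires.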

Pitfalls

  • PJSIP's threading model is strict; never call its APIs from an externally created thread (such as the audio callback thread) without first registering that thread via pj_thread_register.
  • ICE failures show up as one-way audio; verify STUN/TURN with stun-client before blaming the SIP stack.
  • Opus on embedded ARM may need NEON optimization; without it, encode CPU spikes hurt battery.
  • Memory leaks come from forgetting to delete the Account/Call C++ wrappers; PJSIP has its own memory pools but the wrapper bookkeeping is your job.
  • Cross-compiling for Android or iOS requires the platform-specific config-site.h tweaks; do not assume the default builds work on mobile.

FAQ

What is new in PJSIP 2.17? Async SIP client authentication, AI real-time speech connectivity hooks, a CMake build reorganization, and improved deadlock detection. Released April 22, 2026.

Does PJSIP support WebRTC directly? Not natively. You can pair it with libwebrtc or put a SIP-to-WebRTC gateway such as Drachtio in front.

Still reading? Stop comparing — try CallSphere live.

CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.

Smallest device PJSIP runs on? ~512 KB flash, 192 KB RAM with narrowband codecs and minimal config. ESP32-class hardware works with care.
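Footprints in that range come from compile-time trimming in pjlib/include/pj/config_site.h. A hedged starting point; the macro values below are illustrative rather than tuned, and config_site_sample.h documents the full catalogue:

```c
/* config_site.h: size-constrained embedded build (illustrative values) */
#define PJ_CONFIG_MINIMAL_SIZE   1    /* select the minimal-size preset */
#define PJMEDIA_HAS_G722_CODEC   0    /* narrowband only: drop wideband */
#define PJMEDIA_HAS_OPUS_CODEC   0
#define PJMEDIA_HAS_ILBC_CODEC   0
#define PJSIP_MAX_TSX_COUNT      15   /* shrink transaction table */
#define PJSIP_MAX_DIALOG_COUNT   15
#define PJ_LOG_MAX_LEVEL         2    /* errors and warnings only */
#include <pj/config_site_sample.h>
```

Measure with your actual codec set after every change; codec tables and log buffers dominate the delta between a 512 KB build and a multi-megabyte one.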

Asterisk chan_pjsip vs standalone PJSIP? chan_pjsip is the SIP channel inside Asterisk. Standalone PJSIP is the library you embed in your own app or device.

Is PJSIP licensing free for commercial use? Dual-licensed: GPL for free use, commercial license available from Teluu for closed-source products.


Start a 14-day trial of our managed AI voice, see pricing for cloud-side options, or contact us about embedded SIP reference designs.


Try CallSphere AI Voice Agents

See how AI voice agents work for your industry. Live demo available, no signup required.

Related Articles You May Like

AI Voice Agents

MOS Call Quality Scoring for AI Voice Operations in 2026: Beyond 4.2

MOS 4.3+ is the band where AI voice feels human. Drop below 3.6 and conversations break. Here is how to measure, improve, and alert on MOS in production AI voice using G.711, Opus, and the underlying packet loss / jitter / latency math.

AI Engineering

The Latency Budget for AI Voice Agents Across PSTN in 2026

Where every millisecond goes between caller and AI: PSTN, carrier, STT, LLM, TTS, and back. The component-level targets that ship in 2026 and how to hit them.

AI Infrastructure

Session Border Controllers for AI Voice: Compliance, Security, Survival

What an SBC actually does, why AI voice deployments still need them in 2026, and how Oracle, Ribbon, AudioCodes, and Cisco fit into modern stacks.

AI Strategy

State Data Residency for AI Voice in Healthcare — Texas, Nevada, Colorado in 2026

Texas SB 1188 requires US-resident EHRs from January 1, 2026; Nevada's consumer-health-data law constrains health data; Colorado AI Act takes effect June 30, 2026. AI voice agents must architect for state-by-state data localization.

AI Engineering

SIP Debugging with sngrep and Wireshark for AI Voice Calls in 2026: The Hands-On Playbook

When your AI voice agent gets one-way audio, missed DTMF, or codec mismatch, sngrep and Wireshark are still the fastest path to root cause in 2026. Here is the playbook.

AI Engineering

SIP REGISTER and INVITE: Deep Dive for AI Voice Agent Builders

How SIP REGISTER and INVITE work end-to-end, why your AI agent platform needs to handle 401 challenges and Record-Route correctly, and the failure modes that bite production builds.