
Replace PolyAI With a Multi-Agent OpenAI Agents SDK Stack

PolyAI's Agent Studio is great for enterprise, but pricing is bespoke and customization is gated. Aim for the same containment with the OpenAI Agents SDK in about 400 lines of Python.

TL;DR: PolyAI's enterprise voice platform advertises 80%+ containment but is closed-source and priced per seat plus per minute. The OpenAI Agents SDK (April 2026 release) ships realtime voice and multi-agent handoffs as primitives, so you can target the same containment and own the IP.

What you'll build

A Python application using openai-agents[voice] that fields PSTN calls via Twilio, spawns a triage agent, hands off to specialist agents (billing, technical, retention), and persists every handoff to Postgres for analytics that mirror PolyAI's containment dashboard.

Prerequisites

  1. PolyAI deployment with at least one Agent Studio flow exported.
  2. OpenAI API key with Realtime access.
  3. Python 3.11+, openai-agents[voice], twilio, asyncpg.
  4. Twilio number + Media Streams.
  5. A definition of "containment" (calls resolved without human transfer).
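
For prerequisite 4, Twilio has to be told to open a Media Stream to your bridge when a call arrives. A minimal sketch of the TwiML response, built with the standard library only; the function name `media_stream_twiml` and the example URL are assumptions for illustration, not part of the original stack:

```python
import xml.etree.ElementTree as ET


def media_stream_twiml(wss_url: str) -> str:
    """Return TwiML that connects the inbound call to a bidirectional Media Stream."""
    # Twilio only opens Media Streams to secure WebSocket endpoints.
    if not wss_url.startswith("wss://"):
        raise ValueError("Media Stream URL must start with wss://")
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    ET.SubElement(connect, "Stream", {"url": wss_url})
    return ET.tostring(response, encoding="unicode")


# Hypothetical bridge URL; return this string from your voice webhook.
twiml = media_stream_twiml("wss://bridge.example.com/media")
```

Using `<Connect><Stream>` rather than `<Start><Stream>` gives a two-way stream, which the bridge needs in order to play agent audio back to the caller.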

Architecture

flowchart TB
  C[Caller] --> TW[Twilio]
  TW --> BR[Bridge]
  BR --> TR[Triage Agent]
  TR -->|handoff| BIL[Billing Specialist]
  TR -->|handoff| TEC[Tech Specialist]
  TR -->|handoff| RET[Retention Specialist]
  BIL --> CRM[(CRM API)]
  TEC --> KB[(Knowledge Base)]
  RET --> WIN[(Save Offers)]

Step 1 — Define specialists

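```python
from agents import function_tool
from agents.realtime import RealtimeAgent

# The tools referenced below must be defined or imported before this module
# runs. fetch_invoice and refund_charge appear in Step 3; run_diagnostic,
# file_ticket, and generate_offer follow the same pattern (not shown).

billing = RealtimeAgent(
    name="billing",
    instructions=open("prompts/billing.md").read(),
    tools=[fetch_invoice, refund_charge],
)

tech = RealtimeAgent(
    name="tech",
    instructions=open("prompts/tech.md").read(),
    tools=[run_diagnostic, file_ticket],
)

retention = RealtimeAgent(
    name="retention",
    instructions=open("prompts/retention.md").read(),
    tools=[generate_offer],
)
```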

Step 2 — Triage with handoffs

```python
# Triage only classifies and routes; it carries no tools of its own.
triage = RealtimeAgent(
    name="triage",
    instructions="Greet, classify intent, hand off.",
    handoffs=[billing, tech, retention],
)
```

Step 3 — Tool definitions

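```python
from agents import function_tool

# `crm` and `billing_api` are the application's own API clients (not shown).


@function_tool
async def fetch_invoice(account_id: str) -> dict:
    """Fetch the invoice for an account from the CRM."""
    return await crm.get_invoice(account_id)


@function_tool
async def refund_charge(account_id: str, amount_cents: int, reason: str) -> dict:
    """Refund a charge, subject to the refund policy limit."""
    # Guardrail: refunds above 50,000 cents ($500) are escalated, not executed.
    if amount_cents > 50000:
        return {"error": "amount over policy; escalate"}
    return await billing_api.refund(account_id, amount_cents, reason)
```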
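
The refund guardrail is easier to unit-test when the policy check lives in a plain function that the tool calls. A minimal sketch under that assumption; `check_refund_policy` and `REFUND_LIMIT_CENTS` are hypothetical names, and the rejection of non-positive amounts is an added assumption that the original tool does not make:

```python
from typing import Optional

REFUND_LIMIT_CENTS = 50_000  # same threshold as refund_charge ($500.00)


def check_refund_policy(
    amount_cents: int, limit_cents: int = REFUND_LIMIT_CENTS
) -> Optional[dict]:
    """Return an error payload if the refund violates policy, else None."""
    if amount_cents <= 0:
        # Assumption: zero or negative refunds are never valid.
        return {"error": "amount must be positive"}
    if amount_cents > limit_cents:
        return {"error": "amount over policy; escalate"}
    return None


# Inside the tool, return the error early if there is one:
#     if (err := check_refund_policy(amount_cents)) is not None:
#         return err
```

Returning the error as data, not raising, keeps the message visible to the model so it can tell the caller the request is being escalated.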

Step 4 — Twilio bridge that talks to the realtime agent

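```python
import asyncio
import base64
import json

from agents.realtime import RealtimeRunner


async def serve(twilio_ws):
    # twilio_ws: an accepted WebSocket carrying Twilio Media Streams messages.
    # `db` is the application's own persistence helper (not shown).
    # Event and attribute names follow the SDK's realtime session API; verify
    # them against your installed version.
    call = {"call_sid": None, "stream_sid": None}
    runner = RealtimeRunner(starting_agent=triage)
    # Twilio Media Streams carry 8 kHz mu-law audio; set the session's
    # input/output audio formats to match in the runner configuration.
    session = await runner.run()  # run() is awaited and returns the session
    async with session:
        async def t2a():  # Twilio -> agent
            async for msg in twilio_ws.iter_text():
                ev = json.loads(msg)
                if ev["event"] == "start":
                    call["call_sid"] = ev["start"]["callSid"]
                    call["stream_sid"] = ev["start"]["streamSid"]
                elif ev["event"] == "media":
                    await session.send_audio(base64.b64decode(ev["media"]["payload"]))

        async def a2t():  # agent -> Twilio
            async for ev in session:
                if ev.type == "audio":
                    await twilio_ws.send_json({
                        "event": "media",
                        "streamSid": call["stream_sid"],  # required by Twilio
                        "media": {"payload": base64.b64encode(ev.audio.data).decode()},
                    })
                elif ev.type == "handoff":
                    await db.log_handoff(call["call_sid"], ev.from_agent.name, ev.to_agent.name)

        await asyncio.gather(t2a(), a2t())
```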
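
The message framing in the bridge can be pulled into two pure helpers so it is testable without a live call. A sketch, assuming the standard Twilio Media Streams JSON shape; the helper names are hypothetical. Note that Twilio expects a `streamSid` on every outbound `media` message, taken from the stream's `start` event:

```python
import base64
import json
from typing import Optional


def decode_media_payload(message: str) -> Optional[bytes]:
    """Return raw audio bytes from a Twilio 'media' message, else None."""
    event = json.loads(message)
    if event.get("event") != "media":
        return None  # e.g. 'connected', 'start', 'mark', 'stop'
    return base64.b64decode(event["media"]["payload"])


def build_media_message(stream_sid: str, audio: bytes) -> str:
    """Wrap agent audio bytes in the JSON envelope Twilio expects."""
    return json.dumps({
        "event": "media",
        "streamSid": stream_sid,
        "media": {"payload": base64.b64encode(audio).decode("ascii")},
    })


# Round trip with a hypothetical stream SID and three bytes of audio.
frame = build_media_message("MZ-example", b"\x00\x01\x02")
audio = decode_media_payload(frame)
```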

Step 5 — Containment analytics

```sql
SELECT
  date_trunc('day', started_at) AS day,
  count(*) FILTER (WHERE outcome = 'resolved') * 1.0 / count(*) AS containment
FROM call_sessions
GROUP BY 1
ORDER BY 1 DESC
LIMIT 30;
```
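
The same daily containment ratio can be computed in Python, which is handy for checking the dashboard query against a small fixture. A sketch, assuming each row is a `(started_at, outcome)` pair; the `'transferred'` outcome value is a hypothetical label, since only `'resolved'` appears in the query:

```python
from collections import defaultdict
from datetime import date, datetime


def containment_by_day(sessions):
    """Map each calendar day to resolved_calls / total_calls, newest day first."""
    totals = defaultdict(int)
    resolved = defaultdict(int)
    for started_at, outcome in sessions:
        day = started_at.date()
        totals[day] += 1
        if outcome == "resolved":
            resolved[day] += 1
    return {day: resolved[day] / totals[day] for day in sorted(totals, reverse=True)}


rows = [
    (datetime(2026, 4, 2, 9, 0), "resolved"),
    (datetime(2026, 4, 2, 10, 0), "transferred"),  # hypothetical outcome label
    (datetime(2026, 4, 1, 9, 0), "resolved"),
]
daily = containment_by_day(rows)
```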

Step 6 — Eval harness

Use the OpenAI Evals API to replay 200 calls per week, scoring each transcript on (a) correct intent, (b) correct specialist chosen, (c) successful tool execution, (d) no hallucinated balances. PolyAI hides this — you own it.
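
The four criteria can be expressed as a local scorer before wiring anything into a hosted eval service. The sketch below does not call the Evals API; `ReplayedCall`, its fields, and both functions are hypothetical structures chosen to illustrate criteria (a) through (d):

```python
from dataclasses import dataclass, field


@dataclass
class ReplayedCall:
    expected_intent: str
    predicted_intent: str
    expected_specialist: str
    actual_specialist: str
    tool_calls_ok: bool
    quoted_balances: list = field(default_factory=list)  # amounts the agent said
    ledger_balances: list = field(default_factory=list)  # amounts on record


def score_call(call: ReplayedCall) -> dict:
    """Score one replayed call on the four criteria."""
    return {
        "intent": call.predicted_intent == call.expected_intent,
        "specialist": call.actual_specialist == call.expected_specialist,
        "tools": call.tool_calls_ok,
        # Every balance the agent quoted must exist in the system of record.
        "no_hallucinated_balance": set(call.quoted_balances) <= set(call.ledger_balances),
    }


def pass_rates(calls) -> dict:
    """Fraction of calls passing each criterion; empty input gives {}."""
    scores = [score_call(c) for c in calls]
    if not scores:
        return {}
    return {key: sum(s[key] for s in scores) / len(scores) for key in scores[0]}
```

Tracking a pass rate per criterion, not a single blended score, shows whether a regression came from routing, tools, or hallucination.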

Step 7 — Cutover

Run the SDK stack on a subset of phone numbers for 4 weeks. PolyAI typically advertises 80%+ containment; aim to match that within ±3 percentage points before full migration.
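
The cutover criterion can be written down as an explicit gate so the decision is not made by eye. A sketch with hypothetical names; it assumes containment is expressed as a fraction between 0 and 1, and it treats beating the baseline by any margin as passing, which is one reading of the ±3 point target:

```python
def ready_for_cutover(
    sdk_containment: float,
    baseline_containment: float,
    tolerance_points: float = 3.0,
) -> bool:
    """True if the SDK stack is no more than `tolerance_points` below baseline."""
    # Convert the gap to percentage points; round to avoid float noise.
    gap_points = round((sdk_containment - baseline_containment) * 100, 6)
    return gap_points >= -tolerance_points


# Example: baseline 80%, pilot at 78% after four weeks.
go = ready_for_cutover(0.78, 0.80)
```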

Common pitfalls

  • Handoff loops. Set a max-handoff guardrail (3 per call) or you'll see ping-pong.
  • Tool token bloat. Specialists should not see other specialists' tools.
  • Voice changes on handoff. Pick the same voice across specialists or warn the user.
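
The max-handoff guardrail from the first pitfall can be a small per-call counter that the bridge consults whenever it sees a handoff event. A sketch independent of the SDK; `HandoffBudget` is a hypothetical helper, and what to do when the budget is spent (for example, transfer to a human) is left to the application:

```python
class HandoffBudget:
    """Per-call counter that caps the number of agent-to-agent handoffs."""

    def __init__(self, max_handoffs: int = 3):
        self.max_handoffs = max_handoffs
        self.count = 0

    def allow(self) -> bool:
        """Record a handoff attempt; return False once the budget is spent."""
        if self.count >= self.max_handoffs:
            return False
        self.count += 1
        return True


# One budget per call: the fourth handoff is refused.
budget = HandoffBudget(max_handoffs=3)
decisions = [budget.allow() for _ in range(4)]
```

Creating one budget per call (not per agent) is what catches ping-pong between two specialists.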

How CallSphere does this in production

CallSphere is built around exactly this pattern. 37 specialist agents handle 6 verticals: Healthcare (14 tools, OpenAI Realtime, FastAPI :8084 — HIPAA), OneRoof Property (10 specialists, WebRTC + Pion + NATS), Salon (4 agents, ElevenLabs, GB-YYYYMMDD-### refs). Triage agents always do classification → handoff, never long-form conversation. 90+ tools and 115+ DB tables back the network.

FAQ

Is the Agents SDK production-ready? Yes. The April 2026 release added Realtime and handoffs as first-class features.

What about non-OpenAI models? The SDK is provider-agnostic for non-realtime agents; realtime voice is currently limited to OpenAI and Gemini models.

SOC2/HIPAA? OpenAI Enterprise has both; sign the BAA before sending HIPAA traffic.

Can I keep PolyAI's analytics? Replicate via Postgres + Metabase in 2 days.

Multi-language? Realtime supports 30+ languages out of the box.
