
Build a Voice Agent with Supabase Edge Functions + OpenAI Realtime (2026)

Use Supabase Edge Functions (Deno runtime) for long-running WebSocket voice bridges. Twilio integration, Postgres for sessions, RLS for tenant isolation, deploy with one CLI command.

TL;DR: Supabase Edge Functions run on a Deno runtime at the edge with full WebSocket support, including long-lived connections upgraded with Deno.upgradeWebSocket inside a Deno.serve handler. Pair them with Twilio Media Streams, OpenAI Realtime, and Postgres + RLS for tenant-isolated voice agents. A single supabase functions deploy ships the function in about 30 seconds.

What you'll build

A Supabase Edge Function twilio-voice that:

  1. Returns TwiML on HTTP POST
  2. Upgrades to WebSocket on the same path
  3. Bridges Twilio mu-law frames to OpenAI Realtime (g711_ulaw both ways)
  4. Persists every turn to a call_turns table with RLS by tenant_id

Prerequisites

  1. Supabase project (free tier works for dev).
  2. Supabase CLI (brew install supabase/tap/supabase).
  3. Twilio account + number.
  4. OPENAI_API_KEY set as a Supabase secret.

Architecture

flowchart LR
  C[Caller] --> T[Twilio]
  T -->|HTTP TwiML| EF[/functions/v1/twilio-voice]
  T -->|wss media| EF
  EF -->|wss| OAI[OpenAI Realtime]
  EF -->|insert| PG[(Postgres call_turns)]
  PG -->|RLS| TENANT[tenant_id row filter]

Step 1 — Create the function

```bash
supabase functions new twilio-voice
```

Step 2 — TwiML + WS upgrade in one handler

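```ts
// supabase/functions/twilio-voice/index.ts
Deno.serve((req) => {
  // Twilio's media stream arrives as a WebSocket upgrade on the same path.
  if (req.headers.get("upgrade")?.toLowerCase() === "websocket") return handleWs(req);

  if (req.method === "POST") {
    // Assumes a bidirectional <Connect><Stream> pointing back at this function over wss.
    const host = req.headers.get("host");
    return new Response(
      `<Response><Connect><Stream url="wss://${host}/functions/v1/twilio-voice" /></Connect></Response>`,
      { headers: { "content-type": "text/xml" } },
    );
  }
  return new Response("ok");
});
```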
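The TwiML returned on POST has to point Twilio back at the same function over wss. Below is a minimal sketch of building that body as a pure function, assuming a bidirectional `<Connect><Stream>` and the `/functions/v1/twilio-voice` path used in this article. `buildTwiml` and `escapeXml` are hypothetical helper names, not part of the Twilio or Supabase SDKs.

```ts
// Hypothetical helpers for illustration; not part of any SDK.

// Escape characters that are unsafe inside an XML attribute value.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build TwiML that asks Twilio to open a media stream to the same
// function path over wss. The default path is this article's function name.
function buildTwiml(host: string, path = "/functions/v1/twilio-voice"): string {
  const streamUrl = escapeXml(`wss://${host}${path}`);
  return `<?xml version="1.0" encoding="UTF-8"?>` +
    `<Response><Connect><Stream url="${streamUrl}" /></Connect></Response>`;
}
```

Keeping the markup in a small function makes it easy to unit test and avoids interpolating an unescaped Host header straight into XML.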

Step 3 — Bridge to OpenAI Realtime

```ts
function handleWs(req: Request) {
  const { socket: twilio, response } = Deno.upgradeWebSocket(req);
  // Passing headers to the WebSocket constructor may depend on the Deno runtime version.
  const ai = new WebSocket("wss://api.openai.com/v1/realtime?model=gpt-realtime", {
    headers: {
      Authorization: `Bearer ${Deno.env.get("OPENAI_API_KEY")}`,
      "OpenAI-Beta": "realtime=v1",
    },
  } as any);

  let streamSid = "";

  // Configure the session as soon as the OpenAI socket opens.
  ai.onopen = () =>
    ai.send(JSON.stringify({
      type: "session.update",
      session: {
        instructions: "You are a friendly receptionist. Keep replies short.",
        voice: "marin",
        input_audio_format: "g711_ulaw",
        output_audio_format: "g711_ulaw",
        turn_detection: { type: "server_vad" },
      },
    }));

  // Twilio -> OpenAI: forward caller audio (base64 mu-law) unchanged.
  twilio.onmessage = (e) => {
    const ev = JSON.parse(e.data);
    if (ev.event === "start") streamSid = ev.streamSid;
    // Drop frames that arrive before the OpenAI socket is open.
    if (ev.event === "media" && ai.readyState === WebSocket.OPEN) {
      ai.send(JSON.stringify({ type: "input_audio_buffer.append", audio: ev.media.payload }));
    }
  };

  // OpenAI -> Twilio: forward synthesized audio for the current stream.
  ai.onmessage = (e) => {
    const ev = JSON.parse(e.data);
    if (ev.type === "response.audio.delta") {
      twilio.send(JSON.stringify({ event: "media", streamSid, media: { payload: ev.delta } }));
    }
  };

  // Tear down both legs together.
  twilio.onclose = () => ai.close();
  ai.onclose = () => twilio.close();
  return response;
}
```
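The bridge is essentially two message translations, which can be pulled out of the socket callbacks and tested without a network. This is a sketch under assumptions: the event shapes cover only the fields the handler above reads, not the full Twilio or OpenAI schemas, and `toOpenAI`, `toTwilio`, and `BridgeState` are hypothetical names.

```ts
// Simplified shapes: only the fields this bridge reads (an assumption,
// not the full Twilio Media Streams or OpenAI Realtime schemas).
type TwilioEvent =
  | { event: "start"; streamSid: string }
  | { event: "media"; media: { payload: string } }
  | { event: "stop" };

interface BridgeState {
  streamSid: string;
}

// Twilio -> OpenAI: returns the JSON string to send, or null if there is
// nothing to forward. Records the stream SID when the stream starts.
function toOpenAI(raw: string, state: BridgeState): string | null {
  const ev = JSON.parse(raw) as TwilioEvent;
  if (ev.event === "start") {
    state.streamSid = ev.streamSid;
    return null;
  }
  if (ev.event === "media") {
    return JSON.stringify({ type: "input_audio_buffer.append", audio: ev.media.payload });
  }
  return null;
}

// OpenAI -> Twilio: forwards audio deltas only, and only once the stream
// SID is known, since Twilio needs it on every outbound media message.
function toTwilio(raw: string, state: BridgeState): string | null {
  const ev = JSON.parse(raw) as { type: string; delta?: string };
  if (ev.type !== "response.audio.delta" || !ev.delta || !state.streamSid) return null;
  return JSON.stringify({
    event: "media",
    streamSid: state.streamSid,
    media: { payload: ev.delta },
  });
}
```

With these in place, each `onmessage` callback reduces to calling the translator and sending the result when it is not null.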

Step 4 — Persist turns to Postgres with RLS

Create the table:

```sql
create table call_turns (
  id uuid default gen_random_uuid() primary key,
  tenant_id uuid not null,
  call_sid text,
  role text check (role in ('user','assistant')),
  text text,
  created_at timestamptz default now()
);

alter table call_turns enable row level security;

-- Rows are visible only when the request JWT's tenant_id claim matches.
-- auth.jwt() returns the claims of the current request's JWT in Supabase.
create policy tenant_isolation on call_turns
  for all
  using (tenant_id = (auth.jwt() ->> 'tenant_id')::uuid);
```

In the function, on each response.done event, insert a row using the service role key (which bypasses RLS for system writes), tagged with the tenant_id looked up from the Twilio number.

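```ts
import { createClient } from "https://esm.sh/@supabase/supabase-js@2";

// Service-role client: bypasses RLS, so it must stay server-side.
const sb = createClient(
  Deno.env.get("SUPABASE_URL")!,
  Deno.env.get("SUPABASE_SERVICE_ROLE_KEY")!,
);

// tenant_id and text come from the surrounding handler: the tenant lookup
// for the dialed number and the transcript from the response.done event.
const { error } = await sb.from("call_turns").insert({
  tenant_id,
  call_sid: streamSid,
  role: "assistant",
  text,
});
if (error) console.error("call_turns insert failed:", error.message);
```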
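The insert above needs a `tenant_id` and a `text` value. One way to derive both is sketched below, assuming the `response.done` payload carries assistant transcripts under `response.output[].content[].transcript` and that the dialed number is available to the handler (for example, passed along as a stream parameter). The `TENANT_BY_NUMBER` map, its sample values, and the helper names are all hypothetical.

```ts
interface CallTurnRow {
  tenant_id: string;
  call_sid: string;
  role: "user" | "assistant";
  text: string;
}

// Assumed subset of the response.done event; only the fields read here.
interface ResponseDoneEvent {
  type: "response.done";
  response: {
    output?: Array<{
      content?: Array<{ transcript?: string; text?: string }>;
    }>;
  };
}

// Hypothetical lookup from dialed number to tenant; in practice this
// would more likely be a table in Postgres.
const TENANT_BY_NUMBER = new Map<string, string>([
  ["+15550100", "00000000-0000-0000-0000-000000000001"],
]);

// Join every transcript or text fragment in the response into one string.
function extractAssistantText(ev: ResponseDoneEvent): string {
  const parts: string[] = [];
  for (const item of ev.response.output ?? []) {
    for (const part of item.content ?? []) {
      const fragment = part.transcript ?? part.text;
      if (fragment) parts.push(fragment);
    }
  }
  return parts.join(" ").trim();
}

// Build the row to insert, or null when the tenant is unknown or the
// response contained no text (so nothing is written untagged or empty).
function buildTurnRow(
  ev: ResponseDoneEvent,
  calledNumber: string,
  callSid: string,
): CallTurnRow | null {
  const tenant_id = TENANT_BY_NUMBER.get(calledNumber);
  const text = extractAssistantText(ev);
  if (!tenant_id || !text) return null;
  return { tenant_id, call_sid: callSid, role: "assistant", text };
}
```

Returning null for unknown numbers matters here because the service role bypasses RLS: the application code is the only thing preventing a row from being written under the wrong tenant.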

Step 5 — Deploy

```bash
supabase functions deploy twilio-voice --no-verify-jwt
supabase secrets set OPENAI_API_KEY=sk-...
```


The --no-verify-jwt flag is required because Twilio doesn't send a Supabase JWT. With JWT verification off, validate Twilio signatures yourself (Step 7).

Step 6 — Configure Twilio

In Twilio → Phone Numbers → your number → Voice Webhook → https://YOUR-PROJECT.supabase.co/functions/v1/twilio-voice. Method POST.

Step 7 — Verify Twilio signature (security)

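```ts
// Uses the built-in Web Crypto API for HMAC-SHA1 instead of a third-party
// hmac module, so no external import is needed.
async function verifyTwilio(req: Request, body: string): Promise<boolean> {
  const sig = req.headers.get("x-twilio-signature");
  if (!sig) return false;

  // Twilio signs the exact public webhook URL; behind a proxy, req.url may differ.
  const url = req.url;
  const params = Object.fromEntries(new URLSearchParams(body));
  // String to sign: URL followed by each POST param as key+value, sorted by key.
  const data = url + Object.keys(params).sort().map((k) => k + params[k]).join("");

  const enc = new TextEncoder();
  const key = await crypto.subtle.importKey(
    "raw",
    enc.encode(Deno.env.get("TWILIO_AUTH_TOKEN")!),
    { name: "HMAC", hash: "SHA-1" },
    false,
    ["sign"],
  );
  const mac = new Uint8Array(await crypto.subtle.sign("HMAC", key, enc.encode(data)));
  const expected = btoa(String.fromCharCode(...mac));
  return sig === expected;
}
```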
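Two pieces of the verification are easy to get subtly wrong: building the string that gets signed, and comparing signatures. The sketch below isolates both as pure functions. The helper names are hypothetical, and the comparison is a simple constant-time loop shown for illustration rather than a vetted cryptographic primitive.

```ts
// Build the string Twilio signs: the full webhook URL followed by each
// POST parameter as key+value, sorted by key. Values are URL-decoded
// first; for duplicate keys the last value wins in this sketch.
function twilioSigningString(url: string, formBody: string): string {
  const pairs: Record<string, string> = {};
  new URLSearchParams(formBody).forEach((value, key) => {
    pairs[key] = value;
  });
  return url + Object.keys(pairs).sort().map((k) => k + pairs[k]).join("");
}

// Compare two strings without returning early on the first mismatch,
// so timing does not reveal how many leading characters matched.
function constantTimeEqual(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}
```

For example, a body of `To=%2B1555&CallSid=CA1` posted to `https://x.test/voice` yields the signing string `https://x.test/voiceCallSidCA1To+1555`: the keys are sorted and the `%2B` is decoded to `+` before concatenation.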

Pitfalls

  • CPU time per invocation is capped at 400ms on the free tier and 2s on Pro for synchronous work, but WebSocket upgrades are charged differently, so long-lived connections are OK on Pro.
  • The WebSocket idle timeout defaults to 150s; send keepalive frames on the Twilio socket or the call drops.
  • Never expose the service role key in code; use Vault for rotation.
  • Deno's fetch doesn't support trailers, so some streaming LLM APIs choke; use Deno.connect for raw TCP if needed.
  • Cold starts are ~80-200ms, and voice agents need a warm function. Use Supabase's vercel-style pinning (Pro plan).

How CallSphere does this in production

CallSphere's main voice path runs on FastAPI :8084 (k3s), not Supabase Edge. We use Postgres on the same private network for our 115+ tables. Supabase Edge Functions are great for greenfield projects that want managed Postgres + RLS + Edge in one product; we needed deeper tool/policy control for our 90+ tools and 6 verticals. 37 agents, $149/$499/$1499, 14-day trial, 22% affiliate.

FAQ

Q: Edge Functions vs running my own Deno? Edge Functions remove ops; for under ~50 concurrent voice calls per region the cost is hard to beat.

Q: Can I scale to 10k concurrent calls? The Pro plan supports high concurrency per function, but at that scale you'll want a dedicated runtime; Edge Functions aren't the right fit.

Q: Latency? ~700ms voice-to-voice on the closest edge POP.

Q: Auth for the agent's user-facing calls? Use Supabase Auth on the human-facing app; voice path is system-only and writes via service role.

Q: Cost? Free tier: 500k invocations/mo. Pro: $25/mo + $2/M invocations. WS active-connection time is metered separately on Pro.
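To turn the Pro figures above into a monthly estimate, a rough calculation might look like the following. It assumes every invocation is billed at the per-million rate with no included quota, and it leaves out the separately metered WebSocket connection time, so treat the result as a lower bound. The `Plan` type and `monthlyCostUsd` are hypothetical names.

```ts
interface Plan {
  baseUsd: number;       // flat monthly fee
  usdPerMillion: number; // price per million invocations
}

// Pro figures as quoted in this article.
const PRO: Plan = { baseUsd: 25, usdPerMillion: 2 };

// Estimated monthly cost in USD for a given number of invocations.
// Assumes no included quota and ignores WebSocket connection time.
function monthlyCostUsd(invocations: number, plan: Plan = PRO): number {
  return plan.baseUsd + (invocations / 1e6) * plan.usdPerMillion;
}
```

For example, 3 million invocations in a month comes to 25 + 3 × 2 = 31 USD under these assumptions.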
