By Sagar Shankaran, Founder of CallSphere
Sub-second voice agent with zero idle cost: API Gateway WebSocket ($connect/$disconnect/$default), Lambda per-message handler, DynamoDB session store, and OpenAI Realtime over WebSocket.
Key takeaways
TL;DR: API Gateway WebSocket decouples the persistent client connection from compute, so Lambdas only spin up per message. For a voice agent you need three Lambdas ($connect, $disconnect, $default), a DynamoDB table mapping connection IDs to OpenAI Realtime session IDs, and an SQS-backed worker that holds the OpenAI socket open. Roughly $0.04 per call-minute at low scale.
A serverless WebSocket endpoint that accepts PCM audio frames from a browser, forwards them to OpenAI Realtime via a long-lived worker (Fargate Spot or EC2 Spot), receives audio back, and pushes it down the API Gateway connection using the @connections POST API. The browser plays it through Web Audio. No always-on infrastructure for the WebSocket layer — only the OpenAI bridge worker.
gpt-realtime or gpt-realtime-mini).
```mermaid
flowchart LR
  B[Browser WS Client] -->|wss://| AG[API Gateway WebSocket]
  AG -->|$connect / $default| LAM[Lambda Handlers]
  LAM <-->|connectionId map| DDB[(DynamoDB sessions)]
  LAM -->|SendMessage| SQS[(SQS audio_in)]
  SQS --> WK[Bridge Worker Fargate]
  WK <-->|wss://| OA[OpenAI Realtime]
  WK -->|@connections POST| AG
  AG -->|audio frames| B
```
```yaml
Resources:
  WS:
    Type: AWS::ApiGatewayV2::Api
    Properties:
      Name: voice-agent
      ProtocolType: WEBSOCKET
      RouteSelectionExpression: "$request.body.action"
```
Define three routes ($connect, $disconnect, and $default), each with an integration that points at a Lambda.
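To see why raw audio frames end up at the $default handler, here is a simplified model of route selection. It is a sketch of the documented behaviour, not AWS code: `selectRoute` and `customRoutes` are hypothetical names, and it assumes the `$request.body.action` expression from the template above.

```js
// Simplified model of how a WebSocket API picks a route for an incoming message.
// Assumes RouteSelectionExpression "$request.body.action"; names here are illustrative.
function selectRoute(body, customRoutes) {
  try {
    const action = JSON.parse(body).action;
    // A custom route is chosen only when the action matches a defined route key.
    if (typeof action === "string" && customRoutes.includes(action)) return action;
  } catch {
    // Non-JSON payloads (such as base64 audio) cannot be evaluated.
  }
  // Anything that does not match falls through to $default.
  return "$default";
}

// This post defines no custom routes, so every message lands on $default.
console.log(selectRoute("UklGRiQAAABXQVZF", []));      // $default
console.log(selectRoute('{"action":"hangup"}', []));   // $default
```

$connect and $disconnect are left out of the model because they fire on connection lifecycle events rather than on message content.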
```yaml
  Sessions:
    Type: AWS::DynamoDB::Table
    Properties:
      BillingMode: PAY_PER_REQUEST
      AttributeDefinitions:
        - { AttributeName: connectionId, AttributeType: S }
      KeySchema:
        - { AttributeName: connectionId, KeyType: HASH }
      TimeToLiveSpecification:
        AttributeName: ttl
        Enabled: true
```
TTL kills stale entries automatically; voice sessions rarely exceed 1 hour.
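DynamoDB removes expired items in the background rather than at the exact expiry second, so a reader can still see a row whose `ttl` has passed. A minimal sketch of computing the attribute and guarding reads, assuming the one-hour window used in connect.mjs (`ttlFrom` and `isExpired` are hypothetical helper names):

```js
const SESSION_TTL_SECONDS = 3600; // one hour, matching the value written by connect.mjs

// DynamoDB TTL expects a Number attribute holding epoch time in seconds.
function ttlFrom(nowMs, ttlSeconds = SESSION_TTL_SECONDS) {
  return Math.floor(nowMs / 1000) + ttlSeconds;
}

// Treat a row as gone once its ttl has passed, even if DynamoDB has not deleted it yet.
// `item` uses the low-level attribute format, e.g. { ttl: { N: "1700003600" } }.
function isExpired(item, nowMs) {
  return Number(item.ttl.N) <= Math.floor(nowMs / 1000);
}

const now = Date.now();
const item = { connectionId: { S: "abc" }, ttl: { N: String(ttlFrom(now)) } };
console.log(isExpired(item, now));                              // false
console.log(isExpired(item, now + SESSION_TTL_SECONDS * 1000)); // true
```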
$connect and $default Lambdas
```js
// connect.mjs
import { DynamoDBClient, PutItemCommand } from "@aws-sdk/client-dynamodb";

const ddb = new DynamoDBClient({});

export const handler = async (event) => {
  const cid = event.requestContext.connectionId;
  await ddb.send(new PutItemCommand({
    TableName: process.env.SESSIONS,
    Item: {
      connectionId: { S: cid },
      ttl: { N: String(Math.floor(Date.now() / 1000) + 3600) } // expire after one hour
    }
  }));
  return { statusCode: 200 };
};
```
```js
// default.mjs: receive audio frames from the browser, hand off to SQS
import { SQSClient, SendMessageCommand } from "@aws-sdk/client-sqs";

const sqs = new SQSClient({});

export const handler = async (event) => {
  const cid = event.requestContext.connectionId;
  await sqs.send(new SendMessageCommand({
    QueueUrl: process.env.AUDIO_IN,
    MessageBody: JSON.stringify({ cid, frame: event.body }) // base64 PCM
  }));
  return { statusCode: 200 };
};
```
```js
import WebSocket from "ws";
import { ApiGatewayManagementApiClient, PostToConnectionCommand } from "@aws-sdk/client-apigatewaymanagementapi";

const sessions = new Map(); // cid -> WS to OpenAI
// WS_ENDPOINT is the https:// management endpoint of the API stage, not the wss:// client URL.
const apigw = new ApiGatewayManagementApiClient({ endpoint: process.env.WS_ENDPOINT });

async function getOrOpen(cid) {
  if (sessions.has(cid)) return sessions.get(cid);
  const ws = new WebSocket("wss://api.openai.com/v1/realtime?model=gpt-realtime", {
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "OpenAI-Beta": "realtime=v1"
    }
  });
  ws.on("message", async (data) => {
    const ev = JSON.parse(data.toString());
    if (ev.type === "response.audio.delta") {
      // Push each audio chunk back to the browser through the @connections API.
      await apigw.send(new PostToConnectionCommand({ ConnectionId: cid, Data: Buffer.from(ev.delta, "base64") }));
    }
  });
  // Drop the cached socket once it closes so the next frame opens a fresh one.
  ws.on("close", () => sessions.delete(cid));
  sessions.set(cid, ws);
  return ws;
}
```
The worker pulls from SQS, opens (or reuses) an OpenAI socket per connection, forwards frames in, and pushes audio out via @connections.
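The forwarding step might look like the sketch below. It assumes each SQS message body has the `{ cid, frame }` shape written by default.mjs, that the OpenAI socket is already open when `send` is called, and that frames arrive in order (a standard SQS queue does not guarantee this). `toAppendEvent` and `forwardBatch` are hypothetical names; `getSocket` stands in for `getOrOpen`.

```js
// Convert one SQS message body into an OpenAI Realtime "input_audio_buffer.append" event.
function toAppendEvent(sqsBody) {
  const { cid, frame } = JSON.parse(sqsBody); // frame: base64 PCM16 from the browser
  return { cid, payload: JSON.stringify({ type: "input_audio_buffer.append", audio: frame }) };
}

// Forward a batch of SQS messages (the Messages array from a ReceiveMessage call).
// Returns how many frames were sent so the caller knows what it can delete from the queue.
async function forwardBatch(messages, getSocket) {
  let sent = 0;
  for (const msg of messages) {
    const { cid, payload } = toAppendEvent(msg.Body);
    const ws = await getSocket(cid); // assumed to resolve to an open socket
    ws.send(payload);
    sent++;
  }
  return sent;
}
```

Passing the socket lookup in as a parameter keeps the function easy to exercise with a fake socket before wiring it to the real queue.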
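```js
const ws = new WebSocket("wss://abc123.execute-api.us-east-1.amazonaws.com/prod");
const ctx = new AudioContext({ sampleRate: 24000 });
const src = ctx.createMediaStreamSource(await navigator.mediaDevices.getUserMedia({ audio: true }));
// ScriptProcessorNode is deprecated; an AudioWorklet is the modern replacement.
const proc = ctx.createScriptProcessor(4096, 1, 1);
src.connect(proc);
proc.connect(ctx.destination);
proc.onaudioprocess = (e) => {
  if (ws.readyState !== WebSocket.OPEN) return; // skip frames until the socket is open
  const f32 = e.inputBuffer.getChannelData(0);
  const i16 = new Int16Array(f32.length);
  for (let i = 0; i < f32.length; i++) i16[i] = Math.max(-1, Math.min(1, f32[i])) * 0x7fff;
  // default.mjs expects base64 text, and API Gateway WebSocket does not accept binary frames.
  ws.send(btoa(String.fromCharCode(...new Uint8Array(i16.buffer))));
};
```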
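The snippet above covers capture only. For playback, the audio coming back has to be converted to floats before Web Audio can use it. A minimal sketch, assuming the frame reaches the browser as an `ArrayBuffer` of 16-bit little-endian PCM at 24 kHz (how API Gateway frames the worker's bytes is not verified here); `pcm16ToFloat32` is a hypothetical helper name:

```js
// Convert 16-bit little-endian PCM bytes into Float32 samples in the range [-1, 1).
function pcm16ToFloat32(arrayBuffer) {
  const view = new DataView(arrayBuffer);
  const out = new Float32Array(arrayBuffer.byteLength >> 1); // two bytes per sample
  for (let i = 0; i < out.length; i++) {
    out[i] = view.getInt16(i * 2, true) / 0x8000;
  }
  return out;
}

// Browser-side usage (assumes ws.binaryType = "arraybuffer" and the AudioContext `ctx` above):
//   const samples = pcm16ToFloat32(event.data);
//   const buffer = ctx.createBuffer(1, samples.length, 24000);
//   buffer.copyToChannel(samples, 0);
//   const node = ctx.createBufferSource();
//   node.buffer = buffer;
//   node.connect(ctx.destination);
//   node.start();
```

Starting each chunk the moment it arrives can leave audible gaps, so a real player would likely schedule chunks back to back on the context clock.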
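$disconnect cleanup
```js
// disconnect.mjs
import { DynamoDBClient, DeleteItemCommand } from "@aws-sdk/client-dynamodb";

const ddb = new DynamoDBClient({});

export const handler = async (event) => {
  await ddb.send(new DeleteItemCommand({
    TableName: process.env.SESSIONS,
    Key: { connectionId: { S: event.requestContext.connectionId } }
  }));
  return { statusCode: 200 };
};
```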
The bridge worker subscribes to a DynamoDB Stream (or polls) and closes its OpenAI socket when the row vanishes.
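One way the stream-driven cleanup could look, assuming a stream is enabled on the Sessions table (the template above does not configure one) and the worker receives records in the standard DynamoDB Streams shape. `closeRemoved` is a hypothetical name, and `sessions` is the worker's cid-to-socket Map:

```js
// Close and forget the OpenAI socket for every session row that was deleted.
// Only REMOVE events matter; inserts and updates are ignored.
function closeRemoved(records, sessions) {
  const closed = [];
  for (const record of records) {
    if (record.eventName !== "REMOVE") continue;
    const cid = record.dynamodb.Keys.connectionId.S;
    const ws = sessions.get(cid);
    if (!ws) continue; // this worker never held a socket for that connection
    ws.close();
    sessions.delete(cid);
    closed.push(cid);
  }
  return closed;
}
```

TTL expiry also surfaces as a REMOVE record, so the same handler covers sessions that time out without a clean $disconnect.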
{type:"ping"} frame every 5 min from the worker to keep alive.
$default adds 200-300ms; provisioned concurrency = 5 fixes it for ~$10/mo.
@connections POST is region-locked: your worker must use the same region as the API.
CallSphere doesn't use API Gateway WebSocket for voice: we measured that at our scale (millions of minutes/month) it's 3-4x more expensive than running our own WebSocket fleet behind ALB. We run FastAPI :8084 on bare k3s nodes for Healthcare and Pion Go + NATS for OneRoof multi-family. 37 agents, 90+ tools, 115+ DB tables, 6 verticals, $149/$499/$1499, 14-day trial, 22% affiliate. For early-stage builders without our scale, the serverless pattern in this post is the right answer.
Q: Can I drop the worker and put everything in Lambda? Only if every call lasts <15 min and you're OK with cold starts on every reconnect. The worker pattern is what makes this production-grade.
Q: Do I have to use OpenAI? No — swap the worker target for AWS Bedrock Nova Sonic, Azure GPT Realtime, or Cloudflare Workers AI.
Q: How do I add Twilio?
Replace the $default handler with a Twilio Media Stream WebSocket served from a non-API-Gateway endpoint (Twilio doesn't sign API Gateway URLs out of the box). Cleanest is a separate ALB+Fargate path for Twilio.
Q: What about HIPAA? API Gateway WebSocket is HIPAA-eligible since 2024. Sign a BAA, enable VPC endpoints, encrypt SQS with KMS CMK.
Q: Cost at 1k concurrent calls? Roughly: $1.50 API Gateway messages + $0.20 Lambda + $0.10 SQS + $0.50 Fargate + $20 OpenAI = $22.30 per 1,000 call-minutes (one minute at 1k concurrent calls), or about $0.022/min/call.
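The arithmetic behind that estimate can be checked in a few lines. The line items are taken from the answer above; the per-call figure only comes out to $0.022 if the total is read as the cost of one minute across all 1,000 calls, which is an assumption about how the estimate was built.

```js
// Line items from the estimate, kept in cents to avoid floating-point drift.
const lineItemsCents = { apiGateway: 150, lambda: 20, sqs: 10, fargate: 50, openai: 2000 };
const concurrentCalls = 1000;

const totalCents = Object.values(lineItemsCents).reduce((sum, c) => sum + c, 0);
const perCallCents = totalCents / concurrentCalls;

console.log(`total: $${(totalCents / 100).toFixed(2)}`);      // total: $22.30
console.log(`per call: $${(perCallCents / 100).toFixed(3)}`); // per call: $0.022
```

OpenAI usage is about 90% of the total, so the model choice moves the number far more than any of the AWS line items.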
Written by
Sagar Shankaran· Founder, CallSphere
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.