Build a Serverless Voice Agent on Lambda + API Gateway WebSocket (2026)

By Sagar Shankaran, Founder of CallSphere

Quick answer

Sub-second voice agent with no idle cost in the WebSocket layer: API Gateway WebSocket ($connect/$disconnect/$default), a Lambda per-message handler, a DynamoDB session store, an SQS queue, and a small bridge worker that holds the OpenAI Realtime WebSocket open.

Key takeaways

TL;DR — API Gateway WebSocket decouples the persistent client connection from compute, so Lambdas only spin up per message. For a voice agent you need three Lambdas ($connect, $disconnect, $default), a DynamoDB table mapping connection IDs to OpenAI Realtime session IDs, and an SQS-backed worker that holds the OpenAI socket open. ~$0.04 per call-minute at low scale.

What you'll build

A serverless WebSocket endpoint that accepts PCM audio frames from a browser, forwards them to OpenAI Realtime via a long-lived worker (Fargate Spot or EC2 Spot), receives audio back, and pushes it down the API Gateway connection using the @connections POST API. The browser plays it through Web Audio. No always-on infrastructure for the WebSocket layer — only the OpenAI bridge worker.
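
The @connections API is served over HTTPS from the same host and stage as the client-facing WebSocket URL. As a minimal sketch, assuming the default execute-api domain rather than a custom domain, the helper below derives that endpoint from the wss:// URL. The name `connectionsEndpoint` is illustrative and not part of any AWS SDK.

```javascript
// Hypothetical helper: derive the @connections management endpoint
// from the client-facing wss:// URL of an API Gateway WebSocket stage.
function connectionsEndpoint(wssUrl) {
  if (!wssUrl.startsWith("wss://")) {
    throw new Error("expected a wss:// URL");
  }
  // Same host and stage path, HTTPS scheme, no trailing slash.
  return wssUrl.replace(/^wss:\/\//, "https://").replace(/\/+$/, "");
}

console.log(connectionsEndpoint("wss://abc123.execute-api.us-east-1.amazonaws.com/prod"));
```

The resulting string is the kind of value the bridge worker would be given as its `WS_ENDPOINT` environment variable.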

Prerequisites

  1. AWS account with API Gateway + Lambda + DynamoDB + SQS.
  2. OpenAI API key with Realtime (gpt-realtime or gpt-realtime-mini).
  3. AWS SAM or CDK; Node 20 runtime for the Lambdas.
  4. A small always-on worker (Fargate task or t4g.nano) — Lambda can't hold an external WS open for more than 15 min cleanly.

Architecture

flowchart LR
  B[Browser WS Client] -->|wss://| AG[API Gateway WebSocket]
  AG -->|$connect / $default| LAM[Lambda Handlers]
  LAM <-->|connectionId map| DDB[(DynamoDB sessions)]
  LAM -->|SendMessage| SQS[(SQS audio_in)]
  SQS --> WK[Bridge Worker Fargate]
  WK <-->|wss://| OA[OpenAI Realtime]
  WK -->|@connections POST| AG
  AG -->|audio frames| B

Step 1 — Provision the API Gateway WebSocket

```yaml
# template.yaml (AWS SAM)
Resources:
  WS:
    Type: AWS::ApiGatewayV2::Api
    Properties:
      Name: voice-agent
      ProtocolType: WEBSOCKET
      RouteSelectionExpression: "$request.body.action"
```

Define three integrations: $connect, $disconnect, $default — each pointing at a Lambda.
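
To make the routing behaviour concrete, here is a simplified model of how a route selection expression of `$request.body.action` picks a route: a JSON body whose `action` matches a route key goes to that route, and anything else (including non-JSON audio payloads) falls through to $default. This is only an illustration of the rule; API Gateway performs the selection itself, and the `ping` route key is a hypothetical example.

```javascript
// Simplified model of API Gateway route selection for the expression
// "$request.body.action". Illustrative only; the real service does this for you.
function selectRoute(rawBody, routeKeys) {
  try {
    const action = JSON.parse(rawBody)?.action;
    if (typeof action === "string" && routeKeys.includes(action)) {
      return action;
    }
  } catch {
    // Body is not JSON (for example, a base64 audio frame).
  }
  return "$default";
}

const routes = ["ping"]; // hypothetical custom route key
console.log(selectRoute('{"action":"ping"}', routes));    // ping
console.log(selectRoute("AAECAwQF", routes));             // $default
console.log(selectRoute('{"action":"unknown"}', routes)); // $default
```

This is why the audio frames in this design land on the $default handler without any extra configuration.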

Step 2 — DynamoDB table for connection state

```yaml
# Nested under Resources: in template.yaml
Sessions:
  Type: AWS::DynamoDB::Table
  Properties:
    BillingMode: PAY_PER_REQUEST
    AttributeDefinitions:
      - { AttributeName: connectionId, AttributeType: S }
    KeySchema:
      - { AttributeName: connectionId, KeyType: HASH }
    TimeToLiveSpecification:
      AttributeName: ttl
      Enabled: true
```

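TTL removes stale entries automatically; voice sessions rarely exceed 1 hour. Deletion runs in the background and can lag the expiry time, so treat TTL as cleanup rather than a precise timeout.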
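
Because DynamoDB removes TTL-expired items in the background rather than at the exact expiry second, a reader of the table may still see rows whose `ttl` has passed. A minimal sketch of building the attribute and checking it on read follows; the helper names `ttlAttribute` and `isExpired` are assumptions for illustration, not part of the AWS SDK.

```javascript
// Build the TTL attribute (epoch seconds) in DynamoDB attribute-value form.
function ttlAttribute(lifetimeSeconds, nowMs = Date.now()) {
  return { N: String(Math.floor(nowMs / 1000) + lifetimeSeconds) };
}

// Treat a row as dead once its ttl has passed, even if DynamoDB
// has not physically deleted it yet.
function isExpired(item, nowMs = Date.now()) {
  const ttl = Number(item.ttl?.N);
  return Number.isFinite(ttl) && ttl <= Math.floor(nowMs / 1000);
}

const t0 = 1_700_000_000_000; // fixed clock for the example
const row = { connectionId: { S: "abc=" }, ttl: ttlAttribute(3600, t0) };
console.log(row.ttl.N);                        // 1700003600
console.log(isExpired(row, t0));               // false
console.log(isExpired(row, t0 + 3600 * 1000)); // true
```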

Step 3 — $connect and $default Lambdas

```js
// connect.mjs
import { DynamoDBClient, PutItemCommand } from "@aws-sdk/client-dynamodb";

const ddb = new DynamoDBClient({});

export const handler = async (event) => {
  const cid = event.requestContext.connectionId;
  await ddb.send(new PutItemCommand({
    TableName: process.env.SESSIONS,
    Item: {
      connectionId: { S: cid },
      // Expire the row one hour after connect (epoch seconds)
      ttl: { N: String(Math.floor(Date.now() / 1000) + 3600) }
    }
  }));
  return { statusCode: 200 };
};
```

```js
// default.mjs: receive audio frames from browser, hand off to SQS
import { SQSClient, SendMessageCommand } from "@aws-sdk/client-sqs";

const sqs = new SQSClient({});

export const handler = async (event) => {
  const cid = event.requestContext.connectionId;
  await sqs.send(new SendMessageCommand({
    QueueUrl: process.env.AUDIO_IN,
    MessageBody: JSON.stringify({ cid, frame: event.body }) // base64 PCM
  }));
  return { statusCode: 200 };
};
```
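
One consequence of handing frames to SQS is that a standard queue delivers at least once and does not guarantee ordering, while audio has to reach the model in order. A possible mitigation, sketched below, is a small reorder buffer in the worker. It assumes each message carries an integer `seq` starting at 0, which is a hypothetical field the handler above does not add.

```javascript
// Reorder buffer: releases frames strictly in seq order and drops duplicates.
// Assumes each message carries a hypothetical integer `seq` starting at 0.
class ReorderBuffer {
  constructor() {
    this.next = 0;            // next seq we are allowed to release
    this.pending = new Map(); // seq -> frame, held until its turn
  }

  // Accept one frame; return the frames that are now ready, in order.
  push(seq, frame) {
    if (seq < this.next || this.pending.has(seq)) return []; // duplicate
    this.pending.set(seq, frame);
    const ready = [];
    while (this.pending.has(this.next)) {
      ready.push(this.pending.get(this.next));
      this.pending.delete(this.next);
      this.next += 1;
    }
    return ready;
  }
}

const buf = new ReorderBuffer();
console.log(buf.push(1, "B")); // [] (waiting for seq 0)
console.log(buf.push(0, "A")); // [ 'A', 'B' ]
console.log(buf.push(0, "A")); // [] (duplicate)
```

A production version would also need a timeout so that one lost frame cannot stall the stream indefinitely; an SQS FIFO queue with the connection ID as message group is another option to consider.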

Step 4 — The bridge worker (Fargate)

```js
import WebSocket from "ws";
import { ApiGatewayManagementApiClient, PostToConnectionCommand } from "@aws-sdk/client-apigatewaymanagementapi";

const sessions = new Map(); // cid -> WS to OpenAI
// WS_ENDPOINT is the https:// @connections URL for the API stage
const apigw = new ApiGatewayManagementApiClient({ endpoint: process.env.WS_ENDPOINT });

async function getOrOpen(cid) {
  if (sessions.has(cid)) return sessions.get(cid);
  const ws = new WebSocket("wss://api.openai.com/v1/realtime?model=gpt-realtime", {
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "OpenAI-Beta": "realtime=v1"
    }
  });
  ws.on("message", async (data) => {
    const ev = JSON.parse(data.toString());
    if (ev.type === "response.audio.delta") {
      await apigw.send(new PostToConnectionCommand({
        ConnectionId: cid,
        // Keep the audio base64-encoded: API Gateway WebSocket APIs do not
        // pass binary frames, so the browser decodes before playback
        Data: Buffer.from(ev.delta)
      }));
    }
  });
  // Drop the cached socket once it closes so the next frame reconnects
  ws.on("close", () => sessions.delete(cid));
  sessions.set(cid, ws);
  return ws;
}
```

The worker pulls from SQS, opens (or reuses) an OpenAI socket per connection, forwards frames in, and pushes audio out via @connections.
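
The "forwards frames in" step can be sketched as follows, assuming `frame` in the SQS body is base64 PCM as the $default handler's comment indicates. The Realtime API accepts audio through the `input_audio_buffer.append` client event; the helper names `toAppendEvent` and `forward` are illustrative, and a stand-in socket replaces the real connection so the sketch runs offline.

```javascript
// Turn one SQS message body ({ cid, frame }) into the Realtime client event
// that appends base64 audio to the session's input buffer.
function toAppendEvent(sqsBody) {
  const { cid, frame } = JSON.parse(sqsBody);
  if (typeof cid !== "string" || typeof frame !== "string") {
    throw new Error("expected { cid, frame } with string values");
  }
  return {
    cid,
    payload: JSON.stringify({ type: "input_audio_buffer.append", audio: frame }),
  };
}

// Forward a frame over any socket-like object with a send(string) method.
function forward(socket, sqsBody) {
  const { payload } = toAppendEvent(sqsBody);
  socket.send(payload);
}

// Stand-in socket so the sketch runs without a network connection.
const sent = [];
forward({ send: (m) => sent.push(m) }, JSON.stringify({ cid: "abc=", frame: "AAEC" }));
console.log(sent[0]); // {"type":"input_audio_buffer.append","audio":"AAEC"}
```

In the worker, `socket` would be the value returned by `getOrOpen(cid)`, and the send should wait until that socket has emitted its open event.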

Step 5 — Browser side: 16-bit PCM over WebSocket

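```js
const ws = new WebSocket("wss://abc123.execute-api.us-east-1.amazonaws.com/prod");
const ctx = new AudioContext({ sampleRate: 24000 });
const src = ctx.createMediaStreamSource(await navigator.mediaDevices.getUserMedia({ audio: true }));
// ScriptProcessorNode is deprecated but widely supported; AudioWorklet is the modern replacement
const proc = ctx.createScriptProcessor(4096, 1, 1);
src.connect(proc);
proc.connect(ctx.destination);
proc.onaudioprocess = (e) => {
  if (ws.readyState !== WebSocket.OPEN) return; // skip frames until the socket is up
  const f32 = e.inputBuffer.getChannelData(0);
  const i16 = new Int16Array(f32.length);
  // Clamp to [-1, 1] and scale to signed 16-bit PCM
  for (let i = 0; i < f32.length; i++) i16[i] = Math.max(-1, Math.min(1, f32[i])) * 0x7fff;
  // API Gateway WebSocket rejects binary frames, so send base64 text
  ws.send(btoa(String.fromCharCode(...new Uint8Array(i16.buffer))));
};
```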
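
For the return path, the browser needs the inverse conversion before it can play anything through Web Audio. The sketch below shows both directions as pure functions, assuming the audio travels as base64-encoded 16-bit PCM in the host's byte order (little-endian on typical hardware). The function names are illustrative.

```javascript
// Encode Float32 samples in [-1, 1] as base64 16-bit PCM.
function float32ToPcm16Base64(f32) {
  const i16 = new Int16Array(f32.length);
  for (let i = 0; i < f32.length; i++) {
    i16[i] = Math.max(-1, Math.min(1, f32[i])) * 0x7fff;
  }
  const bytes = new Uint8Array(i16.buffer);
  let bin = "";
  for (const b of bytes) bin += String.fromCharCode(b);
  return btoa(bin);
}

// Decode base64 PCM16 back to Float32 samples for an AudioBuffer.
function pcm16Base64ToFloat32(b64) {
  const bin = atob(b64);
  const bytes = new Uint8Array(bin.length);
  for (let i = 0; i < bin.length; i++) bytes[i] = bin.charCodeAt(i);
  const i16 = new Int16Array(bytes.buffer);
  return Float32Array.from(i16, (s) => s / 0x7fff);
}

const roundTrip = pcm16Base64ToFloat32(float32ToPcm16Base64(new Float32Array([0, 1, -1, 0.5])));
console.log(Array.from(roundTrip));
```

In the browser, the decoded samples can be copied into an `AudioBuffer` created with `ctx.createBuffer(1, samples.length, 24000)` via `copyToChannel`, then scheduled on an `AudioBufferSourceNode`.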

Step 6 — $disconnect cleanup

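```js
// disconnect.mjs
import { DynamoDBClient, DeleteItemCommand } from "@aws-sdk/client-dynamodb";

const ddb = new DynamoDBClient({});

export const handler = async (event) => {
  // Remove the session row so the worker can tear down its OpenAI socket
  await ddb.send(new DeleteItemCommand({
    TableName: process.env.SESSIONS,
    Key: { connectionId: { S: event.requestContext.connectionId } }
  }));
  return { statusCode: 200 };
};
```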

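The bridge worker subscribes to a DynamoDB Stream on the Sessions table (or polls it) and closes its OpenAI socket when the row vanishes. The stream has to be enabled on the table; the Step 2 template does not turn it on.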
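
A minimal sketch of that cleanup, assuming the worker receives records in the DynamoDB Streams shape and keeps the `cid -> socket` Map from Step 4. Deletions show up with `eventName` set to `REMOVE`, which covers both the $disconnect delete and TTL expiry. The function name `closeRemoved` is illustrative.

```javascript
// Close worker-side sockets for connections whose rows were deleted.
// `records` follows the DynamoDB Streams record shape; `sessions` is the
// cid -> socket Map kept by the worker.
function closeRemoved(records, sessions) {
  const closed = [];
  for (const rec of records) {
    if (rec.eventName !== "REMOVE") continue;
    const cid = rec.dynamodb?.Keys?.connectionId?.S;
    const ws = cid && sessions.get(cid);
    if (!ws) continue;
    ws.close();
    sessions.delete(cid);
    closed.push(cid);
  }
  return closed;
}

// Stand-in sockets so the sketch runs without a network connection.
const demoSessions = new Map([
  ["a=", { close() {} }],
  ["b=", { close() {} }],
]);
const demoRecords = [
  { eventName: "INSERT", dynamodb: { Keys: { connectionId: { S: "b=" } } } },
  { eventName: "REMOVE", dynamodb: { Keys: { connectionId: { S: "a=" } } } },
];
console.log(closeRemoved(demoRecords, demoSessions)); // [ 'a=' ]
console.log(demoSessions.size);                       // 1
```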

Pitfalls

  • API Gateway WebSocket has a 10-minute idle timeout — send a {type:"ping"} frame every 5 min from the worker to keep alive.
  • WebSocket frames are limited to 32 KB and a whole message to 128 KB; chunk audio if needed.
  • Lambda cold start on $default adds 200-300ms — provisioned concurrency = 5 fixes it for ~$10/mo.
  • @connections POST is region-locked — your worker must use the same region as the API.
  • Cost trap: API Gateway charges $1/M messages. At 50 fps audio that's 180,000 messages, or about $0.18, per hour per active call for inbound audio alone, before Lambda. Pre-aggregate frames if scaling.
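
The message-cost arithmetic and the pre-aggregation idea from the last bullet can be checked with a few lines. This is a sketch under stated assumptions: it uses the $1 per million messages price quoted above, counts one direction only, and ignores connection-minute charges. The helper names are illustrative.

```javascript
const PRICE_PER_MILLION = 1.0; // USD, the per-message price quoted above

// Hourly message charge for one call sending `messagesPerSecond` messages.
function hourlyMessageCost(messagesPerSecond, pricePerMillion = PRICE_PER_MILLION) {
  return (messagesPerSecond * 3600 * pricePerMillion) / 1_000_000;
}

// Group frames into batches of `size` so each batch goes out as one message.
function batchFrames(frames, size) {
  const batches = [];
  for (let i = 0; i < frames.length; i += size) {
    batches.push(frames.slice(i, i + size));
  }
  return batches;
}

console.log(hourlyMessageCost(50).toFixed(2));     // 0.18
console.log(hourlyMessageCost(50 / 5).toFixed(2)); // 0.04
console.log(batchFrames(["a", "b", "c", "d", "e", "f", "g"], 5));
```

Batching trades cost for latency: every frame held back for aggregation adds buffering delay before the model hears it, so the batch size is worth tuning against the sub-second target.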

How CallSphere does this in production

CallSphere doesn't use API Gateway WebSocket for voice: we measured that at our scale (millions of minutes/month) it's 3-4x more expensive than running our own WebSocket fleet behind ALB. We run FastAPI on port 8084 on bare k3s nodes for Healthcare, and Pion (Go) plus NATS for OneRoof multi-family. The platform spans 37 agents, 90+ tools, 115+ DB tables, and 6 verticals, priced at $149/$499/$1499 with a 14-day trial and a 22% affiliate program. For early-stage builders without our scale, the serverless pattern in this post is the right answer.

FAQ

Q: Can I drop the worker and put everything in Lambda? Only if every call lasts <15 min and you're OK with cold starts on every reconnect. The worker pattern is what makes this production-grade.

Q: Do I have to use OpenAI? No — swap the worker target for AWS Bedrock Nova Sonic, Azure GPT Realtime, or Cloudflare Workers AI.

Q: How do I add Twilio? Replace the $default handler with a Twilio Media Stream WebSocket served from a non-API-Gateway endpoint (Twilio doesn't sign API Gateway URLs out of the box). Cleanest is a separate ALB+Fargate path for Twilio.

Q: What about HIPAA? API Gateway WebSocket is HIPAA-eligible since 2024. Sign a BAA, enable VPC endpoints, encrypt SQS with KMS CMK.

Q: Cost at 1k concurrent calls? Roughly: $1.50 API Gateway messages + $0.20 Lambda + $0.10 SQS + $0.50 Fargate + $20 OpenAI = $22.30/hour in total. Divided across 1,000 calls, that is about $0.022 per call-hour.
