---
title: "Build a Serverless Voice Agent on Lambda + API Gateway WebSocket (2026)"
description: "Sub-second voice agent with zero idle cost: API Gateway WebSocket ($connect/$disconnect/$default), Lambda per-message handler, DynamoDB session store, and OpenAI Realtime over WebSocket."
canonical: https://callsphere.ai/blog/vw5h-build-serverless-voice-agent-lambda-apigw-websocket
category: "AI Infrastructure"
tags: ["AWS", "Lambda", "API Gateway", "WebSocket", "Serverless", "Tutorial"]
author: "CallSphere Team"
published: 2026-03-21T00:00:00.000Z
updated: 2026-05-07T16:30:06.055Z
---

# Build a Serverless Voice Agent on Lambda + API Gateway WebSocket (2026)

> Sub-second voice agent with zero idle cost: API Gateway WebSocket ($connect/$disconnect/$default), Lambda per-message handler, DynamoDB session store, and OpenAI Realtime over WebSocket.

> **TL;DR** — API Gateway WebSocket decouples the persistent client connection from compute, so Lambdas only spin up per message. For a voice agent you need three Lambdas (`$connect`, `$disconnect`, `$default`), a DynamoDB table mapping connection IDs to OpenAI Realtime session IDs, and an SQS-backed worker that holds the OpenAI socket open. ~$0.04 per call-minute at low scale.

## What you'll build

A serverless WebSocket endpoint that accepts PCM audio frames from a browser, forwards them to OpenAI Realtime via a long-lived worker (Fargate Spot or EC2 Spot), receives audio back, and pushes it down the API Gateway connection using the `@connections` POST API. The browser plays it through Web Audio. No always-on infrastructure for the WebSocket layer — only the OpenAI bridge worker.

## Prerequisites

1. AWS account with API Gateway + Lambda + DynamoDB + SQS.
2. OpenAI API key with Realtime (`gpt-realtime` or `gpt-realtime-mini`).
3. AWS SAM or CDK; Node 20 runtime for the Lambdas.
4. A small always-on worker (Fargate task or t4g.nano). A Lambda invocation times out after at most 15 minutes, so it can't cleanly hold an external WebSocket open for a whole session.

## Architecture

```mermaid
flowchart LR
  B[Browser WS Client] -->|wss://| AG[API Gateway WebSocket]
  AG -->|$connect / $default| LAM[Lambda Handlers]
  LAM <-->|connectionId map| DDB[(DynamoDB sessions)]
  LAM -->|SendMessage| SQS[(SQS audio_in)]
  SQS --> WK[Bridge Worker Fargate]
  WK <-->|wss://| OA[OpenAI Realtime]
  WK -->|@connections POST| AG
  AG -->|audio frames| B
```

## Step 1 — Provision the API Gateway WebSocket

```yaml
# template.yaml (AWS SAM)
Resources:
  WS:
    Type: AWS::ApiGatewayV2::Api
    Properties:
      Name: voice-agent
      ProtocolType: WEBSOCKET
      RouteSelectionExpression: "$request.body.action"
```

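Define three routes, `$connect`, `$disconnect`, and `$default`, each with an integration pointing at a Lambda. Frames that are not JSON, or whose `action` value matches no route key, go to `$default`.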
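To make the routing behavior concrete, here is a small local model of how a route selection expression of `$request.body.action` picks a route. It is a sketch of the documented behavior rather than API Gateway's implementation, and `selectRoute` is a hypothetical helper name. The practical upshot for this tutorial is that raw base64 audio frames are not JSON, so they land on `$default`.

```js
// Local model of API Gateway WebSocket route selection for
// RouteSelectionExpression "$request.body.action".
// selectRoute is a hypothetical helper for illustration only.
function selectRoute(body, customRouteKeys) {
  let action;
  try {
    const parsed = JSON.parse(body);
    action = parsed !== null && typeof parsed === "object" ? parsed.action : undefined;
  } catch {
    // Not JSON (for example a raw base64 audio frame): no route key can be evaluated.
    action = undefined;
  }
  // A custom route wins only when its key matches the evaluated value exactly.
  if (typeof action === "string" && customRouteKeys.includes(action)) return action;
  // Everything else falls through to the $default route.
  return "$default";
}

console.log(selectRoute('{"action":"hangup"}', ["hangup"])); // hangup
console.log(selectRoute("UklGRiQAAABXQVZF", ["hangup"]));     // $default
```

`$connect` and `$disconnect` are not chosen this way; API Gateway invokes them when the connection opens and closes.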

## Step 2 — DynamoDB table for connection state

```yaml
Sessions:
  Type: AWS::DynamoDB::Table
  Properties:
    BillingMode: PAY_PER_REQUEST
    AttributeDefinitions:
      - { AttributeName: connectionId, AttributeType: S }
    KeySchema:
      - { AttributeName: connectionId, KeyType: HASH }
    TimeToLiveSpecification:
      AttributeName: ttl
      Enabled: true
```

TTL kills stale entries automatically; voice sessions rarely exceed 1 hour.
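DynamoDB TTL expects a Number attribute holding Unix epoch time in seconds, and expired items are removed by a background process that can lag behind the expiry time. A minimal sketch of how a reader might compute the attribute and treat overdue rows as dead in the meantime follows; `ttlFromNow` and `isExpired` are hypothetical helper names, and the one-hour lifetime mirrors the `$connect` handler.

```js
// Hypothetical helpers around the `ttl` attribute used by the Sessions table.
// DynamoDB TTL expects a Number holding Unix epoch time in seconds.
const SESSION_LIFETIME_SECONDS = 3600; // one hour, the same lifetime $connect writes

function ttlFromNow(nowMs, lifetimeSeconds = SESSION_LIFETIME_SECONDS) {
  return Math.floor(nowMs / 1000) + lifetimeSeconds;
}

// Treat an item as dead once its ttl has passed, even if DynamoDB has not removed it yet.
function isExpired(item, nowMs) {
  return Number(item.ttl.N) <= Math.floor(nowMs / 1000);
}

const now = Date.UTC(2026, 0, 1); // fixed clock so the example is reproducible
const item = { connectionId: { S: "abc=" }, ttl: { N: String(ttlFromNow(now)) } };
console.log(isExpired(item, now));               // false
console.log(isExpired(item, now + 3600 * 1000)); // true
```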

## Step 3 — `$connect` and `$default` Lambdas

```js
// connect.mjs
import { DynamoDBClient, PutItemCommand } from "@aws-sdk/client-dynamodb";
const ddb = new DynamoDBClient({});
export const handler = async (event) => {
  const cid = event.requestContext.connectionId;
  await ddb.send(new PutItemCommand({
    TableName: process.env.SESSIONS,
    Item: { connectionId: { S: cid }, ttl: { N: String(Math.floor(Date.now()/1000)+3600) } }
  }));
  return { statusCode: 200 };
};
```

```js
// default.mjs — receive audio frames from browser, hand off to SQS
import { SQSClient, SendMessageCommand } from "@aws-sdk/client-sqs";
const sqs = new SQSClient({});
export const handler = async (event) => {
  const cid = event.requestContext.connectionId;
  await sqs.send(new SendMessageCommand({
    QueueUrl: process.env.AUDIO_IN,
    MessageBody: JSON.stringify({ cid, frame: event.body })  // base64 PCM
  }));
  return { statusCode: 200 };
};
```
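Standard SQS queues give best-effort ordering and at-least-once delivery, so audio frames can reach the worker out of order or twice. One option, sketched below, is a FIFO queue keyed by connection ID. This assumes `AUDIO_IN` points at a FIFO queue and that the browser attaches a sequence number to each frame; neither is part of the handler above, and `buildAudioMessage` is a hypothetical helper name. `MessageGroupId` and `MessageDeduplicationId` are standard `SendMessage` parameters for FIFO queues.

```js
// Sketch: SendMessage input for a FIFO queue so frames from one call stay ordered.
// Assumes AUDIO_IN is a FIFO queue and that the browser attaches a sequence
// number to each frame; both are assumptions, not part of default.mjs.
function buildAudioMessage(queueUrl, cid, frame, seq) {
  return {
    QueueUrl: queueUrl,
    MessageBody: JSON.stringify({ cid, seq, frame }),
    MessageGroupId: cid,                     // ordering is preserved within one group
    MessageDeduplicationId: `${cid}-${seq}`, // suppresses duplicate sends of the same frame
  };
}

const params = buildAudioMessage(
  "https://sqs.us-east-1.amazonaws.com/123456789012/audio_in.fifo", // placeholder URL
  "L0SM9cOFvHcCIhw=",
  "AAAA",
  7
);
console.log(params.MessageGroupId);         // L0SM9cOFvHcCIhw=
console.log(params.MessageDeduplicationId); // L0SM9cOFvHcCIhw=-7
```

The returned object can be passed straight to `new SendMessageCommand(params)`.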

## Step 4 — The bridge worker (Fargate)

```js
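// worker.mjs: bridges browser audio to OpenAI Realtime and back
import { once } from "node:events";
import WebSocket from "ws";
import { ApiGatewayManagementApiClient, PostToConnectionCommand } from "@aws-sdk/client-apigatewaymanagementapi";

const sessions = new Map();    // cid -> WS to OpenAI
const apigw = new ApiGatewayManagementApiClient({ endpoint: process.env.WS_ENDPOINT });

async function getOrOpen(cid) {
  let ws = sessions.get(cid);
  if (!ws) {
    ws = new WebSocket("wss://api.openai.com/v1/realtime?model=gpt-realtime", {
      headers: { Authorization: `Bearer ${process.env.OPENAI_API_KEY}`, "OpenAI-Beta": "realtime=v1" }
    });
    ws.on("message", async (data) => {
      const ev = JSON.parse(data.toString());
      if (ev.type !== "response.audio.delta") return;
      try {
        await apigw.send(new PostToConnectionCommand({ ConnectionId: cid, Data: Buffer.from(ev.delta, "base64") }));
      } catch (err) {
        // HTTP 410 (GoneException) means the browser connection no longer exists
        if (err.$metadata?.httpStatusCode === 410) ws.close();
        else console.error("PostToConnection failed", err);
      }
    });
    ws.on("error", (err) => console.error("OpenAI socket error", err));
    // Forget the socket once it closes so the next frame opens a fresh one
    ws.on("close", () => { if (sessions.get(cid) === ws) sessions.delete(cid); });
    sessions.set(cid, ws);
  }
  // Wait for the handshake so callers never send on a connecting socket
  if (ws.readyState === WebSocket.CONNECTING) await once(ws, "open");
  return ws;
}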
```

The worker pulls from SQS, opens (or reuses) an OpenAI socket per connection, forwards frames in, and pushes audio out via `@connections`.
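The "forwards frames in" step can be sketched as a pure function that turns one SQS message body, as written by `default.mjs`, into the Realtime `input_audio_buffer.append` client event. `toAppendEvent` is a hypothetical helper name, and the sketch assumes the frame is already base64 audio in the input format the session expects.

```js
// Sketch: turn one SQS message body (as written by default.mjs) into the
// Realtime client event that appends audio to the input buffer.
// toAppendEvent is a hypothetical helper name.
function toAppendEvent(sqsBody) {
  const { cid, frame } = JSON.parse(sqsBody);
  if (typeof cid !== "string" || typeof frame !== "string") {
    throw new Error("expected { cid, frame } with string values");
  }
  return {
    cid,
    payload: JSON.stringify({ type: "input_audio_buffer.append", audio: frame }),
  };
}

const { cid, payload } = toAppendEvent(JSON.stringify({ cid: "abc=", frame: "AAAA" }));
console.log(cid);     // abc=
console.log(payload); // {"type":"input_audio_buffer.append","audio":"AAAA"}
```

In the worker's SQS loop, the result would be used as `(await getOrOpen(cid)).send(payload)` once the OpenAI socket is open.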

## Step 5 — Browser side: 16-bit PCM over WebSocket

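```js
// Run inside a module script (uses top-level await)
const ws = new WebSocket("wss://abc123.execute-api.us-east-1.amazonaws.com/prod");
const ctx = new AudioContext({ sampleRate: 24000 });
const src = ctx.createMediaStreamSource(await navigator.mediaDevices.getUserMedia({ audio: true }));
const proc = ctx.createScriptProcessor(4096, 1, 1);
src.connect(proc); proc.connect(ctx.destination);
proc.onaudioprocess = (e) => {
  const f32 = e.inputBuffer.getChannelData(0);
  const i16 = new Int16Array(f32.length);
  for (let i = 0; i < f32.length; i++) {
    // Clamp to [-1, 1], then scale to the signed 16-bit range
    const s = Math.max(-1, Math.min(1, f32[i]));
    i16[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  // Send base64 text, which is what the $default handler forwards as `frame`
  if (ws.readyState === WebSocket.OPEN) {
    ws.send(btoa(String.fromCharCode(...new Uint8Array(i16.buffer))));
  }
};
```

## Step 6 — `$disconnect` cleanup

```js
// disconnect.mjs
import { DynamoDBClient, DeleteItemCommand } from "@aws-sdk/client-dynamodb";
const ddb = new DynamoDBClient({});
export const handler = async (event) => {
  await ddb.send(new DeleteItemCommand({ TableName: process.env.SESSIONS,
    Key: { connectionId: { S: event.requestContext.connectionId } } }));
  return { statusCode: 200 };
};
```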

The bridge worker subscribes to a DynamoDB Stream (or polls) and closes its OpenAI socket when the row vanishes.
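A minimal sketch of the stream-driven cleanup is shown below. It assumes a stream is enabled on the Sessions table (the template above does not show this) and that records arrive in the standard DynamoDB Streams shape; `closeRemoved` is a hypothetical helper name, and the socket in the usage example is a stand-in object.

```js
// Sketch: close OpenAI sockets for connections whose session row was deleted.
// `records` follows the DynamoDB Streams record shape; `sessions` is the
// worker's Map of connection ID to socket. closeRemoved is a hypothetical name.
function closeRemoved(records, sessions) {
  const closed = [];
  for (const record of records) {
    if (record.eventName !== "REMOVE") continue; // ignore INSERT and MODIFY
    const cid = record.dynamodb?.Keys?.connectionId?.S;
    const ws = cid === undefined ? undefined : sessions.get(cid);
    if (!ws) continue;                           // not owned by this worker
    ws.close();
    sessions.delete(cid);
    closed.push(cid);
  }
  return closed;
}

// Usage with a stand-in socket object:
const fake = { closedCalls: 0, close() { this.closedCalls += 1; } };
const live = new Map([["abc=", fake]]);
console.log(closeRemoved([
  { eventName: "REMOVE", dynamodb: { Keys: { connectionId: { S: "abc=" } } } },
  { eventName: "INSERT", dynamodb: { Keys: { connectionId: { S: "xyz=" } } } },
], live)); // [ 'abc=' ]
```

Rows deleted by TTL also surface as `REMOVE` records, so the same path covers sessions that expire without a clean `$disconnect`.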

## Pitfalls

- **API Gateway WebSocket has a 10-minute idle timeout** — send a `{type:"ping"}` frame every 5 min from the worker to keep alive.
- **Frame size limit is 32 KB**, and the total message payload limit is 128 KB; chunk audio if needed.
- **Lambda cold start** on `$default` adds 200-300ms — provisioned concurrency = 5 fixes it for ~$10/mo.
- **`@connections` POST is region-locked** — your worker must use the same region as the API.
- **Cost trap**: API Gateway charges $1/M messages. At 50 fps audio that's 180,000 messages, or about $0.18, per hour per active call in one direction, before Lambda. Pre-aggregate frames if scaling.
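The effect of pre-aggregating frames can be estimated with a few lines of arithmetic. The sketch below counts billed message units per hour for one direction of audio, assuming messages are metered in 32 KB increments and that each frame is 20 ms of 24 kHz 16-bit mono PCM sent as base64; `billedMessagesPerHour` is a hypothetical helper name.

```js
// Sketch: billed API Gateway message units per hour for one direction of audio.
// Assumes messages are metered in 32 KB increments.
const METERING_INCREMENT_BYTES = 32 * 1024;

function billedMessagesPerHour(framesPerSecond, frameBytes, framesPerMessage = 1) {
  const messagesPerHour = (framesPerSecond / framesPerMessage) * 3600;
  const unitsPerMessage = Math.ceil((frameBytes * framesPerMessage) / METERING_INCREMENT_BYTES);
  return Math.ceil(messagesPerHour * unitsPerMessage);
}

// 20 ms of 24 kHz 16-bit mono PCM is 960 bytes, or 1,280 bytes once base64-encoded.
console.log(billedMessagesPerHour(50, 1280));    // 180000
console.log(billedMessagesPerHour(50, 1280, 5)); // 36000
```

Batching five frames per message cuts the message count fivefold, at the price of roughly 100 ms of extra buffering before each send, so the batch size is a latency trade-off rather than a free saving.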

## How CallSphere does this in production

CallSphere doesn't use API Gateway WebSocket for voice: we measured that at our scale (millions of minutes/month) it's 3-4x more expensive than running our own WebSocket fleet behind ALB. We run FastAPI on :8084 on bare k3s nodes for Healthcare and Pion Go + NATS for OneRoof multi-family. The platform spans 37 agents, 90+ tools, 115+ DB tables, and 6 verticals, with plans at $149/$499/$1499, a 14-day trial, and a 22% affiliate program. For early-stage builders without our scale, the serverless pattern in this post is the right answer.

## FAQ

**Q: Can I drop the worker and put everything in Lambda?**
Only if every call lasts <15 min and you're OK with cold starts on every reconnect. The worker pattern is what makes this production-grade.

**Q: Do I have to use OpenAI?**
No — swap the worker target for AWS Bedrock Nova Sonic, Azure GPT Realtime, or Cloudflare Workers AI.

**Q: How do I add Twilio?**
Replace the `$default` handler with a Twilio Media Stream WebSocket served from a non-API-Gateway endpoint (Twilio doesn't sign API Gateway URLs out of the box). Cleanest is a separate ALB+Fargate path for Twilio.

**Q: What about HIPAA?**
API Gateway WebSocket is HIPAA-eligible since 2024. Sign a BAA, enable VPC endpoints, encrypt SQS with KMS CMK.

**Q: Cost at 1k concurrent calls?**
Roughly: $1.50 API Gateway messages + $0.20 Lambda + $0.10 SQS + $0.50 Fargate + $20 OpenAI = $22.30/hour, or $0.022/min/call.

## Sources

- [API Gateway WebSocket APIs documentation](https://docs.aws.amazon.com/apigateway/latest/developerguide/apigateway-websocket-api.html)
- [Serverless strategies for streaming LLM responses — AWS Compute Blog](https://aws.amazon.com/blogs/compute/serverless-strategies-for-streaming-llm-responses/)
- [@aws-sdk/client-apigatewaymanagementapi reference](https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/client/apigatewaymanagementapi/)
- [OpenAI Realtime API guide](https://developers.openai.com/api/docs/guides/realtime)
- [Real-Time Communication with AWS API Gateway WebSockets and Lambda — Superluminar](https://superluminar.io/2024/08/02/real-time-communication-with-aws-api-gateway-websockets-and-lambda/)

