
How to Build a Node.js Voice Agent with ElevenLabs Conversational AI

Wire ElevenLabs Conversational AI to an Express server, expose a public agent over WebSocket, and trigger client tools — full TypeScript tutorial for 2026.

TL;DR — ElevenLabs Conversational AI gives you a hosted agent with TTS-grade voices and built-in turn-taking. The @elevenlabs/client package handles the WebSocket; you only write your tool handlers and a thin Express layer for signed URLs.

What you'll build

A small Express service that mints a signed conversation URL, plus a TypeScript client (bundled for the browser) that joins the agent, executes registered "client tools" (such as get_booking_slots or book_appointment), and streams audio. By the end you'll have a working voice loop with one of ElevenLabs' premium voices and tools the agent can call mid-conversation.

Prerequisites

  1. ElevenLabs account with a created Conversational AI agent (note the agent_id).
  2. npm install express @elevenlabs/client elevenlabs node-fetch.
  3. Node 20+ and a microphone (or audio file source).
  4. ELEVENLABS_API_KEY in env.
  5. Basic familiarity with WebSocket auth (signed URLs).
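Since both the server and the agent lookup depend on environment variables, it can help to fail fast at startup when one is missing. The following is a minimal sketch; `loadConfig` and `AgentConfig` are hypothetical names, and `AGENT_ID` is assumed to be set alongside `ELEVENLABS_API_KEY`, as the server code later in this tutorial expects.

```typescript
// Hypothetical helper: validates the environment variables this tutorial relies on.
interface AgentConfig {
  apiKey: string;
  agentId: string;
}

function loadConfig(env: Record<string, string | undefined>): AgentConfig {
  const apiKey = env["ELEVENLABS_API_KEY"];
  const agentId = env["AGENT_ID"];

  // Collect every missing name so the error reports them all at once.
  const missing: string[] = [];
  if (!apiKey) missing.push("ELEVENLABS_API_KEY");
  if (!agentId) missing.push("AGENT_ID");

  if (!apiKey || !agentId) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return { apiKey, agentId };
}
```

On the server you would call `loadConfig(process.env)` once before starting Express.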

Architecture

flowchart LR
  Browser -->|GET /signed-url| Express
  Express -->|REST| ElevenLabs
  Browser -->|WS conversation| ElevenLabs
  Browser -->|client_tool calls| ToolHandler

Step 1 — Create the agent in ElevenLabs

In the dashboard, create an agent, paste a system prompt, pick a voice (e.g., Rachel), and define each client tool your code will handle. This tutorial uses get_booking_slots and book_appointment, each with a JSON Schema for its parameters. Copy the agent_id.
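As an illustration of the parameter schema, here is one possible shape for get_booking_slots, with a runtime guard that mirrors it. Only the `date` parameter comes from this tutorial; the description text, the YYYY-MM-DD format, and the `parseSlotArgs` helper are assumptions, and the dashboard may present the schema differently.

```typescript
// Assumed parameter schema for the get_booking_slots client tool.
// Only the `date` parameter comes from this tutorial; the rest is illustrative.
const getBookingSlotsSchema = {
  type: "object",
  properties: {
    date: {
      type: "string",
      description: "Requested day in YYYY-MM-DD format",
    },
  },
  required: ["date"],
} as const;

// Hypothetical runtime guard mirroring the schema, for use inside the handler.
function parseSlotArgs(raw: unknown): { date: string } {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("Tool arguments must be an object");
  }
  const date = (raw as Record<string, unknown>)["date"];
  if (typeof date !== "string" || !/^\d{4}-\d{2}-\d{2}$/.test(date)) {
    throw new Error("date must be a YYYY-MM-DD string");
  }
  return { date };
}
```

Validating inside the handler guards against the model passing a loosely formatted value such as "tomorrow".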

Step 2 — Express endpoint to mint a signed URL

For private agents you need a server-signed URL — never ship your API key to the browser.

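```ts
// server.ts
import express from "express";
import fetch from "node-fetch";

const app = express();

app.get("/signed-url", async (_req, res) => {
  // The API key stays on the server; the browser only receives the signed URL.
  const r = await fetch(
    `https://api.elevenlabs.io/v1/convai/conversation/get-signed-url?agent_id=${process.env.AGENT_ID}`,
    { headers: { "xi-api-key": process.env.ELEVENLABS_API_KEY! } }
  );
  if (!r.ok) {
    // Surface upstream failures instead of returning an undefined URL.
    res.status(502).json({ error: "Failed to get signed URL" });
    return;
  }
  const { signed_url } = (await r.json()) as { signed_url: string };
  res.json({ signed_url });
});

app.listen(3000);
```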
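If you want the request construction and response checking to be testable without a network call, they can be pulled into small pure functions. This is a sketch under stated assumptions: `buildSignedUrlRequest` and `extractSignedUrl` are hypothetical helper names, while the endpoint, the `xi-api-key` header, and the `signed_url` field are the ones used in the handler above.

```typescript
// Hypothetical helper: builds the signed-URL request used by the Express handler.
interface SignedUrlRequest {
  url: string;
  headers: Record<string, string>;
}

function buildSignedUrlRequest(agentId: string, apiKey: string): SignedUrlRequest {
  const base = "https://api.elevenlabs.io/v1/convai/conversation/get-signed-url";
  return {
    // Encode the agent ID so unexpected characters cannot break the query string.
    url: `${base}?agent_id=${encodeURIComponent(agentId)}`,
    headers: { "xi-api-key": apiKey },
  };
}

// Hypothetical helper: pulls signed_url out of a parsed body, failing loudly if absent.
function extractSignedUrl(status: number, body: unknown): string {
  if (status < 200 || status >= 300) {
    throw new Error(`ElevenLabs returned HTTP ${status}`);
  }
  const signedUrl = (body as { signed_url?: unknown } | null)?.signed_url;
  if (typeof signedUrl !== "string") {
    throw new Error("Response did not contain a signed_url string");
  }
  return signedUrl;
}
```

In the route you would pass `r.status` and the parsed JSON to `extractSignedUrl` and map any thrown error to a 502 response.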

Step 3 — Browser conversation with client tools

```ts
// client.ts (bundled to browser)
import { Conversation } from "@elevenlabs/client";

async function start() {
  const { signed_url } = await fetch("/signed-url").then(r => r.json());

  const conversation = await Conversation.startSession({
    signedUrl: signed_url,
    clientTools: {
      get_booking_slots: async ({ date }: { date: string }) => {
        const slots = await fetch(`/api/slots?date=${encodeURIComponent(date)}`).then(r => r.json());
        // Tool results must be strings; the agent reads them as text.
        return JSON.stringify(slots);
      },
      // Parameter types here are assumed; match them to your dashboard schema.
      book_appointment: async ({ slot_id, name }: { slot_id: string; name: string }) => {
        await fetch("/api/book", {
          method: "POST",
          headers: { "content-type": "application/json" },
          body: JSON.stringify({ slot_id, name }),
        });
        return "Booked";
      },
    },
    onModeChange: ({ mode }) => console.log("mode:", mode),
    onStatusChange: ({ status }) => console.log("status:", status),
  });

  document.getElementById("end")!.onclick = () => conversation.endSession();
}

// Start from a user gesture so the microphone prompt is allowed (see the iOS Safari pitfall).
// "start" is an assumed button id, like "end" above.
document.getElementById("start")!.onclick = () => start();
```
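The onModeChange and onStatusChange callbacks are a natural place to drive a small UI indicator instead of console logs. The sketch below assumes the mode values are "speaking" and "listening" and the status values are "connecting", "connected", "disconnecting", and "disconnected"; check these against the SDK version you install. `describeState` is a hypothetical helper.

```typescript
// Assumed value sets; verify them against your installed @elevenlabs/client version.
type Mode = "speaking" | "listening";
type Status = "connecting" | "connected" | "disconnecting" | "disconnected";

// Hypothetical helper: maps the two callbacks' values to a single UI label.
function describeState(status: Status, mode: Mode): string {
  switch (status) {
    case "connecting":
      return "Connecting";
    case "disconnecting":
      return "Hanging up";
    case "disconnected":
      return "Call ended";
    case "connected":
      // Mode only matters while the session is live.
      return mode === "speaking" ? "Agent is speaking" : "Listening";
  }
}
```

You would keep the latest status and mode in two variables, update them from the callbacks, and render `describeState(status, mode)` into a status element.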

Step 4 — Python alternative (server-side)

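If your tool execution belongs server-side (DB writes, secrets), run the agent from Python. The example below uses DefaultAudioInterface, which reads from the local microphone and plays through local speakers; to stream audio over your own transport, supply a custom audio interface instead: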

```python
import os

from elevenlabs.client import ElevenLabs
from elevenlabs.conversational_ai.conversation import Conversation, ClientTools
from elevenlabs.conversational_ai.default_audio_interface import DefaultAudioInterface

client = ElevenLabs(api_key=os.environ["ELEVENLABS_API_KEY"])
tools = ClientTools()


async def get_booking_slots(params):
    # `db` stands in for your own data-access layer; it is not defined here.
    return await db.fetch_slots(params["date"])


tools.register("get_booking_slots", get_booking_slots, is_async=True)

conv = Conversation(
    client=client,
    agent_id=os.environ["AGENT_ID"],
    requires_auth=True,
    audio_interface=DefaultAudioInterface(),
    client_tools=tools,
)
conv.start_session()
```

Step 5 — Wire to Twilio for phone calls

ElevenLabs has a native Twilio integration: import a Twilio number into the ElevenLabs dashboard, and inbound calls are routed to your agent automatically. For outbound, hit the Twilio outbound endpoint:

```ts
// KEY, AGENT_ID, and PHONE_ID are assumed to come from your server-side config.
await fetch(`https://api.elevenlabs.io/v1/convai/twilio/outbound-call`, {
  method: "POST",
  headers: { "xi-api-key": KEY, "content-type": "application/json" },
  body: JSON.stringify({
    agent_id: AGENT_ID,
    agent_phone_number_id: PHONE_ID,
    to_number: "+18453884261",
  }),
});
```
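Twilio expects destination numbers in E.164 format, so it may be worth normalizing user input before building the request body. A minimal sketch follows; `buildOutboundCallBody` is a hypothetical helper, and the field names are the ones from the request above.

```typescript
// Request body fields as used in the outbound-call request above.
interface OutboundCallBody {
  agent_id: string;
  agent_phone_number_id: string;
  to_number: string;
}

// E.164: a leading "+", a non-zero first digit, and at most 15 digits in total.
const E164 = /^\+[1-9]\d{1,14}$/;

// Hypothetical helper: strips common formatting, then validates the number.
function buildOutboundCallBody(
  agentId: string,
  phoneNumberId: string,
  toNumber: string
): OutboundCallBody {
  const normalized = toNumber.replace(/[\s()-]/g, "");
  if (!E164.test(normalized)) {
    throw new Error(`Not an E.164 number: ${toNumber}`);
  }
  return {
    agent_id: agentId,
    agent_phone_number_id: phoneNumberId,
    to_number: normalized,
  };
}
```

The result can be passed straight to `JSON.stringify` as the request body; rejecting bad numbers locally avoids a failed API call.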

Step 6 — Logging tool calls

Every clientTool invocation is a great audit hook — log the args and the response to your DB so you can replay conversations later.
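One way to sketch this audit hook is a wrapper that records arguments, result or error, and timing around any tool handler. `withAudit`, `ToolAuditEntry`, and the `sink` callback are hypothetical names; `sink` stands in for whatever persistence you use, such as a DB insert.

```typescript
type ToolHandler<A> = (args: A) => Promise<string> | string;

// Hypothetical audit record; adapt the fields to your own schema.
interface ToolAuditEntry {
  tool: string;
  args: unknown;
  result?: string;
  error?: string;
  startedAt: string;
  durationMs: number;
}

// Wraps a handler so every call is reported to `sink`, whether it succeeds or throws.
function withAudit<A>(
  tool: string,
  handler: ToolHandler<A>,
  sink: (entry: ToolAuditEntry) => void
): (args: A) => Promise<string> {
  return async (args: A) => {
    const started = Date.now();
    const base = { tool, args, startedAt: new Date(started).toISOString() };
    try {
      const result = await handler(args);
      sink({ ...base, result, durationMs: Date.now() - started });
      return result;
    } catch (err) {
      sink({ ...base, error: String(err), durationMs: Date.now() - started });
      throw err; // Preserve the original failure for the caller.
    }
  };
}
```

For example, `clientTools: { get_booking_slots: withAudit("get_booking_slots", handler, saveEntry) }`, where `saveEntry` is your own function.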

Common pitfalls

  • Shipping the API key to the browser: always proxy through /signed-url.
  • Tool returning non-string: always JSON.stringify the response — the agent reads it as text.
  • Microphone permission failures on iOS Safari: must be triggered by a user gesture (button click), not on page load.
  • Agent says "I don't know": check that tool names match exactly between the dashboard and your handlers.
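The non-string pitfall can be guarded against with a small wrapper that coerces whatever a handler returns into text. This is a sketch, not an SDK feature: `asTextTool` is a hypothetical helper, and returning errors as JSON text (rather than rethrowing) is a design choice that lets the agent tell the caller something went wrong.

```typescript
// Hypothetical wrapper: guarantees a client tool resolves to a string.
function asTextTool<A>(
  handler: (args: A) => unknown | Promise<unknown>
): (args: A) => Promise<string> {
  return async (args: A) => {
    try {
      const value = await handler(args);
      // Strings pass through; everything else is serialized (undefined becomes "null").
      return typeof value === "string" ? value : JSON.stringify(value ?? null);
    } catch (err) {
      // Report the failure as text instead of rejecting.
      const message = err instanceof Error ? err.message : String(err);
      return JSON.stringify({ error: message });
    }
  };
}
```

A handler that returns an array of slots can then be registered as `asTextTool(fetchSlots)` without remembering to stringify inside it.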

How CallSphere does this in production

CallSphere's Salon vertical runs 4 ElevenLabs Conversational AI agents (booking, rescheduling, FAQ, retention) with GB-YYYYMMDD-### booking references handed back to the agent as tool results. Healthcare uses OpenAI Realtime PCM16 24kHz instead, but the tool registration pattern is identical. Pricing starts at $149/$499/$1499, with a 14-day trial available.
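To make the reference format concrete, here is one way a GB-YYYYMMDD-### string could be produced for a tool result. This is an illustrative sketch and not CallSphere's implementation: it assumes "GB" is a literal prefix, the date is taken in UTC, and ### is a zero-padded daily sequence from 1 to 999. `formatBookingRef` is a hypothetical name.

```typescript
// Assumed reading of GB-YYYYMMDD-###: literal "GB" prefix, UTC date, 3-digit sequence.
function formatBookingRef(date: Date, sequence: number): string {
  if (!Number.isInteger(sequence) || sequence < 1 || sequence > 999) {
    throw new RangeError("sequence must be an integer from 1 to 999");
  }
  // toISOString() is always UTC, e.g. "2026-04-07T00:00:00.000Z".
  const yyyymmdd = date.toISOString().slice(0, 10).replace(/-/g, "");
  return `GB-${yyyymmdd}-${String(sequence).padStart(3, "0")}`;
}
```

A booking tool could return this string as its result so the agent can read the reference back to the caller.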

FAQ

OpenAI Realtime vs ElevenLabs Conversational AI? ElevenLabs ships with premium voices and a hosted dashboard. OpenAI Realtime is rawer but cheaper and lower-latency for phone.

Can I bring my own LLM? ElevenLabs supports custom LLMs (Claude, GPT-4o) via the agent settings.

Are tool calls billed separately? No — tool execution happens in your code, billing covers only conversation minutes.

How long can a session last? Up to 30 minutes per ElevenLabs limits as of April 2026.
