---
title: "Build an AI Voice Agent with Nuxt 3 + Vue 3.5 + OpenAI Realtime (2026)"
description: "Nuxt 3 Nitro server routes mint ephemeral OpenAI keys, Vue 3.5 composables wrap WebRTC, and Pinia holds the call state. Sub-700ms voice agent in 200 lines."
canonical: https://callsphere.ai/blog/vw8h-build-ai-voice-agent-nuxt-3-vue-realtime-2026
category: "AI Voice Agents"
tags: ["Nuxt 3", "Vue", "WebRTC", "Voice Agent", "Realtime"]
author: "CallSphere Team"
published: 2026-04-18T00:00:00.000Z
updated: 2026-05-08T17:25:15.728Z
---

# Build an AI Voice Agent with Nuxt 3 + Vue 3.5 + OpenAI Realtime (2026)

> Nuxt 3 Nitro server routes mint ephemeral OpenAI keys, Vue 3.5 composables wrap WebRTC, and Pinia holds the call state. Sub-700ms voice agent in 200 lines.

> **TL;DR** — Nuxt 3.13+ on Vue 3.5 ships a built-in Nitro server, perfect for hiding OpenAI keys. Wrap WebRTC + the Realtime API in a `useVoiceAgent` composable for a clean Vue voice UI.

## What you'll build

A Nuxt 3 page with a Talk button that uses an ephemeral key minted by a Nitro server route, opens WebRTC to OpenAI gpt-realtime, and streams transcripts into a Pinia store.

## Prerequisites

1. `nuxt@^3.13`, `vue@^3.5`, `pinia@^2.2`.
2. `OPENAI_API_KEY` in `.env`.
3. Node 20+ or Bun 1.3.

## Architecture

```mermaid
flowchart LR
  V[Nuxt page] --> N[Nitro /api/realtime/key]
  N -- POST sessions --> OA1[OpenAI]
  OA1 --> N --> V
  V -- WebRTC SDP --> OA2[OpenAI Realtime]
```

## Step 1 — Nitro endpoint

```ts
// server/api/realtime/key.post.ts
export default defineEventHandler(async () => {
  const r = await $fetch(
    "https://api.openai.com/v1/realtime/sessions",
    {
      method: "POST",
      headers: { Authorization: `Bearer ${process.env.OPENAI_API_KEY}` },
      body: { model: "gpt-realtime", voice: "alloy" },
    },
  );
  return r;
});
```
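Step 2 destructures `client_secret` from this response. A minimal sketch of the fields used — a type guard for the JSON payload, where the field names follow what the handler above returns and everything else in the response is treated as opaque:

```ts
// Shape of the fields this tutorial reads from the sessions response.
// Only client_secret is used client-side.
interface RealtimeSession {
  client_secret: { value: string; expires_at: number };
}

// Narrow an unknown JSON payload before trusting it.
function isRealtimeSession(x: unknown): x is RealtimeSession {
  const s = x as RealtimeSession;
  return (
    typeof s === "object" &&
    s !== null &&
    typeof s?.client_secret?.value === "string" &&
    typeof s?.client_secret?.expires_at === "number"
  );
}
```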

## Step 2 — Composable

```ts
// composables/useVoiceAgent.ts
// ref and $fetch are auto-imported by Nuxt.
export function useVoiceAgent() {
  const live = ref(false);
  const transcript = ref("");
  const audioEl = ref<HTMLAudioElement | null>(null);

  async function start() {
    // Mint a short-lived key via the Nitro route — the real key never leaves the server
    const { client_secret } = await $fetch("/api/realtime/key", { method: "POST" });

    const pc = new RTCPeerConnection();

    // Play the model's audio track as soon as it arrives
    pc.ontrack = (e) => {
      if (audioEl.value) audioEl.value.srcObject = e.streams[0];
    };

    // Send the microphone upstream
    const ms = await navigator.mediaDevices.getUserMedia({ audio: true });
    ms.getTracks().forEach((t) => pc.addTrack(t, ms));

    // Realtime events (transcripts, tool calls) arrive on a data channel
    const dc = pc.createDataChannel("oai-events");
    dc.addEventListener("message", (e) => {
      const evt = JSON.parse(e.data);
      if (evt.type === "response.audio_transcript.delta")
        transcript.value += evt.delta;
    });

    // Standard SDP offer/answer, authenticated with the ephemeral key
    const offer = await pc.createOffer();
    await pc.setLocalDescription(offer);
    const ans = await fetch("https://api.openai.com/v1/realtime?model=gpt-realtime", {
      method: "POST",
      body: offer.sdp,
      headers: {
        Authorization: `Bearer ${client_secret.value}`,
        "Content-Type": "application/sdp",
      },
    });
    await pc.setRemoteDescription({ type: "answer", sdp: await ans.text() });
    live.value = true;
  }

  return { live, transcript, audioEl, start };
}
```
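When the call ends you should release the microphone and close the peer connection; the composable above omits teardown. A minimal `stop` sketch — the helper name is mine, and it is typed structurally so it can be exercised without browser globals:

```ts
// Teardown sketch (hypothetical helper, not part of the tutorial's composable).
type Closable = { close(): void };
type Stoppable = { stop(): void };

function stopAgent(pc: Closable, micTracks: Stoppable[]): void {
  micTracks.forEach((t) => t.stop()); // release the microphone
  pc.close();                         // tear down the WebRTC session
}
```

In the composable you would call it with the `RTCPeerConnection` and `ms.getTracks()`, then flip `live` back to `false`.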

## Step 3 — Page

```vue
<!-- pages/index.vue -->
<script setup lang="ts">
const { live, transcript, audioEl, start } = useVoiceAgent();
</script>

<template>
  <button :disabled="live" @click="start">
    {{ live ? "Live" : "Talk" }}
  </button>
  <audio ref="audioEl" autoplay />
  <p>{{ transcript }}</p>
</template>
```

## Step 4 — Pinia store for multi-page state

```ts
// stores/calls.ts
export const useCalls = defineStore("calls", {
  state: () => ({ history: [] as { role: string; text: string }[] }),
});
```
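Pinia stores need a running Vue app to exercise, so it can help to keep the history logic in a plain function that a store action delegates to. A sketch with illustrative names, assuming streamed deltas should merge into one entry per speaker turn:

```ts
// Framework-free helper a store action could delegate to (illustrative,
// not part of the tutorial's store).
interface Turn {
  role: string;
  text: string;
}

// Append a turn immutably, merging consecutive turns from the same role
// so streaming transcript deltas collapse into one history entry.
function appendTurn(history: Turn[], turn: Turn): Turn[] {
  const last = history[history.length - 1];
  if (last && last.role === turn.role) {
    return [...history.slice(0, -1), { role: last.role, text: last.text + turn.text }];
  }
  return [...history, turn];
}
```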

## Step 5 — Deploy

`npx nuxi build && npx nuxi preview` locally, or target Cloudflare Pages with the `cloudflare-pages` Nitro preset. Vercel and Netlify presets ship out of the box.

## Step 6 — Tool calls

Listen for `response.function_call_arguments.done`, run a Nitro endpoint to execute the tool server-side (so you keep secrets server-only), and reply via the data channel.
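The reply side can be sketched as plain functions: parse the done event, run the tool, then send two events back over `oai-events` — a `function_call_output` item followed by `response.create` so the model resumes speaking. Event and field names follow the Realtime API; the helper itself is illustrative:

```ts
// Tool-call round trip sketch (event/field names per the Realtime API docs;
// the helper name is illustrative).
interface ToolCallDone {
  type: "response.function_call_arguments.done";
  call_id: string;
  name: string;
  arguments: string; // JSON-encoded arguments from the model
}

// Build the two data-channel messages to send back: the tool result item,
// then a request for the model to continue its response.
function buildToolReply(evt: ToolCallDone, output: unknown): string[] {
  return [
    JSON.stringify({
      type: "conversation.item.create",
      item: {
        type: "function_call_output",
        call_id: evt.call_id,
        output: JSON.stringify(output),
      },
    }),
    JSON.stringify({ type: "response.create" }),
  ];
}
```

Each string is sent with `dc.send(...)`; the actual tool execution happens in the Nitro endpoint so secrets stay server-side.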

## Pitfalls

- **`process.env` in Nitro** vs `useRuntimeConfig` — use the latter for typed config.
- **WebRTC + SSR**: WebRTC APIs only exist in the browser — wrap the voice UI in `<ClientOnly>` or disable SSR for the route via `routeRules` in `nuxt.config.ts`.
- **Vapor mode (preview)**: Vue 3.5's experimental Vapor mode skips VDOM but is still preview — opt-in carefully.

## How CallSphere does this in production

CallSphere's stack is multi-framework: Healthcare (FastAPI), OneRoof (Next.js 16 + React 19), Salon (NestJS 10 + Prisma), Sales (Node.js 20 + React 18 + Vite). Some agency white-label customers prefer Vue/Nuxt — supported via the same realtime relay. **37 agents · 90+ tools · 115+ DB tables · 6 verticals**. **$149/$499/$1,499**, **14-day trial**, **22% affiliate**.

## FAQ

**Vue 3.5 vs 3.4?** 3.5 brings reactivity perf wins and `useTemplateRef`.

**WebRTC vs WebSocket on Nuxt?** WebRTC for browser-direct, WebSocket if you need server-side audit/policy.

**Cloudflare Pages limit?** WS connections capped at 100 concurrent on free tier — bump to Workers Paid for production.

**Ephemeral key TTL?** 60s default; refresh before each call.
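A minimal pre-dial check, assuming `expires_at` is a unix timestamp in seconds as returned by the sessions endpoint (the helper name and safety margin are mine):

```ts
// Decide whether to mint a fresh ephemeral key before dialing.
// expiresAt: unix seconds; nowMs: current time in milliseconds.
function needsRefresh(expiresAt: number, nowMs = Date.now(), marginS = 5): boolean {
  return expiresAt * 1000 - nowMs < marginS * 1000;
}
```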

## Sources

- Nuxt 3 docs - [https://nuxt.com/](https://nuxt.com/)
- Mamezou - Nuxt + OpenAI Realtime - [https://developer.mamezou-tech.com/en/blogs/2024/10/16/openai-realtime-api-nuxt/](https://developer.mamezou-tech.com/en/blogs/2024/10/16/openai-realtime-api-nuxt/)
- Vue.js AI SDK Getting Started - [https://ai-sdk.dev/docs/getting-started/nuxt](https://ai-sdk.dev/docs/getting-started/nuxt)
- VueSchool - AI Interfaces with Vue + Nuxt - [https://vueschool.io/courses/ai-interfaces-with-vue-nuxt-and-the-ai-sdk](https://vueschool.io/courses/ai-interfaces-with-vue-nuxt-and-the-ai-sdk)

## How this plays out in production

If you are taking the ideas in *Build an AI Voice Agent with Nuxt 3 + Vue 3.5 + OpenAI Realtime (2026)* and putting them in front of real customers, the constraints that decide everything are ASR error rates on long-tail entities (drug names, street names, SKUs) and the post-call pipeline that must reconcile what was actually heard. Treat this as a voice-first system from the first prompt: the agent's persona, its tool surface, and its escalation rules all flow from that single decision. Teams that ship fast tend to instrument the loop end-to-end before they tune any single component, because the bottleneck is rarely where intuition puts it.

## Voice agent architecture, end to end

A production-grade voice stack at CallSphere stitches Twilio Programmable Voice (PSTN ingress, TwiML, bidirectional Media Streams) to a realtime reasoning layer — typically OpenAI Realtime or ElevenLabs Conversational AI — with sub-second response as a hard SLO. Anything north of one second of perceived silence and callers either repeat themselves or hang up; that single number drives the whole architecture. Server-side VAD with proper barge-in support is non-negotiable, otherwise the agent talks over the caller and the conversation collapses. Streaming TTS with phoneme-aligned interruption keeps the cadence natural even when the user changes their mind mid-sentence. Post-call, every transcript is run through a structured pipeline: sentiment, intent classification, lead score, escalation flag, and a normalized slot extraction (name, callback number, reason, urgency). For healthcare workloads, the BAA-covered storage path, audit logs, encryption-at-rest, and PHI-safe transcript redaction are wired in from day one, not bolted on at compliance review. The end state is a system where every call produces a row of structured data, not just a recording.
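The "row of structured data" can be made concrete as a normalizer that every transcript passes through. A sketch with illustrative field names (not CallSphere's actual schema):

```ts
// Per-call structured output described above (field names illustrative).
interface CallRecord {
  sentiment: "positive" | "neutral" | "negative";
  intent: string;
  leadScore: number; // clamped to 0–100
  escalate: boolean;
  slots: { name?: string; callbackNumber?: string; reason?: string; urgency?: string };
}

// Normalize raw extractor output into the record every call must produce,
// defaulting missing fields and deriving the escalation flag from the score.
function normalizeCall(raw: Partial<CallRecord>): CallRecord {
  const score = Math.min(100, Math.max(0, raw.leadScore ?? 0));
  return {
    sentiment: raw.sentiment ?? "neutral",
    intent: raw.intent ?? "unknown",
    leadScore: score,
    escalate: raw.escalate ?? score >= 80,
    slots: raw.slots ?? {},
  };
}
```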

## Production FAQ

**What changes when you move a voice agent the way *Build an AI Voice Agent with Nuxt 3 + Vue 3.5 + OpenAI Realtime (2026)* describes?**

Treat the architecture in this post as a starting point and instrument it before you tune it. The metrics that matter most early on are end-to-end latency (target < 1s for voice, < 3s for chat), barge-in correctness, tool-call success rate, and post-conversation lead score distribution. Optimize whatever the data flags as the bottleneck, not whatever feels slowest in your head.

**Where does this break down for voice agent deployments at scale?**

The two failure modes that bite hardest are silent context loss across multi-turn handoffs and tool calls that succeed in dev but get rate-limited in production. Both are solvable with a proper agent backplane that pins state to a session ID, retries with backoff, and writes every tool invocation to an audit log you can replay.

**How does the salon stack (GlamBook) keep bookings clean across stylists and services?**

GlamBook runs 4 agents that handle booking, rescheduling, fuzzy service-name matching, and confirmations. Every appointment gets a deterministic reference like GB-YYYYMMDD-### so the salon, the customer, and the agent all reference the same object across SMS, email, and voice.

## See it live

Book a 30-minute working session at [calendly.com/sagar-callsphere/new-meeting](https://calendly.com/sagar-callsphere/new-meeting) and bring a real call flow — we will walk it through the live salon booking agent (GlamBook) at [salon.callsphere.tech](https://salon.callsphere.tech) and show you exactly where the production wiring sits.

---

Source: https://callsphere.ai/blog/vw8h-build-ai-voice-agent-nuxt-3-vue-realtime-2026
