Build an AI Voice Agent on the T3 Stack (Next.js + tRPC + Prisma, 2026)
create-t3-app gives you Next.js 15, TypeScript, Tailwind, tRPC v11, Prisma, and Auth.js v5 in one CLI. Add OpenAI Realtime and you have a typed voice agent in 90 minutes.
TL;DR —
`pnpm create t3-app@latest` scaffolds a Next.js 15 App Router project with tRPC v11, Prisma, Auth.js v5, and Tailwind. Bolt OpenAI Realtime + WebRTC ephemeral keys onto it and you ship a typed, authenticated voice agent in one afternoon.
What you'll build
A logged-in user opens /voice, the page mints an ephemeral OpenAI key from a tRPC procedure, the browser opens a WebRTC peer connection to OpenAI Realtime, and call transcripts are written to Postgres via Prisma — all type-safe.
Prerequisites
- Node 20+, pnpm 9
- `pnpm create t3-app@latest --noInstall` and pick: tRPC, Prisma, Tailwind, Auth.js
- An `OPENAI_API_KEY` and a Postgres `DATABASE_URL`
Architecture
```mermaid
flowchart TD
  U[User] --> NX[Next.js 15 App Router]
  NX --> TR[tRPC v11 - mintEphemeral]
  TR --> OA[POST /v1/realtime/sessions]
  OA --> NX
  NX -- WebRTC SDP --> RT[OpenAI Realtime]
  NX --> PR[Prisma transcripts]
```
Step 1 — Add the realtime procedure
```ts
// server/api/routers/voice.ts
import { z } from "zod";
import { protectedProcedure, createTRPCRouter } from "@/server/api/trpc";

export const voiceRouter = createTRPCRouter({
  mintEphemeral: protectedProcedure
    .input(z.object({ voice: z.enum(["alloy", "verse"]).default("alloy") }))
    .mutation(async ({ input }) => {
      const r = await fetch("https://api.openai.com/v1/realtime/sessions", {
        method: "POST",
        headers: {
          Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify({ model: "gpt-realtime", voice: input.voice }),
      });
      if (!r.ok) throw new Error(`Realtime session request failed: ${r.status}`);
      return (await r.json()) as { client_secret: { value: string } };
    }),
});
```
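The server hands the browser only the ephemeral key, so it pays to validate the response shape instead of trusting the blind `as` cast above. A minimal sketch; `parseSessionResponse` is a hypothetical helper, and the response shape is assumed from the cast in the snippet:

```typescript
// Hypothetical guard: narrow untyped session JSON to the shape the
// client code relies on, throwing early when the key is missing.
type SessionResponse = { client_secret: { value: string } };

function parseSessionResponse(json: unknown): SessionResponse {
  const cs = (json as { client_secret?: { value?: unknown } })?.client_secret;
  if (typeof cs?.value !== "string" || cs.value.length === 0) {
    throw new Error("Realtime session response missing client_secret.value");
  }
  return { client_secret: { value: cs.value } };
}
```

Call it on `await r.json()` before returning, so a malformed upstream response fails on the server rather than mid-handshake in the browser.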
Hear it before you finish reading
Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.
Step 2 — Browser WebRTC handshake
```tsx
// app/voice/voice-call.tsx
"use client";
import { useRef } from "react";
import { api } from "@/trpc/react";

export function VoiceCall() {
  const audioEl = useRef<HTMLAudioElement>(null);
  const mint = api.voice.mintEphemeral.useMutation();

  async function start() {
    const { client_secret } = await mint.mutateAsync({ voice: "alloy" });
    const pc = new RTCPeerConnection();
    // Play the model's audio track as it arrives.
    pc.ontrack = (e) => (audioEl.current!.srcObject = e.streams[0]);
    const ms = await navigator.mediaDevices.getUserMedia({ audio: true });
    ms.getTracks().forEach((t) => pc.addTrack(t, ms));
    // Data channel for JSON events (transcripts, tool calls).
    const dc = pc.createDataChannel("oai-events");
    const offer = await pc.createOffer();
    await pc.setLocalDescription(offer);
    const ans = await fetch(
      "https://api.openai.com/v1/realtime?model=gpt-realtime",
      {
        method: "POST",
        body: offer.sdp,
        headers: {
          Authorization: `Bearer ${client_secret.value}`,
          "Content-Type": "application/sdp",
        },
      },
    );
    await pc.setRemoteDescription({ type: "answer", sdp: await ans.text() });
  }

  return (
    <>
      <audio ref={audioEl} autoPlay />
      <button onClick={start}>Start call</button>
    </>
  );
}
```
Step 3 — Persist transcripts
```prisma
model Transcript {
  id        String   @id @default(cuid())
  userId    String
  text      String
  role      String
  createdAt DateTime @default(now())
}
```
```ts
dc.addEventListener("message", async (e) => {
  const evt = JSON.parse(e.data);
  if (evt.type === "response.audio_transcript.done") {
    await api.voice.saveTurn.mutate({ text: evt.transcript, role: "assistant" });
  }
});
```
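The listener mixes event parsing with persistence; factoring the parsing into a pure helper keeps it testable. The assistant event name comes from the snippet above; the user-speech event (`conversation.item.input_audio_transcription.completed`, fired when input transcription is enabled) is an assumption about the Realtime API worth verifying against the docs:

```typescript
// Hypothetical helper: map a raw data-channel message to a transcript row,
// or null when the event carries no finished transcript.
type TranscriptRow = { text: string; role: "assistant" | "user" };

function extractTranscript(raw: string): TranscriptRow | null {
  const evt = JSON.parse(raw);
  // Finished assistant speech.
  if (evt.type === "response.audio_transcript.done") {
    return { text: evt.transcript, role: "assistant" };
  }
  // Finished user speech, if input transcription is turned on.
  if (evt.type === "conversation.item.input_audio_transcription.completed") {
    return { text: evt.transcript, role: "user" };
  }
  return null;
}
```

The listener then collapses to `const row = extractTranscript(e.data); if (row) await api.voice.saveTurn.mutate(row);`.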
Step 4 — Auth gate
Wrap the page in auth() — Auth.js v5 has stable App Router support. Unauthenticated users see a redirect; the tRPC procedure already uses protectedProcedure.
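The redirect decision itself is small enough to sketch as a pure function; the session shape and the sign-in path here are assumptions, so adapt them to your Auth.js config:

```typescript
// Hypothetical guard: given the result of auth(), decide whether the
// user stays on the voice page or gets redirected to sign in.
type Session = { user?: { id: string } } | null;

function gateVoicePage(session: Session): "/voice" | "/api/auth/signin" {
  return session?.user ? "/voice" : "/api/auth/signin";
}
```

In the App Router page, call `auth()`, pass the result through the guard, and `redirect()` when it doesn't return `/voice`.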
Step 5 — Deploy
vercel --prod works out of the box; set OPENAI_API_KEY and DATABASE_URL as environment variables. Use Vercel Postgres or Neon.
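Since a missing variable otherwise only surfaces on the first minted session, a fail-fast check at startup is cheap insurance. A sketch under the assumption that these two variables are the only required ones; `missingEnv` is a hypothetical helper:

```typescript
// Hypothetical startup check: list required variables that are unset,
// so deploys fail loudly instead of erroring on the first live call.
const REQUIRED_ENV = ["OPENAI_API_KEY", "DATABASE_URL"] as const;

function missingEnv(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV.filter((k) => !env[k]);
}
```

Run it against `process.env` in an instrumentation hook or at the top of the root layout and throw if the returned list is non-empty.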
Still reading? Stop comparing — try CallSphere live.
CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.
Pitfalls
- WebRTC + corporate firewalls: some networks block UDP 3478. Fall back to WebSocket if `pc.iceConnectionState` stays `checking` for more than 5 s.
- Ephemeral key TTL: the default is 60 s, so mint the key right before `createOffer`.
- Auth.js v5 cookies: Edge middleware sometimes strips `__Secure-` cookies; set `useSecureCookies: process.env.NODE_ENV === "production"` explicitly.
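The firewall fallback boils down to a timer plus a state check. A sketch of just the decision logic, using the 5-second threshold from the pitfall above; the fallback transport itself (reconnecting over WebSocket) is left to your client:

```typescript
// Hypothetical fallback decision: true when ICE has been stuck in
// "checking" longer than the threshold, suggesting UDP is blocked.
type IceState =
  | "new" | "checking" | "connected" | "completed"
  | "failed" | "disconnected" | "closed";

function shouldFallbackToWebSocket(
  state: IceState,
  stuckMs: number,
  thresholdMs = 5_000,
): boolean {
  return state === "checking" && stuckMs > thresholdMs;
}
```

Wire it to a `setTimeout` started in `oniceconnectionstatechange` when the state enters `checking`, and clear the timer once it reaches `connected`.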
How CallSphere does this in production
CallSphere's platform combines T3-style typing across 37 agents, 90+ tools, 115+ DB tables, and 6 verticals. OneRoof (Next.js 16 + React 19) is the closest analog: Auth.js gates the realtime endpoint, Prisma persists transcripts, and tRPC carries every tool call. $149/$499/$1,499, 14-day no-card trial, 22% affiliate.
FAQ
Is T3 still relevant in 2026? Yes — best free TS full-stack starter, but consider T4 (T3 + Vercel AI SDK + RAG) for AI-heavy apps.
Can I swap Prisma for Drizzle? create-t3-app --drizzle is supported in 2026.
Why WebRTC over WebSocket? Browsers handle echo cancellation + jitter natively over WebRTC.
Where does the API key live? Server only — clients receive only ephemeral keys.
Sources
- Create T3 App - https://create.t3.gg/
- T3 Stack 2026 review - https://starterpick.com/blog/t3-stack-2026
- OpenAI Realtime WebRTC - https://developers.openai.com/api/docs/guides/realtime-webrtc
- T4 stack - https://www.sitepoint.com/t4-stack-nextjs-16-vercel-ai-sdk-local-rag-tutorial/
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available -- no signup required.