---
title: "Build a Metered AI Voice Agent with Clerk Billing + Stripe (2026)"
description: "Clerk Billing wraps Stripe in 700ms of glue: PricingTable component, has() entitlement checks, and per-minute metered billing. Plug it into a voice agent in one afternoon."
canonical: https://callsphere.ai/blog/vw8h-build-ai-voice-agent-clerk-stripe-billing-2026
category: "AI Voice Agents"
tags: ["Clerk", "Stripe", "Billing", "Voice Agent", "Metered"]
author: "CallSphere Team"
published: 2026-04-01T00:00:00.000Z
updated: 2026-05-08T17:25:15.726Z
---

# Build a Metered AI Voice Agent with Clerk Billing + Stripe (2026)

> Clerk Billing wraps Stripe in 700ms of glue: PricingTable component, has() entitlement checks, and per-minute metered billing. Plug it into a voice agent in one afternoon.

> **TL;DR**: Clerk Billing (0.7% + Stripe fees) gives you a drop-in `<PricingTable />`, server-side `has({ feature })` entitlement checks, and Stripe Meters for per-minute voice billing, all without writing a webhook handler.

## What you'll build

Three voice plans (Starter / Pro / Scale), each gating minutes per month. The Realtime endpoint checks entitlement, increments a Stripe Meter on every minute, and shuts off cleanly on cap.

## Prerequisites

1. Next.js 15 + Clerk `@clerk/nextjs@^6`.
2. Stripe account connected to Clerk Billing.
3. `OPENAI_API_KEY`, `STRIPE_SECRET_KEY`.

## Architecture

```mermaid
flowchart TD
  U[User] --> CL[Clerk Auth + Plan]
  U --> RT[/api/realtime/]
  RT --> EN{has voice_minutes?}
  EN -->|yes| OA[OpenAI Realtime]
  OA --> M[Stripe Meter increment]
  EN -->|no| UP[Upgrade page]
```

## Step 1 — Define plans in Clerk Dashboard

In Clerk Dashboard → Billing, create plans `starter_149`, `pro_499`, `scale_1499` and a feature `voice_minutes`. Mark `voice_minutes` as a metered feature linked to a Stripe Meter `voice_minutes_meter`.

## Step 2 — Drop the pricing table

```tsx
// app/pricing/page.tsx
import { PricingTable } from "@clerk/nextjs";
export default function Pricing() {
  return <PricingTable />;
}
```

## Step 3 — Entitlement gate

```ts
// app/api/realtime/route.ts
import { auth } from "@clerk/nextjs/server";
import { NextResponse } from "next/server";

export async function POST() {
  const { userId, has } = await auth();
  if (!userId) return new NextResponse("auth", { status: 401 });
  if (!has({ feature: "voice_minutes" }))
    return NextResponse.json({ upgrade: true }, { status: 402 });

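  // Mint a Realtime session only after the entitlement check has passed.
  const r = await fetch("https://api.openai.com/v1/realtime/sessions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model: "gpt-realtime" }),
  });
  // Surface upstream failures instead of returning an error body as a session.
  if (!r.ok) return new NextResponse("realtime session failed", { status: 502 });
  return NextResponse.json(await r.json());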
}
```

## Step 4 — Meter usage on call end

```ts
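// app/api/meter/route.ts
import Stripe from "stripe";
import { auth, clerkClient } from "@clerk/nextjs/server";

const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);

export async function POST(req: Request) {
  const { userId } = await auth();
  // Reject unauthenticated callers instead of asserting userId is non-null.
  if (!userId) return new Response("auth", { status: 401 });

  const { minutes } = await req.json();
  // Only positive, finite usage values should reach the meter.
  if (!Number.isFinite(minutes) || minutes <= 0)
    return new Response("invalid minutes", { status: 400 });

  const u = await (await clerkClient()).users.getUser(userId);
  const customerId = u.publicMetadata.stripeCustomerId as string;

  // Record usage against the Stripe Meter; the value is sent as a string.
  await stripe.billing.meterEvents.create({
    event_name: "voice_minutes_meter",
    payload: { stripe_customer_id: customerId, value: String(minutes) },
  });
  return Response.json({ ok: true });
}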
```

## Step 5 — Hook on call disconnect

In the browser, when the WebRTC peer goes `disconnected`, POST elapsed minutes to `/api/meter`.
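
A minimal sketch of that hook, under a few assumptions the post does not state: usage is billed in whole minutes rounded up, the call start time is captured when the peer connects, and usage is reported once per call. `billableMinutes`, `watchDisconnect`, and the `PeerLike` type are hypothetical names for illustration; `/api/meter` is the route from Step 4.

```typescript
// Hypothetical helper: convert elapsed call time into whole billable minutes.
// Rounding up is an assumption; use whatever rule matches your pricing.
export function billableMinutes(startMs: number, endMs: number): number {
  const elapsedMs = Math.max(0, endMs - startMs);
  return Math.ceil(elapsedMs / 60000);
}

// Minimal structural type so the sketch does not depend on DOM typings.
// An RTCPeerConnection satisfies it in the browser.
export interface PeerLike {
  connectionState: string;
  addEventListener(type: "connectionstatechange", cb: () => void): void;
}

export type PostFn = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<unknown>;

// Hypothetical wiring: POST elapsed minutes once when the peer stops.
// Note: "disconnected" can be transient in WebRTC; this sketch treats it as final.
export function watchDisconnect(pc: PeerLike, startMs: number, post: PostFn): void {
  let reported = false;
  pc.addEventListener("connectionstatechange", () => {
    const s = pc.connectionState;
    const ended = s === "disconnected" || s === "failed" || s === "closed";
    if (!ended || reported) return;
    reported = true; // guard against duplicate reports for the same call
    const minutes = billableMinutes(startMs, Date.now());
    if (minutes === 0) return;
    void post("/api/meter", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ minutes }),
    });
  });
}
```

In the browser, this would be called as `watchDisconnect(pc, Date.now(), (url, init) => fetch(url, init))` right after the connection is established. Passing the POST function in keeps the helper testable without a network.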

## Step 6 — Cap enforcement

Use `has({ feature: "voice_minutes", quantity: { gte: capLeft } })` in middleware to block once the soft cap is hit.
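
The arithmetic behind a cap check can be sketched as a pure function. This is an illustration under assumptions, not Clerk's API: `checkCap` and `CapDecision` are hypothetical names, and both the plan's monthly cap and the minutes already used are supplied by the caller (for example from your own plan config and from Stripe usage).

```typescript
// Hypothetical result of a cap check for one user in the current billing period.
export interface CapDecision {
  allowed: boolean;         // false once the cap is reached
  remainingMinutes: number; // never negative
  maxCallSeconds: number;   // upper bound for the next call so it ends at the cap
}

// Pure cap arithmetic: where capMinutes and usedMinutes come from is up to you.
export function checkCap(capMinutes: number, usedMinutes: number): CapDecision {
  const remainingMinutes = Math.max(0, capMinutes - usedMinutes);
  return {
    allowed: remainingMinutes > 0,
    remainingMinutes,
    maxCallSeconds: remainingMinutes * 60,
  };
}
```

Returning `maxCallSeconds` alongside the yes/no answer lets the Realtime route set a timer that ends a call at the cap rather than only refusing the next one.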

## Pitfalls

- **Test mode**: Clerk Billing dev sandbox uses Stripe test keys — switch both before launch.
- **Meter granularity**: Stripe Meters dedupe by `(customer, event_name, idempotency_key, time)`; always pass a unique `identifier` per call.
- **Entitlement caching**: `has()` is cached for 30s — for hard caps, query Stripe usage directly.
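
For the meter granularity pitfall, one way to get a stable `identifier` is to derive it from a per-call ID, so a retried POST for the same call carries the same identifier. The sketch below only builds the params object for `stripe.billing.meterEvents.create`; `buildMeterEvent` and `callId` are hypothetical, and whole-minute values are an assumption carried over from the billing rule you choose.

```typescript
// Params shape passed to stripe.billing.meterEvents.create in Step 4,
// plus the optional top-level `identifier` field.
export interface MeterEventParams {
  event_name: string;
  identifier: string;
  payload: { stripe_customer_id: string; value: string };
}

// Hypothetical builder: one meter event per call, keyed by the call's own ID.
export function buildMeterEvent(
  customerId: string,
  callId: string, // assumed to be unique per call, e.g. generated at session start
  minutes: number,
): MeterEventParams {
  if (!Number.isInteger(minutes) || minutes <= 0) {
    throw new RangeError("minutes must be a positive whole number");
  }
  return {
    event_name: "voice_minutes_meter",
    identifier: `voice:${callId}`, // same call => same identifier on retries
    payload: { stripe_customer_id: customerId, value: String(minutes) },
  };
}
```

The meter route would then call `stripe.billing.meterEvents.create(buildMeterEvent(customerId, callId, minutes))`, with `callId` sent by the client alongside `minutes`.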

## How CallSphere does this in production

CallSphere prices at **$149 Starter / $499 Pro / $1,499 Scale** with metered overage, **14-day no-card trial**, and **22% recurring Year-1 affiliate**. The platform spans **37 agents**, **90+ tools**, **115+ DB tables**, **6 verticals** — Healthcare, OneRoof (Next.js 16 + React 19), Salon (NestJS 10 + Prisma), Sales (Node.js 20 + React 18 + Vite). Clerk + Stripe handles auth + billing across all six.

## FAQ

**Clerk Billing fee?** 0.7% per transaction on top of Stripe fees.

**Can I run B2B teams?** Yes — Clerk has Organizations with seat-based plans and per-org features.

**Can I avoid Clerk and use Stripe Customer Portal directly?** Yes, but you give up the React components and `has()` helpers.

**EU compliance?** Clerk offers SCA-ready Stripe checkout out of the box.

## Sources

- Clerk Billing for B2C - [https://clerk.com/docs/nextjs/guides/billing/for-b2c](https://clerk.com/docs/nextjs/guides/billing/for-b2c)
- Clerk Billing for B2B - [https://clerk.com/docs/nextjs/guides/billing/for-b2b](https://clerk.com/docs/nextjs/guides/billing/for-b2b)
- Stripe Sessions: Clerk + Stripe - [https://stripe.com/sessions/2025/instant-zero-integration-saas-billing-with-clerk-stripe](https://stripe.com/sessions/2025/instant-zero-integration-saas-billing-with-clerk-stripe)
- Stripe Billing Meters - [https://docs.stripe.com/billing/subscriptions/usage-based/recording-usage](https://docs.stripe.com/billing/subscriptions/usage-based/recording-usage)

## How this plays out in production

One layer below what *Build a Metered AI Voice Agent with Clerk Billing + Stripe (2026)* covers, the practical question every team hits is multi-turn handoffs between specialist agents without losing slot state, sentiment, or escalation context. Treat this as a voice-first system from the first prompt: the agent's persona, its tool surface, and its escalation rules all flow from that single decision. Teams that ship fast tend to instrument the loop end-to-end before they tune any single component, because the bottleneck is rarely where intuition puts it.

## Voice agent architecture, end to end

A production-grade voice stack at CallSphere stitches Twilio Programmable Voice (PSTN ingress, TwiML, bidirectional Media Streams) to a realtime reasoning layer — typically OpenAI Realtime or ElevenLabs Conversational AI — with sub-second response as a hard SLO. Anything north of one second of perceived silence and callers either repeat themselves or hang up; that single number drives the whole architecture. Server-side VAD with proper barge-in support is non-negotiable, otherwise the agent talks over the caller and the conversation collapses. Streaming TTS with phoneme-aligned interruption keeps the cadence natural even when the user changes their mind mid-sentence. Post-call, every transcript is run through a structured pipeline: sentiment, intent classification, lead score, escalation flag, and a normalized slot extraction (name, callback number, reason, urgency). For healthcare workloads, the BAA-covered storage path, audit logs, encryption-at-rest, and PHI-safe transcript redaction are wired in from day one, not bolted on at compliance review. The end state is a system where every call produces a row of structured data, not just a recording.

## FAQ

**What is the fastest path to a voice agent the way *Build a Metered AI Voice Agent with Clerk Billing + Stripe (2026)* describes?**

Treat the architecture in this post as a starting point and instrument it before you tune it. The metrics that matter most early on are end-to-end latency (target < 1s for voice, < 3s for chat), barge-in correctness, tool-call success rate, and post-conversation lead score distribution. Optimize whatever the data flags as the bottleneck, not whatever feels slowest in your head.

**What are the gotchas around voice agent deployments at scale?**

The two failure modes that bite hardest are silent context loss across multi-turn handoffs and tool calls that succeed in dev but get rate-limited in production. Both are solvable with a proper agent backplane that pins state to a session ID, retries with backoff, and writes every tool invocation to an audit log you can replay.
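
As a rough illustration of that backplane idea, the sketch below retries a tool call with capped exponential backoff and appends every attempt to an audit log keyed by session ID. All names (`backoffDelayMs`, `callToolWithRetry`, `AuditEntry`) and the default timings are hypothetical; a production version would persist the log and add jitter.

```typescript
// Capped exponential backoff: 200ms, 400ms, 800ms, ... up to capMs.
export function backoffDelayMs(attempt: number, baseMs = 200, capMs = 5000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

// One row per tool invocation attempt, pinned to the session it belongs to.
export interface AuditEntry {
  sessionId: string;
  tool: string;
  attempt: number;
  ok: boolean;
  error?: string;
}

// Retry a tool call, recording every attempt so the session can be replayed.
// `sleep` is injected so callers (and tests) control how waiting happens.
export async function callToolWithRetry<T>(
  sessionId: string,
  tool: string,
  invoke: () => Promise<T>,
  audit: AuditEntry[],
  sleep: (ms: number) => Promise<void>,
  maxAttempts = 4,
): Promise<T> {
  let lastError: unknown = new Error("no attempts made");
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      const result = await invoke();
      audit.push({ sessionId, tool, attempt, ok: true });
      return result;
    } catch (err) {
      lastError = err;
      audit.push({ sessionId, tool, attempt, ok: false, error: String(err) });
      // Wait before the next attempt, but not after the final failure.
      if (attempt < maxAttempts - 1) await sleep(backoffDelayMs(attempt));
    }
  }
  throw lastError;
}
```

A typical `sleep` is `(ms) => new Promise((resolve) => setTimeout(resolve, ms))`. In-memory `audit` is only for illustration; the point is that each attempt, including failures, is written somewhere replayable.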

**What does the CallSphere outbound sales calling product do that a regular dialer does not?**

It uses the ElevenLabs "Sarah" voice, runs up to 5 concurrent outbound calls per operator, and ships with a browser-based dialer that transfers warm calls back to a human in one click. Dispositions, transcripts, and lead scores write back to the CRM automatically.

## See it live

Book a 30-minute working session at [calendly.com/sagar-callsphere/new-meeting](https://calendly.com/sagar-callsphere/new-meeting) and bring a real call flow — we will walk it through the live outbound sales dialer at [sales.callsphere.tech](https://sales.callsphere.tech) and show you exactly where the production wiring sits.
