OpenAI Realtime API Cost Per Minute: The Real Math for 2026
We modeled 11 real call profiles against OpenAI's published gpt-realtime audio token rates. The honest answer: between $0.11 and $0.46 per minute uncached, with prompt caching pulling it under $0.10.
The cost problem
```mermaid
flowchart TD
  Client[Client] --> Edge[Cloudflare Worker]
  Edge -->|WS upgrade| DO[Durable Object]
  DO --> AI[(OpenAI Realtime WS)]
  AI --> DO
  DO --> Client
  DO -.hibernation.-> Storage[(Persisted state)]
```

Every founder building on OpenAI Realtime asks the same question on day three: "What does this actually cost me per minute?" The OpenAI pricing page lists rates per million audio tokens, not per minute, and the conversion depends on who is talking and how long they pause. Builders quote each other numbers between $0.06 and $0.60 per minute, and they are all kind of right, depending on the call profile.
The result is that nobody trusts their own unit economics. We solved this for our own fleet and want to share the math so you do not have to.
How OpenAI prices it
The published rates for gpt-realtime (as of May 2026) are:
- Audio input: $32 per million tokens
- Cached audio input: $0.40 per million (a 98.75% discount on cache hits — yes, that high)
- Audio output: $64 per million
- Text input: $4 per million
- Cached text input: $0.40 per million
- Text output: $16 per million
Audio tokens are duration-encoded. User audio is 1 token per 100 ms. Assistant audio is 1 token per 50 ms. So 60 seconds of user speech equals 600 tokens; 60 seconds of assistant TTS equals 1,200 tokens.
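The encoding above reduces to two constants. A minimal sketch (the helper name is mine, the rates are the published ones):

```python
# 1 user-audio token per 100 ms; 1 assistant-audio token per 50 ms.
USER_TOKENS_PER_SEC = 10
AGENT_TOKENS_PER_SEC = 20

def audio_tokens(user_seconds: float, agent_seconds: float) -> tuple[int, int]:
    """(user input tokens, assistant output tokens) for the given durations."""
    return (int(user_seconds * USER_TOKENS_PER_SEC),
            int(agent_seconds * AGENT_TOKENS_PER_SEC))

print(audio_tokens(60, 60))  # (600, 1200), matching the figures above
```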
Hear it before you finish reading
Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.
Honest math (real call profiles)
For a real customer-service call (60% caller talk, 40% agent talk, 5-minute average), the math is:
- Caller audio in: 5 min × 60% = 180 seconds = 1,800 tokens × $32 / 1M = $0.0576
- Agent audio out: 5 min × 40% = 120 seconds = 2,400 tokens × $64 / 1M = $0.1536
- System prompt + tools (uncached, 12k tokens text in, repeats every turn × 8 turns): 96k × $4 / 1M = $0.384
- Reasoning text out (small, ~2k): $0.032
- Total uncached: $0.627 per call = $0.125 per minute
That is way over the "$0.06/min" napkin number because the system prompt is re-charged every turn. Now with prompt caching (90%+ hit rate on the stable portion of the system prompt):
- Cached system prompt: 96k × $0.40 / 1M = $0.0384 (saves $0.346)
- Cached total: $0.282 per call = $0.056 per minute
For a chattier sales call (50/50 talk split, 8 minutes, 14k token prompt, 12 turns):
- Uncached: $0.92 per call = $0.115/min
- Cached: $0.41 per call = $0.051/min
For a complex healthcare intake (heavy tool calls, 12 minutes, 22k token prompt, 18 turns, 6 tool round-trips):
- Uncached: $2.18 per call = $0.182/min
- Cached + structured: $0.96 per call = $0.080/min
The honest range across our 11 profiles: $0.11–$0.46/min uncached, $0.05–$0.10/min with prompt caching applied properly.
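All of these profiles come from the same arithmetic. A sketch of the model (function and parameter names are mine; the rates are OpenAI's published ones):

```python
AUDIO_IN, AUDIO_OUT = 32.0, 64.0          # $ per 1M audio tokens
TEXT_IN, TEXT_IN_CACHED = 4.0, 0.40       # $ per 1M text tokens
TEXT_OUT = 16.0

def call_cost(minutes, caller_frac, agent_frac, prompt_tokens, turns,
              text_out_tokens, cached=False):
    """Estimated dollars per call for one profile."""
    user_tok = minutes * 60 * caller_frac * 10     # 1 token / 100 ms
    agent_tok = minutes * 60 * agent_frac * 20     # 1 token / 50 ms
    prompt_tok = prompt_tokens * turns             # prompt re-sent every turn
    prompt_rate = TEXT_IN_CACHED if cached else TEXT_IN
    return (user_tok * AUDIO_IN + agent_tok * AUDIO_OUT
            + prompt_tok * prompt_rate + text_out_tokens * TEXT_OUT) / 1e6

# Customer-service profile: 5 min, 60/40 split, 12k prompt, 8 turns, 2k text out
uncached = call_cost(5, 0.6, 0.4, 12_000, 8, 2_000)             # ≈ $0.627
cached = call_cost(5, 0.6, 0.4, 12_000, 8, 2_000, cached=True)  # ≈ $0.282
print(f"${uncached:.3f}/call = ${uncached / 5:.3f}/min; cached ${cached:.3f}/call")
```

Swap in your own talk split, prompt size, and turn count to get your real per-minute number.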
How CallSphere optimizes
CallSphere runs OpenAI Realtime on the Healthcare Voice Agent (FastAPI on :8084, 14 tools, PCM16 at 24kHz). We hit roughly $0.087/min average across 6 verticals on the production cluster, after cache + prompt diet.
Still reading? Stop comparing — try CallSphere live.
CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.
Three things moved the number:
- Aggressive prompt caching. Our 18,000-token healthcare system prompt is split into a stable static head (16,400 tokens, cached) and a per-call dynamic tail (1,600 tokens, uncached). 91% cache hit rate.
- Tool result trimming. We strip tool-return JSON to the fields the model actually consumes. A 4kB FHIR observation becomes a 380-byte summary line. That cut our reasoning token bill by 41%.
- Model end-of-turn instead of fixed-silence VAD. Server VAD with a 500ms silence threshold costs 60–120 extra audio-out tokens per turn from the model "thinking out loud." Switching to model end-of-turn detection cut that to zero.
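The first two levers can be sketched in a few lines. This is a hedged sketch: `STATIC_HEAD` stands in for the real ~16.4k-token prompt, and the FHIR field names are illustrative, not a fixed schema:

```python
# Hypothetical stand-in for the real ~16.4k-token static system prompt.
STATIC_HEAD = "You are a healthcare intake agent. [stable rules, tool specs, policies...]"

def build_instructions(patient_name: str, visit_context: str) -> str:
    """Stable head first, per-call tail last. The prompt cache matches on
    exact prefixes, so the head must be byte-identical on every call and
    anything dynamic must come strictly after it."""
    dynamic_tail = f"\n\n## This call\nPatient: {patient_name}\nContext: {visit_context}"
    return STATIC_HEAD + dynamic_tail

def trim_observation(obs: dict) -> str:
    """Collapse a verbose FHIR-style Observation into the one line the
    model actually consumes (e.g. a 4 kB payload -> 'Heart rate: 72 bpm')."""
    code = obs.get("code", {}).get("text", "unknown")
    qty = obs.get("valueQuantity", {})
    return f"{code}: {qty.get('value')} {qty.get('unit', '')}".strip()
```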
Across the 6 verticals on the production cluster — 37 agents, 90+ tools, 115+ DB tables — the same caching policy applies. Healthcare uses GPT-4o-mini for post-call analytics with 90% cache hit, ElevenLabs Sarah voice runs on the Sales product, and Realtime PCM16 24kHz powers Healthcare. The pricing tiers ($149 / $499 / $1499) are sized so SMB margins survive a $0.10/min ceiling on inference. There is a 14-day no-card trial that lets you measure the same on your own traffic.
Optimization checklist
- Split your system prompt into a stable head and a dynamic tail.
- Send the stable head first every turn so cache hits trigger.
- Use `prompt_cache_key` for explicit cache scoping where supported.
- Strip tool-result JSON to fields the model actually reads.
- Use `max_output_tokens` to cap runaway responses.
- Switch from server VAD to model-end-of-turn detection.
- Disable text logging unless you need it (text-out adds up).
- Move post-call analytics to GPT-4o-mini with batch where possible.
- Compare your real per-minute against the $0.10/min ceiling — that is the SMB-friendly target.
- Re-measure weekly; OpenAI cuts these prices on a quarterly cadence.
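Several checklist items land in the Realtime session config. A sketch of what that event might look like, assuming current field names (`semantic_vad` is the model end-of-turn option; verify both fields against the live API reference before shipping):

```python
import json

# Hedged sketch of a Realtime session.update applying two checklist items.
session_update = {
    "type": "session.update",
    "session": {
        # Model-driven end-of-turn instead of a fixed silence threshold.
        "turn_detection": {"type": "semantic_vad"},
        # Cap runaway responses so one rambling answer can't blow the budget.
        "max_response_output_tokens": 4096,
    },
}
payload = json.dumps(session_update)  # send over the Realtime WebSocket
```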
FAQ
What is the actual per-minute cost of gpt-realtime in 2026? Between $0.11 and $0.46/min uncached for typical agents; $0.05 to $0.10/min once you turn on prompt caching and trim tool outputs.
Why is the napkin "$0.06/min" number wrong? It assumes your system prompt is tiny and ignores tool calls. Real production prompts are 8–22k tokens, and that re-charges every turn unless cached.
Does prompt caching really save 90%+? Yes. Cached text input drops from $4 to $0.40 per million (a 90% discount), and cached audio input drops from $32 to $0.40 per million (98.75%). Hit rate determines effective savings; 80%+ is realistic.
What about gpt-realtime-mini? Roughly 60% cheaper across all rates. We use it for the lower-tier products in our pricing where we can trade some reasoning depth for unit economics.
How do I measure my own?
Look at the usage object on each completed response (the response.done event) in the Realtime API. It breaks out input, output, and cached audio and text token counts. Sum across the session and divide by call minutes.
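A sketch of the sum-and-divide step, assuming you have collected one usage dict per completed response. The flat key names here are my own simplification; map them from the actual token-detail fields on your events:

```python
RATES = {  # $ per 1M tokens, matching the published gpt-realtime rates
    "audio_in": 32.0, "audio_in_cached": 0.40, "audio_out": 64.0,
    "text_in": 4.0, "text_in_cached": 0.40, "text_out": 16.0,
}

def session_cost(usages: list[dict], minutes: float) -> tuple[float, float]:
    """Sum simplified usage dicts across a session -> (total $, $ per minute)."""
    total = sum(u.get(k, 0) * rate
                for u in usages
                for k, rate in RATES.items()) / 1e6
    return total, total / minutes
```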
Sources
- OpenAI API Pricing — https://openai.com/api/pricing/
- OpenAI Developers Pricing — https://developers.openai.com/api/docs/pricing
- OpenAI Prompt Caching announcement — https://openai.com/index/api-prompt-caching/
- eesel.ai GPT Realtime Mini pricing analysis — https://www.eesel.ai/blog/gpt-realtime-mini-pricing
- forasoft Realtime API production guide — https://www.forasoft.com/blog/article/openai-realtime-api-voice-agent-production-guide-2026
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available, no signup required.