By Sagar Shankaran, Founder of CallSphere
gpt-realtime pricing: we modeled 11 real call profiles against OpenAI's published gpt-realtime audio token rates. The honest answer: between $0.18 and $0.46 per minute, with caching pulling it under $0.25.
flowchart TD
Client[Client] --> Edge[Cloudflare Worker]
Edge -->|WS upgrade| DO[Durable Object]
DO --> AI[(OpenAI Realtime WS)]
AI --> DO
DO --> Client
DO -.hibernation.-> Storage[(Persisted state)]

Every founder building on OpenAI Realtime asks the same question on day three: "What does this actually cost me per minute?" The OpenAI pricing page lists rates per million audio tokens, not per minute, and the conversion depends on who is talking and how long they pause. Builders quote each other numbers between $0.06 and $0.60 per minute and they are all kind of right, depending on the call profile.
The result is that nobody trusts their own unit economics. We solved this for our own fleet and want to share the math so you do not have to rederive it.
The published rates for gpt-realtime (as of May 2026) are:
Audio tokens are duration-encoded. User audio is 1 token per 100 ms. Assistant audio is 1 token per 50 ms. So 60 seconds of user speech equals 600 tokens; 60 seconds of assistant TTS equals 1,200 tokens.
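The conversion above can be sketched in a few lines. This is a minimal illustration using only the two ratios stated in the text; the function names are hypothetical, and treating talk share as a fraction of total call time (with silence ignored) is a simplifying assumption.

```python
# Ratios taken from the text above.
USER_MS_PER_TOKEN = 100       # user (input) audio: 1 token per 100 ms
ASSISTANT_MS_PER_TOKEN = 50   # assistant (output) audio: 1 token per 50 ms


def audio_tokens(seconds: float, ms_per_token: int) -> int:
    """Convert a speech duration in seconds to an audio token count."""
    return round(seconds * 1000 / ms_per_token)


def call_audio_tokens(call_seconds: float, user_share: float, assistant_share: float):
    """Split a call into user and assistant audio tokens.

    Assumption: the shares are fractions of total call time and
    silence is not counted.
    """
    user = audio_tokens(call_seconds * user_share, USER_MS_PER_TOKEN)
    assistant = audio_tokens(call_seconds * assistant_share, ASSISTANT_MS_PER_TOKEN)
    return user, assistant


print(audio_tokens(60, USER_MS_PER_TOKEN))       # 600
print(audio_tokens(60, ASSISTANT_MS_PER_TOKEN))  # 1200
print(call_audio_tokens(300, 0.6, 0.4))          # (1800, 2400)
```

Note that the assistant side produces more tokens than the user side on a 5-minute 60/40 call, even though the caller talks longer, because assistant audio is encoded at twice the token density.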
For a real customer-service call (60% caller talk, 40% agent talk, 5 minute average), the math is:
That is way over the "$0.06/min" napkin number because the system prompt re-charges every turn. Now with prompt caching (90%+ on stable system prompt portion):
For a chattier sales call (50/50 talk split, 8 minutes, 14k token prompt, 12 turns):
For a complex healthcare intake (heavy tool calls, 12 minutes, 22k token prompt, 18 turns, 6 tool round-trips):
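To see why the system prompt dominates these profiles, it helps to compare token volumes before applying any rates. The sketch below is a simplified model, not the exact billing logic: it assumes the full prompt is billed as input once per turn, ignores accumulated conversation history and tool payloads, and uses hypothetical function names. The healthcare profile's talk split is not stated above, so only its prompt volume is computed.

```python
def prompt_tokens_billed(prompt_tokens: int, turns: int) -> int:
    """Prompt tokens billed when the whole prompt is re-sent every turn (uncached)."""
    return prompt_tokens * turns


def split_audio_tokens(minutes: float, user_share: float):
    """Audio tokens for a call: user at 1 token/100 ms, assistant at 1 token/50 ms.

    Assumption: the two shares sum to 1 and silence is ignored.
    """
    seconds = minutes * 60
    user = round(seconds * user_share * 1000 / 100)
    assistant = round(seconds * (1 - user_share) * 1000 / 50)
    return user, assistant


# Sales call from the text: 50/50 split, 8 minutes, 14k token prompt, 12 turns.
sales_prompt = prompt_tokens_billed(14_000, 12)
sales_user, sales_assistant = split_audio_tokens(8, 0.5)
print(sales_prompt, sales_user, sales_assistant)  # 168000 2400 4800

# Healthcare intake from the text: 22k token prompt, 18 turns.
intake_prompt = prompt_tokens_billed(22_000, 18)
print(intake_prompt)  # 396000
```

Under these assumptions the sales call bills 168,000 prompt tokens against 7,200 audio tokens, which is why caching the stable prompt portion moves the per-minute number so much.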
The honest range across our 11 profiles: $0.18–$0.46/min uncached, $0.05–$0.10/min with prompt caching applied properly.
CallSphere runs OpenAI Realtime on the Healthcare Voice Agent (FastAPI on :8084, 14 tools, PCM16 at 24kHz). We hit roughly $0.087/min average across 6 verticals on the production cluster, after cache + prompt diet.
Three things moved the number:
Across the 6 verticals on the production cluster (37 agents, 90+ tools, 115+ DB tables) the same caching policy applies. Healthcare uses GPT-4o-mini for post-call analytics with a 90% cache hit rate, ElevenLabs Sarah voice runs on the Sales product, and Realtime PCM16 24kHz powers Healthcare. The pricing tiers ($149 / $499 / $1499) are sized so SMB margins survive a $0.10/min ceiling on inference. There is a 14-day no-card trial that lets you measure the same per-minute cost on your own traffic.
prompt_cache_key for explicit cache scoping where supported.
max_output_tokens to cap runaway responses.
What is the actual per-minute cost of gpt-realtime in 2026? Between $0.18 and $0.46/min uncached for typical agents; $0.05 to $0.10/min once you turn on prompt caching and trim tool outputs.
Why is the napkin "$0.06/min" number wrong? It assumes your system prompt is tiny and ignores tool calls. Real production prompts are 8–22k tokens, and that re-charges every turn unless cached.
Does prompt caching really save 90%+? Yes — the published rate is $32 → $0.40 per million audio input tokens, a 98.75% discount on the cached portion. Hit rate determines effective savings; 80%+ is realistic.
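The link between hit rate and effective savings is a simple blend of the two rates quoted above. The sketch below uses only those two figures ($32 and $0.40 per million audio input tokens); the function names are hypothetical, and it assumes hit rate is measured as the fraction of input tokens served from cache.

```python
UNCACHED_RATE = 32.00  # USD per 1M audio input tokens (from the text)
CACHED_RATE = 0.40     # USD per 1M cached audio input tokens (from the text)


def effective_rate(hit_rate: float) -> float:
    """Blended USD per 1M input tokens at a given cache hit rate (0.0 to 1.0)."""
    return hit_rate * CACHED_RATE + (1 - hit_rate) * UNCACHED_RATE


def effective_savings(hit_rate: float) -> float:
    """Fraction saved on input tokens relative to paying the uncached rate."""
    return 1 - effective_rate(hit_rate) / UNCACHED_RATE


for hit in (0.5, 0.8, 1.0):
    print(f"hit={hit:.0%} rate=${effective_rate(hit):.2f}/1M savings={effective_savings(hit):.2%}")
# hit=50% rate=$16.20/1M savings=49.38%
# hit=80% rate=$6.72/1M savings=79.00%
# hit=100% rate=$0.40/1M savings=98.75%
```

In other words, the 98.75% figure is the ceiling; at an 80% hit rate the input-side saving is 79%, and output tokens are not discounted at all.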
What about gpt-realtime-mini? Roughly 60% cheaper across all rates. We use it for the lower-tier products in our pricing where we can trade some reasoning depth for unit economics.
How do I measure my own? Look at the usage field on every Realtime session-end event. It returns input/output/cached audio + text token counts. Sum and divide.
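A minimal sketch of "sum and divide" follows. The shape of each usage dictionary (`input_token_details`, `cached_tokens_details`, `output_token_details`) is an assumption for illustration, as is treating cached tokens as a subset of the input counts; check the payload your sessions actually emit before relying on it. Rates are passed in by the caller rather than hard-coded, and all function and bucket names are hypothetical.

```python
from typing import Iterable

BUCKETS = (
    "input_audio", "input_text",
    "cached_audio", "cached_text",
    "output_audio", "output_text",
)


def sum_usage(usages: Iterable[dict]) -> dict:
    """Add up token counts across usage payloads (assumed shape, see lead-in)."""
    totals = dict.fromkeys(BUCKETS, 0)
    for usage in usages:
        inp = usage.get("input_token_details", {})
        cached = inp.get("cached_tokens_details", {})
        out = usage.get("output_token_details", {})
        cached_audio = cached.get("audio_tokens", 0)
        cached_text = cached.get("text_tokens", 0)
        # Assumption: cached tokens are included in the input counts,
        # so subtract them to get the uncached remainder.
        totals["input_audio"] += inp.get("audio_tokens", 0) - cached_audio
        totals["input_text"] += inp.get("text_tokens", 0) - cached_text
        totals["cached_audio"] += cached_audio
        totals["cached_text"] += cached_text
        totals["output_audio"] += out.get("audio_tokens", 0)
        totals["output_text"] += out.get("text_tokens", 0)
    return totals


def cost_per_minute(totals: dict, rates_per_million: dict, call_minutes: float) -> float:
    """USD per minute: each bucket's tokens times its rate per 1M tokens, over call length."""
    dollars = sum(totals[b] * rates_per_million[b] for b in BUCKETS) / 1_000_000
    return dollars / call_minutes
```

To use it, collect the usage payloads for a call, pass them to `sum_usage`, and call `cost_per_minute` with a dictionary of your current per-million rates keyed by the six bucket names and the call length in minutes. Keeping cached and uncached buckets separate also lets you read your real cache hit rate straight from the totals.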
This guide is written for engineers and operators evaluating gpt-realtime pricing in real production systems.
For teams that want to ship gpt-realtime voice and chat agents this quarter, CallSphere runs 37 agents and 90+ function tools across 6 verticals on a single dashboard. Start a 14-day trial, see live demo agents, or compare tiers on /pricing.
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.