Azure AI Foundry + GPT-Realtime-2: Practical Deployment Guide
Deploy GPT-Realtime-2 on Azure AI Foundry. Region availability, networking, data residency, BAA, and the gotchas teams hit in the first 48 hours.
The Setup
Alongside OpenAI's direct launch of GPT-Realtime-2 on May 7, 2026, Microsoft made the same model family available through Azure AI Foundry. For enterprises that already buy AI through Azure — for procurement, compliance, BAA, data residency, or BYOC reasons — this is the deployment path that matters.
This is a practical guide to what is different on Azure versus OpenAI direct, and the gotchas that have surfaced in the first 48 hours.
Why Teams Pick Azure For Realtime Voice
Five durable reasons that have nothing to do with the model itself:
- Existing Azure commit. Enterprise customers who have spent down Azure credits care a lot about which models count.
- Data residency. Foundry exposes specific regions; OpenAI direct is more opaque on routing.
- Private networking. Private endpoints, VNet integration, and customer-managed keys are all in Foundry's surface.
- Compliance posture. HIPAA BAA, FedRAMP, EU sovereignty options are clearer on Azure than on OpenAI direct.
- Single procurement. One PO covers OpenAI models plus the rest of the Azure stack.
What Is Different On Foundry
Six things to know if you have been on OpenAI direct and are moving to Foundry:
- Endpoint and auth. Foundry uses Azure-native auth (managed identity, key, AAD). The endpoint URL pattern is different. SDKs work but configuration is not portable line-for-line.
- Quota model. Foundry quotas are per-region, per-deployment, per-subscription — separate from OpenAI rate limits. Plan for capacity early.
- Versioning. Azure pins model versions and stages new versions in preview before GA. You control upgrades on your timeline.
- Pricing parity, mostly. Foundry pricing tracks OpenAI's listed rates closely with some enterprise-tier variance. Verify the exact rate card for your commit.
- Audit and logging. Foundry routes logs through Azure Monitor and Application Insights, not OpenAI's dashboard. Different observability story end to end.
- Region availability rollout. GPT-Realtime-2 is rolling out across Foundry regions on a staged schedule. East US and West Europe first; some regions take weeks.
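The endpoint and auth differences above can be sketched concretely. The URL patterns below follow current Azure OpenAI realtime conventions; the `api-version`, resource, and deployment names are placeholders, so verify them against your own Foundry deployment before wiring anything up.

```python
# Sketch of the endpoint/auth split between Foundry and OpenAI direct.
# URL shapes and header names are assumptions based on current Azure
# OpenAI realtime conventions; confirm against your deployment.

def foundry_ws_url(resource: str, deployment: str,
                   api_version: str = "2026-05-01-preview") -> str:
    # Azure: resource-scoped host, deployment and API version in the query.
    return (f"wss://{resource}.openai.azure.com/openai/realtime"
            f"?api-version={api_version}&deployment={deployment}")

def openai_ws_url(model: str) -> str:
    # OpenAI direct: one global host, model selected in the query.
    return f"wss://api.openai.com/v1/realtime?model={model}"

def foundry_headers(api_key: str) -> dict:
    # Azure accepts an `api-key` header (or an AAD bearer token via
    # managed identity); OpenAI direct uses `Authorization: Bearer`.
    return {"api-key": api_key}

def openai_headers(api_key: str) -> dict:
    return {"Authorization": f"Bearer {api_key}"}
```

The point is that nothing here is portable line-for-line: host, query parameters, and auth header all change, which is why a config swap is not enough when migrating.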
The Real Numbers
Foundry's headline pricing for GPT-Realtime-2 mirrors OpenAI's:
- Audio input: $32 per 1M tokens
- Audio output: $64 per 1M tokens
- Cached input: $0.40 per 1M tokens
- Context window: 128K
- Max output: 32K
Translate ($0.034/min) and Whisper streaming ($0.017/min) are also on Foundry's rate card. Enterprise commit customers may have negotiated rates that differ.
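Those rates translate into per-call costs straightforwardly. The token counts in the sketch below are illustrative assumptions, not measurements; replace them with data from your own canary traffic before budgeting.

```python
# Back-of-envelope cost model for the rate card above. Token counts per
# call are illustrative assumptions; measure your own before budgeting.

AUDIO_IN_PER_MTOK = 32.00    # $ per 1M audio input tokens
AUDIO_OUT_PER_MTOK = 64.00   # $ per 1M audio output tokens
CACHED_IN_PER_MTOK = 0.40    # $ per 1M cached input tokens

def call_cost(in_tokens: int, out_tokens: int, cached_tokens: int = 0) -> float:
    """Dollar cost of one call given its token usage."""
    return (in_tokens * AUDIO_IN_PER_MTOK
            + out_tokens * AUDIO_OUT_PER_MTOK
            + cached_tokens * CACHED_IN_PER_MTOK) / 1_000_000

# Hypothetical 5-minute call: ~3,000 audio tokens in, ~3,000 out.
example = call_cost(3_000, 3_000)
```

At those assumed token counts, a 5-minute call lands around $0.29, which is why the cached-input rate matters so much for long, repeated system prompts.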
Networking And Data Path
The default networking story on Foundry deployments:
- Public endpoint. TLS-terminated, available globally, simplest to wire.
- Private endpoint. Foundry exposes private endpoints via Azure Private Link — required for many financial services and healthcare deployments.
- VNet integration. Spoke-and-hub patterns work; expect 1–2 days of network engineering even with templates.
For voice specifically, the websocket path needs careful firewall configuration. The most common deployment delay we have seen on day one is a network team that has not yet allowed the streaming websocket path through corporate egress.
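A minimal egress preflight catches that firewall problem before launch day. The sketch below only verifies that a TCP + TLS path to port 443 is open from inside the corporate network; it does not validate the websocket upgrade itself, and the hostname is a placeholder for your Foundry resource.

```python
# Minimal egress preflight for the realtime websocket path. Verifies that
# a TCP+TLS connection to port 443 succeeds from inside the network, so a
# blocked path fails fast here instead of at launch.
import socket
import ssl

def tls_reachable(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TLS handshake to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            ctx = ssl.create_default_context()
            with ctx.wrap_socket(raw, server_hostname=host):
                return True
    except (OSError, ssl.SSLError):
        return False

# Substitute your own resource name before running:
# tls_reachable("myresource.openai.azure.com")
```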
Compliance Specifics
- HIPAA BAA. Available on Azure for the OpenAI service line. Confirm the specific GPT-Realtime-2 SKU is in your BAA — coverage extends per service, not per tenant.
- Data retention. Foundry honors customer-controlled retention policies; OpenAI's 30-day default does not automatically apply to enterprise Foundry deployments.
- Customer-managed keys. Available, with the usual key-rotation operational overhead.
Production Tradeoffs
Three patterns that have already surfaced this week:
- Quota whiplash. Teams who tested on OpenAI direct and migrated to Foundry got rate-limited on their first production traffic spike because Foundry quotas were lower by default. Request increases in advance.
- Region rollout timing. If your data residency region is not on the first wave, you may be running an interim period in a different region. Plan the deployment timeline accordingly.
- Token accounting. Audio tokenization in Foundry's reporting differs in granularity from OpenAI's. Reconciling token counts to invoiced spend takes a week the first time.
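For the token-accounting mismatch, the first-time exercise is a reconciliation pass like the sketch below. The usage-record field names are hypothetical and need to be mapped to whatever your Azure Monitor or billing export actually emits.

```python
# Reconciliation sketch: roll up per-call usage records and compare the
# total against invoiced tokens with a tolerance. Field names here are
# hypothetical; map them to your Azure Monitor / billing export schema.

def reconcile(usage_records: list, invoiced_tokens: int,
              tolerance: float = 0.02) -> dict:
    """Compare measured token usage to the invoice; flag drift over tolerance."""
    measured = sum(r["audio_input_tokens"] + r["audio_output_tokens"]
                   for r in usage_records)
    drift = abs(measured - invoiced_tokens) / max(invoiced_tokens, 1)
    return {"measured": measured, "invoiced": invoiced_tokens,
            "drift": drift, "within_tolerance": drift <= tolerance}
```

Run this daily during the canary period; once measured and invoiced totals stay within tolerance for a full billing cycle, the accounting question is settled.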
Where CallSphere Fits
CallSphere is a managed AI voice and chat agent platform. We do not require customers to pick a cloud or manage Foundry quotas. The platform is the abstraction — customers consume per-interaction pricing (Starter $149/mo for 2,000 interactions, Growth $499/mo for 10,000, Scale $1,499/mo for 50,000) without owning the deployment surface. For enterprises that have a hard Azure-only mandate, we accommodate that on Scale-tier deployments; for everyone else, the cloud underneath is something we operate.
Talk to us about deployment options: callsphere.ai/demo.
What To Do This Week
- Confirm GPT-Realtime-2 is in your Foundry region. If not, get on the rollout list.
- Open quota increase requests early. Default quotas are not production-grade.
- Validate BAA scope explicitly if you are in healthcare. Do not assume.
- Run a 500-call canary in non-prod and reconcile token accounting line-by-line before scaling up.
FAQ
Q: Is Foundry strictly worse on raw speed than OpenAI direct? A: Within margin of error in our testing. Some regions are faster, some slower. The differences are in the noise compared with the impact of tuning your own stack.
Q: Can I run hybrid — Foundry for prod, OpenAI direct for dev? A: Yes. Most teams do exactly this. Pin model versions explicitly so a Foundry rollout does not surprise prod.
Q: When does the BAA cover the new realtime models? A: Microsoft has confirmed coverage rollout in parallel with the model availability rollout. Confirm in writing before HIPAA traffic flows.