Azure AI Foundry + GPT-Realtime-2: Practical Deployment Guide
Deploy GPT-Realtime-2 on Azure AI Foundry. Region availability, networking, data residency, BAA, and the gotchas teams hit in the first 48 hours.
The Setup
Alongside OpenAI's direct launch of GPT-Realtime-2 on May 7, 2026, Microsoft made the same model family available through Azure AI Foundry. For enterprises that already buy AI through Azure — for procurement, compliance, BAA, data residency, or BYOC reasons — this is the deployment path that matters.
This is a practical guide to what is different on Azure versus OpenAI direct, and the gotchas that have surfaced in the first 48 hours.
Why Teams Pick Azure For Realtime Voice
Five durable reasons that have nothing to do with the model itself:
- Existing Azure commit. Enterprise customers who have spent down Azure credits care a lot about which models count.
- Data residency. Foundry exposes specific regions; OpenAI direct is more opaque on routing.
- Private networking. Private endpoints, VNet integration, and customer-managed keys are all in Foundry's surface.
- Compliance posture. HIPAA BAA, FedRAMP, EU sovereignty options are clearer on Azure than on OpenAI direct.
- Single procurement. One PO covers OpenAI models plus the rest of the Azure stack.
What Is Different On Foundry
Six things to know if you have been on OpenAI direct and are moving to Foundry:
- Endpoint and auth. Foundry uses Azure-native auth (managed identity, key, AAD). The endpoint URL pattern is different. SDKs work but configuration is not portable line-for-line.
- Quota model. Foundry quotas are per-region, per-deployment, per-subscription — separate from OpenAI rate limits. Plan for capacity early.
- Versioning. Azure pins model versions and stages new versions in preview before GA. You control upgrades on your timeline.
- Pricing parity, mostly. Foundry pricing tracks OpenAI's listed rates closely with some enterprise-tier variance. Verify the exact rate card for your commit.
- Audit and logging. Foundry routes logs through Azure Monitor and Application Insights, not OpenAI's dashboard. Different observability story end to end.
- Region availability rollout. GPT-Realtime-2 is rolling out across Foundry regions on a staged schedule. East US and West Europe first; some regions take weeks.
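The endpoint and auth differences above can be sketched concretely. The URL patterns below follow current Azure OpenAI realtime conventions; the `api-version`, resource, and deployment names are placeholders, so verify them against your own Foundry deployment before wiring anything up.

```python
# Sketch of the endpoint/auth split between Foundry and OpenAI direct.
# URL shapes and header names are assumptions based on current Azure
# OpenAI realtime conventions; confirm against your deployment.

def foundry_ws_url(resource: str, deployment: str,
                   api_version: str = "2026-05-01-preview") -> str:
    # Azure: resource-scoped host, deployment and API version in the query.
    return (f"wss://{resource}.openai.azure.com/openai/realtime"
            f"?api-version={api_version}&deployment={deployment}")

def openai_ws_url(model: str) -> str:
    # OpenAI direct: one global host, model selected in the query.
    return f"wss://api.openai.com/v1/realtime?model={model}"

def foundry_headers(api_key: str) -> dict:
    # Azure accepts an `api-key` header (or an AAD bearer token via
    # managed identity); OpenAI direct uses `Authorization: Bearer`.
    return {"api-key": api_key}

def openai_headers(api_key: str) -> dict:
    return {"Authorization": f"Bearer {api_key}"}
```

The point is that nothing here is portable line-for-line: host, query parameters, and auth header all change, which is why a config swap is not enough when migrating.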
The Real Numbers
Foundry's headline pricing for GPT-Realtime-2 mirrors OpenAI's:
- Audio input: $32 per 1M tokens
- Audio output: $64 per 1M tokens
- Cached input: $0.40 per 1M tokens
- Context window: 128K
- Max output: 32K
Translate ($0.034/min) and Whisper streaming ($0.017/min) are also on Foundry's rate card. Enterprise commit customers may have negotiated rates that differ.
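Those rates translate into per-call costs straightforwardly. The token counts in the sketch below are illustrative assumptions, not measurements; replace them with data from your own canary traffic before budgeting.

```python
# Back-of-envelope cost model for the rate card above. Token counts per
# call are illustrative assumptions; measure your own before budgeting.

AUDIO_IN_PER_MTOK = 32.00    # $ per 1M audio input tokens
AUDIO_OUT_PER_MTOK = 64.00   # $ per 1M audio output tokens
CACHED_IN_PER_MTOK = 0.40    # $ per 1M cached input tokens

def call_cost(in_tokens: int, out_tokens: int, cached_tokens: int = 0) -> float:
    """Dollar cost of one call given its token usage."""
    return (in_tokens * AUDIO_IN_PER_MTOK
            + out_tokens * AUDIO_OUT_PER_MTOK
            + cached_tokens * CACHED_IN_PER_MTOK) / 1_000_000

# Hypothetical 5-minute call: ~3,000 audio tokens in, ~3,000 out.
example = call_cost(3_000, 3_000)
```

At those assumed token counts, a 5-minute call lands around $0.29, which is why the cached-input rate matters so much for long, repeated system prompts.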
Networking And Data Path
The default networking story on Foundry deployments:
- Public endpoint. TLS-terminated, available globally, simplest to wire.
- Private endpoint. Foundry exposes private endpoints via Azure Private Link — required for many financial services and healthcare deployments.
- VNet integration. Spoke-and-hub patterns work; expect 1–2 days of network engineering even with templates.
For voice specifically, the websocket path needs careful firewall configuration. The most common deployment delay we have seen on day one is a network team that has not yet allowed the streaming websocket path through corporate egress.
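A minimal egress preflight catches that firewall problem before launch day. The sketch below only verifies that a TCP + TLS path to port 443 is open from inside the corporate network; it does not validate the websocket upgrade itself, and the hostname is a placeholder for your Foundry resource.

```python
# Minimal egress preflight for the realtime websocket path. Verifies that
# a TCP+TLS connection to port 443 succeeds from inside the network, so a
# blocked path fails fast here instead of at launch.
import socket
import ssl

def tls_reachable(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TLS handshake to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            ctx = ssl.create_default_context()
            with ctx.wrap_socket(raw, server_hostname=host):
                return True
    except (OSError, ssl.SSLError):
        return False

# Substitute your own resource name before running:
# tls_reachable("myresource.openai.azure.com")
```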
Compliance Specifics
- HIPAA BAA. Available on Azure for the OpenAI service line. Confirm the specific GPT-Realtime-2 SKU is in your BAA — coverage extends per service, not per tenant.
- Data retention. Foundry honors customer-controlled retention policies; OpenAI's 30-day default does not automatically apply to enterprise Foundry deployments.
- Customer-managed keys. Available, with the usual key-rotation operational overhead.
Production Tradeoffs
Three patterns that have already surfaced this week:
- Quota whiplash. Teams who tested on OpenAI direct and migrated to Foundry got rate-limited on their first production traffic spike because Foundry quotas were lower by default. Request increases in advance.
- Region rollout timing. If your data residency region is not on the first wave, you may be running an interim period in a different region. Plan the deployment timeline accordingly.
- Token accounting. Audio tokenization in Foundry's reporting differs in granularity from OpenAI's. Reconciling token counts to invoiced spend takes a week the first time.
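For the token-accounting mismatch, the first-time exercise is a reconciliation pass like the sketch below. The usage-record field names are hypothetical and need to be mapped to whatever your Azure Monitor or billing export actually emits.

```python
# Reconciliation sketch: roll up per-call usage records and compare the
# total against invoiced tokens with a tolerance. Field names here are
# hypothetical; map them to your Azure Monitor / billing export schema.

def reconcile(usage_records: list, invoiced_tokens: int,
              tolerance: float = 0.02) -> dict:
    """Compare measured token usage to the invoice; flag drift over tolerance."""
    measured = sum(r["audio_input_tokens"] + r["audio_output_tokens"]
                   for r in usage_records)
    drift = abs(measured - invoiced_tokens) / max(invoiced_tokens, 1)
    return {"measured": measured, "invoiced": invoiced_tokens,
            "drift": drift, "within_tolerance": drift <= tolerance}
```

Run this daily during the canary period; once measured and invoiced totals stay within tolerance for a full billing cycle, the accounting question is settled.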
Where CallSphere Fits
CallSphere is a managed AI voice and chat agent platform. We do not require customers to pick a cloud or manage Foundry quotas. The platform is the abstraction — customers consume per-interaction pricing (Starter $149/mo for 2,000 interactions, Growth $499/mo for 10,000, Scale $1,499/mo for 50,000) without owning the deployment surface. For enterprises that have a hard Azure-only mandate, we accommodate that on Scale-tier deployments; for everyone else, the cloud underneath is something we operate.
Talk to us about deployment options: callsphere.ai/demo.
What To Do This Week
- Confirm GPT-Realtime-2 is in your Foundry region. If not, get on the rollout list.
- Open quota increase requests early. Default quotas are not production-grade.
- Validate BAA scope explicitly if you are in healthcare. Do not assume.
- Run a 500-call canary in non-prod and reconcile token accounting line-by-line before scaling up.
FAQ
Q: Is Foundry strictly worse on raw speed than OpenAI direct? A: Within margin of error in our testing. Some regions are faster, some slower. The differences are in the noise compared with the impact of tuning your own stack.
Q: Can I run hybrid — Foundry for prod, OpenAI direct for dev? A: Yes. Most teams do exactly this. Pin model versions explicitly so a Foundry rollout does not surprise prod.
Q: When does the BAA cover the new realtime models? A: Microsoft has confirmed coverage rollout in parallel with the model availability rollout. Confirm in writing before HIPAA traffic flows.