By Sagar Shankaran, Founder of CallSphere
Use Microsoft Foundry's GPT Realtime API plus Voice Live API for a sub-second voice agent. Real C# and Python code, Speech Service config, Azure AD auth, deploy to Container Apps.
Key takeaways
TL;DR — Microsoft Foundry (formerly Azure AI Foundry) ships the GPT Realtime API and a higher-level Voice Live API that bundles STT + LLM + TTS. Both speak the OpenAI Realtime WebSocket protocol. Pair with Azure AD token auth and Container Apps for HIPAA/SOC2-aligned voice agents.
A Python service hosted on Azure Container Apps that exposes a WebSocket bridge for browser callers, talks to the Voice Live API at /voice-agent/realtime, uses a custom system prompt, and falls back to GPT-5 with manual STT/TTS for tenants that need a custom voice. AAD-authenticated, scoped to a specific Azure AI Foundry project.
openai>=1.55, azure-identity, websockets. az CLI logged in; an AAD-managed identity for Container Apps.
flowchart LR
B[Browser Caller] -->|wss| BR[FastAPI Bridge Container Apps]
BR -->|AAD token| KV[(Key Vault)]
BR -->|Voice Live API wss| VL[Foundry Voice Live]
VL --> GPT5[GPT-5 / GPT-Realtime-Mini]
BR -->|fallback| STT[Azure Speech STT]
STT --> GPT5
GPT5 --> TTS[Azure Speech Neural TTS]
TTS --> BR
BR --> B
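```bash
az group create -n vox -l eastus2
# --custom-domain gives each resource the <name>.cognitiveservices.azure.com
# endpoint used below; AAD token auth requires a custom subdomain.
az cognitiveservices account create -g vox -n vox-foundry --kind AIServices --sku S0 -l eastus2 --custom-domain vox-foundry
az cognitiveservices account create -g vox -n vox-speech --kind SpeechServices --sku S0 -l eastus2 --custom-domain vox-speech
```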
Note the endpoint URL — it will be https://vox-foundry.cognitiveservices.azure.com.
```python
from azure.identity import DefaultAzureCredential

cred = DefaultAzureCredential()


def aad_token():
    return cred.get_token("https://cognitiveservices.azure.com/.default").token
```
Assign the Cognitive Services User role to the Container Apps managed identity on both resources (vox-foundry and vox-speech).
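Because aad_token() in the snippet above is called on every new socket, it can help to make the refresh margin explicit. The following is a minimal sketch, not part of the original setup: `TokenCache`, `skew_seconds`, and the injectable `clock` are hypothetical names, and azure-identity credentials often cache tokens internally already, so treat this as an illustration of the refresh logic rather than a required component.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

SCOPE = "https://cognitiveservices.azure.com/.default"


@dataclass
class CachedToken:
    token: str
    expires_on: float  # Unix timestamp, as exposed by azure-identity's AccessToken


class TokenCache:
    """Hypothetical helper: reuse a bearer token until it is close to expiry."""

    def __init__(self, credential, skew_seconds: int = 300,
                 clock: Callable[[], float] = time.time):
        self._credential = credential
        self._skew = skew_seconds  # refresh this many seconds before expiry
        self._clock = clock
        self._cached: Optional[CachedToken] = None

    def get(self) -> str:
        now = self._clock()
        stale = self._cached is None or self._cached.expires_on - self._skew <= now
        if stale:
            fresh = self._credential.get_token(SCOPE)
            self._cached = CachedToken(fresh.token, fresh.expires_on)
        return self._cached.token
```

With this in place, `TokenCache(DefaultAzureCredential()).get()` could stand in for `aad_token()` when building the Authorization header.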
```python
import asyncio
import base64
import json

import websockets

ENDPOINT = "wss://vox-foundry.cognitiveservices.azure.com/voice-agent/realtime?api-version=2025-05-01-preview&model=gpt-realtime"


async def voice_session():
    # aad_token() is defined in the auth snippet above.
    headers = {"Authorization": f"Bearer {aad_token()}"}
    # additional_headers is the keyword in recent websockets releases;
    # older ones use extra_headers.
    async with websockets.connect(ENDPOINT, additional_headers=headers) as ws:
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "instructions": "You are a friendly receptionist. Keep replies short.",
                "voice": "en-US-AvaMultilingualNeural",
                "input_audio_format": "pcm16",
                "output_audio_format": "pcm16",
                "turn_detection": {"type": "server_vad", "threshold": 0.5},
                "tools": [],
            }
        }))
        async for msg in ws:
            ev = json.loads(msg)
            # handle response.audio.delta, response.done, etc.
```
The Voice Live wire format mirrors OpenAI Realtime; only the URL and auth differ.
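As a rough illustration of what handling those events might look like, here is a minimal sketch of a per-message handler. `handle_event` is a hypothetical helper, and the shape of the `error` event (an `error` object with a `message` field) is assumed from the OpenAI Realtime protocol rather than confirmed for Voice Live.

```python
import base64
import json


def handle_event(raw: str, audio_out: bytearray) -> str:
    """Apply one server event to local state and return its type."""
    ev = json.loads(raw)
    kind = ev.get("type", "")
    if kind == "response.audio.delta":
        # Audio arrives as base64-encoded PCM16 chunks.
        audio_out.extend(base64.b64decode(ev["delta"]))
    elif kind == "error":
        # Assumed shape: {"type": "error", "error": {"message": "..."}}
        raise RuntimeError(ev.get("error", {}).get("message", "unknown error"))
    # Other event types (for example response.done) are returned unchanged
    # so the caller can decide what to do with them.
    return kind
```

Inside `voice_session()`, the receive loop could call `handle_event(msg, buffer)` and stop or flush playback when the returned type is `response.done`.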
```python
from fastapi import FastAPI, WebSocket

app = FastAPI()


@app.websocket("/agent")
async def agent(ws: WebSocket):
    await ws.accept()
    headers = {"Authorization": f"Bearer {aad_token()}"}
    async with websockets.connect(ENDPOINT, additional_headers=headers) as az:

        async def in_loop():
            # Browser -> Azure: wrap raw PCM16 frames as base64 append events.
            async for frame in ws.iter_bytes():
                await az.send(json.dumps({
                    "type": "input_audio_buffer.append",
                    "audio": base64.b64encode(frame).decode()
                }))

        async def out_loop():
            # Azure -> browser: forward decoded audio deltas only.
            async for msg in az:
                ev = json.loads(msg)
                if ev["type"] == "response.audio.delta":
                    await ws.send_bytes(base64.b64decode(ev["delta"]))

        await asyncio.gather(in_loop(), out_loop())
```
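One thing to watch in the bridge above: asyncio.gather waits for both loops, so if the browser disconnects first, the Azure-side loop may keep the upstream socket open until it closes on its own. A possible alternative, sketched here with a hypothetical helper named `run_until_first_exit`, is to cancel the remaining loop as soon as either one finishes.

```python
import asyncio
from typing import Awaitable, Callable


async def run_until_first_exit(*loops: Callable[[], Awaitable[None]]) -> None:
    """Run the loops concurrently; when one finishes or fails, cancel the rest."""
    tasks = [asyncio.create_task(loop()) for loop in loops]
    try:
        done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
        for task in pending:
            task.cancel()
        # Wait for the cancellations to settle without raising here.
        await asyncio.gather(*pending, return_exceptions=True)
        for task in done:
            task.result()  # re-raise the failure of a finished loop, if any
    finally:
        # If this coroutine itself is cancelled, make sure nothing is left running.
        for task in tasks:
            task.cancel()
```

In the handler, `await run_until_first_exit(in_loop, out_loop)` would replace the `asyncio.gather(...)` call; leaving the `async with` block then closes the Azure socket.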
In session.update, include:
```json
"tools": [{
  "type": "function",
  "name": "lookup_appointment",
  "description": "Get next appointment for a patient",
  "parameters": {
    "type": "object",
    "properties": {"patient_id": {"type": "string"}},
    "required": ["patient_id"]
  }
}],
"tool_choice": "auto"
```
When the model calls the tool, you get a response.function_call_arguments.done event; reply with conversation.item.create of type function_call_output then response.create.
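The reply sequence described above can be sketched as a pure function that turns the incoming event into the two messages to send back. This is an assumption-laden sketch: `tool_reply_events` and the stubbed `lookup_appointment` are hypothetical, and it assumes the done event carries `name`, `call_id`, and a JSON string in `arguments`, as in the OpenAI Realtime protocol. If `name` is absent in your API version, it would need to be tracked from an earlier event.

```python
import json
from typing import Any, Callable, Dict, List


def lookup_appointment(patient_id: str) -> Dict[str, Any]:
    # Hypothetical stub; a real implementation would query your scheduling system.
    return {"patient_id": patient_id, "next_appointment": None}


TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "lookup_appointment": lookup_appointment,
}


def tool_reply_events(ev: Dict[str, Any],
                      tools: Dict[str, Callable[..., Dict[str, Any]]] = TOOLS) -> List[dict]:
    """Build the events to send after response.function_call_arguments.done."""
    args = json.loads(ev.get("arguments") or "{}")
    fn = tools.get(ev.get("name", ""))
    result = fn(**args) if fn else {"error": f"unknown tool {ev.get('name')}"}
    return [
        {
            "type": "conversation.item.create",
            "item": {
                "type": "function_call_output",
                "call_id": ev["call_id"],
                "output": json.dumps(result),  # output is sent as a JSON string
            },
        },
        {"type": "response.create"},  # ask the model to continue with the result
    ]
```

Each returned dict would then be sent with `await ws.send(json.dumps(event))` on the Voice Live socket.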
For tenants who need a Custom Neural Voice, swap the Voice Live socket for the standard GPT-5 chat completions API plus Azure Speech SDK STT/TTS:
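```python
import azure.cognitiveservices.speech as speechsdk

# The Speech SDK expects AAD tokens as "aad#<resource-id>#<token>"; the ID below is a placeholder.
SPEECH_RESOURCE_ID = "<ARM resource ID of vox-speech>"
token = cred.get_token("https://cognitiveservices.azure.com/.default").token
sc = speechsdk.SpeechConfig(auth_token=f"aad#{SPEECH_RESOURCE_ID}#{token}", region="eastus2")
sc.speech_synthesis_voice_name = "en-US-MyCustomVoice"
```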
```bash
az containerapp env create -g vox -n vox-env -l eastus2
az containerapp create -g vox -n vox-agent --environment vox-env \
  --image ghcr.io/you/vox:latest --target-port 8080 --ingress external \
  --user-assigned my-managed-identity --min-replicas 1 --max-replicas 50
```
Container Apps supports WebSockets out of the box, so you do not need Front Door for development.
en-US-AvaMultilingualNeural is the default; pick from the Speech Studio voice gallery.

CallSphere runs a multi-cloud strategy: OpenAI Realtime as primary, Azure Voice Live API as a fallback for tenants in Microsoft procurement frameworks (we have several enterprise healthcare customers locked to Azure). Same FastAPI :8084 surface, same 90+ tools, same 115+ DB tables, just a different upstream socket. 37 agents across 6 verticals. Pricing: $149/$499/$1499, 14-day trial, 22% affiliate.
Q: GPT-5 vs GPT-Realtime-Mini for voice? Mini is purpose-built for voice — lower latency and 30% cheaper. GPT-5 wins on complex reasoning but adds 200-400ms.
Q: Can I use my own STT/TTS with Voice Live? No — Voice Live is fully managed. For BYO STT/TTS, drop down to the standalone Realtime API + Azure Speech SDK.
Q: HIPAA? Sign a BAA via the Azure portal, enable Customer Managed Keys on the Foundry resource, use Private Endpoints to keep audio off the public internet.
Q: Latency target? ~600-800ms voice-to-voice on Voice Live in East US 2.
Q: Streaming function-call args? Yes. Listen to response.function_call_arguments.delta and parse incrementally for ultra-low-latency tool dispatch.
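One simple way to approach the incremental parsing mentioned in the last answer is to buffer deltas per call and attempt a parse after each chunk. The sketch below is only an illustration: `ArgStream` is a hypothetical class, and it assumes each delta event carries `call_id` and a string `delta`, as in the OpenAI Realtime protocol. It returns the parsed arguments as soon as the buffer forms a complete JSON object, without waiting for the done event.

```python
import json
from typing import Dict, Optional


class ArgStream:
    """Hypothetical accumulator for streamed function-call arguments."""

    def __init__(self) -> None:
        self._buffers: Dict[str, str] = {}

    def feed(self, ev: dict) -> Optional[dict]:
        """Return parsed arguments once the buffered text is a full JSON object."""
        if ev.get("type") != "response.function_call_arguments.delta":
            return None
        call_id = ev["call_id"]
        text = self._buffers.get(call_id, "") + ev["delta"]
        self._buffers[call_id] = text
        try:
            parsed = json.loads(text)
        except json.JSONDecodeError:
            return None  # still incomplete; keep buffering
        if not isinstance(parsed, dict):
            return None  # arguments are expected to be a JSON object
        del self._buffers[call_id]
        return parsed
```

Re-parsing the whole buffer on each delta is quadratic in the argument length, which is usually acceptable for short tool arguments; a true streaming JSON parser would be the alternative for large payloads.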
Written by
Sagar Shankaran· Founder, CallSphere
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.