Build an AI Voice Agent with Azure Communication Services Call Automation (2026)
Wire ACS Call Automation bidirectional streaming to Voice Live API for production PSTN AI agents. Real C# Web App, EventGrid hookup, mid-call barge-in, transfer-to-human flow.
TL;DR: As of January 2026, ACS Call Automation bidirectional streaming is GA. You purchase a phone number through ACS, hook EventGrid to your webhook, accept the call with AnswerCall, start media streaming through MediaStreamingOptions with EnableBidirectional=true, and pipe frames into Voice Live API. The whole loop fits in a small ASP.NET service.
What you'll build
A C# ASP.NET service that answers ACS-routed calls, opens a bidirectional media stream, and bridges audio to Voice Live API. Mid-call the agent can call a tool to fetch order status, and a "transfer to human" intent triggers AddParticipant with a queue's phone number.
Prerequisites
- Azure subscription with ACS resource and a purchased PSTN number.
- EventGrid topic subscribed to ACS events.
- Voice Live API enabled on a Foundry resource (same region recommended).
- .NET 8, the Azure.Communication.CallAutomation NuGet package, and Azure.Identity.
- Public HTTPS callback URL (use ngrok or Container Apps for dev).
Architecture
flowchart TD
PSTN[Caller PSTN] --> ACS[ACS Call Automation]
ACS -->|EventGrid IncomingCall| API[ASP.NET Webhook]
API -->|AnswerCall + MediaStreaming| ACS
ACS <-->|wss audio frames| API
API <-->|wss| VL[Voice Live API]
VL --> GPT[gpt-realtime-mini]
API -->|AddParticipant| QUEUE[Human Queue Number]
Step 1 — Wire the EventGrid IncomingCall handler
```csharp
[HttpPost("/incoming")]
public async Task // ... the rest of this handler is missing from the source
```
Step 2 — Accept the WebSocket and bridge to Voice Live
```csharp
[HttpGet("/media")]
public async Task Media()
{
    if (!HttpContext.WebSockets.IsWebSocketRequest)
    {
        HttpContext.Response.StatusCode = 400;
        return;
    }

    var acs = await HttpContext.WebSockets.AcceptWebSocketAsync();

    using var vl = new ClientWebSocket();
    vl.Options.SetRequestHeader("Authorization", $"Bearer {await GetAadToken()}");
    await vl.ConnectAsync(new Uri("wss://vox-foundry.cognitiveservices.azure.com/voice-agent/realtime?api-version=2025-05-01-preview&model=gpt-realtime"), default);

    await SendSessionUpdate(vl);
    var t1 = Pump(acs, vl, ParseAcsFrame);  // ACS -> Voice Live
    var t2 = Pump(vl, acs, FormatAcsFrame); // Voice Live -> ACS
    await Task.WhenAny(t1, t2);
}
```
ACS bidirectional frames are JSON-wrapped base64 PCM at 24kHz mono — the same sample rate Voice Live wants natively. No resampling.
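As a rough illustration of that JSON wrapping, the sketch below decodes an inbound frame to raw PCM and wraps outbound PCM in the same envelope. It assumes the camelCase field names used by the records in this post (`kind`, `audioData`, `data`) and a 24kHz, 16-bit mono stream. The `AcsFrames` class and its method names are hypothetical and not part of the ACS SDK, and the post's own helpers may map frames differently (for example, straight into Voice Live events).

```csharp
#nullable enable
using System;
using System.Text.Json;

// Hypothetical helper, not part of the ACS SDK.
public static class AcsFrames
{
    // 24,000 samples/s * 2 bytes per 16-bit sample * 1 channel = 48 bytes per millisecond.
    public const int BytesPerMs = 24000 * 2 / 1000;

    // Returns the PCM payload of an "AudioData" frame, or null for any other
    // message kind (for example a metadata message at the start of the stream).
    public static byte[]? TryGetPcm(string json)
    {
        using var doc = JsonDocument.Parse(json);
        var root = doc.RootElement;
        if (!root.TryGetProperty("kind", out var kind) ||
            kind.ValueKind != JsonValueKind.String ||
            kind.GetString() != "AudioData")
            return null;
        if (!root.TryGetProperty("audioData", out var audio) ||
            audio.ValueKind != JsonValueKind.Object ||
            !audio.TryGetProperty("data", out var data) ||
            data.ValueKind != JsonValueKind.String)
            return null;
        return Convert.FromBase64String(data.GetString()!);
    }

    // Wraps raw PCM in the JSON envelope assumed for audio sent back to ACS.
    public static string WrapPcm(byte[] pcm) =>
        JsonSerializer.Serialize(new
        {
            kind = "AudioData",
            audioData = new { data = Convert.ToBase64String(pcm) }
        });

    // Playback duration of a PCM buffer at 24kHz, 16-bit mono.
    public static double DurationMs(int pcmByteCount) => (double)pcmByteCount / BytesPerMs;
}
```

Returning null for non-audio messages lets the pump loop skip them without special cases. `DurationMs` is handy when logging how much audio is buffered in each direction.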
Step 3 — Frame format helpers
```csharp
record AcsAudioFrame(string kind, AcsAudioData audioData);
record AcsAudioData(string data, string timestamp, string participantRawID, bool silent);

byte[] ParseAcsFrame(byte[] raw) {
    var f = JsonSerializer.Deserialize<AcsAudioFrame>(raw);
    // ... the rest of this helper is missing from the source
}
```
Step 4 — Tool: lookup_order via function calling
In session.update, register the tool. When Voice Live emits response.function_call_arguments.done, dispatch to your CRM SDK and reply with conversation.item.create (function_call_output) + response.create. Same pattern as OpenAI Realtime.
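The three messages in that exchange can be sketched as plain JSON builders. This is a minimal sketch that assumes Voice Live accepts the same payload shapes as the OpenAI Realtime events named above. The `RealtimeTools` class, its method names, and the `order_id` argument are hypothetical; adjust the schema to whatever your CRM lookup needs.

```csharp
#nullable enable
using System.Text.Json;

// Hypothetical helpers; payload shapes follow the OpenAI Realtime event names used in this step.
public static class RealtimeTools
{
    // session.update payload that registers the lookup_order tool.
    public static string BuildSessionUpdate() => JsonSerializer.Serialize(new
    {
        type = "session.update",
        session = new
        {
            tools = new object[]
            {
                new
                {
                    type = "function",
                    name = "lookup_order",
                    description = "Fetch the status of an order by its ID.",
                    parameters = new
                    {
                        type = "object",
                        properties = new { order_id = new { type = "string" } }, // assumed argument name
                        required = new[] { "order_id" }
                    }
                }
            }
        }
    });

    // Returns (call_id, arguments JSON) for a completed function call event, else null.
    public static (string CallId, string Arguments)? TryParseFunctionCall(string eventJson)
    {
        using var doc = JsonDocument.Parse(eventJson);
        var root = doc.RootElement;
        if (!root.TryGetProperty("type", out var type) ||
            type.ValueKind != JsonValueKind.String ||
            type.GetString() != "response.function_call_arguments.done")
            return null;
        return (root.GetProperty("call_id").GetString()!,
                root.GetProperty("arguments").GetString()!);
    }

    // The two messages to send once the tool has produced its result.
    public static string[] BuildToolReply(string callId, string outputJson) => new[]
    {
        JsonSerializer.Serialize(new
        {
            type = "conversation.item.create",
            item = new { type = "function_call_output", call_id = callId, output = outputJson }
        }),
        JsonSerializer.Serialize(new { type = "response.create" })
    };
}
```

In the Voice Live to ACS pump, pass each text message through `TryParseFunctionCall` first. When it returns a value, run the lookup and send both strings from `BuildToolReply` back over the Voice Live socket, in order.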
Step 5 — Transfer to human
When the user says "agent please", parse the model's signal (a tool call request_transfer is the cleanest), then:
```csharp
await _calls.GetCallConnection(callConnectionId).AddParticipantAsync(
    new CallInvite(
        new PhoneNumberIdentifier("+1800SUPPORT"),
        new PhoneNumberIdentifier("+1YourACSNumber")));
```
ACS dials the queue number and joins it to the existing call; the AI can stay in the call as a transcriber or drop with HangUpAsync.
Step 6 — Recording + Contact Lens-equivalent
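Start recording for compliance through the client's call recording API (in the .NET SDK, GetCallRecording().StartAsync). With bring-your-own storage configured, ACS recordings drop into your storage account; pipe them through Azure AI Speech batch transcription + Foundry sentiment analysis for post-call analytics.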
Step 7 — Ship to Container Apps with managed identity
Same as the previous post: az containerapp create with --user-assigned for AAD, --ingress external, scaling to 50 replicas. WebSocket sticky session is automatic in Container Apps.
Pitfalls
- EventGrid validation handshake: you must respond to SubscriptionValidationEvent with the validation code, or the subscription stays unverified.
- MediaStreamingOptions requires EnableBidirectional=true for the new GA bidirectional path; otherwise you get the legacy one-way stream out of the call.
- Audio format: ACS bidirectional streaming carries mono PCM only (16kHz or 24kHz). If your downstream is 8kHz mu-law, transcode on the fly.
- Concurrent call quota starts at 100 — request increases via support.
- PSTN ingress costs ~$0.014/min in addition to Voice Live; cheaper than Twilio in most regions but watch billing.
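The validation handshake from the first pitfall can be sketched without any SDK, since it is just JSON in and JSON out. The sketch assumes the subscription delivers events in the Event Grid schema (a JSON array whose items carry `eventType` and `data`). The `EventGridRouting` class and its method names are hypothetical.

```csharp
#nullable enable
using System.Text.Json;

// Hypothetical helper: classifies an Event Grid webhook body without SDK types.
public static class EventGridRouting
{
    const string ValidationEvent = "Microsoft.EventGrid.SubscriptionValidationEvent";
    const string IncomingCallEvent = "Microsoft.Communication.IncomingCall";

    // Returns the JSON body to send back with HTTP 200 when the request is a
    // subscription validation event, or null when it is something else.
    public static string? TryBuildValidationResponse(string body)
    {
        var code = FindEventData(body, ValidationEvent, "validationCode");
        return code is null ? null : JsonSerializer.Serialize(new { validationResponse = code });
    }

    // Returns the incomingCallContext to pass to AnswerCall, or null.
    public static string? TryGetIncomingCallContext(string body) =>
        FindEventData(body, IncomingCallEvent, "incomingCallContext");

    // Finds the first event of the given type and returns one string field from its data.
    static string? FindEventData(string body, string eventType, string field)
    {
        using var doc = JsonDocument.Parse(body);
        if (doc.RootElement.ValueKind != JsonValueKind.Array) return null;
        foreach (var evt in doc.RootElement.EnumerateArray())
        {
            if (evt.ValueKind == JsonValueKind.Object &&
                evt.TryGetProperty("eventType", out var t) &&
                t.ValueKind == JsonValueKind.String && t.GetString() == eventType &&
                evt.TryGetProperty("data", out var d) && d.ValueKind == JsonValueKind.Object &&
                d.TryGetProperty(field, out var v) && v.ValueKind == JsonValueKind.String)
                return v.GetString();
        }
        return null;
    }
}
```

In the webhook, check `TryBuildValidationResponse` first and return its result as `application/json`; only then look for an incoming call context and answer the call.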
How CallSphere does this in production
CallSphere uses ACS for select Microsoft-aligned enterprise tenants in Healthcare (HIPAA + BAA on AAD) but our default voice path is OpenAI Realtime over Twilio Media Streams via FastAPI :8084. We've measured ACS bidirectional median latency at ~750ms vs ~650ms for Twilio + OpenAI, but ACS wins on data residency for EU customers. 37 agents, 90+ tools, 115+ DB tables, 6 verticals. $149/$499/$1499, 14-day trial, 22% affiliate.
FAQ
Q: Can I use ACS without Voice Live?
Yes. Bridge to any STT+LLM+TTS stack. Voice Live just removes the integration tax.
Q: How do I get an EU number?
ACS supports number purchase in 30+ countries via the portal; pick country during PurchasePhoneNumbers.
Q: Latency vs Twilio Media Streams?
On East US 2 with Voice Live: ~750ms. With Twilio + OpenAI Realtime: ~650ms. ACS catches up in EU regions where Twilio adds a transatlantic hop.
Q: How do I do warm transfers?
Use AddParticipantAsync then MuteParticipantAsync for the AI; the live agent picks up the same call leg.
Q: Can I record with redaction?
Yes — pipe recordings through Azure AI Speech batch transcription with profanity=Masked + a Foundry redaction prompt for PII/PHI before storing.