TL;DR: As of January 2026, ACS Call Automation bidirectional streaming is GA. You purchase a phone number through ACS, point an Event Grid subscription at your webhook, accept the call with AnswerCall, pass MediaStreamingOptions with EnableBidirectional = true, and pipe the audio frames into Voice Live API. The whole loop fits in a small ASP.NET service.

What you'll build

A C# ASP.NET service that answers ACS-routed calls, opens a bidirectional media stream, and bridges audio to Voice Live API. Mid-call the agent can call a tool to fetch order status, and a "transfer to human" intent triggers AddParticipant with a queue's phone number.

Prerequisites

Azure subscription with ACS resource and a purchased PSTN number.
Event Grid subscription that delivers the ACS IncomingCall event to your webhook.
Voice Live API enabled on a Foundry resource (same region recommended).
.NET 8, Azure.Communication.CallAutomation NuGet, Azure.Identity.
Public HTTPS callback URL (use ngrok or Container Apps for dev).

Architecture

flowchart TD
  PSTN[Caller PSTN] --> ACS[ACS Call Automation]
  ACS -->|EventGrid IncomingCall| API[ASP.NET Webhook]
  API -->|AnswerCall + MediaStreaming| ACS
  ACS <-->|wss audio frames| API
  API <-->|wss| VL[Voice Live API]
  VL --> GPT[gpt-realtime-mini]
  API -->|AddParticipant| QUEUE[Human Queue Number]

Step 1 — Wire the EventGrid IncomingCall handler

```csharp
// Requires: using Azure.Communication.CallAutomation;
//           using Azure.Messaging.EventGrid;
//           using Azure.Messaging.EventGrid.SystemEvents;
[HttpPost("/incoming")]
public async Task<IActionResult> Incoming([FromBody] EventGridEvent[] events)
{
    foreach (var e in events)
    {
        // Event Grid sends this when the subscription is created; echo the code back.
        if (e.EventType == "Microsoft.EventGrid.SubscriptionValidationEvent")
        {
            var data = e.Data.ToObjectFromJson<SubscriptionValidationEventData>();
            return Ok(new { validationResponse = data.ValidationCode });
        }

        if (e.EventType == "Microsoft.Communication.IncomingCall")
        {
            var d = e.Data.ToObjectFromJson<AcsIncomingCallEventData>();

            // Answer the call and ask ACS to open a bidirectional audio WebSocket to /media.
            await _calls.AnswerCallAsync(
                new AnswerCallOptions(d.IncomingCallContext, new Uri($"{Host}/callback"))
                {
                    MediaStreamingOptions = new MediaStreamingOptions(
                        new Uri($"wss://{HostNoScheme}/media"),
                        MediaStreamingContent.Audio,
                        MediaStreamingAudioChannel.Mixed,
                        startMediaStreaming: true)
                    {
                        EnableBidirectional = true,
                        AudioFormat = AudioFormat.Pcm24KMono
                    }
                });
        }
    }
    return Ok();
}
```
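The validation handshake is easy to get wrong, so it can help to isolate it from the SDK types. The following is a minimal sketch, not the article's implementation: `EventGridHandshake` and `FindValidationCode` are hypothetical names, and the code assumes the request body uses the Event Grid schema (a JSON array whose items carry `eventType` and `data.validationCode`).

```csharp
using System.Text.Json;

public static class EventGridHandshake
{
    public const string ValidationEventType =
        "Microsoft.EventGrid.SubscriptionValidationEvent";

    // Returns the validation code from the first validation event in the batch,
    // or null when the batch contains no validation event.
    public static string? FindValidationCode(string requestBody)
    {
        using var doc = JsonDocument.Parse(requestBody);
        if (doc.RootElement.ValueKind != JsonValueKind.Array) return null;

        foreach (var evt in doc.RootElement.EnumerateArray())
        {
            if (evt.ValueKind == JsonValueKind.Object &&
                evt.TryGetProperty("eventType", out var type) &&
                type.ValueKind == JsonValueKind.String &&
                type.GetString() == ValidationEventType &&
                evt.TryGetProperty("data", out var data) &&
                data.ValueKind == JsonValueKind.Object &&
                data.TryGetProperty("validationCode", out var code))
            {
                return code.GetString();
            }
        }
        return null;
    }
}
```

A webhook could call this on the raw body first and, when the result is non-null, reply with `{ "validationResponse": "<code>" }` before doing any call handling.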

Step 2 — Accept the WebSocket and bridge to Voice Live

```csharp
[HttpGet("/media")]
public async Task Media()
{
    if (!HttpContext.WebSockets.IsWebSocketRequest)
    {
        HttpContext.Response.StatusCode = 400;
        return;
    }

    var acs = await HttpContext.WebSockets.AcceptWebSocketAsync();
    using var vl = new ClientWebSocket();
    vl.Options.SetRequestHeader("Authorization", $"Bearer {await GetAadToken()}");
    await vl.ConnectAsync(new Uri("wss://vox-foundry.cognitiveservices.azure.com/voice-agent/realtime?api-version=2025-05-01-preview&model=gpt-realtime"), default);

    await SendSessionUpdate(vl);
    var t1 = Pump(acs, vl, ParseAcsFrame);   // ACS -> Voice Live
    var t2 = Pump(vl, acs, FormatAcsFrame);  // Voice Live -> ACS
    await Task.WhenAny(t1, t2);              // stop bridging when either side closes
}
```
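Step 2 calls a `Pump` helper without showing it. One possible shape is sketched below, assuming each WebSocket message carries one complete payload and that the transform returns an empty array for anything that should not be forwarded. `WebSocketBridge` and the `outgoingType` parameter are hypothetical additions; only `System.Net.WebSockets` APIs are used.

```csharp
using System;
using System.IO;
using System.Net.WebSockets;
using System.Threading;
using System.Threading.Tasks;

public static class WebSocketBridge
{
    // Reads whole messages from source, transforms each payload, and forwards
    // non-empty results to destination until either socket leaves the Open state.
    public static async Task Pump(
        WebSocket source,
        WebSocket destination,
        Func<byte[], byte[]> transform,
        WebSocketMessageType outgoingType = WebSocketMessageType.Text,
        CancellationToken ct = default)
    {
        var buffer = new byte[16 * 1024];
        using var message = new MemoryStream();

        while (source.State == WebSocketState.Open &&
               destination.State == WebSocketState.Open)
        {
            message.SetLength(0);
            WebSocketReceiveResult result;
            do
            {
                // A single message may arrive in several fragments.
                result = await source.ReceiveAsync(new ArraySegment<byte>(buffer), ct);
                if (result.MessageType == WebSocketMessageType.Close) return;
                message.Write(buffer, 0, result.Count);
            } while (!result.EndOfMessage);

            var outgoing = transform(message.ToArray());
            if (outgoing.Length == 0) continue; // nothing to forward for this message

            await destination.SendAsync(
                new ArraySegment<byte>(outgoing), outgoingType, endOfMessage: true, ct);
        }
    }
}
```

Reassembling fragments before transforming matters here because a JSON frame split across two receives would otherwise fail to parse.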

ACS bidirectional frames are JSON-wrapped base64 PCM at 24kHz mono — the same sample rate Voice Live wants natively. No resampling.
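To size buffers, it helps to work out how large one frame is. The sketch below assumes 16-bit samples and a 20 ms frame duration; neither figure is stated in this article, so treat both as placeholders to replace with what you observe on the wire. `PcmMath` is a hypothetical helper.

```csharp
public static class PcmMath
{
    // Raw PCM bytes in one frame of the given duration.
    public static int BytesPerFrame(int sampleRateHz, int bitsPerSample, int channels, int frameMs)
        => sampleRateHz * (bitsPerSample / 8) * channels * frameMs / 1000;

    // Length of the padded base64 string that encodes byteCount bytes.
    public static int Base64Length(int byteCount) => 4 * ((byteCount + 2) / 3);
}
```

Under those assumptions, 24 kHz mono works out to 48,000 bytes per second, 960 bytes per 20 ms frame, and 1,280 base64 characters per frame before the JSON envelope is added.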

Step 3 — Frame format helpers

```csharp
// Requires: using System.Text.Json;
record AcsAudioFrame(string kind, AcsAudioData audioData);
record AcsAudioData(string data, string timestamp, string participantRawID, bool silent);

// ACS -> PCM: returns decoded audio, or an empty array for silent and non-audio frames.
byte[] ParseAcsFrame(byte[] raw)
{
    var f = JsonSerializer.Deserialize<AcsAudioFrame>(raw);
    return f != null && f.kind == "AudioData" && !f.audioData.silent
        ? Convert.FromBase64String(f.audioData.data)
        : Array.Empty<byte>();
}

// PCM -> ACS: wraps raw audio in the JSON envelope ACS expects for playback.
byte[] FormatAcsFrame(byte[] pcm)
{
    var f = new
    {
        kind = "AudioData",
        audioData = new
        {
            data = Convert.ToBase64String(pcm),
            timestamp = DateTime.UtcNow.ToString("o")
        }
    };
    return JsonSerializer.SerializeToUtf8Bytes(f);
}
```
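The helpers above cover the ACS side of the bridge; the Voice Live side also speaks JSON events rather than raw PCM. The sketch below assumes Voice Live uses the OpenAI Realtime event names `input_audio_buffer.append` (with an `audio` field) and `response.audio.delta` (with a `delta` field), which matches the "same pattern as OpenAI Realtime" note later in this post but should be checked against the API version you target. `VoiceLiveAudio` is a hypothetical helper.

```csharp
using System;
using System.Text.Json;

public static class VoiceLiveAudio
{
    // PCM -> Voice Live: wraps caller audio in an input_audio_buffer.append event.
    public static byte[] WrapInputAudio(byte[] pcm) =>
        JsonSerializer.SerializeToUtf8Bytes(new
        {
            type = "input_audio_buffer.append",
            audio = Convert.ToBase64String(pcm)
        });

    // Voice Live -> PCM: returns decoded audio for response.audio.delta events,
    // or an empty array for every other event type.
    public static byte[] UnwrapOutputAudio(byte[] eventJson)
    {
        using var doc = JsonDocument.Parse(eventJson);
        var root = doc.RootElement;
        if (root.ValueKind == JsonValueKind.Object &&
            root.TryGetProperty("type", out var type) &&
            type.ValueKind == JsonValueKind.String &&
            type.GetString() == "response.audio.delta" &&
            root.TryGetProperty("delta", out var delta) &&
            delta.ValueKind == JsonValueKind.String)
        {
            return delta.GetBytesFromBase64();
        }
        return Array.Empty<byte>();
    }
}
```

In a full bridge these would be composed with the ACS helpers, for example decoding an ACS frame to PCM and then wrapping it for Voice Live, and the reverse on the way back.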

Step 4 — Tool: lookup_order via function calling

In session.update, register the tool. When Voice Live emits response.function_call_arguments.done, dispatch to your CRM SDK and reply with conversation.item.create (function_call_output) + response.create. Same pattern as OpenAI Realtime.
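The messages involved can be sketched as plain JSON. This example follows the OpenAI Realtime shapes that the paragraph above refers to; the tool description and the `order_id` parameter are illustrative assumptions, and `LookupOrderTool` is a hypothetical helper rather than part of any SDK.

```csharp
using System.Text.Json;

public static class LookupOrderTool
{
    // session.update payload that registers the lookup_order function tool.
    public static string SessionUpdate() => JsonSerializer.Serialize(new
    {
        type = "session.update",
        session = new
        {
            tools = new object[]
            {
                new
                {
                    type = "function",
                    name = "lookup_order",
                    description = "Fetch the status of an order by its ID.",
                    parameters = new
                    {
                        type = "object",
                        properties = new { order_id = new { type = "string" } },
                        required = new[] { "order_id" }
                    }
                }
            }
        }
    });

    // The two messages sent after the tool has run: the function output,
    // then a request for the model to continue speaking.
    public static string[] ToolResult(string callId, string outputJson) => new[]
    {
        JsonSerializer.Serialize(new
        {
            type = "conversation.item.create",
            item = new { type = "function_call_output", call_id = callId, output = outputJson }
        }),
        JsonSerializer.Serialize(new { type = "response.create" })
    };
}
```

The `call_id` must be the one received in the function call event so the model can match the output to its request.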

Step 5 — Transfer to human

When the user says "agent please", parse the model's signal (a tool call request_transfer is the cleanest), then:

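```csharp
// Target is the human queue number; the second identifier is your ACS number, used as caller ID.
await _calls.GetCallConnection(callConnectionId).AddParticipantAsync(
    new CallInvite(
        new PhoneNumberIdentifier("+1800SUPPORT"),
        new PhoneNumberIdentifier("+1YourACSNumber")));
```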

ACS handles SIP REFER under the covers; the AI can stay in the call as a transcriber or drop with HangUpAsync.
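Deciding when to transfer comes down to routing the model's tool calls. The sketch below assumes the `response.function_call_arguments.done` event carries `name` and `call_id` fields, as it does in the OpenAI Realtime API; `ToolRouter` and `ToolAction` are hypothetical names.

```csharp
using System.Text.Json;

public enum ToolAction { None, LookupOrder, TransferToHuman }

public static class ToolRouter
{
    // Maps a Voice Live event to the action the bridge should take.
    // callId is set only when the event is a completed function call.
    public static ToolAction Route(string eventJson, out string? callId)
    {
        callId = null;
        using var doc = JsonDocument.Parse(eventJson);
        var root = doc.RootElement;

        if (root.ValueKind != JsonValueKind.Object ||
            !root.TryGetProperty("type", out var type) ||
            type.ValueKind != JsonValueKind.String ||
            type.GetString() != "response.function_call_arguments.done")
        {
            return ToolAction.None;
        }

        if (root.TryGetProperty("call_id", out var id) && id.ValueKind == JsonValueKind.String)
            callId = id.GetString();

        var name = root.TryGetProperty("name", out var n) && n.ValueKind == JsonValueKind.String
            ? n.GetString()
            : null;

        return name switch
        {
            "lookup_order" => ToolAction.LookupOrder,
            "request_transfer" => ToolAction.TransferToHuman,
            _ => ToolAction.None
        };
    }
}
```

When `Route` returns `TransferToHuman`, the bridge would issue the AddParticipantAsync call for the current call connection.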

Step 6 — Recording + Contact Lens-equivalent

Enable StartRecordingAsync for compliance. ACS recordings drop into your storage account; pipe through Azure AI Speech batch transcription + Foundry sentiment analysis for post-call analytics.

Step 7 — Ship to Container Apps with managed identity

Same as the previous post: az containerapp create with --user-assigned for AAD, --ingress external, scaling to 50 replicas. WebSocket sticky session is automatic in Container Apps.

Pitfalls

EventGrid validation handshake: must respond to SubscriptionValidationEvent with the validation code or the topic stays unverified.
MediaStreamingOptions requires EnableBidirectional=true for the new GA bidirectional path; otherwise you get one-way out-of-call streaming (legacy).
Audio format: ACS bidirectional streaming carries mono PCM only (the SDK's AudioFormat offers 16 kHz and 24 kHz). If your downstream is 8kHz mu-law, transcode on the fly.
Concurrent call quota starts at 100 — request increases via support.
PSTN ingress costs ~$0.014/min in addition to Voice Live; cheaper than Twilio in most regions but watch billing.
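For the audio format pitfall, transcoding to 8 kHz mu-law can be sketched as below. This assumes the input is 16-bit little-endian PCM at 24 kHz and downsamples by averaging each group of three samples, which is a crude low-pass; a production path would normally use a proper resampling filter. `MuLawTranscoder` is a hypothetical helper, and the encoder follows the standard G.711 mu-law layout (sign bit, 3-bit exponent, 4-bit mantissa, inverted).

```csharp
using System;
using System.Buffers.Binary;

public static class MuLawTranscoder
{
    private const int Bias = 0x84;   // 132, added before finding the segment
    private const int Clip = 32635;  // largest magnitude that survives the bias

    // Encodes one 16-bit linear PCM sample as a G.711 mu-law byte.
    public static byte LinearToMuLaw(short sample)
    {
        int sign = (sample >> 8) & 0x80;
        int magnitude = sign != 0 ? -sample : sample;
        if (magnitude > Clip) magnitude = Clip;
        magnitude += Bias;

        // Find the segment: position of the highest set bit between 14 and 7.
        int exponent = 7;
        for (int mask = 0x4000; (magnitude & mask) == 0 && exponent > 0; mask >>= 1)
            exponent--;

        int mantissa = (magnitude >> (exponent + 3)) & 0x0F;
        return (byte)(~(sign | (exponent << 4) | mantissa) & 0xFF);
    }

    // Converts 24 kHz 16-bit little-endian mono PCM to 8 kHz mu-law
    // by averaging every three input samples into one output sample.
    public static byte[] Pcm24kToMuLaw8k(ReadOnlySpan<byte> pcm24k)
    {
        int outCount = (pcm24k.Length / 2) / 3;
        var output = new byte[outCount];
        for (int i = 0; i < outCount; i++)
        {
            int sum = 0;
            for (int j = 0; j < 3; j++)
                sum += BinaryPrimitives.ReadInt16LittleEndian(pcm24k.Slice((i * 3 + j) * 2, 2));
            output[i] = LinearToMuLaw((short)(sum / 3));
        }
        return output;
    }
}
```

Trailing samples that do not fill a complete group of three are dropped, so callers that stream audio may want to carry the remainder over to the next chunk.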

How CallSphere does this in production

CallSphere uses ACS for select Microsoft-aligned enterprise tenants in Healthcare (HIPAA + BAA on AAD) but our default voice path is OpenAI Realtime over Twilio Media Streams via FastAPI :8084. We've measured ACS bidirectional median latency at ~750ms vs ~650ms for Twilio + OpenAI, but ACS wins on data residency for EU customers. 37 agents, 90+ tools, 115+ DB tables, 6 verticals. $149/$499/$1499, 14-day trial, 22% affiliate.

FAQ

Q: Can I use ACS without Voice Live? Yes — bridge to any STT+LLM+TTS stack. Voice Live just removes the integration tax.

Q: How do I get an EU number? ACS supports number purchase in 30+ countries via the portal; pick country during PurchasePhoneNumbers.

Q: Latency vs Twilio Media Streams? On East US 2 with Voice Live: ~750ms. With Twilio + OpenAI Realtime: ~650ms. ACS catches up in EU regions where Twilio adds a transatlantic hop.

Q: How do I do warm transfers? Use AddParticipantAsync then MuteParticipantAsync for the AI; the live agent picks up the same call leg.

Q: Can I record with redaction? Yes — pipe recordings through Azure AI Speech batch transcription with profanity=Masked + a Foundry redaction prompt for PII/PHI before storing.
