---
title: "Build an AI Voice Agent with Azure Communication Services Call Automation (2026)"
description: "Wire ACS Call Automation bidirectional streaming to Voice Live API for production PSTN AI agents. Real C# Web App, EventGrid hookup, mid-call barge-in, transfer-to-human flow."
canonical: https://callsphere.ai/blog/vw5h-build-ai-voice-agent-azure-communication-services-call-automation
category: "AI Voice Agents"
tags: ["Azure", "ACS", "Call Automation", "Voice Live", "Tutorial"]
author: "CallSphere Team"
published: 2026-04-02T00:00:00.000Z
updated: 2026-05-07T16:30:07.159Z
---

# Build an AI Voice Agent with Azure Communication Services Call Automation (2026)

> Wire ACS Call Automation bidirectional streaming to Voice Live API for production PSTN AI agents. Real C# Web App, EventGrid hookup, mid-call barge-in, transfer-to-human flow.

> **TL;DR**: As of January 2026, ACS Call Automation bidirectional streaming is GA. You purchase a phone number through ACS, hook EventGrid to your webhook, accept the call with `AnswerCall`, pass `MediaStreamingOptions` with `EnableBidirectional = true`, and pipe frames into Voice Live API. The whole loop fits in a small ASP.NET service.

## What you'll build

A C# ASP.NET service that answers ACS-routed calls, opens a bidirectional media stream, and bridges audio to Voice Live API. Mid-call, the agent can call a tool to fetch order status, and a "transfer to human" intent triggers `AddParticipant` with a queue's phone number.

## Prerequisites

1. Azure subscription with ACS resource and a purchased PSTN number.
2. EventGrid topic subscribed to ACS events.
3. Voice Live API enabled on a Foundry resource (same region recommended).
4. .NET 8, `Azure.Communication.CallAutomation` NuGet, `Azure.Identity`.
5. Public HTTPS callback URL (use ngrok or Container Apps for dev).

## Architecture

```mermaid
flowchart TD
  PSTN[Caller PSTN] --> ACS[ACS Call Automation]
  ACS -->|EventGrid IncomingCall| API[ASP.NET Webhook]
  API -->|AnswerCall + MediaStreaming| ACS
  ACS <-->|wss audio frames| API
  API <-->|wss| VL[Voice Live API]
  VL --> GPT[gpt-realtime-mini]
  API -->|AddParticipant| QUEUE[Human Queue Number]
```

## Step 1 — Wire the EventGrid IncomingCall handler

```csharp
using Azure.Communication.CallAutomation;
using Azure.Messaging.EventGrid;
using Azure.Messaging.EventGrid.SystemEvents;
using Microsoft.AspNetCore.Mvc;

[HttpPost("/incoming")]
public async Task<IActionResult> Incoming([FromBody] EventGridEvent[] events) {
    foreach (var e in events) {
        // Event Grid sends this when the subscription is created; echo the code back.
        if (e.EventType == "Microsoft.EventGrid.SubscriptionValidationEvent") {
            var data = e.Data.ToObjectFromJson<SubscriptionValidationEventData>();
            return Ok(new { validationResponse = data.ValidationCode });
        }
        if (e.EventType == "Microsoft.Communication.IncomingCall") {
            var d = e.Data.ToObjectFromJson<AcsIncomingCallEventData>();
            // Answer the call and ask ACS to open the media WebSocket immediately.
            await _calls.AnswerCallAsync(new AnswerCallOptions(d.IncomingCallContext, new Uri($"{Host}/callback")) {
                MediaStreamingOptions = new MediaStreamingOptions(
                    new Uri($"wss://{HostNoScheme}/media"),
                    MediaStreamingContent.Audio,
                    MediaStreamingAudioChannel.Mixed,
                    startMediaStreaming: true) {
                    EnableBidirectional = true,
                    AudioFormat = AudioFormat.Pcm24KMono
                }
            });
        }
    }
    return Ok();
}
```

## Step 2 — Accept the WebSocket and bridge to Voice Live

```csharp
[HttpGet("/media")]
public async Task Media() {
    if (!HttpContext.WebSockets.IsWebSocketRequest) { HttpContext.Response.StatusCode = 400; return; }
    var acs = await HttpContext.WebSockets.AcceptWebSocketAsync();
    using var vl = new ClientWebSocket();
    vl.Options.SetRequestHeader("Authorization", $"Bearer {await GetAadToken()}");
    await vl.ConnectAsync(new Uri("wss://vox-foundry.cognitiveservices.azure.com/voice-agent/realtime?api-version=2025-05-01-preview&model=gpt-realtime"), default);

    // GetAadToken, SendSessionUpdate, and Pump are app-specific helpers not shown here.
    await SendSessionUpdate(vl);
    var t1 = Pump(acs, vl, ParseAcsFrame);   // ACS -> Voice Live
    var t2 = Pump(vl, acs, FormatAcsFrame);  // Voice Live -> ACS
    await Task.WhenAny(t1, t2);              // stop when either side closes
}
```

ACS bidirectional frames are JSON-wrapped base64 PCM at 24kHz mono — the same sample rate Voice Live wants natively. No resampling.

## Step 3 — Frame format helpers

```csharp
using System.Text.Json;

record AcsAudioFrame(string kind, AcsAudioData audioData);
record AcsAudioData(string data, string timestamp, string participantRawID, bool silent);

// ACS -> Voice Live: return raw PCM, or an empty array for silent or non-audio frames.
byte[] ParseAcsFrame(byte[] raw) {
    var f = JsonSerializer.Deserialize<AcsAudioFrame>(raw);
    return f?.kind == "AudioData" && f.audioData is { silent: false } audio
        ? Convert.FromBase64String(audio.data) : Array.Empty<byte>();
}

// Voice Live -> ACS: wrap raw PCM in the JSON envelope ACS expects.
byte[] FormatAcsFrame(byte[] pcm) {
    var f = new { kind = "AudioData", audioData = new { data = Convert.ToBase64String(pcm), timestamp = DateTime.UtcNow.ToString("o") } };
    return JsonSerializer.SerializeToUtf8Bytes(f);
}
```
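
The helpers above hand raw PCM to the Voice Live side, but a Realtime-style socket exchanges JSON events rather than bare audio bytes. The sketch below shows one way the Voice Live half of the bridge could wrap and unwrap audio. It assumes Voice Live uses the OpenAI Realtime event names `input_audio_buffer.append` (base64 in `audio`) and `response.audio.delta` (base64 in `delta`); verify both against the API version you target. `ToVoiceLiveAppend` and `FromVoiceLiveEvent` are hypothetical helper names, not part of any SDK.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper: wrap raw PCM16 bytes in a Realtime-style append event.
// The event name and the "audio" field are assumptions based on the OpenAI Realtime protocol.
static byte[] ToVoiceLiveAppend(byte[] pcm)
{
    var evt = new { type = "input_audio_buffer.append", audio = Convert.ToBase64String(pcm) };
    return JsonSerializer.SerializeToUtf8Bytes(evt);
}

// Hypothetical helper: extract PCM16 bytes from an audio delta event.
// Returns an empty array for every other event type so the caller can skip it.
static byte[] FromVoiceLiveEvent(byte[] json)
{
    using var doc = JsonDocument.Parse(json);
    var root = doc.RootElement;
    if (root.TryGetProperty("type", out var type) &&
        type.GetString() == "response.audio.delta" &&
        root.TryGetProperty("delta", out var delta))
    {
        return Convert.FromBase64String(delta.GetString() ?? "");
    }
    return Array.Empty<byte>();
}
```

In the ACS -> Voice Live direction you would chain `ParseAcsFrame` into `ToVoiceLiveAppend`; in the other direction, `FromVoiceLiveEvent` into `FormatAcsFrame`, skipping empty results.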

## Step 4 — Tool: lookup_order via function calling

In `session.update`, register the tool. When Voice Live emits `response.function_call_arguments.done`, dispatch to your CRM SDK and reply with `conversation.item.create` (`function_call_output`) + `response.create`. Same pattern as OpenAI Realtime.
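
As a rough illustration of that exchange, the sketch below builds the three JSON payloads involved. It assumes the OpenAI Realtime event shapes carry over unchanged, as the paragraph above suggests. The `order_id` parameter schema and the helper names `BuildSessionUpdate` and `BuildToolReply` are hypothetical.

```csharp
using System.Text.Json;

// Builds a session.update event that registers a lookup_order function tool.
// The parameter schema (a single order_id string) is a made-up example.
static string BuildSessionUpdate()
{
    var evt = new
    {
        type = "session.update",
        session = new
        {
            tools = new object[]
            {
                new
                {
                    type = "function",
                    name = "lookup_order",
                    description = "Fetch the status of a customer order.",
                    parameters = new
                    {
                        type = "object",
                        properties = new { order_id = new { type = "string" } },
                        required = new[] { "order_id" }
                    }
                }
            },
            tool_choice = "auto"
        }
    };
    return JsonSerializer.Serialize(evt);
}

// Builds the two events sent after the tool has run: the function_call_output
// item, then response.create so the model resumes speaking.
static string[] BuildToolReply(string callId, string outputJson)
{
    var outputEvent = new
    {
        type = "conversation.item.create",
        item = new { type = "function_call_output", call_id = callId, output = outputJson }
    };
    var resumeEvent = new { type = "response.create" };
    return new[] { JsonSerializer.Serialize(outputEvent), JsonSerializer.Serialize(resumeEvent) };
}
```

The `callId` comes from the `response.function_call_arguments.done` event, and `outputJson` is whatever your CRM lookup returns, serialized as a string.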

## Step 5 — Transfer to human

When the user says "agent please", parse the model's signal (a tool call `request_transfer` is the cleanest), then:

```csharp
await _calls.GetCallConnection(callConnectionId).AddParticipantAsync(new CallInvite(new PhoneNumberIdentifier("+1800SUPPORT"), new PhoneNumberIdentifier("+1YourACSNumber")));
```

ACS dials the queue number and joins it to the existing call; the AI can stay in the call as a transcriber or drop with `HangUpAsync`.

## Step 6 — Recording + Contact Lens-equivalent

Enable `StartRecordingAsync` for compliance. ACS recordings drop into your storage account; pipe through Azure AI Speech batch transcription + Foundry sentiment analysis for post-call analytics.

## Step 7 — Ship to Container Apps with managed identity

Same as the previous post: `az containerapp create` with `--user-assigned` for AAD, `--ingress external`, scaling to 50 replicas. WebSocket sticky session is automatic in Container Apps.

## Pitfalls

- **EventGrid validation handshake**: must respond to `SubscriptionValidationEvent` with the validation code or the topic stays unverified.
- **MediaStreamingOptions** requires `EnableBidirectional = true` for the new GA bidirectional path; without it you get the legacy one-way stream, where audio flows only from the call to your service.
- **Audio format**: ACS bidirectional streaming carries mono PCM only, at 16kHz or 24kHz (`AudioFormat.Pcm16KMono` or `AudioFormat.Pcm24KMono`). If your downstream is 8kHz mu-law, transcode on the fly.
- **Concurrent call quota** starts at 100 — request increases via support.
- **PSTN ingress costs** ~$0.014/min in addition to Voice Live; cheaper than Twilio in most regions but watch billing.
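
For the audio-format pitfall, here is a minimal sketch of transcoding 24kHz 16-bit mono PCM to 8kHz G.711 mu-law. It assumes little-endian samples and downsamples by averaging each group of three samples, which is a crude low-pass filter; a production pipeline would likely use a proper resampler. `Pcm24kToMuLaw8k` and `LinearToMuLaw` are hypothetical helper names.

```csharp
using System;

// Encode one 16-bit linear sample as an 8-bit G.711 mu-law byte.
static byte LinearToMuLaw(short sample)
{
    const int Bias = 0x84;   // standard mu-law bias (132)
    const int Clip = 32635;  // largest magnitude before the bias would overflow

    int sign = (sample >> 8) & 0x80;
    int magnitude = sample < 0 ? -sample : sample;
    if (magnitude > Clip) magnitude = Clip;
    magnitude += Bias;

    // Find the segment (exponent): position of the highest set bit, from bit 14 down to bit 7.
    int exponent = 7;
    for (int mask = 0x4000; (magnitude & mask) == 0 && exponent > 0; mask >>= 1) exponent--;

    int mantissa = (magnitude >> (exponent + 3)) & 0x0F;
    return (byte)(~(sign | (exponent << 4) | mantissa) & 0xFF);
}

// Convert 24 kHz PCM16 mono (little-endian) to 8 kHz mu-law by averaging every 3 samples.
static byte[] Pcm24kToMuLaw8k(byte[] pcm24k)
{
    int sampleCount = pcm24k.Length / 2;   // two bytes per sample
    int outCount = sampleCount / 3;        // 24 kHz -> 8 kHz; trailing samples are dropped
    var output = new byte[outCount];
    for (int i = 0; i < outCount; i++)
    {
        int sum = 0;
        for (int j = 0; j < 3; j++)
        {
            int offset = (i * 3 + j) * 2;
            sum += (short)(pcm24k[offset] | (pcm24k[offset + 1] << 8));
        }
        output[i] = LinearToMuLaw((short)(sum / 3));
    }
    return output;
}
```

Digital silence (all-zero PCM) encodes to `0xFF` in mu-law, which is a quick sanity check when debugging the downstream leg.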

## How CallSphere does this in production

CallSphere uses ACS for select Microsoft-aligned enterprise tenants in Healthcare (HIPAA + BAA on AAD), but our default voice path is OpenAI Realtime over Twilio Media Streams via FastAPI on port 8084. We've measured ACS bidirectional median latency at ~750ms vs ~650ms for Twilio + OpenAI, but ACS wins on data residency for EU customers. The platform runs 37 agents, 90+ tools, 115+ DB tables, and 6 verticals. Pricing is $149/$499/$1499, with a 14-day trial and a 22% affiliate program.

## FAQ

**Q: Can I use ACS without Voice Live?**
Yes — bridge to any STT+LLM+TTS stack. Voice Live just removes the integration tax.

**Q: How do I get an EU number?**
ACS supports number purchase in 30+ countries via the portal; pick country during `PurchasePhoneNumbers`.

**Q: Latency vs Twilio Media Streams?**
On East US 2 with Voice Live: ~750ms. With Twilio + OpenAI Realtime: ~650ms. ACS catches up in EU regions where Twilio adds a transatlantic hop.

**Q: How do I do warm transfers?**
Use `AddParticipantAsync` then `MuteParticipantAsync` for the AI; the live agent picks up the same call leg.

**Q: Can I record with redaction?**
Yes — pipe recordings through Azure AI Speech batch transcription with `profanity=Masked` + a Foundry redaction prompt for PII/PHI before storing.

## Sources

- [Call Automation AI Sample — ACS](https://learn.microsoft.com/en-us/azure/communication-services/samples/call-automation-ai)
- [AI in Azure Communication Services overview](https://learn.microsoft.com/en-us/azure/communication-services/concepts/ai)
- [Create next-gen voice agents with Voice Live and ACS — Microsoft Community Hub](https://techcommunity.microsoft.com/blog/azurecommunicationservicesblog/create-next-gen-voice-agents-with-azure-ais-voice-live-api-and-azure-communicati/4414735)
- [Azure-Samples/call-center-voice-agent-accelerator](https://github.com/Azure-Samples/call-center-voice-agent-accelerator)
- [Azure-Samples/communication-services-AI-customer-service-sample](https://github.com/Azure-Samples/communication-services-AI-customer-service-sample)

