Build an Elixir Phoenix Channels Voice Agent with a LiveView UI
Combine Phoenix Channels for streaming audio frames with LiveView for the chat UI. BEAM concurrency means one node can hold tens of thousands of voice sessions.
TL;DR — A Phoenix node on a 2-vCPU box can hold 50k idle voice sockets without breaking a sweat. Channels move audio bytes; LiveView paints the transcript. The OpenAI Realtime client lives in a per-session GenServer.
What you'll build
A Phoenix 1.7 app that opens a Channel for binary audio frames, runs each call in a supervised GenServer that talks to OpenAI Realtime via websockex, and uses LiveView to render the live transcript. Pipe audio in, see the answer appear word-by-word, hear it back in the browser.
Prerequisites
- Elixir 1.17+, Erlang/OTP 27+.
- A fresh app: `mix phx.new voice --no-ecto --live`.
- `{:websockex, "~> 0.4.3"}` in `mix.exs`.
- `OPENAI_API_KEY` in your runtime config.
- A browser that supports `MediaRecorder` (any modern browser).
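Concretely, the dependency addition is a one-liner in the generated `mix.exs` (surrounding deps elided; versions beyond websockex are whatever the generator pinned):

```elixir
# mix.exs
defp deps do
  [
    # ...deps generated by phx.new...
    {:websockex, "~> 0.4.3"}, # supervised WebSocket client for OpenAI Realtime
    {:jason, "~> 1.4"}        # JSON encode/decode (usually already present)
  ]
end
```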
Architecture
```mermaid
flowchart LR
  B[Browser MediaRecorder] -- WS audio --> C[Phoenix Channel]
  C -- cast frames --> G[Per-call GenServer]
  G -- WebSocket --> O[OpenAI Realtime]
  O -- delta --> G
  G -- broadcast --> L[LiveView Transcript]
```
Step 1 — Boot the channel
In `lib/voice_web/channels/voice_channel.ex`:
```elixir
defmodule VoiceWeb.VoiceChannel do
  use Phoenix.Channel

  def join("voice:" <> session_id, _params, socket) do
    {:ok, pid} = Voice.Session.start_link(session_id, self())
    {:ok, assign(socket, :session, pid)}
  end

  def handle_in("audio", {:binary, payload}, socket) do
    Voice.Session.send_audio(socket.assigns.session, payload)
    {:noreply, socket}
  end
end
```
Wire it into the user socket:
```elixir
defmodule VoiceWeb.UserSocket do
  use Phoenix.Socket

  channel "voice:*", VoiceWeb.VoiceChannel

  def connect(_params, socket, _connect_info), do: {:ok, socket}
  def id(_socket), do: nil
end
```
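The socket also has to be mounted in the endpoint. A sketch, assuming the default generated `VoiceWeb.Endpoint`:

```elixir
# lib/voice_web/endpoint.ex — mount the user socket alongside the LiveView one
socket "/socket", VoiceWeb.UserSocket,
  websocket: true,
  longpoll: false
```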
Hear it before you finish reading
Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.
Step 2 — The per-call GenServer
```elixir
defmodule Voice.Session do
  use GenServer
  require Logger

  def start_link(id, channel_pid), do: GenServer.start_link(__MODULE__, {id, channel_pid})

  def send_audio(pid, bytes), do: GenServer.cast(pid, {:audio, bytes})

  @impl true
  def init({id, channel}) do
    {:ok, oai} = Voice.OpenAIClient.start_link(self())
    {:ok, %{id: id, channel: channel, oai: oai}}
  end

  @impl true
  def handle_cast({:audio, bytes}, %{oai: oai} = s) do
    Voice.OpenAIClient.append(oai, bytes)
    {:noreply, s}
  end

  @impl true
  def handle_info({:openai, %{"type" => "response.audio.delta", "delta" => b64}}, s) do
    send(s.channel, {:audio_chunk, b64})
    {:noreply, s}
  end

  def handle_info({:openai, %{"type" => "response.text.delta", "delta" => t}}, s) do
    Phoenix.PubSub.broadcast(Voice.PubSub, "transcript:" <> s.id, {:token, t})
    {:noreply, s}
  end
end
```
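The session `send`s `{:audio_chunk, b64}` straight to the channel process, so the channel needs a matching `handle_info` to relay audio down to the browser. A minimal sketch (the `"audio_out"` event name is an assumption, not part of the code above):

```elixir
# In VoiceWeb.VoiceChannel — relay OpenAI audio deltas to the browser.
# Forwarding the base64 string as-is keeps the channel simple; pushing
# a decoded {:binary, payload} frame would also work.
def handle_info({:audio_chunk, b64}, socket) do
  push(socket, "audio_out", %{audio: b64})
  {:noreply, socket}
end
```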
Step 3 — websockex client to OpenAI
```elixir
defmodule Voice.OpenAIClient do
  use WebSockex

  @url "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2025-06-03"

  def start_link(parent) do
    key = System.fetch_env!("OPENAI_API_KEY")

    WebSockex.start_link(@url, __MODULE__, %{parent: parent},
      extra_headers: [
        {"Authorization", "Bearer " <> key},
        {"OpenAI-Beta", "realtime=v1"}
      ]
    )
  end

  def append(pid, bytes) do
    msg =
      Jason.encode!(%{
        type: "input_audio_buffer.append",
        audio: Base.encode64(bytes)
      })

    WebSockex.send_frame(pid, {:text, msg})
  end

  def handle_frame({:text, raw}, %{parent: p} = s) do
    send(p, {:openai, Jason.decode!(raw)})
    {:ok, s}
  end
end
```
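Before streaming audio you'll usually want to configure the session: audio formats, turn detection. A hedged sketch of a `session.update` event, sent once after connecting (field values here are illustrative defaults; check the Realtime API reference for the current schema):

```elixir
# Sketch: call once after the socket is up, e.g. from handle_connect/2
# in Voice.OpenAIClient. The "configure" name is ours, not the API's.
def configure(pid) do
  msg =
    Jason.encode!(%{
      type: "session.update",
      session: %{
        input_audio_format: "pcm16",
        output_audio_format: "pcm16",
        turn_detection: %{type: "server_vad"}
      }
    })

  WebSockex.send_frame(pid, {:text, msg})
end
```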
Still reading? Stop comparing — try CallSphere live.
CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.
Step 4 — LiveView transcript
```elixir
defmodule VoiceWeb.TranscriptLive do
  use VoiceWeb, :live_view

  def mount(%{"id" => id}, _session, socket) do
    Phoenix.PubSub.subscribe(Voice.PubSub, "transcript:" <> id)
    {:ok, assign(socket, transcript: "", id: id)}
  end

  def handle_info({:token, t}, socket),
    do: {:noreply, assign(socket, :transcript, socket.assigns.transcript <> t)}

  def render(assigns) do
    ~H"""
    <div id="transcript" phx-hook="Mic" data-id={@id}>
      <%= @transcript %>
    </div>
    """
  end
end
```
Step 5 — Browser hook
```javascript
let Hooks = {}

Hooks.Mic = {
  mounted() {
    let socket = new Phoenix.Socket("/socket")
    socket.connect()

    let chan = socket.channel("voice:" + this.el.dataset.id)
    chan.join()

    navigator.mediaDevices.getUserMedia({audio: true}).then(stream => {
      // Note: Realtime expects pcm16 by default — transcode these opus
      // frames server-side or configure the session's input format.
      let rec = new MediaRecorder(stream, {mimeType: "audio/webm;codecs=opus"})

      rec.ondataavailable = (e) =>
        e.data.arrayBuffer().then(buf => {
          // Phoenix's JS client sends ArrayBuffer payloads as binary frames
          chan.push("audio", buf)
        })

      rec.start(100) // 100ms slices
    })
  }
}
```
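The reply path — actually hearing the answer — isn't shown above. A sketch of the decode-and-play half, assuming the server relays base64 pcm16 on an `audio_out` event (24 kHz is the Realtime default sample rate; `base64ToPCM` and `playChunk` are hypothetical helper names):

```javascript
// Decode a base64 pcm16 chunk into a Float32Array of samples in [-1, 1).
function base64ToPCM(b64) {
  const bytes = Uint8Array.from(atob(b64), c => c.charCodeAt(0))
  const view = new DataView(bytes.buffer)
  const out = new Float32Array(bytes.length / 2)
  for (let i = 0; i < out.length; i++) {
    out[i] = view.getInt16(i * 2, true) / 32768 // little-endian samples
  }
  return out
}

// Schedule a chunk for playback via Web Audio (browser only).
function playChunk(ctx, b64, sampleRate = 24000) {
  const samples = base64ToPCM(b64)
  const buffer = ctx.createBuffer(1, samples.length, sampleRate)
  buffer.getChannelData(0).set(samples)
  const src = ctx.createBufferSource()
  src.buffer = buffer
  src.connect(ctx.destination)
  src.start()
}

// Usage inside the hook, with an AudioContext created on mount:
// chan.on("audio_out", ({audio}) => playChunk(audioCtx, audio))
```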
Common pitfalls
- Single GenServer for all calls — kills concurrency; spawn one per session.
- Sending raw binary as text frames — Phoenix Channels support `{:binary, _}` payloads; use them.
- No supervisor — wrap `Voice.Session` under a `DynamicSupervisor`.
- Forgetting PubSub — LiveView and the GenServer aren't the same process.
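The supervisor pitfall deserves a sketch. Assuming a `{DynamicSupervisor, name: Voice.SessionSupervisor, strategy: :one_for_one}` child in the application tree (the name and `:temporary` restart are our choices, not requirements):

```elixir
# Call this from join/3 instead of Voice.Session.start_link/2 so a
# crashing call doesn't take the channel down with it.
def start_session(id, channel_pid) do
  DynamicSupervisor.start_child(
    Voice.SessionSupervisor,
    %{
      id: {Voice.Session, id},
      start: {Voice.Session, :start_link, [id, channel_pid]},
      # :temporary — a crashed call shouldn't restart against a dead channel pid
      restart: :temporary
    }
  )
end
```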
How CallSphere does this in production
While CallSphere's voice stack is FastAPI + Pion, our status board uses Phoenix LiveView for live agent metrics across 37 agents and 90+ tools. BEAM is unbeatable for many-tiny-process fan-out. Run our Real Estate agent demo to see what 50k concurrent socket scaling looks like in practice.
FAQ
Can BEAM really hold 50k voice sockets? Yes — each idle socket is roughly 50KB of process heap, so 50k is about 2.5GB. RAM is the limit, not CPU.
Why websockex instead of mint? It's a higher-level supervised client; mint is fine if you want full control.
How do I add tools (functions)? Listen for response.function_call_arguments.done and reply with conversation.item.create.
Audio format? g711_ulaw for telephony, pcm16 for browser.
Where do I deploy? fly.io or Gigalixir; both support clustered Phoenix.
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available -- no signup required.