
Build an Elixir Phoenix Channels Voice Agent with a LiveView UI

Combine Phoenix Channels for streaming audio frames with LiveView for the chat UI. BEAM concurrency means one node can hold tens of thousands of voice sessions.

TL;DR — A Phoenix node on a 2-vCPU box can hold 50k idle voice sockets without breaking a sweat. Channels move audio bytes; LiveView paints the transcript. The OpenAI Realtime client lives in a per-session GenServer.

What you'll build

A Phoenix 1.7 app that opens a Channel for binary audio frames, runs each call in a supervised GenServer that talks to OpenAI Realtime via websockex, and uses LiveView to render the live transcript. Pipe audio in, see the answer appear word-by-word, hear it back in the browser.

Prerequisites

  1. Elixir 1.17+, Erlang/OTP 27+.
  2. mix phx.new voice --no-ecto --live.
  3. {:websockex, "~> 0.4.3"} in mix.exs.
  4. OPENAI_API_KEY in your runtime config.
  5. A browser that supports MediaRecorder (any modern browser).
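Item 4 can live in `config/runtime.exs` so releases read the key at boot; a minimal sketch (the `:voice` app name matches what `mix phx.new voice` generates):

```elixir
# config/runtime.exs — fail fast at boot if the key is missing,
# rather than at the first call inside Voice.OpenAIClient.
import Config

config :voice, :openai_api_key,
  System.get_env("OPENAI_API_KEY") ||
    raise("OPENAI_API_KEY is not set")
```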

Architecture

```mermaid
flowchart LR
  B[Browser MediaRecorder] -- WS audio --> C[Phoenix Channel]
  C -- cast frames --> G[Per-call GenServer]
  G -- WebSocket --> O[OpenAI Realtime]
  O -- delta --> G
  G -- broadcast --> L[LiveView Transcript]
```

Step 1 — Boot the channel

In lib/voice_web/channels/voice_channel.ex:

```elixir
defmodule VoiceWeb.VoiceChannel do
  use Phoenix.Channel

  def join("voice:" <> session_id, _params, socket) do
    {:ok, pid} = Voice.Session.start_link(session_id, self())
    {:ok, assign(socket, :session, pid)}
  end

  # Binary frames arrive as {:binary, payload} — no base64 overhead.
  def handle_in("audio", {:binary, payload}, socket) do
    Voice.Session.send_audio(socket.assigns.session, payload)
    {:noreply, socket}
  end
end
```

Wire it into the user socket:

```elixir
defmodule VoiceWeb.UserSocket do
  use Phoenix.Socket

  channel "voice:*", VoiceWeb.VoiceChannel

  def connect(_params, socket, _connect_info), do: {:ok, socket}
  def id(_socket), do: nil
end
```

Hear it before you finish reading

Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.

Try Live Demo →

Step 2 — The per-call GenServer

```elixir
defmodule Voice.Session do
  use GenServer
  require Logger

  def start_link(id, channel_pid),
    do: GenServer.start_link(__MODULE__, {id, channel_pid})

  def send_audio(pid, bytes), do: GenServer.cast(pid, {:audio, bytes})

  @impl true
  def init({id, channel}) do
    {:ok, oai} = Voice.OpenAIClient.start_link(self())
    {:ok, %{id: id, channel: channel, oai: oai}}
  end

  @impl true
  def handle_cast({:audio, bytes}, %{oai: oai} = s) do
    Voice.OpenAIClient.append(oai, bytes)
    {:noreply, s}
  end

  @impl true
  def handle_info({:openai, %{"type" => "response.audio.delta", "delta" => b64}}, s) do
    send(s.channel, {:audio_chunk, b64})
    {:noreply, s}
  end

  def handle_info({:openai, %{"type" => "response.text.delta", "delta" => t}}, s) do
    Phoenix.PubSub.broadcast(Voice.PubSub, "transcript:" <> s.id, {:token, t})
    {:noreply, s}
  end
end
```
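The session sends raw `{:audio_chunk, b64}` messages to the channel process, so `VoiceWeb.VoiceChannel` needs a matching `handle_info` clause to forward them to the browser. A minimal sketch — the `"audio"` event name is an assumption; pick whatever your client hook listens for:

```elixir
# In VoiceWeb.VoiceChannel — the session GenServer sends {:audio_chunk, b64};
# decode the base64 and push it down the socket as a binary frame.
def handle_info({:audio_chunk, b64}, socket) do
  push(socket, "audio", {:binary, Base.decode64!(b64)})
  {:noreply, socket}
end
```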

Step 3 — websockex client to OpenAI

```elixir
defmodule Voice.OpenAIClient do
  use WebSockex

  @url "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2025-06-03"

  def start_link(parent) do
    key = System.fetch_env!("OPENAI_API_KEY")

    WebSockex.start_link(@url, __MODULE__, %{parent: parent},
      extra_headers: [
        {"Authorization", "Bearer " <> key},
        {"OpenAI-Beta", "realtime=v1"}
      ]
    )
  end

  def append(pid, bytes) do
    msg =
      Jason.encode!(%{
        type: "input_audio_buffer.append",
        audio: Base.encode64(bytes)
      })

    WebSockex.send_frame(pid, {:text, msg})
  end

  def handle_frame({:text, raw}, %{parent: p} = s) do
    send(p, {:openai, Jason.decode!(raw)})
    {:ok, s}
  end
end
```

Still reading? Stop comparing — try CallSphere live.

CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.

Step 4 — LiveView transcript

```elixir
defmodule VoiceWeb.TranscriptLive do
  use VoiceWeb, :live_view

  def mount(%{"id" => id}, _session, socket) do
    # Subscribe only on the connected mount, not the initial static render.
    if connected?(socket) do
      Phoenix.PubSub.subscribe(Voice.PubSub, "transcript:" <> id)
    end

    {:ok, assign(socket, transcript: "", id: id)}
  end

  def handle_info({:token, t}, socket),
    do: {:noreply, assign(socket, :transcript, socket.assigns.transcript <> t)}

  def render(assigns) do
    ~H"""
    <%= @transcript %>
    """
  end
end
```

Step 5 — Browser hook

```javascript
let Hooks = {}

Hooks.Mic = {
  mounted() {
    let socket = new Phoenix.Socket("/socket")
    socket.connect()

    let chan = socket.channel("voice:" + this.el.dataset.id)
    chan.join()

    navigator.mediaDevices.getUserMedia({audio: true}).then(stream => {
      let rec = new MediaRecorder(stream, {mimeType: "audio/webm;codecs=opus"})

      rec.ondataavailable = (e) =>
        e.data.arrayBuffer().then(buf => {
          // phoenix.js sends binary payloads as ArrayBuffers
          chan.push("audio", buf)
        })

      rec.start(100) // emit a chunk every 100 ms
    })
  }
}
```

Common pitfalls

  • Single GenServer for all calls — kills concurrency; spawn one per session.
  • Sending raw binary as text frames — Phoenix Channels support {:binary, _}; use it.
  • No supervisor — wrap Voice.Session under a DynamicSupervisor.
  • Forgetting PubSub — LiveView and the GenServer aren't the same process.
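The last two pitfalls can be sketched together. `DemoSession` below stands in for `Voice.Session` (which would need a `start_link/1` accepting the `{id, channel_pid}` tuple so the default child spec works); in the real app the `DynamicSupervisor` and `Phoenix.PubSub` entries go in your application's children list:

```elixir
# Sketch: one supervised session process per call.
# In lib/voice/application.ex you would add to children:
#   {DynamicSupervisor, name: Voice.SessionSupervisor, strategy: :one_for_one},
#   {Phoenix.PubSub, name: Voice.PubSub},
defmodule DemoSession do
  use GenServer

  # start_link/1 so {DemoSession, arg} works as a child spec
  def start_link({id, channel_pid}),
    do: GenServer.start_link(__MODULE__, {id, channel_pid})

  @impl true
  def init(state), do: {:ok, state}
end

{:ok, _sup} =
  DynamicSupervisor.start_link(strategy: :one_for_one, name: Voice.SessionSupervisor)

# The channel's join/3 would call this instead of Voice.Session.start_link/2,
# so a crashed session restarts without taking the channel down with it.
{:ok, pid} =
  DynamicSupervisor.start_child(Voice.SessionSupervisor, {DemoSession, {"abc", self()}})
```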

How CallSphere does this in production

While CallSphere's voice stack is FastAPI + Pion, our status board uses Phoenix LiveView for live agent metrics across 37 agents and 90+ tools. BEAM is unbeatable for many-tiny-process fan-out. Run our Real Estate agent demo to see what 50k concurrent socket scaling looks like in practice.

FAQ

Can BEAM really hold 50k voice sockets? Yes — each is ~50KB process heap. RAM is the limit, not CPU.

Why websockex instead of mint? It's a higher-level supervised client; mint is fine if you want full control.

How do I add tools (functions)? Listen for response.function_call_arguments.done and reply with conversation.item.create.
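That tool-call flow can be sketched as a pure helper that turns a finished function call into the `conversation.item.create` reply. The event shape follows the Realtime API beta, but treat the field names as assumptions to verify against the current docs:

```elixir
# Sketch: build the reply for a completed tool call. In Voice.Session you
# would match {:openai, %{"type" => "response.function_call_arguments.done",
# "call_id" => id, "arguments" => args}}, run your tool, then send this map
# (JSON-encoded) back over the OpenAI socket.
defmodule ToolReply do
  # call_id comes from the done event; result_json is your tool's output,
  # already serialized to a JSON string by the caller.
  def build(call_id, result_json) do
    %{
      type: "conversation.item.create",
      item: %{
        type: "function_call_output",
        call_id: call_id,
        output: result_json
      }
    }
  end
end
```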

Audio format? g711_ulaw for telephony, pcm16 for browser.
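Picking the format is a `session.update` message sent right after connect (e.g. from a `handle_connect` callback in `Voice.OpenAIClient`); a minimal sketch of the payload, with field names per the Realtime API beta that you should verify against current docs:

```elixir
# Sketch: session.update selecting audio formats as the first frame
# after the Realtime socket opens. Swap both to "g711_ulaw" for telephony.
session_update = %{
  type: "session.update",
  session: %{
    input_audio_format: "pcm16",
    output_audio_format: "pcm16"
  }
}
```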

Where do I deploy? fly.io or Gigalixir; both support clustered Phoenix.


Try CallSphere AI Voice Agents

See how AI voice agents work for your industry. Live demo available — no signup required.