By Sagar Shankaran, Founder of CallSphere
Combine Phoenix Channels for streaming audio frames with LiveView for the chat UI. BEAM concurrency means one node can hold tens of thousands of voice sessions.
Key takeaways
TL;DR — A Phoenix node on a 2-vCPU box can hold 50k idle voice sockets without breaking a sweat. Channels move audio bytes; LiveView paints the transcript. The OpenAI Realtime client lives in a per-session GenServer.
You will build a Phoenix 1.7 app that opens a Channel for binary audio frames, runs each call in a supervised GenServer that talks to OpenAI Realtime via websockex, and uses LiveView to render the live transcript. Pipe audio in, watch the answer appear word by word, and hear it back in the browser.
- `mix phx.new voice --no-ecto --live`.
- `{:websockex, "~> 0.4.3"}` in mix.exs.
- `OPENAI_API_KEY` in your runtime config.
- MediaRecorder (any modern browser).

flowchart LR
B[Browser MediaRecorder] -- WS audio --> C[Phoenix Channel]
C -- cast frames --> G[Per-call GenServer]
G -- WebSocket --> O[OpenAI Realtime]
O -- delta --> G
G -- broadcast --> L[LiveView Transcript]
In lib/voice_web/channels/voice_channel.ex:
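```elixir
defmodule VoiceWeb.VoiceChannel do
  use Phoenix.Channel

  # One session process per joined topic; self() is this channel process.
  def join("voice:" <> session_id, _params, socket) do
    {:ok, pid} = Voice.Session.start_link(session_id, self())
    {:ok, assign(socket, :session, pid)}
  end

  # Binary frames from the browser arrive tagged as {:binary, payload}.
  def handle_in("audio", {:binary, payload}, socket) do
    Voice.Session.send_audio(socket.assigns.session, payload)
    {:noreply, socket}
  end

  # Voice.Session sends {:audio_chunk, b64} to this process; forward it to the browser.
  # The "audio_chunk" event name is an assumption; use whatever your client listens for.
  def handle_info({:audio_chunk, b64}, socket) do
    push(socket, "audio_chunk", %{data: b64})
    {:noreply, socket}
  end
end
```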
Wire it into the user socket:
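```elixir
defmodule VoiceWeb.UserSocket do
  use Phoenix.Socket

  channel "voice:*", VoiceWeb.VoiceChannel

  # Accept every connection; add token checks here before going to production.
  def connect(_params, socket, _connect_info), do: {:ok, socket}

  # Anonymous sockets: returning nil means no per-user socket id.
  def id(_socket), do: nil
end
```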
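The per-call session GenServer:

```elixir
defmodule Voice.Session do
  use GenServer

  def start_link(id, channel_pid),
    do: GenServer.start_link(__MODULE__, {id, channel_pid})

  def send_audio(pid, bytes), do: GenServer.cast(pid, {:audio, bytes})

  @impl true
  def init({id, channel}) do
    # Each call owns its own OpenAI Realtime connection.
    {:ok, oai} = Voice.OpenAIClient.start_link(self())
    {:ok, %{id: id, channel: channel, oai: oai}}
  end

  @impl true
  def handle_cast({:audio, bytes}, %{oai: oai} = s) do
    Voice.OpenAIClient.append(oai, bytes)
    {:noreply, s}
  end

  # Audio deltas go back to the channel process for playback in the browser.
  @impl true
  def handle_info({:openai, %{"type" => "response.audio.delta", "delta" => b64}}, s) do
    send(s.channel, {:audio_chunk, b64})
    {:noreply, s}
  end

  # Text tokens feed the LiveView transcript. Spoken responses report their text
  # as response.audio_transcript.delta, so both event types are matched here.
  def handle_info({:openai, %{"type" => type, "delta" => t}}, s)
      when type in ["response.text.delta", "response.audio_transcript.delta"] do
    Phoenix.PubSub.broadcast(Voice.PubSub, "transcript:" <> s.id, {:token, t})
    {:noreply, s}
  end

  # Ignore every other server event (session.created and so on) instead of crashing.
  def handle_info({:openai, _other}, s), do: {:noreply, s}
end
```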
The OpenAI Realtime client:

```elixir
defmodule Voice.OpenAIClient do
  use WebSockex

  @url "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2025-06-03"

  def start_link(parent) do
    key = System.fetch_env!("OPENAI_API_KEY")

    WebSockex.start_link(@url, __MODULE__, %{parent: parent},
      extra_headers: [
        {"Authorization", "Bearer " <> key},
        {"OpenAI-Beta", "realtime=v1"}
      ]
    )
  end

  # Audio travels as Base64 inside a JSON text frame.
  def append(pid, bytes) do
    msg =
      Jason.encode!(%{
        type: "input_audio_buffer.append",
        audio: Base.encode64(bytes)
      })

    WebSockex.send_frame(pid, {:text, msg})
  end

  # Decode each server event and hand it to the owning Voice.Session.
  @impl true
  def handle_frame({:text, raw}, %{parent: p} = s) do
    send(p, {:openai, Jason.decode!(raw)})
    {:ok, s}
  end
end
```
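The client appends audio without telling the Realtime session which encoding the bytes use. A minimal sketch of the `session.update` event that declares it, assuming the beta (`realtime=v1`) event shape; the `Voice.RealtimeEvents` module and `session_update/1` function are hypothetical names, not part of the original app.

```elixir
defmodule Voice.RealtimeEvents do
  # Hypothetical helper module; the names here are illustrative only.
  # Audio formats accepted by the Realtime API (beta event shape assumed).
  @formats ~w(pcm16 g711_ulaw g711_alaw)

  # Builds the client event as a plain map; the caller JSON-encodes and sends it.
  def session_update(format) when format in @formats do
    %{
      "type" => "session.update",
      "session" => %{
        "input_audio_format" => format,
        "output_audio_format" => format
      }
    }
  end
end
```

Once the socket is connected, the map can be encoded with `Jason.encode!/1` and sent as a `{:text, json}` frame through `WebSockex.send_frame/2`, the same path `append/2` uses.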
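The LiveView that renders the transcript:

```elixir
defmodule VoiceWeb.TranscriptLive do
  use VoiceWeb, :live_view

  def mount(%{"id" => id}, _session, socket) do
    # Subscribe only on the connected (WebSocket) mount, not the static render.
    if connected?(socket), do: Phoenix.PubSub.subscribe(Voice.PubSub, "transcript:" <> id)
    {:ok, assign(socket, transcript: "", id: id)}
  end

  # Append each streamed token to the transcript.
  def handle_info({:token, t}, socket),
    do: {:noreply, assign(socket, :transcript, socket.assigns.transcript <> t)}

  # Minimal markup (an assumption): the element carries the Mic hook and the
  # session id that the hook reads through dataset.id.
  def render(assigns) do
    ~H"""
    <div id="mic" phx-hook="Mic" data-id={@id}>
      <%= @transcript %>
    </div>
    """
  end
end
```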
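The browser hook that captures the microphone:

```javascript
import {Socket} from "phoenix"

let Hooks = {}

Hooks.Mic = {
  mounted() {
    let socket = new Socket("/socket")
    socket.connect()

    let chan = socket.channel("voice:" + this.el.dataset.id)
    chan.join()

    navigator.mediaDevices.getUserMedia({audio: true}).then(stream => {
      // The Realtime API accepts pcm16 or G.711 input, so Opus/WebM chunks need
      // transcoding (or capture PCM directly) before they are appended.
      let rec = new MediaRecorder(stream, {mimeType: "audio/webm;codecs=opus"})

      rec.ondataavailable = (e) =>
        e.data.arrayBuffer().then(buf => {
          // Push the ArrayBuffer itself so phoenix.js sends a binary frame.
          chan.push("audio", buf)
        })

      rec.start(100) // 100ms slices
    })
  }
}
```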
- Phoenix Channels deliver binary payloads as `{:binary, _}`; use it.
- Start `Voice.Session` under a DynamicSupervisor.

While CallSphere's voice stack is FastAPI + Pion, our status board uses Phoenix LiveView for live agent metrics across 37 agents and 90+ tools. BEAM is unbeatable for many-tiny-process fan-out. Run our Real Estate agent demo to see what 50k concurrent socket scaling looks like in practice.
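The advice to run `Voice.Session` under a DynamicSupervisor can be sketched roughly as follows. `Voice.SessionSupervisor`, `start_session/3`, and the optional `session_mod` argument are hypothetical names introduced for illustration; the sketch assumes the session module exposes `start_link(id, channel_pid)` as shown earlier.

```elixir
defmodule Voice.SessionSupervisor do
  # Hypothetical supervisor; not part of the original app.
  use DynamicSupervisor

  def start_link(arg),
    do: DynamicSupervisor.start_link(__MODULE__, arg, name: __MODULE__)

  @impl true
  def init(_arg), do: DynamicSupervisor.init(strategy: :one_for_one)

  # session_mod defaults to Voice.Session and is a parameter only so the
  # sketch can be exercised with a stand-in module.
  def start_session(session_id, channel_pid, session_mod \\ Voice.Session) do
    spec = %{
      id: session_mod,
      start: {session_mod, :start_link, [session_id, channel_pid]},
      # A crashed call is not restarted; its upstream connection state is gone.
      restart: :temporary
    }

    DynamicSupervisor.start_child(__MODULE__, spec)
  end
end
```

To use it, add `Voice.SessionSupervisor` to the application's supervision tree and call `Voice.SessionSupervisor.start_session(session_id, self())` from `join/3`. A session started this way is linked to the supervisor rather than to the channel, so it would also need to monitor `channel_pid` and stop when the socket goes away.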
Can BEAM really hold 50k voice sockets? Yes. At roughly 50KB of process heap each, 50k sockets come to about 2.5GB, so RAM is the limit, not CPU.
Why websockex instead of mint? It's a higher-level supervised client; mint is fine if you want full control.
How do I add tools (functions)? Listen for response.function_call_arguments.done, reply with a conversation.item.create carrying a function_call_output item, then send response.create so the model continues.
Audio format? g711_ulaw for telephony, pcm16 for browser.
Where do I deploy? fly.io or Gigalixir; both support clustered Phoenix.
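For the tools question above, the reply to a finished function call can be sketched as a pure function that builds the outgoing events. This assumes the beta (`realtime=v1`) event shapes; `Voice.ToolReply` and `events/2` are hypothetical names, and `output` is whatever JSON string your tool produced.

```elixir
defmodule Voice.ToolReply do
  # Hypothetical helper; not part of the original app.
  # Takes the decoded "response.function_call_arguments.done" event plus the
  # tool result (already a JSON string) and returns the client events to send.
  def events(
        %{"type" => "response.function_call_arguments.done", "call_id" => call_id},
        output
      )
      when is_binary(output) do
    [
      # 1. Attach the tool result to the conversation.
      %{
        "type" => "conversation.item.create",
        "item" => %{
          "type" => "function_call_output",
          "call_id" => call_id,
          "output" => output
        }
      },
      # 2. Ask the model to continue now that the result is available.
      %{"type" => "response.create"}
    ]
  end
end
```

In `Voice.Session`, a `handle_info/2` clause matching that event type could run the tool, call `Voice.ToolReply.events/2`, and send each map as a JSON text frame through the OpenAI client.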
Written by
Sagar Shankaran· Founder, CallSphere
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.