AI Infrastructure · 12 min read

Deploy a Voice Agent on Modal with Python and Serverless GPU

Modal turns a Python function into autoscaling serverless compute with optional GPU. Deploy a LiveKit Agent with one command and get pay-per-second billing.

TL;DR — Modal lets you put @app.function(gpu="A10G") on top of a regular Python function and ship it. Pair it with LiveKit Agents and you have a horizontally autoscaling voice cluster in under 100 lines.

What you'll build

A Modal-deployed LiveKit Agent that uses Deepgram for streaming STT, GPT-4o-mini for the LLM, and ElevenLabs for TTS, with an optional Whisper-large-v3 GPU pool for heavier transcription. Modal autoscales containers up to ~1000 concurrent calls; idle containers go to sleep in seconds, and you're billed only for active CPU/GPU time.

Prerequisites

  1. pip install modal and modal token new.
  2. LiveKit Cloud project (or self-hosted).
  3. Modal secrets for LiveKit + OpenAI + ElevenLabs.
  4. Python 3.11+.
  5. pip install livekit-agents livekit-plugins-openai livekit-plugins-elevenlabs.
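The three Modal secrets from item 3 can be created from the CLI. The secret names below match what the deployment code pulls in with modal.Secret.from_name; every value is a placeholder for your own credentials:

```shell
# Create the three named secrets the app references.
# Replace every placeholder value with your real credentials.
modal secret create livekit \
  LIVEKIT_URL=wss://your-project.livekit.cloud \
  LIVEKIT_API_KEY=your_livekit_key \
  LIVEKIT_API_SECRET=your_livekit_secret

modal secret create openai OPENAI_API_KEY=your_openai_key

modal secret create elevenlabs ELEVEN_API_KEY=your_elevenlabs_key
```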

Architecture

```mermaid
flowchart LR
  B[Browser/iOS/Android] -- WebRTC --> L[LiveKit Cloud]
  L -- dispatch --> M[Modal Function: VoiceAgent]
  M -- GPU --> W[Whisper]
  M -- API --> O[OpenAI gpt-4o-mini]
  M -- API --> E[ElevenLabs TTS]
```

Step 1 — Define the Modal app

agent.py:

```python
import modal

image = (
    modal.Image.debian_slim(python_version="3.11")
    .pip_install(
        "livekit-agents>=0.12",
        "livekit-plugins-openai",
        "livekit-plugins-silero",
        "livekit-plugins-elevenlabs",
        "livekit-plugins-deepgram",
    )
)

app = modal.App("callsphere-voice", image=image)

secrets = [
    modal.Secret.from_name("livekit"),     # LIVEKIT_URL / API_KEY / API_SECRET
    modal.Secret.from_name("openai"),      # OPENAI_API_KEY
    modal.Secret.from_name("elevenlabs"),  # ELEVEN_API_KEY
]
```

Hear it before you finish reading

Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.

Try Live Demo →

Step 2 — Write the LiveKit entrypoint

```python
from livekit.agents import (
    AutoSubscribe,
    JobContext,
    WorkerOptions,
    cli,
    llm,
)
from livekit.agents.voice_assistant import VoiceAssistant
from livekit.plugins import deepgram, elevenlabs, openai, silero


async def entrypoint(ctx: JobContext):
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    initial_ctx = llm.ChatContext().append(
        role="system",
        text=(
            "You are CallSphere's salon receptionist. "
            "Keep responses short and warm."
        ),
    )

    assistant = VoiceAssistant(
        vad=silero.VAD.load(),
        stt=deepgram.STT(model="nova-3"),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=elevenlabs.TTS(voice="Rachel"),
        chat_ctx=initial_ctx,
    )
    assistant.start(ctx.room)
    await assistant.say(
        "Hi, this is CallSphere. How can I help?",
        allow_interruptions=True,
    )
```

Step 3 — Wrap in a Modal Function

```python
@app.function(
    image=image,
    secrets=secrets,
    timeout=60 * 60,      # 1 hour max per call
    cpu=2.0,
    memory=2048,
    min_containers=1,     # warm pool
    max_containers=200,
)
def run_worker():
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```

Step 4 — GPU split for STT (optional)

Heavy Whisper inference belongs on its own autoscaling GPU pool:

```python
@app.cls(
    image=image.pip_install("faster-whisper"),  # GPU pool needs faster-whisper on top of the base image
    secrets=secrets,
    gpu="A10G",
    min_containers=0,
)
class WhisperGPU:
    @modal.enter()
    def load_model(self):
        # Load the model once per container, not once per call.
        from faster_whisper import WhisperModel

        self.model = WhisperModel("large-v3", device="cuda", compute_type="float16")

    @modal.method()
    def transcribe(self, audio_bytes: bytes) -> str:
        import io

        # faster-whisper expects a path or file-like object, not raw bytes.
        segments, _ = self.model.transcribe(io.BytesIO(audio_bytes), beam_size=1)
        return " ".join(s.text for s in segments)
```

The voice worker calls WhisperGPU().transcribe.remote(...); Modal autoscales the GPU pool independently.
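If you batch recorded audio into this pool rather than streaming it, it helps to split long PCM buffers before each transcribe.remote(...) call so chunks fan out across containers. A minimal sample-aligned chunker (plain Python; the function name and defaults are my own, assuming 16 kHz / 16-bit mono PCM):

```python
def chunk_pcm(audio: bytes, seconds: float = 30.0,
              sample_rate: int = 16_000, sample_width: int = 2) -> list[bytes]:
    """Split raw mono PCM into sample-aligned chunks of at most `seconds` each."""
    bytes_per_second = sample_rate * sample_width
    step = int(bytes_per_second * seconds)
    step -= step % sample_width  # never split a sample across chunks
    return [audio[i:i + step] for i in range(0, len(audio), step)]


# 75 s of 16 kHz / 16-bit mono audio -> chunks of 30 s, 30 s, and 15 s
chunks = chunk_pcm(bytes(75 * 16_000 * 2))
```

Each chunk can then be passed to WhisperGPU().transcribe.remote(...) independently, letting Modal scale the GPU pool with the backlog.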

Step 5 — Deploy

```bash
modal deploy agent.py
```

Still reading? Stop comparing — try CallSphere live.

CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.

You'll see a URL for the function dashboard. The worker connects outbound to LiveKit using the credentials in your livekit secret and registers itself with your project; LiveKit then dispatches incoming rooms to it automatically.

Step 6 — Local dev loop

```bash
modal run agent.py::run_worker
```

This executes the same function as an ephemeral Modal app, streaming logs to your terminal and reusing your Modal secrets so there's no local .env to manage.

Common pitfalls

  • Cold starts on GPU — pre-warm with min_containers=1 for production.
  • Bundling secrets in code — always use modal.Secret.from_name.
  • Long-running container leaks — set timeout so a hung agent dies.
  • Pip-install on every cold start — bake into image so it's cached.

How CallSphere does this in production

CallSphere uses Modal for off-hours batch tasks like nightly call summarization with Whisper-large — it's the cheapest GPU surface for spiky workloads. Our 24/7 voice plane is on dedicated k3s (Pion + FastAPI) for predictable latency. 37 agents across 6 verticals, HIPAA + SOC 2.

FAQ

Cost for 10k calls/day? ~$120 Modal + ~$200 LLM/STT/TTS APIs.
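Back-of-envelope behind that Modal figure — the average call length and per-core-second rate below are illustrative assumptions for a sketch, not Modal's published pricing:

```python
calls_per_day = 10_000
avg_call_seconds = 180            # assumed 3-minute average call
cores = 2.0                       # matches cpu=2.0 on run_worker
rate_per_core_second = 0.0000375  # illustrative $/core-second, not a real price list

core_seconds = calls_per_day * avg_call_seconds * cores
daily_cost = core_seconds * rate_per_core_second  # same ballpark as the figure above
```

Because billing is per active second, shorter calls or a lower warm-pool floor move this number roughly linearly.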

Cold start on CPU? ~1.5s; on GPU ~6s with model warmup.

Can I use Pipecat instead of LiveKit? Yes — Pipecat has an official Modal guide.

Modal vs RunPod? Modal scales to zero in seconds; RunPod cheaper for always-on.

Logs and metrics? Built-in dashboard + modal app logs.


Try CallSphere AI Voice Agents

See how AI voice agents work for your industry. Live demo available -- no signup required.