By Sagar Shankaran, Founder of CallSphere
Shrink an AI voice agent image from 950MB to 80MB with a Python 3.13 multi-stage build, uv for deps, and gcr.io/distroless/python3 nonroot. Real Dockerfile + benchmarks.
Key takeaways
TL;DR: Build with python:3.13-slim + uv for deps, copy the virtualenv into gcr.io/distroless/python3-debian12:nonroot, and your AI voice agent ships at ~80 MB with no shell, no apt, no pip. Smaller surface, faster pulls, fewer CVEs.
A two-stage Dockerfile that builds an OpenAI Realtime + LiveKit agent with uv, then copies the resolved virtualenv into a distroless runtime. The final image is non-root, has zero package managers, and starts in <300 ms.
```mermaid
flowchart LR
    SRC[src + pyproject.toml] --> S1[Stage 1: python:3.13-slim + uv]
    S1 -->|uv sync --frozen| VENV[/.venv/]
    VENV --> S2[Stage 2: distroless/python3 nonroot]
    S2 --> IMG[80MB image]
    IMG --> K8S[k3s pod]
```
```dockerfile
ARG PY=3.13
FROM python:${PY}-slim AS builder

# Copy files out of the uv cache, precompile bytecode, and put the venv at a fixed path
ENV UV_LINK_MODE=copy \
    UV_COMPILE_BYTECODE=1 \
    UV_PROJECT_ENVIRONMENT=/opt/venv

RUN --mount=type=cache,target=/root/.cache \
    pip install --no-cache-dir uv==0.5.4

WORKDIR /app
```
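UV_LINK_MODE=copy tells uv to copy files out of its cache instead of hardlinking them. That matters here because the cache lives on a separate cache mount and the venv is later copied into another stage. UV_COMPILE_BYTECODE=1 precompiles .pyc files at install time, which gives a measurable cold-start improvement on distroless, where PYTHONDONTWRITEBYTECODE=1 and a read-only filesystem mean nothing compiled at runtime is ever cached.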
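One way to check that precompilation actually happened is to look for the cached .pyc next to each module. The sketch below is illustrative only: `missing_bytecode` and `precompile` are hypothetical helper names, and `precompile` uses the standard library's `compileall` as a rough stand-in for what the uv setting does at install time.

```python
import compileall
import importlib.util
from pathlib import Path


def missing_bytecode(root: str) -> list[str]:
    """Return .py files under root that have no cached .pyc for this interpreter."""
    missing = []
    for source in sorted(Path(root).rglob("*.py")):
        # cache_from_source maps foo.py to __pycache__/foo.cpython-XY.pyc
        cached = Path(importlib.util.cache_from_source(str(source)))
        if not cached.exists():
            missing.append(str(source))
    return missing


def precompile(root: str) -> bool:
    """Compile every module under root; True if all files compiled."""
    return bool(compileall.compile_dir(root, quiet=1))
```

Run against the venv's site-packages directory in the builder stage, an empty list from `missing_bytecode` suggests the runtime image will not need to compile anything on first import.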
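```dockerfile
# Dependencies first: this layer is reused until pyproject.toml or uv.lock changes
COPY pyproject.toml uv.lock ./
RUN --mount=type=cache,target=/root/.cache/uv \
    uv sync --frozen --no-dev --no-install-project

# Then the code, then the project itself
COPY src/ ./src/
RUN uv sync --frozen --no-dev
```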
The split — install deps first, then copy code, then install project — gives proper Docker layer caching: code changes don't bust dependency resolution.
```dockerfile
FROM gcr.io/distroless/python3-debian12:nonroot

COPY --from=builder /opt/venv /opt/venv
COPY --from=builder /app/src /app/src

ENV PATH=/opt/venv/bin:$PATH \
    PYTHONPATH=/app \
    PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1

WORKDIR /app
USER nonroot
ENTRYPOINT ["python", "-m", "src.agent"]
```
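The final image is ~80 MB with no shell, no apt, and no curl. The :nonroot tag already defaults to the nonroot user (uid 65532), so the explicit USER line only documents it; the container does not run as root unless you override the user at run time. One thing to verify before shipping: a copied venv only works if the runtime image provides the same Python minor version, at the same interpreter path, as the builder. python3-debian12 ships Debian 12's Python 3.11, so confirm that /opt/venv/bin/python resolves and imports your dependencies inside the final image.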
Distroless has no curl. Use Python:
```dockerfile
HEALTHCHECK --interval=10s --timeout=2s \
  CMD ["python", "-c", "import urllib.request,sys; sys.exit(0 if urllib.request.urlopen('http://127.0.0.1:8080/healthz', timeout=1).status==200 else 1)"]
```
Kubernetes ignores the Docker HEALTHCHECK, so define liveness and readiness probes in the Pod spec. Keeping the HEALTHCHECK in the image is still useful for local docker run debugging.
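The health check above assumes the agent answers GET /healthz on port 8080. A minimal sketch of such an endpoint, using only the standard library, might look like the following. `HealthHandler`, `start_health_server`, and `probe` are hypothetical names, not part of the original agent; `probe` mirrors the logic of the one-liner so it can be tested outside Docker.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":
            body = b"ok"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep probe traffic out of the agent's logs


def start_health_server(host="127.0.0.1", port=8080):
    """Serve /healthz on a daemon thread and return the server object."""
    server = ThreadingHTTPServer((host, port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def probe(url, timeout=1.0):
    """Return 0 if the endpoint answers 200, otherwise 1."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 0 if resp.status == 200 else 1
    except OSError:  # connection refused, timeout, or an HTTP error status
        return 1
```

Binding to 127.0.0.1 is enough for the in-container HEALTHCHECK. Kubelet httpGet probes connect to the pod IP, so in a cluster the server would need to listen on 0.0.0.0 instead.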
```bash
docker buildx create --use --name voicebuilder

docker buildx build \
  --platform linux/amd64,linux/arm64 \
  --provenance=mode=max \
  --sbom=true \
  -t ghcr.io/acme/voice-agent:$(git rev-parse --short HEAD) \
  --push .
```
--provenance=mode=max writes a SLSA-compatible provenance attestation; --sbom=true emits an SPDX SBOM. Both are attached to the image index as attestation manifests, so older clients simply ignore them. Tools such as cosign can then sign and verify the image they describe.
```bash
docker inspect ghcr.io/acme/voice-agent:abc123 \
  --format '{{ .Size }} bytes user={{ .Config.User }}'

# Override the ENTRYPOINT; otherwise the arguments are appended to "python -m src.agent"
docker run --rm --read-only --user 65532:65532 \
  --entrypoint python \
  ghcr.io/acme/voice-agent:abc123 -c "import sys;print(sys.version)"
```
--read-only confirms the agent doesn't need to write to /tmp (set up an in-memory volume in K8s if it does).
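If some dependency might need scratch space, it can be cheaper to find out at startup than mid-call. The following is a small sketch of such a check; `scratch_dir_writable` is a hypothetical helper, not something from the original agent.

```python
import tempfile


def scratch_dir_writable(path=None):
    """Return True if a temporary file can be created (and removed) under path."""
    try:
        target = path or tempfile.gettempdir()
        # The file is deleted automatically when the context manager exits
        with tempfile.NamedTemporaryFile(dir=target):
            return True
    except OSError:  # read-only filesystem, missing directory, permission denied
        return False
```

Logging the result once at boot makes a missing in-memory volume visible in the first line of the pod's logs.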
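```bash
# --entrypoint makes both images run the same command instead of appending to their ENTRYPOINT
hyperfine --warmup 1 \
  'docker run --rm --entrypoint python voice-agent:slim -c "import livekit"' \
  'docker run --rm --entrypoint python voice-agent:distroless -c "import livekit"'
```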
About 45% faster cold start because there's no shell init, no /etc/profile, and the image fits in page cache.
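To separate interpreter startup and import cost from container setup, the same measurement can be taken without Docker. This is a rough sketch under that assumption; `time_import` is a hypothetical helper, and its numbers are only comparable between runs on the same machine.

```python
import statistics
import subprocess
import sys
import time


def time_import(module, runs=5):
    """Median wall-clock seconds for a fresh interpreter to import module."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # A new process per run, so nothing is already cached in sys.modules
        subprocess.run([sys.executable, "-c", f"import {module}"], check=True)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```

Calling `time_import("livekit")` inside each image and subtracting it from the hyperfine figure gives an estimate of how much of the cold start is container overhead.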
- Packages with native extensions (such as grpc-tools) need build essentials in stage 1. Keep build-essential in the builder so that nothing has to be compiled in, or added to, the final image.
- uv sync without --frozen in CI can produce different lock results than dev. Always pass --frozen in the build.
- /tmp is not writable when the container runs with a read-only root filesystem. Mount an emptyDir{ medium: Memory } if libs (matplotlib cache!) need it.
- There is no apt for OS CVE patches, but distroless rebuilds nightly. Pin gcr.io/distroless/python3-debian12:nonroot by digest in the registry, not by tag.
- If you build on alpine and then copy to debian12, you'll get ImportError. Stay on debian-based slim → debian-based distroless.

CallSphere ships every voice-agent image at ~85 MB on gcr.io/distroless/python3-debian12:nonroot. With 37 agents pulling on every pod restart across a fleet of k3s nodes, going from 950 MB to 85 MB cut our image-pull p95 from 12 s to 0.9 s. We also gained back ~12 GB of container-host disk per node.
Q: Why not Alpine?
musl libc breaks some Python wheels (notably grpcio, numpy on older releases). Distroless uses glibc — safer for AI stacks.
Q: How do I debug a distroless container?
Use gcr.io/distroless/python3-debian12:debug-nonroot for the same image with busybox. Switch only in dev.
Q: Can I add tini for signal handling?
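Distroless does not ship an init process, so Python runs as PID 1. You don't need tini as long as the agent installs its own SIGTERM handler (PID 1 gets no default signal actions, so an unhandled SIGTERM is ignored) and does not spawn child processes that need reaping.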
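Whether or not an init such as tini is present, the container only stops promptly on docker stop if the Python process reacts to SIGTERM. A minimal sketch of that handling follows; the names are hypothetical, and the loop stands in for whatever the real agent does between checks.

```python
import os
import signal
import threading

# Set by the signal handler; the main loop polls it
shutdown_requested = threading.Event()


def _on_sigterm(signum, frame):
    shutdown_requested.set()


def install_sigterm_handler():
    """Must be called from the main thread, before the agent starts work."""
    signal.signal(signal.SIGTERM, _on_sigterm)


def run_until_stopped(poll_seconds=0.5):
    """Placeholder main loop: returns once SIGTERM has been received."""
    while not shutdown_requested.wait(poll_seconds):
        pass  # real work (accepting calls, heartbeats) would go here
```

An asyncio-based agent could use the event loop's `add_signal_handler` instead; the idea is the same.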
Q: SBOM where?
docker buildx imagetools inspect <image> --format '{{ json .SBOM }}' prints the SPDX SBOM attached to the image.
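The JSON from that command can be reduced to a package list for quick review. The sketch below makes an assumption about the output shape that should be checked against your own image: a single-platform result is taken to be an object with an "SPDX" key, and a multi-platform result an object keyed by platform whose values each carry "SPDX". `sbom_packages` is a hypothetical helper.

```python
import json


def sbom_packages(raw):
    """Return sorted (name, version) pairs from imagetools SBOM JSON."""
    doc = json.loads(raw)
    # Assumed shapes: {"SPDX": {...}} or {"linux/amd64": {"SPDX": {...}}, ...}
    entries = [doc] if "SPDX" in doc else list(doc.values())
    found = set()
    for entry in entries:
        if not isinstance(entry, dict):
            continue
        for pkg in entry.get("SPDX", {}).get("packages", []):
            # "name" and "versionInfo" are SPDX package fields
            found.add((pkg.get("name", ""), pkg.get("versionInfo", "")))
    return sorted(found)
```

Piping the inspect output into a script that calls `sbom_packages(sys.stdin.read())` gives a list that can be diffed between builds.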
Written by
Sagar Shankaran· Founder, CallSphere
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.