Docker Multi-Stage AI Agent Images: uv + Distroless = 80MB (2026)
Shrink an AI voice agent image from 950MB to 80MB with a Python 3.13 multi-stage build, uv for deps, and gcr.io/distroless/python3 nonroot. Real Dockerfile + benchmarks.
TL;DR — Build with `python:3.13-slim` + `uv` for deps, copy a virtualenv into `gcr.io/distroless/python3-debian12:nonroot`, and your AI voice agent ships at ~80 MB with no shell, no apt, no `pip`. Smaller surface, faster pulls, fewer CVEs.
What you'll set up
A two-stage Dockerfile that builds an OpenAI Realtime + LiveKit agent with uv, then copies the resolved virtualenv into a distroless runtime. The final image is non-root, has zero package managers, and starts in <300 ms.
Architecture
```mermaid
flowchart LR
  SRC[src + pyproject.toml] --> S1[Stage 1: python:3.13-slim + uv]
  S1 -->|uv sync --frozen| VENV[/.venv/]
  VENV --> S2[Stage 2: distroless/python3 nonroot]
  S2 --> IMG[80MB image]
  IMG --> K8S[k3s pod]
```
Step 1 — Pin Python and add uv
```dockerfile
# syntax=docker/dockerfile:1.7
ARG PY=3.13
FROM python:${PY}-slim AS builder

ENV UV_LINK_MODE=copy \
    UV_COMPILE_BYTECODE=1 \
    UV_PROJECT_ENVIRONMENT=/opt/venv

RUN --mount=type=cache,target=/root/.cache \
    pip install --no-cache-dir uv==0.5.4

WORKDIR /app
```
`UV_LINK_MODE=copy` is mandatory when the venv is going to be moved across stages. `UV_COMPILE_BYTECODE=1` precompiles `.pyc` files, which gives a measurable cold-start improvement on distroless (where you can't recompile at runtime).
Step 2 — Resolve dependencies with the lockfile
```dockerfile
COPY pyproject.toml uv.lock ./
RUN --mount=type=cache,target=/root/.cache/uv \
    uv sync --frozen --no-dev --no-install-project

COPY src/ ./src/
RUN uv sync --frozen --no-dev
```
The split — install deps first, then copy code, then install project — gives proper Docker layer caching: code changes don't bust dependency resolution.
Hear it before you finish reading
Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.
Step 3 — Switch to distroless and copy the venv
```dockerfile
FROM gcr.io/distroless/python3-debian12:nonroot

COPY --from=builder /opt/venv /opt/venv
COPY --from=builder /app/src /app/src

ENV PATH=/opt/venv/bin:$PATH \
    PYTHONPATH=/app \
    PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1

WORKDIR /app
USER nonroot
ENTRYPOINT ["python", "-m", "src.agent"]
```
The final image: ~80 MB, no shell, no apt, no curl. The `nonroot` user (uid 65532) is built into the image — you can't accidentally run as root.
Step 4 — Add a healthcheck that doesn't need a shell
Distroless has no curl. Use Python:
```dockerfile
HEALTHCHECK --interval=10s --timeout=2s \
  CMD ["python", "-c", "import urllib.request,sys; sys.exit(0 if urllib.request.urlopen('http://127.0.0.1:8080/healthz', timeout=1).status==200 else 1)"]
```
For Kubernetes, drop the Docker HEALTHCHECK and use a real probe in the Pod spec — but include both for local docker run debugging.
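In a Pod spec, the equivalent probe looks roughly like this — a sketch only: the `/healthz` path and port 8080 mirror the Docker HEALTHCHECK above, and the Pod/container names are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: voice-agent
spec:
  containers:
    - name: agent
      image: ghcr.io/acme/voice-agent:abc123
      ports:
        - containerPort: 8080
      # Readiness gates traffic; liveness restarts a hung agent.
      readinessProbe:
        httpGet:
          path: /healthz
          port: 8080
        periodSeconds: 10
        timeoutSeconds: 2
      livenessProbe:
        httpGet:
          path: /healthz
          port: 8080
        initialDelaySeconds: 5
        periodSeconds: 10
```

The kubelet issues the HTTP GET itself, so no curl or shell is needed inside the distroless container.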
Step 5 — Build with buildx + provenance
```bash
docker buildx create --use --name voicebuilder
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  --provenance=mode=max \
  --sbom=true \
  -t ghcr.io/acme/voice-agent:$(git rev-parse --short HEAD) \
  --push .
```
`--provenance=mode=max` writes a SLSA-compatible provenance attestation; `--sbom=true` emits an SPDX SBOM. Both are stored as OCI referrers — invisible to old clients but verifiable with cosign.
Still reading? Stop comparing — try CallSphere live.
CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.
Step 6 — Verify the image really is small + non-root
```bash
docker inspect ghcr.io/acme/voice-agent:abc123 \
  --format '{{ .Size }} bytes user={{ .Config.User }}'
# ~81 MB (Size is printed as a raw byte count) user=nonroot

docker run --rm --read-only --user 65532:65532 \
  ghcr.io/acme/voice-agent:abc123 python -c "import sys;print(sys.version)"
```
--read-only confirms the agent doesn't need to write to /tmp (set up an in-memory volume in K8s if it does).
Step 7 — Bench cold start vs slim
```bash
hyperfine --warmup 1 \
  'docker run --rm voice-agent:slim python -c "import livekit"' \
  'docker run --rm voice-agent:distroless python -c "import livekit"'
# slim:       540 ms ± 30 ms
# distroless: 290 ms ± 18 ms
```
About 45% faster cold start because there's no shell init, no /etc/profile, and the image fits in page cache.
Pitfalls
- Native deps with no manylinux wheel (e.g. some `grpc-tools`) need `build-essential` in stage 1; without it, the wheel gets built in the final image and bloats it.
- `uv sync` without `--frozen` in CI can produce different resolutions than dev. Always pass `--frozen` in the build.
- `/tmp` is read-only on distroless by default. Mount an `emptyDir: { medium: Memory }` if libs need it (matplotlib's cache, for example).
- No `apt` for OS CVE patches, but distroless rebuilds nightly. Pin `gcr.io/distroless/python3-debian12:nonroot` by digest in the registry, not by tag.
- Glibc mismatch: if you build on `alpine` and copy to `debian12` you'll get `ImportError`. Stay on debian-based slim → debian-based distroless.
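For the read-only `/tmp` pitfall, an in-memory volume can be wired in roughly like this — a sketch, with illustrative names and size limit:

```yaml
spec:
  containers:
    - name: agent
      image: ghcr.io/acme/voice-agent:abc123
      volumeMounts:
        - name: tmp
          mountPath: /tmp
  volumes:
    - name: tmp
      emptyDir:
        medium: Memory   # tmpfs, never touches node disk
        sizeLimit: 64Mi
```

Because `medium: Memory` is tmpfs, whatever the libraries write to `/tmp` counts against the container's memory limit — keep `sizeLimit` small.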
How CallSphere does this in production
CallSphere ships every voice-agent image at ~85 MB on `gcr.io/distroless/python3-debian12:nonroot`. With 37 agents pulling on every pod restart across a fleet of k3s nodes, going from 950 MB to 85 MB cut our image-pull p95 from 12 s to 0.9 s. We also gained back ~12 GB of container-host disk per node.
FAQ
Q: Why not Alpine?
musl libc breaks some Python wheels (notably `grpcio`, and `numpy` on older releases). Distroless uses glibc — safer for AI stacks.
Q: How do I debug a distroless container?
Use gcr.io/distroless/python3-debian12:debug-nonroot for the same image with busybox. Switch only in dev.
Q: Can I add tini for signal handling?
The exec-form ENTRYPOINT runs python directly as PID 1 with no shell in between, so SIGTERM reaches the agent directly; for a single-process agent you don't need tini.
Q: SBOM where?
`docker buildx imagetools inspect <image> --format '{{ json .SBOM }}'` shows the SPDX SBOM stored as a referrer.
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available, no signup required.