---
title: "Build a Voice Agent on Vertex AI Agent Builder with Gemini Live (2026)"
description: "Stand up a Gemini-powered voice agent with Vertex AI Agent Builder (now Gemini Enterprise Agent Platform): phone gateway, ADK code-first agent, and Agent Engine runtime in under 200 lines."
canonical: https://callsphere.ai/blog/vw5h-build-voice-agent-vertex-ai-agent-builder-gemini
category: "AI Voice Agents"
tags: ["GCP", "Vertex AI", "Gemini", "Agent Builder", "Tutorial"]
author: "CallSphere Team"
published: 2026-03-24T00:00:00.000Z
updated: 2026-05-07T16:30:06.352Z
---

# Build a Voice Agent on Vertex AI Agent Builder with Gemini Live (2026)

> Stand up a Gemini-powered voice agent with Vertex AI Agent Builder (now Gemini Enterprise Agent Platform): phone gateway, ADK code-first agent, and Agent Engine runtime in under 200 lines.

> **TL;DR**: Vertex AI Agent Builder (rebranded "Gemini Enterprise Agent Platform" at Cloud Next 2026) gives you the ADK code-first kit, the Agent Engine managed runtime, and a built-in phone gateway with TTS/STT in 220+ voices and 40+ languages. You define an agent in Python, deploy it with a short script, and attach a phone number.

## What you'll build

A code-first voice agent built with the Agent Development Kit (ADK), backed by `gemini-2.5-flash` for reasoning and Chirp 3 HD voices for TTS. The agent has one tool (`lookup_appointment`) backed by Firestore, runs on Agent Engine (managed), and answers a real PSTN number through Conversational Agents Phone Gateway.

## Prerequisites

1. GCP project with Vertex AI + Conversational Agents APIs enabled.
2. `gcloud` CLI authenticated, billing enabled.
3. Python 3.11 with `google-cloud-aiplatform>=1.85`, `google-adk>=0.5`, and `google-cloud-firestore` (imported by the tool in Step 1).
4. A Firestore database in Native mode for the appointments tool.

## Architecture

```mermaid
flowchart TD
  PSTN[Caller PSTN] --> CXP[Conversational Agents Phone Gateway]
  CXP -->|Chirp 3 STT| AE[Agent Engine Runtime]
  AE -->|ADK agent| GEM[gemini-2.5-flash]
  AE -->|tool| FS[(Firestore appointments)]
  AE -->|text reply| TTS[Chirp 3 HD TTS]
  TTS --> CXP
  CXP --> PSTN
```

## Step 1 — Define the agent with ADK

```python
# agent.py
from google.adk.agents import Agent
from google.adk.tools import FunctionTool
from google.cloud import firestore

db = firestore.Client()

def lookup_appointment(patient_id: str) -> dict:
    """Returns the next appointment for the given patient_id."""
    doc = db.collection("appointments").document(patient_id).get()
    return doc.to_dict() or {"error": "not found"}

root_agent = Agent(
    name="reception_agent",
    model="gemini-2.5-flash",
    instruction=(
        "You are a friendly receptionist. Confirm the patient's name, "
        "look up their appointment, and read it back. Keep replies short."
    ),
    tools=[FunctionTool(func=lookup_appointment)],
)
```
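
Because `lookup_appointment` only touches Firestore through `collection(...).document(...).get()`, its found and not-found paths can be checked without a live database. The following is a minimal sketch, assuming the tool is rewritten to take the client as a parameter. `FakeClient`, `FakeCollection`, `FakeDocRef`, `FakeSnapshot`, and `lookup_appointment_with` are hypothetical stand-ins for illustration, not part of ADK or the Firestore SDK.

```python
# Hypothetical in-memory stand-ins for the three Firestore calls the tool uses.
class FakeSnapshot:
    def __init__(self, data):
        self._data = data

    def to_dict(self):
        # Mirrors a Firestore snapshot: None when the document does not exist.
        return self._data


class FakeDocRef:
    def __init__(self, data):
        self._data = data

    def get(self):
        return FakeSnapshot(self._data)


class FakeCollection:
    def __init__(self, docs):
        self._docs = docs

    def document(self, doc_id):
        return FakeDocRef(self._docs.get(doc_id))


class FakeClient:
    def __init__(self, collections):
        self._collections = collections

    def collection(self, name):
        return FakeCollection(self._collections.get(name, {}))


def lookup_appointment_with(db, patient_id: str) -> dict:
    """Same logic as lookup_appointment, with the client injected for testing."""
    doc = db.collection("appointments").document(patient_id).get()
    return doc.to_dict() or {"error": "not found"}


fake_db = FakeClient({"appointments": {"p-001": {"date": "2026-06-01", "time": "09:30"}}})
found = lookup_appointment_with(fake_db, "p-001")
missing = lookup_appointment_with(fake_db, "p-999")
```

Note that the `or` fallback also treats an existing but empty document as "not found", which may or may not be what you want for your data.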

## Step 2 — Test locally with the ADK dev UI

```bash
pip install google-adk
adk web
# opens a chat UI at http://localhost:8000
```

The dev UI shows the full reasoning trace, tool calls, and lets you swap the model in real time.

## Step 3 — Deploy to Agent Engine (managed runtime)

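```python
# deploy.py
import vertexai
from vertexai import agent_engines

from agent import root_agent

# Placeholder values: substitute your own project, region, and staging bucket.
vertexai.init(
    project="your-project-id",
    location="us-central1",
    staging_bucket="gs://your-staging-bucket",
)

remote = agent_engines.create(
    agent_engine=root_agent,
    requirements=["google-adk>=0.5", "google-cloud-firestore"],
    display_name="reception-agent",
)
print(remote.resource_name)  # paste this into the console in Step 4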
```

Run `gcloud auth application-default login && python deploy.py`. Agent Engine builds a container, pushes it to Artifact Registry, and returns a versioned endpoint.

## Step 4 — Attach a phone number via Conversational Agents

In the Conversational Agents console (formerly Dialogflow CX), create a new agent, choose **Use a deployed Agent Engine endpoint**, paste the resource name, then under **Manage → Integrations → Phone Gateway** click **Configure new number** and pick a country.

The gateway handles SIP, codec negotiation, inbound speech-to-text with Chirp 3 (server-side VAD with a 0.6s end-of-speech timeout), outbound text-to-speech with Chirp 3 HD, barge-in, and DTMF passthrough. No code is needed on your side.

## Step 5 — Configure voice and turn-taking

In the agent's **Speech and IVR settings**, pick:

- STT: `chirp_3` model with `use_enhanced=true`
- TTS: voice `en-US-Chirp3-HD-Charon` (or `en-US-Studio-O` for Studio voices)
- End-of-speech timeout: `600ms` (default is too aggressive for elderly callers)
- Barge-in: enabled
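
To make the end-of-speech setting concrete, here is a minimal sketch of how a silence-based timeout decides that a caller has finished speaking. It is an illustration only, not how Chirp 3 or the Phone Gateway is implemented. The 20 ms frame size and the `end_of_speech_index` helper are assumptions.

```python
from typing import Optional, Sequence

FRAME_MS = 20  # assumed frame duration; a real gateway may use a different value


def end_of_speech_index(voiced: Sequence[bool], timeout_ms: int = 600) -> Optional[int]:
    """Return the index of the frame at which the turn would be closed.

    `voiced` holds one voice-activity flag per frame. The turn ends once
    speech has started and `timeout_ms` of consecutive silence follows it.
    Returns None if the timeout is never reached.
    """
    frames_needed = timeout_ms // FRAME_MS
    heard_speech = False
    silent_run = 0
    for i, is_voiced in enumerate(voiced):
        if is_voiced:
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            silent_run += 1
            if silent_run >= frames_needed:
                return i
    return None


# 200 ms of speech, a 400 ms pause, 200 ms more speech, then 800 ms of silence.
frames = [True] * 10 + [False] * 20 + [True] * 10 + [False] * 40
short_timeout = end_of_speech_index(frames, timeout_ms=300)  # closes during the pause
long_timeout = end_of_speech_index(frames, timeout_ms=600)   # waits for the final silence
```

With a 300 ms timeout the 400 ms pause ends the turn at frame 24, cutting the caller off mid-sentence. With 600 ms the turn closes at frame 69, after the caller has finished. The trade-off is that every extra millisecond of timeout is added to the response latency.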

## Step 6 — Add Vertex AI Search for RAG

If you need a knowledge base, create a Vertex AI Search data store over a GCS bucket of your help docs and add it as a sub-agent or as an ADK `VertexAiSearchTool`:

```python
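from google.adk.tools import VertexAiSearchTool

from agent import root_agent  # the agent defined in Step 1

# Full resource name of the data store; replace the project number and ID with yours.
search = VertexAiSearchTool(
    data_store_id="projects/123/locations/global/collections/default_collection/dataStores/help-docs"
)
root_agent.tools.append(search)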
```

## Step 7 — Stream events for analytics

Agent Engine emits Cloud Logging events for every turn, every tool call, and every model response. Pipe them into BigQuery via a Logs Router sink for dashboards.
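
Once rows are landing in BigQuery, a first dashboard usually groups events by session. As a rough sketch of that aggregation in Python, assuming each exported entry has been flattened to a dict: the field names `session_id`, `event_type`, and `latency_ms` are hypothetical and are not the real log schema, so adjust them to whatever your sink actually writes. The sample rows are made up.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, List


def summarize_sessions(entries: Iterable[dict]) -> Dict[str, dict]:
    """Group log entries by session and report event counts and median latency."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    latencies: Dict[str, List[float]] = defaultdict(list)
    for entry in entries:
        sid = entry["session_id"]  # hypothetical field name
        counts[sid][entry["event_type"]] += 1  # hypothetical field name
        if "latency_ms" in entry:  # hypothetical field name
            latencies[sid].append(entry["latency_ms"])
    return {
        sid: {
            "events": dict(by_type),
            "median_latency_ms": median(latencies[sid]) if latencies[sid] else None,
        }
        for sid, by_type in counts.items()
    }


# Made-up sample rows, for illustration only.
sample = [
    {"session_id": "s1", "event_type": "turn", "latency_ms": 800},
    {"session_id": "s1", "event_type": "tool_call", "latency_ms": 120},
    {"session_id": "s1", "event_type": "turn", "latency_ms": 900},
    {"session_id": "s2", "event_type": "turn"},
]
summary = summarize_sessions(sample)
```

In production the same grouping would more likely be a SQL query over the exported table; the Python version is just easier to unit test.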

## Pitfalls

- **Phone Gateway numbers are US-only** as of May 2026 (Canada coming Q3). Use SIP trunking via your own carrier for other regions.
- **Agent Engine cold-start** is ~3s on first call after idle; set `min_instances=1` for production.
- **Chirp 3 HD voices** add ~200ms vs Studio. Use Studio voices when latency budget is tight.
- **Free trial credit** for Vertex AI is capped at $300; Agent Engine billing starts immediately at $0.0001/request plus compute time.
- **ADK + Firestore quotas**: 10k document reads/sec is the soft cap; cache hot patient lookups in Memorystore.
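
The caching advice in the last bullet can be sketched as a small read-through cache with a time-to-live. This in-process dictionary is only a stand-in for Memorystore. `TTLCache`, `fetch_from_firestore`, and the 60-second TTL are hypothetical choices for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Read-through cache: serve a stored value until it is older than ttl_seconds."""

    def __init__(self, loader: Callable[[str], dict], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, key: str) -> dict:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # fresh hit: no backend read
        value = self._loader(key)  # miss or expired: read through to the backend
        self._entries[key] = (now, value)
        return value


# Hypothetical loader that counts how often the backend is actually read.
reads = {"count": 0}


def fetch_from_firestore(patient_id: str) -> dict:
    reads["count"] += 1
    return {"patient_id": patient_id, "date": "2026-06-01"}


fake_now = [0.0]  # controllable clock so the example is deterministic
cache = TTLCache(fetch_from_firestore, ttl_seconds=60.0, clock=lambda: fake_now[0])

first = cache.get("p-001")   # miss: one backend read
second = cache.get("p-001")  # hit: served from the cache
fake_now[0] = 61.0
third = cache.get("p-001")   # expired: second backend read
```

Three lookups cost two backend reads here. A short TTL also limits how long a rescheduled appointment could be read back stale.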

## How CallSphere does this in production

CallSphere runs OpenAI Realtime on FastAPI :8084 for Healthcare because GCP's Phone Gateway didn't support our HIPAA chain of custody until late 2025. For our 6 verticals (Healthcare, Multi-Family, Salons, Behavioral, Hospitality, Real Estate), we keep Gemini 2.5 Flash as a fallback model behind our 90+ tools — primarily for non-PHI workloads where its 1M context lets us pass entire CRM histories. 37 agents, 115+ DB tables. Pricing: $149/$499/$1499, 14-day trial, 22% affiliate.

## FAQ

**Q: ADK vs Agent Studio (low-code)?**
Use ADK for code-first teams that want git, tests, CI. Use Agent Studio for non-engineers and rapid prototyping. They share the same runtime.

**Q: Gemini 2.5 Flash vs Pro for voice?**
Flash is the right default for voice — TTFT is ~300ms vs ~700ms on Pro. Save Pro for tool-heavy reasoning loops.

**Q: How does this compare to Dialogflow CX classic?**
Conversational Agents (the new console) replaces both old Dialogflow CX and Agent Builder. ADK is what you write; CX flows are still available for deterministic IVR.

**Q: What's the latency target?**
Voice-to-voice ~700-900ms with Chirp 3 + Flash on `us-central1`.

**Q: Can I bring my own LLM?**
Yes — ADK's model param accepts any Vertex Model Garden or LiteLLM-compatible endpoint, including Claude on Vertex.

## Sources

- [Vertex AI Agent Builder overview — Google Cloud](https://docs.cloud.google.com/agent-builder/overview)
- [Agent Development Kit (ADK) documentation](https://google.github.io/adk-docs/)
- [Conversational Agents Phone Gateway](https://docs.cloud.google.com/dialogflow/cx/docs/concept/integration/phone-gateway)
- [Building AI Agents with Vertex AI Agent Builder — Google Codelab](https://codelabs.developers.google.com/devsite/codelabs/building-ai-agents-vertexai)
- [More ways to build and scale AI agents with Vertex AI Agent Builder — Google Cloud Blog](https://cloud.google.com/blog/products/ai-machine-learning/more-ways-to-build-and-scale-ai-agents-with-vertex-ai-agent-builder)

---

Source: https://callsphere.ai/blog/vw5h-build-voice-agent-vertex-ai-agent-builder-gemini
