Build a Voice Agent on Vertex AI Agent Builder with Gemini Live (2026)
Stand up a Gemini-powered voice agent with Vertex AI Agent Builder (now Gemini Enterprise Agent Platform). Phone gateway, ADK code-first agent, Cloud Run runtime — under 200 lines.
TL;DR — Vertex AI Agent Builder (rebranded "Gemini Enterprise Agent Platform" at Cloud Next 2026) gives you the ADK code-first kit, Agent Engine managed runtime, and a built-in phone gateway with TTS/STT in 220+ voices and 40+ languages. You write a Python class, deploy with one command, attach a phone number, done.
What you'll build
A code-first voice agent built with the Agent Development Kit (ADK), backed by gemini-2.5-flash for reasoning and Chirp 3 HD voices for TTS. The agent has one tool (lookup_appointment) backed by Firestore, runs on Agent Engine (managed), and answers a real PSTN number through Conversational Agents Phone Gateway.
Prerequisites
- GCP project with Vertex AI + Conversational Agents APIs enabled.
- `gcloud` CLI authenticated, billing enabled.
- Python 3.11 with `google-cloud-aiplatform>=1.85`, `google-adk>=0.5`.
- A Firestore database in Native mode for the appointments tool.
Architecture
```mermaid
flowchart TD
    PSTN[Caller PSTN] --> CXP[Conversational Agents Phone Gateway]
    CXP -->|Chirp 3 STT| AE[Agent Engine Runtime]
    AE -->|ADK agent| GEM[gemini-2.5-flash]
    AE -->|tool| FS[(Firestore appointments)]
    AE -->|text reply| TTS[Chirp 3 HD TTS]
    TTS --> CXP
    CXP --> PSTN
```
Step 1 — Define the agent with ADK
```python
# agent.py
from google.adk.agents import Agent
from google.adk.tools import FunctionTool
from google.cloud import firestore

db = firestore.Client()


def lookup_appointment(patient_id: str) -> dict:
    """Returns the next appointment for the given patient_id."""
    doc = db.collection("appointments").document(patient_id).get()
    return doc.to_dict() or {"error": "not found"}


root_agent = Agent(
    name="reception_agent",
    model="gemini-2.5-flash",
    instruction=(
        "You are a friendly receptionist. Confirm the patient's name, "
        "look up their appointment, and read it back. Keep replies short."
    ),
    tools=[FunctionTool(func=lookup_appointment)],
)
```
Hear it before you finish reading
Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.
Step 2 — Test locally with the ADK dev UI
```bash
pip install google-adk
adk web  # opens a chat UI at http://localhost:8000
```
The dev UI shows the full reasoning trace, tool calls, and lets you swap the model in real time.
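Before deploying, you can also unit-test the tool logic without GCP credentials. A minimal sketch, assuming you factor the Firestore client behind a parameter; the `FakeDb` and `FakeDoc` stand-ins below are illustrative test doubles, not part of ADK or the Firestore SDK:

```python
# Test the lookup logic locally by injecting a fake Firestore client.
def lookup_appointment(patient_id: str, db) -> dict:
    """Same logic as agent.py, with the client injected for testing."""
    doc = db.collection("appointments").document(patient_id).get()
    return doc.to_dict() or {"error": "not found"}


class FakeDoc:
    def __init__(self, data):
        self._data = data

    def to_dict(self):
        return self._data  # None when the document doesn't exist


class FakeDb:
    """Mimics the db.collection(...).document(...).get() chain."""

    def __init__(self, docs):
        self._docs = docs  # {patient_id: appointment dict}

    def collection(self, name):
        return self

    def document(self, patient_id):
        self._current = patient_id
        return self

    def get(self):
        return FakeDoc(self._docs.get(self._current))


db = FakeDb({"patient-42": {"time": "2026-06-01T09:00", "doctor": "Dr. Lee"}})
print(lookup_appointment("patient-42", db))  # the stored appointment
print(lookup_appointment("unknown", db))     # the not-found fallback
```

The same fake can back an ADK `FunctionTool` in the dev UI if you want to demo without touching production data.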
Step 3 — Deploy to Agent Engine (managed runtime)
```python
# deploy.py
from vertexai import agent_engines

from agent import root_agent

remote = agent_engines.create(
    agent_engine=root_agent,
    requirements=["google-adk>=0.5", "google-cloud-firestore"],
    display_name="reception-agent",
)
print(remote.resource_name)
```
Run `gcloud auth application-default login && python deploy.py`. Agent Engine builds a container, pushes it to Artifact Registry, and gives you a versioned endpoint.
Step 4 — Attach a phone number via Conversational Agents
In the Conversational Agents console (formerly Dialogflow CX), create a new agent, choose Use a deployed Agent Engine endpoint, paste the resource name, then under Manage → Integrations → Phone Gateway click Configure new number and pick a country.
The gateway handles SIP, codec negotiation, inbound Chirp 3 STT (server-side VAD with a 0.6s end-of-speech timeout), outbound Chirp 3 HD TTS, barge-in, and DTMF passthrough. No code on your side.
Still reading? Stop comparing — try CallSphere live.
CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.
Step 5 — Configure voice and turn-taking
In the agent's Speech and IVR settings, pick:
- STT: `chirp_3` model with `use_enhanced=true`
- TTS: voice `en-US-Chirp3-HD-Charon` (or `en-US-Studio-O` for Studio voices)
- End-of-speech timeout: `600ms` (the default is too aggressive for elderly callers)
- Barge-in: enabled
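For reference, the settings above collected in one place. The field names in this dict are illustrative, not a documented API payload; you set these values in the console's Speech and IVR settings:

```python
# The Step 5 console settings as a plain reference dict (illustrative
# field names -- configure these in the Conversational Agents console).
speech_settings = {
    "stt": {"model": "chirp_3", "use_enhanced": True},
    "tts": {"voice": "en-US-Chirp3-HD-Charon"},  # or "en-US-Studio-O"
    "end_of_speech_timeout_ms": 600,  # default is too aggressive for elderly callers
    "barge_in": True,
}
print(speech_settings["tts"]["voice"])
```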
Step 6 — Add Vertex AI Search for RAG
If you need a knowledge base, create a Vertex AI Search data store over a GCS bucket of your help docs and add it as a sub-agent or as an ADK `VertexAiSearchTool`:
```python
from google.adk.tools import VertexAiSearchTool

search = VertexAiSearchTool(
    data_store_id="projects/123/locations/global/collections/default_collection/dataStores/help-docs"
)
root_agent.tools.append(search)
```
Step 7 — Stream events for analytics
Agent Engine emits Cloud Logging events for every turn, every tool call, and every model response. Pipe them into BigQuery via a Logs Router sink for dashboards.
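Before building dashboards, it helps to flatten each log entry into a row shape BigQuery can ingest. A minimal sketch; the payload fields below (`session_id`, `event_type`, `latency_ms`) are assumptions, so inspect your actual Cloud Logging entries before relying on any names:

```python
# Flatten one (hypothetical) Agent Engine log entry into a BigQuery-friendly
# row. Field names inside jsonPayload are illustrative assumptions.
import json


def to_bq_row(entry: dict) -> dict:
    payload = entry.get("jsonPayload", {})
    return {
        "timestamp": entry.get("timestamp"),
        "session_id": payload.get("session_id"),
        "event_type": payload.get("event_type"),  # e.g. turn / tool_call / model_response
        "latency_ms": payload.get("latency_ms"),
    }


entry = json.loads("""{
  "timestamp": "2026-05-01T12:00:00Z",
  "jsonPayload": {"session_id": "s-1", "event_type": "tool_call", "latency_ms": 180}
}""")
print(to_bq_row(entry))
```

In practice the Logs Router sink writes entries to BigQuery directly; a transformation like this is useful when you want a narrow analytics table instead of the raw log schema.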
Pitfalls
- Phone Gateway numbers are US-only as of May 2026 (Canada coming Q3). Use SIP trunking via your own carrier for other regions.
- Agent Engine cold-start is ~3s on the first call after idle; set `min_instances=1` for production.
- Chirp 3 HD voices add ~200ms vs Studio. Use Studio voices when the latency budget is tight.
- Free trial limits Vertex AI to $300 credit; Agent Engine billing kicks in immediately at $0.0001/request + compute time.
- ADK + Firestore quotas: 10k document reads/sec is the soft cap; cache hot patient lookups in Memorystore.
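The caching pattern from the last bullet can be sketched in-process. A real deployment would back this with Memorystore (Redis); the `TTLCache` helper below is a hypothetical stand-in that keeps the example self-contained:

```python
# In-process TTL cache for hot patient lookups (illustrative stand-in for
# Memorystore/Redis). Entries expire after ttl_seconds.
import time


class TTLCache:
    def __init__(self, ttl_seconds: float = 60.0):
        self._ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        hit = self._store.get(key)
        if hit and hit[0] > time.monotonic():
            return hit[1]
        self._store.pop(key, None)  # drop expired or missing entries
        return None

    def put(self, key, value):
        self._store[key] = (time.monotonic() + self._ttl, value)


cache = TTLCache(ttl_seconds=30)


def cached_lookup(patient_id: str, fetch) -> dict:
    """fetch is the uncached Firestore lookup, e.g. lookup_appointment."""
    value = cache.get(patient_id)
    if value is None:
        value = fetch(patient_id)
        cache.put(patient_id, value)
    return value
```

Keep the TTL short (seconds, not minutes) so reschedules made mid-call are picked up quickly.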
How CallSphere does this in production
CallSphere runs OpenAI Realtime on FastAPI :8084 for Healthcare because GCP's Phone Gateway didn't support our HIPAA chain of custody until late 2025. For our 6 verticals (Healthcare, Multi-Family, Salons, Behavioral, Hospitality, Real Estate), we keep Gemini 2.5 Flash as a fallback model behind our 90+ tools — primarily for non-PHI workloads where its 1M context lets us pass entire CRM histories. 37 agents, 115+ DB tables. Pricing: $149/$499/$1499, 14-day trial, 22% affiliate.
FAQ
Q: ADK vs Agent Studio (low-code)? Use ADK for code-first teams that want git, tests, CI. Use Agent Studio for non-engineers and rapid prototyping. They share the same runtime.
Q: Gemini 2.5 Flash vs Pro for voice? Flash is the right default for voice — TTFT is ~300ms vs ~700ms on Pro. Save Pro for tool-heavy reasoning loops.
Q: How does this compare to Dialogflow CX classic? Conversational Agents (the new console) replaces both old Dialogflow CX and Agent Builder. ADK is what you write; CX flows are still available for deterministic IVR.
Q: What's the latency target?
Voice-to-voice ~700-900ms with Chirp 3 + Flash on us-central1.
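That budget decomposes roughly as follows; every component figure below except the Flash TTFT quoted above is an assumption for illustration:

```python
# Back-of-envelope voice-to-voice latency budget (assumed figures except
# the ~300ms Flash TTFT mentioned in this FAQ).
stt_finalize_ms = 200      # end-of-speech detection + STT finalization (assumed)
model_ttft_ms = 300        # gemini-2.5-flash time-to-first-token (from the FAQ)
tts_first_audio_ms = 250   # Chirp 3 HD time-to-first-audio (assumed)
network_ms = 100           # gateway + region round-trips (assumed)

total_ms = stt_finalize_ms + model_ttft_ms + tts_first_audio_ms + network_ms
print(total_ms)  # 850 -- inside the ~700-900ms target
```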
Q: Can I bring my own LLM? Yes — ADK's model param accepts any Vertex Model Garden or LiteLLM-compatible endpoint, including Claude on Vertex.
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available -- no signup required.