
Build a Voice Agent on Vertex AI Agent Builder with Gemini Live (2026)

Stand up a Gemini-powered voice agent with Vertex AI Agent Builder (now Gemini Enterprise Agent Platform): phone gateway, ADK code-first agent, and Agent Engine managed runtime, in under 200 lines.

TL;DR: Vertex AI Agent Builder (rebranded "Gemini Enterprise Agent Platform" at Cloud Next 2026) gives you the code-first ADK, the managed Agent Engine runtime, and a built-in phone gateway with TTS/STT in 220+ voices and 40+ languages. You define an agent in Python, deploy it with one script, attach a phone number, and you're done.

What you'll build

A code-first voice agent built with the Agent Development Kit (ADK), backed by gemini-2.5-flash for reasoning and Chirp 3 HD voices for TTS. The agent has one tool (lookup_appointment) backed by Firestore, runs on Agent Engine (managed), and answers a real PSTN number through Conversational Agents Phone Gateway.

Prerequisites

  1. GCP project with Vertex AI + Conversational Agents APIs enabled.
  2. gcloud CLI authenticated, billing enabled.
  3. Python 3.11 with google-cloud-aiplatform>=1.85, google-adk>=0.5.
  4. A Firestore database in Native mode for the appointments tool.

Architecture

flowchart TD
  PSTN[Caller PSTN] --> CXP[Conversational Agents Phone Gateway]
  CXP -->|Chirp 3 STT| AE[Agent Engine Runtime]
  AE -->|ADK agent| GEM[gemini-2.5-flash]
  AE -->|tool| FS[(Firestore appointments)]
  AE -->|text reply| TTS[Chirp 3 HD TTS]
  TTS --> CXP
  CXP --> PSTN
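
One way to read the diagram is as a per-turn pipeline: caller audio goes through STT, the transcript goes to the agent, and the reply text goes through TTS. The sketch below wires up that loop with stand-in stages. `TurnPipeline`, `fake_stt`, `fake_agent`, and `fake_tts` are hypothetical placeholders, not Google APIs, and in the real setup the gateway and Agent Engine do all of this for you.

```python
from dataclasses import dataclass
from typing import Callable

# Stage signatures for one conversational turn. All three stages are
# hypothetical stand-ins for the managed services in the diagram.
Stt = Callable[[bytes], str]      # caller audio -> transcript
AgentFn = Callable[[str], str]    # transcript -> reply text
Tts = Callable[[str], bytes]      # reply text -> audio for the caller


@dataclass
class TurnPipeline:
    stt: Stt
    agent: AgentFn
    tts: Tts

    def handle_turn(self, caller_audio: bytes) -> bytes:
        """Run one turn: STT, then the agent, then TTS."""
        transcript = self.stt(caller_audio)
        reply_text = self.agent(transcript)
        return self.tts(reply_text)


# Toy stages so the sketch runs without any cloud services.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_agent(transcript: str) -> str:
    return f"You said: {transcript}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = TurnPipeline(stt=fake_stt, agent=fake_agent, tts=fake_tts)
reply_audio = pipeline.handle_turn(b"when is my appointment")
```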

Step 1 — Define the agent with ADK

```python
# agent.py
from google.adk.agents import Agent
from google.adk.tools import FunctionTool
from google.cloud import firestore

db = firestore.Client()


def lookup_appointment(patient_id: str) -> dict:
    """Returns the next appointment for the given patient_id."""
    doc = db.collection("appointments").document(patient_id).get()
    return doc.to_dict() or {"error": "not found"}


root_agent = Agent(
    name="reception_agent",
    model="gemini-2.5-flash",
    instruction=(
        "You are a friendly receptionist. Confirm the patient's name, "
        "look up their appointment, and read it back. Keep replies short."
    ),
    tools=[FunctionTool(func=lookup_appointment)],
)
```
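
Because the tool is a plain Python function, its lookup logic can be unit-tested without touching Firestore. This is a minimal sketch, assuming you refactor the tool to accept the client as a parameter. `FakeClient`, `FakeCollection`, `FakeDocRef`, `FakeSnapshot`, and `lookup_appointment_with` are hypothetical names that mimic only the `collection().document().get().to_dict()` chain used above.

```python
from typing import Optional


class FakeSnapshot:
    """Mimics a Firestore document snapshot: to_dict() is None if missing."""

    def __init__(self, data: Optional[dict]):
        self._data = data

    def to_dict(self) -> Optional[dict]:
        return self._data


class FakeDocRef:
    def __init__(self, data: Optional[dict]):
        self._data = data

    def get(self) -> FakeSnapshot:
        return FakeSnapshot(self._data)


class FakeCollection:
    def __init__(self, docs: dict):
        self._docs = docs

    def document(self, doc_id: str) -> FakeDocRef:
        return FakeDocRef(self._docs.get(doc_id))


class FakeClient:
    """In-memory stand-in for firestore.Client (hypothetical test double)."""

    def __init__(self, collections: dict):
        self._collections = collections

    def collection(self, name: str) -> FakeCollection:
        return FakeCollection(self._collections.get(name, {}))


def lookup_appointment_with(db, patient_id: str) -> dict:
    """Same logic as the tool above, with the client injected."""
    doc = db.collection("appointments").document(patient_id).get()
    return doc.to_dict() or {"error": "not found"}


fake_db = FakeClient({"appointments": {"p-1": {"time": "2026-06-01T09:00"}}})
found = lookup_appointment_with(fake_db, "p-1")
missing = lookup_appointment_with(fake_db, "p-404")
```

The same function works with a real `firestore.Client()` in production, so the test double only has to cover the four calls the tool makes.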

Step 2 — Test locally with the ADK dev UI

```bash
pip install google-adk
adk web
# opens a chat UI at http://localhost:8000
```

The dev UI shows the full reasoning trace, tool calls, and lets you swap the model in real time.

Step 3 — Deploy to Agent Engine (managed runtime)

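```python
# deploy.py
import vertexai
from vertexai import agent_engines

from agent import root_agent

# Placeholder values: replace with your own project, region, and bucket.
vertexai.init(
    project="your-project-id",
    location="us-central1",
    staging_bucket="gs://your-staging-bucket",
)

remote = agent_engines.create(
    agent_engine=root_agent,
    requirements=["google-adk>=0.5", "google-cloud-firestore"],
    display_name="reception-agent",
)
print(remote.resource_name)
```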

gcloud auth application-default login && python deploy.py — Agent Engine builds a container, pushes to Artifact Registry, and gives you a versioned endpoint.

Step 4 — Attach a phone number via Conversational Agents

In the Conversational Agents console (formerly Dialogflow CX), create a new agent, choose Use a deployed Agent Engine endpoint, paste the resource name, then under Manage → Integrations → Phone Gateway click Configure new number and pick a country.

The gateway handles SIP, codec negotiation, Chirp 3 STT in (server VAD with 0.6s end-of-speech timeout), Chirp 3 HD TTS out, barge-in, and DTMF passthrough. No code on your side.

Step 5 — Configure voice and turn-taking

In the agent's Speech and IVR settings, pick:

  • STT: chirp_3 model with use_enhanced=true
  • TTS: voice en-US-Chirp3-HD-Charon (or en-US-Studio-O for Studio voices)
  • End-of-speech timeout: 600ms (default is too aggressive for elderly callers)
  • Barge-in: enabled
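
To make the end-of-speech setting concrete: an endpointer waits for a run of silence at least as long as the timeout before treating the caller's turn as finished. The sketch below is a simplified illustration of that idea, not the gateway's actual VAD. The 20 ms frame size, the per-frame `is_speech` flags, and the function name `end_of_speech_frame` are assumptions.

```python
from typing import Optional, Sequence

FRAME_MS = 20  # assumed frame duration; real gateways may differ


def end_of_speech_frame(
    is_speech: Sequence[bool], timeout_ms: int
) -> Optional[int]:
    """Return the index of the frame at which the turn is declared over.

    The turn ends once the caller has spoken at least once and then stayed
    silent for timeout_ms. Returns None if that never happens.
    """
    frames_needed = timeout_ms // FRAME_MS
    heard_speech = False
    silent_run = 0
    for i, speech in enumerate(is_speech):
        if speech:
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            silent_run += 1
            if silent_run >= frames_needed:
                return i
    return None


# A caller speaks, pauses for 400 ms mid-sentence, speaks again, then stops.
frames = [True] * 10 + [False] * 20 + [True] * 10 + [False] * 40

short_timeout = end_of_speech_frame(frames, timeout_ms=300)
long_timeout = end_of_speech_frame(frames, timeout_ms=600)
```

With the 300 ms timeout the 400 ms pause ends the turn early, inside the caller's sentence. With 600 ms the pause is tolerated and the turn ends only after the caller has actually stopped.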

Step 6 — Add Vertex AI Search for RAG

If you need a knowledge base, create a Vertex AI Search data store over a GCS bucket of your help docs and add it as a sub-agent or as an ADK VertexAiSearchTool:

```python
from google.adk.tools import VertexAiSearchTool

search = VertexAiSearchTool(
    data_store_id="projects/123/locations/global/collections/default_collection/dataStores/help-docs"
)
root_agent.tools.append(search)
```

Step 7 — Stream events for analytics

Agent Engine emits Cloud Logging events for every turn, every tool call, and every model response. Pipe them into BigQuery via a Logs Router sink for dashboards.
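
Before building dashboards it helps to flatten each log entry into one row. This is a minimal sketch, assuming entries arrive in the standard Cloud Logging `LogEntry` JSON shape (`timestamp`, `severity`, `jsonPayload`). The payload keys `event_type`, `session_id`, and `latency_ms` are hypothetical, so check what your deployment actually emits.

```python
import json
from typing import Any


def flatten_entry(raw: str) -> dict[str, Any]:
    """Turn one JSON log line into a flat row suitable for a table."""
    entry = json.loads(raw)
    payload = entry.get("jsonPayload", {})
    return {
        "timestamp": entry.get("timestamp"),
        "severity": entry.get("severity", "DEFAULT"),
        # Hypothetical payload keys; adjust to your real log schema.
        "event_type": payload.get("event_type"),
        "session_id": payload.get("session_id"),
        "latency_ms": payload.get("latency_ms"),
    }


sample = json.dumps(
    {
        "timestamp": "2026-05-01T12:00:00Z",
        "severity": "INFO",
        "jsonPayload": {
            "event_type": "tool_call",
            "session_id": "abc",
            "latency_ms": 42,
        },
    }
)
row = flatten_entry(sample)
```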

Pitfalls

  • Phone Gateway numbers are US-only as of May 2026 (Canada coming Q3). Use SIP trunking via your own carrier for other regions.
  • Agent Engine cold-start is ~3s on first call after idle; set min_instances=1 for production.
  • Chirp 3 HD voices add ~200ms vs Studio. Use Studio voices when latency budget is tight.
  • Free trial limits Vertex AI to $300 credit; Agent Engine billing kicks in immediately at $0.0001/request + compute time.
  • ADK + Firestore quotas: 10k document reads/sec is the soft cap; cache hot patient lookups in Memorystore.
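
The caching advice in the last bullet can be prototyped in-process before reaching for Memorystore. This is a minimal sketch of a TTL cache around a lookup function. `TTLCache` and `slow_lookup` are hypothetical names, and the in-memory dict is only a stand-in because it is not shared across instances.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """Caches lookup results per key for ttl_seconds."""

    def __init__(
        self,
        fetch: Callable[[str], dict],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, dict]] = {}

    def get(self, key: str) -> dict:
        now = self._clock()
        cached: Optional[tuple[float, dict]] = self._entries.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # still fresh: skip the backing store
        value = self._fetch(key)
        self._entries[key] = (now, value)
        return value


calls = []


def slow_lookup(patient_id: str) -> dict:
    """Hypothetical stand-in for the Firestore-backed tool."""
    calls.append(patient_id)
    return {"patient_id": patient_id}


# An injectable clock keeps the example deterministic.
fake_now = [0.0]
cache = TTLCache(slow_lookup, ttl_seconds=30, clock=lambda: fake_now[0])

cache.get("p-1")      # miss: hits the backing store
cache.get("p-1")      # hit: served from cache
fake_now[0] = 31.0
cache.get("p-1")      # expired: hits the backing store again
```

Keep the TTL short for appointment data, since a cached entry can go stale if the appointment is rescheduled during the window.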

How CallSphere does this in production

CallSphere runs OpenAI Realtime on FastAPI :8084 for Healthcare because GCP's Phone Gateway didn't support our HIPAA chain of custody until late 2025. For our 6 verticals (Healthcare, Multi-Family, Salons, Behavioral, Hospitality, Real Estate), we keep Gemini 2.5 Flash as a fallback model behind our 90+ tools — primarily for non-PHI workloads where its 1M context lets us pass entire CRM histories. 37 agents, 115+ DB tables. Pricing: $149/$499/$1499, 14-day trial, 22% affiliate.

FAQ

Q: ADK vs Agent Studio (low-code)? Use ADK for code-first teams that want git, tests, CI. Use Agent Studio for non-engineers and rapid prototyping. They share the same runtime.

Q: Gemini 2.5 Flash vs Pro for voice? Flash is the right default for voice — TTFT is ~300ms vs ~700ms on Pro. Save Pro for tool-heavy reasoning loops.

Q: How does this compare to Dialogflow CX classic? Conversational Agents (the new console) replaces both old Dialogflow CX and Agent Builder. ADK is what you write; CX flows are still available for deterministic IVR.

Q: What's the latency target? Voice-to-voice ~700-900ms with Chirp 3 + Flash on us-central1.

Q: Can I bring my own LLM? Yes — ADK's model param accepts any Vertex Model Garden or LiteLLM-compatible endpoint, including Claude on Vertex.
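
The latency figures quoted in this post can be combined into a rough budget. This is a back-of-the-envelope sketch, assuming the quoted voice-to-voice baseline uses Chirp 3 HD voices and that the model and voice deltas simply add to it. Real pipelines overlap stages, so treat the output as an estimate. The function name `estimate_range` is hypothetical.

```python
# Figures quoted in this post, in milliseconds.
BASELINE_MS = (700, 900)        # voice-to-voice, Chirp 3 (HD assumed) + Flash
TTFT_MS = {"flash": 300, "pro": 700}
HD_VOICE_OVERHEAD_MS = 200      # Chirp 3 HD vs Studio


def estimate_range(model: str, hd_voice: bool) -> tuple[int, int]:
    """Shift the baseline by the model and voice deltas (additive assumption)."""
    delta = TTFT_MS[model] - TTFT_MS["flash"]
    if not hd_voice:
        delta -= HD_VOICE_OVERHEAD_MS
    low, high = BASELINE_MS
    return (low + delta, high + delta)


flash_hd = estimate_range("flash", hd_voice=True)
flash_studio = estimate_range("flash", hd_voice=False)
pro_hd = estimate_range("pro", hd_voice=True)
```

Under these assumptions, swapping Flash for Pro pushes the estimate past one second, while dropping to Studio voices buys back about 200 ms.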
