Build a Voice Agent with Dialogflow CX + Gemini Live API (2026)
Combine Dialogflow CX's deterministic flows with Gemini Live API's bidirectional streaming for hybrid voice agents. Phone Gateway, generative fallback, real working integration.
TL;DR: Dialogflow CX flows are still the gold standard for compliant, deterministic conversation paths (insurance verification, identification and verification, or ID&V). Gemini Live API is OpenAI Realtime's GCP equivalent: a bidirectional WebSocket with native audio. Hybrid agents use CX for the regulated path and a Gemini Live fallback for everything else.
What you'll build
A Dialogflow CX agent that handles a structured intake flow (verify identity, capture appointment intent), with a "Generative Fallback" that hands the live audio stream to Gemini Live API for free-form Q&A. Phone Gateway provides the PSTN front-end. The Gemini Live bridge runs on Cloud Run.
Prerequisites
- GCP project with Dialogflow CX, Vertex AI APIs enabled.
- Service account with roles/dialogflow.client and roles/aiplatform.user.
- google-cloud-dialogflow-cx, google-genai Python packages.
- Cloud Run for the Gemini Live bridge (or Cloud Functions Gen2).
Architecture
flowchart TD
PSTN[Caller] --> PG[Phone Gateway]
PG --> CX[Dialogflow CX Flow]
CX -->|deterministic intents| FB[Fulfillment webhook]
FB --> CRM[(CRM)]
CX -->|generative fallback| GLB[Gemini Live Bridge Cloud Run]
GLB <-->|wss| GL[Gemini Live API]
GLB --> CX
CX -->|Chirp TTS| PG
PG --> PSTN
Step 1 — Create the CX agent and Phone Gateway number
In the Conversational Agents console: New agent → name hybrid-voice → region us-central1 → enable Generative AI features. Under Manage → Integrations → Phone Gateway click Configure new number.
Step 2 — Define the deterministic flow
Build a single flow identity-verification with pages:
- collect_dob (parameter @sys.date, required)
- collect_member_id (parameter @sys.number-sequence)
- verify (calls webhook, transitions on success/failure)
Set the Default Welcome Intent to route into identity-verification. The fulfillment webhook is a Cloud Run service.
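As a rough sketch of what the verify page's webhook might do, the function below reads the collected parameters from a CX webhook request and returns a WebhookResponse that sets a session parameter. The parameter names dob, member_id and verified, and the in-memory MEMBERS table standing in for the CRM, are assumptions for illustration and not part of the original setup.

```python
from datetime import date

# Hypothetical stand-in for the CRM lookup: member_id -> date of birth.
MEMBERS = {"12345678": date(1990, 4, 12)}


def handle_verify(request_json: dict) -> dict:
    """Build a CX WebhookResponse for the verify page (illustrative only)."""
    params = request_json.get("sessionInfo", {}).get("parameters", {})
    dob = params.get("dob") or {}  # @sys.date arrives as {year, month, day}
    member_id = str(params.get("member_id", ""))

    try:
        # Values may arrive as floats (JSON numbers), so coerce to int.
        said = date(int(dob["year"]), int(dob["month"]), int(dob["day"]))
    except (KeyError, TypeError, ValueError):
        said = None

    verified = said is not None and MEMBERS.get(member_id) == said
    reply = "Thanks, you're verified." if verified else "Sorry, I couldn't verify those details."

    return {
        # Session parameters are what the page's routes can branch on.
        "sessionInfo": {"parameters": {"verified": verified}},
        "fulfillmentResponse": {"messages": [{"text": {"text": [reply]}}]},
    }
```

With this shape, the verify page can carry two routes, one on the condition $session.params.verified = true and one on $session.params.verified = false, which gives the success/failure transitions described above.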
Step 3 — Add the Generative Fallback
In the Default Start Flow, set Event Handlers → no-match-default to a Generator:
```yaml
generator:
  prompt: |
    The user said: $last-user-utterance
    Reply briefly as a friendly receptionist.
  model: gemini-2.5-flash
```
This handles single-turn fallbacks. For multi-turn free-form, route to a webhook that hands off to Gemini Live.
Step 4 — The Gemini Live bridge (Cloud Run)
```python
# bridge.py
import asyncio
import os

from fastapi import FastAPI, WebSocket
from google import genai
from google.genai import types

client = genai.Client(vertexai=True, project=os.environ["PROJECT"], location="us-central1")
app = FastAPI()


@app.websocket("/live")
async def live(ws: WebSocket):
    await ws.accept()
    config = types.LiveConnectConfig(
        response_modalities=["AUDIO"],
        speech_config=types.SpeechConfig(
            voice_config=types.VoiceConfig(
                prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Charon"))),
        system_instruction="You are a friendly receptionist. Keep replies short.")
    async with client.aio.live.connect(model="gemini-2.5-flash-live", config=config) as session:

        async def from_caller():
            # Forward caller audio (16 kHz, 16-bit PCM) to Gemini as it arrives.
            async for frame in ws.iter_bytes():
                await session.send_realtime_input(
                    audio=types.Blob(data=frame, mime_type="audio/pcm;rate=16000"))

        async def to_caller():
            # receive() stops after each completed model turn, so loop to serve every turn.
            while True:
                async for resp in session.receive():
                    if resp.data:  # raw PCM audio from the model (24 kHz)
                        await ws.send_bytes(resp.data)

        sender = asyncio.create_task(to_caller())
        try:
            await from_caller()  # returns once the caller disconnects
        finally:
            sender.cancel()
```
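Deploy with gcloud run deploy bridge --source . and expect the build to take about 90 seconds. Cloud Run treats a WebSocket as one long request, so raise the service's request timeout if calls can outlast the default.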
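Whatever sits in front of the bridge has to send it raw 16-bit mono PCM at 16 kHz, matching the audio/pcm;rate=16000 mime type in bridge.py. A minimal sketch of slicing a PCM buffer into small frames before writing them to the /live WebSocket could look like this; the 20 ms frame size and the helper name pcm_frames are assumptions, not something the bridge requires.

```python
from typing import Iterator

SAMPLE_RATE = 16_000  # matches audio/pcm;rate=16000 in bridge.py
BYTES_PER_SAMPLE = 2  # 16-bit mono PCM
FRAME_MS = 20         # assumed frame size; not specified by the bridge


def pcm_frames(pcm: bytes, frame_ms: int = FRAME_MS) -> Iterator[bytes]:
    """Yield consecutive frames of roughly frame_ms of audio each."""
    frame_bytes = SAMPLE_RATE * BYTES_PER_SAMPLE * frame_ms // 1000
    for start in range(0, len(pcm), frame_bytes):
        # The final frame may be shorter than frame_bytes.
        yield pcm[start:start + frame_bytes]
```

Smaller frames keep latency down at the cost of more WebSocket messages. Telephony audio usually arrives as 8 kHz mu-law, so a real deployment would also need a decode and resample step before this one.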
Step 5 — Hand off audio from CX to the bridge
Dialogflow CX doesn't expose raw audio to webhooks directly, so for free-form moments you "park" the CX session and hand the call to the bridge via the Phone Gateway's Custom Telephony Provider field. CX's live-agent-handoff event triggers a SIP REFER that the carrier routes to your Cloud Run WebSocket.
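One way to signal the handoff from a fulfillment webhook is to return a liveAgentHandoff response message, which CX passes through to the telephony integration. The sketch below only builds that response; the metadata keys (call_sid, reason, target) and the handed_off session parameter are hypothetical, and what the carrier does with them depends on your telephony setup.

```python
def handoff_response(call_sid: str, reason: str) -> dict:
    """Build a CX WebhookResponse that requests a handoff (illustrative only)."""
    return {
        "fulfillmentResponse": {
            "messages": [
                # Spoken to the caller before the call is transferred.
                {"text": {"text": ["One moment while I connect you."]}},
                # Handoff signal; the metadata keys are hypothetical.
                {"liveAgentHandoff": {"metadata": {
                    "call_sid": call_sid,
                    "reason": reason,
                    "target": "gemini-live-bridge",
                }}},
            ]
        },
        # Lets later pages tell that this session was parked.
        "sessionInfo": {"parameters": {"handed_off": True}},
    }
```

Carrying the call identifier in the metadata means the bridge can log under the same key as CX once it picks up the audio.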
Step 6 — Telemetry
CX writes every turn to Conversation History; the bridge writes to Cloud Logging. Stitch sessions by call-sid (CX exposes it as a session parameter via Phone Gateway).
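Stitching the two sources can be as simple as grouping exported records by call identifier and sorting by time. The sketch below assumes each record has already been exported as a dict with call_sid and timestamp fields (RFC 3339 UTC strings in one consistent format, so they sort lexicographically); those field names are placeholders for whatever your export actually produces.

```python
from collections import defaultdict


def stitch_by_call_sid(cx_turns: list[dict], bridge_logs: list[dict]) -> dict[str, list[dict]]:
    """Merge CX turns and bridge log entries into one timeline per call."""
    merged: dict[str, list[dict]] = defaultdict(list)
    for source, records in (("cx", cx_turns), ("bridge", bridge_logs)):
        for rec in records:
            sid = rec.get("call_sid")
            if sid is None:
                continue  # skip records that cannot be tied to a call
            merged[sid].append({**rec, "source": source})
    for events in merged.values():
        events.sort(key=lambda e: e["timestamp"])
    return dict(merged)
```

Tagging each event with its source makes it easy to see where in a call the conversation moved from the CX flow to the bridge.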
Pitfalls
- Generator vs Live API: Generators are stateless single-turn — perfect for "I didn't understand" repair. Live API is for multi-turn open-ended.
- Phone Gateway latency budget is tight; STT-TTS round-trip via CX adds ~400ms before the model even sees text. Use Gemini Live for low-latency segments.
- Webhook timeouts: CX cuts a webhook off once its configured timeout expires, and 30s is the ceiling. Anything slower belongs on Pub/Sub.
- Region binding: Phone Gateway numbers live in one region; cross-region calls add 60-100ms.
- Generative AI safety filters can block medical answers; tune safety_settings on the Generator and Live config.
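For the webhook timeout pitfall, a common pattern is to queue the slow job and answer CX straight away with a holding message. The sketch below takes the publish step as a plain callable so it stays self-contained; in production that callable might wrap a Pub/Sub publisher's publish call for your topic. The function name, payload shape, and holding text are all assumptions.

```python
import json
from typing import Callable


def handle_slow_request(payload: dict, publish: Callable[[bytes], object]) -> dict:
    """Queue slow work and reply to CX immediately (illustrative only)."""
    # Hand the job to the queue; a separate worker processes it later.
    publish(json.dumps(payload).encode("utf-8"))
    return {
        "fulfillmentResponse": {
            "messages": [{"text": {"text": ["I'm working on that and will follow up shortly."]}}]
        },
        # Hypothetical flag so the flow knows a result is still pending.
        "sessionInfo": {"parameters": {"job_pending": True}},
    }
```

The webhook's own latency is then just the cost of one publish call, whatever the downstream job takes.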
How CallSphere does this in production
CallSphere's Healthcare vertical (37 agents, 115+ DB tables) doesn't use Dialogflow CX — we built our own flow engine on FastAPI :8084 that routes between OpenAI Realtime and Anthropic Claude per turn based on PHI sensitivity. CX is excellent for teams that already have IVR investments; for greenfield, our managed product at $149/$499/$1499 (14-day trial, 22% affiliate) ships in days vs weeks. 90+ tools, 6 verticals.
FAQ
Q: When do I pick CX over a pure Live API stack?
When you have hard compliance gates that must be deterministic (insurance verification, KYC). Live API alone isn't auditable enough for most regulated flows.
Q: Can Live API call tools mid-stream?
Yes — Live API supports function calling natively; declare tools in LiveConnectConfig.
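A minimal sketch of what a tool declaration and a local dispatcher might look like is below. It writes the config as a plain dict, on the assumption that the google-genai SDK accepts a dict in place of a LiveConnectConfig object; the get_office_hours tool, its schema, and the HOURS data are invented for illustration.

```python
# Hypothetical data the tool answers from.
HOURS = {"monday": "9:00-17:00", "saturday": "closed"}

# Dict form of the Live config with one declared tool (assumed to be accepted by the SDK).
LIVE_CONFIG = {
    "response_modalities": ["AUDIO"],
    "tools": [{
        "function_declarations": [{
            "name": "get_office_hours",
            "description": "Return opening hours for a given weekday.",
            "parameters": {
                "type": "OBJECT",
                "properties": {"weekday": {"type": "STRING"}},
                "required": ["weekday"],
            },
        }]
    }],
}


def get_office_hours(weekday: str) -> dict:
    return {"weekday": weekday, "hours": HOURS.get(weekday.lower(), "unknown")}


HANDLERS = {"get_office_hours": get_office_hours}


def run_tool_call(name: str, args: dict) -> dict:
    """Dispatch one function call from the model to a local handler."""
    handler = HANDLERS.get(name)
    if handler is None:
        return {"error": f"unknown tool: {name}"}
    return handler(**args)
```

In the bridge's receive loop, messages that carry a tool call would be passed through run_tool_call, and the result sent back on the session (the SDK's send_tool_response, as far as I understand it) so the model can finish its spoken reply.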
Q: What's the cost?
CX is $0.007/request (text) or $0.06/min (voice with Phone Gateway). Live API on Vertex is $0.0006/sec audio in + $0.0024/sec audio out at flash-live rates.
Q: Does CX support barge-in?
Yes. Enable it in Speech settings; the default end-of-speech timeout is 500ms.
Q: Cross-region failover?
Replicate the agent config via Terraform; Phone Gateway numbers can fail over to a backup CX agent in another region via the carrier-side route plan.