---
title: "Build a Voice Agent with Dialogflow CX + Gemini Live API (2026)"
description: "Combine Dialogflow CX's deterministic flows with Gemini Live API's bidirectional streaming for hybrid voice agents. Phone Gateway, generative fallback, real working integration."
canonical: https://callsphere.ai/blog/vw5h-build-voice-agent-dialogflow-cx-gemini-live-api
category: "AI Voice Agents"
tags: ["GCP", "Dialogflow CX", "Gemini Live", "Vertex AI", "Tutorial"]
author: "CallSphere Team"
published: 2026-03-27T00:00:00.000Z
updated: 2026-05-07T16:30:06.590Z
---

# Build a Voice Agent with Dialogflow CX + Gemini Live API (2026)

> Combine Dialogflow CX's deterministic flows with Gemini Live API's bidirectional streaming for hybrid voice agents. Phone Gateway, generative fallback, real working integration.

> **TL;DR** — Dialogflow CX flows are still the gold standard for compliant, deterministic conversation paths (insurance verification, ID-and-V). Gemini Live API is OpenAI Realtime's GCP equivalent — bidirectional WebSocket, native audio. Hybrid agents use CX for the regulated path and a Gemini Live fallback for everything else.

## What you'll build

A Dialogflow CX agent that handles a structured intake flow (verify identity, capture appointment intent), with a "Generative Fallback" that hands the live audio stream to Gemini Live API for free-form Q&A. Phone Gateway provides the PSTN front-end. The Gemini Live bridge runs on Cloud Run.

## Prerequisites

1. GCP project with the Dialogflow CX and Vertex AI APIs enabled.
2. Service account with `roles/dialogflow.client` and `roles/aiplatform.user`.
3. `google-cloud-dialogflow-cx` and `google-genai` Python packages, plus `fastapi` and `uvicorn` for the bridge.
4. Cloud Run for the Gemini Live bridge (or Cloud Functions Gen2).

## Architecture

```mermaid
flowchart TD
  PSTN[Caller] --> PG[Phone Gateway]
  PG --> CX[Dialogflow CX Flow]
  CX -->|deterministic intents| FB[Fulfillment webhook]
  FB --> CRM[(CRM)]
  CX -->|generative fallback| GLB[Gemini Live Bridge Cloud Run]
  GLB <-->|wss| GL[Gemini Live API]
  GLB --> CX
  CX -->|Chirp TTS| PG
  PG --> PSTN
```

## Step 1 — Create the CX agent and Phone Gateway number

In the Conversational Agents console: New agent → name `hybrid-voice` → region `us-central1` → enable Generative AI features. Under **Manage → Integrations → Phone Gateway** click **Configure new number**.

## Step 2 — Define the deterministic flow

Build a single flow `identity-verification` with pages:

- `collect_dob` (parameter `@sys.date` required)
- `collect_member_id` (parameter `@sys.number-sequence`)
- `verify` (calls webhook, transitions on success/failure)

Set the **Default Welcome Intent** to route into `identity-verification`. The fulfillment webhook is a Cloud Run service.
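
As a rough illustration of what the `verify` page's webhook might compute, the sketch below builds a CX webhook response from the request body. The parameter names `dob` and `member_id`, the `verified` flag, and the in-memory `KNOWN_MEMBERS` table are assumptions made for this example; a real service would query the CRM instead.

```python
from typing import Any

# Hypothetical stand-in for a CRM lookup: member_id -> date of birth (year, month, day).
KNOWN_MEMBERS: dict[str, tuple[int, int, int]] = {
    "12345678": (1985, 4, 12),
}


def build_verify_response(request: dict[str, Any]) -> dict[str, Any]:
    """Build a Dialogflow CX webhook response for the `verify` page."""
    params = request.get("sessionInfo", {}).get("parameters", {})

    # @sys.date arrives as an object with year/month/day fields;
    # the numbers may be floats after JSON decoding, so cast them.
    dob = params.get("dob") or {}
    dob_tuple = tuple(int(dob.get(key, 0)) for key in ("year", "month", "day"))
    member_id = str(params.get("member_id", ""))

    verified = KNOWN_MEMBERS.get(member_id) == dob_tuple
    text = (
        "Thanks, you're verified."
        if verified
        else "Sorry, I couldn't verify those details."
    )

    return {
        # Written back into the session so page routes can branch on it.
        "sessionInfo": {"parameters": {"verified": verified}},
        "fulfillmentResponse": {"messages": [{"text": {"text": [text]}}]},
    }
```

With a response shaped like this, the `verify` page's routes could branch on a condition such as `$session.params.verified = true`.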

## Step 3 — Add the Generative Fallback

In the Default Start Flow, set **Event Handlers → no-match-default** to a **Generator**:

```yaml
generator:
  prompt: |
    The user said: $last-user-utterance
    Reply briefly as a friendly receptionist.
  model: gemini-2.5-flash
```

This handles single-turn fallbacks. For multi-turn free-form, route to a webhook that hands off to Gemini Live.

## Step 4 — The Gemini Live bridge (Cloud Run)

```python
# bridge.py
import asyncio
import os

from fastapi import FastAPI, WebSocket
from google import genai
from google.genai import types

client = genai.Client(vertexai=True, project=os.environ["PROJECT"], location="us-central1")
app = FastAPI()

@app.websocket("/live")
async def live(ws: WebSocket):
    await ws.accept()
    config = types.LiveConnectConfig(
        response_modalities=["AUDIO"],
        speech_config=types.SpeechConfig(
            voice_config=types.VoiceConfig(
                prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Charon"))),
        system_instruction="You are a friendly receptionist. Keep replies short.")
    async with client.aio.live.connect(model="gemini-2.5-flash-live", config=config) as session:
        async def from_caller():
            # Forward caller audio (16-bit PCM, 16 kHz) to Gemini as it arrives.
            async for frame in ws.iter_bytes():
                await session.send_realtime_input(
                    audio=types.Blob(data=frame, mime_type="audio/pcm;rate=16000"))

        async def to_caller():
            # session.receive() yields a single model turn, so loop to keep streaming.
            # The Live API returns 24 kHz PCM; resample downstream if needed.
            while True:
                async for resp in session.receive():
                    if resp.data:
                        await ws.send_bytes(resp.data)

        tasks = [asyncio.create_task(from_caller()), asyncio.create_task(to_caller())]
        try:
            # Stop both directions as soon as either ends (for example, the caller hangs up).
            await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
        finally:
            for task in tasks:
                task.cancel()
```

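Deploy with `gcloud run deploy bridge --source . --region us-central1 --set-env-vars PROJECT=<your-project-id>`, which also sets the `PROJECT` variable that `bridge.py` reads at startup. The deploy takes roughly 90 seconds.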
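
Whatever sends audio to `/live` has to slice it into frames that match the `audio/pcm;rate=16000` mime type used by the bridge. The sketch below shows the arithmetic, assuming 16-bit mono samples and a 20 ms frame length; both are assumptions for illustration, and the helper names are hypothetical.

```python
from typing import Iterator, Optional

SAMPLE_RATE_HZ = 16_000  # matches the mime type used by the bridge
BYTES_PER_SAMPLE = 2     # assumption: 16-bit mono PCM
FRAME_MS = 20            # assumption: a common telephony frame length


def frame_size_bytes(sample_rate_hz: int = SAMPLE_RATE_HZ, frame_ms: int = FRAME_MS) -> int:
    """Bytes in one frame: samples per frame times bytes per sample."""
    return sample_rate_hz * frame_ms // 1000 * BYTES_PER_SAMPLE


def iter_frames(pcm: bytes, size: Optional[int] = None) -> Iterator[bytes]:
    """Yield fixed-size frames; a short final frame is padded with silence."""
    size = size or frame_size_bytes()
    for start in range(0, len(pcm), size):
        chunk = pcm[start:start + size]
        yield chunk.ljust(size, b"\x00")
```

At these settings one frame is 640 bytes, so a sender would write one 640-byte WebSocket message every 20 ms.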

## Step 5 — Hand off audio from CX to the bridge

Dialogflow CX doesn't expose raw audio to webhooks directly, so for free-form moments you "park" the CX session and hand the call to the bridge via the Phone Gateway's **Custom Telephony Provider** field. CX's `live-agent-handoff` event triggers a SIP REFER that the carrier routes to your Cloud Run WebSocket.

## Step 6 — Telemetry

CX writes every turn to Conversation History; the bridge writes to Cloud Logging. Stitch sessions by `call-sid` (CX exposes it as a session parameter via Phone Gateway).
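
A minimal sketch of the stitching step, assuming both sources have already been exported as lists of dicts that carry a `call_sid` and a numeric `ts` timestamp. Those field names and the record shape are hypothetical, not what CX or Cloud Logging emit out of the box.

```python
from collections import defaultdict
from typing import Any


def stitch_by_call_sid(
    cx_turns: list[dict[str, Any]],
    bridge_logs: list[dict[str, Any]],
) -> dict[str, list[dict[str, Any]]]:
    """Group records from both sources by call_sid, ordered by timestamp."""
    timeline: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for source, records in (("cx", cx_turns), ("bridge", bridge_logs)):
        for record in records:
            # Tag each record with its origin so the merged view stays readable.
            timeline[record["call_sid"]].append({**record, "source": source})
    for events in timeline.values():
        events.sort(key=lambda event: event["ts"])
    return dict(timeline)
```

The result is one ordered timeline per call, which makes it easy to see where a call left CX and entered the Live segment.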

## Pitfalls

- **Generator vs Live API**: Generators are stateless single-turn — perfect for "I didn't understand" repair. Live API is for multi-turn open-ended.
- **Phone Gateway latency budget** is tight; STT-TTS round-trip via CX adds ~400ms before the model even sees text. Use Gemini Live for low-latency segments.
- **Webhook timeouts**: CX webhook timeouts default to 5s and can be raised to at most 30s. Async work belongs on Pub/Sub.
- **Region binding**: Phone Gateway numbers live in one region; cross-region calls add 60-100ms.
- **Generative AI safety filters** can block medical answers — tune `safety_settings` on the Generator and Live config.
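
One way to guard against the webhook-timeout pitfall is to put a deadline around slow work and fall back to a holding reply. The sketch below uses `asyncio.wait_for`; the 4-second deadline and the helper names are assumptions for illustration.

```python
import asyncio
from typing import Any, Awaitable

WEBHOOK_DEADLINE_S = 4.0  # assumption: stay inside the configured CX webhook timeout


def text_response(text: str) -> dict[str, Any]:
    """Wrap plain text in the CX webhook response shape."""
    return {"fulfillmentResponse": {"messages": [{"text": {"text": [text]}}]}}


async def respond_within_deadline(
    work: Awaitable[str],
    deadline_s: float = WEBHOOK_DEADLINE_S,
) -> dict[str, Any]:
    """Return the work's result if it finishes in time, else a holding reply."""
    try:
        return text_response(await asyncio.wait_for(work, timeout=deadline_s))
    except asyncio.TimeoutError:
        # wait_for cancels the work on timeout, so anything long-running
        # should be queued elsewhere (for example Pub/Sub) before replying.
        return text_response("One moment while I look that up.")
```

The holding reply keeps the caller engaged while the queued job finishes out of band.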

## How CallSphere does this in production

CallSphere's Healthcare vertical (37 agents, 115+ DB tables) doesn't use Dialogflow CX — we built our own flow engine on FastAPI :8084 that routes between OpenAI Realtime and Anthropic Claude per turn based on PHI sensitivity. CX is excellent for teams that already have IVR investments; for greenfield, our managed product at $149/$499/$1499 (14-day trial, 22% affiliate) ships in days vs weeks. 90+ tools, 6 verticals.

## FAQ

**Q: When do I pick CX over a pure Live API stack?**
When you have hard compliance gates that must be deterministic (insurance verification, KYC). Live API alone isn't auditable enough for most regulated flows.

**Q: Can Live API call tools mid-stream?**
Yes — Live API supports function calling natively; declare tools in `LiveConnectConfig`.
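
As a hedged sketch of what that could look like, the block below declares one tool in dict form and dispatches calls to a local handler. The `check_availability` tool, its stubbed slots, and the `run_tool` helper are hypothetical.

```python
from typing import Any, Callable

# Dict-form declaration for a hypothetical `check_availability` tool.
TOOLS = [{
    "function_declarations": [{
        "name": "check_availability",
        "description": "Return open appointment slots for a given date.",
        "parameters": {
            "type": "object",
            "properties": {
                "date": {"type": "string", "description": "ISO date, e.g. 2026-04-01"},
            },
            "required": ["date"],
        },
    }]
}]

# Passed as the `config` argument when opening the Live session.
LIVE_CONFIG = {"response_modalities": ["AUDIO"], "tools": TOOLS}


def check_availability(date: str) -> dict[str, Any]:
    # Hypothetical stub; a real handler would query the scheduling system.
    return {"date": date, "slots": ["09:00", "14:30"]}


HANDLERS: dict[str, Callable[..., dict[str, Any]]] = {
    "check_availability": check_availability,
}


def run_tool(name: str, args: dict[str, Any]) -> dict[str, Any]:
    """Dispatch one function call from the model to a local handler."""
    handler = HANDLERS.get(name)
    if handler is None:
        return {"error": f"unknown tool: {name}"}
    return handler(**args)
```

In the bridge's receive loop, a message with a `tool_call` carries `function_calls` (each with `id`, `name`, and `args`). The result of `run_tool` would go back through `session.send_tool_response(...)`; check the current `google-genai` docs for the exact response type.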

**Q: What's the cost?**
CX is $0.007/request (text) or $0.06/min (voice with Phone Gateway). Live API on Vertex is $0.0006/sec audio in + $0.0024/sec audio out at `flash-live` rates.
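
To turn those rates into a per-call estimate, here is a small worked sketch. It uses the figures quoted above as-is and assumes CX voice minutes are billed only while CX is handling the call, which may not match your actual invoice.

```python
CX_VOICE_PER_MIN = 0.06    # USD per minute, figure quoted above
LIVE_IN_PER_SEC = 0.0006   # USD per second of audio in, figure quoted above
LIVE_OUT_PER_SEC = 0.0024  # USD per second of audio out, figure quoted above


def cx_voice_cost(minutes: float) -> float:
    """Cost of the CX-handled part of a call."""
    return round(minutes * CX_VOICE_PER_MIN, 4)


def live_cost(seconds_in: float, seconds_out: float) -> float:
    """Cost of the Gemini Live segment, billed per second in each direction."""
    return round(seconds_in * LIVE_IN_PER_SEC + seconds_out * LIVE_OUT_PER_SEC, 4)


def hybrid_call_cost(cx_minutes: float, live_seconds_in: float, live_seconds_out: float) -> float:
    """Total for a call that spends time in both CX and the Live fallback."""
    return round(cx_voice_cost(cx_minutes) + live_cost(live_seconds_in, live_seconds_out), 4)


# Example: 2 minutes in CX, then a Live segment with 90 s of caller audio
# and 60 s of model audio: 0.12 + (0.054 + 0.144) = 0.318 USD.
print(hybrid_call_cost(2, 90, 60))
```

At these rates, output audio dominates the Live portion, so keeping model replies short has a direct effect on cost.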

**Q: Does CX support barge-in?**
Yes — enable in Speech settings; default end-of-speech timeout is 500ms.

**Q: Cross-region failover?**
Replicate the agent config via Terraform; Phone Gateway numbers can fail over to a backup CX agent in another region via the carrier-side route plan.

## Sources

- [Conversational Agents (Dialogflow CX) overview](https://docs.cloud.google.com/dialogflow/cx/docs/concept/console-conversational-agents)
- [Gemini Live API documentation](https://ai.google.dev/gemini-api/docs/live)
- [CX Phone Gateway integration](https://docs.cloud.google.com/dialogflow/cx/docs/concept/integration/phone-gateway)
- [Gemini Live API on Vertex AI](https://docs.cloud.google.com/vertex-ai/generative-ai/docs/live-api)
- [Customer Experience Agent Studio — Google Cloud](https://cloud.google.com/dialogflow)

