By Sagar Shankaran, Founder of CallSphere
Combine Dialogflow CX's deterministic flows with Gemini Live API's bidirectional streaming for hybrid voice agents. Phone Gateway, generative fallback, real working integration.
Key takeaways
TL;DR: Dialogflow CX flows are still the gold standard for compliant, deterministic conversation paths (insurance verification, ID&V). Gemini Live API is the GCP counterpart to OpenAI Realtime: a bidirectional WebSocket with native audio. Hybrid agents use CX for the regulated path and a Gemini Live fallback for everything else.
This walkthrough builds a Dialogflow CX agent that handles a structured intake flow (verify identity, capture appointment intent), with a "Generative Fallback" that hands the live audio stream to Gemini Live API for free-form Q&A. Phone Gateway provides the PSTN front end. The Gemini Live bridge runs on Cloud Run.
roles/dialogflow.client and roles/aiplatform.user.
google-cloud-dialogflow-cx, google-genai Python packages.
flowchart TD
PSTN[Caller] --> PG[Phone Gateway]
PG --> CX[Dialogflow CX Flow]
CX -->|deterministic intents| FB[Fulfillment webhook]
FB --> CRM[(CRM)]
CX -->|generative fallback| GLB[Gemini Live Bridge Cloud Run]
GLB <-->|wss| GL[Gemini Live API]
GLB --> CX
CX -->|Chirp TTS| PG
PG --> PSTN
In the Conversational Agents console: New agent → name hybrid-voice → region us-central1 → enable Generative AI features. Under Manage → Integrations → Phone Gateway click Configure new number.
Build a single flow identity-verification with pages:
collect_dob (parameter @sys.date required)
collect_member_id (parameter @sys.number-sequence)
verify (calls webhook, transitions on success/failure)
Set the Default Welcome Intent to route into identity-verification. The fulfillment webhook is a Cloud Run service.
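The verify page's webhook logic can be sketched as a pure function over the CX webhook JSON. This is a minimal sketch, not the production handler: the parameter names `dob` and `member_id`, the `verified` session parameter, and the in-memory `MEMBERS` table (a stand-in for the CRM lookup) are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical stand-in for the CRM lookup: member_id -> date of birth.
MEMBERS = {"12345678": date(1990, 4, 12)}


def handle_verify(request: dict) -> dict:
    """Build a Dialogflow CX WebhookResponse for the verify page."""
    params = request.get("sessionInfo", {}).get("parameters", {})
    dob = params.get("dob") or {}  # @sys.date arrives as {"year", "month", "day"}
    member_id = str(params.get("member_id", ""))

    try:
        supplied = date(int(dob["year"]), int(dob["month"]), int(dob["day"]))
    except (KeyError, TypeError, ValueError):
        supplied = None  # missing or malformed date of birth

    verified = supplied is not None and MEMBERS.get(member_id) == supplied
    text = "Thanks, you're verified." if verified else "Sorry, I couldn't verify those details."
    return {
        "fulfillmentResponse": {"messages": [{"text": {"text": [text]}}]},
        # The page's conditional routes can branch on $session.params.verified.
        "sessionInfo": {"parameters": {"verified": verified}},
    }
```

Wrapped in any HTTP framework on Cloud Run, the function takes the parsed request body and its return value is serialized back as the response. Keeping it framework-free makes the success and failure transitions easy to unit test.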
In the Default Start Flow, set Event Handlers → no-match-default to a Generator:
```yaml
generator:
  prompt: |
    The user said: $last-user-utterance
    Reply briefly as a friendly receptionist.
  model: gemini-2.5-flash
```
This handles single-turn fallbacks. For multi-turn free-form, route to a webhook that hands off to Gemini Live.
```python
import asyncio
import os

from fastapi import FastAPI, WebSocket
from google import genai
from google.genai import types

# Vertex AI client; PROJECT must be set in the Cloud Run environment.
client = genai.Client(vertexai=True, project=os.environ["PROJECT"], location="us-central1")
app = FastAPI()


@app.websocket("/live")
async def live(ws: WebSocket):
    await ws.accept()
    config = types.LiveConnectConfig(
        response_modalities=["AUDIO"],
        speech_config=types.SpeechConfig(
            voice_config=types.VoiceConfig(
                prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Charon"))),
        system_instruction="You are a friendly receptionist. Keep replies short.")
    async with client.aio.live.connect(model="gemini-2.5-flash-live", config=config) as session:

        async def from_caller():
            # Forward caller audio (16 kHz PCM) to Gemini until the socket closes.
            async for frame in ws.iter_bytes():
                await session.send_realtime_input(
                    audio=types.Blob(data=frame, mime_type="audio/pcm;rate=16000"))

        async def to_caller():
            # receive() ends after each model turn, so loop to keep the call multi-turn.
            while True:
                async for resp in session.receive():
                    if resp.data:
                        await ws.send_bytes(resp.data)

        # Stop both directions as soon as either side finishes or fails.
        tasks = [asyncio.create_task(from_caller()), asyncio.create_task(to_caller())]
        _, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
        for task in pending:
            task.cancel()
```
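The bridge forwards whatever byte frames the caller side sends, so the sender has to chunk its audio sensibly. A small sketch of a framing helper follows, assuming 16-bit mono PCM at the 16 kHz rate declared in the mime type; the 20 ms frame size and the `pcm_frames` name are illustrative choices, not requirements stated by the API.

```python
SAMPLE_RATE_HZ = 16_000  # matches mime_type "audio/pcm;rate=16000"
BYTES_PER_SAMPLE = 2     # assumption: 16-bit mono PCM


def pcm_frames(pcm: bytes, frame_ms: int = 20):
    """Yield fixed-duration PCM frames; the final frame may be shorter."""
    frame_bytes = SAMPLE_RATE_HZ * frame_ms // 1000 * BYTES_PER_SAMPLE
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]
```

A test client can iterate `pcm_frames(buffer)` and send each frame as one binary WebSocket message to /live.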
Deploy the bridge with gcloud run deploy bridge --source . and it is live in roughly 90 seconds.
Dialogflow CX doesn't expose raw audio to webhooks directly, so for free-form moments you "park" the CX session and hand the call to the bridge via the Phone Gateway's Custom Telephony Provider field. CX's live-agent-handoff event triggers a SIP REFER that the carrier routes to your Cloud Run WebSocket.
CX writes every turn to Conversation History; the bridge writes to Cloud Logging. Stitch sessions by call-sid (CX exposes it as a session parameter via Phone Gateway).
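Stitching the two sources by call-sid might look like the sketch below. The record shape is hypothetical: each exported CX turn and each bridge log entry is assumed to be a dict carrying `call_sid` and an ISO-8601 UTC timestamp under `ts`.

```python
from collections import defaultdict


def stitch_by_call_sid(cx_turns: list[dict], bridge_logs: list[dict]) -> dict[str, list[dict]]:
    """Merge CX turns and bridge log entries into one timeline per call."""
    timelines: dict[str, list[dict]] = defaultdict(list)
    for source, records in (("cx", cx_turns), ("bridge", bridge_logs)):
        for record in records:
            # Tag each event with where it came from before merging.
            timelines[record["call_sid"]].append({**record, "source": source})
    # ISO-8601 UTC timestamps in the same format sort correctly as strings.
    return {sid: sorted(events, key=lambda e: e["ts"]) for sid, events in timelines.items()}
```

The result maps each call-sid to a single ordered list, which shows where a call left the CX flow and entered the Live bridge.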
safety_settings on the Generator and Live config.
CallSphere's Healthcare vertical (37 agents, 115+ DB tables) doesn't use Dialogflow CX; we built our own flow engine on FastAPI :8084 that routes between OpenAI Realtime and Anthropic Claude per turn based on PHI sensitivity. CX is excellent for teams that already have IVR investments; for greenfield, our managed product at $149/$499/$1499 (14-day trial, 22% affiliate) ships in days vs weeks. 90+ tools, 6 verticals.
Q: When do I pick CX over a pure Live API stack?
When you have hard compliance gates that must be deterministic (insurance verification, KYC). Live API alone isn't auditable enough for most regulated flows.
Q: Can Live API call tools mid-stream?
Yes — Live API supports function calling natively; declare tools in LiveConnectConfig.
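One way to organize the tool side is sketched below, kept SDK-free so it runs on its own. The `book_appointment` tool, its parameters, and the `dispatch` helper are hypothetical. The declaration dict follows the usual function-declaration shape (name, description, schema-typed parameters); passing it to LiveConnectConfig and sending results back to the session are left to the SDK and not shown.

```python
# Hypothetical tool declaration in function-declaration form.
BOOK_APPOINTMENT = {
    "name": "book_appointment",
    "description": "Book an appointment slot for the caller.",
    "parameters": {
        "type": "OBJECT",
        "properties": {
            "date": {"type": "STRING", "description": "ISO date, e.g. 2026-03-01"},
            "time": {"type": "STRING", "description": "24-hour time, e.g. 14:30"},
        },
        "required": ["date", "time"],
    },
}


def book_appointment(date: str, time: str) -> dict:
    # Placeholder: a real handler would call the scheduling backend.
    return {"status": "booked", "slot": f"{date}T{time}"}


HANDLERS = {"book_appointment": book_appointment}


def dispatch(name: str, args: dict) -> dict:
    """Run the handler for a model-issued function call without raising into the audio loop."""
    handler = HANDLERS.get(name)
    if handler is None:
        return {"error": f"unknown tool: {name}"}
    try:
        return handler(**args)
    except TypeError as exc:  # missing or unexpected arguments
        return {"error": str(exc)}
```

Returning an error dict instead of raising lets the model apologize or re-ask while the audio stream stays open.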
Q: What's the cost?
CX is $0.007/request (text) or $0.06/min (voice with Phone Gateway). Live API on Vertex is $0.0006/sec audio in + $0.0024/sec audio out at flash-live rates.
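At those rates, Live API audio works out to $0.036/min in and $0.144/min out. A rough per-call estimator using only the figures quoted above can be sketched as follows; the function name and the split of a call into CX minutes and Live seconds are illustrative assumptions.

```python
CX_VOICE_PER_MIN = 0.06          # USD, CX voice with Phone Gateway
LIVE_AUDIO_IN_PER_SEC = 0.0006   # USD, Live API audio input
LIVE_AUDIO_OUT_PER_SEC = 0.0024  # USD, Live API audio output


def estimate_call_cost(cx_minutes: float, live_in_seconds: float, live_out_seconds: float) -> float:
    """Estimate the USD cost of one hybrid call from the quoted rates."""
    cx = cx_minutes * CX_VOICE_PER_MIN
    live = live_in_seconds * LIVE_AUDIO_IN_PER_SEC + live_out_seconds * LIVE_AUDIO_OUT_PER_SEC
    return round(cx + live, 4)
```

For example, 2 minutes in the CX flow followed by 120 seconds of caller audio and 60 seconds of model audio on Live comes to 0.12 + 0.072 + 0.144, or about $0.34.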
Q: Does CX support barge-in?
Yes. Enable it in Speech settings; the default end-of-speech timeout is 500ms.
Q: Cross-region failover?
Replicate the agent config via Terraform; Phone Gateway numbers can fail over to a backup CX agent in another region via the carrier-side route plan.
Written by
Sagar Shankaran · Founder, CallSphere
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.