Replace Synthflow With a Self-Hosted FastAPI Voice Agent
Synthflow's no-code builder hits walls fast — branching tools, custom auth, real CRMs. Move to a self-hosted FastAPI agent and unlock everything in 600 lines.
TL;DR — Synthflow charges $0.09/min for the engine plus an LLM markup. Self-hosting the same flow on a $40/mo box drops you to LLM-cost-only and unlocks any tool integration that Synthflow's drag-and-drop can't reach.
What you'll build
A FastAPI service running on a single 4-vCPU VM that connects Twilio inbound calls to OpenAI Realtime, executes Python tools (any code you want, no JSON-only nodes), persists transcripts in Postgres, and exposes a /sessions admin UI similar to Synthflow's dashboard.
Prerequisites
- Synthflow account with at least one published agent and tool configurations exported (screenshots are fine).
- A VM (Hetzner CX32, AWS t3.medium, or equivalent) with Docker + Postgres.
- Twilio number and OpenAI Realtime key.
- Python 3.11, FastAPI, asyncpg, websockets.
- Reverse proxy (Caddy or Traefik) for TLS — Twilio Media Streams requires WSS.
Architecture
```mermaid
flowchart TB
    TW[Twilio] --> CADDY[Caddy WSS]
    CADDY --> APP[FastAPI :8000]
    APP --> OAI[OpenAI Realtime]
    APP --> PG[(Postgres)]
    APP --> TOOLS[Your Python tools]
```
Step 1 — Schema and tool registry
```sql
CREATE TABLE call_sessions (
    id         uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    call_sid   text UNIQUE,
    started_at timestamptz DEFAULT now(),
    ended_at   timestamptz,
    transcript jsonb DEFAULT '[]'::jsonb,
    outcome    text
);
CREATE INDEX ON call_sessions (started_at DESC);
```
Step 2 — FastAPI shell with tool registration
```python
from fastapi import FastAPI, WebSocket
from typing import Callable

app = FastAPI()
# registry maps tool name -> (handler, JSON schema)
TOOLS: dict[str, tuple[Callable, dict]] = {}

def tool(name: str, schema: dict):
    def deco(fn):
        TOOLS[name] = (fn, schema)
        return fn
    return deco

@tool("lookup_customer", {
    "type": "object",
    "required": ["phone"],
    "properties": {"phone": {"type": "string"}},
})
async def lookup_customer(phone: str):
    # any Python you want — psycopg, httpx, internal SDKs
    return {"name": "Maria", "tier": "gold"}
```
Hear it before you finish reading
Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.
Step 3 — Realtime bridge
```python import websockets, json, os, asyncio
async def bridge(twilio_ws: WebSocket): headers = [("Authorization", f"Bearer {os.environ['OPENAI_API_KEY']}"), ("OpenAI-Beta", "realtime=v1")] url = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2025-06-03" async with websockets.connect(url, additional_headers=headers) as oai: await oai.send(json.dumps({"type": "session.update", "session": { "instructions": open("system.md").read(), "voice": "alloy", "input_audio_format": "g711_ulaw", "output_audio_format": "g711_ulaw", "turn_detection": {"type": "server_vad"}, "tools": [{"type":"function","name":n,**({"description":""}),"parameters":s} for n,(_,s) in TOOLS.items()], }})) await asyncio.gather(pump_in(twilio_ws, oai), pump_out(twilio_ws, oai)) ```
Step 4 — Tool execution loop
```python
async def handle_function_call(oai, ev):
    name = ev["name"]
    args = json.loads(ev["arguments"])
    fn, _ = TOOLS[name]
    try:
        result = await fn(**args)
    except Exception as e:
        result = {"error": str(e)}
    await oai.send(json.dumps({
        "type": "conversation.item.create",
        "item": {
            "type": "function_call_output",
            "call_id": ev["call_id"],
            "output": json.dumps(result),
        },
    }))
    await oai.send(json.dumps({"type": "response.create"}))
```
Step 5 — Transcript persistence
Subscribe to `response.audio_transcript.done` and `conversation.item.input_audio_transcription.completed`, append each event to `call_sessions.transcript`, and set `outcome` from any tool that calls `set_outcome`.
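The append itself is one jsonb concatenation. A sketch assuming an asyncpg pool (`pool`, `append_transcript`, and `transcript_entry` are names introduced here, not from the article):

```python
import json
from datetime import datetime, timezone

APPEND_SQL = """
UPDATE call_sessions
SET transcript = transcript || $1::jsonb
WHERE call_sid = $2
"""

def transcript_entry(role: str, text: str) -> str:
    """Build one JSON array element to append to the transcript column."""
    return json.dumps([{
        "role": role,
        "text": text,
        "at": datetime.now(timezone.utc).isoformat(),
    }])

async def append_transcript(pool, call_sid: str, role: str, text: str):
    # `||` appends the one-element array to the existing jsonb array
    async with pool.acquire() as conn:
        await conn.execute(APPEND_SQL, transcript_entry(role, text), call_sid)
```

Appending per event (rather than rewriting the whole array) keeps writes cheap even on long calls.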
Step 6 — Admin UI
A 60-line Next.js page hits GET /sessions and renders waveforms + transcripts. Replace Synthflow's dashboard in a day.
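The backend half of that page is one query plus a row-shaping helper. A sketch, assuming the `app` from Step 2 and an asyncpg pool (`session_summary` is a hypothetical helper, not from the article):

```python
from datetime import datetime, timezone

def session_summary(row: dict) -> dict:
    """Shape one call_sessions row for the admin UI, adding call duration."""
    started, ended = row.get("started_at"), row.get("ended_at")
    dur = (ended - started).total_seconds() if started and ended else None
    return {**row, "duration_s": dur}

# The endpoint itself (assumes `app` from Step 2 and an asyncpg `pool`):
# @app.get("/sessions")
# async def list_sessions(limit: int = 50):
#     rows = await pool.fetch(
#         "SELECT * FROM call_sessions ORDER BY started_at DESC LIMIT $1", limit)
#     return [session_summary(dict(r)) for r in rows]
```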
Step 7 — Deploy
Caddyfile:
```
agent.example.com {
    reverse_proxy 127.0.0.1:8000
}
```
Still reading? Stop comparing — try CallSphere live.
CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.
Twilio Voice URL: https://agent.example.com/twilio-voice (returns TwiML `<Connect><Stream url="wss://agent.example.com/media"/></Connect>`).
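That endpoint only has to return a static TwiML document. A sketch (`twiml_connect` is a name introduced here; the commented route shows how it would wire into the Step 2 app):

```python
def twiml_connect(stream_url: str) -> str:
    """TwiML telling Twilio to open a Media Stream to our WSS endpoint."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<Response><Connect><Stream url="{stream_url}"/></Connect></Response>'
    )

# Wire-up on the Step 2 app (FastAPI Response, media_type="text/xml"):
# @app.post("/twilio-voice")
# async def twilio_voice():
#     return Response(content=twiml_connect("wss://agent.example.com/media"),
#                     media_type="text/xml")
```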
Common pitfalls
- Reverse proxy buffering kills audio. Disable buffering for the WSS path.
- VAD too generous on noisy lines. Tune `silence_duration_ms` per locale.
- Forgetting `response.create` after a tool result. Calls go silent forever.
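For the VAD pitfall, the knobs live in the `turn_detection` object sent via `session.update`. The values below are illustrative starting points, not OpenAI-recommended settings:

```python
# Per-locale VAD tuning (illustrative values, not recommendations):
TURN_DETECTION = {
    "type": "server_vad",
    "threshold": 0.6,            # raise on noisy lines to cut false turn-ends
    "prefix_padding_ms": 300,    # audio retained before detected speech
    "silence_duration_ms": 700,  # longer pause required before end-of-turn
}

# Sent as: {"type": "session.update",
#           "session": {"turn_detection": TURN_DETECTION}}
```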
How CallSphere does this in production
This pattern is CallSphere — at 100x scale. Healthcare's FastAPI on :8084 ships 14 tools (PHI redaction, eligibility lookup, appointment booking) under HIPAA. OneRoof Property uses 10 specialists over WebRTC + Pion + NATS. Salon runs 4 ElevenLabs agents with GB-YYYYMMDD-### references. Pricing: $149/$499/$1499 with 14-day trial. Compare on /compare/synthflow.
FAQ
Is FastAPI fast enough? Easily — async I/O dominates.
What about phone number pools? Twilio Elastic SIP supports number pooling natively.
HIPAA? Add audit logs, encrypt-at-rest, BAA with Twilio + OpenAI. CallSphere's healthcare stack does this end-to-end.
Branching like Synthflow's flow builder? Use sub-agents (handoffs) instead of nodes.
Cost at 30k min/mo? Synthflow: ~$2,700+. Self-host: ~$1,400.
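A quick sanity check on those figures. The Synthflow number is the $0.09/min engine fee alone (before LLM markup); the per-minute LLM rate below is inferred from the article's own ~$1,400 total, not a quoted OpenAI price:

```python
MINUTES = 30_000
SYNTHFLOW_ENGINE = 0.09   # $/min engine fee from the TL;DR, pre-LLM markup
LLM_PER_MIN = 0.045       # $/min, back-derived from the article's ~$1,400 total
VM_MONTHLY = 40.0         # the $40/mo box from the TL;DR

synthflow_floor = MINUTES * SYNTHFLOW_ENGINE
self_host = MINUTES * LLM_PER_MIN + VM_MONTHLY
print(f"Synthflow floor: ${synthflow_floor:,.0f}  self-host: ${self_host:,.0f}")
```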
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available -- no signup required.