Replace Synthflow With a Self-Hosted FastAPI Voice Agent
Synthflow's no-code builder hits walls fast — branching tools, custom auth, real CRMs. Move to a self-hosted FastAPI agent and unlock everything in 600 lines.
TL;DR — Synthflow charges $0.09/min for the engine plus an LLM markup. Self-hosting the same flow on a $40/mo box drops you to LLM-cost-only and unlocks any tool integration that Synthflow's drag-and-drop can't reach.
What you'll build
A FastAPI service running on a single 4-vCPU VM that connects Twilio inbound calls to OpenAI Realtime, executes Python tools (any code you want, no JSON-only nodes), persists transcripts in Postgres, and exposes a /sessions admin UI similar to Synthflow's dashboard.
Prerequisites
- Synthflow account with at least one published agent and tool configurations exported (screenshots are fine).
- A VM (Hetzner CX32, AWS t3.medium, or equivalent) with Docker + Postgres.
- Twilio number and OpenAI Realtime key.
- Python 3.11, FastAPI, asyncpg, websockets.
- Reverse proxy (Caddy or Traefik) for TLS — Twilio Media Streams requires WSS.
Architecture
```mermaid
flowchart TB
    TW[Twilio] --> CADDY[Caddy WSS]
    CADDY --> APP[FastAPI :8000]
    APP --> OAI[OpenAI Realtime]
    APP --> PG[(Postgres)]
    APP --> TOOLS[Your Python tools]
```
Step 1 — Schema and tool registry
```sql
CREATE TABLE call_sessions (
    id         uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    call_sid   text UNIQUE,
    started_at timestamptz DEFAULT now(),
    ended_at   timestamptz,
    transcript jsonb DEFAULT '[]'::jsonb,
    outcome    text
);
CREATE INDEX ON call_sessions (started_at DESC);
```
Step 2 — FastAPI shell with tool registration
```python
from fastapi import FastAPI, WebSocket
from typing import Callable

app = FastAPI()
# registry maps tool name -> (handler, JSON schema)
TOOLS: dict[str, tuple[Callable, dict]] = {}

def tool(name: str, schema: dict):
    def deco(fn):
        TOOLS[name] = (fn, schema)
        return fn
    return deco

@tool("lookup_customer", {
    "type": "object",
    "required": ["phone"],
    "properties": {"phone": {"type": "string"}},
})
async def lookup_customer(phone: str):
    # any Python you want — psycopg, httpx, internal SDKs
    return {"name": "Maria", "tier": "gold"}
```
Hear it before you finish reading
Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.
Step 3 — Realtime bridge
```python import websockets, json, os, asyncio
async def bridge(twilio_ws: WebSocket): headers = [("Authorization", f"Bearer {os.environ['OPENAI_API_KEY']}"), ("OpenAI-Beta", "realtime=v1")] url = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2025-06-03" async with websockets.connect(url, additional_headers=headers) as oai: await oai.send(json.dumps({"type": "session.update", "session": { "instructions": open("system.md").read(), "voice": "alloy", "input_audio_format": "g711_ulaw", "output_audio_format": "g711_ulaw", "turn_detection": {"type": "server_vad"}, "tools": [{"type":"function","name":n,**({"description":""}),"parameters":s} for n,(_,s) in TOOLS.items()], }})) await asyncio.gather(pump_in(twilio_ws, oai), pump_out(twilio_ws, oai)) ```
Step 4 — Tool execution loop
```python
async def handle_function_call(oai, ev):
    name = ev["name"]
    args = json.loads(ev["arguments"])
    fn, _ = TOOLS[name]
    try:
        result = await fn(**args)
    except Exception as e:
        result = {"error": str(e)}
    await oai.send(json.dumps({
        "type": "conversation.item.create",
        "item": {
            "type": "function_call_output",
            "call_id": ev["call_id"],
            "output": json.dumps(result),
        },
    }))
    await oai.send(json.dumps({"type": "response.create"}))
```
Step 5 — Transcript persistence
Subscribe to `response.audio_transcript.done` and `conversation.item.input_audio_transcription.completed`, append each event to `call_sessions.transcript`, and set `outcome` from any tool that calls `set_outcome`.
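The append itself is one jsonb concatenation. A sketch assuming an asyncpg pool (`pool`, `append_transcript`, and `transcript_entry` are names introduced here, not from the article):

```python
import json
from datetime import datetime, timezone

APPEND_SQL = """
UPDATE call_sessions
SET transcript = transcript || $1::jsonb
WHERE call_sid = $2
"""

def transcript_entry(role: str, text: str) -> str:
    """Build one JSON array element to append to the transcript column."""
    return json.dumps([{
        "role": role,
        "text": text,
        "at": datetime.now(timezone.utc).isoformat(),
    }])

async def append_transcript(pool, call_sid: str, role: str, text: str):
    # `||` appends the one-element array to the existing jsonb array
    async with pool.acquire() as conn:
        await conn.execute(APPEND_SQL, transcript_entry(role, text), call_sid)
```

Appending per event (rather than rewriting the whole array) keeps writes cheap even on long calls.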
Step 6 — Admin UI
A 60-line Next.js page hits GET /sessions and renders waveforms + transcripts. Replace Synthflow's dashboard in a day.
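The backend half of that page is one query plus a row-shaping helper. A sketch, assuming the `app` from Step 2 and an asyncpg pool (`session_summary` is a hypothetical helper, not from the article):

```python
from datetime import datetime, timezone

def session_summary(row: dict) -> dict:
    """Shape one call_sessions row for the admin UI, adding call duration."""
    started, ended = row.get("started_at"), row.get("ended_at")
    dur = (ended - started).total_seconds() if started and ended else None
    return {**row, "duration_s": dur}

# The endpoint itself (assumes `app` from Step 2 and an asyncpg `pool`):
# @app.get("/sessions")
# async def list_sessions(limit: int = 50):
#     rows = await pool.fetch(
#         "SELECT * FROM call_sessions ORDER BY started_at DESC LIMIT $1", limit)
#     return [session_summary(dict(r)) for r in rows]
```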
Step 7 — Deploy
Caddyfile:
```
agent.example.com {
    reverse_proxy 127.0.0.1:8000
}
```
Still reading? Stop comparing — try CallSphere live.
CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.
Twilio Voice URL: https://agent.example.com/twilio-voice (returns TwiML `<Connect><Stream url="wss://agent.example.com/media"/></Connect>`).
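That endpoint only has to return a static TwiML document. A sketch (`twiml_connect` is a name introduced here; the commented route shows how it would wire into the Step 2 app):

```python
def twiml_connect(stream_url: str) -> str:
    """TwiML telling Twilio to open a Media Stream to our WSS endpoint."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<Response><Connect><Stream url="{stream_url}"/></Connect></Response>'
    )

# Wire-up on the Step 2 app (FastAPI Response, media_type="text/xml"):
# @app.post("/twilio-voice")
# async def twilio_voice():
#     return Response(content=twiml_connect("wss://agent.example.com/media"),
#                     media_type="text/xml")
```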
Common pitfalls
- Reverse proxy buffering kills audio. Disable buffering for the WSS path.
- VAD too generous on noisy lines. Tune `silence_duration_ms` per locale.
- Forgetting `response.create` after a tool result. Calls go silent forever.
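For the VAD pitfall, the knobs live in the `turn_detection` object sent via `session.update`. The values below are illustrative starting points, not OpenAI-recommended settings:

```python
# Per-locale VAD tuning (illustrative values, not recommendations):
TURN_DETECTION = {
    "type": "server_vad",
    "threshold": 0.6,            # raise on noisy lines to cut false turn-ends
    "prefix_padding_ms": 300,    # audio retained before detected speech
    "silence_duration_ms": 700,  # longer pause required before end-of-turn
}

# Sent as: {"type": "session.update",
#           "session": {"turn_detection": TURN_DETECTION}}
```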
How CallSphere does this in production
This pattern is CallSphere — at 100x scale. Healthcare's FastAPI on :8084 ships 14 tools (PHI redaction, eligibility lookup, appointment booking) under HIPAA. OneRoof Property uses 10 specialists over WebRTC + Pion + NATS. Salon runs 4 ElevenLabs agents with GB-YYYYMMDD-### references. Pricing: $149/$499/$1499 with 14-day trial. Compare on /compare/synthflow.
FAQ
Is FastAPI fast enough? Easily — async I/O dominates.
What about phone number pools? Twilio Elastic SIP supports number pooling natively.
HIPAA? Add audit logs, encrypt-at-rest, BAA with Twilio + OpenAI. CallSphere's healthcare stack does this end-to-end.
Branching like Synthflow's flow builder? Use sub-agents (handoffs) instead of nodes.
Cost at 30k min/mo? Synthflow: ~$2,700+. Self-host: ~$1,400.
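A quick sanity check on those figures. The Synthflow number is the $0.09/min engine fee alone (before LLM markup); the per-minute LLM rate below is inferred from the article's own ~$1,400 total, not a quoted OpenAI price:

```python
MINUTES = 30_000
SYNTHFLOW_ENGINE = 0.09   # $/min engine fee from the TL;DR, pre-LLM markup
LLM_PER_MIN = 0.045       # $/min, back-derived from the article's ~$1,400 total
VM_MONTHLY = 40.0         # the $40/mo box from the TL;DR

synthflow_floor = MINUTES * SYNTHFLOW_ENGINE
self_host = MINUTES * LLM_PER_MIN + VM_MONTHLY
print(f"Synthflow floor: ${synthflow_floor:,.0f}  self-host: ${self_host:,.0f}")
```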
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available -- no signup required.