By Sagar Shankaran, Founder of CallSphere
Compare Statsig (now OpenAI-owned), GrowthBook with MCP, and LaunchDarkly for shipping AI prompt and model changes safely. Real flag patterns: prompt rollout, model swap, kill switch.
Key takeaways
TL;DR: Statsig (acquired by OpenAI) bundles flags and experimentation; GrowthBook is the open-source alternative with MCP integration; LaunchDarkly is the enterprise default. Use flags for prompt and model versions, not just on/off feature toggles: AI changes need percentage rollouts, not all-or-nothing switches.
What we're building: a voice agent that pulls its system prompt and model name from a feature-flag service at session start, with cohort targeting by tenant ID and a global kill switch. Three patterns follow, one per platform; pick one.
```mermaid
flowchart LR
    CALL[New call] --> AGENT[Voice agent]
    AGENT -->|user_id| FLAGS[Flag SDK]
    FLAGS --> SDK_LD[LaunchDarkly]
    FLAGS --> SDK_ST[Statsig]
    FLAGS --> SDK_GB[GrowthBook]
    SDK_LD --> CONFIG[(Eval rules)]
    AGENT -->|chosen prompt + model| LLM[OpenAI Realtime]
    AGENT -.->|track event| EVENTS[Events table]
```
For AI agents, flag prompts and models, not features:
```json
{
  "agent_system_prompt_v": { "rollout": "v3", "fallback": "v2" },
  "agent_model": { "rollout": "gpt-realtime", "fallback": "gpt-realtime-mini" },
  "agent_tool_set": { "rollout": "core+v2-extras", "fallback": "core" },
  "agent_kill_switch": false
}
```
The kill switch is non-negotiable: if a model regression hits, you flip one boolean and traffic moves back to the known-good config in seconds.
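The following is a minimal, SDK-free sketch of how the rollout, fallback, and kill-switch semantics resolve into a single session config. It assumes the JSON shape shown above. It also assumes a hypothetical `in_rollout` boolean that your platform's targeting would normally compute. This is an illustration, not any vendor's evaluation logic.

```python
# Flag document in the same shape as the JSON example (values copied from it).
FLAGS = {
    "agent_system_prompt_v": {"rollout": "v3", "fallback": "v2"},
    "agent_model": {"rollout": "gpt-realtime", "fallback": "gpt-realtime-mini"},
    "agent_tool_set": {"rollout": "core+v2-extras", "fallback": "core"},
    "agent_kill_switch": False,
}

# Hypothetical mapping from flag keys to the session-config keys the agent reads.
KEYS = {
    "agent_system_prompt_v": "prompt_v",
    "agent_model": "model",
    "agent_tool_set": "tool_set",
}


def resolve(flags: dict, in_rollout: bool) -> dict:
    """Pick rollout or fallback for every flag; the kill switch forces fallback."""
    use_rollout = in_rollout and not flags.get("agent_kill_switch", False)
    side = "rollout" if use_rollout else "fallback"
    return {out_key: flags[flag_key][side] for flag_key, out_key in KEYS.items()}


print(resolve(FLAGS, in_rollout=True))
# {'prompt_v': 'v3', 'model': 'gpt-realtime', 'tool_set': 'core+v2-extras'}
print(resolve({**FLAGS, "agent_kill_switch": True}, in_rollout=True))
# {'prompt_v': 'v2', 'model': 'gpt-realtime-mini', 'tool_set': 'core'}
```

Flipping the single boolean sends every cohort to the fallback side, whatever the rollout state is.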
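```python
import os

import ldclient
from ldclient import Context
from ldclient.config import Config

# Initialize once per process; the SDK keeps flag rules in memory and streams updates.
ldclient.set_config(Config(os.environ["LD_SDK_KEY"]))
ld = ldclient.get()


def session_config(call_id: str, tenant_id: str, vertical: str = "healthcare") -> dict:
    # Key the context by call; tenant and vertical are attributes for targeting rules.
    ctx = Context.builder(call_id).kind("call") \
        .set("tenant", tenant_id).set("vertical", vertical).build()
    if ld.variation("agent_kill_switch", ctx, False):
        return SAFE_DEFAULTS  # known-good config, defined further down
    # Each variation() call carries a hard-coded default for when evaluation fails.
    return {
        "prompt_v": ld.variation("agent_system_prompt_v", ctx, "v2"),
        "model": ld.variation("agent_model", ctx, "gpt-realtime-mini"),
        "tool_set": ld.variation("agent_tool_set", ctx, "core"),
    }
```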
LaunchDarkly's percentage rollouts and segments are mature, and its experimentation add-on hooks into your data warehouse.
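A percentage rollout only works if the same call lands in the same bucket every time its flag is evaluated. A common way to achieve this is to hash the flag key together with the context key. The sketch below shows that general technique using SHA-256 and 10,000 buckets. It is not LaunchDarkly's actual bucketing algorithm, and `bucket` and `in_rollout` are hypothetical names.

```python
import hashlib


def bucket(flag_key: str, context_key: str) -> float:
    """Map (flag, context) deterministically to one of 10,000 buckets, as a value in [0, 100)."""
    digest = hashlib.sha256(f"{flag_key}.{context_key}".encode()).digest()
    n = int.from_bytes(digest[:8], "big")
    return n % 10_000 / 100  # 0.01% granularity


def in_rollout(flag_key: str, context_key: str, percent: float) -> bool:
    """True if this context falls inside the first `percent` of the bucket space."""
    return bucket(flag_key, context_key) < percent


# Same inputs, same answer: a call never flips between variants mid-session.
assert in_rollout("agent_model", "call-123", 10) == in_rollout("agent_model", "call-123", 10)

# Roughly `percent` of keys land in the rollout.
share = sum(in_rollout("agent_model", f"call-{i}", 10) for i in range(10_000)) / 10_000
print(f"{share:.1%}")  # roughly 10%
```

Including the flag key in the hash means a call that is in the 10% cohort for one flag is not automatically in the 10% cohort for every other flag.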
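```python
import os

from statsig import statsig
from statsig.statsig_user import StatsigUser

# Initialize once per process with the server secret key.
statsig.initialize(os.environ["STATSIG_SDK_KEY"])


def session_config(call_id: str, tenant_id: str) -> dict:
    user = StatsigUser(user_id=call_id, custom={"tenant": tenant_id})
    # Feature gate as kill switch: when it passes, serve the known-good config.
    if statsig.check_gate(user, "agent_kill_switch"):
        return SAFE_DEFAULTS  # defined further down
    # One dynamic config carries every AI setting; get() takes a hard-coded default.
    cfg = statsig.get_config(user, "voice_agent")
    return {
        "prompt_v": cfg.get("prompt_v", "v2"),
        "model": cfg.get("model", "gpt-realtime-mini"),
        "tool_set": cfg.get("tool_set", "core"),
    }
```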
Statsig is best when you want flags + experiment results in the same dashboard. Now part of OpenAI, it has tightened LLM-aware defaults (auto-track input/output tokens per gate).
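```python
import os

from growthbook import GrowthBook


def session_config(call_id: str, tenant_id: str) -> dict:
    # A lightweight instance per call keeps attributes from leaking between concurrent calls.
    gb = GrowthBook(
        api_host="https://cdn.growthbook.io",  # GrowthBook Cloud; use your host if self-hosted
        client_key=os.environ["GB_KEY"],
        attributes={"id": call_id, "tenant": tenant_id},
    )
    gb.load_features()  # served from the SDK's process-wide cache after the first fetch
    if gb.is_on("agent_kill_switch"):
        return SAFE_DEFAULTS  # defined further down
    return {
        "prompt_v": gb.get_feature_value("agent_system_prompt_v", "v2"),
        "model": gb.get_feature_value("agent_model", "gpt-realtime-mini"),
        "tool_set": gb.get_feature_value("agent_tool_set", "core"),
    }
```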
GrowthBook's MCP server (2026) lets Claude or other agents read/write flags directly — useful for "the agent ships its own canary".
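```python
def end_call(call_id: str, cfg: dict, success: bool, latency_ms: int, ev) -> None:
    # cfg is the dict session_config() returned at call start, so the event records
    # the flag values this call actually ran with. `ev` is your event client (placeholder).
    ev.track(call_id, "call_end", {
        "success": success,
        "latency_ms": latency_ms,
        "prompt_v": cfg["prompt_v"],
        "model": cfg["model"],
    })
```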
Without outcome events, you have flags but no learning. Tie every call to its flag values so the experiment view can compute lift.
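Once every `call_end` event carries its flag values, computing lift reduces to a group-by. The following is a minimal sketch. It assumes the events have already been pulled from your events table into dicts with the `success` and `prompt_v` fields shown above. The sample rows are invented for illustration, and the comparison is a plain success-rate ratio with no significance test.

```python
from collections import defaultdict


def success_rates(events: list[dict], key: str) -> dict[str, float]:
    """Success rate per flag value (e.g. per prompt_v)."""
    totals: dict[str, int] = defaultdict(int)
    wins: dict[str, int] = defaultdict(int)
    for e in events:
        totals[e[key]] += 1
        wins[e[key]] += bool(e["success"])
    return {v: wins[v] / totals[v] for v in totals}


def lift(events: list[dict], key: str, control: str, treatment: str) -> float:
    """Relative lift of treatment over control: (t - c) / c."""
    rates = success_rates(events, key)
    return (rates[treatment] - rates[control]) / rates[control]


# Illustrative rows only, not real results.
events = (
    [{"prompt_v": "v2", "success": True}] * 80
    + [{"prompt_v": "v2", "success": False}] * 20
    + [{"prompt_v": "v3", "success": True}] * 88
    + [{"prompt_v": "v3", "success": False}] * 12
)
print(f"{lift(events, 'prompt_v', control='v2', treatment='v3'):+.1%}")  # +10.0%
```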
All three SDKs keep flag rules in memory and refresh them by polling or streaming (SSE). In Python, initialize the SDK once at process start, not per call. Initializing per call re-downloads the rules and adds that latency to every call.
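One way to guarantee a single initialization, even when the first calls arrive concurrently on several threads, is a lazy holder guarded by a lock. This is a generic sketch. `make_client` stands in for whichever SDK setup you use, such as the `ldclient.set_config(...)` / `ldclient.get()` pair or `statsig.initialize(...)`. The `Once` class is a hypothetical helper, not part of any SDK.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class Once(Generic[T]):
    """Run `factory` at most once per process and return the same object afterwards."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._value: Optional[T] = None
        self._done = False
        self._lock = threading.Lock()

    def get(self) -> T:
        if not self._done:  # fast path after init: no lock taken
            with self._lock:
                if not self._done:  # re-check: another thread may have won the race
                    self._value = self._factory()
                    self._done = True
        return self._value  # type: ignore[return-value]


init_count = 0


def make_client() -> object:
    """Stand-in for SDK setup; counts how often it runs."""
    global init_count
    init_count += 1
    return object()


flag_client = Once(make_client)
threads = [threading.Thread(target=flag_client.get) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(init_count)  # 1
```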
```python
# Known-good config served when the kill switch is on (values match the fallbacks above).
SAFE_DEFAULTS = {"prompt_v": "v2", "model": "gpt-realtime-mini", "tool_set": "core"}
```
Every flag lookup (variation(), get_feature_value(), cfg.get()) must have a hard-coded default. If LaunchDarkly's CDN is degraded, you keep serving, just without rollout granularity.
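The same rule can be enforced in one place for any SDK. The following is a minimal sketch. It assumes a hypothetical `lookup(key, default)` callable that wraps your SDK's evaluation, such as `ld.variation` or `gb.get_feature_value`. If the lookup raises an exception or returns an empty value, the sketch falls back to the hard-coded `SAFE_DEFAULTS` entry, so the session always gets a complete config.

```python
from typing import Any, Callable

SAFE_DEFAULTS = {"prompt_v": "v2", "model": "gpt-realtime-mini", "tool_set": "core"}

# Hypothetical mapping from session-config keys to flag keys.
FLAG_KEYS = {"prompt_v": "agent_system_prompt_v", "model": "agent_model", "tool_set": "agent_tool_set"}


def build_config(lookup: Callable[[str, Any], Any]) -> dict:
    """Evaluate every flag, falling back to SAFE_DEFAULTS on errors or empty values."""
    cfg = {}
    for name, flag_key in FLAG_KEYS.items():
        default = SAFE_DEFAULTS[name]
        try:
            value = lookup(flag_key, default)
        except Exception:  # SDK or network failure: keep serving the known-good value
            value = default
        cfg[name] = value if value is not None else default
    return cfg


def flaky_lookup(key: str, default: Any) -> Any:
    """Simulated outage: the model flag errors, the tool-set flag comes back empty."""
    if key == "agent_model":
        raise TimeoutError("flag service unavailable")
    if key == "agent_tool_set":
        return None
    return "v3"


print(build_config(flaky_lookup))
# {'prompt_v': 'v3', 'model': 'gpt-realtime-mini', 'tool_set': 'core'}
```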
For multi-vertical agents (healthcare, salon, etc.), cohort by vertical so a healthcare-only prompt change can't leak to dental.
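LaunchDarkly targeting, simplified for illustration (LaunchDarkly's actual flag JSON expresses this with rule clauses and a fallthrough variation):
```json
{
  "rules": [
    { "if": { "vertical": "healthcare" }, "then": { "variation": "v3-healthcare" } },
    { "default": "v2" }
  ]
}
```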
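This is a minimal sketch of how a first-match-wins rule list in the simplified shape above resolves. It assumes each rule either has `if` (attribute equality checks) and `then` keys or is a catch-all `{"default": ...}` entry. Real platforms support richer operators and their own schemas. `evaluate` is a hypothetical helper.

```python
RULES = [
    {"if": {"vertical": "healthcare"}, "then": {"variation": "v3-healthcare"}},
    {"default": "v2"},
]


def evaluate(rules: list[dict], attrs: dict) -> str:
    """First matching rule wins; a {"default": ...} entry matches everything."""
    for rule in rules:
        if "default" in rule:
            return rule["default"]
        if all(attrs.get(k) == v for k, v in rule["if"].items()):
            return rule["then"]["variation"]
    raise LookupError("no rule matched and no default entry")


print(evaluate(RULES, {"vertical": "healthcare", "tenant": "t-1"}))  # v3-healthcare
print(evaluate(RULES, {"vertical": "dental", "tenant": "t-2"}))      # v2
```

Because the healthcare rule matches only on `vertical`, a dental call can never receive `v3-healthcare`, whatever its tenant.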
Lock prompt and model flags behind LaunchDarkly Workflows or Statsig approval gates, so that even a one-click model change requires a second reviewer (two-eyes review).
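If your flag platform lacks approval workflows, as with some self-hosted setups, you can approximate the two-eyes rule with a check that runs before a change is applied. The following is a minimal sketch with hypothetical names (`FlagChange`, `GUARDED_FLAGS`, `can_apply`); it is not LaunchDarkly's or Statsig's API. As a design choice in this sketch, the kill switch is deliberately left unguarded so that rollback stays a one-click action.

```python
from dataclasses import dataclass, field

# Flags whose changes need a second reviewer (hypothetical list; kill switch left out on purpose).
GUARDED_FLAGS = {"agent_system_prompt_v", "agent_model"}


@dataclass
class FlagChange:
    flag_key: str
    author: str
    approvers: set[str] = field(default_factory=set)


def can_apply(change: FlagChange) -> bool:
    """Guarded flags need at least one approver other than the author."""
    if change.flag_key not in GUARDED_FLAGS:
        return True
    return bool(change.approvers - {change.author})


print(can_apply(FlagChange("agent_model", author="alice")))                       # False
print(can_apply(FlagChange("agent_model", author="alice", approvers={"alice"})))  # False
print(can_apply(FlagChange("agent_model", author="alice", approvers={"bob"})))    # True
print(can_apply(FlagChange("agent_kill_switch", author="alice")))                 # True
```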
CallSphere uses self-hosted GrowthBook as the source of truth for system prompts and model selection across 37 voice agents and 6 verticals. Each call is assigned a flag bundle at start (cached for the duration of the call), and outcome events go to Postgres for experiment analysis. Healthcare gets stricter cohort rules; behavioral health has its own approval workflow. The platform spans 90+ tools and 115+ DB tables; pricing is $149/$499/$1499, with a 14-day trial and a 22% affiliate program.
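The "assign at start, cache for the call" pattern can be sketched as a small per-call cache. Flags are evaluated once when the call starts, every later turn reads the same bundle, and the entry is dropped at call end. `CallConfigCache` and its `evaluate` callable are hypothetical names. In practice, `evaluate` would be one of the `session_config` functions shown earlier.

```python
import threading
from typing import Callable


class CallConfigCache:
    """Holds one flag bundle per active call so mid-call flag flips don't change it."""

    def __init__(self, evaluate: Callable[[str, str], dict]) -> None:
        self._evaluate = evaluate
        self._configs: dict[str, dict] = {}
        self._lock = threading.Lock()

    def start(self, call_id: str, tenant_id: str) -> dict:
        cfg = self._evaluate(call_id, tenant_id)  # evaluated exactly once per call
        with self._lock:
            self._configs[call_id] = cfg
        return cfg

    def get(self, call_id: str) -> dict:
        with self._lock:
            return self._configs[call_id]

    def end(self, call_id: str) -> dict:
        with self._lock:
            return self._configs.pop(call_id)  # hand the bundle to the outcome event


# Simulate a flag flip in the middle of a call.
live = {"model": "gpt-realtime"}
cache = CallConfigCache(lambda call_id, tenant_id: dict(live))
cache.start("call-1", "tenant-a")
live["model"] = "gpt-realtime-mini"                 # e.g. kill switch flipped
print(cache.get("call-1")["model"])                 # gpt-realtime (in-flight call unchanged)
print(cache.start("call-2", "tenant-a")["model"])   # gpt-realtime-mini (new call)
```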
Q: Why flag prompts instead of versioning them in code?
Speed of rollback (seconds instead of a deploy) and per-cohort percentage rollouts. Code versioning is for the content of prompts; flags decide which version is live.
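Here is a minimal sketch of that split. The prompt text lives in version-controlled code, keyed by version, and the flag value only selects which key is live. The `PROMPTS` strings and the `system_prompt` helper are hypothetical placeholders.

```python
# Prompt text is versioned in git; these strings are placeholders.
PROMPTS = {
    "v2": "You are a scheduling assistant. Keep answers short.",
    "v3": "You are a scheduling assistant. Keep answers short and confirm dates back to the caller.",
}
KNOWN_GOOD = "v2"


def system_prompt(cfg: dict) -> str:
    """Return the prompt the flag selects; unknown or missing versions use the known-good one."""
    version = cfg.get("prompt_v", KNOWN_GOOD)
    return PROMPTS.get(version, PROMPTS[KNOWN_GOOD])


print(system_prompt({"prompt_v": "v3"}) == PROMPTS["v3"])  # True
print(system_prompt({"prompt_v": "v9"}) == PROMPTS["v2"])  # True: unknown version falls back
```

Rolling back from v3 to v2 then only changes the flag value; the deployed code already contains both versions.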
Q: Any concerns now that OpenAI owns Statsig?
The acquisition closed in 2025. So far the SDK and pricing haven't changed; data-residency policy is the item to watch.
Q: Is there an open-source alternative if all three feel too heavy?
Unleash or PostHog feature flags. Both work fine for AI workloads but have less LLM-specific tooling.
Q: How do I roll back instantly?
Flip the kill switch to true. With ~5 s SDK polling, new calls start on the safe defaults globally in under 30 s. Calls already in progress keep the config they were assigned at session start.
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.