Voice Cloning Portability (ElevenLabs): CallSphere vs Vapi Lock-In

TL;DR

A voice clone is a brand asset. If you cannot move it, you do not really own it. CallSphere uses ElevenLabs Conversational AI (the "Sarah" voice in Sales) and ElevenLabs TTS/STT in Salon, with the cloned voice ID owned by your ElevenLabs account, fully portable. Vapi layers TTS providers (ElevenLabs, PlayHT, Azure, OpenAI), but the clone metadata, prompt-bound voice configs, and IVR-style audio cues are locked to Vapi's assistant config — moving them is a manual reconstruct.

This post is the asset-lifecycle deep dive: how to keep your voice clone portable, how the lifecycle differs between platforms, and the migration playbook if you are leaving a platform.

What Is Actually Locked In

When you "clone a voice" on a voice AI platform, you accumulate three artifacts:

The clone itself: model weights or a voice ID at the TTS provider
The platform binding: assistant config that references the voice with provider-specific settings (stability, similarity_boost, style, use_speaker_boost)
Production-tuned prompts: system prompts crafted to sound natural with that specific voice

Lock-in shows up at layers 2 and 3, the platform binding and the tuned prompts. The clone itself is usually portable; the surrounding config is not.

Vapi Voice Cloning Approach

Vapi supports multiple TTS providers and lets you reference an ElevenLabs voice ID directly:

{
  "voice": {
    "provider": "11labs",
    "voiceId": "your_eleven_voice_id",
    "stability": 0.5,
    "similarityBoost": 0.75,
    "style": 0.0,
    "useSpeakerBoost": true,
    "model": "eleven_turbo_v2_5"
  }
}

You retain the underlying ElevenLabs voice ID — that is portable. What is not portable:

The Vapi assistant config that references it
Vapi-specific tuning that compensates for their pipeline
Squad members built around that voice
Custom audio cues stored in Vapi's CDN
Function-call flows wired to that assistant

If you migrate to another platform, you re-reference the voice ID and rebuild everything else.
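
The re-referencing step is mostly a key-renaming exercise. The sketch below maps the Vapi voice block shown above onto the snake_case shape that ElevenLabs-style configs use. The `PortableVoiceConfig` type and the `toPortableVoiceConfig` function are hypothetical names introduced for illustration, not part of either platform's SDK.

```typescript
// Shape of the Vapi voice block shown above (camelCase keys).
interface VapiVoiceBlock {
  provider: string;
  voiceId: string;
  stability?: number;
  similarityBoost?: number;
  style?: number;
  useSpeakerBoost?: boolean;
  model?: string;
}

// Hypothetical platform-neutral shape with ElevenLabs-style snake_case keys.
interface PortableVoiceConfig {
  provider: 'elevenlabs';
  voice_id: string;
  model?: string;
  voice_settings: {
    stability?: number;
    similarity_boost?: number;
    style?: number;
    use_speaker_boost?: boolean;
  };
}

// Renames keys only; it does not undo tuning that compensated for Vapi's pipeline.
function toPortableVoiceConfig(voice: VapiVoiceBlock): PortableVoiceConfig {
  if (voice.provider !== '11labs') {
    throw new Error(`Expected an ElevenLabs voice, got provider "${voice.provider}"`);
  }
  return {
    provider: 'elevenlabs',
    voice_id: voice.voiceId,
    model: voice.model,
    voice_settings: {
      stability: voice.stability,
      similarity_boost: voice.similarityBoost,
      style: voice.style,
      use_speaker_boost: voice.useSpeakerBoost,
    },
  };
}
```

The numbers carry over unchanged, but values tuned against one pipeline may need a listening pass on the next. Squads, audio cues, and function-call flows still have to be rebuilt by hand.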

CallSphere Voice Cloning Approach

CallSphere uses ElevenLabs for both Conversational AI ("Sarah" in the Sales platform) and TTS/STT in the Salon vertical. The integration is intentionally thin so the voice asset stays yours.

Asset Ownership Model

The cloned voice lives in your ElevenLabs account, not CallSphere's. CallSphere holds:

Your ElevenLabs API key (encrypted, per-tenant)
The voice ID
Tuning parameters (stability, similarity, style)
Prompt-voice bindings

When you offboard, you take the API key, the voice ID, and the tuning JSON. Replication on another platform is mechanical.

Salon Voice Configuration Example

// shipped in the Salon backend
export const salonVoiceConfig = {
  provider: 'elevenlabs',
  voice_id: process.env.ELEVENLABS_SALON_VOICE_ID,
  model: 'eleven_turbo_v2_5',
  output_format: 'pcm_24000',
  voice_settings: {
    stability: 0.55,
    similarity_boost: 0.8,
    style: 0.15,
    use_speaker_boost: true,
  },
  // Bindings live in source control, not vendor-side
  prompt_bindings: {
    greeting: 'salon-warm-greeting-v3',
    confirm: 'salon-confirm-tone-v2',
    farewell: 'salon-farewell-v1',
  },
};

The prompt_bindings are pointers to system-prompt templates in our git repo. Migration = copy the JSON, copy the prompts, point a new platform at your ElevenLabs account.
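
To make the pointer idea concrete, here is a minimal sketch of resolving binding IDs to template files. It assumes two things the source does not state: that IDs follow a `<name>-v<N>` convention, as the three examples above happen to, and that templates sit at `prompts/<name>/v<N>.md`. The directory layout and both function names are hypothetical.

```typescript
interface PromptRef {
  name: string;
  version: number;
  path: string;
}

// Splits an id such as "salon-warm-greeting-v3" into name and version.
// The <name>-v<N> convention and the file layout are assumptions.
function parseBindingId(id: string, rootDir = 'prompts'): PromptRef {
  const match = /^(.+)-v(\d+)$/.exec(id);
  if (!match) {
    throw new Error(`Binding id "${id}" does not match <name>-v<N>`);
  }
  const name = match[1];
  const version = Number(match[2]);
  return { name, version, path: `${rootDir}/${name}/v${version}.md` };
}

// Resolves every slot (greeting, confirm, farewell, ...) to a template reference.
function resolvePromptBindings(
  bindings: Record<string, string>,
  rootDir = 'prompts',
): Record<string, PromptRef> {
  const resolved: Record<string, PromptRef> = {};
  for (const slot of Object.keys(bindings)) {
    resolved[slot] = parseBindingId(bindings[slot], rootDir);
  }
  return resolved;
}
```

Under these assumptions, the list of files to copy during a migration can be derived from the config itself, which keeps the export from drifting out of sync with what production actually references.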

Sales "Sarah" Conversational AI

Sales uses ElevenLabs Conversational AI, which is a higher-level construct than raw TTS — it owns the turn-taking and interruption logic. CallSphere stores the conv-AI agent ID alongside the voice ID; both live in your ElevenLabs account.

const sarahConfig = {
  conv_ai_agent_id: process.env.ELEVENLABS_SARAH_AGENT_ID,
  voice_id: process.env.ELEVENLABS_SARAH_VOICE_ID,
  // CallSphere-side overrides (orchestration, tools)
  hand_off_targets: ['booking_specialist', 'human_sales'],
  outbound_concurrency: 5,
};

When you leave CallSphere, you keep both ElevenLabs IDs and the orchestration overrides; you re-host the orchestration on whatever platform you choose.

Voice Asset Versioning

A subtle lock-in trap: when you tweak a voice (re-clone with new samples, adjust style), do you remember which prompts were tuned for which version?

CallSphere stores voice version + prompt version pairs in Postgres:

CREATE TABLE voice_asset_versions (
  id UUID PRIMARY KEY,
  voice_id TEXT NOT NULL,
  voice_version TEXT NOT NULL,
  prompt_template_id TEXT NOT NULL,
  prompt_version TEXT NOT NULL,
  paired_at TIMESTAMPTZ DEFAULT NOW(),
  retired_at TIMESTAMPTZ
);

Migrations ship the active row only. Old versions stay archived for audit.
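
As an illustration, selecting the rows a migration would ship could look like the sketch below. It assumes "active" means `retired_at` is NULL, which the schema suggests but the source does not state outright. The row type mirrors the table columns with timestamps represented as ISO strings, and `activePairings` is a hypothetical helper name.

```typescript
// Mirrors the columns of voice_asset_versions; timestamps as ISO strings.
interface VoiceAssetVersion {
  id: string;
  voice_id: string;
  voice_version: string;
  prompt_template_id: string;
  prompt_version: string;
  paired_at: string;
  retired_at: string | null;
}

// Keeps only pairings that have not been retired (assumed meaning of "active").
function activePairings(rows: VoiceAssetVersion[]): VoiceAssetVersion[] {
  return rows.filter((row) => row.retired_at === null);
}
```

The same filter expressed in SQL would be a `WHERE retired_at IS NULL` clause. Retired rows stay in the table for audit either way.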

Vapi vs CallSphere Voice Portability Comparison

Dimension	Vapi	CallSphere
ElevenLabs voice ID owned by	You (your ElevenLabs account)	You (your ElevenLabs account)
Tuning config location	Vapi assistant	Source control in your repo
Prompt-voice binding	Vapi UI	Source control
Audio cue assets	Vapi CDN	Your S3 bucket
ElevenLabs API key across tenants	Can be shared	Per-tenant key
Rebuild on migration	Manual reconstruct	Copy JSON + prompts
Versioning of pairings	Limited	Postgres-tracked
Time to migrate (estimated)	2-5 days	4-8 hours

Voice Asset Lifecycle

graph LR
    A[Sample collection<br/>5-10 min audio] --> B[Upload to ElevenLabs]
    B --> C[Generate voice_id]
    C --> D[Tune voice_settings]
    D --> E[Bind to prompt template]
    E --> F[Pair version in Postgres]
    F --> G[Deploy to production]
    G --> H{Tune needed?}
    H -->|yes| D
    H -->|no| I[Archive old pairing]
    G --> J{Migration to new platform?}
    J -->|yes| K[Export voice_id + tuning + prompts]
    K --> L[Re-host on new platform]
    L --> G

Migration Playbook (CallSphere → Anywhere)

Export the voice IDs from the platform config (a single pnpm exec voice-export command in CallSphere).
Export tuning JSON — voice_settings + prompt_bindings.
Export system prompts — already in git, just copy the relevant directory.
Re-bind on the new platform — most platforms accept ElevenLabs voice IDs directly.
Run a 50-call parallel test to compare call quality before and after the move, then cut over.

The migration is engineering-driven, not vendor-blocked. That is the whole point.
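
Steps 1 to 3 of the playbook can be sketched as a single bundling function. This is not the `voice-export` command itself, whose internals are not shown here. It is a hypothetical illustration that collects voice IDs and tuning from configs shaped like `salonVoiceConfig` and flags any bound prompt missing from the exported directory. All type and function names are assumptions.

```typescript
interface VoiceConfig {
  provider: string;
  voice_id: string;
  model: string;
  voice_settings: {
    stability: number;
    similarity_boost: number;
    style: number;
    use_speaker_boost: boolean;
  };
  prompt_bindings: Record<string, string>;
}

interface ExportBundle {
  voice_ids: string[];
  tuning: VoiceConfig[];
  missing_prompts: string[];
}

// Collects unique voice IDs, carries the tuning JSON over as-is, and lists
// every bound prompt id that is absent from the exported prompt set.
function buildExportBundle(configs: VoiceConfig[], exportedPrompts: string[]): ExportBundle {
  const voiceIds: string[] = [];
  const missing: string[] = [];
  for (const config of configs) {
    if (voiceIds.indexOf(config.voice_id) === -1) {
      voiceIds.push(config.voice_id);
    }
    for (const slot of Object.keys(config.prompt_bindings)) {
      const promptId = config.prompt_bindings[slot];
      if (exportedPrompts.indexOf(promptId) === -1 && missing.indexOf(promptId) === -1) {
        missing.push(promptId);
      }
    }
  }
  return { voice_ids: voiceIds, tuning: configs, missing_prompts: missing };
}
```

A non-empty `missing_prompts` list is a signal to stop before step 4. The API key is deliberately left out of the bundle so the export can be stored or shared without carrying a secret.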

Anti-Patterns We Avoid

Storing audio cues on the platform's CDN. Always store in your own S3 bucket, reference by URL.
Embedding the API key in assistant configs. Always inject from secrets manager at runtime.
Hard-coding voice IDs in prompts. Always parameterize.
Skipping version pairing. A voice tweak six months ago that was paired with a now-deleted prompt is a silent quality regression.
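
The "always parameterize" rule is easy to check mechanically. Below is a minimal sketch of a lint that flags known voice IDs appearing verbatim in prompt text, plus a renderer that injects them at runtime. The `{{name}}` placeholder syntax and both function names are hypothetical choices made for this example.

```typescript
// Returns the known voice IDs that appear verbatim in the prompt text.
function findHardCodedVoiceIds(promptText: string, knownVoiceIds: string[]): string[] {
  return knownVoiceIds.filter((id) => id.length > 0 && promptText.indexOf(id) !== -1);
}

// Fills {{name}} placeholders from params and fails loudly on a missing value.
function renderPrompt(template: string, params: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_match: string, key: string) => {
    if (!Object.prototype.hasOwnProperty.call(params, key)) {
      throw new Error(`Missing prompt parameter "${key}"`);
    }
    return params[key];
  });
}
```

Running the lint in CI over the prompt directory would catch a pasted ID before it ships. The renderer keeps the template itself free of any account-specific value.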

FAQ

Can I clone a voice on CallSphere directly?

We do not host the cloning UI — you clone in your ElevenLabs account, then paste the voice ID into CallSphere config. This is by design: the asset stays in your account.

What if ElevenLabs raises prices?

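Both platforms pass the provider's pricing through to you. Switching to a stock PlayHT or Azure neural voice is a config change. A cloned voice does not transfer between providers, so keeping the same voice elsewhere means re-cloning from your original samples.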

How long does a voice clone take to make?

An ElevenLabs Instant clone takes about 1 minute and needs a 1-3 minute sample. A Professional clone takes about 24 hours and needs a sample of 30 minutes or more.

Are there legal concerns with voice cloning?

Yes — written consent from the voice owner is mandatory. CallSphere requires uploaded consent forms before activating any clone in a production tenant.

Does Vapi support custom voices the same way?

Vapi supports ElevenLabs voice IDs and PlayHT cloned voices. Portability of the IDs themselves is similar; portability of the surrounding config is where Vapi adds friction.

Talk to a Human About Voice Strategy

The /features page documents the voice provider matrix per vertical, and /demo lets you hear the production "Sarah" voice on a live call.