---
title: "Voicemail Detection Accuracy: CallSphere vs Vapi (with Examples)"
description: "Voicemail detection accuracy makes or breaks outbound voice AI. CallSphere VoicemailAnalyzerAgent + Twilio AMD vs Vapi defaults. Real call examples included."
canonical: https://callsphere.ai/blog/voicemail-detection-accuracy-callsphere-vs-vapi
category: "Technical Guides"
tags: ["Voicemail Detection", "AMD", "Voice AI", "CallSphere", "Vapi", "Twilio", "After Hours"]
author: "CallSphere Team"
published: 2026-04-23T00:00:00.000Z
updated: 2026-05-01T09:41:36.261Z
---

# Voicemail Detection Accuracy: CallSphere vs Vapi (with Examples)

> Voicemail detection accuracy makes or breaks outbound voice AI. CallSphere VoicemailAnalyzerAgent + Twilio AMD vs Vapi defaults. Real call examples included.

## TL;DR

Voicemail detection (AMD, Answering Machine Detection) is the single biggest predictor of outbound campaign quality. False negatives (treating voicemail as a human) burn message budget and look spammy; false positives (treating humans as voicemail) make you look broken. **Vapi** uses provider-default AMD with limited customization. **CallSphere** uses a **three-stage cascade**: Twilio AMD signals first, then an audio fingerprint check, then a **VoicemailAnalyzerAgent** built on `gpt-4o-mini` that analyzes the first 4 seconds and confirms voicemail vs human with structured reasoning.

In production traffic across After-Hours dispatch, the cascade lands at ~96% accuracy vs ~83% for AMD-only.

## Why Voicemail Detection Is Hard

The naive heuristic — "wait for the beep" — fails because:

- People answer with long greetings ("Hello? Hi, this is John, who is this?")
- Voicemail systems have variable pre-beep delays (1.5s to 8s)
- Some voicemails skip the beep entirely
- Mobile carriers compress audio differently
- Background noise on humans imitates voicemail tone shifts

A single signal source is never enough. Production systems cascade.

## Vapi Voicemail Detection Approach

Vapi exposes a config block:

```json
{
  "voicemailDetection": {
    "provider": "twilio",
    "enabled": true,
    "machineDetectionTimeout": 30,
    "machineDetectionSpeechThreshold": 2400,
    "machineDetectionSpeechEndThreshold": 1200,
    "machineDetectionSilenceTimeout": 5000
  }
}
```

This delegates to Twilio's AMD plus Vapi's own assistant-side hint detection. The thresholds are exposed but the assistant logic is opaque.

**Strengths:** sane defaults work for most simple use cases.

**Weaknesses:**

- No second-pass LLM verification
- No way to inject domain knowledge ("this customer's voicemail says X")
- Hard to debug a false positive
- Action on detection is binary (leave message / hang up)

## CallSphere Voicemail Detection Approach

CallSphere uses a **three-stage cascade**:

1. **Twilio AMD** runs in parallel with the call connect, returning `AnsweredBy` within ~2-3s
2. **Audio fingerprint** — first 1.5s of audio is matched against known voicemail intro patterns (regional carrier specifics)
3. **VoicemailAnalyzerAgent** — a `gpt-4o-mini` agent listens to the first 4 seconds of transcript + audio features and returns `{is_voicemail: bool, confidence: float, reasoning: string}`

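The first two stages act as a fast path: when Twilio AMD and the audio fingerprint agree, their verdict stands and the LLM is skipped. When they disagree or are inconclusive, the VoicemailAnalyzerAgent decides.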

### Twilio AMD Configuration

```python
client.calls.create(
    to=lead.phone,
    from_=campaign.caller_id,
    url=callback_url,
    machine_detection="DetectMessageEnd",  # waits for greeting end
    async_amd=True,                         # don't block call connect
    async_amd_status_callback=amd_callback_url,
    machine_detection_timeout=30,
    machine_detection_speech_threshold=2400,
    machine_detection_speech_end_threshold=1200,
    machine_detection_silence_timeout=5000,
)
```

`DetectMessageEnd` waits for the voicemail greeting to finish — important if you want to leave a message after the beep.
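
With `async_amd=True`, Twilio posts the AMD result to the status callback URL as form-encoded fields. The sketch below shows one way to normalize that payload, assuming the body carries Twilio's documented `CallSid`, `AnsweredBy`, and `MachineDetectionDuration` fields. The `AmdSignal` container and `parse_amd_callback` helper are hypothetical names for illustration, not part of Twilio's SDK or CallSphere's code.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import parse_qs

# AnsweredBy values Twilio documents for machines. DetectMessageEnd
# reports the machine_end_* variants once the greeting has finished.
MACHINE_VALUES = {
    "machine_start",
    "machine_end_beep",
    "machine_end_silence",
    "machine_end_other",
}


@dataclass(frozen=True)
class AmdSignal:  # hypothetical container, not a Twilio type
    call_sid: str
    answered_by: str            # raw value sent by Twilio
    kind: str                   # "machine", "human", or "unknown"
    duration_ms: Optional[int]  # detection time, if Twilio sent one


def parse_amd_callback(body: str) -> AmdSignal:
    """Parse a form-encoded async AMD callback body into an AmdSignal."""
    fields = {key: values[0] for key, values in parse_qs(body).items()}
    answered_by = fields.get("AnsweredBy", "unknown")

    if answered_by in MACHINE_VALUES:
        kind = "machine"
    elif answered_by == "human":
        kind = "human"
    else:
        kind = "unknown"  # covers "fax", "unknown", and missing values

    raw_duration = fields.get("MachineDetectionDuration", "")
    duration_ms = int(raw_duration) if raw_duration.isdigit() else None

    return AmdSignal(fields.get("CallSid", ""), answered_by, kind, duration_ms)
```

Collapsing the raw values into three kinds keeps the downstream cascade independent of which `machine_detection` mode the call was placed with.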

### VoicemailAnalyzerAgent

The second-pass agent is intentionally cheap (`gpt-4o-mini`) and structured:

```python
voicemail_analyzer = Agent(
    name="VoicemailAnalyzerAgent",
    model="gpt-4o-mini",
    instructions="""You analyze the first 4 seconds of an outbound call.
    Return strict JSON.

    Voicemail signals:
    - "You've reached the voicemail of..."
    - "I'm not available right now..."
    - "Please leave a message after the tone"
    - Long uninterrupted single voice >3s
    - "Please record your message"

    Human signals:
    - Question response: "Hello?" "Who is this?"
    - Short utterance under 2s with rising intonation
    - Background noise + brief greeting
    - Conversational hesitation: "Uh, hi?"

    Return: {"is_voicemail": bool, "confidence": 0.0-1.0, "reasoning": "..."}
    """,
    output_type=VoicemailVerdict,
)
```
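
The agent references `VoicemailVerdict` without showing its definition. A minimal sketch of what such a type could look like, assuming it mirrors the three JSON fields named in the prompt; the real definition is not shown in this post and may well be a Pydantic model instead. The `parse_verdict` helper is a hypothetical name.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class VoicemailVerdict:
    """Assumed shape of the agent's structured output."""
    is_voicemail: bool
    confidence: float
    reasoning: str

    def __post_init__(self) -> None:
        # Reject loosely typed output instead of silently coercing it.
        if not isinstance(self.is_voicemail, bool):
            raise ValueError("is_voicemail must be a JSON boolean")
        if isinstance(self.confidence, bool) or not isinstance(
            self.confidence, (int, float)
        ):
            raise ValueError("confidence must be a number")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be within [0.0, 1.0]")
        if not isinstance(self.reasoning, str):
            raise ValueError("reasoning must be a string")


def parse_verdict(raw: str) -> VoicemailVerdict:
    """Parse the model's JSON reply; raises on malformed or missing fields."""
    data = json.loads(raw)
    return VoicemailVerdict(
        is_voicemail=data["is_voicemail"],
        confidence=data["confidence"],
        reasoning=data.get("reasoning", ""),
    )
```

Validating the range of `confidence` matters because the cascade compares it against a threshold; an out-of-range value should fail loudly rather than pick a branch by accident.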

### Cascade Logic

```python
async def detect_voicemail(call: OutboundCall) -> Verdict:
    twilio_signal = await call.amd_signal_within(2.5)
    audio_fingerprint = await call.audio_fingerprint_first_1500ms()

    # DetectMessageEnd reports machine_end_* values, so match any machine result
    if (twilio_signal or "").startswith("machine") and audio_fingerprint.match_voicemail:
        return Verdict.VOICEMAIL  # high confidence, skip LLM

    if twilio_signal == "human" and audio_fingerprint.match_human:
        return Verdict.HUMAN  # high confidence, skip LLM

    # Ambiguous — escalate to LLM
    transcript = await call.transcript_first_4s()
    audio_features = await call.audio_features_first_4s()
    verdict = await voicemail_analyzer.run({
        "transcript": transcript,
        "audio_features": audio_features.dict(),
        "twilio_amd": twilio_signal,
    })

    if verdict.confidence < 0.65:
        return Verdict.UNCERTAIN  # low confidence: treat as human and log

    return Verdict.VOICEMAIL if verdict.is_voicemail else Verdict.HUMAN
```

The same decision flow as a diagram:

```mermaid
flowchart TD
    Start --> Twilio[Twilio AMD<br/>2.5s window]
    Start --> FP[Audio fingerprint<br/>1.5s window]
    Twilio --> Agree{Both agree?}
    FP --> Agree
    Agree -->|yes voicemail| VM[Verdict: VOICEMAIL<br/>cost: $0]
    Agree -->|yes human| H[Verdict: HUMAN<br/>cost: $0]
    Agree -->|disagree| LLM[VoicemailAnalyzerAgent<br/>gpt-4o-mini, 4s transcript]
    LLM --> Conf{"conf >= 0.65?"}
    Conf -->|yes voicemail| VM2[Verdict: VOICEMAIL]
    Conf -->|yes human| H2[Verdict: HUMAN]
    Conf -->|no| U[Verdict: UNCERTAIN<br/>treat as human, log]
    VM --> Action{Leave msg?}
    VM2 --> Action
    Action -->|yes| Beep[Wait beep, deliver SMS-ready msg]
    Action -->|no| Hangup[Hang up, retry tomorrow]
    H --> Live[Run human conversation flow]
    H2 --> Live
    U --> Live
```

## Practical Tips

- **Cascade > single signal.** Always.
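- **Use `DetectMessageEnd`, not `Enable`, when you leave messages.** `Enable` returns as soon as Twilio classifies the answerer, before the greeting ends, so a message started at that point talks over the greeting.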
- **Log the LLM reasoning.** When detection disagrees with reality, the reasoning tells you what to fix.
- **Per-region tuning.** Audio fingerprints differ by carrier and region; ship a per-region config map.
- **Recheck weekly.** Voicemail patterns drift as carriers update prompts.
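
One way to act on the logging tip is to emit a single JSON line per decision, so a wrong verdict can be traced back to the signal that caused it. This is a sketch under assumptions: the field names, the logger name, and both helper functions are hypothetical and not taken from CallSphere's code.

```python
import json
import logging
from typing import Optional

logger = logging.getLogger("voicemail_detection")  # hypothetical logger name


def build_detection_record(
    call_sid: str,
    twilio_amd: Optional[str],
    fingerprint_match: Optional[str],
    verdict: str,
    llm_confidence: Optional[float] = None,
    llm_reasoning: Optional[str] = None,
) -> dict:
    """Collect every signal behind a verdict into one flat record."""
    return {
        "call_sid": call_sid,
        "twilio_amd": twilio_amd,
        "fingerprint_match": fingerprint_match,
        "verdict": verdict,
        # True only when the call was escalated to the LLM stage.
        "llm_used": llm_confidence is not None,
        "llm_confidence": llm_confidence,
        "llm_reasoning": llm_reasoning,
    }


def log_detection(record: dict) -> str:
    """Serialize the record as one JSON line, log it, and return the line."""
    line = json.dumps(record, sort_keys=True)
    logger.info(line)
    return line
```

Keeping the raw Twilio value and the fingerprint result next to the LLM reasoning makes it possible to tell whether a miss came from the fast path or from the escalation.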

## FAQ

### Does the LLM second pass slow down the call?

Slightly — about 250-400ms on top of Twilio's 2.5s window. For outbound, this is invisible because the agent isn't speaking yet.

### Can I customize the voicemail message left?

Yes — CallSphere After-Hours flows include a per-campaign voicemail script tool, so the left message reflects the call purpose.

### What is the inbound counterpart?

Inbound rarely needs voicemail detection (the user is calling you), but the same cascade detects "you have reached an answering service for X" loops if you transfer.

### How often does the LLM disagree with Twilio?

About 12% of calls are ambiguous enough to reach the LLM, and the LLM flips the verdict on ~30% of those. Net: ~3.5% of all calls have their verdict corrected by the LLM second pass.

### What about regional/non-English voicemail?

The LLM prompt is multilingual; we ship Spanish-language voicemail patterns by default and add per-region configs as needed.

## See It Live

The [/features](/features) page lists per-vertical voicemail handling, and [/demo](/demo) includes an outbound test that triggers the full cascade you can inspect.

---

Source: https://callsphere.ai/blog/voicemail-detection-accuracy-callsphere-vs-vapi
