---
title: "Building a Voicemail AI Agent: Transcription, Analysis, and Automated Response"
description: "Build an intelligent voicemail system that transcribes messages, scores priority, extracts action items, and schedules callbacks automatically. Covers voicemail detection, message processing, and smart notifications."
canonical: https://callsphere.ai/blog/building-voicemail-ai-agent-transcription-analysis-response
category: "Learn Agentic AI"
tags: ["Voicemail", "Transcription", "AI Analysis", "Callback Scheduling", "Voice AI", "Automation"]
author: "CallSphere Team"
published: 2026-03-17T00:00:00.000Z
updated: 2026-05-06T15:45:03.495Z
---

# Building a Voicemail AI Agent: Transcription, Analysis, and Automated Response

> Build an intelligent voicemail system that transcribes messages, scores priority, extracts action items, and schedules callbacks automatically. Covers voicemail detection, message processing, and smart notifications.

## Rethinking Voicemail with AI

Traditional voicemail is a black hole. Messages pile up, important calls get buried under spam, and by the time someone listens to a message, the moment has passed. An AI-powered voicemail agent transforms this experience: every message is instantly transcribed, analyzed for urgency, scored by priority, and routed to the right person with a recommended action. Critical messages trigger immediate notifications. Routine ones get batched into a daily digest.

This is not just voicemail transcription — it is an intelligent message processing pipeline.

## Voicemail Detection and Greeting

The first challenge is knowing when to activate the voicemail system. This happens when a call goes unanswered or when the AI screening agent decides to take a message:

```mermaid
flowchart LR
    CALLER(["Caller"])
    subgraph TEL["Telephony"]
        SIP["Twilio SIP and PSTN"]
    end
    subgraph BRAIN["Business AI Agent"]
        STT["Streaming STT<br/>Deepgram or Whisper"]
        NLU{"Intent and<br/>Entity Extraction"}
        TOOLS["Tool Calls"]
        TTS["Streaming TTS<br/>ElevenLabs or Rime"]
    end
    subgraph DATA["Live Data Plane"]
        CRM[("CRM and Notes")]
        CAL[("Calendar and<br/>Schedule")]
        KB[("Knowledge Base<br/>and Policies")]
    end
    subgraph OUT["Outcomes"]
        O1(["Booking captured"])
        O2(["CRM record created"])
        O3(["Human handoff"])
    end
    CALLER --> SIP --> STT --> NLU
    NLU -->|Lookup| TOOLS
    TOOLS <--> CRM
    TOOLS <--> CAL
    TOOLS <--> KB
    NLU --> TTS --> SIP --> CALLER
    NLU -->|Resolved| O1
    NLU -->|Schedule| O2
    NLU -->|Escalate| O3
    style CALLER fill:#f1f5f9,stroke:#64748b,color:#0f172a
    style NLU fill:#4f46e5,stroke:#4338ca,color:#fff
    style O1 fill:#059669,stroke:#047857,color:#fff
    style O2 fill:#0ea5e9,stroke:#0369a1,color:#fff
    style O3 fill:#f59e0b,stroke:#d97706,color:#1f2937
```

```python
from twilio.twiml.voice_response import VoiceResponse
from fastapi import FastAPI, Request
from fastapi.responses import Response

app = FastAPI()

@app.post("/voicemail-greeting")
async def voicemail_greeting(request: Request):
    """Play a personalized voicemail greeting and record."""
    form = await request.form()
    called_number = form.get("Called")
    caller_number = form.get("From")

    # Look up the mailbox owner for a personalized greeting
    owner = await get_mailbox_owner(called_number)

    response = VoiceResponse()

    if owner and owner.get("custom_greeting_url"):
        response.play(owner["custom_greeting_url"])
    else:
        name = owner.get("name", "the person you are calling") if owner else "us"
        response.say(
            f"You have reached {name}. "
            "Please leave a message after the tone and "
            "I will make sure it gets to the right person.",
            voice="Polly.Joanna",
        )

    response.pause(length=1)

    # Record the voicemail; <Record> plays its own beep before recording starts
    response.record(
        action="/voicemail-complete",
        max_length=180,          # 3 minutes max
        timeout=5,               # 5 seconds of silence to stop
        transcribe=False,        # We will use our own transcription
        recording_status_callback="/recording-ready",
        play_beep=True,          # Built-in beep, no external audio file needed
    )

    # Fallback if caller does not leave a message
    response.say("No message was recorded. Goodbye.")
    response.hangup()

    return Response(content=str(response), media_type="application/xml")
```
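
The greeting handler points `recording_status_callback` at `/recording-ready`, but that handler is not shown. Below is a minimal sketch of the decision it has to make, assuming the callback carries Twilio's `RecordingSid`, `RecordingUrl`, `RecordingStatus`, `RecordingDuration`, and `CallSid` form fields. The helper name `parse_recording_callback` and the 2-second minimum are hypothetical choices, not part of the original design.

```python
from typing import Optional

# Assumption: anything shorter than this is treated as a hang-up, not a message.
MIN_DURATION_SECONDS = 2


def parse_recording_callback(form: dict) -> Optional[dict]:
    """Return pipeline arguments for a finished recording, or None to skip it."""
    if form.get("RecordingStatus") != "completed":
        return None  # failed or still in progress

    try:
        duration = int(form.get("RecordingDuration", 0))
    except (TypeError, ValueError):
        duration = 0
    if duration < MIN_DURATION_SECONDS:
        return None

    sid, url = form.get("RecordingSid"), form.get("RecordingUrl")
    if not sid or not url:
        return None

    return {
        "recording_sid": sid,
        "recording_url": url,
        "call_sid": form.get("CallSid"),
        "duration": duration,
    }
```

A FastAPI route could pass `dict(await request.form())` to this function and hand the result to a background task. The caller and called numbers that the pipeline needs may have to be looked up by `CallSid`, for example from values the greeting handler already saw.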

## Message Transcription Pipeline

When the recording is ready, download and transcribe it with high accuracy:

```python
import httpx
import os
from deepgram import DeepgramClient, PrerecordedOptions
from datetime import datetime

deepgram = DeepgramClient(os.environ["DEEPGRAM_API_KEY"])

async def transcribe_voicemail(recording_url: str) -> dict:
    """Download and transcribe a voicemail recording."""
    async with httpx.AsyncClient() as client:
        resp = await client.get(
            f"{recording_url}.wav",
            auth=(
                os.environ["TWILIO_ACCOUNT_SID"],
                os.environ["TWILIO_AUTH_TOKEN"],
            ),
        )
        resp.raise_for_status()  # fail fast instead of transcribing an error page
        audio_bytes = resp.content

    options = PrerecordedOptions(
        model="nova-2",
        smart_format=True,
        punctuate=True,
        paragraphs=True,
        detect_language=True,
        sentiment=True,
    )

    result = await deepgram.listen.asyncrest.v("1").transcribe_file(
        {"buffer": audio_bytes, "mimetype": "audio/wav"},
        options,
    )

    transcript = result.results.channels[0].alternatives[0]

    return {
        "text": transcript.transcript,
        "confidence": transcript.confidence,
        "language": result.results.channels[0].detected_language,
        "words": [
            {
                "word": w.word,
                "start": w.start,
                "end": w.end,
                "confidence": w.confidence,
            }
            for w in transcript.words
        ],
        "duration": result.metadata.duration,
    }
```
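
The per-word confidences returned above can be used to flag the parts of a message a human should double-check, such as names and phone numbers. A minimal sketch, assuming the `words` list has the shape built by `transcribe_voicemail`; the function name and the 0.6 threshold are illustrative assumptions.

```python
def low_confidence_spans(words: list[dict], threshold: float = 0.6) -> list[dict]:
    """Group consecutive low-confidence words into spans with start/end times."""
    spans: list[dict] = []
    current = None
    for w in words:
        if w["confidence"] < threshold:
            if current is None:
                # Start a new uncertain span at this word
                current = {"text": w["word"], "start": w["start"], "end": w["end"]}
            else:
                # Extend the span that is already open
                current["text"] += " " + w["word"]
                current["end"] = w["end"]
        elif current is not None:
            # A confident word closes the open span
            spans.append(current)
            current = None
    if current is not None:
        spans.append(current)
    return spans
```

Each span carries timestamps, so a notification could link straight to the few seconds of audio that need a second listen.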

## AI-Powered Message Analysis

Analyze the transcribed message to extract structured information:

```python
import json

from openai import AsyncOpenAI

client = AsyncOpenAI()

VOICEMAIL_ANALYSIS_PROMPT = """Analyze this voicemail message and extract:
1. caller_name: if mentioned
2. callback_number: if a different number is provided
3. summary: 1-2 sentence summary
4. intent: the caller's purpose (inquiry, complaint, appointment, urgent, sales, personal, spam)
5. urgency: 1-10 score (10 = emergency, 1 = junk)
6. sentiment: positive, neutral, negative, distressed
7. action_items: specific actions requested
8. entities: names, dates, account numbers, amounts mentioned
9. is_spam: boolean (telemarketer, robocall, or solicitation)
10. suggested_response: recommended reply approach

Return valid JSON."""

async def analyze_voicemail(
    transcript_text: str,
    caller_number: str,
    caller_history: dict,
) -> dict:
    """Run AI analysis on a voicemail transcript."""
    context = ""
    if caller_history:
        context = (
            f"\nCaller history: {caller_history.get('total_calls', 0)} "
            f"previous calls, last contact: "
            f"{caller_history.get('last_contact', 'never')}. "
            f"Known as: {caller_history.get('name', 'unknown')}."
        )

    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": VOICEMAIL_ANALYSIS_PROMPT},
            {
                "role": "user",
                "content": f"Transcript: {transcript_text}{context}",
            },
        ],
        response_format={"type": "json_object"},
        temperature=0.2,
    )

    # content can be None (for example on a refusal); fall back to an empty object
    return json.loads(response.choices[0].message.content or "{}")
```
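
JSON mode guarantees syntactically valid JSON, not that every field has the expected type or range. Since the router compares `urgency` against numeric thresholds, it may be worth normalizing the model output first. The sketch below is one way to do that; `normalize_analysis`, the default urgency of 5, and the `"unknown"` fallback intent are assumptions for illustration.

```python
ALLOWED_INTENTS = {
    "inquiry", "complaint", "appointment", "urgent", "sales", "personal", "spam",
}


def normalize_analysis(raw: dict) -> dict:
    """Coerce model output into the types and ranges the router expects."""
    analysis = dict(raw)

    # urgency: integer clamped to 1..10, defaulting to a middle value
    try:
        urgency = int(float(raw.get("urgency", 5)))
    except (TypeError, ValueError, OverflowError):
        urgency = 5
    analysis["urgency"] = max(1, min(10, urgency))

    # intent: lower-cased and restricted to the values named in the prompt
    intent = str(raw.get("intent") or "").strip().lower()
    analysis["intent"] = intent if intent in ALLOWED_INTENTS else "unknown"

    # is_spam: accept real booleans as well as strings such as "false"
    is_spam = raw.get("is_spam")
    if isinstance(is_spam, str):
        is_spam = is_spam.strip().lower() in {"true", "yes", "1"}
    analysis["is_spam"] = bool(is_spam)

    # action_items: always a list of strings
    items = raw.get("action_items") or []
    if isinstance(items, str):
        items = [items]
    analysis["action_items"] = [str(item) for item in items]

    # summary: always a string, never None
    analysis["summary"] = str(raw.get("summary") or "").strip()
    return analysis
```

Calling this on the result of `analyze_voicemail` before routing means a reply like `"urgency": "high"` degrades to the default score instead of raising a `TypeError` in the threshold comparison.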

## Priority Scoring and Smart Routing

Not all voicemails are equal. Score and route them based on the analysis:

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class ProcessedVoicemail:
    id: str
    caller_number: str
    recording_url: str
    transcript: str
    analysis: dict
    priority_score: int
    mailbox_owner: Optional[dict]  # record returned by get_mailbox_owner (may be None)
    created_at: datetime
    callback_scheduled: Optional[datetime] = None

class VoicemailRouter:
    """Routes processed voicemails based on priority and content."""

    URGENCY_THRESHOLDS = {
        "immediate_notify": 8,   # Phone push + SMS
        "priority_notify": 5,    # Email + app notification
        "batch_digest": 1,       # Daily summary
        "spam_discard": 0,       # Auto-archive
    }

    async def route_voicemail(
        self, voicemail: ProcessedVoicemail
    ) -> str:
        """Determine notification strategy based on priority."""
        analysis = voicemail.analysis
        score = analysis.get("urgency", 5)

        if analysis.get("is_spam"):
            await self.archive_spam(voicemail)
            return "spam_archived"

        if score >= self.URGENCY_THRESHOLDS["immediate_notify"]:
            await self.send_immediate_notification(voicemail)
            await self.schedule_callback(voicemail, delay_minutes=15)
            return "immediate"

        if score >= self.URGENCY_THRESHOLDS["priority_notify"]:
            await self.send_priority_notification(voicemail)
            await self.schedule_callback(voicemail, delay_minutes=60)
            return "priority"

        await self.add_to_digest(voicemail)
        return "batched"

    async def send_immediate_notification(
        self, voicemail: ProcessedVoicemail
    ):
        """Push notification with transcript and suggested action."""
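        analysis = voicemail.analysis
        # Use `or` so a JSON null for caller_name falls back to the number
        caller = analysis.get("caller_name") or voicemail.caller_number
        message = (
            f"URGENT VOICEMAIL from {caller}\n"
            f"Summary: {analysis.get('summary', 'No summary available')}\n"
            f"Action: {analysis.get('suggested_response') or 'Call back ASAP'}"
        )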
        await self.push_notification(voicemail.mailbox_owner, message)
        await self.send_sms(voicemail.mailbox_owner, message)

    async def schedule_callback(
        self, voicemail: ProcessedVoicemail, delay_minutes: int
    ):
        """Schedule an automated callback if not handled manually."""
        from datetime import timedelta
        callback_time = datetime.utcnow() + timedelta(minutes=delay_minutes)
        callback_number = (
            voicemail.analysis.get("callback_number")
            or voicemail.caller_number
        )

        await self.db_pool.execute(
            """
            INSERT INTO scheduled_callbacks
            (voicemail_id, phone_number, scheduled_at, status, context)
            VALUES ($1, $2, $3, 'pending', $4)
            """,
            voicemail.id,
            callback_number,
            callback_time,
            json.dumps(voicemail.analysis),
        )
```
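
The router sends low-urgency messages to `add_to_digest`, which is not shown. As a rough sketch of what the digest itself could look like, the function below renders batched voicemails as plain text with the most urgent first. It takes plain dicts with `caller_number` and `analysis` keys rather than `ProcessedVoicemail` objects so that it stays self-contained; the function name and output format are assumptions.

```python
def build_daily_digest(voicemails: list[dict]) -> str:
    """Render batched voicemails as plain text, most urgent first."""
    if not voicemails:
        return "No new voicemails today."

    ordered = sorted(
        voicemails,
        key=lambda vm: vm["analysis"].get("urgency") or 0,
        reverse=True,
    )

    lines = [f"Voicemail digest: {len(ordered)} message(s)"]
    for vm in ordered:
        analysis = vm["analysis"]
        # Fall back to the phone number when no name was extracted
        who = analysis.get("caller_name") or vm["caller_number"]
        summary = analysis.get("summary") or "No summary available"
        lines.append(f"- [{analysis.get('urgency', '?')}/10] {who}: {summary}")
    return "\n".join(lines)
```

The resulting string can go into an email body or a chat message; a richer HTML version would follow the same ordering logic.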

## Automated Callback System

For voicemails that request a callback, the AI can handle the return call:

```python
class AutoCallbackEngine:
    """Handles automated callbacks for voicemail follow-up."""

    async def execute_callback(
        self, callback_id: str, voicemail: ProcessedVoicemail
    ):
        """Place an automated callback based on voicemail context."""
        context = voicemail.analysis

        # Generate a personalized callback script
        script = await self.generate_callback_script(context)

        # Place the call
        call = self.twilio_client.calls.create(
            # `or` also covers a JSON null, which .get() with a default would pass through
            to=context.get("callback_number") or voicemail.caller_number,
            from_=os.environ["TWILIO_NUMBER"],
            url=(
                f"{self.webhook_base}/callback-answer"
                f"?callback_id={callback_id}"
            ),
            machine_detection="DetectMessageEnd",
        )

        return call.sid

    async def generate_callback_script(self, context: dict) -> str:
        """Generate a contextual callback opening."""
        response = await self.ai_client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {
                    "role": "system",
                    "content": (
                        "Generate a brief, professional callback "
                        "opening based on the voicemail context. "
                        "Reference the caller's original message to "
                        "show you listened. Keep it under 3 sentences."
                    ),
                },
                {
                    "role": "user",
                    "content": (
                        f"Caller: {context.get('caller_name') or 'the caller'}. "
                        f"Their message: {context.get('summary', '')}. "
                        f"They wanted: {', '.join(context.get('action_items') or ['a callback'])}"
                    ),
                },
            ],
        )
        return response.choices[0].message.content
```

## The Complete Processing Pipeline

Wire everything together in an async pipeline:

```python
async def process_voicemail_pipeline(
    recording_sid: str,
    recording_url: str,
    call_sid: str,
    caller_number: str,
    called_number: str,
):
    """End-to-end voicemail processing pipeline."""
    # Step 1: Transcribe
    transcript = await transcribe_voicemail(recording_url)

    if transcript["confidence"] < 0.3:
        # Very low confidence — store raw recording, skip analysis
        await store_raw_voicemail(recording_sid, recording_url)
        return

    # Step 2: Get caller history
    caller_history = await get_caller_history(caller_number)

    # Step 3: Analyze
    analysis = await analyze_voicemail(
        transcript["text"], caller_number, caller_history
    )

    # Step 4: Create processed voicemail record
    voicemail = ProcessedVoicemail(
        id=recording_sid,
        caller_number=caller_number,
        recording_url=recording_url,
        transcript=transcript["text"],
        analysis=analysis,
        priority_score=analysis.get("urgency", 5),
        mailbox_owner=await get_mailbox_owner(called_number),
        created_at=datetime.utcnow(),
    )

    # Step 5: Store in database
    await store_processed_voicemail(voicemail)

    # Step 6: Route based on priority
    route_result = await voicemail_router.route_voicemail(voicemail)

    print(
        f"Voicemail from {caller_number}: "
        f"urgency={analysis.get('urgency')}, "
        f"intent={analysis.get('intent')}, "
        f"routed={route_result}"
    )
```

## FAQ

### How do I detect if a voicemail system answered instead of a human?

When making outbound calls, use Twilio's `machine_detection` parameter set to `DetectMessageEnd`. This uses audio analysis to distinguish human speech patterns from voicemail greetings. When it hears a greeting, it waits for the greeting to end (a beep or a stretch of silence) and then requests your webhook, so you can leave a message at the right moment. Detection accuracy is approximately 90%, so design your opening line to work gracefully in both scenarios.
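
With detection enabled, Twilio adds an `AnsweredBy` parameter to the webhook request. A minimal sketch of how the `/callback-answer` handler might branch on it, assuming the documented `DetectMessageEnd` values (`human`, `machine_end_beep`, `machine_end_silence`, `machine_end_other`, `fax`, `unknown`); the action names returned here are hypothetical labels for your own handler logic.

```python
from typing import Optional

# Values Twilio reports once a machine greeting has finished
MACHINE_VALUES = {"machine_end_beep", "machine_end_silence", "machine_end_other"}


def choose_callback_action(answered_by: Optional[str]) -> str:
    """Map Twilio's AnsweredBy value to what the callback handler should do."""
    if answered_by in MACHINE_VALUES:
        return "leave_message"  # the greeting is over, so speak the message now
    if answered_by == "fax":
        return "hang_up"
    # "human", "unknown", or a missing value: start the conversation with
    # an opening line that also makes sense if a machine is recording
    return "converse"
```

Treating `unknown` like `human` is a deliberate choice in this sketch: greeting a machine politely costs little, while hanging up on a person does.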

### What is the best way to handle voicemails in languages other than English?

Use a transcription service with automatic language detection (Deepgram and Whisper both support this). Once the language is detected, switch your AI analysis prompt to that language or use a multilingual model. Store the detected language alongside the transcript so notifications can be formatted appropriately. For businesses serving multilingual populations, consider offering the voicemail greeting in multiple languages.
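
One lightweight way to apply this is to keep a single analysis prompt and append a language note when the detected language differs from the one your team reads. The sketch below assumes the transcription step returns a language code such as `es` or `es-419`; `build_analysis_prompt` and the small `LANGUAGE_NAMES` table are illustrative and would need extending for real use.

```python
from typing import Optional

# Illustrative subset; extend with the languages your callers actually use
LANGUAGE_NAMES = {"en": "English", "es": "Spanish", "fr": "French", "de": "German"}


def build_analysis_prompt(
    base_prompt: str,
    detected_language: Optional[str],
    output_language: str = "English",
) -> str:
    """Append a language instruction when the transcript is not in the output language."""
    # Reduce regional codes such as "es-419" to their base code "es"
    code = (detected_language or "").split("-")[0].lower()
    name = LANGUAGE_NAMES.get(code)
    if not code or name == output_language:
        return base_prompt

    source = name or f"language code '{code}'"
    return (
        f"{base_prompt}\n\n"
        f"The transcript is in {source}. Write the summary, action_items, and "
        f"suggested_response in {output_language}, and add a \"language\" field "
        f"with the value \"{code}\"."
    )
```

The returned string would replace the fixed system prompt in the analysis call, and the extra `language` field gives the notification layer what it needs to format the message.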

### How do I handle very long voicemails or callers who ramble?

Set a `max_length` on the recording (120-180 seconds is typical). Long messages need no special handling in the analysis step: the summary and action-item extraction distill even a rambling 3-minute message into a concise output. If you want to discourage long messages, your greeting can say "Please leave a brief message", and the `timeout` parameter stops the recording after a few seconds of silence.

---

Source: https://callsphere.ai/blog/building-voicemail-ai-agent-transcription-analysis-response
