From Conversation to Action

A voice agent that can only talk is not very useful. The real power comes when the agent can do things — book appointments, search inventory, look up account details, process payments, and trigger workflows. These capabilities are implemented as function tools that the agent calls mid-conversation.

The challenge with voice is timing. When a user asks "Book me a dentist appointment for tomorrow at 2pm," they expect a reply almost immediately. If the tool takes 5 seconds to execute and nothing is said in the meantime, the caller hears silence and wonders whether the call dropped. Managing audio feedback during tool execution is critical for voice UX.

Defining Voice Agent Tools

Tools for voice agents follow the same pattern as chat agent tools in the OpenAI Agents SDK. The difference is in how you handle the user experience around tool execution:

flowchart LR
    CALLER(["Student or Parent"])
    subgraph TEL["Telephony"]
        SIP["Twilio SIP and PSTN"]
    end
    subgraph BRAIN["Education AI Agent"]
        STT["Streaming STT<br/>Deepgram or Whisper"]
        NLU{"Intent and<br/>Entity Extraction"}
        TOOLS["Tool Calls"]
        TTS["Streaming TTS<br/>ElevenLabs or Rime"]
    end
    subgraph DATA["Live Data Plane"]
        CRM[("CRM and Notes")]
        CAL[("Calendar and<br/>Schedule")]
        KB[("Knowledge Base<br/>and Policies")]
    end
    subgraph OUT["Outcomes"]
        O1(["Enrollment captured"])
        O2(["Tour scheduled"])
        O3(["Counselor callback"])
    end
    CALLER --> SIP --> STT --> NLU
    NLU -->|Lookup| TOOLS
    TOOLS <--> CRM
    TOOLS <--> CAL
    TOOLS <--> KB
    NLU --> TTS --> SIP --> CALLER
    NLU -->|Resolved| O1
    NLU -->|Schedule| O2
    NLU -->|Escalate| O3
    style CALLER fill:#f1f5f9,stroke:#64748b,color:#0f172a
    style NLU fill:#4f46e5,stroke:#4338ca,color:#fff
    style O1 fill:#059669,stroke:#047857,color:#fff
    style O2 fill:#0ea5e9,stroke:#0369a1,color:#fff
    style O3 fill:#f59e0b,stroke:#d97706,color:#1f2937

from agents import Agent, function_tool
from datetime import datetime, timedelta
from typing import Optional
import httpx

@function_tool
async def check_availability(
    provider_name: str,
    date: str,
    service_type: str,
) -> str:
    """Check appointment availability for a specific provider and date.
    Returns available time slots."""
    async with httpx.AsyncClient() as client:
        resp = await client.get(
            "https://api.scheduling.internal/v1/availability",
            params={
                "provider": provider_name,
                "date": date,
                "service": service_type,
            },
            timeout=5.0,
        )
        # Surface HTTP errors instead of misreporting them as "no availability".
        resp.raise_for_status()
        data = resp.json()
        slots = data.get("available_slots", [])

    if not slots:
        return f"No availability for {provider_name} on {date} for {service_type}."

    slot_list = ", ".join(slots[:5])
    return f"Available times for {provider_name} on {date}: {slot_list}"

@function_tool
async def book_appointment(
    provider_name: str,
    date: str,
    time: str,
    patient_name: str,
    patient_phone: str,
    service_type: str,
    notes: Optional[str] = None,
) -> str:
    """Book an appointment with a provider. Requires patient details."""
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            "https://api.scheduling.internal/v1/appointments",
            json={
                "provider": provider_name,
                "date": date,
                "time": time,
                "patient_name": patient_name,
                "patient_phone": patient_phone,
                "service_type": service_type,
                "notes": notes or "",
            },
            timeout=10.0,
        )

    if resp.status_code == 201:
        confirmation = resp.json()
        return (
            f"Appointment booked successfully. "
            f"Confirmation number: {confirmation['id']}. "
            f"{patient_name} with {provider_name} on {date} at {time} "
            f"for {service_type}."
        )
    elif resp.status_code == 409:
        return "That time slot is no longer available. Please choose another time."
    else:
        return "Unable to book the appointment. Please try again or call the office directly."

@function_tool
async def search_providers(
    specialty: str,
    location: Optional[str] = None,
    insurance: Optional[str] = None,
) -> str:
    """Search for providers by specialty, location, and insurance acceptance."""
    async with httpx.AsyncClient() as client:
        params = {"specialty": specialty}
        if location:
            params["location"] = location
        if insurance:
            params["insurance"] = insurance

        resp = await client.get(
            "https://api.scheduling.internal/v1/providers",
            params=params,
            timeout=5.0,
        )
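        # Surface HTTP errors instead of misreporting them as "no providers found".
        resp.raise_for_status()
        providers = resp.json().get("providers", [])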

    if not providers:
        return f"No providers found for {specialty} in your area."

    results = []
    for p in providers[:3]:
        results.append(
            f"{p['name']} — {p['address']}, "
            f"next available: {p['next_available']}"
        )
    return "Here are the top providers:\n" + "\n".join(results)
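
Because tool output ends up being spoken rather than displayed, it can help to return slot times in a form that reads well aloud. The sketch below assumes the scheduling API returns slots as 24-hour "HH:MM" strings, which the article does not specify. `to_spoken_time` and `spoken_slot_list` are hypothetical helper names used only for illustration.

```python
from datetime import datetime


def to_spoken_time(slot: str) -> str:
    """Convert a 24-hour "HH:MM" string into a phrase like "2:30 PM"."""
    parsed = datetime.strptime(slot, "%H:%M")
    hour_12 = parsed.hour % 12 or 12
    suffix = "AM" if parsed.hour < 12 else "PM"
    # Drop ":00" so the phrase stays short when spoken.
    if parsed.minute == 0:
        return f"{hour_12} {suffix}"
    return f"{hour_12}:{parsed.minute:02d} {suffix}"


def spoken_slot_list(slots: list[str], limit: int = 3) -> str:
    """Join a few slots into one sentence fragment for the agent to read."""
    spoken = [to_spoken_time(s) for s in slots[:limit]]
    if not spoken:
        return "no open times"
    if len(spoken) == 1:
        return spoken[0]
    return ", ".join(spoken[:-1]) + " or " + spoken[-1]
```

Offering three options instead of five is a judgment call here: a caller has to hold a spoken list in memory, so shorter lists are usually easier to follow.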

Audio Feedback During Tool Execution

When a tool takes more than a second to execute, the user hears dead air. This is the single biggest UX problem with voice agent tools. Two strategies help fill this gap.

Strategy 1: Filler Phrases

The simplest approach is to have the agent say something before calling the tool:

voice_agent = Agent(
    name="BookingAgent",
    instructions="""You are a medical appointment booking assistant.

IMPORTANT: Before calling any tool, always say a brief filler phrase so the
caller does not hear silence. Examples:
- "Let me check that for you."
- "One moment while I look that up."
- "I am searching for available times now."
- "Let me book that appointment for you right now."

After the tool returns, relay the results conversationally.""",
    tools=[check_availability, book_appointment, search_providers],
)

This approach is simple but relies on the model following instructions consistently. For more reliable behavior, handle it in code.

Strategy 2: Programmatic Hold Audio

Intercept the tool call event and inject audio feedback before executing the tool:

import json
import asyncio

FILLER_MESSAGES = {
    "check_availability": "Let me check those available times for you.",
    "book_appointment": "I am booking that appointment now. Just a moment.",
    "search_providers": "Searching for providers in your area.",
}

async def handle_function_call(ws, function_name: str, call_id: str, arguments: str):
    """Handle a function call with audio feedback."""
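    # Step 1: Add the filler message to the conversation immediately.
    # Assistant-authored items use a "text" content part here; "input_text"
    # is the content type for user input. This assumes the session voices
    # the item; if it only stores it as context, speak the filler separately.
    filler = FILLER_MESSAGES.get(function_name, "One moment please.")
    await ws.send(json.dumps({
        "type": "conversation.item.create",
        "item": {
            "type": "message",
            "role": "assistant",
            "content": [{"type": "text", "text": filler}],
        },
    }))

    # Step 2: Execute the tool. execute_tool is the application's own
    # dispatcher (execute_tool_with_fallback below is one possible version).
    result = await execute_tool(function_name, arguments)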

    # Step 3: Return the tool result to the conversation
    await ws.send(json.dumps({
        "type": "conversation.item.create",
        "item": {
            "type": "function_call_output",
            "call_id": call_id,
            "output": result,
        },
    }))

    # Step 4: Trigger the agent to respond with the result
    await ws.send(json.dumps({"type": "response.create"}))
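
A variation on the handler above is to speak the filler only when the tool is actually slow, so fast lookups are not preceded by an unnecessary phrase. The following is a minimal sketch, not part of the original handler: `run_with_delayed_filler`, the `speak` callback, and the 0.7 second threshold are all assumptions for illustration.

```python
import asyncio
from typing import Awaitable, Callable


async def run_with_delayed_filler(
    tool_call: Awaitable[str],
    speak: Callable[[str], Awaitable[None]],
    filler: str = "One moment please.",
    threshold_s: float = 0.7,
) -> str:
    """Await tool_call; if it is still running after threshold_s, speak filler."""
    task = asyncio.ensure_future(tool_call)
    # Wait briefly; asyncio.wait does not cancel the task when the timeout hits.
    done, _pending = await asyncio.wait({task}, timeout=threshold_s)
    if not done:
        await speak(filler)
    # Either already finished, or keep waiting for the slow tool to complete.
    return await task
```

In the handler above, `speak` would wrap whatever sends the filler to the caller, and `tool_call` would be the `execute_tool(function_name, arguments)` coroutine.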

Handling Tool Errors Gracefully

Tools fail. APIs time out. Databases go down. In a chat agent, you can show an error message. In a voice agent, you need to speak the error naturally:

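import asyncio
import json

async def execute_tool_with_fallback(
    function_name: str,
    arguments: str,
    max_retries: int = 1,
) -> str:
    """Execute a tool with retry and graceful error handling."""
    # NOTE: @function_tool returns a FunctionTool object, which cannot be
    # called directly with keyword arguments. This map assumes each entry is
    # a plain async function (for example, the undecorated implementation).
    tool_map = {
        "check_availability": check_availability,
        "book_appointment": book_appointment,
        "search_providers": search_providers,
    }

    tool_fn = tool_map.get(function_name)
    if not tool_fn:
        # Keep internal function names out of what the caller hears.
        return "I am not able to do that at the moment."

    try:
        parsed_args = json.loads(arguments)
    except json.JSONDecodeError:
        return "I did not catch all of the details. Could you repeat them?"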

    for attempt in range(max_retries + 1):
        try:
            result = await asyncio.wait_for(
                tool_fn(**parsed_args),
                timeout=8.0,
            )
            return result
        except asyncio.TimeoutError:
            if attempt < max_retries:
                continue
            return (
                "I am sorry, the system is taking longer than expected. "
                "Let me try a different approach, or I can transfer you "
                "to someone who can help directly."
            )
        except Exception:
            if attempt < max_retries:
                continue
            return (
                "I encountered an issue while processing your request. "
                "Would you like me to try again, or would you prefer "
                "to speak with a human agent?"
            )
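
One caveat with blanket retries: repeating a timed-out `book_appointment` call could create a duplicate booking if the first request actually reached the server. A minimal sketch of one way to guard against that is to retry only read-only tools. The `READ_ONLY_TOOLS` set and the `retries_for` helper are hypothetical names, and which tools are safe to repeat is an assumption about the backend.

```python
# Tools assumed to have no side effects, so repeating them is harmless.
READ_ONLY_TOOLS = frozenset({"check_availability", "search_providers"})


def retries_for(function_name: str, max_retries: int = 1) -> int:
    """Return how many retries to allow for the given tool name."""
    if function_name in READ_ONLY_TOOLS:
        return max_retries
    # Write operations such as book_appointment get a single attempt.
    return 0
```

The result could be passed as `max_retries` when calling `execute_tool_with_fallback`, so lookups are retried once while bookings are attempted only once.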

Confirmation Before Destructive Actions

Voice is inherently error-prone — the STT might mishear a name, date, or number. Always confirm before executing actions that are hard to undo:

confirmation_agent = Agent(
    name="BookingAgent",
    instructions="""You are a medical appointment booking assistant.

CRITICAL RULES:
1. Before calling book_appointment, ALWAYS read back ALL details to the caller
   and ask for explicit confirmation:
   "Just to confirm — I will book an appointment with Dr. Smith on March 15th
   at 2:00 PM for a dental cleaning. The name on the appointment will be
   John Doe, and we will send a confirmation to 555-0123. Is all of that correct?"

2. Only proceed with booking after the caller says "yes", "correct",
   "that is right", or similar affirmative.

3. If the caller corrects any detail, update it and read back the full
   details again before booking.

4. After booking, always read back the confirmation number slowly and clearly.
   Spell out any letters.""",
    tools=[check_availability, book_appointment, search_providers],
)
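
Prompt rules can be backed by a check in code, so that a misheard date never reaches the booking API. The sketch below assumes dates arrive as ISO `YYYY-MM-DD` strings, which the article does not specify. `validate_booking_date` is a hypothetical helper that returns a message for the agent to relay, or `None` when the date looks usable.

```python
from datetime import date, datetime
from typing import Optional


def validate_booking_date(
    date_str: str, today: Optional[date] = None
) -> Optional[str]:
    """Return a problem description for the agent to relay, or None if OK."""
    today = today or date.today()
    try:
        requested = datetime.strptime(date_str, "%Y-%m-%d").date()
    except ValueError:
        # Covers malformed input and impossible dates such as 2026-03-50.
        return "I did not catch a valid date. Could you say the date again?"
    if requested < today:
        return "That date has already passed. Which upcoming date works for you?"
    return None
```

A booking tool could call this first and return the message instead of contacting the scheduling API, which gives the agent something natural to say while asking the caller to repeat the date.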

Production Checklist

Before deploying voice agent tools to production:

Set timeouts on every external call — 5-8 seconds maximum for voice
Always provide audio feedback before tool execution
Confirm destructive actions by reading back details and waiting for affirmation
Handle partial failures — if SMS fails after booking succeeds, still confirm the booking
Log every tool call with arguments, duration, and result for debugging
Rate-limit tool calls per session to prevent abuse or infinite loops
Test with real speech input — STT errors in tool arguments (like mishearing "March 15" as "March 50") need graceful handling
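
Two of the checklist items, logging each call and capping calls per session, can be combined in a small wrapper. This is a minimal sketch under stated assumptions: `ToolCallGuard`, the default limit of 20 calls, and the spoken fallback message are illustrative and not part of any SDK.

```python
import logging
import time
from typing import Awaitable, Callable

logger = logging.getLogger("voice_tools")


class ToolCallGuard:
    """Per-session wrapper that logs tool calls and enforces a call cap."""

    def __init__(self, max_calls: int = 20) -> None:
        self.max_calls = max_calls
        self.calls_made = 0

    async def run(
        self, name: str, arguments: str, tool: Callable[[str], Awaitable[str]]
    ) -> str:
        # Refuse further calls once the session has used its budget.
        if self.calls_made >= self.max_calls:
            logger.warning("tool=%s blocked: session call limit reached", name)
            return "I have reached my limit for this call. Let me transfer you."
        self.calls_made += 1
        started = time.perf_counter()
        try:
            result = await tool(arguments)
        except Exception:
            # Log with traceback, then let the caller's error handling decide.
            logger.exception("tool=%s args=%s failed", name, arguments)
            raise
        duration_ms = (time.perf_counter() - started) * 1000
        logger.info(
            "tool=%s args=%s duration_ms=%.0f result=%s",
            name, arguments, duration_ms, result,
        )
        return result
```

One guard instance would be created per call session. Arguments may include patient details, so a real deployment would likely redact them before logging.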

Voice agent tools transform passive conversations into active service delivery. The key is managing the timing and feedback so that tool execution feels seamless rather than interruptive.

Voice Agent Tools: Booking, Search, and Real-Time Actions
