By Sagar Shankaran, Founder of CallSphere
Hume EVI 3 is one model for STT+LLM+TTS with prosody-aware reactions. Build a customizable speech-to-speech agent — TypeScript code, voice prompting, pitfalls.
Key takeaways
TL;DR — Hume EVI 3 is a single speech-language model that handles transcription, language, and speech in one shot — and it tracks the user's vocal emotion in real time. You can describe ANY voice in a prompt ("a warm 40-year-old British woman"), point it at Claude or Gemini, and get sub-300ms emotionally aware replies.
What you'll build: a Next.js app that uses Hume's TypeScript SDK to open an EVI 3 WebSocket session, render a live emotion meter, and let users design a voice from a plain-English prompt, all in under 250 lines.
```mermaid
flowchart LR
  MIC[Browser mic] -- WS audio --> EV[Hume EVI 3]
  EV -- prosody + transcript --> APP[Your client]
  EV -- voice audio --> APP --> SP[Speakers]
  EV -- llm_call --> CLD[Claude 4 / Gemini 2.5]
```
```bash
npm i hume @humeai/voice-react
npm i hume jsonwebtoken
```
```ts
// app/api/hume-token/route.ts
// fetchAccessToken is imported from the `hume` package installed above.
import { fetchAccessToken } from "hume";

// Exchange the server-side API key and secret for a short-lived access token,
// so the browser never sees the raw credentials.
export async function GET() {
  const accessToken = await fetchAccessToken({
    apiKey: process.env.HUME_API_KEY!,
    secretKey: process.env.HUME_SECRET_KEY!,
  });
  return Response.json({ accessToken });
}
```
In platform.hume.ai → EVI → Configs, create a config with:

- Model: evi-3
- Voice description: A warm, calm 35-year-old American woman who sounds like a kind nurse.
- LLM: anthropic/claude-3-5-sonnet (or google/gemini-2.5-flash)
- System prompt: You are Ava, a clinic concierge. Adapt tone to the caller's emotion.

Copy the resulting configId.
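```tsx
"use client";
import { useEffect, useState } from "react";
import { VoiceProvider, useVoice } from "@humeai/voice-react";

export default function Page() {
  const [token, setToken] = useState<string | null>(null);
  // Fetch a short-lived access token from the API route on mount.
  useEffect(() => { fetch("/api/hume-token").then((r) => r.json())
    .then((j) => setToken(j.accessToken)); }, []);
  if (!token) return null;
  return (
    <VoiceProvider auth={{ type: "accessToken", value: token }}
      configId={process.env.NEXT_PUBLIC_HUME_CONFIG_ID!}>
      {/* Concierge is the component defined in the next block. */}
      <Concierge />
    </VoiceProvider>
  );
}
```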
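```tsx
function Concierge() {
  const { connect, disconnect, status, messages } = useVoice();
  const connected = status.value === "connected";
  // Only some message types carry prosody scores, so narrow before reading them.
  const last = messages[messages.length - 1];
  const scores = last && "models" in last ? last.models?.prosody?.scores : undefined;
  const top3 = scores
    ? Object.entries(scores)
        .sort((a, b) => (b[1] as number) - (a[1] as number))
        .slice(0, 3)
    : [];
  return (
    <>
      <button onClick={() => (connected ? disconnect() : connect())}>
        {connected ? "Hang up" : "Talk"}
      </button>
      {/* Live emotion meter: the three strongest prosody scores. */}
      <ul>
        {top3.map(([name, score]) => (
          <li key={name}>{name}: {(score as number).toFixed(2)}</li>
        ))}
      </ul>
    </>
  );
}
```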
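The top-three ranking used for the emotion meter can be pulled into a small pure function, which makes it easy to unit test without a live session. The sketch below is illustrative only: the name `topEmotions`, the `RankedEmotion` type, and the plain name-to-score object shape are assumptions, not part of Hume's SDK.

```typescript
// Hypothetical helper, not part of the Hume SDK.
// Assumes prosody scores arrive as a plain object mapping emotion name to score.
type EmotionScores = Record<string, number>;

interface RankedEmotion {
  name: string;
  score: number;
}

// Return the n highest-scoring emotions, strongest first.
// An undefined input (for example, a message without prosody data) yields [].
function topEmotions(scores: EmotionScores | undefined, n = 3): RankedEmotion[] {
  if (!scores) return [];
  return Object.entries(scores)
    .map(([name, score]) => ({ name, score }))
    .sort((a, b) => b.score - a.score)
    .slice(0, n);
}

// Made-up sample values for illustration.
const sample: EmotionScores = { Calmness: 0.62, Anxiety: 0.18, Joy: 0.41, Tiredness: 0.07 };
console.log(topEmotions(sample).map((e) => e.name).join(", "));
```

Keeping the ranking separate from the component means the rendering code only has to map over an already-sorted array.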
```ts
// node script
import { HumeClient } from "hume";

const hume = new HumeClient({ apiKey: process.env.HUME_API_KEY! });
const voice = await hume.empathicVoice.customVoices.create({
  name: "Sunrise Ava",
  baseVoice: "ITO",
  parameterModel: "20240715-4parameter",
  parameters: { gender: 2, assertiveness: -1, buoyancy: 1, confidence: 0 },
});
console.log(voice.id);
```
EVI 3 tool events look like {type: "tool_call", name, parameters, tool_call_id} — handle in onMessage and respond with {type: "tool_response", tool_call_id, content}.
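A minimal sketch of that request/response loop, assuming the two message shapes quoted above. The `handlers` registry, the `check_availability` tool, and the assumption that `parameters` arrives as a JSON-encoded string are all hypothetical; check Hume's tool-use docs for the exact payloads, including whether a dedicated error message type is preferable to returning error text in a tool response.

```typescript
// Message shapes taken from the text above; everything else here is illustrative.
interface ToolCallMessage {
  type: "tool_call";
  name: string;
  parameters: string; // assumption: a JSON-encoded string of arguments
  tool_call_id: string;
}

interface ToolResponseMessage {
  type: "tool_response";
  tool_call_id: string;
  content: string;
}

type ToolHandler = (args: Record<string, unknown>) => Promise<string> | string;

// Hypothetical tool registry for the clinic concierge example.
const handlers = new Map<string, ToolHandler>([
  ["check_availability", (args) => `Next open slot for ${String(args.provider)}: 3:30 PM`],
]);

// Build the tool_response for one tool_call, echoing back its tool_call_id.
async function handleToolCall(msg: ToolCallMessage): Promise<ToolResponseMessage> {
  const reply = (content: string): ToolResponseMessage => ({
    type: "tool_response",
    tool_call_id: msg.tool_call_id,
    content,
  });
  const handler = handlers.get(msg.name);
  if (!handler) return reply(`Unknown tool: ${msg.name}`);
  try {
    const args = JSON.parse(msg.parameters) as Record<string, unknown>;
    return reply(await handler(args));
  } catch (err) {
    // Malformed JSON or a throwing handler ends up here.
    return reply(`Tool failed: ${err instanceof Error ? err.message : String(err)}`);
  }
}
```

In an app, the returned object would be sent back over the same session from the `onMessage` handler whenever an incoming message has `type === "tool_call"`.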
evi-3 is ~280ms p50; switching to evi-3-fast drops to ~180ms with slightly less expressive prosody.

CallSphere uses EVI 3 in the Behavioral Health vertical where emotional adaptation is core to UX, running across 37 agents · 90+ tools · 115+ DB tables · 6 verticals. $149/$499/$1,499 · 14-day trial · 22% affiliate.
Cost? Per-minute pricing on EVI 3 is comparable to GPT-4o Realtime — ~$0.18/min combined.
Custom LLM? Yes — point the config at OpenAI / Anthropic / Google / Mistral via the dashboard.
Voice cloning? With 30 seconds of audio, EVI 3 captures timbre, rhythm, and tone.
Phone calls? Twilio Media Streams bridge ships in the docs — wire WS-to-WS and you have PSTN.
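To turn the per-minute figure from the cost question into a budget, a rough estimator helps. This is back-of-the-envelope arithmetic only: the ~$0.18/min default comes from the answer above, while the function name, the call volumes, and the 30-day month are assumptions for illustration.

```typescript
// Hypothetical estimator; the default rate is the ~$0.18/min figure quoted above.
function estimateMonthlyCost(
  callsPerDay: number,
  avgMinutesPerCall: number,
  ratePerMinute = 0.18,
  daysPerMonth = 30,
): number {
  const minutes = callsPerDay * avgMinutesPerCall * daysPerMonth;
  // Round to whole cents to avoid floating-point noise in the result.
  return Math.round(minutes * ratePerMinute * 100) / 100;
}

// Example: 40 calls a day at 3 minutes each is 3,600 minutes a month.
console.log(estimateMonthlyCost(40, 3));
```

Actual bills depend on the current price list and on how much of each call is billable audio, so treat the result as an order-of-magnitude check.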
Written by
Sagar Shankaran· Founder, CallSphere
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.