---
title: "Build a Voice Agent with Hume EVI 3: Emotionally Intelligent Voice (2026)"
description: "Hume EVI 3 is one model for STT+LLM+TTS with prosody-aware reactions. Build a customizable speech-to-speech agent — TypeScript code, voice prompting, pitfalls."
canonical: https://callsphere.ai/blog/vw9h-build-voice-agent-hume-evi-3-emotional-voice-2026
category: "AI Voice Agents"
tags: ["Hume", "EVI 3", "Emotional Voice", "Voice Agent", "Speech to Speech"]
author: "CallSphere Team"
published: 2026-04-04T00:00:00.000Z
updated: 2026-05-08T03:13:54.573Z
---

# Build a Voice Agent with Hume EVI 3: Emotionally Intelligent Voice (2026)

> **TL;DR** — Hume EVI 3 is a single speech-language model that handles transcription, language, and speech in one shot — and it tracks the user's vocal emotion in real time. You can describe ANY voice in a prompt ("a warm 40-year-old British woman"), point it at Claude or Gemini, and get sub-300ms emotionally aware replies.

## What you'll build

A Next.js app using Hume's TypeScript SDK to open an EVI 3 WebSocket session, render the live emotion meter, and let users design a voice via plain-English prompt — all under 250 lines.

## Architecture

```mermaid
flowchart LR
  MIC[Browser mic] -- WS audio --> EV[Hume EVI 3]
  EV -- prosody + transcript --> APP[Your client]
  EV -- voice audio --> APP --> SP[Speakers]
  EV -- llm_call --> CLD[Claude 4 / Gemini 2.5]
```

## Step 1 — Install

```bash
npm i hume @humeai/voice-react

# server-side only:
npm i hume jsonwebtoken
```

## Step 2 — Mint an access token (server)

```ts
// app/api/hume-token/route.ts
import { fetchAccessToken } from "hume";

// Mint a short-lived access token so the browser never sees your API keys.
export async function GET() {
  const accessToken = await fetchAccessToken({
    apiKey: process.env.HUME_API_KEY!,
    secretKey: process.env.HUME_SECRET_KEY!,
  });
  return Response.json({ accessToken });
}
```

## Step 3 — Configure the EVI

In platform.hume.ai → EVI → Configs, create a config with:

- Model: `evi-3`
- Voice description (prompt): `A warm, calm 35-year-old American woman who sounds like a kind nurse.`
- LLM: `anthropic/claude-3-5-sonnet` (or `google/gemini-2.5-flash`)
- System prompt: `You are Ava, a clinic concierge. Adapt tone to the caller's emotion.`
- Tools: optional (function calling works like OpenAI's)

Copy the resulting `configId`.
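
Prefer code over clicking? The SDK exposes the same configs endpoint. A minimal sketch, assuming the camelCased field names below mirror the dashboard labels (verify against the API reference before relying on them):

```ts
// Sketch: creating an EVI 3 config programmatically.
// Field names are assumptions based on the dashboard fields above.
import { HumeClient } from "hume";

const hume = new HumeClient({ apiKey: process.env.HUME_API_KEY! });

const config = await hume.empathicVoice.configs.create({
  name: "clinic-concierge",
  eviVersion: "3",
  languageModel: {
    modelProvider: "ANTHROPIC",
    modelResource: "claude-3-5-sonnet",
  },
});

console.log(config.id); // use this as your configId in Step 4
```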

## Step 4 — Client provider

```tsx
"use client";
import { VoiceProvider, useVoice } from "@humeai/voice-react";

export default function Page() {
  const [token, setToken] = useState(null);
  useEffect(() => { fetch("/api/hume-token").then(r=>r.json())
    .then(j=>setToken(j.accessToken)); }, []);
  if (!token) return null;
  return (

  );
}
```

## Step 5 — Render the emotion meter

```tsx
// Lives in the same file as Page (Step 4), below the provider.
function Concierge() {
  const { connect, disconnect, status, messages } = useVoice();

  // Prosody scores ride along on each transcript message.
  const last = messages[messages.length - 1] as any;
  const scores = last?.models?.prosody?.scores;
  const top3 = scores
    ? Object.entries(scores)
        .sort((a, b) => (b[1] as number) - (a[1] as number))
        .slice(0, 3)
    : [];

  return (
    <>
      <button onClick={() => (status.value === "connected" ? disconnect() : connect())}>
        {status.value === "connected" ? "Hang up" : "Talk"}
      </button>
      <ul>
        {top3.map(([k, v]) => (
          <li key={k}>{k}: {(v as number).toFixed(2)}</li>
        ))}
      </ul>
    </>
  );
}
```

## Step 6 — Design a voice in code

```ts
// node script (Node 18+, ESM, top-level await)
import { HumeClient } from "hume";

const hume = new HumeClient({ apiKey: process.env.HUME_API_KEY! });

// Start from a Hume base voice and nudge it along the 4-parameter model;
// each value shifts that attribute relative to the base voice.
const voice = await hume.empathicVoice.customVoices.create({
  name: "Sunrise Ava",
  baseVoice: "ITO",
  parameterModel: "20240715-4parameter",
  parameters: { gender: 2, assertiveness: -1, buoyancy: 1, confidence: 0 },
});

console.log(voice.id); // reference this id from your EVI config
```

## Step 7 — Hook tool calls

EVI 3 tool events look like `{type: "tool_call", name, parameters, tool_call_id}` — handle in `onMessage` and respond with `{type: "tool_response", tool_call_id, content}`.
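
A minimal sketch of that round trip over a raw WebSocket session, using the event shapes above; `lookupWeather` is a hypothetical stand-in for your own tool:

```ts
// Sketch: answering an EVI tool_call and replying with tool_response.
declare const ws: WebSocket; // the open EVI session from your connect step

// Hypothetical tool implementation: swap in your real logic.
async function lookupWeather(city: string) {
  return { city, tempF: 72, condition: "sunny" };
}

ws.addEventListener("message", async (event: MessageEvent) => {
  const msg = JSON.parse(event.data as string);
  if (msg.type !== "tool_call") return;

  if (msg.name === "get_weather") {
    const args = JSON.parse(msg.parameters); // parameters arrive as a JSON string
    const report = await lookupWeather(args.city);
    ws.send(
      JSON.stringify({
        type: "tool_response",
        tool_call_id: msg.tool_call_id,
        content: JSON.stringify(report),
      })
    );
  }
});
```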

## Pitfalls

- **WebSocket only**: The conversation itself has no HTTP/REST surface; budget for reconnect logic (a minimal backoff sketch follows this list).
- **Voice description quality**: Vague prompts ("nice voice") yield generic output — be specific (age, accent, energy).
- **Latency vs realism**: `evi-3` is ~280ms p50; switching to `evi-3-fast` drops to ~180ms with slightly less expressive prosody.
- **Multi-language**: Excellent in English; for 60+ languages, pair EVI 3 with an external STT such as Soniox or Universal-3.
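
A minimal reconnect sketch, assuming a browser `WebSocket` and capped exponential backoff; `buildSocketUrl` is a hypothetical helper that appends a fresh access token:

```ts
// Sketch: reconnect a long-lived EVI session with exponential backoff.
function connectWithBackoff(buildSocketUrl: () => string, attempt = 0): WebSocket {
  const ws = new WebSocket(buildSocketUrl());

  ws.onopen = () => {
    attempt = 0; // healthy connection: reset the backoff counter
  };

  ws.onclose = () => {
    // 0.5s, 1s, 2s, ... capped at 10s between attempts
    const delay = Math.min(500 * 2 ** attempt, 10_000);
    setTimeout(() => connectWithBackoff(buildSocketUrl, attempt + 1), delay);
  };

  return ws;
}
```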

## How CallSphere does this

CallSphere uses EVI 3 in its Behavioral Health vertical, where emotional adaptation is core to the UX. The platform spans **37 agents · 90+ tools · 115+ DB tables · 6 verticals**, with plans at **$149/$499/$1,499**, a **14-day trial**, and a **22% affiliate** program.

## FAQ

**Cost?** Per-minute pricing on EVI 3 is comparable to GPT-4o Realtime: roughly $0.18/min all-in (STT + LLM + TTS).

**Custom LLM?** Yes — point the config at OpenAI / Anthropic / Google / Mistral via the dashboard.

**Voice cloning?** With 30 seconds of audio, EVI 3 captures timbre, rhythm, and tone.

**Phone calls?** A Twilio Media Streams bridge ships in the docs: wire the two WebSockets together and you have PSTN.
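
A rough sketch of that bridge's core loop, assuming Twilio's documented Media Streams events and EVI `audio_input`/`audio_output` messages; the two sockets are placeholders, and audio format negotiation (Twilio speaks 8 kHz mu-law) is omitted:

```ts
// Sketch: forwarding frames between Twilio Media Streams and EVI.
// Twilio sends { event: "media", media: { payload } } and expects
// { event: "media", streamSid, media: { payload } } back; the EVI
// message shapes here are assumptions — check the EVI docs.
import WebSocket from "ws";

declare const twilioWs: WebSocket; // socket from Twilio's <Stream> webhook
declare const eviWs: WebSocket;    // open EVI session
let streamSid = "";

twilioWs.on("message", (raw) => {
  const msg = JSON.parse(raw.toString());
  if (msg.event === "start") streamSid = msg.start.streamSid;
  if (msg.event === "media") {
    // Caller audio (base64) -> EVI
    eviWs.send(JSON.stringify({ type: "audio_input", data: msg.media.payload }));
  }
});

eviWs.on("message", (raw) => {
  const msg = JSON.parse(raw.toString());
  if (msg.type === "audio_output") {
    // Agent audio (base64) -> caller
    twilioWs.send(
      JSON.stringify({ event: "media", streamSid, media: { payload: msg.data } })
    );
  }
});
```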

## Sources

- Hume Blog - Introducing EVI 3 - [https://www.hume.ai/blog/introducing-evi-3](https://www.hume.ai/blog/introducing-evi-3)
- Hume Blog - Announcing EVI 3 API - [https://www.hume.ai/blog/announcing-evi-3-api](https://www.hume.ai/blog/announcing-evi-3-api)
- Hume API Docs - Speech-to-Speech (EVI) - [https://dev.hume.ai/docs/speech-to-speech-evi/overview](https://dev.hume.ai/docs/speech-to-speech-evi/overview)
- Vercel Template - Hume Empathic Voice Starter - [https://vercel.com/templates/next.js/empathic-voice-interface-starter](https://vercel.com/templates/next.js/empathic-voice-interface-starter)

