---
title: "Building a Voice UI for AI Agents: Microphone Input, Waveform Visualization, and Playback"
description: "Implement a voice interface for AI agents using the MediaRecorder API, real-time audio waveform visualization with Canvas, and audio playback controls in React."
canonical: https://callsphere.ai/blog/building-voice-ui-ai-agents-microphone-waveform-playback
category: "Learn Agentic AI"
tags: ["Voice UI", "MediaRecorder API", "Audio Visualization", "React", "AI Agent Interface"]
author: "CallSphere Team"
published: 2026-03-17T00:00:00.000Z
updated: 2026-05-08T11:30:33.891Z
---

# Building a Voice UI for AI Agents: Microphone Input, Waveform Visualization, and Playback

> Implement a voice interface for AI agents using the MediaRecorder API, real-time audio waveform visualization with Canvas, and audio playback controls in React.

## Why Voice Interfaces for Agents

Voice interaction removes the typing bottleneck. Users can describe complex problems, provide context, and issue multi-step instructions faster through speech than text. Building a voice UI for an AI agent requires three capabilities: capturing microphone input, visualizing audio in real-time, and playing back agent audio responses.

## Requesting Microphone Access

Microphone capture goes through `navigator.mediaDevices.getUserMedia`, which requires explicit user permission. Wrap the permission request in a hook that tracks the microphone state.

```typescript
import { useState, useCallback, useRef } from "react";

type MicStatus = "idle" | "requesting" | "active" | "denied" | "error";

function useMicrophone() {
  const [status, setStatus] = useState<MicStatus>("idle");
  const streamRef = useRef<MediaStream | null>(null);

  const requestAccess = useCallback(async () => {
    setStatus("requesting");
    try {
      const stream = await navigator.mediaDevices.getUserMedia({
        audio: {
          echoCancellation: true,
          noiseSuppression: true,
          sampleRate: 16000,
        },
      });
      streamRef.current = stream;
      setStatus("active");
      return stream;
    } catch (err) {
      const name = (err as DOMException).name;
      setStatus(name === "NotAllowedError" ? "denied" : "error");
      return null;
    }
  }, []);

  const stopMic = useCallback(() => {
    streamRef.current?.getTracks().forEach((t) => t.stop());
    streamRef.current = null;
    setStatus("idle");
  }, []);

  return { status, requestAccess, stopMic, stream: streamRef };
}
```

The `sampleRate: 16000` constraint is important. Most speech-to-text APIs expect 16kHz audio. Requesting it upfront avoids client-side resampling.
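One caveat: browsers treat `sampleRate` as an ideal constraint, not a guarantee, and may still capture at the hardware rate (often 48 kHz). Check `track.getSettings().sampleRate` on the granted stream. If you do end up resampling client-side, a naive linear-interpolation downsampler is a workable sketch for speech (a production resampler would low-pass filter first to avoid aliasing):

```typescript
// Naive linear-interpolation downsampler: maps samples from the
// source rate (e.g. 48000) onto the target rate (e.g. 16000).
// Sketch only; a production resampler would low-pass filter first.
function downsample(
  input: Float32Array,
  fromRate: number,
  toRate: number
): Float32Array {
  if (toRate >= fromRate) return input;
  const ratio = fromRate / toRate;
  const outLength = Math.floor(input.length / ratio);
  const output = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const pos = i * ratio;
    const left = Math.floor(pos);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    // Interpolate between the two nearest source samples.
    output[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return output;
}
```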

## Recording Audio with MediaRecorder

The `MediaRecorder` API captures audio chunks from the microphone stream. Collect chunks in an array and assemble them into a `Blob` when recording stops.

```typescript
function useAudioRecorder() {
  const [isRecording, setIsRecording] = useState(false);
  const recorderRef = useRef<MediaRecorder | null>(null);
  const chunksRef = useRef<Blob[]>([]);

  const startRecording = useCallback((stream: MediaStream) => {
    chunksRef.current = [];
    const recorder = new MediaRecorder(stream, {
      mimeType: "audio/webm;codecs=opus",
    });

    recorder.ondataavailable = (e) => {
      if (e.data.size > 0) chunksRef.current.push(e.data);
    };

    recorder.start(250); // Collect data every 250ms
    recorderRef.current = recorder;
    setIsRecording(true);
  }, []);

  const stopRecording = useCallback((): Promise<Blob> => {
    return new Promise((resolve, reject) => {
      const recorder = recorderRef.current;
      if (!recorder) {
        // Reject instead of returning silently, so callers never await
        // a promise that can never settle.
        reject(new Error("No active recording"));
        return;
      }

      recorder.onstop = () => {
        const blob = new Blob(chunksRef.current, {
          type: "audio/webm",
        });
        resolve(blob);
      };

      recorder.stop();
      setIsRecording(false);
    });
  }, []);

  return { isRecording, startRecording, stopRecording };
}
```

The 250ms interval in `recorder.start(250)` provides a good balance between responsiveness and efficiency. Smaller intervals create more chunks but allow for lower-latency streaming to the server.
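A rough back-of-the-envelope sizing, assuming Opus runs near a typical speech bitrate (the bitrate figure here is illustrative, not something `MediaRecorder` guarantees):

```typescript
// Rough per-chunk payload estimate: bytes ≈ (bitrate / 8) × interval.
// Assumes a constant bitrate, which Opus only approximates.
function estimateChunkBytes(bitrateKbps: number, intervalMs: number): number {
  return Math.round(((bitrateKbps * 1000) / 8) * (intervalMs / 1000));
}
```

At around 32 kbps, a 250 ms chunk is on the order of a kilobyte, small enough to forward per-chunk over a WebSocket without batching.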

## Real-Time Waveform Visualization

A waveform gives visual feedback that audio is being captured. Use an `AnalyserNode` from the Web Audio API and draw the waveform on a Canvas element.

```typescript
import { useEffect, useRef } from "react";

function WaveformVisualizer({
  stream,
  isActive,
}: {
  stream: MediaStream | null;
  isActive: boolean;
}) {
  const canvasRef = useRef<HTMLCanvasElement | null>(null);

  useEffect(() => {
    if (!stream || !isActive || !canvasRef.current) return;

    const audioCtx = new AudioContext();
    const analyser = audioCtx.createAnalyser();
    analyser.fftSize = 256;
    const source = audioCtx.createMediaStreamSource(stream);
    source.connect(analyser);

    const canvas = canvasRef.current;
    const ctx = canvas.getContext("2d")!;
    const bufferLength = analyser.frequencyBinCount;
    const dataArray = new Uint8Array(bufferLength);
    let animId: number;

    function draw() {
      animId = requestAnimationFrame(draw);
      analyser.getByteTimeDomainData(dataArray);

      ctx.fillStyle = "#f9fafb";
      ctx.fillRect(0, 0, canvas.width, canvas.height);
      ctx.lineWidth = 2;
      ctx.strokeStyle = "#3b82f6";
      ctx.beginPath();

      const sliceWidth = canvas.width / bufferLength;
      let x = 0;

      for (let i = 0; i < bufferLength; i++) {
        const v = dataArray[i] / 128.0; // normalize: silence = 1
        const y = (v * canvas.height) / 2;
        if (i === 0) ctx.moveTo(x, y);
        else ctx.lineTo(x, y);
        x += sliceWidth;
      }

      ctx.stroke();
    }

    draw();

    return () => {
      cancelAnimationFrame(animId);
      source.disconnect();
      audioCtx.close();
    };
  }, [stream, isActive]);

  return <canvas ref={canvasRef} width={320} height={64} />;
}
```
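The core of the draw loop is a small piece of arithmetic: `getByteTimeDomainData` fills the buffer with unsigned bytes where 128 is silence, and each byte is mapped to a y-coordinate so silence lands on the vertical center of the canvas. Extracted as a hypothetical pure helper (not part of the component above) to make the mapping explicit:

```typescript
// Convert one time-domain sample (0–255, silence at 128) into a
// canvas y-coordinate, where silence maps to the vertical center.
function sampleToY(byteSample: number, canvasHeight: number): number {
  const v = byteSample / 128.0; // 0..2, silence = 1
  return (v * canvasHeight) / 2;
}
```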

## Audio Playback for Agent Responses

When the agent returns an audio response, create an `Audio` element and manage playback state.

```typescript
function useAudioPlayback() {
  const [isPlaying, setIsPlaying] = useState(false);
  const audioRef = useRef<HTMLAudioElement | null>(null);

  const play = useCallback((audioUrl: string) => {
    const audio = new Audio(audioUrl);
    audioRef.current = audio;
    audio.onended = () => setIsPlaying(false);
    // play() returns a promise that can reject (e.g. autoplay policy);
    // reset state if it does
    audio.play().catch(() => setIsPlaying(false));
    setIsPlaying(true);
  }, []);

  const stop = useCallback(() => {
    audioRef.current?.pause();
    audioRef.current = null;
    setIsPlaying(false);
  }, []);

  return { isPlaying, play, stop };
}
```
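Agent responses sometimes arrive as several clips, for example one per sentence of TTS output. A small sequential queue keeps them from overlapping. This is a transport-agnostic sketch: `playOne` is a hypothetical stand-in for any function that resolves when a single clip finishes, such as a promisified wrapper around an `Audio` element's `ended` event.

```typescript
// Play a list of clips strictly one after another. `playOne` is any
// function that resolves once a single clip has finished playing.
async function playSequentially(
  urls: string[],
  playOne: (url: string) => Promise<void>
): Promise<void> {
  for (const url of urls) {
    await playOne(url); // wait for each clip before starting the next
  }
}
```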

## Putting It All Together

Combine the hooks into a voice interaction component with record, send, and playback controls.

```typescript
function VoiceAgentUI() {
  const mic = useMicrophone();
  const recorder = useAudioRecorder();
  const playback = useAudioPlayback();

  const handleRecord = async () => {
    const stream = await mic.requestAccess();
    if (stream) recorder.startRecording(stream);
  };

  const handleStop = async () => {
    const blob = await recorder.stopRecording();
    mic.stopMic();
    // Send blob to your agent API
    const formData = new FormData();
    formData.append("audio", blob, "recording.webm");
    const res = await fetch("/api/agent/voice", {
      method: "POST",
      body: formData,
    });
    const { audioUrl } = await res.json();
    playback.play(audioUrl);
  };

  return (
    <div>
      <button onClick={recorder.isRecording ? handleStop : handleRecord}>
        {recorder.isRecording ? "Stop" : "Mic"}
      </button>
    </div>
  );
}
```

## FAQ

### What audio format should I send to the speech-to-text API?

Most APIs accept `audio/webm` with Opus codec, which is what `MediaRecorder` produces by default in Chrome and Firefox. If your API requires WAV or PCM, use a library like `audiobuffer-to-wav` to convert the recorded blob before sending.
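If you would rather avoid a dependency, a minimal WAV encoder for 16-bit mono PCM fits in about thirty lines. This is a sketch that assumes you have already decoded the webm blob into Float32 samples (for example via `AudioContext.decodeAudioData`):

```typescript
// Encode mono Float32 samples in [-1, 1] as a 16-bit PCM WAV file.
function encodeWav(samples: Float32Array, sampleRate: number): ArrayBuffer {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  const writeString = (offset: number, s: string) => {
    for (let i = 0; i < s.length; i++) view.setUint8(offset + i, s.charCodeAt(i));
  };

  writeString(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus 8
  writeString(8, "WAVE");
  writeString(12, "fmt ");
  view.setUint32(16, 16, true);           // fmt chunk size
  view.setUint16(20, 1, true);            // audio format: PCM
  view.setUint16(22, 1, true);            // channels: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true);              // block align
  view.setUint16(34, 16, true);           // bits per sample
  writeString(36, "data");
  view.setUint32(40, dataSize, true);

  // Clamp and convert each float sample to signed 16-bit little-endian.
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return buffer;
}
```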

### How do I handle the microphone permission prompt appearing multiple times?

The browser remembers permission grants per origin. If you serve your app over HTTPS, the user only sees the prompt once unless they explicitly revoke it. On localhost during development the prompt may reappear. Check `navigator.permissions.query({ name: "microphone" })` to determine the current permission state before calling `getUserMedia`; note that the `"microphone"` permission name is not supported in every browser, so feature-detect and fall back gracefully.

### Can I stream audio to the agent in real-time instead of recording first?

Yes. Use the `ondataavailable` callback with a short interval (100-250ms) and send each chunk to a WebSocket endpoint as it arrives. This enables real-time speech-to-text and reduces perceived latency because the agent starts processing before the user finishes speaking.
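A sketch of the chunk-forwarding side, written transport-agnostically so it can be tested without a live socket. Here `send` and `isReady` are illustrative stand-ins for `WebSocket.send` and a `readyState === WebSocket.OPEN` check:

```typescript
// Forward recorder chunks to a transport as they arrive, buffering
// any that show up before the transport is ready.
function createChunkForwarder<T>(
  send: (chunk: T) => void,
  isReady: () => boolean
) {
  const pending: T[] = [];
  return {
    onChunk(chunk: T) {
      pending.push(chunk);
      if (!isReady()) return; // hold until the socket opens
      // Flush everything buffered so far, preserving arrival order.
      while (pending.length > 0) send(pending.shift() as T);
    },
  };
}
```

Wire it up with `recorder.ondataavailable = (e) => forwarder.onChunk(e.data)` alongside `recorder.start(250)` from the earlier hook.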

---

Source: https://callsphere.ai/blog/building-voice-ui-ai-agents-microphone-waveform-playback
