Learn Agentic AI

Streaming Text Display in React: Typewriter Effect for AI Agent Responses

Implement token-by-token streaming display for AI agent responses using Server-Sent Events, React state, and cursor animation. Includes markdown rendering during streaming.

Why Streaming Matters for Agent UX

When an AI agent takes 3-8 seconds to generate a full response, showing a blank loading spinner creates anxiety. Streaming tokens as they arrive gives users immediate feedback and makes the agent feel responsive. This pattern — used by ChatGPT, Claude, and every major AI interface — is achieved through Server-Sent Events (SSE) on the backend and incremental state updates on the frontend.

Setting Up the SSE Consumer

The browser EventSource API is simple but limited. It only supports GET requests and cannot send custom headers. For agent APIs that require POST bodies and authentication headers, use the Fetch API with a readable stream instead.

async function* streamAgentResponse(
  message: string,
  signal: AbortSignal
): AsyncGenerator<string> {
  const response = await fetch("/api/agent/chat", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${getToken()}`,
    },
    body: JSON.stringify({ message }),
    signal,
  });

  if (!response.ok) {
    throw new Error(`Agent error: ${response.status}`);
  }
  if (!response.body) {
    throw new Error("Agent response has no body to stream");
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = ""; // holds a partial line that spans chunk boundaries

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // keep the incomplete trailing line

    for (const line of lines) {
      if (line.startsWith("data: ")) {
        const data = line.slice(6);
        if (data === "[DONE]") return;
        const parsed = JSON.parse(data);
        if (parsed.token) {
          yield parsed.token;
        }
      }
    }
  }
}

The async generator pattern is ideal here. It produces tokens lazily, handles back-pressure naturally, and composes cleanly with React hooks.
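A subtle hazard with streamed reads is that a network chunk can end in the middle of an SSE line. Pulling the parsing loop out into a small framework-free helper that buffers partial lines makes this easy to unit test. Here `makeSSEParser` is an illustrative sketch, not a library API; it assumes the same `data:` / `[DONE]` wire format as the generator above:

```typescript
// Illustrative sketch: an SSE line parser that buffers partial lines
// across chunk boundaries and emits tokens via a callback.
function makeSSEParser(onToken: (token: string) => void) {
  let buffer = "";
  return (chunk: string) => {
    buffer += chunk;
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // retain the incomplete trailing line
    for (const line of lines) {
      if (!line.startsWith("data: ")) continue;
      const data = line.slice(6);
      if (data === "[DONE]") return;
      const parsed = JSON.parse(data);
      if (parsed.token) onToken(parsed.token);
    }
  };
}

// A chunk boundary falls mid-line; buffering still recovers both tokens.
const tokens: string[] = [];
const feed = makeSSEParser((t) => tokens.push(t));
feed('data: {"token":"Hel');                 // incomplete, buffered
feed('lo"}\ndata: {"token":" world"}\n');
// tokens is now ["Hello", " world"]
```

Without the buffer, the first call would try to `JSON.parse` a truncated payload and throw.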

The Streaming Hook

Wrap the generator in a custom hook that manages accumulated text, streaming state, and cancellation.

import { useState, useRef, useCallback } from "react";

interface StreamState {
  text: string;
  isStreaming: boolean;
  error: string | null;
}

function useStreamingResponse() {
  const [state, setState] = useState<StreamState>({
    text: "",
    isStreaming: false,
    error: null,
  });
  const abortRef = useRef<AbortController | null>(null);

  const startStream = useCallback(async (message: string) => {
    abortRef.current?.abort();
    const controller = new AbortController();
    abortRef.current = controller;

    setState({ text: "", isStreaming: true, error: null });

    try {
      for await (const token of streamAgentResponse(
        message,
        controller.signal
      )) {
        setState((prev) => ({
          ...prev,
          text: prev.text + token,
        }));
      }
      setState((prev) => ({ ...prev, isStreaming: false }));
    } catch (err) {
      if ((err as Error).name !== "AbortError") {
        setState((prev) => ({
          ...prev,
          isStreaming: false,
          error: (err as Error).message,
        }));
      }
    }
  }, []);

  const cancel = useCallback(() => {
    abortRef.current?.abort();
    setState((prev) => ({ ...prev, isStreaming: false }));
  }, []);

  return { ...state, startStream, cancel };
}

Each token appends to the existing text through a state updater function. This avoids stale closure issues that would occur if you read state.text directly inside the loop.
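The hazard is easier to see outside React. In this framework-free sketch, `makeStore` is a hypothetical stand-in for React state: writes that capture a snapshot of the value behave like a stale closure, while updater functions always receive the latest value.

```typescript
// Hypothetical stand-in for React state: set() takes an updater function.
function makeStore() {
  let text = "";
  return {
    set: (updater: (prev: string) => string) => { text = updater(text); },
    get: () => text,
  };
}

// Stale pattern: each write uses a snapshot captured before the loop,
// so tokens overwrite each other instead of accumulating.
const stale = makeStore();
const snapshot = stale.get();                      // "" captured once
["a", "b", "c"].forEach((t) => stale.set(() => snapshot + t));
// stale.get() === "c"

// Functional updates: each write receives the latest value.
const fresh = makeStore();
["a", "b", "c"].forEach((t) => fresh.set((prev) => prev + t));
// fresh.get() === "abc"
```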

Rendering Streaming Markdown

During streaming, partial markdown tokens arrive that may not form complete syntax. A naive markdown renderer would flicker between valid and invalid states. The solution: render markdown on every update but debounce expensive operations like syntax highlighting.


import ReactMarkdown from "react-markdown";

interface StreamingMessageProps {
  text: string;
  isStreaming: boolean;
}

function StreamingMessage({ text, isStreaming }: StreamingMessageProps) {
  return (
    <div className="prose prose-sm max-w-none">
      <ReactMarkdown>{text}</ReactMarkdown>
      {isStreaming && <BlinkingCursor />}
    </div>
  );
}

function BlinkingCursor() {
  return (
    <span
      className="inline-block w-2 h-5 bg-gray-800 ml-0.5 animate-pulse"
      aria-hidden="true"
    />
  );
}

The BlinkingCursor component creates the familiar typing indicator. The aria-hidden attribute prevents screen readers from announcing the cursor element.
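If your renderer does flicker on unclosed code fences during streaming, one common mitigation is to balance the fences yourself before handing the text to the markdown component. The `closeOpenCodeFence` helper below is an illustrative sketch under that assumption, not part of react-markdown:

```typescript
// Illustrative helper: if the streamed text contains an odd number of
// fence lines, a code block is still open, so append a closing fence.
function closeOpenCodeFence(text: string): string {
  // Count lines that begin with a ``` fence marker.
  const fenceCount = (text.match(/^```/gm) ?? []).length;
  return fenceCount % 2 === 1 ? text + "\n```" : text;
}

const partial = closeOpenCodeFence("Here is code:\n```ts\nconst x = 1;");
// partial now ends with a synthetic closing fence

const complete = closeOpenCodeFence("```ts\nconst x = 1;\n```\ndone");
// complete is returned unchanged
```

Call it on the accumulated text before passing it to ReactMarkdown; once the real closing fence streams in, the synthetic one is no longer added.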

Batching Token Updates for Performance

Setting state on every single token can cause excessive re-renders. If the backend streams tokens at high speed, batch them using requestAnimationFrame.

import { useCallback, useEffect, useRef } from "react";

function useTokenBatcher(
  onBatch: (tokens: string) => void
) {
  const bufferRef = useRef("");
  const rafRef = useRef<number | null>(null);

  const addToken = useCallback((token: string) => {
    bufferRef.current += token;

    // Schedule one flush per frame; later tokens join the same buffer.
    if (rafRef.current === null) {
      rafRef.current = requestAnimationFrame(() => {
        onBatch(bufferRef.current);
        bufferRef.current = "";
        rafRef.current = null;
      });
    }
  }, [onBatch]);

  // Cancel any pending frame on unmount so the flush never fires late.
  useEffect(() => {
    return () => {
      if (rafRef.current !== null) cancelAnimationFrame(rafRef.current);
    };
  }, []);

  return addToken;
}

This coalesces all tokens that arrive within a single animation frame into one state update. No matter how fast tokens arrive, you get at most one re-render per frame (roughly 60 per second on most displays), and each render applies several tokens at once.
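The same batching idea can be expressed without React, which makes it easy to unit test. In this sketch, `createTokenBatcher` is a hypothetical framework-free equivalent of the hook, with the scheduler injected so a test can stand in for requestAnimationFrame:

```typescript
// Illustrative, framework-free version of the batching hook: the
// scheduler is injected (requestAnimationFrame in the browser).
type Schedule = (flush: () => void) => void;

function createTokenBatcher(
  onBatch: (tokens: string) => void,
  schedule: Schedule
) {
  let buffer = "";
  let scheduled = false;

  return (token: string) => {
    buffer += token;
    if (!scheduled) {
      scheduled = true; // one flush per frame; later tokens join it
      schedule(() => {
        onBatch(buffer);
        buffer = "";
        scheduled = false;
      });
    }
  };
}

// Three tokens arrive before the frame fires; onBatch sees one string.
const flushes: string[] = [];
let frame: () => void = () => {};
const add = createTokenBatcher(
  (tokens) => flushes.push(tokens),
  (flush) => { frame = flush; }
);
add("Hel");
add("lo ");
add("world");
frame(); // simulate the animation frame firing
// flushes is now ["Hello world"]
```

Injecting the scheduler is the same design choice that makes timers testable: production code passes requestAnimationFrame, tests pass a manual trigger.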

Cancellation and Cleanup

Users must be able to stop a running stream. The AbortController pattern handles this cleanly. Wire a stop button to the cancel function from the hook.

function ChatControls({
  isStreaming,
  onCancel,
}: {
  isStreaming: boolean;
  onCancel: () => void;
}) {
  if (!isStreaming) return null;

  return (
    <button
      onClick={onCancel}
      className="flex items-center gap-1.5 rounded-lg border
                 px-3 py-1.5 text-sm hover:bg-gray-50"
    >
      <span className="w-3 h-3 rounded-sm bg-gray-700" />
      Stop generating
    </button>
  );
}
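What cancellation does under the hood can be sketched without React or the network: aborting the controller rejects the pending work with an error named "AbortError", which is exactly what the hook's catch block filters out. Here `cancellableWork` and `demo` are illustrative stand-ins for the fetch call, not real APIs:

```typescript
// Illustrative stand-in for an abortable fetch: resolves after a delay
// unless the signal fires first, in which case it rejects with an
// AbortError-named DOMException (mirroring fetch's abort behavior).
async function cancellableWork(signal: AbortSignal): Promise<string> {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => resolve("finished"), 1_000);
    signal.addEventListener("abort", () => {
      clearTimeout(timer);
      reject(new DOMException("Aborted", "AbortError"));
    });
  });
}

async function demo(): Promise<string> {
  const controller = new AbortController();
  const work = cancellableWork(controller.signal);
  controller.abort(); // the user clicks "Stop generating"
  try {
    await work;
    return "completed";
  } catch (err) {
    // Mirrors the hook: check the name to distinguish user cancellation
    // from real failures.
    return (err as Error).name;
  }
}
```

Because abort() dispatches the signal's "abort" event synchronously, the work is already rejected by the time it is awaited, and demo() resolves to "AbortError".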

FAQ

How do I handle code blocks that arrive partially during streaming?

Most markdown renderers treat an unclosed fence as plain text (or an open code block) until the closing fence arrives, so partial code blocks usually degrade gracefully. If you do see flicker, memoize the message component with React.memo so unrelated state changes do not re-render it, or split the completed portion of the response from the still-streaming tail so only the tail is re-parsed on each token. Libraries like react-markdown generally handle incremental content well out of the box.

What is the difference between SSE and WebSockets for streaming?

SSE is unidirectional (server to client), uses plain HTTP, and reconnects automatically. WebSockets are bidirectional and require a persistent connection. For AI agent streaming where the server sends tokens and the client only listens, SSE is simpler and sufficient. Use WebSockets when you need bidirectional communication, such as real-time collaborative editing or push notifications from the agent.

How do I add a copy button for completed responses?

After streaming finishes (isStreaming is false), render a copy button that calls navigator.clipboard.writeText(text). During streaming, hide the copy button to prevent users from copying incomplete content.


#React #Streaming #ServerSentEvents #TypeScript #AIAgentInterface #AgenticAI #LearnAI #AIEngineering


Written by

CallSphere Team

Expert insights on AI voice agents and customer communication automation.

