Chat Agent UI: Streaming Responses with Server-Sent Events
Build a streaming chat UI for OpenAI Agents SDK chat agents using run_streamed(), FastAPI SSE endpoints, and a React frontend with real-time token rendering.
Why Streaming Matters for Chat Agents
When a chat agent calls tools, reasons about results, and composes a response, the total latency can easily reach 5-10 seconds. Without streaming, the user stares at a blank screen for the entire duration. With streaming, tokens appear as they are generated, tool calls are shown in progress, and the interface feels responsive even during complex operations.
The OpenAI Agents SDK provides Runner.run_streamed() which yields events as the agent processes — including partial text, tool call initiation, tool results, and agent handoffs. Combined with Server-Sent Events (SSE) on the backend and an EventSource consumer on the frontend, we can build a real-time streaming chat experience.
Architecture Overview
The streaming pipeline has three stages:
- Agent Layer — run_streamed() produces a stream of StreamEvent objects
- API Layer — FastAPI SSE endpoint converts events to SSE format
- Frontend Layer — React component consumes the SSE stream and renders tokens incrementally
Agent ──run_streamed()──► StreamEvent ──FastAPI SSE──► data: {...} ──EventSource──► React UI
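Concretely, the middle stage's output is plain text: each SSE event is a `data:` line holding JSON, terminated by a blank line. A quick sketch of what the client receives (the `lookup_order` tool name is illustrative, not from the SDK):

```python
import json

# Each SSE event is a "data:" line of JSON followed by a blank line.
# Two events from the pipeline above would look like this on the wire:
events = [
    {"type": "tool_start", "tool": "lookup_order"},
    {"type": "text_delta", "content": "Your order"},
]
wire = "".join(f"data: {json.dumps(e)}\n\n" for e in events)
print(wire)
```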
Backend: FastAPI SSE Endpoint
The backend needs to accept a chat message, start a streamed agent run, and forward events to the client as SSE. FastAPI supports SSE through StreamingResponse with the text/event-stream content type.
# main.py
import json
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
from agents import Runner
from agents.stream_events import AgentUpdatedStreamEvent
from agents.items import ToolCallItem, ToolCallOutputItem
from agents_config import support_agent
from session_manager import SessionManager

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_methods=["*"],
    allow_headers=["*"],
)

sessions = SessionManager()


class ChatRequest(BaseModel):
    session_id: str
    message: str


@app.post("/chat/stream")
async def chat_stream(request: ChatRequest):
    session = sessions.get_or_create(request.session_id)
    session.add_message("user", request.message)
    input_list = session.to_input_list()

    async def event_generator():
        result = Runner.run_streamed(
            support_agent,
            input=input_list,
        )
        async for event in result.stream_events():
            # Stream partial text tokens
            if event.type == "raw_response_event":
                delta = getattr(event.data, "delta", None)
                if delta:
                    yield format_sse({
                        "type": "text_delta",
                        "content": delta,
                    })
            # Notify about tool calls starting and their results
            elif event.type == "run_item_stream_event":
                item = event.item
                if isinstance(item, ToolCallItem):
                    yield format_sse({
                        "type": "tool_start",
                        "tool": getattr(item.raw_item, "name", "unknown"),
                    })
                elif isinstance(item, ToolCallOutputItem):
                    yield format_sse({
                        "type": "tool_result",
                        "tool": getattr(item, "name", "unknown"),
                        "output": str(item.output)[:200],
                    })
            # Notify about agent handoffs
            elif isinstance(event, AgentUpdatedStreamEvent):
                yield format_sse({
                    "type": "agent_switch",
                    "agent": event.new_agent.name,
                })
        # Stream is complete — final_output is now populated on the
        # streaming result; save the turn for the next request
        session.result = result
        session.add_message("assistant", result.final_output)
        yield format_sse({
            "type": "done",
            "final_output": result.final_output,
        })

    return StreamingResponse(
        event_generator(),
        media_type="text/event-stream",
        headers={
            "Cache-Control": "no-cache",
            "Connection": "keep-alive",
            "X-Accel-Buffering": "no",
        },
    )


def format_sse(data: dict) -> str:
    """Format a dictionary as an SSE event string."""
    return f"data: {json.dumps(data)}\n\n"
There are a few important details here. The X-Accel-Buffering: no header disables nginx buffering, which would otherwise batch SSE events and defeat the purpose of streaming. The Cache-Control: no-cache header prevents CDN caching of the event stream.
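The SessionManager imported above is not shown in this article; a minimal in-memory sketch of what it might look like (illustrative names, not a library API):

```python
# session_manager.py — minimal in-memory session store (illustrative;
# a production version would add eviction and persistence)
from dataclasses import dataclass, field


@dataclass
class Session:
    session_id: str
    messages: list = field(default_factory=list)
    result: object = None  # last run result, if the caller wants to reuse it

    def add_message(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})

    def to_input_list(self) -> list:
        # The Agents SDK accepts a list of {"role", "content"} dicts as input
        return list(self.messages)


class SessionManager:
    def __init__(self) -> None:
        self._sessions: dict[str, Session] = {}

    def get_or_create(self, session_id: str) -> Session:
        if session_id not in self._sessions:
            self._sessions[session_id] = Session(session_id)
        return self._sessions[session_id]
```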
Handling Partial Deltas
The raw_response_event events contain token-level deltas from the model. Each delta is a small piece of text — sometimes a single word, sometimes a partial word. The frontend accumulates these deltas to build the complete message.
# A more robust delta extractor that handles different event shapes
def extract_text_delta(event) -> str | None:
    """Extract text delta from a raw response event."""
    if event.type != "raw_response_event":
        return None
    data = event.data

    # OpenAI chat completion delta format
    if hasattr(data, "delta"):
        delta = data.delta
        if isinstance(delta, str):
            return delta
        if hasattr(delta, "content") and delta.content:
            return delta.content

    # Direct content attribute
    if hasattr(data, "content") and isinstance(data.content, str):
        return data.content

    return None
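As a quick sanity check, here is the extractor exercised against stand-in events. The SimpleNamespace mocks only imitate the shapes the function handles, not real SDK classes, and the function body is repeated so the snippet runs on its own:

```python
from types import SimpleNamespace


def extract_text_delta(event):
    # Same logic as above, repeated for a self-contained example
    if event.type != "raw_response_event":
        return None
    data = event.data
    if hasattr(data, "delta"):
        delta = data.delta
        if isinstance(delta, str):
            return delta
        if hasattr(delta, "content") and delta.content:
            return delta.content
    if hasattr(data, "content") and isinstance(data.content, str):
        return data.content
    return None


events = [
    # String delta
    SimpleNamespace(type="raw_response_event", data=SimpleNamespace(delta="Hel")),
    # Object delta with a .content attribute
    SimpleNamespace(type="raw_response_event",
                    data=SimpleNamespace(delta=SimpleNamespace(content="lo, "))),
    # Non-text event — ignored
    SimpleNamespace(type="run_item_stream_event", data=None),
    # Direct content attribute
    SimpleNamespace(type="raw_response_event", data=SimpleNamespace(content="world")),
]

message = "".join(d for d in map(extract_text_delta, events) if d)
print(message)  # Hello, world
```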
Frontend: React SSE Chat Component
The frontend consumes the SSE stream with fetch and a ReadableStream rather than the native EventSource API: EventSource only supports GET requests, and our endpoint needs a POST body.
// hooks/useChatStream.ts
import { useState, useCallback, useRef } from "react";

interface ChatMessage {
  role: "user" | "assistant";
  content: string;
  toolCalls?: string[];
  isStreaming?: boolean;
}

export function useChatStream(sessionId: string) {
  const [messages, setMessages] = useState<ChatMessage[]>([]);
  const [isLoading, setIsLoading] = useState(false);
  const abortRef = useRef<AbortController | null>(null);

  const sendMessage = useCallback(
    async (content: string) => {
      // Add the user message and the assistant placeholder in one update
      setMessages((prev) => [
        ...prev,
        { role: "user", content },
        { role: "assistant", content: "", isStreaming: true, toolCalls: [] },
      ]);
      setIsLoading(true);

      const controller = new AbortController();
      abortRef.current = controller;

      // Apply a stream event to the assistant placeholder — always the
      // last message — returning new objects so React re-renders
      const applyEvent = (data: any) =>
        setMessages((prev) =>
          prev.map((msg, i) => {
            if (i !== prev.length - 1) return msg;
            if (data.type === "text_delta") {
              return { ...msg, content: msg.content + data.content };
            }
            if (data.type === "tool_start") {
              return { ...msg, toolCalls: [...(msg.toolCalls || []), data.tool] };
            }
            if (data.type === "done") {
              return { ...msg, isStreaming: false };
            }
            return msg;
          })
        );

      try {
        const response = await fetch("/api/chat/stream", {
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify({ session_id: sessionId, message: content }),
          signal: controller.signal,
        });
        if (!response.ok || !response.body) {
          throw new Error(`Stream request failed: ${response.status}`);
        }

        const reader = response.body.getReader();
        const decoder = new TextDecoder();
        let buffer = "";

        while (true) {
          const { done, value } = await reader.read();
          if (done) break;
          buffer += decoder.decode(value, { stream: true });
          // SSE events are separated by a blank line; keep any trailing
          // partial event in the buffer for the next chunk
          const events = buffer.split("\n\n");
          buffer = events.pop() || "";
          for (const line of events) {
            if (!line.startsWith("data: ")) continue;
            applyEvent(JSON.parse(line.slice(6)));
          }
        }
      } catch (err) {
        if ((err as Error).name !== "AbortError") {
          console.error("Stream error:", err);
        }
      } finally {
        setIsLoading(false);
      }
    },
    [sessionId]
  );

  const cancel = useCallback(() => {
    abortRef.current?.abort();
  }, []);

  return { messages, sendMessage, isLoading, cancel };
}
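The buffer-splitting in the reader loop is the part that is easiest to get wrong: a network chunk can end mid-event. The same algorithm, sketched in Python for clarity (the function name is mine, not from any library):

```python
import json


def parse_sse_chunks(chunks):
    """Reassemble SSE events from arbitrary network chunks.

    Events are separated by a blank line; a chunk boundary can fall
    anywhere, so the trailing partial event stays in the buffer.
    """
    buffer = ""
    events = []
    for chunk in chunks:
        buffer += chunk
        *complete, buffer = buffer.split("\n\n")
        for block in complete:
            if block.startswith("data: "):
                events.append(json.loads(block[len("data: "):]))
    return events


# The second event straddles a chunk boundary and is still recovered
chunks = [
    'data: {"type": "text_delta", "content": "Hel"}\n\nda',
    'ta: {"type": "text_delta", "content": "lo"}\n\n',
]
print(parse_sse_chunks(chunks))
```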
Chat Message Component
The chat component renders messages with streaming indicators, tool call badges, and a typing cursor for in-progress responses.
// components/ChatMessage.tsx
import React from "react";

interface Props {
  role: "user" | "assistant";
  content: string;
  toolCalls?: string[];
  isStreaming?: boolean;
}

export function ChatMessage({ role, content, toolCalls, isStreaming }: Props) {
  return (
    <div className={`flex ${role === "user" ? "justify-end" : "justify-start"} mb-4`}>
      <div
        className={`max-w-[70%] rounded-lg px-4 py-2 ${
          role === "user"
            ? "bg-blue-600 text-white"
            : "bg-gray-100 text-gray-900"
        }`}
      >
        {toolCalls && toolCalls.length > 0 && (
          <div className="flex gap-1 mb-2">
            {toolCalls.map((tool, i) => (
              <span
                key={i}
                className="text-xs bg-yellow-200 text-yellow-800 px-2 py-0.5 rounded"
              >
                {tool}
              </span>
            ))}
          </div>
        )}
        <p className="text-sm whitespace-pre-wrap">
          {content}
          {isStreaming && (
            <span className="inline-block w-2 h-4 bg-gray-400 ml-1 animate-pulse" />
          )}
        </p>
      </div>
    </div>
  );
}
Complete Chat Page
Wire everything together in a chat page component:
// app/chat/page.tsx
"use client";
import React, { useState, useRef, useEffect } from "react";
import { useChatStream } from "@/hooks/useChatStream";
import { ChatMessage } from "@/components/ChatMessage";
import { v4 as uuidv4 } from "uuid";

export default function ChatPage() {
  const [sessionId] = useState(() => uuidv4());
  const { messages, sendMessage, isLoading, cancel } = useChatStream(sessionId);
  const [input, setInput] = useState("");
  const scrollRef = useRef<HTMLDivElement>(null);

  useEffect(() => {
    scrollRef.current?.scrollIntoView({ behavior: "smooth" });
  }, [messages]);

  const handleSubmit = (e: React.FormEvent) => {
    e.preventDefault();
    if (!input.trim() || isLoading) return;
    sendMessage(input.trim());
    setInput("");
  };

  return (
    <div className="flex flex-col h-screen max-w-2xl mx-auto">
      <div className="flex-1 overflow-y-auto p-4">
        {messages.map((msg, i) => (
          <ChatMessage key={i} {...msg} />
        ))}
        <div ref={scrollRef} />
      </div>
      <form onSubmit={handleSubmit} className="p-4 border-t flex gap-2">
        <input
          type="text"
          value={input}
          onChange={(e) => setInput(e.target.value)}
          placeholder="Type a message..."
          className="flex-1 border rounded-lg px-4 py-2"
          disabled={isLoading}
        />
        {isLoading ? (
          <button type="button" onClick={cancel} className="px-4 py-2 bg-red-500 text-white rounded-lg">
            Stop
          </button>
        ) : (
          <button type="submit" className="px-4 py-2 bg-blue-600 text-white rounded-lg">
            Send
          </button>
        )}
      </form>
    </div>
  );
}
Performance Considerations
When building streaming chat UIs, keep these performance factors in mind:
- Debounce state updates — at high token rates, updating React state on every delta can cause jank. Batch deltas using requestAnimationFrame, or accumulate in a ref and flush periodically.
- Virtualize long conversations — for sessions with hundreds of messages, use a virtualized list (such as react-window) to avoid rendering all messages in the DOM.
- Connection management — SSE connections hold a TCP socket open. Implement heartbeat events (a comment line every 30 seconds) to detect stale connections, and clean up sessions when the client disconnects.
- Backpressure handling — if the frontend cannot consume events fast enough, the server buffers events in memory. Set reasonable buffer limits and drop stale connections.
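The heartbeat idea can be implemented as a small async wrapper around the event generator. This is a sketch with names of my choosing (`with_heartbeat` is not an SDK or FastAPI helper); it avoids cancelling the underlying generator by keeping the pending `__anext__` task alive across timeouts:

```python
import asyncio


async def with_heartbeat(source, interval: float = 30.0):
    """Yield events from `source`, inserting SSE comment lines
    (": ping") whenever the source is quiet for `interval` seconds,
    so proxies and clients can detect a dead connection.
    """
    it = source.__aiter__()
    next_task = None
    while True:
        if next_task is None:
            next_task = asyncio.ensure_future(it.__anext__())
        # Wait without cancelling, so a timeout never kills the generator
        done, _ = await asyncio.wait({next_task}, timeout=interval)
        if next_task in done:
            try:
                event = next_task.result()
            except StopAsyncIteration:
                break
            next_task = None
            yield event
        else:
            yield ": ping\n\n"  # SSE comment line — ignored by clients


# Example: a source that stalls between two events
async def demo():
    async def slow_source():
        yield "data: 1\n\n"
        await asyncio.sleep(0.05)
        yield "data: 2\n\n"

    return [e async for e in with_heartbeat(slow_source(), interval=0.01)]


events = asyncio.run(demo())
print(events[0], events[-1])  # real events bracket the pings
```

In the endpoint, you would wrap the generator before handing it to StreamingResponse: `StreamingResponse(with_heartbeat(event_generator()), ...)`.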
Streaming transforms the chat agent experience from a frustrating wait into an engaging conversation. The combination of run_streamed(), FastAPI SSE, and React event consumption gives you a responsive, production-ready chat interface that handles tool calls, agent handoffs, and multi-turn context seamlessly.
Written by
CallSphere Team
Expert insights on AI voice agents and customer communication automation.