Built-in Tracing in OpenAI Agents SDK: Visualize and Debug Workflows
Learn how the OpenAI Agents SDK automatically traces every agent run with agent_span, generation_span, and function_span, and how to visualize traces in the OpenAI dashboard for debugging.
Why Tracing Matters for Agentic Systems
When you build a traditional API, debugging is straightforward: you read the request, follow the handler logic, and inspect the response. Agentic systems shatter that simplicity. A single user query might trigger an orchestrator agent, which delegates to two specialist agents, each calling three tools, with the orchestrator looping back for a second pass based on intermediate results. Without tracing, debugging this is like navigating a cave without a flashlight.
Tracing gives you a structured, hierarchical record of everything that happened during an agent run. You see which agent was active, what LLM calls were made, which tools were invoked, what arguments were passed, and how long each step took. OpenAI's Agents SDK ships with automatic tracing built in, so you get this visibility without writing a single line of instrumentation code.
How Auto-Tracing Works
Every call to Runner.run() automatically creates a trace — a top-level container that groups all the spans generated during that execution. Within the trace, the SDK creates three types of spans:
- agent_span: Created whenever an agent becomes active. If your orchestrator hands off to a research agent, you will see separate agent spans for each.
- generation_span: Created for every LLM API call. This captures the model name, input messages, output, token counts, and latency.
- function_span: Created whenever a tool function is invoked. This records the tool name, input arguments, and return value.
Here is a minimal example that produces a fully traced run:
from agents import Agent, Runner, function_tool
@function_tool
def get_weather(city: str) -> str:
"""Fetch current weather for a city."""
return f"72F and sunny in {city}"
@function_tool
def get_population(city: str) -> str:
"""Fetch population data for a city."""
return f"{city} has a population of 1.5 million"
agent = Agent(
name="City Info Agent",
instructions="You provide city information using the available tools.",
tools=[get_weather, get_population],
)
result = Runner.run_sync(agent, "Tell me about Austin, Texas")
print(result.final_output)
When this code runs, the SDK automatically generates a trace with the following hierarchy:
Trace: "Agent run"
+-- agent_span: City Info Agent
    +-- generation_span: gpt-4o (initial reasoning)
    +-- function_span: get_weather(city="Austin, Texas")
    +-- function_span: get_population(city="Austin, Texas")
    +-- generation_span: gpt-4o (final synthesis)
You did not annotate anything. The SDK intercepted every meaningful step and recorded it.
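To make the hierarchy concrete, here is a minimal, stdlib-only sketch of the trace/span structure described above. The `Span` and `Trace` classes are hypothetical stand-ins for illustration, not the SDK's own types:

```python
# Illustrative only: a minimal model of the trace/span hierarchy the SDK
# builds automatically. Span and Trace here are made-up stand-ins.
from dataclasses import dataclass, field

@dataclass
class Span:
    kind: str              # "agent_span", "generation_span", or "function_span"
    name: str              # agent name, model name, or tool name
    children: list = field(default_factory=list)

@dataclass
class Trace:
    workflow_name: str
    root: Span

# Reconstruct the City Info Agent run shown above.
agent_span = Span("agent_span", "City Info Agent", children=[
    Span("generation_span", "gpt-4o"),       # initial reasoning
    Span("function_span", "get_weather"),
    Span("function_span", "get_population"),
    Span("generation_span", "gpt-4o"),       # final synthesis
])
run_trace = Trace("Agent run", agent_span)

# Walking the tree recovers which tools were invoked, in order.
tool_calls = [s.name for s in run_trace.root.children if s.kind == "function_span"]
print(tool_calls)  # ['get_weather', 'get_population']
```

This is exactly the kind of query the dashboard answers for you; modeling it yourself is only useful when post-processing exported trace data.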
Viewing Traces in the OpenAI Dashboard
Traces generated by the Agents SDK are automatically sent to the OpenAI platform. Navigate to the Traces section in the OpenAI dashboard to see a timeline view of every run. Each trace can be expanded to reveal the full span hierarchy.
The dashboard provides several critical debugging views:
Timeline View — Shows spans arranged on a horizontal time axis. This immediately reveals where your agent is spending time. If a tool call takes 3 seconds while everything else takes milliseconds, you spot the bottleneck instantly.
Span Detail View — Click any span to see its full payload. For a generation_span, you see the exact messages sent to the model, the completion returned, the token count, and the model used. For a function_span, you see the arguments and return value.
Trace Metadata — Each trace carries metadata including a unique trace ID, the total duration, the workflow name, and any custom tags you attach. This makes it easy to filter traces in the dashboard.
Controlling Trace Behavior
By default, every Runner.run() call is traced. You can customize this behavior:
from agents import Runner, RunConfig
# Disable tracing for a specific run
result = Runner.run_sync(agent, "Hello", run_config=RunConfig(tracing_disabled=True))
# Set a custom workflow name for easier filtering
result = Runner.run_sync(
agent,
"Tell me about Austin",
run_config=RunConfig(workflow_name="city-info-lookup"),
)
Setting a meaningful workflow_name is strongly recommended for production systems. Instead of seeing dozens of generic "Agent run" traces, you see "lead-qualification," "support-ticket-triage," and "document-summarization," making it trivial to filter and compare.
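If you export traces for offline analysis, filtering by workflow name is a one-liner. This sketch uses invented trace records shaped as plain dicts; real exports will differ in schema:

```python
# Hypothetical sketch: filtering exported trace records by workflow_name,
# mirroring what the dashboard filter does. The records are invented.
traces = [
    {"trace_id": "tr_001", "workflow_name": "lead-qualification", "duration_ms": 1800},
    {"trace_id": "tr_002", "workflow_name": "support-ticket-triage", "duration_ms": 950},
    {"trace_id": "tr_003", "workflow_name": "lead-qualification", "duration_ms": 2400},
]

def filter_by_workflow(traces, name):
    """Return only the traces belonging to the named workflow."""
    return [t for t in traces if t["workflow_name"] == name]

lead_traces = filter_by_workflow(traces, "lead-qualification")
print([t["trace_id"] for t in lead_traces])  # ['tr_001', 'tr_003']
```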
Tracing Multi-Agent Handoffs
Tracing becomes especially valuable with handoffs. When one agent transfers control to another, the trace captures the full chain:
from agents import Agent, Runner, handoff
# search_web and read_document are assumed to be defined elsewhere
# as @function_tool functions.
research_agent = Agent(
    name="Research Agent",
    instructions="You perform deep research on topics.",
    tools=[search_web, read_document],
)
summary_agent = Agent(
name="Summary Agent",
instructions="You summarize research findings concisely.",
)
orchestrator = Agent(
name="Orchestrator",
instructions="Route research requests to the research agent, then summarize.",
handoffs=[handoff(research_agent), handoff(summary_agent)],
)
result = Runner.run_sync(orchestrator, "Research quantum computing trends")
The resulting trace looks like:
Trace: "Agent run"
+-- agent_span: Orchestrator
    +-- generation_span: gpt-4o (routing decision)
+-- agent_span: Research Agent
    +-- generation_span: gpt-4o (research planning)
    +-- function_span: search_web(query="quantum computing 2026 trends")
    +-- function_span: read_document(url="...")
    +-- generation_span: gpt-4o (synthesis)
+-- agent_span: Summary Agent
    +-- generation_span: gpt-4o (summarization)
This hierarchical view shows you exactly how control flowed between agents, which tools were invoked at each stage, and how long each agent held the conversation.
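"How long each agent held the conversation" is easy to compute from exported agent spans. A small sketch with invented timestamps (the real span schema may name these fields differently):

```python
# Illustrative sketch: given agent spans with start/end timestamps (invented
# values here), compute how long each agent held the conversation.
spans = [
    {"kind": "agent_span", "name": "Orchestrator",   "start": 0.0, "end": 1.2},
    {"kind": "agent_span", "name": "Research Agent", "start": 1.2, "end": 9.7},
    {"kind": "agent_span", "name": "Summary Agent",  "start": 9.7, "end": 11.0},
]

def time_per_agent(spans):
    """Map each agent name to the seconds it was active."""
    return {s["name"]: round(s["end"] - s["start"], 2)
            for s in spans if s["kind"] == "agent_span"}

print(time_per_agent(spans))
# {'Orchestrator': 1.2, 'Research Agent': 8.5, 'Summary Agent': 1.3}
```

In this made-up run, the Research Agent dominates wall-clock time, which is where you would look first if the overall trace were too slow.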
Debugging Common Issues with Traces
Traces expose several common failure patterns that are otherwise difficult to diagnose:
Infinite Loops — If an agent keeps calling the same tool with identical arguments, the trace shows a repeating pattern of function_span entries. You can set max_turns on the Runner to prevent runaway execution and use the trace to identify why the agent is looping.
Wrong Agent Routing — In a multi-agent system, the trace reveals which agent handled each turn. If the orchestrator routes a billing question to the technical support agent, you see it immediately in the agent_span hierarchy.
Token Bloat — Generation spans include token counts. If a single LLM call consumes 15,000 tokens when you expected 2,000, the trace highlights the problem. This often points to overly verbose tool outputs being fed back into the conversation.
Slow Tool Calls — The timeline view shows duration for each span. A function_span that takes 8 seconds while others take 200 milliseconds points you directly to the external service or database query that needs optimization.
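Two of these checks, loop detection and bottleneck finding, are easy to automate over exported span data. A hedged sketch with made-up span records (real exports will differ in shape):

```python
# Hypothetical sketch: two quick checks over exported span records.
# The span dicts are invented examples, not the SDK's export format.
from collections import Counter

spans = [
    {"kind": "function_span", "name": "search_web", "args": '{"query": "quantum"}', "ms": 210},
    {"kind": "function_span", "name": "search_web", "args": '{"query": "quantum"}', "ms": 8200},
    {"kind": "function_span", "name": "search_web", "args": '{"query": "quantum"}', "ms": 195},
]

def repeated_calls(spans, threshold=2):
    """Flag tool calls repeated with identical arguments (possible loop)."""
    counts = Counter((s["name"], s["args"]) for s in spans if s["kind"] == "function_span")
    return [call for call, n in counts.items() if n > threshold]

def slowest(spans):
    """Return the span with the longest duration (the bottleneck)."""
    return max(spans, key=lambda s: s["ms"])

print(repeated_calls(spans))  # [('search_web', '{"query": "quantum"}')]
print(slowest(spans)["ms"])   # 8200
```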
Best Practices for Production Tracing
Always set workflow_name — Generic trace names become useless at scale. Name your workflows after the user intent they serve.
Use trace IDs for correlation — Pass the trace ID into your application logs so you can cross-reference agent behavior with your existing observability stack.
Monitor trace duration trends — A trace that averaged 2 seconds last week but now averages 6 seconds signals a regression, even if no errors are thrown.
Review traces during incidents — When users report unexpected agent behavior, the trace is the first place to look. It shows you exactly what the agent did, not what you assumed it would do.
Sample in high-traffic environments — If your agent handles thousands of requests per minute, trace a representative sample rather than every request to manage storage costs.
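One common way to implement sampling is deterministically, by hashing the trace ID, so a given trace is always kept or dropped consistently across services. A minimal sketch (the 10% rate and `tr_`-prefixed IDs are assumptions for illustration):

```python
# Hypothetical sketch: deterministic ~10% trace sampling keyed on trace ID,
# so the keep/drop decision is stable across services and retries.
import hashlib

def should_trace(trace_id: str, sample_rate: float = 0.10) -> bool:
    """Keep a stable fraction of traces, keyed on trace ID."""
    digest = hashlib.sha256(trace_id.encode()).digest()
    bucket = digest[0] / 255  # map first byte to [0, 1]
    return bucket < sample_rate

kept = sum(should_trace(f"tr_{i:05d}") for i in range(10_000))
print(f"kept {kept} of 10000")  # roughly 1000
```

Because the decision is a pure function of the trace ID, re-evaluating it anywhere in your stack gives the same answer, unlike random sampling, which can keep a trace in one service and drop it in another.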
Built-in tracing transforms agent debugging from guesswork into inspection. The OpenAI Agents SDK makes this effortless by auto-instrumenting every agent run, LLM call, and tool invocation without requiring you to modify your application code.
Written by
CallSphere Team
Expert insights on AI voice agents and customer communication automation.