Custom Spans and Trace Visualization for Complex Workflows
Learn how to use the trace() context manager, custom_span(), and manual span lifecycle to build detailed, hierarchical trace visualizations for complex multi-step agent workflows.
When Built-in Tracing Is Not Enough
The OpenAI Agents SDK auto-traces agent runs, LLM calls, and tool invocations. For simple single-agent workflows, that is usually sufficient. But real production systems have complexity that lives outside the SDK's automatic instrumentation: database queries inside tools, preprocessing pipelines that transform user input before the agent sees it, postprocessing steps that validate and format agent output, and business logic that determines which agent to invoke in the first place.
Custom spans let you extend the trace hierarchy with your own instrumentation points, giving you a complete picture of every step in your workflow — not just the agent parts.
The trace() Context Manager
The trace() context manager creates a top-level trace that wraps your entire workflow. While Runner.run() creates traces automatically, using trace() explicitly gives you control over the trace name, grouping, and metadata:
from agents import Agent, Runner, trace

agent = Agent(
    name="Support Agent",
    instructions="You help customers with technical issues.",
)

with trace("customer-support-workflow", metadata={"channel": "web", "tier": "premium"}):
    # Preprocessing outside the agent
    sanitized_input = sanitize_user_message(raw_input)
    customer_context = await fetch_customer_profile(user_id)

    # Agent run — automatically nested inside our trace
    result = await Runner.run(
        agent,
        f"Customer context: {customer_context}\nQuery: {sanitized_input}",
    )

    # Postprocessing outside the agent
    formatted = format_response(result.final_output)
    await log_interaction(user_id, sanitized_input, formatted)
Every span created by the Runner.run() call is automatically nested under your custom trace. The metadata dictionary appears in the dashboard alongside the trace, enabling you to filter by channel, tier, customer segment, or any other dimension relevant to your application.
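Dashboard filters tend to behave most predictably when metadata stays flat and string-valued. A minimal helper along those lines might look like this (the function name and dotted-key flattening scheme are illustrative conventions, not part of the SDK):

```python
from typing import Any


def flatten_metadata(context: dict[str, Any], prefix: str = "") -> dict[str, str]:
    """Flatten nested context into string-valued keys suitable for trace metadata."""
    flat: dict[str, str] = {}
    for key, value in context.items():
        full_key = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse with a dotted prefix: {"customer": {"tier": "premium"}} -> "customer.tier"
            flat.update(flatten_metadata(value, prefix=f"{full_key}."))
        else:
            flat[full_key] = str(value)
    return flat


# Usage: trace("customer-support-workflow", metadata=flatten_metadata(context))
print(flatten_metadata({"channel": "web", "customer": {"tier": "premium", "id": 42}}))
# → {'channel': 'web', 'customer.tier': 'premium', 'customer.id': '42'}
```

Flattening up front means a nested customer profile still shows up as individually filterable keys in the dashboard rather than one opaque blob.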
Creating Custom Spans
Within a trace, you can create custom spans to instrument specific blocks of code. The custom_span() context manager is the primary tool for this:
from agents import Runner, trace, custom_span

with trace("document-processing-pipeline"):
    with custom_span("input_validation"):
        validated = validate_and_parse(raw_document)

    with custom_span("embedding_generation"):
        chunks = chunk_document(validated, max_tokens=500)
        embeddings = await generate_embeddings(chunks)

    with custom_span("vector_store_upsert"):
        await vector_db.upsert(embeddings)

    with custom_span("agent_analysis"):
        result = await Runner.run(analysis_agent, f"Analyze: {validated.summary}")

    with custom_span("result_persistence"):
        await save_analysis(result.final_output)
This produces a trace with five top-level custom spans, with the agent and generation spans nested under "agent_analysis." The dashboard timeline view shows exactly how much time was spent in each phase — embedding generation, database operations, agent reasoning, and persistence.
Nested Span Hierarchies
Custom spans can be nested to create rich hierarchies that reflect the logical structure of your workflow:
from agents import Runner, trace, custom_span

async def process_order(order_id: str):
    with trace("order-processing", metadata={"order_id": order_id}):
        # Load the order record (helper assumed to exist elsewhere, like the others here)
        order = await load_order(order_id)

        with custom_span("validation"):
            with custom_span("schema_check"):
                validate_order_schema(order)
            with custom_span("inventory_check"):
                available = await check_inventory(order.items)
            with custom_span("fraud_screening"):
                fraud_score = await screen_for_fraud(order)

        with custom_span("agent_review"):
            if fraud_score > 0.7:
                result = await Runner.run(
                    fraud_review_agent,
                    f"Review order {order_id} with fraud score {fraud_score}",
                )

        with custom_span("fulfillment"):
            with custom_span("payment_capture"):
                await capture_payment(order)
            with custom_span("shipping_label"):
                label = await generate_shipping_label(order)
            with custom_span("notification"):
                await send_confirmation_email(order, label)
The resulting trace hierarchy:
Trace: "order-processing" (order_id: ORD-12345)
+-- validation
|   +-- schema_check (12ms)
|   +-- inventory_check (145ms)
|   +-- fraud_screening (890ms)
+-- agent_review
|   +-- agent_span: Fraud Review Agent
|       +-- generation_span: gpt-4o
+-- fulfillment
    +-- payment_capture (234ms)
    +-- shipping_label (567ms)
    +-- notification (89ms)
This hierarchical structure makes it immediately obvious that fraud screening dominates the validation phase and shipping label generation is the bottleneck in fulfillment.
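Reading a timeline like this by eye works for one trace; across many traces you may want to find the dominant child span programmatically. A sketch over exported span records (the flat record shape here is hypothetical, not the SDK's export format):

```python
def dominant_child(spans: list[dict], parent: str) -> tuple[str, float]:
    """Return the (name, duration_ms) of the longest child span under `parent`."""
    children = [s for s in spans if s["parent"] == parent]
    longest = max(children, key=lambda s: s["duration_ms"])
    return longest["name"], longest["duration_ms"]


spans = [
    {"name": "schema_check", "parent": "validation", "duration_ms": 12},
    {"name": "inventory_check", "parent": "validation", "duration_ms": 145},
    {"name": "fraud_screening", "parent": "validation", "duration_ms": 890},
]
print(dominant_child(spans, "validation"))  # → ('fraud_screening', 890)
```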
Manual Span Lifecycle
Sometimes you need more control than a context manager provides. The SDK supports manual span start and finish for cases where the span boundaries do not align with a Python with block — for example, when a span starts in one callback and finishes in another:
from agents import custom_span

class StreamProcessor:
    def __init__(self):
        self.active_span = None

    async def on_stream_start(self, stream_id: str):
        # Start a span manually and mark it as the current span so children nest under it
        self.active_span = custom_span("stream_processing")
        self.active_span.start(mark_as_current=True)

    async def on_chunk_received(self, chunk: bytes):
        # Create child spans within the active span
        with custom_span("chunk_processing"):
            processed = await self.process_chunk(chunk)
            await self.buffer.append(processed)

    async def on_stream_end(self):
        # Finish the span manually and restore the previous current span
        if self.active_span:
            self.active_span.finish(reset_current=True)
            self.active_span = None
Manual lifecycle management should be used sparingly. Context managers are safer because they guarantee the span is closed even if an exception occurs. Reserve manual management for event-driven or callback-based architectures where context managers are impractical.
Adding Data to Spans
Spans can carry structured data that appears in the dashboard when you inspect them:
from agents import custom_span

with custom_span("database_query", data={"table": "customers", "filter": "premium"}) as span:
    results = await db.query("SELECT * FROM customers WHERE tier = 'premium'")

    # Update the span's data dict after the operation completes
    span.span_data.data.update({
        "row_count": len(results),
        "duration_ms": query_duration,
    })
Attaching data to spans transforms traces from simple timing records into rich debugging artifacts. When a query returns zero rows unexpectedly, the span data shows you the exact filter that was applied without requiring you to reproduce the issue.
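A lightweight way to capture those fields consistently is a timing helper that stamps the duration for you. This sketch is deliberately SDK-agnostic: it yields a plain dict that you then hand to the span, so nothing here depends on a particular span method name.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed_data(base: dict):
    """Yield a mutable data dict and stamp `duration_ms` into it on exit."""
    data = dict(base)
    start = time.perf_counter()
    try:
        yield data
    finally:
        data["duration_ms"] = round((time.perf_counter() - start) * 1000, 2)


# Usage inside a custom span:
with timed_data({"table": "customers", "filter": "premium"}) as data:
    rows = ["alice", "bob"]          # stand-in for the real query
    data["row_count"] = len(rows)
# `data` now holds table, filter, row_count, and duration_ms — attach it to the span
```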
Instrumenting Tool Functions with Custom Spans
While the SDK auto-traces tool invocations, you might want finer granularity inside complex tools:
from agents import function_tool, custom_span

@function_tool
async def analyze_document(document_url: str) -> str:
    """Download, parse, and analyze a document."""
    with custom_span("document_download"):
        content = await download_document(document_url)

    with custom_span("text_extraction"):
        text = extract_text(content)

    with custom_span("entity_extraction"):
        entities = await extract_entities(text)

    with custom_span("sentiment_analysis"):
        sentiment = await analyze_sentiment(text)

    return (
        f"Document contains {len(entities)} entities. "
        f"Overall sentiment: {sentiment.label} ({sentiment.score:.2f})"
    )
Now when you view the trace, the function_span for analyze_document contains four child spans showing exactly where time was spent inside the tool. This is invaluable when a tool that "usually takes 500ms" suddenly takes 10 seconds — the child spans pinpoint whether the download, extraction, or analysis is the culprit.
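If several tools repeat this wrap-each-step pattern, a small decorator can apply it automatically. To keep the sketch self-contained and testable, the span factory is a parameter (defaulting to a no-op context manager); in a real tool you would pass the SDK's `custom_span`. The decorator itself is illustrative, not part of the SDK.

```python
import functools
from contextlib import nullcontext


def spanned(name: str, span_factory=nullcontext):
    """Wrap an async function so each call runs inside span_factory(name)."""
    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            with span_factory(name):
                return await fn(*args, **kwargs)
        return wrapper
    return decorator


@spanned("text_extraction")  # pass span_factory=custom_span when using the SDK
async def extract_text(content: bytes) -> str:
    return content.decode("utf-8")
```

Each sub-step then becomes a one-line decoration instead of an explicit `with` block, which keeps long tools readable.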
Correlating Traces Across Services
In microservice architectures, an agent workflow might call external APIs that have their own tracing. You can propagate trace context to enable end-to-end correlation:
import httpx

from agents import Runner, trace

with trace("cross-service-workflow") as current_trace:
    trace_id = current_trace.trace_id

    # Pass trace_id to downstream services via headers
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "https://internal-api/process",
            headers={"X-Trace-Id": trace_id},
            json={"data": payload},
        )

    # The downstream service can include this trace_id in its own logs
    result = await Runner.run(agent, response.text)
This pattern lets you follow a request from the user through your agent system and into backend services, creating a unified debugging experience across your entire infrastructure.
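The receiving side needs a consistent way to read that header back out. Here is a pair of helpers for the convention used above; note that `X-Trace-Id` is this article's own header name rather than a standard, and the `trace_<32 hex>` shape is an assumption mirroring the SDK's generated trace IDs.

```python
import re
import uuid

TRACE_ID_PATTERN = re.compile(r"^trace_[0-9a-f]{32}$")


def inject_trace_header(headers: dict[str, str], trace_id: str) -> dict[str, str]:
    """Return a copy of `headers` with the trace id attached."""
    return {**headers, "X-Trace-Id": trace_id}


def extract_trace_id(headers: dict[str, str]) -> str:
    """Read a propagated trace id, minting a fresh one if missing or malformed."""
    candidate = headers.get("X-Trace-Id", "")
    if TRACE_ID_PATTERN.match(candidate):
        return candidate
    return f"trace_{uuid.uuid4().hex}"
```

Validating the incoming value before trusting it means a malformed or missing header degrades gracefully into a new trace instead of polluting your logs with garbage IDs.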
Visualization Best Practices
Name spans after operations, not implementations — Use "fraud_screening" not "call_sift_api." Operation names remain stable even when you swap providers.
Keep hierarchies shallow — Three to four levels of nesting is ideal. Deeper hierarchies become difficult to navigate in the dashboard.
Attach business context as metadata — Include customer IDs, order IDs, and feature flags so you can filter traces by business dimensions.
Use consistent naming conventions — Adopt snake_case for all span names and stick to it. Inconsistent naming makes dashboard filters unreliable.
Instrument the boundaries — The most valuable custom spans are at I/O boundaries: database calls, HTTP requests, file operations, and message queue publishes. These are where latency hides.
Custom spans and the trace() context manager turn the Agents SDK's built-in tracing from a useful default into a comprehensive observability layer for your entire application.