
Sequential Agent Chaining: Building Pipeline Architectures

Learn how to build sequential agent pipelines where the output of one agent feeds directly into the next, using structured outputs for clean handoffs in the OpenAI Agents SDK.

Why Sequential Chaining Is the Most Underrated Agent Pattern

Most multi-agent tutorials jump straight to complex orchestration — handoffs, parallel execution, manager agents delegating to specialists. But the most reliable and debuggable multi-agent pattern is the simplest one: sequential chaining. Agent A finishes its work, passes a structured result to Agent B, which finishes its work and passes a structured result to Agent C.

Sequential pipelines are the assembly lines of agentic AI. Each agent has a single responsibility, a well-defined input contract, and a well-defined output contract. When something breaks, you know exactly which stage failed and why. When you need to improve quality, you can swap out a single stage without touching the rest.

This post walks through the architecture of sequential agent pipelines in the OpenAI Agents SDK, with particular focus on using structured outputs to create clean, type-safe handoffs between stages.

The Core Concept: Output as Input

A sequential chain is defined by one rule: the output of agent N becomes the input of agent N+1. This sounds trivial, but the implementation details matter enormously. If Agent A returns free-form text and Agent B expects structured data, the chain is fragile. If Agent A returns a Pydantic model and Agent B is instructed to work with that exact schema, the chain is robust.
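The difference can be made concrete without any SDK. Below is a plain-JSON sketch of a receiving stage that validates the handoff contract before doing any work; the field names mirror the ResearchOutput schema used later in this post, and validate_handoff is an illustrative helper, not a library function:

```python
import json

# The receiving stage's contract: fields it requires from the previous stage.
# These names mirror the ResearchOutput schema; the check itself is plain JSON.
REQUIRED_FIELDS = {"topic", "key_findings", "sources", "confidence_score"}

def validate_handoff(payload: str) -> dict:
    """Parse a stage's serialized output, failing loudly if the contract breaks."""
    data = json.loads(payload)
    missing = REQUIRED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"handoff missing fields: {sorted(missing)}")
    return data
```

With structured outputs, the SDK performs this kind of validation for you. The point is that an explicit schema check at the boundary turns silent drift into an immediate, debuggable error.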

The OpenAI Agents SDK supports structured outputs through the output_type parameter on an Agent. When you set an output type, the SDK constrains the model to return JSON that validates against your Pydantic model, eliminating ad-hoc parsing between stages.

from pydantic import BaseModel
from agents import Agent, Runner

class ResearchOutput(BaseModel):
    topic: str
    key_findings: list[str]
    sources: list[str]
    confidence_score: float

class DraftOutput(BaseModel):
    title: str
    sections: list[str]
    word_count: int
    tone: str

research_agent = Agent(
    name="Researcher",
    instructions="""You are a research analyst. Given a topic, produce
    structured research findings with sources and a confidence score
    between 0.0 and 1.0.""",
    model="gpt-4o",
    output_type=ResearchOutput,
)

draft_agent = Agent(
    name="Drafter",
    instructions="""You are a technical writer. Given research findings,
    produce a structured article draft with clear sections. Use a
    professional tone and target 800-1200 words.""",
    model="gpt-4o",
    output_type=DraftOutput,
)

Building the Pipeline Runner

The pipeline runner is a simple loop that feeds each agent's output into the next agent as input. The key design decision is how to format the handoff: serialize the structured output into a string that the next agent receives as its input message.

import asyncio
import json
from typing import Any

from agents import Agent, Runner

async def run_pipeline(agents: list[Agent], initial_input: str) -> Any:
    """Run a sequential pipeline of agents, passing each output as the next input."""
    if not agents:
        raise ValueError("pipeline needs at least one agent")

    current_input = initial_input
    output: Any = None

    for i, agent in enumerate(agents):
        print(f"Stage {i + 1}/{len(agents)}: Running {agent.name}")

        result = await Runner.run(agent, input=current_input)
        output = result.final_output

        # If the output is a Pydantic model, serialize it for the next agent
        if hasattr(output, "model_dump"):
            current_input = (
                f"Previous stage ({agent.name}) produced the following output:\n"
                f"{json.dumps(output.model_dump(), indent=2)}"
            )
        else:
            current_input = str(output)

        print(f"  Completed: {agent.name}")

    return output

Now running the full pipeline is a single function call:

async def main():
    pipeline = [research_agent, draft_agent]
    result = await run_pipeline(pipeline, "Write about sequential agent pipelines in production AI systems")

    print(f"Title: {result.title}")
    print(f"Word count: {result.word_count}")
    for section in result.sections:
        print(f"  - {section[:80]}...")

asyncio.run(main())

Adding Error Handling and Retries

Production pipelines need to handle failures at any stage. If Agent B fails, you should not need to re-run Agent A (which may have cost significant tokens and time). A resilient pipeline runner captures intermediate results and supports retries per stage.

import json
from dataclasses import dataclass, field
from typing import Any

from agents import Agent, Runner

@dataclass
class PipelineResult:
    stage_outputs: list[Any] = field(default_factory=list)
    failed_stage: int | None = None
    error: str | None = None
    completed: bool = False

async def run_pipeline_with_retries(
    agents: list[Agent],
    initial_input: str,
    max_retries: int = 2,
) -> PipelineResult:
    """Run a pipeline with per-stage retries and intermediate result capture."""
    pipeline_result = PipelineResult()
    current_input = initial_input

    for i, agent in enumerate(agents):
        for attempt in range(max_retries + 1):
            try:
                result = await Runner.run(agent, input=current_input)
                output = result.final_output
                pipeline_result.stage_outputs.append(output)

                if hasattr(output, "model_dump"):
                    current_input = (
                        f"Previous stage ({agent.name}) produced:\n"
                        f"{json.dumps(output.model_dump(), indent=2)}"
                    )
                else:
                    current_input = str(output)

                break  # stage succeeded; move on to the next one

            except Exception as e:
                print(f"  Stage {i + 1} attempt {attempt + 1} failed: {e}")
                if attempt == max_retries:
                    # Out of retries: record where the pipeline stopped and
                    # return the intermediate outputs collected so far.
                    pipeline_result.failed_stage = i
                    pipeline_result.error = str(e)
                    return pipeline_result

    pipeline_result.completed = True
    return pipeline_result

Designing Effective Stage Boundaries

The hardest part of sequential chaining is deciding where to split the pipeline. Too many stages and you waste tokens re-explaining context at each handoff. Too few stages and you lose the benefits of single-responsibility agents. Here are guidelines that work in practice.


Split when the skill set changes. If one stage requires domain expertise (medical terminology, legal reasoning) and the next requires a different skill (plain-language writing, data formatting), those should be separate agents with different system prompts.

Split when you need a quality gate. If you want to validate output before proceeding — checking that research has enough sources, or that a draft meets length requirements — insert a validation stage. This agent can either approve the output or reject it with feedback.

Do not split purely for modularity. If two stages need the same context and the same skills, combining them into a single agent is usually better. The overhead of serializing and re-parsing context is not free.

Validation Stages: The Quality Gate Pattern

A powerful extension of sequential chaining is the validation stage — an agent whose only job is to check the previous stage's output and either pass it through or flag issues.

class ValidationResult(BaseModel):
    is_valid: bool
    issues: list[str]
    suggestions: list[str]

validator_agent = Agent(
    name="QualityValidator",
    instructions="""You are a quality assurance reviewer. Evaluate the
    draft article against these criteria:
    1. All claims are supported by the provided sources
    2. The tone is professional and consistent
    3. Technical accuracy is maintained
    4. The article has a clear structure with introduction and conclusion

    Return is_valid=True only if ALL criteria are met. Otherwise list
    the specific issues and actionable suggestions.""",
    model="gpt-4o",
    output_type=ValidationResult,
)

You can insert this validation agent between any two stages. If validation fails, you can either retry the previous stage with the feedback or escalate to a human reviewer.
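One way to wire the gate is a small retry loop around the drafting stage. The sketch below is deliberately framework-agnostic: stages are plain callables, and run_with_quality_gate and max_revisions are illustrative names. With the Agents SDK, draft_stage would wrap a Runner.run call on the draft agent, and validate_stage would unpack the ValidationResult.

```python
from typing import Callable

def run_with_quality_gate(
    draft_stage: Callable[[str], str],
    validate_stage: Callable[[str], tuple[bool, list[str]]],
    task: str,
    max_revisions: int = 2,
) -> str:
    """Run draft -> validate; on failure, retry the draft with the feedback."""
    prompt = task
    for _ in range(max_revisions + 1):
        draft = draft_stage(prompt)
        is_valid, issues = validate_stage(draft)
        if is_valid:
            return draft
        # Feed the validator's issues back into the next drafting attempt.
        prompt = f"{task}\n\nRevise to address these issues:\n" + "\n".join(issues)
    raise RuntimeError("draft failed validation after all revisions")
```

The escalation path is the final raise: in production you would replace it with a hand-off to a human reviewer rather than an exception.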

Performance Considerations

Sequential pipelines have an inherent latency cost: each stage must complete before the next begins. For a three-stage pipeline where each LLM call takes 3-5 seconds, total latency is 9-15 seconds. Strategies to mitigate this include using faster models (gpt-4o-mini) for simpler stages, caching stage outputs for repeated inputs, and running independent sub-tasks within a stage in parallel even though the stages themselves are sequential.
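Caching is straightforward to sketch. Assuming each stage can be wrapped as an async callable from input text to output text, a hypothetical cached_stage decorator memoizes on the exact input. This version is in-memory, unbounded, and has no expiry; a real deployment would bound and expire it:

```python
import hashlib
from typing import Awaitable, Callable

def cached_stage(
    run: Callable[[str], Awaitable[str]],
) -> Callable[[str], Awaitable[str]]:
    """Memoize an async stage on a hash of its exact input text."""
    cache: dict[str, str] = {}

    async def wrapper(input_text: str) -> str:
        key = hashlib.sha256(input_text.encode()).hexdigest()
        if key not in cache:
            cache[key] = await run(input_text)  # only hit the model on a miss
        return cache[key]

    return wrapper
```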

The token cost is also worth monitoring. Each handoff adds tokens because the next agent receives the serialized output of the previous agent as part of its input. For large outputs, consider summarizing before handoff rather than passing the complete structured output.
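A simple form of that summarization is structural: forward only the fields the next stage actually needs, and cap long values. compact_handoff below is an illustrative helper, not an SDK feature; the field names and limits are assumptions for the sketch:

```python
import json

def compact_handoff(
    data: dict,
    keep: list[str],
    max_items: int = 5,
    max_chars: int = 300,
) -> str:
    """Serialize only the listed fields, truncating long lists and strings."""
    compacted = {}
    for field in keep:
        value = data.get(field)
        if isinstance(value, list):
            value = value[:max_items]
        elif isinstance(value, str):
            value = value[:max_chars]
        compacted[field] = value
    return json.dumps(compacted, indent=2)
```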

When to Use Sequential Chaining vs Other Patterns

Use sequential chaining when: your workflow has a natural linear progression, each stage has clear input/output contracts, and you value debuggability and reliability over latency.

Use handoffs instead when: the workflow requires dynamic routing — the output of one stage determines which agent should handle the next step, not just what data it receives.

Use parallel execution when: multiple stages are independent and can produce results simultaneously, which are later combined by a synthesis agent.
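For completeness, the fan-out shape looks like this in plain asyncio. With the Agents SDK, each sub-task coroutine would typically wrap a Runner.run call; fan_out_then_synthesize and the callables here are illustrative names:

```python
import asyncio
from typing import Awaitable, Callable

async def fan_out_then_synthesize(
    subtasks: list[Callable[[], Awaitable[str]]],
    synthesize: Callable[[list[str]], str],
) -> str:
    """Run independent sub-tasks concurrently, then combine their results."""
    partials = await asyncio.gather(*(task() for task in subtasks))
    return synthesize(list(partials))
```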

Sequential chaining is the foundation. Master it first, then add complexity only when the problem demands it.
