Sequential Agent Chaining: Building Pipeline Architectures
Learn how to build sequential agent pipelines where the output of one agent feeds directly into the next, using structured outputs for clean handoffs in the OpenAI Agents SDK.
Why Sequential Chaining Is the Most Underrated Agent Pattern
Most multi-agent tutorials jump straight to complex orchestration — handoffs, parallel execution, manager agents delegating to specialists. But the most reliable and debuggable multi-agent pattern is the simplest one: sequential chaining. Agent A finishes its work, passes a structured result to Agent B, which finishes its work and passes a structured result to Agent C.
Sequential pipelines are the assembly lines of agentic AI. Each agent has a single responsibility, a well-defined input contract, and a well-defined output contract. When something breaks, you know exactly which stage failed and why. When you need to improve quality, you can swap out a single stage without touching the rest.
This post walks through the architecture of sequential agent pipelines in the OpenAI Agents SDK, with particular focus on using structured outputs to create clean, type-safe handoffs between stages.
The Core Concept: Output as Input
A sequential chain is defined by one rule: the output of agent N becomes the input of agent N+1. This sounds trivial, but the implementation details matter enormously. If Agent A returns free-form text and Agent B expects structured data, the chain is fragile. If Agent A returns a Pydantic model and Agent B is instructed to work with that exact schema, the chain is robust.
The OpenAI Agents SDK supports structured outputs through the output_type parameter on an Agent. When you set an output type, the SDK forces the LLM to return valid JSON conforming to your Pydantic model. This eliminates parsing failures between stages.
from pydantic import BaseModel

from agents import Agent, Runner


class ResearchOutput(BaseModel):
    topic: str
    key_findings: list[str]
    sources: list[str]
    confidence_score: float


class DraftOutput(BaseModel):
    title: str
    sections: list[str]
    word_count: int
    tone: str


research_agent = Agent(
    name="Researcher",
    instructions="""You are a research analyst. Given a topic, produce
    structured research findings with sources and a confidence score
    between 0.0 and 1.0.""",
    model="gpt-4o",
    output_type=ResearchOutput,
)

draft_agent = Agent(
    name="Drafter",
    instructions="""You are a technical writer. Given research findings,
    produce a structured article draft with clear sections. Use a
    professional tone and target 800-1200 words.""",
    model="gpt-4o",
    output_type=DraftOutput,
)
Building the Pipeline Runner
The pipeline runner is a simple loop that feeds each agent's output into the next agent as input. The key design decision is how to format the handoff — you serialize the structured output into a string that the next agent can parse from its input message.
import asyncio
import json
from typing import Any

from agents import Agent, Runner


async def run_pipeline(agents: list[Agent], initial_input: str) -> Any:
    """Run a sequential pipeline of agents, passing each output as the next input."""
    current_input = initial_input
    output: Any = None
    for i, agent in enumerate(agents):
        print(f"Stage {i + 1}/{len(agents)}: Running {agent.name}")
        result = await Runner.run(agent, input=current_input)
        output = result.final_output
        # If the output is a Pydantic model, serialize it for the next agent
        if hasattr(output, "model_dump"):
            current_input = (
                f"Previous stage ({agent.name}) produced the following output:\n"
                f"{json.dumps(output.model_dump(), indent=2)}"
            )
        else:
            current_input = str(output)
        print(f"  Completed: {agent.name}")
    return output
Now running the full pipeline is a single function call:
async def main():
    pipeline = [research_agent, draft_agent]
    result = await run_pipeline(
        pipeline, "Write about sequential agent pipelines in production AI systems"
    )
    print(f"Title: {result.title}")
    print(f"Word count: {result.word_count}")
    for section in result.sections:
        print(f"  - {section[:80]}...")


asyncio.run(main())
Adding Error Handling and Retries
Production pipelines need to handle failures at any stage. If Agent B fails, you should not need to re-run Agent A (which may have cost significant tokens and time). A resilient pipeline runner captures intermediate results and supports retries per stage.
from dataclasses import dataclass, field
from typing import Any


@dataclass
class PipelineResult:
    stage_outputs: list[Any] = field(default_factory=list)
    failed_stage: int | None = None
    error: str | None = None
    completed: bool = False


async def run_pipeline_with_retries(
    agents: list[Agent],
    initial_input: str,
    max_retries: int = 2,
) -> PipelineResult:
    """Run a pipeline with per-stage retries and intermediate result capture."""
    pipeline_result = PipelineResult()
    current_input = initial_input
    for i, agent in enumerate(agents):
        success = False
        for attempt in range(max_retries + 1):
            try:
                result = await Runner.run(agent, input=current_input)
                output = result.final_output
                pipeline_result.stage_outputs.append(output)
                if hasattr(output, "model_dump"):
                    current_input = (
                        f"Previous stage ({agent.name}) produced:\n"
                        f"{json.dumps(output.model_dump(), indent=2)}"
                    )
                else:
                    current_input = str(output)
                success = True
                break
            except Exception as e:
                print(f"  Stage {i + 1} attempt {attempt + 1} failed: {e}")
                if attempt == max_retries:
                    pipeline_result.failed_stage = i
                    pipeline_result.error = str(e)
                    return pipeline_result
        if not success:
            break
    pipeline_result.completed = True
    return pipeline_result
Designing Effective Stage Boundaries
The hardest part of sequential chaining is deciding where to split the pipeline. Too many stages and you waste tokens re-explaining context at each handoff. Too few stages and you lose the benefits of single-responsibility agents. Here are guidelines that work in practice.
Split when the skill set changes. If one stage requires domain expertise (medical terminology, legal reasoning) and the next requires a different skill (plain-language writing, data formatting), those should be separate agents with different system prompts.
Split when you need a quality gate. If you want to validate output before proceeding — checking that research has enough sources, or that a draft meets length requirements — insert a validation stage. This agent can either approve the output or reject it with feedback.
Do not split purely for modularity. If two stages need the same context and the same skills, combining them into a single agent is usually better. The overhead of serializing and re-parsing context is not free.
Validation Stages: The Quality Gate Pattern
A powerful extension of sequential chaining is the validation stage — an agent whose only job is to check the previous stage's output and either pass it through or flag issues.
class ValidationResult(BaseModel):
    is_valid: bool
    issues: list[str]
    suggestions: list[str]


validator_agent = Agent(
    name="QualityValidator",
    instructions="""You are a quality assurance reviewer. Evaluate the
    draft article against these criteria:
    1. All claims are supported by the provided sources
    2. The tone is professional and consistent
    3. Technical accuracy is maintained
    4. The article has a clear structure with introduction and conclusion
    Return is_valid=True only if ALL criteria are met. Otherwise list
    the specific issues and actionable suggestions.""",
    model="gpt-4o",
    output_type=ValidationResult,
)
You can insert this validation agent between any two stages. If validation fails, you can either retry the previous stage with the feedback or escalate to a human reviewer.
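The retry-with-feedback loop can be sketched as a small generic helper. This is one possible wiring, not the SDK's own API: `produce` and `validate` are async callables standing in for `Runner.run` calls against your drafting and validator agents, and the stubbed demo below exists only so the control flow is visible without an API key.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class GateReport:
    is_valid: bool
    issues: list[str] = field(default_factory=list)


async def run_with_quality_gate(produce, validate, task: str, max_revisions: int = 2):
    """Run a producer stage, validate its output, and retry with feedback.

    `produce` and `validate` are async callables that stand in for
    Runner.run calls against a drafting agent and a validator agent.
    """
    feedback = ""
    for _ in range(max_revisions + 1):
        draft = await produce(task + feedback)
        report = await validate(draft)
        if report.is_valid:
            return draft
        # Feed the validator's issues back into the next attempt
        feedback = "\n\nReviewer feedback to address:\n" + "\n".join(report.issues)
    raise RuntimeError("Draft failed validation after all revisions")


# Stubbed demo: the first draft fails validation, the revised draft passes
async def fake_produce(prompt: str) -> str:
    return "draft v2" if "feedback" in prompt else "draft v1"


async def fake_validate(draft: str) -> GateReport:
    if draft == "draft v1":
        return GateReport(False, ["missing conclusion"])
    return GateReport(True)


result = asyncio.run(run_with_quality_gate(fake_produce, fake_validate, "write article"))
print(result)  # draft v2
```

The same helper works for human escalation: catch the `RuntimeError` and route the last draft plus accumulated feedback to a review queue.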
Performance Considerations
Sequential pipelines have an inherent latency cost: each stage must complete before the next begins. For a three-stage pipeline where each LLM call takes 3-5 seconds, total latency is 9-15 seconds. Strategies to mitigate this include using faster models (gpt-4o-mini) for simpler stages, caching stage outputs for repeated inputs, and running independent sub-tasks within a stage in parallel even though the stages themselves are sequential.
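Caching stage outputs is straightforward to sketch: key each result on the agent's name plus a hash of its input, so a repeated run over an identical input skips the model call entirely. This is a minimal in-memory version under that assumption; a production cache would add TTLs and persistence, and `run_fn` stands in for the actual `Runner.run` call.

```python
import asyncio
import hashlib

_stage_cache: dict[tuple[str, str], object] = {}


async def run_stage_cached(agent_name: str, input_text: str, run_fn):
    """Return a cached result for (agent, input) if present; otherwise call run_fn."""
    key = (agent_name, hashlib.sha256(input_text.encode()).hexdigest())
    if key in _stage_cache:
        return _stage_cache[key]
    result = await run_fn(input_text)
    _stage_cache[key] = result
    return result


# Demo with a stub that counts how often the "model" is actually called
calls = 0


async def fake_run(text: str) -> str:
    global calls
    calls += 1
    return text.upper()


async def demo():
    a = await run_stage_cached("Researcher", "same input", fake_run)
    b = await run_stage_cached("Researcher", "same input", fake_run)
    return a, b


a, b = asyncio.run(demo())
print(calls)  # 1 -- the second call was served from the cache
```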
The token cost is also worth monitoring. Each handoff adds tokens because the next agent receives the serialized output of the previous agent as part of its input. For large outputs, consider summarizing before handoff rather than passing the complete structured output.
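A lightweight alternative to a full summarization stage is to trim the serialized payload before handoff: keep only the fields the next agent needs and cap long lists. A sketch that operates on the dict returned by `model_dump()` (the field names below are illustrative, not part of the SDK):

```python
import json


def trim_handoff(payload: dict, keep: set[str], max_items: int = 5) -> str:
    """Serialize only the whitelisted fields, capping any list at max_items."""
    trimmed = {}
    for key, value in payload.items():
        if key not in keep:
            continue
        if isinstance(value, list):
            value = value[:max_items]
        trimmed[key] = value
    return json.dumps(trimmed, indent=2)


# Example: pass only the topic and the top findings to the drafting stage
research = {
    "topic": "sequential pipelines",
    "key_findings": [f"finding {i}" for i in range(20)],
    "sources": [f"https://example.com/{i}" for i in range(12)],
    "confidence_score": 0.9,
}
compact = trim_handoff(research, keep={"topic", "key_findings"}, max_items=3)
print(compact)
```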
When to Use Sequential Chaining vs Other Patterns
Use sequential chaining when: your workflow has a natural linear progression, each stage has clear input/output contracts, and you value debuggability and reliability over latency.
Use handoffs instead when: the workflow requires dynamic routing — the output of one stage determines which agent should handle the next step, not just what data it receives.
Use parallel execution when: multiple stages are independent and can produce results simultaneously, which are later combined by a synthesis agent.
Sequential chaining is the foundation. Master it first, then add complexity only when the problem demands it.
Written by
CallSphere Team
Expert insights on AI voice agents and customer communication automation.