Prompt Chaining: Breaking Complex Tasks into Sequential LLM Calls
Learn how to decompose complex AI tasks into sequential prompt chains — passing intermediate results between LLM calls, handling errors in pipelines, and building reliable multi-step workflows.
Why Single Prompts Are Not Enough
As tasks grow in complexity, single prompts become unreliable. Asking an LLM to simultaneously analyze data, generate a report, and format it as a structured document invites errors at every level. Prompt chaining solves this by decomposing complex tasks into a sequence of focused LLM calls, where each call handles one well-defined step and passes its output to the next.
This is analogous to Unix pipes — small, composable operations chained together to accomplish complex workflows.
Basic Chain Pattern
The simplest chain passes the output of one call as input to the next:
from openai import OpenAI

client = OpenAI()


def llm_call(system: str, user: str, model: str = "gpt-4o") -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    )
    return response.choices[0].message.content


def analyze_and_report(raw_data: str) -> dict:
    # Step 1: Extract key metrics
    metrics = llm_call(
        system="Extract numerical metrics from the data. Return as a bullet list of metric: value pairs.",
        user=raw_data,
    )
    # Step 2: Analyze trends
    analysis = llm_call(
        system="You are a data analyst. Analyze the metrics for trends, anomalies, and insights.",
        user=f"Metrics:\n{metrics}",
    )
    # Step 3: Generate executive summary
    summary = llm_call(
        system="Write a 3-sentence executive summary for a non-technical audience.",
        user=f"Analysis:\n{analysis}",
    )
    return {
        "metrics": metrics,
        "analysis": analysis,
        "summary": summary,
    }
Each step has a narrow, clearly defined task. The extraction step does not need to analyze. The analysis step does not need to format for executives. This separation produces better results at every stage.
Building a Chain Pipeline Class
For production systems, formalize chains with a pipeline abstraction:
from dataclasses import dataclass
from typing import Callable


@dataclass
class ChainStep:
    name: str
    system_prompt: str
    input_formatter: Callable[[dict], str]
    output_key: str
    model: str = "gpt-4o"


class PromptChain:
    def __init__(self, steps: list[ChainStep]):
        self.steps = steps
        self.client = OpenAI()

    def run(self, initial_input: str) -> dict:
        context = {"initial_input": initial_input}
        for step in self.steps:
            user_message = step.input_formatter(context)
            response = self.client.chat.completions.create(
                model=step.model,
                messages=[
                    {"role": "system", "content": step.system_prompt},
                    {"role": "user", "content": user_message},
                ],
            )
            result = response.choices[0].message.content
            context[step.output_key] = result
            print(f"[{step.name}] completed -> {len(result)} chars")
        return context


# Define a review pipeline
review_chain = PromptChain([
    ChainStep(
        name="extract_code",
        system_prompt="Extract all code blocks from the pull request description. Return only the code.",
        input_formatter=lambda ctx: ctx["initial_input"],
        output_key="code",
    ),
    ChainStep(
        name="find_issues",
        system_prompt="Review the code for bugs, security issues, and performance problems. List each issue.",
        input_formatter=lambda ctx: ctx["code"],
        output_key="issues",
    ),
    ChainStep(
        name="format_review",
        system_prompt="Format the code review issues as a GitHub review comment with severity labels.",
        input_formatter=lambda ctx: f"Issues found:\n{ctx['issues']}",
        output_key="review",
    ),
])

# pr_description holds the raw pull request text
results = review_chain.run(pr_description)
print(results["review"])
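Because every step writes into the same context dict under its output_key, a later step's formatter can combine several earlier outputs, not just the immediately preceding one. A hypothetical formatter for an extra step (not part of the pipeline above):

```python
# Hypothetical formatter: a later step that sees both the extracted code
# and the issue list. Each prior output lives in the context dict under
# its step's output_key, so any step can read any earlier result.
def summary_formatter(ctx: dict) -> str:
    return (
        f"Code under review:\n{ctx['code']}\n\n"
        f"Issues found:\n{ctx['issues']}"
    )
```

This cross-step access is also where subtle bugs hide: a renamed output_key silently breaks every formatter that reads it.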
Error Handling in Chains
A chain is only as strong as its weakest link. Build error handling into the pipeline:
import logging

logger = logging.getLogger(__name__)


class ResilientChain:
    def __init__(self, steps: list[ChainStep], max_retries: int = 2):
        self.steps = steps
        self.max_retries = max_retries
        self.client = OpenAI()

    def _execute_step(self, step: ChainStep, user_message: str) -> str:
        for attempt in range(self.max_retries + 1):
            try:
                response = self.client.chat.completions.create(
                    model=step.model,
                    messages=[
                        {"role": "system", "content": step.system_prompt},
                        {"role": "user", "content": user_message},
                    ],
                )
                result = response.choices[0].message.content
                if not result or not result.strip():
                    raise ValueError("Empty response from LLM")
                return result
            except Exception as e:
                logger.warning(
                    f"Step '{step.name}' attempt {attempt + 1} failed: {e}"
                )
                if attempt == self.max_retries:
                    raise RuntimeError(
                        f"Step '{step.name}' failed after {self.max_retries + 1} attempts"
                    ) from e

    def run(self, initial_input: str) -> dict:
        context = {"initial_input": initial_input}
        for i, step in enumerate(self.steps):
            try:
                user_message = step.input_formatter(context)
                context[step.output_key] = self._execute_step(step, user_message)
            except RuntimeError as e:
                logger.error(f"Chain failed at step {i} ({step.name}): {e}")
                context["error"] = str(e)
                context["failed_step"] = step.name
                break
        return context
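Because a failed run still returns the partial context rather than raising, the caller can detect the failure by inspecting the result. A small helper added here for illustration (it is not part of the class):

```python
# Illustrative helper: ResilientChain.run records an "error" key instead
# of raising, so success is simply the absence of that key.
def chain_succeeded(context: dict) -> bool:
    return "error" not in context


# Shape of a context returned after a mid-chain failure: earlier outputs
# are preserved, and "failed_step" names where the chain stopped.
partial = {
    "initial_input": "...",
    "code": "def f(): ...",
    "error": "Step 'find_issues' failed after 3 attempts",
    "failed_step": "find_issues",
}
```

Returning partial state is a deliberate trade-off: the caller keeps every completed step's output and can retry from the failed step instead of rerunning the whole chain.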
Conditional Branching
Not all chains are linear. Sometimes you need to branch based on intermediate results:
def classify_and_route(customer_message: str) -> str:
    # Step 1: Classify the intent
    intent = llm_call(
        system="Classify the customer message as: billing, technical, general, or urgent. Return only the category.",
        user=customer_message,
    ).strip().lower()
    # Step 2: Route to specialized prompt based on classification
    specialized_prompts = {
        "billing": "You are a billing specialist. Help resolve payment and subscription issues.",
        "technical": "You are a senior support engineer. Diagnose and solve technical problems.",
        "urgent": "You are an escalation handler. Acknowledge the urgency, gather details, and create a priority ticket.",
        "general": "You are a friendly support agent. Answer general questions about our product.",
    }
    system = specialized_prompts.get(intent, specialized_prompts["general"])
    # Step 3: Generate the response with the specialized persona
    return llm_call(system=system, user=customer_message)
This pattern — classify first, then route — is fundamental to building agentic systems. Each branch can use a different model, temperature, or even a different prompt chain.
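For instance, the routing table can carry per-branch model settings rather than just a persona. A sketch with illustrative settings (the model choices and temperatures here are assumptions, not recommendations):

```python
# Hypothetical per-branch configuration: a cheaper model and higher
# temperature for general chat, a stronger deterministic setup for
# technical triage.
BRANCH_CONFIG = {
    "billing":   {"model": "gpt-4o-mini", "temperature": 0.2},
    "technical": {"model": "gpt-4o",      "temperature": 0.0},
    "urgent":    {"model": "gpt-4o",      "temperature": 0.3},
    "general":   {"model": "gpt-4o-mini", "temperature": 0.7},
}


def config_for(intent: str) -> dict:
    # Unknown intents fall back to the general branch, mirroring the
    # specialized_prompts.get(...) fallback above.
    return BRANCH_CONFIG.get(intent, BRANCH_CONFIG["general"])
```

The returned dict can be passed straight through to the chat completion call for the chosen branch.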
FAQ
How many steps should a prompt chain have?
Keep chains to 2-5 steps. Each step adds latency and the risk of error compounding. If your chain has more than 5 steps, consider whether some steps can be combined or whether a single well-crafted prompt could replace part of the chain.
How do I debug a failing chain?
Log the full input and output of every step. When a chain produces bad results, inspect each step's output to find where quality degrades. Often the issue is in the input formatting between steps — the output of step N does not match what step N+1 expects.
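One low-effort way to get that visibility is to wrap each step's input_formatter so the exact user message is logged before the call. A sketch (the `logged_formatter` wrapper is ours, not part of the classes above):

```python
import logging

logger = logging.getLogger(__name__)


def logged_formatter(name: str, formatter):
    # Wraps an input_formatter so the exact user message a step receives
    # is logged (truncated to 500 chars) before the LLM call is made.
    def wrapped(ctx: dict) -> str:
        message = formatter(ctx)
        logger.debug("[%s] input (%d chars): %.500s", name, len(message), message)
        return message
    return wrapped
```

Use it when defining a step, e.g. `input_formatter=logged_formatter("find_issues", lambda ctx: ctx["code"])`, and the mismatch between steps shows up directly in the logs.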
Is prompt chaining the same as using agents with tools?
No. Prompt chaining is a predefined sequence of calls that you design. Agent tool use is dynamic — the model decides at runtime which tools to call and in what order. Chains are simpler, more predictable, and easier to debug. Use chains when the workflow is known; use agents when the workflow must be discovered.
#PromptChaining #PipelineDesign #LLMOrchestration #PromptEngineering #Python #AgenticAI #LearnAI #AIEngineering
Written by
CallSphere Team
Expert insights on AI voice agents and customer communication automation.