Saga Pattern for Multi-Step AI Workflows: Orchestration Beats Choreography in 2026
Multi-step AI workflows (book a viewing, charge the card, send a confirmation, sync the calendar) can fail at step 3, and then you need to compensate steps 1 and 2. The saga pattern is the answer; orchestration with Temporal is the 2026 default.
TL;DR — A multi-step AI workflow that touches three services has eight failure modes. The saga pattern decomposes the workflow into local transactions with compensating actions, and in 2026 the dominant flavor is orchestration (Temporal, Step Functions) over choreography because debugging a centralized state machine beats debugging a graph of event listeners.
The pattern
CallSphere booking workflow: the agent books a slot → charges the card → sends a confirmation SMS → syncs Google Calendar. Step 4 fails; what happens to steps 1-3? Without a saga, the card is charged, the SMS is sent, there is no calendar entry, and the customer is angry. With a saga, each step has a compensating action, and the orchestrator runs the compensations in reverse order on failure.
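The reverse-order compensation described above can be sketched in plain Python. This is a minimal illustration, not CallSphere's code: `run_saga`, the `Step` tuple, and the no-op actions are hypothetical names, and a real orchestrator would also persist its progress.

```python
from typing import Callable, List, Tuple

# (name, forward action, compensating action); all three are illustrative.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step], log: List[str]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception:
            log.append(f"failed:{name}")
            # Undo only what actually completed, most recent first.
            for comp_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"comp:{comp_name}")
            return False
        log.append(f"done:{name}")
        completed.append(step)
    return True


def noop() -> None:
    pass


def calendar_down() -> None:
    raise RuntimeError("calendar API unavailable")


saga_log: List[str] = []
saga_ok = run_saga(
    [
        ("book_slot", noop, noop),
        ("charge_card", noop, noop),
        ("send_sms", noop, noop),
        ("sync_calendar", calendar_down, noop),
    ],
    saga_log,
)
```

With step 4 failing, the log shows three completed steps followed by compensations for `send_sms`, `charge_card`, and `book_slot`, in that order.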
How it works (architecture)
flowchart LR
Trigger[AI agent] --> Orch[Saga orchestrator]
Orch --> S1[1 Book slot]
S1 --> S2[2 Charge card]
S2 --> S3[3 Send SMS]
S3 --> S4[4 Sync calendar]
S4 -.fail.-> C3[Comp 3: SMS apology]
C3 --> C2[Comp 2: Refund card]
C2 --> C1[Comp 1: Release slot]
S4 --> Done[Done]
Each forward step has a compensation. The orchestrator (Temporal workflow, AWS Step Functions state machine, LittleHorse) tracks state durably and replays on crash.
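To make "tracks state durably and replays on crash" concrete, here is a toy sketch of the idea: each step's result is journaled, and a restarted run reuses journaled results instead of executing the step again. The `JournaledRun` class, the in-memory `journal` dict (standing in for durable storage), and the step names are all assumptions for illustration; real engines such as Temporal record a much richer event history.

```python
from typing import Callable, Dict, List


class JournaledRun:
    """Toy orchestrator state: a journal of step name -> recorded result."""

    def __init__(self, journal: Dict[str, str]) -> None:
        self.journal = journal  # stands in for durable storage

    def step(self, name: str, fn: Callable[[], str]) -> str:
        if name in self.journal:
            return self.journal[name]  # replay: reuse the recorded result
        result = fn()
        self.journal[name] = result  # record before moving on
        return result


side_effects: List[str] = []  # every real execution of a step lands here


def effect(name: str, result: str) -> Callable[[], str]:
    def fn() -> str:
        side_effects.append(name)
        return result
    return fn


def booking_workflow(run: JournaledRun, crash_before_sms: bool) -> str:
    booking_id = run.step("book_slot", effect("book_slot", "bk_1"))
    charge_id = run.step("charge_card", effect("charge_card", "ch_1"))
    if crash_before_sms:
        raise RuntimeError("worker crashed")  # simulated crash
    run.step("send_sms", effect("send_sms", "sent"))
    return f"{booking_id}:{charge_id}"


journal: Dict[str, str] = {}
try:
    booking_workflow(JournaledRun(journal), crash_before_sms=True)
except RuntimeError:
    pass  # the journal survives the crash

# A fresh run over the same journal skips the two finished steps.
resumed_result = booking_workflow(JournaledRun(journal), crash_before_sms=False)
```

After the resume, `book_slot` and `charge_card` have each executed exactly once, which is why the workflow code itself has to be deterministic: it must ask for the same steps in the same order on every replay.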
CallSphere implementation
CallSphere uses Temporal for the Real Estate OneRoof booking saga (5 steps, 4 services, ~3 minute median). The Temporal workflow lives in a sidecar container next to the agent. After-hours uses a simpler Bull/Redis chain because the work is always 2 steps and reversible.
Build steps with code
- Pick orchestration unless your saga is exactly 2 steps.
- Run Temporal (self-hosted or Cloud) with at least 3 worker replicas.
- Define the workflow as code: workflow code must be deterministic, while activities (where the side effects live) need not be.
- Each activity has a compensation activity.
- Attach an idempotency key to each activity (post #14), because Temporal will retry.
- Set an activity retry policy: exponential backoff, max 5 attempts.
- Use signals + queries for human-in-the-loop steps.
from datetime import timedelta

from temporalio import activity, workflow
from temporalio.common import RetryPolicy

# Without a maximum, Temporal retries a failing activity indefinitely and the
# except blocks below would never run. Cap attempts so compensation can start.
RETRY = RetryPolicy(backoff_coefficient=2.0, maximum_attempts=5)


# Forward activities and their compensations (bodies elided).
@activity.defn
async def book_slot(call_id: str, slot: str) -> str: ...

@activity.defn
async def release_slot(booking_id: str) -> None: ...

@activity.defn
async def charge_card(call_id: str, amount: int) -> str: ...

@activity.defn
async def refund_card(charge_id: str) -> None: ...

@activity.defn
async def send_sms(call_id: str, body: str) -> None: ...

@activity.defn
async def sync_calendar(booking_id: str) -> None: ...


@workflow.defn
class BookingSaga:
    @workflow.run
    async def run(self, call_id: str, slot: str, amount: int) -> str:
        booking_id = await workflow.execute_activity(
            book_slot, args=[call_id, slot],
            start_to_close_timeout=timedelta(seconds=30), retry_policy=RETRY,
        )
        try:
            charge_id = await workflow.execute_activity(
                charge_card, args=[call_id, amount],
                start_to_close_timeout=timedelta(seconds=30), retry_policy=RETRY)
            try:
                await workflow.execute_activity(
                    send_sms, args=[call_id, "Confirmed"],
                    start_to_close_timeout=timedelta(seconds=10), retry_policy=RETRY)
                try:
                    await workflow.execute_activity(
                        sync_calendar, args=[booking_id],
                        start_to_close_timeout=timedelta(seconds=30), retry_policy=RETRY)
                    return booking_id
                except Exception:
                    # Comp 3: the confirmation SMS already went out, so apologize.
                    await workflow.execute_activity(
                        send_sms, args=[call_id, "Apology"],
                        start_to_close_timeout=timedelta(seconds=10), retry_policy=RETRY)
                    raise
            except Exception:
                # Comp 2: the card was charged, so refund it.
                await workflow.execute_activity(
                    refund_card, args=[charge_id],
                    start_to_close_timeout=timedelta(seconds=30), retry_policy=RETRY)
                raise
        except Exception:
            # Comp 1: the slot was booked, so release it.
            await workflow.execute_activity(
                release_slot, args=[booking_id],
                start_to_close_timeout=timedelta(seconds=30), retry_policy=RETRY)
            raise
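The "exponential, max 5" retry policy from the build steps implies a concrete wait schedule. The sketch below computes it in plain Python, assuming "max 5" means five attempts in total and assuming a 1-second initial interval, a coefficient of 2, and a 100-second cap (these values are illustrative, not from the post). The parameter names loosely mirror the fields of Temporal's `RetryPolicy`; `retry_delays` itself is a hypothetical helper.

```python
from datetime import timedelta
from typing import List


def retry_delays(
    initial_interval: timedelta,
    backoff_coefficient: float,
    maximum_attempts: int,
    maximum_interval: timedelta,
) -> List[timedelta]:
    """Waits between attempts. Attempt 1 runs immediately, so N attempts
    produce N - 1 waits, each capped at maximum_interval."""
    delays: List[timedelta] = []
    for n in range(maximum_attempts - 1):
        wait = initial_interval * (backoff_coefficient ** n)
        delays.append(min(wait, maximum_interval))
    return delays


schedule = retry_delays(
    initial_interval=timedelta(seconds=1),
    backoff_coefficient=2.0,
    maximum_attempts=5,
    maximum_interval=timedelta(seconds=100),
)
total_wait = sum(schedule, timedelta())
```

Under these assumptions the waits are 1, 2, 4, and 8 seconds, so an activity that keeps failing gives up after roughly 15 seconds of backoff plus its own run time, and only then does compensation begin.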
Common pitfalls
- Choreography for >3 steps — every team owns part of the saga, no one owns the whole; debugging is misery.
- Compensations that aren't idempotent — retry storms double-refund.
- Skipping the timeout — activities hang; workflow stuck forever.
- Using a saga where a plain database transaction would do: if both services are yours and share the same DB, just use a transaction.
- No human-in-the-loop affordance — real workflows need pauses; Temporal signals handle this.
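The double-refund pitfall above is usually avoided by keying each compensation on an idempotency key. A minimal in-memory sketch follows; `RefundLedger` and the key format `refund:<charge_id>` are hypothetical, and a production version would keep the processed keys in the same durable store as the refund record.

```python
from typing import Dict


class RefundLedger:
    """Applies each refund at most once per idempotency key."""

    def __init__(self) -> None:
        self._processed: Dict[str, int] = {}  # idempotency key -> amount
        self.total_refunded = 0

    def refund(self, idempotency_key: str, amount: int) -> bool:
        """Return True if money moved, False if the key was already handled."""
        if idempotency_key in self._processed:
            return False  # a retry of a refund that already happened
        self._processed[idempotency_key] = amount
        self.total_refunded += amount
        return True


ledger = RefundLedger()
key = "refund:ch_1"  # derived from the charge, so every retry reuses it
first = ledger.refund(key, 5000)
retry_one = ledger.refund(key, 5000)  # simulated retry after a timeout
retry_two = ledger.refund(key, 5000)
```

Deriving the key from the charge rather than from the attempt is the design choice that matters: every retry of the same compensation presents the same key, so only the first call moves money.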
FAQ
Orchestration vs choreography? Orchestration for >3 steps, choreography for tightly bounded contexts.
Temporal vs Step Functions? Temporal is portable and code-first. Step Functions is AWS-locked but operationally simple.
What about LangGraph for agents? LangGraph orchestrates the model; Temporal orchestrates the side-effects. Often both.
Does CallSphere expose sagas to customers? Indirectly; they show up as multi-step bookings. See /pricing or /demo.
How do compensations interact with the outbox? Each activity uses outbox + idempotency; the saga ensures correct ordering.