By Sagar Shankaran, Founder of CallSphere
When an AI agent calls a tool, the database update and the message publish must be atomic. The outbox pattern is the only correct answer — and it's essential when those tools are CRM updates and bookings.
Key takeaways
TL;DR — When an AI tool call writes to the DB and publishes an event, you have a dual-write problem. The DB succeeds, the publish fails — or vice versa — and your system goes inconsistent. The transactional outbox pattern is the textbook fix: write the event into an OUTBOX table inside the same transaction, then a separate process drains the outbox to the message bus.
In CallSphere, when the booking agent confirms an appointment, two things must happen: the booking row is upserted in Postgres, and a booking.confirmed event is published to NATS so the SMS sender, the calendar sync, and the analytics pipeline pick it up. Naive code does both in the application — and the dual-write fails about 1 in 10,000 transactions, which at our volume is several lost events per day.
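To make the failure mode concrete, here is a minimal sketch of the naive dual write. It is only an illustration: `FlakyBus` and `naive_confirm_booking` are hypothetical names, a plain dict stands in for Postgres, and the bus is a stub that can be told to fail.

```python
class FlakyBus:
    """Hypothetical stand-in for a message bus whose publish can fail."""

    def __init__(self, fail: bool):
        self.fail = fail
        self.published = []

    def publish(self, subject: str, payload: dict) -> None:
        if self.fail:
            raise ConnectionError("bus unavailable")
        self.published.append((subject, payload))


def naive_confirm_booking(db: dict, bus: FlakyBus, call_id: str, slot: str) -> None:
    # Step 1: the "database" write takes effect on its own.
    db[call_id] = slot
    # Step 2: the publish is a separate operation. If it raises,
    # nothing undoes step 1.
    bus.publish("booking.confirmed", {"callId": call_id, "slot": slot})


db = {}
bus = FlakyBus(fail=True)
try:
    naive_confirm_booking(db, bus, "call-1", "2026-01-05T10:00:00")
except ConnectionError:
    pass

# The booking exists, but no consumer will ever hear about it.
print(db, bus.published)
```

Swapping the two steps does not help: the publish can then succeed while the write fails, which leaves an event with no row behind it.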
The fix: write the event to an outbox table in the same DB transaction. A separate worker (or Debezium) tails the outbox and publishes to NATS/Kafka. The DB and the bus stay in sync — at-least-once, never zero.
flowchart LR
Tool[AI tool call] -->|BEGIN| DB[(Postgres)]
Tool -->|INSERT booking| DB
Tool -->|INSERT outbox event| DB
Tool -->|COMMIT| DB
DB -.WAL.- Drain[Outbox drain<br/>or Debezium]
Drain -->|publish| Bus[(NATS/Kafka)]
Bus --> SMS[SMS sender]
Bus --> Cal[Calendar sync]
Bus --> An[Analytics]
Two implementations: a polling drain that selects unsent rows and marks them sent, or CDC (Debezium tailing the WAL — see post #8) which is faster and lower-overhead.
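The polling variant can be sketched as follows. This is an assumption-laden illustration, not CallSphere's drainer: `sqlite3` stands in for Postgres so the example runs on its own, the table is a trimmed version of the outbox schema, and `make_db`, `drain_once`, and `flaky_publish` are hypothetical names.

```python
import json
import sqlite3
import uuid


def make_db() -> sqlite3.Connection:
    # In-memory SQLite as a stand-in for Postgres (assumption for this sketch).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE outbox (id TEXT PRIMARY KEY, event_type TEXT NOT NULL, "
        "payload TEXT NOT NULL, "
        "created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP, sent_at TEXT)"
    )
    return conn


def drain_once(conn, publish, batch_size=100) -> int:
    """Publish unsent rows oldest-first; mark a row sent only after publish succeeds."""
    rows = conn.execute(
        "SELECT id, event_type, payload FROM outbox "
        "WHERE sent_at IS NULL ORDER BY created_at, rowid LIMIT ?",
        (batch_size,),
    ).fetchall()
    sent = 0
    for event_id, event_type, payload in rows:
        try:
            publish(event_type, json.loads(payload))
        except Exception:
            break  # leave this row and the rest unsent; the next poll retries
        with conn:  # commit the sent marker
            conn.execute(
                "UPDATE outbox SET sent_at = CURRENT_TIMESTAMP WHERE id = ?",
                (event_id,),
            )
        sent += 1
    return sent


conn = make_db()
with conn:
    for n in range(3):
        conn.execute(
            "INSERT INTO outbox (id, event_type, payload) VALUES (?, ?, ?)",
            (str(uuid.uuid4()), "booking.confirmed.v1", json.dumps({"n": n})),
        )

delivered = []
calls = {"count": 0}


def flaky_publish(subject, payload):
    # Fails on the second call only, to simulate a transient bus outage.
    calls["count"] += 1
    if calls["count"] == 2:
        raise ConnectionError("bus unavailable")
    delivered.append(payload["n"])


first = drain_once(conn, flaky_publish)   # publishes event 0, fails on event 1
second = drain_once(conn, flaky_publish)  # retries event 1, then publishes event 2
print(first, second, delivered)
```

A crash between `publish` and the `UPDATE` would republish the row on the next poll, which is where the at-least-once guarantee comes from. With several drainers on Postgres, the `SELECT` would typically also need row locking such as `FOR UPDATE SKIP LOCKED` so that two workers do not pick up the same rows.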
CallSphere uses the outbox pattern for every AI tool call that has a downstream side effect: bookings, callbacks, CRM upserts, escalations. Real Estate OneRoof's booking agent writes booking + outbox in one transaction; a Debezium connector drains the outbox to NATS within ~50 ms p95. After-hours uses a Bull/Redis queue for delayed retries: a different mechanism, same principle (durable intent before publish).
Outbox table: (id uuid pk, aggregate_id, event_type, payload jsonb, created_at, sent_at).
Drain: SELECT ... WHERE sent_at IS NULL every 100 ms, or use Debezium.
Cleanup: rows with sent_at < now() - 7 days.
Lag check: rows with sent_at IS NULL AND created_at < now() - 5s.
CREATE TABLE outbox (
    id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    aggregate_type text NOT NULL,
    aggregate_id text NOT NULL,
    event_type text NOT NULL,
    payload jsonb NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now(),
    sent_at timestamptz
);
CREATE INDEX outbox_unsent ON outbox (created_at) WHERE sent_at IS NULL;
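The conditions `sent_at < now() - 7 days` and `sent_at IS NULL AND created_at < now() - 5s` listed above read like a retention rule and a lag check. Under that assumption, a minimal sketch of both jobs might look like this. `sqlite3` again stands in for Postgres, timestamps are stored as ISO strings, and `stuck_count` and `purge_sent` are hypothetical helper names.

```python
import sqlite3
from datetime import datetime, timedelta


def iso(ts: datetime) -> str:
    # Fixed-width ISO strings compare correctly as plain text.
    return ts.strftime("%Y-%m-%dT%H:%M:%S")


def stuck_count(conn, now: datetime, max_lag=timedelta(seconds=5)) -> int:
    """Count events that are still unsent after max_lag (a candidate alert signal)."""
    cutoff = iso(now - max_lag)
    (n,) = conn.execute(
        "SELECT count(*) FROM outbox WHERE sent_at IS NULL AND created_at < ?",
        (cutoff,),
    ).fetchone()
    return n


def purge_sent(conn, now: datetime, keep=timedelta(days=7)) -> int:
    """Delete rows that were sent more than `keep` ago; unsent rows are never touched."""
    cutoff = iso(now - keep)
    with conn:
        cur = conn.execute("DELETE FROM outbox WHERE sent_at < ?", (cutoff,))
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE outbox (id TEXT PRIMARY KEY, created_at TEXT NOT NULL, sent_at TEXT)"
)
now = datetime(2026, 1, 10, 12, 0, 0)
rows = [
    ("old-sent", now - timedelta(days=8), now - timedelta(days=8)),       # purged
    ("recent-sent", now - timedelta(hours=1), now - timedelta(hours=1)),  # kept
    ("stuck", now - timedelta(seconds=30), None),                         # over the lag budget
    ("fresh", now - timedelta(seconds=1), None),                          # within the lag budget
]
with conn:
    conn.executemany(
        "INSERT INTO outbox VALUES (?, ?, ?)",
        [(i, iso(c), iso(s) if s else None) for i, c, s in rows],
    )

stuck = stuck_count(conn, now)
purged = purge_sent(conn, now)
print(stuck, purged)
```

The partial index `outbox_unsent` on `created_at WHERE sent_at IS NULL` is the kind of index that keeps the lag query cheap as the table grows.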
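import json
from datetime import datetime

# `db` is assumed to be an asyncpg-style connection: transaction() is an async
# context manager and execute() binds $1, $2, ... positionally.
async def confirm_booking(call_id: str, slot: datetime):
    # Both inserts share one transaction: they commit together or not at all.
    async with db.transaction():
        await db.execute(
            "INSERT INTO bookings (call_id, slot) VALUES ($1, $2)",
            call_id, slot,
        )
        # The event is stored as a row; the drain publishes it after commit.
        await db.execute(
            """INSERT INTO outbox (aggregate_type, aggregate_id, event_type, payload)
               VALUES ('booking', $1, 'booking.confirmed.v1', $2)""",
            call_id, json.dumps({"callId": call_id, "slot": slot.isoformat()}),
        )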
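The point of the function above is that a failure between the two inserts leaves neither row behind. The following sketch checks that property under stated assumptions: it is a synchronous variant on `sqlite3` rather than the async Postgres code, and `confirm_booking_sync` and its `fail_before_outbox` switch are hypothetical, added only to simulate a crash.

```python
import json
import sqlite3
import uuid


def confirm_booking_sync(conn, call_id: str, slot: str, fail_before_outbox=False):
    with conn:  # commits on success, rolls back if the block raises
        conn.execute(
            "INSERT INTO bookings (call_id, slot) VALUES (?, ?)", (call_id, slot)
        )
        if fail_before_outbox:
            raise RuntimeError("simulated crash between the two writes")
        conn.execute(
            "INSERT INTO outbox (id, aggregate_type, aggregate_id, event_type, payload) "
            "VALUES (?, 'booking', ?, 'booking.confirmed.v1', ?)",
            (str(uuid.uuid4()), call_id, json.dumps({"callId": call_id, "slot": slot})),
        )


def counts(conn):
    bookings = conn.execute("SELECT count(*) FROM bookings").fetchone()[0]
    events = conn.execute("SELECT count(*) FROM outbox").fetchone()[0]
    return bookings, events


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookings (call_id TEXT PRIMARY KEY, slot TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE outbox (id TEXT PRIMARY KEY, aggregate_type TEXT NOT NULL, "
    "aggregate_id TEXT NOT NULL, event_type TEXT NOT NULL, payload TEXT NOT NULL, "
    "sent_at TEXT)"
)

try:
    confirm_booking_sync(conn, "call-1", "2026-01-05T10:00:00", fail_before_outbox=True)
except RuntimeError:
    pass
after_failure = counts(conn)  # the booking insert was rolled back with the transaction

confirm_booking_sync(conn, "call-1", "2026-01-05T10:00:00")
after_success = counts(conn)  # booking and event committed together
print(after_failure, after_success)
```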
Polling vs CDC? CDC (Debezium) is lower-latency and lower-DB-load; polling is simpler. Start with polling, graduate to CDC at scale.
Why not just publish first? Then the publish succeeds and the DB write fails — phantom event, no row.
Does this give exactly-once? No — at-least-once. Pair with idempotency keys (post #14).
Where does CallSphere put the drainer? A small Go service per pod, or a shared Debezium cluster. Both are in production today.
What's the latency? Polling: 100-500 ms. Debezium: 50-100 ms p95.
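On the at-least-once answer above: because the drain can deliver the same event twice, each consumer needs to tolerate duplicates. One way to sketch that is to deduplicate on the outbox row `id`. Treating that `id` as the idempotency key is an assumption here, `SmsSender` is a hypothetical consumer, and the in-memory set stands in for what would need to be a durable store in production.

```python
class SmsSender:
    """Hypothetical consumer that ignores events it has already handled."""

    def __init__(self):
        self.seen_event_ids = set()  # in-memory stand-in for a durable dedupe store
        self.messages_sent = []

    def handle(self, event_id: str, payload: dict) -> bool:
        """Return True if the event was processed, False if it was a duplicate."""
        if event_id in self.seen_event_ids:
            return False
        self.messages_sent.append(f"Booking confirmed for {payload['slot']}")
        self.seen_event_ids.add(event_id)
        return True


sender = SmsSender()
event = ("evt-123", {"callId": "call-1", "slot": "2026-01-05T10:00:00"})

first_delivery = sender.handle(*event)  # processed
redelivery = sender.handle(*event)      # same event id again: skipped
print(first_delivery, redelivery, len(sender.messages_sent))
```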
Written by
Sagar Shankaran · Founder, CallSphere
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.