---
title: "Event Sourcing for AI Agents: Replay a Conversation, Re-Plan a Decision, Audit a Refund"
description: "Storing the agent's state mutations as immutable events lets you replay any conversation, A/B-test a new prompt against historical traffic, and prove to a regulator exactly what the agent saw and said."
canonical: https://callsphere.ai/blog/vw4c-event-sourcing-ai-agent-conversation-replay
category: "AI Engineering"
tags: ["Event Sourcing", "CQRS", "AI Replay", "Audit", "A/B Testing"]
author: "CallSphere Team"
published: 2026-04-14T00:00:00.000Z
updated: 2026-05-07T16:13:28.288Z
---

# Event Sourcing for AI Agents: Replay a Conversation, Re-Plan a Decision, Audit a Refund

> Storing the agent's state mutations as immutable events lets you replay any conversation, A/B-test a new prompt against historical traffic, and prove to a regulator exactly what the agent saw and said.

> **TL;DR** — Store every agent input and output as an immutable event in an append-only log. The current state is a fold over the events. Benefits: replay a conversation against a new prompt, full audit trail for HIPAA/SOC2, time-travel debugging, and A/B testing a new model on real historical traffic without touching production.

## The pattern

A traditional CRUD agent stores "the current customer record". An event-sourced agent stores "every event that has ever happened". The current record is derived by replaying the events. This is more storage and more code, but you gain superpowers: replay, audit, time-travel, and A/B against history.
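
To make the contrast concrete, here is a minimal sketch of deriving a record by folding events, and of time travel by folding only a prefix of the log. The event names (`customer.created`, `customer.email_changed`, `customer.refund_issued`), the `seq` field, and the helper names are assumptions made for illustration, not a schema from this article.

```python
from functools import reduce

# Hypothetical events for one customer, oldest first. Names and fields are
# illustrative assumptions only.
EVENTS = [
    {"seq": 1, "type": "customer.created",
     "data": {"name": "Ada", "email": "ada@old.example"}},
    {"seq": 2, "type": "customer.email_changed",
     "data": {"email": "ada@new.example"}},
    {"seq": 3, "type": "customer.refund_issued",
     "data": {"amount_cents": 4200}},
]


def apply_event(state: dict, event: dict) -> dict:
    """Return a new state with one event applied; the input is not mutated."""
    kind, data = event["type"], event["data"]
    if kind == "customer.created":
        return {**state, **data, "refunds_cents": 0}
    if kind == "customer.email_changed":
        return {**state, "email": data["email"]}
    if kind == "customer.refund_issued":
        return {**state, "refunds_cents": state["refunds_cents"] + data["amount_cents"]}
    return state  # unknown event types are ignored


def state_at(events: list[dict], up_to_seq: int) -> dict:
    """Fold only the events with seq <= up_to_seq (time travel)."""
    return reduce(apply_event, (e for e in events if e["seq"] <= up_to_seq), {})


current = state_at(EVENTS, up_to_seq=3)        # the "current record"
before_change = state_at(EVENTS, up_to_seq=1)  # the record as it was earlier
```

A CRUD table would only hold `current`; the event log can also answer what the email was before the change.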

## How it works (architecture)

```mermaid
flowchart LR
  Caller --> Agent[AI agent]
  Agent -->|append| ES[("Event store<br/>EventStoreDB / Postgres")]
  ES -->|projection| Read1[(Customer view)]
  ES -->|projection| Read2[(Conversation view)]
  ES -. replay .-> Replay[Replay engine]
  Replay -->|fold events| New[New prompt + model]
  Replay -->|compare| Diff[A/B diff]
```

Each event is a tuple of `(aggregateId, eventType, payload, version, timestamp)`. Projections, the read models in CQRS terms, are derived from the log and can be rebuilt from it at any time. To A/B test a new prompt, replay past conversations through it and diff the new responses against the recorded ones.
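
As a rough illustration of that envelope, the sketch below models an event as a frozen dataclass, appends with an expected-version check, and derives one projection. `InMemoryEventStore`, `ConcurrencyError`, and `turns_per_conversation` are hypothetical names invented for this example, and the in-memory store is only a stand-in for EventStoreDB or Postgres.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class Event:
    aggregate_id: str
    event_type: str
    payload: dict[str, Any]  # treat as read-only; frozen does not deep-freeze it
    version: int             # position in the aggregate's stream, starting at 1
    timestamp: datetime


class ConcurrencyError(Exception):
    """Raised when a writer appends against a stale stream version."""


class InMemoryEventStore:
    """Illustrative stand-in for a real event store."""

    def __init__(self) -> None:
        self._streams: dict[str, list[Event]] = {}

    def append(self, aggregate_id: str, event_type: str,
               payload: dict[str, Any], expected_version: int) -> Event:
        stream = self._streams.setdefault(aggregate_id, [])
        if len(stream) != expected_version:
            raise ConcurrencyError(
                f"expected version {expected_version}, stream is at {len(stream)}"
            )
        event = Event(aggregate_id, event_type, payload,
                      expected_version + 1, datetime.now(timezone.utc))
        stream.append(event)
        return event

    def all_events(self) -> list[Event]:
        return [e for stream in self._streams.values() for e in stream]


def turns_per_conversation(events: list[Event]) -> dict[str, int]:
    """A projection: count utterance events per aggregate."""
    view: dict[str, int] = {}
    for e in events:
        if e.event_type.endswith(".utterance.v1"):
            view[e.aggregate_id] = view.get(e.aggregate_id, 0) + 1
    return view


store = InMemoryEventStore()
store.append("call-1", "user.utterance.v1", {"text": "Hi"}, expected_version=0)
store.append("call-1", "agent.utterance.v1", {"text": "Hello"}, expected_version=1)
view = turns_per_conversation(store.all_events())
```

The expected-version check is one way to use the `version` field: two writers that both read the stream at version 1 cannot both append version 2.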

## CallSphere implementation

CallSphere event-sources every agent turn for [Real Estate OneRoof](/industries/real-estate), Healthcare, and Sales because replay for regulators and HIPAA audits is a contractual requirement. After-hours and Salon use a simpler journal model. The append-only log lives in Postgres with a partition per month and a Kafka projection for downstream views. We A/B test new prompts by replaying yesterday's traffic through the candidate prompt and diffing tool calls. 37 agents · 90+ tools · 115+ DB tables · 6 verticals · pricing $149/$499/$1499 · [14-day trial](/trial) · [22% affiliate](/affiliate). [/pricing](/pricing) · [/demo](/demo).

## Build steps with code

1. **Pick a store**: EventStoreDB, Axon, or Postgres with append-only constraints.
2. **Define event types** — naming and versioning matter forever.
3. **Append on every state change** — DB row, tool call, model decision, redaction.
4. **Build projections** for read views (customer, call, conversation).
5. **Snapshot every N events** so replay isn't O(n) from time zero.
6. **Migrate event schemas** with versioning, never with mutation.
7. **Replay engine** to test new prompts against historical events.
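
Steps 2 and 6 are often handled with upcasting: stored events are never rewritten, and older versions are converted to the newest shape when they are read. The sketch below assumes the `name.vN` event-type convention used elsewhere in this post; the `user.utterance.v2` type, its `language` field, and the `UPCASTERS` registry are hypothetical.

```python
from typing import Callable

Upcaster = Callable[[dict], dict]


def _utterance_v1_to_v2(payload: dict) -> dict:
    # Hypothetical change: v2 adds a language field. Older v1 events are
    # assumed to be English for the purpose of this example.
    return {**payload, "language": "en"}


# Maps an old event type to (new event type, payload converter).
UPCASTERS: dict[str, tuple[str, Upcaster]] = {
    "user.utterance.v1": ("user.utterance.v2", _utterance_v1_to_v2),
}


def upcast(event: dict) -> dict:
    """Return the newest known version of an event, leaving the stored one intact."""
    current = event
    while current["event_type"] in UPCASTERS:
        new_type, convert = UPCASTERS[current["event_type"]]
        current = {
            **current,
            "event_type": new_type,
            "payload": convert(current["payload"]),
        }
    return current


stored = {"event_type": "user.utterance.v1", "payload": {"text": "Hi"}}
upgraded = upcast(stored)
```

Folds and projections then only need to understand the latest version of each event, while the log keeps exactly what was written at the time.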

```sql
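CREATE TABLE events (
  id bigserial,
  aggregate_id uuid NOT NULL,
  aggregate_type text NOT NULL,
  event_type text NOT NULL,
  event_version int NOT NULL,
  payload jsonb NOT NULL,
  created_at timestamptz NOT NULL DEFAULT now(),
  -- On a partitioned table, every PRIMARY KEY / UNIQUE constraint
  -- must include the partition key (created_at).
  PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);

-- One partition per month; inserts fail until a matching partition exists.
CREATE TABLE events_2026_04 PARTITION OF events
  FOR VALUES FROM ('2026-04-01') TO ('2026-05-01');

-- Read one aggregate's events in append order.
CREATE INDEX events_agg ON events (aggregate_id, id);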
```
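
The DDL above does not by itself enforce the "append-only constraints" from step 1. One way to do that is with triggers that reject `UPDATE` and `DELETE`. The sketch below uses Python's built-in `sqlite3` module as a stand-in so it runs without a server; the simplified table, the trigger names, and the helper functions are assumptions for illustration. In Postgres the same idea would be expressed with a trigger function or by revoking `UPDATE` and `DELETE` privileges on the table.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    aggregate_id TEXT NOT NULL,
    event_type TEXT NOT NULL,
    payload TEXT NOT NULL
);
CREATE TRIGGER events_no_update BEFORE UPDATE ON events
BEGIN
    SELECT RAISE(ABORT, 'events are append-only');
END;
CREATE TRIGGER events_no_delete BEFORE DELETE ON events
BEGIN
    SELECT RAISE(ABORT, 'events are append-only');
END;
""")


def append_event(aggregate_id: str, event_type: str, payload: dict) -> int:
    """Insert one event and return its id."""
    cur = conn.execute(
        "INSERT INTO events (aggregate_id, event_type, payload) VALUES (?, ?, ?)",
        (aggregate_id, event_type, json.dumps(payload)),
    )
    conn.commit()
    return cur.lastrowid


def is_rejected(sql: str) -> bool:
    """Return True if the statement is blocked by the append-only triggers."""
    try:
        conn.execute(sql)
    except sqlite3.DatabaseError:
        conn.rollback()
        return True
    conn.rollback()  # do not keep a mutation that unexpectedly succeeded
    return False


first_id = append_event("call-1", "user.utterance.v1", {"text": "Hi"})
update_blocked = is_rejected("UPDATE events SET payload = '{}' WHERE id = 1")
delete_blocked = is_rejected("DELETE FROM events WHERE id = 1")
```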

```python
def fold_call(events: list[dict]) -> dict:
    """Rebuild a call's state by folding its events in append order."""
    state = {"transcript": [], "tool_calls": [], "redactions": []}
    for e in events:
        t = e["event_type"]
        if t == "user.utterance.v1":
            state["transcript"].append({"role": "user", "text": e["payload"]["text"]})
        elif t == "agent.utterance.v1":
            state["transcript"].append({"role": "agent", "text": e["payload"]["text"]})
        elif t == "tool.call.v1":
            state["tool_calls"].append(e["payload"])
        elif t == "redaction.applied.v1":
            state["redactions"].append(e["payload"])
        # Any other event type is skipped.
    return state


# Replay against a new prompt.
# load_events, run_model and diff_against_recorded are application-specific
# helpers that are not defined in this snippet.
def replay_with_prompt(call_id: str, new_prompt: str) -> dict:
    events = load_events(call_id)
    user_turns = [e for e in events if e["event_type"] == "user.utterance.v1"]
    # Each user turn is sent to the model on its own, without earlier turns.
    new_responses = [run_model(new_prompt, u["payload"]["text"]) for u in user_turns]
    return diff_against_recorded(events, new_responses)
```
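
The replay function above sends each user turn to the model in isolation. A variant that may be closer to what a multi-turn agent needs is to rebuild the recorded history before each turn and compare tool calls, as in the sketch below. The `Model` callable, the `name` field inside `tool.call.v1` payloads, and `stub_model` are assumptions for illustration; a real run would call the actual model instead of the stub.

```python
from typing import Callable, Optional

# Hypothetical model interface: (prompt, history) -> {"text": str, "tools": list[str]}
Model = Callable[[str, list[dict]], dict]


def replay_conversation(events: list[dict], prompt: str, model: Model) -> list[dict]:
    """Re-run every user turn with the recorded history and diff the tool calls."""
    history: list[dict] = []
    results: list[dict] = []
    i = 0
    while i < len(events):
        event = events[i]
        i += 1
        if event["event_type"] != "user.utterance.v1":
            continue
        user_text = event["payload"]["text"]
        history.append({"role": "user", "text": user_text})

        # Collect what the production agent did before the next user turn.
        recorded_tools: list[str] = []
        recorded_text: Optional[str] = None
        while i < len(events) and events[i]["event_type"] != "user.utterance.v1":
            recorded = events[i]
            if recorded["event_type"] == "tool.call.v1":
                recorded_tools.append(recorded["payload"]["name"])
            elif recorded["event_type"] == "agent.utterance.v1":
                recorded_text = recorded["payload"]["text"]
            i += 1

        candidate = model(prompt, list(history))
        results.append({
            "user_text": user_text,
            "recorded_tools": recorded_tools,
            "candidate_tools": candidate["tools"],
            "tools_changed": candidate["tools"] != recorded_tools,
        })
        # Continue from the recorded reply so later turns see the original context.
        if recorded_text is not None:
            history.append({"role": "agent", "text": recorded_text})
    return results


def stub_model(prompt: str, history: list[dict]) -> dict:
    """Deterministic stand-in for a real model call."""
    if "refund" in history[-1]["text"].lower():
        return {"text": "Refund issued.", "tools": ["lookup_order", "issue_refund"]}
    return {"text": "How can I help?", "tools": []}


RECORDED = [
    {"event_type": "user.utterance.v1", "payload": {"text": "Hi"}},
    {"event_type": "agent.utterance.v1", "payload": {"text": "Hello, how can I help?"}},
    {"event_type": "user.utterance.v1", "payload": {"text": "I want a refund for order 123"}},
    {"event_type": "tool.call.v1", "payload": {"name": "lookup_order"}},
    {"event_type": "agent.utterance.v1", "payload": {"text": "Let me escalate that."}},
]
report = replay_conversation(RECORDED, "candidate prompt", stub_model)
```

Feeding the recorded agent replies back into the history keeps each turn comparable with production; the trade-off is that the candidate never sees the consequences of its own earlier answers.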

## Common pitfalls

- **Mutating event payloads** — never; version events instead.
- **No snapshots** — replay grinds when aggregates have 10k+ events.
- **Coupling read model schema to event names** — keep them decoupled via projections.
- **Treating event store as write-only** — you'll need to query; build projections.
- **Replay storms on prompt changes** — schedule them off-peak.
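
The snapshot pitfall can be illustrated with a small sketch: store the folded state every N events, then rebuild from the latest snapshot plus the tail of the log. `SNAPSHOT_EVERY`, `SnapshottingStream`, and the toy state shape are hypothetical choices made for this example.

```python
SNAPSHOT_EVERY = 3  # illustrative; real values are usually far larger

INITIAL_STATE = {"turns": 0, "last_text": None}


def apply_event(state: dict, event: dict) -> dict:
    """Return a new state with one utterance event applied."""
    return {"turns": state["turns"] + 1, "last_text": event["payload"]["text"]}


class SnapshottingStream:
    """One aggregate's events plus periodic snapshots of the folded state."""

    def __init__(self) -> None:
        self.events: list[dict] = []
        self.snapshots: list[tuple[int, dict]] = []  # (version, state)

    def append(self, event: dict) -> None:
        self.events.append(event)
        version = len(self.events)
        if version % SNAPSHOT_EVERY == 0:
            state, _ = self.load()
            self.snapshots.append((version, state))

    def load(self) -> tuple[dict, int]:
        """Return (state, number of events folded to produce it)."""
        version, state = self.snapshots[-1] if self.snapshots else (0, INITIAL_STATE)
        tail = self.events[version:]
        for event in tail:
            state = apply_event(state, event)
        return state, len(tail)


stream = SnapshottingStream()
for n in range(7):
    stream.append({"event_type": "user.utterance.v1", "payload": {"text": f"msg {n}"}})

state, folded = stream.load()
```

With seven events and a snapshot at version 6, `load` folds only the single event after the snapshot. Snapshots are a cache of the fold, so they can be discarded and rebuilt from the log whenever the fold logic changes.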

## FAQ

**Event sourcing vs append-only log?** Event sourcing is the architectural pattern; the log is the storage.

**EventStoreDB vs Postgres?** EventStoreDB is purpose-built; Postgres is fine up to ~1k events/sec per partition.

**Do we need CQRS?** Not strictly — but the read/write split falls out naturally.

**How does CallSphere use replay in eval?** We replay the last 7 days through every prompt change before shipping. Book a [demo](/demo) to see it.

**Cost?** ~3x storage vs CRUD; bounded with snapshot+compaction.

## Sources

- [microservices.io: Event Sourcing Pattern](https://microservices.io/patterns/data/event-sourcing.html)
- [Microsoft: Event Sourcing Pattern](https://learn.microsoft.com/en-us/azure/architecture/patterns/event-sourcing)
- [Event Sourcing and CQRS with Databases (Apr 2026)](https://dasroot.net/posts/2026/04/event-sourcing-cqrs-databases-eventstoredb-axon-polecat/)
- [Event Sourcing Explained: When CRUD Is Not Enough (Practical Guide 2026)](https://dev.to/young_gao/event-sourcing-explained-when-crud-is-not-enough-4od5)
