Agent Memory Patterns: Episodic, Semantic, and Procedural Stores in Production
Production LLM agents in 2026 separate episodic, semantic, and procedural memory. Here is how to design each store and the tradeoffs that matter.
Why One Memory Store Is Not Enough
Early LLM agents treated memory as one big vector store: dump every conversation chunk, retrieve the nearest neighbors, hope for the best. By 2026, the teams shipping reliable agents at scale have stopped doing this. They borrow the cognitive science taxonomy of episodic, semantic, and procedural memory because each kind needs different storage, different write rules, and very different retrieval behavior.
This guide walks through the three-store pattern, the tradeoffs that matter in production, and the open-source projects implementing each piece: Letta (formerly MemGPT), Zep, Mem0, and Cognee.
The Three Stores
flowchart TB
User[User Turn] --> Agent[Agent Orchestrator]
Agent --> EM[Episodic Memory<br/>Time-stamped events]
Agent --> SM[Semantic Memory<br/>Distilled facts]
Agent --> PM[Procedural Memory<br/>Skills + workflows]
EM --> Vec[(Vector + Time Index)]
SM --> KG[(Knowledge Graph)]
PM --> Skill[(Skill Registry)]
Vec --> Retrieve[Retrieval Layer]
KG --> Retrieve
Skill --> Retrieve
Retrieve --> LLM[LLM Context]
Episodic Memory
Episodic memory is the timeline of what happened. Each entry is a tuple of (timestamp, agent_id, user_id, event_type, content, embedding). The right primitive is a vector store with a strong time dimension: pgvector with a B-tree index on occurred_at, or Zep's purpose-built temporal graph.
Write rule: append-only. Every turn, every tool call, every tool result.
Retrieval rule: hybrid — combine semantic similarity to the current query with recency decay. A simple but durable formula is score = 0.7 * cosine + 0.3 * exp(-age_days / half_life).
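The scoring formula above can be sketched in a few lines. This is a minimal illustration, not any particular library's API; the 0.7/0.3 weights come from the formula, while the 30-day half-life default and the (episode_id, cosine_sim, age_days) tuple shape are assumptions for the example.

```python
import math

def hybrid_score(cosine_sim, age_days, half_life=30.0):
    """score = 0.7 * cosine + 0.3 * exp(-age_days / half_life)"""
    return 0.7 * cosine_sim + 0.3 * math.exp(-age_days / half_life)

def rank_episodes(episodes, k=5, half_life=30.0):
    """episodes: iterable of (episode_id, cosine_sim, age_days) tuples.

    Returns the top-k episode ids by blended similarity + recency.
    """
    scored = sorted(
        episodes,
        key=lambda e: hybrid_score(e[1], e[2], half_life),
        reverse=True,
    )
    return [e[0] for e in scored[:k]]

# A fresh, moderately relevant episode can outrank an old, highly
# relevant one -- exactly the behavior the recency term is there for.
ranked = rank_episodes([("old", 0.9, 90.0), ("fresh", 0.6, 0.0)])
```

In practice the cosine term comes straight from the vector index and age_days from the time index, so this blend is cheap to compute at rerank time.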
Semantic Memory
Semantic memory is the distilled, deduplicated set of facts the agent has learned. "User prefers vegetarian food," "ACME's renewal date is October 15," "the database is named prod-east-1." This is not a transcript — it is the lessons drawn from many transcripts.
The right primitive in 2026 is a knowledge graph. Mem0, Cognee, and Graphiti all implement this with Neo4j, Kuzu, or Memgraph as the backing store. Updates run asynchronously: a background process consumes episodic events and emits CRUD operations on the graph.
Write rule: deduplicate on entity + relation. Use entity resolution (canonical-name matching plus embedding clustering) before insert.
Retrieval rule: graph traversal from the entities mentioned in the query. Limit by hop count (typically 2 or 3) and edge weight.
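A hop- and weight-bounded traversal is simple to sketch with a BFS over an adjacency map. This is an illustration of the retrieval rule, not Mem0's or Graphiti's actual query layer; the dict-of-edge-lists representation and the 0.3 minimum edge weight are assumptions for the example.

```python
from collections import deque

def traverse(graph, seeds, max_hops=2, min_weight=0.3):
    """BFS from the entities mentioned in the query.

    graph: {entity: [(neighbor, relation, weight), ...]}
    Returns (subject, relation, object) facts within max_hops,
    skipping edges below min_weight.
    """
    facts = []
    frontier = deque((s, 0) for s in seeds)
    seen = set(seeds)
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue  # don't expand beyond the hop budget
        for neighbor, relation, weight in graph.get(node, []):
            if weight < min_weight:
                continue  # prune weak edges
            facts.append((node, relation, neighbor))
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, hops + 1))
    return facts
```

With a real graph backend the same bounds map onto variable-length path patterns (e.g. a Cypher `*1..2` traversal with a weight predicate) instead of an in-memory BFS.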
Procedural Memory
Procedural memory is "how I did X last time it worked." It stores the sequence of tool calls that successfully completed a task type. The right primitive is a skill or workflow registry — JSON documents keyed by task signature, retrieved by similarity to the current goal.
Write rule: only on verified success. Never write a skill from a failed or human-cancelled trajectory.
Retrieval rule: exact or near-exact match on task type, then embed the goal and pick the top-k templates.
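The two-stage procedural lookup (exact match on task type, then embedding similarity on the goal) can be sketched as follows. The Skill shape, the toy cosine helper, and k=3 are assumptions for illustration; in production the embeddings would come from a real embedding model and the registry from a document store.

```python
from dataclasses import dataclass

@dataclass
class Skill:
    task_type: str            # task signature used for the exact-match stage
    goal_embedding: list      # embedding of the goal this skill achieved
    steps: list               # ordered tool-call template
    confidence: float = 1.0

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def retrieve_skills(registry, task_type, goal_embedding, k=3):
    """Stage 1: filter on exact task type. Stage 2: rank by goal similarity."""
    candidates = [s for s in registry if s.task_type == task_type]
    candidates.sort(key=lambda s: cosine(s.goal_embedding, goal_embedding),
                    reverse=True)
    return candidates[:k]
```

The exact-match stage is what keeps procedural retrieval cheap and precise: the embedding search only ever runs over skills that already share the task signature.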
The Asynchronous Memory Pipeline
The single biggest mistake in 2026 production agents is doing memory writes inline with the user-facing request. Episodic writes can be inline (low cost), but semantic and procedural writes are LLM-driven and slow. Run them on a queue:
sequenceDiagram
participant U as User
participant A as Agent
participant E as Episodic Store
participant Q as Queue (NATS / SQS)
participant W as Memory Worker
participant S as Semantic + Procedural
U->>A: Message
A->>E: append event
A->>U: response
E-->>Q: emit event
Q->>W: deliver
W->>W: extract facts + skills
W->>S: upsert
This keeps p95 latency low and makes memory enrichment idempotent and re-runnable.
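The idempotency property in the worker can be sketched with an in-process queue standing in for NATS/SQS. This is a toy: real queues redeliver at-least-once, so the worker dedupes on event id before upserting; the event dict shape and the one-field "extraction" are assumptions standing in for the LLM-driven fact/skill extraction step.

```python
from collections import deque

def drain(inbox, semantic_store, processed):
    """Consume episodic events and upsert extracted facts.

    Dedupe on event id, so a redelivered event is a no-op --
    this is what makes enrichment idempotent and re-runnable.
    """
    while inbox:
        event = inbox.popleft()
        if event["id"] in processed:
            continue  # duplicate delivery: skip
        processed.add(event["id"])
        # Stand-in for the LLM extraction step: upsert one entity -> fact.
        semantic_store[event["entity"]] = event["fact"]

# At-least-once delivery: the same event arrives twice.
inbox = deque()
event = {"id": "e1", "entity": "user", "fact": "prefers vegetarian food"}
inbox.append(event)
inbox.append(event)
store, processed = {}, set()
drain(inbox, store, processed)
```

Because the write is keyed (entity upsert, event-id dedupe) rather than appended, replaying the whole queue after a worker crash converges to the same semantic store.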
Forgetting and Conflicts
The hard parts in 2026 are not writing or reading; they are forgetting and conflict resolution. Three patterns are working in practice:
- TTL on episodic: keep raw events for 30-90 days, then drop. The semantic store retains what mattered.
- Provenance on semantic: every fact has the source episode IDs. When a contradicting fact arrives, run a tiny LLM judge to merge or supersede.
- Versioned procedural: skills are versioned; failures decrement a confidence score; below a threshold, the skill is retired.
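The TTL and confidence-decay patterns above are mechanical enough to sketch directly. The 90-day TTL, 0.2 failure penalty, and 0.4 retirement threshold are illustrative values, not recommendations from any of the projects mentioned.

```python
from datetime import datetime, timedelta

def prune_episodes(episodes, now, ttl_days=90):
    """TTL on episodic: drop raw events older than the retention window."""
    cutoff = now - timedelta(days=ttl_days)
    return [e for e in episodes if e["occurred_at"] >= cutoff]

def record_skill_outcome(skill, success, penalty=0.2, retire_below=0.4):
    """Versioned procedural: failures decrement confidence;
    below the threshold, the skill is retired."""
    if not success:
        skill["confidence"] = max(0.0, skill["confidence"] - penalty)
    skill["retired"] = skill["confidence"] < retire_below
    return skill

now = datetime(2026, 1, 1)
episodes = [
    {"occurred_at": now - timedelta(days=10)},   # inside the window
    {"occurred_at": now - timedelta(days=120)},  # past TTL, dropped
]
kept = prune_episodes(episodes, now)
```

Note that pruning only touches the raw episodic rows; any fact already distilled into the semantic store survives, which is the whole point of the split.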
Open-Source Implementations Worth Studying
- Letta (formerly MemGPT) — best reference for the OS-paging analogy applied to LLM context
- Mem0 — production-ready, three-store implementation with graph backend
- Zep — temporal knowledge graph as a service
- Cognee — open-source memory engine with strong GraphRAG support
- Graphiti — Neo4j-backed temporal graph from Zep, open source
Sources
- Letta documentation — https://docs.letta.com
- Mem0 architecture — https://docs.mem0.ai/architecture
- Zep temporal graph paper — https://arxiv.org/abs/2501.13956
- Graphiti repo — https://github.com/getzep/graphiti
- "Generative Agents" Park et al. (the original episodic memory paper for LLMs) — https://arxiv.org/abs/2304.03442