---
title: "Agent Memory Patterns: Episodic, Semantic, and Procedural Stores in Production"
description: "Production LLM agents in 2026 separate episodic, semantic, and procedural memory. Here is how to design each store and the tradeoffs that matter."
canonical: https://callsphere.ai/blog/agent-memory-patterns-episodic-semantic-procedural-2026
category: "Agentic AI"
tags: ["Agent Memory", "Vector Stores", "Agentic AI", "Production AI", "RAG"]
author: "CallSphere Team"
published: 2026-04-24T00:00:00.000Z
updated: 2026-05-02T04:20:52.857Z
---

# Agent Memory Patterns: Episodic, Semantic, and Procedural Stores in Production

> Production LLM agents in 2026 separate episodic, semantic, and procedural memory. Here is how to design each store and the tradeoffs that matter.

## Why One Memory Store Is Not Enough

Early LLM agents treated memory as one big vector store: dump every conversation chunk, retrieve the nearest neighbors, hope for the best. By 2026, the teams shipping reliable agents at scale have stopped doing this. They borrow the cognitive science taxonomy of episodic, semantic, and procedural memory because each kind needs different storage, different write rules, and very different retrieval behavior.

This guide walks through the three-store pattern, the tradeoffs that matter in production, and the open-source projects (Letta, Zep, Mem0, Cognee, Graphiti) that implement each piece.

## The Three Stores

```mermaid
flowchart TB
    User[User Turn] --> Agent[Agent Orchestrator]
    Agent --> EM["Episodic Memory<br/>Time-stamped events"]
    Agent --> SM["Semantic Memory<br/>Distilled facts"]
    Agent --> PM["Procedural Memory<br/>Skills + workflows"]
    EM --> Vec[(Vector + Time Index)]
    SM --> KG[(Knowledge Graph)]
    PM --> Skill[(Skill Registry)]
    Vec --> Retrieve[Retrieval Layer]
    KG --> Retrieve
    Skill --> Retrieve
    Retrieve --> LLM[LLM Context]
```

### Episodic Memory

Episodic memory is the timeline of what happened. Each entry is a tuple of `(timestamp, agent_id, user_id, event_type, content, embedding)`. The right primitive is a vector store with a strong time dimension — pgvector with a btree on `occurred_at`, or Zep's purpose-built temporal graph.
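The entry shape above can be pinned down as a small dataclass. This is a sketch with illustrative field values; the actual store would be pgvector, Zep, or similar:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class EpisodicEvent:
    # Mirrors the (timestamp, agent_id, user_id, event_type, content, embedding) tuple.
    occurred_at: datetime
    agent_id: str
    user_id: str
    event_type: str          # e.g. "user_turn", "tool_call", "tool_result"
    content: str
    embedding: list[float]

event = EpisodicEvent(
    occurred_at=datetime.now(timezone.utc),
    agent_id="agent-1",
    user_id="user-42",
    event_type="user_turn",
    content="Book a vegetarian restaurant for Friday",
    embedding=[0.0] * 8,  # placeholder; real embeddings come from a model
)
```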

**Write rule**: append-only. Every turn, every tool call, every tool result.

**Retrieval rule**: hybrid — combine semantic similarity to the current query with recency decay. A simple but durable formula is `score = 0.7 * cosine + 0.3 * exp(-age_days / half_life)`.
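That scoring formula translates directly to code. A minimal sketch, assuming `half_life` is measured in days (the 7-day default here is an illustrative choice, not part of the formula):

```python
import math

def episodic_score(cosine_sim: float, age_days: float, half_life: float = 7.0) -> float:
    """Hybrid retrieval score: semantic similarity blended with recency decay."""
    return 0.7 * cosine_sim + 0.3 * math.exp(-age_days / half_life)

# A recent but weakly related event can outrank an old near-duplicate:
episodic_score(0.60, age_days=1)    # ≈ 0.68
episodic_score(0.85, age_days=60)   # ≈ 0.60
```

The 0.7/0.3 split is a starting point; the half-life is the knob worth tuning per use case, since it controls how fast the timeline fades.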

### Semantic Memory

Semantic memory is the distilled, deduplicated set of facts the agent has learned. "User prefers vegetarian food," "ACME's renewal date is October 15," "the database is named prod-east-1." This is not a transcript — it is the lessons drawn from many transcripts.

The right primitive in 2026 is a knowledge graph. Mem0, Cognee, and Graphiti all implement this with Neo4j, Kuzu, or Memgraph as the backing store. Updates run asynchronously: a background process consumes episodic events and emits CRUD operations on the graph.

**Write rule**: deduplicate on entity + relation. Use entity resolution (canonical-name matching plus embedding clustering) before insert.

**Retrieval rule**: graph traversal from the entities mentioned in the query. Limit by hop count (typically 2 or 3) and edge weight.
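Both rules can be sketched with a toy in-memory graph. A production store would sit behind Neo4j, Kuzu, or Memgraph with real entity resolution, but the upsert and bounded-traversal logic has the same shape:

```python
from collections import defaultdict

class SemanticGraph:
    """Toy stand-in for a graph-backed fact store."""
    def __init__(self) -> None:
        # (entity, relation) -> object; upsert deduplicates on this key.
        self.facts: dict[tuple[str, str], str] = {}
        self.edges: defaultdict[str, set[str]] = defaultdict(set)

    def upsert(self, entity: str, relation: str, obj: str) -> None:
        # Write rule: dedupe on entity + relation; a newer fact supersedes the old one.
        old = self.facts.get((entity, relation))
        if old is not None:
            self.edges[entity].discard(old)
            self.edges[old].discard(entity)
        self.facts[(entity, relation)] = obj
        self.edges[entity].add(obj)
        self.edges[obj].add(entity)

    def neighborhood(self, start: str, max_hops: int = 2) -> set[str]:
        # Retrieval rule: traversal from a query entity, limited by hop count.
        frontier, seen = {start}, {start}
        for _ in range(max_hops):
            frontier = {n for e in frontier for n in self.edges[e]} - seen
            seen |= frontier
        return seen - {start}

g = SemanticGraph()
g.upsert("ACME", "renewal_date", "October 15")
g.upsert("ACME", "renewal_date", "October 20")  # supersedes; no duplicate fact
g.upsert("user-42", "prefers", "vegetarian food")
```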

### Procedural Memory

Procedural memory is "how I did X last time it worked." It stores the sequence of tool calls that successfully completed a task type. The right primitive is a skill or workflow registry — JSON documents keyed by task signature, retrieved by similarity to the current goal.

**Write rule**: only on verified success. Never write a skill from a failed or human-cancelled trajectory.

**Retrieval rule**: exact or near-exact match on task type, then embed the goal and pick the top-k templates.

## The Asynchronous Memory Pipeline

The single biggest mistake in 2026 production agents is doing memory writes inline with the user-facing request. Episodic writes can be inline (low cost), but semantic and procedural writes are LLM-driven and slow. Run them on a queue:

```mermaid
sequenceDiagram
    participant U as User
    participant A as Agent
    participant E as Episodic Store
    participant Q as Queue (NATS / SQS)
    participant W as Memory Worker
    participant S as Semantic + Procedural
    U->>A: Message
    A->>E: append event
    A->>U: response
    E-->>Q: emit event
    Q->>W: deliver
    W->>W: extract facts + skills
    W->>S: upsert
```

This keeps p95 latency low and makes memory enrichment idempotent and re-runnable.
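The same decoupling can be sketched with an in-process queue standing in for NATS or SQS; the extraction step is a stub where the LLM-driven worker logic would go:

```python
import queue
import threading

events: "queue.Queue[dict]" = queue.Queue()
semantic_store: dict[tuple[str, str], str] = {}

def handle_turn(user_msg: str) -> str:
    # Inline path: append to episodic (cheap), respond, enqueue for enrichment.
    events.put({"event_type": "user_turn", "content": user_msg})
    return "ack"  # the user-facing response never waits on enrichment

def memory_worker() -> None:
    while True:
        ev = events.get()
        if ev is None:  # shutdown sentinel
            break
        # Stub for LLM-driven fact/skill extraction. Upserts are keyed,
        # hence idempotent: redelivery of the same event is safe.
        if "vegetarian" in ev["content"]:
            semantic_store[("user", "prefers")] = "vegetarian food"
        events.task_done()

threading.Thread(target=memory_worker, daemon=True).start()
handle_turn("I prefer vegetarian food")
events.join()  # only for this demo; production never blocks the request path
```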

## Forgetting and Conflicts

The hard parts in 2026 are not writes or reads — they are forgetting and conflict resolution. Three patterns are working in practice:

- **TTL on episodic**: keep raw events for 30-90 days, then drop. The semantic store retains what mattered.
- **Provenance on semantic**: every fact has the source episode IDs. When a contradicting fact arrives, run a tiny LLM judge to merge or supersede.
- **Versioned procedural**: skills are versioned; failures decrement a confidence score; below a threshold, the skill is retired.
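The versioned-procedural pattern reduces to a small update function. The penalty, reward, and retirement threshold below are illustrative values, not prescribed by the pattern:

```python
def update_skill_confidence(skill: dict, succeeded: bool,
                            penalty: float = 0.2, floor: float = 0.3) -> dict:
    """Failures decrement confidence; below the floor, the skill is retired."""
    conf = skill.get("confidence", 1.0)
    conf = min(1.0, conf + 0.05) if succeeded else conf - penalty
    return {**skill, "confidence": conf, "retired": conf < floor}

s = {"name": "book_restaurant.v3", "confidence": 0.6}
s = update_skill_confidence(s, succeeded=False)  # drops to 0.4, still live
s = update_skill_confidence(s, succeeded=False)  # drops to 0.2, retired
```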

## Open-Source Implementations Worth Studying

- **Letta (formerly MemGPT)** — best reference for the OS-paging analogy applied to LLM context
- **Mem0** — production-ready, three-store implementation with graph backend
- **Zep** — temporal knowledge graph as a service
- **Cognee** — open-source memory engine with strong GraphRAG support
- **Graphiti** — Neo4j-backed temporal graph from Zep, open source

## Sources

- Letta documentation — [https://docs.letta.com](https://docs.letta.com)
- Mem0 architecture — [https://docs.mem0.ai/architecture](https://docs.mem0.ai/architecture)
- Zep temporal graph paper — [https://arxiv.org/abs/2501.13956](https://arxiv.org/abs/2501.13956)
- Graphiti repo — [https://github.com/getzep/graphiti](https://github.com/getzep/graphiti)
- "Generative Agents" Park et al. (the original episodic memory paper for LLMs) — [https://arxiv.org/abs/2304.03442](https://arxiv.org/abs/2304.03442)

