Agent Memory Patterns: Episodic, Semantic, and Procedural Stores in Production
Production LLM agents in 2026 separate episodic, semantic, and procedural memory. Here is how to design each store and the tradeoffs that matter.
Why One Memory Store Is Not Enough
Early LLM agents treated memory as one big vector store: dump every conversation chunk, retrieve the nearest neighbors, hope for the best. By 2026, the teams shipping reliable agents at scale have stopped doing this. They borrow the cognitive science taxonomy of episodic, semantic, and procedural memory because each kind needs different storage, different write rules, and very different retrieval behavior.
This guide walks through the three-store pattern, the tradeoffs that matter in production, and the open-source projects implementing each piece: Letta (formerly MemGPT), Zep, Mem0, and Cognee.
The Three Stores
flowchart TB
User[User Turn] --> Agent[Agent Orchestrator]
Agent --> EM[Episodic Memory<br/>Time-stamped events]
Agent --> SM[Semantic Memory<br/>Distilled facts]
Agent --> PM[Procedural Memory<br/>Skills + workflows]
EM --> Vec[(Vector + Time Index)]
SM --> KG[(Knowledge Graph)]
PM --> Skill[(Skill Registry)]
Vec --> Retrieve[Retrieval Layer]
KG --> Retrieve
Skill --> Retrieve
Retrieve --> LLM[LLM Context]
Episodic Memory
Episodic memory is the timeline of what happened. Each entry is a tuple of (timestamp, agent_id, user_id, event_type, content, embedding). The right primitive is a vector store with a strong time dimension: pgvector with a B-tree index on occurred_at, or Zep's purpose-built temporal graph.
Write rule: append-only. Every turn, every tool call, every tool result.
Retrieval rule: hybrid — combine semantic similarity to the current query with recency decay. A simple but durable formula is score = 0.7 * cosine + 0.3 * exp(-age_days / half_life).
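The scoring formula above can be sketched in a few lines. This is a minimal illustration, not any particular library's API; the 0.7/0.3 weights come from the formula, while the 30-day half-life default and the (episode_id, cosine_sim, age_days) tuple shape are assumptions for the example.

```python
import math

def hybrid_score(cosine_sim, age_days, half_life=30.0):
    """score = 0.7 * cosine + 0.3 * exp(-age_days / half_life)"""
    return 0.7 * cosine_sim + 0.3 * math.exp(-age_days / half_life)

def rank_episodes(episodes, k=5, half_life=30.0):
    """episodes: iterable of (episode_id, cosine_sim, age_days) tuples.

    Returns the top-k episode ids by blended similarity + recency.
    """
    scored = sorted(
        episodes,
        key=lambda e: hybrid_score(e[1], e[2], half_life),
        reverse=True,
    )
    return [e[0] for e in scored[:k]]

# A fresh, moderately relevant episode can outrank an old, highly
# relevant one -- exactly the behavior the recency term is there for.
ranked = rank_episodes([("old", 0.9, 90.0), ("fresh", 0.6, 0.0)])
```

In practice the cosine term comes straight from the vector index and age_days from the time index, so this blend is cheap to compute at rerank time.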
Semantic Memory
Semantic memory is the distilled, deduplicated set of facts the agent has learned. "User prefers vegetarian food," "ACME's renewal date is October 15," "the database is named prod-east-1." This is not a transcript — it is the lessons drawn from many transcripts.
The right primitive in 2026 is a knowledge graph. Mem0, Cognee, and Graphiti all implement this with Neo4j, Kuzu, or Memgraph as the backing store. Updates run asynchronously: a background process consumes episodic events and emits CRUD operations on the graph.
Write rule: deduplicate on entity + relation. Use entity resolution (canonical-name matching plus embedding clustering) before insert.
Retrieval rule: graph traversal from the entities mentioned in the query. Limit by hop count (typically 2 or 3) and edge weight.
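A hop- and weight-bounded traversal is simple to sketch with a BFS over an adjacency map. This is an illustration of the retrieval rule, not Mem0's or Graphiti's actual query layer; the dict-of-edge-lists representation and the 0.3 minimum edge weight are assumptions for the example.

```python
from collections import deque

def traverse(graph, seeds, max_hops=2, min_weight=0.3):
    """BFS from the entities mentioned in the query.

    graph: {entity: [(neighbor, relation, weight), ...]}
    Returns (subject, relation, object) facts within max_hops,
    skipping edges below min_weight.
    """
    facts = []
    frontier = deque((s, 0) for s in seeds)
    seen = set(seeds)
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue  # don't expand beyond the hop budget
        for neighbor, relation, weight in graph.get(node, []):
            if weight < min_weight:
                continue  # prune weak edges
            facts.append((node, relation, neighbor))
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, hops + 1))
    return facts
```

With a real graph backend the same bounds map onto variable-length path patterns (e.g. a Cypher `*1..2` traversal with a weight predicate) instead of an in-memory BFS.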
Procedural Memory
Procedural memory is "how I did X last time it worked." It stores the sequence of tool calls that successfully completed a task type. The right primitive is a skill or workflow registry — JSON documents keyed by task signature, retrieved by similarity to the current goal.
Write rule: only on verified success. Never write a skill from a failed or human-cancelled trajectory.
Retrieval rule: exact or near-exact match on task type, then embed the goal and pick the top-k templates.
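The two-stage procedural lookup (exact match on task type, then embedding similarity on the goal) can be sketched as follows. The Skill shape, the toy cosine helper, and k=3 are assumptions for illustration; in production the embeddings would come from a real embedding model and the registry from a document store.

```python
from dataclasses import dataclass

@dataclass
class Skill:
    task_type: str            # task signature used for the exact-match stage
    goal_embedding: list      # embedding of the goal this skill achieved
    steps: list               # ordered tool-call template
    confidence: float = 1.0

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def retrieve_skills(registry, task_type, goal_embedding, k=3):
    """Stage 1: filter on exact task type. Stage 2: rank by goal similarity."""
    candidates = [s for s in registry if s.task_type == task_type]
    candidates.sort(key=lambda s: cosine(s.goal_embedding, goal_embedding),
                    reverse=True)
    return candidates[:k]
```

The exact-match stage is what keeps procedural retrieval cheap and precise: the embedding search only ever runs over skills that already share the task signature.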
The Asynchronous Memory Pipeline
The single biggest mistake in 2026 production agents is doing memory writes inline with the user-facing request. Episodic writes can be inline (low cost), but semantic and procedural writes are LLM-driven and slow. Run them on a queue:
sequenceDiagram
participant U as User
participant A as Agent
participant E as Episodic Store
participant Q as Queue (NATS / SQS)
participant W as Memory Worker
participant S as Semantic + Procedural
U->>A: Message
A->>E: append event
A->>U: response
E-->>Q: emit event
Q->>W: deliver
W->>W: extract facts + skills
W->>S: upsert
This keeps p95 latency low and makes memory enrichment idempotent and re-runnable.
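The idempotency property in the worker can be sketched with an in-process queue standing in for NATS/SQS. This is a toy: real queues redeliver at-least-once, so the worker dedupes on event id before upserting; the event dict shape and the one-field "extraction" are assumptions standing in for the LLM-driven fact/skill extraction step.

```python
from collections import deque

def drain(inbox, semantic_store, processed):
    """Consume episodic events and upsert extracted facts.

    Dedupe on event id, so a redelivered event is a no-op --
    this is what makes enrichment idempotent and re-runnable.
    """
    while inbox:
        event = inbox.popleft()
        if event["id"] in processed:
            continue  # duplicate delivery: skip
        processed.add(event["id"])
        # Stand-in for the LLM extraction step: upsert one entity -> fact.
        semantic_store[event["entity"]] = event["fact"]

# At-least-once delivery: the same event arrives twice.
inbox = deque()
event = {"id": "e1", "entity": "user", "fact": "prefers vegetarian food"}
inbox.append(event)
inbox.append(event)
store, processed = {}, set()
drain(inbox, store, processed)
```

Because the write is keyed (entity upsert, event-id dedupe) rather than appended, replaying the whole queue after a worker crash converges to the same semantic store.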
Forgetting and Conflicts
The hard parts in 2026 are not writing or reading; they are forgetting and conflict resolution. Three patterns are working in practice:
- TTL on episodic: keep raw events for 30-90 days, then drop. The semantic store retains what mattered.
- Provenance on semantic: every fact has the source episode IDs. When a contradicting fact arrives, run a tiny LLM judge to merge or supersede.
- Versioned procedural: skills are versioned; failures decrement a confidence score; below a threshold, the skill is retired.
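The TTL and confidence-decay patterns above are mechanical enough to sketch directly. The 90-day TTL, 0.2 failure penalty, and 0.4 retirement threshold are illustrative values, not recommendations from any of the projects mentioned.

```python
from datetime import datetime, timedelta

def prune_episodes(episodes, now, ttl_days=90):
    """TTL on episodic: drop raw events older than the retention window."""
    cutoff = now - timedelta(days=ttl_days)
    return [e for e in episodes if e["occurred_at"] >= cutoff]

def record_skill_outcome(skill, success, penalty=0.2, retire_below=0.4):
    """Versioned procedural: failures decrement confidence;
    below the threshold, the skill is retired."""
    if not success:
        skill["confidence"] = max(0.0, skill["confidence"] - penalty)
    skill["retired"] = skill["confidence"] < retire_below
    return skill

now = datetime(2026, 1, 1)
episodes = [
    {"occurred_at": now - timedelta(days=10)},   # inside the window
    {"occurred_at": now - timedelta(days=120)},  # past TTL, dropped
]
kept = prune_episodes(episodes, now)
```

Note that pruning only touches the raw episodic rows; any fact already distilled into the semantic store survives, which is the whole point of the split.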
Open-Source Implementations Worth Studying
- Letta (formerly MemGPT) — best reference for the OS-paging analogy applied to LLM context
- Mem0 — production-ready, three-store implementation with graph backend
- Zep — temporal knowledge graph as a service
- Cognee — open-source memory engine with strong GraphRAG support
- Graphiti — Neo4j-backed temporal graph from Zep, open source
Sources
- Letta documentation — https://docs.letta.com
- Mem0 architecture — https://docs.mem0.ai/architecture
- Zep temporal graph paper — https://arxiv.org/abs/2501.13956
- Graphiti repo — https://github.com/getzep/graphiti
- "Generative Agents" Park et al. (the original episodic memory paper for LLMs) — https://arxiv.org/abs/2304.03442