
Agent Memory Patterns: Episodic, Semantic, and Procedural Stores in Production

Production LLM agents in 2026 separate episodic, semantic, and procedural memory. Here is how to design each store and the tradeoffs that matter.

Why One Memory Store Is Not Enough

Early LLM agents treated memory as one big vector store: dump every conversation chunk, retrieve the nearest neighbors, hope for the best. By 2026, the teams shipping reliable agents at scale have stopped doing this. They borrow the cognitive science taxonomy of episodic, semantic, and procedural memory because each kind needs different storage, different write rules, and very different retrieval behavior.

This guide walks through the three-store pattern, the tradeoffs that matter in production, and the open-source projects (Letta, Mem0, Zep, Cognee, Graphiti) implementing each piece.

The Three Stores

flowchart TB
    User[User Turn] --> Agent[Agent Orchestrator]
    Agent --> EM[Episodic Memory<br/>Time-stamped events]
    Agent --> SM[Semantic Memory<br/>Distilled facts]
    Agent --> PM[Procedural Memory<br/>Skills + workflows]
    EM --> Vec[(Vector + Time Index)]
    SM --> KG[(Knowledge Graph)]
    PM --> Skill[(Skill Registry)]
    Vec --> Retrieve[Retrieval Layer]
    KG --> Retrieve
    Skill --> Retrieve
    Retrieve --> LLM[LLM Context]

Episodic Memory

Episodic memory is the timeline of what happened. Each entry is a tuple of (timestamp, agent_id, user_id, event_type, content, embedding). The right primitive is a vector store with a strong time dimension — pgvector with a btree index on occurred_at, or Zep's purpose-built temporal graph.

Write rule: append-only. Every turn, every tool call, every tool result.

Retrieval rule: hybrid — combine semantic similarity to the current query with recency decay. A simple but durable formula is score = 0.7 * cosine + 0.3 * exp(-age_days / half_life).
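The hybrid formula can be sketched directly; the 0.7/0.3 weights and the 30-day half-life are illustrative defaults, not fixed constants:

```python
import math

def episodic_score(cosine_sim: float, age_days: float,
                   half_life_days: float = 30.0) -> float:
    """Blend semantic similarity with exponential recency decay."""
    recency = math.exp(-age_days / half_life_days)
    return 0.7 * cosine_sim + 0.3 * recency
```

With this weighting, a moderately relevant event from yesterday can outrank a highly relevant one from three months ago, which is usually the right bias for conversational recall.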

Semantic Memory

Semantic memory is the distilled, deduplicated set of facts the agent has learned. "User prefers vegetarian food," "ACME's renewal date is October 15," "the database is named prod-east-1." This is not a transcript — it is the lessons drawn from many transcripts.

The right primitive in 2026 is a knowledge graph. Mem0, Cognee, and Graphiti all implement this with Neo4j, Kuzu, or Memgraph as the backing store. Updates run asynchronously: a background process consumes episodic events and emits CRUD operations on the graph.


Write rule: deduplicate on entity + relation. Use entity resolution (canonical-name matching plus embedding clustering) before insert.
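A minimal sketch of that write rule, assuming a dict-backed store and a hand-maintained alias table; the embedding-clustering half of entity resolution, which would catch aliases not in the table, is omitted:

```python
def canonical_key(name: str, aliases: dict) -> str:
    """Canonical-name matching: normalize, then map known aliases to one id."""
    norm = name.strip().lower()
    return aliases.get(norm, norm)

def upsert_fact(store: dict, entity: str, relation: str,
                value: str, aliases: dict) -> None:
    # Dedup rule: one value per (entity, relation) key, so re-learning
    # the same fact (or learning it under an alias) overwrites in place.
    store[(canonical_key(entity, aliases), relation)] = value
```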

Retrieval rule: graph traversal from the entities mentioned in the query. Limit by hop count (typically 2 or 3) and edge weight.
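Bounded traversal is easy to sketch over an adjacency-list graph; `max_hops` and `min_weight` mirror the hop-count and edge-weight limits, with illustrative defaults:

```python
from collections import deque

def traverse(graph: dict, seeds: list,
             max_hops: int = 2, min_weight: float = 0.3) -> set:
    """Collect facts within max_hops of the query entities, pruning weak edges.

    graph maps entity -> list of (neighbor, relation, weight) tuples.
    Returns a set of (subject, relation, object) facts.
    """
    seen, frontier = set(seeds), deque((s, 0) for s in seeds)
    facts = set()
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # hop budget exhausted on this branch
        for neighbor, relation, weight in graph.get(node, []):
            if weight < min_weight:
                continue  # prune low-confidence edges
            facts.add((node, relation, neighbor))
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return facts
```

A production store would push the same limits down into a Cypher or GQL query rather than walking the graph client-side; the pruning logic is the point here.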

Procedural Memory

Procedural memory is "how I did X last time it worked." It stores the sequence of tool calls that successfully completed a task type. The right primitive is a skill or workflow registry — JSON documents keyed by task signature, retrieved by similarity to the current goal.

Write rule: only on verified success. Never write a skill from a failed or human-cancelled trajectory.

Retrieval rule: exact or near-exact match on task type, then embed the goal and pick the top-k templates.
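A toy registry illustrating both rules — the structure, not the API of any particular framework. It uses exact task-type match; the near-exact step (embedding the goal and ranking templates) is omitted:

```python
class SkillRegistry:
    """Procedural store: successful tool-call sequences keyed by task type."""

    def __init__(self) -> None:
        self._skills: dict[str, list[dict]] = {}

    def record(self, task_type: str, steps: list[str], succeeded: bool) -> None:
        # Write rule: never persist a failed or human-cancelled trajectory.
        if not succeeded:
            return
        self._skills.setdefault(task_type, []).append(
            {"steps": steps, "confidence": 1.0}
        )

    def lookup(self, task_type: str, k: int = 3) -> list[dict]:
        # Retrieval rule: match on task type, highest confidence first.
        ranked = sorted(self._skills.get(task_type, []),
                        key=lambda s: -s["confidence"])
        return ranked[:k]
```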

The Asynchronous Memory Pipeline

The most common mistake in production agents in 2026 is performing memory writes inline with the user-facing request. Episodic writes can stay inline (they are cheap appends), but semantic and procedural writes are LLM-driven and slow. Run them on a queue:

sequenceDiagram
    participant U as User
    participant A as Agent
    participant E as Episodic Store
    participant Q as Queue (NATS / SQS)
    participant W as Memory Worker
    participant S as Semantic + Procedural
    U->>A: Message
    A->>E: append event
    A->>U: response
    E-->>Q: emit event
    Q->>W: deliver
    W->>W: extract facts + skills
    W->>S: upsert

This keeps p95 latency low and makes memory enrichment idempotent and re-runnable.
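The worker side of that pipeline, sketched with Python's in-process `queue` standing in for NATS or SQS; `extract_facts` is a stub for the LLM-driven extraction step:

```python
import queue

def extract_facts(event: dict) -> list:
    """Stub: in production this is an LLM call over the episodic event."""
    return event.get("facts", [])

def memory_worker(events: queue.Queue, semantic_store: dict) -> None:
    """Drain episodic events and upsert distilled facts.

    Because upserts are keyed on (entity, relation), a redelivered
    event produces the same store state: enrichment is idempotent.
    """
    while True:
        event = events.get()
        if event is None:  # shutdown sentinel
            break
        for entity, relation, value in extract_facts(event):
            semantic_store[(entity, relation)] = value
        events.task_done()
```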

Forgetting and Conflicts

The hard parts in 2026 are not write or read — they are forgetting and conflict resolution. Three patterns are working in practice:

  • TTL on episodic: keep raw events for 30-90 days, then drop. The semantic store retains what mattered.
  • Provenance on semantic: every fact has the source episode IDs. When a contradicting fact arrives, run a tiny LLM judge to merge or supersede.
  • Versioned procedural: skills are versioned; failures decrement a confidence score; below a threshold, the skill is retired.
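The versioned-procedural pattern in the last bullet reduces to a confidence update; the decay rate, success bonus, and retirement floor below are illustrative values, not recommendations:

```python
def update_skill(skill: dict, succeeded: bool,
                 decay: float = 0.2, bonus: float = 0.05,
                 floor: float = 0.3) -> dict:
    """Failures decrement confidence; successes nudge it back up.

    Once confidence falls below the floor, the skill is flagged as
    retired and the retrieval layer should stop returning it.
    """
    if succeeded:
        skill["confidence"] = min(1.0, skill["confidence"] + bonus)
    else:
        skill["confidence"] = max(0.0, skill["confidence"] - decay)
    skill["retired"] = skill["confidence"] < floor
    return skill
```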

Open-Source Implementations Worth Studying

  • Letta (formerly MemGPT) — best reference for the OS-paging analogy applied to LLM context
  • Mem0 — production-ready, three-store implementation with graph backend
  • Zep — temporal knowledge graph as a service
  • Cognee — open-source memory engine with strong GraphRAG support
  • Graphiti — Neo4j-backed temporal graph from Zep, open source
