---
title: "Agent Memory Systems: Short-Term, Long-Term, and Episodic Memory for AI Agents"
description: "Technical deep dive into agent memory architectures covering conversation context, vector DB persistence, and experience replay with implementation code for production systems."
canonical: https://callsphere.ai/blog/agent-memory-systems-short-term-long-term-episodic-memory-ai-2026
category: "Learn Agentic AI"
tags: ["Agent Memory", "Memory Architecture", "Vector Database", "AI Agents", "Context Management"]
author: "CallSphere Team"
published: 2026-03-21T00:00:00.000Z
updated: 2026-05-07T03:28:56.906Z
---

# Agent Memory Systems: Short-Term, Long-Term, and Episodic Memory for AI Agents

> Technical deep dive into agent memory architectures covering conversation context, vector DB persistence, and experience replay with implementation code for production systems.

## Why Memory Transforms Agents from Stateless to Intelligent

A stateless AI agent answers each question in isolation. It cannot remember your name, your preferences, what you discussed yesterday, or the lessons it learned from past mistakes. This is the difference between a search engine and a colleague.

Memory is the architectural component that bridges this gap. By implementing structured memory systems, agents accumulate knowledge across conversations, learn from interactions, and provide increasingly personalized and accurate responses over time.

The human brain uses distinct memory systems — working memory for immediate context, long-term memory for persistent knowledge, and episodic memory for specific experiences. Production AI agents benefit from the same separation. Each type serves a different purpose, has different storage characteristics, and requires different retrieval strategies.

## Short-Term Memory: The Conversation Context

Short-term memory is the simplest form: it is the conversation history passed to the LLM with each request. Every message, tool call, and response in the current session forms the agent's immediate context.

The vector-store pipeline in the diagram below belongs to long-term memory retrieval, covered in the next section; short-term memory itself is just an in-process message list that travels with each request.

```mermaid
flowchart TD
    DOC(["Document"])
    CHUNK["Chunker<br/>recursive plus overlap"]
    EMB["Embedding model"]
    META["Attach metadata<br/>source, page, tenant"]
    INDEX[("HNSW or IVF index<br/>in vector store")]
    Q(["Query"])
    QEMB["Embed query"]
    SEARCH["ANN search<br/>cosine similarity"]
    FILTER["Metadata filter<br/>tenant or date"]
    HITS(["Top-k chunks"])
    DOC --> CHUNK --> EMB --> META --> INDEX
    Q --> QEMB --> SEARCH
    INDEX --> SEARCH --> FILTER --> HITS
    style INDEX fill:#4f46e5,stroke:#4338ca,color:#fff
    style HITS fill:#059669,stroke:#047857,color:#fff
```

```python
from dataclasses import dataclass, field
from typing import Any
import time

@dataclass
class Message:
    role: str  # "user", "assistant", "tool"
    content: str
    timestamp: float = field(default_factory=time.time)
    metadata: dict[str, Any] = field(default_factory=dict)

class ShortTermMemory:
    def __init__(self, max_tokens: int = 120_000):
        self.messages: list[Message] = []
        self.max_tokens = max_tokens

    def add(self, role: str, content: str, **metadata):
        self.messages.append(
            Message(role=role, content=content, metadata=metadata)
        )
        self._enforce_limit()

    def get_context(self) -> list[dict]:
        return [
            {"role": m.role, "content": m.content}
            for m in self.messages
        ]

    def _enforce_limit(self):
        """Sliding window: remove oldest messages when over limit."""
        total_tokens = sum(
            self._estimate_tokens(m.content) for m in self.messages
        )
        while total_tokens > self.max_tokens and len(self.messages) > 1:
            removed = self.messages.pop(0)
            total_tokens -= self._estimate_tokens(removed.content)

    def _estimate_tokens(self, text: str) -> int:
        # Rough estimate: 1 token per 4 characters
        return len(text) // 4

    def summarize_and_compress(self, summarizer_fn) -> str:
        """Compress older messages into a summary to save tokens."""
        if len(self.messages) <= 4:
            return ""
        older, recent = self.messages[:-4], self.messages[-4:]
        summary = summarizer_fn(
            "\n".join(f"{m.role}: {m.content}" for m in older)
        )
        self.messages = [
            Message(
                role="assistant",
                content=f"Summary of earlier conversation: {summary}",
            )
        ] + recent
        return summary
```

## Long-Term Memory: Persistent Knowledge Across Sessions

Long-term memory survives beyond a single session. Durable facts are embedded and written to a vector database, then retrieved by semantic similarity when they become relevant to a new query.

```python
import time

class LongTermMemory:
    def __init__(self, vector_store, embedding_fn, namespace: str):
        self.vector_store = vector_store  # vector DB client (Pinecone-style API)
        self.embedding_fn = embedding_fn  # async: text -> embedding vector
        self.namespace = namespace        # isolates memories per user or tenant

    async def store(self, memory_id: str, content: str, **metadata):
        """Embed a fact and persist it alongside its metadata."""
        embedding = await self.embedding_fn(content)
        await self.vector_store.upsert(
            id=memory_id,
            vector=embedding,
            metadata={
                "namespace": self.namespace,
                "content": content,
                "timestamp": time.time(),
                **metadata,
            },
        )

    async def recall(self, query: str, top_k: int = 5,
                     min_score: float = 0.7) -> list[dict]:
        """Retrieve relevant memories for a query."""
        query_embedding = await self.embedding_fn(query)
        results = await self.vector_store.query(
            vector=query_embedding,
            top_k=top_k,
            filter={"namespace": self.namespace},
            include_metadata=True,
        )
        return [
            {
                "content": r["metadata"]["content"],
                "score": r["score"],
                "timestamp": r["metadata"]["timestamp"],
            }
            for r in results
            if r["score"] >= min_score
        ]

    async def forget(self, memory_id: str):
        """Delete a specific memory (GDPR compliance)."""
        await self.vector_store.delete(ids=[memory_id])
```

### What to Store in Long-Term Memory

Not every message belongs in long-term memory. Store:

- **User preferences**: "I prefer Python over JavaScript", "My timezone is PST"
- **Key decisions**: "We decided to use PostgreSQL for the user service"
- **Learned facts**: "The company's fiscal year starts in April"
- **Interaction outcomes**: "The refund was processed successfully on 2026-03-15"

Do not store: casual acknowledgments, error messages, routine confirmations, or verbatim conversation logs.
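The selection rules above can be sketched as a gating function. The keyword patterns here are illustrative stand-ins; production systems often use a small LLM classifier instead of regexes.

```python
import re

# Hypothetical patterns — tune or replace with an LLM-based classifier.
STORE_PATTERNS = [
    r"\bI (?:prefer|like|always|never)\b",  # user preferences
    r"\bwe decided\b",                      # key decisions
    r"\bmy (?:timezone|name|email) is\b",   # stable user facts
]
SKIP_PATTERNS = [
    r"^(?:ok(?:ay)?|thanks?|got it)\b",     # casual acknowledgments
    r"\berror\b",                           # transient errors
]

def worth_remembering(message: str) -> bool:
    """Return True if a message looks like durable knowledge."""
    text = message.strip()
    if any(re.search(p, text, re.IGNORECASE) for p in SKIP_PATTERNS):
        return False
    return any(re.search(p, text, re.IGNORECASE) for p in STORE_PATTERNS)
```

Only messages that pass this gate get embedded and written to the vector store, which keeps retrieval quality high.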

### Retrieval Strategies

**Semantic search** retrieves memories whose embeddings are closest to the current query. This is the default and handles most cases well.

**Temporal weighting** boosts recent memories and decays older ones. Multiply the similarity score by a time decay factor: `score * decay_factor^(days_since_stored)`.

**Categorical filtering** uses metadata tags to narrow the search space. When the agent is handling a billing question, filter memories to the "billing" category before running semantic search.

## Episodic Memory: Learning from Experience

Episodic memory stores complete interaction episodes — the full sequence of events from initial request to resolution. Unlike long-term memory, which stores atomic facts, episodic memory preserves the narrative structure of past experiences.

```python
from dataclasses import dataclass, field
from typing import Any
import time

@dataclass
class Episode:
    episode_id: str
    trigger: str  # What initiated this episode
    steps: list[dict] = field(default_factory=list)
    outcome: str = ""  # "success", "failure", "escalation"
    lessons: list[str] = field(default_factory=list)
    duration_seconds: float = 0.0

class EpisodicMemory:
    def __init__(self, storage, embedding_fn):
        self.storage = storage
        self.embedding_fn = embedding_fn
        self.current_episode: Episode | None = None

    def start_episode(self, episode_id: str, trigger: str):
        self.current_episode = Episode(
            episode_id=episode_id, trigger=trigger
        )

    def record_step(self, action: str, result: Any,
                    reasoning: str = ""):
        if self.current_episode:
            self.current_episode.steps.append({
                "action": action,
                "result": str(result),
                "reasoning": reasoning,
                "timestamp": time.time(),
            })

    async def end_episode(self, outcome: str,
                          lessons: list[str] | None = None):
        if not self.current_episode:
            return
        self.current_episode.outcome = outcome
        self.current_episode.lessons = lessons or []
        if self.current_episode.steps:
            self.current_episode.duration_seconds = (
                self.current_episode.steps[-1]["timestamp"]
                - self.current_episode.steps[0]["timestamp"]
            )
        # Store episode for future retrieval
        episode_text = self._serialize_episode(self.current_episode)
        embedding = await self.embedding_fn(episode_text)
        await self.storage.store(
            id=self.current_episode.episode_id,
            embedding=embedding,
            data=self.current_episode.__dict__,
        )
        self.current_episode = None

    async def recall_similar_episodes(self, situation: str,
                                       top_k: int = 3) -> list[dict]:
        """Find past episodes similar to the current situation."""
        query_embedding = await self.embedding_fn(situation)
        return await self.storage.query(
            vector=query_embedding, top_k=top_k
        )

    def _serialize_episode(self, episode: Episode) -> str:
        steps_text = " -> ".join(
            s["action"] for s in episode.steps
        )
        return (
            f"Trigger: {episode.trigger}. "
            f"Steps: {steps_text}. "
            f"Outcome: {episode.outcome}. "
            f"Lessons: {'; '.join(episode.lessons)}"
        )
```

### Experience Replay

The most powerful use of episodic memory is experience replay: when the agent encounters a new situation, it retrieves similar past episodes and uses them as few-shot examples in its prompt.

```python
async def handle_with_experience(agent, user_message: str,
                                  episodic_memory: EpisodicMemory):
    similar = await episodic_memory.recall_similar_episodes(
        user_message, top_k=2
    )
    experience_context = ""
    if similar:
        experience_context = "\nRelevant past experiences:\n"
        for ep in similar:
            experience_context += (
                f"- Situation: {ep['trigger']}\n"
                f"  Approach: {' -> '.join(s['action'] for s in ep['steps'])}\n"
                f"  Outcome: {ep['outcome']}\n"
                f"  Lessons: {'; '.join(ep.get('lessons', []))}\n"
            )

    enhanced_prompt = f"{agent.instructions}\n{experience_context}"
    # Run agent with enhanced context
    return await agent.run(user_message, instructions=enhanced_prompt)
```

This pattern allows agents to improve over time without retraining. Failed episodes teach the agent to avoid certain approaches. Successful episodes reinforce effective strategies.

## Combining All Three Memory Types

A production agent uses all three memory types together:

1. **Short-term memory** holds the current conversation — the user's messages, tool results, and the agent's responses
2. **Long-term memory** is queried at the start of each conversation to inject relevant user preferences and past knowledge
3. **Episodic memory** is queried when the agent encounters a problem, providing past experiences as guidance

The memory orchestration layer decides which memories to inject and in what priority. A common pattern is to allocate token budgets: 60% for the current conversation (short-term), 25% for long-term knowledge, and 15% for episodic examples.
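A minimal sketch of that budget split, reusing the ~4-characters-per-token estimate from earlier; the function name and exact shares are illustrative:

```python
def allocate_context(conversation: list[str], knowledge: list[str],
                     episodes: list[str],
                     max_tokens: int = 100_000) -> dict[str, list[str]]:
    """Fill a 60/25/15 token budget across the three memory types."""
    est = lambda text: len(text) // 4  # rough token estimate
    shares = {"conversation": 0.60, "knowledge": 0.25, "episodes": 0.15}

    def fit(items: list[str], budget: int) -> list[str]:
        # Items are assumed pre-sorted by priority (e.g. recency or score).
        chosen, used = [], 0
        for item in items:
            cost = est(item)
            if used + cost > budget:
                break
            chosen.append(item)
            used += cost
        return chosen

    return {
        name: fit(items, int(max_tokens * share))
        for (name, share), items in zip(
            shares.items(), [conversation, knowledge, episodes]
        )
    }
```

Each list is trimmed independently, so an unusually long conversation cannot crowd out the long-term and episodic slots.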

## FAQ

### How do you handle memory conflicts between short-term and long-term?

Short-term memory always takes precedence. If the user said "I now prefer TypeScript" in the current conversation, that overrides a long-term memory saying "User prefers Python." After the conversation ends, the new preference should be stored in long-term memory, replacing or annotating the old one.
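That precedence rule reduces to a last-writer-wins merge over keyed facts. A minimal sketch; production systems typically version or annotate entries rather than silently overwriting:

```python
def merge_memories(long_term: dict[str, str],
                   short_term: dict[str, str]) -> dict[str, str]:
    """Facts from the current conversation override stored ones."""
    return {**long_term, **short_term}
```

After the session ends, the merged view can be written back so long-term storage reflects the updated preference.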

### What embedding model should you use for agent memory?

For most use cases, OpenAI's text-embedding-3-large or Cohere's embed-v4 provide the best balance of quality and cost. For high-throughput systems processing millions of memories, smaller models like text-embedding-3-small reduce latency and cost with minimal quality loss for retrieval tasks.

### How do you handle GDPR and data deletion for agent memories?

Every memory must be tagged with a user identifier. Implement a `forget_user(user_id)` function that deletes all memories associated with that user from both the vector store and any backing storage. This must include short-term conversation logs, long-term memory entries, and episodic records. Audit this functionality regularly.
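A hedged sketch of such a function, assuming each backing store exposes an async `delete_where(user_id)` that returns the number of records removed (real vector DBs offer equivalent metadata-filtered deletes):

```python
import asyncio

async def forget_user(user_id: str, *stores) -> int:
    """Purge every memory tagged with user_id from each backing store.

    Pass the vector store, episodic store, and conversation log so one
    call covers all three memory types; returns total records deleted.
    """
    deleted = 0
    for store in stores:
        deleted += await store.delete_where(user_id)
    return deleted
```

Returning a count makes the deletion auditable: log it, and alert if a later sweep for the same user finds anything left behind.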

### Does episodic memory actually improve agent performance?

Yes, measurably. In A/B tests across customer support and coding assistant use cases, agents with episodic memory show 15-25% higher task completion rates and 30% fewer repeated errors compared to agents with only short-term and long-term memory. The key is curating high-quality episodes — storing every interaction degrades retrieval quality.
