---
title: "Hierarchical Memory for AI Agents: Working Memory, Short-Term, and Long-Term Tiers"
description: "Learn how to design a three-tier memory architecture for AI agents with working memory, short-term buffers, and long-term stores, including promotion rules, eviction policies, and retrieval priority."
canonical: https://callsphere.ai/blog/hierarchical-memory-ai-agents-working-short-long-term-tiers
category: "Learn Agentic AI"
tags: ["Agent Memory", "Memory Architecture", "Working Memory", "Python", "Agentic AI"]
author: "CallSphere Team"
published: 2026-03-17T00:00:00.000Z
updated: 2026-05-06T01:02:44.490Z
---

# Hierarchical Memory for AI Agents: Working Memory, Short-Term, and Long-Term Tiers

> Learn how to design a three-tier memory architecture for AI agents with working memory, short-term buffers, and long-term stores, including promotion rules, eviction policies, and retrieval priority.

## Why a Single Memory Store Falls Short

Most agent frameworks treat memory as a flat list. Every fact, observation, and user message lives in one undifferentiated pool. This works for toy demos, but in production the agent slows down as the memory grows, retrieval quality degrades, and context windows overflow with irrelevant details.

Human cognition solves this with hierarchical memory. Working memory holds the immediate task context. Short-term memory retains recent interactions. Long-term memory stores consolidated knowledge built up over days and weeks. An AI agent benefits from the same layered approach.

## The Three-Tier Model

The hierarchy consists of three tiers, each with distinct capacity, retention, and retrieval characteristics.

```mermaid
flowchart TD
    MSG(["New message"])
    WORKING["Working memory
rolling window"]
    EPISODIC[("Episodic memory
past sessions")]
    SEMANTIC[("Semantic memory
facts and preferences")]
    SUM["Summarizer
compresses old turns"]
    ROUTER{"Retrieve
needed memories"}
    PROMPT["Assembled context"]
    LLM["LLM"]
    UPD["Memory updater
writes new facts"]
    MSG --> WORKING --> ROUTER
    ROUTER -->|Past sessions| EPISODIC
    ROUTER -->|User facts| SEMANTIC
    EPISODIC --> SUM --> PROMPT
    SEMANTIC --> PROMPT
    WORKING --> PROMPT --> LLM --> UPD
    UPD --> EPISODIC
    UPD --> SEMANTIC
    style ROUTER fill:#4f46e5,stroke:#4338ca,color:#fff
    style LLM fill:#f59e0b,stroke:#d97706,color:#1f2937
    style EPISODIC fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style SEMANTIC fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
```

**Working Memory** holds the current task context. It is small, fast, and completely replaced when the agent switches tasks. Think of it as the agent's scratchpad.

**Short-Term Memory** retains recent conversation turns and observations. It has a fixed window size and uses a FIFO eviction policy. Items that prove important get promoted to long-term storage.

**Long-Term Memory** stores consolidated facts, user preferences, and learned patterns. It persists across sessions and uses semantic search for retrieval.

```python
from dataclasses import dataclass, field
from datetime import datetime
from collections import deque

@dataclass
class MemoryItem:
    content: str
    timestamp: datetime
    importance: float = 0.5
    access_count: int = 0
    metadata: dict = field(default_factory=dict)

class HierarchicalMemory:
    def __init__(
        self,
        working_capacity: int = 5,
        short_term_capacity: int = 50,
    ):
        self.working: list[MemoryItem] = []
        self.short_term: deque[MemoryItem] = deque(
            maxlen=short_term_capacity
        )
        self.long_term: list[MemoryItem] = []
        self.working_capacity = working_capacity
        self.promotion_threshold = 0.7

    def add_to_working(self, content: str, importance: float = 0.5):
        item = MemoryItem(
            content=content,
            timestamp=datetime.now(),
            importance=importance,
        )
        self.working.append(item)
        if len(self.working) > self.working_capacity:
            evicted = self.working.pop(0)
            # deque(maxlen=...) drops its oldest entry silently when full,
            # so run the promotion check on the item about to fall off first
            if len(self.short_term) == self.short_term.maxlen:
                self.promote_to_long_term(self.short_term[0])
            self.short_term.append(evicted)

    def promote_to_long_term(self, item: MemoryItem):
        """Promote important short-term memories."""
        if item.importance >= self.promotion_threshold:
            self.long_term.append(item)
            return True
        return False

    def sweep_short_term(self):
        """Review short-term memories for promotion."""
        promoted = []
        remaining = deque(maxlen=self.short_term.maxlen)
        for item in self.short_term:
            if self.promote_to_long_term(item):
                promoted.append(item)
            else:
                remaining.append(item)
        self.short_term = remaining
        return promoted
```
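One detail worth calling out: the short-term tier's FIFO eviction comes for free from `deque(maxlen=...)`, which discards the oldest entry silently once full — convenient, but it means any promotion check has to run before the `append`:

```python
from collections import deque

# deque(maxlen=...) evicts the oldest entry silently on overflow —
# exactly the FIFO behavior the short-term tier relies on
short_term = deque(maxlen=3)
for turn in ["turn 1", "turn 2", "turn 3", "turn 4"]:
    short_term.append(turn)

print(list(short_term))  # ['turn 2', 'turn 3', 'turn 4']
```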

## Promotion Rules

Promotion from short-term to long-term should not be arbitrary. Three signals determine whether a memory deserves long-term storage.

**Importance score** — memories tagged with high importance during creation (user preferences, explicit instructions) are promoted immediately.

**Access frequency** — if the agent retrieves a short-term memory multiple times, it is clearly useful and should be promoted.

**Recency-weighted relevance** — memories that remain relevant after multiple conversation turns have proven their staying power.

```python
def should_promote(self, item: MemoryItem) -> bool:
    importance_signal = item.importance >= self.promotion_threshold
    access_signal = item.access_count >= 3
    age_seconds = (datetime.now() - item.timestamp).total_seconds()
    survived_long = age_seconds > 300 and item.access_count > 0
    return importance_signal or access_signal or survived_long
```
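The three signals can be exercised in isolation. Here is a standalone version with a minimal stand-in for `MemoryItem`; the function body mirrors `should_promote` above, with the threshold made an explicit parameter:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Item:
    importance: float
    timestamp: datetime
    access_count: int = 0

def should_promote(item: Item, threshold: float = 0.7) -> bool:
    importance_signal = item.importance >= threshold
    access_signal = item.access_count >= 3
    age_seconds = (datetime.now() - item.timestamp).total_seconds()
    survived_long = age_seconds > 300 and item.access_count > 0
    return importance_signal or access_signal or survived_long

now = datetime.now()
# Explicit user preference: promoted on importance alone
print(should_promote(Item(0.9, now)))                  # True
# Low importance but retrieved three times: promoted on access frequency
print(should_promote(Item(0.2, now, access_count=3)))  # True
# Ten minutes old and accessed once: promoted on staying power
print(should_promote(Item(0.2, now - timedelta(minutes=10), access_count=1)))  # True
# Fresh, unimportant, never accessed: stays in short-term
print(should_promote(Item(0.2, now)))                  # False
```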

## Eviction Policies

Each tier needs a different eviction strategy.

**Working memory** uses strict replacement — when a new task begins, the entire working memory is flushed.

**Short-term memory** uses FIFO with a promotion check: before an item is evicted, the system evaluates whether it should be promoted.

**Long-term memory** uses importance-decay eviction — items that have not been accessed in a long time and have low importance are candidates for removal.

```python
def evict_long_term(self, max_items: int = 1000):
    """Drop low-importance, stale items once capacity is exceeded."""
    if len(self.long_term) <= max_items:
        return
    now = datetime.now()

    def decay_score(item: MemoryItem) -> float:
        # Favor important, frequently accessed, recent memories;
        # the weights are illustrative defaults, tune for your workload
        age_days = (now - item.timestamp).total_seconds() / 86400
        return item.importance + 0.1 * item.access_count - 0.05 * age_days

    # Keep the top-scoring items; everything below the cutoff is evicted
    self.long_term.sort(key=decay_score, reverse=True)
    self.long_term = self.long_term[:max_items]
```

## Retrieval Priority

Retrieval cascades through the tiers in priority order: working memory first, then short-term, then long-term. Because results are appended in that order, the freshest, most task-relevant memories rank highest.

```python
def retrieve(self, query: str, top_k: int = 5) -> list[MemoryItem]:
    results = []
    # Tier 1: working memory — exact substring match
    for item in self.working:
        if query.lower() in item.content.lower():
            item.access_count += 1
            results.append(item)

    # Tier 2: short-term — recency bias
    for item in sorted(
        self.short_term,
        key=lambda m: m.timestamp,
        reverse=True,
    ):
        if query.lower() in item.content.lower():
            item.access_count += 1
            results.append(item)

    # Tier 3: long-term — would use embedding similarity
    # in production; simplified here for clarity
    for item in self.long_term:
        if query.lower() in item.content.lower():
            item.access_count += 1
            results.append(item)

    return results[:top_k]
```
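Stripped of the class machinery, the cascade's priority ordering is just tier order — a toy run with plain lists standing in for the three stores:

```python
working = ["current task: book a demo call"]
short_term = ["user mentioned a Tuesday deadline", "agent checked the calendar"]
long_term = ["user prefers morning meetings"]

def retrieve(query: str, top_k: int = 5) -> list[str]:
    # Tier order encodes priority: working-memory hits outrank
    # short-term hits, which outrank long-term hits
    results = []
    for tier in (working, short_term, long_term):
        results += [m for m in tier if query.lower() in m.lower()]
    return results[:top_k]

print(retrieve("user"))
# ['user mentioned a Tuesday deadline', 'user prefers morning meetings']
print(retrieve("task"))
# ['current task: book a demo call']
```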

## FAQ

### Why not just use a vector database for everything?

A vector database is excellent for long-term semantic retrieval, but it adds latency. Working memory and short-term memory benefit from in-process data structures that return results in microseconds. The hierarchical approach lets you use the right storage engine for each tier.
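To make the latency point concrete, a quick (machine-dependent) measurement of an in-process short-term scan — by contrast, a vector database query over the network typically costs milliseconds before any similarity math runs:

```python
import time

short_term = [f"memory item {i}" for i in range(100)]

start = time.perf_counter()
hits = [m for m in short_term if "item 42" in m]
elapsed_us = (time.perf_counter() - start) * 1_000_000

print(hits)  # ['memory item 42']
print(f"scan took ~{elapsed_us:.0f} µs")  # microseconds, not milliseconds
```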

### How do I decide the capacity for each tier?

Working memory should match the context needed for a single task — typically 3 to 10 items. Short-term memory should cover a full conversation session, usually 30 to 100 items. Long-term capacity depends on your storage budget, but start with 1,000 items and add eviction when you exceed it.

### Can I persist all three tiers across agent restarts?

Working memory is ephemeral by design and should be rebuilt from the current task state. Short-term memory can be serialized to a session store like Redis with a TTL. Long-term memory should always be persisted to a database or vector store.
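For the short-term step, one hedged sketch using only the standard library: serialize items as JSON with an explicit expiry stamp, mimicking a Redis TTL. `SESSION_TTL`, `dump_short_term`, and `load_short_term` are illustrative names, and this trimmed-down `MemoryItem` stores its timestamp as an ISO string so it round-trips through JSON:

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timedelta

SESSION_TTL = timedelta(hours=1)  # assumed session lifetime

@dataclass
class MemoryItem:
    content: str
    timestamp: str  # ISO 8601 string, JSON-serializable
    importance: float = 0.5

def dump_short_term(items: list[MemoryItem]) -> str:
    expires = (datetime.now() + SESSION_TTL).isoformat()
    return json.dumps({"expires": expires, "items": [asdict(i) for i in items]})

def load_short_term(blob: str) -> list[MemoryItem]:
    payload = json.loads(blob)
    if datetime.fromisoformat(payload["expires"]) < datetime.now():
        return []  # session expired — start with an empty short-term buffer
    return [MemoryItem(**d) for d in payload["items"]]

blob = dump_short_term([MemoryItem("user prefers email", datetime.now().isoformat())])
print(load_short_term(blob)[0].content)  # user prefers email
```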

---

#AgentMemory #MemoryArchitecture #WorkingMemory #Python #AgenticAI #LearnAI #AIEngineering

---

Source: https://callsphere.ai/blog/hierarchical-memory-ai-agents-working-short-long-term-tiers
