
Time-Decay Memory for Chat Agents: Ebbinghaus Curves in Practice

Good agent memory needs to forget. Time-decay weights recent memories higher; Ebbinghaus-style curves auto-evict stale entries; TTL tiers keep allergies forever and small-talk for an hour.

TL;DR — Agents that never forget end up flooded with stale, irrelevant context. Time-decay memory weights recent memories higher (exponential decay on recency), uses TTL tiers for category-specific lifetimes ("dietary allergies" = forever; "today's mood" = 24 hours), and auto-evicts low-utility entries. Best-in-class agents in 2026 use Ebbinghaus-curve decay with reinforcement on recall.

The technique

Naive memory: dump every turn into a vector store, retrieve top-K each time. Three failures: (1) stale facts (the user moved cities a year ago); (2) salience inversion (the agent prefers a single vivid memory over a more recent contradicting one); (3) cost (memory grows without bound).

Time-decay memory multiplies semantic similarity by a recency function: score = sim * exp(-lambda * age). lambda controls half-life; longer half-lives for stable facts, shorter for volatile state.
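A minimal sketch of that recency rule, assuming ages and half-lives are expressed in days (the helper names are illustrative, not part of any library):

import math

def lambda_for_half_life(half_life_days: float) -> float:
    # exp(-lambda * half_life) = 0.5  =>  lambda = ln(2) / half_life
    return math.log(2) / half_life_days

def decayed_score(similarity: float, age_days: float, lam: float) -> float:
    # Recency-weighted relevance: similarity shrinks exponentially with age.
    return similarity * math.exp(-lam * age_days)

# Same similarity, same 30-day-old memory, different half-lives:
decayed_score(0.8, age_days=30, lam=lambda_for_half_life(90))  # stable fact, ~0.63
decayed_score(0.8, age_days=30, lam=lambda_for_half_life(1))   # volatile state, ~0.0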

Ebbinghaus-curve memory goes further: each memory has a continuous decay rate. Successful recalls reinforce the memory (push the curve out); unused memories decay and eventually evict.

Hear it before you finish reading

Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.

Try Live Demo →
flowchart LR
  T[New turn] --> EX[Extract facts]
  EX --> TT{TTL tier}
  TT -->|allergy| INF[Infinite TTL]
  TT -->|preference| LONG[1y TTL]
  TT -->|context| SHORT[7d TTL]
  TT -->|chat-only| SES[session]
  INF --> S[(Memory store)]
  LONG --> S
  SHORT --> S
  Q[Query] --> R[Retrieve]
  R --> SC[score = sim * exp -lambda*age]
  SC --> RE[Reinforce on hit]
  RE --> S

How it works

Each memory entry: { id, text, embedding, created_at, last_accessed_at, ttl_tier, decay_lambda, hit_count }. At write time, an LLM tags the fact with a TTL tier (immutable / long / short / session) and an initial decay_lambda. At retrieval, the score is cos(q, m.embedding) * exp(-m.decay_lambda * age), where age = now - m.last_accessed_at measured in days to match the per-day lambdas. On a hit, last_accessed_at updates, hit_count increments, and decay_lambda decreases (the memory hardens). A nightly job evicts entries where exp(-lambda * age) < 0.05 and hit_count == 0.
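A minimal sketch of that entry shape, assuming epoch-second timestamps and per-day lambdas (the field names mirror the list above; this is not CallSphere's actual schema):

from dataclasses import dataclass

@dataclass
class MemoryEntry:
    id: str
    text: str
    embedding: list[float]
    created_at: float          # epoch seconds
    last_accessed_at: float    # epoch seconds; bumped on every successful recall
    ttl_tier: str              # "immutable" | "long" | "short" | "session"
    decay_lambda: float        # per-day forgetting rate; 0.0 = never decays
    hit_count: int = 0         # successful recalls; the nightly evictor checks this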

CallSphere implementation

Every CallSphere voice/chat agent runs time-decay memory:

  • Allergies + insurance numbers in Healthcare = infinite TTL
  • Preferred broker / preferred school district in OneRoof = 1-year TTL
  • Last 5 ticket subjects in UrackIT IT helpdesk = 30-day TTL
  • Mood, current task, in-call context = session-only

Decay parameters live per vertical. Healthcare's medication-allergy memory has lambda = 0 (immutable). Real-estate buyer urgency ("we want to close in 30 days") has lambda = 0.05/day so it fades after the buying window.
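One way such per-vertical defaults could be laid out, with lambdas per day; the table and helper below are illustrative, echoing the tiers above rather than describing CallSphere's actual configuration:

# Hypothetical per-vertical decay table: fact type -> (ttl_tier, decay_lambda per day).
VERTICAL_DECAY = {
    "healthcare": {
        "medication_allergy": ("immutable", 0.0),
        "insurance_number":   ("immutable", 0.0),
    },
    "oneroof_real_estate": {
        "preferred_broker":   ("long", 0.001),
        "school_district":    ("long", 0.001),
        "buyer_urgency":      ("short", 0.05),   # fades over the ~30-day buying window
    },
    "urackit_helpdesk": {
        "recent_ticket_subjects": ("short", 0.01),
    },
}

def decay_params(vertical: str, fact_type: str):
    # Unmapped fact types fall back to the volatile session tier.
    return VERTICAL_DECAY.get(vertical, {}).get(fact_type, ("session", 0.1))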

37 agents · 90+ tools · 115+ DB tables · 6 verticals. $149/$499/$1499, 14-day trial, 22% affiliate. See multi-turn memory at work on /demo.

Still reading? Stop comparing — try CallSphere live.

CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.

Build steps with code

import math, time

# (decay_lambda per day, hard TTL in seconds)
TTL_TIERS = {
    "immutable": (0.0, None),          # never evict, no decay
    "long":      (0.001, 365*86400),   # 1 year
    "short":     (0.01, 30*86400),     # 30 days
    "session":   (0.1, 86400),         # 24 hours
}

def write_memory(text):
    tier = classify_ttl(text)        # LLM call: returns one of TTL_TIERS keys
    lam, ttl = TTL_TIERS[tier]
    db.insert("memory", {
        "text": text, "embedding": embed(text),
        "created_at": time.time(), "last_accessed_at": time.time(),
        "ttl_tier": tier, "decay_lambda": lam, "hit_count": 0,
    })

def retrieve(q, top_k=5):
    cands = vector_search(embed(q), k=50)   # candidates arrive with .cos_sim populated
    now = time.time()
    scored = [
        # decay_lambda is per day, timestamps are epoch seconds: convert the age
        (m, m.cos_sim * math.exp(-m.decay_lambda * (now - m.last_accessed_at) / 86400))
        for m in cands
    ]
    top = sorted(scored, key=lambda x: -x[1])[:top_k]
    for m, _ in top:
        db.update("memory", m.id, {
            "last_accessed_at": now,
            "hit_count": m.hit_count + 1,
            "decay_lambda": m.decay_lambda * 0.9,  # reinforce: the curve flattens on recall
        })
    return [m for m, _ in top]
  1. LLM-classify TTL on write. The classifier is the silent ranker.
  2. Reinforce on retrieval; do not just return — update.
  3. Run a nightly evictor for hit_count == 0 and effective_score < 0.05 (sketched below, after this list).
  4. Cap memory size per user; spillover evicts oldest session-tier first.
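A sketch of steps 3 and 4, assuming memories are stored per user and reusing the TTL_TIERS table above; the db.select/db.delete helpers, MAX_PER_USER, and EVICT_SCORE values are placeholders:

import math, time

EVICT_SCORE = 0.05
MAX_PER_USER = 2000

def nightly_evict(user_id):
    now = time.time()
    for m in db.select("memory", user_id=user_id):
        _, ttl = TTL_TIERS[m.ttl_tier]
        # Hard floor: past the tier's TTL, the entry goes regardless of score.
        if ttl is not None and (now - m.created_at) > ttl:
            db.delete("memory", m.id)
            continue
        # Soft eviction: decayed to noise and never recalled.
        age_days = (now - m.last_accessed_at) / 86400
        if m.hit_count == 0 and math.exp(-m.decay_lambda * age_days) < EVICT_SCORE:
            db.delete("memory", m.id)

def cap_memories(user_id):
    # Step 4: on overflow, evict the oldest session-tier entries first.
    entries = db.select("memory", user_id=user_id)
    overflow = len(entries) - MAX_PER_USER
    if overflow <= 0:
        return
    victims = sorted(
        (m for m in entries if m.ttl_tier == "session"),
        key=lambda m: m.last_accessed_at,
    )[:overflow]
    for m in victims:
        db.delete("memory", m.id)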

Pitfalls

  • Wrong TTL classifier: tagging "I love pizza" as immutable pollutes future calls. Calibrate.
  • Decay too aggressive: agent forgets a real allergy. Always test on a golden set.
  • No staleness detection: a "highly retrieved" memory is not necessarily correct. Add explicit contradiction handling.
  • Reinforcement loop: mis-classified memory keeps getting hit, never decays. Add a max_hit_count guardrail (a sketch follows this list).
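One way to wire that guardrail into the reinforcement step from retrieve() above; MAX_HIT_COUNT, LAMBDA_FLOOR, and the standalone reinforce() helper are illustrative:

MAX_HIT_COUNT = 50      # hypothetical ceiling on how often a memory can be reinforced
LAMBDA_FLOOR = 0.001    # never harden a non-immutable memory below the "long" tier's rate

def reinforce(m, now):
    new_lambda = m.decay_lambda
    if m.hit_count < MAX_HIT_COUNT and m.ttl_tier != "immutable":
        # Reinforce, but stop hardening once the hit cap is reached and
        # never let decay_lambda collapse all the way to zero.
        new_lambda = max(m.decay_lambda * 0.9, LAMBDA_FLOOR)
    db.update("memory", m.id, {
        "last_accessed_at": now,
        "hit_count": min(m.hit_count + 1, MAX_HIT_COUNT),
        "decay_lambda": new_lambda,
    })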

FAQ

Decay or TTL? Both. TTL is the floor (mass eviction), decay is the score modifier.

Embedding store or graph? Hybrid — embedding for fuzzy recall, graph for entity-heavy recalls. See vw6g-15 on graph memory.

Per-user or global? Per-user always. Cross-user memory is a privacy violation.

Cost? ~$0.001 per memory write (the TTL classifier). Cheap.

See it on /demo? Yes — the multi-turn demo logs decay scores in the trace panel.


Try CallSphere AI Voice Agents

See how AI voice agents work for your industry. Live demo available -- no signup required.