Time-Decay Memory for Chat Agents: Ebbinghaus Curves in Practice
Good agent memory needs to forget. Time-decay weights recent memories higher; Ebbinghaus-style curves auto-evict stale entries; TTL tiers keep allergies forever and small-talk for an hour.
TL;DR — Agents that never forget end up flooded with stale, irrelevant context. Time-decay memory weights recent memories higher (exponential decay on recency), uses TTL tiers for category-specific lifetimes ("dietary allergies" = forever; "today's mood" = 24 hours), and auto-evicts low-utility entries. The 2026 best-of-class agents use Ebbinghaus-curve decay with reinforcement on recall.
The technique
Naive memory: dump every turn into a vector store, retrieve top-K each time. Three failures: (1) stale facts (the user moved cities a year ago); (2) salience inversion (the agent prefers a single vivid memory over a more recent contradicting one); (3) cost (memory grows without bound).
Time-decay memory multiplies semantic similarity by a recency function: score = sim * exp(-lambda * age). lambda is the decay rate, and it sets the half-life (half-life = ln 2 / lambda): longer half-lives for stable facts, shorter half-lives for volatile state.
Ebbinghaus-curve memory goes further: each memory has a continuous decay rate. Successful recalls reinforce the memory (push the curve out); unused memories decay and eventually evict.
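In code, the two rules are small. Here is a sketch; the 0.9 reinforcement factor matches the build-steps snippet further down, everything else is illustrative:

import math

def retention(decay_lambda: float, age: float) -> float:
    # Ebbinghaus-style forgetting curve: retention falls off exponentially with age
    return math.exp(-decay_lambda * age)

def decayed_score(similarity: float, decay_lambda: float, age: float) -> float:
    # time-decay retrieval: semantic similarity weighted by retention
    return similarity * retention(decay_lambda, age)

def reinforce(decay_lambda: float, factor: float = 0.9) -> float:
    # a successful recall pushes the curve out: smaller lambda, slower decay
    return decay_lambda * factor

def half_life(decay_lambda: float) -> float:
    # time for retention to drop to 0.5 (same time units as lambda)
    return math.inf if decay_lambda == 0 else math.log(2) / decay_lambda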
Hear it before you finish reading
Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.
flowchart LR
T[New turn] --> EX[Extract facts]
EX --> TT{TTL tier}
TT -->|allergy| INF[Infinite TTL]
TT -->|preference| LONG[1y TTL]
TT -->|context| SHORT[7d TTL]
TT -->|chat-only| SES[session]
INF --> S[(Memory store)]
LONG --> S
SHORT --> S
Q[Query] --> R[Retrieve]
R --> SC[score = sim * exp -lambda*age]
SC --> RE[Reinforce on hit]
RE --> S
How it works
Each memory entry: { id, text, embedding, created_at, last_accessed_at, ttl_tier, decay_lambda, hit_count }. At write time, an LLM tags the fact with a TTL tier (immutable / long / short / session) and an initial decay_lambda. At retrieval, the score is cos(q, m.embedding) * exp(-m.decay_lambda * (now - m.last_accessed_at)). On a hit, last_accessed_at updates; hit_count increments; decay_lambda decreases (memory hardens). A nightly job evicts entries where exp(-lambda * age) < 0.05 and hit_count == 0.
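A minimal sketch of that entry shape and the scoring step (the dataclass and the effective_score name are illustrative, not a specific library's API):

import math
from dataclasses import dataclass

@dataclass
class MemoryEntry:
    id: str
    text: str
    embedding: list[float]
    created_at: float
    last_accessed_at: float
    ttl_tier: str                 # "immutable" / "long" / "short" / "session"
    decay_lambda: float           # decay rate; units must match the age you pass in
    hit_count: int = 0

def effective_score(entry: MemoryEntry, cosine_sim: float, now: float) -> float:
    # semantic similarity discounted by how long the entry has gone un-recalled
    age = now - entry.last_accessed_at
    return cosine_sim * math.exp(-entry.decay_lambda * age)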
CallSphere implementation
Every CallSphere voice/chat agent runs time-decay memory:
- Allergies + insurance numbers in Healthcare = infinite TTL
- Preferred broker / preferred school district in OneRoof = 1-year TTL
- Last 5 ticket subjects in UrackIT IT helpdesk = 30-day TTL
- Mood, current task, in-call context = session-only
Decay parameters live per vertical. Healthcare's medication-allergy memory has lambda = 0 (immutable). Real-estate buyer urgency ("we want to close in 30 days") has lambda = 0.05/day so it fades after the buying window.
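Illustratively, a per-vertical config could look like the sketch below; the keys and numbers mirror the examples above and are not CallSphere's actual settings:

# lambda in per-day units, matching the 0.05/day buyer-urgency example
VERTICAL_DECAY = {
    "healthcare": {
        "medication_allergy":    {"decay_lambda": 0.0,   "ttl_tier": "immutable"},
    },
    "real_estate": {
        "buyer_urgency":         {"decay_lambda": 0.05,  "ttl_tier": "short"},
        "preferred_broker":      {"decay_lambda": 0.001, "ttl_tier": "long"},
    },
    "it_helpdesk": {
        "recent_ticket_subject": {"decay_lambda": 0.01,  "ttl_tier": "short"},
    },
}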
37 agents · 90+ tools · 115+ DB tables · 6 verticals. $149/$499/$1499, 14-day trial, 22% affiliate. See multi-turn memory at work on /demo.
Still reading? Stop comparing — try CallSphere live.
CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.
Build steps with code
import math, time

TTL_TIERS = {
    "immutable": (0.0,   None),        # never evict, no decay
    "long":      (0.001, 365*86400),   # 1 year
    "short":     (0.01,  30*86400),    # 30 days
    "session":   (0.1,   86400),       # 24 hours
}

def write_memory(text):
    tier = classify_ttl(text)    # LLM call: returns one of TTL_TIERS keys
    lam, ttl = TTL_TIERS[tier]   # ttl is the tier's mass-eviction floor; only lam is stored here
    db.insert("memory", {
        "text": text, "embedding": embed(text),
        "created_at": time.time(), "last_accessed_at": time.time(),
        "ttl_tier": tier, "decay_lambda": lam, "hit_count": 0,
    })

def retrieve(q, top_k=5):
    cands = vector_search(embed(q), k=50)   # candidates carry .cos_sim from the vector store
    now = time.time()
    scored = [
        (m, m.cos_sim * math.exp(-m.decay_lambda * (now - m.last_accessed_at)))
        for m in cands
    ]
    top = sorted(scored, key=lambda x: -x[1])[:top_k]
    for m, _ in top:
        db.update("memory", m.id, {
            "last_accessed_at": now,
            "hit_count": m.hit_count + 1,
            "decay_lambda": m.decay_lambda * 0.9,  # reinforce: each hit slows future decay
        })
    return [m for m, _ in top]
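A quick usage sketch, assuming the embed, db, vector_search, and classify_ttl stubs above:

write_memory("Allergic to penicillin")          # classifier should tag this immutable
write_memory("Wants to close within 30 days")   # short tier: fades after the buying window

for m in retrieve("any medication allergies?"):
    print(m.text)                               # recently reinforced facts outrank stale ones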
- LLM-classify TTL on write. The classifier is the silent ranker.
- Reinforce on retrieval; do not just return — update.
- Run a nightly evictor for hit_count == 0 and effective_score < 0.05 (see the sketch after this list).
- Cap memory size per user; spillover evicts oldest session-tier first.
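A minimal sketch of that nightly evictor, assuming the same fields as the snippet above (db.scan and db.delete are placeholders for your store's calls):

import math, time

def nightly_evict(min_retention=0.05):
    now = time.time()
    for m in db.scan("memory"):               # placeholder: iterate every entry
        if m.ttl_tier == "immutable":
            continue                          # allergies, insurance numbers: never evict
        retention = math.exp(-m.decay_lambda * (now - m.last_accessed_at))
        if m.hit_count == 0 and retention < min_retention:
            db.delete("memory", m.id)         # stale and never recalled: drop it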
Pitfalls
- Wrong TTL classifier: tagging "I love pizza" as immutable pollutes future calls. Calibrate.
- Decay too aggressive: agent forgets a real allergy. Always test on a golden set.
- No staleness detection: a "highly retrieved" memory is not necessarily correct. Add explicit contradiction handling.
- Reinforcement loop: a mis-classified memory keeps getting hit and never decays. Add a max_hit_count guardrail (sketch below).
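One way to add that guardrail (a sketch; MAX_HIT_COUNT and LAMBDA_FLOOR are illustrative values, not from the snippet above):

MAX_HIT_COUNT = 20      # stop hardening a memory after this many recalls
LAMBDA_FLOOR = 1e-4     # non-immutable memories never become effectively immutable

def reinforced_lambda(decay_lambda, hit_count, factor=0.9):
    if hit_count >= MAX_HIT_COUNT:
        return decay_lambda                   # cap reached: no further reinforcement
    return max(decay_lambda * factor, LAMBDA_FLOOR)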
FAQ
Decay or TTL? Both. TTL is the floor (mass eviction), decay is the score modifier.
Embedding store or graph? Hybrid — embedding for fuzzy recall, graph for entity-heavy recalls. See vw6g-15 on graph memory.
Per-user or global? Per-user always. Cross-user memory is a privacy violation.
Cost? ~$0.001 per memory write (the TTL classifier). Cheap.
See it on /demo? Yes — the multi-turn demo logs decay scores in the trace panel.
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available, no signup required.