By Sagar Shankaran, Founder of CallSphere
Good agent memory needs to forget. Time-decay weights recent memories higher; Ebbinghaus-style curves auto-evict stale entries; TTL tiers keep allergies forever and small-talk for an hour.
Key takeaways
TL;DR — Agents that never forget end up flooded with stale, irrelevant context. Time-decay memory weights recent memories higher (exponential decay on recency), uses TTL tiers for category-specific lifetimes ("dietary allergies" = forever; "today's mood" = 24 hours), and auto-evicts low-utility entries. The 2026 best-of-class agents use Ebbinghaus-curve decay with reinforcement on recall.
Naive memory: dump every turn into a vector store, retrieve top-K each time. Three failures: (1) stale facts (the user moved cities a year ago); (2) salience inversion (the agent prefers a single vivid memory over a more recent contradicting one); (3) cost (memory grows without bound).
Time-decay memory multiplies semantic similarity by a recency function: score = sim * exp(-lambda * age). lambda sets the half-life (ln 2 / lambda, in the same time unit as age): use a small lambda (long half-life) for stable facts and a large lambda (short half-life) for volatile state.
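As a minimal sketch of the formula above, assuming age is measured in days: the function names (half_life_days, decayed_score) and the sample numbers are illustrative, not taken from any CallSphere code.

```python
import math

def half_life_days(lam: float) -> float:
    """Days until the recency weight exp(-lam * age) drops to 0.5."""
    return math.inf if lam == 0 else math.log(2) / lam

def decayed_score(sim: float, lam: float, age_days: float) -> float:
    """Semantic similarity scaled by an exponential recency weight."""
    return sim * math.exp(-lam * age_days)

# A slightly less similar but fresh memory can outrank an older, closer match.
old = decayed_score(sim=0.90, lam=0.05, age_days=60)  # stale fact
new = decayed_score(sim=0.80, lam=0.05, age_days=1)   # recent fact
print(new > old)
```

With lam = 0 the weight stays at 1, so the score reduces to plain similarity; this is how an immutable fact would behave under the same function.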
Ebbinghaus-curve memory goes further: each memory carries its own decay rate, which changes over time. A successful recall reinforces the memory (lowers its decay rate, pushing the curve out); unused memories keep decaying and are eventually evicted.
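One way to picture reinforcement on recall is the sketch below. It assumes a per-day decay rate and a multiplicative hardening factor of 0.9; the Trace class and its method names are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Trace:
    decay_lambda: float     # per-day decay rate (assumed unit)
    last_recall_day: float  # day of the most recent recall, or of creation

    def retention(self, today: float) -> float:
        """Forgetting-curve weight: 1.0 right after a recall, falling toward 0."""
        return math.exp(-self.decay_lambda * (today - self.last_recall_day))

    def recall(self, today: float, harden: float = 0.9) -> None:
        """A successful recall resets the clock and flattens the curve."""
        self.last_recall_day = today
        self.decay_lambda *= harden

used = Trace(decay_lambda=0.1, last_recall_day=0.0)
unused = Trace(decay_lambda=0.1, last_recall_day=0.0)
for day in (5, 10, 15):
    used.recall(today=day)

# By day 30 the recalled trace is still well above the unused one.
print(used.retention(30) > unused.retention(30))
```

Both traces start identical; only the recall history separates them, which is the property that lets unused entries drift toward eviction while frequently used ones persist.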
flowchart LR
    T[New turn] --> EX[Extract facts]
    EX --> TT{TTL tier}
    TT -->|allergy| INF[Infinite TTL]
    TT -->|preference| LONG[1y TTL]
    TT -->|context| SHORT[7d TTL]
    TT -->|chat-only| SES[session]
    INF --> S[(Memory store)]
    LONG --> S
    SHORT --> S
    Q[Query] --> R[Retrieve]
    R --> SC["score = sim * exp(-lambda * age)"]
    SC --> RE[Reinforce on hit]
    RE --> S
Each memory entry has the shape { id, text, embedding, created_at, last_accessed_at, ttl_tier, decay_lambda, hit_count }. At write time, an LLM tags the fact with a TTL tier (immutable / long / short / session) and an initial decay_lambda. At retrieval, the score is cos(q, m.embedding) * exp(-m.decay_lambda * (now - m.last_accessed_at)). On a hit, last_accessed_at is updated, hit_count is incremented, and decay_lambda is reduced, so the memory hardens. A nightly job evicts entries where exp(-lambda * age) < 0.05 and hit_count == 0, that is, memories that have faded without ever being recalled.
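The nightly eviction rule can be sketched as follows. This is an illustration under stated assumptions: decay_lambda is a per-day rate, age is measured from last_accessed_at, timestamps are Unix seconds, and the names MemoryEntry, should_evict, and nightly_evict are hypothetical. The embedding field is left out for brevity.

```python
import math
from dataclasses import dataclass

SECONDS_PER_DAY = 86400
EVICT_BELOW = 0.05  # threshold quoted in the text

@dataclass
class MemoryEntry:
    id: str
    text: str
    created_at: float        # Unix seconds
    last_accessed_at: float  # Unix seconds
    ttl_tier: str
    decay_lambda: float      # assumed to be a per-day rate
    hit_count: int = 0

def should_evict(m: MemoryEntry, now: float) -> bool:
    """True when the recency weight is under the threshold and the entry was never hit."""
    age_days = (now - m.last_accessed_at) / SECONDS_PER_DAY
    weight = math.exp(-m.decay_lambda * age_days)
    return weight < EVICT_BELOW and m.hit_count == 0

def nightly_evict(entries: list[MemoryEntry], now: float) -> list[MemoryEntry]:
    """Return the entries that survive tonight's pass."""
    return [m for m in entries if not should_evict(m, now)]

now = 100 * SECONDS_PER_DAY
entries = [
    MemoryEntry("a", "liked the blue theme", 0.0, 0.0, "session", 0.1),
    MemoryEntry("b", "allergic to penicillin", 0.0, 0.0, "immutable", 0.0),
    MemoryEntry("c", "prefers morning calls", 0.0, 0.0, "session", 0.1, hit_count=2),
]
survivors = nightly_evict(entries, now)
print([m.id for m in survivors])
```

Requiring both conditions means a faded entry that has been recalled at least once is kept, and a zero-lambda entry can never fall below the threshold.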
Every CallSphere voice/chat agent runs time-decay memory:
Decay parameters live per vertical. Healthcare's medication-allergy memory has lambda = 0 (immutable). Real-estate buyer urgency ("we want to close in 30 days") has lambda = 0.05/day so it fades after the buying window.
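To make the two quoted rates concrete, here is a small worked calculation. The lookup table layout and the helper names are hypothetical; only the two lambda values come from the text, and the 0.05 threshold is the eviction cutoff mentioned earlier.

```python
import math

# Hypothetical per-vertical table; only the two lambdas come from the text.
DECAY_PER_DAY = {
    ("healthcare", "medication_allergy"): 0.0,
    ("real_estate", "buyer_urgency"): 0.05,
}

def weight(lam: float, age_days: float) -> float:
    """Recency weight exp(-lam * age) for an age given in days."""
    return math.exp(-lam * age_days)

def days_until(lam: float, threshold: float) -> float:
    """Age at which the weight falls to `threshold` (infinite when lam is 0)."""
    return math.inf if lam == 0 else -math.log(threshold) / lam

urgency = DECAY_PER_DAY[("real_estate", "buyer_urgency")]
print(round(weight(urgency, 30), 3))        # weight at the end of a 30-day window
print(round(days_until(urgency, 0.05), 1))  # days to reach the 0.05 threshold
```

At 0.05/day the urgency memory keeps about 22% of its weight after 30 days and reaches the 0.05 threshold after roughly 60 days, while the allergy memory, with lambda = 0, never loses weight.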
37 agents · 90+ tools · 115+ DB tables · 6 verticals. $149/$499/$1499, 14-day trial, 22% affiliate. See multi-turn memory at work on /demo.
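import math
import time

SECONDS_PER_DAY = 86400

# tier -> (decay_lambda per day, hard TTL in seconds; None = never expires)
TTL_TIERS = {
    "immutable": (0.0, None),                # never evict, no decay
    "long": (0.001, 365 * SECONDS_PER_DAY),  # 1 year
    "short": (0.01, 30 * SECONDS_PER_DAY),   # 30 days
    "session": (0.1, SECONDS_PER_DAY),       # 1 day
}

# db, embed, vector_search, and classify_ttl are assumed to be provided by the application.

def write_memory(text):
    tier = classify_ttl(text)  # LLM call: returns one of the TTL_TIERS keys
    lam, _ttl = TTL_TIERS[tier]  # the TTL is not stored; it can be looked up from ttl_tier
    now = time.time()
    db.insert("memory", {
        "text": text, "embedding": embed(text),
        "created_at": now, "last_accessed_at": now,
        "ttl_tier": tier, "decay_lambda": lam, "hit_count": 0,
    })

def retrieve(q, top_k=5):
    cands = vector_search(embed(q), k=50)  # over-fetch, then re-rank by decayed score
    now = time.time()
    scored = [
        # age is converted from seconds to days because decay_lambda is treated as a per-day rate
        (m, m.cos_sim * math.exp(-m.decay_lambda * (now - m.last_accessed_at) / SECONDS_PER_DAY))
        for m in cands
    ]
    top = sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]
    for m, _ in top:
        db.update("memory", m.id, {
            "last_accessed_at": now,
            "hit_count": m.hit_count + 1,
            "decay_lambda": m.decay_lambda * 0.9,  # reinforce: slower decay after a hit
        })
    return [m for m, _ in top]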
Decay or TTL? Both. TTL is the hard cutoff (bulk eviction by tier); decay is the score modifier at retrieval.
Embedding store or graph? Hybrid — embedding for fuzzy recall, graph for entity-heavy recalls. See vw6g-15 on graph memory.
Per-user or global? Per-user always. Cross-user memory is a privacy violation.
Cost? ~$0.001 per memory write (the TTL classifier). Cheap.
See it on /demo? Yes — the multi-turn demo logs decay scores in the trace panel.
Written by
Sagar Shankaran · Founder, CallSphere
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.