By Sagar Shankaran, Founder of CallSphere
Zep beats MemGPT on the DMR benchmark (94.8% vs 93.4%) by tracking when facts became true. Here is the temporal-graph architecture and when it wins.
Key takeaways
TL;DR — Zep's "context graph" tracks not just what an agent knows but when each fact became true. The open-source Graphiti engine drives both Zep Cloud and self-hosted deployments. On the DMR benchmark, Zep scores 94.8% vs MemGPT's 93.4%, with up to 18.5% accuracy improvement and 90% lower latency than baseline.
flowchart LR
Repo[GitHub repo] --> CI[GitHub Actions]
CI --> Eval[Agent eval suite · PromptFoo]
Eval -->|pass| Deploy[Deploy]
Eval -->|fail| Block[Block PR]
Deploy --> Prod[Production agent]
Prod --> Trace[(LangSmith trace)]
Trace --> Eval
A vector store says "I have a memory that matches this query." A knowledge graph says "I have entities, relationships, and facts." A temporal knowledge graph adds: each fact has a valid_at timestamp and an invalid_at timestamp. Facts can be superseded, not just deleted.
Concrete example. A user mentions in March: "I love Adidas shoes." In May they say: "I switched to Hoka after the marathon." A vector store either keeps both (and confuses the agent) or deletes the first (and loses history). A temporal graph stores:
(user, prefers, Adidas) valid 2026-03-01 to 2026-05-04
(user, prefers, Hoka) valid 2026-05-04 to now
Now the agent can answer "what shoes does this user like?" with the current preference and answer "what did they like before the marathon?" with the historical one.
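The Adidas/Hoka example can be sketched in plain Python. This is a minimal illustration of validity windows, not Graphiti's implementation: the TemporalFact class and the assert_fact and facts_as_of helpers are hypothetical names made up for this sketch, and it assumes "prefers" is single-valued so a new value supersedes the old one.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class TemporalFact:
    subject: str
    predicate: str
    obj: str
    valid_at: datetime
    invalid_at: Optional[datetime] = None  # None means "still true"


def assert_fact(facts, subject, predicate, obj, when):
    """Record a new fact and close the window of any fact it supersedes."""
    for fact in facts:
        same_slot = fact.subject == subject and fact.predicate == predicate
        if same_slot and fact.invalid_at is None:
            fact.invalid_at = when  # superseded, not deleted
    facts.append(TemporalFact(subject, predicate, obj, when))


def facts_as_of(facts, subject, predicate, when):
    """Return the objects that were valid for (subject, predicate) at `when`."""
    return [
        f.obj
        for f in facts
        if f.subject == subject
        and f.predicate == predicate
        and f.valid_at <= when
        and (f.invalid_at is None or when < f.invalid_at)
    ]


facts = []
assert_fact(facts, "user", "prefers", "Adidas", datetime(2026, 3, 1))
assert_fact(facts, "user", "prefers", "Hoka", datetime(2026, 5, 4))

current = facts_as_of(facts, "user", "prefers", datetime(2026, 6, 1))
before_marathon = facts_as_of(facts, "user", "prefers", datetime(2026, 4, 1))
print(current, before_marathon)
```

Both facts stay in the list; only the time passed to facts_as_of decides which one is returned.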
Zep's core is Graphiti (getzep/graphiti on GitHub), a temporally-aware knowledge graph engine that synthesizes both unstructured conversation and structured business data while maintaining historical relationships. Graphiti is Apache 2.0; you can run it standalone without Zep Cloud.
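On each new piece of input, Graphiti updates the graph and finishes by invalidating what the input supersedes: the older fact gets its invalid_at set rather than being deleted.
The Deep Memory Retrieval (DMR) benchmark was established by the MemGPT team as their primary eval. On DMR, Zep scores 94.8% versus MemGPT's 93.4%.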
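Beyond raw accuracy, Zep reports up to 18.5% improvement on harder tasks and 90% latency reduction compared to baseline implementations. Part of the latency gain comes from graph traversal touching only a node's neighborhood, O(neighborhood), whereas an exhaustive vector scan is O(n). Approximate indexes such as HNSW narrow that gap, so measure on your own data.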
Pick when:
Skip when:
CallSphere's Real Estate OneRoof is built on a Zep-backed memory layer. The 10 specialist agents read and write a shared graph that knows: which buyer toured which property when, which agent talked to which lender on what date, which inspection raised which issue.
When the buyer subgraph asks "did this buyer ever express concern about HOA fees?" — the temporal graph answers with "yes, on 2026-03-12 in conversation with Buyer Agent #3" with the exact context. That's not a similarity search; it's a graph traversal with a time filter.
For our healthcare deployment, the temporal aspect is non-negotiable: a patient's medications, allergies, and conditions all have validity windows. A vector store can't model that correctly.
Pricing: $149 / $499 / $1499. 14-day trial. 22% affiliate.
1. Install: pip install graphiti-core.
2. Initialize: graphiti = Graphiti(neo4j_uri, ...).
3. Add an episode: await graphiti.add_episode(name=..., episode_body=...).
4. Search: await graphiti.search(query=..., num_results=10).
5. Use reference_time to filter by the graph state at a past moment.
import asyncio
from datetime import datetime

from graphiti_core import Graphiti


async def main(buyer_uuid: str) -> None:
    # Connect to the Neo4j instance that backs the graph
    graphiti = Graphiti(uri="bolt://neo4j:7687", user="neo4j", password="...")

    # Add a conversation episode
    await graphiti.add_episode(
        name="call_2026_03_12",
        episode_body="The buyer is concerned about HOA fees over $400/month.",
        source_description="OneRoof inbound call",
        reference_time=datetime(2026, 3, 12, 14, 22),
    )

    # Later, query with temporal context
    results = await graphiti.search(
        query="HOA fee concerns",
        num_results=5,
        center_node_uuid=buyer_uuid,  # UUID of the buyer's node in the graph
    )
    for edge in results:
        print(edge.fact)


# await only works inside a coroutine, so run main() on an event loop
asyncio.run(main(buyer_uuid="..."))
The deeper reason Zep wins on entity-rich domains is that graph traversal preserves structure. A query like "all properties this buyer toured with their spouse" is a two-hop traversal in a graph (buyer → spouse → toured-with-property). In a vector DB you'd embed each fact and hope semantic similarity finds the right ones, which is fragile and approximate.
For unstructured text where the only structure is "this paragraph is similar to that paragraph," vector retrieval is the right tool. For entity-relational data, graph traversal is exactly the right tool. CallSphere's verticals are mostly entity-relational (buyers, agents, properties; patients, providers, medications) — so we lean on graphs.
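To make the spouse-tour query concrete, here is a toy traversal over an in-memory edge list. Everything in it is hypothetical: the node names, the spouse_of and toured relation labels, and the simplification that "toured with their spouse" means "toured by both". A real graph would more likely link both people to a shared tour event.

```python
from collections import defaultdict

# Toy edge list: (source, relation, target). All names are made up.
EDGES = [
    ("buyer_1", "spouse_of", "person_2"),
    ("buyer_1", "toured", "property_A"),
    ("buyer_1", "toured", "property_B"),
    ("person_2", "toured", "property_A"),
    ("person_2", "toured", "property_C"),
]


def build_index(edges):
    """Index edges by (source, relation) so each hop is a dictionary lookup."""
    index = defaultdict(set)
    for source, relation, target in edges:
        index[(source, relation)].add(target)
    return index


def toured_with_spouse(index, buyer):
    """Properties toured by both the buyer and the buyer's spouse."""
    buyer_tours = index[(buyer, "toured")]
    shared = set()
    for spouse in index[(buyer, "spouse_of")]:  # hop 1: buyer -> spouse
        # hop 2: spouse -> property, kept only if the buyer toured it too
        shared |= index[(spouse, "toured")] & buyer_tours
    return sorted(shared)


index = build_index(EDGES)
print(toured_with_spouse(index, "buyer_1"))
```

Each hop is an exact lookup on a typed relation, so the answer never depends on how close two sentences happen to sit in embedding space.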
Graphiti combines graph traversal with vector similarity. Each node has an embedding; nodes within k hops are reranked by embedding similarity. You get graph structure for hard constraints ("only this buyer's facts") and embedding similarity for soft ranking ("most relevant first").
This combination is why the DMR numbers favor Zep: pure graph misses semantic neighbors; pure vector misses structural constraints; the hybrid approach captures both.
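The hybrid idea can be sketched in a few lines: gather candidates by walking at most k hops from a starting node, then rank those candidates by cosine similarity to the query embedding. This is an illustration of the pattern only, not Graphiti's search code; the function names, the toy adjacency map, and the two-dimensional embeddings are all assumptions of the sketch.

```python
import math
from collections import deque


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def within_k_hops(adjacency, start, k):
    """Breadth-first search: nodes at most k hops from start, excluding start."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand past the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    seen.discard(start)
    return seen


def hybrid_search(adjacency, embeddings, start, query_vec, k=2, top_n=3):
    """Hard constraint from the graph, soft ranking from embeddings."""
    candidates = within_k_hops(adjacency, start, k)
    ranked = sorted(
        candidates, key=lambda n: cosine(embeddings[n], query_vec), reverse=True
    )
    return ranked[:top_n]


adjacency = {
    "buyer": ["hoa_concern", "tour_A"],
    "tour_A": ["property_A"],
    "other_buyer": ["hoa_question"],
}
embeddings = {
    "hoa_concern": [0.9, 0.1],
    "tour_A": [0.1, 0.9],
    "property_A": [0.3, 0.7],
    "hoa_question": [1.0, 0.0],
}
results = hybrid_search(adjacency, embeddings, "buyer", query_vec=[1.0, 0.0])
print(results)
```

In this toy data hoa_question is the closest match to the query vector, yet it never appears in the results because it belongs to a different buyer's neighborhood. That is the "only this buyer's facts" constraint at work.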
Each add_episode call runs entity extraction with an LLM. Cost: ~500-2000 input tokens + 200-800 output tokens per episode, depending on density. At scale, this dominates the operational cost.
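A quick way to turn those token ranges into a budget is a small calculator. The per-token prices below are placeholders chosen for the arithmetic, not quotes from any provider, and extraction_cost_usd is a hypothetical helper; substitute your own model's rates.

```python
def extraction_cost_usd(episodes, input_tokens, output_tokens,
                        usd_per_m_input, usd_per_m_output):
    """Estimated LLM extraction spend for a batch of episodes."""
    per_episode = (input_tokens * usd_per_m_input
                   + output_tokens * usd_per_m_output) / 1_000_000
    return episodes * per_episode


# Placeholder prices: $1.00 per million input tokens, $4.00 per million output tokens.
low = extraction_cost_usd(100_000, 500, 200, 1.00, 4.00)
high = extraction_cost_usd(100_000, 2000, 800, 1.00, 4.00)
print(f"100k episodes: ${low:.2f} to ${high:.2f}")
```

With those placeholder prices the range works out to roughly $130 to $520 per 100,000 episodes, a 4x spread driven entirely by episode density.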
Two optimizations we use:
... add_episode call. Saves 30-50% on extraction LLM costs.
If you're currently storing conversation history in pgvector or Pinecone, migrating to Graphiti is mechanical: write a script that iterates your existing records, converts each to an episode, and replays them through add_episode. The graph builds itself. We migrated OneRoof from a pgvector layer in about two days.
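A replay script along those lines might look like the sketch below. It assumes the client exposes an async add_episode with the keyword arguments used in the article's snippet; the record fields text and created_at, the migrated_N naming, and the RecordingClient stand-in (which lets the sketch run without Neo4j) are all hypothetical. Replaying oldest first is a choice made here so that later statements can supersede earlier ones.

```python
import asyncio
from datetime import datetime, timezone


async def replay_records(client, records, source_description="legacy vector store"):
    """Replay legacy records, oldest first, through client.add_episode."""
    ordered = sorted(records, key=lambda r: r["created_at"])
    for i, record in enumerate(ordered):
        await client.add_episode(
            name=f"migrated_{i}",
            episode_body=record["text"],
            source_description=source_description,
            reference_time=record["created_at"],
        )
    return len(ordered)


class RecordingClient:
    """Stand-in for a Graphiti instance; it only records the calls it receives."""

    def __init__(self):
        self.calls = []

    async def add_episode(self, **kwargs):
        self.calls.append(kwargs)


records = [
    {"text": "I switched to Hoka after the marathon.",
     "created_at": datetime(2026, 5, 4, tzinfo=timezone.utc)},
    {"text": "I love Adidas shoes.",
     "created_at": datetime(2026, 3, 1, tzinfo=timezone.utc)},
]
client = RecordingClient()
migrated = asyncio.run(replay_records(client, records))
print(migrated, [call["name"] for call in client.calls])
```

Swapping RecordingClient for a real Graphiti instance should be the only change needed, provided the records carry a usable timestamp.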
Zep Cloud vs self-hosted Graphiti? Cloud is the managed service with auth, multi-tenancy, and SLAs. Graphiti OSS is the engine — you run Neo4j, you scale it.
How much LLM cost does Graphiti add? Each add_episode runs entity extraction. Budget ~500-2000 input tokens plus 200-800 output tokens per episode, depending on density.
Does it work without Neo4j? Neo4j is the supported backend. Other graph DBs are community work.
Can I migrate from mem0 to Zep? Yes — export memories, replay them as Graphiti episodes. The schema upgrade is the work; the migration is mechanical.
Where do I see this on CallSphere? Book a demo and we'll show OneRoof's temporal graph live.
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.