
Zep v2 + Graphiti: Why Temporal Knowledge Graphs Beat Vector Memory

Zep beats MemGPT on the DMR benchmark (94.8% vs 93.4%) by tracking when facts became true. Here is the temporal-graph architecture and when it wins.

TL;DR: Zep's "context graph" tracks not just what an agent knows but when each fact became true. The open-source Graphiti engine drives both Zep Cloud and self-hosted deployments. On the DMR benchmark, Zep scores 94.8% vs MemGPT's 93.4%. On the harder LongMemEval benchmark, Zep reports up to 18.5% higher accuracy and about 90% lower response latency than baseline implementations.

What temporal means here

flowchart LR
  Repo[GitHub repo] --> CI[GitHub Actions]
  CI --> Eval[Agent eval suite · PromptFoo]
  Eval -->|pass| Deploy[Deploy]
  Eval -->|fail| Block[Block PR]
  Deploy --> Prod[Production agent]
  Prod --> Trace[(LangSmith trace)]
  Trace --> Eval
CallSphere reference architecture

A vector store says "I have a memory that matches this query." A knowledge graph says "I have entities, relationships, and facts." A temporal knowledge graph adds: each fact has a valid_at timestamp and an invalid_at timestamp. Facts can be superseded, not just deleted.

Concrete example. A user mentions in March: "I love Adidas shoes." In May they say: "I switched to Hoka after the marathon." A vector store either keeps both (and confuses the agent) or deletes the first (and loses history). A temporal graph stores:

  • (user, prefers, Adidas) valid 2026-03-01 to 2026-05-04
  • (user, prefers, Hoka) valid 2026-05-04 to now

Now the agent can answer "what shoes does this user like?" with the current preference and answer "what did they like before the marathon?" with the historical one.
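To make the validity-window idea concrete, here is a minimal sketch in plain Python. It is not Graphiti's data model: the TemporalFact class and the facts_as_of helper are hypothetical names used only for illustration, and the windows are treated as half-open (a fact stops being valid on its invalid_at date).

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TemporalFact:
    subject: str
    predicate: str
    obj: str
    valid_at: date                      # when the fact became true
    invalid_at: Optional[date] = None   # None means "still true"


def facts_as_of(facts, when):
    """Return the facts whose validity window contains `when`."""
    return [
        f for f in facts
        if f.valid_at <= when and (f.invalid_at is None or when < f.invalid_at)
    ]


facts = [
    TemporalFact("user", "prefers", "Adidas", date(2026, 3, 1), date(2026, 5, 4)),
    TemporalFact("user", "prefers", "Hoka", date(2026, 5, 4)),
]

current = facts_as_of(facts, date(2026, 6, 1))
before_marathon = facts_as_of(facts, date(2026, 4, 15))
print([f.obj for f in current])          # ['Hoka']
print([f.obj for f in before_marathon])  # ['Adidas']
```

Both facts stay in the store. Only the query date decides which one the agent sees.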

Graphiti — the open-source engine

Zep's core is Graphiti (getzep/graphiti on GitHub), a temporally-aware knowledge graph engine that synthesizes both unstructured conversation and structured business data while maintaining historical relationships. Graphiti is Apache 2.0; you can run it standalone without Zep Cloud.

What Graphiti does on each new piece of input:

  1. Extract entities (people, companies, products, concepts).
  2. Extract relationships and facts ("Sagar founded CallSphere in 2024").
  3. Compare with existing graph nodes; merge by entity resolution.
  4. Detect contradictions; mark superseded facts with invalid_at.
  5. Expose query APIs for traversal, filtering by time window, and ranked retrieval.
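Steps 3 and 4 can be sketched with a toy rule. Graphiti itself relies on an LLM to resolve entities and detect contradictions. The version below is a deliberately simplified stand-in that treats "same subject and predicate, different object" as a contradiction. Fact and add_fact are hypothetical names, not part of graphiti-core.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Fact:
    subject: str
    predicate: str
    obj: str
    valid_at: datetime
    invalid_at: Optional[datetime] = None


def add_fact(graph: List[Fact], new: Fact) -> List[Fact]:
    """Append `new`, closing the window of any live fact it contradicts.

    "Contradicts" is simplified here to: same subject and predicate,
    different object. Returns the facts that were superseded.
    """
    superseded = []
    for old in graph:
        same_slot = (old.subject, old.predicate) == (new.subject, new.predicate)
        if same_slot and old.invalid_at is None and old.obj != new.obj:
            old.invalid_at = new.valid_at   # mark superseded, do not delete
            superseded.append(old)
    graph.append(new)
    return superseded


graph: List[Fact] = []
add_fact(graph, Fact("user", "prefers", "Adidas", datetime(2026, 3, 1)))
replaced = add_fact(graph, Fact("user", "prefers", "Hoka", datetime(2026, 5, 4)))
print([f.obj for f in replaced])  # ['Adidas']
```

The important design choice is that the old fact is closed rather than removed, so historical queries still find it.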

DMR benchmark numbers

The Deep Memory Retrieval benchmark was established by the MemGPT team as their primary eval. On DMR:

  • Zep: 94.8%
  • MemGPT (baseline): 93.4%

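Beyond DMR, Zep reports up to 18.5% higher accuracy on the harder LongMemEval benchmark and about 90% lower response latency than baseline implementations. The latency gain most plausibly comes from retrieval returning a small set of relevant facts, so the model reads a far shorter prompt than it would with the whole conversation history in context. Graph traversal helps too, since it touches only an entity's neighborhood, but indexed vector search is also sublinear, so traversal alone does not explain the gap.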

When to pick Zep / Graphiti

Pick when:

  • Your domain has rich entities and relationships (CRM, healthcare records, real estate, B2B sales).
  • Facts change over time and you need to reason about state at a past point.
  • You need structured retrieval — "all relationships of type X from this entity" — not just semantic similarity.
  • You want explainability — graph paths are inherently more auditable than embedding similarity scores.

Skip when:

  • Your data is mostly unstructured text with no entities of consequence.
  • You need the simplest possible memory (use mem0 instead).
  • Your latency budget cannot tolerate the LLM extraction step on writes.

How CallSphere uses it

CallSphere's Real Estate OneRoof is built on a Zep-backed memory layer. The 10 specialist agents read and write a shared graph that knows: which buyer toured which property when, which agent talked to which lender on what date, which inspection raised which issue.

When the buyer subgraph asks "did this buyer ever express concern about HOA fees?" — the temporal graph answers with "yes, on 2026-03-12 in conversation with Buyer Agent #3" with the exact context. That's not a similarity search; it's a graph traversal with a time filter.

For our healthcare deployment, the temporal aspect is non-negotiable: a patient's medications, allergies, and conditions all have validity windows. A vector store can't model that correctly.

Pricing: $149 / $499 / $1499. 14-day trial. 22% affiliate.

Build steps — Graphiti self-hosted

  1. pip install graphiti-core.
  2. Stand up Neo4j (5.x with Aura or self-hosted).
  3. Configure your LLM (OpenAI, Anthropic, or local) for entity extraction.
  4. Initialize: graphiti = Graphiti(neo4j_uri, ...).
  5. On every conversation turn: await graphiti.add_episode(name=..., episode_body=...).
  6. On retrieval: await graphiti.search(query=..., num_results=10).
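  7. For temporal queries: set reference_time on add_episode so each fact gets the right validity window, then filter the retrieved facts by their valid_at and invalid_at fields to reconstruct the graph state at a past moment. search() itself does not take a reference_time argument.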

Code: a temporal query

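import asyncio
from datetime import datetime, timezone

from graphiti_core import Graphiti


async def main(buyer_uuid=None):
    # buyer_uuid: UUID of the buyer's entity node, if known (None = no reranking)
    graphiti = Graphiti(uri="bolt://neo4j:7687", user="neo4j", password="...")
    try:
        # Add a conversation episode; reference_time records when it happened
        await graphiti.add_episode(
            name="call_2026_03_12",
            episode_body="The buyer is concerned about HOA fees over $400/month.",
            source_description="OneRoof inbound call",
            reference_time=datetime(2026, 3, 12, 14, 22, tzinfo=timezone.utc),
        )

        # Later, search for related facts, reranked by proximity to the buyer node
        results = await graphiti.search(
            query="HOA fee concerns",
            num_results=5,
            center_node_uuid=buyer_uuid,
        )
        # Each result is a fact carrying its validity window
        for edge in results:
            print(edge.fact, edge.valid_at, edge.invalid_at)
    finally:
        await graphiti.close()


asyncio.run(main())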

Why graph traversal beats vector similarity for entity-rich data

The deeper reason Zep wins on entity-rich domains is that graph traversal preserves structure. A query like "all properties this buyer toured with their spouse" is a short traversal in a graph: follow buyer → spouse, then intersect the properties each of them toured. In a vector DB you'd embed each fact and hope semantic similarity finds the right ones, which is fragile and approximate.
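As a rough illustration of that spouse query, here is a toy traversal over an in-memory adjacency map. The node names, relationship labels, and addresses are made up, and "toured with" is approximated as "both people toured the property"; a real graph would store the shared tour as its own edge or event node.

```python
# Toy graph as adjacency sets keyed by (node, relationship).
edges = {
    ("buyer_1", "spouse"): {"buyer_2"},
    ("buyer_1", "toured"): {"12 Oak St", "8 Elm Ave", "3 Pine Rd"},
    ("buyer_2", "toured"): {"8 Elm Ave", "3 Pine Rd", "40 Lake Dr"},
}


def neighbors(node, relationship):
    """Nodes reachable from `node` over one edge of the given type."""
    return edges.get((node, relationship), set())


def toured_with_spouse(buyer):
    """Properties toured by both the buyer and any spouse of the buyer."""
    shared = set()
    for spouse in neighbors(buyer, "spouse"):
        shared |= neighbors(buyer, "toured") & neighbors(spouse, "toured")
    return shared


print(sorted(toured_with_spouse("buyer_1")))  # ['3 Pine Rd', '8 Elm Ave']
```

The answer is exact: it follows declared relationships instead of ranking text by similarity.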

For unstructured text where the only structure is "this paragraph is similar to that paragraph," vector retrieval is the right tool. For entity-relational data, graph traversal is exactly the right tool. CallSphere's verticals are mostly entity-relational (buyers, agents, properties; patients, providers, medications) — so we lean on graphs.

Hybrid: graphs + vectors

Graphiti combines graph traversal with vector similarity. Each node has an embedding; nodes within k hops are reranked by embedding similarity. You get graph structure for hard constraints ("only this buyer's facts") and embedding similarity for soft ranking ("most relevant first").

This combination is a likely reason the DMR numbers favor Zep: pure graph misses semantic neighbors; pure vector misses structural constraints; the hybrid approach captures both.
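The "k hops, then rerank by similarity" idea can be sketched in a few lines of plain Python. This is not Graphiti's retrieval code; the function names, the toy graph, and the two-dimensional embeddings are all assumptions chosen to keep the example small.

```python
import math
from collections import deque


def within_k_hops(adjacency, start, k):
    """Breadth-first search: every node reachable from `start` in <= k hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == k:
            continue
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    seen.pop(start)
    return set(seen)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(adjacency, embeddings, start, query_vec, k=2, top_n=3):
    """Hard constraint: stay within k hops. Soft ranking: cosine similarity."""
    candidates = within_k_hops(adjacency, start, k)
    ranked = sorted(candidates, key=lambda n: cosine(embeddings[n], query_vec),
                    reverse=True)
    return ranked[:top_n]


adjacency = {
    "buyer": ["hoa_concern", "tour_oak_st"],
    "tour_oak_st": ["oak_st_listing"],
    "other_buyer": ["hoa_question"],
}
embeddings = {
    "hoa_concern": [0.9, 0.1],
    "tour_oak_st": [0.1, 0.9],
    "oak_st_listing": [0.3, 0.7],
    "hoa_question": [1.0, 0.0],   # most similar, but not in this buyer's neighborhood
}
print(hybrid_search(adjacency, embeddings, "buyer", query_vec=[1.0, 0.0]))
# ['hoa_concern', 'oak_st_listing', 'tour_oak_st']
```

Note that hoa_question has the highest similarity to the query yet never appears, because it belongs to a different buyer. That is the structural constraint doing its job.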

Cost shape — entity extraction is the bottleneck

Each add_episode call runs entity extraction with an LLM. Cost: ~500-2000 input tokens + 200-800 output tokens per episode, depending on density. At scale, this dominates the operational cost.

Two optimizations we use:

  1. Batch episodes — group several conversation turns into a single add_episode call. Saves 30-50% on extraction LLM costs.
  2. Use a cheap extractor — GPT-5-mini or Haiku is plenty for entity extraction; reserve the smart model for retrieval-time reasoning.
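A back-of-the-envelope estimate helps size that extraction bill. The sketch below uses the token ranges quoted above; the per-million-token prices are placeholders, not any provider's real rates, and extraction_cost_usd and batch_turns are hypothetical helper names.

```python
def extraction_cost_usd(episodes, input_tokens, output_tokens,
                        usd_per_m_input, usd_per_m_output):
    """LLM extraction cost for `episodes` add_episode calls."""
    per_episode = (input_tokens * usd_per_m_input
                   + output_tokens * usd_per_m_output) / 1_000_000
    return episodes * per_episode


def batch_turns(turns, size):
    """Group conversation turns so several of them share one add_episode call."""
    return ["\n".join(turns[i:i + size]) for i in range(0, len(turns), size)]


# Hypothetical prices (USD per million tokens); substitute your provider's rates.
PRICE_IN, PRICE_OUT = 0.25, 2.00

low = extraction_cost_usd(100_000, 500, 200, PRICE_IN, PRICE_OUT)
high = extraction_cost_usd(100_000, 2000, 800, PRICE_IN, PRICE_OUT)
print(f"100k episodes: ${low:.2f} to ${high:.2f}")

episodes = batch_turns(["turn 1", "turn 2", "turn 3"], size=2)
print(len(episodes))  # 2 episodes instead of 3
```

How much batching actually saves depends on how much per-call prompt overhead is shared, so measure it on your own traffic before relying on any fixed percentage.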

Migrating from a vector store

If you're currently storing conversation history in pgvector or Pinecone, migrating to Graphiti is mechanical: write a script that iterates your existing records, converts each to an episode, and replays them in chronological order through add_episode, passing each record's original timestamp as reference_time so the validity windows come out right. The graph builds itself. We migrated OneRoof from a pgvector layer in about two days.
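A migration script along those lines might look like the sketch below. The record fields (id, text, created_at) are assumptions about your existing table, and RecordingClient is a stand-in so the example runs without Neo4j. In a real run you would pass a Graphiti instance instead. The add_episode keyword arguments mirror the ones used earlier in this article.

```python
import asyncio
from datetime import datetime, timezone


def records_to_episodes(records):
    """Convert vector-store rows into add_episode keyword arguments, oldest first."""
    ordered = sorted(records, key=lambda r: r["created_at"])
    return [
        {
            "name": f"migrated_{r['id']}",
            "episode_body": r["text"],
            "source_description": "pgvector migration",
            "reference_time": r["created_at"],   # keep the original timestamp
        }
        for r in ordered
    ]


async def replay(graphiti, episodes):
    """Replay sequentially so later facts can supersede earlier ones."""
    for episode in episodes:
        await graphiti.add_episode(**episode)


class RecordingClient:
    """Stand-in for a Graphiti instance so the sketch runs without Neo4j."""

    def __init__(self):
        self.calls = []

    async def add_episode(self, **kwargs):
        self.calls.append(kwargs)


records = [
    {"id": 2, "text": "I switched to Hoka after the marathon.",
     "created_at": datetime(2026, 5, 4, tzinfo=timezone.utc)},
    {"id": 1, "text": "I love Adidas shoes.",
     "created_at": datetime(2026, 3, 1, tzinfo=timezone.utc)},
]
client = RecordingClient()
asyncio.run(replay(client, records_to_episodes(records)))
print([c["name"] for c in client.calls])  # ['migrated_1', 'migrated_2']
```

Sorting before replay matters: if the Hoka message were ingested first, the older Adidas preference could end up looking like the newer fact.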

FAQ

Zep Cloud vs self-hosted Graphiti? Cloud is the managed service with auth, multi-tenancy, and SLAs. Graphiti OSS is the engine — you run Neo4j, you scale it.

How much LLM cost does Graphiti add? Each add_episode runs entity extraction. Budget ~500-2000 tokens per episode depending on density.

Does it work without Neo4j? Neo4j is the primary supported backend. Support for other graph databases varies by release, so check the Graphiti README for the current list of drivers.

Can I migrate from mem0 to Zep? Yes — export memories, replay them as Graphiti episodes. The schema upgrade is the work; the migration is mechanical.

Where do I see this on CallSphere? Book a demo and we'll show OneRoof's temporal graph live.
