By Sagar Shankaran, Founder of CallSphere
Zep graphiti temporal knowledge graph agent memory documentation: zep 2.0 is built on Graphiti — a temporal knowledge graph that lets agents reason over how facts changed over time, not just what is true at the moment of query.
Key takeaways
Zep 2.0 is built on Graphiti — a temporal knowledge graph that lets agents reason over how facts changed over time, not just what is true at the moment of query.
The interesting question is not what this thing is. The interesting question is how it works under load, what assumptions break first, and which architectural patterns hold up when you push past the demo. That is where this piece spends its time. Teams in London are already shipping production deployments built on this stack, and the lessons are starting to filter into the wider community.
If your team is already using Zep, Graphiti, Knowledge Graph, the patterns below should map cleanly onto your stack. If you are still evaluating, the comparison sections will give you the trade-off math without forcing you to wade through marketing pages.
Zep 2.0 and Graphiti matters in 2026 not because of any single feature but because of where it sits in the agent stack. Production teams shipping Zep agents need three things: predictable behavior, ops-friendly observability, and a clear migration path when the underlying tools change. The April 2026 update lands meaningful improvements on all three.
The ecosystem context matters too. With Zep and Graphiti as the current center of gravity, decisions made now will compound over the next 12 to 18 months. The teams that get this right will spend less time on infrastructure and more time on product. The teams that pick wrong will spend a quarter on a migration they did not budget for.
One detail that often gets buried: the official documentation describes the happy path, but production deployments live in the unhappy path. Patterns for handling partial failures, network blips, and tool timeouts deserve as much attention as the architecture diagram.
flowchart TD
Conv[Conversation] --> Extract[Extract Facts/Entities]
Extract --> Vec[(Vector Store)]
Extract --> Graph[(Knowledge Graph)]
Query[Recall Query] --> Hybrid{Hybrid Retriever}
Vec --> Hybrid
Graph --> Hybrid
Hybrid --> Rerank[Reranker]
Rerank --> Ctx[Context Window]
Underneath the marketing surface, the architecture has three moving parts that matter: the runtime, the state model, and the observability surface. Each one has a "default" path and an "advanced" path, and the difference between them often determines whether a team gets to production in six weeks or six months.
Hear it before you finish reading
Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.
The runtime decides how fast your agent can react and how cleanly it scales. The state model decides whether your agent can recover from a crash, branch a conversation, or hand work between specialists without dropping context. The observability surface decides whether your on-call engineer can debug a 3am incident in 10 minutes or 3 hours. Skip any one of these and you have a demo, not a product.
The interesting trade-off is between flexibility and operational simplicity. More flexibility means more code to maintain. More opinion in the framework means less code but also less wiggle room when your use case does not match the assumed shape. Production deployments in London have settled on a few common patterns — the kind of patterns that show up in three different vendors' reference architectures because they are the only patterns that actually work at scale.
The patterns that hold up under load:
Cost and performance numbers are where the marketing usually breaks down. The honest summary for Zep 2.0 and Graphiti as of April 10, 2026 looks like this: median latency is good, p99 latency is fine, and cost-per-request is competitive — but each of those is contingent on the deployment model you pick.
Self-hosted deployments give you control and unpredictable ops cost. Managed deployments give you predictability and a vendor-priced ceiling. The break-even point sits around the volume where you would need a half-FTE of ops to keep the self-hosted version healthy. For teams under 100k requests/day, managed almost always wins. Above 1M/day, self-hosted starts to make financial sense if you have the engineering bench to support it.
Two things tend to go wrong when teams adopt this stack without a careful plan. First, they over-architect for scale they do not have yet. Second, they under-invest in evals because the demo "felt right" — and then they have no way to measure regressions when they ship the next change. The teams that get the cost story right tend to share three traits: they instrument cost from day one, they cache aggressively at multiple layers, and they pick a single primary model rather than letting every agent call the most expensive option by default.
Looking forward, the next 90 days are likely to bring three meaningful changes. First, observability standards will continue to consolidate around OpenTelemetry's GenAI conventions — teams that emit them today will be ahead of the curve. Second, more managed agent platforms will ship MCP-native interfaces, reducing the integration glue every team writes today. Third, evals will move from a nice-to-have to a CI gate, just like unit tests did a decade ago.
The teams that ship the cleanest agent products in late 2026 will be the ones that took infrastructure decisions seriously now. The trade-offs covered above are not novel — they are the same boring infrastructure questions every previous wave of platform technology had to answer. The names are different. The decisions are not.
When should I use Zep 2.0 and Graphiti in production?
Still reading? Stop comparing — try CallSphere live.
CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.
Zep 2.0 and Graphiti is the right pick when you need cross-session memory that survives restarts and supports user-level personalization. If your workload is simpler — for example, a single-turn classification task — you do not need this stack and lighter-weight tooling will get you to production faster. The break-even tends to land around the point where you have at least one multi-step agent serving real users with measurable cost or accuracy implications.
What does Zep 2.0 and Graphiti cost at scale?
Memory cost is dominated by embedding generation and vector storage. For a 100k-user agent product, expect costs in the low-to-mid four figures monthly across embedding API spend and vector storage.
What is the leading alternative to Zep 2.0 and Graphiti in 2026?
Common alternatives include Mem0 for vector-first memory, Letta for in-context/archival splits, Cognee for graph-first memory. The right pick depends on your existing stack, team experience, and which set of trade-offs you can live with operationally.
How do I prevent the memory layer from leaking data across users?
Strict tenant isolation. Every memory record is keyed by a user identifier, every recall query filters by that key, and the filter is enforced at the storage layer, not the application layer. Multi-tenant memory bugs are silent and dangerous — invest in tests that prove isolation, not just code review.
This guide is written for engineers and operators evaluating zep graphiti temporal knowledge graph agent memory documentation in real production systems. Zep graphiti temporal knowledge graph agent memory documentation sits alongside answering questions, build temporal knowledge graphs, entities extracted, entities relationships and facts, entity and relationship in the daily work of teams shipping production AI. The notes below give a plain-language reference for terms used throughout the article.
For teams that want to ship zep graphiti temporal knowledge graph agent memory documentation in voice and chat agents this quarter, CallSphere runs 37 agents and 90+ function tools across 6 verticals on a single dashboard. Start a 14-day trial, see live demo agents, or compare tiers on /pricing.
Written by
Sagar Shankaran· Founder, CallSphere
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
See how AI voice agents work for your industry. Live demo available -- no signup required.
Graphiti is the open-source temporal knowledge graph for AI agents in 2026. Learn how bi-temporal memory beats vector RAG for voice agents and long-running LLMs.
Zep Cloud and OSS Zep have diverged in 2026 with different feature sets. The build-vs-buy math for memory infrastructure with concrete cost numbers and trade-offs.
Neo4j's agent-memory project ships short-term, long-term, and reasoning memory in one graph. Microsoft Agent Framework and LangChain both wire it in. Here is the production pattern.
Cognee builds and queries a knowledge graph from your unstructured data automatically. A walkthrough from install to your first agent integration in production.
Why static knowledge graphs fail for agents that learn over time, and how Graphiti's temporal edges fix it. Concrete schema examples and edge-case behavior.
Three serious agent-memory layers in 2026: Mem0, Zep, and Letta. Where each one wins on cost, recall, and operational simplicity for production agent teams.
© 2026 CallSphere LLC. All rights reserved.