---
title: "Graphiti: How Temporal Knowledge Graphs Give AI Voice Agents Persistent Memory (2026 Guide)"
description: "Graphiti is the open-source temporal knowledge graph for AI agents in 2026. Learn how bi-temporal memory beats vector RAG for voice agents and long-running LLMs."
canonical: https://callsphere.ai/blog/graphiti-temporal-knowledge-graph-ai-agents-2026
category: "Agentic AI"
tags: ["Knowledge Graphs", "AI Agents", "Memory", "Graphiti", "Zep", "RAG", "LLM Memory", "Agent Architecture", "Temporal Reasoning", "Voice Agents"]
author: "CallSphere Team"
published: 2026-05-15T00:00:00.000Z
updated: 2026-05-16T00:29:26.462Z
---

# Graphiti: How Temporal Knowledge Graphs Give AI Voice Agents Persistent Memory (2026 Guide)

> Graphiti is the open-source temporal knowledge graph for AI agents in 2026. Learn how bi-temporal memory beats vector RAG for voice agents and long-running LLMs.

## TL;DR

Agents that hold real conversations need real memory, and pure vector RAG is not memory — it is full-text recall without time, contradiction handling, or entity resolution. In 2026 the canonical open-source answer is [Graphiti](https://github.com/getzep/graphiti), Zep's temporal knowledge-graph library that ingests conversational "episodes," extracts entities and relationships with an LLM, stores them as a bi-temporal graph (valid time + system time), and serves hybrid retrieval (semantic + BM25 + graph traversal) back to the agent. The result is an agent that remembers that Mrs. Garcia is allergic to penicillin, that her allergy was confirmed on April 2 and is still valid today, and that her son David is the emergency contact — three facts a vector store would happily return as three contradictory chunks. At [CallSphere](/products/voice-agents) we think about memory in exactly this shape: every call is an episode, every caller is a node, every fact has a valid-from and valid-to. This guide explains the model, shows real Graphiti code, and walks the tradeoffs honestly.

## Why Stateless Agents Fall Apart in Production

The fastest way to lose trust in an AI voice agent is to make the caller repeat themselves. A patient phones a clinic on Monday to confirm a penicillin allergy. The agent dutifully writes a row to the EHR. The patient phones back Friday to reschedule and the same agent, on the same number, cheerfully asks "any allergies we should know about?" — because between turns the agent forgot everything.

This is the failure mode that kills agent pilots. It is not a model problem. GPT-5, Claude 4.7, and Gemini 3 are all entirely capable of handling that conversation correctly *if you put the right context in the prompt*. The problem is the layer underneath the model: the **memory layer** that decides what counts as "the right context" for this caller, on this turn, right now. Most teams skip that layer and bolt on a vector database, which works for documents and fails for conversations.

A stateless agent, by definition, treats every turn as its own universe. A long-running agent — one that should accumulate state across days, sessions, and calls — needs an explicit, queryable model of what it knows, when it learned it, and whether that fact is still true. That is what a temporal knowledge graph gives you. For background on where this fits in the broader stack, see our [glossary](/glossary) entry on agent memory.

## The Limits of Vector RAG for Agent Memory

When developers reach for "give the agent memory," they almost always start with retrieval-augmented generation: dump every turn into a vector store, embed the user query at runtime, top-k cosine, stuff into the system prompt. This works beautifully for documentation Q&A and breaks in four specific ways the moment you point it at a multi-session conversation.

**No notion of time.** A vector store knows that two strings are semantically close. It does not know which one came first, which one supersedes the other, or whether the older one is still valid. Ask it "what is the patient's current medication?" and it returns every medication ever mentioned, sorted by embedding distance, with no clue that the Lipitor was discontinued in 2025.

**No contradiction handling.** When new information conflicts with old information, the vector store keeps both. "I live in Brooklyn" from 2024 and "I just moved to Austin" from last week both come back. The model then has to figure out which one is current, often by counting tokens or guessing — neither of which is reliable enough for healthcare, finance, or legal voice agents.

**No entity resolution.** "Mrs. Garcia," "Maria Garcia," "the patient on line 2," and "the caller from yesterday morning" are four embeddings, not one entity. A vector store cannot tell you everything about Maria Garcia because it does not know those strings refer to the same node.

**No structured relations.** "David is Maria's emergency contact" and "Maria is David's mother" are facts a knowledge graph stores once and traverses both directions. A vector store stores two chunks and prays one of them ranks high enough on the next query.

Vector RAG is necessary — semantic recall is real and useful — but it is not sufficient. The 2026 consensus in [agentic RAG](/blog) is that long-running agents need a graph for facts and a vector index for fuzzy recall, fused at query time. Graphiti is the open-source library that does both in one system.

## What Graphiti Actually Does

Graphiti, built by [Zep](https://www.getzep.com/), is a Python framework that turns a stream of conversational episodes into a continuously updated, queryable knowledge graph. The full source lives at [github.com/getzep/graphiti](https://github.com/getzep/graphiti). Four ideas matter.

**Episodes are the unit of ingestion.** You hand Graphiti a chunk of conversation, a JSON document, or a freeform text blob and call `add_episode()`. Each episode is preserved verbatim as provenance, then passed through an LLM that extracts entities (nodes) and relationships (edges). Every node and edge keeps a pointer back to the episode that produced it, so you can always audit *why* the graph believes a fact.

**The data model is bi-temporal.** Every edge in the graph carries two timelines. **Valid time** says when the fact was true in the real world ("Maria has been on lisinopril since 2024-03-01"). **System time** — also called transaction time — says when Graphiti learned about it ("we ingested this on 2024-03-02T14:21:00Z"). The pair is the difference between a graph that can answer "what did we know on Tuesday?" and one that cannot.

**Edges get invalidated, not deleted.** When a new episode contradicts an existing edge, Graphiti does not overwrite. It closes the old edge's valid-time window and opens a new one. The history stays in the graph. You can run point-in-time queries that reconstruct the world as the agent understood it at any past moment — invaluable for debugging "why did the agent say that on the May 4 call?"

**Retrieval is hybrid.** A search call combines semantic embeddings, BM25 keyword search, and graph traversal from a center node, with reranking on top. That is the part most teams get wrong when they roll their own knowledge graph: pure Cypher queries miss fuzzy matches, pure vector misses structure, and you need both fused. Graphiti ships the fusion out of the box.

The supported backends are Neo4j 5.26+, FalkorDB 1.1.2+, Kuzu 0.11.2+, and Amazon Neptune. Most teams start on Neo4j (mature, familiar Cypher) and switch to FalkorDB when they want a Redis-style operational footprint.

## Architecture Flowchart

Here is the end-to-end data path for a single agent turn that uses Graphiti as its memory layer. This is the shape we recommend for any [voice agent](/products/voice-agents) or [chat agent](/products/chat-agents) that needs to span more than one session.

```mermaid
flowchart LR
  U[User Turn] --> A[Agent Runtime]
  A -->|add_episode| G[Graphiti Ingestion]
  G --> X[LLM Entity & Edge Extraction]
  X --> N[Node Resolution]
  N --> I[Edge Invalidation Check]
  I --> DB[(Graph DB: Neo4j / FalkorDB)]
  A -->|search query| H[Hybrid Retrieval]
  H --> S[Semantic Embeddings]
  H --> B[BM25 Keyword]
  H --> T[Graph Traversal]
  S --> R[Reranker]
  B --> R
  T --> R
  R --> C[Context Window]
  C --> L[LLM Reasoner]
  L --> Resp[Agent Response]
  Resp --> U
```

Two arrows do the heavy lifting. The top path is write-side: every turn becomes an episode, every episode mutates the graph, every mutation respects bi-temporal validity. The bottom path is read-side: the agent's next query goes through hybrid retrieval, gets reranked against a center node (usually the caller's node), and lands in the LLM's context window as a compact, time-aware fact set. The graph is the source of truth; the LLM is the reasoner over it.

## Why Voice Agents Especially Need This

[Voice agents](/products/voice-agents) live or die on continuity. A chat widget on a website can ask "what's your email?" and the user types it again. A voice caller on the phone will not repeat their date of birth, their last appointment, or their allergy list — they will hang up and call the front desk. The bar for memory on a voice channel is dramatically higher than on a chat channel, and the latency budget for retrieving that memory is far tighter.

Consider three concrete voice-agent scenarios where temporal knowledge graphs change the conversation outcome:

**Clinic intake.** A patient calls about a refill. The agent needs to recall the active medications (valid today), the lapsed medications (valid then, invalidated since), the allergy list (currently valid), and the primary care physician (relationship, not a chunk). A vector store gives you the medications-ever-mentioned soup. Graphiti gives you "as of 2026-05-15, active medications are X, Y; allergies are penicillin (confirmed 2024-04-02); PCP is Dr. Chen."

**Outbound sales follow-up.** An SDR agent called Mr. Patel three weeks ago. He said "call me back after May 10 when I'm done with quarter-close." Today is May 15. The agent needs to know that conversation happened, what was promised, what objections came up, and which products were discussed — not because they are semantically similar to today's intro, but because they are linked to *this caller node* by *this episode*. Graph traversal from the caller node returns the previous-call context in one query.

**Insurance claims.** A claimant calls back on a multi-week case. The graph holds the claim node, the adjuster node, the policyholder node, the incident node, and dated edges between them. The agent can answer "what's the status of my claim?" by walking three hops from the caller, not by re-embedding the entire claim file every turn.
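The "walking three hops from the caller" claim is worth making concrete. This is a toy adjacency traversal over hypothetical claim-graph triples — not Graphiti's retrieval code — but it shows why a linked graph answers a status question without re-embedding anything:

```python
from collections import deque

# Toy claim graph as (subject, relation, object) triples; names are illustrative.
triples = [
    ("caller:lee", "POLICYHOLDER_OF", "policy:884"),
    ("policy:884", "HAS_CLAIM", "claim:17"),
    ("claim:17", "ABOUT_INCIDENT", "incident:flood-0412"),
    ("claim:17", "ASSIGNED_TO", "adjuster:ngo"),
    ("claim:17", "STATUS", "status:awaiting-inspection"),
]

def facts_within_hops(start: str, max_hops: int) -> set[tuple[str, str, str]]:
    """Collect every triple reachable within max_hops of the start node."""
    seen_nodes = {start}
    found: set[tuple[str, str, str]] = set()
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand past the hop budget
        for s, rel, o in triples:
            if node in (s, o):
                found.add((s, rel, o))
                for nxt in (s, o):
                    if nxt not in seen_nodes:
                        seen_nodes.add(nxt)
                        frontier.append((nxt, depth + 1))
    return found

# Caller -> policy -> claim -> status: the claim status is 3 hops away.
ctx = facts_within_hops("caller:lee", max_hops=3)
print(any(rel == "STATUS" for _, rel, _ in ctx))  # True
```

In Graphiti the same effect comes from its graph-traversal retrieval leg (and center-node reranking) rather than a hand-rolled BFS, but the cost model is the point: the context is found by following edges from the caller, not by re-scoring the whole corpus.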

This is the architectural posture we take seriously at CallSphere. We do not currently ship Graphiti as a configurable product feature — our voice and chat agents use a combination of session state, vector recall, and structured CRM joins — but the temporal-graph perspective is exactly how we think about extending memory across calls. If you are building a voice agent that needs to remember callers across weeks, start with Graphiti and a small graph DB and grow into it.

## Sample Integration (Code)

Here is a realistic Python integration that mirrors what a CallSphere-style voice agent would do at call-end and call-start. It uses the actual Graphiti API surface from the public README and quickstart — `add_episode`, `search`, `EpisodeType` — and assumes Neo4j on `bolt://localhost:7687`. Do not paste this verbatim into production; treat it as a structural reference.

```python
import asyncio
import json
from datetime import datetime, timezone

from graphiti_core import Graphiti
from graphiti_core.nodes import EpisodeType

async def ingest_call_turn(
    graphiti: Graphiti,
    caller_id: str,
    transcript_chunk: str,
    structured_facts: dict | None = None,
) -> None:
    """Push one turn of a voice call into the temporal knowledge graph."""
    # 1. Ingest the raw conversational turn as a text episode.
    await graphiti.add_episode(
        name=f"call:{caller_id}:turn:{datetime.now(timezone.utc).isoformat()}",
        episode_body=transcript_chunk,
        source=EpisodeType.message,
        source_description=f"Voice call transcript for caller {caller_id}",
        reference_time=datetime.now(timezone.utc),
    )

    # 2. Optionally ingest structured facts (e.g., extracted slots) as JSON.
    #    Graphiti will resolve these against existing entities and invalidate
    #    any stale edges (e.g., old phone number, lapsed medication).
    if structured_facts:
        await graphiti.add_episode(
            name=f"call:{caller_id}:facts:{datetime.now(timezone.utc).isoformat()}",
            episode_body=json.dumps(structured_facts),
            source=EpisodeType.json,
            source_description=f"Extracted slots for caller {caller_id}",
            reference_time=datetime.now(timezone.utc),
        )

async def recall_caller_context(
    graphiti: Graphiti,
    caller_id: str,
    user_query: str,
) -> list[str]:
    """Pull a compact, time-aware fact set for the agent's next turn."""
    # Hybrid search: semantic + BM25 + graph traversal, reranked.
    # In production you would pass center_node_uuid=<the caller's node uuid>
    # so reranking favors facts close to this caller in the graph; folding
    # the caller id into the query string is a simplification for this sketch.
    results = await graphiti.search(
        query=f"{user_query} (caller {caller_id})",
    )
    # Each result is an edge carrying the fact, its source episode,
    # and its validity window.
    return [r.fact for r in results]

async def main() -> None:
    graphiti = Graphiti(
        "bolt://localhost:7687",
        "neo4j",
        "password",
    )
    try:
        # One-time setup: build the indices and constraints Graphiti
        # expects (idempotent, safe to run on every startup).
        await graphiti.build_indices_and_constraints()

        # End of call: persist the turn.
        await ingest_call_turn(
            graphiti,
            caller_id="caller_8421",
            transcript_chunk=(
                "Patient confirmed penicillin allergy. "
                "Wants to reschedule April 22 follow-up to May 6."
            ),
            structured_facts={
                "patient_id": "caller_8421",
                "allergy": "penicillin",
                "reschedule_from": "2026-04-22",
                "reschedule_to": "2026-05-06",
            },
        )

        # Start of next call (days later): pull caller-specific memory.
        facts = await recall_caller_context(
            graphiti,
            caller_id="caller_8421",
            user_query="Any drug allergies on file?",
        )
        for f in facts:
            print(f)
    finally:
        # Release driver connections cleanly.
        await graphiti.close()

if __name__ == "__main__":
    asyncio.run(main())
```

Two things to flag. First, every call to `add_episode` is an async operation that fans out to LLM-based entity extraction — budget for it. Second, the `search` call is fast enough for chat (typically 80-250ms on a warm graph) but you will want to cache caller context at call-start for [voice agents](/products/voice-agents) where every millisecond of head-of-utterance latency is audible.

## A Second Diagram: Multi-Turn Voice Call with Recalled Context

Here is the same idea as a sequence diagram, showing how a returning caller benefits from the graph on call number two.

```mermaid
sequenceDiagram
  participant C as Caller (Mrs. Garcia)
  participant V as CallSphere Voice Agent
  participant G as Graphiti
  participant DB as Graph DB

  Note over C,DB: Call 1 — April 2
  C->>V: "I'm allergic to penicillin."
  V->>G: add_episode("allergy: penicillin", ref_time=Apr 2)
  G->>DB: extract entity Mrs. Garcia, edge HAS_ALLERGY -> penicillin
  DB-->>G: edge stored, valid_from=Apr 2
  G-->>V: ack

  Note over C,DB: Call 2 — May 15 (43 days later)
  C->>V: "Hi, I need to refill my amoxicillin."
  V->>G: search("amoxicillin", center=caller:garcia)
  G->>DB: hybrid query (semantic + BM25 + traversal)
  DB-->>G: HAS_ALLERGY=penicillin (still valid), prior_visit=Apr 2
  G-->>V: facts: [allergic to penicillin, prior visit Apr 2]
  V->>C: "Before I refill — amoxicillin is in the penicillin family. Should I flag this for Dr. Chen?"
  C->>V: "Yes, please."
  V->>G: add_episode("refill flagged for review", ref_time=May 15)
```

This is the conversation the patient remembers. No restating allergies, no missing context, no embarrassment. The graph carries the fact across 43 days and the agent does the right thing on turn one.

## Production Tradeoffs

Graphiti is the right answer for many agent products, but it is not free. Be honest with yourself about the tradeoffs before you commit.

**Latency.** Every `add_episode` triggers an LLM call for entity extraction (usually 300-1500ms depending on the extractor model), graph writes, and embedding generation. `search` adds 80-300ms on top of whatever your reasoner needs. For voice agents with sub-second TTFB targets, you ingest asynchronously after the turn and retrieve synchronously at turn-start with aggressive caching. Do not put `add_episode` on the critical path of a phone call.
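Keeping `add_episode` off the critical path is a few lines of asyncio. This is a structural sketch with a stubbed ingestion coroutine (`slow_ingest` is our stand-in, not a Graphiti function); the key details are the un-awaited `create_task` and the strong reference held in `pending` so the task is not garbage-collected mid-flight:

```python
import asyncio

async def slow_ingest(transcript_chunk: str) -> None:
    """Stand-in for graphiti.add_episode: LLM extraction plus graph writes."""
    await asyncio.sleep(0.05)  # pretend this is 300-1500ms of extraction

async def handle_turn(transcript_chunk: str, pending: set[asyncio.Task]) -> str:
    # Schedule ingestion in the background; do NOT await it here.
    task = asyncio.create_task(slow_ingest(transcript_chunk))
    pending.add(task)                       # keep a strong reference
    task.add_done_callback(pending.discard)  # drop it once finished
    # Respond immediately; the graph catches up off the critical path.
    return "agent response"

async def main() -> None:
    pending: set[asyncio.Task] = set()
    reply = await handle_turn("Patient confirmed penicillin allergy.", pending)
    print(reply)                    # returned before ingestion finished
    await asyncio.gather(*pending)  # drain at call end, not per turn

asyncio.run(main())
```

One design note: draining at call end (or in a worker queue) also gives you a natural place to batch turns into fewer, larger episodes, which cuts the per-call extraction cost described below.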

**LLM cost.** Entity-and-edge extraction is the single largest variable cost. A 5-minute voice call with 30 turns can produce 30 ingestion LLM calls. At GPT-4o-mini prices this is cheap; at GPT-5 it is not. Most teams use a small extractor model (Haiku 4.5, GPT-4o-mini, Gemini 2.5 Flash) and reserve frontier models for the reasoning step. Graphiti supports configurable LLM clients exactly for this reason.

**Database cost.** Neo4j Aura's lowest production tier starts around $65/month and scales aggressively with node count. Self-hosted FalkorDB on a small VM is closer to $10/month and handles millions of nodes for most conversational workloads. Kuzu is embedded (zero ops, single binary) and a strong choice for low-traffic pilots. Pick based on your ops appetite, not vendor familiarity.

**Schema drift.** The LLM extractor will occasionally invent entity types that diverge from your canonical schema ("Person" vs "Caller" vs "Patient"). Graphiti's custom entity types help, but you will spend real engineering effort on the type system. Plan for a schema-review session every couple of weeks during the first quarter.
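A cheap guardrail against drift, separate from Graphiti's custom entity types, is a normalization pass between extraction and ingestion. The schema and aliases below are hypothetical; the point is that unmapped labels fail loudly instead of silently minting new node types:

```python
# Canonical schema: every extracted label must map to one of these types.
CANONICAL = {"Patient", "Provider", "Medication", "Appointment"}

# Alias table, grown during schema-review sessions as the extractor drifts.
ALIASES = {
    "Person": "Patient",
    "Caller": "Patient",
    "Doctor": "Provider",
    "Physician": "Provider",
    "Drug": "Medication",
}

def normalize_label(raw_label: str) -> str:
    """Map an extractor-produced entity label onto the canonical schema."""
    label = ALIASES.get(raw_label, raw_label)
    if label not in CANONICAL:
        # Surface drift instead of quietly growing the type system.
        raise ValueError(f"unmapped entity label: {raw_label!r}")
    return label

print(normalize_label("Caller"))  # Patient
print(normalize_label("Drug"))    # Medication
```

Route the `ValueError` cases to a review queue and the alias table becomes the artifact of those biweekly schema sessions, rather than a graph full of near-duplicate types.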

**When NOT to use it.** Stateless FAQ bots, single-turn classifiers, and document Q&A do not need a knowledge graph. If your agent never sees the same user twice and never reasons about change-over-time, pure vector RAG is the right answer and Graphiti is overkill. Use it when memory has to persist and facts have to evolve.

For an ROI-driven view of when these tradeoffs pencil out, run a quick estimate in our [ROI calculator](/tools/roi-calculator).

## Graphiti vs. Other Approaches in 2026

Here is the fair comparison most blog posts skip. Each tool below is good at its job; they are not interchangeable.

| Approach | Strength | Weakness | Best Fit |
| --- | --- | --- | --- |
| LangChain ConversationBuffer | Drop-in, zero infra, easy to reason about | No persistence across sessions, no entity model, blows context window quickly | Prototypes, single-session chatbots |
| Mem0 | Auto-summarized user memory, multi-backend, simple API | Lighter on temporal semantics than Graphiti, less explicit invalidation logic | Personalization layer for chat assistants |
| LlamaIndex Property Graph | Tight integration with LlamaIndex ingestion + query stack | More general-purpose KG, less opinionated about agent conversation semantics | Document-heavy KGs, mixed RAG + graph workloads |
| Graphiti | Bi-temporal, edge invalidation, hybrid retrieval purpose-built for agents | Requires graph DB, real ingestion latency, LLM cost on every episode | Long-running agents, voice agents, multi-session memory |

Mention-only, not an endorsement: Cursor and Claude both ship their own internal memory layers for code assistants, and Anthropic's [agent skills](/blog) ecosystem is converging on similar ideas. Those are great for IDE workflows. For a customer-facing voice agent that has to remember a caller across months, the open-source path through Graphiti is more transparent, more auditable, and easier to debug.

## How CallSphere Thinks About Agent Memory

At [CallSphere](/products/voice-agents) we run AI voice agents and chat agents for clinics, dental practices, restaurants, real-estate teams, and home-services brands. Across every vertical, the single complaint that breaks pilots is "the AI forgot." A patient who has to restate their date of birth on call three is a patient who books with a human next time.

Our position on memory is architectural, not feature-flagged. We use a combination of session-scoped state for in-call continuity, structured CRM joins for stable customer facts, and vector recall for fuzzy semantic matches across transcripts. For customers who want true cross-call memory with full temporal semantics — "remember Mrs. Garcia's allergy from April when she calls back in May" — Graphiti is exactly the shape we think it should take. We are happy to walk through how that plugs into a CallSphere [voice agent](/products/voice-agents) or [chat agent](/products/chat-agents) deployment during a [pricing](/pricing) conversation.

If your buying criteria include "the agent has to remember callers across weeks," do not settle for vector-only memory. Ask vendors specifically how they handle bi-temporal facts, edge invalidation, and entity resolution. The answer tells you whether you are buying a memory system or a vector search disguised as one.

## FAQ

**What is Graphiti?**
Graphiti is an open-source Python framework from Zep that builds a temporal knowledge graph for AI agents. It ingests conversational episodes, uses an LLM to extract entities and relationships, stores them in a bi-temporal graph (valid time + system time) backed by Neo4j, FalkorDB, Kuzu, or Amazon Neptune, and serves hybrid retrieval (semantic + BM25 + graph traversal) back to the agent. The repo is at [github.com/getzep/graphiti](https://github.com/getzep/graphiti).

**Graphiti vs Mem0 — which should I use?**
Use Mem0 when you want a lightweight personalization memory layer that auto-summarizes user preferences and plugs into existing chat assistants with minimal infra. Use Graphiti when you need explicit temporal semantics — facts that go invalid, point-in-time queries, structured entity-relationship reasoning — and you are willing to operate a graph database. Voice agents and multi-session agents lean Graphiti; consumer chat personalization leans Mem0.

**Does Graphiti require Neo4j?**
No. Graphiti supports Neo4j 5.26+, FalkorDB 1.1.2+, Kuzu 0.11.2+, and Amazon Neptune (Database Cluster or Neptune Analytics). Neo4j is the most common starting point because it is mature and well-documented; FalkorDB is a strong choice for Redis-style operational simplicity; Kuzu is embedded and ideal for low-ops pilots.

**How is a temporal knowledge graph different from RAG?**
RAG retrieves semantically similar text chunks. A temporal knowledge graph retrieves structured facts with validity windows. RAG has no native notion of "this fact replaced that one" or "this was true on Tuesday but is no longer true today." A temporal KG handles both — every edge carries valid_from / valid_to timestamps, and contradictions invalidate old edges rather than overwriting them. In practice, modern agents use both: the graph for structured memory, vector recall for fuzzy semantic matches, fused at query time.

**Is Graphiti production-ready in 2026?**
Yes, with caveats. The library is stable, has real production users (Zep's own commercial offering is built on top of it), and the API surface is small enough to reason about. The caveats are operational: you are running a graph database, paying LLM cost for every ingestion, and handling schema drift on extracted entities. Treat it like any other stateful infra — capacity planning, backups, monitoring — and it is solidly production-grade.

**How does CallSphere use knowledge graphs?**
CallSphere does not currently ship Graphiti as a configurable product feature. Our voice and chat agents use session state, structured CRM joins, and vector recall today. Temporal knowledge graphs are how we think about extending memory across sessions for customers whose use cases require it (clinics, claims, multi-touch sales). If you have that requirement, [contact us](/contact) and we will walk through what an integration would look like.

**What are episodes, nodes, and edges in Graphiti?**
An episode is the raw input — a conversational turn, a JSON blob, or a text document. Nodes are the entities extracted from episodes (people, products, claims, appointments). Edges are the relationships between nodes (HAS_ALLERGY, RESCHEDULED_TO, EMERGENCY_CONTACT_OF), each tagged with bi-temporal validity windows. Every node and edge keeps a pointer back to the source episode for full provenance.

## Closing

If you are shipping a voice or chat agent that needs to remember callers across calls — not within a single session, but across weeks of relationship — vector RAG alone will not get you there. Temporal knowledge graphs are the 2026 answer, and [Graphiti](https://github.com/getzep/graphiti) is the open-source library worth starting with. The hard work is not the graph; it is the bi-temporal model and the contradiction handling, which Graphiti gives you for free.

We help teams ship voice and chat agents that hold up under that bar every day. If you want to talk through what call-spanning memory looks like for your business — whether you build it on Graphiti, on CallSphere, or both — start with our [ROI calculator](/tools/roi-calculator) for a quick estimate, then [contact us](/contact) for a working session. The right memory architecture is the difference between an AI agent that pilots well and one that customers actually trust.

