Neo4j Knowledge Graph Memory for AI Agents in 2026

By Sagar Shankaran, Founder of CallSphere

Quick answer

Neo4j's agent-memory project ships short-term, long-term, and reasoning memory in one graph. Microsoft Agent Framework and LangChain both wire it in. Here is the production pattern.

Key takeaways

TL;DR: Neo4j Labs shipped neo4j-agent-memory in 2026, a graph-native memory layer for AI agents that keeps three layers in one knowledge graph: short-term (full conversation chains), long-term (an entity and preference graph), and reasoning (a record of how the agent solved past problems). It plugs into Microsoft Agent Framework, LangChain, Pydantic AI, Google ADK, Strands, and CrewAI.

The technique

Vector memory is fast for fuzzy recall but blind to relationships. Graph memory excels at multi-hop entity queries such as "which patients did Dr. Lee see in February who also saw Dr. Park in March, and which of them have BCBS in-network?" That is a 4-hop join: costly to assemble from vector search plus SQL, but a single pattern match in Cypher.
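As a rough illustration of how that question can read as one pattern match, the sketch below holds a possible Cypher phrasing in a Python constant and builds its parameters. The SAW relationship, its date property, the name and carrier properties, and the reading of "in-network" as "in-network with the first doctor" are all assumptions made for this example; only HAS_PLAN and IN_NETWORK_WITH mirror the healthcare patterns listed under "CallSphere implementation".

```python
import re

# Hypothetical schema for illustration only:
#   (:Patient)-[:SAW {date}]->(:Provider {name})
#   (:Patient)-[:HAS_PLAN]->(:InsurancePlan {carrier})-[:IN_NETWORK_WITH]->(:Provider)
SHARED_PATIENTS_QUERY = """
MATCH (first:Provider {name: $first_doctor})<-[v1:SAW]-(p:Patient)
      -[v2:SAW]->(second:Provider {name: $second_doctor})
WHERE v1.date >= date($first_from) AND v1.date < date($first_to)
  AND v2.date >= date($second_from) AND v2.date < date($second_to)
MATCH (p)-[:HAS_PLAN]->(:InsurancePlan {carrier: $carrier})
      -[:IN_NETWORK_WITH]->(first)
RETURN DISTINCT p.id AS patient_id
"""


def build_params(first_doctor, first_range, second_doctor, second_range, carrier):
    """Map human-level inputs to the query parameters (ISO date strings)."""
    return {
        "first_doctor": first_doctor,
        "first_from": first_range[0],
        "first_to": first_range[1],
        "second_doctor": second_doctor,
        "second_from": second_range[0],
        "second_to": second_range[1],
        "carrier": carrier,
    }


def placeholders(query):
    """Collect every $name placeholder so a caller can check none is missing."""
    return set(re.findall(r"\$(\w+)", query))


# Example values; the dates are placeholders, not data from the article.
params = build_params(
    "Dr. Lee", ("2026-02-01", "2026-03-01"),
    "Dr. Park", ("2026-03-01", "2026-04-01"),
    "BCBS",
)
```

With a live driver, the pair would be passed as `session.run(SHARED_PATIENTS_QUERY, **params)`. Checking `placeholders(SHARED_PATIENTS_QUERY) == set(params)` before sending the query is a cheap guard against a mistyped parameter name.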

The neo4j-agent-memory schema models three memory layers as connected sub-graphs in one graph:

  • Short-term: (:Session)-[:HAS_MESSAGE]->(:Message) chains
  • Long-term: (:Person)-[:WORKS_AT]->(:Org), (:Person)-[:PREFERS]->(:Preference), etc.
  • Reasoning: (:Plan)-[:STEP]->(:Action)-[:USED_TOOL]->(:Tool) with outcome metadata

flowchart LR
  C[Conversation turn] --> EX[NER + relation extractor]
  EX --> ST[(Short-term graph)]
  EX --> LT[(Long-term graph)]
  P[Agent plan] --> RM[(Reasoning graph)]
  Q[Query] --> CY[Cypher router]
  CY --> ST
  CY --> LT
  CY --> RM
  CY --> A[Agent]

How it works

On every turn, a multi-stage entity extractor runs: spaCy or GLiNER handles the cheap, common cases, and an LLM is the fallback for hard ones. Entities are upserted with embedding-based dedup ("Sagar S" -> "Sagar Shankaran"). Relations are extracted with a small LLM. New facts are appended; conflicting facts trigger a reconciliation prompt.
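The dedup step can be sketched in a few lines. This is a toy under stated assumptions, not the library's implementation: character-trigram counts stand in for a real embedding model, the 0.5 threshold is arbitrary, and the function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def trigram_vector(name: str) -> Counter:
    """Toy stand-in for an embedding: counts of character trigrams."""
    padded = f"  {name.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def resolve_entity(mention, canonical_names, threshold=0.5):
    """Return the best-matching canonical name, or None to create a new node."""
    mention_vec = trigram_vector(mention)
    best_name, best_score = None, 0.0
    for name in canonical_names:
        score = cosine(mention_vec, trigram_vector(name))
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


known = ["Sagar Shankaran", "Sara Park"]
print(resolve_entity("Sagar S", known))  # Sagar Shankaran
print(resolve_entity("Dr. Lee", known))  # None
```

Returning None rather than the nearest weak match keeps unrelated mentions from being merged into an existing node; the threshold is the knob that trades duplicate nodes against wrong merges.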

Retrieval is a typed Cypher query, not a similarity search. The agent has tools like graph_query_neighbors(entity), graph_find_path(a, b), graph_get_preferences(user). For fuzzy recall the agent calls a separate vector index over node text + descriptions.
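A minimal sketch of what such tools might look like, assuming each one wraps a fixed, parameterized Cypher query. The tool names come from the paragraph above; the query bodies, the shared Entity label, the id and name properties, and the 4-hop path limit are assumptions. The `session` argument is anything with a `run(query, **params)` method whose result exposes `.data()`, which is how a neo4j Python driver session behaves.

```python
from typing import Any, Protocol


class SessionLike(Protocol):
    """Minimal shape of the session object the tools rely on."""

    def run(self, query: str, **params: Any) -> Any: ...


def graph_query_neighbors(session: SessionLike, entity_id: str, limit: int = 25):
    """List direct neighbors of one entity, with the relation type for each."""
    query = (
        "MATCH (e:Entity {id: $entity_id})-[r]-(n) "
        "RETURN type(r) AS relation, labels(n) AS labels, n.id AS neighbor_id "
        "LIMIT $limit"
    )
    return session.run(query, entity_id=entity_id, limit=limit).data()


def graph_find_path(session: SessionLike, a_id: str, b_id: str):
    """Find one shortest path of at most 4 hops between two entities."""
    query = (
        "MATCH (a:Entity {id: $a_id}), (b:Entity {id: $b_id}) "
        "MATCH p = shortestPath((a)-[*..4]-(b)) "
        "RETURN [n IN nodes(p) | n.id] AS node_ids"
    )
    return session.run(query, a_id=a_id, b_id=b_id).data()


def graph_get_preferences(session: SessionLike, user_id: str):
    """Return the stored preferences for one user."""
    query = (
        "MATCH (:Person {id: $user_id})-[:PREFERS]->(pref:Preference) "
        "RETURN pref.name AS preference"
    )
    return session.run(query, user_id=user_id).data()
```

Keeping the Cypher fixed and passing only parameters means the agent chooses which tool to call and with what arguments, but never writes query text itself.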

The reasoning layer is the underrated piece: every plan + tool-call + outcome is logged. The agent can query "have I solved a problem like this before?" and lift the playbook.
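One way to picture the logging side is a helper that turns a finished plan into Cypher writes following the (:Plan)-[:STEP]->(:Action)-[:USED_TOOL]->(:Tool) shape described earlier. This is a sketch only: the property names (goal, order, input, outcome), the keyword-based recall query, and the tool names in the example are assumptions, not the project's schema.

```python
# Recall query: a plain keyword match stands in for whatever similarity
# search a real deployment would use to find comparable past problems.
PAST_PLANS_QUERY = (
    "MATCH (p:Plan)-[s:STEP]->(a:Action)-[:USED_TOOL]->(t:Tool) "
    "WHERE toLower(p.goal) CONTAINS toLower($keyword) "
    "RETURN p.id AS plan_id, s.order AS step, t.name AS tool, a.outcome AS outcome "
    "ORDER BY plan_id, step"
)


def plan_write_statements(plan_id, goal, steps):
    """Turn one finished plan into (query, params) pairs, one per write."""
    statements = [(
        "MERGE (p:Plan {id: $plan_id}) SET p.goal = $goal",
        {"plan_id": plan_id, "goal": goal},
    )]
    for order, step in enumerate(steps, start=1):
        statements.append((
            "MATCH (p:Plan {id: $plan_id}) "
            "MERGE (t:Tool {name: $tool}) "
            "CREATE (p)-[:STEP {order: $order}]->"
            "(a:Action {input: $input, outcome: $outcome}) "
            "CREATE (a)-[:USED_TOOL]->(t)",
            {
                "plan_id": plan_id,
                "order": order,
                "tool": step["tool"],
                "input": step["input"],
                "outcome": step["outcome"],
            },
        ))
    return statements


# Hypothetical plan with two tool calls.
statements = plan_write_statements(
    "plan_17",
    "verify insurance eligibility",
    [
        {"tool": "lookup_plan", "input": "patient_4421", "outcome": "success"},
        {"tool": "check_network", "input": "provider_88", "outcome": "success"},
    ],
)
print(len(statements))  # 3
```

Each pair can be sent with `session.run(query, **params)`. Tools are merged so that repeated use of the same tool accumulates on one node, while every action stays a separate node carrying its own outcome.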

CallSphere implementation

CallSphere uses Neo4j as the cross-entity memory layer for verticals where relationships matter most:

  • Healthcare: (:Patient)-[:HAS_PLAN]->(:InsurancePlan)-[:IN_NETWORK_WITH]->(:Provider) and (:Patient)-[:PRESCRIBED]->(:Medication) for allergy/interaction checks.
  • OneRoof real estate: (:Buyer)-[:WORKING_WITH]->(:Agent)-[:BROKERAGE]->(:Brokerage), (:Listing)-[:IN]->(:Neighborhood)-[:ZONED_FOR]->(:School).
  • UrackIT IT helpdesk: (:Incident)-[:ON]->(:Service)-[:DEPENDS_ON]->(:Service) for blast-radius reasoning.

37 agents · 90+ tools · 115+ DB tables · 6 verticals. $149/$499/$1499, 14-day trial, 22% affiliate. Vertical pages: /industries/it-services, /industries/real-estate.

Build steps with code

from neo4j_agent_memory import AgentMemory
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "***"))
mem = AgentMemory(driver)

# Write
mem.add_long_term(
    user_id="patient_4421",
    text="Patient is allergic to penicillin and amoxicillin.",
    extract_entities=True,
)
mem.add_short_term(session_id="s_99", role="user", content="Same kid as last visit")

# Read with Cypher
with driver.session() as s:
    rows = s.run("""
      MATCH (p:Person {id: $uid})-[:ALLERGIC_TO]->(d:Drug)
      RETURN d.name AS drug
    """, uid="patient_4421").data()

  1. Run a multi-stage entity extractor: spaCy/GLiNER first, LLM only on miss.
  2. Always keep an embedding index alongside for fuzzy recall.
  3. Version every fact with asserted_at and source properties.
  4. Index the properties behind hot query patterns; Cypher lookups without a supporting index fall back to scans and get slow as the graph grows.
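Steps 3 and 4 might translate into something like the following. It is a sketch that assumes the Person, Drug, and ALLERGIC_TO names from the read query above and Neo4j 5 schema syntax; the index and constraint names are made up, and the write query is one possible way to stamp asserted_at and source on a fact.

```python
# Assumed labels and properties; adjust to the schema actually in use.
SCHEMA_STATEMENTS = [
    # Uniqueness constraint, which also backs lookups by Person id.
    "CREATE CONSTRAINT person_id IF NOT EXISTS "
    "FOR (p:Person) REQUIRE p.id IS UNIQUE",
    # Node property index for lookups by drug name.
    "CREATE INDEX drug_name IF NOT EXISTS FOR (d:Drug) ON (d.name)",
    # Relationship property index for time-based filtering of facts.
    "CREATE INDEX allergy_asserted_at IF NOT EXISTS "
    "FOR ()-[r:ALLERGIC_TO]-() ON (r.asserted_at)",
]

# Records when and where a fact was first asserted.
VERSIONED_FACT_QUERY = """
MATCH (p:Person {id: $person_id})
MERGE (d:Drug {name: $drug})
MERGE (p)-[r:ALLERGIC_TO]->(d)
ON CREATE SET r.asserted_at = datetime($asserted_at), r.source = $source
"""


def apply_schema(session):
    """Run each schema statement once; IF NOT EXISTS makes reruns harmless."""
    for statement in SCHEMA_STATEMENTS:
        session.run(statement)
    return len(SCHEMA_STATEMENTS)
```

`apply_schema` expects a driver session and can be called at startup. `VERSIONED_FACT_QUERY` takes `person_id`, `drug`, an ISO 8601 `asserted_at` string, and `source` as parameters.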

Pitfalls

  • Entity drift: same person becomes 3 nodes. Hard-enforce dedup with embedding match + alias rules.
  • Schema explosion: 200+ relation types make querying chaotic. Cap at 30–50.
  • Cost of LLM extraction: at scale the extractor dominates the bill. Use cheap statistical NER first.
  • No conflict policy: when "user lives in NY" and "user lives in Seattle" both exist, the agent picks at random unless you implement temporal reconciliation.
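The conflict case in the last bullet can be handled with a simple rule. The sketch below assumes a "most recent assertion wins" policy, which is only one possible form of temporal reconciliation; the Fact fields reuse the asserted_at and source properties from the build steps, and the dates and sources are invented example data.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    value: str
    asserted_at: datetime
    source: str


def current_facts(facts):
    """Keep the most recently asserted fact per (subject, predicate)."""
    latest = {}
    for fact in facts:
        key = (fact.subject, fact.predicate)
        if key not in latest or fact.asserted_at > latest[key].asserted_at:
            latest[key] = fact
    return latest


# Invented example data: two conflicting residence facts for one user.
facts = [
    Fact("user_1", "LIVES_IN", "NY", datetime(2024, 3, 1), "intake_form"),
    Fact("user_1", "LIVES_IN", "Seattle", datetime(2025, 9, 12), "call_transcript"),
]
print(current_facts(facts)[("user_1", "LIVES_IN")].value)  # Seattle
```

Older facts are not deleted here, only passed over, so the history stays available for audits or for a reconciliation prompt that needs both sides of the conflict.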

FAQ

Graph or vector? Both. Graph for entity-heavy queries; vector for fuzzy recall.

Neo4j or Memgraph? Neo4j for ecosystem and labs (agent-memory, GenAI integrations); Memgraph for raw query throughput.

Cypher complexity? Moderate. A senior engineer is productive in a week.

Cost? Neo4j Aura starts at a hobby tier; the self-hosted Community Edition is free.

See it on /demo? Yes — try a multi-hop query like "find providers in-network for both my plans."

Written by

Sagar Shankaran · Founder, CallSphere

Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
