By Sagar Shankaran, Founder of CallSphere
mem0 hit 37k+ GitHub stars and ships v1.0.4 with metadata filtering, project-level config, and timestamp backfills. Here is how to wire it as a drop-in memory bolt-on.
Key takeaways
TL;DR: mem0 ("mem-zero") is the lightest agent memory layer that works. You import it, call memory.add() and memory.search(), and your agent now has long-term memory. v1.0.4 (Feb 2026) adds metadata filtering, scoped config, and backfill timestamps. 37k+ GitHub stars, framework-agnostic.
flowchart TD
Client[MCP client · Claude Desktop] --> MCP[MCP server]
MCP --> Tool1[Tool: Calendar]
MCP --> Tool2[Tool: CRM]
MCP --> Tool3[Tool: KB search]
Tool1 --> SaaS1[(Calendly)]
Tool2 --> SaaS2[(Salesforce)]
Tool3 --> SaaS3[(Notion)]
mem0 is a memory library, not an agent runtime. You keep your existing agent stack (OpenAI Agents SDK, LangGraph, CrewAI, smolagents, whatever) and bolt on memory in two function calls:
from mem0 import Memory
m = Memory()
# After a turn, store what was learned
m.add("User prefers Modal over Docker for sandboxes", user_id="sagar")
# Before next turn, recall
related = m.search("which sandbox does the user prefer?", user_id="sagar")
That's the whole API surface. Behind it: an LLM extracts memorable facts from raw text, a vector store indexes them, retrieval finds the relevant ones at recall time. The library handles deduplication, conflict resolution (new fact contradicts old fact → update), and decay.
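To make the deduplication and conflict-resolution behavior concrete, here is a toy model of the idea. It is a sketch and not mem0's implementation: the `ToyMemory` class, the explicit `topic` key, and the `ADD` / `NONE` / `UPDATE` labels are all assumptions made for illustration. A real memory layer would rely on an LLM and vector similarity to decide whether two facts overlap, rather than an exact key match.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMemory:
    """Illustrative only: keeps one value per (user_id, topic) slot."""

    facts: dict = field(default_factory=dict)

    def add(self, user_id: str, topic: str, value: str) -> str:
        key = (user_id, topic)
        old = self.facts.get(key)
        if old is None:
            self.facts[key] = value
            return "ADD"  # nothing stored yet for this slot
        if old == value:
            return "NONE"  # duplicate fact, nothing to write
        self.facts[key] = value
        return "UPDATE"  # new fact contradicts the old one, so replace it

    def search(self, user_id: str, topic: str):
        return self.facts.get((user_id, topic))


store = ToyMemory()
events = [
    store.add("sagar", "sandbox", "Docker"),
    store.add("sagar", "sandbox", "Docker"),  # repeated, deduplicated
    store.add("sagar", "sandbox", "Modal"),   # contradicts, overwrites
]
```

After these three writes the store holds a single fact for the sandbox slot, which is the property that keeps recall from surfacing stale or repeated preferences.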
timestamp parameter on update() for backfilling memory updates with accurate creation times, which is important for migrations.
Pick mem0 when:
user_id, agent_id, run_id partitions out of the box).
Skip mem0 when:
mem0 powers our per-prospect outbound research memory. When CallSphere's GTM engine reaches out to a prospect, it stores everything it learned (LinkedIn role, company funding stage, tech-stack signals, prior conversation snippets) under user_id=<prospect_email>. The next outbound touch retrieves that memory before drafting the email, so the second touch never asks "what does your company do?" — it references the first touch's context.
For our Real Estate OneRoof, mem0 stores buyer preferences across the months-long buyer journey: school district priorities, must-haves, deal-breakers, family stage. The agent searches by buyer ID before each conversation.
For our IT Services UrackIT deployment, mem0 sits next to the ChromaDB RAG layer — ChromaDB has the company's ticket corpus; mem0 has the per-customer learnings the agent picks up during live troubleshooting.
Pricing: $149 / $499 / $1499. 14-day trial. 22% affiliate.
pip install mem0ai (or npm i mem0ai).
m = Memory.from_config({...}).
m.add(turn_text, user_id=...).
m.search(user_query, user_id=...) and prepend results to the system prompt.
m.search(..., filters={"workflow": "outbound_research"}).
from mem0 import Memory
m = Memory()
m.add(
"Prefers async meetings, EST 9am-2pm, no Mondays",
user_id="sagar",
metadata={"workflow": "scheduling", "source": "email"},
)
# Targeted retrieval
results = m.search(
"when can we schedule the call?",
user_id="sagar",
filters={"workflow": "scheduling"},  # same key form as the filter step above
)
docker run -p 8000:8000 mem0ai/mem0:latest (or use the docker-compose).
The internal LLM extractor doesn't store raw conversation text. It distills each input into atomic facts ("user prefers async meetings," "user works in EST timezone"). These atoms are what get vectorized and indexed. On retrieval the agent receives a list of relevant atoms, not raw turns.
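Since retrieval hands back atoms rather than raw turns, the "prepend results to the system prompt" step can stay small. The helper below is a minimal sketch under stated assumptions: the function names are hypothetical, and the result shape (a list of dicts, or a dict with a `results` list, where each item carries its text under a `memory` key) is an assumption that should be checked against the mem0 version you run.

```python
def extract_memories(search_result):
    """Return memory strings from a search result.

    Assumption: the result is either a list of dicts or a dict with a
    "results" list, and each item stores its text under "memory".
    """
    if isinstance(search_result, dict):
        items = search_result.get("results", [])
    else:
        items = search_result
    return [item["memory"] for item in items if item.get("memory")]


def build_system_prompt(base_prompt, search_result, max_items=5):
    """Append up to max_items recalled facts to the base system prompt."""
    memories = extract_memories(search_result)[:max_items]
    if not memories:
        return base_prompt  # nothing recalled, leave the prompt untouched
    bullets = "\n".join(f"- {m}" for m in memories)
    return f"{base_prompt}\n\nKnown facts about this user:\n{bullets}"


# Hand-written sample in the assumed shape, standing in for m.search(...)
sample = {
    "results": [
        {"memory": "Prefers async meetings", "score": 0.91},
        {"memory": "Works in EST timezone", "score": 0.84},
    ]
}
prompt = build_system_prompt("You are a scheduling assistant.", sample)
```

Capping the number of atoms with `max_items` keeps the prompt short even when a user has accumulated a long history.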
The benefits compound:
The cost: you pay an LLM call per write. Budget this in your cost model. For high-write workloads (chatbots with many turns per session), batch writes or downsample.
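One way to batch writes is to buffer turns per user and flush them as a single write. This is a rough sketch, not a mem0 feature: `WriteBuffer` and its `sink` callable are hypothetical names, and in practice `sink` would wrap something like `m.add(text, user_id=...)`.

```python
from collections import defaultdict


class WriteBuffer:
    """Collects turns per user and writes them in one call per batch."""

    def __init__(self, sink, max_turns=4):
        self.sink = sink  # hypothetical callable: sink(text, user_id=...)
        self.max_turns = max_turns
        self.pending = defaultdict(list)

    def add_turn(self, text, user_id):
        self.pending[user_id].append(text)
        if len(self.pending[user_id]) >= self.max_turns:
            self.flush(user_id)

    def flush(self, user_id):
        turns = self.pending.pop(user_id, [])
        if turns:
            # One write (and so one extraction call) for the whole batch
            self.sink("\n".join(turns), user_id=user_id)


calls = []  # records what would have been sent to the memory layer
buffer = WriteBuffer(lambda text, user_id: calls.append((user_id, text)), max_turns=2)
buffer.add_turn("I prefer async meetings", user_id="sagar")
buffer.add_turn("No Mondays please", user_id="sagar")
buffer.add_turn("EST works best", user_id="sagar")
buffer.flush("sagar")  # end of session: write whatever is left
```

Here three turns produce two writes instead of three. The trade-off is latency: a fact is not searchable until its batch is flushed, so flush at the end of each session at the latest.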
If your "memory" is a static knowledge base (product docs, support articles, past tickets), you don't need mem0 — you need a regular vector DB with a retrieval layer. mem0's value kicks in when memories are generated during conversations and need extraction, conflict resolution, and per-user partitioning.
The simple test: if you're storing things you typed, use a vector DB. If you're storing things the user said, use mem0.
Two operational profiles:
mem0 directly; the vector DB is your responsibility. Best for single-tenant apps and full control.
CallSphere runs server mode behind our existing API gateway. Each tenant gets their own API key and namespace; the dashboard is for ops to debug "why didn't the agent remember X?" by viewing the actual stored atoms.
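If you run the library directly and still need tenant separation, one simple option is to fold the tenant into the ID used for partitioning. The sketch below is an assumption for illustration, not how CallSphere or mem0 implements namespaces; `scoped_user_id` and the `tenant:user` format are hypothetical.

```python
def scoped_user_id(tenant_id: str, user_id: str) -> str:
    """Build a partition key that cannot collide across tenants."""
    for part in (tenant_id, user_id):
        # Reject empty parts and the separator itself to avoid ambiguous keys
        if not part or ":" in part:
            raise ValueError(f"invalid id component: {part!r}")
    # Lowercase the user part so emails match regardless of casing
    return f"{tenant_id}:{user_id.lower()}"


key = scoped_user_id("acme", "Pat@Example.com")
```

The resulting key would then be passed wherever the examples above pass `user_id`, so two tenants with the same prospect email never read each other's memories.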
mem0 vs Letta vs Zep? mem0 is a library you import; Letta is a runtime your agent lives in; Zep is a managed temporal-graph platform. Pick by integration depth.
What vector store should I use? pgvector if you already run Postgres. Pinecone if you want managed. Chroma for local dev.
Does mem0 work with MCP? It can be exposed as an MCP server (community implementations exist) so your agents can read/write memories as tool calls.
Is the OSS version production-ready? Yes — 37k+ stars, AWS integrations, well-tested API. We run it in production.
How do I demo this on CallSphere? Book a demo; we'll show the per-prospect memory feeding our outbound engine.
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.