By Sagar Shankaran, Founder of CallSphere
Letta treats the LLM like an OS that manages its own RAM, recall, and archival memory. Here is when this paradigm beats simple vector stores.
Key takeaways
TL;DR — Letta (formerly MemGPT) is an "LLM-as-OS" runtime where the agent manages its own memory tiers like an operating system manages RAM and disk. If your agent needs to learn across sessions and edit its own context, Letta is the most mature option in 2026.
flowchart TD
Client[MCP client · Claude Desktop] --> MCP[MCP server]
MCP --> Tool1[Tool: Calendar]
MCP --> Tool2[Tool: CRM]
MCP --> Tool3[Tool: KB search]
Tool1 --> SaaS1[(Calendly)]
Tool2 --> SaaS2[(Salesforce)]
Tool3 --> SaaS3[(Notion)]
Traditional LLM apps treat memory as something the application layer fetches and stuffs into the prompt. Letta inverts that: the agent decides what to keep in context, what to push to recall, and what to archive. The model has tools to read and write its own memory tiers.
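Three tiers: core memory (a small set of always-in-context blocks the agent can edit), recall memory (searchable conversation history that has left the context window), and archival memory (long-term storage the agent queries on demand).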
When the context is about to overflow, the agent receives a system message ("you are running out of context") and must decide what to evict to recall, what to summarize into core, and what to archive. This is the OS analogy made literal.
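The overflow behavior described above can be illustrated with a small, self-contained toy. This is only a sketch: `TieredMemory` and its methods are hypothetical names, not Letta's API, and the decisions a Letta agent would make through tool calls are hard-coded here.

```python
from dataclasses import dataclass, field


@dataclass
class TieredMemory:
    """Toy three-tier memory. All names are hypothetical, not Letta's API."""
    context_limit: int = 3                        # max messages in the live window
    core: dict = field(default_factory=dict)      # small, always in context
    context: list = field(default_factory=list)   # live message window
    recall: list = field(default_factory=list)    # evicted history, searchable
    archival: list = field(default_factory=list)  # long-term facts

    def add_message(self, text):
        """Append a message; return True when the window exceeds its limit."""
        self.context.append(text)
        return len(self.context) > self.context_limit

    def evict_oldest(self, keep):
        """Move all but the newest `keep` messages from context into recall."""
        cut = max(len(self.context) - keep, 0)
        self.recall.extend(self.context[:cut])
        self.context = self.context[cut:]

    def recall_search(self, term):
        """Case-insensitive substring search over evicted messages."""
        return [m for m in self.recall if term.lower() in m.lower()]


memory = TieredMemory(context_limit=3)
pressure = False
for msg in [
    "Hi, it's about my mom",
    "Her breathing is worse at night",
    "The doctor said to call if it recurs",
    "It happened again today",
]:
    pressure = memory.add_message(msg)

if pressure:
    # A Letta agent would choose these actions itself; here they are fixed.
    memory.core["human"] = "Caller's mother has a recurring breathing issue."
    memory.archival.append("Doctor's instruction: call if the issue recurs.")
    memory.evict_oldest(keep=2)
```

The point of the toy is the order of operations: summarize into core and archive first, then evict, so nothing is lost when older messages leave the window.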
The MemGPT open-source project was absorbed into Letta. The platform now ships:
Pick Letta when:
Skip Letta when:
CallSphere's voice agents are mostly session-bounded — a single inbound or outbound call is the unit of work. We don't need Letta for that.
But our after-hours product (7 agents with explicit escalation) is exactly Letta-shaped. When a customer's caretaker calls at 11 PM about a recurring issue, the agent benefits from remembering the prior week's escalations, the family member's preferences, the on-call doctor's instructions. That state lives in our Postgres today; we've prototyped a Letta-backed version that lets the agent edit its own "what I know about this household" core memory after each call.
For our Real Estate OneRoof deployment (10 specialist agents), the buyer-journey use case is similar — a buyer searches for 6 months, talks to the agent dozens of times, and the agent should learn their preferences over that span. That's a Letta-shaped problem.
Pricing: $149 / $499 / $1499. 14-day trial. 22% affiliate.
pip install letta or run the Letta server: docker run -p 8283:8283 letta/letta.
agent.memory.get().

    # The Python client imported below is published as letta-client
    # (pip install letta-client).
    from letta_client import Letta

    client = Letta(base_url="http://localhost:8283")

    agent = client.agents.create(
        name="callsphere-after-hours",
        memory_blocks=[
            {"label": "persona", "value": "I am a calm, careful after-hours support agent."},
            {"label": "human", "value": "Unknown caller. I will learn as we talk."},
        ],
        # Custom tools; they must already be registered on the Letta server.
        tools=["lookup_account", "page_on_call_doctor"],
        model="openai/gpt-5",
        embedding="openai/text-embedding-3-large",
    )

    response = client.agents.messages.create(
        agent_id=agent.id,
        messages=[{"role": "user", "content": "It's about my mom again, the breathing thing"}],
    )

    # Internally the agent updated its 'human' core memory block to record the
    # caller's relationship and the recurring concern. Next call benefits.
Sizing the three tiers correctly is the difference between a useful Letta agent and a confused one:
Letta's automatic eviction logic moves content between tiers as context fills. The default behavior is reasonable; we override it only when we want a specific fact (like "this household speaks Spanish primarily") to stay in core forever.
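One way to picture the "stay in core forever" override is an eviction pass that skips pinned facts. The sketch below is a toy under stated assumptions: `CoreFact`, `pinned`, and `evict_to_budget` are hypothetical names for illustration and do not reflect how Letta implements eviction.

```python
from dataclasses import dataclass


@dataclass
class CoreFact:
    text: str
    pinned: bool = False  # hypothetical flag: pinned facts are never evicted


def evict_to_budget(core, archival, budget_chars):
    """Evict the oldest unpinned facts into archival until core fits the budget.

    Returns the new core list; `archival` is extended in place.
    """
    kept = list(core)
    for fact in core:  # oldest first
        if sum(len(f.text) for f in kept) <= budget_chars:
            break
        if not fact.pinned:
            kept.remove(fact)
            archival.append(fact)
    return kept


core = [
    CoreFact("This household speaks Spanish primarily.", pinned=True),
    CoreFact("Caller asked about billing on Monday."),
    CoreFact("Caller prefers evening callbacks."),
]
archival = []
core = evict_to_budget(core, archival, budget_chars=80)
```

With an 80-character budget, the oldest unpinned fact moves to archival while the pinned language preference stays in core even though it is the oldest entry.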
A common Letta failure is the agent over-writing core memory until it loses sight of its persona. Mitigations:
We hit this in early prototyping; the read-only persona block fix made the agent's identity stable across hundreds of turns.
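The read-only persona idea can be sketched without a server. The classes below are hypothetical stand-ins, not Letta's types; they only show the guard: edits to the caller block succeed, edits to the persona block are rejected.

```python
from dataclasses import dataclass


class ReadOnlyBlockError(Exception):
    """Raised when an edit targets a protected block."""


@dataclass
class MemoryBlock:
    label: str
    value: str
    read_only: bool = False


class CoreMemory:
    """Toy core memory keyed by block label (hypothetical, not Letta's API)."""

    def __init__(self, blocks):
        self._blocks = {block.label: block for block in blocks}

    def get(self, label):
        return self._blocks[label].value

    def append(self, label, text):
        block = self._blocks[label]
        if block.read_only:
            raise ReadOnlyBlockError(f"block '{label}' is read-only")
        block.value = f"{block.value} {text}".strip()


core = CoreMemory([
    MemoryBlock("persona", "I am a calm, careful after-hours support agent.", read_only=True),
    MemoryBlock("human", "Unknown caller. I will learn as we talk."),
])

# Learning about the caller is allowed.
core.append("human", "Caller is the patient's daughter.")

# Rewriting the persona is not.
try:
    core.append("persona", "I am actually a sales agent.")
    persona_edit_blocked = False
except ReadOnlyBlockError:
    persona_edit_blocked = True
```

Raising an error rather than silently ignoring the write gives the agent a signal it can react to, for example by recording the new information in a writable block instead.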
The Conversations API lets multiple users interact with the same agent and share memory. For a household agent (mom, dad, kids all calling about the same elderly relative's care), this is exactly the model. The agent knows it's a household, not three individual relationships, and reasons accordingly.
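The shared-memory model can be pictured as several caller sessions holding a reference to the same household block. This is a conceptual toy only: `SharedBlock` and `CallerSession` are hypothetical names and say nothing about how the Conversations API is actually shaped.

```python
from dataclasses import dataclass, field


@dataclass
class SharedBlock:
    """One memory block visible to every session that references it."""
    label: str
    notes: list = field(default_factory=list)


@dataclass
class CallerSession:
    caller: str
    household: SharedBlock  # the same object for every family member

    def remember(self, note):
        # Tag each note with its source so the agent can tell callers apart.
        self.household.notes.append(f"{self.caller}: {note}")


household = SharedBlock(label="household")
mom = CallerSession("mom", household)
dad = CallerSession("dad", household)

mom.remember("Grandma's night nurse changed this week.")
dad.remember("Prefers callbacks after 6 PM.")

# Both sessions see the combined household knowledge.
shared_view = dad.household.notes
```

Tagging each note with the caller keeps the household view unified while still letting the agent distinguish who said what.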
We're prototyping this for our after-hours product where caretakers and family members both interact with the same support agent.
Letta vs mem0 vs Zep? Letta is a runtime (the agent lives in Letta). mem0 is a memory library you wire into your existing agent. Zep is a managed memory platform with temporal knowledge graphs. Pick by where you want the agent to live.
Is Letta production-ready? Yes. The Letta Code agent ranks #1 on Terminal-Bench among OSS model-agnostic coding agents — that's a strong production signal.
Does it work with MCP? Yes — Letta agents can mount MCP servers as toolsets.
What's the licensing? Apache 2.0 for the OSS server. Letta also offers a managed cloud.
Where do I see this on CallSphere? Book a demo and we'll walk through our after-hours Letta prototype.
Written by
Sagar Shankaran· Founder, CallSphere
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
© 2026 CallSphere LLC. All rights reserved.