By Sagar Shankaran, Founder of CallSphere
A working pgvector quickstart for AI agents — install the extension, design a hybrid table, build an HNSW index, and query top-k matches from Prisma and Python. Real schema, real code, no marketing fluff.
Key takeaways
TL;DR — pgvector 0.8.x on Postgres 17/18 is the simplest sane vector store for AI agents in 2026. You get ACID, joins, RLS, and HNSW indexes without bolting on a separate vector DB. This guide takes you from
CREATE EXTENSIONto a working top-k query in under 30 minutes.
A single agent_memories table that stores 1536-dimension OpenAI embeddings alongside metadata, indexed for sub-50ms top-k retrieval. By the end you'll have:
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE agent_memories (
id BIGSERIAL PRIMARY KEY,
tenant_id UUID NOT NULL,
agent_id UUID NOT NULL,
content TEXT NOT NULL,
embedding vector(1536) NOT NULL,
metadata JSONB NOT NULL DEFAULT '{}'::jsonb,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX agent_memories_embedding_hnsw
ON agent_memories
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
CREATE INDEX agent_memories_tenant_idx ON agent_memories (tenant_id, agent_id);
CREATE INDEX agent_memories_meta_gin ON agent_memories USING gin (metadata);
flowchart LR
USER[User turn] --> EMB[OpenAI embed]
EMB --> PG[(Postgres + pgvector)]
PG -->|top-k cosine| RET[Retrieved chunks]
RET --> LLM[GPT-4o]
LLM --> REPLY[Agent reply]
REPLY --> WRITE[Write to memory]
WRITE --> PG
On Postgres 18 with the postgresql-18-pgvector package:
sudo apt install postgresql-18-pgvector
psql -d agents -c "CREATE EXTENSION vector;"
psql -d agents -c "SELECT extname, extversion FROM pg_extension WHERE extname = 'vector';"
model AgentMemory {
id BigInt @id @default(autoincrement())
tenantId String @map("tenant_id") @db.Uuid
agentId String @map("agent_id") @db.Uuid
content String
embedding Unsupported("vector(1536)")
metadata Json @default("{}")
createdAt DateTime @default(now()) @map("created_at")
@@map("agent_memories")
@@index([tenantId, agentId])
}
Prisma cannot natively express vector, so use Unsupported and run raw SQL for inserts and ANN searches.
Hear it before you finish reading
Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.
import { PrismaClient } from "@prisma/client";
const prisma = new PrismaClient();
export async function remember(
tenantId: string,
agentId: string,
content: string,
embedding: number[],
) {
const vec = `[${embedding.join(",")}]`;
await prisma.$executeRaw`
INSERT INTO agent_memories (tenant_id, agent_id, content, embedding)
VALUES (${tenantId}::uuid, ${agentId}::uuid, ${content}, ${vec}::vector)
`;
}
export async function recall(
tenantId: string,
agentId: string,
query: number[],
k = 5,
) {
const vec = `[${query.join(",")}]`;
return prisma.$queryRaw`
SELECT id, content, metadata,
1 - (embedding <=> ${vec}::vector) AS similarity
FROM agent_memories
WHERE tenant_id = ${tenantId}::uuid
AND agent_id = ${agentId}::uuid
ORDER BY embedding <=> ${vec}::vector
LIMIT ${k}
`;
}
The <=> operator is cosine distance. Use <-> for L2, <#> for inner-product.
ef_search per querySET LOCAL hnsw.ef_search = 100;
SELECT id FROM agent_memories
ORDER BY embedding <=> $1::vector LIMIT 10;
Default ef_search is 40 — too low for >100k rows. Set it per session or per query for higher recall.
import psycopg, os
from openai import OpenAI
client = OpenAI()
conn = psycopg.connect(os.environ["DATABASE_URL"])
def embed(text: str) -> list[float]:
return client.embeddings.create(
model="text-embedding-3-small", input=text
).data[0].embedding
def recall(tenant: str, agent: str, query: str, k: int = 5):
v = embed(query)
with conn.cursor() as cur:
cur.execute(
"""
SELECT content, 1 - (embedding <=> %s::vector) AS sim
FROM agent_memories
WHERE tenant_id = %s AND agent_id = %s
ORDER BY embedding <=> %s::vector LIMIT %s
""",
(v, tenant, agent, v, k),
)
return cur.fetchall()
'[1,2,3]'::vector is required; raw arrays fail.ef_search = 40 — too low; bump to 100+ for production recall.tenant_id in the WHERE.CallSphere runs 115+ DB tables across 6 verticals on PostgreSQL — including a dedicated healthcare_voice Prisma schema for HIPAA workloads, OneRoof's RLS-enforced tenant model, and UrackIT's Supabase + ChromaDB hybrid. Across 37 specialized agents and 90+ tools, every long-term memory write goes through pgvector with HNSW. Pricing tiers: $149 / $499 / $1,499, 14-day trial, 22% affiliate program.
Q: Should I use IVFFlat or HNSW?
HNSW for ≥99% workloads in 2026 — better recall, no lists to tune. IVFFlat only when memory is constrained.
Still reading? Stop comparing — try CallSphere live.
CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.
Q: How many vectors before pgvector hurts? Up to ~10M with HNSW + decent hardware. Beyond that consider sharding (Citus) or pgvectorscale.
Q: Can I store 3072-dim embeddings (text-embedding-3-large)? Yes, but HNSW caps at 2,000 dims. Use halfvec or matryoshka-truncate to 1536.
Q: Does pgvector support metadata filtering?
Yes — combine WHERE clauses with ANN ordering. Postgres planner handles it.
Q: Cosine vs L2 vs dot product? Cosine for normalized OpenAI embeddings, dot product when you control normalization yourself.
Written by
Sagar Shankaran· Founder, CallSphere
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
See how AI voice agents work for your industry. Live demo available -- no signup required.
A founder's guide to building a chatbot for answering questions on your website: RAG, voice, and how CallSphere ships one in 3-5 days.
A founder's guide to the personal AI assistant market: best AI assistant apps, business-grade options, and how CallSphere's voice agent fits in.
A founder's guide to free AI agents, low-code AI agent builders, and how to know when you should pay for a real platform like CallSphere.
Graphiti is the open-source temporal knowledge graph for AI agents in 2026. Learn how bi-temporal memory beats vector RAG for voice agents and long-running LLMs.
A founder's guide on how to create a chatbot in 2026. Build options, AI stack, integration patterns, and when buying a managed agent wins over building.
Chatbot app vs ChatGPT in 2026: a founder's clear take on the difference, when to use which, and how a real AI chatbot app development works.
© 2026 CallSphere LLC. All rights reserved.
Watch how CallSphere handles real customer calls, schedules appointments, and processes payments — live.
Try Live DemoBook a DemoCalculate Your ROI