TL;DR — pgvector 0.8.x on Postgres 17/18 is the simplest sane vector store for AI agents in 2026. You get ACID, joins, RLS, and HNSW indexes without bolting on a separate vector DB. This guide takes you from CREATE EXTENSION to a working top-k query in under 30 minutes.

What you'll build

A single agent_memories table that stores 1536-dimension OpenAI embeddings alongside metadata, indexed for sub-50ms top-k retrieval. By the end you'll have:

pgvector installed and verified
A schema with vector + JSONB metadata + tenant isolation
An HNSW index tuned for ~1M rows
A working Prisma + Python retriever returning top-5 hits

Schema

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE agent_memories (
  id BIGSERIAL PRIMARY KEY,
  tenant_id UUID NOT NULL,
  agent_id UUID NOT NULL,
  content TEXT NOT NULL,
  embedding vector(1536) NOT NULL,
  metadata JSONB NOT NULL DEFAULT '{}'::jsonb,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX agent_memories_embedding_hnsw
  ON agent_memories
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

CREATE INDEX agent_memories_tenant_idx ON agent_memories (tenant_id, agent_id);
CREATE INDEX agent_memories_meta_gin ON agent_memories USING gin (metadata);

Architecture

flowchart LR
  USER[User turn] --> EMB[OpenAI embed]
  EMB --> PG[(Postgres + pgvector)]
  PG -->|top-k cosine| RET[Retrieved chunks]
  RET --> LLM[GPT-4o]
  LLM --> REPLY[Agent reply]
  REPLY --> WRITE[Write to memory]
  WRITE --> PG

Step 1 — Install pgvector

On Postgres 18 with the postgresql-18-pgvector package:

sudo apt install postgresql-18-pgvector
psql -d agents -c "CREATE EXTENSION vector;"
psql -d agents -c "SELECT extname, extversion FROM pg_extension WHERE extname = 'vector';"

Step 2 — Define the Prisma model

model AgentMemory {
  id         BigInt   @id @default(autoincrement())
  tenantId   String   @map("tenant_id") @db.Uuid
  agentId    String   @map("agent_id")  @db.Uuid
  content    String
  embedding  Unsupported("vector(1536)")
  metadata   Json     @default("{}")
  createdAt  DateTime @default(now()) @map("created_at")

  @@map("agent_memories")
  @@index([tenantId, agentId])
}

Prisma cannot natively express vector, so use Unsupported and run raw SQL for inserts and ANN searches.

Hear it before you finish reading

Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.

Try Live →

Try Live Demo →

Step 3 — Insert with raw SQL

import { PrismaClient } from "@prisma/client";
const prisma = new PrismaClient();

export async function remember(
  tenantId: string,
  agentId: string,
  content: string,
  embedding: number[],
) {
  const vec = `[${embedding.join(",")}]`;
  await prisma.$executeRaw`
    INSERT INTO agent_memories (tenant_id, agent_id, content, embedding)
    VALUES (${tenantId}::uuid, ${agentId}::uuid, ${content}, ${vec}::vector)
  `;
}

Step 4 — Top-k retrieval

export async function recall(
  tenantId: string,
  agentId: string,
  query: number[],
  k = 5,
) {
  const vec = `[${query.join(",")}]`;
  return prisma.$queryRaw`
    SELECT id, content, metadata,
           1 - (embedding <=> ${vec}::vector) AS similarity
    FROM agent_memories
    WHERE tenant_id = ${tenantId}::uuid
      AND agent_id  = ${agentId}::uuid
    ORDER BY embedding <=> ${vec}::vector
    LIMIT ${k}
  `;
}

The <=> operator is cosine distance. Use <-> for L2, <#> for inner-product.

Step 5 — Tune `ef_search` per query

SET LOCAL hnsw.ef_search = 100;
SELECT id FROM agent_memories
ORDER BY embedding <=> $1::vector LIMIT 10;

Default ef_search is 40 — too low for >100k rows. Set it per session or per query for higher recall.

Step 6 — Python equivalent (LangChain agents)

import psycopg, os
from openai import OpenAI

client = OpenAI()
conn = psycopg.connect(os.environ["DATABASE_URL"])

def embed(text: str) -> list[float]:
    return client.embeddings.create(
        model="text-embedding-3-small", input=text
    ).data[0].embedding

def recall(tenant: str, agent: str, query: str, k: int = 5):
    v = embed(query)
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT content, 1 - (embedding <=> %s::vector) AS sim
            FROM agent_memories
            WHERE tenant_id = %s AND agent_id = %s
            ORDER BY embedding <=> %s::vector LIMIT %s
            """,
            (v, tenant, agent, v, k),
        )
        return cur.fetchall()

Pitfalls

Forgetting to cast — '[1,2,3]'::vector is required; raw arrays fail.
Index built before data load — HNSW build is much faster on populated tables; bulk load first, then index.
Default ef_search = 40 — too low; bump to 100+ for production recall.
No tenant filter — without it, ANN scans the whole graph. Always include tenant_id in the WHERE.

CallSphere production note

CallSphere runs 115+ DB tables across 6 verticals on PostgreSQL — including a dedicated healthcare_voice Prisma schema for HIPAA workloads, OneRoof's RLS-enforced tenant model, and UrackIT's Supabase + ChromaDB hybrid. Across 37 specialized agents and 90+ tools, every long-term memory write goes through pgvector with HNSW. Pricing tiers: $149 / $499 / $1,499, 14-day trial, 22% affiliate program.

FAQ

Q: Should I use IVFFlat or HNSW? HNSW for ≥99% workloads in 2026 — better recall, no lists to tune. IVFFlat only when memory is constrained.

Still reading? Stop comparing — try CallSphere live.

CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.

Try Live Demo → Book 30-min Walkthrough See Pricing

Q: How many vectors before pgvector hurts? Up to ~10M with HNSW + decent hardware. Beyond that consider sharding (Citus) or pgvectorscale.

Q: Can I store 3072-dim embeddings (text-embedding-3-large)? Yes, but HNSW caps at 2,000 dims. Use halfvec or matryoshka-truncate to 1536.

Q: Does pgvector support metadata filtering? Yes — combine WHERE clauses with ANN ordering. Postgres planner handles it.

Q: Cosine vs L2 vs dot product? Cosine for normalized OpenAI embeddings, dot product when you control normalization yourself.

pgvector Quickstart: Wire AI Agent Embeddings into Postgres in 30 Min (2026)

What you'll build

Schema

Architecture

Step 1 — Install pgvector

Step 2 — Define the Prisma model

Step 3 — Insert with raw SQL

Step 4 — Top-k retrieval

Step 5 — Tune `ef_search` per query

Step 6 — Python equivalent (LangChain agents)

Pitfalls

CallSphere production note

FAQ

Sources

Try CallSphere AI Voice Agents

Related Articles You May Like

Build a Chat Agent with Haystack RAG + Open LLM (Llama 3.2, 2026)

Agentic RAG with LangGraph: Iterative Retrieval, Self-Correction, and Eval Pipelines

Building Your First Agent with the OpenAI Agents SDK in 2026: A Hands-On Walkthrough

LangGraph Checkpointers in Production: Durable, Resumable Agents with Eval Replay

LangGraph State-Machine Architecture: A Principal-Engineer Deep Dive (2026)

Production RAG Agents with LangChain and RAGAS Evaluation in 2026

What you'll build

Schema

Architecture

Step 1 — Install pgvector

Step 2 — Define the Prisma model

Step 3 — Insert with raw SQL

Step 4 — Top-k retrieval

Step 5 — Tune ef_search per query

Step 6 — Python equivalent (LangChain agents)

Pitfalls

CallSphere production note

FAQ

Sources

Try CallSphere AI Voice Agents

Related Articles You May Like

Build a Chat Agent with Haystack RAG + Open LLM (Llama 3.2, 2026)

Agentic RAG with LangGraph: Iterative Retrieval, Self-Correction, and Eval Pipelines

Building Your First Agent with the OpenAI Agents SDK in 2026: A Hands-On Walkthrough

LangGraph Checkpointers in Production: Durable, Resumable Agents with Eval Replay

LangGraph State-Machine Architecture: A Principal-Engineer Deep Dive (2026)

Production RAG Agents with LangChain and RAGAS Evaluation in 2026

Step 5 — Tune `ef_search` per query