
pgvector Quickstart: Wire AI Agent Embeddings into Postgres in 30 Min (2026)

A working pgvector quickstart for AI agents — install the extension, design a hybrid table, build an HNSW index, and query top-k matches from Prisma and Python. Real schema, real code, no marketing fluff.

TL;DR — pgvector 0.8.x on Postgres 17/18 is the simplest sane vector store for AI agents in 2026. You get ACID, joins, RLS, and HNSW indexes without bolting on a separate vector DB. This guide takes you from CREATE EXTENSION to a working top-k query in under 30 minutes.

What you'll build

A single agent_memories table that stores 1536-dimension OpenAI embeddings alongside metadata, indexed for sub-50ms top-k retrieval. By the end you'll have:

  • pgvector installed and verified
  • A schema with vector + JSONB metadata + tenant isolation
  • An HNSW index tuned for ~1M rows
  • A working Prisma + Python retriever returning top-5 hits

Schema

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE agent_memories (
  id BIGSERIAL PRIMARY KEY,
  tenant_id UUID NOT NULL,
  agent_id UUID NOT NULL,
  content TEXT NOT NULL,
  embedding vector(1536) NOT NULL,
  metadata JSONB NOT NULL DEFAULT '{}'::jsonb,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX agent_memories_embedding_hnsw
  ON agent_memories
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

CREATE INDEX agent_memories_tenant_idx ON agent_memories (tenant_id, agent_id);
CREATE INDEX agent_memories_meta_gin ON agent_memories USING gin (metadata);
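
Before wiring any application code, a throwaway row makes a decent smoke test (a sketch: the random embedding is a stand-in for a real one, and the array-to-vector cast assumes pgvector 0.5.0 or later):

-- Insert one fake memory, then pull it back by cosine distance.
INSERT INTO agent_memories (tenant_id, agent_id, content, embedding)
VALUES (
  gen_random_uuid(), gen_random_uuid(), 'smoke test',
  (SELECT array_agg(random())::vector(1536) FROM generate_series(1, 1536))
);

SELECT id, content
FROM agent_memories
ORDER BY embedding <=> (SELECT array_agg(random())::vector(1536)
                        FROM generate_series(1, 1536))
LIMIT 1;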

Architecture

flowchart LR
  USER[User turn] --> EMB[OpenAI embed]
  EMB --> PG[(Postgres + pgvector)]
  PG -->|top-k cosine| RET[Retrieved chunks]
  RET --> LLM[GPT-4o]
  LLM --> REPLY[Agent reply]
  REPLY --> WRITE[Write to memory]
  WRITE --> PG

Step 1 — Install pgvector

On Debian/Ubuntu with the PGDG apt repository, Postgres 18 gets a prebuilt postgresql-18-pgvector package:

sudo apt install postgresql-18-pgvector
psql -d agents -c "CREATE EXTENSION vector;"
psql -d agents -c "SELECT extname, extversion FROM pg_extension WHERE extname = 'vector';"

Step 2 — Define the Prisma model

model AgentMemory {
  id         BigInt   @id @default(autoincrement())
  tenantId   String   @map("tenant_id") @db.Uuid
  agentId    String   @map("agent_id")  @db.Uuid
  content    String
  embedding  Unsupported("vector(1536)")
  metadata   Json     @default("{}")
  createdAt  DateTime @default(now()) @map("created_at")

  @@map("agent_memories")
  @@index([tenantId, agentId])
}

Prisma cannot natively express the vector type, so declare it as Unsupported and run raw SQL for inserts and ANN searches; Prisma Client skips Unsupported columns entirely, which is why every query below goes through $executeRaw or $queryRaw.
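
One consequence: prisma migrate dev emits the vector(1536) column but not the extension or the ANN index. Generate the migration with --create-only and hand-edit it; a sketch (the file name is illustrative):

-- prisma/migrations/20260101000000_init/migration.sql (illustrative name)
CREATE EXTENSION IF NOT EXISTS vector;

-- ...Prisma-generated CREATE TABLE "agent_memories" goes here...

-- Appended by hand; Prisma's schema language cannot express this index.
CREATE INDEX agent_memories_embedding_hnsw
  ON agent_memories USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);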

Hear it before you finish reading

Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.

Try Live Demo →

Step 3 — Insert with raw SQL

import { PrismaClient } from "@prisma/client";
const prisma = new PrismaClient();

export async function remember(
  tenantId: string,
  agentId: string,
  content: string,
  embedding: number[],
) {
  const vec = `[${embedding.join(",")}]`;
  await prisma.$executeRaw`
    INSERT INTO agent_memories (tenant_id, agent_id, content, embedding)
    VALUES (${tenantId}::uuid, ${agentId}::uuid, ${content}, ${vec}::vector)
  `;
}

Step 4 — Top-k retrieval

export async function recall(
  tenantId: string,
  agentId: string,
  query: number[],
  k = 5,
) {
  const vec = `[${query.join(",")}]`;
  return prisma.$queryRaw`
    SELECT id, content, metadata,
           1 - (embedding <=> ${vec}::vector) AS similarity
    FROM agent_memories
    WHERE tenant_id = ${tenantId}::uuid
      AND agent_id  = ${agentId}::uuid
    ORDER BY embedding <=> ${vec}::vector
    LIMIT ${k}
  `;
}

The <=> operator is cosine distance. Use <-> for L2 distance and <#> for inner product. Note that <#> returns the negative inner product, so an ascending ORDER BY still puts the best matches first.
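
A one-liner on toy vectors builds intuition for all three operators, including the negation on <#>:

SELECT '[1,0]'::vector <=> '[0,1]'::vector AS cosine_dist,  -- 1.0 (orthogonal)
       '[1,0]'::vector <-> '[0,1]'::vector AS l2_dist,      -- 1.4142... (sqrt 2)
       '[1,0]'::vector <#> '[1,1]'::vector AS neg_ip;       -- -1.0 (dot product 1, negated)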

Step 5 — Tune ef_search per query

BEGIN;
SET LOCAL hnsw.ef_search = 100;  -- SET LOCAL only takes effect inside a transaction
SELECT id FROM agent_memories
ORDER BY embedding <=> $1::vector LIMIT 10;
COMMIT;

Default ef_search is 40 — too low for >100k rows. Set it per transaction with SET LOCAL (as above) or per session with plain SET for higher recall.
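
If you would rather not open a transaction for every query, the setting can also be persisted at the database or role level (a sketch: agent_app is a hypothetical application role, and setting extension GUCs this way may require superuser):

ALTER DATABASE agents SET hnsw.ef_search = 100;
-- or scope it to just the application's role:
ALTER ROLE agent_app SET hnsw.ef_search = 100;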

Step 6 — Python equivalent (LangChain agents)

import os

import psycopg
from openai import OpenAI

client = OpenAI()
conn = psycopg.connect(os.environ["DATABASE_URL"])

def embed(text: str) -> list[float]:
    return client.embeddings.create(
        model="text-embedding-3-small", input=text
    ).data[0].embedding

def recall(tenant: str, agent: str, query: str, k: int = 5):
    # Serialize to pgvector's text format ("[0.1,0.2,...]"), mirroring the
    # TypeScript helper, so the %s::vector cast succeeds on any driver setup.
    v = "[" + ",".join(map(str, embed(query))) + "]"
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT content, 1 - (embedding <=> %s::vector) AS sim
            FROM agent_memories
            WHERE tenant_id = %s::uuid AND agent_id = %s::uuid
            ORDER BY embedding <=> %s::vector LIMIT %s
            """,
            (v, tenant, agent, v, k),
        )
        return cur.fetchall()

Pitfalls

  • Forgetting the cast — embeddings must be sent as '[1,2,3]'::vector; raw arrays fail with many drivers.
  • Index built before data load — HNSW build is much faster on populated tables; bulk load first, then create the index (see the sketch after this list).
  • Default ef_search = 40 — too low; bump to 100+ for production recall.
  • No tenant filter — the WHERE clause is applied after the ANN scan, so an unfiltered query ranks neighbors across every tenant. Always include tenant_id; on pgvector 0.8+, hnsw.iterative_scan also helps when filters are very selective.
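
The bulk-load pattern from the second pitfall, spelled out in SQL (a sketch: the staging file path and memory settings are illustrative; tune them to your hardware):

DROP INDEX IF EXISTS agent_memories_embedding_hnsw;

COPY agent_memories (tenant_id, agent_id, content, embedding)
  FROM '/tmp/memories.csv' WITH (FORMAT csv);

SET maintenance_work_mem = '2GB';          -- HNSW builds are memory-hungry
SET max_parallel_maintenance_workers = 7;  -- parallel build if cores allow
CREATE INDEX agent_memories_embedding_hnsw
  ON agent_memories USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);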

CallSphere production note

CallSphere runs 115+ DB tables across 6 verticals on PostgreSQL — including a dedicated healthcare_voice Prisma schema for HIPAA workloads, OneRoof's RLS-enforced tenant model, and UrackIT's Supabase + ChromaDB hybrid. Across 37 specialized agents and 90+ tools, every long-term memory write goes through pgvector with HNSW. Pricing tiers: $149 / $499 / $1,499, 14-day trial, 22% affiliate program.

FAQ

Q: Should I use IVFFlat or HNSW? HNSW for ≥99% of workloads in 2026 — better recall, no lists to tune. IVFFlat only when memory or build time is constrained, and it needs data loaded before indexing so the centroids are meaningful.
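
For that constrained case, an IVFFlat sketch (the pgvector docs suggest lists ≈ rows / 1000 up to about 1M rows):

CREATE INDEX agent_memories_embedding_ivf
  ON agent_memories USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 1000);

SET ivfflat.probes = 32;  -- more probes: better recall, slower queries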

Still reading? Stop comparing — try CallSphere live.

CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.

Q: How many vectors before pgvector hurts? Up to ~10M with HNSW + decent hardware. Beyond that consider sharding (Citus) or pgvectorscale.

Q: Can I store 3072-dim embeddings (text-embedding-3-large)? Yes, but HNSW on the vector type caps at 2,000 dims. Either index the column as halfvec (indexable up to 4,000 dims) or matryoshka-truncate to 1536.
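
The halfvec route stores full precision and indexes a half-precision expression, the pattern shown in the pgvector README (big_memories is a hypothetical table with an embedding vector(3072) column):

CREATE INDEX big_memories_embedding_hnsw
  ON big_memories USING hnsw ((embedding::halfvec(3072)) halfvec_cosine_ops);

-- Queries must use the same expression for the index to be eligible:
-- ORDER BY embedding::halfvec(3072) <=> $1::halfvec(3072)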

Q: Does pgvector support metadata filtering? Yes — combine ordinary WHERE clauses with ANN ordering. The planner applies the filter, and pgvector 0.8's iterative index scans keep recall up when a filter discards most candidates.
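
For example, a JSONB containment filter composes directly with the ANN ordering (the "kind" key is illustrative):

SELECT id, content
FROM agent_memories
WHERE tenant_id = $1::uuid                  -- tenant isolation first
  AND metadata @> '{"kind": "preference"}'  -- served by the GIN index
ORDER BY embedding <=> $2::vector
LIMIT 5;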

Q: Cosine vs L2 vs dot product? Cosine for normalized OpenAI embeddings, dot product when you control normalization yourself.
