---
title: "pgvector Quickstart: Wire AI Agent Embeddings into Postgres in 30 Min (2026)"
description: "A working pgvector quickstart for AI agents — install the extension, design a hybrid table, build an HNSW index, and query top-k matches from Prisma and Python. Real schema, real code, no marketing fluff."
canonical: https://callsphere.ai/blog/vw7h-pgvector-quickstart-ai-agent-embeddings-2026
category: "AI Engineering"
tags: ["pgvector", "Postgres", "Embeddings", "RAG", "AI Agents"]
author: "CallSphere Team"
published: 2026-03-15T00:00:00.000Z
updated: 2026-05-07T22:22:39.895Z
---

# pgvector Quickstart: Wire AI Agent Embeddings into Postgres in 30 Min (2026)

> A working pgvector quickstart for AI agents — install the extension, design a hybrid table, build an HNSW index, and query top-k matches from Prisma and Python. Real schema, real code, no marketing fluff.

> **TL;DR** — pgvector 0.8.x on Postgres 17/18 is the simplest sane vector store for AI agents in 2026. You get ACID, joins, RLS, and HNSW indexes without bolting on a separate vector DB. This guide takes you from `CREATE EXTENSION` to a working top-k query in under 30 minutes.

## What you'll build

A single `agent_memories` table that stores 1536-dimension OpenAI embeddings alongside metadata, indexed for sub-50ms top-k retrieval. By the end you'll have:

- pgvector installed and verified
- A schema with vector + JSONB metadata + tenant isolation
- An HNSW index tuned for ~1M rows
- A working Prisma + Python retriever returning top-5 hits

## Schema

```sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE agent_memories (
  id BIGSERIAL PRIMARY KEY,
  tenant_id UUID NOT NULL,
  agent_id UUID NOT NULL,
  content TEXT NOT NULL,
  embedding vector(1536) NOT NULL,
  metadata JSONB NOT NULL DEFAULT '{}'::jsonb,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX agent_memories_embedding_hnsw
  ON agent_memories
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

CREATE INDEX agent_memories_tenant_idx ON agent_memories (tenant_id, agent_id);
CREATE INDEX agent_memories_meta_gin ON agent_memories USING gin (metadata);
```

## Architecture

```mermaid
flowchart LR
  USER[User turn] --> EMB[OpenAI embed]
  EMB --> PG[(Postgres + pgvector)]
  PG -->|top-k cosine| RET[Retrieved chunks]
  RET --> LLM[GPT-4o]
  LLM --> REPLY[Agent reply]
  REPLY --> WRITE[Write to memory]
  WRITE --> PG
```

## Step 1 — Install pgvector

On Postgres 18 with the `postgresql-18-pgvector` package:

```bash
sudo apt install postgresql-18-pgvector
psql -d agents -c "CREATE EXTENSION vector;"
psql -d agents -c "SELECT extname, extversion FROM pg_extension WHERE extname = 'vector';"
```

## Step 2 — Define the Prisma model

```prisma
model AgentMemory {
  id         BigInt   @id @default(autoincrement())
  tenantId   String   @map("tenant_id") @db.Uuid
  agentId    String   @map("agent_id")  @db.Uuid
  content    String
  embedding  Unsupported("vector(1536)")
  metadata   Json     @default("{}")
  createdAt  DateTime @default(now()) @map("created_at")

  @@map("agent_memories")
  @@index([tenantId, agentId])
}
```

Prisma cannot natively express `vector`, so use `Unsupported` and run raw SQL for inserts and ANN searches.

## Step 3 — Insert with raw SQL

```ts
import { PrismaClient } from "@prisma/client";
const prisma = new PrismaClient();

export async function remember(
  tenantId: string,
  agentId: string,
  content: string,
  embedding: number[],
) {
  const vec = `[${embedding.join(",")}]`;
  await prisma.$executeRaw`
    INSERT INTO agent_memories (tenant_id, agent_id, content, embedding)
    VALUES (${tenantId}::uuid, ${agentId}::uuid, ${content}, ${vec}::vector)
  `;
}
```

## Step 4 — Top-k retrieval

```ts
export async function recall(
  tenantId: string,
  agentId: string,
  query: number[],
  k = 5,
) {
  const vec = `[${query.join(",")}]`;
  return prisma.$queryRaw`
    SELECT id, content, metadata,
           1 - (embedding <=> ${vec}::vector) AS similarity
    FROM agent_memories
    WHERE tenant_id = ${tenantId}::uuid
      AND agent_id  = ${agentId}::uuid
    ORDER BY embedding <=> ${vec}::vector
    LIMIT ${k}
  `;
}
```

The `<=>` operator is cosine distance. Use `<->` for L2 and `<#>` for (negative) inner product.
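For intuition, pgvector's cosine distance (`<=>`), L2 distance (`<->`), and negative-inner-product (`<#>`) semantics can be reproduced in plain Python (a sketch for understanding, not how pgvector computes them internally):

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    # pgvector <=>: 1 - cos(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def l2_distance(a: list[float], b: list[float]) -> float:
    # pgvector <->: Euclidean distance
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def neg_inner_product(a: list[float], b: list[float]) -> float:
    # pgvector <#>: NEGATIVE inner product (so ORDER BY ascending works)
    return -sum(x * y for x, y in zip(a, b))
```

Note that `<#>` returns the *negative* inner product, which is why `ORDER BY embedding <#> $1` ascending still puts the best match first.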

## Step 5 — Tune `ef_search` per query

```sql
BEGIN;
SET LOCAL hnsw.ef_search = 100;
SELECT id FROM agent_memories
ORDER BY embedding <=> $1::vector LIMIT 10;
COMMIT;
```

The default `hnsw.ef_search` is 40, which is often too low past ~100k rows. Bump it per transaction (`SET LOCAL` only takes effect inside a transaction) or per session (`SET`) for higher recall.

## Step 6 — Python equivalent (LangChain agents)

```python
import psycopg, os
from openai import OpenAI

client = OpenAI()
conn = psycopg.connect(os.environ["DATABASE_URL"])

def embed(text: str) -> list[float]:
    return client.embeddings.create(
        model="text-embedding-3-small", input=text
    ).data[0].embedding

def recall(tenant: str, agent: str, query: str, k: int = 5):
    # pgvector expects the text literal '[x1,x2,...]'; build it explicitly
    # rather than passing a Python list (which psycopg adapts as an array)
    vec = "[" + ",".join(map(str, embed(query))) + "]"
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT content, 1 - (embedding <=> %s::vector) AS sim
            FROM agent_memories
            WHERE tenant_id = %s::uuid AND agent_id = %s::uuid
            ORDER BY embedding <=> %s::vector LIMIT %s
            """,
            (vec, tenant, agent, vec, k),
        )
        return cur.fetchall()
```
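Once `recall` returns rows, an agent typically stitches them into the LLM prompt under a size budget. A minimal sketch (the `build_context` helper and its character budget are illustrative, not part of pgvector or LangChain):

```python
def build_context(rows: list[tuple[str, float]], max_chars: int = 2000) -> str:
    """Join recalled (content, similarity) rows into a prompt context block,
    highest-similarity first, stopping once the character budget is spent."""
    rows = sorted(rows, key=lambda r: r[1], reverse=True)
    out: list[str] = []
    used = 0
    for content, _sim in rows:
        if used + len(content) > max_chars:
            break
        out.append(f"- {content}")
        used += len(content)
    return "\n".join(out)
```

The budget matters because top-k retrieval bounds row *count*, not token count; long chunks can still blow past the model's context window.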

## Pitfalls

- **Forgetting to cast** — `'[1,2,3]'::vector` is required; raw arrays fail.
- **Index built before data load** — HNSW build is much faster on populated tables; bulk load first, then index.
- **Default `ef_search = 40`** — too low; bump to 100+ for production recall.
- **No tenant filter** — without it you retrieve (and potentially leak) memories across tenants, and spend ANN candidates on rows you'll discard. Always include `tenant_id` in the `WHERE`.

## CallSphere production note

CallSphere runs **115+ DB tables** across 6 verticals on PostgreSQL — including a dedicated `healthcare_voice` Prisma schema for HIPAA workloads, OneRoof's RLS-enforced tenant model, and UrackIT's Supabase + ChromaDB hybrid. Across **37 specialized agents** and **90+ tools**, every long-term memory write goes through pgvector with HNSW. Pricing tiers: **$149 / $499 / $1,499**, 14-day trial, 22% affiliate program.

## FAQ

**Q: Should I use IVFFlat or HNSW?**
HNSW for almost all workloads in 2026 — better recall and no `lists` parameter to tune. Reach for IVFFlat only when build time or memory is tight.

**Q: How many vectors before pgvector hurts?**
Up to ~10M with HNSW + decent hardware. Beyond that consider sharding (Citus) or pgvectorscale.

**Q: Can I store 3072-dim embeddings (text-embedding-3-large)?**
Yes, but HNSW caps at 2,000 dims. Use halfvec or matryoshka-truncate to 1536.
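A minimal sketch of the truncate-and-renormalize approach (the helper name is ours; verify recall on your own data before committing to 1536 dims — matryoshka truncation assumes the model was trained to front-load information, as text-embedding-3-large was):

```python
import math

def truncate_and_renormalize(v: list[float], dims: int = 1536) -> list[float]:
    """Matryoshka-style truncation: keep the first `dims` components,
    then re-normalize to unit length so cosine math stays valid."""
    t = v[:dims]
    n = math.sqrt(sum(x * x for x in t))
    return [x / n for x in t]
```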

**Q: Does pgvector support metadata filtering?**
Yes — combine `WHERE` clauses with ANN ordering. Postgres planner handles it.

**Q: Cosine vs L2 vs dot product?**
Cosine for normalized OpenAI embeddings, dot product when you control normalization yourself.
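To see why: once vectors are unit-length, the dot product *is* the cosine similarity, so both orderings agree. A quick sanity check in plain Python:

```python
import math

def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def normalize(v: list[float]) -> list[float]:
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

# After normalization, dot product equals cosine similarity
a, b = normalize([3.0, 4.0]), normalize([1.0, 2.0])
cos_sim = dot(a, b)
```

OpenAI embeddings ship pre-normalized, so `<#>` (inner product) and `<=>` (cosine) rank identically there; `<#>` just skips the norm division.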

## Sources

- [pgvector GitHub repository](https://github.com/pgvector/pgvector)
- [Neon — Understanding HNSW with pgvector](https://neon.com/blog/understanding-vector-search-and-hnsw-index-with-pgvector)
- [Crunchy Data — HNSW with Postgres and pgvector](https://www.crunchydata.com/blog/hnsw-indexes-with-postgres-and-pgvector)
- [Supabase — pgvector docs](https://supabase.com/docs/guides/database/extensions/pgvector)

