Build a RAG AI Agent with Drizzle ORM + pgvector + OpenAI (2026)
Drizzle ORM ships first-class pgvector helpers in 2026. Wire HNSW cosine search to gpt-4o-mini for sub-50ms RAG with type-safe queries — no Python, no Pinecone.
TL;DR — Drizzle ORM 0.36+ adds a `vector` column type and a `cosineDistance` operator. With pgvector 0.7 HNSW indexes you can hit sub-50ms top-K retrieval on a free Neon tier — all from typed TypeScript.
What you'll build
A documentation-search agent that ingests Markdown files, embeds chunks with text-embedding-3-small, stores them in Postgres, and answers questions with citations using gpt-4o-mini.
Prerequisites
- `drizzle-orm@^0.36`, `drizzle-kit@^0.27`, `postgres@^3.4` (or `pg`)
- Postgres 15+ with pgvector 0.7+ (`CREATE EXTENSION vector`). Neon, Supabase, and RDS all support it.
- `openai@^4.70`
Architecture
```mermaid
flowchart LR
  MD[Markdown files] --> CH[Chunker]
  CH --> EMB[text-embedding-3-small]
  EMB --> PG[(Postgres + pgvector HNSW)]
  Q[User Q] --> EMB2[embed]
  EMB2 --> PG --> CTX[top-K]
  CTX --> LLM[gpt-4o-mini] --> A[Answer + citations]
```
Step 1 — Schema
```ts
import { pgTable, serial, text, vector, index } from "drizzle-orm/pg-core";

export const chunks = pgTable(
  "chunks",
  {
    id: serial("id").primaryKey(),
    doc: text("doc").notNull(),
    content: text("content").notNull(),
    embedding: vector("embedding", { dimensions: 1536 }).notNull(),
  },
  (t) => ({
    // HNSW + cosine — fastest for OpenAI embeddings
    hnsw: index("hnsw_idx").using("hnsw", t.embedding.op("vector_cosine_ops")),
  })
);
```
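The later snippets import a `db` instance that the article never defines. A minimal sketch of that module, assuming the `postgres` driver and a `DATABASE_URL` environment variable (both assumptions, adjust to your stack):

```ts
// db.ts — minimal Drizzle connection, assumed by the later snippets.
// DATABASE_URL is an assumed env var; swap in your own connection string.
import { drizzle } from "drizzle-orm/postgres-js";
import postgres from "postgres";

const client = postgres(process.env.DATABASE_URL!);
export const db = drizzle(client);
```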
Step 2 — Ingest pipeline
```ts
import OpenAI from "openai";
import { db } from "./db";
import { chunks } from "./schema";
const oa = new OpenAI();
export async function ingest(doc: string, text: string) {
  const pieces = text.match(/[^.!?]{200,400}[.!?]/g) ?? [];
  const r = await oa.embeddings.create({
    model: "text-embedding-3-small",
    input: pieces,
  });
  await db.insert(chunks).values(
    pieces.map((c, i) => ({
      doc,
      content: c,
      embedding: r.data[i].embedding,
    }))
  );
}
```
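Note that the sentence regex in `ingest` silently drops any trailing text that never reaches a 200-character sentence boundary. A fixed-size chunker with overlap is a common alternative; `chunkText` below is a hypothetical helper with illustrative size/overlap values, not part of the pipeline above:

```ts
// Hypothetical fixed-size chunker with character overlap, an
// alternative to the sentence regex above. Values are illustrative.
export function chunkText(text: string, size = 400, overlap = 50): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last window covered the tail
  }
  return chunks;
}
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk.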
Step 3 — Type-safe retrieval
```ts
import { cosineDistance, desc, sql } from "drizzle-orm";
export async function search(q: string, k = 5) {
  const [{ embedding }] = (
    await oa.embeddings.create({ model: "text-embedding-3-small", input: q })
  ).data;

  // cosineDistance returns a distance; 1 - distance gives similarity
  const sim = sql<number>`1 - (${cosineDistance(chunks.embedding, embedding)})`;

  return db
    .select({ doc: chunks.doc, content: chunks.content, score: sim.as("score") })
    .from(chunks)
    .orderBy(desc(sim))
    .limit(k);
}
```
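The `1 - distance` trick works because pgvector's cosine operator returns a distance, and since OpenAI embeddings come back unit-normalized, cosine similarity is just a dot product. A quick plain-TypeScript sanity check (helper names are mine, not part of Drizzle or pgvector):

```ts
// For unit-length vectors, cosine similarity equals the dot product,
// which is why vector_cosine_ops is the right operator class here.
export function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

export function cosineSim(a: number[], b: number[]): number {
  const norm = (v: number[]) => Math.sqrt(dot(v, v));
  return dot(a, b) / (norm(a) * norm(b));
}
```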
Step 4 — Answer with citations
```ts
export async function answer(q: string) {
const ctx = await search(q, 5);
const r = await oa.chat.completions.create({
model: "gpt-4o-mini",
messages: [
{ role: "system", content: "Answer using ONLY the context. Cite [doc]." },
      { role: "user", content: `Context:\n${ctx.map(c => `[${c.doc}] ${c.content}`).join("\n")}\n\nQ: ${q}` },
],
});
return { answer: r.choices[0].message.content, citations: ctx };
}
```
Step 5 — Drizzle migrations
`pnpm drizzle-kit generate && pnpm drizzle-kit migrate` produces migration SQL including `CREATE INDEX "hnsw_idx" ... USING hnsw`. Re-run after dimension changes.
Step 6 — Tune HNSW
```sql
SET hnsw.ef_search = 100; -- higher = better recall, slower
```
For 1M+ chunks, also set `m = 16` and `ef_construction = 64` at index build time.
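One caveat with pooled connections: a plain `SET` leaks the setting to whichever query grabs the connection next. A sketch of scoping it per-transaction from Drizzle, assuming the `db` instance from the earlier snippets (this needs a live Postgres connection, so treat it as illustrative):

```ts
import { sql } from "drizzle-orm";
import { db } from "./db";

// SET LOCAL scopes ef_search to this transaction, so pooled
// connections are returned with the default value intact.
await db.transaction(async (tx) => {
  await tx.execute(sql`SET LOCAL hnsw.ef_search = 100`);
  // ...run the vector search inside the same transaction...
});
```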
Pitfalls
- 1536 vs 3072 dims: `text-embedding-3-small` is 1536; `-large` is 3072 — schema must match.
- `vector_l2_ops` vs `vector_cosine_ops`: OpenAI embeddings are normalized — cosine is correct.
- Insert batches: Postgres has a ~65,535 bind-parameter limit per statement; chunk inserts into batches of 200.
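The last pitfall is easy to hit from `ingest`, since each row binds four parameters. A hypothetical `inBatches` helper (name and default size are mine) keeps every `INSERT` comfortably under the limit:

```ts
// Hypothetical helper: split rows so each INSERT stays well under
// Postgres's ~65,535 bind-parameter cap (4 columns x 200 rows = 800).
export function inBatches<T>(rows: T[], size = 200): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < rows.length; i += size) {
    out.push(rows.slice(i, i + size));
  }
  return out;
}

// Sketch of use inside ingest():
// for (const batch of inBatches(values)) {
//   await db.insert(chunks).values(batch);
// }
```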
How CallSphere does this in production
CallSphere uses Drizzle + pgvector across 115+ DB tables for tool-call memory, agent transcripts, and KB RAG. The platform powers 37 agents, 90+ tools, and 6 verticals including Healthcare (FastAPI), OneRoof (Next.js 16 + React 19), Salon (NestJS 10 + Prisma), and Sales (Node.js 20 + React 18 + Vite). $149/$499/$1,499, 14-day no-card trial, 22% affiliate.
FAQ
Drizzle vs Prisma for RAG? Drizzle has native `cosineDistance` and lets you keep raw SQL when needed; Prisma still requires `$queryRaw` for vectors.
Cost? text-embedding-3-small is $0.02/1M tokens — millions of chunks for cents.
Why HNSW over IVFFlat? Better recall at low latencies for 1M+ rows; rebuild-free incremental insert.
Vector size limit? pgvector caps at 16,000 dimensions; OpenAI's largest is 3,072.
Sources
- Drizzle - Vector similarity search - https://orm.drizzle.team/docs/guides/vector-similarity-search
- pgvector-node - https://github.com/pgvector/pgvector-node
- Nile - AI-Native Drizzle - https://www.thenile.dev/docs/getting-started/languages/drizzle
- Drizzle PG extensions - https://orm.drizzle.team/docs/extensions/pg
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available -- no signup required.