TL;DR: Drizzle ORM 0.36+ ships a typed vector column and a cosineDistance helper. With pgvector 0.7 HNSW indexes you can hit sub-50ms top-K retrieval on a free Neon tier, all from typed TypeScript.

What you'll build

A documentation-search agent that ingests Markdown files, embeds chunks with text-embedding-3-small, stores them in Postgres, and answers questions with citations using gpt-4o-mini.

Prerequisites

drizzle-orm@^0.36, drizzle-kit@^0.27, postgres@^3.4 (or pg).
Postgres 15+ with pgvector 0.7+ (CREATE EXTENSION vector). Neon, Supabase, RDS all support it.
openai@^4.70.

Architecture

```mermaid
flowchart LR
  MD[Markdown files] --> CH[Chunker]
  CH --> EMB[text-embedding-3-small]
  EMB --> PG[(Postgres + pgvector HNSW)]
  Q[User Q] --> EMB2[embed]
  EMB2 --> PG --> CTX[top-K]
  CTX --> LLM[gpt-4o-mini] --> A[Answer + citations]
```

Step 1 — Schema

```ts
import { pgTable, serial, text, vector, index } from "drizzle-orm/pg-core";

export const chunks = pgTable(
  "chunks",
  {
    id: serial("id").primaryKey(),
    doc: text("doc").notNull(),
    content: text("content").notNull(),
    embedding: vector("embedding", { dimensions: 1536 }).notNull(),
  },
  (t) => ({
    // HNSW index with cosine ops: serves the cosineDistance (<=>) queries used for retrieval
    hnsw: index("hnsw_idx").using("hnsw", t.embedding.op("vector_cosine_ops")),
  }),
);
```
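Because the column is declared with dimensions: 1536, pgvector rejects any vector of a different length at insert time (and it also rejects NaN and infinite values). A minimal sketch of a guard that catches the mismatch earlier, in TypeScript; EMBEDDING_DIMS and assertDims are hypothetical names, not part of Drizzle or pgvector:

```typescript
// Hypothetical guard: must agree with vector("embedding", { dimensions: 1536 }).
export const EMBEDDING_DIMS = 1536;

// Throws before the row reaches Postgres, with a clearer message than the DB error.
export function assertDims(embedding: number[], expected = EMBEDDING_DIMS): number[] {
  if (embedding.length !== expected) {
    throw new Error(`expected ${expected} dimensions, got ${embedding.length}`);
  }
  // pgvector does not accept NaN or Infinity inside a vector.
  if (!embedding.every(Number.isFinite)) {
    throw new Error("embedding contains NaN or Infinity");
  }
  return embedding;
}
```

In the ingest step this would wrap each value, e.g. embedding: assertDims(r.data[i].embedding).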

Step 2 — Ingest pipeline

```ts
import OpenAI from "openai";
import { db } from "./db";
import { chunks } from "./schema";

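const oa = new OpenAI(); // reads OPENAI_API_KEY from the environment

export async function ingest(doc: string, text: string) {
  // Naive chunker: keeps 200-400 character runs that end in . ! or ?
  // Sentences shorter than 200 characters never match and are skipped.
  const pieces = text.match(/[^.!?]{200,400}[.!?]/g) ?? [];
  if (pieces.length === 0) return; // the embeddings API rejects an empty input array

  const r = await oa.embeddings.create({
    model: "text-embedding-3-small",
    input: pieces,
  });

  await db.insert(chunks).values(
    pieces.map((content, i) => ({
      doc,
      content,
      embedding: r.data[i].embedding,
    })),
  );
}
```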
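The regex in ingest() only keeps 200 to 400 character runs that end in punctuation, so short sentences never reach the database. A hypothetical sentence-packing alternative, sketched under the assumption that ". ! ?" mark sentence ends: split into sentences, then greedily pack whole sentences into chunks of at most maxChars. chunkText is an illustrative name, not a Drizzle or OpenAI API.

```typescript
// Split text into sentences, then pack whole sentences into chunks of up to maxChars.
// A single sentence longer than maxChars becomes its own (oversize) chunk.
export function chunkText(text: string, maxChars = 400): string[] {
  // A "sentence" here is a run of text ending in . ! or ? (or the end of input).
  const sentences = (text.match(/[^.!?]+[.!?]*/g) ?? [])
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  const out: string[] = [];
  let current = "";
  for (const s of sentences) {
    if (current && current.length + 1 + s.length > maxChars) {
      out.push(current); // current chunk is full; start a new one
      current = s;
    } else {
      current = current ? `${current} ${s}` : s;
    }
  }
  if (current) out.push(current);
  return out;
}
```

Swapping it in means replacing the text.match(...) line with const pieces = chunkText(text); nothing is dropped, at the cost of occasionally oversize chunks.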

Step 3 — Type-safe retrieval

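```ts
// Continues the ingest module: reuses oa, db, and chunks from Step 2.
import { cosineDistance, sql } from "drizzle-orm";

export async function search(q: string, k = 5) {
  const [{ embedding }] = (
    await oa.embeddings.create({ model: "text-embedding-3-small", input: q })
  ).data;

  // cosineDistance renders pgvector's <=> operator.
  const distance = cosineDistance(chunks.embedding, embedding);

  return db
    .select({
      doc: chunks.doc,
      content: chunks.content,
      score: sql<number>`1 - (${distance})`.as("score"), // cosine similarity
    })
    .from(chunks)
    // Order by the raw distance, ascending: pgvector only uses the HNSW index for
    // ORDER BY <distance operator> ... LIMIT, not for an expression like 1 - distance DESC.
    .orderBy(distance)
    .limit(k);
}
```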
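For intuition about what the retrieval query computes, here is a brute-force, in-memory sketch in plain TypeScript. Cosine distance (what pgvector's <=> and Drizzle's cosineDistance return) is 1 minus cosine similarity, so ranking by ascending distance is the same as ranking by descending score. This is an exact scan over every row; the HNSW index approximates the same ranking. The function names and row shape are illustrative only.

```typescript
// Dot product of two equal-length vectors.
function dot(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

// Cosine distance = 1 - cos(angle between a and b).
function cosineDist(a: number[], b: number[]): number {
  return 1 - dot(a, b) / Math.sqrt(dot(a, a) * dot(b, b));
}

// Exact top-K: sort every row by ascending distance, report score = 1 - distance.
export function topK(
  query: number[],
  rows: { doc: string; embedding: number[] }[],
  k: number,
): { doc: string; score: number }[] {
  return rows
    .map((r) => ({ doc: r.doc, distance: cosineDist(query, r.embedding) }))
    .sort((x, y) => x.distance - y.distance) // smallest distance = most similar
    .slice(0, k)
    .map((r) => ({ doc: r.doc, score: 1 - r.distance }));
}
```

With query [1, 0] and rows a = [1, 0], b = [0, 1], c = [1, 1], the top two are a (score 1) and c (score about 0.707); b is orthogonal and ranks last.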

Step 4 — Answer with citations

```ts
// Same module: reuses oa and search() from the steps above.
export async function answer(q: string) {
  const ctx = await search(q, 5);
  // Prefix each chunk with its source doc so the model can cite it as [doc].
  const context = ctx.map((c) => `[${c.doc}] ${c.content}`).join("\n");

  const r = await oa.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: "Answer using ONLY the context. Cite [doc]." },
      { role: "user", content: `Context:\n${context}\n\nQ: ${q}` },
    ],
  });

  return { answer: r.choices[0].message.content, citations: ctx };
}
```
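answer() returns every retrieved chunk as a citation, whether or not the model used it. A hypothetical post-processing helper can instead pull the [doc] markers out of the reply and keep only those that were actually in the retrieved context, which also filters out invented sources. It assumes doc names contain no square brackets; extractCitations is an illustrative name.

```typescript
// Return the [doc] markers found in the answer, in first-appearance order,
// keeping only docs that were part of the retrieved context.
export function extractCitations(answer: string, contextDocs: string[]): string[] {
  const allowed = new Set(contextDocs);
  const markers = (answer.match(/\[[^\]]+\]/g) ?? []).map((m) => m.slice(1, -1));
  const cited: string[] = [];
  for (const doc of markers) {
    if (allowed.has(doc) && !cited.includes(doc)) cited.push(doc);
  }
  return cited;
}
```

Called as extractCitations(result.answer ?? "", result.citations.map((c) => c.doc)), where result is the object answer() returns.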

Step 5 — Drizzle migrations

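pnpm drizzle-kit generate writes the migration SQL, including CREATE INDEX hnsw_idx ... USING hnsw (embedding vector_cosine_ops), and pnpm drizzle-kit migrate applies it. drizzle-kit does not enable the pgvector extension for you, so run CREATE EXTENSION IF NOT EXISTS vector once beforehand (or put it in an empty migration created with drizzle-kit generate --custom). Re-run both commands after changing dimensions, and re-embed existing rows, since Postgres cannot convert stored 1536-dimension vectors to a new size.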

Step 6 — Tune HNSW

```sql
SET hnsw.ef_search = 100; -- higher = better recall, slower
-- Session-wide; with pooled connections, use SET LOCAL inside a transaction instead.
```

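pgvector's defaults are m = 16 and ef_construction = 64 at build time, and hnsw.ef_search = 40 at query time. For 1M+ chunks, consider raising m and ef_construction when you build the index: recall improves, at the cost of slower builds and more memory.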

Pitfalls

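1536 vs 3072 dims: text-embedding-3-small returns 1536 dimensions and -large returns 3072; the schema must match. pgvector can only index vector columns of up to 2,000 dimensions, so for -large either store halfvec (indexable up to 4,000) or request fewer dimensions through the embeddings API's dimensions parameter.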
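vector_l2_ops vs vector_cosine_ops: OpenAI embeddings are normalized to length 1, so cosine and L2 distance rank results identically. What matters is that the index opclass matches the query operator: vector_cosine_ops serves cosineDistance (<=>), and with a mismatched opclass the index is not used.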
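Insert batches: a single Postgres statement can bind at most ~65,535 parameters, and each inserted row uses one per column, so split large ingests into batches (200 rows per INSERT is a comfortable size).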
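A minimal batching helper, assuming the driver binds one parameter per column value (so the three-column chunks insert uses three per row). The names batches and maxRowsPerInsert are illustrative, not Drizzle APIs.

```typescript
// Split rows into fixed-size batches so each INSERT stays well under the parameter limit.
export function batches<T>(rows: T[], size = 200): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error("size must be a positive integer");
  }
  const out: T[][] = [];
  for (let i = 0; i < rows.length; i += size) out.push(rows.slice(i, i + size));
  return out;
}

// Upper bound on rows per INSERT for a given column count.
export function maxRowsPerInsert(columns: number, paramLimit = 65535): number {
  return Math.floor(paramLimit / columns);
}
```

In the ingest step this becomes for (const b of batches(rows)) await db.insert(chunks).values(b); where rows is the array currently passed to .values(). With three bound columns the hard ceiling is maxRowsPerInsert(3) = 21,845 rows, so 200 leaves plenty of headroom.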

How CallSphere does this in production

CallSphere uses Drizzle + pgvector across 115+ DB tables for tool-call memory, agent transcripts, and KB RAG. The platform powers 37 agents, 90+ tools, and 6 verticals including Healthcare (FastAPI), OneRoof (Next.js 16 + React 19), Salon (NestJS 10 + Prisma), and Sales (Node.js 20 + React 18 + Vite). $149/$499/$1,499, 14-day no-card trial, 22% affiliate.

FAQ

Drizzle vs Prisma for RAG? Drizzle has native cosineDistance and lets you keep raw SQL when needed; Prisma still requires $queryRaw for vectors.

Cost? text-embedding-3-small is $0.02 per 1M tokens. At roughly 50 to 100 tokens per 200 to 400 character chunk, a million chunks costs about $1 to $2 to embed.
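To check the arithmetic for your own corpus, here is a small estimator. It assumes roughly 4 characters per token for English text (a rule of thumb, not a tokenizer count) and uses the $0.02 per 1M tokens price quoted above; the function name and defaults are illustrative.

```typescript
// Back-of-envelope embedding cost in USD.
export function embeddingCostUSD(
  numChunks: number,
  charsPerChunk: number,
  usdPerMillionTokens = 0.02, // text-embedding-3-small price quoted in the FAQ
  charsPerToken = 4, // rough English average, not an exact tokenizer count
): number {
  const tokens = numChunks * Math.ceil(charsPerChunk / charsPerToken);
  return (tokens / 1_000_000) * usdPerMillionTokens;
}
```

Under those assumptions, 1,000,000 chunks of 200 characters come to about $1, and of 400 characters about $2.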

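Why HNSW over IVFFlat? HNSW gives a better speed/recall trade-off, and it has no training step, so you can create it on an empty table and keep inserting without rebuilding; IVFFlat computes its lists from the rows present at build time. The cost is slower index builds and more memory.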

Vector size limit? pgvector stores vectors of up to 16,000 dimensions but can only index vector columns of up to 2,000; OpenAI's largest model returns 3,072.

Sources

Drizzle - Vector similarity search - https://orm.drizzle.team/docs/guides/vector-similarity-search
pgvector-node - https://github.com/pgvector/pgvector-node
Nile - AI-Native Drizzle - https://www.thenile.dev/docs/getting-started/languages/drizzle
Drizzle PG extensions - https://orm.drizzle.team/docs/extensions/pg

Build a RAG AI Agent with Drizzle ORM + pgvector + OpenAI (2026)