
Build a RAG AI Agent with Drizzle ORM + pgvector + OpenAI (2026)

Drizzle ORM ships first-class pgvector helpers in 2026. Wire HNSW cosine search to gpt-4o-mini for sub-50ms RAG with type-safe queries — no Python, no Pinecone.

TL;DR — Drizzle ORM 0.36+ adds vector column type and cosineDistance operator. With pgvector 0.7 HNSW indexes you can hit sub-50ms top-K retrieval on a free Neon tier — all from typed TypeScript.

What you'll build

A documentation-search agent that ingests Markdown files, embeds chunks with text-embedding-3-small, stores them in Postgres, and answers questions with citations using gpt-4o-mini.

Prerequisites

  1. drizzle-orm@^0.36, drizzle-kit@^0.27, postgres@^3.4 (or pg).
  2. Postgres 15+ with pgvector 0.7+ (CREATE EXTENSION vector). Neon, Supabase, RDS all support it.
  3. openai@^4.70.
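The snippets below import `db` from `./db`, which this post never shows. A minimal sketch, assuming the postgres.js driver and a `DATABASE_URL` environment variable (adjust for `pg`, Neon, or Supabase pooled connections):

```typescript
// db.ts — minimal sketch; driver choice and env var name are assumptions.
import { drizzle } from "drizzle-orm/postgres-js";
import postgres from "postgres";

const client = postgres(process.env.DATABASE_URL!);
export const db = drizzle(client);
```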

Architecture

```mermaid
flowchart LR
  MD[Markdown files] --> CH[Chunker]
  CH --> EMB[text-embedding-3-small]
  EMB --> PG[(Postgres + pgvector HNSW)]
  Q[User Q] --> EMB2[embed]
  EMB2 --> PG --> CTX[top-K]
  CTX --> LLM[gpt-4o-mini] --> A[Answer + citations]
```

Step 1 — Schema

```ts
import { pgTable, serial, text, vector, index } from "drizzle-orm/pg-core";

export const chunks = pgTable(
  "chunks",
  {
    id: serial("id").primaryKey(),
    doc: text("doc").notNull(),
    content: text("content").notNull(),
    embedding: vector("embedding", { dimensions: 1536 }).notNull(),
  },
  (t) => ({
    // HNSW + cosine — fastest for OpenAI embeddings
    hnsw: index("hnsw_idx").using("hnsw", t.embedding.op("vector_cosine_ops")),
  }),
);
```

Step 2 — Ingest pipeline

```ts
import OpenAI from "openai";
import { db } from "./db";
import { chunks } from "./schema";

const oa = new OpenAI();

export async function ingest(doc: string, text: string) {
  // Naive sentence-window chunker: 200–400 chars ending at punctuation.
  const pieces = text.match(/[^.!?]{200,400}[.!?]/g) ?? [];
  if (pieces.length === 0) return; // embeddings.create rejects empty input

  const r = await oa.embeddings.create({
    model: "text-embedding-3-small",
    input: pieces,
  });

  await db.insert(chunks).values(
    pieces.map((c, i) => ({
      doc,
      content: c,
      embedding: r.data[i].embedding,
    })),
  );
}
```

Step 3 — Type-safe retrieval

```ts
import { cosineDistance, desc, sql } from "drizzle-orm";

export async function search(q: string, k = 5) {
  const [{ embedding }] = (
    await oa.embeddings.create({
      model: "text-embedding-3-small",
      input: q,
    })
  ).data;

  const sim = sql<number>`1 - (${cosineDistance(chunks.embedding, embedding)})`;

  return db
    .select({ doc: chunks.doc, content: chunks.content, score: sim.as("score") })
    .from(chunks)
    .orderBy(desc(sim))
    .limit(k);
}
```

Step 4 — Answer with citations

```ts
export async function answer(q: string) {
  const ctx = await search(q, 5);
  const r = await oa.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: "Answer using ONLY the context. Cite [doc]." },
      {
        role: "user",
        content: `Context:\n${ctx
          .map((c) => `[${c.doc}] ${c.content}`)
          .join("\n")}\n\nQ: ${q}`,
      },
    ],
  });
  return { answer: r.choices[0].message.content, citations: ctx };
}
```

Step 5 — Drizzle migrations

`pnpm drizzle-kit generate && pnpm drizzle-kit migrate` produces migration SQL including a `CREATE INDEX ... USING hnsw` statement. Re-run after dimension changes.
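The generated migration should look roughly like this (exact file naming and quoting vary by drizzle-kit version; this is a sketch, not verbatim output):

```sql
CREATE TABLE "chunks" (
  "id" serial PRIMARY KEY,
  "doc" text NOT NULL,
  "content" text NOT NULL,
  "embedding" vector(1536) NOT NULL
);

CREATE INDEX "hnsw_idx" ON "chunks" USING hnsw ("embedding" vector_cosine_ops);
```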


Step 6 — Tune HNSW

```sql
SET hnsw.ef_search = 100; -- higher = better recall, slower
```

For 1M+ chunks, also tune m and ef_construction at index build time. Note that 16 and 64 are pgvector's defaults; raising them improves recall at the cost of build time and memory.
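Build-time parameters go in the index's `WITH` clause. A sketch using pgvector's default values of m = 16 and ef_construction = 64:

```sql
CREATE INDEX hnsw_idx ON chunks
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);
```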

Pitfalls

  • 1536 vs 3072 dims: text-embedding-3-small is 1536; -large is 3072 — schema must match.
  • vector_l2_ops vs vector_cosine_ops: OpenAI embeddings are normalized — cosine is correct.
  • Insert batches: Postgres has a ~65535 parameter limit; chunk inserts in batches of 200.
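The batching pitfall can be handled with a small helper. This is a hypothetical utility (not part of Drizzle); with the 4-column chunks table, 200 rows is 800 bound parameters — comfortably under the limit:

```typescript
// Split rows into fixed-size batches so each INSERT stays under
// Postgres's ~65,535 bound-parameter limit.
export function toBatches<T>(rows: T[], size = 200): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < rows.length; i += size) {
    out.push(rows.slice(i, i + size));
  }
  return out;
}

// Usage inside the ingest step (db/chunks from the earlier snippets):
//   for (const rows of toBatches(values, 200)) {
//     await db.insert(chunks).values(rows);
//   }
```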

How CallSphere does this in production

CallSphere uses Drizzle + pgvector across 115+ DB tables for tool-call memory, agent transcripts, and KB RAG. The platform powers 37 agents, 90+ tools, and 6 verticals including Healthcare (FastAPI), OneRoof (Next.js 16 + React 19), Salon (NestJS 10 + Prisma), and Sales (Node.js 20 + React 18 + Vite). Pricing tiers are $149/$499/$1,499, with a 14-day no-card trial and a 22% affiliate program.

FAQ

Drizzle vs Prisma for RAG? Drizzle has native cosineDistance and lets you keep raw SQL when needed; Prisma still requires $queryRaw for vectors.

Cost? text-embedding-3-small is $0.02/1M tokens — millions of chunks for cents.

Why HNSW over IVFFlat? Better recall at low latencies for 1M+ rows; rebuild-free incremental insert.
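For contrast, an IVFFlat index looks like this — a sketch where the index name and the `lists` value are illustrative, not recommendations:

```sql
-- IVFFlat: cheaper and faster to build, but recall degrades as data
-- drifts from the clusters learned at build time, so periodic REINDEX helps.
CREATE INDEX ivf_idx ON chunks
  USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 100);
```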

Vector size limit? The vector type caps at 16,000 dimensions, and OpenAI's largest model outputs 3,072. One caveat: pgvector's HNSW and IVFFlat indexes only support up to 2,000 dimensions on vector columns, so 3,072-dim embeddings need the halfvec type (indexable up to 4,000 dimensions) or dimension reduction.

