---
title: "Build a RAG AI Agent with Drizzle ORM + pgvector + OpenAI (2026)"
description: "Drizzle ORM ships first-class pgvector helpers in 2026. Wire HNSW cosine search to gpt-4o-mini for sub-50ms RAG with type-safe queries — no Python, no Pinecone."
canonical: https://callsphere.ai/blog/vw8h-build-ai-agent-drizzle-orm-pgvector-rag-2026
category: "AI Engineering"
tags: ["Drizzle", "pgvector", "RAG", "Postgres", "TypeScript"]
author: "CallSphere Team"
published: 2026-03-25T00:00:00.000Z
updated: 2026-05-07T22:23:17.415Z
---

# Build a RAG AI Agent with Drizzle ORM + pgvector + OpenAI (2026)

> Drizzle ORM ships first-class pgvector helpers in 2026. Wire HNSW cosine search to gpt-4o-mini for sub-50ms RAG with type-safe queries — no Python, no Pinecone.

> **TL;DR** — Drizzle ORM 0.36+ adds `vector` column type and `cosineDistance` operator. With pgvector 0.7 HNSW indexes you can hit sub-50ms top-K retrieval on a free Neon tier — all from typed TypeScript.

## What you'll build

A documentation-search agent that ingests Markdown files, embeds chunks with `text-embedding-3-small`, stores them in Postgres, and answers questions with citations using gpt-4o-mini.

## Prerequisites

1. `drizzle-orm@^0.36`, `drizzle-kit@^0.27`, `postgres@^3.4` (or `pg`).
2. Postgres 15+ with pgvector 0.7+ (`CREATE EXTENSION vector`). Neon, Supabase, RDS all support it.
3. `openai@^4.70`.

## Architecture

```mermaid
flowchart LR
  MD[Markdown files] --> CH[Chunker]
  CH --> EMB[text-embedding-3-small]
  EMB --> PG[(Postgres + pgvector HNSW)]
  Q[User Q] --> EMB2[embed]
  EMB2 --> PG --> CTX[top-K]
  CTX --> LLM[gpt-4o-mini] --> A[Answer + citations]
```

## Step 1 — Schema

```ts
import { pgTable, serial, text, vector, index } from "drizzle-orm/pg-core";

export const chunks = pgTable("chunks", {
  id:        serial("id").primaryKey(),
  doc:       text("doc").notNull(),
  content:   text("content").notNull(),
  embedding: vector("embedding", { dimensions: 1536 }).notNull(),
}, (t) => ({
  // HNSW + cosine — fastest for OpenAI embeddings
  hnsw: index("hnsw_idx").using("hnsw", t.embedding.op("vector_cosine_ops")),
}));
```
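
Steps 2-4 also import a `db` instance that never gets defined in the post. A minimal sketch using the postgres.js driver from the prerequisites (the `DATABASE_URL` env var is an assumption):

```ts
// db.ts: minimal connection module; assumes DATABASE_URL points at your Postgres
import { drizzle } from "drizzle-orm/postgres-js";
import postgres from "postgres";

const client = postgres(process.env.DATABASE_URL!);
export const db = drizzle(client);
```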

## Step 2 — Ingest pipeline

```ts
import OpenAI from "openai";
import { db } from "./db";
import { chunks } from "./schema";

const oa = new OpenAI();

export async function ingest(doc: string, text: string) {
  // Naive chunker: 200-400-char windows that end on sentence punctuation
  const pieces = text.match(/[^.!?]{200,400}[.!?]/g) ?? [];
  if (pieces.length === 0) return; // the embeddings API rejects an empty input array
  const r = await oa.embeddings.create({
    model: "text-embedding-3-small",
    input: pieces,
  });
  await db.insert(chunks).values(pieces.map((c, i) => ({
    doc, content: c, embedding: r.data[i].embedding,
  })));
}
```
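
For completeness, a hypothetical driver script that feeds every Markdown file in a `./docs` folder through `ingest` (folder and module paths are assumptions):

```ts
// embed-docs.ts: hypothetical entry point that ingests every .md under ./docs
import { readdir, readFile } from "node:fs/promises";
import { ingest } from "./ingest";

for (const f of await readdir("./docs")) {
  if (!f.endsWith(".md")) continue;
  await ingest(f, await readFile(`./docs/${f}`, "utf8"));
}
```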

## Step 3 — Type-safe retrieval

```ts
import OpenAI from "openai";
import { cosineDistance, desc, sql } from "drizzle-orm";
import { db } from "./db";
import { chunks } from "./schema";

const oa = new OpenAI();

export async function search(q: string, k = 5) {
  const [{ embedding }] = (await oa.embeddings.create({
    model: "text-embedding-3-small", input: q,
  })).data;

  // cosineDistance compiles to pgvector's `<=>`; 1 - distance = similarity
  const sim = sql<number>`1 - (${cosineDistance(chunks.embedding, embedding)})`;
  return db
    .select({ doc: chunks.doc, content: chunks.content, score: sim.as("score") })
    .from(chunks)
    .orderBy(desc(sim))
    .limit(k);
}
```
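
It's worth sanity-checking retrieval on its own before wiring in the LLM; a quick check along these lines (query string is arbitrary):

```ts
// Eyeball hits and scores before trusting them as context
import { search } from "./search";

const hits = await search("How do I tune hnsw.ef_search?");
for (const h of hits) console.log(h.score.toFixed(3), h.doc, h.content.slice(0, 80));
```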

## Step 4 — Answer with citations

```ts
import OpenAI from "openai";
import { search } from "./search"; // Step 3's module

const oa = new OpenAI();

export async function answer(q: string) {
  const ctx = await search(q, 5);
  const r = await oa.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: "Answer using ONLY the context. Cite [doc]." },
      { role: "user", content: `Context:\n${ctx.map(c => `[${c.doc}] ${c.content}`).join("\n")}\n\nQ: ${q}` },
    ],
  });
  return { answer: r.choices[0].message.content, citations: ctx };
}
```
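
And a minimal CLI smoke test (file name and module path are hypothetical):

```ts
// ask.ts: run with `node ask.js "your question"`
import { answer } from "./answer";

const { answer: text, citations } = await answer(process.argv[2] ?? "Which index should I use?");
console.log(text);
console.log("Sources:", [...new Set(citations.map((c) => c.doc))].join(", "));
```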

## Step 5 — Drizzle migrations

`pnpm drizzle-kit generate && pnpm drizzle-kit migrate` emits migration SQL including `CREATE INDEX "hnsw_idx" ON "chunks" USING hnsw ("embedding" vector_cosine_ops)`. Note that drizzle-kit won't create the extension for you: run `CREATE EXTENSION vector` once (or add it to a custom migration) before migrating. Regenerate and re-run after any dimension change.
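
The generated SQL should look roughly like this (exact quoting and file layout vary by drizzle-kit version):

```sql
-- Approximate drizzle-kit output for the Step 1 schema
CREATE TABLE "chunks" (
  "id" serial PRIMARY KEY NOT NULL,
  "doc" text NOT NULL,
  "content" text NOT NULL,
  "embedding" vector(1536) NOT NULL
);
CREATE INDEX "hnsw_idx" ON "chunks" USING hnsw ("embedding" vector_cosine_ops);
```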

## Step 6 — Tune HNSW

```sql
SET hnsw.ef_search = 100;   -- higher = better recall, slower
```

For 1M+ chunks, also tune `m` and `ef_construction` at index build time; pgvector's defaults are `m = 16` and `ef_construction = 64`, and higher values trade build time and memory for recall.
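
Build-time parameters live in the index DDL itself, not in `SET`; if you hand-edit the generated migration, the pgvector syntax is:

```sql
-- m: graph degree; ef_construction: candidate list size while building
CREATE INDEX hnsw_idx ON chunks
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);
```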

## Pitfalls

- **1536 vs 3072 dims**: `text-embedding-3-small` is 1536; `-large` is 3072 — schema must match.
- **`vector_l2_ops` vs `vector_cosine_ops`**: OpenAI embeddings are normalized — cosine is correct.
- **Insert batches**: Postgres caps a single statement at 65,535 bind parameters; chunk inserts into batches of ~200 rows (see the sketch below).
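
A sketch of the batching helper that last pitfall calls for:

```ts
import { db } from "./db";
import { chunks } from "./schema";

// Insert rows in fixed-size slices to stay under the bind-parameter cap
export async function insertBatched(
  rows: (typeof chunks.$inferInsert)[],
  size = 200,
) {
  for (let i = 0; i < rows.length; i += size) {
    await db.insert(chunks).values(rows.slice(i, i + size));
  }
}
```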

## How CallSphere does this in production

CallSphere uses Drizzle + pgvector across **115+ DB tables** for tool-call memory, agent transcripts, and KB RAG. The platform powers **37 agents**, **90+ tools**, and **6 verticals**, including Healthcare (FastAPI), OneRoof (Next.js 16 + React 19), Salon (NestJS 10 + Prisma), and Sales (Node.js 20 + React 18 + Vite). Plans run **$149/$499/$1,499**, with a **14-day no-card trial** and a **22% affiliate** program.

## FAQ

**Drizzle vs Prisma for RAG?** Drizzle has native `cosineDistance` and lets you keep raw SQL when needed; Prisma still requires `$queryRaw` for vectors.

**Cost?** `text-embedding-3-small` is $0.02/1M tokens — millions of chunks for cents.

**Why HNSW over IVFFlat?** Better recall at low latency on 1M+ rows, and no training step: IVFFlat needs existing rows to compute its centroids, while HNSW can be built on an empty table and absorbs incremental inserts without rebuilds.

**Vector size limit?** The `vector` type caps at 16,000 dimensions, but pgvector's HNSW and IVFFlat indexes only support up to 2,000, so 3,072-dim `text-embedding-3-large` vectors need `halfvec` or the embeddings API's `dimensions` parameter to be indexable.

## Sources

- Drizzle - Vector similarity search - [https://orm.drizzle.team/docs/guides/vector-similarity-search](https://orm.drizzle.team/docs/guides/vector-similarity-search)
- pgvector-node - [https://github.com/pgvector/pgvector-node](https://github.com/pgvector/pgvector-node)
- Nile - AI-Native Drizzle - [https://www.thenile.dev/docs/getting-started/languages/drizzle](https://www.thenile.dev/docs/getting-started/languages/drizzle)
- Drizzle PG extensions - [https://orm.drizzle.team/docs/extensions/pg](https://orm.drizzle.team/docs/extensions/pg)

