TL;DR — Vector search misses exact matches (model names, error codes, drug names). Lexical search misses paraphrases. Hybrid retrieval — running both and merging with Reciprocal Rank Fusion — combines their strengths. In Postgres you get this for free with pgvector + pg_trgm + tsvector.

What you'll build

A single search_chunks(query, k) function that runs vector ANN, full-text BM25-flavored ranking, and trigram fuzzy matching, then fuses the three lists with RRF. Recall@10 climbs from ~0.62 (vector only) to ~0.84+.

Schema

CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE TABLE chunks (
  id BIGSERIAL PRIMARY KEY,
  doc_id UUID NOT NULL,
  body TEXT NOT NULL,
  body_tsv tsvector GENERATED ALWAYS AS (to_tsvector('english', body)) STORED,
  embedding vector(1536) NOT NULL
);

CREATE INDEX chunks_tsv_idx ON chunks USING gin (body_tsv);
CREATE INDEX chunks_trgm_idx ON chunks USING gin (body gin_trgm_ops);
CREATE INDEX chunks_hnsw_idx ON chunks USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 128);

Architecture

flowchart LR
  Q[User query] --> EMB[Embed]
  Q --> TS[Tsvector parse]
  Q --> TRG[Trigram tokens]
  EMB --> ANN[HNSW top-50]
  TS --> BM25[ts_rank top-50]
  TRG --> FUZZ[trgm sim top-50]
  ANN --> RRF[Reciprocal Rank Fusion]
  BM25 --> RRF
  FUZZ --> RRF
  RRF --> TOPK[Top-k to LLM]

Step 1 — Vector candidates

WITH vec AS (
  SELECT id, ROW_NUMBER() OVER (ORDER BY embedding <=> $1::vector) AS rk
  FROM chunks ORDER BY embedding <=> $1::vector LIMIT 50
)
SELECT * FROM vec;

Step 2 — Full-text candidates

WITH lex AS (
  SELECT id, ROW_NUMBER() OVER (
    ORDER BY ts_rank_cd(body_tsv, websearch_to_tsquery('english', $2)) DESC
  ) AS rk
  FROM chunks
  WHERE body_tsv @@ websearch_to_tsquery('english', $2)
  LIMIT 50
)
SELECT * FROM lex;

Step 3 — Trigram fuzzy candidates

WITH fuzz AS (
  SELECT id, ROW_NUMBER() OVER (ORDER BY similarity(body, $2) DESC) AS rk
  FROM chunks
  WHERE body % $2
  ORDER BY similarity(body, $2) DESC LIMIT 50
)
SELECT * FROM fuzz;

Step 4 — Fuse with RRF

CREATE OR REPLACE FUNCTION search_chunks(
  q_embed vector(1536),
  q_text  text,
  k       int DEFAULT 10,
  rrf_k   int DEFAULT 60
)
RETURNS TABLE (id bigint, body text, score float) AS $$
WITH
  vec AS (
    SELECT id, ROW_NUMBER() OVER (ORDER BY embedding <=> q_embed) AS rk
    FROM chunks ORDER BY embedding <=> q_embed LIMIT 50
  ),
  lex AS (
    SELECT id, ROW_NUMBER() OVER (
      ORDER BY ts_rank_cd(body_tsv, websearch_to_tsquery('english', q_text)) DESC
    ) AS rk
    FROM chunks
    WHERE body_tsv @@ websearch_to_tsquery('english', q_text)
    LIMIT 50
  ),
  fuzz AS (
    SELECT id, ROW_NUMBER() OVER (ORDER BY similarity(body, q_text) DESC) AS rk
    FROM chunks WHERE body % q_text LIMIT 50
  ),
  fused AS (
    SELECT id, SUM(1.0 / (rrf_k + rk)) AS score FROM (
      SELECT id, rk FROM vec
      UNION ALL SELECT id, rk FROM lex
      UNION ALL SELECT id, rk FROM fuzz
    ) u GROUP BY id
  )
SELECT c.id, c.body, f.score
FROM fused f JOIN chunks c USING (id)
ORDER BY f.score DESC LIMIT k;
$$ LANGUAGE sql STABLE;

Step 5 — Call from Prisma

const rows = await prisma.$queryRaw`
  SELECT * FROM search_chunks(${vec}::vector, ${q}, 10)
`;

Step 6 — Tune RRF `k`

k=60 is the canonical TREC value. Lower (k=10) emphasizes top-1 hits; higher (k=120) flattens results. A/B test on your eval set.

Hear it before you finish reading

Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.

Try Live →

Try Live Demo →

Pitfalls

websearch_to_tsquery rejects empty strings — guard with COALESCE or a fallback.
Trigram on long text — % works on whole-string sim. For long docs, store title || summary as a separate trgm column.
Different LIMITs per branch — keep them equal (50 each) so RRF weights are fair.
Forgetting STABLE — required so the planner caches the function call.

CallSphere production note

CallSphere's knowledge agents use this exact RRF pattern across the 115+ DB tables that hold product docs, FAQs, and call transcripts. The Healthcare vertical (HIPAA, Prisma healthcare_voice) tunes rrf_k=40 to favor exact drug-name matches; OneRoof (RLS) keeps rrf_k=60; UrackIT (Supabase + ChromaDB) ports the same RRF shape to its non-HIPAA stack. 37 agents · 90+ tools · 6 verticals. Pricing: $149/$499/$1,499 — 14-day trial, 22% affiliate.

FAQ

Q: How much does hybrid actually beat vector-only? On TREC-COVID and BEIR splits, hybrid + RRF lifts nDCG@10 by 15–25%.

Q: Do I need a reranker after RRF? Optional. A cross-encoder reranker on the top-20 fused results adds another 3–7% nDCG.

Still reading? Stop comparing — try CallSphere live.

CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.

Try Live Demo → Book 30-min Walkthrough See Pricing

Q: What about pg_textsearch / BM25 in Postgres? TigerData's pg_textsearch ships true BM25. Drop-in for ts_rank_cd if you need it.

Q: pg_trgm vs FTS — when does each win? Trgm wins on misspellings and partial words; FTS wins on phrase/stem matching.

Q: Single SQL vs three round trips? Single SQL is faster and atomic. Always fuse server-side.

pg_trgm + pgvector Hybrid Retrieval: Build Better RAG in Postgres (2026)

What you'll build

Schema

Architecture

Step 1 — Vector candidates

Step 2 — Full-text candidates

Step 3 — Trigram fuzzy candidates

Step 4 — Fuse with RRF

Step 5 — Call from Prisma

Step 6 — Tune RRF `k`

Pitfalls

CallSphere production note

FAQ

Sources

Try CallSphere AI Voice Agents

Related Articles You May Like

Chatbot for Answering Questions: How to Build One That Works

Graphiti: How Temporal Knowledge Graphs Give AI Voice Agents Persistent Memory (2026 Guide)

How To Create A Chatbot In 2026: A Founder's Practical Guide

Build a Chat Agent with Haystack RAG + Open LLM (Llama 3.2, 2026)

Agentic RAG with LangGraph: Iterative Retrieval, Self-Correction, and Eval Pipelines

Production RAG Agents with LangChain and RAGAS Evaluation in 2026

Product

Resources

Company

Legal

Industries

Integrations

Solutions

Compare

Pillar Guides

See AI Voice Agents in Action

What you'll build

Schema

Architecture

Step 1 — Vector candidates

Step 2 — Full-text candidates

Step 3 — Trigram fuzzy candidates

Step 4 — Fuse with RRF

Step 5 — Call from Prisma

Step 6 — Tune RRF k

Pitfalls

CallSphere production note

FAQ

Sources

Try CallSphere AI Voice Agents

Related Articles You May Like

Chatbot for Answering Questions: How to Build One That Works

Graphiti: How Temporal Knowledge Graphs Give AI Voice Agents Persistent Memory (2026 Guide)

How To Create A Chatbot In 2026: A Founder's Practical Guide

Build a Chat Agent with Haystack RAG + Open LLM (Llama 3.2, 2026)

Agentic RAG with LangGraph: Iterative Retrieval, Self-Correction, and Eval Pipelines

Production RAG Agents with LangChain and RAGAS Evaluation in 2026

Product

Resources

Company

Legal

Industries

Integrations

Solutions

Compare

Pillar Guides

See AI Voice Agents in Action

Step 6 — Tune RRF `k`