By Sagar Shankaran, Founder of CallSphere
Pure vector search caps out around 62% precision. Adding pg_trgm + tsvector with Reciprocal Rank Fusion lifts it to 84%+. Real SQL, real RRF math, and a Prisma helper you can drop into any agent.
Key takeaways
TL;DR — Vector search misses exact matches (model names, error codes, drug names). Lexical search misses paraphrases. Hybrid retrieval — running both and merging with Reciprocal Rank Fusion — combines their strengths. In Postgres you get this for free with pgvector + pg_trgm + tsvector.
A single search_chunks(query, k) function that runs vector ANN, full-text BM25-flavored ranking, and trigram fuzzy matching, then fuses the three lists with RRF. Recall@10 climbs from ~0.62 (vector only) to ~0.84+.
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE TABLE chunks (
id BIGSERIAL PRIMARY KEY,
doc_id UUID NOT NULL,
body TEXT NOT NULL,
body_tsv tsvector GENERATED ALWAYS AS (to_tsvector('english', body)) STORED,
embedding vector(1536) NOT NULL
);
CREATE INDEX chunks_tsv_idx ON chunks USING gin (body_tsv);
CREATE INDEX chunks_trgm_idx ON chunks USING gin (body gin_trgm_ops);
CREATE INDEX chunks_hnsw_idx ON chunks USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 128);
flowchart LR
Q[User query] --> EMB[Embed]
Q --> TS[Tsvector parse]
Q --> TRG[Trigram tokens]
EMB --> ANN[HNSW top-50]
TS --> BM25[ts_rank top-50]
TRG --> FUZZ[trgm sim top-50]
ANN --> RRF[Reciprocal Rank Fusion]
BM25 --> RRF
FUZZ --> RRF
RRF --> TOPK[Top-k to LLM]
WITH vec AS (
SELECT id, ROW_NUMBER() OVER (ORDER BY embedding <=> $1::vector) AS rk
FROM chunks ORDER BY embedding <=> $1::vector LIMIT 50
)
SELECT * FROM vec;
WITH lex AS (
SELECT id, ROW_NUMBER() OVER (
ORDER BY ts_rank_cd(body_tsv, websearch_to_tsquery('english', $2)) DESC
) AS rk
FROM chunks
WHERE body_tsv @@ websearch_to_tsquery('english', $2)
LIMIT 50
)
SELECT * FROM lex;
WITH fuzz AS (
SELECT id, ROW_NUMBER() OVER (ORDER BY similarity(body, $2) DESC) AS rk
FROM chunks
WHERE body % $2
ORDER BY similarity(body, $2) DESC LIMIT 50
)
SELECT * FROM fuzz;
CREATE OR REPLACE FUNCTION search_chunks(
q_embed vector(1536),
q_text text,
k int DEFAULT 10,
rrf_k int DEFAULT 60
)
RETURNS TABLE (id bigint, body text, score float) AS $$
WITH
vec AS (
SELECT id, ROW_NUMBER() OVER (ORDER BY embedding <=> q_embed) AS rk
FROM chunks ORDER BY embedding <=> q_embed LIMIT 50
),
lex AS (
SELECT id, ROW_NUMBER() OVER (
ORDER BY ts_rank_cd(body_tsv, websearch_to_tsquery('english', q_text)) DESC
) AS rk
FROM chunks
WHERE body_tsv @@ websearch_to_tsquery('english', q_text)
LIMIT 50
),
fuzz AS (
SELECT id, ROW_NUMBER() OVER (ORDER BY similarity(body, q_text) DESC) AS rk
FROM chunks WHERE body % q_text LIMIT 50
),
fused AS (
SELECT id, SUM(1.0 / (rrf_k + rk)) AS score FROM (
SELECT id, rk FROM vec
UNION ALL SELECT id, rk FROM lex
UNION ALL SELECT id, rk FROM fuzz
) u GROUP BY id
)
SELECT c.id, c.body, f.score
FROM fused f JOIN chunks c USING (id)
ORDER BY f.score DESC LIMIT k;
$$ LANGUAGE sql STABLE;
const rows = await prisma.$queryRaw`
SELECT * FROM search_chunks(${vec}::vector, ${q}, 10)
`;
kk=60 is the canonical TREC value. Lower (k=10) emphasizes top-1 hits; higher (k=120) flattens results. A/B test on your eval set.
Hear it before you finish reading
Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.
websearch_to_tsquery rejects empty strings — guard with COALESCE or a fallback.% works on whole-string sim. For long docs, store title || summary as a separate trgm column.STABLE — required so the planner caches the function call.CallSphere's knowledge agents use this exact RRF pattern across the 115+ DB tables that hold product docs, FAQs, and call transcripts. The Healthcare vertical (HIPAA, Prisma healthcare_voice) tunes rrf_k=40 to favor exact drug-name matches; OneRoof (RLS) keeps rrf_k=60; UrackIT (Supabase + ChromaDB) ports the same RRF shape to its non-HIPAA stack. 37 agents · 90+ tools · 6 verticals. Pricing: $149/$499/$1,499 — 14-day trial, 22% affiliate.
Q: How much does hybrid actually beat vector-only? On TREC-COVID and BEIR splits, hybrid + RRF lifts nDCG@10 by 15–25%.
Q: Do I need a reranker after RRF? Optional. A cross-encoder reranker on the top-20 fused results adds another 3–7% nDCG.
Still reading? Stop comparing — try CallSphere live.
CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.
Q: What about pg_textsearch / BM25 in Postgres? TigerData's pg_textsearch ships true BM25. Drop-in for ts_rank_cd if you need it.
Q: pg_trgm vs FTS — when does each win? Trgm wins on misspellings and partial words; FTS wins on phrase/stem matching.
Q: Single SQL vs three round trips? Single SQL is faster and atomic. Always fuse server-side.
Written by
Sagar Shankaran· Founder, CallSphere
Sagar Shankaran is the founder of CallSphere, where he builds production AI voice and chat agents deployed across healthcare, hospitality, real estate, and home services. He writes about agentic AI, LLM engineering, and shipping voice agents that handle real calls in production.
See how AI voice agents work for your industry. Live demo available -- no signup required.
A founder's guide to building a chatbot for answering questions on your website: RAG, voice, and how CallSphere ships one in 3-5 days.
Graphiti is the open-source temporal knowledge graph for AI agents in 2026. Learn how bi-temporal memory beats vector RAG for voice agents and long-running LLMs.
A founder's guide on how to create a chatbot in 2026. Build options, AI stack, integration patterns, and when buying a managed agent wins over building.
Haystack 2.7's Agent component plus an Ollama-served Llama 3.2 gives you tool-calling RAG with citations. Here's a complete pipeline against your own document store.
Beyond single-shot RAG — agentic RAG with LangGraph that re-retrieves, self-grades, and rewrites queries. With evals that catch silent retrieval drift.
Build a production RAG agent with LangChain, then measure faithfulness, answer relevance, and context precision with RAGAS. The four metrics that matter and how to wire them up.
© 2026 CallSphere LLC. All rights reserved.
Watch how CallSphere handles real customer calls, schedules appointments, and processes payments — live.
Try Live DemoBook a DemoCalculate Your ROI