---
title: "pgvector HNSW Index Tuning at Scale: m, ef_construction, ef_search (2026)"
description: "A measured guide to tuning pgvector HNSW indexes for AI agent workloads — what m, ef_construction, and ef_search actually do, how to size them at 1M, 10M, and 50M rows, and how to monitor recall in production."
canonical: https://callsphere.ai/blog/vw7h-pgvector-hnsw-index-tuning-at-scale-2026
category: "AI Infrastructure"
tags: ["pgvector", "HNSW", "Performance", "Postgres", "RAG"]
author: "CallSphere Team"
published: 2026-03-17T00:00:00.000Z
updated: 2026-05-07T22:22:40.074Z
---

# pgvector HNSW Index Tuning at Scale: m, ef_construction, ef_search (2026)

> A measured guide to tuning pgvector HNSW indexes for AI agent workloads — what m, ef_construction, and ef_search actually do, how to size them at 1M, 10M, and 50M rows, and how to monitor recall in production.

> **TL;DR** — Default HNSW params (`m=16`, `ef_construction=64`, `ef_search=40`) are optimized for 100k-row demos, not 10M-row production. Bumping `ef_construction` to 200 and `ef_search` to 100–200 typically lifts recall@10 from 0.85 to 0.97 with manageable latency cost.

## What you'll build

A reproducible benchmark loop that measures recall and p95 latency across HNSW parameter sets, plus a production tuning playbook for 1M, 10M, and 50M-row pgvector tables.

## Schema

```sql
CREATE TABLE rag_chunks (
  id BIGSERIAL PRIMARY KEY,
  doc_id UUID NOT NULL,
  chunk_text TEXT NOT NULL,
  embedding vector(1536) NOT NULL
);

-- Build index AFTER bulk load
CREATE INDEX rag_chunks_hnsw ON rag_chunks
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 32, ef_construction = 200);
```
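Loading is fastest over `COPY` in text format before the index exists. A minimal sketch of the row formatting, where `to_vector_literal` and `copy_rows` are hypothetical helpers (pgvector accepts `[x1,x2,...]` as the text representation of `vector(n)`):

```python
def to_vector_literal(embedding: list[float]) -> str:
    """Render a Python list as a pgvector text literal, e.g. [0.1,0.2]."""
    return "[" + ",".join(repr(x) for x in embedding) + "]"

def copy_rows(rows):
    """Yield tab-separated COPY lines for (doc_id, chunk_text, embedding)."""
    for doc_id, text, emb in rows:
        yield f"{doc_id}\t{text}\t{to_vector_literal(emb)}"
```

Feed these lines to `COPY rag_chunks (doc_id, chunk_text, embedding) FROM STDIN`, then build the index.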

## Architecture

```mermaid
flowchart TD
  LOAD[Bulk load 10M chunks] --> IDX[Build HNSW with m=32, ef=200]
  IDX --> BENCH[Benchmark loop]
  BENCH --> RECALL[Measure recall@10]
  BENCH --> P95[Measure p95 latency]
  RECALL --> TUNE{Recall > 0.95?}
  P95 --> TUNE
  TUNE -->|No| EFUP[Raise ef_search]
  TUNE -->|Yes| SHIP[Ship config]
```

## Step 1 — Understand the three knobs

- **`m`** — max neighbors per node per graph layer. Default 16. Higher `m` means better recall, a larger index, and a slower build. For 10M+ vectors, set `m` to 24–32.
- **`ef_construction`** — candidate-list size during build. Default 64. Production: 128–200. Affects build time and graph quality (your recall ceiling), not query time.
- **`ef_search`** — candidate-list size during query. Default 40. Production: 80–200. The main runtime knob: raising it trades latency for recall, roughly linearly.
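The guidance above can be distilled into a starting-point heuristic. `suggest_hnsw_params` is a hypothetical helper and the thresholds are rules of thumb to benchmark against, not hard limits:

```python
def suggest_hnsw_params(row_count: int) -> dict:
    """Rule-of-thumb HNSW params by table size; always verify with a sweep."""
    if row_count < 1_000_000:
        # pgvector defaults are usually fine at demo scale
        return {"m": 16, "ef_construction": 64, "ef_search": 40}
    if row_count < 10_000_000:
        return {"m": 24, "ef_construction": 128, "ef_search": 100}
    return {"m": 32, "ef_construction": 200, "ef_search": 200}
```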

## Step 2 — Build with parallel workers

```sql
SET maintenance_work_mem = '8GB';
SET max_parallel_maintenance_workers = 7;

CREATE INDEX CONCURRENTLY rag_chunks_hnsw ON rag_chunks
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 32, ef_construction = 200);
```

pgvector 0.6+ supports parallel HNSW builds — typically 4–8x faster on an 8-core machine.
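Whether the build stays in memory matters as much as worker count. A rough back-of-envelope check, assuming ~4 bytes per dimension plus a coarse neighbor-list estimate (not pgvector's exact accounting — verify against the build log):

```python
def build_fits_in_memory(rows: int, dim: int, m: int,
                         maintenance_work_mem_gb: float) -> bool:
    """Coarse estimate: vector bytes plus ~2*m 8-byte neighbor entries per row."""
    per_row = dim * 4 + 2 * m * 8
    needed_gb = rows * per_row / 1024**3
    return needed_gb <= maintenance_work_mem_gb
```

At 10M rows of `vector(1536)` with `m=32`, the graph needs roughly 60 GB — far beyond an 8 GB `maintenance_work_mem`, so expect the build to spill.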

## Step 3 — Generate a recall ground truth

```python
import numpy as np
import psycopg

conn = psycopg.connect(...)

def brute_force_topk(q: list[float], k: int = 10) -> list[int]:
    """Exact top-k via sequential scan — the recall ground truth."""
    with conn.cursor() as cur:
        # SET LOCAL needs an open transaction; psycopg begins one implicitly.
        cur.execute("SET LOCAL enable_indexscan = off")
        cur.execute(
            """
            SELECT id FROM rag_chunks
            ORDER BY embedding <=> %s::vector LIMIT %s
            """,
            (q, k),
        )
        return [r[0] for r in cur.fetchall()]
```

Run brute-force on 200 sampled queries, store as ground truth.
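Recall@k is then the overlap between the index's answer and the brute-force answer. A minimal helper:

```python
def recall_at_k(approx_ids: list, true_ids: list) -> float:
    """Fraction of the brute-force top-k recovered by the index scan."""
    return len(set(approx_ids) & set(true_ids)) / len(true_ids)
```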

## Step 4 — Sweep `ef_search`

```python
import time

def hnsw_topk(q, k=10, ef=100):
    with conn.cursor() as cur:
        cur.execute(f"SET LOCAL hnsw.ef_search = {ef}")  # ef is a trusted int
        cur.execute(
            "SELECT id FROM rag_chunks ORDER BY embedding <=> %s::vector LIMIT %s",
            (q, k),
        )
        return [r[0] for r in cur.fetchall()]

# samples: list of (query_embedding, ground_truth_ids) pairs from Step 3
for ef in [40, 80, 120, 160, 200, 300]:
    hits, lat = [], []
    for q, gt in samples:
        t0 = time.perf_counter()
        ids = hnsw_topk(q, ef=ef)
        lat.append(time.perf_counter() - t0)
        hits.append(len(set(ids) & set(gt)) / len(gt))
    print(f"ef={ef} recall={np.mean(hits):.3f} p95={np.percentile(lat, 95) * 1000:.1f}ms")
```

## Step 5 — Read the curve, pick a point

Typical 10M-row result on a 16-vCPU Postgres:

| ef_search | recall@10 | p95 latency |
| --- | --- | --- |
| 40 | 0.86 | 8 ms |
| 100 | 0.94 | 14 ms |
| 200 | 0.98 | 26 ms |
| 400 | 0.99 | 51 ms |

For an agent that hits memory once per turn, `ef_search = 200` (0.98 recall@10 at a 26 ms p95) is the sweet spot.
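Choosing the operating point can be automated: given sweep output shaped like Step 4's `(ef, recall, p95_ms)` tuples, pick the smallest `ef_search` that clears a recall floor within a latency budget. `pick_ef` is a hypothetical helper:

```python
def pick_ef(sweep, min_recall=0.95, max_p95_ms=30.0):
    """Smallest ef_search meeting both targets, or None if nothing qualifies."""
    viable = [(ef, r, p) for ef, r, p in sweep if r >= min_recall and p <= max_p95_ms]
    return min(viable)[0] if viable else None
```

Against the table above, the defaults select `ef_search = 200`; relaxing the latency budget to 60 ms with a 0.99 recall floor selects 400.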

## Step 6 — Production monitoring

```sql
SELECT relname, idx_scan, idx_tup_read, idx_tup_fetch,
       pg_size_pretty(pg_relation_size(indexrelid)) AS idx_size
FROM pg_stat_user_indexes
WHERE indexrelname = 'rag_chunks_hnsw';
```

Track index size weekly — HNSW grows ~1.5–2x the raw vector size at `m=32`.
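That rule of thumb turns into a quick sanity check: compare the reported index size against `rows × dims × 4 bytes` times an expected growth factor (a rough estimate, not pgvector's exact on-disk accounting):

```python
def expected_index_size_gb(rows: int, dim: int, factor: float = 1.75) -> float:
    """Raw float4 vector bytes times the observed HNSW growth factor."""
    raw_gb = rows * dim * 4 / 1024**3
    return raw_gb * factor
```

If the monitored index size drifts well past ~2x raw, suspect bloat from heavy updates and consider a `REINDEX CONCURRENTLY`.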

## Pitfalls

- **Building before load** — wastes hours, produces worse graphs. Always load first.
- **`maintenance_work_mem` too small** — index spills to disk, build slows 10x. Set it to 25-50% of RAM.
- **Filtering on un-indexed columns** — `WHERE tenant_id = $1 ORDER BY embedding <=> $2` is post-filtered, so the scan can return fewer than k rows after the filter. Use a partial HNSW index per tenant or pgvectorscale's StreamingDiskANN.
- **Ignoring write amplification** — every UPDATE to `embedding` rebuilds graph edges. Batch updates.
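For the last pitfall, a sketch of batched updates in psycopg 3 style; `chunked` and `update_embeddings_batched` are hypothetical helpers:

```python
def chunked(items, size):
    """Split an iterable into lists of at most `size` items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

def update_embeddings_batched(conn, updates, batch_size=500):
    """updates: iterable of (embedding, chunk_id) pairs.

    One transaction per batch amortizes graph-edge rewrites instead of
    paying HNSW index maintenance per single-row UPDATE.
    """
    with conn.cursor() as cur:
        for batch in chunked(updates, batch_size):
            cur.executemany(
                "UPDATE rag_chunks SET embedding = %s::vector WHERE id = %s",
                batch,
            )
            conn.commit()
```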

## CallSphere production note

CallSphere's RAG layer indexes 8M+ chunks across **115+ DB tables** with `m=24, ef_construction=128, ef_search=160`. Healthcare and Behavioral Health verticals run on a HIPAA-isolated `healthcare_voice` Prisma schema; OneRoof uses RLS-scoped HNSW indexes per landlord; UrackIT keeps its non-HIPAA RAG on Supabase + ChromaDB. **37 agents · 90+ tools · 6 verticals**. Plans: $149 / $499 / $1,499, 14-day trial, 22% affiliate.

## FAQ

**Q: Does `SET hnsw.ef_search` need to be SESSION-scoped?**
`SET LOCAL` inside the transaction is safest — avoids leaking to pooled connections.

**Q: When is IVFFlat actually better than HNSW?**
Memory-constrained boxes where the HNSW graph won't fit in RAM, or very large tables (100M+ vectors) with low QPS — IVFFlat builds much faster and its index is far smaller.

**Q: Should I rebuild the index after bulk imports?**
Only if you imported >20% of total rows. HNSW handles incremental inserts well.

**Q: Can I use halfvec to halve memory?**
Yes — pgvector 0.7+ ships `halfvec(n)`. Recall drop is usually <1%, memory savings 50%.

**Q: What about pgvectorscale?**
StreamingDiskANN beats HNSW past ~50M vectors. Worth evaluating if you outgrow pgvector.

## Sources

- [pgvector GitHub — HNSW tuning](https://github.com/pgvector/pgvector#hnsw)
- [Crunchy Data — HNSW indexes](https://www.crunchydata.com/blog/hnsw-indexes-with-postgres-and-pgvector)
- [Google Cloud — pgvector index performance](https://cloud.google.com/blog/products/databases/faster-similarity-search-performance-with-pgvector-indexes)
- [Severalnines — pgvector deep dive](https://severalnines.com/blog/vector-similarity-search-with-postgresqls-pgvector-a-deep-dive/)

---

Source: https://callsphere.ai/blog/vw7h-pgvector-hnsw-index-tuning-at-scale-2026
