
Vector Index Types Explained: Flat, IVF, HNSW, and Product Quantization

Understand the four major vector index algorithms — Flat, IVF, HNSW, and Product Quantization — with clear explanations of accuracy vs speed tradeoffs and guidance on tuning parameters.

When you have 1,000 documents, finding the nearest neighbors to a query vector is trivial — compute the distance to every vector and sort. When you have 10 million documents, that brute-force approach takes seconds per query, which is unacceptable for interactive applications.

Vector indexes solve this by trading a small amount of accuracy for dramatic speed improvements. Instead of comparing against every vector (exact search), approximate nearest neighbor (ANN) indexes use clever data structures to narrow the search space. The four most important index types are Flat, IVF (Inverted File), HNSW (Hierarchical Navigable Small World), and PQ (Product Quantization).

Flat Index: The Baseline

A flat index stores vectors in a simple array and performs exhaustive comparison at query time. It is not really an "index" in the traditional sense — it is brute-force search.

import faiss
import numpy as np

dimension = 1536
vectors = np.random.rand(10000, dimension).astype("float32")

# Flat index with L2 distance
index = faiss.IndexFlatL2(dimension)
index.add(vectors)

# Query — compares against all 10,000 vectors
query = np.random.rand(1, dimension).astype("float32")
distances, indices = index.search(query, k=5)
print(f"Nearest IDs: {indices[0]}")

Pros: 100% recall (exact results), no training step, simple to use. Cons: Query time scales linearly with dataset size. Impractical above ~100K vectors for real-time applications.

When to use: Small datasets, ground truth evaluation, accuracy benchmarking of other index types.

IVF (Inverted File Index)

IVF partitions the vector space into nlist clusters using k-means. At query time, it only searches the nprobe nearest clusters instead of the entire dataset.

nlist = 100  # number of clusters
quantizer = faiss.IndexFlatL2(dimension)
index = faiss.IndexIVFFlat(quantizer, dimension, nlist)

# IVF requires training on representative data
index.train(vectors)
index.add(vectors)

# Search the 10 nearest clusters
index.nprobe = 10
distances, indices = index.search(query, k=5)

The key tradeoff is nprobe:

  • Low nprobe (1-5): Very fast but may miss relevant vectors in other clusters
  • High nprobe (50-100): Slower but approaches exact search quality
  • nprobe = nlist: Equivalent to flat search (exact, but with cluster overhead)

Rule of thumb: Set nlist to roughly 4 * sqrt(n) where n is the number of vectors. Set nprobe to 5-10% of nlist as a starting point.


HNSW (Hierarchical Navigable Small World)

HNSW builds a multi-layer graph where each vector is a node. Upper layers are sparse (long-range connections for coarse navigation), and lower layers are dense (short-range connections for precise search). Queries start at the top layer and greedily navigate toward the nearest neighbors.

# HNSW in FAISS
M = 32                 # connections per node
ef_construction = 200  # build-time search depth

index = faiss.IndexHNSWFlat(dimension, M)
index.hnsw.efConstruction = ef_construction
index.add(vectors)

# Query-time search depth
index.hnsw.efSearch = 64
distances, indices = index.search(query, k=5)

Key parameters:

  • M: Number of connections per node. Higher M improves recall but increases memory and build time. Typical values: 16-64.
  • ef_construction: Search depth during index building. Higher values produce better graphs. Typical: 100-200.
  • ef_search: Search depth at query time. Must be >= k. Higher values improve recall at the cost of latency.

Pros: Best recall-speed tradeoff for most workloads. Supports incremental inserts without rebuilding. No training step required. Cons: High memory usage (stores the graph structure). Slower to build than IVF for very large datasets.

Product Quantization (PQ)

PQ compresses vectors by splitting each vector into subvectors and quantizing each subvector independently. This reduces memory usage dramatically — a 1536-dimensional float32 vector (6KB) can be compressed to ~64 bytes.

m = 48       # number of subquantizers (must divide dimension evenly)
nbits = 8    # bits per subquantizer (256 centroids each)

index = faiss.IndexPQ(dimension, m, nbits)
index.train(vectors)
index.add(vectors)

distances, indices = index.search(query, k=5)
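The memory savings follow from simple arithmetic: a PQ code stores m sub-codes of nbits bits each. With m = 48 and nbits = 8 the code is 48 bytes (m = 64 would give the ~64 bytes mentioned above); the per-subquantizer codebooks add a small fixed overhead not counted here.

```python
# PQ memory arithmetic: a code is m sub-codes of nbits bits each
dimension = 1536
m = 48        # subquantizers, each covering 1536 / 48 = 32 dimensions
nbits = 8     # 2**8 = 256 centroids per subquantizer

raw_bytes = dimension * 4        # float32 vector
code_bytes = m * nbits // 8      # compressed PQ code
ratio = raw_bytes / code_bytes

print(f"raw: {raw_bytes} B, code: {code_bytes} B, ratio: {ratio:.0f}x")
# raw: 6144 B, code: 48 B, ratio: 128x
```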

PQ is often combined with IVF for a powerful combination:

# IVF + PQ: partitioned search with compressed vectors
index = faiss.IndexIVFPQ(quantizer, dimension, nlist, m, nbits)
index.train(vectors)
index.add(vectors)
index.nprobe = 10
distances, indices = index.search(query, k=5)

Pros: Massive memory reduction (10-50x). Enables billion-scale search on a single machine. Cons: Lower recall than uncompressed indexes. Requires training. Distance calculations are approximate.

Choosing the Right Index

| Criteria | Flat | IVF | HNSW | PQ |
| --- | --- | --- | --- | --- |
| Dataset size | < 100K | 100K - 10M | 100K - 50M | 1M - 1B+ |
| Recall | 100% | 90-99% | 95-99.9% | 80-95% |
| Memory | High | High | Very high | Low |
| Build time | None | Moderate | Slow | Moderate |
| Query speed | Slow at scale | Fast | Very fast | Fast |
| Incremental insert | Yes | Rebuild needed | Yes | Rebuild needed |

For most applications with fewer than 10 million vectors, HNSW is the best default. It delivers the highest recall at competitive query speeds without requiring a training step.
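The table can be condensed into a hypothetical starting-point helper. The function name and thresholds below mirror the table, not any official FAISS guidance, and a real decision would also weigh recall targets and update patterns:

```python
# Hypothetical helper condensing the table into a starting-point choice.
def suggest_index(n_vectors: int, memory_constrained: bool = False) -> str:
    if n_vectors < 100_000:
        return "Flat"       # exact search is still fast enough
    if memory_constrained or n_vectors > 50_000_000:
        return "IVF+PQ"     # compressed codes scale to billions
    return "HNSW"           # best recall/speed default

print(suggest_index(10_000))           # Flat
print(suggest_index(5_000_000))        # HNSW
print(suggest_index(5_000_000, True))  # IVF+PQ
```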

FAQ

Can I combine multiple index types together?

Yes. The most common combination is IVF + PQ (IndexIVFPQ in FAISS), which partitions the space with IVF and compresses vectors with PQ. HNSW can also use PQ compression (IndexHNSWPQ) for memory-constrained environments. These composite indexes trade some recall for dramatic memory and speed improvements.

How do I measure recall to compare index configurations?

Generate a ground truth by running exact search (Flat index) on a representative query set. Then run the same queries against your ANN index and calculate recall at K — the fraction of true nearest neighbors that appear in the ANN results. A recall of 0.95 at K=10 means 9.5 out of 10 true nearest neighbors are found on average.
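A minimal recall@K implementation over ID arrays might look like this (the function name and shapes are illustrative; rows are queries, columns are ranked result IDs):

```python
import numpy as np

# Illustrative recall@K: fraction of true neighbors found by the ANN index
def recall_at_k(truth: np.ndarray, approx: np.ndarray, k: int) -> float:
    hits = [len(set(t[:k]) & set(a[:k])) for t, a in zip(truth, approx)]
    return float(np.mean(hits)) / k

truth = np.array([[1, 2, 3], [4, 5, 6]])   # from a Flat index
approx = np.array([[1, 2, 9], [4, 5, 6]])  # from the ANN index
print(recall_at_k(truth, approx, k=3))     # (2/3 + 3/3) / 2 = 0.833...
```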

Does the embedding dimension affect which index type to choose?

Higher dimensions increase memory usage and computation time for all index types, but they impact PQ most significantly because more subquantizers are needed. For very high dimensions (3072+), HNSW with dimensionality reduction or PQ compression becomes important. For typical dimensions (384-1536), any index type works without special consideration.


#VectorIndex #HNSW #IVF #ANN #Algorithm #AgenticAI #LearnAI #AIEngineering

Written by

CallSphere Team

Expert insights on AI voice agents and customer communication automation.
