Semantic Search with Elasticsearch: Dense Vector Search and BM25 Hybrid

Why Hybrid Search

Pure keyword search (BM25) excels at exact term matching but fails on synonyms and paraphrases. Pure vector search captures semantic meaning but can miss important exact matches — searching for "Python 3.12 release notes" might return results about "programming language updates" instead of the specific version. Hybrid search combines both approaches, giving you semantic understanding with keyword precision.

Elasticsearch 8.x natively supports dense vector fields and kNN search, making it an excellent platform for hybrid retrieval without running a separate vector database.

Index Configuration

First, create an index that stores both the text (for BM25) and the embedding vector (for kNN):

flowchart TD
    DOC(["Document"])
    CHUNK["Chunker<br/>recursive plus overlap"]
    EMB["Embedding model"]
    META["Attach metadata<br/>source, page, tenant"]
    INDEX[("HNSW or IVF index<br/>in vector store")]
    Q(["Query"])
    QEMB["Embed query"]
    SEARCH["ANN search<br/>cosine similarity"]
    FILTER["Metadata filter<br/>tenant or date"]
    HITS(["Top-k chunks"])
    DOC --> CHUNK --> EMB --> META --> INDEX
    Q --> QEMB --> SEARCH
    INDEX --> SEARCH --> FILTER --> HITS
    style INDEX fill:#4f46e5,stroke:#4338ca,color:#fff
    style HITS fill:#059669,stroke:#047857,color:#fff

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

INDEX_NAME = "documents"

index_mapping = {
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0,
        "index": {
            "similarity": {
                "custom_bm25": {
                    "type": "BM25",
                    "k1": 1.2,
                    "b": 0.75,
                }
            }
        },
    },
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "analyzer": "standard",
                "similarity": "custom_bm25",
            },
            "body": {
                "type": "text",
                "analyzer": "standard",
                "similarity": "custom_bm25",
            },
            "embedding": {
                "type": "dense_vector",
                "dims": 384,
                "index": True,
                "similarity": "cosine",
            },
            "category": {"type": "keyword"},
            "published_at": {"type": "date"},
        }
    },
}

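# The 8.x Python client takes settings and mappings as separate keyword
# arguments; passing the whole mapping via body= is deprecated.
es.indices.create(
    index=INDEX_NAME,
    settings=index_mapping["settings"],
    mappings=index_mapping["mappings"],
)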

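The dense_vector field with index: True builds an HNSW graph for fast approximate nearest neighbor search. The dims value must match the embedding model's output size (384 for the all-MiniLM-L6-v2 model used below), and similarity: "cosine" tells Elasticsearch to rank neighbors by cosine similarity.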
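To make the cosine setting concrete, the sketch below computes cosine similarity in plain Python and maps it to the relevance score that the Elasticsearch documentation gives for cosine kNN hits, (1 + cosine) / 2. The function names are illustrative and are not part of any Elasticsearch API.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def knn_score_from_cosine(cosine: float) -> float:
    """Map a cosine in [-1, 1] to a kNN relevance score in [0, 1]."""
    return (1.0 + cosine) / 2.0
```

Because the kNN score is confined to the range 0 to 1, it sits on a very different scale from BM25 scores, which matters once the two are added together in a hybrid query.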

Indexing Documents with Embeddings

from typing import Dict, List

from elasticsearch.helpers import bulk
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def index_documents(documents: List[Dict]):
    """Index documents with both text and embeddings."""
    # Embed title and body together so the vector reflects both fields.
    texts = [f"{d['title']}. {d['body']}" for d in documents]
    embeddings = model.encode(texts, normalize_embeddings=True)

    actions = []
    for i, doc in enumerate(documents):
        action = {
            "_index": INDEX_NAME,
            # Fall back to the list position when the document has no id.
            "_id": doc.get("id", str(i)),
            "_source": {
                "title": doc["title"],
                "body": doc["body"],
                "embedding": embeddings[i].tolist(),
                "category": doc.get("category", "general"),
                "published_at": doc.get("published_at"),
            },
        }
        actions.append(action)

    # refresh=True makes the new documents searchable as soon as bulk returns.
    success, errors = bulk(es, actions, refresh=True)
    print(f"Indexed {success} documents, {len(errors)} errors")
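For larger corpora, encoding every document up front and holding all actions in one list can use a lot of memory. The following is a minimal sketch of a generator-based variant that embeds one batch at a time. The `generate_actions` helper, its `encode` callable, and the `batch_size` default are assumptions made for illustration and do not appear in the original code.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Sequence


def generate_actions(
    documents: Iterable[Dict],
    encode: Callable[[List[str]], Sequence[Sequence[float]]],
    index_name: str,
    batch_size: int = 64,
) -> Iterator[Dict]:
    """Yield bulk actions, embedding documents one batch at a time."""

    def flush(items: List[Dict], start: int) -> Iterator[Dict]:
        # Embed the whole batch in one call, then emit one action per document.
        texts = [f"{d['title']}. {d['body']}" for d in items]
        vectors = encode(texts)
        for offset, (doc, vector) in enumerate(zip(items, vectors)):
            yield {
                "_index": index_name,
                # Fall back to the running position when no id is given.
                "_id": doc.get("id", str(start + offset)),
                "_source": {
                    "title": doc["title"],
                    "body": doc["body"],
                    "embedding": [float(x) for x in vector],
                    "category": doc.get("category", "general"),
                    "published_at": doc.get("published_at"),
                },
            }

    batch: List[Dict] = []
    position = 0  # number of documents already emitted
    for doc in documents:
        batch.append(doc)
        if len(batch) == batch_size:
            yield from flush(batch, position)
            position += len(batch)
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield from flush(batch, position)
```

With the objects defined in this article, it could be called as bulk(es, generate_actions(documents, lambda texts: model.encode(texts, normalize_embeddings=True), INDEX_NAME), refresh=True), since the bulk helper accepts any iterable of actions.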

Hybrid Search Query

Elasticsearch supports combining kNN and BM25 in a single query using the knn parameter alongside a traditional query block:

def hybrid_search(
    query_text: str,
    top_k: int = 10,
    knn_boost: float = 0.7,
    bm25_boost: float = 0.3,
    category_filter: str = None,
) -> List[Dict]:
    """Execute hybrid BM25 + kNN search."""
    query_embedding = model.encode(
        [query_text], normalize_embeddings=True
    )[0].tolist()

    # Build the BM25 query
    bm25_query = {
        "bool": {
            "should": [
                {
                    "multi_match": {
                        "query": query_text,
                        "fields": ["title^3", "body"],
                        "type": "best_fields",
                    }
                }
            ]
        }
    }

    # Add category filter if specified
    if category_filter:
        bm25_query["bool"]["filter"] = [
            {"term": {"category": category_filter}}
        ]

    # Build kNN clause
    knn_clause = {
        "field": "embedding",
        "query_vector": query_embedding,
        "k": top_k * 2,
        # num_candidates must be at least k, so scale it with top_k.
        "num_candidates": max(100, top_k * 2),
        "boost": knn_boost,
    }

    if category_filter:
        knn_clause["filter"] = {"term": {"category": category_filter}}

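    # The boost belongs inside the bool clause; a top-level "boost" key
    # next to "bool" is not valid query DSL and is rejected by Elasticsearch.
    bm25_query["bool"]["boost"] = bm25_boost

    response = es.search(
        index=INDEX_NAME,
        knn=knn_clause,
        query=bm25_query,
        size=top_k,
    )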

    results = []
    for hit in response["hits"]["hits"]:
        results.append({
            "id": hit["_id"],
            "score": hit["_score"],
            "title": hit["_source"]["title"],
            "body": hit["_source"]["body"][:200],
        })
    return results

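The knn_boost and bm25_boost parameters scale each component before Elasticsearch adds them into the final score. They are weights, not percentages: with cosine similarity the kNN score stays between 0 and 1, while BM25 scores are unbounded and often much larger, so a 0.7/0.3 split does not guarantee that the semantic component dominates. Treat 0.7/0.3 as a starting point for natural language queries and something like 0.4/0.6 for technical searches where exact terms matter more, then validate the choice against labeled queries.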
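As a rough illustration of how the boosts interact, the sketch below reproduces the weighted sum that Elasticsearch applies when a knn clause and a query are combined: each hit's score is the boosted kNN score plus the boosted query score, and a component the document did not match contributes 0. The helper names and the example scores are hypothetical.

```python
def combined_score(
    knn_score: float,
    bm25_score: float,
    knn_boost: float = 0.7,
    bm25_boost: float = 0.3,
) -> float:
    """Weighted sum of the two component scores for a single hit."""
    return knn_boost * knn_score + bm25_boost * bm25_score


def effective_knn_share(
    knn_score: float,
    bm25_score: float,
    knn_boost: float = 0.7,
    bm25_boost: float = 0.3,
) -> float:
    """Fraction of the combined score that comes from the kNN component."""
    total = combined_score(knn_score, bm25_score, knn_boost, bm25_boost)
    if total == 0.0:
        return 0.0
    return knn_boost * knn_score / total


# Hypothetical hit: strong semantic match (0.9) and a typical BM25 score (12.0).
share = effective_knn_share(knn_score=0.9, bm25_score=12.0)
print(f"kNN contributes {share:.0%} of the combined score")
```

In this made-up case the kNN component supplies well under half of the final score despite its larger boost, which is why the ratio is best tuned empirically.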

Tuning the Hybrid Balance

def evaluate_boost_ratios(
    queries_with_relevance: List[Dict],
    ratios: List[tuple] = None,
):
    """Test different kNN/BM25 boost ratios to find optimal balance."""
    if ratios is None:
        ratios = [
            (1.0, 0.0),  # pure kNN
            (0.8, 0.2),
            (0.7, 0.3),
            (0.5, 0.5),
            (0.3, 0.7),
            (0.0, 1.0),  # pure BM25
        ]

    for knn_b, bm25_b in ratios:
        total_ndcg = 0
        for item in queries_with_relevance:
            results = hybrid_search(
                item["query"], knn_boost=knn_b, bm25_boost=bm25_b
            )
            result_ids = [r["id"] for r in results]
            ndcg = compute_ndcg(result_ids, item["relevant_ids"])
            total_ndcg += ndcg

        avg_ndcg = total_ndcg / len(queries_with_relevance)
        print(f"kNN={knn_b:.1f} BM25={bm25_b:.1f} -> nDCG@10={avg_ndcg:.4f}")
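The evaluation loop calls compute_ndcg, which the article does not define. One possible implementation, assuming binary relevance (a document is either in relevant_ids or it is not) and a cutoff of 10 to match the nDCG@10 label in the printout, is sketched here.

```python
import math
from typing import Iterable, List


def compute_ndcg(
    result_ids: List[str],
    relevant_ids: Iterable[str],
    k: int = 10,
) -> float:
    """Binary-relevance nDCG@k for one ranked list of document ids."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0  # nothing to find, so the score is defined as 0 here

    # DCG: each relevant hit at 0-based rank r contributes 1 / log2(r + 2).
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, doc_id in enumerate(result_ids[:k])
        if doc_id in relevant
    )

    # Ideal DCG: all relevant documents ranked first, capped at k positions.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg
```

This version assumes result_ids contains no duplicates, which holds for a single Elasticsearch result page. If graded relevance labels are available, the gain term would need to use them instead of a constant 1.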

FAQ

Should I use Elasticsearch or a dedicated vector database like Pinecone or Weaviate?

If you already run Elasticsearch and need hybrid search, it is the pragmatic choice — one fewer system to operate. Dedicated vector databases offer better performance for pure vector workloads at billion-scale, but for most applications under 10 million documents, Elasticsearch's native kNN is more than sufficient.

How does the num_candidates parameter affect kNN quality?

The num_candidates parameter sets how many candidate vectors each shard collects from its HNSW graph before the best k are returned. Higher values improve recall but increase latency. It must be at least k and cannot exceed 10,000; a value of 100-200 is a reasonable default. If relevant results are being missed, raise it (for example to 500) and measure the latency impact.
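A small helper can keep num_candidates inside its valid range when k varies between calls. The sketch below is one way to do that; the build_knn_clause name and the default of ten candidates per requested hit are assumptions chosen for illustration, not values recommended by Elasticsearch.

```python
from typing import Dict, List, Optional

MAX_NUM_CANDIDATES = 10_000  # upper bound documented for num_candidates


def build_knn_clause(
    query_vector: List[float],
    k: int,
    num_candidates: Optional[int] = None,
    field: str = "embedding",
) -> Dict:
    """Build a knn clause whose num_candidates is valid for the given k."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if k > MAX_NUM_CANDIDATES:
        raise ValueError("k cannot exceed the num_candidates limit")
    if num_candidates is None:
        # Heuristic assumption: explore about ten candidates per requested hit.
        num_candidates = max(100, 10 * k)
    # Clamp into the valid range: at least k, at most the documented limit.
    num_candidates = min(max(num_candidates, k), MAX_NUM_CANDIDATES)
    return {
        "field": field,
        "query_vector": query_vector,
        "k": k,
        "num_candidates": num_candidates,
    }
```

To measure the recall and latency trade-off, the same query can be issued with several num_candidates values while recording the took field of each response.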

Can I update embeddings without re-indexing the entire document?

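Yes, in the sense that you only need to send the new vector. The _update API accepts a partial document containing just the embedding field and merges it into the stored _source. Internally Elasticsearch still re-indexes the whole document, so the savings are in payload size and client code rather than indexing work. If you change embedding models, you must re-embed and re-index all documents because vectors from different models are not comparable.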
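A partial update of the vector might look like the following sketch. It assumes the 8.x Python client, whose update method accepts a doc keyword argument; the update_embedding helper and its encode parameter are hypothetical names introduced here.

```python
from typing import Any, Callable, List, Sequence


def update_embedding(
    client: Any,
    index_name: str,
    doc_id: str,
    text: str,
    encode: Callable[[List[str]], Sequence[Sequence[float]]],
) -> Any:
    """Re-embed `text` and send a partial update containing only the vector."""
    vector = [float(x) for x in encode([text])[0]]
    # Only the embedding field is sent; title, body and the rest are kept.
    return client.update(
        index=index_name,
        id=doc_id,
        doc={"embedding": vector},
    )
```

With the article's objects this could be called as update_embedding(es, INDEX_NAME, "42", new_text, lambda texts: model.encode(texts, normalize_embeddings=True)), where "42" and new_text stand in for a real document id and its revised text.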
