AI-Powered Search for SaaS Applications: Semantic Search Over Product Data

Why Keyword Search Falls Short

Traditional keyword search matches tokens literally. When a user in your CRM searches for "companies that are struggling financially," keyword search can return nothing, because no record contains those words. Semantic search compares vector embeddings of the query and of each record, so it matches by meaning: the same query can surface records tagged "at risk," "payment overdue," or "churn likelihood: high."

For SaaS products with rich, structured data, semantic search transforms how users discover and interact with their information.

Architecture: Indexing Pipeline

The indexing pipeline converts your product data into searchable vector embeddings. It runs on data changes (inserts, updates, deletes) and keeps the vector index in sync with your primary database.

flowchart TD
    DOC(["Document"])
    CHUNK["Chunker<br/>recursive plus overlap"]
    EMB["Embedding model"]
    META["Attach metadata<br/>source, page, tenant"]
    INDEX[("HNSW or IVF index<br/>in vector store")]
    Q(["Query"])
    QEMB["Embed query"]
    SEARCH["ANN search<br/>cosine similarity"]
    FILTER["Metadata filter<br/>tenant or date"]
    HITS(["Top-k chunks"])
    DOC --> CHUNK --> EMB --> META --> INDEX
    Q --> QEMB --> SEARCH
    INDEX --> SEARCH --> FILTER --> HITS
    style INDEX fill:#4f46e5,stroke:#4338ca,color:#fff
    style HITS fill:#059669,stroke:#047857,color:#fff

# Embedding indexer that processes data changes
from dataclasses import dataclass

from openai import OpenAI

# The client reads OPENAI_API_KEY from the environment.
client = OpenAI()

@dataclass
class SearchDocument:
    entity_type: str
    entity_id: str
    tenant_id: str
    text: str
    metadata: dict

def create_embedding(text: str) -> list[float]:
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text,
    )
    return response.data[0].embedding

def build_search_text(entity_type: str, record: dict) -> str:
    """Convert a database record into searchable text."""
    builders = {
        "contact": lambda r: (
            f"Contact: {r['name']}. Company: {r.get('company', 'N/A')}. "
            f"Title: {r.get('title', 'N/A')}. Notes: {r.get('notes', '')}. "
            f"Tags: {', '.join(r.get('tags', []))}."
        ),
        "deal": lambda r: (
            f"Deal: {r['name']}. Value: ${r.get('value', 0):,.2f}. "
            f"Stage: {r.get('stage', 'unknown')}. "
            f"Description: {r.get('description', '')}."
        ),
        "ticket": lambda r: (
            f"Support ticket: {r['subject']}. Status: {r.get('status', 'open')}. "
            f"Priority: {r.get('priority', 'normal')}. Body: {r.get('body', '')}."
        ),
    }
    builder = builders.get(entity_type)
    if not builder:
        raise ValueError(f"Unknown entity type: {entity_type}")
    return builder(record)
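
The pipeline diagram includes a chunking step that the indexer code above does not show. CRM records are often short enough to embed whole, but long fields such as ticket bodies may need splitting. A minimal sketch, assuming a plain fixed-size character window with overlap (simpler than the recursive splitter named in the diagram); `chunk_text`, `max_chars`, and `overlap` are illustrative names, not part of the original code:

```python
def chunk_text(text: str, max_chars: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into windows of at most max_chars characters.

    Consecutive windows share `overlap` characters, so a sentence cut at
    one boundary still appears intact in the neighbouring chunk.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be in [0, max_chars)")
    if len(text) <= max_chars:
        return [text] if text else []

    step = max_chars - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # this window already reached the end of the text
    return chunks


print(chunk_text("abcdefghij", max_chars=4, overlap=1))
# ['abcd', 'defg', 'ghij']
```

Storing several chunks per record would also mean adding a chunk index to the `UNIQUE(entity_type, entity_id)` constraint used in the pgvector table, since that constraint allows only one row per entity.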

Storing Embeddings with pgvector

Use PostgreSQL with pgvector to keep embeddings alongside your existing data, avoiding the operational overhead of a separate vector database.

# pgvector storage and retrieval
import asyncio
import json

import asyncpg

EMBED_DIM = 1536  # text-embedding-3-small dimension

async def setup_vector_table(pool: asyncpg.Pool):
    async with pool.acquire() as conn:
        await conn.execute("CREATE EXTENSION IF NOT EXISTS vector;")
        await conn.execute(f"""
            CREATE TABLE IF NOT EXISTS search_embeddings (
                id SERIAL PRIMARY KEY,
                tenant_id UUID NOT NULL,
                entity_type VARCHAR(50) NOT NULL,
                entity_id UUID NOT NULL,
                content TEXT NOT NULL,
                embedding vector({EMBED_DIM}) NOT NULL,
                metadata JSONB DEFAULT '{{}}',
                updated_at TIMESTAMPTZ DEFAULT NOW(),
                UNIQUE(entity_type, entity_id)
            );
        """)
        await conn.execute("""
            CREATE INDEX IF NOT EXISTS idx_search_embed_tenant
            ON search_embeddings (tenant_id);
        """)

async def upsert_embedding(pool: asyncpg.Pool, doc: SearchDocument):
    # create_embedding makes a blocking HTTP call; run it in a worker
    # thread so it does not stall the event loop.
    embedding = await asyncio.to_thread(create_embedding, doc.text)
    # pgvector accepts the text form "[x1,x2,...]" cast to vector.
    embedding_str = "[" + ",".join(str(x) for x in embedding) + "]"
    async with pool.acquire() as conn:
        await conn.execute("""
            INSERT INTO search_embeddings
                (tenant_id, entity_type, entity_id, content, embedding, metadata)
            VALUES ($1, $2, $3, $4, $5::vector, $6)
            ON CONFLICT (entity_type, entity_id)
            DO UPDATE SET content = $4, embedding = $5::vector,
                          metadata = $6, updated_at = NOW();
        """, doc.tenant_id, doc.entity_type, doc.entity_id,
             # asyncpg's default JSONB codec expects a JSON string, not a dict.
             doc.text, embedding_str, json.dumps(doc.metadata))
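
The table above has a B-tree index on `tenant_id` only, so each similarity query compares the query vector against every row for that tenant. The diagram mentions an HNSW or IVF index; the sketch below adds one, assuming pgvector 0.5.0 or later (the release that introduced HNSW) and cosine distance to match the `<=>` operator used in the search queries. The index name, `CREATE_HNSW_INDEX_SQL`, and `create_ann_index` are illustrative:

```python
# Cosine-distance HNSW index; vector_cosine_ops matches the <=> operator.
CREATE_HNSW_INDEX_SQL = """
    CREATE INDEX IF NOT EXISTS idx_search_embed_hnsw
    ON search_embeddings
    USING hnsw (embedding vector_cosine_ops);
"""

async def create_ann_index(conn) -> None:
    """Create the approximate-nearest-neighbour index if it is missing.

    `conn` is any object with an awaitable execute(sql) method, for
    example an asyncpg connection obtained from pool.acquire().
    """
    await conn.execute(CREATE_HNSW_INDEX_SQL)
```

With an approximate index, a filter such as the `tenant_id` condition is applied after the index scan, so a filtered query can return fewer rows than requested. The pgvector documentation describes the options available for this in each version.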

Search API

The search endpoint accepts a natural language query, embeds it, and performs a cosine similarity search scoped to the user's tenant.

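import asyncio
import json

import asyncpg
from fastapi import FastAPI, Depends, Query
from pydantic import BaseModel

app = FastAPI()

class SearchResult(BaseModel):
    entity_type: str
    entity_id: str
    content: str
    score: float
    metadata: dict

# Placeholder dependencies: replace these with your application's own
# authentication and connection-pool wiring.
async def get_current_tenant() -> str:
    raise NotImplementedError("resolve the tenant from the authenticated request")

async def get_db_pool() -> asyncpg.Pool:
    raise NotImplementedError("return the shared asyncpg pool")

@app.get("/api/search", response_model=list[SearchResult])
async def semantic_search(
    q: str = Query(..., min_length=2, max_length=500),
    entity_type: str | None = Query(None),
    limit: int = Query(10, ge=1, le=50),
    tenant_id: str = Depends(get_current_tenant),
    pool: asyncpg.Pool = Depends(get_db_pool),
):
    # create_embedding makes a blocking HTTP call; keep it off the event loop.
    query_embedding = await asyncio.to_thread(create_embedding, q)
    embedding_str = "[" + ",".join(str(x) for x in query_embedding) + "]"

    # Bind every value as a parameter, including LIMIT, instead of
    # interpolating it into the SQL string.
    params = [tenant_id, embedding_str, limit]
    type_filter = ""
    if entity_type:
        type_filter = "AND entity_type = $4"
        params.append(entity_type)

    async with pool.acquire() as conn:
        rows = await conn.fetch(f"""
            SELECT entity_type, entity_id, content, metadata,
                   1 - (embedding <=> $2::vector) AS score
            FROM search_embeddings
            WHERE tenant_id = $1 {type_filter}
            ORDER BY embedding <=> $2::vector
            LIMIT $3;
        """, *params)

    return [
        SearchResult(
            entity_type=r["entity_type"],
            entity_id=str(r["entity_id"]),
            content=r["content"],
            score=round(float(r["score"]), 4),
            # asyncpg returns JSONB as a string unless a codec is registered.
            metadata=json.loads(r["metadata"]) if r["metadata"] else {},
        )
        for r in rows
    ]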
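
The `<=>` operator in the query above returns cosine distance, which is why the score is computed as `1 - distance`. A small pure-Python sketch of the same arithmetic, for intuition only (in production the database does this work; the function names are illustrative):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)

def cosine_distance(a: list[float], b: list[float]) -> float:
    """The quantity pgvector's <=> operator returns: 1 - cosine similarity."""
    return 1.0 - cosine_similarity(a, b)

query = [1.0, 0.0]
same_direction = [2.0, 0.0]  # same direction, different magnitude
orthogonal = [0.0, 3.0]      # no shared direction

print(1 - cosine_distance(query, same_direction))  # 1.0
print(1 - cosine_distance(query, orthogonal))      # 0.0
```

Because cosine similarity ignores vector length, only direction matters. The score can also be negative for vectors pointing in opposite directions, so clients should not assume it lies in [0, 1].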

Relevance Tuning

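Combine vector similarity with keyword matching and recency boosting for better results. The query below weights vector similarity at 0.7, PostgreSQL full-text rank at 0.2, and a recency term at 0.1; treat these weights as starting points to tune against your own relevance judgments.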

# Hybrid scoring: vector similarity + full-text rank (ts_rank, not BM25) + recency
async def hybrid_search(pool: asyncpg.Pool, query: str,
                        tenant_id: str, limit: int = 10):
    query_embedding = create_embedding(query)
    embedding_str = "[" + ",".join(str(x) for x in query_embedding) + "]"

    async with pool.acquire() as conn:
        rows = await conn.fetch("""
            SELECT entity_type, entity_id, content, metadata,
                   1 - (embedding <=> $2::vector) AS vector_score,
                   ts_rank(to_tsvector('english', content),
                           plainto_tsquery('english', $3)) AS keyword_score,
                   EXTRACT(EPOCH FROM (NOW() - updated_at)) AS age_seconds
            FROM search_embeddings
            WHERE tenant_id = $1
            ORDER BY (
                0.7 * (1 - (embedding <=> $2::vector)) +
                0.2 * ts_rank(to_tsvector('english', content),
                              plainto_tsquery('english', $3)) +
                0.1 * (1.0 / (1.0 + EXTRACT(EPOCH FROM (NOW() - updated_at)) / 86400))
            ) DESC
            LIMIT $4;
        """, tenant_id, embedding_str, query, limit)
    return rows
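
The ORDER BY expression above is easier to reason about outside SQL. A sketch of the same formula in Python, using the weights from the query; `recency_boost`, `hybrid_score`, and their parameter names are illustrative:

```python
SECONDS_PER_DAY = 86_400

def recency_boost(age_seconds: float) -> float:
    """1.0 for a just-updated row, 0.5 after one day, approaching 0 with age."""
    return 1.0 / (1.0 + age_seconds / SECONDS_PER_DAY)

def hybrid_score(vector_score: float, keyword_score: float, age_seconds: float,
                 w_vector: float = 0.7, w_keyword: float = 0.2,
                 w_recency: float = 0.1) -> float:
    """Weighted sum mirroring the SQL ORDER BY expression."""
    return (w_vector * vector_score
            + w_keyword * keyword_score
            + w_recency * recency_boost(age_seconds))

# Same relevance signals, different ages.
fresh = hybrid_score(vector_score=0.80, keyword_score=0.05, age_seconds=0)
week_old = hybrid_score(vector_score=0.80, keyword_score=0.05,
                        age_seconds=7 * SECONDS_PER_DAY)
print(round(fresh, 4), round(week_old, 4))  # 0.67 0.5825
```

Two things stand out. The recency term can move a score by at most 0.1, so it mainly breaks ties between similarly relevant rows. And `ts_rank` values are not normalized to the same scale as cosine similarity (they are often well below 1), so the keyword term may contribute less than its 0.2 weight suggests.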

FAQ

How do I keep the vector index in sync with my primary data?

Use database triggers or change data capture (CDC) to detect inserts, updates, and deletes. Queue these changes to a background worker that recomputes embeddings and upserts them. For deletes, remove the corresponding row from the search_embeddings table. Search rarely needs to reflect an edit instantly, so an indexing delay of around 30 seconds is acceptable for most SaaS applications.
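
A rough sketch of the worker side of this flow, assuming change events arrive on an in-process `asyncio.Queue` (a real deployment would more likely read from a message broker or CDC stream). `ChangeEvent`, `handle_change`, `run_worker`, and the injected `upsert` and `delete` callables are hypothetical names: `upsert` would wrap `build_search_text` and `upsert_embedding`, and `delete` would run a `DELETE FROM search_embeddings WHERE entity_type = $1 AND entity_id = $2`.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional

@dataclass
class ChangeEvent:
    op: str                        # "insert", "update", or "delete"
    entity_type: str
    entity_id: str
    tenant_id: str
    record: Optional[dict] = None  # row contents; absent for deletes

Upsert = Callable[[ChangeEvent], Awaitable[None]]
Delete = Callable[[str, str], Awaitable[None]]

async def handle_change(event: ChangeEvent, upsert: Upsert, delete: Delete) -> None:
    """Route one change event to the matching index operation."""
    if event.op in ("insert", "update"):
        await upsert(event)
    elif event.op == "delete":
        await delete(event.entity_type, event.entity_id)
    else:
        raise ValueError(f"Unknown operation: {event.op}")

async def run_worker(queue: asyncio.Queue, upsert: Upsert, delete: Delete) -> None:
    """Process events in arrival order until a None sentinel is received."""
    while True:
        event = await queue.get()
        try:
            if event is None:
                return
            await handle_change(event, upsert, delete)
        finally:
            queue.task_done()
```

This sketch stops on the first failing event. A production worker would typically catch errors per event, retry with backoff, and park repeated failures for inspection so that one bad record does not stall indexing.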

Should I use pgvector or a dedicated vector database?

pgvector is a sensible default for most SaaS products with up to roughly 10 million records. It keeps your stack simple: one database, one backup strategy, one connection pool. Consider a dedicated vector database such as Pinecone or Weaviate only if you need very low latency (on the order of 10 ms) at scale, or filtering that pgvector does not support.

How do I handle multi-language search?

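Use an embedding model trained on multilingual data, such as text-embedding-3-small, which OpenAI reports performs better than its predecessor on multilingual retrieval benchmarks. Index all content as-is, without translation. Such models tend to map semantically similar text to nearby vectors regardless of language, so a query in Spanish can find relevant records written in English. Quality varies by language pair, so test on your own data. Note also that the keyword term in the hybrid query uses to_tsvector('english', ...), which stems English only.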
