AI-Powered Search for SaaS Applications: Semantic Search Over Product Data

Build semantic search for your SaaS product using vector embeddings, enabling users to find records by meaning rather than exact keyword matches.

Why Keyword Search Falls Short

Traditional keyword search works by matching exact tokens. When a user in your CRM searches for "companies that are struggling financially," keyword search finds nothing useful, because no record uses that phrasing. Semantic search uses vector embeddings to match by meaning, so the same query can surface records tagged "at risk," "payment overdue," or "churn likelihood: high."

For SaaS products with rich, structured data, this lets users find records by describing what they need instead of guessing the exact terms stored in the database.

Architecture: Indexing Pipeline

The indexing pipeline converts your product data into searchable vector embeddings. It runs whenever data changes (inserts, updates, and deletes) and keeps the vector index in sync with your primary database.

flowchart TD
    DOC(["Document"])
    CHUNK["Chunker<br/>recursive plus overlap"]
    EMB["Embedding model"]
    META["Attach metadata<br/>source, page, tenant"]
    INDEX[("HNSW or IVF index<br/>in vector store")]
    Q(["Query"])
    QEMB["Embed query"]
    SEARCH["ANN search<br/>cosine similarity"]
    FILTER["Metadata filter<br/>tenant or date"]
    HITS(["Top-k chunks"])
    DOC --> CHUNK --> EMB --> META --> INDEX
    Q --> QEMB --> SEARCH
    INDEX --> SEARCH --> FILTER --> HITS
    style INDEX fill:#4f46e5,stroke:#4338ca,color:#fff
    style HITS fill:#059669,stroke:#047857,color:#fff
# Embedding indexer that processes data changes
from dataclasses import dataclass

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

@dataclass
class SearchDocument:
    entity_type: str  # e.g. "contact", "deal", "ticket"
    entity_id: str
    tenant_id: str    # every row is scoped to a single tenant
    text: str         # the text that gets embedded
    metadata: dict

def create_embedding(text: str) -> list[float]:
    """Embed one string (blocking network call; keep it off the event loop in async code)."""
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text,
    )
    return response.data[0].embedding

def build_search_text(entity_type: str, record: dict) -> str:
    """Convert a database record into searchable text."""
    # `value or default` also covers columns that exist but are NULL (None)
    builders = {
        "contact": lambda r: (
            f"Contact: {r['name']}. Company: {r.get('company') or 'N/A'}. "
            f"Title: {r.get('title') or 'N/A'}. Notes: {r.get('notes') or ''}. "
            f"Tags: {', '.join(r.get('tags') or [])}."
        ),
        "deal": lambda r: (
            f"Deal: {r['name']}. Value: ${r.get('value') or 0:,.2f}. "
            f"Stage: {r.get('stage') or 'unknown'}. "
            f"Description: {r.get('description') or ''}."
        ),
        "ticket": lambda r: (
            f"Support ticket: {r['subject']}. Status: {r.get('status') or 'open'}. "
            f"Priority: {r.get('priority') or 'normal'}. Body: {r.get('body') or ''}."
        ),
    }
    builder = builders.get(entity_type)
    if not builder:
        raise ValueError(f"Unknown entity type: {entity_type}")
    return builder(record)
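The diagram includes a chunking step that the indexer above skips: each record becomes exactly one embedding. Long ticket bodies or notes can exceed what a single vector represents well, and OpenAI documents an input limit of 8,191 tokens for its embedding models. Below is a minimal sketch of an overlapping chunker. It is a simplified fixed-window splitter that breaks on spaces, not the recursive splitter the diagram names; `chunk_text` and its defaults (2,000 characters, 200 characters of overlap) are hypothetical, illustrative choices.

```python
def chunk_text(text: str, max_chars: int = 2000, overlap: int = 200) -> list[str]:
    """Split text into overlapping windows of at most max_chars characters.

    Windows end on a space where possible so words are not cut in half, and
    each new window starts roughly `overlap` characters before the previous
    one ended, at the next word boundary.
    """
    if max_chars <= 0 or not 0 <= overlap < max_chars:
        raise ValueError("require max_chars > 0 and 0 <= overlap < max_chars")
    text = text.strip()
    chunks: list[str] = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Back up to the last space inside the window so no word is cut
            space = text.rfind(" ", start, end)
            if space > start:
                end = space
        chunk = text[start:end].strip()
        if chunk:
            chunks.append(chunk)
        if end >= len(text):
            break
        # Step back by `overlap` characters, then forward to a word boundary
        next_start = max(end - overlap, start + 1)
        space = text.find(" ", next_start, end)
        start = space + 1 if space != -1 else next_start
    return chunks
```

Storing several chunks per record would also need a chunk number in the table, because `UNIQUE(entity_type, entity_id)` allows only one row per entity.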

Storing Embeddings with pgvector

Use PostgreSQL with pgvector to keep embeddings alongside your existing data, avoiding the operational overhead of a separate vector database.

# pgvector storage and retrieval
import asyncio
import json

import asyncpg

EMBED_DIM = 1536  # text-embedding-3-small dimension

async def setup_vector_table(pool: asyncpg.Pool):
    async with pool.acquire() as conn:
        await conn.execute("CREATE EXTENSION IF NOT EXISTS vector;")
        await conn.execute(f"""
            CREATE TABLE IF NOT EXISTS search_embeddings (
                id SERIAL PRIMARY KEY,
                tenant_id UUID NOT NULL,
                entity_type VARCHAR(50) NOT NULL,
                entity_id UUID NOT NULL,
                content TEXT NOT NULL,
                embedding vector({EMBED_DIM}) NOT NULL,
                metadata JSONB DEFAULT '{{}}',
                updated_at TIMESTAMPTZ DEFAULT NOW(),
                UNIQUE(entity_type, entity_id)
            );
        """)
        await conn.execute("""
            CREATE INDEX IF NOT EXISTS idx_search_embed_tenant
            ON search_embeddings (tenant_id);
        """)
        # Approximate nearest-neighbor index for cosine distance (pgvector >= 0.5.0)
        await conn.execute("""
            CREATE INDEX IF NOT EXISTS idx_search_embed_hnsw
            ON search_embeddings USING hnsw (embedding vector_cosine_ops);
        """)

async def upsert_embedding(pool: asyncpg.Pool, doc: SearchDocument):
    # create_embedding blocks on network I/O, so run it in a worker thread
    embedding = await asyncio.to_thread(create_embedding, doc.text)
    # pgvector accepts the text form '[x1,x2,...]' when cast with ::vector
    embedding_str = "[" + ",".join(str(x) for x in embedding) + "]"
    async with pool.acquire() as conn:
        await conn.execute("""
            INSERT INTO search_embeddings
                (tenant_id, entity_type, entity_id, content, embedding, metadata)
            VALUES ($1, $2, $3, $4, $5::vector, $6::jsonb)
            ON CONFLICT (entity_type, entity_id)
            DO UPDATE SET content = EXCLUDED.content,
                          embedding = EXCLUDED.embedding,
                          metadata = EXCLUDED.metadata,
                          updated_at = NOW();
        """, doc.tenant_id, doc.entity_type, doc.entity_id,
             doc.text, embedding_str,
             json.dumps(doc.metadata))  # asyncpg expects JSONB as a str by default
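The indexer, the search endpoint, and the hybrid query each build pgvector's text literal inline. A small shared helper could do that formatting once and also fail fast when an embedding has the wrong length, for example after switching models or changing the `dimensions` request parameter without updating the `vector(1536)` column. This is a sketch; `to_vector_literal` is a hypothetical name.

```python
import math
from collections.abc import Sequence

EMBED_DIM = 1536  # text-embedding-3-small default output size

def to_vector_literal(embedding: Sequence[float], dim: int = EMBED_DIM) -> str:
    """Format an embedding as a pgvector text literal such as '[0.1,0.2]'.

    Checks the dimension and rejects non-finite values up front, so problems
    surface with a clear message before the query reaches the database.
    """
    if len(embedding) != dim:
        raise ValueError(f"expected {dim} dimensions, got {len(embedding)}")
    if not all(math.isfinite(x) for x in embedding):
        raise ValueError("embedding contains NaN or infinity")
    return "[" + ",".join(str(float(x)) for x in embedding) + "]"
```

If you would rather not build strings at all, the `pgvector` Python package provides a `register_vector` function for asyncpg connections, after which NumPy arrays can be passed directly as query parameters.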

Search API

The search endpoint accepts a natural language query, embeds it, and performs a cosine similarity search scoped to the user's tenant.

import asyncio
import json

import asyncpg
from fastapi import FastAPI, Depends, Query
from pydantic import BaseModel

app = FastAPI()

# get_current_tenant and get_db_pool are app-specific dependencies (not shown):
# one resolves the caller's tenant from auth, the other returns the asyncpg pool.

class SearchResult(BaseModel):
    entity_type: str
    entity_id: str
    content: str
    score: float
    metadata: dict

@app.get("/api/search", response_model=list[SearchResult])
async def semantic_search(
    q: str = Query(..., min_length=2, max_length=500),
    entity_type: str | None = Query(None),
    limit: int = Query(10, ge=1, le=50),
    tenant_id: str = Depends(get_current_tenant),
    pool: asyncpg.Pool = Depends(get_db_pool),
):
    # Run the blocking embedding call off the event loop
    query_embedding = await asyncio.to_thread(create_embedding, q)
    embedding_str = "[" + ",".join(str(x) for x in query_embedding) + "]"

    # Number placeholders as parameters are added; never interpolate values
    params = [tenant_id, embedding_str]
    type_filter = ""
    if entity_type:
        params.append(entity_type)
        type_filter = f"AND entity_type = ${len(params)}"
    params.append(limit)
    limit_placeholder = f"${len(params)}"

    async with pool.acquire() as conn:
        rows = await conn.fetch(f"""
            SELECT entity_type, entity_id, content, metadata,
                   1 - (embedding <=> $2::vector) AS score
            FROM search_embeddings
            WHERE tenant_id = $1 {type_filter}
            ORDER BY embedding <=> $2::vector
            LIMIT {limit_placeholder};
        """, *params)

    return [
        SearchResult(
            entity_type=r["entity_type"],
            entity_id=str(r["entity_id"]),
            content=r["content"],
            score=round(float(r["score"]), 4),
            # asyncpg returns JSONB as a str unless a type codec is registered
            metadata=json.loads(r["metadata"]) if r["metadata"] else {},
        )
        for r in rows
    ]

Relevance Tuning

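Pure vector search can miss exact identifiers such as names or ticket numbers, and it ignores how fresh a record is. Blending vector similarity with full-text keyword ranking and a recency boost addresses both; the weights in the query below (0.7, 0.2, 0.1) are a starting point to tune against your own queries.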

# Hybrid scoring: vector similarity + full-text rank (ts_rank, not BM25) + recency
import asyncio

async def hybrid_search(pool: asyncpg.Pool, query: str,
                        tenant_id: str, limit: int = 10):
    # Run the blocking embedding call off the event loop
    query_embedding = await asyncio.to_thread(create_embedding, query)
    embedding_str = "[" + ",".join(str(x) for x in query_embedding) + "]"

    async with pool.acquire() as conn:
        # Compute each signal once in a subquery, then rank by the weighted sum.
        # Ordering by a blended expression cannot use the HNSW index, so every
        # row for the tenant is scored and cost grows with the tenant's size.
        rows = await conn.fetch("""
            SELECT *,
                   0.7 * vector_score
                   + 0.2 * keyword_score
                   + 0.1 * (1.0 / (1.0 + age_seconds / 86400)) AS hybrid_score
            FROM (
                SELECT entity_type, entity_id, content, metadata,
                       1 - (embedding <=> $2::vector) AS vector_score,
                       -- normalization 32 rescales rank to rank / (rank + 1), in [0, 1)
                       ts_rank(to_tsvector('english', content),
                               plainto_tsquery('english', $3), 32) AS keyword_score,
                       EXTRACT(EPOCH FROM (NOW() - updated_at)) AS age_seconds
                FROM search_embeddings
                WHERE tenant_id = $1
            ) AS scored
            ORDER BY hybrid_score DESC
            LIMIT $4;
        """, tenant_id, embedding_str, query, limit)
    return rows
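To see how the three weights in `hybrid_search` trade off, the same ranking expression can be reproduced in Python and evaluated on sample inputs before touching the SQL. This is a minimal sketch; `hybrid_score` is a hypothetical helper, and its default weights are the ones used in the query.

```python
def hybrid_score(vector_score: float, keyword_score: float, age_seconds: float,
                 weights: tuple[float, float, float] = (0.7, 0.2, 0.1)) -> float:
    """Mirror of the SQL ranking expression in hybrid_search."""
    w_vector, w_keyword, w_recency = weights
    # 1.0 for a brand-new record, 0.5 after one day, 1/31 after 30 days
    recency = 1.0 / (1.0 + age_seconds / 86400)
    return w_vector * vector_score + w_keyword * keyword_score + w_recency * recency
```

Because the recency term halves after one day, these weights let a fresh record with similarity 0.80 and no keyword match outrank a month-old record at 0.85. If that is too aggressive for your data, lower the recency weight or divide by a longer period than 86,400 seconds.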

FAQ

How do I keep the vector index in sync with my primary data?

Use database triggers or change data capture (CDC) to detect inserts, updates, and deletes. Queue these changes to a background worker that recomputes embeddings and upserts them. For deletes, remove the corresponding row from the search_embeddings table. For most SaaS applications, an indexing lag of around 30 seconds between a write and its appearance in search results is acceptable.
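A worker like this can batch queued change events and collapse them before doing any work, because only the latest event per entity matters: an insert followed by two updates needs one re-embed, and any sequence ending in a delete needs only the delete. A minimal sketch, assuming events arrive in commit order; `ChangeEvent` and `coalesce` are hypothetical names, not part of any CDC library.

```python
from dataclasses import dataclass
from typing import Literal

@dataclass(frozen=True)
class ChangeEvent:
    op: Literal["insert", "update", "delete"]
    entity_type: str
    entity_id: str

def coalesce(events: list[ChangeEvent]) -> dict[tuple[str, str], str]:
    """Collapse a batch of change events to one action per entity."""
    actions: dict[tuple[str, str], str] = {}
    for event in events:  # assumed to be in commit order
        key = (event.entity_type, event.entity_id)
        # The latest event wins: deletes stay deletes, anything else re-embeds
        actions[key] = "delete" if event.op == "delete" else "upsert"
    return actions
```

For each `"upsert"`, the worker would reload the record, rebuild its text with `build_search_text`, and call `upsert_embedding`; for each `"delete"`, it would run `DELETE FROM search_embeddings WHERE entity_type = $1 AND entity_id = $2`.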


Should I use pgvector or a dedicated vector database?

pgvector is a sensible default for most SaaS products, especially below roughly 10 million records. It keeps your stack simple: one database, one backup strategy, one connection pool. Consider a dedicated vector database such as Pinecone or Weaviate when you need very low query latency (for example, under 10 ms) at large scale, or filtering features that pgvector does not provide.

How do I handle multilingual content?

Use an embedding model with multilingual support, such as text-embedding-3-small, and index content as-is without translating it. A multilingual model places semantically similar text close together in vector space across languages, so a query in Spanish can retrieve relevant records written in English.

