---
title: "Full-Text Search for Agent Knowledge Bases: PostgreSQL tsvector and Trigram Search"
description: "Implement full-text search for AI agent knowledge bases using PostgreSQL tsvector, trigram similarity, and GIN indexes with ranking, fuzzy matching, and hybrid search strategies."
canonical: https://callsphere.ai/blog/full-text-search-agent-knowledge-bases-postgresql-tsvector-trigram
category: "Learn Agentic AI"
tags: ["Full-Text Search", "PostgreSQL", "tsvector", "Knowledge Base", "AI Agents"]
author: "CallSphere Team"
published: 2026-03-17T00:00:00.000Z
updated: 2026-05-06T01:02:43.699Z
---

# Full-Text Search for Agent Knowledge Bases: PostgreSQL tsvector and Trigram Search

> Implement full-text search for AI agent knowledge bases using PostgreSQL tsvector, trigram similarity, and GIN indexes with ranking, fuzzy matching, and hybrid search strategies.

## Why Agents Need Full-Text Search

AI agents frequently need to retrieve relevant information from a knowledge base before generating a response. While vector similarity search handles semantic queries well, full-text search excels at exact term matching, phrase queries, and structured document retrieval. A production agent system typically combines both approaches.

PostgreSQL provides two complementary text search systems built in — no external services required. The tsvector/tsquery system handles linguistic full-text search with stemming and ranking. The pg_trgm extension handles fuzzy matching and typo tolerance. Together, they cover the full spectrum of text retrieval needs.

## Setting Up tsvector Search

Create a knowledge base table with a generated tsvector column:

```mermaid
flowchart LR
    INPUT(["User intent"])
    PARSE["Parse plus
classify"]
    PLAN["Plan and tool
selection"]
    AGENT["Agent loop
LLM plus tools"]
    GUARD{"Guardrails
and policy"}
    EXEC["Execute and
verify result"]
    OBS[("Trace and metrics")]
    OUT(["Outcome plus
next action"])
    INPUT --> PARSE --> PLAN --> AGENT --> GUARD
    GUARD -->|Pass| EXEC --> OUT
    GUARD -->|Fail| AGENT
    AGENT --> OBS
    style AGENT fill:#4f46e5,stroke:#4338ca,color:#fff
    style GUARD fill:#f59e0b,stroke:#d97706,color:#1f2937
    style OBS fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style OUT fill:#059669,stroke:#047857,color:#fff
```

```sql
CREATE TABLE knowledge_articles (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    title TEXT NOT NULL,
    content TEXT NOT NULL,
    category TEXT,
    source TEXT,
    search_vector TSVECTOR GENERATED ALWAYS AS (
        setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
        setweight(to_tsvector('english', coalesce(content, '')), 'B')
    ) STORED,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- GIN index on the search vector
CREATE INDEX idx_knowledge_search
    ON knowledge_articles USING gin(search_vector);
```

The `GENERATED ALWAYS AS` clause automatically maintains the search vector when title or content changes. The `setweight` function assigns weight A to titles and B to content, so title matches rank higher.

## Querying with tsquery

Search the knowledge base with ranked results:

```sql
SELECT
    id,
    title,
    ts_rank(search_vector, query) AS rank,
    ts_headline('english', content, query,
        'StartSel=, StopSel=, MaxFragments=3'
    ) AS snippet
FROM knowledge_articles,
     to_tsquery('english', 'agent & memory & retrieval') AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 10;
```

The `ts_headline` function generates highlighted snippets showing where matches occur — useful for displaying search results to users or providing context to the agent.

## Building Search Queries from User Input

User input rarely matches tsquery syntax. Parse natural language into a proper tsquery:

```python
import asyncpg

def build_tsquery(user_input: str) -> str:
    """Convert user input into a tsquery string."""
    words = user_input.strip().split()
    sanitized = [
        w.replace("'", "").replace("\\", "")
        for w in words
        if w.strip()
    ]
    if not sanitized:
        return ""
    # Join with & for AND semantics, use :* for prefix matching
    return " & ".join(f"{word}:*" for word in sanitized)

async def search_knowledge(
    pool: asyncpg.Pool, query_text: str, limit: int = 10
) -> list[dict]:
    tsquery = build_tsquery(query_text)
    if not tsquery:
        return []

    rows = await pool.fetch(
        """
        SELECT id, title, category,
            ts_rank(search_vector, to_tsquery('english', $1)) AS rank,
            ts_headline('english', content, to_tsquery('english', $1),
                'MaxFragments=2, MinWords=20, MaxWords=60'
            ) AS snippet
        FROM knowledge_articles
        WHERE search_vector @@ to_tsquery('english', $1)
        ORDER BY rank DESC
        LIMIT $2
        """,
        tsquery,
        limit,
    )
    return [dict(r) for r in rows]
```

The `:*` suffix enables prefix matching, so "retriev" matches "retrieval", "retrieve", and "retrieving". This provides a more forgiving search experience.

## Trigram Search for Fuzzy Matching

Enable the pg_trgm extension for similarity-based search that handles typos:

```sql
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- GIN trigram index on title
CREATE INDEX idx_knowledge_title_trgm
    ON knowledge_articles USING gin(title gin_trgm_ops);

-- Fuzzy search: find articles with titles similar to input
SELECT id, title,
    similarity(title, 'agnet memroy') AS sim_score
FROM knowledge_articles
WHERE title % 'agnet memroy'  -- % operator uses similarity threshold
ORDER BY sim_score DESC
LIMIT 5;
```

The default similarity threshold is 0.3. Adjust it for your use case:

```sql
SET pg_trgm.similarity_threshold = 0.2;  -- More permissive
```

## Hybrid Search: Combining tsvector and Trigram

For the best user experience, combine both approaches. Use tsvector for precise ranked results and fall back to trigram when tsvector returns no matches:

```python
async def hybrid_search(
    pool: asyncpg.Pool, query_text: str, limit: int = 10
) -> list[dict]:
    tsquery = build_tsquery(query_text)

    rows = await pool.fetch(
        """
        WITH fts AS (
            SELECT id, title, category,
                ts_rank(search_vector, to_tsquery('english', $1)) AS score,
                'fts' AS match_type
            FROM knowledge_articles
            WHERE search_vector @@ to_tsquery('english', $1)
        ),
        trgm AS (
            SELECT id, title, category,
                similarity(title || ' ' || content, $2) AS score,
                'trigram' AS match_type
            FROM knowledge_articles
            WHERE (title || ' ' || content) % $2
              AND id NOT IN (SELECT id FROM fts)
        )
        SELECT * FROM fts
        UNION ALL
        SELECT * FROM trgm
        ORDER BY score DESC
        LIMIT $3
        """,
        tsquery,
        query_text,
        limit,
    )
    return [dict(r) for r in rows]
```

This query first collects full-text matches, then adds trigram matches that were not already found, giving you both precision and typo tolerance.

## Integration with Agent Tools

Expose the search as a tool that your agent can call:

```python
from agents import Agent, function_tool

@function_tool
async def search_knowledge_base(query: str) -> str:
    """Search the internal knowledge base for relevant articles."""
    pool = get_db_pool()
    results = await hybrid_search(pool, query, limit=5)
    if not results:
        return "No relevant articles found."
    return "\n\n".join(
        f"**{r['title']}** ({r['category']})\nScore: {r['score']:.2f}"
        for r in results
    )
```

## FAQ

### How does PostgreSQL full-text search compare to Elasticsearch?

PostgreSQL full-text search handles most knowledge base scenarios well — up to millions of documents with sub-second query times. Elasticsearch is warranted when you need distributed search across billions of documents, complex faceted navigation, or real-time log analysis. For agent knowledge bases under ten million documents, PostgreSQL avoids the operational complexity of a separate search cluster.

### Can I combine full-text search with vector similarity search?

Yes. Store both a tsvector column and a vector embedding column on the same table. Query with a CTE that scores both approaches and combines them with weighted ranking. This hybrid retrieval pattern consistently outperforms either approach alone in RAG (Retrieval Augmented Generation) benchmarks.

### How do I keep the search index updated when articles change?

The `GENERATED ALWAYS AS ... STORED` column updates automatically on INSERT and UPDATE. For bulk imports, insert the data normally and the search vector regenerates. There is no separate indexing step required — PostgreSQL maintains the GIN index incrementally.

---

#FullTextSearch #PostgreSQL #Tsvector #KnowledgeBase #AIAgents #AgenticAI #LearnAI #AIEngineering

---

Source: https://callsphere.ai/blog/full-text-search-agent-knowledge-bases-postgresql-tsvector-trigram
