---
title: "Database Query Optimization for Agent Knowledge Retrieval: Indexes, Caching, and Denormalization"
description: "Optimize the database layer that powers your AI agent's knowledge retrieval with query profiling, index design, materialized views, and query caching strategies that cut latency from seconds to milliseconds."
canonical: https://callsphere.ai/blog/database-query-optimization-agent-knowledge-retrieval-indexes-caching
category: "Learn Agentic AI"
tags: ["Database Optimization", "PostgreSQL", "Indexing", "Query Caching", "Python"]
author: "CallSphere Team"
published: 2026-03-17T00:00:00.000Z
updated: 2026-05-06T01:02:44.327Z
---

# Database Query Optimization for Agent Knowledge Retrieval: Indexes, Caching, and Denormalization

> Optimize the database layer that powers your AI agent's knowledge retrieval with query profiling, index design, materialized views, and query caching strategies that cut latency from seconds to milliseconds.

## Why Database Performance Matters for AI Agents

When an AI agent calls a tool to look up customer data, search a knowledge base, or retrieve transaction history, that tool call usually hits a database. A tool call that takes 50ms feels instant. One that takes 2 seconds makes the entire agent feel broken — and the LLM is waiting idle the entire time.

Most database performance problems in agent systems come from three sources: missing indexes, the N+1 query pattern, and full table scans on large knowledge bases. Fixing these is often the highest-ROI optimization you can make.

## Query Profiling: Finding the Slow Queries

Before optimizing, measure. Use `EXPLAIN ANALYZE` in PostgreSQL to understand exactly how the database executes your queries.


```python
import asyncpg
import json

async def profile_query(pool: asyncpg.Pool, query: str, *args) -> dict:
    """Run EXPLAIN ANALYZE on a query and return the execution plan."""
    explain_query = f"EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) {query}"
    result = await pool.fetchval(explain_query, *args)

    # asyncpg returns the json column as a string unless a json codec
    # is registered, so decode it; the decoded value is a one-element list
    plan = json.loads(result)[0]
    return {
        "total_time_ms": plan["Execution Time"],
        "planning_time_ms": plan["Planning Time"],
        "plan": plan["Plan"],
    }

# Usage
profile = await profile_query(
    pool,
    "SELECT * FROM knowledge_base WHERE content ILIKE $1",
    "%return policy%",
)
print(f"Query took {profile['total_time_ms']:.1f}ms")
# If this shows "Seq Scan" on a large table, you need an index
```
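The plan returned above is a nested tree, so "look for a Seq Scan" can be automated. A small helper (hypothetical, not part of asyncpg; it only relies on the standard EXPLAIN JSON keys `Node Type`, `Plans`, `Relation Name`, `Plan Rows`, and `Actual Total Time`) can walk the tree and flag sequential scans worth indexing:

```python
def find_seq_scans(plan: dict, min_rows: int = 1000) -> list[dict]:
    """Recursively walk an EXPLAIN plan tree and collect Seq Scan nodes
    that expect to touch at least min_rows rows: candidates for an index."""
    hits = []
    if plan.get("Node Type") == "Seq Scan" and plan.get("Plan Rows", 0) >= min_rows:
        hits.append({
            "table": plan.get("Relation Name"),
            "rows": plan.get("Plan Rows"),
            "time_ms": plan.get("Actual Total Time"),
        })
    for child in plan.get("Plans", []):
        hits.extend(find_seq_scans(child, min_rows))
    return hits
```

Feed it the `plan` key from `profile_query` and alert (or fail a test) whenever a large scan appears in a hot tool-call path.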

## Index Design for Agent Queries

Agents typically run three types of queries: exact lookup (find by ID), keyword search (find by content), and filtered listing (find by status + date range). Each needs a different index strategy.

```python
# Index creation script for a typical agent knowledge base

INDEXES = [
    # Exact lookup by slug or ID — B-tree (default)
    "CREATE INDEX IF NOT EXISTS idx_kb_slug ON knowledge_base (slug);",

    # Full-text search on content — GIN index with tsvector
    """CREATE INDEX IF NOT EXISTS idx_kb_content_fts
       ON knowledge_base USING GIN (to_tsvector('english', content));""",

    # Filtered listing: category + date for sorted retrieval
    """CREATE INDEX IF NOT EXISTS idx_kb_category_date
       ON knowledge_base (category, updated_at DESC);""",

    # Composite index for agent tool: status + priority + created
    """CREATE INDEX IF NOT EXISTS idx_tickets_status_priority
       ON support_tickets (status, priority DESC, created_at DESC)
       WHERE status = 'open';""",  # Partial index — only indexes open tickets
]

async def apply_indexes(pool: asyncpg.Pool):
    async with pool.acquire() as conn:
        for idx_sql in INDEXES:
            await conn.execute(idx_sql)
```

Partial indexes (with a `WHERE` clause) are especially powerful for agent queries. If your agent only searches open tickets, indexing only open tickets makes the index smaller and faster.

## Full-Text Search Instead of ILIKE

Agents often need to search knowledge bases by content. The naive approach uses `ILIKE`, which forces a full table scan on every query.

```python
import asyncpg

# BAD: Full table scan on every search
async def search_knowledge_slow(pool: asyncpg.Pool, query: str) -> list:
    return await pool.fetch(
        "SELECT * FROM knowledge_base WHERE content ILIKE $1 LIMIT 10",
        f"%{query}%",
    )

# GOOD: Full-text search with GIN index
async def search_knowledge_fast(pool: asyncpg.Pool, query: str) -> list:
    return await pool.fetch(
        """SELECT *, ts_rank(
               to_tsvector('english', content),
               plainto_tsquery('english', $1)
           ) AS rank
           FROM knowledge_base
           WHERE to_tsvector('english', content) @@ plainto_tsquery('english', $1)
           ORDER BY rank DESC
           LIMIT 10""",
        query,
    )
```

On a table with 100,000 rows, the ILIKE query takes 200-500ms. The full-text search query with a GIN index takes 2-10ms.

## Eliminating the N+1 Pattern

The N+1 problem is the most common performance killer in agent tools. It happens when you query a list and then loop through it to fetch related data.

```python
import asyncpg

# BAD: N+1 — one query for orders, then one per order for items
async def get_order_details_n_plus_1(pool: asyncpg.Pool, customer_id: str):
    orders = await pool.fetch(
        "SELECT * FROM orders WHERE customer_id = $1", customer_id
    )
    results = []
    for order in orders:
        record = dict(order)  # asyncpg Records are immutable; copy to a dict
        # This runs once PER order — 10 orders = 10 queries
        record["items"] = await pool.fetch(
            "SELECT * FROM order_items WHERE order_id = $1", record["id"]
        )
        results.append(record)
    return results

# GOOD: Single query with JOIN
async def get_order_details_joined(pool: asyncpg.Pool, customer_id: str):
    rows = await pool.fetch(
        """SELECT o.id AS order_id, o.status, o.total,
                  oi.product_name, oi.quantity, oi.price
           FROM orders o
           LEFT JOIN order_items oi ON oi.order_id = o.id
           WHERE o.customer_id = $1
           ORDER BY o.created_at DESC""",
        customer_id,
    )
    # Group items by order
    orders = {}
    for row in rows:
        oid = row["order_id"]
        if oid not in orders:
            orders[oid] = {
                "id": oid, "status": row["status"],
                "total": row["total"], "items": [],
            }
        if row["product_name"]:
            orders[oid]["items"].append({
                "product": row["product_name"],
                "quantity": row["quantity"],
                "price": row["price"],
            })
    return list(orders.values())
```
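When a JOIN would duplicate wide parent rows many times over, a common middle ground (a sketch, not from the original post) is two batched queries: fetch the orders, then fetch all their items in a single `WHERE order_id = ANY($1)` round trip, and group in Python. Keeping the grouping as a pure function also makes it trivial to unit-test:

```python
from collections import defaultdict

def group_items_by_order(orders: list[dict], items: list[dict]) -> list[dict]:
    """Attach each item to its parent order; pure function, easy to test."""
    by_order = defaultdict(list)
    for item in items:
        by_order[item["order_id"]].append(item)
    return [{**order, "items": by_order[order["id"]]} for order in orders]

async def get_order_details_batched(pool, customer_id: str) -> list[dict]:
    orders = [dict(r) for r in await pool.fetch(
        "SELECT * FROM orders WHERE customer_id = $1", customer_id,
    )]
    # One round trip for ALL items, instead of one round trip per order
    items = [dict(r) for r in await pool.fetch(
        "SELECT * FROM order_items WHERE order_id = ANY($1)",
        [o["id"] for o in orders],
    )]
    return group_items_by_order(orders, items)
```

Two queries total, regardless of how many orders the customer has.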

## Materialized Views for Complex Aggregations

If your agent frequently needs aggregated data (e.g., "What are this customer's total purchases by category?"), materialized views pre-compute the result.

```python
# Create a materialized view for customer spending summaries
CREATE_MATVIEW = """
CREATE MATERIALIZED VIEW IF NOT EXISTS customer_spending_summary AS
SELECT
    c.id AS customer_id,
    c.email,
    COUNT(o.id) AS total_orders,
    SUM(o.total) AS lifetime_spend,
    MAX(o.created_at) AS last_order_date,
    AVG(o.total) AS avg_order_value
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
GROUP BY c.id, c.email;

CREATE UNIQUE INDEX ON customer_spending_summary (customer_id);
"""

# Refresh the view periodically (not on every query)
REFRESH_VIEW = "REFRESH MATERIALIZED VIEW CONCURRENTLY customer_spending_summary;"

async def get_spending_summary(pool: asyncpg.Pool, customer_id: str) -> dict | None:
    """Instant lookup instead of expensive aggregation."""
    row = await pool.fetchrow(
        "SELECT * FROM customer_spending_summary WHERE customer_id = $1",
        customer_id,
    )
    return dict(row) if row else None
```

Refresh the materialized view on a schedule (every 5-15 minutes) rather than on every query. For most agent use cases, slightly stale aggregation data is perfectly acceptable.

## Query Result Caching

For data that does not change frequently, add an application-level cache between the agent and the database.

```python
import json
import hashlib

class QueryCache:
    def __init__(self, redis_client, default_ttl: int = 300):
        self.redis = redis_client
        self.default_ttl = default_ttl

    def _key(self, query: str, args: tuple) -> str:
        payload = f"{query}:{json.dumps(args, default=str)}"
        return f"qcache:{hashlib.sha256(payload.encode()).hexdigest()}"

    async def cached_fetch(self, pool, query: str, *args, ttl: int | None = None):
        key = self._key(query, args)
        cached = await self.redis.get(key)
        if cached:
            return json.loads(cached)

        rows = await pool.fetch(query, *args)
        result = [dict(r) for r in rows]

        await self.redis.set(
            key, json.dumps(result, default=str),
            ex=ttl or self.default_ttl,
        )
        return result
```

## FAQ

### How do I know which queries need optimization?

Enable slow query logging in PostgreSQL (`log_min_duration_statement = 100` logs queries over 100ms). Then sort by total time (frequency times duration). A query that runs 1,000 times per day at 200ms each is a higher priority than one that runs once at 5 seconds.
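The triage rule above (frequency times duration) is easy to encode. In PostgreSQL the `pg_stat_statements` extension exposes per-query call counts and mean execution times; given rows in that shape, a quick sketch for ranking them:

```python
def rank_queries(stats: list[dict]) -> list[dict]:
    """Sort queries by total time consumed per day, not per-call latency."""
    for s in stats:
        s["total_ms_per_day"] = s["calls_per_day"] * s["mean_ms"]
    return sorted(stats, key=lambda s: s["total_ms_per_day"], reverse=True)
```

With the FAQ's numbers: 1,000 calls at 200ms is 200,000 ms of database time per day, versus 5,000 ms for the single slow query, so the frequent one wins by a factor of 40.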

### Should I use vector search (pgvector) for agent knowledge retrieval?

Use vector search when your agent needs semantic similarity matching — finding content that is conceptually related to the query, not just keyword matches. Use full-text search for exact keyword queries. Many production systems use both: full-text search for precise lookups and vector search for exploratory queries.
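One standard way to combine the two result lists is reciprocal rank fusion (RRF), a technique not specific to this post: each document scores `1 / (k + rank)` in every list it appears in, and the scores are summed. Documents ranked well by both retrievers float to the top.

```python
def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of document ids; k=60 is the conventional constant."""
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results):
            # rank is 0-based here, so the first hit scores 1 / (k + 1)
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

Because RRF only uses ranks, it needs no score normalization between the full-text and vector retrievers, which use incomparable scales (`ts_rank` versus cosine distance).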

### How often should I refresh materialized views?

It depends on how fresh the data needs to be. For agent-facing aggregations like customer spending summaries, refreshing every 5-15 minutes is sufficient. For dashboards, every hour works. Use `REFRESH MATERIALIZED VIEW CONCURRENTLY` to avoid locking the view during refresh, which lets agents continue reading during the refresh process.

---

#DatabaseOptimization #PostgreSQL #Indexing #QueryCaching #Python #AgenticAI #LearnAI #AIEngineering

