---
title: "Building an FAQ Agent: Automatic Question Answering from Knowledge Bases"
description: "Build a production FAQ agent that retrieves answers from knowledge bases using semantic search, applies confidence thresholds to avoid hallucination, and tracks unanswered questions to improve coverage over time."
canonical: https://callsphere.ai/blog/building-faq-agent-automatic-question-answering-knowledge-bases
category: "Learn Agentic AI"
tags: ["FAQ Agent", "Knowledge Base", "Semantic Search", "RAG", "Customer Support"]
author: "CallSphere Team"
published: 2026-03-17T00:00:00.000Z
updated: 2026-05-06T01:02:43.341Z
---

# Building an FAQ Agent: Automatic Question Answering from Knowledge Bases

> Build a production FAQ agent that retrieves answers from knowledge bases using semantic search, applies confidence thresholds to avoid hallucination, and tracks unanswered questions to improve coverage over time.

## The Problem with Static FAQ Pages

Traditional FAQ pages fail customers in two ways. First, customers must guess the exact wording the company used to describe their problem. Second, the list grows unwieldy over time — a 200-item FAQ page helps no one. An FAQ agent solves both problems by understanding the customer's question semantically and retrieving the most relevant answer regardless of how it was phrased.

## Architecture Overview

An FAQ agent has three core components: a knowledge base with embeddings, a retrieval layer that finds relevant answers, and a generation layer that synthesizes a natural response with confidence scoring.

```mermaid
flowchart LR
    Q(["User query"])
    EMB["Embed query
text-embedding-3"]
    VEC[("Vector DB
pgvector or Pinecone")]
    RET["Top-k retrieval
k = 3"]
    PROMPT["Augmented prompt
system plus context"]
    LLM["LLM generation
Claude or GPT"]
    CITE["Inline citations
and page anchors"]
    OUT(["Grounded answer"])
    Q --> EMB --> VEC --> RET --> PROMPT --> LLM --> CITE --> OUT
    style EMB fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style VEC fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style LLM fill:#4f46e5,stroke:#4338ca,color:#fff
    style OUT fill:#059669,stroke:#047857,color:#fff
```

```python
from dataclasses import dataclass
from datetime import date
from openai import AsyncOpenAI
import numpy as np

@dataclass
class FAQEntry:
    id: str
    question: str
    answer: str
    embedding: list[float]
    category: str
    last_updated: str

@dataclass
class RetrievalResult:
    entry: FAQEntry
    similarity: float

class FAQKnowledgeBase:
    def __init__(self, client: AsyncOpenAI):
        self.client = client
        self.entries: list[FAQEntry] = []

    async def embed_text(self, text: str) -> list[float]:
        response = await self.client.embeddings.create(
            model="text-embedding-3-small",
            input=text,
        )
        return response.data[0].embedding

    async def add_entry(
        self, id: str, question: str, answer: str, category: str
    ):
        # Embed the canonical question; retrieval matches against this vector.
        embedding = await self.embed_text(question)
        self.entries.append(
            FAQEntry(
                id=id,
                question=question,
                answer=answer,
                embedding=embedding,
                category=category,
                last_updated=date.today().isoformat(),
            )
        )
```
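The embeddings endpoint accepts a list of inputs, so bulk-loading a large FAQ can be batched rather than embedded one question at a time. A minimal sketch of the batching side — the batch size of 100 is an arbitrary assumption, and the commented `embed_batch` helper is illustrative, not part of the class above:

```python
from itertools import islice
from typing import Iterable, Iterator

def chunked(items: Iterable[str], size: int = 100) -> Iterator[list[str]]:
    """Yield successive fixed-size batches from an iterable."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch

# Sketch of bulk ingestion (assumes `client` is an AsyncOpenAI instance):
#
# async def embed_batch(client, questions: list[str]) -> list[list[float]]:
#     response = await client.embeddings.create(
#         model="text-embedding-3-small",
#         input=questions,  # the API accepts a list of strings
#     )
#     return [item.embedding for item in response.data]
```

Each batch comes back in input order, so the embeddings can be zipped straight back onto the source entries.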

## Semantic Retrieval with Confidence Scoring

The retrieval layer computes cosine similarity between the user question and every FAQ entry. This is where confidence thresholds become critical — returning a wrong answer is far worse than admitting the agent does not know.

```python
def cosine_similarity(a: list[float], b: list[float]) -> float:
    a_arr, b_arr = np.array(a), np.array(b)
    return float(
        np.dot(a_arr, b_arr)
        / (np.linalg.norm(a_arr) * np.linalg.norm(b_arr))
    )

class FAQRetriever:
    def __init__(self, kb: FAQKnowledgeBase):
        self.kb = kb
        self.high_confidence = 0.85
        self.low_confidence = 0.65

    async def retrieve(
        self, query: str, top_k: int = 3
    ) -> list[RetrievalResult]:
        query_embedding = await self.kb.embed_text(query)
        results = []
        for entry in self.kb.entries:
            sim = cosine_similarity(query_embedding, entry.embedding)
            results.append(RetrievalResult(entry=entry, similarity=sim))
        results.sort(key=lambda r: r.similarity, reverse=True)
        return results[:top_k]

    async def answer(self, query: str) -> dict:
        results = await self.retrieve(query)
        if not results:
            return {
                "answer": None,
                "confidence": "none",
                "should_track": True,
            }
        top = results[0]
        if top.similarity >= self.high_confidence:
            return {
                "answer": top.entry.answer,
                "confidence": "high",
                "source_id": top.entry.id,
                "similarity": top.similarity,
                "should_track": False,
            }
        elif top.similarity >= self.low_confidence:
            return {
                "answer": top.entry.answer,
                "confidence": "medium",
                "source_id": top.entry.id,
                "similarity": top.similarity,
                "should_track": True,
            }
        else:
            return {
                "answer": None,
                "confidence": "low",
                "should_track": True,
            }
```

## Tracking Unanswered Questions

Every question the agent cannot confidently answer is an opportunity to improve the knowledge base. An unanswered question tracker clusters similar failures and surfaces the most impactful gaps.

```python
from datetime import datetime, timezone
from collections import defaultdict

class UnansweredTracker:
    def __init__(self):
        self.questions: list[dict] = []

    def track(self, query: str, confidence: str, top_similarity: float):
        self.questions.append({
            "query": query,
            "confidence": confidence,
            "top_similarity": top_similarity,
            # datetime.utcnow() is deprecated; use an aware UTC timestamp
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })

    def get_gap_report(self, min_occurrences: int = 3) -> list[dict]:
        """Group similar unanswered questions and rank by frequency."""
        clusters = defaultdict(list)
        for q in self.questions:
            # Simple grouping by first 5 words
            key = " ".join(q["query"].lower().split()[:5])
            clusters[key].append(q)

        gaps = []
        for key, items in clusters.items():
            if len(items) >= min_occurrences:
                gaps.append({
                    "cluster_key": key,
                    "count": len(items),
                    "sample_queries": [i["query"] for i in items[:3]],
                    "avg_similarity": sum(
                        i["top_similarity"] for i in items
                    ) / len(items),
                })
        gaps.sort(key=lambda g: g["count"], reverse=True)
        return gaps
```
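The first-five-words key above will split paraphrases like "How do I cancel my subscription?" and "Can I cancel the subscription" into separate clusters. One cheap improvement is to drop common stopwords before building the key; the stopword list in this sketch is a small illustrative assumption, not exhaustive:

```python
STOPWORDS = {
    "a", "an", "the", "do", "does", "i", "is", "are", "can",
    "how", "what", "my", "to", "you", "your", "it",
}

def cluster_key(query: str, length: int = 5) -> str:
    """Build a grouping key from the first content words of a query."""
    words = [
        w for w in (t.strip("?.,!:;").lower() for t in query.split())
        if w and w not in STOPWORDS
    ]
    return " ".join(words[:length])
```

With this key, both phrasings above collapse to `"cancel subscription"` and land in the same cluster. For higher fidelity, clustering on query embeddings is the natural next step.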

## Generating Natural Responses

Rather than returning raw FAQ text, the agent uses an LLM to synthesize a conversational answer grounded in the retrieved content. Constraining the model to the provided source reduces the risk of hallucination, though the system prompt alone cannot eliminate it.

```python
async def generate_faq_response(
    client: AsyncOpenAI, query: str, faq_answer: str
) -> str:
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a customer support assistant. Answer the "
                    "customer question using ONLY the provided FAQ "
                    "content. Do not add information not present in "
                    "the source. Be concise and helpful."
                ),
            },
            {
                "role": "user",
                "content": (
                    f"Customer question: {query}\n\n"
                    f"FAQ source: {faq_answer}"
                ),
            },
        ],
        max_tokens=300,
    )
    return response.choices[0].message.content
```
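The confidence labels from `FAQRetriever.answer` can drive a simple routing policy: only high-confidence matches go straight to generation, medium-confidence answers carry an explicit hedge, and everything else escalates. A sketch of that dispatch — the route names here are illustrative assumptions, not a fixed API:

```python
def route_response(result: dict) -> str:
    """Map a retrieval result to a handling route."""
    confidence = result.get("confidence")
    if confidence == "high":
        return "generate"         # synthesize a grounded answer via the LLM
    if confidence == "medium":
        return "generate_hedged"  # answer, but flag uncertainty to the user
    return "escalate"             # no reliable source: hand off or log the gap
```

Keeping the routing decision separate from generation makes it easy to adjust the policy (for example, escalating medium-confidence matches during an accuracy review) without touching the prompt.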

## FAQ

### What embedding model should I use for FAQ retrieval?

OpenAI's `text-embedding-3-small` offers an excellent balance of quality and cost for FAQ workloads. It handles paraphrases well and runs at a fraction of the cost of larger models. For multilingual FAQs, `text-embedding-3-large` performs better across languages.

### How do I set the right confidence threshold?

Start with a high threshold (0.85) and measure your false positive rate — cases where the agent returns a wrong answer confidently. Then lower the threshold gradually while monitoring accuracy. Most teams settle between 0.75 and 0.85 depending on their tolerance for incorrect responses versus unanswered questions.
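Given a small labeled evaluation set of (top similarity, answer-was-correct) pairs, the false positive rate at each candidate threshold can be swept directly. A minimal sketch, assuming such a hand-labeled set exists:

```python
def false_positive_rate(
    evals: list[tuple[float, bool]], threshold: float
) -> float:
    """Fraction of confidently returned answers that were wrong.

    Each eval is (top_similarity, answer_was_correct).
    """
    answered = [correct for sim, correct in evals if sim >= threshold]
    if not answered:
        return 0.0
    return sum(1 for c in answered if not c) / len(answered)

def sweep_thresholds(
    evals: list[tuple[float, bool]],
    thresholds: tuple[float, ...] = (0.65, 0.70, 0.75, 0.80, 0.85),
) -> dict[float, float]:
    """False positive rate at each candidate threshold."""
    return {t: false_positive_rate(evals, t) for t in thresholds}
```

The sweep makes the trade-off concrete: as the threshold drops, coverage rises but so does the false positive rate, and the right operating point is wherever that rate crosses your tolerance.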

### How often should I update the knowledge base?

Review your unanswered question tracker weekly. Any cluster with more than five occurrences represents a meaningful gap. Also re-embed entries whenever the underlying answer content changes, since stale embeddings paired with updated text create inconsistencies.
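Detecting which entries need re-embedding can be as simple as storing a content hash alongside each embedding and comparing it on every sync. A sketch under that assumption — the stored-hash field is hypothetical, not part of the `FAQEntry` shown earlier:

```python
import hashlib

def content_hash(question: str, answer: str) -> str:
    """Stable fingerprint of the text an embedding was computed from."""
    payload = f"{question}\n{answer}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def needs_reembedding(stored_hash: str, question: str, answer: str) -> bool:
    """True when the entry text changed since its embedding was computed."""
    return content_hash(question, answer) != stored_hash
```

Hashing both question and answer catches edits to either field, so a weekly sync job can re-embed exactly the entries whose fingerprints changed.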

